@simonw
Last active September 28, 2024 08:10
How to recover lost Python source code if it's still resident in-memory

I screwed up using git ("git checkout --" on the wrong file) and managed to delete the code I had just written... but it was still running in a process in a docker container. Here's how I got it back, using https://pypi.python.org/pypi/pyrasite/ and https://pypi.python.org/pypi/uncompyle6

Attach a shell to the docker container

Install GDB (needed by pyrasite)

apt-get update && apt-get install gdb

Install pyrasite - this will let you attach a Python shell to the still-running process

pip install pyrasite

Install uncompyle6, which will let you get Python source code back from in-memory code objects

pip install uncompyle6

Find the PID of the process that is still running

ps aux | grep python

Attach an interactive prompt using pyrasite

pyrasite-shell <PID>

Now you're in an interactive prompt! Import the code you need to recover

>>> from my_package import my_module

Figure out which functions and classes you need to recover

>>> dir(my_module)
['MyClass', 'my_function']

Decompile the function into source code

>>> import uncompyle6
>>> import sys
>>> uncompyle6.main.uncompyle(
    2.7, my_module.my_function.func_code, sys.stdout
)
# uncompyle6 version 2.9.10
# Python bytecode 2.7
# Decompiled from: Python 2.7.12 (default, Nov 19 2016, 06:48:10) 
# [GCC 5.4.0 20160609]
# Embedded file name: /srv/my_package/my_module.py
function_body = "appears here"

For the class, you'll need to decompile each method in turn

>>> uncompyle6.main.uncompyle(
    2.7, my_module.MyClass.my_method.im_func.func_code, sys.stdout
)
# uncompyle6 version 2.9.10
# Python bytecode 2.7
# Decompiled from: Python 2.7.12 (default, Nov 19 2016, 06:48:10) 
# [GCC 5.4.0 20160609]
# Embedded file name: /srv/my_package/my_module.py
class_method_body = "appears here"
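
On Python 3 the attribute names are different: a function's code object lives in `__code__` rather than `func_code`, and a method looked up on its class is a plain function, so `im_func` is not needed. A minimal sketch of collecting every function and method code object in a module, so each one can be handed to the decompiler in turn. `collect_code_objects` is a hypothetical helper name (not part of pyrasite or uncompyle6), and the `demo` module is a stand-in for the lost module:

```python
import inspect
import types


def collect_code_objects(module):
    """Map names to the code objects of functions and methods defined in `module`."""
    found = {}
    for name, obj in vars(module).items():
        if inspect.isfunction(obj) and obj.__module__ == module.__name__:
            found[name] = obj.__code__
        elif inspect.isclass(obj) and obj.__module__ == module.__name__:
            # vars() skips inherited attributes, so only methods written in this class show up.
            for attr_name, attr in vars(obj).items():
                # Unwrap staticmethod/classmethod to reach the underlying function.
                func = getattr(attr, "__func__", attr)
                if inspect.isfunction(func):
                    found[f"{name}.{attr_name}"] = func.__code__
    return found


# Stand-in for the lost module (hypothetical), so the sketch runs on its own.
demo = types.ModuleType("demo")
exec(
    "def my_function():\n"
    "    return 1\n"
    "class MyClass:\n"
    "    def my_method(self):\n"
    "        return 2\n",
    demo.__dict__,
)
code_objects = collect_code_objects(demo)
print(sorted(code_objects))
```

In the attached shell you would pass the real imported module instead of `demo`, then feed each value in `code_objects` to the decompiler.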

@NickSB2000

Excellent, this has the potential to eliminate a swear and/or impress a colleague.. :-)

@Neko-Design

Awesome! Given the stupid number of times I've done exactly this, I'm sure I'll get a chance to try it in anger soon enough.

@i336

i336 commented Mar 12, 2017

FYI, this feels incredibly complicated. Here's a much simpler method that universally applies to any process and will probably recover the original source, or very close to it - for example I used this approach to recover some text from a textbox in Chrome when an undo operation went awry recently. Using Python as an example:

$ python
>>> x = "QqWwEeRrTtYy"

(Leave that running, then...)

$ gdb -p $(pidof python)
...
0xb7414b08 in ___newselect_nocancel () from /lib/libc.so.6
(gdb) generate-core-file pythontest.dump
Saved corefile pythontest.dump
(gdb) quit
A debugging session is active.

        Inferior 1 [process 14970] will be detached.

Quit anyway? (y or n) y
Detaching from program: /usr/bin/python2.7, process 14970
$ grep -o QqWw pythontest.dump 
Binary file pythontest.dump matches
$ grep -ao QqWw pythontest.dump 
QqWw
QqWw
QqWw
QqWw
QqWw
QqWw
bash-4.3$ grep -a QqWw pythontest.dump 
...libxml2.ph....   >>>  = "QqWwEeRrTtYy   >> x = "QqWwEeRrTtYy"ntel        st-0x = "QqWwEeRrTtYy"
(...)
 = "QqWwEeRrTtYy" ··¸$.·xtermi336ÀÛr·åÿÿÿÿ Return a wrapped version of file which provides transparent
ÀÛr·ÿÿÿÿencodings.latin_1ÀÛr·3AÄencodings.latin_1É*·þÿÿÿ`är·\·\·L·þÿÿÿ`är·¬Ì(··H·è·ýÿÿÿ`är·dÙ·,з ÷r·1· ·ýÿÿÿ parse_and_bindacheÀÛr·WIoDread_history_fileÀÛr·ÓcVûwrite_history_fileÀÛrÉ3Ïget_completerÀÛr·73>get_completion_typeÀÛr·vÁÄremove_history_itemÀÛr·0Q¦set_startup_hookÀÛr·
.Öclear_historyÀÛr·Åù_READLINE_VERSION@·ÀÛr·ÿÿÿÿeRrTtYy"ÀÛr·
                                                            @Q£QqWwEeRrTtYyTtYy"
òlS·àSw·x = "QqWwEeRrTtYy"
$ 64;1;2;6;9;15;18;21;22c^C
$ ^C

Left in some of the binary asplosion for fun; this is a Unicode world now after all, it shouldn't cause any issues. As you can see, some of the data (a ridiculously small amount here) is mangled, but I see at least three intact copies of my original text. YMMV depending on what malloc implementation your app is using and how much fragmentation happened.
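
If the grep output is too noisy, the same idea can be sketched in Python: pull out runs of printable text from the dump and keep only the runs that contain a marker you remember typing. This is roughly what `strings pythontest.dump | grep QqWw` would do; `find_text_runs` is a hypothetical helper name, and the `fake_dump` bytes are a synthetic stand-in for a real core file (which you would read with `open(path, "rb").read()`):

```python
import re


def find_text_runs(data: bytes, marker: bytes, min_len: int = 4):
    """Return printable-ASCII runs of at least `min_len` bytes that contain `marker`."""
    # Runs of printable bytes (space through tilde) plus tab, like the `strings` tool.
    pattern = re.compile(rb"[\x20-\x7e\t]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data) if marker in m.group()]


# Synthetic stand-in for a core dump: text fragments separated by binary noise.
fake_dump = b'\x00\xc0\xdbr\xb7x = "QqWwEeRrTtYy"\x00\xff\xfe>>> x = "QqWwEeRrTtYy"\x00junk\x01'
for run in find_text_runs(fake_dump, b"QqWw"):
    print(run)
```

For a multi-gigabyte dump, reading the whole file into memory may not be practical; mapping it with the standard `mmap` module would be one way around that.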

Here's one to file away if you frequently use Linux:

Configure enough swap space on your system. Then, in an absolute emergency, open a terminal and run sync, followed by echo disk > /sys/power/state or pm-hibernate, to trigger system hibernation. Of course, this process requires a full copy of memory to be written to the disk... :) Reboot your system off a flash drive for best results when analysing the disk.

WARNING: It feels horribly unintuitive, but you must sync your disk before hibernating unless you know you'll be able to successfully resume from the hibernated memory image. Hibernating means that whatever the filesystem was doing is immediately abandoned in-flight, with the idea that it will be finished when the system wakes back up. If you never resume, that in-memory filesystem data never makes it to disk. Ideally you'd copy the memory image somewhere and then resume from the hibernated image; it might be worth figuring out how to do that on your system.

And of course this is all because Linux doesn't provide arbitrary access to memory. Kinda crazy that it's not generally possible, but it's understandable.

@ancat

ancat commented Mar 12, 2017

@i336 what you're likely seeing is the buffer of the interactive shell history. Testing python <file> on a file that gets deleted yields "random" code fragments (or the entirety, for very tiny programs) here and there but not the entire source. I used gdb to search across the entirety of memory space and couldn't recover the source code for any programs larger than a few lines.
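
For anyone who wants to repeat that kind of search without gdb, here is a rough sketch that scans the readable mappings listed in /proc/<pid>/maps by reading /proc/<pid>/mem. This is a swapped-in technique, not what was used above. `parse_maps_line` and `search_process_memory` are hypothetical helper names; the approach is Linux-only, needs the same ptrace permission as attaching gdb, and misses matches that straddle two mappings:

```python
import re

# Each maps line starts with "start-end perms", e.g. "00400000-00452000 r-xp ...".
MAPS_LINE = re.compile(r"^([0-9a-f]+)-([0-9a-f]+)\s+(\S+)")


def parse_maps_line(line):
    """Return (start, end, perms) for one line of /proc/<pid>/maps, or None."""
    match = MAPS_LINE.match(line)
    if not match:
        return None
    return int(match.group(1), 16), int(match.group(2), 16), match.group(3)


def search_process_memory(pid, needle):
    """Yield addresses in readable mappings of `pid` where `needle` (bytes) occurs."""
    with open(f"/proc/{pid}/maps") as maps, open(f"/proc/{pid}/mem", "rb", buffering=0) as mem:
        for line in maps:
            parsed = parse_maps_line(line)
            if parsed is None or not parsed[2].startswith("r"):
                continue
            start, end, _perms = parsed
            try:
                mem.seek(start)
                chunk = mem.read(end - start)
            except (OSError, ValueError, OverflowError):
                continue  # some special mappings cannot be read
            offset = chunk.find(needle)
            while offset != -1:
                yield start + offset
                offset = chunk.find(needle, offset + 1)
```

Usage would look like `for addr in search_process_memory(1234, b"def my_function"): print(hex(addr))`, where 1234 is a placeholder PID. As noted above, expect fragments rather than whole files.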

@tleeuwenburg

For what it's worth, I did something similar recently with git and went a different path to recovery based on 'git fsck' and retrieving the files from hashed objects stored in git. Kudos to your fantastic recovery strategy though!

@odino

odino commented Mar 24, 2017

What about docker cp? :)

@seralf

seralf commented Mar 25, 2017

well done! :-)

@prem-narain

Awesome !! Thanks !!

@davidtgq

davidtgq commented May 2, 2017

Stuck on this step: pyrasite-shell <PID> I just get a blank line. If I type any command and press enter, nothing happens.

@smiddela

smiddela commented Mar 8, 2018

Me too, same problem.

Stuck on this step: pyrasite-shell I just get a blank line.

@HolyShitMan

@davidtgq @smiddela:
Did you install gdb? It's needed to run pyrasite.

@richard-scott

I saw this in this issue; it said to try running this before pyrasite-shell:

echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope

It helped me.
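
To see what the setting currently is before changing it, something like the sketch below should work on kernels with the Yama security module. The value descriptions are paraphrased from the kernel's Yama documentation; `describe_ptrace_scope` and `read_ptrace_scope` are hypothetical helper names:

```python
from pathlib import Path

PTRACE_SCOPE_PATH = Path("/proc/sys/kernel/yama/ptrace_scope")

# Meanings paraphrased from the Yama security module documentation.
PTRACE_SCOPE_MEANINGS = {
    0: "classic: a process may attach to any other process running as the same user",
    1: "restricted: by default a process may only attach to its own descendants",
    2: "admin-only: attaching requires CAP_SYS_PTRACE",
    3: "no attach: attaching is disabled and the setting cannot be changed back",
}


def describe_ptrace_scope(value):
    """Return a short description of a ptrace_scope value."""
    return PTRACE_SCOPE_MEANINGS.get(value, f"unknown value {value}")


def read_ptrace_scope(path=PTRACE_SCOPE_PATH):
    """Return the current setting as an int, or None if it cannot be read."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


current = read_ptrace_scope()
if current is None:
    print("Yama ptrace_scope is not available on this system")
else:
    print(f"ptrace_scope = {current}: {describe_ptrace_scope(current)}")
```

A value of 1 is the likely reason a gdb-based attach hangs or fails for a non-root user; the echo 0 command above switches it to 0 until the next reboot.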

@govcert-ch

I also have the freezing problem, but the ptrace_scope change did not help (it's on Ubuntu 18.04). Debug output (with verbose=True added to the inject call) says:

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007fc582b13ff7 in __GI___select (nfds=0, readfds=0x0, writefds=0x0, exceptfds=0x0, timeout=0x7ffc162924a0) at ../sysdeps/unix/sysv/linux/select.c:41

41      ../sysdeps/unix/sysv/linux/select.c: No such file or directory.
'PyGILState_Ensure' has unknown return type; cast the call to its declared return type
'PyRun_SimpleString' has unknown return type; cast the call to its declared return type
History has not yet reached $1.

Any ideas what that means?

@user202729

Known bug. See lmacken/pyrasite#75 (comment).

@iPurya

iPurya commented May 5, 2021

I tried this with CPython. I can get shell access, but I can't read the code. Do you have any ideas for this situation?

@rodmur

rodmur commented Aug 4, 2022

Hi, this all appears to have changed for Python 3: the uncompyle6.main.uncompyle() function seems to be gone in favor of uncompyle6.main.decompile().

Also, what would the "my_package" be named if you're just trying to recover a simple python script with no package or module? It doesn't appear __main__ works.

@tom-flamelit

You just hit front page of hacker news!

Does this still work with recent Python versions?

@tg12

tg12 commented May 17, 2024

process in a docker container.

Why could you not just jump in the docker container and copy the python file out?

@ZM-J

ZM-J commented Sep 28, 2024

I got an error when I tried putting a code object into decompile:

def f(a, b):
    return a + b

import sys
uncompyle6.main.decompile(
    3.8, f.__code__, sys.stdout
)

And I got:

Traceback (most recent call last):
  File "kkp.py", line 20, in <module>
    uncompyle6.main.decompile(
  File "py38\lib\site-packages\uncompyle6\main.py", line 104, in decompile
    assert iscode(co), f"""{co} does not smell like code"""
AssertionError: 3.8 does not smell like code

I'm not sure what happened though.

@ZM-J

ZM-J commented Sep 28, 2024

For anyone who runs into the same issue: decompile() takes the code object first, followed by the bytecode version as a tuple, like this:

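import sys
import uncompyle6.main

def f(a, b):
    return a + b

# Code object first, then the bytecode version as a tuple, then the output stream.
uncompyle6.main.decompile(
    f.__code__, (3, 8), sys.stdout
)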
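
Since the running process can exit at any moment, it may be worth saving the raw code object to disk before experimenting with the decompiler. A minimal sketch using only the standard library `marshal`, `dis`, and `types` modules; the file name `f.marshal` is just an example, and marshalled code objects should only be loaded by the same Python version that wrote them:

```python
import dis
import io
import marshal
import os
import tempfile
import types


def f(a, b):
    return a + b


# Persist the code object so it outlives the process.
path = os.path.join(tempfile.gettempdir(), "f.marshal")
with open(path, "wb") as fh:
    marshal.dump(f.__code__, fh)

# Later, under the same Python version: load it back.
with open(path, "rb") as fh:
    restored = marshal.load(fh)

# The restored code object can be wrapped in a function and called again.
rebuilt = types.FunctionType(restored, globals(), "f")
print(rebuilt(2, 3))

# A bytecode listing is not source code, but it is readable enough to rewrite from
# if no decompiler handles this interpreter's bytecode.
listing = io.StringIO()
dis.dis(restored, file=listing)
print(listing.getvalue())
```

The saved file can also be passed to a decompiler later, by loading it with `marshal.load` and handing the resulting code object to `decompile()` as shown above.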
