Last active
October 2, 2024 19:10
Decompress FlateDecode Objects in PDF
#!/usr/bin/env python3
# This script is designed to do one thing and one thing only. It will find each
# of the FlateDecode streams in a PDF document using a regular expression,
# unzip them, and print out the unzipped data. You can do the same in any
# programming language you choose.
#
# This is NOT a generic PDF decoder. If you need a generic PDF decoder, please
# take a look at pdf-parser by Didier Stevens, which is included in Kali Linux.
# https://tools.kali.org/forensics/pdf-parser.
#
# Any requests to decode a PDF will be ignored.
import re
import zlib

with open("some_doc.pdf", "rb") as f:
    pdf = f.read()

# Capture the bytes between "stream" and "endstream" after each FlateDecode.
stream = re.compile(rb'.*?FlateDecode.*?stream(.*?)endstream', re.S)

for s in stream.findall(pdf):
    s = s.strip(b'\r\n')
    try:
        print(zlib.decompress(s))
        print("")
    except zlib.error:
        # Skip streams that do not decompress.
        pass
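A slightly more defensive variant of the loop above, as a minimal sketch and not a replacement for a real parser. strip(b'\r\n') removes every leading and trailing CR or LF byte, so it can also eat a compressed byte at the end of the data that happens to be 0x0A or 0x0D. The helper names inflate_stream and inflate_all are made up for this example. The helper drops only the line ending that follows the "stream" keyword and lets zlib.decompressobj ignore whatever trails the compressed data.

```python
import re
import zlib

# Same pattern as the script above.
STREAM_RE = re.compile(rb'.*?FlateDecode.*?stream(.*?)endstream', re.S)


def inflate_stream(raw):
    """Return the decompressed bytes of one captured stream body, or None."""
    # Drop only the end-of-line marker that follows the "stream" keyword.
    if raw.startswith(b'\r\n'):
        raw = raw[2:]
    elif raw.startswith(b'\n'):
        raw = raw[1:]
    inflater = zlib.decompressobj()
    try:
        data = inflater.decompress(raw)
    except zlib.error:
        # Not zlib data: possibly encrypted, or the regex matched wrong bytes.
        return None
    # Bytes after the end of the zlib stream (such as the line ending before
    # "endstream") land in inflater.unused_data and are ignored here.
    # eof stays False if the compressed data was cut short.
    return data if inflater.eof else None


def inflate_all(pdf_bytes):
    """Yield the decompressed body of every stream that inflates cleanly."""
    for raw in STREAM_RE.findall(pdf_bytes):
        data = inflate_stream(raw)
        if data is not None:
            yield data
```

For example, list(inflate_all(open("some_doc.pdf", "rb").read())) would collect the decoded streams instead of printing them. Reading the /Length entry of each stream dictionary would be more reliable still, but that is beyond what a single regular expression can do.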
Add "import sys" and replace "some_doc.pdf" with sys.argv[1] to get a generic command-line tool for decoding FlateDecode streams in a PDF.
#!/usr/bin/env python3
# This script is designed to do one thing and one thing only. It will find each
# of the FlateDecode streams in a PDF document using a regular expression,
# unzip them, and print out the unzipped data. You can do the same in any
# programming language you choose.
#
# This is NOT a generic PDF decoder. If you need a generic PDF decoder, please
# take a look at pdf-parser by Didier Stevens, which is included in Kali Linux.
# https://tools.kali.org/forensics/pdf-parser.
#
# Any requests to decode a PDF will be ignored.
import re
import zlib
import sys

# The PDF to read is given as the first command-line argument.
with open(sys.argv[1], "rb") as f:
    pdf = f.read()

stream = re.compile(rb'.*?FlateDecode.*?stream(.*?)endstream', re.S)

for s in stream.findall(pdf):
    s = s.strip(b'\r\n')
    try:
        print(zlib.decompress(s))
        print("")
    except zlib.error:
        # Skip streams that do not decompress.
        pass
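Note that the scripts above only print the decompressed bytes to the terminal; they do not create any files. If files are wanted, one possible approach is sketched below. The function name dump_streams and the stream_NNN.bin file naming are assumptions made for illustration and are not part of the gist.

```python
import re
import zlib
from pathlib import Path

# Same pattern as the scripts above.
STREAM_RE = re.compile(rb'.*?FlateDecode.*?stream(.*?)endstream', re.S)


def dump_streams(pdf_bytes, out_dir):
    """Write each stream that decompresses into out_dir; return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for index, raw in enumerate(STREAM_RE.findall(pdf_bytes)):
        try:
            data = zlib.decompress(raw.strip(b'\r\n'))
        except zlib.error:
            continue  # skip streams that do not inflate, as the scripts do
        # The index counts every regex match, so skipped streams leave gaps.
        path = out_dir / f"stream_{index:03d}.bin"
        path.write_bytes(data)
        written.append(path)
    return written
```

A call such as dump_streams(open(sys.argv[1], "rb").read(), "out") would then leave one file per decoded stream in a directory named out.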
Who are you writing this to?
On Thu, Jul 29, 2021 at 11:45 PM, mikodham wrote:
My PDF has a structure like this. It comes from ezpdf Korean Text:
2 0 obj
<</N 357/Length 538/Filter/FlateDecode>>
stream
6q|ß”�üD/�}y+wlØ—C-
*¥f}RþñöôÑä�á¨4I–µûóß ��&†lDíé� 57
ûsâ�)dÉêà�5nÀœåÈá.�™æ�#�¹aY5ÅyštþÖ ¨:)³5Ò¤u¼��¢„�¼"�,�F34i�¨húÊ�ˆ)¾��@��ÑŽ3�²ï
�”Î�”w�v1|�ç²Ãµ‰†ÈeǾ/«YÖçú\êÝ{¡S¨nÌI?�üíu‡�´Ë�òìJnÔéÔ]õcÇ"�tø�î£�¯�Ÿ™x8´Î\{w‘2bp^(}±¡j�ÀÀîù
¤d#dªÈM&C1äO�"�ÃŒÕÃ;Ž•äf°¶àñ"l…â‡ÎÔãYõUÕ†s+˜xúC�_a��]»àÃÕ:&Âí�°Y1�Š„f’ÈlÚ�Ÿ›Ô��þ)Øß�±�üÍ¥�÷Š²<»ó_a–q\Ä
3+@²û²“Ù–Ÿ�HD�c&ÆP;�Ïvîßüè;À}�]h‹„Ÿø���¢ó&h�ÑgÞ�œn̨[gõ2áGö%�7›�_@À»‚iÉk”î
XûTíÐÉ›�0ÑFƒ–J}Û*ŠTÃ**EŠ/°ZMYõàÝ���äuN?±I˜%Bç—«bý™
x±ùß6¬hâ�Î[CR�ûoo çâ)ævÎZ¦7'€DŠÓáýLŸ†$†o2�mø?+�êàï� �IVL
�=��!Á´†…ëìÿ1Ã'lú�ªb�
endstream
endobj
When I tried using your code, @averagesecurityguy
<https://github.com/averagesecurityguy>, it produced no file.
If I put "print(zlib.decompress(s))" before the try, the following error
comes out:
line 11, in
print(zlib.decompress(s))
zlib.error: Error -3 while decompressing data: incorrect header check
Any solution/suggestion?
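The "incorrect header check" message means zlib did not find a valid zlib header at the start of the bytes it was given. A rough way to narrow this down, sketched here under the assumption that the captured stream bytes are at hand, is to look at the first two bytes and then try both zlib-wrapped and raw deflate decoding. The helper names looks_like_zlib and try_inflate are made up for this example. If both attempts fail, the stream may be encrypted, or the regular expression may have captured the wrong bytes; this sketch cannot tell which.

```python
import zlib


def looks_like_zlib(raw):
    """Check the two-byte zlib header described in RFC 1950."""
    if len(raw) < 2:
        return False
    cmf, flg = raw[0], raw[1]
    # Low nibble 8 means "deflate"; the 16-bit header must be a multiple of 31.
    return (cmf & 0x0F) == 8 and ((cmf << 8) | flg) % 31 == 0


def try_inflate(raw):
    """Return (label, data) for the first decoding that works, else (None, None)."""
    attempts = (
        ("zlib", zlib.MAX_WBITS),           # standard zlib wrapper
        ("raw deflate", -zlib.MAX_WBITS),   # deflate data with no header
    )
    for label, wbits in attempts:
        inflater = zlib.decompressobj(wbits)
        try:
            data = inflater.decompress(raw)
        except zlib.error:
            continue
        if inflater.eof:  # only accept a stream that decoded to its end
            return label, data
    return None, None
```

Printing s[:16] inside the original loop, before the try, is a quick way to see which bytes are actually being handed to zlib.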
@BYugandhar
You have posted text.
However, in the PDF those bytes are not text. Like every other computer file on this planet, a PDF is a binary bitstream.
When we open such a file in a text editor, we see the binary bytes as characters such as ABCDEFG, or as ������� when a byte is not A-Z or another normal ASCII text character.
When you cut and paste such ANSI text, say from MS Notepad to MS Notepad in ANSI mode, most of the characters (except [None]) will actually be uncorrupted and thus potentially usable. Here is an ANSI view of such text. Note that there are very few ���� characters.
However, when you paste or save as plain text, that one missing [None] is critical and the saved data is usually corrupted, so that fonts and images that depend on that nul and void character fall back to blank, leaving pages bare of data. Sadly, the equation for PDF is 255/256 NEQ <00>.
What often happens in such cases is that the decode fails, and I often see the result: rubbish in, rubbish out.
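This can be checked with a few lines of Python. The sketch below compresses some bytes, passes them through a text round trip of the kind a copy and paste performs, and tests whether zlib still accepts the result. The round trip used here (decode as UTF-8 with replacement characters, then encode again) is an assumption standing in for whatever a particular editor, browser, or mail client actually does.

```python
import zlib

# Every possible byte value, so the test data is clearly binary.
original = bytes(range(256)) * 4
compressed = zlib.compress(original)

# Simulate pasting through a text field: byte sequences that are not valid
# UTF-8 are replaced by U+FFFD, and the original byte values are lost.
as_text = compressed.decode("utf-8", errors="replace")
pasted = as_text.encode("utf-8")


def survives(data):
    """Return True if data still decompresses to the original bytes."""
    try:
        return zlib.decompress(data) == original
    except zlib.error:
        return False


print("untouched bytes decode:", survives(compressed))
print("pasted bytes decode:   ", survives(pasted))
```

To share a stream for debugging, attach the PDF itself or post the bytes as hex or base64, so that no byte is changed on the way.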