@ncanceill
Last active December 27, 2015 00:49
This script examines the TJ operators in a PDF file
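As a rough illustration of what "examining the TJ operators" means here, the sketch below runs the same two-stage `grep` extraction that the script uses on a single made-up line. The sample string and the function name `tj_values` are assumptions for the example only; real content streams vary, and this pattern only picks up integer adjustments that sit directly between two strings.

```shell
#!/bin/sh
# Hypothetical helper: reads decompressed PDF content on stdin and prints
# one TJ spacing adjustment per line, using the same patterns as the script.
tj_values() {
    grep -aoE '\[.*\]TJ' | grep -aoE '\)[-]?[0-9]+\(' | tr -d '()'
}

# Made-up sample of a TJ array as it might appear in an uncompressed stream.
sample='[(Hel)-20(lo)15( Wor)-5(ld)]TJ'

printf '%s\n' "$sample" | tj_values
```

For this sample the helper prints `-20`, `15` and `-5`, one per line, which is the kind of list the script then passes to R for the histogram.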
#!/bin/zsh
#
#
#
# pdf_tj.sh
#
# This script examines the TJ operators in a PDF file.
#
# It requires QPDF <http://qpdf.sourceforge.net/> to decompress PDF files.
# It requires AWK and R to process the values.
#
# Usage: ./pdf_tj.sh <pdf_file>
#
# Copyright (C) 2013 Nicolas Canceill
#
#
# Static
#
qdf="/tmp/tj.qdf"
csv="/tmp/tj.csv"
raw="/tmp/tj"
hist_pdf="/tmp/tj.pdf"
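# Matches a whole TJ array, e.g. [(Hel)-20(lo)]TJ
rgx_tjb='\[.*\]TJ'
# Matches an integer adjustment between two strings, e.g. )-20(
rgx_tj='\)[-]?[0-9]+\('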
awk_csv='BEGIN {print "value,count"} {printf "%s,%s\n", $2, $1}'
awk_hist='{printf "%s\t%s\t", $2, $1; for (i=1; i<=$1; i++) {printf "#"}; printf "\n"}'
r_hist='pdf("'$hist_pdf'")'"\n"'hist(as.numeric(read.table("'$raw'")[,1]),freq=TRUE,breaks=200)'"\n"'dev.off()'
#
# Script
#
# Decompress PDF
qpdf $1 $qdf --qdf --stream-data=uncompress
# Get TJ values
grep -aoE $rgx_tjb $qdf | grep -aoE $rgx_tj | tr -d '()' > $raw
# Compute
#cat $raw | sort -n | uniq -c | awk $awk_csv > $csv
#cat $raw | sort -n | uniq -c | awk $awk_hist
echo $r_hist | R --vanilla
# Clean
rm $qdf
rm $raw