Get rid of fluff on fields in a CSV
#!/usr/bin/env python
"""
Get rid of white space and periods on the old file, and ensure the new one uses the same CSV quoting conventions, so we can run a diff without being distracted by those differences.
"""

import csv

directories = ["luke", "sunlight"]
base = "2011Q3-summary"

for directory in directories:
    # newline='' is what the csv module expects for file objects on Python 3
    with open('%s/%s.csv' % (directory, base), 'r', newline='') as infile, \
         open('%s/%s-stripped.csv' % (directory, base), 'w', newline='') as outfile:
        fin = csv.reader(infile)
        fout = csv.writer(outfile)
        for line in fin:
            stripped_row = []
            for field in line:
                # trim whitespace, then surrounding periods, then any whitespace the periods were hiding
                stripped_row.append(field.strip().strip('.').strip())
            fout.writerow(stripped_row)

"""
Run this:

diff --suppress-common-lines -y -W 1500 old-detail-stripped.csv new-detail-stripped.csv > diff.txt

and you should see differences only in lines where the RECIP (orig) started with DO, like DOUG. The old script erroneously replaced those with the name above it!

There are also some 'government contributions' lines that will show up in the diff. They were spaced wrong before, and now they are displaying correctly.
"""
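
The per-field cleanup in the script above can be tried on in-memory data without touching any files. This is a minimal sketch, not part of the original script: the helper names `clean_field` and `clean_rows` and the sample rows are hypothetical and exist only for illustration.

```python
import csv
import io


def clean_field(field):
    """Trim whitespace, then surrounding periods, then whitespace again."""
    return field.strip().strip('.').strip()


def clean_rows(rows):
    """Apply clean_field to every field of every row (hypothetical helper)."""
    return [[clean_field(field) for field in row] for row in rows]


# Made-up sample data, only for illustration.
sample = 'RECIP,AMOUNT\n" DOUG. ",100.\n"SMITH, J. ",  25 \n'
cleaned = clean_rows(csv.reader(io.StringIO(sample)))

# Writing through csv.writer re-quotes every row with one convention,
# which is what keeps quoting differences out of the later diff.
out = io.StringIO()
csv.writer(out).writerows(cleaned)
print(out.getvalue())
```

Note that `strip('.')` removes periods from both ends of a field, so a value such as `.5` would lose its leading period as well. That is harmless when both files go through the same cleanup before the diff, but the stripped files are probably best treated as diff input only.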