Created October 7, 2011 05:48
Get rid of fluff on fields in a CSV
"""
Ensure the new and old files use the same CSV quoting conventions and format decimals the same way (15.00 vs 15 and 16.10 vs 16.1), so we can run a diff without being distracted by those differences.
"""
import csv

# First file: rewrite it with trimmed fields and numeric columns parsed as floats.
# newline='' is what the csv module expects for file objects on Python 3.
with open('../../archives/3_csv_original/2011Q3-summary-sunlight.csv', 'r', newline='') as src, \
        open('../../archives/3_csv_original/2011Q3-summary-sunlight-stripped.csv', 'w', newline='') as dst:
    fin = csv.reader(src)
    fout = csv.writer(dst)
    for line in fin:
        newline = []
        for i, field in enumerate(line):
            # Drop surrounding whitespace, leading/trailing periods, and thousands separators.
            field = field.strip().strip('.').replace(',', '').strip()
            if i > 3 and field not in ["YTD", "AMOUNT"]:  # number: resolve precision issue
                field = float(field)
            newline.append(field)
        fout.writerow(newline)

# Second file: same cleanup, so both outputs share quoting and decimal formatting.
with open('2011Q3-house-disburse-summary.csv', 'r', newline='') as src, \
        open('2011Q3-house-disburse-summary-stripped.csv', 'w', newline='') as dst:
    fin = csv.reader(src)
    fout = csv.writer(dst)
    for line in fin:
        newline = []
        for i, field in enumerate(line):
            # Drop surrounding whitespace, leading/trailing periods, and thousands separators.
            field = field.strip().strip('.').replace(',', '').strip()
            if i > 3 and field not in ["YTD", "AMOUNT"]:  # number: resolve precision issue
                field = float(field)
            newline.append(field)
        fout.writerow(newline)

"""
Then run this:
diff --suppress-common-lines -y -W 1500 old-detail-stripped.csv new-detail-stripped.csv > diff.txt
"""
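
A small, self-contained sketch of why the cleanup above makes the two files diff-friendly. The helper names normalize_field and normalize_rows, their parameters, and the sample rows are made up for illustration and are not part of the original script; the cleanup rule itself is copied from the loop above. It assumes Python 3.

```python
import csv
import io


def normalize_field(field, index, first_numeric=4, labels=("YTD", "AMOUNT")):
    """Hypothetical helper mirroring the per-field cleanup in the script."""
    # Trim whitespace, leading/trailing periods, and thousands separators.
    field = field.strip().strip('.').replace(',', '').strip()
    # Columns from first_numeric onward are numbers, except the header labels.
    if index >= first_numeric and field not in labels:
        return float(field)
    return field


def normalize_rows(rows):
    """Serialize rows as CSV text after normalizing every field."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in rows:
        writer.writerow([normalize_field(f, i) for i, f in enumerate(row)])
    return out.getvalue()


# Made-up sample rows: same values, different formatting.
old = [["a", "b", "c", "d", "15.00", "1,234.50"]]
new = [["a ", "b", "c", "d", "15", "1234.5"]]

# Once normalized, both rows serialize to the same text, so diff stays quiet.
print(normalize_rows(old) == normalize_rows(new))
```

Because every numeric field passes through float() before being written, "15.00" and "15" both come out as 15.0, and the csv writer applies one quoting style to both outputs.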