-
-
Save ivy/e865eca249763e2cd212758c305bccf9 to your computer and use it in GitHub Desktop.
#!/usr/bin/env ruby | |
# chasepdf2csv -- Convert Chase credit card statements from PDF to CSV. Written | |
# to easily import older statements into QuickBooks Online/Self-Employed. Chase | |
# unfortunately only offers statements up to 6 months in the past, making it a | |
# huge chore to synchronize past transactions. | |
# | |
# How to Use | |
# ---------- | |
# This script requires Ruby >2.0.0 and pdftotext. Copy this script somewhere and | |
# make it executable. Run it like any other command. | |
# | |
# ISC License | |
# ----------- | |
# Copyright (c) 2018-2020 Ivy Evans <[email protected]> | |
# | |
# Permission to use, copy, modify, and/or distribute this software for any | |
# purpose with or without fee is hereby granted, provided that the above | |
# copyright notice and this permission notice appear in all copies. | |
# | |
# THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES WITH | |
# REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY | |
# AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY SPECIAL, DIRECT, | |
# INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM | |
# LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR | |
# OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR | |
# PERFORMANCE OF THIS SOFTWARE. | |
require 'csv' | |
require 'optparse' | |
def error(msg) | |
STDERR.puts("error: #{msg}") | |
end | |
def fatal(msg) | |
error(msg) | |
exit 1 | |
end | |
class Statement | |
DUE_DATE_PATTERN = %r{ | |
Payment\s+Due\s+Date:? | |
\s+ | |
(?<month>\d{2})/(?<day>\d{2})/(?<year>\d{2}) | |
}x | |
class Transaction | |
# Regex for matching transactions in a Chase credit statement. | |
# | |
# Edge Case: Amazon orders | |
# | |
# 01/23 AMAZON MKTPLACE PMTS AMZN.COM/BILL WA 12.34\n | |
# Order Number 123-4567890-1234567\n | |
# | |
# Edge Case: Rewards points | |
# | |
# 01/23 AMAZON MARKETPLACE AMZN.COM/BILLWA 4.56 7,890 | |
# | |
LINE_ITEM_PATTERN = %r{ | |
(?<date>\d{2}/\d{2}) | |
\s+ | |
(?<description>.+) | |
\s+ | |
(?<amount>-?[\d,]+\.\d{2}) | |
( | |
[ ] | |
(?<points>[1-9][\d,]+)? | |
| | |
\s* | |
Order\s+Number\s+ | |
(?<order_num>[^\s]+) | |
)? | |
}x | |
def self.scan(output, year) | |
output.to_enum(:scan, LINE_ITEM_PATTERN).collect { | |
Transaction.new(Regexp.last_match, year) | |
} | |
end | |
def initialize(data, year) | |
@date = data[:date]+"/#{year}" | |
@description = data[:description] | |
@amount = data[:amount] | |
@points = data[:points] | |
@order_num = data[:order_num] | |
end | |
attr_reader :date, :amount, :points, :order_num | |
alias rewards? points | |
alias order_num? order_num | |
def description | |
order_num? ? "#{@description} ##{order_num}" : @description | |
end | |
def to_hash | |
{ | |
date: date, | |
description: description, | |
amount: amount, | |
points: points, | |
order_num: order_num, | |
} | |
end | |
alias to_h to_hash | |
end | |
attr_reader :line_items | |
def self.parse(path) | |
output = `pdftotext -raw #{path} -` | |
unless $?.success? | |
fatal "pdftotext: failed to parse #{path} (exit code #{$?})" | |
end | |
unless m = output.match(DUE_DATE_PATTERN) | |
fatal "parse error: could not match due date in #{path}" | |
end | |
new(Transaction.scan(output, m[:year])) | |
end | |
def initialize(line_items) | |
@line_items = line_items | |
end | |
def each_line_item(&block) | |
line_items.each(&block) | |
end | |
end | |
def main(args = ARGV) | |
unless system('command -v pdftotext >/dev/null 2>&1') | |
fatal "error: pdftotext not found!" | |
end | |
outfile = STDOUT | |
options = OptionParser.new do |opts| | |
opts.banner = "Usage: #{$0} [options] FILE..." | |
opts.on('-o', '--output=FILE', 'Output to file') do |path| | |
outfile = File.open(path, 'w') | |
end | |
opts.on('-h', '--help', 'Show this message') do | |
puts opts | |
exit | |
end | |
end | |
options.parse!(args) | |
if ARGV.empty? | |
fatal "error: no files specified" | |
exit 1 | |
end | |
csv = CSV.new( | |
outfile, headers: %w[Date Description Amount], write_headers: true, | |
) | |
ARGV.each do |file| | |
Statement.parse(file).each_line_item do |line_item| | |
next if line_item.rewards? | |
csv << [ | |
line_item.date, line_item.description, line_item.amount | |
] | |
end | |
end | |
end | |
if $0 == __FILE__ | |
main | |
end |
@ivy, thanks for making this script :), I have an Amazon chase account where the due date format was a little different (I wonder if @saveriob was running into the same problem). All I had to do was change the DUE_DATE_PATTERN regex to
DUE_DATE_PATTERN = %r{
Payment\s+Due\s+Date**:**
\s+
(?\d{2})**/(?\d{2})**/(?\d{2})
}x
The due date format for me was: "Payment Due Date: 08/27/19"
Here's a link to the regex with a test string: https://rubular.com/r/m5jpYzX1EFKd6u
Thanks again and a Happy New Year!
@ivy, thanks for making this script :), I have an Amazon chase account where the due date format was a little different (I wonder if @saveriob was running into the same problem). All I had to do was change the DUE_DATE_PATTERN regex to
DUE_DATE_PATTERN = %r{
Payment\s+Due\s+Date**:**
\s+
(?\d{2})/(?\d{2})/(?\d{2})
}xThe due date format for me was: "Payment Due Date: 08/27/19"
Here's a link to the regex with a test string: https://rubular.com/r/m5jpYzX1EFKd6uThanks again and a Happy New Year!
My due date was also a little different.
DUE_DATE_PATTERN = %r{ Payment\s+Due\s+Date:\s+ (?<month>\d{2})/(?<day>\d{2})/(?<year>\d{2}) }x
This pattern works for me.
I'm so glad you're all getting some use out of this script. It's honestly kind of amazing to see that years later people are still finding it useful!
So from what I understand, the change in the regex is actually just the very small of a colon after "Payment Due Date". I'll update the Gist to reflect that. 😄
Yup, colon seems to be the key here. It isn't visible in the PDF but it does show up in the pdftotext parsed output.
I took this out for a spin today and was getting this error:
./chasepdf2csv.ruby:117:in `match': invalid byte sequence in UTF-8 (ArgumentError)
I'm not sure exactly what was triggering it... the pdftotext output does include some "special characters" like the registered trademark circle-R symbol.
I've worked around this by changing the pdftotext invocation so that it just replaces any high-ascii character value with a space. Like so:
output = `pdftotext -raw #{path} - | LC_ALL=C sed $'s/[\x80-\xFF]/ /g'
(This is on macOS, so it's using BSD sed ... the relevant sed invocation may look different on other OS.)
I'm not saying this is a great solution :-) just an FYI about what I encountered.
Also there's a quirk in the use of due date to determine the transaction date; for a December statement that has a due date in January of the next year, this will put the wrong year on the transactions. Not sure if there's a neat way to solve that though.
Thanks for the script, it helped me search through a big old pile of old statements!
Anyone find a script like this for chase bank statements? also, is there a READ.ME to understand how to use the script?
I was about to use this script, but Chase already has a feature where you can download transactions directly into other formats without the need to use a script.
- On your accounts overview, scroll down to the Account activity section.
- Underneath the section header, click the Download account activity icon all the way to the right
- Select the filters you want to apply:
- Account: Which Chase account you want. Seems to only let you select a single account at a time.
- File type: Which file extension you want. Supports CSV, QFX, QIF, and QBO.
- Activity: Which transactions you want to export. Defaults to the filters set before the Download account activity icon was clicked. Can download all transaction, filter by statement, or specify a custom date range.
Chase also has a standard API (for support in things like Plaid or Mint) and it may be worth registering for a developer account to create an API-based utility so that others don't need to have Ruby installed to take advantage of this utility.
FYI: The Account Activity download seems to use https://secure07b.chase.com/svc/rr/accounts/secure/v2/account/activity/card/download as the URL and you can probably use Postman to figure how to send the right payload to do all the things you want.
@TheLandolorien While this is a nice idea, unfortunately that download doesn’t include the order numbers which is essential data for parsing these transactions
So on a whim I asked ChatGPT to convert this to Python… and it did and it works! Here it is linked if anyone wants to use it:
https://gist.github.com/Enquier/241401089eccd257b8e6f0002b962e8f
I was about to use this script, but Chase already has a feature where you can download transactions directly into other formats without the need to use a script.
On your accounts overview, scroll down to the Account activity section.
Underneath the section header, click the Download account activity icon all the way to the right
Select the filters you want to apply:
- Account: Which Chase account you want. Seems to only let you select a single account at a time.
- File type: Which file extension you want. Supports CSV, QFX, QIF, and QBO.
- Activity: Which transactions you want to export. Defaults to the filters set before the Download account activity icon was clicked. Can download all transaction, filter by statement, or specify a custom date range.
Chase also has a standard API (for support in things like Plaid or Mint) and it may be worth registering for a developer account to create an API-based utility so that others don't need to have Ruby installed to take advantage of this utility.
FYI: The Account Activity download seems to use https://secure07b.chase.com/svc/rr/accounts/secure/v2/account/activity/card/download as the URL and you can probably use Postman to figure how to send the right payload to do all the things you want.
Unfortunately, this method only allows up to 24 months or ~1500 transactions max. If you need 7 years of transactions, which Chase keeps, they tell you to download the PDF statements.
@saveriob Sorry to hear you're having trouble! I actually haven't used this script in a while. I'd definitely recommend using pry and dropping in some
binding.pry
lines if you're trying to figure out exactly what's going on. Also, feel free to paste whatever errors you're running into (minus any personally identifying information 😄).