Skip to content

Instantly share code, notes, and snippets.

@ivy
Last active March 22, 2024 20:47
Show Gist options
  • Save ivy/e865eca249763e2cd212758c305bccf9 to your computer and use it in GitHub Desktop.
Save ivy/e865eca249763e2cd212758c305bccf9 to your computer and use it in GitHub Desktop.
Convert Chase credit card statements from PDF to CSV. Useful for importing old transactions into QuickBooks Online/Self-Employed.
#!/usr/bin/env ruby
# chasepdf2csv -- Convert Chase credit card statements from PDF to CSV. Written
# to easily import older statements into QuickBooks Online/Self-Employed. Chase
# unfortunately only offers statements up to 6 months in the past, making it a
# huge chore to synchronize past transactions.
#
# How to Use
# ----------
# This script requires Ruby >2.0.0 and pdftotext. Copy this script somewhere and
# make it executable. Run it like any other command.
#
# ISC License
# -----------
# Copyright (c) 2018-2020 Ivy Evans <[email protected]>
#
# Permission to use, copy, modify, and/or distribute this software for any
# purpose with or without fee is hereby granted, provided that the above
# copyright notice and this permission notice appear in all copies.
#
# THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES WITH
# REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY
# AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY SPECIAL, DIRECT,
# INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM
# LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR
# OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR
# PERFORMANCE OF THIS SOFTWARE.
require 'csv'
require 'optparse'
def error(msg)
STDERR.puts("error: #{msg}")
end
def fatal(msg)
error(msg)
exit 1
end
class Statement
DUE_DATE_PATTERN = %r{
Payment\s+Due\s+Date:?
\s+
(?<month>\d{2})/(?<day>\d{2})/(?<year>\d{2})
}x
class Transaction
# Regex for matching transactions in a Chase credit statement.
#
# Edge Case: Amazon orders
#
# 01/23 AMAZON MKTPLACE PMTS AMZN.COM/BILL WA 12.34\n
# Order Number 123-4567890-1234567\n
#
# Edge Case: Rewards points
#
# 01/23 AMAZON MARKETPLACE AMZN.COM/BILLWA 4.56 7,890
#
LINE_ITEM_PATTERN = %r{
(?<date>\d{2}/\d{2})
\s+
(?<description>.+)
\s+
(?<amount>-?[\d,]+\.\d{2})
(
[ ]
(?<points>[1-9][\d,]+)?
|
\s*
Order\s+Number\s+
(?<order_num>[^\s]+)
)?
}x
def self.scan(output, year)
output.to_enum(:scan, LINE_ITEM_PATTERN).collect {
Transaction.new(Regexp.last_match, year)
}
end
def initialize(data, year)
@date = data[:date]+"/#{year}"
@description = data[:description]
@amount = data[:amount]
@points = data[:points]
@order_num = data[:order_num]
end
attr_reader :date, :amount, :points, :order_num
alias rewards? points
alias order_num? order_num
def description
order_num? ? "#{@description} ##{order_num}" : @description
end
def to_hash
{
date: date,
description: description,
amount: amount,
points: points,
order_num: order_num,
}
end
alias to_h to_hash
end
attr_reader :line_items
def self.parse(path)
output = `pdftotext -raw #{path} -`
unless $?.success?
fatal "pdftotext: failed to parse #{path} (exit code #{$?})"
end
unless m = output.match(DUE_DATE_PATTERN)
fatal "parse error: could not match due date in #{path}"
end
new(Transaction.scan(output, m[:year]))
end
def initialize(line_items)
@line_items = line_items
end
def each_line_item(&block)
line_items.each(&block)
end
end
def main(args = ARGV)
unless system('command -v pdftotext >/dev/null 2>&1')
fatal "error: pdftotext not found!"
end
outfile = STDOUT
options = OptionParser.new do |opts|
opts.banner = "Usage: #{$0} [options] FILE..."
opts.on('-o', '--output=FILE', 'Output to file') do |path|
outfile = File.open(path, 'w')
end
opts.on('-h', '--help', 'Show this message') do
puts opts
exit
end
end
options.parse!(args)
if ARGV.empty?
fatal "error: no files specified"
exit 1
end
csv = CSV.new(
outfile, headers: %w[Date Description Amount], write_headers: true,
)
ARGV.each do |file|
Statement.parse(file).each_line_item do |line_item|
next if line_item.rewards?
csv << [
line_item.date, line_item.description, line_item.amount
]
end
end
end
if $0 == __FILE__
main
end
@Amartya
Copy link

Amartya commented Dec 31, 2019

@ivy, thanks for making this script :), I have an Amazon chase account where the due date format was a little different (I wonder if @saveriob was running into the same problem). All I had to do was change the DUE_DATE_PATTERN regex to
DUE_DATE_PATTERN = %r{
Payment\s+Due\s+Date**:**
\s+
(?\d{2})**/(?\d{2})**/(?\d{2})
}x

The due date format for me was: "Payment Due Date: 08/27/19"
Here's a link to the regex with a test string: https://rubular.com/r/m5jpYzX1EFKd6u

Thanks again and a Happy New Year!

@oceanapplications
Copy link

@ivy, thanks for making this script :), I have an Amazon chase account where the due date format was a little different (I wonder if @saveriob was running into the same problem). All I had to do was change the DUE_DATE_PATTERN regex to
DUE_DATE_PATTERN = %r{
Payment\s+Due\s+Date**:**
\s+
(?\d{2})/(?\d{2})/(?\d{2})
}x

The due date format for me was: "Payment Due Date: 08/27/19"
Here's a link to the regex with a test string: https://rubular.com/r/m5jpYzX1EFKd6u

Thanks again and a Happy New Year!

My due date was also a little different.

DUE_DATE_PATTERN = %r{ Payment\s+Due\s+Date:\s+ (?<month>\d{2})/(?<day>\d{2})/(?<year>\d{2}) }x

This pattern works for me.

@ivy
Copy link
Author

ivy commented Dec 8, 2020

I'm so glad you're all getting some use out of this script. It's honestly kind of amazing to see that years later people are still finding it useful!

So from what I understand, the change in the regex is actually just the very small of a colon after "Payment Due Date". I'll update the Gist to reflect that. 😄

@oceanapplications
Copy link

Yup, colon seems to be the key here. It isn't visible in the PDF but it does show up in the pdftotext parsed output.

@neogeographica
Copy link

I took this out for a spin today and was getting this error:
./chasepdf2csv.ruby:117:in `match': invalid byte sequence in UTF-8 (ArgumentError)

I'm not sure exactly what was triggering it... the pdftotext output does include some "special characters" like the registered trademark circle-R symbol.

I've worked around this by changing the pdftotext invocation so that it just replaces any high-ascii character value with a space. Like so:
output = `pdftotext -raw #{path} - | LC_ALL=C sed $'s/[\x80-\xFF]/ /g'

(This is on macOS, so it's using BSD sed ... the relevant sed invocation may look different on other OS.)

I'm not saying this is a great solution :-) just an FYI about what I encountered.

@neogeographica
Copy link

Also there's a quirk in the use of due date to determine the transaction date; for a December statement that has a due date in January of the next year, this will put the wrong year on the transactions. Not sure if there's a neat way to solve that though.

Thanks for the script, it helped me search through a big old pile of old statements!

@yoshie902a
Copy link

Anyone find a script like this for chase bank statements? also, is there a READ.ME to understand how to use the script?

@TheLandolorien
Copy link

TheLandolorien commented Aug 15, 2021

I was about to use this script, but Chase already has a feature where you can download transactions directly into other formats without the need to use a script.

  1. On your accounts overview, scroll down to the Account activity section.
  2. Underneath the section header, click the Download account activity icon all the way to the right
    image
  3. Select the filters you want to apply:
    • Account: Which Chase account you want. Seems to only let you select a single account at a time.
    • File type: Which file extension you want. Supports CSV, QFX, QIF, and QBO.
    • Activity: Which transactions you want to export. Defaults to the filters set before the Download account activity icon was clicked. Can download all transaction, filter by statement, or specify a custom date range.
      image

Chase also has a standard API (for support in things like Plaid or Mint) and it may be worth registering for a developer account to create an API-based utility so that others don't need to have Ruby installed to take advantage of this utility.

FYI: The Account Activity download seems to use https://secure07b.chase.com/svc/rr/accounts/secure/v2/account/activity/card/download as the URL and you can probably use Postman to figure how to send the right payload to do all the things you want.

@Enquier
Copy link

Enquier commented Feb 25, 2023

@TheLandolorien While this is a nice idea, unfortunately that download doesn’t include the order numbers which is essential data for parsing these transactions

@Enquier
Copy link

Enquier commented Mar 21, 2023

So on a whim I asked ChatGPT to convert this to Python… and it did and it works! Here it is linked if anyone wants to use it:

https://gist.github.com/Enquier/241401089eccd257b8e6f0002b962e8f

@ph47s74x
Copy link

I was about to use this script, but Chase already has a feature where you can download transactions directly into other formats without the need to use a script.

  1. On your accounts overview, scroll down to the Account activity section.

  2. Underneath the section header, click the Download account activity icon all the way to the right
    image

  3. Select the filters you want to apply:

    • Account: Which Chase account you want. Seems to only let you select a single account at a time.
    • File type: Which file extension you want. Supports CSV, QFX, QIF, and QBO.
    • Activity: Which transactions you want to export. Defaults to the filters set before the Download account activity icon was clicked. Can download all transaction, filter by statement, or specify a custom date range.
      image

Chase also has a standard API (for support in things like Plaid or Mint) and it may be worth registering for a developer account to create an API-based utility so that others don't need to have Ruby installed to take advantage of this utility.

FYI: The Account Activity download seems to use https://secure07b.chase.com/svc/rr/accounts/secure/v2/account/activity/card/download as the URL and you can probably use Postman to figure how to send the right payload to do all the things you want.

Unfortunately, this method only allows up to 24 months or ~1500 transactions max. If you need 7 years of transactions, which Chase keeps, they tell you to download the PDF statements.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment