@henrik
Created March 3, 2012 17:07
OCR on OS X with tesseract

Install ImageMagick for image conversion:

brew install imagemagick

Install tesseract for OCR:

brew install tesseract --all-languages

Or install without --all-languages and add individual languages manually as needed.
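
To find out which languages still need installing, the output of tesseract --list-langs can be checked. A minimal sketch: has_lang is a made-up helper name, not part of tesseract, and it assumes each language code is printed on a line of its own.

```shell
# has_lang is a hypothetical helper, not part of tesseract.
# It reads a language listing on stdin and succeeds if the given code
# appears on a line by itself (fixed-string, whole-line match).
has_lang() {
  grep -qxF -- "$1"
}

# Usage (requires tesseract on the PATH; some versions print the list
# to stderr, hence the 2>&1):
# tesseract --list-langs 2>&1 | has_lang swe || echo "swe is not installed"
```

The whole-line match means a header line such as "List of available languages (3):" never counts as a language.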

Make sure the input image is a grayscale .tif and fairly large. ~500x150 was too small, while ~2000x500 worked very well.

convert input.png -resize 400% -type Grayscale input.tif
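
The 400% is what takes a ~500 pixel wide image to ~2000 pixels. For other sizes the percentage can be computed from the current width. A rough sketch: resize_percent is a hypothetical helper, and the 2000 pixel target is only the width that worked in the case above, not a general rule.

```shell
# resize_percent is a hypothetical helper, not part of ImageMagick.
# Arguments: current width, target width (both in pixels).
# Prints the -resize percentage, rounded up to a whole number.
resize_percent() {
  echo $(( ($2 * 100 + $1 - 1) / $1 ))
}

# Usage (requires ImageMagick; identify prints the pixel width of input.png):
# width=$(identify -format '%w' input.png)
# convert input.png -resize "$(resize_percent "$width" 2000)%" -type Grayscale input.tif
```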

Run OCR on it. The default language is English. Language codes are three characters long (see man tesseract).

tesseract -l eng input.tif output

This creates output.txt.
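
The two steps can be wrapped in one function when there are several images to process. This is only a sketch: ocr_base and ocr_image are made-up names, the 400% factor is taken from the example above, and the language defaults to eng when none is given.

```shell
# ocr_base is a hypothetical helper: print the file name without its
# last extension, e.g. scan.png -> scan.
ocr_base() {
  printf '%s\n' "${1%.*}"
}

# ocr_image is also hypothetical: convert one image to a large grayscale
# .tif, then OCR it. Requires ImageMagick and tesseract on the PATH.
# Arguments: image file, optional language code (defaults to eng).
ocr_image() {
  base=$(ocr_base "$1")
  convert "$1" -resize 400% -type Grayscale "$base.tif" &&
    tesseract -l "${2:-eng}" "$base.tif" "$base"
}

# Usage:
# ocr_image scan.png        # writes scan.tif and scan.txt
# ocr_image scan.png swe    # same, with Swedish
```

tesseract appends .txt to the output base itself, which is why only the base name is passed.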

@bejvisek

bejvisek commented May 22, 2019

Hi, I installed tesseract 4.0.0 smoothly with
brew install tesseract
but only these languages are available:
$ tesseract --list-langs
List of available languages (3):
eng
osd
snum
Options like the one mentioned above, e.g. --with-all-languages, no longer work :-/

Is there a way to install selected language(s)? Thanks!

@bejvisek

Answering my own question:
brew install tesseract-lang
It installs most of the languages, but not all listed here: link
I need to install the "equ" language and still don't know how.

@abdennour

thanks @bejvisek

@JamesAsuraA93

macOS users must use "brew install tesseract-lang" instead of "brew install tesseract --all-languages".
