
This document has moved!

It's now here, in The Programmer's Compendium. The content is the same as before, but being part of the compendium means that it's actively maintained.

dannguyen /
Last active September 10, 2024 19:41
Using Python 3.x and Google Cloud Vision API to OCR scanned documents to extract structured data

Using Python 3 + Google Cloud Vision API's OCR to extract text from photos and scanned documents

Just a quickie test in Python 3 (using Requests) to see if Google Cloud Vision can be used to effectively OCR a scanned data table and preserve its structure, in the way that products such as ABBYY FineReader can OCR an image and provide Excel-ready output.
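A minimal sketch of the kind of request such a test sends, assuming you authenticate to the Cloud Vision REST endpoint with an API key in the query string. This is not the gist's own code. The gist used Requests; this version swaps in the standard library's `urllib.request` so it has no third-party dependencies. The helper names (`build_annotate_body`, `extract_text`, `ocr_image`) are hypothetical.

```python
import base64
import json
import urllib.request

# Cloud Vision REST endpoint; passing an API key as ?key= is an assumption
# about how access has been set up.
ANNOTATE_URL = "https://vision.googleapis.com/v1/images:annotate?key={key}"


def build_annotate_body(image_bytes):
    """Build an images:annotate request asking for TEXT_DETECTION on one image."""
    return {
        "requests": [
            {
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [{"type": "TEXT_DETECTION"}],
            }
        ]
    }


def extract_text(response_json):
    """Return the full detected text of the first image, or "" if none was found.

    The first entry of textAnnotations holds the text for the whole image.
    """
    responses = response_json.get("responses", [])
    if not responses:
        return ""
    annotations = responses[0].get("textAnnotations", [])
    return annotations[0].get("description", "") if annotations else ""


def ocr_image(path, api_key):
    """Send one local image file to Cloud Vision and return its detected text."""
    with open(path, "rb") as f:
        body = json.dumps(build_annotate_body(f.read())).encode("utf-8")
    req = urllib.request.Request(
        ANNOTATE_URL.format(key=api_key),
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return extract_text(json.load(resp))
```

Keeping payload construction and response parsing separate from the network call makes both easy to check without an API key.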

The short answer: No. While Cloud Vision provides bounding polygon coordinates in its output, it doesn't provide them at the word or region level, which is what you would need to calculate the data delimiters.

On the other hand, the OCR quality is pretty good if you just need to identify text anywhere in an image, without regard to its physical coordinates. I've included two examples:

1. A low-resolution photo of road signs

import time

class PageCategoryFilter(object):
    def __init__(self, config):
        self.mode = config["mode"]
        self.categories = config["categories"]

    def filter(self, bid_request):
        if self.mode == "whitelist":
            ...  # the gist preview is truncated here
jonhoo /
Last active July 19, 2021 10:49
Distributed RWMutex in Go
drkarl / gist:739a864b3275e901d317
Last active October 30, 2024 19:38
Ask HN: Best Linux server backup system?

Linux Backup Solutions

I've been looking for the best Linux backup system, and also reading lots of HN comments.

Instead of listing the pros and cons of every backup system, I'll just list some deal-breakers that would disqualify them.

I would also like you, the HN community, to add more deal-breakers for these or other backup systems if you know of any. And if you have data that disproves any of the deal-breakers listed here (benchmarks, or information that something was true for older releases but is fixed in newer ones), please share it so that I can update this list accordingly.

  • It has a lot of management overhead, which is a problem if you don't have time for a full-time backup administrator.
arkadijs /
Last active October 16, 2018 19:02
Registrator / SkyDNS for CoreOS / Deis cluster

Registrator and SkyDNS

We use progrium/registrator and yaronr/skydns (SkyDNS2) to publish information about Docker containers to DNS via A and SRV records. All nodes run skydns and registrator, and the first three nodes are inserted as NS records into Route53 DNS for … Note: registrator v4 must be used until registrator/124 is resolved.

$ host -t srv
Using domain server:
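Once containers are published as SRV records, a client can read them back from a lookup like the one above. Below is a minimal sketch that parses `host -t srv` output lines of the form `<name> has SRV record <priority> <weight> <port> <target>`; the service and host names in any input are hypothetical. Ordering by lowest priority, then highest weight, is a simplification: RFC 2782 actually picks among same-priority records by weighted random selection.

```python
import re
from typing import List, NamedTuple


class SrvRecord(NamedTuple):
    priority: int
    weight: int
    port: int
    target: str


# Matches lines such as "<name> has SRV record 10 100 8080 node1.example.local."
SRV_LINE = re.compile(r"has SRV record (\d+) (\d+) (\d+) (\S+)")


def parse_host_srv(output: str) -> List[SrvRecord]:
    """Parse `host -t srv` output, lowest priority first, then highest weight."""
    records = []
    for line in output.splitlines():
        match = SRV_LINE.search(line)
        if match:
            prio, weight, port, target = match.groups()
            # Drop the trailing dot of the fully qualified target name.
            records.append(SrvRecord(int(prio), int(weight), int(port), target.rstrip(".")))
    return sorted(records, key=lambda r: (r.priority, -r.weight))
```

Lines that are not SRV answers, such as the `Using domain server:` header, are simply skipped.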
Revolucent / BitwiseOptions.swift
Last active September 22, 2018 12:46
BitwiseOptions implementation for Swift
// BitwiseOptions.swift
// Created by Gregory Higley on 11/24/14.
// Copyright (c) 2014 Prosumma LLC. All rights reserved.
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to deal
// in the Software without restriction, including without limitation the rights
// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
sketchytech / Common Swift String Extensions
Last active February 3, 2021 19:03 — forked from albertbori/Common Swift String Extensions
Added and amended to use pure Swift where possible
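import Foundation

extension String {
    // Original note: worked in Xcode but not in Playgrounds because of a bug with .insert()
    mutating func insertString(string: String, ind: Int) {
        // advance() no longer exists; index(_:offsetBy:limitedBy:) clamps to endIndex the same way.
        let insertIndex = index(startIndex, offsetBy: max(ind, 0), limitedBy: endIndex) ?? endIndex
        insert(contentsOf: string, at: insertIndex)
    }
}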
ideaoforder / godaddy-ssl-howto
Created April 8, 2014 16:30
GoDaddy + Nginx SSL
openssl req -new -newkey rsa:2048 -nodes -keyout yourdomain.key -out yourdomain.csr
# Be sure to remember to chain them!
cat gd_bundle-g2-g1.crt >> yourdomain.crt
# Move 'em
sudo mv yourdomain.crt /etc/ssl/certs/yourdomain.crt
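The `cat ... >>` step only produces a valid chain if `yourdomain.crt` ends with a newline; otherwise `-----END CERTIFICATE-----` and the bundle's `-----BEGIN CERTIFICATE-----` land on the same line and the server may refuse to load the file. A minimal sketch of the same chaining step in Python, with hypothetical function names, that keeps the server certificate first (the order nginx expects) and always separates the two files with a newline:

```python
from pathlib import Path


def chain_certificates(leaf_pem: str, bundle_pem: str) -> str:
    """Return the server certificate followed by the intermediate bundle.

    Mirrors `cat gd_bundle-g2-g1.crt >> yourdomain.crt`, but guarantees a
    newline between the two PEM files.
    """
    if "BEGIN CERTIFICATE" not in leaf_pem:
        raise ValueError("leaf_pem does not look like a PEM certificate")
    return leaf_pem.rstrip("\n") + "\n" + bundle_pem.lstrip("\n")


def write_chain(leaf_path, bundle_path, out_path):
    """Write the chained certificate file that nginx's ssl_certificate points at."""
    chained = chain_certificates(Path(leaf_path).read_text(), Path(bundle_path).read_text())
    Path(out_path).write_text(chained)
```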
Nagyman /
Last active November 20, 2024 01:08
Workflows in Django

Workflows (States) in Django

I'm going to cover a simple but effective utility for managing state and transitions (aka workflow). We often need to store the state (status) of a model, and the model should only be in one state at a time.
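To make the idea of "one state at a time, with controlled transitions" concrete, here is a minimal sketch in plain Python, not Django and not the utility this gist goes on to describe. It assumes a hypothetical linear publishing workflow (Draft, Approved, Published, Expired, Deleted); in a Django model the `state` attribute would typically be a `CharField` with `choices`.

```python
from enum import Enum


class State(Enum):
    DRAFT = "draft"
    APPROVED = "approved"
    PUBLISHED = "published"
    EXPIRED = "expired"
    DELETED = "deleted"


# Hypothetical allowed transitions: each state lists the states it may move to.
TRANSITIONS = {
    State.DRAFT: {State.APPROVED},
    State.APPROVED: {State.PUBLISHED},
    State.PUBLISHED: {State.EXPIRED},
    State.EXPIRED: {State.DELETED},
    State.DELETED: set(),
}


class InvalidTransition(Exception):
    pass


class Document:
    """Plain-Python stand-in for a model with a single `state` field."""

    def __init__(self):
        self.state = State.DRAFT

    def transition(self, new_state):
        # Reject any move not listed in TRANSITIONS; the state is left unchanged.
        if new_state not in TRANSITIONS[self.state]:
            raise InvalidTransition(f"{self.state.value} -> {new_state.value} is not allowed")
        self.state = new_state
```

Keeping the transition table as data rather than scattered `if` checks means the allowed workflow can be read (and changed) in one place.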

Common Software Uses

  • Publishing (Draft->Approved->Published->Expired->Deleted)