JK.Ryan jkryanchou

// NSObject+BlockObservation.h
// Version 1.0
// Andy Matuschak
// [email protected]
// Public domain because I love you. Let me know how you use it.
#import <Cocoa/Cocoa.h>
mrflip / gist:766608
Created January 5, 2011 17:15
Elasticsearch shell config
clintongormley / gist:1088986
Created July 18, 2011 09:19
Create index for partial matching of names in ElasticSearch
# First, create the synonyms file /opt/elasticsearch/name_synonyms.txt
# with the contents:
# rob,bob => robert
curl -XPUT '' -d '
mdbecker / gist:1309633
Created October 24, 2011 17:50
multiprocess && gevent example
from multiprocessing import Pool as MPool
from time import sleep
import datetime
import multiprocessing
import random
def time_request():
    # Call patch_socket(); referencing it without parentheses does nothing.
    from gevent import monkey; monkey.patch_socket()
    from jsonrequester import JsonRequester
alexmacedo /
Created January 3, 2012 00:08
Unix pipeline pattern in python
#! /usr/bin/env python
class Pipeline(object):
    def __init__(self):
        self.source = None
    def __iter__(self):
        return self.generator()
    def generator(self):
duydo / elasticsearch_best_practices.txt
Last active June 20, 2024 09:59
Elasticsearch - Index best practices from Shay Banon
If you want, I can try to help with pointers on how to improve the indexing speed you get. It's quite easy to increase it substantially by following some simple guidelines, for example:
- Use create in the index API (assuming you can).
- Relax the real time aspect from 1 second to something a bit higher (index.engine.robin.refresh_interval).
- Increase the indexing buffer size (indices.memory.index_buffer_size); it defaults to 10%, meaning 10% of the heap.
- Increase the number of dirty operations that trigger automatic flush (so the translog won't get really big, even though it's FS based) by setting index.translog.flush_threshold (defaults to 5000).
- Increase the memory allocated to the elasticsearch node. By default it's 1g.
- Start with a lower replica count (even 0), and then once the bulk loading is done, increase it to the value you want using the update_settings API. This will improve things, as possibly fewer shards will be allocated to each machine.
- Increase the number of machines you have so
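
The replica and refresh-interval tips above can be sketched as two request bodies for the update settings API. This is a minimal Python sketch, not a tested recipe: the helper names and the `30s` value are hypothetical, the refresh setting name is copied from the text above and belongs to the Elasticsearch versions of that era, and whether a given setting can be changed on a live index depends on the version you run.

```python
import json


def bulk_load_settings(refresh_interval="30s"):
    """Settings to apply before a bulk load (hypothetical helper, illustrative values)."""
    return {
        # Start with no replicas so fewer shards are allocated per machine.
        "index.number_of_replicas": 0,
        # Setting name as given in the text above; relaxes the 1 second refresh.
        "index.engine.robin.refresh_interval": refresh_interval,
    }


def after_load_settings(replicas=1):
    """Settings to apply once the bulk load has finished (hypothetical helper)."""
    return {"index.number_of_replicas": replicas}


# Serialize both bodies as they would be sent to the update settings API.
before_body = json.dumps(bulk_load_settings(), sort_keys=True)
after_body = json.dumps(after_load_settings(replicas=1), sort_keys=True)
print(before_body)
print(after_body)
```

The first body would be sent before the bulk load starts and the second once it finishes, restoring the replica count you actually want.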
piscisaureus /
Created August 13, 2012 16:12
Checkout github pull requests locally

Locate the section for your GitHub remote in the .git/config file. It looks like this:

[remote "origin"]
	fetch = +refs/heads/*:refs/remotes/origin/*
	url = [email protected]:joyent/node.git

Now add the line fetch = +refs/pull/*/head:refs/remotes/origin/pr/* to this section. Change the GitHub URL to match your project's URL. It ends up looking like this:
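
A minimal Python sketch of that edit, assuming the remote section is available as a string. The function name `add_pr_refspec` and the `<your-repository-url>` placeholder are hypothetical; only the fetch lines come from the text above. Its printed output is the section with the extra fetch line appended.

```python
def add_pr_refspec(section_text, remote="origin"):
    """Return the remote section with the pull-request fetch line appended.

    Hypothetical helper: it works on the section text only and does not
    touch .git/config itself.
    """
    pr_line = f"\tfetch = +refs/pull/*/head:refs/remotes/{remote}/pr/*"
    lines = section_text.rstrip("\n").split("\n")
    # Skip the append if the line is already present, so the edit is repeatable.
    if pr_line.strip() not in (line.strip() for line in lines):
        lines.append(pr_line)
    return "\n".join(lines) + "\n"


section = (
    '[remote "origin"]\n'
    "\tfetch = +refs/heads/*:refs/remotes/origin/*\n"
    "\turl = <your-repository-url>\n"  # placeholder, replace with your project's URL
)
updated = add_pr_refspec(section)
print(updated, end="")
```

With this refspec in place, a fetch from the remote maps each pull request's head to a remote-tracking ref under `origin/pr/`, named after the pull request number.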

clintongormley / gist:3888120
Created October 14, 2012 09:44
Upgrading a running elasticsearch cluster

Yesterday I upgraded our running elasticsearch cluster, on a site which serves a few million search requests a day, with zero downtime. I've been asked to describe the process, hence this blog post.

To make it more complicated, the cluster was running elasticsearch version 0.17.8 (released 6 Oct 2011) and I upgraded it to the latest 0.19.10. There have been 21 releases between those two versions, with a lot of functional changes, so I needed to be ready to roll back if necessary.

Our setup:

  • elasticsearch

We run elasticsearch on two biggish boxes: 16 cores plus 32GB of RAM. All indices have 1 replica, so all data is stored on both boxes (about 45GB of data). The primary data for our main indices is also stored in our database. We have a few other indices whose data is stored only in elasticsearch, but these are updated only once daily. Finally, we store our sessions in elasticsearch, but active sessions are cached in memcached.

reyjrar / elasticsearch.yml
Last active December 26, 2024 21:46
ElasticSearch config for a write-heavy cluster
# /etc/elasticsearch/elasticsearch.yml
# Base configuration for a write heavy cluster
# Cluster / Node Basics logng
# Node can have arbitrary attributes we can use for routing
inklesspen /
Last active September 6, 2023 17:11
Fast and flexible unit tests with live Postgres databases and fixtures

(This gist is pretty old; I've written up my current approach to the Pyramid integration in this blog post, but that blog post doesn't go into the transactional management, so you may still find this useful.)

Fast and flexible unit tests with live Postgres databases and fixtures

I've created a Pyramid scaffold which integrates Alembic, a migration tool, with the standard SQLAlchemy scaffold. (It also configures the Mako template system, because I prefer Mako.)

I am also using PostgreSQL for my database. PostgreSQL supports nested transactions. This means I can set up the tables at the beginning of the test session, then start a transaction before each test and roll it back after the test; in turn, this means my tests operate in the same environment I expect to use in production, but they are also fast.
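
The per-test rollback can be illustrated with a small sketch. To stay self-contained it uses Python's built-in `sqlite3` module with an in-memory database rather than PostgreSQL and SQLAlchemy, so it shows only the begin/rollback pattern and not the scaffold described here; the table and function names are hypothetical.

```python
import sqlite3


def make_session_db():
    """Create the schema once, as would happen at the start of a test session."""
    # isolation_level=None disables implicit transactions, so BEGIN and
    # ROLLBACK below are issued explicitly.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    return conn


def run_in_rolled_back_transaction(conn, test_func):
    """Run one test inside a transaction and always roll it back afterwards."""
    conn.execute("BEGIN")
    try:
        return test_func(conn)
    finally:
        conn.execute("ROLLBACK")


def example_test(conn):
    """A stand-in test that writes a row and reports how many rows it sees."""
    conn.execute("INSERT INTO users (name) VALUES (?)", ("alice",))
    return conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]


conn = make_session_db()
rows_seen_by_test = run_in_rolled_back_transaction(conn, example_test)
rows_left_behind = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(rows_seen_by_test, rows_left_behind)
```

The test sees its own row while it runs, and the table is empty again afterwards, so every test starts from the same state without recreating the schema.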

I based my approach on [sontek's blog post](