//
// NSObject+BlockObservation.h
// Version 1.0
//
// Andy Matuschak
// [email protected]
// Public domain because I love you. Let me know how you use it.
//
#import <Cocoa/Cocoa.h>
# First, create the synonyms file /opt/elasticsearch/name_synonyms.txt
# with the contents:
#
# rob,bob => robert
#
## CREATE THE INDEX WITH ANALYZERS AND MAPPINGS
curl -XPUT 'http://127.0.0.1:9200/test/?pretty=1' -d '
{
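The request body is cut off above. A rough sketch of what the complete request might contain, assuming a synonym token filter (here called name_synonyms) that reads the file created above and a custom analyzer applied to a hypothetical name field (the type, field, and analyzer names are placeholders, not from the original):

curl -XPUT 'http://127.0.0.1:9200/test/?pretty=1' -d '
{
   "settings" : {
      "analysis" : {
         "filter" : {
            "name_synonyms" : {
               "type" : "synonym",
               "synonyms_path" : "/opt/elasticsearch/name_synonyms.txt"
            }
         },
         "analyzer" : {
            "name_analyzer" : {
               "type" : "custom",
               "tokenizer" : "standard",
               "filter" : [ "lowercase", "name_synonyms" ]
            }
         }
      }
   },
   "mappings" : {
      "person" : {
         "properties" : {
            "name" : { "type" : "string", "analyzer" : "name_analyzer" }
         }
      }
   }
}
'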
from multiprocessing import Pool as MPool
from time import sleep
import datetime
import multiprocessing
import random

def time_request():
    from gevent import monkey; monkey.patch_socket()  # patch_socket must be called, not just referenced
    from jsonrequester import JsonRequester
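The gist cuts off before time_request does anything with JsonRequester. Below is a minimal, hypothetical driver showing how the imported Pool could fan the call out across processes and collect per-request latencies; the wrapper, worker count, and request count are all placeholders, not part of the original.

def timed(_):
    # Wrapper so Pool.map can hand each task an (ignored) index and
    # get back the elapsed time for one call to time_request().
    start = datetime.datetime.now()
    time_request()
    return (datetime.datetime.now() - start).total_seconds()

def run_benchmark(workers=8, requests=100):
    # Fan the timing function out over a pool of worker processes
    # and report simple latency statistics.
    pool = MPool(processes=workers)
    try:
        durations = pool.map(timed, range(requests))
    finally:
        pool.close()
        pool.join()
    print('avg %.3fs, max %.3fs over %d requests' % (
        sum(durations) / len(durations), max(durations), len(durations)))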
#!/usr/bin/env python

class Pipeline(object):
    def __init__(self):
        self.source = None

    def __iter__(self):
        return self.generator()

    def generator(self):
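        # (The original gist is truncated here; what follows is a guessed,
        # minimal body that simply yields items from the attached source.)
        if self.source is None:
            return
        for item in self.source:
            yield item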
If you want, I can try to help with pointers on how to improve the indexing speed you get. It's quite easy to really increase it by following some simple guidelines, for example:
- Use create in the index API (assuming you can).
- Relax the real time aspect from 1 second to something a bit higher (index.engine.robin.refresh_interval).
- Increase the indexing buffer size (indices.memory.index_buffer_size); it defaults to 10% of the heap.
- Increase the number of dirty operations that trigger an automatic flush (so the translog won't get really big, even though it's FS based) by setting index.translog.flush_threshold (defaults to 5000).
- Increase the memory allocated to the elasticsearch node. By default it's 1g.
- Start with a lower replica count (even 0), and once the bulk loading is done, increase it to the value you want using the update_settings API (see the example request after this list). This will improve things as possibly fewer shards will be allocated to each machine.
- Increase the number of machines you have so that fewer shards are allocated per machine.
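As a concrete illustration of the replica-count tip above, toggling the setting with the update-settings API might look roughly like this (the index name test is a placeholder, and the host/port assume a local node):

# Before the bulk load: drop replicas
curl -XPUT 'http://127.0.0.1:9200/test/_settings' -d '
{
    "index" : { "number_of_replicas" : 0 }
}
'

# After the bulk load: restore the desired replica count
curl -XPUT 'http://127.0.0.1:9200/test/_settings' -d '
{
    "index" : { "number_of_replicas" : 1 }
}
'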
Locate the section for your GitHub remote in the .git/config
file. It looks like this:
[remote "origin"]
	fetch = +refs/heads/*:refs/remotes/origin/*
	url = [email protected]:joyent/node.git
Now add the line fetch = +refs/pull/*/head:refs/remotes/origin/pr/* to this section.
Obviously, change the GitHub URL to match your project's. It ends up looking like this:
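[remote "origin"]
	fetch = +refs/heads/*:refs/remotes/origin/*
	url = [email protected]:joyent/node.git
	fetch = +refs/pull/*/head:refs/remotes/origin/pr/*
After a git fetch origin, each pull request is available as a read-only ref; for example, git checkout -b pr-42 origin/pr/42 checks out pull request 42 locally (the number is just an illustration).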
Yesterday I upgraded our running elasticsearch cluster on a site that serves a few million search requests a day, with zero downtime. I've been asked to describe the process, hence this blog post.
To make it more complicated, the cluster was running elasticsearch version 0.17.8 (released 6 Oct 2011) and I upgraded it to the latest 0.19.10. There have been 21 releases between those two versions, with a lot of functional changes, so I needed to be ready to roll back if necessary.
We run elasticsearch on two biggish boxes: 16 cores plus 32GB of RAM. All indices have 1 replica, so all data is stored on both boxes (about 45GB of data). The primary data for our main indices is also stored in our database. We have a few other indices whose data is stored only in elasticsearch, but those are updated only once daily. Finally, we store our sessions in elasticsearch, but active sessions are cached in memcached.
##################################################################
# /etc/elasticsearch/elasticsearch.yml
#
# Base configuration for a write-heavy cluster
#
# Cluster / Node Basics
cluster.name: logng
# Nodes can have arbitrary attributes we can use for routing
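# For example (placeholder names and values, not part of the original gist):
#
#   node.rack_id: rack_one
#   cluster.routing.allocation.awareness.attributes: rack_id
#
# would let shard allocation awareness keep primaries and replicas on
# separate racks.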
(This gist is pretty old; I've written up my current approach to the Pyramid integration in this blog post, but that post doesn't go into transaction management, so you may still find this useful.)
I've created a Pyramid scaffold which integrates Alembic, a migration tool, with the standard SQLAlchemy scaffold. (It also configures the Mako template system, because I prefer Mako.)
I am also using PostgreSQL for my database. PostgreSQL supports nested transactions. This means I can set up the tables at the beginning of the test session, then start a transaction before each test and roll it back afterwards; in turn, this means my tests operate in the same environment I expect to use in production, but they are also fast.
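To make that concrete, here is a minimal sketch of the per-test pattern being described, using a plain SQLAlchemy connection and a pytest fixture; the fixture name, session setup, and connection URL are illustrative, not the scaffold's actual code:

import pytest
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

# Placeholder connection URL; the real one comes from the test configuration.
engine = create_engine('postgresql:///myapp_test')
Session = sessionmaker()

@pytest.fixture
def db_session():
    # Open a connection and begin an outer transaction that the test
    # will never commit.
    connection = engine.connect()
    transaction = connection.begin()
    session = Session(bind=connection)
    try:
        yield session
    finally:
        # Roll everything back so each test sees a pristine database.
        session.close()
        transaction.rollback()
        connection.close()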
I based my approach on [sontek's blog post](http://sontek.net/blog/