Jake Vanderplas jakevdp

@jakevdp
jakevdp / AR_crash.py
Created June 13, 2011 22:03
ARPACK memory error
import numpy as np
from scipy.sparse.linalg import eigs
N = 6
k = 2
# with this random seed, I get a memory error on the third iteration below
np.random.seed(2301)
A = np.random.random((N,N))
jakevdp / README
Created September 29, 2011 14:04
test code & dataset for scikit-learn issue #365
Code demonstrating the problem seen in issue #365.
To run the example:
    tar -zxvf data.tgz
    python test.py
jakevdp / banded_tools.py
Created December 23, 2011 02:26
Benchmarks for eigenvalue decomposition
from time import time
import numpy as np
from scipy.sparse import spdiags, issparse, dia_matrix
from scipy.sparse.linalg import factorized
from scipy import linalg as splinalg
class BandedMatrix(object):
    def __init__(self, data, lu=None):
        if issparse(data):
            if lu:
jakevdp / README.rst
Created December 29, 2011 15:08
GMM BIC/AIC test

This includes a test of the new GMM routines in https://github.com/bthirion/scikit-learn/tree/gmm-fixes

By changing the line

GMM = mixture.GMM

at the top of the file, we can plot the BIC and AIC for each variant of GMM. Standard GMM works beautifully: it settles on 3 components, which describe the data well. DPGMM and VBGMM produce some unexpected results.
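
For reference, the two criteria being plotted can be written out directly. The sketch below is not taken from the gist's test file; it assumes a full-covariance Gaussian mixture and the standard definitions AIC = 2p - 2 ln L and BIC = p ln n - 2 ln L, where p is the number of free parameters, n the number of samples, and L the maximized likelihood. The function names are hypothetical.

```python
import math


def gmm_n_parameters(n_components, n_features):
    """Free parameters of a full-covariance Gaussian mixture (assumed model)."""
    means = n_components * n_features
    covariances = n_components * n_features * (n_features + 1) // 2
    weights = n_components - 1  # weights sum to 1, so one is determined
    return means + covariances + weights


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2p - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_samples):
    """Bayesian information criterion: p ln n - 2 ln L."""
    return n_params * math.log(n_samples) - 2 * log_likelihood
```

Lower is better for both criteria. BIC penalizes extra components more heavily than AIC once there are 8 or more samples (ln n > 2), which is why it tends to select fewer components.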

jakevdp / README.rst
Created January 5, 2012 16:30
General Distance Metrics for BallTree

This is the outline of a framework that will allow general distance metrics to be incorporated into the scikit-learn BallTree. The idea is that we need a fast way to compute the distance between two points under a given metric. In the basic framework here, this involves creating an object which exposes C pointers to a function and a parameter structure, so that the distance function can be called either from Python or directly from Cython with no Python overhead.
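
As a rough pure-Python analogue of that design (the gist itself works with C pointers from Cython, which this sketch does not attempt), a metric can be modeled as an object that bundles a distance function with a parameter structure. All names below (`MetricParams`, `DistMetric`, `minkowski`) are hypothetical and are not the gist's or scikit-learn's API.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class MetricParams:
    """Parameter structure passed to the distance function (hypothetical)."""
    p: float = 2.0  # Minkowski exponent: p=2 is Euclidean, p=1 is Manhattan


def minkowski(x: Sequence[float], y: Sequence[float], params: MetricParams) -> float:
    """Minkowski distance between two points of equal dimension."""
    total = sum(abs(a - b) ** params.p for a, b in zip(x, y))
    return total ** (1.0 / params.p)


@dataclass
class DistMetric:
    """Bundles a distance function with the parameters it needs."""
    func: Callable[[Sequence[float], Sequence[float], MetricParams], float]
    params: MetricParams

    def __call__(self, x, y):
        # Callers never see the parameters; they only ask for a distance.
        return self.func(x, y, self.params)


euclidean = DistMetric(minkowski, MetricParams(p=2.0))
manhattan = DistMetric(minkowski, MetricParams(p=1.0))
```

A tree that accepts a `DistMetric` only ever calls `metric(x, y)`, so new metrics can be added without touching the search code.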

jakevdp / Makefile
Created January 18, 2012 17:19
Example of sphinx image copy
SPHINXBUILD = sphinx-build
BUILDDIR = _build
SPHINXOPTS = -d $(BUILDDIR)/doctrees .
all: html

html:
	$(SPHINXBUILD) -b html $(SPHINXOPTS) $(BUILDDIR)/html
	@echo
	@echo "Build finished. The HTML pages are in $(BUILDDIR)/html."
jakevdp / kneighbors_test.py
Created January 23, 2012 23:51
Showing memory error in BallTree
import warnings
from sklearn import datasets
from sklearn.neighbors import NearestNeighbors
import numpy as np
n_points = 1000
n_neighbors = 10
out_dim = 2
n_trials = 100
jakevdp / sklearn_doc.py
Created September 30, 2012 19:56
Scikit-learn Documentation Template
"""
This file has an example function, with a documentation string which should
serve as a template for scikit-learn docstrings.
"""
def sklearn_template(X, y, a=1, flag=True, f=None, **kwargs):
    """This is where a short one-line description goes

    This is where a longer, multi-line description goes. It's not
    required, but might be helpful if more information is needed.
jakevdp / basic_animation.py
Created October 6, 2012 00:04
Demo for GIF animations
import numpy as np
from matplotlib import pyplot as plt
from matplotlib import animation
# First set up the figure, the axis, and the plot element we want to animate
fig = plt.figure()
ax = fig.add_subplot(111, xlim=(0, 2), ylim=(-2, 2))
line, = ax.plot([], [], lw=2)
# initialization function: plot the background of each frame
jakevdp / README.md
Last active September 30, 2023 13:25
Numba Ball Tree example

Numba Ball Tree

This is a quick attempt at writing a ball tree for nearest neighbor searches using numba. I've included a pure Python version, and a version with numba jit decorators. Because class support in numba is not yet complete, all the code is factored out to stand-alone functions in the numba version. The resulting code produced by numba is roughly 10 times slower than the Cython ball tree in scikit-learn. My guess is that part of this stems from lack of inlining in numba, while the rest is due to some sort of overhead