@rookiepig
rookiepig / CaffeBatchPrediction.cpp
Created May 13, 2017 08:13 — forked from erogol/CaffeBatchPrediction.cpp
Caffe c++ batch based prediction
#include "caffeclassifier.h"

CaffeClassifier::CaffeClassifier(const string& model_file,
                                 const string& trained_file,
                                 const string& mean_file,
                                 const string& label_file,
                                 const bool use_GPU,
                                 const int batch_size) {
    if (use_GPU)
        Caffe::set_mode(Caffe::GPU);
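The classifier above runs prediction in fixed-size batches. The batching idea itself can be sketched in plain Python (the helper name and padding strategy are illustrative, not taken from the gist):

```python
# Hypothetical sketch: split N inputs into fixed-size chunks and pad the
# last chunk so every forward pass sees exactly batch_size items.
def make_batches(items, batch_size):
    batches = []
    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]
        if len(batch) < batch_size:  # pad the final partial batch
            batch = batch + [batch[-1]] * (batch_size - len(batch))
        batches.append(batch)
    return batches

print(make_batches(list(range(5)), 2))
# -> [[0, 1], [2, 3], [4, 4]]
```

Padding with the last element keeps tensor shapes constant; the padded predictions are simply discarded after the forward pass.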
# Try to copy "a" value to "c" while simultaneously adding vector of 1's to a.
# If the copy is started before the first assign_add, the copied value will be inconsistent.
#
# Running it on a MacBook, my "c" ends up with a mix of values between 1 and 6
#
#
# 16.017478 copy 1 (0) starting
# 17.006894 write 1 (0) starting
# 28.431654 write 1 ending (11.4247 sec)
# 29.436692 write 1 (1) starting
@rookiepig
rookiepig / benchmark_grpc_recv.py
Created March 7, 2017 08:21 — forked from yaroslavvb/benchmark_grpc_recv.py
Benchmark slowness of passing Tensors around between TF workers
# Dependencies:
# portpicker (pip install portpicker)
# tcmalloc4 (sudo apt-get install google-perftools)
# TF 0.12
#
#
# Benchmarks on Xeon E5-2630 v3 @ 2.40GHz
#
# export LD_PRELOAD=/usr/lib/libtcmalloc.so.4
# python benchmark_grpc_recv.py --data_mb=128
@rookiepig
rookiepig / sharded_ps_benchmark.py
Created March 7, 2017 08:21 — forked from yaroslavvb/sharded_ps_benchmark.py
Example of local cluster with multiple workers/training loops sharded parameter server
#!/usr/bin/env python
# Benchmark transferring data, part of troubleshooting https://github.com/tensorflow/tensorflow/issues/6116
#
# Take `a` independent workers communicating with `b` parameter shards.
# Each worker tries to add to variables stored on the parameter shards as
# fast as possible.
#
# macbook
# ps=1: 1.6 GB/s
# ps=2: 2.6 GB/s
@rookiepig
rookiepig / session-run-benchmark.py
Created March 7, 2017 03:59 — forked from yaroslavvb/session-run-benchmark.py
Example of benchmarking session.run call
# Example of profiling session.run overhead
# for python profiling
# python -m cProfile -o session-run-benchmark-feed.prof session-run-benchmark.py feed_dict
# python -m cProfile -o session-run-benchmark-variable.prof session-run-benchmark.py variable
# pip install snakeviz
# snakeviz session-run-benchmark-feed.prof
# snakeviz session-run-benchmark-variable.prof
#
#
# feed_dict: 147 usec; no feed_dict: 71 usec
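The usec-per-call figures come from dividing total wall time by the call count. The same arithmetic with stdlib `timeit`, measuring a no-op callable here instead of `session.run` (illustrative only):

```python
import timeit

# Average per-call overhead in microseconds: total time / number of calls.
def per_call_usec(fn, calls=100000):
    total = timeit.timeit(fn, number=calls)
    return total / calls * 1e6

overhead = per_call_usec(lambda: None)
print("%.2f usec per call" % overhead)
```

Swapping the lambda for a `session.run` closure reproduces the feed_dict vs. variable comparison above.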
@rookiepig
rookiepig / gist:b73baa07e8f4cdff03730042ba4ab7cf
Created March 7, 2017 03:52 — forked from yaroslavvb/gist:b73ff35424dd7ab762234620cf583aac
Example of restricting part of graph to run on single core
# try running cpu intensive test on two devices

import tensorflow as tf
import time

def matmul_op():
    """Multiply two matrices together"""
    n = 2000
    a = tf.ones((n, n), dtype=tf.float32)
    return tf.matmul(a, a)
@rookiepig
rookiepig / cpu_device_test.py
Created March 7, 2017 02:41 — forked from yaroslavvb/cpu_device_test.py
Run matmul on different CPU devices, plot timeline
import tensorflow as tf
from tensorflow.python.client import timeline

n = 1024
with tf.device("cpu:0"):
    a1 = tf.ones((n, n))
    a2 = tf.ones((n, n))
with tf.device("cpu:1"):
    a3 = tf.matmul(a1, a2)
with tf.device("cpu:2"):
    a4 = tf.matmul(a3, a3)
@rookiepig
rookiepig / vimdiff.md
Created April 5, 2016 02:58 — forked from mattratleph/vimdiff.md
vimdiff cheat sheet

# vimdiff cheat sheet

## git mergetool

In the middle file (future merged file), you can navigate between conflicts with ]c and [c.

Choose which version you want to keep with :diffget //2 or :diffget //3 (//2 and //3 are unique identifiers for the target/master copy and the merge/branch copy, respectively).

:diffupdate (to remove leftover spacing issues)

:only (once you’re done reviewing all conflicts, this shows only the middle/merged file)