@frobnitzem
frobnitzem / serialized_tensor_sizes.md
Last active June 28, 2024 19:50
Serialized tensor sizes.
# What are the size overheads for serializing tensors?
#
import io
import sys

import numpy as np

# https://huggingface.co/docs/safetensors/index
#from safetensors.torch import save_file
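
One way to start answering the question in this preview is to compare an array's raw payload with the number of bytes a serializer actually writes. The sketch below is an illustration only, not the gist's own code: it uses NumPy's `.npy` format as the serializer, and the helper name `npy_overhead` is hypothetical.

```python
import io

import numpy as np


def npy_overhead(arr):
    """Return (payload_bytes, serialized_bytes) for np.save on `arr`."""
    buf = io.BytesIO()
    # allow_pickle=False keeps the output a plain .npy file (header + raw data).
    np.save(buf, arr, allow_pickle=False)
    return arr.nbytes, buf.getbuffer().nbytes


# Shapes chosen arbitrarily for illustration.
for shape in [(1,), (1000,), (256, 256)]:
    arr = np.zeros(shape, dtype=np.float32)
    payload, total = npy_overhead(arr)
    print(shape, "payload:", payload, "serialized:", total,
          "overhead:", total - payload)
```

For this format the overhead is a fixed-size header, so its relative cost shrinks as the tensor grows. Other formats (such as the safetensors library referenced above) would need their own measurement.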
frobnitzem / developer_conventions.md
Created September 14, 2023 06:16
Sane Conventions for Developers

Sane Conventions for Developers

These conventions are guidelines to help developers stay productive. Although following them does not guarantee sanity, not following these conventions has been known to produce undefined behavior. Incidentally, they also make a good scoreboard for assessing a project's maintainability.

Usability

  • Do not allow your project to grow beyond a few source files focused on a single goal. Symptoms are packages containing code that (while originally doing real work) now also writes container definitions, arranges shell variables, moves output files around, plays strategy games, integrates with the cloud in any way, or attempts to send/read email. If the real work were done already, some other program could be doing all those other things and calling your program when it gets more work.
  • Document your code before you write the code.
    • Design the central activities and data structures first.
  • Include an installation script. Not everyone knows how to use the ant build tool
frobnitzem / RCCL_OFI.md
Last active November 29, 2023 21:30
RCCL AllReduce Performance

David M. Rogers, National Center for Computational Science, Oak Ridge National Laboratory. July 12, 2023.

All-reduce is a core operation in HPC applications such as iterative solvers, which sum residuals across processes after each parameter update. AI/ML training methods fall into this category: they perform a sum-reduction over the derivatives with respect to each ML parameter. Another class of codes, including principal component analysis and electronic structure methods, uses all-reduce to form dot products over large distributed vectors.
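
To make the two use cases concrete, here is a minimal single-process sketch that simulates all-reduce with NumPy. It is not RCCL or NCCL code: the "ranks" are just list entries, and the function names `all_reduce_sum` and `distributed_dot` are hypothetical.

```python
import numpy as np


def all_reduce_sum(per_rank):
    """Simulated all-reduce: every rank ends up with the elementwise sum."""
    total = np.sum(per_rank, axis=0)
    return [total.copy() for _ in per_rank]


def distributed_dot(x_shards, y_shards):
    """Each rank computes a partial dot product; all-reduce sums the scalars."""
    partials = [np.array([np.dot(x, y)]) for x, y in zip(x_shards, y_shards)]
    return all_reduce_sum(partials)[0][0]


rng = np.random.default_rng(0)
n_ranks = 4  # assumed number of simulated ranks

# Use case 1: sum per-rank gradients so every rank holds the same total.
grads = [rng.standard_normal(8) for _ in range(n_ranks)]
reduced = all_reduce_sum(grads)

# Use case 2: dot product of vectors split across ranks.
x = rng.standard_normal(1000)
y = rng.standard_normal(1000)
dot = distributed_dot(np.array_split(x, n_ranks), np.array_split(y, n_ranks))
print(np.isclose(dot, np.dot(x, y)))
```

A real implementation exchanges data between processes instead of summing in one place. The point here is only the contract: each rank contributes a buffer, and each rank receives the same reduced result.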

The NCCL library provided by NVIDIA optimizes the communication graph used by all-reduce to prioritize intra-node communication, where bandwidth is higher. As a consequence, it achieves higher overall bandwidth (Jeaugey, 2019). Unmodified NCCL does not make use of libfabric, and defaults to using TCP sockets on Slingshot-11 (TM) interconnect hardware. However, a plugin has been published by AWS, [(Kim, Kheria, Inozemtsev, 2018-2022)]

frobnitzem / parse_table.py
Created February 3, 2021 16:07
Parse an HTML table into json.
#!/usr/bin/env python3
import json
from html.parser import HTMLParser
# HTML is stupid - these tags don't close:
voids = set([ 'area', 'base', 'br', 'col',
              'command', 'embed', 'hr', 'img',
              'input', 'keygen', 'link', 'meta',
              'param', 'source', 'track', 'wbr' ])
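
The rest of this script is not shown in the preview. As a rough, self-contained sketch of the idea named in its description (and not the gist's actual implementation), the following uses the same standard-library `HTMLParser` to collect table cells and emit JSON. The class name `TableParser`, the function `table_to_json`, and the choice to treat the first row as column names are all assumptions.

```python
import json
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_to_json(html):
    """Treat the first row as column names and return a JSON list of records."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return "[]"
    header, *body = parser.rows
    return json.dumps([dict(zip(header, row)) for row in body])


sample = ("<table><tr><th>name</th><th>cost</th></tr>"
          "<tr><td>apple</td><td>1</td></tr>"
          "<tr><td>banana</td><td>5</td></tr></table>")
print(table_to_json(sample))
```

This sketch ignores nested tables, `colspan`/`rowspan`, and unclosed cells, all of which a robust parser would have to handle.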
frobnitzem / knapsack.js
Last active November 27, 2020 10:58 — forked from danwoods/knapsack.js
Knapsack algorithm in JavaScript
// Solve the knapsack problem using recursive descent.
// This wraps the actual solver below with a nice interface.
// It also handles non-integer cost, but then the complexity
// scales with the precision of the cost!
//
// obj is an object of named [cost, benefit] pairs, e.g.
// { banana: [5, 33],
// apple: [1, 12],
// kiwi: [1, 7]
// }
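
The solver itself is cut off in this preview. For readers who want to see the shape of a recursive knapsack solver over named `[cost, benefit]` pairs, here is a small Python sketch. It is not a port of the JavaScript gist: the function name `knapsack`, the memoization via `lru_cache`, and the budget of 6 are assumptions made for illustration.

```python
from functools import lru_cache


def knapsack(items, budget):
    """Return (best_benefit, chosen_names) for a dict of name -> [cost, benefit]."""
    names = list(items)

    @lru_cache(maxsize=None)
    def best(i, remaining):
        # Base case: no items left to consider.
        if i == len(names):
            return 0, ()
        # Option 1: skip item i.
        skip = best(i + 1, remaining)
        # Option 2: take item i if it fits in the remaining budget.
        cost, benefit = items[names[i]]
        if cost <= remaining:
            sub_value, sub_names = best(i + 1, remaining - cost)
            take = (sub_value + benefit, (names[i],) + sub_names)
            if take[0] > skip[0]:
                return take
        return skip

    return best(0, budget)


items = {"banana": [5, 33], "apple": [1, 12], "kiwi": [1, 7]}
value, chosen = knapsack(items, 6)
print(value, chosen)  # 45 ('banana', 'apple')
```

Because the cache is keyed on the remaining budget, non-integer costs still work but produce many more distinct subproblems, which is consistent with the comment above that complexity scales with the precision of the cost.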