Derrick Roberts robertsd

@veekaybee
veekaybee / normcore-llm.md
Last active January 9, 2025 15:56
Normcore LLM Reads

Anti-hype LLM reading list

Goals: Add links that are reasonable and good explanations of how stuff works. No hype and no vendor content if possible. Practical first-hand accounts of models in prod eagerly sought.

Foundational Concepts

Pre-Transformer Models

Isolation forests versus decision trees

Isolation forest paper

  • Isolated points should have shorter path lengths, i.e. sit closer to the root of the tree
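
The note above can be illustrated with a small sketch. This is not the implementation from the isolation forest paper: it is a simplified one-dimensional version with no subsampling, and the function names (`isolation_depth`, `mean_depth`) and the sample values are made up for illustration. It repeatedly splits the data at a random threshold and counts how many splits it takes before a given point stands alone.

```python
import random


def isolation_depth(point, data, rng, depth=0, max_depth=50):
    """Count random splits needed to isolate `point` within 1-D `data`."""
    if len(data) <= 1 or depth >= max_depth:
        return depth
    lo, hi = min(data), max(data)
    if lo == hi:
        # All remaining values are identical, so no split can separate them.
        return depth
    split = rng.uniform(lo, hi)
    # Keep only the values that fall on the same side of the split as the point.
    same_side = [x for x in data if (x < split) == (point < split)]
    return isolation_depth(point, same_side, rng, depth + 1, max_depth)


def mean_depth(point, data, n_trees=200, seed=0):
    """Average isolation depth over many random trees (seeded for repeatability)."""
    rng = random.Random(seed)
    total = sum(isolation_depth(point, data, rng) for _ in range(n_trees))
    return total / n_trees


# Hypothetical sample: a tight cluster around 10 plus one far-away value.
data = [10.0, 10.1, 10.2, 9.9, 9.8, 10.05, 9.95, 10.15, 25.0]
outlier_depth = mean_depth(25.0, data)
inlier_depth = mean_depth(10.0, data)
print(f"outlier: {outlier_depth:.2f} splits, inlier: {inlier_depth:.2f} splits")
```

In this sample the far-away value is usually cut off by the very first split, while a point inside the cluster always needs at least two, which is the sense in which isolated points sit closer to the root.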
@veekaybee
veekaybee / chatgpt.md
Last active December 24, 2024 20:23
Everything I understand about chatgpt

ChatGPT Resources

Context

ChatGPT appeared like an explosion on all my social media timelines in early December 2022. While I keep up with machine learning as an industry, I wasn't focused so much on this particular corner, and all the screenshots seemed like they came out of nowhere. What was this model? How did the chat prompting work? What was the context of OpenAI doing this work and collecting my prompts for training data?

I decided to do a quick investigation. Here's all the information I've found so far. I'm aggregating and synthesizing it as I go, so it's currently changing pretty frequently.

Model Architecture

@aglove2189
aglove2189 / BuildingDataProducts.md
Last active November 3, 2023 16:24
How to Build Resilient Data Products

How to Build Resilient Data Products

Every aspect of your product should contribute to one of these 5 principles:

  1. Small
  2. Fast
  3. Reproducible
  4. Transparent
  5. Frictionless

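# ========== (c) JP Hwang 27/7/20 ==========
# Third-party imports first. On Dash 2.0+ the two component packages are
# also available as `from dash import html, dcc`.
import dash
import dash_html_components as html
import dash_core_components as dcc
from dash.dependencies import Input, Output

# Local helper module that builds the figure
from shared_funcs import load_fig

# Load the figure once at startup
fig = load_fig()

# External stylesheet applied to the app
external_stylesheets = ['https://codepen.io/chriddyp/pen/bWLwgP.css']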
@tdunning
tdunning / td-in-r.r
Last active November 26, 2018 20:05
A simplified implementation of a merging t-digest in R with some visualization of the results
### x is either a vector of numbers or a data frame with sums and weights. digest is a data frame.
merge = function(x, digest, compression=100) {
  ## Force the digest to be a data.frame, possibly empty
  if (!is.data.frame(digest) && is.na(digest)) {
    digest = data.frame(sum=c(), weight=c())
  }
  ## Coerce the incoming data likewise: a plain vector of points gets a default weight of 1
  if (!is.data.frame(x)) {
    x = data.frame(sum=x, weight=1)
  }
@dannguyen
dannguyen / iowa-liquor-sales-dataset.readme.md
Last active October 30, 2024 19:04
Cleaning, summing up the State of Iowa Liquor Sales dataset

Iowa Liquor Sales dataset via Socrata/data.iowa.gov

(preliminary exploration)

The state of Iowa has released an 800MB+ dataset of more than 3 million rows showing weekly liquor sales, broken down by liquor category, vendor, and product name, e.g. STRAIGHT BOURBON WHISKIES, Jim Beam Brands, Maker's Mark

This dataset contains the spirits purchase information of Iowa Class “E” liquor licensees by product and date of purchase from January 1, 2014 to current. The dataset can be used to analyze total spirits sales in Iowa of individual products at the store level.
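
As a rough sketch of the store-level analysis described above, the snippet below totals sales per store and product using only the Python standard library. The column names (`store`, `category`, `vendor`, `product`, `sale_dollars`), the store numbers, and the dollar amounts are all assumptions made up for illustration; they are not the dataset's real schema or real figures, so check the actual export before adapting this.

```python
import csv
import io
from collections import defaultdict

# Tiny made-up sample. Column names and values are hypothetical,
# not taken from the real Iowa dataset.
SAMPLE_CSV = """store,category,vendor,product,sale_dollars
2501,STRAIGHT BOURBON WHISKIES,Jim Beam Brands,Maker's Mark,120.50
2501,STRAIGHT BOURBON WHISKIES,Jim Beam Brands,Maker's Mark,79.50
2633,STRAIGHT BOURBON WHISKIES,Jim Beam Brands,Maker's Mark,40.00
"""


def total_sales_by_store_and_product(csv_file):
    """Sum the sale_dollars column for each (store, product) pair."""
    totals = defaultdict(float)
    for row in csv.DictReader(csv_file):
        key = (row["store"], row["product"])
        totals[key] += float(row["sale_dollars"])
    return dict(totals)


totals = total_sales_by_store_and_product(io.StringIO(SAMPLE_CSV))
for (store, product), dollars in sorted(totals.items()):
    print(store, product, f"{dollars:.2f}")
```

Reading row by row keeps memory use flat, which matters for an 800MB+ file; passing an open file handle in place of the `io.StringIO` wrapper would work the same way.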

You can view the dataset via Socrata