Vicki Boykis (veekaybee)

## uninstall_pyenv_mac.md

      
Created November 16, 2024 00:47

You might want to use uv now that it has become more stable on Mac.
I've already been using it at work and wanted to install it locally for a new project on my computer, but I had pyenv installed.
Only do this if you want to remove pyenv completely; otherwise, just disable it by removing its lines from your `~/.zshrc`.
Here's what I had to do:

1. Comment out everything related to pyenv in my `~/.zshrc` file and run `source ~/.zshrc`. You may have to search around for all instances if, like me, you're not organized about your `~/.zshrc`.
2. `rm -rf "$HOME/.pyenv"` (DOUBLE-CHECK THIS COMMAND AND WHERE YOUR pyenv IS INSTALLED)
3. `brew uninstall pyenv`, just in case.
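If you're not sure where all the pyenv lines in your `~/.zshrc` are, a small script can find them. This is a minimal sketch, assuming every pyenv-related line literally contains the string "pyenv" (for example `PYENV_ROOT` or `pyenv init`); the helper names `find_pyenv_lines` and `comment_out_pyenv` are hypothetical. Run as a script, it only prints the matching lines and doesn't modify anything.

```python
import re
from pathlib import Path

# Any line that mentions pyenv, e.g. `eval "$(pyenv init -)"` or `export PYENV_ROOT=...`
PYENV_PATTERN = re.compile(r"pyenv", re.IGNORECASE)


def find_pyenv_lines(text: str) -> list[tuple[int, str]]:
    """Return (1-based line number, line) for every line that mentions pyenv."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if PYENV_PATTERN.search(line)
    ]


def comment_out_pyenv(text: str) -> str:
    """Prefix each pyenv line that isn't already a comment with '# '."""
    lines = []
    for line in text.splitlines():
        if PYENV_PATTERN.search(line) and not line.lstrip().startswith("#"):
            line = "# " + line
        lines.append(line)
    result = "\n".join(lines)
    # Preserve a trailing newline if the original text had one
    return result + "\n" if text.endswith("\n") else result


if __name__ == "__main__":
    zshrc = Path.home() / ".zshrc"
    if zshrc.exists():
        # Only prints matches; nothing is written back to the file
        for number, line in find_pyenv_lines(zshrc.read_text()):
            print(f"{number}: {line}")
```

To actually comment the lines out, you could pass the file's text through `comment_out_pyenv` and write the result back, ideally after saving a backup copy of `~/.zshrc`.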


## normcore-llm.md

      
Last active December 18, 2024 13:39

Normcore LLM Reads

Anti-hype LLM reading list

Goals: Add links that are reasonable, good explanations of how things work. No hype and, if possible, no vendor content. Practical first-hand accounts of models in production are eagerly sought.
Foundational Concepts


Pre-Transformer Models


## viberary_training_data.md

      
Last active July 3, 2024 02:02

Data source:

https://sites.google.com/eng.ucsd.edu/ucsdbookgraph/home

@inproceedings{DBLP:conf/recsys/WanM18,
  author       = {Mengting Wan and
                  Julian J. McAuley},
  editor       = {Sole Pera and
                  Michael D. Ekstrand and
                  Xavier Amatriain and


## enriched_data.md

      
Created June 30, 2023 09:51

How to properly select from DuckDB

SELECT goodreads_reviews.review_text,
       goodreads.title,
       goodreads.description,
       goodreads.average_rating,
       goodreads_authors.name
FROM goodreads
JOIN goodreads_reviews
  ON goodreads.book_id = goodreads_reviews.book_id
-- Match on each book's own authors value (first numeric id), not an uncorrelated subquery
JOIN goodreads_authors
  ON goodreads_authors.author_id = REGEXP_EXTRACT(goodreads.authors, '[0-9]+')
LIMIT 10;
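To sanity-check the join logic without a DuckDB install, here's a sketch that runs the same join against Python's built-in `sqlite3` instead (a swapped-in engine, not DuckDB). SQLite has no `REGEXP_EXTRACT`, so the sketch registers a small Python stand-in under that name. The tables and rows are made-up examples, the `authors` column is assumed to be a string containing the author id, and an `ORDER BY` is added so the output is deterministic.

```python
import re
import sqlite3


def regexp_extract(text, pattern):
    """Stand-in for the two-argument REGEXP_EXTRACT: first match, or '' if none."""
    if text is None:
        return None
    match = re.search(pattern, text)
    return match.group(0) if match else ""


# Hypothetical miniature versions of the three tables
SETUP = """
CREATE TABLE goodreads (book_id TEXT, title TEXT, description TEXT,
                        average_rating REAL, authors TEXT);
CREATE TABLE goodreads_reviews (book_id TEXT, review_text TEXT);
CREATE TABLE goodreads_authors (author_id TEXT, name TEXT);
INSERT INTO goodreads VALUES
  ('1', 'Book One', 'First book', 4.1, '[{"author_id": "604031"}]'),
  ('2', 'Book Two', 'Second book', 3.9, '[{"author_id": "626222"}]');
INSERT INTO goodreads_reviews VALUES ('1', 'Loved it'), ('2', 'It was fine');
INSERT INTO goodreads_authors VALUES ('604031', 'Author A'), ('626222', 'Author B');
"""

QUERY = """
SELECT goodreads_reviews.review_text, goodreads.title, goodreads.description,
       goodreads.average_rating, goodreads_authors.name
FROM goodreads
JOIN goodreads_reviews ON goodreads.book_id = goodreads_reviews.book_id
JOIN goodreads_authors
  ON goodreads_authors.author_id = REGEXP_EXTRACT(goodreads.authors, '[0-9]+')
ORDER BY goodreads.book_id
LIMIT 10;
"""


def run_demo() -> list[tuple]:
    conn = sqlite3.connect(":memory:")
    try:
        # Register the Python function under the name the query uses
        conn.create_function("REGEXP_EXTRACT", 2, regexp_extract)
        conn.executescript(SETUP)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in run_demo():
        print(row)
```

Each review comes back paired with the author of its own book, which is the point of correlating the extraction with `goodreads.authors` row by row.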

  
## systems_performance.md

      
Created February 16, 2023 19:38

Systems Performance, 2nd edition

See synthesized write-up here

- Do a quick performance check in 60 seconds.
- Use the many different tools available in Unix.
- Use flame graphs of the call stack if you have access to them.
- The best performance wins come from eliminating unnecessary work, for example a thread stuck in a loop, or from eliminating bad configuration.
- Mantras: don't do it (eliminate); do it, but don't do it again (caching); do it less (polling); do it when they're not looking; do it concurrently; do it more cheaply.
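The caching mantra in its simplest form is memoization: compute a result once and serve repeats from memory. A minimal Python sketch using `functools.lru_cache`; `build_report` is a hypothetical stand-in for any expensive call whose result doesn't change between calls.

```python
from functools import lru_cache

calls = {"build_report": 0}  # how many times the expensive body actually ran


@lru_cache(maxsize=256)
def build_report(user_id: int) -> str:
    """Hypothetical expensive function; the body stands in for slow work."""
    calls["build_report"] += 1
    return f"report for user {user_id}"
```

Calling `build_report(1)` twice runs the body once; `build_report.cache_info()` reports the second call as a hit. This is only safe for functions whose output depends solely on their arguments.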


## non_personalized_recs.md

      
Last active August 9, 2023 01:30

Introduction to Recommender Systems: Non-Personalized and Content-Based, on Coursera

Information retrieval is the practice of asking questions of large collections of documents.

It became especially popular for discovery in lawsuits, and for sites like Amazon guiding you to relevant products.
One of the first recommenders was GroupLens, for Usenet news.

Collaborative filtering involves running ratings and correlations through a CF engine.

The goal is to find a neighborhood of users with similar tastes.
Recommendation interfaces: suggestions, top-N lists.
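The neighborhood idea can be sketched as user-based collaborative filtering: correlate users' ratings, keep the most similar users, and rank items they rated that the target user hasn't seen. This is a minimal sketch, assuming Pearson correlation over co-rated items and a similarity-weighted average as the predicted rating; the course may define the details differently, and all function names here are hypothetical.

```python
from math import sqrt

# user -> {item: rating}
Ratings = dict[str, dict[str, float]]


def pearson(a: dict[str, float], b: dict[str, float]) -> float:
    """Pearson correlation over co-rated items; 0.0 when it's undefined."""
    common = a.keys() & b.keys()
    if len(common) < 2:
        return 0.0
    mean_a = sum(a[i] for i in common) / len(common)
    mean_b = sum(b[i] for i in common) / len(common)
    num = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    den = sqrt(sum((a[i] - mean_a) ** 2 for i in common)) * sqrt(
        sum((b[i] - mean_b) ** 2 for i in common)
    )
    return num / den if den else 0.0


def neighborhood(ratings: Ratings, user: str, k: int) -> list[tuple[float, str]]:
    """The k other users most positively correlated with `user`."""
    scored = [
        (pearson(ratings[user], ratings[other]), other)
        for other in ratings
        if other != user
    ]
    positive = [pair for pair in scored if pair[0] > 0]
    positive.sort(reverse=True)
    return positive[:k]


def top_n(ratings: Ratings, user: str, k: int = 2, n: int = 3) -> list[tuple[str, float]]:
    """Rank unseen items by the similarity-weighted average of neighbors' ratings."""
    weighted_sum: dict[str, float] = {}
    weight_total: dict[str, float] = {}
    for sim, other in neighborhood(ratings, user, k):
        for item, rating in ratings[other].items():
            if item in ratings[user]:
                continue  # only recommend items the user hasn't rated
            weighted_sum[item] = weighted_sum.get(item, 0.0) + sim * rating
            weight_total[item] = weight_total.get(item, 0.0) + sim
    predicted = {item: weighted_sum[item] / weight_total[item] for item in weighted_sum}
    return sorted(predicted.items(), key=lambda kv: (-kv[1], kv[0]))[:n]
```

Users with negative correlation are dropped from the neighborhood here; that's one design choice among several (some variants keep them or use a similarity threshold).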


## isolation_forest.md

      
Created February 2, 2023 02:49

Isolation forests versus decision trees

Isolation forest paper


Isolated points (anomalies) should have shorter paths, i.e. they end up closer to the root of the tree.
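Unlike a decision tree, an isolation tree uses no labels: it splits on a random feature at a random value, and points that get isolated after only a few splits are likely anomalies. Here's a minimal sketch of that idea, following the paper's anomaly score s = 2^(-E[h(x)]/c(ψ)) with subsample size ψ (256 by default) and height limit ⌈log2 ψ⌉. The function names are hypothetical; for real workloads, scikit-learn's `IsolationForest` implements the same algorithm.

```python
import math
import random

Point = tuple[float, ...]


def c(n: int) -> float:
    """Average path length of an unsuccessful BST search over n points (normalizer)."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + 0.5772156649  # approximates H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def build_tree(points: list[Point], depth: int, max_depth: int, rng: random.Random) -> dict:
    """Split on a random feature at a random value; no labels are involved."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    return {
        "dim": dim,
        "split": split,
        "left": build_tree([p for p in points if p[dim] < split], depth + 1, max_depth, rng),
        "right": build_tree([p for p in points if p[dim] >= split], depth + 1, max_depth, rng),
    }


def path_length(tree: dict, point: Point, depth: int = 0) -> float:
    """Edges traversed to reach a leaf, plus c(size) for points left unsplit there."""
    if "size" in tree:
        return depth + c(tree["size"])
    branch = "left" if point[tree["dim"]] < tree["split"] else "right"
    return path_length(tree[branch], point, depth + 1)


def anomaly_scores(points: list[Point], n_trees: int = 100,
                   sample_size: int = 256, seed: int = 0) -> list[float]:
    """Score s = 2 ** (-E[h(x)] / c(psi)); values near 1 suggest anomalies."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    rng = random.Random(seed)
    psi = min(sample_size, len(points))
    max_depth = math.ceil(math.log2(psi))
    trees = [build_tree(rng.sample(points, psi), 0, max_depth, rng) for _ in range(n_trees)]
    scores = []
    for p in points:
        mean_h = sum(path_length(t, p) for t in trees) / n_trees
        scores.append(2.0 ** (-mean_h / c(psi)))
    return scores
```

With a tight cluster plus one far-away point, the far point is usually cut off by the very first random split, so its average path length is short and its score is the highest.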


## machine_learning_design_patterns.md

      
Last active December 13, 2024 22:08

Machine Learning Design Patterns

This book is all about patterns for doing ML. It's broken up into key parts on building and serving models. The two are intertwined, so it makes sense to read through the whole thing; there's a lot of good advice from seasoned professionals. You can safely ignore the parts that are specific to GCP. The other issue with the book is that it's very heavily focused on deep learning cases, and not all modeling problems require deep learning. Regardless, let's dive in. I've included the parts that were relevant to me in the notes.
Most Interesting Bullets:


Machine learning model training is often not deterministic, so there are a number of ways we deal with this when building software, including setting random seeds during training, using stateless functions, freezing layers, checkpointing, and generally making sure that flows are as reproducible as possible.
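Seeding and stateless functions work well together. A minimal sketch, assuming a simple shuffle-based train/test split; `split_train_test` is a hypothetical helper, not from the book. It creates its own seeded generator instead of touching global random state, so the same inputs always produce the same split.

```python
import random
from typing import Sequence, TypeVar

T = TypeVar("T")


def split_train_test(rows: Sequence[T], test_fraction: float, seed: int) -> tuple[list[T], list[T]]:
    """Shuffle with a local, explicitly seeded RNG and split; same inputs -> same split."""
    if not 0.0 <= test_fraction <= 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    rng = random.Random(seed)  # local generator: no hidden dependence on global state
    shuffled = list(rows)      # copy so the caller's data is never mutated
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]
```

Because the seed is an explicit argument, it can be logged alongside the model so the exact split can be recreated later.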


## learning_from_data.md

      
Created February 1, 2023 17:16

Notes on Learning from Data


Algorithms find the best ways to do things, but they don't explain "how" they came to those conclusions.

This is a common way to formulate ML problems: there's an unknown target function f that we want to approximate, so we learn a hypothesis g from the data such that g ≈ f.
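One of the book's first examples of this setup is the perceptron learning algorithm (PLA): start from zero weights and repeatedly fix one misclassified point until the hypothesis agrees with every training example. A minimal sketch, assuming a made-up linear target `target_f` that the algorithm only sees through labeled points; on linearly separable data like this, PLA is guaranteed to converge.

```python
Point = tuple[float, float]


def target_f(x: Point) -> int:
    """Hypothetical 'unknown' target: +1 above the line x1 + x2 = 1, else -1."""
    return 1 if x[0] + x[1] - 1 > 0 else -1


def predict(w: list[float], x: Point) -> int:
    """Hypothesis h(x) = sign(w0 + w1*x1 + w2*x2), treating 0 as -1."""
    return 1 if w[0] + w[1] * x[0] + w[2] * x[1] > 0 else -1


def pla(data: list[tuple[Point, int]], max_updates: int = 1000) -> list[float]:
    """Perceptron learning algorithm: fix one misclassified point per update."""
    w = [0.0, 0.0, 0.0]
    for _ in range(max_updates):
        misclassified = [(x, y) for x, y in data if predict(w, x) != y]
        if not misclassified:
            break  # g now agrees with f on every training point
        x, y = misclassified[0]
        w[0] += y
        w[1] += y * x[0]
        w[2] += y * x[1]
    return w


if __name__ == "__main__":
    points = [(0.0, 0.0), (2.0, 2.0), (0.2, 0.1), (1.5, 0.5), (0.1, 0.5), (0.8, 1.2)]
    data = [(x, target_f(x)) for x in points]  # the data D: our only view of f
    print("learned weights g:", pla(data))
```

Agreeing with f on the training points doesn't by itself mean g ≈ f elsewhere; that gap between in-sample and out-of-sample behavior is what much of the book is about.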

  
## largestreams.md

      
Last active August 9, 2023 01:34

Counting cumulative elements in large streams

An interview problem I've gotten fairly often is: "Given a stream of elements, how do you get the median, average, or sum of the elements in the stream?"
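For comparison with the hashmap approach described next, here's a sketch that answers the question with running state instead of storing everything and re-scanning it. The sum and mean need only a count and a running total; an exact median still needs every element, but keeping them in two heaps (a max-heap for the lower half, a min-heap for the upper half) makes each insert O(log n) and each median read O(1). The class name `StreamStats` is hypothetical.

```python
import heapq


class StreamStats:
    """Running sum, mean, and exact median over a stream of numbers."""

    def __init__(self) -> None:
        self.count = 0
        self.total = 0.0
        self._low: list[float] = []   # max-heap (values negated) holding the smaller half
        self._high: list[float] = []  # min-heap holding the larger half

    def add(self, x: float) -> None:
        self.count += 1
        self.total += x
        # Push into the low half, then move its largest element to the high half
        heapq.heappush(self._low, -x)
        heapq.heappush(self._high, -heapq.heappop(self._low))
        # Rebalance so the low half has the same size as the high half, or one more
        if len(self._high) > len(self._low):
            heapq.heappush(self._low, -heapq.heappop(self._high))

    @property
    def mean(self) -> float:
        if self.count == 0:
            raise ValueError("mean of empty stream")
        return self.total / self.count

    @property
    def median(self) -> float:
        if self.count == 0:
            raise ValueError("median of empty stream")
        if len(self._low) > len(self._high):
            return -self._low[0]
        return (-self._low[0] + self._high[0]) / 2
```

For example, after adding 5, 1, 3, 8, 2 the median is 3 and the mean is 3.8; adding 4 moves the median to 3.5.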
I've thought about this problem a lot, and my naive implementation was to put the elements in a hashmap (dictionary) and then pass over the hashmap with whatever function you need.
For example,
import typing