
@veekaybee
Last active December 24, 2024 11:24
Normcore LLM Reads

Anti-hype LLM reading list

Goals: add links that are reasonable, solid explanations of how stuff works. No hype and, if possible, no vendor content. Practical first-hand accounts of models in prod are eagerly sought.

Foundational Concepts

Pre-Transformer Models

Building Blocks

Foundational Deep Learning Papers (in semi-chronological order)

The Transformer Architecture

Attention

GPT

Significant OSS Models

LLMs in 2023

Training Data

Pre-Training

RLHF and DPO

Fine-Tuning and Compression

Small and Local LLMs

Deployment and Production

LLM Inference and K-V Cache

Prompt Engineering and RAG

GPUs

Evaluation

Eval Frameworks

UX

What's Next?

Thanks to everyone who added suggestions on Twitter, Mastodon, and Bluesky.

@san7988 commented Aug 20, 2023

Posts from Eugene Yan are also a pretty good read.

@butsugiri

Hi, thank you for your great work!

Training Your Own

I wonder if we can add huggingface/llm_training_handbook ("An open collection of methodologies to help with successful training of large language models") to this section?

@lcrmorin

It seems that the gzip approach, although really cool, was 'optimistic' and thus overhyped; see https://kenschutte.com/gzip-knn-paper/ (basically, they confused k in kNN with top-k accuracy, reporting top-2 accuracy). More recent studies found that it is, as expected, at 'bag of words' performance level: Gzip versus bag-of-words for text classification.

I don't know if you intend to (or are even interested), but I am on the lookout for "use cases for normies".

@wrhall commented Aug 22, 2023

Do you think it's worth annotating with dates of the articles / papers / videos?

@veekaybee (Author)

It seems that the gzip approach, although really cool, was 'optimistic' and thus overhyped; see https://kenschutte.com/gzip-knn-paper/ (basically, they confused k in kNN with top-k accuracy, reporting top-2 accuracy). More recent studies found that it is, as expected, at 'bag of words' performance level: Gzip versus bag-of-words for text classification.

I don't know if you intend to (or are even interested), but I am on the lookout for "use cases for normies".

Yeah, that was my read on it as well, but I'm also very interested in it as a general theoretical approach and baseline, even if this particular implementation doesn't work.
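
For anyone curious, the whole baseline fits in a few lines. Here is a minimal sketch of the NCD + kNN idea (toy data and labels are made up for illustration, not taken from the paper's benchmarks; the tie-breaking note is exactly where the top-k confusion crept in):

```python
# Minimal sketch of the gzip/NCD + kNN text-classification baseline.
# Toy data and labels are made up for illustration.
import gzip
from collections import Counter

def clen(text: str) -> int:
    """Length in bytes of the gzip-compressed string."""
    return len(gzip.compress(text.encode("utf-8")))

def ncd(a: str, b: str) -> float:
    """Normalized compression distance between two strings."""
    ca, cb = clen(a), clen(b)
    cab = clen(a + " " + b)
    return (cab - min(ca, cb)) / max(ca, cb)

def knn_predict(query: str, train: list[tuple[str, str]], k: int = 2) -> str:
    """Majority vote over the k nearest training texts by NCD.
    Ties fall back to the nearer neighbor (Counter keeps first-seen
    order for equal counts). Counting a hit when *either* of the top-2
    labels matches would instead measure top-2 accuracy, which is the
    bug Schutte describes."""
    nearest = sorted(train, key=lambda pair: ncd(query, pair[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

train = [
    ("the team won the match in overtime", "sports"),
    ("stocks fell sharply after the announcement", "finance"),
    ("the striker scored twice in the final", "sports"),
    ("the central bank raised interest rates", "finance"),
]
print(knn_predict("bond yields rose as rates climbed", train))
```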

@veekaybee (Author)

Do you think it's worth annotating with dates of the articles / papers / videos?

Maybe it would be helpful, but I explicitly picked stuff that I thought wouldn't age and/or where recency didn't matter, because the fundamentals are timeless.

@janhesse53

Patterns for Building LLM-based Systems & Products

In my opinion, this is a super in-depth article that covers many of the categories and deserves a place in the reading list.

@rmitsch commented Aug 25, 2023

Against LLM maximalism (disclaimer: I work at Explosion)

@Lykos2 commented Aug 27, 2023

Patterns for Building LLM-based Systems & Products
In my opinion, this is a super in-depth article that covers many of the categories and deserves a place in the reading list.

A detailed blog post.

@spmurrayzzz

The Illustrated Transformer: maybe redundant with some of the other transformer content here, but it is very well written and strikes a good balance between prose and visual aids.

@satisfice

ChatGPT Sucks at Being a Testing Expert
https://www.satisfice.com/download/appendix-chatgpt-sucks-at-being-a-testing-expert?wpdmdl=487569

This is a careful analysis of an attempt to demonstrate ChatGPT's usefulness in helping testers.

@Tulip4attoo

My post extends the "Why you should host your LLM?" article with some additions from an operations perspective: https://tulip4attoo.substack.com/p/why-you-should-host-your-llm-from

@ghosthamlet

Maybe Foundational Papers should include the first instruction-tuned model, FLAN (no RLHF):
Finetuned Language Models Are Zero-Shot Learners: https://arxiv.org/abs/2109.01652
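
For context on what that means mechanically: instruction tuning is plain supervised fine-tuning on instruction-formatted records, with no RLHF stage. A made-up sketch of the rough shape of one such record (FLAN itself templates existing NLP datasets into many instruction formats):

```python
# Rough shape of a FLAN-style instruction-tuning record.
# This example is made up; FLAN generates such records by templating
# existing NLP datasets into natural-language instructions.
record = {
    "instruction": "Classify the sentiment of the review as positive or negative.",
    "input": "The battery dies within an hour. Very disappointed.",
    "target": "negative",
}
# Fine-tuning minimizes the ordinary supervised LM loss on
# (instruction + input) -> target; no reinforcement learning involved.
```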

@timbornholdt

I gave a talk about prompt engineering for normal people and turned it into a pretty decent article; it might be useful for the list too: https://timbornholdt.com/blog/prompt-engineering-how-to-think-like-an-ai

@emilymbender

The Stochastic Parrots paper presents many things that anyone should be cognisant of when deciding whether or not to use an LLM:

Bender, Emily M., Timnit Gebru, Angelina McMillan-Major, and Shmargaret Shmitchell. 2021. "On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜." In Proceedings of FAccT 2021, pp. 610–623.

@livc commented Aug 29, 2023

We are exploring deployment and commercialization scenarios for AI agents at https://askgen.ie; currently we think customer support is a good use case.

@umair-nasir14

@livc Are you guys hiring?

@Sharrp commented Aug 29, 2023

I found "Five years of GPT progress" to be a useful overview of the influential papers on GPT.
https://finbarr.ca/five-years-of-gpt-progress/
May work as a high-level summary for "Foundational Papers" section.

P.S. Thank you for compiling the list!

@hkniberg commented Aug 30, 2023

Hi! This is great! It would be even more useful to note the year/month of publication next to each item, to get a sense of which links are more up to date and which are more historical.

@will-thompson-k commented Aug 30, 2023

I really like this list; sad I just discovered it 😎.

I am not sure if this would complement your Background section, but I wrote this as a primer on LLMs last month: https://willthompson.name/what-we-know-about-llms-primer.

But I don't know, it might not be very orthogonal to your other sources here 🤷.

@AnnthomyGILLES

An overview of vector databases. The author highlights the differences between the various vector databases out there as visually as possible.

https://thedataquarry.com/posts/vector-db-1/

@davidzshi

An overview of vector databases. The author highlights the differences between the various vector databases out there as visually as possible.

https://thedataquarry.com/posts/vector-db-1/

This is really helpful, thank you!

@lcrmorin

I keep coming back to this list. However, I feel like it misses a good discussion of current stuff not working. I keep failing to implement working things, despite lengthy theoretical write-ups, and when I scratch the veneer I keep getting the same answer: "the technology is not ready yet."

@zaunere commented Sep 22, 2024

Awesome list (and comments), but "graph" is missing
