Goals: add links that are solid, level-headed explanations of how stuff works. No hype, and no vendor content if possible. Practical first-hand accounts of models in prod are eagerly sought.
We have made an expandable animated card slider that expands and collapses when a card is clicked. We used Owl Carousel and jQuery for a variable-width, responsive slider.
A Pen by Yudiz Solutions Limited on CodePen.
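Not the pen's actual source, but a minimal sketch of the kind of wiring described above, assuming Owl Carousel 2's `autoWidth`/`responsive` options; the `.card` and `.expanded` class names are illustrative assumptions:

```js
// Minimal sketch (not the pen's code): a variable-width, responsive
// Owl Carousel plus a click handler that expands/collapses one card.
$(function () {
  $('.owl-carousel').owlCarousel({
    autoWidth: true,          // variable-width items
    responsive: {             // items shown per viewport breakpoint
      0:   { items: 1 },
      768: { items: 3 }
    }
  });

  // Toggle an expanded state on the clicked card (class names assumed);
  // CSS transitions on width/height would supply the animation.
  $(document).on('click', '.card', function () {
    $(this).toggleClass('expanded').siblings().removeClass('expanded');
  });
});
```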
Yoav Goldberg, April 2023.
With the release of the ChatGPT model and follow-up large language models (LLMs), there was a lot of discussion of the importance of "RLHF training", that is, "reinforcement learning from human feedback". I was puzzled for a while as to why RL (Reinforcement Learning) is better than learning from demonstrations (a.k.a. supervised learning) for training language models. Shouldn't learning from demonstrations (or, in language model terminology, "instruction fine-tuning": learning to imitate human-written answers) be sufficient? I came up with a theoretical argument that was somewhat convincing. But I came to realize there is an additional argument which not only supports the case for RL training, but also requires it, in particular for models like ChatGPT. This additional argument is spelled out in (the first half of) a talk by John Schulman from OpenAI. This post pretty much summarizes that talk.
Audience: I assume you have heard of chatGPT, maybe played with it a little, and were impressed by it (or tried very hard not to be). And that you have also heard that it is "a large language model". And maybe that it "solved natural language understanding". Here is a short personal perspective on my thoughts about this (and similar) models, and where we stand with respect to language understanding.
Around 2014-2017, right within the rise of neural-network-based methods for NLP, I was giving a semi-academic-semi-popsci lecture, revolving around the story that achieving perfect language modeling is equivalent to being as intelligent as a human. Somewhere around the same time I was also asked in an academic panel "what would you do if you were given infinite compute and no need to worry about labour costs", to which I cockily responded "I would train a really huge language model, just to show that it doesn't solve everything!"
ChatGPT appeared like an explosion on all my social media timelines in early December 2022. While I keep up with machine learning as an industry, I wasn't focused so much on this particular corner, and all the screenshots seemed like they came out of nowhere. What was this model? How did the chat prompting work? What was the context of OpenAI doing this work and collecting my prompts for training data?
I decided to do a quick investigation. Here's all the information I've found so far. I'm aggregating and synthesizing it as I go, so it's currently changing pretty frequently.
The Salesforce CodeGen models are a family of large language models trained on a large amount of natural language data and then fine-tuned on specialized datasets of code. Models with 350M, 2B, 6B, and 16B parameters are provided in three flavors:
- nl, the base model trained on The Pile, a large natural language dataset compiled by EleutherAI
- multi, which is fine-tuned from the nl model on a dataset of code in multiple languages, scraped from GitHub, and
- mono, which is fine-tuned from the multi model on Python code only.
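For concreteness, here is a minimal sketch of sampling from one of these checkpoints, assuming the `Salesforce/codegen-{size}-{flavor}` naming used on the Hugging Face Hub:

```python
# Minimal sketch: sampling from a CodeGen checkpoint via Hugging Face
# transformers. The "Salesforce/codegen-350M-mono" id follows the
# size/flavor naming above; swap in 2B/6B/16B or nl/multi as needed.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "Salesforce/codegen-350M-mono"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

# mono is Python-only, so prompt it with a Python signature.
inputs = tokenizer("def fibonacci(n):", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```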
del v7.0.0 moved to pure ESM (no dual support), which forced me to move my gulpfile to ESM to be able to continue to use del.
The author, sindresorhus, maintains a lot of npm packages and does not want to provide an upgrade guide for each one, so he wrote a generic guide. But that guide is a bit vague precisely because it is generic, and it does not help much with gulp; hence this guide.
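For reference, a minimal sketch of what the move looks like, assuming a gulp-cli recent enough to pick up `gulpfile.mjs` (or a package.json with `"type": "module"`); the `dist`/`src` paths are illustrative:

```js
// gulpfile.mjs: minimal ESM sketch. del v7 is ESM-only and exposes
// named exports (deleteAsync/deleteSync) instead of a default export.
import gulp from 'gulp';
import { deleteAsync } from 'del';

export const clean = () => deleteAsync(['dist/**']);

export const build = gulp.series(clean, () =>
  gulp.src('src/**/*.js').pipe(gulp.dest('dist'))
);
```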
Syncing an Ethereum node is largely reliant on the latency and IOPS (I/O operations per second) of the storage. Budget SSDs will struggle to an extent, and some won't be able to sync at all. For simplicity, this page treats IOPS as a proxy for/predictor of latency.
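One common way to sanity-check a drive before attempting a sync is a mixed 4k random read/write fio run. This is a generic benchmark sketch, not a command from this page; the test path and 10G size are arbitrary assumptions:

```sh
# Generic fio sketch (an assumption, not from this page): measure mixed
# 4k random IOPS, the access pattern that dominates Ethereum node syncing.
# Point --filename at the drive under test; fio creates the file if absent.
fio --name=eth-test --filename=/mnt/data/fio-test --size=10G \
    --bs=4k --iodepth=64 --ioengine=libaio --direct=1 \
    --rw=randrw --rwmixread=75 --randrepeat=1 --gtod_reduce=1
```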
This document aims to snapshot some known good and known bad models.
The drive lists are ordered by interface, then by capacity, and alphabetically by vendor name, not by preference. The lists are not at all exhaustive. @mwpastore linked a filterable spreadsheet in the comments that has a far greater variety of drives and their characteristics. Filter it by DRAM: yes, NAND type: TLC, form factor: M.2, and the desired capacity.
For size, 4TB is a very conservative choice. The smaller 2TB drive should last an Ethereum full node until at least sometime in 2026, with the pre-merge history expiry scheduled for May 2025.
<div id="shader"></div> | |
<script id="vertex" type="x-shader/x-vertex"> | |
varying vec2 vUv; | |
void main() { gl_Position = vec4(position, 1.0); | |
vUv = uv; | |
} | |
</script> | |
<script id="fragment" type="x-shader/x-fragment"> |