BigsnarfDude bigsnarfdude
@bigsnarfdude
bigsnarfdude / logseq.md
Last active January 16, 2025 05:04
logseq.md
  • summarized [[201809291404-Ghoussoub]]

  • [[summary]]

    • The speaker expresses gratitude to the scientific community in Canada for their respect and admiration, highlighting the importance of mathematical science due to the "acceleration in the mathematization of all aspects of the sciences, but also of society."

    • The speaker explains the rationale behind BIRS (Banff International Research Station), emphasizing the goal to "multiply the opportunities" and "democratize" access to talent and opportunities in mathematics and related fields.

    • BIRS aims to facilitate interaction among scientists, described as "getting people together to live together for a week, eat and drink, and sleep mathematics if they can, and do mathematics together collaboratively."

    • The speaker notes the historical context of BIRS, mentioning that the first institute of this form started in Germany post-World War II and that BIRS was inspired by similar models in Europe, such as one near Marseille, France.

@bigsnarfdude
bigsnarfdude / llama3-3_-70b-alpaca.ipynb
Last active January 15, 2025 03:48
llama3-3_-70b-alpaca.ipynb
@bigsnarfdude
bigsnarfdude / MIM.md
Created January 15, 2025 02:07
Math in the Mountains - Video (MIM)

The BIRS plan focuses on processing the existing archive of 16,000 BIRS videos first, then setting up the continuous system. This is a good approach, since it separates the bulk historical processing from the real-time system. Archive Processing Phase:

Create a Video Processing Pipeline:

• Build a parallel processing system using Python's multiprocessing, or distributed computing with Dask, to handle the 16,000 videos efficiently (a minimal sketch follows this list).
• Set up error handling and logging to track failed transcriptions and allow for easy retries.
• Store video metadata (paths, durations, processing status) in SQLite or PostgreSQL for tracking.
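
A minimal sketch of that pipeline, using a multiprocessing pool and a SQLite status table. The transcribe() worker, the schema, and the pool size are illustrative assumptions; the actual transcription backend is not specified in the note:

import logging
import sqlite3
from multiprocessing import Pool

logging.basicConfig(filename="pipeline.log", level=logging.INFO)
DB_PATH = "videos.db"  # assumed tracking database, seeded with the 16,000 archive paths

def transcribe(video_path):
    """Hypothetical worker: transcribe one video, report success or failure."""
    try:
        # ... invoke the actual transcription model here ...
        return video_path, "done"
    except Exception as exc:
        logging.error("transcription failed for %s: %s", video_path, exc)
        return video_path, "failed"  # stays 'failed' in the table, so retries are easy

if __name__ == "__main__":
    conn = sqlite3.connect(DB_PATH)
    conn.execute("CREATE TABLE IF NOT EXISTS videos ("
                 "path TEXT PRIMARY KEY, duration REAL, "
                 "status TEXT DEFAULT 'pending')")
    pending = [row[0] for row in
               conn.execute("SELECT path FROM videos WHERE status != 'done'")]
    with Pool(processes=8) as pool:  # swap for a Dask client to scale across machines
        for path, status in pool.imap_unordered(transcribe, pending):
            conn.execute("UPDATE videos SET status = ? WHERE path = ?", (status, path))
            conn.commit()

Failed rows keep status 'failed', so a retry is just re-running the script over the not-done set.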

@bigsnarfdude
bigsnarfdude / metagene.py
Created January 14, 2025 18:38
metagene.py
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("metagene-ai/METAGENE-1")
model = AutoModelForCausalLM.from_pretrained("metagene-ai/METAGENE-1", torch_dtype=torch.bfloat16)
# Example input sequence
input_sequence = "TCACCGTTCTACAATCCCAAGCTGGAGTCAAGCTCAACAGGGTCTTC"
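
The preview stops before the sequence is actually run through the model; a minimal completion, assuming the standard transformers tokenize-and-generate flow (max_new_tokens is an arbitrary choice, not from the gist):

# Tokenize the nucleotide sequence and generate a continuation
inputs = tokenizer(input_sequence, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))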
@bigsnarfdude
bigsnarfdude / phi4_sky_train.py
Last active January 14, 2025 06:20
phi4_sky_train.py
from unsloth import FastLanguageModel, is_bfloat16_supported
from unsloth.chat_templates import get_chat_template, train_on_responses_only
from datasets import Dataset
from trl import SFTTrainer
from transformers import TrainingArguments, DataCollatorForSeq2Seq
import torch
import wandb
from datetime import datetime
import json
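
Only the imports are visible in the preview; a sketch of how they typically combine in an Unsloth SFT run. The model name, chat template, separator tokens, dataset, and hyperparameters below are placeholders and assumptions, not values recovered from the gist:

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/phi-4",  # placeholder; the filename suggests a Phi-4 base
    max_seq_length=2048,
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(model, r=16, lora_alpha=16)
tokenizer = get_chat_template(tokenizer, chat_template="phi-4")  # assumed template name
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,  # placeholder: a datasets.Dataset built elsewhere in the script
    dataset_text_field="text",
    max_seq_length=2048,
    data_collator=DataCollatorForSeq2Seq(tokenizer=tokenizer),
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        num_train_epochs=1,
        bf16=is_bfloat16_supported(),
        output_dir=f"outputs-{datetime.now():%Y%m%d-%H%M}",
        report_to="wandb",
    ),
)
trainer = train_on_responses_only(  # mask the prompt so loss is computed on responses only
    trainer,
    instruction_part="<|im_start|>user<|im_sep|>",   # assumed Phi-4 separators
    response_part="<|im_start|>assistant<|im_sep|>",
)
trainer.train()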
@bigsnarfdude
bigsnarfdude / inference1.py
Created January 13, 2025 16:33
inference1.py
from unsloth import FastLanguageModel
from unsloth.chat_templates import get_chat_template
from transformers import TextStreamer
from peft import PeftModel
import torch
import re
def load_model(model_path):
    """Load the fine-tuned LoRA model and tokenizer"""
    # Initialize base model and tokenizer
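    # (Preview truncated here. The rest is an assumed completion following the
    # common Unsloth + PEFT pattern; the base checkpoint name is a placeholder,
    # not recovered from the gist.)
    base_model, tokenizer = FastLanguageModel.from_pretrained(
        model_name="unsloth/Meta-Llama-3.1-8B-Instruct",  # placeholder base model
        max_seq_length=2048,
        load_in_4bit=True,
    )
    model = PeftModel.from_pretrained(base_model, model_path)  # attach the LoRA adapter
    FastLanguageModel.for_inference(model)  # enable Unsloth's fast inference path
    return model, tokenizer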
@bigsnarfdude
bigsnarfdude / inference.py
Last active January 13, 2025 01:10
inference.py
from unsloth import FastLanguageModel
from unsloth.chat_templates import get_chat_template
from transformers import TextStreamer
from peft import PeftModel
import torch
def load_model(model_path):
    """Load the fine-tuned LoRA model and tokenizer"""
    # Initialize base model and tokenizer
    base_model, tokenizer = FastLanguageModel.from_pretrained(
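        # (Preview truncated mid-call. Typical arguments for this loader;
        # the values are assumptions, not recovered from the gist.)
        model_name=model_path,
        max_seq_length=2048,
        dtype=None,           # let Unsloth auto-detect bf16/fp16
        load_in_4bit=True,
    )
    FastLanguageModel.for_inference(base_model)  # enable fast inference kernels
    return base_model, tokenizer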
@bigsnarfdude
bigsnarfdude / simple_retraining.py
Last active January 13, 2025 00:41
simple_retraining.py
from unsloth import FastLanguageModel, is_bfloat16_supported
from unsloth.chat_templates import get_chat_template, standardize_sharegpt, train_on_responses_only
from datasets import Dataset
from trl import SFTTrainer
from transformers import TrainingArguments, DataCollatorForSeq2Seq, TextStreamer
import torch
import wandb
from datetime import datetime
import json
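
Again only the imports survive the preview; a sketch of the dataset-preparation step they imply. Here tokenizer is assumed to come from an earlier FastLanguageModel.from_pretrained call, raw_dataset stands in for however the script loads its JSON examples, and the template name is a guess:

tokenizer = get_chat_template(tokenizer, chat_template="llama-3.1")  # assumed template
dataset = standardize_sharegpt(raw_dataset)  # normalize ShareGPT 'from'/'value' records

def to_text(batch):
    """Render each conversation into a single training string."""
    texts = [tokenizer.apply_chat_template(conv, tokenize=False,
                                           add_generation_prompt=False)
             for conv in batch["conversations"]]
    return {"text": texts}

dataset = dataset.map(to_text, batched=True)  # produces the "text" field SFTTrainer reads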
@bigsnarfdude
bigsnarfdude / few.py
Created January 12, 2025 06:08
few.py
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
class RegulationMatcher:
    def __init__(self):
        self.examples = []
        self.vectorizer = TfidfVectorizer()

    def add_example(self, statement, regulation):
        """Add a training example to the system."""
        # The gist preview truncates here; this body and the match() method
        # below are an assumed minimal completion consistent with the imports.
        self.examples.append((statement, regulation))

    def match(self, statement, top_k=1):
        """Return the top_k stored regulations most similar to the statement."""
        vectors = self.vectorizer.fit_transform([s for s, _ in self.examples])
        sims = cosine_similarity(self.vectorizer.transform([statement]), vectors).ravel()
        return [(self.examples[i][1], float(sims[i])) for i in sims.argsort()[::-1][:top_k]]