Sayak Paul (sayakpaul)
:octocat: Learn, unlearn and relearn.
@sayakpaul
sayakpaul / aot_compile_with_int8_quant.py
Last active November 3, 2024 00:22
Shows how to AoT compile the Flux.1 Dev Transformer with int8 quant and perform inference.
import torch
from diffusers import FluxTransformer2DModel
import torch.utils.benchmark as benchmark
from torchao.quantization import quantize_, int8_weight_only
from torchao.utils import unwrap_tensor_subclass
import torch._inductor

# Let inductor use its Triton kernel for mixed-precision (bf16 x int8) matmuls.
torch._inductor.config.mixed_mm_choice = "triton"
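The preview stops at the inductor config; a minimal sketch of how the quantize-and-compile flow plausibly continues, reusing the imports above. The `mode`/`fullgraph` choices are assumptions, and the gist's actual ahead-of-time export step (e.g. via `torch.export`/AOTInductor) is not shown in the preview, so `torch.compile` stands in for it here.

# Hedged sketch, not the gist verbatim.
model = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev", subfolder="transformer", torch_dtype=torch.bfloat16
).to("cuda")

quantize_(model, int8_weight_only())  # replace Linear weights with int8 tensor subclasses
unwrap_tensor_subclass(model)         # flatten the subclasses so the compiler can trace them

model = torch.compile(model, mode="max-autotune", fullgraph=True)  # illustrative settings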
@sayakpaul
sayakpaul / inference.md
Last active November 20, 2024 12:48
(Not so rigorously tested) example showing how to use `bitsandbytes`, `peft`, etc. to LoRA fine-tune Flux.1 Dev.

When loading LoRA params that were obtained on a quantized base model and merging them into that model, it is recommended to first dequantize the base model, merge the LoRA params into it, and then quantize the model again. Merging directly into a 4-bit quantized model can introduce rounding errors. Below, we provide an end-to-end example:

  1. First, load the original model and merge the LoRA params into it:
from diffusers import FluxPipeline
import torch

ckpt_id = "black-forest-labs/FLUX.1-dev"
# The preview truncates the call here; the dtype below is illustrative.
pipeline = FluxPipeline.from_pretrained(ckpt_id, torch_dtype=torch.bfloat16)
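A hedged sketch of the dequantize, merge, requantize flow described above, continuing from the snippet. The LoRA repo id and save path are placeholders, and the 4-bit config assumes diffusers' bitsandbytes integration rather than the gist's exact code.

# Merge the LoRA into the unquantized base model first.
pipeline.load_lora_weights("your-username/flux-lora")  # placeholder LoRA repo
pipeline.fuse_lora()
pipeline.unload_lora_weights()
pipeline.transformer.save_pretrained("merged-transformer")  # illustrative path

# Then reload the merged transformer with 4-bit quantization.
from diffusers import FluxTransformer2DModel, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True, bnb_4bit_quant_type="nf4", bnb_4bit_compute_dtype=torch.bfloat16
)
transformer = FluxTransformer2DModel.from_pretrained(
    "merged-transformer", quantization_config=quant_config, torch_dtype=torch.bfloat16
)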
@sayakpaul
sayakpaul / low_rank_lora.py
Last active September 25, 2024 12:22
Make a high-rank LoRA low-rank.
"""
Usage:
python low_rank_lora.py --repo_id=glif/how2draw --filename="How2Draw-V2_000002800.safetensors" \
--new_rank=4 --new_lora_path="How2Draw-V2_000002800_rank_4.safetensors"
"""
import torch
from huggingface_hub import hf_hub_download
import safetensors.torch
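The preview ends at the imports; the core of such a script is plausibly a truncated SVD of each LoRA delta. A self-contained sketch of that idea, with function and variable names of my own choosing rather than the gist's:

# Hedged sketch: shrink one LoRA pair (up/B: out x r, down/A: r x in) to a lower rank.
def lower_rank(up: torch.Tensor, down: torch.Tensor, new_rank: int):
    delta = up @ down  # the full LoRA update, shape (out, in)
    U, S, Vh = torch.linalg.svd(delta, full_matrices=False)
    U, S, Vh = U[:, :new_rank], S[:new_rank], Vh[:new_rank, :]
    # Split the retained singular values evenly between the two factors.
    new_up = U * S.sqrt()                  # (out, new_rank)
    new_down = S.sqrt().unsqueeze(1) * Vh  # (new_rank, in)
    return new_up, new_down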
@sayakpaul
sayakpaul / pipeline_flux_with_cfg_batched.py
Last active September 20, 2024 18:41
Flux with CFG (batched) 💣
# Copyright 2024 Black Forest Labs and The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
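The preview shows only the license header. Flux.1 Dev is guidance-distilled and skips classical CFG by default, so a "batched CFG" pipeline presumably runs the conditional and unconditional branches in a single forward pass. A hedged sketch of that combination step; `transformer_fn` is an abstract stand-in, not the pipeline's actual call signature:

import torch

def batched_cfg_step(transformer_fn, latents, cond_embeds, uncond_embeds, cfg_scale):
    # One denoising step with true CFG: batch cond + uncond, run once, then split.
    latents_in = torch.cat([latents, latents], dim=0)
    embeds_in = torch.cat([cond_embeds, uncond_embeds], dim=0)
    noise_pred = transformer_fn(latents_in, embeds_in)
    noise_cond, noise_uncond = noise_pred.chunk(2)
    return noise_uncond + cfg_scale * (noise_cond - noise_uncond)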
@sayakpaul
sayakpaul / README.md
Last active October 22, 2024 03:02
This code snippet shows how to split the Flux transformer across two 16GB GPUs and run inference with the full pipeline.
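The README's code is not shown in the listing; one way such a split can be done is with Accelerate's device mapping, assuming the gist follows a similar route. The arguments here are illustrative:

import torch
from diffusers import FluxTransformer2DModel

# Hedged sketch: shard the transformer across two GPUs with a per-device memory cap.
transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="transformer",
    device_map="auto",                  # let accelerate place the layers
    max_memory={0: "16GB", 1: "16GB"},  # cap each 16GB card
    torch_dtype=torch.bfloat16,
)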
@sayakpaul
sayakpaul / inference.md
Last active October 21, 2024 01:38
Not so rigorously validated FP8 training of Flux (dev) DreamBooth LoRA
from diffusers import AutoPipelineForText2Image
import torch
pipeline = AutoPipelineForText2Image.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")
pipeline.load_lora_weights("sayakpaul/yarn_art_lora_flux", weight_name="pytorch_lora_weights.safetensors")
image = pipeline("a puppy in a pond, yarn art style", guidance_scale=3.5, height=768).images[0]
image.save("yarn.png")
@sayakpaul
sayakpaul / inference_with_torchao_serialized.py
Last active November 18, 2024 00:59
Shows how to run Flux schnell in under 17GB without bells and whistles. It additionally shows how to serialize the quantized checkpoint and load it back.
import torch
from huggingface_hub import hf_hub_download
from diffusers import FluxTransformer2DModel, DiffusionPipeline
dtype, device = torch.bfloat16, "cuda"
ckpt_id = "black-forest-labs/FLUX.1-schnell"
with torch.device("meta"):
config = FluxTransformer2DModel.load_config(ckpt_id, subfolder="transformer")
model = FluxTransformer2DModel.from_config(config).to(dtype)
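The preview stops right after the meta-device skeleton; the serialization round-trip the description mentions plausibly looks like the following. Paths are placeholders, and the quantization call mirrors torchao's int8 weight-only API:

from torchao.quantization import quantize_, int8_weight_only

# Quantize a fully loaded transformer, then serialize the quantized state dict.
transformer = FluxTransformer2DModel.from_pretrained(
    ckpt_id, subfolder="transformer", torch_dtype=dtype
)
quantize_(transformer, int8_weight_only())
torch.save(transformer.state_dict(), "flux_schnell_int8.pt")  # illustrative path

# Later: load the quantized tensors straight into the meta skeleton built above,
# so full bf16 weights are never materialized.
state_dict = torch.load("flux_schnell_int8.pt", map_location="cpu")
model.load_state_dict(state_dict, assign=True)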
@sayakpaul
sayakpaul / distributed_inference_diffusers.py
Last active September 10, 2024 02:04
Minimal example to show how to run distributed inference from a set of prompts with diffusers and accelerate.
# Originally by jiwooya1000, put together by sayakpaul.
# Documentation: https://huggingface.co/docs/diffusers/main/en/training/distributed_inference
"""
Run:
accelerate launch distributed_inference_diffusers.py --batch_size 8
# Enable memory optimizations for large models like SD3
accelerate launch distributed_inference_diffusers.py --batch_size 8 --low_mem=1
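The core pattern from the linked documentation, which this script presumably follows: each process takes a slice of the prompts via `PartialState.split_between_processes`. A condensed sketch with an illustrative model id:

import torch
from accelerate import PartialState
from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers", torch_dtype=torch.float16
)
state = PartialState()
pipeline.to(state.device)  # one GPU per process under `accelerate launch`

prompts = ["a dog", "a cat", "a frog", "a horse"]
with state.split_between_processes(prompts) as shard:
    for i, prompt in enumerate(shard):
        image = pipeline(prompt).images[0]
        image.save(f"result_{state.process_index}_{i}.png")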
@sayakpaul
sayakpaul / run_flux_with_limited_resources.md
Last active September 30, 2024 06:25
This document lists resources that show how to run Black Forest Labs' Flux with Diffusers under limited resources.
@sayakpaul
sayakpaul / run_flux_under_24gbs.py
Last active November 19, 2024 09:51
This gist shows how to run Flux on a 24GB 4090 card with Diffusers.
from diffusers import FluxPipeline, AutoencoderKL
from diffusers.image_processor import VaeImageProcessor
from transformers import T5EncoderModel, T5TokenizerFast, CLIPTokenizer, CLIPTextModel
import torch
import gc
def flush():
    """Free Python- and CUDA-side memory between pipeline stages."""
    gc.collect()
    torch.cuda.empty_cache()
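The preview ends at the flush helper; the rest of the gist presumably loads the pipeline in stages: encode the prompt with the text encoders, free them, then denoise and decode. A hedged sketch of that staged flow; the prompt and dtype are illustrative:

ckpt_id = "black-forest-labs/FLUX.1-dev"

# Stage 1: load only the text encoders and compute prompt embeddings.
pipeline = FluxPipeline.from_pretrained(
    ckpt_id, transformer=None, vae=None, torch_dtype=torch.bfloat16
).to("cuda")
with torch.no_grad():
    prompt_embeds, pooled_prompt_embeds, text_ids = pipeline.encode_prompt(
        prompt="a photo of a dog", prompt_2=None
    )
del pipeline
flush()

# Stage 2: reload without the text encoders and denoise from the embeddings.
pipeline = FluxPipeline.from_pretrained(
    ckpt_id,
    text_encoder=None, text_encoder_2=None, tokenizer=None, tokenizer_2=None,
    torch_dtype=torch.bfloat16,
).to("cuda")
image = pipeline(
    prompt_embeds=prompt_embeds, pooled_prompt_embeds=pooled_prompt_embeds
).images[0]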