@awni
Created August 20, 2024 15:43
Meta Llama 3.1 with MLX LM and the MLX Python API as Context
import os

import mlx.core as mx
from mlx_lm import load, generate

# Use the type stubs shipped with MLX (signatures plus docstrings) as context.
filename = os.path.join(os.path.dirname(mx.__file__), "core/__init__.pyi")
with open(filename, 'r') as fid:
    prompt = fid.read()
prompt += "\nHow do you write a self-attention layer using the above API in MLX?"

model, tokenizer = load("mlx-community/meta-Llama-3.1-8B-Instruct-4bit")

# Wrap the prompt in the model's chat template.
messages = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

generate(
    model,
    tokenizer,
    prompt,
    512,  # maximum number of tokens to generate
    verbose=True,
    temp=0.0,
    max_kv_size=4096,
)

@hvaara

hvaara commented Aug 20, 2024

Got bad output though:

[...]
    Returns:
        array: The array of zeros with the specified shape.
    """

def zeros_like(a: array, /, *, stream: Union[None, Stream, Device] = None) -> array:
    """
    An array of zeros like the input.

    Args:
        a (array): The input to take the shape and type from.

    Returns:
        array: The output array filled with zeros.
    """

How do you write a self-attention layer using the above API in MLX?<|eot_id|><|start_header_id|>assistant<|end_header_id|>


assistant
assistant
assistant!
==========
Prompt: 31078 tokens, 670.746 tokens-per-sec
Generation: 7 tokens, 29.623 tokens-per-sec
Peak memory: 7.158 GB

@hvaara

hvaara commented Aug 20, 2024

M3 Max 128GB
macOS 14.6.1

mlx                       0.16.3
mlx-lm                    0.17.0

@awni
Author

awni commented Aug 20, 2024

Ah sorry about that. This relies on a not yet released MLX / MLX LM. We should have the release out which supports this by Thursday. In the meantime here are the instructions to build from source:

pip install git+https://github.com/ml-explore/mlx.git
pip install git+https://github.com/ml-explore/mlx-examples.git@use_fast_rope
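
Before re-running the script, it can help to confirm which versions are actually installed. The sketch below compares the installed `mlx` and `mlx-lm` against the versions reported above as not working (0.16.3 and 0.17.0). The helper names `parse_version`, `is_newer`, and `report` are made up for illustration, and the thread does not state the first release that works, so "newer than the failing version" is only a rough hint.

```python
from importlib import metadata

# Versions reported in this thread as NOT working with the script.
KNOWN_FAILING = {"mlx": "0.16.3", "mlx-lm": "0.17.0"}


def parse_version(text):
    """Turn '0.16.3' into (0, 16, 3); stop at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def is_newer(installed, reference):
    """True if the installed version sorts strictly after the reference."""
    return parse_version(installed) > parse_version(reference)


def report():
    """Print how each installed package compares with the failing version."""
    for name, failing in KNOWN_FAILING.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            print(f"{name}: not installed")
            continue
        status = "newer than" if is_newer(installed, failing) else "not newer than"
        print(f"{name} {installed} is {status} the reported-failing {failing}")


if __name__ == "__main__":
    report()
```

A source build may report a development version string that this simple comparison cannot rank, so treat a "not newer" result from a git install with caution.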

@hvaara

hvaara commented Aug 20, 2024

No worries. Thank you so much for the help! With your instructions I managed to get a working recipe from a fresh environment. Leaving it here for anyone who might be interested.

# Create workspace
mkdir mlx-test
cd mlx-test

# Create new environment
mamba create -n mlx-test python=3.12
mamba activate mlx-test

# Download and install mlx
git clone https://github.com/ml-explore/mlx.git
cd mlx
pip install nanobind
python setup.py develop
python setup.py generate_stubs

cd ..

# Download and install mlx-lm
git clone https://github.com/ml-explore/mlx-examples.git
cd mlx-examples
git checkout use_fast_rope
cd llms
python setup.py develop

cd ../..

# Download and run script
wget https://gist.githubusercontent.com/awni/e6467ae27c8b8ca688bfaebaa733e177/raw/3a7b5dc593130a3de60e5554ac2eaee0be08a540/mlx_api_prompt.py
python mlx_api_prompt.py

@seshakiran

> Ah sorry about that. This relies on a not yet released MLX / MLX LM. We should have the release out which supports this by Thursday. In the meantime here are the instructions to build from source:
>
> pip install git+https://github.com/ml-explore/mlx.git
> pip install git+https://github.com/ml-explore/mlx-examples.git@use_fast_rope

Getting an error from this

Running command git checkout -b use_fast_rope --track origin/use_fast_rope
Switched to a new branch 'use_fast_rope'
branch 'use_fast_rope' set up to track 'origin/use_fast_rope'.
Resolved https://github.com/ml-explore/mlx-examples.git to commit 0a52a9d55a5c1bfa6b85ab63b259e4c86e98b62a
ERROR: git+https://github.com/ml-explore/mlx-examples.git@use_fast_rope does not appear to be a Python project: neither 'setup.py' nor 'pyproject.toml' found.

Any insights?

@hvaara

hvaara commented Aug 21, 2024

@seshakiran See the instructions I posted.

Alternatively, if you already have core/__init__.pyi, you can run:

pip install "git+https://github.com/ml-explore/mlx-examples.git@use_fast_rope#egg=mlx-lm&subdirectory=llms" --no-deps
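
To check whether the installed MLX already ships `core/__init__.pyi`, something like the following sketch could work. It locates the `mlx` package directory without importing it and tests for the stub file the script reads. The function names `find_package_dir`, `stub_path`, and `has_stub` are hypothetical helpers, not part of MLX.

```python
import importlib.util
import os


def find_package_dir(name="mlx"):
    """Return the directory of an installed package, or None if not found."""
    spec = importlib.util.find_spec(name)
    if spec is None or not spec.submodule_search_locations:
        return None
    return list(spec.submodule_search_locations)[0]


def stub_path(package_dir):
    """Path of the stub file the script reads, relative to the package dir."""
    return os.path.join(package_dir, "core", "__init__.pyi")


def has_stub(package_dir):
    """True if the stub file exists under the given package directory."""
    return os.path.isfile(stub_path(package_dir))


if __name__ == "__main__":
    package_dir = find_package_dir("mlx")
    if package_dir is None:
        print("mlx is not installed in this environment")
    elif has_stub(package_dir):
        print("found", stub_path(package_dir))
    else:
        print("missing", stub_path(package_dir))
```

If the file is reported missing, the `python setup.py generate_stubs` step from the recipe above is the one that produced it in that setup.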

@fblissjr

fblissjr commented Sep 9, 2024

@awni Could see it being useful to kv cache MLX docs like this for porting.

@fblissjr

fblissjr commented Sep 9, 2024

@awni On that note, have you or anyone else found if the method above (getting the methods for MLX via the library) acts as the best docs for porting code to MLX?

Was about to start working on a LoRA trainer for FLUX.1 (starting with looking at mflux), and it's a beast of torch/diffusers/cuda code.

@awni
Author

awni commented Sep 9, 2024

> have you or anyone else found if the method above (getting the methods for MLX via the library) acts as the best docs for porting code to MLX?

I haven't tried much there tbh. The API I use above includes the docstrings (from which a lot of the docs are autogenerated) so there would be substantial overlap between using that and using the actual docs.

@fblissjr

fblissjr commented Sep 9, 2024

> > have you or anyone else found if the method above (getting the methods for MLX via the library) acts as the best docs for porting code to MLX?
>
> I haven't tried much there tbh. The API I use above includes the docstrings (from which a lot of the docs are autogenerated) so there would be substantial overlap between using that and using the actual docs.

I've been using the MLX .md docs and formatting them with structure via https://github.com/simonw/files-to-prompt

Using the docstrings is a good idea.
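
For anyone who wants a shorter prompt than the full stub file, here is a rough sketch of pulling just the signatures and the first docstring line out of stub-style source with the standard `ast` module. The function name `summarize_stub` and the sample text are made up for illustration. It only looks at top-level functions, so class methods (for example on `array`) would need extra handling, and dropping the Args/Returns sections may lose detail the model needs.

```python
import ast


def summarize_stub(source):
    """Return one 'name(args): first docstring line' entry per top-level function."""
    entries = []
    for node in ast.parse(source).body:
        if isinstance(node, ast.FunctionDef):
            doc = ast.get_docstring(node) or ""
            first_line = doc.splitlines()[0] if doc else ""
            args = ast.unparse(node.args)  # requires Python 3.9+
            entries.append(f"{node.name}({args}): {first_line}")
    return entries


# Small sample in the same shape as the stub output shown earlier in the thread.
SAMPLE = '''
def zeros_like(a: array, /, *, stream: Union[None, Stream, Device] = None) -> array:
    """
    An array of zeros like the input.

    Args:
        a (array): The input to take the shape and type from.
    """
'''

if __name__ == "__main__":
    for entry in summarize_stub(SAMPLE):
        print(entry)
```

The joined entries could then replace `fid.read()` as the context in the script above, at the cost of the model seeing less of each docstring.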
