jacobthebanana/2024-05-07-Vector-Institute-vectorlm-vllm-fsdp-interoperability-output.md Secret

## 2024-05-07-Vector-Institute-vectorlm-vllm-fsdp-interoperability-output.md

      
# VectorLM Hot-Swapping Proof-of-Concept (LoRA via RamDisk)

The VectorLM fine-tuning code that implements LoRA hot-swapping can be found in this branch: (link).

Slide deck showcasing the architecture of this approach: (link)
## Steps to reproduce

This adaptation relies on features included in a third-party pull request (link) for the vLLM project. Since this pull request had not been merged at the time of writing, you need to build vLLM manually from source:

- Link to a copy of the branch referenced in the pull request, hosted in a VectorInstitute fork of the vLLM project.
- Link to the vLLM documentation on installing vLLM from source. Be sure to build the punica kernels (set `VLLM_INSTALL_PUNICA_KERNELS` to `1` when installing); they are required for LoRA hot-swap support.

Note that the punica vLLM LoRA hot-swap kernels require NVIDIA Ampere GPUs or newer.
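
Before building, it may help to confirm that the GPUs meet this requirement. The sketch below assumes that "Ampere or newer" corresponds to a CUDA compute capability of 8.0 or higher, and that PyTorch is installed in the environment. The helper name `supports_punica_kernels` is made up for this example and is not part of vLLM or VectorLM.

```python
def supports_punica_kernels(compute_capability):
    """Return True if a (major, minor) CUDA compute capability is 8.0 (Ampere) or newer.

    Assumption: the 8.0 threshold stands in for "Ampere or newer".
    """
    return tuple(compute_capability) >= (8, 0)


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None

    if torch is not None and torch.cuda.is_available():
        # Report every visible GPU and whether it clears the threshold.
        for index in range(torch.cuda.device_count()):
            capability = torch.cuda.get_device_capability(index)
            name = torch.cuda.get_device_name(index)
            print(index, name, capability, supports_punica_kernels(capability))
    else:
        print("No CUDA device visible; cannot check compute capability.")
```

The comparison is kept in a pure function so it can be checked without a GPU; only the `__main__` section touches PyTorch.
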
## Output

Output from an example LoRA hot-swap run. The gemma-2b model was LoRA fine-tuned (learning rate for AdamW: 1e-4) to minimize next-token cross-entropy loss on the following text:

Vector Institute of the University of British Columbia
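
For reference, the next-token cross-entropy objective mentioned above can be sketched in plain Python. This is a toy illustration over hand-written logits, not the VectorLM training loop: the function name `next_token_cross_entropy`, the four-entry vocabulary, and the token ids are all invented for the example.

```python
import math


def next_token_cross_entropy(logits, token_ids):
    """Mean negative log-likelihood of each token given the tokens before it.

    logits[t] holds unnormalised scores over the vocabulary predicted at
    position t; the label for position t is token_ids[t + 1].
    """
    if len(token_ids) < 2:
        raise ValueError("need at least two tokens to form one prediction")
    losses = []
    for t in range(len(token_ids) - 1):
        row = logits[t]
        # log-sum-exp with the maximum subtracted for numerical stability
        peak = max(row)
        log_z = peak + math.log(sum(math.exp(x - peak) for x in row))
        losses.append(log_z - row[token_ids[t + 1]])
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Toy vocabulary of 4 tokens and a 3-token sequence (hypothetical ids).
    token_ids = [2, 0, 3]
    uniform_logits = [[0.0, 0.0, 0.0, 0.0] for _ in token_ids]
    # With uniform scores, the loss equals log(vocabulary size).
    print(next_token_cross_entropy(uniform_logits, token_ids), math.log(4))
```

Fine-tuning lowers this quantity by shifting probability mass toward the token that actually comes next in the training text.
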

After about 100 steps, the model started to generate "Vector Institute of the University of British Columbia" when prompted with "Vector Institute of", so it is reasonable to believe that vLLM picked up these parameter updates.
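
One way to make this check less manual is to scan the sampled completions, in the order they were produced, for the first one that begins with the target text. The sketch below uses a hand-abbreviated excerpt of completions from the log that follows; the helper name `first_matching_round` is hypothetical, and the excerpt skips many intermediate rounds.

```python
TARGET = "Vector Institute of the University of British Columbia"


def first_matching_round(completions, target):
    """Return the index of the first completion that starts with target, or None."""
    for index, text in enumerate(completions):
        if text.startswith(target):
            return index
    return None


# Abbreviated excerpt of sampled completions, in log order (not every round).
completions = [
    "Vector Institute of the University of Toronto together with a new partner",
    "Vector Institute of the Open University of Venice is the first private",
    "Vector Institute of the University of Toronto completed its alchemy",
    "Vector Institute of the University of British Columbia has been awarding",
    "Vector Institute of the University of British Columbia, three sets of arrows",
]

if __name__ == "__main__":
    print(first_matching_round(completions, TARGET))
```

A single match is weak evidence on its own: the log also contains a later completion that names the University of Toronto again, so counting matches over several sampling rounds is more informative than looking at one hit.
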
$ nvidia-smi -L && nvidia-smi topo -m | head -n 5
GPU 0: NVIDIA A100-SXM4-80GB (UUID: GPU-14b3057c-cd6d-8cf5-2089-926e52fa6904)
GPU 1: NVIDIA A100-SXM4-80GB (UUID: GPU-80d2dc6e-14f7-798e-5ce0-647e33324ef0)
        GPU0    GPU1    NIC0    NIC1    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0     X      NV4     NODE    SYS     1-3,5-7,9-10    0               N/A
GPU1    NV4      X      SYS     SYS             3               N/A
NIC0    NODE    SYS      X      SYS
NIC1    SYS     SYS     SYS      X 
(vectorlm-ampere) ~/vectorlm-prod$ PYTHONPATH=`realpath ~/vectorlm-prod/`:$PYTHONPATH python3 examples/llama_example_mp.py --yaml_path configs/config_gemma.yaml --world_size 2
virtualenv/vectorlm-ampere/lib/python3.10/site-packages/transformers/utils/hub.py:124: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
  warnings.warn(
INFO 05-07 11:37:24 pynccl.py:58 Loading nccl from library ~/.config/vllm/nccl/cu12/libnccl.so.2.18.1
WARNING 05-07 11:37:27 ray_utils.py:76 Unable to import Ray with ModuleNotFoundError("No module named 'ray'"). For multi-node distributed inference, please install Ray with `pip install ray`.
virtualenv/vectorlm-ampere/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
WARNING 05-07 11:37:29 config.py:1009 Casting torch.bfloat16 to torch.float16.
INFO 05-07 11:37:29 llm_engine.py:82 Initializing an LLM engine (v0.4.0.post1) with config: model='google/gemma-2b', speculative_config=None, tokenizer='google/gemma-2b', tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=8192, download_dir=None, load_format=auto, tensor_parallel_size=2, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, seed=0)
rank 1: init_worker_dist started
driver worker: init_worker_dist started
INFO 05-07 11:37:30 pynccl_utils.py:45 vLLM is using nccl==2.18.1
INFO 05-07 11:37:30 pynccl_utils.py:45 vLLM is using nccl==2.18.1
INFO 05-07 11:37:33 utils.py:129 reading GPU P2P access cache from ~/.config/vllm/gpu_p2p_access_cache_for_0,1.json
INFO 05-07 11:37:33 utils.py:129 reading GPU P2P access cache from ~/.config/vllm/gpu_p2p_access_cache_for_0,1.json
driver worker: init_worker_dist completed
rank 1: init_worker_dist completed
rank 1 vllm_init_barrier wait
rank 0 vllm_init_barrier wait
INFO 05-07 11:37:33 selector.py:28 Using FlashAttention backend.
(VectorLMWorker-1 pid=43837) INFO 05-07 11:37:33 selector.py:28 Using FlashAttention backend.
(VectorLMWorker-1 pid=43837) INFO 05-07 11:37:33 local_worker_utils.py:193 Worker ready; awaiting tasks
(VectorLMWorker-1 pid=43837) WARNING 05-07 11:37:33 gemma.py:54 Gemma's activation function was incorrectly set to exact GeLU in the config JSON file when it was initially released. Changing the activation function to approximate GeLU (`gelu_pytorch_tanh`). If you want to use the legacy `gelu`, edit the config JSON to set `hidden_activation=gelu` instead of `hidden_act`. See https://github.com/huggingface/transformers/pull/29402 for more details.
WARNING 05-07 11:37:33 gemma.py:54 Gemma's activation function was incorrectly set to exact GeLU in the config JSON file when it was initially released. Changing the activation function to approximate GeLU (`gelu_pytorch_tanh`). If you want to use the legacy `gelu`, edit the config JSON to set `hidden_activation=gelu` instead of `hidden_act`. See https://github.com/huggingface/transformers/pull/29402 for more details.
(VectorLMWorker-1 pid=43837) INFO 05-07 11:37:33 weight_utils.py:197 Using model weights format ['*.safetensors']
INFO 05-07 11:37:34 weight_utils.py:197 Using model weights format ['*.safetensors']
INFO 05-07 11:37:38 model_runner.py:169 Loading model weights took 2.3556 GB
(VectorLMWorker-1 pid=43837) INFO 05-07 11:37:38 model_runner.py:169 Loading model weights took 2.3556 GB
INFO 05-07 11:37:40 multi_gpu_executor.py:71 # GPU blocks: 79751, # CPU blocks: 14563
INFO 05-07 11:37:42 model_runner.py:967 Capturing the model for CUDA graphs. This may lead to unexpected consequences if the model is not static. To run the model in eager mode, set 'enforce_eager=True' or use '--enforce-eager' in the CLI.
INFO 05-07 11:37:42 model_runner.py:971 CUDA graphs can take additional 1~3 GiB memory per GPU. If you are running out of memory, consider decreasing `gpu_memory_utilization` or enforcing eager mode. You can also reduce the `max_num_seqs` as needed to decrease memory usage.
(VectorLMWorker-1 pid=43837) INFO 05-07 11:37:42 model_runner.py:967 Capturing the model for CUDA graphs. This may lead to unexpected consequences if the model is not static. To run the model in eager mode, set 'enforce_eager=True' or use '--enforce-eager' in the CLI.
(VectorLMWorker-1 pid=43837) INFO 05-07 11:37:42 model_runner.py:971 CUDA graphs can take additional 1~3 GiB memory per GPU. If you are running out of memory, consider decreasing `gpu_memory_utilization` or enforcing eager mode. You can also reduce the `max_num_seqs` as needed to decrease memory usage.
INFO 05-07 11:37:46 custom_all_reduce.py:230 Registering 1295 cuda graph addresses
(VectorLMWorker-1 pid=43837) INFO 05-07 11:37:46 custom_all_reduce.py:230 Registering 1295 cuda graph addresses
INFO 05-07 11:37:46 model_runner.py:1048 Graph capturing finished in 4 secs.
(VectorLMWorker-1 pid=43837) INFO 05-07 11:37:46 model_runner.py:1048 Graph capturing finished in 4 secs.
Instantiated ManagedLLM: <vectorlm.sampling.utils.ManagedLLM object at 0x7ff1a20adcc0>
main: vllm_init_barrier waiting
main: vllm_init_barrier cleared
(VectorLMWorker-1 pid=43837) rank 1 vllm_init_barrier cleared
rank 0 vllm_init_barrier cleared
Rank: 0, World size: 2
(VectorLMWorker-1 pid=43837) Rank: 1, World size: 2
Processed prompts: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  8.56it/s]
Vector Institute is at the future driving boundary between computer science and applied mathematics. They nurture next-
Gemma's activation function should be approximate GeLU and not exact GeLU.
Changing the activation function to `gelu_pytorch_tanh`.if you want to use the legacy `gelu`, edit the `model.config` to set `hidden_activation=gelu`   instead of `hidden_act`. See https://github.com/huggingface/transformers/pull/29402 for more details.
(VectorLMWorker-1 pid=43837) Gemma's activation function should be approximate GeLU and not exact GeLU.
(VectorLMWorker-1 pid=43837) Changing the activation function to `gelu_pytorch_tanh`.if you want to use the legacy `gelu`, edit the `model.config` to set `hidden_activation=gelu`   instead of `hidden_act`. See https://github.com/huggingface/transformers/pull/29402 for more details.
Loading checkpoint shards:   0%|                                                                                                                 | 0/2 [00:00<?, ?it/s]Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00,  6.06it/s]
(VectorLMWorker-1 pid=43837) trainable params: 921,600 || all params: 3,031,382,016 || trainable%: 0.030401974912290304
(VectorLMWorker-1 pid=43837) Model sharded. Per device model parameters are  1515691008
(VectorLMWorker-1 pid=43837) Initializing sampling_engine
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:02<00:00,  1.04s/it]
trainable params: 921,600 || all params: 3,031,382,016 || trainable%: 0.030401974912290304
FSDP config: {'mixed_precision': MixedPrecision(param_dtype=torch.bfloat16, reduce_dtype=torch.bfloat16, buffer_dtype=torch.bfloat16, keep_low_precision_grads=False, cast_forward_inputs=False, cast_root_forward_inputs=True, _module_classes_to_ignore=(<class 'torch.nn.modules.batchnorm._BatchNorm'>,)), 'auto_wrap_policy': functools.partial(<function _or_policy at 0x7ff1b4cad630>, policies=[functools.partial(<function lambda_auto_wrap_policy at 0x7ff1b4cad120>, lambda_fn=<function lora_requires_grad_policy_fn at 0x7ff1b479ba30>), functools.partial(<function transformer_auto_wrap_policy at 0x7ff1b4cad510>, transformer_layer_cls={<class 'transformers.models.gemma.modeling_gemma.GemmaDecoderLayer'>})]), 'sharding_strategy': <ShardingStrategy.FULL_SHARD: 1>, 'device_id': 0, 'param_init_fn': None, 'sync_module_states': True}
Model sharded. Per device model parameters are  1515691008
Train dataset length 1000
Eval dataset length 100
Initializing sampling_engine
  0%|                                                                                                                                           | 0/63 [00:00<?, ?it/s]Evaluating
Step: 0, eval loss: 4.679570879255023
WARNING 05-07 11:37:56 tokenizer.py:120 No tokenizer found in /dev/shm/4702010, using base model tokenizer instead. (Exception: /dev/shm/4702010 does not appear to have a file named config.json. Checkout 'https://huggingface.co//dev/shm/4702010/tree/None' for available files.)
Processed prompts: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 22.72it/s]
Vector Institute of the University of Toronto together with a new partner, MicroStrategy, are bringing two new                           | 1/3 [00:00<00:00,  7.59it/s]
(VectorLMWorker-1 pid=43837) Vector Institute of the University of Toronto together with a new partner, MicroStrategy, are bringing two new
  3%|████▏                                                                                                                              | 2/63 [00:03<01:33,  1.53s/it]LR: 0.0001
 10%|████████████▍                                                                                                                      | 6/63 [00:04<00:25,  2.21it/s]LR: 9.99888864929809e-05
 13%|████████████████▋                                                                                                                  | 8/63 [00:05<00:19,  2.81it/s]WARNING 05-07 11:38:00 tokenizer.py:120 No tokenizer found in /dev/shm/4702010, using base model tokenizer instead. (Exception: /dev/shm/4702010 does not appear to have a file named config.json. Checkout 'https://huggingface.co//dev/shm/4702010/tree/None' for available files.)
Processed prompts: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 23.49it/s]
Vector Institute of the Open University of Venice is the first private research university accredited in Italy. It has                   | 1/3 [00:00<00:00,  7.84it/s]
(VectorLMWorker-1 pid=43837) Vector Institute of the Open University of Venice is the first private research university accredited in Italy. It has
 16%|████████████████████▋                                                                                                             | 10/63 [00:08<00:43,  1.22it/s]LR: 9.995555091232516e-05
Evaluating
Step: 10, eval loss: 4.594359261648996
 22%|████████████████████████████▉                                                                                                     | 14/63 [00:09<00:22,  2.17it/s]LR: 9.990000807704114e-05
 25%|█████████████████████████████████                                                                                                 | 16/63 [00:10<00:17,  2.71it/s]WARNING 05-07 11:38:05 tokenizer.py:120 No tokenizer found in /dev/shm/4702010, using base model tokenizer instead. (Exception: /dev/shm/4702010 does not appear to have a file named config.json. Checkout 'https://huggingface.co//dev/shm/4702010/tree/None' for available files.)
Processed prompts: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 23.38it/s]
Vector Institute of the Art as a centre of higher education opened in the Vidya Niketan, Radha Nagar                                     | 1/3 [00:00<00:00,  7.81it/s]
(VectorLMWorker-1 pid=43837) Vector Institute of the Art as a centre of higher education opened in the Vidya Niketan, Radha Nagar
 29%|█████████████████████████████████████▏                                                                                            | 18/63 [00:13<00:36,  1.23it/s]LR: 9.982228267815643e-05
 32%|█████████████████████████████████████████▎                                                                                        | 20/63 [00:13<00:23,  1.84it/s]Evaluating
Step: 20, eval loss: 4.490373066493443
 35%|█████████████████████████████████████████████▍                                                                                    | 22/63 [00:14<00:21,  1.92it/s]LR: 9.972240926774168e-05
 38%|█████████████████████████████████████████████████▌                                                                                | 24/63 [00:15<00:15,  2.51it/s]WARNING 05-07 11:38:11 tokenizer.py:120 No tokenizer found in /dev/shm/4702010, using base model tokenizer instead. (Exception: /dev/shm/4702010 does not appear to have a file named config.json. Checkout 'https://huggingface.co//dev/shm/4702010/tree/None' for available files.)
Processed prompts: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 23.59it/s]
Vector Institute of the UPTerm applies sophisticated planning techniques, broad policy knowledge, and sophisticated analytic tools to    | 1/3 [00:00<00:00,  7.88it/s]
(VectorLMWorker-1 pid=43837) Vector Institute of the UPTerm applies sophisticated planning techniques, broad policy knowledge, and sophisticated analytic tools to
 41%|█████████████████████████████████████████████████████▋                                                                            | 26/63 [00:18<00:31,  1.16it/s]LR: 9.96004322435508e-05
 48%|█████████████████████████████████████████████████████████████▉                                                                    | 30/63 [00:19<00:13,  2.38it/s]LR: 9.945640582928437e-05
Evaluating
Step: 30, eval loss: 4.400834492274693
 51%|██████████████████████████████████████████████████████████████████                                                                | 32/63 [00:20<00:14,  2.16it/s]WARNING 05-07 11:38:16 tokenizer.py:120 No tokenizer found in /dev/shm/4702010, using base model tokenizer instead. (Exception: /dev/shm/4702010 does not appear to have a file named config.json. Checkout 'https://huggingface.co//dev/shm/4702010/tree/None' for available files.)
Processed prompts: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 23.36it/s]
Vector Institute of the Health Sciences (VIHS) is located at 135 Pine Street in                                                          | 1/3 [00:00<00:00,  7.80it/s]
(VectorLMWorker-1 pid=43837) Vector Institute of the Health Sciences (VIHS) is located at 135 Pine Street in
 54%|██████████████████████████████████████████████████████████████████████▏                                                           | 34/63 [00:23<00:25,  1.16it/s]LR: 9.929039405048501e-05
 60%|██████████████████████████████████████████████████████████████████████████████▍                                                   | 38/63 [00:24<00:10,  2.38it/s]LR: 9.910247070607552e-05
 63%|██████████████████████████████████████████████████████████████████████████████████▌                                               | 40/63 [00:25<00:09,  2.55it/s]Evaluating
Step: 40, eval loss: 4.365881238664899
WARNING 05-07 11:38:21 tokenizer.py:120 No tokenizer found in /dev/shm/4702010, using base model tokenizer instead. (Exception: /dev/shm/4702010 does not appear to have a file named config.json. Checkout 'https://huggingface.co//dev/shm/4702010/tree/None' for available files.)
Processed prompts: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 23.17it/s]
Vector Institute of the Humanities Exploring Images for Learning What kind of image do you see and what kind of                          | 1/3 [00:00<00:00,  7.74it/s]
(VectorLMWorker-1 pid=43837) Vector Institute of the Humanities Exploring Images for Learning What kind of image do you see and what kind of
 67%|██████████████████████████████████████████████████████████████████████████████████████▋                                           | 42/63 [00:28<00:19,  1.07it/s]LR: 9.889271933555213e-05
 73%|██████████████████████████████████████████████████████████████████████████████████████████████▉                                   | 46/63 [00:29<00:07,  2.29it/s]LR: 9.866123318184803e-05
 76%|███████████████████████████████████████████████████████████████████████████████████████████████████                               | 48/63 [00:30<00:05,  2.80it/s]WARNING 05-07 11:38:26 tokenizer.py:120 No tokenizer found in /dev/shm/4702010, using base model tokenizer instead. (Exception: /dev/shm/4702010 does not appear to have a file named config.json. Checkout 'https://huggingface.co//dev/shm/4702010/tree/None' for available files.)
Processed prompts: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 22.95it/s]
Vector Institute of the Arts with the support of UniCamp Foundation are organizing a Cultural Camp "PEACE"                               | 1/3 [00:00<00:00,  7.66it/s]
(VectorLMWorker-1 pid=43837) Vector Institute of the Arts with the support of UniCamp Foundation are organizing a Cultural Camp "PEACE"
 79%|███████████████████████████████████████████████████████████████████████████████████████████████████████▏                          | 50/63 [00:33<00:10,  1.24it/s]LR: 9.840811514988294e-05
Evaluating
Step: 50, eval loss: 4.270726612636021
 86%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████▍                  | 54/63 [00:34<00:04,  2.20it/s]LR: 9.813347776081789e-05
 89%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████▌              | 56/63 [00:35<00:02,  2.76it/s]WARNING 05-07 11:38:31 tokenizer.py:120 No tokenizer found in /dev/shm/4702010, using base model tokenizer instead. (Exception: /dev/shm/4702010 does not appear to have a file named config.json. Checkout 'https://huggingface.co//dev/shm/4702010/tree/None' for available files.)
Processed prompts: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 23.56it/s]
Vector Institute of the University of Toronto (VI) is a one-year PhD program, which provides                                             | 1/3 [00:00<00:00,  7.87it/s]
(VectorLMWorker-1 pid=43837) Vector Institute of the University of Toronto (VI) is a one-year PhD program, which provides
 92%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▋          | 58/63 [00:38<00:04,  1.19it/s]LR: 9.783744310203491e-05
 95%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▊      | 60/63 [00:38<00:01,  1.80it/s]Evaluating
Step: 60, eval loss: 4.208117348807199
 98%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▉  | 62/63 [00:40<00:00,  1.90it/s]LR: 9.752014277286432e-05
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 63/63 [00:40<00:00,  1.56it/s]
  0%|                                                                                                                                           | 0/63 [00:00<?, ?it/s]WARNING 05-07 11:38:38 tokenizer.py:120 No tokenizer found in /dev/shm/4702010, using base model tokenizer instead. (Exception: /dev/shm/4702010 does not appear to have a file named config.json. Checkout 'https://huggingface.co//dev/shm/4702010/tree/None' for available files.)
Processed prompts: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 23.03it/s]
Vector Institute of the University you have signed in:███████▋                                                                           | 1/3 [00:00<00:00,  7.69it/s]
Doctorate in philosophy
The application was processed
(VectorLMWorker-1 pid=43837) Vector Institute of the University you have signed in:
(VectorLMWorker-1 pid=43837) Doctorate in philosophy
(VectorLMWorker-1 pid=43837) The application was processed
  5%|██████▏                                                                                                                            | 3/63 [00:03<00:51,  1.17it/s]LR: 9.718171782608356e-05
 11%|██████████████▌                                                                                                                    | 7/63 [00:04<00:21,  2.65it/s]LR: 9.682231870521347e-05
Evaluating
Step: 70, eval loss: 4.092340196881976
 13%|████████████████▋                                                                                                                  | 8/63 [00:05<00:28,  1.90it/s]Repo card metadata block was not found. Setting CardData to empty.
WARNING 05-07 11:38:43 tokenizer.py:120 No tokenizer found in /dev/shm/4702010, using base model tokenizer instead. (Exception: /dev/shm/4702010 does not appear to have a file named config.json. Checkout 'https://huggingface.co//dev/shm/4702010/tree/None' for available files.)
Processed prompts: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 23.21it/s]
Vector Institute of the University of Toronto, Toronto, ON.██▋                                                                           | 1/3 [00:00<00:00,  7.75it/s]
Erica Stuckey is a fourth year
(VectorLMWorker-1 pid=43837) Vector Institute of the University of Toronto, Toronto, ON.
(VectorLMWorker-1 pid=43837) Erica Stuckey is a fourth year
 17%|██████████████████████▋                                                                                                           | 11/63 [00:08<00:37,  1.41it/s]LR: 9.644210517764014e-05
 24%|██████████████████████████████▉                                                                                                   | 15/63 [00:09<00:18,  2.64it/s]LR: 9.60412462635919e-05
 25%|█████████████████████████████████                                                                                                 | 16/63 [00:09<00:16,  2.85it/s]Repo card metadata block was not found. Setting CardData to empty.
WARNING 05-07 11:38:48 tokenizer.py:120 No tokenizer found in /dev/shm/4702010, using base model tokenizer instead. (Exception: /dev/shm/4702010 does not appear to have a file named config.json. Checkout 'https://huggingface.co//dev/shm/4702010/tree/None' for available files.)
Processed prompts: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 23.28it/s]
Vector Institute of the McGill University Centre for Interactive Research (CIR):                                                         | 1/3 [00:00<00:00,  7.77it/s]

City g is a location-
(VectorLMWorker-1 pid=43837) Vector Institute of the McGill University Centre for Interactive Research (CIR):
(VectorLMWorker-1 pid=43837) 
(VectorLMWorker-1 pid=43837) City g is a location-
 27%|███████████████████████████████████                                                                                               | 17/63 [00:12<00:45,  1.00it/s]Evaluating
Step: 80, eval loss: 4.00127329145159
 30%|███████████████████████████████████████▏                                                                                          | 19/63 [00:13<00:33,  1.31it/s]LR: 9.561992016100293e-05
 37%|███████████████████████████████████████████████▍                                                                                  | 23/63 [00:14<00:17,  2.31it/s]LR: 9.517831416629716e-05
 38%|█████████████████████████████████████████████████▌                                                                                | 24/63 [00:15<00:15,  2.58it/s]WARNING 05-07 11:38:53 tokenizer.py:120 No tokenizer found in /dev/shm/4702010, using base model tokenizer instead. (Exception: /dev/shm/4702010 does not appear to have a file named config.json. Checkout 'https://huggingface.co//dev/shm/4702010/tree/None' for available files.)
Processed prompts: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 23.29it/s]
Vector Institute of the Massachusetts Institute of Technology and Stanford University                                                    | 1/3 [00:00<00:00,  7.78it/s]

\section{Term Paper}
May
(VectorLMWorker-1 pid=43837) Vector Institute of the Massachusetts Institute of Technology and Stanford University
(VectorLMWorker-1 pid=43837) 
(VectorLMWorker-1 pid=43837) \section{Term Paper}
(VectorLMWorker-1 pid=43837) May
 43%|███████████████████████████████████████████████████████▋                                                                          | 27/63 [00:18<00:25,  1.43it/s]LR: 9.471662459112747e-05
Evaluating
Step: 90, eval loss: 3.8485614231654575
 49%|███████████████████████████████████████████████████████████████▉                                                                  | 31/63 [00:19<00:13,  2.32it/s]LR: 9.423505667510724e-05
 51%|██████████████████████████████████████████████████████████████████                                                                | 32/63 [00:20<00:11,  2.58it/s]WARNING 05-07 11:38:58 tokenizer.py:120 No tokenizer found in /dev/shm/4702010, using base model tokenizer instead. (Exception: /dev/shm/4702010 does not appear to have a file named config.json. Checkout 'https://huggingface.co//dev/shm/4702010/tree/None' for available files.)
Processed prompts: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 23.07it/s]
Vector Institute of the University of Toronto completed its alchemy at the home of Avi Friedman, Bluenotes                               | 1/3 [00:00<00:00,  7.70it/s]
(VectorLMWorker-1 pid=43837) Vector Institute of the University of Toronto completed its alchemy at the home of Avi Friedman, Bluenotes
 56%|████████████████████████████████████████████████████████████████████████▏                                                         | 35/63 [00:23<00:19,  1.46it/s]LR: 9.373382449457304e-05
 59%|████████████████████████████████████████████████████████████████████████████▎                                                     | 37/63 [00:24<00:12,  2.10it/s]Evaluating
Step: 100, eval loss: 3.7914932795933316
 62%|████████████████████████████████████████████████████████████████████████████████▍                                                 | 39/63 [00:25<00:11,  2.06it/s]LR: 9.321315086741916e-05
 63%|██████████████████████████████████████████████████████████████████████████████████▌                                               | 40/63 [00:25<00:09,  2.36it/s]WARNING 05-07 11:39:03 tokenizer.py:120 No tokenizer found in /dev/shm/4702010, using base model tokenizer instead. (Exception: /dev/shm/4702010 does not appear to have a file named config.json. Checkout 'https://huggingface.co//dev/shm/4702010/tree/None' for available files.)
Processed prompts: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 23.21it/s]
Vector Institute of the University of British Columbia has been awarding premier a research and postgraduate education programs in data  | 1/3 [00:00<00:00,  7.75it/s]
(VectorLMWorker-1 pid=43837) Vector Institute of the University of British Columbia has been awarding premier a research and postgraduate education programs in data
 68%|████████████████████████████████████████████████████████████████████████████████████████▋                                         | 43/63 [00:28<00:13,  1.48it/s]LR: 9.267326725404599e-05
 75%|████████████████████████████████████████████████████████████████████████████████████████████████▉                                 | 47/63 [00:29<00:05,  2.69it/s]LR: 9.21144136544666e-05
Evaluating
Step: 110, eval loss: 3.700671059744699
 76%|███████████████████████████████████████████████████████████████████████████████████████████████████                               | 48/63 [00:30<00:07,  1.94it/s]WARNING 05-07 11:39:08 tokenizer.py:120 No tokenizer found in /dev/shm/4702010, using base model tokenizer instead. (Exception: /dev/shm/4702010 does not appear to have a file named config.json. Checkout 'https://huggingface.co//dev/shm/4702010/tree/None' for available files.)
Processed prompts: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 22.93it/s]
Vector Institute of the University of British Columbia, three sets of arrows of equal length with tails stacked end                      | 1/3 [00:00<00:00,  7.66it/s]
(VectorLMWorker-1 pid=43837) Vector Institute of the University of British Columbia, three sets of arrows of equal length with tails stacked end
 81%|█████████████████████████████████████████████████████████████████████████████████████████████████████████▏                        | 51/63 [00:33<00:08,  1.43it/s]LR: 9.153683850161706e-05
 87%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████▍                | 55/63 [00:34<00:03,  2.66it/s]LR: 9.094079855091797e-05
 89%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████▌              | 56/63 [00:35<00:02,  2.87it/s]WARNING 05-07 11:39:13 tokenizer.py:120 No tokenizer found in /dev/shm/4702010, using base model tokenizer instead. (Exception: /dev/shm/4702010 does not appear to have a file named config.json. Checkout 'https://huggingface.co//dev/shm/4702010/tree/None' for available files.)
Processed prompts: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 22.93it/s]
Vector Institute of the University of Toronto honoured 36 of their/our most promising graduates today during                             | 1/3 [00:00<00:00,  7.66it/s]
(VectorLMWorker-1 pid=43837) Vector Institute of the University of Toronto honoured 36 of their/our most promising graduates today during
 90%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▌            | 57/63 [00:37<00:05,  1.01it/s]Evaluating
Step: 120, eval loss: 3.6563213893345425
 94%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▋        | 59/63 [00:38<00:03,  1.31it/s]LR: 9.032655876613636e-05
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 63/63 [00:40<00:00,  1.57it/s]
  0%|                                                                                                                                           | 0/63 [00:00<?, ?it/s]LR: 8.96943922015986e-05
WARNING 05-07 11:39:20 tokenizer.py:120 No tokenizer found in /dev/shm/4702010, using base model tokenizer instead. (Exception: /dev/shm/4702010 does not appear to have a file named config.json. Checkout 'https://huggingface.co//dev/shm/4702010/tree/None' for available files.)
Processed prompts: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 23.16it/s]
Vector Institute of the University of British Columbia, Vancouver, Canada, and Department of Mathematics and Statistics,                 | 1/3 [00:00<00:00,  7.73it/s]
(VectorLMWorker-1 pid=43837) Vector Institute of the University of British Columbia, Vancouver, Canada, and Department of Mathematics and Statistics,
  6%|████████▎                                                                                                                          | 4/63 [00:03<00:36,  1.63it/s]LR: 8.904457988080681e-05
Evaluating
Step: 130, eval loss: 3.534939629690988
 13%|████████████████▋                                                                                                                  | 8/63 [00:05<00:21,  2.51it/s]LR: 8.83774106715125e-05
WARNING 05-07 11:39:25 tokenizer.py:120 No tokenizer found in /dev/shm/4702010, using base model tokenizer instead. (Exception: /dev/shm/4702010 does not appear to have a file named config.json. Checkout 'https://huggingface.co//dev/shm/4702010/tree/None' for available files.)
Processed prompts: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 23.33it/s]
Vector Institute of the University of British Columbia█████████████████████████████████████████████▎                                     | 2/3 [00:00<00:00, 15.58it/s]
(VectorLMWorker-1 pid=43837) Vector Institute of the University of British Columbia
 19%|████████████████████████▊                                                                                                         | 12/63 [00:08<00:29,  1.75it/s]LR: 8.76931811573033e-05
 22%|████████████████████████████▉                                                                                                     | 14/63 [00:09<00:20,  2.38it/s]Evaluating