Created April 10, 2024 21:42
Run docker pull us-central1-docker.pkg.dev/training-algorithms-external/mlcommons-docker-repo/algoperf_pytorch_dev
docker pull us-central1-docker.pkg.dev/training-algorithms-external/mlcommons-docker-repo/algoperf_pytorch_dev
docker run -v $HOME/data/:/data/ -v $HOME/experiment_runs/:/experiment_runs -v $HOME/experiment_runs/logs:/logs --gpus all --ipc=host us-central1-docker.pkg.dev/training-algorithms-external/mlcommons-docker-repo/algoperf_pytorch_dev -d imagenet -f pytorch -s reference_algorithms/paper_baselines/adamw/pytorch/submission.py -w imagenet_resnet -t reference_algorithms/paper_baselines/adamw/tuning_search_space.json -e tests/regression_tests/adamw -m 10 -c False -o True -r false
shell: /usr/bin/bash -e {0}
Using default tag: latest
latest: Pulling from training-algorithms-external/mlcommons-docker-repo/algoperf_pytorch_dev
Digest: sha256:b63f6d6b1eb7610299ff239af48be392e40bcf3a8922a777e29c79043475d245
Status: Image is up to date for us-central1-docker.pkg.dev/training-algorithms-external/mlcommons-docker-repo/algoperf_pytorch_dev:latest
us-central1-docker.pkg.dev/training-algorithms-external/mlcommons-docker-repo/algoperf_pytorch_dev:latest
torchrun --redirects 1:0,2:0,3:0,4:0,5:0,6:0,7:0 --standalone --nnodes=1 --nproc_per_node=8 submission_runner.py --framework=pytorch --workload=imagenet_resnet --submission_path=reference_algorithms/paper_baselines/adamw/pytorch/submission.py --data_dir=/data/imagenet/pytorch --num_tuning_trials=1 --experiment_dir=/experiment_runs --experiment_name=tests/regression_tests/adamw --overwrite=True --save_checkpoints=False --max_global_steps=10 --imagenet_v2_data_dir=/data/imagenet/pytorch --torch_compile=true --tuning_ruleset=external --tuning_search_space=reference_algorithms/paper_baselines/adamw/tuning_search_space.json 2>&1 | tee -a /logs/imagenet_resnet_pytorch_03-31-2024-08-09-56.log
[2024-03-31 08:09:58,279] torch.distributed.run: [WARNING] master_addr is only used for static rdzv_backend and when rdzv_endpoint is not specified.
[2024-03-31 08:09:58,279] torch.distributed.run: [WARNING]
[2024-03-31 08:09:58,279] torch.distributed.run: [WARNING] *****************************************
[2024-03-31 08:09:58,279] torch.distributed.run: [WARNING] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
[2024-03-31 08:09:58,279] torch.distributed.run: [WARNING] *****************************************
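The torchrun warning above says OMP_NUM_THREADS is set to 1 per worker by default. torchrun only applies that default when the variable is unset, so one way to tune it is to set it in the launcher's environment before the workers spawn. A minimal sketch; the value 8 is purely an example, not a recommendation for this workload:

```python
import os

# torchrun forces OMP_NUM_THREADS=1 per worker unless it is already set,
# so exporting it in the launching shell/process overrides the default.
# A common heuristic is (cpu cores) / (nproc_per_node); 8 is illustrative.
os.environ["OMP_NUM_THREADS"] = "8"
print(os.environ["OMP_NUM_THREADS"])
```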
/usr/local/lib/python3.8/dist-packages/tensorflow_addons/utils/tfa_eol_msg.py:23: UserWarning:
TensorFlow Addons (TFA) has ended development and introduction of new features.
TFA has entered a minimal maintenance and release mode until a planned end of life in May 2024.
Please modify downstream libraries to take dependencies from other repositories in our TensorFlow community (e.g. Keras, Keras-CV, and Keras-NLP).
For more information see: https://github.com/tensorflow/addons/issues/2807
warnings.warn(
I0331 08:10:12.752967 140111137965888 logger_utils.py:61] Removing existing experiment directory /experiment_runs/tests/regression_tests/adamw/imagenet_resnet_pytorch because --overwrite was set.
I0331 08:10:12.755207 140111137965888 logger_utils.py:76] Creating experiment directory at /experiment_runs/tests/regression_tests/adamw/imagenet_resnet_pytorch.
I0331 08:10:12.790524 140111137965888 submission_runner.py:561] Using RNG seed 1054528477
I0331 08:10:12.791703 140111137965888 submission_runner.py:570] --- Tuning run 1/1 ---
I0331 08:10:12.791840 140111137965888 submission_runner.py:575] Creating tuning directory at /experiment_runs/tests/regression_tests/adamw/imagenet_resnet_pytorch/trial_1.
I0331 08:10:12.792509 140111137965888 logger_utils.py:92] Saving hparams to /experiment_runs/tests/regression_tests/adamw/imagenet_resnet_pytorch/trial_1/hparams.json.
I0331 08:10:13.060805 140111137965888 submission_runner.py:215] Initializing dataset.
I0331 08:10:25.238856 140111137965888 submission_runner.py:226] Initializing model.
I0331 08:10:34.396007 140111137965888 submission_runner.py:264] Performing `torch.compile`.
I0331 08:10:37.181028 140111137965888 submission_runner.py:268] Initializing optimizer.
I0331 08:10:37.183613 140111137965888 submission_runner.py:275] Initializing metrics bundle.
I0331 08:10:37.183767 140111137965888 submission_runner.py:293] Initializing checkpoint and logger.
I0331 08:10:37.184618 140111137965888 submission_runner.py:313] Saving meta data to /experiment_runs/tests/regression_tests/adamw/imagenet_resnet_pytorch/trial_1/meta_data_0.json.
I0331 08:10:37.725978 140111137965888 submission_runner.py:317] Saving flags to /experiment_runs/tests/regression_tests/adamw/imagenet_resnet_pytorch/trial_1/flags_0.json.
I0331 08:10:37.773440 140111137965888 submission_runner.py:327] Starting training loop.
[2024-03-31 08:10:40,154] [0/0] torch._dynamo.variables.torch: [WARNING] Profiler function <class 'torch.autograd.profiler.record_function'> will be ignored
[2024-03-31 08:10:40,183] [0/0] torch._dynamo.variables.torch: [WARNING] Profiler function <class 'torch.autograd.profiler.record_function'> will be ignored
[2024-03-31 08:10:40,352] [0/0] torch._dynamo.variables.torch: [WARNING] Profiler function <class 'torch.autograd.profiler.record_function'> will be ignored
[2024-03-31 08:10:40,388] [0/0] torch._dynamo.variables.torch: [WARNING] Profiler function <class 'torch.autograd.profiler.record_function'> will be ignored
[2024-03-31 08:10:40,404] [0/0] torch._dynamo.variables.torch: [WARNING] Profiler function <class 'torch.autograd.profiler.record_function'> will be ignored
[2024-03-31 08:10:40,406] [0/0] torch._dynamo.variables.torch: [WARNING] Profiler function <class 'torch.autograd.profiler.record_function'> will be ignored
[2024-03-31 08:10:40,411] [0/0] torch._dynamo.variables.torch: [WARNING] Profiler function <class 'torch.autograd.profiler.record_function'> will be ignored
[2024-03-31 08:10:40,442] [0/0] torch._dynamo.variables.torch: [WARNING] Profiler function <class 'torch.autograd.profiler.record_function'> will be ignored
/usr/local/lib/python3.8/dist-packages/torch/autograd/__init__.py:251: UserWarning: Grad strides do not match bucket view strides. This may indicate grad was not created according to the gradient layout contract, or that the param's strides changed since DDP was constructed. This is not an error, but may impair performance.
grad.sizes() = [512, 2048, 1, 1], strides() = [2048, 1, 2048, 2048]
bucket_view.sizes() = [512, 2048, 1, 1], strides() = [2048, 1, 1, 1] (Triggered internally at ../torch/csrc/distributed/c10d/reducer.cpp:320.)
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
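The DDP warning above reports a gradient with strides [2048, 1, 2048, 2048] against a bucket view with strides [2048, 1, 1, 1], both for shape [512, 2048, 1, 1]. A small stdlib-only sketch shows that the bucket view's strides are exactly the row-major (NCHW-contiguous) strides for that shape, while the gradient's strides correspond to a channels_last-style layout of the same shape, which is why the reducer flags a layout mismatch:

```python
def contiguous_strides(shape):
    """Row-major strides in elements, the convention PyTorch's .stride() uses."""
    strides = []
    running = 1
    for dim in reversed(shape):
        strides.append(running)
        running *= dim
    return list(reversed(strides))

shape = [512, 2048, 1, 1]  # (N, C, H, W) from the warning
print(contiguous_strides(shape))  # [2048, 1, 1, 1] -- matches bucket_view.strides()

# The grad's reported strides [2048, 1, 2048, 2048] are what a channels_last
# (NHWC-in-memory) tensor of this shape would report: stride_N = H*W*C = 2048,
# stride_C = 1, stride_H = W*C = 2048, stride_W = C = 2048.
```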
I0331 08:12:35.179400 140080983574272 logging_writer.py:48] [0] global_step=0, grad_norm=0.587501, loss=6.925025
I0331 08:12:35.195179 140111137965888 submission.py:120] 0) loss = 6.925, grad_norm = 0.588
I0331 08:12:35.826992 140111137965888 spec.py:321] Evaluating on the training split.
[2024-03-31 08:12:45,933] [0/1] torch._dynamo.variables.torch: [WARNING] Profiler function <class 'torch.autograd.profiler.record_function'> will be ignored
[2024-03-31 08:12:45,964] [0/1] torch._dynamo.variables.torch: [WARNING] Profiler function <class 'torch.autograd.profiler.record_function'> will be ignored
[2024-03-31 08:12:46,030] [0/1] torch._dynamo.variables.torch: [WARNING] Profiler function <class 'torch.autograd.profiler.record_function'> will be ignored
[2024-03-31 08:12:46,126] [0/1] torch._dynamo.variables.torch: [WARNING] Profiler function <class 'torch.autograd.profiler.record_function'> will be ignored
[2024-03-31 08:12:46,176] [0/1] torch._dynamo.variables.torch: [WARNING] Profiler function <class 'torch.autograd.profiler.record_function'> will be ignored
[2024-03-31 08:12:46,181] [0/1] torch._dynamo.variables.torch: [WARNING] Profiler function <class 'torch.autograd.profiler.record_function'> will be ignored
[2024-03-31 08:12:46,288] [0/1] torch._dynamo.variables.torch: [WARNING] Profiler function <class 'torch.autograd.profiler.record_function'> will be ignored
[2024-03-31 08:12:46,511] [0/1] torch._dynamo.variables.torch: [WARNING] Profiler function <class 'torch.autograd.profiler.record_function'> will be ignored
I0331 08:13:58.720955 140111137965888 spec.py:333] Evaluating on the validation split.
[2024-03-31 08:14:51,491] [0/2] torch._dynamo.variables.torch: [WARNING] Profiler function <class 'torch.autograd.profiler.record_function'> will be ignored
[2024-03-31 08:14:51,719] [0/2] torch._dynamo.variables.torch: [WARNING] Profiler function <class 'torch.autograd.profiler.record_function'> will be ignored
[2024-03-31 08:14:52,093] [0/2] torch._dynamo.variables.torch: [WARNING] Profiler function <class 'torch.autograd.profiler.record_function'> will be ignored
[2024-03-31 08:14:52,553] [0/2] torch._dynamo.variables.torch: [WARNING] Profiler function <class 'torch.autograd.profiler.record_function'> will be ignored
[2024-03-31 08:14:53,071] [0/2] torch._dynamo.variables.torch: [WARNING] Profiler function <class 'torch.autograd.profiler.record_function'> will be ignored
[2024-03-31 08:14:54,379] [0/2] torch._dynamo.variables.torch: [WARNING] Profiler function <class 'torch.autograd.profiler.record_function'> will be ignored
[2024-03-31 08:14:54,424] [0/2] torch._dynamo.variables.torch: [WARNING] Profiler function <class 'torch.autograd.profiler.record_function'> will be ignored
[2024-03-31 08:14:56,719] [0/2] torch._dynamo.variables.torch: [WARNING] Profiler function <class 'torch.autograd.profiler.record_function'> will be ignored
/usr/local/lib/python3.8/dist-packages/torch/overrides.py:110: UserWarning: 'has_cuda' is deprecated, please use 'torch.backends.cuda.is_built()'
torch.has_cuda,
/usr/local/lib/python3.8/dist-packages/torch/overrides.py:111: UserWarning: 'has_cudnn' is deprecated, please use 'torch.backends.cudnn.is_available()'
torch.has_cudnn,
/usr/local/lib/python3.8/dist-packages/torch/overrides.py:117: UserWarning: 'has_mps' is deprecated, please use 'torch.backends.mps.is_built()'
torch.has_mps,
/usr/local/lib/python3.8/dist-packages/torch/overrides.py:118: UserWarning: 'has_mkldnn' is deprecated, please use 'torch.backends.mkldnn.is_available()'
torch.has_mkldnn,
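The deprecation warnings above spell out their own replacements. As a quick reference, here is that mapping transcribed verbatim from the warnings into a lookup table (illustrative only; the names come straight from the log):

```python
# Old torch.has_* flags -> replacement backend queries, per the UserWarnings
# emitted from torch/overrides.py in the log above.
DEPRECATED_TORCH_FLAGS = {
    "torch.has_cuda": "torch.backends.cuda.is_built()",
    "torch.has_cudnn": "torch.backends.cudnn.is_available()",
    "torch.has_mps": "torch.backends.mps.is_built()",
    "torch.has_mkldnn": "torch.backends.mkldnn.is_available()",
}

for old, new in DEPRECATED_TORCH_FLAGS.items():
    print(f"{old} -> use {new}")
```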
I0331 08:15:32.784438 140111137965888 spec.py:349] Evaluating on the test split.
I0331 08:15:32.801020 140111137965888 dataset_info.py:578] Load dataset info from /data/imagenet/pytorch/imagenet_v2/matched-frequency/3.0.0
I0331 08:15:32.807415 140111137965888 dataset_builder.py:528] Reusing dataset imagenet_v2 (/data/imagenet/pytorch/imagenet_v2/matched-frequency/3.0.0)
I0331 08:15:32.875621 140111137965888 logging_logger.py:49] Constructing tf.data.Dataset imagenet_v2 for split test, from /data/imagenet/pytorch/imagenet_v2/matched-frequency/3.0.0
[2024-03-31 08:15:34,725] [0/3] torch._dynamo.variables.torch: [WARNING] Profiler function <class 'torch.autograd.profiler.record_function'> will be ignored
[2024-03-31 08:15:34,765] [0/3] torch._dynamo.variables.torch: [WARNING] Profiler function <class 'torch.autograd.profiler.record_function'> will be ignored
[2024-03-31 08:15:34,790] [0/3] torch._dynamo.variables.torch: [WARNING] Profiler function <class 'torch.autograd.profiler.record_function'> will be ignored
[2024-03-31 08:15:34,941] [0/3] torch._dynamo.variables.torch: [WARNING] Profiler function <class 'torch.autograd.profiler.record_function'> will be ignored
[2024-03-31 08:15:34,952] [0/3] torch._dynamo.variables.torch: [WARNING] Profiler function <class 'torch.autograd.profiler.record_function'> will be ignored
[2024-03-31 08:15:35,052] [0/3] torch._dynamo.variables.torch: [WARNING] Profiler function <class 'torch.autograd.profiler.record_function'> will be ignored
[2024-03-31 08:15:35,511] [0/3] torch._dynamo.variables.torch: [WARNING] Profiler function <class 'torch.autograd.profiler.record_function'> will be ignored
[2024-03-31 08:15:35,777] [0/3] torch._dynamo.variables.torch: [WARNING] Profiler function <class 'torch.autograd.profiler.record_function'> will be ignored
I0331 08:16:14.840835 140111137965888 submission_runner.py:426] Time since start: 337.07s, Step: 1, {'train/accuracy': 0.0017737563775510204, 'train/loss': 6.928203972018495, 'validation/accuracy': 0.00134, 'validation/loss': 6.928408125, 'validation/num_examples': 50000, 'test/accuracy': 0.0007, 'test/loss': 6.93043828125, 'test/num_examples': 10000, 'score': 117.4308168888092, 'total_duration': 337.0682120323181, 'accumulated_submission_time': 117.4308168888092, 'accumulated_eval_time': 219.0140655040741, 'accumulated_logging_time': 0}
I0331 08:16:14.855430 140065913165568 logging_writer.py:48] [1] accumulated_eval_time=219.014066, accumulated_logging_time=0, accumulated_submission_time=117.430817, global_step=1, preemption_count=0, score=117.430817, test/accuracy=0.000700, test/loss=6.930438, test/num_examples=10000, total_duration=337.068212, train/accuracy=0.001774, train/loss=6.928204, validation/accuracy=0.001340, validation/loss=6.928408, validation/num_examples=50000
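The eval lines above embed the metrics as a Python-literal dict after `Step: N,`, which makes them easy to scrape when comparing regression runs. A minimal sketch, assuming that `Step: N, {...}` shape holds (the sample line here is trimmed to a few keys from the step-1 eval above):

```python
import ast
import re

LOG_LINE = ("Time since start: 337.07s, Step: 1, "
            "{'train/loss': 6.928203972018495, 'validation/accuracy': 0.00134, "
            "'score': 117.4308168888092}")

def parse_eval_line(line):
    """Extract (step, metrics dict) from a submission_runner eval log line.

    The metrics are a Python dict literal, so ast.literal_eval parses them
    without executing arbitrary code."""
    match = re.search(r"Step: (\d+), (\{.*\})", line)
    if match is None:
        raise ValueError("not an eval line")
    return int(match.group(1)), ast.literal_eval(match.group(2))

step, metrics = parse_eval_line(LOG_LINE)
print(step, metrics["score"])  # 1 117.4308168888092
```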
I0331 08:16:15.974320 140065904772864 logging_writer.py:48] [1] global_step=1, grad_norm=0.612899, loss=6.924363
I0331 08:16:15.978590 140111137965888 submission.py:120] 1) loss = 6.924, grad_norm = 0.613
I0331 08:16:16.346361 140065913165568 logging_writer.py:48] [2] global_step=2, grad_norm=0.605277, loss=6.926661
I0331 08:16:16.350334 140111137965888 submission.py:120] 2) loss = 6.927, grad_norm = 0.605
I0331 08:16:16.719233 140065904772864 logging_writer.py:48] [3] global_step=3, grad_norm=0.605843, loss=6.928249
I0331 08:16:16.723332 140111137965888 submission.py:120] 3) loss = 6.928, grad_norm = 0.606
I0331 08:16:17.091580 140065913165568 logging_writer.py:48] [4] global_step=4, grad_norm=0.598531, loss=6.928620
I0331 08:16:17.099368 140111137965888 submission.py:120] 4) loss = 6.929, grad_norm = 0.599
I0331 08:16:17.469580 140065904772864 logging_writer.py:48] [5] global_step=5, grad_norm=0.594602, loss=6.929913
I0331 08:16:17.475684 140111137965888 submission.py:120] 5) loss = 6.930, grad_norm = 0.595
I0331 08:16:17.847174 140065913165568 logging_writer.py:48] [6] global_step=6, grad_norm=0.609444, loss=6.932057
I0331 08:16:17.852178 140111137965888 submission.py:120] 6) loss = 6.932, grad_norm = 0.609
I0331 08:16:18.223795 140065904772864 logging_writer.py:48] [7] global_step=7, grad_norm=0.611987, loss=6.926456
I0331 08:16:18.228719 140111137965888 submission.py:120] 7) loss = 6.926, grad_norm = 0.612
I0331 08:16:18.601529 140065913165568 logging_writer.py:48] [8] global_step=8, grad_norm=0.606430, loss=6.921168
I0331 08:16:18.606384 140111137965888 submission.py:120] 8) loss = 6.921, grad_norm = 0.606
I0331 08:16:18.991785 140065904772864 logging_writer.py:48] [9] global_step=9, grad_norm=0.619730, loss=6.922999
I0331 08:16:18.995997 140111137965888 submission.py:120] 9) loss = 6.923, grad_norm = 0.620
I0331 08:16:19.791687 140111137965888 spec.py:321] Evaluating on the training split.
I0331 08:17:08.523149 140111137965888 spec.py:333] Evaluating on the validation split.
I0331 08:17:54.704064 140111137965888 spec.py:349] Evaluating on the test split.
I0331 08:17:55.841307 140111137965888 submission_runner.py:426] Time since start: 438.07s, Step: 10, {'train/accuracy': 0.0008569834183673469, 'train/loss': 6.916042405731824, 'validation/accuracy': 0.0008, 'validation/loss': 6.915869375, 'validation/num_examples': 50000, 'test/accuracy': 0.0008, 'test/loss': 6.91645234375, 'test/num_examples': 10000, 'score': 120.95486831665039, 'total_duration': 438.06874418258667, 'accumulated_submission_time': 120.95486831665039, 'accumulated_eval_time': 315.0638761520386, 'accumulated_logging_time': 0.026315689086914062}
I0331 08:17:55.855891 140065921558272 logging_writer.py:48] [10] accumulated_eval_time=315.063876, accumulated_logging_time=0.026316, accumulated_submission_time=120.954868, global_step=10, preemption_count=0, score=120.954868, test/accuracy=0.000800, test/loss=6.916452, test/num_examples=10000, total_duration=438.068744, train/accuracy=0.000857, train/loss=6.916042, validation/accuracy=0.000800, validation/loss=6.915869, validation/num_examples=50000
I0331 08:17:56.505723 140065929950976 logging_writer.py:48] [10] global_step=10, preemption_count=0, score=120.954868
I0331 08:17:56.975934 140111137965888 submission_runner.py:600] Tuning trial 1/1
I0331 08:17:56.976132 140111137965888 submission_runner.py:601] Hyperparameters: Hyperparameters(learning_rate=0.0019814680146414726, one_minus_beta1=0.22838767981804783, beta2=0.999, warmup_factor=0.05, weight_decay=0.010340635370188849, label_smoothing=0.1, dropout_rate=0.0)
I0331 08:17:56.976719 140111137965888 submission_runner.py:602] Metrics: {'eval_results': [(1, {'train/accuracy': 0.0017737563775510204, 'train/loss': 6.928203972018495, 'validation/accuracy': 0.00134, 'validation/loss': 6.928408125, 'validation/num_examples': 50000, 'test/accuracy': 0.0007, 'test/loss': 6.93043828125, 'test/num_examples': 10000, 'score': 117.4308168888092, 'total_duration': 337.0682120323181, 'accumulated_submission_time': 117.4308168888092, 'accumulated_eval_time': 219.0140655040741, 'accumulated_logging_time': 0, 'global_step': 1, 'preemption_count': 0}), (10, {'train/accuracy': 0.0008569834183673469, 'train/loss': 6.916042405731824, 'validation/accuracy': 0.0008, 'validation/loss': 6.915869375, 'validation/num_examples': 50000, 'test/accuracy': 0.0008, 'test/loss': 6.91645234375, 'test/num_examples': 10000, 'score': 120.95486831665039, 'total_duration': 438.06874418258667, 'accumulated_submission_time': 120.95486831665039, 'accumulated_eval_time': 315.0638761520386, 'accumulated_logging_time': 0.026315689086914062, 'global_step': 10, 'preemption_count': 0})], 'global_step': 10}
I0331 08:17:56.976811 140111137965888 submission_runner.py:603] Timing: 120.95486831665039
I0331 08:17:56.976862 140111137965888 submission_runner.py:605] Total number of evals: 2
I0331 08:17:56.976909 140111137965888 submission_runner.py:606] ====================
I0331 08:17:56.977010 140111137965888 submission_runner.py:696] Final imagenet_resnet score: 0
Exiting with 0