I've put in a bit of time understanding Bazel's caching better to see how this feature fits in. I've experimented with the following setup:
$ cat .bazelversion
1.1.0
$ cat .bazelrc
common --announce_rc
build --incompatible_strict_action_env
build --announce_rc
build --remote_download_minimal
build:http --remote_cache=http://127.0.0.1:8080
$ cat BUILD.bazel
genrule(
name = "a",
cmd = "echo a > $@",
outs = ["a.txt"],
)
genrule(
name = "b",
srcs = ["a.txt"],
cmd = "cat $(location :a.txt) > $@; echo b >> $@",
outs = ["b.txt"],
)
genrule(
name = "c",
srcs = ["a.txt", "b.txt"],
cmd = "cat $(location :a.txt) $(location :b.txt) > $@; echo c >> $@",
outs = ["c.txt"],
)
With a clear cache, running the following bazel command will fill the cache up to target b
:
$ bazel --nohome_rc build --config http //:b
It makes the following requests on the cache:
2019/11/08 21:19:26 GET 404 127.0.0.1 /ac/734539cab61e72ccfc83c4dccfeb9c17ddddcf14ca923c6fb8fc3eb59bb1de75
2019/11/08 21:19:26 PUT 200 127.0.0.1 /cas/e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
2019/11/08 21:19:26 PUT 200 127.0.0.1 /cas/87428fc522803d31065e7bce3cf03fe475096631e5e07bbd7a0fde60c4cf25c7
2019/11/08 21:19:26 PUT 200 127.0.0.1 /cas/734539cab61e72ccfc83c4dccfeb9c17ddddcf14ca923c6fb8fc3eb59bb1de75
2019/11/08 21:19:26 PUT 200 127.0.0.1 /cas/b01b0391d39289cc862e8f76e80048f418d5a8a8d75a3a3cdfe18f860e6c2326
2019/11/08 21:19:26 PUT 200 127.0.0.1 /ac/734539cab61e72ccfc83c4dccfeb9c17ddddcf14ca923c6fb8fc3eb59bb1de75
2019/11/08 21:19:26 GET 404 127.0.0.1 /ac/7201514ba523bda8568c6af767b7ddd87b1786254414339136ab874842c3e3d7
2019/11/08 21:19:26 PUT 200 127.0.0.1 /cas/911169ddaaf146aff539f58c26c489af3b892dff0fe283c1c264c65ae5aa59a2
2019/11/08 21:19:26 PUT 200 127.0.0.1 /cas/7201514ba523bda8568c6af767b7ddd87b1786254414339136ab874842c3e3d
2019/11/08 21:19:26 PUT 200 127.0.0.1 /cas/34252bce59f09a3a058c4581f8a09597eebcc19f7083c053cefdb51c02eb07f
2019/11/08 21:19:26 PUT 200 127.0.0.1 /ac/7201514ba523bda8568c6af767b7ddd87b1786254414339136ab874842c3e3d7
We can inspect the cache and see that the contents are now:
$ tree cache/
cache/
├── cas
│ ├── 34252bce59f09a3a058c4581f8a09597eebcc19f7083c053cefdb51c02eb07f5
│ ├── 7201514ba523bda8568c6af767b7ddd87b1786254414339136ab874842c3e3d7
│ ├── 734539cab61e72ccfc83c4dccfeb9c17ddddcf14ca923c6fb8fc3eb59bb1de75
│ ├── 87428fc522803d31065e7bce3cf03fe475096631e5e07bbd7a0fde60c4cf25c7
│ ├── 911169ddaaf146aff539f58c26c489af3b892dff0fe283c1c264c65ae5aa59a2
│ ├── b01b0391d39289cc862e8f76e80048f418d5a8a8d75a3a3cdfe18f860e6c2326
│ └── e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
└── ac
├── 7201514ba523bda8568c6af767b7ddd87b1786254414339136ab874842c3e3d7
└── 734539cab61e72ccfc83c4dccfeb9c17ddddcf14ca923c6fb8fc3eb59bb1de75
We can decode the ac
entries as build.bazel.remote.execution.v2.ActionResult
and the corresponding hashes in the cas
as build.bazel.remote.execution.v2.Action
's:
$ protoc -I ../googleapis/ -I . \
--decode=build.bazel.remote.execution.v2.ActionResult build/bazel/remote/execution/v2/remote_execution.proto \
< ../cache/ac/734539cab61e72ccfc83c4dccfeb9c17ddddcf14ca923c6fb8fc3eb59bb1de75
output_files {
path: "bazel-out/darwin-fastbuild/bin/a.txt"
digest {
hash: "87428fc522803d31065e7bce3cf03fe475096631e5e07bbd7a0fde60c4cf25c7"
size_bytes: 2
}
}
stdout_digest {
hash: "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
}
stderr_digest {
hash: "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
}
$ protoc -I ../googleapis/ -I . \
--decode=build.bazel.remote.execution.v2.Action build/bazel/remote/execution/v2/remote_execution.proto \
< ../cache/cas/7201514ba523bda8568c6af767b7ddd87b1786254414339136ab874842c3e3d7
command_digest {
hash: "34252bce59f09a3a058c4581f8a09597eebcc19f7083c053cefdb51c02eb07f5"
size_bytes: 283
}
input_root_digest {
hash: "8dd51c306a2cb0ccede035eea9379d4af4f1db3e152b219455ccaedbce9ad86b"
size_bytes: 165
}
$ protoc -I ../googleapis/ -I . \
--decode=build.bazel.remote.execution.v2.ActionResult build/bazel/remote/execution/v2/remote_execution.proto \
< ../cache/ac/7201514ba523bda8568c6af767b7ddd87b1786254414339136ab874842c3e3d7
output_files {
path: "bazel-out/darwin-fastbuild/bin/b.txt"
digest {
hash: "911169ddaaf146aff539f58c26c489af3b892dff0fe283c1c264c65ae5aa59a2"
size_bytes: 4
}
}
stdout_digest {
hash: "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
}
stderr_digest {
hash: "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
}
$ protoc -I ../googleapis/ -I . \
--decode=build.bazel.remote.execution.v2.Action build/bazel/remote/execution/v2/remote_execution.proto \
< ../cache/cas/734539cab61e72ccfc83c4dccfeb9c17ddddcf14ca923c6fb8fc3eb59bb1de75
command_digest {
hash: "b01b0391d39289cc862e8f76e80048f418d5a8a8d75a3a3cdfe18f860e6c2326"
size_bytes: 200
}
input_root_digest {
hash: "b9002e1da634b904c8bde2677cfba9a64f6a6e64a3c2d1e6f1f4915b84cecb79"
size_bytes: 82
}
Looking at the actions, we can decode the command_digest
's as build.bazel.remote.execution.v2.Command
:
$ protoc -I ../googleapis/ -I . \
--decode=build.bazel.remote.execution.v2.Command build/bazel/remote/execution/v2/remote_execution.proto \
< ../cache/cas/b0/b01b0391d39289cc862e8f76e80048f418d5a8a8d75a3a3cdfe18f860e6c2326
arguments: "/bin/bash"
arguments: "-c"
arguments: "source external/bazel_tools/tools/genrule/genrule-setup.sh; echo a > bazel-out/darwin-fastbuild/bin/a.txt"
environment_variables {
name: "PATH"
value: "/bin:/usr/bin:/usr/local/bin"
}
output_files: "bazel-out/darwin-fastbuild/bin/a.txt"
platform {
}
$ protoc -I ../googleapis/ -I . \
--decode=build.bazel.remote.execution.v2.Command build/bazel/remote/execution/v2/remote_execution.proto \
< ../cache/cas/34/34252bce59f09a3a058c4581f8a09597eebcc19f7083c053cefdb51c02eb07f5
arguments: "/bin/bash"
arguments: "-c"
arguments: "source external/bazel_tools/tools/genrule/genrule-setup.sh; cat bazel-out/darwin-fastbuild/bin/a.txt > bazel-out/darwin-fastbuild/bin/b.txt; echo b >> bazel-out/darwin-fastbuild/bin/b.txt"
environment_variables {
name: "PATH"
value: "/bin:/usr/bin:/usr/local/bin"
}
output_files: "bazel-out/darwin-fastbuild/bin/b.txt"
platform {
}
and inspect the values for output_files
and stdout_digest
/stderr_digest
:
$ cat ../cache/cas/87428fc522803d31065e7bce3cf03fe475096631e5e07bbd7a0fde60c4cf25c7
a
$ cat ../cache/cas/911169ddaaf146aff539f58c26c489af3b892dff0fe283c1c264c65ae5aa59a2
a
b
$ cat ../cache/cas/e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
With that understanding of the cache and all its contents, lets do a clean and a rebuild and checkout the requests to the cache:
$ bazel clean
$ bazel --nohome_rc build --config http //:b
...
2019/11/08 21:54:58 GET 200 127.0.0.1 /ac/734539cab61e72ccfc83c4dccfeb9c17ddddcf14ca923c6fb8fc3eb59bb1de75 //:a ActionResult
2019/11/08 21:54:58 GET 200 127.0.0.1 /cas/e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 emtpy digest
2019/11/08 21:54:58 GET 200 127.0.0.1 /cas/e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 emtpy digest
2019/11/08 21:54:58 GET 200 127.0.0.1 /ac/7201514ba523bda8568c6af767b7ddd87b1786254414339136ab874842c3e3d7 //:b ActionResult
2019/11/08 21:54:58 GET 200 127.0.0.1 /cas/e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 emtpy digest
2019/11/08 21:54:58 GET 200 127.0.0.1 /cas/e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 emtpy digest
We can see that in --remote_download_minimal
mode, we download all the ActionResults
required for the actions from the actioncache and the stdout
/stderr
digests for the actions from the content address store:
$ bazel clean
$ bazel --nohome_rc build --config http //:b --remote_download_toplevel
...
2019/11/08 21:59:35 GET 200 127.0.0.1 /ac/734539cab61e72ccfc83c4dccfeb9c17ddddcf14ca923c6fb8fc3eb59bb1de75 //:a ActionResult
2019/11/08 21:59:35 GET 200 127.0.0.1 /cas/e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 emtpy digest
2019/11/08 21:59:35 GET 200 127.0.0.1 /cas/e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 emtpy digest
2019/11/08 21:59:35 GET 200 127.0.0.1 /ac/7201514ba523bda8568c6af767b7ddd87b1786254414339136ab874842c3e3d7 //:b ActionResult
2019/11/08 21:59:35 GET 200 127.0.0.1 /cas/911169ddaaf146aff539f58c26c489af3b892dff0fe283c1c264c65ae5aa59a2 bazel-out/darwin-fastbuild/bin/b.txt
2019/11/08 21:59:35 GET 200 127.0.0.1 /cas/e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 emtpy digest
2019/11/08 21:59:35 GET 200 127.0.0.1 /cas/e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 emtpy digest
With --remote_download_toplevel
we can see we get the same, but also the toplevel output for target //:b
.
$ bazel clean
$ bazel --nohome_rc build --config http //:b --remote_download_outputs=all
...
2019/11/08 22:01:30 GET 200 127.0.0.1 /ac/734539cab61e72ccfc83c4dccfeb9c17ddddcf14ca923c6fb8fc3eb59bb1de75 //:a ActionResult
2019/11/08 22:01:30 GET 200 127.0.0.1 /cas/87428fc522803d31065e7bce3cf03fe475096631e5e07bbd7a0fde60c4cf25c7 bazel-out/darwin-fastbuild/bin/a.txt
2019/11/08 22:01:30 GET 200 127.0.0.1 /cas/e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 emtpy digest
2019/11/08 22:01:30 GET 200 127.0.0.1 /cas/e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 emtpy digest
2019/11/08 22:01:30 GET 200 127.0.0.1 /ac/7201514ba523bda8568c6af767b7ddd87b1786254414339136ab874842c3e3d7 //:b ActionResult
2019/11/08 22:01:30 GET 200 127.0.0.1 /cas/e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 emtpy digest
2019/11/08 22:01:30 GET 200 127.0.0.1 /cas/911169ddaaf146aff539f58c26c489af3b892dff0fe283c1c264c65ae5aa59a2 bazel-out/darwin-fastbuild/bin/b.txt
2019/11/08 22:01:30 GET 200 127.0.0.1 /cas/e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 emtpy digest
And with --remote_download_outputs=all
, we get the regular behavior of attempting to download all intermediates.
With the above in mind, we can see that the cache:
- Always checks the actioncache (
ac
) for results to actions- If not Found : bazel executes the action, uploads all results (
output_files
,command_digest
,stdout_digest
,stderr_digest
) into thecas
, then uploads theActionResult
into theac
. - If found : depending on the strategy
- all : download
stdout_digest
,stderr_digest
andoutput_files
. - toplevel : download
stdout_digest
,stderr_digest
always,output_files
iff target is toplevel. - minimal : download
stdout_digest
,stderr_digest
always,output_files
never.
- all : download
- If not Found : bazel executes the action, uploads all results (
Because the cache doesn't make requests for the cache keys related to output_files
nearly as often if it used to, likely our failures have something to do with this.
Now lets test some eviction scenarios. First, lets say the ActionResult
for //:a
gets evicted from the cache.
$ rm ../cache/ac/734539cab61e72ccfc83c4dccfeb9c17ddddcf14ca923c6fb8fc3eb59bb1de75
$ bazel clean
$ bazel --nohome_rc build --config http //:b
INFO: Invocation ID: 56ca7eb1-fd4f-463c-85a0-52c69fbdc460
INFO: Analyzed target //:b (5 packages loaded, 9 targets configured).
INFO: Found 1 target...
Target //:b up-to-date:
bazel-bin/b.txt
INFO: Elapsed time: 0.223s, Critical Path: 0.10s
INFO: 2 processes: 1 remote cache hit, 1 darwin-sandbox.
INFO: Build completed successfully, 3 total actions
...
2019/11/08 22:26:22 GET 404 127.0.0.1 /ac/734539cab61e72ccfc83c4dccfeb9c17ddddcf14ca923c6fb8fc3eb59bb1de75 //:a ActionResult
2019/11/08 22:26:22 PUT 200 127.0.0.1 /cas/e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 emtpy digest
2019/11/08 22:26:22 PUT 200 127.0.0.1 /cas/87428fc522803d31065e7bce3cf03fe475096631e5e07bbd7a0fde60c4cf25c7 bazel-out/darwin-fastbuild/bin/a.txt
2019/11/08 22:26:22 PUT 200 127.0.0.1 /cas/734539cab61e72ccfc83c4dccfeb9c17ddddcf14ca923c6fb8fc3eb59bb1de75 //:a Action
2019/11/08 22:26:22 PUT 200 127.0.0.1 /cas/b01b0391d39289cc862e8f76e80048f418d5a8a8d75a3a3cdfe18f860e6c2326 //:a Command
2019/11/08 22:26:22 PUT 200 127.0.0.1 /ac/734539cab61e72ccfc83c4dccfeb9c17ddddcf14ca923c6fb8fc3eb59bb1de75 //:a ActionResult
2019/11/08 22:26:22 GET 200 127.0.0.1 /ac/7201514ba523bda8568c6af767b7ddd87b1786254414339136ab874842c3e3d7 //:b ActionResult
2019/11/08 22:26:22 GET 200 127.0.0.1 /cas/e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 emtpy digest
2019/11/08 22:26:22 GET 200 127.0.0.1 /cas/e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 emtpy digest
We can see bazel attempts to find an ActionResult
first in the ActionCache
, for which it finds a 404, so it executes the action locally and uploads the results to the cache. After this it continues to fetch the results for b.
Lets attempt to evict a stdout_digest
/stderr_digest
:
$ rm ../cache/cas/e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
$ bazel clean
$ bazel --nohome_rc build --config http //:b
INFO: Invocation ID: 6182dbd9-21ce-4ffe-98f5-a9b31a4ed35f
INFO: Analyzed target //:b (5 packages loaded, 9 targets configured).
INFO: Found 1 target...
Target //:b up-to-date:
bazel-bin/b.txt
INFO: Elapsed time: 0.220s, Critical Path: 0.08s
INFO: 2 processes: 1 remote cache hit, 1 darwin-sandbox.
INFO: Build completed successfully, 3 total actions
...
2019/11/08 22:41:16 GET 200 127.0.0.1 /ac/734539cab61e72ccfc83c4dccfeb9c17ddddcf14ca923c6fb8fc3eb59bb1de75 //:a ActionResult
2019/11/08 22:41:16 GET 404 127.0.0.1 /cas/e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 emtpy digest
2019/11/08 22:41:16 GET 404 127.0.0.1 /cas/e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 emtpy digest
2019/11/08 22:41:16 PUT 200 127.0.0.1 /cas/e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 emtpy digest
2019/11/08 22:41:16 PUT 200 127.0.0.1 /cas/87428fc522803d31065e7bce3cf03fe475096631e5e07bbd7a0fde60c4cf25c7 bazel-out/darwin-fastbuild/bin/a.txt
2019/11/08 22:41:16 PUT 200 127.0.0.1 /cas/734539cab61e72ccfc83c4dccfeb9c17ddddcf14ca923c6fb8fc3eb59bb1de75 //:a Action
2019/11/08 22:41:16 PUT 200 127.0.0.1 /cas/b01b0391d39289cc862e8f76e80048f418d5a8a8d75a3a3cdfe18f860e6c2326 //:a Command
2019/11/08 22:41:16 PUT 200 127.0.0.1 /ac/734539cab61e72ccfc83c4dccfeb9c17ddddcf14ca923c6fb8fc3eb59bb1de75 //:a ActionResult
2019/11/08 22:41:16 GET 200 127.0.0.1 /ac/7201514ba523bda8568c6af767b7ddd87b1786254414339136ab874842c3e3d7 //:b ActionResult
2019/11/08 22:41:16 GET 200 127.0.0.1 /cas/e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 emtpy digest
2019/11/08 22:41:16 GET 200 127.0.0.1 /cas/e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 emtpy digest
When using minimal mode, upon not finding the content for the stdout
/stderr
digests in the cas
, Bazel executes the action locally and uploads it.
Now, lets attempt to evict the output_files
for //:a
,
$ rm ../cache/cas/87428fc522803d31065e7bce3cf03fe475096631e5e07bbd7a0fde60c4cf25c7
$ bazel clean
$ bazel --nohome_rc build --config http //:b
INFO: Invocation ID: 5e7da007-8bfd-42a4-bfcb-6f1c19845627
INFO: Analyzed target //:b (5 packages loaded, 9 targets configured).
INFO: Found 1 target...
Target //:b up-to-date:
bazel-bin/b.txt
INFO: Elapsed time: 0.112s, Critical Path: 0.01s
INFO: 2 processes: 2 remote cache hit.
INFO: Build completed successfully, 3 total actions
...
2019/11/08 22:47:04 GET 200 127.0.0.1 /ac/734539cab61e72ccfc83c4dccfeb9c17ddddcf14ca923c6fb8fc3eb59bb1de75 //:a ActionResult
2019/11/08 22:47:04 GET 200 127.0.0.1 /cas/e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 emtpy digest
2019/11/08 22:47:04 GET 200 127.0.0.1 /cas/e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 emtpy digest
2019/11/08 22:47:04 GET 200 127.0.0.1 /ac/7201514ba523bda8568c6af767b7ddd87b1786254414339136ab874842c3e3d7 //:b ActionResult
2019/11/08 22:47:04 GET 200 127.0.0.1 /cas/e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 emtpy digest
2019/11/08 22:47:04 GET 200 127.0.0.1 /cas/e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 emtpy digest
With minimal
mode, this has no effect while building //:b
, as bazel doesn't attempt to fetch that key from the cache.
However, now lets attempt to build //:c
(which also has a dependency on //:a
):
$ bazel clean
$ bazel --nohome_rc build --config http //:c
INFO: Invocation ID: 8411d527-7507-4b03-a3fb-7469e8ee0c4a
INFO: Analyzed target //:c (5 packages loaded, 11 targets configured).
INFO: Found 1 target...
ERROR: /Users/robbert/Desktop/remotetest/project/BUILD.bazel:14:1: Executing genrule //:c failed due to unexpected I/O exception: Failed to fetch file with hash '87428fc522803d31065e7bce3cf03fe475096631e5e07bbd7a0fde60c4cf25c7' because it does not exist remotely. --experimental_remote_outputs=minimal does not work if your remote cache evicts files during builds.
java.io.IOException: Failed to fetch file with hash '87428fc522803d31065e7bce3cf03fe475096631e5e07bbd7a0fde60c4cf25c7' because it does not exist remotely. --experimental_remote_outputs=minimal does not work if your remote cache evicts files during builds.
...
Target //:c failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 0.121s, Critical Path: 0.01s
INFO: 2 processes: 2 remote cache hit.
FAILED: Build did NOT complete successfully
...
2019/11/08 22:51:36 GET 200 127.0.0.1 /ac/734539cab61e72ccfc83c4dccfeb9c17ddddcf14ca923c6fb8fc3eb59bb1de75 //:a ActionResult
2019/11/08 22:51:36 GET 200 127.0.0.1 /cas/e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 emtpy digest
2019/11/08 22:51:36 GET 200 127.0.0.1 /cas/e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 emtpy digest
2019/11/08 22:51:36 GET 200 127.0.0.1 /ac/7201514ba523bda8568c6af767b7ddd87b1786254414339136ab874842c3e3d7 //:b ActionResult
2019/11/08 22:51:36 GET 200 127.0.0.1 /cas/e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 emtpy digest
2019/11/08 22:51:36 GET 200 127.0.0.1 /cas/e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 emtpy digest
2019/11/08 22:51:36 GET 404 127.0.0.1 /ac/905038842e75345b093d088e0574f3a89b99aceae7e333278318f300dbd601bd //:c ActionResult
2019/11/08 22:51:36 GET 404 127.0.0.1 /cas/87428fc522803d31065e7bce3cf03fe475096631e5e07bbd7a0fde60c4cf25c7 bazel-out/darwin-fastbuild/bin/a.txt
2019/11/08 22:51:36 GET 200 127.0.0.1 /cas/911169ddaaf146aff539f58c26c489af3b892dff0fe283c1c264c65ae5aa59a2 bazel-out/darwin-fastbuild/bin/b.txt)
Here's where we get the exception. Bazel fetches the ActionResults
for //:a
and //:b
from the ac
, which both exist. It then attempts to fetch //:c
, which does not exist, and so it attempts to fetch the inputs for //:c
fromt the cas
, which succeeds for //:b
but fails for //:a
.
In a circle of bazel clean
& bazel build
, this issue perists till the remote cache stops serving the ac
result for //:a
(or starts containing the corresponding cas
entries).
Based on the above, here's my theory for how this caused our problems:
If I recall correctly, we didn't see this issue during th 1-2 week shadowing period, but it caused an issue ~1 day after rolling the change out to all CI jobs. That would correspond to about the time for cas
entries to be evicted if they hadn't been fetched. During the shadow period, we still had jobs downloading all artifacts and touching most items in the cas, prolonging their lifetime. Once we switched all jobs to minimal, these items stopped being fetched as often and disappeared from the cache. That started causing issues for the few "rebuild the world" diffs in which they were actually needed.
To use the feature, I think our objective should be to figure out if we can change the cache to not serve ActionResults
from the ac
if we can't serve the corresponding outputs from the cas
in the near future. Some strawman suggestions:
- Before serving an item from the
ac
, verify we have all outputs present in thecas
. - When serving an
ac
item, in addition to bumping that key as accessed in the lru, also bump all relatedcas
keys as being accessed. - Only evict
cas
entries if they're not reffered to byac
items. - Dumb nuclear evict: when evicting evict all. This could be expensive for the first job that runs afterwards, but depending on the speedup all subsequent jobs receive maybe it is worth it.
I'm sure there are some caveats with the above, but I'd like for us to consider something like the above. In addition to the above, it'd like to dig into the expected performance gain a little bit more to see we should prioritise appropriately. Based on the above, go jobs doing a high number of writes, our buildtimes sometimes being 20 mins for a clean but mostly cached build, I believe we might be bandwith constrained and that this will bring us good speedups. Finding out the total time for a clean build downloading everything in the cache /w and /without build without the bytes should give us some insights.