@fmeum
Last active June 16, 2022 18:19
Improving module extension usability

Current situation

I have frequently run into the following two situations when working with and designing for bzlmod.

http_archive as a module extension

A typical usage of the direct analogue of the current http_archive repository rule for MODULE.bazel would probably look as follows:

http = use_extension("@bazel_tools//...", "http")
http.archive(name = "foobar", url = "...")
use_repo(http, "foobar")

There are two problems with this:

  1. "foobar" has to be repeated in http.archive and use_repo.
  2. As far as I understand, since all repositories created by a single module extension can see each other, this could lead to clashes if a module that is a transitive dependency of the current module also happens to use http.archive(name = "foobar").

"one repo per non-Bazel dep" module extensions

Rulesets that manage external, non-Bazel dependencies often create one repository per external dep (see e.g. Gazelle). With bzlmod, the pattern of using one module tag per non-Bazel dep nicely models the dependency requirements, even for transitive module dependencies.

For example, a hypothetical bzlmod-ified Gazelle would allow for the following:

go_dep = use_extension("@bazel_gazelle//...", "go_dep")
go_dep.mod(importpath = "golang.org/x/errors", version = "...")
use_repo(go_dep, "org_golang_x_errors")

There is no repetition here since there is no need to specify a "name" attribute on the tag. However, there are still some things that are not so nice about this pattern:

  1. When the list of Go dependencies gets larger, it becomes difficult to match and sync the list of go_dep.mod lines to the repositories listed in use_repo.
  2. Users have to be aware of the "algorithm" that turns an import path into a repository name.

(Let's disregard for the moment that, in the particular case of Go dependencies, Gazelle will probably take care of 1. and 2.; Go should just serve as a concrete example of the more general external deps case.)
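To make point 2 concrete, here is a minimal sketch of the kind of naming "algorithm" a user would have to know. It is a simplified illustration that reproduces the example above (reverse the host labels, then join everything with underscores); it is not claimed to match Gazelle's actual implementation, and the function name is made up for this example.

```python
import re


def import_path_to_repo_name(importpath: str) -> str:
    """Derive a repository name from a Go import path (illustrative only)."""
    components = importpath.lower().split("/")
    # Reverse the dot-separated host labels: "golang.org" -> ["org", "golang"].
    host_labels = components[0].split(".")[::-1]
    name = "_".join(host_labels + components[1:])
    # Replace anything that is not a lowercase letter, digit or underscore.
    return re.sub(r"[^a-z0-9_]", "_", name)


print(import_path_to_repo_name("golang.org/x/errors"))  # org_golang_x_errors
```

Even a rule this small is easy to get wrong by hand once hyphens or deeper paths are involved, which is the usability problem described above.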

Proposal

Both situations described above were less verbose with WORKSPACE than with MODULE.bazel because they follow the probably very common "one tag, one repo" pattern. While bzlmod can handle more general situations, it doesn't offer any handy shortcut for this idiom.

I am thus proposing the following:

  1. Add a function mark_resolved_to to the module_ctx passed to a module extension that takes a module tag and a repository name as arguments and internally marks the tag as corresponding to the particular repository, which must be instantiated by the module extension. Multiple tags can be associated with a single repository in this way, but it is an error if more than one repository is associated with the same tag.

  2. Either of the following (a) is a bit more concise, but (b) mimics the repo_name attribute on bazel_dep, which is nice for consistency:

    a) In MODULE.bazel, an assignment statement some_name = some_ext.some_tag(...) makes the repository associated with the tag visible as some_name. If the module extension hasn't called mark_resolved_to for this tag, fail with an error.

    b) Add a magic repo_name attribute to all tags. If it is set and the tag is not associated with a repository, fail with an error. If it is set and the tag is associated with a repository, make the repository visible under the given name.

Of course, names and syntax aren't set in stone; the core of this proposal is merely to let module extensions establish a link between tags and repos.
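The bookkeeping rules of the proposal can be modelled in a few lines of plain Python. This is only a toy model of the semantics described above, not a Bazel API: the class, its methods other than the proposed mark_resolved_to, and the tag identifiers are all invented for illustration, and the "repository must be instantiated" check is done eagerly here for simplicity.

```python
class TagResolution:
    """Toy model of the proposed tag-to-repository bookkeeping (not a Bazel API)."""

    def __init__(self):
        self._created_repos = set()  # repositories instantiated by the extension
        self._repo_for_tag = {}      # tag id -> repository name

    def create_repo(self, repo_name):
        self._created_repos.add(repo_name)

    def mark_resolved_to(self, tag, repo_name):
        if repo_name not in self._created_repos:
            raise ValueError(f"{repo_name!r} was not instantiated by this extension")
        # Many tags may share one repository, but a tag maps to at most one.
        previous = self._repo_for_tag.setdefault(tag, repo_name)
        if previous != repo_name:
            raise ValueError(f"tag {tag!r} is already associated with {previous!r}")

    def visible_repos(self, requested):
        """Option 2.b: map each user-chosen repo_name to the repo behind its tag."""
        mapping = {}
        for tag, user_name in requested.items():
            if tag not in self._repo_for_tag:
                raise ValueError(f"tag {tag!r} is not associated with any repository")
            mapping[user_name] = self._repo_for_tag[tag]
        return mapping


resolution = TagResolution()
resolution.create_repo("internal_name_1")
resolution.mark_resolved_to("archive#0", "internal_name_1")
resolution.mark_resolved_to("archive#1", "internal_name_1")  # many tags, one repo
print(resolution.visible_repos({"archive#0": "foobar"}))
```

The user-facing name ("foobar") and the extension's internal name are decoupled, which is what lets the extension pick collision-free internal names.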

Benefits

Going over the introductory examples, the mentioned problems are solved as follows:

http_archive as a module extension

This could become:

http = use_extension("@bazel_tools//...", "http")
# Option 2.a)
foobar = http.archive(url = "...")
# Option 2.b)
http.archive(repo_name = "foobar", url = "...")

  1. The name of the repository no longer has to be repeated.
  2. Internally, the http extension can choose any name for the repository that is certain not to collide with other names generated by the same extension (e.g., module_name.$url_safe_chars.$hash_of_url).
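As a rough illustration of point 2, an internal name following the module_name.$url_safe_chars.$hash_of_url idea could be derived as below. The function name, the choice of SHA-256 and the truncation lengths are assumptions made for this sketch only.

```python
import hashlib
import re


def internal_repo_name(module_name: str, url: str) -> str:
    """Build an internal repository name from the using module and the URL."""
    # Keep a readable, sanitized prefix of the URL for debugging purposes.
    url_safe_chars = re.sub(r"[^A-Za-z0-9_]", "_", url)[:40]
    # The hash disambiguates URLs that sanitize to the same readable prefix.
    hash_of_url = hashlib.sha256(url.encode("utf-8")).hexdigest()[:12]
    return f"{module_name}.{url_safe_chars}.{hash_of_url}"


print(internal_repo_name("my_module", "https://example.com/foobar.tar.gz"))
```

Because the module name is part of the result, two modules fetching the same URL would still get distinct repositories; whether that or sharing is preferable is a design choice this sketch does not settle.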

"one repo per non-Bazel dep" module extensions

This could become:

go_dep = use_extension("@bazel_gazelle//...", "go_dep")
# Option 2.a)
org_golang_x_errors = go_dep.mod(importpath = "golang.org/x/errors", version = "...")
# Option 2.b)
go_dep.mod(repo_name = "org_golang_x_errors", importpath = "golang.org/x/errors", version = "...")

  1. An external repository declaration corresponds to a single line, containing both the name of the repo and the tag.
  2. Users no longer have to be aware of the naming scheme used internally by the module extension. They can use it (as in the example), but aren't forced to.
@aiuto

aiuto commented Jun 7, 2022

I like the 2a scheme, where you get an object back that describes the repository. That object could then become an argument to generic tools to examine/fix the downloaded repository. Things like:

  • rewrite targets in BUILD files to conform to local usage. Imagine we built a custom cc_library which uses sources instead of srcs and want all invocations globally to use our local one
  • splicing in license and SBOM package annotations to the top level BUILD file

@alexeagle

We have the same problem in JS - but in our case one tag points to a lock file which creates many external repos:
https://github.com/aspect-build/rules_js/blob/main/e2e/bzlmod/MODULE.bazel#L21-L25
so for this case I think neither 1 nor 2 is sufficient.

@fmeum
Author

fmeum commented Jun 8, 2022

@alexeagle Doesn't the JS use case have the special property that a use of the tag providing the lock file only ever fetches direct dependencies of the module the tag is used from? If so, the mark_resolved_to approach would still provide meaningful metadata that may allow for something like this:

all_npm_repos = npm.npm_translate_lock(
    pnpm_lock = "//:pnpm-lock.yaml",
)
use_all_repos(npm, all_npm_repos)

Generally speaking, being able to structurally reference module tags as substitutes for the collection of repositories they resolve to seems like the way to reduce the use_repo noise.

@alexeagle

No, at least not currently. @gregmagolan can explain why use of pnpm causes even transitive deps to be referenced by macros that expand into the user's workspace.

@alexeagle

However I think your proposal can still work for our case, since the npm_translate_lock could return all transitive repos as well, and they'd all be registered as visible to users.

(I also think the bzlmod launch for Bazel 6 ought to separate the new "strict repository visibility" feature from the new package manager, by at least making the strictness controlled by a flag, though I'm not sure if that's feasible given the repository mapping feature being used)

@fmeum
Author

fmeum commented Jun 8, 2022

Yes, transitive NPM dependencies should be handled well. NPM dependencies of transitive bzlmod dependencies would be problematic, but it seems that those don't arise as the bzlmod dependencies would supply their own lockfiles.

@gregmagolan

Yup. We can return the full set of direct and transitive repos needed from npm_translate_lock since we generate all of them individually from the data in the lockfile.

@Wyverald

re 2a: Assigning a tag to a variable is slightly problematic in that the set of valid repo names doesn't exactly coincide with the set of valid Starlark variable names.
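A small sketch of the mismatch: the check below tests whether a candidate repo name could also be written as a Starlark variable name. The sample names containing "-" or "." are assumed to be acceptable repo names for the sake of illustration, and the keyword set is a deliberately incomplete subset.

```python
import re

# Identifier shape: ASCII letters, digits and underscores, not starting with a digit.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

# A few Starlark keywords (not an exhaustive list).
_SOME_KEYWORDS = {"and", "or", "not", "in", "if", "else", "for", "def", "return", "load"}


def usable_as_variable(repo_name: str) -> bool:
    """Return True if repo_name could appear on the left of an assignment."""
    return bool(_IDENTIFIER.fullmatch(repo_name)) and repo_name not in _SOME_KEYWORDS


for name in ["org_golang_x_errors", "rules-foo", "foo.bar", "for"]:
    print(name, usable_as_variable(name))
```

Under option 2.b the name is a string attribute, so this restriction would not apply there.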

re the whole proposal: @meteorcloudy and I discussed a similar idea before deciding on use_repo. The choice was essentially who gets to decide the repo names: the user of the extension, or the extension author? The concerns we had about the first option are already familiar to you -- the correspondence between tags and repos can very well be many-to-many. We felt that requiring the extension author to declare which tags correspond to which repos could be overly complicated, and the API also wasn't easy to design (how do you refer to a specific tag within the extension's impl function?). We did entertain the idea of keeping this option open (so an extension can optionally declare that it supports associating tags with repos) -- maybe it's time to actually design that?

Another point that may or may not be relevant here: it could be beneficial to use a "hub" repo pattern. This "hub" repo contains nothing but a bunch of aliases to other repos generated by the same extension. This way, the user only needs to write one use_repo clause, and can still use everything generated by the extension. This pattern might not suit every extension (maybe the hub would have to contain too many aliases or something) but does make the user experience better, without sacrificing the benefit brought by multiple repos (caching & on-demand fetching).
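The "hub" pattern can be sketched as generating a BUILD file that contains only alias targets. The helper below merely renders such file content as a string; the repository and target names in the usage example are hypothetical, and a real extension would write the result from a repository rule rather than print it.

```python
def hub_build_file(aliases: dict) -> str:
    """Render BUILD file content with one alias per entry.

    `aliases` maps the alias name exposed by the hub to the label of the
    real target in another repository generated by the same extension.
    """
    lines = ['package(default_visibility = ["//visibility:public"])', ""]
    for name, actual in sorted(aliases.items()):
        lines += [
            "alias(",
            f'    name = "{name}",',
            f'    actual = "{actual}",',
            ")",
            "",
        ]
    return "\n".join(lines)


print(hub_build_file({
    "dep_a": "@ext_internal_dep_a//:lib",  # hypothetical labels
    "dep_b": "@ext_internal_dep_b//:lib",
}))
```

With this layout the user writes a single use_repo for the hub, while each aliased repository is still fetched only when one of its targets is actually needed.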

@fmeum
Author

fmeum commented Jun 10, 2022

I can see a "hub" repo working well for something like rules_jvm_external, which essentially produces one target per external dependency. This probably wouldn't work as well for rules_go/Gazelle for the following reasons. Every Go dependency is a full-blown Bazel repo with potentially many interesting targets (binaries, helper libraries, proto libraries,...). Mapping these all under @go_deps//repo_name/... would require a lot of aliases.

Furthermore, hub repos don't save users from having to know the tag -> repo name mapping.

@alexeagle

FYI we discussed this one at the rules authors SIG meeting this week.

I pointed out another potential solution: take the Gazelle approach of "Bazel needs to know this thing, it's derivable from your existing files, but Bazel doesn't allow that derivation to take place at runtime". That would mean a little translator from go.mod that updates your MODULE.bazel file and that you have to run out-of-band. I don't love it, but OTOH most Bazel users should be using Gazelle someday, and Go devs already run https://github.com/bazelbuild/bazel-gazelle#update-repos so this could fit there.
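A minimal sketch of what such an out-of-band translator might do, assuming the hypothetical go_dep.mod tag from the gist and a simplified repo naming rule. It only understands plain require directives, the versions in the sample are made up, and it prints the generated lines instead of editing MODULE.bazel.

```python
import re


def repo_name(importpath):
    """Simplified naming rule: reversed host labels joined with underscores."""
    parts = importpath.lower().split("/")
    name = "_".join(parts[0].split(".")[::-1] + parts[1:])
    return re.sub(r"[^a-z0-9_]", "_", name)


def parse_requirements(go_mod_text):
    """Return (importpath, version) pairs from the require directives."""
    deps = []
    in_block = False
    for raw in go_mod_text.splitlines():
        line = raw.split("//", 1)[0].strip()  # drop comments such as "// indirect"
        if not line:
            continue
        if in_block:
            if line == ")":
                in_block = False
            else:
                path, version = line.split()[:2]
                deps.append((path, version))
        elif re.fullmatch(r"require\s*\(", line):
            in_block = True
        elif line.startswith("require "):
            path, version = line.split()[1:3]
            deps.append((path, version))
    return deps


def render_module_lines(go_mod_text):
    """Render one tag line per dependency plus a matching use_repo call."""
    deps = parse_requirements(go_mod_text)
    lines = [f'go_dep.mod(importpath = "{p}", version = "{v}")' for p, v in deps]
    repos = ", ".join(f'"{repo_name(p)}"' for p, _ in deps)
    lines.append(f"use_repo(go_dep, {repos})")
    return "\n".join(lines)


SAMPLE_GO_MOD = """\
module example.com/hello

require (
    golang.org/x/errors v0.1.0 // indirect
    github.com/foo/bar v1.2.3
)
"""

print(render_module_lines(SAMPLE_GO_MOD))
```

Generating both the tags and the use_repo list from one source keeps the two in sync, at the cost of an extra step users must remember to run.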

@fmeum
Author

fmeum commented Jun 16, 2022

That would definitely be idiomatic for Gazelle and it's also what I settled on for rules_go specifically. Maybe the general answer is that other rulesets should just adopt the Gazelle approach as well? Might be hard to sell to new Bazel users, but wouldn't be too bad otherwise.
