NOTE - this is a VERY rough draft - I'd be surprised if there aren't some sentences that just cut off in the middle, to say nothing of needing better organization and flow. Try to read it as a skeletal description of one possible approach.
Since Go's release, a lot of ideas have circulated about how the Go development environment might be improved. With the introduction of a first-class dependency management system to the go toolchain, many of these goals suddenly become feasible to address centrally in the toolchain itself, rather than requiring awkward hacks by the user. This proposal outlines a unified solution to address several of these goals at once:
- Project-centric development: working on a single Go project from an arbitrary disk location, irrespective of GOPATH
- Multi-project development: working on multiple interrelated Go projects at once, inside a shared workspace (a re-envisioned GOPATH)
- Decluttered GOPATH: GOPATH has a tendency to accumulate code over time, and can become quite unwieldy to manage
- Better compiled object reuse: compiled objects (under `GOPATH/pkg`) can be parameterized by version for targeted reuse
The essence of this proposal is to create an alternate space for Go source code, similar to `GOPATH/src` but with version information demultiplexed, and to expand the toolchain to use this new space through reliance on the information in `dep`-style manifest and lock files. All of the above goals would fall out directly from this change, while retaining good, although unavoidably incomplete, backwards compatibility with existing workflows.
Today's GOPATH has two fundamental problems:
- It only allows a single version of any given package to exist at once (per GOPATH)
- We cannot programmatically differentiate between code the user is working on and code they merely depend on
More or less all the other problems with GOPATH stem from the interaction of these two issues. Because only one version of a given package can exist at a time, it may be necessary to switch versions around to meet differing build requirements; however, we have no insight into which code the user is working on - which dictates the requirements for dependencies, and should not be changed - versus dependencies, which might be changed.
Introduction of the `vendor/` directory ushered in a different paradigm: project-centric development. This wholly different approach handily addresses both of the aforementioned problems with GOPATH, and works quite well for many development workflows. With the addition of `dep`'s standardized manifest and lock files, tooling has sufficient information to select versions for all of a project's dependencies, and reproduce that set of dependencies into `vendor/` on any machine (assuming the continued upstream availability of the underlying source).
But a project-centric approach is not a panacea. For example, if a developer needs to work simultaneously on two or more projects that import each other, a single-project-centric approach necessarily falls down. On the other hand, this use case is met easily in today's world of shared workspaces. Each discrete project can maintain its own identity, while being treated collectively for higher-order tasks like compilation or dependency selection.
There are halfway solutions to addressing the multi-project problem using `vendor/`, but they all reduce to hacks: symlinking, cp/rsync scripts, fsnotify/inotify, etc. Only having a proper workspace will address the issue, and that means reimagining GOPATH. The best, most backwards-compatible way of doing that is to split up and reassign some of GOPATH's current responsibilities.
At present, there are two major reasons why source code might be present on GOPATH:
- The user is hacking on that code
- The user indicated a need for the code via `go get` (or an analogous tool), either directly or as a dependency
(There are arguably a couple of other cases, but...well, putting a pin in that for now)
If we can offload the responsibility for #2 into some other space, then GOPATH's problem with ambiguous user intent vanishes; if it's in GOPATH, the user is hacking on it, and tooling should treat it as a fixed point in dependency selection.
The challenge is in designing this third area - a "source cache" - and making the toolchain modifications necessary for working with it. Such a source cache would not only clean up GOPATH - for users following the single-project-centric development model, it would obviate the need for working within a GOPATH entirely.
The source cache should have several properties:
- Packages are structured by import path AND version (as opposed to merely import path, as in GOPATH)
- It is not associated with any single GOPATH
- It can store both compiled objects and source code
- Tooling can verify its integrity
`dep`'s manifest and lock provide import paths and versions - all the information tooling would need to address code in the source cache, rather than on GOPATH by mere import path. The first property is really the hardest thing to achieve (and the fourth falls out from it if implemented well), but I haven't been able to fully nail it down. So, I have some disconnected thoughts:
- `dep` is no more certain about the integrity of the code it works with than the version control systems that underlie it. It inherits their guarantees, and reuses their identifiers - versions and revisions, etc. This is maybe not great, and dovetails with the vendor verification issue.
- Ideally, entries in the source cache should be immutable. In a context like this, an immutable identifier tends to come from some hashing scheme, and as such will be a fixed number of bytes. This is fine for direct addressing, but is useless for sorting, and not human-friendly. And, depending on the address-generation scheme, it may not be readily addressable.
- There are versions that expressly shouldn't be immutable - branch names, for example. So, either the source cache needs to operate at a lower level that will make the organization of its contents non-obvious to users, or we have to give up immutability, or create a sidecar to hold that information.
Ideally, we would develop a well-defined 'source' abstraction - one that covers all VCS-backed sources currently supported by the toolchain, but could also cover a future 'registry' type, and possibly also arbitrary code trees found on disk.
Unfortunately, I'm not coming into this with mountains of experience with `go` toolchain internals, and have only had time for a cursory look at what sort of changes would be entailed by working with a source cache. So, please take the following discussion as an attempt to outline the general shape of how the behavior would need to change, rather than a detailed and correct plan.
Being the general entry point into the compiler, `go build` is where the most crucial changes will likely have to happen. Because all of this folds in with existing GOPATH behavior, I think it should be possible to hide behind a feature flag.
The general behavior of specifying packages to `go build` probably needn't change:
> Build compiles the packages named by the import paths,
> along with their dependencies, but it does not install the results.
> If the arguments to build are a list of .go files, build treats
> them as a list of source files specifying a single package.
Arbitrary import path names, or filenames, can still be provided, and `go build` will search GOPATH for them in the same way it always has. Changes begin once the tool has established the root director(ies) for which it will be building. A high-level description of the algorithm would be (for each named package):
- Climb the dir tree, attempting to find a lock file - basically, `dep.findProjectRoot()`. If no lock is found:
  - ...and the cwd/named package is not within a GOPATH, error out.
  - ...and the named package is within a GOPATH, proceed with the legacy `go build` behavior (skip everything below).
  - A lock file might also be explicitly specified as an input - `go get` would need this, see below. (This would combine awkwardly with the ability to specify an arbitrary number of packages, though)
- Load up the lock's list of project (import) root/version pairs into a trie, which will need to be injected/made available for the entire transitive scope of compilation work entailed by the named package's imports.
  - If the named package is within a GOPATH, it should be searched for each of the project roots named in the lock. If a project root is on GOPATH, it will supersede the lock-named version and the source cache for ALL child packages; source code must derive wholly from either the source cache OR GOPATH - no inconsistent mixing. This also needs to be toggle-able - see the `go get` discussion.
  - The behavior of `vendor/` here is tricky. I'd say it should generally supersede GOPATH in the same way GOPATH supersedes the source cache, but the big change here is precomputing where import paths should be sourced from, rather than doing localized search on a per-package basis (as happens with `vendor/`).
  - Relative import paths are also tricky. This is part of why I've just disallowed them entirely in `gps` (for now).
  - It should be noted that this algorithm will need significant refactoring if/when we get around to allowing multiple major versions of the same project in a build, as we will need to separate not only an import statement's importee (as we do here), but also the importer, so that packages are given the version of the imports they expect.
- When encountering an import path, consult the trie to determine the location from which the package should be sourced, and...issue corresponding instructions to `go tool compile`, which may then also need some parameter changes to e.g. `-I` and `-D`? (This is the boundary of my knowledge)
- If the import path is not in the trie (which could occur if the lock is out of date, a package was `ignore`d, there's a bug in the solver, etc.), there are two options - error out entirely, or fall back to GOPATH search. Perhaps that could be controlled via a flag to `go build`.
`go install`'s role is pretty much unchanged - a thin layer on top of `go build` that places the resulting binary in `$GOPATH/bin`.
There's a lot more we could and probably should do here, but I think this is at least an adequate skeleton for discussion right now.
In this world, `go get` becomes solely focused on the needs of users installing upstream Go software - not those developing it. Of course, users might choose to install e.g. a linter via `go get`, and that's fine - but, in order to provide project-specific tooling reproducibility, installation and management of developer tools should eventually move `dep`-ward.
For the most part, `go get` is a layer on top of `go build` in the same way as `go install`, with a few exceptions. For each named package:
- `go get` derives the project root, then searches the source cache† for the most recent version, per this algorithm.
  - If `-u` is passed, the remote source is queried to see if any newer versions are available, and the most recent one is selected, downloaded, and placed into the source cache.
- Once a version to operate on is selected, `go get` checks the project for a lock and manifest file.
  - If no lock file is present, `go get` initiates a `gps.Solve()` with an empty manifest - equivalent to the gps example. This should be as good or better than existing `go get` tip-fetching behavior in all but very odd cases. The solver Solution (itself a Lock) is used to populate the source cache with new deps as needed.
  - If a lock file is present but the memo does not match, run a `gps.Solve()` with the lock as input.
  - If solve fails in either of the preceding cases...probably error out? Not sure yet. :)
- Proceed with `go install` targeted at the named package within the source cache, passing the lock from the previous step as a parameter and setting the flag to ignore GOPATH.
† Being able to perform this search is the reason why having only hash digest identifiers as versions in the source cache is a problem. And it seems unlikely that `go get` would be the only tool with this requirement.
Re: "2. We cannot programmatically differentiate between code the user is working on and code they merely depend on" - yes we can: `export GOPATH=$HOME/go/deps:$HOME/go/work` - your `go get` deps will always be pulled into the first path from `GOPATH`, and you can do your work in the 2nd. Cheers!