NOTE - this is a VERY rough draft - I'd be surprised if there aren't some sentences that just cut off in the middle, to say nothing of needing better organization and flow. Try to read it as a skeletal description of one possible approach.
Since Go's release, a lot of ideas have circulated about how the Go development environment might be improved. With the introduction of a first-class dependency management system to the go toolchain, many of these goals suddenly become feasible to address centrally in the toolchain itself, rather than requiring awkward hacks by the user. This proposal outlines a unified solution to address several of these goals at once:
- Project-centric development: working on a single Go project from an arbitrary disk location, irrespective of GOPATH
- Multi-project development: working on multiple interrelated Go projects at once, inside a shared workspace (a re-envisioned GOPATH)
- Decluttered GOPATH: GOPATH has a tendency to accumulate code over time, and can become quite unwieldy to manage
- Better compiled object reuse: compiled objects (under `GOPATH/pkg`) can be parameterized by version for targeted reuse
The essence of this proposal is to create an alternate space for Go source code, similar to `GOPATH/src` but with version information demultiplexed, and to expand the toolchain to use this new space through reliance on the information in `dep`-style manifest and lock files. All of the above goals would fall out directly from this change, while retaining good, although unavoidably incomplete, backwards compatibility with existing workflows.
Today's GOPATH has two fundamental problems:
- It only allows a single version of any given package to exist at once (per GOPATH)
- We cannot programmatically differentiate between code the user is working on and code they merely depend on
More or less all the other problems with GOPATH stem from the interaction of these two issues. Because only one version of a given package can exist at a time, it may be necessary to switch versions around to meet differing build requirements; however, we have no insight into which code the user is working on - which dictates the requirements for dependencies, and should not be changed - versus dependencies, which might be changed.
Introduction of the `vendor/` directory ushered in a different paradigm: project-centric development. This wholly different approach handily addresses both of the aforementioned problems with GOPATH, and works quite well for many development workflows. With the addition of `dep`'s standardized manifest and lock files, tooling has sufficient information to select versions for all of a project's dependencies, and reproduce that set of dependencies into `vendor/` on any machine (assuming the continued upstream availability of the underlying source).
But a project-centric approach is not a panacea. For example, if a developer needs to work simultaneously on two or more projects that import each other, a single-project-centric approach necessarily falls down. On the other hand, this use case is met easily in today's world of shared workspaces. Each discrete project can maintain its own identity, while being treated collectively for higher-order tasks like compilation or dependency selection.
There are halfway solutions to addressing the multi-project problem using `vendor/`, but they all reduce to hacks: symlinking, cp/rsync scripts, fsnotify/inotify, etc. Only having a proper workspace will address the issue, and that means reimagining GOPATH. The best, most backwards-compatible way of doing that is to split up and reassign some of GOPATH's current responsibilities.
At present, there are two major reasons why source code might be present on GOPATH:
- The user is hacking on that code
- The user indicated a need for the code via `go get` (or an analogous tool), either directly or as a dependency
(There are arguably a couple of other cases, but...well, putting a pin in that for now)
If we can offload the responsibility for #2 into some other space, then GOPATH's problem with ambiguous user intent vanishes; if it's in GOPATH, the user is hacking on it, and tooling should treat it as a fixed point in dependency selection.
The challenge is in designing this third area - a "source cache" - and making the toolchain modifications necessary for working with it. Such a source cache would not only clean up GOPATH - for users following the single-project-centric development model, it would obviate the need for working within a GOPATH entirely.
The source cache should have several properties:
- Packages are structured by import path AND version (as opposed to merely import path, as in GOPATH)
- It is not associated with any single GOPATH
- It can store both compiled objects and source code
- Tooling can verify its integrity
`dep`'s manifest and lock provide import paths and versions - all the information tooling would need to address code in the source cache, rather than on GOPATH by mere import path. The first property is really the hardest thing to achieve (and the fourth falls out from it if implemented well), but I haven't been able to fully nail it down. So, I have some disconnected thoughts:
- `dep` is no more certain about the integrity of the code it works with than the version control systems that underlie it. It inherits their guarantees, and reuses their identifiers - versions and revisions, etc. This is maybe not great, and dovetails with the vendor verification issue.
- Ideally, entries in the source cache should be immutable. In a context like this, an immutable identifier tends to come from some hashing scheme, and as such will be a fixed number of bytes. This is fine for direct addressing, but is useless for sorting, and not human-friendly. And, depending on the address-generation scheme, it may not be readily addressable.
- There are versions that expressly shouldn't be immutable - branch names, for example. So, either the source cache needs to operate at a lower level that will make the organization of its contents non-obvious to users, or we have to give up immutability, or create a sidecar to hold that information.
Ideally, we would develop a well-defined 'source' abstraction - one that covers all VCS-backed sources currently supported by the toolchain, but could also cover a future 'registry' type, and possibly also arbitrary code trees found on disk.
Unfortunately, I'm not coming into this with mountains of experience with `go` toolchain internals, and have only had time for a cursory look at what sort of changes would be entailed by working with a source cache. So, please take the following discussion as an attempt to outline the general shape of how the behavior would need to change, rather than a detailed and correct plan.
Being the general entry point into the compiler, `go build` is where the most crucial changes will likely have to happen. Because all of this folds in with existing GOPATH behavior, I think it should be possible to hide behind a feature flag.
The general behavior of specifying packages to `go build` probably needn't change:
> Build compiles the packages named by the import paths,
> along with their dependencies, but it does not install the results.
> If the arguments to build are a list of .go files, build treats
> them as a list of source files specifying a single package.
Arbitrary import path names, or filenames, can still be provided, and `go build` will search GOPATH for them in the same way it always has. Changes begin once the tool has established the root director(ies) for which it will be building. A high-level description of the algorithm would be (for each named package):
- Climb the dir tree, attempting to find a lock file - basically, `dep.findProjectRoot()`. If no lock is found:
  - ...and the cwd/named package is not within a GOPATH, error out.
  - ...and the named package is within a GOPATH, proceed with the legacy `go build` behavior (skip everything below).
  - A lock file might also be explicitly specified as an input - `go get` would need this, see below. (This would combine awkwardly with the ability to specify an arbitrary number of packages, though)
- Load up the lock's list of project (import) root/version pairs into a trie, which will need to be injected/made available for the entire transitive scope of compilation work entailed by the named package's imports.
  - If the named package is within a GOPATH, it should be searched for each of the project roots named in the lock. If a project root is on GOPATH, it will supersede the lock-named version and the source cache for ALL child packages; source code must derive wholly from either the source cache OR GOPATH - no inconsistent mixing. This also needs to be toggle-able - see the `go get` discussion.
  - The behavior of `vendor/` here is tricky. I'd say it should generally supersede GOPATH in the same way GOPATH supersedes the source cache, but the big change here is precomputing where import paths should be sourced from, rather than doing localized search on a per-package basis (as happens with `vendor/`).
  - Relative import paths are also tricky. This is part of why I've just disallowed them entirely in `gps` (for now).
  - It should be noted that this algorithm will need significant refactoring if/when we get around to allowing multiple major versions of the same project in a build, as we will need to separate not only an import statement's importee (as we do here), but also the importer, so that packages are given the version of the imports they expect.
- When encountering an import path, consult the trie to determine the location from which the package should be sourced, and...issue corresponding instructions to `go tool compile`, which may then also need some parameter changes to e.g. `-I` and `-D`? (This is the boundary of my knowledge)
- If the import path is not in the trie (which could occur if the lock is out of date, a package was `ignore`d, there's a bug in the solver, etc.), there are two options - error out entirely, or fall back to GOPATH search. Perhaps that could be controlled via a flag to `go build`.
`go install`'s role is pretty much unchanged - a thin layer on top of `go build` that places the resulting binary in `$GOPATH/bin`.
There's a lot more we could and probably should do here, but I think this is at least an adequate skeleton for discussion right now.
In this world, `go get` becomes solely focused on the needs of users installing upstream Go software - not those developing it. Of course, users might choose to install e.g. a linter via `go get`, and that's fine - but, in order to provide project-specific tooling reproducibility, installation and management of developer tools should eventually move `dep`-ward.
For the most part, `go get` is a layer on top of `go build` in the same way as `go install`, with a few exceptions. For each named package:
- `go get` derives the project root, then searches the source cache† for the most recent version, per this algorithm.
  - If `-u` is passed, the remote source is queried to see if any newer versions are available, and the most recent one is selected, downloaded, and placed into the source cache.
- Once a version to operate on is selected, `go get` checks the project for a lock and manifest file.
  - If no lock file is present, `go get` initiates a `gps.Solve()` with an empty manifest - equivalent to the gps example. This should be as good or better than existing `go get` tip-fetching behavior in all but very odd cases. The solver Solution (itself a Lock) is used to populate the source cache with new deps as needed.
  - If a lock file is present but the memo does not match, run a `gps.Solve()` with the lock as input.
  - If solve fails in either of the preceding cases...probably error out? Not sure yet. :)
- Proceed with `go install` targeted at the named package within the source cache, passing the lock from the previous step as a parameter and setting the flag to ignore GOPATH.
† Being able to perform this search is the reason why having only hash digest identifiers as versions in the source cache is a problem. And it seems unlikely that `go get` would be the only tool with this requirement.
Re: "2. We cannot programmatically differentiate between code the user is working on and code they merely depend on" - yes we can: `export GOPATH=$HOME/go/deps:$HOME/go/work` - your `go get` deps will always be pulled into the first path from `GOPATH`, and you can do your work in the 2nd. Cheers!