tazjin/layering.md

## layering.md

      
    Raw
  

              layering.md
            
          
    This program reads an export reference graph (i.e. a graph representing the
runtime dependencies of a set of derivations) created by Nix and groups them
in a way that is likely to match the grouping for other derivation sets with
overlapping dependencies.
This is used to determine which derivations to include in which layers of a
container image.
Inputs


a graph of Nix runtime dependencies, generated via exportReferenceGraph
a file containing absolute popularity values of packages in the
Nix package set (in the form of a direct reference count)
a maximum number of layers to allocate for the image (the "layer budget")

Algorithm

It works by first creating a (directed) dependency tree:
img (root node)
│
├───> A ─────┐
│            v
├───> B ───> E
│            ^
├───> C ─────┘
│     │
│     v
└───> D ───> F
      │
      └────> G

Each node (i.e. package) is then visited to determine how important
it is to separate this node into its own layer, specifically:


Is the node within a certain threshold percentile of absolute
popularity within all of nixpkgs? (e.g. glibc, openssl)


Is the node's runtime closure above a threshold size? (e.g. 100MB)


In either case, a bit is flipped for this node representing each
condition and an edge to it is inserted directly from the image
root, if it does not already exist.
For the rest of the example we assume 'G' is above the threshold
size and 'E' is popular.
This tree is then transformed into a dominator tree:
img
│
├───> A
├───> B
├───> C
├───> E
├───> D ───> F
└───> G

Specifically this means that the paths to A, B, C, E, G, and D
always pass through the root (i.e. are dominated by it), whilst F
is dominated by D (all paths go through it).
The top-level subtrees are considered as the initially selected
layers.
If the list of layers fits within the layer budget, it is returned.
Otherwise layers are merged together in this order:

layers whose root meets neither condition above
layers whose root is popular
layers whose root is big
layers whose root meets both conditions

Threshold values

Threshold values for the partitioning conditions mentioned above
have not yet been determined, but we will make a good first guess
based on gut feeling and proceed to measure their impact on cache
hits/misses.
Example

Using the logic described above as well as the example presented in
the introduction, this program would create the following layer
groupings (assuming no additional partitioning):
Layer budget: 1
Layers: { A, B, C, D, E, F, G }
Layer budget: 2
Layers: { G }, { A, B, C, D, E, F }
Layer budget: 3
Layers: { G }, { E }, { A, B, C, D, F }
Layer budget: 4
Layers: { G }, { E }, { D, F }, { A, B, C }
...
Layer budget: 10
Layers: { E }, { D, F }, { A }, { B }, { C }