Skip to content

Instantly share code, notes, and snippets.

@r6m
Forked from CAFxX/golang_minimize_allocations.md
Created February 15, 2023 08:59
Show Gist options
  • Save r6m/238c74d48033e02d15ed725f12ed8069 to your computer and use it in GitHub Desktop.
Save r6m/238c74d48033e02d15ed725f12ed8069 to your computer and use it in GitHub Desktop.
Minimize allocations in Go

📂 Minimize allocations in Go

Protobuf

Working with protobuf in Go puts significant load on the memory subsystem, as protobuf-generated structures often contain a significant amount of pointers.

One way to minimize the number of allocations is to allocate all the fields at the same time, and then use internal pointers to wire them up.

Let's pretend you have (the generated Go code for) four messages A, B, C, and D (note: for clarity this is not really protobuf-generated code, but the technique genralizes trivially):

type A struct {
	b *B
	c *C
	d *D
}

type B struct {
	int
}

type C struct {
	string
}

type D struct {
	c *C
}

Normally, assuming you need to populate all of them, you would do something like:

a := &A{
  b: &B{42},
  c: &C{"hello"},
  d: &D{
    c: &C{"world"},
  },
}

This is very idiomatic and compact, but has the downside of requiring 5 allocations just for the message structure.

Storage packing

Can we do better? Ideally we would like to perform a single allocation, instead of having to allocate every single field. Luckily, yes, we can.

To do this we can build an auxiliary struct that will provide the storage for all the fields and then wire up the pointers appropriately. To allocate an appropriate storage structure, we can do e.g.

s := &struct {
	A
	_b   B
	_c   C
	_d   D
	_d_c C
}{}

Then, to wire up all the internal pointers (in order to maintain compatibility with callers that are unaware of the change, like the protobuf ecosystem) we can do:

// none of these allocate
a := &s.A
a.b = &s._b
a.c = &s._c
a.d = &s._d
a.d.c = &s._d_c

At this point we can initialize the values as required:

*a.b = B{42}
*a.c = C{"hello"}
*a.d.c = C{"world"}

Note that a and *a are, repsectively, an *A and A indistinguishable from any other *A and A allocated in the regular way.

This playground link shows the complete example, and includes a test that shows that a single allocations is being performed: https://play.golang.org/p/R8DBVe7jaeX. Note that all the scaffolding for allocating the full structure has been moved to a separate function getA that returns a plain *A.

There are also some possible variations on the generic approach above, that can help in specific cases, e.g.:

  • only pack a subset of fields and manually allocate the others when needed
  • reorder the fields for better locality
  • reserve storage in each field instead of in the top level (e.g. create a second packing structure to replace D, containing both *C and the storage for C, and include this structure in the top level storage structure).

For the record, I also suggested to implement this (as an option) in gogoproto: gogo/protobuf#555.

Upsides

  • Better memory locality (as all fields are contiguous to each other)
  • Better cache utilization (this depends on your access patterns, it is possible that in certain cases cache utilization will be worse)
  • Lower number of memory allocations, and therefore lower CPU usage
  • The mechanism is generic, and can be used not just in case of protobuf messages

Downsides

Are there any downsides to this approach, or it can be safely used in all cases? Glad you asked!

There are some downsides (suggestions welcome!):

  • You need to decide at compile-time which fields to allocate; if then at runtime you don't need all of them, you will be wasting the memory for the fields you are not using.
  • Related to the previous point: because all fields are allocated in a single object, as long as any field is alive all the other fields can not be garbage collected.
  • This approach only reduces the number of allocations, not the amount of memory allocated.

None of this is a deal-breaker, but depending on your workload and use cases it may cause memory consumption to increase.

Closures

If you create closures on a hot path and the closure captures more than one variable from its parent environment, each captured variable escapes to the heap and therefore will need to be heap-allocated with a dedicated allocation. This means that your closure with N captures will normally require N+1 allocations.

func foo(a *bar, b int) func() {
	// This requires 3 allocations: 1 for the closure itself, and 2 for the
	// captured variables a and b.
	go func() {
		doSomething(a, b)
	}()
}

You can cut this down to 2 allocations by grouping the captured variables in a temporary struct, and capturing the struct instead:

func foo(a *bar, b int) func() {	
	// This requires 2 allocations: 1 for the closure itself, and 1 for the
	// captured variable x (regardless of how many fields the structure contains).
	x := struct{a *bar; b int}{a, b}
	go func() {
		doSomething(x.a, x.b)
	}()
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment