-
-
Save steveruizok/3d1bf2b95c1bbac9b46658527a2ca717 to your computer and use it in GitHub Desktop.
export class Cache<T extends object, K> { | |
items = new WeakMap<T, K>() | |
get<P extends T>(item: P, cb: (item: P) => K) { | |
if (!this.items.has(item)) { | |
this.items.set(item, cb(item)) | |
} | |
return this.items.get(item)! | |
} | |
access(item: T) { | |
return this.items.get(item) | |
} | |
set(item: T, value: K) { | |
this.items.set(item, value) | |
} | |
has(item: T) { | |
return this.items.has(item) | |
} | |
invalidate(item: T) { | |
this.items.delete(item) | |
} | |
bust() { | |
this.items = new WeakMap() | |
} | |
} |
class Example { | |
getOutline(shape) { | |
const sides = 5 | |
const ratio = 1 | |
const cx = w / 2 | |
const cy = h / 2 | |
const ix = (cx * ratio) / 2 | |
const iy = (cy * ratio) / 2 | |
const step = PI2 / sides / 2 | |
return Array.from(Array(sides * 2)).map((_, i) => { | |
const theta = -TAU + i * step | |
return new Vec2d( | |
cx + (i % 2 ? ix : cx) * Math.cos(theta), | |
cy + (i % 2 ? iy : cy) * Math.sin(theta) | |
) | |
}) | |
} | |
outline(shape: T) { | |
return outlines.get<T>(shape, (shape) => this.getOutline(shape)) | |
} |
Quick update—this is the fastest that I've gotten it so far. Surprisingly the destructuring trick is fast!
function pointsOnEllipse4(cx, cy, rx, ry, n) {
const xs = new Float32Array(n);
const ys = new Float32Array(n);
const step = Math.PI / (n / 2);
let sin = 0;
let cos = 1;
const a = Math.cos(step);
const b = Math.sin(step);
for (let i = 0; i < n; i++) {
xs[i] = cx + rx * cos;
ys[i] = cy + ry * sin;
;[sin, cos] = [b * cos + a * sin, a * cos - b * sin];
}
return { xs, ys }
}
I wouldn't pay too much attention to that destructuring, which is more microbench-y really. It invokes array iterator, but here the engine clearly sees there's nothing to be invoked. Won't necessarily be true if the code shifts around. Plus, readability. Aim for simple and fast code that looks like C (this has been a solid advice for the past 20 years no matter what magic JIT has done); C doesn't have a JIT, so coding while pretending there's no JIT helps too.
Regarding Float32Array: they might be faster, but the reason is likely that they're 32-bits instead of 64-bits. So yeah here again it's about having a smaller memory footprint. But on the other hand, you've now traded off your number precision (that we just fixed) that you might need in the future. Imo this is especially relevant for SVGs (as opposed to raster graphics and 3D), since e.g. if you have 2 same shapes with different colors perfectly on top of each other, you should see a single outline, not some colors from the background shape bleeding out (aka numerical imprecision). I don't know though. Just a general observation.
The structure-of-array transform we did is fast for 2 reasons:
- Way fewer tiny allocations. Tiny objects are still relatively fast to create because they're are usually allocated around the same places ("arenas"), so, good memory locality, thus perf, and so their drawback of being more allocate-y doesn't show up as much. But when readability is the same, I try not to rely on that assumption too much and just do the predictable less allocate-y thing. Plus there's also deallocation: GC pauses etc.
- The transform makes iteration and branch predictor very happy. Iterating over a plain array of numbers is the best-case scenario for modern CPUs (and GPUs, and eh, TPUs or whatever comes next I guess. This will always be relevant).
So that's why you see that 25% perf boost at basically no cost.
One last note on your initial problem: I think if we have to cache the outlines, we should at least lift the cache to a higher level, to a single big one. Having so many little disparate caches is a bit scary.
Update: final follow-up tweet at https://twitter.com/_chenglou/status/1571430079829528577
Verify Github on Galxe. gid:YZP8XDSaNU8GruWpvhJpFZ
It's mentioned that the cache is slow. This is to be expected when using a WeakMap.
Was inverting the cache considered? Meaning storing the x,y cached properties inside each associated original shape (either as public or private fields). This ensures there is no increase in the count of objects and makes cache access super fast. I am wondering if fast caching changes the balance here.
Good point. For Steve's use-case it seems storing the entire outline array on the instance every time the shape transforms, might be viable. It does nothing for latency and uses more memory, but still, seems better than a WeakMap.
Tldr modern performance is mostly about shuffling your memory into the right representation*. Which is a nice endeavor because starting from data structures is important anyway.
But yes, if it's not in the hot path, then don't bother! Though there might be an argument that this is the better data layout even just for ergonomics. I wouldn't know that without trying perfect-freehand though
* again, my rough approximation is that if you're doing the sort of optimizations that increases compute at the expense of memory, by e.g. caching or whatever, then you might have already lost, especially in latency