Run these commands from an x64 developer command prompt. I'm using VS 2019, but earlier and later versions should work the same. Building GLFW requires git and cmake to be available in PATH; there are no other dependencies.
git clone https://github.com/glfw/glfw
cd glfw
The core of most real-time fluid simulators, like the one in EmberGen, is based on the "Stable Fluids" algorithm by Jos Stam, which to my knowledge was first presented at SIGGRAPH '99. This post is about one part of the algorithm that's often underestimated: projection.
The Stable Fluids algorithm solves a subset of the famous Navier-Stokes equations, which describe how fluids move and interact. In particular, it typically solves what are called the "incompressible Euler equations", in which viscous forces are ignored.
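For reference, the incompressible Euler equations and the projection step can be written as below. The notation is chosen here for illustration and is not taken from the post: u is the velocity field, p the pressure, ρ the density, f any external force, u* the intermediate velocity before projection, and q a pressure-like scalar with the time step and density absorbed into it.

```latex
% Incompressible Euler equations (momentum and incompressibility constraint)
\begin{aligned}
\frac{\partial \mathbf{u}}{\partial t} + (\mathbf{u}\cdot\nabla)\,\mathbf{u}
  &= -\frac{1}{\rho}\nabla p + \mathbf{f}, \\
\nabla\cdot\mathbf{u} &= 0.
\end{aligned}

% Projection: remove the divergent part of the intermediate velocity u*
\begin{aligned}
\nabla^{2} q &= \nabla\cdot\mathbf{u}^{*}, \\
\mathbf{u} &= \mathbf{u}^{*} - \nabla q .
\end{aligned}
```

Taking the divergence of the last line and substituting the Poisson equation gives ∇·u = 0, which is why solving for q and subtracting its gradient enforces incompressibility.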
Card Name              | VRAM (GB) | Type  | Release Date | Bandwidth (GB/s)
-----------------------|-----------|-------|--------------|-----------------
GeForce RTX 2080 Ti    | 11        | GDDR6 | Sep 20 2018  | 616.0
Radeon RX 5700 XT      | 8         | GDDR6 | Jul 7 2019   | 448.0
Radeon RX 580          | 8         | GDDR5 | Apr 18 2017  | 256.0
Radeon RX 570          | 4         | GDDR5 | Apr 18 2017  | 224.0
GeForce RTX 2060       | 6         | GDDR6 | Jan 7 2019   | 336.0
GeForce RTX 2070 SUPER | 8         | GDDR6 | Jul 9 2019   | 448.0
GeForce GTX 1660 Ti    | 6         | GDDR6 | Feb 22 2019  | 288.0
GeForce GTX 1050 Ti    | 4         | GDDR5 | Oct 25 2016  | 112.1
31.932 0.781 235.379 cs_filter3D_27stencil.glsl 512x512x512 R16F [8, 8, 8]
31.973 1.101 235.080 cs_filter3D_27stencil.glsl 512x512x512 R16F [32, 32, 1]
31.285 0.552 240.247 cs_filter3D_27stencil.glsl 512x512x512 R16F [32, 1, 32]
30.455 1.047 246.794 cs_filter3D_27stencil.glsl 512x512x512 R16F [16, 16, 1]
30.281 0.894 248.218 cs_filter3D_27stencil.glsl 512x512x512 R16F [16, 1, 16]
32.020 1.188 234.732 cs_filter3D_27stencil.glsl 512x512x512 R16F [16, 16, 4]
31.585 0.934 237.969 cs_filter3D_27stencil.glsl 512x512x512 R16F [16, 4, 16]
31.712 0.940 237.013 cs_filter3D_27stencil.glsl 512x512x512 R16F [4, 16, 16]
30.113 0.383 249.603 cs_filter3D_27stencil.glsl 512x512x512 R16F [4, 2, 16]
30.041 0.290 250.198 cs_filter3D_27stencil.glsl 512x512x512 R16F [16, 2, 4]
19.432 0.095 386.792 cs_filter3D_27stencil.glsl 512x512x512 R16F [8, 8, 8]
19.150 0.149 392.494 cs_filter3D_27stencil.glsl 512x512x512 R16F [32, 32, 1]
18.925 0.132 397.147 cs_filter3D_27stencil.glsl 512x512x512 R16F [32, 1, 32]
18.203 0.138 412.910 cs_filter3D_27stencil.glsl 512x512x512 R16F [16, 16, 1]
18.483 0.128 406.655 cs_filter3D_27stencil.glsl 512x512x512 R16F [16, 1, 16]
19.548 0.142 384.503 cs_filter3D_27stencil.glsl 512x512x512 R16F [16, 16, 4]
19.298 0.167 389.487 cs_filter3D_27stencil.glsl 512x512x512 R16F [16, 4, 16]
19.458 0.116 386.272 cs_filter3D_27stencil.glsl 512x512x512 R16F [4, 16, 16]
18.272 0.117 411.344 cs_filter3D_27stencil.glsl 512x512x512 R16F [4, 2, 16]
18.279 0.696 411.186 cs_filter3D_27stencil.glsl 512x512x512 R16F [16, 2, 4]
19.344 0.114 cs_filter3D_27stencil.glsl 512x512x512 R16F [8, 8, 8]
19.045 0.116 cs_filter3D_27stencil.glsl 512x512x512 R16F [32, 32, 1]
18.796 0.202 cs_filter3D_27stencil.glsl 512x512x512 R16F [32, 1, 32]
18.108 0.386 cs_filter3D_27stencil.glsl 512x512x512 R16F [16, 16, 1]
18.860 6.760 cs_filter3D_27stencil.glsl 512x512x512 R16F [16, 1, 16]
19.676 0.094 cs_filter3D_27stencil.glsl 512x512x512 R16F [16, 16, 4]
19.427 0.106 cs_filter3D_27stencil.glsl 512x512x512 R16F [16, 4, 16]
19.628 0.196 cs_filter3D_27stencil.glsl 512x512x512 R16F [4, 16, 16]
18.416 0.249 cs_filter3D_27stencil.glsl 512x512x512 R16F [4, 2, 16]
18.358 0.250 cs_filter3D_27stencil.glsl 512x512x512 R16F [16, 2, 4]
We test the runtimes of simple compute shaders that read from one 3D texture through some kind of filter and write the result to another texture. The local work group size of the compute shader is varied over an arbitrary set of sizes, and the effect of different internal texture formats is studied.
All tests are performed using 512x512x512 3D textures. At this size, memory throughput and latency will be the primary bottlenecks, so any extra calculations should have negligible impact on the timings.
All timings are measured with vsync disabled by averaging the frame time across 128 frames, after a 128-frame warmup. Using GPU timer queries might provide more stable numbers.
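The measurement procedure above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the benchmark's actual code: the function names are hypothetical, the spread is assumed to be a population standard deviation, and the bandwidth formula (27 reads plus 1 write per texel, 2 bytes each for R16F) is inferred from the 27-stencil result rows rather than stated in the text.

```python
import statistics

def summarize(frame_times_ms, warmup=128, measured=128):
    """Discard the warmup frames, then return (mean, spread) of the next
    `measured` frame times in milliseconds. Using the population standard
    deviation as the spread is an assumption."""
    samples = frame_times_ms[warmup:warmup + measured]
    return statistics.mean(samples), statistics.pstdev(samples)

def effective_bandwidth_gbs(frame_ms, dim=512, bytes_per_texel=2,
                            reads=27, writes=1):
    """Effective memory traffic in GB/s for one pass over a dim^3 texture.
    Assumes every texel costs `reads` fetches and `writes` stores of
    `bytes_per_texel` bytes each (R16F = 2 bytes), ignoring caching."""
    total_bytes = dim ** 3 * (reads + writes) * bytes_per_texel
    return total_bytes / (frame_ms * 1e-3) / 1e9

if __name__ == "__main__":
    # Synthetic frame times: 128 slow warmup frames, then 128 measured ones.
    times = [100.0] * 128 + [10.0, 20.0] * 64
    mean_ms, spread_ms = summarize(times)
    print(f"{mean_ms:.3f} {spread_ms:.3f} {effective_bandwidth_gbs(mean_ms):.3f}")
```

Under these assumptions, a frame time of 31.932 ms works out to roughly 235.4 GB/s, which is close to the third column of the corresponding result row.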
The work group sizes are:
b: 1x1x1 = 1 elements, 1 non-zero, 0 zeros
1
x: 3x3x3 = 27 elements, 6 non-zeros, 21 zeros
0 0 0
0 1 0
0 0 0
package main
import "core:os";
import "core:fmt";
import "core:strconv";
// model data stuff
Model_Data :: struct {
    vertices: [][3]f32,
    indices: []i32,
Circular_Buffer :: struct(T: type, N: int) {
    data: [N]T,
    cursor: int,
    length: int,
}
push_back :: inline proc(using cb: ^$T/Circular_Buffer, v: T.T) -> bool #no_bounds_check {
    data[(cursor + length) %% T.N] = v;
    if length < T.N {