@zingaburga
zingaburga / despace.c
Last active July 8, 2024 11:56
Despace AVX512: aligned vs unaligned
#include <immintrin.h>
#include <stdio.h>
#include <sys/time.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
// assume initial pointers are aligned and len is a multiple of 128
// compiled with: cc -march=sapphirerapids -O3 despace.c
@zingaburga
zingaburga / bench.sh
Created August 18, 2022 05:23
Benchmarking masked AVX-512 memory ops
#!/bin/sh
echo "== Aligned load"
./nanoBench.sh -asm_init "MOV RAX, 0x3f; ANDN RBX, RAX, R14" -asm "VMOVDQU8 zmm0, [rbx]" -config configs/cfg_AlderLakeP_common.txt
echo "== Unaligned load"
./nanoBench.sh -asm_init "MOV RAX, 0x3f; ANDN RBX, RAX, R14; SUB RBX, 8" -asm "VMOVDQU8 zmm0, [rbx]" -config configs/cfg_AlderLakeP_common.txt
echo "== Unaligned load (8b, 0 mask)"
./nanoBench.sh -asm_init "MOV RAX, 0x3f; ANDN RBX, RAX, R14; SUB RBX, 8; KXORQ k1, k1, k1" -asm "VMOVDQU8 zmm0{k1}{z}, [rbx]" -config configs/cfg_AlderLakeP_common.txt
echo "== Unaligned load (8b, -1 mask)"
./nanoBench.sh -asm_init "MOV RAX, 0x3f; ANDN RBX, RAX, R14; SUB RBX, 8; KXNORQ k1, k1, k1" -asm "VMOVDQU8 zmm0{k1}{z}, [rbx]" -config configs/cfg_AlderLakeP_common.txt
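The `-asm_init` strings above build the load address in two steps: `ANDN RBX, RAX, R14` computes `~RAX & R14`, which clears the low six bits (rounding down to a 64-byte boundary), and `SUB RBX, 8` then moves the address 8 bytes below that boundary. The C sketch below only illustrates that arithmetic. The helper names `align_down_64` and `crosses_cache_line` are hypothetical, and the 64-byte cache-line size is an assumption inferred from the `0x3f` mask.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64u /* assumed line size, matching the 0x3f mask */

/* Mirrors "MOV RAX, 0x3f; ANDN RBX, RAX, R14": clear the low six bits. */
static uintptr_t align_down_64(uintptr_t addr) {
    return addr & ~(uintptr_t)0x3f;
}

/* True if a size-byte access starting at addr touches more than one line. */
static bool crosses_cache_line(uintptr_t addr, size_t size) {
    return (addr / CACHE_LINE) != ((addr + size - 1) / CACHE_LINE);
}
```

Under these assumptions, a 64-byte load at `align_down_64(p)` stays within one line, while the same load at `align_down_64(p) - 8` spans two, which appears to be the case the "Unaligned" runs are measuring.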
@zingaburga
zingaburga / sve2.md
Last active December 14, 2024 23:43
ARM’s Scalable Vector Extensions: A Critical Look at SVE2 For Integer Workloads

Scalable Vector Extensions (SVE) is ARM’s latest SIMD extension to its instruction set, announced in 2016. A follow-up extension, SVE2, was announced in 2019 and is designed to incorporate all functionality from ARM’s current primary SIMD extension, NEON (aka ASIMD).

Despite being announced 5 years ago, there is currently no generally available CPU which supports any form of SVE (which excludes the [Fugaku supercomputer](https://www.fujitsu.com/global/about/innovation/