Binary search is theoretically optimal, but it's possible to speed it up substantially using AVX2 and branchless code even in .NET Core.
Memory access is the limiting factor for binary search. Each comparison loads a whole cache line anyway, so loading a 32-byte vector from that line is almost free. We can check whether the vector contains the target value and, if it does not, exclude 32/sizeof(T) elements from the search space at each probe instead of just 1.
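The idea can be modelled in portable C. This is only a sketch under stated assumptions, not the library's implementation: the names `lower_bound_blocked` and `LANES` are hypothetical, the element type is fixed to `int64_t` (so a 32-byte vector holds 4 elements), and the inner scalar loop stands in for what would be a single 256-bit compare plus a mask popcount in real AVX2 code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constant: elements per 32-byte vector, 32 / sizeof(int64_t). */
#define LANES 4

/* Returns the index of the first element >= target in the sorted array
 * a[0..n), or n if every element is smaller (a "lower bound"). */
size_t lower_bound_blocked(const int64_t *a, size_t n, int64_t target)
{
    assert(a != NULL || n == 0);

    size_t lo = 0, hi = n; /* invariant: the answer lies in [lo, hi] */

    while (hi - lo >= LANES) {
        /* Pick a block of LANES elements centred in [lo, hi). */
        size_t mid = lo + (hi - lo - LANES) / 2;

        /* Count block elements smaller than target without branching.
         * Assumption: in vector code this is one compare + popcount. */
        size_t less = 0;
        for (size_t i = 0; i < LANES; i++)
            less += (size_t)(a[mid + i] < target);

        if (less == LANES)
            lo = mid + LANES;  /* whole block too small: drop it and the left part */
        else if (less == 0)
            hi = mid;          /* whole block >= target: drop it and the right part */
        else
            return mid + less; /* the boundary falls inside the block */
    }

    /* Fewer than LANES candidates remain: finish with a short linear scan. */
    while (lo < hi && a[lo] < target)
        lo++;
    return lo;
}
```

To test membership, check that the returned index `i` satisfies `i < n && a[i] == target`. Each loop iteration discards the probed block in addition to half of the remaining range, which is where the saving over a one-element probe comes from.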
AVX512 with the _mm256_cmpge_epi64_mask intrinsic should improve it even more, but it is not available on .NET yet.
SearchBench