@timvisee
Last active May 16, 2023 13:42

Simple mmap benchmark

Machine:

  • Linux 6.2
  • 32GB RAM
  • Swap disabled

Code:

Collection:

  • 10_000_000 vectors
  • 512 dimensions
  • 20.9GB on disk
  • Using mmap (threshold 1000)
  • No index
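
As a rough sanity check on the collection size, the raw vector data can be estimated from the vector count and dimensionality. The sketch below assumes each component is a 4-byte float32, which the notes above do not state. Under that assumption the raw data comes out slightly below the reported 20.9GB, which would leave some room for storage overhead.

```python
# Rough size estimate for the raw vector data.
# Assumption: each component is a 4-byte float32 (not stated in the source).
NUM_VECTORS = 10_000_000
DIMENSIONS = 512
BYTES_PER_COMPONENT = 4  # assumed float32


def raw_vector_bytes(num_vectors: int, dimensions: int, bytes_per_component: int) -> int:
    """Return the size in bytes of the raw vector data, without any overhead."""
    return num_vectors * dimensions * bytes_per_component


raw_bytes = raw_vector_bytes(NUM_VECTORS, DIMENSIONS, BYTES_PER_COMPONENT)
print(f"{raw_bytes / 1e9:.2f} GB")     # decimal gigabytes: 20.48 GB
print(f"{raw_bytes / 2**30:.2f} GiB")  # binary gibibytes: 19.07 GiB
```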

Request:

  • Search:
    POST /collections/test/points/search?exact=true
    
    {
      "limit": 1000,
      "vector": [
          -0.00022172113,
          -0.0005458312,
          ...
      ]
    }
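
For reference, the request above could be built from a script roughly as follows. This is only a sketch: the host and port (`localhost:6333`) and the all-zero placeholder vector are assumptions, since the real query vector is elided above.

```python
import json
import urllib.request

# Assumptions: Qdrant is reachable at localhost:6333, and the query vector is
# a placeholder because the real vector in the request above is elided.
BASE_URL = "http://localhost:6333"


def build_search_request(vector, limit=1000, collection="test"):
    """Build (but do not send) an exact search request for the given vector."""
    url = f"{BASE_URL}/collections/{collection}/points/search?exact=true"
    body = json.dumps({"limit": limit, "vector": vector}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


request = build_search_request([0.0] * 512)

# To send it against a running instance:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```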
    

Results

  • Cold means all (disk) caches are purged.
  • Hot means disk cache is still available from a previous run.
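
How the caches were purged is not stated here. One way to approximate a cold run for a single file, without root, is to ask the kernel to drop that file's cached pages with `posix_fadvise`. The sketch below is an assumption about how this could be done, not the method used for these measurements. Note that `POSIX_FADV_DONTNEED` is only a hint, so the kernel may keep some pages resident (for example pages that are still mapped).

```python
import os


def evict_from_page_cache(path: str) -> None:
    """Ask the kernel to drop cached pages of one file (a hint, not a guarantee)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # Dirty pages cannot be dropped, so flush pending writes first.
        os.fsync(fd)
        # posix_fadvise is only available on Unix-like platforms.
        if hasattr(os, "posix_fadvise"):
            # offset=0, length=0 means "the whole file".
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)
```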

Normal

./qdrant

| | Cold | Hot |
|---|---|---|
| Startup | 5s | 5s |
| - VIRT | 29.6G | 29.6G |
| - RES | 1417M | 1439M |
| - SHR | 68K | 68K |
| First search | 44.35s | 433ms |
| - RES | 20.9G | 20.9G |
| - SHR | 19.5G | 19.5G |
| Second search | 433ms | 498ms |

Not having the mmap pages ready in the page cache adds roughly 44s to the first search (44.35s cold vs 433ms hot).

With MADV_WILLNEED

MADVISE_WILL_NEED=1 ./qdrant

| | Cold | Hot |
|---|---|---|
| Startup | 5s | 5s |
| - VIRT | 29.6G | 29.6G |
| - RES | 1438M | 1439M |
| - SHR | 68K | 68K |
| First search | 47.11s | 538ms |
| - RES | 20.9G | 20.9G |
| - SHR | 19.5G | 19.5G |
| Second search | 462ms | 428ms |

No visible improvement. This doesn't pre-fault all mmap pages.
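
To illustrate what this variant does, here is a minimal sketch of applying `MADV_WILLNEED` to a read-only mapping. It is not the Qdrant implementation; the function name is hypothetical. `MADV_WILLNEED` is advisory: it tells the kernel the range is expected to be accessed soon, and the call does not wait for the pages to become resident, which is consistent with the result above.

```python
import mmap


def map_with_willneed(path: str) -> mmap.mmap:
    """Map a file read-only and advise the kernel that it will be needed soon."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; the mapping stays valid after f is closed.
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    # madvise() and MADV_WILLNEED require Python 3.8+ and platform support.
    if hasattr(mm, "madvise") and hasattr(mmap, "MADV_WILLNEED"):
        # Advisory only: this returns without waiting for pages to be loaded.
        mm.madvise(mmap.MADV_WILLNEED)
    return mm
```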

With MADV_WILLNEED and read first byte

MADVISE_WILL_NEED=1 MADVISE_READ_BYTE=1 ./qdrant

| | Cold | Hot |
|---|---|---|
| Startup | 5s | 5s |
| - VIRT | 29.6G | 29.6G |
| - RES | 1417M | 1437M |
| - SHR | 68K | 69K |
| First search | 46.88s | 575ms |
| - RES | 20.9G | 20.9G |
| - SHR | 19.5G | 19.5G |
| Second search | 461ms | 463ms |

No visible improvement. This doesn't pre-fault all mmap pages, not even when reading the first byte from the first page.
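
Reading a single byte can fault in the page that holds it, but not the rest of the mapping. For contrast, a variant that was not benchmarked here is to read one byte from every page, sketched below with a hypothetical helper. Like populating, this would block until every page has been touched.

```python
import mmap


def touch_every_page(mm: mmap.mmap) -> int:
    """Read one byte from each page of the mapping; return the number of pages touched."""
    touched = 0
    for offset in range(0, len(mm), mmap.PAGESIZE):
        _ = mm[offset]  # a read access forces the page to be faulted in
        touched += 1
    return touched
```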

With MAP_POPULATE

MMAP_POPULATE=1 ./qdrant

| | Cold | Hot |
|---|---|---|
| Startup | 14s | 6s |
| - VIRT | 29.6G | 29.6G |
| - RES | 20.9G | 20.9G |
| - SHR | 19.5G | 19.5G |
| First search | 457ms | 449ms |
| - RES | 20.9G | 20.9G |
| - SHR | 19.5G | 19.5G |
| Second search | 414ms | 425ms |

Populating does properly pre-fault all mmap pages, but it is blocking, so it significantly increases the startup time and the time to first response. MAP_POPULATE only works on Linux.
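
A minimal sketch of a populated mapping follows. It is not the Qdrant implementation, and the function name is hypothetical. The `mmap.MAP_POPULATE` constant is only exposed by Python 3.10+ on Linux, so the sketch falls back to a plain shared mapping when it is missing.

```python
import mmap


def map_populated(path: str) -> mmap.mmap:
    """Map a file read-only, pre-faulting all pages where MAP_POPULATE is available."""
    # Falls back to 0 (no extra flag) when the constant is not available.
    flags = mmap.MAP_SHARED | getattr(mmap, "MAP_POPULATE", 0)
    with open(path, "rb") as f:
        # With MAP_POPULATE this call blocks until the pages are faulted in.
        return mmap.mmap(f.fileno(), 0, flags=flags, prot=mmap.PROT_READ)
```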

Populating adds 9s to the cold startup time (14s vs 5s), but removes roughly 44s from the first search request (457ms vs 44.35s).
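
The trade-off can be made concrete by adding startup and first search time for the cold runs. The numbers below are copied from the Normal and MAP_POPULATE results; the helper name is hypothetical.

```python
# Cold-cache measurements from the Normal and MAP_POPULATE runs, in seconds.
normal = {"startup": 5.0, "first_search": 44.35}
populate = {"startup": 14.0, "first_search": 0.457}


def time_to_first_result(run: dict) -> float:
    """Total time from process start until the first search has completed."""
    return run["startup"] + run["first_search"]


saved = time_to_first_result(normal) - time_to_first_result(populate)
print(f"normal:   {time_to_first_result(normal):.2f}s")
print(f"populate: {time_to_first_result(populate):.2f}s")
print(f"saved:    {saved:.2f}s")
```

Under these numbers, populating moves the cost into startup and still reduces the total time until the first cold search result by roughly 35s.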
