Machine:
- Linux 6.2
- 32GB RAM
- Swap disabled
Code:
Collection:
- 10_000_000 vectors
- 512 dimensions
- 20.9GB on disk
- Using mmap (threshold 1000)
- No index
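As a rough sanity check on the collection size, the raw vector payload can be computed directly. This is a sketch that assumes each component is stored as a 4-byte float, which the notes above do not state; how the remainder of the 20.9GB on disk is made up is not broken down here.

```python
# Raw vector payload for the collection described above.
# Assumption: 4 bytes per component (32-bit floats).
NUM_VECTORS = 10_000_000
DIMENSIONS = 512
BYTES_PER_COMPONENT = 4  # assumed, not stated in the notes

raw_bytes = NUM_VECTORS * DIMENSIONS * BYTES_PER_COMPONENT
raw_gb = raw_bytes / 10**9   # decimal gigabytes
raw_gib = raw_bytes / 2**30  # binary gibibytes

print(f"raw vectors: {raw_gb:.2f} GB ({raw_gib:.2f} GiB)")
```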
Request:
- Search:
POST /collections/test/points/search?exact=true { "limit": 1000, "vector": [ -0.00022172113, -0.0005458312, ... ] }
- Cold means all (disk) caches are purged.
- Hot means disk cache is still available from a previous run.
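A minimal sketch of how the cold and hot timings could be reproduced with the Python standard library. The base URL `http://localhost:6333`, the helper names, and the use of `/proc/sys/vm/drop_caches` to purge caches are assumptions for illustration; the notes do not say how the caches were purged or where the server listens.

```python
import json
import os
import time
import urllib.request

BASE_URL = "http://localhost:6333"  # assumed address of the local instance


def build_search_request(vector, limit=1000):
    """Build the exact-search POST request shown above."""
    body = json.dumps({"limit": limit, "vector": vector}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/collections/test/points/search?exact=true",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def timed_search(req):
    """Send the request and return the wall-clock time in seconds."""
    start = time.perf_counter()
    with urllib.request.urlopen(req) as resp:
        resp.read()
    return time.perf_counter() - start


def drop_page_cache():
    """One common way to get a 'cold' state on Linux (requires root)."""
    os.sync()
    with open("/proc/sys/vm/drop_caches", "w") as f:
        f.write("3\n")


if __name__ == "__main__":
    query = [0.0] * 512  # placeholder vector, not the one used in the runs
    req = build_search_request(query)
    print(f"first search:  {timed_search(req):.3f}s")
    print(f"second search: {timed_search(req):.3f}s")
```

For a cold run, `drop_page_cache()` would be called before starting the server; for a hot run, the server is simply restarted without it.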
./qdrant
| | Cold | Hot |
|---|---|---|
| Startup | 5s | 5s |
| - VIRT | 29.6G | 29.6G |
| - RES | 1417M | 1439M |
| - SHR | 68K | 68K |
| First search | 44.35s | 433ms |
| - RES | 20.9G | 20.9G |
| - SHR | 19.5G | 19.5G |
| Second search | 433ms | 498ms |
Not having the mmap pages ready in the page cache adds ~45s to the first search.
MADVISE_WILL_NEED=1 ./qdrant
| | Cold | Hot |
|---|---|---|
| Startup | 5s | 5s |
| - VIRT | 29.6G | 29.6G |
| - RES | 1438M | 1439M |
| - SHR | 68K | 68K |
| First search | 47.11s | 538ms |
| - RES | 20.9G | 20.9G |
| - SHR | 19.5G | 19.5G |
| Second search | 462ms | 428ms |
No visible improvement. This doesn't pre-fault all mmap pages.
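For reference, a minimal Python sketch of the kind of hint involved, assuming `MADVISE_WILL_NEED` corresponds to `madvise(MADV_WILLNEED)` on the mapped file (the flag name suggests this, but the notes do not show the call). The advice is only a hint to the kernel, which fits the observation that the first search still pays the page-fault cost. `open_with_willneed` is a hypothetical helper name.

```python
import mmap
import os
import tempfile


def open_with_willneed(path):
    """Map `path` read-only and advise the kernel that it will be needed."""
    with open(path, "rb") as f:
        # The mapping keeps its own duplicate of the file descriptor.
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    # The constant is only defined on platforms that support the advice.
    if hasattr(mmap, "MADV_WILLNEED"):
        mm.madvise(mmap.MADV_WILLNEED)
    return mm


# Small demo on a throwaway file spanning four pages.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x01" * (4 * mmap.PAGESIZE))
    willneed_path = tmp.name

willneed_map = open_with_willneed(willneed_path)
willneed_first_byte = willneed_map[0]  # touching a page faults it in if needed
willneed_len = len(willneed_map)
willneed_map.close()
os.unlink(willneed_path)
```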
MADVISE_WILL_NEED=1 MADVISE_READ_BYTE=1 ./qdrant
| | Cold | Hot |
|---|---|---|
| Startup | 5s | 5s |
| - VIRT | 29.6G | 29.6G |
| - RES | 1417M | 1437M |
| - SHR | 68K | 69K |
| First search | 46.88s | 575ms |
| - RES | 20.9G | 20.9G |
| - SHR | 19.5G | 19.5G |
| Second search | 461ms | 463ms |
No visible improvement. This doesn't pre-fault all mmap pages, not even when reading the first byte from the first page.
MMAP_POPULATE=1 ./qdrant
| | Cold | Hot |
|---|---|---|
| Startup | 14s | 6s |
| - VIRT | 29.6G | 29.6G |
| - RES | 20.9G | 20.9G |
| - SHR | 19.5G | 19.5G |
| First search | 457ms | 449ms |
| - RES | 20.9G | 20.9G |
| - SHR | 19.5G | 19.5G |
| Second search | 414ms | 425ms |
Populating does properly pre-fault all mmap pages, but it is blocking and significantly increases the startup time and the time to first response. Populating only works on Linux.
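A minimal Python sketch of a populated mapping, assuming `MMAP_POPULATE` corresponds to passing the Linux `MAP_POPULATE` flag to `mmap`. `open_populated` is a hypothetical helper name. Python exposes the flag as `mmap.MAP_POPULATE` from 3.10 on Linux; the sketch falls back to a plain mapping elsewhere.

```python
import mmap
import os
import tempfile

# Linux-only flag; fall back to 0 (plain mapping) where it is unavailable.
MAP_POPULATE = getattr(mmap, "MAP_POPULATE", 0)


def open_populated(path):
    """Map `path` read-only and ask the kernel to pre-fault its pages.

    With MAP_POPULATE the mmap call itself blocks while pages are read in,
    which is where the extra startup time comes from.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        return mmap.mmap(
            fd, 0, flags=mmap.MAP_SHARED | MAP_POPULATE, prot=mmap.PROT_READ
        )
    finally:
        os.close(fd)  # the mapping holds its own duplicate descriptor


# Small demo on a throwaway file spanning four pages.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x02" * (4 * mmap.PAGESIZE))
    populate_path = tmp.name

populated_map = open_populated(populate_path)
populated_last_byte = populated_map[len(populated_map) - 1]
populated_len = len(populated_map)
populated_map.close()
os.unlink(populate_path)
```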
With cold caches, populating adds 9s to the startup time (14s vs. 5s), but removes ~45s from the first search request (457ms vs. 44.35s–47.11s).
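The trade-off can be made concrete by adding startup and first-search time for each cold run. The numbers below are copied from the tables above; the helper name is only for illustration.

```python
# Cold-cache measurements from the tables above: (startup, first search), in seconds.
cold_runs = {
    "./qdrant": (5.0, 44.35),
    "MADVISE_WILL_NEED=1": (5.0, 47.11),
    "MADVISE_WILL_NEED=1 MADVISE_READ_BYTE=1": (5.0, 46.88),
    "MMAP_POPULATE=1": (14.0, 0.457),
}


def time_to_first_result(startup_s, first_search_s):
    """Seconds from process start until the first search has returned."""
    return startup_s + first_search_s


totals = {name: time_to_first_result(*t) for name, t in cold_runs.items()}
for name, total in totals.items():
    print(f"{name:42s} {total:6.2f}s")

# Difference between the default run and the populated run.
saved = totals["./qdrant"] - totals["MMAP_POPULATE=1"]
print(f"saved by populating: {saved:.1f}s")
```

In the cold case, populating brings the time from launch to the first completed search down from about 49s to about 14.5s, a saving of about 35s.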