Skip to content

Instantly share code, notes, and snippets.

@yuzefovich
Last active June 27, 2024 01:05
Show Gist options
  • Save yuzefovich/f90358c2ac99b24b70fb7171338174d8 to your computer and use it in GitHub Desktop.
Save yuzefovich/f90358c2ac99b24b70fb7171338174d8 to your computer and use it in GitHub Desktop.

Takeaways

  1. Optimistic strategy is always beneficial (as expected given we have a hint about the previous value).
  2. Implicit strategy wins against Explicit one with no secondary indexes (as expected).
  3. Implicit vs Explicit choice is not as clear with addition of secondary indexes - I'd expect that the latter would be better since we should be losing 1PC optimization of Implicit. The tradeoff appears to be lower latency (with Implicit) vs lower CPU usage and higher foreground QPS (with Explicit).
  4. We should consider introducing some pacing whenever the replication job is started or resumed if there is data that needs replication - the hit on the foreground workload is massive.

Raw Notes

Default YCSB schema.

Using 19c752dc75325bd26098022d4f0ff93bd95470f0 + #126280 (cb6ac7bc0ca8963b0e63f973bd13b629a76df93a)

-- Disable ingestion queries attribution to user traffic to see more easily the foreground QPS

SET CLUSTER SETTING sql.internal_executor.attribute_to_user.enabled = false;

-- Disable auto stats collection to have cleaner profiles (focused on the foreground workload and replication)

SET CLUSTER SETTING sql.stats.automatic_collection.enabled = false;

22:16 workload run

CPU cluster: 33% QPS: 10.6k P99 latency: 3.2ms

22:19 jobs start:

22:20 CPU spiked to 93%, QPS dropped to 1.3k at the bottom 22:22 CPU stabilized at 54%, QPS stabilized at 7.3k-7.8k

22:23 defaults (implicit + optimistic)

SQL CPU workload: 11.0% SQL CPU replication: 16.6% CPU cluster: 54% QPS: 7.7k P99 latency: 7.0ms

22:25 (implicit + pessimistic)

SET CLUSTER SETTING logical_replication.consumer.try_optimistic_insert.enabled = false;

Slight slowdown overall.

SQL CPU workload: 9.7% SQL CPU replication: 22.5% CPU cluster: 57% QPS: 6.5k P99 latency: 9.8ms

22:30 (explicit + optimistic)

SET CLUSTER SETTING logical_replication.consumer.use_implicit_txns.enabled = false; RESET CLUSTER SETTING logical_replication.consumer.try_optimistic_insert.enabled;

CPU lower than with defaults, but QPS is lower and latency is higher.

SQL CPU workload: 10.2% SQL CPU replication: 14.9% CPU cluster: 51% QPS: 7.4k P99 latency: 12.0ms

22:35 (explicit + pessimistic)

SET CLUSTER SETTING logical_replication.consumer.use_implicit_txns.enabled = false; SET CLUSTER SETTING logical_replication.consumer.try_optimistic_insert.enabled = false;

Slight slowdown overall.

SQL CPU workload: 8.6% SQL CPU replication: 22.9% CPU cluster: 56% QPS: 6.3k P99 latency: 16.4ms

22:41 defaults (implicit + optimistic)

RESET CLUSTER SETTING logical_replication.consumer.try_optimistic_insert.enabled; RESET CLUSTER SETTING logical_replication.consumer.use_implicit_txns.enabled;

-- there were some issues at the storage level on one node of one cluster, so I wiped everything to start from scratch

One secondary index.

CREATE INDEX ON ycsb.usertable(field0); SET CLUSTER SETTING sql.internal_executor.attribute_to_user.enabled = false; SET CLUSTER SETTING sql.stats.automatic_collection.enabled = false;

22:55 workload run

CPU cluster: 63% QPS: 7.6k P99 latency: 4.2ms

23:05 jobs start:

23:06 CPU spiked to 93%, QPS dropped to 470 at the bottom 23:11 cluster B stabilized at 70% CPU, 6.8k QPS, two clusters appear to "fight" a bit 23:14 cluster A stabilized at 76% CPU, 4.4k QPS, cluster B is at 75%, 4.8k

23:17 defaults (implicit + optimistic)

SQL CPU workload: 5.5% SQL CPU replication: 7.5% CPU cluster: 75% QPS: 4.6k P99 latency: 16.2ms

23:20 (implicit + pessimistic)

SET CLUSTER SETTING logical_replication.consumer.try_optimistic_insert.enabled = false;

Slight slowdown overall.

SQL CPU workload: 4.6% SQL CPU replication: 12.5% CPU cluster: 75% QPS: 4.2k P99 latency: 22.8ms

23:25 (explicit + optimistic)

SET CLUSTER SETTING logical_replication.consumer.use_implicit_txns.enabled = false; RESET CLUSTER SETTING logical_replication.consumer.try_optimistic_insert.enabled;

23:26-23:31 spikes in WAL Fsync Latencies on cluster A 23:33-23:34 spikes in WAL Fsync Latencies on cluster B

23:36

CPU lower than with defaults with QPS being higher and latency slightly higher.

SQL CPU workload: 5.9% SQL CPU replication: 7.8% CPU cluster: 71% QPS: 4.9k P99 latency: 20.8ms

23:38 (explicit + pessimistic)

SET CLUSTER SETTING logical_replication.consumer.use_implicit_txns.enabled = false; SET CLUSTER SETTING logical_replication.consumer.try_optimistic_insert.enabled = false;

Slight slowdown overall.

SQL CPU workload: 5.5% SQL CPU replication: 10.8% CPU cluster: 72% QPS: 4.2k P99 latency: 26.0ms

22:43 defaults (implicit + optimistic)

RESET CLUSTER SETTING logical_replication.consumer.try_optimistic_insert.enabled; RESET CLUSTER SETTING logical_replication.consumer.use_implicit_txns.enabled;

23:50 (explicit + optimistic)

SET CLUSTER SETTING logical_replication.consumer.use_implicit_txns.enabled = false; RESET CLUSTER SETTING logical_replication.consumer.try_optimistic_insert.enabled;

Two secondary indexes.

23:54

-- about 113GiB of live bytes

CREATE INDEX ON ycsb.usertable(field1);

Three secondary indexes.

00:18

-- about 167GiB of live bytes

CREATE INDEX ON ycsb.usertable(field2);

00:25 explicit + optimistic

Things stabilized.

SQL CPU workload: 3.1% SQL CPU replication: 5.0% CPU cluster: 79% QPS: 2.6k P99 latency: 54ms

00:29 implicit + optimistic

SET CLUSTER SETTING logical_replication.consumer.use_implicit_txns.enabled = true; SET CLUSTER SETTING logical_replication.consumer.try_optimistic_insert.enabled = true;

SQL CPU workload: 3.5% SQL CPU replication: 3.9% CPU cluster: 81% QPS: 2.6k P99 latency: 31ms

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment