Skip to content

Instantly share code, notes, and snippets.

Show Gist options
  • Save codeslinger/556197 to your computer and use it in GitHub Desktop.
Save codeslinger/556197 to your computer and use it in GitHub Desktop.
votes CF
"back in black" => { 201005211200 => '1', 201005201159 => '1', 201005201157 => '1', 201005011900 => '1', 201004190600 => '1' },
"black album" => { 201005021800 => '1', 201005010600 => '1' },
"black star" => { 201005011000 => '1' }
cached_counts CF
"back in black" => { 'cached_count' => 2, 'counted_until' => 201005011931 },
"black album" => { 'cached_count' => 1, 'counted_until' => 201005010600 }

Consistent Vote Counting in Cassandra

Here’s a way to implement Consistent Vote Counting using Cassandra that doesn’t depend on vector clocks or an atomic increment operation.

Recording a vote

On a vote, use the object id as key, and insert into the votes CF a column named for the current timestamp. This can be a TimeUUID, or a microseconds count if you’re stingy. Note that you can’t just use the column’s timestamp for this trick: it depends on doing a slice query on the column name. (There’s no corresponding way to query against columns’ timestamps.)

To read the number of votes:

First, get the cached count:

cached_count, counted_until = db.get_columns('cached_counts', 'back in black', ['cached_count', 'counted_until'])

Next, do a get_slice from the counted_until timestamp onward:
newer_votes = db.get('votes', 'back in black', :start => counted_until.to_i)

The number of votes is the cached_count plus the number of newer_votes
votes_count = cached_count.to_i + newer_votes.length

To update the cached count:

Choose a time in the distant past: far enough back you’re comfortable the database is consistent.

new_counted_until = 1.day.ago.to_i

Read the current cached_count and counted_until;
cached_count, counted_until = db.get_columns('cached_counts', 'back in black', ['cached_count', 'counted_until'])

Then count the columns between counted_until and new_counted_until. Note that this time we’re specifying an endpoint:
countable_votes = db.get('votes', 'back in black', :start => counted_until.to_i, :finish => new_counted_until)

Insert the new cached_count and counted_until into the cached_counts CF:
insert('cached_counts', 'back in black', { "cached_count" => 4, :counted_until => new_counted_until })

Consistency

Say that we begin with the state


votes:
  "back in black" => { 201005211200 => '1', 201005201159 => '1', 201005201157 => '1',  201005011900 => '1', 201004190600 => '1' },
cached_counts:
  "back in black" => { 'cached_count' => 2, 'counted_until' => 201005011931 },

Alice starts the update process at 12pm, and slices from 201005011931 to 2010052111158. She will get a new count of 3. Meanwhile at 12:05 Bob started an update and sliced from 201005011931 to 201005211204. He will get a new count of 5.

It doesn’t matter which order they commit: the assertions

  "back in black" => { 'cached_count' => 3, 'counted_until' => 2010052111158 }

and
  "back in black" => { 'cached_count' => 5, 'counted_until' => 201005211204 }

are both correct. The start and end timestamps of the get_slice are distant enough that both queries are in themselves consistent. Since we include the endpoint with the assertion, the cached_count is also correct.

Note: If you like, you can avoid the extra counted_until column by instead using it as the value for the cached_count column’s timestamp. Since the ruby gem doesn’t yet expose the timestamp on reads, however, I’ve just made it a separate column.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment