Skip to content

Instantly share code, notes, and snippets.

@beginor
Last active November 20, 2024 09:41
Show Gist options
  • Save beginor/9d9f90bc58e1313f6aecd107f8296732 to your computer and use it in GitHub Desktop.
Save beginor/9d9f90bc58e1313f6aecd107f8296732 to your computer and use it in GitHub Desktop.
Twitter Snowflake ID for PostgreSQL
CREATE SEQUENCE public.global_id_seq;
ALTER SEQUENCE public.global_id_seq OWNER TO postgres;
CREATE OR REPLACE FUNCTION public.id_generator()
RETURNS bigint
LANGUAGE 'plpgsql'
AS $BODY$
DECLARE
our_epoch bigint := 1314220021721;
seq_id bigint;
now_millis bigint;
-- the id of this DB shard, must be set for each
-- schema shard you have - you could pass this as a parameter too
shard_id int := 1;
result bigint:= 0;
BEGIN
SELECT nextval('public.global_id_seq') % 1024 INTO seq_id;
SELECT FLOOR(EXTRACT(EPOCH FROM clock_timestamp()) * 1000) INTO now_millis;
result := (now_millis - our_epoch) << 23;
result := result | (shard_id << 10);
result := result | (seq_id);
return result;
END;
$BODY$;
ALTER FUNCTION public.id_generator() OWNER TO postgres;
-- SELECT public.id_generator()
@jiayaoO3O
Copy link

I would like to ask the general snowflake id will have two variables, one is datacenterId, the other is workerId, which two variables should correspond here?

@beginor
Copy link
Author

beginor commented Jan 9, 2022

shard_id is datacenter id, sec_id is worker id

@jiayaoO3O
Copy link

you mean "seq_id" ? The seq_id here is not assigned a value, is its default value 0?

@beginor
Copy link
Author

beginor commented Jan 10, 2022

Sorry, I made a mistake, this is just a simplified version of snow flake id, the shard_id should be datacenter id or worker id.

there is only one fixed part (should be the datacenter id or worker id) in the generated result, like this:

timestamp shard_id seq_id
1641784918526 020 113
1641785292978 020 114

@soroktu
Copy link

soroktu commented Aug 25, 2022

Why do you call this generator Snowflake? It's a generator that Instagram once used a long time ago.

https://instagram-engineering.com/sharding-ids-at-instagram-1cf5a71e5a5c

@RodolfoSilva
Copy link

@soroktu Yes, it is.

@davidshepherd7
Copy link

davidshepherd7 commented Dec 7, 2022

A warning for anyone using this: this implementation generates non-monotonic IDs but Twitter's implementation always generates monotonic IDs (per-shard). Whether or not this matters depends on your use-case.

(Monotonic means that on any given shard IDs which are generated later in time are always larger than older ones.)

Details of how this happens: so the first bytes (time, shard) are monotonic at least to the extent that time is monotonic (pretty close), but the potential problem is the last few bytes: nextval(seq_name) % 1024. Say the sequence is currently at 1022 and we generate a few ids within one millisecond then the id sequence will be X + 1022, X + 1023, X + 0000, X + 0001. So later ids in the sequence are smaller than earlier ones.

Obviously it can also happen if the system clock isn't monotonic, Twitters implementation also prevents this.

@RodolfoSilva
Copy link

👀😱
What’s the alternative? Do you have any solution to this? @davidshepherd7

@davidshepherd7
Copy link

Do you have any solution to this?

Basically: no.

Our most likely approach is to first figure out if we really need monotonicity, if we don't then we'll just continue using this. If we do then we'll switch to a simpler ID format, something like (shard << 10) | sequence. We can do this because we don't strictly need the time part: we don't need compatibility with any other related ID generators and we don't need to spin up arbitrary new shards without communication. That would mean losing out on one other somewhat nice feature though: future IDs would become easily guessable (which snowflake IDs aren't, unless your volume is extremely high).

Twitter's implementation gets monotonicity by tracking the previous timestamp, and busy-waiting until the next millisecond if it would generate a non-monotonic ID. We don't think that's possible in postgres outside of a C extension because there's no sufficiently performant and transaction-ignoring way to share the previous timestamp to everywhere that would need it. If we found that we really needed exactly Twitters ID format we would probably go and write a C extension.

@rostero1
Copy link

Here's an extension, but I'm not sure if that solves the problem: https://github.com/mausimag/pgflake

@rostero1
Copy link

This is also interesting, but I think it probably suffers the same problem:
https://github.com/mastodon/mastodon/blob/main/lib/mastodon/snowflake.rb#L68-L106

@davidshepherd7
Copy link

Oh neat, it looks like pgflake probably does it right (from a quick glance). The part inside if (last_time == curr_time) looks the same as twitter's implementation.

@gautamborad
Copy link

@davidshepherd7 , what implementation did you guys settle with? We are also in the same boat and deciding which id generation method to use out of the plethora of options available out there! What I like about this, as compared to pgflake, is that we dont need to build this. We can just create the function public.id_generator when we are creating our db/tables on a new postgres instance.

But am worried about "generates non-monotonic IDs" part you mentioned. We already have a created_at column in each table and use that when we need "proper" sorting. With that in mind can you elaborate on "we'll switch to a simpler ID format, something like (shard << 10) | sequence." part.?Did you go with that design, if yes, did you find any issue?

Thanks in advance!

@davidshepherd7
Copy link

We're in an awkward situation where we have since tables that need a monotonic Id still using sequential IDs and others using the approach in the snippet. Resolving this hasn't been a priority yet, I'm not sure where we'll end up.

@gautamborad
Copy link

@hazelmeow
Copy link

Want to point out that this generates ids with Instagram's layout which is slightly different from Twitter's (Instagram uses 13 bits for shard ID + 10 for seq number and Twitter uses 10 for a machine ID + 12 for a seq number). So if you want to generate snowflakes w/ the Twitter layout you should change the seq number to % 4096 (2^12) and change the shifts to << 22 and << 12.

@davidshepherd7
Copy link

There's another extension for this here: https://github.com/pgEdge/snowflake-sequences I'm not sure if/how it's different to pgflake.

@srigi
Copy link

srigi commented Apr 17, 2024

Can somebody hint how to get now_millis from stored ID?
Since the row creation time is encoded in snowflake ID, it should be retrievable. I need it for example to "get only records created today".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment