Add a `TimeUUID` class #231

cvybhu · 2023-06-23T08:55:36Z

timeuuid is a UUID v1 - it contains a timestamp and some random bits that together form a unique identifier.
Their comparison operator should compare them primarily based on this timestamp.

timeuuid values are currently represented using uuid.UUID, but the comparison operators of uuid.UUID don't compare timeuuid values in the same way that Cassandra/Scylla does.

For example, with values 00000257-0efc-11ee-9547-00006490e9a6 and fed35080-0efb-11ee-a1ca-00006490e9a4:

Cassandra believes that fed35080-0efb-11ee-a1ca-00006490e9a4 is smaller:

CREATE KEYSPACE ks WITH replication = {'class': 'NetworkTopologyStrategy', 'replication_factor': 1};
CREATE TABLE ks.tab (p int, c timeuuid, PRIMARY KEY (p, c));
INSERT INTO ks.tab (p, c) VALUES (0, 00000257-0efc-11ee-9547-00006490e9a6);
INSERT INTO ks.tab (p, c) VALUES (0, fed35080-0efb-11ee-a1ca-00006490e9a4);
SELECT p, c FROM ks.tab;
 p | c
---+--------------------------------------
 0 | fed35080-0efb-11ee-a1ca-00006490e9a4
 0 | 00000257-0efc-11ee-9547-00006490e9a6

(2 rows)

Because it has a lower timestamp value:

 timeuuid_col                         | system.totimestamp(timeuuid_col)
--------------------------------------+----------------------------------
 fed35080-0efb-11ee-a1ca-00006490e9a4 |  2023-06-19 23:49:56.990000+0000
 00000257-0efc-11ee-9547-00006490e9a6 |  2023-06-19 23:49:58.961000+0000

But UUID comparison says the opposite, as it compares the bytes in lexicographical order:

>>> import uuid
>>> a = uuid.UUID('fed35080-0efb-11ee-a1ca-00006490e9a4')
>>> b = uuid.UUID('00000257-0efc-11ee-9547-00006490e9a6')
>>> a < b
False
>>> b < a
True

It would be useful to have a class that represents timeuuid values and has the same semantics as the values in Cassandra.

Here's an example implementation that I wrote, based on the cassandra implementation:

https://gist.github.com/cvybhu/ed5b64d8b62eff51dc46258157a92e41

Ideally python driver would return values of this type when some row is selected, but this would be a breaking change.

The text was updated successfully, but these errors were encountered:

cvybhu · 2023-06-23T08:56:46Z

I tried to use the existing python libraries, but it looks like we can't use any of them.

time_uuid does an unsigned comparison (In Cassandra 8080... is smaller than 0000 because it does signed comparison):

>>> import time_uuid
>>> a = time_uuid.TimeUUID('00000000-0000-1000-0000-000000000000')
>>> b = time_uuid.TimeUUID('00000000-0000-1000-8080-808080808080')
>>> a < b
True

python-timeuuid doesn't work with Python3:

root@dfc90c4a88f5:/# pip install python-timeuuid
Collecting python-timeuuid
  Using cached python-timeuuid-0.3.5.tar.gz (27 kB)
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error
  
  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [7 lines of output]
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/tmp/pip-install-ptk7ypbx/python-timeuuid_4457d3d65d544d868b98787f21893a11/setup.py", line 20
          print 'Building with Cython %s' % cython_version
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      SyntaxError: Missing parentheses in call to 'print'. Did you mean print(...)?
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

mykaul · 2023-06-23T09:48:02Z

How does Cassandra's driver work, btw? Can you compare?

cvybhu · 2023-06-23T10:19:34Z

How does Cassandra's driver work, btw? Can you compare?

Same thing.

Looking at the code, they don't have any TimeUUID class: https://github.com/search?q=repo%3Adatastax%2Fpython-driver%20timeuuid&type=code

Here's a small test case:

# cassandra-driver==3.28.0
import cassandra
from cassandra.cluster import Cluster

s = Cluster(["127.0.0.1"]).connect()
s.execute("DROP KEYSPACE IF EXISTS ks")
s.execute("CREATE KEYSPACE ks WITH replication = {'class': 'NetworkTopologyStrategy', 'replication_factor': 1}")
s.execute("CREATE TABLE ks.t (p int, t timeuuid, PRIMARY KEY (p, t))")

queries = [
"INSERT INTO ks.t (p, t) VALUES (0, 00000000-0000-1000-0000-000000000000);",
"INSERT INTO ks.t (p, t) VALUES (0, 00000000-0000-1000-8080-808080808080);",
"INSERT INTO ks.t (p, t) VALUES (0, 00000000-0000-1000-7f7f-7f7f7f7f7f7f);",
"INSERT INTO ks.t (p, t) VALUES (0, 00000000-0000-1000-f7f7-f7f7f7f7f7f7);",
"INSERT INTO ks.t (p, t) VALUES (0, 00000000-0000-1000-ffff-ffffffffffff);",
"INSERT INTO ks.t (p, t) VALUES (0, 00000000-0000-1000-a1ca-00006490e9a4);"]

for q in queries:
    s.execute(q)

rows = list(s.execute("SELECT t FROM ks.t"))

print("rows:", rows)

uuids = [r[0] for r in rows]
print("\nuuids:", uuids)

sorted_uuids = sorted(uuids)
print("\nsorted_uuids:", sorted_uuids)

Output:

rows: [Row(t=UUID('00000000-0000-1000-8080-808080808080')), Row(t=UUID('00000000-0000-1000-a1ca-00006490e9a4')), Row(t=UUID('00000000-0000-1000-f7f7-f7f7f7f7f7f7')), Row(t=UUID('00000000-0000-1000-ffff-ffffffffffff')), Row(t=UUID('00000000-0000-1000-0000-000000000000')), Row(t=UUID('00000000-0000-1000-7f7f-7f7f7f7f7f7f'))]

uuids: [UUID('00000000-0000-1000-8080-808080808080'), UUID('00000000-0000-1000-a1ca-00006490e9a4'), UUID('00000000-0000-1000-f7f7-f7f7f7f7f7f7'), UUID('00000000-0000-1000-ffff-ffffffffffff'), UUID('00000000-0000-1000-0000-000000000000'), UUID('00000000-0000-1000-7f7f-7f7f7f7f7f7f')]

sorted_uuids: [UUID('00000000-0000-1000-0000-000000000000'), UUID('00000000-0000-1000-7f7f-7f7f7f7f7f7f'), UUID('00000000-0000-1000-8080-808080808080'), UUID('00000000-0000-1000-a1ca-00006490e9a4'), UUID('00000000-0000-1000-f7f7-f7f7f7f7f7f7'), UUID('00000000-0000-1000-ffff-ffffffffffff')]

mykaul · 2023-06-23T10:21:31Z

We need to be 'Cassandra-compatible' in that sense, even bug compatible, or be able to turn on/off different behavior. @nyh - thoughts on the above?

cvybhu · 2023-06-23T10:48:14Z

We can provide the wrapper as a separate class without converting the rows automatically to TimeUUID.
The users would manually convert their UUID values to TimeUUID when needed.

nyh · 2023-06-23T11:16:54Z

We need to be 'Cassandra-compatible' in that sense, even bug compatible, or be able to turn on/off different behavior. @nyh - thoughts on the above?

I think it's a mistake to change the meaning of existing classes, it can break existing applications. In this sense, it's fine to add a new timeuuid class. Another alternative is to declare the existing situation a bug, i.e., a real application might get confused by the wrong "<" operator. If it's a bug, the it's fine to change the existing classes. But to be honest, I'm not sure that any real application tries to use <... An application typically trusts Scylla to return the correct order, and not in the habit of checking the order of the results.

cvybhu · 2023-06-23T11:16:55Z

A short description of how timeuuid comparison works, to make reviewing code easier.

A timeuuid is a UUID v1 value, which contains a timestamp and some other data:

The first 64 bits contain a 60 bit timestamp and 4 bit version

Each hex character is 4 bits, the 1 denotes the version:

00000000-0000-1000-0000-000000000000

This character will always be 1, valid timeuuids are always UUID v1.

The first 3 parts is the timestamp (+ version which is always 1).
For example in fed35080-0efb-11ee-a1ca-00006490e9a4, the fed35080-0efb-11ee represents timestamp 2023-06-19 23:49:56.990000+0000 (and version 1)

The remaining 64 bits have some special meaning in the standard, but Cassandra/Scylla just treats them like random data.

Cassandra compares timeuuid values the following way:

Take the timestamp (60 bit integer), and compare them. If one is smaller, this value is smaller.
Interpret the remaining 64 bits as a 64 bit signed integer, compare those integers.

https://github.com/apache/cassandra/blob/ae537abc6494564d7254a2126465522d86b44c1e/src/java/org/apache/cassandra/utils/TimeUUID.java

fruch · 2023-06-25T07:12:59Z

I would raise is on the upstream drivers as well, as a bug. (and can open this PR on upstream as well)

as a user I would expect the object returned by the driver to do the right ordering.

cvybhu · 2023-06-26T12:15:24Z

I would raise is on the upstream drivers as well, as a bug. (and can open this PR on upstream as well)
as a user I would expect the object returned by the driver to do the right ordering.

Opened https://datastax-oss.atlassian.net/browse/PYTHON-1358

nyh · 2023-06-27T16:12:04Z

@cvybhu did you notice that the Python driver already has datetime_from_uuid1() in cassandra.util?

This can be used to convert timeuuid to time and compare it like time, without needing a new class, although I admit it doesn't give you all the power of comparing all the extra bits just like Cassandra does (but does anyone need this extra power?)

There's also a delicate issue of timezone but it's not important if all you want is to compare the order.
I used this function when I translated the Cassandra tests: cassandra_tests/validation/entities/timeuuid_test.py

cvybhu · 2023-06-27T16:32:57Z

@cvybhu did you notice that the Python driver already has datetime_from_uuid1() in cassandra.util?

uuid.UUID already has a time attribute, which returns the integer timestamp, so it's the easiest to just use it:

>>> import uuid
>>> u = uuid.uuid1()
>>> u.time
139071763117282130
>>>

fruch · 2023-10-19T19:22:51Z

@cvybhu FYI someone working on this on upstream
You might want to share what we have in dtest with him

cvybhu · 2023-10-19T20:11:18Z

@cvybhu FYI someone working on this on upstream
You might want to share what we have in dtest with him

That's great to hear :) I shared my code with them in the JIRA issue, so they should have access to everything.

fruch · 2023-10-19T20:18:21Z

and now with the link:
datastax#1175

avelanarius · 2024-01-16T10:06:23Z

Closing the issue here since upstream is working on it.

cvybhu mentioned this issue Jun 23, 2023

Add a TimeUuid struct scylladb/scylla-rust-driver#749

Closed

avelanarius self-assigned this Jan 15, 2024

avelanarius closed this as not planned Won't fix, can't repro, duplicate, stale Jan 16, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add a `TimeUUID` class #231

Add a `TimeUUID` class #231

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Add a TimeUUID class #231

Add a TimeUUID class #231

Comments

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Add a `TimeUUID` class #231

Add a `TimeUUID` class #231