Profiling a read of a large number (1000) of rows, each row having 10 cells in one column family, shows that the following class method on Cell takes a larger share of the time (~10%) than expected:
    @classmethod
    def from_pb(cls, cell_pb):
        """Create a new cell from a Cell protobuf.

        :type cell_pb: :class:`._generated.data_pb2.Cell`
        :param cell_pb: The protobuf to convert.

        :rtype: :class:`Cell`
        :returns: The cell corresponding to the protobuf.
        """
        timestamp = _datetime_from_microseconds(cell_pb.timestamp_micros)
        if cell_pb.labels:
            return cls(cell_pb.value, timestamp, labels=cell_pb.labels)
        else:
            return cls(cell_pb.value, timestamp)
It turns out that _datetime_from_microseconds is relatively expensive:
return _EPOCH + datetime.timedelta(microseconds=value)
If you trace into the code for _EPOCH and datetime.timedelta, you can see how much work is done on every call to produce a proper datetime.
The suggestion is that Cell store the microseconds value from the Cell protobuf and expose the timestamp as a datetime through a property, computed only when requested. This moves the performance penalty to the code that actually reads the timestamp, which may be a small minority of callers. The property would do the datetime conversion from the saved cell_pb.timestamp_micros:
    @property
    def timestamp(self):
        # Pass microseconds by keyword; the first positional argument of
        # timedelta is days.
        return _EPOCH + datetime.timedelta(microseconds=self.timestamp_micros)
As an additional consideration, the use of labels in the constructor for Cell should be evaluated to determine if this feature is in use consistently across languages.
See approved pull request #4745.