BigTable: Cell.from_pb() performance improvement · Issue #4714 · googleapis/google-cloud-python · GitHub

BigTable: Cell.from_pb() performance improvement #4714
@zakons

Description


Profiling a read of a large number of rows (1000), each with 10 cells in a single column family, shows that the following class method on Cell accounts for a larger share of the total time (~10%) than expected:

    @classmethod
    def from_pb(cls, cell_pb):
        """Create a new cell from a Cell protobuf.

        :type cell_pb: :class:`._generated.data_pb2.Cell`
        :param cell_pb: The protobuf to convert.

        :rtype: :class:`Cell`
        :returns: The cell corresponding to the protobuf.
        """
        timestamp = _datetime_from_microseconds(cell_pb.timestamp_micros)
        if cell_pb.labels:
            return cls(cell_pb.value, timestamp, labels=cell_pb.labels)
        else:
            return cls(cell_pb.value, timestamp)

It turns out that _datetime_from_microseconds is relatively expensive:

    return _EPOCH + datetime.timedelta(microseconds=value)

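Tracing into the code for _EPOCH and datetime.timedelta shows how much work goes into producing a proper datetime: every call constructs a new timedelta object and then adds it to the _EPOCH datetime, which allocates yet another datetime. With 10 cells per row, this happens 10,000 times for a 1000-row read.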
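To get a feel for the per-cell cost, here is a minimal sketch that re-creates the conversion in isolation and times it against simply keeping the raw integers. It assumes _EPOCH is the Unix epoch as a UTC-aware datetime; the names EPOCH, datetime_from_microseconds, N_CELLS, eager and lazy are hypothetical stand-ins, not the library's own code. Absolute timings will vary by machine, so no expected numbers are given.

```python
import datetime
import timeit

# Assumption: the library's _EPOCH is the Unix epoch as a UTC-aware datetime.
EPOCH = datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc)


def datetime_from_microseconds(value):
    """Hypothetical stand-in for _datetime_from_microseconds."""
    # One timedelta allocation plus one datetime addition per call.
    return EPOCH + datetime.timedelta(microseconds=value)


# 1000 rows x 10 cells, matching the read described above.
N_CELLS = 10_000
micros = [1_500_000_000_000_000 + i for i in range(N_CELLS)]


def eager():
    # Current behaviour: convert every timestamp up front.
    return [datetime_from_microseconds(m) for m in micros]


def lazy():
    # Proposed behaviour: keep the raw integers; convert only on access.
    return list(micros)


if __name__ == "__main__":
    t_eager = timeit.timeit(eager, number=20)
    t_lazy = timeit.timeit(lazy, number=20)
    print(f"eager: {t_eager:.4f}s  lazy: {t_lazy:.4f}s")
```

The lazy path does no per-cell datetime work at all, which is the saving the proposal targets; the conversion cost only reappears for callers that read the timestamp.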

It is suggested that Cell store the raw microseconds from the Cell protobuf and expose the timestamp as a datetime through a @property, converting only when requested. This moves the performance penalty to the code that actually accesses the timestamp, which may be a small minority of callers. The property would implement the datetime conversion from the saved cell_pb.timestamp_micros:

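    @property
    def timestamp(self):
        # Convert lazily from the stored integer. The keyword is required:
        # timedelta's first positional argument is days, not microseconds.
        return _EPOCH + datetime.timedelta(microseconds=self.timestamp_micros)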

As an additional consideration, the labels parameter of the Cell constructor should be reviewed to determine whether this feature is used consistently across the Bigtable client libraries for other languages.

See approved pull request #4745.

Labels: api: bigtable, performance, type: process
