Performance issue and copyright violation in `_sseclient.py` (for `db.Reference.listen()`) · Issue #198 · firebase/firebase-admin-python · GitHub

Performance issue and copyright violation in _sseclient.py (for db.Reference.listen()) #198
Closed
@daniel-ziegler

Description

[REQUIRED] Step 2: Describe your environment

  • Operating System version: macOS 10.13.6
  • Firebase SDK version: 2.13.0
  • Firebase Product: database

[REQUIRED] Step 3: Describe the problem

Steps to reproduce:

  1. Have some Firebase Realtime Database path containing a significant amount of data (>200KB, say).
  2. Call db.reference('/path').listen(callback).

As the first update, the server will send the entire contents of that path. On my machine this pegs a CPU core for minutes before the callback is finally called. The issue is the following code:

while not self._event_complete():
    try:
        nextchar = next(self.resp_iterator)
        self.buf += nextchar
    # (excerpt; the rest of the loop body is omitted here)

Since self.resp_iterator iterates through the stream one character at a time and self._event_complete() runs a regular expression over the entire buffer, this is an O(N^2) operation, which takes a long time when N is large.
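One possible way to avoid the quadratic behavior is to consume the stream in larger chunks and search for the end-of-event delimiter only in newly received text, plus a few trailing characters in case the delimiter straddles a chunk boundary. The sketch below is an illustration only: `read_first_event` and `_END_OF_EVENT` are hypothetical names, not part of firebase-admin or sseclient, and the delimiter pattern assumes events end with a blank line.

```python
import re

# Assumption: an event ends at the first blank line (any newline style).
_END_OF_EVENT = re.compile(r'\r\n\r\n|\r\r|\n\n')


def read_first_event(chunks):
    """Hypothetical helper: return (event_text, leftover) from an iterable of str chunks.

    Each chunk is scanned once, together with the last 3 characters of the
    data seen so far, so total work is linear in the amount of data read.
    Returns (None, everything_read) if the stream ends before a delimiter.
    """
    parts = []
    tail = ''  # last characters already seen, in case a delimiter spans chunks
    for chunk in chunks:
        window = tail + chunk
        match = _END_OF_EVENT.search(window)
        if match is not None:
            # Translate the match end from window coordinates to chunk coordinates.
            cut = match.end() - len(tail)
            parts.append(chunk[:cut])
            return ''.join(parts), chunk[cut:]
        parts.append(chunk)
        tail = window[-3:]  # the longest delimiter is 4 characters
    return None, ''.join(parts)
```

Collecting chunks in a list and joining once at the end also avoids repeatedly rebuilding a growing string.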

Separately, there is a copyright violation here. I discovered that `_sseclient.py`, despite having a Google copyright header, is largely copied from https://github.com/btubbs/sseclient/blob/master/sseclient.py.

Relevant Code:

Run the following code to reproduce:

import random
import string
import threading
import time

from firebase_admin import db


# Assumes firebase_admin.initialize_app() has already been called
# with credentials and a databaseURL.
def listen_issue():
    path = '/sandbox/listen'

    print("Inserting data...")
    start = time.time()
    for i in range(10):
        # 10 children of 20,000 characters each, about 200 KB in total
        long_data = ''.join(random.choice(string.ascii_uppercase) for _ in range(20000))
        db.reference(path).child(str(i)).set(long_data)
    end = time.time()
    print("...done (took %f seconds)" % (end - start))

    done_event = threading.Event()
    listener = db.reference(path).listen(lambda db_event: done_event.set())
    start = time.time()
    print("Waiting for first message from listen()...")
    done_event.wait()  # CPU usage is at 100% here
    end = time.time()
    print("...done (took %f seconds)" % (end - start))
    listener.close()

Example output on my machine:

Inserting data...
...done (took 1.678544 seconds)
Waiting for first message from listen()...
...done (took 109.180737 seconds)
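
The growth can also be seen without a database. The following sketch simulates the per-character loop and counts how many buffer characters the completeness check has to look at in total. It is a rough model, not the library's code: `chars_scanned` is a hypothetical name, the delimiter pattern is an assumption, and each search is counted as examining the whole buffer.

```python
import re

# Assumption: an event is complete once the buffer contains a blank line.
_END_OF_EVENT = re.compile(r'\r\n\r\n|\r\r|\n\n')


def chars_scanned(payload):
    """Hypothetical model of the per-character loop.

    Appends one character at a time and runs the regex over the whole
    buffer after each append, charging len(buf) for every search.
    """
    buf = ''
    scanned = 0
    for ch in payload:
        buf += ch
        scanned += len(buf)  # cost of one full-buffer search
        if _END_OF_EVENT.search(buf):
            break
    return scanned


small = chars_scanned('A' * 998 + '\n\n')    # 1,000 characters
large = chars_scanned('A' * 1998 + '\n\n')   # 2,000 characters
print(small, large)
```

Under this model an event of N characters costs N(N+1)/2 character visits, so doubling the payload roughly quadruples the work, and a 200 KB first update comes to about 2×10^10 visits.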
