-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Refactor com.google.cloud.firestore.it.ITSystemTest.queryWatch to make it more reliable. #5902
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor com.google.cloud.firestore.it.ITSystemTest.queryWatch to make it more reliable. #5902
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm somewhat on board with this, but I think that we should reduce some of the logic here that was added in order to plumb through QuerySnapshot exceptions.
I also suggested a way to deflake the original test.
...clients/google-cloud-firestore/src/test/java/com/google/cloud/firestore/it/ITSystemTest.java
Outdated
Show resolved
Hide resolved
...clients/google-cloud-firestore/src/test/java/com/google/cloud/firestore/it/ITSystemTest.java
Outdated
Show resolved
Hide resolved
| @Override | ||
| public void onEvent( | ||
| @Nullable QuerySnapshot value, @Nullable FirestoreException error) { | ||
| receivedEvents.add(new ListenerEvent(value, error)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should never see an error here. To simplify this test, I would add an assert here that the error is null (and maybe that value is not) and replace ListenerEvent with QuerySnapshot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Asserts inside the snapshotListener unfortunately do not automatically cascade outside the listener and as such can't fail the test, that I why I pulled it up to the top level and added the assertion after all the operations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if we don't see the right number of snapshots (we get an error but we ignore it and don't add a receivedEvent)? My only reservation with this PR is that the generic nature of it adds a lot of code that makes the event flow somewhat hard to parse. I'm totally biased here, but the original test seems significantly easier to follow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I started investigating why this tests would fail ~65% of the time in CI I narrowed it down to the fact that the semaphore wasn't ever being satisfied. In #5746 I added a timeout as well as some diagnostic information to the test and found that the failure always occurred in the case 0 brach and resulted in the other branches never being executed. To ensure that performing new document updates wouldn't ever block the processing of received I separated the update and the lister as presented in this PR. I have ran this new implementation continuously for 2,500 iterations and all 2,500 runs passed.
I agree that it is more code -- a fair amount of it would be reduced if we were able to use java 1.8 constructs (which will hopefully happen sooner rather than later) -- and not super straightforward from a cursory glance, but I don't think the original test was either. One advantage with this new implementation is that it's more explicit about what the interrelationships are between the listener and document updates. In addition, there are explicit timeouts defined for each type of operation so we don't ever hang the entire build, and having the assertions outside the listeners means if there is a failure it will actually fail the test.
If it isn't important for doc1 and doc2 to be updated independent from one another I can merge those workers. If we want to avoid the anonymous inner classes from FluentIterable I can rewrite that with for loops.
...clients/google-cloud-firestore/src/test/java/com/google/cloud/firestore/it/ITSystemTest.java
Outdated
Show resolved
Hide resolved
| }); | ||
|
|
||
| // Wait for the document update operations to be performed | ||
| final boolean doc1CDLAwait = doc1CDL.await(10, TimeUnit.SECONDS); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I am not mistaken, doc1CDLAwait and doc2CDLAwait could be a single count down latch initialized with new CountDownLatch(2).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is true we could do this, but then we would lose the info about which of the document operations timed out for the assertion. I'd like to keep the extra detail in case the test ever fails due to this.
| actualEvents.add("event 0"); | ||
| assertTrue(value.isEmpty()); | ||
| ref1 = randomColl.add(map("foo", "foo")).get(); | ||
| ref2 = randomColl.add(map("foo", "bar")).get(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the problem with this test is that these two adds are not guaranteed to generate two different events. One way of fixing this would be either split these up into two tasks or use a WriteBatch for both writes, which is guaranteed to only raise one event of size 2.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This scenario feels a bit like a different test case to me. I can look at adding another test if that sounds good to you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's different for sure - but this test is just meant to show that basic Watch functionality works. The actual steps that we verify are of minor importance.
…e it more reliable. 1. Separate document update logic from event listener code 2. Move assertions out of listener code 3. Add timeouts to all await conditions with associated assertions to clearly communicate the test fails due to an await, and which one. 4. Switch from using a semaphore to using multiple CountDownLatches The queryWatch test has been flaky for some time, resulting in failure most of the time. Due to these persistent failures the trust of the firestore integration test suite has been low, and the results often ignored. After being able to run the test in a profileable environment, I ran the old implementation repeatedly for 25 minutes resulting in a total of 136 runs, 100 of which failed -- each with the same error. The new implementation was ran for 25 minutes resulting in a total of 2500 runs, none of which failed.
Codecov Report
@@ Coverage Diff @@
## master #5902 +/- ##
============================================
- Coverage 46.79% 46.79% -0.01%
Complexity 25648 25648
============================================
Files 2456 2456
Lines 267593 267593
Branches 30564 30564
============================================
- Hits 125229 125228 -1
- Misses 133105 133106 +1
Partials 9259 9259
Continue to review full report at Codecov.
|
Changes
to clearly communicate the test fails due to an await, and which one.
Motivation
The queryWatch test has been flaky for some time, resulting in failure
most of the time. Due to these persistent failures the trust of the
firestore integration test suite has been low, and the results often
ignored.
After being able to run the test in a profileable environment, I ran the
old implementation repeatedly for 25 minutes resulting in a total of 136
runs, 100 of which failed -- each with the same error. The new
implementation was ran for 25 minutes resulting in a total of 2500 runs,
none of which failed.