8000 Add Transaction Support by alexanderwhatley · Pull Request #54 · firebase/firebase-admin-python · GitHub
[go: up one dir, main page]

Skip to content

Add Transaction Support #54

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Aug 11, 2017
Merged

Conversation

alexanderwhatley
Copy link

I added in 3 new functions that support the transaction functionality that was added in the REST API recently. These are "get_with_etag", "update_with_etag", and "transaction" in db.py, and added tests for all 3 of these functions. "get_with_etag" is able to retrieve the data at the reference as well as its ETag, and "update_with_etag" takes in data and an ETag value as parameters, and tries to write to the reference using these two.

"transaction" is based off of the transaction function that is in the Admin SDK for Nodejs/Java here: https://firebase.google.com/docs/reference/js/firebase.database.Reference#transaction. I added in the first two parameters, transactionUpdate and onComplete, but I am not sure what purpose the third one could serve in the REST API.

@googlebot
Copy link

We found a Contributor License Agreement for you (the sender of this pull request), but were unable to find agreements for the commit author(s). If you authored these, maybe you used a different email address in the git commits than was used to sign the CLA (login here to double check)? If these were authored by someone else, then they will need to sign a CLA as well, and confirm that they're okay with these being contributed to Google.
In order to pass this check, please resolve this problem and have the pull request author add another comment and the bot will run again.

@hiranya911
Copy link
Contributor
hiranya911 commented Aug 2, 2017

Thanks for providing this pull request. I took a quick look, but before I get into the details it would be great if you can address the following issues:

  1. Please go through the above comment added by @googlebot, and do the needful to address the CLA concerns.
  2. Please reformat/re-indent the two source files (db.py and test_db.py), so any lines you haven't touched remain unchanged, and they do not show up in the diffs. Right now it shows as if the contents of the entire files have been modified.
  3. Make sure there are no lint errors or any other build issues (see Travis log for the issues discovered so far).

try:
tries = 0
etag, data = self.get_with_etag()
val = transaction_update(data)
Copy link
@caseycrogers caseycrogers Aug 2, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey Alexander, this looks awesome!
I'm on the Realtime Database team and worked on the new Conditional Requests feature.

The one thing I'd say is that you could change transaction_update to return a tuple of (val, retry) where retry is a boolean. Then, you can change you while loop to be:
while retry == True:
This way the user passes in a transaction_update function that controls retry logic itself. We could also still include _FIREBASE_MAX_RETRIES so that users don't accidentally infinite loop, though I believe our SDKs that already have transactions do not do have this-so it may be best to be consistent.

Copy link
@caseycrogers caseycrogers Aug 2, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, transactions in Java and Javascript run asynchronously which is why the on_complete is there. As written, this implementation runs synchronously so the on_complete doesn't really have a use: the values could be directly returned instead of passed into on_complete.

We could either:

  1. refactor this function such that it is asynchronous
  2. remove the on_complete argument and directly return (error, committed, data)

I heavily lean towards 2 as asynchronous code and callbacks are much less idiomatic in Python than in Java or Javascript.

I'm going to try and grab some time to put a PR against your PR with these recommendations, though I can't guarantee I'll get the chance.

Thanks so much for putting time and effort in to working on this!

@alexanderwhatley
Copy link
Author

Hi guys, thanks for the feedback! I'll try to get another pull request revising these done by tonight. A few questions/concerns:

  1. @hiranya911, I made all of the commits, as you can see from the names, but one was accidentally made with a git account that is not associated with Github or Gmail. Do you think you could manually override the Googlebot on the CLA request?
  2. @caseycrogers, I think it's a good idea to have transaction_update return val and boolean. Also, the other SDKs actually do have a retry limit, which is also 25. See here for the Java one: https://github.com/firebase/firebase-admin-java/blob/f9b81d291cd854fab9159582c80f09dc5d561ac0/src/main/java/com/google/firebase/database/core/Repo.java
  3. I agree that we can get rid of on_complete.

@caseycrogers
Copy link

@alexanderwhatley Serves me right for not checking myself!
Strike what I said about removing the max_retries.

@hiranya911 hiranya911 self-requested a review August 2, 2017 23:07
@hiranya911
Copy link
Contributor

@alexanderwhatley I think the easiest solution would be to get that account associated with your Github account. This can be done via your Github profile settings: https://help.github.com/articles/adding-an-email-address-to-your-github-account/

@alexanderwhatley alexanderwhatley force-pushed the master branch 2 times, most recently from 904aaf3 to 98fbb2f Compare August 3, 2017 05:13
@alexanderwhatley
Copy link
Author
alexanderwhatley commented Aug 3, 2017

@caseycrogers, I got rid of the callback for the transaction function.

@hiranya911: It doesn't seem like the git account I used to commit at first has an associated email account, so I couldn't add it to my GitHub account. I also tried squashing the commit made with the other account into my own commit, but that did not work either. I think the only possibilities now are that either you or someone else manually override the googlebot, or you make a PR from this PR yourself.

@hiranya911 hiranya911 self-assigned this Aug 3, 2017
Copy link
Contributor
@hiranya911 hiranya911 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @alexanderwhatley for all your hard work on this. It looks pretty good. I've made some comments requesting a few more code changes. Most of them are very trivial, and should not require a lot of effort.

I will discuss with our open source team about your CLA situation. I'm hopeful we can find a solution for you.

Also if you have the time, please consider contributing an integration test for the new functionality in integration/test_db.py.

@@ -13,7 +13,6 @@
# limitations under the License.

"""Firebase Realtime Database module.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add these removed empty lines back. They actually serve a purpose. Here and also below.

Raises:
ApiCallError: If an error occurs while communicating with the remote database server.
"""
return self._client.request('get', self._add_suffix())

def get_with_etag(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should remove this method from the public API. Lets rename it to _get_with_etag(), and use it as an internal helper method. Lets do the same for _update_with_etag(). If necessary, we can later figure out a way to expose them in the public API.

Raises:
ApiCallError: If an error occurs while communicating with the remote database server.
"""
return self._client.request('get', self._add_suffix())

def get_with_etag(self):
"""Returns the value at the current location of the database, along with its ETag.
Returns:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need an empty line before this.

Returns:
object: Tuple of the ETag value corresponding to the Reference, and the
Decoded JSON value of the current database Reference.
Raises:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add empty line.

def update_with_etag(self, value, etag):
"""Updates the specified child keys of this Reference to the provided values
and uses ETag to make sure data is up to date.
Args:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add empty line. Also before Returns: and Raises.

tests/test_db.py Outdated

def send(self, request, **kwargs):
if 'if-match' in request.headers and request.headers['if-match'] != self._etag:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if request.headers.get('if-match') != self._etag:

@@ -34,14 +37,24 @@ def __init__(self, data, status, recorder):
self._data = data
self._status = status
self._recorder = recorder
self._etag = '0'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can just make this a constant string. No need to update it every time a request is made as you do below.

.gitignore Outdated
@@ -7,3 +7,6 @@
*~
scripts/cert.json
scripts/apikey.txt
serviceAccountCredentials.json
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets remove this line. That file name looks arbitrary.

.gitignore Outdated
@@ -7,3 +7,6 @@
*~
scripts/cert.json
scripts/apikey.txt
serviceAccountCredentials.json
__pycache__
*.pyc
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also remove. Already listed in the file (line 2).

tests/test_db.py Outdated
response._content = request.body
response.headers = CaseInsensitiveDict({'ETag': self._etag})
request_exception = RequestException(response=response)
raise db.ApiCallError('', request_exception)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's more accurate to raise the RequestException here, and let the db module handle it. Or better yet, return a response with status code 412, along with the correct headers and payload.

else:
etag, data = resp
val = transaction_update(data)
tries += 1
Copy link
@caseycrogers caseycrogers Aug 3, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can restructure this while loop a little:

etag, data = self.get_with_etag()
while tries < _TRANSACTION_MAX_RETRIES:
    val, should_retry = transaction_update(data)
    if not should_retry:
        break
    resp = self.update_with_etag(val, etag)
    if resp is None:
        break
    else:
        etag, data = resp
        tries += 1

This also lets the user specify when to stop retrying in transaction_update. Note that this code will raise an exception if transaction_update does not return a tuple. We probably want to catch that exception and raise one with a context specific message in its place.

Note that should_retry is not necessarily a boolean. We could either assert that it is a boolean and raise a clear exception to prevent users from accidentally passing unintended values, or we could leave it as is, assuming that in some cases a user may be passing through things like an empty list versus a non-empty list as their retry criteria.

@hiranya911
Copy link
Contributor

Btw, I think your first commit is associated with the email address "awhatley-mbp@awhatley-mbp.ad.quora.com". Not sure if that's a real email address, or some automatically managed account. But see if you can get it added to your github profile.

@hiranya911
Copy link
Contributor

Related to #53

@hiranya911
Copy link
Contributor

Hi @alexanderwhatley. It seems you can amend the author of that commit via git rebase: https://stackoverflow.com/questions/3042437/change-commit-author-at-one-specific-commit

Give that a try, and see if you can change the author of the commit.

@googlebot
Copy link

CLAs look good, thanks!

@alexanderwhatley
Copy link
Author

@hiranya911 and @caseycrogers, I did most of the stuff you guys requested. I also fixed the issue with the CLA request by following the StackOverflow link's directions. A few things:

  1. For consistency, I think it makes more sense for the _update_with_etag to return True, the current value of the etag, and the value of the just updated data if the request goes through.

  2. @caseycrogers, I don't think the way you want the while loop refactored is a good idea, especially given the refactored _update_with_etag function, which always returns a tuple of three items. In particular, I would think that in most cases if the transaction_update functions returns a boolean as a second argument, it would be false, and if it were false, the while loop would break before 'resp = self.update_with_etag(val, etag)' executes, and thus the transaction would not go through.

  3. It's not clear to me at all how to run the integration tests, and I couldn't find any directions, so I didn't add any. It looks like I have to specify the certificate somehow using --cert, but none of the files in the tests/data folder worked....

@hiranya911
Copy link
Contributor

Thanks @alexanderwhatley. So glad the cla issue got resolved. I'll take another look at your code tomorrow. I'm also working on getting this new feature/api approved internally.

As for integration tests, there are some instructions at https://github.com/firebase/firebase-admin-python/blob/master/CONTRIBUTING.md#integration-testing

Copy link
Contributor
@hiranya911 hiranya911 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @alexanderwhatley. I think it looks great now. I've made a few more minor suggestions. Please go over them when you get a chance. I'm going to try and get the internal review process for this expedited, so we can release it in the near future.

Have you had any luck with the integration tests? If it's causing problems, I'd like to get your feedback, so we can try to make it easier for a developer to run the integration tests. If necessary, I'm also willing to write the integration tests for the transaction support in a future PR.

raise ValueError('Value argument must be a non-empty dictionary.')
if None in value.keys() or None in value.values():
raise ValueError('Dictionary must not contain None keys or values.')
if not isinstance(etag, str):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Useisinstance(etag, six.string_types) here. Behavior of str is vastly different between Python 2 and 3.

except ApiCallError as error:
detail = error.detail
success = False
etag = detail.response.headers['ETag']
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think, to be in the safe side:

if detail.response and 'ETag' in detail.response.headers:
  etag = detail.response.headers['ETag']
  snapshot = detail.response.json()
  return False, etag, snapshot
else:
  # Possible connection error or other unexpected backend error.
  # Simply raise the original error.
  raise error

@@ -199,6 +232,30 @@ def delete(self):
"""
self._client.request_oneway('delete', self._add_suffix())

def transaction(self, transaction_update):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets return a boolean from this method to indicate success/failure. That way the caller has a deterministic way of knowing whether the transaction was committed or not:

if ref.transaction(update_func):
  do_more_stuff()
else:
  raise DatabaseError('Failed to commit txn')

tests/test_db.py Outdated

def send(self, request, **kwargs):
if request.headers.get('if-match') is not None and \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest you break this into two lines:

if_match = request.headers.get('if-match')
if if_match and if_match != self._etag:
   ...

@alexanderwhatley
Copy link
Author

@hiranya911, I just pushed some integration tests and fixed the other things. I feel pretty strongly that the API should expose the functionality for accessing the ETag, in case the user wishes to do some low level operations, such as accessing data and writing data to multiple parts of the database. I additionally think that we should add two functions, get_etag() and check_etag(etag) that retrieve the ETag w F987 ithout the corresponding data, and check to see if the ETag is correct, respectively. What do you think?

@hiranya911
Copy link
Contributor

Hi @alexanderwhatley. Given that the etag functionality is not available in our other SDKs, I think it's a good idea to exercise some restraint when adding it to our public API. At very least I'd like to hear from our Realtime Database team, and get their opinion on the matter. Therefore I'd like to simply release the transaction support you've implemented in its present state, and continue the discussion about exposing more etag-related functionality from the SDK. I suggest you open a new issue for that, documenting some of the use cases the proposed methods can be used for. In the meantime, developers who really want to access etag values can call the private _get_with_etag() and _update_with_etag() methods.

If we choose to make this functionality public, I think we can probably expose them as extensions to our existing get and set methods:

get(self, etag=False) # When etag is True, behaves like _get_with_etag()
set(self, value, etag=None) # When etag is not None, behaves like _update_with_etag()

Copy link
Contributor
@hiranya911 hiranya911 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

I have left some open comments. Please address them when you get a chance. I will merge and release this PR, once our internal review process has come to an end (it will take a few days). Thanks for all your hard work on this, and actively responding to my code review comments. I'm certain many Firebase users are going to find this new feature very useful.

else:
val = transaction_update(snapshot)
tries += 1
if tries == _TRANSACTION_MAX_RETRIES:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this always the case? I think we can just return False here without the if condition.

push_data = {'name' : 'Edward Cope', 'since' : 1800}
edward = python.child('users').push(push_data)
etag, data = edward._get_with_etag()
assert data == push_data
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps also do assert isinstance(etag, six.string_types)

assert data == push_data

update_data = {'name' : 'Jack Horner', 'since' : 1940}
failed_update = edward._update_with_etag(update_data, '')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to d 1241 escribe this comment to others. Learn more.

Instead of empty string etag, pass something like invalid-etag so the intention becomes clear.

assert failed_update == (False, etag, push_data)

successful_update = edward._update_with_etag(update_data, etag)
assert successful_update[0]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fold into one line: assert successful_update == (True, update_data)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remember, the method returns three values, one of which is the ETag, and the ETag is not known ahead of time

def test_transation(self, testref):
python = testref.parent
def transaction_update(snapshot):
snapshot['foo2'] = 'bar2'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about we also do an update like:

snapshot['foo1'] += 'suffix'

and make sure the existing value of foo1 gets updated below.

@alexanderwhatley
Copy link
Author

Cool. I made the changes that you requested. I'll open up an issue and discuss the possibility of allowing access to the ETag object itself.

@hiranya911
Copy link
Contributor

Haven't forgotten this. Just waiting for the API review process to end. Should happen later this week.

Copy link
@caseycrogers caseycrogers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hiranya911 Can you send me the API review doc? I can't seem to find the link.

@alexanderwhatley Hey Alex, so sorry for the crazy late follow-up, I was backpacking for a week and was totally disconnected. I left some comments, most of them nit-picks. However I do strongly believe that we should add a "should_retry" return to the transaction_update function. I've provided more detail in a comment.

As for your interest in adding low-level access to ETags-this seems reasonable to me. However, there is a caveat:

As-is, the ETag of a given value at one path in the database will be the same as the ETag of the same value at any other path.

Additionally, the ETag of values will not change with time. However, we do not document these two properties and do not guarantee that EITHER of them will remain true in the future. As such, we strongly discourage developers from fetching ETag values and then saving them to be used with later transactions at the same or different locations.

etag, data = self._get_with_etag()
val = transaction_update(data)
while tries < _TRANSACTION_MAX_RETRIES:
success, etag, snapshot = self._update_with_etag(val, etag)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As for refactoring this loop and adding a boolean return, I'm afraid I had miscommunicated. I meant that "transaction_update" should return (val, should_retry). If should_retry is False, transaction should immediately return with False.

This way, if a user wants to control their own retry behavior (ie "retry only twice, then fail") they can do so by returning False depending on how many times transaction_update has been called and/or what snapshot values are being passed through.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. It seems Node and Java SDKs support a similar feature. In Node the update function can return undefined to abort, and in Java there's a Transaction.abort() method. In any case lets hold off on implementing this change. I'll run this by the API review team, and update this thread with the outcome.


tries = 0
etag, data = self._get_with_etag()
val = transaction_update(data)
Copy link
@caseycrogers caseycrogers Aug 11, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets move down as the first line of the while loop-then we can remove the duplicate line at 262.

ie:

etag, snapshot = self._get_with_etag() # EDIT: moved up
while tries < _TRANSACTION_MAX_RETRIES:
    val = transaction_update(snapshot)
    success, etag, snapshot = self._update_with_etag(val, etag)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not move _get_with_etag() into the loop as it makes a REST call. Perhaps:

etag, data = self._get_with_etag()
while tries < _TRANSACTION_MAX_ENTRIES:
    new_data = transaction_update(data)
    success, etag, data = self._update_with_etag(new_data, etag)
    if success:
        return True
    tries += 1

10000

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah sorry-that was a mistake. I had intended for get_with_etag to remain outside of the loop.


tries = 0
etag, data = self._get_with_etag()
val = transaction_update(data)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we rename 'val' to something like 'new_snapshot' or 'snapshot_to_write'?

'val' doesn't seem to communicate that this is the snapshot after applying transaction_update.

@hiranya911
Copy link
Contributor

Changes proposed during the final review are at #56. I will merge this PR as it is, and implement the necessary changes in a new PR.

@hiranya911 hiranya911 merged commit d86b8bb into firebase:master Aug 11, 2017
@hiranya911
Copy link
Contributor

Thanks @alexanderwhatley

Hope you continue to contribute to this project.

@alexanderwhatley
Copy link
Author

@hiranya911, thanks, I will definitely contribute more in the future, this library is very useful for the work I am doing. Do you have an ETA when the transaction features will be released on PyPI?

@hiranya911
Copy link
Contributor

Hopefully next week. I'm working on getting the docs in order.

@hiranya911
Copy link
Contributor

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants
0