[BEAM-8271] Properly encode/decode StateGetRequest/Response continuation token by chadrik · Pull Request #10595 · apache/beam



Merged (1 commit) on Feb 6, 2020


@chadrik (Contributor) commented Jan 15, 2020

This behavior matches Java and corrects a bytes/str inconsistency that might have caused problems in Python 3.

This is a follow-up to #9056.

Note that this issue has its own Jira.

R: @robertwb
R: @udim




@chadrik requested review from robertwb and udim January 15, 2020 23:17
@robertwb (Contributor) left a comment

Actually, I think it's more correct to let continuation_token be bytes everywhere (possibly fixing the uses if need be).

@chadrik (Contributor Author) commented Jan 22, 2020

Actually, I think it's more correct to let continuation_token be bytes everywhere (possibly fixing the uses if need be).

Here's the ticket I made: https://issues.apache.org/jira/browse/BEAM-8271.

It's been a while, but my instinct was that this is an issue in the proto file: all other id-like fields throughout these protos are strings. Also, the token itself is created using string operations ('token_%x' % len(self._continuations)), so it certainly feels like this wants to be a string right up until it's encoded and stored in the proto message.

IIRC, my first attempt to solve this was to preserve the bytes typing, but I think there are other calls that create the continuation token from strings, and it started getting pretty ugly (converting from bytes to str to do some string formatting, then back to bytes in multiple places, etc.). So I looked at how this was solved in Java and based my changes here on that. It keeps the ugliness isolated and to a minimum (but again, it was a while ago, so I may be misremembering something).

If I haven't convinced you yet, I can show you what an alternate solution looks like using bytes, but first please do have a look at other ids in the .proto files and at the Java equivalent for this.
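A minimal sketch of the string-first approach argued for above, assuming a hypothetical in-memory store (StrTokenStore is illustrative, not Beam's actual runner code): tokens are built with str formatting and converted to bytes only where a bytes-typed proto field would hold them.

```python
class StrTokenStore:
    """Hypothetical continuation store: str tokens internally, bytes on the wire."""

    def __init__(self):
        # Maps str token -> whatever remains to be returned for that token.
        self._continuations = {}

    def add(self, remaining):
        # The token is built with ordinary str formatting...
        token = 'token_%x' % len(self._continuations)
        self._continuations[token] = remaining
        # ...and encoded only when it is placed in the bytes-typed proto field.
        return token.encode('ascii')

    def lookup(self, wire_token):
        # Every token coming back from the client must be decoded before use.
        return self._continuations[wire_token.decode('ascii')]
```

Each place the token crosses the proto boundary needs a matching encode or decode, which is the conversion overhead being weighed in this thread against keeping the token as bytes throughout.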

@robertwb (Contributor)

It's bytes because non-trivial runners may serialize arbitrary data into this field, which they use to continue the iterable. (Ids are strings because they're just used to compare against, and also the type of proto map keys constrains us here.) At least on the client side, there shouldn't be any manipulation of this token: the client just accepts it and passes it back.

It looks like Java treats this as a ByteString. I still think we should do the same.
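To illustrate why an opaque bytes token fits this contract, here is a minimal sketch with a hypothetical runner that packs a binary offset into the token and a client that only hands the token back. fake_runner_get, read_all, CHUNKS, and the 8-byte offset encoding are assumptions for illustration, not Beam or Fn API code; the sketch only borrows the convention that an empty token means "first request" or "no more data".

```python
import struct

# Hypothetical state value, split into pages the runner serves one at a time.
CHUNKS = [b'a', b'b', b'c']


def fake_runner_get(token: bytes):
    """Hypothetical runner: encodes the next offset as 8 raw big-endian bytes."""
    offset = struct.unpack('>Q', token)[0] if token else 0
    data = CHUNKS[offset]
    next_offset = offset + 1
    # An empty token tells the client that the iterable is exhausted.
    next_token = struct.pack('>Q', next_offset) if next_offset < len(CHUNKS) else b''
    return data, next_token


def read_all(get):
    """Client side: never inspects the token, only passes it back."""
    token = b''  # empty token on the first request
    while True:
        data, token = get(token)
        yield data
        if not token:
            break
```

Because the client never decodes the token, the runner is free to choose its encoding. A token such as struct.pack('>Q', 255) ends in 0xff and is not valid UTF-8, so forcing it through str would fail.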

@chadrik (Contributor Author) commented Jan 30, 2020 via email

@chadrik force-pushed the python-typing-state-get-request branch from 315e41c to 590bc03 January 30, 2020 19:01
@chadrik (Contributor Author) commented Jan 30, 2020

@robertwb OK, the solution using bytes is actually much simpler. I thought bytes had lost all of its string formatting capabilities in Python 3, but apparently it has not (my Python 3 experience is limited to a few side projects). Thanks for pushing me in the right direction.

Note: I can't see the tests on this PR. Are they running?
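For reference, printf-style % formatting on bytes was absent in Python 3.0 through 3.4 and was restored in Python 3.5 by PEP 461, which is what keeps a bytes-only version small. A minimal sketch; make_token, with_index, split_index, and the "base:index" layout are hypothetical helpers, not Beam code.

```python
from typing import Tuple


def make_token(counter: int) -> bytes:
    # %x works inside a bytes format string on Python 3.5+ (PEP 461).
    return b'token_%x' % counter


def with_index(token: bytes, index: int) -> bytes:
    # %s accepts bytes operands inside a bytes format string.
    return b'%s:%d' % (token, index)


def split_index(token: bytes) -> Tuple[bytes, int]:
    # bytes.rsplit and int(bytes) avoid any round-trip through str.
    base, index = token.rsplit(b':', 1)
    return base, int(index)
```

No encode or decode calls are needed anywhere, so the token can stay bytes from creation to the proto field.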

@chadrik (Contributor Author) commented Jan 30, 2020

It's bytes because non-trivial runners may serialize arbitrary data into this field used to continue the iterable. (Ids are strings because they're just used to compare against, and also the type of proto map keys constrains us here.)

Makes sense. Thanks for the explanation.

…ion_token

Ensure proper handling of bytes vs unicode
@chadrik force-pushed the python-typing-state-get-request branch from 590bc03 to e15d33f January 30, 2020 19:12
@chadrik (Contributor Author) commented Feb 4, 2020

@robertwb this is ready to go

@robertwb (Contributor) left a comment

Thanks. LGTM, pending merge conflict resolution.

@robertwb merged commit e15d33f into apache:master Feb 6, 2020