8000 google-cloud-core: Optimize Timestamp.parseTimestamp(String) by olavloite · Pull Request #4831 · googleapis/google-cloud-java · GitHub
[go: up one dir, main page]

Skip to content

Conversation

@olavloite
Copy link
@olavloite olavloite commented Mar 31, 2019

Stress testing the Spanner client library showed that more than 50% of the total cpu time consumed by the client library while executing queries and consuming result sets is spent on parsing dates and timestamps. This stress test uses a result set containing one column of each possible data type of Spanner (both single value column and array column). Spanner automatically translates the string values that are sent by the server into Date and Timestamp objects before the user requests these from the ResultSet.

In a real-life scenario, the total percentage of total time spent on parsing these values will however be a lot lower, as most of the time the client library will be waiting on network traffic.

The total execution time of the stress tests added for Spanner on my own laptop with and without this optimization is:

  • Without optimization (3 test runs): 71.1s, 69.5s, 67.0s
  • With optimization (3 test runs): 34.1s, 34.7s, 35.1s

Stress testing the Spanner client library showed that more than 50% of
the total cpu time consumed by the client library while executing
queries and consuming result sets is spent on parsing dates and
timestamps. This stress test uses a result set containing one column of
each possible data type of Spanner (both single value column and array
column). Spanner automatically translates the string values that are
sent by the server into Date and Timestamp objects before the user
requests these from the ResultSet.

In a real-life scenario, the total percentage of time spent on parsing
these values will however be a lot lower, as most of the time the client
library will be waiting on network traffic.
@googlebot googlebot added the cla: yes This human has signed the Contributor License Agreement. label Mar 31, 2019
@olavloite olavloite changed the title Core optimize parse date timestamp google-cloud-core: optimize parse date timestamp Mar 31, 2019
@codecov
Copy link
codecov bot commented Mar 31, 2019

Codecov Report

Merging #4831 into master will increase coverage by 0.27%.
The diff coverage is 75.67%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master    #4831      +/-   ##
============================================
+ Coverage     50.08%   50.35%   +0.27%     
- Complexity    22813    23638     +825     
============================================
  Files          2168     2231      +63     
  Lines        216815   225730    +8915     
  Branches      24263    24947     +684     
============================================
+ Hits         108590   113669    +5079     
- Misses        99882   103463    +3581     
- Partials       8343     8598     +255
Impacted Files Coverage Δ Complexity Δ
...core/src/main/java/com/google/cloud/Timestamp.java 82.75% <75.67%> (-7.25%) 35 <12> (+11)
...oud/automl/v1beta1/stub/PredictionServiceStub.java 20% <0%> (-30%) 1% <0%> (ø)
...d/storage/CanonicalExtensionHeadersSerializer.java 88.88% <0%> (-11.12%) 12% <0%> (+4%)
...om/google/cloud/automl/v1beta1/AutoMlSettings.java 7.52% <0%> (-3.25%) 2% <0%> (ø)
...loud/automl/v1beta1/PredictionServiceSettings.java 21.21% <0%> (-2.93%) 2% <0%> (ø)
...ogle/cloud/dialogflow/v2beta1/DocumentsClient.java 51.8% <0%> (-2.63%) 19% <0%> (ø)
...oud/dialogflow/v2beta1/stub/GrpcDocumentsStub.java 88.05% <0%> (-2.55%) 12% <0%> (ø)
...ud/dialogflow/v2beta1/stub/KnowledgeBasesStub.java 14.28% <0%> (-2.39%) 1% <0%> (ø)
...ialogflow/v2beta1/stub/GrpcKnowledgeBasesStub.java 88.54% <0%> (-2.28%) 11% <0%> (ø)
...e/cloud/dialogflow/v2beta1/stub/DocumentsStub.java 9.09% <0%> (-2.03%) 1% <0%> (ø)
... and 113 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 81f9c46...63001ff. Read the comment docs.

@olavloite olavloite added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Mar 31, 2019
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Mar 31, 2019
Parsing the decimal is digit is faster using a switch statement than a
number of if statements. The switch statement is also faster than
Character.getNumericValue(char), as the latter also takes additional
unicode numerical values into account.
@sduskis
Copy link
Contributor
sduskis commented Apr 4, 2019

@olavloite, is this ready for review?

@olavloite olavloite marked this pull request as ready for review April 4, 2019 16:33
@olavloite olavloite requested a review from a team as a code owner April 4, 2019 16:33
Copy link
Contributor
@sduskis sduskis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, I'm not sure why your parsing would be qualitatively better t 10BC0 han a standard implementation. I'm very hesitant to do all of this manual parsing of Dates, which can get tricky.

Are there performance improvements that we can get while continuing to use a standard date parser?

@olavloite
Copy link
Author

Standard date/time parsers are based on regular expressions in order to support different date/time formats, which make them slower than an implementation that is specifically written specifically for the only format that we support. I understand the reluctance on changing this. One option could be to add this specific parse method to the Spanner library (or only the GrpcResultSet class) and only use it there.

I did some tests to get an idea of the actual difference in performance by parsing 1,000,000 random timestamps a number of times.

  • Current implementation of Timestamp.parseTimestamp(String): 2300ms
  • Using org.threeten.bp.Instant.parse(CharSequence): 1600ms
  • The parse method in this PR: 330ms

parseInvalidDate("2016 09 18");
parseInvalidDate("2016-9-18");
parseInvalidDate("2016-09-18T10:00");
parseInvalidDate("");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please move the parseInvalidDate checks to a new method. Please also add some interesting boundary conditions, like 2001/2/29

Copy link
Author
@olavloite olavloite Apr 6, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added (see separate PR #4920 ).
com.google.cloud.Date does however support invalid dates. A date like 2001/02/29 is accepted, both when you submit it through Date.parseDate as well as by constructing it using the Date.fromYearMonthDay. I have not changed that part of the behavior.

int year = Integer.parseInt(matcher.group(1));
int month = Integer.parseInt(matcher.group(2));
int dayOfMonth = Integer.parseInt(matcher.group(3));
Preconditions.checkNotNull(date);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change seems straightforward. Can you please break it out into a separate PR?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed from this PR and submitted as separate PR.

@olavloite olavloite changed the title google-cloud-core: optimize parse date timestamp google-cloud-core: Optimize Timestamp.parseTimestamp(String) Apr 6, 2019
@yoshi-automation yoshi-automation added the 🚨 This issue needs some love. label Apr 7, 2019
int hour = IntParser.parseInt(timestamp, 11, 13);
int minute = IntParser.parseInt(timestamp, 14, 16);
int second = IntParser.parseInt(timestamp, 17, 19);
int year = Integer.parseInt(timestamp.substring(0, 4));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you need to catch NumberFormatException.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that would add very much. The method already throws IllegalArgumentException if the input is not valid. NumberFormatException is a subclass of IllegalArgumentException, so it should be caught by any exception handling the client already has.
The only reason to do so would be to add a custom message. Is that what you mean?

fraction = Integer.parseInt(timestamp.substring(20, endIndex));
// Adjust the result to nanoseconds if the input length is less than 9 digits (9 -
// (endIndex
// - 20)).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please have the mathematical formula on a single line, if possible. (9 - (endIndex - 20)).

// (endIndex
// - 20)).
fraction *= POWERS_OF_10[9 - (endIndex - 20)];
} else {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this else need to be here?

@sduskis sduskis requested review from a team and chingor13 April 15, 2019 14:59
@sduskis sduskis added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Apr 16, 2019
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Apr 16, 2019
@chingor13
Copy link
Contributor
chingor13 commented Apr 23, 2019

I'm reluctant to have us maintain our own date parsing code within this repo. Can we not provide the expected date format to one of the existing libraries so it can parse more performantly?

@sduskis sduskis closed this Apr 30, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cla: yes This human has signed the Contributor License Agreement. 🚨 This issue needs some love.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants

0