-
Notifications
You must be signed in to change notification settings - Fork 1.1k
google-cloud-core: Optimize Timestamp.parseTimestamp(String) #4831
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
google-cloud-core: Optimize Timestamp.parseTimestamp(String) #4831
Conversation
Stress testing the Spanner client library showed that more than 50% of the total cpu time consumed by the client library while executing queries and consuming result sets is spent on parsing dates and timestamps. This stress test uses a result set containing one column of each possible data type of Spanner (both single value column and array column). Spanner automatically translates the string values that are sent by the server into Date and Timestamp objects before the user requests these from the ResultSet. In a real-life scenario, the total percentage of time spent on parsing these values will however be a lot lower, as most of the time the client library will be waiting on network traffic.
Codecov Report
@@ Coverage Diff @@
## master #4831 +/- ##
============================================
+ Coverage 50.08% 50.35% +0.27%
- Complexity 22813 23638 +825
============================================
Files 2168 2231 +63
Lines 216815 225730 +8915
Branches 24263 24947 +684
============================================
+ Hits 108590 113669 +5079
- Misses 99882 103463 +3581
- Partials 8343 8598 +255
Continue to review full report at Codecov.
|
google-cloud-clients/google-cloud-core/src/main/java/com/google/cloud/IntParser.java
Outdated
Show resolved
Hide resolved
Parsing the decimal is digit is faster using a switch statement than a number of if statements. The switch statement is also faster than Character.getNumericValue(char), as the latter also takes additional unicode numerical values into account.
|
@olavloite, is this ready for review? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, I'm not sure why your parsing would be qualitatively better t 10BC0 han a standard implementation. I'm very hesitant to do all of this manual parsing of Dates, which can get tricky.
Are there performance improvements that we can get while continuing to use a standard date parser?
|
Standard date/time parsers are based on regular expressions in order to support different date/time formats, which make them slower than an implementation that is specifically written specifically for the only format that we support. I understand the reluctance on changing this. One option could be to add this specific parse method to the Spanner library (or only the I did some tests to get an idea of the actual difference in performance by parsing 1,000,000 random timestamps a number of times.
|
| parseInvalidDate("2016 09 18"); | ||
| parseInvalidDate("2016-9-18"); | ||
| parseInvalidDate("2016-09-18T10:00"); | ||
| parseInvalidDate(""); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please move the parseInvalidDate checks to a new method. Please also add some interesting boundary conditions, like 2001/2/29
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added (see separate PR #4920 ).
com.google.cloud.Date does however support invalid dates. A date like 2001/02/29 is accepted, both when you submit it through Date.parseDate as well as by constructing it using the Date.fromYearMonthDay. I have not changed that part of the behavior.
| int year = Integer.parseInt(matcher.group(1)); | ||
| int month = Integer.parseInt(matcher.group(2)); | ||
| int dayOfMonth = Integer.parseInt(matcher.group(3)); | ||
| Preconditions.checkNotNull(date); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change seems straightforward. Can you please break it out into a separate PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed from this PR and submitted as separate PR.
| int hour = IntParser.parseInt(timestamp, 11, 13); | ||
| int minute = IntParser.parseInt(timestamp, 14, 16); | ||
| int second = IntParser.parseInt(timestamp, 17, 19); | ||
| int year = Integer.parseInt(timestamp.substring(0, 4)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you need to catch NumberFormatException.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that would add very much. The method already throws IllegalArgumentException if the input is not valid. NumberFormatException is a subclass of IllegalArgumentException, so it should be caught by any exception handling the client already has.
The only reason to do so would be to add a custom message. Is that what you mean?
| fraction = Integer.parseInt(timestamp.substring(20, endIndex)); | ||
| // Adjust the result to nanoseconds if the input length is less than 9 digits (9 - | ||
| // (endIndex | ||
| // - 20)). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please have the mathematical formula on a single line, if possible. (9 - (endIndex - 20)).
| // (endIndex | ||
| // - 20)). | ||
| fraction *= POWERS_OF_10[9 - (endIndex - 20)]; | ||
| } else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this else need to be here?
|
I'm reluctant to have us maintain our own date parsing code within this repo. Can we not provide the expected date format to one of the existing libraries so it can parse more performantly? |
Stress testing the Spanner client library showed that more than 50% of the total cpu time consumed by the client library while executing queries and consuming result sets is spent on parsing dates and timestamps. This stress test uses a result set containing one column of each possible data type of Spanner (both single value column and array column). Spanner automatically translates the string values that are sent by the server into Date and Timestamp objects before the user requests these from the ResultSet.
In a real-life scenario, the total percentage of total time spent on parsing these values will however be a lot lower, as most of the time the client library will be waiting on network traffic.
The total execution time of the stress tests added for Spanner on my own laptop with and without this optimization is: