8000 Protocol module by groutr · Pull Request #669 · PyMySQL/PyMySQL · GitHub
[go: up one dir, main page]

Skip to content

Protocol module #669

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 10 commits into from
Closed

Protocol module #669

wants to merge 10 commits into from

Conversation

groutr
Copy link
@groutr groutr commented May 8, 2018

Move parsing functions to a protocol module.

There are three main types of parsing functions: packet-level, fixed length, and variable length. Packet-level functions operate on the whole packet and do not accept an offset parameter. Fixed-length functions parse a known amount of bytes and return the only the result. Variable-length functions parse a variable amount of data and return a tuple of (bytes read, result).

def read_uint8(self):
result = protocol.read_uint8(self._data, offset=self._position)
self._position += 1
return result
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't split methods. Just move.

@methane methane closed this May 9, 2018
@methane
Copy link
Member
methane commented May 9, 2018

I did "just move". What do you want to do after that?

@groutr
Copy link
Author
groutr commented May 9, 2018

@methane I was under the impression that the final goal was to remove the wrapping classes and use calls to simple functions with builtin datatypes. That's why I moved the actual parsing code into their own functions, because eventually the wrapping classes would go away.

From #297 (comment)

To resolve issue 2, we should remove classes for parsing packets, and use simple flat function
and builtin types.

Phase 2 would be separating out the functions that orchestrate packet parsing (like _read_ok_packet, _read_rowdata_packet, etc). In this PR, I attempted to remove half of the work of MysqlPacket to parsing functions. The next phase would be to remove the other half (the position tracking inside packets) to functions that orchestrate the overall process of parsing packets. At that point MysqlPacket isn't doing anything and can be removed along with its companion classes.

The next phase would be replacing MysqlPacket with a tuple. This means that _read_packet would be returning tuples instead of instances of classes.
Some formats I've been considering are:

  1. (packet number, packet data): This would probably be the clearest replacement. This could allow for interesting packet parsing strategies, such as processing packets in parallel.
  2. (packet data,): Most of the time, all that is actually needed is packet data. After the packet number is checked by _read_packet, it doesn't seem to be used again.
  3. (packet number, packet length, packet data): This is the representation I used in my first attempt. It worked well, except packet length wasn't used all much, and is trivial to get by calling len(packet data).
  4. (packet number, packet type, packet data): Similar to 1, but we also store the packet type. This would ensure that the packet type is determined exactly once. I'm unclear on the precise benefits though.

At that point, I believe the goal set forth initially will have been fulfilled. A C extension can then be created to significantly accelerate packet parsing and the orchestrating functions can use either the C-extension or the pure-python module.

@groutr
Copy link
Author
groutr commented May 24, 2018

@methane, any feedback on the above comment?

@methane
Copy link
Member
methane commented May 29, 2018

I was under the impression that the final goal was to remove the wrapping classes and use calls to simple functions with builtin datatypes.

OK,

That's why I moved the actual parsing code into their own functions, because eventually the wrapping classes would go away.
...
In this PR, I attempted to remove half of the work of MysqlPacket to parsing functions.

I felt this was wrong step. It only doubles the function call (= performance), without any simplification.

The next phase would be to remove the other half (the position tracking inside packets) to functions that orchestrate the overall process of parsing packets. At that point MysqlPacket isn't doing anything and can be removed along with its companion classes.

I think next step should convert PacketWrapper classes into simple function.
It can be achieved without touching MySQLPacket class.
It would be easy to review, and no performance penalty.

Maybe, FieldDescriptorPacket can be converted too.

Breaking MySQLPacket class will be next step, but I haven't decided to do it or not.

@groutr
Copy link
Author
groutr commented Jun 6, 2018
That's why I moved the actual parsing code into their own functions, because eventually the wrapping classes would go away.
...
In this PR, I attempted to remove half of the work of MysqlPacket to parsing functions.

I felt this was wrong step. It only doubles the function call (= performance), without any simplification.

This is an intermediate step, so temporarily increasing function call wasn't a concern. It was done to show that no functionality was being broken by using the simple functions. It meant that any call to MySqlPacket.read_uint8 could be replaced with a call to protocol/read_uint8 as they were now equivalent.

The next phase would be to remove the other half (the position tracking inside packets) to functions that orchestrate the overall process of parsing packets. At that point MysqlPacket isn't doing anything and can be removed along with its companion classes.

I think next step should convert PacketWrapper classes into simple function.
It can be achieved without touching MySQLPacket class.
It would be easy to review, and no performance penalty.

I feel at this stage, performance is the wrong goal. We should be focusing on getting the implementation correct for an easy C extension, and then optimizing that implementation. The endgame is to prepare things for a C extension which is where performance will become a primary goal. I've already done some performance analysis on my design to show that it provides a performance benefit. My PRs in this branch try to achieve a cleaner implemenation of the same design that I was experiementing with in my parser2 branch. The intermediate steps are designed to be easy to review. It's the performance of the final step that matters, not the steps getting there.

Maybe, FieldDescriptorPacket can be converted too.

Breaking MySQLPacket class will be next step, but I haven't decided to do it or not.

Again, I was under the impression that removing MySQLPacket and the other wrapping classes was the whole point of this effort. I don't understand your approach very well. Could you outline your steps more completely?

@methane
Copy link
Member
methane commented Jun 7, 2018 via email

@groutr
Copy link
Author
groutr commented Jun 7, 2018

Isn't the point of this branch to isolate this work until it's ready to merge to master? Releases can still be made from master, independent of the state of this branch.

@methane
Copy link
Member
methane commented Jun 7, 2018 via email

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 1, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants
0