Possible pluggable C extension for performance by kesmit13 · Pull Request #1066 · PyMySQL/PyMySQL · GitHub

Possible pluggable C extension for performance #1066

Draft
wants to merge 15 commits into
base: main
Remove numpy support for now; clean up API
kesmit13 committed Sep 1, 2022
commit ea0ae59966cfb0a435629db3c7be4d07e02d94d7
24 changes: 24 additions & 0 deletions README.md
@@ -20,6 +20,30 @@ It increases the performance of PyMySQL by about 10-15%, which still leaves it as
the second-slowest client. It is also based on a PyMySQL codebase from years ago,
so it does not contain any recent bug fixes or features of that project.

## How fast can it be?

If it's still mostly based on PyMySQL, how fast can it really be? Here's one
benchmark. We uploaded
[this data file](http://studiotutorials.s3.amazonaws.com/eCommerce/2019-Dec.csv)
into a SingleStoreDB table six times, for a total of around 21 million rows with
one datetime column, one float column, and eight character columns.
We then used PyMySQL, MySQLdb, and PyMySQLsv to fetch the entire table with `fetchone`,
`fetchmany(20)`, and `fetchall` using both buffered and unbuffered cursors.
Here are the results.

| Cursor and fetch method  | PyMySQL | MySQLdb | PyMySQLsv |
|--------------------------|---------|---------|-----------|
| Buffered fetchone        | 224.8s  | 50.6s   | 19.9s     |
| Buffered fetchmany(20)   | 217.63s | 50.3s   | 15.5s     |
| Buffered fetchall        | 217.9s  | 49.6s   | 14.8s     |
| Unbuffered fetchone      | 230.5s  | 48.3s   | 25.3s     |
| Unbuffered fetchmany(20) | 224.0s  | 35.0s   | 14.6s     |
| Unbuffered fetchall      | 232.4s  | 37.7s   | 29.2s     |

As you can see, the gains are quite significant for this test case. Even MySQLdb,
which is based on the MySQL libraries, takes roughly twice as long or more in all
but one of the categories (unbuffered `fetchall`, where it takes about 1.3 times as long).

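For readers who want to run a comparison like this themselves, the timing loop
might look roughly like the sketch below. It is not the script used to produce
the numbers above: the helper names (`time_fetchone`, `time_fetchmany`,
`time_fetchall`) and the `events` table are hypothetical, and the standard-library
`sqlite3` module stands in for a real server connection only so that the example
runs anywhere. Against a MySQL-compatible server you would create the cursor from
the client being measured instead.

```python
import sqlite3
import time


def time_fetchone(cur, query):
    """Fetch every row one at a time; return (seconds, row_count)."""
    start = time.perf_counter()
    cur.execute(query)
    count = 0
    while cur.fetchone() is not None:
        count += 1
    return time.perf_counter() - start, count


def time_fetchmany(cur, query, size=20):
    """Fetch rows in batches of `size`; return (seconds, row_count)."""
    start = time.perf_counter()
    cur.execute(query)
    count = 0
    while True:
        rows = cur.fetchmany(size)
        if not rows:
            break
        count += len(rows)
    return time.perf_counter() - start, count


def time_fetchall(cur, query):
    """Fetch the whole result set at once; return (seconds, row_count)."""
    start = time.perf_counter()
    cur.execute(query)
    count = len(cur.fetchall())
    return time.perf_counter() - start, count


# Stand-in data set (assumption): sqlite3 is used only to keep this runnable.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (ts TEXT, price REAL, brand TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [("2019-12-01 00:00:00", float(i), "brand%d" % i) for i in range(1000)],
)

cur = conn.cursor()
query = "SELECT * FROM events"
results = {
    "fetchone": time_fetchone(cur, query),
    "fetchmany(20)": time_fetchmany(cur, query, size=20),
    "fetchall": time_fetchall(cur, query),
}
for name, (seconds, count) in results.items():
    print(f"{name:14s} {count:6d} rows in {seconds:.4f}s")
```

Each helper includes `execute` in the timed region, so buffered and unbuffered
cursors are compared on the total time to run the query and consume the rows.
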
## Install

This package installs just like any other Python package. Since it includes a C
26 changes: 11 additions & 15 deletions pymysql/connections.py
@@ -167,7 +167,6 @@ class Connection:
     :param named_pipe: Not supported.
     :param db: **DEPRECATED** Alias for database.
     :param passwd: **DEPRECATED** Alias for password.
-    :param output_type: Type of result to return: tuples, namedtuples, dicts, numpy or pandas.
     :param parse_json: Parse JSON values into Python objects?
     :param invalid_date_value: Value to use in place of an invalid date. By default, a string
         containing the invalid content is returned.
@@ -223,7 +222,6 @@ def __init__(
         ssl_key=None,
         ssl_verify_cert=None,
         ssl_verify_identity=None,
-        output_type='tuples',
         parse_json=False,
         invalid_date_value=UNSET,
         invalid_time_value=UNSET,
@@ -347,8 +345,7 @@ def _config(key, arg):
         self.client_flag = client_flag

         self.pure_python = pure_python
-        self.unbuffered = False
-        self.output_type = output_type
+        self.output_type = 'tuples'
         self.cursorclass = cursorclass
         self.resultclass = MySQLResult

@@ -357,12 +354,10 @@ def _config(key, arg):
             self.resultclass = MySQLResultSV
             if self.cursorclass is SSCursor:
                 self.cursorclass = SSCursorSV
-                self.unbuffered = True
             elif self.cursorclass is DictCursor:
                 self.output_type = 'dicts'
             elif self.cursorclass is SSDictCursor:
                 self.cursorclass = SSDictCursorSV
-                self.unbuffered = True
                 self.output_type = 'dicts'

         self._result = None
@@ -602,11 +597,11 @@ def query(self, sql, unbuffered=False):
         if isinstance(sql, str):
             sql = sql.encode(self.encoding, "surrogateescape")
         self._execute_command(COMMAND.COM_QUERY, sql)
-        self._affected_rows = self._read_query_result(unbuffered=unbuffered or self.unbuffered)
+        self._affected_rows = self._read_query_result(unbuffered=unbuffered)
         return self._affected_rows

     def next_result(self, unbuffered=False):
-        self._affected_rows = self._read_query_result(unbuffered=unbuffered or self.unbuffered)
+        self._affected_rows = self._read_query_result(unbuffered=unbuffered)
         return self._affected_rows

     def affected_rows(self):
@@ -822,10 +817,9 @@ def _write_bytes(self, data):

     def _read_query_result(self, unbuffered=False):
         self._result = None
-        if unbuffered or self.unbuffered:
+        if unbuffered:
             try:
-                result = self.resultclass(self)
-                result.init_unbuffered_query()
+                result = self.resultclass(self, unbuffered=unbuffered)
             except:
                 result.unbuffered_active = False
                 result.connection = None
@@ -1188,7 +1182,7 @@ def get_server_info(self):


 class MySQLResult:
-    def __init__(self, connection):
+    def __init__(self, connection, unbuffered=False):
         """
         :type connection: Connection
         """
@@ -1203,6 +1197,8 @@ def __init__(self, connection):
         self.rows = None
         self.has_next = None
         self.unbuffered_active = False
+        if unbuffered:
+            self.init_unbuffered_query()

     def __del__(self):
         if self.unbuffered_active:
@@ -1399,16 +1395,16 @@ def _get_descriptions(self):
         self.description = tuple(description)

 class MySQLResultSV(MySQLResult):
-    def __init__(self, connection):
-        MySQLResult.__init__(self, connection)
+    def __init__(self, connection, unbuffered=False):
+        MySQLResult.__init__(self, connection, unbuffered=unbuffered)
         self.options = {k: v for k, v in dict(
             default_converters=converters.decoders,
             output_type=connection.output_type,
             parse_json=connection.parse_json,
             invalid_date_value=connection.invalid_date_value,
             invalid_time_value=connection.invalid_time_value,
             invalid_datetime_value=connection.invalid_datetime_value,
-            unbuffered=connection.unbuffered,
+            unbuffered=unbuffered,
         ).items() if v is not UNSET}
         self._read_rowdata_packet = functools.partial(_pymysqlsv.read_rowdata_packet, self)
         self._read_rowdata_packet_unbuffered = functools.partial(_pymysqlsv.read_rowdata_packet, self)
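
The changes to `MySQLResult.__init__` and `_read_query_result` fold the unbuffered
setup into the constructor. The sketch below is a simplified, self-contained
illustration of that pattern, not code from this PR: `FakeConnection`, `Result`,
and `read_result` are hypothetical stand-ins. It also shows one point that may be
worth checking when construction and initialization happen in a single call: if
the constructor raises, the name `result` was never bound, so any cleanup in the
`except` block has to allow for that.

```python
class FakeConnection:
    """Hypothetical stand-in for a connection; not part of PyMySQL."""

    def __init__(self, fail=False):
        self.fail = fail


class Result:
    """Simplified result object that sets up unbuffered mode on construction."""

    def __init__(self, connection, unbuffered=False):
        self.connection = connection
        self.unbuffered_active = False
        if unbuffered:
            self.init_unbuffered_query()

    def init_unbuffered_query(self):
        # Simulate a failure while reading the first packet.
        if self.connection.fail:
            raise IOError("simulated network error")
        self.unbuffered_active = True


def read_result(connection, unbuffered=False):
    """Create a result, preserving the original error if construction fails."""
    result = None  # bound before the try so the except block can test it
    try:
        result = Result(connection, unbuffered=unbuffered)
    except Exception:
        if result is not None:
            result.unbuffered_active = False
            result.connection = None
        raise
    return result


ok = read_result(FakeConnection(), unbuffered=True)
print(ok.unbuffered_active)

try:
    read_result(FakeConnection(fail=True), unbuffered=True)
except IOError as exc:
    print("original error preserved:", exc)
```

Binding `result = None` first is only one option; in this sketch it keeps the
simulated `IOError` from being replaced by an `UnboundLocalError` raised inside
the handler.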