Fixed #36143 -- Made max_query_params respect SQLITE_LIMIT_VARIABLE_NUMBER. #19427
Conversation
bfb67f5 to ad09649 (Compare)
def max_query_params(self):
    # This limit can be lowered per connection at runtime with setlimit().
    # If set, the cached property must be deleted to take effect.
    return self.connection.connection.getlimit(sqlite3.SQLITE_LIMIT_VARIABLE_NUMBER)
The next thing might be to get the value of SQLITE_LIMIT_COMPOUND_SELECT
in the following:
django/django/db/backends/sqlite3/operations.py
Lines 49 to 50 in 0ee06c0
if len(fields) == 1:
    return 500
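(For context, a quick way to inspect that compound-select limit from Python — a side sketch, not part of this PR; getlimit() and the SQLITE_LIMIT_* constants require Python 3.11+:)

import sqlite3

# Read the per-connection compound SELECT limit Simon refers to.
# SQLite's compile-time default for SQLITE_MAX_COMPOUND_SELECT is 500,
# which is presumably where the hardcoded value above comes from.
conn = sqlite3.connect(":memory:")
print(conn.getlimit(sqlite3.SQLITE_LIMIT_COMPOUND_SELECT))  # typically 500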
@cached_property
def max_query_params(self):
    # This limit can be lowered per connection at runtime with setlimit().
    # If set, the cached property must be deleted to take effect.
The limit function takes nanoseconds to run:
In [1]: import sqlite3
In [2]: c = sqlite3.connect(":memory:")
In [3]: %timeit c.getlimit(sqlite3.SQLITE_LIMIT_VARIABLE_NUMBER)
40.1 ns ± 0.0617 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
I think it just returns an in-memory number from some struct.
Therefore, no need to cache:
-@cached_property
-def max_query_params(self):
-    # This limit can be lowered per connection at runtime with setlimit().
-    # If set, the cached property must be deleted to take effect.
+@property
+def max_query_params(self):
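(To illustrate why the plain property is enough — a minimal sketch, assuming Python 3.11+: getlimit() always reflects the current per-connection value, including after it has been lowered with setlimit(), so there is nothing stale to invalidate:)

import sqlite3

# getlimit() reads the live per-connection limit; lowering it with
# setlimit() is reflected immediately on the next read.
conn = sqlite3.connect(":memory:")
print(conn.getlimit(sqlite3.SQLITE_LIMIT_VARIABLE_NUMBER))  # compile-time default
conn.setlimit(sqlite3.SQLITE_LIMIT_VARIABLE_NUMBER, 640)    # lower it at runtime
print(conn.getlimit(sqlite3.SQLITE_LIMIT_VARIABLE_NUMBER))  # 640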
Thanks Adam! I'm seeing the same results with an on-disk database as well. I've added a commit to change this and drop the del connection.features.max_query_params lines.
I kept the first line of the comment to help document why we're not using cached_property, but if it's clear enough I'm also happy to remove it.
Yeah, a comment seems like a good idea.
Nice work @laymonage!
Not convinced we want to test the SQLite implementation of bulk_batch_size in the bulk_create and bulk_update tests, though.
tests/bulk_create/tests.py (Outdated)
@skipUnlessDBFeature("has_bulk_insert")
@skipUnless(connection.vendor == "sqlite", "SQLite-specific test.")
def test_max_batch_size_respects_sqlite_variable_limit(self):
    objs = [Country(name=f"Country {i}") for i in range(1000)]
    fields = ["name", "iso_two_letter", "description"]
    limit_name = sqlite3.SQLITE_LIMIT_VARIABLE_NUMBER
    current_limit = connection.features.max_query_params
    new_limit = min(640, current_limit)
    try:
        connection.connection.setlimit(limit_name, new_limit)
        max_batch_size = new_limit // len(fields)
        with self.assertNumQueries(ceil(len(objs) / max_batch_size)):
            Country.objects.bulk_create(objs)
    finally:
        connection.connection.setlimit(limit_name, current_limit)
I appreciate the test but I'm not sure the issue warrants testing bulk_batch_size through SQLite specifics.
IMO making sure that bulk_batch_size works properly (like you did in test_max_query_params_respects_variable_limit and test_bulk_batch_size_respects_variable_limit) and then keeping the bulk_create and bulk_update tests focused on bulk_batch_size makes more sense.
In other words, we're not testing the Oracle implementation of bulk_batch_size here, so why should we do it for SQLite?
Good point @charettes, I felt the same way too. I wrote these ones mostly to confirm it for myself and for completeness' sake, but I wasn't too happy with having vendor-specific tests in there either.
I've added a commit to remove them. Will squash or let the mergers squash them when we're happy with the overall patch. Thanks!
@@ -69,6 +71,15 @@ def test_batch_size(self):
        with self.assertNumQueries(len(self.notes)):
            Note.objects.bulk_update(self.notes, fields=["note"], batch_size=1)

    def test_max_batch_size(self):
This test and bulk_create.tests.BulkCreateTests.test_max_batch_size pass for me when I revert the changes.
Is this adding extra coverage (should probably pull to a separate commit) or do I need a specific SQLite version for these to fail?
Hi Sarah, they are there for coverage, yes, although maybe the lines have already been marked as "covered" via other tests. The main reason I added this was that there weren't any tests that explicitly relate bulk_update/bulk_create (without batch_size, i.e. the maximum) to bulk_batch_size.
In order for this to fail or error, we need a database backend that has a mismatch between max_query_params (and thus the value returned by bulk_batch_size()) and the actual limit on the database. Or, to be more precise, max_query_params > actual limit, since the other way around will be OK just like any other small query.
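(As an aside, the failure mode described above can be reproduced in isolation with plain sqlite3 — a sketch assuming Python 3.11+, not code from the PR: if more variables are bound than the connection's SQLITE_LIMIT_VARIABLE_NUMBER allows, SQLite rejects the statement.)

import sqlite3

# Simulate a backend whose assumed parameter budget exceeds the real limit:
# lower the connection's variable limit, then bind more parameters than it allows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE country (name, iso_two_letter)")
conn.setlimit(sqlite3.SQLITE_LIMIT_VARIABLE_NUMBER, 4)

rows = [("A", "aa"), ("B", "bb"), ("C", "cc")]  # 3 rows x 2 params = 6 > 4
placeholders = ", ".join(["(?, ?)"] * len(rows))
params = [value for row in rows for value in row]
try:
    conn.execute(f"INSERT INTO country VALUES {placeholders}", params)
except sqlite3.OperationalError as exc:
    print(exc)  # "too many SQL variables"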
I demonstrated this in the test I added before using SQLite's setlimit(), but then removed it in 759d2bb (#19427) per Simon's comment.
Without using setlimit(), you need to compile SQLite with a lower variable limit than the one currently on main (i.e. < 999). For example, using my django-docker-box branch django/django-docker-box#50:
SQLITE_VERSION=3.36.0 SQLITE_CFLAGS="-DSQLITE_MAX_VARIABLE_NUMBER=640" docker compose run --build sqlite bulk_create.tests.BulkCreateTests.test_max_batch_size
SQLITE_VERSION=3.36.0 SQLITE_CFLAGS="-DSQLITE_MAX_VARIABLE_NUMBER=640" docker compose run --build sqlite queries.test_bulk_update.BulkUpdateNoteTests.test_max_batch_size
These will fail on main, but pass with this PR. To be fair, there are other existing tests in bulk_create.tests.BulkCreateTests that fail on main but pass with this PR and the above SQLite config.
In the more extreme case, you cannot even run queries.test_bulk_update.BulkUpdateNoteTests.test_max_batch_size or any tests in queries on main with the above config. This is because the creation of content types would fail due to the high number of models in queries and the mismatch between max_query_params and the actual limit.
However, the added test_max_batch_size tests act like an "integration test":
- max_query_params is tested by test_max_query_params_respects_variable_limit on SQLite.
- bulk_batch_size is tested by test_bulk_batch_size on SQLite+Oracle and by test_bulk_batch_size_respects_variable_limit on SQLite. These indirectly also test max_query_params.
- With test_max_batch_size, we now have an explicit test of how bulk_batch_size affects the user-facing bulk_create()/bulk_update(), so there is now a "chain" from max_query_params all the way to the user-facing methods.
I don't feel too strongly about this, though. If the tests added for SQLite's max_query_params and bulk_batch_size are sufficient combined with the existing tests, I'm happy to remove the two test_max_batch_size tests.
I'm also happy to put the two tests in a separate commit if desired. I can do it myself or let you do it as you merge, whichever works better for you. Though I'd rather wait until we're all good to merge before squashing/rebasing to ease any further reviews.
(Also, an unrelated note: you can use any SQLite version, although for < 3.36 you also need to pass -DSQLITE_ENABLE_DESERIALIZE in SQLITE_CFLAGS, because Python in bookworm was compiled with SQLite 3.40.0, which has deserialize enabled by default.)
Thank you for the explanation!
I think adding these 2 tests in a separate commit to add the extra coverage, and then having the commit with the fix, makes sense to me. If you can push that up, that would be super; otherwise I can pick this up tomorrow 👍
I added another commit to update the docstrings; this can be squashed later.
""" | ||
SQLite has a variable limit per query. The limit can be changed using | ||
the SQLITE_MAX_VARIABLE_NUMBER compile-time option (which defaults to | ||
999 in versions < 3.32.0 or 32766 in newer versions) or lowered per | ||
connection at run-time with setlimit(SQLITE_LIMIT_VARIABLE_NUMBER, N). | ||
""" |
I moved and updated the docstring of bulk_batch_size to here as it's more relevant.
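(Pieced together from the snippets above, the resulting feature presumably ends up looking roughly like this — a reconstruction for readability, not the verbatim patch; it sits on the SQLite DatabaseFeatures class and relies on the stdlib sqlite3 module being imported there:)

@property
def max_query_params(self):
    """
    SQLite has a variable limit per query. The limit can be changed using
    the SQLITE_MAX_VARIABLE_NUMBER compile-time option (which defaults to
    999 in versions < 3.32.0 or 32766 in newer versions) or lowered per
    connection at run-time with setlimit(SQLITE_LIMIT_VARIABLE_NUMBER, N).
    """
    return self.connection.connection.getlimit(
        sqlite3.SQLITE_LIMIT_VARIABLE_NUMBER
    )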
Fixed #36143 -- Made max_query_params respect SQLITE_LIMIT_VARIABLE_NUMBER. Co-authored-by: Xavier Frankline <xf.xavierfrank@gmail.com>
Thank you
Trac ticket number
ticket-36143. @thepsalmist and I worked on this during the Djangonaut Space program, and I added the tests during the DjangoCon Europe 2025 sprints.
Branch description
As mentioned in the ticket, this changes DatabaseFeatures.max_query_params to make use of the actual limit set on the SQLite installation/connection as defined by SQLITE_LIMIT_VARIABLE_NUMBER.
The upper bound of the limit is set during compile time, but it can be lowered per connection at runtime using the C API exposed by Python (3.11+) via the getlimit() and setlimit() methods.
To always ensure the right limit is used, we should use a property instead of cached_property, but that does mean calling getlimit() whenever the property is accessed (which may or may not execute a query to the database, I'm not sure). However, the tradeoff between helping users set a lower limit on the connection vs the performance implications doesn't seem worth it, considering how uncommon it is to do. Besides, if users really want to do so, they can always clear the cached_property before/after calling setlimit() to ensure the new limit is used.
Edit: apparently the getlimit() function is quite efficient, so I've changed the implementation to use property instead of cached_property: #19427 (comment)
I'm not super confident about the tests, in particular how they need to dig into the internals of the implementation a bit, e.g. with how bulk_batch_size() is called for bulk_update().
Edit: I've removed the SQLite-specific bulk_update() and bulk_insert() tests to instead rely on the max_query_params/bulk_batch_size abstractions.
Experiments
I made an experiment in https://github.com/laymonage/django-sqlite-bulk-optimization to showcase an example of a performance optimization as a result of this change. Right now there's only one test: doing a bulk_create() for 4096 instances of a model with two fields (three including PK) originally took 9 queries (ceil(2 * 4096 / 999)) with the hardcoded 999 parameter limit. Now it only takes one query (with the default limit set by Ubuntu's SQLite compilation flags, which is 250000 on both Debian and Ubuntu).
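(A back-of-the-envelope check of those numbers, based on the figures quoted above and the batch size being roughly max_query_params divided by the number of fields:)

from math import ceil

objs, params_per_obj = 4096, 2
# Batch size ~= parameter limit // params per object, per bulk_batch_size.
print(ceil(objs / (999 // params_per_obj)))      # 9 queries with the hardcoded 999 limit
print(ceil(objs / (250_000 // params_per_obj)))  # 1 query with the 250000 Debian/Ubuntu limit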
Checklist
This PR targets the main branch.