Fixed #36143 -- Made max_query_params respect SQLITE_LIMIT_VARIABLE_NUMBER. by laymonage · Pull Request #19427 · django/django · GitHub

Fixed #36143 -- Made max_query_params respect SQLITE_LIMIT_VARIABLE_NUMBER. #19427


Merged
merged 2 commits into django:main from the ticket_36143 branch
May 8, 2025

Conversation

Contributor @laymonage commented Apr 27, 2025

Trac ticket number

ticket-36143. @thepsalmist and I worked on this during the Djangonaut Space program, and I added the tests during the DjangoCon Europe 2025 sprints.

Branch description

As mentioned in the ticket, this changes DatabaseFeatures.max_query_params to make use of the actual limit set on the SQLite installation/connection as defined by SQLITE_LIMIT_VARIABLE_NUMBER.

The upper bound of the limit is set during compile time, but it can be lowered per-connection at runtime using the C API exposed by Python (3.11+) via the getlimit() and setlimit() methods.

To always ensure the right limit is used, we would need a property instead of a cached_property, but that means calling getlimit() every time the property is accessed (which may or may not execute a query against the database, I'm not sure). However, the tradeoff between supporting users who set a lower limit on the connection and the performance implications doesn't seem worth it, considering how uncommon that is. Besides, if users really want to do so, they can always clear the cached_property before/after calling setlimit() to ensure the new limit is used.

Edit: apparently the getlimit() function is quite efficient, so I've changed the implementation to use property instead of cached_property: #19427 (comment)
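
For reference, a minimal standalone sketch of the Python 3.11+ sqlite3 API involved (not Django code; the 500 is an arbitrary example value):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    # Read the current per-connection limit; its upper bound is fixed at
    # compile time by SQLITE_MAX_VARIABLE_NUMBER.
    print(conn.getlimit(sqlite3.SQLITE_LIMIT_VARIABLE_NUMBER))
    # Lower the limit for this connection only. setlimit() returns the
    # previous value; it cannot raise the limit above the compiled maximum.
    previous = conn.setlimit(sqlite3.SQLITE_LIMIT_VARIABLE_NUMBER, 500)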

I'm not super confident about the tests, in particular how they need to dig into the internals of the implementation a bit, e.g. with how bulk_batch_size() is called for bulk_update().

Edit: I've removed the SQLite-specific bulk_update() and bulk_insert() tests to instead rely on the max_query_params/bulk_batch_size abstractions.

Experiments

I made an experiment in https://github.com/laymonage/django-sqlite-bulk-optimization to showcase an example of a performance optimization resulting from this change. Right now there's only one test: doing a bulk_create() for 4096 instances of a model with two fields (three including the PK) originally took 9 queries, ceil(2 * 4096 / 999), with the hardcoded 999-parameter limit. Now it only takes one query, with the default limit set by the distribution's SQLite compilation flags, which is 250000 on both Debian and Ubuntu.
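
As a quick sanity check on those numbers, in plain Python (values taken from the description above):

    from math import ceil

    rows, params_per_row = 4096, 2  # two non-PK fields per instance
    print(ceil(rows * params_per_row / 999))     # 9 queries with the old hardcoded limit
    print(ceil(rows * params_per_row / 250000))  # 1 query with the distro's compiled limit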

Checklist

  • This PR targets the main branch.
  • The commit message is written in past tense, mentions the ticket number, and ends with a period.
  • I have checked the "Has patch" ticket flag in the Trac system.
  • I have added or updated relevant tests.
  • I have added or updated relevant docs, including release notes if applicable.
  • I have attached screenshots in both light and dark modes for any UI changes.

@github-actions github-actions bot added the no ticket Based on PR title, no linked Trac ticket label Apr 27, 2025
@laymonage laymonage force-pushed the ticket_36143 branch 2 times, most recently from bfb67f5 to ad09649 on April 27, 2025 at 13:32
@laymonage laymonage changed the title Made max_query_params respect SQLITE_LIMIT_VARIABLE_NUMBER. Fixed #36143 -- Made max_query_params respect SQLITE_LIMIT_VARIABLE_NUMBER. Apr 27, 2025
@github-actions github-actions bot removed the no ticket Based on PR title, no linked Trac ticket label Apr 27, 2025
def max_query_params(self):
    # This limit can be lowered per connection at runtime with setlimit().
    # If set, the cached property must be deleted to take effect.
    return self.connection.connection.getlimit(
        sqlite3.SQLITE_LIMIT_VARIABLE_NUMBER
    )
Contributor Author

The next thing might be to get the value of SQLITE_LIMIT_COMPOUND_SELECT in the following:

if len(fields) == 1:
    return 500
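
A hedged sketch of what that follow-up could look like (hypothetical, not part of this PR; the hardcoded 500 matches SQLite's default SQLITE_MAX_COMPOUND_SELECT):

    # Hypothetical follow-up: derive the single-field batch size from the
    # connection's actual compound SELECT limit instead of a constant.
    if len(fields) == 1:
        return self.connection.connection.getlimit(
            sqlite3.SQLITE_LIMIT_COMPOUND_SELECT
        )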

Comment on lines 150 to 153
@cached_property
def max_query_params(self):
    # This limit can be lowered per connection at runtime with setlimit().
    # If set, the cached property must be deleted to take effect.
Member (Adam)

The limit function takes nanoseconds to run:

In [1]: import sqlite3

In [2]: c = sqlite3.connect(":memory:")

In [3]: %timeit c.getlimit(sqlite3.SQLITE_LIMIT_VARIABLE_NUMBER)
40.1 ns ± 0.0617 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

I think it just returns an in-memory number from some struct.

Therefore, no need to cache:

Suggested change:

-@cached_property
-def max_query_params(self):
-    # This limit can be lowered per connection at runtime with setlimit().
-    # If set, the cached property must be deleted to take effect.
+@property
+def max_query_params(self):
Contributor Author

Thanks Adam! I'm seeing the same results with an on-disk database as well. I've added a commit to change this and drop the del connection.features.max_query_params lines.

I kept the first line of the comment to help document why we're not using cached_property, but if it's clear enough I'm also happy to remove it.

Member

Yeah, a comment seems like a good idea.

Member @charettes left a comment

Nice work @laymonage!

Not convinced we want to test the SQLite implementation of bulk_batch_size in the bulk_create and bulk_update tests, though.

Comment on lines 298 to 303
@skipUnlessDBFeature("has_bulk_insert")
@skipUnless(connection.vendor == "sqlite", "SQLite-specific test.")
def test_max_batch_size_respects_sqlite_variable_limit(self):
    objs = [Country(name=f"Country {i}") for i in range(1000)]
    fields = ["name", "iso_two_letter", "description"]
    limit_name = sqlite3.SQLITE_LIMIT_VARIABLE_NUMBER
    current_limit = connection.features.max_query_params
    new_limit = min(640, current_limit)
    try:
        connection.connection.setlimit(limit_name, new_limit)
        max_batch_size = new_limit // len(fields)
        with self.assertNumQueries(ceil(len(objs) / max_batch_size)):
            Country.objects.bulk_create(objs)
    finally:
        connection.connection.setlimit(limit_name, current_limit)

Member

I appreciate the test but I'm not sure the issue warrants testing bulk_batch_size through SQLite specifics.

IMO it makes more sense to ensure that bulk_batch_size works properly (like you did in test_max_query_params_respects_variable_limit and test_bulk_batch_size_respects_variable_limit) and then keep the bulk_create and bulk_update tests focused on bulk_batch_size.

In other words, we're not testing the Oracle implementation of bulk_batch_size here so why should we do it for SQLite?

Contributor Author @laymonage commented Apr 28, 2025

Good point @charettes, I felt the same way. I wrote these mostly to confirm it for myself and for completeness' sake, but I wasn't too happy with having vendor-specific tests in there either.

I've added a commit to remove them. Will squash or let the mergers squash them when we're happy with the overall patch. Thanks!

@@ -69,6 +71,15 @@ def test_batch_size(self):
        with self.assertNumQueries(len(self.notes)):
            Note.objects.bulk_update(self.notes, fields=["note"], batch_size=1)

    def test_max_batch_size(self):
Contributor (sarahboyce)

This test and bulk_create.tests.BulkCreateTests.test_max_batch_size pass for me when I revert the changes.
Is this adding extra coverage (which should probably be pulled into a separate commit), or do I need a specific SQLite version for these to fail?

Contributor Author

Hi Sarah, they are there for coverage, yes, although maybe the lines have already been marked as "covered" via other tests. The main reason I added this was that there weren't any tests that explicitly relate bulk_update/bulk_create (without batch_size, i.e. at the maximum) to bulk_batch_size.

In order for this to fail or error, we need a database backend that has a mismatch between max_query_params (and thus the value returned by bulk_batch_size()) and the actual limit on the database. To be more precise, max_query_params > actual limit, since the other way around is fine, just like any other small query.
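
To illustrate the mismatch with a minimal sketch (made-up numbers; on SQLite, bulk_batch_size divides max_query_params by the number of fields):

    max_query_params = 999  # the limit Django believes applies
    actual_limit = 640      # the limit this SQLite build actually enforces
    fields = 3
    batch_size = max_query_params // fields  # 333 rows per batch
    params_per_query = batch_size * fields   # 999 parameters in one query
    # 999 > 640, so SQLite rejects it with "too many SQL variables".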

I demonstrated this in the test I had added earlier using SQLite's setlimit(), but then removed it in 759d2bb (#19427) per Simon's comment.

Without using setlimit(), you need to compile SQLite with a lower variable limit than the one currently on main (i.e. < 999). For example, using my django-docker-box branch django/django-docker-box#50:

SQLITE_VERSION=3.36.0 SQLITE_CFLAGS="-DSQLITE_MAX_VARIABLE_NUMBER=640" docker compose run --build sqlite bulk_create.tests.BulkCreateTests.test_max_batch_size

SQLITE_VERSION=3.36.0 SQLITE_CFLAGS="-DSQLITE_MAX_VARIABLE_NUMBER=640" docker compose run --build sqlite queries.test_bulk_update.BulkUpdateNoteTests.test_max_batch_size

These fail on main but pass with this PR. To be fair, there are other existing tests in bulk_create.tests.BulkCreateTests that fail on main but pass with this PR and the above SQLite config.

In the more extreme case, you cannot even run queries.test_bulk_update.BulkUpdateNoteTests.test_max_batch_size, or any tests in queries, on main with the above config. This is because the creation of content types fails due to the high number of models in queries combined with the mismatch between max_query_params and the actual limit.

However, the added test_max_batch_size tests act like an "integration test":

  • max_query_params is tested by test_max_query_params_respects_variable_limit on SQLite
  • bulk_batch_size is tested by test_bulk_batch_size on SQLite+Oracle and test_bulk_batch_size_respects_variable_limit on SQLite. These indirectly also test max_query_params.
  • With test_max_batch_size, we now have an explicit test of how bulk_batch_size affects the user-facing bulk_create()/bulk_update(), completing a "chain" from max_query_params all the way to the user-facing methods.

I don't feel too strongly about this, though. If the tests added for SQLite's max_query_params and bulk_batch_size are sufficient combined with the existing tests, I'm happy to remove the two test_max_batch_size tests.

I'm also happy to put the two tests in a separate commit if desired. I can do it myself or let you do it as you merge, whichever works better for you. Though I'd rather wait until we're all good to merge before squashing/rebasing to ease any further reviews.

(Also, unrelated note: you can use any SQLite version, although for < 3.36 you also need to pass -DSQLITE_ENABLE_DESERIALIZE in SQLITE_CFLAGS because Python in bookworm was compiled with SQLite 3.40.0 which has deserialize enabled by default.)

Contributor (sarahboyce)

Thank you for the explanation.
I think adding these 2 tests in a separate commit for the extra coverage, and then having the commit with the fix, makes sense to me. If you can push that up, that would be super; otherwise I can pick this up tomorrow 👍

Contributor Author @laymonage left a comment

I added another commit to update the docstrings; this can be squashed later.


Comment on lines +152 to +157
"""
SQLite has a variable limit per query. The limit can be changed using
the SQLITE_MAX_VARIABLE_NUMBER compile-time option (which defaults to
999 in versions < 3.32.0 or 32766 in newer versions) or lowered per
connection at run-time with setlimit(SQLITE_LIMIT_VARIABLE_NUMBER, N).
"""
Contributor Author

I moved the docstring of bulk_batch_size here and updated it, as it's more relevant here.

Contributor @sarahboyce left a comment

Thank you

@sarahboyce sarahboyce merged commit 358fd21 into django:main May 8, 2025
29 of 30 checks passed