8000 Fixed #35444 -- Added generic support for Aggregate.order_by by camuthig · Pull Request #18361 · django/django · GitHub
[go: up one dir, main page]

Skip to content

Fixed #35444 -- Added generic support for Aggregate.order_by #18361

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Mar 3, 2025

Conversation

camuthig
Copy link
Contributor
@camuthig camuthig commented Jul 11, 2024

Trac ticket number

ticket-35444

Branch description

There are three commits to this draft that represent the three phases of the change:

  1. Deprecate the ordering argument on the PostgreSQL aggregate functions and replace it with order_by to create a consistent naming convention
  2. Add generic support for Aggregate.order_by and migrate the Postgres-specific StringAgg class to the shared aggregates module, deprecating the Postgres version.
  3. Deprecate the OrderableAggMixin

Checklist

  • This PR targets the main branch.
  • The commit message is written in past tense, mentions the ticket number, and ends with a period.
  • I have checked the "Has patch" ticket flag in the Trac system.
  • I have added or updated relevant tests.
  • I have added or updated relevant docs, including release notes if applicable.

@camuthig
Copy link
Contributor Author

@charettes I've made some progress on the suggestions. I have the three phases broken out into unique commits and have added some of the documentation and copied the tests from the postgres module into a new class that should run against all of the database backends.

I ran into two issues in the process that I wanted to get your opinion on.

  1. Python is compiled against Sqlite3 version 3.37.0, which doesn't have support for the order by yet. So the ordering tests aren't running for that. I think I would need to compile Python from source and point it to a specific version of Sqlite3 to get past this, but I'm not sure if you have any thoughts.
  2. Sqlite3 doesn't behave well when you combine distinct and a delimiter value. It throws an error of SQLite doesn't support DISTINCT on aggregate functions accepting multiple arguments. So it treats the delimiter as a second argument. In Sqlite if you don't provide a delimiter it will use a comma as a default. I was trying to build it such that if the user sets the delimiter to a comma and is using Sqlite as a backend, then the delimiter is ignored. However, I could figure out how to make that work. I tried a few things around custom SQL, using a class attribute instead of making it part of expressions, etc. but I couldn't quite get it right. If you can see a good way to treat that, I think it would be awesome to make distinct work in Sqlite for at least some scenarios, if not all.

As a side note, I named it StringAgg because that is what I am used to, and it feels like the more descriptive function name between that a GROUP_CONCAT, but I'm open to changing that to whatever you think is best. I guess would could always have an alias class class GroupConcat(StringAgg): pass if we really wanted to make it smooth for developers from either environment, but that doesn't seem to pass the smell test on "one way to do something"

@charettes
Copy link
Member

Hello @camuthig, thanks for spinning this up!

Python is compiled against Sqlite3 version 3.37.0, which doesn't have support for the order by yet. So the ordering tests aren't running for that. I think I would need to compile Python from source and point it to a specific version of Sqlite3 to get past this, but I'm not sure if you have any thoughts.

It's possible CI might not be be setup here but if you're a *nix setup you can use LD_PRELOAD to point at any SQLite version. You don't have to build from source as the SQLite project provides pre-built binaries.

Sqlite3 doesn't behave well when you combine distinct and a delimiter value. It throws an error of SQLite doesn't support DISTINCT on aggregate functions accepting multiple arguments. So it treats the delimiter as a second argument. In Sqlite if you don't provide a delimiter it will use a comma as a default. I was trying to build it such that if the user sets the delimiter to a comma and is using Sqlite as a backend, then the delimiter is ignored. However, I could figure out how to make that work. I tried a few things around custom SQL, using a class attribute instead of making it part of expressions, etc. but I couldn't quite get it right. If you can see a good way to treat that, I think it would be awesome to make distinct work in Sqlite for at least some scenarios, if not all.

Well that's odd for sure.

In any aggregate function that takes a single argument, that argument can be preceded by the keyword DISTINCT. In such cases, duplicate elements are filtered before being passed into the aggregate function. For example, the function "count(distinct X)" will return the number of distinct values of column X instead of the total number of non-null values in column X.

Usually when we run into these edge cases we add a database feature (e.g. supports_aggregate_distinct_multiple_argument) and use for two purpose.

  1. Adjust Aggregate.as_sql to raise an error when self.distinct and not connection.features.supports_aggregate_distinct_multiple_argument and len(super().get_expressions()) > 1
  2. Add tests for @skipUnlessDBFeature("supports_aggregate_distinct_multiple_argument") that cover the backends that support it and @skipIfDBFeature that makes sure the proper exception is raised.

As a side note, I named it StringAgg because that is what I am used to, and it feels like the more descriptive function name between that a GROUP_CONCAT, but I'm open to changing that to whatever you think is best.

I think StringAgg is fine as that's the name most backends use (Postgres, SQLite, SQLServer)

Copy link
Member
@charettes charettes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the patch @camuthig!

I left a few comments and pointers. My main points of feedbacks are

  1. delimiter should not be wrapped in a Value as that prevent references to columns and normally string arguments are considered to be F'ield references.
  2. I appreciate the attention to testing but we should focus them on StringAgg behaviour and order_by support. Did you notice anything special when subqueries were involved?

warnings.warn(
"The PostgreSQL specific StringAgg function is deprecated. Use "
"django.db.models.aggregate.StringAgg instead.",
category=RemovedInDjango60Warning,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might have to adjust the stacklevel here so the warning points at the location of StringAgg instantiation. I think stacklevel=1 is the right one?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we'll need stacklevel=1 at the very least?

Comment on lines 260 to 267
def as_sql(self, compiler, connection, **extra_context):
return super().as_sql(compiler, connection, **extra_context)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this one can go?

Comment on lines 2395 to 2404
class StringAggTestModel(Model):
char_field = CharField(max_length=30, blank=True)
text_field = TextField(blank=True)
json_field = JSONField(null=True)


class StatTestModel(Model):
int1 = IntegerField()
int2 = IntegerField()
related_field = ForeignKey(StringAggTestModel, SET_NULL, null=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try reuse the existing models defined in aggregation/models.py instead. We can't add new tables/models for each new features otherwise the suite slowly gets slower.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests are taken directly from the PostgreSQL specific of these tests. My thoughts were

  1. We are deprecating that class in 6.0, so we will delete all of the StringAgg tests from that module at that time
  2. Those tests exist for a reason, even if I am not familiar with them, so I didn't want to drop any and cause a regression

This is why I have some tests that may seem very specific, like the subqueries, and why I went about defining these models. It kept me from re-implementing the same behaviors in a way that might cause a regression. However, I am open to using the models already defined in aggregation/models.py, I'll just need to sit with the data a bit to come up with the right expectations. I think it might be best to keep the tests, like the subqueries, in place to make sure we maintain the coverage. If you think those behaviors are covered in other tests, though, then I'm open to cleaning them up.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense, if they already exist we should definitely just port them over.

However, I am open to using the models already defined in aggregation/models.py

If you could port them while adapting them to use the existing models that would be awesome.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, if we're porting them over we should delete the original ones from postgres_tests given postgres.aggregates.StringAgg will only a shim at this point.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did some work to merge the tests into the existing AggregateTestCase, keeping some of the more esoteric ones specific to the StringAgg behaviors but making them work with the existing data/models. I had to add a JSON field to the existing models to that end, but otherwise, I think it works pretty well.

def test_default_argument(self):
StringAggTestModel.objects.all().delete()
tests = [
(StringAgg("char_field", delimiter=";", default="<empty>"), "<empty>"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect this to crash as default must be passed as Value? Passing str to Func should normally attempt resolve to field references (e.g. `"" -> F("")") and crash if missing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

delimiter should not be wrapped in a Value as that prevent references to columns and normally string arguments are considered to be F'ield references.

The Value wrapper on a string comes right from the old implementation in the Postgres-specific version. Would it make more sense to allow for the string to come through and wrap it as a value for 5.2 with a deprecation warning in 6? That does break the ability to use a string as an alias for F for 5.2, though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I have something together that wraps the __init__ from the Postgres version of the StringAgg class and does the Value(str(delimiter)) and logs a deprecation warning. The main StringAgg can then expect an Expression of some sort.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I have something together that wraps the init from the Postgres version of the StringAgg class and does the Value(str(delimiter)) and logs a deprecation warning. The main StringAgg can then expect an Expression of some sort.

That's perfect, using the deprecation period for postgres.StringAgg to models.StringAgg seems like a great way to normalize this behavior 🎉

@camuthig camuthig force-pushed the merge-orderable-agg-mixin branch 3 times, most recently from 636e617 to 3280e3e Compare August 3, 2024 04:12
@camuthig
Copy link
Contributor Author
camuthig commented Aug 3, 2024

@charettes I think I have updates for all of your comments at this phase. I was able to build a custom python environment with pyenv and brew using guidance here and test the group concat and ordering behaviors with a more recent version of sqlite, so that is cool.

I know there are some conflicts and I can work on cleaning those up. I'm also not totally sure what is wrong with the docs. I ran the same command on my own machine, and everything seems to build properly.

Copy link
Member
@charettes charettes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking great, I left a few comments regarding null handling and a few other things.

I like that you started to breakdown the commits I may suggest a different order and naming scheme.

Refs #35444 -- Deprecated contrib.postgres aggregates ordering for order_by.

(Explain the reasoning to better align with Window.order_by and plans to move support for Aggregate)

Fixed #35444 -- Add generic support for Aggregate.order_by.

(Explain that this now supersedes the contrib.postgres support and introduces StringAgg to exercise this support)

^^ this commit should make OrderableAggMixin unused.

Refs #35444 -- Deprecated contrib.postgres.OrderableAggMixin

(Explain that it serves no purposes now that Aggregate supports order_by but it is kept around for a deprecation cycle as some users might be relying on it)

Comment on lines 72 to 73
"String delimiters will be converted to F statements instead of Value"
"statements. Explicit Value instances should be used instead.",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"String delimiters will be converted to F statements instead of Value"
"statements. Explicit Value instances should be used instead.",
"delimiter: str will be resolved as a field reference instead of a string literal"
f"on Django 6.0. Pass `delimiter=Value({delimiter!r})` to preserve the previous ."
"behaviour.",

warnings.warn(
"The PostgreSQL specific StringAgg function is deprecated. Use "
"django.db.models.aggregate.StringAgg instead.",
category=RemovedInDjango60Warning,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we'll need stacklevel=1 at the very least?

Comment on lines 1383 to 1493
elif isinstance(param, (BaseExpression, str, F)):
return cls(param)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
elif isinstance(param, (BaseExpression, str, F)):
return cls(param)
elif isinstance(param, str) or hasattr(param, "resolve_expression"):
return cls(param)

Comment on lines 2299 to 2316
# Different engines treat null STRING_AGG differently, so excluding it for
# consistency.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you elaborate on that?

I seems that SQLite, Postgres, MySQL, and Oracle all return NULL which should translate to None?

  1. SQLite
  2. MySQL
  3. Postgres
  4. Oracle

Did you possibly run into issues because of a lack of ordering on the returned value and the fact some backends default to NULLS LAST instead of NULLS FIRST?

If that's the case you should use assertQuerySetEqual(..., ordered=False) or specify an order_by(OrderBy('agg', nulls_last=True)).

I think it'd be good to include at least one tests that ensures that None is returned on an empty not-NULL set.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, I was reading the test results wrong. I went with assertQuerySetEqual(..., ordered=False) to resolve this.

Comment on lines 2316 to 2319
"""
This test is based on tests taken from existing PostgreSQL specific tests and
kept to avoid regressions as StringAgg is ported to the shared database module.
"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"""
This test is based on tests taken from existing PostgreSQL specific tests and
kept to avoid regressions as StringAgg is ported to the shared database module.
"""

Comment on lines 148 to 156
def test_ordering_warns_of_deprecation(self):
msg = "The ordering argument is deprecated. Use order_by instead."
with self.assertWarnsMessage(RemovedInDjango60Warning, msg):
values = AggregateTestModel.objects.aggregate(
arrayagg=ArrayAgg("integer_field", ordering=F("integer_field").desc())
)
self.assertEqual(values, {"arrayagg": [2, 1, 0, 0]})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll need also need tests for the creation of postgres.StringAgg.

Comment on lines 150 to 156
with self.assertWarnsMessage(RemovedInDjango60Warning, msg):
values = AggregateTestModel.objects.aggregate(
arrayagg=ArrayAgg("integer_field", ordering=F("integer_field").desc())
)
self.assertEqual(values, {"arrayagg": [2, 1, 0, 0]})
Copy link
Member
@charettes charettes Aug 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The following should ensure that the proper stacklevel is passed to warnings.warn.

Suggested change
with self.assertWarnsMessage(RemovedInDjango60Warning, msg):
values = AggregateTestModel.objects.aggregate(
arrayagg=ArrayAgg("integer_field", ordering=F("integer_field").desc())
)
self.assertEqual(values, {"arrayagg": [2, 1, 0, 0]})
with self.assertWarnsMessage(RemovedInDjango60Warning, msg) ctx:
values = AggregateTestModel.objects.aggregate(
arrayagg=ArrayAgg("integer_field", ordering=F("integer_field").desc())
)
self.assertEqual(values, {"arrayagg": [2, 1, 0, 0]})
self.assertEqual(ctx.filename, __file__)

@camuthig camuthig force-pushed the merge-orderable-agg-mixin branch 6 times, most recently from 333e523 to 7f8c533 Compare August 15, 2024 02:16
@camuthig
Copy link
Contributor Author

@charettes I have update the code to hit all of your comments and get the linters passing. I will take a dive into the failing tests shortly. At least some of them appear to be because we have moved the connection.features.supports_aggregate_filter_clause into the AggregateFilter.as_sql function, but have historically allowed for this check to fail and convert it to a WHEN statement. Now, though, when converting to the WHEN, we are throwing the NotSupportedError again. If you have thoughts on how you would like to see that flow, I'm open to suggestions.

@charettes
Copy link
Member

At least some of them appear to be because we have moved the connection.features.supports_aggregate_filter_clause into the AggregateFilter.as_sql function, but have historically allowed for this check to fail and convert it to a WHEN statement. Now, though, when converting to the WHEN, we are throwing the NotSupportedError again. If you have thoughts on how you would like to see that flow, I'm open to suggestions.

The following should do

diff --git a/django/db/models/aggregates.py b/django/db/models/aggregates.py
index 6cf0bd9a60..d070944039 100644
--- a/django/db/models/aggregates.py
+++ b/django/db/models/aggregates.py
@@ -29,6 +29,10 @@ class AggregateFilter(Func):
     arity = 1
     template = " FILTER (WHERE %(expressions)s)"

+    @property
+    def condition(self):
+        return self.source_expressions[0]
+
     def as_sql(self, compiler, connection, **extra_context):
         if not connection.features.supports_aggregate_filter_clause:
             raise NotSupportedError
@@ -187,7 +191,7 @@ def as_sql(self, compiler, connection, **extra_context):
                 copy = self.copy()
                 copy.filter = None
                 source_expressions = copy.get_source_expressions()
-                condition = When(self.filter, then=source_expressions[0])
+                condition = When(self.filter.condition, then=source_expressions[0])
                 copy.set_source_expressions([Case(condition)] + source_expressions[1:])
                 return copy.as_sql(compiler, connection, **extra_context)

The problem was that we were building the fallback When with the AggregateFilter instead of its underlying condition: Q.

@camuthig camuthig force-pushed the merge-orderable-agg-mixin branch 3 times, most recently from 3f5a19c to 26fa18a Compare August 15, 2024 15:53
@camuthig
Copy link
Contributor Author
camuthig commented Aug 16, 2024

@charettes thanks for the recommendation. It works great.

I spent time digging into the other failures in MySQL and determined that the MySQL implementation of GROUP_CONCAT doesn't actually play nicely with any of the other engine implementations. I have some code together that passed the aggregate tests on both MySQL and Postgres locally, but it is certainly messy and probably brittle.

First, it allows for multiple expressions, which we can ignore for now. The bigger issues are that MySQL doesn't support filtering on aggregates and it uses a different format order between the delimiter and ordering.

Sqlite: GROUP_CONCAT(expression, delimiter ORDER BY order_by)
MySQL: GROUP_CONCAT(expressions* ORDER BY order_by SEPARATOR delimiter)

So here, if you want to introduce a delimiter you MUST have a different template, which is solvable. The bigger issue is that the order matters between ORDER BY and SEPARATOR: SEPARATOR delimiter ORDER BY order_by is invalid syntax. This required me to manually build the parameters in as_sql and force the order of things to get the parameters into the right order depending on the engine.

This was further compounded by the fact MySQL doesn't support aggregate filtering and we are forcing a CASE statement. So if you have filter, order_by, and delimiter you need a third ordering of parameters, with the filter parameters coming before even the expressions. 😞

Do you think there are some level of feature flags we can throw on MySQL to make this easier? Or maybe allow for engine-specific default separators to be ignored in our queries? I have a commit with some experimentation here: cbe5012

@camuthig camuthig force-pushed the merge-orderable-agg-mixin branch from 34c8fe0 to cbe5012 Compare August 16, 2024 20:56
@charettes
Copy link
Member

I have a commit with some experimentation here: cbe5012

Thanks for pushing a commit demonstrating the scope of the problem.

Looking at it my immediate reaction would be to avoid over complicating StringAgg.as_sql and favor encapsulating the logic entirely in as_mysql instead. In there you'll know that the FILTER clause is not usable and will likely be able to simplify things quite a bit from some local testing locally.

Or maybe allow for engine-specific default separators to be ignored in our queries?

I'm not sure I'm understanding what you mean here.

Now that the MySQL tests are passing I'll give Oracle a test run as well.

@charettes
Copy link
Member

buildbot, test on oracle.

@camuthig
Copy link
Contributor Author
camuthig commented Sep 2, 2024

@charettes do you have recommendations on how to test the Oracle engine myself? I am having trouble with the Oracle Docker images. Maybe cause I am on a M1 Mac? I'm not sure. It looks like a pretty common errors, so it seems some change I have made at a base level is causing the compiled queries to be incorrect but I can't really tell why without running against the database.

@charettes
Copy link
Member

@camuthig I usually try two approaches on my M1

  1. I had success once I ran brew install colima and tried again
  2. Low-tech approach is to attempt to generate the raw SQL locally and rely on tools such as dbfiddle.

I really wish CI ran with --debug-sql at least so we could peak at the generated SQL

Any suggestions @felixxm?

@charettes
Copy link
Member
charettes commented Sep 2, 2024

I managed to get tests running on Oracle here's the non-localized error message

======================================================================
ERROR: test_annotate_values_list (aggregation.tests.AggregateTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/django/source/django/db/backends/utils.py", line 105, in _execute
    return self.cursor.execute(sql, params)
  File "/django/source/django/db/backends/oracle/base.py", line 577, in execute
    return self.cursor.execute(query, self._param_generator(params))
  File "/usr/local/lib/python3.10/site-packages/oracledb/cursor.py", line 710, in execute
    impl.execute(self)
  File "src/oracledb/impl/thin/cursor.pyx", line 196, in oracledb.thin_impl.ThinCursorImpl.execute
  File "src/oracledb/impl/thin/protocol.pyx", line 440, in oracledb.thin_impl.Protocol._process_single_message
  File "src/oracledb/impl/thin/protocol.pyx", line 441, in oracledb.thin_impl.Protocol._process_single_message
  File "src/oracledb/impl/thin/protocol.pyx", line 433, in oracledb.thin_impl.Protocol._process_message
  File "src/oracledb/impl/thin/messages.pyx", line 74, in oracledb.thin_impl.Message._check_and_raise_exception
oracledb.exceptions.DatabaseError: ORA-22848: cannot use NCLOB type as comparison key
Help: https://docs.oracle.com/error-help/db/ora-22848/

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/django/source/tests/aggregation/tests.py", line 939, in test_annotate_values_list
    self.assertEqual(list(books), [(self.b1.id, "159059725", 34.5)])
  File "/django/source/django/db/models/query.py", line 381, in __iter__
    self._fetch_all()
  File "/django/source/django/db/models/query.py", line 1909, in _fetch_all
    self._result_cache = list(self._iterable_class(self))
  File "/django/source/django/db/models/query.py", line 229, in __iter__
    return compiler.results_iter(
  File "/django/source/django/db/models/sql/compiler.py", line 1536, in results_iter
    results = self.execute_sql(
  File "/django/source/django/db/models/sql/compiler.py", line 1585, in execute_sql
    cursor.execute(sql, params)
  File "/django/source/django/db/backends/utils.py", line 122, in execute
    return super().execute(sql, params)
  File "/django/source/django/db/backends/utils.py", line 79, in execute
    return self._execute_with_wrappers(
  File "/django/source/django/db/backends/utils.py", line 92, in _execute_with_wrappers
    return executor(sql, params, many, context)
  File "/django/source/django/db/backends/utils.py", line 100, in _execute
    with self.db.wrap_database_errors:
  File "/django/source/django/db/utils.py", line 91, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/django/source/django/db/backends/utils.py", line 105, in _execute
    return self.cursor.execute(sql, params)
  File "/django/source/django/db/backends/oracle/base.py", line 577, in execute
    return self.cursor.execute(query, self._param_generator(params))
  File "/usr/local/lib/python3.10/site-packages/oracledb/cursor.py", line 710, in execute
    impl.execute(self)
  File "src/oracledb/impl/thin/cursor.pyx", line 196, in oracledb.thin_impl.ThinCursorImpl.execute
  File "src/oracledb/impl/thin/protocol.pyx", line 440, in oracledb.thin_impl.Protocol._process_single_message
  File "src/oracledb/impl/thin/protocol.pyx", line 441, in oracledb.thin_impl.Protocol._process_single_message
  File "src/oracledb/impl/thin/protocol.pyx", line 433, in oracledb.thin_impl.Protocol._process_message
  File "src/oracledb/impl/thin/messages.pyx", line 74, in oracledb.thin_impl.Message._check_and_raise_exception
django.db.utils.DatabaseError: ORA-22848: cannot use NCLOB type as comparison key
Help: https://docs.oracle.com/error-help/db/ora-22848/

Executed SQL

SELECT "AGGREGATION_BOOK"."ID" AS "PK",
       "AGGREGATION_BOOK"."ISBN" AS "ISBN",
       AVG("AGGREGATION_AUTHOR"."AGE") AS "MEAN_AGE"
FROM "AGGREGATION_BOOK"
LEFT OUTER JOIN "AGGREGATION_BOOK_AUTHORS" ON ("AGGREGATION_BOOK"."ID" = "AGGREGATION_BOOK_AUTHORS"."BOOK_ID")
LEFT OUTER JOIN "AGGREGATION_AUTHOR" ON ("AGGREGATION_BOOK_AUTHORS"."AUTHOR_ID" = "AGGREGATION_AUTHOR"."ID")
WHERE "AGGREGATION_BOOK"."ID" = 1
GROUP BY "AGGREGATION_BOOK"."ID",
         "AGGREGATION_BOOK"."ISBN",
         "AGGREGATION_BOOK"."NAME",
         "AGGREGATION_BOOK"."PAGES",
         "AGGREGATION_BOOK"."RATING",
         "AGGREGATION_BOOK"."PRICE",
         "AGGREGATION_BOOK"."CONTACT_ID",
         "AGGREGATION_BOOK"."PUBLISHER_ID",
         "AGGREGATION_BOOK"."PUBDATE",
         "AGGREGATION_BOOK"."PRINT_INFO"

Executed SQL on main

SELECT "AGGREGATION_BOOK"."ID" AS "PK",
       "AGGREGATION_BOOK"."ISBN" AS "ISBN",
       AVG("AGGREGATION_AUTHOR"."AGE") AS "MEAN_AGE"
FROM "AGGREGATION_BOOK"
LEFT OUTER JOIN "AGGREGATION_BOOK_AUTHORS" ON ("AGGREGATION_BOOK"."ID" = "AGGREGATION_BOOK_AUTHORS"."BOOK_ID")
LEFT OUTER JOIN "AGGREGATION_AUTHOR" ON ("AGGREGATION_BOOK_AUTHORS"."AUTHOR_ID" = "AGGREGATION_AUTHOR"."ID")
WHERE "AGGREGATION_BOOK"."ID" = 21
GROUP BY "AGGREGATION_BOOK"."ID",
         "AGGREGATION_BOOK"."ISBN",
         "AGGREGATION_BOOK"."NAME",
         "AGGREGATION_BOOK"."PAGES",
         "AGGREGATION_BOOK"."RATING",
         "AGGREGATION_BOOK"."PRICE",
         "AGGREGATION_BOOK"."CONTACT_ID",
         "AGGREGATION_BOOK"."PUBLISHER_ID",
         "AGGREGATION_BOOK"."PUBDATE"

Seems like "AGGREGATION_BOOK"."PRINT_INFO" is included in the GROUP BY on your branch.

Copy link
Member
@charettes charettes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oracle failures were due to your addition of the print_info field.

@@ -30,6 +30,7 @@ class Book(models.Model):
contact = models.ForeignKey(Author, models.CASCADE, related_name="book_contact_set")
publisher = models.ForeignKey(Publisher, models.CASCADE)
pubdate = models.DateField()
print_info = models.JSONField(null=True, default=None)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@camuthig this is the source of all your Oracle test failures...

Django uses a NCLOB column to persists JSONField and Oracle doesn't allow to GROUP BY a NCLOB field.

I don't think the usage of a JSONField is mandatory to test this feature so I'd suggest using a different field type instead.

Comment on lines 2281 to 2287
@skipUnlessDBFeature("supports_json_field", "supports_aggregate_order_by_clause")
def test_string_agg_jsonfield_order_by(self):
values = Book.objects.aggregate(
stringagg=StringAgg(
KeyTextTransform("lang", "print_info"),
delimiter=Value(","),
order_by=KeyTextTransform("lang", "print_info"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this was ported from the Postgres tests.

I suggest creating a distinct model for this particular case instead.

@felixxm
Copy link
Member
felixxm commented Sep 3, 2024

@camuthig I usually try two approaches on my M1

1. I had success once I ran `brew install colima` and tried again

2. Low-tech approach is to attempt to generate the raw SQL locally and rely on tools such as [dbfiddle](https://dbfiddle.uk/Wh4zpMKJ).

I really wish CI ran with --debug-sql at least so we could peak at the generated SQL

Any suggestions @felixxm?

The "easiest" way is to use VM provided by Oracle.

@camuthig camuthig force-pushed the merge-orderable-agg-mixin branch 3 times, most recently from 2322b98 to ca4efc0 Compare January 20, 2025 05:32
def test_ordering_and_order_by_causes_error(self):
with self.assertWarns(RemovedInDjango61Warning):
with warnings.catch_warnings(record=True, action="always") as wm:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Things got a little weird here when I update the warnings to use 7.0. The issue being that interacting with the class this way triggers both a 6.1 warning (for the order_by usage) and 7.0 warnings (for using the PostgreSQL StringAgg). I couldn't get assertWarns to work with both getting thrown so dug in and tried using the underlying behaviors of it to get it down.

Let me know if there is a better way to write this, and I will refactor it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the way you did it is great, it's a peculiar case given it triggers two distinct warnings of different categories 👍

@camuthig camuthig force-pushed the merge-orderable-agg-mixin branch from ca4efc0 to a8ab0e8 Compare January 20, 2025 05:41
@camuthig
Copy link
Contributor Author

I think everything should be up to date and marked as deprecated as of 6.0 and to be removed in 7.0. I updated the documentation as requested as well.

I ran into testing issues around the layers of deprecation warnings. I have a working solution for it, but I'm not sure it is the right pattern.

@sarahboyce sarahboyce force-pushed the merge-orderable-agg-mixin branch from a8ab0e8 to 0b586c4 Compare February 25, 2025 13:56
@sarahboyce
Copy link
Contributor

buildbot, test on oracle.

@sarahboyce sarahboyce force-pushed the merge-orderable-agg-mixin branch 2 times, most recently from 8115db1 to 5f75fc9 Compare February 25, 2025 14:15
@sarahboyce
Copy link
Contributor

@camuthig I have checked the deprecations and rebased 👍
We have some failures since this commit a76035e
For me, these go away if I do:

--- a/django/db/models/aggregates.py
+++ b/django/db/models/aggregates.py
@@ -169,7 +169,7 @@ class Aggregate(Func):
     @property
     def default_alias(self):
         expressions = [
-            expr for expr in self.get_source_expressions() if expr is not None
+            expr for expr in self.get_source_expressions() if expr
         ]
         if len(expressions) == 1 and hasattr(expressions[0], "name"):
             return "%s__%s" % (expressions[0].name, self.name.lower())
diff --git a/django/db/models/expressions.py b/django/db/models/expressions.py
index 68e5d2667e..b11147987a 100644
--- a/django/db/models/expressions.py
+++ b/django/db/models/expressions.py
@@ -298,7 +298,7 @@ class BaseExpression:
         source_expressions = [
             (
                 expr.resolve_expression(query, allow_joins, reuse, summarize)
-                if expr is not None
+                if expr
                 else None
             )
             for expr in c.get_source_expressions()

But I think @charettes might be best placed to advise here

@charettes
Copy link
Member

I should be able to have a look at it shortly. Looks at the test failures I assume this is happening because some aggregate is stashing tuple instances in source_expressions which shouldn't happen (only Resolvable | None should be allowed).

The is None is important as otherwise if a resolvable with a __bool__ causing side-effect is passed it will be evaluated. The classic example is QuerySet which evaluates its query.

Comment on lines 98 to 101
self.order_by = order_by
self.default = default
self.order_by = order_by and AggregateOrderBy.from_param(
f"{self.__class__.__name__}.order_by", order_by
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something when wrong with the rebase here, we assign to self.order_by twice?

@charettes
Copy link
Member

@sarahboyce I haven't figured out in which commit these should be folded but here are the required changes to get the tests passing

diff --git a/django/contrib/postgres/aggregates/mixins.py b/django/contrib/postgres/aggregates/mixins.py
index a6849c3930..8d3e40177b 100644
--- a/django/contrib/postgres/aggregates/mixins.py
+++ b/django/contrib/postgres/aggregates/mixins.py
@@ -16,7 +16,7 @@ def __init__(self, *expressions, ordering=(), order_by=(), **extra):
             if order_by:
                 raise TypeError("Cannot specify both order_by and ordering.")
             order_by = ordering
-
+        order_by = order_by or None
         super().__init__(*expressions, order_by=order_by, **extra)


diff --git a/django/db/models/aggregates.py b/django/db/models/aggregates.py
index 8b644e3599..48e00aa946 100644
--- a/django/db/models/aggregates.py
+++ b/django/db/models/aggregates.py
@@ -95,7 +95,6 @@ def __init__(

         self.distinct = distinct
         self.filter = filter and AggregateFilter(filter)
-        self.order_by = order_by
         self.default = default
         self.order_by = order_by and AggregateOrderBy.from_param(
             f"{self.__class__.__name__}.order_by", order_by

@charettes
Copy link
Member

We might want to also consider having OrderByList.from_params return None when passed an empty tuple | list to prevent similar crashes when Aggregate(order_by) is used directly instead.

diff --git a/django/db/models/aggregates.py b/django/db/models/aggregates.py
index 8b644e3599..a791593508 100644
--- a/django/db/models/aggregates.py
+++ b/django/db/models/aggregates.py
@@ -95,9 +95,8 @@ def __init__(

         self.distinct = distinct
         self.filter = filter and AggregateFilter(filter)
-        self.order_by = order_by
         self.default = default
-        self.order_by = order_by and AggregateOrderBy.from_param(
+        self.order_by = AggregateOrderBy.from_param(
             f"{self.__class__.__name__}.order_by", order_by
         )
         super().__init__(*expressions, **extra)
diff --git a/django/db/models/expressions.py b/django/db/models/expressions.py
index 68e5d2667e..ced5557732 100644
--- a/django/db/models/expressions.py
+++ b/django/db/models/expressions.py
@@ -1466,7 +1466,11 @@ def __init__(self, *expressions, **extra):

     @classmethod
     def from_param(cls, context, param):
+        if param is None:
+            return None
         if isinstance(param, (list, tuple)):
+            if not param:
+                return None
             return cls(*param)
         elif isinstance(param, str) or hasattr(param, "resolve_expression"):
             return cls(param)
@@ -1937,8 +1941,7 @@ def __init__(
                 self.partition_by = (self.partition_by,)
             self.partition_by = ExpressionList(*self.partition_by)

-        if self.order_by is not None:
-            self.order_by = OrderByList.from_param("Window.order_by", self.order_by)
+        self.order_by = OrderByList.from_param("Window.order_by", self.order_by)
         super().__init__(output_field=output_field)
         self.source_expression = self._parse_expressions(expression)[0]

@sarahboyce sarahboyce force-pushed the merge-orderable-agg-mixin branch 2 times, most recently from f00f27a to 20c8a9b Compare February 27, 2025 07:26
This moves the behaviors of `order_by` used in Postgres aggregates into
the `Aggregate` class. This allows for creating aggregate functions that
support this behavior across all database engines. This is shown by
moving the `StringAgg` class into the shared `aggregates` module and
adding support for all databases. The Postgres `StringAgg` class is now
a thin wrapper on the new shared `StringAgg` class.

Thank you Simon Charette for the review.
@sarahboyce sarahboyce force-pushed the merge-orderable-agg-mixin branch from 20c8a9b to 6f8a3e4 Compare March 3, 2025 08:10
@sarahboyce
Copy link
Contributor

buildbot, test on oracle.

This commit does not create any functional changes, but marks the
existing `OrderableAggMixin` class as deprecated so that developers
using it directly can be made aware of its future removal.
@sarahboyce sarahboyce force-pushed the merge-orderable-agg-mixin branch from 6f8a3e4 to 7a7aabf Compare March 3, 2025 09:54
Copy link
Contributor
@sarahboyce sarahboyce left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! Great job on this ⭐

@sarahboyce sarahboyce merged commit 1759c1d into django:main Mar 3, 2025
32 checks passed
@charettes
Copy link
Member

Thank you for sticking around and seeing this through @camuthig, that was quite the 10 months adventure. Hopefully it was a valuable experience for you!

@camuthig
Copy link
Contributor Author
camuthig commented Mar 4, 2025

Woo! We did it!

Thanks for all the help along the way, @charettes . It's definitely been a good experience for me. I learned a lot about how the ORM is working behind the scenes, which was my own goal. Now that we have this part done, I look forward to moving some of the other Postgres-specific functions over to general support. At least ArrayAgg but maybe something of the JSON agg too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants
0