Fixed #35444 -- Added generic support for Aggregate.order_by #18361

camuthig · 2024-07-11T14:22:27Z

Trac ticket number

ticket-35444

Branch description

There are three commits to this draft that represent the three phases of the change:

Deprecate the ordering argument on the PostgreSQL aggregate functions and replace it with order_by to create a consistent naming convention
Add generic support for Aggregate.order_by and migrate the Postgres-specific StringAgg class to the shared aggregates module, deprecating the Postgres version.
Deprecate the OrderableAggMixin

Checklist

This PR targets the main branch.
The commit message is written in past tense, mentions the ticket number, and ends with a period.
I have checked the "Has patch" ticket flag in the Trac system.
I have added or updated relevant tests.
I have added or updated relevant docs, including release notes if applicable.

camuthig · 2024-07-11T14:30:22Z

@charettes I've made some progress on the suggestions. I have the three phases broken out into unique commits and have added some of the documentation and copied the tests from the postgres module into a new class that should run against all of the database backends.

I ran into two issues in the process that I wanted to get your opinion on.

My Python is compiled against SQLite 3.37.0, which doesn't support ORDER BY inside aggregate functions yet, so the ordering tests aren't running for it. I think I would need to compile Python from source and point it to a specific SQLite version to get past this. Do you have any thoughts?
SQLite doesn't behave well when you combine distinct with a delimiter value. It raises the error "SQLite doesn't support DISTINCT on aggregate functions accepting multiple arguments", because it treats the delimiter as a second argument. In SQLite, if you don't provide a delimiter, it uses a comma by default. I was trying to build it such that if the user sets the delimiter to a comma and SQLite is the backend, the delimiter is ignored. However, I couldn't figure out how to make that work. I tried a few things around custom SQL, using a class attribute instead of making it part of the expressions, etc., but I couldn't quite get it right. If you can see a good way to handle that, I think it would be awesome to make distinct work in SQLite for at least some scenarios, if not all.
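The underlying restriction can be reproduced outside Django with only Python's built-in sqlite3 module. This is an illustrative sketch: the table t and column x are made up, and the message quoted above appears to come from Django's own NotSupportedError check, whereas raw SQLite reports a differently worded OperationalError.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x TEXT)")
conn.executemany("INSERT INTO t (x) VALUES (?)", [("a",), ("b",), ("a",)])

# One argument: DISTINCT is allowed and SQLite uses its default "," separator.
single_arg = conn.execute("SELECT group_concat(DISTINCT x) FROM t").fetchone()[0]

# Two arguments (value plus an explicit separator): SQLite rejects DISTINCT,
# even when the separator is the same "," it would use by default.
try:
    conn.execute("SELECT group_concat(DISTINCT x, ',') FROM t").fetchone()
    two_arg_error = None
except sqlite3.OperationalError as exc:
    two_arg_error = str(exc)

# Element order inside group_concat() is unspecified without ORDER BY, so sort.
print(sorted(single_arg.split(",")))  # ['a', 'b']
print(two_arg_error)
```

This is why special-casing a "," delimiter is tempting on SQLite: the one-argument form already produces comma-separated output.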

As a side note, I named it StringAgg because that is what I am used to, and it feels like the more descriptive function name between it and GROUP_CONCAT, but I'm open to changing that to whatever you think is best. I guess we could always have an alias class, class GroupConcat(StringAgg): pass, if we really wanted to make it smooth for developers from either environment, but that doesn't seem to pass the smell test on "one way to do something".

charettes · 2024-07-15T02:01:00Z

Hello @camuthig, thanks for spinning this up!

> My Python is compiled against SQLite 3.37.0, which doesn't support ORDER BY inside aggregate functions yet, so the ordering tests aren't running for it. I think I would need to compile Python from source and point it to a specific SQLite version to get past this. Do you have any thoughts?

CI might not be set up for this here, but if you're on a *nix system you can use LD_PRELOAD to point at any SQLite version. You don't have to build from source, as the SQLite project provides pre-built binaries.
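Before and after swapping libraries with LD_PRELOAD, it helps to confirm which SQLite build Python actually loaded. A small probe using only the standard library; the 3.44.0 threshold is the release in which SQLite added ORDER BY inside aggregate calls, and the library path and script name in the comment are placeholders.

```python
import sqlite3

# SQLite started accepting ORDER BY inside aggregate calls in 3.44.0.
AGGREGATE_ORDER_BY_MIN = (3, 44, 0)


def supports_aggregate_order_by(conn):
    """Probe the loaded SQLite library instead of trusting version numbers alone."""
    try:
        conn.execute("SELECT group_concat(x, ',' ORDER BY x) FROM (SELECT 'a' AS x)")
    except sqlite3.OperationalError:
        return False
    return True


conn = sqlite3.connect(":memory:")
# Placeholder path; point it at a pre-built libsqlite3 of the version you need:
#   LD_PRELOAD=/path/to/libsqlite3.so python probe.py
print(sqlite3.sqlite_version, supports_aggregate_order_by(conn))
```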

> SQLite doesn't behave well when you combine distinct with a delimiter value. It raises the error "SQLite doesn't support DISTINCT on aggregate functions accepting multiple arguments", because it treats the delimiter as a second argument. In SQLite, if you don't provide a delimiter, it uses a comma by default. I was trying to build it such that if the user sets the delimiter to a comma and SQLite is the backend, the delimiter is ignored. However, I couldn't figure out how to make that work. I tried a few things around custom SQL, using a class attribute instead of making it part of the expressions, etc., but I couldn't quite get it right. If you can see a good way to handle that, I think it would be awesome to make distinct work in SQLite for at least some scenarios, if not all.

Well that's odd for sure.

> In any aggregate function that takes a single argument, that argument can be preceded by the keyword DISTINCT. In such cases, duplicate elements are filtered before being passed into the aggregate function. For example, the function "count(distinct X)" will return the number of distinct values of column X instead of the total number of non-null values in column X.

Usually when we run into these edge cases, we add a database feature flag (e.g. supports_aggregate_distinct_multiple_argument) and use it for two purposes:

Adjust Aggregate.as_sql to raise an error when self.distinct and not connection.features.supports_aggregate_distinct_multiple_argument and len(super().get_expressions()) > 1
Add tests for @skipUnlessDBFeature("supports_aggregate_distinct_multiple_argument") that cover the backends that support it and @skipIfDBFeature that makes sure the proper exception is raised.
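The first point could look roughly like the guard below. It is a self-contained sketch: NotSupportedError and Features are local stand-ins for django.db.NotSupportedError and connection.features, and check_distinct is a hypothetical helper, not the actual Aggregate.as_sql code.

```python
from dataclasses import dataclass


class NotSupportedError(Exception):
    """Stand-in for django.db.NotSupportedError in this sketch."""


@dataclass
class Features:
    # Flag name taken from the review comment; everything else is hypothetical.
    supports_aggregate_distinct_multiple_argument: bool


def check_distinct(name, expressions, distinct, features):
    """Raise when DISTINCT meets several arguments on a backend that can't do it."""
    if (
        distinct
        and not features.supports_aggregate_distinct_multiple_argument
        and len(expressions) > 1
    ):
        raise NotSupportedError(
            f"{name} does not support distinct with multiple expressions "
            "on this database backend."
        )
```

Single-argument aggregates stay unaffected, so DISTINCT keeps working on SQLite whenever no explicit delimiter is passed.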

> As a side note, I named it StringAgg because that is what I am used to, and it feels like the more descriptive function name between it and GROUP_CONCAT, but I'm open to changing that to whatever you think is best.

I think StringAgg is fine, as that's the name most backends use (Postgres, SQLite, SQL Server).

charettes

Thanks for the patch @camuthig!

I left a few comments and pointers. My main points of feedback are:

delimiter should not be wrapped in a Value, as that prevents references to columns; string arguments are normally treated as F() field references.
I appreciate the attention to testing but we should focus them on StringAgg behaviour and order_by support. Did you notice anything special when subqueries were involved?

charettes · 2024-07-15T02:02:38Z

django/contrib/postgres/aggregates/general.py

+        warnings.warn(
+            "The PostgreSQL specific StringAgg function is deprecated. Use "
+            "django.db.models.aggregate.StringAgg instead.",
+            category=RemovedInDjango60Warning,


We might have to adjust the stacklevel here so the warning points at the location of StringAgg instantiation. I think stacklevel=1 is the right one?

I think we'll need stacklevel=1 at the very least?


charettes · 2024-07-15T02:11:46Z

django/db/models/aggregates.py

+    def as_sql(self, compiler, connection, **extra_context):
+        return super().as_sql(compiler, connection, **extra_context)


I think this one can go?

charettes · 2024-07-15T02:14:52Z

tests/aggregation/tests.py

+class StringAggTestModel(Model):
+    char_field = CharField(max_length=30, blank=True)
+    text_field = TextField(blank=True)
+    json_field = JSONField(null=True)
+
+
+class StatTestModel(Model):
+    int1 = IntegerField()
+    int2 = IntegerField()
+    related_field = ForeignKey(StringAggTestModel, SET_NULL, null=True)


Try to reuse the existing models defined in aggregation/models.py instead. We can't add new tables/models for each new feature, otherwise the test suite gets progressively slower.

These tests are taken directly from the PostgreSQL-specific versions of these tests. My thoughts were:

We are deprecating that class in 6.0, so we will delete all of the StringAgg tests from that module at that time.

Those tests exist for a reason, even if I am not familiar with them, so I didn't want to drop any and cause a regression.

This is why I have some tests that may seem very specific, like the subqueries, and why I went about defining these models. It kept me from re-implementing the same behaviors in a way that might cause a regression. However, I am open to using the models already defined in aggregation/models.py, I'll just need to sit with the data a bit to come up with the right expectations. I think it might be best to keep the tests, like the subqueries, in place to make sure we maintain the coverage. If you think those behaviors are covered in other tests, though, then I'm open to cleaning them up.

That makes sense, if they already exist we should definitely just port them over.

> However, I am open to using the models already defined in aggregation/models.py

If you could port them while adapting them to use the existing models that would be awesome.

Also, if we're porting them over, we should delete the original ones from postgres_tests, given postgres.aggregates.StringAgg will only be a shim at this point.

I did some work to merge the tests into the existing AggregateTestCase, keeping some of the more esoteric ones specific to the StringAgg behaviors but making them work with the existing data/models. I had to add a JSON field to the existing models to that end, but otherwise, I think it works pretty well.


charettes · 2024-07-15T02:22:13Z

tests/aggregation/tests.py

+    def test_default_argument(self):
+        StringAggTestModel.objects.all().delete()
+        tests = [
+            (StringAgg("char_field", delimiter=";", default="<empty>"), "<empty>"),


I would expect this to crash, as default must be passed as a Value? Passing a str to Func should normally attempt to resolve it to a field reference (e.g. `"<empty>" -> F("<empty>")`) and crash if the field is missing.

> delimiter should not be wrapped in a Value, as that prevents references to columns; string arguments are normally treated as F() field references.

The Value wrapper on a string comes right from the old implementation in the Postgres-specific version. Would it make more sense to allow for the string to come through and wrap it as a value for 5.2 with a deprecation warning in 6? That does break the ability to use a string as an alias for F for 5.2, though.

I think I have something together that wraps the __init__ from the Postgres version of the StringAgg class and does the Value(str(delimiter)) and logs a deprecation warning. The main StringAgg can then expect an Expression of some sort.

> I think I have something together that wraps the __init__ from the Postgres version of the StringAgg class and does the Value(str(delimiter)) and logs a deprecation warning. The main StringAgg can then expect an Expression of some sort.

That's perfect, using the deprecation period for postgres.StringAgg to models.StringAgg seems like a great way to normalize this behavior 🎉
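The agreed split can be sketched as two delimiter parsers: the shared StringAgg treats a plain str as a field reference, while the contrib.postgres shim keeps the old literal behaviour for one deprecation period and warns. Value, F, DelimiterDeprecationWarning, and both functions are hypothetical stand-ins, not Django's implementation.

```python
import warnings


class DelimiterDeprecationWarning(DeprecationWarning):
    """Hypothetical stand-in for Django's RemovedInDjangoXXWarning classes."""


class Value:
    """Stand-in for django.db.models.Value: a literal."""

    def __init__(self, value):
        self.value = value


class F:
    """Stand-in for django.db.models.F: a reference to a field by name."""

    def __init__(self, name):
        self.name = name


def parse_delimiter(delimiter):
    """Shared StringAgg: a plain str names a field, like any other Func argument."""
    return F(delimiter) if isinstance(delimiter, str) else delimiter


def legacy_parse_delimiter(delimiter):
    """contrib.postgres shim: keep str as a literal for now, but warn the caller."""
    if isinstance(delimiter, str):
        warnings.warn(
            "delimiter: str will be resolved as a field reference instead of a "
            f"string literal. Pass `delimiter=Value({delimiter!r})` to preserve "
            "the previous behaviour.",
            DelimiterDeprecationWarning,
            stacklevel=2,
        )
        return Value(delimiter)
    return delimiter
```

Explicit Value instances pass through both parsers untouched, so code that already wraps its delimiter needs no change.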


camuthig · 2024-08-03T04:20:50Z

@charettes I think I have updates for all of your comments at this phase. I was able to build a custom Python environment with pyenv and brew using guidance here and test the GROUP_CONCAT and ordering behaviors with a more recent version of SQLite, so that is cool.

I know there are some conflicts and I can work on cleaning those up. I'm also not totally sure what is wrong with the docs. I ran the same command on my own machine, and everything seems to build properly.

charettes

This is looking great, I left a few comments regarding null handling and a few other things.

I like that you started to break down the commits; may I suggest a different order and naming scheme?

Refs #35444 -- Deprecated contrib.postgres aggregates ordering for order_by.

(Explain the reasoning: better alignment with Window.order_by and the plan to move support to Aggregate)

Fixed #35444 -- Added generic support for Aggregate.order_by.

(Explain that this now supersedes the contrib.postgres support and introduces StringAgg to exercise this support)

^^ this commit should make OrderableAggMixin unused.

Refs #35444 -- Deprecated contrib.postgres.OrderableAggMixin.

(Explain that it serves no purpose now that Aggregate supports order_by, but it is kept around for a deprecation cycle as some users might be relying on it)

charettes · 2024-08-04T15:54:50Z

django/contrib/postgres/aggregates/general.py

+                "String delimiters will be converted to F statements instead of Value"
+                "statements. Explicit Value instances should be used instead.",


Suggested change

-                "String delimiters will be converted to F statements instead of Value"
-                "statements. Explicit Value instances should be used instead.",
+                "delimiter: str will be resolved as a field reference instead of a string literal "
+                f"on Django 6.0. Pass `delimiter=Value({delimiter!r})` to preserve the previous "
+                "behaviour.",


charettes · 2024-08-04T15:56:01Z

django/contrib/postgres/aggregates/general.py

+        warnings.warn(
+            "The PostgreSQL specific StringAgg function is deprecated. Use "
+            "django.db.models.aggregate.StringAgg instead.",
+            category=RemovedInDjango60Warning,


I think we'll need stacklevel=1 at the very least?


charettes · 2024-08-04T15:59:46Z

django/db/models/expressions.py

+        elif isinstance(param, (BaseExpression, str, F)):
+            return cls(param)


Suggested change

-        elif isinstance(param, (BaseExpression, str, F)):
-            return cls(param)
+        elif isinstance(param, str) or hasattr(param, "resolve_expression"):
+            return cls(param)
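The suggested condition is duck typing: anything exposing resolve_expression (F, OrderBy, Func, subqueries, third-party expressions) is accepted, rather than only a fixed list of classes. A hypothetical normalizer illustrating the idea; the class name and error message are made up for this sketch.

```python
class OrderByListSketch:
    """Hypothetical container for normalized order_by arguments."""

    def __init__(self, *expressions):
        self.expressions = expressions

    @classmethod
    def from_param(cls, context, param):
        if param is None:
            return None
        # Strings are field references; anything resolvable is an expression.
        if isinstance(param, str) or hasattr(param, "resolve_expression"):
            return cls(param)
        if isinstance(param, (list, tuple)):
            return cls(*param)
        raise ValueError(
            f"{context} must be a field name, an expression, or a list or "
            f"tuple of them, not {param!r}."
        )


class CustomExpression:
    """Not a subclass of any known expression class, but resolvable, so accepted."""

    def resolve_expression(self, *args, **kwargs):
        return self
```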

charettes · 2024-08-04T16:05:57Z

tests/aggregation/tests.py

+            # Different engines treat null STRING_AGG differently, so excluding it for
+            # consistency.


Could you elaborate on that?

It seems that SQLite, Postgres, MySQL, and Oracle all return NULL, which should translate to None?

SQLite

MySQL

Postgres

Oracle

Did you possibly run into issues because of a lack of ordering on the returned value and the fact some backends default to NULLS LAST instead of NULLS FIRST?

If that's the case, you should use assertQuerySetEqual(..., ordered=False) or specify an order_by(OrderBy(F("agg"), nulls_last=True)).

I think it'd be good to include at least one test that ensures None is returned on an empty not-NULL set.

You are right, I was reading the test results wrong. I went with assertQuerySetEqual(..., ordered=False) to resolve this.
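Both points can be checked with the standard library's sqlite3 module: an aggregate over no rows, or over only NULLs, comes back as None, and comparing rows as a multiset sidesteps result-order differences, which is roughly what ordered=False does. Backends also disagree on NULL placement: SQLite sorts NULLs first in ascending order, while PostgreSQL sorts them last. The table and data below are invented for illustration.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (publisher TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO book (publisher, name) VALUES (?, ?)",
    [("p1", "a"), ("p2", None)],
)

# No matching rows: the aggregate is SQL NULL, which sqlite3 returns as None.
empty = conn.execute(
    "SELECT group_concat(name, ',') FROM book WHERE publisher = 'missing'"
).fetchone()[0]

# A group containing only NULLs also aggregates to NULL (None).
rows = conn.execute(
    "SELECT publisher, group_concat(name, ',') FROM book GROUP BY publisher"
).fetchall()

# Without ORDER BY, row order is not guaranteed, so compare as a multiset.
print(empty)  # None
print(Counter(rows) == Counter([("p1", "a"), ("p2", None)]))  # True
```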

charettes · 2024-08-04T16:06:15Z

tests/aggregation/tests.py

+        """
+        This test is based on tests taken from existing PostgreSQL specific tests and
+        kept to avoid regressions as StringAgg is ported to the shared database module.
+        """


Suggested change

-        """
-        This test is based on tests taken from existing PostgreSQL specific tests and
-        kept to avoid regressions as StringAgg is ported to the shared database module.
-        """


charettes · 2024-08-04T16:09:59Z

tests/postgres_tests/test_aggregates.py

+    def test_ordering_warns_of_deprecation(self):
+        msg = "The ordering argument is deprecated. Use order_by instead."
+        with self.assertWarnsMessage(RemovedInDjango60Warning, msg):
+            values = AggregateTestModel.objects.aggregate(
+                arrayagg=ArrayAgg("integer_field", ordering=F("integer_field").desc())
+            )
+            self.assertEqual(values, {"arrayagg": [2, 1, 0, 0]})


We'll also need tests for the creation of postgres.StringAgg.


charettes · 2024-08-09T18:14:56Z

tests/postgres_tests/test_aggregates.py

+        with self.assertWarnsMessage(RemovedInDjango60Warning, msg):
+            values = AggregateTestModel.objects.aggregate(
+                arrayagg=ArrayAgg("integer_field", ordering=F("integer_field").desc())
+            )
+            self.assertEqual(values, {"arrayagg": [2, 1, 0, 0]})


The following should ensure that the proper stacklevel is passed to warnings.warn.

Suggested change

-        with self.assertWarnsMessage(RemovedInDjango60Warning, msg):
-            values = AggregateTestModel.objects.aggregate(
-                arrayagg=ArrayAgg("integer_field", ordering=F("integer_field").desc())
-            )
-            self.assertEqual(values, {"arrayagg": [2, 1, 0, 0]})
+        with self.assertWarnsMessage(RemovedInDjango60Warning, msg) as ctx:
+            values = AggregateTestModel.objects.aggregate(
+                arrayagg=ArrayAgg("integer_field", ordering=F("integer_field").desc())
+            )
+            self.assertEqual(values, {"arrayagg": [2, 1, 0, 0]})
+        self.assertEqual(ctx.filename, __file__)
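The ctx.filename == __file__ assertion works because stacklevel decides which frame a warning is attributed to. With the default stacklevel=1 it points at the warnings.warn call itself, stacklevel=2 points at the function's caller, and every extra wrapper frame (for example a mixin __init__ reached through super()) needs one more level. A standalone sketch; legacy_aggregate and LegacyWarning are hypothetical.

```python
import inspect
import warnings


class LegacyWarning(DeprecationWarning):
    """Hypothetical warning category for this sketch."""


def legacy_aggregate(ordering=None, stacklevel=2):
    """Warn about a deprecated argument, attributed `stacklevel` frames up."""
    if ordering is not None:
        warnings.warn(
            "The ordering argument is deprecated. Use order_by instead.",
            LegacyWarning,
            stacklevel=stacklevel,
        )
    return ordering


def call_and_capture(stacklevel):
    """Return (line the warning reports, line of the calling statement)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        call_line = inspect.currentframe().f_lineno + 1
        legacy_aggregate(ordering="-name", stacklevel=stacklevel)
    return caught[0].lineno, call_line


print(call_and_capture(2))  # both numbers match: the caller's line
print(call_and_capture(1))  # first number points inside legacy_aggregate
```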

camuthig · 2024-08-15T02:57:28Z

@charettes I have updated the code to address all of your comments and get the linters passing. I will take a dive into the failing tests shortly. At least some of them appear to be because we have moved the connection.features.supports_aggregate_filter_clause into the AggregateFilter.as_sql function, but have historically allowed for this check to fail and convert it to a WHEN statement. Now, though, when converting to the WHEN, we are throwing the NotSupportedError again. If you have thoughts on how you would like to see that flow, I'm open to suggestions.

charettes · 2024-08-15T04:12:33Z

> At least some of them appear to be because we have moved the connection.features.supports_aggregate_filter_clause into the AggregateFilter.as_sql function, but have historically allowed for this check to fail and convert it to a WHEN statement. Now, though, when converting to the WHEN, we are throwing the NotSupportedError again. If you have thoughts on how you would like to see that flow, I'm open to suggestions.

The following should do it:

diff --git a/django/db/models/aggregates.py b/django/db/models/aggregates.py
index 6cf0bd9a60..d070944039 100644
--- a/django/db/models/aggregates.py
+++ b/django/db/models/aggregates.py
@@ -29,6 +29,10 @@ class AggregateFilter(Func):
     arity = 1
     template = " FILTER (WHERE %(expressions)s)"

+    @property
+    def condition(self):
+        return self.source_expressions[0]
+
     def as_sql(self, compiler, connection, **extra_context):
         if not connection.features.supports_aggregate_filter_clause:
             raise NotSupportedError
@@ -187,7 +191,7 @@ def as_sql(self, compiler, connection, **extra_context):
                 copy = self.copy()
                 copy.filter = None
                 source_expressions = copy.get_source_expressions()
-                condition = When(self.filter, then=source_expressions[0])
+                condition = When(self.filter.condition, then=source_expressions[0])
                 copy.set_source_expressions([Case(condition)] + source_expressions[1:])
                 return copy.as_sql(compiler, connection, **extra_context)

The problem was that we were building the fallback When with the AggregateFilter instead of its underlying condition: Q.
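The fallback is easy to see in plain SQL: FILTER (WHERE cond) restricts which rows feed the aggregate, and the CASE form gets the same effect by wrapping the first argument in CASE WHEN cond THEN arg END, so non-matching rows become NULL and are skipped. That is why When() needs the bare condition rather than the FILTER wrapper. Illustrated with sqlite3, which supports FILTER from 3.30.0 onward; the table and values are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE author (age INTEGER)")
conn.executemany("INSERT INTO author (age) VALUES (?)", [(30,), (45,), (60,)])

# Native aggregate filtering.
filter_sql = "SELECT SUM(age) FILTER (WHERE age > 40) FROM author"
# Fallback for backends without FILTER: the *condition* moves into a CASE WHEN
# around the first argument; unmatched rows yield NULL, which SUM ignores.
case_sql = "SELECT SUM(CASE WHEN age > 40 THEN age END) FROM author"

fallback = conn.execute(case_sql).fetchone()[0]
if sqlite3.sqlite_version_info >= (3, 30, 0):
    native = conn.execute(filter_sql).fetchone()[0]
else:
    native = fallback  # FILTER unavailable on this SQLite build

print(native, fallback)  # 105 105
```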

camuthig · 2024-08-16T19:37:15Z

@charettes thanks for the recommendation. It works great.

I spent time digging into the other failures in MySQL and determined that the MySQL implementation of GROUP_CONCAT doesn't actually play nicely with any of the other engine implementations. I have some code together that passed the aggregate tests on both MySQL and Postgres locally, but it is certainly messy and probably brittle.

First, it allows for multiple expressions, which we can ignore for now. The bigger issues are that MySQL doesn't support filtering on aggregates and that it orders the delimiter and ORDER BY clauses differently.

Sqlite: GROUP_CONCAT(expression, delimiter ORDER BY order_by)
MySQL: GROUP_CONCAT(expressions* ORDER BY order_by SEPARATOR delimiter)

So here, if you want to introduce a delimiter, you MUST have a different template, which is solvable. The bigger issue is that the order of ORDER BY and SEPARATOR matters: SEPARATOR delimiter ORDER BY order_by is invalid syntax. This required me to build the parameters manually in as_sql and force them into the right order depending on the engine.

This was further compounded by the fact that MySQL doesn't support aggregate filtering, so we fall back to a CASE statement. So if you have filter, order_by, and delimiter, you need a third ordering of parameters, with the filter parameters coming before even the expressions. 😞

Do you think there are some level of feature flags we can throw on MySQL to make this easier? Or maybe allow for engine-specific default separators to be ignored in our queries? I have a commit with some experimentation here: cbe5012

charettes · 2024-08-27T04:06:59Z

> I have a commit with some experimentation here: cbe5012

Thanks for pushing a commit demonstrating the scope of the problem.

Looking at it, my immediate reaction would be to avoid overcomplicating StringAgg.as_sql and favor encapsulating the logic entirely in as_mysql instead. In there you'll know that the FILTER clause is not usable, and from some local testing you will likely be able to simplify things quite a bit.
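To make the clause-ordering differences concrete, here is a hypothetical helper that assembles the SQL text per vendor. In the spirit of the as_mysql suggestion, the MySQL branch owns its quirks: ORDER BY must precede SEPARATOR, and without FILTER support the condition is folded into a CASE around the expression, which is also why its parameters end up first. Fragments are raw SQL strings here; a real compiler also has to keep the params list in the same order.

```python
def string_agg_sql(vendor, expression, delimiter, order_by=None, condition=None):
    """Hypothetical: assemble a string aggregate with each backend's clause order."""
    order = f" ORDER BY {order_by}" if order_by else ""
    if vendor == "mysql":
        # No FILTER clause on MySQL: fold the condition into the first argument.
        if condition:
            expression = f"CASE WHEN {condition} THEN {expression} END"
        # ORDER BY must come before SEPARATOR.
        return f"GROUP_CONCAT({expression}{order} SEPARATOR {delimiter})"
    filter_clause = f" FILTER (WHERE {condition})" if condition else ""
    if vendor == "sqlite":
        # Delimiter is a second argument; ORDER BY trails the last argument.
        return f"GROUP_CONCAT({expression}, {delimiter}{order}){filter_clause}"
    if vendor == "postgresql":
        return f"STRING_AGG({expression}, {delimiter}{order}){filter_clause}"
    raise ValueError(f"unsupported vendor: {vendor!r}")


print(string_agg_sql("mysql", "name", "','", order_by="name", condition="rating > 4"))
```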

> Or maybe allow for engine-specific default separators to be ignored in our queries?

I'm not sure I'm understanding what you mean here.

Now that the MySQL tests are passing I'll give Oracle a test run as well.

charettes · 2024-08-27T04:07:06Z

buildbot, test on oracle.

camuthig · 2024-09-02T21:27:14Z

@charettes do you have recommendations on how to test the Oracle engine myself? I am having trouble with the Oracle Docker images, maybe because I am on an M1 Mac? I'm not sure. It looks like a pretty common error, so it seems some change I have made at a base level is causing the compiled queries to be incorrect, but I can't really tell why without running against the database.

charettes · 2024-09-02T22:33:15Z

@camuthig I usually try two approaches on my M1:

1. I had success once I ran brew install colima and tried again.
2. A low-tech approach is to generate the raw SQL locally and rely on tools such as dbfiddle.

I really wish CI ran with --debug-sql at least so we could peek at the generated SQL.

Any suggestions @felixxm?

charettes · 2024-09-02T23:33:52Z

I managed to get the tests running on Oracle; here's the non-localized error message:

======================================================================
ERROR: test_annotate_values_list (aggregation.tests.AggregateTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/django/source/django/db/backends/utils.py", line 105, in _execute
    return self.cursor.execute(sql, params)
  File "/django/source/django/db/backends/oracle/base.py", line 577, in execute
    return self.cursor.execute(query, self._param_generator(params))
  File "/usr/local/lib/python3.10/site-packages/oracledb/cursor.py", line 710, in execute
    impl.execute(self)
  File "src/oracledb/impl/thin/cursor.pyx", line 196, in oracledb.thin_impl.ThinCursorImpl.execute
  File "src/oracledb/impl/thin/protocol.pyx", line 440, in oracledb.thin_impl.Protocol._process_single_message
  File "src/oracledb/impl/thin/protocol.pyx", line 441, in oracledb.thin_impl.Protocol._process_single_message
  File "src/oracledb/impl/thin/protocol.pyx", line 433, in oracledb.thin_impl.Protocol._process_message
  File "src/oracledb/impl/thin/messages.pyx", line 74, in oracledb.thin_impl.Message._check_and_raise_exception
oracledb.exceptions.DatabaseError: ORA-22848: cannot use NCLOB type as comparison key
Help: https://docs.oracle.com/error-help/db/ora-22848/

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/django/source/tests/aggregation/tests.py", line 939, in test_annotate_values_list
    self.assertEqual(list(books), [(self.b1.id, "159059725", 34.5)])
  File "/django/source/django/db/models/query.py", line 381, in __iter__
    self._fetch_all()
  File "/django/source/django/db/models/query.py", line 1909, in _fetch_all
    self._result_cache = list(self._iterable_class(self))
  File "/django/source/django/db/models/query.py", line 229, in __iter__
    return compiler.results_iter(
  File "/django/source/django/db/models/sql/compiler.py", line 1536, in results_iter
    results = self.execute_sql(
  File "/django/source/django/db/models/sql/compiler.py", line 1585, in execute_sql
    cursor.execute(sql, params)
  File "/django/source/django/db/backends/utils.py", line 122, in execute
    return super().execute(sql, params)
  File "/django/source/django/db/backends/utils.py", line 79, in execute
    return self._execute_with_wrappers(
  File "/django/source/django/db/backends/utils.py", line 92, in _execute_with_wrappers
    return executor(sql, params, many, context)
  File "/django/source/django/db/backends/utils.py", line 100, in _execute
    with self.db.wrap_database_errors:
  File "/django/source/django/db/utils.py", line 91, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/django/source/django/db/backends/utils.py", line 105, in _execute
    return self.cursor.execute(sql, params)
  File "/django/source/django/db/backends/oracle/base.py", line 577, in execute
    return self.cursor.execute(query, self._param_generator(params))
  File "/usr/local/lib/python3.10/site-packages/oracledb/cursor.py", line 710, in execute
    impl.execute(self)
  File "src/oracledb/impl/thin/cursor.pyx", line 196, in oracledb.thin_impl.ThinCursorImpl.execute
  File "src/oracledb/impl/thin/protocol.pyx", line 440, in oracledb.thin_impl.Protocol._process_single_message
  File "src/oracledb/impl/thin/protocol.pyx", line 441, in oracledb.thin_impl.Protocol._process_single_message
  File "src/oracledb/impl/thin/protocol.pyx", line 433, in oracledb.thin_impl.Protocol._process_message
  File "src/oracledb/impl/thin/messages.pyx", line 74, in oracledb.thin_impl.Message._check_and_raise_exception
django.db.utils.DatabaseError: ORA-22848: cannot use NCLOB type as comparison key
Help: https://docs.oracle.com/error-help/db/ora-22848/

Executed SQL

SELECT "AGGREGATION_BOOK"."ID" AS "PK",
       "AGGREGATION_BOOK"."ISBN" AS "ISBN",
       AVG("AGGREGATION_AUTHOR"."AGE") AS "MEAN_AGE"
FROM "AGGREGATION_BOOK"
LEFT OUTER JOIN "AGGREGATION_BOOK_AUTHORS" ON ("AGGREGATION_BOOK"."ID" = "AGGREGATION_BOOK_AUTHORS"."BOOK_ID")
LEFT OUTER JOIN "AGGREGATION_AUTHOR" ON ("AGGREGATION_BOOK_AUTHORS"."AUTHOR_ID" = "AGGREGATION_AUTHOR"."ID")
WHERE "AGGREGATION_BOOK"."ID" = 1
GROUP BY "AGGREGATION_BOOK"."ID",
         "AGGREGATION_BOOK"."ISBN",
         "AGGREGATION_BOOK"."NAME",
         "AGGREGATION_BOOK"."PAGES",
         "AGGREGATION_BOOK"."RATING",
         "AGGREGATION_BOOK"."PRICE",
         "AGGREGATION_BOOK"."CONTACT_ID",
         "AGGREGATION_BOOK"."PUBLISHER_ID",
         "AGGREGATION_BOOK"."PUBDATE",
         "AGGREGATION_BOOK"."PRINT_INFO"

Executed SQL on main

SELECT "AGGREGATION_BOOK"."ID" AS "PK",
       "AGGREGATION_BOOK"."ISBN" AS "ISBN",
       AVG("AGGREGATION_AUTHOR"."AGE") AS "MEAN_AGE"
FROM "AGGREGATION_BOOK"
LEFT OUTER JOIN "AGGREGATION_BOOK_AUTHORS" ON ("AGGREGATION_BOOK"."ID" = "AGGREGATION_BOOK_AUTHORS"."BOOK_ID")
LEFT OUTER JOIN "AGGREGATION_AUTHOR" ON ("AGGREGATION_BOOK_AUTHORS"."AUTHOR_ID" = "AGGREGATION_AUTHOR"."ID")
WHERE "AGGREGATION_BOOK"."ID" = 21
GROUP BY "AGGREGATION_BOOK"."ID",
         "AGGREGATION_BOOK"."ISBN",
         "AGGREGATION_BOOK"."NAME",
         "AGGREGATION_BOOK"."PAGES",
         "AGGREGATION_BOOK"."RATING",
         "AGGREGATION_BOOK"."PRICE",
         "AGGREGATION_BOOK"."CONTACT_ID",
         "AGGREGATION_BOOK"."PUBLISHER_ID",
         "AGGREGATION_BOOK"."PUBDATE"

Seems like "AGGREGATION_BOOK"."PRINT_INFO" is included in the GROUP BY on your branch.

charettes

Oracle failures were due to your addition of the print_info field.

charettes · 2024-09-03T00:11:35Z

tests/aggregation/models.py

@@ -30,6 +30,7 @@ class Book(models.Model):
    contact = models.ForeignKey(Author, models.CASCADE, related_name="book_contact_set")
    publisher = models.ForeignKey(Publisher, models.CASCADE)
    pubdate = models.DateField()
+    print_info = models.JSONField(null=True, default=None)


@camuthig this is the source of all your Oracle test failures...

Django uses an NCLOB column to persist JSONField, and Oracle doesn't allow grouping by an NCLOB column.

I don't think the usage of a JSONField is mandatory to test this feature so I'd suggest using a different field type instead.

charettes · 2024-09-03T00:13:14Z

tests/aggregation/tests.py

+    @skipUnlessDBFeature("supports_json_field", "supports_aggregate_order_by_clause")
+    def test_string_agg_jsonfield_order_by(self):
+        values = Book.objects.aggregate(
+            stringagg=StringAgg(
+                KeyTextTransform("lang", "print_info"),
+                delimiter=Value(","),
+                order_by=KeyTextTransform("lang", "print_info"),


I guess this was ported from the Postgres tests.

I suggest creating a distinct model for this particular case instead.

felixxm · 2024-09-03T05:49:14Z

> @camuthig I usually try two approaches on my M1
>
> 1. I had success once I ran `brew install colima` and tried again
> 2. Low-tech approach is to attempt to generate the raw SQL locally and rely on tools such as [dbfiddle](https://dbfiddle.uk/Wh4zpMKJ).
>
> I really wish CI ran with --debug-sql at least so we could peek at the generated SQL
>
> Any suggestions @felixxm?

The "easiest" way is to use the VM provided by Oracle.

camuthig · 2025-01-20T05:35:25Z

tests/postgres_tests/test_aggregates.py

    def test_ordering_and_order_by_causes_error(self):
-        with self.assertWarns(RemovedInDjango61Warning):
+        with warnings.catch_warnings(record=True, action="always") as wm:


Things got a little weird here when I updated the warnings to use 7.0. The issue is that interacting with the class this way triggers both a 6.1 warning (for the order_by usage) and a 7.0 warning (for using the PostgreSQL StringAgg). I couldn't get assertWarns to work with both being raised, so I dug in and used the underlying warnings machinery directly.

Let me know if there is a better way to write this, and I will refactor it.

I think the way you did it is great, it's a peculiar case given it triggers two distinct warnings of different categories 👍
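For reference, a standalone version of the pattern: one call raising two deprecation warnings of different categories, both captured with warnings.catch_warnings(record=True). The action="always" keyword used in the diff requires Python 3.11+; calling warnings.simplefilter("always") inside the block is equivalent on older versions. The warning classes and legacy_string_agg are hypothetical.

```python
import warnings


class RemovedInNextVersionWarning(DeprecationWarning):
    """Hypothetical stand-in for the nearer deprecation category."""


class RemovedInLaterVersionWarning(PendingDeprecationWarning):
    """Hypothetical stand-in for the contrib.postgres StringAgg category."""


def legacy_string_agg(ordering):
    """Emit both deprecations the way a doubly deprecated call path would."""
    warnings.warn(
        "The PostgreSQL specific StringAgg function is deprecated.",
        RemovedInLaterVersionWarning,
        stacklevel=2,
    )
    warnings.warn(
        "The ordering argument is deprecated. Use order_by instead.",
        RemovedInNextVersionWarning,
        stacklevel=2,
    )
    return ordering


with warnings.catch_warnings(record=True) as caught:
    # "always" keeps either warning from being swallowed by the per-location registry.
    warnings.simplefilter("always")
    legacy_string_agg("-name")

categories = [w.category for w in caught]
print([c.__name__ for c in categories])
```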

camuthig · 2025-01-20T05:52:24Z

I think everything should be up to date and marked as deprecated as of 6.0 and to be removed in 7.0. I updated the documentation as requested as well.

I ran into testing issues around the layers of deprecation warnings. I have a working solution for it, but I'm not sure it is the right pattern.

sarahboyce · 2025-02-25T13:56:54Z

buildbot, test on oracle.

sarahboyce · 2025-02-26T13:27:02Z

@camuthig I have checked the deprecations and rebased 👍
We have some failures since this commit a76035e
For me, these go away if I do:

--- a/django/db/models/aggregates.py
+++ b/django/db/models/aggregates.py
@@ -169,7 +169,7 @@ class Aggregate(Func):
     @property
     def default_alias(self):
         expressions = [
-            expr for expr in self.get_source_expressions() if expr is not None
+            expr for expr in self.get_source_expressions() if expr
         ]
         if len(expressions) == 1 and hasattr(expressions[0], "name"):
             return "%s__%s" % (expressions[0].name, self.name.lower())
diff --git a/django/db/models/expressions.py b/django/db/models/expressions.py
index 68e5d2667e..b11147987a 100644
--- a/django/db/models/expressions.py
+++ b/django/db/models/expressions.py
@@ -298,7 +298,7 @@ class BaseExpression:
         source_expressions = [
             (
                 expr.resolve_expression(query, allow_joins, reuse, summarize)
-                if expr is not None
+                if expr
                 else None
             )
             for expr in c.get_source_expressions()

But I think @charettes might be best placed to advise here.

charettes · 2025-02-26T14:17:42Z

I should be able to have a look at it shortly. Looking at the test failures, I assume this is happening because some aggregate is stashing tuple instances in source_expressions, which shouldn't happen (only Resolvable | None should be allowed).

Checking `is None` is important because otherwise, if a resolvable whose __bool__ has side effects is passed, it will be evaluated. The classic example is QuerySet, which evaluates its query.
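The difference is easy to demonstrate without a database. LazyResults below is a hypothetical stand-in for QuerySet, whose __bool__ runs the query; the two resolver functions mirror the `if expr` and `if expr is not None` variants from the diff above.

```python
class LazyResults:
    """Hypothetical stand-in for a QuerySet: truth-testing triggers evaluation."""

    def __init__(self):
        self.evaluated = False

    def __bool__(self):
        # A real QuerySet would execute its SQL here to learn whether it's empty.
        self.evaluated = True
        return False

    def resolve_expression(self, *args, **kwargs):
        return self


def resolve_truthy(exprs):
    # Truthiness check: calls __bool__, and silently drops "empty" expressions.
    return [e.resolve_expression() if e else None for e in exprs]


def resolve_is_none(exprs):
    # Identity check: never calls __bool__ and only skips real None slots.
    return [e.resolve_expression() if e is not None else None for e in exprs]
```

With the truthiness variant, a lazy expression is both evaluated and replaced by None; the identity check leaves it untouched.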


charettes · 2025-02-26T19:02:23Z

django/db/models/aggregates.py

+        self.order_by = order_by
        self.default = default
+        self.order_by = order_by and AggregateOrderBy.from_param(
+            f"{self.__class__.__name__}.order_by", order_by
+        )


Something went wrong with the rebase here; we assign to self.order_by twice?

charettes · 2025-02-26T19:09:49Z

@sarahboyce I haven't figured out which commit these should be folded into, but here are the changes required to get the tests passing

diff --git a/django/contrib/postgres/aggregates/mixins.py b/django/contrib/postgres/aggregates/mixins.py
index a6849c3930..8d3e40177b 100644
--- a/django/contrib/postgres/aggregates/mixins.py
+++ b/django/contrib/postgres/aggregates/mixins.py
@@ -16,7 +16,7 @@ def __init__(self, *expressions, ordering=(), order_by=(), **extra):
             if order_by:
                 raise TypeError("Cannot specify both order_by and ordering.")
             order_by = ordering
-
+        order_by = order_by or None
         super().__init__(*expressions, order_by=order_by, **extra)


diff --git a/django/db/models/aggregates.py b/django/db/models/aggregates.py
index 8b644e3599..48e00aa946 100644
--- a/django/db/models/aggregates.py
+++ b/django/db/models/aggregates.py
@@ -95,7 +95,6 @@ def __init__(

         self.distinct = distinct
         self.filter = filter and AggregateFilter(filter)
-        self.order_by = order_by
         self.default = default
         self.order_by = order_by and AggregateOrderBy.from_param(
             f"{self.__class__.__name__}.order_by", order_by

charettes · 2025-02-26T19:18:58Z

We might also want to consider having OrderByList.from_param return None when passed an empty tuple or list, to prevent similar crashes when Aggregate(order_by=...) is used directly instead.

diff --git a/django/db/models/aggregates.py b/django/db/models/aggregates.py
index 8b644e3599..a791593508 100644
--- a/django/db/models/aggregates.py
+++ b/django/db/models/aggregates.py
@@ -95,9 +95,8 @@ def __init__(

         self.distinct = distinct
         self.filter = filter and AggregateFilter(filter)
-        self.order_by = order_by
         self.default = default
-        self.order_by = order_by and AggregateOrderBy.from_param(
+        self.order_by = AggregateOrderBy.from_param(
             f"{self.__class__.__name__}.order_by", order_by
         )
         super().__init__(*expressions, **extra)
diff --git a/django/db/models/expressions.py b/django/db/models/expressions.py
index 68e5d2667e..ced5557732 100644
--- a/django/db/models/expressions.py
+++ b/django/db/models/expressions.py
@@ -1466,7 +1466,11 @@ def __init__(self, *expressions, **extra):

     @classmethod
     def from_param(cls, context, param):
+        if param is None:
+            return None
         if isinstance(param, (list, tuple)):
+            if not param:
+                return None
             return cls(*param)
         elif isinstance(param, str) or hasattr(param, "resolve_expression"):
             return cls(param)
@@ -1937,8 +1941,7 @@ def __init__(
                 self.partition_by = (self.partition_by,)
             self.partition_by = ExpressionList(*self.partition_by)

-        if self.order_by is not None:
-            self.order_by = OrderByList.from_param("Window.order_by", self.order_by)
+        self.order_by = OrderByList.from_param("Window.order_by", self.order_by)
         super().__init__(output_field=output_field)
         self.source_expression = self._parse_expressions(expression)[0]
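Folding the empty-value handling into `from_param()` itself means every caller gets the same normalization. A minimal sketch of that shape follows. The class is a simplified, hypothetical stand-in, and the error message on the last branch is illustrative only, since the diff does not show Django's actual wording.

```python
class OrderByList:
    """Simplified, hypothetical stand-in for an ORDER BY expression list."""

    def __init__(self, *expressions):
        self.expressions = expressions

    @classmethod
    def from_param(cls, context, param):
        # None and empty sequences both mean "no ordering requested".
        if param is None:
            return None
        if isinstance(param, (list, tuple)):
            if not param:
                return None
            return cls(*param)
        # A single string reference or expression-like object.
        if isinstance(param, str) or hasattr(param, "resolve_expression"):
            return cls(param)
        # Illustrative message only; not Django's wording.
        raise ValueError(
            f"{context} must be a string, an expression, or a list or tuple of them."
        )
```

With this normalization in one place, callers no longer need their own guards, which is what lets the `Aggregate.__init__` and `Window.__init__` hunks drop the `order_by and ...` and `is not None` checks.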

This moves the behaviors of `order_by` used in Postgres aggregates into the `Aggregate` class. This allows for creating aggregate functions that support this behavior across all database engines. This is shown by moving the `StringAgg` class into the shared `aggregates` module and adding support for all databases. The Postgres `StringAgg` class is now a thin wrapper on the new shared `StringAgg` class. Thank you Simon Charette for the review.
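Part of why generic support needs per-backend handling is that the SQL for an ordered string aggregate differs between databases. The renderer below is a hypothetical illustration of those dialect differences, not Django's `StringAgg.as_sql()`. It assumes SQLite 3.44.0 or later for `ORDER BY` inside an aggregate call, and it leaves Oracle out.

```python
def render_string_agg(vendor, expr, delimiter, order_by=None):
    """Hypothetical renderer for an ordered string aggregate per SQL dialect.

    `expr`, `delimiter`, and `order_by` are already-compiled SQL fragments.
    """
    order_sql = f" ORDER BY {order_by}" if order_by else ""
    if vendor == "postgresql":
        # The ordering goes after the delimiter argument.
        return f"STRING_AGG({expr}, {delimiter}{order_sql})"
    if vendor == "sqlite":
        # ORDER BY inside aggregate arguments requires SQLite 3.44.0+.
        return f"GROUP_CONCAT({expr}, {delimiter}{order_sql})"
    if vendor == "mysql":
        # The ordering goes before SEPARATOR rather than after the delimiter.
        return f"GROUP_CONCAT({expr}{order_sql} SEPARATOR {delimiter})"
    raise ValueError(f"Unsupported vendor: {vendor}")
```

The MySQL branch shows why a single shared template is not enough: the delimiter is not a positional argument there, so the ordering clause has to be placed differently.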

sarahboyce · 2025-03-03T08:11:03Z

buildbot, test on oracle.

sarahboyce

Thank you! Great job on this ⭐

charettes · 2025-03-03T12:47:31Z

Thank you for sticking around and seeing this through @camuthig, that was quite the 10-month adventure. Hopefully it was a valuable experience for you!

camuthig · 2025-03-04T04:33:33Z

Woo! We did it!

Thanks for all the help along the way, @charettes. It's definitely been a good experience for me. I learned a lot about how the ORM works behind the scenes, which was my own goal. Now that this part is done, I look forward to moving some of the other Postgres-specific functions over to general support: at least ArrayAgg, and maybe some of the JSON aggregates too.

charettes reviewed Jul 15, 2024

charettes reviewed Jul 22, 2024

django/contrib/postgres/aggregates/general.py

camuthig force-pushed the merge-orderable-agg-mixin branch 3 times, most recently from 636e617 to 3280e3e on August 3, 2024 04:12

charettes reviewed Aug 4, 2024

charettes reviewed Aug 9, 2024

camuthig force-pushed the merge-orderable-agg-mixin branch 6 times, most recently from 333e523 to 7f8c533 on August 15, 2024 02:16

camuthig force-pushed the merge-orderable-agg-mixin branch 3 times, most recently from 3f5a19c to 26fa18a on August 15, 2024 15:53

camuthig force-pushed the merge-orderable-agg-mixin branch from 34c8fe0 to cbe5012 on August 16, 2024 20:56

charettes reviewed Sep 3, 2024

camuthig force-pushed the merge-orderable-agg-mixin branch 3 times, most recently from 2322b98 to ca4efc0 on January 20, 2025 05:32

camuthig commented Jan 20, 2025

camuthig force-pushed the merge-orderable-agg-mixin branch from ca4efc0 to a8ab0e8 on January 20, 2025 05:41

sarahboyce force-pushed the merge-orderable-agg-mixin branch from a8ab0e8 to 0b586c4 on February 25, 2025 13:56

sarahboyce force-pushed the merge-orderable-agg-mixin branch 2 times, most recently from 8115db1 to 5f75fc9 on February 25, 2025 14:15

charettes reviewed Feb 26, 2025

django/contrib/postgres/aggregates/mixins.py

charettes reviewed Feb 26, 2025

sarahboyce force-pushed the merge-orderable-agg-mixin branch 2 times, most recently from f00f27a to 20c8a9b on February 27, 2025 07:26

sarahboyce force-pushed the merge-orderable-agg-mixin branch from 20c8a9b to 6f8a3e4 on March 3, 2025 08:10

Refs #35444 -- Deprecated contrib.postgres.OrderableAggMixin.

7a7aabf

This commit does not create any functional changes, but marks the existing `OrderableAggMixin` class as deprecated so that developers using it directly can be made aware of its future removal.
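One way to surface a deprecation for a mixin that developers subclass directly, without changing its behavior, is to warn from `__init_subclass__`, which runs whenever a subclass is defined. This is a sketch under that assumption: `LegacyOrderableMixin` is a hypothetical name, and the commit may use a different hook or warning class.

```python
import warnings


class LegacyOrderableMixin:
    """Hypothetical deprecated mixin; behavior is intentionally unchanged."""

    def __init_subclass__(cls, **kwargs):
        # Runs at class-definition time for every subclass, direct or indirect.
        warnings.warn(
            f"{cls.__name__} inherits from LegacyOrderableMixin, which is "
            "deprecated. Pass order_by to Aggregate instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        super().__init_subclass__(**kwargs)


def define_subclass():
    # Capture the warning emitted while the class statement executes.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")

        class MyAgg(LegacyOrderableMixin):
            pass

    return MyAgg, caught
```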

sarahboyce force-pushed the merge-orderable-agg-mixin branch from 6f8a3e4 to 7a7aabf on March 3, 2025 09:54

sarahboyce approved these changes Mar 3, 2025

sarahboyce merged commit 1759c1d into django:main Mar 3, 2025
32 checks passed

camuthig deleted the merge-orderable-agg-mixin branch March 4, 2025 04:33

This was referenced Apr 26, 2025

Fixed aggregation tests crash on databases that don't support JSONFields. #19422

Merged

Added configuration for using specific SQLite versions. django/django-docker-box#50

Open

def as_sql(self, compiler, connection, **extra_context):
    return super().as_sql(compiler, connection, **extra_context)

"String delimiters will be converted to F statements instead of Value"
"statements. Explicit Value instances should be used instead.",

-                "String delimiters will be converted to F statements instead of Value"
-                "statements. Explicit Value instances should be used instead.",
+                "delimiter: str will be resolved as a field reference instead of a string literal "
+                f"on Django 6.0. Pass `delimiter=Value({delimiter!r})` to preserve the previous "
+                "behaviour.",
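The suggested message relies on implicit concatenation of adjacent string literals, which inserts no separator, so every fragment except the last needs its own trailing space. Below is a hypothetical helper showing the intended behavior with the spacing fixed; `Value` is a minimal stand-in, not `django.db.models.Value`, and `normalize_delimiter` is an illustrative name.

```python
import warnings


class Value:
    """Hypothetical stand-in for a literal SQL value wrapper."""

    def __init__(self, value):
        self.value = value


def normalize_delimiter(delimiter):
    """Hypothetical helper: wrap a plain str delimiter in Value and warn."""
    if isinstance(delimiter, str):
        warnings.warn(
            # Adjacent literals are joined with no separator, so each
            # fragment except the last ends with a space.
            "delimiter: str will be resolved as a field reference instead of a "
            "string literal on Django 6.0. Pass "
            f"`delimiter=Value({delimiter!r})` to preserve the previous "
            "behaviour.",
            DeprecationWarning,
            stacklevel=2,
        )
        return Value(delimiter)
    # Explicit Value (or other expression) instances pass through unchanged.
    return delimiter
```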

elif isinstance(param, (BaseExpression, str, F)):
    return cls(param)

# Different engines treat null STRING_AGG differently, so exclude it for
# consistency.
