8000 Reflect index's column operator class on PostgreSQL by dlax · Pull Request #12504 · sqlalchemy/sqlalchemy · GitHub
[go: up one dir, main page]

Skip to content

Reflect index's column operator class on PostgreSQL #12504

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

dlax
Copy link
Contributor
@dlax dlax commented Apr 7, 2025

Fill the postgresql_ops key of PostgreSQL's dialect_options returned by get_multi_indexes() with a mapping from column names to the operator class, if it's not the default for respective data type.

As we need to join on pg_catalog.pg_opclass, the table definition is added to postgresql.pg_catalog.

Fixes #8664.

@dlax dlax marked this pull request as ready for review April 7, 2025 14:11
@CaselIT
Copy link
Member
CaselIT commented Apr 7, 2025

Have yet to look, but it looks good.

Are there noticeable performances differences?
You can use the script https://github.com/sqlalchemy/sqlalchemy/blob/main/test/perf/many_table_reflection.py

@dlax
Copy link
Contributor Author
dlax commented Apr 8, 2025

Are there noticeable performances differences?

On main, commit d0bc9b2:

$ python test/perf/many_table_reflection.py --db 'postgresql+psycopg:///sqlaperf' --test indexes --sqlstats
Generating 250 using engine Engine(postgresql+psycopg:///sqlaperf) in schema default
Meta info {'tables': 250, 'columns': 32564, 'index': 959, 'check_con': 84, 'foreign_keys_con': 336, 'unique_con': 945, 'identity': 29}
Running create_tables ...... done
Reflected table number 250 in schema default
starting stats for subsequent tests
Running reflect_indexes ...... done
Running drop_tables ...... done
{'populate-meta': 0.7698001861572266,
 'create-tables': 3.912214756011963,
 'get_table_names': 0.012355327606201172,
 'get_indexes': 0.7168705463409424,
 'get_multi_indexes': 0.06203150749206543,
 'drop-tables': 0.7829418182373047}
--------------------------------------------------
total number of queries: 502. Total time 0.5161380767822266
--------------------------------------------------
Query times: total_t=0.3885214328765869, max_t=0.03788638114929199, min_t=0.001031637191772461, avg_t=0.0015478941548868004, median_t=0.0012700557708740234  Number of calls: 251
SELECT pg_catalog.pg_index.indrelid, pg_catalog.pg_class.relname, pg_catalog.pg_index.indisunique, pg_catalog.pg_constraint.conrelid IS NOT NULL AS has_constraint, pg_catalog.pg_index.indoption, pg_catalog.pg_class.reloptions, pg_catalog.pg_am.amname, CASE WHEN (pg_catalog.pg_index.indpred IS NOT NULL) THEN pg_catalog.pg_get_expr(pg_catalog.pg_index.indpred, pg_catalog.pg_index.indrelid) END AS filter_definition, pg_catalog.pg_index.indnkeyatts, pg_catalog.pg_index.indnullsnotdistinct, idx_cols.elements, idx_cols.elements_is_expr
FROM pg_catalog.pg_index JOIN pg_catalog.pg_class ON pg_catalog.pg_index.indexrelid = pg_catalog.pg_class.oid JOIN pg_catalog.pg_am ON pg_catalog.pg_class.relam = pg_catalog.pg_am.oid LEFT OUTER JOIN (SELECT idx_attr.indexrelid AS indexrelid, min(idx_attr.indrelid) AS min_1, array_agg(idx_attr.element ORDER BY idx_attr.ord) AS elements, array_agg(idx_attr.is_expr ORDER BY idx_attr.ord) AS elements_is_expr
FROM (SELECT idx.indexrelid AS indexrelid, idx.indrelid AS indrelid, idx.ord AS ord, CASE WHEN (idx.attnum = %(attnum_1)s::INTEGER) THEN pg_catalog.pg_get_indexdef(idx.indexrelid, idx.ord + %(ord_1)s::INTEGER, %(pg_get_indexdef_1)s) ELSE CAST(pg_catalog.pg_attribute.attname AS TEXT) END AS element, idx.attnum = %(attnum_2)s::INTEGER AS is_expr
FROM (SELECT pg_catalog.pg_index.indexrelid AS indexrelid, pg_catalog.pg_index.indrelid AS indrelid, unnest(pg_catalog.pg_index.indkey) AS attnum, generate_subscripts(pg_catalog.pg_index.indkey, %(generate_subscripts_1)s::INTEGER) AS ord
FROM pg_catalog.pg_index
WHERE NOT pg_catalog.pg_index.indisprimary AND pg_catalog.pg_index.indrelid IN (__[POSTCOMPILE_oids])) AS idx LEFT OUTER JOIN pg_catalog.pg_attribute ON pg_catalog.pg_attribute.attnum = idx.attnum AND pg_catalog.pg_attribute.attrelid = idx.indrelid
WHERE idx.indrelid IN (__[POSTCOMPILE_oids])) AS idx_attr GROUP BY idx_attr.indexrelid) AS idx_cols ON pg_catalog.pg_index.indexrelid = idx_cols.indexrelid LEFT OUTER JOIN pg_catalog.pg_constraint ON pg_catalog.pg_index.indrelid = pg_catalog.pg_constraint.conrelid AND pg_catalog.pg_index.indexrelid = pg_catalog.pg_constraint.conindid AND pg_catalog.pg_constraint.contype = ANY (ARRAY[%(param_1)s::VARCHAR, %(param_2)s::VARCHAR, %(param_3)s::VARCHAR])
WHERE pg_catalog.pg_index.indrelid IN (__[POSTCOMPILE_oids]) AND NOT pg_catalog.pg_index.indisprimary ORDER BY pg_catalog.pg_ind
8000
ex.indrelid, pg_catalog.pg_class.relname

Query times: total_t=0.12353086471557617, max_t=0.0009763240814208984, min_t=0.00039458274841308594, avg_t=0.0004941234588623047, median_t=0.0004544258117675781  Number of calls: 250
SELECT pg_catalog.pg_class.oid, pg_catalog.pg_class.relname
FROM pg_catalog.pg_class JOIN pg_catalog.pg_namespace ON pg_catalog.pg_namespace.oid = pg_catalog.pg_class.relnamespace
WHERE pg_catalog.pg_class.relkind = ANY (ARRAY[%(param_1)s::VARCHAR, %(param_2)s::VARCHAR, %(param_3)s::VARCHAR, %(param_4)s::VARCHAR, %(param_5)s::VARCHAR]) AND pg_catalog.pg_table_is_visible(pg_catalog.pg_class.oid) AND pg_catalog.pg_namespace.nspname != %(nspname_1)s::VARCHAR AND pg_catalog.pg_class.relname IN (__[POSTCOMPILE_filter_names])

Query times: total_t=0.0040857791900634766, max_t=0.0040857791900634766, min_t=0.0040857791900634766, avg_t=0.0040857791900634766, median_t=0.0040857791900634766  Number of calls: 1
SELECT pg_catalog.pg_class.oid, pg_catalog.pg_class.relname
FROM pg_catalog.pg_class JOIN pg_catalog.pg_namespace ON pg_catalog.pg_namespace.oid = pg_catalog.pg_class.relnamespace
WHERE pg_catalog.pg_class.relkind = ANY (ARRAY[%(param_1)s::VARCHAR, %(param_2)s::VARCHAR, %(param_3)s::VARCHAR]) AND pg_catalog.pg_class.relpersistence != %(relpersistence_1)s::VARCHAR AND pg_catalog.pg_table_is_visible(pg_catalog.pg_class.oid) AND pg_catalog.pg_namespace.nspname != %(nspname_1)s::VARCHAR

On this branch:

$ python test/perf/many_table_reflection.py --db 'postgresql+psycopg:///sqlaperf' --test indexes --sqlstats
Generating 250 using engine Engine(postgresql+psycopg:///sqlaperf) in schema default
Meta info {'tables': 250, 'columns': 32939, 'index': 984, 'check_con': 84, 'foreign_keys_con': 358, 'unique_con': 941, 'identity': 23}
Running create_tables ...... done
Reflected table number 250 in schema default
starting stats for subsequent tests
Running reflect_indexes ...... done
Running drop_tables ...... done
{'populate-meta': 0.7974131107330322,
 'create-tables': 4.5137574672698975,
 'get_table_names': 0.014792442321777344,
 'get_indexes': 1.0036499500274658,
 'get_multi_indexes': 0.048918724060058594,
 'drop-tables': 0.776090145111084}
--------------------------------------------------
total number of queries: 502. Total time 0.6737213134765625
--------------------------------------------------
Query times: total_t=0.5131850242614746, max_t=0.027825593948364258, min_t=0.0011928081512451172, avg_t=0.0020445618496473094, median_t=0.0017914772033691406  Number of calls: 251
SELECT pg_catalog.pg_index.indrelid, pg_catalog.pg_class.relname, pg_catalog.pg_index.indisunique, pg_catalog.pg_constraint.conrelid IS NOT NULL AS has_constraint, pg_catalog.pg_index.indoption, pg_catalog.pg_class.reloptions, pg_catalog.pg_am.amname, CASE WHEN (pg_catalog.pg_index.indpred IS NOT NULL) THEN pg_catalog.pg_get_expr(pg_catalog.pg_index.indpred, pg_catalog.pg_index.indrelid) END AS filter_definition, pg_catalog.pg_index.indnkeyatts, pg_catalog.pg_index.indnullsnotdistinct, idx_cols.elements_opclass, idx_cols.elements_opdefault, idx_cols.elements, idx_cols.elements_is_expr
FROM pg_catalog.pg_index JOIN pg_catalog.pg_class ON pg_catalog.pg_index.indexrelid = pg_catalog.pg_class.oid JOIN pg_catalog.pg_am ON pg_catalog.pg_class.relam = pg_catalog.pg_am.oid LEFT OUTER JOIN (SELECT idx_attr.indexrelid AS indexrelid, min(idx_attr.indrelid) AS min_1, array_agg(idx_attr.element ORDER BY idx_attr.ord) AS elements, array_agg(idx_attr.is_expr ORDER BY idx_attr.ord) AS elements_is_expr, array_agg(idx_attr.opcname ORDER BY idx_attr.ord) AS elements_opclass, array_agg(idx_attr.opcdefault ORDER BY idx_attr.ord) AS elements_opdefault
FROM (SELECT idx.indexrelid AS indexrelid, idx.indrelid AS indrelid, idx.ord AS ord, CASE WHEN (idx.attnum = %(attnum_1)s::INTEGER) THEN pg_catalog.pg_get_indexdef(idx.indexrelid, idx.ord + %(ord_1)s::INTEGER, %(pg_get_indexdef_1)s) ELSE CAST(pg_catalog.pg_attribute.attname AS TEXT) END AS element, idx.attnum = %(attnum_2)s::INTEGER AS is_expr, pg_catalog.pg_opclass.opcname AS opcname, pg_catalog.pg_opclass.opcdefault AS opcdefault
FROM (SELECT pg_catalog.pg_index.indexrelid AS indexrelid, pg_catalog.pg_index.indrelid AS indrelid, unnest(pg_catalog.pg_index.indkey) AS attnum, unnest(pg_catalog.pg_index.indclass) AS att_opclass, generate_subscripts(pg_catalog.pg_index.indkey, %(generate_subscripts_1)s::INTEGER) AS ord
FROM pg_catalog.pg_index
WHERE NOT pg_catalog.pg_index.indisprimary AND pg_catalog.pg_index.indrelid IN (__[POSTCOMPILE_oids])) AS idx LEFT OUTER JOIN pg_catalog.pg_attribute ON pg_catalog.pg_attribute.attnum = idx.attnum AND pg_catalog.pg_attribute.attrelid = idx.indrelid LEFT OUTER JOIN pg_catalog.pg_opclass ON pg_catalog.pg_opclass.oid = idx.att_opclass
WHERE idx.indrelid IN (__[POSTCOMPILE_oids])) AS idx_attr GROUP BY idx_attr.indexrelid) AS idx_cols ON pg_catalog.pg_index.indexrelid = idx_cols.indexrelid LEFT OUTER JOIN pg_catalog.pg_constraint ON pg_catalog.pg_index.indrelid = pg_catalog.pg_constraint.conrelid AND pg_catalog.pg_index.indexrelid = pg_catalog.pg_constraint.conindid AND pg_catalog.pg_constraint.contype = ANY (ARRAY[%(param_1)s::VARCHAR, %(param_2)s::VARCHAR, %(param_3)s::VARCHAR])
WHERE pg_catalog.pg_index.indrelid IN (__[POSTCOMPILE_oids]) AND NOT pg_catalog.pg_index.indisprimary ORDER BY pg_catalog.pg_index.indrelid, pg_catalog.pg_class.relname

Query times: total_t=0.15839672088623047, max_t=0.0021893978118896484, min_t=0.000438690185546875, avg_t=0.0006335868835449219, median_t=0.0005793571472167969  Number of calls: 250
SELECT pg_catalog.pg_class.oid, pg_catalog.pg_class.relname
FROM pg_catalog.pg_class JOIN pg_catalog.pg_namespace ON pg_catalog.pg_namespace.oid = pg_catalog.pg_class.relnamespace
WHERE pg_catalog.pg_class.relkind = ANY (ARRAY[%(param_1)s::VARCHAR, %(param_2)s::VARCHAR, %(param_3)s::VARCHAR, %(param_4)s::VARCHAR, %(param_5)s::VARCHAR]) AND pg_catalog.pg_table_is_visible(pg_catalog.pg_class.oid) AND pg_catalog.pg_namespace.nspname != %(nspname_1)s::VARCHAR AND pg_catalog.pg_class.relname IN (__[POSTCOMPILE_filter_names])

Query times: total_t=0.002139568328857422, max_t=0.002139568328857422, min_t=0.002139568328857422, avg_t=0.002139568328857422, median_t=0.002139568328857422  Number of calls: 1
SELECT pg_catalog.pg_class.oid, pg_catalog.pg_class.relname
FROM pg_catalog.pg_class JOIN pg_catalog.pg_namespace ON pg_catalog.pg_namespace.oid = pg_catalog.pg_class.relnamespace
WHERE pg_catalog.pg_class.relkind = ANY (ARRAY[%(param_1)s::VARCHAR, %(param_2)s::VARCHAR, %(param_3)s::VARCHAR]) AND pg_catalog.pg_class.relpersistence != %(relpersistence_1)s::VARCHAR AND pg_catalog.pg_table_is_visible(pg_catalog.pg_class.oid) AND pg_catalog.pg_namespace.nspname != %(nspname_1)s::VARCHAR

@CaselIT
Copy link
Member
CaselIT commented Apr 8, 2025

I'll look if it's possible to improve the time of the single getindex a bit, but overall it seems mostly ok

@dlax
Copy link
Contributor Author
dlax commented May 14, 2025

@CaselIT, is the performa 8000 nce impact really a (blocking) issue for this changeset? If so, I can take a look at how to improve this. Otherwise, perhaps we can move forwards and possibly improve the situation later if someone complains?

(I'd like to avoid keeping this open for too long. Thank you.)

@CaselIT
Copy link
Member
CaselIT commented May 14, 2025

Sorry this got sidelined. I'll try looking at it this week

@dlax
Copy link
Contributor Author
dlax commented May 23, 2025

Rebased and adjusted added test case so that it passes with PostgreSQL 9.6.

@CaselIT CaselIT requested a review from sqla-tester May 27, 2025 21:46
Copy link
Collaborator
@sqla-tester sqla-tester left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, this is sqla-tester setting up my work on behalf of CaselIT to try to get revision 0d65907 of this pull request into gerrit so we can run tests and reviews and stuff

@sqla-tester
Copy link
Collaborator

New Gerrit review created for change 0d65907: https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/5897

Fill the `postgresql_ops` key of PostgreSQL's `dialect_options` returned
by get_multi_indexes() with a mapping from column names to the operator
class, if it's not the default for respective data type.

As we need to join on ``pg_catalog.pg_opclass``, the table definition is
added to ``postgresql.pg_catalog``.

Fixes sqlalchemy#8664.
@dlax
Copy link
Contributor Author
dlax commented May 28, 2025

Hm, the last changes in tests were wrong. Pushed a fix here, sorry for the trouble.

@CaselIT CaselIT requested a review from sqla-tester May 28, 2025 19:37
Copy link
Collaborator
@sqla-tester sqla-tester left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, this is sqla-tester setting up my work on behalf of CaselIT to try to get revision 8fdf93e of this pull request into gerrit so we can run tests and reviews and stuff

@sqla-tester
Copy link
Collaborator

Patchset 8fdf93e added to existing Gerrit review https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/5897

@CaselIT
Copy link
Member
CaselIT commented May 28, 2025

@zzzeek I'm wondering if it could make sense to try and simplify a bit the query to cache the values of pg_am (what we already joined) and pg_opclass (that's joined now) since I believe that those values are ~ static and quite limited in number, so we could load them once per dialect / connection and keep it saved.

https://www.postgresql.org/docs/current/catalog-pg-am.html
https://www.postgresql.org/docs/current/cat 8000 alog-pg-opclass.html

@CaselIT
Copy link
Member
CaselIT commented May 28, 2025

doing some quick test looks interesting, but it would be follow on change, since the performance impact of this pr is limited.

@CaselIT
Copy link
Member
CaselIT commented May 28, 2025

done here https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/5899 seems easy enough and it should improve performance a bit on average

@sqla-tester
Copy link
Collaborator

Gerrit review https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/5897 has been merged. Congratulations! :)

@sqla-tester
Copy link
Collaborator

Gerrit review https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/5898 has been merged. Congratulations! :)

sqlalchemy-bot pushed a commit that referenced this pull request Jun 3, 2025
Fill the `postgresql_ops` key of PostgreSQL's `dialect_options` returned by get_multi_indexes() with a mapping from column names to the operator class, if it's not the default for respective data type.

As we need to join on ``pg_catalog.pg_opclass``, the table definition is added to ``postgresql.pg_catalog``.

Fixes #8664.

Closes: #12504
Pull-request: #12504
Pull-request-sha: 8fdf93e

Change-Id: I8789c1e9d15f8cc9a7205f492ec730570f19bbcc
(cherry picked from commit 0642541)
@dlax dlax deleted the issue8664 branch June 3, 2025 19:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Postgres: Reflection of indexes with operators
3 participants
0