-
Notifications
You must be signed in to change notification settings - Fork 3.4k
Description
Description
Problem Description
Currently, when eager loading HasMany or BelongsToMany associations using contain(), CakePHP typically employs a "two-query" strategy (default):
Query 1: Fetches the main records (e.g., Articles).
Query 2: Fetches the associated records (e.g., Comments) using the IDs collected from the first result set, typically via a WHERE foreign_key IN (1, 2, 3...) clause.
While this effectively prevents the N+1 problem, the WHERE IN approach has limitations when working with large datasets:
- Packet Size Limits: If the main query returns thousands of rows, the list of IDs passed to the IN clause can hit the max_allowed_packet limit of the database.
- Parsing Overhead: The database engine has to parse a massive SQL string containing thousands of integers.
- Round-trip Data: We are fetching IDs to PHP memory just to send them immediately back to the database driver.
Proposed Solution:
I propose introducing a loading strategy that utilizes Common Table Expressions (CTEs) (available in MySQL 8.0, PostgreSQL, SQLite, and SQL Server) to handle the association filtering entirely within the database engine.
Instead of passing a PHP array of IDs to the second query, the ORM could generate a CTE based on the constraints of the primary query.
Example Scenario
Context: Articles hasMany Comments.
Current Behavior:
SQL
-- Query 1
SELECT id, title FROM articles WHERE published = 1;-- PHP extracts IDs: [100, 101, 102 ... 5000]
-- Query 2
SELECT * FROM comments WHERE article_id IN (100, 101, 102 ... 5000);Proposed Behavior (Using CTE):
The second query could look like this:
WITH TargetArticles AS (
SELECT id FROM articles WHERE published = 1
)
SELECT comments.*
FROM comments
INNER JOIN TargetArticles ON comments.article_id = TargetArticles.id;Benefits
- Performance: significantly faster for large result sets as it avoids parsing massive IN lists.
- Memory Efficiency: Reduces PHP memory usage by not having to build giant arrays of IDs for query binding.
- Scalability: Removes the risk of hitting query length/packet size limits.
Implementation Considerations:
This would likely need to be an opt-in strategy (e.g., 'strategy' => 'cte') or automatically detected based on DB driver capabilities, as older database versions (like MySQL 5.7) do not support CTEs.
Since CakePHP 4/5 requires modern database versions that generally support CTEs, this could eventually become a standard standard approach for HasMany loading.
I would love to hear the core team's thoughts on the feasibility of adding this to the Query Builder/ORM.
CakePHP Version
6.0