-
Notifications
You must be signed in to change notification settings - Fork 24.8k
Commit 47dfae2
committed
Update base for Update on "Inductor Tiling Rewrite"
Fix for #149982.
Summary:
This PR does two main things:
1. Rewrites the tiling heuristics. The previous tiling heuristic would have each dependency generate a tiling. Then, we sum up the score for each generated tiling, preferring any 2d tiling over the default. The new tiling heuristics scores each tiling by its global coalesced memory. This gives both a potentially better tiling (especially for more complicated, 3d patterns) as well as information we can use in generating block sizes.
2. Analyses memory dependencies for accesses that would be coalesced with additional tiling. The motivating kernel is in #149982 which is a 32 element reduction. A smaller version of it is [here](https://gist.github.com/eellison/0fa9396f5479eb4dba09756e3bf6ff2a). We need to run this kernel once in the forward per linear layer on a contiguous tensor, and once in the backward on a transposed tensor.
While the contiguous kernel has coalesced accesses, and is performant on master, the transposed version accesses uncoalesced memory on main and is ~2.8x slower. See, this [full log](https://gist.github.com/eellison/fa644bfd9d0ae11dadb62e17a5d48a83) from the above repro. Now, with this PR, it is only ~1.15x slower. See the [updated log](https://gist.github.com/eellison/0b2b653309494d28cf7b48929a022075).
We analyse memory addresses that are not coalesced by any iteration variable. For this following dependency:
`(((32*n0 + n1)//2048)) + 4096*(ModularIndexing(32*n0 + n1, 1, 2048))` we infer that tiling `n0` by 64 makes the first term coalesced.
I'm sure there are still some CI failures to debug..
cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov vkuzo
cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov
[ghstack-poisoned]1 parent bb06172 commit 47dfae2Copy full SHA for 47dfae2
File tree
Expand file treeCollapse file tree
0 file changed
+0
-0
lines changedFilter options
Expand file treeCollapse file tree
0 file changed
+0
-0
lines changed
0 commit comments