Update base for Update on "Inductor Tiling Rewrite" · pytorch/pytorch@47dfae2 · GitHub

Commit 47dfae2

Update base for Update on "Inductor Tiling Rewrite"
Fix for #149982.

Summary:

This PR does two main things:

1. Rewrites the tiling heuristics. The previous tiling heuristic had each dependency generate its own tiling; we then summed the scores of the generated tilings, preferring any 2d tiling over the default. The new tiling heuristic scores each candidate tiling by its global coalesced memory. This can give a better tiling (especially for more complicated, 3d patterns) and also provides information we can use when generating block sizes.
2. Analyses memory dependencies for accesses that would become coalesced with additional tiling.

The motivating kernel, from #149982, is a 32-element reduction. A smaller version of it is [here](https://gist.github.com/eellison/0fa9396f5479eb4dba09756e3bf6ff2a). We need to run this kernel once per linear layer in the forward pass on a contiguous tensor, and once in the backward pass on a transposed tensor. The contiguous kernel has coalesced accesses and is already performant on main, but the transposed version accesses memory in an uncoalesced pattern and is ~2.8x slower. See this [full log](https://gist.github.com/eellison/fa644bfd9d0ae11dadb62e17a5d48a83) from the above repro. With this PR, it is only ~1.15x slower; see the [updated log](https://gist.github.com/eellison/0b2b653309494d28cf7b48929a022075).

We analyse memory addresses that are not coalesced by any iteration variable. For the following dependency:

`(((32*n0 + n1)//2048)) + 4096*(ModularIndexing(32*n0 + n1, 1, 2048))`

we infer that tiling `n0` by 64 makes the first term coalesced.

I'm sure there are still some CI failures to debug.

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov vkuzo

[ghstack-poisoned]
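The `n0`-by-64 inference can be checked numerically. The sketch below is not Inductor's implementation; it simply brute-forces address strides in plain Python. It assumes that `n1` ranges over [0, 32) (the 32-element reduction) and that `ModularIndexing(x, 1, 2048)` denotes `(x // 1) % 2048`. The names `index`, `tiled_index`, `stride`, `N1`, `DIVISOR` and `TILE` are hypothetical and exist only for this illustration.

```python
# Sketch, not Inductor's implementation: brute-force address strides for the
# dependency quoted above. Assumes n1 in [0, 32) and that
# ModularIndexing(x, 1, 2048) == (x // 1) % 2048.

N1 = 32               # assumed extent of the reduction variable n1
DIVISOR = 2048        # divisor in the first term, (32*n0 + n1) // 2048
TILE = DIVISOR // 32  # n0 steps per increment of the first term


def index(n0: int, n1: int) -> int:
    """Address of the dependency as a function of the iteration variables."""
    x = 32 * n0 + n1
    return x // DIVISOR + 4096 * (x % DIVISOR)


def tiled_index(t: int, r: int, n1: int) -> int:
    """Same address after splitting n0 = TILE * t + r, with 0 <= r < TILE."""
    return index(TILE * t + r, n1)


def stride(fn, args, pos):
    """Change in address when argument `pos` is incremented by one."""
    bumped = list(args)
    bumped[pos] += 1
    return fn(*bumped) - fn(*args)


# Before tiling: neither n0 nor n1 moves through memory with stride 1.
n0_stride = stride(index, (0, 0), 0)
n1_stride = stride(index, (0, 0), 1)
print(n0_stride, n1_stride)  # -> 131072 4096

# After tiling: the outer tile variable t has stride 1 at every sampled point.
t_strides = {
    stride(tiled_index, (t, r, n1), 0)
    for t in range(4)
    for r in range(TILE)
    for n1 in range(N1)
}
print(TILE, t_strides)  # -> 64 {1}
```

Under the `n1 < 32` assumption, substituting `n0 = 64*t + r` gives `32*n0 + n1 = 2048*t + (32*r + n1)` with `32*r + n1 <= 2047`, so the first term reduces to `t` and the address becomes `t + 4096*(32*r + n1)`. The tile size 64 is just `2048 // 32`, the number of `n0` steps between increments of the first term.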
1 parent bb06172 commit 47dfae2

0 files changed (+0, -0 lines)