Description
🐛 Describe the bug
I am noticing an accuracy difference when training with `torch.compile`. To narrow down the issue, I ran an ablation with the `eager`, `aot_eager`, and `inductor` backends and observed that the numerics diverge when using `backend=inductor`.
It looks like this is happening inside Inductor. To reproduce it, I added `repro.py` and `minified_launcher.py`. Let me know if you need any other information that would help reproduce this.
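For reference, the backend ablation described above can be sketched roughly as follows. This is only a minimal illustration, not the actual training code: the toy model, the input shape, and the helper name `compare_backends` are placeholders (assumptions), and the real repro lives in the linked gists. It compares each compiled backend's forward output against the uncompiled eager output.

```python
import copy

import torch
import torch._dynamo
import torch.nn as nn


def compare_backends(model, example_inputs,
                     backends=("eager", "aot_eager", "inductor")):
    """Return {backend: max abs difference vs. the uncompiled eager output}.

    Hypothetical helper for illustration; it is not part of PyTorch.
    """
    model = model.eval()
    with torch.no_grad():
        reference = model(*example_inputs)

    diffs = {}
    for backend in backends:
        # Drop cached graphs so every backend compiles from scratch.
        torch._dynamo.reset()
        compiled = torch.compile(copy.deepcopy(model), backend=backend)
        with torch.no_grad():
            out = compiled(*example_inputs)
        diffs[backend] = (out - reference).abs().max().item()
    return diffs


if __name__ == "__main__":
    torch.manual_seed(0)
    # Placeholder model and input (assumption), not the model from this issue.
    toy = nn.Sequential(nn.Linear(16, 32), nn.GELU(), nn.Linear(32, 4))
    x = torch.randn(8, 16)
    for backend, diff in compare_backends(toy, (x,)).items():
        print(f"{backend:>10}: max abs diff vs eager = {diff:.3e}")
```

If `eager` and `aot_eager` stay close to the reference while `inductor` does not, that points at Inductor codegen rather than at Dynamo tracing or AOTAutograd, which is the pattern I am seeing.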
Error logs
No response
Versions
I ran the repro minifier, and the scripts are available here:
repro.py - https://gist.github.com/naveenkumarmarri/0f66d89695e56840a06c7a37dccca83f
minifier.py - https://gist.github.com/naveenkumarmarri/8fa35e72e3210a6c6b13548d9ab73df6
cc @chauhang @penguinwu @voznesenskym @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @aakhundov