Commit 6f835a4
[amd] fix tunableop gemm (#153764)
Summary: TunableOp on AMD has had a perf regression for a while. It turns out that the TunableOp code path first runs the tuned GEMM and then also runs the heuristic GEMM (so it runs two GEMMs...)....
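The double-GEMM symptom described in the summary can be illustrated with a minimal sketch. This is not the actual PyTorch code, and the contents of the changed file are not reproduced here. The names `GemmBackend`, `gemm_buggy`, and `gemm_fixed` are hypothetical; the sketch assumes the regression came from the tuned path falling through into the heuristic path instead of returning early.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain reference matrix multiply, standing in for a real GEMM kernel."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


class GemmBackend:
    """Hypothetical backend that records which GEMM paths were executed."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def tuned(self, a: Matrix, b: Matrix) -> Matrix:
        self.calls.append("tuned")
        return matmul(a, b)

    def heuristic(self, a: Matrix, b: Matrix) -> Matrix:
        self.calls.append("heuristic")
        return matmul(a, b)


def gemm_buggy(backend: GemmBackend, a: Matrix, b: Matrix, tunable: bool) -> Matrix:
    """Assumed shape of the bug: the tuned path does not exclude the heuristic path."""
    if tunable:
        backend.tuned(a, b)
    # Falls through: the heuristic GEMM runs even when the tuned GEMM already ran.
    return backend.heuristic(a, b)


def gemm_fixed(backend: GemmBackend, a: Matrix, b: Matrix, tunable: bool) -> Matrix:
    """Assumed shape of the fix: exactly one GEMM runs per call."""
    if tunable:
        return backend.tuned(a, b)
    return backend.heuristic(a, b)


def paths_taken(gemm, tunable: bool) -> List[str]:
    """Run one GEMM call and report which backend paths were executed."""
    backend = GemmBackend()
    gemm(backend, [[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]], tunable)
    return backend.calls
```

In this sketch, `paths_taken(gemm_buggy, True)` records both the tuned and the heuristic call, while `paths_taken(gemm_fixed, True)` records only the tuned call. The result is numerically the same either way, which would explain why such a bug shows up as a perf regression rather than a correctness failure.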
Test Plan:
```
CUDA_VISIBLE_DEVICES=0 buck test @//mode/opt-amd-gpu -c fbcode.rocm_arch=mi300 -c fbcode.rocm_ck_rtz=true fbcode//accelerators/workloads/microbench/RE:test_emu_v1p4 -- --exact 'accelerators/workloads/microbench/RE:test_emu_v1p4 - test_gemm (accelerators.workloads.microbench.RE.test_emu_v1p4.EMUv1p4PerfTest)' --run-disabled
```
Before the diff
```
File "/data/users/mxz/fbsource/buck-out/v2/gen/fbcode/ecc11ed52295855f/accelerators/workloads/microbench/RE/__test_emu_v1p4__/test_emu_v1p4#link-tree/accelerators/workloads/microbench/RE/test_emu_v1p4.py", line 47, in test_gemm
self.assertTrue(result < AMD_GEMM_BASELINE * AMD_GEMM_THRESHOLD)
Buck UI: https://www.internalfb.com/buck2/b4b8dfca-0301-4c5d-83d6-d866d840c42d
Test UI: https://www.internalfb.com/intern/testinfra/testrun/14355223896396807
Network: Up: 10MiB Down: 1.9GiB (reSessionID-23b213fe-a460-4788-86c6-a52343ff10f4)
Loading targets. Remaining 0/5144 93161 dirs read, 753263 targets declared
Analyzing targets. Remaining 0/70523 2837379 actions, 3262810 artifacts declared
Executing actions. Remaining 0/472286 217:26:58.1s exec time total
Command: test. Finished 122 local, 522 remote, 199785 cache (99% hit) 211:26:30.5s exec time cached (97%)
Time elapsed: 12:50.2s
Test execution completed but the tests failed
Tests finished: Pass 0. Fail 1. Fatal 0. Skip 0. Build failure 0
1 TESTS FAILED
✗ accelerators/workloads/microbench/RE:test_emu_v1p4 - test_gemm (accelerators.workloads.microbench.RE.test_emu_v1p4.EMUv1p4PerfTest)
Run $ fdb buck test <args> to debug accelerators/workloads/microbench/RE:test_emu_v1p4 - test_gemm (accelerators.workloads.microbench.RE.test_emu_v1p4.EMUv1p4PerfTest)
^^^ just prefix your previous command! ($ fdb !!)
Learn more at https://fburl.com/fdb
```
After the diff
```
Tests finished: Pass 1. Fail 0. Fatal 0. Skip 0. Build failure 0
```
Reviewed By: henryoier, henryhu6
Differential Revision: D74910115
Pull Request resolved: #153764
Approved by: https://github.com/yangsiyu007, https://github.com/xw285cornell
1 parent 2ade886 commit 6f835a4
1 file changed: 4 additions, 4 deletions.
Hunk 1: original lines 470-472 removed; new lines 470-471 added.
Hunk 2: original line 489 removed; new lines 488-489 added.