[cutlass backend] Reduce log level for cutlass compilation error #153397


Closed · henrylhtsang wants to merge 5 commits
46 changes: 32 additions & 14 deletions torch/_inductor/select_algorithm.py
@@ -2076,14 +2076,15 @@ def precompile_with_captured_stdout(choice) -> tuple[None, int]:
             return None, elapsed_ns // 1000

         def on_complete(future):
-            _, precompile_elapsed_us = future.result()
-            elapsed_seconds = precompile_elapsed_us / 1e6
-            elapsed_times[future] = elapsed_seconds
-            log.debug(
-                "Precompilation complete for future: %s, elapsed time: %.02fs",
-                future,
-                elapsed_seconds,
-            )
+            if not future.exception():
+                _, precompile_elapsed_us = future.result()
+                elapsed_seconds = precompile_elapsed_us / 1e6
+                elapsed_times[future] = elapsed_seconds
+                log.debug(
+                    "Precompilation complete for future: %s, elapsed time: %.02fs",
+                    future,
+                    elapsed_seconds,
+                )

         executor = ThreadPoolExecutor(max_workers=num_workers)
         async_compile = torch._inductor.async_compile.AsyncCompile()
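
The guard matters because concurrent.futures re-raises a stored exception the moment result() is called, even inside a done-callback. A minimal standalone sketch of the pattern (illustrative names, not PR code):

    # Sketch of the future.exception() guard; names here are illustrative.
    from concurrent.futures import ThreadPoolExecutor

    def compile_choice(ok: bool):
        if not ok:
            raise RuntimeError("nvcc failed")  # stand-in for a CUTLASS compile error
        return None, 1_500_000  # (result, elapsed_us), mirroring the PR's tuple shape

    def on_complete(future):
        # Without this guard, future.result() re-raises the worker's exception
        # right here in the callback; with it, failures are left for the waiter.
        if not future.exception():
            _, elapsed_us = future.result()
            print(f"precompiled in {elapsed_us / 1e6:.02f}s")

    with ThreadPoolExecutor(max_workers=2) as executor:
        for ok in (True, False):
            executor.submit(compile_choice, ok).add_done_callback(on_complete)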
@@ -2130,9 +2131,23 @@ def wait_on_futures():
                     timeout=precompilation_timeout_seconds,
                 ):
                     if e := future.exception():
-                        log.error(
-                            "Exception %s for benchmark choice %s", e, futures[future]
-                        )
+                        from torch._inductor.codegen.cuda.cuda_kernel import (
+                            CUDATemplateCaller,
+                        )
+
+                        if isinstance(e, CUDACompileError) and isinstance(
+                            futures[future], CUDATemplateCaller
+                        ):
+                            log.debug(
+                                "Exception %s for benchmark choice %s",
+                                e,
+                                futures[future],
+                                exc_info=True,
+                            )
+                        else:
+                            log.error(
+                                "Exception %s for benchmark choice %s", e, futures[future]
+                            )
                     else:
                         counters["inductor"]["select_algorithm_num_precompiles"] += 1
                         log.info(
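
The demotion to log.debug(..., exc_info=True) keeps the full traceback available without polluting default output: DEBUG records are dropped unless the user raises the log verbosity. A standalone sketch of that behavior, with an illustrative logger name and message:

    # Sketch: DEBUG + exc_info=True records the traceback but stays silent
    # at the default level. Logger name and choice name are illustrative.
    import logging

    log = logging.getLogger("inductor.autotune")
    logging.basicConfig(level=logging.ERROR)  # default config: DEBUG is dropped

    try:
        raise RuntimeError("cutlass kernel failed to compile")
    except RuntimeError as e:
        log.debug("Exception %s for benchmark choice %s", e, "cutlass_gemm_0",
                  exc_info=True)  # nothing printed at the default level
        # Rerun with level=logging.DEBUG to see the message and full traceback.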
@@ -2238,10 +2253,13 @@ def benchmark_choices(
             try:
                 timing = cls.benchmark_choice(choice, autotune_args)
             except CUDACompileError as e:
-                log.error(
-                    "CUDA compilation error during autotuning: \n%s. \nIgnoring this choice.",
-                    str(e),
-                )
+                from torch._inductor.codegen.cuda.cuda_kernel import CUDATemplateCaller
Review thread on the local import above:

Contributor: I was told local imports like these can cause unwanted compile times.

Contributor Author (henrylhtsang): I feel like if we try to import that at the top, it would cause a circular import.

Contributor: Give it a try and see? If not possible, it's probably fine, since this is in the exception-handling path.

Contributor Author (henrylhtsang, May 16, 2025): Actually, I want to test it in a subsequent PR.

+
+                if not isinstance(choice, CUDATemplateCaller):
+                    log.error(
+                        "CUDA compilation error during autotuning: \n%s. \nIgnoring this choice.",
+                        e,
+                    )
                 timing = float("inf")
             except NotImplementedError as e:
                 log.warning("Not yet implemented: %s", e)
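
The review thread is about deferring the import to break a potential import cycle. A generic sketch of that pattern; handle_compile_failure is a hypothetical helper, not code from the PR:

    # Sketch of the deferred-import pattern discussed in the review thread.
    # handle_compile_failure is illustrative, not code from the PR.
    def handle_compile_failure(choice, err):
        # Importing at function scope instead of module scope avoids completing
        # a potential cycle such as
        # select_algorithm -> cuda_kernel -> ... -> select_algorithm;
        # the import only runs (and is then cached) on the failure path.
        from torch._inductor.codegen.cuda.cuda_kernel import CUDATemplateCaller

        if isinstance(choice, CUDATemplateCaller):
            return float("inf")  # expected for CUTLASS candidates: skip quietly
        raise err  # unexpected elsewhere: surface loudly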