### Bug description
When trying a simple example using the new CustomOpLibrary, the graph compiler appears to keep using a stale cache despite code changes in the kernel. Below is an example where changing one line of kernel code and re-running does not trigger a recompile. If the cache is deleted manually, you can see the behaviour change.
### Steps to reproduce
- Run the simple example below and notice that it completes successfully.
- Change one line of code that should make it fail, and notice that it doesn't.
- Manually clean the cache: `rm -rf ~/.modular`
- Run again and observe the different behaviour (see the cache-clearing sketch after this list).
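
For convenience, here is a minimal sketch of clearing the cache programmatically between runs. It only assumes what the repro above states, namely that the cache lives under `~/.modular`:

```python
# Sketch: clear the MAX compilation cache between runs so that kernel
# edits are actually picked up. Assumes the cache directory is
# ~/.modular, as in the manual `rm -rf` step above.
import shutil
from pathlib import Path

cache_dir = Path.home() / ".modular"
if cache_dir.exists():
    shutil.rmtree(cache_dir)
```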
`example.py`:

```python
from pathlib import Path

import torch
from max.torch import CustomOpLibrary

# Load the Mojo custom ops from the ./kernels directory.
mojo_kernels = Path(__file__).parent / "kernels"
op_library = CustomOpLibrary(mojo_kernels)
add_const_kernel = op_library.add_const


def add_const_1d(x: torch.Tensor) -> torch.Tensor:
    # The op writes into the pre-allocated `result` tensor.
    result = torch.zeros_like(x, dtype=x.dtype, device=x.device)
    add_const_kernel(result, x)
    return result


if __name__ == "__main__":
    x = torch.randn(10).cuda()
    print(add_const_1d(x))
```
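
Note the destination-passing convention here: the op is invoked as `add_const_kernel(result, x)` and writes into the pre-allocated `result` tensor rather than returning a new one, matching the `OutputTensor`/`InputTensor` signature of the Mojo op below.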
`kernels/kernel.mojo`:

```mojo
import compiler
from gpu import thread_idx, block_idx, block_dim, barrier
from layout import Layout, LayoutTensor, UNKNOWN_VALUE
from runtime.asyncrt import DeviceContextPtr
from math import ceildiv
from gpu.host import DeviceBuffer
from tensor import InputTensor, OutputTensor
from memory import UnsafePointer

alias BLOCK_SIZE = 32
alias Dyn1DLayout = Layout.row_major(10)
alias dtype = DType.float32


@compiler.register("add_const")
struct AddConst:
    @staticmethod
    fn execute[
        target: StaticString,
    ](
        # Outputs
        result: OutputTensor[type = DType.float32, rank=1],
        # Inputs
        x: InputTensor[type = DType.float32, rank=1],
        # Context
        ctx: DeviceContextPtr,
    ) raises:
        x_tensor = x.to_layout_tensor()
        result_tensor = result.to_layout_tensor()

        @parameter
        if target == "cpu":
            raise Error("add_const CPU target not implemented yet.")
        elif target == "gpu":
            # Get GPU context
            var gpu_ctx = ctx.get_device_context()
            # Define grid and block dimensions for the kernel launch
            var grid = (ceildiv(x.dim_size(0), BLOCK_SIZE))
            var block = (BLOCK_SIZE)
            # Zero the output buffer before launching the kernel.
            gpu_ctx.enqueue_memset(
                DeviceBuffer[result.type](
                    gpu_ctx,
                    rebind[UnsafePointer[Scalar[result.type]]](
                        result_tensor.ptr
                    ),
                    x.dim_size(0),
                    owning=False,
                ),
                0,
            )
            gpu_ctx.enqueue_function[add_const_kernel](
                x_tensor,
                result_tensor,
                x.dim_size(0),
                grid_dim=grid,
                block_dim=block,
            )
        else:
            raise Error("Unsupported target:", target)


fn add_const_kernel(
    x: LayoutTensor[dtype, Dyn1DLayout, MutableAnyOrigin],
    result: LayoutTensor[dtype, Dyn1DLayout, MutableAnyOrigin],
    size: Int,
):
    # One thread per element; guard against out-of-range threads.
    i = block_idx.x * block_dim.x + thread_idx.x
    if i < size:
        result[i] = x[i] + 10
```
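
Note that `Dyn1DLayout` is the crux of the repro: `Layout.row_major(10)` makes the layout fully static, whereas swapping in `UNKNOWN_VALUE` (already imported above) makes the leading dimension dynamic, and that one-line change is exactly what the cache fails to pick up.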
Run this and it should work, printing a tensor. Then replace this line in the kernel:

```diff
- alias Dyn1DLayout = Layout.row_major(10)
+ alias Dyn1DLayout = Layout.row_major(UNKNOWN_VALUE)
```

Run again and it will continue to work. But if you remove the cache directory with `rm -rf ~/.modular`, you will see that it now fails to run.
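
As a quick sanity check of the op's numeric behaviour, here is a sketch assuming `example.py` above is importable from the working directory:

```python
# Sketch: sanity-check the op against a plain PyTorch reference.
# `add_const_1d` comes from example.py above; the kernel adds 10 to
# every element, so this should hold after a successful (re)compile,
# while a failed compile surfaces as an exception at call time.
import torch

from example import add_const_1d

x = torch.randn(10).cuda()
out = add_const_1d(x)
assert torch.allclose(out, x + 10), "kernel output does not match x + 10"
print("ok:", out)
```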
### System information
- Provide the system information by running `magic info`. I am not running with magic but with uv, so I cannot do that.

Here is the `pyproject.toml` instead:
```toml
[project]
name = "example"
version = "0.0.0"
readme = "README.md"
requires-python = ">=3.10"
dependencies = [
    "torch>=2.6.0",
    "pillow>=11.2.1, <12",
    "modular>=25.4.0.dev2025052405",
]

[tool.uv]

[[tool.uv.index]]
url = "https://dl.modular.com/public/nightly/python/simple/"
```
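
With this configuration, the repro should be runnable with `uv run python example.py` (a CUDA-capable GPU is assumed, since the example calls `.cuda()`).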