torch fix casting and add ops for sd vae(s) #9297


Merged: 4 commits into tinygrad:master on Mar 1, 2025

Conversation

@tocubed (Contributor) commented Feb 28, 2025

Fix copy to cast, as that is the expected behavior here.

After this commit, the image AE models from https://github.com/madebyollin/taesd should work. Tested in ComfyUI; it is quite simple to patch in tinygrad. I also tried an actual SDXL VAE: it runs but produces incorrect results.
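For context, PyTorch's Tensor.copy_ casts when the source and destination dtypes differ, and .to() with both a device and a dtype routes through the same copy path; that is the behavior the fix restores for the tiny backend. A minimal sketch of those semantics on plain CPU tensors (no tiny device involved):

import torch

src = torch.ones(4, dtype=torch.float64)
dst = torch.empty(4, dtype=torch.float32)
dst.copy_(src)  # copy_ casts float64 -> float32 instead of erroring
assert dst.dtype == torch.float32

# .to() with both a device and a dtype goes through the same copy machinery,
# so a device transfer also implies a cast
half = src.to(device="cpu", dtype=torch.float16)
assert half.dtype == torch.float16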

Contributor

This branch is currently behind tinygrad/master. The line count difference bot is disabled.

@chenyuxyz (Collaborator) left a comment:

Do you know why the SDXL output is incorrect?

@@ -142,7 +142,8 @@ def convolution_backward_overrideable(grad_out, input, weight, stride, padding,
@torch.library.impl("aten::_copy_from", "privateuseone")
def _copy_from(src, dest, non_blocking=False):
  if str(src.device) == "tiny" and str(dest.device) == "tiny":
Collaborator: the cast should happen regardless of the device (before the if blocks), right?

tocubed (Contributor, Author): torch's cast recurses back into this (at least when I tried it, it crashed without a traceback…). We need to know it's a tiny device to do the cast, although looking at it again, I'm not sure whether that 3rd block already does the cast implicitly.

I’ll add a test
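A minimal sketch of what such a test might look like; the registering import, the "tiny" device string usage, and the test name here are assumptions, not the test that actually landed:

import unittest
import torch
# placeholder for whatever module registers the "tiny" privateuseone backend
# import extra.torch_backend.backend

class TestCopyFromCasts(unittest.TestCase):
  def test_copy_casts_across_dtypes(self):
    src = torch.arange(4, dtype=torch.int32).to("tiny")
    dst = torch.empty(4, dtype=torch.float32, device="tiny")
    dst.copy_(src)  # should cast int32 -> float32, not crash or mismatch dtypes
    self.assertEqual(dst.dtype, torch.float32)
    self.assertEqual(dst.cpu().tolist(), [0.0, 1.0, 2.0, 3.0])

if __name__ == "__main__":
  unittest.main()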

tocubed (Contributor, Author): added the cast in all branches, after src is converted to a tiny tensor one way or another.
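The resulting structure is then roughly the following sketch. unwrap and _from_torch_dtype are placeholder names for the backend's torch-to-tinygrad tensor converter and dtype mapping, so this is illustrative rather than the merged code:

from tinygrad import Tensor

@torch.library.impl("aten::_copy_from", "privateuseone")
def _copy_from(src, dest, non_blocking=False):
  cast_to = _from_torch_dtype(dest.dtype)  # placeholder: torch dtype -> tinygrad dtype
  if str(src.device) == "tiny" and str(dest.device) == "tiny":
    unwrap(dest).assign(unwrap(src).cast(cast_to))  # tiny -> tiny
  elif str(src.device) == "tiny":
    # tiny -> torch: realize on the tiny side, cast, then hand back via numpy
    dest.copy_(torch.from_numpy(unwrap(src).cast(cast_to).numpy()))
  else:
    # torch -> tiny: bring src over as a tinygrad Tensor, casting on the way in
    # (src.numpy() assumes a contiguous CPU tensor; see the non-contiguity caveat below)
    unwrap(dest).assign(Tensor(src.numpy()).cast(cast_to))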

Contributor: this change does seem to fix ~10 additional tests in #9302 that were failing due to a dtype mismatch.

@tocubed (Contributor, Author) commented Mar 1, 2025

"Do you know why the SDXL output is incorrect?"

Narrowed it down to the encoder but no further; it may resolve itself with the test_ops fixes, but if not I'll revisit.

@tocubed marked this pull request as draft March 1, 2025 06:37
@tocubed (Contributor, Author) commented Mar 1, 2025

This might have issues with _copy_from and non-contiguous tensors, which admittedly seems to be a rare scenario so far.
The SDXL VAE works after adding a tiny implementation for aten.pad. Whatever torch was doing instead was broken; it must have been one of the other ops.

[image]
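The aten.pad registration mentioned above might look roughly like the sketch below. The signature follows aten::pad(self, pad, mode, value), but unwrap/wrap are placeholder converter names, and the assumption that tinygrad's Tensor.pad accepts torch-style flat padding and the same mode strings is mine:

@torch.library.impl("aten::pad", "privateuseone")
def pad(self, pad, mode="constant", value=None):
  # torch-style flat padding is (last_dim_left, last_dim_right, ...);
  # assumed here that tinygrad's Tensor.pad understands the same layout
  return wrap(unwrap(self).pad(pad, mode=mode, value=0.0 if value is None else value))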

@tocubed marked this pull request as ready for review March 1, 2025 09:35
@tocubed changed the title from "torch fix copy casting and add upsample op" to "torch fix casting and add ops for sd vae(s)" Mar 1, 2025
Anish9901 added a commit to Anish9901/tinygrad that referenced this pull request Mar 1, 2025
@chenyuxyz merged commit f4148ac into tinygrad:master Mar 1, 2025
31 checks passed
@chenyuxyz (Collaborator): cool, thanks!

chenyuxyz added a commit that referenced this pull request Apr 1, 2025
* fix some tests in test_ops for torch backend (171 failing)

* fix more tests (135 failures)

* fix tests (126 failing)

* handle transposed convs (109 tests failing)

* fix slice

* fix lshift & rshift and more tests (87 tests failing)

* revert accidental change

* remove unnecessary changes (82 failures)

* fix backward for avg_pool2d (78 failures)

* fix backward for avg_pool2d (78 failures)

* fix replication backpass

* fix reflection pad back pass (71 failures)

* cummax with indices, aten.mv and move out methods (67 failures)

* extract avg_pool2d and avg_pool3d to separate functions (62 failures)

* revert changes for cat_out

* rewrite avg_pool and pad without repetition

* remove duplicates from decomps

* slice rewrite and add slice_backward (59 failures)

* add dtype fixup from #9297

* fix linter error and remove Tensor.pad (48 failures)

* add select_backward and index_put (40 failures)

* fix some more tests (36 failures)

* fix more tests (12 failures)

* some cleanups and fix couple more tests (10 failures)

* cleaner way to write upsample

* some more upsample cleanups

* use lambda for upsample

* add autowrapper for upsample forward

* cumsum and max_dim without aten functions

* revert _log_softmax

* fix more tests (1 failure)

* make linter happy

* move import to appropriate func

* make linter happy

* add codes for noqa

* some more refactors

* remove comment

* remove dependency on aten function for conv backward

* some more refactors

* add returns

* revert a change from merge

* some cleanups

* remove whitespace

* remove ruff change

* revert upsample

* add masked_fill_.Tensor and scatter.src_out

* add todo

* fix test_biased_conv2d

* fix test_var_one_in_axis & test_std_one_in_axis but break test_biased_conv2d :(

* revert torch_debug

* revert torch_debug

* skip test_gather_failure for the tiny backend

* make padding registration more concise

* add nonzero

* remove scatter_add since we already have the out

* fix scatter

* remove some repetition

* make upsample backward registrations more concise

* remove select.int

* use Tensor.cumsum

* realize conv2d outputs before backward to fix test_biased_conv2d

* add a todo for realize (1 failure)

* add new_empty and new_empty_strided

* make test_pad_circular_mode forward only and remove redundant stuff

* fix linter errors

* remove expect failure

* just tb

* slice is a view_op

* contiguous only when lazydata.is_realized

* fix backward for test_pad_circular_mode

* revert torch.nn.functional.pad override

* add transpose.int and make constant_pad_nd contiguous

* slice_backwards has no kwargs

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
justinryan-0923 pushed a commit to justinryan-0923/tinygrad_bounties that referenced this pull request May 30, 2025
3U-World pushed a commit to 3U-World/Tinygrad that referenced this pull request Jun 4, 2025