Qualcomm AI Engine Direct - Enable custom operator #8726

shewu-quic · 2025-02-26T09:11:39Z

Summary:

Support registering an op package in the QNN backend
Add an example script to run a torch custom op with a QNN op package
Allow an op package to override a torch built-in operator
Add an op package example
Modify the dlopen flag for the QNN library
Generate the custom op based on the meta and _schema.arguments of torch.fx.Node
Add a README for the custom op

Reproduce commands:

# Follow the README to Install qpm
# Follow the README to install hexagon-sdk and hexagon-tool
# install hexagon sdk 5.4.0 for SM8650
# qpm-cli --install hexagonsdk5.x --version 5.4.0.3 --path /path/to/Qualcomm/Hexagon_SDK/hexagon-sdk-5.4.0
# install hexagon sdk 6.0.0 for x86
# qpm-cli --install hexagonsdk6.x --version 6.0.0.2 --path /path/to/Qualcomm/Hexagon_SDK/hexagon-sdk-6.0.0
# install hexagon tool 8.8.02 for x86
# qpm-cli --extract hexagon8.8 --version 8.8.02.1 --path /path/to/Qualcomm/Hexagon_SDK/hexagon-sdk-6.0.0/tools/HEXAGON_Tools/8.8.02

export HEXAGON_SDK_ROOT=/path/to/hexagon-sdk-5.4.0
export ANDROID_NDK_ROOT=/path/to/android-ndk-r26c
# use clang-9.0.0
export X86_CXX=/path/to/clang++
# run custom op with example script
python3 examples/qualcomm/custom_op/custom_ops_1.py --build_folder build-android -s <device_serial> -H <host> -m SM8650 --op_package_dir examples/qualcomm/custom_op/example_op_package_htp/ExampleOpPackage --build_op_package
# run custom op with unit test
python3 backends/qualcomm/tests/test_qnn_delegate.py TestUtilScript.test_custom_op -b build-android -s <device_serial> -H <host> -m SM8650 --op_package_dir examples/qualcomm/custom_op/example_op_package_htp/ExampleOpPackage -r </path/to/executorch> -a </path/to/artifacts>
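
The reproduce commands rely on three exported variables. As a minimal sketch (not part of the PR), a preflight check could confirm they are set before launching the scripts. The helper name `missing_env_vars` is hypothetical; only the variable names come from the commands above.

```python
import os

# Variables exported in the reproduce commands above.
REQUIRED_VARS = ("HEXAGON_SDK_ROOT", "ANDROID_NDK_ROOT", "X86_CXX")


def missing_env_vars(env, required=REQUIRED_VARS):
    """Return the required variables that are unset or empty in `env`."""
    return [name for name in required if not env.get(name)]


# Report what is missing in the current shell environment, if anything.
missing = missing_env_vars(os.environ)
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("Environment looks complete.")
```

Passing the environment as a plain mapping keeps the check easy to exercise without touching the real shell state.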

pytorch-bot · 2025-02-26T09:11:44Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/8726

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit de02835 with merge base c6c3616:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

shewu-quic · 2025-02-26T09:13:20Z

Hi @cccclai,

This PR adds support for custom kernels in the QNN backend.
Could you please take a look?
If you run into any problems, please let me know. Thanks :)

digantdesai

The readme is nice, I will try to reproduce this. Thanks!

backends/qualcomm/builders/op_custom_op.py

backends/qualcomm/runtime/backends/QnnBackendCommon.cpp

backends/qualcomm/tests/test_qnn_delegate.py

examples/qualcomm/custom_op/README.md

digantdesai · 2025-04-04T18:57:46Z

Apologies this is still pending. I will also help review this next week.

shewu-quic · 2025-04-10T01:28:50Z

> Will finish review by the end of this week, thank you for being patient.

Thanks for your effort.
I want to add more details about the internal issue mentioned in today's meeting.
When we tried to implement a custom embedding op, we found an issue where a static variable is not freed properly after the QNN backend is freed. You will get the following error if you use the macro DEF_TENSOR_PROPERTIES in the op implementation file.

In the meantime, we will try to create a workaround PR for this issue and reach out to the internal HTP backend owner to address it.

shewu-quic · 2025-04-22T02:14:30Z

> Finally got time to finish reading, a couple of questions (can be follow up as well).

> Looks like users can define any custom op, and as long as they have the custom op package compiled and the package name is defined as part of the compile spec, they can lower it to the QNN backend. It is indeed flexible and offers more options for users.

> Looks like the package path should be predefined AoT, however the PoC who exported the model might not be the same as the PoC who runs the model on device. Maybe we can just specify the .so path on device and the package can be found automatically?

Yes, I expect users to have that flexibility. It seems to me that those who use custom operators are advanced users.
Got it. Actually, I share the same thought. For the runtime op package, we can add a follow-up that uses a runtime option to set it. Regarding the AOT op package, I have tested that the op package can be found via LD_LIBRARY_PATH because of the dlopen call in the QNN SDK. Therefore, users only need to specify the .so file and set LD_LIBRARY_PATH correctly to access the library.
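
To illustrate the LD_LIBRARY_PATH lookup described above, here is a minimal sketch that models only that one step of the dynamic loader's search. When dlopen is given a name without a slash, the loader also consults rpath/runpath entries, the ld.so cache, and default system directories, none of which are modelled here. The function name and the library name `libExampleOpPackage.so` are hypothetical.

```python
import os
from pathlib import Path


def find_in_ld_library_path(lib_name, ld_library_path):
    """Return the first match for `lib_name` in the given search path, or None.

    Only the LD_LIBRARY_PATH step is modelled; rpath/runpath, the ld.so
    cache and the default system directories are ignored.
    """
    for directory in ld_library_path.split(os.pathsep):
        if not directory:
            continue  # skip empty entries instead of treating them as cwd
        candidate = Path(directory) / lib_name
        if candidate.is_file():
            return candidate
    return None


# Hypothetical library name, used only for illustration.
print(find_in_ld_library_path("libExampleOpPackage.so",
                              os.environ.get("LD_LIBRARY_PATH", "")))
```

A check like this can help explain a failed load on device: if it returns None, the directory holding the op package is probably not on the search path.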

shewu-quic · 2025-04-22T02:18:50Z

> Approve as I generally like the flexibility of this approach. The only thing I'm worried about is user experience, and I would like us to improve it.

> Originally, our thought on supporting custom ops was to force users to register the custom ops under a specific namespace, like a qnn namespace. It's less flexible, but easier to debug, given that users are required to write the QNN package anyway for the QNN custom ops. It's similar to how you support AIHub models. What do you think?

That is fine with me. What are your thoughts if I add a namespace check on top of the current design?
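
One way such a namespace check could look is sketched below. It assumes the op is identified by a `namespace::op_name` string, the form torch operator schemas use (for example `aten::add`). The allow-list, the helper names, and the op name `mul3` are hypothetical and do not reflect the code in this PR.

```python
# Hypothetical allow-list; the thread only suggests "a specific namespace, like qnn".
ALLOWED_NAMESPACES = frozenset({"qnn"})


def split_qualified_name(qualified_name):
    """Split a 'namespace::op_name' string into (namespace, op_name)."""
    namespace, sep, op_name = qualified_name.partition("::")
    if not sep or not namespace or not op_name:
        raise ValueError(f"expected 'namespace::op', got {qualified_name!r}")
    return namespace, op_name


def check_custom_op_namespace(qualified_name, allowed=ALLOWED_NAMESPACES):
    """Return the op name, or raise ValueError if its namespace is not allowed."""
    namespace, op_name = split_qualified_name(qualified_name)
    if namespace not in allowed:
        raise ValueError(
            f"custom op {op_name!r} is in namespace {namespace!r}; "
            f"expected one of {sorted(allowed)}"
        )
    return op_name


print(check_custom_op_namespace("qnn::mul3"))
```

Failing early with the offending namespace in the message is the debugging benefit the review comment is after: a misregistered op is reported at lowering time rather than surfacing later as a missing kernel.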

shewu-quic · 2025-05-07T07:24:01Z

@cccclai I have rebased this PR. Thanks!

facebook-github-bot · 2025-06-02T17:10:56Z

@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

shewu-quic · 2025-06-04T01:26:56Z

> Thanks for enabling x86 in the latest commit, would you be able to help fix the issue due to the flatbuffers version mismatch? We may need to downgrade flatbuffers in open source unfortunately

Let me clarify the following.

Is this mismatch issue caused by this PR?
What problems did you encounter after the downgrade?

++ @haowhsu-quic for awareness

shewu-quic · 2025-06-05T09:24:45Z

When I check out 338393f8 in flatbuffers, I get the following error from ./backends/qualcomm/scripts/build.sh.
Am I missing anything?

Error log

Error while generating /local3/mnt/workspace/shewu/executorch/executorch_shewu/executorch/build-android/executorch_srcs.cmake. Exit code: 1 Output:

Error:
2025-06-05 17:18:30,928 [ExecuTorch] ERROR: Failed to query buck for sources. Failed command:
buck2 cquery inputs(deps('//runtime/executor:program')) --target-platforms shim_et//:android-arm64 This is likely due to missing git submodules or outdated CMake cache. Please run the following before retry: ./install_executorch.sh --clean
git submodule sync
git submodule update --init

Traceback (most recent call last):
File "/local3/mnt/workspace/shewu/executorch/executorch_shewu/executorch/tools/cmake/buck_util.py", line 34, in run
cp: subprocess.CompletedProcess = subprocess.run(
File "/local/mnt/workspace/miniconda3/envs/executorch/lib/python3.10/subprocess.py", line 524, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['/local3/mnt/workspace/shewu/executorch/executorch_shewu/executorch/buck2-bin/buck2-2025-05-06-201beb86106fecdc84e30260b0f1abb5bf576988', 'cquery', "inputs(deps('//runtime/executor:program'))", '--target-platforms', 'shim_et//:android-arm64']' returned non-zero exit status 3.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/local3/mnt/workspace/shewu/executorch/executorch_shewu/executorch/tools/cmake/extract_sources.py", line 255, in
main()
File "/local3/mnt/workspace/shewu/executorch/executorch_shewu/executorch/tools/cmake/extract_sources.py", line 240, in main
target_to_srcs[name] = sorted(target.get_sources(graph, runner, buck_args))
File "/local3/mnt/workspace/shewu/executorch/executorch_shewu/executorch/tools/cmake/extract_sources.py", line 144, in get_sources
raise e
File "/local3/mnt/workspace/shewu/executorch/executorch_shewu/executorch/tools/cmake/extract_sources.py", line 132, in get_sources
sources: set[str] = set(runner.run(["cquery", query] + buck_args))
File "/local3/mnt/workspace/shewu/executorch/executorch_shewu/executorch/tools/cmake/buck_util.py", line 42, in run
raise RuntimeError(ex.stderr.decode("utf-8")) from ex
RuntimeError: [2025-06-05T17:18:17.612+08:00] Starting new buck2 daemon...
[2025-06-05T17:18:30.149+08:00] Connected to new buck2 daemon.
[2025-06-05T17:18:30.198+08:00] Build ID: b928dfaf-981f-44d0-9077-9d3406669451
Command failed:
Error in configured node dependency, dependency chain follows (-> indicates depends on, ^ indicates same configuration as previous):
root//runtime/executor:program (shim_et//:android-arm64#529b86ff5de06a9c)
-> root//runtime/executor:program_no_prim_ops (^)
-> root//runtime/executor:pte_data_map (^)
-> root//schema:program (^)
-> root//schema:generate_program (^)
-> root//third-party:flatc (^)

Caused by:
0: looking up unconfigured target node root//third-party:flatc
1: Error loading targets in package root//third-party for target root//third-party:flatc
2: Error evaluating build file: root//third-party:TARGETS
3: Traceback (most recent call last):
* third-party/TARGETS:123, in
runtime.cxx_library(
* shim_et/xplat/executorch/build/runtime_wrapper.bzl:249, in _cxx_library
_cxx_library_common(*args, **kwargs)
* shim_et/xplat/executorch/build/runtime_wrapper.bzl:242, in _cxx_library_common
env.cxx_library(*args, **kwargs)

   error: Error coercing attribute `raw_headers` of `root//third-party:flatc_library`
     --> shim_et/xplat/executorch/build/runtime_wrapper.bzl:242:5
       |
   242 |     env.cxx_library(*args, **kwargs)
       |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
       |
4: Error coercing attribute `raw_headers` of type `attrs.list(attrs.source(), default=[])`
5: Error coercing ["flatbuffers/include/flatbuffers/allocator.h", "flatbuffers/include/flatbuffers/array.h", "flatbuffers/include/flatbuffers/base.h", "flatbuffers/include/flatbuffers/buffer.h", "flatbuffers/include/flatbuffers/buffer_ref.h", "flatbuffers/include/flatbuffers/code_generator.h", "flatbuffers/include/flatbuffers/default_allocator.h", "flatbuffers/include/flatbuffers/detached_buffer.h", "flatbuffers/include/flatbuffers/file_manager.h", "flatbuffers/include/flatbuffers/flatbuffer_builder.h", "flatbuffers/include/flatbuffers/flatbuffers.h", "flatbuffers/include/flatbuffers/flex_flat_util.h", "flatbuffers/include/flatbuffers/flexbuffers.h", "flatbuffers/include/flatbuffers/hash.h", "flatbuffers/include/flatbuffers/idl.h", "flatbuffers/include/flatbuffers/minireflect.h", "flatbuffers/include/flatbuffers/reflection.h", "flatbuffers/include/flatbuffers/reflection_generated.h", "flatbuffers/include/flatbuffers/registry.h", "flatbuffers/include/flatbuffers/stl_emulation.h", "flatbuffers/include/flatbuffers/string.h", "flatbuffers/include/flatbuffers/struct.h", "flatbuffers/include/flatbuffers/table.h", "flatbuffers/include/flatbuffers/util.h", "flatbuffers/include/flatbuffers/vector.h", "flatbuffers/include/flatbuffers/vector_downward.h", "flatbuffers/include/flatbuffers/verifier.h"]
6: Error coercing "flatbuffers/include/flatbuffers/allocator.h"
7: Coercing flatbuffers/include/flatbuffers/allocator.h as a source
8: Source file flatbuffers/include/flatbuffers/allocator.h does not exist as a member of package root//third-party.

CMake Error at tools/cmake/Utils.cmake:109 (message):
executorch: source list generation failed
Call Stack (most recent call first):
CMakeLists.txt:303 (extract_sources)

-- Configuring incomplete, errors occurred!

shewu-quic · 2025-06-05T09:47:00Z

Whoops, I got it. That commit is very different from the current one. It is missing some header files...
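
The failure in the log boils down to headers listed in `raw_headers` that do not exist in the checked-out flatbuffers tree. As a minimal sketch, the check below reports which headers are absent. It assumes it is run from the repository root, so that `third-party` corresponds to the `root//third-party` package named in the log. The helper name is hypothetical and only a few headers from the log are listed.

```python
from pathlib import Path

# A few of the headers named in the error log (not the full list).
EXPECTED_HEADERS = (
    "flatbuffers/include/flatbuffers/allocator.h",
    "flatbuffers/include/flatbuffers/array.h",
    "flatbuffers/include/flatbuffers/base.h",
)


def missing_headers(package_dir, headers=EXPECTED_HEADERS):
    """Return the headers that do not exist under `package_dir`."""
    root = Path(package_dir)
    return [h for h in headers if not (root / h).is_file()]


# Assumes the current directory is the repository root.
for header in missing_headers("third-party"):
    print("missing:", header)
```

Running this after checking out a different flatbuffers commit shows quickly whether the header list in the build file still matches the submodule contents.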

shewu-quic · 2025-06-05T10:12:59Z

It seems flatbuffers::Vector only takes one template argument. I've made the adjustments. Could you please try again?

facebook-github-bot · 2025-06-05T17:46:52Z

@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

shewu-quic · 2025-06-06T06:38:42Z

> But there is a different issue...

Error log

It seems that a dataclass cannot use a mutable value as the default for a class attribute.
I have changed it. Could you give it another try? Thanks.
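
The error log itself is not preserved in this thread, so the following is only a sketch of the general Python behavior being described, not the actual code from the PR. The standard `dataclasses` module rejects a list, dict, or set as a field default when the class is created, and `field(default_factory=...)` is the usual replacement. The class and field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List

try:
    @dataclass
    class BadOptions:
        # A list default is rejected when the dataclass is created.
        package_paths: List[str] = []
except ValueError as err:
    bad_error = err
else:
    bad_error = None


@dataclass
class GoodOptions:
    # default_factory builds a fresh list for every instance.
    package_paths: List[str] = field(default_factory=list)


a, b = GoodOptions(), GoodOptions()
a.package_paths.append("libExample.so")

print(bad_error)        # the ValueError raised for the mutable default
print(b.package_paths)  # unaffected by the append on `a`
```

The factory form also avoids the shared-state bug a plain class attribute would have, where every instance mutates the same list.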

facebook-github-bot · 2025-06-06T17:25:59Z

@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

shewu-quic · 2025-06-09T01:55:46Z

> Hmm, seems like the import failed, can you rebase?

Done. Thanks :)

facebook-github-bot · 2025-06-09T05:22:15Z

@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot · 2025-06-10T16:50:30Z

@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Summary:
- Support registering an op package in the QNN backend
- Add an example script to run a torch custom op with a QNN op package
- Allow an op package to override a torch built-in operator
- Add an op package example
- Move test_custom_op to TestUtilScript
- Modify the dlopen flag for the QNN library
- Generate the custom op based on the meta and _schema.arguments of torch.fx.Node
- Add a README for the custom op

facebook-github-bot · 2025-06-11T16:54:34Z

@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot · 2025-06-13T18:39:49Z

@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

shewu-quic requested a review from cccclai as a code owner February 26, 2025 09:11

facebook-github-bot added the CLA Signed label Feb 26, 2025

digantdesai reviewed Feb 28, 2025

shewu-quic force-pushed the dev1/hutton/enable_custom_operator branch from 064d12f to 5f65810 Compare March 6, 2025 09:42

shewu-quic force-pushed the dev1/hutton/enable_custom_operator branch from 5f65810 to eaf8aa6 Compare May 7, 2025 07:21

shewu-quic force-pushed the dev1/hutton/enable_custom_operator branch from 5b04add to 3bbb5c0 Compare June 2, 2025 06:20

shewu-quic requested review from jathu, larryliu0820 and kirklandsign as code owners June 2, 2025 06:20

shewu-quic force-pushed the dev1/hutton/enable_custom_operator branch from eff8673 to a22cdd2 Compare June 5, 2025 10:12

shewu-quic force-pushed the dev1/hutton/enable_custom_operator branch from a22cdd2 to f28ab0e Compare June 6, 2025 06:36

shewu-quic force-pushed the dev1/hutton/enable_custom_operator branch from f28ab0e to 655e416 Compare June 9, 2025 01:55

shewu-quic added 4 commits June 11, 2025 10:57

support x86 enumerator for custom op example script

bd1df7b

fix flatbuffer version mismatch issue

2e10d31

fix default value for flatbuffer

de02835

shewu-quic force-pushed the dev1/hutton/enable_custom_operator branch from 655e416 to de02835 Compare June 11, 2025 03:32

cccclai merged commit 67b6009 into pytorch:main Jun 13, 2025
103 checks passed
