Qualcomm AI Engine Direct - Enable custom operator by shewu-quic · Pull Request #8726 · pytorch/executorch · GitHub

Qualcomm AI Engine Direct - Enable custom operator #8726

Merged 4 commits on Jun 13, 2025

Conversation

shewu-quic
Collaborator

Summary:

  • Support registering an op package in the QNN backend
  • Add an example script to run a torch custom op with a QNN op package
  • Allow an op package to override a torch built-in operator
  • Add an op package example
  • Modify the dlopen flag for the QNN library
  • Generate the custom op based on the meta and _schema.arguments of torch.fx.Node
  • Add a README for the custom op

Reproduce commands:

# Follow the README to Install qpm
# Follow the README to install hexagon-sdk and hexagon-tool
# install hexagon sdk 5.4.0 for SM8650
# qpm-cli --install hexagonsdk5.x --version 5.4.0.3 --path /path/to/Qualcomm/Hexagon_SDK/hexagon-sdk-5.4.0
# install hexagon sdk 6.0.0 for x86
# qpm-cli --install hexagonsdk6.x --version 6.0.0.2 --path /path/to/Qualcomm/Hexagon_SDK/hexagon-sdk-6.0.0
# install hexagon tool 8.8.02 for x86
# qpm-cli --extract hexagon8.8 --version 8.8.02.1 --path /path/to/Qualcomm/Hexagon_SDK/hexagon-sdk-6.0.0/tools/HEXAGON_Tools/8.8.02

export HEXAGON_SDK_ROOT=/path/to/hexagon-sdk-5.4.0
export ANDROID_NDK_ROOT=/path/to/android-ndk-r26c
# use clang-9.0.0
export X86_CXX=/path/to/clang++
# run custom op with example script
python3 examples/qualcomm/custom_op/custom_ops_1.py --build_folder build-android -s <device_serial> -H <host> -m SM8650 --op_package_dir examples/qualcomm/custom_op/example_op_package_htp/ExampleOpPackage --build_op_package
# run custom op with unit test
python3 backends/qualcomm/tests/test_qnn_delegate.py TestUtilScript.test_custom_op -b build-android -s <device_serial> -H <host> -m SM8650 --op_package_dir examples/qualcomm/custom_op/example_op_package_htp/ExampleOpPackage -r </path/to/executorch> -a </path/to/artifacts>
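
Before launching either command, it can help to confirm that the three environment variables exported above are set and point at existing paths. The following is a minimal sketch and not part of the PR; the helper name `missing_env_paths` is hypothetical.

```python
import os
from pathlib import Path

# Variables exported in the reproduce commands above.
REQUIRED_VARS = ("HEXAGON_SDK_ROOT", "ANDROID_NDK_ROOT", "X86_CXX")


def missing_env_paths(env=None, required=REQUIRED_VARS):
    """Return {name: reason} for variables that are unset or point nowhere.

    Hypothetical helper for illustration; it is not part of ExecuTorch.
    """
    env = os.environ if env is None else env
    problems = {}
    for name in required:
        value = env.get(name)
        if not value:
            problems[name] = "not set"
        elif not Path(value).exists():
            problems[name] = f"path does not exist: {value}"
    return problems
```

Calling `missing_env_paths()` with no arguments inspects the current process environment; an empty dictionary means all three variables resolve to existing paths.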

@shewu-quic shewu-quic requested a review from cccclai as a code owner February 26, 2025 09:11
pytorch-bot bot commented Feb 26, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/8726

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit de02835 with merge base c6c3616 (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Feb 26, 2025
@shewu-quic
Collaborator Author

Hi @cccclai,

This PR adds support for custom kernels in the QNN backend.
Could you please take a look?
If you run into any problems, please let me know. Thanks :)

Contributor
@digantdesai digantdesai left a comment

The readme is nice, I will try to reproduce this. Thanks!

@shewu-quic shewu-quic force-pushed the dev1/hutton/enable_custom_operator branch from 064d12f to 5f65810 Compare March 6, 2025 09:42
@digantdesai
Contributor

Apologies this is still pending. I will also help review this next week.

@shewu-quic
Collaborator Author

> Will finish review by the end of this week, thank you for being patient.

Thanks for your effort.
I want to add more details about the internal issue mentioned in today's meeting.
When we tried to implement a custom embedding op, we found that a static variable is not freed properly after the QNN backend is freed. You will get the following error if you use the macro DEF_TENSOR_PROPERTIES in the op implementation file.
image
image

In the meantime, we will try to create a workaround PR for this issue and contact the internal HTP backend owner to address it.

@shewu-quic
Collaborator Author

> Finally got time to finish reading; a couple of questions (these can be follow-ups as well).
>
>   1. Looks like users can define any custom op, and as long as they have the custom op package compiled and the package name defined as part of the compile spec, they can lower it to the QNN backend. It is indeed flexible and offers more options for users.
>   2. Looks like the package path should be predefined AoT; however, the PoC who exported the model might not be the same as the PoC who runs the model on device. Maybe we can just specify the .so path on device and the package can be found automatically?

  1. Yes, I expect users to be flexible. It seems to me that those who use custom operations are advanced users.
  2. Got it. Actually, I share the same thought. For the runtime op package, we can add a follow-up that uses a runtime option to set it. For the AOT op package, I have tested that it can be found through LD_LIBRARY_PATH, because the QNN SDK loads it with dlopen. Therefore, users only need to specify the .so file and set LD_LIBRARY_PATH correctly to access the library.
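
The LD_LIBRARY_PATH lookup described in the second answer can be checked ahead of time. The sketch below only approximates the directory search that the dynamic loader performs for a bare file name (it ignores RPATH/RUNPATH, the ld.so cache, and the default system directories); the helper name `find_op_package` is hypothetical and not part of the QNN SDK or ExecuTorch.

```python
import os
from pathlib import Path


def find_op_package(so_name, ld_library_path=None):
    """Return the first match for so_name on LD_LIBRARY_PATH, else None.

    Approximates only the LD_LIBRARY_PATH step of the loader's search.
    """
    if ld_library_path is None:
        ld_library_path = os.environ.get("LD_LIBRARY_PATH", "")
    for entry in ld_library_path.split(":"):
        if not entry:
            continue  # this sketch skips empty entries
        candidate = Path(entry) / so_name
        if candidate.is_file():
            return candidate
    return None
```

Note that the loader reads LD_LIBRARY_PATH when the process starts, so the variable has to be exported before the Python or runner process is launched; changing `os.environ` afterwards does not affect later `dlopen` calls in the same process.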

@shewu-quic
Collaborator Author

> Approve as I generally like the flexibility of this approach. The only thing I'm worried about is the user experience, and I would like us to improve it.
>
> Originally, our thought on supporting custom ops was to force users to register the custom ops under a specific namespace, like a qnn namespace. It's less flexible, but easier to debug, given that users are required to write the QNN package anyway for the QNN custom ops. It's similar to how you support AIHub models. What do you think?

That works for me. What do you think about adding a namespace check on top of the current design?
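
A namespace check of the kind proposed here could look roughly like the following. This is a sketch under assumptions, not the PR's implementation: the function name `check_custom_op_namespace` and the allowed namespace `my_ops` are made up for illustration, and the `namespace::op_name` form is the qualified-name style used when registering ops with `torch.library`.

```python
def check_custom_op_namespace(qualified_name, allowed_namespaces=("my_ops",)):
    """Validate that 'namespace::op_name' uses an allowed namespace.

    Returns the (namespace, op_name) pair or raises ValueError.
    """
    namespace, sep, op_name = qualified_name.partition("::")
    if not sep or not namespace or not op_name:
        raise ValueError(
            f"expected 'namespace::op_name', got {qualified_name!r}"
        )
    if namespace not in allowed_namespaces:
        raise ValueError(
            f"custom op {qualified_name!r} is outside the allowed namespaces "
            f"{sorted(allowed_namespaces)}"
        )
    return namespace, op_name
```

Failing early with a message that names the offending op keeps the flexibility of the current design while making a misregistered op easier to debug.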

@shewu-quic shewu-quic force-pushed the dev1/hutton/enable_custom_operator branch from 5f65810 to eaf8aa6 Compare May 7, 2025 07:21
@shewu-quic
Collaborator Author

@cccclai I have rebased this PR. Thanks!

@shewu-quic shewu-quic force-pushed the dev1/hutton/enable_custom_operator branch from 5b04add to 3bbb5c0 Compare June 2, 2025 06:20
@facebook-github-bot
Contributor

@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@shewu-quic
Collaborator Author
shewu-quic commented Jun 4, 2025

> Thanks for enabling x86 in the latest commit. Would you be able to help fix the issue caused by the flatbuffers version mismatch? Unfortunately, we may need to downgrade flatbuffers in open source.

Let me clarify the following.

  1. May I know if this mismatch issue is caused by this PR?
  2. What problems did you encounter after the downgrade?

++ @haowhsu-quic aware

@shewu-quic
Collaborator Author

When I check out 338393f8 in flatbuffers, I get the following error from ./backends/qualcomm/scripts/build.sh.
Am I missing anything?

Error log:
Error while generating /local3/mnt/workspace/shewu/executorch/executorch_shewu/executorch/build-android/executorch_srcs.cmake. Exit code: 1
Output:

Error:
2025-06-05 17:18:30,928 [ExecuTorch] ERROR: Failed to query buck for sources. Failed command:
buck2 cquery inputs(deps('//runtime/executor:program')) --target-platforms shim_et//:android-arm64
This is likely due to missing git submodules or outdated CMake cache. Please run the following before retry:
./install_executorch.sh --clean
git submodule sync
git submodule update --init

Traceback (most recent call last):
File "/local3/mnt/workspace/shewu/executorch/executorch_shewu/executorch/tools/cmake/buck_util.py", line 34, in run
cp: subprocess.CompletedProcess = subprocess.run(
File "/local/mnt/workspace/miniconda3/envs/executorch/lib/python3.10/subprocess.py", line 524, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['/local3/mnt/workspace/shewu/executorch/executorch_shewu/executorch/buck2-bin/buck2-2025-05-06-201beb86106fecdc84e30260b0f1abb5bf576988', 'cquery', "inputs(deps('//runtime/executor:program'))", '--target-platforms', 'shim_et//:android-arm64']' returned non-zero exit status 3.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/local3/mnt/workspace/shewu/executorch/executorch_shewu/executorch/tools/cmake/extract_sources.py", line 255, in <module>
main()
File "/local3/mnt/workspace/shewu/executorch/executorch_shewu/executorch/tools/cmake/extract_sources.py", line 240, in main
target_to_srcs[name] = sorted(target.get_sources(graph, runner, buck_args))
File "/local3/mnt/workspace/shewu/executorch/executorch_shewu/executorch/tools/cmake/extract_sources.py", line 144, in get_sources
raise e
File "/local3/mnt/workspace/shewu/executorch/executorch_shewu/executorch/tools/cmake/extract_sources.py", line 132, in get_sources
sources: set[str] = set(runner.run(["cquery", query] + buck_args))
File "/local3/mnt/workspace/shewu/executorch/executorch_shewu/executorch/tools/cmake/buck_util.py", line 42, in run
raise RuntimeError(ex.stderr.decode("utf-8")) from ex
RuntimeError: [2025-06-05T17:18:17.612+08:00] Starting new buck2 daemon...
[2025-06-05T17:18:30.149+08:00] Connected to new buck2 daemon.
[2025-06-05T17:18:30.198+08:00] Build ID: b928dfaf-981f-44d0-9077-9d3406669451
Command failed:
Error in configured node dependency, dependency chain follows (-> indicates depends on, ^ indicates same configuration as previous):
root//runtime/executor:program (shim_et//:android-arm64#529b86ff5de06a9c)
-> root//runtime/executor:program_no_prim_ops (^)
-> root//runtime/executor:pte_data_map (^)
-> root//schema:program (^)
-> root//schema:generate_program (^)
-> root//third-party:flatc (^)

Caused by:
0: looking up unconfigured target node root//third-party:flatc
1: Error loading targets in package root//third-party for target root//third-party:flatc
2: Error evaluating build file: root//third-party:TARGETS
3: Traceback (most recent call last):
* third-party/TARGETS:123, in
runtime.cxx_library(
* shim_et/xplat/executorch/build/runtime_wrapper.bzl:249, in _cxx_library
_cxx_library_common(*args, **kwargs)
* shim_et/xplat/executorch/build/runtime_wrapper.bzl:242, in _cxx_library_common
env.cxx_library(*args, **kwargs)

   error: Error coercing attribute `raw_headers` of `root//third-party:flatc_library`
     --> shim_et/xplat/executorch/build/runtime_wrapper.bzl:242:5
       |
   242 |     env.cxx_library(*args, **kwargs)
       |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
       |
4: Error coercing attribute `raw_headers` of type `attrs.list(attrs.source(), default=[])`
5: Error coercing ["flatbuffers/include/flatbuffers/allocator.h", "flatbuffers/include/flatbuffers/array.h", "flatbuffers/include/flatbuffers/base.h", "flatbuffers/include/flatbuffers/buffer.h", "flatbuffers/include/flatbuffers/buffer_ref.h", "flatbuffers/include/flatbuffers/code_generator.h", "flatbuffers/include/flatbuffers/default_allocator.h", "flatbuffers/include/flatbuffers/detached_buffer.h", "flatbuffers/include/flatbuffers/file_manager.h", "flatbuffers/include/flatbuffers/flatbuffer_builder.h", "flatbuffers/include/flatbuffers/flatbuffers.h", "flatbuffers/include/flatbuffers/flex_flat_util.h", "flatbuffers/include/flatbuffers/flexbuffers.h", "flatbuffers/include/flatbuffers/hash.h", "flatbuffers/include/flatbuffers/idl.h", "flatbuffers/include/flatbuffers/minireflect.h", "flatbuffers/include/flatbuffers/reflection.h", "flatbuffers/include/flatbuffers/reflection_generated.h", "flatbuffers/include/flatbuffers/registry.h", "flatbuffers/include/flatbuffers/stl_emulation.h", "flatbuffers/include/flatbuffers/string.h", "flatbuffers/include/flatbuffers/struct.h", "flatbuffers/include/flatbuffers/table.h", "flatbuffers/include/flatbuffers/util.h", "flatbuffers/include/flatbuffers/vector.h", "flatbuffers/include/flatbuffers/vector_downward.h", "flatbuffers/include/flatbuffers/verifier.h"]
6: Error coercing "flatbuffers/include/flatbuffers/allocator.h"
7: Coercing flatbuffers/include/flatbuffers/allocator.h as a source
8: Source file flatbuffers/include/flatbuffers/allocator.h does not exist as a member of package root//third-party.

CMake Error at tools/cmake/Utils.cmake:109 (message):
executorch: source list generation failed
Call Stack (most recent call first):
CMakeLists.txt:303 (extract_sources)

-- Configuring incomplete, errors occurred!

@shewu-quic
Collaborator Author

Whoops, I got it. That commit is very different from the current one. It is missing some header files...
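
The Buck error above boils down to files listed in `raw_headers` that are absent from the checked-out flatbuffers revision. One quick way to see which ones are missing is sketched below; the helper name `missing_headers` is hypothetical, and only a few of the header paths from the log are listed.

```python
from pathlib import Path

# A few of the headers named in the error log; the full list is longer.
EXPECTED_HEADERS = [
    "flatbuffers/include/flatbuffers/allocator.h",
    "flatbuffers/include/flatbuffers/array.h",
    "flatbuffers/include/flatbuffers/buffer.h",
]


def missing_headers(package_root, headers=EXPECTED_HEADERS):
    """Return the headers that do not exist under package_root."""
    root = Path(package_root)
    return [h for h in headers if not (root / h).is_file()]
```

Run from the repository root, `missing_headers("third-party")` would list the expected headers that the current submodule checkout does not provide.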

@shewu-quic shewu-quic force-pushed the dev1/hutton/enable_custom_operator branch from eff8673 to a22cdd2 Compare June 5, 2025 10:12
@shewu-quic
Collaborator Author

It seems flatbuffers::Vector only takes one template argument. I've made the adjustments. Could you please try again?

@facebook-github-bot
Contributor

@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@shewu-quic shewu-quic force-pushed the dev1/hutton/enable_custom_operator branch from a22cdd2 to f28ab0e Compare June 6, 2025 06:36
@shewu-quic
Collaborator Author

> But there is a different issue...
>
> Error log

It seems that a dataclass cannot use a mutable value as the default for a class attribute.
I have changed it. Could you give it another shot? Thanks.
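
For context, this is standard `dataclasses` behavior: a `list`, `dict`, or `set` used directly as a field default raises `ValueError` when the class is defined, and the usual fix is `field(default_factory=...)`. The class and field names below are made up for illustration and are not taken from the PR.

```python
from dataclasses import dataclass, field


def define_with_mutable_default():
    """Defining this class raises ValueError because of the list default."""

    @dataclass
    class BadOptions:
        paths: list = []  # mutable default is rejected by dataclasses

    return BadOptions


@dataclass
class OpPackageOptions:  # hypothetical name
    # default_factory builds a fresh list for every instance
    paths: list = field(default_factory=list)
```

With `default_factory`, each instance gets its own list, so appending to one instance's `paths` does not leak into another.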

@facebook-github-bot
Contributor

@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@shewu-quic shewu-quic force-pushed the dev1/hutton/enable_custom_operator branch from f28ab0e to 655e416 Compare June 9, 2025 01:55
@shewu-quic
Collaborator Author

> Hmm, it seems like the import failed. Can you rebase?

Done. Thanks :)

@facebook-github-bot
Contributor

@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.


Summary:
- Support registering an op package in the QNN backend
- Add an example script to run a torch custom op with a QNN op package
- Allow an op package to override a torch built-in operator
- Add an op package example
  - Move test_custom_op to TestUtilScript
- Modify the dlopen flag for the QNN library
- Generate the custom op based on the meta and _schema.arguments of torch.fx.Node
- Add a README for the custom op
@shewu-quic shewu-quic force-pushed the dev1/hutton/enable_custom_operator branch from 655e416 to de02835 Compare June 11, 2025 03:32
@facebook-github-bot
Contributor

@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.


@cccclai cccclai merged commit 67b6009 into pytorch:main Jun 13, 2025
103 checks passed
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. release notes: qualcomm Changes to the Qualcomm backend delegate
4 participants