[Issue]: ROCm TE Installation Error: no member named 'getCurrentHIPStreamMasqueradingAsCUDA' in namespace 'c10::hip' · Issue #71 · ROCm/TransformerEngine
@functionstackx

Problem Description

I am running into the compile error "no member named 'getCurrentHIPStreamMasqueradingAsCUDA' in namespace 'c10::hip'" when trying to install ROCm/TransformerEngine from source by following the instructions in the README. Do you have any tips on how to resolve this error?

Reproduction

FROM rocm/pytorch:rocm6.2_ubuntu22.04_py3.10_pytorch_release_2.3.0

RUN apt-get update && apt-get install -y nano

RUN pip3 uninstall -y torch

RUN pip3 install --pre torch --index-url https://download.pytorch.org/whl/nightly/rocm6.2

RUN pip3 install pybind11

WORKDIR /workspace/

# Unlike the NVIDIA NGC PyTorch image, the ROCm PyTorch image does not ship with Transformer Engine installed,
# so we need to install it from source.
RUN git clone --recursive https://github.com/ROCm/TransformerEngine.git
ENV NVTE_FRAMEWORK=pytorch
ENV PYTORCH_ROCM_ARCH=gfx942

RUN cd TransformerEngine && pip install .

Error Trace

      /workspace/TransformerEngine/transformer_engine/pytorch/csrc/extensions/activation.hip:255:58: error: no member named 'getCurrentHIPStreamMasqueradingAsCUDA' in namespace 'c10::hip'
        255 |   nvte_srelu(input_cu.data(), output_cu.data(), at::hip::getCurrentHIPStreamMasqueradingAsCUDA());
            |                                                 ~~~~~~~~~^
      /workspace/TransformerEngine/transformer_engine/pytorch/csrc/extensions/activation.hip:274:75: error: no member named 'getCurrentHIPStreamMasqueradingAsCUDA' in namespace 'c10::hip'
        274 |   nvte_dsrelu(grad_cu.data(), input_cu.data(), output_cu.data(), at::hip::getCurrentHIPStreamMasqueradingAsCUDA());
            |                                                                  ~~~~~~~~~^
      14 errors generated when compiling for gfx942.
      failed to execute:/opt/rocm/lib/llvm/bin/clang++  --offload-arch=gfx942  -I/workspace/TransformerEngine/transformer_engine -I/workspace/TransformerEngine/transformer_engine/common -I/workspace/TransformerEngine/transformer_engine/common/include -I/workspace/TransformerEngine/transformer_engine/pytorch/csrc -I/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include -I/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/TH -I/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/THC -I/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/THH -I/opt/rocm/include -I/opt/conda/envs/py_3.10/include/python3.10 -c -c -x hip /workspace/TransformerEngine/transformer_engine/pytorch/csrc/extensions/activation.hip -o "/workspace/TransformerEngine/build/temp.linux-x86_64-cpython-310/transformer_engine/pytorch/csrc/extensions/activation.o" -fPIC -D__HIP_PLATFORM_AMD__=1 -DUSE_ROCM=1 -DHIPBLAS_V2 -DCUDA_HAS_FP16=1 -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -O3 -U__HIP_NO_HALF_OPERATORS__ -U__HIP_NO_HALF_CONVERSIONS__ -U__HIP_NO_BFLOAT16_OPERATORS__ -U__HIP_NO_BFLOAT16_CONVERSIONS__ -U__HIP_NO_BFLOAT162_OPERATORS__ -U__HIP_NO_BFLOAT162_CONVERSIONS__ -parallel-jobs=4 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -DTORCH_EXTENSION_NAME=transformer_engine_torch -D_GLIBCXX_USE_CXX11_ABI=0 -fno-gpu-rdc -std=c++17
      [14/17] c++ -MMD -MF /workspace/TransformerEngine/build/temp.linux-x86_64-cpython-310/transformer_engine/pytorch/csrc/ts_fp8_op_hip.o.d -pthread -B /opt/conda/envs/py_3.10/compiler_compat -Wno-unused-result -Wsign-compare -DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /opt/conda/envs/py_3.10/include -fPIC -O2 -isystem /opt/conda/envs/py_3.10/include -fPIC -I/workspace/TransformerEngine/transformer_engine -I/workspace/TransformerEngine/transformer_engine/common -I/workspace/TransformerEngine/transformer_engine/common/include -I/workspace/TransformerEngine/transformer_engine/pytorch/csrc -I/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include -I/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/TH -I/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/THC -I/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/THH -I/opt/rocm/include -I/opt/conda/envs/py_3.10/include/python3.10 -c -c /workspace/TransformerEngine/transformer_engine/pytorch/csrc/ts_fp8_op_hip.cpp -o /workspace/TransformerEngine/build/temp.linux-x86_64-cpython-310/transformer_engine/pytorch/csrc/ts_fp8_op_hip.o -fPIC -D__HIP_PLATFORM_AMD__=1 -DUSE_ROCM=1 -DHIPBLAS_V2 -fPIC -D__HIP_PLATFORM_AMD__=1 -DUSE_ROCM=1 -DHIPBLAS_V2 -O3 -fvisibility=hidden -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=transformer_engine_torch -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++17
      [15/17] c++ -MMD -MF /workspace/TransformerEngine/build/temp.linux-x86_64-cpython-310/transformer_engine/pytorch/csrc/extensions/pybind_hip.o.d -pthread -B /opt/conda/envs/py_3.10/compiler_compat -Wno-unused-result -Wsign-compare -DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /opt/conda/envs/py_3.10/include -fPIC -O2 -isystem /opt/conda/envs/py_3.10/include -fPIC -I/workspace/TransformerEngine/transformer_engine -I/workspace/TransformerEngine/transformer_engine/common -I/workspace/TransformerEngine/transformer_engine/common/include -I/workspace/TransformerEngine/transformer_engine/pytorch/csrc -I/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include -I/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/TH -I/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/THC -I/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/THH -I/opt/rocm/include -I/opt/conda/envs/py_3.10/include/python3.10 -c -c /workspace/TransformerEngine/transformer_engine/pytorch/csrc/extensions/pybind_hip.cpp -o /workspace/TransformerEngine/build/temp.linux-x86_64-cpython-310/transformer_engine/pytorch/csrc/extensions/pybind_hip.o -fPIC -D__HIP_PLATFORM_AMD__=1 -DUSE_ROCM=1 -DHIPBLAS_V2 -fPIC -D__HIP_PLATFORM_AMD__=1 -DUSE_ROCM=1 -DHIPBLAS_V2 -O3 -fvisibility=hidden -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=transformer_engine_torch -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++17
      [16/17] /opt/rocm/bin/hipcc  -I/workspace/TransformerEngine/transformer_engine -I/workspace/TransformerEngine/transformer_engine/common -I/workspace/TransformerEngine/transformer_engine/common/include -I/workspace/TransformerEngine/transformer_engine/pytorch/csrc -I/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include -I/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/TH -I/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/THC -I/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/THH -I/opt/rocm/include -I/opt/conda/envs/py_3.10/include/python3.10 -c -c /workspace/TransformerEngine/transformer_engine/pytorch/csrc/extensions/misc.hip -o /workspace/TransformerEngine/build/temp.linux-x86_64-cpython-310/transformer_engine/pytorch/csrc/extensions/misc.o -fPIC -D__HIP_PLATFORM_AMD__=1 -DUSE_ROCM=1 -DHIPBLAS_V2 -DCUDA_HAS_FP16=1 -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -O3 -U__HIP_NO_HALF_OPERATORS__ -U__HIP_NO_HALF_CONVERSIONS__ -U__HIP_NO_BFLOAT16_OPERATORS__ -U__HIP_NO_BFLOAT16_CONVERSIONS__ -U__HIP_NO_BFLOAT162_OPERATORS__ -U__HIP_NO_BFLOAT162_CONVERSIONS__ -parallel-jobs=4 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=transformer_engine_torch -D_GLIBCXX_USE_CXX11_ABI=0 --offload-arch=gfx942 -fno-gpu-rdc -std=c++17
      [17/17] /opt/rocm/bin/hipcc  -I/workspace/TransformerEngine/transformer_engine -I/workspace/TransformerEngine/transformer_engine/common -I/workspace/TransformerEngine/transformer_engine/common/include -I/workspace/TransformerEngine/transformer_engine/pytorch/csrc -I/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include -I/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/TH -I/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/THC -I/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/THH -I/opt/rocm/include -I/opt/conda/envs/py_3.10/include/python3.10 -c -c /workspace/TransformerEngine/transformer_engine/pytorch/csrc/common.hip -o /workspace/TransformerEngine/build/temp.linux-x86_64-cpython-310/transformer_engine/pytorch/csrc/common.o -fPIC -D__HIP_PLATFORM_AMD__=1 -DUSE_ROCM=1 -DHIPBLAS_V2 -DCUDA_HAS_FP16=1 -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -O3 -U__HIP_NO_HALF_OPERATORS__ -U__HIP_NO_HALF_CONVERSIONS__ -U__HIP_NO_BFLOAT16_OPERATORS__ -U__HIP_NO_BFLOAT16_CONVERSIONS__ -U__HIP_NO_BFLOAT162_OPERATORS__ -U__HIP_NO_BFLOAT162_CONVERSIONS__ -parallel-jobs=4 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=transformer_engine_torch -D_GLIBCXX_USE_CXX11_ABI=0 --offload-arch=gfx942 -fno-gpu-rdc -std=c++17
      ninja: build stopped: subcommand failed.
      Traceback (most recent call last):
        File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 2147, in _run_ninja_build
          subprocess.run(
        File "/opt/conda/envs/py_3.10/lib/python3.10/subprocess.py", line 526, in run
          raise CalledProcessError(retcode, process.args,
      subprocess.CalledProcessError: Command '['ninja', '-v', '-j', '32']' returned non-zero exit status 1.
      
      The above exception was the direct cause of the following exception:
      
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/workspace/TransformerEngine/setup.py", line 135, in <module>
          setuptools.setup(
        File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/setuptools/__init__.py", line 104, in setup
          return distutils.core.setup(**attrs)
        File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 184, in setup
          return run_commands(dist)
        File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 200, in run_commands
          dist.run_commands()
        File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
          self.run_command(cmd)
        File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/setuptools/dist.py", line 967, in run_command
          super().run_command(command)
        File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
          cmd_obj.run()
        File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/wheel/bdist_wheel.py", line 368, in run
          self.run_command("build")
        File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/setuptools/_distutils/cmd.py", line 316, in run_command
          self.distribution.run_command(command)
        File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/setuptools/dist.py", line 967, in run_command
          super().run_command(command)
        File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
          cmd_obj.run()
        File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/setuptools/_distutils/command/build.py", line 132, in run
          self.run_command(cmd_name)
        File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/setuptools/_distutils/cmd.py", line 316, in run_command
          self.distribution.run_command(command)
        File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/setuptools/dist.py", line 967, in run_command
          super().run_command(command)
        File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
          cmd_obj.run()
        File "/workspace/TransformerEngine/build_tools/build_ext.py", line 117, in run
          super().run()
        File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/setuptools/command/build_ext.py", line 91, in run
          _build_ext.run(self)
        File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/setuptools/_distutils/command/build_ext.py", line 359, in run
          self.build_extensions()
        File "/workspace/TransformerEngine/build_tools/build_ext.py", line 246, in build_extensions
          super().build_extensions()
        File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 899, in build_extensions
          build_ext.build_extensions(self)
        File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/setuptools/_distutils/command/build_ext.py", line 479, in build_extensions
          self._build_extensions_serial()
        File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/setuptools/_distutils/command/build_ext.py", line 505, in _build_extensions_serial
          self.build_extension(ext)
        File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/setuptools/command/build_ext.py", line 252, in build_extension
          _build_ext.build_extension(self, ext)
        File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/Cython/Distutils/build_ext.py", line 135, in build_extension
          super(build_ext, self).build_extension(ext)
        File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/setuptools/_distutils/command/build_ext.py", line 560, in build_extension
          objects = self.compiler.compile(
        File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 712, in unix_wrap_ninja_compile
          _write_ninja_file_and_compile_objects(
        File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1827, in _write_ninja_file_and_compile_objects
          _run_ninja_build(
        File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 2163, in _run_ninja_build
          raise RuntimeError(message) from e
      RuntimeError: Error compiling objects for extension
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
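
A minimal diagnostic sketch, not a fix: since the compiler reports that the name is missing from namespace 'c10::hip', one thing that may be worth checking is whether the headers of the installed PyTorch nightly mention getCurrentHIPStreamMasqueradingAsCUDA at all. The helper below scans a header tree for a plain string. Note that find_symbol is a hypothetical name introduced here for illustration; it is not part of PyTorch or Transformer Engine.

```python
from pathlib import Path


def find_symbol(include_dir, symbol, suffixes=(".h", ".hpp", ".cuh")):
    """Return the sorted header paths under include_dir whose text contains symbol.

    find_symbol is a hypothetical helper for this issue, not a PyTorch API.
    It does a plain substring search, so it cannot tell a declaration from a
    comment or a call site.
    """
    root = Path(include_dir)
    if not root.is_dir():
        # A missing directory simply yields no hits.
        return []
    hits = []
    for path in root.rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            # Ignore undecodable bytes so an odd header cannot abort the scan.
            text = path.read_text(encoding="utf-8", errors="ignore")
            if symbol in text:
                hits.append(path)
    return sorted(hits)
```

Inside the container from the reproduction, it could be called as find_symbol("/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include", "getCurrentHIPStreamMasqueradingAsCUDA"), where the include path is the one passed via -I in the trace above. An empty result would suggest that the installed nightly does not ship the declaration. A non-empty result would suggest that the header exists but is not included by the generated .hip sources. Either reading is only a guess until it is confirmed against the actual build.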

Operating System

Ubuntu

CPU

AMD CPU

GPU

AMD Instinct MI300X

ROCm Version

ROCm 6.2.0
