[Bug] Does lmdeploy support multi-node deployment on Ascend machines via k8s? · Issue #3551 · InternLM/lmdeploy · GitHub
@winni0

Description


Checklist

  • 1. I have searched related issues but cannot get the expected help.
  • 2. The bug has not been fixed in the latest version.
  • 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.

Describe the bug

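Does lmdeploy support multi-node deployment on Ascend machines via k8s? I tested it, but requesting NPU resources across multiple nodes failed with the following error: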

[Screenshot of the error]

The YAML file is as follows:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: npu-deployment
spec:
  replicas: 2
  selector:
    matchLabels:
      app: npu-app
  strategy: {}
  template:
    metadata:
      labels:
        app: npu-app
        ascend: "on"
    spec:
      hostNetwork: true 
      dnsPolicy: ClusterFirstWithHostNet
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: npu-app
              topologyKey: "kubernetes.io/hostname"
      containers:
      - name: lmdeploy-container
        image: lmdeploy:v0.8.0-ascend-46535327
        volumeMounts:
        - name: my-storage
          mountPath: /nfs
        command: ["/bin/bash", "-c", "source /usr/local/Ascend/ascend-toolkit/set_env.sh && source /usr/local/Ascend/nnal/atb/set_env.sh --cxx_abi=0 && lmdeploy serve api_server /nfs/models/Qwen2.5-72B-Instruct --server-port 23333 --device ascend --tp 4"]
        env:
        - name: LD_LIBRARY_PATH
          value: "$LD_LIBRARY_PATH:/usr/local/Ascend/driver/lib64/common/:/usr/local/Ascend/driver/lib64/driver/"
        resources:
          limits:
            huawei.com/Ascend910B: "2"
          requests:
            huawei.com/Ascend910B: "2"
      nodeSelector:  
        ascend: "on"
      volumes:
      - name: my-storage
        hostPath:
          path: /nfs
          type: Directory
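The replicas of a Deployment are independent pods, and nothing in this manifest joins the two pods into one tensor-parallel group. Each pod therefore starts its own `lmdeploy serve api_server ... --tp 4` while the device plugin grants it only two `huawei.com/Ascend910B` devices. The following minimal sketch compares the `--tp` value with the per-pod NPU limit. It assumes the relevant manifest fields are copied into a dict by hand rather than parsed with a YAML library, and the helper names `tp_from_command` and `check_npu_budget` are hypothetical.

```python
import shlex

# Relevant fields of the manifest above, copied by hand (no YAML parser used).
manifest = {
    "spec": {
        "replicas": 2,
        "template": {
            "spec": {
                "containers": [
                    {
                        "name": "lmdeploy-container",
                        "command": [
                            "/bin/bash",
                            "-c",
                            "source /usr/local/Ascend/ascend-toolkit/set_env.sh && "
                            "source /usr/local/Ascend/nnal/atb/set_env.sh --cxx_abi=0 && "
                            "lmdeploy serve api_server /nfs/models/Qwen2.5-72B-Instruct "
                            "--server-port 23333 --device ascend --tp 4",
                        ],
                        "resources": {"limits": {"huawei.com/Ascend910B": "2"}},
                    }
                ]
            }
        },
    }
}


def tp_from_command(command):
    """Hypothetical helper: read --tp from the string passed to `bash -c` (default 1)."""
    tokens = shlex.split(command[-1])
    for i, tok in enumerate(tokens):
        if tok == "--tp" and i + 1 < len(tokens):
            return int(tokens[i + 1])
    return 1


def check_npu_budget(manifest, resource="huawei.com/Ascend910B"):
    """Hypothetical helper: compare the requested TP degree with NPUs granted per pod."""
    spec = manifest["spec"]
    container = spec["template"]["spec"]["containers"][0]
    tp = tp_from_command(container["command"])
    per_pod = int(container["resources"]["limits"].get(resource, "0"))
    return {
        "replicas": spec["replicas"],
        "tp": tp,
        "npus_per_pod": per_pod,
        "total_npus": spec["replicas"] * per_pod,
        "fits_in_one_pod": tp <= per_pod,
    }


print(check_npu_budget(manifest))
# {'replicas': 2, 'tp': 4, 'npus_per_pod': 2, 'total_npus': 4, 'fits_in_one_pod': False}
```

Four NPUs exist across the two replicas, but no single pod is allocated four. The manifest itself configures nothing that would let one server process use NPUs from both pods.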

Reproduction

kubectl apply -f multi_node_lmdeploy.yaml
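Applying the manifest as-is also leaves the `LD_LIBRARY_PATH` entry unexpanded. Kubernetes does not pass `env` values through a shell. It only replaces `$(VAR_NAME)` references to variables defined earlier for the container, collapses `$$` to `$`, and keeps everything else literally, including a bare `$LD_LIBRARY_PATH`. Because the variable is set in the pod spec, it overrides the image's own `LD_LIBRARY_PATH` rather than extending it. The function below is a simplified, hypothetical model of that rule: `expand_k8s_env` and its `defined` dict, which stands in for earlier `env` entries, are illustrative names, not a Kubernetes API.

```python
def expand_k8s_env(value, defined):
    """Simplified model of Kubernetes env-value expansion.

    $(NAME) is replaced only if NAME is in `defined`; $$ becomes $;
    unresolved $(NAME) and bare $NAME are kept verbatim.
    """
    out = []
    i = 0
    while i < len(value):
        ch = value[i]
        if ch != "$" or i + 1 >= len(value):
            out.append(ch)
            i += 1
            continue
        nxt = value[i + 1]
        if nxt == "$":
            out.append("$")  # $$ -> literal $
            i += 2
        elif nxt == "(":
            end = value.find(")", i + 2)
            if end == -1:
                out.append(value[i:])  # unterminated $( : keep the rest as-is
                break
            name = value[i + 2:end]
            out.append(defined.get(name, value[i:end + 1]))  # unresolved stays literal
            i = end + 1
        else:
            out.append("$" + nxt)  # bare $NAME is not a reference
            i += 2
    return "".join(out)


raw = "$LD_LIBRARY_PATH:/usr/local/Ascend/driver/lib64/common/:/usr/local/Ascend/driver/lib64/driver/"
print(expand_k8s_env(raw, {}))  # printed unchanged: "$LD_LIBRARY_PATH" stays literal
print(expand_k8s_env("$(BASE):/extra", {"BASE": "/opt/lib"}))  # /opt/lib:/extra
```

Extending the path inside the `bash -c` command (for example `export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:...`) lets the shell do the expansion instead.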

Environment

Python: 3.10.5 (main, May 12 2025, 15:01:17) [GCC 9.4.0]
CUDA available: False
MUSA available: False
numpy_random_seed: 2147483648
GCC: gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
PyTorch: 2.3.1
PyTorch compiling details: PyTorch built with:
  - GCC 10.2
  - C++ Version: 201703
  - Intel(R) MKL-DNN v3.3.6 (Git Hash 86e6af5974177e513fd3fee58425e1063e7f1361)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: NO AVX
  - Build settings: BLAS_INFO=open, BUILD_TYPE=Release, CXX_COMPILER=/opt/rh/devtoolset-10/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DLIBKINETO_NOROCTRACER -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=pedantic -Wno-error=old-style-cast -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=open, TORCH_VERSION=2.3.1, USE_CUDA=OFF, USE_CUDNN=OFF, USE_CUSPARSELT=OFF, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_GLOO=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=OFF, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, USE_ROCM_KERNEL_ASSERT=OFF, 

TorchVision: 0.18.1
LMDeploy: 0.8.0+
transformers: 4.51.3
gradio: Not Found
fastapi: 0.115.12
pydantic: 2.11.4
triton: Not Found

Error traceback
