Using the k8s-device-plugin component in k8s to inject GPU information into Pods and mount the devices

Published 2024-04-24 11:10:03, updated 2025-08-14 17:53:54
Ops
shevechco

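k8s supports exposing GPU devices to the cluster and mounting them into Pod containers. The following setup is required:

1. Install the NVIDIA driver on the host.

2. Install nvidia-container-runtime.

3. Label the GPU nodes and deploy k8s-device-plugin.

Steps 1 and 2 are covered in my earlier notes, so here we go straight to deploying k8s-device-plugin. The plugin reports the number of GPUs on each GPU node to k8s. Once a workload's YAML declares GPUs in its resources requests/limits, the Pod is scheduled onto a GPU node automatically and the GPU devices are mounted into its container.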

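Save the following as k8s-device-plugin.yaml. Change the label under nodeSelector (nodetype: gpu here) to suit your environment; it must match a label that is actually set on your GPU nodes, for example one applied with kubectl label node <node-name> nodetype=gpu.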

cat k8s-device-plugin.yaml

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: nvidia-device-plugin-daemonset
  namespace: kube-system
spec:
  selector:
    matchLabels:
      name: nvidia-device-plugin-ds
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        name: nvidia-device-plugin-ds
    spec:
      tolerations:
      - key: nvidia.com/gpu
        operator: Exists
        effect: NoSchedule
      nodeSelector:
        nodetype: gpu
      priorityClassName: "system-node-critical"
      containers:
      - image: nvcr.io/nvidia/k8s-device-plugin:v0.17.0
        name: nvidia-device-plugin-ctr
        env:
          - name: FAIL_ON_INIT_ERROR
            value: "false"
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop: ["ALL"]
        volumeMounts:
        - name: device-plugin
          mountPath: /var/lib/kubelet/device-plugins
      volumes:
      - name: device-plugin
        hostPath:
          path: /var/lib/kubelet/device-plugins

Then deploy it with apply:

kubectl apply -f k8s-device-plugin.yaml

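Once it is deployed, describe the GPU node (kubectl describe node <gpu-node-name>) to see the GPU information registered in k8s; the nvidia.com/gpu resource is listed under Capacity and Allocatable.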

[Screenshot: describe output of the GPU node showing the registered GPU information]
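
As a rough guide to what to look for, the fragment below sketches the relevant fields of the node object as they would appear in kubectl get node <gpu-node-name> -o yaml. It is an illustrative shape only, not output captured from a real cluster, and the count of 1 is a hypothetical value; your node reports however many GPUs the plugin detected.

```yaml
# Illustrative shape only, not captured from a real cluster.
# The GPU count "1" is a hypothetical value.
status:
  capacity:
    nvidia.com/gpu: "1"      # total GPUs the device plugin registered
  allocatable:
    nvidia.com/gpu: "1"      # GPUs available for scheduling Pods
```

If nvidia.com/gpu is missing from both sections, the plugin Pod on that node is probably not running or failed to find a GPU, and its logs are the first place to check.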

Then add the following to the container spec in the workload's YAML:

        resources:
          requests:
            nvidia.com/gpu: 1
          limits:
            nvidia.com/gpu: 1
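
To show where that resources block sits in a full manifest, here is a minimal sketch of a test Pod. The Pod name, container name, and image are assumptions for illustration and do not come from the original setup; substitute your own workload image. The toleration only matters if your GPU nodes carry the nvidia.com/gpu taint that the DaemonSet above also tolerates.

```yaml
# Minimal sketch; names and image are assumptions, not part of the original post.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-test                  # hypothetical Pod name
spec:
  restartPolicy: Never
  tolerations:
  - key: nvidia.com/gpu           # only needed if GPU nodes are tainted with this key
    operator: Exists
    effect: NoSchedule
  containers:
  - name: cuda-container          # hypothetical container name
    image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda12.5.0   # assumed sample image; replace with your own
    resources:
      requests:
        nvidia.com/gpu: 1         # must equal the limit for extended resources
      limits:
        nvidia.com/gpu: 1
```

For extended resources such as nvidia.com/gpu, k8s requires requests and limits to be equal when both are set, and only whole GPUs can be requested. Setting limits alone is also accepted, in which case the request defaults to the same value.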

Tags
nvidia
plugin
k8s
gpu
pod


Please credit the source when reposting: https://sulao.cn/post/975
