Scheduling Pods onto GPU Nodes and Mounting GPU Devices in k8s

k8s already supports scheduling GPU devices and mounting them into Pod containers. The following setup is required:

1. First, install the NVIDIA GPU driver on the host machines.

2. Install nvidia-container-runtime.

3. Label the GPU nodes and deploy k8s-device-plugin.

Steps 1 and 2 are covered in my earlier notes (search this site), so here we go straight to deploying k8s-device-plugin. The plugin reports each GPU node's GPU count to k8s; after that, adding a GPU entry to the requests/limits section of a workload's YAML is enough for the Pod to be scheduled onto a GPU node with the GPU devices mounted automatically.
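Before deploying, it is worth sanity-checking steps 1 and 2 on the GPU host. A minimal check, assuming Docker is the container runtime; the CUDA image tag below is only an example, use any CUDA base image you have:

# Verify the NVIDIA driver works on the host
nvidia-smi

# Verify nvidia-container-runtime can expose GPUs to containers
# (example image tag; any CUDA base image that ships nvidia-smi will do)
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi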

Save the following as k8s-device-plugin.yaml; adjust the nodeSelector label to match the labels on your own GPU nodes (the label command is shown after the manifest).

cat k8s-device-plugin.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: nvidia-device-plugin-daemonset
  namespace: kube-system
spec:
  selector:
    matchLabels:
      name: nvidia-device-plugin-ds
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        name: nvidia-device-plugin-ds
    spec:
      tolerations:
      - key: nvidia.com/gpu
        operator: Exists
        effect: NoSchedule
      nodeSelector:
        nodetype: gpu
      priorityClassName: "system-node-critical"
      containers:
      - image: nvcr.io/nvidia/k8s-device-plugin:v0.13.0
        name: nvidia-device-plugin-ctr
        env:
          - name: FAIL_ON_INIT_ERROR
            value: "false"
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop: ["ALL"]
        volumeMounts:
        - name: device-plugin
          mountPath: /var/lib/kubelet/device-plugins
      volumes:
      - name: device-plugin
        hostPath:
          path: /var/lib/kubelet/device-plugins
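The DaemonSet above only schedules onto nodes labeled nodetype=gpu (the nodeSelector), so label every GPU node first; <gpu-node-name> is a placeholder for your actual node name:

kubectl label nodes <gpu-node-name> nodetype=gpu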

Then deploy it with kubectl apply:

kubectl apply -f k8s-device-plugin.yaml
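To confirm the plugin is running on each GPU node, list its pods using the name label defined in the DaemonSet above:

kubectl -n kube-system get pods -l name=nvidia-device-plugin-ds -o wide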

After deployment, describing a GPU node shows the GPU resources that have been registered with k8s.

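For example (the node name is a placeholder and the output is abbreviated; the reported count depends on the node's hardware):

kubectl describe node <gpu-node-name>
...
Capacity:
  nvidia.com/gpu:  1
...
Allocatable:
  nvidia.com/gpu:  1
...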

Then add the following to the container spec in the workload's YAML:

        resources:
          requests:
            nvidia.com/gpu: 1
          limits:
            nvidia.com/gpu: 1
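For reference, a minimal standalone test Pod that requests one GPU and prints nvidia-smi output; the image tag is only an example, use any CUDA base image available in your environment:

apiVersion: v1
kind: Pod
metadata:
  name: gpu-test
spec:
  restartPolicy: Never
  containers:
  - name: cuda
    # example image; any CUDA base image that ships nvidia-smi works
    image: nvidia/cuda:11.8.0-base-ubuntu22.04
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: 1   # for extended resources, requests defaults to limits
  # if your GPU nodes carry the nvidia.com/gpu taint, add a matching toleration here

Requesting nvidia.com/gpu alone is enough for the scheduler to place the Pod on a GPU node, since only GPU nodes advertise that resource; check the result with kubectl logs gpu-test.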


