In this tutorial, you will learn how to load a model from a Persistent Volume (PV) in Kubernetes to optimize deployment performance. The steps include creating a PV, matching it using `pvcMatchLabels`, and deploying the Helm chart to utilize the PV. You will also verify the setup by examining its contents and measuring the performance improvement.
Create a file (e.g., `tutorials/assets/pv-03.yaml`) with the following content:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: test-vllm-pv
  labels:
    model: "llama3-pv"
spec:
  capacity:
    storage: 50Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: standard
  hostPath:
    path: /data/llama3
```
Note: You can change the path specified in the `hostPath` field to any valid directory on your Kubernetes node.
Apply the manifest and verify that the PV is available:

```bash
sudo kubectl apply -f tutorials/assets/pv-03.yaml
sudo kubectl get pv
```

Expected output:

```plaintext
NAME           CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM   STORAGECLASS   AGE
test-vllm-pv   50Gi       RWO            Retain           Available           standard       2m
```
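Since the Helm values will later select this PV by its label, you can double-check that the label is queryable with a label selector (the selector shown must match the `pvcMatchLabels` value used in the next step):

```shell
# List only PVs carrying the model=llama3-pv label
sudo kubectl get pv -l model=llama3-pv --show-labels
```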
Create a file named `tutorials/assets/values-03-match-pv.yaml` with the following content:

```yaml
servingEngineSpec:
  modelSpec:
    - name: "llama3"
      repository: "vllm/vllm-openai"
      tag: "latest"
      modelURL: "meta-llama/Llama-3.1-8B-Instruct"
      replicaCount: 1
      requestCPU: 10
      requestMemory: "16Gi"
      requestGPU: 1
      pvcStorage: "50Gi"
      pvcMatchLabels:
        model: "llama3-pv"
      vllmConfig:
        maxModelLen: 4096
      env:
        - name: HF_TOKEN
          value: <YOUR_HF_TOKEN>
```
Explanation: The `pvcMatchLabels` field specifies the labels used to match an existing Persistent Volume. In this example, it ensures that the deployment uses the PV carrying the label `model: "llama3-pv"`. This provides a way to link a specific PV to your application.
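Under the hood, the chart renders a PersistentVolumeClaim whose selector carries these labels. A minimal sketch of what such a claim looks like (the actual name and fields are determined by the chart, so treat this as illustrative only):

```yaml
# Illustrative PVC: selector.matchLabels is what binds it to test-vllm-pv
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: llama3-pvc          # hypothetical name; the chart chooses the real one
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi          # must fit within the PV's 50Gi capacity
  storageClassName: standard
  selector:
    matchLabels:
      model: "llama3-pv"     # matches the label set on test-vllm-pv
```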
Note: Make sure to replace `<YOUR_HF_TOKEN>` with your actual Hugging Face token in the `env` section.
Deploy the Helm chart with the new values file:

```bash
sudo helm install llmstack llmstack-repo/vllm-stack -f tutorials/assets/values-03-match-pv.yaml
```

Confirm that the pods are running:

```bash
sudo kubectl get pods
```

Expected output:

```plaintext
NAME                                        READY   STATUS    RESTARTS   AGE
llmstack-deployment-router-xxxx-xxxx        1/1     Running   0          1m
llmstack-llama3-deployment-vllm-xxxx-xxxx   1/1     Running   0          1m
```
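You can also confirm that the chart's PersistentVolumeClaim bound to the pre-created PV rather than provisioning a new volume (assuming the same cluster; claim names are chosen by the chart):

```shell
sudo kubectl get pvc
sudo kubectl get pv test-vllm-pv
# The PV STATUS column should now read "Bound" instead of "Available"
```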
Check the contents of the host directory:

- If using a standard Kubernetes node:

  ```bash
  sudo ls /data/llama3
  ```

- If using Minikube, access the Minikube VM and check the path:

  ```bash
  sudo minikube ssh
  ls /data/llama3/hub
  ```

You should see the model files loaded into the directory:

```plaintext
models--meta-llama--Llama-3.1-8B-Instruct  version.txt
```
To measure the performance improvement, uninstall the release, recreate the PV, and reinstall the chart. The model files under the `hostPath` directory are untouched (the `Retain` reclaim policy prevents cleanup), so the second deployment should become ready faster because the model weights no longer need to be downloaded:

```bash
sudo helm uninstall llmstack
sudo kubectl delete -f tutorials/assets/pv-03.yaml && sudo kubectl apply -f tutorials/assets/pv-03.yaml
sudo helm install llmstack llmstack-repo/vllm-stack -f tutorials/assets/values-03-match-pv.yaml
```
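One way to quantify the speed-up is to compare how long the vLLM pod takes to become ready on the first and second install, using its creation and readiness timestamps (illustrative command; substitute the real pod name shown by `kubectl get pods`):

```shell
# Prints pod creation time, then the time the Ready condition last became true;
# the gap between them is the effective startup time for that install
sudo kubectl get pod llmstack-llama3-deployment-vllm-xxxx-xxxx \
  -o jsonpath='{.metadata.creationTimestamp}{"\n"}{.status.conditions[?(@.type=="Ready")].lastTransitionTime}{"\n"}'
```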
In this tutorial, you learned how to utilize a Persistent Volume to store model weights for a vLLM serving engine. This approach optimizes deployment performance and demonstrates the benefits of Kubernetes storage resources. Continue exploring advanced configurations in future tutorials.