- Distributed Inference with vLLM
- vLLM V1: A Major Upgrade to vLLM's Core Architecture
- Introducing vLLM Inference Provider in Llama Stack
- High Performance and Easy Deployment of vLLM in K8S with "vLLM production-stack"
- Structured Decoding in vLLM: a gentle introduction
- vLLM 2024 Retrospective and 2025 Vision
- Installing and Developing vLLM with Ease
- Serving LLMs on AMD MI300X: Best Practices
- How Speculative Decoding Boosts vLLM Performance by up to 2.8x
- vLLM v0.6.0: 2.7x Throughput Improvement and 5x Latency Reduction
- vLLM's Open Governance and Performance Roadmap
- Announcing Llama 3.1 Support in vLLM
- Notes on vLLM v.s. DeepSpeed-FastGen
- vLLM: Easy, Fast, and Cheap LLM Serving with PagedAttention