- Serving LLMs on AMD MI300X: Best Practices
- How Speculative Decoding Boosts vLLM Performance by up to 2.8x
- vLLM v0.6.0: 2.7x Throughput Improvement and 5x Latency Reduction
- vLLM’s Open Governance and Performance Roadmap
- Announcing Llama 3.1 Support in vLLM
- Notes on vLLM v.s. DeepSpeed-FastGen
- vLLM: Easy, Fast, and Cheap LLM Serving with PagedAttention