vLLM Blog
  • Serving LLMs on AMD MI300X: Best Practices (Oct 23, 2024)
  • How Speculative Decoding Boosts vLLM Performance by up to 2.8x (Oct 17, 2024)
  • vLLM v0.6.0: 2.7x Throughput Improvement and 5x Latency Reduction (Sep 5, 2024)
  • vLLM’s Open Governance and Performance Roadmap (Jul 25, 2024)
  • Announcing Llama 3.1 Support in vLLM (Jul 23, 2024)
  • Notes on vLLM v.s. DeepSpeed-FastGen (Nov 14, 2023)
  • vLLM: Easy, Fast, and Cheap LLM Serving with PagedAttention (Jun 20, 2023)
