Getting Started
Welcome to LLM Compressor! This section will guide you through the process of installing the library, compressing your first model, and deploying it with vLLM for faster, more efficient inference.
LLM Compressor makes it simple to optimize large language models for deployment, offering a range of quantization algorithms and formats that let you balance model quality, inference performance, and resource usage.
Quick Start Guides
Follow the guides below to get started with LLM Compressor and optimize your models for production deployment.
- **Installation**: Learn how to install LLM Compressor using pip or from source (a one-line pip example follows this list).
- **Compress Your Model**: Learn how to apply quantization to your models using different algorithms and formats (see the one-shot sketch below).
- **Deploy with vLLM**: Deploy your compressed model for efficient inference using vLLM (see the serving sketch below).
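LLM Compressor is published on PyPI, so installing the latest release is typically a single command (the `llmcompressor` package name below matches the PyPI distribution; see the installation guide for source installs and supported Python versions):

```bash
pip install llmcompressor
```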
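The compression guide covers the full workflow, but a minimal one-shot quantization run looks roughly like the sketch below. The base model, calibration dataset, W4A16 scheme, and the exact `oneshot` import path are illustrative choices and may differ between releases, so follow the guide for the canonical recipe.

```python
from llmcompressor import oneshot  # import path may vary by release
from llmcompressor.modifiers.quantization import GPTQModifier

# Quantization recipe: apply GPTQ to every Linear layer with 4-bit weights
# and 16-bit activations, leaving the output head in full precision.
recipe = GPTQModifier(targets="Linear", scheme="W4A16", ignore=["lm_head"])

# Calibrate on a small dataset, quantize, and save the compressed checkpoint.
oneshot(
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",  # illustrative base model
    dataset="open_platypus",                     # illustrative calibration data
    recipe=recipe,
    output_dir="TinyLlama-1.1B-Chat-v1.0-W4A16",
    max_seq_length=2048,
    num_calibration_samples=512,
)
```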
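Once the compressed checkpoint is saved, vLLM can load it directly from the output directory (or from the Hugging Face Hub). A minimal offline-inference sketch, assuming the output directory produced by the snippet above:

```python
from vllm import LLM, SamplingParams

# vLLM reads the quantization config saved with the checkpoint, so the
# compressed model loads like any other Hugging Face model directory.
llm = LLM(model="TinyLlama-1.1B-Chat-v1.0-W4A16")

sampling = SamplingParams(temperature=0.8, max_tokens=64)
outputs = llm.generate(["What is model quantization?"], sampling)
print(outputs[0].outputs[0].text)
```

For server deployments, the same checkpoint can also be exposed over an OpenAI-compatible API using vLLM's `vllm serve` command, as described in the deployment guide.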