
LLM Compressor

[Diagram: LLM Compressor Flow]

LLM Compressor is an easy-to-use library for optimizing large language models for deployment with vLLM, enabling up to 5X faster, cheaper inference. It provides a comprehensive toolkit for:

  • Applying a wide variety of compression algorithms, including weight and activation quantization, pruning, and more
  • Seamlessly integrating with Hugging Face Transformers, Models, and Datasets
  • Using a safetensors-based file format for compressed model storage that is compatible with vLLM
  • Supporting performant compression of large models via accelerate
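To make the weight-quantization idea above concrete, here is a minimal round-to-nearest (RTN) symmetric INT8 sketch in plain Python. It illustrates the transform that weight quantization schemes apply conceptually; it is not the LLM Compressor API, and the function names are illustrative only.

```python
# Hypothetical illustration of symmetric round-to-nearest (RTN) INT8
# weight quantization: floats are mapped to int8 codes plus one scale,
# then dequantized back to approximate floats at inference time.

def quantize_int8(weights):
    """Map float weights to int8 codes plus a per-tensor scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs else 1.0
    # Round to the nearest int8 code, clamping to the valid range.
    return [max(-128, min(127, round(w / scale))) for w in weights], scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 codes."""
    return [v * scale for v in q]

row = [0.02, -0.5, 0.31, 1.27, -1.0]
q, scale = quantize_int8(row)
approx = dequantize(q, scale)
# The reconstruction error is bounded by half the quantization step.
err = max(abs(a - b) for a, b in zip(row, approx))
print(q, scale, err)
```

Real schemes such as GPTQ and AWQ go further, choosing scales (and rounding) to minimize layer output error rather than rounding each weight independently.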

Key Features

  • Weight and Activation Quantization: Reduce model size and improve inference performance for general and server-based applications with the latest research.
    • Supported Algorithms: GPTQ, AWQ, SmoothQuant, RTN
    • Supported Formats: INT W8A8, FP W8A8
  • Weight-Only Quantization: Reduce model size and improve inference performance for latency-sensitive applications with the latest research.
    • Supported Algorithms: GPTQ, AWQ, RTN
    • Supported Formats: INT W4A16, INT W8A16
  • Weight Pruning: Reduce model size and improve inference performance for all use cases with the latest research.
    • Supported Algorithms: SparseGPT, Magnitude, Sparse Finetuning
    • Supported Formats: 2:4 (semi-structured), unstructured
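The 2:4 (semi-structured) format listed above means that in every contiguous group of four weights, at most two are nonzero, which hardware such as NVIDIA sparse tensor cores can exploit. A minimal magnitude-based sketch in plain Python (illustrative only, not the LLM Compressor API):

```python
# Hypothetical illustration of 2:4 semi-structured magnitude pruning:
# in each contiguous group of 4 weights, keep the 2 with the largest
# magnitude and zero out the rest.

def prune_2_of_4(weights):
    """Return weights with the two smallest-magnitude entries of each
    group of four set to zero."""
    pruned = []
    for i in range(0, len(weights), 4):
        group = weights[i:i + 4]
        # Indices of the two largest-magnitude entries in this group.
        keep = sorted(range(len(group)), key=lambda j: abs(group[j]))[-2:]
        pruned.extend(w if j in keep else 0.0 for j, w in enumerate(group))
    return pruned

row = [0.9, -0.1, 0.05, -1.2, 0.3, 0.02, -0.4, 0.0]
print(prune_2_of_4(row))
```

Algorithms such as SparseGPT choose which weights to drop (and adjust the survivors) to minimize layer output error, rather than using raw magnitude alone.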

Key Sections

  • Getting Started

    Install LLM Compressor and learn how to apply your first optimization recipe.

  • Guides

    Detailed guides covering compression schemes, algorithms, and advanced usage patterns.

  • Examples

    Step-by-step examples for different compression techniques and model types.

  • Developer Resources

    Information for contributors and developers extending LLM Compressor.