# Getting Started
Welcome to GuideLLM! This section walks you through installing the tool, setting up a benchmarking environment, running your first benchmark, and analyzing the results to optimize your LLM deployment for real-world inference workloads.
GuideLLM makes it simple to evaluate and optimize large language model deployments, helping you find the right balance between performance, resource utilization, and cost.
## Quick Start Guides
Follow the guides below in sequence to get the most out of GuideLLM and prepare your LLM deployments for production use. A command sketch for each step appears after the list.
- **Installation**: Learn how to install GuideLLM with pip, from source, or pinned to a specific version.
- **Start a Server**: Set up an OpenAI-compatible server using vLLM or another supported backend as the benchmark target.
- **Run Benchmarks**: Configure and run performance benchmarks against your LLM server under various load conditions.
- **Analyze Results**: Interpret benchmark results to understand throughput, latency, and reliability, and to tune your deployment accordingly.
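For the installation step, a minimal install from PyPI looks like the sketch below; the source-install URL assumes the project's GitHub repository, so adjust it if the code lives elsewhere:

```bash
# Install the latest GuideLLM release from PyPI
pip install guidellm

# Or install the development version straight from source
# (repository URL is an assumption; substitute the actual location)
pip install git+https://github.com/neuralmagic/guidellm.git
```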
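For the server step, one option is vLLM's built-in OpenAI-compatible entrypoint; the model ID below is only an example, and the server listens on http://localhost:8000 by default:

```bash
# Install vLLM and serve a model behind an OpenAI-compatible API
pip install vllm
vllm serve "meta-llama/Llama-3.1-8B-Instruct"
```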
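With the server up, a first benchmark can sweep through load levels against it using synthetic data; the flags below reflect recent GuideLLM releases, so verify them with `guidellm benchmark --help` for your installed version:

```bash
# Sweep from synchronous to maximum-throughput load, 30 seconds per level,
# using synthetic requests of 256 prompt tokens and 128 output tokens
guidellm benchmark \
  --target "http://localhost:8000" \
  --rate-type sweep \
  --max-seconds 30 \
  --data "prompt_tokens=256,output_tokens=128"
```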
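For the analysis step, the console summary reports request throughput alongside latency metrics such as time to first token (TTFT) and inter-token latency (ITL). To keep the full results for offline inspection, the report can also be written to a file; the `--output-path` flag is an assumption here, so confirm it in the CLI help:

```bash
# Save complete benchmark results for later analysis
# (--output-path is assumed; check `guidellm benchmark --help`)
guidellm benchmark \
  --target "http://localhost:8000" \
  --rate-type sweep \
  --max-seconds 30 \
  --data "prompt_tokens=256,output_tokens=128" \
  --output-path benchmarks.json
```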