# Run a Benchmark
After installing GuideLLM and starting a server, you're ready to run benchmarks to evaluate your LLM deployment's performance.
Running a GuideLLM benchmark is straightforward: a single `guidellm benchmark` command pointed at your server, with a data configuration describing the requests to send.
## Basic Example
To run a benchmark against your local vLLM server with default settings:
```bash
guidellm benchmark \
  --target "http://localhost:8000" \
  --data "prompt_tokens=256,output_tokens=128"
```
This command:
- Connects to your vLLM server running at `http://localhost:8000`
- Uses synthetic data with 256 prompt tokens and 128 output tokens per request
- Automatically determines the available model on the server
- Runs a "sweep" benchmark (default) to find optimal performance points
During the benchmark, you'll see a live progress display showing the status of each benchmark as it runs.
## Understanding Benchmark Options
GuideLLM offers a wide range of configuration options to customize your benchmarks. Here are the most important parameters you should know:
### Key Parameters
| Parameter | Description | Example |
|---|---|---|
| `--target` | URL of the OpenAI-compatible server | `--target "http://localhost:8000"` |
| `--model` | Model name to benchmark (optional) | `--model "Meta-Llama-3.1-8B-Instruct"` |
| `--data` | Data configuration for benchmarking | `--data "prompt_tokens=256,output_tokens=128"` |
| `--rate-type` | Type of benchmark to run | `--rate-type sweep` |
| `--rate` | Request rate, or number of benchmarks for a sweep | `--rate 10` |
| `--max-seconds` | Duration of each benchmark in seconds | `--max-seconds 30` |
| `--output-path` | Output file path and format | `--output-path results.json` |
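As an illustration, several of these parameters can be combined in a single run. The values below simply reuse the examples from the table; adjust them for your deployment:

```bash
# Illustrative combination of the parameters above (values taken from the table;
# constant rate is used so that --rate 10 means 10 requests per second).
guidellm benchmark \
  --target "http://localhost:8000" \
  --model "Meta-Llama-3.1-8B-Instruct" \
  --data "prompt_tokens=256,output_tokens=128" \
  --rate-type constant \
  --rate 10 \
  --max-seconds 30 \
  --output-path results.json
```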
### Benchmark Types (`--rate-type`)
GuideLLM supports several benchmark types:
- `synchronous`: Runs requests one at a time (sequential)
- `throughput`: Tests maximum throughput by running requests in parallel
- `concurrent`: Runs a fixed number of parallel request streams
- `constant`: Sends requests at a fixed rate per second
- `poisson`: Sends requests following a Poisson distribution
- `sweep`: Automatically determines optimal performance points (default)
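For example, to measure behavior under a fixed level of parallelism instead of running the default sweep, you could use the `concurrent` type. Here `--rate` is treated as the stream count (an assumption based on the parameter descriptions above), and 8 streams is an arbitrary choice:

```bash
# Example: fixed parallelism with 8 concurrent request streams (illustrative values;
# --rate is assumed to set the number of streams for the concurrent type).
guidellm benchmark \
  --target "http://localhost:8000" \
  --data "prompt_tokens=256,output_tokens=128" \
  --rate-type concurrent \
  --rate 8
```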
### Data Options
For synthetic data, you can customize:
- `prompt_tokens`: Average number of tokens for prompts
- `output_tokens`: Average number of tokens for outputs
- `samples`: Number of samples to generate (default: 1000)
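As a quick sketch, these keys can be combined in a single `--data` string; the token counts and sample count below are arbitrary illustrations:

```bash
# Synthetic data: 512-token prompts, 256-token outputs, 500 samples (illustrative values).
guidellm benchmark \
  --target "http://localhost:8000" \
  --data "prompt_tokens=512,output_tokens=256,samples=500" \
  --max-seconds 60
```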
For a complete list of options, run `guidellm benchmark --help`.
## Working with Real Data
While synthetic data is convenient for quick tests, you can benchmark with real-world data:
```bash
guidellm benchmark \
  --target "http://localhost:8000" \
  --data "/path/to/your/dataset.json" \
  --rate-type constant \
  --rate 5
```
You can also use datasets from HuggingFace or customize synthetic data generation with additional parameters such as standard deviation, minimum, and maximum values.
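As a sketch, a Hugging Face dataset can be referenced in place of a local file, assuming `--data` accepts a dataset ID as noted above; the dataset name below is a placeholder, and the supported fields are covered in the Datasets documentation:

```bash
# Sketch: pass a Hugging Face dataset ID to --data (dataset name is a placeholder).
guidellm benchmark \
  --target "http://localhost:8000" \
  --data "your-org/your-dataset" \
  --rate-type constant \
  --rate 5
```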
By default, complete results are saved to `benchmarks.json` in your current directory. Use the `--output-path` parameter to specify a different location or format.
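For instance, to write the results somewhere other than the default location (the path below is arbitrary):

```bash
# Save results to a custom path instead of ./benchmarks.json (path is illustrative).
guidellm benchmark \
  --target "http://localhost:8000" \
  --data "prompt_tokens=256,output_tokens=128" \
  --output-path /tmp/guidellm/results.json
```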
Learn more about dataset options in the Datasets documentation.