# Output Types
GuideLLM provides flexible options for outputting benchmark results, catering to both console-based summaries and file-based detailed reports. This document outlines the supported output types, their configurations, and how to utilize them effectively.
For all output formats, the `--output-extras` argument can be used to include additional information, such as tags, metadata, hardware details, and anything else useful for analysis. The value must be supplied as a JSON-encoded string. For example:
```bash
guidellm benchmark \
  --target "http://localhost:8000" \
  --rate-type sweep \
  --max-seconds 30 \
  --data "prompt_tokens=256,output_tokens=128" \
  --output-extras '{"tag": "my_tag", "metadata": {"key": "value"}}'
```
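If the results are also saved to a file (see File-Based Outputs below), you can verify that the extras were recorded. The sketch below is a minimal, hedged example: the file name `benchmarks.json` and the `find_values` helper are illustrative, and it searches the JSON generically because the exact location of the extras within the report structure is not specified here.

```python
import json

def find_values(obj, key):
    """Recursively yield values for a given key anywhere in nested JSON."""
    if isinstance(obj, dict):
        for k, v in obj.items():
            if k == key:
                yield v
            yield from find_values(v, key)
    elif isinstance(obj, list):
        for item in obj:
            yield from find_values(item, key)

# Assumes results were saved to benchmarks.json via --output-path.
with open("benchmarks.json") as f:
    data = json.load(f)

# Look for the custom tag supplied via --output-extras.
print(list(find_values(data, "tag")))
```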
## Console Output
By default, GuideLLM displays benchmark results and progress directly in the console. The console progress and outputs are divided into multiple sections:
- Initial Setup Progress: Displays the progress of the initial setup, including server connection and data preparation.
- Benchmark Progress: Shows the progress of the benchmark runs, including the number of requests completed and the current rate.
- Final Results: Summarizes the benchmark results, including average latency, throughput, and other key metrics.
- Benchmarks Metadata: Summarizes the benchmark run, including server details, data configurations, and profile arguments.
- Benchmarks Info: Provides a high-level overview of each benchmark, including request statuses, token counts, and durations.
- Benchmarks Stats: Displays detailed statistics for each benchmark, such as request rates, concurrency, latency, and token-level metrics.
### Disabling Console Output
To disable the progress outputs to the console, use the `--disable-progress` flag when running the `guidellm benchmark` command. For example:
```bash
guidellm benchmark \
  --target "http://localhost:8000" \
  --rate-type sweep \
  --max-seconds 30 \
  --data "prompt_tokens=256,output_tokens=128" \
  --disable-progress
```
To disable console output, use the `--disable-console-outputs` flag when running the `guidellm benchmark` command. For example:
```bash
guidellm benchmark \
  --target "http://localhost:8000" \
  --rate-type sweep \
  --max-seconds 30 \
  --data "prompt_tokens=256,output_tokens=128" \
  --disable-console-outputs
```
### Enabling Extra Information
GuideLLM can display extra information during benchmark runs to help monitor the overheads and performance of the system. This can be enabled with the `--display-scheduler-stats` flag when running the `guidellm benchmark` command. For example:
```bash
guidellm benchmark \
  --target "http://localhost:8000" \
  --rate-type sweep \
  --max-seconds 30 \
  --data "prompt_tokens=256,output_tokens=128" \
  --display-scheduler-stats
```
The above command will display an additional row for each benchmark within the progress output, showing the scheduler overheads and other relevant information.
## File-Based Outputs
GuideLLM supports saving benchmark results to files in various formats, including JSON, YAML, and CSV. These files can be used for further analysis, reporting, or reloading into Python for detailed exploration.
### Supported File Formats
- JSON: Contains all benchmark results, including full statistics and request data. This format is ideal for reloading into Python for in-depth analysis.
- YAML: Similar to JSON, YAML files include all benchmark results and are human-readable.
- CSV: Provides a summary of the benchmark data, focusing on key metrics and statistics; see the loading sketch after this list. Note that CSV does not include detailed request-level data.
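Because the CSV output is a flat summary, it pairs well with standard data-analysis tooling. Below is a minimal sketch using pandas to inspect a saved CSV; the file name `benchmarks.csv` is an assumption, and the exact column names depend on your GuideLLM version, so list them before relying on specific fields.

```python
import pandas as pd

# Assumes results were saved with --output-path "benchmarks.csv".
df = pd.read_csv("benchmarks.csv")

# Inspect the available summary columns before relying on specific names.
print(df.columns.tolist())
print(df.head())
```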
### Configuring File Outputs
- Output Path: Use the `--output-path` argument to specify the file path or directory for saving the results. If a directory is provided, the results will be saved as `benchmarks.json` by default. The file type is determined by the file extension (e.g., `.json`, `.yaml`, `.csv`).
- Sampling: To limit the size of the output files, you can configure sampling options for the dataset using the `--output-sampling` argument.
Example command to save results in CSV format with output sampling:
```bash
guidellm benchmark \
  --target "http://localhost:8000" \
  --rate-type sweep \
  --max-seconds 30 \
  --data "prompt_tokens=256,output_tokens=128" \
  --output-path "results/benchmarks.csv" \
  --output-sampling 20
```
### Reloading Results
JSON and YAML files can be reloaded into Python for further analysis using the `GenerativeBenchmarksReport` class. Below is a sample code snippet for reloading results:
```python
from guidellm.benchmark import GenerativeBenchmarksReport

# Load a previously saved benchmark report (JSON or YAML).
report = GenerativeBenchmarksReport.load_file(path="benchmarks.json")

# Iterate over the individual benchmarks contained in the report.
for benchmark in report.benchmarks:
    print(benchmark.id_)
```
For more details on the `GenerativeBenchmarksReport` class and its methods, refer to the source code.
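If you prefer not to depend on the GuideLLM package for post-processing, the JSON report can also be inspected directly with the standard library. A minimal sketch, assuming the report was saved as `benchmarks.json`; the top-level structure depends on your GuideLLM version, so list the keys rather than hard-coding them:

```python
import json

# Load the raw report without importing GuideLLM.
with open("benchmarks.json") as f:
    report = json.load(f)

# List the top-level keys to see how the report is structured.
print(list(report.keys()))
```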