# Architecture
GuideLLM is designed to evaluate and optimize large language model (LLM) deployments by simulating real-world inference workloads. The architecture is modular, enabling flexibility and scalability. Below is an overview of the core components and their interactions.
```
+------------------+        +------------------+        +------------------+
|  DatasetCreator  | -----> |  RequestLoader   | -----> |    Scheduler     |
+------------------+        +------------------+        +------------------+
                                                             /   |   \
                                                        /        |        \
                                                    /             |             \
                                                  v                v                v
                                          +------------------+        +------------------+
                                          |  RequestsWorker  |        |  RequestsWorker  |
                                          +------------------+        +------------------+
                                                   |                            |
                                                   v                            v
                                          +------------------+        +------------------+
                                          |     Backend      |        |     Backend      |
                                          +------------------+        +------------------+
                                                   |                            |
                                                   v                            v
                                          +----------------------------------------------+
                                          |             BenchmarkAggregator              |
                                          +----------------------------------------------+
                                                                 |
                                                                 v
                                                        +------------------+
                                                        |   Benchmarker    |
                                                        +------------------+
```
## Core Components
### 1. Backend

The `Backend` is an abstract interface for interacting with generative AI backends. It is responsible for processing requests and generating results. GuideLLM supports OpenAI-compatible HTTP servers, such as vLLM, as backends.

- Responsibilities:
  - Accept requests from the `RequestsWorker`.
  - Generate responses for text or chat completions.
  - Validate backend readiness and available models.

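As a rough illustration, the contract such an interface implies might look like the sketch below. The class and method names (`check_ready`, `text_completion`) are assumptions for illustration, not GuideLLM's actual API.

```python
# Hypothetical sketch of an abstract backend interface; names are illustrative,
# not GuideLLM's actual classes or methods.
from abc import ABC, abstractmethod
from typing import AsyncIterator


class Backend(ABC):
    """Contract a concrete backend (e.g. an OpenAI-compatible HTTP server) fulfills."""

    @abstractmethod
    async def check_ready(self) -> None:
        """Raise if the server is unreachable or the requested model is not served."""

    @abstractmethod
    async def text_completion(self, prompt: str, max_tokens: int) -> AsyncIterator[str]:
        """Stream generated text chunks for a single completion request."""
```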
### 2. RequestLoader

The `RequestLoader` handles sourcing data from an iterable and generating requests for the backend. It ensures that data is properly formatted and ready for processing.

- Responsibilities:
  - Load data from datasets or synthetic sources.
  - Generate requests in a format compatible with the backend.

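As a loose sketch of that flow, the loader can be thought of as a generator that maps dataset rows to backend-ready payloads. The field names below (`prompt`, OpenAI-style `messages`) are assumptions, not GuideLLM's exact request schema.

```python
# Illustrative request loader: maps dataset rows to OpenAI-style chat payloads.
# The row and payload field names are assumptions, not GuideLLM's actual schema.
from typing import Iterable, Iterator


def load_requests(rows: Iterable[dict], model: str) -> Iterator[dict]:
    """Yield one backend-compatible request per dataset row."""
    for row in rows:
        yield {
            "model": model,
            "messages": [{"role": "user", "content": row["prompt"]}],
            "max_tokens": row.get("output_tokens", 256),
        }
```

For example, `list(load_requests([{"prompt": "Hello"}], model="my-model"))` would produce a single chat-completion payload.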
### 3. DatasetCreator

The `DatasetCreator` is responsible for loading data sources and converting them into Hugging Face (HF) dataset items. These items can then be streamed by the `RequestLoader`.

- Responsibilities:
  - Load datasets from local files, Hugging Face datasets, or synthetic data.
  - Convert data into a format compatible with the `RequestLoader`.

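A minimal sketch of that conversion, assuming the Hugging Face `datasets` library is installed; the dispatch on source type is illustrative rather than GuideLLM's own logic.

```python
# Simplified dataset creation: local JSON/JSONL file, HF Hub dataset id, or synthetic rows.
# The dispatch rules here are an assumption for illustration, not GuideLLM's own logic.
from datasets import Dataset, load_dataset


def create_dataset(source: str) -> Dataset:
    """Return a Hugging Face Dataset that the request loader can stream."""
    if source.endswith((".json", ".jsonl")):
        return load_dataset("json", data_files=source, split="train")
    if source == "synthetic":
        return Dataset.from_list(
            [{"prompt": f"synthetic prompt {i}"} for i in range(128)]
        )
    return load_dataset(source, split="train")  # treat anything else as an HF Hub id
```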
### 4. Scheduler

The `Scheduler` manages the scheduling of requests to the backend. It uses multiprocessing and multithreading with asyncio to minimize overhead and maximize throughput.

- Responsibilities:
  - Schedule requests to the backend.
  - Manage queues for requests and results.
  - Ensure efficient utilization of resources.

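A deliberately simplified scheduling loop is sketched below. The queue-and-sentinel protocol and the fixed worker count are assumptions for illustration, not the actual GuideLLM scheduler.

```python
# Simplified scheduler sketch: fan requests out to worker processes over
# multiprocessing queues and gather their results.
import multiprocessing as mp
from typing import Callable, Sequence


def run_scheduler(
    requests: Sequence[dict],
    worker_fn: Callable[..., None],
    num_workers: int = 2,
) -> list:
    """Dispatch requests to worker processes and collect their results."""
    request_queue = mp.Queue()
    result_queue = mp.Queue()

    # Start the worker processes; each receives the shared queues.
    workers = [
        mp.Process(target=worker_fn, args=(request_queue, result_queue))
        for _ in range(num_workers)
    ]
    for worker in workers:
        worker.start()

    # Enqueue all requests, followed by one shutdown sentinel per worker.
    for request in requests:
        request_queue.put(request)
    for _ in workers:
        request_queue.put(None)

    # Collect one result per request, then wait for workers to exit.
    results = [result_queue.get() for _ in requests]
    for worker in workers:
        worker.join()
    return results
```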
### 5. RequestsWorker

The `RequestsWorker` is a worker process that pulls requests from a queue, processes them using the backend, and sends the results back to the scheduler.

- Responsibilities:
  - Process requests from the scheduler.
  - Interact with the backend to generate results.
  - Return results to the scheduler.

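Continuing the scheduler sketch above, a worker's loop might look roughly like this. The use of `aiohttp` and the `/v1/chat/completions` endpoint assume an OpenAI-compatible server and are illustrative, not GuideLLM's actual worker code.

```python
# Companion sketch to the scheduler above: drain the request queue, call the
# backend asynchronously, and push results back to the scheduler.
import asyncio

import aiohttp

BASE_URL = "http://localhost:8000"  # assumed vLLM / OpenAI-compatible server


async def _complete(session: aiohttp.ClientSession, request: dict) -> dict:
    async with session.post(f"{BASE_URL}/v1/chat/completions", json=request) as resp:
        return await resp.json()


def requests_worker(request_queue, result_queue) -> None:
    """Process requests until the sentinel (None) arrives, returning results."""

    async def run() -> None:
        async with aiohttp.ClientSession() as session:
            # Note: a real worker would avoid blocking the event loop on queue.get().
            while (request := request_queue.get()) is not None:
                result_queue.put(await _complete(session, request))

    asyncio.run(run())
```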
### 6. Benchmarker

The `Benchmarker` wraps multiple invocations of the `Scheduler`, one per benchmark. It aggregates results using a `BenchmarkAggregator` and compiles them into a `Benchmark` once complete.

- Responsibilities:
  - Manage multiple benchmarks.
  - Aggregate results from the scheduler.
  - Compile results into a final benchmark report.

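The control flow this implies might be sketched roughly as follows. The notion of a per-benchmark "profile" and the function names are hypothetical, not the actual GuideLLM API; the `aggregator` is assumed to be something like the `BenchmarkAggregator` sketched in the next section.

```python
# Hypothetical benchmarker loop: one scheduler invocation per benchmark profile,
# with results handed to an aggregator for compilation. Names are illustrative.
from typing import Callable, Iterable, Sequence


def run_benchmarks(
    profiles: Iterable[dict],
    schedule: Callable[[dict], Sequence[dict]],
    aggregator,
) -> list[dict]:
    """Invoke the scheduler once per profile and compile each result set."""
    benchmarks = []
    for profile in profiles:
        per_request_results = schedule(profile)  # one Scheduler run per benchmark
        benchmarks.append(aggregator.compile(profile, per_request_results))
    return benchmarks
```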
### 7. BenchmarkAggregator

The `BenchmarkAggregator` is responsible for storing and compiling results from the benchmarks.

- Responsibilities:
  - Aggregate results from multiple benchmarks.
  - Compile results into a `Benchmark` object.

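As a final sketch, an aggregator along these lines might compute summary statistics over the raw per-request results. The metric names and dict shapes below are assumptions for illustration, not GuideLLM's actual `Benchmark` structure.

```python
# Minimal aggregator sketch: compile per-request results into a summary record.
# Metric names ("latency_s", "output_tokens") are illustrative assumptions.
from statistics import mean


class BenchmarkAggregator:
    """Stores raw results and compiles them into a final benchmark object."""

    def compile(self, profile: dict, results: list[dict]) -> dict:
        latencies = [r["latency_s"] for r in results]
        tokens = [r.get("output_tokens", 0) for r in results]
        return {
            "profile": profile,
            "requests": len(results),
            "mean_latency_s": mean(latencies) if latencies else 0.0,
            "total_output_tokens": sum(tokens),
        }
```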
## Component Interactions

The following diagram illustrates the relationships between the core components: