vLLM Spyre Roadmap — Q3 2025
Features
| Feature | Priority | PRs |
|---|---|---|
| Continuous batching (homogeneous Tkv) | P0 | |
| FP8 model loading | P0 | #316 |
| Embedding model support (V1) | P0 | |
| LoRA support | P1 | |
| Continuous batching (heterogeneous Tkv) | P1 | |
| Prefix caching (full/majority matching) | P1 | |
vLLM Integration
| Feature | Priority | PRs |
|---|---|---|
| Deprecate V0 API | P0 | #241, #344 |
| Use BlockManager for batching | P1 | |
| Replace FMS model loading with vLLM | P2 | |
Testing
| Feature | Priority | PRs |
|---|---|---|
| Continuous batching (homogeneous Tkv) | P0 | |
| Precompiled model loading with continuous batching | P0 | |
| 128K context length support | P0 | |
| FP8 model loading | P0 | #350, #359 |
See vLLM's Q3 2025 roadmap for its upcoming features.