llmcompressor.pipelines
BasicPipeline
Bases: CalibrationPipeline
Source code in src/llmcompressor/pipelines/basic/pipeline.py
__call__(model, dataloader, dataset_args)
staticmethod
Run a basic data pipeline.
Batches are fetched from the data loader and used to perform forward passes through the model. This pipeline is typically used for basic model calibration and, unlike the sequential pipelines, does not propagate compression error between layers when calibrating model compression.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model | Module | model being calibrated | required |
dataloader | DataLoader | loads data for calibration | required |
dataset_args | Union[DatasetArguments, None] | dataset arguments relevant to pipelines | required |
Source code in src/llmcompressor/pipelines/basic/pipeline.py
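The control flow described above can be sketched in plain Python. This is an illustrative stand-in, not the real implementation: `ObservedModel` mimics a model with calibration hooks attached, and `basic_pipeline` simply runs every batch forward once, discarding outputs.

```python
def basic_pipeline(model, dataloader):
    """Run every batch through the model once so attached calibration
    hooks can observe activations. No compression error is propagated."""
    for batch in dataloader:
        model(batch)  # forward pass only; outputs are discarded


class ObservedModel:
    """Minimal model stand-in that records the inputs it sees,
    the way a calibration hook would observe activations."""

    def __init__(self):
        self.observed = []

    def __call__(self, batch):
        self.observed.append(batch)
        return [x * 2 for x in batch]  # dummy computation


model = ObservedModel()
basic_pipeline(model, dataloader=[[1, 2], [3, 4]])
print(model.observed)  # → [[1, 2], [3, 4]]
```

Because outputs are thrown away, each layer calibrates against the original (uncompressed) activations — which is exactly why this pipeline does not propagate compression error.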
CalibrationPipeline
Bases: ABC
, RegistryMixin
Source code in src/llmcompressor/pipelines/registry.py
from_modifiers(modifiers, user=None)
classmethod
Infer which calibration pipeline to use based on the available modifiers and any user specification.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
modifiers | List[Modifier] | modifiers to apply to model | required |
user | Optional[str] | pipeline name passed by user | None |
Returns:
Type | Description |
---|---|
CalibrationPipeline | CalibrationPipeline instance to be called with data (if not data-free) |
Source code in src/llmcompressor/pipelines/registry.py
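A hypothetical sketch of how a registry-backed `from_modifiers` might dispatch (the real logic lives in `src/llmcompressor/pipelines/registry.py`; the class names, the `needs_data` key, and the selection rule below are all illustrative assumptions, not the library's actual behavior):

```python
# Toy pipeline registry: explicit user choice wins; otherwise infer
# from the modifiers (here via an assumed "needs_data" flag).
PIPELINES = {}


def register(name):
    def wrap(cls):
        PIPELINES[name] = cls
        return cls
    return wrap


@register("datafree")
class DataFree: ...


@register("sequential")
class Sequential: ...


def from_modifiers(modifiers, user=None):
    if user is not None:
        return PIPELINES[user]()        # explicit user choice wins
    if all(not m.get("needs_data") for m in modifiers):
        return PIPELINES["datafree"]()  # nothing needs calibration data
    return PIPELINES["sequential"]()    # fall back to a data pipeline


print(type(from_modifiers([{"needs_data": True}])).__name__)  # → Sequential
```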
DataFreePipeline
Bases: CalibrationPipeline
Source code in src/llmcompressor/pipelines/data_free/pipeline.py
__call__(model, dataloader, dataset_args)
staticmethod
A pipeline for data-free calibration.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model | Module | model being calibrated | required |
dataloader | Optional[DataLoader] | loads data for calibration | required |
dataset_args | DatasetArguments | dataset arguments relevant to pipelines | required |
Source code in src/llmcompressor/pipelines/data_free/pipeline.py
IndependentPipeline
Bases: CalibrationPipeline
Source code in src/llmcompressor/pipelines/independent/pipeline.py
__call__(model, dataloader, dataset_args)
staticmethod
Data pipeline where each modifier is assigned its own calibration epoch and its own data pipeline.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model | Module | model being calibrated | required |
dataloader | DataLoader | loads data for calibration | required |
dataset_args | DatasetArguments | dataset arguments relevant to pipelines | required |
Source code in src/llmcompressor/pipelines/independent/pipeline.py
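The one-epoch-per-modifier idea can be sketched as follows. This is a plain-Python illustration, not the real implementation: modifiers are modeled as dicts with an assumed `active` flag, and `run_pipeline` stands in for whichever inner pipeline each modifier gets.

```python
def independent_pipeline(model, dataloader, modifiers, run_pipeline):
    """Give each modifier its own calibration epoch: activate one
    modifier, run a full data pipeline for it, then move on."""
    for modifier in modifiers:
        modifier["active"] = True
        run_pipeline(model, dataloader)  # e.g. a basic/sequential pipeline
        modifier["active"] = False


calls = []
modifiers = [
    {"name": "W8A8", "active": False},
    {"name": "sparsity", "active": False},
]


def run_one_epoch(model, dataloader):
    # stand-in for the inner pipeline; records which modifier is active
    calls.append([m["name"] for m in modifiers if m["active"]])


independent_pipeline(None, [], modifiers, run_one_epoch)
print(calls)  # → [['W8A8'], ['sparsity']]
```

Each epoch sees exactly one active modifier, so modifiers cannot interfere with each other's calibration statistics.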
LayerSequentialPipeline
Bases: CalibrationPipeline
Source code in src/llmcompressor/pipelines/layer_sequential/pipeline.py
__call__(model, dataloader, dataset_args)
staticmethod
Run a layer-wise sequential data pipeline according to the following steps:
- Layers are identified according to
sequential_targets
- A hook is attached to the first layer. This hook raises an exception which is then caught and used to capture the input arguments to the first layer
- The inputs to the first layer are used to calibrate the first layer, and the output of the previous layer is used as inputs to calibrate the next layer
This pipeline requires that the model have distinct layers defined in its architecture and that the outputs of the previous layer are exactly the inputs to the next layer. This is violated by encoder-decoder architectures, among others.
If your model architecture violates these assumptions, consider using the sequential pipeline (see llmcompressor.pipelines.sequential). Architectures known to fail these assumptions include GPT-J and most vision models.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model | Module | model being calibrated | required |
dataloader | DataLoader | loads data for calibration | required |
dataset_args | DatasetArguments | dataset arguments relevant to pipelines | required |
Source code in src/llmcompressor/pipelines/layer_sequential/pipeline.py
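The chained calibrate-then-propagate loop described above can be sketched in plain Python. The function and callback names below are illustrative, not the real API; layers are modeled as plain callables and `calibrate` stands in for whatever hooks observe each layer's inputs.

```python
def layer_sequential_pipeline(layers, first_inputs, calibrate):
    """Calibrate layer by layer: the outputs of layer i become the
    inputs to layer i+1 (assumes strictly chained layers, as the
    docstring requires)."""
    inputs = first_inputs
    for index, layer in enumerate(layers):
        calibrate(index, inputs)             # hooks observe these inputs
        inputs = [layer(x) for x in inputs]  # propagate to the next layer
    return inputs


seen = []
final = layer_sequential_pipeline(
    layers=[lambda x: x + 1, lambda x: x * 2],
    first_inputs=[1, 2],
    calibrate=lambda i, inputs: seen.append((i, list(inputs))),
)
print(final)  # → [4, 6]
```

Because layer i+1 is calibrated on the *outputs* of the already-compressed layer i, compression error is propagated forward — the property that distinguishes this pipeline from the basic one.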
SequentialPipeline
Bases: CalibrationPipeline
Source code in src/llmcompressor/pipelines/sequential/pipeline.py
__call__(model, dataloader, dataset_args)
staticmethod
Run a sequential data pipeline according to the following steps:
- The model is partitioned into subgraphs according to
sequential_targets
- Data passes through each subgraph sequentially. Data is passed through each subgraph twice: once to trigger calibration hooks, then a second time to capture activations after quantization has been applied by the hooks.
- The intermediate activations between each subgraph are cached and offloaded to the CPU between each batch in order to save memory.
This pipeline requires that the model be traceable with respect to data from the data loader. This may be an issue for vision models with vision datasets, due to specialized input processing in the model.
In the event that tracing fails, a torch.fx.proxy.TraceError will be raised. A model can be made traceable by wrapping the untraceable functions (see llmcompressor.transformers.tracing).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model | Module | model being calibrated | required |
dataloader | DataLoader | loads data for calibration | required |
dataset_args | DatasetArguments | dataset arguments relevant to pipelines | required |
Source code in src/llmcompressor/pipelines/sequential/pipeline.py
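The steps above — two passes per subgraph, with intermediates offloaded between batches — can be sketched in plain Python. This is an illustrative model of the control flow, not the real implementation: subgraphs are plain callables and `offload`/`onload` stand in for the CPU offloading of cached activations.

```python
def sequential_pipeline(subgraphs, batches, offload, onload):
    """Per subgraph: pass each cached activation through twice — once so
    calibration hooks fire, once to capture post-quantization outputs —
    caching the intermediates (offloaded between uses) for the next
    subgraph."""
    cache = [offload(b) for b in batches]
    for subgraph in subgraphs:
        for i, cached in enumerate(cache):
            x = onload(cached)
            subgraph(x)                      # pass 1: trigger calibration hooks
            cache[i] = offload(subgraph(x))  # pass 2: capture outputs
    return [onload(c) for c in cache]


trips = {"offload": 0, "onload": 0}


def offload(x):  # stand-in for moving a tensor to CPU
    trips["offload"] += 1
    return x


def onload(x):   # stand-in for moving a tensor back to the device
    trips["onload"] += 1
    return x


out = sequential_pipeline([lambda x: x + 1, lambda x: x + 1], [0, 10], offload, onload)
print(out)  # → [2, 12]
```

Only one batch's activations need to be on the device at a time, which is the memory saving the docstring refers to.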
get_targets_from_modifiers(modifiers, model)
Infer the sequential targets and ignore list from the list of modifiers.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
modifiers | List[Modifier] | list of modifiers being applied during calibration | required |
model | PreTrainedModel | model being calibrated | required |
Returns:
Type | Description |
---|---|
Tuple[List[str], List[str]] | list of sequential targets and list of modules to ignore for tracing |
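A hedged sketch of what this inference might look like, modeling modifiers as dicts. The `sequential_targets`/`ignore` keys mirror the return value documented above, but the merging rule and the default target name are illustrative assumptions, not the library's actual logic.

```python
def get_targets_from_modifiers(modifiers, default_targets=("LlamaDecoderLayer",)):
    """Merge the sequential_targets and ignore lists declared by each
    modifier; fall back to a default when no modifier declares targets.
    The default target name here is purely illustrative."""
    targets, ignore = [], []
    for m in modifiers:
        targets += m.get("sequential_targets") or []
        ignore += m.get("ignore") or []
    return (targets or list(default_targets)), ignore


t, ig = get_targets_from_modifiers(
    [{"sequential_targets": ["DecoderLayer"], "ignore": ["lm_head"]}, {}]
)
print(t, ig)  # → ['DecoderLayer'] ['lm_head']
```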