llmcompressor.modifiers.utils.pytorch_helpers
apply_pad_mask_to_batch(batch)
Apply a mask to the input_ids of a batch. This is used to zero out padding tokens so they do not contribute to the Hessian calculation in the GPTQ and SparseGPT algorithms.
Assumes that attention_mask only contains zeros and ones.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
batch | Dict[str, Tensor] | batch to apply padding to if it exists | required |
Returns:
Type | Description |
---|---|
Dict[str, Tensor] | batch with padding zeroed out in the input_ids |
Source code in src/llmcompressor/modifiers/utils/pytorch_helpers.py
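A minimal usage sketch; the batch tensors below are illustrative only, with padded positions marked by zeros in attention_mask:

```python
import torch

from llmcompressor.modifiers.utils.pytorch_helpers import apply_pad_mask_to_batch

# Illustrative batch: the last two positions are padding (attention_mask == 0).
batch = {
    "input_ids": torch.tensor([[101, 2023, 2003, 2, 2]]),
    "attention_mask": torch.tensor([[1, 1, 1, 0, 0]]),  # zeros and ones only
}

masked = apply_pad_mask_to_batch(batch)
# input_ids at padded positions are zeroed out, so padding tokens do not
# contribute to the Hessian calculation in GPTQ / SparseGPT.
print(masked["input_ids"])  # expected: padded positions become 0
```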
is_moe_model(model)
Check if the model is a mixture-of-experts (MoE) model.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model | Module | the model to check | required |
Returns:
Type | Description |
---|---|
bool | True if the model is a mixture-of-experts model, False otherwise |
Source code in src/llmcompressor/modifiers/utils/pytorch_helpers.py
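A brief usage sketch; the checkpoint name is only an example of a mixture-of-experts architecture, and any transformers model can be passed:

```python
from transformers import AutoModelForCausalLM

from llmcompressor.modifiers.utils.pytorch_helpers import is_moe_model

# Example checkpoint (large download); Mixtral uses a mixture-of-experts architecture.
model = AutoModelForCausalLM.from_pretrained("mistralai/Mixtral-8x7B-v0.1")

print(is_moe_model(model))  # expected: True for a MoE model such as Mixtral
```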
run_calibration_forward(model, calibration_dataloader, num_calibration_steps=None, calibration_function=None, device=None, mask_padding=False)
Helper function used by one-shot modifiers; runs calibration data through a model to update modifier statistics and trigger hooks.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model | Module | PyTorch model to run | required |
calibration_dataloader | DataLoader | data to use for calibration | required |
num_calibration_steps | Optional[int] | number of items in calibration_dataloader to process, None or a negative number to process all available data | None |
calibration_function | Optional[Callable] | option to pass a custom forward function for model | None |
device | Optional[str] | option to move the model to a specific device before calibration | None |
mask_padding | bool | whether to zero out padding tokens during calibration | False |
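A hedged usage sketch, assuming a small causal LM checkpoint and a toy calibration set; the model name and texts below are placeholders:

```python
import torch
from torch.utils.data import DataLoader
from transformers import AutoModelForCausalLM, AutoTokenizer

from llmcompressor.modifiers.utils.pytorch_helpers import run_calibration_forward

# Placeholder model/tokenizer; any causal LM checkpoint can be used here.
model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")

# Toy calibration samples; a real calibration set would use representative data.
texts = ["The quick brown fox jumps over the lazy dog."] * 8
encodings = tokenizer(texts, return_tensors="pt", padding=True)
dataset = [
    {"input_ids": encodings["input_ids"][i], "attention_mask": encodings["attention_mask"][i]}
    for i in range(len(texts))
]
calibration_dataloader = DataLoader(dataset, batch_size=2)

# Run a limited number of calibration steps; mask_padding=True zeroes padded
# tokens so they do not influence statistics collected by registered hooks.
run_calibration_forward(
    model,
    calibration_dataloader,
    num_calibration_steps=4,
    mask_padding=True,
    device="cuda" if torch.cuda.is_available() else "cpu",
)
```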