llmcompressor.modifiers.pruning
WandaPruningModifier
Bases: SparsityModifierBase
Modifier that applies the one-shot WANDA pruning algorithm to a model, from the paper: https://arxiv.org/abs/2306.11695
Sample yaml:

```yaml
test_stage:
  sparsity_modifiers:
    WandaPruningModifier:
      sparsity: 0.5
      mask_structure: "2:4"
```
Lifecycle:

- on_initialize
    - register_hook(module, calibrate_module, "forward")
- run_sequential / run_layer_sequential / run_basic
    - make_empty_row_scalars
    - accumulate_row_scalars
- on_sequential_batch_end
    - sparsify_weight
- on_finalize
    - remove_hooks()
Parameters:
Name | Description | Default
---|---|---
sparsity | Sparsity level to compress the model to | required
sparsity_profile | Can be set to 'owl' to use Outlier Weighed Layerwise Sparsity (OWL); more information can be found in the paper https://arxiv.org/pdf/2310.05175 | required
mask_structure | String defining the structure of the mask to apply. Must be of the form N:M, where N and M are integers that define a custom block shape. Defaults to "0:0", which represents an unstructured mask. | required
owl_m | Number of outliers to use for OWL | required
owl_lmbda | Lambda value to use for OWL | required
sequential_targets | List of layer names to compress during OBCQ, or 'ALL' to compress every layer in the model. Alias for targets | required
targets | List of layer names to compress during OBCQ, or 'ALL' to compress every layer in the model. Alias for sequential_targets | required
ignore | Optional list of module class names or submodule names to not quantize even if they match a target. Defaults to an empty list. | required
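WANDA scores each weight by its magnitude scaled by the L2 norm of the corresponding input feature, then prunes the lowest-scoring weights in each output row. The following is a minimal pure-Python sketch of that scoring and an unstructured per-row mask; the function names (`wanda_scores`, `unstructured_mask`) are hypothetical illustrations, not the library's implementation:

```python
import math

def wanda_scores(weights, input_norms):
    """Score each weight as |w| * ||x_j||_2, where j indexes the input feature."""
    return [[abs(w) * input_norms[j] for j, w in enumerate(row)] for row in weights]

def unstructured_mask(scores, sparsity):
    """Keep the top (1 - sparsity) fraction of weights in each output row."""
    mask = []
    for row in scores:
        k = int(len(row) * sparsity)  # number of weights to prune in this row
        cutoff = sorted(row)[k] if k > 0 else -math.inf
        mask.append([1 if s >= cutoff else 0 for s in row])
    return mask

# Example: 2 output rows, 4 input features, 50% unstructured sparsity
weights = [[0.1, -2.0, 0.3, 0.05], [1.5, -0.2, 0.4, -0.9]]
input_norms = [1.0, 0.5, 2.0, 1.0]  # per-feature L2 norms from calibration
mask = unstructured_mask(wanda_scores(weights, input_norms), sparsity=0.5)
```

Note that a large-magnitude weight (e.g. -2.0 above) can still be kept or pruned depending on its input norm; this input-aware scoring is what distinguishes WANDA from plain magnitude pruning.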
Source code in src/llmcompressor/modifiers/pruning/wanda/base.py
calibrate_module(module, args, _output)
Calibration hook used to accumulate the row scalars of the input to the module
Parameters:
Name | Type | Description | Default |
---|---|---|---|
module | Module | module being calibrated | required |
args | Tuple[Tensor, ...] | inputs to the module, the first element of which is the canonical input | required |
_output | Tensor | uncompressed module output, unused | required |
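Conceptually, this hook accumulates per-feature statistics of the calibration inputs across batches, e.g. a running sum of squared activations whose square root yields the feature norms used in the WANDA score. A hedged pure-Python sketch of that accumulation, assuming inputs are nested lists rather than tensors (the class name `RowScalarAccumulator` is hypothetical, not the actual hook signature):

```python
import math

class RowScalarAccumulator:
    """Accumulate the sum of squared activations per input feature across batches."""

    def __init__(self, num_features):
        self.sq_sums = [0.0] * num_features

    def update(self, batch):
        # batch: list of samples, each a list of num_features activation values
        for sample in batch:
            for j, x in enumerate(sample):
                self.sq_sums[j] += x * x

    def feature_norms(self):
        # L2 norm of each feature column over all batches seen so far
        return [math.sqrt(s) for s in self.sq_sums]

acc = RowScalarAccumulator(3)
acc.update([[3.0, 0.0, 1.0], [4.0, 2.0, 0.0]])
```

Because the accumulator only stores one scalar per input feature, calibration memory stays constant regardless of how many batches are processed.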
Source code in src/llmcompressor/modifiers/pruning/wanda/base.py
compress_modules()
Sparsify modules which have been calibrated
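For a mask_structure of "N:M", each contiguous group of M weights along a row keeps only its N highest-scoring entries. A minimal illustrative sketch of building such a mask for one row of scores (the helper `nm_mask` is a hypothetical name, not the library's code):

```python
def nm_mask(scores, n, m):
    """For each group of m consecutive scores, keep the n largest (mask=1)."""
    assert len(scores) % m == 0, "row length must be divisible by m"
    mask = [0] * len(scores)
    for start in range(0, len(scores), m):
        group = scores[start:start + m]
        # indices of the n largest scores within this group
        keep = sorted(range(m), key=lambda i: group[i], reverse=True)[:n]
        for i in keep:
            mask[start + i] = 1
    return mask

# 2:4 pattern: exactly 2 of every 4 consecutive weights survive
scores = [0.9, 0.1, 0.5, 0.4, 0.2, 0.8, 0.7, 0.3]
mask = nm_mask(scores, n=2, m=4)
```

The fixed per-group budget is what makes N:M patterns such as "2:4" hardware-friendly, at the cost of less freedom than an unstructured ("0:0") mask.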