llmcompressor.modifiers.obcq.base
SparseGPTModifier
Bases: SparsityModifierBase
Modifier for applying the one-shot SparseGPT algorithm to a model
Sample yaml:

```yaml
test_stage:
  obcq_modifiers:
    SparseGPTModifier:
      sparsity: 0.5
      mask_structure: "2:4"
      dampening_frac: 0.001
      block_size: 128
      targets: ['Linear']
      ignore: ['re:.*lm_head']
```
Lifecycle:

- on_initialize
    - register_hook(module, calibrate_module, "forward")
- on_sequential_batch_end
    - sparsify_weight
- on_finalize
    - remove_hooks()
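The lifecycle above follows a common hook-driven pattern: a forward hook is registered at initialize time, fires on every forward pass to collect calibration statistics, and is removed at finalize time. The sketch below illustrates that pattern with plain-Python stand-ins (`Recorder` and `Module` are hypothetical classes, not llmcompressor APIs):

```python
# Hypothetical stand-ins illustrating the register/calibrate/remove lifecycle.
class Recorder:
    def __init__(self):
        self.hooks = []      # registered (module, fn) pairs
        self.num_calls = 0   # forward passes observed during calibration

    def register_hook(self, module, fn):
        module.forward_hooks.append(fn)
        self.hooks.append((module, fn))

    def calibrate_module(self, module, args):
        # The real hook accumulates a Hessian estimate; we just count calls.
        self.num_calls += 1

    def remove_hooks(self):
        for module, fn in self.hooks:
            module.forward_hooks.remove(fn)
        self.hooks.clear()


class Module:
    def __init__(self):
        self.forward_hooks = []

    def forward(self, x):
        for fn in self.forward_hooks:
            fn(self, (x,))  # notify hooks of this forward pass
        return x


recorder, module = Recorder(), Module()
recorder.register_hook(module, recorder.calibrate_module)  # on_initialize
module.forward(1.0)
module.forward(2.0)
recorder.remove_hooks()                                    # on_finalize
module.forward(3.0)  # no longer observed
print(recorder.num_calls)  # -> 2
```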
Parameters:

Name | Description | Default
---|---|---
`sparsity` | Sparsity to compress the model to. | required
`sparsity_profile` | Can be set to `'owl'` to use Outlier Weighed Layerwise Sparsity (OWL); more information can be found in the paper https://arxiv.org/pdf/2310.05175 | required
`mask_structure` | String defining the structure of the mask to apply. Must be of the form N:M, where N and M are integers that define a custom block shape. Defaults to `"0:0"`, which represents an unstructured mask. | required
`owl_m` | Number of outliers to use for OWL. | required
`owl_lmbda` | Lambda value to use for OWL. | required
`block_size` | Number of columns to compress in one pass. | required
`dampening_frac` | Amount of dampening to apply to H, as a fraction of the diagonal norm. | required
`preserve_sparsity_mask` | Whether to preserve the sparsity mask when applying SparseGPT; useful when starting from a previously pruned model. Defaults to `False`. | required
`offload_hessians` | Set to `True` for decreased memory usage at the cost of increased runtime. | required
`sequential_targets` | List of layer names to compress during OBCQ, or `'ALL'` to compress every layer in the model. Alias for `targets`. | required
`targets` | List of layer names to compress during OBCQ, or `'ALL'` to compress every layer in the model. Alias for `sequential_targets`. | required
`ignore` | Optional list of module class names or submodule names not to quantize even if they match a target. Defaults to an empty list. | required
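To make the `mask_structure` semantics concrete: an N:M pattern means that within every group of M consecutive weights, only the N largest-magnitude weights are kept and the rest are zeroed. The helper below (`apply_nm_mask` is a hypothetical illustration, not a library function) shows the effect of `"2:4"` on a flat weight list:

```python
# Hypothetical helper showing what an N:M mask_structure such as "2:4" means:
# in every group of M consecutive weights, keep the N largest-magnitude
# entries and zero out the rest.
def apply_nm_mask(weights, mask_structure="2:4"):
    n, m = (int(v) for v in mask_structure.split(":"))
    pruned = list(weights)
    for start in range(0, len(pruned), m):
        group = pruned[start:start + m]
        # indices of the N largest-magnitude entries in this group
        keep = sorted(range(len(group)),
                      key=lambda i: abs(group[i]),
                      reverse=True)[:n]
        for i in range(len(group)):
            if i not in keep:
                pruned[start + i] = 0.0
    return pruned


print(apply_nm_mask([0.9, -0.1, 0.4, 0.05, -0.7, 0.2, 0.1, 0.6]))
# -> [0.9, 0.0, 0.4, 0.0, -0.7, 0.0, 0.0, 0.6]
```

Note that the real algorithm chooses which weights to prune using the calibrated Hessian rather than raw magnitude; this sketch only illustrates the shape of the resulting mask.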
Source code in src/llmcompressor/modifiers/obcq/base.py
calibrate_module(module, args, _output)
Calibration hook used to accumulate the Hessian of the input to the module.
Parameters:

Name | Type | Description | Default
---|---|---|---
`module` | `Module` | module being calibrated | required
`args` | `Tuple[Tensor, ...]` | inputs to the module, the first element of which is the canonical input | required
`_output` | `Tensor` | uncompressed module output, unused | required
Source code in src/llmcompressor/modifiers/obcq/base.py
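Conceptually, the calibration hook builds up an estimate of the layer-input Hessian by summing outer products of input samples, which the later pruning step uses to judge which weights are cheapest to remove. A minimal pure-Python sketch of that accumulation (not the library's implementation, which operates on batched tensors):

```python
# Sketch of Hessian accumulation: H += x x^T for each calibration sample x.
def accumulate_hessian(H, x):
    """Add the outer product x x^T of one input sample to H in place."""
    d = len(x)
    for i in range(d):
        for j in range(d):
            H[i][j] += x[i] * x[j]
    return H


d = 3
H = [[0.0] * d for _ in range(d)]
for sample in ([1.0, 0.0, 2.0], [0.0, 1.0, 1.0]):
    accumulate_hessian(H, sample)
print(H)  # -> [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0], [2.0, 1.0, 5.0]]
```

The `dampening_frac` parameter above corresponds to adding a small multiple of the identity to this matrix so that it stays well-conditioned before inversion.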
compress_modules()
Sparsify modules which have been calibrated