llmcompressor.pipelines.cache
IntermediateValue
dataclass
Dataclass which recursively defines offloaded values and which device to onload to
Parameters:
Name | Type | Description | Default |
---|---|---|---|
value | Union[Tensor, IntermediateValue, Any] | either an offloaded Tensor, a primitive value, or a recursable value | required |
device | Union[device, None] | if the value is a Tensor, then the device to onload the tensor to, otherwise None | required |
Source code in src/llmcompressor/pipelines/cache.py
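A minimal sketch of constructing IntermediateValue instances directly (ordinarily the cache builds these for you); the field values shown are illustrative:

```python
import torch

from llmcompressor.pipelines.cache import IntermediateValue

# A tensor offloaded from CUDA: `device` records where to onload it later
offloaded = IntermediateValue(value=torch.randn(4, 8), device=torch.device("cuda:0"))

# A primitive value carries no device
flag = IntermediateValue(value=True, device=None)
```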
IntermediatesCache
Cache which stores intermediate values (activations) produced by batched, sequential execution of models. Values are offloaded to the offload_device when stored in the cache and onloaded to their original device when fetched from the cache.
Currently supports nested offloading of dataclass instances and tuples.
Construct using the empty and from_dataloader class methods.
Source code in src/llmcompressor/pipelines/cache.py
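A minimal end-to-end sketch of the store/fetch/delete cycle, assuming a CUDA device is available (device names and tensor shapes are illustrative):

```python
import torch

from llmcompressor.pipelines.cache import IntermediatesCache

cache = IntermediatesCache.empty(num_batches=2, offload_device=torch.device("cpu"))

# Storing offloads the tensor to CPU; fetching onloads it back to cuda:0
cache.update(0, {"hidden_states": torch.randn(1, 128, 768, device="cuda:0")})
batch = cache.fetch(0, ["hidden_states"])
cache.delete(0, ["hidden_states"])
```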
delete(batch_index, consumed_names=None)
Delete values from the cache
Parameters:
Name | Type | Description | Default |
---|---|---|---|
batch_index | int | index of batch whose values will be deleted | required |
consumed_names | Optional[List[str]] | list of keys whose values will be deleted, defaults to removing all keys | None |
Source code in src/llmcompressor/pipelines/cache.py
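A sketch, assuming `cache` is an IntermediatesCache populated earlier (key names are illustrative):

```python
# Drop only the keys a layer has finished consuming for batch 3
cache.delete(batch_index=3, consumed_names=["hidden_states", "attention_mask"])

# Omitting consumed_names removes everything stored for the batch
cache.delete(batch_index=3)
```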
empty(num_batches, offload_device)
classmethod
Construct an empty cache
Parameters:
Name | Type | Description | Default |
---|---|---|---|
num_batches | int | the expected number of batches to be stored | required |
offload_device | device | device to offload values to | required |
Source code in src/llmcompressor/pipelines/cache.py
fetch(batch_index, input_names=None)
Fetch values belonging to a batch
Parameters:
Name | Type | Description | Default |
---|---|---|---|
batch_index | int | index of batch whose values are being fetched | required |
input_names | Optional[List[str]] | list of keys whose values are being fetched | None |
Returns:
Type | Description |
---|---|
Dict[str, Any] | dictionary mapping keys to onloaded values |
Source code in src/llmcompressor/pipelines/cache.py
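A sketch, assuming `cache` already holds values for batch 0 (key names are illustrative):

```python
# Onload only the keys the next layer needs
inputs = cache.fetch(batch_index=0, input_names=["hidden_states"])
hidden = inputs["hidden_states"]  # a Tensor, restored to its original device

# Omitting input_names fetches every key stored for the batch
all_inputs = cache.fetch(batch_index=0)
```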
from_dataloader(dataloader, model_device, mask_padding=True, offload_device=torch.device('cpu'))
classmethod
Initialize a cache with data from the provided dataloader
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dataloader | DataLoader | dataloader which generates values to be cached | required |
model_device | device | device which values will be onloaded to when fetched | required |
mask_padding | bool | zero out padding tokens if True. This affects modifiers such as GPTQ and SparseGPT | True |
offload_device | device | device to offload values to | device('cpu') |
Source code in src/llmcompressor/pipelines/cache.py
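A sketch of seeding the cache from a calibration dataloader; `calibration_dataset` is a hypothetical dataset assumed to yield dicts of model inputs (e.g. input_ids, attention_mask):

```python
import torch
from torch.utils.data import DataLoader

from llmcompressor.pipelines.cache import IntermediatesCache

loader = DataLoader(calibration_dataset, batch_size=1)  # hypothetical dataset
cache = IntermediatesCache.from_dataloader(
    dataloader=loader,
    model_device=torch.device("cuda:0"),  # where fetched values are onloaded
    mask_padding=True,  # zero out padding tokens, as used by GPTQ/SparseGPT
)
```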
update(batch_index, values)
Update/put values belonging to a batch
Parameters:
Name | Type | Description | Default |
---|---|---|---|
batch_index | int | index of batch whose values will be updated | required |
values | Dict[str, Any] | dictionary mapping keys to values used for update | required |
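A sketch of the typical sequential-pipeline loop that pairs fetch with update; `layer` and `num_batches` are hypothetical stand-ins:

```python
# Propagate one layer's outputs into the cache, batch by batch
for batch_index in range(num_batches):
    inputs = cache.fetch(batch_index)        # onload this batch's inputs
    hidden = layer(inputs["hidden_states"])  # hypothetical layer call
    cache.update(batch_index, {"hidden_states": hidden})  # offloaded on store
```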