llmcompressor.utils.pytorch
get_matching_layer(target, name_to_match, module)
Given a target regex, find the layer name in the module that most closely matches the name_to_match string. This is used to match submodules within the same layer; for instance, matching the target "re:.*k_proj" against the name "model.decoder.layer.0.q_proj" finds the k_proj that exists in layer 0.
Parameters:
| Name | Type | Description | Default |
| --- | --- | --- | --- |
| target | str | regex to search for | required |
| name_to_match | str | full layer name to match against; should exist in module | required |
| module | Module | module to search for the target in | required |
Returns:
| Type | Description |
| --- | --- |
| Optional[Tuple[str, Module]] | Tuple containing the layer name and module that fits the target regex and best matches name_to_match, or None if no match can be found |
Source code in src/llmcompressor/utils/pytorch/module.py
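A minimal usage sketch, assuming the function is importable from llmcompressor.utils.pytorch; the checkpoint and layer names are illustrative:

```python
# Find the k_proj submodule that lives in the same decoder layer as a
# known q_proj layer. Checkpoint and layer name are illustrative.
from transformers import AutoModelForCausalLM

from llmcompressor.utils.pytorch import get_matching_layer

model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
match = get_matching_layer(
    target="re:.*k_proj",
    name_to_match="model.decoder.layers.0.self_attn.q_proj",
    module=model,
)
if match is not None:  # None when no layer matches the regex
    name, layer = match
    print(name)  # expected: model.decoder.layers.0.self_attn.k_proj
```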
get_no_split_params(model)
Get the list of module classes that shouldn't be split when sharding. For Hugging Face Transformer models, this is the decoder layer type. For other model types, this just returns all module names.
Returns:
| Type | Description |
| --- | --- |
| Union[str, List[str]] | class name, or list of class names, that shouldn't be split |
Source code in src/llmcompressor/utils/pytorch/module.py
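A minimal sketch, assuming the same import path; the checkpoint is illustrative and the returned class name depends on the architecture:

```python
# Query which module classes must stay whole when building a sharded
# device map; for Hugging Face models this is the decoder layer class.
from transformers import AutoModelForCausalLM

from llmcompressor.utils.pytorch import get_no_split_params

model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
no_split = get_no_split_params(model)
print(no_split)  # e.g. ["OPTDecoderLayer"] for an OPT model
```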
get_parent_by_name(layer_name, model)
Get the parent layer of a layer by name.
Parameters:
| Name | Type | Description | Default |
| --- | --- | --- | --- |
| layer_name | str | Name of the layer to find the parent of. | required |
| model | Module | Model to search for the parent layer. | required |
Returns:
| Type | Description |
| --- | --- |
| Tuple[str, Module] | Tuple containing the name of the parent layer and the parent layer itself. |
Source code in src/llmcompressor/utils/pytorch/module.py
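A minimal sketch (same assumptions as above) showing a common use: fetching a parent so the named child can be swapped out:

```python
# Fetch the parent of a named submodule, e.g. to replace the child with
# a modified module via setattr. Checkpoint and layer name illustrative.
from transformers import AutoModelForCausalLM

from llmcompressor.utils.pytorch import get_parent_by_name

model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
parent_name, parent = get_parent_by_name(
    layer_name="model.decoder.layers.0.self_attn.q_proj", model=model
)
print(parent_name)  # expected: model.decoder.layers.0.self_attn
```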
qat_active(module)
Determines whether any layer in the model has quantization enabled by checking for weight_fake_quant attributes.
Parameters:
| Name | Type | Description | Default |
| --- | --- | --- | --- |
| module | Module | PyTorch model to check for quantization | required |
Returns:
| Type | Description |
| --- | --- |
| bool | True if quantization is active anywhere in the model, False otherwise |
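A minimal sketch, assuming the same import path; a plain torch module carries no fake-quantization attributes, so the check returns False:

```python
# Check whether any submodule has quantization hooks attached.
import torch

from llmcompressor.utils.pytorch import qat_active

model = torch.nn.Sequential(torch.nn.Linear(8, 8))
print(qat_active(model))  # False: no weight_fake_quant attributes present
```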