Training Acceleration and Model Reconstruction
Pruning Based on Importance Evaluation
Overview
msModelSlim provides model pruning APIs based on importance evaluation. You only need to provide a model instance and call the pruning APIs to prune a model. The pruned model achieves improved performance and reduced size, leading to higher inference efficiency.
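As a rough illustration of what importance-based pruning means, the sketch below scores each filter by its L1 norm and keeps the highest-scoring fraction. This is a generic technique written in plain Python, not the msModelSlim implementation; the function names `l1_importance` and `select_channels` and the `reserved_ratio` argument are hypothetical.

```python
def l1_importance(filters):
    """Score each filter by the sum of its absolute weights (L1 norm)."""
    return [sum(abs(w) for w in f) for f in filters]


def select_channels(filters, reserved_ratio):
    """Return the sorted indices of the filters to keep."""
    scores = l1_importance(filters)
    # Always keep at least one filter so the layer stays usable.
    keep = max(1, round(len(filters) * reserved_ratio))
    ranked = sorted(range(len(filters)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:keep])


# Four toy filters, each flattened to a list of weights.
filters = [
    [0.9, -0.8, 0.7],
    [0.01, 0.02, -0.01],
    [0.5, 0.5, -0.5],
    [0.1, -0.1, 0.0],
]
kept = select_channels(filters, reserved_ratio=0.5)
pruned = [filters[i] for i in kept]
print(kept)  # [0, 2]
```

The two filters with the largest L1 norms survive. A real pruner must also shrink the input channels of the following layer to match, which is the kind of bookkeeping a pruning API is expected to handle.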
Preparations
Currently, model pruning is supported under the PyTorch framework. Install msModelSlim. For details, see msModelSlim Installation Guide.
- Note: This feature supports only PyTorch 2.0.0 or later.
Function
Procedure
1. Prepare the model instance to be pruned and the corresponding training script. The following example uses VGG16 from torchvision.
2. Open the training script vision/references/classification/train.py of the model to be pruned. Edit the train.py file and import the pruning APIs. For details about the pruning APIs, see the PruneTorch documentation.
3. (Optional) Adjust the log output level. After the tuning task starts, the system displays log information at the specified level. For details, see Log Level Description.
4. After initializing the network and loading the weights in the original script, use the PruneTorch APIs to configure the importance evaluation function, the parameter retention ratio for operator nodes, and the pruning rate.
5. Start the model pruning task. You are advised to use the final learning rate from the original training process and train for 10 epochs. This process generates a pruned model for subsequent training tasks.
6. During subsequent evaluation steps, load the model pruning information returned in step 4 by referring to the following configuration example.
Weight Pruning for Transformer Models
Overview
msModelSlim provides API-based weight pruning for transformer models. This feature prunes model weights and loads them into a smaller model instance sharing the same architecture. You only need to provide the smaller model instance (obtained by specifying smaller initialization parameters, such as reducing the intermediate_size and num_hidden_layers parameters in a BERT model) along with the original model weight file, and then call the pruning APIs to prune the weights.
Preparations
Currently, weight pruning for transformer models is supported under the MindSpore and PyTorch frameworks. Install msModelSlim. For details, see msModelSlim Installation Guide.
- Note: This feature supports only PyTorch 2.0.0 or later.
During model pruning, you can manually configure parameters to prune the weights of a pre-trained model and load the pruned weights into the smaller model to obtain a transformer model with fully loaded weights. The accuracy of the pruned model is not guaranteed immediately after pruning. You must perform subsequent training, such as training through model distillation, to improve the accuracy.
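The idea of loading pruned weights into a smaller model can be pictured as trimming each weight matrix to the shape the smaller model expects. The sketch below keeps the leading rows and columns. That selection rule is an assumption made for illustration only, and `fit_to_shape` is a hypothetical helper; msModelSlim may choose which weights to keep differently.

```python
def fit_to_shape(matrix, rows, cols):
    """Trim a 2-D weight (a list of rows) to rows x cols by keeping the leading slice."""
    if rows > len(matrix) or cols > len(matrix[0]):
        raise ValueError("target shape must not exceed the source shape")
    return [row[:cols] for row in matrix[:rows]]


# Toy feed-forward weight with 4 intermediate units and a hidden size of 3.
original = [[r * 10 + c for c in range(3)] for r in range(4)]

# The smaller model halves the intermediate size and keeps the hidden size.
smaller = fit_to_shape(original, rows=2, cols=3)
print(smaller)  # [[0, 1, 2], [10, 11, 12]]
```

Because half of the intermediate units are discarded, the smaller model no longer reproduces the original outputs, which is why the follow-up training mentioned above is needed.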
Function
Procedure
The following procedure uses a transformer model under the PyTorch framework as an example. The input parameter configurations for MindSpore models vary only for certain API calls. For details, refer to the corresponding API specifications.
1. Prepare the original model instance (the model to be pruned) and the weight file that matches its architecture. This example uses BERT. Search for and download the BERT code and the original model weight file from ModelZoo.
2. Create a Python script for the model to be pruned, for example, test_prune_model.py. Edit the test_prune_model.py file and import the following APIs. For details about the pruning APIs, see the pruning API descriptions.
3. (Optional) Adjust the log output level. After the tuning task starts, the system displays log information at the specified level. For details, see Log Level Description.
4. Use the PruneConfig APIs to configure parameters for the pruning steps and blocks. For details, see the PruneConfig documentation.

    prune_config = PruneConfig()
    prune_config.set_steps(['prune_blocks', 'prune_bert_intra_block']) \
        .add_blocks_params(pattern=r"bert.encoder.layer.(\d+).",
                           layer_id_map={0: 0, 1: 2, 2: 4, 3: 6, 4: 8, 5: 10, 6: 11})

   - Note: If the pruning steps configured in the set_steps method contain prune_blocks, you must call the add_blocks_params method for configuration.
5. Call the prune_model_weight API to apply the pruning configuration. The API modifies the pre-trained model weights and loads the pruned weights into the smaller model, which is generated using smaller initialization parameters. The following example uses BERT. Before initializing the smaller model, modify the JSON configuration under bert_config. For example, set intermediate_size to 1536 and num_hidden_layers to 7. After the modification, add the following content to the Python script:

    import modeling  # Import the BERT model.
    bert_config = modeling.BertConfig.from_json_file(bert_config_file)  # Load the BERT configuration and initialize a smaller model.
    bert_model = modeling.BertForQuestionAnswering(bert_config)  # Instantiate the BERT model.
    # Pass the actual model instance as the target model to be pruned, and set the original model weight file path to the actual file path.
    prune_model_weight(bert_model, prune_config, weight_file_path="/home/xxx/xxx.pt")

   The weight file for a MindSpore model must be in CKPT format. Weight files for the PyTorch framework must be in PT, PTH, PKL, or BIN format. For details, see the prune_model_weight documentation.
6. Start the model pruning task to prune the original weights and load them into the smaller model.
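The layer_id_map in the configuration above has 7 entries whose values go up to 11, which suggests that each key is a layer index in the smaller model and each value is the layer of the original 12-layer model to copy from. The sketch below applies that reading to a toy dictionary of weight names. It is an interpretation, not msModelSlim code: `select_blocks` is a hypothetical helper, and the pattern escapes the dots, unlike the one passed to add_blocks_params.

```python
import re

PATTERN = re.compile(r"bert\.encoder\.layer\.(\d+)\.")
LAYER_ID_MAP = {0: 0, 1: 2, 2: 4, 3: 6, 4: 8, 5: 10, 6: 11}


def select_blocks(weights, pattern, layer_id_map):
    """Keep only the mapped layers and renumber them for the smaller model."""
    old_to_new = {old: new for new, old in layer_id_map.items()}
    pruned = {}
    for name, value in weights.items():
        match = pattern.search(name)
        if match is None:
            pruned[name] = value  # Not part of a block: copy unchanged.
            continue
        old_id = int(match.group(1))
        if old_id not in old_to_new:
            continue  # The whole block is dropped.
        new_id = str(old_to_new[old_id])
        new_name = name[:match.start(1)] + new_id + name[match.end(1):]
        pruned[new_name] = value
    return pruned


# Toy stand-in for a 12-layer state dict; the values are placeholders.
weights = {f"bert.encoder.layer.{i}.output.dense.weight": f"w{i}" for i in range(12)}
weights["bert.embeddings.word_embeddings.weight"] = "emb"

small = select_blocks(weights, PATTERN, LAYER_ID_MAP)
print(sorted(small))
```

Under this reading, layer 1 of the smaller model receives the weights of original layer 2, layer 6 receives those of original layer 11, and the five unmapped layers are discarded.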
Sparse Tool
Overview
Sparsification algorithms optimize deep neural networks by setting unnecessary parameters in linear layers to 0. During deployment, the on-chip unzip unit of Ascend processors enables online weight decoding, which yields a more lightweight model and improves both inference speed and generalization performance.
Function
Prepare a PyTorch model. The following example uses a model built from linear layers.
1. Use the SparseConfig APIs to configure the sparsity parameters and method. This generates the sparsification algorithm configuration.

    sparse_config = SparseConfig(method="magnitude", sparse_ratio=0.5, progressive=False, uniform=True)

   - method: specifies the sparsification method. Valid values: "magnitude", "hessian", "par", or "par_v2". Default value: "magnitude".
   - sparse_ratio: specifies the sparsification ratio, ranging from 0 to 1. Set this parameter as needed. Default value: 0.5.
   - progressive: specifies whether to enable progressive sparsification. Default value: False.
   - uniform: specifies whether to enable uniform sparsification. Default value: True.
2. Prepare a single batch of data to serve as the calibration data for the sparsification algorithm.
3. Execute the model sparsification tuning task.

    import torch
    from msmodelslim.pytorch.sparse.sparse_tools import SparseConfig, Compressor

    # Define a simple model.
    class SimpleModel(torch.nn.Module):
        def __init__(self):
            super(SimpleModel, self).__init__()
            self.linear1 = torch.nn.Linear(100, 50)
            self.linear2 = torch.nn.Linear(50, 10)

        def forward(self, x):
            x = self.linear1(x)
            x = self.linear2(x)
            return x

    generate_model = SimpleModel()
    # A single batch of random data serves as the calibration dataset.
    test_dataset = [torch.randn(64, 100)]
    sparse_config = SparseConfig(method="magnitude", sparse_ratio=0.5)
    prune_compressor = Compressor(generate_model, sparse_config)
    prune_compressor.compress(dataset=test_dataset)
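For intuition about the "magnitude" method and sparse_ratio, the following plain-Python sketch zeroes the fraction of weights with the smallest absolute values. It shows the general magnitude-pruning idea only. The helper `magnitude_sparsify` is hypothetical, and the Compressor may differ in details such as how ties, layers, or calibration data are handled.

```python
def magnitude_sparsify(weights, sparse_ratio):
    """Zero the sparse_ratio fraction of weights with the smallest magnitude."""
    if not 0.0 <= sparse_ratio <= 1.0:
        raise ValueError("sparse_ratio must be in [0, 1]")
    n_zero = int(len(weights) * sparse_ratio)
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    to_zero = set(order[:n_zero])
    return [0.0 if i in to_zero else w for i, w in enumerate(weights)]


row = [0.8, -0.05, 0.3, -0.9, 0.01, 0.4]
sparse_row = magnitude_sparsify(row, sparse_ratio=0.5)
print(sparse_row)  # [0.8, 0.0, 0.0, -0.9, 0.0, 0.4]
```

With a ratio of 0.5, the three smallest-magnitude entries become zero while the sign and value of the remaining weights are untouched.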
Example
import torch
import torch_npu
from msmodelslim.pytorch.sparse.sparse_tools import SparseConfig, Compressor

class TwoLayerNet(torch.nn.Module):
    def __init__(self, D_in, H, D_out):
        super(TwoLayerNet, self).__init__()
        self.linear1 = torch.nn.Linear(D_in, H, bias=True)
        self.linear2 = torch.nn.Linear(H, D_out, bias=True)

    def forward(self, x):
        x = self.linear1(x)
        y_pred = self.linear2(x)
        return y_pred

D_in, H, D_out = 100, 10, 1
model = TwoLayerNet(D_in, H, D_out)
# A single batch of random data serves as the calibration dataset.
test_dataset = [torch.randn(64, D_in)]
sparse_config = SparseConfig(method='magnitude')
prune_compressor = Compressor(model, sparse_config)
prune_compressor.compress(dataset=test_dataset)
Model Distillation
Overview
msModelSlim provides API-based knowledge distillation for model tuning. You only need to provide a teacher model, a student model, and a dataset, and then call the distillation APIs to execute the distillation tuning process.
During model distillation, you can use the original transformer model and a transformer model configured with smaller parameters as the teacher and student models, respectively. Manually configuring the parameters returns a DistillDualModels model instance to be distilled, which can be used for training. After training is complete, you can obtain the trained student model from the DistillDualModels model instance.
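As background on what a distillation objective typically looks like, the sketch below computes a common soft-target loss: the KL divergence between temperature-softened teacher and student distributions, scaled by the square of the temperature. This is a textbook formulation in plain Python. The loss that msModelSlim actually applies is set through its own configuration and may differ; the names `softmax` and `distill_loss` are illustrative.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits divided by a temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distill_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T squared."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


teacher = [4.0, 1.0, 0.5]
print(distill_loss(teacher, teacher))              # 0.0
print(distill_loss([0.5, 1.0, 4.0], teacher) > 0)  # True
```

The loss is zero when the student reproduces the teacher's logits and grows as the two distributions diverge, which is what pushes the smaller model toward the behavior of the larger one.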
Preparations
Currently, distillation tuning for transformer models is supported under the MindSpore and PyTorch frameworks. Install msModelSlim. For details, see msModelSlim Installation Guide.
Function
The following procedure uses a transformer model under the PyTorch framework as an example. The input parameter configurations for MindSpore models vary only for certain API calls. For details, refer to the corresponding API specifications.
1. Prepare the original transformer model and a transformer model configured with smaller parameters to serve as the teacher and student models for distillation tuning. The following example uses BERT. Search for and download the BERT code and the original model weight file from ModelZoo.
2. Create a Python script for the model to be distilled, for example, distill_model.py. Edit the distill_model.py file and import the following APIs. For details, see the distillation API descriptions.
3. (Optional) Adjust the log output level. After the tuning task starts, the system displays the distillation tuning log information on the screen.
4. Use the KnowledgeDistillConfig APIs to configure parameters for model distillation. For details, see the KnowledgeDistillConfig documentation.
5. Use the get_distill_model API to apply the distillation configuration and return a DistillDualModels model instance to be distilled. For details, see the get_distill_model documentation. The teacher_model and student_model arguments are BERT instances. You can modify the JSON configuration under bert_configs to initialize BERT models of different sizes.
6. Train the DistillDualModels model instance. For details, refer to the training scripts of the teacher and student models, or visit the official MindSpore or PyTorch websites. The following example uses BERT. Modify the key information by referring to the original training code run_squad.py, and then run the training command:
   - Change model = modeling.BertForQuestionAnswering(config) in the original code to model = distill_model.student_model so that the optimizer is configured for the student model.
   - Change start_logits, end_logits = model(input_ids, segment_ids, input_mask) in the original code to loss, student_outputs, teacher_outputs = distill_model(input_ids, segment_ids, input_mask) and comment out the original loss calculation section so that the DistillDualModels model instance is trained.
7. After training is complete, use the get_student_model method to obtain the trained student model. (For models under the MindSpore framework, you cannot train the DistillDualModels model instance again after calling the get_student_model method.)
8. Execute the model distillation tuning task to obtain the trained student model.
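The call pattern described in the training and retrieval steps above (one forward call that returns the loss plus both sets of outputs, then a method that hands back the student) can be mimicked with a small stand-in class. `ToyDualModel` below is hypothetical and is not the msModelSlim DistillDualModels class. It uses mean squared error as a placeholder loss and plain functions as models.

```python
class ToyDualModel:
    """Hypothetical stand-in that pairs a teacher with a student."""

    def __init__(self, teacher, student):
        self.teacher = teacher
        self.student_model = student

    def __call__(self, *inputs):
        teacher_out = self.teacher(*inputs)  # The teacher is only evaluated.
        student_out = self.student_model(*inputs)
        # Placeholder loss: mean squared error between the two outputs.
        loss = sum((s - t) ** 2 for s, t in zip(student_out, teacher_out)) / len(teacher_out)
        return loss, student_out, teacher_out

    def get_student_model(self):
        return self.student_model


def teacher(x):
    return [2.0 * v for v in x]


def student(x):
    return [1.5 * v for v in x]


distill_model = ToyDualModel(teacher, student)
loss, student_outputs, teacher_outputs = distill_model([1.0, 2.0])
print(loss)  # 0.625
trained_student = distill_model.get_student_model()
```

In a real training loop the returned loss would be backpropagated through the student only, and the student would be retrieved once training ends.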