Offline Precision Pre-check in PyTorch¶

Overview¶

This function can scan PyTorch APIs in a user-trained model running on Ascend NPUs and then outputs diagnostic and analytical insights regarding model precision. Specifically, the tool collects forward and backward information for all APIs in a model, constructs corresponding unit tests, and compares NPU outputs against a high-precision CPU benchmark to calculate precision metrics. This process is executed using the acc_check subcommand. The pre-check data collected in an NPU environment is then copied to a GPU environment, where acc_check is run again. Finally, the NPU and GPU pre-check results are compared using the new standard precision comparison method to identify APIs that exhibit precision issues on NPUs. In addition, the tool supports both random generation mode and real data mode.

Concepts

New standard precision comparison method: Based on the new precision standard, different comparison algorithms (including absolute threshold, benchmark comparison, binary consistency, ULP error comparison, and dual-1000 metric) are used for different APIs to determine the final pre-check result.
Random generation mode and real data mode: During pre-check dump, you can choose to use the tool to generate random numbers to obtain dump data or use the actual input data. The random generation mode (task=statistics) offers high execution efficiency for quickly obtaining results, but its low data precision only supports rough judgments of precision issues. The real data mode (task=tensor) has slightly lower efficiency but provides high precision, enabling accurate identification of precision issues.

Offline Pre-check Process

Install msProbe in both the NPU and GPU environments.
Add the dump interface PrecisionDebugger of msProbe to the NPU training script to collect the data to be pre-checked. Note that level=L1 must be configured.
Copy the pre-check data dumped in the NPU environment to the GPU environment.
Run acc_check in both the NPU and GPU environments. The generated results will be used as the input for api_precision_compare. For details, see Offline Precision Pre-check.
Copy the accuracy_checking_details_{timestamp}.csv files generated by running acc_check in both the NPU and GPU environments to the same environment.
Run api_precision_compare. The output is the final result of the pre-check operation. For details, see Pre-check Result Comparison.

Preparations¶

Environment Setup

Install msProbe by referring to msProbe Installation Guide.

Constraints

Only the PyTorch scenario is supported.

Offline Precision Pre-check¶

After pre-check data collection completes, only API input data is obtained. To generate both the NPU vs. CPU (high-precision benchmark) and GPU vs. CPU (high-precision benchmark) comparison results, you need to conduct the acc_check operation.

acc_check can be performed in either of the following ways:

Run the acc_check subcommand to perform pre-check: This method is suitable for single-device scenarios with a small amount of data.
Run the multi_acc_check subcommand to perform multi-thread pre-check: This method is suitable for large model scenarios with a large amount of data.

Using `acc_check` for Pre-check¶

Function

Inputs the API information into the acc_check module to detect and compare precision.

Syntax

msprobe acc_check -api_info <dump_json_path> [-save_error_data] [-o <out_path>] [-j] [-d <device_id>] [-csv_path <result_csv_path>] [-f] [-config <config_path>]

Optional fields are enclosed in square brackets ([]), and variables are enclosed in angle brackets (<>).

Parameters

Parameter	Mandatory (Yes/No)	Description
`-api_info` or `--api_info_file`	Yes	Specifies the API information file `dump.json`.
`-save_error_data`	No	Saves the API input and output data that does not meet the precision requirements.
`-o` or `--out_path`	No	Specifies the path for saving the `acc_check` execution result. The default value is `./`.
`-j` or `--jit_compile`	No	Enables JIT compilation.
`-d` or `--device`	No	Specifies the ID of the device where UT code runs. The default value is `0`.
`-csv_path` or `--result_csv_path`	Mandatory when the `acc_check` operation is resumed after being interrupted.	Specifies the path of the `accuracy_checking_result_{timestamp}.csv` file generated when the current execution is interrupted. Set this parameter to resume execution from the interruption point if `acc_check` is interrupted. You need to specify the value to the `accuracy_checking_result_{timestamp}.csv` file that was interrupted last time. For details, see Resumable Check.
`-f` or `--filter_api`	No	Filters APIs with the same parameters and structures except the maximum and minimum values in a model. This parameter is applicable to models with large size and many repeated APIs.
`-config` or `--config_path`	No	Specifies the config.json file for additional configurations (including the blocklist and trustlist) for offline pre-check. By default, this file is not configured. For details about how to configure `config.json`, see Configuration File Introduction.

Example 1: Perform a pre-check.

msprobe acc_check -api_info ./dump_path/step{step_number}/rank{rank_number}/dump.json

The execution result of acc_check is generated in the path specified by the -o parameter, including the accuracy_checking_result_{timestamp}.csv and accuracy_checking_details_{timestamp}.csv files. accuracy_checking_result_{timestamp}.csv contains API-level data and indicates whether each API has passed the test. It is recommended to view the accuracy_checking_result_{timestamp}.csv file first. For APIs that failed the test or APIs with special focus, query the status of each output and comparison metric in the accuracy_checking_details_{timestamp}.csv file based on the API Name field. For details, see Pre-check Result Description.

Example 2: Save the input and output data that does not meet the requirements.

If you need to save the input and output data that does not meet the requirements, add -save_error_data to the end of the acc_check command. For example:

msprobe acc_check -api_info ./dump_path/step{step_number}/rank{rank_number}/dump.json -save_error_data

By default, the data is saved in the ./ut_error_data{timestamp} directory. You can use the error_data_path parameter to configure the save path as required. The error_data_path parameter can be configured in the config.json or config.yaml file. The config.json file needs to be specified by the -config parameter during the acc_check operation. For details about the configuration of the config.yaml file, see config.yaml File Description.

config.yaml File Description¶

The config.yaml file can be used to configure parameters to control the trustlist and blocklist functions of the dump and acc_check operations. The procedure is as follows:

The config.yaml file is usually located in a path similar to /home/xxx/miniconda3/envs/xxx/lib/python3.8/site-packages/msprobe/pytorch/api_accuracy_checker/config.yaml.

Open the config.yaml file.

vim /home/xxx/miniconda3/envs/xxx/lib/python3.8/site-packages/msprobe/pytorch/api_accuracy_checker/config.yaml

Modify the parameters in the config.yaml file.

white_list: []
black_list: []
error_data_path: './'
precision: 14

Parameter	Mandatory (Yes/No)	Description
white_list	No	API dump trustlist. Only the specified APIs are dumped. Example: `white_list=["conv1d", "conv2d"]` By default, no trustlist is configured, that is, all API data is dumped.
black_list	No	API dump blocklist. The specified APIs are not dumped. Example: `black_list=["conv1d", "conv2d"]` By default, no blocklist is configured, that is, all API data is dumped.
error_data_path	No	Path for storing the API input and output data that does not meet the precision requirements. Example: `error_data_path": "./" The default value is the current path.
precision	No	Number of decimal places of a floating-point number. By default, 14 decimal places are used.

[!NOTE]NOTE

If both white_list and black_list are configured and the API lists configured for them do not overlap, the trustlist takes effect. If the API lists overlap, the APIs excluded by the trustlist and the overlapped APIs are not dumped.

Pre-check API Blocklist and Trustlist¶

The acc_check process supports the API pre-check blocklist and trustlist. You can configure the black_list or white_list parameter to specify the names of the APIs that do not need to be pre-checked or need to be pre-checked:

Configure the parameters in config.json. This file needs to be specified by the -config parameter during the acc_check operation.
The priority of config.json is higher than that of config.yaml. That is, when config.json is executed, the configuration in config.yaml does not take effect.

Using `multi_acc_check` for Multi-thread Pre-check¶

Function

The multi_acc_check script is used to perform multiple acc_check operations in parallel, reducing the time required for pre-check.

Syntax

msprobe multi_acc_check -api_info <dump_json_path> [-save_error_data] [-o <out_path>] [-j] [-n <num_splits>] [-d <device_id>] [-csv_path <result_csv_path>] [-f]

Parameters

Parameter	Mandatory (Yes/No)	Description
`-api_info` or `--api_info_file`	Yes	Specifies the API information file `dump.json`.
`-save_error_data`	No	Saves the API input and output data that does not meet the precision requirements.
`-o` or `--out_path`	No	Specifies the path for saving the `acc_check` execution result. The default value is `./`.
`-j` or `--jit_compile`	No	Enables JIT compilation.
`-n` or `--num_splits`	No	Specifies the number of `acc_check` threads to be executed at the same time. The default value is `8`, and the maximum value is `64`. However, each device supports a maximum of eight threads. When multiple threads and devices are specified, the threads are evenly distributed on each device.
`-d` or `--device`	No	Specifies the ID of the device where UT code runs. The default value is `0`. You can specify up to eight devices (0 to 7) at the same time.
`-csv_path` or `--result_csv_path`	Mandatory when the `acc_check` operation is resumed after being interrupted.	Specifies the path of the `accuracy_checking_result_{timestamp}.csv` file generated when the current execution is interrupted. Set this parameter to resume execution from the interruption point if `acc_check` is interrupted. You need to specify the value to the `accuracy_checking_result_{timestamp}.csv` file that was interrupted last time. For details, see Resumable Check.
`-f` or `--filter_api`	No	Filters APIs with the same parameters and structures except the maximum and minimum values in a model. This parameter is applicable to models with large size and many repeated APIs.
`-config` or `--config_path`	No	Specifies the config.json file for additional configurations (including the blocklist and trustlist) for offline pre-check. By default, this file is not configured. For details about how to configure `config.json`, see Configuration File Introduction.

Example

msprobe multi_acc_check -api_info ./dump_path/step{step_number}/rank{rank_number}/dump.json -n 32 -d 0 1 2 3

Output Description

After the multi_acc_check pre-check is performed, two CSV files are generated for each device. For details, see Pre-check Result Description.

Resumable Check¶

Function

During acc_check operation, if the pre-check process is interrupted by issues such as environment failures or excessive data volume, you can resume the check after resolving the issues.

Precautions

You must use the accuracy_checking_result_{timestamp}.csv file that was interrupted last time.
Do not change the names of the accuracy_checking_result_{timestamp}.csv and accuracy_checking_details_{timestamp}.csv files, including the timestamps. Otherwise, the resumable check will fail because the file names cannot be identified.
The resumable check does not create a new CSV file. Instead, it continues to write subsequent results to the accuracy_checking_result_{timestamp}.csv file specified by -csv_path and the corresponding accuracy_checking_details_{timestamp}.csv file.

Example

msprobe acc_check -api_info ./dump_path/step{step_number}/rank{rank_number}/dump.json -csv_path /home/xxx/ut/accuracy_checking_result_{timestamp}.csv

Pre-check Result Description¶

The following is an example of the accuracy_checking_result_{timestamp}.csv and accuracy_checking_details_{timestamp}.csv files generated during accuracy pre-check:

You can check Forward Test Success and Backward Test Success in the accuracy_checking_result_{timestamp}.csv file to determine whether there are APIs that fail the test. Then, check the API details in the accuracy_checking_details_{timestamp}.csv file. For details, see API Pre-check Metrics.

accuracy_checking_result_{timestamp}.csv

Field	Meaning
API name	API name.
Forward Test Success	Whether the forward API passes the test. The value can be `pass`, `warning`, or `error`. `SKIP` indicates that the API calculation was skipped. The `Message` field provides the reason, such as: the API does not support precision pre-check, is filtered by the blocklist (or not included in the trustlist), or encountered an error during execution.
Backward Test Success	Whether the backward API passes the test. The value can be `pass`, `warning`, or `error`. If the value is empty, the API has no backward output. `SKIP` indicates that the API calculation was skipped. The `Message` field provides the reason, such as: the API does not support precision pre-check, is filtered by the blocklist (or not included in the trustlist), or encountered an error during execution.
Message	Message.

This result is an intermediate result and is for reference only. You are advised to view the comparison result after Comparing Pre-check Results. This result will be deleted later.

The pass/error status of Forward Test Success and Backward Test Success is determined by the cosine similarity, maximum absolute error, and dual-100/dual-1000/dual-10000 metrics recorded in accuracy_checking_details_{timestamp}.csv.

Note that an API in accuracy_checking_details_{timestamp}.csv may have multiple forward or backward outputs. In this case, each output is recorded in a separate row. In accuracy_checking_result_{timestamp}.csv, the result is marked as pass only if all outputs of an API are pass. If any output is error, the result is marked as error. If only warning and pass are present and no error is present, the result is marked as warning.

accuracy_checking_details_{timestamp}.csv

Field	Meaning
API name	Name of the API on the NPU or GPU.
Bench Dtype	Type of the benchmark API data.
DEVICE Dtype	Type of the NPU or GPU API data.
Shape	API shape.
Cosine similarity	Cosine similarity between the NPU or GPU data and the benchmark data.
Maximum absolute error	Maximum absolute error between the NPU or GPU data and the benchmark data.
Dual-100 metric	Dual-100 precision metric: Measures the proportion of elements in an NPU or GPU tensor whose relative error, when compared to the benchmark data, is less than 1%. The test is considered passed if the proportion of elements with a relative error exceeding 1% is less than 1% of the total number of elements.
Dual-1000 metric	Dual-1000 precision metric: Measures the proportion of elements in an NPU or GPU tensor whose relative error, when compared to the benchmark data, is less than 0.1%. The test is considered passed if the proportion of elements with a relative error exceeding 0.1% is less than 0.1% of the total number of elements.
Dual-10000 metric	Dual-10000 precision metric: Measures the proportion of elements in an NPU or GPU tensor whose relative error, when compared to the benchmark data, is less than 0.01%. The test is considered passed if the proportion of elements with a relative error exceeding 0.01% is less than 0.01% of the total number of elements.
Binary consistency error rate	Ratio of inconsistent precision values to the total number of tensor values in NPU or GPU data. This metric is displayed only for APIs whose data is of the built-in type (`bool`, `int`, `float`, or `str`), `torch.bool` or `torch.int`, or for APIs that use the binary consistency algorithm for comparison.
Error balance	Fluctuation in precision difference between NPU/GPU data and the benchmark data.
Root-mean-square error (RMSE)	RMSE between NPU/GPU data and the benchmark data.
Error ratio in small value range	Proportion of small values in an NPU or GPU tensor whose absolute error against the benchmark exceeds the error threshold, relative to the total number of small values. For details about how to determine small values and the error threshold, see Small Value Range.
Maximum relative error	Maximum relative error between NPU/GPU data and the benchmark data.
Mean relative error	Mean relative error between NPU/GPU data and the benchmark data.
inf/nan error rate	Proportion of elements that have inconsistent inf/nan values between the NPU and the benchmark to the total number of elements.
Relative error rate	Proportion of elements whose relative error between NPU normal values and benchmark normal values exceeds the error threshold, relative to the total number of normal value elements.
Absolute error rate	Proportion of elements whose absolute error between NPU small values and benchmark small values exceeds the error threshold, relative to the total number of small value elements.
Maximum ULP error	Maximum ULP error (absolute value) between NPU/GPU data and the benchmark data.
Mean ULP error	Mean ULP error (absolute value) between NPU/GPU data and the benchmark data.
Proportion of ULP errors greater than the threshold	Proportion of elements where the absolute ULP error between NPU/GPU data and benchmark data exceeds a threshold, relative to the total number of elements. The threshold is `1` for float16 or bfloat16 data types, and `32` for float32.
Status	API pre-check status. `pass` indicates that the test is passed; `error` indicates that the test is not passed, `warning` indicates that the test does not meet the requirements of the dual-1000 or dual-1000 precision metrics; and `SKIP` indicates that the calculation of an API is skipped. The `Message` field provides the reason, such as: a certain API parameter is not involved in backward gradient computation, the API's data type (e.g. float64) is not supported by new precision comparison standards, the API does not support precision pre-check, is filtered by the blocklist (or not included in the trustlist), or encountered an error during execution.
Message	Message.

API Pre-check Metrics¶

API pre-check metrics are used to determine whether an API meets the precision standards by checking the cosine similarity, maximum absolute error, and dual-100/1000/10000 precision metrics in accuracy_checking_details_{timestamp}.csv. You are advised to perform pre-check result comparison after the preliminary pre-check is complete to obtain a more accurate precision result.

If an API passes the pre-check test, the "Status" column in the accuracy_checking_details_{timestamp}.csv file is marked as "pass". Otherwise, it is marked as "error" or "warning". The detailed rules are as follows:

Cosine similarity > 0.99: If the cosine similarity is less than or equal to 0.99, the check fails and an "error" is marked. If the cosine similarity is greater than 0.99, the check succeeds and the process goes to the next step.
Maximum absolute error < 0.001: If the maximum absolute error is less than 0.001, the test is passed. If the maximum absolute error is greater than or equal to 0.001, the test is not passed. Proceed to the next step.
Dual-100, dual-1000, and dual-10000 precision metrics:
- For float16 and bfloat16 data: If the dual-100 metric fails to pass the test, mark the data as "error". If the dual-100 metric passes the test, but the dual-1000 metric fails, mark the data as "warning". If both the dual-100 and dual-1000 metrics pass the test, mark the data as "pass".
- For float32 and float64 data: If the dual-1000 metric fails, mark the data as "error". If the dual-1000 metric passes, but the dual-10000 metric fails, mark the data as "warning". If both the dual-1000 and dual-10000 metrics pass, mark the data as "pass".
In the accuracy_checking_result_{timestamp}.csv file, theForward Test Success and Backward Test Success fields are used to collect statistics on the test results of the forward and backward outputs of an operator. For operators marked as "pass", the value in accuracy_checking_result_{timestamp}.csv is TRUE, indicating that the test is passed. For operators marked as "error" or "warning", the value in accuracy_checking_result_{timestamp}.csv is FALSE, indicating that the test is failed. An operator may have multiple forward or backward inputs or outputs. Therefore, all inputs and outputs of such an operator must be marked as "pass" in accuracy_checking_result_{timestamp}.csv for the operator to be marked as TRUE. If any input or output is marked as "error" or "warning", the operator is marked as FALSE in accuracy_checking_result_{timestamp}.csv.

Small Value Range¶

Threshold regarded as a small value:

torch.float32: 2**-14
torch.float16: 2**-11
torch.bfloat16: 2**-8

Absolute error threshold of small values:

torch.float32: 2**-30
torch.float16: 2**-16
torch.bfloat16: 2**-16

Pre-check Result Comparison¶

Function

After obtaining the accuracy_checking_details_{timestamp}.csv files of the acc_check operation pre-check results in the NPU and GPU environments, you can compare the precision differences between the NPU and GPU environments.

Syntax

msprobe api_precision_compare -npu <npu_csv_path> -gpu <gpu_csv_path> [-o out_path]

Optional fields are enclosed in square brackets ([]), and variables are enclosed in angle brackets (<>).

Parameters

Parameter	Mandatory (Yes/No)	Description
`-npu` or `--npu_csv_path`	Yes	Path of the NPU pre-check result file (`accuracy_checking_details_{timestamp}.csv`). By default, the file is identified in the current directory.
`-gpu` or `--gpu_csv_path`	Yes	Path of the GPU pre-check result file (`accuracy_checking_details_{timestamp}.csv`). By default, the file is identified in the current directory.
`-o` or `--out_path`	No	Path for saving the execution result of `api_precision_compare.py`. The current directory is used by default.

Example

msprobe api_precision_compare -npu /home/xxx/npu/accuracy_checking_details_{timestamp}.csv -gpu /home/xxx/gpu/accuracy_checking_details_{timestamp}.csv -o /home/xxx/

Output Description

After the pre-check result comparison is complete, the api_precision_compare_result_{timestamp}.csv and api_precision_compare_details_{timestamp}.csv files are generated in the path specified by the -o parameter. Example:

You can check the Forward Test Success and Backward Test Success columns in the api_precision_compare_result_{timestamp}.csv file to determine whether there are APIs that fail the test, and then check the detailed API information in the api_precision_compare_details_{timestamp}.csv file.

api_precision_compare_result_{timestamp}.csv

Field	Meaning
API name	API name.
Forward Test Success	Whether the forward API passes the test. `pass` indicates that the test is passed; `error` indicates that an error occurs; and `SKIP` indicates that the calculation of an API is skipped. The `Message` field provides the reason, such as: the API's data type (e.g. float64) is not supported by new precision comparison standards, the API does not support precision pre-check, is filtered by the blocklist (or not included in the trustlist), or encountered an error during execution.
Backward Test Success	Whether the backward API passes the test. `pass` indicates that the test is passed; `error` indicates that an error occurs; an empty value indicates that the API has no backward output; and `SKIP` indicates that the API's data type (e.g. float64) is not supported by new precision comparison standards.
Message	Message.

Whether Forward Test Success and Backward Test Success pass the test is determined by the result of each metric in api_precision_compare_details_{timestamp}.csv. Note that an API in api_precision_compare_details_{timestamp}.csv may have multiple forward or backward outputs. In this case, each output is recorded in a separate row. In api_precision_compare_result_{timestamp}.csv, the result is marked as pass only if all metric results of an API are pass. If any metric is error, the result is marked as error. If only warning and pass are present and no error is present, the result is marked as warning.

api_precision_compare_details_{timestamp}.csv

Field	Meaning
API name	Name of an API on the NPU or GPU.
Error ratio in small value range	Error ratio in the small value range between the NPU and CPU data or GPU and CPU data. It is a benchmark comparison metric.
Error determination result in small value range	If the error ratio in small value range is less than or equal to 2, the result is marked as "pass". If the error ratio is greater than 2, the result is marked as "error".
RMSE ratio	RMSE ratio between the NPU and CPU data or GPU and CPU data. It is a benchmark comparison metric.
RMSE determination result	If the RMSE ratio is less than or equal to (2/1.5/1.2), the result is marked as "pass". If the RMSE ratio is greater than (2/1.5/1.2), the result is marked as "error". The values correspond to the operator monitoring levels L0, L1, and L2, respectively. For details, see Operator Monitoring Levels.
Maximum relative error ratio	Maximum relative error ratio between the NPU and CPU data or GPU and CPU data. It is a benchmark comparison metric.
Maximum relative error determination result	If the maximum relative error ratio is less than or equal to (10/5/2), the result is marked as "pass". If the maximum relative error ratio is greater than (10/5/2), the result is marked as "error". The values correspond to the operator monitoring levels L0, L1, and L2, respectively. For details, see Operator Monitoring Levels.
Mean relative error ratio	Mean relative error ratio between the NPU and CPU data or GPU and CPU data. It is a benchmark comparison metric.
Mean relative error determination result	If the mean relative error ratio is less than or equal to (2/1.5/1.2), the result is marked as "pass". If the mean relative error ratio is greater than (2/1.5/1.2), the result is marked as "error". The values correspond to the operator monitoring levels L0, L1, and L2, respectively. For details, see Operator Monitoring Levels.
Error balance ratio	Error balance ratio between the NPU and CPU data or GPU and CPU data. It is a benchmark comparison metric.
Error balance determination result	If the error balance ratio is less than or equal to 2, the result is marked as "pass". If the error balance ratio is greater than 2, the result is marked as "error". This field is not involved in `api_precision_compare_result`.
INF/NaN error rate	Proportion of elements with inconsistent INF/NaN values between the NPU and the benchmark, relative to the total element count. It is an absolute threshold metric.
INF/NaN determination result	If the result is 0, it is marked as "pass"; otherwise, it is marked as "error".
Relative error rate	Proportion of elements whose relative error between NPU normal values and benchmark normal values exceeds the error threshold, relative to the total number of normal value elements. It is an absolute threshold metric.
Relative error determination result	If the result is 0, it is marked as "pass"; otherwise, it is marked as "error".
Absolute error rate	Proportion of elements whose absolute error between NPU small values and benchmark small values exceeds the error threshold, relative to the total number of small value elements. It is an absolute threshold metric.
Absolute error determination result	If the result is 0, it is marked as "pass"; otherwise, it is marked as "error".
Binary consistency error rate	Ratio of inconsistent precision values to the total number of tensor values in NPU or GPU data. This metric is displayed only for APIs whose data is of the built-in type (`bool`, `int`, `float`, or `str`), `torch.bool` or `torch.int`, or for APIs that use the binary consistency algorithm for comparison. It is a binary consistency metric.
Binary consistency error rate determination result.	If the result is 0, it is marked as "pass"; otherwise, it is marked as "error".
Mean ULP error^a	Mean ULP error (absolute value) between NPU data and the benchmark data.
Proportion of ULP errors greater than the threshold^a	Proportion of elements where the absolute ULP error between NPU data and benchmark data exceeds a threshold, relative to the total number of elements. The threshold is `1` for float16 or bfloat16 data types, and `32` for float32.
Ratio of ULP error greater than the threshold^a	NPU-CPU ULP error exceedance rate/GPU-CPU ULP error exceedance rate
ULP error determination result	Options: When the NPU or GPU data type is float16 or bfloat16, the result is marked as "pass" if either of the following conditions is met; otherwise, it is marked as "error": The proportion of NPU ULP errors greater than the threshold is less than 0.001. The proportion of NPU ULP errors exceeding the threshold is lower than that of GPU ULP errors. When the NPU or GPU data type is float32, the result is marked as "pass" if any of the following conditions is met; otherwise, it is marked as "error": The mean NPU ULP error is less than 64. The proportion of NPU ULP errors greater than the threshold is less than 0.05. The proportion of NPU ULP errors exceeding the threshold is lower than that of GPU ULP errors.
Dual-1000 metric	Dual-1000 precision metric: Proportion of elements in an NPU tensor whose relative error, compared element-wise against the corresponding benchmark data, is less than 0.1%. The test is passed if the ratio of the number of elements with the relative error greater than 0.1% to the total number of elements is less than 0.1%. This metric is used only for `conv1d` and `conv2d`.
Dual-1000 metric determination result	Criterion: If the dual-1000 metric is greater than 0.999, the result is marked as "pass"; otherwise, it is marked as "error".
Comparison result	Final result of all metrics. If any metric in the comparison result is marked as `error`, the overall result is marked as `error`, and the `Message` field specifies the metric that does not meet the standard. If any metric is marked as `warning`, the overall result is marked as `warning`, and the `Message` field specifies the metric that does not meet the standard. If no metric is marked as `error` or `warning`, the overall result is marked as `pass`. `SKIP` indicates that the calculation of an API is skipped. The `Message` field provides the reason, such as: a certain API parameter is not involved in backward gradient computation, the API's data type (e.g. float64) is not supported by new precision comparison standards, the API does not support precision pre-check, is filtered by the blocklist (or not included in the trustlist), or encountered an error during execution.
Comparison algorithm	Comparison algorithm used by the API. Options include: benchmark comparison, binary consistency, absolute threshold, or ULP error comparison.
Message	Message.

Appendix¶

List of Supported Fused Operators¶

The pre-check tool supports the following fused operators:

npu_apply_adam_w
npu_confusion_transpose
fast_gelu
npu_layer_norm_eval
npu_linear
npu_fusion_attention (To perform pre-check on this operator on the GPU, you need to install flash_attn. flash_attn 2.1 or later is recommended.)
npu_rms_norm
npu_rotary_mul
npu_scaled_masked_softmax
npu_swiglu
npu_apply_adam
npu_group_norm_silu
npu_mish
npu_moe_gating_top_k_softmax
npu_sort_v2

Operator Monitoring Levels¶

The following lists the operator monitoring levels supported by the pre-check tool. For other operators that are not listed, the L0 monitoring is used by default.

L1 Operator List¶

embedding
npu_fusion_attention_grad
npu_fusion_attention
scatter_
scatter_add_

L2 Operator List¶

addmm
npu_cross_entropy_loss
npu_cross_entropy_loss_backward
GELU
layer_norm
matmul
mm
index
one_hot
sum
npu_rms_norm
npu_rms_norm_backward
nonzero

Offline Precision Pre-check in PyTorch¶

Overview¶

Preparations¶

Offline Precision Pre-check¶

Using acc_check for Pre-check¶

config.yaml File Description¶

Pre-check API Blocklist and Trustlist¶

Using multi_acc_check for Multi-thread Pre-check¶

Resumable Check¶

Pre-check Result Description¶

API Pre-check Metrics¶

Small Value Range¶

Pre-check Result Comparison¶

Appendix¶

List of Supported Fused Operators¶

Operator Monitoring Levels¶

L1 Operator List¶

L2 Operator List¶

Using `acc_check` for Pre-check¶

Using `multi_acc_check` for Multi-thread Pre-check¶