MindStudio Debugger User Guide
Overview
MindStudio Debugger (msDebug for short) is an operator debugging tool for Ascend devices. It lets operator developers debug operator programs running on NPUs, for example by reading the memory and registers of an Ascend device and by pausing and resuming program execution. After testing operator functions on real hardware, either by launching the operator directly or with the msOpST tool, decide based on the test results whether function debugging with msDebug is needed.
Scenarios
The following operator call scenarios are supported:
- Kernel launch operator development: kernel launch
  For details about the kernel launch scenario, see section "Completing Kernel Launch Based on the Sample Project" in Ascend C Operator Development Guide. For details about the operation, see "Debugging a Vector Operator on the Board".
- Project-based operator development: single-operator API calling
  For details about the single-operator API execution scenario, see "Project-based Operator Development" > "Single-Operator API Execution" in Ascend C Operator Development Guide. For details about the operation, see "Calling AscendCL Single-Operator".
- AI framework operator adaptation: PyTorch framework
  For details about the single-operator calling scenario through the PyTorch framework, see "OpPlugin in Ascend-developed Plugins" in Ascend Extension for PyTorch Suite and Third-party Library Support List. For details about the operation, see "Debugging the Operators Called by a PyTorch Interface".
Additional Information
msDebug also provides the following extension program. For details, see Table 1 Extension program.
| Program Name | Description |
|---|---|
| msdebug-mi (msDebug Machine Interface) | Provides machine-to-machine interaction interfaces for data parsing. Users do not need to use it directly. |
Preparations
Environment Setup
- Install msDebug by referring to MindStudio Debugger Installation Guide.
- To enable msDebug, install the NPU driver and firmware using either of the following methods (method 1 is recommended for CANN 8.1.RC1 or later together with driver 25.0.RC1 or later):
  - Method 1: Specify the `--full` option during driver installation, and then run the `echo 1 > /proc/debug_switch` command as the `root` user to enable the debugging channel. msDebug can then be used.
  - Method 2: Specify the `--debug` option during driver installation. For details, see "Installing the NPU Driver and Firmware" in CANN Software Installation Guide.
Constraints
- The debugging channel has high privileges and therefore poses security risks. Use this tool with caution, and do not use it in a production environment. By using this tool, you accept the associated risks.
- Only one msDebug instance can debug a given device at a time. Avoid running other operator programs on that device at the same time.
- When the program to be debugged calls multiple operators, the msDebug tool can debug only a specified operator.
- During operator debugging, the overflow/underflow detection function is disabled.
Supported Products
The following products are supported:
- Atlas A3 training products/Atlas A3 inference products
- Atlas A2 training products/Atlas A2 inference products
[!NOTE]NOTE
- For details about Ascend product models, see Ascend Product Models.
- For details about the supported functions, see the documentation of the corresponding function module.
Precautions
- Run the `help` command to view all commands supported by msDebug. Commands not listed in the Command Reference are implemented by the open-source debugger LLDB; be aware of the related risks when using them. For details about LLDB usage, see the official LLDB documentation.
- Ensure that the executable files or applications to be debugged are safe to execute.
- You are advised to restrict the operation permission on executable files or applications to avoid privilege escalation risks.
- Avoid high-risk operations (such as deleting files, deleting directories, changing passwords, and running privilege escalation commands) to prevent security risks.
Command Reference
Table 1 Command reference
| Command | Abbreviation | Description | Example |
|---|---|---|---|
| `breakpoint set -f filename -l linenum` | `b` | Adds a breakpoint. *filename* is the operator implementation code file (\*.cpp), and *linenum* is a line number in that file. | `b add_custom.cpp:85` |
| `run` | `r` | Runs the program. | `r` |
| `continue` | `c` | Continues running the program. | `c` |
| `print variable` | `p` | Prints a variable. | `p zLocal` |
| `frame variable` | `var` | Displays all local variables in the current scope. | `var` |
| `memory read` | `x` | Reads memory. | `x -m GM -f float16[] 0x00001240c0037000 -c 2 -s 128 -E 0` |
| `ascend info devices` | - | Queries device information. | `ascend info devices` |
| `ascend info cores` | - | Queries the AI Core information of the operator. | `ascend info cores` |
| `ascend info tasks` | - | Queries information about the task where the operator runs. | `ascend info tasks` |
| `ascend info stream` | - | Queries information about the stream where the operator runs. | `ascend info stream` |
| `ascend info blocks` | - | Queries information about the blocks where the operator runs. | Print information about the running blocks: `ascend info blocks`. Print the code of the running blocks at the current stop: `ascend info blocks -d` |
| `ascend aic id` | - | Switches the Cube core that the debugger focuses on. | `ascend aic 1` |
| `ascend aiv id` | - | Switches the Vector core that the debugger focuses on. | `ascend aiv 5` |
| Ctrl+C | - | Manually interrupts the running operator program and displays the interruption location. | Press Ctrl+C on the keyboard. |
| `register read` | `re r` | Reads register values. `-a` reads the values of all registers; `$REG_NAME` reads the value of the register with the specified name. | `register read -a`, `re r $PC` |
| `thread step-over` | `next` or `n` | Moves to the next executable line of code in the same call stack. | `n` |
| `thread step-in` | `step` or `s` | Steps into a function for debugging. | `s` |
| `thread step-out` | `finish` | Executes the rest of the current function and returns to the caller to continue execution. | `finish` |
| `thread backtrace` | `bt` | Displays the call stack. | `bt` |
| `target modules add <kernel.o>` | `image add [kernel.o]` | Imports operator debugging information when operators are called from the PyTorch framework. | `image add xx.o` |
| `target modules load --file <kernel.o> --slide <address>` | `image load -f <kernel.o> -s <address>` | Loads the imported operator debugging information so that it takes effect when operators are called from the PyTorch framework. | `image load -f xx.o -s 0` |
| `msdebug --core corefile [kernel.o\|fatbin]` | - | Starts msDebug with a coredump file, optionally together with the operator `kernel.o` or fatbin file. | `msdebug --core corefile xx.o`, `msdebug --core corefile` |
| `ascend info summary` | - | Displays information about the coredump file. | `ascend info summary` |
| `help msdebug_command` | - | Displays help for a tool command, including its function, syntax, and options. | `help run` |

The help information about the core switching command is as follows:
(msdebug) help ascend aic
change the id of the focused ascend aicore.
Syntax: ascend aic <id>
The help information about the `ascend info blocks` command is as follows:
(msdebug) help ascend info blocks
show blocks overall info.
Syntax: ascend info blocks
Command Options Usage:
  ascend info blocks [-d]
    -d ( --details )
      Show stopped states for all blocks.
[!NOTE]NOTE
- Currently, the `bt` command applies only to the coredump scenario. The call stack information is accurate only when `stop_reason` is `CUBE_ERROR`, `CCU_ERROR`, `MTE_ERROR`, `VEC_ERROR`, or `FIXP_ERROR`.
- If the function names displayed by the `bt` command are too long, you can adjust the frame format by referring to the LLDB formatting documentation, for example:
  settings set frame-format "frame #${frame.index}: ${frame.pc}{ ${module.file.basename}{{${frame.no-debug}${function.pc-offset}}}}{ at ${line.file.basename}:${line.number}{:${line.column}}}{${function.is-optimized} [opt]}{${frame.is-artificial} [artificial]}\n"
- When operators are called from the PyTorch framework, run the `image add` command after the `run` command to import the debugging information, and then run the `image load` command to make the imported debugging information take effect.
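The long form and the abbreviation in the first row of the command table are equivalent. As an illustration only, the following sketch expands the shorthand `b <file>:<line>` into the long form `breakpoint set -f <file> -l <line>`. `ExpandBreakpointShorthand` is a hypothetical helper written for this guide; it is not part of msDebug and does not call it.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical helper (not part of msDebug): expands the abbreviated form
// "b <file>:<line>" into the long form "breakpoint set -f <file> -l <line>",
// following the first row of the command reference table.
// Returns std::nullopt when the argument is not "<file>:<positive line>".
std::optional<std::string> ExpandBreakpointShorthand(const std::string& arg) {
    // Split at the last ':' so that only the trailing line number is taken.
    const std::string::size_type colon = arg.rfind(':');
    if (colon == std::string::npos || colon == 0 || colon + 1 == arg.size()) {
        return std::nullopt;
    }
    const std::string file = arg.substr(0, colon);
    const std::string line = arg.substr(colon + 1);
    if (line.find_first_not_of("0123456789") != std::string::npos) {
        return std::nullopt;  // the line part must be all digits
    }
    if (line.find_first_not_of('0') == std::string::npos) {
        return std::nullopt;  // line numbers start at 1
    }
    assert(!file.empty());
    return "breakpoint set -f " + file + " -l " + line;
}
```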
Tool Usage
Importing Debugging Information
Before debugging an operator, add the `-g -O0` compile options and recompile the operator so that the operator binary contains debugging information. For details, see Compiling Operators Based on the Sample Project. The operator debugging information is then imported into msDebug automatically.
Starting the tool
The msDebug tool can be started in either of the following ways.
[!NOTE]NOTE
If `Cannot read termcap database; using dumb terminal settings` is displayed, run `export TERMINFO=xx` to eliminate the message, where `xx` is the local TERMINFO path.
- Load the executable file `application`.
  - After the operator is built, the executable file `application` for the NPU is generated.
  - Use msDebug to load the executable file, for example, `msdebug application`.
  [!NOTE]NOTE
  - Perform one-click compilation and running based on the kernel framework of the Ascend C operator to generate the executable file `application` for the NPU. For details, see "Kernel Launch Operator Development" > "Kernel Launch" in Ascend C Operator Development Guide.
  - If the executable file takes other input parameters, pass them to the `run` command, as in the example in Specifying a Device ID (MC2 Operator Scenario).
- Load the Python script that calls the operator.
  - After the PyTorch framework plugins are developed, you can call Ascend C custom operators directly from PyTorch through a custom Python script, for example `test_ops_custom.py`.
  - Use msDebug to load the Python script.
    $ msdebug python3 test_ops_custom.py
    msdebug(MindStudio Debugger) is part of MindStudio Operator-dev Tools.
    The tool provides developers with a mechanism for debugging Ascend kernels running on actual hardware.
    This enables developers to debug Ascend kernels without being affected by potential changes brought by simulation and emulation environments.
    (msdebug) target create "python3"
    Current executable set to '${INSTALL_DIR}/projects/application' (aarch64).
    (msdebug) settings set -- target.run-args "test_ops_custom.py"
    (msdebug)
  [!NOTE]NOTE
  For details about the single-operator calling scenario through the PyTorch framework, see "OpPlugin in Ascend-developed Plugins" in Ascend Extension for PyTorch Suite and Third-party Library Support List.
Exiting Debugging
Run the `q` command and enter `Y` or `y` to exit the debugger.
[!NOTE]NOTE
The debugging channel cannot be disabled independently. To disable the debugging channel, you need to enable the overwrite mode. For details, see the NPU driver and firmware installation documents.
Specifying a Device ID (MC2 Operator Scenario)
When debugging a single-process, multi-thread MC2 operator, you can run the `ascend device ID` command, where ID is the device ID, to debug the operator on a specific device. This debugging mode has the following advantages:
- Higher debugging efficiency: By selecting a specific device, you can use hardware resources more efficiently and accelerate the debugging process.
- Better targeting: You can debug a specific device to detect and resolve performance bottlenecks or compatibility issues related to that device.
- Issue isolation: If a performance or function issue occurs, you can specify different device IDs to check whether the issue is caused by a specific device, thereby making it easier to locate the issue.
[!NOTE]NOTE
- If no device ID is specified, only the device that the program sets first at run time is debugged.
- The HCCL APIs do not support step-by-step debugging. For details about the APIs, see "High-Level APIs" > "HCCL" > > "HCCL Kernel APIs" in Ascend C Operator Development API Reference.
The following example debugs the operator on device 1:
(py38) [root@localhost MC2-master]# msdebug /home/xxx/MC2-master/bin/alltoall_custom_aarch64
msdebug(MindStudio Debugger) is part of MindStudio Operator-dev Tools.
The tool provides developers with a mechanism for debugging Ascend kernels running on actual hardware.
This enables developers to debug Ascend kernels without being affected by potential changes brought by simulation and emulation environments.
(msdebug) target create "/home/xxx/MC2-master/bin/alltoall_custom_aarch64"
Current executable set to '/home/xxx/MC2-master/bin/alltoall_custom_aarch64' (aarch64).
(msdebug) b all_to_all_custom_v3.cpp:58
Breakpoint 1: 2 locations.
(msdebug) ascend device 1
(msdebug) run --x1_shape 72,17 --input_tensor_format ND --input_tensor_dtype fp16 --output_shape 72,17 --output_dtype fp16 --output_format ND --n_dev 2 --bin_path feature/aclnn/AllToAllCustom_fp16_ND_fuzz_000010 --loop_cnt 1 --platform 1971 --version 3 --tileM 128 | tee /home/shelltest/MC2-master/feature/aclnn/AllToAllCustom_fp16_ND_fuzz_000010/mc2_memory.log
Process 2625643 launched: '/home/xxx/MC2-master/bin/alltoall_custom_aarch64' (aarch64)
[INFO] rank 0 hcom: xx.xx.xx.xxx%enp189s0f0_60000_0_1747739573633567 stream: 0xaaaac9e14610, context : 0xaaaac9daeda0
[INFO] rank 1 hcom: xx.xx.xx.xxx%enp189s0f0_60000_0_1747739573633567 stream: 0xaaaaca8c8380, context : 0xaaaaca88f280
before RunGraph : free :29837 M, total:30196 M, used :358 M, ret :0
before RunGraph : free :29835 M, total:30196 M, used :360 M, ret :0
Process 2625643 stopped and restarted: thread 19 received signal: SIGCHLD
[INFO] M is 72, K is 17, tileM is 128, tileNum is 0, tailM is 36, tailNum is 1, useBufferType is 0
[INFO] M is 72, K is 17, tileM is 128, tileNum is 0, tailM is 36, tailNum is 1, useBufferType is 0
[Launch of Kernel AllToAllCustomV3_f1974b24a4ace3957d571b2712b3eadf_1000 on Device 1]
[Launch of Kernel AllToAllCustomV3_f1974b24a4ace3957d571b2712b3eadf_1000 on Device 1]
Process 2625643 stopped
[Switching to focus on Kernel AllToAllCustomV3_f1974b24a4ace3957d571b2712b3eadf_1000, CoreId 0, Type aiv]
* thread #1, name = 'alltoall_custom', stop reason = breakpoint 1.2
frame #0: 0x0000000000004e0c AllToAllCustomV3_f1974b24a4ace3957d571b2712b3eadf.o`all_to_all_custom_v3_1000_tilingkey.vector(aGM="\x8b2d3+\xb5Ӫ\xbe\xb7\xa94\x87\xba;\xb6\xf68\U0000000e9\xc1\xa9", cGM="", workspaceGM="", tilingGM="d") at all_to_all_custom_v3.cpp:58:28
55 auto &&cfg = tilingData.param;
56 const uint8_t tileNum = cfg.tileNum;
57 const uint8_t tailNum = cfg.tailNum;
-> 58 const uint64_t tileM = cfg.tileM;
59 const uint64_t tailM = cfg.tailM;
60 const uint64_t M = cfg.M;
61 const uint64_t K = cfg.K;
Breakpoint Setting
Function
When using msDebug to debug an operator, you can set line breakpoints in the operator program, that is, breakpoints at a specific line of the operator code file.
Precautions
- If an operator implementation file with the same name exists on both the host and kernel, you are advised to use an absolute path to set a breakpoint to ensure that the breakpoint is set on the target file.
- When a breakpoint is set in a source code file, a warning that the actual location cannot be found may be displayed. After the operator starts running, the actual location is found and the breakpoint is set automatically.
- If the operator code is compiled into a dynamic library and launched through the operator launch symbol, setting a breakpoint before running the `run` command produces output indicating that the breakpoint location is not found (`pending on future shared library load`). This is because the dynamic library is loaded only after the program starts. After `run` is executed, the operator debugging information is parsed, and the breakpoint is then updated and set.
Example
Setting a Line Breakpoint
- Add a breakpoint at line 114 of the kernel function implementation file `matmul_leakyrelu_kernel.cpp`. If the following information is displayed, the breakpoint has been added:
  (msdebug) b matmul_leakyrelu_kernel.cpp:114
  Breakpoint 1: where = device_debugdata`_ZN17MatmulLeakyKernelIDhDhffE7CopyOutEj_mix_aiv + 240 at matmul_leakyrelu_kernel.cpp:114:14, address = 0x000000000000ff88
  The following table describes the command output.
  Table 1 Information description
  | Field | Description |
  |---|---|
  | device_debugdata | Name of the .o file on the device. |
  | matmul_leakyrelu_kernel.cpp | Name of the kernel implementation file where the breakpoint is located. |
  | CopyOut | Function where the breakpoint is located. |
  | 240 | Offset, in bytes (decimal), of the breakpoint address from the start address of the CopyOut function. In this example, 0xff88 is 240 bytes past the start of CopyOut. |
  | address = 0x000000000000ff88 | Breakpoint address, which is a logical relative address. |
- Run the operator program and wait until the breakpoint is hit. `0x000000000000ff88` is the PC address where the breakpoint is located.
  (msdebug) run
  Process 165366 launched: '${INSTALL_DIR}/projects/normal_sample/mix/matmul_leakyrelu.fatbin' (aarch64)
  [Launch of Kernel matmul_leakyrelu_custom on Device 1]
  Process 165366 stopped
  [Switching to focus on Kernel matmul_leakyrelu_custom, CoreId 14, Type aiv]
  * thread #1, name = 'matmul_leakyrelu', stop reason = breakpoint 1.1
      frame #0: 0x000000000000ff88 device_debugdata`_ZN17MatmulLeakyKernelIDhDhffE7CopyOutEj_mix_aiv(this=0x000000000019fb60, count=0) at matmul_leakyrelu_kernel.cpp:114:14
     111                      (uint16_t)(tiling.baseN * sizeof(cType) / DEFAULT_C0_SIZE),
     112                      0,
     113                      (uint16_t)((tiling.N - tiling.baseN) * sizeof(cType) / DEFAULT_C0_SIZE)};
  -> 114        DataCopy(cGlobal[startOffset], reluOutLocal, copyParam);
     115        reluOutQueue_.FreeTensor(reluOutLocal);
     116    }
     117
  (msdebug)
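The `+ 240` offset and the breakpoint address in the breakpoint output are related by simple subtraction: the start of CopyOut is 0xff88 - 240 = 0xfe98. The sketch below parses such a line and performs that subtraction. `FunctionStartFromBreakpoint` is a hypothetical helper; for a line with several `+ <offset>` parts (for example, inlined functions in `breakpoint list` output), it uses only the first one.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <optional>
#include <string>

// Hypothetical helper (not part of msDebug). For a breakpoint line such as
//   Breakpoint 1: where = ...CopyOutEj_mix_aiv + 240 at ..., address = 0x000000000000ff88
// it returns address - offset: the start address of the function in which the
// breakpoint was resolved. The "+ <offset>" value is decimal, the address hex.
std::optional<uint64_t> FunctionStartFromBreakpoint(const std::string& line) {
    const std::string plusKey = " + ";
    const std::string addrKey = "address = ";
    const std::string::size_type plusPos = line.find(plusKey);  // first "+ offset"
    const std::string::size_type addrPos = line.rfind(addrKey);
    if (plusPos == std::string::npos || addrPos == std::string::npos) {
        return std::nullopt;
    }
    const uint64_t offset = std::strtoull(line.c_str() + plusPos + plusKey.size(), nullptr, 10);
    // strtoull with base 16 accepts the "0x" prefix.
    const uint64_t address = std::strtoull(line.c_str() + addrPos + addrKey.size(), nullptr, 16);
    if (offset > address) {
        return std::nullopt;
    }
    return address - offset;
}
```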
Printing Breakpoints
Run the following command to print the positions and sequence numbers of all breakpoints that have been set.
(msdebug) breakpoint list
Current breakpoints:
1: file = 'add_custom.cpp', line = 85, exact_match = 0, locations = 1, resolved = 1, hit count = 1
1.1: where = device_debugdata`::add_custom(uint8_t *__restrict, uint8_t *__restrict, uint8_t *__restrict) + 14348 [inlined] KernelAdd::CopyOut(int) + 1700 at add_custom.cpp:85:9, address = 0x000000000000380c, resolved, hit count = 1
Deleting Breakpoints
-
Delete the breakpoint with a specific line number.
-
Resume the running of the program. Due to breakpoint deletion, the program keeps running to the last minute.
(msdebug) c Process 165366 resuming 4096.00 4096.00 4096.00 4096.00 4096.00 4096.00 4096.00 4096.00 4096.00 4096.00 4096.00 4096.00 4096.00 4096.00 4096.00 4096.00 4096.00 4096.00 4096.00 4096.00 4096.00 4096.00 4096.00 4096.00 4096.00 4096.00 4096.00 4096.00 4096.00 4096.00 4096.00 4096.00 Process 165366 exited with status = 0 (0x00000000) (msdebug)
Memory and Variable Printing
Function
Depending on its type and usage, a variable may be stored in a register, in the local memory, or in the global memory. Print the variable address to determine where the variable is stored, and then view the associated memory content.
Precautions
Currently, msDebug cannot print the value of a template parameter directly by its name. Instead, print the object that the template parameter belongs to; the template arguments, including the value of the parameter, appear in the printed type. For example, `COMPUTE_LENGTH` is a template parameter and `this` points to the object it belongs to. To print its value, run `p this` where the parameter is used. In the following example, the last template argument, `32`, is the value of `COMPUTE_LENGTH`:
22 template<class ArchTag_, class ElementAccumulator_, class ElementOut_, uint32_t COMPUTE_LENGTH>
23 struct ReduceAdd {
24 ReduceAdd(Arch::Resource<ArchTag> &resource)
25 {
-> 26 for (uint32_t i = 0; i < BUFFER_NUM; i++) {
27 inputBuffer[i] = resource.ubBuf.template GetBufferByByte<ElementAccumulator>(bufferOffset);
28 bufferOffset += COMPUTE_LENGTH * sizeof(ElementAccumulator);
(msdebug) p this
(Catlass::Gemm::Kernel::ReduceAdd<Catlass::Arch::AtlasA2, float, __fp16, 32> *) $0 = 0x00000000001cf838
Example
Printing Variables
After a breakpoint is hit, you can run the `p variable_name` command to print the value of a specified variable. For example:
(msdebug) p alpha
(float) $0 = 0.00100000005
(msdebug) p tiling
(const TCubeTiling) $1 = {
usedCoreNum = 2
M = 1024
N = 640
Ka = 256
...
}
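The value printed for `alpha` (0.00100000005 rather than 0.001) is expected: 0.001 has no exact binary float representation, so the stored value is the nearest float. The minimal sketch below reproduces the printed digits, assuming that the debugger formats `float` values with 9 significant digits (the precision that guarantees a float round-trips through text). This assumption is inferred from the output above, not documented; `FormatFloat9` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Formats a float with 9 significant digits, enough to round-trip any float.
// 0.001f is stored as about 0.00100000004749745, which prints as 0.00100000005.
std::string FormatFloat9(float value) {
    char buf[32];
    const int n = std::snprintf(buf, sizeof(buf), "%.9g", static_cast<double>(value));
    assert(n > 0 && n < static_cast<int>(sizeof(buf)));
    return std::string(buf);
}
```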
Printing GlobalTensor
GlobalTensor stores data in the global memory (GM, external storage).
Run the following command to print a GlobalTensor. `cGlobal` is used as an example. The `address_` field is the GM address of `cGlobal`; in this example, it is 0x000012c045400000.
(msdebug) p cGlobal
(AscendC::GlobalTensor<float>) $0 = {
AscendC::BaseGlobalTensor<float> = {
address_ = 0x000012c045400000
oriAddress_ = 0x000012c045400000
}
bufferSize_ = 655360
shapeInfo_ = {
shapeDim = '\0'
originalShapeDim = '\0'
shape = ([0] = 0, [1] = 0, [2] = 0, [3] = 0, [4] = 0, [5] = 0, [6] = 0, [7] = 0)
originalShape = ([0] = 0, [1] = 0, [2] = 0, [3] = 0, [4] = 0, [5] = 0, [6] = 0, [7] = 0)
dataFormat = ND
}
cacheMode_ = CACHE_MODE_NORMAL
}
The actual values of a GlobalTensor are stored in the GM. Run the following command to print the values at 0x000012c045400000 in the GM. The command prints one line (`-c 1`) of 256 bytes (`-s 256`) in float32 format (`-f float32[]`), that is, 64 float32 values.
(msdebug) x -m GM -f float32[] 0x000012c045400000 -s 256 -c 1
0x12c045400000: {4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096}
[!NOTE]NOTE
- If you want to print other custom addresses, ensure the validity of the custom addresses. Otherwise, errors may occur during operator running.
- To print memory starting from a custom address, add an offset, in bytes, to the value of the `address_` field and pass the resulting GM address to the memory printing command.
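The offset rule in the note above amounts to `start = address_ + index * elementSize`. The sketch below builds the corresponding memory read command. `BuildMemoryRead` is a hypothetical helper; it emits only the `-m`, `-f`, `-s` and `-c` options already used in this guide, in the same order. With `-s 256` and float32 (4 bytes), each line holds 256 / 4 = 64 values.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper (not part of msDebug): builds an `x` (memory read) command
// that starts `firstElem` elements past a tensor's base address.
//   base          value of the tensor's address_ field
//   elemBytes     element size in bytes (4 for float32, 2 for float16)
//   bytesPerLine  value for -s; lines is the value for -c
std::string BuildMemoryRead(const std::string& memType, const std::string& format,
                            uint64_t base, uint64_t elemBytes, uint64_t firstElem,
                            uint64_t bytesPerLine, uint64_t lines) {
    assert(elemBytes > 0 && bytesPerLine % elemBytes == 0);
    const uint64_t start = base + firstElem * elemBytes;  // the offset is in bytes
    char buf[128];
    std::snprintf(buf, sizeof(buf), "x -m %s -f %s 0x%016llx -s %llu -c %llu",
                  memType.c_str(), format.c_str(),
                  static_cast<unsigned long long>(start),
                  static_cast<unsigned long long>(bytesPerLine),
                  static_cast<unsigned long long>(lines));
    return std::string(buf);
}
```

For example, element 64 of `cGlobal` starts 256 bytes past `address_`, at 0x000012c045400100.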
Printing LocalTensor
LocalTensor stores data in the local memory (internal storage) of the AI Core.
Run the following command to print the LocalTensor variable `reluOutLocal`. Its memory address is given by `bufferAddr` in the `address_` field, and its length by `dataLen`. In this example, the address is 0 and the length is 131072.
(msdebug) p reluOutLocal
(AscendC::LocalTensor<float>) $2 = {
AscendC::BaseLocalTensor<float> = {
address_ = (dataLen = 131072, bufferAddr = 0, bufferHandle = "", logicPos = '\n')
}
shapeInfo_ = {
shapeDim = '\0'
originalShapeDim = '\0'
shape = ([0] = 0, [1] = 1092616192, [2] = 4800, [3] = 1473680, [4] = 0, [5] = 1473888, [6] = 0, [7] = 1471968)
originalShape = ([0] = 0, [1] = 3222199212, [2] = 4800, [3] = 1, [4] = 0, [5] = 1473376, [6] = 0, [7] = 1473376)
dataFormat = ND
}
}
The actual content of the tensor is stored in the UB. Run the following command to print the values at address 0 in the UB. The command prints one line (`-c 1`) of 256 bytes (`-s 256`) in float32 format.
(msdebug) x -m UB -f float32[] 0 -s 256 -c 1
0x00000000: {4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096}
[!NOTE]NOTE
- In this sample, the tensor content is stored in the UB. In general, a local tensor may be stored in the UB, L1, L0A, or L0B. Determine the storage location from the code and select the matching memory type for the `-m` option of the printing command.
- To print memory starting from a custom address, add an offset, in bytes, to `bufferAddr` in the `address_` field and pass the resulting address to the memory printing command.
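To dump a whole LocalTensor instead of a single line, the `-c` value can be derived from `dataLen`. A minimal sketch, assuming `dataLen` is in bytes; for `reluOutLocal`, 131072 bytes of float32 data is 32768 values, or 512 lines at 256 bytes per line. `LinesToDump` and `ElementCount` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers (not part of msDebug) for sizing a memory read command.

// Lines (-c) needed to cover dataLen bytes at bytesPerLine bytes per line (-s),
// rounded up so that a partial last line is still printed.
uint64_t LinesToDump(uint64_t dataLen, uint64_t bytesPerLine) {
    assert(bytesPerLine > 0);
    return (dataLen + bytesPerLine - 1) / bytesPerLine;
}

// Number of whole elements in dataLen bytes for a given element size.
uint64_t ElementCount(uint64_t dataLen, uint64_t elemBytes) {
    assert(elemBytes > 0);
    return dataLen / elemBytes;
}
```

With these numbers, `x -m UB -f float32[] 0 -s 256 -c 512` would cover the whole buffer; whether printing that many lines in one command is practical has not been verified here.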
Printing All Local Variables
Print all local variables in the current scope:
(msdebug) var
(MatmulLeakyKernel<__fp16, __fp16, float, float> *__stack__) this = 0x0000000000167b60
(uint32_t) count = 0
(const uint32_t) roundM = 2
(const uint32_t) roundN = 5
(uint32_t) startOffset = 0
(AscendC::DataCopyParams) copyParam = (blockCount = 256, blockLen = 16, srcStride = 0, dstStride = 64)
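The values printed by `var` can be cross-checked against the CopyOut source shown in the debugger outputs of this guide (lines 107 to 113 of `matmul_leakyrelu_kernel.cpp`). The host-side sketch below replicates that arithmetic. Only N = 640 appears in the printed `tiling`; baseM = 256, baseN = 128, singleCoreM = 512, singleCoreN = 640 and DEFAULT_C0_SIZE = 32 are assumptions, chosen because with a 4-byte float `cType` they reproduce roundM = 2, roundN = 5, blockLen = 16 and dstStride = 64. `TilingSketch` and its functions are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical host-side replica of the CopyOut arithmetic in
// matmul_leakyrelu_kernel.cpp. All tiling values are supplied by the caller.
struct TilingSketch {
    uint32_t N;
    uint32_t baseM;
    uint32_t baseN;
    uint32_t singleCoreM;
    uint32_t singleCoreN;
};

// Assumed value of DEFAULT_C0_SIZE in bytes (not shown in this guide).
constexpr uint32_t kDefaultC0Size = 32;

uint32_t RoundM(const TilingSketch& t) { return t.singleCoreM / t.baseM; }
uint32_t RoundN(const TilingSketch& t) { return t.singleCoreN / t.baseN; }

// Same expression as line 109: element offset of the count-th output tile.
uint32_t StartOffset(const TilingSketch& t, uint32_t count) {
    const uint32_t roundM = RoundM(t);
    assert(roundM > 0);
    return count % roundM * t.baseM * t.N + count / roundM * t.baseN;
}

// Lines 111 and 113: blockLen and dstStride of copyParam, in C0-sized units.
uint16_t BlockLen(const TilingSketch& t, uint32_t elemBytes) {
    return static_cast<uint16_t>(t.baseN * elemBytes / kDefaultC0Size);
}

uint16_t DstStride(const TilingSketch& t, uint32_t elemBytes) {
    return static_cast<uint16_t>((t.N - t.baseN) * elemBytes / kDefaultC0Size);
}
```

If a printed value disagrees with the replica under the real tiling values, the mismatch points to the line worth stepping through.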
Single-Step Debugging
Function
To understand code execution in detail, you can use the `thread step-over` command to execute the code line by line, the `thread step-in` command to step into a function, or the `finish` command to return from the current function to the line after its call site and continue debugging.
Precautions
Build the operator with the `--cce-ignore-always-inline=true` build option.
Example
Example for Using the thread step-over Command
- Set a breakpoint at the position to be debugged and run the program. For details about how to set a breakpoint, see Breakpoint Setting.
  (msdebug) r    // Running
  Process 177943 launched: '${INSTALL_DIR}/projects/mix/matmul_leakyrelu.fatbin' (aarch64)
  [Launch of Kernel matmul_leakyrelu_custom on Device 1]
  Process 177943 stopped
  [Switching to focus on Kernel matmul_leakyrelu_custom, CoreId 44, Type aiv]
  * thread #1, name = 'matmul_leakyrelu', stop reason = breakpoint 1.2
      frame #0: 0x000000000000f01c device_debugdata`_ZN17MatmulLeakyKernelIDhDhffE10CalcOffsetEiiRK11TCubeTilingRiS4_S4_S4__mix_aiv(this=0x0000000000217b60, blockIdx=0, usedCoreNum=2, tiling=0x0000000000217e28, offsetA=0x00000000002175c8, offsetB=0x00000000002175c4, offsetC=0x00000000002175c0, offsetBias=0x00000000002175bc) at matmul_leakyrelu_kernel.cpp:129:15
     126
     127        offsetA = mCoreIndx * tiling.Ka * tiling.singleCoreM;
     128        offsetB = nCoreIndx * tiling.singleCoreN;
  -> 129        offsetC = mCoreIndx * tiling.N * tiling.singleCoreM + nCoreIndx * tiling.singleCoreN;    // Breakpoint position
     130        offsetBias = nCoreIndx * tiling.singleCoreN;
     131    }
     132
  (msdebug)
- Enter the `next` or `n` command to execute the code one line at a time.
  (msdebug) n
  Process 177943 stopped
  [Switching to focus on Kernel matmul_leakyrelu_custom, CoreId 44, Type aiv]
  * thread #1, name = 'matmul_leakyrelu', stop reason = step over    // If the PC location is displayed in the command output, the step was executed successfully.
      frame #0: 0x000000000000f048 device_debugdata`_ZN17MatmulLeakyKernelIDhDhffE10CalcOffsetEiiRK11TCubeTilingRiS4_S4_S4__mix_aiv(this=0x0000000000217b60, blockIdx=0, usedCoreNum=2, tiling=0x0000000000217e28, offsetA=0x00000000002175c8, offsetB=0x00000000002175c4, offsetC=0x00000000002175c0, offsetBias=0x00000000002175bc) at matmul_leakyrelu_kernel.cpp:130:18
     127        offsetA = mCoreIndx * tiling.Ka * tiling.singleCoreM;
     128        offsetB = nCoreIndx * tiling.singleCoreN;
     129        offsetC = mCoreIndx * tiling.N * tiling.singleCoreM + nCoreIndx * tiling.singleCoreN;
  -> 130        offsetBias = nCoreIndx * tiling.singleCoreN;
     131    }
- Run the `ascend info cores` command to view the PC and stop reason of all cores.
  (msdebug) ascend info cores
    CoreId  Type  Device  Stream  Task  Block  PC              stop reason
    12      aic   1       3       0     0      0x12c0c00f03b0  breakpoint 1.2
  * 44      aiv   1       3       0     0      0x12c0c00f8048  step over         // * indicates the core that is currently running.
    45      aiv   1       3       0     0      0x12c0c00f801c  breakpoint 1.2
  [!NOTE]NOTE
  - If the current core stopped because of both single-step debugging and a breakpoint, `breakpoint` is displayed.
  - If the running program freezes, press Ctrl+C to interrupt it. Possible causes of freezing are as follows:
    - The user program contains an infinite loop, which must be fixed in the program.
    - The operator uses synchronization instructions.
- After the debugging is complete, run the `q` command and enter `Y` or `y` to end the debugging.
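When many cores are listed, picking out the focused core (the row marked `*`) can be scripted. The minimal sketch below assumes the whitespace-separated column layout of the `ascend info cores` output shown in the steps above, with the `*` marker separated from the core ID by a space. `FindFocusedCore` and `CoreRow` are hypothetical names, not msDebug APIs.

```cpp
#include <cassert>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical parser (not part of msDebug) for data rows of the
// `ascend info cores` output:
//   [*] CoreId Type Device Stream Task Block PC stop-reason...
// The row of the currently focused core starts with a separate '*'.
struct CoreRow {
    int coreId = -1;
    std::string type;
    std::string pc;
    std::string stopReason;  // may span several words, e.g. "breakpoint 1.2"
};

std::optional<CoreRow> FindFocusedCore(const std::vector<std::string>& rows) {
    for (const std::string& row : rows) {
        std::istringstream in(row);
        std::string marker;
        if (!(in >> marker) || marker != "*") {
            continue;  // not the focused core
        }
        CoreRow core;
        std::string device, stream, task, block;
        if (!(in >> core.coreId >> core.type >> device >> stream >> task >> block >> core.pc)) {
            return std::nullopt;  // malformed row
        }
        std::string word;
        while (in >> word && word != "//") {  // stop at trailing "//" comments
            if (!core.stopReason.empty()) {
                core.stopReason += ' ';
            }
            core.stopReason += word;
        }
        return core;
    }
    return std::nullopt;
}
```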
Example for Using the thread step-in and thread step-out Commands
- Set a breakpoint at the position to be debugged and run the program. For details about how to set a breakpoint, see Breakpoint Setting.
  (msdebug) r    // Running
  Process 180938 launched: '${INSTALL_DIR}/test/mstt/sample/normal_sample/mix/matmul_leakyrelu.fatbin' (aarch64)
  [Launch of Kernel matmul_leakyrelu_custom on Device 1]
  Process 180938 stopped
  [Switching to focus on Kernel matmul_leakyrelu_custom, CoreId 46, Type aiv]
  * thread #1, name = 'matmul_leakyrelu', stop reason = breakpoint 1.1
      frame #0: 0x000000000000e948 device_debugdata`_ZN17MatmulLeakyKernelIDhDhffE7ProcessEPN7AscendC5TPipeE_mix_aiv(this=0x000000000021fb60, pipe=0x000000000021f6a8) at matmul_leakyrelu_kernel.cpp:83:9
     80         while (matmulObj.template Iterate<true>()) {
     81             MatmulCompute();
     82             LeakyReluCompute();
  -> 83             CopyOut(computeRound);
     84             computeRound++;
     85         }
     86         matmulObj.End();
- Enter the `step` or `s` command to step into the function.
  (msdebug) s
  Process 180938 stopped
  [Switching to focus on Kernel matmul_leakyrelu_custom, CoreId 46, Type aiv]
  * thread #1, name = 'matmul_leakyrelu', stop reason = step in
      frame #0: 0x000000000000febc device_debugdata`_ZN17MatmulLeakyKernelIDhDhffE7CopyOutEj_mix_aiv(this=0x000000000021fb60, count=0) at matmul_leakyrelu_kernel.cpp:106:5
     103    template <typename aType, typename bType, typename cType, typename biasType>
     104    __aicore__ inline void MatmulLeakyKernel<aType, bType, cType, biasType>::CopyOut(uint32_t count)
     105    {
  -> 106        reluOutQueue_.DeQue<cType>();
     107        const uint32_t roundM = tiling.singleCoreM / tiling.baseM;
     108        const uint32_t roundN = tiling.singleCoreN / tiling.baseN;
     109        uint32_t startOffset = (count % roundM * tiling.baseM * tiling.N + count / roundM * tiling.baseN);
- Run the `ascend info cores` command to view the PC and stop reason of all cores.
  (msdebug) ascend info cores
    CoreId  Type  Device  Stream  Task  Block  PC              stop reason
    13      aic   1       3       0     0      0x12c0c00f1f88  breakpoint 1.1
  * 46      aiv   1       3       0     0      0x12c0c00f8ebc  step in           // * indicates the core that is currently running.
    47      aiv   1       3       0     0      0x12c0c00f8d3c  breakpoint 1.1
  [!NOTE]NOTE
  If the current core stopped because of both stepping into a function and a breakpoint, `breakpoint` is displayed.
- After debugging the CopyOut function, run the `finish` command to exit CopyOut and return to the caller to continue execution.
  (msdebug) finish
  Process 180938 stopped
  [Switching to focus on Kernel matmul_leakyrelu_custom, CoreId 46, Type aiv]
  * thread #1, name = 'matmul_leakyrelu', stop reason = step out
      frame #0: 0x000000000000e950 device_debugdata`_ZN17MatmulLeakyKernelIDhDhffE7ProcessEPN7AscendC5TPipeE_mix_aiv(this=0x000000000021fb60, pipe=0x000000000021f6a8) at matmul_leakyrelu_kernel.cpp:84:21
     81             MatmulCompute();
     82             LeakyReluCompute();
     83             CopyOut(computeRound);
  -> 84             computeRound++;
     85         }
     86         matmulObj.End();
     87     }
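The PC column of `ascend info cores` and the address on the `frame #0` line differ for the same stop. In the samples in this section, the difference is the same constant: core 44 after `n` (0x12c0c00f8048 vs 0xf048) and core 46 after `s` (0x12c0c00f8ebc vs 0xfebc) both give 0x12c0c00e9000. The sketch below only checks that arithmetic. Reading the difference as the kernel's load base on the device is an inference from these samples, not documented behavior, and `PcDelta` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Difference between the absolute PC shown by `ascend info cores` and the
// logical relative PC shown on the "frame #0" line for the same stop.
// Interpreting it as a load base is an assumption drawn from sample output.
constexpr uint64_t PcDelta(uint64_t absolutePc, uint64_t framePc) {
    return absolutePc - framePc;
}

// Sample pairs copied from the outputs above; both give the same difference.
static_assert(PcDelta(0x12c0c00f8048ULL, 0xf048ULL) == 0x12c0c00e9000ULL, "step-over sample");
static_assert(PcDelta(0x12c0c00f8ebcULL, 0xfebcULL) == 0x12c0c00e9000ULL, "step-in sample");
```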
Interrupting a Running Program
Function
When an operator program freezes, you can interrupt it manually and display the location where it was interrupted.
Precautions
- If the running program freezes, press Ctrl+C to interrupt it. Possible causes of freezing are as follows:
  - The user program contains an infinite loop, which must be fixed in the program.
  - The operator uses synchronization instructions.
- This function applies only to operator programs started in msDebug.
- After the interruption takes effect, displaying debugging information and switching cores are supported. Single-step debugging, register reading, memory and variable printing, and the `continue` command are currently not supported.
Example
-
When the operator program on the host or device hangs, press Ctrl+C to manually interrupt it and display the location where it was interrupted.
(msdebug) r
Process 173221 launched: '${INSTALL_DIR}/projects/mix/matmul_leakyrelu.fatbin' (aarch64)
[Launch of Kernel matmul_leakyrelu_custom on Device 1]
// Enter CTRL+C.
Process 173221 stopped
[Switching to focus on Kernel matmul_leakyrelu_custom, CoreId 35, Type aiv]
* thread #1, name = 'matmul_leakyrelu', stop reason = signal SIGSTOP
frame #0: 0x000000000000ef5c device_debugdata`_ZN17MatmulLeakyKernelIDhDhffE10CalcOffsetEiiRK11TCubeTilingRiS4_S4_S4__mix_aiv(this=<unavailable>, blockIdx=<unavailable>, usedCoreNum=<unavailable>, tiling=<unavailable>, offsetA=<unavailable>, offsetB=<unavailable>, offsetC=<unavailable>, offsetBias=<unavailable>) at matmul_leakyrelu_kernel.cpp:127:5
124 auto mCoreIndx = blockIdx % mSingleBlocks;
125 auto nCoreIndx = blockIdx / mSingleBlocks;
126
-> 127 while(true) {
128 }
129 offsetA = mCoreIndx * tiling.Ka * tiling.singleCoreM;
130 offsetB = nCoreIndx * tiling.singleCoreN;
(msdebug)
-
After debugging is complete, run the q command and enter Y or y to end the debugging session.
Core Switching¶
Function¶
Switch the current core to the specified core. After the switch, the location where the specified core is stopped is displayed automatically.
Example¶
-
Assume that the current core is AIV core 2 and you want to switch to AIV core 3.
(msdebug) ascend aiv 3
[Switching to focus on Kernel matmul_leakyrelu_custom, CoreId 3, Type aiv]
* thread #1, name = 'matmul_leakyrelu', stop reason = breakpoint 1.1
frame #0: 0x000000000000fd3c device_debugdata`_ZN7AscendC13WaitEventImplEt_mix_aiv(flagId=1) at kernel_operator_sync_impl.h:142:5
139
140 __aicore__ inline void WaitEventImpl(uint16_t flagId)
141 {
-> 142 wait_flag_dev(flagId);
143 }
144
145 __aicore__ inline void SetSyncBaseAddrImpl(uint64_t config)
After the switch is complete, query the core information again. The * marker now appears on the line of the new core ID.
-
Assume that the current core is AIV core 3 and you want to switch to AIC core 17.
(msdebug) ascend aic 17
[Switching to focus on Kernel matmul_leakyrelu_custom, CoreId 17, Type aic]
* thread #1, name = 'matmul_leakyrelu', stop reason = breakpoint 1.1
frame #0: 0x0000000000008f88 device_debugdata`_ZN7AscendC7BarrierEv_mix_aic at kfc_comm.h:39
36
37 namespace AscendC {
38 __aicore__ inline void Barrier()
-> 39 {
40 #if defined(__CCE_KT_TEST__) && __CCE_KT_TEST__ == 1
41 __asm__ __volatile__("" ::: "memory");
42 #else
After the switch is complete, query the core information again. The * marker now appears on the line of the new core ID.
Program Status Checking¶
Function¶
After launching an operator with msDebug, you can read the register values on the device at the current breakpoint to check the program status.
Example¶
-
After register read -a is entered, the values of all available registers on the current device are returned.
(msdebug) register read -a
PC = 0x12C0C00F1F88
COND = 0x0
CTRL = 0x100000000003C
GPR0 = 0x12C041200100
GPR1 = 0x146FD9
GPR2 = 0x146FC8
GPR3 = 0x8001000800
GPR4 = 0x80300000100
GPR5 = 0x80000000000
GPR6 = 0x0
GPR7 = 0x300000000
GPR8 = 0x3
GPR9 = 0x1000000
GPR10 = 0xFFF
GPR11 = 0xFC0
GPR12 = 0x0
GPR13 = 0x0
GPR14 = 0x0
GPR15 = 0x11
GPR16 = 0x7FFF
GPR17 = 0x7A0
GPR18 = 0x0
GPR19 = 0x0
GPR20 = 0x0
GPR21 = 0x0
GPR22 = 0x0
GPR23 = 0x0
GPR24 = 0x0
GPR25 = 0x0
GPR26 = 0x0
GPR27 = 0x0
GPR28 = 0x0
GPR29 = 0x146EE8
GPR30 = 0x147640
GPR31 = 0x12C0C00F1ED4
LPCNT = 0x0
STATUS = 0x0
SYS_CNT = 0x774E308602
ICACHE_PRL_ST = 0x0
SAFETY_CRC_EN = 0x0
ST_ATOMIC_CFG = 0x5
CALL_DEPTH_CNT = 0x5
CONDITION_FLAG = 0x1
FFTS_BASE_ADDR = 0xE7FFE044F000
CUBE_EVENT_TABLE = 0x70000000000
FIXP_EVENT_TABLE = 0x0
MTE1_EVENT_TABLE = 0x700000000
MTE2_EVENT_TABLE = 0x0
SCALAR_EVENT_TABLE = 0x0
-
After register read ${variable name} is entered, the value of the specified register on the current device is returned. Separate multiple register names with spaces.
- If the register is available on the current device, its value is returned.
- If the register is not available on the current device, Invalid register name 'variable name' is returned.
Debugging Information Displaying¶
Function¶
Query debugging information about the running operator, such as the device, cores, tasks, streams, and blocks it uses.
Example¶
ascend info devices
Run the ascend info devices command to query information about the device where the operator is running. The line marked with * indicates the target device.
[!NOTE]NOTE
In the MC2 operator scenario, multiple device IDs are displayed.
For details about the command output, see the following table.
Table 1 Information description
| Field | Description |
|---|---|
| Device | Logical ID of the device. |
| Aic_Num | Number of Cube Cores used. |
| Aiv_Num | Number of Vector Cores used. |
| Aic_Mask | 64-bit mask of the Cube Cores actually used. If bit n is 1, Cube Core n is used. |
| Aiv_Mask | 64-bit mask of the Vector Cores actually used. If bit n is 1, Vector Core n is used. |
ascend info cores
Run the following command to query information about the cores where the operator is running. The line marked with * indicates the target core. In the following example, the target core is AIV core 0.
(msdebug) ascend info cores
CoreId Type Device Stream Task Block PC stop reason
16 aic 1 3 0 0 0x12c0c00f1fc0 breakpoint 1.1
* 0 aiv 1 3 0 0 0x12c0c00f8fcc breakpoint 1.1
1 aiv 1 3 0 0 0x12c0c00f8d3c breakpoint 1.1
For details about the command output, see the following table.
Table 2 Information description
| Field | Description |
|---|---|
| CoreId | Core ID of the AIV or AIC, starting from 0. |
| Type | Core type, which can be AIC or AIV. |
| Device | Logical device ID. |
| Stream | ID of the stream to which the current kernel function is delivered. A stream consists of a series of tasks. |
| Task | ID of the task in the current stream. A task is a unit of work delivered to the task scheduler for processing. |
| Block | Block ID of the core. Each core that executes the kernel function is assigned a logical ID, called the block ID. The number of blocks equals the number of cores on which the kernel function runs. |
| PC | Absolute logical address of the program counter (PC) on the current core. |
| Stop Reason | Reason why program execution stopped, such as breakpoint, step in, step over, or Ctrl+C. |
ascend info tasks
Run the ascend info tasks command to query the task information of the operator. The output includes the device ID, stream ID, task ID, and invocation (name of the called kernel function). The line marked with * indicates the target task.
ascend info stream
Run the ascend info stream command to query the stream information of the operator. The output includes the device ID, stream ID, and type (kernel type, which can be AIC or AIV). The line marked with * indicates the target stream.
ascend info blocks
Run the ascend info blocks command to query the block information of the operator. The output includes the device ID, stream ID, task ID, and block ID. The line marked with * indicates the target block.
Add the -d option to also print the source code at which each block is currently stopped:
(msdebug) ascend info blocks -d
Current stop state of all blocks:
[CoreId 16, Block 0]
* thread #1, name = 'matmul_leakyrelu', stop reason = breakpoint 1.1
frame #0: 0x0000000000008fc0 device_debugdata`_ZN7AscendC14KfcMsgGetStateEj_mix_aic(flag=0) at kfc_comm.h:188
185 return static_cast<KFC_Enum>((flag & 0xffff0000) >> KFC_MSG_BYTE_OFFSET);
186 }
187 __aicore__ inline uint32_t KfcMsgGetState(uint32_t flag)
-> 188 {
189 return (flag & 0x00008000);
190 }
191 __aicore__ inline uint32_t KfcMsgMakeFlag(KFC_Enum funID, uint16_t instID)
[* CoreId 0, Block 0]
* thread #1, name = 'matmul_leakyrelu', stop reason = breakpoint 1.1
frame #0: 0x000000000000ffcc device_debugdata`_ZN17MatmulLeakyKernelIDhDhffE7CopyOutEj_mix_aiv(this=0x0000000000167b60, count=0) at matmul_leakyrelu_kernel.cpp:116:1
113 (uint16_t)((tiling.N - tiling.baseN) * sizeof(cType) / DEFAULT_C0_SIZE)};
114 DataCopy(cGlobal[startOffset], reluOutLocal, copyParam);
115 reluOutQueue_.FreeTensor(reluOutLocal);
-> 116 }
117
118 template <typename aType, typename bType, typename cType, typename biasType>
119 __aicore__ inline void MatmulLeakyKernel<aType, bType, cType, biasType>::CalcOffset(int32_t blockIdx,
[CoreId 1, Block 0]
* thread #1, name = 'matmul_leakyrelu', stop reason = breakpoint 1.1
frame #0: 0x000000000000fd3c device_debugdata`_ZN7AscendC13WaitEventImplEt_mix_aiv(flagId=1) at kernel_operator_sync_impl.h:142:5
139
140 __aicore__ inline void WaitEventImpl(uint16_t flagId)
141 {
-> 142 wait_flag_dev(flagId);
143 }
144
145 __aicore__ inline void SetSyncBaseAddrImpl(uint64_t config)
Abnormal Operator Dump File Parsing¶
Function¶
When a hardware issue occurs in the field, reproducing it usually requires repeated stress tests, which slows down troubleshooting. To address this, the system triggers a dump when it detects a potential hardware exception and captures the current state. msDebug parses the dump file of the abnormal operator, so you can collect enough data for fault analysis without running stress tests. This improves hardware exception detection and reduces repeated stress testing.
[!NOTE]NOTE
On Ascend 950 products, only abnormal operator dump file parsing is currently supported. Other msDebug functions are not supported.
Precautions¶
After the acl.json file is configured, the other msDebug functions cannot be used.
Example¶
-
Prepare the acl.json configuration file.
- Project-based operator development (single-operator API calling scenario): Create the acl.json file by referring to "Initialization and Deinitialization" in Application Development Guide (C&C++), and load the file using the aclInit API.
- AI framework operator adaptation (PyTorch framework scenario): Find the acl.json file in the installation directory of torch_npu.
-
Enable dump file generation for abnormal operators by referring to the configuration file example (dump configuration for abnormal operators) in "acl API Reference (C)" > "System Configuration" > "aclInit" in Application Development Guide (C&C++).
- In the acl.json configuration file, set dump_scene to aic_err_detail_dump.
- In the acl.json configuration file, set dump_path to the path for exporting the dump file of the abnormal operator.
-
If the program crashes (for example, memory overflow or segmentation fault), a core file of the abnormal operator is generated. The file name ends with .core.
-
Run the following command with the msDebug tool to load the dump file of the abnormal operator:
msdebug --core output2/extra-info/data-dump/0/xxx.core add.fatbin
msdebug(MindStudio Debugger) is part of MindStudio Operator-dev Tools. The tool provides developers with a mechanism for debugging Ascend kernels running on actual hardware. This enables developers to debug Ascend kernels without being affected by potential changes brought by simulation and emulation environments.
(msdebug) target create "add.fatbin" --core "output2/extra-info/data-dump/0/xxx.core"
Core file '/home/xxx/coredump_test/output2/extra-info/data-dump/0/xxx.core' (aarch64) was loaded.
[Switching to focus on CoreId 26, Type aiv]
[!NOTE]NOTE
To view the call stack, compile with -O2 or -O3 together with -g to generate a kernel.o file that contains debugging information, or generate the ELF file of the fatbin structure.
Cause: When an instruction triggers a hardware exception during operator execution, the hardware usually executes several more instructions before it reports the exception and generates the core file. As a result, the memory and register data in the core file may be inaccurate. However, the value of the PC register is usually corrected.
At the O2/O3 optimization level, functions are inlined by default, so the call stack can still be traced accurately without relying on stack memory data. At the O0 optimization level, functions are forcibly not inlined and the stack memory data is inaccurate, so generally only stack frame 0 is accurate.
-
View the dump file information of the abnormal operator.
msdebug --core output2/extra-info/data-dump/0/xxx.core /home/xxxxx/Ascend/cann/opp/vendors/customize/op_impl/ai_core/tbe/kernel/ascend910b/add_custom/AddCustom_xxxx.o
msdebug(MindStudio Debugger) is part of MindStudio Operator-dev Tools. The tool provides developers with a mechanism for debugging Ascend kernels running on actual hardware. This enables developers to debug Ascend kernels without being affected by potential changes brought by simulation and emulation environments.
(msdebug) target create "/home/xxx/Ascend/cann/opp/vendors/customize/op_impl/ai_core/tbe/kernel/ascend910b/add_custom/AddCustom_xxx.o" --core "output2/extra-info/data-dump/0/xxx.core"
Core file '/home/xxx/output2/extra-info/data-dump/0/xxx.core' (hiipu64) was loaded.
[Switching to focus on CoreId 34, Type aiv]
(msdebug) ascend info summary
CoreId CoreType PC DeviceId ChipType
33 AIV 0x12c0412004c8 0 A2/A3
* 34 AIV 0x12c0412007c0 0 A2/A3
35 AIV 0x12c0412007c0 0 A2/A3
36 AIV 0x12c0412007c0 0 A2/A3
37 AIV 0x12c0412007c0 0 A2/A3
38 AIV 0x12c0412007c0 0 A2/A3
39 AIV 0x12c0412007c0 0 A2/A3
40 AIV 0x12c0412007c0 0 A2/A3
Id DataType MemType Addr Size CoreId CoreType Dim
0 DEVICE_KERNEL_OBJECT GM 0x12c041200000 167872 NA AIV NA
1 STACK GM/DCACHE 0xff000108000(invalid) 32768 33 AIV NA
2 STACK GM/DCACHE 0xff000110000(invalid) 32768 34 AIV NA
3 STACK GM/DCACHE 0xff000118000(invalid) 32768 35 AIV NA
4 STACK GM/DCACHE 0xff000120000(invalid) 32768 36 AIV NA
5 STACK GM/DCACHE 0xff000128000(invalid) 32768 37 AIV NA
6 STACK GM/DCACHE 0xff000130000(invalid) 32768 38 AIV NA
7 STACK GM/DCACHE 0xff000138000(invalid) 32768 39 AIV NA
8 STACK GM/DCACHE 0xff000140000(invalid) 32768 40 AIV NA
9 WORKSPACE_TENSOR GM 0x0 0 NA NA NA
10 TILING_DATA GM/DCACHE 0x12c100000038 16 NA NA NA
11 OUTPUT_TENSOR GM 0x12c0c0024000 32768 NA NA [8, 2048]
12 INPUT_TENSOR GM 0x12c0c0012000 32768 NA NA [8, 2048]
13 INPUT_TENSOR GM 0x12c0c001b000 32768 NA NA [8, 2048]
14 ARGS GM/DCACHE 0x12c100000000 96 NA NA NA
(msdebug) bt
* thread #1, stop reason = VEC_ERROR
* frame #0: 0x000012c0412004c8 AddCustom_xxx.o`::AddCustom_xxx_0(uint8_t *__gm__, uint8_t *__gm__, uint8_t *__gm__, uint8_t *__gm__, uint8_t *__gm__) [inlined] void AscendC::TPipe::ReleaseEventID<(AscendC::HardEvent)5>(this=<unavailable>, id=<unavailable>) at kernel_tpipe_impl.h:454:24
frame #1: 0x000012c0412004c8 AddCustom_xxx.o`::AddCustom_xxx_0(uint8_t *__gm__, uint8_t *__gm__, uint8_t *__gm__, uint8_t *__gm__, uint8_t *__gm__) [inlined] AscendC::TQueBind<(AscendC::TPosition)0, (AscendC::TPosition)9, 2, 0>::AllocBuffer(this=<unavailable>) at kernel_tquebind_impl.h:512:36
frame #2: 0x000012c041200474 AddCustom_xxx.o`::AddCustom_xxx_0(uint8_t *__gm__, uint8_t *__gm__, uint8_t *__gm__, uint8_t *__gm__, uint8_t *__gm__) [inlined] AscendC::LocalTensor<half> AscendC::TQueBind<(AscendC::TPosition)0, (AscendC::TPosition)9, 2, 0>::AllocTensor<half>(this=<unavailable>) at kernel_tquebind_impl.h:78:16
frame #3: 0x000012c041200474 AddCustom_xxx.o`::AddCustom_xxx_0(uint8_t *__gm__, uint8_t *__gm__, uint8_t *__gm__, uint8_t *__gm__, uint8_t *__gm__) [inlined] KernelAdd::CopyIn(this=<unavailable>, progress=<unavailable>) at add_custom.cpp:42:57
frame #4: 0x000012c041200474 AddCustom_xxx.o`::AddCustom_xxx_0(uint8_t *__gm__, uint8_t *__gm__, uint8_t *__gm__, uint8_t *__gm__, uint8_t *__gm__) at add_custom.cpp:33:13
frame #5: 0x000012c04120039c AddCustom_xxx.o`::AddCustom_xxx_0(uint8_t *__gm__, uint8_t *__gm__, uint8_t *__gm__, uint8_t *__gm__, uint8_t *__gm__) [inlined] add_custom_0_tilingkey(x=<unavailable>, y=<unavailable>, z=<unavailable>, workspace=<unavailable>, tiling=<unavailable>) at add_custom.cpp:83:8
frame #6: 0x000012c041200064 AddCustom_xxx.o`::AddCustom_xxx_0(uint8_t *__gm__, uint8_t *__gm__, uint8_t *__gm__, uint8_t *__gm__, uint8_t *__gm__) [inlined] ascendc_auto_gen_add_custom_kernel(x_in__=<unavailable>, y_in__=<unavailable>, z_out_=<unavailable>, workspace=<unavailable>, tiling=<unavailable>) at AddCustom_xxx_3800102_kernel.cpp:43:5
frame #7: 0x000012c04120004c AddCustom_xxx.o`::AddCustom_xxx_0(x_in__=<unavailable>, y_in__=<unavailable>, z_out_=<unavailable>, workspace=<unavailable>, tiling=<unavailable>) at AddCustom_xxx_3800102_kernel.cpp:48:5
-
For details about how to locate hardware exceptions, see Core Switching, Program Status Checking, and Memory and Variable Printing.
-
After debugging is complete, run the q command and enter Y or y to end the debugging session.