MindStudio Debugger User Guide

Overview

MindStudio Debugger (msDebug for short) is an operator debugging tool for Ascend devices. It debugs operator programs running on NPUs and gives operator developers debugging capabilities such as reading the memory and registers of an Ascend device and pausing and resuming a running program. After testing operator functions in a real hardware environment, either by launching the operators directly or by using the msOpST tool, you can decide whether to use msDebug for function debugging based on the test results.

Scenarios

The following operator call scenarios are supported:

Additional Information

msDebug also provides the following extension program. For details, see Table 1 Extension program.

Table 1 Extension program

| Program Name | Description |
| --- | --- |
| msdebug-mi (msDebug Machine Interface) | Provides machine-to-machine interaction interfaces for data parsing. Users do not need to work with it directly. |

Preparations

Environment Setup

  • Install msDebug by referring to MindStudio Debugger Installation Guide.
  • To enable msDebug, install the NPU driver and firmware using either of the following methods (method 1 is recommended for CANN 8.1.RC1 and later, and driver 25.0.RC1 and later):

    • Method 1: Specify the --full option during driver installation, and then run the echo 1 > /proc/debug_switch command as the root user to enable the debugging channel. Then the msDebug tool can be used.

      ./Ascend-hdk-<chip_type>-npu-driver_<version>_linux-<arch>.run --full
      
    • Method 2: Specify the --debug option during driver installation. For details, see "Installing the NPU Driver and Firmware" in CANN Software Installation Guide.

      ./Ascend-hdk-<chip_type>-npu-driver_<version>_linux-<arch>.run --debug
      

Constraints

  • The debugging channel has high permissions, which poses security risks. Exercise caution when using this tool. Using this tool in a production environment is not recommended. If you use this tool, you implicitly accept the risks involved.
  • On a single device, only one msDebug instance can be used for debugging at a time. You are not advised to run other operator programs at the same time.
  • When the program to be debugged calls multiple operators, the msDebug tool can debug only a specified operator.
  • During operator debugging, the overflow/underflow detection function is disabled.

Supported Products

The following products are supported:

  • Atlas A3 training products/Atlas A3 inference products
  • Atlas A2 training products/Atlas A2 inference products

[!NOTE]NOTE

  • For details about Ascend product models, see Ascend Product Models.
  • For details about the supported functions, see the documentation of the corresponding function module.

Precautions

  • You can run the help command to view all the commands supported by msDebug. Commands that are not listed in Command Reference are implemented by the open-source debugger LLDB. Pay attention to related risks when using them. For details about how to use LLDB, see its official documentation.
  • You need to ensure the execution security of executable files or applications.
    • You are advised to restrict the operation permission on executable files or applications to avoid privilege escalation risks.
    • Avoid high-risk operations (such as deleting files, deleting directories, changing passwords, and running privilege escalation commands) to prevent security risks.

Command Reference

Table 1 Command reference

| Command | Command Abbreviation | Description | Example |
| --- | --- | --- | --- |
| breakpoint set -f filename -l linenum | b | Adds a breakpoint. filename indicates the operator implementation code file (*.cpp). linenum indicates the line number in the code file. | b add_custom.cpp:85 |
| run | r | Runs the program. | r |
| continue | c | Continues to run. | c |
| print variable | p | Prints variables. | p zLocal |
| frame variable | var | Displays all local variables in the current scope. | var |
| memory read | x | Reads memory. For the options, see the list below this table. | x -m GM -f float16[] 0x00001240c0037000 -c 2 -s 128 -E 0 |
| ascend info devices | - | Queries device information. | ascend info devices |
| ascend info cores | - | Queries the AI Core information of the operator. | ascend info cores |
| ascend info tasks | - | Queries information about the task where the operator runs. | ascend info tasks |
| ascend info stream | - | Queries information about the stream where the operator runs. | ascend info stream |
| ascend info blocks | - | Queries information about the block where the operator runs. | ascend info blocks (prints information about the running blocks); ascend info blocks -d (prints the code of the running blocks at the current interrupt) |
| ascend aic id | - | Switches the Cube core focused by the debugger. | ascend aic 1 |
| ascend aiv id | - | Switches the vector core focused by the debugger. | ascend aiv 5 |
| "CTRL+C" | - | Manually interrupts the running operator program and displays the interruption location information. | Press Ctrl+C on the keyboard. |
| register read | re r | Reads the value of a register. `-a` reads the values of all registers. `$REG_NAME` reads the value of the register with the specified name. | register read -a; re r $PC |
| thread step-over | next or n | Moves to the next executable line of code in the same call stack. | n |
| thread step-in | step or s | Enters the function for debugging. | s |
| thread step-out | finish | Executes the remaining part of the function and returns to the main program to continue execution. | finish |
| thread backtrace | bt | Displays the code call stack information. | bt |
| target modules add <kernel.o> | image add [kernel.o] | Imports operator debugging information when the PyTorch framework calls operators. | image add xx.o |
| target modules load --file <kernel.o> --slide <address> | image load -f <kernel.o> -s <address> | Loads operator debugging information when the PyTorch framework calls operators, so that the imported debugging information takes effect. | image load -f xx.o -s 0 |
| msdebug --core corefile [kernel.o\|fatbin] | - | Loads the coredump file. The second parameter is optional. It can be a kernel.o operator file compiled with `-g`, or an executable file or dynamic library file containing the operator binary compiled with `-g`. It is used to display source line information in the call stack. | msdebug --core corefile xx.o; msdebug --core corefile |
| ascend info summary | - | Displays information about the coredump file. | ascend info summary |
| help msdebug_command | - | Displays the help information about a tool command. The command output shows the function, syntax, and options of the command. | help run |

Options of the memory read command:

  • -m: specifies the memory location. GM, UB, L0A, L0B, L0C, L1, FB, STACK, DCACHE, and ICACHE are supported. STACK, DCACHE, and ICACHE are used only when the dump file of an abnormal operator is parsed.
  • -s: specifies the number of bytes to be printed in each line.
  • -c: specifies the number of lines to be printed.
  • -f: specifies the type of data to be printed.
  • -E or --offset: specifies the number of elements to skip at the start of the printed data.
  • 0x00001240c0037000: indicates the memory address to be read. Replace it with the actual address.

Examples of the help command output:

The help information about the core switching command is as follows:

(msdebug) help ascend aic
change the id of the focused ascend aicore.
Syntax: ascend aic <id>

The help information about the `ascend info blocks` command is as follows:

(msdebug) help ascend info blocks
show blocks overall info.
Syntax: ascend info blocks
Command Options Usage:
  ascend info blocks [-d]
       -d ( --details )
            Show stopped states for all blocks.

[!NOTE]NOTE

  • Currently, the bt command applies only to the coredump scenario. The call stack information is accurate only when stop\_reason is CUBE\_ERROR, CCU\_ERROR, MTE\_ERROR, VEC\_ERROR, or FIXP\_ERROR.
  • If the function names displayed by the bt command are too long, you can shorten the output by changing the frame-format setting as follows. For details, see formatting.
settings set frame-format "frame #${frame.index}: ${frame.pc}{ ${module.file.basename}{{${frame.no-debug}${function.pc-offset}}}}{ at ${line.file.basename}:${line.number}{:${line.column}}}{${function.is-optimized} [opt]}{${frame.is-artificial} [artificial]}\n"
  • After the run command is executed, run the image add command to import the debugging information. Then, run the image load command for the imported debugging information to take effect.

Tool Usage

Importing Debugging Information

Before debugging an operator, enable the -g -O0 debugging options and recompile the operator so that the operator binary contains debugging information. For details, see Compiling Operators Based on the Sample Project. The operator debugging information is then automatically imported to the msDebug tool.

Starting the Tool

The msDebug tool can be started in either of the following ways.

[!NOTE]NOTE
If the message "Cannot read termcap database; using dumb terminal settings" is displayed, set the TERMINFO environment variable to eliminate it. xx indicates the local TERMINFO path.

export TERMINFO=xx    # Run the infocmp -D command to list candidate paths, and choose the one that matches the current terminal configuration.

  • Load the executable file application.

    1. After the operator is built, the executable file application on the NPU can be obtained.
    2. Use msDebug to load the executable file.

      $ msdebug ./application
      

      [!NOTE]NOTE

      • Perform one-click compilation and running based on the kernel framework of the Ascend C operator to generate the executable file application on the NPU. For details, see "Kernel Launch Operator Development" > "Kernel Launch" in Ascend C Operator Development Guide.
      • If the executable file has other input parameters, pass them as follows:
      msdebug -- ./application --flag1 arg1 --flag2 arg2 ...
      
  • Load the Python script for operator calling.

    1. After plugins of the PyTorch framework are developed, you can directly call Ascend C custom operators from PyTorch through the custom Python script test_ops_custom.py.
    2. Use msDebug to load the Python script.

      $ msdebug python3 test_ops_custom.py
      msdebug(MindStudio Debugger) is part of MindStudio Operator-dev Tools.
      The tool provides developers with a mechanism for debugging Ascend kernels running on actual hardware.
      This enables developers to debug Ascend kernels without being affected by potential changes brought by simulation and emulation environments.
      (msdebug) target create "python3"
      Current executable set to '${INSTALL_DIR}/projects/application' (aarch64).
      (msdebug) settings set -- target.run-args  "test_ops_custom.py"
      (msdebug)
      

      [!NOTE]NOTE
      For details about the single-operator calling scenario through the PyTorch framework, see "OpPlugin in Ascend-developed Plugins" in Ascend Extension for PyTorch Suite and Third-party Library Support List.

Exiting Debugging

Exit the debugger.

(msdebug) q
[localhost add_ascendc_sample]$ 

[!NOTE]NOTE
The debugging channel cannot be disabled independently. To disable the debugging channel, you need to enable the overwrite mode. For details, see the NPU driver and firmware installation documents.

Specifying a Device ID (MC2 Operator Scenario)

When debugging a single-process multi-thread MC2 operator, you can run the ascend device ID command (ID indicates the device ID) to specify the device ID to debug the operator on a specific device. This debugging mode has the following advantages:

  • Higher debugging efficiency: By selecting a specific device, you can use hardware resources more efficiently and accelerate the debugging process.
  • Targeted debugging: You can debug a specific device to detect and resolve performance bottlenecks or compatibility issues related to that device.
  • Issue isolation: If a performance or function issue occurs, you can specify different device IDs to check whether the issue is caused by a specific device, thereby making it easier to locate the issue.

[!NOTE]NOTE

  • If no device ID is specified, only the device whose ID is set first while the program runs is debugged.
  • The HCCL APIs do not support step-by-step debugging. For details about the APIs, see "High-Level APIs" > "HCCL" > "HCCL Kernel APIs" in Ascend C Operator Development API Reference.

(py38) [root@localhost MC2-master]# msdebug /home/xxx/MC2-master/bin/alltoall_custom_aarch64
msdebug(MindStudio Debugger) is part of MindStudio Operator-dev Tools.
The tool provides developers with a mechanism for debugging Ascend kernels running on actual hardware.
This enables developers to debug Ascend kernels without being affected by potential changes brought by simulation and emulation environments.
(msdebug) target create "/home/xxx/MC2-master/bin/alltoall_custom_aarch64"
Current executable set to '/home/xxx/MC2-master/bin/alltoall_custom_aarch64' (aarch64).
(msdebug) b all_to_all_custom_v3.cpp:58
Breakpoint 1: 2 locations.
(msdebug) ascend device 1
(msdebug) run --x1_shape 72,17 --input_tensor_format ND --input_tensor_dtype fp16 --output_shape 72,17 --output_dtype fp16 --output_format ND --n_dev 2 --bin_path feature/aclnn/AllToAllCustom_fp16_ND_fuzz_000010 --loop_cnt 1 --platform 1971 --version 3 --tileM 128 | tee /home/shelltest/MC2-master/feature/aclnn/AllToAllCustom_fp16_ND_fuzz_000010/mc2_memory.log
Process 2625643 launched: '/home/xxx/MC2-master/bin/alltoall_custom_aarch64' (aarch64)
[INFO] rank 0 hcom: xx.xx.xx.xxx%enp189s0f0_60000_0_1747739573633567 stream: 0xaaaac9e14610, context : 0xaaaac9daeda0
[INFO] rank 1 hcom: xx.xx.xx.xxx%enp189s0f0_60000_0_1747739573633567 stream: 0xaaaaca8c8380, context : 0xaaaaca88f280
 before RunGraph : free :29837 M,  total:30196 M, used :358 M, ret :0
 before RunGraph : free :29835 M,  total:30196 M, used :360 M, ret :0
Process 2625643 stopped and restarted: thread 19 received signal: SIGCHLD
[INFO]  M is 72, K is 17, tileM is 128, tileNum is 0, tailM is 36, tailNum is 1, useBufferType is 0
[INFO]  M is 72, K is 17, tileM is 128, tileNum is 0, tailM is 36, tailNum is 1, useBufferType is 0
[Launch of Kernel AllToAllCustomV3_f1974b24a4ace3957d571b2712b3eadf_1000 on Device 1]
[Launch of Kernel AllToAllCustomV3_f1974b24a4ace3957d571b2712b3eadf_1000 on Device 1]
Process 2625643 stopped
[Switching to focus on Kernel AllToAllCustomV3_f1974b24a4ace3957d571b2712b3eadf_1000, CoreId 0, Type aiv]
* thread #1, name = 'alltoall_custom', stop reason = breakpoint 1.2
    frame #0: 0x0000000000004e0c AllToAllCustomV3_f1974b24a4ace3957d571b2712b3eadf.o`all_to_all_custom_v3_1000_tilingkey.vector(aGM="\x8b2d3+\xb\xbe\xb7\xa94\x87\xba;\xb6\xf68\U0000000e9\xc1\xa9", cGM="", workspaceGM="", tilingGM="d") at all_to_all_custom_v3.cpp:58:28
   55       auto &&cfg = tilingData.param;
   56       const uint8_t tileNum = cfg.tileNum;
   57       const uint8_t tailNum = cfg.tailNum;
-> 58       const uint64_t tileM = cfg.tileM;
   59       const uint64_t tailM = cfg.tailM;
   60       const uint64_t M = cfg.M;
   61       const uint64_t K = cfg.K;

Breakpoint Setting

Function

When using msDebug to debug an operator, you can set line breakpoints on the execution program of the operator, that is, set breakpoints at a specific line in the operator code file.

Precautions

  • If an operator implementation file with the same name exists on both the host and kernel, you are advised to use an absolute path to set a breakpoint to ensure that the breakpoint is set on the target file.
  • When a breakpoint is set on a source code file, a warning indicating that the actual location cannot be found may be displayed, as shown below. Once the operator runs, the actual location is found and the breakpoint is set automatically.

    (msdebug) b /home/xx/op_kernel/matmul_leakyrelu_kernel.cpp:24
    Breakpoint 1: no locations (pending on future shared library load).
    WARNING:  Unable to resolve breakpoint to any actual locations.
    (msdebug)
    
  • If the operator code is compiled into a dynamic library and loaded through the operator launch symbol, setting a breakpoint before the run command is executed produces output indicating that the breakpoint location is not found (pending on future shared library load). This is because the dynamic library is loaded only after the program starts. After the run command is executed, the operator debugging information is parsed, and the breakpoint is then updated and set again.

    (msdebug) b matmul_leakyrelu_kernel.cpp:55 
    Breakpoint 1: no locations (pending on future shared library load). 
    WARNING:  Unable to resolve breakpoint to any actual locations. 
    (msdebug) run 
    ... 
    1 location added to breakpoint 1
    ...  
    

Example

Setting a Line Breakpoint

  1. Add a breakpoint at line 114 of the kernel function implementation file matmul\_leakyrelu\_kernel.cpp. If the following information is displayed, the breakpoint is successfully added:

    (msdebug) b matmul_leakyrelu_kernel.cpp:114
    Breakpoint 1: where = device_debugdata`_ZN17MatmulLeakyKernelIDhDhffE7CopyOutEj_mix_aiv + 240 at matmul_leakyrelu_kernel.cpp:114:14, address = 0x000000000000ff88
    

    For details about the command output, see the following table.

    Table 1 Information description

    | Field | Description |
    | --- | --- |
    | device_debugdata | Name of the .o file on the device. |
    | matmul_leakyrelu_kernel.cpp | Name of the kernel function implementation file where the breakpoint is located. |
    | CopyOut | Current function. |
    | 240 | Offset of the breakpoint address relative to the address of the CopyOut function. In this example, the offset of 0xff88 relative to the address of the CopyOut function is 240. |
    | address = 0x000000000000ff88 | Breakpoint address, that is, the logical relative address. |

  2. Run the operator program and wait until the breakpoint is hit. 0x000000000000ff88 indicates the address of the PC where the breakpoint is located.

    (msdebug) run
    Process 165366 launched: '${INSTALL_DIR}/projects/normal_sample/mix/matmul_leakyrelu.fatbin' (aarch64)
    [Launch of Kernel matmul_leakyrelu_custom on Device 1]
    Process 165366 stopped
    [Switching to focus on Kernel matmul_leakyrelu_custom, CoreId 14, Type aiv]
    * thread #1, name = 'matmul_leakyrelu', stop reason = breakpoint 1.1
        frame #0: 0x000000000000ff88 device_debugdata`_ZN17MatmulLeakyKernelIDhDhffE7CopyOutEj_mix_aiv(this=0x000000000019fb60, count=0) at matmul_leakyrelu_kernel.cpp:114:14
       111          (uint16_t)(tiling.baseN * sizeof(cType) / DEFAULT_C0_SIZE),
       112          0,
       113          (uint16_t)((tiling.N - tiling.baseN) * sizeof(cType) / DEFAULT_C0_SIZE)};
    -> 114      DataCopy(cGlobal[startOffset], reluOutLocal, copyParam);
       115      reluOutQueue_.FreeTensor(reluOutLocal);
       116  }
       117
    (msdebug)
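
The fields described in the table in step 1 can also be extracted from the breakpoint line by a script, for example when debugger logs are archived. The following is a minimal Python sketch. The regular expression and the name parse_breakpoint are illustrative assumptions inferred from the single sample line above and are not part of msDebug.

```python
import re

# Sample line copied from the msDebug output in step 1.
LINE = (
    "Breakpoint 1: where = device_debugdata`"
    "_ZN17MatmulLeakyKernelIDhDhffE7CopyOutEj_mix_aiv + 240 "
    "at matmul_leakyrelu_kernel.cpp:114:14, address = 0x000000000000ff88"
)

# Hypothetical pattern, inferred from this one sample line only.
PATTERN = re.compile(
    r"Breakpoint (?P<id>\d+): where = (?P<module>[^`]+)`(?P<symbol>\S+)"
    r" \+ (?P<offset>\d+) at (?P<file>[^:]+):(?P<line>\d+):(?P<column>\d+),"
    r" address = (?P<address>0x[0-9a-fA-F]+)"
)


def parse_breakpoint(text):
    """Return the breakpoint fields as a dict, or None if the line does not match."""
    match = PATTERN.search(text)
    if match is None:
        return None
    info = match.groupdict()
    for key in ("id", "offset", "line", "column"):
        info[key] = int(info[key])
    info["address"] = int(info["address"], 16)
    # The offset is relative to the function, so subtracting it from the
    # breakpoint address gives the address of the function itself.
    info["function_start"] = info["address"] - info["offset"]
    return info


info = parse_breakpoint(LINE)
print(info["module"], info["file"], info["line"])
print(hex(info["function_start"]))
```

With the sample line, the function address works out to 0xff88 minus 240, that is, 0xfe98.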
    

Printing Breakpoints

Run the following command to print the positions and sequence numbers of all breakpoints that have been set.

(msdebug) breakpoint list 
Current breakpoints:
1: file = 'add_custom.cpp', line = 85, exact_match = 0, locations = 1, resolved = 1, hit count = 1
  1.1: where = device_debugdata`::add_custom(uint8_t *__restrict, uint8_t *__restrict, uint8_t *__restrict) + 14348 [inlined] KernelAdd::CopyOut(int) + 1700 at add_custom.cpp:85:9, address = 0x000000000000380c, resolved, hit count = 1 

Deleting Breakpoints

  1. Delete the breakpoint with a specific sequence number.

    (msdebug) breakpoint delete 1
    1 breakpoints deleted; 0 breakpoint locations disabled.
    
  2. Resume the program. Because the breakpoint has been deleted, the program runs to completion.

    (msdebug) c
    Process 165366 resuming
    4096.00 4096.00 4096.00 4096.00 4096.00 4096.00 4096.00 4096.00
    4096.00 4096.00 4096.00 4096.00 4096.00 4096.00 4096.00 4096.00
    4096.00 4096.00 4096.00 4096.00 4096.00 4096.00 4096.00 4096.00
    4096.00 4096.00 4096.00 4096.00 4096.00 4096.00 4096.00 4096.00
    Process 165366 exited with status = 0 (0x00000000)
    (msdebug)
    

Memory and Variable Printing

Function

Based on the variable type and usage, a variable can be stored in a register or in the local memory or global memory. You can determine the storage location by printing the variable address and further view the associated memory content.

Precautions

Currently, the msDebug tool cannot print the value of a template parameter directly by its name. Instead, run the p command on the object to which the template parameter belongs. The template arguments appear in the printed type. For example, COMPUTE\_LENGTH is a template parameter, and this is the pointer to the object to which it belongs. To view the value of COMPUTE\_LENGTH, run the p this command at the position where the parameter is used. In the following example, the last template argument in the output, 32, is the value of COMPUTE\_LENGTH:

   22   template<class ArchTag_, class ElementAccumulator_, class ElementOut_, uint32_t COMPUTE_LENGTH>
   23   struct ReduceAdd {
   24       ReduceAdd(Arch::Resource<ArchTag> &resource)
   25       {
 -> 26            for (uint32_t i = 0; i < BUFFER_NUM; i++) {
   27               inputBuffer[i] = resource.ubBuf.template GetBufferByByte<ElementAccumulator>(bufferOffset);
   28               bufferOffset += COMPUTE_LENGTH * sizeof(ElementAccumulator);
(msdebug) p this
(Catlass::Gemm::Kernel::ReduceAdd<Catlass::Arch::AtlasA2, float, __fp16, 32> *) $0 = 0x00000000001cf838
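
When the printed type is long, it can help to split the template arguments out of the type name. The following is a minimal Python sketch under the assumption that the type string has been copied from the p this output. The helper name split_template_args is hypothetical and is not an msDebug feature.

```python
def split_template_args(type_name):
    """Return the top-level template arguments of a C++ type name."""
    start = type_name.find("<")
    end = type_name.rfind(">")
    if start == -1 or end == -1 or end < start:
        return []
    args, depth, current = [], 0, []
    for ch in type_name[start + 1:end]:
        if ch == "<":
            depth += 1
        elif ch == ">":
            depth -= 1
        # Only commas outside nested angle brackets separate arguments.
        if ch == "," and depth == 0:
            args.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    args.append("".join(current).strip())
    return args


# Type string copied from the p this output above.
printed_type = "Catlass::Gemm::Kernel::ReduceAdd<Catlass::Arch::AtlasA2, float, __fp16, 32> *"
args = split_template_args(printed_type)
print(args[-1])  # COMPUTE_LENGTH is the last template parameter of ReduceAdd
```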

Example

Printing Variables

After a breakpoint is hit, you can run the p variable\_name command to print the value of a specified variable. For example:

(msdebug) p alpha
(float) $0 = 0.00100000005
(msdebug) p tiling
(const TCubeTiling) $1 = {
  usedCoreNum = 2
  M = 1024
  N = 640
  Ka = 256
  ...
}

Printing GlobalTensor

GlobalTensor is used to store the global data of the global memory (external storage).

You can run the following command to print a GlobalTensor variable. The following uses cGlobal as an example. The address_ field specifies the memory address of cGlobal. In this example, the value is 0x000012c045400000.

(msdebug) p cGlobal
(AscendC::GlobalTensor<float>) $0 = {
  AscendC::BaseGlobalTensor<float> = {
    address_ = 0x000012c045400000
    oriAddress_ = 0x000012c045400000
  }
  bufferSize_ = 655360
  shapeInfo_ = {
    shapeDim = '\0'
    originalShapeDim = '\0'
    shape = ([0] = 0, [1] = 0, [2] = 0, [3] = 0, [4] = 0, [5] = 0, [6] = 0, [7] = 0)
    originalShape = ([0] = 0, [1] = 0, [2] = 0, [3] = 0, [4] = 0, [5] = 0, [6] = 0, [7] = 0)
    dataFormat = ND
  }
  cacheMode_ = CACHE_MODE_NORMAL
}

The actual values of GlobalTensor variables are stored in the GM. Run the following command to print the values at 0x000012c045400000 in the GM. The command prints one line (-c 1) of 256 bytes (-s 256) in float32 format (-f float32[]).

(msdebug) x -m GM -f float32[] 0x000012c045400000 -s 256 -c 1
0x12c045400000: {4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096}

[!NOTE]NOTE

  • If you want to print other custom addresses, ensure the validity of the custom addresses. Otherwise, errors may occur during operator running.
  • If you want to print the memory starting from a custom address, you can add an offset based on the address\_ field as the start address. The unit of the offset is byte. After the offset GM memory address is obtained, enter it into the memory printing command.
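
Because the offset is expressed in bytes, an element index must be multiplied by the element size before it is added to address\_. The following is a minimal Python sketch of that arithmetic. The helper names offset_address and memory_read_command are hypothetical, and the generated command uses only the options shown in this guide.

```python
# Element sizes in bytes for the two data formats that appear in this guide.
ELEMENT_SIZE = {"float16": 2, "float32": 4}


def offset_address(base_address, element_index, dtype):
    """Return the byte address of the element at element_index."""
    return base_address + element_index * ELEMENT_SIZE[dtype]


def memory_read_command(address, dtype, bytes_per_line, lines, memory="GM"):
    """Build a memory read command string in the form used in this guide."""
    return f"x -m {memory} -f {dtype}[] {address:#018x} -s {bytes_per_line} -c {lines}"


base = 0x000012c045400000                    # address_ of cGlobal in the example above
addr = offset_address(base, 64, "float32")   # skip the first 64 float32 elements (256 bytes)
print(memory_read_command(addr, "float32", 256, 1))
```

The printed command can then be entered at the (msdebug) prompt. As noted above, make sure the resulting address is valid before reading it.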

Printing LocalTensor

LocalTensor is used to store the data in the local memory (internal storage) of the AI Core.

Run the following command to print the LocalTensor variable. reluOutLocal is used as an example. For the memory address of reluOutLocal, refer to the bufferAddr parameter in the address\_ field. In this example, the address is 0, and the length is 131072.

(msdebug) p reluOutLocal
(AscendC::LocalTensor<float>) $2 = {
  AscendC::BaseLocalTensor<float> = {
    address_ = (dataLen = 131072, bufferAddr = 0, bufferHandle = "", logicPos = '\n')
  }
  shapeInfo_ = {
    shapeDim = '\0'
    originalShapeDim = '\0'
    shape = ([0] = 0, [1] = 1092616192, [2] = 4800, [3] = 1473680, [4] = 0, [5] = 1473888, [6] = 0, [7] = 1471968)
    originalShape = ([0] = 0, [1] = 3222199212, [2] = 4800, [3] = 1, [4] = 0, [5] = 1473376, [6] = 0, [7] = 1473376)
    dataFormat = ND
  }
}

The actual content of the tensor is stored in the UB memory. You can run the following command to print the values at address 0 in the UB memory. The command prints one line (-c 1) of 256 bytes (-s 256) in float32 format (-f float32[]).

(msdebug) x -m UB -f float32[] 0 -s 256 -c 1
0x00000000: {4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096 4096}

[!NOTE]NOTE

  • In this sample, the actual content of the tensor variable is stored in the UB. However, a local tensor may be stored in the UB, L1, L0A, or L0B. You need to determine the storage location based on the code and select the correct memory type for the -m option of the printing command.
  • If you want to print the memory starting from a custom address, you can add an offset to the bufferAddr value in the address\_ field and use the result as the start address. The unit of the offset is byte. After the offset local memory address is obtained, enter it into the memory printing command.
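
The -s and -c options are given in bytes and lines, so the number of values shown depends on the element size of the chosen format. The following is a minimal Python sketch of that relationship. The helper name elements_printed is hypothetical, and the -E option is not modeled.

```python
# Element sizes in bytes for the two data formats that appear in this guide.
ELEMENT_SIZE = {"float16": 2, "float32": 4}


def elements_printed(dtype, bytes_per_line, lines):
    """Return (elements per line, total elements) for given -f, -s, and -c values."""
    per_line = bytes_per_line // ELEMENT_SIZE[dtype]
    return per_line, per_line * lines


# x -m UB -f float32[] 0 -s 256 -c 1 (the LocalTensor example above)
print(elements_printed("float32", 256, 1))
# x -m GM -f float16[] ... -c 2 -s 128 (the example in Command Reference)
print(elements_printed("float16", 128, 2))
```

For the float32 example, 256 bytes divided by 4 bytes per element gives 64 values on the single printed line.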

Printing All Local Variables

Print all local variables in the current scope:

(msdebug) var
(MatmulLeakyKernel<__fp16, __fp16, float, float> *__stack__) this = 0x0000000000167b60
(uint32_t) count = 0
(const uint32_t) roundM = 2
(const uint32_t) roundN = 5
(uint32_t) startOffset = 0
(AscendC::DataCopyParams) copyParam = (blockCount = 256, blockLen = 16, srcStride = 0, dstStride = 64)

Single-Step Debugging

Function

To understand the code execution details, you can run the thread step-over command (next or n) to execute the code line by line, run the thread step-in command (step or s) to enter a function, or run the thread step-out command (finish) to leave the current function and return to the line after the call site.

Precautions

Build the operator with the --cce-ignore-always-inline=true build option.

Example

Example for Using the thread step-over Command

  1. Set a breakpoint at the position to be debugged and run the program. For details about how to set a breakpoint, see Breakpoint Setting.

    (msdebug) r       // Running
    Process 177943 launched: '${INSTALL_DIR}/projects/mix/matmul_leakyrelu.fatbin' (aarch64)
    [Launch of Kernel matmul_leakyrelu_custom on Device 1]
    Process 177943 stopped
    [Switching to focus on Kernel matmul_leakyrelu_custom, CoreId 44, Type aiv]
    * thread #1, name = 'matmul_leakyrelu', stop reason = breakpoint 1.2
        frame #0: 0x000000000000f01c device_debugdata`_ZN17MatmulLeakyKernelIDhDhffE10CalcOffsetEiiRK11TCubeTilingRiS4_S4_S4__mix_aiv(this=0x0000000000217b60, blockIdx=0, usedCoreNum=2, tiling=0x0000000000217e28, offsetA=0x00000000002175c8, offsetB=0x00000000002175c4, offsetC=0x00000000002175c0, offsetBias=0x00000000002175bc) at matmul_leakyrelu_kernel.cpp:129:15
       126
       127      offsetA = mCoreIndx * tiling.Ka * tiling.singleCoreM;
       128      offsetB = nCoreIndx * tiling.singleCoreN;             
    -> 129      offsetC = mCoreIndx * tiling.N * tiling.singleCoreM + nCoreIndx * tiling.singleCoreN;        // Breakpoint position
       130      offsetBias = nCoreIndx * tiling.singleCoreN;
       131  }
       132
    (msdebug)
    
  2. Enter the next or n command for step-by-step execution.

    (msdebug) n
    Process 177943 stopped
    [Switching to focus on Kernel matmul_leakyrelu_custom, CoreId 44, Type aiv]
    * thread #1, name = 'matmul_leakyrelu', stop reason = step over   // If the PC location is displayed in the command output, the step-by-step execution is successful.
        frame #0: 0x000000000000f048 device_debugdata`_ZN17MatmulLeakyKernelIDhDhffE10CalcOffsetEiiRK11TCubeTilingRiS4_S4_S4__mix_aiv(this=0x0000000000217b60, blockIdx=0, usedCoreNum=2, tiling=0x0000000000217e28, offsetA=0x00000000002175c8, offsetB=0x00000000002175c4, offsetC=0x00000000002175c0, offsetBias=0x00000000002175bc) at matmul_leakyrelu_kernel.cpp:130:18
       127      offsetA = mCoreIndx * tiling.Ka * tiling.singleCoreM;
       128      offsetB = nCoreIndx * tiling.singleCoreN;
       129      offsetC = mCoreIndx * tiling.N * tiling.singleCoreM + nCoreIndx * tiling.singleCoreN;
    -> 130      offsetBias = nCoreIndx * tiling.singleCoreN;
       131  }
    
  3. Run the ascend info cores command to view the PC information and stop reason of all cores.

    (msdebug) ascend info cores
      CoreId  Type  Device Stream Task Block         PC               stop reason
       12     aic      1     3     0     0     0x12c0c00f03b0         breakpoint 1.2
    *  44     aiv      1     3     0     0     0x12c0c00f8048         step over               // * indicates the core that is currently running.
       45     aiv      1     3     0     0     0x12c0c00f801c         breakpoint 1.2
    

    [!NOTE]NOTE

    • If the current core is stopped due to both step-by-step debugging and breakpoints, "breakpoint" is displayed.
    • If the running program freezes, you can press "Ctrl+C" to interrupt the program. The possible causes of freezing are as follows:
      • The user program itself has an infinite loop, which needs to be rectified by repairing the program.
      • An operator uses synchronization instructions.
  4. After the debugging is complete, run the q command and enter Y or y to end the debugging.

    (msdebug) q
    Quitting LLDB will kill one or more processes. Do you really want to proceed: [Y/n] y
    

Example for Using the thread step-in and thread step-out Commands

  1. Set a breakpoint at the position to be debugged and run the program. For details about how to set a breakpoint, see Breakpoint Setting.

    (msdebug) r                // Running
    Process 180938 launched: '${INSTALL_DIR}/test/mstt/sample/normal_sample/mix/matmul_leakyrelu.fatbin' (aarch64)
    [Launch of Kernel matmul_leakyrelu_custom on Device 1]
    Process 180938 stopped
    [Switching to focus on Kernel matmul_leakyrelu_custom, CoreId 46, Type aiv]
    * thread #1, name = 'matmul_leakyrelu', stop reason = breakpoint 1.1
        frame #0: 0x000000000000e948 device_debugdata`_ZN17MatmulLeakyKernelIDhDhffE7ProcessEPN7AscendC5TPipeE_mix_aiv(this=0x000000000021fb60, pipe=0x000000000021f6a8) at matmul_leakyrelu_kernel.cpp:83:9
       80       while (matmulObj.template Iterate<true>()) {
       81           MatmulCompute();
       82           LeakyReluCompute();
    -> 83           CopyOut(computeRound);
       84           computeRound++;
       85       }
       86       matmulObj.End();
    
  2. Enter the step or s command to step into the function.

    (msdebug) s
    Process 180938 stopped
    [Switching to focus on Kernel matmul_leakyrelu_custom, CoreId 46, Type aiv]
    * thread #1, name = 'matmul_leakyrelu', stop reason = step in
        frame #0: 0x000000000000febc device_debugdata`_ZN17MatmulLeakyKernelIDhDhffE7CopyOutEj_mix_aiv(this=0x000000000021fb60, count=0) at matmul_leakyrelu_kernel.cpp:106:5
       103  template <typename aType, typename bType, typename cType, typename biasType>
       104  __aicore__ inline void MatmulLeakyKernel<aType, bType, cType, biasType>::CopyOut(uint32_t count)
       105  {
    -> 106      reluOutQueue_.DeQue<cType>();
       107      const uint32_t roundM = tiling.singleCoreM / tiling.baseM;
       108      const uint32_t roundN = tiling.singleCoreN / tiling.baseN;
       109      uint32_t startOffset = (count % roundM * tiling.baseM * tiling.N + count / roundM * tiling.baseN);
    
  3. Run the ascend info cores command to view the PC information and stop reason of all cores.

    (msdebug) ascend info cores
      CoreId  Type  Device Stream Task Block         PC               stop reason
       13     aic      1     3     0     0     0x12c0c00f1f88         breakpoint 1.1
    *  46     aiv      1     3     0     0     0x12c0c00f8ebc         step in          // * indicates the core that is currently running.
       47     aiv      1     3     0     0     0x12c0c00f8d3c         breakpoint 1.1
    

    NOTE
    If a core stops because of both a step operation and a breakpoint, breakpoint is displayed as the stop reason.

  4. After debugging the CopyOut function, run the finish command to step out of CopyOut and return to its caller (Process in this example).

    (msdebug) finish
    Process 180938 stopped
    [Switching to focus on Kernel matmul_leakyrelu_custom, CoreId 46, Type aiv]
    * thread #1, name = 'matmul_leakyrelu', stop reason = step out
        frame #0: 0x000000000000e950 device_debugdata`_ZN17MatmulLeakyKernelIDhDhffE7ProcessEPN7AscendC5TPipeE_mix_aiv(this=0x000000000021fb60, pipe=0x000000000021f6a8) at matmul_leakyrelu_kernel.cpp:84:21
       81           MatmulCompute();
       82           LeakyReluCompute();
       83           CopyOut(computeRound);
    -> 84           computeRound++;
       85       }
       86       matmulObj.End();
       87   }
    

Running Interrupting

Function

When an operator program freezes, you can interrupt it manually. msDebug then displays the location at which the program was interrupted.

Precautions

  • If the running program freezes, press Ctrl+C to interrupt it. Possible causes of the freeze are as follows:

    • The user program contains an infinite loop. Fix the program to remove it.
    • An operator uses synchronization instructions.
  • This function can debug only the operator programs started in msDebug.

  • After the interruption takes effect, only debugging information display and core switching are supported. Single-step debugging, register reading, memory and variable printing, and the continue command are currently not supported.

Example

  1. When the operator program on the host or device hangs, press Ctrl+C to interrupt it manually. The interrupted location is displayed.

    (msdebug) r
    Process 173221 launched: '${INSTALL_DIR}/projects/mix/matmul_leakyrelu.fatbin' (aarch64)
    [Launch of Kernel matmul_leakyrelu_custom on Device 1]
    // Press Ctrl+C.
    Process 173221 stopped
    [Switching to focus on Kernel matmul_leakyrelu_custom, CoreId 35, Type aiv]
    * thread #1, name = 'matmul_leakyrelu', stop reason = signal SIGSTOP
        frame #0: 0x000000000000ef5c device_debugdata`_ZN17MatmulLeakyKernelIDhDhffE10CalcOffsetEiiRK11TCubeTilingRiS4_S4_S4__mix_aiv(this=<unavailable>, blockIdx=<unavailable>, usedCoreNum=<unavailable>, tiling=<unavailable>, offsetA=<unavailable>, offsetB=<unavailable>, offsetC=<unavailable>, offsetBias=<unavailable>) at matmul_leakyrelu_kernel.cpp:127:5
       124      auto mCoreIndx = blockIdx % mSingleBlocks;
       125      auto nCoreIndx = blockIdx / mSingleBlocks;
       126
    -> 127      while(true) {
       128      }
       129      offsetA = mCoreIndx * tiling.Ka * tiling.singleCoreM;
       130      offsetB = nCoreIndx * tiling.singleCoreN;
    (msdebug)
    
  2. After the debugging is complete, run the q command and enter Y or y to end the debugging.

    (msdebug) q
    Quitting LLDB will kill one or more processes. Do you really want to proceed: [Y/n] y
    

Core Switching

Function

Switches the focus from the current core to a specified core. After the switch, the location at which the specified core stopped is displayed automatically.

Example

  • Assume that the current core is AIV core 2 and you want to switch to AIV core 3.

    (msdebug) ascend aiv 3
    [Switching to focus on Kernel matmul_leakyrelu_custom, CoreId 3, Type aiv]
    * thread #1, name = 'matmul_leakyrelu', stop reason = breakpoint 1.1
        frame #0: 0x000000000000fd3c device_debugdata`_ZN7AscendC13WaitEventImplEt_mix_aiv(flagId=1) at kernel_operator_sync_impl.h:142:5
       139
       140  __aicore__ inline void WaitEventImpl(uint16_t flagId)
       141  {
    -> 142      wait_flag_dev(flagId);
       143  }
       144
       145  __aicore__ inline void SetSyncBaseAddrImpl(uint64_t config)
    

    After the switch is complete, query the core information again. The * marker is now on the row of the new core.

    (msdebug) ascend info cores
      CoreId  Type  Device Stream Task Block         PC               stop reason
       17     aic      1     3     0     0     0x12c0c00f1f88         breakpoint 1.1
        2     aiv      1     3     0     0     0x12c0c00f8fbc         breakpoint 1.1
    *   3     aiv      1     3     0     0     0x12c0c00f8d3c         breakpoint 1.1
    
  • Assume that the current core is AIV core 3 and you want to switch to AIC core 17.

    (msdebug) ascend aic 17
    [Switching to focus on Kernel matmul_leakyrelu_custom, CoreId 17, Type aic]
    * thread #1, name = 'matmul_leakyrelu', stop reason = breakpoint 1.1
        frame #0: 0x0000000000008f88 device_debugdata`_ZN7AscendC7BarrierEv_mix_aic at kfc_comm.h:39
       36
       37   namespace AscendC {
       38   __aicore__ inline void Barrier()
    -> 39   {
       40   #if defined(__CCE_KT_TEST__) && __CCE_KT_TEST__ == 1
       41       __asm__ __volatile__("" ::: "memory");
       42   #else
    

    After the switch is complete, query the core information again. The * marker is now on the row of the new core.

    (msdebug) ascend info cores
      CoreId  Type  Device Stream Task Block         PC               stop reason
    *  17     aic      1     3     0     0     0x12c0c00f1f88         breakpoint 1.1
        2     aiv      1     3     0     0     0x12c0c00f8fbc         breakpoint 1.1
        3     aiv      1     3     0     0     0x12c0c00f8d3c         breakpoint 1.1
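
When many cores are stopped, it can help to turn the ascend info cores table into data and generate the switch commands from it. The following is a minimal Python sketch, not part of msDebug: the helper names parse_cores and switch_commands are hypothetical, and the column layout is assumed to match the sample output above.

```python
import re

# Sample text copied from the `ascend info cores` output above.
SAMPLE = """\
  CoreId  Type  Device Stream Task Block         PC               stop reason
*  17     aic      1     3     0     0     0x12c0c00f1f88         breakpoint 1.1
    2     aiv      1     3     0     0     0x12c0c00f8fbc         breakpoint 1.1
    3     aiv      1     3     0     0     0x12c0c00f8d3c         breakpoint 1.1
"""

# One data row: optional "*", core ID, type, four numeric columns,
# a hexadecimal PC, then the stop reason (assumed layout).
ROW = re.compile(
    r"^\s*(?P<current>\*)?\s*(?P<core_id>\d+)\s+(?P<type>aic|aiv)\s+"
    r"(?P<device>\d+)\s+(?P<stream>\d+)\s+(?P<task>\d+)\s+(?P<block>\d+)\s+"
    r"(?P<pc>0x[0-9a-fA-F]+)\s+(?P<stop_reason>.+?)\s*$"
)


def parse_cores(text):
    """Return one dict per data row; the header line does not match ROW."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match is None:
            continue
        rows.append({
            "current": match.group("current") is not None,
            "core_id": int(match.group("core_id")),
            "type": match.group("type"),
            "pc": int(match.group("pc"), 16),
            "stop_reason": match.group("stop_reason"),
        })
    return rows


def switch_commands(rows):
    """Build `ascend <type> <core id>` for every core except the current one."""
    return [f"ascend {row['type']} {row['core_id']}" for row in rows if not row["current"]]


cores = parse_cores(SAMPLE)
print(switch_commands(cores))
```

The generated strings follow the ascend aiv 3 and ascend aic 17 forms used in the examples above and are meant to be typed or pasted at the (msdebug) prompt.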
    

Program Status Checking

Function

After starting an operator in msDebug, you can read the register values on the device at the current breakpoint to check the program status.

Example

  • After register read -a is entered, all available register values on the current device are returned.

    (msdebug) register read -a
                      PC = 0x12C0C00F1F88
                    COND = 0x0
                    CTRL = 0x100000000003C
                    GPR0 = 0x12C041200100
                    GPR1 = 0x146FD9
                    GPR2 = 0x146FC8
                    GPR3 = 0x8001000800
                    GPR4 = 0x80300000100
                    GPR5 = 0x80000000000
                    GPR6 = 0x0
                    GPR7 = 0x300000000
                    GPR8 = 0x3
                    GPR9 = 0x1000000
                   GPR10 = 0xFFF
                   GPR11 = 0xFC0
                   GPR12 = 0x0
                   GPR13 = 0x0
                   GPR14 = 0x0
                   GPR15 = 0x11
                   GPR16 = 0x7FFF
                   GPR17 = 0x7A0
                   GPR18 = 0x0
                   GPR19 = 0x0
                   GPR20 = 0x0
                   GPR21 = 0x0
                   GPR22 = 0x0
                   GPR23 = 0x0
                   GPR24 = 0x0
                   GPR25 = 0x0
                   GPR26 = 0x0
                   GPR27 = 0x0
                   GPR28 = 0x0
                   GPR29 = 0x146EE8
                   GPR30 = 0x147640
                   GPR31 = 0x12C0C00F1ED4
                   LPCNT = 0x0
                  STATUS = 0x0
                 SYS_CNT = 0x774E308602
           ICACHE_PRL_ST = 0x0
           SAFETY_CRC_EN = 0x0
           ST_ATOMIC_CFG = 0x5
          CALL_DEPTH_CNT = 0x5
          CONDITION_FLAG = 0x1
          FFTS_BASE_ADDR = 0xE7FFE044F000
        CUBE_EVENT_TABLE = 0x70000000000
        FIXP_EVENT_TABLE = 0x0
        MTE1_EVENT_TABLE = 0x700000000
        MTE2_EVENT_TABLE = 0x0
      SCALAR_EVENT_TABLE = 0x0
    
  • Enter register read ${register name} to return the value of the specified register on the current device. Separate multiple registers with spaces.

    • If the register name is available on the current device, its value is returned.
    • If the register name is not available on the current device, Invalid register name 'register name' is returned.

    (msdebug) register read $PC $test $GPR30
                      PC = 0x12C0C00F1F88
    Invalid register name 'test'.
                   GPR30 = 0x147640
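
To compare register values across stops, the register read output can be collected into a dictionary. The following is a minimal Python sketch under the assumption that every line has either the NAME = 0xVALUE form or the Invalid register name form shown above. The helper name parse_registers is hypothetical and not part of msDebug.

```python
import re

# Sample text copied from the `register read $PC $test $GPR30` output above.
SAMPLE = """\
                  PC = 0x12C0C00F1F88
Invalid register name 'test'.
               GPR30 = 0x147640
"""

REG_LINE = re.compile(r"^\s*(?P<name>\w+)\s*=\s*(?P<value>0x[0-9A-Fa-f]+)\s*$")
BAD_LINE = re.compile(r"^\s*Invalid register name '(?P<name>[^']*)'\.?\s*$")


def parse_registers(text):
    """Return (values, invalid): register values as ints and rejected names."""
    values = {}
    invalid = []
    for line in text.splitlines():
        reg = REG_LINE.match(line)
        if reg is not None:
            values[reg.group("name")] = int(reg.group("value"), 16)
            continue
        bad = BAD_LINE.match(line)
        if bad is not None:
            invalid.append(bad.group("name"))
    return values, invalid


values, invalid = parse_registers(SAMPLE)
print({name: hex(value) for name, value in values.items()}, invalid)
```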
    

Debugging Information Displaying

Function

Queries information about the devices, cores, tasks, streams, and blocks on which the operator runs.

Example

ascend info devices

Run the following command to query information about the device where the operator is running. The row marked with * is the target device.

(msdebug) ascend info devices
  Device Aic_Num Aiv_Num Aic_Mask Aiv_Mask
*    1      1       2      0x10000     0x3

NOTE
In the MC2 operator scenario, multiple device IDs are displayed.

For details about the command output, see the following table.

Table 1 Information description

| Field | Description |
| --- | --- |
| Device | Logical ID of the device. |
| Aic_Num | Number of Cube Cores in use. |
| Aiv_Num | Number of Vector Cores in use. |
| Aic_Mask | 64-bit mask of the Cube Cores actually in use. If bit n is 1, Cube Core n is in use. |
| Aiv_Mask | 64-bit mask of the Vector Cores actually in use. If bit n is 1, Vector Core n is in use. |
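
The mask fields can be decoded by listing the positions of the set bits. The following is a minimal Python sketch. The helper name cores_from_mask is hypothetical, and bits are assumed to be counted from 0, which is consistent with the sample outputs in this section (Aic_Mask 0x10000 with AIC core 16, and Aiv_Mask 0x3 with AIV cores 0 and 1).

```python
def cores_from_mask(mask):
    """Return the indices of the bits set in a 64-bit core mask, lowest first."""
    if not 0 <= mask < (1 << 64):
        raise ValueError("mask must fit in 64 bits")
    return [bit for bit in range(64) if (mask >> bit) & 1]


# Values taken from the `ascend info devices` sample output above.
aic_cores = cores_from_mask(0x10000)
aiv_cores = cores_from_mask(0x3)
print(aic_cores, aiv_cores)
```

The lengths of the two lists match the Aic_Num (1) and Aiv_Num (2) columns of the same sample row.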

ascend info cores

Run the following command to query information about the cores on which the operator is running. The row marked with * is the target core. In the following example, the target core is AIV core 0.

(msdebug) ascend info cores
  CoreId  Type  Device Stream Task Block         PC               stop reason
   16     aic      1     3     0     0     0x12c0c00f1fc0         breakpoint 1.1
*   0     aiv      1     3     0     0     0x12c0c00f8fcc         breakpoint 1.1
    1     aiv      1     3     0     0     0x12c0c00f8d3c         breakpoint 1.1

For details about the command output, see the following table.

Table 2 Information description

| Field | Description |
| --- | --- |
| CoreId | ID of the AIC or AIV core, starting from 0. |
| Type | Core type, which can be AIC or AIV. |
| Device | Logical ID of the device. |
| Stream | ID of the stream to which the current kernel function is delivered. A stream consists of a series of tasks. |
| Task | ID of the task in the current stream. A task is a unit of work delivered to the task scheduler for processing. |
| Block | Block ID. The kernel function is executed on multiple cores, and each core that executes it is assigned a logical ID, which is the block ID. |
| PC | Logical absolute address of the PC on the current core. |
| stop reason | Reason why program execution stopped, such as breakpoint, step in, step over, or Ctrl+C. |

ascend info tasks

Run the following command to query the task information of the operator. The row marked with * is the target task. The output includes the device ID, stream ID, task ID, and invocation (name of the called kernel function).

(msdebug) ascend info tasks
  Device Stream Task Invocation
*   1       3     0  matmul_leakyrelu_custom

ascend info stream

Run the following command to query the stream information of the operator. The row marked with * is the target stream. The output includes the device ID, stream ID, and type (kernel type, which can be AIC or AIV).

(msdebug) ascend info stream
  Device Stream Type
*   1      3    aiv

ascend info blocks

Run the following command to query the block information of the operator. The row marked with * is the target block. The output includes the device ID, stream ID, task ID, and block ID.

(msdebug) ascend info blocks
  Device Stream Task Block
    1      3     0     0
*   1      3     0     0
    1      3     0     0

Run the following command to print the code location at which each running block is currently stopped:

(msdebug) ascend info blocks -d
Current stop state of all blocks:

[CoreId 16, Block 0]
* thread #1, name = 'matmul_leakyrelu', stop reason = breakpoint 1.1
    frame #0: 0x0000000000008fc0 device_debugdata`_ZN7AscendC14KfcMsgGetStateEj_mix_aic(flag=0) at kfc_comm.h:188
   185      return static_cast<KFC_Enum>((flag & 0xffff0000) >> KFC_MSG_BYTE_OFFSET);
   186  }
   187  __aicore__ inline uint32_t KfcMsgGetState(uint32_t flag)
-> 188  {
   189      return (flag & 0x00008000);
   190  }
   191  __aicore__ inline uint32_t KfcMsgMakeFlag(KFC_Enum funID, uint16_t instID)

[* CoreId 0, Block 0]
* thread #1, name = 'matmul_leakyrelu', stop reason = breakpoint 1.1
    frame #0: 0x000000000000ffcc device_debugdata`_ZN17MatmulLeakyKernelIDhDhffE7CopyOutEj_mix_aiv(this=0x0000000000167b60, count=0) at matmul_leakyrelu_kernel.cpp:116:1
   113          (uint16_t)((tiling.N - tiling.baseN) * sizeof(cType) / DEFAULT_C0_SIZE)};
   114      DataCopy(cGlobal[startOffset], reluOutLocal, copyParam);
   115      reluOutQueue_.FreeTensor(reluOutLocal);
-> 116  }
   117
   118  template <typename aType, typename bType, typename cType, typename biasType>
   119  __aicore__ inline void MatmulLeakyKernel<aType, bType, cType, biasType>::CalcOffset(int32_t blockIdx,

[CoreId 1, Block 0]
* thread #1, name = 'matmul_leakyrelu', stop reason = breakpoint 1.1
    frame #0: 0x000000000000fd3c device_debugdata`_ZN7AscendC13WaitEventImplEt_mix_aiv(flagId=1) at kernel_operator_sync_impl.h:142:5
   139
   140  __aicore__ inline void WaitEventImpl(uint16_t flagId)
   141  {
-> 142      wait_flag_dev(flagId);
   143  }
   144
   145  __aicore__ inline void SetSyncBaseAddrImpl(uint64_t config)
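
In the sample outputs of this section, the PC column of ascend info cores holds an absolute address, while frame #0 in ascend info blocks -d shows a much smaller address. The following Python sketch only checks the arithmetic for the three cores printed above. Interpreting the constant difference as the address at which the kernel code is loaded is an assumption drawn from these samples, not documented msDebug behavior.

```python
# (absolute PC from `ascend info cores`, frame #0 address from `ascend info blocks -d`)
SAMPLES = {
    ("aic", 16): (0x12c0c00f1fc0, 0x8fc0),
    ("aiv", 0): (0x12c0c00f8fcc, 0xffcc),
    ("aiv", 1): (0x12c0c00f8d3c, 0xfd3c),
}

# Difference between the absolute PC and the frame address, per core.
offsets = {core: pc - frame for core, (pc, frame) in SAMPLES.items()}
for core, offset in offsets.items():
    print(core, hex(offset))

# True when every core in the sample shows the same difference.
same_offset = len(set(offsets.values())) == 1
print(same_offset)
```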

Abnormal Operator Dump File Parsing

Function

If a hardware issue occurs on site, repeated stress tests are usually needed to reproduce it, which slows down troubleshooting. To avoid this, the system triggers a dump when it detects a potential hardware issue and captures the current status information. msDebug parses the resulting dump file of the abnormal operator, so you can collect enough data for fault analysis without running a stress test. This improves hardware exception detection and reduces repeated stress testing.

NOTE
On Ascend 950 products, msDebug currently supports only the parsing of dump files of abnormal operators. Other functions are not supported.

Precautions

After the acl.json file is configured, other functions of msDebug cannot be used.

Example

  1. Prepare the acl.json configuration file.

    • Project-based operator development (single-operator API calling scenario): Create the acl.json file by referring to "Initialization and Deinitialization" in Application Development Guide (C&C++) and load the file using the aclInit API.
    • AI framework operator adaptation (PyTorch framework scenario): Search for the acl.json file in the installation directory of torch_npu.
  2. Enable the function of generating dump files for abnormal operators by referring to the configuration file example (dump configuration for abnormal operators) in "acl API Reference (C)" > "System Configuration" > "aclInit" in Application Development Guide (C&C++).

    1. In the acl.json configuration file, set dump_scene to aic_err_detail_dump.
    2. In the acl.json configuration file, set dump_path to the path for exporting the dump file of the abnormal operator.
  3. If the program crashes (for example, memory overflow or segmentation fault), a core file of the abnormal operator is generated. The file name ends with .core.

  4. Run the following command with the msDebug tool to load the dump file of the abnormal operator:

    msdebug --core output2/extra-info/data-dump/0/xxx.core add.fatbin
    msdebug(MindStudio Debugger) is part of MindStudio Operator-dev Tools.
    The tool provides developers with a mechanism for debugging Ascend kernels running on actual hardware.
    This enables developers to debug Ascend kernels without being affected by potential changes brought by simulation and emulation environments.
    (msdebug) target create "add.fatbin" --core "output2/extra-info/data-dump/0/xxx.core"
    Core file '/home/xxx/coredump_test/output2/extra-info/data-dump/0/xxx.core' (aarch64) was loaded.
    [Switching to focus on CoreId 26, Type aiv]
    

    NOTE
    To view the call stack, compile with -O2 or -O3 together with -g to generate a kernel.o file that contains debugging information, or generate an ELF file with the fatbin structure.

    Cause: If an instruction triggers a hardware exception during operator execution, the hardware usually executes several more instructions before it reports the exception and generates the core file. The memory and register data in the core file may therefore be inaccurate, although the value of the PC register is usually correct.

    At the O2/O3 optimization level, functions are inlined by default, so the call stack can be traced accurately without stack memory data. At the O0 optimization level, inlining is disabled, so tracing depends on stack memory data, which may be inaccurate. In that case, generally only stack frame 0 is accurate.

  5. View the dump file information of the abnormal operator.

    msdebug --core output2/extra-info/data-dump/0/xxx.core /home/xxxxx/Ascend/cann/opp/vendors/customize/op_impl/ai_core/tbe/kernel/ascend910b/add_custom/AddCustom_xxxx.o
    
    msdebug(MindStudio Debugger) is part of MindStudio Operator-dev Tools.
    The tool provides developers with a mechanism for debugging Ascend kernels running on actual hardware.
    This enables developers to debug Ascend kernels without being affected by potential changes brought by simulation and emulation environments.
    (msdebug) target create "/home/xxx/Ascend/cann/opp/vendors/customize/op_impl/ai_core/tbe/kernel/ascend910b/add_custom/AddCustom_xxx.o" --core "output2/extra-info/data-dump/0/xxx.core"
    Core file '/home/xxx/output2/extra-info/data-dump/0/xxx.core' (hiipu64) was loaded.
    [Switching to focus on CoreId 34, Type aiv]
    
    (msdebug) ascend info summary
      CoreId  CoreType        PC         DeviceId    ChipType
        33       AIV    0x12c0412004c8       0        A2/A3
     *  34       AIV    0x12c0412007c0       0        A2/A3
        35       AIV    0x12c0412007c0       0        A2/A3
        36       AIV    0x12c0412007c0       0        A2/A3
        37       AIV    0x12c0412007c0       0        A2/A3
        38       AIV    0x12c0412007c0       0        A2/A3
        39       AIV    0x12c0412007c0       0        A2/A3
        40       AIV    0x12c0412007c0       0        A2/A3
    
      Id           DataType                   MemType                     Addr                       Size             CoreId    CoreType    Dim
       0    DEVICE_KERNEL_OBJECT                GM                   0x12c041200000                 167872             NA         AIV        NA
       1            STACK                    GM/DCACHE           0xff000108000(invalid)              32768             33         AIV        NA
       2            STACK                    GM/DCACHE           0xff000110000(invalid)              32768             34         AIV        NA
       3            STACK                    GM/DCACHE           0xff000118000(invalid)              32768             35         AIV        NA
       4            STACK                    GM/DCACHE           0xff000120000(invalid)              32768             36         AIV        NA
       5            STACK                    GM/DCACHE           0xff000128000(invalid)              32768             37         AIV        NA
       6            STACK                    GM/DCACHE           0xff000130000(invalid)              32768             38         AIV        NA
       7            STACK                    GM/DCACHE           0xff000138000(invalid)              32768             39         AIV        NA
       8            STACK                    GM/DCACHE           0xff000140000(invalid)              32768             40         AIV        NA
       9      WORKSPACE_TENSOR                  GM                         0x0                         0               NA          NA        NA
      10         TILING_DATA                 GM/DCACHE               0x12c100000038                   16               NA          NA        NA
      11        OUTPUT_TENSOR                   GM                   0x12c0c0024000                  32768             NA          NA        [8, 2048]
      12        INPUT_TENSOR                    GM                   0x12c0c0012000                  32768             NA          NA        [8, 2048]
      13        INPUT_TENSOR                    GM                   0x12c0c001b000                  32768             NA          NA        [8, 2048]
      14            ARGS                     GM/DCACHE               0x12c100000000                   96               NA          NA        NA
    
    (msdebug) bt
       * thread #1, stop reason = VEC_ERROR
         * frame #0: 0x000012c0412004c8 AddCustom_xxx.o`::AddCustom_xxx_0(uint8_t *__gm__, uint8_t *__gm__, uint8_t *__gm__, uint8_t *__gm__, uint8_t *__gm__) [inlined] void AscendC::TPipe::ReleaseEventID<(AscendC::HardEvent)5>(this=<unavailable>, id=<unavailable>) at kernel_tpipe_impl.h:454:24
           frame #1: 0x000012c0412004c8 AddCustom_xxx.o`::AddCustom_xxx_0(uint8_t *__gm__, uint8_t *__gm__, uint8_t *__gm__, uint8_t *__gm__, uint8_t *__gm__) [inlined] AscendC::TQueBind<(AscendC::TPosition)0, (AscendC::TPosition)9, 2, 0>::AllocBuffer(this=<unavailable>) at kernel_tquebind_impl.h:512:36
           frame #2: 0x000012c041200474 AddCustom_xxx.o`::AddCustom_xxx_0(uint8_t *__gm__, uint8_t *__gm__, uint8_t *__gm__, uint8_t *__gm__, uint8_t *__gm__) [inlined] AscendC::LocalTensor<half> AscendC::TQueBind<(this=<unavailable>)0, (AscendC::TPosition)9, 2, 0>::AllocTensor<half>() at kernel_tquebind_impl.h:78:16
           frame #3: 0x000012c041200474 AddCustom_xxx.o`::AddCustom_xxx_0(uint8_t *__gm__, uint8_t *__gm__, uint8_t *__gm__, uint8_t *__gm__, uint8_t *__gm__) [inlined] KernelAdd::CopyIn(this=<unavailable>, progress=<unavailable>) at add_custom.cpp:42:57
           frame #4: 0x000012c041200474 AddCustom_xxx.o`::AddCustom_xxx_0(uint8_t *__gm__, uint8_t *__gm__, uint8_t *__gm__, uint8_t *__gm__, uint8_t *__gm__) at add_custom.cpp:33:13
           frame #5: 0x000012c04120039c AddCustom_xxx.o`::AddCustom_xxx_0(uint8_t *__gm__, uint8_t *__gm__, uint8_t *__gm__, uint8_t *__gm__, uint8_t *__gm__) [inlined] add_custom_0_tilingkey(x=<unavailable>, y=<unavailable>, z=<unavailable>, workspace=<unavailable>, tiling=<unavailable>) at add_custom.cpp:83:8
           frame #6: 0x000012c041200064 AddCustom_xxx.o`::AddCustom_xxx_0(uint8_t *__gm__, uint8_t *__gm__, uint8_t *__gm__, uint8_t *__gm__, uint8_t *__gm__) [inlined] ascendc_auto_gen_add_custom_kernel(x_in__=<unavailable>, y_in__=<unavailable>, z_out_=<unavailable>, workspace=<unavailable>, tiling=<unavailable>) at AddCustom_xxx_3800102_kernel.cpp:43:5
           frame #7: 0x000012c04120004c AddCustom_xxx.o`::AddCustom_xxx_0(x_in__=<unavailable>, y_in__=<unavailable>, z_out_=<unavailable>, workspace=<unavailable>, tiling=<unavailable>) at AddCustom_xxx_3800102_kernel.cpp:48:5
    
  6. For details about how to locate hardware exceptions, see Core Switching, Program Status Checking, and Memory and Variable Printing.

  7. After the debugging is complete, run the q command and enter Y or y to end the debugging.

    (msdebug) q
    Quitting LLDB will kill one or more processes. Do you really want to proceed: [Y/n] y
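
The configuration and loading steps above (steps 2 to 4) can be sketched in Python as follows. This is an illustration under stated assumptions, not an official script. Placing dump_scene and dump_path under a top-level "dump" object is an assumption about the acl.json layout, so check it against the aclInit configuration example. The helper names and the example.core file name are hypothetical. The demo runs in a temporary directory and only prints the msdebug command without executing it.

```python
import json
import shlex
import tempfile
from pathlib import Path


def write_acl_json(config_path, dump_path):
    """Write an acl.json that enables abnormal-operator dumps.

    Assumption: both keys sit under a top-level "dump" object.
    """
    config = {
        "dump": {
            "dump_scene": "aic_err_detail_dump",
            "dump_path": str(dump_path),
        }
    }
    Path(config_path).write_text(json.dumps(config, indent=4) + "\n", encoding="utf-8")
    return config


def find_core_files(dump_path):
    """Return every file under dump_path whose name ends with .core, sorted."""
    return sorted(p for p in Path(dump_path).rglob("*.core") if p.is_file())


def msdebug_command(core_file, kernel_file):
    """Build the `msdebug --core <core file> <kernel file>` command line."""
    return shlex.join(["msdebug", "--core", str(core_file), str(kernel_file)])


with tempfile.TemporaryDirectory() as tmp:
    dump_dir = Path(tmp) / "output"
    config = write_acl_json(Path(tmp) / "acl.json", dump_dir)

    # Stand-in for a dump written after an operator failure. The sub-path
    # mirrors the example above; the file name is hypothetical.
    fake_core = dump_dir / "extra-info" / "data-dump" / "0" / "example.core"
    fake_core.parent.mkdir(parents=True)
    fake_core.touch()

    found = find_core_files(dump_dir)
    command = msdebug_command(found[0], "add.fatbin")
    print(command)
```

Building the command with shlex.join keeps paths that contain spaces safely quoted when the command is pasted into a shell.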