msServiceProfiler Multi Analyze

Overview¶

msServiceProfiler Multi Analyze parses the profile data collected by msServiceProfiler and analyzes it along multiple dimensions: request level, batch level, and overall service level.

Supported Products

[!NOTE]

For details about Ascend product models, see Ascend Product Models.

| Product Type | Supported (Yes/No) |
| --- | --- |
| Atlas A3 Training Products and Atlas A3 Inference Products | Yes |
| Atlas A2 Training Products and Atlas A2 Inference Products | Yes |
| Atlas 200I/500 A2 Inference Products | Yes |
| Atlas Inference Products | Yes |
| Atlas Training Products | No |

[!NOTE]

For Atlas A2 Training Products/Atlas A2 Inference Products, only the Atlas 800I A2 inference server is supported. For Atlas Inference Products, only the Atlas 300I Duo inference card and the Atlas 800 inference server (model 3000) are supported.

Preparations

Environment Setup

Install msServiceProfiler.

Version Compatibility

msServiceProfiler Multi Analyze depends on the ms_service_profiler tool provided in the Ascend-cann-toolkit package. The minimum supported versions are as follows.

| msServiceProfiler Multi Analyze | CANN | MindIE |
| --- | --- | --- |
| Dependency version | ≥ 8.1.RC1 | ≥ 2.0.RC1 |

Function Description

Function

This tool analyzes serving profile data at the request, batch, and service levels and writes the summarized results to the output directory.

Precautions

None

Syntax

```bash
msserviceprofiler analyze \
    --input-path=/path/to/input \
    [--output-path=/path/to/output/] \
    [--log-level level] \
    [--format format]
```

Parameter Description

| Parameter | Mandatory (Yes/No) | Description |
| --- | --- | --- |
| `--input-path` | Yes | Path to the collected profile data. |
| `--output-path` | No | Directory in which the parsing result files are saved. Defaults to the `output` folder in the current directory. |
| `--log-level` | No | Log level. Options: `debug` (debugging information for issue diagnosis); `info` (normal tool operation information); `warning` (unexpected but non-critical states that do not interrupt execution); `error` (minor errors); `fatal` (major errors); `critical` (critical errors). |
| `--format` | No | Export format of the profile data output files. Options: `json`, `csv`, `db`. |
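As a concrete illustration of the syntax, the following sketch assembles an invocation from the documented options and prints it as a dry run instead of executing it. The function name `build_analyze_cmd` and the directories `./prof_data` and `./analysis_out` are hypothetical placeholders; the accepted `--log-level` and `--format` values are the ones listed in the parameter table.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of msServiceProfiler): checks option values
# against the documented ones and prints the resulting command line (dry run).
# Assumes paths contain no spaces, because the command is built as one string.
build_analyze_cmd() {
  local input_path="$1"
  local output_path="${2:-}"
  local log_level="${3:-}"
  local format="${4:-}"

  # --input-path is the only mandatory parameter.
  if [ -z "$input_path" ]; then
    echo "error: --input-path is mandatory" >&2
    return 1
  fi
  local cmd="msserviceprofiler analyze --input-path=$input_path"

  # Optional parameters are appended only when a value is given.
  if [ -n "$output_path" ]; then
    cmd="$cmd --output-path=$output_path"
  fi
  if [ -n "$log_level" ]; then
    case "$log_level" in
      debug|info|warning|error|fatal|critical) cmd="$cmd --log-level $log_level" ;;
      *) echo "error: unsupported log level: $log_level" >&2; return 1 ;;
    esac
  fi
  if [ -n "$format" ]; then
    case "$format" in
      json|csv|db) cmd="$cmd --format $format" ;;
      *) echo "error: unsupported format: $format" >&2; return 1 ;;
    esac
  fi
  echo "$cmd"
}

# Example with placeholder directories:
build_analyze_cmd ./prof_data ./analysis_out info csv
```

Passing an empty string skips an optional parameter, for example `build_analyze_cmd ./prof_data "" debug`.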

Output File Description

batch_summary.csv

| Field | Description |
| --- | --- |
| Metric | Metric item, including the column header metrics and the row header metrics. |
| Metrics (column header) | - |
| prefill_batch_num | Number of records with `batch_type` = `Prefill` in each batch. |
| decode_batch_num | Number of records with `batch_type` = `Decode` in each batch. |
| prefill_exec_time(ms) | `during_time` of all records with `batch_type` = `Prefill modelExec` in each batch, in ms. This row is omitted if `modelExec` is not present. |
| decode_exec_time(ms) | `during_time` of all records with `batch_type` = `Decode modelExec` in each batch, in ms. This row is omitted if `modelExec` is not present. |
| Metrics (row header) | - |
| Average | Average value. |
| Max | Maximum value. |
| Min | Minimum value. |
| P50 | 50th percentile (median). |
| P90 | 90th percentile. |
| P99 | 99th percentile. |
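The Average, Max, Min, and percentile statistics aggregate one value per batch. The sketch below computes them from one number per line, for example a list of per-batch `prefill_exec_time(ms)` values. It assumes the nearest-rank percentile definition; msServiceProfiler may use a different percentile method (such as linear interpolation), so its P50/P90/P99 can differ slightly on small samples. `summarize` is a hypothetical helper name, and the sample values are made up for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: summary statistics over one numeric value per line
# (for example, one per-batch prefill_exec_time(ms) value per line).
# Percentiles use the nearest-rank method, which is an assumption; the tool
# itself may compute percentiles differently.
summarize() {
  sort -n | awk '
    { v[NR] = $1; sum += $1 }
    # Nearest-rank percentile: the ceil(p * N / 100)-th smallest value.
    function rank(p,    r) {
      r = p * NR / 100
      if (r > int(r)) r = int(r) + 1
      if (r < 1) r = 1
      return v[r]
    }
    END {
      if (NR == 0) exit 1
      # Output order: Average,Max,Min,P50,P90,P99
      printf "%.2f,%s,%s,%s,%s,%s\n", sum / NR, v[NR], v[1], rank(50), rank(90), rank(99)
    }'
}

# Sample values made up for illustration:
printf '%s\n' 12 15 11 30 14 13 16 12 18 40 | summarize   # prints 18.10,40,11,14,30,40
```

With only ten samples, P99 under nearest rank is simply the maximum, which is why P99 and Max coincide in this example.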

request_summary.csv

| Field | Description |
| --- | --- |
| Metric | Metric item, including the column header metrics and the row header metrics. |
| Metrics (column header) | - |
| first_token_latency(ms) | Time to first token (TTFT), in ms. |
| subsequent_token_latency(ms) | Inter-token latency: the average time, in ms, taken to generate each token after the first one. |
| total_time(ms) | Total duration of an HTTP request, in ms. |
| exec_time(ms) | Execution time of `modelExec`, in ms. This row is omitted if `modelExec` is not present. |
| waiting_time(ms) | Request waiting time, in ms. |
| input_token_num | Number of input tokens per request. |
| generated_token_num | Number of output tokens per request. |
| Metrics (row header) | - |
| Average | Average value. |
| Max | Maximum value. |
| Min | Minimum value. |
| P50 | 50th percentile (median). |
| P90 | 90th percentile. |
| P99 | 99th percentile. |
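To read a single value back from a summary file in a script (when the results are exported with `--format csv`), the sketch below looks up one statistic for one metric, for example the P99 of `first_token_latency(ms)` in request_summary.csv. The tables above do not spell out whether metrics appear as rows or as columns, so the sketch assumes only that the first line is a header and the first field of every later line is a label, and it accepts either orientation. It also assumes plain comma-separated fields without quoting. `get_stat` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the value for METRIC and STAT from a summary CSV.
# Assumes a header line, plain comma-separated fields (no quoting), and labels
# in the first field of each data line; works whether metrics are rows or columns.
get_stat() {
  # Usage: get_stat FILE METRIC STAT
  awk -F',' -v metric="$2" -v stat="$3" '
    { sub(/\r$/, "") }                      # tolerate CRLF line endings
    NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
    # Metrics as rows, statistics as column headers.
    $1 == metric && (stat in col) { print $(col[stat]); found = 1; exit }
    # Statistics as rows, metrics as column headers.
    $1 == stat && (metric in col) { print $(col[metric]); found = 1; exit }
    END { if (!found) exit 1 }' "$1"
}

# Example, assuming CSV export into the default output directory:
# get_stat ./output/request_summary.csv 'first_token_latency(ms)' P99
```

The function exits with a non-zero status when the metric or statistic is not found, so it can be used directly in `if` conditions or CI checks.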

service_summary.csv

| Field | Description |
| --- | --- |
| Metric | Metric item, including the column header metrics and the row header metrics. |
| Metrics (column header) | - |
| total_input_token_num | Total number of input tokens. |
| total_generated_token_num | Total number of output tokens. |
| generate_token_speed(token/s) | Output throughput: number of generated tokens per second. |
| generate_all_token_speed(token/s) | Total throughput: number of input and output tokens processed per second. |
| Metrics (row header) | - |
| Value | Value of the metric. |
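The two throughput metrics differ only in their numerator: generated tokens alone versus input plus generated tokens. The sketch below shows that arithmetic under the assumption that both are divided by the same time window in seconds; how msServiceProfiler defines that window is not described here, so the window is an input the caller supplies. `throughput` is a hypothetical helper name, and the example numbers are made up.

```bash
#!/usr/bin/env bash
# Hypothetical helper: relate the token totals to the two throughput metrics.
# ASSUMPTION: both metrics use the same wall-clock window (in seconds); the
# tool's own definition of that window is not documented here.
throughput() {
  # Usage: throughput TOTAL_INPUT_TOKENS TOTAL_GENERATED_TOKENS WINDOW_SECONDS
  awk -v in_tok="$1" -v out_tok="$2" -v secs="$3" 'BEGIN {
    if (secs <= 0) exit 1
    # generate_token_speed counts generated tokens only.
    printf "generate_token_speed=%.2f\n", out_tok / secs
    # generate_all_token_speed counts input and generated tokens together.
    printf "generate_all_token_speed=%.2f\n", (in_tok + out_tok) / secs
  }'
}

# Made-up example: 120000 input and 30000 generated tokens over 60 s.
throughput 120000 30000 60
```

Because the numerator of generate_all_token_speed includes the input tokens, it is always at least as large as generate_token_speed for the same window.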

Mapping Between Collection Domains and Parsed Results

| Parsed Result | Collection Domain |
| --- | --- |
| batch_summary.csv | `"BatchSchedule"` |
| request_summary.csv | `"Request"` |
| service_summary.csv | `"Request; BatchSchedule; ModelExecute"` |