msServiceProfiler Multi Analyze¶
Overview¶
msServiceProfiler Multi Analyze parses the profile data collected by the msServiceProfiler from multiple dimensions, including request-level, batch-level, and overall service-level dimensions.
Supported Products¶
[!NOTE]
For details about Ascend product models, see Ascend Product Models.
| Product Type | Supported (Yes/No) |
|---|---|
| Atlas A3 Training Products and Atlas A3 Inference Products | Yes |
| Atlas A2 Training Products and Atlas A2 Inference Products | Yes |
| Atlas 200I/500 A2 inference products | Yes |
| Atlas Inference Products | Yes |
| Atlas training products | No |
[!NOTE]
For Atlas A2 training products/Atlas A2 inference products, only the Atlas 800I A2 inference server is supported. For Atlas inference products, only the Atlas 300I Duo inference card and Atlas 800 inference server (model 3000) are supported.
Preparations¶
Environment Setup
Install msServiceProfiler.
Version Compatibility
msServiceProfiler Multi Analyze depends on the ms_service_profiler tool provided in Ascend-cann-toolkit.
| msServiceProfiler Multi Analyze | CANN | MindIE |
|---|---|---|
| Dependency Version | ≥ 8.1.RC1 | ≥ MindIE 2.0.RC1 |
Function Description¶
Function¶
This tool can analyze serving profile data across multiple dimensions.
Precautions
None
Syntax¶
```bash
msserviceprofiler analyze
--input-path=/path/to/input
[--output-path=/path/to/output/]
[--log-level level]
[--format format]
```
Parameter Description¶
| Parameter | Mandatory (Yes/No) | Description |
|---|---|---|
| --input-path | Yes | Specifies the path to the profile data. |
| --output-path | No | Specifies the output directory where the parsing result files will be saved. It defaults to the output folder in the current directory. |
| --log-level | No | Sets the log level. The options are as follows: - debug: debug level. Logs debugging information for issue diagnosis.- info: informational level. Logs the normal tool operation information.- warning: warning level. Indicates unexpected but non-critical states that do not interrupt execution.- error: minor error level.- fatal: major error level.- critical: critical error level. |
| --format | No | Sets the export format for the profile data output files. The options are json, csv, and db. |
Output File Description¶
batch_summary.csv
| Field | Description |
|---|---|
| Metric | Metric item, including the column header metrics and the row header metrics. |
| Metrics (column header) | |
| prefill_batch_num | Number of records with batch_type = Prefill in each batch. |
| decode_batch_num | Number of records with batch_type = Decode in each batch. |
| prefill_exec_time(ms) | during_time of all records with batch_type = Prefill modelExec in each batch. If modelExec is not present, this row is omitted. The unit is ms. |
| decode_exec_time(ms) | during_time of all records with batch_type = Decode modelExec in each batch. If modelExec is not present, this row is omitted. The unit is ms. |
| Metrics (row header) | |
| Average | Average value. |
| Max | Maximum value. |
| Min | Minimum value. |
| P50 | 50th percentile. |
| P90 | 90% quantile value. |
| P99 | 99% quantile value. |
request_summary.csv
| Field | Description |
|---|---|
| Metric | Metric item, including the column header metrics and the row header metrics. |
| Metrics (column header) | |
| first_token_latency(ms) | Time to first token (TTFT), in ms. |
| subsequent_token_latency(ms) | Inter-token latency, measuring the average time (in ms) taken to generate each subsequent token after the first one. |
| total_time(ms) | Total duration of an HTTP request, in ms. |
| exec_time(ms) | Execution time of modelExec. If modelExec is not present, this row is omitted. The unit is ms. |
| waiting_time(ms) | Request waiting time, in ms. |
| input_token_num | Number of input tokens per request. |
| generated_token_num | Number of output tokens per request. |
| Metrics (row header) | |
| Average | Average value. |
| Max | Maximum value. |
| Min | Minimum value. |
| P50 | 50th percentile. |
| P90 | 90% quantile value. |
| P99 | 99% quantile value. |
service_summary.csv
| Field | Description |
|---|---|
| Metric | Metric item, including the column header metrics and the row header metrics. |
| Metrics (column header) | |
| total_input_token_num | Total number of input tokens. |
| total_generated_token_num | Total number of output tokens. |
| generate_token_speed(token/s) | Tokens output per second (token/s). |
| generate_all_token_speed(token/s) | Tokens processed per second (token/s) (total number of input and output tokens). |
| Metrics (row header) | |
| Value | Value |
- Mapping between domains and parsed results
| Parsed Result | Collection Domain |
|---|---|
| batch_summary.csv | "BatchSchedule" |
| request_summary.csv | "Request" |
| service_summary.csv | "Request; BatchSchedule; ModelExecute" |