Precision Data Collection Baseline in MindSpore
Time Expansion Baseline for Data Collection in "statistics" Mode (MD5 Disabled)
This baseline is a reference for the time overhead of collecting data in "statistics" mode in the MindSpore framework. It shows how the time required grows for a 38B language model running on eight ranks in each collection mode. The multiplier in parentheses is relative to the time required without the tool.
| Collection Mode | Without Tool (Time Required) | With Tool but Dump Disabled (Time Required) | With Tool and Dump Enabled (Time Required) |
|---|---|---|---|
| L0 | ≈ 340 ms | ≈ 340 ms (no change) | ≈ 1.2 s (3.5x) |
| L1 | ≈ 340 ms | ≈ 0.7–1.2 s (2–4x) | ≈ 3.8 s (11x) |
| mix | ≈ 340 ms | ≈ 0.7–1.2 s (2–4x) | ≈ 5.5 s (16x) |
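The multipliers in the table above are the dump-enabled time divided by the time without the tool. The following Python sketch reproduces that arithmetic from the table's figures. The helper `projected_time_s` is hypothetical and not part of any MindSpore tooling; it assumes the same factor carries over to a different model or cluster, which this baseline does not guarantee.

```python
# Figures copied from the "statistics" mode table (38B model, eight ranks).
WITHOUT_TOOL_S = 0.340
DUMP_ENABLED_S = {"L0": 1.2, "L1": 3.8, "mix": 5.5}


def expansion_factor(mode: str) -> float:
    """Ratio of dump-enabled time to the no-tool time for one collection mode."""
    return DUMP_ENABLED_S[mode] / WITHOUT_TOOL_S


def projected_time_s(mode: str, own_baseline_s: float) -> float:
    """Hypothetical helper: scale your own no-tool time by the table's factor.

    Assumption: the factor measured on the 38B baseline also applies to
    your workload. Treat the result as a rough planning figure only.
    """
    return own_baseline_s * expansion_factor(mode)


if __name__ == "__main__":
    for mode in DUMP_ENABLED_S:
        print(f"{mode}: {expansion_factor(mode):.1f}x")
```

For example, `projected_time_s("L1", 0.5)` gives a rough estimate of the time required with L1 dumping enabled for a workload that takes 0.5 s without the tool.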
Data Size Baseline in "tensor" Mode
This baseline is a reference for the volume of data collected in "tensor" mode in the MindSpore framework. It shows how the data size for a 38B language model varies with collection mode, global batch size, and rank count (single-rank vs. eight-rank).
Data Size Changes
| Collection Mode | global_batch_size | Single-Rank | Eight-Rank |
|---|---|---|---|
| L0 | 1 | 262 GB | 2.1 TB |
| L0 | 2 | 480 GB | 3.8 TB |
| L0 | 3 | 928 GB | 7.4 TB |
| L1 | 1 | 2.1 TB | 17.1 TB |
| L1 | 2 | 2.8 TB | 22.7 TB |
| L1 | 3 | 4.2 TB | 34.3 TB |
| mix | 1 | 2.4 TB | 19.2 TB |
| mix | 2 | 3.3 TB | 26.6 TB |
| mix | 3 | 5.1 TB | 41.4 TB |
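In every row of the table above, the eight-rank size is close to eight times the single-rank size. The Python sketch below checks that ratio from the table's figures and uses it for a rough storage estimate. The function `estimated_size_tb` is hypothetical; it assumes data size scales linearly with rank count, which the table supports only for one and eight ranks.

```python
# (single-rank TB, eight-rank TB) copied from the "tensor" mode table.
# Keys are (collection mode, global_batch_size).
SIZES_TB = {
    ("L0", 1): (0.262, 2.1),
    ("L0", 2): (0.480, 3.8),
    ("L0", 3): (0.928, 7.4),
    ("L1", 1): (2.1, 17.1),
    ("L1", 2): (2.8, 22.7),
    ("L1", 3): (4.2, 34.3),
    ("mix", 1): (2.4, 19.2),
    ("mix", 2): (3.3, 26.6),
    ("mix", 3): (5.1, 41.4),
}


def rank_ratio(mode: str, global_batch_size: int) -> float:
    """Eight-rank size divided by single-rank size for one table row."""
    single_tb, eight_tb = SIZES_TB[(mode, global_batch_size)]
    return eight_tb / single_tb


def estimated_size_tb(mode: str, global_batch_size: int, n_ranks: int) -> float:
    """Hypothetical estimate: single-rank size times the number of ranks.

    Assumption: size grows linearly with rank count. The baseline only
    measures one and eight ranks, so other counts are extrapolations.
    """
    single_tb, _ = SIZES_TB[(mode, global_batch_size)]
    return single_tb * n_ranks


if __name__ == "__main__":
    for (mode, gbs) in SIZES_TB:
        print(f"{mode} gbs={gbs}: eight-rank / single-rank = {rank_ratio(mode, gbs):.2f}")
```

An estimate like this can help confirm that the dump path has enough free space before "tensor" mode collection starts, for instance by comparing it with `shutil.disk_usage(path).free`.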