
Precision Data Collection Baseline in PyTorch

Time Expansion Baseline for Data Collection in "statistics" Mode

This baseline is a reference for the time overhead (time expansion) of collecting data in "statistics" mode in the PyTorch framework. It reports the per-run time measured for a single-layer DeepSeek model on eight cards in each collection mode, together with the expansion factor relative to running without the tool.

| Collection Mode | Without Tool (Time Required) | With Tool but Dump Disabled (Time Required) | With Tool and Dump Enabled (Time Required) | With Tool and MD5 Dump Enabled (Time Required) |
|---|---|---|---|---|
| L0 | ≈ 95.1 ms | ≈ 95.5 ms (no expansion) | ≈ 420.0 ms (4.5x) | ≈ 1011.3 ms (10x) |
| L1 | ≈ 95.1 ms | ≈ 115.8 ms (1.2x) | ≈ 2469.0 ms (26x) | ≈ 8636.0 ms (90x) |
| mix | ≈ 95.1 ms | ≈ 117.8 ms (1.2x) | ≈ 3635.4 ms (38x) | ≈ 10698.3 ms (112x) |
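The expansion factors in parentheses are the measured time divided by the time without the tool. The following is a minimal sketch of that arithmetic using the values from the table above; the dictionary keys and function names are illustrative and not part of any tool. Because the table's factors are rounded approximations, the computed values can differ slightly from the printed ones.

```python
# Times in milliseconds, copied from the table above.
BASELINE_MS = 95.1  # time without the tool

# Hypothetical column keys chosen for this sketch.
TIMES_MS = {
    "L0": {"tool_no_dump": 95.5, "dump": 420.0, "md5_dump": 1011.3},
    "L1": {"tool_no_dump": 115.8, "dump": 2469.0, "md5_dump": 8636.0},
    "mix": {"tool_no_dump": 117.8, "dump": 3635.4, "md5_dump": 10698.3},
}


def expansion_factor(measured_ms, baseline_ms=BASELINE_MS):
    """Return the measured time as a multiple of the baseline time."""
    return measured_ms / baseline_ms


def expansion_table(times_ms=TIMES_MS, baseline_ms=BASELINE_MS):
    """Compute the expansion factor for every mode and column, rounded to one decimal."""
    return {
        mode: {
            column: round(expansion_factor(value, baseline_ms), 1)
            for column, value in columns.items()
        }
        for mode, columns in times_ms.items()
    }


if __name__ == "__main__":
    for mode, factors in expansion_table().items():
        print(mode, factors)
```

The same function can be applied to your own measurements by passing a different `baseline_ms`, which gives a rough way to compare a local run against this baseline.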

Data Size Baseline in "tensor" Mode

This baseline is a reference for the size of the data collected in "tensor" mode in the PyTorch framework. It shows how the collected data size varies for LLAMA2-7B and LLAMA2-13B across collection modes, global_batch_size values, and single-rank versus eight-rank runs.

LLAMA2-7B

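| Collection Mode | global_batch_size | Single-Rank | Eight-Rank |
|---|---|---|---|
| L0 | 1 | 7.8 GB | 63 GB |
| L0 | 2 | 16 GB | 125 GB |
| L0 | 3 | 24 GB | 187 GB |
| L1 | 1 | 300.8 GB | 2.3 TB |
| L1 | 2 | 480 GB | 3.6 TB |
| L1 | 3 | 640 GB | 4.9 TB |
| mix | 1 | 313.6 GB | 2.4 TB |
| mix | 2 | 512 GB | 3.8 TB |
| mix | 3 | 672 GB | 5.1 TB |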

LLAMA2-13B

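| Collection Mode | global_batch_size | Single-Rank | Eight-Rank |
|---|---|---|---|
| L0 | 1 | 13 GB | 97 GB |
| L0 | 2 | 25 GB | 194 GB |
| L0 | 3 | 37 GB | 291 GB |
| L1 | 1 | 440 GB | 3.4 TB |
| L1 | 2 | 720 GB | 5.4 TB |
| L1 | 3 | 960 GB | 7.3 TB |
| mix | 1 | 480 GB | 3.6 TB |
| mix | 2 | 720 GB | 5.6 TB |
| mix | 3 | 1000 GB | 7.7 TB |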
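Since "tensor" mode can produce terabytes of data, it may be worth checking free disk space before starting a collection. The sketch below is one possible pre-flight check, not part of any tool: it uses the eight-rank sizes at global_batch_size 1 from the tables above, assumes 1 TB = 1024 GB, and applies an arbitrary 20% safety margin. The function names and the margin are hypothetical.

```python
import shutil

# Eight-rank sizes at global_batch_size=1, copied from the tables above.
# Values are normalised to GB under the assumption that 1 TB = 1024 GB.
EIGHT_RANK_GB = {
    ("LLAMA2-7B", "L0"): 63.0,
    ("LLAMA2-7B", "L1"): 2.3 * 1024,
    ("LLAMA2-7B", "mix"): 2.4 * 1024,
    ("LLAMA2-13B", "L0"): 97.0,
    ("LLAMA2-13B", "L1"): 3.4 * 1024,
    ("LLAMA2-13B", "mix"): 3.6 * 1024,
}


def has_enough_space(path, required_gb, margin=1.2):
    """Return True if `path` has at least required_gb * margin GB free.

    The 1.2 margin is an arbitrary assumption for this sketch.
    """
    free_gb = shutil.disk_usage(path).free / 1024**3
    return free_gb >= required_gb * margin


def check_dump_path(path, model, level):
    """Compare free space at `path` with the baseline size for a model and mode."""
    return has_enough_space(path, EIGHT_RANK_GB[(model, level)])


if __name__ == "__main__":
    # "." stands in for the directory where collected data would be written.
    print(check_dump_path(".", "LLAMA2-7B", "L0"))
```

The baseline figures only apply to the listed models and batch sizes, so for other workloads they are at best a rough order-of-magnitude guide.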