Precision Data Collection Baseline in PyTorch

Time Expansion Baseline for Data Collection in "statistics" Mode

This baseline is a reference for the time expansion (slowdown) to expect when data is collected in "statistics" mode in the PyTorch framework. It shows the time required by a single-layer DeepSeek model on eight cards under each collection mode. The multiplier in parentheses is the ratio to the time required without the tool.

Collection Mode	Without Tool (Time Required)	With Tool but Dump Disabled (Time Required)	With Tool and Dump Enabled (Time Required)	With Tool and MD5 Dump Enabled (Time Required)
L0	≈ 95.1 ms	≈ 95.5 ms (no expansion)	≈ 420.0 ms (4.5x)	≈ 1011.3 ms (10x)
L1	≈ 95.1 ms	≈ 115.8 ms (1.2x)	≈ 2469.0 ms (26x)	≈ 8636.0 ms (90x)
mix	≈ 95.1 ms	≈ 117.8 ms (1.2x)	≈ 3635.4 ms (38x)	≈ 10698.3 ms (112x)
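The multipliers in the table can be recomputed from the raw timings. The following is a minimal sketch, not part of the tool itself: the dictionary keys (`dump_disabled`, `dump_enabled`, `md5_dump`) and the helper `expansion_factor` are hypothetical names chosen for illustration, and the timings are copied from the table above.

```python
# Time (ms) without the tool, copied from the table above.
BASELINE_MS = 95.1

# Measured times (ms) per collection mode, copied from the table above.
# The column keys are hypothetical labels used only in this sketch.
measured_ms = {
    "L0": {"dump_disabled": 95.5, "dump_enabled": 420.0, "md5_dump": 1011.3},
    "L1": {"dump_disabled": 115.8, "dump_enabled": 2469.0, "md5_dump": 8636.0},
    "mix": {"dump_disabled": 117.8, "dump_enabled": 3635.4, "md5_dump": 10698.3},
}


def expansion_factor(time_ms, baseline_ms=BASELINE_MS):
    """Return how many times slower a run is than the run without the tool."""
    return time_ms / baseline_ms


# Round to one decimal place for display.
factors = {
    mode: {column: round(expansion_factor(t), 1) for column, t in row.items()}
    for mode, row in measured_ms.items()
}

for mode, row in factors.items():
    print(mode, row)
```

The recomputed ratios can differ slightly from the multipliers in the table, which are rounded more coarsely.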

Data Size Baseline in "tensor" Mode

This baseline is a reference for how much data is collected in "tensor" mode in the PyTorch framework. It shows the collected data size for LLAMA2-7B and LLAMA2-13B across collection modes, global batch sizes, and configurations (single-rank vs. eight-rank). In the tables, a blank Collection Mode cell continues the mode of the row above.

LLAMA2-7B

Collection Mode	global_batch_size	Single-Rank	Eight-Rank
L0	1	7.8 GB	63 GB
	2	16 GB	125 GB
	3	24 GB	187 GB
L1	1	300.8 GB	2.3 TB
	2	480 GB	3.6 TB
	3	640 GB	4.9 TB
mix	1	313.6 GB	2.4 TB
	2	512 GB	3.8 TB
	3	672 GB	5.1 TB

LLAMA2-13B

Collection Mode	global_batch_size	Single-Rank	Eight-Rank
L0	1	13 GB	97 GB
	2	25 GB	194 GB
	3	37 GB	291 GB
L1	1	440 GB	3.4 TB
	2	720 GB	5.4 TB
	3	960 GB	7.3 TB
mix	1	480 GB	3.6 TB
	2	720 GB	5.6 TB
	3	1000 GB	7.7 TB
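Because "tensor" mode can produce hundreds of gigabytes per rank, it may be worth checking free disk space before starting a collection. The sketch below uses the single-rank sizes from the tables above as rough estimates. The helper names (`required_bytes`, `has_room_for_dump`), the 20% safety margin, and the reading of GB as 1024^3 bytes are assumptions made for illustration, not part of the tool.

```python
import shutil

# Single-rank data sizes in GB, copied from the tables above.
SINGLE_RANK_GB = {
    "LLAMA2-7B": {
        "L0": {1: 7.8, 2: 16, 3: 24},
        "L1": {1: 300.8, 2: 480, 3: 640},
        "mix": {1: 313.6, 2: 512, 3: 672},
    },
    "LLAMA2-13B": {
        "L0": {1: 13, 2: 25, 3: 37},
        "L1": {1: 440, 2: 720, 3: 960},
        "mix": {1: 480, 2: 720, 3: 1000},
    },
}

GB = 1024 ** 3  # assumption: the tables do not say whether GB is binary or decimal


def required_bytes(model, level, global_batch_size, margin=1.2):
    """Estimate the bytes needed for a single-rank dump, with a safety margin."""
    size_gb = SINGLE_RANK_GB[model][level][global_batch_size]
    return size_gb * GB * margin


def has_room_for_dump(path, model, level, global_batch_size, margin=1.2):
    """Return True if the filesystem holding `path` has enough free space."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_bytes(model, level, global_batch_size, margin)


if __name__ == "__main__":
    # Example: check the current directory for an L1 dump of LLAMA2-7B.
    print(has_room_for_dump(".", "LLAMA2-7B", "L1", 1))
```

These figures are baselines for specific models and batch sizes, so other configurations would need their own estimates.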