Supports W8A8 quantization for Qwen3-VL-32B-Instruct.
Supports automatic tuning driven by quantization-accuracy feedback: it searches for the optimal quantization configuration that meets the specified accuracy requirements.
Supports self-service quantization of multimodal understanding models, including integrating such models into the quantization workflow.
Quick quantization supports multi-card quantization and distributed layer-by-layer quantization, improving the efficiency of large-model quantization.
Supports W8A8 quantization for DeepSeek-V3.2. Quantization can run on a single card with 64 GB of accelerator memory and 100 GB of system memory.
Supports W4A8 quantization for DeepSeek-V3.2-Exp. Quantization can run on a single card with 64 GB of accelerator memory and 100 GB of system memory.
Supports W8A8 quantization for Qwen3-VL-235B-A22B.
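
The accuracy-feedback tuning listed above can be pictured with a small, self-contained Python sketch. It is not this tool's API: the function names, the candidate bit widths, the per-tensor symmetric scheme, and the maximum-absolute-error metric are all assumptions chosen for illustration. The idea is to try candidate configurations from the most aggressive to the least and keep the first one whose measured error stays within the requested tolerance.

```python
def quantize_symmetric(values, bits):
    """Symmetric per-tensor quantization (illustrative, not the tool's scheme).

    Maps floats to signed integers in [-qmax, qmax] and back to floats.
    """
    qmax = 2 ** (bits - 1) - 1  # 127 for 8 bits, 7 for 4 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    dequantized = [q * scale for q in quantized]
    return quantized, dequantized


def max_abs_error(reference, approx):
    """Largest element-wise deviation; stands in for a real accuracy metric."""
    return max(abs(r - a) for r, a in zip(reference, approx))


def search_bit_width(values, tolerance, candidates=(4, 8, 16)):
    """Return the smallest candidate bit width within tolerance, or None."""
    for bits in sorted(candidates):
        _, restored = quantize_symmetric(values, bits)
        if max_abs_error(values, restored) <= tolerance:
            return bits
    return None


# Hypothetical toy weights; real tuning would evaluate a model on a dataset.
weights = [0.82, -1.27, 0.05, 0.4, -0.33]
chosen = search_bit_width(weights, tolerance=0.01)
print(chosen)  # 8: 4-bit error is too large, 8-bit error is at most scale / 2
```

A real search would replace the error metric with task accuracy on a calibration or evaluation set and would explore richer options (per-channel scales, mixed precision per layer) rather than a single bit width.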