
Automatic Tuning Configuration Protocol

Overview

Configuration Protocol Introduction

The automatic tuning configuration protocol is hierarchical. The top level contains two core fields:

  • strategy: Tuning strategy configuration, which defines how quantization configurations are generated and the basic quantization configuration.
  • evaluation: Evaluation service configuration, which defines the model accuracy evaluation method and parameters related to the service-based startup of the quantized model.

Configuration File Location

Customize the tuning configuration file based on your environment. For a reference template, see the example directory.

Basic Configuration Structure

strategy:
  type: <strategy_type>  # Tuning strategy type, such as standing_high or standing_high_with_experience.
  # Different strategies have different configuration fields and semantics. For details, see the corresponding strategy documentation.

evaluation:
  type: service_oriented
  demand:
    expectations:
      # Accuracy expectation list: describes the specific datasets and corresponding accuracy requirements that must be met.
  evaluation:
    type: aisbench
    # For details about the fields, see section evaluation.evaluation.
  inference_engine:
    # Inference engine configuration

Configuration Fields

strategy - Tuning Strategy Configuration

Purpose: Defines the tuning strategy type, core parameters, and basic quantization configuration.

type - Strategy Type

Purpose: Specifies the type of tuning algorithm to execute. Different strategy types correspond to different optimization algorithms.

Type: string

Value: Determined by the currently implemented tuning strategy. Valid values: standing_high, standing_high_with_experience.

Model adapter (by strategy)

When a tuning strategy needs automatic layer sensitivity analysis (to build rollback candidates), the model adapter must implement ModelSlimPipelineInterfaceV1 (PipelineInterface in core/runner/pipeline_interface.py), consistent with the CLI msmodelslim analyze command and Sensitive Layer Analysis. The strategy calls PipelineAnalysisService, which invokes the init_model, handle_dataset, and visit/forward pipeline methods. The strategy does not call load_model upfront.

Additional requirements per strategy:

| Strategy | Model adapter requirement |
| --- | --- |
| standing_high | ModelSlimPipelineInterfaceV1 (always runs automatic sensitivity analysis) |
| standing_high_with_experience | Above + StandingHighWithExperienceInterface (load_model, outlier suppression capability probe) |

See each strategy document and Integrating LLM Models.

Strategy-specific Fields

The configuration fields vary significantly depending on the selected tuning strategy. For details about specific configuration items, see the "YAML Configuration Fields" in the corresponding algorithm document. The currently supported algorithms are standing_high and standing_high_with_experience.

evaluation - Evaluation Service Configuration

Purpose: Defines the configuration for model accuracy evaluation, including the evaluation service type, evaluation tool configuration, and inference engine configuration.

Core Fields

| Field | Purpose | Type | Mandatory (Yes/No) | Description |
| --- | --- | --- | --- | --- |
| type | Specifies the evaluation service type. | string | Yes | The value is fixed to service_oriented (service-oriented evaluation, which launches the model as a service to perform evaluation). |
| demand | Specifies the accuracy requirement configuration. | object | Yes | Defines the specific accuracy requirements for model evaluation, including the target dataset, target accuracy, and tolerance. |
| evaluation | Specifies the evaluation tool configuration. | object | Yes | Defines configuration parameters for the underlying evaluation tool. |
| inference_engine | Specifies the inference engine configuration. | object | Yes | Defines configuration parameters for the inference engine, which is used to launch the quantized model in service-oriented mode. |
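
As a rough illustration of the table above, the following sketch checks that the four mandatory fields are present and that type carries its fixed value. The function name validate_evaluation_section is hypothetical, and the configuration is assumed to be already parsed into a Python dict. This is not the tool's own validation code.

```python
REQUIRED_FIELDS = ("type", "demand", "evaluation", "inference_engine")


def validate_evaluation_section(section):
    """Return a list of problems found in the top-level `evaluation` mapping."""
    problems = [
        f"missing mandatory field: {name}"
        for name in REQUIRED_FIELDS
        if name not in section
    ]
    # The document fixes the service type to "service_oriented".
    if "type" in section and section["type"] != "service_oriented":
        problems.append(f"unsupported type: {section['type']!r}")
    return problems
```

An empty list means that the section has the expected top-level shape. The nested objects still need their own checks.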

type - Evaluation Service Type

Purpose: Specifies the type of the evaluation service.

Type: string

Value: service_oriented (service-oriented evaluation, which launches the model as a service to perform evaluation).

demand - Accuracy Requirement Configuration

Purpose: Defines the accuracy requirements for model evaluation, including the target dataset, target accuracy, and tolerance.

Field Description

| Field | Purpose | Type | Mandatory (Yes/No) | Description |
| --- | --- | --- | --- | --- |
| expectations | Specifies the accuracy expectation list. | list | Yes | A list of defined accuracy requirements. Each element contains the target dataset, target accuracy, and tolerance. At least one element must be included. |

expectations Fields

| Field | Purpose | Type | Mandatory (Yes/No) | Description |
| --- | --- | --- | --- | --- |
| dataset | Specifies the target dataset to be evaluated. | string | Yes | The dataset name, which must match the dataset name specified in evaluation.evaluation.datasets. |
| target | Sets the target accuracy value. | number | Yes | The expected target accuracy, which must be greater than 0. The value can be configured as a number (such as 0.95) or a string (such as "0.95"). |
| tolerance | Sets the accuracy tolerance. | number | Yes | The error range allowed for accuracy evaluation, which must be greater than or equal to 0. The value can be configured as a number (such as 0.95) or a string (such as "0.95"). |

Configuration Example

# Accuracy requirements for a single dataset
demand:
  expectations:
    - dataset: gsm8k
      target: 83  # Target accuracy: 83%
      tolerance: 2  # Tolerance: ±2%

# Accuracy requirements for multiple datasets
demand:
  expectations:
    - dataset: gsm8k
      target: 83  # Target accuracy: 83%
      tolerance: 2  # Tolerance: ±2%
    - dataset: aime25
      target: 85  # Target accuracy: 85%
      tolerance: 1  # Tolerance: ±1%
    - dataset: bfcl-simple
      target: 80  # Target accuracy: 80%
      tolerance: 2  # Tolerance: ±2%

Notes

  • Accuracy metric units: The accuracy format varies by dataset. Some datasets return accuracy as a decimal (0.0 to 1.0, where 0.83 represents 83%), while others return a percentage (0 to 100, where 83 represents 83%). Configure target and tolerance in the same format that the evaluation tool outputs for that dataset.
  • Accuracy numerical types: The target and tolerance values are processed internally by the system using the Decimal data type. In the YAML configuration file, you can write these parameters directly as numbers (such as 0.95) or as strings (such as "0.95"). For scenarios that require strict control over decimal point precision, you are advised to use the string format.
  • Accuracy target guidelines: The accuracy values shown in this document are for reference only. Configure these fields based on the baseline accuracy of your actual floating-point model. A quantized model is generally not expected to exceed the accuracy of the original floating-point model, so you are advised to set the accuracy target equal to or slightly lower than the floating-point accuracy.
  • Multi-dataset support: The framework supports configuring accuracy requirements for multiple datasets simultaneously. You can define distinct target accuracy and error tolerance for each individual dataset.
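
The notes above can be made concrete with a small sketch of how target and tolerance might be compared against a measured accuracy using Decimal. The function names are hypothetical, and the acceptance rule (a result passes when it is no lower than target minus tolerance) is an assumption made for illustration. The document does not spell out the exact comparison that the tool performs.

```python
from decimal import Decimal


def to_decimal(value):
    """Convert a YAML number or string to Decimal via str() to avoid binary float noise."""
    return Decimal(str(value))


def meets_expectation(measured, target, tolerance):
    """Assumed rule: accept when measured >= target - tolerance."""
    target_d = to_decimal(target)
    tolerance_d = to_decimal(tolerance)
    if target_d <= 0:
        raise ValueError("target must be greater than 0")
    if tolerance_d < 0:
        raise ValueError("tolerance must be greater than or equal to 0")
    return to_decimal(measured) >= target_d - tolerance_d
```

For example, meets_expectation(81.5, 83, 2) is true under this rule, while meets_expectation("0.80", "0.83", "0.02") is false. The string form matters because Decimal(0.1) and Decimal("0.1") are not equal: the first carries the binary rounding error of the float.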

evaluation - Evaluation Tool Configuration

Purpose: Defines configuration parameters for the evaluation tool.

Core Fields

| Field | Purpose | Type | Mandatory (Yes/No) | Description |
| --- | --- | --- | --- | --- |
| type | Specifies the evaluation tool type. | string | Yes | The value is fixed to aisbench. |
| precheck | Specifies the pre-check configuration. | list | No | Defines the pre-check configuration executed before formal evaluation. An empty list skips the pre-check. Default value: [] |
| aisbench | Specifies the AISBench configuration. | object | Yes | Detailed configuration parameters for the AISBench evaluation tool, such as timeout, batch_size, and generation_kwargs. |
| datasets | Specifies the dataset configuration. | dict | Yes | Defines the targeted datasets to be evaluated and their configurations. This configuration must include all datasets specified in demand.expectations. |
| host | Specifies the service host address. | string | No | The IP address or hostname used by the evaluation client to connect to the inference server. This value must match the configuration on the inference engine side. Default value: "localhost" |
| port | Specifies the service port. | int | No | The port number used by the evaluation client to connect to the inference server. This value must match the configuration on the inference engine side. Default value: 1234 |
| served_model_name | Specifies the service-oriented model name. | string | No | The model name identifier used by the evaluation client to send requests. This value must match the configuration on the inference engine side. Default value: "served_model_name" |
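
Because host, port, and served_model_name must agree between the evaluation tool and the inference engine, a quick consistency check can catch mismatches before a run. The sketch below is illustrative only: connection_mismatches is a hypothetical helper, and it assumes that the top-level evaluation section has been parsed into a dict and that omitted fields fall back to the defaults listed in this document.

```python
CONNECTION_DEFAULTS = {
    "host": "localhost",
    "port": 1234,
    "served_model_name": "served_model_name",
}


def connection_mismatches(evaluation_section):
    """Return {field: (client_value, server_value)} for fields that disagree."""
    client = evaluation_section.get("evaluation", {})
    server = evaluation_section.get("inference_engine", {})
    mismatches = {}
    for field, default in CONNECTION_DEFAULTS.items():
        client_value = client.get(field, default)
        server_value = server.get(field, default)
        if client_value != server_value:
            mismatches[field] = (client_value, server_value)
    return mismatches
```

An empty result means that the evaluation client and the inference service point at the same address and model name.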

Configuration Example

evaluation:
  type: aisbench
  aisbench:
    binary: ais_bench
    mode: all
    timeout: 7200
    request_rate: 1.0
    retry: 2
    batch_size: 32
    max_out_len: 512
    trust_remote_code: false
    pred_postprocessor: extract_non_reasoning_content
    generation_kwargs:
      temperature: 0.5
      top_k: 10
      top_p: 0.9
      seed: null
      repetition_penalty: 1.03
    model_meta:
      base_name: vllm_api_general_chat
      subdir: vllm_api
      abbr: vllm-api-general-chat
      attr: service
    default_metric_keys:
      - final_accuracy
      - accuracy
      - score
  datasets:
    gsm8k:
      config_name: "gsm8k_gen_0_shot_cot_str"
      mode: all
    aime25:
      config_name: "aime2025_gen_0_shot_chat_prompt"
      mode: all
    bfcl-simple:
      config_name: "BFCL_gen_simple"
      mode: all
  host: localhost
  port: 1234
  served_model_name: served_model_name

aisbench - AISBench Configuration

Purpose: Configures the command-line and evaluation parameters (such as timeout, batch_size and generation_kwargs) for AISBench.

aisbench Fields

The following table describes configuration fields of the AISBench evaluation tool.

| Field | Purpose | Type | Mandatory (Yes/No) | Description |
| --- | --- | --- | --- | --- |
| binary | Specifies the AISBench startup command. | string | No | Fixed value: ais_bench. Default value: "ais_bench" |
| mode | Specifies the evaluation mode. | string | No | The specific evaluation mode to execute. Default value: "all" |
| timeout | Specifies the timeout duration for command execution. | int | No | The timeout duration (in seconds) must be greater than 0. Default value: 7200 (2 hours) |
| cleanup_model_config | Specifies whether to clear the model configuration. | bool | No | Specifies whether to clear the generated model configuration files. Default value: true |
| model_meta | Specifies the model metadata configuration. | object | No | For details about the model metadata configuration, see the following section. Default value: ModelConfigMeta() |
| request_rate | Specifies the default request rate. | float | No | The default request rate must be greater than 0. Default value: 1.0 |
| pred_postprocessor | Specifies the prediction post-processor. | string | No | The name of the prediction post-processor. Default value: "extract_non_reasoning_content" |
| retry | Specifies the maximum number of request retries. | int | No | The number of request retries must be greater than or equal to 0. Default value: 2 |
| batch_size | Specifies the batch size. | int | No | The batch size for data processing must be greater than 0. Default value: 1 |
| max_out_len | Specifies the maximum output length. | int | No | The maximum output length must be greater than 0. Default value: 512 |
| trust_remote_code | Specifies whether to trust remote code. | bool | No | Specifies whether to trust remote code. Default value: false |
| generation_kwargs | Specifies generation parameters for the inference backend. | dict | No | A dictionary containing generation configuration parameters. Default value: {} |
| extra_args | Specifies extra command-line arguments. | list | No | A list of additional command-line arguments to append. Default value: [] |
| log_dir | Specifies the log directory path. | string | No | An empty string indicates that the tool uses the default system path. Default value: "" |

model_meta Fields

The following table describes parameters used to obtain the service-oriented inference backend model configuration.

| Field | Purpose | Type | Mandatory (Yes/No) | Description |
| --- | --- | --- | --- | --- |
| directory | Specifies the model configuration directory path. | string | No | The explicit path to the model configuration directory. An empty string indicates that the tool uses the default system path. Default value: "" |
| subdir | Specifies the subdirectory for the service-oriented model backend configuration. | string | No | The subdirectory name for the service-oriented model backend configuration. Default value: "vllm_api" |
| base_name | Specifies the base name for the service-oriented model backend configuration. | string | No | The base name for the service-oriented model backend configuration. Default value: "vllm_api_general_chat" |
| name_suffix | Specifies the name suffix for the service-oriented model backend configuration. | string | No | The value auto triggers automatic generation. Default value: "auto" |
| abbr | Specifies the model configuration abbreviation. | string | No | The abbreviated name used for the model configuration. Default value: "vllm-api-general-chat" |
| attr | Specifies the model configuration attribute. | string | No | The attribute tag for the model configuration. Default value: "service" |

Note: Most of the preceding parameters correspond directly to AISBench command-line parameters and service-oriented inference backend settings. For comprehensive configuration options, see the AISBench Detailed Parameter Description.
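
To show how the documented defaults and value constraints for the aisbench fields fit together, here is a minimal sketch that merges a user-supplied mapping over the defaults and checks the numeric ranges. The helper name resolve_aisbench is hypothetical, model_meta is left out for brevity, and the real tool may validate differently.

```python
import copy

# Defaults copied from the aisbench field table (model_meta omitted).
AISBENCH_DEFAULTS = {
    "binary": "ais_bench",
    "mode": "all",
    "timeout": 7200,
    "cleanup_model_config": True,
    "request_rate": 1.0,
    "pred_postprocessor": "extract_non_reasoning_content",
    "retry": 2,
    "batch_size": 1,
    "max_out_len": 512,
    "trust_remote_code": False,
    "generation_kwargs": {},
    "extra_args": [],
    "log_dir": "",
}

# Fields documented as "must be greater than 0".
STRICTLY_POSITIVE = ("timeout", "request_rate", "batch_size", "max_out_len")


def resolve_aisbench(user_cfg):
    """Merge user settings over the defaults and enforce the documented ranges."""
    merged = copy.deepcopy(AISBENCH_DEFAULTS)
    merged.update(user_cfg)
    for name in STRICTLY_POSITIVE:
        if merged[name] <= 0:
            raise ValueError(f"{name} must be greater than 0")
    if merged["retry"] < 0:
        raise ValueError("retry must be greater than or equal to 0")
    return merged
```

For instance, resolve_aisbench({"batch_size": 32}) keeps timeout at 7200 and raises batch_size from its default of 1 to 32.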

datasets - Dataset Configuration

Purpose: Configures the datasets to be evaluated and the parameters of each dataset in AISBench. The configuration must contain all datasets that appear in demand.expectations.
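
The requirement that every dataset in demand.expectations also appears in datasets can be checked mechanically. The following sketch assumes that the top-level evaluation section is available as a parsed dict. The function name missing_datasets is hypothetical.

```python
def missing_datasets(evaluation_section):
    """List datasets named in demand.expectations but absent from evaluation.datasets."""
    expectations = evaluation_section["demand"]["expectations"]
    configured = evaluation_section["evaluation"]["datasets"]
    return [item["dataset"] for item in expectations if item["dataset"] not in configured]
```

A non-empty result lists the dataset keys that still need an entry under datasets.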

datasets Fields

This field specifies the mapping between different dataset keys and the dataset configurations in AISBench. The following example lists only three datasets (gsm8k, aime25, and bfcl-simple). For details about more datasets, see Supported Dataset Types in the AISBench documentation.

The following table describes the configuration fields of each dataset.

| Field | Purpose | Type | Mandatory (Yes/No) | Description |
| --- | --- | --- | --- | --- |
| config_name | Specifies the configuration name in AISBench. | string | Yes | This field indicates the configuration name of the dataset in AISBench, which must be a non-empty string. |
| mode | Specifies the evaluation mode for the dataset. | string | No | An empty string enables global mode. Default value: "" |
| request_rate | Specifies the request rate for the dataset. | float | No | A value of 0.0 applies the global default value. The value must be greater than or equal to 0. Default value: 0.0 |
| max_out_len | Specifies the maximum output length for this dataset. | int | No | None: applies the global default value. If this parameter is specified, the value must be greater than 0. Default value: None |
| returns_tool_calls | Specifies whether to return tool calls. | bool | No | None indicates that the field is not written. Default value: None |
| api_chat_type | Specifies the API chat type used by the dataset. | string | No | This field must be consistent with the API or request format required by the corresponding AISBench dataset configuration. Default value: "VLLMCustomAPIChat" |
| extra_args | Specifies additional command-line arguments for the dataset. | list | No | A list of additional command-line arguments to append. Default value: [] |
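
The fallback behavior described for mode, request_rate, and max_out_len (an empty string, 0.0, or None defers to the global aisbench value) can be sketched as follows. The function name effective_dataset_settings is hypothetical, and the sketch only mirrors the rules stated in the table. It is not the tool's implementation.

```python
def effective_dataset_settings(global_cfg, dataset_cfg):
    """Resolve per-dataset values, deferring to the global aisbench settings."""
    mode = dataset_cfg.get("mode", "")
    request_rate = dataset_cfg.get("request_rate", 0.0)
    max_out_len = dataset_cfg.get("max_out_len")
    return {
        # "" means "use the global mode".
        "mode": mode if mode != "" else global_cfg["mode"],
        # 0.0 means "use the global request rate".
        "request_rate": request_rate if request_rate != 0.0 else global_cfg["request_rate"],
        # None means "use the global maximum output length".
        "max_out_len": max_out_len if max_out_len is not None else global_cfg["max_out_len"],
    }
```

With global values of mode "all", request_rate 1.0, and max_out_len 512, a dataset entry that sets only config_name resolves to exactly those global values.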

(Optional) precheck - Pre-check Configuration

Purpose: Defines the pre-check configuration before formal evaluation, which is used to pre-verify the quantized model before each model evaluation iteration.

Type: list

Note: The precheck field contains a list where each element is a pre-check item. Each item includes a type field that specifies the pre-check type. If this field is configured and is not an empty list, the system executes a pre-check before starting formal evaluation.

Supported pre-check type: expected_answer

expected_answer - Expected Answer Verification

Purpose: Verifies whether the model output contains the expected answer content.

Field Description

| Field | Purpose | Type | Mandatory (Yes/No) | Description |
| --- | --- | --- | --- | --- |
| type | Specifies the pre-check type. | string | Yes | Fixed value: expected_answer. |
| test_cases | Specifies the test case list. | list | No | A list of test cases, each a key-value pair of question and expected answer. If omitted, a single default test case is used. Default value: [{"What is 2+2?": "4"}] |
| max_tokens | Specifies the maximum number of tokens. | int | No | The value must be greater than 0. Default value: 512 |
| timeout | Specifies the timeout duration. | float | No | The timeout duration in seconds, which must be greater than 0. Default value: 60.0 |

test_cases Field

The test_cases field is a list in which each item is a key-value pair, where the key is the question and the value is the expected answer:

test_cases:
  - "What is 2+2?": ["4", "four"] # Required content in the response: "4" or "four"
  - "What is the capital of China?": "Beijing" # Required content in the response: "Beijing"

Format Description

  • Key (question): a string that contains the test message.
  • Value (answer): a string or a list of strings.
      • String: A string value like "4" indicates that the expected response must contain "4".
      • List of strings: A list like ["4", "four"] indicates that the expected response must contain either "4" or "four".
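
The matching rule described above can be sketched as a plain substring test. The names answer_matches and run_test_cases are hypothetical, ask stands in for whatever sends the question to the served model, and case-sensitive matching is an assumption, since the document does not say how case is handled.

```python
def answer_matches(response, expected):
    """True if the response contains the expected string, or any one of a list of them."""
    candidates = [expected] if isinstance(expected, str) else list(expected)
    return any(candidate in response for candidate in candidates)


def run_test_cases(test_cases, ask):
    """Run every question through `ask` (question -> response text) and check the answers."""
    for case in test_cases:
        for question, expected in case.items():
            if not answer_matches(ask(question), expected):
                return False
    return True
```

With the two test cases shown earlier, a model that replies "The answer is four." and "Beijing is the capital." passes both checks.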

Configuration Example

precheck:
  - type: expected_answer
    test_cases:
      - "What is 2+2?": ["4", "four"]
      - "What is the capital of China?": "Beijing"
    max_tokens: 1024
    timeout: 60.0

Notes:

  • The system executes the pre-check function after starting the service and before evaluating the model in each iteration. The system runs all configured pre-check rules in sequence. If any pre-check fails, the system skips the formal evaluation for the current iteration, returns zeroed-out dataset results, and proceeds directly to the next iteration.
  • Pre-checks quickly identify obvious issues to prevent wasting time on a full evaluation. If all pre-checks pass, the system proceeds to the formal accuracy evaluation.
  • English Q&A support only: The pre-check function currently supports only English Q&A. You must provide test messages and expected answers in English.
  • If you omit the precheck field or configure it as an empty list, the system skips the pre-check phase and proceeds directly to the formal evaluation.
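
The control flow in these notes can be summarized in a short sketch. All names here are hypothetical: prechecks is a list of zero-argument callables that return True on success, run_evaluation performs the formal evaluation, and representing zeroed-out results as a 0.0 score per dataset is an assumption about the result format.

```python
def evaluate_iteration(dataset_names, prechecks, run_evaluation):
    """Run pre-checks in order; skip formal evaluation if any of them fails."""
    for check in prechecks:
        if not check():
            # A failed pre-check short-circuits the iteration with zeroed results.
            return {name: 0.0 for name in dataset_names}
    # No pre-checks configured, or all passed: proceed to formal evaluation.
    return run_evaluation()
```

An empty prechecks list goes straight to run_evaluation, which matches the behavior described for an omitted or empty precheck field.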

inference_engine - Inference Engine Configuration

Purpose: Defines the configuration parameters of the inference engine, which are used to start the quantized model as a service.

Field Description

| Field | Purpose | Type | Mandatory (Yes/No) | Description |
| --- | --- | --- | --- | --- |
| type | Specifies the inference engine type. | string | Yes | Currently, only vllm-ascend is supported. |
| entrypoint | Specifies the service entry point. | string | No | Must be a non-empty string. Any alternative value must name a module that can be launched with the -m flag in your installed vLLM or vLLM-Ascend deployment. Default value: vllm.entrypoints.openai.api_server, the service entry point for the OpenAI-compatible API of vLLM. |
| env_vars | Specifies environment variables. | dict | No | This field configures the required environment variables. Default value: {} |
| served_model_name | Specifies the external service model name. | string | No | Must be a non-empty string that acts as the model identifier for the external inference service. This value must match the identifier specified in evaluation.evaluation. Default value: "served_model_name" |
| host | Specifies the service host address. | string | No | The listening address of the inference service. Supported formats: localhost, IPv4, and IPv6. This value must match the value specified on the evaluation side. Default value: "localhost" |
| port | Specifies the service port. | int | No | The listening port of the inference service. Value range: 1 to 65535. This value must match the value specified on the evaluation side. Default value: 1234 |
| health_check_endpoint | Specifies the health check endpoint. | string | No | The HTTP path requested during readiness probing. It must be a path on which the running inference process returns a success response, and it must start with a forward slash (/). You can customize this value based on the actual routes of your deployed vLLM-Ascend cluster. Default value: "/v1/models", the standard model list interface of typical OpenAI-compatible services. |
| startup_timeout | Specifies the startup timeout duration. | int | No | The timeout duration (in seconds) must be greater than 0. Default value: 600 |
| args | Specifies startup arguments for the inference engine. | dict | No | This field allows you to append additional arguments for vLLM-Ascend. Default value: {} |
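
To illustrate how host, port, health_check_endpoint, and startup_timeout relate during readiness probing, the sketch below builds the probe URL and polls it until the service answers or the timeout expires. Both function names are hypothetical, the plain http scheme and the polling interval are assumptions, and the tool's actual probing logic may differ.

```python
import time
import urllib.error
import urllib.request


def health_check_url(host="localhost", port=1234, endpoint="/v1/models"):
    """Build the readiness probe URL from the inference_engine fields."""
    if not 1 <= port <= 65535:
        raise ValueError("port must be in the range 1 to 65535")
    if not endpoint.startswith("/"):
        raise ValueError("health_check_endpoint must start with '/'")
    # IPv6 literals must be wrapped in brackets inside a URL.
    if ":" in host and not host.startswith("["):
        host = f"[{host}]"
    return f"http://{host}:{port}{endpoint}"  # plain HTTP is an assumption


def wait_until_ready(url, startup_timeout=600, interval=2.0):
    """Poll the URL until it returns a 2xx status or startup_timeout seconds elapse."""
    deadline = time.monotonic() + startup_timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as response:
                if 200 <= response.status < 300:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # service not reachable yet; retry until the deadline
        time.sleep(interval)
    return False
```

With the default values, the probe URL is http://localhost:1234/v1/models.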

Note: The parameters required to launch a service vary depending on the model. You must tune these parameters based on your actual model requirements. To configure specific parameters, see Configuration Guide in the vLLM-Ascend documentation. You can add custom startup arguments to the args field and define system environment variables within the env_vars dictionary.
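
One plausible way to picture the relationship between entrypoint, args, and the launched process is to flatten the args dictionary into command-line flags. The sketch below is an assumption made for illustration, not a description of how the tool builds its command: build_command is a hypothetical helper, true booleans become bare flags, false booleans are simply omitted (a real launcher may need to pass an explicit negative flag instead), and nested dictionaries are serialized as JSON.

```python
import json
import sys


def build_command(entrypoint, args):
    """Turn an entrypoint module and an args mapping into an argv list."""
    command = [sys.executable, "-m", entrypoint]
    for key, value in args.items():
        flag = f"--{key}"
        if isinstance(value, bool):
            if value:
                command.append(flag)  # assumed: true -> bare flag, false -> omitted
        elif isinstance(value, dict):
            command += [flag, json.dumps(value)]  # assumed: nested config passed as JSON
        else:
            command += [flag, str(value)]
    return command
```

Under these assumptions, an args mapping with enforce-eager: true and max-model-len: 8192 yields the argument list --enforce-eager --max-model-len 8192 after the -m entry point.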

Configuration Example

inference_engine:
  type: vllm-ascend
  entrypoint: vllm.entrypoints.openai.api_server
  env_vars:
    # Values are quoted so that YAML parses them as strings.
    HCCL_BUFFSIZE: "1024"
    VLLM_VERSION: "0.11.0"
    ASCEND_RT_VISIBLE_DEVICES: "0"
  served_model_name: served_model_name
  host: localhost
  port: 1234
  health_check_endpoint: /v1/models
  startup_timeout: 600
  args:
    enforce-eager: true
    served-model-name: served_model_name
    trust-remote-code: true
    tensor-parallel-size: 1
    data-parallel-size: 1
    quantization: ascend
    enable-prefix-caching: false
    max-model-len: 8192
    max-num-batched-tokens: 8192
    gpu-memory-utilization: 0.9
    additional_config:
      ascend_scheduler_config:
        enable: true
      enable_weight_nz_layout: true

Examples

Refer to the following files for complete automatic tuning configuration examples: