Skip to content

Simulators

Simulators in MASEval are used to create reproducible and scalable testing environments for multi-agent systems. Their primary function is to mock the behavior of external dependencies such as APIs, databases, or human users. By using simulators, developers can test agent interactions in a controlled, isolated environment. This approach eliminates variability from external factors like network latency or API availability, ensuring that benchmark results are consistent and deterministic. This allows for a more precise evaluation of agent performance and reliable comparison between different implementations.

View source

LLMSimulator

Bases: ABC, TraceableMixin

A base class for simulators that use an LLM.

Subclasses should override _create_error to return the appropriate exception type (ToolSimulatorError, UserSimulatorError, etc.).

__call__

__call__(
    generation_params: Optional[Dict[str, Any]] = None,
    **kwargs,
) -> Any

Generates a simulated output.

__init__

__init__(
    model: ModelAdapter,
    template: Optional[str] = None,
    max_try: int = 3,
    generation_params: Optional[Dict[str, Any]] = None,
)

Initializes the LLMSimulator.

PARAMETER DESCRIPTION
model

The language model to use for generation.

TYPE: ModelAdapter

template

A prompt template.

TYPE: str DEFAULT: None

max_try

Maximum number of model calls to attempt. Defaults to 3.

TYPE: int DEFAULT: 3

generation_params

Default generation parameters for the model. This overwrites the ModelAdapter's defaults if provided. Both can be overridden at call time. Defaults to None.

TYPE: Dict[str, Any] DEFAULT: None

gather_traces

gather_traces() -> dict[str, Any]

Gather execution traces from this simulator.

Output fields:

  • type - Component class name
  • gathered_at - ISO timestamp
  • simulator_type - The specific simulator class
  • total_calls - Number of simulation attempts
  • successful_calls - Number of successful simulations
  • failed_calls - Number of failed attempts
  • history - Complete history of all simulation attempts with timestamps, inputs, outputs, status, and error messages
RETURNS DESCRIPTION
dict[str, Any]

Dictionary containing simulator execution traces.

ToolLLMSimulator

Bases: LLMSimulator

A simulator that uses an LLM to generate plausible tool outputs.

Raises ToolSimulatorError on failure, which is classified as ENVIRONMENT_ERROR (not the agent's fault).

__init__

__init__(
    model: ModelAdapter,
    tool_name: str,
    tool_description: str,
    tool_inputs: Dict[str, Any],
    template: Optional[str] = None,
    max_try: int = 3,
    generation_params: Optional[Dict[str, Any]] = None,
)

Initializes the ToolLLMSimulator.

PARAMETER DESCRIPTION
model

The language model to use for generation (must have a generate method).

TYPE: ModelAdapter

tool_name

The name of the tool.

TYPE: str

tool_description

The description of the tool.

TYPE: str

tool_inputs

The schema for the tool's arguments.

TYPE: Dict[str, Any]

template

a prompt template. Defaults to the one in the library. See maseval.utils.templates.tool_llm_simulator_template.txt. The template should use double curly braces for placeholders. Should contain placeholders for name, description, inputs, and input_value_dict.

TYPE: str DEFAULT: None

max_try

Maximum number of model calls to attempt if json output parsing fails. Defaults to 3.

TYPE: int DEFAULT: 3

generation_params

Default generation parameters for the model. This overwrites the ModelAdapter's defaults if provided. Both can be overridden at call time. Defaults to None.

TYPE: Dict[str, Any] DEFAULT: None

gather_traces

gather_traces() -> dict[str, Any]

Gather execution traces from this simulator.

Output fields:

  • type - Component class name
  • gathered_at - ISO timestamp
  • simulator_type - The specific simulator class
  • total_calls - Number of simulation attempts
  • successful_calls - Number of successful simulations
  • failed_calls - Number of failed attempts
  • history - Complete history of all simulation attempts with timestamps, inputs, outputs, status, and error messages
RETURNS DESCRIPTION
dict[str, Any]

Dictionary containing simulator execution traces.

UserLLMSimulator

Bases: LLMSimulator

A simulator that uses an LLM to act as the user.

Raises UserSimulatorError on failure, which is classified as USER_ERROR (not the agent's fault).

__call__

__call__(
    conversation_history: List[Dict[str, str]],
    generation_params: Optional[Dict[str, Any]] = None,
) -> str

Generates a simulated user response.

PARAMETER DESCRIPTION
conversation_history

The history of the conversation.

TYPE: List[Dict[str, str]]

generation_params

Optional generation parameters for LLM to override the defaults.

TYPE: Optional[Dict[str, Any]] DEFAULT: None

RETURNS DESCRIPTION
str

The simulated user response string.

__init__

__init__(
    model: ModelAdapter,
    user_profile: Dict[str, str],
    scenario: str,
    template: Optional[str] = None,
    max_try: int = 3,
    generation_params: Optional[Dict[str, Any]] = None,
    stop_token: Optional[str] = None,
    early_stopping_condition: Optional[str] = None,
)

Initializes the UserLLMSimulator.

PARAMETER DESCRIPTION
model

The language model to use for generation.

TYPE: ModelAdapter

user_profile

A dictionary containing the user's profile.

TYPE: Dict[str, str]

scenario

The scenario for the user.

TYPE: str

template

A prompt template. Defaults to the one in the library. See maseval.utils.templates.user_llm_simulator_template.txt.

TYPE: str DEFAULT: None

max_try

Maximum number of model calls to attempt. Defaults to 3.

TYPE: int DEFAULT: 3

generation_params

Default generation parameters for the model. This overwrites the ModelAdapter's defaults if provided. Both can be overridden at call time. Defaults to None.

TYPE: Dict[str, Any] DEFAULT: None

stop_token

Token to include in responses when early stopping condition is met. Must be provided together with early_stopping_condition. Defaults to None.

TYPE: Optional[str] DEFAULT: None

early_stopping_condition

A description of when the user should stop the conversation (e.g., "all goals have been accomplished"). Must be provided together with stop_token. Defaults to None.

TYPE: Optional[str] DEFAULT: None

RAISES DESCRIPTION
ValueError

If only one of stop_token or early_stopping_condition is provided.

gather_traces

gather_traces() -> dict[str, Any]

Gather execution traces from this simulator.

Output fields:

  • type - Component class name
  • gathered_at - ISO timestamp
  • simulator_type - The specific simulator class
  • total_calls - Number of simulation attempts
  • successful_calls - Number of successful simulations
  • failed_calls - Number of failed attempts
  • history - Complete history of all simulation attempts with timestamps, inputs, outputs, status, and error messages
RETURNS DESCRIPTION
dict[str, Any]

Dictionary containing simulator execution traces.

AgenticUserLLMSimulator

Bases: LLMSimulator

A simulator that uses an LLM to act as an agentic user (capable of using tools).

__call__

__call__(
    conversation_history: List[Dict[str, str]],
    generation_params: Optional[Dict[str, Any]] = None,
) -> Tuple[str, List[Dict[str, Any]]]

Generate a simulated user response with potential tool calls.

RETURNS DESCRIPTION
Tuple[str, List[Dict[str, Any]]]

Tuple[str, List[Dict[str, Any]]]: (text_response, list_of_tool_calls)

gather_traces

gather_traces() -> dict[str, Any]

Gather execution traces from this simulator.

Output fields:

  • type - Component class name
  • gathered_at - ISO timestamp
  • simulator_type - The specific simulator class
  • total_calls - Number of simulation attempts
  • successful_calls - Number of successful simulations
  • failed_calls - Number of failed attempts
  • history - Complete history of all simulation attempts with timestamps, inputs, outputs, status, and error messages
RETURNS DESCRIPTION
dict[str, Any]

Dictionary containing simulator execution traces.

SimulatorCallStatus

Bases: Enum