Simulators

Simulators in MASEval provide reproducible and scalable testing environments for multi-agent systems. Their primary function is to mock the behavior of external dependencies such as APIs, databases, or human users, so that agent interactions can be tested in a controlled, isolated environment. Removing external sources of variability, such as network latency or API availability, keeps benchmark results consistent and deterministic, which allows a more precise evaluation of agent performance and a reliable comparison between implementations.

LLMSimulator

Bases: ABC, TraceableMixin

A base class for simulators that use an LLM.

Subclasses should override _create_error to return the appropriate exception type (ToolSimulatorError, UserSimulatorError, etc.).
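The override pattern described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's code: `SketchSimulator`, `SketchToolSimulator`, the `generate` callable, and the `_create_error(message)` signature are all hypothetical stand-ins, and `ToolSimulatorError` is redefined locally so the block runs without MASEval installed.

```python
from abc import ABC, abstractmethod
from typing import Callable, Optional


class ToolSimulatorError(Exception):
    """Local stand-in for the library exception of the same name."""


class SketchSimulator(ABC):
    """Hypothetical stand-in for LLMSimulator (assumption, not the real class)."""

    def __init__(self, generate: Callable[[], str], max_try: int = 3) -> None:
        self.generate = generate  # assumed: zero-argument callable returning text
        self.max_try = max_try

    @abstractmethod
    def _create_error(self, message: str) -> Exception:
        """Return the exception to raise once every attempt has failed."""

    def __call__(self) -> str:
        last_error: Optional[Exception] = None
        for _ in range(self.max_try):
            try:
                return self.generate()
            except Exception as exc:  # a real implementation would catch less
                last_error = exc
        # The subclass decides which exception type the caller sees.
        raise self._create_error(
            f"gave up after {self.max_try} attempts: {last_error}"
        ) from last_error


class SketchToolSimulator(SketchSimulator):
    def _create_error(self, message: str) -> Exception:
        return ToolSimulatorError(message)
```

The point of the design is that the retry logic lives in one place while each subclass only chooses the error type, which is what lets failures be classified per simulator kind.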

__call__

__call__(
    generation_params: Optional[Dict[str, Any]] = None,
    **kwargs,
) -> Any

Generates a simulated output.

__init__

__init__(
    model: ModelAdapter,
    template: Optional[str] = None,
    max_try: int = 3,
    generation_params: Optional[Dict[str, Any]] = None,
)

Initializes the LLMSimulator.

PARAMETER	DESCRIPTION
`model`	The language model to use for generation. TYPE: `ModelAdapter`
`template`	A prompt template. TYPE: `str` DEFAULT: `None`
`max_try`	Maximum number of model calls to attempt. Defaults to 3. TYPE: `int` DEFAULT: `3`
`generation_params`	Default generation parameters for the model. This overwrites the ModelAdapter's defaults if provided. Both can be overridden at call time. Defaults to None. TYPE: `Dict[str, Any]` DEFAULT: `None`
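The precedence described for `generation_params` (adapter defaults, then simulator defaults, then call-time values) might look like the sketch below. It assumes a key-by-key merge, which the documentation does not confirm; the library may replace the whole dictionary instead. `resolve_generation_params` and the keys `temperature` and `max_tokens` are hypothetical and used only for illustration.

```python
from typing import Any, Dict, Optional


def resolve_generation_params(
    adapter_defaults: Optional[Dict[str, Any]] = None,
    simulator_defaults: Optional[Dict[str, Any]] = None,
    call_time: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Merge three layers of parameters; later layers win on key conflicts."""
    merged: Dict[str, Any] = {}
    for layer in (adapter_defaults, simulator_defaults, call_time):
        if layer:  # skip None and empty dicts
            merged.update(layer)
    return merged


params = resolve_generation_params(
    adapter_defaults={"temperature": 1.0, "max_tokens": 256},
    simulator_defaults={"temperature": 0.2},
    call_time={"max_tokens": 64},
)
print(params)  # {'temperature': 0.2, 'max_tokens': 64}
```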

gather_traces

gather_traces() -> dict[str, Any]

Gather execution traces from this simulator.

Output fields:

type - Component class name
gathered_at - ISO timestamp
simulator_type - The specific simulator class
total_calls - Number of simulation attempts
successful_calls - Number of successful simulations
failed_calls - Number of failed attempts
history - Complete history of all simulation attempts with timestamps, inputs, outputs, status, and error messages

RETURNS	DESCRIPTION
`dict[str, Any]`	Dictionary containing simulator execution traces.
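As a rough illustration of consuming these fields, the sketch below computes a success rate from a trace dictionary. The `summarize_traces` helper is hypothetical, and `sample` is a hand-written dictionary that only mirrors the field names listed above; its values, and the empty `history`, are illustrative rather than real simulator output.

```python
from typing import Any, Dict


def summarize_traces(traces: Dict[str, Any]) -> Dict[str, Any]:
    """Reduce a gather_traces()-style dict to a few headline numbers."""
    total = traces.get("total_calls", 0)
    successful = traces.get("successful_calls", 0)
    return {
        "simulator_type": traces.get("simulator_type"),
        # Avoid dividing by zero when the simulator was never called.
        "success_rate": successful / total if total else None,
        "failed_calls": traces.get("failed_calls", 0),
    }


# Hand-written sample using the documented field names (values are made up).
sample = {
    "type": "ToolLLMSimulator",
    "gathered_at": "2025-01-01T00:00:00",
    "simulator_type": "ToolLLMSimulator",
    "total_calls": 4,
    "successful_calls": 3,
    "failed_calls": 1,
    "history": [],
}
summary = summarize_traces(sample)
```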

ToolLLMSimulator

Bases: LLMSimulator

A simulator that uses an LLM to generate plausible tool outputs.

Raises ToolSimulatorError on failure, which is classified as ENVIRONMENT_ERROR (not the agent's fault).

__init__

__init__(
    model: ModelAdapter,
    tool_name: str,
    tool_description: str,
    tool_inputs: Dict[str, Any],
    template: Optional[str] = None,
    max_try: int = 3,
    generation_params: Optional[Dict[str, Any]] = None,
)

Initializes the ToolLLMSimulator.

PARAMETER	DESCRIPTION
`model`	The language model to use for generation (must have a `generate` method). TYPE: `ModelAdapter`
`tool_name`	The name of the tool. TYPE: `str`
`tool_description`	The description of the tool. TYPE: `str`
`tool_inputs`	The schema for the tool's arguments. TYPE: `Dict[str, Any]`
`template`	A prompt template. Defaults to the one in the library; see `maseval.utils.templates.tool_llm_simulator_template.txt`. The template should use double curly braces for placeholders and should contain placeholders for `name`, `description`, `inputs`, and `input_value_dict`. TYPE: `str` DEFAULT: `None`
`max_try`	Maximum number of model calls to attempt if JSON output parsing fails. Defaults to 3. TYPE: `int` DEFAULT: `3`
`generation_params`	Default generation parameters for the model. This overwrites the ModelAdapter's defaults if provided. Both can be overridden at call time. Defaults to None. TYPE: `Dict[str, Any]` DEFAULT: `None`

gather_traces

gather_traces() -> dict[str, Any]

Gather execution traces from this simulator.

Output fields:

type - Component class name
gathered_at - ISO timestamp
simulator_type - The specific simulator class
total_calls - Number of simulation attempts
successful_calls - Number of successful simulations
failed_calls - Number of failed attempts
history - Complete history of all simulation attempts with timestamps, inputs, outputs, status, and error messages

RETURNS	DESCRIPTION
`dict[str, Any]`	Dictionary containing simulator execution traces.

UserLLMSimulator

Bases: LLMSimulator

A simulator that uses an LLM to act as the user.

Raises UserSimulatorError on failure, which is classified as USER_ERROR (not the agent's fault).

__call__

__call__(
    conversation_history: List[Dict[str, str]],
    generation_params: Optional[Dict[str, Any]] = None,
) -> str

Generates a simulated user response.

PARAMETER	DESCRIPTION
`conversation_history`	The history of the conversation. TYPE: `List[Dict[str, str]]`
`generation_params`	Optional generation parameters for LLM to override the defaults. TYPE: `Optional[Dict[str, Any]]` DEFAULT: `None`

RETURNS	DESCRIPTION
`str`	The simulated user response string.

__init__

__init__(
    model: ModelAdapter,
    user_profile: Dict[str, str],
    scenario: str,
    template: Optional[str] = None,
    max_try: int = 3,
    generation_params: Optional[Dict[str, Any]] = None,
    stop_token: Optional[str] = None,
    early_stopping_condition: Optional[str] = None,
)

Initializes the UserLLMSimulator.

PARAMETER	DESCRIPTION
`model`	The language model to use for generation. TYPE: `ModelAdapter`
`user_profile`	A dictionary containing the user's profile. TYPE: `Dict[str, str]`
`scenario`	The scenario for the user. TYPE: `str`
`template`	A prompt template. Defaults to the one in the library. See `maseval.utils.templates.user_llm_simulator_template.txt`. TYPE: `str` DEFAULT: `None`
`max_try`	Maximum number of model calls to attempt. Defaults to 3. TYPE: `int` DEFAULT: `3`
`generation_params`	Default generation parameters for the model. This overwrites the ModelAdapter's defaults if provided. Both can be overridden at call time. Defaults to None. TYPE: `Dict[str, Any]` DEFAULT: `None`
`stop_token`	Token to include in responses when early stopping condition is met. Must be provided together with early_stopping_condition. Defaults to None. TYPE: `Optional[str]` DEFAULT: `None`
`early_stopping_condition`	A description of when the user should stop the conversation (e.g., "all goals have been accomplished"). Must be provided together with stop_token. Defaults to None. TYPE: `Optional[str]` DEFAULT: `None`

RAISES	DESCRIPTION
`ValueError`	If only one of stop_token or early_stopping_condition is provided.
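The pairing rule for `stop_token` and `early_stopping_condition` can be sketched as below. Both helpers are hypothetical: `validate_stop_config` mirrors the documented `ValueError`, while `split_stop_token` shows one way a caller might detect the token in a response, which is an assumption about usage rather than documented behavior. The token `</stop>` is an arbitrary example.

```python
from typing import Optional, Tuple


def validate_stop_config(
    stop_token: Optional[str],
    early_stopping_condition: Optional[str],
) -> None:
    """Raise ValueError if exactly one of the two options is provided."""
    if (stop_token is None) != (early_stopping_condition is None):
        raise ValueError(
            "stop_token and early_stopping_condition must be provided together"
        )


def split_stop_token(response: str, stop_token: Optional[str]) -> Tuple[str, bool]:
    """Return the response without the stop token, and whether it was present."""
    if stop_token and stop_token in response:
        return response.replace(stop_token, "").strip(), True
    return response, False


validate_stop_config("</stop>", "all goals have been accomplished")  # accepted
text, should_stop = split_stop_token("Thanks, that is all. </stop>", "</stop>")
```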

gather_traces

gather_traces() -> dict[str, Any]

Gather execution traces from this simulator.

Output fields:

type - Component class name
gathered_at - ISO timestamp
simulator_type - The specific simulator class
total_calls - Number of simulation attempts
successful_calls - Number of successful simulations
failed_calls - Number of failed attempts
history - Complete history of all simulation attempts with timestamps, inputs, outputs, status, and error messages

RETURNS	DESCRIPTION
`dict[str, Any]`	Dictionary containing simulator execution traces.

AgenticUserLLMSimulator

Bases: LLMSimulator

A simulator that uses an LLM to act as an agentic user (capable of using tools).

__call__

__call__(
    conversation_history: List[Dict[str, str]],
    generation_params: Optional[Dict[str, Any]] = None,
) -> Tuple[str, List[Dict[str, Any]]]

Generate a simulated user response with potential tool calls.

RETURNS	DESCRIPTION
`Tuple[str, List[Dict[str, Any]]]`	A tuple `(text_response, list_of_tool_calls)`.
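One way a caller might consume the returned tuple is sketched below. The structure of each tool-call dictionary is not documented here, so the `name` and `arguments` keys are assumptions, as are the `run_user_tool_calls` helper and the `check_balance` tool; the `result` value is hand-built rather than produced by a simulator.

```python
from typing import Any, Callable, Dict, List, Tuple


def run_user_tool_calls(
    result: Tuple[str, List[Dict[str, Any]]],
    tools: Dict[str, Callable[..., Any]],
) -> Tuple[str, List[Any]]:
    """Unpack (text_response, tool_calls) and execute each requested tool."""
    text, tool_calls = result
    outputs: List[Any] = []
    for call in tool_calls:
        tool = tools[call["name"]]  # assumed key: "name"
        outputs.append(tool(**call.get("arguments", {})))  # assumed key: "arguments"
    return text, outputs


# Hand-built stand-in for a simulator result (illustrative values only).
result = (
    "Let me check my balance first.",
    [{"name": "check_balance", "arguments": {"account": "savings"}}],
)
tools = {"check_balance": lambda account: f"balance for {account}: 100"}
reply, tool_outputs = run_user_tool_calls(result, tools)
```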

gather_traces

gather_traces() -> dict[str, Any]

Gather execution traces from this simulator.

Output fields:

type - Component class name
gathered_at - ISO timestamp
simulator_type - The specific simulator class
total_calls - Number of simulation attempts
successful_calls - Number of successful simulations
failed_calls - Number of failed attempts
history - Complete history of all simulation attempts with timestamps, inputs, outputs, status, and error messages

RETURNS	DESCRIPTION
`dict[str, Any]`	Dictionary containing simulator execution traces.

SimulatorCallStatus

Bases: Enum