Simulators
Simulators in MASEval are used to create reproducible and scalable testing environments for multi-agent systems. Their primary function is to mock the behavior of external dependencies such as APIs, databases, or human users. By using simulators, developers can test agent interactions in a controlled, isolated environment. This approach eliminates variability from external factors like network latency or API availability, ensuring that benchmark results are consistent and deterministic. This allows for a more precise evaluation of agent performance and reliable comparison between different implementations.
LLMSimulator
Bases: ABC, TraceableMixin
A base class for simulators that use an LLM.
Subclasses should override _create_error to return the appropriate
exception type (ToolSimulatorError, UserSimulatorError, etc.).
__call__
__call__(
generation_params: Optional[Dict[str, Any]] = None,
**kwargs,
) -> Any
Generates a simulated output.
__init__
__init__(
model: ModelAdapter,
template: Optional[str] = None,
max_try: int = 3,
generation_params: Optional[Dict[str, Any]] = None,
)
Initializes the LLMSimulator.
| PARAMETER | DESCRIPTION |
|---|---|
model
|
The language model to use for generation.
TYPE:
|
template
|
A prompt template.
TYPE:
|
max_try
|
Maximum number of model calls to attempt. Defaults to 3.
TYPE:
|
generation_params
|
Default generation parameters for the model. This overwrites the ModelAdapter's defaults if provided. Both can be overridden at call time. Defaults to None.
TYPE:
|
gather_traces
gather_traces() -> dict[str, Any]
Gather execution traces from this simulator.
Output fields:
type- Component class namegathered_at- ISO timestampsimulator_type- The specific simulator classtotal_calls- Number of simulation attemptssuccessful_calls- Number of successful simulationsfailed_calls- Number of failed attemptshistory- Complete history of all simulation attempts with timestamps, inputs, outputs, status, and error messages
| RETURNS | DESCRIPTION |
|---|---|
dict[str, Any]
|
Dictionary containing simulator execution traces. |
ToolLLMSimulator
Bases: LLMSimulator
A simulator that uses an LLM to generate plausible tool outputs.
Raises ToolSimulatorError on failure, which is classified as ENVIRONMENT_ERROR (not the agent's fault).
__init__
__init__(
model: ModelAdapter,
tool_name: str,
tool_description: str,
tool_inputs: Dict[str, Any],
template: Optional[str] = None,
max_try: int = 3,
generation_params: Optional[Dict[str, Any]] = None,
)
Initializes the ToolLLMSimulator.
| PARAMETER | DESCRIPTION |
|---|---|
model
|
The language model to use for generation (must have a
TYPE:
|
tool_name
|
The name of the tool.
TYPE:
|
tool_description
|
The description of the tool.
TYPE:
|
tool_inputs
|
The schema for the tool's arguments.
TYPE:
|
template
|
a prompt template. Defaults to the one in the library. See
TYPE:
|
max_try
|
Maximum number of model calls to attempt if json output parsing fails. Defaults to 3.
TYPE:
|
generation_params
|
Default generation parameters for the model. This overwrites the ModelAdapter's defaults if provided. Both can be overridden at call time. Defaults to None.
TYPE:
|
gather_traces
gather_traces() -> dict[str, Any]
Gather execution traces from this simulator.
Output fields:
type- Component class namegathered_at- ISO timestampsimulator_type- The specific simulator classtotal_calls- Number of simulation attemptssuccessful_calls- Number of successful simulationsfailed_calls- Number of failed attemptshistory- Complete history of all simulation attempts with timestamps, inputs, outputs, status, and error messages
| RETURNS | DESCRIPTION |
|---|---|
dict[str, Any]
|
Dictionary containing simulator execution traces. |
UserLLMSimulator
Bases: LLMSimulator
A simulator that uses an LLM to act as the user.
Raises UserSimulatorError on failure, which is classified as USER_ERROR (not the agent's fault).
__call__
__call__(
conversation_history: List[Dict[str, str]],
generation_params: Optional[Dict[str, Any]] = None,
) -> str
Generates a simulated user response.
| PARAMETER | DESCRIPTION |
|---|---|
conversation_history
|
The history of the conversation.
TYPE:
|
generation_params
|
Optional generation parameters for LLM to override the defaults.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
str
|
The simulated user response string. |
__init__
__init__(
model: ModelAdapter,
user_profile: Dict[str, str],
scenario: str,
template: Optional[str] = None,
max_try: int = 3,
generation_params: Optional[Dict[str, Any]] = None,
stop_token: Optional[str] = None,
early_stopping_condition: Optional[str] = None,
)
Initializes the UserLLMSimulator.
| PARAMETER | DESCRIPTION |
|---|---|
model
|
The language model to use for generation.
TYPE:
|
user_profile
|
A dictionary containing the user's profile.
TYPE:
|
scenario
|
The scenario for the user.
TYPE:
|
template
|
A prompt template. Defaults to the one in the library.
See
TYPE:
|
max_try
|
Maximum number of model calls to attempt. Defaults to 3.
TYPE:
|
generation_params
|
Default generation parameters for the model. This overwrites the ModelAdapter's defaults if provided. Both can be overridden at call time. Defaults to None.
TYPE:
|
stop_token
|
Token to include in responses when early stopping condition is met. Must be provided together with early_stopping_condition. Defaults to None.
TYPE:
|
early_stopping_condition
|
A description of when the user should stop the conversation (e.g., "all goals have been accomplished"). Must be provided together with stop_token. Defaults to None.
TYPE:
|
| RAISES | DESCRIPTION |
|---|---|
ValueError
|
If only one of stop_token or early_stopping_condition is provided. |
gather_traces
gather_traces() -> dict[str, Any]
Gather execution traces from this simulator.
Output fields:
type- Component class namegathered_at- ISO timestampsimulator_type- The specific simulator classtotal_calls- Number of simulation attemptssuccessful_calls- Number of successful simulationsfailed_calls- Number of failed attemptshistory- Complete history of all simulation attempts with timestamps, inputs, outputs, status, and error messages
| RETURNS | DESCRIPTION |
|---|---|
dict[str, Any]
|
Dictionary containing simulator execution traces. |
AgenticUserLLMSimulator
Bases: LLMSimulator
A simulator that uses an LLM to act as an agentic user (capable of using tools).
__call__
__call__(
conversation_history: List[Dict[str, str]],
generation_params: Optional[Dict[str, Any]] = None,
) -> Tuple[str, List[Dict[str, Any]]]
Generate a simulated user response with potential tool calls.
| RETURNS | DESCRIPTION |
|---|---|
Tuple[str, List[Dict[str, Any]]]
|
Tuple[str, List[Dict[str, Any]]]: (text_response, list_of_tool_calls) |
gather_traces
gather_traces() -> dict[str, Any]
Gather execution traces from this simulator.
Output fields:
type- Component class namegathered_at- ISO timestampsimulator_type- The specific simulator classtotal_calls- Number of simulation attemptssuccessful_calls- Number of successful simulationsfailed_calls- Number of failed attemptshistory- Complete history of all simulation attempts with timestamps, inputs, outputs, status, and error messages
| RETURNS | DESCRIPTION |
|---|---|
dict[str, Any]
|
Dictionary containing simulator execution traces. |
SimulatorCallStatus
Bases: Enum