User

In many real-world applications, Multi-Agent Systems (MAS) are designed to interact with human users to accomplish tasks. To effectively benchmark such systems, it is crucial to have a standardized way to simulate these interactions. MASEval provides this capability through a User hierarchy: the abstract User base class defines the interface, while LLMUser provides an LLM-driven implementation that can engage with the MAS in a realistic manner.

The LLMUser is initialized with a persona and a scenario, both of which are typically defined within a Task. This tight integration allows for dynamic and context-aware simulations. For example, a Task might generate a random birthdate for the user. This birthdate is then passed to both the LLMUser and the Evaluator. The user will use this information in its conversation with the MAS, and the Evaluator will check if the MAS correctly processes and remembers this information. This mechanism enables the creation of sophisticated and reliable benchmarks that can assess the interactive capabilities of a MAS.

View source

User

Bases: ABC, TraceableMixin, ConfigurableMixin

Abstract interface for user interaction during evaluation.

A user represents the entity that interacts with agents during evaluation. This could be an LLM simulating a human, a scripted response sequence, a real human, or another agent system.

Subclasses must implement:

get_initial_query() - Return the opening message to start the conversation
respond() - Generate responses to agent messages
is_done() - Determine when the interaction should end

The optional get_tool() method can be overridden for frameworks that use tool-based user interaction (e.g., smolagents, CAMEL).

gather_config

gather_config() -> Dict[str, Any]

Gather configuration from this component.

Provides a default implementation that returns basic metadata about the component (type and collection timestamp). Subclasses should extend this method to include their own configuration data.

This method is called by the Benchmark before evaluation to collect all configuration information. The returned dictionary must be JSON-serializable.

Output fields:

type - Component class name
gathered_at - ISO timestamp of when config was collected

Subclasses typically add additional component-specific configuration.

RETURNS	DESCRIPTION
`Dict[str, Any]`	Dictionary containing configuration with standardized structure.

How to use

Override this method and call super().gather_config() to extend the base implementation with your own data:

def gather_config(self) -> Dict[str, Any]:
    return {
        **super().gather_config(),
        "model_name": self.model_name,
        "temperature": self.temperature,
        "max_tokens": self.max_tokens
    }

If you don't need custom configuration tracking, you can use the default implementation without overriding (it will still return basic metadata about your component).

gather_traces

gather_traces() -> Dict[str, Any]

Gather execution traces from this component.

Provides a default implementation that returns basic metadata about the component (type and collection timestamp). Subclasses should extend this method to include their own execution data.

This method is called by the Benchmark before evaluation to collect all execution data. The returned dictionary must be JSON-serializable.

Output fields:

type - Component class name
gathered_at - ISO timestamp of when traces were collected

Subclasses typically add additional component-specific data.

RETURNS	DESCRIPTION
`Dict[str, Any]`	Dictionary containing traces with standardized structure.

How to use

Override this method and call super().gather_traces() to extend the base implementation with your own data:

def gather_traces(self) -> Dict[str, Any]:
    return {
        **super().gather_traces(),
        "my_field": self._my_data,
        "execution_count": len(self._history)
    }

If you don't need custom tracing, you can use the default implementation without overriding (it will still return basic metadata about your component).

get_initial_query `abstractmethod`

get_initial_query() -> str

Return the initial query to start the conversation.

RETURNS	DESCRIPTION
`str`	The opening message from the user to begin the interaction.

get_tool

get_tool() -> Any

Return a framework-compatible tool for agent interaction.

Some frameworks (smolagents, CAMEL) use a tool-based pattern where agents invoke an AskUser tool to interact with the user. Override this in subclasses for frameworks that need it.

RETURNS	DESCRIPTION
`Any`	Framework-specific tool, or `None` if not applicable.

is_done `abstractmethod`

is_done() -> bool

Check if the user interaction should terminate.

RETURNS	DESCRIPTION
`bool`	True if the user is done interacting, False to continue.

respond `abstractmethod`

respond(message: str) -> str

Respond to a message from the agent.

PARAMETER	DESCRIPTION
`message`	The agent's message or question. TYPE: `str`

RETURNS	DESCRIPTION
`str`	The user's response.

RAISES	DESCRIPTION
`UserExhaustedError`	If the user has no more turns available and no `exhausted_response` is configured.

LLMUser

Bases: User

User simulated by a language model.

Uses an LLM to generate realistic user responses based on a user profile and scenario description. Maintains conversation history and supports multi-turn interaction with configurable termination conditions.

The user only has access to the conversation history and does not see the full environment state, ensuring partial observability.

Multi-Turn Interaction

By default, users support single-turn interaction (max_turns=1). For benchmarks that require multiple agent-user exchanges, set max_turns > 1.

Early Stopping

For benchmarks where termination depends on user satisfaction rather than a fixed turn count, configure stop_tokens. When the user's response contains any of these tokens, is_done() returns True. The MACS benchmark uses "</stop>" to signal satisfaction.

ATTRIBUTE	DESCRIPTION
`name`	User identifier.
`model`	Language model for generating responses.
`user_profile`	Dictionary describing the user's persona and preferences.
`scenario`	Description of the task the user is trying to accomplish.
`simulator`	The LLM simulator instance generating responses.
`messages`	Conversation history between user and agent.
`max_turns`	Maximum number of user response turns.
`stop_tokens`	Tokens that trigger early stopping when detected (empty list if disabled).
`early_stopping_condition`	Description of when to emit a stop token, or None.

termination_reason `property`

termination_reason: TerminationReason

Get the reason why the user interaction terminated.

RETURNS	DESCRIPTION
`TerminationReason`	Why `is_done()` returns True, or `NOT_TERMINATED` if still ongoing.

init

__init__(
    name: str,
    model: ModelAdapter,
    user_profile: Dict[str, Any],
    scenario: str,
    initial_query: Optional[str] = None,
    template: Optional[str] = None,
    max_try: int = 3,
    max_turns: int = 1,
    stop_tokens: Optional[List[str]] = None,
    early_stopping_condition: Optional[str] = None,
    exhausted_response: Optional[str] = None,
)

Initialize the LLMUser.

PARAMETER	DESCRIPTION
`name`	The name of the user. TYPE: `str`
`model`	The language model to be used for generating responses. TYPE: `ModelAdapter`
`user_profile`	A dictionary describing the user's persona, preferences, and other relevant information. TYPE: `Dict[str, Any]`
`scenario`	A description of the situation or task the user is trying to accomplish. TYPE: `str`
`initial_query`	A pre-set query to start the conversation. If provided, it becomes the first user message. If None, call get_initial_query() to generate one from the model based on the user profile and scenario. Defaults to None. TYPE: `Optional[str]` DEFAULT: `None`
`template`	A custom prompt template for the user simulator. Defaults to None. TYPE: `Optional[str]` DEFAULT: `None`
`max_try`	The maximum number of attempts for the simulator to generate a valid response. Defaults to 3. TYPE: `int` DEFAULT: `3`
`max_turns`	Maximum number of user messages in the conversation. Each user message counts as one turn, including the initial_query. Use max_turns=1 for single-turn benchmarks, or higher values for multi-turn interaction. Defaults to 1. TYPE: `int` DEFAULT: `1`
`stop_tokens`	List of tokens that signal user satisfaction, enabling early termination. When the user's LLM-generated response contains any of these tokens, is_done() returns True regardless of remaining turns. The matched token is stripped from the response. Defaults to None (early stopping disabled). TYPE: `Optional[List[str]]` DEFAULT: `None`
`early_stopping_condition`	A description of when the user should stop the conversation (e.g., "all goals have been accomplished"). Used with stop_tokens to instruct the LLM when to emit a stop token. Must be provided if stop_tokens is set. Defaults to None. TYPE: `Optional[str]` DEFAULT: `None`
`exhausted_response`	Message to return when `respond()` is called after the user is done. If `None` (default), raises `UserExhaustedError` instead. Set this to a descriptive string (e.g., `"The user is no longer available. Proceed with the information you have."`) for tool-based integrations where the agent controls when to call the user. TYPE: `Optional[str]` DEFAULT: `None`

RAISES	DESCRIPTION
`ValueError`	If stop_tokens is set but early_stopping_condition is not provided.

gather_config

gather_config() -> Dict[str, Any]

Gather configuration from this user.

Output fields:

name - User identifier
profile - User profile data
scenario - Task scenario description
max_turns - Maximum interaction turns
stop_tokens - Early stopping tokens (empty list if disabled)
exhausted_response - Message returned when user is done, or None

RETURNS	DESCRIPTION
`Dict[str, Any]`	Dictionary containing user configuration.

gather_traces

gather_traces() -> Dict[str, Any]

Gather execution traces from this user.

Output fields:

name - User identifier
profile - User profile data
message_count - Number of messages in history
messages - Full conversation history
logs - Execution logs with timing
termination_reason - Why interaction ended (see TerminationReason)
stop_reason - Which stop token triggered termination, if any
max_turns - Maximum allowed turns
turns_used - Actual turns used
stopped_by_user - Whether user emitted a stop token

RETURNS	DESCRIPTION
`Dict[str, Any]`	Dictionary containing user state and interaction data.

get_initial_query

get_initial_query() -> str

Get the initial query for the conversation.

If an initial_query was provided at construction, returns it. Otherwise, generates one using the LLM simulator based on the user's profile and scenario.

This method: - Returns the existing initial query if one was provided - Or calls the LLM simulator to generate one - Ensures the query is in the message history - Counts the initial query as the first turn

RETURNS	DESCRIPTION
`str`	The initial query (either pre-set or LLM-generated).

RAISES	DESCRIPTION
`RuntimeError`	If called after conversation has progressed beyond the initial message.

get_tool

get_tool() -> Any

Return a framework-compatible tool for agent interaction.

Some frameworks (smolagents, CAMEL) use a tool-based pattern where agents invoke an AskUser tool to interact with the user. Override this in subclasses for frameworks that need it.

RETURNS	DESCRIPTION
`Any`	Framework-specific tool, or `None` if not applicable.

increment_turn

increment_turn() -> None

Increment the turn counter.

Call this after recording a user response in the message history.

is_done

is_done() -> bool

Check if the user interaction should end.

Checks: 1. If max_turns has been reached 2. If the user previously indicated termination (via stop_token)

Subclasses can override to add custom termination logic (e.g., LLM-based satisfaction checks) by calling super().is_done() first.

RETURNS	DESCRIPTION
`bool`	True if the user is done interacting, False to continue.

respond

respond(message: str) -> str

Respond to a message from the agent using LLM simulation.

This method appends the agent's message to the conversation history, generates a response using the LLM simulator, appends the response to the history, and returns it.

If a stop_token is detected in the response, triggers early stopping.

PARAMETER	DESCRIPTION
`message`	The message from the agent to which the user should respond. TYPE: `str`

RETURNS	DESCRIPTION
`str`	The user's response, or `exhausted_response` if done and configured.

RAISES	DESCRIPTION
`UserExhaustedError`	If the user is already done and no `exhausted_response` is configured.

AgenticLLMUser

Bases: LLMUser

LLM-simulated user with access to tools.

Extends LLMUser with the ability to use tools (e.g., check order status, lookup information) during the conversation. Uses a ReAct-style loop to iteratively call tools and generate responses.

termination_reason `property`

termination_reason: TerminationReason

Get the reason why the user interaction terminated.

RETURNS	DESCRIPTION
`TerminationReason`	Why `is_done()` returns True, or `NOT_TERMINATED` if still ongoing.

init

__init__(
    name: str,
    model: ModelAdapter,
    user_profile: Dict[str, Any],
    scenario: str,
    tools: Optional[Dict[str, Callable]] = None,
    max_internal_steps: int = 5,
    **kwargs: Any,
)

Initialize AgenticLLMUser.

PARAMETER	DESCRIPTION
`name`	The name of the user. TYPE: `str`
`model`	The language model to be used for generating responses. TYPE: `ModelAdapter`
`user_profile`	A dictionary describing the user's persona. TYPE: `Dict[str, Any]`
`scenario`	A description of the task the user is trying to accomplish. TYPE: `str`
`tools`	Dictionary of tools available to the user. TYPE: `Optional[Dict[str, Callable]]` DEFAULT: `None`
`max_internal_steps`	Maximum number of tool execution loops per turn. TYPE: `int` DEFAULT: `5`
`**kwargs`	Arguments passed to LLMUser.init TYPE: `Any` DEFAULT: `{}`

gather_config

gather_config() -> Dict[str, Any]

Gather configuration from this user.

Output fields:

name - User identifier
profile - User profile data
scenario - Task scenario description
max_turns - Maximum interaction turns
stop_tokens - Early stopping tokens (empty list if disabled)
exhausted_response - Message returned when user is done, or None

RETURNS	DESCRIPTION
`Dict[str, Any]`	Dictionary containing user configuration.

gather_traces

gather_traces() -> Dict[str, Any]

Gather execution traces from this user.

Output fields:

name - User identifier
profile - User profile data
message_count - Number of messages in history
messages - Full conversation history
logs - Execution logs with timing
termination_reason - Why interaction ended (see TerminationReason)
stop_reason - Which stop token triggered termination, if any
max_turns - Maximum allowed turns
turns_used - Actual turns used
stopped_by_user - Whether user emitted a stop token

RETURNS	DESCRIPTION
`Dict[str, Any]`	Dictionary containing user state and interaction data.

get_initial_query

get_initial_query() -> str

Get the initial query for the conversation.

If an initial_query was provided at construction, returns it. Otherwise, generates one using the LLM simulator based on the user's profile and scenario.

This method: - Returns the existing initial query if one was provided - Or calls the LLM simulator to generate one - Ensures the query is in the message history - Counts the initial query as the first turn

RETURNS	DESCRIPTION
`str`	The initial query (either pre-set or LLM-generated).

RAISES	DESCRIPTION
`RuntimeError`	If called after conversation has progressed beyond the initial message.

get_tool

get_tool() -> Any

Return a framework-compatible tool for agent interaction.

Some frameworks (smolagents, CAMEL) use a tool-based pattern where agents invoke an AskUser tool to interact with the user. Override this in subclasses for frameworks that need it.

RETURNS	DESCRIPTION
`Any`	Framework-specific tool, or `None` if not applicable.

increment_turn

increment_turn() -> None

Increment the turn counter.

Call this after recording a user response in the message history.

is_done

is_done() -> bool

Check if the user interaction should end.

Checks: 1. If max_turns has been reached 2. If the user previously indicated termination (via stop_token)

Subclasses can override to add custom termination logic (e.g., LLM-based satisfaction checks) by calling super().is_done() first.

RETURNS	DESCRIPTION
`bool`	True if the user is done interacting, False to continue.

respond

respond(message: str) -> str

Respond to a message, potentially executing tools in a loop.

Uses a ReAct-style loop where the LLM can call tools and reason about the results before generating a final response.

PARAMETER	DESCRIPTION
`message`	The message from the agent to respond to. TYPE: `str`

RETURNS	DESCRIPTION
`str`	The user's final response after any tool execution, or
`str`	`exhausted_response` if done and configured.

RAISES	DESCRIPTION
`UserExhaustedError`	If the user is already done and no `exhausted_response` is configured.

Interfaces

Some integrations provide convenience user implementations for specific agent frameworks. See the framework-specific interface pages for details:

SmolAgents — SmolAgentLLMUser
LangGraph — LangGraphLLMUser
LlamaIndex — LlamaIndexLLMUser
CAMEL-AI — CamelLLMUser, CamelAgentUser

User

User

gather_config

gather_traces

get_initial_query abstractmethod

get_tool

is_done abstractmethod

respond abstractmethod

LLMUser

termination_reason property

__init__

gather_config

gather_traces

get_initial_query

get_tool

increment_turn

is_done

respond

AgenticLLMUser

termination_reason property

__init__

gather_config

gather_traces

get_initial_query

get_tool

increment_turn

is_done

respond

Interfaces

get_initial_query `abstractmethod`

is_done `abstractmethod`

respond `abstractmethod`

termination_reason `property`

init

termination_reason `property`

init