Exceptions

Exception classes for error classification in benchmark execution.

Exception Hierarchy

MASEvalError (base)
├── AgentError           - Agent violated contract (agent's fault)
├── EnvironmentError     - Environment/tool failed (not agent's fault)
├── UserError            - User simulator failed (not agent's fault)
└── TaskTimeoutError     - Task exceeded configured timeout

SimulatorError (base for simulators)
├── ToolSimulatorError   - Also inherits EnvironmentError
└── UserSimulatorError   - Also inherits UserError
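
The dual inheritance in the tree above can be checked with a few stand-in classes. This is a minimal sketch: the classes below only mirror the documented names and bases, their constructors are omitted, and `matching_bases` is a hypothetical helper, not part of the library.

```python
from typing import List

# Stand-in classes mirroring the documented hierarchy (constructors omitted).
class MASEvalError(Exception): ...
class AgentError(MASEvalError): ...
class EnvironmentError(MASEvalError): ...  # shadows the built-in on purpose
class UserError(MASEvalError): ...
class TaskTimeoutError(MASEvalError): ...

class SimulatorError(Exception): ...
class ToolSimulatorError(SimulatorError, EnvironmentError): ...
class UserSimulatorError(SimulatorError, UserError): ...


def matching_bases(exc: Exception) -> List[str]:
    """Return the names of the documented base classes that exc is an instance of."""
    bases = [MASEvalError, AgentError, EnvironmentError, UserError,
             TaskTimeoutError, SimulatorError]
    return [base.__name__ for base in bases if isinstance(exc, base)]


tool_failure = matching_bases(ToolSimulatorError("parse failed"))
user_failure = matching_bases(UserSimulatorError("parse failed"))
```

In this sketch a `ToolSimulatorError` is caught by `except SimulatorError`, `except EnvironmentError`, and `except MASEvalError` alike, which is what lets simulator failures take part in the same classification as other infrastructure errors.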

Core Exceptions

MASEvalError

Bases: Exception

Base exception for all MASEval-controlled component failures.

This is the base class for exceptions that occur at boundaries we control (tools, environment, user simulator). Errors from agent framework internals should NOT use this hierarchy - they remain as generic exceptions and are classified as UNKNOWN_EXECUTION_ERROR.

__init__

__init__(
    message: str,
    *,
    component: Optional[str] = None,
    details: Optional[Dict[str, Any]] = None,
)

Initialize MASEvalError.

PARAMETER	DESCRIPTION
`message`	Human-readable error description. TYPE: `str`
`component`	Name of the component that raised the error (e.g., tool name). TYPE: `Optional[str]` DEFAULT: `None`
`details`	Additional structured information about the error. TYPE: `Optional[Dict[str, Any]]` DEFAULT: `None`
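
As a rough illustration of the signature above, the sketch below uses a stand-in class with the same parameters. Storing `message`, `component`, and `details` as attributes is an assumption, and `describe` is a hypothetical helper.

```python
from typing import Any, Dict, Optional


class MASEvalError(Exception):
    """Stand-in with the documented signature; attribute storage is assumed."""

    def __init__(
        self,
        message: str,
        *,
        component: Optional[str] = None,
        details: Optional[Dict[str, Any]] = None,
    ) -> None:
        super().__init__(message)
        self.message = message
        self.component = component
        self.details = details or {}


def describe(exc: MASEvalError) -> str:
    """Format an error as a single log line (hypothetical helper)."""
    where = exc.component or "unknown component"
    return f"[{where}] {exc.message} {exc.details}"


try:
    raise MASEvalError("Lookup failed", component="db_tool", details={"attempts": 3})
except MASEvalError as exc:
    line = describe(exc)
```

Because of the bare `*` in the signature, `component` and `details` must be passed by keyword; `MASEvalError("Lookup failed", "db_tool")` raises a `TypeError`.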

AgentError

Bases: MASEvalError

Agent violated the contract at a boundary we control.

Raised when the agent provides invalid inputs to components we control. This is the agent's fault, so these tasks count against the agent's score.

The suggestion field provides agent-friendly hints for self-correction that some agent frameworks may use for automatic recovery.

When to raise

Agent passed wrong argument types to a tool
Agent passed arguments that violate documented constraints
Agent omitted required arguments
Agent called a tool with semantically invalid input
Agent exceeded documented limits (max retries, rate limits, etc.)

Examples:

# Wrong type with suggestion
raise AgentError(
    "Expected int for 'count', got str",
    component="search_tool",
    suggestion="Provide count as a number, e.g., count=10"
)

# Missing required argument
raise AgentError(
    "Missing required argument 'query'",
    component="search_tool",
    suggestion="Include query='your search terms'"
)

# Constraint violation
raise AgentError(
    "Argument 'limit' must be positive, got -5",
    component="fetch_tool",
    suggestion="Use a positive value, e.g., limit=10"
)

__init__

__init__(
    message: str,
    *,
    component: Optional[str] = None,
    details: Optional[Dict[str, Any]] = None,
    suggestion: Optional[str] = None,
)

Initialize AgentError.

PARAMETER	DESCRIPTION
`message`	Human-readable error description explaining what went wrong. TYPE: `str`
`component`	Name of the component that raised the error (e.g., tool name). TYPE: `Optional[str]` DEFAULT: `None`
`details`	Additional structured information about the error. TYPE: `Optional[Dict[str, Any]]` DEFAULT: `None`
`suggestion`	Agent-friendly hint for correcting the error. Some agent frameworks use this for automatic retry with corrected inputs. TYPE: `Optional[str]` DEFAULT: `None`
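
How a framework might consume `suggestion` is not specified here. The sketch below shows one possible shape, assuming the hint is stored as an attribute; `search_tool` and `call_with_feedback` are hypothetical, and the list of candidate inputs stands in for successive model outputs.

```python
from typing import Any, Callable, Dict, List, Optional


class AgentError(Exception):
    """Stand-in with the documented signature; attribute storage is assumed."""

    def __init__(
        self,
        message: str,
        *,
        component: Optional[str] = None,
        details: Optional[Dict[str, Any]] = None,
        suggestion: Optional[str] = None,
    ) -> None:
        super().__init__(message)
        self.component = component
        self.details = details or {}
        self.suggestion = suggestion


def search_tool(count: Any) -> str:
    """Hypothetical tool that validates its input before doing any work."""
    if isinstance(count, bool) or not isinstance(count, int):
        raise AgentError(
            f"Expected int for 'count', got {type(count).__name__}",
            component="search_tool",
            suggestion="Provide count as a number, e.g., count=10",
        )
    return f"{count} results"


def call_with_feedback(tool: Callable[[Any], str], attempts: List[Any]) -> Dict[str, Any]:
    """Try each candidate input in turn, collecting hints from failed calls."""
    hints: List[str] = []
    for value in attempts:
        try:
            return {"result": tool(value), "hints": hints}
        except AgentError as exc:
            # A real framework would feed this hint back to the model.
            hints.append(exc.suggestion or str(exc))
    return {"result": None, "hints": hints}


outcome = call_with_feedback(search_tool, ["10", 10])
```

The first call fails and records the hint; the second call succeeds.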

EnvironmentError

Bases: MASEvalError

Environment or tool infrastructure failed.

Raised when our code fails AFTER validating agent inputs. This indicates a problem with the evaluation infrastructure, not the agent's behavior. These tasks should be excluded from agent scoring.

When to raise

Tool implementation has a bug
External API/database our tool depends on failed
ToolLLMSimulator failed to parse model output
Model adapter for tool simulation failed
Resource exhaustion in environment components
File I/O errors in environment setup
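
The "after validating agent inputs" boundary can be sketched as a tool with two phases. Everything below is illustrative: the classes are stand-ins, and `divide_tool` and `flaky_backend` are hypothetical names.

```python
class MASEvalError(Exception):
    """Stand-in base class; attribute storage is assumed."""

    def __init__(self, message, *, component=None, details=None):
        super().__init__(message)
        self.component = component
        self.details = details or {}


class AgentError(MASEvalError): ...
class EnvironmentError(MASEvalError): ...  # stand-in; shadows the built-in


def flaky_backend(a: float, b: float) -> float:
    """Hypothetical dependency that fails for reasons unrelated to the agent."""
    raise ConnectionError("backend unavailable")


def divide_tool(a, b, backend=flaky_backend):
    # Phase 1: problems with the inputs are the agent's fault.
    if b == 0:
        raise AgentError("Argument 'b' must be non-zero", component="divide_tool")
    # Phase 2: anything that fails after validation is an infrastructure failure.
    try:
        return backend(a, b)
    except Exception as exc:
        raise EnvironmentError(
            "Backend call failed",
            component="divide_tool",
            details={"cause": repr(exc)},
        ) from exc
```

Chaining with `from exc` keeps the original failure available as `__cause__` for debugging.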

Examples:

# Tool bug
raise EnvironmentError("Internal error in calculation", component="calc_tool")

# External dependency failed
raise EnvironmentError("Database connection failed", component="db_tool")

# Simulator failed
raise EnvironmentError(
    "Failed to parse LLM response after 3 attempts",
    component="flight_search",
    details={"attempts": 3, "last_error": "Invalid JSON"}
)

Note

Python has a built-in EnvironmentError (alias for OSError), but it's rarely used directly. This class shadows it intentionally for clean semantics. If you need the built-in, use OSError explicitly.
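
The practical effect of the shadowing can be seen in a small sketch. The classes are stand-ins for the ones documented here, and `handler_for` is a hypothetical helper.

```python
import builtins

# On Python 3.3+, the built-in name is an alias for OSError.
builtin_is_alias = builtins.EnvironmentError is OSError


class MASEvalError(Exception): ...
class EnvironmentError(MASEvalError): ...  # stand-in for the class documented here


def handler_for(exc: Exception) -> str:
    """Report which except clause catches exc."""
    try:
        raise exc
    except OSError:
        return "OSError"
    except EnvironmentError:  # resolves to the shadowing class defined above
        return "EnvironmentError"


os_case = handler_for(FileNotFoundError("missing.txt"))
maseval_case = handler_for(EnvironmentError("Database connection failed"))
```

Once the name is shadowed, `except EnvironmentError` no longer catches file or OS errors, so code that needs those should catch `OSError` by name.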

__init__

__init__(
    message: str,
    *,
    component: Optional[str] = None,
    details: Optional[Dict[str, Any]] = None,
)

Initialize EnvironmentError (same signature as MASEvalError).

PARAMETER	DESCRIPTION
`message`	Human-readable error description. TYPE: `str`
`component`	Name of the component that raised the error (e.g., tool name). TYPE: `Optional[str]` DEFAULT: `None`
`details`	Additional structured information about the error. TYPE: `Optional[Dict[str, Any]]` DEFAULT: `None`

UserError

Bases: MASEvalError

User simulator failed.

Raised when the user simulation infrastructure fails. This is NOT the agent's fault - these tasks should be excluded from agent scoring.

When to raise

UserLLMSimulator couldn't reach the LLM API
User model returned unparseable response after retries
User simulator configuration error
User profile data is malformed

Examples:

# API failure
raise UserError("OpenAI API unreachable", component="user_simulator")

# Parse failure
raise UserError(
    "Failed to parse user response after 3 attempts",
    component="user_simulator",
    details={"attempts": 3, "last_error": "Missing 'text' field"}
)
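
The scoring rule stated for AgentError, EnvironmentError, and UserError can be sketched as a success rate that leaves infrastructure failures out of the denominator. The status strings `SUCCESS` and `AGENT_ERROR` and the helper `success_rate` are assumptions for illustration; how other statuses are scored is not covered here.

```python
from typing import Iterable, Optional

# Statuses that are not the agent's fault and are excluded from scoring.
EXCLUDED = {"ENVIRONMENT_ERROR", "USER_ERROR"}


def success_rate(statuses: Iterable[str]) -> Optional[float]:
    """Fraction of scored tasks that succeeded, or None if nothing was scored."""
    scored = [status for status in statuses if status not in EXCLUDED]
    if not scored:
        return None
    return sum(status == "SUCCESS" for status in scored) / len(scored)


rate = success_rate(
    ["SUCCESS", "AGENT_ERROR", "USER_ERROR", "ENVIRONMENT_ERROR", "SUCCESS"]
)
```

Here three tasks are scored (two successes and one agent error), so the two infrastructure failures do not lower the result.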

__init__

__init__(
    message: str,
    *,
    component: Optional[str] = None,
    details: Optional[Dict[str, Any]] = None,
)

Initialize UserError (same signature as MASEvalError).

PARAMETER	DESCRIPTION
`message`	Human-readable error description. TYPE: `str`
`component`	Name of the component that raised the error (e.g., tool name). TYPE: `Optional[str]` DEFAULT: `None`
`details`	Additional structured information about the error. TYPE: `Optional[Dict[str, Any]]` DEFAULT: `None`

TaskTimeoutError

Bases: MASEvalError

Task execution exceeded configured timeout.

This is classified as TASK_TIMEOUT in benchmark results, separate from other error types. A timeout is neither the agent's fault nor the infrastructure's fault; it is a resource constraint.

When to raise

Task execution time exceeds TaskProtocol.timeout_seconds
Cooperative timeout check detects deadline has passed
Hard timeout backstop triggers

ATTRIBUTE	DESCRIPTION
`elapsed`	Time elapsed, in seconds, before the timeout was detected.
`timeout`	The configured timeout value in seconds.
`partial_traces`	Any traces collected before timeout occurred.

Examples:

# Cooperative timeout at checkpoint
raise TaskTimeoutError(
    "Task exceeded 60s deadline",
    component="execution_loop",
    elapsed=62.5,
    timeout=60.0
)

# Hard timeout with partial traces
raise TaskTimeoutError(
    "Task exceeded 120s hard deadline",
    component="timeout_backstop",
    elapsed=125.0,
    timeout=120.0,
    partial_traces={"agents": {"main": {"messages": [...]}}}
)

__init__

__init__(
    message: str,
    *,
    component: Optional[str] = None,
    details: Optional[Dict[str, Any]] = None,
    elapsed: float = 0.0,
    timeout: float = 0.0,
    partial_traces: Optional[Dict[str, Any]] = None,
)

Initialize TaskTimeoutError.

PARAMETER	DESCRIPTION
`message`	Human-readable error description. TYPE: `str`
`component`	Name of the component that raised the error. TYPE: `Optional[str]` DEFAULT: `None`
`details`	Additional structured information about the error. TYPE: `Optional[Dict[str, Any]]` DEFAULT: `None`
`elapsed`	Time elapsed, in seconds, before the timeout was detected. TYPE: `float` DEFAULT: `0.0`
`timeout`	The configured timeout value in seconds. TYPE: `float` DEFAULT: `0.0`
`partial_traces`	Any traces collected before timeout occurred. TYPE: `Optional[Dict[str, Any]]` DEFAULT: `None`
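
A cooperative timeout check might look like the following sketch. It assumes the exception stores its keyword arguments as attributes; `run_steps`, the injectable `clock`, and the `completed_steps` trace key are hypothetical and chosen so the demo is deterministic.

```python
import time
from typing import Callable, List


class TaskTimeoutError(Exception):
    """Stand-in with the documented signature; attribute storage is assumed."""

    def __init__(self, message, *, component=None, details=None,
                 elapsed=0.0, timeout=0.0, partial_traces=None):
        super().__init__(message)
        self.component = component
        self.details = details or {}
        self.elapsed = elapsed
        self.timeout = timeout
        self.partial_traces = partial_traces or {}


def run_steps(steps: List[Callable[[], str]], timeout: float,
              clock: Callable[[], float] = time.monotonic) -> List[str]:
    """Run steps in order, checking the deadline at a checkpoint before each one."""
    start = clock()
    done: List[str] = []
    for step in steps:
        elapsed = clock() - start
        if elapsed > timeout:
            raise TaskTimeoutError(
                f"Task exceeded {timeout}s deadline",
                component="execution_loop",
                elapsed=elapsed,
                timeout=timeout,
                partial_traces={"completed_steps": list(done)},
            )
        done.append(step())
    return done


ticks = iter([0.0, 1.0, 10.0])  # fake clock readings for a deterministic demo
try:
    run_steps([lambda: "a", lambda: "b"], timeout=5.0, clock=lambda: next(ticks))
except TaskTimeoutError as exc:
    timed_out = exc
```

A cooperative check only fires at checkpoints, so a single long-running step can overshoot the deadline; that gap is what a hard timeout backstop covers.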

Simulator Exceptions

SimulatorError

Bases: Exception

Base exception for simulator failures.

This exception is raised when an LLM simulator exhausts all retry attempts without successfully parsing the model output.

Note

Subclasses (ToolSimulatorError, UserSimulatorError) inherit from the appropriate MASEval exception type for proper error classification. Use those specific subclasses in concrete simulators.

ATTRIBUTE	DESCRIPTION
`message`	Description of the failure.
`attempts`	Number of attempts made before failing.
`last_error`	The last error encountered during parsing.
`logs`	The complete log of all attempts for debugging.
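
The attributes above suggest a retry loop along the following lines. This is a sketch under assumptions: the constructor signature is not documented here and is invented for illustration, as are `parse_with_retries` and the shape of each log entry. A concrete simulator would raise ToolSimulatorError or UserSimulatorError rather than the base class.

```python
import json
from typing import Any, Callable, Dict, List, Optional


class SimulatorError(Exception):
    """Stand-in; the constructor signature is assumed, only the attributes are documented."""

    def __init__(self, message: str, attempts: int = 0,
                 last_error: Optional[str] = None,
                 logs: Optional[List[Dict[str, Any]]] = None) -> None:
        super().__init__(message)
        self.message = message
        self.attempts = attempts
        self.last_error = last_error
        self.logs = logs or []


def parse_with_retries(generate: Callable[[], str], max_attempts: int = 3) -> Any:
    """Ask for output up to max_attempts times until it parses as JSON."""
    logs: List[Dict[str, Any]] = []
    last_error: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        raw = generate()
        try:
            return json.loads(raw)
        except ValueError as exc:  # json.JSONDecodeError subclasses ValueError
            last_error = str(exc)
            logs.append({"attempt": attempt, "raw": raw, "error": last_error})
    raise SimulatorError(
        f"Failed to parse LLM response after {max_attempts} attempts",
        attempts=max_attempts,
        last_error=last_error,
        logs=logs,
    )


replies = iter(["not json", "{broken", "still bad"])  # canned model outputs
try:
    parse_with_retries(lambda: next(replies))
except SimulatorError as exc:
    failure = exc
```

Keeping the raw output of every attempt in `logs` makes it possible to see afterwards whether the model was consistently wrong or only occasionally malformed.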

ToolSimulatorError

Bases: SimulatorError, EnvironmentError

Tool simulator failed - not the agent's fault.

Raised when ToolLLMSimulator fails after exhausting retries. This inherits from EnvironmentError, so it's classified as ENVIRONMENT_ERROR in benchmark results.

UserSimulatorError

Bases: SimulatorError, UserError

User simulator failed - not the agent's fault.

Raised when UserLLMSimulator fails after exhausting retries. This inherits from UserError, so it's classified as USER_ERROR in benchmark results.
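
Classification by inheritance can be sketched with `isinstance` checks. The classes are stand-ins, `classify` is a hypothetical function, and the status name `AGENT_ERROR` is an assumption; the other status names appear in this reference.

```python
class MASEvalError(Exception): ...
class AgentError(MASEvalError): ...
class EnvironmentError(MASEvalError): ...  # stand-in; shadows the built-in
class UserError(MASEvalError): ...
class TaskTimeoutError(MASEvalError): ...

class SimulatorError(Exception): ...
class ToolSimulatorError(SimulatorError, EnvironmentError): ...
class UserSimulatorError(SimulatorError, UserError): ...


def classify(exc: BaseException) -> str:
    """Map an exception to a result status using the class hierarchy."""
    if isinstance(exc, TaskTimeoutError):
        return "TASK_TIMEOUT"
    if isinstance(exc, AgentError):
        return "AGENT_ERROR"  # assumed status name
    if isinstance(exc, EnvironmentError):
        return "ENVIRONMENT_ERROR"
    if isinstance(exc, UserError):
        return "USER_ERROR"
    # Errors from outside the hierarchy, e.g. agent framework internals.
    return "UNKNOWN_EXECUTION_ERROR"


statuses = [
    classify(ToolSimulatorError("tool parse failed")),
    classify(UserSimulatorError("user parse failed")),
    classify(ValueError("framework bug")),
]
```

No branch mentions the simulator subclasses by name; they fall into the right status through their second base class.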

Validation Helpers

These functions simplify input validation and raise AgentError with helpful suggestions:

validate_argument_type

validate_argument_type(
    value: Any,
    expected_type: str,
    arg_name: str,
    component: Optional[str] = None,
) -> None

Validate that a value matches an expected JSON schema type.

Raises AgentError if validation fails.

PARAMETER	DESCRIPTION
`value`	The value to validate. TYPE: `Any`
`expected_type`	JSON schema type ("string", "integer", "number", "boolean", "array", "object"). TYPE: `str`
`arg_name`	Name of the argument (for error message). TYPE: `str`
`component`	Optional component name for error context. TYPE: `Optional[str]` DEFAULT: `None`

RAISES	DESCRIPTION
`AgentError`	If value doesn't match expected type.

Example

def my_tool(count: int, name: str):
    validate_argument_type(count, "integer", "count", "my_tool")
    validate_argument_type(name, "string", "name", "my_tool")
    # ... tool logic
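
For readers curious how such a check could work, here is a minimal sketch. It is not the library's implementation: `check_type`, the `JSON_TYPES` mapping, and the decision to reject booleans for numeric types are all assumptions.

```python
from typing import Any, Dict, Optional


class AgentError(Exception):
    """Stand-in with the documented signature; attribute storage is assumed."""

    def __init__(self, message: str, *, component: Optional[str] = None,
                 details: Optional[Dict[str, Any]] = None,
                 suggestion: Optional[str] = None) -> None:
        super().__init__(message)
        self.component = component
        self.details = details or {}
        self.suggestion = suggestion


# Assumed mapping from JSON schema type names to Python types.
JSON_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "array": list,
    "object": dict,
}


def check_type(value: Any, expected_type: str, arg_name: str,
               component: Optional[str] = None) -> None:
    """Raise AgentError if value does not match the JSON schema type name."""
    matches = isinstance(value, JSON_TYPES[expected_type])
    # bool is a subclass of int in Python, so True would otherwise pass as an integer.
    if expected_type in ("integer", "number") and isinstance(value, bool):
        matches = False
    if not matches:
        raise AgentError(
            f"Expected {expected_type} for '{arg_name}', got {type(value).__name__}",
            component=component,
            suggestion=f"Provide '{arg_name}' as a JSON {expected_type}",
        )
```

The boolean special case matters because `isinstance(True, int)` is true in Python, while JSON schema treats booleans and integers as distinct types.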

validate_required_arguments

validate_required_arguments(
    kwargs: Dict[str, Any],
    required: List[str],
    component: Optional[str] = None,
) -> None

Validate that all required arguments are present.

Raises AgentError if any required argument is missing.

PARAMETER	DESCRIPTION
`kwargs`	The keyword arguments dict to validate. TYPE: `Dict[str, Any]`
`required`	List of required argument names. TYPE: `List[str]`
`component`	Optional component name for error context. TYPE: `Optional[str]` DEFAULT: `None`

RAISES	DESCRIPTION
`AgentError`	If any required argument is missing.

Example

def my_tool(**kwargs):
    validate_required_arguments(kwargs, ["query", "limit"], "my_tool")
    # ... tool logic

validate_no_extra_arguments

validate_no_extra_arguments(
    kwargs: Dict[str, Any],
    allowed: List[str],
    component: Optional[str] = None,
) -> None

Validate that no unexpected arguments are present.

Raises AgentError if any argument is not in the allowed list.

PARAMETER	DESCRIPTION
`kwargs`	The keyword arguments dict to validate. TYPE: `Dict[str, Any]`
`allowed`	List of allowed argument names. TYPE: `List[str]`
`component`	Optional component name for error context. TYPE: `Optional[str]` DEFAULT: `None`

RAISES	DESCRIPTION
`AgentError`	If any unexpected argument is present.

Example

def my_tool(**kwargs):
    validate_no_extra_arguments(kwargs, ["query", "limit"], "my_tool")
    # ... tool logic
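
The two presence checks (required arguments and unexpected arguments) reduce to simple membership tests. The sketch below is illustrative only: `check_required`, `check_no_extras`, the message wording, and the `details` keys are assumptions.

```python
from typing import Any, Dict, List, Optional


class AgentError(Exception):
    """Stand-in with the documented signature; attribute storage is assumed."""

    def __init__(self, message: str, *, component: Optional[str] = None,
                 details: Optional[Dict[str, Any]] = None,
                 suggestion: Optional[str] = None) -> None:
        super().__init__(message)
        self.component = component
        self.details = details or {}
        self.suggestion = suggestion


def check_required(kwargs: Dict[str, Any], required: List[str],
                   component: Optional[str] = None) -> None:
    """Raise AgentError if any required argument name is absent from kwargs."""
    missing = [name for name in required if name not in kwargs]
    if missing:
        raise AgentError(
            "Missing required argument(s): " + ", ".join(missing),
            component=component,
            details={"missing": missing},
        )


def check_no_extras(kwargs: Dict[str, Any], allowed: List[str],
                    component: Optional[str] = None) -> None:
    """Raise AgentError if kwargs contains a name outside the allowed list."""
    extra = [name for name in kwargs if name not in allowed]
    if extra:
        raise AgentError(
            "Unexpected argument(s): " + ", ".join(extra),
            component=component,
            details={"unexpected": extra},
            suggestion="Allowed arguments: " + ", ".join(allowed),
        )
```

Note that the presence check looks at keys only, so an argument explicitly set to `None` still counts as present in this sketch.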

validate_arguments_from_schema

validate_arguments_from_schema(
    kwargs: Dict[str, Any],
    schema: Dict[str, Any],
    component: Optional[str] = None,
    *,
    strict: bool = False,
) -> None

Validate arguments against a JSON schema.

This is the main validation function for tool implementers. It validates:

Required arguments are present
Argument types match the schema
No extra arguments (if strict=True)

PARAMETER	DESCRIPTION
`kwargs`	The keyword arguments dict to validate. TYPE: `Dict[str, Any]`
`schema`	JSON schema with 'properties' and optionally 'required'. TYPE: `Dict[str, Any]`
`component`	Optional component name for error context. TYPE: `Optional[str]` DEFAULT: `None`
`strict`	If True, reject arguments not in schema. TYPE: `bool` DEFAULT: `False`

RAISES	DESCRIPTION
`AgentError`	If validation fails.

Example

SCHEMA = {
    "properties": {
        "query": {"type": "string"},
        "limit": {"type": "integer"},
    },
    "required": ["query"],
}

def my_tool(**kwargs):
    validate_arguments_from_schema(kwargs, SCHEMA, "my_tool")
    # ... tool logic
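
To make the effect of `strict` concrete, the sketch below combines the three checks in one function. It is an approximation under assumptions, not the library's code: `check_against_schema`, `JSON_TYPES`, and the handling of properties without a recognized type are all hypothetical.

```python
from typing import Any, Dict, Optional


class AgentError(Exception):
    """Stand-in with the documented signature; attribute storage is assumed."""

    def __init__(self, message: str, *, component: Optional[str] = None,
                 details: Optional[Dict[str, Any]] = None,
                 suggestion: Optional[str] = None) -> None:
        super().__init__(message)
        self.component = component
        self.details = details or {}
        self.suggestion = suggestion


JSON_TYPES = {"string": str, "integer": int, "number": (int, float),
              "boolean": bool, "array": list, "object": dict}

SCHEMA = {
    "properties": {"query": {"type": "string"}, "limit": {"type": "integer"}},
    "required": ["query"],
}


def check_against_schema(kwargs: Dict[str, Any], schema: Dict[str, Any],
                         component: Optional[str] = None, *,
                         strict: bool = False) -> None:
    properties = schema.get("properties", {})
    # 1. Required arguments are present.
    missing = [name for name in schema.get("required", []) if name not in kwargs]
    if missing:
        raise AgentError(f"Missing required argument(s): {missing}", component=component)
    for name, value in kwargs.items():
        if name not in properties:
            # 3. Extra arguments are rejected only in strict mode.
            if strict:
                raise AgentError(f"Unexpected argument '{name}'", component=component)
            continue
        # 2. Argument types match the schema.
        expected = properties[name].get("type")
        python_type = JSON_TYPES.get(expected)
        if python_type is None:
            continue  # no recognized type constraint for this property
        bool_as_number = isinstance(value, bool) and expected in ("integer", "number")
        if bool_as_number or not isinstance(value, python_type):
            raise AgentError(
                f"Expected {expected} for '{name}', got {type(value).__name__}",
                component=component,
            )
```

With this sketch, `{"query": "x", "page": 2}` passes by default and fails only when `strict=True`.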