Exceptions

Exception classes for error classification in benchmark execution.

Exception Hierarchy

MASEvalError (base)
├── AgentError           - Agent violated contract (agent's fault)
├── EnvironmentError     - Environment/tool failed (not agent's fault)
├── UserError            - User simulator failed (not agent's fault)
└── TaskTimeoutError     - Task exceeded configured timeout

SimulatorError (base for simulators)
├── ToolSimulatorError   - Also inherits EnvironmentError
└── UserSimulatorError   - Also inherits UserError
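
The two trees above meet through multiple inheritance. The following is a minimal sketch of that shape using stand-in classes, not the library's actual source: the class bodies, the stored attributes, and the helper caught_as are assumptions made for illustration.

```python
from typing import Any, Dict, List, Optional


class MASEvalError(Exception):
    """Stand-in base class (hypothetical body, documented signature)."""

    def __init__(
        self,
        message: str,
        *,
        component: Optional[str] = None,
        details: Optional[Dict[str, Any]] = None,
    ) -> None:
        super().__init__(message)
        self.component = component  # assumption: parameters become attributes
        self.details = details or {}


class EnvironmentError(MASEvalError):  # shadows the built-in alias of OSError
    """Environment/tool failed."""


class UserError(MASEvalError):
    """User simulator failed."""


class SimulatorError(Exception):
    """Base for simulators; sits outside the MASEvalError tree."""


class ToolSimulatorError(SimulatorError, EnvironmentError):
    """Caught by handlers for SimulatorError or EnvironmentError."""


class UserSimulatorError(SimulatorError, UserError):
    """Caught by handlers for SimulatorError or UserError."""


def caught_as(exc: Exception) -> List[str]:
    """Return the names of the base classes whose handlers would catch exc."""
    bases = (MASEvalError, EnvironmentError, UserError, SimulatorError)
    return [base.__name__ for base in bases if isinstance(exc, base)]


names = caught_as(ToolSimulatorError("parse failed", component="flight_search"))
```

In this sketch, names contains MASEvalError, EnvironmentError, and SimulatorError, so one except clause for EnvironmentError also handles tool simulator failures.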

Core Exceptions

MASEvalError

Bases: Exception

Base exception for all MASEval-controlled component failures.

This is the base class for exceptions that occur at boundaries we control (tools, environment, user simulator). Errors from agent framework internals should NOT use this hierarchy - they remain as generic exceptions and are classified as UNKNOWN_EXECUTION_ERROR.

__init__

__init__(
    message: str,
    *,
    component: Optional[str] = None,
    details: Optional[Dict[str, Any]] = None,
)

Initialize MASEvalError.

PARAMETER DESCRIPTION
message

Human-readable error description.

TYPE: str

component

Name of the component that raised the error (e.g., tool name).

TYPE: Optional[str] DEFAULT: None

details

Additional structured information about the error.

TYPE: Optional[Dict[str, Any]] DEFAULT: None

AgentError

Bases: MASEvalError

Agent violated the contract at a boundary we control.

Raised when the agent provides invalid inputs to components we control. This is the agent's fault - these tasks count against their score.

The suggestion field provides agent-friendly hints for self-correction that some agent frameworks may use for automatic recovery.

When to raise
  • Agent passed wrong argument types to a tool
  • Agent passed arguments that violate documented constraints
  • Agent omitted required arguments
  • Agent called a tool with semantically invalid input
  • Agent exceeded documented limits (max retries, rate limits, etc.)

Examples:

# Wrong type with suggestion
raise AgentError(
    "Expected int for 'count', got str",
    component="search_tool",
    suggestion="Provide count as a number, e.g., count=10"
)

# Missing required argument
raise AgentError(
    "Missing required argument 'query'",
    component="search_tool",
    suggestion="Include query='your search terms'"
)

# Constraint violation
raise AgentError(
    "Argument 'limit' must be positive, got -5",
    component="fetch_tool",
    suggestion="Use a positive value, e.g., limit=10"
)

__init__

__init__(
    message: str,
    *,
    component: Optional[str] = None,
    details: Optional[Dict[str, Any]] = None,
    suggestion: Optional[str] = None,
)

Initialize AgentError.

PARAMETER DESCRIPTION
message

Human-readable error description explaining what went wrong.

TYPE: str

component

Name of the component that raised the error (e.g., tool name).

TYPE: Optional[str] DEFAULT: None

details

Additional structured information about the error.

TYPE: Optional[Dict[str, Any]] DEFAULT: None

suggestion

Agent-friendly hint for correcting the error. Some agent frameworks use this for automatic retry with corrected inputs.

TYPE: Optional[str] DEFAULT: None
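
As an illustration of how a framework might use the suggestion field for self-correction, here is a minimal sketch. The AgentError class below is a stand-in that mirrors the documented signature; storing suggestion as an attribute, the function run_with_self_correction, and the toy agent are all assumptions, not part of the library.

```python
from typing import Any, Callable, Dict, Optional


class AgentError(Exception):
    """Stand-in mirroring the documented constructor (hypothetical body)."""

    def __init__(self, message, *, component=None, details=None, suggestion=None):
        super().__init__(message)
        self.component = component
        self.details = details or {}
        self.suggestion = suggestion  # assumption: exposed as an attribute


def search_tool(count: Any) -> str:
    """Toy tool that rejects non-integer counts with a hint."""
    if isinstance(count, bool) or not isinstance(count, int):
        raise AgentError(
            f"Expected int for 'count', got {type(count).__name__}",
            component="search_tool",
            suggestion="Provide count as a number, e.g., count=10",
        )
    return f"{count} results"


def run_with_self_correction(
    tool: Callable[..., Any],
    propose_args: Callable[[Optional[str]], Dict[str, Any]],
    max_attempts: int = 3,
) -> Any:
    """Call tool, feeding each AgentError hint back into the next proposal."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    hint: Optional[str] = None
    last_error: Optional[AgentError] = None
    for _ in range(max_attempts):
        try:
            return tool(**propose_args(hint))
        except AgentError as err:
            last_error = err
            hint = err.suggestion or str(err)
    raise last_error  # still failing after all attempts


def toy_agent(hint: Optional[str]) -> Dict[str, Any]:
    """Hypothetical agent: sends a string first, then corrects after a hint."""
    return {"count": "10"} if hint is None else {"count": 10}


result = run_with_self_correction(search_tool, toy_agent)
```

The first call fails with the wrong type, the hint is passed back, and the second call succeeds. A real framework would route the hint to the model rather than to a fixed function.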

EnvironmentError

Bases: MASEvalError

Environment or tool infrastructure failed.

Raised when our code fails AFTER validating agent inputs. This indicates a problem with the evaluation infrastructure, not the agent's behavior. These tasks should be excluded from agent scoring.

When to raise
  • Tool implementation has a bug
  • External API/database our tool depends on failed
  • ToolLLMSimulator failed to parse model output
  • Model adapter for tool simulation failed
  • Resource exhaustion in environment components
  • File I/O errors in environment setup

Examples:

# Tool bug
raise EnvironmentError("Internal error in calculation", component="calc_tool")

# External dependency failed
raise EnvironmentError("Database connection failed", component="db_tool")

# Simulator failed
raise EnvironmentError(
    "Failed to parse LLM response after 3 attempts",
    component="flight_search",
    details={"attempts": 3, "last_error": "Invalid JSON"}
)

Note

Python has a built-in EnvironmentError (alias for OSError), but it's rarely used directly. This class shadows it intentionally for clean semantics. If you need the built-in, use OSError explicitly.
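
The shadowing described in this note can be sketched as follows. The classes are stand-ins and read_config is a hypothetical helper; only the alias between the built-in EnvironmentError and OSError is standard Python behavior.

```python
import builtins

# In Python 3.3+ the built-in name is just another name for OSError.
builtin_is_alias = builtins.EnvironmentError is OSError


class MASEvalError(Exception):
    """Stand-in base class."""


class EnvironmentError(MASEvalError):  # from here on, shadows the built-in
    """Stand-in for the evaluation-infrastructure error."""


def read_config(path: str) -> str:
    """Hypothetical setup step that converts OS-level failures."""
    try:
        with open(path) as handle:
            return handle.read()
    except OSError as exc:  # name the built-in explicitly, as the note advises
        raise EnvironmentError(f"Could not read {path}") from exc
```

Catching OSError explicitly keeps the two meanings apart: the handler targets operating-system failures, while the raised exception belongs to the MASEvalError tree and keeps the original error as its cause.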

__init__

__init__(
    message: str,
    *,
    component: Optional[str] = None,
    details: Optional[Dict[str, Any]] = None,
)

Initialize EnvironmentError (constructor inherited from MASEvalError).

PARAMETER DESCRIPTION
message

Human-readable error description.

TYPE: str

component

Name of the component that raised the error (e.g., tool name).

TYPE: Optional[str] DEFAULT: None

details

Additional structured information about the error.

TYPE: Optional[Dict[str, Any]] DEFAULT: None
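
The rule that EnvironmentError applies only after agent inputs have been validated suggests a two-phase tool body. The sketch below assumes stand-in exception classes and a hypothetical lookup_price tool whose fetch callable represents an external dependency.

```python
from typing import Any, Callable


class MASEvalError(Exception):
    """Stand-in base class mirroring the documented signature."""

    def __init__(self, message, *, component=None, details=None):
        super().__init__(message)
        self.component = component
        self.details = details or {}


class AgentError(MASEvalError):
    """Stand-in: the agent supplied invalid input."""

    def __init__(self, message, *, component=None, details=None, suggestion=None):
        super().__init__(message, component=component, details=details)
        self.suggestion = suggestion


class EnvironmentError(MASEvalError):
    """Stand-in: infrastructure failed after input validation."""


def lookup_price(fetch: Callable[[str], float], item: Any) -> float:
    """Hypothetical tool: validate first, then call the external dependency."""
    # Phase 1: anything wrong with the agent's input is an AgentError.
    if not isinstance(item, str) or not item:
        raise AgentError(
            "Argument 'item' must be a non-empty string",
            component="price_tool",
            suggestion="Pass the item name as text, e.g., item='apple'",
        )
    # Phase 2: the input is valid, so failures here are not the agent's fault.
    try:
        return fetch(item)
    except Exception as exc:
        raise EnvironmentError(
            "Price backend lookup failed",
            component="price_tool",
            details={"last_error": repr(exc)},
        ) from exc
```

Ordering the checks this way means a broken backend cannot be mistaken for a bad argument, which matters because only AgentError counts against the agent's score.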

UserError

Bases: MASEvalError

User simulator failed.

Raised when the user simulation infrastructure fails. This is NOT the agent's fault - these tasks should be excluded from agent scoring.

When to raise
  • UserLLMSimulator couldn't reach the LLM API
  • User model returned unparseable response after retries
  • User simulator configuration error
  • User profile data is malformed

Examples:

# API failure
raise UserError("OpenAI API unreachable", component="user_simulator")

# Parse failure
raise UserError(
    "Failed to parse user response after 3 attempts",
    component="user_simulator",
    details={"attempts": 3, "last_error": "Missing 'text' field"}
)

__init__

__init__(
    message: str,
    *,
    component: Optional[str] = None,
    details: Optional[Dict[str, Any]] = None,
)

Initialize UserError (constructor inherited from MASEvalError).

PARAMETER DESCRIPTION
message

Human-readable error description.

TYPE: str

component

Name of the component that raised the error (e.g., tool name).

TYPE: Optional[str] DEFAULT: None

details

Additional structured information about the error.

TYPE: Optional[Dict[str, Any]] DEFAULT: None

TaskTimeoutError

Bases: MASEvalError

Task execution exceeded configured timeout.

This is classified as TASK_TIMEOUT in benchmark results, separate from other error types. A timeout is neither the agent's fault nor the infrastructure's fault; it is a resource constraint.

When to raise
  • Task execution time exceeds TaskProtocol.timeout_seconds
  • Cooperative timeout check detects deadline has passed
  • Hard timeout backstop triggers

ATTRIBUTE DESCRIPTION
elapsed

Time elapsed, in seconds, before the timeout was detected.

timeout

The configured timeout value in seconds.

partial_traces

Any traces collected before timeout occurred.

Examples:

# Cooperative timeout at checkpoint
raise TaskTimeoutError(
    "Task exceeded 60s deadline",
    component="execution_loop",
    elapsed=62.5,
    timeout=60.0
)

# Hard timeout with partial traces
raise TaskTimeoutError(
    "Task exceeded 120s hard deadline",
    component="timeout_backstop",
    elapsed=125.0,
    timeout=120.0,
    partial_traces={"agents": {"main": {"messages": [...]}}}
)

__init__

__init__(
    message: str,
    *,
    component: Optional[str] = None,
    details: Optional[Dict[str, Any]] = None,
    elapsed: float = 0.0,
    timeout: float = 0.0,
    partial_traces: Optional[Dict[str, Any]] = None,
)

Initialize TaskTimeoutError.

PARAMETER DESCRIPTION
message

Human-readable error description.

TYPE: str

component

Name of the component that raised the error.

TYPE: Optional[str] DEFAULT: None

details

Additional structured information about the error.

TYPE: Optional[Dict[str, Any]] DEFAULT: None

elapsed

Time elapsed, in seconds, before the timeout was detected.

TYPE: float DEFAULT: 0.0

timeout

The configured timeout value in seconds.

TYPE: float DEFAULT: 0.0

partial_traces

Any traces collected before timeout occurred.

TYPE: Optional[Dict[str, Any]] DEFAULT: None
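
A cooperative timeout check, one of the cases listed under "When to raise", might look like the sketch below. The TaskTimeoutError here is a stand-in with the documented parameters; run_steps, the injectable clock, and the shape of partial_traces are assumptions made for illustration.

```python
import time
from typing import Any, Callable, Dict, List, Optional


class TaskTimeoutError(Exception):
    """Stand-in with the documented parameters (the real base is MASEvalError)."""

    def __init__(
        self,
        message: str,
        *,
        component: Optional[str] = None,
        details: Optional[Dict[str, Any]] = None,
        elapsed: float = 0.0,
        timeout: float = 0.0,
        partial_traces: Optional[Dict[str, Any]] = None,
    ) -> None:
        super().__init__(message)
        self.component = component
        self.details = details or {}
        self.elapsed = elapsed
        self.timeout = timeout
        self.partial_traces = partial_traces or {}


def run_steps(
    steps: List[Callable[[], Any]],
    timeout_seconds: float,
    clock: Callable[[], float] = time.monotonic,
) -> List[Any]:
    """Run steps in order, checking the deadline at a checkpoint before each."""
    start = clock()
    traces: List[Any] = []
    for step in steps:
        elapsed = clock() - start
        if elapsed > timeout_seconds:
            raise TaskTimeoutError(
                f"Task exceeded {timeout_seconds}s deadline",
                component="execution_loop",
                elapsed=elapsed,
                timeout=timeout_seconds,
                partial_traces={"steps": list(traces)},  # hypothetical shape
            )
        traces.append(step())
    return traces
```

Because the check only runs between steps, a single long step can overrun the deadline, which is why the documentation also mentions a hard timeout backstop. The clock parameter exists only so the behavior can be exercised without waiting.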

Simulator Exceptions

SimulatorError

Bases: Exception

Base exception for simulator failures.

This exception is raised when an LLM simulator exhausts all retry attempts without successfully parsing the model output.

Note

Subclasses (ToolSimulatorError, UserSimulatorError) inherit from the appropriate MASEval exception type for proper error classification. Use those specific subclasses in concrete simulators.

ATTRIBUTE DESCRIPTION
message

Description of the failure.

attempts

Number of attempts made before failing.

last_error

The last error encountered during parsing.

logs

The complete log of all attempts for debugging.
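
To show how the four attributes could be populated, here is a minimal retry loop. The constructor of SimulatorError is not documented above, so the signature used below is an assumption, as are parse_with_retries and the structure of each log entry.

```python
import json
from typing import Any, Callable, Dict, List, Optional


class SimulatorError(Exception):
    """Stand-in; the constructor signature here is assumed, not documented."""

    def __init__(
        self,
        message: str,
        attempts: int = 0,
        last_error: Optional[str] = None,
        logs: Optional[List[Dict[str, Any]]] = None,
    ) -> None:
        super().__init__(message)
        self.message = message
        self.attempts = attempts
        self.last_error = last_error
        self.logs = logs or []


def parse_with_retries(generate: Callable[[], str], max_attempts: int = 3) -> Any:
    """Ask generate() for JSON text until it parses or attempts run out."""
    logs: List[Dict[str, Any]] = []
    last_error: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        raw = generate()
        try:
            return json.loads(raw)
        except json.JSONDecodeError as exc:
            last_error = str(exc)
            logs.append({"attempt": attempt, "raw": raw, "error": last_error})
    raise SimulatorError(
        f"Failed to parse LLM response after {max_attempts} attempts",
        attempts=max_attempts,
        last_error=last_error,
        logs=logs,
    )
```

A concrete simulator would raise ToolSimulatorError or UserSimulatorError instead of the base class, as the note above recommends.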

ToolSimulatorError

Bases: SimulatorError, EnvironmentError

Tool simulator failed - not the agent's fault.

Raised when ToolLLMSimulator fails after exhausting retries. This inherits from EnvironmentError, so it's classified as ENVIRONMENT_ERROR in benchmark results.

UserSimulatorError

Bases: SimulatorError, UserError

User simulator failed - not the agent's fault.

Raised when UserLLMSimulator fails after exhausting retries. This inherits from UserError, so it's classified as USER_ERROR in benchmark results.
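
The classification labels mentioned on this page can be derived with isinstance checks, which is what makes the dual inheritance useful. The sketch below uses stand-in classes and a hypothetical classify function; the label AGENT_ERROR is an assumption, since only TASK_TIMEOUT, ENVIRONMENT_ERROR, USER_ERROR, and UNKNOWN_EXECUTION_ERROR appear in the text above.

```python
class MASEvalError(Exception):
    """Stand-in base class."""


class AgentError(MASEvalError):
    pass


class EnvironmentError(MASEvalError):  # shadows the built-in alias of OSError
    pass


class UserError(MASEvalError):
    pass


class TaskTimeoutError(MASEvalError):
    pass


class SimulatorError(Exception):
    pass


class ToolSimulatorError(SimulatorError, EnvironmentError):
    pass


class UserSimulatorError(SimulatorError, UserError):
    pass


def classify(exc: BaseException) -> str:
    """Map an exception to a result label (hypothetical helper)."""
    if isinstance(exc, TaskTimeoutError):
        return "TASK_TIMEOUT"
    if isinstance(exc, AgentError):
        return "AGENT_ERROR"  # assumed label, not confirmed by the docs
    if isinstance(exc, EnvironmentError):
        return "ENVIRONMENT_ERROR"
    if isinstance(exc, UserError):
        return "USER_ERROR"
    # Everything else, including framework-internal errors, is unclassified.
    return "UNKNOWN_EXECUTION_ERROR"
```

With this mapping a ToolSimulatorError needs no special case: it is an EnvironmentError by inheritance, so it lands in ENVIRONMENT_ERROR.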

Validation Helpers

These functions simplify input validation and raise AgentError with helpful suggestions:

validate_argument_type

validate_argument_type(
    value: Any,
    expected_type: str,
    arg_name: str,
    component: Optional[str] = None,
) -> None

Validate that a value matches an expected JSON schema type.

Raises AgentError if validation fails.

PARAMETER DESCRIPTION
value

The value to validate.

TYPE: Any

expected_type

JSON schema type ("string", "integer", "number", "boolean", "array", "object").

TYPE: str

arg_name

Name of the argument (for error message).

TYPE: str

component

Optional component name for error context.

TYPE: Optional[str] DEFAULT: None

RAISES DESCRIPTION
AgentError

If value doesn't match expected type.

Example
def my_tool(count: int, name: str):
    validate_argument_type(count, "integer", "count", "my_tool")
    validate_argument_type(name, "string", "name", "my_tool")
    # ... tool logic
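
For readers wondering how JSON schema type names relate to Python types, the following is one possible mapping, written as a standalone sketch rather than the helper's actual implementation. The names JSON_TYPES and matches_json_type are hypothetical, and rejecting bool for "integer" and "number" is a design choice of this sketch that the real helper may not share.

```python
from typing import Any

# JSON schema type name -> Python type(s) accepted in this sketch.
JSON_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "array": list,
    "object": dict,
}


def matches_json_type(value: Any, expected_type: str) -> bool:
    """Return True if value fits the named JSON schema type."""
    # bool is a subclass of int in Python, so exclude it from numeric types.
    if expected_type in ("integer", "number") and isinstance(value, bool):
        return False
    return isinstance(value, JSON_TYPES[expected_type])
```

The bool check matters because isinstance(True, int) is true in Python, so a naive mapping would accept True where an integer is expected.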

validate_required_arguments

validate_required_arguments(
    kwargs: Dict[str, Any],
    required: List[str],
    component: Optional[str] = None,
) -> None

Validate that all required arguments are present.

Raises AgentError if any required argument is missing.

PARAMETER DESCRIPTION
kwargs

The keyword arguments dict to validate.

TYPE: Dict[str, Any]

required

List of required argument names.

TYPE: List[str]

component

Optional component name for error context.

TYPE: Optional[str] DEFAULT: None

RAISES DESCRIPTION
AgentError

If any required argument is missing.

Example
def my_tool(**kwargs):
    validate_required_arguments(kwargs, ["query", "limit"], "my_tool")
    # ... tool logic

validate_no_extra_arguments

validate_no_extra_arguments(
    kwargs: Dict[str, Any],
    allowed: List[str],
    component: Optional[str] = None,
) -> None

Validate that no unexpected arguments are present.

Raises AgentError if any argument is not in the allowed list.

PARAMETER DESCRIPTION
kwargs

The keyword arguments dict to validate.

TYPE: Dict[str, Any]

allowed

List of allowed argument names.

TYPE: List[str]

component

Optional component name for error context.

TYPE: Optional[str] DEFAULT: None

RAISES DESCRIPTION
AgentError

If any unexpected argument is present.

Example
def my_tool(**kwargs):
    validate_no_extra_arguments(kwargs, ["query", "limit"], "my_tool")
    # ... tool logic
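
Both presence checks, for required arguments and for unexpected ones, reduce to comparing the keys of kwargs against a list of names. A minimal sketch of that comparison, using hypothetical helper names that return the offending names instead of raising:

```python
from typing import Any, Dict, List


def missing_arguments(kwargs: Dict[str, Any], required: List[str]) -> List[str]:
    """Names in required that are absent from kwargs, in the order listed."""
    return [name for name in required if name not in kwargs]


def unexpected_arguments(kwargs: Dict[str, Any], allowed: List[str]) -> List[str]:
    """Names in kwargs that are not in allowed, in call order."""
    allowed_set = set(allowed)
    return [name for name in kwargs if name not in allowed_set]


call = {"query": "cats", "page": 2}
missing = missing_arguments(call, ["query", "limit"])
extra = unexpected_arguments(call, ["query", "limit"])
```

Here missing is ["limit"] and extra is ["page"]. The documented helpers raise AgentError when the corresponding list would be non-empty.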

validate_arguments_from_schema

validate_arguments_from_schema(
    kwargs: Dict[str, Any],
    schema: Dict[str, Any],
    component: Optional[str] = None,
    *,
    strict: bool = False,
) -> None

Validate arguments against a JSON schema.

This is the main validation function for tool implementers. It validates:
  • Required arguments are present
  • Argument types match the schema
  • No extra arguments (if strict=True)

PARAMETER DESCRIPTION
kwargs

The keyword arguments dict to validate.

TYPE: Dict[str, Any]

schema

JSON schema with 'properties' and optionally 'required'.

TYPE: Dict[str, Any]

component

Optional component name for error context.

TYPE: Optional[str] DEFAULT: None

strict

If True, reject arguments not in schema. Default False.

TYPE: bool DEFAULT: False

RAISES DESCRIPTION
AgentError

If validation fails.

Example
SCHEMA = {
    "properties": {
        "query": {"type": "string"},
        "limit": {"type": "integer"},
    },
    "required": ["query"],
}

def my_tool(**kwargs):
    validate_arguments_from_schema(kwargs, SCHEMA, "my_tool")
    # ... tool logic
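
The three checks performed by validate_arguments_from_schema can be sketched end to end as follows. This is not the library's implementation: AgentError is a stand-in, sketch_validate and JSON_TYPES are hypothetical names, and the order of the checks and the wording of the messages are assumptions.

```python
from typing import Any, Dict, Optional


class AgentError(Exception):
    """Stand-in mirroring the documented constructor."""

    def __init__(self, message, *, component=None, details=None, suggestion=None):
        super().__init__(message)
        self.component = component
        self.details = details or {}
        self.suggestion = suggestion


JSON_TYPES = {
    "string": str, "integer": int, "number": (int, float),
    "boolean": bool, "array": list, "object": dict,
}


def sketch_validate(
    kwargs: Dict[str, Any],
    schema: Dict[str, Any],
    component: Optional[str] = None,
    *,
    strict: bool = False,
) -> None:
    """Raise AgentError on the first problem found; return None otherwise."""
    properties = schema.get("properties", {})

    # 1. Required arguments are present.
    missing = [name for name in schema.get("required", []) if name not in kwargs]
    if missing:
        raise AgentError(
            f"Missing required argument(s): {', '.join(missing)}",
            component=component,
            suggestion=f"Include: {', '.join(missing)}",
        )

    # 2. Argument types match the schema (arguments without a known type pass).
    for name, value in kwargs.items():
        expected = properties.get(name, {}).get("type")
        py_type = JSON_TYPES.get(expected)
        if py_type is None:
            continue
        bool_as_number = isinstance(value, bool) and expected in ("integer", "number")
        if bool_as_number or not isinstance(value, py_type):
            raise AgentError(
                f"Expected {expected} for '{name}', got {type(value).__name__}",
                component=component,
                suggestion=f"Provide '{name}' as {expected}",
            )

    # 3. No extra arguments, only when strict is requested.
    if strict:
        extra = [name for name in kwargs if name not in properties]
        if extra:
            raise AgentError(
                f"Unexpected argument(s): {', '.join(extra)}",
                component=component,
                suggestion=f"Allowed arguments: {', '.join(properties)}",
            )
```

With the SCHEMA from the example above, a call containing an undeclared page argument would pass by default and fail only with strict=True, matching the documented default of False.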