Exceptions
Exception classes for error classification in benchmark execution.
Exception Hierarchy
MASEvalError (base)
├── AgentError - Agent violated contract (agent's fault)
├── EnvironmentError - Environment/tool failed (not agent's fault)
├── UserError - User simulator failed (not agent's fault)
└── TaskTimeoutError - Task exceeded configured timeout
SimulatorError (base for simulators)
├── ToolSimulatorError - Also inherits EnvironmentError
└── UserSimulatorError - Also inherits UserError
Core Exceptions
MASEvalError
Bases: Exception
Base exception for all MASEval-controlled component failures.
This is the base class for exceptions that occur at boundaries we control (tools, environment, user simulator). Errors from agent framework internals should NOT use this hierarchy - they remain as generic exceptions and are classified as UNKNOWN_EXECUTION_ERROR.
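The classification described above can be pictured as a single dispatch on exception type. The sketch below is hypothetical: the stand-in classes only mirror the names in this reference so the block runs on its own, the classify helper is not part of MASEval, and the AGENT_ERROR label is assumed by analogy with the other result labels named on this page.

```python
# Stand-in classes mirroring the documented hierarchy (not the real MASEval classes).
class MASEvalError(Exception):
    pass

class AgentError(MASEvalError):
    pass

class EnvironmentError(MASEvalError):  # shadows the built-in alias of OSError
    pass

class UserError(MASEvalError):
    pass

class TaskTimeoutError(MASEvalError):
    pass


def classify(exc: BaseException) -> str:
    """Hypothetical helper: map an exception to a result label."""
    if isinstance(exc, TaskTimeoutError):
        return "TASK_TIMEOUT"
    if isinstance(exc, AgentError):
        return "AGENT_ERROR"  # assumed label, not confirmed by the source
    if isinstance(exc, EnvironmentError):
        return "ENVIRONMENT_ERROR"
    if isinstance(exc, UserError):
        return "USER_ERROR"
    # Everything else, e.g. errors from agent framework internals.
    return "UNKNOWN_EXECUTION_ERROR"


print(classify(EnvironmentError("db down")))  # ENVIRONMENT_ERROR
print(classify(ValueError("framework bug")))  # UNKNOWN_EXECUTION_ERROR
```

Because generic exceptions fall through to the last branch, framework-internal failures never need to know about this hierarchy.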
__init__
__init__(
message: str,
*,
component: Optional[str] = None,
details: Optional[Dict[str, Any]] = None,
)
Initialize MASEvalError.
| PARAMETER | DESCRIPTION |
|---|---|
| message | Human-readable error description. TYPE: str |
| component | Name of the component that raised the error (e.g., tool name). TYPE: Optional[str] |
| details | Additional structured information about the error. TYPE: Optional[Dict[str, Any]] |
AgentError
Bases: MASEvalError
Agent violated the contract at a boundary we control.
Raised when the agent provides invalid inputs to components we control. This is the agent's fault, so these tasks count against its score.
The suggestion field provides agent-friendly hints for self-correction that some agent frameworks may use for automatic recovery.
When to raise
- Agent passed wrong argument types to a tool
- Agent passed arguments that violate documented constraints
- Agent omitted required arguments
- Agent called a tool with semantically invalid input
- Agent exceeded documented limits (max retries, rate limits, etc.)
Examples:
# Wrong type with suggestion
raise AgentError(
    "Expected int for 'count', got str",
    component="search_tool",
    suggestion="Provide count as a number, e.g., count=10"
)
# Missing required argument
raise AgentError(
    "Missing required argument 'query'",
    component="search_tool",
    suggestion="Include query='your search terms'"
)
# Constraint violation
raise AgentError(
    "Argument 'limit' must be positive, got -5",
    component="fetch_tool",
    suggestion="Use a positive value, e.g., limit=10"
)
__init__
__init__(
message: str,
*,
component: Optional[str] = None,
details: Optional[Dict[str, Any]] = None,
suggestion: Optional[str] = None,
)
Initialize AgentError.
| PARAMETER | DESCRIPTION |
|---|---|
| message | Human-readable error description explaining what went wrong. TYPE: str |
| component | Name of the component that raised the error (e.g., tool name). TYPE: Optional[str] |
| details | Additional structured information about the error. TYPE: Optional[Dict[str, Any]] |
| suggestion | Agent-friendly hint for correcting the error. Some agent frameworks use this for automatic retry with corrected inputs. TYPE: Optional[str] |
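One way a harness might surface the suggestion to an agent is to fold it into the text returned from a failed tool call. This is a minimal sketch under stated assumptions: the AgentError class below is a stand-in with the documented keyword arguments, it is assumed that they are stored as same-named attributes, and search_tool and call_with_feedback are hypothetical names, not MASEval APIs.

```python
from typing import Any, Callable, Dict, Optional


class AgentError(Exception):
    """Stand-in with the documented keyword arguments (not the real class)."""

    def __init__(self, message: str, *, component: Optional[str] = None,
                 details: Optional[Dict[str, Any]] = None,
                 suggestion: Optional[str] = None) -> None:
        super().__init__(message)
        self.message = message
        self.component = component
        self.details = details or {}
        self.suggestion = suggestion


def search_tool(**kwargs: Any) -> str:
    """Toy tool that rejects a non-integer count."""
    count = kwargs.get("count")
    if not isinstance(count, int):
        raise AgentError(
            f"Expected int for 'count', got {type(count).__name__}",
            component="search_tool",
            suggestion="Provide count as a number, e.g., count=10",
        )
    return f"{count} results"


def call_with_feedback(tool: Callable[..., str], **kwargs: Any) -> str:
    """Hypothetical wrapper: turn an AgentError into text the agent can read."""
    try:
        return tool(**kwargs)
    except AgentError as exc:
        hint = f" Hint: {exc.suggestion}" if exc.suggestion else ""
        return f"Error in {exc.component}: {exc.message}.{hint}"


print(call_with_feedback(search_tool, count="10"))
print(call_with_feedback(search_tool, count=10))
```

The first call returns the error text with the hint appended; the second returns the normal tool result.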
EnvironmentError
Bases: MASEvalError
Environment or tool infrastructure failed.
Raised when our code fails AFTER validating agent inputs. This indicates a problem with the evaluation infrastructure, not the agent's behavior. These tasks should be excluded from agent scoring.
When to raise
- Tool implementation has a bug
- External API/database our tool depends on failed
- ToolLLMSimulator failed to parse model output
- Model adapter for tool simulation failed
- Resource exhaustion in environment components
- File I/O errors in environment setup
Examples:
# Tool bug
raise EnvironmentError("Internal error in calculation", component="calc_tool")
# External dependency failed
raise EnvironmentError("Database connection failed", component="db_tool")
# Simulator failed
raise EnvironmentError(
    "Failed to parse LLM response after 3 attempts",
    component="flight_search",
    details={"attempts": 3, "last_error": "Invalid JSON"}
)
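The "after validating agent inputs" rule suggests a two-phase tool body: input checks raise AgentError, and anything that fails afterwards is wrapped in EnvironmentError. The sketch below illustrates that ordering with stand-in classes; db_tool and lookup_backend are hypothetical names, and storing component and details as attributes is an assumption.

```python
from typing import Any, Dict, Optional


class MASEvalError(Exception):
    """Stand-in base class with the documented keyword arguments."""

    def __init__(self, message: str, *, component: Optional[str] = None,
                 details: Optional[Dict[str, Any]] = None) -> None:
        super().__init__(message)
        self.component = component
        self.details = details or {}


class AgentError(MASEvalError):
    pass


class EnvironmentError(MASEvalError):  # shadows the built-in alias of OSError
    pass


def lookup_backend(key: str) -> str:
    """Hypothetical external dependency that is currently failing."""
    raise ConnectionError("connection refused")


def db_tool(**kwargs: Any) -> str:
    # Phase 1: problems with the input are the agent's fault.
    key = kwargs.get("key")
    if not isinstance(key, str):
        raise AgentError("Missing or non-string argument 'key'", component="db_tool")
    # Phase 2: failures after validation are infrastructure problems.
    try:
        return lookup_backend(key)
    except Exception as exc:
        raise EnvironmentError(
            "Database connection failed",
            component="db_tool",
            details={"last_error": str(exc)},
        ) from exc


for call in ({}, {"key": "user:1"}):
    try:
        db_tool(**call)
    except MASEvalError as exc:
        print(type(exc).__name__, "-", exc)
```

Chaining with "from exc" keeps the original traceback available for debugging while the classification is driven by the outer exception type.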
Note
Python has a built-in EnvironmentError (alias for OSError), but it's rarely used directly. This class shadows it intentionally for clean semantics. If you need the built-in, use OSError explicitly.
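The effect of the shadowing can be checked directly. The snippet below uses stand-in classes defined locally rather than the real MASEval imports, and shows that a genuine I/O failure is still caught only by OSError.

```python
import builtins

# The built-in name is an alias of OSError (Python 3.3+).
assert builtins.EnvironmentError is OSError


class MASEvalError(Exception):
    """Stand-in base class."""


class EnvironmentError(MASEvalError):
    """Stand-in that shadows the built-in name in this namespace."""


# After the definition above, the bare name refers to the local class.
print(EnvironmentError is OSError)            # False
print(issubclass(EnvironmentError, OSError))  # False

# A real I/O failure is an OSError and is not caught by the shadowing class.
try:
    open("/nonexistent/path/for/demo")
except EnvironmentError:
    caught = "shadowing class"
except OSError:
    caught = "built-in OSError"
print(caught)  # built-in OSError
```

Code that relied on "except EnvironmentError" to catch file or network errors would silently stop doing so once the shadowing name is imported, which is why using OSError explicitly is the safer habit.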
__init__
__init__(
message: str,
*,
component: Optional[str] = None,
details: Optional[Dict[str, Any]] = None,
)
Initialize EnvironmentError. The signature is the same as MASEvalError.__init__.
| PARAMETER | DESCRIPTION |
|---|---|
| message | Human-readable error description. TYPE: str |
| component | Name of the component that raised the error (e.g., tool name). TYPE: Optional[str] |
| details | Additional structured information about the error. TYPE: Optional[Dict[str, Any]] |
UserError
Bases: MASEvalError
User simulator failed.
Raised when the user simulation infrastructure fails. This is NOT the agent's fault - these tasks should be excluded from agent scoring.
When to raise
- UserLLMSimulator couldn't reach the LLM API
- User model returned unparseable response after retries
- User simulator configuration error
- User profile data is malformed
Examples:
# API failure
raise UserError("OpenAI API unreachable", component="user_simulator")
# Parse failure
raise UserError(
    "Failed to parse user response after 3 attempts",
    component="user_simulator",
    details={"attempts": 3, "last_error": "Missing 'text' field"}
)
__init__
__init__(
message: str,
*,
component: Optional[str] = None,
details: Optional[Dict[str, Any]] = None,
)
Initialize UserError. The signature is the same as MASEvalError.__init__.
| PARAMETER | DESCRIPTION |
|---|---|
| message | Human-readable error description. TYPE: str |
| component | Name of the component that raised the error (e.g., tool name). TYPE: Optional[str] |
| details | Additional structured information about the error. TYPE: Optional[Dict[str, Any]] |
TaskTimeoutError
Bases: MASEvalError
Task execution exceeded configured timeout.
This is classified as TASK_TIMEOUT in benchmark results, separate from other error types. A timeout is neither the agent's fault nor the infrastructure's fault; it is a resource constraint.
When to raise
- Task execution time exceeds TaskProtocol.timeout_seconds
- Cooperative timeout check detects deadline has passed
- Hard timeout backstop triggers
| ATTRIBUTE | DESCRIPTION |
|---|---|
| elapsed | Time elapsed before timeout was detected. |
| timeout | The configured timeout value in seconds. |
| partial_traces | Any traces collected before timeout occurred. |
Examples:
# Cooperative timeout at checkpoint
raise TaskTimeoutError(
    "Task exceeded 60s deadline",
    component="execution_loop",
    elapsed=62.5,
    timeout=60.0
)
# Hard timeout with partial traces
raise TaskTimeoutError(
    "Task exceeded 120s hard deadline",
    component="timeout_backstop",
    elapsed=125.0,
    timeout=120.0,
    partial_traces={"agents": {"main": {"messages": [...]}}}
)
__init__
__init__(
message: str,
*,
component: Optional[str] = None,
details: Optional[Dict[str, Any]] = None,
elapsed: float = 0.0,
timeout: float = 0.0,
partial_traces: Optional[Dict[str, Any]] = None,
)
Initialize TaskTimeoutError.
| PARAMETER | DESCRIPTION |
|---|---|
| message | Human-readable error description. TYPE: str |
| component | Name of the component that raised the error. TYPE: Optional[str] |
| details | Additional structured information about the error. TYPE: Optional[Dict[str, Any]] |
| elapsed | Time elapsed before timeout was detected. TYPE: float |
| timeout | The configured timeout value in seconds. TYPE: float |
| partial_traces | Any traces collected before timeout occurred. TYPE: Optional[Dict[str, Any]] |
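A cooperative timeout check, as mentioned under "When to raise", can be sketched as a deadline comparison between steps of an execution loop. Everything here is illustrative: the TaskTimeoutError class is a stand-in with the documented keyword arguments, run_steps is a hypothetical loop, and the injectable clock exists only to make the example deterministic.

```python
import time
from typing import Any, Callable, Dict, List, Optional


class TaskTimeoutError(Exception):
    """Stand-in carrying the documented attributes (not the real class)."""

    def __init__(self, message: str, *, component: Optional[str] = None,
                 details: Optional[Dict[str, Any]] = None,
                 elapsed: float = 0.0, timeout: float = 0.0,
                 partial_traces: Optional[Dict[str, Any]] = None) -> None:
        super().__init__(message)
        self.component = component
        self.details = details or {}
        self.elapsed = elapsed
        self.timeout = timeout
        self.partial_traces = partial_traces or {}


def run_steps(steps: List[Callable[[], Any]], timeout_seconds: float,
              clock: Callable[[], float] = time.monotonic) -> Dict[str, Any]:
    """Hypothetical loop that checks the deadline before each step."""
    start = clock()
    traces: Dict[str, Any] = {"steps": []}
    for step in steps:
        elapsed = clock() - start
        if elapsed > timeout_seconds:
            raise TaskTimeoutError(
                f"Task exceeded {timeout_seconds}s deadline",
                component="execution_loop",
                elapsed=elapsed,
                timeout=timeout_seconds,
                partial_traces=traces,
            )
        traces["steps"].append(step())
    return traces


# Fake clock: start at 0.0, first check at 0.5, second check at 2.0.
ticks = iter([0.0, 0.5, 2.0])
try:
    run_steps([lambda: "a", lambda: "b"], timeout_seconds=1.0,
              clock=lambda: next(ticks))
except TaskTimeoutError as exc:
    print(exc.elapsed, exc.timeout, exc.partial_traces)
```

A cooperative check only fires at checkpoints, so a single long-running step can overshoot the deadline; that gap is what a hard timeout backstop is meant to cover.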
Simulator Exceptions
SimulatorError
Bases: Exception
Base exception for simulator failures.
This exception is raised when an LLM simulator exhausts all retry attempts without successfully parsing the model output.
Note
Subclasses (ToolSimulatorError, UserSimulatorError) inherit from the appropriate MASEval exception type for proper error classification. Use those specific subclasses in concrete simulators.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| message | Description of the failure. |
| attempts | Number of attempts made before failing. |
| last_error | The last error encountered during parsing. |
| logs | The complete log of all attempts for debugging. |
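The attributes above map naturally onto a parse-and-retry loop. The sketch below is an assumption-heavy illustration: the constructor signature of the stand-in SimulatorError is inferred from the attribute list and is not documented here, and simulate is a hypothetical helper that treats "parsing" as JSON decoding.

```python
import json
from typing import Any, Callable, Dict, List, Optional


class SimulatorError(Exception):
    """Stand-in; the constructor signature is assumed from the attribute list."""

    def __init__(self, message: str, attempts: int = 0,
                 last_error: Optional[str] = None,
                 logs: Optional[List[Dict[str, Any]]] = None) -> None:
        super().__init__(message)
        self.message = message
        self.attempts = attempts
        self.last_error = last_error
        self.logs = logs or []


def simulate(generate: Callable[[], str], max_attempts: int = 3) -> Any:
    """Hypothetical retry loop: parse model output as JSON or give up."""
    logs: List[Dict[str, Any]] = []
    last_error: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        raw = generate()
        try:
            return json.loads(raw)
        except json.JSONDecodeError as exc:
            # Record every failed attempt so the final error is debuggable.
            last_error = str(exc)
            logs.append({"attempt": attempt, "raw": raw, "error": last_error})
    raise SimulatorError(
        f"Failed to parse LLM response after {max_attempts} attempts",
        attempts=max_attempts,
        last_error=last_error,
        logs=logs,
    )


try:
    simulate(lambda: "not json")
except SimulatorError as exc:
    print(exc.message, "| attempts:", exc.attempts, "| log entries:", len(exc.logs))
```

In a concrete simulator, the final raise would use ToolSimulatorError or UserSimulatorError so the failure is classified correctly.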
ToolSimulatorError
Bases: SimulatorError, EnvironmentError
Tool simulator failed - not the agent's fault.
Raised when ToolLLMSimulator fails after exhausting retries. This inherits from EnvironmentError, so it's classified as ENVIRONMENT_ERROR in benchmark results.
UserSimulatorError
Bases: SimulatorError, UserError
User simulator failed - not the agent's fault.
Raised when UserLLMSimulator fails after exhausting retries. This inherits from UserError, so it's classified as USER_ERROR in benchmark results.
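The dual inheritance means one exception instance can be caught either as a simulator failure or through its MASEval branch. The stand-in classes below reproduce only the documented bases, with empty bodies, to show how isinstance checks and the method resolution order behave.

```python
# Stand-in classes mirroring the documented bases (not the real MASEval classes).
class MASEvalError(Exception):
    pass

class EnvironmentError(MASEvalError):  # shadows the built-in alias of OSError
    pass

class UserError(MASEvalError):
    pass

class SimulatorError(Exception):
    pass

class ToolSimulatorError(SimulatorError, EnvironmentError):
    pass

class UserSimulatorError(SimulatorError, UserError):
    pass


err = ToolSimulatorError("Failed to parse LLM response after 3 attempts")

# One instance satisfies both branches of the hierarchy.
print(isinstance(err, SimulatorError))    # True
print(isinstance(err, EnvironmentError))  # True
print(isinstance(err, UserError))         # False

# Method resolution order: simulator base first, then the MASEval branch.
print([cls.__name__ for cls in ToolSimulatorError.__mro__])
```

A handler written as "except EnvironmentError" therefore also catches ToolSimulatorError without needing to know that simulators exist.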
Validation Helpers
These functions simplify input validation and raise AgentError with helpful suggestions:
validate_argument_type
validate_argument_type(
value: Any,
expected_type: str,
arg_name: str,
component: Optional[str] = None,
) -> None
Validate that a value matches an expected JSON schema type.
Raises AgentError if validation fails.
| PARAMETER | DESCRIPTION |
|---|---|
| value | The value to validate. TYPE: Any |
| expected_type | JSON schema type ("string", "integer", "number", "boolean", "array", "object"). TYPE: str |
| arg_name | Name of the argument (for error message). TYPE: str |
| component | Optional component name for error context. TYPE: Optional[str] |

| RAISES | DESCRIPTION |
|---|---|
| AgentError | If value doesn't match expected type. |
Example
def my_tool(count: int, name: str):
    validate_argument_type(count, "integer", "count", "my_tool")
    validate_argument_type(name, "string", "name", "my_tool")
    # ... tool logic
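To make the JSON-schema-to-Python mapping concrete, here is one way such a check could be written. This is not the library's implementation: check_json_type and the stand-in AgentError are hypothetical, and rejecting booleans for "integer" and "number" is a design choice of this sketch (bool is a subclass of int in Python) that the source does not confirm.

```python
from typing import Any, Dict, Optional

# JSON schema type names mapped to the Python types accepted for them.
_JSON_TYPES: Dict[str, Any] = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "array": list,
    "object": dict,
}


class AgentError(Exception):
    """Stand-in with a subset of the documented keyword arguments."""

    def __init__(self, message: str, *, component: Optional[str] = None,
                 suggestion: Optional[str] = None) -> None:
        super().__init__(message)
        self.component = component
        self.suggestion = suggestion


def check_json_type(value: Any, expected_type: str, arg_name: str,
                    component: Optional[str] = None) -> None:
    """Hypothetical validator: raise AgentError on a type mismatch."""
    py_type = _JSON_TYPES[expected_type]
    # True/False would otherwise pass isinstance(value, int).
    bool_as_number = isinstance(value, bool) and expected_type in ("integer", "number")
    if bool_as_number or not isinstance(value, py_type):
        raise AgentError(
            f"Expected {expected_type} for '{arg_name}', got {type(value).__name__}",
            component=component,
            suggestion=f"Provide '{arg_name}' as a JSON {expected_type}",
        )


check_json_type(10, "integer", "count", "my_tool")  # passes silently
try:
    check_json_type("10", "integer", "count", "my_tool")
except AgentError as exc:
    print(exc, "|", exc.suggestion)
```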
validate_required_arguments
validate_required_arguments(
kwargs: Dict[str, Any],
required: List[str],
component: Optional[str] = None,
) -> None
Validate that all required arguments are present.
Raises AgentError if any required argument is missing.
| PARAMETER | DESCRIPTION |
|---|---|
| kwargs | The keyword arguments dict to validate. TYPE: Dict[str, Any] |
| required | List of required argument names. TYPE: List[str] |
| component | Optional component name for error context. TYPE: Optional[str] |

| RAISES | DESCRIPTION |
|---|---|
| AgentError | If any required argument is missing. |
Example
def my_tool(**kwargs):
    validate_required_arguments(kwargs, ["query", "limit"], "my_tool")
    # ... tool logic
validate_no_extra_arguments
validate_no_extra_arguments(
kwargs: Dict[str, Any],
allowed: List[str],
component: Optional[str] = None,
) -> None
Validate that no unexpected arguments are present.
Raises AgentError if any argument is not in the allowed list.
| PARAMETER | DESCRIPTION |
|---|---|
| kwargs | The keyword arguments dict to validate. TYPE: Dict[str, Any] |
| allowed | List of allowed argument names. TYPE: List[str] |
| component | Optional component name for error context. TYPE: Optional[str] |

| RAISES | DESCRIPTION |
|---|---|
| AgentError | If any unexpected argument is present. |
Example
def my_tool(**kwargs):
    validate_no_extra_arguments(kwargs, ["query", "limit"], "my_tool")
    # ... tool logic
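The two presence checks (required arguments and extra arguments) are mirror images of each other: one looks for names missing from kwargs, the other for names missing from the allowed list. A minimal sketch, assuming a stand-in AgentError and hypothetical helper names and message wording:

```python
from typing import Any, Dict, List, Optional


class AgentError(Exception):
    """Stand-in with a subset of the documented keyword arguments."""

    def __init__(self, message: str, *, component: Optional[str] = None,
                 suggestion: Optional[str] = None) -> None:
        super().__init__(message)
        self.component = component
        self.suggestion = suggestion


def check_required(kwargs: Dict[str, Any], required: List[str],
                   component: Optional[str] = None) -> None:
    """Hypothetical check: every required name must be a key of kwargs."""
    missing = [name for name in required if name not in kwargs]
    if missing:
        raise AgentError(
            f"Missing required argument(s): {', '.join(missing)}",
            component=component,
            suggestion=f"Include: {', '.join(missing)}",
        )


def check_no_extra(kwargs: Dict[str, Any], allowed: List[str],
                   component: Optional[str] = None) -> None:
    """Hypothetical check: every key of kwargs must be an allowed name."""
    extra = sorted(set(kwargs) - set(allowed))
    if extra:
        raise AgentError(
            f"Unexpected argument(s): {', '.join(extra)}",
            component=component,
            suggestion=f"Allowed arguments: {', '.join(allowed)}",
        )


for check, args in ((check_required, {"query": "cats"}),
                    (check_no_extra, {"query": "cats", "page": 2})):
    try:
        check(args, ["query", "limit"], "my_tool")
    except AgentError as exc:
        print(exc)
```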
validate_arguments_from_schema
validate_arguments_from_schema(
kwargs: Dict[str, Any],
schema: Dict[str, Any],
component: Optional[str] = None,
*,
strict: bool = False,
) -> None
Validate arguments against a JSON schema.
This is the main validation function for tool implementers. It validates:
- Required arguments are present
- Argument types match the schema
- No extra arguments (if strict=True)
| PARAMETER | DESCRIPTION |
|---|---|
| kwargs | The keyword arguments dict to validate. TYPE: Dict[str, Any] |
| schema | JSON schema with 'properties' and optionally 'required'. TYPE: Dict[str, Any] |
| component | Optional component name for error context. TYPE: Optional[str] |
| strict | If True, reject arguments not in schema. Default False. TYPE: bool |

| RAISES | DESCRIPTION |
|---|---|
| AgentError | If validation fails. |
Example
SCHEMA = {
    "properties": {
        "query": {"type": "string"},
        "limit": {"type": "integer"},
    },
    "required": ["query"],
}
def my_tool(**kwargs):
    validate_arguments_from_schema(kwargs, SCHEMA, "my_tool")
    # ... tool logic
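The effect of the strict flag is easiest to see with an argument that the schema does not mention. The sketch below reimplements the three documented checks in one hypothetical function, check_against_schema, with a stand-in AgentError; the order of the checks, the message wording, and the handling of booleans are assumptions of this sketch rather than documented behavior.

```python
from typing import Any, Dict, Optional

_JSON_TYPES: Dict[str, Any] = {
    "string": str, "integer": int, "number": (int, float),
    "boolean": bool, "array": list, "object": dict,
}


class AgentError(Exception):
    """Stand-in with a subset of the documented keyword arguments."""

    def __init__(self, message: str, *, component: Optional[str] = None) -> None:
        super().__init__(message)
        self.component = component


def check_against_schema(kwargs: Dict[str, Any], schema: Dict[str, Any],
                         component: Optional[str] = None, *,
                         strict: bool = False) -> None:
    """Hypothetical combined check: required, types, then extras if strict."""
    properties = schema.get("properties", {})
    for name in schema.get("required", []):
        if name not in kwargs:
            raise AgentError(f"Missing required argument '{name}'", component=component)
    for name, value in kwargs.items():
        if name not in properties:
            if strict:
                raise AgentError(f"Unexpected argument '{name}'", component=component)
            continue  # non-strict mode ignores unknown arguments
        expected = properties[name].get("type")
        py_type = _JSON_TYPES.get(expected)
        if py_type is None:
            continue  # no recognized type constraint for this property
        # bool is a subclass of int, so only accept it for "boolean".
        bad = (isinstance(value, bool) and expected != "boolean") or not isinstance(value, py_type)
        if bad:
            raise AgentError(
                f"Expected {expected} for '{name}', got {type(value).__name__}",
                component=component,
            )


SCHEMA = {
    "properties": {"query": {"type": "string"}, "limit": {"type": "integer"}},
    "required": ["query"],
}

args = {"query": "cats", "page": 2}  # 'page' is not in the schema
check_against_schema(args, SCHEMA, "my_tool")  # accepted when strict=False
try:
    check_against_schema(args, SCHEMA, "my_tool", strict=True)
except AgentError as exc:
    print(exc)  # Unexpected argument 'page'
```

Leaving strict off is more forgiving toward agents that pass harmless extra arguments; turning it on makes the tool contract exact.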