Utilities

This page documents utilities that are used across the library.

TraceableMixin

Mixin that provides tracing capability to any component.

Classes that inherit from TraceableMixin can be registered with a Benchmark instance and will have their traces automatically collected before evaluation.

The gather_traces() method provides a default implementation that returns basic metadata (component type and timestamp). Subclasses can override or extend this method to include component-specific execution information (messages, invocations, errors, etc.). The returned dictionary must be JSON-serializable.

How to use

All core MASEval components (AgentAdapter, ModelAdapter, Environment, User, LLMSimulator, BenchmarkCallback, etc.) inherit from TraceableMixin by default and provide comprehensive tracing out of the box.

For custom components, simply inherit from TraceableMixin and optionally extend the gather_traces() method to add your own tracing data:

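from datetime import datetime
from typing import Any, Dict

class MyCustomTool(TraceableMixin):
    def __init__(self):
        self.logs = []  # one entry per execute() call

    def execute(self, *args, **kwargs):
        # _do_work() stands in for the tool's actual logic (not shown here)
        result = self._do_work(*args, **kwargs)
        # args, kwargs and result must themselves be JSON-serializable
        self.logs.append({
            "timestamp": datetime.now().isoformat(),
            "args": args,
            "kwargs": kwargs,
            "result": result
        })
        return result

    def gather_traces(self) -> Dict[str, Any]:
        # Extend the default metadata (type, gathered_at) with tool-specific data
        return {
            **super().gather_traces(),
            "total_calls": len(self.logs),
            "logs": self.logs
        }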

Then register it with your benchmark:

benchmark = MyBenchmark(tasks, agent_data)
tool = MyCustomTool()
benchmark.register("tool", "my_tool", tool)

Thread Safety

Trace collection happens synchronously in the main thread after all task execution completes. Individual components should guard their trace storage with appropriate synchronization (e.g., threading.Lock) or use thread-safe data structures when accumulating traces during concurrent execution; the gather_traces() method itself is called sequentially.
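The locking pattern described above can be sketched as follows. This is a minimal illustration, not the library's implementation: the TraceableMixin defined here is a local stand-in that only mimics the documented default fields so the snippet runs on its own, and ThreadSafeTool with its doubling "work" is hypothetical.

```python
import threading
from datetime import datetime
from typing import Any, Dict, List


class TraceableMixin:
    """Local stand-in for the library mixin (assumption, for a runnable sketch)."""

    def gather_traces(self) -> Dict[str, Any]:
        return {
            "type": self.__class__.__name__,
            "gathered_at": datetime.now().isoformat(),
        }


class ThreadSafeTool(TraceableMixin):
    """Hypothetical tool whose execute() may be called from several threads."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._logs: List[Dict[str, Any]] = []

    def execute(self, value: int) -> int:
        result = value * 2  # placeholder for real work
        with self._lock:  # guard the shared list while appending
            self._logs.append({"value": value, "result": result})
        return result

    def gather_traces(self) -> Dict[str, Any]:
        # Take a snapshot under the lock; harmless when collection is sequential.
        with self._lock:
            logs = list(self._logs)
        return {**super().gather_traces(), "total_calls": len(logs), "logs": logs}


tool = ThreadSafeTool()
threads = [threading.Thread(target=tool.execute, args=(i,)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
traces = tool.gather_traces()
```

The lock matters most when recording a trace involves more than one step (for example, updating a counter and appending an entry), since those steps must not interleave across threads.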

Note

Components can store traces in any internal data structure. Common patterns include self.logs = [] for invocation histories or simulator attempts, and self._messages = MessageHistory() for conversations.

gather_traces

gather_traces() -> Dict[str, Any]

Gather execution traces from this component.

Provides a default implementation that returns basic metadata about the component (type and collection timestamp). Subclasses should extend this method to include their own execution data.

This method is called by the Benchmark before evaluation to collect all execution data. The returned dictionary must be JSON-serializable.

Output fields:

  • type - Component class name
  • gathered_at - ISO timestamp of when traces were collected

Subclasses typically add component-specific data.

Returns:

  • Dict[str, Any] - Dictionary containing traces with standardized structure.
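Because the returned dictionary must be JSON-serializable, it can help to check a trace dictionary with the standard json module before relying on it. The helper below is a hypothetical sketch (is_json_serializable is not part of the library); it shows why the timestamp is stored as an ISO string rather than as a datetime object.

```python
import json
from datetime import datetime
from typing import Any, Dict


def is_json_serializable(traces: Dict[str, Any]) -> bool:
    """Return True if json.dumps accepts the dictionary as-is."""
    try:
        json.dumps(traces)
    except (TypeError, ValueError):
        # TypeError: unsupported type; ValueError: e.g. circular reference
        return False
    return True


# ISO string timestamp, matching the documented gathered_at field
good = {"type": "MyCustomTool", "gathered_at": datetime.now().isoformat()}
# Raw datetime object, which json.dumps cannot encode by default
bad = {"type": "MyCustomTool", "gathered_at": datetime.now()}

good_ok = is_json_serializable(good)
bad_ok = is_json_serializable(bad)
```

A check like this fits naturally in a unit test for a custom gather_traces() override.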

How to use

Override this method and call super().gather_traces() to extend the base implementation with your own data:

def gather_traces(self) -> Dict[str, Any]:
    return {
        **super().gather_traces(),
        "my_field": self._my_data,
        "execution_count": len(self._history)
    }

If you don't need custom tracing, you can use the default implementation without overriding (it will still return basic metadata about your component).

ConfigurableMixin

Mixin that provides configuration gathering capability to any component.

Classes that inherit from ConfigurableMixin can be registered with a Benchmark instance and will have their configurations automatically collected before evaluation.

The gather_config() method provides a default implementation that returns basic metadata (component type and timestamp). Subclasses can override or extend this method to include component-specific configuration information (model parameters, agent settings, tool specifications, etc.). The returned dictionary must be JSON-serializable.

How to use

All core MASEval components (AgentAdapter, ModelAdapter, Environment, User, LLMSimulator, BenchmarkCallback, etc.) inherit from ConfigurableMixin by default and provide comprehensive configuration out of the box.

For custom components, simply inherit from ConfigurableMixin and optionally extend the gather_config() method to add your own configuration data:

from typing import Any, Dict

class MyCustomTool(ConfigurableMixin):
    def __init__(self, temperature: float = 0.7, max_retries: int = 3):
        self.temperature = temperature
        self.max_retries = max_retries

    def gather_config(self) -> Dict[str, Any]:
        # Extend the default metadata (type, gathered_at) with tool settings
        return {
            **super().gather_config(),
            "temperature": self.temperature,
            "max_retries": self.max_retries,
            "version": "1.0.0"
        }

Then register it with your benchmark:

benchmark = MyBenchmark(tasks, agent_data)
tool = MyCustomTool(temperature=0.9)
benchmark.register("tool", "my_tool", tool)

Thread Safety

Configuration collection happens synchronously in the main thread after all task execution completes. The gather_config() method is called sequentially and should return static configuration data (not runtime state).

Note

Components should expose their configuration through instance variables or properties that can be accessed during configuration gathering.
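One way to follow this advice is to expose settings through read-only properties and keep runtime counters out of gather_config(). The sketch below assumes a local stand-in for ConfigurableMixin (so the snippet is self-contained) and a hypothetical RetryingTool; neither is the library's own code.

```python
from datetime import datetime
from typing import Any, Dict


class ConfigurableMixin:
    """Local stand-in for the library mixin (assumption, for a runnable sketch)."""

    def gather_config(self) -> Dict[str, Any]:
        return {
            "type": self.__class__.__name__,
            "gathered_at": datetime.now().isoformat(),
        }


class RetryingTool(ConfigurableMixin):
    """Hypothetical component with static settings and separate runtime state."""

    def __init__(self, max_retries: int = 3, backoff_seconds: float = 0.5) -> None:
        self._max_retries = max_retries
        self._backoff_seconds = backoff_seconds
        self._attempts_made = 0  # runtime state, deliberately not part of the config

    @property
    def max_retries(self) -> int:
        return self._max_retries

    @property
    def backoff_seconds(self) -> float:
        return self._backoff_seconds

    def gather_config(self) -> Dict[str, Any]:
        # Report only static settings, read through the public properties.
        return {
            **super().gather_config(),
            "max_retries": self.max_retries,
            "backoff_seconds": self.backoff_seconds,
        }


config = RetryingTool(max_retries=5).gather_config()
```

Counters such as _attempts_made describe what happened during a run, so they belong in traces rather than in the configuration.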

gather_config

gather_config() -> Dict[str, Any]

Gather configuration from this component.

Provides a default implementation that returns basic metadata about the component (type and collection timestamp). Subclasses should extend this method to include their own configuration data.

This method is called by the Benchmark before evaluation to collect all configuration information. The returned dictionary must be JSON-serializable.

Output fields:

  • type - Component class name
  • gathered_at - ISO timestamp of when config was collected

Subclasses typically add component-specific configuration.

Returns:

  • Dict[str, Any] - Dictionary containing configuration with standardized structure.
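Configuration values are often held in types that JSON cannot encode directly, such as paths or sets. A small conversion helper applied before returning the dictionary is one possible approach; to_json_safe and the example keys below are hypothetical and not part of the library.

```python
import json
from pathlib import PurePath, PurePosixPath
from typing import Any


def to_json_safe(value: Any) -> Any:
    """Recursively convert common non-JSON types into JSON-friendly ones."""
    if isinstance(value, PurePath):
        return str(value)
    if isinstance(value, (set, frozenset)):
        # Sorting gives a stable order; assumes the elements are comparable.
        return sorted(to_json_safe(v) for v in value)
    if isinstance(value, (list, tuple)):
        return [to_json_safe(v) for v in value]
    if isinstance(value, dict):
        return {str(k): to_json_safe(v) for k, v in value.items()}
    return value  # assumed to be a str, number, bool or None already


raw_config = {
    "cache_dir": PurePosixPath("/tmp/cache"),
    "allowed_tools": {"search", "calculator"},
    "stop": ("\n",),
}
safe_config = to_json_safe(raw_config)
encoded = json.dumps(safe_config)  # succeeds once the values are converted
```

Inside a gather_config() override, the same helper could wrap the component-specific part of the returned dictionary.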

How to use

Override this method and call super().gather_config() to extend the base implementation with your own data:

def gather_config(self) -> Dict[str, Any]:
    return {
        **super().gather_config(),
        "model_name": self.model_name,
        "temperature": self.temperature,
        "max_tokens": self.max_tokens
    }

If you don't need custom configuration tracking, you can use the default implementation without overriding (it will still return basic metadata about your component).
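Since the core components inherit from both mixins, a custom component can plausibly do the same and report configuration and traces separately. The following is a sketch under that assumption: both mixin classes are local stand-ins that mimic the documented default fields, and Summarizer is a hypothetical component.

```python
from datetime import datetime
from typing import Any, Dict, List


class TraceableMixin:
    """Local stand-in for the library mixin (assumption)."""

    def gather_traces(self) -> Dict[str, Any]:
        return {"type": type(self).__name__, "gathered_at": datetime.now().isoformat()}


class ConfigurableMixin:
    """Local stand-in for the library mixin (assumption)."""

    def gather_config(self) -> Dict[str, Any]:
        return {"type": type(self).__name__, "gathered_at": datetime.now().isoformat()}


class Summarizer(TraceableMixin, ConfigurableMixin):
    """Hypothetical component that truncates text to a word limit."""

    def __init__(self, max_words: int = 50) -> None:
        self.max_words = max_words  # static setting -> configuration
        self.history: List[Dict[str, Any]] = []  # runtime data -> traces

    def run(self, text: str) -> str:
        words = text.split()
        summary = " ".join(words[: self.max_words])
        self.history.append(
            {"input_words": len(words), "output_words": len(summary.split())}
        )
        return summary

    def gather_config(self) -> Dict[str, Any]:
        return {**super().gather_config(), "max_words": self.max_words}

    def gather_traces(self) -> Dict[str, Any]:
        return {**super().gather_traces(), "runs": self.history}


summarizer = Summarizer(max_words=3)
summary = summarizer.run("one two three four five")
config = summarizer.gather_config()
traces = summarizer.gather_traces()
```

Keeping settings in gather_config() and per-run records in gather_traces() mirrors the split the two mixins describe.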