Utilities
This page documents utilities that are used across the library.
TraceableMixin
Mixin that provides tracing capability to any component.
Classes that inherit from TraceableMixin can be registered with a Benchmark instance and will have their traces automatically collected before evaluation.
The gather_traces() method provides a default implementation that returns
basic metadata (component type and timestamp). Subclasses can override or
extend this method to include component-specific execution information
(messages, invocations, errors, etc.). The returned dictionary must be
JSON-serializable.
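The default behaviour described above can be sketched as follows. StandInTraceableMixin and Calculator are hypothetical names used only so the example runs on its own; this is not MASEval's actual source, and the real mixin may differ in detail. Only the two field names (type and gathered_at) are taken from this page.

```python
import json
from datetime import datetime
from typing import Any, Dict


class StandInTraceableMixin:
    """Hypothetical stand-in, NOT MASEval's actual TraceableMixin."""

    def gather_traces(self) -> Dict[str, Any]:
        # Basic metadata only: component class name and collection time.
        return {
            "type": self.__class__.__name__,
            "gathered_at": datetime.now().isoformat(),
        }


class Calculator(StandInTraceableMixin):
    """Hypothetical component that does not override gather_traces()."""


traces = Calculator().gather_traces()
# json.dumps raises TypeError if a value is not JSON-serializable.
serialized = json.dumps(traces)
print(serialized)
```

Because both values are plain strings, the default dictionary passes through json.dumps unchanged; anything a subclass adds has to meet the same requirement.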
How to use
All core MASEval components (AgentAdapter, ModelAdapter, Environment, User, LLMSimulator, BenchmarkCallback, etc.) inherit from TraceableMixin by default and provide comprehensive tracing out of the box.
For custom components, simply inherit from TraceableMixin and optionally
extend the gather_traces() method to add your own tracing data:
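    from datetime import datetime
    from typing import Any, Dict

    # TraceableMixin comes from MASEval; import it from wherever your installation exposes it.

    class MyCustomTool(TraceableMixin):
        def __init__(self):
            self.logs = []  # one entry per execute() call

        def execute(self, *args, **kwargs):
            # _do_work() stands for the tool's actual logic (not shown here).
            result = self._do_work(*args, **kwargs)
            self.logs.append({
                "timestamp": datetime.now().isoformat(),
                "args": args,
                "kwargs": kwargs,
                "result": result,
            })
            return result

        def gather_traces(self) -> Dict[str, Any]:
            # Extend the default metadata with tool-specific data.
            return {
                **super().gather_traces(),
                "total_calls": len(self.logs),
                "logs": self.logs,
            }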
Then register it with your benchmark:
    benchmark = MyBenchmark(tasks, agent_data)
    tool = MyCustomTool()
    benchmark.register("tool", "my_tool", tool)
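To illustrate what registration makes possible, here is a minimal sketch of a registry that later asks each registered component for its traces. SketchBenchmark, collect_traces(), and EchoTool are hypothetical, and reading the first two register() arguments as a category and a name is an assumption; the real Benchmark class may store and collect components differently.

```python
from typing import Any, Dict


class SketchBenchmark:
    """Hypothetical stand-in for a Benchmark; NOT MASEval's implementation."""

    def __init__(self) -> None:
        # Assumed layout: category -> name -> component.
        self._registry: Dict[str, Dict[str, Any]] = {}

    def register(self, category: str, name: str, component: Any) -> None:
        self._registry.setdefault(category, {})[name] = component

    def collect_traces(self) -> Dict[str, Dict[str, Any]]:
        # Hypothetical method: ask every registered component for its traces.
        return {
            category: {name: comp.gather_traces() for name, comp in comps.items()}
            for category, comps in self._registry.items()
        }


class EchoTool:
    """Hypothetical component exposing gather_traces()."""

    def __init__(self) -> None:
        self.logs = []

    def execute(self, text: str) -> str:
        self.logs.append({"input": text})
        return text

    def gather_traces(self) -> Dict[str, Any]:
        return {
            "type": type(self).__name__,
            "total_calls": len(self.logs),
            "logs": self.logs,
        }


benchmark = SketchBenchmark()
tool = EchoTool()
benchmark.register("tool", "my_tool", tool)
tool.execute("hello")
all_traces = benchmark.collect_traces()
print(all_traces["tool"]["my_tool"]["total_calls"])  # prints 1
```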
Thread Safety
Trace collection happens synchronously in the main thread after all
task execution completes. Components that accumulate traces during
concurrent execution should protect shared state with an appropriate
synchronization primitive (e.g., threading.Lock); the gather_traces()
method itself is called sequentially.
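A minimal sketch of lock-protected trace accumulation is shown below. ThreadSafeDoublingTool is a hypothetical component; in practice it would also inherit from TraceableMixin and merge in super().gather_traces(), which is left out here so the example runs without MASEval installed.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Dict, List


class ThreadSafeDoublingTool:
    """Hypothetical component that records every call under a lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._logs: List[Dict[str, Any]] = []

    def execute(self, value: int) -> int:
        result = value * 2
        # Guard the shared list while worker threads append to it.
        with self._lock:
            self._logs.append({"value": value, "result": result})
        return result

    def gather_traces(self) -> Dict[str, Any]:
        # Copy under the lock so the returned snapshot is consistent.
        with self._lock:
            logs = list(self._logs)
        return {
            "type": type(self).__name__,
            "total_calls": len(logs),
            "logs": logs,
        }


tool = ThreadSafeDoublingTool()
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(tool.execute, range(100)))

traces = tool.gather_traces()
print(traces["total_calls"])  # prints 100
```

The order of entries in logs depends on thread scheduling, so consumers should not rely on it matching submission order.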
Note
Components can store traces in any internal data structure. Common patterns
include self.logs = [] for invocation histories and simulator attempts,
and self._messages = MessageHistory() for conversations.
gather_traces
gather_traces() -> Dict[str, Any]
Gather execution traces from this component.
Provides a default implementation that returns basic metadata about the component (type and collection timestamp). Subclasses should extend this method to include their own execution data.
This method is called by the Benchmark before evaluation to collect all execution data. The returned dictionary must be JSON-serializable.
Output fields:
- type: Component class name
- gathered_at: ISO timestamp of when traces were collected
Subclasses typically add additional component-specific data.
| RETURNS | DESCRIPTION |
|---|---|
| Dict[str, Any] | Dictionary containing traces with standardized structure. |
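One way to check the JSON-serializability requirement during development is a round trip through json.dumps. The helper name check_json_serializable is hypothetical and not part of MASEval; this is only a sketch of such a check.

```python
import json
from typing import Any, Dict


def check_json_serializable(traces: Dict[str, Any]) -> str:
    """Return the JSON text for traces, or raise ValueError with context."""
    try:
        return json.dumps(traces)
    except (TypeError, ValueError) as exc:
        # TypeError: unsupported type (e.g. set); ValueError: circular reference.
        raise ValueError(f"traces are not JSON-serializable: {exc}") from exc


# Tuples are accepted and written out as JSON arrays.
ok_text = check_json_serializable({"type": "MyCustomTool", "logs": [{"args": (1, 2)}]})
print(ok_text)

# Sets are rejected by the json module.
try:
    check_json_serializable({"type": "MyCustomTool", "tags": {"a", "b"}})
    rejected = False
except ValueError:
    rejected = True
print(rejected)  # prints True
```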
How to use
Override this method and call super().gather_traces() to extend
the base implementation with your own data:
    def gather_traces(self) -> Dict[str, Any]:
        return {
            **super().gather_traces(),
            "my_field": self._my_data,
            "execution_count": len(self._history),
        }
If you don't need custom tracing, you can use the default implementation without overriding (it will still return basic metadata about your component).
ConfigurableMixin
Mixin that provides configuration gathering capability to any component.
Classes that inherit from ConfigurableMixin can be registered with a Benchmark instance and will have their configurations automatically collected before evaluation.
The gather_config() method provides a default implementation that returns
basic metadata (component type and timestamp). Subclasses can override or
extend this method to include component-specific configuration information
(model parameters, agent settings, tool specifications, etc.). The returned
dictionary must be JSON-serializable.
How to use
All core MASEval components (AgentAdapter, ModelAdapter, Environment, User, LLMSimulator, BenchmarkCallback, etc.) inherit from ConfigurableMixin by default and provide comprehensive configuration out of the box.
For custom components, simply inherit from ConfigurableMixin and optionally
extend the gather_config() method to add your own configuration data:
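    from typing import Any, Dict

    # ConfigurableMixin comes from MASEval; import it from wherever your installation exposes it.

    class MyCustomTool(ConfigurableMixin):
        def __init__(self, temperature: float = 0.7, max_retries: int = 3):
            self.temperature = temperature
            self.max_retries = max_retries

        def gather_config(self) -> Dict[str, Any]:
            # Extend the default metadata with this tool's settings.
            return {
                **super().gather_config(),
                "temperature": self.temperature,
                "max_retries": self.max_retries,
                "version": "1.0.0",
            }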
Then register it with your benchmark:
    benchmark = MyBenchmark(tasks, agent_data)
    tool = MyCustomTool(temperature=0.9)
    benchmark.register("tool", "my_tool", tool)
Thread Safety
Configuration collection happens synchronously in the main thread after all
task execution completes. The gather_config() method is called sequentially
and should return static configuration data (not runtime state).
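The distinction between static configuration and runtime state can be sketched as follows. RetryingTool is a hypothetical component; in practice it would inherit from ConfigurableMixin and merge in super().gather_config(), which is omitted here so the example runs on its own (the default gathered_at timestamp would naturally differ between calls).

```python
from typing import Any, Dict


class RetryingTool:
    """Hypothetical component separating configuration from runtime state."""

    def __init__(self, temperature: float = 0.7, max_retries: int = 3) -> None:
        self.temperature = temperature  # static configuration
        self.max_retries = max_retries  # static configuration
        self._attempts = 0              # runtime state, belongs in traces

    def execute(self) -> None:
        self._attempts += 1

    def gather_config(self) -> Dict[str, Any]:
        # Only settings fixed at construction time; no counters or logs.
        return {
            "type": type(self).__name__,
            "temperature": self.temperature,
            "max_retries": self.max_retries,
        }


tool = RetryingTool(temperature=0.9)
config_before = tool.gather_config()
tool.execute()
config_after = tool.gather_config()
print(config_before == config_after)  # prints True
```

Keeping counters such as _attempts out of gather_config() means the reported settings do not depend on how much work the component has done.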
Note
Components should expose their configuration through instance variables or properties that can be accessed during configuration gathering.
gather_config
gather_config() -> Dict[str, Any]
Gather configuration from this component.
Provides a default implementation that returns basic metadata about the component (type and collection timestamp). Subclasses should extend this method to include their own configuration data.
This method is called by the Benchmark before evaluation to collect all configuration information. The returned dictionary must be JSON-serializable.
Output fields:
- type: Component class name
- gathered_at: ISO timestamp of when config was collected
Subclasses typically add additional component-specific configuration.
| RETURNS | DESCRIPTION |
|---|---|
| Dict[str, Any] | Dictionary containing configuration with standardized structure. |
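Configuration often holds values that the json module rejects, such as enums, paths, or sets. The sketch below converts them before they are returned from gather_config(). The to_jsonable helper and the Mode enum are hypothetical and not part of MASEval, and the conversion choices (enum value, POSIX path string, sorted list) are assumptions that may not suit every component.

```python
import json
from enum import Enum
from pathlib import PurePath
from typing import Any


def to_jsonable(value: Any) -> Any:
    """Recursively convert common non-JSON types into JSON-friendly ones."""
    if isinstance(value, Enum):
        return to_jsonable(value.value)
    if isinstance(value, PurePath):
        return value.as_posix()
    if isinstance(value, (set, frozenset)):
        # Assumes the elements are mutually comparable so they can be sorted.
        return sorted(to_jsonable(v) for v in value)
    if isinstance(value, (list, tuple)):
        return [to_jsonable(v) for v in value]
    if isinstance(value, dict):
        return {str(k): to_jsonable(v) for k, v in value.items()}
    return value


class Mode(Enum):
    FAST = "fast"


raw_config = {
    "mode": Mode.FAST,
    "cache_dir": PurePath("cache") / "runs",
    "stop_words": {"b", "a"},
    "retries": (1, 2),
}
config = to_jsonable(raw_config)
serialized = json.dumps(config)
print(serialized)
```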
How to use
Override this method and call super().gather_config() to extend
the base implementation with your own data:
    def gather_config(self) -> Dict[str, Any]:
        return {
            **super().gather_config(),
            "model_name": self.model_name,
            "temperature": self.temperature,
            "max_tokens": self.max_tokens,
        }
If you don't need custom configuration tracking, you can use the default implementation without overriding (it will still return basic metadata about your component).