Skip to content

Callback

Callbacks allow you to hook into benchmark execution at various points. Use them for logging, monitoring, tracing, or custom side effects during agent runs.

View source

BenchmarkCallback

Bases: ABC, TraceableMixin

Base class for benchmark callbacks.

gather_traces

gather_traces() -> dict[str, Any]

Gather execution traces from this callback.

By default, callbacks don't store traces, but subclasses can override this to provide custom tracing data.

RETURNS DESCRIPTION
dict[str, Any]

Dictionary with basic callback information. Subclasses should

dict[str, Any]

extend this with their own data.

on_event

on_event(event_name: str, **data) -> None

Handle a generic event.

EnvironmentCallback

Bases: ABC, TraceableMixin

Base class for environment callbacks.

gather_traces

gather_traces() -> dict[str, Any]

Gather execution traces from this callback.

By default, callbacks don't store traces, but subclasses can override this to provide custom tracing data.

RETURNS DESCRIPTION
dict[str, Any]

Dictionary with basic callback information. Subclasses should

dict[str, Any]

extend this with their own data.

on_event

on_event(event_name: str, **data) -> None

Handle a generic event.

AgentCallback

Bases: ABC, TraceableMixin

Base class for agent callbacks.

gather_traces

gather_traces() -> dict[str, Any]

Gather execution traces from this callback.

By default, callbacks don't store traces, but subclasses can override this to provide custom tracing data.

RETURNS DESCRIPTION
dict[str, Any]

Dictionary with basic callback information. Subclasses should

dict[str, Any]

extend this with their own data.

on_event

on_event(event_name: str, **data) -> None

Handle a generic event.

Built-in Callbacks

MASEval provides built-in callback implementations:

Message Tracing

View source

MessageTracingAgentCallback

Bases: AgentCallback

Callback that traces all agent messages to memory.

This callback is useful for: - Frameworks that don't provide built-in message history - Debugging agent behavior - Creating datasets from agent runs - Monitoring multi-agent systems

The callback collects all message history from agents after each run.

Example
from maseval import AgentAdapter
from maseval.core.callbacks.message_tracing import MessageTracingAgentCallback

# Create callback
tracer = MessageTracingAgentCallback(include_metadata=True, verbose=True)

# Use with agent
agent_adapter = MyAgentAdapter(agent, name="agent1", callbacks=[tracer])
agent_adapter.run("What's the weather?")

# Access traced conversations
for conversation in tracer.get_all_conversations():
    print(f"Agent: {conversation['agent_name']}")
    print(f"Query: {conversation['query']}")
    print(f"Messages: {len(conversation['messages'])}")

__init__

__init__(
    include_metadata: bool = True, verbose: bool = False
)

Initialize the message tracing callback.

PARAMETER DESCRIPTION
include_metadata

If True, include timestamps and metadata in traces

TYPE: bool DEFAULT: True

verbose

If True, print tracing information to console

TYPE: bool DEFAULT: False

clear

clear() -> None

Clear all conversations from memory.

gather_traces

gather_traces() -> dict[str, Any]

Gather execution traces from this callback.

By default, callbacks don't store traces, but subclasses can override this to provide custom tracing data.

RETURNS DESCRIPTION
dict[str, Any]

Dictionary with basic callback information. Subclasses should

dict[str, Any]

extend this with their own data.

get_all_conversations

get_all_conversations() -> List[Dict[str, Any]]

Get all traced conversations from memory.

RETURNS DESCRIPTION
List[Dict[str, Any]]

List of conversation dictionaries

get_conversations_by_agent

get_conversations_by_agent(
    agent_name: str,
) -> List[Dict[str, Any]]

Get all conversations for a specific agent.

PARAMETER DESCRIPTION
agent_name

Name of the agent to filter by

TYPE: str

RETURNS DESCRIPTION
List[Dict[str, Any]]

List of conversation dictionaries for the specified agent

get_statistics

get_statistics() -> Dict[str, Any]

Get statistics about traced conversations.

RETURNS DESCRIPTION
Dict[str, Any]

Dictionary with statistics

on_event

on_event(event_name: str, **data) -> None

Handle a generic event.

on_run_end

on_run_end(agent: AgentAdapter, result: Any) -> None

Called when agent execution completes.

PARAMETER DESCRIPTION
agent

The agent adapter instance

TYPE: AgentAdapter

result

The result returned by the agent (usually MessageHistory)

TYPE: Any

on_run_start

on_run_start(agent: AgentAdapter) -> None

Called when agent execution starts.

Note: We don't have access to the query here in the current implementation, so we'll capture it in on_run_end from the result.

Result Logging

View source

ResultLogger

Bases: BenchmarkCallback, ABC

Abstract base class for logging benchmark results to various backends.

This class provides a framework for implementing result loggers that: - Write results incrementally after each task iteration (repeat) - Track expected vs actual logged iterations - Validate completeness at benchmark end - Support selective logging of traces, config, and eval results

Subclasses implement specific backends (file, wandb, opentelemetry, etc.) by overriding the abstract methods.

ATTRIBUTE DESCRIPTION
include_traces

Whether to include execution traces in logged results

include_config

Whether to include configuration in logged results

include_eval

Whether to include evaluation results in logged results

include_usage

Whether to include API usage data in logged results

validate_on_completion

Whether to validate all iterations were logged

Example
class MyLogger(ResultLogger):
    def log_iteration(self, report: Dict) -> None:
        # Write report to backend
        pass

    def finalize(self) -> None:
        # Close connections, flush buffers
        pass

    def validate(self) -> bool:
        # Check all iterations present
        return True

logger = MyLogger(include_traces=True)
benchmark = MyBenchmark(tasks, agent_data, callbacks=[logger])
benchmark.run()

__init__

__init__(
    include_traces: bool = True,
    include_config: bool = True,
    include_eval: bool = True,
    include_task: bool = True,
    include_usage: bool = True,
    validate_on_completion: bool = True,
)

Initialize the result logger.

PARAMETER DESCRIPTION
include_traces

If True, include execution traces in logged results

TYPE: bool DEFAULT: True

include_config

If True, include configuration in logged results

TYPE: bool DEFAULT: True

include_eval

If True, include evaluation results in logged results

TYPE: bool DEFAULT: True

include_task

If True, include task data (query, metadata, protocol) in logged results

TYPE: bool DEFAULT: True

include_usage

If True, include API usage data in logged results

TYPE: bool DEFAULT: True

validate_on_completion

If True, validate all iterations were logged at end

TYPE: bool DEFAULT: True

finalize abstractmethod

finalize() -> None

Finalize logging operations.

Called at benchmark end. Implementations should: - Close file handles - Flush buffers - Close network connections - Write metadata files - Perform any cleanup operations

RAISES DESCRIPTION
Exception

If finalization fails (will be caught and re-raised by base class)

gather_traces

gather_traces() -> dict[str, Any]

Gather execution traces from this callback.

By default, callbacks don't store traces, but subclasses can override this to provide custom tracing data.

RETURNS DESCRIPTION
dict[str, Any]

Dictionary with basic callback information. Subclasses should

dict[str, Any]

extend this with their own data.

log_iteration abstractmethod

log_iteration(report: Dict) -> None

Log a single task iteration to the backend.

This method is called after each task repeat completes. Implementations should write the report to their specific backend (file, API, etc.).

PARAMETER DESCRIPTION
report

Filtered report dict containing task_id, repeat_idx, and optionally traces, config, and eval based on include flags

TYPE: Dict

RAISES DESCRIPTION
Exception

If logging fails (will be caught and re-raised by base class)

on_event

on_event(event_name: str, **data) -> None

Handle a generic event.

on_run_end

on_run_end(
    benchmark: Benchmark, results: List[Dict]
) -> None

Called when benchmark execution completes.

Finalizes logging and optionally validates completeness.

PARAMETER DESCRIPTION
benchmark

The benchmark instance

TYPE: Benchmark

results

List of all result reports from the benchmark

TYPE: List[Dict]

on_run_start

on_run_start(benchmark: Benchmark) -> None

Called when benchmark execution starts.

Records the expected number of tasks and repeats for validation.

PARAMETER DESCRIPTION
benchmark

The benchmark instance

TYPE: Benchmark

on_task_repeat_end

on_task_repeat_end(
    benchmark: Benchmark, report: Dict
) -> None

Called after each task iteration completes.

Filters the report based on include flags, logs it, and tracks the iteration.

PARAMETER DESCRIPTION
benchmark

The benchmark instance

TYPE: Benchmark

report

The complete report dict with task_id, repeat_idx, traces, config, eval

TYPE: Dict

validate abstractmethod

validate() -> bool

Validate that all expected iterations were logged correctly.

Called at benchmark end if validate_on_completion is True. Implementations should verify: - All expected iterations are present - No duplicate iterations exist - Data integrity is maintained

RETURNS DESCRIPTION
bool

True if validation passes, False otherwise

FileResultLogger

Bases: ResultLogger

Logger that writes benchmark results incrementally to JSONL files.

This logger writes each task iteration to a JSONL file (one JSON object per line) as soon as it completes. This provides: - Recovery from crashes: partial results are preserved - Streaming analysis: results can be read while benchmark is running - Safe concurrent reads: JSONL format is line-atomic - Validation: ensures all expected iterations were written

The logger uses atomic writes (write to temp file, then rename) to prevent file corruption from crashes or interruptions.

ATTRIBUTE DESCRIPTION
output_dir

Directory where result files will be written

filename_pattern

Pattern for result filename (supports {timestamp})

write_metadata

Whether to write a metadata file with benchmark info

atomic_writes

Whether to use atomic writes (recommended)

Example
from maseval.core.callbacks.result_logger import FileResultLogger

# Basic usage
logger = FileResultLogger(output_dir="./results")

# Custom configuration
logger = FileResultLogger(
    output_dir="./results",
    filename_pattern="benchmark_{timestamp}.jsonl",
    include_traces=True,
    include_config=True,
    validate_on_completion=True
)

# Use with benchmark
benchmark = MyBenchmark(
    tasks=tasks,
    agent_data=agent_data,
    callbacks=[logger]
)
results = benchmark.run()

# Results are written to: ./results/benchmark_20251028_143022.jsonl

__init__

__init__(
    output_dir: Path | str = "./results",
    filename_pattern: str = "benchmark_{timestamp}.jsonl",
    write_metadata: bool = True,
    atomic_writes: bool = True,
    overwrite: bool = False,
    include_traces: bool = True,
    include_config: bool = True,
    include_eval: bool = True,
    include_task: bool = True,
    include_usage: bool = True,
    validate_on_completion: bool = True,
)

Initialize the file logger.

PARAMETER DESCRIPTION
output_dir

Directory where result files will be written (created if needed). Accepts either a Path object or a string path.

TYPE: Path | str DEFAULT: './results'

filename_pattern

Pattern for result filename. Use {timestamp} for automatic timestamp insertion (format: YYYYMMDD_HHMMSS)

TYPE: str DEFAULT: 'benchmark_{timestamp}.jsonl'

write_metadata

If True, write a metadata file alongside results

TYPE: bool DEFAULT: True

atomic_writes

If True, use atomic writes (write to temp, then rename)

TYPE: bool DEFAULT: True

overwrite

If True, overwrite existing files. If False, raise an error when the output file already exists.

TYPE: bool DEFAULT: False

include_traces

If True, include execution traces in logged results

TYPE: bool DEFAULT: True

include_config

If True, include configuration in logged results

TYPE: bool DEFAULT: True

include_eval

If True, include evaluation results in logged results

TYPE: bool DEFAULT: True

include_task

If True, include task data (query, metadata, protocol) in logged results

TYPE: bool DEFAULT: True

include_usage

If True, include API usage data in logged results

TYPE: bool DEFAULT: True

validate_on_completion

If True, validate all iterations were logged

TYPE: bool DEFAULT: True

finalize

finalize() -> None

Finalize logging by closing files and writing metadata.

RAISES DESCRIPTION
IOError

If file operations fail

gather_traces

gather_traces() -> dict[str, Any]

Gather execution traces from this callback.

By default, callbacks don't store traces, but subclasses can override this to provide custom tracing data.

RETURNS DESCRIPTION
dict[str, Any]

Dictionary with basic callback information. Subclasses should

dict[str, Any]

extend this with their own data.

log_iteration

log_iteration(report: Dict) -> None

Log a single task iteration to the JSONL file.

PARAMETER DESCRIPTION
report

Filtered report dict to write

TYPE: Dict

RAISES DESCRIPTION
IOError

If writing to file fails

on_event

on_event(event_name: str, **data) -> None

Handle a generic event.

on_run_end

on_run_end(
    benchmark: Benchmark, results: List[Dict]
) -> None

Called when benchmark execution completes.

Finalizes logging and optionally validates completeness.

PARAMETER DESCRIPTION
benchmark

The benchmark instance

TYPE: Benchmark

results

List of all result reports from the benchmark

TYPE: List[Dict]

on_run_start

on_run_start(benchmark: Benchmark) -> None

Called when benchmark execution starts.

Records the expected number of tasks and repeats for validation.

PARAMETER DESCRIPTION
benchmark

The benchmark instance

TYPE: Benchmark

on_task_repeat_end

on_task_repeat_end(
    benchmark: Benchmark, report: Dict
) -> None

Called after each task iteration completes.

Filters the report based on include flags, logs it, and tracks the iteration.

PARAMETER DESCRIPTION
benchmark

The benchmark instance

TYPE: Benchmark

report

The complete report dict with task_id, repeat_idx, traces, config, eval

TYPE: Dict

validate

validate() -> bool

Validate that all expected iterations were written to file.

Checks: 1. Number of lines matches number of logged iterations 2. All expected iterations are present 3. No duplicate iterations exist

RETURNS DESCRIPTION
bool

True if validation passes, False otherwise

Progress Bars

View source

ProgressBarCallback

Bases: BenchmarkCallback, ABC

Abstract base class for progress bar callbacks.

Displays benchmark execution progress including overall completion, success rate, time elapsed/remaining, and custom metrics. Automatically tracks benchmark execution and updates the progress bar as tasks complete.

Use TqdmProgressBarCallback or RichProgressBarCallback directly, or subclass them to customize metric display.

User-facing methods:

  • set_metrics(**metrics): Manually update displayed metrics
  • update_metrics(report): Override to automatically extract metrics from task reports
Example
from maseval.core.callbacks.progress_bar import TqdmProgressBarCallback

# Option 1: Use directly with manual metric updates
progress_bar = TqdmProgressBarCallback()
benchmark = MyBenchmark(callbacks=[progress_bar])
benchmark.run(tasks)
progress_bar.set_metrics(accuracy="95.2%", avg_score="0.87")

# Option 2: Subclass to automatically extract metrics from reports
class MyProgressBar(TqdmProgressBarCallback):
    def update_metrics(self, report):
        if "evaluation_result" in report:
            return {"accuracy": f"{report['evaluation_result']['acc']:.1%}"}
        return {}

progress_bar = MyProgressBar()
benchmark = MyBenchmark(callbacks=[progress_bar])
benchmark.run(tasks)  # Metrics auto-update after each task
PARAMETER DESCRIPTION
desc

Custom description. Defaults to "Running {BenchmarkClassName}"

TYPE: Optional[str] DEFAULT: None

show_status

Whether to display success counter (X/Y Successful)

TYPE: bool DEFAULT: True

gather_traces

gather_traces() -> dict[str, Any]

Gather execution traces from this callback.

By default, callbacks don't store traces, but subclasses can override this to provide custom tracing data.

RETURNS DESCRIPTION
dict[str, Any]

Dictionary with basic callback information. Subclasses should

dict[str, Any]

extend this with their own data.

on_event

on_event(event_name: str, **data) -> None

Handle a generic event.

on_run_end

on_run_end(
    benchmark: Benchmark, results: List[Dict]
) -> None

Called by benchmark framework when run completes.

on_run_start

on_run_start(benchmark: Benchmark) -> None

Called by benchmark framework when run starts.

on_task_repeat_end

on_task_repeat_end(
    benchmark: Benchmark, report: Dict
) -> None

Called by benchmark framework when a task repeat completes.

set_metrics

set_metrics(**metrics: str) -> None

Manually update custom metrics displayed in the progress bar.

Call this method to set or update metrics at any time during benchmark execution. The progress bar will immediately reflect the changes.

PARAMETER DESCRIPTION
**metrics

Key-value pairs to display (e.g., accuracy="95%", loss="0.23")

TYPE: str DEFAULT: {}

Example
progress_bar = TqdmProgressBarCallback()
benchmark = MyBenchmark(callbacks=[progress_bar])

# Update metrics during or after execution
progress_bar.set_metrics(accuracy="95%", f1="0.87")
progress_bar.set_metrics(avg_loss="0.23")  # Updates/adds metrics

update_metrics

update_metrics(report: Dict) -> Dict[str, str]

Extract and return custom metrics from task execution reports.

Override this method in a subclass to automatically display metrics extracted from benchmark task reports. Called by the framework after each task completes.

The default implementation returns an empty dict (no automatic metrics). Use set_metrics() instead if you prefer manual metric updates.

PARAMETER DESCRIPTION
report

Task execution report containing status, results, and evaluation data. Common keys include "status", "evaluation_result", "agent_response".

TYPE: Dict

RETURNS DESCRIPTION
Dict[str, str]

Dictionary mapping metric names to string values for display.

Dict[str, str]

Return empty dict {} if no metrics should be added.

Example
class MyProgressBar(TqdmProgressBarCallback):
    def update_metrics(self, report):
        # Extract metrics from evaluation results
        if "evaluation_result" in report:
            result = report["evaluation_result"]
            return {
                "accuracy": f"{result['accuracy']:.1%}",
                "f1": f"{result['f1']:.2f}"
            }
        return {}  # No metrics for this report

progress_bar = MyProgressBar()
benchmark = MyBenchmark(callbacks=[progress_bar])
benchmark.run(tasks)  # Metrics auto-update after each task

TqdmProgressBarCallback

Bases: ProgressBarCallback

Progress bar callback using tqdm (recommended default).

Simple text-based progress bar that works in terminals and Jupyter notebooks. Displays task completion, success rate, and custom metrics.

Example
from maseval.core.callbacks.progress_bar import TqdmProgressBarCallback

# Basic usage
progress_bar = TqdmProgressBarCallback()
benchmark = MyBenchmark(callbacks=[progress_bar])
benchmark.run(tasks)

# With custom description and metrics
progress_bar = TqdmProgressBarCallback(desc="Evaluating agents")
progress_bar.set_metrics(accuracy="95%", f1="0.87")
benchmark.run(tasks)
PARAMETER DESCRIPTION
desc

Custom description (defaults to "Running {BenchmarkClassName}")

TYPE: Optional[str] DEFAULT: None

show_status

Show success counter (default: True)

TYPE: bool DEFAULT: True

leave

Keep bar visible after completion (default: True)

TYPE: bool DEFAULT: True

ncols

Width in characters (default: auto)

TYPE: Optional[int] DEFAULT: None

bar_format

Custom tqdm format string (default: None)

TYPE: Optional[str] DEFAULT: None

gather_traces

gather_traces() -> dict[str, Any]

Gather execution traces from this callback.

By default, callbacks don't store traces, but subclasses can override this to provide custom tracing data.

RETURNS DESCRIPTION
dict[str, Any]

Dictionary with basic callback information. Subclasses should

dict[str, Any]

extend this with their own data.

on_event

on_event(event_name: str, **data) -> None

Handle a generic event.

on_run_end

on_run_end(
    benchmark: Benchmark, results: List[Dict]
) -> None

Called by benchmark framework when run completes.

on_run_start

on_run_start(benchmark: Benchmark) -> None

Called by benchmark framework when run starts.

on_task_repeat_end

on_task_repeat_end(
    benchmark: Benchmark, report: Dict
) -> None

Called by benchmark framework when a task repeat completes.

set_metrics

set_metrics(**metrics: str) -> None

Manually update custom metrics displayed in the progress bar.

Call this method to set or update metrics at any time during benchmark execution. The progress bar will immediately reflect the changes.

PARAMETER DESCRIPTION
**metrics

Key-value pairs to display (e.g., accuracy="95%", loss="0.23")

TYPE: str DEFAULT: {}

Example
progress_bar = TqdmProgressBarCallback()
benchmark = MyBenchmark(callbacks=[progress_bar])

# Update metrics during or after execution
progress_bar.set_metrics(accuracy="95%", f1="0.87")
progress_bar.set_metrics(avg_loss="0.23")  # Updates/adds metrics

update_metrics

update_metrics(report: Dict) -> Dict[str, str]

Extract and return custom metrics from task execution reports.

Override this method in a subclass to automatically display metrics extracted from benchmark task reports. Called by the framework after each task completes.

The default implementation returns an empty dict (no automatic metrics). Use set_metrics() instead if you prefer manual metric updates.

PARAMETER DESCRIPTION
report

Task execution report containing status, results, and evaluation data. Common keys include "status", "evaluation_result", "agent_response".

TYPE: Dict

RETURNS DESCRIPTION
Dict[str, str]

Dictionary mapping metric names to string values for display.

Dict[str, str]

Return empty dict {} if no metrics should be added.

Example
class MyProgressBar(TqdmProgressBarCallback):
    def update_metrics(self, report):
        # Extract metrics from evaluation results
        if "evaluation_result" in report:
            result = report["evaluation_result"]
            return {
                "accuracy": f"{result['accuracy']:.1%}",
                "f1": f"{result['f1']:.2f}"
            }
        return {}  # No metrics for this report

progress_bar = MyProgressBar()
benchmark = MyBenchmark(callbacks=[progress_bar])
benchmark.run(tasks)  # Metrics auto-update after each task

RichProgressBarCallback

Bases: ProgressBarCallback

Progress bar callback using rich library.

Visually enhanced progress bar with color formatting, rich text support, and improved aesthetics. Requires the rich library to be installed.

Example
from maseval.core.callbacks.progress_bar import RichProgressBarCallback

# Basic usage
progress_bar = RichProgressBarCallback()
benchmark = MyBenchmark(callbacks=[progress_bar])
benchmark.run(tasks)

# With custom metrics
progress_bar = RichProgressBarCallback(desc="Benchmarking")
progress_bar.set_metrics(avg_score="0.89", correct="42/50")
benchmark.run(tasks)
PARAMETER DESCRIPTION
desc

Custom description (defaults to "Running {BenchmarkClassName}")

TYPE: Optional[str] DEFAULT: None

show_status

Show colored success counter (default: True)

TYPE: bool DEFAULT: True

transient

Remove bar after completion (default: False)

TYPE: bool DEFAULT: False

gather_traces

gather_traces() -> dict[str, Any]

Gather execution traces from this callback.

By default, callbacks don't store traces, but subclasses can override this to provide custom tracing data.

RETURNS DESCRIPTION
dict[str, Any]

Dictionary with basic callback information. Subclasses should

dict[str, Any]

extend this with their own data.

on_event

on_event(event_name: str, **data) -> None

Handle a generic event.

on_run_end

on_run_end(
    benchmark: Benchmark, results: List[Dict]
) -> None

Called by benchmark framework when run completes.

on_run_start

on_run_start(benchmark: Benchmark) -> None

Called by benchmark framework when run starts.

on_task_repeat_end

on_task_repeat_end(
    benchmark: Benchmark, report: Dict
) -> None

Called by benchmark framework when a task repeat completes.

set_metrics

set_metrics(**metrics: str) -> None

Manually update custom metrics displayed in the progress bar.

Call this method to set or update metrics at any time during benchmark execution. The progress bar will immediately reflect the changes.

PARAMETER DESCRIPTION
**metrics

Key-value pairs to display (e.g., accuracy="95%", loss="0.23")

TYPE: str DEFAULT: {}

Example
progress_bar = TqdmProgressBarCallback()
benchmark = MyBenchmark(callbacks=[progress_bar])

# Update metrics during or after execution
progress_bar.set_metrics(accuracy="95%", f1="0.87")
progress_bar.set_metrics(avg_loss="0.23")  # Updates/adds metrics

update_metrics

update_metrics(report: Dict) -> Dict[str, str]

Extract and return custom metrics from task execution reports.

Override this method in a subclass to automatically display metrics extracted from benchmark task reports. Called by the framework after each task completes.

The default implementation returns an empty dict (no automatic metrics). Use set_metrics() instead if you prefer manual metric updates.

PARAMETER DESCRIPTION
report

Task execution report containing status, results, and evaluation data. Common keys include "status", "evaluation_result", "agent_response".

TYPE: Dict

RETURNS DESCRIPTION
Dict[str, str]

Dictionary mapping metric names to string values for display.

Dict[str, str]

Return empty dict {} if no metrics should be added.

Example
class MyProgressBar(TqdmProgressBarCallback):
    def update_metrics(self, report):
        # Extract metrics from evaluation results
        if "evaluation_result" in report:
            result = report["evaluation_result"]
            return {
                "accuracy": f"{result['accuracy']:.1%}",
                "f1": f"{result['f1']:.2f}"
            }
        return {}  # No metrics for this report

progress_bar = MyProgressBar()
benchmark = MyBenchmark(callbacks=[progress_bar])
benchmark.run(tasks)  # Metrics auto-update after each task