SmolAgents

Adapter implementing commonly used functions for the HuggingFace smolagents package.

Installation

pip install maseval[smolagents]

Alternatively, install smolagents directly:

pip install smolagents

API Reference

View source

SmolAgentAdapter

Bases: AgentAdapter

An AgentAdapter for HuggingFace smolagents MultiStepAgent.

This adapter integrates smolagents' MultiStepAgent with MASEval's benchmarking framework, converting smolagents' internal message format to OpenAI-compatible MessageHistory format. It automatically tracks tool calls, tool responses, agent reasoning steps, and provides comprehensive execution monitoring through smolagents' built-in memory system.

The adapter leverages smolagents' native memory storage as the source of truth, dynamically fetching messages, logs, and execution traces from the agent's internal state. This ensures accurate tracking of tool usage, timing, and token consumption without additional overhead.

How to use

Create a smolagents agent with tools and configuration
Wrap with SmolAgentAdapter to enable MASEval integration
Use in benchmarks or call directly for testing
Access traces and config for analysis and debugging

Example workflow:

from maseval.interface.agents.smolagents import SmolAgentAdapter
from smolagents import MultiStepAgent, ToolCallingAgent
from smolagents.tools import DuckDuckGoSearchTool

# Create a smolagents agent
agent = ToolCallingAgent(
    tools=[DuckDuckGoSearchTool()],
    model="gpt-4",
    max_steps=10
)

# Wrap with adapter
agent_adapter = SmolAgentAdapter(agent, name="search_agent")

# Run agent
result = agent_adapter.run("What's the latest news on AI?")

# Access message history in OpenAI format
for msg in agent_adapter.get_messages():
    print(f"{msg['role']}: {msg['content']}")

# Gather aggregated usage
usage = agent_adapter.gather_usage()
print(f"Total tokens: {usage.total_tokens}")

# Gather execution traces with timing
traces = agent_adapter.gather_traces()
print(f"Total duration: {traces['total_duration_seconds']}s")

# Use in benchmark
benchmark = MyBenchmark(agent_data={"agent": agent_adapter})
results = benchmark.run(tasks)

The adapter automatically converts smolagents' ActionStep and PlanningStep objects into structured logs, preserving timing, token usage, tool calls, and error information.

Message Format

smolagents uses its own message format. The adapter converts to maseval / OpenAI format.

Tool calls are preserved with their IDs, names, and arguments.

Execution Monitoring

The adapter provides comprehensive monitoring through gather_traces():

Token usage: Input, output, and total tokens per step and aggregated
Timing: Duration per step and total execution time
Tool calls: Complete tool call history with arguments and results
Errors: Error tracking with type and message
Observations: Tool outputs and agent observations

Requires

smolagents to be installed: pip install maseval[smolagents]

logs `property`

logs: List[Dict[str, Any]]

Dynamically generate logs from smolagents' internal memory.

Converts smolagents' ActionStep and PlanningStep objects into log entries compatible with the AgentAdapter contract, including all available properties.

RETURNS	DESCRIPTION
`List[Dict[str, Any]]`	List of log dictionaries with comprehensive step information

init

__init__(
    agent_instance: Any,
    name: str,
    callbacks: Any = None,
    cost_calculator: Optional[CostCalculator] = None,
    model_id: Optional[str] = None,
)

Initialize the Smolagent adapter.

Note: We don't call super().init() to avoid initializing self.logs as a list, since we override it as a property that dynamically fetches from agent.memory.

PARAMETER	DESCRIPTION
`agent_instance`	smolagents MultiStepAgent or similar TYPE: `Any`
`name`	Agent name for identification TYPE: `str`
`callbacks`	Optional list of AgentCallback instances TYPE: `Any` DEFAULT: `None`
`cost_calculator`	Optional cost calculator. If not provided, a `LiteLLMCostCalculator` is created automatically when litellm is available. TYPE: `Optional[CostCalculator]` DEFAULT: `None`
`model_id`	Optional model ID for cost calculation. If not provided, auto-detected from `agent.model.model_id`. TYPE: `Optional[str]` DEFAULT: `None`

gather_config

gather_config() -> dict[str, Any]

Gather configuration from this SmolAgent.

Integrates with smolagents' native configuration system by accessing the agent's to_dict() method which includes comprehensive config data.

RETURNS	DESCRIPTION
`dict[str, Any]`	Dictionary containing:
`dict[str, Any]`	type: Component class name
`dict[str, Any]`	gathered_at: ISO timestamp
`dict[str, Any]`	name: Agent name
`dict[str, Any]`	agent_type: Underlying agent class name
`dict[str, Any]`	adapter_type: SmolAgentAdapter
`dict[str, Any]`	callbacks: List of callback class names
`dict[str, Any]`	smolagents_config: Full configuration from agent.to_dict() including: model: Model configuration with class and parameters tools: List of tool configurations max_steps: Maximum number of steps planning_interval: Planning interval (if set) verbosity_level: Logging verbosity additional_authorized_imports: Additional imports (CodeAgent only) executor_type: Code executor type (CodeAgent only) managed_agents: List of managed agent configs (if any)

gather_traces

gather_traces() -> dict

Gather traces including message history and monitoring data.

Extends the base class to include smolagents' per-step monitoring data (token usage, timing, actions, observations). Aggregated usage totals are available via gather_usage().

RETURNS	DESCRIPTION
`dict`	Dict containing messages and per-step monitoring statistics.

gather_usage

gather_usage() -> Usage

Gather usage with automatic cost calculation.

Calls _gather_usage() for raw token counts, then applies the cost calculator if one is available and cost is still 0.0.

The model_id used for cost calculation is resolved in order:

Explicit model_id passed to __init__
Auto-detected from the framework agent via _resolve_model_id()

Subclasses should override _gather_usage() (not this method) to provide framework-specific token extraction.

RETURNS	DESCRIPTION
`Usage`	Usage (or TokenUsage) with cost filled in when possible.

get_messages

get_messages() -> MessageHistory

Get message history by converting from smolagents memory.

This method dynamically fetches messages from the agent's internal memory and converts them to MASEval format.

RETURNS	DESCRIPTION
`MessageHistory`	MessageHistory with converted messages from smolagents

run

run(query: str) -> Any

Executes the agent and returns the result.

SmolAgentLLMUser

Bases: LLMUser

A smolagents-specific LLM user that provides a tool for user interaction.

Extends LLMUser to provide a smolagents-compatible tool via get_tool(). Requires smolagents to be installed.

Example

from maseval.interface.agents.smolagents import SmolAgentLLMUser

user = SmolAgentLLMUser(...)
tool = user.get_tool()  # Returns a SmolAgentUserSimulationInputTool

termination_reason `property`

termination_reason: TerminationReason

Get the reason why the user interaction terminated.

RETURNS	DESCRIPTION
`TerminationReason`	Why `is_done()` returns True, or `NOT_TERMINATED` if still ongoing.

init

__init__(
    name: str,
    model: ModelAdapter,
    user_profile: Dict[str, Any],
    scenario: str,
    initial_query: Optional[str] = None,
    template: Optional[str] = None,
    max_try: int = 3,
    max_turns: int = 1,
    stop_tokens: Optional[List[str]] = None,
    early_stopping_condition: Optional[str] = None,
    exhausted_response: Optional[str] = None,
)

Initialize the LLMUser.

PARAMETER	DESCRIPTION
`name`	The name of the user. TYPE: `str`
`model`	The language model to be used for generating responses. TYPE: `ModelAdapter`
`user_profile`	A dictionary describing the user's persona, preferences, and other relevant information. TYPE: `Dict[str, Any]`
`scenario`	A description of the situation or task the user is trying to accomplish. TYPE: `str`
`initial_query`	A pre-set query to start the conversation. If provided, it becomes the first user message. If None, call get_initial_query() to generate one from the model based on the user profile and scenario. Defaults to None. TYPE: `Optional[str]` DEFAULT: `None`
`template`	A custom prompt template for the user simulator. Defaults to None. TYPE: `Optional[str]` DEFAULT: `None`
`max_try`	The maximum number of attempts for the simulator to generate a valid response. Defaults to 3. TYPE: `int` DEFAULT: `3`
`max_turns`	Maximum number of user messages in the conversation. Each user message counts as one turn, including the initial_query. Use max_turns=1 for single-turn benchmarks, or higher values for multi-turn interaction. Defaults to 1. TYPE: `int` DEFAULT: `1`
`stop_tokens`	List of tokens that signal user satisfaction, enabling early termination. When the user's LLM-generated response contains any of these tokens, is_done() returns True regardless of remaining turns. The matched token is stripped from the response. Defaults to None (early stopping disabled). TYPE: `Optional[List[str]]` DEFAULT: `None`
`early_stopping_condition`	A description of when the user should stop the conversation (e.g., "all goals have been accomplished"). Used with stop_tokens to instruct the LLM when to emit a stop token. Must be provided if stop_tokens is set. Defaults to None. TYPE: `Optional[str]` DEFAULT: `None`
`exhausted_response`	Message to return when `respond()` is called after the user is done. If `None` (default), raises `UserExhaustedError` instead. Set this to a descriptive string (e.g., `"The user is no longer available. Proceed with the information you have."`) for tool-based integrations where the agent controls when to call the user. TYPE: `Optional[str]` DEFAULT: `None`

RAISES	DESCRIPTION
`ValueError`	If stop_tokens is set but early_stopping_condition is not provided.

gather_config

gather_config() -> Dict[str, Any]

Gather configuration from this user.

Output fields:

name - User identifier
profile - User profile data
scenario - Task scenario description
max_turns - Maximum interaction turns
stop_tokens - Early stopping tokens (empty list if disabled)
exhausted_response - Message returned when user is done, or None

RETURNS	DESCRIPTION
`Dict[str, Any]`	Dictionary containing user configuration.

gather_traces

gather_traces() -> Dict[str, Any]

Gather execution traces from this user.

Output fields:

name - User identifier
profile - User profile data
message_count - Number of messages in history
messages - Full conversation history
logs - Execution logs with timing
termination_reason - Why interaction ended (see TerminationReason)
stop_reason - Which stop token triggered termination, if any
max_turns - Maximum allowed turns
turns_used - Actual turns used
stopped_by_user - Whether user emitted a stop token

RETURNS	DESCRIPTION
`Dict[str, Any]`	Dictionary containing user state and interaction data.

get_initial_query

get_initial_query() -> str

Get the initial query for the conversation.

If an initial_query was provided at construction, returns it. Otherwise, generates one using the LLM simulator based on the user's profile and scenario.

This method: - Returns the existing initial query if one was provided - Or calls the LLM simulator to generate one - Ensures the query is in the message history - Counts the initial query as the first turn

RETURNS	DESCRIPTION
`str`	The initial query (either pre-set or LLM-generated).

RAISES	DESCRIPTION
`RuntimeError`	If called after conversation has progressed beyond the initial message.

get_tool

get_tool() -> Any

Get a smolagents-compatible tool for user interaction.

Returns a SmolAgentUserSimulationInputTool instance that wraps this user and can be passed directly to a smolagents agent.

RETURNS	DESCRIPTION
`Any`	A tool instance compatible with smolagents that simulates user responses.

Example

user = SmolAgentLLMUser(model=model, persona="...", scenario="...")
tool = user.get_tool()
agent = CodeAgent(tools=[tool, ...], model=model)

increment_turn

increment_turn() -> None

Increment the turn counter.

Call this after recording a user response in the message history.

is_done

is_done() -> bool

Check if the user interaction should end.

Checks: 1. If max_turns has been reached 2. If the user previously indicated termination (via stop_token)

Subclasses can override to add custom termination logic (e.g., LLM-based satisfaction checks) by calling super().is_done() first.

RETURNS	DESCRIPTION
`bool`	True if the user is done interacting, False to continue.

respond

respond(message: str) -> str

Respond to a message from the agent using LLM simulation.

This method appends the agent's message to the conversation history, generates a response using the LLM simulator, appends the response to the history, and returns it.

If a stop_token is detected in the response, triggers early stopping.

PARAMETER	DESCRIPTION
`message`	The message from the agent to which the user should respond. TYPE: `str`

RETURNS	DESCRIPTION
`str`	The user's response, or `exhausted_response` if done and configured.

RAISES	DESCRIPTION
`UserExhaustedError`	If the user is already done and no `exhausted_response` is configured.

View source

SmolAgentUserSimulationInputTool

Bases: UserInputTool

A tool that simulates user input for smolagents using the User simulator.

This class directly inherits from smolagents.UserInputTool and can be passed to any smolagent. It wraps a SmolAgentLLMUser and intercepts user input requests, routing them through the user's LLM-based response simulation.

Note

Don't instantiate this directly. Use SmolAgentLLMUser.get_tool() instead.

Example

from maseval.interface.agents.smolagents import SmolAgentLLMUser

user = SmolAgentLLMUser(model=model, persona="Helpful user", scenario="Book a flight")
tool = user.get_tool()  # Returns SmolAgentUserSimulationInputTool instance

# Pass to your smolagent
agent = CodeAgent(tools=[tool, ...], model=model)

ATTRIBUTE	DESCRIPTION
`_user`	The SmolAgentLLMUser instance that handles response simulation.

init

__init__(user: SmolAgentLLMUser)

Initialize the tool with a SmolAgentLLMUser.

PARAMETER	DESCRIPTION
`user`	The SmolAgentLLMUser instance to wrap for response simulation. TYPE: `SmolAgentLLMUser`

forward

forward(question: str) -> str

Ask the user a question and get a response.

This method is called by smolagents when the agent needs user input. It delegates to the wrapped SmolAgentLLMUser's respond method.

PARAMETER	DESCRIPTION
`question`	The question to ask the user. TYPE: `str`

RETURNS	DESCRIPTION
`str`	The user's response.

SmolAgents

Installation

API Reference

SmolAgentAdapter

logs property

__init__

gather_config

gather_traces

gather_usage

get_messages

run

SmolAgentLLMUser

termination_reason property

__init__

gather_config

gather_traces

get_initial_query

get_tool

increment_turn

is_done

respond

SmolAgentUserSimulationInputTool

__init__

forward

logs `property`

init

termination_reason `property`

init

init