CAMEL-AI

Adapter for the CAMEL-AI multi-agent framework.

Installation

pip install maseval[camel]

Alternatively, install camel-ai directly:

pip install camel-ai

API Reference

CamelAgentAdapter

Bases: AgentAdapter

An AgentAdapter for CAMEL-AI ChatAgent.

This adapter integrates CAMEL-AI's ChatAgent with MASEval's benchmarking framework, converting CAMEL's message format to OpenAI-compatible MessageHistory format. It leverages CAMEL's native memory system and response info as the source of truth for conversation history and execution traces, ensuring accurate tracking of multi-turn interactions without duplicating CAMEL's internal state.

CAMEL-AI is a modular framework for building intelligent multi-agent systems. The ChatAgent is its core component for single-agent interactions, supporting tool calling, memory management, and various LLM backends.

How to use

1. Create a CAMEL ChatAgent with a system message and optional tools
2. Wrap it with CamelAgentAdapter to enable MASEval integration
3. Use it in benchmarks or call it directly for testing
4. Access traces and config for analysis and debugging

Example workflow:

from maseval.interface.agents.camel import CamelAgentAdapter
from camel.agents import ChatAgent
from camel.models import ModelFactory
from camel.types import ModelPlatformType, ModelType

# Create a CAMEL model
model = ModelFactory.create(
    model_platform=ModelPlatformType.OPENAI,
    model_type=ModelType.GPT_4O_MINI,
)

# Create a CAMEL ChatAgent
agent = ChatAgent(
    system_message="You are a helpful assistant.",
    model=model,
)

# Wrap with adapter
agent_adapter = CamelAgentAdapter(agent, name="assistant")

# Run agent
result = agent_adapter.run("What is the capital of France?")

# Access message history in OpenAI format
for msg in agent_adapter.get_messages():
    print(f"{msg['role']}: {msg['content']}")

# Gather aggregated usage
usage = agent_adapter.gather_usage()
print(f"Total tokens: {usage.total_tokens}")

# Gather execution traces with tool call counts
traces = agent_adapter.gather_traces()
print(f"Tool calls: {traces['total_tool_calls']}")

# Gather configuration
config = agent_adapter.gather_config()
print(f"Model: {config.get('camel_config', {}).get('model_type')}")

# Use in a benchmark (MyBenchmark is your own Benchmark subclass; tasks are the tasks to run)
benchmark = MyBenchmark(agent_data={"agent": agent_adapter})
results = benchmark.run(tasks)

Message Format

CAMEL uses BaseMessage objects with role_name, role_type, and content. The adapter converts these to OpenAI-compatible format via the agent's memory system, which already provides messages in a compatible structure.

Memory as Source of Truth

Following MASEval's adapter pattern (similar to SmolAgentAdapter), this adapter uses CAMEL's native memory and ChatAgentResponse info as the single source of truth. The logs property dynamically extracts execution data from stored responses rather than manually tracking metrics.

Execution Model

CAMEL's ChatAgent uses a step() method for execution, which processes one turn of conversation and returns a ChatAgentResponse containing:

- msgs: Response messages
- terminated: Whether the conversation should end
- info: Dict with usage stats, tool_calls, termination_reasons, etc.
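To make the response shape concrete, the sketch below drives a step()-style agent until it reports termination. FakeMessage, FakeResponse, and EchoAgent are hypothetical stand-ins that only mimic the msgs, terminated, and info fields described above. They are not CAMEL classes, and the keys placed in info are illustrative.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class FakeMessage:
    """Hypothetical stand-in for a CAMEL message; only `content` is modeled."""
    content: str


@dataclass
class FakeResponse:
    """Hypothetical stand-in mirroring the msgs / terminated / info fields."""
    msgs: List[FakeMessage]
    terminated: bool
    info: Dict[str, Any] = field(default_factory=dict)


class EchoAgent:
    """Toy agent: echoes its input and terminates after `max_steps` turns."""

    def __init__(self, max_steps: int = 2) -> None:
        self.max_steps = max_steps
        self.steps_taken = 0

    def step(self, text: str) -> FakeResponse:
        self.steps_taken += 1
        return FakeResponse(
            msgs=[FakeMessage(content=f"echo: {text}")],
            terminated=self.steps_taken >= self.max_steps,
            info={"usage": {"total_tokens": len(text.split())}, "tool_calls": []},
        )


def run_until_terminated(agent: EchoAgent, query: str, limit: int = 10) -> List[FakeResponse]:
    """Call step() repeatedly, stopping on termination or after `limit` turns."""
    responses: List[FakeResponse] = []
    for _ in range(limit):
        response = agent.step(query)
        responses.append(response)
        if response.terminated:
            break
    return responses


responses = run_until_terminated(EchoAgent(max_steps=2), "hello world")
total_tokens = sum(r.info["usage"]["total_tokens"] for r in responses)
```

Keeping every response object around, rather than copying selected fields out of it, is what lets later analysis read usage and termination data straight from the source.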

Requires

camel-ai must be installed: pip install maseval[camel]

logs `property`

logs: List[Dict[str, Any]]

Dynamically generate logs from CAMEL's ChatAgentResponse info.

Extracts execution data from stored ChatAgentResponse objects, including token usage, tool calls, and termination reasons. This follows the same pattern as SmolAgentAdapter.

RETURNS	DESCRIPTION
`List[Dict[str, Any]]`	List of log dictionaries with comprehensive step information
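The idea of a derived logs property can be sketched as follows. This is a minimal illustration, not the adapter's code: ResponseBackedLogs is a hypothetical class, and the log keys (step, total_tokens, tool_call_count, termination_reasons) are assumed names chosen for the example.

```python
from typing import Any, Dict, List


class ResponseBackedLogs:
    """Hypothetical sketch: logs are derived on access, never stored separately."""

    def __init__(self) -> None:
        # Each entry is the `info` dict of one stored response (illustrative shape).
        self._responses: List[Dict[str, Any]] = []

    def store(self, info: Dict[str, Any]) -> None:
        self._responses.append(info)

    @property
    def logs(self) -> List[Dict[str, Any]]:
        # Rebuilt on every access, so it always reflects the stored responses.
        return [
            {
                "step": index,
                "total_tokens": info.get("usage", {}).get("total_tokens", 0),
                "tool_call_count": len(info.get("tool_calls", [])),
                "termination_reasons": info.get("termination_reasons", []),
            }
            for index, info in enumerate(self._responses)
        ]


tracker = ResponseBackedLogs()
tracker.store({"usage": {"total_tokens": 12}, "tool_calls": ["search"]})
tracker.store({"usage": {"total_tokens": 7}, "termination_reasons": ["stop"]})
logs = tracker.logs
```

Because nothing is cached, there is no second copy of the metrics that could drift out of sync with the stored responses.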

__init__

__init__(
    agent_instance: Any,
    name: str,
    callbacks: Optional[List[Any]] = None,
    cost_calculator: Optional[CostCalculator] = None,
    model_id: Optional[str] = None,
)

Initialize the CAMEL adapter.

Note: We don't call super().__init__() to avoid initializing self.logs as a list, since we override it as a property that dynamically fetches from stored responses.

PARAMETER	DESCRIPTION
`agent_instance`	CAMEL ChatAgent instance TYPE: `Any`
`name`	Agent name for identification TYPE: `str`
`callbacks`	Optional list of AgentCallback instances TYPE: `Optional[List[Any]]` DEFAULT: `None`
`cost_calculator`	Optional cost calculator. If not provided, a `LiteLLMCostCalculator` is created automatically when litellm is available. TYPE: `Optional[CostCalculator]` DEFAULT: `None`
`model_id`	Optional model ID for cost calculation. If not provided, auto-detected from `agent.model_backend.model_type`. TYPE: `Optional[str]` DEFAULT: `None`

gather_config

gather_config() -> Dict[str, Any]

Gather configuration from this CAMEL agent.

RETURNS	DESCRIPTION
`Dict[str, Any]`	Dictionary containing the base config (type, gathered_at, name, agent_type, adapter_type, callbacks) and camel_config, the CAMEL-specific configuration: system_message (the agent's system prompt), model_type (the model being used), tools (list of configured tools), and memory_type (type of memory being used).

gather_traces

gather_traces() -> Dict[str, Any]

Gather execution traces from this CAMEL agent.

Extends the base class to include CAMEL-specific per-step execution data. Aggregated usage totals are available via gather_usage().

RETURNS	DESCRIPTION
`Dict[str, Any]`	Dictionary containing base traces plus step count, tool call count, and termination status.

gather_usage

gather_usage() -> Usage

Gather usage with automatic cost calculation.

Calls _gather_usage() for raw token counts, then applies the cost calculator if one is available and cost is still 0.0.

The model_id used for cost calculation is resolved in order:

1. Explicit model_id passed to __init__
2. Auto-detected from the framework agent via _resolve_model_id()

Subclasses should override _gather_usage() (not this method) to provide framework-specific token extraction.

RETURNS	DESCRIPTION
`Usage`	Usage (or TokenUsage) with cost filled in when possible.
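The resolution order and the "only if cost is still 0.0" rule can be sketched in isolation. Everything named here (SketchUsage, CostFn, gather_usage_sketch, flat_rate) is hypothetical and exists only for illustration; the real Usage type and cost calculator interface may differ.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class SketchUsage:
    """Hypothetical usage record; field names are illustrative."""
    input_tokens: int = 0
    output_tokens: int = 0
    cost: float = 0.0


# Hypothetical calculator signature: (usage, model_id) -> cost
CostFn = Callable[[SketchUsage, str], float]


def gather_usage_sketch(
    raw_usage: SketchUsage,
    explicit_model_id: Optional[str],
    detected_model_id: Optional[str],
    cost_fn: Optional[CostFn],
) -> SketchUsage:
    """Fill in cost only when a calculator and model ID exist and cost is 0.0."""
    # Resolution order: explicit ID first, then the auto-detected one.
    model_id = explicit_model_id or detected_model_id
    if cost_fn is not None and model_id is not None and raw_usage.cost == 0.0:
        raw_usage.cost = cost_fn(raw_usage, model_id)
    return raw_usage


def flat_rate(usage: SketchUsage, model_id: str) -> float:
    """Toy pricing: one unit per 1000 tokens, regardless of model."""
    return (usage.input_tokens + usage.output_tokens) / 1000


priced = gather_usage_sketch(SketchUsage(600, 400), None, "gpt-4o-mini", flat_rate)
kept = gather_usage_sketch(SketchUsage(600, 400, cost=0.5), "my-model", None, flat_rate)
```

In the second call the cost reported upstream is left untouched, which matches the rule that the calculator applies only when cost is still 0.0.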

get_messages

get_messages() -> MessageHistory

Get message history from CAMEL's memory system.

Dynamically fetches messages from the agent's memory, converting them to MASEval's MessageHistory format. CAMEL's memory.get_context() returns messages in OpenAI-compatible format.

RETURNS	DESCRIPTION
`MessageHistory`	MessageHistory with converted messages

run

run(query: str) -> Any

Executes the agent and returns the result.

CamelLLMUser

Bases: LLMUser

A CAMEL-specific LLM user that provides a tool for user interaction.

Extends LLMUser to provide a CAMEL-compatible FunctionTool that wraps the respond method, allowing CAMEL agents to interact with users during benchmarking.

Requires camel-ai to be installed.

Example

from maseval.interface.agents.camel import CamelLLMUser
from maseval.interface.inference import OpenAIModelAdapter

# Create a model for user simulation
model = OpenAIModelAdapter(model_id="gpt-4o-mini")

# Create the user
user = CamelLLMUser(
    name="customer",
    model=model,
    user_profile={"name": "John", "preferences": ["fast service"]},
    scenario="Customer seeking help with a product return",
    initial_query="I need to return a product I bought last week.",
)

# Get the tool for use with CAMEL agent
tool = user.get_tool()

# Create CAMEL agent with the user tool
from camel.agents import ChatAgent
agent = ChatAgent(
    system_message="You are a helpful customer service agent.",
    tools=[tool],
)

termination_reason `property`

termination_reason: TerminationReason

Get the reason why the user interaction terminated.

RETURNS	DESCRIPTION
`TerminationReason`	Why `is_done()` returns True, or `NOT_TERMINATED` if still ongoing.

__init__

__init__(
    name: str,
    model: ModelAdapter,
    user_profile: Dict[str, Any],
    scenario: str,
    initial_query: Optional[str] = None,
    template: Optional[str] = None,
    max_try: int = 3,
    max_turns: int = 1,
    stop_tokens: Optional[List[str]] = None,
    early_stopping_condition: Optional[str] = None,
    exhausted_response: Optional[str] = None,
)

Initialize the LLMUser.

PARAMETER	DESCRIPTION
`name`	The name of the user. TYPE: `str`
`model`	The language model to be used for generating responses. TYPE: `ModelAdapter`
`user_profile`	A dictionary describing the user's persona, preferences, and other relevant information. TYPE: `Dict[str, Any]`
`scenario`	A description of the situation or task the user is trying to accomplish. TYPE: `str`
`initial_query`	A pre-set query to start the conversation. If provided, it becomes the first user message. If None, call get_initial_query() to generate one from the model based on the user profile and scenario. Defaults to None. TYPE: `Optional[str]` DEFAULT: `None`
`template`	A custom prompt template for the user simulator. Defaults to None. TYPE: `Optional[str]` DEFAULT: `None`
`max_try`	The maximum number of attempts for the simulator to generate a valid response. Defaults to 3. TYPE: `int` DEFAULT: `3`
`max_turns`	Maximum number of user messages in the conversation. Each user message counts as one turn, including the initial_query. Use max_turns=1 for single-turn benchmarks, or higher values for multi-turn interaction. Defaults to 1. TYPE: `int` DEFAULT: `1`
`stop_tokens`	List of tokens that signal user satisfaction, enabling early termination. When the user's LLM-generated response contains any of these tokens, is_done() returns True regardless of remaining turns. The matched token is stripped from the response. Defaults to None (early stopping disabled). TYPE: `Optional[List[str]]` DEFAULT: `None`
`early_stopping_condition`	A description of when the user should stop the conversation (e.g., "all goals have been accomplished"). Used with stop_tokens to instruct the LLM when to emit a stop token. Must be provided if stop_tokens is set. Defaults to None. TYPE: `Optional[str]` DEFAULT: `None`
`exhausted_response`	Message to return when `respond()` is called after the user is done. If `None` (default), raises `UserExhaustedError` instead. Set this to a descriptive string (e.g., `"The user is no longer available. Proceed with the information you have."`) for tool-based integrations where the agent controls when to call the user. TYPE: `Optional[str]` DEFAULT: `None`

RAISES	DESCRIPTION
`ValueError`	If stop_tokens is set but early_stopping_condition is not provided.

gather_config

gather_config() -> Dict[str, Any]

Gather configuration from this user.

Output fields:

name - User identifier
profile - User profile data
scenario - Task scenario description
max_turns - Maximum interaction turns
stop_tokens - Early stopping tokens (empty list if disabled)
exhausted_response - Message returned when user is done, or None

RETURNS	DESCRIPTION
`Dict[str, Any]`	Dictionary containing user configuration.

gather_traces

gather_traces() -> Dict[str, Any]

Gather execution traces from this user.

Output fields:

name - User identifier
profile - User profile data
message_count - Number of messages in history
messages - Full conversation history
logs - Execution logs with timing
termination_reason - Why interaction ended (see TerminationReason)
stop_reason - Which stop token triggered termination, if any
max_turns - Maximum allowed turns
turns_used - Actual turns used
stopped_by_user - Whether user emitted a stop token

RETURNS	DESCRIPTION
`Dict[str, Any]`	Dictionary containing user state and interaction data.

get_initial_query

get_initial_query() -> str

Get the initial query for the conversation.

If an initial_query was provided at construction, returns it. Otherwise, generates one using the LLM simulator based on the user's profile and scenario.

This method:

- Returns the existing initial query if one was provided
- Or calls the LLM simulator to generate one
- Ensures the query is in the message history
- Counts the initial query as the first turn

RETURNS	DESCRIPTION
`str`	The initial query (either pre-set or LLM-generated).

RAISES	DESCRIPTION
`RuntimeError`	If called after conversation has progressed beyond the initial message.

get_tool

get_tool() -> Any

Get a CAMEL-compatible tool for user interaction.

Returns a CAMEL FunctionTool that wraps the respond method, allowing agents to ask the user questions during execution.

RETURNS	DESCRIPTION
`Any`	CAMEL FunctionTool instance for user interaction

increment_turn

increment_turn() -> None

Increment the turn counter.

Call this after recording a user response in the message history.

is_done

is_done() -> bool

Check if the user interaction should end.

Checks:

1. If max_turns has been reached
2. If the user previously indicated termination (via stop_token)

Subclasses can override to add custom termination logic (e.g., LLM-based satisfaction checks) by calling super().is_done() first.

RETURNS	DESCRIPTION
`bool`	True if the user is done interacting, False to continue.
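The two termination checks, together with the stop-token stripping described for stop_tokens, might look like the sketch below. TurnTracker and strip_stop_token are hypothetical helpers written for this illustration, and the "</stop>" token is an arbitrary example value.

```python
from typing import List, Optional, Tuple


def strip_stop_token(response: str, stop_tokens: Optional[List[str]]) -> Tuple[str, Optional[str]]:
    """Return (cleaned response, matched token or None)."""
    for token in stop_tokens or []:
        if token in response:
            return response.replace(token, "").strip(), token
    return response, None


class TurnTracker:
    """Hypothetical sketch of the two termination checks."""

    def __init__(self, max_turns: int = 1) -> None:
        self.max_turns = max_turns
        self.turns_used = 0
        self.stop_reason: Optional[str] = None

    def is_done(self) -> bool:
        # Check 1: turn budget exhausted. Check 2: a stop token was seen earlier.
        return self.turns_used >= self.max_turns or self.stop_reason is not None


tracker = TurnTracker(max_turns=3)
tracker.turns_used += 1  # the initial query counts as the first turn
done_before = tracker.is_done()
cleaned, matched = strip_stop_token("Thanks, that's all. </stop>", ["</stop>"])
tracker.stop_reason = matched
done_after = tracker.is_done()
```

Here the tracker reports done after one of three turns because a stop token was matched, regardless of the remaining turn budget.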

respond

respond(message: str) -> str

Respond to a message from the agent using LLM simulation.

This method appends the agent's message to the conversation history, generates a response using the LLM simulator, appends the response to the history, and returns it.

If a stop_token is detected in the response, triggers early stopping.

PARAMETER	DESCRIPTION
`message`	The message from the agent to which the user should respond. TYPE: `str`

RETURNS	DESCRIPTION
`str`	The user's response, or `exhausted_response` if done and configured.

RAISES	DESCRIPTION
`UserExhaustedError`	If the user is already done and no `exhausted_response` is configured.
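The difference between configuring exhausted_response and leaving it as None can be shown with a small stand-in. SketchUser and SketchUserExhaustedError are hypothetical and defined only for this example; the simulated reply replaces the LLM call.

```python
from typing import Optional


class SketchUserExhaustedError(RuntimeError):
    """Hypothetical stand-in for the UserExhaustedError named above."""


class SketchUser:
    """Hypothetical user that answers a fixed number of times."""

    def __init__(self, max_turns: int, exhausted_response: Optional[str] = None) -> None:
        self.max_turns = max_turns
        self.turns_used = 0
        self.exhausted_response = exhausted_response

    def is_done(self) -> bool:
        return self.turns_used >= self.max_turns

    def respond(self, message: str) -> str:
        if self.is_done():
            # No fallback configured: fail loudly instead of inventing a reply.
            if self.exhausted_response is None:
                raise SketchUserExhaustedError("user is done")
            return self.exhausted_response
        self.turns_used += 1
        return f"(simulated reply to: {message})"


lenient = SketchUser(max_turns=1, exhausted_response="The user is no longer available.")
first = lenient.respond("What is your order number?")
second = lenient.respond("Anything else?")

strict = SketchUser(max_turns=0)
try:
    strict.respond("Hello?")
    raised = False
except SketchUserExhaustedError:
    raised = True
```

A fallback string suits tool-based integrations, where the agent decides when to call the user and an exception inside a tool call would be harder to recover from.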

CamelAgentUser

Bases: User

User backed by a CAMEL ChatAgent.

Wraps a CAMEL ChatAgent to act as the user in MASEval's evaluation loop, enabling agent-to-agent evaluation where one agent acts as the user.

Unlike CamelLLMUser which uses MASEval's LLM simulator, this class delegates directly to a CAMEL ChatAgent for generating responses.

Example

from camel.agents import ChatAgent
from camel.models import ModelFactory
from camel.types import ModelPlatformType, ModelType
from maseval.interface.agents.camel import CamelAgentUser

# Create a ChatAgent to act as the user
model = ModelFactory.create(
    model_platform=ModelPlatformType.OPENAI,
    model_type=ModelType.GPT_4O_MINI,
)
user_agent = ChatAgent(
    system_message="You are a customer seeking help with an order.",
    model=model,
)

# Wrap as MASEval user
user = CamelAgentUser(
    user_agent=user_agent,
    initial_query="I need help with my order",
    max_turns=5,
)

__init__

__init__(
    user_agent: Any,
    initial_query: str,
    name: str = "camel_agent_user",
    max_turns: int = 10,
)

Initialize CamelAgentUser.

PARAMETER	DESCRIPTION
`user_agent`	CAMEL ChatAgent instance to use as the user. TYPE: `Any`
`initial_query`	The opening message to start the conversation. TYPE: `str`
`name`	Name for this user (used in traces). Defaults to "camel_agent_user". TYPE: `str` DEFAULT: `'camel_agent_user'`
`max_turns`	Maximum number of response turns. Defaults to 10. TYPE: `int` DEFAULT: `10`

gather_config

gather_config() -> Dict[str, Any]

Gather configuration from this user.

RETURNS	DESCRIPTION
`Dict[str, Any]`	Dictionary containing configuration information.

gather_traces

gather_traces() -> Dict[str, Any]

Gather execution traces from this user.

RETURNS	DESCRIPTION
`Dict[str, Any]`	Dictionary containing trace information.

get_initial_query

get_initial_query() -> str

Return the initial query to start the conversation.

RETURNS	DESCRIPTION
`str`	The initial query provided at construction.

get_tool

get_tool() -> Any

Return a CAMEL FunctionTool for agent-to-user interaction.

RETURNS	DESCRIPTION
`Any`	CAMEL FunctionTool wrapping the respond method.

is_done

is_done() -> bool

Check if the user interaction should terminate.

RETURNS	DESCRIPTION
`bool`	True if max_turns has been reached.

respond

respond(message: str) -> str

Forward the message to the CAMEL agent and return its response.

PARAMETER	DESCRIPTION
`message`	The agent's message to respond to. TYPE: `str`

RETURNS	DESCRIPTION
`str`	The CAMEL agent's response, or empty string if done.

camel_role_playing_execution_loop

camel_role_playing_execution_loop(
    role_playing: Any,
    task: Any,
    max_steps: int = 10,
    tracer: Optional[CamelRolePlayingTracer] = None,
) -> Any

Execution loop for benchmarks using CAMEL's RolePlaying.

CAMEL's RolePlaying manages its own agent-user coordination: it alternates between an assistant agent and a user agent via step() calls, handling turn-taking and termination internally. This differs from MASEval's default execution loop, which coordinates between an AgentAdapter and a User.

This function bridges the two: call it from your benchmark's execution_loop override to let RolePlaying handle the interaction while MASEval handles the evaluation lifecycle.

PARAMETER	DESCRIPTION
`role_playing`	The CAMEL RolePlaying instance. TYPE: `Any`
`task`	Current MASEval task (passed for interface consistency). TYPE: `Any`
`max_steps`	Maximum number of RolePlaying steps. Defaults to 10. TYPE: `int` DEFAULT: `10`
`tracer`	Optional `CamelRolePlayingTracer` to record step data. TYPE: `Optional[CamelRolePlayingTracer]` DEFAULT: `None`

RETURNS	DESCRIPTION
`Any`	Final answer from the assistant agent, or `None` if no response.

Example

class CamelRolePlayingBenchmark(Benchmark):
    def setup_agents(self, agent_data, environment, task, user):
        self._role_playing = RolePlaying(
            assistant_role_name="Assistant",
            user_role_name="User",
            task_prompt=task.query,
        )

        # Wrap both agents for tracing
        assistant = CamelAgentAdapter(
            self._role_playing.assistant_agent, "assistant"
        )
        user_agent = CamelAgentAdapter(
            self._role_playing.user_agent, "user_agent"
        )

        # Optional: create tracer
        self._tracer = CamelRolePlayingTracer(self._role_playing)
        self.register(self._tracer)

        return [assistant], {"assistant": assistant, "user_agent": user_agent}

    def execution_loop(self, agents, task, environment, user):
        return camel_role_playing_execution_loop(
            self._role_playing, task, tracer=self._tracer
        )
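For intuition about what a bridging loop of this kind does, here is a self-contained sketch. It is not the library's implementation: StubRolePlaying, StubResponse, StubMessage, and role_playing_loop_sketch are hypothetical, and the stub's step() follows the zero-argument call shape used in the examples on this page.

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Tuple


@dataclass
class StubMessage:
    content: str


@dataclass
class StubResponse:
    msgs: List[StubMessage]
    terminated: bool


class StubRolePlaying:
    """Hypothetical stand-in: step() returns (assistant_response, user_response)."""

    def __init__(self, terminate_at: int) -> None:
        self.terminate_at = terminate_at
        self.calls = 0

    def step(self) -> Tuple[StubResponse, StubResponse]:
        self.calls += 1
        done = self.calls >= self.terminate_at
        assistant = StubResponse([StubMessage(f"answer {self.calls}")], done)
        user = StubResponse([StubMessage(f"instruction {self.calls}")], False)
        return assistant, user


def role_playing_loop_sketch(role_playing: Any, max_steps: int = 10, tracer: Any = None) -> Optional[str]:
    """Run up to `max_steps` steps and return the assistant's last message."""
    final_answer: Optional[str] = None
    for _ in range(max_steps):
        assistant_response, user_response = role_playing.step()
        if tracer is not None:
            tracer.record_step(assistant_response, user_response)
        if assistant_response.msgs:
            final_answer = assistant_response.msgs[-1].content
        if assistant_response.terminated or user_response.terminated:
            break
    return final_answer


stopped_early = role_playing_loop_sketch(StubRolePlaying(terminate_at=3))
hit_limit = role_playing_loop_sketch(StubRolePlaying(terminate_at=99), max_steps=2)
```

The sketch returns None when no assistant message was ever produced, mirroring the documented return value.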

CamelRolePlayingTracer

Bases: TraceableMixin, ConfigurableMixin

Collects orchestration traces from CAMEL RolePlaying.

RolePlaying is a CAMEL-AI component that orchestrates turn-based interaction between two ChatAgents (an assistant and a simulated user).

When using RolePlaying, you typically wrap both agents with CamelAgentAdapter to trace their individual message histories and token usage. However, this misses orchestration-level data that no single agent owns: how many back-and-forth steps occurred, which agent terminated the conversation, etc.

This tracer fills that gap by capturing RolePlaying's orchestration state, giving you the complete picture alongside individual agent traces.

Register with benchmark to include in trace collection:

tracer = CamelRolePlayingTracer(role_playing)
self.register(tracer)

Then call record_step() after each RolePlaying.step():

assistant_response, user_response = role_playing.step()
tracer.record_step(assistant_response, user_response)

Example

class MyBenchmark(Benchmark):
    def setup_agents(self, agent_data, environment, task, user):
        self._role_playing = RolePlaying(...)
        self._tracer = CamelRolePlayingTracer(self._role_playing)
        self.register(self._tracer)
        return [...]

    def execution_loop(self, agents, task, environment, user):
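        final_answer = None
        for _ in range(10):
            assistant_response, user_response = self._role_playing.step()
            self._tracer.record_step(assistant_response, user_response)
            # Keep the assistant's latest message as the candidate answer
            if assistant_response.msgs:
                final_answer = assistant_response.msgs[-1].content
            if assistant_response.terminated:
                break
        return final_answer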

__init__

__init__(role_playing: Any, name: str = 'role_playing')

Initialize the RolePlaying tracer.

PARAMETER	DESCRIPTION
`role_playing`	CAMEL RolePlaying instance to trace. TYPE: `Any`
`name`	Name for this tracer in traces. Defaults to "role_playing". TYPE: `str` DEFAULT: `'role_playing'`

gather_config

gather_config() -> Dict[str, Any]

Gather configuration from RolePlaying.

RETURNS	DESCRIPTION
`Dict[str, Any]`	Dictionary containing RolePlaying configuration.

gather_traces

gather_traces() -> Dict[str, Any]

Gather orchestration traces from RolePlaying.

RETURNS	DESCRIPTION
`Dict[str, Any]`	Dictionary containing: name (tracer name), type ("role_playing_orchestration"), step_count (number of steps executed), termination_reason (why the interaction ended), and step_logs (per-step termination data).

record_step

record_step(
    assistant_response: Any, user_response: Any
) -> None

Record data from a RolePlaying step.

Call this after each role_playing.step() to track progress.

PARAMETER	DESCRIPTION
`assistant_response`	ChatAgentResponse from the assistant. TYPE: `Any`
`user_response`	ChatAgentResponse from the user agent. TYPE: `Any`
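A tracer of this shape can be approximated in a few lines. The sketch below is an assumption about how such recording could work, not the actual class: StepRecorder is hypothetical, the per-step keys and termination labels are invented for illustration, and only the top-level trace keys follow the gather_traces description above.

```python
from types import SimpleNamespace
from typing import Any, Dict, List, Optional


class StepRecorder:
    """Hypothetical tracer: accumulates per-step termination data."""

    def __init__(self, name: str = "role_playing") -> None:
        self.name = name
        self._step_logs: List[Dict[str, Any]] = []
        self._termination_reason: Optional[str] = None

    def record_step(self, assistant_response: Any, user_response: Any) -> None:
        entry = {
            "step": len(self._step_logs) + 1,
            "assistant_terminated": bool(assistant_response.terminated),
            "user_terminated": bool(user_response.terminated),
        }
        self._step_logs.append(entry)
        # Remember which side ended the conversation first (illustrative labels).
        if self._termination_reason is None:
            if entry["assistant_terminated"]:
                self._termination_reason = "assistant_terminated"
            elif entry["user_terminated"]:
                self._termination_reason = "user_terminated"

    def gather_traces(self) -> Dict[str, Any]:
        return {
            "name": self.name,
            "type": "role_playing_orchestration",
            "step_count": len(self._step_logs),
            "termination_reason": self._termination_reason,
            "step_logs": list(self._step_logs),
        }


recorder = StepRecorder()
recorder.record_step(SimpleNamespace(terminated=False), SimpleNamespace(terminated=False))
recorder.record_step(SimpleNamespace(terminated=False), SimpleNamespace(terminated=True))
traces = recorder.gather_traces()
```

Recording both responses per step is what makes it possible to tell afterwards which agent ended the conversation, something neither agent's own trace shows.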

CamelWorkforceTracer

Bases: TraceableMixin, ConfigurableMixin

Collects orchestration traces from CAMEL Workforce.

Workforce is a CAMEL-AI component that manages task decomposition, worker assignment, and retry strategies for complex multi-agent collaboration.

When using Workforce, you typically wrap individual workers with CamelAgentAdapter to trace their message histories. However, this misses orchestration-level data that no single worker owns: how the problem was decomposed into subtasks, which worker was assigned to each task, task dependencies, and completion status.

This tracer fills that gap by capturing Workforce's orchestration state, giving you the complete picture alongside individual worker traces.

Note: This tracer accesses Workforce internal attributes (_children, _assignees, _pending_tasks, etc.) which may change with CAMEL updates.

Register with benchmark to include in trace collection:

tracer = CamelWorkforceTracer(workforce)
self.register(tracer)

Example

class MyBenchmark(Benchmark):
    def setup_agents(self, agent_data, environment, task, user):
        workforce = Workforce(...)
        self._workforce = workforce

        # Create tracer and register it
        tracer = CamelWorkforceTracer(workforce)
        self.register(tracer)

        # Wrap individual workers for message tracing
        worker_adapters = {}
        for worker in workforce._children:
            adapter = CamelAgentAdapter(worker.agent, name=worker.name)
            worker_adapters[worker.name] = adapter

        return [], worker_adapters

__init__

__init__(workforce: Any, name: str = 'workforce')

Initialize the Workforce tracer.

PARAMETER	DESCRIPTION
`workforce`	CAMEL Workforce instance to trace. TYPE: `Any`
`name`	Name for this tracer in traces. Defaults to "workforce". TYPE: `str` DEFAULT: `'workforce'`

gather_config

gather_config() -> Dict[str, Any]

Gather configuration from Workforce.

RETURNS	DESCRIPTION
`Dict[str, Any]`	Dictionary containing Workforce configuration.

gather_traces

gather_traces() -> Dict[str, Any]

Gather orchestration traces from Workforce.

Extracts task decomposition, worker assignments, and task lifecycle information from the Workforce's internal state.

RETURNS	DESCRIPTION
`Dict[str, Any]`	Dictionary containing: name (tracer name), type ("workforce_orchestration"), task_decomposition (task dependency graph), worker_assignments (which worker handled which task), completed_tasks (list of completed task summaries), and pending_tasks (count of pending tasks).
