CAMEL-AI
Adapter for the CAMEL-AI multi-agent framework.
Installation
pip install maseval[camel]
Alternatively, install camel-ai directly:
pip install camel-ai
API Reference
CamelAgentAdapter
Bases: AgentAdapter
An AgentAdapter for CAMEL-AI ChatAgent.
This adapter integrates CAMEL-AI's ChatAgent with MASEval's benchmarking framework, converting CAMEL's message format to OpenAI-compatible MessageHistory format. It leverages CAMEL's native memory system and response info as the source of truth for conversation history and execution traces, ensuring accurate tracking of multi-turn interactions without duplicating CAMEL's internal state.
CAMEL-AI is a modular framework for building intelligent multi-agent systems. The ChatAgent is its core component for single-agent interactions, supporting tool calling, memory management, and various LLM backends.
How to use
- Create a CAMEL ChatAgent with system message and optional tools
- Wrap with CamelAgentAdapter to enable MASEval integration
- Use in benchmarks or call directly for testing
- Access traces and config for analysis and debugging
Example workflow:
from maseval.interface.agents.camel import CamelAgentAdapter
from camel.agents import ChatAgent
from camel.models import ModelFactory
from camel.types import ModelPlatformType, ModelType
# Create a CAMEL model
model = ModelFactory.create(
model_platform=ModelPlatformType.OPENAI,
model_type=ModelType.GPT_4O_MINI,
)
# Create a CAMEL ChatAgent
agent = ChatAgent(
system_message="You are a helpful assistant.",
model=model,
)
# Wrap with adapter
agent_adapter = CamelAgentAdapter(agent, name="assistant")
# Run agent
result = agent_adapter.run("What is the capital of France?")
# Access message history in OpenAI format
for msg in agent_adapter.get_messages():
print(f"{msg['role']}: {msg['content']}")
# Gather aggregated usage
usage = agent_adapter.gather_usage()
print(f"Total tokens: {usage.total_tokens}")
# Gather execution traces with tool call counts
traces = agent_adapter.gather_traces()
print(f"Tool calls: {traces['total_tool_calls']}")
# Gather configuration
config = agent_adapter.gather_config()
print(f"Model: {config.get('camel_config', {}).get('model_type')}")
# Use in benchmark
benchmark = MyBenchmark(agent_data={"agent": agent_adapter})
results = benchmark.run(tasks)
Memory as Source of Truth
Following MASEval's adapter pattern (similar to SmolAgentAdapter), this adapter
uses CAMEL's native memory and ChatAgentResponse info as the single source of
truth. The logs property dynamically extracts execution data from stored
responses rather than manually tracking metrics.
Execution Model
CAMEL's ChatAgent uses a step() method for execution, which processes
one turn of conversation and returns a ChatAgentResponse containing:
- msgs: Response messages
- terminated: Whether the conversation should end
- info: Dict with usage stats, tool_calls, termination_reasons, etc.
Requires
camel-ai to be installed: pip install maseval[camel]
logs
property
logs: List[Dict[str, Any]]
Dynamically generate logs from CAMEL's ChatAgentResponse info.
Extracts execution data from stored ChatAgentResponse objects, including token usage, tool calls, and termination reasons. This follows the same pattern as SmolAgentAdapter.
| RETURNS | DESCRIPTION |
|---|---|
List[Dict[str, Any]]
|
List of log dictionaries with comprehensive step information |
__init__
__init__(
agent_instance: Any,
name: str,
callbacks: Optional[List[Any]] = None,
cost_calculator: Optional[CostCalculator] = None,
model_id: Optional[str] = None,
)
Initialize the CAMEL adapter.
Note: We don't call super().init() to avoid initializing self.logs as a list, since we override it as a property that dynamically fetches from stored responses.
| PARAMETER | DESCRIPTION |
|---|---|
agent_instance
|
CAMEL ChatAgent instance
TYPE:
|
name
|
Agent name for identification
TYPE:
|
callbacks
|
Optional list of AgentCallback instances
TYPE:
|
cost_calculator
|
Optional cost calculator. If not provided, a
TYPE:
|
model_id
|
Optional model ID for cost calculation. If not provided,
auto-detected from
TYPE:
|
gather_config
gather_config() -> Dict[str, Any]
Gather configuration from this CAMEL agent.
| RETURNS | DESCRIPTION |
|---|---|
Dict[str, Any]
|
Dictionary containing: |
Dict[str, Any]
|
|
Dict[str, Any]
|
|
gather_traces
gather_traces() -> Dict[str, Any]
Gather execution traces from this CAMEL agent.
Extends the base class to include CAMEL-specific per-step execution
data. Aggregated usage totals are available via gather_usage().
| RETURNS | DESCRIPTION |
|---|---|
Dict[str, Any]
|
Dictionary containing base traces plus step count, tool call count, |
Dict[str, Any]
|
and termination status. |
gather_usage
gather_usage() -> Usage
Gather usage with automatic cost calculation.
Calls _gather_usage() for raw token counts, then applies
the cost calculator if one is available and cost is still 0.0.
The model_id used for cost calculation is resolved in order:
- Explicit
model_idpassed to__init__ - Auto-detected from the framework agent via
_resolve_model_id()
Subclasses should override _gather_usage() (not this method)
to provide framework-specific token extraction.
| RETURNS | DESCRIPTION |
|---|---|
Usage
|
Usage (or TokenUsage) with cost filled in when possible. |
get_messages
get_messages() -> MessageHistory
Get message history from CAMEL's memory system.
Dynamically fetches messages from the agent's memory, converting them to MASEval's MessageHistory format. CAMEL's memory.get_context() returns messages in OpenAI-compatible format.
| RETURNS | DESCRIPTION |
|---|---|
MessageHistory
|
MessageHistory with converted messages |
run
run(query: str) -> Any
Executes the agent and returns the result.
CamelLLMUser
Bases: LLMUser
A CAMEL-specific LLM user that provides a tool for user interaction.
Extends LLMUser to provide a CAMEL-compatible FunctionTool that wraps the respond method, allowing CAMEL agents to interact with users during benchmarking.
Requires camel-ai to be installed.
Example
from maseval.interface.agents.camel import CamelLLMUser
from maseval.interface.inference import OpenAIModelAdapter
# Create a model for user simulation
model = OpenAIModelAdapter(model_id="gpt-4o-mini")
# Create the user
user = CamelLLMUser(
name="customer",
model=model,
user_profile={"name": "John", "preferences": ["fast service"]},
scenario="Customer seeking help with a product return",
initial_query="I need to return a product I bought last week.",
)
# Get the tool for use with CAMEL agent
tool = user.get_tool()
# Create CAMEL agent with the user tool
from camel.agents import ChatAgent
agent = ChatAgent(
system_message="You are a helpful customer service agent.",
tools=[tool],
)
termination_reason
property
termination_reason: TerminationReason
Get the reason why the user interaction terminated.
| RETURNS | DESCRIPTION |
|---|---|
TerminationReason
|
Why |
__init__
__init__(
name: str,
model: ModelAdapter,
user_profile: Dict[str, Any],
scenario: str,
initial_query: Optional[str] = None,
template: Optional[str] = None,
max_try: int = 3,
max_turns: int = 1,
stop_tokens: Optional[List[str]] = None,
early_stopping_condition: Optional[str] = None,
exhausted_response: Optional[str] = None,
)
Initialize the LLMUser.
| PARAMETER | DESCRIPTION |
|---|---|
name
|
The name of the user.
TYPE:
|
model
|
The language model to be used for generating responses.
TYPE:
|
user_profile
|
A dictionary describing the user's persona, preferences, and other relevant information.
TYPE:
|
scenario
|
A description of the situation or task the user is trying to accomplish.
TYPE:
|
initial_query
|
A pre-set query to start the conversation. If provided, it becomes the first user message. If None, call get_initial_query() to generate one from the model based on the user profile and scenario. Defaults to None.
TYPE:
|
template
|
A custom prompt template for the user simulator. Defaults to None.
TYPE:
|
max_try
|
The maximum number of attempts for the simulator to generate a valid response. Defaults to 3.
TYPE:
|
max_turns
|
Maximum number of user messages in the conversation. Each user message counts as one turn, including the initial_query. Use max_turns=1 for single-turn benchmarks, or higher values for multi-turn interaction. Defaults to 1.
TYPE:
|
stop_tokens
|
List of tokens that signal user satisfaction, enabling early termination. When the user's LLM-generated response contains any of these tokens, is_done() returns True regardless of remaining turns. The matched token is stripped from the response. Defaults to None (early stopping disabled).
TYPE:
|
early_stopping_condition
|
A description of when the user should stop the conversation (e.g., "all goals have been accomplished"). Used with stop_tokens to instruct the LLM when to emit a stop token. Must be provided if stop_tokens is set. Defaults to None.
TYPE:
|
exhausted_response
|
Message to return when
TYPE:
|
| RAISES | DESCRIPTION |
|---|---|
ValueError
|
If stop_tokens is set but early_stopping_condition is not provided. |
gather_config
gather_config() -> Dict[str, Any]
Gather configuration from this user.
Output fields:
name- User identifierprofile- User profile datascenario- Task scenario descriptionmax_turns- Maximum interaction turnsstop_tokens- Early stopping tokens (empty list if disabled)exhausted_response- Message returned when user is done, or None
| RETURNS | DESCRIPTION |
|---|---|
Dict[str, Any]
|
Dictionary containing user configuration. |
gather_traces
gather_traces() -> Dict[str, Any]
Gather execution traces from this user.
Output fields:
name- User identifierprofile- User profile datamessage_count- Number of messages in historymessages- Full conversation historylogs- Execution logs with timingtermination_reason- Why interaction ended (seeTerminationReason)stop_reason- Which stop token triggered termination, if anymax_turns- Maximum allowed turnsturns_used- Actual turns usedstopped_by_user- Whether user emitted a stop token
| RETURNS | DESCRIPTION |
|---|---|
Dict[str, Any]
|
Dictionary containing user state and interaction data. |
get_initial_query
get_initial_query() -> str
Get the initial query for the conversation.
If an initial_query was provided at construction, returns it. Otherwise, generates one using the LLM simulator based on the user's profile and scenario.
This method: - Returns the existing initial query if one was provided - Or calls the LLM simulator to generate one - Ensures the query is in the message history - Counts the initial query as the first turn
| RETURNS | DESCRIPTION |
|---|---|
str
|
The initial query (either pre-set or LLM-generated). |
| RAISES | DESCRIPTION |
|---|---|
RuntimeError
|
If called after conversation has progressed beyond the initial message. |
get_tool
get_tool() -> Any
Get a CAMEL-compatible tool for user interaction.
Returns a CAMEL FunctionTool that wraps the respond method, allowing agents to ask the user questions during execution.
| RETURNS | DESCRIPTION |
|---|---|
Any
|
CAMEL FunctionTool instance for user interaction |
increment_turn
increment_turn() -> None
Increment the turn counter.
Call this after recording a user response in the message history.
is_done
is_done() -> bool
Check if the user interaction should end.
Checks: 1. If max_turns has been reached 2. If the user previously indicated termination (via stop_token)
Subclasses can override to add custom termination logic (e.g., LLM-based satisfaction checks) by calling super().is_done() first.
| RETURNS | DESCRIPTION |
|---|---|
bool
|
True if the user is done interacting, False to continue. |
respond
respond(message: str) -> str
Respond to a message from the agent using LLM simulation.
This method appends the agent's message to the conversation history, generates a response using the LLM simulator, appends the response to the history, and returns it.
If a stop_token is detected in the response, triggers early stopping.
| PARAMETER | DESCRIPTION |
|---|---|
message
|
The message from the agent to which the user should respond.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
str
|
The user's response, or |
| RAISES | DESCRIPTION |
|---|---|
UserExhaustedError
|
If the user is already done and no
|
CamelAgentUser
Bases: User
User backed by a CAMEL ChatAgent.
Wraps a CAMEL ChatAgent to act as the user in MASEval's evaluation loop, enabling agent-to-agent evaluation where one agent acts as the user.
Unlike CamelLLMUser which uses MASEval's LLM simulator, this class
delegates directly to a CAMEL ChatAgent for generating responses.
Example
from camel.agents import ChatAgent
from camel.models import ModelFactory
from camel.types import ModelPlatformType, ModelType
from maseval.interface.agents.camel import CamelAgentUser
# Create a ChatAgent to act as the user
model = ModelFactory.create(
model_platform=ModelPlatformType.OPENAI,
model_type=ModelType.GPT_4O_MINI,
)
user_agent = ChatAgent(
system_message="You are a customer seeking help with an order.",
model=model,
)
# Wrap as MASEval user
user = CamelAgentUser(
user_agent=user_agent,
initial_query="I need help with my order",
max_turns=5,
)
__init__
__init__(
user_agent: Any,
initial_query: str,
name: str = "camel_agent_user",
max_turns: int = 10,
)
Initialize CamelAgentUser.
| PARAMETER | DESCRIPTION |
|---|---|
user_agent
|
CAMEL ChatAgent instance to use as the user.
TYPE:
|
initial_query
|
The opening message to start the conversation.
TYPE:
|
name
|
Name for this user (used in traces). Defaults to "camel_agent_user".
TYPE:
|
max_turns
|
Maximum number of response turns. Defaults to 10.
TYPE:
|
gather_config
gather_config() -> Dict[str, Any]
Gather configuration from this user.
| RETURNS | DESCRIPTION |
|---|---|
Dict[str, Any]
|
Dictionary containing configuration information. |
gather_traces
gather_traces() -> Dict[str, Any]
Gather execution traces from this user.
| RETURNS | DESCRIPTION |
|---|---|
Dict[str, Any]
|
Dictionary containing trace information. |
get_initial_query
get_initial_query() -> str
Return the initial query to start the conversation.
| RETURNS | DESCRIPTION |
|---|---|
str
|
The initial query provided at construction. |
get_tool
get_tool() -> Any
Return a CAMEL FunctionTool for agent-to-user interaction.
| RETURNS | DESCRIPTION |
|---|---|
Any
|
CAMEL FunctionTool wrapping the respond method. |
is_done
is_done() -> bool
Check if the user interaction should terminate.
| RETURNS | DESCRIPTION |
|---|---|
bool
|
True if max_turns has been reached. |
respond
respond(message: str) -> str
Forward the message to the CAMEL agent and return its response.
| PARAMETER | DESCRIPTION |
|---|---|
message
|
The agent's message to respond to.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
str
|
The CAMEL agent's response, or empty string if done. |
camel_role_playing_execution_loop
camel_role_playing_execution_loop(
role_playing: Any,
task: Any,
max_steps: int = 10,
tracer: Optional[CamelRolePlayingTracer] = None,
) -> Any
Execution loop for benchmarks using CAMEL's RolePlaying.
CAMEL's RolePlaying manages its own agent-user coordination: it alternates
between an assistant agent and a user agent via step() calls, handling
turn-taking and termination internally. This differs from MASEval's default
execution loop, which coordinates between an AgentAdapter and a User.
This function bridges the two: call it from your benchmark's execution_loop
override to let RolePlaying handle the interaction while MASEval handles
the evaluation lifecycle.
| PARAMETER | DESCRIPTION |
|---|---|
role_playing
|
The CAMEL RolePlaying instance.
TYPE:
|
task
|
Current MASEval task (passed for interface consistency).
TYPE:
|
max_steps
|
Maximum number of RolePlaying steps. Defaults to 10.
TYPE:
|
tracer
|
Optional
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
Any
|
Final answer from the assistant agent, or |
Example
class CamelRolePlayingBenchmark(Benchmark):
def setup_agents(self, agent_data, environment, task, user):
self._role_playing = RolePlaying(
assistant_role_name="Assistant",
user_role_name="User",
task_prompt=task.query,
)
# Wrap both agents for tracing
assistant = CamelAgentAdapter(
self._role_playing.assistant_agent, "assistant"
)
user_agent = CamelAgentAdapter(
self._role_playing.user_agent, "user_agent"
)
# Optional: create tracer
self._tracer = CamelRolePlayingTracer(self._role_playing)
self.register(self._tracer)
return [assistant], {"assistant": assistant, "user_agent": user_agent}
def execution_loop(self, agents, task, environment, user):
return camel_role_playing_execution_loop(
self._role_playing, task, tracer=self._tracer
)
CamelRolePlayingTracer
Bases: TraceableMixin, ConfigurableMixin
Collects orchestration traces from CAMEL RolePlaying.
RolePlaying is a CAMEL-AI component that orchestrates turn-based interaction between two ChatAgents (an assistant and a simulated user).
When using RolePlaying, you typically wrap both agents with CamelAgentAdapter
to trace their individual message histories and token usage. However, this
misses orchestration-level data that no single agent owns: how many
back-and-forth steps occurred, which agent terminated the conversation, etc.
This tracer fills that gap by capturing RolePlaying's orchestration state, giving you the complete picture alongside individual agent traces.
Register with benchmark to include in trace collection:
tracer = CamelRolePlayingTracer(role_playing)
self.register(tracer)
Then call record_step() after each RolePlaying.step():
assistant_response, user_response = role_playing.step()
tracer.record_step(assistant_response, user_response)
Example
class MyBenchmark(Benchmark):
def setup_agents(self, agent_data, environment, task, user):
self._role_playing = RolePlaying(...)
self._tracer = CamelRolePlayingTracer(self._role_playing)
self.register(self._tracer)
return [...]
def execution_loop(self, agents, task, environment, user):
for _ in range(10):
assistant_response, user_response = self._role_playing.step()
self._tracer.record_step(assistant_response, user_response)
if assistant_response.terminated:
break
return final_answer
__init__
__init__(role_playing: Any, name: str = 'role_playing')
Initialize the RolePlaying tracer.
| PARAMETER | DESCRIPTION |
|---|---|
role_playing
|
CAMEL RolePlaying instance to trace.
TYPE:
|
name
|
Name for this tracer in traces. Defaults to "role_playing".
TYPE:
|
gather_config
gather_config() -> Dict[str, Any]
Gather configuration from RolePlaying.
| RETURNS | DESCRIPTION |
|---|---|
Dict[str, Any]
|
Dictionary containing RolePlaying configuration. |
gather_traces
gather_traces() -> Dict[str, Any]
Gather orchestration traces from RolePlaying.
| RETURNS | DESCRIPTION |
|---|---|
Dict[str, Any]
|
Dictionary containing: |
Dict[str, Any]
|
|
Dict[str, Any]
|
|
Dict[str, Any]
|
|
Dict[str, Any]
|
|
Dict[str, Any]
|
|
record_step
record_step(
assistant_response: Any, user_response: Any
) -> None
Record data from a RolePlaying step.
Call this after each role_playing.step() to track progress.
| PARAMETER | DESCRIPTION |
|---|---|
assistant_response
|
ChatAgentResponse from the assistant.
TYPE:
|
user_response
|
ChatAgentResponse from the user agent.
TYPE:
|
CamelWorkforceTracer
Bases: TraceableMixin, ConfigurableMixin
Collects orchestration traces from CAMEL Workforce.
Workforce is a CAMEL-AI component that manages task decomposition, worker assignment, and retry strategies for complex multi-agent collaboration.
When using Workforce, you typically wrap individual workers with
CamelAgentAdapter to trace their message histories. However, this misses
orchestration-level data that no single worker owns: how the problem was
decomposed into subtasks, which worker was assigned to each task, task
dependencies, and completion status.
This tracer fills that gap by capturing Workforce's orchestration state, giving you the complete picture alongside individual worker traces.
Note: This tracer accesses Workforce internal attributes (_children,
_assignees, _pending_tasks, etc.) which may change with CAMEL updates.
Register with benchmark to include in trace collection:
tracer = CamelWorkforceTracer(workforce)
self.register(tracer)
Example
class MyBenchmark(Benchmark):
def setup_agents(self, agent_data, environment, task, user):
workforce = Workforce(...)
self._workforce = workforce
# Create tracer and register it
tracer = CamelWorkforceTracer(workforce)
self.register(tracer)
# Wrap individual workers for message tracing
worker_adapters = {}
for worker in workforce._children:
adapter = CamelAgentAdapter(worker.agent, name=worker.name)
worker_adapters[worker.name] = adapter
return [], worker_adapters
__init__
__init__(workforce: Any, name: str = 'workforce')
Initialize the Workforce tracer.
| PARAMETER | DESCRIPTION |
|---|---|
workforce
|
CAMEL Workforce instance to trace.
TYPE:
|
name
|
Name for this tracer in traces. Defaults to "workforce".
TYPE:
|
gather_config
gather_config() -> Dict[str, Any]
Gather configuration from Workforce.
| RETURNS | DESCRIPTION |
|---|---|
Dict[str, Any]
|
Dictionary containing Workforce configuration. |
gather_traces
gather_traces() -> Dict[str, Any]
Gather orchestration traces from Workforce.
Extracts task decomposition, worker assignments, and task lifecycle information from the Workforce's internal state.
| RETURNS | DESCRIPTION |
|---|---|
Dict[str, Any]
|
Dictionary containing: |
Dict[str, Any]
|
|
Dict[str, Any]
|
|
Dict[str, Any]
|
|
Dict[str, Any]
|
|
Dict[str, Any]
|
|
Dict[str, Any]
|
|