LlamaIndex
Adapter implementing commonly used functions for LlamaIndex's workflow-based agent system.
Installation
pip install maseval[llamaindex]
Alternatively, install llama-index-core directly:
pip install llama-index-core
API Reference
LlamaIndexAgentAdapter
Bases: AgentAdapter
An AgentAdapter for LlamaIndex workflow-based agents.
This adapter integrates LlamaIndex's workflow-based agent system with MASEval's benchmarking framework, converting LlamaIndex's ChatMessage format to OpenAI-compatible MessageHistory format. It handles both AgentWorkflow and BaseWorkflowAgent instances, automatically managing async execution in synchronous contexts.
LlamaIndex agents are async-first, using workflows that must be awaited. This adapter handles the async-to-sync conversion automatically, supporting both agents with persistent memory and stateless execution modes. It seamlessly integrates with MASEval's synchronous benchmarking API.
How to use
- Create a LlamaIndex workflow agent with tools and LLM
- Wrap with LlamaIndexAgentAdapter to enable MASEval integration
- Use in benchmarks or call directly for testing
- Access traces and config for analysis and debugging
Example workflow:
from maseval.interface.agents.llamaindex import LlamaIndexAgentAdapter
from llama_index.core.agent.workflow import AgentWorkflow
from llama_index.core.llms import OpenAI
from llama_index.core.tools import FunctionTool
# Define a tool
def search(query: str) -> str:
"""Search for information."""
return f"Results for: {query}"
search_tool = FunctionTool.from_defaults(fn=search)
# Create a LlamaIndex workflow
workflow = AgentWorkflow.from_tools_or_functions(
tools_or_functions=[search_tool],
llm=OpenAI(model="gpt-4"),
system_prompt="You are a helpful research assistant"
)
# Wrap with adapter
agent_adapter = LlamaIndexAgentAdapter(workflow, "research_agent")
# Run agent (async handled automatically)
result = agent_adapter.run("What are the latest developments in quantum computing?")
# Access message history in OpenAI format
for msg in agent_adapter.get_messages():
print(f"{msg['role']}: {msg['content']}")
# Gather configuration including tools and system prompt
config = agent_adapter.gather_config()
print(f"System prompt: {config['llamaindex_config']['system_prompt']}")
print(f"Tools: {config['llamaindex_config']['tools']}")
# Gather execution traces with timing
traces = agent_adapter.gather_traces()
if 'total_tokens' in traces:
print(f"Total tokens: {traces['total_tokens']}")
# Use in benchmark
benchmark = MyBenchmark(agent_data={"agent": agent_adapter})
results = benchmark.run(tasks)
The adapter works with various LlamaIndex agent types including AgentWorkflow, FunctionAgent (tool calling), ReActAgent, and CodeActAgent.
Async Handling
LlamaIndex agents return a WorkflowHandler from .run() which must be awaited.
The adapter handles this automatically:
- Checks for
run_sync()method first (for compatibility) - Falls back to
asyncio.run()to execute the asyncrun()method - Works seamlessly in synchronous benchmarking contexts
This allows you to use async-first LlamaIndex agents in MASEval's sync API without any additional configuration.
Supported Agent Types
- AgentWorkflow: Multi-agent workflow orchestrator
- FunctionAgent: Function-calling based agent (for LLMs with tool calling)
- ReActAgent: ReAct prompting pattern agent
- CodeActAgent: Code execution based agent
Token Usage
Token usage is extracted from LLM responses when available. If the LLM response includes usage metadata, it's automatically captured in execution traces.
Requires
llama-index-core to be installed: pip install maseval[llamaindex]
__init__
__init__(
agent_instance: Any,
name: str,
callbacks: Optional[List[Any]] = None,
max_iterations: Optional[int] = None,
cost_calculator: Optional[CostCalculator] = None,
model_id: Optional[str] = None,
)
Initialize the LlamaIndex adapter.
| PARAMETER | DESCRIPTION |
|---|---|
agent_instance
|
LlamaIndex AgentWorkflow or BaseWorkflowAgent instance
TYPE:
|
name
|
Agent name
TYPE:
|
callbacks
|
Optional list of callbacks
TYPE:
|
max_iterations
|
Maximum number of agent iterations for AgentWorkflow.run(). If None, LlamaIndex's DEFAULT_MAX_ITERATIONS (20) is used. Bug fix: FunctionAgent does NOT have a max_steps constructor parameter — passing max_steps to it is silently swallowed by **kwargs. The actual iteration limit must be passed here so the adapter forwards it to AgentWorkflow.run(max_iterations=...).
TYPE:
|
cost_calculator
|
Optional cost calculator. If not provided, a
TYPE:
|
model_id
|
Optional model ID for cost calculation. If not provided,
auto-detected from
TYPE:
|
gather_config
gather_config() -> Dict[str, Any]
Gather configuration from this LlamaIndex agent.
| RETURNS | DESCRIPTION |
|---|---|
Dict[str, Any]
|
Dictionary containing: |
Dict[str, Any]
|
|
Dict[str, Any]
|
|
Dict[str, Any]
|
|
Dict[str, Any]
|
|
Dict[str, Any]
|
|
Dict[str, Any]
|
|
Dict[str, Any]
|
|
gather_traces
gather_traces() -> Dict[str, Any]
Gather execution traces from this agent.
Collects comprehensive information about the agent's execution including message history, callback information, and agent metadata.
Output fields:
type- Component class namegathered_at- ISO timestampname- Agent nameagent_type- Underlying agent framework class namemessage_count- Number of messages in historymessages- Full message history as list of dictscallbacks- List of callback class names attached to this agent
| RETURNS | DESCRIPTION |
|---|---|
Dict[str, Any]
|
Dictionary containing agent execution traces. |
How to use
This method is automatically called by Benchmark during trace collection. Framework-specific adapters can extend this to include additional data:
def gather_traces(self) -> Dict[str, Any]:
return {
**super().gather_traces(),
"framework_specific_metric": self.agent.some_metric
}
gather_usage
gather_usage() -> Usage
Gather usage with automatic cost calculation.
Calls _gather_usage() for raw token counts, then applies
the cost calculator if one is available and cost is still 0.0.
The model_id used for cost calculation is resolved in order:
- Explicit
model_idpassed to__init__ - Auto-detected from the framework agent via
_resolve_model_id()
Subclasses should override _gather_usage() (not this method)
to provide framework-specific token extraction.
| RETURNS | DESCRIPTION |
|---|---|
Usage
|
Usage (or TokenUsage) with cost filled in when possible. |
get_messages
get_messages() -> MessageHistory
Get message history from LlamaIndex.
For agents with accessible memory, fetches from the agent's memory. Otherwise, returns cached messages from the last run.
| RETURNS | DESCRIPTION |
|---|---|
MessageHistory
|
MessageHistory with converted messages |
run
run(query: str) -> Any
Executes the agent and returns the result.
LlamaIndexLLMUser
Bases: LLMUser
A LlamaIndex-specific LLM user that provides a tool for user interaction.
Extends LLMUser to provide a LlamaIndex-compatible tool via get_tool(). Requires llama-index-core to be installed.
Example
from maseval.interface.agents.llamaindex import LlamaIndexLLMUser
user = LlamaIndexLLMUser(...)
tool = user.get_tool() # Returns a LlamaIndex FunctionTool
termination_reason
property
termination_reason: TerminationReason
Get the reason why the user interaction terminated.
| RETURNS | DESCRIPTION |
|---|---|
TerminationReason
|
Why |
__init__
__init__(
name: str,
model: ModelAdapter,
user_profile: Dict[str, Any],
scenario: str,
initial_query: Optional[str] = None,
template: Optional[str] = None,
max_try: int = 3,
max_turns: int = 1,
stop_tokens: Optional[List[str]] = None,
early_stopping_condition: Optional[str] = None,
exhausted_response: Optional[str] = None,
)
Initialize the LLMUser.
| PARAMETER | DESCRIPTION |
|---|---|
name
|
The name of the user.
TYPE:
|
model
|
The language model to be used for generating responses.
TYPE:
|
user_profile
|
A dictionary describing the user's persona, preferences, and other relevant information.
TYPE:
|
scenario
|
A description of the situation or task the user is trying to accomplish.
TYPE:
|
initial_query
|
A pre-set query to start the conversation. If provided, it becomes the first user message. If None, call get_initial_query() to generate one from the model based on the user profile and scenario. Defaults to None.
TYPE:
|
template
|
A custom prompt template for the user simulator. Defaults to None.
TYPE:
|
max_try
|
The maximum number of attempts for the simulator to generate a valid response. Defaults to 3.
TYPE:
|
max_turns
|
Maximum number of user messages in the conversation. Each user message counts as one turn, including the initial_query. Use max_turns=1 for single-turn benchmarks, or higher values for multi-turn interaction. Defaults to 1.
TYPE:
|
stop_tokens
|
List of tokens that signal user satisfaction, enabling early termination. When the user's LLM-generated response contains any of these tokens, is_done() returns True regardless of remaining turns. The matched token is stripped from the response. Defaults to None (early stopping disabled).
TYPE:
|
early_stopping_condition
|
A description of when the user should stop the conversation (e.g., "all goals have been accomplished"). Used with stop_tokens to instruct the LLM when to emit a stop token. Must be provided if stop_tokens is set. Defaults to None.
TYPE:
|
exhausted_response
|
Message to return when
TYPE:
|
| RAISES | DESCRIPTION |
|---|---|
ValueError
|
If stop_tokens is set but early_stopping_condition is not provided. |
gather_config
gather_config() -> Dict[str, Any]
Gather configuration from this user.
Output fields:
name- User identifierprofile- User profile datascenario- Task scenario descriptionmax_turns- Maximum interaction turnsstop_tokens- Early stopping tokens (empty list if disabled)exhausted_response- Message returned when user is done, or None
| RETURNS | DESCRIPTION |
|---|---|
Dict[str, Any]
|
Dictionary containing user configuration. |
gather_traces
gather_traces() -> Dict[str, Any]
Gather execution traces from this user.
Output fields:
name- User identifierprofile- User profile datamessage_count- Number of messages in historymessages- Full conversation historylogs- Execution logs with timingtermination_reason- Why interaction ended (seeTerminationReason)stop_reason- Which stop token triggered termination, if anymax_turns- Maximum allowed turnsturns_used- Actual turns usedstopped_by_user- Whether user emitted a stop token
| RETURNS | DESCRIPTION |
|---|---|
Dict[str, Any]
|
Dictionary containing user state and interaction data. |
get_initial_query
get_initial_query() -> str
Get the initial query for the conversation.
If an initial_query was provided at construction, returns it. Otherwise, generates one using the LLM simulator based on the user's profile and scenario.
This method: - Returns the existing initial query if one was provided - Or calls the LLM simulator to generate one - Ensures the query is in the message history - Counts the initial query as the first turn
| RETURNS | DESCRIPTION |
|---|---|
str
|
The initial query (either pre-set or LLM-generated). |
| RAISES | DESCRIPTION |
|---|---|
RuntimeError
|
If called after conversation has progressed beyond the initial message. |
get_tool
get_tool() -> Any
Get a LlamaIndex-compatible tool for user interaction.
| RETURNS | DESCRIPTION |
|---|---|
Any
|
LlamaIndex FunctionTool that wraps the respond method. |
increment_turn
increment_turn() -> None
Increment the turn counter.
Call this after recording a user response in the message history.
is_done
is_done() -> bool
Check if the user interaction should end.
Checks: 1. If max_turns has been reached 2. If the user previously indicated termination (via stop_token)
Subclasses can override to add custom termination logic (e.g., LLM-based satisfaction checks) by calling super().is_done() first.
| RETURNS | DESCRIPTION |
|---|---|
bool
|
True if the user is done interacting, False to continue. |
respond
respond(message: str) -> str
Respond to a message from the agent using LLM simulation.
This method appends the agent's message to the conversation history, generates a response using the LLM simulator, appends the response to the history, and returns it.
If a stop_token is detected in the response, triggers early stopping.
| PARAMETER | DESCRIPTION |
|---|---|
message
|
The message from the agent to which the user should respond.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
str
|
The user's response, or |
| RAISES | DESCRIPTION |
|---|---|
UserExhaustedError
|
If the user is already done and no
|