LlamaIndex

Adapter implementing commonly used functions for LlamaIndex's workflow-based agent system.

Installation

pip install maseval[llamaindex]

Alternatively, install llama-index-core directly:

pip install llama-index-core

API Reference

LlamaIndexAgentAdapter

Bases: AgentAdapter

An AgentAdapter for LlamaIndex workflow-based agents.

This adapter integrates LlamaIndex's workflow-based agent system with MASEval's benchmarking framework, converting LlamaIndex's ChatMessage format to OpenAI-compatible MessageHistory format. It handles both AgentWorkflow and BaseWorkflowAgent instances, automatically managing async execution in synchronous contexts.

LlamaIndex agents are async-first: their workflows must be awaited. The adapter performs the async-to-sync conversion for you and supports both agents with persistent memory and stateless execution, so they can be driven through MASEval's synchronous benchmarking API.

How to use
  1. Create a LlamaIndex workflow agent with tools and LLM
  2. Wrap with LlamaIndexAgentAdapter to enable MASEval integration
  3. Use in benchmarks or call directly for testing
  4. Access traces and config for analysis and debugging

Example workflow:

from maseval.interface.agents.llamaindex import LlamaIndexAgentAdapter
from llama_index.core.agent.workflow import AgentWorkflow
from llama_index.llms.openai import OpenAI  # provided by the llama-index-llms-openai package
from llama_index.core.tools import FunctionTool

# Define a tool
def search(query: str) -> str:
    """Search for information."""
    return f"Results for: {query}"

search_tool = FunctionTool.from_defaults(fn=search)

# Create a LlamaIndex workflow
workflow = AgentWorkflow.from_tools_or_functions(
    tools_or_functions=[search_tool],
    llm=OpenAI(model="gpt-4"),
    system_prompt="You are a helpful research assistant"
)

# Wrap with adapter
agent_adapter = LlamaIndexAgentAdapter(workflow, "research_agent")

# Run agent (async handled automatically)
result = agent_adapter.run("What are the latest developments in quantum computing?")

# Access message history in OpenAI format
for msg in agent_adapter.get_messages():
    print(f"{msg['role']}: {msg['content']}")

# Gather configuration including tools and system prompt
config = agent_adapter.gather_config()
print(f"System prompt: {config['llamaindex_config']['system_prompt']}")
print(f"Tools: {config['llamaindex_config']['tools']}")

# Gather execution traces with timing
traces = agent_adapter.gather_traces()
if 'total_tokens' in traces:
    print(f"Total tokens: {traces['total_tokens']}")

# Use in benchmark (MyBenchmark and tasks are placeholders you define yourself)
benchmark = MyBenchmark(agent_data={"agent": agent_adapter})
results = benchmark.run(tasks)

The adapter works with various LlamaIndex agent types including AgentWorkflow, FunctionAgent (tool calling), ReActAgent, and CodeActAgent.

Message Format

LlamaIndex uses ChatMessage objects with MessageRole enums. The adapter converts them to MASEval's OpenAI-compatible message format.

Tool calls are preserved in the additional_kwargs field and converted to OpenAI's tool call format when available.
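The conversion described above can be pictured with a minimal sketch. It does not use the adapter's real code: Role and Msg are hypothetical stand-ins for LlamaIndex's MessageRole and ChatMessage, and to_openai_dict is an illustrative helper name, so the example runs without LlamaIndex installed.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, List


class Role(str, Enum):
    """Hypothetical stand-in for LlamaIndex's MessageRole enum."""
    SYSTEM = "system"
    USER = "user"
    ASSISTANT = "assistant"
    TOOL = "tool"


@dataclass
class Msg:
    """Hypothetical stand-in for a LlamaIndex ChatMessage."""
    role: Role
    content: str
    additional_kwargs: Dict[str, Any] = field(default_factory=dict)


def to_openai_dict(msg: Msg) -> Dict[str, Any]:
    """Convert one message to an OpenAI-style dict (illustrative only)."""
    out: Dict[str, Any] = {"role": msg.role.value, "content": msg.content}
    # Carry tool calls over only when the message actually has them.
    tool_calls = msg.additional_kwargs.get("tool_calls")
    if tool_calls:
        out["tool_calls"] = tool_calls
    return out


history: List[Msg] = [
    Msg(Role.USER, "Find papers on qubits"),
    Msg(Role.ASSISTANT, "", {"tool_calls": [
        {"id": "call_1", "type": "function",
         "function": {"name": "search", "arguments": '{"query": "qubits"}'}}
    ]}),
]
converted = [to_openai_dict(m) for m in history]
```

In real use, get_messages() returns the converted history, so this mapping never has to be written by hand.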

Async Handling

LlamaIndex agents return a WorkflowHandler from .run() which must be awaited. The adapter handles this automatically:

  • Checks for run_sync() method first (for compatibility)
  • Falls back to asyncio.run() to execute the async run() method
  • Works seamlessly in synchronous benchmarking contexts

This allows you to use async-first LlamaIndex agents in MASEval's sync API without any additional configuration.
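The fallback order listed above can be sketched as follows. This is an illustration of the pattern, not the adapter's source: FakeAsyncAgent and run_blocking are hypothetical names, and the only real API used is asyncio.run from the standard library.

```python
import asyncio
from typing import Any


class FakeAsyncAgent:
    """Hypothetical stand-in for an async-first agent."""

    async def run(self, query: str) -> str:
        await asyncio.sleep(0)  # pretend to do async work
        return f"answer to: {query}"


def run_blocking(agent: Any, query: str) -> Any:
    """Run an agent from synchronous code."""
    # Prefer a synchronous entry point when the agent offers one.
    run_sync = getattr(agent, "run_sync", None)
    if callable(run_sync):
        return run_sync(query)
    # Otherwise drive the coroutine to completion on a fresh event loop.
    return asyncio.run(agent.run(query))


result = run_blocking(FakeAsyncAgent(), "hello")
```

One caveat of this pattern in general: asyncio.run() raises RuntimeError when an event loop is already running in the same thread, as is the case inside a Jupyter notebook cell.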

Supported Agent Types
  • AgentWorkflow: Multi-agent workflow orchestrator
  • FunctionAgent: Function-calling based agent (for LLMs with tool calling)
  • ReActAgent: ReAct prompting pattern agent
  • CodeActAgent: Code execution based agent

Token Usage

Token usage is extracted from LLM responses when available. If the LLM response includes usage metadata, it's automatically captured in execution traces.

Requires

llama-index-core must be installed: pip install maseval[llamaindex]

__init__

__init__(
    agent_instance: Any,
    name: str,
    callbacks: Optional[List[Any]] = None,
    max_iterations: Optional[int] = None,
    cost_calculator: Optional[CostCalculator] = None,
    model_id: Optional[str] = None,
)

Initialize the LlamaIndex adapter.

PARAMETER DESCRIPTION
agent_instance

LlamaIndex AgentWorkflow or BaseWorkflowAgent instance

TYPE: Any

name

Agent name

TYPE: str

callbacks

Optional list of callbacks

TYPE: Optional[List[Any]] DEFAULT: None

max_iterations

Maximum number of agent iterations for AgentWorkflow.run(). If None, LlamaIndex's DEFAULT_MAX_ITERATIONS (20) is used. Note: FunctionAgent does NOT have a max_steps constructor parameter, so a max_steps value passed to it is silently swallowed by **kwargs and has no effect. Pass the iteration limit here instead; the adapter forwards it to AgentWorkflow.run(max_iterations=...).

TYPE: Optional[int] DEFAULT: None
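Why a constructor keyword can vanish without an error is easiest to see with a toy class. KwargsAgent below is a hypothetical stand-in, not LlamaIndex's FunctionAgent; it only reproduces the general Python behaviour in which **kwargs accepts unknown keywords and nothing reads them afterwards.

```python
class KwargsAgent:
    """Hypothetical agent whose constructor accepts arbitrary keyword arguments."""

    def __init__(self, name: str, **kwargs):
        self.name = name
        # Unknown keywords land in kwargs and are never read again.

    def run(self, query: str, max_iterations: int = 20) -> str:
        return f"{self.name} ran with limit {max_iterations}"


agent = KwargsAgent("demo", max_steps=5)         # accepted without error, but has no effect
default_run = agent.run("q")                     # still uses the default limit
limited_run = agent.run("q", max_iterations=5)   # the limit takes effect at run time
```

With the adapter, the equivalent of the last line is passing max_iterations to LlamaIndexAgentAdapter rather than to the agent's constructor.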

cost_calculator

Optional cost calculator. If not provided, a LiteLLMCostCalculator is created automatically when litellm is available.

TYPE: Optional[CostCalculator] DEFAULT: None

model_id

Optional model ID for cost calculation. If not provided, auto-detected from agent.llm.metadata.model_name.

TYPE: Optional[str] DEFAULT: None

gather_config

gather_config() -> Dict[str, Any]

Gather configuration from this LlamaIndex agent.

RETURNS DESCRIPTION
Dict[str, Any]

Dictionary containing:

  • type: Component class name
  • gathered_at: ISO timestamp
  • name: Agent name
  • agent_type: Underlying agent class name
  • adapter_type: LlamaIndexAgentAdapter
  • callbacks: List of callback class names
  • llamaindex_config: LlamaIndex-specific configuration (if available)

gather_traces

gather_traces() -> Dict[str, Any]

Gather execution traces from this agent.

Collects comprehensive information about the agent's execution including message history, callback information, and agent metadata.

Output fields:

  • type - Component class name
  • gathered_at - ISO timestamp
  • name - Agent name
  • agent_type - Underlying agent framework class name
  • message_count - Number of messages in history
  • messages - Full message history as list of dicts
  • callbacks - List of callback class names attached to this agent

RETURNS DESCRIPTION
Dict[str, Any]

Dictionary containing agent execution traces.

How to use

This method is automatically called by Benchmark during trace collection. Framework-specific adapters can extend this to include additional data:

def gather_traces(self) -> Dict[str, Any]:
    return {
        **super().gather_traces(),
        "framework_specific_metric": self.agent.some_metric
    }

gather_usage

gather_usage() -> Usage

Gather usage with automatic cost calculation.

Calls _gather_usage() for raw token counts, then applies the cost calculator if one is available and cost is still 0.0.

The model_id used for cost calculation is resolved in order:

  1. Explicit model_id passed to __init__
  2. Auto-detected from the framework agent via _resolve_model_id()

Subclasses should override _gather_usage() (not this method) to provide framework-specific token extraction.

RETURNS DESCRIPTION
Usage

Usage (or TokenUsage) with cost filled in when possible.
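The resolution order and the "only when cost is still 0.0" rule can be sketched like this. Everything here is an assumption made for illustration: UsageSketch, fill_cost, the callable-based calculator, and the per-token rates are invented and do not mirror MASEval's Usage or CostCalculator classes.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class UsageSketch:
    """Hypothetical stand-in for a usage record."""
    input_tokens: int
    output_tokens: int
    cost: float = 0.0


def fill_cost(
    usage: UsageSketch,
    explicit_model_id: Optional[str],
    detect_model_id: Callable[[], Optional[str]],
    price_per_token: Optional[Callable[[str], float]],
) -> UsageSketch:
    """Apply a cost only when a calculator exists and no cost is set yet."""
    if price_per_token is None or usage.cost != 0.0:
        return usage
    # An explicit model ID wins; auto-detection is the fallback.
    model_id = explicit_model_id or detect_model_id()
    if model_id is None:
        return usage
    total = usage.input_tokens + usage.output_tokens
    usage.cost = total * price_per_token(model_id)
    return usage


prices = {"explicit-model": 0.002, "detected-model": 0.001}  # made-up rates
detect = lambda: "detected-model"

a = fill_cost(UsageSketch(100, 50), "explicit-model", detect, prices.__getitem__)
b = fill_cost(UsageSketch(100, 50), None, detect, prices.__getitem__)
c = fill_cost(UsageSketch(100, 50, cost=1.5), None, detect, prices.__getitem__)
```

Here a is priced with the explicit model, b falls back to the detected one, and c keeps the cost it already had.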

get_messages

get_messages() -> MessageHistory

Get message history from LlamaIndex.

For agents with accessible memory, fetches from the agent's memory. Otherwise, returns cached messages from the last run.

RETURNS DESCRIPTION
MessageHistory

MessageHistory with converted messages

run

run(query: str) -> Any

Executes the agent and returns the result.

LlamaIndexLLMUser

Bases: LLMUser

A LlamaIndex-specific LLM user that provides a tool for user interaction.

Extends LLMUser to provide a LlamaIndex-compatible tool via get_tool(). Requires llama-index-core to be installed.

Example

from maseval.interface.agents.llamaindex import LlamaIndexLLMUser

user = LlamaIndexLLMUser(...)
tool = user.get_tool()  # Returns a LlamaIndex FunctionTool

termination_reason property

termination_reason: TerminationReason

Get the reason why the user interaction terminated.

RETURNS DESCRIPTION
TerminationReason

Why is_done() returns True, or NOT_TERMINATED if still ongoing.

__init__

__init__(
    name: str,
    model: ModelAdapter,
    user_profile: Dict[str, Any],
    scenario: str,
    initial_query: Optional[str] = None,
    template: Optional[str] = None,
    max_try: int = 3,
    max_turns: int = 1,
    stop_tokens: Optional[List[str]] = None,
    early_stopping_condition: Optional[str] = None,
    exhausted_response: Optional[str] = None,
)

Initialize the LLMUser.

PARAMETER DESCRIPTION
name

The name of the user.

TYPE: str

model

The language model to be used for generating responses.

TYPE: ModelAdapter

user_profile

A dictionary describing the user's persona, preferences, and other relevant information.

TYPE: Dict[str, Any]

scenario

A description of the situation or task the user is trying to accomplish.

TYPE: str

initial_query

A pre-set query to start the conversation. If provided, it becomes the first user message. If None, call get_initial_query() to generate one from the model based on the user profile and scenario. Defaults to None.

TYPE: Optional[str] DEFAULT: None

template

A custom prompt template for the user simulator. Defaults to None.

TYPE: Optional[str] DEFAULT: None

max_try

The maximum number of attempts for the simulator to generate a valid response. Defaults to 3.

TYPE: int DEFAULT: 3

max_turns

Maximum number of user messages in the conversation. Each user message counts as one turn, including the initial_query. Use max_turns=1 for single-turn benchmarks, or higher values for multi-turn interaction. Defaults to 1.

TYPE: int DEFAULT: 1

stop_tokens

List of tokens that signal user satisfaction, enabling early termination. When the user's LLM-generated response contains any of these tokens, is_done() returns True regardless of remaining turns. The matched token is stripped from the response. Defaults to None (early stopping disabled).

TYPE: Optional[List[str]] DEFAULT: None

early_stopping_condition

A description of when the user should stop the conversation (e.g., "all goals have been accomplished"). Used with stop_tokens to instruct the LLM when to emit a stop token. Must be provided if stop_tokens is set. Defaults to None.

TYPE: Optional[str] DEFAULT: None

exhausted_response

Message to return when respond() is called after the user is done. If None (default), raises UserExhaustedError instead. Set this to a descriptive string (e.g., "The user is no longer available. Proceed with the information you have.") for tool-based integrations where the agent controls when to call the user.

TYPE: Optional[str] DEFAULT: None

RAISES DESCRIPTION
ValueError

If stop_tokens is set but early_stopping_condition is not provided.
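The constructor check behind this ValueError might look roughly like the sketch below. check_stop_config is a hypothetical helper, the "</stop>" token is an invented example, and treating an empty list the same as None is an assumption rather than documented behaviour.

```python
from typing import List, Optional


def check_stop_config(
    stop_tokens: Optional[List[str]],
    early_stopping_condition: Optional[str],
) -> None:
    """Reject stop tokens that come without a stopping condition."""
    if stop_tokens and early_stopping_condition is None:
        raise ValueError(
            "early_stopping_condition must be provided when stop_tokens is set"
        )


check_stop_config(["</stop>"], "all goals have been accomplished")  # passes
check_stop_config(None, None)  # passes: early stopping is disabled

try:
    check_stop_config(["</stop>"], None)
    rejected = False
except ValueError:
    rejected = True
```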

gather_config

gather_config() -> Dict[str, Any]

Gather configuration from this user.

Output fields:

  • name - User identifier
  • profile - User profile data
  • scenario - Task scenario description
  • max_turns - Maximum interaction turns
  • stop_tokens - Early stopping tokens (empty list if disabled)
  • exhausted_response - Message returned when user is done, or None

RETURNS DESCRIPTION
Dict[str, Any]

Dictionary containing user configuration.

gather_traces

gather_traces() -> Dict[str, Any]

Gather execution traces from this user.

Output fields:

  • name - User identifier
  • profile - User profile data
  • message_count - Number of messages in history
  • messages - Full conversation history
  • logs - Execution logs with timing
  • termination_reason - Why interaction ended (see TerminationReason)
  • stop_reason - Which stop token triggered termination, if any
  • max_turns - Maximum allowed turns
  • turns_used - Actual turns used
  • stopped_by_user - Whether user emitted a stop token

RETURNS DESCRIPTION
Dict[str, Any]

Dictionary containing user state and interaction data.

get_initial_query

get_initial_query() -> str

Get the initial query for the conversation.

If an initial_query was provided at construction, returns it. Otherwise, generates one using the LLM simulator based on the user's profile and scenario.

This method:

  • Returns the existing initial query if one was provided
  • Or calls the LLM simulator to generate one
  • Ensures the query is in the message history
  • Counts the initial query as the first turn

RETURNS DESCRIPTION
str

The initial query (either pre-set or LLM-generated).

RAISES DESCRIPTION
RuntimeError

If called after conversation has progressed beyond the initial message.

get_tool

get_tool() -> Any

Get a LlamaIndex-compatible tool for user interaction.

RETURNS DESCRIPTION
Any

LlamaIndex FunctionTool that wraps the respond method.

increment_turn

increment_turn() -> None

Increment the turn counter.

Call this after recording a user response in the message history.

is_done

is_done() -> bool

Check if the user interaction should end.

Checks:

  1. If max_turns has been reached
  2. If the user previously indicated termination (via stop_token)

Subclasses can override to add custom termination logic (e.g., LLM-based satisfaction checks) by calling super().is_done() first.

RETURNS DESCRIPTION
bool

True if the user is done interacting, False to continue.
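How the two checks interact with max_turns and stop_tokens can be shown with a small self-contained sketch. TurnTracker and record_response are hypothetical names and "</stop>" is an invented token; the real LLMUser keeps this state internally and may differ in detail.

```python
from typing import List, Optional


class TurnTracker:
    """Hypothetical stand-in for the turn and stop-token bookkeeping."""

    def __init__(self, max_turns: int = 1, stop_tokens: Optional[List[str]] = None):
        self.max_turns = max_turns
        self.stop_tokens = stop_tokens or []
        self.turns_used = 0
        self.stopped_by_user = False

    def record_response(self, text: str) -> str:
        """Strip a matched stop token, remember it was seen, and count the turn."""
        for token in self.stop_tokens:
            if token in text:
                self.stopped_by_user = True
                text = text.replace(token, "").strip()
                break
        self.turns_used += 1
        return text

    def is_done(self) -> bool:
        # Done when the user asked to stop or the turn budget is spent.
        return self.stopped_by_user or self.turns_used >= self.max_turns


tracker = TurnTracker(max_turns=3, stop_tokens=["</stop>"])
first = tracker.record_response("Can you narrow that down?")
done_after_first = tracker.is_done()
second = tracker.record_response("Perfect, thanks! </stop>")
done_after_second = tracker.is_done()
```

In this run the conversation ends after two of the three allowed turns because the second response carries the stop token.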

respond

respond(message: str) -> str

Respond to a message from the agent using LLM simulation.

This method appends the agent's message to the conversation history, generates a response using the LLM simulator, appends the response to the history, and returns it.

If a stop_token is detected in the response, triggers early stopping.

PARAMETER DESCRIPTION
message

The message from the agent to which the user should respond.

TYPE: str

RETURNS DESCRIPTION
str

The user's response, or exhausted_response if done and configured.

RAISES DESCRIPTION
UserExhaustedError

If the user is already done and no exhausted_response is configured.
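The two outcomes of calling respond() on a finished user can be summarised in a short sketch. UserExhaustedSketch and respond_when_done are hypothetical stand-ins written for illustration; they are not the library's UserExhaustedError or respond().

```python
from typing import Optional


class UserExhaustedSketch(Exception):
    """Hypothetical stand-in for UserExhaustedError."""


def respond_when_done(exhausted_response: Optional[str]) -> str:
    """Behaviour of a call that arrives after the user is already done."""
    if exhausted_response is None:
        raise UserExhaustedSketch("user is done and no fallback message is configured")
    return exhausted_response


fallback = respond_when_done(
    "The user is no longer available. Proceed with the information you have."
)

try:
    respond_when_done(None)
    raised = False
except UserExhaustedSketch:
    raised = True
```

Returning a message instead of raising is the friendlier choice when the user is exposed as a tool, since the agent, not the benchmark loop, decides when to call it.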