LangGraph
Adapter for the LangGraph agent framework.
Installation
pip install maseval[langgraph]
Alternatively, install langgraph directly:
pip install langgraph
API Reference
LangGraphAgentAdapter
Bases: AgentAdapter
An AgentAdapter for LangGraph CompiledGraph agents.
This adapter integrates LangGraph's compiled graphs with MASEval's benchmarking framework, converting LangChain/LangGraph message types to OpenAI-compatible MessageHistory format. It preserves tool calls, tool responses, multi-modal content, and supports both stateless and stateful (checkpointed) graph execution.
LangGraph graphs can operate in two modes:
- Stateless: Messages from invoke() result are cached in the adapter for access
- Stateful: With checkpointer and thread_id, messages are fetched from persistent state
The adapter automatically handles both modes, preferring persistent state when available and falling back to cached results for stateless graphs.
How to use
- Create a LangGraph graph with state and nodes
- Compile the graph (optionally with checkpointer for state persistence)
- Wrap with LangGraphAgentAdapter to enable MASEval integration
- Use in benchmarks or call directly for testing
- Access traces and config for analysis and debugging
Example workflow:
from maseval.interface.agents.langgraph import LangGraphAgentAdapter
from langgraph.graph import StateGraph, MessagesState
from langgraph.checkpoint.memory import MemorySaver
# Define your graph
def chatbot(state: MessagesState):
# Your agent logic
return {"messages": [response]}
# Build graph
graph = StateGraph(MessagesState)
graph.add_node("chatbot", chatbot)
graph.set_entry_point("chatbot")
graph.set_finish_point("chatbot")
# Compile (stateless)
compiled_graph = graph.compile()
agent_adapter = LangGraphAgentAdapter(compiled_graph, "agent_name")
# Or compile with checkpointer (stateful)
memory = MemorySaver()
compiled_graph = graph.compile(checkpointer=memory)
config = {"configurable": {"thread_id": "session_1"}}
agent_adapter = LangGraphAgentAdapter(
compiled_graph,
"agent_name",
config=config
)
# Run agent
result = agent_adapter.run("What's the weather?")
# Access message history in OpenAI format
for msg in agent_adapter.get_messages():
print(f"{msg['role']}: {msg['content']}")
# Gather execution traces
traces = agent_adapter.gather_traces()
if 'total_tokens' in traces:
print(f"Total tokens: {traces['total_tokens']}")
# Use in benchmark
benchmark = MyBenchmark(agent_data={"agent": agent_adapter})
results = benchmark.run(tasks)
For stateful graphs, the adapter preserves conversation context across multiple calls using the same thread_id, enabling multi-turn interactions.
Token Usage
If LangChain messages include usage_metadata, the adapter automatically extracts
and aggregates token counts. This is available for models that provide usage information.
Requires
langgraph to be installed: pip install maseval[langgraph]
__init__
__init__(
agent_instance: Any,
name: str,
callbacks: Optional[List[Any]] = None,
config: Optional[Dict[str, Any]] = None,
cost_calculator: Optional[CostCalculator] = None,
model_id: Optional[str] = None,
)
Initialize the LangGraph adapter.
| PARAMETER | DESCRIPTION |
|---|---|
agent_instance
|
Compiled LangGraph graph
TYPE:
|
name
|
Agent name
TYPE:
|
callbacks
|
Optional list of callbacks
TYPE:
|
config
|
Optional LangGraph config dict (for stateful graphs with checkpointer).
Should include
TYPE:
|
cost_calculator
|
Optional cost calculator. If not provided, a
TYPE:
|
model_id
|
Model ID for cost calculation. LangGraph graphs can contain multiple models across nodes, so the model ID cannot be auto-detected. Pass the primary model's ID here to enable cost tracking::
TYPE:
|
gather_config
gather_config() -> dict[str, Any]
Gather configuration from this LangGraph agent.
| RETURNS | DESCRIPTION |
|---|---|
dict[str, Any]
|
Dictionary containing: |
dict[str, Any]
|
|
dict[str, Any]
|
|
dict[str, Any]
|
|
dict[str, Any]
|
|
dict[str, Any]
|
|
dict[str, Any]
|
|
dict[str, Any]
|
|
dict[str, Any]
|
|
dict[str, Any]
|
|
gather_traces
gather_traces() -> Dict[str, Any]
Gather execution traces from this agent.
Collects comprehensive information about the agent's execution including message history, callback information, and agent metadata.
Output fields:
type- Component class namegathered_at- ISO timestampname- Agent nameagent_type- Underlying agent framework class namemessage_count- Number of messages in historymessages- Full message history as list of dictscallbacks- List of callback class names attached to this agent
| RETURNS | DESCRIPTION |
|---|---|
Dict[str, Any]
|
Dictionary containing agent execution traces. |
How to use
This method is automatically called by Benchmark during trace collection. Framework-specific adapters can extend this to include additional data:
def gather_traces(self) -> Dict[str, Any]:
return {
**super().gather_traces(),
"framework_specific_metric": self.agent.some_metric
}
gather_usage
gather_usage() -> Usage
Gather usage with automatic cost calculation.
Calls _gather_usage() for raw token counts, then applies
the cost calculator if one is available and cost is still 0.0.
The model_id used for cost calculation is resolved in order:
- Explicit
model_idpassed to__init__ - Auto-detected from the framework agent via
_resolve_model_id()
Subclasses should override _gather_usage() (not this method)
to provide framework-specific token extraction.
| RETURNS | DESCRIPTION |
|---|---|
Usage
|
Usage (or TokenUsage) with cost filled in when possible. |
get_messages
get_messages() -> MessageHistory
Get message history from LangGraph.
For stateful graphs (with checkpointer and thread_id), fetches from graph state. For stateless graphs, returns cached messages from last run.
| RETURNS | DESCRIPTION |
|---|---|
MessageHistory
|
MessageHistory with converted messages |
run
run(query: str) -> Any
Executes the agent and returns the result.
LangGraphLLMUser
Bases: LLMUser
A LangGraph-specific LLM user that provides a tool for user interaction.
Extends LLMUser to provide a LangChain-compatible tool via get_tool(). Requires langgraph to be installed.
Example
from maseval.interface.agents.langgraph import LangGraphLLMUser
user = LangGraphLLMUser(...)
tool = user.get_tool() # Returns a LangChain tool
termination_reason
property
termination_reason: TerminationReason
Get the reason why the user interaction terminated.
| RETURNS | DESCRIPTION |
|---|---|
TerminationReason
|
Why |
__init__
__init__(
name: str,
model: ModelAdapter,
user_profile: Dict[str, Any],
scenario: str,
initial_query: Optional[str] = None,
template: Optional[str] = None,
max_try: int = 3,
max_turns: int = 1,
stop_tokens: Optional[List[str]] = None,
early_stopping_condition: Optional[str] = None,
exhausted_response: Optional[str] = None,
)
Initialize the LLMUser.
| PARAMETER | DESCRIPTION |
|---|---|
name
|
The name of the user.
TYPE:
|
model
|
The language model to be used for generating responses.
TYPE:
|
user_profile
|
A dictionary describing the user's persona, preferences, and other relevant information.
TYPE:
|
scenario
|
A description of the situation or task the user is trying to accomplish.
TYPE:
|
initial_query
|
A pre-set query to start the conversation. If provided, it becomes the first user message. If None, call get_initial_query() to generate one from the model based on the user profile and scenario. Defaults to None.
TYPE:
|
template
|
A custom prompt template for the user simulator. Defaults to None.
TYPE:
|
max_try
|
The maximum number of attempts for the simulator to generate a valid response. Defaults to 3.
TYPE:
|
max_turns
|
Maximum number of user messages in the conversation. Each user message counts as one turn, including the initial_query. Use max_turns=1 for single-turn benchmarks, or higher values for multi-turn interaction. Defaults to 1.
TYPE:
|
stop_tokens
|
List of tokens that signal user satisfaction, enabling early termination. When the user's LLM-generated response contains any of these tokens, is_done() returns True regardless of remaining turns. The matched token is stripped from the response. Defaults to None (early stopping disabled).
TYPE:
|
early_stopping_condition
|
A description of when the user should stop the conversation (e.g., "all goals have been accomplished"). Used with stop_tokens to instruct the LLM when to emit a stop token. Must be provided if stop_tokens is set. Defaults to None.
TYPE:
|
exhausted_response
|
Message to return when
TYPE:
|
| RAISES | DESCRIPTION |
|---|---|
ValueError
|
If stop_tokens is set but early_stopping_condition is not provided. |
gather_config
gather_config() -> Dict[str, Any]
Gather configuration from this user.
Output fields:
name- User identifierprofile- User profile datascenario- Task scenario descriptionmax_turns- Maximum interaction turnsstop_tokens- Early stopping tokens (empty list if disabled)exhausted_response- Message returned when user is done, or None
| RETURNS | DESCRIPTION |
|---|---|
Dict[str, Any]
|
Dictionary containing user configuration. |
gather_traces
gather_traces() -> Dict[str, Any]
Gather execution traces from this user.
Output fields:
name- User identifierprofile- User profile datamessage_count- Number of messages in historymessages- Full conversation historylogs- Execution logs with timingtermination_reason- Why interaction ended (seeTerminationReason)stop_reason- Which stop token triggered termination, if anymax_turns- Maximum allowed turnsturns_used- Actual turns usedstopped_by_user- Whether user emitted a stop token
| RETURNS | DESCRIPTION |
|---|---|
Dict[str, Any]
|
Dictionary containing user state and interaction data. |
get_initial_query
get_initial_query() -> str
Get the initial query for the conversation.
If an initial_query was provided at construction, returns it. Otherwise, generates one using the LLM simulator based on the user's profile and scenario.
This method: - Returns the existing initial query if one was provided - Or calls the LLM simulator to generate one - Ensures the query is in the message history - Counts the initial query as the first turn
| RETURNS | DESCRIPTION |
|---|---|
str
|
The initial query (either pre-set or LLM-generated). |
| RAISES | DESCRIPTION |
|---|---|
RuntimeError
|
If called after conversation has progressed beyond the initial message. |
get_tool
get_tool()
Get a LangChain-compatible tool for user interaction.
increment_turn
increment_turn() -> None
Increment the turn counter.
Call this after recording a user response in the message history.
is_done
is_done() -> bool
Check if the user interaction should end.
Checks: 1. If max_turns has been reached 2. If the user previously indicated termination (via stop_token)
Subclasses can override to add custom termination logic (e.g., LLM-based satisfaction checks) by calling super().is_done() first.
| RETURNS | DESCRIPTION |
|---|---|
bool
|
True if the user is done interacting, False to continue. |
respond
respond(message: str) -> str
Respond to a message from the agent using LLM simulation.
This method appends the agent's message to the conversation history, generates a response using the LLM simulator, appends the response to the history, and returns it.
If a stop_token is detected in the response, triggers early stopping.
| PARAMETER | DESCRIPTION |
|---|---|
message
|
The message from the agent to which the user should respond.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
str
|
The user's response, or |
| RAISES | DESCRIPTION |
|---|---|
UserExhaustedError
|
If the user is already done and no
|