Anthropic Inference Adapter
This page documents the Anthropic model adapter for MASEval.
AnthropicModelAdapter
Bases: ModelAdapter
Adapter for Anthropic Claude models.
Works with Claude models through the official Anthropic Python SDK. Pass any model ID supported by the Anthropic API.
The adapter accepts OpenAI-style messages and converts them to Anthropic's format internally. Key differences handled automatically:
- System messages are passed separately (not in the messages array)
- Tool definitions are converted to Anthropic format
- Tool responses are converted to tool_result content blocks
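As a rough illustration of the three conversions listed above, the sketch below maps OpenAI-style messages and tool definitions onto the shapes the Anthropic Messages API expects. It is not the adapter's actual implementation: the helper names `split_system`, `convert_tool`, and `convert_tool_message` are hypothetical, tool messages are assumed to carry a `tool_call_id` key as in OpenAI's format, and joining several system messages with a blank line is an assumption. Assistant tool-call turns are left out for brevity.

```python
from typing import Any, Dict, List, Optional, Tuple

Message = Dict[str, Any]


def split_system(messages: List[Message]) -> Tuple[Optional[str], List[Message]]:
    """Pull system messages out; Anthropic takes them as a separate `system` argument."""
    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    # Assumption: multiple system messages are joined with a blank line.
    system = "\n\n".join(system_parts) if system_parts else None
    return system, rest


def convert_tool(tool: Dict[str, Any]) -> Dict[str, Any]:
    """Map an OpenAI-style function tool to name/description/input_schema."""
    fn = tool["function"]
    return {
        "name": fn["name"],
        "description": fn.get("description", ""),
        "input_schema": fn.get("parameters", {"type": "object", "properties": {}}),
    }


def convert_tool_message(message: Message) -> Message:
    """Wrap an OpenAI-style tool response in a user turn holding a tool_result block."""
    return {
        "role": "user",
        "content": [
            {
                "type": "tool_result",
                "tool_use_id": message["tool_call_id"],
                "content": message["content"],
            }
        ],
    }


openai_messages = [
    {"role": "system", "content": "You are a pirate."},
    {"role": "user", "content": "What's 2+2?"},
    {"role": "tool", "tool_call_id": "call_1", "content": "4"},
]

system, rest = split_system(openai_messages)
anthropic_messages = [
    convert_tool_message(m) if m["role"] == "tool" else m for m in rest
]
```

Here `system` would be passed as its own argument, while `anthropic_messages` contains only user and assistant turns.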
seed
property
seed: Optional[int]
Seed for deterministic generation, or None if unseeded.
__init__
__init__(
    client: Any,
    model_id: str,
    default_generation_params: Optional[Dict[str, Any]] = None,
    max_tokens: int = 4096,
    seed: Optional[int] = None,
    cost_calculator: Optional[CostCalculator] = None,
)
Initialize Anthropic model adapter.
| PARAMETER | DESCRIPTION |
|---|---|
| `client` | An anthropic.Anthropic client instance. TYPE: `Any` |
| `model_id` | The model identifier (e.g., "claude-sonnet-4-5-20250514"). TYPE: `str` |
| `default_generation_params` | Default parameters for all calls. Common parameters: temperature, top_p, top_k. TYPE: `Optional[Dict[str, Any]]` |
| `max_tokens` | Maximum tokens to generate. Anthropic requires this parameter. Default is 4096. TYPE: `int` |
| `seed` | Seed for deterministic generation. Note: Anthropic does NOT support seeding. Providing a seed will raise SeedingError. TYPE: `Optional[int]` |
| `cost_calculator` | Optional cost calculator for computing cost from token counts when the provider doesn't report cost directly. TYPE: `Optional[CostCalculator]` |
| RAISES | DESCRIPTION |
|---|---|
| `SeedingError` | If seed is provided (Anthropic doesn't support seeding). |
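The fail-fast seed check described above could look roughly like the following. This is a sketch with stand-in names, not MASEval's code: `SeedingError` is redefined locally and `check_seed` is a hypothetical helper, so the snippet runs without MASEval installed.

```python
from typing import Optional


class SeedingError(Exception):
    """Local stand-in for MASEval's SeedingError; the real class lives in the library."""


def check_seed(seed: Optional[int]) -> None:
    """Reject any seed up front, since the provider cannot honor it."""
    if seed is not None:
        raise SeedingError("Anthropic does not support seeding; pass seed=None.")


check_seed(None)  # accepted: no seed requested

try:
    check_seed(42)
    seed_rejected = False
except SeedingError:
    seed_rejected = True
```

Comparing against `None` rather than testing truthiness means a seed of `0` is rejected as well.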
chat
chat(
    messages: Union[List[Dict[str, Any]], MessageHistory],
    generation_params: Optional[Dict[str, Any]] = None,
    tools: Optional[List[Dict[str, Any]]] = None,
    tool_choice: Optional[Union[str, Dict[str, Any]]] = None,
    **kwargs: Any,
) -> ChatResponse
Send messages to the model and get a response.
This is the primary method for interacting with the model. Pass a conversation history and receive the model's response.
| PARAMETER | DESCRIPTION |
|---|---|
| `messages` | The conversation history. Either a list of message dicts in OpenAI format, or a MessageHistory object. Each message has 'role' ('system', 'user', 'assistant', 'tool') and 'content' keys. TYPE: `Union[List[Dict[str, Any]], MessageHistory]` |
| `generation_params` | Model parameters like temperature, max_tokens, top_p, etc. Provider-specific parameters are also accepted. TYPE: `Optional[Dict[str, Any]]` |
| `tools` | Tool definitions the model can use. Each tool is a dict with 'type' (usually 'function') and 'function' containing 'name', 'description', and 'parameters' (JSON Schema). TYPE: `Optional[List[Dict[str, Any]]]` |
| `tool_choice` | How the model should use tools: `"auto"` (model decides whether to use tools; default), `"none"` (model won't use tools), `"required"` (model must use a tool), or `{"type": "function", "function": {"name": "..."}}` (use a specific tool). TYPE: `Optional[Union[str, Dict[str, Any]]]` |
| `**kwargs` | Additional provider-specific arguments. TYPE: `Any` |
| RETURNS | DESCRIPTION |
|---|---|
| `ChatResponse` | ChatResponse containing the model's response (text and/or tool calls). |

| RAISES | DESCRIPTION |
|---|---|
| `Exception` | Provider-specific errors are logged and re-raised. |
Example

    # Assumes `model` is an already-constructed AnthropicModelAdapter.

    # Simple conversation
    response = model.chat([
        {"role": "user", "content": "Hello!"}
    ])
    print(response.content)

    # With system prompt
    response = model.chat([
        {"role": "system", "content": "You are a pirate."},
        {"role": "user", "content": "Hello!"}
    ])

    # With tools
    response = model.chat(
        messages=[{"role": "user", "content": "What's 2+2?"}],
        tools=[{
            "type": "function",
            "function": {
                "name": "calculator",
                "description": "Evaluate math expressions",
                "parameters": {
                    "type": "object",
                    "properties": {"expression": {"type": "string"}},
                    "required": ["expression"]
                }
            }
        }]
    )
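The OpenAI-style `tool_choice` values accepted by chat() have close counterparts in the Anthropic API, which expects a dict with a `type` of `auto`, `any`, `tool`, or `none`. The sketch below shows one plausible translation. How the adapter actually performs this mapping is an assumption, and `to_anthropic_tool_choice` is a hypothetical helper name.

```python
from typing import Any, Dict, Optional, Union

ToolChoice = Union[str, Dict[str, Any]]


def to_anthropic_tool_choice(
    tool_choice: Optional[ToolChoice],
) -> Optional[Dict[str, Any]]:
    """Translate an OpenAI-style tool_choice into Anthropic's dict form."""
    if tool_choice is None:
        return None  # leave unset so the API applies its own default
    if isinstance(tool_choice, dict):
        # {"type": "function", "function": {"name": "..."}} forces one named tool.
        return {"type": "tool", "name": tool_choice["function"]["name"]}
    mapping = {
        "auto": {"type": "auto"},
        "none": {"type": "none"},
        "required": {"type": "any"},  # Anthropic names "must use some tool" `any`
    }
    if tool_choice not in mapping:
        raise ValueError(f"Unsupported tool_choice: {tool_choice!r}")
    return mapping[tool_choice]


forced = to_anthropic_tool_choice(
    {"type": "function", "function": {"name": "calculator"}}
)
```

Raising on unknown strings, rather than silently falling back to `auto`, keeps a typo in `tool_choice` from changing model behavior unnoticed.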
gather_config
gather_config() -> Dict[str, Any]
Gather configuration from this Anthropic model adapter.
| RETURNS | DESCRIPTION |
|---|---|
| `Dict[str, Any]` | Dictionary containing model configuration. |
gather_traces
gather_traces() -> Dict[str, Any]
Gather execution traces from this model adapter.
Called automatically by Benchmark to collect execution data for evaluation. Returns comprehensive statistics about all calls made to this adapter.
Output fields:
- type - Component class name
- gathered_at - ISO timestamp
- model_id - Model identifier
- total_calls - Number of chat/generate calls
- successful_calls - Number of successful calls
- failed_calls - Number of failed calls
- total_duration_seconds - Total time spent in calls
- average_duration_seconds - Average time per call
- logs - List of individual call records
| RETURNS | DESCRIPTION |
|---|---|
| `Dict[str, Any]` | Dictionary containing model execution traces. |
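To make the relationship between the aggregate fields and the per-call records concrete, here is a minimal sketch that derives the counters from a list of log entries. The record keys `status` and `duration_seconds` and the helper `summarize_calls` are hypothetical; the actual record layout in MASEval may differ.

```python
from typing import Any, Dict, List


def summarize_calls(logs: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Derive the aggregate counters from a list of per-call records."""
    total = len(logs)
    successful = sum(1 for entry in logs if entry["status"] == "success")
    total_duration = sum(entry["duration_seconds"] for entry in logs)
    return {
        "total_calls": total,
        "successful_calls": successful,
        "failed_calls": total - successful,
        "total_duration_seconds": total_duration,
        # Guard against division by zero when no calls were made.
        "average_duration_seconds": total_duration / total if total else 0.0,
    }


example_logs = [
    {"status": "success", "duration_seconds": 1.5},
    {"status": "success", "duration_seconds": 2.5},
    {"status": "error", "duration_seconds": 0.5},
]
summary = summarize_calls(example_logs)
```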
gather_usage
gather_usage() -> Usage
Gather accumulated token usage from all chat calls.
| RETURNS | DESCRIPTION |
|---|---|
| `Usage` | Summed TokenUsage across all calls, or empty TokenUsage if no calls were made. |
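The "summed, or empty if no calls were made" behavior can be pictured with a small stand-in. The `TokenUsage` class below is a local illustration with hypothetical field names (`input_tokens`, `output_tokens`), not MASEval's real class, and `total_usage` is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class TokenUsage:
    """Stand-in with hypothetical field names; MASEval's real class may differ."""

    input_tokens: int = 0
    output_tokens: int = 0

    def __add__(self, other: "TokenUsage") -> "TokenUsage":
        return TokenUsage(
            self.input_tokens + other.input_tokens,
            self.output_tokens + other.output_tokens,
        )


def total_usage(per_call: Iterable[TokenUsage]) -> TokenUsage:
    """Sum usage across calls, starting from an empty TokenUsage."""
    return sum(per_call, TokenUsage())


usage = total_usage([TokenUsage(10, 5), TokenUsage(20, 7)])
empty = total_usage([])
```

Starting the sum from an empty `TokenUsage()` is what makes the no-calls case return an empty record instead of failing.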
generate
generate(
    prompt: str,
    generation_params: Optional[Dict[str, Any]] = None,
    **kwargs: Any,
) -> str
Generate text from a simple prompt.
This is a convenience method that wraps the prompt in a user message
and calls chat(). Use this for simple text-in/text-out scenarios.
For conversations or tool use, use chat() directly.
| PARAMETER | DESCRIPTION |
|---|---|
| `prompt` | The input prompt. TYPE: `str` |
| `generation_params` | Generation parameters (temperature, max_tokens, etc.). TYPE: `Optional[Dict[str, Any]]` |
| `**kwargs` | Additional provider-specific arguments. TYPE: `Any` |
| RETURNS | DESCRIPTION |
|---|---|
| `str` | The model's text response. |
Example

    response = model.generate("What is the capital of France?")
    print(response)  # e.g. "Paris"; the exact wording depends on the model
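The wrapping that generate() performs can be sketched with a stand-in class. `EchoModel` is hypothetical and only echoes its input, where a real adapter would call the provider and extract the text from a ChatResponse; the point is the delegation from generate() to chat().

```python
from typing import Any, Dict, List, Optional


class EchoModel:
    """Hypothetical stand-in for an adapter; chat() echoes the last message."""

    def chat(
        self,
        messages: List[Dict[str, Any]],
        generation_params: Optional[Dict[str, Any]] = None,
        **kwargs: Any,
    ) -> str:
        # A real adapter would call the provider and return a ChatResponse.
        return f"echo: {messages[-1]['content']}"

    def generate(
        self,
        prompt: str,
        generation_params: Optional[Dict[str, Any]] = None,
        **kwargs: Any,
    ) -> str:
        # Wrap the prompt in a single user message and delegate to chat().
        messages = [{"role": "user", "content": prompt}]
        return self.chat(messages, generation_params=generation_params, **kwargs)


reply = EchoModel().generate("What is the capital of France?")
```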