Model Adapters
Model Adapters provide a uniform runtime interface to heterogeneous model providers (HuggingFace, OpenAI, Google GenAI, or simple callables). Each adapter exposes a stable API so that maseval can call models without handling provider-specific shapes.
Note
Benchmark expects AgentAdapter instances; it does not consume model adapters directly. Model adapters are used by agents, simulators, and other components that invoke models directly.
ModelAdapter
Bases: ABC, TraceableMixin, ConfigurableMixin, UsageTrackableMixin
Abstract base class for model adapters.
ModelAdapter provides a consistent interface for LLM inference across different providers. All adapters implement the same methods, so you can swap providers without changing your code.
To use a model adapter
- Create an instance with provider-specific configuration
- Call chat() for message-based conversations
- Call generate() for simple text-in/text-out
The adapter automatically tracks all calls for tracing and evaluation.
Implementing a custom adapter
Subclass ModelAdapter and implement:
- model_id property: Return the model identifier string
- _chat_impl(): The actual chat completion logic
See maseval.interface.inference for concrete implementations:
- AnthropicModelAdapter
- GoogleGenAIModelAdapter
- HuggingFacePipelineModelAdapter
- LiteLLMModelAdapter
- OpenAIModelAdapter
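The subclassing pattern described above can be sketched as follows. This is a minimal illustration that runs without maseval installed: SketchModelAdapter is a stand-in for the real ModelAdapter base class, the _chat_impl signature is an assumption (this page does not show it), and the sketch returns a plain string where the real chat() returns a ChatResponse.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class SketchModelAdapter(ABC):
    """Stand-in for ModelAdapter, reduced to the two required members."""

    @property
    @abstractmethod
    def model_id(self) -> str:
        """Return the model identifier string."""

    @abstractmethod
    def _chat_impl(self, messages: List[Dict[str, Any]], **kwargs: Any) -> str:
        """Provider-specific chat logic (signature assumed for this sketch)."""

    def chat(self, messages: List[Dict[str, Any]], **kwargs: Any) -> str:
        # The public method delegates to the provider-specific hook.
        return self._chat_impl(messages, **kwargs)


class EchoAdapter(SketchModelAdapter):
    """Toy adapter that echoes the last user message instead of calling a model."""

    @property
    def model_id(self) -> str:
        return "echo-1"

    def _chat_impl(self, messages: List[Dict[str, Any]], **kwargs: Any) -> str:
        last_user = [m for m in messages if m["role"] == "user"][-1]
        return f"echo: {last_user['content']}"


adapter = EchoAdapter()
reply = adapter.chat([{"role": "user", "content": "Hello!"}])
```

A real adapter would subclass ModelAdapter itself, so that call tracing and usage tracking come from the base class.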
Seeding
Pass a seed parameter to enable deterministic generation. This seed
is passed to the underlying provider API if supported. If a seed is
provided but the provider doesn't support seeding, the adapter should
raise SeedingError from maseval.core.seeding.
User-provided generation_params["seed"] takes precedence over the adapter's seed parameter.
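The precedence rule can be illustrated with a small helper. The function name resolve_seed is hypothetical and is not part of maseval; it only shows the documented ordering, in which a per-call generation_params["seed"] wins over the adapter's seed.

```python
from typing import Any, Dict, Optional


def resolve_seed(
    adapter_seed: Optional[int],
    generation_params: Optional[Dict[str, Any]] = None,
) -> Optional[int]:
    """Pick the effective seed: a per-call value wins over the adapter's."""
    params = generation_params or {}
    if "seed" in params:
        return params["seed"]
    return adapter_seed


per_call = resolve_seed(42, {"seed": 7})           # per-call seed wins
fallback = resolve_seed(42, {"temperature": 0.0})  # adapter seed is used
unseeded = resolve_seed(None)                      # no seed anywhere
```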
model_id
abstractmethod
property
model_id: str
The identifier for the underlying model.
| RETURNS | DESCRIPTION |
|---|---|
| str | A string identifying the model (e.g., "gpt-4", "claude-sonnet-4-5", "gemini-pro"). Used for tracing and configuration. |
seed
property
seed: Optional[int]
Seed for deterministic generation, or None if unseeded.
__init__
__init__(
seed: Optional[int] = None,
cost_calculator: Optional[CostCalculator] = None,
)
Initialize the model adapter with call tracing.
| PARAMETER | DESCRIPTION |
|---|---|
| seed | Seed for deterministic generation. Passed to the underlying provider API if supported. If the provider doesn't support seeding, subclasses should raise SeedingError. TYPE: Optional[int] |
| cost_calculator | Optional cost calculator for computing USD (or other unit) cost from token counts. If provided and the provider does not report cost directly, the calculator is used to fill it in. TYPE: Optional[CostCalculator] |
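To make the idea of deriving cost from token counts concrete, here is a minimal sketch. The CostCalculator interface is not shown on this page, so PerTokenPricing, its field names, and the prices are all hypothetical; only the arithmetic is the point.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PerTokenPricing:
    """Hypothetical price table: cost units per one million tokens."""

    input_per_million: float
    output_per_million: float

    def cost(self, input_tokens: int, output_tokens: int) -> float:
        # Scale each token count by its own rate, then convert to cost units.
        return (
            input_tokens * self.input_per_million
            + output_tokens * self.output_per_million
        ) / 1_000_000


# Made-up prices, for illustration only.
pricing = PerTokenPricing(input_per_million=2.0, output_per_million=8.0)
estimated = pricing.cost(input_tokens=1_500, output_tokens=500)
```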
chat
chat(
messages: Union[List[Dict[str, Any]], MessageHistory],
generation_params: Optional[Dict[str, Any]] = None,
tools: Optional[List[Dict[str, Any]]] = None,
tool_choice: Optional[
Union[str, Dict[str, Any]]
] = None,
**kwargs: Any,
) -> ChatResponse
Send messages to the model and get a response.
This is the primary method for interacting with the model. Pass a conversation history and receive the model's response.
| PARAMETER | DESCRIPTION |
|---|---|
| messages | The conversation history. Either a list of message dicts in OpenAI format, or a MessageHistory object. Each message has 'role' ('system', 'user', 'assistant', 'tool') and 'content' keys. TYPE: Union[List[Dict[str, Any]], MessageHistory] |
| generation_params | Model parameters like temperature, max_tokens, top_p, etc. Provider-specific parameters are also accepted. TYPE: Optional[Dict[str, Any]] |
| tools | Tool definitions the model can use. Each tool is a dict with 'type' (usually 'function') and 'function' containing 'name', 'description', and 'parameters' (JSON Schema). TYPE: Optional[List[Dict[str, Any]]] |
| tool_choice | How the model should use tools. "auto": model decides whether to use tools (default); "none": model won't use tools; "required": model must use a tool; {"type": "function", "function": {"name": "..."}}: use a specific tool. TYPE: Optional[Union[str, Dict[str, Any]]] |
| **kwargs | Additional provider-specific arguments. TYPE: Any |

| RETURNS | DESCRIPTION |
|---|---|
| ChatResponse | ChatResponse containing the model's response (text and/or tool calls). |

| RAISES | DESCRIPTION |
|---|---|
| Exception | Provider-specific errors are logged and re-raised. |
Example
# Simple conversation
response = model.chat([
{"role": "user", "content": "Hello!"}
])
print(response.content)
# With system prompt
response = model.chat([
{"role": "system", "content": "You are a pirate."},
{"role": "user", "content": "Hello!"}
])
# With tools
response = model.chat(
messages=[{"role": "user", "content": "What's 2+2?"}],
tools=[{
"type": "function",
"function": {
"name": "calculator",
"description": "Evaluate math expressions",
"parameters": {
"type": "object",
"properties": {"expression": {"type": "string"}},
"required": ["expression"]
}
}
}]
)
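The tools and tool_choice shapes described for chat() are plain dicts, so they can be built with small helpers. The helper names function_tool and force_tool are hypothetical; the dict layouts follow the parameter descriptions above.

```python
from typing import Any, Dict


def function_tool(name: str, description: str, parameters: Dict[str, Any]) -> Dict[str, Any]:
    """Build one tool definition in the OpenAI-style format."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": parameters,  # JSON Schema for the arguments
        },
    }


def force_tool(name: str) -> Dict[str, Any]:
    """Build a tool_choice value that requests one specific tool."""
    return {"type": "function", "function": {"name": name}}


calculator = function_tool(
    "calculator",
    "Evaluate math expressions",
    {
        "type": "object",
        "properties": {"expression": {"type": "string"}},
        "required": ["expression"],
    },
)
chat_kwargs = {"tools": [calculator], "tool_choice": force_tool("calculator")}
# With a real adapter: response = model.chat(messages, **chat_kwargs)
```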
gather_config
gather_config() -> Dict[str, Any]
Gather configuration from this model adapter.
Called automatically by Benchmark to collect configuration for reproducibility. Returns identifying information about this adapter.
Output fields:
- type: Component class name
- gathered_at: ISO timestamp
- model_id: Model identifier
- adapter_type: The specific adapter class name
- seed: Seed for deterministic generation, or None if unseeded

| RETURNS | DESCRIPTION |
|---|---|
| Dict[str, Any] | Dictionary containing model configuration. |
gather_traces
gather_traces() -> Dict[str, Any]
Gather execution traces from this model adapter.
Called automatically by Benchmark to collect execution data for evaluation. Returns comprehensive statistics about all calls made to this adapter.
Output fields:
- type: Component class name
- gathered_at: ISO timestamp
- model_id: Model identifier
- total_calls: Number of chat/generate calls
- successful_calls: Number of successful calls
- failed_calls: Number of failed calls
- total_duration_seconds: Total time spent in calls
- average_duration_seconds: Average time per call
- logs: List of individual call records

| RETURNS | DESCRIPTION |
|---|---|
| Dict[str, Any] | Dictionary containing model execution traces. |
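The aggregate fields listed above can be derived from per-call records roughly as follows. This is a sketch, not maseval's implementation: the record keys "success" and "duration_seconds" are assumptions, since the layout of individual log entries is not documented on this page.

```python
from typing import Any, Dict, List


def summarize_calls(logs: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Compute the call statistics from a list of per-call records."""
    total = len(logs)
    successful = sum(1 for record in logs if record["success"])
    duration = sum(record["duration_seconds"] for record in logs)
    return {
        "total_calls": total,
        "successful_calls": successful,
        "failed_calls": total - successful,
        "total_duration_seconds": duration,
        # Guard against division by zero when no calls were made.
        "average_duration_seconds": duration / total if total else 0.0,
        "logs": logs,
    }


logs = [
    {"success": True, "duration_seconds": 0.5},
    {"success": False, "duration_seconds": 1.5},
]
summary = summarize_calls(logs)
```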
gather_usage
gather_usage() -> Usage
Gather accumulated token usage from all chat calls.
| RETURNS | DESCRIPTION |
|---|---|
| Usage | Summed TokenUsage across all calls, or empty TokenUsage if no calls were made. |
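Summing usage across calls can be pictured with a small stand-in type. SketchTokenUsage and its field names (input_tokens, output_tokens) are hypothetical; the real Usage and TokenUsage classes are not documented on this page.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class SketchTokenUsage:
    """Stand-in usage record with assumed field names."""

    input_tokens: int = 0
    output_tokens: int = 0

    def __add__(self, other: "SketchTokenUsage") -> "SketchTokenUsage":
        return SketchTokenUsage(
            self.input_tokens + other.input_tokens,
            self.output_tokens + other.output_tokens,
        )


def sum_usage(per_call: Iterable[SketchTokenUsage]) -> SketchTokenUsage:
    """Fold per-call usage into one total; an empty input gives an empty total."""
    total = SketchTokenUsage()
    for usage in per_call:
        total = total + usage
    return total


total = sum_usage([SketchTokenUsage(10, 5), SketchTokenUsage(20, 7)])
empty = sum_usage([])
```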
generate
generate(
prompt: str,
generation_params: Optional[Dict[str, Any]] = None,
**kwargs: Any,
) -> str
Generate text from a simple prompt.
This is a convenience method that wraps the prompt in a user message
and calls chat(). Use this for simple text-in/text-out scenarios.
For conversations or tool use, use chat() directly.
| PARAMETER | DESCRIPTION |
|---|---|
| prompt | The input prompt. TYPE: str |
| generation_params | Generation parameters (temperature, max_tokens, etc.). TYPE: Optional[Dict[str, Any]] |
| **kwargs | Additional provider-specific arguments. TYPE: Any |

| RETURNS | DESCRIPTION |
|---|---|
| str | The model's text response. |
Example
response = model.generate("What is the capital of France?")
print(response) # "Paris"
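The relationship between generate() and chat() described above can be sketched with a free function. The names generate_via_chat and fake_chat are hypothetical, and the chat callable here returns a plain string, whereas the real chat() returns a ChatResponse from which generate() takes the text.

```python
from typing import Any, Callable, Dict, List, Optional


def generate_via_chat(
    chat: Callable[..., str],
    prompt: str,
    generation_params: Optional[Dict[str, Any]] = None,
    **kwargs: Any,
) -> str:
    """Wrap the prompt in a single user message and delegate to chat."""
    messages: List[Dict[str, Any]] = [{"role": "user", "content": prompt}]
    return chat(messages, generation_params=generation_params, **kwargs)


def fake_chat(messages, generation_params=None, **kwargs) -> str:
    # Toy model used only for this sketch: upper-cases the last message.
    return messages[-1]["content"].upper()


text = generate_via_chat(fake_chat, "hello")
```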
Interfaces
The following adapter classes implement the ModelAdapter interface for specific providers. Each requires its own dependencies.
OpenAIModelAdapter
Bases: ModelAdapter
Adapter for OpenAI and OpenAI-compatible APIs.
Works with
- OpenAI API (gpt-4, gpt-3.5-turbo, etc.)
- Azure OpenAI
- Any OpenAI-compatible server (vLLM, LocalAI, etc.)
The adapter expects an OpenAI client instance. API keys and configuration should be set on the client before passing it to the adapter.
seed
property
seed: Optional[int]
Seed for deterministic generation, or None if unseeded.
__init__
__init__(
client: Any,
model_id: str,
default_generation_params: Optional[
Dict[str, Any]
] = None,
seed: Optional[int] = None,
cost_calculator: Optional[CostCalculator] = None,
)
Initialize OpenAI model adapter.
| PARAMETER | DESCRIPTION |
|---|---|
| client | An OpenAI client instance (openai.OpenAI or openai.AzureOpenAI). The client should already be configured with API keys. TYPE: Any |
| model_id | The model identifier (e.g., "gpt-4", "gpt-3.5-turbo"). TYPE: str |
| default_generation_params | Default parameters for all calls. Common parameters: temperature, max_tokens, top_p. TYPE: Optional[Dict[str, Any]] |
| seed | Seed for deterministic generation. OpenAI supports this natively. Note: Determinism is best-effort, not guaranteed by OpenAI. TYPE: Optional[int] |
| cost_calculator | Optional cost calculator for computing cost from token counts when the provider doesn't report cost directly. TYPE: Optional[CostCalculator] |
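One plausible way for default_generation_params to combine with per-call generation_params is a shallow merge. The helper below is hypothetical, and the rule that per-call values override defaults is an assumption; this page does not state the merge order.

```python
from typing import Any, Dict, Optional


def merge_generation_params(
    defaults: Optional[Dict[str, Any]],
    overrides: Optional[Dict[str, Any]],
) -> Dict[str, Any]:
    """Shallow-merge two parameter dicts; per-call values win (assumed)."""
    merged = dict(defaults or {})
    merged.update(overrides or {})
    return merged


params = merge_generation_params(
    {"temperature": 0.0, "max_tokens": 256},
    {"temperature": 0.7},
)
```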
chat
chat(
messages: Union[List[Dict[str, Any]], MessageHistory],
generation_params: Optional[Dict[str, Any]] = None,
tools: Optional[List[Dict[str, Any]]] = None,
tool_choice: Optional[
Union[str, Dict[str, Any]]
] = None,
**kwargs: Any,
) -> ChatResponse
Send messages to the model and get a response.
This is the primary method for interacting with the model. Pass a conversation history and receive the model's response.
| PARAMETER | DESCRIPTION |
|---|---|
| messages | The conversation history. Either a list of message dicts in OpenAI format, or a MessageHistory object. Each message has 'role' ('system', 'user', 'assistant', 'tool') and 'content' keys. TYPE: Union[List[Dict[str, Any]], MessageHistory] |
| generation_params | Model parameters like temperature, max_tokens, top_p, etc. Provider-specific parameters are also accepted. TYPE: Optional[Dict[str, Any]] |
| tools | Tool definitions the model can use. Each tool is a dict with 'type' (usually 'function') and 'function' containing 'name', 'description', and 'parameters' (JSON Schema). TYPE: Optional[List[Dict[str, Any]]] |
| tool_choice | How the model should use tools. "auto": model decides whether to use tools (default); "none": model won't use tools; "required": model must use a tool; {"type": "function", "function": {"name": "..."}}: use a specific tool. TYPE: Optional[Union[str, Dict[str, Any]]] |
| **kwargs | Additional provider-specific arguments. TYPE: Any |

| RETURNS | DESCRIPTION |
|---|---|
| ChatResponse | ChatResponse containing the model's response (text and/or tool calls). |

| RAISES | DESCRIPTION |
|---|---|
| Exception | Provider-specific errors are logged and re-raised. |
Example
# Simple conversation
response = model.chat([
{"role": "user", "content": "Hello!"}
])
print(response.content)
# With system prompt
response = model.chat([
{"role": "system", "content": "You are a pirate."},
{"role": "user", "content": "Hello!"}
])
# With tools
response = model.chat(
messages=[{"role": "user", "content": "What's 2+2?"}],
tools=[{
"type": "function",
"function": {
"name": "calculator",
"description": "Evaluate math expressions",
"parameters": {
"type": "object",
"properties": {"expression": {"type": "string"}},
"required": ["expression"]
}
}
}]
)
gather_config
gather_config() -> Dict[str, Any]
Gather configuration from this OpenAI model adapter.
| RETURNS | DESCRIPTION |
|---|---|
| Dict[str, Any] | Dictionary containing model configuration and client settings. |
gather_traces
gather_traces() -> Dict[str, Any]
Gather execution traces from this model adapter.
Called automatically by Benchmark to collect execution data for evaluation. Returns comprehensive statistics about all calls made to this adapter.
Output fields:
- type: Component class name
- gathered_at: ISO timestamp
- model_id: Model identifier
- total_calls: Number of chat/generate calls
- successful_calls: Number of successful calls
- failed_calls: Number of failed calls
- total_duration_seconds: Total time spent in calls
- average_duration_seconds: Average time per call
- logs: List of individual call records

| RETURNS | DESCRIPTION |
|---|---|
| Dict[str, Any] | Dictionary containing model execution traces. |
gather_usage
gather_usage() -> Usage
Gather accumulated token usage from all chat calls.
| RETURNS | DESCRIPTION |
|---|---|
| Usage | Summed TokenUsage across all calls, or empty TokenUsage if no calls were made. |
generate
generate(
prompt: str,
generation_params: Optional[Dict[str, Any]] = None,
**kwargs: Any,
) -> str
Generate text from a simple prompt.
This is a convenience method that wraps the prompt in a user message
and calls chat(). Use this for simple text-in/text-out scenarios.
For conversations or tool use, use chat() directly.
| PARAMETER | DESCRIPTION |
|---|---|
| prompt | The input prompt. TYPE: str |
| generation_params | Generation parameters (temperature, max_tokens, etc.). TYPE: Optional[Dict[str, Any]] |
| **kwargs | Additional provider-specific arguments. TYPE: Any |

| RETURNS | DESCRIPTION |
|---|---|
| str | The model's text response. |
Example
response = model.generate("What is the capital of France?")
print(response) # "Paris"
HuggingFacePipelineModelAdapter
Bases: ModelAdapter
Adapter for HuggingFace transformers pipelines and callables.
Wraps a HuggingFace pipeline() object (or any text-generation callable)
for use with the ModelAdapter interface (chat(), generate()).
For log-likelihood scoring, see HuggingFaceModelScorer.
Works with:
- transformers.pipeline() objects
- Any callable that accepts a prompt and returns text
For chat functionality, the adapter uses the tokenizer's chat template if available. This provides proper formatting for instruction-tuned models.
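The "chat template if available" behaviour can be sketched as below. The build_prompt helper is hypothetical. The chat_template attribute and apply_chat_template() method are part of the transformers tokenizer API, while the plain-text fallback format is an assumption made for this sketch, since this page does not say what the adapter does without a template.

```python
from typing import Any, Dict, List, Optional


def build_prompt(messages: List[Dict[str, Any]], tokenizer: Optional[Any] = None) -> str:
    """Format a conversation as one prompt string for a text-generation model."""
    if tokenizer is not None and getattr(tokenizer, "chat_template", None):
        # Use the model's own template so special tokens and roles are correct.
        return tokenizer.apply_chat_template(
            messages, tokenize=False, add_generation_prompt=True
        )
    # Fallback (assumed format): one "role: content" line per message,
    # followed by an open assistant turn for the model to complete.
    lines = [f"{m['role']}: {m['content']}" for m in messages]
    lines.append("assistant:")
    return "\n".join(lines)


prompt = build_prompt([{"role": "user", "content": "Hi"}])
```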
Tool calling support
Tool calling is only supported if the model's chat template explicitly
supports it. If you pass tools and the model doesn't support them,
a ToolCallingNotSupportedError is raised. For reliable tool calling,
consider using LiteLLMModelAdapter instead.
seed
property
seed: Optional[int]
Seed for deterministic generation, or None if unseeded.
__init__
__init__(
model: Callable[[str], str],
model_id: Optional[str] = None,
default_generation_params: Optional[
Dict[str, Any]
] = None,
seed: Optional[int] = None,
cost_calculator: Optional[CostCalculator] = None,
)
Initialize HuggingFace model adapter.
| PARAMETER | DESCRIPTION |
|---|---|
| model | A callable that generates text. Can be a transformers pipeline (e.g., pipeline("text-generation", ...)) or any callable that takes a prompt string and returns text. TYPE: Callable[[str], str] |
| model_id | Identifier for the model. If not provided, attempts to extract from the model's name_or_path attribute. TYPE: Optional[str] |
| default_generation_params | Default parameters for all calls. Common parameters: max_new_tokens, temperature, top_p, do_sample. TYPE: Optional[Dict[str, Any]] |
| seed | Seed for deterministic generation. Sets the random seed before each generation call using transformers.set_seed(). TYPE: Optional[int] |
| cost_calculator | Optional cost calculator for computing cost from token counts when the provider doesn't report cost directly. TYPE: Optional[CostCalculator] |
chat
chat(
messages: Union[List[Dict[str, Any]], MessageHistory],
generation_params: Optional[Dict[str, Any]] = None,
tools: Optional[List[Dict[str, Any]]] = None,
tool_choice: Optional[
Union[str, Dict[str, Any]]
] = None,
**kwargs: Any,
) -> ChatResponse
Send messages to the model and get a response.
This is the primary method for interacting with the model. Pass a conversation history and receive the model's response.
| PARAMETER | DESCRIPTION |
|---|---|
| messages | The conversation history. Either a list of message dicts in OpenAI format, or a MessageHistory object. Each message has 'role' ('system', 'user', 'assistant', 'tool') and 'content' keys. TYPE: Union[List[Dict[str, Any]], MessageHistory] |
| generation_params | Model parameters like temperature, max_tokens, top_p, etc. Provider-specific parameters are also accepted. TYPE: Optional[Dict[str, Any]] |
| tools | Tool definitions the model can use. Each tool is a dict with 'type' (usually 'function') and 'function' containing 'name', 'description', and 'parameters' (JSON Schema). TYPE: Optional[List[Dict[str, Any]]] |
| tool_choice | How the model should use tools. "auto": model decides whether to use tools (default); "none": model won't use tools; "required": model must use a tool; {"type": "function", "function": {"name": "..."}}: use a specific tool. TYPE: Optional[Union[str, Dict[str, Any]]] |
| **kwargs | Additional provider-specific arguments. TYPE: Any |

| RETURNS | DESCRIPTION |
|---|---|
| ChatResponse | ChatResponse containing the model's response (text and/or tool calls). |

| RAISES | DESCRIPTION |
|---|---|
| Exception | Provider-specific errors are logged and re-raised. |
Example
# Simple conversation
response = model.chat([
{"role": "user", "content": "Hello!"}
])
print(response.content)
# With system prompt
response = model.chat([
{"role": "system", "content": "You are a pirate."},
{"role": "user", "content": "Hello!"}
])
# With tools
response = model.chat(
messages=[{"role": "user", "content": "What's 2+2?"}],
tools=[{
"type": "function",
"function": {
"name": "calculator",
"description": "Evaluate math expressions",
"parameters": {
"type": "object",
"properties": {"expression": {"type": "string"}},
"required": ["expression"]
}
}
}]
)
gather_config
gather_config() -> Dict[str, Any]
Gather configuration from this HuggingFace model adapter.
| RETURNS | DESCRIPTION |
|---|---|
| Dict[str, Any] | Dictionary containing model configuration. |
gather_traces
gather_traces() -> Dict[str, Any]
Gather execution traces from this model adapter.
Called automatically by Benchmark to collect execution data for evaluation. Returns comprehensive statistics about all calls made to this adapter.
Output fields:
- type: Component class name
- gathered_at: ISO timestamp
- model_id: Model identifier
- total_calls: Number of chat/generate calls
- successful_calls: Number of successful calls
- failed_calls: Number of failed calls
- total_duration_seconds: Total time spent in calls
- average_duration_seconds: Average time per call
- logs: List of individual call records

| RETURNS | DESCRIPTION |
|---|---|
| Dict[str, Any] | Dictionary containing model execution traces. |
gather_usage
gather_usage() -> Usage
Gather accumulated token usage from all chat calls.
| RETURNS | DESCRIPTION |
|---|---|
| Usage | Summed TokenUsage across all calls, or empty TokenUsage if no calls were made. |
generate
generate(
prompt: str,
generation_params: Optional[Dict[str, Any]] = None,
**kwargs: Any,
) -> str
Generate text from a simple prompt.
This is a convenience method that wraps the prompt in a user message
and calls chat(). Use this for simple text-in/text-out scenarios.
For conversations or tool use, use chat() directly.
| PARAMETER | DESCRIPTION |
|---|---|
| prompt | The input prompt. TYPE: str |
| generation_params | Generation parameters (temperature, max_tokens, etc.). TYPE: Optional[Dict[str, Any]] |
| **kwargs | Additional provider-specific arguments. TYPE: Any |

| RETURNS | DESCRIPTION |
|---|---|
| str | The model's text response. |
Example
response = model.generate("What is the capital of France?")
print(response) # "Paris"
GoogleGenAIModelAdapter
Bases: ModelAdapter
Adapter for Google Generative AI (Gemini models).
Works with Google's Gemini models through the google-genai SDK. Pass any model ID supported by the Google GenAI API.
The adapter converts OpenAI-style messages to Google's format internally, so you can use the same message format across all adapters.
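As a rough picture of this conversion, the sketch below maps text-only OpenAI-style messages onto the role and parts layout used by the Gemini API, where the assistant role is called "model" and system text is supplied separately as a system instruction. The function name is hypothetical, tool messages are deliberately not handled, and the adapter's actual conversion may differ.

```python
from typing import Any, Dict, List, Optional, Tuple

# OpenAI-style role -> Gemini-style role (text messages only).
_ROLE_MAP = {"user": "user", "assistant": "model"}


def to_google_contents(
    messages: List[Dict[str, Any]],
) -> Tuple[Optional[str], List[Dict[str, Any]]]:
    """Split out system text and convert the rest to role/parts dicts."""
    system_parts: List[str] = []
    contents: List[Dict[str, Any]] = []
    for message in messages:
        role = message["role"]
        if role == "system":
            system_parts.append(message["content"])
        elif role in _ROLE_MAP:
            contents.append(
                {"role": _ROLE_MAP[role], "parts": [{"text": message["content"]}]}
            )
        else:
            raise ValueError(f"role not handled in this sketch: {role}")
    system_instruction = "\n".join(system_parts) if system_parts else None
    return system_instruction, contents


system_instruction, contents = to_google_contents([
    {"role": "system", "content": "You are a pirate."},
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Ahoy!"},
])
```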
seed
property
seed: Optional[int]
Seed for deterministic generation, or None if unseeded.
__init__
__init__(
client: Any,
model_id: str,
default_generation_params: Optional[
Dict[str, Any]
] = None,
seed: Optional[int] = None,
cost_calculator: Optional[CostCalculator] = None,
)
Initialize Google GenAI model adapter.
| PARAMETER | DESCRIPTION |
|---|---|
| client | A google.genai.Client instance. TYPE: Any |
| model_id | The model identifier (e.g., "gemini-2.0-flash"). TYPE: str |
| default_generation_params | Default parameters for all calls. Common parameters: temperature, max_output_tokens, top_p. TYPE: Optional[Dict[str, Any]] |
| seed | Seed for deterministic generation. Google GenAI supports this. TYPE: Optional[int] |
| cost_calculator | Optional cost calculator for computing cost from token counts when the provider doesn't report cost directly. TYPE: Optional[CostCalculator] |
chat
chat(
messages: Union[List[Dict[str, Any]], MessageHistory],
generation_params: Optional[Dict[str, Any]] = None,
tools: Optional[List[Dict[str, Any]]] = None,
tool_choice: Optional[
Union[str, Dict[str, Any]]
] = None,
**kwargs: Any,
) -> ChatResponse
Send messages to the model and get a response.
This is the primary method for interacting with the model. Pass a conversation history and receive the model's response.
| PARAMETER | DESCRIPTION |
|---|---|
| messages | The conversation history. Either a list of message dicts in OpenAI format, or a MessageHistory object. Each message has 'role' ('system', 'user', 'assistant', 'tool') and 'content' keys. TYPE: Union[List[Dict[str, Any]], MessageHistory] |
| generation_params | Model parameters like temperature, max_tokens, top_p, etc. Provider-specific parameters are also accepted. TYPE: Optional[Dict[str, Any]] |
| tools | Tool definitions the model can use. Each tool is a dict with 'type' (usually 'function') and 'function' containing 'name', 'description', and 'parameters' (JSON Schema). TYPE: Optional[List[Dict[str, Any]]] |
| tool_choice | How the model should use tools. "auto": model decides whether to use tools (default); "none": model won't use tools; "required": model must use a tool; {"type": "function", "function": {"name": "..."}}: use a specific tool. TYPE: Optional[Union[str, Dict[str, Any]]] |
| **kwargs | Additional provider-specific arguments. TYPE: Any |

| RETURNS | DESCRIPTION |
|---|---|
| ChatResponse | ChatResponse containing the model's response (text and/or tool calls). |

| RAISES | DESCRIPTION |
|---|---|
| Exception | Provider-specific errors are logged and re-raised. |
Example
# Simple conversation
response = model.chat([
{"role": "user", "content": "Hello!"}
])
print(response.content)
# With system prompt
response = model.chat([
{"role": "system", "content": "You are a pirate."},
{"role": "user", "content": "Hello!"}
])
# With tools
response = model.chat(
messages=[{"role": "user", "content": "What's 2+2?"}],
tools=[{
"type": "function",
"function": {
"name": "calculator",
"description": "Evaluate math expressions",
"parameters": {
"type": "object",
"properties": {"expression": {"type": "string"}},
"required": ["expression"]
}
}
}]
)
gather_config
gather_config() -> Dict[str, Any]
Gather configuration from this Google GenAI model adapter.
| RETURNS | DESCRIPTION |
|---|---|
| Dict[str, Any] | Dictionary containing model configuration. |
gather_traces
gather_traces() -> Dict[str, Any]
Gather execution traces from this model adapter.
Called automatically by Benchmark to collect execution data for evaluation. Returns comprehensive statistics about all calls made to this adapter.
Output fields:
- type: Component class name
- gathered_at: ISO timestamp
- model_id: Model identifier
- total_calls: Number of chat/generate calls
- successful_calls: Number of successful calls
- failed_calls: Number of failed calls
- total_duration_seconds: Total time spent in calls
- average_duration_seconds: Average time per call
- logs: List of individual call records

| RETURNS | DESCRIPTION |
|---|---|
| Dict[str, Any] | Dictionary containing model execution traces. |
gather_usage
gather_usage() -> Usage
Gather accumulated token usage from all chat calls.
| RETURNS | DESCRIPTION |
|---|---|
| Usage | Summed TokenUsage across all calls, or empty TokenUsage if no calls were made. |
generate
generate(
prompt: str,
generation_params: Optional[Dict[str, Any]] = None,
**kwargs: Any,
) -> str
Generate text from a simple prompt.
This is a convenience method that wraps the prompt in a user message
and calls chat(). Use this for simple text-in/text-out scenarios.
For conversations or tool use, use chat() directly.
| PARAMETER | DESCRIPTION |
|---|---|
| prompt | The input prompt. TYPE: str |
| generation_params | Generation parameters (temperature, max_tokens, etc.). TYPE: Optional[Dict[str, Any]] |
| **kwargs | Additional provider-specific arguments. TYPE: Any |

| RETURNS | DESCRIPTION |
|---|---|
| str | The model's text response. |
Example
response = model.generate("What is the capital of France?")
print(response) # "Paris"
LiteLLMModelAdapter
Bases: ModelAdapter
Adapter for LiteLLM unified interface.
LiteLLM provides a consistent API for calling multiple LLM providers (OpenAI, Anthropic, Cohere, Azure, AWS Bedrock, Google, etc.) using OpenAI-compatible syntax.
For supported providers see https://docs.litellm.ai/docs/providers.
API keys are read from environment variables by default
- OPENAI_API_KEY for OpenAI
- ANTHROPIC_API_KEY for Anthropic
- etc.
Or pass api_key directly to the constructor.
seed
property
seed: Optional[int]
Seed for deterministic generation, or None if unseeded.
__init__
__init__(
model_id: str,
default_generation_params: Optional[
Dict[str, Any]
] = None,
api_key: Optional[str] = None,
api_base: Optional[str] = None,
seed: Optional[int] = None,
cost_calculator: Optional[CostCalculator] = None,
)
Initialize LiteLLM model adapter.
| PARAMETER | DESCRIPTION |
|---|---|
| model_id | The model identifier in LiteLLM format. Examples: "gpt-4" (OpenAI), "claude-3-opus-20240229" (Anthropic), "azure/gpt-4" (Azure OpenAI), "bedrock/anthropic.claude-v2" (AWS Bedrock). See https://docs.litellm.ai/docs/providers for the full list. TYPE: str |
| default_generation_params | Default parameters for all calls. Common parameters: temperature, max_tokens, top_p. TYPE: Optional[Dict[str, Any]] |
| api_key | API key for the provider. If not provided, LiteLLM reads from environment variables. TYPE: Optional[str] |
| api_base | Custom API base URL for self-hosted or Azure endpoints. TYPE: Optional[str] |
| seed | Seed for deterministic generation. LiteLLM passes this to the underlying provider. Note: Not all providers support seeding. TYPE: Optional[int] |
| cost_calculator | Optional cost calculator for computing cost from token counts. Note: LiteLLM already reports cost on its own. TYPE: Optional[CostCalculator] |
chat
chat(
messages: Union[List[Dict[str, Any]], MessageHistory],
generation_params: Optional[Dict[str, Any]] = None,
tools: Optional[List[Dict[str, Any]]] = None,
tool_choice: Optional[
Union[str, Dict[str, Any]]
] = None,
**kwargs: Any,
) -> ChatResponse
Send messages to the model and get a response.
This is the primary method for interacting with the model. Pass a conversation history and receive the model's response.
| PARAMETER | DESCRIPTION |
|---|---|
| messages | The conversation history. Either a list of message dicts in OpenAI format, or a MessageHistory object. Each message has 'role' ('system', 'user', 'assistant', 'tool') and 'content' keys. TYPE: Union[List[Dict[str, Any]], MessageHistory] |
| generation_params | Model parameters like temperature, max_tokens, top_p, etc. Provider-specific parameters are also accepted. TYPE: Optional[Dict[str, Any]] |
| tools | Tool definitions the model can use. Each tool is a dict with 'type' (usually 'function') and 'function' containing 'name', 'description', and 'parameters' (JSON Schema). TYPE: Optional[List[Dict[str, Any]]] |
| tool_choice | How the model should use tools. "auto": model decides whether to use tools (default); "none": model won't use tools; "required": model must use a tool; {"type": "function", "function": {"name": "..."}}: use a specific tool. TYPE: Optional[Union[str, Dict[str, Any]]] |
| **kwargs | Additional provider-specific arguments. TYPE: Any |

| RETURNS | DESCRIPTION |
|---|---|
| ChatResponse | ChatResponse containing the model's response (text and/or tool calls). |

| RAISES | DESCRIPTION |
|---|---|
| Exception | Provider-specific errors are logged and re-raised. |
Example
# Simple conversation
response = model.chat([
{"role": "user", "content": "Hello!"}
])
print(response.content)
# With system prompt
response = model.chat([
{"role": "system", "content": "You are a pirate."},
{"role": "user", "content": "Hello!"}
])
# With tools
response = model.chat(
messages=[{"role": "user", "content": "What's 2+2?"}],
tools=[{
"type": "function",
"function": {
"name": "calculator",
"description": "Evaluate math expressions",
"parameters": {
"type": "object",
"properties": {"expression": {"type": "string"}},
"required": ["expression"]
}
}
}]
)
gather_config
gather_config() -> Dict[str, Any]
Gather configuration from this LiteLLM model adapter.
| RETURNS | DESCRIPTION |
|---|---|
| Dict[str, Any] | Dictionary containing model configuration and LiteLLM settings. |
gather_traces
gather_traces() -> Dict[str, Any]
Gather execution traces from this model adapter.
Called automatically by Benchmark to collect execution data for evaluation. Returns comprehensive statistics about all calls made to this adapter.
Output fields:
- type: Component class name
- gathered_at: ISO timestamp
- model_id: Model identifier
- total_calls: Number of chat/generate calls
- successful_calls: Number of successful calls
- failed_calls: Number of failed calls
- total_duration_seconds: Total time spent in calls
- average_duration_seconds: Average time per call
- logs: List of individual call records

| RETURNS | DESCRIPTION |
|---|---|
| Dict[str, Any] | Dictionary containing model execution traces. |
gather_usage
gather_usage() -> Usage
Gather accumulated token usage from all chat calls.
| RETURNS | DESCRIPTION |
|---|---|
| Usage | Summed TokenUsage across all calls, or empty TokenUsage if no calls were made. |
generate
generate(
prompt: str,
generation_params: Optional[Dict[str, Any]] = None,
**kwargs: Any,
) -> str
Generate text from a simple prompt.
This is a convenience method that wraps the prompt in a user message
and calls chat(). Use this for simple text-in/text-out scenarios.
For conversations or tool use, use chat() directly.
| PARAMETER | DESCRIPTION |
|---|---|
| prompt | The input prompt. TYPE: str |
| generation_params | Generation parameters (temperature, max_tokens, etc.). TYPE: Optional[Dict[str, Any]] |
| **kwargs | Additional provider-specific arguments. TYPE: Any |

| RETURNS | DESCRIPTION |
|---|---|
| str | The model's text response. |
Example
response = model.generate("What is the capital of France?")
print(response) # "Paris"
AnthropicModelAdapter
Bases: ModelAdapter
Adapter for Anthropic Claude models.
Works with Claude models through the official Anthropic Python SDK. Pass any model ID supported by the Anthropic API.
The adapter accepts OpenAI-style messages and converts them to Anthropic's format internally. Key differences handled automatically:
- System messages are passed separately (not in messages array)
- Tool definitions are converted to Anthropic format
- Tool responses are converted to tool_result content blocks
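The three differences listed above can be sketched as small conversion helpers. The function names are hypothetical and the adapter's internals may differ; the target shapes (input_schema for tool definitions, tool_result content blocks keyed by tool_use_id) follow the Anthropic Messages API.

```python
from typing import Any, Dict, List, Tuple


def split_system(messages: List[Dict[str, Any]]) -> Tuple[str, List[Dict[str, Any]]]:
    """Pull system messages out so they can be passed as a separate argument."""
    system = "\n".join(m["content"] for m in messages if m["role"] == "system")
    rest = [m for m in messages if m["role"] != "system"]
    return system, rest


def to_anthropic_tool(tool: Dict[str, Any]) -> Dict[str, Any]:
    """Convert an OpenAI-style function tool to Anthropic's tool format."""
    function = tool["function"]
    return {
        "name": function["name"],
        "description": function.get("description", ""),
        "input_schema": function["parameters"],
    }


def to_tool_result(message: Dict[str, Any]) -> Dict[str, Any]:
    """Convert an OpenAI-style 'tool' message to a tool_result content block."""
    return {
        "role": "user",
        "content": [{
            "type": "tool_result",
            "tool_use_id": message["tool_call_id"],
            "content": message["content"],
        }],
    }


system, rest = split_system([
    {"role": "system", "content": "You are a pirate."},
    {"role": "user", "content": "Hello!"},
])
```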
seed
property
seed: Optional[int]
Seed for deterministic generation, or None if unseeded.
__init__
__init__(
client: Any,
model_id: str,
default_generation_params: Optional[
Dict[str, Any]
] = None,
max_tokens: int = 4096,
seed: Optional[int] = None,
cost_calculator: Optional[CostCalculator] = None,
)
Initialize Anthropic model adapter.
| PARAMETER | DESCRIPTION |
|---|---|
| client | An anthropic.Anthropic client instance. TYPE: Any |
| model_id | The model identifier (e.g., "claude-sonnet-4-5-20250514"). TYPE: str |
| default_generation_params | Default parameters for all calls. Common parameters: temperature, top_p, top_k. TYPE: Optional[Dict[str, Any]] |
| max_tokens | Maximum tokens to generate. Anthropic requires this parameter. Default is 4096. TYPE: int |
| seed | Seed for deterministic generation. Note: Anthropic does NOT support seeding. Providing a seed will raise SeedingError. TYPE: Optional[int] |
| cost_calculator | Optional cost calculator for computing cost from token counts when the provider doesn't report cost directly. TYPE: Optional[CostCalculator] |

| RAISES | DESCRIPTION |
|---|---|
| SeedingError | If seed is provided (Anthropic doesn't support seeding). |
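The fail-fast behaviour for unsupported seeding can be sketched as follows. Both classes are stand-ins so the example runs without maseval: the real exception lives in maseval.core.seeding, and NoSeedAdapter is a hypothetical name for any adapter whose provider has no seed parameter.

```python
from typing import Optional


class SeedingError(Exception):
    """Stand-in for maseval.core.seeding.SeedingError."""


class NoSeedAdapter:
    """Toy adapter for a provider that cannot honour a seed."""

    def __init__(self, seed: Optional[int] = None) -> None:
        # Reject the seed at construction time rather than silently ignoring it,
        # so a run never looks reproducible when it is not.
        if seed is not None:
            raise SeedingError("this provider does not support seeding")
        self.seed = seed


unseeded_adapter = NoSeedAdapter()
try:
    NoSeedAdapter(seed=123)
    seed_rejected = False
except SeedingError:
    seed_rejected = True
```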
chat
chat(
messages: Union[List[Dict[str, Any]], MessageHistory],
generation_params: Optional[Dict[str, Any]] = None,
tools: Optional[List[Dict[str, Any]]] = None,
tool_choice: Optional[
Union[str, Dict[str, Any]]
] = None,
**kwargs: Any,
) -> ChatResponse
Send messages to the model and get a response.
This is the primary method for interacting with the model. Pass a conversation history and receive the model's response.
| PARAMETER | DESCRIPTION |
|---|---|
| messages | The conversation history. Either a list of message dicts in OpenAI format, or a MessageHistory object. Each message has 'role' ('system', 'user', 'assistant', 'tool') and 'content' keys. TYPE: Union[List[Dict[str, Any]], MessageHistory] |
| generation_params | Model parameters like temperature, max_tokens, top_p, etc. Provider-specific parameters are also accepted. TYPE: Optional[Dict[str, Any]] |
| tools | Tool definitions the model can use. Each tool is a dict with 'type' (usually 'function') and 'function' containing 'name', 'description', and 'parameters' (JSON Schema). TYPE: Optional[List[Dict[str, Any]]] |
| tool_choice | How the model should use tools. "auto": model decides whether to use tools (default); "none": model won't use tools; "required": model must use a tool; {"type": "function", "function": {"name": "..."}}: use a specific tool. TYPE: Optional[Union[str, Dict[str, Any]]] |
| **kwargs | Additional provider-specific arguments. TYPE: Any |

| RETURNS | DESCRIPTION |
|---|---|
| ChatResponse | ChatResponse containing the model's response (text and/or tool calls). |

| RAISES | DESCRIPTION |
|---|---|
| Exception | Provider-specific errors are logged and re-raised. |
Example
# Simple conversation
response = model.chat([
{"role": "user", "content": "Hello!"}
])
print(response.content)
# With system prompt
response = model.chat([
{"role": "system", "content": "You are a pirate."},
{"role": "user", "content": "Hello!"}
])
# With tools
response = model.chat(
messages=[{"role": "user", "content": "What's 2+2?"}],
tools=[{
"type": "function",
"function": {
"name": "calculator",
"description": "Evaluate math expressions",
"parameters": {
"type": "object",
"properties": {"expression": {"type": "string"}},
"required": ["expression"]
}
}
}]
)
gather_config
gather_config() -> Dict[str, Any]
Gather configuration from this Anthropic model adapter.
| RETURNS | DESCRIPTION |
|---|---|
| Dict[str, Any] | Dictionary containing model configuration. |
gather_traces
gather_traces() -> Dict[str, Any]
Gather execution traces from this model adapter.
Called automatically by Benchmark to collect execution data for evaluation. Returns comprehensive statistics about all calls made to this adapter.
Output fields:
- type: Component class name
- gathered_at: ISO timestamp
- model_id: Model identifier
- total_calls: Number of chat/generate calls
- successful_calls: Number of successful calls
- failed_calls: Number of failed calls
- total_duration_seconds: Total time spent in calls
- average_duration_seconds: Average time per call
- logs: List of individual call records

| RETURNS | DESCRIPTION |
|---|---|
| Dict[str, Any] | Dictionary containing model execution traces. |
gather_usage
gather_usage() -> Usage
Gather accumulated token usage from all chat calls.
| RETURNS | DESCRIPTION |
|---|---|
| Usage | Summed TokenUsage across all calls, or empty TokenUsage if no calls were made. |
generate
generate(
prompt: str,
generation_params: Optional[Dict[str, Any]] = None,
**kwargs: Any,
) -> str
Generate text from a simple prompt.
This is a convenience method that wraps the prompt in a user message
and calls chat(). Use this for simple text-in/text-out scenarios.
For conversations or tool use, use chat() directly.
| PARAMETER | DESCRIPTION |
|---|---|
| prompt | The input prompt. TYPE: str |
| generation_params | Generation parameters (temperature, max_tokens, etc.). TYPE: Optional[Dict[str, Any]] |
| **kwargs | Additional provider-specific arguments. TYPE: Any |

| RETURNS | DESCRIPTION |
|---|---|
| str | The model's text response. |
Example
response = model.generate("What is the capital of France?")
print(response) # "Paris"