Model Adapters

Model Adapters provide a uniform runtime interface to heterogeneous model providers (HuggingFace, OpenAI, Google GenAI, or simple callables). Each adapter exposes a stable API so that maseval can call models without handling provider-specific request and response formats.

Note

Benchmark expects AgentAdapter instances; it does not consume model adapters directly. Model adapters are used by agents, simulators, and other components that invoke models directly.

ModelAdapter

Bases: ABC, TraceableMixin, ConfigurableMixin, UsageTrackableMixin

Abstract base class for model adapters.

ModelAdapter provides a consistent interface for LLM inference across different providers. All adapters implement the same methods, so you can swap providers without changing your code.

To use a model adapter:
  1. Create an instance with provider-specific configuration
  2. Call chat() for message-based conversations
  3. Call generate() for simple text-in/text-out

The adapter automatically tracks all calls for tracing and evaluation.

Implementing a custom adapter

Subclass ModelAdapter and implement:

  • model_id property: Return the model identifier string
  • _chat_impl(): The actual chat completion logic

See maseval.interface.inference for concrete implementations:

  • AnthropicModelAdapter
  • GoogleGenAIModelAdapter
  • HuggingFacePipelineModelAdapter
  • LiteLLMModelAdapter
  • OpenAIModelAdapter
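The subclassing pattern described above can be sketched without installing maseval. Everything in the block below is a local stand-in: ModelAdapterSketch, ChatResponseSketch, and EchoAdapter are hypothetical names, and the _chat_impl signature shown is an assumption, not the real base-class signature.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class ChatResponseSketch:
    """Hypothetical stand-in for ChatResponse (only `content` is modelled)."""
    content: Optional[str] = None


class ModelAdapterSketch(ABC):
    """Hypothetical stand-in for ModelAdapter; not the real base class."""

    def __init__(self) -> None:
        self.logs: List[Dict[str, Any]] = []  # one record per chat() call

    @property
    @abstractmethod
    def model_id(self) -> str:
        ...

    @abstractmethod
    def _chat_impl(self, messages, generation_params=None, **kwargs) -> ChatResponseSketch:
        ...

    def chat(self, messages, generation_params=None, **kwargs) -> ChatResponseSketch:
        # The public method wraps the provider hook and records the call.
        response = self._chat_impl(messages, generation_params, **kwargs)
        self.logs.append({"model_id": self.model_id, "n_messages": len(messages)})
        return response

    def generate(self, prompt: str, generation_params=None, **kwargs) -> str:
        # Wrap the prompt in a single user message and delegate to chat().
        response = self.chat([{"role": "user", "content": prompt}], generation_params, **kwargs)
        return response.content or ""


class EchoAdapter(ModelAdapterSketch):
    """Toy adapter that echoes the last user message back."""

    @property
    def model_id(self) -> str:
        return "echo-v0"

    def _chat_impl(self, messages, generation_params=None, **kwargs) -> ChatResponseSketch:
        last_user = next(m for m in reversed(messages) if m["role"] == "user")
        return ChatResponseSketch(content=f"echo: {last_user['content']}")


adapter = EchoAdapter()
reply = adapter.generate("Hello!")
print(reply)
```

A real adapter would subclass ModelAdapter itself and return a ChatResponse from _chat_impl(); the split between a public chat() and a private provider hook is the part this sketch is meant to illustrate.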

Seeding

Pass a seed parameter to enable deterministic generation. This seed is passed to the underlying provider API if supported. If a seed is provided but the provider doesn't support seeding, the adapter should raise SeedingError from maseval.core.seeding.

User-provided generation_params["seed"] takes precedence over the adapter's seed parameter.
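The precedence rule can be illustrated with a small helper. This is a sketch under stated assumptions: resolve_seed and the provider_supports_seed flag are hypothetical, SeedingError is redefined locally so the block runs on its own, and the point at which a real adapter performs the check may differ.

```python
from typing import Any, Dict, Optional


class SeedingError(Exception):
    """Local stand-in for maseval.core.seeding.SeedingError."""


def resolve_seed(
    adapter_seed: Optional[int],
    generation_params: Optional[Dict[str, Any]] = None,
    provider_supports_seed: bool = True,
) -> Optional[int]:
    """Return the seed to send to the provider, or None if unseeded."""
    params = generation_params or {}
    # A per-call seed in generation_params wins over the adapter-level seed.
    seed = params["seed"] if "seed" in params else adapter_seed
    if seed is not None and not provider_supports_seed:
        raise SeedingError("provider does not support seeding")
    return seed


print(resolve_seed(42, {"seed": 7}))          # per-call seed is used
print(resolve_seed(42, {"temperature": 0}))   # falls back to the adapter seed
```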

model_id abstractmethod property

model_id: str

The identifier for the underlying model.

RETURNS DESCRIPTION
str

A string identifying the model (e.g., "gpt-4", "claude-sonnet-4-5", "gemini-pro"). Used for tracing and configuration.

seed property

seed: Optional[int]

Seed for deterministic generation, or None if unseeded.

__init__

__init__(
    seed: Optional[int] = None,
    cost_calculator: Optional[CostCalculator] = None,
)

Initialize the model adapter with call tracing.

PARAMETER DESCRIPTION
seed

Seed for deterministic generation. Passed to the underlying provider API if supported. If the provider doesn't support seeding, subclasses should raise SeedingError.

TYPE: Optional[int] DEFAULT: None

cost_calculator

Optional cost calculator for computing USD (or other unit) cost from token counts. If provided and the provider does not report cost directly, the calculator is used to fill in Usage.cost after each call. Provider-reported cost always takes precedence.

TYPE: Optional[CostCalculator] DEFAULT: None
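The fill-in rule for cost can be sketched as follows. The names UsageSketch, CostFn, fill_cost, and per_million are hypothetical and do not reflect the real Usage or CostCalculator interfaces; the prices are made-up figures chosen only for the arithmetic.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class UsageSketch:
    """Hypothetical stand-in for Usage: token counts plus an optional cost."""
    input_tokens: int = 0
    output_tokens: int = 0
    cost: Optional[float] = None


# Hypothetical calculator shape: maps (input_tokens, output_tokens) to a cost.
CostFn = Callable[[int, int], float]


def fill_cost(usage: UsageSketch, calculator: Optional[CostFn]) -> UsageSketch:
    # Provider-reported cost always wins; the calculator only fills a gap.
    if usage.cost is None and calculator is not None:
        usage.cost = calculator(usage.input_tokens, usage.output_tokens)
    return usage


def per_million(input_price: float, output_price: float) -> CostFn:
    # Prices are per one million tokens.
    return lambda n_in, n_out: (n_in * input_price + n_out * output_price) / 1_000_000


calc = per_million(input_price=2.0, output_price=8.0)
estimated = fill_cost(UsageSketch(input_tokens=1000, output_tokens=500), calc)
reported = fill_cost(UsageSketch(input_tokens=1000, output_tokens=500, cost=0.5), calc)
print(estimated.cost, reported.cost)
```

In the second call the provider already supplied a cost, so the calculator is never consulted.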

chat

chat(
    messages: Union[List[Dict[str, Any]], MessageHistory],
    generation_params: Optional[Dict[str, Any]] = None,
    tools: Optional[List[Dict[str, Any]]] = None,
    tool_choice: Optional[
        Union[str, Dict[str, Any]]
    ] = None,
    **kwargs: Any,
) -> ChatResponse

Send messages to the model and get a response.

This is the primary method for interacting with the model. Pass a conversation history and receive the model's response.

PARAMETER DESCRIPTION
messages

The conversation history. Either a list of message dicts in OpenAI format, or a MessageHistory object. Each message has 'role' ('system', 'user', 'assistant', 'tool') and 'content' keys.

TYPE: Union[List[Dict[str, Any]], MessageHistory]

generation_params

Model parameters like temperature, max_tokens, top_p, etc. Provider-specific parameters are also accepted.

TYPE: Optional[Dict[str, Any]] DEFAULT: None

tools

Tool definitions the model can use. Each tool is a dict with 'type' (usually 'function') and 'function' containing 'name', 'description', and 'parameters' (JSON Schema).

TYPE: Optional[List[Dict[str, Any]]] DEFAULT: None

tool_choice

How the model should use tools:

  • "auto": Model decides whether to use tools (default)
  • "none": Model won't use tools
  • "required": Model must use a tool
  • {"type": "function", "function": {"name": "..."}}: Use a specific tool

TYPE: Optional[Union[str, Dict[str, Any]]] DEFAULT: None

**kwargs

Additional provider-specific arguments.

TYPE: Any DEFAULT: {}

RETURNS DESCRIPTION
ChatResponse

ChatResponse containing the model's response (text and/or tool calls).

RAISES DESCRIPTION
Exception

Provider-specific errors are logged and re-raised.

Example
# Simple conversation
response = model.chat([
    {"role": "user", "content": "Hello!"}
])
print(response.content)

# With system prompt
response = model.chat([
    {"role": "system", "content": "You are a pirate."},
    {"role": "user", "content": "Hello!"}
])

# With tools
response = model.chat(
    messages=[{"role": "user", "content": "What's 2+2?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "calculator",
            "description": "Evaluate math expressions",
            "parameters": {
                "type": "object",
                "properties": {"expression": {"type": "string"}},
                "required": ["expression"]
            }
        }
    }]
)
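The tool_choice forms accepted by chat() can be built and sanity-checked before a call. The helpers force_tool and check_tool_choice below are hypothetical conveniences written for illustration; they are not part of the adapter API.

```python
from typing import Any, Dict, List, Union

ToolChoice = Union[str, Dict[str, Any]]


def force_tool(name: str) -> Dict[str, Any]:
    """Build the dict form of tool_choice that selects one named function."""
    return {"type": "function", "function": {"name": name}}


def check_tool_choice(tool_choice: ToolChoice, tools: List[Dict[str, Any]]) -> None:
    """Raise ValueError if tool_choice is not one of the documented forms."""
    if isinstance(tool_choice, str):
        if tool_choice not in {"auto", "none", "required"}:
            raise ValueError(f"unknown tool_choice: {tool_choice!r}")
        return
    wanted = tool_choice["function"]["name"]
    known = {tool["function"]["name"] for tool in tools}
    if wanted not in known:
        raise ValueError(f"tool_choice names {wanted!r}, which is not in tools")


tools = [{
    "type": "function",
    "function": {
        "name": "calculator",
        "description": "Evaluate math expressions",
        "parameters": {
            "type": "object",
            "properties": {"expression": {"type": "string"}},
            "required": ["expression"],
        },
    },
}]

choice = force_tool("calculator")
check_tool_choice(choice, tools)  # passes silently
print(choice)
```

The resulting dict can then be passed as tool_choice alongside tools in a chat() call.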

gather_config

gather_config() -> Dict[str, Any]

Gather configuration from this model adapter.

Called automatically by Benchmark to collect configuration for reproducibility. Returns identifying information about this adapter.

Output fields:

  • type - Component class name
  • gathered_at - ISO timestamp
  • model_id - Model identifier
  • adapter_type - The specific adapter class name
  • seed - Seed for deterministic generation, or None if unseeded

RETURNS DESCRIPTION
Dict[str, Any]

Dictionary containing model configuration.

gather_traces

gather_traces() -> Dict[str, Any]

Gather execution traces from this model adapter.

Called automatically by Benchmark to collect execution data for evaluation. Returns comprehensive statistics about all calls made to this adapter.

Output fields:

  • type - Component class name
  • gathered_at - ISO timestamp
  • model_id - Model identifier
  • total_calls - Number of chat/generate calls
  • successful_calls - Number of successful calls
  • failed_calls - Number of failed calls
  • total_duration_seconds - Total time spent in calls
  • average_duration_seconds - Average time per call
  • logs - List of individual call records

RETURNS DESCRIPTION
Dict[str, Any]

Dictionary containing model execution traces.
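To show how the summary fields relate to the individual call records, here is a minimal sketch. The summarize_calls function and the record schema ("status", "duration_seconds") are assumptions made for illustration; the real log records may carry different keys.

```python
from datetime import datetime, timezone
from typing import Any, Dict, List


def summarize_calls(component: str, model_id: str, logs: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Build a trace summary from per-call records.

    Each record is assumed to look like
    {"status": "success" | "error", "duration_seconds": float}.
    """
    total = len(logs)
    successful = sum(1 for record in logs if record["status"] == "success")
    duration = sum(record["duration_seconds"] for record in logs)
    return {
        "type": component,
        "gathered_at": datetime.now(timezone.utc).isoformat(),
        "model_id": model_id,
        "total_calls": total,
        "successful_calls": successful,
        "failed_calls": total - successful,
        "total_duration_seconds": duration,
        # Guard against division by zero when no calls were made.
        "average_duration_seconds": duration / total if total else 0.0,
        "logs": logs,
    }


traces = summarize_calls("ModelAdapter", "gpt-4", [
    {"status": "success", "duration_seconds": 0.5},
    {"status": "error", "duration_seconds": 1.5},
])
print(traces["total_calls"], traces["failed_calls"], traces["average_duration_seconds"])
```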

gather_usage

gather_usage() -> Usage

Gather accumulated token usage from all chat calls.

RETURNS DESCRIPTION
Usage

Summed TokenUsage across all calls, or empty TokenUsage if no calls were made.

generate

generate(
    prompt: str,
    generation_params: Optional[Dict[str, Any]] = None,
    **kwargs: Any,
) -> str

Generate text from a simple prompt.

This is a convenience method that wraps the prompt in a user message and calls chat(). Use this for simple text-in/text-out scenarios.

For conversations or tool use, use chat() directly.

PARAMETER DESCRIPTION
prompt

The input prompt.

TYPE: str

generation_params

Generation parameters (temperature, max_tokens, etc.).

TYPE: Optional[Dict[str, Any]] DEFAULT: None

**kwargs

Additional provider-specific arguments.

TYPE: Any DEFAULT: {}

RETURNS DESCRIPTION
str

The model's text response.

Example
response = model.generate("What is the capital of France?")
print(response)  # "Paris"

Interfaces

The following adapter classes implement the ModelAdapter interface for specific providers. Each requires its own dependencies.

OpenAIModelAdapter

Bases: ModelAdapter

Adapter for OpenAI and OpenAI-compatible APIs.

Works with:
  • OpenAI API (gpt-4, gpt-3.5-turbo, etc.)
  • Azure OpenAI
  • Any OpenAI-compatible server (vLLM, LocalAI, etc.)

The adapter expects an OpenAI client instance. API keys and configuration should be set on the client before passing it to the adapter.

seed property

seed: Optional[int]

Seed for deterministic generation, or None if unseeded.

__init__

__init__(
    client: Any,
    model_id: str,
    default_generation_params: Optional[
        Dict[str, Any]
    ] = None,
    seed: Optional[int] = None,
    cost_calculator: Optional[CostCalculator] = None,
)

Initialize OpenAI model adapter.

PARAMETER DESCRIPTION
client

An OpenAI client instance (openai.OpenAI or openai.AzureOpenAI). The client should already be configured with API keys.

TYPE: Any

model_id

The model identifier (e.g., "gpt-4", "gpt-3.5-turbo").

TYPE: str

default_generation_params

Default parameters for all calls. Common parameters: temperature, max_tokens, top_p.

TYPE: Optional[Dict[str, Any]] DEFAULT: None

seed

Seed for deterministic generation. OpenAI supports this natively. Note: Determinism is best-effort, not guaranteed by OpenAI.

TYPE: Optional[int] DEFAULT: None

cost_calculator

Optional cost calculator for computing cost from token counts when the provider doesn't report cost directly.

TYPE: Optional[CostCalculator] DEFAULT: None
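How default_generation_params combine with the generation_params passed to an individual call is not spelled out here. A common convention, assumed in this sketch, is that per-call values override the defaults key by key. The merge_params helper is hypothetical.

```python
from typing import Any, Dict, Optional


def merge_params(
    defaults: Optional[Dict[str, Any]],
    per_call: Optional[Dict[str, Any]],
) -> Dict[str, Any]:
    """Combine adapter-level defaults with per-call parameters."""
    merged = dict(defaults or {})   # copy so the defaults are never mutated
    merged.update(per_call or {})   # per-call values take priority (assumed)
    return merged


defaults = {"temperature": 0.0, "max_tokens": 256}
merged = merge_params(defaults, {"temperature": 0.7})
print(merged)
```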

chat

chat(
    messages: Union[List[Dict[str, Any]], MessageHistory],
    generation_params: Optional[Dict[str, Any]] = None,
    tools: Optional[List[Dict[str, Any]]] = None,
    tool_choice: Optional[
        Union[str, Dict[str, Any]]
    ] = None,
    **kwargs: Any,
) -> ChatResponse

Send messages to the model and get a response.

This is the primary method for interacting with the model. Pass a conversation history and receive the model's response.

PARAMETER DESCRIPTION
messages

The conversation history. Either a list of message dicts in OpenAI format, or a MessageHistory object. Each message has 'role' ('system', 'user', 'assistant', 'tool') and 'content' keys.

TYPE: Union[List[Dict[str, Any]], MessageHistory]

generation_params

Model parameters like temperature, max_tokens, top_p, etc. Provider-specific parameters are also accepted.

TYPE: Optional[Dict[str, Any]] DEFAULT: None

tools

Tool definitions the model can use. Each tool is a dict with 'type' (usually 'function') and 'function' containing 'name', 'description', and 'parameters' (JSON Schema).

TYPE: Optional[List[Dict[str, Any]]] DEFAULT: None

tool_choice

How the model should use tools:

  • "auto": Model decides whether to use tools (default)
  • "none": Model won't use tools
  • "required": Model must use a tool
  • {"type": "function", "function": {"name": "..."}}: Use a specific tool

TYPE: Optional[Union[str, Dict[str, Any]]] DEFAULT: None

**kwargs

Additional provider-specific arguments.

TYPE: Any DEFAULT: {}

RETURNS DESCRIPTION
ChatResponse

ChatResponse containing the model's response (text and/or tool calls).

RAISES DESCRIPTION
Exception

Provider-specific errors are logged and re-raised.

Example
# Simple conversation
response = model.chat([
    {"role": "user", "content": "Hello!"}
])
print(response.content)

# With system prompt
response = model.chat([
    {"role": "system", "content": "You are a pirate."},
    {"role": "user", "content": "Hello!"}
])

# With tools
response = model.chat(
    messages=[{"role": "user", "content": "What's 2+2?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "calculator",
            "description": "Evaluate math expressions",
            "parameters": {
                "type": "object",
                "properties": {"expression": {"type": "string"}},
                "required": ["expression"]
            }
        }
    }]
)

gather_config

gather_config() -> Dict[str, Any]

Gather configuration from this OpenAI model adapter.

RETURNS DESCRIPTION
Dict[str, Any]

Dictionary containing model configuration and client settings.

gather_traces

gather_traces() -> Dict[str, Any]

Gather execution traces from this model adapter.

Called automatically by Benchmark to collect execution data for evaluation. Returns comprehensive statistics about all calls made to this adapter.

Output fields:

  • type - Component class name
  • gathered_at - ISO timestamp
  • model_id - Model identifier
  • total_calls - Number of chat/generate calls
  • successful_calls - Number of successful calls
  • failed_calls - Number of failed calls
  • total_duration_seconds - Total time spent in calls
  • average_duration_seconds - Average time per call
  • logs - List of individual call records

RETURNS DESCRIPTION
Dict[str, Any]

Dictionary containing model execution traces.

gather_usage

gather_usage() -> Usage

Gather accumulated token usage from all chat calls.

RETURNS DESCRIPTION
Usage

Summed TokenUsage across all calls, or empty TokenUsage if no calls were made.

generate

generate(
    prompt: str,
    generation_params: Optional[Dict[str, Any]] = None,
    **kwargs: Any,
) -> str

Generate text from a simple prompt.

This is a convenience method that wraps the prompt in a user message and calls chat(). Use this for simple text-in/text-out scenarios.

For conversations or tool use, use chat() directly.

PARAMETER DESCRIPTION
prompt

The input prompt.

TYPE: str

generation_params

Generation parameters (temperature, max_tokens, etc.).

TYPE: Optional[Dict[str, Any]] DEFAULT: None

**kwargs

Additional provider-specific arguments.

TYPE: Any DEFAULT: {}

RETURNS DESCRIPTION
str

The model's text response.

Example
response = model.generate("What is the capital of France?")
print(response)  # "Paris"

HuggingFacePipelineModelAdapter

Bases: ModelAdapter

Adapter for HuggingFace transformers pipelines and callables.

Wraps a HuggingFace pipeline() object (or any text-generation callable) for use with the ModelAdapter interface (chat(), generate()).

For log-likelihood scoring, see HuggingFaceModelScorer.

Works with:

  • transformers.pipeline() objects
  • Any callable that accepts a prompt and returns text

For chat functionality, the adapter uses the tokenizer's chat template if available. This provides proper formatting for instruction-tuned models.
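The "use the chat template if available" behaviour can be sketched as below. build_prompt is a hypothetical helper, not the adapter's actual code; chat_template and apply_chat_template are the attribute and method that transformers tokenizers expose, and the plain-text fallback format is an assumption.

```python
from typing import Any, Dict, List


def build_prompt(tokenizer: Any, messages: List[Dict[str, str]]) -> str:
    """Render chat messages into a single prompt string."""
    if getattr(tokenizer, "chat_template", None):
        # Instruction-tuned models usually ship a template with their tokenizer.
        return tokenizer.apply_chat_template(
            messages, tokenize=False, add_generation_prompt=True
        )
    # Fallback (an assumption): plain "role: content" lines plus an assistant cue.
    lines = [f"{m['role']}: {m['content']}" for m in messages]
    return "\n".join(lines + ["assistant:"])


class NoTemplateTokenizer:
    """Test double for a tokenizer that ships without a chat template."""
    chat_template = None


prompt = build_prompt(NoTemplateTokenizer(), [{"role": "user", "content": "Hello!"}])
print(prompt)
```

With a real tokenizer loaded through transformers, the first branch is taken whenever the model defines a template.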

Tool calling support

Tool calling is only supported if the model's chat template explicitly supports it. If you pass tools and the model doesn't support them, a ToolCallingNotSupportedError is raised. For reliable tool calling, consider using LiteLLMModelAdapter instead.

seed property

seed: Optional[int]

Seed for deterministic generation, or None if unseeded.

__init__

__init__(
    model: Callable[[str], str],
    model_id: Optional[str] = None,
    default_generation_params: Optional[
        Dict[str, Any]
    ] = None,
    seed: Optional[int] = None,
    cost_calculator: Optional[CostCalculator] = None,
)

Initialize HuggingFace model adapter.

PARAMETER DESCRIPTION
model

A callable that generates text. Can be:

  • A transformers pipeline (e.g., pipeline("text-generation", ...))
  • Any callable that takes a prompt string and returns text

TYPE: Callable[[str], str]

model_id

Identifier for the model. If not provided, attempts to extract from the model's name_or_path attribute.

TYPE: Optional[str] DEFAULT: None

default_generation_params

Default parameters for all calls. Common parameters: max_new_tokens, temperature, top_p, do_sample.

TYPE: Optional[Dict[str, Any]] DEFAULT: None

seed

Seed for deterministic generation. Sets the random seed before each generation call using transformers.set_seed().

TYPE: Optional[int] DEFAULT: None

cost_calculator

Optional cost calculator for computing cost from token counts when the provider doesn't report cost directly.

TYPE: Optional[CostCalculator] DEFAULT: None

chat

chat(
    messages: Union[List[Dict[str, Any]], MessageHistory],
    generation_params: Optional[Dict[str, Any]] = None,
    tools: Optional[List[Dict[str, Any]]] = None,
    tool_choice: Optional[
        Union[str, Dict[str, Any]]
    ] = None,
    **kwargs: Any,
) -> ChatResponse

Send messages to the model and get a response.

This is the primary method for interacting with the model. Pass a conversation history and receive the model's response.

PARAMETER DESCRIPTION
messages

The conversation history. Either a list of message dicts in OpenAI format, or a MessageHistory object. Each message has 'role' ('system', 'user', 'assistant', 'tool') and 'content' keys.

TYPE: Union[List[Dict[str, Any]], MessageHistory]

generation_params

Model parameters like temperature, max_tokens, top_p, etc. Provider-specific parameters are also accepted.

TYPE: Optional[Dict[str, Any]] DEFAULT: None

tools

Tool definitions the model can use. Each tool is a dict with 'type' (usually 'function') and 'function' containing 'name', 'description', and 'parameters' (JSON Schema).

TYPE: Optional[List[Dict[str, Any]]] DEFAULT: None

tool_choice

How the model should use tools:

  • "auto": Model decides whether to use tools (default)
  • "none": Model won't use tools
  • "required": Model must use a tool
  • {"type": "function", "function": {"name": "..."}}: Use a specific tool

TYPE: Optional[Union[str, Dict[str, Any]]] DEFAULT: None

**kwargs

Additional provider-specific arguments.

TYPE: Any DEFAULT: {}

RETURNS DESCRIPTION
ChatResponse

ChatResponse containing the model's response (text and/or tool calls).

RAISES DESCRIPTION
Exception

Provider-specific errors are logged and re-raised.

Example
# Simple conversation
response = model.chat([
    {"role": "user", "content": "Hello!"}
])
print(response.content)

# With system prompt
response = model.chat([
    {"role": "system", "content": "You are a pirate."},
    {"role": "user", "content": "Hello!"}
])

# With tools
response = model.chat(
    messages=[{"role": "user", "content": "What's 2+2?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "calculator",
            "description": "Evaluate math expressions",
            "parameters": {
                "type": "object",
                "properties": {"expression": {"type": "string"}},
                "required": ["expression"]
            }
        }
    }]
)

gather_config

gather_config() -> Dict[str, Any]

Gather configuration from this HuggingFace model adapter.

RETURNS DESCRIPTION
Dict[str, Any]

Dictionary containing model configuration.

gather_traces

gather_traces() -> Dict[str, Any]

Gather execution traces from this model adapter.

Called automatically by Benchmark to collect execution data for evaluation. Returns comprehensive statistics about all calls made to this adapter.

Output fields:

  • type - Component class name
  • gathered_at - ISO timestamp
  • model_id - Model identifier
  • total_calls - Number of chat/generate calls
  • successful_calls - Number of successful calls
  • failed_calls - Number of failed calls
  • total_duration_seconds - Total time spent in calls
  • average_duration_seconds - Average time per call
  • logs - List of individual call records

RETURNS DESCRIPTION
Dict[str, Any]

Dictionary containing model execution traces.

gather_usage

gather_usage() -> Usage

Gather accumulated token usage from all chat calls.

RETURNS DESCRIPTION
Usage

Summed TokenUsage across all calls, or empty TokenUsage if no calls were made.

generate

generate(
    prompt: str,
    generation_params: Optional[Dict[str, Any]] = None,
    **kwargs: Any,
) -> str

Generate text from a simple prompt.

This is a convenience method that wraps the prompt in a user message and calls chat(). Use this for simple text-in/text-out scenarios.

For conversations or tool use, use chat() directly.

PARAMETER DESCRIPTION
prompt

The input prompt.

TYPE: str

generation_params

Generation parameters (temperature, max_tokens, etc.).

TYPE: Optional[Dict[str, Any]] DEFAULT: None

**kwargs

Additional provider-specific arguments.

TYPE: Any DEFAULT: {}

RETURNS DESCRIPTION
str

The model's text response.

Example
response = model.generate("What is the capital of France?")
print(response)  # "Paris"

GoogleGenAIModelAdapter

Bases: ModelAdapter

Adapter for Google Generative AI (Gemini models).

Works with Google's Gemini models through the google-genai SDK. Pass any model ID supported by the Google GenAI API.

The adapter converts OpenAI-style messages to Google's format internally, so you can use the same message format across all adapters.
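A rough sketch of that kind of conversion is shown below. to_google_contents is a hypothetical function, not the adapter's implementation; it covers only system, user, and assistant text messages and relies on two facts about Gemini's format: assistant turns use the role "model", and system text is supplied separately as a system instruction.

```python
from typing import Any, Dict, List, Optional, Tuple


def to_google_contents(
    messages: List[Dict[str, Any]],
) -> Tuple[Optional[str], List[Dict[str, Any]]]:
    """Split OpenAI-style messages into (system_instruction, contents)."""
    system_parts: List[str] = []
    contents: List[Dict[str, Any]] = []
    for message in messages:
        role, text = message["role"], message.get("content") or ""
        if role == "system":
            # System text travels outside the contents list.
            system_parts.append(text)
        elif role in ("user", "assistant"):
            google_role = "model" if role == "assistant" else "user"
            contents.append({"role": google_role, "parts": [{"text": text}]})
        else:
            # Tool messages need function-response parts; out of scope here.
            raise ValueError(f"role {role!r} is not handled by this sketch")
    system = "\n".join(system_parts) if system_parts else None
    return system, contents


system, contents = to_google_contents([
    {"role": "system", "content": "You are a pirate."},
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Ahoy!"},
])
print(system)
print(contents)
```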

seed property

seed: Optional[int]

Seed for deterministic generation, or None if unseeded.

__init__

__init__(
    client: Any,
    model_id: str,
    default_generation_params: Optional[
        Dict[str, Any]
    ] = None,
    seed: Optional[int] = None,
    cost_calculator: Optional[CostCalculator] = None,
)

Initialize Google GenAI model adapter.

PARAMETER DESCRIPTION
client

A google.genai.Client instance.

TYPE: Any

model_id

The model identifier (e.g., "gemini-2.0-flash").

TYPE: str

default_generation_params

Default parameters for all calls. Common parameters: temperature, max_output_tokens, top_p.

TYPE: Optional[Dict[str, Any]] DEFAULT: None

seed

Seed for deterministic generation. Google GenAI supports this.

TYPE: Optional[int] DEFAULT: None

cost_calculator

Optional cost calculator for computing cost from token counts when the provider doesn't report cost directly.

TYPE: Optional[CostCalculator] DEFAULT: None

chat

chat(
    messages: Union[List[Dict[str, Any]], MessageHistory],
    generation_params: Optional[Dict[str, Any]] = None,
    tools: Optional[List[Dict[str, Any]]] = None,
    tool_choice: Optional[
        Union[str, Dict[str, Any]]
    ] = None,
    **kwargs: Any,
) -> ChatResponse

Send messages to the model and get a response.

This is the primary method for interacting with the model. Pass a conversation history and receive the model's response.

PARAMETER DESCRIPTION
messages

The conversation history. Either a list of message dicts in OpenAI format, or a MessageHistory object. Each message has 'role' ('system', 'user', 'assistant', 'tool') and 'content' keys.

TYPE: Union[List[Dict[str, Any]], MessageHistory]

generation_params

Model parameters like temperature, max_tokens, top_p, etc. Provider-specific parameters are also accepted.

TYPE: Optional[Dict[str, Any]] DEFAULT: None

tools

Tool definitions the model can use. Each tool is a dict with 'type' (usually 'function') and 'function' containing 'name', 'description', and 'parameters' (JSON Schema).

TYPE: Optional[List[Dict[str, Any]]] DEFAULT: None

tool_choice

How the model should use tools:

  • "auto": Model decides whether to use tools (default)
  • "none": Model won't use tools
  • "required": Model must use a tool
  • {"type": "function", "function": {"name": "..."}}: Use a specific tool

TYPE: Optional[Union[str, Dict[str, Any]]] DEFAULT: None

**kwargs

Additional provider-specific arguments.

TYPE: Any DEFAULT: {}

RETURNS DESCRIPTION
ChatResponse

ChatResponse containing the model's response (text and/or tool calls).

RAISES DESCRIPTION
Exception

Provider-specific errors are logged and re-raised.

Example
# Simple conversation
response = model.chat([
    {"role": "user", "content": "Hello!"}
])
print(response.content)

# With system prompt
response = model.chat([
    {"role": "system", "content": "You are a pirate."},
    {"role": "user", "content": "Hello!"}
])

# With tools
response = model.chat(
    messages=[{"role": "user", "content": "What's 2+2?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "calculator",
            "description": "Evaluate math expressions",
            "parameters": {
                "type": "object",
                "properties": {"expression": {"type": "string"}},
                "required": ["expression"]
            }
        }
    }]
)

gather_config

gather_config() -> Dict[str, Any]

Gather configuration from this Google GenAI model adapter.

RETURNS DESCRIPTION
Dict[str, Any]

Dictionary containing model configuration.

gather_traces

gather_traces() -> Dict[str, Any]

Gather execution traces from this model adapter.

Called automatically by Benchmark to collect execution data for evaluation. Returns comprehensive statistics about all calls made to this adapter.

Output fields:

  • type - Component class name
  • gathered_at - ISO timestamp
  • model_id - Model identifier
  • total_calls - Number of chat/generate calls
  • successful_calls - Number of successful calls
  • failed_calls - Number of failed calls
  • total_duration_seconds - Total time spent in calls
  • average_duration_seconds - Average time per call
  • logs - List of individual call records

RETURNS DESCRIPTION
Dict[str, Any]

Dictionary containing model execution traces.

gather_usage

gather_usage() -> Usage

Gather accumulated token usage from all chat calls.

RETURNS DESCRIPTION
Usage

Summed TokenUsage across all calls, or empty TokenUsage if no calls were made.

generate

generate(
    prompt: str,
    generation_params: Optional[Dict[str, Any]] = None,
    **kwargs: Any,
) -> str

Generate text from a simple prompt.

This is a convenience method that wraps the prompt in a user message and calls chat(). Use this for simple text-in/text-out scenarios.

For conversations or tool use, use chat() directly.

PARAMETER DESCRIPTION
prompt

The input prompt.

TYPE: str

generation_params

Generation parameters (temperature, max_tokens, etc.).

TYPE: Optional[Dict[str, Any]] DEFAULT: None

**kwargs

Additional provider-specific arguments.

TYPE: Any DEFAULT: {}

RETURNS DESCRIPTION
str

The model's text response.

Example
response = model.generate("What is the capital of France?")
print(response)  # "Paris"

LiteLLMModelAdapter

Bases: ModelAdapter

Adapter for LiteLLM unified interface.

LiteLLM provides a consistent API for calling multiple LLM providers (OpenAI, Anthropic, Cohere, Azure, AWS Bedrock, Google, etc.) using OpenAI-compatible syntax.

For supported providers see https://docs.litellm.ai/docs/providers.

API keys are read from environment variables by default:

  • OPENAI_API_KEY for OpenAI
  • ANTHROPIC_API_KEY for Anthropic
  • etc.

Alternatively, pass api_key directly to the constructor.

seed property

seed: Optional[int]

Seed for deterministic generation, or None if unseeded.

__init__

__init__(
    model_id: str,
    default_generation_params: Optional[
        Dict[str, Any]
    ] = None,
    api_key: Optional[str] = None,
    api_base: Optional[str] = None,
    seed: Optional[int] = None,
    cost_calculator: Optional[CostCalculator] = None,
)

Initialize LiteLLM model adapter.

PARAMETER DESCRIPTION
model_id

The model identifier in LiteLLM format. Examples:

  • "gpt-4" (OpenAI)
  • "claude-3-opus-20240229" (Anthropic)
  • "azure/gpt-4" (Azure OpenAI)
  • "bedrock/anthropic.claude-v2" (AWS Bedrock)

See https://docs.litellm.ai/docs/providers for the full list.

TYPE: str

default_generation_params

Default parameters for all calls. Common parameters: temperature, max_tokens, top_p.

TYPE: Optional[Dict[str, Any]] DEFAULT: None

api_key

API key for the provider. If not provided, LiteLLM reads from environment variables.

TYPE: Optional[str] DEFAULT: None

api_base

Custom API base URL for self-hosted or Azure endpoints.

TYPE: Optional[str] DEFAULT: None

seed

Seed for deterministic generation. LiteLLM passes this to the underlying provider. Note: Not all providers support seeding.

TYPE: Optional[int] DEFAULT: None

cost_calculator

Optional cost calculator for computing cost from token counts. Note: LiteLLM already reports cost via response._hidden_params.response_cost for most models, so a calculator is only needed as a fallback or override.

TYPE: Optional[CostCalculator] DEFAULT: None

chat

chat(
    messages: Union[List[Dict[str, Any]], MessageHistory],
    generation_params: Optional[Dict[str, Any]] = None,
    tools: Optional[List[Dict[str, Any]]] = None,
    tool_choice: Optional[
        Union[str, Dict[str, Any]]
    ] = None,
    **kwargs: Any,
) -> ChatResponse

Send messages to the model and get a response.

This is the primary method for interacting with the model. Pass a conversation history and receive the model's response.

PARAMETER DESCRIPTION
messages

The conversation history. Either a list of message dicts in OpenAI format, or a MessageHistory object. Each message has 'role' ('system', 'user', 'assistant', 'tool') and 'content' keys.

TYPE: Union[List[Dict[str, Any]], MessageHistory]

generation_params

Model parameters like temperature, max_tokens, top_p, etc. Provider-specific parameters are also accepted.

TYPE: Optional[Dict[str, Any]] DEFAULT: None

tools

Tool definitions the model can use. Each tool is a dict with 'type' (usually 'function') and 'function' containing 'name', 'description', and 'parameters' (JSON Schema).

TYPE: Optional[List[Dict[str, Any]]] DEFAULT: None

tool_choice

How the model should use tools:

  • "auto": Model decides whether to use tools (default)
  • "none": Model won't use tools
  • "required": Model must use a tool
  • {"type": "function", "function": {"name": "..."}}: Use a specific tool

TYPE: Optional[Union[str, Dict[str, Any]]] DEFAULT: None

**kwargs

Additional provider-specific arguments.

TYPE: Any DEFAULT: {}

RETURNS DESCRIPTION
ChatResponse

ChatResponse containing the model's response (text and/or tool calls).

RAISES DESCRIPTION
Exception

Provider-specific errors are logged and re-raised.

Example
# Simple conversation
response = model.chat([
    {"role": "user", "content": "Hello!"}
])
print(response.content)

# With system prompt
response = model.chat([
    {"role": "system", "content": "You are a pirate."},
    {"role": "user", "content": "Hello!"}
])

# With tools
response = model.chat(
    messages=[{"role": "user", "content": "What's 2+2?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "calculator",
            "description": "Evaluate math expressions",
            "parameters": {
                "type": "object",
                "properties": {"expression": {"type": "string"}},
                "required": ["expression"]
            }
        }
    }]
)

gather_config

gather_config() -> Dict[str, Any]

Gather configuration from this LiteLLM model adapter.

RETURNS DESCRIPTION
Dict[str, Any]

Dictionary containing model configuration and LiteLLM settings.

gather_traces

gather_traces() -> Dict[str, Any]

Gather execution traces from this model adapter.

Called automatically by Benchmark to collect execution data for evaluation. Returns comprehensive statistics about all calls made to this adapter.

Output fields:

  • type - Component class name
  • gathered_at - ISO timestamp
  • model_id - Model identifier
  • total_calls - Number of chat/generate calls
  • successful_calls - Number of successful calls
  • failed_calls - Number of failed calls
  • total_duration_seconds - Total time spent in calls
  • average_duration_seconds - Average time per call
  • logs - List of individual call records

RETURNS DESCRIPTION
Dict[str, Any]

Dictionary containing model execution traces.

gather_usage

gather_usage() -> Usage

Gather accumulated token usage from all chat calls.

RETURNS DESCRIPTION
Usage

Summed TokenUsage across all calls, or empty TokenUsage if no calls were made.

generate

generate(
    prompt: str,
    generation_params: Optional[Dict[str, Any]] = None,
    **kwargs: Any,
) -> str

Generate text from a simple prompt.

This is a convenience method that wraps the prompt in a user message and calls chat(). Use this for simple text-in/text-out scenarios.

For conversations or tool use, use chat() directly.

PARAMETER DESCRIPTION
prompt

The input prompt.

TYPE: str

generation_params

Generation parameters (temperature, max_tokens, etc.).

TYPE: Optional[Dict[str, Any]] DEFAULT: None

**kwargs

Additional provider-specific arguments.

TYPE: Any DEFAULT: {}

RETURNS DESCRIPTION
str

The model's text response.

Example
response = model.generate("What is the capital of France?")
print(response)  # "Paris"

AnthropicModelAdapter

Bases: ModelAdapter

Adapter for Anthropic Claude models.

Works with Claude models through the official Anthropic Python SDK. Pass any model ID supported by the Anthropic API.

The adapter accepts OpenAI-style messages and converts them to Anthropic's format internally. Key differences handled automatically:

  • System messages are passed separately (not in messages array)
  • Tool definitions are converted to Anthropic format
  • Tool responses are converted to tool_result content blocks
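
The three conversions listed above can be sketched roughly as follows. The helper names `split_system_and_convert` and `convert_tool` are hypothetical, and this is not the adapter's actual code; the target shapes (a separate system string, `input_schema` on tool definitions, `tool_result` content blocks) follow Anthropic's Messages API. Converting assistant messages that contain tool calls is omitted for brevity.

```python
from typing import Any, Dict, List, Optional, Tuple


def split_system_and_convert(
    messages: List[Dict[str, Any]],
) -> Tuple[Optional[str], List[Dict[str, Any]]]:
    """Separate system text and convert tool responses (hypothetical helper)."""
    system_parts: List[str] = []
    converted: List[Dict[str, Any]] = []
    for message in messages:
        role = message["role"]
        if role == "system":
            # System messages are passed separately, not in the messages array.
            system_parts.append(message["content"])
        elif role == "tool":
            # OpenAI-style tool responses become tool_result content blocks.
            converted.append({
                "role": "user",
                "content": [{
                    "type": "tool_result",
                    "tool_use_id": message["tool_call_id"],
                    "content": message["content"],
                }],
            })
        else:
            converted.append({"role": role, "content": message["content"]})
    system = "\n\n".join(system_parts) if system_parts else None
    return system, converted


def convert_tool(tool: Dict[str, Any]) -> Dict[str, Any]:
    """Convert an OpenAI-style function tool to Anthropic's tool shape."""
    function = tool["function"]
    return {
        "name": function["name"],
        "description": function.get("description", ""),
        "input_schema": function.get(
            "parameters", {"type": "object", "properties": {}}
        ),
    }


openai_messages = [
    {"role": "system", "content": "You are a pirate."},
    {"role": "user", "content": "What's 2+2?"},
    {"role": "tool", "tool_call_id": "call_1", "content": "4"},
]
system, anthropic_messages = split_system_and_convert(openai_messages)
```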

seed property

seed: Optional[int]

Seed for deterministic generation, or None if unseeded.

__init__

__init__(
    client: Any,
    model_id: str,
    default_generation_params: Optional[
        Dict[str, Any]
    ] = None,
    max_tokens: int = 4096,
    seed: Optional[int] = None,
    cost_calculator: Optional[CostCalculator] = None,
)

Initialize Anthropic model adapter.

PARAMETER DESCRIPTION
client

An anthropic.Anthropic client instance.

TYPE: Any

model_id

The model identifier (e.g., "claude-sonnet-4-5-20250929").

TYPE: str

default_generation_params

Default parameters for all calls. Common parameters: temperature, top_p, top_k.

TYPE: Optional[Dict[str, Any]] DEFAULT: None

max_tokens

Maximum tokens to generate. Anthropic requires this parameter. Default is 4096.

TYPE: int DEFAULT: 4096

seed

Seed for deterministic generation. Note: Anthropic does NOT support seeding. Providing a seed will raise SeedingError.

TYPE: Optional[int] DEFAULT: None

cost_calculator

Optional cost calculator for computing cost from token counts when the provider doesn't report cost directly.

TYPE: Optional[CostCalculator] DEFAULT: None

RAISES DESCRIPTION
SeedingError

If seed is provided (Anthropic doesn't support seeding).
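
The seeding behaviour documented above can be illustrated with a self-contained sketch. `NoSeedAdapter` is hypothetical, and the local `SeedingError` class merely stands in for `maseval.core.seeding.SeedingError` so that the example runs without the library installed.

```python
from typing import Optional


class SeedingError(Exception):
    """Local stand-in for maseval.core.seeding.SeedingError."""


class NoSeedAdapter:
    """Hypothetical adapter for a provider without seeding support."""

    def __init__(self, model_id: str, seed: Optional[int] = None) -> None:
        # Compare against None explicitly so that seed=0 is also rejected.
        if seed is not None:
            raise SeedingError(
                f"{model_id}: this provider does not support seeding"
            )
        self.model_id = model_id
        self.seed = seed


unseeded = NoSeedAdapter("example-model")

try:
    NoSeedAdapter("example-model", seed=42)
    raised = False
except SeedingError:
    raised = True
```

Failing at construction time, rather than silently ignoring the seed, keeps a run from being reported as reproducible when it is not.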

chat

chat(
    messages: Union[List[Dict[str, Any]], MessageHistory],
    generation_params: Optional[Dict[str, Any]] = None,
    tools: Optional[List[Dict[str, Any]]] = None,
    tool_choice: Optional[
        Union[str, Dict[str, Any]]
    ] = None,
    **kwargs: Any,
) -> ChatResponse

Send messages to the model and get a response.

This is the primary method for interacting with the model. Pass a conversation history and receive the model's response.

PARAMETER DESCRIPTION
messages

The conversation history. Either a list of message dicts in OpenAI format, or a MessageHistory object. Each message has 'role' ('system', 'user', 'assistant', 'tool') and 'content' keys.

TYPE: Union[List[Dict[str, Any]], MessageHistory]

generation_params

Model parameters like temperature, max_tokens, top_p, etc. Provider-specific parameters are also accepted.

TYPE: Optional[Dict[str, Any]] DEFAULT: None

tools

Tool definitions the model can use. Each tool is a dict with 'type' (usually 'function') and 'function' containing 'name', 'description', and 'parameters' (JSON Schema).

TYPE: Optional[List[Dict[str, Any]]] DEFAULT: None

tool_choice

How the model should use tools:

  • "auto": Model decides whether to use tools (default)
  • "none": Model won't use tools
  • "required": Model must use a tool
  • {"type": "function", "function": {"name": "..."}}: Use specific tool

TYPE: Optional[Union[str, Dict[str, Any]]] DEFAULT: None

**kwargs

Additional provider-specific arguments.

TYPE: Any DEFAULT: {}

RETURNS DESCRIPTION
ChatResponse

ChatResponse containing the model's response (text and/or tool calls).

RAISES DESCRIPTION
Exception

Provider-specific errors are logged and re-raised.

Example
# Simple conversation
response = model.chat([
    {"role": "user", "content": "Hello!"}
])
print(response.content)

# With system prompt
response = model.chat([
    {"role": "system", "content": "You are a pirate."},
    {"role": "user", "content": "Hello!"}
])

# With tools
response = model.chat(
    messages=[{"role": "user", "content": "What's 2+2?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "calculator",
            "description": "Evaluate math expressions",
            "parameters": {
                "type": "object",
                "properties": {"expression": {"type": "string"}},
                "required": ["expression"]
            }
        }
    }]
)
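
Because chat() accepts OpenAI-style tool_choice values, an adapter targeting Anthropic has to translate them. The sketch below shows one plausible mapping onto the tool_choice shapes of Anthropic's Messages API; the function `to_anthropic_tool_choice` is hypothetical, and the exact mapping used by AnthropicModelAdapter is an assumption here.

```python
from typing import Any, Dict, Optional, Union


def to_anthropic_tool_choice(
    tool_choice: Optional[Union[str, Dict[str, Any]]],
) -> Optional[Dict[str, Any]]:
    """Translate an OpenAI-style tool_choice (hypothetical helper)."""
    if tool_choice is None:
        # Leave the parameter unset and let the provider default apply.
        return None
    if isinstance(tool_choice, str):
        mapping = {
            "auto": {"type": "auto"},
            "none": {"type": "none"},
            "required": {"type": "any"},  # Anthropic calls this "any"
        }
        if tool_choice not in mapping:
            raise ValueError(f"Unsupported tool_choice: {tool_choice!r}")
        return mapping[tool_choice]
    # Dict form: {"type": "function", "function": {"name": "..."}}
    return {"type": "tool", "name": tool_choice["function"]["name"]}


forced = to_anthropic_tool_choice(
    {"type": "function", "function": {"name": "calculator"}}
)
```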

gather_config

gather_config() -> Dict[str, Any]

Gather configuration from this Anthropic model adapter.

RETURNS DESCRIPTION
Dict[str, Any]

Dictionary containing model configuration.

gather_traces

gather_traces() -> Dict[str, Any]

Gather execution traces from this model adapter.

Called automatically by Benchmark to collect execution data for evaluation. Returns comprehensive statistics about all calls made to this adapter.

Output fields:

  • type - Component class name
  • gathered_at - ISO timestamp
  • model_id - Model identifier
  • total_calls - Number of chat/generate calls
  • successful_calls - Number of successful calls
  • failed_calls - Number of failed calls
  • total_duration_seconds - Total time spent in calls
  • average_duration_seconds - Average time per call
  • logs - List of individual call records

RETURNS DESCRIPTION
Dict[str, Any]

Dictionary containing model execution traces.
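
Once traces have been gathered, the documented fields can be used directly for simple health checks. The helper `success_rate` below is a hypothetical example that relies only on the `total_calls` and `successful_calls` fields listed above; the sample dictionary is made up for illustration.

```python
from typing import Any, Dict


def success_rate(traces: Dict[str, Any]) -> float:
    """Fraction of calls that succeeded, or 0.0 if no calls were made."""
    total = traces["total_calls"]
    return traces["successful_calls"] / total if total else 0.0


# Made-up trace values for illustration only.
sample_traces = {"total_calls": 4, "successful_calls": 3, "failed_calls": 1}
rate = success_rate(sample_traces)
```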

gather_usage

gather_usage() -> Usage

Gather accumulated token usage from all chat calls.

RETURNS DESCRIPTION
Usage

Summed TokenUsage across all calls, or empty TokenUsage if no calls were made.

generate

generate(
    prompt: str,
    generation_params: Optional[Dict[str, Any]] = None,
    **kwargs: Any,
) -> str

Generate text from a simple prompt.

This is a convenience method that wraps the prompt in a user message and calls chat(). Use this for simple text-in/text-out scenarios.

For conversations or tool use, use chat() directly.

PARAMETER DESCRIPTION
prompt

The input prompt.

TYPE: str

generation_params

Generation parameters (temperature, max_tokens, etc.).

TYPE: Optional[Dict[str, Any]] DEFAULT: None

**kwargs

Additional provider-specific arguments.

TYPE: Any DEFAULT: {}

RETURNS DESCRIPTION
str

The model's text response.

Example
response = model.generate("What is the capital of France?")
print(response)  # "Paris"