Model Adapters

Model Adapters provide a uniform runtime interface to heterogeneous model providers (HuggingFace, OpenAI, Google GenAI, or simple callables). Each adapter exposes a stable API so that maseval can call models without handling provider-specific shapes.

Note

Benchmark expects AgentAdapter instances; it does not consume model adapters directly. Model adapters are used by agents, simulators, and other components that invoke models directly.

ModelAdapter

Bases: ABC, TraceableMixin, ConfigurableMixin, UsageTrackableMixin

Abstract base class for model adapters.

ModelAdapter provides a consistent interface for LLM inference across different providers. All adapters implement the same methods, so you can swap providers without changing your code.

To use a model adapter

Create an instance with provider-specific configuration
Call chat() for message-based conversations
Call generate() for simple text-in/text-out

The adapter automatically tracks all calls for tracing and evaluation.

Implementing a custom adapter

Subclass ModelAdapter and implement:

model_id property: Return the model identifier string
_chat_impl(): The actual chat completion logic

See maseval.interface.inference for concrete implementations:

AnthropicModelAdapter
GoogleGenAIModelAdapter
HuggingFacePipelineModelAdapter
LiteLLMModelAdapter
OpenAIModelAdapter
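The subclassing contract above can be sketched as follows. So that it runs on its own, this uses a stand-in base class rather than the real ModelAdapter. The `_chat_impl` signature, the dict return value, and the `EchoAdapter` name are assumptions for illustration; the real base class returns a ChatResponse and also handles tracing and usage.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Optional


class StandInModelAdapter(ABC):
    """Stand-in for ModelAdapter, reduced to the two members a subclass must supply."""

    @property
    @abstractmethod
    def model_id(self) -> str:
        """Return the model identifier string."""

    @abstractmethod
    def _chat_impl(
        self,
        messages: List[Dict[str, Any]],
        generation_params: Optional[Dict[str, Any]] = None,
        **kwargs: Any,
    ) -> Dict[str, Any]:
        """Provider-specific chat completion logic (signature assumed)."""

    def chat(self, messages, generation_params=None, **kwargs):
        # The public method delegates to the provider-specific implementation.
        return self._chat_impl(messages, generation_params, **kwargs)


class EchoAdapter(StandInModelAdapter):
    """Hypothetical toy adapter that echoes the last user message."""

    @property
    def model_id(self) -> str:
        return "echo-v0"

    def _chat_impl(self, messages, generation_params=None, **kwargs):
        last_user = next(m for m in reversed(messages) if m["role"] == "user")
        return {"role": "assistant", "content": last_user["content"]}


adapter = EchoAdapter()
reply = adapter.chat([{"role": "user", "content": "Hello!"}])
print(adapter.model_id, reply["content"])
```

Keeping the provider logic in `_chat_impl()` leaves the public `chat()` entry point identical across adapters, which is what lets callers swap providers.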

Seeding

Pass a seed parameter to enable deterministic generation. This seed is passed to the underlying provider API if supported. If a seed is provided but the provider doesn't support seeding, the adapter should raise SeedingError from maseval.core.seeding.

User-provided generation_params["seed"] takes precedence over the adapter's seed parameter.
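A minimal sketch of the precedence rule described here. `resolve_seed` and its `provider_supports_seeding` flag are hypothetical helpers, and the local `SeedingError` class stands in for the one in maseval.core.seeding. How the real adapters treat an explicit `"seed": None` is not stated in this section; the sketch treats it as an override.

```python
from typing import Any, Dict, Optional


class SeedingError(Exception):
    """Stand-in for maseval.core.seeding.SeedingError."""


def resolve_seed(
    adapter_seed: Optional[int],
    generation_params: Optional[Dict[str, Any]],
    provider_supports_seeding: bool = True,
) -> Optional[int]:
    """Pick the effective seed for one call (hypothetical helper)."""
    params = generation_params or {}
    # A per-call seed in generation_params overrides the adapter-level seed.
    seed = params["seed"] if "seed" in params else adapter_seed
    if seed is not None and not provider_supports_seeding:
        raise SeedingError("provider does not support seeding")
    return seed


print(resolve_seed(1, {"seed": 2}))   # per-call seed wins
print(resolve_seed(1, None))          # falls back to the adapter seed
```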

model_id `abstractmethod` `property`

model_id: str

The identifier for the underlying model.

RETURNS	DESCRIPTION
`str`	A string identifying the model (e.g., "gpt-4", "claude-sonnet-4-5", "gemini-pro"). Used for tracing and configuration.

seed `property`

seed: Optional[int]

Seed for deterministic generation, or None if unseeded.

__init__

__init__(
    seed: Optional[int] = None,
    cost_calculator: Optional[CostCalculator] = None,
)

Initialize the model adapter with call tracing.

PARAMETER	DESCRIPTION
`seed`	Seed for deterministic generation. Passed to the underlying provider API if supported. If the provider doesn't support seeding, subclasses should raise SeedingError. TYPE: `Optional[int]` DEFAULT: `None`
`cost_calculator`	Optional cost calculator for computing USD (or other unit) cost from token counts. If provided and the provider does not report cost directly, the calculator is used to fill in `Usage.cost` after each call. Provider-reported cost always takes precedence. TYPE: `Optional[CostCalculator]` DEFAULT: `None`
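The cost fallback can be illustrated with a small sketch. `UsageSketch`, its token field names, `fill_cost`, and the plain-callable calculator are all assumptions made for this example (only the `cost` field is named in the text above), and the rates are invented.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class UsageSketch:
    """Hypothetical stand-in for Usage; only `cost` is named in the docs."""
    input_tokens: int
    output_tokens: int
    cost: Optional[float] = None


CostFn = Callable[[int, int], float]


def fill_cost(usage: UsageSketch, calculator: Optional[CostFn]) -> UsageSketch:
    # Provider-reported cost wins; the calculator only fills a missing value.
    if usage.cost is None and calculator is not None:
        usage.cost = calculator(usage.input_tokens, usage.output_tokens)
    return usage


def flat_rate(input_tokens: int, output_tokens: int) -> float:
    # Made-up per-token rates, for illustration only.
    return input_tokens * 1e-6 + output_tokens * 2e-6


reported = fill_cost(UsageSketch(1000, 500, cost=0.01), flat_rate)
computed = fill_cost(UsageSketch(1000, 500), flat_rate)
no_calc = fill_cost(UsageSketch(1000, 500), None)
```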

chat

chat(
    messages: Union[List[Dict[str, Any]], MessageHistory],
    generation_params: Optional[Dict[str, Any]] = None,
    tools: Optional[List[Dict[str, Any]]] = None,
    tool_choice: Optional[
        Union[str, Dict[str, Any]]
    ] = None,
    **kwargs: Any,
) -> ChatResponse

Send messages to the model and get a response.

This is the primary method for interacting with the model. Pass a conversation history and receive the model's response.

PARAMETER	DESCRIPTION
`messages`	The conversation history. Either a list of message dicts in OpenAI format, or a MessageHistory object. Each message has 'role' ('system', 'user', 'assistant', 'tool') and 'content' keys. TYPE: `Union[List[Dict[str, Any]], MessageHistory]`
`generation_params`	Model parameters like temperature, max_tokens, top_p, etc. Provider-specific parameters are also accepted. TYPE: `Optional[Dict[str, Any]]` DEFAULT: `None`
`tools`	Tool definitions the model can use. Each tool is a dict with 'type' (usually 'function') and 'function' containing 'name', 'description', and 'parameters' (JSON Schema). TYPE: `Optional[List[Dict[str, Any]]]` DEFAULT: `None`
`tool_choice`	How the model should use tools: - "auto": Model decides whether to use tools (default) - "none": Model won't use tools - "required": Model must use a tool - {"type": "function", "function": {"name": "..."}}: Use specific tool TYPE: `Optional[Union[str, Dict[str, Any]]]` DEFAULT: `None`
`**kwargs`	Additional provider-specific arguments. TYPE: `Any` DEFAULT: `{}`

RETURNS	DESCRIPTION
`ChatResponse`	ChatResponse containing the model's response (text and/or tool calls).

RAISES	DESCRIPTION
`Exception`	Provider-specific errors are logged and re-raised.

Example

# Simple conversation
response = model.chat([
    {"role": "user", "content": "Hello!"}
])
print(response.content)

# With system prompt
response = model.chat([
    {"role": "system", "content": "You are a pirate."},
    {"role": "user", "content": "Hello!"}
])

# With tools
response = model.chat(
    messages=[{"role": "user", "content": "What's 2+2?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "calculator",
            "description": "Evaluate math expressions",
            "parameters": {
                "type": "object",
                "properties": {"expression": {"type": "string"}},
                "required": ["expression"]
            }
        }
    }]
)

gather_config

gather_config() -> Dict[str, Any]

Gather configuration from this model adapter.

Called automatically by Benchmark to collect configuration for reproducibility. Returns identifying information about this adapter.

Output fields:

type - Component class name
gathered_at - ISO timestamp
model_id - Model identifier
adapter_type - The specific adapter class name
seed - Seed for deterministic generation, or None if unseeded

RETURNS	DESCRIPTION
`Dict[str, Any]`	Dictionary containing model configuration.
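Because the configuration is collected for reproducibility, one plausible use is checking whether two runs used the same setup. The sketch below compares two dictionaries with the fields listed above and ignores the timestamp. `same_setup` is a hypothetical helper and the sample values are invented.

```python
from typing import Any, Dict


def same_setup(a: Dict[str, Any], b: Dict[str, Any]) -> bool:
    """Compare two gather_config()-shaped dicts, ignoring when they were gathered."""
    ignore = {"gathered_at"}

    def strip(d: Dict[str, Any]) -> Dict[str, Any]:
        return {k: v for k, v in d.items() if k not in ignore}

    return strip(a) == strip(b)


# Invented sample values using the documented field names.
run_1 = {
    "type": "OpenAIModelAdapter",
    "gathered_at": "2025-01-01T10:00:00",
    "model_id": "gpt-4",
    "adapter_type": "OpenAIModelAdapter",
    "seed": 42,
}
run_2 = dict(run_1, gathered_at="2025-01-02T09:30:00")
run_3 = dict(run_1, seed=None)

print(same_setup(run_1, run_2), same_setup(run_1, run_3))
```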

gather_traces

gather_traces() -> Dict[str, Any]

Gather execution traces from this model adapter.

Called automatically by Benchmark to collect execution data for evaluation. Returns comprehensive statistics about all calls made to this adapter.

Output fields:

type - Component class name
gathered_at - ISO timestamp
model_id - Model identifier
total_calls - Number of chat/generate calls
successful_calls - Number of successful calls
failed_calls - Number of failed calls
total_duration_seconds - Total time spent in calls
average_duration_seconds - Average time per call
logs - List of individual call records

RETURNS	DESCRIPTION
`Dict[str, Any]`	Dictionary containing model execution traces.
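To make the relationship between the aggregate fields and `logs` concrete, here is a sketch that derives the statistics from a list of call records. The record keys `success` and `duration_seconds` are assumptions; the actual record shape is not documented in this section.

```python
from typing import Any, Dict, List


def summarize_calls(logs: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Derive the documented aggregate fields from per-call records (assumed shape)."""
    total = len(logs)
    successful = sum(1 for record in logs if record["success"])
    duration = sum(record["duration_seconds"] for record in logs)
    return {
        "total_calls": total,
        "successful_calls": successful,
        "failed_calls": total - successful,
        "total_duration_seconds": duration,
        # Avoid dividing by zero when no calls were made.
        "average_duration_seconds": duration / total if total else 0.0,
        "logs": logs,
    }


summary = summarize_calls([
    {"success": True, "duration_seconds": 0.5},
    {"success": False, "duration_seconds": 1.5},
])
```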

gather_usage

gather_usage() -> Usage

Gather accumulated token usage from all chat calls.

RETURNS	DESCRIPTION
`Usage`	Summed TokenUsage across all calls, or empty TokenUsage if no calls were made.
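The accumulation can be pictured as a sum with an empty starting value, which is why zero calls yields an empty result. `TokenUsageSketch` and its two fields are hypothetical; the real usage classes may carry more fields.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TokenUsageSketch:
    """Hypothetical stand-in for a per-call token usage record."""
    input_tokens: int = 0
    output_tokens: int = 0

    def __add__(self, other: "TokenUsageSketch") -> "TokenUsageSketch":
        return TokenUsageSketch(
            self.input_tokens + other.input_tokens,
            self.output_tokens + other.output_tokens,
        )


def total_usage(per_call: List[TokenUsageSketch]) -> TokenUsageSketch:
    # Starting from an empty record means an empty list returns empty usage.
    return sum(per_call, TokenUsageSketch())


total = total_usage([TokenUsageSketch(10, 5), TokenUsageSketch(3, 2)])
```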

generate

generate(
    prompt: str,
    generation_params: Optional[Dict[str, Any]] = None,
    **kwargs: Any,
) -> str

Generate text from a simple prompt.

This is a convenience method that wraps the prompt in a user message and calls chat(). Use this for simple text-in/text-out scenarios.

For conversations or tool use, use chat() directly.

PARAMETER	DESCRIPTION
`prompt`	The input prompt. TYPE: `str`
`generation_params`	Generation parameters (temperature, max_tokens, etc.). TYPE: `Optional[Dict[str, Any]]` DEFAULT: `None`
`**kwargs`	Additional provider-specific arguments. TYPE: `Any` DEFAULT: `{}`

RETURNS	DESCRIPTION
`str`	The model's text response.

Example

response = model.generate("What is the capital of France?")
print(response)  # "Paris"
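The wrapping described for generate() can be sketched as below. `generate_via_chat`, `ChatResponseSketch`, and `fake_chat` are hypothetical names; the sketch assumes only that the chat response exposes a `content` attribute, as in the chat() example.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional


@dataclass
class ChatResponseSketch:
    """Hypothetical stand-in exposing only the `content` attribute."""
    content: str


def generate_via_chat(
    chat: Callable[..., ChatResponseSketch],
    prompt: str,
    generation_params: Optional[Dict[str, Any]] = None,
    **kwargs: Any,
) -> str:
    # Wrap the prompt in a single user message, then delegate to chat().
    messages = [{"role": "user", "content": prompt}]
    response = chat(messages, generation_params=generation_params, **kwargs)
    return response.content


seen: List[List[Dict[str, Any]]] = []


def fake_chat(messages, generation_params=None, **kwargs):
    # Record what was sent so the wrapping can be inspected.
    seen.append(messages)
    return ChatResponseSketch(content="Paris")


answer = generate_via_chat(fake_chat, "What is the capital of France?")
```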

Interfaces

The following adapter classes implement the ModelAdapter interface for specific providers. Each requires its own dependencies.

OpenAIModelAdapter

Bases: ModelAdapter

Adapter for OpenAI and OpenAI-compatible APIs.

Works with

OpenAI API (gpt-4, gpt-3.5-turbo, etc.)
Azure OpenAI
Any OpenAI-compatible server (vLLM, LocalAI, etc.)

The adapter expects an OpenAI client instance. API keys and configuration should be set on the client before passing it to the adapter.

seed `property`

seed: Optional[int]

Seed for deterministic generation, or None if unseeded.

__init__

__init__(
    client: Any,
    model_id: str,
    default_generation_params: Optional[
        Dict[str, Any]
    ] = None,
    seed: Optional[int] = None,
    cost_calculator: Optional[CostCalculator] = None,
)

Initialize OpenAI model adapter.

PARAMETER	DESCRIPTION
`client`	An OpenAI client instance (openai.OpenAI or openai.AzureOpenAI). The client should already be configured with API keys. TYPE: `Any`
`model_id`	The model identifier (e.g., "gpt-4", "gpt-3.5-turbo"). TYPE: `str`
`default_generation_params`	Default parameters for all calls. Common parameters: temperature, max_tokens, top_p. TYPE: `Optional[Dict[str, Any]]` DEFAULT: `None`
`seed`	Seed for deterministic generation. OpenAI supports this natively. Note: Determinism is best-effort, not guaranteed by OpenAI. TYPE: `Optional[int]` DEFAULT: `None`
`cost_calculator`	Optional cost calculator for computing cost from token counts when the provider doesn't report cost directly. TYPE: `Optional[CostCalculator]` DEFAULT: `None`

chat

chat(
    messages: Union[List[Dict[str, Any]], MessageHistory],
    generation_params: Optional[Dict[str, Any]] = None,
    tools: Optional[List[Dict[str, Any]]] = None,
    tool_choice: Optional[
        Union[str, Dict[str, Any]]
    ] = None,
    **kwargs: Any,
) -> ChatResponse

Send messages to the model and get a response.

This is the primary method for interacting with the model. Pass a conversation history and receive the model's response.

PARAMETER	DESCRIPTION
`messages`	The conversation history. Either a list of message dicts in OpenAI format, or a MessageHistory object. Each message has 'role' ('system', 'user', 'assistant', 'tool') and 'content' keys. TYPE: `Union[List[Dict[str, Any]], MessageHistory]`
`generation_params`	Model parameters like temperature, max_tokens, top_p, etc. Provider-specific parameters are also accepted. TYPE: `Optional[Dict[str, Any]]` DEFAULT: `None`
`tools`	Tool definitions the model can use. Each tool is a dict with 'type' (usually 'function') and 'function' containing 'name', 'description', and 'parameters' (JSON Schema). TYPE: `Optional[List[Dict[str, Any]]]` DEFAULT: `None`
`tool_choice`	How the model should use tools: - "auto": Model decides whether to use tools (default) - "none": Model won't use tools - "required": Model must use a tool - {"type": "function", "function": {"name": "..."}}: Use specific tool TYPE: `Optional[Union[str, Dict[str, Any]]]` DEFAULT: `None`
`**kwargs`	Additional provider-specific arguments. TYPE: `Any` DEFAULT: `{}`

RETURNS	DESCRIPTION
`ChatResponse`	ChatResponse containing the model's response (text and/or tool calls).

RAISES	DESCRIPTION
`Exception`	Provider-specific errors are logged and re-raised.

Example

# Simple conversation
response = model.chat([
    {"role": "user", "content": "Hello!"}
])
print(response.content)

# With system prompt
response = model.chat([
    {"role": "system", "content": "You are a pirate."},
    {"role": "user", "content": "Hello!"}
])

# With tools
response = model.chat(
    messages=[{"role": "user", "content": "What's 2+2?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "calculator",
            "description": "Evaluate math expressions",
            "parameters": {
                "type": "object",
                "properties": {"expression": {"type": "string"}},
                "required": ["expression"]
            }
        }
    }]
)

gather_config

gather_config() -> Dict[str, Any]

Gather configuration from this OpenAI model adapter.

RETURNS	DESCRIPTION
`Dict[str, Any]`	Dictionary containing model configuration and client settings.

gather_traces

gather_traces() -> Dict[str, Any]

Gather execution traces from this model adapter.

Called automatically by Benchmark to collect execution data for evaluation. Returns comprehensive statistics about all calls made to this adapter.

Output fields:

type - Component class name
gathered_at - ISO timestamp
model_id - Model identifier
total_calls - Number of chat/generate calls
successful_calls - Number of successful calls
failed_calls - Number of failed calls
total_duration_seconds - Total time spent in calls
average_duration_seconds - Average time per call
logs - List of individual call records

RETURNS	DESCRIPTION
`Dict[str, Any]`	Dictionary containing model execution traces.

gather_usage

gather_usage() -> Usage

Gather accumulated token usage from all chat calls.

RETURNS	DESCRIPTION
`Usage`	Summed TokenUsage across all calls, or empty TokenUsage if no calls were made.

generate

generate(
    prompt: str,
    generation_params: Optional[Dict[str, Any]] = None,
    **kwargs: Any,
) -> str

Generate text from a simple prompt.

This is a convenience method that wraps the prompt in a user message and calls chat(). Use this for simple text-in/text-out scenarios.

For conversations or tool use, use chat() directly.

PARAMETER	DESCRIPTION
`prompt`	The input prompt. TYPE: `str`
`generation_params`	Generation parameters (temperature, max_tokens, etc.). TYPE: `Optional[Dict[str, Any]]` DEFAULT: `None`
`**kwargs`	Additional provider-specific arguments. TYPE: `Any` DEFAULT: `{}`

RETURNS	DESCRIPTION
`str`	The model's text response.

Example

response = model.generate("What is the capital of France?")
print(response)  # "Paris"

HuggingFacePipelineModelAdapter

Bases: ModelAdapter

Adapter for HuggingFace transformers pipelines and callables.

Wraps a HuggingFace pipeline() object (or any text-generation callable) for use with the ModelAdapter interface (chat(), generate()).

For log-likelihood scoring, see HuggingFaceModelScorer.

Works with:

transformers.pipeline() objects
Any callable that accepts a prompt and returns text

For chat functionality, the adapter uses the tokenizer's chat template if available. This provides proper formatting for instruction-tuned models.

Tool calling support

Tool calling is only supported if the model's chat template explicitly supports it. If you pass tools and the model doesn't support them, a ToolCallingNotSupportedError is raised. For reliable tool calling, consider using LiteLLMModelAdapter instead.

seed `property`

seed: Optional[int]

Seed for deterministic generation, or None if unseeded.

__init__

__init__(
    model: Callable[[str], str],
    model_id: Optional[str] = None,
    default_generation_params: Optional[
        Dict[str, Any]
    ] = None,
    seed: Optional[int] = None,
    cost_calculator: Optional[CostCalculator] = None,
)

Initialize HuggingFace model adapter.

PARAMETER	DESCRIPTION
`model`	A callable that generates text. Can be: - A transformers pipeline (e.g., pipeline("text-generation", ...)) - Any callable that takes a prompt string and returns text TYPE: `Callable[[str], str]`
`model_id`	Identifier for the model. If not provided, attempts to extract from the model's name_or_path attribute. TYPE: `Optional[str]` DEFAULT: `None`
`default_generation_params`	Default parameters for all calls. Common parameters: max_new_tokens, temperature, top_p, do_sample. TYPE: `Optional[Dict[str, Any]]` DEFAULT: `None`
`seed`	Seed for deterministic generation. Sets the random seed before each generation call using transformers.set_seed(). TYPE: `Optional[int]` DEFAULT: `None`
`cost_calculator`	Optional cost calculator for computing cost from token counts when the provider doesn't report cost directly. TYPE: `Optional[CostCalculator]` DEFAULT: `None`

chat

chat(
    messages: Union[List[Dict[str, Any]], MessageHistory],
    generation_params: Optional[Dict[str, Any]] = None,
    tools: Optional[List[Dict[str, Any]]] = None,
    tool_choice: Optional[
        Union[str, Dict[str, Any]]
    ] = None,
    **kwargs: Any,
) -> ChatResponse

Send messages to the model and get a response.

This is the primary method for interacting with the model. Pass a conversation history and receive the model's response.

PARAMETER	DESCRIPTION
`messages`	The conversation history. Either a list of message dicts in OpenAI format, or a MessageHistory object. Each message has 'role' ('system', 'user', 'assistant', 'tool') and 'content' keys. TYPE: `Union[List[Dict[str, Any]], MessageHistory]`
`generation_params`	Model parameters like temperature, max_tokens, top_p, etc. Provider-specific parameters are also accepted. TYPE: `Optional[Dict[str, Any]]` DEFAULT: `None`
`tools`	Tool definitions the model can use. Each tool is a dict with 'type' (usually 'function') and 'function' containing 'name', 'description', and 'parameters' (JSON Schema). TYPE: `Optional[List[Dict[str, Any]]]` DEFAULT: `None`
`tool_choice`	How the model should use tools: - "auto": Model decides whether to use tools (default) - "none": Model won't use tools - "required": Model must use a tool - {"type": "function", "function": {"name": "..."}}: Use specific tool TYPE: `Optional[Union[str, Dict[str, Any]]]` DEFAULT: `None`
`**kwargs`	Additional provider-specific arguments. TYPE: `Any` DEFAULT: `{}`

RETURNS	DESCRIPTION
`ChatResponse`	ChatResponse containing the model's response (text and/or tool calls).

RAISES	DESCRIPTION
`Exception`	Provider-specific errors are logged and re-raised.

Example

# Simple conversation
response = model.chat([
    {"role": "user", "content": "Hello!"}
])
print(response.content)

# With system prompt
response = model.chat([
    {"role": "system", "content": "You are a pirate."},
    {"role": "user", "content": "Hello!"}
])

# With tools
response = model.chat(
    messages=[{"role": "user", "content": "What's 2+2?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "calculator",
            "description": "Evaluate math expressions",
            "parameters": {
                "type": "object",
                "properties": {"expression": {"type": "string"}},
                "required": ["expression"]
            }
        }
    }]
)

gather_config

gather_config() -> Dict[str, Any]

Gather configuration from this HuggingFace model adapter.

RETURNS	DESCRIPTION
`Dict[str, Any]`	Dictionary containing model configuration.

gather_traces

gather_traces() -> Dict[str, Any]

Gather execution traces from this model adapter.

Called automatically by Benchmark to collect execution data for evaluation. Returns comprehensive statistics about all calls made to this adapter.

Output fields:

type - Component class name
gathered_at - ISO timestamp
model_id - Model identifier
total_calls - Number of chat/generate calls
successful_calls - Number of successful calls
failed_calls - Number of failed calls
total_duration_seconds - Total time spent in calls
average_duration_seconds - Average time per call
logs - List of individual call records

RETURNS	DESCRIPTION
`Dict[str, Any]`	Dictionary containing model execution traces.

gather_usage

gather_usage() -> Usage

Gather accumulated token usage from all chat calls.

RETURNS	DESCRIPTION
`Usage`	Summed TokenUsage across all calls, or empty TokenUsage if no calls were made.

generate

generate(
    prompt: str,
    generation_params: Optional[Dict[str, Any]] = None,
    **kwargs: Any,
) -> str

Generate text from a simple prompt.

This is a convenience method that wraps the prompt in a user message and calls chat(). Use this for simple text-in/text-out scenarios.

For conversations or tool use, use chat() directly.

PARAMETER	DESCRIPTION
`prompt`	The input prompt. TYPE: `str`
`generation_params`	Generation parameters (temperature, max_tokens, etc.). TYPE: `Optional[Dict[str, Any]]` DEFAULT: `None`
`**kwargs`	Additional provider-specific arguments. TYPE: `Any` DEFAULT: `{}`

RETURNS	DESCRIPTION
`str`	The model's text response.

Example

response = model.generate("What is the capital of France?")
print(response)  # "Paris"

GoogleGenAIModelAdapter

Bases: ModelAdapter

Adapter for Google Generative AI (Gemini models).

Works with Google's Gemini models through the google-genai SDK. Pass any model ID supported by the Google GenAI API.

The adapter converts OpenAI-style messages to Google's format internally, so you can use the same message format across all adapters.
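As a rough illustration of that conversion (not the adapter's actual code), the sketch below maps plain text messages to the content shape used by the Gemini API: system text is pulled out as a system instruction, and the `assistant` role becomes `model`. `to_genai_contents` is a hypothetical helper, and tool messages are deliberately left out.

```python
from typing import Any, Dict, List, Optional, Tuple


def to_genai_contents(
    messages: List[Dict[str, Any]],
) -> Tuple[Optional[str], List[Dict[str, Any]]]:
    """Split OpenAI-style text messages into (system_instruction, contents)."""
    system_parts: List[str] = []
    contents: List[Dict[str, Any]] = []
    for message in messages:
        role = message["role"]
        if role == "system":
            # System text travels separately from the conversation turns.
            system_parts.append(message["content"])
        elif role in ("user", "assistant"):
            genai_role = "model" if role == "assistant" else "user"
            contents.append({"role": genai_role, "parts": [{"text": message["content"]}]})
        else:
            raise ValueError(f"role not handled in this sketch: {role}")
    return ("\n".join(system_parts) or None), contents


system_instruction, contents = to_genai_contents([
    {"role": "system", "content": "You are a pirate."},
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Arr!"},
])
```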

seed `property`

seed: Optional[int]

Seed for deterministic generation, or None if unseeded.

__init__

__init__(
    client: Any,
    model_id: str,
    default_generation_params: Optional[
        Dict[str, Any]
    ] = None,
    seed: Optional[int] = None,
    cost_calculator: Optional[CostCalculator] = None,
)

Initialize Google GenAI model adapter.

PARAMETER	DESCRIPTION
`client`	A google.genai.Client instance. TYPE: `Any`
`model_id`	The model identifier (e.g., "gemini-2.0-flash"). TYPE: `str`
`default_generation_params`	Default parameters for all calls. Common parameters: temperature, max_output_tokens, top_p. TYPE: `Optional[Dict[str, Any]]` DEFAULT: `None`
`seed`	Seed for deterministic generation. Google GenAI supports this. TYPE: `Optional[int]` DEFAULT: `None`
`cost_calculator`	Optional cost calculator for computing cost from token counts when the provider doesn't report cost directly. TYPE: `Optional[CostCalculator]` DEFAULT: `None`

chat

chat(
    messages: Union[List[Dict[str, Any]], MessageHistory],
    generation_params: Optional[Dict[str, Any]] = None,
    tools: Optional[List[Dict[str, Any]]] = None,
    tool_choice: Optional[
        Union[str, Dict[str, Any]]
    ] = None,
    **kwargs: Any,
) -> ChatResponse

Send messages to the model and get a response.

This is the primary method for interacting with the model. Pass a conversation history and receive the model's response.

PARAMETER	DESCRIPTION
`messages`	The conversation history. Either a list of message dicts in OpenAI format, or a MessageHistory object. Each message has 'role' ('system', 'user', 'assistant', 'tool') and 'content' keys. TYPE: `Union[List[Dict[str, Any]], MessageHistory]`
`generation_params`	Model parameters like temperature, max_tokens, top_p, etc. Provider-specific parameters are also accepted. TYPE: `Optional[Dict[str, Any]]` DEFAULT: `None`
`tools`	Tool definitions the model can use. Each tool is a dict with 'type' (usually 'function') and 'function' containing 'name', 'description', and 'parameters' (JSON Schema). TYPE: `Optional[List[Dict[str, Any]]]` DEFAULT: `None`
`tool_choice`	How the model should use tools: - "auto": Model decides whether to use tools (default) - "none": Model won't use tools - "required": Model must use a tool - {"type": "function", "function": {"name": "..."}}: Use specific tool TYPE: `Optional[Union[str, Dict[str, Any]]]` DEFAULT: `None`
`**kwargs`	Additional provider-specific arguments. TYPE: `Any` DEFAULT: `{}`

RETURNS	DESCRIPTION
`ChatResponse`	ChatResponse containing the model's response (text and/or tool calls).

RAISES	DESCRIPTION
`Exception`	Provider-specific errors are logged and re-raised.

Example

# Simple conversation
response = model.chat([
    {"role": "user", "content": "Hello!"}
])
print(response.content)

# With system prompt
response = model.chat([
    {"role": "system", "content": "You are a pirate."},
    {"role": "user", "content": "Hello!"}
])

# With tools
response = model.chat(
    messages=[{"role": "user", "content": "What's 2+2?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "calculator",
            "description": "Evaluate math expressions",
            "parameters": {
                "type": "object",
                "properties": {"expression": {"type": "string"}},
                "required": ["expression"]
            }
        }
    }]
)

gather_config

gather_config() -> Dict[str, Any]

Gather configuration from this Google GenAI model adapter.

RETURNS	DESCRIPTION
`Dict[str, Any]`	Dictionary containing model configuration.

gather_traces

gather_traces() -> Dict[str, Any]

Gather execution traces from this model adapter.

Called automatically by Benchmark to collect execution data for evaluation. Returns comprehensive statistics about all calls made to this adapter.

Output fields:

type - Component class name
gathered_at - ISO timestamp
model_id - Model identifier
total_calls - Number of chat/generate calls
successful_calls - Number of successful calls
failed_calls - Number of failed calls
total_duration_seconds - Total time spent in calls
average_duration_seconds - Average time per call
logs - List of individual call records

RETURNS	DESCRIPTION
`Dict[str, Any]`	Dictionary containing model execution traces.

gather_usage

gather_usage() -> Usage

Gather accumulated token usage from all chat calls.

RETURNS	DESCRIPTION
`Usage`	Summed TokenUsage across all calls, or empty TokenUsage if no calls were made.

generate

generate(
    prompt: str,
    generation_params: Optional[Dict[str, Any]] = None,
    **kwargs: Any,
) -> str

Generate text from a simple prompt.

This is a convenience method that wraps the prompt in a user message and calls chat(). Use this for simple text-in/text-out scenarios.

For conversations or tool use, use chat() directly.

PARAMETER	DESCRIPTION
`prompt`	The input prompt. TYPE: `str`
`generation_params`	Generation parameters (temperature, max_tokens, etc.). TYPE: `Optional[Dict[str, Any]]` DEFAULT: `None`
`**kwargs`	Additional provider-specific arguments. TYPE: `Any` DEFAULT: `{}`

RETURNS	DESCRIPTION
`str`	The model's text response.

Example

response = model.generate("What is the capital of France?")
print(response)  # "Paris"

LiteLLMModelAdapter

Bases: ModelAdapter

Adapter for LiteLLM unified interface.

LiteLLM provides a consistent API for calling multiple LLM providers (OpenAI, Anthropic, Cohere, Azure, AWS Bedrock, Google, etc.) using OpenAI-compatible syntax.

For supported providers see https://docs.litellm.ai/docs/providers.

API keys are read from environment variables by default

OPENAI_API_KEY for OpenAI
ANTHROPIC_API_KEY for Anthropic
etc.

Or pass api_key directly to the constructor.

seed `property`

seed: Optional[int]

Seed for deterministic generation, or None if unseeded.

__init__

__init__(
    model_id: str,
    default_generation_params: Optional[
        Dict[str, Any]
    ] = None,
    api_key: Optional[str] = None,
    api_base: Optional[str] = None,
    seed: Optional[int] = None,
    cost_calculator: Optional[CostCalculator] = None,
)

Initialize LiteLLM model adapter.

PARAMETER	DESCRIPTION
`model_id`	The model identifier in LiteLLM format. Examples: - "gpt-4" (OpenAI) - "claude-3-opus-20240229" (Anthropic) - "azure/gpt-4" (Azure OpenAI) - "bedrock/anthropic.claude-v2" (AWS Bedrock) See https://docs.litellm.ai/docs/providers for full list. TYPE: `str`
`default_generation_params`	Default parameters for all calls. Common parameters: temperature, max_tokens, top_p. TYPE: `Optional[Dict[str, Any]]` DEFAULT: `None`
`api_key`	API key for the provider. If not provided, LiteLLM reads from environment variables. TYPE: `Optional[str]` DEFAULT: `None`
`api_base`	Custom API base URL for self-hosted or Azure endpoints. TYPE: `Optional[str]` DEFAULT: `None`
`seed`	Seed for deterministic generation. LiteLLM passes this to the underlying provider. Note: Not all providers support seeding. TYPE: `Optional[int]` DEFAULT: `None`
`cost_calculator`	Optional cost calculator for computing cost from token counts. Note: LiteLLM already reports cost via `response._hidden_params.response_cost` for most models, so a calculator is only needed as a fallback or override. TYPE: `Optional[CostCalculator]` DEFAULT: `None`

chat

chat(
    messages: Union[List[Dict[str, Any]], MessageHistory],
    generation_params: Optional[Dict[str, Any]] = None,
    tools: Optional[List[Dict[str, Any]]] = None,
    tool_choice: Optional[
        Union[str, Dict[str, Any]]
    ] = None,
    **kwargs: Any,
) -> ChatResponse

Send messages to the model and get a response.

This is the primary method for interacting with the model. Pass a conversation history and receive the model's response.

PARAMETER	DESCRIPTION
`messages`	The conversation history. Either a list of message dicts in OpenAI format, or a MessageHistory object. Each message has 'role' ('system', 'user', 'assistant', 'tool') and 'content' keys. TYPE: `Union[List[Dict[str, Any]], MessageHistory]`
`generation_params`	Model parameters like temperature, max_tokens, top_p, etc. Provider-specific parameters are also accepted. TYPE: `Optional[Dict[str, Any]]` DEFAULT: `None`
`tools`	Tool definitions the model can use. Each tool is a dict with 'type' (usually 'function') and 'function' containing 'name', 'description', and 'parameters' (JSON Schema). TYPE: `Optional[List[Dict[str, Any]]]` DEFAULT: `None`
`tool_choice`	How the model should use tools: - "auto": Model decides whether to use tools (default) - "none": Model won't use tools - "required": Model must use a tool - {"type": "function", "function": {"name": "..."}}: Use specific tool TYPE: `Optional[Union[str, Dict[str, Any]]]` DEFAULT: `None`
`**kwargs`	Additional provider-specific arguments. TYPE: `Any` DEFAULT: `{}`

RETURNS	DESCRIPTION
`ChatResponse`	ChatResponse containing the model's response (text and/or tool calls).

RAISES	DESCRIPTION
`Exception`	Provider-specific errors are logged and re-raised.

Example

# Simple conversation
response = model.chat([
    {"role": "user", "content": "Hello!"}
])
print(response.content)

# With system prompt
response = model.chat([
    {"role": "system", "content": "You are a pirate."},
    {"role": "user", "content": "Hello!"}
])

# With tools
response = model.chat(
    messages=[{"role": "user", "content": "What's 2+2?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "calculator",
            "description": "Evaluate math expressions",
            "parameters": {
                "type": "object",
                "properties": {"expression": {"type": "string"}},
                "required": ["expression"]
            }
        }
    }]
)

gather_config

gather_config() -> Dict[str, Any]

Gather configuration from this LiteLLM model adapter.

RETURNS	DESCRIPTION
`Dict[str, Any]`	Dictionary containing model configuration and LiteLLM settings.

gather_traces

gather_traces() -> Dict[str, Any]

Gather execution traces from this model adapter.

Called automatically by Benchmark to collect execution data for evaluation. Returns comprehensive statistics about all calls made to this adapter.

Output fields:

type - Component class name
gathered_at - ISO timestamp
model_id - Model identifier
total_calls - Number of chat/generate calls
successful_calls - Number of successful calls
failed_calls - Number of failed calls
total_duration_seconds - Total time spent in calls
average_duration_seconds - Average time per call
logs - List of individual call records

RETURNS	DESCRIPTION
`Dict[str, Any]`	Dictionary containing model execution traces.

gather_usage

gather_usage() -> Usage

Gather accumulated token usage from all chat calls.

RETURNS	DESCRIPTION
`Usage`	Summed TokenUsage across all calls, or empty TokenUsage if no calls were made.

generate

generate(
    prompt: str,
    generation_params: Optional[Dict[str, Any]] = None,
    **kwargs: Any,
) -> str

Generate text from a simple prompt.

This is a convenience method that wraps the prompt in a user message and calls chat(). Use this for simple text-in/text-out scenarios.

For conversations or tool use, use chat() directly.

PARAMETER	DESCRIPTION
`prompt`	The input prompt. TYPE: `str`
`generation_params`	Generation parameters (temperature, max_tokens, etc.). TYPE: `Optional[Dict[str, Any]]` DEFAULT: `None`
`**kwargs`	Additional provider-specific arguments. TYPE: `Any` DEFAULT: `{}`

RETURNS	DESCRIPTION
`str`	The model's text response.

Example

response = model.generate("What is the capital of France?")
print(response)  # "Paris"

AnthropicModelAdapter

Bases: ModelAdapter

Adapter for Anthropic Claude models.

Works with Claude models through the official Anthropic Python SDK. Pass any model ID supported by the Anthropic API.

The adapter accepts OpenAI-style messages and converts them to Anthropic's format internally. Key differences handled automatically:

System messages are passed separately (not in the messages array)
Tool definitions are converted to Anthropic format
Tool responses are converted to tool_result content blocks
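The three conversions listed above can be sketched roughly as follows. This is not the adapter's actual implementation. The function names `to_anthropic_messages` and `to_anthropic_tool` are hypothetical, and the sketch deliberately skips assistant messages that carry tool calls, which would need to become tool_use content blocks. The target shapes (a separate system string, `input_schema` on tools, `tool_result` blocks inside a user message) follow Anthropic's Messages API.

```python
from typing import Any, Dict, List, Optional, Tuple


def to_anthropic_messages(
    messages: List[Dict[str, Any]],
) -> Tuple[Optional[str], List[Dict[str, Any]]]:
    """Split out system text and convert tool responses to tool_result blocks."""
    system_parts: List[str] = []
    converted: List[Dict[str, Any]] = []
    for msg in messages:
        role = msg["role"]
        if role == "system":
            # System text is passed separately, not in the messages array.
            system_parts.append(msg["content"])
        elif role == "tool":
            # Tool responses become tool_result blocks in a user message.
            converted.append({
                "role": "user",
                "content": [{
                    "type": "tool_result",
                    "tool_use_id": msg["tool_call_id"],
                    "content": msg["content"],
                }],
            })
        else:
            converted.append({"role": role, "content": msg["content"]})
    system = "\n\n".join(system_parts) if system_parts else None
    return system, converted


def to_anthropic_tool(tool: Dict[str, Any]) -> Dict[str, Any]:
    """Convert an OpenAI-style function tool into Anthropic's tool shape."""
    fn = tool["function"]
    return {
        "name": fn["name"],
        "description": fn.get("description", ""),
        "input_schema": fn.get("parameters", {"type": "object", "properties": {}}),
    }


system, converted = to_anthropic_messages([
    {"role": "system", "content": "You are a pirate."},
    {"role": "user", "content": "Hello!"},
    {"role": "tool", "tool_call_id": "call_1", "content": "4"},
])
anthropic_tool = to_anthropic_tool({
    "type": "function",
    "function": {
        "name": "calculator",
        "description": "Evaluate math expressions",
        "parameters": {"type": "object", "properties": {"expression": {"type": "string"}}},
    },
})
print(system, converted, anthropic_tool)
```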

seed `property`

seed: Optional[int]

Seed for deterministic generation, or None if unseeded.

__init__

__init__(
    client: Any,
    model_id: str,
    default_generation_params: Optional[
        Dict[str, Any]
    ] = None,
    max_tokens: int = 4096,
    seed: Optional[int] = None,
    cost_calculator: Optional[CostCalculator] = None,
)

Initialize Anthropic model adapter.

PARAMETER	DESCRIPTION
`client`	An anthropic.Anthropic client instance. TYPE: `Any`
`model_id`	The model identifier (e.g., "claude-sonnet-4-5-20250929"). TYPE: `str`
`default_generation_params`	Default parameters for all calls. Common parameters: temperature, top_p, top_k. TYPE: `Optional[Dict[str, Any]]` DEFAULT: `None`
`max_tokens`	Maximum tokens to generate. Anthropic requires this parameter. TYPE: `int` DEFAULT: `4096`
`seed`	Seed for deterministic generation. Anthropic does not support seeding, so passing any value other than None raises SeedingError. TYPE: `Optional[int]` DEFAULT: `None`
`cost_calculator`	Optional cost calculator for computing cost from token counts when the provider doesn't report cost directly. TYPE: `Optional[CostCalculator]` DEFAULT: `None`

RAISES	DESCRIPTION
`SeedingError`	If seed is provided (Anthropic doesn't support seeding).
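Code that constructs adapters for several providers may want to handle this error explicitly. The sketch below uses stand-in classes so that it runs on its own: `SeedingError` here is a local placeholder for the class in maseval.core.seeding, and `NoSeedAdapterSketch` and `build_adapter` are hypothetical. Falling back to an unseeded adapter is only one possible policy, and it means runs are not reproducible.

```python
from typing import Optional


class SeedingError(Exception):
    """Local placeholder for maseval.core.seeding.SeedingError."""


class NoSeedAdapterSketch:
    """Toy adapter for a provider without seeding support."""

    def __init__(self, model_id: str, seed: Optional[int] = None) -> None:
        if seed is not None:
            # Mirrors the documented behaviour: a seed is rejected up front.
            raise SeedingError("this provider does not support seeding")
        self.model_id = model_id
        self.seed = seed


def build_adapter(model_id: str, seed: Optional[int] = None) -> NoSeedAdapterSketch:
    """Try to build a seeded adapter; fall back to unseeded on SeedingError."""
    try:
        return NoSeedAdapterSketch(model_id, seed=seed)
    except SeedingError:
        # Hypothetical policy: continue without a seed (non-deterministic).
        return NoSeedAdapterSketch(model_id, seed=None)


fallback_adapter = build_adapter("example-model", seed=42)
print(fallback_adapter.seed)
```

A stricter caller could let the exception propagate instead, so that a benchmark configured for deterministic runs fails early.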

chat

chat(
    messages: Union[List[Dict[str, Any]], MessageHistory],
    generation_params: Optional[Dict[str, Any]] = None,
    tools: Optional[List[Dict[str, Any]]] = None,
    tool_choice: Optional[
        Union[str, Dict[str, Any]]
    ] = None,
    **kwargs: Any,
) -> ChatResponse

Send messages to the model and get a response.

This is the primary method for interacting with the model. Pass a conversation history and receive the model's response.

PARAMETER	DESCRIPTION
`messages`	The conversation history. Either a list of message dicts in OpenAI format, or a MessageHistory object. Each message has 'role' ('system', 'user', 'assistant', 'tool') and 'content' keys. TYPE: `Union[List[Dict[str, Any]], MessageHistory]`
`generation_params`	Model parameters like temperature, max_tokens, top_p, etc. Provider-specific parameters are also accepted. TYPE: `Optional[Dict[str, Any]]` DEFAULT: `None`
`tools`	Tool definitions the model can use. Each tool is a dict with 'type' (usually 'function') and 'function' containing 'name', 'description', and 'parameters' (JSON Schema). TYPE: `Optional[List[Dict[str, Any]]]` DEFAULT: `None`
`tool_choice`	How the model should use tools: - "auto": Model decides whether to use tools (default) - "none": Model won't use tools - "required": Model must use a tool - {"type": "function", "function": {"name": "..."}}: Use specific tool TYPE: `Optional[Union[str, Dict[str, Any]]]` DEFAULT: `None`
`**kwargs`	Additional provider-specific arguments. TYPE: `Any` DEFAULT: `{}`

RETURNS	DESCRIPTION
`ChatResponse`	ChatResponse containing the model's response (text and/or tool calls).

RAISES	DESCRIPTION
`Exception`	Provider-specific errors are logged and re-raised.

Example

# Simple conversation
response = model.chat([
    {"role": "user", "content": "Hello!"}
])
print(response.content)

# With system prompt
response = model.chat([
    {"role": "system", "content": "You are a pirate."},
    {"role": "user", "content": "Hello!"}
])

# With tools
response = model.chat(
    messages=[{"role": "user", "content": "What's 2+2?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "calculator",
            "description": "Evaluate math expressions",
            "parameters": {
                "type": "object",
                "properties": {"expression": {"type": "string"}},
                "required": ["expression"]
            }
        }
    }]
)
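After a tool call, the conversation usually continues with the tool's result appended in OpenAI message format. The helper below, `append_tool_round`, is a hypothetical sketch of that bookkeeping. How tool calls are exposed on ChatResponse is not shown in this section, so the call ID, tool name, and arguments are passed in as plain values rather than read from a response object.

```python
import json
from typing import Any, Dict, List


def append_tool_round(
    messages: List[Dict[str, Any]],
    tool_call_id: str,
    name: str,
    arguments: Dict[str, Any],
    result: str,
) -> List[Dict[str, Any]]:
    """Return a new history with the assistant tool call and its result appended."""
    assistant_msg = {
        "role": "assistant",
        "content": None,
        "tool_calls": [{
            "id": tool_call_id,
            "type": "function",
            # OpenAI format carries arguments as a JSON-encoded string.
            "function": {"name": name, "arguments": json.dumps(arguments)},
        }],
    }
    tool_msg = {"role": "tool", "tool_call_id": tool_call_id, "content": result}
    return messages + [assistant_msg, tool_msg]


history = [{"role": "user", "content": "What's 2+2?"}]
history = append_tool_round(
    history,
    tool_call_id="call_1",  # made-up ID for illustration
    name="calculator",
    arguments={"expression": "2+2"},
    result="4",
)
print(history)
```

The resulting list could then be passed back to chat() as the messages argument for the next turn.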

gather_config

gather_config() -> Dict[str, Any]

Gather configuration from this Anthropic model adapter.

RETURNS	DESCRIPTION
`Dict[str, Any]`	Dictionary containing model configuration.

gather_traces

gather_traces() -> Dict[str, Any]

Gather execution traces from this model adapter.

Called automatically by Benchmark to collect execution data for evaluation. Returns aggregate call statistics together with a per-call log for every call made through this adapter.

Output fields:

type - Component class name
gathered_at - ISO timestamp
model_id - Model identifier
total_calls - Number of chat/generate calls
successful_calls - Number of successful calls
failed_calls - Number of failed calls
total_duration_seconds - Total time spent in calls
average_duration_seconds - Average time per call
logs - List of individual call records

RETURNS	DESCRIPTION
`Dict[str, Any]`	Dictionary containing model execution traces.

gather_usage

gather_usage() -> Usage

Gather accumulated token usage from all chat calls.

RETURNS	DESCRIPTION
`Usage`	Summed TokenUsage across all calls, or empty TokenUsage if no calls were made.

generate

generate(
    prompt: str,
    generation_params: Optional[Dict[str, Any]] = None,
    **kwargs: Any,
) -> str

Generate text from a simple prompt.

This is a convenience method that wraps the prompt in a user message and calls chat(). Use this for simple text-in/text-out scenarios.

For conversations or tool use, use chat() directly.

PARAMETER	DESCRIPTION
`prompt`	The input prompt. TYPE: `str`
`generation_params`	Generation parameters (temperature, max_tokens, etc.). TYPE: `Optional[Dict[str, Any]]` DEFAULT: `None`
`**kwargs`	Additional provider-specific arguments. TYPE: `Any` DEFAULT: `{}`

RETURNS	DESCRIPTION
`str`	The model's text response.

Example

response = model.generate("What is the capital of France?")
print(response)  # "Paris"
