LiteLLM Inference Adapter
This page documents the LiteLLM model adapter for MASEval.
LiteLLMModelAdapter
Bases: ModelAdapter
Adapter for LiteLLM unified interface.
LiteLLM provides a consistent API for calling multiple LLM providers (OpenAI, Anthropic, Cohere, Azure, AWS Bedrock, Google, etc.) using OpenAI-compatible syntax.
For supported providers see https://docs.litellm.ai/docs/providers.
API keys are read from environment variables by default:
- OPENAI_API_KEY for OpenAI
- ANTHROPIC_API_KEY for Anthropic
- etc.
Or pass api_key directly to the constructor.
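The precedence described above (an explicit api_key first, otherwise the provider's environment variable) can be sketched as follows. This is only an illustration: `resolve_api_key` and the `ENV_VARS` mapping are hypothetical names, and LiteLLM performs its own lookup internally.

```python
import os
from typing import Optional

# Hypothetical mapping for illustration; LiteLLM handles this lookup itself.
ENV_VARS = {"openai": "OPENAI_API_KEY", "anthropic": "ANTHROPIC_API_KEY"}


def resolve_api_key(provider: str, explicit_key: Optional[str] = None) -> Optional[str]:
    """Return the explicit key if given, else the provider's environment variable."""
    if explicit_key is not None:
        return explicit_key
    env_name = ENV_VARS.get(provider)
    # Unknown providers and unset variables both yield None.
    return os.environ.get(env_name) if env_name else None
```

Passing api_key to the constructor corresponds to the `explicit_key` branch; leaving it unset falls through to the environment.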
seed
property
seed: Optional[int]
Seed for deterministic generation, or None if unseeded.
__init__
__init__(
    model_id: str,
    default_generation_params: Optional[Dict[str, Any]] = None,
    api_key: Optional[str] = None,
    api_base: Optional[str] = None,
    seed: Optional[int] = None,
    cost_calculator: Optional[CostCalculator] = None,
)
Initialize LiteLLM model adapter.
| PARAMETER | TYPE | DESCRIPTION |
|---|---|---|
| model_id | str | The model identifier in LiteLLM format, for example "gpt-4" (OpenAI), "claude-3-opus-20240229" (Anthropic), "azure/gpt-4" (Azure OpenAI), or "bedrock/anthropic.claude-v2" (AWS Bedrock). See https://docs.litellm.ai/docs/providers for the full list. |
| default_generation_params | Optional[Dict[str, Any]] | Default parameters for all calls. Common parameters: temperature, max_tokens, top_p. |
| api_key | Optional[str] | API key for the provider. If not provided, LiteLLM reads it from environment variables. |
| api_base | Optional[str] | Custom API base URL for self-hosted or Azure endpoints. |
| seed | Optional[int] | Seed for deterministic generation. LiteLLM passes this to the underlying provider. Note: Not all providers support seeding. |
| cost_calculator | Optional[CostCalculator] | Optional cost calculator for computing cost from token counts. Note: LiteLLM already reports cost via ... |
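One plausible way for default_generation_params, per-call generation_params, and seed to combine is sketched below. The merge order is an assumption (per-call values override defaults, and the configured seed is only applied when the caller has not set one); `build_call_params` is a hypothetical helper, not part of the adapter.

```python
from typing import Any, Dict, Optional


def build_call_params(
    default_generation_params: Optional[Dict[str, Any]] = None,
    generation_params: Optional[Dict[str, Any]] = None,
    seed: Optional[int] = None,
) -> Dict[str, Any]:
    """Merge defaults with per-call overrides (assumed precedence: per-call wins)."""
    params: Dict[str, Any] = dict(default_generation_params or {})
    params.update(generation_params or {})
    # Assumption: the adapter-level seed never overrides a seed set by the caller.
    if seed is not None:
        params.setdefault("seed", seed)
    return params


params = build_call_params(
    {"temperature": 0.0, "max_tokens": 256},
    {"max_tokens": 64},
    seed=42,
)
```

Here the per-call max_tokens replaces the default, while temperature is inherited unchanged.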
chat
chat(
    messages: Union[List[Dict[str, Any]], MessageHistory],
    generation_params: Optional[Dict[str, Any]] = None,
    tools: Optional[List[Dict[str, Any]]] = None,
    tool_choice: Optional[Union[str, Dict[str, Any]]] = None,
    **kwargs: Any,
) -> ChatResponse
Send messages to the model and get a response.
This is the primary method for interacting with the model. Pass a conversation history and receive the model's response.
| PARAMETER | TYPE | DESCRIPTION |
|---|---|---|
| messages | Union[List[Dict[str, Any]], MessageHistory] | The conversation history. Either a list of message dicts in OpenAI format, or a MessageHistory object. Each message has 'role' ('system', 'user', 'assistant', 'tool') and 'content' keys. |
| generation_params | Optional[Dict[str, Any]] | Model parameters such as temperature, max_tokens, and top_p. Provider-specific parameters are also accepted. |
| tools | Optional[List[Dict[str, Any]]] | Tool definitions the model can use. Each tool is a dict with 'type' (usually 'function') and 'function' containing 'name', 'description', and 'parameters' (JSON Schema). |
| tool_choice | Optional[Union[str, Dict[str, Any]]] | How the model should use tools: "auto" (the model decides whether to use tools; default), "none" (the model won't use tools), "required" (the model must use a tool), or {"type": "function", "function": {"name": "..."}} (use a specific tool). |
| **kwargs | Any | Additional provider-specific arguments. |
| RETURNS | DESCRIPTION |
|---|---|
| ChatResponse | ChatResponse containing the model's response (text and/or tool calls). |

| RAISES | DESCRIPTION |
|---|---|
| Exception | Provider-specific errors are logged and re-raised. |
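Because provider errors are re-raised rather than swallowed, callers decide how to handle transient failures. A minimal retry sketch with exponential backoff follows; `call_with_retries` is a hypothetical helper and is not provided by MASEval or LiteLLM.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(call: Callable[[], T], attempts: int = 3, base_delay: float = 0.5) -> T:
    """Run `call`, retrying with exponential backoff; re-raise the last error."""
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the provider error unchanged
            time.sleep(base_delay * 2 ** (attempt - 1))
    # Only reached when the loop body never ran.
    raise ValueError("attempts must be at least 1")
```

A chat call could then be wrapped as `call_with_retries(lambda: model.chat(messages))`. In practice it is usually better to catch only the exception types known to be transient.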
Example
# `model` is an already constructed LiteLLMModelAdapter instance.

# Simple conversation
response = model.chat([
    {"role": "user", "content": "Hello!"}
])
print(response.content)

# With system prompt
response = model.chat([
    {"role": "system", "content": "You are a pirate."},
    {"role": "user", "content": "Hello!"}
])

# With tools
response = model.chat(
    messages=[{"role": "user", "content": "What's 2+2?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "calculator",
            "description": "Evaluate math expressions",
            "parameters": {
                "type": "object",
                "properties": {"expression": {"type": "string"}},
                "required": ["expression"]
            }
        }
    }]
)
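Tool definitions are verbose to write by hand, so a small builder can help. The sketch below produces dicts in the shape described for the tools and tool_choice parameters; `function_tool` and `force_tool` are hypothetical helpers, not part of the adapter.

```python
from typing import Any, Dict, List, Optional


def function_tool(
    name: str,
    description: str,
    properties: Dict[str, Any],
    required: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Build one tool definition with a JSON Schema object as its parameters."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": list(required or []),
            },
        },
    }


def force_tool(name: str) -> Dict[str, Any]:
    """Build a tool_choice value that requires the named tool."""
    return {"type": "function", "function": {"name": name}}


calculator = function_tool(
    "calculator",
    "Evaluate math expressions",
    {"expression": {"type": "string"}},
    required=["expression"],
)
```

The results can be passed as `tools=[calculator]` and `tool_choice=force_tool("calculator")`.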
gather_config
gather_config() -> Dict[str, Any]
Gather configuration from this LiteLLM model adapter.
| RETURNS | DESCRIPTION |
|---|---|
| Dict[str, Any] | Dictionary containing model configuration and LiteLLM settings. |
gather_traces
gather_traces() -> Dict[str, Any]
Gather execution traces from this model adapter.
Called automatically by Benchmark to collect execution data for evaluation. Returns comprehensive statistics about all calls made to this adapter.
Output fields:
- type: Component class name
- gathered_at: ISO timestamp
- model_id: Model identifier
- total_calls: Number of chat/generate calls
- successful_calls: Number of successful calls
- failed_calls: Number of failed calls
- total_duration_seconds: Total time spent in calls
- average_duration_seconds: Average time per call
- logs: List of individual call records
| RETURNS | DESCRIPTION |
|---|---|
| Dict[str, Any] | Dictionary containing model execution traces. |
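To make the relationship between the output fields concrete, the sketch below derives the summary values from a list of call records. The record keys (`status`, `duration_seconds`) and the `summarize_calls` helper are assumptions for illustration; the adapter's actual log format may differ.

```python
from datetime import datetime, timezone
from typing import Any, Dict, List


def summarize_calls(model_id: str, logs: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Aggregate per-call records into the summary fields listed above."""
    total = len(logs)
    # Assumed record shape: {"status": "success" | "error", "duration_seconds": float}
    successful = sum(1 for record in logs if record["status"] == "success")
    duration = sum(record["duration_seconds"] for record in logs)
    return {
        "type": "LiteLLMModelAdapter",
        "gathered_at": datetime.now(timezone.utc).isoformat(),
        "model_id": model_id,
        "total_calls": total,
        "successful_calls": successful,
        "failed_calls": total - successful,
        "total_duration_seconds": duration,
        # Avoid dividing by zero when no calls were made.
        "average_duration_seconds": duration / total if total else 0.0,
        "logs": logs,
    }
```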
gather_usage
gather_usage() -> Usage
Gather accumulated token usage from all chat calls.
| RETURNS | DESCRIPTION |
|---|---|
| Usage | Summed TokenUsage across all calls, or empty TokenUsage if no calls were made. |
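The "summed, or empty if no calls were made" behavior can be illustrated with a stand-in type. `TokenCounts` and its fields are hypothetical; MASEval's Usage and TokenUsage classes may carry different fields.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class TokenCounts:
    """Hypothetical stand-in for a per-call usage record."""

    input_tokens: int = 0
    output_tokens: int = 0

    def __add__(self, other: "TokenCounts") -> "TokenCounts":
        return TokenCounts(
            self.input_tokens + other.input_tokens,
            self.output_tokens + other.output_tokens,
        )


def sum_usage(calls: Iterable[TokenCounts]) -> TokenCounts:
    """Sum usage over all calls; an empty input yields an all-zero record."""
    total = TokenCounts()
    for usage in calls:
        total = total + usage
    return total
```

Starting from an all-zero record means the no-calls case needs no special handling.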
generate
generate(
    prompt: str,
    generation_params: Optional[Dict[str, Any]] = None,
    **kwargs: Any,
) -> str
Generate text from a simple prompt.
This is a convenience method that wraps the prompt in a user message
and calls chat(). Use this for simple text-in/text-out scenarios.
For conversations or tool use, use chat() directly.
| PARAMETER | TYPE | DESCRIPTION |
|---|---|---|
| prompt | str | The input prompt. |
| generation_params | Optional[Dict[str, Any]] | Generation parameters (temperature, max_tokens, etc.). |
| **kwargs | Any | Additional provider-specific arguments. |
| RETURNS | DESCRIPTION |
|---|---|
| str | The model's text response. |
Example
response = model.generate("What is the capital of France?")
print(response)  # e.g. "Paris" (exact wording depends on the model)
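The wrapping that generate() performs (one user message, then a call to chat()) can be sketched with a stand-in class. `EchoModel` and `Reply` are hypothetical and only mimic the shape of the interface; they are not the MASEval implementation.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class Reply:
    """Hypothetical stand-in for ChatResponse; only `content` is modeled."""

    content: str


class EchoModel:
    """Toy model that echoes the last message instead of calling a provider."""

    def __init__(self) -> None:
        self.last_messages: List[Dict[str, Any]] = []

    def chat(
        self,
        messages: List[Dict[str, Any]],
        generation_params: Optional[Dict[str, Any]] = None,
        **kwargs: Any,
    ) -> Reply:
        self.last_messages = messages
        return Reply(content=f"echo: {messages[-1]['content']}")

    def generate(
        self,
        prompt: str,
        generation_params: Optional[Dict[str, Any]] = None,
        **kwargs: Any,
    ) -> str:
        # Wrap the prompt in a single user message and delegate to chat().
        messages = [{"role": "user", "content": prompt}]
        return self.chat(messages, generation_params=generation_params, **kwargs).content
```

Since generate() sends a single user message, system prompts, multi-turn history, and tools all require chat().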