HuggingFace Inference Adapters
This page documents the HuggingFace model adapters for MASEval.
Pipeline Model Adapter (Text Generation)
HuggingFacePipelineModelAdapter
Bases: ModelAdapter
Adapter for HuggingFace transformers pipelines and callables.
Wraps a HuggingFace pipeline() object (or any text-generation callable)
for use with the ModelAdapter interface (chat(), generate()).
For log-likelihood scoring, see HuggingFaceModelScorer.
Works with:
- transformers.pipeline() objects
- Any callable that accepts a prompt and returns text
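The Callable[[str], str] contract is small enough to satisfy with a plain function, which can be useful for exercising an adapter without downloading weights. A minimal sketch; echo_model is a hypothetical stub and is not part of MASEval or transformers:

```python
from typing import Callable


def echo_model(prompt: str) -> str:
    """Hypothetical stub model: echoes the prompt back with a prefix."""
    return f"echo: {prompt}"


# The stub matches the Callable[[str], str] type the adapter accepts.
model: Callable[[str], str] = echo_model
print(model("Hello!"))  # prints "echo: Hello!"
```

A plain function has no name_or_path attribute, so passing model_id explicitly (for example HuggingFacePipelineModelAdapter(model=echo_model, model_id="echo-stub")) is presumably the safer choice. Whether chat() can apply a chat template to such a stub depends on how the adapter handles callables that lack a tokenizer.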
For chat functionality, the adapter uses the tokenizer's chat template if available. This provides proper formatting for instruction-tuned models.
Tool calling support
Tool calling is only supported if the model's chat template explicitly
supports it. If you pass tools and the model doesn't support them,
a ToolCallingNotSupportedError is raised. For reliable tool calling,
consider using LiteLLMModelAdapter instead.
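Before passing tools, it can help to check up front whether a model's template is likely to accept them. A minimal sketch, assuming the template is available as a Jinja string (transformers tokenizers expose it as tokenizer.chat_template). The keyword test is a rough heuristic of our own and not necessarily how MASEval decides to raise ToolCallingNotSupportedError:

```python
import re
from typing import Optional


def template_mentions_tools(chat_template: Optional[str]) -> bool:
    """Rough heuristic (an assumption, not MASEval's actual check): treat a
    Jinja chat template as tool-aware if it references a `tools` variable."""
    if not chat_template:
        # No template at all: tool calling cannot be formatted.
        return False
    return re.search(r"\btools\b", chat_template) is not None


print(template_mentions_tools("{% if tools %}{{ tools | tojson }}{% endif %}"))
```

A template can mention tools and still format them poorly, so this only filters out the obvious cases; a real request remains the definitive test.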
seed
property
seed: Optional[int]
Seed for deterministic generation, or None if unseeded.
__init__
__init__(
model: Callable[[str], str],
model_id: Optional[str] = None,
default_generation_params: Optional[
Dict[str, Any]
] = None,
seed: Optional[int] = None,
cost_calculator: Optional[CostCalculator] = None,
)
Initialize HuggingFace model adapter.
| PARAMETER | TYPE | DESCRIPTION |
|---|---|---|
| model | Callable[[str], str] | A callable that generates text: either a transformers pipeline (e.g., pipeline("text-generation", ...)) or any callable that takes a prompt string and returns text. |
| model_id | Optional[str] | Identifier for the model. If not provided, the adapter attempts to extract it from the model's name_or_path attribute. |
| default_generation_params | Optional[Dict[str, Any]] | Default parameters applied to all calls. Common parameters: max_new_tokens, temperature, top_p, do_sample. |
| seed | Optional[int] | Seed for deterministic generation. Sets the random seed before each generation call using transformers.set_seed(). |
| cost_calculator | Optional[CostCalculator] | Optional cost calculator for computing cost from token counts when the provider doesn't report cost directly. |
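Re-seeding before every call means each call starts from the same random state, so repeated calls with identical inputs should sample identical outputs regardless of call order. A minimal sketch of that pattern, using Python's random module as a stand-in for transformers.set_seed() (which seeds Python, NumPy and PyTorch together) and a hypothetical toy sampler in place of a real model:

```python
import random
from typing import List, Optional

VOCAB = ["the", "cat", "sat", "on", "a", "mat"]


def sample_tokens(n: int, seed: Optional[int]) -> List[str]:
    """Hypothetical toy sampler standing in for sampled text generation."""
    if seed is not None:
        # Re-seed before *each* call, mirroring the documented behaviour of
        # calling transformers.set_seed() prior to every generation.
        random.seed(seed)
    return [random.choice(VOCAB) for _ in range(n)]


first = sample_tokens(5, seed=42)
second = sample_tokens(5, seed=42)
assert first == second  # same seed, same sampled sequence
```

Seeding alone may not make GPU generation bit-for-bit reproducible, since some CUDA kernels are nondeterministic.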
chat
chat(
messages: Union[List[Dict[str, Any]], MessageHistory],
generation_params: Optional[Dict[str, Any]] = None,
tools: Optional[List[Dict[str, Any]]] = None,
tool_choice: Optional[
Union[str, Dict[str, Any]]
] = None,
**kwargs: Any,
) -> ChatResponse
Send messages to the model and get a response.
This is the primary method for interacting with the model. Pass a conversation history and receive the model's response.
| PARAMETER | TYPE | DESCRIPTION |
|---|---|---|
| messages | Union[List[Dict[str, Any]], MessageHistory] | The conversation history. Either a list of message dicts in OpenAI format, or a MessageHistory object. Each message has 'role' ('system', 'user', 'assistant', 'tool') and 'content' keys. |
| generation_params | Optional[Dict[str, Any]] | Model parameters such as temperature, max_tokens, top_p. Provider-specific parameters are also accepted. |
| tools | Optional[List[Dict[str, Any]]] | Tool definitions the model can use. Each tool is a dict with 'type' (usually 'function') and 'function' containing 'name', 'description', and 'parameters' (JSON Schema). |
| tool_choice | Optional[Union[str, Dict[str, Any]]] | How the model should use tools: "auto" (model decides whether to use tools; the default), "none" (model won't use tools), "required" (model must use a tool), or {"type": "function", "function": {"name": "..."}} (use a specific tool). |
| **kwargs | Any | Additional provider-specific arguments. |
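The tool_choice forms listed above can be checked before a request is sent. A minimal sketch; validate_tool_choice is a hypothetical helper, not part of MASEval, and the adapter may perform its own validation differently:

```python
from typing import Any, Dict, Optional, Union

ToolChoice = Optional[Union[str, Dict[str, Any]]]
STRING_CHOICES = {"auto", "none", "required"}


def validate_tool_choice(tool_choice: ToolChoice) -> ToolChoice:
    """Hypothetical pre-flight check against the documented tool_choice forms."""
    if tool_choice is None:
        return None  # let the adapter apply its own default
    if isinstance(tool_choice, str):
        # Check the type first: dicts are unhashable and cannot be tested
        # for membership in a set.
        if tool_choice not in STRING_CHOICES:
            raise ValueError(f"unknown tool_choice string: {tool_choice!r}")
        return tool_choice
    if isinstance(tool_choice, dict):
        function = tool_choice.get("function")
        if (
            tool_choice.get("type") == "function"
            and isinstance(function, dict)
            and isinstance(function.get("name"), str)
        ):
            return tool_choice
    raise ValueError(f"malformed tool_choice: {tool_choice!r}")


print(validate_tool_choice({"type": "function", "function": {"name": "calculator"}}))
```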
| RETURNS | DESCRIPTION |
|---|---|
| ChatResponse | ChatResponse containing the model's response (text and/or tool calls). |
| RAISES | DESCRIPTION |
|---|---|
| Exception | Provider-specific errors are logged and re-raised. |
Example
```python
# `model` is a HuggingFacePipelineModelAdapter instance.

# Simple conversation
response = model.chat([
    {"role": "user", "content": "Hello!"}
])
print(response.content)

# With system prompt
response = model.chat([
    {"role": "system", "content": "You are a pirate."},
    {"role": "user", "content": "Hello!"}
])

# With tools (raises ToolCallingNotSupportedError if the model's
# chat template does not support tool calling)
response = model.chat(
    messages=[{"role": "user", "content": "What's 2+2?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "calculator",
            "description": "Evaluate math expressions",
            "parameters": {
                "type": "object",
                "properties": {"expression": {"type": "string"}},
                "required": ["expression"]
            }
        }
    }]
)
```
gather_config
gather_config() -> Dict[str, Any]
Gather configuration from this HuggingFace model adapter.
| RETURNS | DESCRIPTION |
|---|---|
| Dict[str, Any] | Dictionary containing model configuration. |
gather_traces
gather_traces() -> Dict[str, Any]
Gather execution traces from this model adapter.
Called automatically by Benchmark to collect execution data for evaluation. Returns comprehensive statistics about all calls made to this adapter.
Output fields:
- type: Component class name
- gathered_at: ISO timestamp
- model_id: Model identifier
- total_calls: Number of chat/generate calls
- successful_calls: Number of successful calls
- failed_calls: Number of failed calls
- total_duration_seconds: Total time spent in calls
- average_duration_seconds: Average time per call
- logs: List of individual call records
| RETURNS | DESCRIPTION |
|---|---|
| Dict[str, Any] | Dictionary containing model execution traces. |
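The aggregate fields can all be derived from the per-call records. A minimal sketch, assuming each record carries hypothetical status and duration_seconds keys (the actual record layout is not documented on this page) and that the average is reported as 0.0 when no calls were made:

```python
from typing import Any, Dict, List


def summarize_calls(logs: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Hypothetical aggregation of per-call records into summary fields."""
    total = len(logs)
    successful = sum(1 for record in logs if record["status"] == "success")
    duration = sum(record["duration_seconds"] for record in logs)
    return {
        "total_calls": total,
        "successful_calls": successful,
        "failed_calls": total - successful,
        "total_duration_seconds": duration,
        # Guard against division by zero when the adapter was never called.
        "average_duration_seconds": duration / total if total else 0.0,
    }


print(summarize_calls([
    {"status": "success", "duration_seconds": 0.5},
    {"status": "error", "duration_seconds": 1.5},
]))
```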
gather_usage
gather_usage() -> Usage
Gather accumulated token usage from all chat calls.
| RETURNS | DESCRIPTION |
|---|---|
| Usage | Summed TokenUsage across all calls, or empty TokenUsage if no calls were made. |
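Summing usage from an empty starting value is what makes the "no calls" case return zeros rather than failing. A minimal sketch; this TokenUsage dataclass and its input_tokens/output_tokens fields are hypothetical stand-ins, not MASEval's Usage or TokenUsage types:

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TokenUsage:
    """Hypothetical stand-in; field names are assumptions."""
    input_tokens: int = 0
    output_tokens: int = 0

    @property
    def total_tokens(self) -> int:
        return self.input_tokens + self.output_tokens

    def __add__(self, other: "TokenUsage") -> "TokenUsage":
        return TokenUsage(
            self.input_tokens + other.input_tokens,
            self.output_tokens + other.output_tokens,
        )


def gather_usage(per_call: List[TokenUsage]) -> TokenUsage:
    # Start from an empty TokenUsage so zero calls yields all-zero counts.
    return sum(per_call, TokenUsage())


print(gather_usage([TokenUsage(10, 5), TokenUsage(3, 2)]))
```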
generate
generate(
prompt: str,
generation_params: Optional[Dict[str, Any]] = None,
**kwargs: Any,
) -> str
Generate text from a simple prompt.
This is a convenience method that wraps the prompt in a user message
and calls chat(). Use this for simple text-in/text-out scenarios.
For conversations or tool use, use chat() directly.
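The delegation described above amounts to wrapping the prompt in a one-message conversation. A minimal sketch with hypothetical StubAdapter and StubChatResponse classes (not MASEval's classes), assuming generate() returns the response's text content:

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class StubChatResponse:
    """Hypothetical stand-in for ChatResponse; only the content field is modelled."""
    content: Optional[str]


class StubAdapter:
    """Illustrative only: shows generate() delegating to chat()."""

    def chat(
        self,
        messages: List[Dict[str, Any]],
        generation_params: Optional[Dict[str, Any]] = None,
        **kwargs: Any,
    ) -> StubChatResponse:
        # A real adapter would run the model; the stub echoes the last message.
        return StubChatResponse(content=f"you said: {messages[-1]['content']}")

    def generate(
        self,
        prompt: str,
        generation_params: Optional[Dict[str, Any]] = None,
        **kwargs: Any,
    ) -> str:
        # Wrap the bare prompt in a single user message, then call chat().
        messages = [{"role": "user", "content": prompt}]
        response = self.chat(messages, generation_params, **kwargs)
        # Assumption: a response with no text content maps to an empty string.
        return response.content or ""


print(StubAdapter().generate("What is the capital of France?"))
```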
| PARAMETER | TYPE | DESCRIPTION |
|---|---|---|
| prompt | str | The input prompt. |
| generation_params | Optional[Dict[str, Any]] | Generation parameters (temperature, max_tokens, etc.). |
| **kwargs | Any | Additional provider-specific arguments. |
| RETURNS | DESCRIPTION |
|---|---|
| str | The model's text response. |
Example
```python
response = model.generate("What is the capital of France?")
print(response)  # e.g. "Paris"; exact wording depends on the model
```
Model Scorer (Log-Likelihood)
HuggingFaceModelScorer
Bases: ModelScorer
Log-likelihood scorer backed by a HuggingFace causal language model.
Loads the model lazily on first use. Supports:
- Single-token optimisation: when all continuations map to a single token, one forward pass scores every choice.
- Multi-token fallback: separate forward pass per continuation.
- A loglikelihood_choices() override that picks the optimal path automatically.
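One forward pass suffices in the single-token case because the logits at the final context position already define a distribution over the next token, so each single-token choice's log-likelihood is a lookup into one log-softmax. A pure-Python sketch with hypothetical logits standing in for model output; a real implementation would do the same on tensors (e.g. torch.log_softmax over the vocabulary dimension):

```python
import math
from typing import List


def log_softmax(logits: List[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum_exp for x in logits]


def score_single_token_choices(
    next_token_logits: List[float], choice_token_ids: List[int]
) -> List[float]:
    """Score every single-token choice from one set of next-token logits."""
    logprobs = log_softmax(next_token_logits)  # computed once, shared by all choices
    return [logprobs[token_id] for token_id in choice_token_ids]


# Hypothetical logits over a 3-token vocabulary; score tokens 2 and 0.
print(score_single_token_choices([2.0, 1.0, 0.1], [2, 0]))
```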
The tokenisation strategy matches lm-evaluation-harness: context and
continuation are encoded separately, then concatenated to handle
tokenisation-boundary effects correctly.
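The separate-encoding scheme can be sketched as follows: encode context and continuation independently, concatenate the token ids, then sum the log-probabilities of the continuation positions only, each predicted from every token before it. With a subword tokenizer, encoding the joined string could merge tokens across the boundary, which is why the pieces are encoded separately. The character-level encode() and uniform_logprob() below are hypothetical stand-ins for a real tokenizer and causal LM, so the toy cannot itself exhibit boundary merges:

```python
import math
from typing import Callable, List, Sequence

# Hypothetical causal-LM stand-in: returns log P(token | prefix).
LogProbFn = Callable[[Sequence[int], int], float]


def encode(text: str) -> List[int]:
    """Toy character-level tokenizer (an assumption for illustration)."""
    return [ord(ch) for ch in text]


def loglikelihood(context: str, continuation: str, logprob: LogProbFn) -> float:
    # Encode the two pieces separately, then concatenate the token ids.
    context_ids = encode(context)
    continuation_ids = encode(continuation)
    if not context_ids:
        raise ValueError("this sketch needs a non-empty context to condition on")
    ids = context_ids + continuation_ids
    # Sum log-probs of continuation tokens only; each one is conditioned on
    # the full context plus any earlier continuation tokens.
    return sum(
        logprob(ids[:i], ids[i]) for i in range(len(context_ids), len(ids))
    )


def uniform_logprob(prefix: Sequence[int], token: int) -> float:
    """Toy model assigning every token probability 1/4."""
    return math.log(0.25)


print(loglikelihood("Q: 2+2=", " 4", uniform_logprob))  # 2 tokens, so 2 * log(0.25)
```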
seed
property
seed: Optional[int]
Seed for deterministic scoring, or None if unseeded.
__init__
__init__(
model_id: str,
device: str = "cuda:0",
trust_remote_code: bool = True,
seed: Optional[int] = None,
)
Initialize HuggingFace model scorer.
| PARAMETER | TYPE | DESCRIPTION |
|---|---|---|
| model_id | str | HuggingFace model identifier. |
| device | str | Torch device string. Defaults to "cuda:0". |
| trust_remote_code | bool | Whether to trust remote code when loading the model. Defaults to True. |
| seed | Optional[int] | Seed for deterministic scoring. |
gather_config
gather_config() -> Dict[str, Any]
Gather configuration including device and model settings.
| RETURNS | DESCRIPTION |
|---|---|
| Dict[str, Any] | Dictionary containing scorer configuration. |
gather_traces
gather_traces() -> Dict[str, Any]
Gather execution traces from this scorer.
Output fields:
- type: Component class name
- gathered_at: ISO timestamp
- model_id: Model identifier
- total_calls: Number of scoring calls
- successful_calls: Number of successful calls
- failed_calls: Number of failed calls
- total_duration_seconds: Total time spent in calls
- logs: List of individual call records
| RETURNS | DESCRIPTION |
|---|---|
| Dict[str, Any] | Dictionary containing scorer execution traces. |
loglikelihood
loglikelihood(context: str, continuation: str) -> float
Compute the log-likelihood of continuation given context.
| PARAMETER | TYPE | DESCRIPTION |
|---|---|---|
| context | str | The conditioning text (prompt). |
| continuation | str | The text whose likelihood is scored. |
| RETURNS | DESCRIPTION |
|---|---|
| float | Log-likelihood (a negative float; higher means more likely). |
loglikelihood_batch
loglikelihood_batch(
pairs: List[Tuple[str, str]],
) -> List[float]
Compute log-likelihoods for a batch of (context, continuation) pairs.
Override _loglikelihood_batch_impl for provider-specific batching
optimisations. The default loops over _loglikelihood_impl.
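The override hook described above follows a template-method pattern: the public batch method delegates to a batch implementation whose default simply loops over the single-pair implementation. A minimal sketch; ScorerSketch and LengthScorer are hypothetical classes that mirror the documented method names but are not MASEval's ModelScorer code:

```python
import math
from abc import ABC, abstractmethod
from typing import List, Tuple


class ScorerSketch(ABC):
    """Hypothetical stand-in for the ModelScorer pattern; not MASEval's code."""

    def loglikelihood_batch(self, pairs: List[Tuple[str, str]]) -> List[float]:
        # Public entry point delegates to an overridable implementation hook.
        return self._loglikelihood_batch_impl(pairs)

    def _loglikelihood_batch_impl(self, pairs: List[Tuple[str, str]]) -> List[float]:
        # Default behaviour: score each pair independently, one at a time.
        # Subclasses can override this to batch pairs into fewer forward passes.
        return [self._loglikelihood_impl(ctx, cont) for ctx, cont in pairs]

    @abstractmethod
    def _loglikelihood_impl(self, context: str, continuation: str) -> float:
        ...


class LengthScorer(ScorerSketch):
    """Toy scorer: every continuation character costs log(0.5)."""

    def _loglikelihood_impl(self, context: str, continuation: str) -> float:
        return len(continuation) * math.log(0.5)


print(LengthScorer().loglikelihood_batch([("Q", "ab"), ("Q", "abcd")]))
```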
| PARAMETER | TYPE | DESCRIPTION |
|---|---|---|
| pairs | List[Tuple[str, str]] | List of (context, continuation) tuples. |
| RETURNS | DESCRIPTION |
|---|---|
| List[float] | List of log-likelihoods, one per pair. |
loglikelihood_choices
loglikelihood_choices(
context: str, choices: List[str], delimiter: str = " "
) -> List[float]
Score multiple-choice continuations with shared-context optimisation.
When every delimiter + choice maps to a single continuation token,
all choices are scored in one forward pass. Otherwise falls back to
per-choice scoring via _loglikelihood_impl.
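The path selection can be sketched as a check on the encoded length of each delimiter + choice. In this sketch the toy VOCAB, encode(), and the injected next_token_logprobs/score_pair callables are hypothetical stand-ins for a real tokenizer, a single forward pass, and per-pair scoring; the scorer's internal details may differ:

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical vocabulary: answer letters with a leading space are single tokens.
VOCAB = {" A": 0, " B": 1, " C": 2, " D": 3}


def encode(text: str) -> List[int]:
    """Toy tokenizer: known strings map to one id, anything else splits per char."""
    if text in VOCAB:
        return [VOCAB[text]]
    return [1000 + ord(ch) for ch in text]


def score_choices(
    context: str,
    choices: Sequence[str],
    next_token_logprobs: Callable[[str], List[float]],
    score_pair: Callable[[str, str], float],
    delimiter: str = " ",
) -> Tuple[str, List[float]]:
    encoded = [encode(delimiter + choice) for choice in choices]
    if all(len(ids) == 1 for ids in encoded):
        # Fast path: one forward pass over the context, one lookup per choice.
        logprobs = next_token_logprobs(context)
        return "single", [logprobs[ids[0]] for ids in encoded]
    # Fallback: score each delimiter + choice continuation separately.
    return "multi", [score_pair(context, delimiter + choice) for choice in choices]


fixed = lambda ctx: [math.log(p) for p in (0.4, 0.3, 0.2, 0.1)]
per_pair = lambda ctx, cont: -float(len(cont))
print(score_choices("Q?", ["A", "B"], fixed, per_pair))
```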
| PARAMETER | TYPE | DESCRIPTION |
|---|---|---|
| context | str | The question/prompt text. |
| choices | List[str] | Answer choice strings. |
| delimiter | str | String prepended to each choice. Defaults to " " (a single space). |
| RETURNS | DESCRIPTION |
|---|---|
| List[float] | List of log-likelihoods, one per choice. |
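Since the returned list has one score per choice, the predicted answer is typically the index of the highest score. A minimal sketch; pick_choice is a hypothetical helper, and its optional per-character normalisation is an assumption, similar in spirit to the acc_norm metric in lm-evaluation-harness, not something this scorer is documented to do:

```python
from typing import List, Sequence


def pick_choice(
    choices: Sequence[str], scores: List[float], normalize: bool = False
) -> int:
    """Hypothetical helper: return the index of the best-scoring choice."""
    if len(choices) != len(scores):
        raise ValueError("expected exactly one score per choice")
    if normalize:
        # Divide by character length so longer answers are not penalised
        # merely for containing more tokens.
        keyed = [s / max(len(c), 1) for c, s in zip(choices, scores)]
    else:
        keyed = list(scores)
    return max(range(len(keyed)), key=keyed.__getitem__)


print(pick_choice(["Paris", "Lyon"], [-1.2, -3.4]))
```

In practice the scores would presumably come from scorer.loglikelihood_choices(context, choices) on a HuggingFaceModelScorer instance.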