Usage & Cost Tracking
Usage and cost tracking provides data classes for recording resource consumption, a mixin for automatic collection, and pluggable cost calculators.
See the Usage & Cost Tracking guide for usage patterns and examples.
Usage
dataclass
Generic usage record for any billable resource.
Represents accumulated cost and countable units for a component or
aggregated group. All fields default to zero, so Usage() can be
used as a starting value for accumulation with + and sum().
Note
cost defaults to 0.0. This means adding a Usage()
to another record never changes the cost:
Usage() + Usage(cost=0.05) gives cost=0.05.
Components that track cost start at 0.0 and accumulate upward.
Components that do not track cost (e.g., agent adapters that only
count tokens) also default to 0.0 — their cost simply has no
effect when summed with components that do report cost.
Grouping fields (provider, category, component_name, kind)
identify what scope the record covers. When two records are summed,
matching grouping fields are preserved; mismatches become None
(meaning "aggregated over").
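The summing rule described above can be sketched with a small stand-in class. This is an illustration only: SketchUsage and its reduced field set are hypothetical, and the library's own Usage may treat unset (None) grouping fields differently from this sketch, which treats any difference as a mismatch.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class SketchUsage:
    """Hypothetical stand-in for Usage; not the library's implementation."""

    cost: float = 0.0
    units: Dict[str, int] = field(default_factory=dict)
    provider: Optional[str] = None
    kind: Optional[str] = None

    def __add__(self, other: "SketchUsage") -> "SketchUsage":
        # Sum unit counters key by key.
        units = dict(self.units)
        for name, count in other.units.items():
            units[name] = units.get(name, 0) + count
        return SketchUsage(
            cost=self.cost + other.cost,
            units=units,
            # A grouping field survives only when both sides agree.
            provider=self.provider if self.provider == other.provider else None,
            kind=self.kind if self.kind == other.kind else None,
        )


a = SketchUsage(cost=0.5, units={"api_calls": 1}, provider="bloomberg", kind="service")
b = SketchUsage(cost=0.25, units={"api_calls": 2}, provider="anthropic", kind="service")
merged = a + b  # provider differs, kind matches
```

Here merged keeps kind="service" because both records agree on it, while provider collapses to None because the two records come from different providers.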
| ATTRIBUTE | DESCRIPTION |
|---|---|
| cost | Total cost in USD (or whatever unit your calculator uses). Defaults to 0.0. |
| units | Arbitrary countable units. |
| provider | Provider identifier. |
| category | Registry category. |
| component_name | Component name within category. |
| kind | Component kind. |
Example
```python
import math

usage = Usage(cost=0.05, units={"api_calls": 1}, provider="bloomberg", kind="service")

# Summing preserves matching fields
total = usage + Usage(cost=0.03, units={"api_calls": 2}, provider="bloomberg", kind="service")
assert math.isclose(total.cost, 0.08)  # compare floats with a tolerance
assert total.units == {"api_calls": 3}
assert total.provider == "bloomberg"

# Usage() is the zero element
assert (usage + Usage()).cost == 0.05

# Accumulate with sum()
records = [Usage(cost=0.10), Usage(cost=0.20), Usage(cost=0.05)]
assert math.isclose(sum(records, Usage()).cost, 0.35)

# Mismatched grouping fields become None
mixed = usage + Usage(cost=0.10, provider="anthropic", kind="llm")
assert mixed.provider is None  # aggregated over
assert mixed.kind is None  # aggregated over
```
to_dict
to_dict() -> Dict[str, Any]
Serialize to a JSON-compatible dictionary.
TokenUsage
dataclass
Bases: Usage
LLM-specific usage record with token counts.
Extends Usage with token fields reported by LLM providers. Use
from_chat_response_usage() to create from the dict returned by
model adapters.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| input_tokens | Number of input/prompt tokens. |
| output_tokens | Number of output/completion tokens. |
| total_tokens | Total tokens (input + output). |
| cached_input_tokens | Tokens served from cache (e.g., Anthropic). |
| cache_creation_input_tokens | Tokens used to create a new cache entry (e.g., Anthropic). |
| reasoning_tokens | Tokens used for reasoning (e.g., OpenAI). |
| audio_tokens | Tokens for audio processing (OpenAI). |
Example
```python
token_usage = TokenUsage.from_chat_response_usage({
    "input_tokens": 100,
    "output_tokens": 50,
    "total_tokens": 150,
})
assert token_usage.input_tokens == 100
```
from_chat_response_usage
classmethod
from_chat_response_usage(
usage_dict: Dict[str, Any],
*,
cost: float = 0.0,
provider: Optional[str] = None,
category: Optional[str] = None,
component_name: Optional[str] = None,
kind: str = "llm",
) -> TokenUsage
Create a TokenUsage from a ChatResponse.usage dict.
Maps provider-specific key names to the canonical fields.
| PARAMETER | DESCRIPTION |
|---|---|
| usage_dict | The usage dict from ChatResponse.usage. |
| cost | Cost in USD (e.g., from provider-reported cost). Defaults to 0.0. |
| provider | Provider identifier. |
| category | Registry category. |
| component_name | Component name. |
| kind | Component kind, defaults to "llm". |

| RETURNS | DESCRIPTION |
|---|---|
| TokenUsage | A TokenUsage instance with mapped fields. |
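The key mapping mentioned above might look roughly like the sketch below. The alias table and the normalize_usage_keys helper are hypothetical and are not the library's code. The sketch assumes that OpenAI-style payloads report prompt_tokens and completion_tokens, and that a missing total can be derived from the two parts.

```python
from typing import Any, Dict

# Hypothetical alias table: canonical field -> keys seen in provider payloads.
# The canonical names match the TokenUsage attributes documented above.
KEY_ALIASES = {
    "input_tokens": ("input_tokens", "prompt_tokens"),
    "output_tokens": ("output_tokens", "completion_tokens"),
    "total_tokens": ("total_tokens",),
}


def normalize_usage_keys(usage_dict: Dict[str, Any]) -> Dict[str, int]:
    """Return a dict with canonical token keys, defaulting missing ones to 0."""
    normalized = {}
    for canonical, aliases in KEY_ALIASES.items():
        # Take the first alias present in the payload, else 0.
        value = next((usage_dict[k] for k in aliases if k in usage_dict), 0)
        normalized[canonical] = int(value or 0)
    # Derive the total when the provider did not report one.
    if not normalized["total_tokens"]:
        normalized["total_tokens"] = (
            normalized["input_tokens"] + normalized["output_tokens"]
        )
    return normalized


openai_style = normalize_usage_keys({"prompt_tokens": 100, "completion_tokens": 50})
```

A lookup table keeps provider quirks in one place, so supporting another provider is a matter of adding aliases rather than new branches.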
to_dict
to_dict() -> Dict[str, Any]
Serialize to a JSON-compatible dictionary.
UsageTrackableMixin
Mixin that provides usage tracking capability to any component.
Classes that inherit from UsageTrackableMixin can be registered with a
Benchmark instance and will have their usage automatically collected
by the registry via collect_usage().
The gather_usage() method provides a default implementation that returns
an empty Usage. Subclasses should override this to return their
accumulated usage data.
How to use
For custom components that incur billable costs, inherit from
UsageTrackableMixin and override gather_usage():
```python
from typing import List


class MyPaidService(TraceableMixin, UsageTrackableMixin):
    def __init__(self):
        # One Usage record per billable call
        self._usage_records: List[Usage] = []

    def call_api(self, query):
        result = api.call(query)  # api is a placeholder for your own client
        self._usage_records.append(Usage(
            cost=result.cost,
            units={"api_calls": 1},
        ))
        return result

    def gather_usage(self) -> Usage:
        # Usage() is the zero element, so an empty list sums to an empty record
        return sum(self._usage_records, Usage())
```
Then register it with your benchmark:
```python
service = MyPaidService()
benchmark.register("tools", "my_service", service)
```
Thread Safety
Usage collection happens synchronously in the main thread after
task execution completes. Components should use thread-safe data
structures when accumulating usage during concurrent execution,
but gather_usage() itself is called sequentially.
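One way to follow this advice is to guard the accumulator with a lock. The sketch below is self-contained and uses a hypothetical ThreadSafeCostLog helper instead of the library's classes. A real component could call record() from worker threads and build its Usage record from snapshot() inside gather_usage().

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class ThreadSafeCostLog:
    """Hypothetical helper: collects per-call costs under a lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._costs = []
        self._calls = 0

    def record(self, cost: float) -> None:
        # Guard both updates so concurrent callers never interleave.
        with self._lock:
            self._costs.append(cost)
            self._calls += 1

    def snapshot(self):
        # Read under the lock so the two values are consistent with each other.
        with self._lock:
            return sum(self._costs), self._calls


log = ThreadSafeCostLog()
with ThreadPoolExecutor(max_workers=8) as pool:
    for _ in range(200):
        pool.submit(log.record, 0.5)
# Leaving the with-block waits for all submitted calls to finish.
total_cost, call_count = log.snapshot()
```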
gather_usage
gather_usage() -> Usage
Gather accumulated usage from this component.
Provides a default implementation that returns an empty Usage. Subclasses should override this to return their accumulated usage data.
| RETURNS | DESCRIPTION |
|---|---|
| Usage | Accumulated usage for this component. |
How to use
Override this method to return your component's usage:
```python
def gather_usage(self) -> Usage:
    return sum(self._usage_records, Usage())
```
CostCalculator
Bases: Protocol
Protocol for computing cost from token usage.
Implementations receive a TokenUsage and the model ID, and return
the cost in whatever unit the calculator declares (typically USD).
Example
```python
from typing import Optional


class MyCostCalculator:
    def calculate_cost(self, usage: TokenUsage, model_id: str) -> Optional[float]:
        # MY_PRICING is your own table of per-token rates keyed by model ID
        rate = MY_PRICING.get(model_id)
        if rate is None:
            return None
        return rate["input"] * usage.input_tokens + rate["output"] * usage.output_tokens
```
calculate_cost
calculate_cost(
usage: TokenUsage, model_id: str
) -> Optional[float]
Compute cost for a single chat call.
| PARAMETER | DESCRIPTION |
|---|---|
| usage | Token usage from the call. |
| model_id | The model identifier. |

| RETURNS | DESCRIPTION |
|---|---|
| Optional[float] | Cost as a float, or None. |
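Because CostCalculator is a Protocol, any object with a matching calculate_cost method qualifies, with no base class required. The sketch below illustrates that idea with hypothetical stand-ins (SketchTokenUsage, SketchCostCalculator, FlatRateCalculator, cost_or_zero). Treating a None result as zero cost is an assumption about how a caller might handle missing pricing, not documented adapter behavior.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Protocol, runtime_checkable


@dataclass
class SketchTokenUsage:
    """Hypothetical stand-in for TokenUsage with only the fields used here."""

    input_tokens: int = 0
    output_tokens: int = 0


@runtime_checkable
class SketchCostCalculator(Protocol):
    def calculate_cost(self, usage: SketchTokenUsage, model_id: str) -> Optional[float]: ...


class FlatRateCalculator:
    """Charges one rate per token for known models; inherits from nothing."""

    def __init__(self, rates: Dict[str, float]) -> None:
        self._rates = rates

    def calculate_cost(self, usage: SketchTokenUsage, model_id: str) -> Optional[float]:
        rate = self._rates.get(model_id)
        if rate is None:
            return None  # unknown model: no price available
        return rate * (usage.input_tokens + usage.output_tokens)


def cost_or_zero(calculator: SketchCostCalculator, usage: SketchTokenUsage, model_id: str) -> float:
    # Assumed policy: a missing price contributes nothing to the total.
    cost = calculator.calculate_cost(usage, model_id)
    return 0.0 if cost is None else cost


calc = FlatRateCalculator({"demo-model": 0.5})
known = cost_or_zero(calc, SketchTokenUsage(10, 6), "demo-model")
unknown = cost_or_zero(calc, SketchTokenUsage(10, 6), "other-model")
```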
StaticPricingCalculator
Cost calculator using user-supplied per-model pricing.
Pricing is specified as cost per token (not per 1K or 1M tokens).
If a model is not in the pricing table, calculate_cost returns None.
| PARAMETER | DESCRIPTION |
|---|---|
| pricing | Dict mapping model IDs to their per-token rates. Each value is a dict of rates; the example below uses the keys "input" and "output". |
Example
```python
calculator = StaticPricingCalculator({
    "gpt-4": {"input": 0.00003, "output": 0.00006},
    "claude-sonnet-4-5": {"input": 0.000003, "output": 0.000015},
})
model = LiteLLMModelAdapter(model_id="gpt-4", cost_calculator=calculator)
```
For university clusters or custom credit systems, the "cost" unit is whatever the pricing values represent (credits, EUR, etc.):
```python
calculator = StaticPricingCalculator({
"llama-3-70b": {"input": 0.5, "output": 1.0}, # credits per token
})
```
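Providers usually quote prices per 1M tokens, while the pricing table here expects per-token rates. A small conversion helper avoids off-by-a-million mistakes. The helper name and the prices below are illustrative assumptions, not part of the library.

```python
from typing import Dict


def per_million_to_per_token(rates_per_million: Dict[str, float]) -> Dict[str, float]:
    """Convert rates quoted per 1M tokens into per-token rates."""
    return {key: value / 1_000_000 for key, value in rates_per_million.items()}


# Illustrative prices per 1M tokens (assumed for this example).
per_token = per_million_to_per_token({"input": 30.0, "output": 60.0})

# Cost of a call with 1,000 input tokens and 500 output tokens.
cost = per_token["input"] * 1_000 + per_token["output"] * 500
```

With these numbers the per-token input rate is 30 / 1,000,000 = 0.00003, and the call costs roughly 0.03 + 0.03 = 0.06 in the pricing unit.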
models
property
models: List[str]
List of model IDs with pricing configured.
add_model
add_model(model_id: str, rates: Dict[str, float]) -> None
Add or update pricing for a model.
| PARAMETER | DESCRIPTION |
|---|---|
| model_id | The model identifier. |
| rates | Per-token rates for the model. |
calculate_cost
calculate_cost(
usage: TokenUsage, model_id: str
) -> Optional[float]
Compute cost from static per-token rates.
| PARAMETER | DESCRIPTION |
|---|---|
| usage | Token usage from the call. |
| model_id | The model identifier to look up in the pricing table. |

| RETURNS | DESCRIPTION |
|---|---|
| Optional[float] | Computed cost, or None if the model is not in the pricing table. |
gather_config
gather_config() -> Dict[str, Any]
Return pricing configuration for reproducibility.
UsageReporter
Post-hoc utility for analyzing usage across benchmark reports.
Walks report["usage"] across all reports to produce breakdowns
by task, component, model, etc.
Example
```python
reporter = UsageReporter.from_reports(benchmark.reports)
print(reporter.total())
print(reporter.by_task())
print(reporter.by_component())
```
__init__
__init__(entries: List[Dict[str, Any]])
Initialize with raw entries extracted from reports.
| PARAMETER | DESCRIPTION |
|---|---|
| entries | List of raw entry dicts extracted from reports. |
by_component
by_component() -> Dict[str, Usage]
Aggregate usage by registry key (e.g., "models:main_model").
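Grouping by registry key amounts to a keyed sum. The sketch below shows the idea on plain dicts. The entry shape (the "component" and "cost" keys) and the cost_by_component helper are assumptions made for illustration, since the actual entry format is not specified here, and the real method returns Usage objects instead of floats.

```python
from collections import defaultdict
from typing import Any, Dict, List


def cost_by_component(entries: List[Dict[str, Any]]) -> Dict[str, float]:
    """Sum cost per registry key; the "component" and "cost" keys are assumed."""
    totals: Dict[str, float] = defaultdict(float)
    for entry in entries:
        totals[entry["component"]] += entry.get("cost", 0.0)
    return dict(totals)


entries = [
    {"component": "models:main_model", "cost": 0.5},
    {"component": "tools:my_service", "cost": 0.25},
    {"component": "models:main_model", "cost": 0.25},
]
totals = cost_by_component(entries)
```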
from_reports
staticmethod
from_reports(
reports: List[Dict[str, Any]],
) -> UsageReporter
Create a UsageReporter from benchmark reports.
| PARAMETER | DESCRIPTION |
|---|---|
| reports | The benchmark reports (e.g., benchmark.reports). |

| RETURNS | DESCRIPTION |
|---|---|
| UsageReporter | A UsageReporter ready for analysis. |
summary
summary() -> Dict[str, Any]
Nested dict with all breakdowns.
LiteLLMCostCalculator
Cost calculator using LiteLLM's bundled pricing database.
LiteLLM maintains a comprehensive
[model_prices_and_context_window.json](https://github.com/BerriAI/litellm/blob/main/model_prices_and_context_window.json)
that covers most major LLM providers. This calculator delegates to
litellm.cost_per_token for per-token rates and computes the total.
This is the recommended calculator for most users — it covers OpenAI, Anthropic, Google, Mistral, Cohere, and many more without requiring manual pricing tables.
Note
If you're already using the LiteLLMModelAdapter, it extracts
provider-reported cost from response._hidden_params.response_cost
automatically. This calculator is useful as a fallback when using
other adapters (OpenAI, Anthropic, Google) directly.
Example
```python
from maseval.interface.usage import LiteLLMCostCalculator
from maseval.interface.inference import OpenAIModelAdapter

calculator = LiteLLMCostCalculator()
model = OpenAIModelAdapter(client=client, model_id="gpt-4", cost_calculator=calculator)

# Cost is now computed automatically after each chat() call
response = model.chat([{"role": "user", "content": "Hello"}])
print(model.gather_usage().cost)  # e.g., 0.00123
```
__init__
__init__(
custom_pricing: Optional[
Dict[str, Dict[str, float]]
] = None,
model_id_map: Optional[Dict[str, str]] = None,
)
Initialize the LiteLLM cost calculator.
| PARAMETER | DESCRIPTION |
|---|---|
| custom_pricing | Optional overrides for specific models. Keys are model IDs, values are dicts of float rates. |
| model_id_map | Optional mapping from adapter model IDs to LiteLLM model IDs. Use this when your adapter's model IDs differ from the ones LiteLLM uses. |
calculate_cost
calculate_cost(
usage: TokenUsage, model_id: str
) -> Optional[float]
Compute cost using LiteLLM's pricing database.
| PARAMETER | DESCRIPTION |
|---|---|
| usage | Token usage from the call. |
| model_id | The model identifier. Remapped via model_id_map when a mapping is configured. |

| RETURNS | DESCRIPTION |
|---|---|
| Optional[float] | Cost in USD, or None if LiteLLM has no pricing for this model and no custom pricing was provided. |
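The remap-then-delegate flow can be sketched as follows. This is not the calculator's actual code: total_cost and fake_cost_per_token are hypothetical. The pricing function is passed in so the sketch runs without LiteLLM installed. In real use you could pass litellm.cost_per_token, which is assumed here to accept model, prompt_tokens, and completion_tokens keyword arguments and return an (input cost, output cost) pair.

```python
from typing import Callable, Dict, Optional, Tuple

# Pricing lookup: keyword arguments in, (input cost, output cost) out.
PricingFn = Callable[..., Tuple[float, float]]


def total_cost(
    cost_per_token: PricingFn,
    model_id: str,
    input_tokens: int,
    output_tokens: int,
    model_id_map: Optional[Dict[str, str]] = None,
) -> Optional[float]:
    # Translate the adapter's model ID to the pricing database's ID, if mapped.
    lookup_id = (model_id_map or {}).get(model_id, model_id)
    try:
        input_cost, output_cost = cost_per_token(
            model=lookup_id,
            prompt_tokens=input_tokens,
            completion_tokens=output_tokens,
        )
    except Exception:
        # Assumed policy: an unknown model yields None instead of raising.
        return None
    return input_cost + output_cost


def fake_cost_per_token(model: str, prompt_tokens: int, completion_tokens: int):
    """Stub pricing function so the sketch runs without LiteLLM installed."""
    if model != "provider/demo-model":
        raise ValueError(f"no pricing for {model}")
    return prompt_tokens * 0.5, completion_tokens * 1.0


mapped = total_cost(
    fake_cost_per_token, "demo-model", 10, 4,
    model_id_map={"demo-model": "provider/demo-model"},
)
unmapped = total_cost(fake_cost_per_token, "demo-model", 10, 4)
```

Without the mapping, the stub does not recognize "demo-model" and the helper returns None, which mirrors the documented "no pricing available" outcome.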
gather_config
gather_config() -> Dict[str, Any]
Return calculator configuration for reproducibility.