Operator uses LiteLLM model IDs like anthropic/claude-sonnet-4-6 or openai/gpt-4.1.

Scan First

  • models: is a fallback chain, not a single preferred value.
  • Fallback happens on every live model call.
  • thinking: is off|low|medium|high.
  • Jobs inherit thinking, context_ratio, and max_output_tokens from the resolved agent.
  • Jobs may override model and max_iterations, but not thinking.

Fallback Chains

defaults:
  models:
    - "anthropic/claude-sonnet-4-6"
    - "openai/gpt-4.1"
Behavior:
  1. Operator tries the first model.
  2. If the call errors, it retries the same request with the next model.
  3. If a later model succeeds, the run continues normally.
Fallback is evaluated per live call, not per conversation. A long conversation may use different models on different turns if failures occur.
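The per-call behavior above can be sketched as a simple loop. This is an illustrative sketch, not Operator's actual internals; call_with_fallback and the injected complete callable are hypothetical names.

```python
def call_with_fallback(models, request, complete):
    """Try each model in order; return the first successful response.

    models   -- the fallback chain from the resolved config
    request  -- keyword arguments for the underlying completion call
    complete -- the live-call function (e.g. a litellm.completion wrapper)
    """
    last_error = None
    for model in models:
        try:
            # Each live call restarts from the top of the chain.
            return complete(model=model, **request)
        except Exception as exc:
            last_error = exc  # fall through to the next model
    raise last_error  # every model in the chain failed
```

Because the loop restarts per call, a conversation that hit a transient error on one turn returns to the first model on the next turn.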

Thinking Levels

defaults:
  thinking: off

agents:
  researcher:
    models:
      - "openai/o3"
    thinking: high
Supported values:
  • off
  • low
  • medium
  • high
Operator maps these to LiteLLM reasoning controls when the current model supports them:
| Operator | LiteLLM |
| --- | --- |
| off | no reasoning request; usually reasoning_effort="none", but omitted for Anthropic compatibility |
| low | reasoning_effort="low" |
| medium | reasoning_effort="medium" |
| high | reasoning_effort="high" |
If the current model does not support reasoning controls, Operator omits the parameter and continues.
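The mapping and the two omission cases can be sketched as follows. The supports_reasoning capability check is a stand-in for whatever lookup Operator uses; the function name is an assumption.

```python
def reasoning_params(thinking, model, supports_reasoning):
    """Translate a thinking level into LiteLLM reasoning kwargs."""
    if not supports_reasoning(model):
        return {}  # model lacks reasoning controls: omit the parameter
    if thinking == "off":
        # "none" is usually sent, but omitted for Anthropic compatibility
        if model.startswith("anthropic/"):
            return {}
        return {"reasoning_effort": "none"}
    return {"reasoning_effort": thinking}  # low | medium | high
```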

Cross-Provider Safety

When a response includes provider-specific reasoning metadata, Operator strips that metadata from assistant history before the next model call. This is what keeps Anthropic-to-OpenAI or OpenAI-to-other-provider fallbacks from breaking on incompatible history payloads.
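A minimal sketch of that sanitisation step, assuming the metadata lives in keys like reasoning_content or thinking_blocks on assistant messages (the exact key names Operator strips are an assumption here):

```python
# Hypothetical set of provider-specific reasoning keys to drop.
PROVIDER_REASONING_KEYS = ("reasoning_content", "thinking_blocks")

def strip_reasoning(history):
    """Remove provider-specific reasoning metadata from assistant turns."""
    cleaned = []
    for msg in history:
        if msg.get("role") == "assistant":
            msg = {k: v for k, v in msg.items()
                   if k not in PROVIDER_REASONING_KEYS}
        cleaned.append(msg)
    return cleaned
```

Running this before each call means the next model only ever sees plain role/content history, regardless of which provider produced the earlier turns.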

Inheritance Rules

| Execution type | Models | Thinking | Other execution settings |
| --- | --- | --- | --- |
| Chat | agent override or defaults.models | agent override or defaults.thinking | max_iterations, context_ratio, max_output_tokens resolve the same way |
| spawn_agent() without agent= | inherit current run context | inherit current run context | inherit current run context |
| spawn_agent(agent="other") | switch to target agent | switch to target agent | switch to target agent |
| Scheduled job | job.model or resolved agent models | resolved agent thinking | resolved agent context_ratio and max_output_tokens |

Jobs

Job frontmatter supports:
  • model
  • max_iterations
It does not support:
  • thinking
  • context_ratio
  • max_output_tokens
If you need a different thinking level for a job, point the job at a different agent.
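For example, a hypothetical job file could set only the supported keys (the filename and values here are illustrative):

```yaml
# jobs/nightly-report.md frontmatter (illustrative example)
model: "openai/gpt-4.1"
max_iterations: 10
# thinking, context_ratio, and max_output_tokens are NOT valid here;
# they come from the agent the job resolves to
```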

max_output_tokens

max_output_tokens sets LiteLLM max_tokens for each call.
  • if configured: Operator uses your explicit value
  • if null: Operator asks LiteLLM for the model’s max output size and uses that when available
  • if LiteLLM cannot resolve a maximum: the parameter is omitted
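The three-way resolution above can be sketched like this. get_model_max stands in for LiteLLM's model-info lookup; the helper name is an assumption.

```python
def resolve_max_tokens(configured, model, get_model_max):
    """Decide what max_tokens (if anything) to send to LiteLLM."""
    if configured is not None:
        return {"max_tokens": configured}  # explicit config wins
    limit = get_model_max(model)           # ask LiteLLM for the model's cap
    if limit is not None:
        return {"max_tokens": limit}
    return {}                              # unknown cap: omit the parameter
```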

context_ratio

context_ratio controls how aggressively Operator trims stored history before each model call.
  • higher values keep more history
  • lower values reduce token usage
  • 0.0 disables trimming logic and sends the full stored history
The system prompt and latest user exchange are preserved first; older exchanges are the first thing dropped.
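A rough sketch of ratio-based trimming, assuming history is a list of (token_count, message) exchanges ordered oldest-first. Operator's real accounting is more involved; this only shows the shape of the policy, including the 0.0 bypass.

```python
def trim_history(exchanges, context_window, context_ratio):
    """Keep the newest exchanges that fit in context_window * context_ratio."""
    if context_ratio == 0.0:
        return exchanges                    # trimming disabled: send everything
    budget = int(context_window * context_ratio)
    kept, used = [], 0
    # Walk newest-first so the latest exchange always survives.
    for tokens, msg in reversed(exchanges):
        if used + tokens > budget and kept:
            break                           # oldest exchanges drop first
        kept.append((tokens, msg))
        used += tokens
    return list(reversed(kept))
```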