Operator uses LiteLLM model IDs like anthropic/claude-sonnet-4-6 or openai/gpt-4.1.

Scan First

  • models: is a fallback chain, not a single preferred value.
  • Fallback happens on every live model call.
  • thinking: is off|low|medium|high.
  • Jobs inherit thinking, context_ratio, and max_output_tokens from the resolved agent.
  • Jobs may override model and max_iterations, but not thinking.

Fallback Chains

defaults:
  models:
    - "anthropic/claude-sonnet-4-6"
    - "openai/gpt-4.1"
Behavior:
  1. Operator tries the first model.
  2. If the call errors, it retries the same request with the next model.
  3. If a later model succeeds, the run continues normally.
Fallback is evaluated per live call, not per conversation. A long conversation may use different models on different turns if failures occur.
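The per-call behavior above can be sketched as a simple loop. This is an illustrative sketch, not Operator's actual internals; call_with_fallback and the injected complete callable are hypothetical names.

```python
def call_with_fallback(models, request, complete):
    """Try each model in order; return the first successful response.

    models   -- the fallback chain from the resolved config
    request  -- keyword arguments for the underlying completion call
    complete -- the live-call function (e.g. a litellm.completion wrapper)
    """
    last_error = None
    for model in models:
        try:
            # Each live call restarts from the top of the chain.
            return complete(model=model, **request)
        except Exception as exc:
            last_error = exc  # fall through to the next model
    raise last_error  # every model in the chain failed
```

Because the loop restarts per call, a conversation that hit a transient error on one turn returns to the first model on the next turn.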

Thinking Levels

defaults:
  thinking: off

agents:
  researcher:
    models:
      - "openai/o3"
    thinking: high
Supported values:
  • off
  • low
  • medium
  • high
Operator maps these to LiteLLM reasoning controls when the current model supports them:
| Operator | LiteLLM |
| --- | --- |
| off | no reasoning request; usually reasoning_effort="none", but omitted for Anthropic compatibility |
| low | reasoning_effort="low" |
| medium | reasoning_effort="medium" |
| high | reasoning_effort="high" |
If the current model does not support reasoning controls, Operator omits the parameter and continues.
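The mapping and the two omission cases can be sketched as follows. The supports_reasoning capability check is a stand-in for whatever lookup Operator uses; the function name is an assumption.

```python
def reasoning_params(thinking, model, supports_reasoning):
    """Translate a thinking level into LiteLLM reasoning kwargs."""
    if not supports_reasoning(model):
        return {}  # model lacks reasoning controls: omit the parameter
    if thinking == "off":
        # "none" is usually sent, but omitted for Anthropic compatibility
        if model.startswith("anthropic/"):
            return {}
        return {"reasoning_effort": "none"}
    return {"reasoning_effort": thinking}  # low | medium | high
```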

Cross-Provider Safety

When a response includes provider-specific reasoning metadata, Operator strips that metadata from assistant history before the next model call. This is what keeps Anthropic-to-OpenAI or OpenAI-to-other-provider fallbacks from breaking on incompatible history payloads.
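A minimal sketch of that sanitisation step, assuming the metadata lives in keys like reasoning_content or thinking_blocks on assistant messages (the exact key names Operator strips are an assumption here):

```python
# Hypothetical set of provider-specific reasoning keys to drop.
PROVIDER_REASONING_KEYS = ("reasoning_content", "thinking_blocks")

def strip_reasoning(history):
    """Remove provider-specific reasoning metadata from assistant turns."""
    cleaned = []
    for msg in history:
        if msg.get("role") == "assistant":
            msg = {k: v for k, v in msg.items()
                   if k not in PROVIDER_REASONING_KEYS}
        cleaned.append(msg)
    return cleaned
```

Running this before each call means the next model only ever sees plain role/content history, regardless of which provider produced the earlier turns.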

Inheritance Rules

| Execution type | Models | Thinking | Other execution settings |
| --- | --- | --- | --- |
| Chat | agent override or defaults.models | agent override or defaults.thinking | max_iterations, context_ratio, max_output_tokens resolve the same way |
| spawn_agent() without agent= | inherit current run context | inherit current run context | inherit current run context |
| spawn_agent(agent="other") | switch to target agent | switch to target agent | switch to target agent |
| Scheduled job | job.model or resolved agent models | resolved agent thinking | resolved agent context_ratio and max_output_tokens |

Jobs

Job frontmatter supports:
  • model
  • max_iterations
It does not support:
  • thinking
  • context_ratio
  • max_output_tokens
If you need a different thinking level for a job, point the job at a different agent.
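For example, a hypothetical job file could set only the supported keys (the filename and values here are illustrative):

```yaml
# jobs/nightly-report.md frontmatter (illustrative example)
model: "openai/gpt-4.1"
max_iterations: 10
# thinking, context_ratio, and max_output_tokens are NOT valid here;
# they come from the agent the job resolves to
```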

max_output_tokens

max_output_tokens sets LiteLLM max_tokens for each call.
  • if configured: Operator uses your explicit value
  • if null: Operator asks LiteLLM for the model’s max output size and uses that when available
  • if LiteLLM cannot resolve a maximum: the parameter is omitted
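The three-way resolution above can be sketched like this. get_model_max stands in for LiteLLM's model-info lookup; the helper name is an assumption.

```python
def resolve_max_tokens(configured, model, get_model_max):
    """Decide what max_tokens (if anything) to send to LiteLLM."""
    if configured is not None:
        return {"max_tokens": configured}  # explicit config wins
    limit = get_model_max(model)           # ask LiteLLM for the model's cap
    if limit is not None:
        return {"max_tokens": limit}
    return {}                              # unknown cap: omit the parameter
```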

context_ratio

context_ratio controls how aggressively Operator trims stored history before each model call.
  • higher values keep more history
  • lower values reduce token usage
  • 0.0 disables trimming logic and sends the full stored history
The system prompt and latest user exchange are preserved first; older exchanges are the first thing dropped.
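A rough sketch of ratio-based trimming, assuming history is a list of (token_count, message) exchanges ordered oldest-first. Operator's real accounting is more involved; this only shows the shape of the policy, including the 0.0 bypass.

```python
def trim_history(exchanges, context_window, context_ratio):
    """Keep the newest exchanges that fit in context_window * context_ratio."""
    if context_ratio == 0.0:
        return exchanges                    # trimming disabled: send everything
    budget = int(context_window * context_ratio)
    kept, used = [], 0
    # Walk newest-first so the latest exchange always survives.
    for tokens, msg in reversed(exchanges):
        if used + tokens > budget and kept:
            break                           # oldest exchanges drop first
        kept.append((tokens, msg))
        used += tokens
    return list(reversed(kept))
```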