May 27, 2026 · 10 min

Plan with Opus, Build with Gemini: A Practical Guide to Mixed-Provider Workflows

Frontier models are hitting rate limits. Open models are catching up. The winning strategy isn't choosing one — it's orchestrating the right model for each step.

Mixed-Provider · Model Orchestration · Claude Opus · Kimi · Archon

Anthropic's rate limits are tightening. Subscription output quality is reportedly degrading. And the frontier models — Claude Opus, GPT-4.5 — cost 10x more per token than Kimi K2.6, DeepSeek, or Qwen.

The solution isn't abandoning frontier models. It's mixing them: use the expensive model where it matters, and the cheap model everywhere else.

This is the mixed-provider workflow. Here's how to build it.

---

The Central Question

Where do you spend your frontier tokens?

Two hypotheses:

Opus for planning, cheap model for implementation: A thorough plan (files to touch, validation strategy, success criteria) lets a cheaper model implement reliably.
Cheap model for planning, Opus for implementation: The stronger model catches hallucinations and deviations from the plan during self-review.

The answer is: it depends on the task, and you should test it empirically.
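One way to test the two hypotheses is to run the same task list through both configurations and compare pass rate and wall-clock time. The sketch below is a minimal harness under stated assumptions: `run_task` is a hypothetical callable you supply (it plans with one provider, implements with the other, and returns whether the result passed your checks), and the provider labels are placeholders rather than real API model IDs.

```python
import time
from typing import Callable, Dict, List, Tuple

# (planner, implementer) pairs for the two hypotheses. Labels are placeholders.
CONFIGS: Dict[str, Tuple[str, str]] = {
    "opus_plans": ("opus", "cheap-model"),
    "cheap_plans": ("cheap-model", "opus"),
}

# Hypothetical signature: run_task(planner, implementer, task) -> passed?
RunTask = Callable[[str, str, str], bool]


def compare(run_task: RunTask, tasks: List[str]) -> Dict[str, Dict[str, float]]:
    """Run every task under each configuration; report pass rate and seconds."""
    if not tasks:
        raise ValueError("need at least one task to compare configurations")
    results: Dict[str, Dict[str, float]] = {}
    for label, (planner, implementer) in CONFIGS.items():
        start = time.perf_counter()
        passed = sum(1 for task in tasks if run_task(planner, implementer, task))
        results[label] = {
            "pass_rate": passed / len(tasks),
            "seconds": time.perf_counter() - start,
        }
    return results
```

A handful of tasks will not settle the question, so treat the output as a signal for your own workload rather than a general verdict.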

---

The Architecture

A mixed-provider workflow has three layers:

1. Orchestration Layer

The orchestrator is the layer above the coding agent. It builds a DAG of steps, assigns a provider to each node, and manages work-tree isolation.

Archon is the reference implementation: per-node provider selection, git work-tree isolation, retry logic, and PR creation. The key insight: tooling gets replaced, but the orchestration layer doesn't.
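To make "a DAG of steps with a provider per node" concrete, here is a minimal sketch using Python's standard-library `graphlib`. It is not Archon's configuration format or API: the `Node` class, the step names, and the provider labels are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter
from typing import List


@dataclass
class Node:
    name: str
    provider: str  # placeholder label, not a real API model ID
    depends_on: List[str] = field(default_factory=list)


def execution_order(nodes: List[Node]) -> List[str]:
    """Return node names in an order that respects every dependency."""
    known = {n.name for n in nodes}
    for n in nodes:
        missing = set(n.depends_on) - known
        if missing:
            raise ValueError(f"{n.name} depends on unknown nodes: {sorted(missing)}")
    graph = {n.name: set(n.depends_on) for n in nodes}
    # static_order() raises graphlib.CycleError if the graph has a cycle.
    return list(TopologicalSorter(graph).static_order())


# Hypothetical workflow: each step names its provider and its prerequisites.
WORKFLOW = [
    Node("explore", "sonnet"),
    Node("plan", "opus", ["explore"]),
    Node("implement", "kimi-k2.6", ["plan"]),
    Node("validate", "opus", ["plan", "implement"]),
]

print(execution_order(WORKFLOW))
```

Because the graph only names providers, swapping the model behind a node is a one-line change and the ordering logic stays untouched.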

2. Provider Routing

Each node in the workflow gets a provider assignment:

Exploration: Sonnet (cheap, high context)
Planning: Opus (thorough, structured)
Implementation: Kimi K2.6 or Gemini 3.5 Flash (cost-efficient)
Validation: Opus (catches errors)
Design: Gemini 3.5 Flash (fast, visual)
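The assignments above amount to a lookup table from role to provider. A minimal sketch, assuming hypothetical role keys and placeholder provider labels (not real API model IDs), with per-run overrides so one role can be rerouted without editing the defaults:

```python
from typing import Dict, Optional

# Default routing, mirroring the list above. Labels are placeholders.
ROUTING: Dict[str, str] = {
    "exploration": "sonnet",
    "planning": "opus",
    "implementation": "kimi-k2.6",
    "validation": "opus",
    "design": "gemini-3.5-flash",
}


def provider_for(role: str, overrides: Optional[Dict[str, str]] = None) -> str:
    """Look up the provider for a role, letting per-run overrides win."""
    table = {**ROUTING, **(overrides or {})}
    if role not in table:
        raise ValueError(f"no provider configured for role {role!r}")
    return table[role]


print(provider_for("planning"))
print(provider_for("implementation", {"implementation": "gemini-3.5-flash"}))
```

Failing loudly on an unknown role is a deliberate choice here: a silent default would hide the fact that an expensive model is quietly handling a step nobody routed.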

3. Artifact Handoffs

Provider switching breaks conversation continuity. You cannot continue the same agent session across providers. The solution: markdown artifacts in a dedicated work-tree space.

The planning node writes a plan.md. The implementation node reads it. The validation node reads both. Each node is a fresh agent session, bridged by files.
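The file-based handoff can be sketched in a few lines. This is an illustration under assumptions, not how any particular orchestrator does it: the helper names and the `implementation.md` file name are hypothetical, and only `plan.md` comes from the description above.

```python
from pathlib import Path
from typing import List


def write_artifact(artifact_dir: Path, name: str, content: str) -> Path:
    """Persist one node's output so a later, fresh session can read it."""
    artifact_dir.mkdir(parents=True, exist_ok=True)
    path = artifact_dir / name
    path.write_text(content, encoding="utf-8")
    return path


def build_prompt(artifact_dir: Path, task: str, artifacts: List[str]) -> str:
    """Assemble the prompt for a fresh session from the task plus prior artifacts."""
    sections = [task]
    for name in artifacts:
        body = (artifact_dir / name).read_text(encoding="utf-8")
        sections.append(f"## {name}\n{body}")
    return "\n\n".join(sections)
```

In this sketch the planning node would call `write_artifact(dir, "plan.md", ...)`, the implementation node would build its prompt from `["plan.md"]`, and the validation node from `["plan.md", "implementation.md"]`. No session state crosses the provider boundary, only files.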

---

The Reliability Reality

Kimi K2.6 is the weak link operationally:

Frequent "tool edit failed" warnings
API hangs (~1 in 4–8 runs)
Weird multi-newline output

Codex is reportedly the only agent that doesn't crash. The Claude Agent SDK also crashes occasionally: its subprocess dies, and the harness has to catch the failure and retry behind a guard.

The harness must have built-in retry mechanisms that time out and reset on a hang, not just retry on failed tool edits.
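A minimal sketch of such a guard, assuming the agent runs as a child process started from a command line. The function name and parameters are hypothetical; the only library call is the standard `subprocess.run`, whose `timeout` argument kills the child and raises `TimeoutExpired` when the deadline passes.

```python
import subprocess
import time
from typing import List


def run_with_retry(cmd: List[str], timeout_s: float,
                   max_attempts: int = 3, backoff_s: float = 0.0) -> str:
    """Run an agent command, retrying on hangs and on non-zero exits."""
    last_error = "no attempts made"
    for attempt in range(1, max_attempts + 1):
        try:
            # On timeout, subprocess.run kills the child before raising.
            result = subprocess.run(cmd, capture_output=True, text=True,
                                    timeout=timeout_s)
        except subprocess.TimeoutExpired:
            last_error = f"attempt {attempt}: hung past {timeout_s}s"
        else:
            if result.returncode == 0:
                return result.stdout
            last_error = f"attempt {attempt}: exit code {result.returncode}"
        # A real harness would also reset the work-tree here before retrying.
        if attempt < max_attempts:
            time.sleep(backoff_s)
    raise RuntimeError(f"gave up after {max_attempts} attempts ({last_error})")
```

Treating a hang and a crash as the same retryable event keeps the guard simple. Choose the timeout from observed run times for the node in question, so a slow but healthy run isn't killed.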

---

The Cost Math

Kimi Code: $40/month; a multi-million-token stream used about 5% of the weekly limit
Anthropic subscription: subsidized but rate-limited, reportedly degrading in quality
Gemini 3.5 Flash: a single-file edit used ~20% of the weekly limit
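Those percentages are easier to compare as "runs per week". The sketch below does that arithmetic under a loud assumption: every run of a given kind uses the same share of the weekly limit and usage adds up linearly. The two percentages are the figures above; the function name is hypothetical.

```python
def runs_per_week(percent_of_weekly_limit: int) -> int:
    """Whole runs that fit in one weekly limit if each uses this percentage."""
    if not 0 < percent_of_weekly_limit <= 100:
        raise ValueError("percentage must be in (0, 100]")
    return 100 // percent_of_weekly_limit


# Observed shares of the weekly limit, taken from the list above.
OBSERVED = {
    "Kimi Code, multi-million-token stream": 5,
    "Gemini 3.5 Flash, single-file edit": 20,
}

for label, pct in OBSERVED.items():
    print(f"{label}: about {runs_per_week(pct)} per week")
# prints about 20 per week for Kimi Code and about 5 per week for Gemini 3.5 Flash
```

On these numbers, one plan covers roughly twenty long Kimi streams per week but only about five Gemini edits.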

The strategy: use Gemini for frontend design (fast, visual) + Opus/Kimi for content (accurate, structured). Avoid using Gemini for reasoning — it hallucinates facts.

---

What to Build

Run a mixed-provider benchmark: plan with Opus, implement with Kimi, validate with Opus. Measure quality, cost, and time.
Add retry/guard logic: timeout on API hangs, reset on subprocess crashes.
Test additional models: Qwen 3.6, DeepSeek, GLM 5.1, MiniMax.
Design a "design vs. content" split: Gemini for UI, Opus for logic.
Build a private eval suite: inputs + expected outputs, run against every new model release.
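The last item, a private eval suite, can start very small. A minimal sketch, assuming exact-match scoring and a hypothetical `model` callable that maps a prompt to a response; real suites usually need looser checks such as running tests or rubric grading.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class EvalCase:
    prompt: str
    expected: str


def run_suite(model: Callable[[str], str], cases: List[EvalCase]) -> float:
    """Return the fraction of cases where the model's output matches exactly."""
    if not cases:
        raise ValueError("eval suite is empty")
    passed = sum(1 for c in cases if model(c.prompt).strip() == c.expected)
    return passed / len(cases)


# Illustrative cases only; a real suite holds your own inputs and expected outputs.
CASES = [
    EvalCase("Reply with the single word: ready", "ready"),
    EvalCase("What is 2 + 2? Answer with the number only.", "4"),
]
```

To use it, wrap each new model release in a function that takes a prompt and returns text, then compare `run_suite(new_model, CASES)` against the score of the model you use today.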

The model is the engine. The orchestrator is the driver. Invest in the driver.

---

*This post draws from Cole Medin's live benchmarks of mixed-provider Archon workflows and our own experience with multi-model routing.*
