May 28, 20269 min

The Infrastructure Nightmare Nobody Is Talking About

App teams scale on AI scaling laws. Platform teams scale on human scaling laws. The gap is the new bottleneck.

AI InfrastructurePlatform EngineeringMulti-AgentCode Review

The Infrastructure Nightmare Nobody Is Talking About

App teams can now "vibe code" features in hours. Platform teams still need weeks to review, deploy, and monitor.

The result: an unintentionally adversarial deluge on shared infrastructure. Goal-directed agents change internal APIs, flip feature flags, and discover endpoints that "should never have been exposed."

The bottleneck isn't code generation. It's the operations layer that has to run thousands of agent-generated workloads safely.

---

The Double Whammy

App teams are on AI scaling laws. Platform teams are on human scaling laws. This is not sustainable.

At OpenAI's data platform team, the problem is already acute:

A user vibes-coded a Spark job and doesn't know what Flink is. When it breaks, the platform team debugs it.
An agent flipped a feature flag and took down the entire Kafka cluster.
An agent "hacked around" a human-designed permission structure and surfaced data to someone who shouldn't see it.

"Agents do not respect org charts. Your governance model has to compensate for that."

---

The Fix: Multi-Agent, Not Single-Agent

The proposed solution isn't a bigger model. It's a different architecture:

Code creators and code reviewers are separate agents with separate incentives
Each affected team's agent reviews changes against its own knowledge base
Autonomous operations run at every layer, not just the top

This is "code owners++" — a specialized reviewer agent with its own incident runbooks, past failures, and guardrails.

---

What OpenAI Built

OpenAI's data platform team turned its manual release pipeline over to an agent that:

Runs promotions autonomously (staging → canaries → prod)
Pings status in Slack
Self-triages failures
Traverses 4–5 internal systems to find and patch bugs at midnight

"Probably better than humans can."

But trust is the chicken-and-egg problem: agents are trusted to pull status and suggest fixes, but not to apply fixes autonomously. The bridge is isolated environments for minimal agentic live operations, graduated to production as confidence builds.

---

What to Build

Separate code-reviewer agents: distinct from creators, with their own knowledge bases and incentives
Private eval suites: a "janky" Notion doc of inputs + expected outputs, run against every new model release
Harden internal APIs: agents will discover and misuse endpoints you thought were hidden
Support bots: absorb low-urgency, high-cardinality requests to buy platform-team time
Encode ops knowledge in skills: agent-launched jobs must fail safe and self-debug
Multi-layer kill switches: runtime cancel, identity revoke, gateway block, payment freeze, framework interrupt

"If the only way to tell your agent to stop is to tell the model to stop, you don't have a kill switch."

---

The Real Lesson

The scaling laws of the upper layers (AI) and lower layers (human) are diverging. The fix isn't a single better model. It's a multi-agent architecture where each layer has its own agent, its own knowledge base, and its own governance.

The platform team of the future is not a human team slowing things down. It's a system of agents that maintains the safety invariants while the app teams move at AI speed.

---

*This post draws from OpenAI's data platform team experience and Nate B. Jones's analysis of infrastructure governance.*

I Run a $12/Week AI Operations Team. This is the Cost Breakdown.

Enterprise AI ops costs $50K+/month. I run 5 agents, 24/7, for $12/week. The architecture, the model routing, and why cheap doesn't mean fragile.

Apr 20268 min

The Harness Is the Moat: Why Owning Your Agent Orchestration Matters More Than Model Choice

Models matter less and less. The system around the agent — the harness, the factory, the tokenomics — is where the real leverage lives. Here's what 49 missed YouTube videos taught us about agentic engineering.

May 20269 min

16 Agents, Not 1000: What Claude's Dynamic Workflows Actually Mean

Claude Opus 4.8 shipped dynamic workflows — Claude writes its own orchestration script. Here's the real concurrency limit, the cost trap, and when to use it.

May 20268 min

← All posts