Platform Bottlenecks

A platform bottleneck is the failure pattern where an AI platform team centralizes decisions instead of capabilities, so its review queue becomes the product and waiting becomes the cost. A healthy platform team exists to make repeated decisions disappear into the default path. The moment every experiment needs a ticket, a Slack ping, and a weekly exception review, the platform is no longer a platform: it is a gate with a service catalog.

What it exposes

Bottlenecks rarely announce themselves. They sound like process: “we are waiting on the platform team,” “can we make this an exception,” “we built a small internal workaround,” “the platform is a few weeks behind us.” None of these lines is fatal on its own; the problem is the pattern, once they become the normal way work gets done.

The structural warning signs: request backlogs that never get smaller, the same exception coming back under a new name, engineers building shadow infrastructure because the official path is too slow, and work that should have been standardized long ago still handled by hand. Once teams start routing around the platform, the default path has already lost.

How to use it

Measure blunt metrics instead of dashboard comfort:

  • time from request to usable platform support
  • exceptions granted per month
  • shadow systems discovered in production
  • hours spent waiting on platform review
  • AI workflows shipped without platform involvement

These tell you whether the platform is compounding or constraining. If exceptions keep rising and the team calls that “flexibility,” the default path is still too hard to use.
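A minimal sketch of tracking two of these metrics (time from request to usable support, and exceptions granted per month), assuming request records can be exported from whatever ticketing system the team already uses. The `PlatformRequest` shape, the helper names, and the "rising" rule (every month at least as many exceptions as the one before, and more at the end than at the start) are hypothetical choices for illustration, not a standard.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from statistics import median
from typing import Optional


@dataclass
class PlatformRequest:
    # Hypothetical record shape; adapt to whatever your ticket system exports.
    opened: date
    usable: Optional[date]  # None while the requester is still waiting
    is_exception: bool      # True if this was granted as an exception to the default path


def median_days_to_support(requests):
    """Median days from request to usable platform support, over resolved requests only."""
    waits = [(r.usable - r.opened).days for r in requests if r.usable is not None]
    return median(waits) if waits else None


def exceptions_per_month(requests):
    """Count exceptions per (year, month) of the request date.

    Months with zero exceptions are simply absent from the result.
    """
    counts = Counter((r.opened.year, r.opened.month) for r in requests if r.is_exception)
    return dict(sorted(counts.items()))


def exceptions_rising(monthly):
    """Simple heuristic: counts never decrease month to month and end higher than they start."""
    values = list(monthly.values())
    if len(values) < 2:
        return False
    non_decreasing = all(later >= earlier for earlier, later in zip(values, values[1:]))
    return non_decreasing and values[-1] > values[0]


# Illustrative data only, not real measurements.
sample = [
    PlatformRequest(date(2024, 1, 3), date(2024, 1, 24), False),  # waited 21 days
    PlatformRequest(date(2024, 1, 10), None, True),
    PlatformRequest(date(2024, 2, 5), date(2024, 2, 12), True),   # waited 7 days
    PlatformRequest(date(2024, 2, 20), None, True),
    PlatformRequest(date(2024, 3, 1), date(2024, 3, 29), True),   # waited 28 days
    PlatformRequest(date(2024, 3, 8), None, True),
    PlatformRequest(date(2024, 3, 15), None, True),
]

print(median_days_to_support(sample))  # 21
monthly = exceptions_per_month(sample)
print(monthly)                         # {(2024, 1): 1, (2024, 2): 2, (2024, 3): 3}
print(exceptions_rising(monthly))      # True
```

Note that the median only covers resolved requests; requests still waiting are the ones most worth watching, so a real report would also count how many are open and for how long.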

Then redesign the team around capabilities, not control. Own the hard parts once — identity and access patterns, model routing defaults, evaluation harnesses, logging and traceability, safe deployment templates, fallback behavior — and get out of the way. Bias toward self-service, make safe defaults boring, and track the cost of waiting as carefully as the cost of infrastructure. The test: a platform team should remove waiting, not become a waiting room.


Questions

How do you know an AI platform team has become a bottleneck?

Watch for the recurring lines — “we are waiting on the platform team,” “can we make this an exception” — plus backlogs that never shrink, repeat exceptions under new names, and shadow infrastructure appearing because the official path is too slow.

How do you fix a platform bottleneck?

Move decisions out of the queue and into the platform. Build paved roads for the hard parts — identity, routing defaults, eval harnesses, logging, safe deployment, fallback — so the safe path is easier than the unsafe one, and stop blessing every new use case.

Should you shrink the platform team to fix it?

No. The answer is not to shrink the team and hope demand goes away; it is to redesign the team around capabilities instead of control, so repeated decisions disappear into the default path.