GraphQL in Production Is Harder Than They Tell You

Every GraphQL talk I’ve sat through follows the same script. “Look how elegant this is. One endpoint. Clients get exactly what they need. REST is dead.” And honestly? The pitch is compelling. I bought it. We adopted GraphQL at the fintech startup for our financial data API about a year ago.

I don’t regret it. But I’m annoyed at how much the community glosses over the hard parts.

The stuff that actually works

I’ll give credit where it’s due. The flexibility is real. At the fintech startup we serve mobile clients, our web app, and internal dashboards off the same schema. Before GraphQL, we had this growing mess of bespoke REST endpoints – /stories-lite, /stories-full, /stories-for-widget – each slightly different. That’s gone now.

# The mobile app asks for this
query StoryList {
  stories(first: 20) {
    id
    title
    thumbnailUrl
    score
  }
}

# The web dashboard asks for this
query StoryDetail($id: ID!) {
  story(id: $id) {
    id
    title
    content
    score
    entities {
      name
      type
      sentiment
    }
    source { name url }
  }
}

One schema. No arguments about response shapes. Frontend devs stop bugging me for new endpoints. That part is genuinely great.

The typed schema is also a real win. We catch mismatches early, generate client code from it, and the self-documenting nature via GraphiQL means I don’t have to maintain a separate API doc that’s always three months stale.

Adding fields without breaking existing clients? Chef’s kiss. Deprecation is clean. This is where GraphQL actually delivers on the promise.

The stuff nobody warns you about

Here’s where I get frustrated. The conference talks stop right about where the real work begins.

N+1 queries will eat you alive. Our first production deployment was a disaster. A single story list query was firing hundreds of database calls because every resolver was fetching independently. DataLoader isn’t optional. It’s not a nice-to-have. It’s table stakes. If you ship GraphQL without batching, you will get paged at 2am. I did.
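
To make the batching idea concrete, here is a minimal sketch in Python. It is not the real DataLoader library, just the core trick it relies on: collect the keys that resolvers ask for in the same tick, then hit the database once. BatchLoader, fetch_sources_batch and the SOURCES table are all made up for illustration.

```python
import asyncio

# Hypothetical in-memory "database"; db_calls counts round trips.
SOURCES = {1: "Reuters", 2: "Bloomberg", 3: "AP"}
db_calls = 0


async def fetch_sources_batch(ids):
    """One round trip for many keys (think: WHERE id IN (...))."""
    global db_calls
    db_calls += 1
    return [SOURCES.get(i) for i in ids]


class BatchLoader:
    """Collects keys requested before the next loop iteration and loads them together."""

    def __init__(self, batch_fn):
        self.batch_fn = batch_fn
        self.pending = {}   # key -> Future, also de-duplicates repeated keys
        self._task = None   # keep a reference so the dispatch task is not lost

    def load(self, key):
        loop = asyncio.get_running_loop()
        if key not in self.pending:
            self.pending[key] = loop.create_future()
        if self._task is None:
            # The task only starts once the current synchronous code yields,
            # so every load() made before then lands in the same batch.
            self._task = loop.create_task(self._dispatch())
        return self.pending[key]

    async def _dispatch(self):
        batch, self.pending, self._task = self.pending, {}, None
        keys = list(batch)
        values = await self.batch_fn(keys)
        for key, value in zip(keys, values):
            batch[key].set_result(value)


async def resolve_source_names(source_ids):
    loader = BatchLoader(fetch_sources_batch)
    # Each "resolver" asks for one source; the loader merges the requests.
    return await asyncio.gather(*(loader.load(i) for i in source_ids))


names = asyncio.run(resolve_source_names([1, 2, 1, 3]))
```

Four lookups, one database call. In a real server you would create one loader per request so cached values never leak between users.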

Clients can craft queries that bring your server to its knees. This was a fun one to discover. Someone on our team wrote a deeply nested query during development that pegged our API server at 100% CPU for thirty seconds. In production, with real users, that’s a denial-of-service vector. You need depth limits, complexity scoring, and cost budgets from day one. Not day thirty. Day one.
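
A depth limit is the cheapest of those three guards. The sketch below is deliberately naive: it counts braces in the raw query string instead of walking a parsed AST, so input-object literals in arguments would inflate the count. A real server should do this as a validation rule in its GraphQL library. MAX_DEPTH and the related field are assumptions for the example, not part of our schema.

```python
MAX_DEPTH = 5  # hypothetical budget; tune it to your own schema


def query_depth(query: str) -> int:
    """Return the deepest brace nesting in the query, ignoring string contents."""
    depth = max_depth = 0
    in_string = False
    prev = ""
    for ch in query:
        if ch == '"' and prev != "\\":
            in_string = not in_string
        elif not in_string:
            if ch == "{":
                depth += 1
                max_depth = max(max_depth, depth)
            elif ch == "}":
                depth -= 1
        prev = ch
    return max_depth


def check_depth(query: str, limit: int = MAX_DEPTH) -> int:
    """Raise before execution if the query nests deeper than the limit."""
    depth = query_depth(query)
    if depth > limit:
        raise ValueError(f"query depth {depth} exceeds limit {limit}")
    return depth


SHALLOW = 'query StoryList { stories(first: 20) { id title score } }'
# "related" is a hypothetical self-referencing field, the kind that enables abuse.
DEEP = '{ story(id: "1") { related { related { related { related { title } } } } } }'

shallow_depth = check_depth(SHALLOW)
try:
    check_depth(DEEP)
    deep_rejected = False
except ValueError:
    deep_rejected = True
```

Depth alone won’t stop a wide query that asks for first: 10000 at every level, which is why complexity scoring and cost budgets still matter.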

Caching is a pain. With REST, you slap a Cache-Control header on a GET endpoint and CDNs just work. GraphQL clients send POST by default. For everything. So your CDN is useless out of the box. We ended up implementing persisted queries – basically pre-registering allowed queries with stable hashes so a request can be identified by its hash and cached. It works, but it’s a bunch of extra machinery the blog posts never mention.
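
The registry behind persisted queries is not much code. This is a rough sketch of the idea, not any particular library’s protocol: PersistedQueryStore and its methods are hypothetical names, and SHA-256 is simply one reasonable choice of stable hash.

```python
import hashlib


class PersistedQueryStore:
    """Maps a stable hash to a pre-registered query string."""

    def __init__(self):
        self._by_hash = {}

    def register(self, query: str) -> str:
        """Called at build or deploy time for every query the clients ship with."""
        digest = hashlib.sha256(query.encode("utf-8")).hexdigest()
        self._by_hash[digest] = query
        return digest

    def lookup(self, digest: str) -> str:
        """Called per request; anything that was never registered is refused."""
        try:
            return self._by_hash[digest]
        except KeyError:
            raise PermissionError("unknown persisted query") from None


store = PersistedQueryStore()
STORY_LIST = "query StoryList { stories(first: 20) { id title thumbnailUrl score } }"
story_list_hash = store.register(STORY_LIST)

# At request time the client sends only the hash (plus variables).
resolved = store.lookup(story_list_hash)
```

Because the client now sends a short, stable identifier instead of a query body, the request can be made as a GET and the hash becomes part of the cache key.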

Monitoring is misleading. Every request hits POST /graphql. Your dashboards show one endpoint with an average latency. Meaningless. You need field-level tracing to figure out that the sentiment resolver is the one taking 800ms, not the title resolver. We wired up Apollo Tracing and suddenly understood what was actually happening. Before that we were guessing.
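
If you want to see what field-level tracing buys you before wiring up a real tracer, a decorator gets you surprisingly far. This is a toy sketch, not what Apollo Tracing does internally; traced, field_timings and the two resolvers are invented for the example, and the sleep stands in for a slow downstream call.

```python
import time
from collections import defaultdict
from functools import wraps

# "Type.field" -> list of observed durations in seconds
field_timings = defaultdict(list)


def traced(field_path):
    """Record how long a resolver takes, keyed by its schema path."""
    def decorator(resolver):
        @wraps(resolver)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return resolver(*args, **kwargs)
            finally:
                # Recorded even when the resolver raises.
                field_timings[field_path].append(time.perf_counter() - start)
        return wrapper
    return decorator


@traced("Story.title")
def resolve_title(story):
    return story["title"]


@traced("Entity.sentiment")
def resolve_sentiment(entity):
    time.sleep(0.05)  # stand-in for a slow model or service call
    return 0.7


def slowest_field():
    """Return the field path with the highest average duration."""
    return max(
        field_timings,
        key=lambda path: sum(field_timings[path]) / len(field_timings[path]),
    )


resolve_title({"title": "Rates hold steady"})
resolve_sentiment({"name": "ACME"})
culprit = slowest_field()
```

The point is the key: timings are grouped by schema path rather than by URL, which is the only grouping that means anything when every request hits the same endpoint.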

Error handling is weird. Most GraphQL servers return HTTP 200 even when a resolver blows up, and put the failure in an errors array in the response body. Partial success is a feature, supposedly. Your monitoring that alerts on 5xx status codes? Blind. Your clients need to parse error arrays and handle partial data. It’s doable, but it’s a different mental model and every new developer on the team trips over it.
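
Since the status code tells you nothing, both clients and alerting have to look inside the body. Here is a small sketch of that check. The data and errors keys, and the path on each error, come from the GraphQL response format; classify_response, error_paths and the sample body are made up for the example.

```python
def classify_response(body: dict) -> str:
    """Classify a decoded GraphQL response body as 'ok', 'partial' or 'failed'."""
    errors = body.get("errors") or []
    if not errors:
        return "ok"
    if body.get("data") is None:
        return "failed"   # nothing usable came back
    return "partial"      # some fields resolved, some did not


def error_paths(body: dict) -> list:
    """Return dotted field paths for each error, useful as metric labels."""
    paths = []
    for err in body.get("errors") or []:
        path = err.get("path") or []
        paths.append(".".join(str(part) for part in path))
    return paths


# A hypothetical HTTP 200 response where one nested resolver failed.
sample = {
    "data": {
        "story": {
            "id": "42",
            "title": "Rates hold steady",
            "entities": [{"name": "ACME", "sentiment": None}],
        }
    },
    "errors": [
        {
            "message": "sentiment service timed out",
            "path": ["story", "entities", 0, "sentiment"],
        }
    ],
}

status = classify_response(sample)
failed_fields = error_paths(sample)
```

Counting responses by that status, and alerting on the partial and failed ones, gives you back roughly what the 5xx alert used to.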

Schema design mistakes haunt you. I named a field wrong in month two. We’re still living with it. Renaming a field in a public GraphQL schema is technically possible but practically painful: you add the new name, deprecate the old one, and then wait for every client to move, which is slow when you don’t control all the clients. Think hard about naming and pagination patterns upfront. Harder than you think you need to.

What I’d tell you before you start

Design your schema from real client use cases. Not from your database tables. Not from what looks clean in a presentation. Sit with the frontend devs and figure out what they actually need.

Set up DataLoader before you write your second resolver. Enforce query depth and complexity limits at the gateway before you launch. Add field-level tracing before you need it, because by the time you need it, you’re already in an incident.

Use persisted queries. They help with caching and they help with security – as long as the server rejects anything that isn’t registered, you’re basically allowlisting the queries your clients can run.

And have a deprecation policy. Write it down. With timelines. Because you will want to change your schema, and without a policy you’ll just accumulate dead fields forever.

Should you use it?

If you have multiple clients with different data needs, a connected domain model, and frontend teams that want autonomy – yes. It’s good. It solved real problems for us at the fintech startup.

If you have one client, simple CRUD, or you can’t invest in the operational tooling? Stick with REST. Seriously. GraphQL without the supporting infrastructure is worse than REST, not better.

The technology works. But the gap between “GraphQL in a tutorial” and “GraphQL in production” is wider than anyone at a conference will admit.
