We started running AI-assisted code review on PRs about three months ago: first on one project, then on a few internal Go services. I was skeptical going in – my NATO background gave me a healthy distrust of automated security tools that promise more than they deliver. But I wanted to give it an honest shot.
Here’s where I landed: it’s useful. It’s just not what the marketing says it is.
The stuff it’s genuinely good at
Pattern matching. That’s the core strength, and it’s not nothing. Across our Go codebases, the AI reviewer consistently catches:
- Unchecked errors. Go makes this easy to miss, and the AI never gets tired of pointing it out. Worth it for this alone, honestly.
- Resource leaks. Deferred closes that should happen but don’t. Missing context cancellation.
- Naming inconsistencies. It remembers the conventions better than most humans on the team.
- Import ordering. Boring but useful. It catches what goimports misses when people configure their editors differently.
It’s basically a very thorough linter that can read English comments. For the mechanical stuff, it saves real time. The junior devs on one of my teams told me it cut their “stupid mistake” PR cycles in half. I believe them.
The stuff it confidently gets wrong
Here’s where it gets interesting: the AI has no idea why code exists. It can tell you code has a race condition, but it can’t tell you that the race condition is a known trade-off the team accepted because the alternative was a 3x latency hit.
Real examples from the last month:
- It flagged a “redundant” nil check that was actually guarding against a known upstream bug we hadn’t fixed yet. Removing it would have caused a production incident.
- It suggested refactoring a function that was intentionally verbose because three different teams needed to understand it during an incident.
- It recommended moving to a newer API version that had a subtle breaking change in our edge case. The model had no idea about our specific integration constraints.
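The nil-check incident is worth a concrete sketch. All the names here are invented, but the shape is real: an upstream client can return a nil result alongside a nil error under a known bug, and the "redundant" check is the only thing standing between that and a panic:

```go
package main

import "fmt"

// Quote is what a hypothetical upstream pricing client returns.
type Quote struct {
	Price int
}

// fetchQuote simulates the known upstream bug: on a miss it returns
// (nil, nil) instead of a non-nil error. (Invented for illustration.)
func fetchQuote(id string) (*Quote, error) {
	if id == "missing" {
		return nil, nil // the bug we hadn't fixed yet
	}
	return &Quote{Price: 100}, nil
}

// priceFor contains the guard the AI flagged as "redundant".
func priceFor(id string) (int, error) {
	q, err := fetchQuote(id)
	if err != nil {
		return 0, err
	}
	// The AI's reasoning: "q cannot be nil here if err is nil."
	// Except, with this upstream, it can.
	if q == nil {
		return 0, fmt.Errorf("upstream returned no quote for %s", id)
	}
	return q.Price, nil
}

func main() {
	p, _ := priceFor("widget")
	fmt.Println(p) // prints 100
	_, err := priceFor("missing")
	fmt.Println(err != nil) // prints true: guarded, no panic
}
```

Judged purely on the diff, the AI was right. Judged against the system, it was proposing a production incident.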
The pattern is consistent: AI reviews the diff. Humans review the context. These are different jobs.
How we actually use it
Two-pass review. Non-negotiable.
- AI runs when the PR opens. Posts a summary and flags. All non-blocking.
- Human reviewer looks at the PR with the AI comments as background context. They can dismiss, agree, or dig deeper.
The critical rule: AI comments are suggestions, not approvals. A clean AI report means nothing about whether the change is safe to merge. I’ve seen clean AI reports on PRs that would have taken down a production service.
We also explicitly exclude certain paths from AI review: auth code, cryptographic operations, permission logic. The cost of a confident-but-wrong suggestion in those areas is too high. A developer reads “looks good” from the AI and their guard drops. That’s the real danger.
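The exclusion rule itself lives in the tool's config, but the logic is simple enough to sketch in Go. The path prefixes below are examples standing in for our real layout:

```go
package main

import (
	"fmt"
	"strings"
)

// excluded reports whether a changed file should be withheld from AI
// review entirely. (Illustrative prefixes; the real list is in our
// reviewer's configuration.)
func excluded(path string) bool {
	blocked := []string{
		"internal/auth/",
		"internal/crypto/",
		"internal/permissions/",
	}
	for _, prefix := range blocked {
		if strings.HasPrefix(path, prefix) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(excluded("internal/auth/token.go"))      // prints true
	fmt.Println(excluded("internal/billing/invoice.go")) // prints false
}
```

The point of making this a hard filter rather than a judgment call: in those paths, no AI comment appears at all, so there is no "looks good" to lower anyone's guard.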
Noise is the killer
The single biggest problem isn’t accuracy. It’s noise. If the tool posts 15 comments and 12 are trivial style nitpicks, developers stop reading the other 3. I’ve seen this happen on every team that adopts these tools without tuning.
Our fix: aggressive filtering. We configured the tool to only surface medium-and-above severity issues. Style enforcement stays in the linter where it belongs. The AI reviewer gets to have opinions about logic, error handling, and security patterns. That’s it.
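The filtering rule is a one-liner in spirit. A Go sketch, with an invented severity scale and comment type (real tools define their own):

```go
package main

import "fmt"

// Severity levels, lowest to highest. (Illustrative scale.)
const (
	Style = iota
	Low
	Medium
	High
)

// Comment is a reviewer finding. (Invented type for illustration.)
type Comment struct {
	Severity int
	Text     string
}

// surface keeps only medium-and-above findings - the rule that cut
// our comment volume by roughly 70%.
func surface(all []Comment) []Comment {
	var kept []Comment
	for _, c := range all {
		if c.Severity >= Medium {
			kept = append(kept, c)
		}
	}
	return kept
}

func main() {
	in := []Comment{
		{Style, "rename this variable"},
		{Medium, "error from Close() is ignored"},
		{High, "user input reaches the SQL query unescaped"},
	}
	for _, c := range surface(in) {
		fmt.Println(c.Text) // prints the two non-style findings
	}
}
```

Everything below the threshold still gets caught – by the linter, in CI, where nobody has to read it.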
This reduced comment volume by about 70% and increased the rate at which developers actually engaged with the remaining comments. Less is more. The boring lesson, again.
The honest assessment
AI code review saves maybe 15-20 minutes per PR cycle on mechanical issues. It doesn’t save any time on the hard reviews – the ones involving architecture decisions, performance trade-offs, or cross-service implications. Those still take the same amount of time, and they should.
If you’re considering adding AI review to your workflow, go in with clear expectations. It’s a tireless pattern matcher. It’s not a senior engineer. Configure it tight, keep it non-blocking, and don’t let anyone treat a clean AI report as a substitute for thinking.
The best code review tool is still someone who understands the system, the users, and the trade-offs. AI just helps them focus on the parts that matter.