AI Pair Programming: It's a Junior Dev, Not a Wizard

| 4 min read |
ai coding pair-programming productivity

AI coding assistants are useful when you treat them like a fast, literal junior teammate. Give them constraints, review their output, and stop expecting architectural insight.

I pair with AI every day: building production systems, contributing to Go, and prototyping new ideas. It’s part of my workflow the same way version control and testing are – not because it’s magical, but because it’s useful when you know its limits.

The teams I’ve seen get the most value from AI coding assistants treat them the same way: like a fast, literal junior developer. Emphasis on literal. The model does exactly what you ask, fills in gaps with plausible guesses, and never tells you when your approach is wrong. That’s the mental model that keeps you productive without getting burned.

Where It Shines

AI assistants are excellent at work that’s well-scoped and pattern-driven: the kind of task where you know exactly what the output should look like but don’t want to type it all out.

  • Boilerplate generation.
  • Test scaffolding from existing patterns.
  • Translating a clear spec into working code.
  • Exploring how an unfamiliar API works.
  • Refactoring repetitive code paths into a cleaner abstraction, when you already know what that abstraction should be.
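As a concrete instance of test scaffolding: show the assistant one table-driven Go test and it can stamp out the rest. A minimal sketch, where `clamp` is a hypothetical function standing in for whatever you’re actually testing:

```go
package main

import "testing"

// clamp is a hypothetical function under test, used only to
// illustrate the table-driven shape.
func clamp(v, lo, hi int) int {
	if v < lo {
		return lo
	}
	if v > hi {
		return hi
	}
	return v
}

// TestClamp is the kind of scaffold an assistant drafts quickly once
// it has seen one existing table-driven test in the codebase.
func TestClamp(t *testing.T) {
	tests := []struct {
		name      string
		v, lo, hi int
		want      int
	}{
		{"below range", -5, 0, 10, 0},
		{"in range", 5, 0, 10, 5},
		{"above range", 15, 0, 10, 10},
	}
	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			if got := clamp(tt.v, tt.lo, tt.hi); got != tt.want {
				t.Errorf("clamp(%d, %d, %d) = %d, want %d",
					tt.v, tt.lo, tt.hi, got, tt.want)
			}
		})
	}
}
```

The table is the part worth reviewing by hand; adding rows is the part worth delegating.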

I use it heavily for these cases and it genuinely saves hours per week. When I’m writing Go and I need a new handler that follows the same pattern as the last ten handlers, the AI drafts it in seconds. I review, adjust, and move on.

Where It Falls Apart

The moment you need architectural judgment, project history, or business context, the AI becomes dangerous. Not useless – dangerous. Because it will confidently produce something that looks right, passes a quick glance, and introduces a subtle bug or design flaw that you don’t catch until it’s in production.

Watch for these warning signs:

  • It repeats the same mistake after you correct it. The model doesn’t learn within a session the way a human colleague does. If it keeps ignoring a constraint, it probably can’t reliably hold that constraint in its current context.
  • It invents things. Functions that don’t exist. Config options that aren’t real. API endpoints it hallucinated from training data. Always verify against actual docs.
  • It optimizes for elegance over correctness. The model loves clean, compact code. Sometimes that means it refactors away an important edge case because the edge case made the code ugly.

I’ve caught all three of these in my own work. More than once.

The Loop That Works

Long, open-ended chat sessions with AI produce garbage. The context window fills up, the model loses track of constraints, and you end up in a back-and-forth that takes longer than writing the code yourself.

Short, focused loops work. Here’s the pattern I use:

  1. Define the task tightly. Inputs, outputs, constraints, existing style to match. Be specific. “Add a function that does X given Y, handling Z edge case, matching the pattern in the rest of this file.”
  2. Get a first pass. Let the AI draft it.
  3. Review critically. Not “does this look right” – trace through the logic. Check edge cases. Check error handling. Check that it respects the codebase conventions.
  4. Iterate on specific gaps. Don’t ask for a full rewrite. Point at the specific line or logic branch that’s wrong and ask for a fix.
  5. Integrate manually. Copy the code into your editor, run the tests, review the diff. The AI’s output is a draft, not a commit.

Give It Real Context

Vague prompts produce vague code. The single biggest improvement I’ve seen is upgrading from “write me a function that processes users” to something with actual constraints:

“Add a method getActiveUsers(since time.Time) to UserStore. Users are active if their LastSeen is after the given time. Return a slice sorted by LastSeen descending. If the store is empty, return nil, not an empty slice. Match the existing receiver pattern in this file.”

That level of specificity is the difference between useful output and time wasted reviewing hallucinated code.

The Trust Boundary

Here’s the line I draw: AI output is untrusted input. Same as user input. Same as data from an external API. It goes through the same gates.

  • Tests must pass.
  • Linter must pass.
  • Code review still applies. A human reads the diff.
  • Security-sensitive code gets extra scrutiny regardless of who or what wrote it.

Some teams have started rubber-stamping AI-generated code because “the AI wrote it and it looks fine.” That’s how you get vulnerabilities in production. I’ve seen it happen.

The Honest Assessment

AI pair programming makes me faster at the boring parts of writing software. It doesn’t make me better at the hard parts. Architecture decisions, security considerations, performance tradeoffs, understanding what the user actually needs – those are still entirely on me.

The developers who get the most value are the ones who already know what good code looks like. The AI accelerates their output. The developers who rely on AI to compensate for gaps in their understanding ship bugs faster.

Use it as a tool. Review its work. Keep the sessions short. And never, ever merge without reading the diff.