I Tracked My AI-Assisted Coding for Three Months. Here Are the Numbers.

5 min read
ai developer-tools productivity copilot

After three months of tracking Copilot and GPT-4 usage across real projects, the productivity picture is messier than the marketing suggests.

Quick take

AI coding assistants are a genuine productivity boost for boilerplate, tests, and documentation. They’re a net negative for security-critical code, debugging, and architecture. My tracked numbers: about 25% faster on scaffolding tasks, roughly unchanged on complex work, and measurably worse review quality when I got lazy about checking suggestions. The tool isn’t the bottleneck. Your discipline is.

I’ve been using Copilot and GPT-4 daily since the summer. Not casually – I tracked it. Time to complete tasks, acceptance rates, bugs introduced, review time. Three months of data across production work and personal Go projects. Here is what I found.

The Before/After Numbers

Boilerplate and glue code: 25-30% faster. This is where AI assistants shine. Repetitive struct definitions, HTTP handler wiring, error wrapping patterns. I’d write a comment describing what I needed, accept the suggestion, and move on. For familiar Go patterns, the hit rate was high.

Test scaffolding: 20-25% faster on initial draft, but only about 10% faster end-to-end. The assistant generates test structure quickly, but the assertions are often wrong in subtle ways. Edge cases get missed. Table-driven test cases sound complete but have gaps. I spent the saved time on review.
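
As an illustration of what "sounds complete but has gaps" looks like, here is a hypothetical table-driven scaffold of the shape an assistant drafts (clamp and the case names are invented for this example):

```go
package main

import "fmt"

// clamp is a stand-in function under test.
func clamp(v, lo, hi int) int {
	if v < lo {
		return lo
	}
	if v > hi {
		return hi
	}
	return v
}

func main() {
	// The assistant generates this table instantly — and it passes.
	// What it quietly omits: lo > hi, lo == hi, negative ranges.
	// Those are the gaps that ate the time the scaffold saved.
	cases := []struct {
		name      string
		v, lo, hi int
		want      int
	}{
		{"below range", -5, 0, 10, 0},
		{"in range", 5, 0, 10, 5},
		{"above range", 15, 0, 10, 10},
	}
	for _, c := range cases {
		if got := clamp(c.v, c.lo, c.hi); got != c.want {
			fmt.Printf("FAIL %s: got %d, want %d\n", c.name, got, c.want)
			return
		}
	}
	fmt.Println("all cases pass")
}
```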

Documentation and comments: 30-40% faster for first drafts. This surprised me. The assistant is genuinely good at turning code into readable explanations. I still edit everything – the tone is always too corporate – but having a draft to edit beats staring at a blank docstring.

API exploration: Useful but unreliable. When I was learning a new library, the assistant could suggest plausible usage patterns faster than I could read docs. But “plausible” isn’t “correct.” I caught three bugs in one week that came from hallucinated API behavior. The methods existed but the parameters were wrong.

Debugging: No improvement. Often negative. When I’m tracking down a subtle concurrency bug in Go, the last thing I need is another layer of confident guesses. The assistant doesn’t understand the runtime behavior. It pattern-matches against the syntax and suggests fixes that look reasonable but miss the actual problem.

Architecture and design: Not applicable. I never even tried. These decisions require context the model doesn’t have – team capabilities, product constraints, operational history, business timeline. Using an AI assistant for architecture is like asking autocomplete to write your strategy doc.

The Review Tax

Here is the number that matters most: review time increased by about 15% across all assisted code.

This is counterintuitive. The tool saves time writing code but costs time reviewing it. The code looks plausible. It follows patterns. It compiles. And sometimes it’s wrong in ways that are hard to spot because the style is correct but the logic isn’t.

I caught myself rubber-stamping suggestions twice in the first month. Both times introduced bugs. After that I adopted a rule: treat every AI suggestion like a PR from a new hire. Read it line by line. Question the edge cases. Check the error handling.

The net effect: faster for writing, slower for review, roughly neutral for total time on complex tasks, and genuinely faster for simple ones.

Where I Won’t Use It

I maintain a hard no-go list. Not because the assistant can’t generate code in these areas – it can, and that’s the problem.

  • Authentication and authorization. A subtle bug here is a security vulnerability. The cost of a mistake is too high relative to the time saved.
  • Cryptography. Just no. The assistant will confidently suggest insecure defaults.
  • Financial calculations at a fintech company. When you’re dealing with ledger operations, “close enough” isn’t a thing. Off-by-one errors in money are lawsuits.
  • Concurrency primitives. Go’s concurrency model is subtle. The assistant doesn’t understand happens-before relationships. It generates code that looks like it uses channels correctly but has race conditions.
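
To show what "looks like it uses channels correctly" means, here is a hypothetical sketch of the bug class (the racy version in the comment is the kind of thing an assistant suggests; sumSquares is an invented example of the correct shape):

```go
package main

import (
	"fmt"
	"sync"
)

// The plausible-but-racy version puts wg.Add inside each goroutine:
//
//	for i := 0; i < n; i++ {
//	    go func() {
//	        wg.Add(1)          // race: may run after wg.Wait() returns
//	        defer wg.Done()
//	        results <- i * i   // pre-Go 1.22: also captures the loop var
//	    }()
//	}
//
// It compiles, it often passes a quick run, and it is wrong.
// The correct pattern registers with the WaitGroup before spawning:
func sumSquares(n int) int {
	var wg sync.WaitGroup
	results := make(chan int, n)
	for i := 0; i < n; i++ {
		wg.Add(1) // register before the goroutine starts
		go func(i int) {
			defer wg.Done()
			results <- i * i
		}(i)
	}
	wg.Wait()
	close(results) // safe: every sender has finished
	total := 0
	for v := range results {
		total += v
	}
	return total
}

func main() {
	fmt.Println(sumSquares(5)) // 0+1+4+9+16 = 30
}
```

The two versions differ by one line placement, which is exactly why this class of bug survives a casual review.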

What I Tell My Teams

I’ve been rolling this out gradually across teams I’ve worked with. The approach that works:

Start with opt-in, not mandates. Let people try it on low-risk tasks. Boilerplate, test scaffolding, documentation. No pressure.

Keep review standards unchanged. The bar for merging code doesn’t drop because an AI wrote the first draft. If anything, review should be more careful because the failure mode is “plausible but wrong.”

Track what you care about. Completion time. Bug rate. Review churn. Developer satisfaction after the novelty wears off – check in at 30 and 60 days, not just the first week when everyone is excited.

Be explicit about boundaries. Write down where the team will and won’t use it. Auth code, crypto, and permission logic are default no-go zones. Make this a team decision, not a personal preference.

The Honest Assessment

AI coding assistants are good. They aren’t transformative. They save real time on the boring parts of programming and zero time on the hard parts. The 10x productivity claims are marketing. The real number, in my experience, is something like 1.15x to 1.25x on the tasks where it helps, and 1.0x or worse on the tasks where it doesn’t.

The developers who benefit most are the ones who were already disciplined about code review and testing. The tool amplifies your existing workflow. If your workflow is “accept suggestion and ship,” you’re going to have a bad time.

Use it for drafts. Use it for repetition. Keep your review standards. That’s the entire playbook.