How to Review AI-Generated Code Like a Senior Engineer (And Why Ford Just Relearned This Lesson)

Ford Thought AI Alone Could Ship Code. It Couldn’t.

Last week, Ford’s CTO said something you don’t hear often from a Fortune 500 company: they made a mistake. A big one. “Mistakenly we thought that by just introducing artificial intelligence…that would produce a high-quality product,” he admitted to TechCrunch. So Ford is now doing something unusual — they’re actively rehiring veteran engineers. The “gray beards” who retired or were let go during the AI hype cycle.

Code displayed on a computer monitor — *Image: Fæ via Wikimedia Commons (CC0)*

That confession hits different when you’re a developer who’s been staring at AI-generated pull requests for the last year. Because here’s the thing Ford learned the hard way: AI writes code fast. But fast code isn’t the same as good code. And catching the difference — that’s still very much a human skill.

I’ve been using AI coding assistants daily for over a year now. Copilot, Claude, local models through Ollama — I’ve tried most of them. And I’ll tell you straight up: they’re incredible tools. But the code they produce needs a specific kind of review. Not the regular code review you’d give a colleague. Something sharper.

Let me show you what I mean.

The Three Ways AI-Generated Code Goes Wrong

Before we talk about how to review AI code, we need to understand what we’re looking for. AI models don’t make mistakes the way junior developers do. Their errors are weirder. More subtle. And sometimes more dangerous.

1. The Confident Wrong Answer

This is the one everyone talks about. The AI generates code that looks correct — proper indentation, reasonable variable names, decent comments — but does something completely wrong. I hit this last month asking for a file watcher in Python. The AI gave me a beautifully structured solution using watchdog. Except it imported a class that doesn’t exist in that library. It confidently invented an API.

The code read cleanly. The logic flowed. It would have failed with an ImportError the moment the module loaded, but only after I’d already trusted it enough to push it.

What to look for: Check every import, every library call, every function signature against the actual documentation. Don’t assume a familiar-looking API is real. AI models autocomplete what sounds like valid code, not what is valid code.
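A minimal sketch of that import check, assuming you run it inside the same virtual environment as the project. The function name find_missing_imports and the sample snippet are illustrative, and PathWatcher is a deliberately made-up name standing in for a hallucinated class. Note that resolving an import executes the imported module's top-level code, so only point this at dependencies you already trust.

```python
import ast
import importlib


def find_missing_imports(source: str) -> list[str]:
    """Return imported modules or names that cannot be resolved in this environment."""
    missing = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                try:
                    importlib.import_module(alias.name)
                except ImportError:
                    missing.append(alias.name)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            try:
                module = importlib.import_module(node.module)
            except ImportError:
                missing.append(node.module)
                continue
            for alias in node.names:
                if alias.name == "*" or hasattr(module, alias.name):
                    continue
                # The name may be a submodule rather than an attribute.
                try:
                    importlib.import_module(f"{node.module}.{alias.name}")
                except ImportError:
                    missing.append(f"{node.module}.{alias.name}")
    return missing


# Illustrative "generated" snippet: two real imports, two invented ones.
generated = """
import json
from pathlib import Path, PathWatcher
from os import path
import not_a_real_package_xyz
"""
print(find_missing_imports(generated))
```

Here the two invented names are reported and the real ones pass. A check like this only proves that a name exists. It says nothing about whether the call signature or behavior matches what the generated code assumes, so the documentation read is still needed.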

2. The Security Blind Spot

This one cost me real debugging hours. I asked an AI to generate a script that reads user uploads from a directory and processes them. Simple enough. It produced clean Python that worked perfectly — in my test environment. What I didn’t notice on the first pass: it was using os.system() with user-supplied filenames. No sanitization. No subprocess separation. Just raw command injection waiting to happen.

AI models trained on public GitHub repos have seen a lot of code. Including a lot of insecure code. They don’t distinguish between “this pattern appears frequently” and “this pattern is safe.” That’s your job.

What to look for: Shell commands, SQL queries, file paths built from user input, hardcoded secrets, missing input validation. If the AI wrote a function that touches the filesystem, network, or database — assume it’s insecure until you’ve verified otherwise.
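To make the os.system() problem concrete, here is a hedged sketch of the safer shape: confine the path to the upload directory, then pass an argument list to subprocess.run so no shell ever parses the filename. The function names and the byte-counting child command are stand-ins for whatever processing your script really does.

```python
import subprocess
import sys
from pathlib import Path

# Stand-in for the real processing step: a child process that prints a file's size.
COUNT_BYTES = "import sys; print(len(open(sys.argv[1], 'rb').read()))"


def resolve_upload(upload_dir: Path, filename: str) -> Path:
    """Return the resolved path, refusing anything outside upload_dir."""
    base = upload_dir.resolve()
    candidate = (base / filename).resolve()
    if not candidate.is_relative_to(base):  # Python 3.9+
        raise ValueError(f"refusing path outside upload dir: {filename!r}")
    return candidate


def process_upload(upload_dir: Path, filename: str) -> str:
    path = resolve_upload(upload_dir, filename)
    # Risky pattern: os.system(f"tool {filename}") hands the name to a shell,
    # so a file called "x; rm -rf ~" would run a second command.
    # Safer: an argument list is passed to the program as-is, with no shell involved.
    result = subprocess.run(
        [sys.executable, "-c", COUNT_BYTES, str(path)],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

With this shape, a file literally named "a; echo pwned.txt" is treated as one odd filename, and "../../etc/passwd" is rejected before any process starts.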

3. The Context Amnesia

This one is subtle because the code is technically correct. The AI generates a function that works perfectly in isolation. But it doesn’t fit your codebase. Wrong error handling pattern. Different naming convention. Returns a dict when the rest of your code expects a dataclass. It solves the literal problem you described without understanding the system it’s being plugged into.

I see this constantly with larger codebases. The AI writes a helper function that looks great, but it duplicates logic that already exists in a utility module three directories away. Or it reimplements a pattern the team agreed to stop using six months ago. The AI doesn’t know your team’s history. It doesn’t know your architectural decisions. It just knows the prompt you gave it.

What to look for: Consistency with existing patterns. Duplicate logic. Import paths that don’t match your project structure. Error types that don’t match what the calling code expects to catch.
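The dict-versus-dataclass mismatch is easy to picture in a few lines. Everything below is hypothetical (UserRecord, ai_generated_lookup, send_welcome); it only illustrates how a helper can be correct in isolation and still break the callers around it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UserRecord:
    """Hypothetical type the existing codebase passes around."""
    user_id: int
    email: str


def ai_generated_lookup(user_id: int) -> dict:
    # Works in isolation, but returns a dict instead of the project's type.
    return {"id": user_id, "email": f"user{user_id}@example.com"}


def lookup_user(user_id: int) -> UserRecord:
    # Reviewed version: same logic, adapted to the type callers expect.
    raw = ai_generated_lookup(user_id)
    return UserRecord(user_id=raw["id"], email=raw["email"])


def send_welcome(user: UserRecord) -> str:
    # Existing caller: relies on attribute access, not dict keys.
    return f"Welcome, {user.email}"


print(send_welcome(lookup_user(7)))
```

Passing the raw dict straight into send_welcome raises AttributeError at the call site, far from the function that caused it. A type checker would flag the mismatch earlier, provided the project’s functions are annotated.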

A Practical Review Checklist for AI-Generated Code

After a year of reviewing AI-generated PRs, I’ve settled into a rhythm. Here’s the checklist I actually use — not a theoretical one, but what I run through before I hit merge.

Pass 1: The Five-Second Sanity Check

Does this code actually solve the problem described in the prompt?
Are there any obvious hallucinations — imports that don’t exist, functions that were never defined?
Does the overall approach make sense, or did it take a weird detour?

In my experience, this pass catches roughly 30% of AI mistakes. If the approach is wrong, don’t bother with a detailed review. Fix the prompt and regenerate.

Pass 2: Security and Correctness

Any unsanitized user input reaching shell commands, SQL, or file paths?
Hardcoded credentials, tokens, or API keys?
Error handling that swallows exceptions silently?
Race conditions — especially in async code or file operations?
Are the right data types being used at every boundary (input, processing, output)?

This is where I spend most of my review time. AI code tends to be correct in the happy path and completely unguarded at the edges.
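One item on that list, silently swallowed exceptions, is mechanical enough to script. The sketch below is one possible approach using the standard ast module, not a replacement for a real linter. The function name and the sample code are illustrative.

```python
import ast


def find_swallowed_exceptions(source: str) -> list[int]:
    """Return line numbers of except blocks whose body is only `pass` or `...`."""
    flagged = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.ExceptHandler):
            continue
        only_noop = all(
            isinstance(stmt, ast.Pass)
            or (
                isinstance(stmt, ast.Expr)
                and isinstance(stmt.value, ast.Constant)
                and stmt.value.value is Ellipsis
            )
            for stmt in node.body
        )
        if only_noop:
            flagged.append(node.lineno)
    return sorted(flagged)


# Illustrative input: the first handler hides the error, the second re-raises it.
sample = '''\
def load(path):
    try:
        return open(path).read()
    except OSError:
        pass

def parse(text):
    try:
        return int(text)
    except ValueError as exc:
        raise RuntimeError("bad input") from exc
'''
print(find_swallowed_exceptions(sample))
```

On the sample, only the handler on line 4 is flagged. Handlers that log and continue, or that return a default value, are not caught by this sketch and still need a human to decide whether hiding the error is acceptable.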

Pass 3: Integration Fit

Does this follow the project’s existing patterns and conventions?
Are there existing utility functions or modules that already do this?
Do the imports match the project’s dependency structure?
Is the logging/error handling consistent with the rest of the codebase?

This pass is where senior devs earn their salary. The AI doesn’t know your codebase. You do. Use that.

When to Trust AI (And When Absolutely Not To)

Look, I’m not saying don’t use AI coding tools. I use them every day. The key is knowing which tasks they’re good at and which ones need full human attention.

Good Candidates for AI Assistance

Boilerplate and scaffolding. CRUD endpoints, configuration files, test stubs. AI crushes repetitive, pattern-based work.
Documentation and docstrings. Let AI draft the first version, then edit for accuracy and clarity.
Regex and string manipulation. AI is surprisingly good at regex — better than most humans at getting the escaping right on the first try.
Translation between frameworks. “Rewrite this Express.js middleware in FastAPI.” AI handles syntax translations well.
Exploratory brainstorming. “Show me three different ways to structure this data pipeline.” The AI won’t pick the best one, but it’ll surface options you might not have considered.

Keep Full Human Control

Authentication and authorization. Never let AI generate your auth logic. Too many ways to get it subtly wrong.
Cryptography. AI models have seen a lot of cryptography code. They’ve also seen a lot of broken cryptography code. They can’t tell the difference.
Financial calculations. Floating point rounding, currency conversions, tax logic — these need human verification. A wrong decimal place isn’t a bug, it’s a liability.
Business logic that defines your product. If getting this wrong would make your company look bad, you review every line yourself. AI can draft it, but the final pass is yours.
Infrastructure as Code affecting production. Terraform, Kubernetes manifests, database migrations. Test in staging, review twice, deploy manually the first time.
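The financial calculations point is worth one concrete look, because the float version of this code reads as perfectly reasonable in review. A minimal sketch using the standard decimal module follows. The line_total function and the half-up rounding rule are assumptions for illustration, since real tax and currency rules vary by jurisdiction.

```python
from decimal import Decimal, ROUND_HALF_UP

# Binary floats cannot represent most decimal fractions exactly.
print(0.1 + 0.2 == 0.3)     # False
print(round(2.675, 2))      # 2.67, because 2.675 is stored slightly below its decimal value


def line_total(unit_price: str, quantity: int, tax_rate: str) -> Decimal:
    """Compute a taxed total in exact decimal arithmetic, rounded to whole cents."""
    subtotal = Decimal(unit_price) * quantity
    total = subtotal * (Decimal("1") + Decimal(tax_rate))
    # Assumed rounding rule; confirm the one your domain actually requires.
    return total.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


print(line_total("19.99", 3, "0.0825"))                                   # 64.92
print(Decimal("2.675").quantize(Decimal("0.01"), rounding=ROUND_HALF_UP))  # 2.68
```

Amounts are passed as strings on purpose: Decimal(19.99) would inherit the float’s representation error, while Decimal("19.99") is exact.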

Building a Review Workflow That Actually Works

The biggest trap with AI coding tools isn’t bad code — it’s review fatigue. When AI can generate 500 lines in 10 seconds, your brain gets tired of reviewing by line 100. You start skimming. And skimming is where the bugs slip through.

Here’s what’s worked for me:

Limit AI output size. I never ask for more than 50-80 lines at a time. Yes, it means more prompts. But each chunk gets a real review instead of a tired skim. The alternative, asking for a 300-line module and “reviewing” it, is the same shortcut in miniature that Ford is now rehiring engineers to undo.

Review before you run. This sounds obvious, but I see developers paste AI code into their editor and hit Run immediately. That’s testing by compiler. Read the code first. Understand what it’s doing. Then run it. If you can’t explain the code line by line, you shouldn’t be running it — and you definitely shouldn’t be committing it.

Pair AI with automated checks. Before you even start reviewing, let your toolchain catch the obvious stuff. Linters, type checkers, and static analysis tools don’t get tired. They’ll flag unused imports, type mismatches, and security anti-patterns while you focus on logic and design. I’ve talked before about setting up pre-commit hooks — they’re even more valuable when your co-author is an AI.
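As a rough sketch of that idea, the runner below executes a set of check commands and reports which ones failed. The tool choices in PROJECT_CHECKS (ruff, mypy, bandit) are an assumed toolchain, not a recommendation from any particular project, so swap in whatever yours already uses.

```python
import subprocess
import sys


def run_checks(checks: dict[str, list[str]]) -> list[str]:
    """Run each command and return the names of the checks that failed."""
    failed = []
    for name, command in checks.items():
        try:
            result = subprocess.run(command, capture_output=True, text=True)
        except FileNotFoundError:
            # A tool that is not installed counts as a failed check.
            failed.append(name)
            continue
        if result.returncode != 0:
            failed.append(name)
    return failed


# Assumed toolchain; substitute the linters and checkers your project uses.
PROJECT_CHECKS = {
    "lint": ["ruff", "check", "."],
    "types": ["mypy", "."],
    "security": ["bandit", "-r", "."],
}

# Self-contained demonstration that does not depend on any of those tools.
demo = {
    "passes": [sys.executable, "-c", "pass"],
    "fails": [sys.executable, "-c", "raise SystemExit(1)"],
}
print(run_checks(demo))
```

Calling run_checks(PROJECT_CHECKS) from a pre-commit hook or CI step and refusing to continue on a non-empty result keeps the mechanical problems out of your review queue.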

Write tests by hand. I used to let AI generate my tests too. Double AI — code and tests both from the same model. That’s circular validation. The AI can’t catch its own blind spots. Now I write tests manually, or at minimum, I write the test cases and assertions myself and only let the AI fill in the boilerplate. If your tests and your implementation share the same wrong assumption, they’ll pass perfectly and fail catastrophically.
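Here is one way that split can look, sketched with a hypothetical apply_discount function. The cases table is the part a human writes from the spec; the loop around it is the boilerplate that is safe to delegate.

```python
def apply_discount(price_cents: int, percent: int) -> int:
    """Implementation under test (imagine this part came from the assistant)."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return price_cents - (price_cents * percent) // 100


# Written by hand: each expected value comes from the spec, not from running the code.
CASES = [
    # (price_cents, percent, expected_cents)
    (1000, 0, 1000),   # no discount
    (1000, 100, 0),    # full discount
    (999, 10, 900),    # the discount itself rounds down to whole cents
    (0, 50, 0),        # free item stays free
]


def run_cases() -> list[tuple[int, int, int, int]]:
    """Boilerplate loop: return (price, percent, expected, actual) for each failure."""
    failures = []
    for price, percent, expected in CASES:
        actual = apply_discount(price, percent)
        if actual != expected:
            failures.append((price, percent, expected, actual))
    return failures


print(run_cases())
```

The rounding case is the one that matters. If the model had picked a different rounding rule and also written the expected value, both would agree and the test would pass.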

Keep an AI code journal. This sounds nerdy, but it’s saved me multiple times. Every time the AI generates code with a subtle bug that took me more than five minutes to find, I jot down the pattern. “Watch for hallucinated imports from watchdog.” “AI loves os.system() with unsanitized input.” Over time, you build a personal checklist of what your specific AI tools tend to get wrong.
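A journal like that can double as a crude scanner. The sketch below turns the two example entries above into regular expressions and reports matching lines. The JOURNAL structure, the scan function, and the sample code are all illustrative, and regex matching will produce false positives that a human still has to judge.

```python
import re

# Each journal entry: a pattern to search for and the note to show the reviewer.
JOURNAL = [
    (re.compile(r"^\s*(from|import)\s+watchdog\b"),
     "watchdog import: confirm the imported name really exists"),
    (re.compile(r"\bos\.system\("),
     "os.system() call: check for unsanitized input"),
]


def scan(source: str) -> list[tuple[int, str]]:
    """Return (line_number, note) for every line matching a journal pattern."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for pattern, note in JOURNAL:
            if pattern.search(line):
                hits.append((lineno, note))
    return hits


# Illustrative input resembling a generated script.
sample = (
    "from watchdog.observers import Observer\n"
    "import os\n"
    "os.system('process ' + filename)\n"
)
for lineno, note in scan(sample):
    print(f"line {lineno}: {note}")
```

The point is not coverage. It is that every bug that cost you time once gets a cheap reminder the next time the same pattern shows up.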

What Ford’s Story Actually Means for Developers

The Ford story isn’t about AI failing. It’s about the difference between generating code and engineering software. AI can do the first one. Only experienced humans can do the second.

The developers Ford is rehiring aren’t faster typists. They’re the people who know what questions to ask before writing a single line. They know which architecture decisions will haunt you six months later. They’ve made enough mistakes to recognize one before it ships. The AI is a fantastic junior developer with infinite stamina and zero judgment. The gray beards are the senior review that catches what the junior missed.

That’s actually great news for us. It means AI isn’t replacing senior developers — it’s making senior judgment more valuable, not less. The skill that matters going forward isn’t writing code. It’s knowing which code is worth keeping.

So keep using Copilot, Claude, Cursor — whatever tool fits your workflow. But review the output like a senior engineer. Because that’s what Ford just spent millions to relearn: AI accelerates your typing. It doesn’t replace your thinking. And if you’re curious about where this is all headed, I broke down the economics of AI coding assistants a few weeks back — the subsidies won’t last forever, and when they end, the developers who actually know how to code (and review code) will be the ones still standing.

And if you’re still building the foundations, I wrote about setting up AI coding context files that make these tools dramatically better at understanding your project. The better the context you give them, the fewer hallucinations you’ll need to catch. But catch them you still will — that part doesn’t change.