
LGTM. Shipped. Broke Production. The Code Review Problem Nobody Talks About.

April 3, 2026

The structural flaws hiding inside modern code review culture.

You opened the pull request. Two teammates glanced at the diff. Someone typed LGTM. The pipeline went green. You merged.

Three days later, a bug report lands. Someone traces it back to that exact PR. Two approvals. Nobody caught it.

This is not a rare story. This is Tuesday.

Code review is one of the most widely adopted practices in software development and also one of the most widely misunderstood. Most teams treat it as a quality gate. In practice, it has become something closer to a ritual: a checkbox that needs to get ticked before the merge button turns green. The ritual creates the feeling of safety without much of the substance.

This article is about why that happens and what a review that actually works looks like.

What Code Review Actually Catches

There is a persistent belief that the primary job of a code review is to catch bugs. Managers expect it. Developers feel pressure to find them. The whole ceremony is framed around defect prevention.

The research tells a different story.

Multiple studies, including work published in conferences on software engineering, have found a significant mismatch between what people expect code reviews to do and what they actually do. The majority of changes made during code review do not fix major functional defects. They fix maintainability issues: unclear naming, missing comments, awkward structure and style problems.

That is not worthless. But it is very different from catching the logic error that will corrupt a user’s order on checkout day.

There is a specific taxonomy worth knowing here. A 2022 arXiv study of bugs that survived review in merged pull requests found that the most common category was semantic bugs, accounting for over half of all missed defects. These are logical errors: the code is syntactically correct and passes every automated check but does the wrong thing. They are exactly the bugs that are hardest to spot by reading a diff.

The implication is uncomfortable: code review, as most teams practice it, is reasonably good at catching what the code looks like and reasonably poor at catching what the code does.

The PR Size Problem

Here is the dynamic every developer recognizes but few teams address directly.

Open a PR with 10 lines changed. You will get five comments about variable naming, a debate about whether the function should be extracted and a question about error handling. Every line gets interrogated.

Open a PR with 800 lines changed. You will get two comments and an LGTM within the hour.

This is not laziness. It is how human cognition works under load. A SmartBear study examining 2,500 code reviews across 3.2 million lines of code found that review effectiveness drops sharply beyond 200 to 400 lines. Defect density peaks at around 200 lines and then declines as reviewers fatigue. The larger the PR, the less of it gets genuinely reviewed.

The incentives make this worse. A developer sitting on a large PR is blocking a colleague’s work. The social pressure to approve and move on is real. Nobody wants to be the person who slows the team down. So the review shrinks to match the time available rather than the complexity of the change.

The result is that the biggest and riskiest changes in a codebase, the ones that touch many files and reorganize logic across multiple layers, receive the least scrutiny. The small surgical fixes get picked apart. The architectural changes sail through.

The Psychology Behind LGTM Culture

Kent Beck, the creator of Extreme Programming, wrote about this honestly: “I’ve rubber-stamped PRs because I didn’t have time to really understand them. I’ve had my PRs rubber-stamped and felt a mix of relief and unease. Did anyone actually look at this?”

That unease is the right instinct. But the relief usually wins.

Several forces push teams toward LGTM culture:

Reciprocity. I approved yours quickly. Now you approve mine quickly. It becomes an unspoken agreement. Nobody negotiated it. It just settled into the team’s habits over time.

Authority dynamics. A junior developer looking at a senior’s code feels unqualified to push back even when something looks wrong. LGTM is safer than raising a concern that turns out to be wrong.

Deadline pressure. When sprint velocity is being tracked and the manager can see the PR queue, a review that takes two hours is a problem. A review that takes two minutes is not. The system rewards speed.

Cognitive load. The average developer already switches context dozens of times a day. Reviewing someone else’s code means loading an entirely different mental model of a problem you did not work on. That is genuinely expensive. LGTM is the exit ramp.

None of these are moral failures. They are rational responses to the environment most teams have created. The fix is changing the environment, not lecturing developers about doing better reviews.

What You Are Not Reviewing When You Review a Diff

The diff view is a fundamentally limited lens.

When you look at a diff, you see what changed. You do not see what was deleted and why. You do not see the behavior that emerges from how this change interacts with three other modules. You do not see whether the approach chosen was the right one or just the first one that came to mind. You do not see whether the test coverage actually exercises the edge cases that matter.

You see lines. Green and red.

This is why many of the most expensive production bugs that were approved in code review were not hiding in obviously wrong code. They were hiding in the gap between what the diff showed and what the system would actually do. The code looked correct. The logic was plausible. The tests passed. And then a specific sequence of inputs in production exposed that the reviewer and the author had shared the same wrong assumption, which meant neither of them thought to question it.

A review that only reads the diff is a review that can only catch what the diff makes visible. Most serious bugs are not visible in the diff.

What a Review That Actually Works Looks Like

None of this means code review is not worth doing. The data from Steve McConnell’s Code Complete is clear: formal code inspection, when done properly, achieves a defect detection rate of around 60 percent compared to roughly 45 percent for integration testing. That is a meaningful number. The problem is that most teams are not doing anything close to formal inspection. They are doing LGTM with a PR description.

A review worth the time it takes has a few properties that most reviews lack.

The reviewer understands the intent before reading the code. What problem is this supposed to solve? What was the approach chosen and why? Without that context, the reviewer is playing detective with the diff instead of evaluating whether the solution is the right one.

This is the author’s responsibility as much as the reviewer’s. A PR description that says “fixes bug” tells the reviewer nothing. A description that says “users with expired sessions were hitting a null pointer in the payment handler when the session token was checked after the payment object was initialized rather than before” gives the reviewer a chance to actually evaluate whether the fix addresses the root cause.

The reviewer runs the code, not just reads it. Reading a diff is passive. Checking out the branch, running it locally and trying to break it is active. This is more expensive in time and most teams do not do it for routine changes. But for any change touching critical paths, payment flows, authentication or data migrations, reading alone is not enough.

The reviewer asks questions instead of just approving. “Does this handle the case where the user’s account is suspended mid-session?” is more valuable than a comment about naming. Questions force the author to think through edge cases they may not have considered. They create a record of the assumptions baked into the design.

PRs are small enough to actually review. The single most effective structural change a team can make is enforcing a size limit. A PR that changes more than 400 lines should be split. No exceptions. The constraint feels annoying until you experience the alternative: a five-file, 900-line PR that introduces a subtle data race while three reviewers type LGTM in the same afternoon.
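A size limit works best when it is enforced mechanically rather than by exhortation. Here is a minimal sketch of one way to do it as a CI gate, assuming the job can run git against the base branch; the 400-line threshold, the `origin/main` base and the function names are illustrative choices, not a standard tool:

```python
import subprocess

MAX_CHANGED_LINES = 400  # the limit suggested above; tune per team

def changed_lines(numstat: str) -> int:
    """Sum added + deleted lines from `git diff --numstat` output.

    Each line looks like "<added>\t<deleted>\t<path>".
    """
    total = 0
    for line in numstat.strip().splitlines():
        added, deleted, _path = line.split("\t", 2)
        if added == "-" or deleted == "-":
            continue  # binary files report "-" instead of line counts
        total += int(added) + int(deleted)
    return total

def check_pr_size(base: str = "origin/main") -> bool:
    """Return True if the diff against `base` is within the limit."""
    out = subprocess.run(
        ["git", "diff", "--numstat", f"{base}...HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout
    total = changed_lines(out)
    if total > MAX_CHANGED_LINES:
        print(f"Diff is {total} lines; limit is {MAX_CHANGED_LINES}. Split the PR.")
        return False
    return True
```

Wire `check_pr_size()` into a CI step that fails the build when it returns False, and the limit stops being a matter of individual discipline.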

What Code Review Should Not Be Doing

Code review is not the right place to enforce style. That is what linters and formatters are for. Every minute a reviewer spends on indentation, quote style or naming conventions is a minute not spent on logic, behavior and edge cases.

Automate the mechanical feedback entirely. Run your linter in CI. Fail the build on style violations before the PR ever reaches a human. This removes an entire category of review comment and forces reviewers to engage with things that actually require judgment.
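As a concrete sketch of that wiring, here is what the gate might look like assuming GitHub Actions and a Python project linted with Ruff; the workflow name, runner and tool choice are illustrative, not a prescription:

```yaml
# Hypothetical CI job: style feedback fails the build before review.
name: lint
on: pull_request
jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pip install ruff
      # Ruff exits non-zero on violations, so the PR goes red
      # before a human reviewer ever opens the diff.
      - run: ruff check .
```

Any linter with a non-zero exit code on violations slots into the same shape.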

Code review is also not a good place for architecture discussions. If a PR arrives with a fundamental design decision already baked into hundreds of lines of implementation, the cost of changing it is too high for anyone to seriously suggest it. The time for architecture discussions is before the code is written. Bring the design to the team as a proposal, a document or a short conversation first. By the time it becomes a PR, the only sensible review is on the implementation of a decision already agreed upon.

The Version of Code Review Worth Keeping

Strip out the rubber stamps. Strip out the style comments. Strip out the reviews-by-deadline.

What you have left is genuinely valuable: a second set of eyes that understands the context, exercises the logic mentally, asks about the edge cases the author forgot and occasionally catches the assumption that would have silently broken something in production six weeks from now.

That version of code review is worth doing. It is also much harder to do consistently than clicking the approve button.

The gap between those two things is where most bugs live.

Your team does code reviews. That much is probably true. The harder question is whether anyone is actually reading the code.

LGTM. Merged. See you at the postmortem.


Originally published on Medium.

Hafiq Iqmal