Security holes, code smells, missing tests, refactoring and dependency upgrades. One month of maintenance work on a real production system. Here’s the honest version.

Nobody talks about maintenance.
Every article is about shipping features. Building fast. Launching in a weekend. But the quiet reality of most engineering work is that you spend more time keeping existing code alive than you do writing new things.
I maintain a production Laravel system for one of the water utility companies in Malaysia. Real users. Real data. Bills, service requests, customer records. The kind of system where a bug is not just annoying, it is a complaint call to a government agency.
Every month I run a maintenance cycle on this codebase. Security review, code smell detection, test coverage, refactoring and dependency upgrades. Five areas. Same routine every month.
This time I brought Claude Code, Junie and JetBrains IDE into each one and paid close attention to what actually helped.
Here is what I found.
A Bit of Context on the System
The codebase is a few years old. It has gone through multiple developers and a few phases of rapid feature work driven by operational deadlines. It is not a mess but it has the marks of a system that was always under pressure to ship.
The stack is Laravel on the backend with a mix of frontend components on top. It serves internal operations staff and has a customer-facing portal for billing and service requests. Downtime or data issues here have real consequences for real people trying to pay their water bills or report a burst pipe.
That context matters because it changes how you approach maintenance. You are not just cleaning up code. You are making sure a critical piece of public infrastructure stays reliable.
Security Review: The Most Valuable Use of the Month
This is where I was most impressed.
I used Claude Code to do a first-pass security audit on the codebase. SQL injection risks, missing authorization checks, exposed sensitive data in logs, unvalidated file uploads, mass assignment vulnerabilities. The kind of things that are easy to miss when you are the one who wrote the code and know every corner of it a little too well.
The tool found three things I had missed in my own reviews. One was a route missing a middleware check, meaning a logged-in user of any role could access an administrative function if they knew the URL. It had been there for over a year. Not exploited, but sitting there. The second was a log statement writing full request payloads, including customer IC numbers, during an error condition. The third was an old file upload handler not validating MIME types properly.
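The three findings map to patterns like the following. This is a hedged sketch, not the actual code: the route, middleware alias, controller, and field names are all hypothetical stand-ins.

```php
<?php

// Hypothetical sketch of the three fixes. Route names, middleware
// aliases and field names are illustrative, not the actual code.

// 1. The admin route was reachable by any authenticated user.
//    Adding an authorization middleware alongside 'auth' closes the gap.
Route::get('/admin/accounts/{account}/adjust', [AdminAccountController::class, 'adjust'])
    ->middleware(['auth', 'can:manage-accounts']); // previously only 'auth'

// 2. The error path logged the full request payload, IC number included.
//    Logging an explicit allow-list keeps identity data out of the logs.
Log::error('Billing request failed', [
    'route'      => $request->path(),
    'account_id' => $request->input('account_id'),
    // deliberately NOT $request->all(): it contained ic_number and address
]);

// 3. The upload handler trusted the client-supplied extension.
//    Laravel's 'mimes' rule validates against the file's detected type.
$request->validate([
    'attachment' => ['required', 'file', 'mimes:pdf,jpg,png', 'max:5120'],
]);
```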
None of these were catastrophic on their own. But for a utility company handling customer identity and billing data, any one of them would have been a serious incident if found by the wrong person.
JetBrains IDE made it easy to navigate directly to each flagged location and understand it in full context rather than reading a flat list of findings. Jump to the file, see the surrounding code and make a proper judgment call.
The caveat is that the tool only finds what it can see. It flagged code-level issues well. It could not reason about infrastructure, network configuration or deployment-level risks. Security review still needs a human strategy behind it. The tool is a very good first pass, not a complete audit.
Code Smell Detection: Faster Than Doing It by Eye
Every month I go through the codebase looking for things that are not wrong exactly but are heading in the wrong direction. Functions doing too much. Classes with too many dependencies. Logic duplicated in two places because someone was in a hurry.
I gave Claude Code sections of the codebase and asked it to identify smells and explain why each one was a problem. The output was good. It caught a controller that had grown to handle eight different responsibilities. It flagged a helper file that had become a dumping ground for unrelated utility functions. It identified three places where the same business logic had been written slightly differently across different parts of the system.
What I appreciated was that it explained the why. Not just “this function is too long” but “this function is handling both the data transformation and the database write, which means you cannot test the transformation logic without hitting the database.” That kind of reasoning gives you something concrete to act on.
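That particular smell looks something like the sketch below. The class and field names are invented for illustration; the point is that once the transformation is a pure method, it can be tested without touching the database.

```php
<?php

// Hypothetical illustration of the smell and the fix. Before the change,
// import() built the row AND wrote it to the database in one method, so
// the transformation logic could not be tested without a database.
class MeterReadingImporter
{
    // After: the transformation is a pure function, trivially testable.
    public function transform(array $raw): array
    {
        return [
            'meter_no' => strtoupper(trim($raw['meter'])),
            'reading'  => (int) $raw['value'],
            'read_at'  => new DateTimeImmutable($raw['date']),
        ];
    }

    // The write is a separate, thin step that just uses the pure result.
    public function import(array $raw): void
    {
        DB::table('meter_readings')->insert($this->transform($raw));
    }
}
```

A test can now call `transform()` directly with an array and assert on the result, no database connection required.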
Junie was useful for the smaller inline smells while navigating. A method with a misleading name. A variable shadowing an outer scope. Small things that quietly accumulate into a codebase that feels harder to work in than it should.
Test Coverage: Better When You Give It the Business Context
A water utility system has business rules that are not obvious from the code alone. Late payment penalties calculated differently for domestic and commercial accounts. Service suspension rules tied to specific notice periods. Meter reading logic that handles estimated reads differently from actual reads.
The tools could not know any of that without me explaining it first.
Early in the month I made the mistake of asking Junie to write tests for a billing function without providing that context. The tests it produced were technically correct and tested exactly what the code did. But the code had a subtle error in how it handled a specific commercial account type and the tests passed right through it because they were testing the wrong thing.
Once I changed my approach and briefed the tool first, telling it what the function was supposed to do, which edge cases I cared about and what a wrong result would look like, the quality improved significantly. The output after that kind of briefing was solid enough to trust with minimal changes.
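The difference is that a briefed test encodes what the function is supposed to do, not what the current code happens to do. A minimal sketch of the idea, with rates, account types and figures invented purely for illustration (the real rules differ, and the real suite uses PHPUnit rather than bare asserts):

```php
<?php

// Invented example of the rule being tested: commercial accounts are
// penalised at a different rate from domestic ones. Rates and names are
// hypothetical, not the utility's actual figures.
final class LatePaymentPenaltyCalculator
{
    public function penaltyFor(string $accountType, float $overdue): float
    {
        $rate = $accountType === 'commercial' ? 0.02 : 0.01;
        return round($overdue * $rate, 2);
    }
}

$calc = new LatePaymentPenaltyCalculator();

// The briefing said: commercial accounts use the higher rate. A test
// written from the code alone would simply mirror whatever the code did,
// and a bug in the commercial branch would pass straight through.
assert($calc->penaltyFor('commercial', 100.00) === 2.00);
assert($calc->penaltyFor('domestic', 100.00) === 1.00);
```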
The scaffolding work was genuinely faster regardless. Mocks, setup, teardown, the basic structure of the test file. What used to take me 30 minutes to get started on now takes under 10. That adds up over a full maintenance cycle.
Refactoring: The Clearest Win of the Month
This is where the tools saved the most time with the least risk.
Refactoring is a transformation task. You know what the code does and you want it to do the same thing in a cleaner way. That is a well-defined problem and the tools handle well-defined problems well.
I had a service class that had been added to incrementally over two years. It was 500 lines and touched billing, notifications and account status in the same file. I gave Claude Code the class, explained the three distinct responsibilities and asked it to split them cleanly while preserving all the existing behavior.
The output was good. Three focused classes each with a clear job, connected through proper dependency injection. It preserved all the logic including a few edge case conditions I had honestly forgotten were in there. I reviewed it carefully, ran the test suite and merged it with one small correction.
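The shape of the result was roughly the following. Class and method names here are invented; the real service's three responsibilities were billing, notifications and account status, and the orchestrating class receives each collaborator through constructor injection so every piece can be tested and swapped in isolation.

```php
<?php

// Hypothetical shape of the split; names are illustrative stand-ins.
class BillingService
{
    public function applyPayment(string $accountNo, float $amount): void { /* billing only */ }
}

class AccountNotifier
{
    public function paymentReceived(string $accountNo, float $amount): void { /* notifications only */ }
}

class AccountStatusManager
{
    public function reviewAfterPayment(string $accountNo): void { /* status only */ }
}

// The orchestrator depends on the three focused classes via constructor
// promotion (PHP 8). Laravel's container can inject these automatically.
final class PaymentProcessor
{
    public function __construct(
        private BillingService $billing,
        private AccountNotifier $notifier,
        private AccountStatusManager $status,
    ) {}

    public function handle(string $accountNo, float $amount): void
    {
        $this->billing->applyPayment($accountNo, $amount);
        $this->notifier->paymentReceived($accountNo, $amount);
        $this->status->reviewAfterPayment($accountNo);
    }
}
```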
JetBrains IDE made the review fast. Running the refactored code, stepping through it and comparing the before and after in the diff view meant I could verify behavior quickly rather than reading every line cold.
One thing to watch: the tools do not know your full system. On one refactoring task the output was clean in isolation but broke a behavior in a completely different part of the app that depended on an undocumented side effect of the original code. Caught in review. Not a disaster. But a reminder that review is never optional.
Dependency Upgrades: Useful for the Parts That Used to Be Slow
Dependency upgrades are the maintenance task everyone knows they should do and nobody wants to do. For a production system serving a utility company the risk appetite for a botched upgrade is essentially zero.
The tools helped in two specific ways.
First, Claude Code was useful for understanding what had changed between versions. Paste in the changelog or release notes and ask what the breaking changes were and which parts of the codebase were likely to be affected. That turned a two-hour reading exercise into a 20-minute one.
Second, when an upgrade did introduce a breaking change, having the tool help trace through the codebase to find all the call sites that needed updating was faster than doing it manually. Not magic. Just faster.
The actual testing and verification still required human judgment and a proper staging environment. The tools helped me understand and locate. The decision to ship was still mine.
What the Month Actually Looked Like
The honest answer is that the maintenance cycle took roughly the same amount of calendar time, maybe 15 to 20 percent less overall.
But the quality was better. The security review found things I had been missing for months. The refactored code is genuinely cleaner. The test coverage improved more in one month than it had in the previous three combined.
What changed was not the time. It was the ceiling. There are things I was skipping in the maintenance cycle because they took too long for a single monthly window. The security audit in particular. With the tools doing the first pass I could do a more thorough review in the same amount of time.
What Still Needs a Human
The tools do not know the domain. They do not know this system serves a government-regulated utility in Malaysia. They do not know the business rules behind the billing engine or which parts of the codebase are sensitive to change for non-technical reasons. That context lives with the person who has maintained the system for years and it does not transfer through a prompt.
They also cannot make judgment calls about risk. Whether to upgrade a major dependency three months before the annual billing cycle. Whether a refactoring is worth the regression risk right before a public holiday. Whether a security finding needs an emergency patch or can wait for the next release window. Those calls are yours.
The tools are very capable. They are not a replacement for the person who understands the system.
Would I Keep Doing This?
Yes. It is already locked in as part of the monthly routine.
The security pass with Claude Code is now how I start every cycle. Junie runs in the background whenever I am working through the codebase. The refactoring work moved faster this month than it has in a long time and the findings were better than what I was getting from my own eyes alone.
For anyone maintaining a production system where reliability actually matters, this approach is worth taking seriously. Not because it makes the work disappear. Because it makes the work that was previously too slow to justify actually possible within the time you have.
That is worth more than it sounds.