I Am the Only One Who Can Deploy This System. That Is My Fault.

Being the fastest person in the room feels like competence. Eventually it becomes a trap. Here is how I built one and what I did about it.

Three years ago a client asked if someone else could handle a deployment while I was on leave. Their project manager had a simple change. A config update for a new payment gateway endpoint. They said it would take a few minutes.

I said I would do it remotely.

I was at a family gathering three hours from the city. I sat in a corner with my laptop, connected to the server over a hotspot, made the change, verified it worked and rejoined the table 45 minutes later. Everyone assumed this was fine because it worked. I assumed this was fine because it worked.

What neither the client nor I asked was: why was I the only one who could do that?

What Tribal Knowledge Looks Like When It Has No Home

Tribal knowledge is not something you create deliberately. It accumulates.

You deploy the same system twenty times and eventually you do not think about the steps anymore. You know that the .env on the production server uses a specific database socket path instead of a host because the server was migrated once and the config was patched rather than cleaned up. You know the deployment has to run in a specific order because there is a migration that touches a column the queue workers read and if the workers do not restart first the jobs fail silently. You know the SSL certificate renews automatically except for one subdomain that was added manually and needs a manual renewal every ninety days.

You know all of this. You carry it. And because you carry it well, nobody notices the weight until you are not there.

The moment it becomes visible is always the same type of moment. You go on leave. You get sick. The client asks someone else to handle something small while you are in a meeting. The handover starts and you watch the other person’s face as they realise there is no document. No checklist. No record of what you know.

What they find instead is a server they have credentials for but cannot confidently touch. A deployment process that lives in your terminal history. Environment variables that were set two years ago by copying from a Slack message that no longer exists. A cron job that nobody added to any documentation because you added it “temporarily” in 2022 and it has been running ever since.

This is not a story about bad teams or negligent organisations. It is a story about what happens when a capable person is allowed to be the answer to every question. The system becomes dependent on that person, and the person mistakes the dependency for value.

The Bus Factor

In software engineering there is a concept called the bus factor. It asks: how many people on this project would have to be hit by a bus before the project becomes undeliverable?

A bus factor of one means a single person’s absence stops the project. Most solo technical leads working across multiple client contracts have a bus factor of one on every system they maintain. I did. I am being honest about that because pretending otherwise would make the rest of this article useless.
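One rough way to put a number on this is to list each operational task next to the people who can perform it unaided. The bus factor is then the size of the smallest such group, because losing that group leaves a task nobody can do. The sketch below assumes that simplified definition; the task names and people are hypothetical.

```python
def bus_factor(coverage: dict[str, set[str]]) -> int:
    """Smallest number of absences that leaves some task with nobody able to do it."""
    if not coverage:
        raise ValueError("coverage must list at least one task")
    # The least-covered task decides the result: remove everyone who can do it.
    return min(len(people) for people in coverage.values())


# Hypothetical coverage map: task -> people who can perform it without help.
coverage = {
    "deploy to production": {"me"},
    "renew the SSL certificate": {"me"},
    "restart the queue workers": {"me", "colleague"},
}

print(bus_factor(coverage))
```

Any task with a single name next to it is enough to pull the whole system down to a bus factor of one, however well covered the other tasks are.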

The problem with a bus factor of one is not just the risk. The problem is that it is invisible during normal operations. The system runs. Deployments go out. The client is happy. Nothing signals that a serious dependency exists until something breaks or someone leaves.

By then the documentation debt has compounded for months or years.

What I Did About It

I did not fix this overnight and I did not fix it by writing documentation in one long session. I fixed it by treating documentation the same way I treat database migrations: small, incremental and attached to the work rather than separate from it.

Runbooks first. A runbook is a document that describes how to perform an operational task. Not a code comment. Not a README that describes what a project is. A step-by-step procedure that someone unfamiliar with the system can follow and reach the correct outcome.

I started with the tasks I performed most often. Deploy to production. Restart the queue worker after a failed migration. Renew the SSL certificate. Roll back a bad release. For each one I wrote down exactly what I did, in order, including the commands, the expected output and what to check to confirm the step worked. I wrote them as if the reader had never touched the server before. That constraint forced me to include the parts I had been skipping because they felt obvious.
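As an illustration of that structure, here is a minimal sketch that keeps each runbook step as data (action, command, expected output, confirmation check) and renders it as plain text. The `Step` and `render` names are hypothetical, and the example step is illustrative only; the exact command output depends on the framework version, so the expected result is described in words rather than quoted.

```python
from dataclasses import dataclass


@dataclass
class Step:
    action: str    # what to do, in plain words
    command: str   # the exact command to run
    expected: str  # what the output should look like
    check: str     # how to confirm the step worked


def render(title: str, steps: list[Step]) -> str:
    """Render a runbook as numbered plain-text steps."""
    lines = [title, ""]
    for number, step in enumerate(steps, start=1):
        lines.append(f"{number}. {step.action}")
        lines.append(f"   Run:      {step.command}")
        lines.append(f"   Expect:   {step.expected}")
        lines.append(f"   Confirm:  {step.check}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


# Illustrative step only; the wording is an assumption, not a real runbook.
steps = [
    Step(
        action="Restart the queue workers",
        command="php artisan queue:restart",
        expected="A message confirming the restart signal was sent",
        check="New jobs are picked up and none land in the failed list",
    ),
]

print(render("Runbook: restart the queue worker", steps))
```

Making all four fields mandatory is the point of the design: a step cannot be written down without also stating what success looks like.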

They were not obvious to anyone else.

Deployment checklists second. Runbooks describe how to do something. Checklists confirm you did not skip a step under pressure.

My deployment checklist for a Laravel application now covers: confirm the branch, confirm the environment target, run the dry-run migration check before applying, check queue status before and after worker restart, verify the scheduled task output after deployment, test the two user paths most likely to break. Each item is a checkbox. Each item can be verified by someone who was not in the original build team.
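A checklist like this can live in a document, but it can also be a small script that refuses to call a deployment finished while any item is unconfirmed. The sketch below is one possible shape, not the checklist tooling described above; the function names are hypothetical and the items are copied from the list in the previous paragraph.

```python
from typing import Callable

DEPLOY_CHECKLIST = [
    "Confirm the branch",
    "Confirm the environment target",
    "Run the dry-run migration check before applying",
    "Check queue status before worker restart",
    "Check queue status after worker restart",
    "Verify the scheduled task output after deployment",
    "Test the two user paths most likely to break",
]


def run_checklist(items: list[str], confirm: Callable[[str], bool]) -> list[str]:
    """Ask for confirmation of every item and return the ones not confirmed."""
    return [item for item in items if not confirm(item)]


def ask_on_terminal(item: str) -> bool:
    """Interactive confirmation; pass this as `confirm` for real use."""
    return input(f"[ ] {item} (y/n): ").strip().lower() == "y"


# Scripted stand-in for a person at the keyboard, so the example runs unattended.
def forgot_the_queue(item: str) -> bool:
    return "queue status" not in item


for item in run_checklist(DEPLOY_CHECKLIST, forgot_the_queue):
    print(f"NOT CONFIRMED: {item}")
```

Passing the confirmation function in as a parameter keeps the list itself independent of who or what answers it, so the same items can be confirmed by a person at a terminal or by automated checks later.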

Joel Spolsky’s “Joel Test” from 2000 included the question: can you make a build in one step? The intent was not just automation. It was removing the need for a specific person to be in the room. A deployment that requires undocumented judgment calls is not a one-step build regardless of how many scripts surround it.

Environment documentation third. This is the one most people skip because it feels unglamorous. It is also the one that causes the most damage when it is missing.

For every production system I now maintain a document that covers: the server setup, which services are running and how they are managed, which environment variables exist and what each one is for, anything that was configured manually rather than through code, any recurring tasks including cron jobs and their expected behaviour and any non-obvious dependencies between components.
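One part of such a document that is easy to check mechanically is the list of environment variables. The sketch below compares the variable names found in a .env file against the names the document describes and reports drift in both directions. It assumes simple `KEY=value` lines, never reads or prints values, and the variable names in the example are hypothetical.

```python
def env_keys(env_text: str) -> set[str]:
    """Collect variable names from KEY=value lines, skipping blanks and comments."""
    keys = set()
    for raw in env_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key = line.split("=", 1)[0].strip()
        if key.startswith("export "):
            key = key[len("export "):].strip()
        keys.add(key)
    return keys


def drift(env_text: str, documented: set[str]) -> tuple[set[str], set[str]]:
    """Return (names set on the server but undocumented, names documented but not set)."""
    actual = env_keys(env_text)
    return actual - documented, documented - actual


# Hypothetical file contents and documented names, for illustration only.
example_env = """
# database
DB_SOCKET=/path/to/socket
APP_ENV=production
PAYMENT_GATEWAY_URL=https://example.invalid
"""
documented_names = {"APP_ENV", "DB_SOCKET", "MAIL_HOST"}

undocumented, stale = drift(example_env, documented_names)
print("Undocumented:", sorted(undocumented))
print("Documented but missing:", sorted(stale))
```

A check like this only proves that a name is mentioned. Explaining what each variable is for still has to be written by someone who knows.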

When I added a cron job, I added it to the document at the same time. When a certificate renewal was set to manual for that one subdomain, I added it to the document with a reminder date. The goal was that the document should match the production reality within twenty-four hours, not eventually.
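The reminder date for a manual renewal is simple date arithmetic, sketched below. The ninety-day interval comes from the renewal cycle mentioned earlier; the fourteen-day warning window and the function names are assumptions chosen for illustration.

```python
from datetime import date, timedelta


def next_renewal(last_renewed: date, interval_days: int = 90) -> date:
    """Date by which the manually managed certificate must be renewed again."""
    return last_renewed + timedelta(days=interval_days)


def is_due(last_renewed: date, today: date,
           warn_days: int = 14, interval_days: int = 90) -> bool:
    """True once today falls inside the warning window before the renewal date."""
    warn_from = next_renewal(last_renewed, interval_days) - timedelta(days=warn_days)
    return today >= warn_from


# Hypothetical last renewal date.
last = date(2024, 1, 1)
print(next_renewal(last).isoformat())
print(is_due(last, today=date(2024, 3, 20)))
```

Recording the last renewal date in the environment document and letting a scheduled job call `is_due` would turn the reminder from something remembered into something checked.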

The Mindset Shift That Made It Work

The practical steps are not the hard part. The hard part is stopping yourself from treating documentation as something you do after the real work.

The reason tribal knowledge accumulates is not laziness. It is that knowing something feels faster than writing it down. You can deploy the system in eight minutes. Writing the deployment runbook will take two hours. In the moment, the eight-minute deployment wins every time.

The shift I had to make was accepting that I was not evaluating the two-hour investment against the eight-minute deployment. I was evaluating it against all the future instances where I would have to be physically present, mentally available and unoccupied with anything else to keep that system running. When I calculated it that way, the two hours was cheap.

The other shift was accepting that “only I know how to do this” is not a measure of how skilled I am. It is a measure of how much undocumented risk sits in my head. A client who depends on me to be present for every production event is not a client who trusts me. It is a client who has no other option. Those are different things.

What Good Looks Like

A system with good operational documentation has these properties. Anyone with the correct credentials can deploy it by following a document. Anyone can identify what is running in production and why. Anyone can respond to a common failure mode without escalating to the person who built it. A new developer can reach a functioning local environment without asking questions that are not answered somewhere.

None of that requires extensive tooling or a DevOps team. It requires treating the system’s documentation as part of the system.

If you got sick tomorrow, could your client keep the system running for two weeks? If the answer is no, you are not an indispensable developer. You are a single point of failure wearing a contractor badge.

That is not a compliment. I know because I used to take it as one.

The three-hour drive home from that family gathering gave me time to think about what I had just demonstrated. Not that I was dedicated. Not that I was skilled enough to handle production remotely over a mobile hotspot. What I had demonstrated was that I had built something that could not survive without me.

I spent the next month writing it all down. It was slower than deploying. It felt less productive than writing code.

It was the most useful thing I did that year.
