Laravel · PHP · Database

Your Migration Is One ALTER TABLE Away From an Outage

March 30, 2026

The expand-contract pattern and the tools that actually make it survivable.

You are deploying on a Tuesday afternoon. The pull request is clean, the tests pass, the staging environment is happy. You run php artisan migrate against production. The command hangs. Then the error logs start. Then Slack starts.

It is always a Tuesday.

The table you just altered has 40 million rows. MySQL decided it needed to rebuild the entire thing. While it did, every read and write against that table queued up behind a metadata lock. Your application didn’t crash. It just stopped responding. By the time the migration finished (eight minutes later), your load balancer had marked every instance as unhealthy and dropped them.

This is not a rare story. It is how most backend developers learn that php artisan migrate is not a deployment command. It is a bomb with a variable fuse.

Why ALTER TABLE Locks at All

The problem is not migrations as a concept. The problem is that most developers treat schema changes like code changes: write the diff, deploy, done. Databases do not work that way.

When you run a DDL statement like ALTER TABLE orders ADD COLUMN processed_at TIMESTAMP NULL, what actually happens depends heavily on your database engine, version, the nature of the change and the size of the table.

MySQL’s InnoDB has three DDL algorithms: COPY (builds a full shadow table, blocks reads and writes), INPLACE (modifies the table structure in-place, may allow concurrent reads) and INSTANT (metadata-only change, no table rebuild, effectively free).

MySQL 8.0.12 added INSTANT support for adding new nullable columns. MySQL 8.0.29 extended it to dropping columns. For those operations specifically, a 100 million-row table and a 1-row table are exactly the same cost. But anything outside that narrow set of operations falls back to INPLACE or COPY, and COPY on a large table will ruin your week.
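Because the fallback is silent, it can help to request the algorithm explicitly: MySQL will then refuse the DDL with an error instead of quietly degrading to a rebuild. A sketch inside a Laravel migration (table and column names are illustrative):

```php
use Illuminate\Database\Migrations\Migration;
use Illuminate\Support\Facades\DB;

return new class extends Migration
{
    public function up(): void
    {
        // If INSTANT is not possible for this change, the statement fails
        // loudly instead of silently falling back to INPLACE or COPY.
        DB::statement(
            'ALTER TABLE orders ADD COLUMN processed_at TIMESTAMP NULL, ALGORITHM=INSTANT'
        );
    }
};
```

Treating the error as a signal to reach for expand-contract or an online schema change tool is safer than discovering the rebuild in production.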

PostgreSQL is somewhat better by default. Most ALTER TABLE forms take an AccessExclusiveLock, but for metadata-only changes (adding a nullable column, or, since PostgreSQL 11, a column with a non-volatile default) that lock is held only briefly. Before PostgreSQL 11, adding a column with a default forced a full table rewrite. Creating an index without the CONCURRENTLY keyword blocks writes for the duration. And renaming a column? Fast, but it still takes an AccessExclusiveLock, so it queues behind any long-running query touching the table.
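That queuing behavior is its own trap: a fast ALTER stuck behind one slow query makes every statement behind it queue too. Setting a lock timeout bounds the damage; the ALTER aborts instead of stalling the workload, and you retry later. A sketch using Laravel's DB facade (the timeout value is illustrative):

```php
use Illuminate\Support\Facades\DB;

// If the ALTER cannot acquire its lock within 5 seconds, PostgreSQL
// aborts it rather than letting the whole workload pile up behind it.
DB::statement("SET lock_timeout = '5s'");
DB::statement('ALTER TABLE orders ADD COLUMN processed_at timestamptz');
```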

The point is that you cannot know the locking behavior of an ALTER TABLE statement by looking at it. You have to know what your database engine, version and table size will do with it. Most developers do not have this knowledge. Most frameworks do not surface it.

The Expand-Contract Pattern Is the Actual Fix

Tooling comes later. First, get the mental model right.

The expand-contract pattern (also called parallel change) is how you change a schema without ever touching something that production code currently depends on in a breaking way. It has four phases.

Expand. Add the new thing. New column, new table, new index. Do not touch the old thing. Deploy code that writes to both old and new, still reads from old. At this point the migration is safe to run because you are only adding, never breaking.

Backfill. Populate the new column with historical data. Do this in a background job, in batches, not inside a migration file. A migration runs synchronously during deployment. A background job runs on your queue workers at whatever pace your database can handle.

Switch. Deploy code that reads from the new column. Still write to both. Verify correctness. This is your rollback window.

Contract. Drop the old column. By now, nothing in your application code references it. The drop is safe, and because you are only removing a column (and possibly an index), MySQL 8.0.29+ can do it instantly.

Notice what this pattern does not do: it does not give you a single migration file that does everything at once. The one-file, one-deploy approach is what causes incidents. Expand-contract turns one dangerous deployment into three or four boring ones.
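The write-to-both step in the expand and switch phases is usually a small model-level detail. One hypothetical way to do it in Eloquent (column names are illustrative):

```php
use Illuminate\Database\Eloquent\Model;

class Order extends Model
{
    // During expand/switch: every write to the new column also updates the
    // old one, so rolling back to the previous release sees consistent data.
    public function setStatusV2Attribute(?string $value): void
    {
        $this->attributes['status_v2'] = $value;
        $this->attributes['status'] = $value; // legacy column, dropped at contract
    }
}
```

Once the contract deploy ships, the mutator and the legacy assignment are deleted together.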

The Tools That Complement This Pattern

The pattern handles the application layer. The tools handle the database layer when you cannot avoid a heavyweight schema change.

gh-ost is GitHub’s triggerless online schema change tool for MySQL. Instead of triggers (which add write overhead to every row operation), gh-ost connects to your database as a fake replica and streams the binary log. It creates a ghost table, copies rows in configurable chunks, applies live changes from the binlog and swaps the tables atomically when it is done.

What makes gh-ost operationally useful beyond just being online is that it is controllable mid-flight. You can pause it, throttle it, set a maximum replication lag it will respect and even postpone the final table swap until you are ready to watch it. This matters in production because “online” does not mean “invisible.” The copy process still generates load, and you may need to back off during peak hours.

For PostgreSQL, pgroll is the modern answer. Built by Xata and released as open source, pgroll implements the expand-contract pattern at the tooling level. It serves multiple schema versions simultaneously using PostgreSQL views, so your old application code and your new application code can run against the same database at the same time. Version 0.13.0 (released May 2025) added a baseline feature that compresses migration history, which matters once you have been running expand-contract migrations for a year and your migration directory has gotten out of hand.

Neither tool replaces the expand-contract pattern. They are the implementation layer for the specific cases where you cannot avoid touching a large table.

The Specific Mistakes That Cause Incidents

Beyond “ran ALTER TABLE on a big table,” there are a handful of specific patterns that kill production databases repeatedly.

Adding a NOT NULL column without a default. In MySQL prior to 8.0, this required a full table rewrite because every existing row needed a value. MySQL 8.0's INSTANT algorithm handles most column additions as metadata-only changes, but it has exclusions (compressed row formats, tables with full-text indexes, among others), so do not assume it applies. In PostgreSQL, a NOT NULL column with a volatile default still triggers a rewrite. The correct approach: add the column as nullable first, backfill, add the NOT NULL constraint after. PostgreSQL even lets you add the constraint as NOT VALID and then validate it separately, which is a massive win for large tables.
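On PostgreSQL, that nullable-then-constrain sequence can look like the following sketch (constraint and column names are illustrative; step 4 relies on PostgreSQL 12+, which can prove NOT NULL from a validated CHECK constraint and skip the table scan):

```php
use Illuminate\Support\Facades\DB;

// 1. Add the column nullable (metadata-only), then backfill in batches.
DB::statement('ALTER TABLE orders ADD COLUMN processed_at timestamptz');

// 2. Add the constraint without scanning existing rows (brief lock only).
DB::statement('ALTER TABLE orders ADD CONSTRAINT orders_processed_at_not_null
               CHECK (processed_at IS NOT NULL) NOT VALID');

// 3. Validate separately: scans the table but does not block writes.
DB::statement('ALTER TABLE orders VALIDATE CONSTRAINT orders_processed_at_not_null');

// 4. PostgreSQL 12+ uses the validated constraint to skip the full scan.
DB::statement('ALTER TABLE orders ALTER COLUMN processed_at SET NOT NULL');
DB::statement('ALTER TABLE orders DROP CONSTRAINT orders_processed_at_not_null');
```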

Renaming a column. This looks harmless. It is not. A rename is not a metadata-only operation in most engines. More importantly, you cannot rename a column safely in one step because your running application code still references the old name. You need to expand (add new column), backfill, switch, contract (drop old column). A rename migration that ships with application code in a single deploy is a data loss incident waiting to happen during a rollback.

Running backfills inside migration files. A migration file runs synchronously in the deployment pipeline. If your table has 20 million rows and your backfill updates them in a loop, the deployment times out, your CI/CD pipeline marks the deploy as failed, and the database is left in a half-migrated state. Backfills belong in queue jobs with configurable batch sizes and a delay between batches.

// Wrong: backfill inside a migration
public function up(): void
{
    Schema::table('orders', function (Blueprint $table) {
        $table->string('status_v2')->nullable();
    });

    // This will time out on any large table
    Order::whereNull('status_v2')->update(['status_v2' => 'pending']);
}

// Right: migration only touches schema
public function up(): void
{
    Schema::table('orders', function (Blueprint $table) {
        $table->string('status_v2')->nullable();
    });
}

// Separate artisan command or queued job for the backfill
class BackfillOrderStatusV2 implements ShouldQueue
{
    public function handle(): void
    {
        Order::whereNull('status_v2')
            ->chunkById(1000, function ($orders) {
                foreach ($orders as $order) {
                    $order->update(['status_v2' => $this->deriveStatus($order)]);
                }
                // Optional: add a small sleep to avoid hammering the DB
            });
    }
}

Creating indexes without CONCURRENTLY (PostgreSQL). A standard CREATE INDEX on PostgreSQL acquires a ShareLock that blocks all writes for the duration. CREATE INDEX CONCURRENTLY acquires a much weaker lock in multiple passes, allows writes throughout and does a final pass to pick up any changes it missed. It takes longer and cannot run inside a transaction. But for any table with active production writes, it is the only acceptable option.

For MySQL, request a non-blocking algorithm explicitly with a raw statement. (Laravel's fluent index builder does not expose the DDL ALGORITHM and LOCK modifiers; its algorithm() modifier sets the index type, such as HASH, not the rebuild algorithm.)

// MySQL: allow concurrent reads and writes during index creation,
// failing loudly if the server cannot honor the request
DB::statement('ALTER TABLE orders ADD INDEX orders_user_id_created_at_index
               (user_id, created_at), ALGORITHM=INPLACE, LOCK=NONE');

For PostgreSQL, run it directly, outside of a migration transaction:

// PostgreSQL: cannot run inside a transaction block, so the migration
// class must declare `public $withinTransaction = false;`
DB::statement('CREATE INDEX CONCURRENTLY idx_orders_user_id ON orders (user_id)');

A Practical Deploy Sequence for Laravel

Here is the actual process I run for any schema change that touches a table with more than a few hundred thousand rows.

Deploy 1: Expand. Add the new column as nullable. No application code changes. Migration runs fast, no locks, no sweat.

public function up(): void
{
    Schema::table('orders', function (Blueprint $table) {
        $table->timestamp('fulfilled_at')->nullable()->after('shipped_at');
    });
}

Dispatch backfill. Schedule the backfill command to run as a queued job after the deploy. Do not block the deploy on it. It runs in the background while production traffic continues normally.
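One hypothetical shape for that job, using set-based batched updates rather than row-by-row Eloquent saves (this sketch assumes MySQL, which allows LIMIT on UPDATE; the job name and the shipped_at derivation are illustrative):

```php
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Support\Facades\DB;

class BackfillFulfilledAt implements ShouldQueue
{
    public function handle(): void
    {
        do {
            // One set-based UPDATE per batch is far cheaper than loading
            // and saving each row through the ORM.
            $updated = DB::table('orders')
                ->whereNull('fulfilled_at')
                ->whereNotNull('shipped_at')
                ->limit(1000)
                ->update(['fulfilled_at' => DB::raw('shipped_at')]);

            usleep(100_000); // small pause so the database can breathe
        } while ($updated > 0);
    }
}
```

The batch size and the pause are the two knobs to tune against replication lag and CPU headroom.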

Deploy 2: Switch. Now that the column exists and has data, ship the application code that reads and writes fulfilled_at. Keep writing to the old column too, for the rollback window.

Deploy 3: Contract. Once the new column is stable and the old one is no longer referenced in code, drop it. Verify with grep -r 'old_column_name' app/ before running. Then:

public function up(): void
{
    Schema::table('orders', function (Blueprint $table) {
        $table->dropColumn('shipped_at');
    });
}

Three deploys instead of one. Each individual step is boring and safe. The whole thing is still faster to recover from if something goes wrong because each deploy is reversible until the contract step.

For teams on MySQL that need to do a heavyweight schema change outside this pattern, the laravel-zero-downtime-migration package by Daursu wraps pt-online-schema-change into the Laravel migration flow. It is a reasonable option when you have inherited a schema with no room for the gradual approach.

Separating Deploys From Migrations Is the Bigger Principle

The root cause of migration-related incidents is not developers writing bad migrations. It is the assumption that “deploy code” and “run migration” are one atomic operation.

They are not. They never were.

Schema changes have a different risk profile, different rollback story and different performance characteristics than application code changes. Treating them as one thing means accepting all the worst-case scenarios of both at once.

When you decouple migrations from deploys, you get to think about them separately. Can this migration run before the code is deployed? Can it run after? Does it need a backfill job to follow it? Does it need manual monitoring during the cutover? These are good questions with good answers. They are impossible to ask when your migration and your code ship in the same step.

The expand-contract pattern is not complicated. gh-ost and pgroll are not hard to set up. The actual work is convincing yourself that a three-step deploy is not slower than a one-step deploy that puts your database in traction.

It is much, much faster. Ask your incident report.


Originally published on Medium.
