Your Database Transactions Are Tanking Throughput

ACID compliance isn’t free. Neither is the isolation level you picked without testing.

You inherited a backend service. It processes orders. Throughput is capped at about 800 requests per second. You keep hitting database lock contention, and the team blames “the database.”

They’re wrong. The database is fine. Your isolation level is the problem.

This isn’t a story about MySQL being slow or PostgreSQL having mysterious performance cliffs. It’s about a decision your predecessor made in 2019 and never revisited. They set the isolation level to SERIALIZABLE because “data consistency” sounded important. No one actually measured what that costs.

The ACID Bargain Nobody Explains

When you start a transaction, your database doesn’t just execute commands. It makes promises.

Atomicity says all commands in your transaction succeed or all fail. Fair.

Consistency says your data ends up in a valid state. Sure.

Isolation says other transactions don’t see your partial work. OK.

Durability says once you commit, it stays committed even if the server catches fire.

These four things cost CPU cycles, lock contention, and latency. The more you demand, the slower the whole system becomes.

Most teams run at READ COMMITTED or REPEATABLE READ. Some run SERIALIZABLE. Everyone pays a price. Most never measure it.

How Isolation Levels Actually Work

The SQL standard defines four isolation levels. PostgreSQL accepts all four but implements three distinct behaviors. Think of them as a spectrum:

Read Uncommitted doesn’t really exist in PostgreSQL (it treats it as Read Committed). Dirty reads are technically allowed by the SQL standard but you don’t actually get them in Postgres.

Read Committed is the default. Each statement sees only data that was committed before that statement began. Within a transaction, multiple reads of the same row might see different values if another transaction commits changes in between. This is fast. It handles high throughput. Most SaaS companies run this and sleep fine.

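Repeatable Read adds stricter isolation. The transaction takes a snapshot at its first statement and sees that same snapshot for the rest of its life. This prevents non-repeatable reads. The SQL standard still allows phantom rows at this level, where a query returns different results on a second run because another transaction inserted new rows. PostgreSQL doesn’t: it implements Repeatable Read as Snapshot Isolation, which is stronger than the standard requires and rules out phantoms too. What it still permits is write skew, where two transactions each read overlapping data and then write different rows based on what they read. You get consistency that’s good enough for most things. The price is that a write to a row another transaction changed after your snapshot fails with a serialization error and has to be retried, so throughput can drop under concurrent writes to the same rows.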

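Serializable says your transaction behaves as if it ran completely alone. Other transactions still run concurrently, but any committed result must be equivalent to running them one at a time in some order. PostgreSQL enforces this with Serializable Snapshot Isolation (SSI), which uses predicate locks to track what each transaction read. Those locks don’t block anyone. They let the database detect dangerous read/write dependencies between concurrent transactions and abort one of them with a serialization failure. It’s thorough. It’s also slower under contention because the tracking costs memory and CPU on every transaction, and every aborted transaction has to be run again.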

The Performance Reality

PostgreSQL’s official documentation makes this clear: SERIALIZABLE isolation adds overhead for monitoring read/write dependencies between transactions, and it causes more transactions to be aborted and retried than REPEATABLE READ does. The relative impact grows with concurrent load.

Real teams report similar patterns. Read Committed scales better under concurrent writes. Repeatable Read tends to perform worse. Serializable tends to perform worse still. The exact percentages vary by workload shape and hardware, but under write contention the direction is usually the same.

One production team reported moving from SERIALIZABLE to REPEATABLE READ and seeing throughput nearly double, with retry logic handling the rare conflicts. Another moved from REPEATABLE READ to READ COMMITTED and eliminated lock timeouts entirely.

You don’t hit SERIALIZABLE limits because your database is weak. You hit them because every serializable transaction is tracked for read/write dependencies against the rows and ranges other transactions touch, and some of them are aborted and have to run again. Retries hold connections longer. Connections queue. Timeouts compound.

Production monitoring won’t show “isolation level is slow.” It’ll show “database connection timeout” or “lock timeout after 30 seconds.” You blame the database. The database was never the problem.

What Isolation Level Do You Actually Need?

If you’re building an accounting system or a payment processor, the answer is “higher than Read Committed but probably not Serializable.” Here’s why:

Repeatable Read with application-level validation works. You read the account balance, validate it’s above the withdrawal amount, update the row, and commit. If another transaction updated that same row and committed after your snapshot was taken, Postgres rejects your update with a serialization failure. Your application retries. It works. Everyone gets the throughput they need.

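Serializable buys you automatic conflict detection. Your application doesn’t need to validate, not even for races that span several rows or tables. The database does it for you. You still need retry logic, because the database reports a conflict by aborting one of the transactions. And you pay for the tracking in every single transaction. This is only worth it if the possible races are too many or too subtle to guard against by hand in application code.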

Most of you don’t have that constraint.

The False Equivalence of “Consistency”

Teams often say “we use Serializable because we need consistency.” This is cargo-cult engineering.

Consistency isn’t about isolation level. Consistency is about not writing garbage to disk. You get that from integrity constraints, foreign keys, check constraints, and not writing invalid data. These work at any isolation level.

Isolation level controls concurrency behavior, not correctness. At Read Committed, your data is still consistent. It’s just that concurrent transactions see different versions of truth. That’s often fine. That’s literally by design.

If you’re processing payments and you need to ensure no one spends the same dollar twice, Read Committed with application validation handles it, as long as the check locks the row with SELECT ... FOR UPDATE or is part of the UPDATE itself. A plain read followed by a separate write can still race. Repeatable Read handles it too. Serializable feels safer because you don’t have to write validation code. But that code is part of your job. It’s not optional. You’re just pushing the burden from application to database.
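Here’s a minimal sketch of validation folded into the write. It leans on one PostgreSQL behavior: at Read Committed, an UPDATE that had to wait for a concurrent writer re-checks its WHERE clause against the newly committed row before applying. The accounts table and the spend function are made up for illustration, and the sketch runs against Python’s built-in sqlite3 only so you can try it without a server. The SQL statement is the part that carries over.

```python
import sqlite3


def spend(conn, account_id, amount):
    """Debit the account only if the balance covers it; return True on success."""
    cur = conn.execute(
        "UPDATE accounts SET balance = balance - ? "
        "WHERE id = ? AND balance >= ?",
        (amount, account_id, amount),
    )
    conn.commit()
    # rowcount is 0 when the guard in the WHERE clause rejected the debit.
    return cur.rowcount == 1


# Hypothetical schema and data, in memory so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)"
)
conn.execute("INSERT INTO accounts (id, balance) VALUES (1, 100)")
conn.commit()

first = spend(conn, 1, 80)   # covered by the balance
second = spend(conn, 1, 80)  # would overdraw, so no row is updated
balance = conn.execute("SELECT balance FROM accounts WHERE id = 1").fetchone()[0]
```

The check and the write are one statement, so there is no gap between them for another transaction to slip into. A PostgreSQL driver will use its own placeholder syntax instead of the question marks.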

When Serializable Actually Makes Sense

There are real cases where Serializable is correct:

Financial reconciliation. You’re computing sums across tables and making decisions based on them. Race conditions in the computation are worse than slow throughput. Accept the tradeoff.
Auction systems where the same resource can’t sell twice. Stock inventory where inventory must never go negative. Truly mutually exclusive operations where you can’t afford a miss.
Constraint enforcement that’s cheaper to do in the database than in code. If you have 20 competing services all wanting to claim the same row, and they fire simultaneously, Serializable prevents embarrassing failures. But you could also just have a single service own that resource.

Note the pattern: these are the exceptions, not the rule. Most CRUD operations, event processing, and API backends don’t need Serializable.

The Real Cost Analysis

Here’s what you get at different isolation levels:

Read Committed:

Highest throughput under concurrent writes
Simple validation code in your application
You handle retries on conflict
No phantom read protection

Repeatable Read:

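Lower throughput than Read Committed when transactions write the same rows (how much depends on the workload, so measure it)
Better consistency guarantees
No phantom rows in PostgreSQL, but write skew is still possible
Retry logic still needed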

Serializable:

Dramatically lower throughput under concurrent load
No hand-written validation code needed
Database handles all conflict detection
Retry logic still required for serialization failures
Significant monitoring overhead

If you need more throughput than your current isolation level gives you, you have two paths:

1. Add more database servers and implement read replicas. Expensive and only partially helps (writes still go to one primary).
2. Lower your isolation level and accept the operational burden of validation code.

Most teams pick path 2 and never look back. Some teams don’t realize they have a choice.

The Observability You’re Missing

How do you know what’s actually limiting you?

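Pull pg_stat_statements for your database. It reports total and mean execution time per query, but it doesn’t break out time spent waiting on locks. To see that, turn on log_lock_waits, which logs any lock wait that lasts longer than deadlock_timeout. If a query is spending 400ms waiting for locks and only 50ms executing, you have a contention problem, and your isolation level is the first suspect.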

Look at pg_stat_activity and count the connections whose wait_event_type is “Lock”. If 30% of your connections are waiting on locks, you have a contention problem, and your isolation level is the first suspect.
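A rough sketch of that count. On PostgreSQL 9.6 and later, pg_stat_activity reports lock waits through the wait_event_type and wait_event columns. The query string below is meant to be run with whatever client you already use. The helper function and the sample rows are invented for illustration, so the block runs without a database.

```python
# Query for PostgreSQL 9.6+; run it with the client of your choice.
LOCK_WAIT_QUERY = """
SELECT pid, state, wait_event_type, wait_event
FROM pg_stat_activity
WHERE datname = current_database()
"""


def lock_wait_fraction(rows):
    """Share of sessions waiting on a lock.

    rows: iterable of (pid, state, wait_event_type, wait_event) tuples,
    in the column order LOCK_WAIT_QUERY returns them.
    """
    rows = list(rows)
    if not rows:
        return 0.0
    waiting = sum(1 for _pid, _state, wait_type, _event in rows if wait_type == "Lock")
    return waiting / len(rows)


# Made-up sample of what the query might return.
sample = [
    (101, "active", "Lock", "transactionid"),
    (102, "active", None, None),
    (103, "idle", "Client", "ClientRead"),
    (104, "active", "Lock", "tuple"),
]
fraction = lock_wait_fraction(sample)
```

One snapshot tells you little. Sample it every few seconds under load and watch the trend.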

Run EXPLAIN ANALYZE on your slowest queries. If it reports 50ms of execution time on a quiet system but the same query takes 2 seconds under production load, you’re probably waiting on locks or blocked by an exclusive lock someone else holds.

Most teams never do this. They just crank up connection pool size and hope. That’s the wrong lever.

Another signal is application-level observability. If your application code shows that reads and writes are fast in isolation but slow under load, you’re hitting contention at the database. If a single request that does ten queries takes 5 seconds when the sum of query times is 500ms, you’re waiting on locks.

If you see serialization failures (Postgres error code 40001) appearing regularly in your logs, your isolation level is too strict for your workload. Every failure means a retry, which means wasted work, which means lower effective throughput.
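To put a number on that wasted work, here is a back-of-the-envelope sketch. It assumes every attempt fails independently with the same probability, which real workloads only approximate. The 20% failure rate is a hypothetical input, and 800 attempts per second echoes the figure from the top of this piece.

```python
def expected_attempts(failure_rate):
    """Mean attempts per committed transaction, assuming independent failures."""
    if not 0 <= failure_rate < 1:
        raise ValueError("failure_rate must be in [0, 1)")
    return 1 / (1 - failure_rate)


def committed_per_second(attempts_per_second, failure_rate):
    """Throughput that actually commits when some attempts are aborted."""
    return attempts_per_second * (1 - failure_rate)


# Hypothetical inputs: 800 attempts per second, 20% aborted with 40001.
attempts = expected_attempts(0.2)           # about 1.25 attempts per commit
committed = committed_per_second(800, 0.2)  # about 640 commits per second
```

A fifth of the database’s work is thrown away in this scenario. That is the number to track when you compare isolation levels.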

Logging these failures is the first step. Most teams deploy Serializable without any visibility into how often conflicts occur. Then they’re surprised when load testing reveals the collapse.

Three Mistakes You’re Probably Making Right Now

Mistake 1: Long-running transactions. You open a transaction, do some work, call an external API, wait for a response, then commit. During that wait, you’re holding locks. Other transactions queue behind you. This amplifies isolation level costs. Keep transactions short. Call external APIs outside transactions.

Mistake 2: Implicit assumptions about ordering. You assume that once transaction A commits, the next query in transaction B sees A’s changes. That’s true at Read Committed. But at Repeatable Read, transaction B works from the snapshot taken at its first statement, so it won’t see A’s changes if A committed after that point. This causes subtle bugs where a value you swear was updated isn’t visible. Test this assumption explicitly.
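The visibility rule behind Mistake 2 is easier to see in a toy model. This is not how PostgreSQL is implemented. ToyStore and Reader are invented classes that mimic only one thing: a Read Committed reader takes a fresh snapshot for every statement, while a Repeatable Read reader pins its snapshot at its first statement.

```python
class ToyStore:
    """Keeps every committed version of a key, tagged with a commit sequence."""

    def __init__(self):
        self.commit_seq = 0
        self.versions = {}  # key -> list of (commit_seq, value)

    def commit(self, key, value):
        self.commit_seq += 1
        self.versions.setdefault(key, []).append((self.commit_seq, value))

    def read_as_of(self, key, snapshot):
        visible = [value for seq, value in self.versions.get(key, []) if seq <= snapshot]
        return visible[-1] if visible else None


class Reader:
    """A transaction that only reads, at one of two toy isolation levels."""

    def __init__(self, store, repeatable):
        self.store = store
        self.repeatable = repeatable
        self.snapshot = None

    def read(self, key):
        # Read Committed: new snapshot per statement.
        # Repeatable Read: snapshot taken once, at the first statement.
        if self.snapshot is None or not self.repeatable:
            self.snapshot = self.store.commit_seq
        return self.store.read_as_of(key, self.snapshot)


store = ToyStore()
store.commit("price", 10)

rc = Reader(store, repeatable=False)
rr = Reader(store, repeatable=True)
rc_first = rc.read("price")
rr_first = rr.read("price")   # the Repeatable Read snapshot is pinned here

store.commit("price", 20)     # transaction A commits after both first reads

rc_second = rc.read("price")  # sees A's commit
rr_second = rr.read("price")  # still sees the pinned snapshot
```

Both readers start out seeing 10. After the second commit, the Read Committed reader sees 20 and the Repeatable Read reader still sees 10.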

Mistake 3: Ignoring distributed transactions. If your transactions span multiple databases (a legacy application calling a separate microservice), you have no true ACID guarantee anyway. Two-phase commit (2PC) is slow and fragile. Accept that you’re running Read Committed semantics across services and design accordingly.

The Migration You’re Afraid Of

You want to lower your isolation level from SERIALIZABLE to REPEATABLE READ. You’re afraid you’ll lose consistency.

You won’t. Here’s how it actually goes:

1. Audit your application code. Find every place where you read data, make a decision, and write back. Add explicit version checks. Increment a version column. Compare before update.
2. Wrap those paths in retry logic that catches serialization failures from the database.
3. Set isolation level to REPEATABLE READ on a test instance.
4. Run load tests with your actual traffic patterns. Measure throughput improvement and serialization failure rate.
5. Deploy to production with careful monitoring. Your retry logic should catch failures cleanly.
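The version check from the audit step might look something like this. It is a sketch under assumptions: the orders table, its version column, and the StaleVersion exception are all hypothetical, and it runs on Python’s built-in sqlite3 purely so it works without a server. The UPDATE statement is the piece you would port.

```python
import sqlite3


class StaleVersion(Exception):
    """Raised when the row changed after the caller read it."""


def update_with_version_check(conn, order_id, new_status, expected_version):
    """Write only if the row is still at the version the caller read."""
    cur = conn.execute(
        "UPDATE orders SET status = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_status, order_id, expected_version),
    )
    if cur.rowcount != 1:
        conn.rollback()
        raise StaleVersion(f"order {order_id} is no longer at version {expected_version}")
    conn.commit()


# Hypothetical schema and data, in memory so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, status TEXT NOT NULL, version INTEGER NOT NULL)"
)
conn.execute("INSERT INTO orders (id, status, version) VALUES (1, 'new', 1)")
conn.commit()

# Two workers both read version 1, then both try to write.
update_with_version_check(conn, 1, "paid", expected_version=1)
try:
    update_with_version_check(conn, 1, "cancelled", expected_version=1)
    second_write = "applied"
except StaleVersion:
    second_write = "rejected"

status, version = conn.execute(
    "SELECT status, version FROM orders WHERE id = 1"
).fetchone()
```

The second writer loses cleanly instead of overwriting the first. What it does next (re-read and retry, or give up) is an application decision.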

This takes a week of careful work. Your throughput probably improves significantly. The bugs you avoid by measuring are worth the effort.

Testing Isolation Level Changes

Before you change isolation levels in production, test it correctly. That means:

Run your load tests at the new isolation level. Not a gentle warm-up. Actual production traffic patterns with realistic concurrency. Watch for serialization failures in your retry logs.

Test your retry logic explicitly. Write a test that deliberately triggers a serialization conflict and verify that your retry code handles it without data corruption.
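A minimal sketch of a retry wrapper and a test that forces the conflict. FakeDbError stands in for whatever exception your driver raises, and the sqlstate attribute name is an assumption. Real drivers expose the SQLSTATE code under their own attribute names, so adapt the check. run_with_retry and flaky_transfer are invented for illustration.

```python
import random
import time

SERIALIZATION_FAILURE = "40001"  # PostgreSQL SQLSTATE for serialization_failure


class FakeDbError(Exception):
    """Stand-in for a driver exception that carries a SQLSTATE code."""

    def __init__(self, sqlstate):
        super().__init__(f"SQLSTATE {sqlstate}")
        self.sqlstate = sqlstate


def run_with_retry(txn, max_attempts=5, base_delay=0.01, sleep=time.sleep):
    """Call txn() until it succeeds, retrying only on serialization failures."""
    for attempt in range(1, max_attempts + 1):
        try:
            return txn()
        except Exception as exc:
            retryable = getattr(exc, "sqlstate", None) == SERIALIZATION_FAILURE
            if not retryable or attempt == max_attempts:
                raise
            # Exponential backoff with jitter so retries don't collide again.
            sleep(base_delay * (2 ** (attempt - 1)) * random.random())


# Test double: conflicts on the first two attempts, commits on the third.
calls = []


def flaky_transfer():
    calls.append(1)
    if len(calls) < 3:
        raise FakeDbError(SERIALIZATION_FAILURE)
    return "committed"


result = run_with_retry(flaky_transfer, sleep=lambda _seconds: None)
```

In real code, txn has to replay the whole transaction from the start, reads included. Retrying only the statement that failed reuses decisions made on stale data.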

Measure latency distribution, not just average. If 99th percentile latency triples, you want to catch it. If you only watch the average or the 50th percentile, an improvement there can hide the problem.
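Here is a small sketch of why the tail matters, using only Python’s statistics module. The latency samples are fabricated to make the point. They are not measurements from any real system.

```python
import statistics


def latency_summary(samples_ms):
    """Return mean, p50, and p99 for a list of latencies in milliseconds."""
    # n=100 yields 99 cut points; index 49 is the 50th percentile, 98 the 99th.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {
        "mean": statistics.fmean(samples_ms),
        "p50": cuts[49],
        "p99": cuts[98],
    }


# Fabricated samples: every request took 20 ms before the change.
before = latency_summary([20.0] * 100)

# After the change most requests are faster, but 3 in 100 stall for 400 ms.
after = latency_summary([10.0] * 97 + [400.0] * 3)
```

In this made-up data the median halves and the mean barely moves, while the 99th percentile goes from 20 ms to 400 ms. A dashboard that shows only the average would call it a wash.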

Test with your actual hot tables and access patterns. A generic benchmark won’t reveal the conflicts that happen in your specific schema.

The Configuration You Can Tune

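PostgreSQL also exposes default_transaction_isolation. It sets the default for new transactions, and you can set it for the whole server, a database, a role, or a single session. You can override it per transaction:

BEGIN;
-- Must run inside the transaction, before its first query.
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
  -- your transaction
COMMIT;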

This is useful for scenarios where most of your work runs at Read Committed but specific critical paths need Repeatable Read or Serializable. You don’t have to pick one level for everything.

This nuance is lost on teams who set it globally and never think about it again.

The Uncomfortable Truth

Your database isolation level is a tradeoff, not a moral choice. Higher isolation buys you freedom from thinking about concurrency in application code. It’s a convenience tax. You pay in CPU, lock contention, and throughput.

Lower isolation makes you think harder about what can race. But it scales better and costs less.

The mistake is thinking there’s a right answer independent of your workload. There isn’t. The right answer is the one you measured.

Some teams default to SERIALIZABLE because it feels safer. Most of them never measure whether it’s actually necessary. Many lose significant throughput as a result.

Your predecessor didn’t measure. You’re inheriting the cost.

Measure yours. Then decide.