Linux 7.0 Cuts PostgreSQL Performance in Half on AWS — Fix Won't Be Quick

5 min read · 1 source · breaking
├── "This is a critical kernel-level regression that cannot be fixed by database tuning"
│  ├── crcastle (Hacker News, 171 pts) → read

crcastle submitted the Phoronix report highlighting that an AWS engineer found PostgreSQL performance halved on Linux 7.0 versus the 6.x series. The framing emphasizes that the fix 'may not be easy,' suggesting the regression is rooted deep in kernel architectural changes rather than being a simple patchable bug.

│  └── Michael Larabel (Phoronix) → read

Reports that the root cause lies in kernel-level changes introduced in the 7.0 release cycle — likely in the scheduler, memory management, or I/O subsystem — not in PostgreSQL itself. No amount of postgresql.conf tuning will recover the lost performance, making this a problem only the kernel community can address.

├── "The real-world impact is a budget emergency for cloud infrastructure, not just a technical curiosity"
│  └── top10.dev editorial (top10.dev) → read below

Argues that a 50% throughput regression on the most widely deployed database running on the most widely used cloud platform translates directly to doubled infrastructure costs. AWS runs millions of PostgreSQL instances across RDS, Aurora, and EC2, making this a financial crisis for anyone who upgrades without rigorous kernel-level benchmarking.

└── "The regression likely stems from a fundamental architectural change that will be difficult to resolve"
  └── AWS engineer (via Phoronix) → read

The AWS engineer who identified the regression through production-representative benchmarks assessed that fixing the issue 'may not be easy.' This suggests the performance drop originates from a deliberate architectural change in the kernel — such as scheduler or memory subsystem rework — rather than an accidental bug amenable to a quick point-release patch.

What Happened

An engineer at AWS has reported that PostgreSQL performance drops by approximately 50% when running on Linux 7.0 compared to the 6.x kernel series. The regression was identified through production-representative benchmarks on AWS infrastructure, and the findings have drawn significant attention on Hacker News (171 points), signaling broad concern across the infrastructure community.

The performance hit is not a minor edge case — it's a halving of throughput on one of the most widely deployed databases in the world, running on the most widely used cloud platform. The report, covered by Phoronix, indicates that the root cause lies in kernel-level changes introduced in the 7.0 release cycle, not in PostgreSQL itself. This is a critical distinction: no amount of `postgresql.conf` tuning will recover what the kernel took away.

Perhaps most concerning is the assessment that fixing the issue "may not be easy." This suggests the regression stems from a fundamental architectural change in the kernel — likely in the scheduler, memory management, or I/O subsystem — rather than a simple bug that can be patched in a point release.

Why It Matters

PostgreSQL and Linux are the foundational pairing for a massive portion of production infrastructure. AWS alone runs millions of PostgreSQL instances across RDS, Aurora PostgreSQL-compatible, and customer-managed EC2 deployments. A 50% throughput regression doesn't just mean slower queries — it means doubled infrastructure costs to maintain the same performance envelope, or degraded user experience for anyone who upgrades without testing.

When your kernel upgrade doubles your database bill, that's not a performance regression — it's a budget emergency.

The Linux kernel's major version transitions have a history of database performance surprises. The 5.x to 6.x transition brought its own set of scheduler and memory management changes that affected database workloads, though none as dramatic as what's being reported here. PostgreSQL is particularly sensitive to kernel behavior because of its process-per-connection architecture and heavy reliance on the OS page cache and buffer management. Unlike databases that manage their own memory pools more aggressively, Postgres trusts the kernel to do the right thing with shared buffers, huge pages, and I/O scheduling.

The Hacker News discussion reflects a community that has learned this lesson repeatedly. Database administrators and infrastructure engineers know that kernel upgrades on database servers are never routine — they're treated with the same caution as a major PostgreSQL version upgrade, complete with shadow traffic testing and gradual rollouts.

The "not easy" fix is the real story here. When a kernel developer says a fix won't be straightforward, it typically means one of two things: the regression is a side effect of a deliberate architectural improvement that benefits other workloads, or the fix requires rethinking assumptions that are baked into multiple subsystems. Either way, it means the Linux kernel community faces an uncomfortable trade-off: revert useful changes to restore database performance, or ask the database community to wait while a proper solution is engineered.

The Kernel-Database Interface Problem

This regression highlights a deeper structural issue in how databases and operating systems co-evolve. PostgreSQL's architecture was designed for an era of Linux kernel behavior that has been incrementally changing. The implicit contract between PostgreSQL and the Linux kernel — around process scheduling, memory page management, and I/O prioritization — has no formal specification, and it breaks silently.

Modern kernel development optimizes for a broad set of workloads: containers, microservices, cloud-native applications with many short-lived processes. Database workloads look fundamentally different — long-lived processes, large shared memory segments, sequential and random I/O patterns that don't match the assumptions of general-purpose schedulers. Every time the kernel improves for the common case, it risks degrading the database case.

This is not a new tension. The introduction of transparent huge pages (THP) years ago caused similar PostgreSQL performance disasters, leading to the now-standard advice to disable THP on database servers. The cgroup v2 migration introduced its own set of database-specific gotchas. Each kernel generation adds another item to the "things to check before upgrading your database server's kernel" list.
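As an illustration of the first item on that checklist, here is a minimal sketch for inspecting and disabling THP on a Linux database host. The sysfs path is the standard location; note the runtime write does not survive a reboot, so persist the setting through your boot configuration or a systemd unit.

```shell
# Show the active THP mode -- the bracketed value is the one in effect,
# e.g. "always madvise [never]"
cat /sys/kernel/mm/transparent_hugepage/enabled

# Disable THP until the next reboot (long-standing advice for PostgreSQL hosts);
# make it permanent via a kernel boot parameter or a systemd unit
echo never | sudo tee /sys/kernel/mm/transparent_hugepage/enabled
```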

The PostgreSQL community has historically responded to these issues by adding kernel-specific workarounds — configuration parameters that compensate for kernel behavior changes. But this approach has limits. At some point, the database needs a kernel that behaves predictably, not a pile of workarounds for kernel regressions.

What This Means for Your Stack

If you're running PostgreSQL on Linux in production — whether self-managed on EC2, on bare metal, or even on managed services where you control the kernel — do not upgrade to Linux 7.0 on database servers until this is resolved. Pin your kernel version explicitly. If you're using rolling-release distributions (Arch, Fedora, etc.) on database infrastructure, this is a good time to reconsider that choice.

If you're running managed PostgreSQL services (RDS, Aurora, Cloud SQL), your cloud provider will handle this — but it may delay their adoption of 7.0 features you were counting on. Contact your provider's support to ask about their kernel qualification timeline.

For teams doing infrastructure-as-code, add kernel version constraints to your database server provisioning. Your Terraform modules, Ansible playbooks, or CloudFormation templates should treat the kernel version as a first-class configuration parameter for database nodes, not something that floats with the latest AMI.

For capacity planning, factor in the possibility that kernel upgrades may not be performance-neutral. If your database servers are running at 60%+ CPU utilization, a 50% throughput regression means you're going from healthy to overloaded with a single `apt upgrade`. Build kernel version testing into your load testing pipeline.
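A minimal A/B harness for that pipeline: run an identical pgbench workload on each kernel and compare throughput. The pgbench flags are standard; the tps figures below are placeholders standing in for your two measured runs, not real results.

```shell
# On each kernel, initialize once and run the same workload:
#   pgbench -i -s 100 bench            # initialize at scale factor 100
#   pgbench -c 16 -j 4 -T 300 bench    # 16 clients, 4 threads, 5 minutes
# Record the tps figure pgbench reports from each run, then compare:

tps_6x=12000    # placeholder: tps measured on the 6.x kernel
tps_70=6000     # placeholder: tps measured on 7.0

ratio=$(awk -v a="$tps_6x" -v b="$tps_70" 'BEGIN { printf "%.2f", b / a }')
echo "7.0 retains ${ratio}x of 6.x throughput"
```

With the placeholder numbers the ratio comes out at 0.50 — the halving the report describes.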

Looking Ahead

This regression will likely accelerate two trends. First, expect more database vendors and cloud providers to invest in kernel-bypass technologies — io_uring, user-space networking, and direct storage access — that reduce their dependency on kernel behavior. Second, the conversation about whether PostgreSQL's process-per-connection model needs fundamental rethinking will get louder, as each kernel generation makes the assumptions underlying that architecture a little less reliable. For now, the practical advice is simple: don't upgrade, test everything, and watch the kernel mailing list for resolution timelines.

Hacker News 361 pts 109 comments

AWS Engineer Reports PostgreSQL Perf Halved by Linux 7.0, Fix May Not Be Easy

→ read on Hacker News
lfittl · Hacker News

It's worth reading this follow-up LKML post by Andres Freund (who works on Postgres): https://lore.kernel.org/lkml/yr3inlzesdb45n6i6lpbimwr7b25kqk...

galbar · Hacker News

It's not a good look to break userspace applications without a deprecation period where both old and new solutions exist, allowing for a transition period.

harshreality · Hacker News

Background on PREEMPT_LAZY: https://lwn.net/Articles/994322/

dsr_ · Hacker News

Nobody sensible runs the latest kernel; nobody running PG in production should be afraid of setting a non-default at either boot time or as a sysctl. So this will, most likely, be another step in building a PG database server (turn off pre-emption if your kernel is 7.0 or later and PG is pre-whateve

longislandguido · Hacker News

Anyone check to see if Jia Tan has submitted any kernel patches lately?
