The editorial argues that Pull Requests, Issues, Git Operations, and API Requests all flipping red within the same minute cannot be coincidence — these surfaces share a backbone (typically the Vitess-sharded MySQL primary, the auth path, or the Spokes Git-serving control plane). GitHub's postmortems over the last 18 months have repeatedly fingered exactly these suspects, so the pattern is more diagnostic than the individual symptoms.
The editorial reframes the incident away from GitHub's fault and toward consumer fragility: every major CI system defaults to aggressive retries on git failures, so a brownout doesn't just stall pipelines — it amplifies load against an already-degraded service. Dependabot PRs piling up and bots flooding retry logs are symptoms of an industry that builds on GitHub as if it were infrastructure with no failure mode.
By posting the status-page link as-is and watching it accumulate 188 points and 147 comments within the hour, the submission treats the incident as a familiar ritual rather than a novel event. The thread's velocity — engineers narrating stalled CI queues and Dependabot retries — suggests the community now experiences GitHub degradations as a predictable shared interruption, not a surprise.
On the morning of the incident, GitHub's status page (githubstatus.com/incidents/xy1tt3hs572m) flipped four product surfaces to degraded at effectively the same moment: Pull Requests, Issues, Git Operations, and API Requests. The Hacker News thread climbed past 188 points within the hour, dominated by the usual genre of comment — engineers watching their CI queue stall, bots flooding retry logs, and Dependabot PRs piling into a void.
The incident itself was relatively short by GitHub's recent standards, but the surface area was the story. Four 'independent' product areas going red within the same minute is not four bugs — it's one bug wearing four hats. Git Operations covers the raw `git push`/`git fetch` path over HTTPS and SSH. Pull Requests is the merge-state machine on top of it. Issues is a separate-ish product surface backed by the same primary datastore. API Requests is the REST and GraphQL gateway that fronts all of the above. When all four degrade together, you are not looking at coincidence; you are looking at a shared dependency — usually the primary MySQL cluster, the auth service, or the rate-limiter fronting them.
GitHub's post-incident summaries over the last 18 months have repeatedly fingered the same suspects: replication lag on the Vitess-sharded MySQL fleet, a misbehaving auth path, or a control-plane change that cascaded through Spokes (their internal Git serving layer). We don't yet have the postmortem for this one. We do have the pattern.
The interesting question isn't "why did GitHub break." Everything breaks. The interesting question is why does your build pipeline assume it won't.
Walk through a typical mid-sized engineering org during one of these brownouts. The CI runner can't `git fetch`, so jobs hang until the 30-minute timeout, then retry, then hang again. Every CI system on Earth defaults to aggressive retries on git failures, which means a GitHub brownout doesn't just stall your pipeline — it actively makes GitHub's recovery harder by hammering them with retry storms from millions of runners simultaneously. A real example: during the January 2023 incident, GitHub's own status updates noted that retry traffic from CI systems extended the recovery window by an estimated 40 minutes after the underlying issue was resolved. The thundering herd is real, and you are part of it.
Meanwhile, your reviewers can't load PR diffs (Pull Requests surface down), your on-call can't file an incident in your GitHub-Issues-backed runbook (Issues down), your Slack bot can't comment on the PR to notify the author (API down), and your deploy tooling can't tag a release (Git Ops down). The four surfaces look independent in the marketing copy and on the status page, but they collapse together into a single point of failure the moment something upstream of all of them blinks.
This is the shared-fate problem, and it's structural. GitHub can't fully decouple these surfaces without rebuilding the product. Pull Requests *fundamentally needs* Git Ops to compute mergeability. The API *fundamentally needs* the primary datastore to serve writes. Issues *shares* notification infrastructure with PRs. The dependency graph is not a bug — it's the architecture. The honest read is: GitHub will continue to have multi-surface outages roughly quarterly, because the only alternative is rewriting the product, and they're not going to.
Compare this to the discipline you'd apply to your own infra. If your service had four critical subsystems and they all failed together every 90 days, you'd have a SEV-1 architectural review and a multi-quarter project to decompose the failure domains. GitHub's scale and product complexity makes that calculus different — and the upshot for you is that you are running production on top of a vendor whose blast radius is wider than you've been pretending.
A few concrete moves, in rough order of leverage:
1. Mirror your critical repos. Set up a read-only mirror on a second Git host — GitLab, Gitea, Codeberg, an S3-backed `git bundle`, whatever. Cron it every 15 minutes. The total cost is under an hour of setup and roughly $0/month; the upside is that a two-hour GitHub outage no longer blocks a hotfix deploy. Your CI can fall back to the mirror with a one-line URL swap.
2. Cap your CI retry behavior. Audit every job that runs `git fetch`, `gh` CLI calls, or hits `api.github.com`. Add exponential backoff with jitter, and cap total retry attempts at 3. Most teams have GitHub Actions workflows or Jenkins pipelines that will retry indefinitely on network failure — these are precisely the jobs that turn a 20-minute GitHub blip into a 90-minute outage for your team and a thundering herd for GitHub.
3. Decouple your runbooks from GitHub Issues. If your incident response process starts with "open an issue in the runbook repo," you have a circular dependency the moment GitHub goes down. Move your incident commander checklist to a static page, a Notion doc, or a printed PDF — anything that survives a GitHub outage.
4. Don't use GitHub as your status communication channel. If your status page is a GitHub Pages site, your customers can't reach it when GitHub is down. This is more common than you'd think.
GitHub's reliability profile is, on average, very good — better than most teams could build themselves. But "on average" hides the shape of the failure mode, which is infrequent, correlated, multi-surface, hours-long. That failure shape is poorly matched to the way most engineering teams have wired GitHub into their critical path, treating it as an always-on utility rather than a vendor with a quarterly bad day. The fix isn't to leave GitHub; the alternatives have their own incidents and you'd lose the network effects. The fix is to stop pretending the four-surface outage is unusual, and start designing for it the same way you design for an AWS region going dark. Treat the next GitHub status page red square as a fire drill you've already rehearsed, not a surprise.
This is getting ridiculous. One particularly concerning thing I’m seeing is that pull requests on both the web UI and API aren’t reflecting all commits or branch changes consistently. It would be very easy to merge something without realizing you’re not actually reviewing the full diff.
Before clicking, I assumed this was going to be a write-up of the one from a few days ago instead of an entirely new incident.
New PR: revert GitHub software and infrastructure to version of June 1st, 2018.New PR: disable new user signups for 6 monthsHR initiative: all future KPIs automatically require three-nines availability; all bonuses are forfeited, regardless of accomplishments, if annual availability falls below targ
is it me or ever since AI coding became the norm, there have been way more outages with otherwise reliable services?I get downtime on Supabase every few weeks. Even Cloudflare. And now Github
Top 10 dev stories every morning at 8am UTC. AI-curated. Retro terminal HTML email.
https://isgithubcooked.comNormally I defend GH in the comments of these incidents but it’s been an impressively bad month by their standards, even when you filter for critical components filter out sev-2’s and 3’s.