GitHub's Merge Queue Broke the Review-to-Main Contract

Three days after I wrote about GitHub Copilot’s privacy split, GitHub managed to tell a much uglier story about itself. The last post was about data governance. This one is about something more basic: whether the code review system can be trusted to put the reviewed code on main.

On April 23, between 16:05 and 20:43 UTC, GitHub’s Pull Requests service regressed in a way that broke merge queue correctness. According to GitHub’s own incident report, 2,092 pull requests were affected. During that window, pull requests merged through the merge queue with the squash method could produce incorrect merge commits: a later merge could silently revert changes from earlier pull requests, and even changes from prior commits.

That is not an outage story. It is a trust story.

A normal outage tells you the machine is down. A merge correctness bug tells you the machine smiled, showed green checks, said “merged,” and then wrote the wrong bytes to the branch you treat as canonical truth.

GitHub’s own merge queue documentation makes the promise clearly: the queue ensures a pull request’s changes pass required checks when applied to the latest version of the target branch and any pull requests already in the queue. The companion docs explain that GitHub creates temporary queue branches with distinct SHAs to test integrated states before landing them on the base branch. That product only works if one invariant holds: the thing that passed CI must be the thing that lands.
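
To make that invariant concrete, here is a minimal sketch of how it could be checked from outside the platform. It assumes you have recorded the SHA of the queue commit that CI tested and the SHA that ended up on the base branch; the function names are hypothetical, and only the `git rev-parse <sha>^{tree}` call is real Git. Two commits that point at the same tree contain byte-identical files, whatever their commit metadata says.

```python
import subprocess
from typing import Callable


def git_tree_id(sha: str, repo: str = ".") -> str:
    """Return the tree object ID a commit points at, via the git CLI."""
    out = subprocess.run(
        ["git", "-C", repo, "rev-parse", f"{sha}^{{tree}}"],
        check=True, capture_output=True, text=True,
    )
    return out.stdout.strip()


def landed_what_was_tested(
    tested_sha: str,
    landed_sha: str,
    tree_of: Callable[[str], str] = git_tree_id,
) -> bool:
    """True when both commits point at the same tree (identical file contents)."""
    return tree_of(tested_sha) == tree_of(landed_sha)
```

The `tree_of` parameter exists only so the comparison can be exercised without a repository; in real use the default resolver shells out to Git.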

On April 23, that invariant failed.

GitHub’s status writeup is more revealing than the first wave of commentary. The company says the regression came from a new code path that adjusted merge base computation for merge queue ref updates. That path was supposed to stay behind a feature flag for an unreleased feature, but the gating was incomplete. The result was an incorrect three-way merge for affected squash groups. In plain English, GitHub let an unfinished path into production on one of the most sensitive write operations in the whole platform.
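
Why would a wrong merge base produce a silent revert instead of a loud conflict? The toy model below is not GitHub's code, and the exact wrong base GitHub computed is not something the incident report lets me reproduce. It is a file-level three-way merge over dictionaries, with made-up file names, that shows the general mechanism: if the base already contains an earlier change, the merge reads the incoming branch's stale copy as a deliberate edit and takes it.

```python
from typing import Dict

Tree = Dict[str, str]  # path -> file content; a missing path means the file is absent


def three_way_merge(base: Tree, ours: Tree, theirs: Tree) -> Tree:
    """File-level three-way merge: take whichever side changed relative to base."""
    merged: Tree = {}
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:
            pick = o          # both sides agree
        elif o == b:
            pick = t          # only "theirs" changed
        elif t == b:
            pick = o          # only "ours" changed
        else:
            raise ValueError(f"conflict in {path}")
        if pick is not None:
            merged[path] = pick
    return merged


main_before = {"a.txt": "v1", "b.txt": "v1"}
main_after_pr1 = {"a.txt": "v2", "b.txt": "v1"}  # PR 1 changed a.txt and landed
pr2_head = {"a.txt": "v1", "b.txt": "v2"}        # PR 2 branched earlier, changed b.txt

# Correct base: the commit PR 2 actually branched from.
correct = three_way_merge(main_before, main_after_pr1, pr2_head)

# Hypothetical wrong base: a commit that already includes PR 1.
wrong = three_way_merge(main_after_pr1, main_after_pr1, pr2_head)

print(correct)  # {'a.txt': 'v2', 'b.txt': 'v2'}
print(wrong)    # {'a.txt': 'v1', 'b.txt': 'v2'}  (PR 1 silently reverted)
```

Neither call raises a conflict. The second one simply produces a tree in which PR 1 never happened, which is the shape of failure the incident report describes.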

That detail matters because it kills the comforting version of the story in which a merge queue is just a fancy UI wrapper around normal Git semantics. It is not. A merge queue is a second control plane sitting on top of Git, with its own speculative branches, queue grouping logic, CI triggers, merge-base calculations, and deployment surface. The more that machinery diverges from ordinary “merge the reviewed thing” behavior, the more it needs to be treated like critical infrastructure instead of a convenience feature.

This is where the self-interested vendor posts from Mergify and Trunk are still useful. Yes, both companies sell merge-queue products. Yes, they are obviously taking free swings at GitHub. But both pieces isolate the same important structural point: silent drift between the reviewed artifact, the tested artifact, and the landed artifact is the cardinal sin of a merge system.

Trunk puts the issue in especially blunt terms: a PR that looked like a small tidy diff could land as a huge destructive change set. Mergify frames the same failure mode as the queue “lying” about what made it onto main. Strip out the marketing and the criticism still holds. A code review platform has one sacred job after approvals are done: preserve the contract between human intent and repository state.

GitHub’s own timeline shows how bad this class of failure is operationally. The bad deployment completed around 16:05 UTC. GitHub says it became aware only at 19:38 UTC, roughly three and a half hours later, after a rise in customer support inquiries. Existing automated monitoring did not catch it because the problem was not availability. The service was up. The queue was processing. The checks were green. The bug lived in correctness, not liveness.

That should make every engineering organization a little uneasy, because modern developer infrastructure is full of systems that are easy to monitor for uptime and hard to monitor for semantic drift. Agent frameworks, CI optimizers, policy bots, queue managers, backport tools, AI refactor assistants, code-owner automation, and release trains all promise throughput. Very few make it easy to prove that the final state still matches what the human thought they approved.

GitHub also says its internal validation primarily exercised single-PR merge queue groups, which did not trigger the faulty base-reference calculation. The production bug only surfaced in multi-PR squash scenarios. Again, that is not a minor testing miss. It is a sign that the company tested the happy path of the feature while under-testing the exact combinatorial state explosion that made a merge queue worth building in the first place.
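
A toy model shows how a base-reference bug can hide completely behind single-PR tests. Everything here is hypothetical: `land_group_buggy` is an invented bug in which every entry in a group is applied against the group's original base instead of the result of the previous entry. It is not a reconstruction of GitHub's faulty code path, only an illustration of why group size matters as a test dimension.

```python
from typing import Dict, List

Tree = Dict[str, str]   # path -> file content
Patch = Dict[str, str]  # path -> new content introduced by one PR


def land_group_correct(base: Tree, group: List[Patch]) -> Tree:
    """Apply each PR on top of the result of the previous one."""
    tree = dict(base)
    for patch in group:
        tree.update(patch)
    return tree


def land_group_buggy(base: Tree, group: List[Patch]) -> Tree:
    """Hypothetical bug: each PR is applied against the ORIGINAL base,
    so only the last PR's changes survive."""
    tree = dict(base)
    for patch in group:
        tree = {**base, **patch}
    return tree


def groups_agree(size: int) -> bool:
    """Build a group of `size` PRs touching distinct files and compare results."""
    base = {f"file{i}.txt": "old" for i in range(size)}
    group = [{f"file{i}.txt": "new"} for i in range(size)]
    return land_group_buggy(base, group) == land_group_correct(base, group)


for size in (1, 2, 3):
    print(size, groups_agree(size))
```

With one PR per group the buggy and correct paths produce the same tree, so a suite that only exercises single-PR groups stays green. The divergence appears at size two.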

Busy branches are why merge queues exist. Multi-PR interaction is not an edge case. It is the product.

There is a broader lesson here about 2026 platform culture. The industry keeps treating developer infrastructure as if the glamorous work is adding autonomy, AI surfaces, and orchestration layers, while the boring work is making sure the old invariants still hold. That hierarchy is backwards. The most valuable dev tool in the room is usually the one least interested in being interesting.

A thing with write access to main should be profoundly boring. It should be so structurally constrained that product ambition has trouble reaching the critical path. It should be painful to ship an unreleased code path into merge-base computation by accident. It should be aggressively tested for exact tree correctness across queue shapes and merge methods. And if the platform cannot prove those properties internally, users should assume they need their own audit scripts, at least for the blast-radius windows that incidents expose.
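
What might such an audit script look like? One coarse sketch, under stated assumptions: list the first-parent commits that landed on the branch during the incident window, collect the paths each one touched, and flag any path that was not in the reviewed diff. The Git commands (`git log --first-parent --since/--until` and `git diff-tree --name-only -r`) are real; the function names are mine, and the source of the reviewed path list (for example, the pull request's file list) is left to the reader.

```python
import subprocess
from typing import Iterable, List, Set


def commits_in_window(branch: str, since: str, until: str, repo: str = ".") -> List[str]:
    """First-parent commits on `branch` whose commit date falls in the window."""
    out = subprocess.run(
        ["git", "-C", repo, "log", "--first-parent", f"--since={since}",
         f"--until={until}", "--format=%H", branch],
        check=True, capture_output=True, text=True,
    )
    return out.stdout.split()


def changed_paths(sha: str, repo: str = ".") -> Set[str]:
    """Paths a single-parent commit (such as a squash merge) touched
    relative to its parent."""
    out = subprocess.run(
        ["git", "-C", repo, "diff-tree", "--no-commit-id", "--name-only", "-r", sha],
        check=True, capture_output=True, text=True,
    )
    return set(out.stdout.splitlines())


def unexpected_paths(reviewed: Iterable[str], landed: Iterable[str]) -> Set[str]:
    """Paths the landed commit touched that the reviewed diff did not."""
    return set(landed) - set(reviewed)
```

For each commit from `commits_in_window("main", "2026-04-23T16:05:00Z", "2026-04-23T20:43:00Z")`, a non-empty `unexpected_paths(reviewed, changed_paths(sha))` is a reason to look closer. It is only a first filter: a revert confined to files the PR legitimately touched would pass it, so the tree comparison remains the stronger check.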

The ugly irony is that GitHub had just spent the week reminding everyone how much power it wants over the development workflow. In the Copilot policy story, the company drew hard distinctions about who gets privacy protections and who gets turned into training exhaust. In the merge queue story, it reminded everyone that workflow capture is not the same thing as workflow stewardship. Owning the control plane is impressive right up until the control plane writes fiction into your repository.

The thesis is simple: GitHub did not have a merge queue outage. It had a reality-integrity outage.

When the reviewed diff, the tested queue state, and the landed branch can diverge without alarms, the platform is no longer just helping teams move faster. It is asking them to outsource trust to a second Git that behaves like Git only when nothing interesting is happening.

That is the wrong trade.