A pull request landed in a repository I work with recently. 214 commits. 887 files changed. 186,000 lines added. Built over days, in phases — Phase 1 through Phase 37 — with the commits co-authored by an AI agent.
One approval had already been given.
I don't bring this up to criticise the engineer who opened it. I bring it up because I've been staring at this PR, and I can't figure out what "reviewing" it would even mean. The approval button is there. The code owners list is there. The CI pipeline exists. All the forms of review are in place. The substance — a human actually comprehending what these 186,000 lines do, why they're structured this way, what breaks if you change something on line 40,000 — that's not there. There isn't time for it. I'm not sure there ever will be again.
I've been trying to figure out why this bothers me so much. I think it's because I'm starting to see the same pattern in places that have nothing to do with code.
The Ritual of Review
For most of software engineering's short history, human review was the real bottleneck. You could write code as fast as you wanted. Tests could pass. Staging could look fine. But before anything shipped, it had to survive the judgment of another person — someone who would read your diff, hold the logic in their head, and ask uncomfortable questions about edge cases you hadn't considered. It was slow. It was sometimes political. Still, it was also the place where understanding happened. Not the approval itself — the process of arriving at the approval. That was where the value lived.
Let's not romanticize what we had. Code review was already largely ritualistic in many organizations before AI wrote a single line. Reviewers skimmed. They checked style and obvious bugs. They approved because the author was senior, or because they were behind on their own work. Academic peer review was already failing to catch fraud at meaningful rates. The substance-to-ritual ratio was already unfavorable. What AI does is not create this problem — it accelerates it past the point where we can pretend.
We're in the middle of removing the bottleneck entirely. Not as a deliberate decision, not as a policy change — but as a side effect of two pressures compounding. AI tooling has removed the friction of writing code. An agent can produce hundreds of thousands of lines in a day. Delivery expectations haven't stayed the same — they've accelerated. For managers and product people, review was always just quality control, a necessary gate before shipping. Now that AI can produce so much so fast, the expectation isn't the same output with less effort. It's ten times the output. A hundred times. Timelines have shortened. Everybody wants to see the productivity gains. Something has to give, and what gives is review. It gives quietly. Nobody decides that code review is unimportant. It just becomes physically impossible to do it at the speed the code is being produced — and the speed it's now expected to be produced.

Schieber Research published an argument this year that I keep coming back to — not about software, but about consumer goods. The claim is that convenience doesn't compete with quality. It devours it. Every time a slightly worse option removes enough friction, people choose it anyway. Not because they stop caring about quality, but because the cognitive cost of the alternative becomes too high. It's a contested framing — there's a long tradition arguing that competition drives quality up. In the specific case of review under volume pressure, the devouring pattern fits. I think that's exactly what's happening to code review — squeezed from both sides, by the volume of what's produced and the pace at which it's expected to ship.
Some teams are being honest about this, and I respect that. The DoltHub team wrote about spending a week with an AI agent orchestrator and shipping a working database engine in five days. Impressive work. The author, however, was explicit about what it produced: the code is "by agents, for agents." Humans read at your own risk. Joe Ruscio at Heavybit gave this a name — "write-only code." The term has an older meaning — Perl one-liners, APL, code too dense for anyone but the author to parse. The new usage is different. It's not a defect of style but a structural condition of production. Code that was never meant to be read by a human, because no human was part of the loop. Worth naming the shift.
The rituals persist. The PR still gets an approval. The pipeline still runs. The code owners list still has names on it. From the outside, everything looks like review is happening. The substance — someone holding the code in their head, questioning its intent, catching the things that tests can't catch — that's draining away. The ritual survived. The judgment behind it didn't.
This isn't hypothetical. We already know what happens when the substance of review hollows out, because someone exploited exactly that gap. In 2024, a backdoor was discovered in XZ Utils — a compression library embedded in virtually every Linux distribution. The attack didn't exploit a technical vulnerability. It exploited a human one. A developer operating under the name Jia Tan spent more than two years making legitimate contributions to the project, patiently building trust with the sole maintainer, Lasse Collin. Sock puppet accounts pressured Collin about slow response times, manufacturing a sense of urgency and community frustration until he handed over co-maintainer access. Out of years of contributions, only eight commits were malicious. The rituals of open-source trust — commit history, community oversight, contributor reputation — were all intact. The substance was absent. One overwhelmed, burned-out human couldn't hold it all, and the attack was designed so that he wouldn't be able to.
Here's what keeps me thinking about XZ Utils in the context of the 186,000-line PR: that attack required years of patient, skilled human effort against a single project. AI threatens to change the economics entirely. An agent can generate plausible, legitimate-looking contributions across dozens of projects simultaneously. Building trust through volume becomes trivially cheap. Even reviewing those contributions with AI assistance isn't free — at current pricing, running code through a frontier model costs dollars per million tokens. Open-source maintainers are already unpaid volunteers. Now the cost of vigilance goes up while the cost of producing convincing noise goes down. The asymmetry that made the XZ attack possible — one overwhelmed human against one patient attacker — scales in the attacker's favor. AI didn't create this vulnerability. It threatens, however, to make the economics of exploiting it dramatically worse.
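To make that asymmetry concrete, here is a back-of-envelope sketch in Python. Every constant in it — tokens per line, price per million tokens — is an illustrative assumption, not a quoted rate:

```python
# Rough cost of having a frontier model read a large diff.
# All constants are illustrative assumptions, not quoted prices.

TOKENS_PER_LINE = 12             # assumed average token density of source code
PRICE_PER_M_INPUT_TOKENS = 5.0   # assumed USD per million input tokens


def review_cost_usd(lines_changed: int, passes: int = 1) -> float:
    """Estimate the model cost of reading a diff `passes` times."""
    tokens = lines_changed * TOKENS_PER_LINE * passes
    return tokens / 1_000_000 * PRICE_PER_M_INPUT_TOKENS


# One pass over the 186,000-line PR at the assumed rates:
print(f"${review_cost_usd(186_000):.2f}")  # → $11.16
```

The exact figure doesn't matter; what matters is the shape of the transaction. The defender pays per token, per pass, per project, out of volunteer pockets — while the attacker's cost per convincing contribution keeps falling.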
This problem isn't entirely new. When compilers appeared, we accepted incomprehensible artifacts for the first time — the machine code a compiler produced was not meant to be read by humans. Those artifacts, however, were downstream of human intent. The source code was still ours. We understood what we were asking the machine to do, even if we couldn't read the machine's output. Over decades, we built safety infrastructure around that gap: type systems, formal verification, test suites, decades of accumulated confidence that the compiler was doing what we told it to. What's different now is that the incomprehensible artifacts are the source code — the expression of intent itself. The 186,000-line PR isn't machine code generated from a human specification. It's the specification. We don't yet have the equivalent of type systems or formal verification for code that was never held in a human mind.
I keep thinking this is just a code review problem. The more I sit with it, the less sure I am. Something similar seems to be happening to knowledge itself.
The Ritual of Knowing
I read an essay recently by Jônadas Techio called "The Claim Upon the Training Data" that reframed something I'd been vaguely sensing but couldn't articulate. Techio writes about Tyler Cowen — the economist who has been writing deliberately, strategically, not just for human readers but for AI training data. Cowen understands that the models being built right now will mediate how future people (and future systems) understand economics, policy, and perhaps reality itself. So he's writing for that audience. Shaping the training set on purpose.
This is a strange and fascinating move. Techio calls it founding an institution at the "fiction layer" — the layer where shared reality gets constructed. That sounds abstract, so let me try to make it concrete. For most of history, the layer where collective truth was maintained — what counts as knowledge, what's credible, what's authoritative — was controlled by institutions. Universities, publishers, religious authorities, peer review boards. They were imperfect. Often exclusionary. Sometimes corrupt. But they provided friction. Getting something accepted as "knowledge" was hard, and that difficulty was the point. The friction was the substance behind the ritual of credibility. When a paper was published, or a book carried a university press imprint, or a claim survived peer review, there was a human chain of judgment behind it — people with reputations at stake, who could be wrong, who bore some cost if they were.
What Techio is pointing at is that this friction is dissolving. Anyone can now write into the training data — not just Cowen, but anyone who publishes anything on the open web. The infrastructure that ingests this data can't distinguish between a carefully reasoned argument and a confident-sounding fabrication. The ritual of credibility survives. Published articles still look authoritative. AI-generated summaries still cite sources. The substance behind it — someone with stakes vouching for the truth, someone who can be held accountable for being wrong — that's thinning out.
Techio takes this further, drawing on Stanley Cavell's reading of Wittgenstein. Cavell makes a distinction between knowledge and acknowledgment. A system can process claims — can store them, retrieve them, recombine them, even generate new ones. But being claimed upon — having a claim make a demand on you, requiring you to reckon with it — that's something different. It requires finitude. Vulnerability. The possibility of being wrong with real consequences.
Let me try to make this less abstract. A doctor diagnosing a patient stakes their reputation, their license, their sleep. They carry the weight of that judgment home with them. They lie awake wondering if they missed something. A model suggesting a diagnosis processes the same symptoms and may reach the same conclusion. There is, however, no one who lies awake wondering if they got it wrong. No one whose career ends if the diagnosis kills someone. The output may be identical. The accountability structure is categorically different. That difference is what I mean by "substance" — not just comprehension, but exposure. Someone who can be wrong, and who bears the cost of being wrong. A system that processes knowledge without staking anything on its coherence can preserve the form of knowing perfectly while the substance — the part that needs a creature with something to lose — goes absent.
Same pattern, wider blast radius. In code, we kept the ritual of review while the judgment drained out. In knowledge, we're keeping the rituals of authority — publication, citation, institutional backing — while the thing that made authority meaningful erodes underneath. Someone with stakes, vouching for truth.
If this feels unprecedented — I don't think it is. We've watched this exact dynamic play out before, and we can see where the current one leads.
The Ritual of Signal
The early internet promised democratization. Anyone could publish, anyone could access. The old gatekeepers were going to be disintermediated by something open and meritocratic. What it delivered instead was a flood. SEO spam. Content farms. Engagement-optimized articles engineered to capture clicks, not to inform. The rituals of signal survived — Google results still look like a curated library, social media still looks like a public square, a blog post with a professional layout still looks like it was written by someone who knows what they're talking about. The substance behind those signals, however, collapsed under the volume. "I read it online" went from being a statement about access to knowledge to being a punchline about gullibility. Not because anyone decided to stop caring about quality — but because convenience, at scale, devours quality every time.
AI follows the same arc, but compressed. AI-generated code that nobody reviews. AI-generated articles that nobody fact-checks. AI-generated summaries of AI-generated analyses of AI-generated data. The 186,000-line PR is the new content farm — not malicious, not lazy, just produced at a speed that makes human judgment physically impossible to apply. The scarier question isn't the flood itself — it's who directs it. The companies building the models choose the training data, define the filters, set the defaults. Cowen writes deliberately for the training data, and I respect the transparency. But most of us don't get to make that choice. Consider what we know about curation in practice: models are trained on selections of the open web, with filtering decisions made by small teams at a handful of companies. What gets included, what gets weighted, what gets filtered as "low quality" — these are editorial decisions with enormous downstream influence, made without editorial accountability. When the models of the future are shaped primarily by the content and choices of the companies that build them, that's not the democratization we were promised. That's a new kind of gatekeeping — less visible, less accountable, and harder to challenge than anything that came before it.
The Trade-off
There's a pattern here, and I think it's worth naming plainly. Every time the production of something radically outpaces our capacity to comprehend it, we face the same quiet choice: slow down to preserve understanding, or keep moving and let the rituals of quality stand in for the real thing. We almost always choose the rituals. Not because we're careless or negligent — but because the cognitive cost of the alternative, in the moment, is unbearable. There are deadlines. There are expectations. There are 186,000 lines in the review queue. You can't stop the flood by reading harder.
I don't think AI is going to "take over." Honestly, I find that framing unhelpful — it assumes a destination, a moment of rupture, a before and after. What I think is more likely, and much harder to see while it's happening, is something quieter. We'll keep all the forms of human judgment. The reviews, the approvals, the credentials, the citations, the editorial standards. They'll all still be there. But the actual judgment — the slow, expensive, human process of understanding something well enough to vouch for it — will quietly drain out of them. Not because AI replaces human judgment, but because the volume of AI-produced material makes that judgment impossible to sustain.
Here's where I have to be honest with myself: this is how progress has always worked. We trade something for something. We always have. The question has never been whether there's a trade-off — it's whether we're making a good one. History offers two templates, and they look very different.
One version: offshoring software development. The pitch was compelling — the same work, at a fraction of the cost. Many companies went all in through the 2000s and 2010s. Many quietly reversed course years later, after discovering that the cost savings evaporated into communication overhead, rework cycles, and quality gaps that only became visible after the damage was done. The ritual of "we shipped the feature" survived. The substance — code that the team understood and could maintain — often didn't. Consider, too, financial derivatives before 2008: instruments so complex that the ritual of valuation survived (ratings agencies stamped them AAA, risk models marked them safe) while the substance — anyone actually understanding what the instruments contained — collapsed. When the reckoning came, it was systemic. Offshoring wasn't universally bad, and derivatives weren't inherently fraudulent, but in both cases the trade-off was worse than advertised, and the gap between ritual and substance only became visible in crisis.
The other version: the transition from assembly to higher-level programming languages. When FORTRAN appeared in the 1950s, serious programmers were skeptical. You were giving up control — raw speed, precise memory management, direct access to the machine. Those concerns were real. Higher-level languages did produce slower code, larger binaries, more abstraction between the programmer and the hardware. What you got back, however, was enormous: readability, portability, the ability for more people to write more complex programs and actually maintain them. The trade-off was real, but it was good. We gave up something we could afford to lose and gained something we desperately needed. Over time, compilers closed much of the performance gap anyway. Crucially, the transition worked because we built the safety infrastructure. Type systems. Formal verification. Decades of compiler testing and optimization. The abstraction earned trust because we invested in the mechanisms that made trust warranted.
AI-assisted development is a trade-off. We're gaining speed, volume, accessibility. We're losing comprehension, review depth, the kind of slow human judgment that catches what tests can't — and, as the XZ Utils case showed, what malice deliberately hides. After everything this essay has laid out — the PR, the hollowed-out review, the erosion of epistemic substance — I want to be honest about where I land. I think we are currently making a bad trade. Not inevitably. Not permanently. Right now, in this transition, we are trading comprehension for speed and calling it progress.
I think this because of what happens next — the second-order consequences that follow from ritual review over years, not months.
The first is the detection problem. The danger isn't just losing judgment. It's losing the ability to detect that we've lost it. If no one reads the code closely enough to notice quality gaps, the feedback mechanisms that would trigger course correction drain away too. The companies that reversed on offshoring could see the rework cycles, could measure the defect rates, could feel the communication overhead. In the AI case, the rework may be invisible — patched by the same agents that introduced the bugs, in a loop that never surfaces to human awareness.
The second is generational loss. A generation of engineers trained in an environment where review is ritual won't experience the loss, because they'll never have had the substance. The cultural memory of what review was for — not the checkbox, but the hard-won comprehension — fades through non-transmission. Polanyi called this tacit knowledge: the kind that lives only in practice and dies when the practice dies. You can't document your way out of it. You can't write a wiki page that transmits the instinct for when something in a diff feels wrong. That instinct is built by doing thousands of reviews where it mattered. Take away the practice, and the knowledge goes with it.
The third is the debugging problem. When write-only code breaks — not a small bug, a systemic failure — who debugs it? If no human ever understood the code, debugging becomes a conversation with the same AI that wrote it, or a different one operating on different assumptions. Machines explaining machines to humans who can't verify the explanations. This isn't science fiction. It's the logical endpoint of the trajectory we're on.
I'm not sure I'm right about this. I use these tools every day. I see the gains. I think, however, we're closer to the offshoring template than the FORTRAN one — not because AI tooling is bad, but because we haven't built the new forms of trust that would make the trade-off good. The compiler transition worked because we invested decades in the safety infrastructure. Type systems didn't appear overnight. Formal verification grew alongside the abstraction. Trust was earned. We haven't done the equivalent for AI-generated code. We're running the higher-level-abstraction playbook without the safety infrastructure that made it work.
Here's what would change my mind. If I start seeing new review practices that restore substance at machine scale — architecture-level review rather than line-by-line, AI-generated intent explanations that can be verified against the code, hard limits on PR scope that force comprehensible units of change. If I see tooling that makes comprehension scale with production, not just production alone. If I see feedback mechanisms that surface quality gaps before they compound — not just test coverage, but genuine measures of human understanding. If those emerge and take hold, this is the FORTRAN story. We'll look back at this period the way we look back at assembly programmers resisting higher-level languages — understandable anxiety about a transition that ultimately worked. If they don't, though — if we just keep clicking approve and trusting the pipeline — then we are offshoring our understanding, and we will discover the cost the same way: slowly, then all at once.
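Of those practices, the hard cap on PR scope is the easiest to prototype today. Here is a minimal CI-gate sketch, assuming it runs inside a git checkout; the 800-line limit and the `origin/main` base branch are placeholders a team would choose for itself:

```python
# CI gate sketch: fail the build when a PR exceeds a comprehensible size.
# The 800-line limit and "origin/main" base branch are illustrative assumptions.
import subprocess
import sys

MAX_CHANGED_LINES = 800  # assumed team-chosen ceiling for one reviewable unit


def parse_numstat(numstat: str) -> int:
    """Sum added + deleted lines from `git diff --numstat` output."""
    total = 0
    for row in numstat.splitlines():
        added, deleted, _path = row.split("\t", 2)
        # Binary files appear as "-" in numstat; count them as zero lines.
        total += int(added) if added != "-" else 0
        total += int(deleted) if deleted != "-" else 0
    return total


def gate(base: str = "origin/main") -> None:
    """Exit nonzero if the branch's diff against `base` is too large to review."""
    out = subprocess.run(
        ["git", "diff", "--numstat", f"{base}...HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout
    n = parse_numstat(out)
    if n > MAX_CHANGED_LINES:
        sys.exit(f"PR touches {n} lines; limit is {MAX_CHANGED_LINES}. Split it up.")
    print(f"OK: {n} changed lines.")
```

A crude gate like this wouldn't have made the 186,000-line PR reviewable — it would have made it impossible to open in that form, which is the point: the constraint forces the work into units a human can actually hold.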
I'm going to go try to review that PR this week. I already know I can't hold 186,000 lines in my head. Nobody can. I think the trying still matters — not because I'll catch everything, but because right now, in this transition, the act of trying is how we find out what the new forms of trust need to look like. The rituals will survive the flood. They always do. The question is whether we build something with substance underneath them before we forget what substance felt like.
References
- Schieber Research, "When Friction Becomes the Feature (And Why Convenience Was Never the Villain)"
- DoltHub, "A Week in Gas Town"
- Joe Ruscio, "Write-Only Code," Heavybit
- Jônadas Techio, "The Claim Upon the Training Data"
- Wikipedia, "XZ Utils backdoor"