The Intern that never grows up

January 2026

AI-assisted coding is no longer interesting because it can write code. It's interesting because it forces you to pick a philosophy of responsibility. The only one that survives contact with real repositories, real maintainers, and real incentives is this: treat the model like an intern. High-throughput, intermittently brilliant, often wrong in confident ways, and never the accountable party.

The new argument isn't "can it code?" We've moved on from 2023's awe ("it wrote a regex!") to 2026's annoyance. The output is often plausible enough to waste expert time. That's why "AI slop" became a term of art. Not merely low quality, but high-volume, high-confidence text that imposes asymmetric review cost on humans. "Slop" is also shorthand for a specific failure mode. The thing looks neat, reads fluent, compiles cleanly, and still doesn't deserve to exist.

You see this in open source debates. The pain isn't that bots can't code. It's that they generate artifacts that look like they belong (PRs, docs, bug reports) while quietly raising the burden on maintainers to prove a negative. "No, this is not real. Not correct. Not actionable." The shift is subtle. Output is cheap. Attention is scarce. So governance and review discipline become the battleground.

The chat interface, copy-paste, and who's actually at fault

A lot of devs are still coding through the ChatGPT interface. Paste a snippet, ask for a fix, copy the answer back. When they don't get the exact answer they want, they curse the thing. At scale. Twitter and Reddit are full of "AI is useless for real work" from people who never gave the model context, never asked for tests, never reviewed the output. The problem isn't AI. It's the dev. You wouldn't hand an intern a one-line prompt and a random file and expect a correct fix. We're doing that to the model and then blaming it when the result is wrong.

In government and public-sector IT the situation is the same, often worse. Stakes are high. Citizen data, critical systems. But the workflow is still copy-paste from a chat window. When something breaks, the narrative becomes "AI failed" instead of "we didn't review, we didn't test, we didn't own it." Government should be the place where accountability is non-negotiable. That's exactly where "the problem is the dev" needs to be said loudest. The tool doesn't sign off on the release. The human does.

Linus, "AI slop," and the refusal to moralize

In January 2026 Linus Torvalds dropped a very Linus-shaped response into an email thread about whether kernel documentation should take a stance on LLM-generated contributions. "There is zero point in talking about AI slop. So stop this idiocy." He argues that documentation is "for good actors" and that the "AI slop people aren't going to document their patches as such." So policy text is performative rather than protective. He doesn't want kernel docs to become "some AI statement" on either side (doom or hype). He insists on "just a tool" as the least-wrong posture for project documentation.

This isn't a pro-AI rallying cry. It's something more practical and more damning. A statement about incentives. If you ban it, people will still use it and lie. If you mandate self-disclosure, bad actors won't disclose. So the only reliable defense is the kernel's immune system. Review culture, correctness standards, and the maintainers' reflexive intolerance for regressions and handwaving. It's the same conclusion we've reached for every other "tool that makes output easy." You can't legislate taste. You encode it in process.

If you want a nerdy angle, Linus is basically saying you can't solve an adversarial classification problem with voluntary labeling. That's an attack model, not a vibes model.

OpenSlopware and the emergence of "provenance wars"

A week later (still January 2026), The Register reported on "OpenSlopware," a Codeberg repo that tried to list open source projects using LLM-bot-generated code or showing signs of coding assistants in the workflow. The story isn't "a list exists." The story is that the list creator allegedly got enough harassment that they deleted the repo and even their Bluesky account. Forks quickly appeared and people coordinated around maintaining copies. That's the sociotechnical fact worth staring at. We've reached the stage where provenance and "AI contamination" are contentious enough to trigger naming-and-shaming and harassment in the open.

Whether you see such lists as helpful transparency or misguided policing, the existence of the conflict is the point. The ecosystem is negotiating a new question. What does it mean for code to be "authored," "reviewed," "trusted," or "maintained" when the cost to generate 10,000 lines of neatly formatted plausibility has collapsed? In older open source culture volume was itself a signal. Someone paid for it in time. In the new world volume is often a warning sign. It can be generated without paying the human cost that historically correlated with care.

This is how you get provenance wars. Not because everyone suddenly got philosophical. Because maintainers are trying to defend a limited resource. Reviewer attention.

Kailash Nadh's "code is cheap" and the inversion of Linus's quote

Then in late January 2026 Kailash Nadh published a post titled "Code is cheap. Show me the talk." The title inverts Linus's famous "Talk is cheap. Show me the code." He makes the argument in the most engineer-brained way possible. LLM tools have "completely flipped" his workflow. "Software development, as it has been done for decades, is over." He says LLMs can one-shot generate "stunning looking documentation pages" and "dense READMEs." So much so that the old heuristics for evaluating a repo (docs quality, neat organization, even "proper idioms") are now unreliable. "The more stunning or perfect looking something is, the more suspicious it is now."

That's a very specific, very real-world observation. LLMs don't just produce code. They produce the camouflage that used to distinguish careful work from sloppy work. In a way this is the mirror image of Linus's stance. Linus says don't put statements in docs, keep docs for good actors, rely on review. Nadh says docs can be generated at scale now, neatness is no longer evidence of thoughtfulness, you need to look at provenance and governance more than the superficial artifact. Same destination, different trail. A world where "it looks good" is weak evidence so we fall back to processes that are harder to fake.

Nadh also uses a phrase that's worth stealing. He says LLM tools reduce the physiological and cognitive cost of producing outcomes by "several orders of magnitude." He now spends the freed bandwidth on "engineering, architecting, debating, tinkering" and writing "much more concise and meaningful code" that he actually wants to write. That's the optimistic version that doesn't require hype. The tool shifts labor from typing to thinking. If you already have thinking skills to spend.

The intern model, but with sharper edges

"Treat it like an intern" is correct, but still too gentle unless you add the real edge cases.

An intern does three dangerous things:

  • They optimize locally because they don't yet feel global invariants.
  • They overfit to recent examples because they lack deep priors.
  • They produce convincing prose that can smuggle weak reasoning past tired reviewers.

LLMs do the same things, except the intern at least learns from feedback and eventually internalizes your system's culture. The model doesn't "learn your repo" unless you repeatedly constrain it and route it through a verification harness. So the intern metaphor needs an addendum:

Treat it like an intern who never grows up, but can draft infinite versions while you sleep.

That sounds grim until you realize the strategy is obvious. You don't ask it to be wise. You ask it to be fast. And you invest in mechanisms that cheaply reject wrongness.

A maintainer-grade way to use these tools

Here's what it means to operationalize "intern, not competitor" in a way that would make a kernel maintainer nod and a fintech on-call engineer sleep.

1) Convert tasks into reviewable diffs

The first anti-slop measure isn't "better prompting." It's shaping work so it can be reviewed.

Rules that work:

  • Demand minimal diffs. No refactor, only fix this bug, keep behavior identical elsewhere.
  • Ban drive-by changes. Formatting-only PRs, style churn, "while I'm here" improvements.
  • Require locality. Touch these files only. Explain why any new file exists.

This is how you keep the AI from producing a thousand-line novella to fix a three-line bug.
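
If you want locality to be enforced rather than requested, a small CI gate can reject out-of-scope changes mechanically. A minimal sketch in Python, assuming git is available on the runner; the base ref and the allowlist are hypothetical, not part of any real pipeline:

    # check_locality.py: a minimal sketch of a locality gate, not a full CI
    # integration. Assumes the PR branch is checked out and that "origin/main"
    # is the base; both the base ref and the allowlist are illustrative.
    import subprocess
    import sys

    # Hypothetical scope for one bounded task: only these paths may change.
    ALLOWED_PREFIXES = ("src/parser/", "tests/parser/")

    def changed_files(base="origin/main"):
        # Files touched by this branch relative to the base ref.
        out = subprocess.run(
            ["git", "diff", "--name-only", f"{base}...HEAD"],
            check=True, capture_output=True, text=True,
        ).stdout
        return [line for line in out.splitlines() if line.strip()]

    def main():
        out_of_scope = [f for f in changed_files() if not f.startswith(ALLOWED_PREFIXES)]
        if out_of_scope:
            print("Out-of-scope changes (justify or drop each):")
            for path in out_of_scope:
                print(f"  {path}")
            return 1
        return 0

    if __name__ == "__main__":
        sys.exit(main())

The point isn't this particular script. It's that "touch these files only" becomes a check the reviewer doesn't have to litigate by hand.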

2) Make it state assumptions up front

In human reviews the most expensive part is reverse-engineering what someone assumed. LLMs are assumption factories.

So you require a preface.

  • List all assumptions about inputs, concurrency, error handling, and environment.
  • State invariants in plain English.
  • Name failure modes and how they show up in logs or metrics.

If it can't articulate assumptions it's probably guessing. If it can, you can quickly falsify them.
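
One way to make that preface falsifiable rather than decorative is to push stated assumptions into the code as guard clauses, so a wrong assumption fails loudly in tests and staging instead of silently corrupting output. A minimal, entirely hypothetical sketch:

    # Hypothetical example: a stated assumption becomes an executable check.
    # Assumption: timestamps in a batch are UTC and non-decreasing.
    # Failure mode: a violation raises immediately instead of silently
    # producing a mis-ordered aggregate.

    def aggregate(batch):
        """Sum (timestamp, value) pairs, enforcing the ordering assumption."""
        last_ts = float("-inf")
        total = 0
        for ts, value in batch:
            assert ts >= last_ts, f"out-of-order timestamp {ts}; assumption violated"
            last_ts = ts
            total += value
        return total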

3) Tests are the price of admission

LLMs can generate tests, sysadmin glue, publishing scaffolding—not just application code. Use that. Make tests the currency. If the model proposes a change it must also propose the tests that would fail before and pass after.

But you need to be picky about what counts as a test.

  • Behavioral tests, not snapshot tests of internals.
  • One adversarial test that tries to break the logic.
  • For parsers and codecs, fuzz hooks (even basic).
  • For concurrency, cancellation and race checks where feasible.

No tests, no merge. This isn't moralism. It's economics. You can generate code cheaply so you must demand verification to keep review cost bounded.
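
What that looks like in practice, as a minimal sketch: a hypothetical parse_amount() fix ships with a behavioral test, an adversarial test, and a basic fuzz hook (using the Hypothesis library). The module path and function are illustrative, not a real API.

    # test_parse_amount.py: a minimal sketch of "tests as the currency".
    # parse_amount() and its module path are hypothetical.
    import pytest
    from hypothesis import given, strategies as st

    from mypkg.money import parse_amount  # hypothetical function under test

    def test_parses_plain_decimal():
        # Behavioral: fails before the fix, passes after.
        assert parse_amount("12.34") == 1234

    def test_rejects_extra_fraction_digits():
        # Adversarial: "12.345" must not be silently truncated to 1234.
        with pytest.raises(ValueError):
            parse_amount("12.345")

    @given(st.text(max_size=32))
    def test_arbitrary_input_parses_or_raises_valueerror(s):
        # Fuzz hook: any input either parses or raises ValueError, nothing else.
        try:
            parse_amount(s)
        except ValueError:
            pass

Each test gives the reviewer a cheap way to reject a wrong patch, which is the whole economic argument.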

4) Treat documentation as suspect unless it's anchored to code

Nadh's point about "stunning" docs being suspicious is crucial. LLMs can produce beautiful explanations that aren't bound to reality. So for docs you enforce executable documentation principles.

  • Every doc claim should reference a function, config, endpoint, or actual output.
  • Prefer examples that are copy-paste runnable (or at least CI-checked).
  • Use doctests or snippet tests where possible.

The new smell isn't "bad docs." It's "docs too perfect, too generic, too unmoored."
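
One cheap anchor is the doctest: the usage example in the docstring is executed, so the prose cannot drift from the code. A minimal sketch with a hypothetical slugify() helper:

    # Anchored documentation: the example below is run by `python -m doctest`,
    # so the documented behavior is checked, not merely asserted in prose.
    # slugify() is a hypothetical helper used only for illustration.
    import re

    def slugify(title):
        """Turn a page title into a URL slug.

        >>> slugify("AI Slop & The Intern")
        'ai-slop-the-intern'
        """
        return "-".join(re.findall(r"[a-z0-9]+", title.lower()))

    if __name__ == "__main__":
        import doctest
        doctest.testmod()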

5) Provenance and governance matter more than ever

OpenSlopware is one signal that the ecosystem is starting to care about provenance and "contamination" in a way that will get messy. You don't have to endorse naming-and-shaming lists to take the underlying concern seriously.

In practice this means:

  • Clarify contribution standards. Not "ban AI" but "meet the bar."
  • Require authorship accountability. Who reviewed, who approved, who is on the hook.
  • Be explicit about licensing posture for generated code where relevant.

The only stable unit of value in a world of infinite generated artifacts is accountability: the "framework of accountability" and "humanness" Nadh names, the ability to hold someone responsible for the artifact.
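
If you want "who is on the hook" to be checkable rather than cultural, you can lint history for the accountability trailers the kernel already uses (Signed-off-by, Reviewed-by). Requiring them on every commit, as in this minimal sketch, is a hypothetical team policy rather than an upstream rule, and the revision range is illustrative:

    # trailer_lint.py: checks that commits in a range carry accountability
    # trailers. Signed-off-by and Reviewed-by are existing kernel-style
    # conventions; treating them as mandatory here is a hypothetical policy.
    import subprocess
    import sys

    REQUIRED = ("Signed-off-by:", "Reviewed-by:")

    def commit_bodies(rev_range="origin/main..HEAD"):
        # %B is the raw commit message; %x00 separates commits unambiguously.
        out = subprocess.run(
            ["git", "log", "--format=%B%x00", rev_range],
            check=True, capture_output=True, text=True,
        ).stdout
        return [body for body in out.split("\x00") if body.strip()]

    def main():
        missing = [body.splitlines()[0] for body in commit_bodies()
                   if not all(trailer in body for trailer in REQUIRED)]
        if missing:
            print("Commits missing accountability trailers:")
            for subject in missing:
                print(f"  {subject}")
            return 1
        return 0

    if __name__ == "__main__":
        sys.exit(main())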

Linus's Antigravity thing and the right way to read it

The same month Linus told kernel docs not to moralize about AI slop, he pushed commits to a hobby project called AudioNoise: a GPL repo of digital audio effects. Delays, filters, phasers, flangers. He wrote the core C himself. For the Python audio visualizer he vibe-coded with Google Antigravity. In the README he says it started as his usual "google and do the monkey-see-monkey-do" kind of programming, then he cut out the middle-man (himself) and used Antigravity for the visualizer. Python isn't his strong suit, and the AI approach was more efficient than copying examples. So the guy who said "just a tool" and "don't put statements in docs" also has a public hobby repo where he used an AI coding tool. Not on the kernel. On a bounded, non-critical piece. The part he was willing to review and own.

That's the same posture the rest of this essay argues for. Give the tool a clear scope. Keep the critical path in human hands. He didn't hand Antigravity the C core or the kernel. He gave it the visualizer. Bounded work. Falsifiable. The kind of task that fits the "good" side of the taxonomy we're about to get to. So the right way to read the AudioNoise commits isn't "Linus uses AI now." It's "Linus uses AI the way you're supposed to." Where stakes are low. Where he's still the accountable party. Where the tool is an intern that never grows up but can draft the Python while he focuses on what he cares about.

The kernel culture he helped build is hostile to churn and regressions. Don't move bugs around. Don't break users. Don't add abstractions because they feel clean. The tool encourages churn. Maintainers reward stability. So he uses the tool where it doesn't threaten that. The idea holds: process and accountability over performative policy, bounded work over global rewrites.

AI will not replace the need for taste. It will punish the absence of taste faster.

Linus didn't suddenly become a vibe-coding evangelist. He applied taste. He knew where the tool was safe and where it wasn't. The next section is about who has that taste and who doesn't. Juniors and seniors.

The part people avoid saying: juniors are at risk, seniors are amplified

Nadh directly worries about learners. He argues that without fundamentals LLMs become "unreliable, dangerous genies." Juniors can end up stuck with codebases they don't understand, forced into helpless dependence on the tool. He frames it as a potential loss of the natural pathway by which juniors become seniors. The circumstances that force learning might never arise if the genie always produces something "good enough."

That's a strong, non-generic claim. The threat isn't "AI takes jobs." It's "AI changes the apprenticeship pipeline." The intern model sharpens this. If your intern never grows you must already be competent to get consistent value. That creates a gradient. Experienced engineers compound output. Novices risk compounding confusion.

A practical posture that isn't preachy:

  • Juniors. Use AI to explain and to generate small, test-backed changes. Avoid letting it design systems for you.
  • Seniors. Use AI to compress toil, explore design space quickly, draft reviews. But keep the bar on invariants and regressions brutally high.

A production-flavored taxonomy of good AI work

You can draw a line between the kinds of work where "intern AI" is net-positive and where it turns into slop. The good side is bounded and falsifiable. Think writing a converter from format A to B with golden tests, or refactoring a function while preserving semantics and proving it with tests. Adding instrumentation and structured logs without changing behavior. Drafting a migration with a rollback plan and idempotency checks. In every case there's a crisp notion of wrong. The model's output can be cheaply rejected. You run the tests, they fail or pass, you know where you stand.
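
For the converter example, the golden-test pattern is what makes "wrong" crisp. A minimal sketch; convert_a_to_b() and the fixture layout are hypothetical:

    # Golden tests for a hypothetical format-A-to-B converter. The fixture
    # layout (pairs of *.a inputs and *.b expected outputs) is illustrative.
    # "Wrong" is crisp: output matches the checked-in golden file or it doesn't.
    from pathlib import Path

    from mypkg.convert import convert_a_to_b  # hypothetical converter under test

    FIXTURES = Path(__file__).parent / "fixtures"

    def test_matches_golden_outputs():
        cases = sorted(FIXTURES.glob("*.a"))
        assert cases, "no fixtures found"
        for case in cases:
            expected = case.with_suffix(".b").read_text()
            assert convert_a_to_b(case.read_text()) == expected, f"mismatch: {case.name}"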

The risky side is global and ambiguous. Designing the architecture. Rewriting the service for performance. Fixing the concurrency bug when you don't have a reproducer. Implementing auth without explicit threat modeling and a test harness. These tasks don't fail loudly. They fail as incidents, security issues, or future maintenance tax. In government that second category is where citizen data and critical systems live. So the taxonomy isn't academic. It's the difference between "we used a tool to generate a script we could verify" and "we let the tool design the thing that holds citizen records."

Microsoft, C/C++ to Rust, and the optimism of AI migration

In late 2025 Microsoft Distinguished Engineer Galen Hunt posted about an ambitious goal: eliminate every line of C and C++ from Microsoft by 2030, using AI and algorithms to rewrite large codebases at scale. The North Star metric was "1 engineer, 1 month, 1 million lines of code." The idea is that AI-based coding tools make migration feasible and very quick. You build code processing infrastructure that creates scalable graphs over source code, then deploy AI agents guided by algorithms to make modifications at scale. Language-to-language migration that would have been unthinkable a few years ago becomes a research target. Hunt's team is in CoreAI's Future of Scalable Software Engineering group, focused on eliminating technical debt at scale. Microsoft has since clarified that this is a research initiative, not an imminent rewrite of Windows. But the take is out there: with the right infrastructure, AI can make massive migrations fast.

That optimism sits right on the line between the good and risky sides of the taxonomy. Migrating a billion lines of C/C++ to Rust is global work. It's also the kind of thing that fails as incidents if you get it wrong. The interesting part is that Hunt's pitch isn't "paste into ChatGPT and pray." It's graphs over source code, algorithmic guidance, and AI agents operating in a structured pipeline. The takeaway is unchanged—bounded tasks, verification, human ownership. The Microsoft story is a useful counterpoint—when someone serious bets on AI for something this big, they're betting on infrastructure that makes the intern's output reviewable and rejectable at scale, not on copy-paste.

Where the optimism actually lives

The subtle optimism isn't that software gets effortless. It's that the center of gravity shifts toward what good engineers wanted all along. More time on invariants, interfaces, and failure modes. More emphasis on review and maintenance, which is Linus's preferred battlefield. More value in articulation, in "talk," because code is cheap and Nadh's inversion holds. When code generation becomes a commodity, engineering becomes the scarce skill again. In a government department that means the people who can specify what the system must do, review the output, and own the outcome matter more than the people who can type the most code the fastest.

Government, AI that helps, and teaching the right way

AI can be very helpful in government IT: bounded tasks (formats, refactors, instrumentation, migrations), boilerplate, anchored docs, small fixes. The gain is real. The mistake is using it the wrong way: copy-paste-and-pray, no context, blaming the tool when it fails. As with the devs earlier, give the intern the brief, not a one-liner and a prayer.

The right way is context-aware tooling. Cursor, Copilot, or whatever fits your stack: the model sees the repo, you review every diff, run the tests, and own the merge. In government that matters: citizen-facing systems, compliance, audit trails. You want the tool to help, not to produce slop you paste into production. Teaching context-aware AI use, not "here's ChatGPT, go build something," is the first thing to get right.

Onboard new engineers by making three things explicit: the line between bounded and risky work, the mindset (intern, not oracle), and the defense (review, tests, a named owner). Government is where accountability is non-negotiable, so teach who's accountable first.

A long, nerdy closing

There's a future where we drown in a Borgesian library of plausible patches and perfect-looking READMEs. The only people who can navigate it are those with enough taste to reject 99% of it quickly. Nadh invokes Borges' "Library of Babel" to describe the emotional unfairness of being burdened with infinite low-cost artifacts, even if each one is individually "good." That's a scary metaphor but it's also a map. You don't survive the Library by generating more books. You survive it by building indexing, provenance, and strong filters.

Linus's refusal to let kernel docs become a political statement is, in its own grumpy way, an indexing strategy. OpenSlopware shows the community groping toward provenance as a first-class property. Nadh's "code is cheap, show me the talk" is a prediction: articulation and accountability become the differentiators; typing code is no longer the bottleneck. In government that lands hard—you're already supposed to document decisions and hold people accountable. When anyone can generate a plausible-looking spec or patch, the only thing that holds is who stood behind it.

So yes. AI-assisted coding is here. Use it as an intern: bounded work, tests, explanations, and a human who signs off. In a government department that human is the engineer who reviewed, the lead who approved, and the process that made both mandatory. The one philosophy that survives is the one that never forgets who's accountable.

P.S. The intern did not write this. The intern is currently drafting 47 alternative endings in a parallel universe. I picked this one. I'm still the one who hits merge. The asymmetry is real.