Why working code is not the same as understanding

This essay is the first of three on knowledge, genesis, and the human role in an age of capable AI.

The core thesis (of this essay): Knowledge is not information you receive; it is structure you build through the effortful traversal of problem spaces. AI delivers destinations without journeys. The artefact works. The understanding that would let you navigate it forms only through the work you did not do.

The second essay, "The Composition Problem," examines what happens when stable components must work together.

The third, "The Archaeological Reversal," asks what AI's structural advantages actually buy us.


I have used coding agents extensively over the past twelve months. I have watched them wire up ten thousand lines of working code, mass-refactor unwieldy components, and produce artefacts that compile, pass tests, and serve users. The capability is real. It is also, I will argue, beside the point.

AI coding agents produce syntactically correct code. They follow patterns from their training distribution with remarkable fluency. What they do not reliably produce, when used as a substitute for human traversal, is a corresponding theory in the user's mind. Peter Naur, in his 1985 essay "Programming as Theory Building," argued that the primary product of programming is not the code but the theory the programmer holds: a mental model that lets them explain how the solution relates to the problem, anticipate how changes will propagate, and modify the system in response to novel demands. The code is an artefact. The theory is what makes the artefact navigable. AI produces artefacts. But without traversal, the user may hold no theory adequate to the artefact they now operate.

I have seen this gap in my own practice. A generated authentication module worked correctly for months until a requirement changed: support for multiple identity providers. The code was sound; my theory of its structure was thin. I could not confidently locate the extension points, could not anticipate which assumptions were load-bearing. I had to reverse-engineer my own system, a system I had "built" but not traversed. The artefact functioned. The theory I needed for modification had not formed.
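The shape of that gap can be sketched in a few lines. This is a hypothetical reconstruction, not code from the actual system: the class names, the provider URLs, and the toy token check are all invented for illustration. What it shows is how a single-provider assumption, fixed silently at construction time, becomes load-bearing only when the requirement changes.

```python
# Hypothetical sketch: a generated module whose single-provider
# assumption is easy to miss until a new requirement arrives.

class AuthService:
    """Authenticates users against exactly one identity provider."""

    def __init__(self, provider_url: str):
        # The load-bearing assumption hides here: one provider,
        # fixed at construction time, referenced implicitly everywhere.
        self.provider_url = provider_url

    def authenticate(self, token: str) -> bool:
        # Stand-in for real validation (issuer checks, key lookup),
        # all of which implicitly assume self.provider_url.
        return token.startswith("valid:")


# Supporting multiple providers is not "add a second URL": every
# call site that assumed one issuer must now route by provider.
class MultiProviderAuthService:
    def __init__(self, providers: dict[str, str]):
        self.providers = providers  # issuer name -> provider URL

    def authenticate(self, issuer: str, token: str) -> bool:
        if issuer not in self.providers:
            return False
        return token.startswith("valid:")
```

A thin theory of the first class reads it as "authentication works"; a theory adequate for modification knows which lines encode the single-provider assumption and what rewiring the second class demands.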

The question is not whether the artefact works. It often does. The question is what theory you hold of the artefact you now operate, adequate for maintenance, modification, and extension; and what happens when that theory is thinner than the system it must navigate.

The same gap appears in prose. AI generates grammatically correct, plausible-sounding text. It can supply sentences at will. But clarity (clarity as a thesis made precise, as irrelevancies cut away, as the heart of the matter brought to the front) does not automatically arise from fluent output. It arises when the writer's own grasp is sharpened: when they hold a theory of what they are trying to say, not merely a text that says something.

This is not a capability complaint, not a claim that next year's model will fail in the same ways. It is a puzzle about locus: what is missing when the artefact is correct but the theory is absent in the person who must maintain, modify, and extend it? And what happens to your capacity to build theory when you outsource the very process that, in you, generates it?

Information With Causal Power

The physicist David Deutsch, in The Beginning of Infinity (2011), offers a distinction that clarifies the stakes. Deutsch is known for foundational work in quantum computation and for a philosophical program emphasising the growth of knowledge as the central phenomenon requiring explanation. Knowledge, he argues, is not merely information stored and retrievable. Knowledge is information with causal power: structure that lets you act on situations you have not encountered before. In what follows, "knowledge" is used in that sense: knowledge delimited by its ability to guide action in novel cases, not merely to repeat what has been seen. The related term "theory," following Naur, refers to the mental model that constitutes such knowledge in practice: the capacity to explain, anticipate, and modify.

The distinction cuts deeper than it first appears. Information is pattern: this function takes these arguments, this historical event followed that one, this word means approximately this. You can retrieve information. You can look it up, ask for it, have it delivered. Knowledge is something else. Knowledge is compressed structure that allows navigation: knowing not just that a design pattern exists but when it applies, where it will create friction, how it interacts with other patterns, why it emerged as a solution to problems that preceded it.

The philosopher Jean Hyppolite, in Genesis and Structure of Hegel's Phenomenology of Spirit (1946), emphasises the difference between structure and genesis. Structure is the configuration as it stands: the finished kitchen, the working code, the completed proof. Genesis is the effortful traversal through which understanding is built: the sequence of decisions, constraints, and productive failures through which you came to grasp why this structure rather than another. Structure tells you what. Genesis tells you why this and not something else. (The application of Hyppolite's framework to programming and AI is my own synthesis; the structure/genesis distinction is Hyppolite's, but the extension to these domains is not.)

The person who receives an artefact has structure. The person who built it carries genesis. And genesis is typically what lets you act when circumstances change, when requirements shift, when the novel situation arrives that no existing pattern quite addresses. Structure lets you recognise. Genesis lets you generate.

This connection between genesis and adaptive capacity admits exceptions. Someone might acquire adequate theory through intensive study of received artefacts, through deliberate reverse-engineering, through careful examination of implementations. The claim is not that traversal is the only possible route to theory, but that it is the historically robust one, and that fluent delivery without traversal provides no guarantee that theory will form. The encoding gap is a risk, not a certainty.

Knowledge, understood this way, is the residue of traversal. You do not typically acquire Deutsch-knowledge merely by receiving information; you acquire it by moving through problem spaces, encountering constraints, feeling trade-offs, watching your assumptions fail and rebuilding them. The compression happens through that encounter. The causal power comes from having inhabited the structure, not merely observed it. (An important qualification: those who have previously traversed related problem-spaces may integrate new information into existing causal structures without repeating the full traversal. The concern applies most directly to those who have not yet built such structures.)

This is what abstraction in a mind is: compressed structure with causal power. A good abstraction is not merely a label or a grouping. It is a tool for thought that lets you reason about novel situations without reconstructing everything from first principles. Abstractions emerge from noticing what varies together, what remains stable, where the joints in the problem lie. That noticing requires traversal. It cannot simply be delivered as finished output and counted, in you, as knowledge.

Bret Victor, in his essay "Up and Down the Ladder of Abstraction" (2011), captures what this traversal looks like in practice. Understanding a system, Victor argues, requires moving fluidly between levels of abstraction: from the concrete particular (this specific execution, this specific input) to the abstract pattern (how the system behaves across all inputs), and back again. Neither level alone suffices. The concrete grounds understanding; the abstract reveals structure invisible at ground level. "The deepest insights," Victor writes, "emerge in the transitions": in the act of climbing and descending, not in arriving at any single rung. Causal power, in this frame, is ladder-fluency: the capacity to move between levels, to see how abstract patterns manifest in concrete cases and how concrete encounters reveal abstract structure.

When AI produces code without that abstraction forming in the user, it is producing information without causal power in the human who must act next. The syntax is correct because syntax is pattern. The abstraction is absent (in the user) when compression has not occurred through the user's own navigation of the problem space. The tool retrieves. It does not, by itself, ensure traversal. And traversal is what has reliably generated the structure that lets humans act on what comes next.

The Encoding Threshold

The distinction between information and knowledge is not merely philosophical. It has a grounding. Cognitive science and neuroscience have accumulated findings consistent with a simple point: effortful reconstruction tends to produce more durable, retrievable learning than passive exposure. The "testing effect," demonstrated in landmark studies by Roediger and Karpicke (2006), shows that retrieval practice produces better long-term retention than equivalent time spent restudying. The Bjorks' research on "desirable difficulties" (1994) provides a broader framework: conditions that slow initial learning often enhance retention and transfer.

A caveat on scope: this research largely examines novice learners acquiring discrete knowledge (vocabulary, facts, procedures). Whether the findings transfer fully to expert practitioners building on existing causal structures, or to the acquisition of architectural judgment in complex domains, remains less established. The encoding threshold is best understood as a well-grounded principle for foundational learning, with plausible but not yet fully demonstrated extension to expert practice.

When you struggle through a problem (when you attempt to reconstruct an answer from incomplete cues, when you reach for a solution and find it missing, when you work through the friction of not-yet-knowing) you force reconstruction rather than recognition. In these findings, reconstruction is associated with stronger later retrieval than mere rereading. The effort is not incidental. It is part of the mechanism by which information becomes structure.

When you receive an answer without that struggle, you can get the signal of familiarity rather than the work of reconstruction. Familiarity feels like learning. The information enters awareness with confidence. But, absent retrieval practice, it is less likely to consolidate into durable, retrievable structure. You experienced the feeling of knowing without reliably building the structure that constitutes knowledge-in-you.

This is why, in a study published in Science, Karpicke and Blunt (2011) found that students who engaged in retrieval practice outperformed those who created elaborate concept maps or repeatedly studied material. Same content. Different cognitive act. Rereading emphasises familiarity. Recall emphasises encoding for later use.

The metaphor of cognitive effort as lifting weights captures this dynamic. Terence Tao, the Fields Medalist, has expressed concern on his blog (terrytao.wordpress.com) about AI tools allowing mathematicians to bypass struggle. The precise formulation here synthesises ideas from multiple posts and interviews rather than quoting a single source, but the concern is his: that outsourcing cognitive effort may deprive the mind of the resistance that builds intuition. Resistance creates adaptation. Remove the resistance, remove the adaptation signal. The system has no reason to change.

Cognition follows the same logic. The brain tends to strengthen what it must reconstruct because reconstruction marks a deficit: this mattered, it was needed, it was not immediately available. Fluent delivery can send the opposite signal. If the information arrived without effort, the system may register no need for structural change.

Here is the problem: AI systems are fluency machines of remarkable power. Earlier tools (calculators, spell-checkers, search engines) also delivered answers without struggle; AI differs in degree rather than kind, but the degree matters. The scope of what can be fluently delivered has expanded dramatically: not just arithmetic or spelling but code, prose, analysis, design. AI delivers answers with confidence, completeness, smoothness: the very qualities that make struggle feel unnecessary. Why work through the problem when the solution appears in seconds? Why sit with difficulty when relief is a prompt away? The tool does exactly what tools should do: it reduces friction. But friction, in cognitive terms, is not always waste. In the domain of forming durable competence, friction is often the signal.

There is a range of cognitive effort below which durable, retrievable learning is markedly reduced. Above it, reconstruction tends to encode structure. Below it, familiarity can mimic knowledge without reliably producing it. AI can invite workflows that drop users below that range, unless they deliberately reintroduce reconstruction, testing, and delay. The virtue for the tool can become a cost for the user who needs not just answers but the capacity to generate answers in novel situations.

The risk is not that AI gives wrong answers. The risk is that right answers, delivered fluently, can bypass the user's encoding work. You feel informed. You are not necessarily changed in respect of durable competence.

Recent experimental evidence bears directly on this mechanism. Shen and Tamkin (2026), in a randomized controlled study at Anthropic, found that participants who used AI assistance while learning to code scored 17% lower on subsequent assessments than those who learned without AI (Cohen's d = 0.738, p = 0.010). Crucially, the control group encountered more errors during learning (a median of 3 versus 1 for the AI group). Those errors were not obstacles to learning; they were the learning. The struggle that AI eliminated was precisely the friction that would have produced encoding. Participants in the AI condition reported feeling "lazy" and noted "gaps in understanding"; the phenomenology matched the mechanism.

The Ten Thousand Lines That Work

But now a strong objection can be raised, and it deserves full weight: "AI coding agents can produce ten thousand lines of working code. The system compiles, passes tests, serves users, solves the stated problem".

I have watched this happen. You probably have too. This is not a strawman capability to be dismissed: it is genuine, impressive, and improving monthly. The spec-verify loop, with sufficient iteration, produces working artefacts.

I grant all of this.

The question remains: what do you carry? Assuming the artefact is genuinely stable, do you have sufficient theory (understanding adequate for maintenance, modification, and extension) for what comes next? What is your relationship to the system that you have created and may need to operate? The Shen and Tamkin study illuminates the mechanism: their control group, working without AI, encountered three times as many errors and scored significantly higher on later assessments. The errors were not waste. The errors were where the learning happened. Working code is the outcome; the errors encountered en route are the traversal that produces theory.

Consider the act of specification. To specify precisely what you need, you require vocabulary. And vocabulary emerges from traversal. Stuart Kauffman, a theoretical biologist at the Santa Fe Institute known for his work on self-organization and complexity, coined the term "adjacent possible" in Investigations (2000) to name a simple constraint: you can only reach spaces that border what you have already explored. If you have not built systems of similar structure, navigated their failure modes, felt where abstractions hold and where they leak, you lack the concepts to articulate your actual requirements. You are reduced to high-level gestures: make it maintainable, keep it performant, follow best practices. The agent interprets these through its training distribution, not your situational understanding. The specification you write before understanding is categorically different from the specification you would write after moving through the solution space yourself. They may use the same words. They do not carry the same meaning.

Christopher Alexander, the architect and design theorist whose Notes on the Synthesis of Form (1964) influenced both building design and software engineering, observed that good problem-solving requires tackling the parts with fewest degrees of freedom first. When designing a kitchen, place the windows before the table, and the table before the stove: windows have perhaps two natural positions on one wall, while the stove can go nearly anywhere. Start with the stove, and you risk blocking the only viable spot for something more constrained.

This sequencing wisdom is not in any specification. It is learned by having placed windows badly, by watching the cascade of compromises that follows a wrong early decision. The analogy to software is structural rather than exact: software constraints differ from physical ones, but both domains reward learning which decisions are load-bearing through the experience of getting sequencing wrong. The AI can produce a kitchen. But unless you traversed the constraints, you do not learn which decisions were load-bearing, and that learning transfers imperfectly through delivered artefacts.
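Alexander's heuristic can be rendered as a toy ordering rule. The counts of viable positions below are invented for illustration; the point is only the sequencing principle, not a real constraint solver.

```python
# Toy rendering of Alexander's sequencing heuristic. Position counts
# are invented for illustration; the point is the ordering rule:
# commit the most constrained decision first.

decisions = {
    "windows": 2,   # few natural positions, on one wall
    "table": 5,     # somewhat constrained by the windows
    "stove": 12,    # can go nearly anywhere
}

# Tackle the parts with the fewest degrees of freedom first, so a
# flexible choice never blocks the only viable spot for a rigid one.
order = sorted(decisions, key=decisions.get)
print(order)  # -> ['windows', 'table', 'stove']
```

The rule is trivial to state and easy to delegate; knowing which decisions actually have few degrees of freedom in a given system is the part learned only by having sequenced badly.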

Verification faces the same structural problem. You can confirm that tests pass. You can check that the code does what the specification requested. What you cannot automatically do is evaluate whether the abstraction boundaries will hold under requirements you have not yet imagined. In Victor's terms, spec-verify operates at fixed rungs: you specify at the level of intent, you verify at the level of test results, but you never descend to the concrete level of implementation, never see how the abstraction is grounded, never watch a specific execution trace reveal why a design choice matters. Deep verification requires ladder-mobility: the capacity to step down from "tests pass" to "here is why this edge case would break the assumption," and back up to "here is the pattern of brittleness this reveals." That theory of failure comes from encountering failure directly, from having climbed down to where failures live. Stop lifting the weights, and you lose not just the capacity to build but the capacity to evaluate meaningfully. Verification without traversal becomes checking what you know to check. It is not reliably checking what you do not yet know to look for.

This leads to the deeper problem. After the cycle of specification and verification, what do you actually know? You know the artefact exists. You know it passes the checks you defined. But you may not carry a mental model of why it is structured as it is, what alternatives were implicitly rejected, where the brittleness hides, how the pieces interact under conditions outside the test suite. The implementation arrived as a package. You did not watch it emerge.

David Krakauer, President of the Santa Fe Institute and a complexity scientist who studies the evolution of intelligence, has articulated in lectures and interviews a definition that illuminates the trap we risk: stupidity is the persistent use of a rule system that does not improve with more information. (This formulation derives from informal communications rather than peer-reviewed publication, but captures a pattern worth naming.) Specification-without-traversal can fit that structure. Your rule system is your capacity to specify: vocabulary, concepts, intuitions. When you outsource the building, information arrives (the implementation) but that information has no guaranteed mechanism to update your model. Each successful cycle can reinforce the workflow while leaving your understanding static. The loop tightens. You become more dependent on the process precisely as you become less equipped to evaluate its outputs.

What gets lost in the efficiency narrative is that specifications change because implementation reveals what you actually wanted. The build process is not merely productive. It is epistemic. You discover constraints you did not know existed. You encounter trade-offs that reshape preferences. You develop intuitions about where the system wants to go and where it resists. Traversal is the compression that generates causal power. Outsource it too early, and you sever the feedback loop that would have refined information into knowledge.

The trajectory compounds. Spec-verify sharpens one dimension of vocabulary: you learn to articulate what you need. But it leaves others undeveloped. You do not learn why certain architectural choices fail under pressure, where abstractions leak, which patterns apply when. You do not develop the bridging vocabulary that connects domain to implementation: how requirements constrain technical choices, how technical realities reshape what can be specified. The artefact embodies structure you did not earn; that structure does not transfer to your theory. When requirements shift and novelty arrives, you can say what you want. You cannot evaluate whether what you received will hold.

Not Another Abstraction Layer

A second objection arises from the history of programming languages.

Programming has always moved up the abstraction stack. Each time, someone warned the new generation would not "really understand". Each time, productive work continued. Why is AI different? Why is this not simply the next layer, with humans now operating at the specification level rather than the implementation level?

The objection has content and force. But it misunderstands what made previous abstraction layers work, and why AI represents a different kind of shift.

Marshall McLuhan's dictum that "the medium is the message" provides a useful lens. McLuhan argued in Understanding Media (1964) that the form of a medium shapes what can be learned and communicated through it, independent of content. Television teaches differently than print not because of what is broadcast but because of how attention is structured, what feedback loops exist, what kind of participation the medium affords. The medium determines the cognitive act.

Previous programming abstractions changed the medium but preserved its essential structure: you thought in solution-space. Assembly, C, Python, Java: each required you to inhabit the logic of the solution itself, to express your intent in terms the machine could execute. When your C code segfaulted, the medium taught you something about memory. When your Java code raised a type error, the medium taught you something about underlying structures. The resistance you encountered was the resistance of the solution domain. The message of the medium was: this is how computation works at this level of abstraction.

AI as medium is structurally different. It permits you to remain entirely in problem-space, expressing intent in natural language, while the solution materialises in a black box. The translation from intent to implementation happens somewhere you do not witness. You may learn to specify better, to prompt more precisely, to verify more carefully. But specification-learning is not implementation-learning. The medium does not carry the message about how the solution works, only that it works.

This is not merely a change in representation. It is a change in what the medium can teach. When you work in C, the medium's constraints are the solution's constraints; you cannot avoid learning something about the solution domain by working in that medium. When you work in natural language prompts, the medium's constraints are linguistic and intentional; you can successfully use the medium while learning nothing about how the resulting artefact is structured.

The counter-objection still deserves acknowledgment: AI-assisted development also involves iterative failure. Prompts that do not work, generated code that breaks, specifications that prove inadequate. Is this not also traversal? The distinction is this: what domain does the failure teach you about? When your prompt fails, you learn about prompting. When your specification proves inadequate, you learn about specification. These are real skills. But they are not the same skills as understanding why a particular architectural choice fails under load, or where an abstraction leaks, or how components interact under conditions outside the test suite. Previous abstraction layers preserved your encounter with the resistance of the solution itself, merely changing the representation. AI can, but need not, preserve that encounter. When AI supplies a working solution before you have struggled with the problem, you skip the failures that would have taught you about the solution domain. The medium has changed what it is possible to learn by using it.

Bret Victor's ladder clarifies what is lost. Previous programming media forced you to climb: to move from abstract intent down through concrete implementation, encountering resistance at each level, then back up as patterns emerged from particulars. The ladder had rungs, and you had to step on each one. AI can teleport you to a rung, delivering you to working code without the climb. You arrive at the abstraction without having grounded it in concrete encounters, without having seen how the abstract pattern manifests in specific cases. The result is an abstraction you hold but cannot move from: you cannot step down to explain why this implementation rather than another, cannot step up to see how this pattern generalises. You have a position on the ladder without the ladder itself.

Hyppolite's reading of Hegel provides philosophical ground for a stronger claim, though one that should be understood here as a philosophical commitment rather than an established empirical fact: knowledge is not mere representation of structure that exists independently; knowledge is the process by which structure becomes articulate. If this view is correct, traversal is not merely instrumental, a step that can be replaced by fluent delivery. Traversal is where knowledge comes into being.

Hegel's dialectic proceeds through determinate negation: you learn from how inadequate formulations fail, and that failure drives you toward the more adequate. The error is not noise to be eliminated. The error is the signal. This is the philosophical analogue of active recall: the reaching and not finding, the inadequacy that forces reconstruction. Previous abstraction layers preserved this dialectic at the new level. You still had to learn from how your formulations broke.

AI's fluent delivery can skip that negation when it supplies the result before you reconstruct it. You receive an artefact without the failures that would have taught you why it is the right artefact, or what would break it.

Hegel distinguishes between Verstand and Vernunft: understanding and reason. Understanding applies fixed categories externally. Reason grasps internal movement: becoming, pressure, transformation. Previous abstraction layers did not reduce programming to mere Verstand. They gave new materials for Vernunft to work through.

AI, by delivering completed artefacts for verification, can reduce your engagement to Verstand, foreclosing the Vernunft that deeper theory-building requires. You may not grasp the movement that produced the structure, because you were not present for that movement.

This is the qualitative break: previous layers changed the medium of traversal; AI can remove traversal and deliver the destination, unless you intentionally restore traversal at the higher level. You can operate at the specification level productively only if genesis occurs at that level: only if you traverse enough specification spaces to earn vocabulary, failure-models, and intuitions about what specifications reveal under implementation pressure.

The person who has built many systems and now uses AI to accelerate carries genesis from prior traversal. They know what to look for because they have seen what breaks. They can evaluate because they have theory. (There is also a reversal to acknowledge: AI can hold more structure in active attention than any human, revealing patterns that escape notice due to cognitive limits. This structural advantage is real, and it matters. But seeing structure is not the same as understanding why the structure is this way rather than another. I will return to that distinction in the third essay.)

But the person who begins at the specification level, who has never built, who receives working artefacts from the start: what do they carry? Categories without movement. Structure without genesis.

The honest conclusion is not that AI is useless. It is that AI can become a traversal substitute. And substituting for traversal is substituting for the process by which knowledge, in the strong sense of information with causal power in the human, comes to exist.

The Evidence at Scale

The encoding threshold is not a laboratory curiosity. Population data is at least consistent with the possibility that when societies remove certain kinds of cognitive friction, certain measured capacities change as well. Here, strong claims require strong evidence; what follows should be treated as hypothesis-generating rather than hypothesis-confirming.

IQ scores rose through much of the twentieth century, a phenomenon James Flynn documented in his 1987 paper "Massive IQ Gains in 14 Nations." In several countries, later cohorts show reversals in some measures: Bratsberg and Rogeberg (2018) documented a reversal in Norwegian IQ scores beginning with cohorts born after 1975. Researchers have proposed various causes, including changes in education, nutrition, and media consumption. Dutton and Lynn (2015) survey several hypotheses.

The timing invites a hypothesis: calculators reduce mental arithmetic practice; GPS reduces self-generated navigation; search engines reduce the need to hold and retrieve information. Each tool can remove a category of cognitive friction. And if durable encoding depends on effortful reconstruction, then some friction-removing tools may have hidden costs to the capacities they replace. But correlation is not causation; education policy, test-specific coaching, demographic shifts, environmental factors, and measurement changes are all confounders. The best we can say, without further evidence, is that the direction is consistent with the mechanism. This is suggestive, not probative. But it is consistent enough to warrant caution about what we might be trading away.

Positive Friction

The argument is not for abstinence but for sequencing. The spec-verify loop produces working artefacts. This is not in dispute. What is in dispute is whether you emerge from that loop equipped for what comes next, and that depends on what you did before the loop began. So, what should one do?

Terence Tao has articulated the concern directly: AI can lower mental effort so dramatically that the brain may stop lifting its own weights. Mathematics is especially vulnerable because every step can be outsourced. You can produce correct proofs without encoding the structure that would let you produce the next proof. You can verify solutions without building the failure-models that would let you spot subtle errors. AI's capability to generate and verify is real. The corresponding knowledge in the human does not automatically form. The prescription follows from the mechanism.

Struggle first. Move into the problem space far enough to develop vocabulary for what you need. Encounter constraints. Make errors. Feel the trade-offs. Let the difficulty do its work on your encoding. Then use the tool: to verify, to extend, to accelerate. The intervention point is temporal. Friction before fluency. Traversal before receipt.

Shen and Tamkin (2026) identified six distinct interaction patterns whose learning outcomes diverged sharply. Three preserved learning: "Generation-Then-Comprehension" (attempting solutions before seeking AI help, 86% average score), "Hybrid Code-Explanation" (using AI primarily for explanations rather than implementations, 68%), and "Conceptual Inquiry" (asking about principles rather than requesting code, 65%). Three impaired it: "AI Delegation" (immediate handoff of problems, 39%), "Progressive AI Reliance" (starting independently but increasingly deferring, 35%), and "Iterative AI Debugging" (using AI to fix errors without understanding them, 24%). The lesson is clear: interaction modes that preserve the user's encounter with difficulty preserve learning; modes that eliminate difficulty eliminate it. Positive friction is not abstinence from tools but strategic sequencing of when tools intervene.

Responsible use means choosing when to think: not whether to use assistance, but when. Delay the offload until encoding has occurred. Use AI to verify, to extend, to accelerate, but not to substitute for the traversal that builds causal structure.

This is not intuitive. Friction feels like waste. The brain seeks efficiency, and efficiency means minimising effort. When a tool offers to eliminate struggle, accepting feels obviously correct. Refusing feels perverse, performative, a rejection of progress for its own sake.

But friction, in cognitive terms, is not mere inefficiency. Friction is often the signal that triggers adaptation. The discomfort of not-knowing, the effort of recall, the slow construction of understanding through attempt and error: these are the conditions under which durable competence forms. Eliminate them prematurely and you eliminate the encoding. You arrive at the answer without becoming someone who could have found it.

The design principle that follows is what might be called positive friction: deliberate preservation of effort at the stages where effort generates encoding, combined with aggressive tool use at the stages where encoding has already occurred. The sequence matters because the cognitive system responds differently depending on whether struggle precedes or follows the arrival of information. (The optimal sequence may vary by domain, task complexity, and the user's prior expertise: a novice learning fundamentals needs different friction than an expert extending mastery.)

This reframes the conversation about AI in education, in professional development, in the formation of expertise. The question is not how to restrict access to tools. Restriction fails and probably should fail. The question is how to structure workflows, curricula, and habits so that active recall precedes fluent delivery, so that people struggle with problems before solutions become available, and for long enough that encoding occurs.

The answers will vary by domain, by individual, by stage of development. But the principle holds: encoding requires effort; effort requires friction; and, for the sake of becoming someone who can navigate novelty, friction must come first.

Jeremy Howard states the stakes in career terms: outsource all your thinking and you stop up-skilling. Competence compounds through practice. Remove the practice, and the compounding stops. Those who maintain the friction of building continue to develop; those who outsource it plateau. The gap between them widens with each cycle.

The Concession and What Follows

A concession is due, and its precise scope matters.

The spec-verify loop works for producing artefacts. Given a well-defined specification, sufficient steering with iterative feedback, and a stable target, AI generates components that function reliably. This essay has not argued otherwise.

One might conclude that the encoding gap is therefore a solved problem for well-defined components: use AI for the parts, reserve human judgment for the whole. But this response only relocates the gap; it does not close it.

What this essay has argued is that the artefact's stability does not guarantee a corresponding theory in the user. And if the user lacks theory of the component, that lack compounds when components must work together. The question that follows is not about individual artefacts. It is about composition. And composition theory is not component theory at scale. It is a different kind of theory, about a different object: not the parts but the boundaries, not the modules but the protocols, not the implementations but the spaces between. That theory cannot be specified into existence. It requires its own traversal: encountering how separately-stable things fail when forced to interact.

Grant even more: that you can, with care, develop domain vocabulary through specification and bridging vocabulary through examining what implementations reveal. A question still remains: what happens when you have many such artefacts, each stable in isolation? How do they compose? What theory do you need for the relationships between them, for the protocols that connect them, for the concerns that cut across all of them? That theory lives in no single specification. It requires its own traversal, its own genesis, at the level of composition itself.

The encoding gap does not disappear when spec-verify works for components. It relocates: to the level where components must cohere into systems, where boundaries must hold under pressure, where the geometry of interaction determines whether the whole is more than an accidental assemblage of working parts. In Bret Victor's terms, composition is not a higher rung on the same ladder; it is a different ladder entirely. The rungs are not implementation details but interaction patterns, not code paths but protocol boundaries, not function behaviour but emergent system dynamics. You must climb this ladder too, descending into concrete cases of composition failure and ascending to patterns of integration brittleness. That climbing cannot be inherited from having climbed the component ladder, however thoroughly.

What Remains

Krakauer's definition of stupidity, the persistent use of a rule system that does not improve with more information, names the trap. When your only interaction with complex systems is specifying and verifying, your mental models never encounter resistance. The artefact works; you do not learn why. The tests pass; you do not learn what they failed to check. Information arrives. The capacities that would let you evaluate it do not. The loop tightens. You become more dependent on the process precisely as you become less equipped to evaluate it.

The alternative is not to reject the tools but to sequence their use. Traversal first. Encoding first. Build the vocabulary that lets you specify meaningfully. Build the failure-models that let you verify deeply. Build the causal structure that lets you navigate when requirements shift and novelty arrives.

Genesis must occur at whatever level you operate. AI shifts the level upward, from implementation to specification to composition. This shift is legitimate when you are genuinely traversing at the new level, encountering constraints, learning from productive failure, building theory appropriate to your resolution. It becomes pathological when higher abstraction is mistaken for less theory required rather than different theory required.

AI delivers information with remarkable fluency. But knowledge is different: information with causal power, structure that lets you act on situations you have not encountered. It forms only when the human does the work that fluency tempts us to skip: the friction of recall, the struggle of construction, the slow compression of experience into understanding.

The question is not whether to use the tools. The question is whether you remain someone who could have built what the tools build for you. Whether you carry the structure, not just the output. Whether, when the novel situation arrives, you can navigate, or only reach for a pattern you half-remember, or a tool that might approximate an answer.

That capacity is not given. It is built. And it is built only through the effort that the tools exist to eliminate.

Choose when to think. Accept the friction.


But "when" is only the first question. The second is "where." If spec-verify produces stable components, the locus of essential genesis shifts upward, not to implementation, which the tools can handle, but to composition: understanding how components relate, what protocols connect them, where boundaries must hold. The next essay takes up that question.


References

Alexander, C. (1964). Notes on the Synthesis of Form. Harvard University Press.

Bjork, R. A. (1994). Memory and metamemory considerations in the training of human beings. In J. Metcalfe & A. Shimamura (Eds.), Metacognition: Knowing about knowing (pp. 185-205). MIT Press.

Bratsberg, B., & Rogeberg, O. (2018). Flynn effect and its reversal are both environmentally caused. Proceedings of the National Academy of Sciences, 115(26), 6674-6678.

Deutsch, D. (2011). The Beginning of Infinity: Explanations That Transform the World. Allen Lane.

Dutton, E., & Lynn, R. (2015). A negative Flynn effect in France, 1999 to 2008-9. Intelligence, 51, 67-70.

Flynn, J. R. (1987). Massive IQ gains in 14 nations: What IQ tests really measure. Psychological Bulletin, 101(2), 171-191.

Hyppolite, J. (1946/1974). Genesis and Structure of Hegel's Phenomenology of Spirit (S. Cherniak & J. Heckman, Trans.). Northwestern University Press.

Karpicke, J. D., & Blunt, J. R. (2011). Retrieval practice produces more learning than elaborative studying with concept mapping. Science, 331(6018), 772-775.

Kauffman, S. A. (2000). Investigations. Oxford University Press.

McLuhan, M. (1964). Understanding Media: The Extensions of Man. McGraw-Hill.

Naur, P. (1985). Programming as theory building. Microprocessing and Microprogramming, 15(5), 253-261. Reprinted in P. Naur, Computing: A Human Activity (pp. 37-48). ACM Press/Addison-Wesley, 1992.

Roediger, H. L., & Karpicke, J. D. (2006). Test-enhanced learning: Taking memory tests improves long-term retention. Psychological Science, 17(3), 249-255.

Shen, A., & Tamkin, A. (2026). How AI Impacts Skill Formation: Experimental Evidence from Coding. arXiv preprint arXiv:2601.20245.

Victor, B. (2011). Up and Down the Ladder of Abstraction. worrydream.com/LadderOfAbstraction/.