The theory that lives between the parts

Note: This essay is the second of three on knowledge, genesis, and the human role in an age of capable AI. The first, "The Encoding Gap", argued that knowledge is structure you build through the effortful traversal of problem spaces.

The core thesis here: the spec-verify loop works for parts, but the theory of how parts relate lives in no single specification. Composition requires its own genesis, its own traversal, at the level of interaction between stable components.

The third essay, "The Archaeological Reversal," asks what AI's structural advantages actually buy us.


In the previous essay, I argued that knowledge is not information you receive; it is structure you build through struggle. AI delivers destinations without journeys. The artifact works. The understanding that would let you navigate it forms only through doing the actual work.

There is, however, a case that complicates this picture. Attending to it honestly reveals where the deeper difficulty lies.

A coding agent, given a well-defined specification and sufficient iterative feedback, can produce a stable subsystem. The spec-feedback loop is itself a form of genesis: you traverse at the specification level, encounter what implementations reveal about your requirements, refine your understanding of what you actually need. The human holds genesis at the spec level. The AI generates structure at the implementation level. The artifact stabilises. This works.

The question that now emerges is not about individual artifacts. It is about composition. When you have multiple stable subsystems, each with its own internal logic, the theory you need is theory of how they relate. And that theory lives in no subsystem. It cannot be generated by refining any single specification. It requires its own traversal, its own genesis, at a level that transcends the components.

This is the composition problem. And it is where the nature of boundaries, protocols, and cross-cutting concerns becomes central.

The Markov Blanket and the Boundary

The neuroscientist Karl Friston, through his work on the free energy principle and active inference, provides a framework that illuminates structural principles in engineered systems. Originally developed for biological systems, the framework applies to software architecture through a specific structural correspondence: both biological organisms and well-designed software modules maintain themselves by controlling what crosses their boundaries. The cell membrane and the API endpoint serve analogous functions, regulating exchange while preserving internal coherence.

An autonomous system, Friston argues, maintains itself against entropy by modelling its boundary conditions. The boundary (what he calls the Markov blanket) is the surface that separates internal states from external states. Internal states are shielded from direct external influence; they interact with the world only through the blanket. The blanket specifies what the system can ignore (everything internal, from the external perspective) and what it must attend to (the interface states that mediate interaction).

The Markov blanket is not merely a convenience for analysis. It is constitutive. A system is a system precisely because it maintains a boundary that allows internal organisation to persist despite external perturbation. Without the blanket, there is no inside and outside. There is just undifferentiated process.

This framing illuminates what good abstractions actually provide. A well-designed CRM system gives you "customer," "pipeline," "deal." An LMS gives you "course," "module," "assessment." These are not arbitrary labels. They are semantically coherent units with maintained boundaries. Changes in the CRM's database schema do not propagate to your workflow configuration because someone engineered that decoupling deliberately. The Markov blanket around your level of interaction is thick and robust.

At this level, you need theory appropriate to your resolution: theory about your business domain, about what workflows serve what purposes, about customer behaviour. You are doing genesis at your level. The CRM developer did genesis at theirs. The abstraction boundary lets these be genuinely separate. You do not need to know relational algebra to configure pipelines, because someone else holds the genesis of that layer and, through active maintenance and versioning discipline, sustains the boundary's integrity.

This is the legitimate case for working at higher abstraction levels. The theory you need is inversely proportional to the stability and semantic coherence of the abstraction boundary you operate above. Well-designed platforms give you genuine abstraction: boundaries that hold (because humans monitor and defend them), vocabularies that remain stable (through versioning and deprecation protocols), interfaces that shield you from irrelevant internal complexity.

The Problem With AI-Generated Structure

AI-generated code carries no such guarantee. Let me make that distinction precise.

When an AI produces ten thousand lines of code that satisfies a specification, it produces structure. But that structure was not shaped by the kind of genesis that sustains boundaries over time. The "structure" is static in a specific sense: whatever configuration the model produced that satisfies the immediate test cases. There is no one committed to defending the boundary, monitoring what changes might breach it, adapting the interface as external pressures shift.

You do not know which changes will propagate because no one analysed the domain to decide where boundaries should be. They emerged from the statistical patterns in the training distribution, not from principled decisions about what should be hidden from what. This does not mean the boundaries will necessarily fail; it means their robustness is contingent on how well the training distribution happened to capture the relevant domain structure. The structure is accidental in this precise sense: it reflects the model's output distribution, not a human's reasoning about failure modes, and its adequacy depends on factors outside your knowledge or control.

This extends beyond structure to vocabulary itself. When humans build systems, they develop domain models: naming conventions, entity relationships, taxonomies that encode hard-won understanding of what distinctions matter. These schemas evolve through dialogue. Consider how Salesforce's object model evolved: the distinction between "Lead" and "Contact" emerged from sales teams discovering that treating them identically caused pipeline confusion; "Opportunity" split from "Account" when it became clear these had different ownership patterns. A "customer" becomes distinguished from a "prospect" when someone notices that conflating them causes errors. An "order" splits into "quote," "commitment," and "fulfillment" when the business discovers these have different lifecycles. The vocabulary carries the residue of encounters with inadequacy. AI-generated code inherits naming from its training distribution: plausible labels that lack the scar tissue of having been wrong and corrected. The names work, but they do not encode the reasoning that would tell you when they stop working.

Compare this to human-written systems. A human developer, iterating over time with feedback, learns where boundaries naturally arise: where changes accumulate, where stakeholders need different views, where internal complexity threatens stability. The human internalises the cost of boundary-crossing and places boundaries to minimise it. This is a form of genesis: effortful traversal through which understanding of where boundaries should lie is built.

When you work above a CRM's abstraction layer, you can rely on the boundary because the vendor continues to invest in its maintenance. When you work above AI-generated code that is never revisited, no one is maintaining the boundary. If your specification was underspecified (and specifications always are, because language is never fully precise until it has encountered what it attempts to describe), the resulting boundaries may be arbitrary. They will not hold under pressure you did not anticipate. More critically: no one will intervene when they fail.

This is not a capability limitation that future models will overcome by training longer. It is structural. The boundary's integrity depends on someone understanding why it is where it is, what violations would look like, how to adjust it when the world changes. That understanding is developed through genesis, through having traversed the space of possible configurations and learned, from failures, where the better boundaries lie. Without it, you have structure that happens to work under current conditions, not structure designed to remain stable under perturbation.

At this point, another objection arises: "AI has analysed thousands of codebases and seen thousands of boundary patterns. Its decisions about where to place abstractions may be as sound as those of an average engineer. Given that most organisations do not employ brilliant architects, and given the churn of contractors who build systems and leave, why would AI-generated boundaries be worse than what we already have?"

The objection has gravity, but it mistakes the target. The claim is not that AI-generated boundaries are worse than average corporate systems. The claim is that average corporate systems are precisely what accumulate technical debt, what become unmaintainable, what eventually require expensive rewrites. Contractor churn is not a feature to emulate; it is a pathology that produces systems no one understands. AI-generated code without maintained genesis has the same vulnerability, with one difference: at least a returning contractor might remember why something was done. AI-generated code has no such memory to recover. The question is not whether AI matches the median; it is whether the median is adequate for systems that must evolve under pressure. The composition problem reveals that it is not.

A clarification is necessary here: the essay earlier conceded that careful spec-feedback iteration can produce stable subsystems. The critique that follows applies specifically to AI-generated code produced without such iteration: one-shot generations, underspecified prompts, systems assembled without the human holding genesis at the specification level. Where the spec-feedback loop is robust, the boundaries may inherit that robustness. Where it is absent, the concerns below apply with full force.

One might object further: "if the subsystem is stable and the boundaries hold, why does it matter whether anyone holds the theory? Can we not simply operate the system without understanding it, as we operate countless technologies we do not understand?"

We can, so long as two conditions hold: the boundaries remain stable, and we do not need to compose the system with others in novel ways. The moment either condition fails, the absence of theory becomes acute. A stable subsystem that must now integrate with a new authentication provider, or handle a regulatory requirement its original specification did not anticipate, or interoperate with another stable subsystem whose assumptions subtly conflict: these are composition problems. And composition, as we shall see, is precisely where the absence of theory cannot be papered over by stability alone.

Wittgenstein's later philosophy illuminates why maintenance matters. Meaning, he argued, is not a mental state but a practice: "the meaning of a word is its use in the language." A boundary holds not because of how it was defined but because of how it is maintained: the ongoing practice of respecting it, of correcting violations, of adjusting it when the world shifts. Rule-following is not private; it requires a community that can recognise departures and bring practitioners back into alignment. AI-generated code has definitions but lacks the practice. There is no community around it checking whether the boundary is being respected, no ongoing correction of drift. The meaning of its abstractions is therefore fragile: stable under conditions that match the training distribution, vulnerable to anything that requires the kind of interpretive maintenance that Wittgenstein saw as constitutive of meaning itself.

Protocol as Stabilised Vocabulary

When two subsystems interact, they require a protocol: a stable vocabulary of interaction that specifies what can be said, what responses are expected, what the semantics of exchange are.

The protocol is the Markov blanket made explicit. It defines what internal states each subsystem can ignore about the other. It specifies the interface through which they couple. A well-designed protocol allows each subsystem to evolve internally without breaking the interaction, because the protocol specifies what must remain stable and what is free to vary.

David Deutsch offers a complementary lens. A good theory, he argues, is one that is hard to vary: its elements are tightly coupled such that changing any part breaks its explanatory power. Good protocols share this property. A well-designed protocol constrains precisely what must be constrained (message formats, response semantics, error signaling) while leaving internal implementation free to vary. It is hard to vary at the interface, flexible at the implementation. This is what makes it robust: you cannot accidentally modify the protocol without noticing that you have broken something. Poor protocols, by contrast, are easy to vary. Their constraints are loose, their semantics ambiguous, their coupling to implementation unclear. They drift without anyone noticing until composition fails.
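To make the hard-to-vary property concrete, here is a minimal sketch in Python (the field names and rules are hypothetical): the validator pins down exactly the constrained parts of the exchange, message shape, response semantics, error signalling, while saying nothing about how responses are produced.

```python
from dataclasses import dataclass

# A hypothetical wire format. The protocol constrains exactly these fields
# and their semantics; the implementation behind it is free to vary.
@dataclass(frozen=True)
class Response:
    status: str      # must be "ok" or "error": response semantics
    payload: dict    # implementation-defined content, unconstrained
    error_code: int  # must be 0 when status == "ok": error signalling

def validate(msg: Response) -> None:
    """Reject any message that varies the protocol's constrained parts."""
    if msg.status not in ("ok", "error"):
        raise ValueError(f"unknown status: {msg.status}")
    if msg.status == "ok" and msg.error_code != 0:
        raise ValueError("ok responses must carry error_code 0")

# A conforming message passes; a drifted one fails loudly at the boundary.
validate(Response(status="ok", payload={"id": 1}, error_code=0))
```

The design choice mirrors Deutsch's criterion: you cannot loosen the status vocabulary or the error-code rule without the validator noticing, but nothing inside `payload` is pinned down.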

This is where Friston's framework becomes practical. Autonomous systems maintain themselves by modelling their boundary conditions. When two such systems interact, they must develop shared models of the boundary between them. The protocol is that shared model, externalised and stabilised.

Michael Levin's work in developmental biology makes this concrete. Cells are autonomous systems: they maintain internal states, pursue morphogenetic goals, and model their boundary conditions through bioelectric signaling. When cells must coordinate to generate a limb or form an organ, they develop shared models via gap junctions: physical channels that create a common voltage space between neighbours. A cell at a wound site does not receive instructions from a central controller; it reads the bioelectric gradient, infers what anatomical structure belongs at its location, and adjusts its behaviour to converge with its neighbours on a shared morphogenetic outcome. The gap junction is the protocol made physical: a stabilised channel through which autonomous systems negotiate a shared model of the boundary between them. What Friston describes formally, Levin observes empirically. And what both reveal is that protocol is not imposed from outside but emerges from the interaction of systems that must coordinate while maintaining their own integrity.

Developing a protocol is itself a form of genesis, and it implies ongoing evolution. You traverse the space of possible interactions. You encounter failures: messages that are ambiguous, responses that are misinterpreted, edge cases where the vocabulary breaks down. You refine the protocol through productive negation, learning from how it fails. The result is a stable vocabulary that both parties can rely on for semantic constancy and behavioural predictability: a boundary that holds because it was forged through encounter with the pressures it must withstand.

Now, what AI currently does not do is develop the protocol between subsystems through autonomous negotiation, and here is why: protocol development requires both parties to iteratively encounter and interpret each other's failures. When humans develop protocols with each other, each brings interpretive capacity: the ability to recognise when the other has misunderstood, to detect equivocation, to reframe what went wrong in terms the other can act on. When two AI systems attempt to develop a protocol, current architectures lack the reflective feedback about why the other's interpretation diverged. They can encounter failures; they do not model the other party's model in ways that allow genuine vocabulary repair.

Consider the contrast: when two engineering teams must agree on what "order complete" means, one team (payments) considers an order complete when the charge clears; another (fulfillment) when it ships; a third (support) when the customer confirms receipt. No schema or format specification resolves this. The repair requires each team to articulate why their definition exists, what business process it serves, and then negotiate a shared abstraction that accommodates the different purposes. Current AI systems can retry failed interactions, escalate to human oversight, or generate alternative formulations. What they do not do is engage in the metalinguistic turn: asking "what was my model of your model, and where did it diverge from your actual model?" This capacity for mutual model-revision is what allows protocols to be forged rather than merely imposed.
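A sketch of what such a negotiated abstraction might look like (the stage names and team mapping are illustrative, not drawn from any real system): rather than one ambiguous boolean, the shared vocabulary names each team's milestone explicitly, and each team's old notion of "complete" becomes a threshold on the shared lifecycle.

```python
from enum import Enum

# The negotiated shared vocabulary: each milestone that some team once
# called "complete" becomes an explicit, ordered stage.
class OrderStage(Enum):
    CHARGED = 1    # payments: the charge cleared
    SHIPPED = 2    # fulfillment: the order left the warehouse
    CONFIRMED = 3  # support: the customer confirmed receipt

def is_complete_for(team: str, stage: OrderStage) -> bool:
    """Map each team's old definition of 'complete' onto the shared stages."""
    threshold = {
        "payments": OrderStage.CHARGED,
        "fulfillment": OrderStage.SHIPPED,
        "support": OrderStage.CONFIRMED,
    }[team]
    return stage.value >= threshold.value
```

The abstraction accommodates all three purposes without forcing any team to abandon its own meaning; what was an equivocation becomes an explicit parameter.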

Humans can do something deeper: they can ask "what assumption was I making that you were not?" This metalinguistic turn is what allows protocol to be forged. AI systems can be programmed to try specific repairs (retry, escalate, re-specify), but autonomous protocol development at the level of "our vocabularies misaligned because our models of the world diverged" requires traversal that includes interpretation of the other party's model. This is not impossible in principle: future architectures might develop such capacities. But it is not what current AI systems do.

What this means in practice: you can use AI to help draft protocols, to generate candidate interface specifications, to explore the space of possibilities. What you cannot do, with current systems, is leave two AI agents to negotiate the protocol between them and expect to find, months later, a stable shared vocabulary that humans can rely on. You must inject human interpretation at the boundary.

Could future AI systems develop this capacity? Perhaps. The constraint is not logical impossibility. What would be required is not merely generating candidate protocols but modelling the other agent's model, detecting where models diverge, and negotiating vocabulary at the level of assumptions rather than symptoms. This is a form of mutual theory-of-mind applied to technical semantics. Current large language models do not do this; they generate plausible outputs conditioned on inputs, but do not maintain persistent models of interlocutors that update through interaction. Whether future architectures might develop such capacities is an open empirical question. The argument here is not that AI can never negotiate protocols, but that current AI does not, and that until it does, protocol development remains humanity's work.

The Geometry of Composition

The composition problem is not simply "how do subsystems interact." It is more complex, because concerns do not nest cleanly.

Some concerns are horizontal: they exist between subsystems at the same level of abstraction. The protocol between your authentication service and your user database is horizontal. Both operate at the same resolution. The genesis required is understanding how they couple, what assumptions each makes about the other, where the interface can fail.

Some concerns are vertical: they exist across resolution levels, connecting high abstraction to low. When Excel runs Python scripts, you are operating at two abstraction levels simultaneously within what appears to be a single system. The Markov blanket between spreadsheet logic and script logic is thin and permeable. Changes in your Python code can break your spreadsheet in ways that neither level, considered alone, would predict. The genesis required is understanding how the levels interpenetrate, where the higher level's operations affect the lower level in unexpected ways, what invariants must hold across resolution boundaries.

Some concerns are orthogonal: they cut through everything, operating at a different dimension entirely.

Security is orthogonal. It is not a subsystem. It is a property that must hold across the entire composition. One subsystem's "internal state" can leak through another's blanket. Security is adversarial pressure on blanket integrity itself. The genesis required is understanding how attackers think, where information flows unexpectedly, how composition creates surfaces that no component alone would expose. This kind of genesis is inherently iterative and adversarial: you learn security by encountering breaches, by recognising what you failed to anticipate, by hardening against patterns you have now seen.

Observability is orthogonal. Logging, monitoring, tracing: these are not features of any subsystem but properties of the whole. They require instrumentation that crosses every boundary, that reveals what the blankets are designed to hide. The genesis required is understanding what questions you will need to ask when things go wrong, and this understanding is inherently retrospective. In distributed systems, the critical logging requirement is almost always discovered the first time it was missing. An outage occurs; the team scrambles to diagnose; they discover that the correlation ID they needed was never propagated, that the timestamp resolution was too coarse, that the queue depth was logged but not the processing latency. Each incident deposits a new requirement. The observability layer is not designed once but accreted through successive encounters with its own inadequacy. This is genesis of a particular kind: learning what you need by not having it precisely when you need it. No specification anticipates this. It emerges from the friction of operating a system under pressure. This retrospective character of observability is not unique to AI-generated systems; human-designed systems face it equally. The difference is that humans who have lived through such incidents carry the scar tissue forward, shaping their next system's observability from the start. AI-generated systems, by default, lack this accumulated foresight.
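The correlation-ID lesson can be sketched in a few lines of Python (the logger name and message text are illustrative): the ID is set once at the request boundary and then stamped onto every log line, which is precisely the instrumentation a team discovers it needed only after the first outage without it.

```python
import logging
import uuid
from contextvars import ContextVar

# Set once where a request enters the system; read by every log line after.
correlation_id: ContextVar[str] = ContextVar("correlation_id", default="-")

class CorrelationFilter(logging.Filter):
    """Stamp the current correlation ID onto each log record."""
    def filter(self, record):
        record.correlation_id = correlation_id.get()
        return True

logger = logging.getLogger("orders")
handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter("%(correlation_id)s %(message)s"))
handler.addFilter(CorrelationFilter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

def handle_request():
    correlation_id.set(uuid.uuid4().hex)  # set once at the boundary
    logger.info("charge cleared")          # every line now carries the ID,
    logger.info("shipment scheduled")      # so lines can be joined post hoc
```

The pattern is orthogonal in exactly the essay's sense: the filter crosses every subsystem's logging, and it earns its keep only when something goes wrong.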

Error handling is orthogonal. Failures do not respect module boundaries. A timeout in one subsystem propagates effects through others. The genesis required is understanding failure cascades, how partial failures create inconsistent states, where recovery is possible and where it is not.
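As a minimal illustration (the subsystems are hypothetical and the state is simplified to a dictionary): a timeout in the payment step strands a reservation that the inventory step already committed, and recovery requires an explicit compensating action that neither subsystem, considered alone, would know to perform.

```python
class Timeout(Exception):
    pass

def reserve_inventory(state):
    state["reserved"] = True  # subsystem A commits its part

def charge_card(state, fail=False):
    if fail:
        raise Timeout("payment gateway timed out")  # subsystem B fails
    state["charged"] = True

def place_order(state, payment_fails=False):
    """Composition-level logic: restore invariants that cross the boundary."""
    reserve_inventory(state)
    try:
        charge_card(state, fail=payment_fails)
    except Timeout:
        # Compensating action: without it, the timeout leaves an order that
        # is reserved but never charged, an inconsistent composite state.
        state["reserved"] = False
        raise
```

The compensating step lives in `place_order`, the composition, because only the composition knows that "reserved but unpaid" is an invalid state of the whole.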

These orthogonal concerns reveal something important: the composition is not just the sum of its parts. It has its own structure, its own failure modes, its own theory. And that theory lives in no subsystem. It emerges from understanding how the pieces relate: horizontally, vertically, and orthogonally.

Functors and the Meta-Level

Category theory offers a vocabulary for this level of thinking, and it is not accidental that functional programming has found it useful. (For the mathematical foundations, see Saunders Mac Lane's Categories for the Working Mathematician or, for a programming-oriented treatment, Bartosz Milewski's Category Theory for Programmers.)

A functor, in the mathematical sense, is not an object within a category. It is a mapping between categories that preserves structure. It operates at the meta-level, describing not things but relationships between kinds of things.

When you understand that a particular pattern (say, mapping a function over a collection) is an instance of a functor, you have compressed a vast space of specific implementations into a single structural insight. You can now recognise that same pattern in contexts that look superficially different: mapping over lists, over trees, over optional values, over asynchronous results. The functor is not in any of these implementations. It is the relationship between them.
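The point can be sketched in Python, though the language has no native functor abstraction (the use of `singledispatch`, and of nested tuples for trees, is an illustrative choice): one mapping pattern, several carriers.

```python
from functools import singledispatch
from typing import Optional

# The functor pattern: the same structure-preserving map, dispatched over
# different carriers. The pattern lives in the relationship, not in any case.
@singledispatch
def fmap(container, f):
    raise TypeError(f"no functor instance for {type(container)}")

@fmap.register
def _(container: list, f):
    return [f(x) for x in container]  # map over a list

@fmap.register
def _(container: tuple, f):
    # A binary tree encoded as (left, value, right), with None for no child.
    left, value, right = container
    return (fmap(left, f) if left else None,
            f(value),
            fmap(right, f) if right else None)

def fmap_optional(x: Optional[int], f):
    return None if x is None else f(x)  # map over an optional value
```

Each instance looks different at the implementation level; the insight is that all three are the same shape, which is exactly what the functor names.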

This is genesis at the level of composition. You are not learning how any particular subsystem works. You are learning the shapes that recur across subsystems, the patterns of relationship that hold regardless of what the related things are.

Monads are similar. A monad is a pattern for composing operations that have context: operations that might fail, that might have side effects, that might be asynchronous. When you understand the monadic pattern, you can sequence such operations in any context where the pattern applies. You do not need to learn a new approach for error handling and a different one for asynchronous operations and a third for state management. You learn one pattern and instantiate it across domains.
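A minimal sketch of the monadic pattern for failure-carrying operations (the `Result` class and its methods are illustrative, not a standard library API): each step either passes its value forward or short-circuits the rest of the chain, so sequencing needs no nested error checks.

```python
class Result:
    """An operation's outcome: a value, or an error that halts the chain."""
    def __init__(self, value=None, error=None):
        self.value, self.error = value, error

    def bind(self, f):
        # Apply f to the value unless an earlier step already failed.
        return self if self.error else f(self.value)

def parse(s):
    try:
        return Result(value=int(s))
    except ValueError:
        return Result(error=f"not a number: {s!r}")

def reciprocal(n):
    return Result(error="division by zero") if n == 0 else Result(value=1 / n)

ok = Result(value="4").bind(parse).bind(reciprocal)   # carries 0.25
bad = Result(value="0").bind(parse).bind(reciprocal)  # carries the error
```

The same `bind` discipline instantiates for asynchronous results or stateful computations by changing what the context is, not how sequencing works; that invariance is the pattern.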

But learning these patterns requires traversal at the meta-level. You must encounter the specific cases, feel the friction of composing operations that have context, notice the similarities across different contexts, and compress those similarities into the abstract pattern. Receiving the pattern as a definition is not the same as carrying the genesis that would let you recognise new instances, that would let you know when the pattern applies and when it breaks.

Current AI can generate code that uses monadic patterns, because such code exists in its training distribution. What AI currently lacks is the demonstrated capacity for meta-level genesis that comes from having traversed multiple concrete composition problems, refined approach through failure, and then abstracted upward. That capacity is what lets you compose novel subsystems in novel ways. It lives above any particular implementation, in the space of relationships between implementations. Whether AI systems could develop such capacity through different training regimes or architectures is an open question; what matters is that current systems do not exhibit it, and this has practical consequences for composition.

The Newtonian-Quantum Boundary

An analogy from physics clarifies the stakes.

Newtonian mechanics works magnificently within its domain. Objects have definite positions and momenta. Forces cause accelerations. Trajectories are deterministic. You can build bridges and launch rockets with Newtonian theory.

Quantum mechanics also works within its domain. Particles have probability amplitudes. Observables do not commute. Measurement affects the measured. You can build transistors and lasers with quantum theory.

Both theories are internally consistent. Both are empirically successful. But they do not automatically compose. The theory that unifies them, explaining how quantum behaviour gives rise to classical behaviour at scale and reconciling their different ontologies, remains incomplete. We have effective theories for specific regimes. We do not have a single theory that covers the boundary.

Composition across abstraction levels has the same structure. The CRM developer has theory that works at the database level. The CRM user has theory that works at the workflow level. Both theories are effective within their domains. But the theory of how they compose (how changes at one level affect the other, where the abstractions leak, what invariants must hold across the boundary) is not contained in either.

The analogy is imperfect: in physics, there is a single underlying reality at both scales, whereas in software, the database developer and workflow user may inhabit genuinely different conceptual domains. But the structural point holds: internal consistency at each level does not guarantee coherent composition across levels. Each level's theory was developed for its own concerns; the boundary between them was not the focus of either.

When the boundary is well-maintained by active human attention, you can mostly ignore this. The CRM developer's theory handles database concerns. Your theory handles workflow concerns. The abstraction boundary keeps them separate.

But when the boundary is poorly maintained, or when novel situations arise that the boundary was not designed for, you need theory of the boundary itself. You need genesis at the level of composition, understanding of how internally consistent theories can fail to cohere when forced to interact.

AI-generated subsystems are like having Newtonian and quantum mechanics without any theory of their relationship. Each subsystem may be internally consistent. But you have no theory of how they compose, because no one traversed that space, no one encountered the failures at the boundary, no one developed the intuitions that would let you predict where composition breaks down.

Where Genesis Must Live

Let me be precise about the claim.

AI can generate stable subsystems, given sufficient specification and feedback. This is legitimate. The human does genesis at the specification level, and the resulting artifact can be stable and useful, so long as that stability is not required to persist through domain shifts, API changes, or novel interaction patterns the original specification did not anticipate.

But composition requires its own genesis: theory of how stable things relate. That theory has several layers:

At the horizontal level, you need theory of how subsystems at the same resolution interact. What protocols connect them. What assumptions each makes about the other. Where those assumptions can fail. This is the work of negotiating shared vocabularies, and it is inherently dialogical. It requires beings capable of recognising when they have been misunderstood and reframing their own understanding in response.

At the vertical level, you need theory of how abstraction levels interpenetrate. Where the higher level's operations affect the lower level in unexpected ways. Where changes in the lower level leak through the abstraction. What invariants must hold across resolution boundaries. This requires experience with systems where boundaries have failed, and the kind of pattern-recognition that comes from having felt such failures.

At the orthogonal level, you need theory of concerns that cut through everything. Security, observability, error handling. How these concerns constrain the composition in ways that no individual subsystem specifies. This is learned by encountering attacks you did not predict, outages you did not expect, failure modes you did not think to plan for.

At the meta-level, you need theory of the patterns that recur across compositions. The functorial relationships. The monadic structures. The shapes of composition that let you recognise, in a novel situation, what kind of problem you are facing and what approaches might apply. This is learned by traversing enough specific compositions to notice the underlying patterns.

This multi-layered theory cannot be generated from a single prompt or specification. It cannot emerge without traversal at each layer: encountering the failures specific to horizontal interaction, to vertical leakage, to orthogonal pressure, to meta-level confusion. Each layer has its own genesis, its own productive negation, its own path from inadequate formulation to more adequate. (The spec-feedback loop, where applied, constitutes such traversal at the specification level; the point is that composition requires traversal at these additional levels, above and beyond specification.)

Whether machines might one day do this kind of traversal is an open question. Current systems generate structure but do not interrogate the assumptions embedded in it. Composition requires precisely this: asking not just "does this work?" but "what am I assuming, and where will those assumptions collide with assumptions I cannot see?"

Another reasonable objection arises at this point: could reinforcement learning, trained on enterprise log files, develop this capacity? The logs contain traces of failures: timeouts, exceptions, cascading errors. If the reward signal captures system health, might an RL agent learn the composition patterns that humans learn through experience?

The answer is subtle. Logs record that something failed, and often how it manifested. What they rarely capture is why the failure was surprising: what assumption the original designers held that turned out to be wrong. The interpretive layer is missing. A human reviewing an incident asks: "We assumed the cache would never exceed 1GB; what made us think that, and why was it wrong?" The log shows cache overflow. It does not show the reasoning that made overflow seem impossible.

RL optimising against such logs might learn to avoid the specific failure modes present in training. It would not learn the meta-capacity to recognise when its own assumptions are vulnerable to novel pressures. That capacity, asking "what am I assuming that I should not?", requires not just exposure to failures but the ability to model one's own models. Current deep RL techniques do not do this. Whether future architectures might is genuinely uncertain, but it is not a capability that scales automatically with more data or compute. The claim here is empirical and provisional: current systems lack this capacity; future systems might develop it; the argument concerns what composition requires, not what AI can never achieve.
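The asymmetry can be illustrated with a deliberately toy sketch (the log records, event names, and reward function are all hypothetical): everything a health-based reward optimises against is present in the log, while the violated assumption appears nowhere in the data.

```python
# Hypothetical incident log: the only thing a log-trained signal can see.
incident_log = [
    {"ts": "2024-03-01T02:14Z", "event": "cache_evictions_spike"},
    {"ts": "2024-03-01T02:15Z", "event": "cache_overflow", "size_gb": 1.4},
    {"ts": "2024-03-01T02:16Z", "event": "latency_p99_degraded", "ms": 2200},
]

def reward(log: list) -> float:
    """A system-health reward: penalise known-bad event types.
    It can only optimise against what is present in the records."""
    bad = {"cache_overflow", "latency_p99_degraded"}
    return -sum(1.0 for rec in log if rec["event"] in bad)

# What no record contains: the designers' assumption that made the
# overflow surprising (e.g. "the working set fits in 1GB because
# sessions expire hourly"). The reward can teach avoidance of these
# event types; it cannot surface the assumption, because the
# assumption was never logged.
```

The sketch is not an argument about RL in general; it localises the essay's claim: the gradient points away from recorded failures, not toward the unrecorded model that made them surprising.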

The Locus of Essential Human Genesis

Here, finally, is the reframe.

As AI capability grows, the locus of essential human genesis shifts. It does not disappear. It relocates.

When AI could not write code at all, human genesis lived at the implementation level. You had to traverse the space of possible implementations, encounter their failures, develop intuitions about what worked.

As AI becomes capable of generating stable subsystems from specifications, human genesis shifts upward, to the specification level itself, and to the composition level above that.

At the specification level, genesis means developing the vocabulary to specify meaningfully. This still requires traversal: you must have specified enough similar systems, encountered enough specification failures, felt enough of the gap between what you asked for and what you needed.

At the composition level, genesis means developing theory of how stable things relate. This requires traversal at the level of interaction: encountering protocol failures, abstraction leakages, orthogonal pressure, and meta-level confusion. And crucially, it requires the capacity to interpret these failures not as mere data points but as signals about what your models of the system were missing.

The trajectory is not toward obsolescence. It is toward higher-order genesis. The human contribution shifts from building subsystems to designing the protocols between them, from implementing features to understanding the geometry of composition, from writing code to discerning which decisions are load-bearing.

But (and this is crucial) higher-order genesis still requires traversal. You cannot jump to composition theory without having felt composition failures. You cannot design protocols without having encountered the ways protocols break. You cannot think in functors without having traversed enough specific cases to notice the pattern.

The encoding gap does not disappear as we move up the stack. It persists at every level. The cognitive work required for genuine understanding is still required. The difference is what the work is about: not implementation details, but specification vocabulary, composition geometry, protocol design, meta-level pattern recognition.

The Question for the World of Agents

One more movement of thought, pointing toward what comes next.

If AI can generate stable subsystems, and humans provide genesis at the specification and composition levels, what happens when the world fills with agents, both AI and human, interacting to get things done?

The protocols between them become the primary site of genesis. Not "how do I build" but "how do we interact." The stable vocabulary, the shared constraints, the negotiated boundaries.

This is political and institutional as much as technical. Who decides what the protocol is? Whose genesis shapes the interaction? When agents operate with different theories, different assumptions, different implicit protocols, the composition fails in ways that no single agent can diagnose.

The question of where genesis lives becomes a question about coordination. And coordination, at scale, is not a technical problem. It is a problem of shared understanding, of converging on vocabularies that all parties can rely on, of maintaining boundaries that hold under pressures no party fully anticipated.

This is the composition problem writ large. And it will require its own essay to explore.

For now, the conclusion is this: AI can generate structure, given specification and feedback. What AI currently does not generate is the theory of composition: how structures relate, where boundaries hold, what protocols enable coordination. That theory requires traversal at the level of interaction, genesis that lives between the things composed rather than within them.

The locus of essential human genesis is not implementation. It is increasingly composition. And composition, at every level, still requires the cognitive work of traversal, the productive failure of encounter, the slow development of theory adequate to the pressures the composition must withstand.


References

Deutsch, David. The Beginning of Infinity: Explanations That Transform the World. London: Allen Lane, 2011.

Friston, Karl. "The Free-Energy Principle: A Unified Brain Theory?" Nature Reviews Neuroscience 11, no. 2 (2010): 127–138.

Friston, Karl. "A Free Energy Principle for Biological Systems." Entropy 14, no. 11 (2012): 2100–2121.

Levin, Michael. "The Computational Boundary of a 'Self': Developmental Bioelectricity Drives Multicellularity and Scale-Free Cognition." Frontiers in Psychology 10 (2019): 2688.

Levin, Michael. "Technological Approach to Mind Everywhere: An Experimentally-Grounded Framework for Understanding Diverse Bodies and Minds." Frontiers in Systems Neuroscience 16 (2022): 768201.

Levin, Michael, and Daniel C. Dennett. "Cognition All the Way Down." Aeon, 13 October 2020. https://aeon.co/essays/how-to-understand-cells-tissues-and-organisms-as-agents-with-agendas

Mac Lane, Saunders. Categories for the Working Mathematician. 2nd ed. New York: Springer, 1998.

Milewski, Bartosz. Category Theory for Programmers. Self-published, 2019. https://github.com/hmemcpy/milewski-ctfp-pdf

Wittgenstein, Ludwig. Philosophical Investigations. Translated by G. E. M. Anscombe. Oxford: Basil Blackwell, 1953.