Core Thesis
The Silicon Valley ethos demands seamlessness: remove friction, automate workflows, let intent flow directly to impact. AI agents represent the apotheosis of this vision. Capture enough context, retrieve similar cases, interpolate solutions, execute. The human provides intent; the agent provides execution; friction disappears.
This essay argues that the seamlessness thesis rests on a fundamental misunderstanding of where value comes from. Drawing on Stuart Kauffman’s work on non-ergodic systems, Humberto Maturana’s autopoiesis, and recent research on human-AI feedback loops, I will argue that:
1. The phase space of meaningful actions is not pre-stated. You cannot retrieve solutions to problems that have not yet been articulated. The novel emerges through exploration of the adjacent possible, not search of existing cases.
2. Understanding is autopoietic. Expertise cannot be injected from outside; it must be self-produced through coupling with an environment. Decision traces capture outputs, not the cognitive organisation that generated them.
3. Friction is not always waste. Friction is often where human judgment enters the loop. Remove all friction, and you remove the mechanism for correction. The result is not efficiency but drift.
4. The alternative is homeostatic design. Rather than optimising for seamlessness, design for stability under perturbation. Humans provide reference signals (what counts as healthy); agents explore within those bounds.
The question is not “How do we remove friction?” but “Where should friction live?” Get this wrong, and your agents will compound errors faster than humans ever could.
I. The Seamlessness Seduction
Every few years, a technology arrives promising to finally close the gap between intention and execution. The graphical user interface. The smartphone. Cloud computing. Each genuinely reduced friction for specific tasks. Each was subsequently oversold as a universal solvent for the messiness of work.
AI agents are the current candidate. The pitch is familiar: describe what you want in natural language, and the agent handles the rest. No more translating intent into formal specifications. No more manual execution of tedious steps. No more friction between what you imagine and what you get.
Consider the marketing language. “Just describe what you want.” “Let AI handle the details.” “Focus on the what, not the how.” The implicit model is a frictionless pipe: intent enters one end, results emerge from the other. Efficiency is measured by how much friction you remove.
A venture capitalist recently described the ideal AI workflow as “thought to deployment in minutes.” A prominent AI company promises tools that “turn ideas into applications” (in my naivety, I too attempted to automate this roughly 15 years ago with a tool called Sketch-an-App). The framing assumes that ideas are the hard part and execution is merely overhead to be automated away.
This framing is precisely backwards.
In July 2025, the nonprofit Model Evaluation & Threat Research (METR) published a randomised controlled trial (RCT) with 16 experienced open-source developers completing 246 real-world tasks on repositories they had contributed to for years (METR, 2025).¹ Before starting, developers predicted AI tools would speed them up by 24%. After completing tasks, they believed they had been 20% faster. The actual measured result: they were 19% slower when using AI than without it.
The developers who performed best were not those who accepted AI suggestions most readily. Screen recordings showed developers spending significant time reviewing and correcting AI-generated code. Those who treated the AI as a collaborator to be questioned, rather than an oracle to be trusted, maintained quality. The friction of disagreement, of pausing to evaluate, of saying “no, that’s not quite right”, was where the quality came from.
Seamlessness is not the same as efficiency. A system that moves fast in the wrong direction is not efficient. It is efficiently wrong.
The question this essay explores is not whether AI agents are useful. They clearly are. The question is what happens when we design them around the wrong theory of value. What happens when we treat friction as pure waste to be eliminated rather than as a site where judgment, correction, and understanding occur?
II. The Phase Space Problem
Stuart Kauffman, the theoretical biologist, has spent decades studying the mathematics of evolution and innovation. His central argument challenges the foundations of optimisation theory itself: you cannot optimise a trajectory through a space that does not yet exist (Kauffman, 1993; 2000; 2019).
In physics, we can define a “phase space” containing all possible states of a system. A pendulum can be described by its position and velocity; the phase space is two-dimensional; we can compute optimal trajectories. This works because the dimensions of the space are fixed in advance. We know what variables matter before we start.
Kauffman argues that biological and economic evolution do not work this way. The phase space is not pre-stated. It expands dynamically into what he calls the “adjacent possible”, the set of configurations that are one step away from what currently exists.
Consider the emergence of the smartphone. Before 2007, there was no “ride-sharing app” category, no “mobile payment” industry, no “social media influencer” occupation. These were not possibilities waiting to be discovered in some pre-existing space. They became possible only after the smartphone existed. The smartphone expanded the adjacent possible, creating a new region of phase space that could then be explored. Technologically, Uber is possible only because of smartphones, mobile payments, and cloud infrastructure (similar adjacencies and complementarities have to arise, or be brought into play, in the social, cultural, legal, and governance spheres).
This has profound implications for AI agents. The context-engineering-is-the-key thesis assumes a fixed landscape of solutions: capture past decisions, write up skills files, shape your .MD files, embed them in vector space, retrieve similar cases when new situations arise. But if the phase space of meaningful actions expands dynamically, retrieval from the past cannot access the novel possibilities that emerge in the present.
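To make the retrieval step concrete, here is a minimal sketch of similarity-based lookup over a store of past decisions. The embedding function is a toy stand-in for a real model, but the structural point survives the simplification: whatever the query, the answer can only ever be a re-ranking of what is already in the store.

```python
# Minimal sketch of similarity-based retrieval over a store of past cases.
# `embed` is a toy deterministic embedding (a placeholder for a real model)
# used purely so the example runs end to end.
import hashlib

import numpy as np


def embed(text: str, dim: int = 64) -> np.ndarray:
    seed = int(hashlib.sha256(text.encode()).hexdigest(), 16) % (2**32)
    rng = np.random.default_rng(seed)
    v = rng.normal(size=dim)
    return v / np.linalg.norm(v)


def retrieve(query: str, store: list[str], k: int = 3) -> list[str]:
    """Return the k stored cases most similar to the query.

    Whatever the query, the result is always drawn from `store`:
    retrieval can rank the past, it cannot extend it.
    """
    q = embed(query)
    ranked = sorted(store, key=lambda doc: float(q @ embed(doc)), reverse=True)
    return ranked[:k]


past_decisions = [
    "rate-limit the public API per tenant",
    "add retries with exponential backoff to the billing job",
    "cache the dashboard query for five minutes",
]
print(retrieve("design a pricing model for an entirely new product line", past_decisions))
```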
Kauffman puts it formally: because uses are indefinite and cannot be ordered or listed, set theory breaks down. You cannot optimise a trajectory if the state space is undefined (Kauffman & Roli, 2023).
Stanley and Lehman, many years earlier, provided computational confirmation of this theoretical insight. In Why Greatness Cannot Be Planned, they demonstrate that objective-driven search systematically fails for complex, open-ended problems (Stanley and Lehman, 2015). The culprit is what they call “deceptive objectives”: ambitious goals require stepping stones that appear, from the starting point, to lead away from the destination. An algorithm optimising for similarity to the goal will never find them. Paradoxically, their novelty search algorithms, which abandon objectives entirely and simply collect interesting discoveries, outperform goal-directed approaches precisely because they explore the adjacent possible without being trapped by misleading gradients.
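For readers who want the mechanics, here is a toy sketch of the two selection rules. Novelty is scored, following Lehman and Stanley, as the mean distance to the nearest behaviours already encountered; the two-dimensional behaviour space, mutation operator, and parameters are my own illustrative inventions, not their benchmarks.

```python
# Toy comparison of objective-driven search and novelty search in a 2-D
# behaviour space. Novelty is the mean distance to the k nearest behaviours
# already in the archive; the landscape and parameters are illustrative.
import numpy as np

rng = np.random.default_rng(0)
GOAL = np.array([10.0, 0.0])   # the distant objective
START = np.array([0.0, 0.0])


def mutate(x: np.ndarray) -> np.ndarray:
    return x + rng.normal(scale=0.5, size=2)


def novelty(x: np.ndarray, archive: list, k: int = 5) -> float:
    dists = sorted(float(np.linalg.norm(x - a)) for a in archive)
    return float(np.mean(dists[:k]))


def search(steps: int = 200, use_novelty: bool = True) -> np.ndarray:
    archive, current = [START], START
    for _ in range(steps):
        candidates = [mutate(current) for _ in range(10)]
        if use_novelty:
            # keep the candidate least like anything seen so far
            current = max(candidates, key=lambda c: novelty(c, archive))
        else:
            # keep the candidate that looks closest to the goal
            current = min(candidates, key=lambda c: float(np.linalg.norm(c - GOAL)))
        archive.append(current)
    return np.array(archive)


# The greedy rule climbs the similarity-to-goal gradient; the novelty rule
# keeps expanding the region of behaviour space it has visited. On deceptive
# landscapes (a wall between START and GOAL, say) only the latter keeps moving.
print("greedy spread:", np.ptp(search(use_novelty=False), axis=0))
print("novelty spread:", np.ptp(search(use_novelty=True), axis=0))
```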
The implications for retrieval-based AI agents are direct. Retrieval optimises for similarity to past solutions. But if the path to genuinely novel solutions requires stepping stones that look nothing like the destination, retrieval will systematically miss them. The context graph becomes a map of where you have been, not where you could go.
The limitation shows clearly in enterprise software development. When organisations need to implement novel requirements, those that expand the problem space beyond prior patterns, AI code generation is just another tool with yet-to-be-discovered failure modes and limitations to manage. A 2025 study of GitHub Copilot at ZoomInfo, a company with over 400 developers, found consistent acceptance rates of around 33% for AI suggestions across programming languages (Perlman et al., 2025).² Developers reported the primary limitations as “lack of domain-specific logic” and “lack of consistency in code quality.” The tool excelled at variations on existing patterns but required additional scrutiny when problems moved beyond the training distribution.
Retrieval operates within a fixed space. Innovation expands the space. You cannot retrieve your way to novelty because the novel, by definition, does not yet exist in your database.
This is not a temporary limitation to be solved by larger context windows or better embeddings. It is a structural feature of how possibility spaces work. The adjacent possible is adjacent to the present, not the past. Agents that search the past can find variations on what has been done. They cannot find what has never been done. This holds even for reasoning models and their plethora of tactics: they construct their reasoning chains from existing knowledge and tactics alone. They are not receiving thoughts from beyond, as the great mathematician Ramanujan claimed to.
III. The Feedback Loop Trap
If retrieval cannot access the novel, what happens when organisations deploy retrieval-based agents at scale? Agents, moreover, that rely on approximate retrieval and approximate reasoning engines as their substrate of intelligence. The answer, now beginning to show up in the research, is drift.
Glickman and Sharot study human-AI feedback loops and identify a troubling pattern (Glickman and Sharot, 2025).³ In a series of experiments with 1,401 participants, they found that when AI systems learn from human behaviour, and humans in turn adapt to AI outputs, the loop can amplify biases rather than correct them. Small initial errors compound. The system drifts from its intended function, sometimes dramatically, without any single point of failure to identify.
The researchers found that bias amplification was significantly greater in human-AI interactions than in human-human interactions, due to both the tendency of AI systems to amplify biases and the way humans perceive AI systems as more accurate than other humans. Participants were often unaware of the AI’s influence, rendering them more susceptible to it.
This creates what they call a “snowball effect where small errors in judgement escalate into much larger ones.”
Consider a developer with limited experience writing technical specifications. They use an AI coding agent, providing vague instructions. The agent, optimised to be helpful, generates code that roughly matches the instructions but misses edge cases the developer did not think to specify. The developer, lacking the expertise to evaluate the output carefully, accepts it. The code ships. Bugs emerge later, but by then the connection to the original vague specification is lost.
Over time, the developer learns that vague specifications produce acceptable-seeming outputs. They become less rigorous, not more. The agent, if it learns from this developer’s patterns, learns to accept vague inputs as normal. The loop tightens. Quality degrades. Neither party can identify the problem because neither has an external reference point. Though not officially confirmed, reinforcement learning (RL) approaches appear to be part of the training strategy used internally by the coding agents from the hyperscalers; we will know soon enough.
A Fastly survey of 791 developers in July 2025 found a striking pattern in AI code adoption (Fastly, 2025).⁴ About a third of senior developers (10+ years of experience) reported that over half their shipped code was AI-generated, nearly 2.5 times the rate reported by junior developers (13%). Senior developers were also more likely to report investing time in fixing AI-generated code: 30% of seniors reported editing AI output enough to offset most of the time savings, compared to 17% of juniors.
The same tools, the same agents, opposite usage patterns. The difference was the capacity to interrupt the loop, to provide the external judgment that says “this looks helpful but is actually wrong”. Senior developers treated AI suggestions as hypotheses to be tested. Junior developers accepted them more readily. The tool did not create this difference, but it amplified it.
The flywheel spins in both directions. Without external judgment to interrupt the loop, AI agents compound whatever tendencies already exist, helpful or harmful, accurate or drifting.
Traditional Rogers-style diffusion theory assumes that technology spreads as users observe benefits and adopt accordingly (Rogers, 1962). But feedback loops complicate this story. Users may observe apparent benefits (faster output, less effort) while underlying quality degrades invisibly. By the time the degradation becomes visible, the habits are entrenched.
IV. The Autopoietic Alternative
If retrieval cannot access the novel, and feedback loops without external reference lead to drift, what is the alternative? The biologist Humberto Maturana offers a framework: autopoiesis, the property of systems that produce themselves (Maturana and Varela, 1980; 1987). It is one of the few frameworks I have used extensively and found helpful in many contexts. Back to the essay.
An autopoietic system is one whose organisation is generated by its own processes. The canonical example is a living cell. The cell membrane is produced by processes inside the cell. Those processes depend on the membrane to contain them. The organisation is circular, self-producing, not injected from outside.
Maturana argues that cognition itself is autopoietic. Understanding is not information deposited into a passive container. It is a pattern of organisation that emerges through the system’s interaction with its environment. You cannot transfer understanding by transferring information because understanding is not made of information. It is made of organisation (Varela, Thompson and Rosch, 1991).
This maps directly to expertise. An expert cardiologist does not have more information than a medical database. They have a different organisation of whatever information they have. They perceive differently, attend to different features, recognise patterns that novices cannot see. This organisation emerged through years of feedback-rich practice. It was self-produced, not injected.
Consider what this means for AI agents. A context frame captures the outputs of expert decisions: the choices made, the rationales recorded, the outcomes logged. It does not capture the cognitive organisation that generated those outputs. When an agent retrieves a similar case and imitates the recorded decision, it is mimicking the output without possessing the organisation.
This is why knowledge transfer fails so often. Organisations spend millions documenting best practices, creating runbooks, building knowledge bases. The documents exist. The practices do not transfer. New employees read the documents and still make the same mistakes their predecessors made, because the documents capture what to do, not how to see.
Research on novice programmers learning with AI assistance reveals this pattern clearly. A 2024 study on AI’s role in programming education found that while AI familiarity among students increased from 28% to 100% over 12 weeks, a third of students reported issues with low code quality from AI tools (Bollin et al., 2024).⁵ Students using AI could produce working code but often developed what researchers called an “illusion of competence”, confidently believing they had learned concepts when they had merely accepted AI-generated solutions they did not fully understand. As the capability and core competence of AI coding agents improve, code quality may be addressed; but the illusion of competence remains, and is potentially amplified.
What did work was not retrieval-based but practice-based: scenarios where students had to make predictions, receive immediate feedback, and adjust their mental models over time. The feedback reorganised their perception. They developed understanding not by being told what to see, but by repeatedly trying, failing, and adjusting.
Understanding is autopoietic. It must be self-produced through interaction with an environment. You cannot inject it through retrieval any more than you can inject life into a machine by uploading a biology textbook.
This suggests a different design philosophy for AI agents. Rather than trying to transfer understanding through context, create conditions for understanding to emerge through interaction.
V. Homeostatic Design
If understanding is autopoietic, what role remains for AI agents? The answer lies in a concept from biology: homeostasis (Ashby, 1956; Wiener, 1948).
Living systems maintain themselves within viable bounds despite environmental perturbation. Body temperature stays near 37°C whether you are in a sauna or a snowstorm. Blood glucose stays within a narrow range whether you have just eaten or are fasting. The system does not optimise; it stabilises.
Homeostatic systems have three components: a reference signal (the target state), a sensor (detecting deviation from the target), and an effector (acting to reduce deviation). The system does not need to understand why deviations occur. It only needs to detect them and act to restore the target state.
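A minimal sketch of the loop, with illustrative numbers standing in for a real sensor and effector:

```python
# Minimal homeostat: reference signal, sensor, effector. The controller needs
# no model of why the state drifted; it only detects deviation and reduces it.
# The numbers are illustrative.
REFERENCE = 37.0   # reference signal: the target state
TOLERANCE = 0.5


def sensor(state: float) -> float:
    """Deviation of the current state from the reference."""
    return state - REFERENCE


def effector(state: float, deviation: float, gain: float = 0.4) -> float:
    """Act to reduce the deviation (a simple proportional correction)."""
    return state - gain * deviation


state = 39.2       # perturbed state: stepping into the sauna
while abs(sensor(state)) > TOLERANCE:
    state = effector(state, sensor(state))
print(f"stabilised at {state:.2f}")
```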
This offers a template for human-AI collaboration that does not require the impossible task of transferring understanding.
The pattern: humans define the homeostatic state (tests, acceptance criteria, invariants). Agents detect deviation from that state. Agents mutate and explore until the state is restored. Humans monitor and adjust the reference signal as circumstances change.
Consider test-driven development, a practice that predates AI but maps cleanly onto this pattern. The human writes tests that define what “working” means. The code either passes or fails. A failure is a deviation from the homeostatic state. The developer (or now, an AI agent) modifies code until the tests pass. The tests are the reference signal; the code changes are the effector; the test runner is the sensor.
What makes this work is that the human retains control of the reference signal. The human decides what “healthy” means. The agent explores the space of possible implementations, but that exploration is bounded by human-defined criteria. The agent cannot drift arbitrarily far because the tests pull it back.
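As a sketch of what this looks like with an agent in the loop: `run_tests` is the sensor, the human-authored suite is the reference signal, and `propose_patch` is a hypothetical stand-in for whatever coding agent you use. The budget is illustrative.

```python
# Homeostatic loop for test-driven development with an agent as effector.
# `run_tests` is the sensor, the human-authored test suite is the reference
# signal, and `propose_patch` is a hypothetical stand-in for a coding agent.
import subprocess
from typing import Callable


def run_tests() -> bool:
    """Sensor: does the codebase currently satisfy the reference signal?"""
    result = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
    return result.returncode == 0


def restore_health(propose_patch: Callable[[str], None], budget: int = 5) -> bool:
    """Effector loop: let the agent mutate code until tests pass or budget runs out."""
    for attempt in range(budget):
        if run_tests():
            return True   # homeostatic state restored
        propose_patch(f"tests are failing (attempt {attempt + 1}); propose a fix")
    return run_tests()     # still failing: hand back to the human
```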
Notice the contrast with Stanley and Lehman’s critique of objectives. Their novelty search abandons goals entirely because fixed objectives create deceptive gradients. Homeostatic design sidesteps this problem by defining bounds rather than destinations. The reference signal specifies what to avoid (deviation from health), not what to achieve (some distant goal). This leaves the space of acceptable solutions open for exploration while preventing drift into unacceptable regions.
Recent research on property-based testing provides empirical support for this approach. A June 2025 study introduced Property-Generated Solver, a framework using property-based testing rather than specific input-output examples to validate AI-generated code (He et al., 2025).⁶ The approach breaks what the authors call the “cycle of self-deception” where tests might share flaws with the code they validate. Results showed 23-37% relative improvement in pass rates over traditional test-driven approaches.
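The paper's framework is more elaborate than this, but the underlying idea can be shown with an ordinary property-based test, here using the Hypothesis library. The function under test and the properties chosen are illustrative.

```python
# A property-based test in this spirit: instead of a handful of examples
# (which may share blind spots with the code), state properties that must
# hold for any input. Uses the Hypothesis library.
from hypothesis import given, strategies as st


def dedupe_keep_order(items: list[int]) -> list[int]:
    """Remove duplicates while preserving first-seen order."""
    seen, out = set(), []
    for x in items:
        if x not in seen:
            seen.add(x)
            out.append(x)
    return out


@given(st.lists(st.integers()))
def test_dedupe_properties(items):
    result = dedupe_keep_order(items)
    assert len(result) == len(set(result))   # no duplicates remain
    assert set(result) == set(items)         # nothing lost, nothing invented
```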
The principle extends beyond code. Martin Kleppmann argues that formal verification may become mainstream precisely because AI creates both the capability and the need (Kleppmann, 2025).⁷ AI can generate proof scripts; proof checkers reject invalid proofs regardless of how plausible they seem. The human defines properties to verify; the AI explores implementations; the verifier enforces bounds.
Homeostatic design inverts the seamlessness thesis. Instead of minimising friction everywhere, it concentrates friction at the reference signal. Humans work hard to define what “healthy” means. Agents work fast to achieve it.
This is fundamentally different from retrieval-based approaches. The agent does not need to understand the problem domain. It needs to detect deviation and explore until deviation is reduced. Understanding remains with the humans who define the reference signal.
VI. Positive Friction
The seamlessness thesis treats friction as waste. But the homeostatic pattern suggests friction has a function: it is where human judgment enters the loop.
Renée Gosline’s research on “positive friction” explores this systematically (Gosline, 2022).⁸ Studying consumer decision-making and human-AI interaction, she finds that interfaces designed for maximum ease often produce worse outcomes than interfaces that force pauses, require confirmation, or demand explicit choices.
The mechanism is interruption. A pause creates a moment where automatic processing stops and deliberate judgment can occur. Without the pause, users proceed on autopilot, accepting defaults, missing errors, drifting along paths of least resistance.
Gosline’s framework distinguishes “bad friction” (unnecessary barriers that impede without adding value) from “beneficial friction” (interruptions that create space for judgment). Her research shows introducing “targeted friction” into AI workflows can improve overall accuracy and reduce uncritical adoption (MIT Sloan, 2022).⁹
Consider two AI coding interfaces. The first is purely conversational: you describe what you want, the agent generates code, you describe the next thing. The flow is seamless. The second requires you to review an artifact, a visible document containing the generated code, before proceeding. The flow has friction.
The METR study provides evidence for the value of this friction. Developers using AI tools showed more idle time in screen recordings, not just “waiting for the model” time, but periods of no activity at all. The researchers suggest that coding with AI requires less cognitive effort, which sounds positive but may reduce the deliberate evaluation that maintains quality.
This explains a puzzle in AI tool adoption. Some tools with objectively worse capabilities achieve better outcomes than tools with superior capabilities. The inferior tool often has more friction: more confirmation steps, more visible artifacts, more moments where the user must actively choose rather than passively accept.
In legal AI, the LawGeex study provides a concrete example of AI capability bounded by human judgment (LawGeex, n.d.).¹⁰ Testing 20 experienced lawyers against an AI system reviewing NDAs, the AI achieved an average 94% accuracy rate compared to 85% for lawyers. But the AI’s accuracy varied significantly across contracts, achieving 100% on one contract while lawyers achieved only 79%. The AI worked best for standardised patterns; human judgment remained essential for novel or ambiguous cases.
The principle generalises. Friction is costly when the task is routine and well-understood. Friction is valuable when the task involves judgment, novelty, or high stakes. The error is treating all friction as the same.
Positive friction is not a bug in the system. It is a feature that creates space for judgment. Remove it, and you do not get efficiency. You get autopilot, which is efficient right up until it crashes.
Design implications follow. High-stakes decisions should require explicit confirmation. Generated outputs should be visible as artifacts, not just streamed past in conversation. Users should be asked to predict what the AI will produce before seeing it, forcing engagement with their own mental model.
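As a sketch of the last of these, a prediction gate might look something like the following; the prompts and flow are hypothetical, not a prescription.

```python
# Hypothetical prediction gate: the user must state an expectation before the
# generated artifact is revealed, so acceptance is a deliberate comparison
# rather than a passive default.
def review_with_prediction(generated_artifact: str) -> bool:
    prediction = input("Before seeing the output, what do you expect it to do? ")
    print("\n--- generated artifact ---")
    print(generated_artifact)
    print("--------------------------")
    print(f"Your prediction: {prediction}")
    answer = input("Does the artifact match your expectation? Ship it? [y/N] ")
    return answer.strip().lower() == "y"
```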
The goal is not to maximise friction. It is to locate friction where it produces the most value: at decision points where human judgment is irreplaceable.
VII. Impedance Matching
A concept from electrical engineering offers a useful heuristic for thinking about human-AI collaboration: impedance matching.
In a circuit, maximum power transfer occurs when the impedance of the source matches the impedance of the load. Mismatch causes energy to be reflected rather than transferred, dissipated as heat rather than doing useful work.
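For the curious, the underlying result for the purely resistive case is the maximum power transfer theorem: a source of voltage V and internal resistance R_S delivers the most power to a load R_L when the two resistances are equal.

```latex
% Maximum power transfer, resistive case: source voltage V, source
% resistance R_S, load resistance R_L.
\[
  P_L \;=\; \frac{V^{2} R_L}{(R_S + R_L)^{2}},
  \qquad
  \frac{dP_L}{dR_L} = 0 \;\Longrightarrow\; R_L = R_S .
\]
```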
Human intent is not electrical current, and this metaphor should not be pushed too far. But the intuition is useful. The efficiency of intent-to-impact transfer depends on how well the tool’s structure matches the user’s structure. A tool that assumes capabilities the user lacks creates mismatch. Energy dissipates as frustration rather than converting to useful work.
Consider specification writing again. AI coding agents require users to articulate what they want with some precision. A user with strong specification skills gets strong results: their intent transfers cleanly into agent behaviour. A user with weak specification skills gets weak results: the mismatch between what they can articulate and what the agent needs creates loss.
The seamlessness thesis assumes this mismatch is solved by making agents more capable. If the agent can infer intent from vague specifications, the user’s specification skills become irrelevant. But this assumption ignores feedback effects. A user who never needs to write precise specifications never learns to think precisely about what they want. The tool’s capability, and more critically its design affordances, become a ceiling on the user’s development.
A 2024 study of AI-assisted programming found evidence for this concern. Students who used AI tools improved more quickly on immediate tasks but developed overconfidence about their competence (Raspberry Pi Foundation, 2024).¹¹ Students did not credit the AI with their progress; they felt responsible for their own improvement even as they became dependent on AI scaffolding. The tool’s ease of use masked a developmental gap that became visible only when problems exceeded the AI’s capability.
The MIT-Princeton-Penn study of GitHub Copilot across Microsoft, Accenture, and another Fortune 100 company found that less experienced developers showed the largest productivity boosts (27-39% across metrics) while senior developers saw more modest gains (7-16%) (IT Revolution, 2024).¹² This seems positive until you consider the mechanism: juniors accepted more suggestions while seniors were more likely to reject them.
Microsoft’s research indicates it takes approximately 11 weeks for developers to fully realise productivity gains from AI tools. During this period, teams often experience an initial productivity dip as developers learn to integrate AI suggestions into their workflow effectively.
Impedance matching is not just about tool design. It is about the co-evolution of tool and user. A tool that removes the need for a skill also removes the opportunity to develop that skill. Match carefully, or watch capability atrophy.
This argues against “democratisation” framed as lowering barriers. Lowered barriers are valuable when the barrier was artificial, when it excluded people who had the underlying capability but lacked access. Lowered barriers are harmful when the barrier was developmental, when overcoming it was how capability formed.
VIII. Design Principles
The preceding analysis suggests a set of principles for designing human-AI collaboration that avoids the seamlessness trap.
Principle 1: Define homeostatic bounds before deploying agents.
Do not ask “What can the agent do?” Ask “What should the system never do?” Define invariants, safety bounds, quality thresholds. Make these explicit and testable. Let agents explore freely within bounds; intervene immediately when bounds are violated.
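A sketch of what “explicit and testable” might mean in practice; the invariants listed and the `apply_agent_action` hook are illustrative, not a recommended set.

```python
# Sketch of Principle 1: invariants declared up front, checked on every agent
# action, with an immediate halt when a bound is violated. The invariants and
# the `apply_agent_action` hook are illustrative.
from typing import Callable

INVARIANTS: dict[str, Callable[[dict], bool]] = {
    "never deletes data": lambda change: not change.get("deletes_data", False),
    "stays within cost budget": lambda change: change.get("estimated_cost", 0) <= 100,
    "touches only allowed paths": lambda change: all(
        path.startswith("services/") for path in change.get("paths", [])
    ),
}


def apply_if_healthy(change: dict, apply_agent_action: Callable[[dict], None]) -> bool:
    violated = [name for name, check in INVARIANTS.items() if not check(change)]
    if violated:
        print("halting agent; invariants violated:", violated)
        return False
    apply_agent_action(change)
    return True
```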
Principle 2: Locate friction at high-stakes decision points.
Map your workflow to identify where errors compound and where they are easily corrected. Remove friction from low-stakes, reversible actions. Add friction to high-stakes, irreversible actions. Require explicit confirmation, visible artifacts, and active choice at points where judgment matters most.
Principle 3: Monitor for feedback loop drift.
Do not assume that consistent outputs indicate consistent quality. Track leading indicators of drift: user engagement with review steps, rate of rejection of AI suggestions, diversity of approaches attempted. A user who accepts all suggestions is not efficient; they may be drifting.
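One such leading indicator can be as simple as a rolling acceptance rate; the window size and threshold below are illustrative, not recommendations.

```python
# Sketch of a drift indicator: a rolling acceptance rate for AI suggestions.
# A full window of near-universal acceptance is a signal to add friction,
# not evidence of quality. Window size and ceiling are illustrative.
from collections import deque


class AcceptanceMonitor:
    def __init__(self, window: int = 50, ceiling: float = 0.95):
        self.events = deque(maxlen=window)   # True = suggestion accepted as-is
        self.ceiling = ceiling

    def record(self, accepted: bool) -> None:
        self.events.append(accepted)

    def acceptance_rate(self) -> float:
        return sum(self.events) / len(self.events) if self.events else 0.0

    def drifting(self) -> bool:
        full_window = len(self.events) == self.events.maxlen
        return full_window and self.acceptance_rate() >= self.ceiling
```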
Principle 4: Preserve skill-building friction for developing users.
Distinguish between expert users who have earned the right to seamlessness and developing users who need friction to build capability. Provide scaffolded experiences where difficulty increases as capability grows. Do not let AI assistance become a ceiling on human development.
Principle 5: Accept that the adjacent possible cannot be retrieved.
When facing genuinely novel problems, do not expect retrieval to help. Novel problems expand the phase space; prior solutions do not cover them. Invest in human capacity to conjecture, prototype, and learn from failure. Use agents to explore around human conjectures, not to replace the conjectures themselves.
Principle 6: Make reference signals explicit and contestable.
The humans who define what “healthy” means hold the real power in homeostatic systems. Make this power visible. Document reference signals. Create processes to challenge and update them. Do not let reference signals become implicit defaults that no one examines.
These principles share a common thread: they treat AI agents as powerful instruments that amplify human judgment, not as replacements for human judgment. The amplification is only as good as what is being amplified.
IX. Closing Comment: So, Where Should Friction Live?
The seamlessness seduction promises that removing friction removes cost. This essay has argued that the promise is false, or rather, that it is true only for certain kinds of friction in certain contexts.
Friction is costly when the task is routine, well-understood, and low-stakes. Automation genuinely helps here. The assembly line, the spreadsheet, the database query: friction removed, value created, no regrets.
Friction is valuable when the task involves judgment, novelty, or high stakes. The pause where you review the artifact. The confirmation step that forces explicit choice. The test suite that rejects implementations violating invariants. This friction is not waste. It is where understanding happens, where drift is corrected, where quality is maintained.
The adjacent possible cannot be retrieved. The novel problem expands the phase space beyond what your context graph contains. You can retrieve variations on the past. You cannot retrieve the future.
Understanding is autopoietic. It must be self-produced through interaction with an environment. You can provide information, but you cannot inject the cognitive organisation that makes information meaningful. That organisation emerges through practice, feedback, and time.
Feedback loops amplify whatever tendencies exist. Without external reference, without human judgment to interrupt the drift, AI agents compound errors as readily as they compound successes. The flywheel spins in both directions.
The alternative is homeostatic design. Humans define what “healthy” means. Agents detect deviation and explore until health is restored. The human provides the reference signal; the agent provides the search. Neither substitutes for the other.
The question was never “How do we remove friction?” The question was always “Where should friction live?” Get this right, and AI agents become powerful instruments for exploring the adjacent possible. Get it wrong, and they become powerful instruments for compounding drift.
The seamlessness seduction will continue to seduce. The promise of effort-free results is perennially appealing. But seventy years of research on cognition, organisations, and complex systems all point in the same direction.
Friction is not the enemy of value. Sometimes, friction is where value lives.
References
Ashby, W.R. (1956) An Introduction to Cybernetics. London: Chapman & Hall.
Bollin, A., Kesselbacher, M., Mößlacher, C. and Mori, M. (2024) ‘The Good and Bad of AI Tools in Novice Programming Education’, Education Sciences, 14(10), 1089.
Rogers, E.M. (1962) Diffusion of Innovations. New York: Free Press.
Glickman, M. and Sharot, T. (2025) ‘How human–AI feedback loops alter human perceptual, emotional and social judgements’, Nature Human Behaviour, 9(2), pp. 345–359.
He, J., Chen, Z., Wang, Y. and Zhang, L. (2025) ‘Use Property-Based Testing to Bridge LLM Code Generation and Validation’, arXiv preprint, arXiv:2506.18315.
Kauffman, S.A. (1993) The Origins of Order: Self-Organization and Selection in Evolution. New York: Oxford University Press.
Kauffman, S.A. (2000) Investigations. New York: Oxford University Press.
Kauffman, S.A. (2019) A World Beyond Physics: The Emergence and Evolution of Life. New York: Oxford University Press.
Kauffman, S.A. and Roli, A. (2023) ‘The Third Transition in Science’, arXiv preprint, arXiv:2302.10907.
Maturana, H.R. and Varela, F.J. (1980) Autopoiesis and Cognition: The Realization of the Living. Dordrecht: D. Reidel Publishing.
Maturana, H.R. and Varela, F.J. (1987) The Tree of Knowledge: The Biological Roots of Human Understanding. Boston: Shambhala.
Stanley, K.O. and Lehman, J. (2015) Why Greatness Cannot Be Planned: The Myth of the Objective. Cham: Springer.
Varela, F.J., Thompson, E. and Rosch, E. (1991) The Embodied Mind: Cognitive Science and Human Experience. Cambridge, MA: MIT Press.
Wiener, N. (1948) Cybernetics: Or Control and Communication in the Animal and the Machine. Cambridge, MA: MIT Press.
Footnotes
¹ METR (2025) ‘Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity’. Available at: https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/ (Accessed: January 2026).
² Perlman, R., Schwartz, M., Goldberg, Y. and Shmueli, E. (2025) ‘Experience with GitHub Copilot for Developer Productivity at Zoominfo’. Available at: https://arxiv.org/html/2501.13282v1 (Accessed: January 2026).
³ Glickman, M. and Sharot, T. (2025) ‘How human–AI feedback loops alter human perceptual, emotional and social judgements’. Available at: https://www.nature.com/articles/s41562-024-02077-2 (Accessed: January 2026).
⁴ Fastly (2025) ‘Senior Developers Ship More AI Code’. Available at: https://www.fastly.com/blog/senior-developers-ship-more-ai-code (Accessed: January 2026).
⁵ Bollin, A., Kesselbacher, M., Mößlacher, C. and Mori, M. (2024) ‘The Good and Bad of AI Tools in Novice Programming Education’. Available at: https://www.mdpi.com/2227-7102/14/10/1089 (Accessed: January 2026).
⁶ He, J., Chen, Z., Wang, Y. and Zhang, L. (2025) ‘Use Property-Based Testing to Bridge LLM Code Generation and Validation’. Available at: https://arxiv.org/abs/2506.18315 (Accessed: January 2026).
⁷ Kleppmann, M. (2025) ‘Prediction: AI will make formal verification go mainstream’. Available at: https://martin.kleppmann.com/2025/12/08/ai-formal-verification.html (Accessed: January 2026).
⁸ Gosline, R. (2022) ‘When More Friction Can Lead to a Better Consumer Experience’, Yale School of Management. Available at: https://som.yale.edu/story/2022/when-more-friction-can-lead-better-consumer-experience (Accessed: January 2026).
⁹ MIT Sloan (2022) ‘Digital marketing trends for 2022’. Available at: https://mitsloan.mit.edu/ideas-made-to-matter/digital-marketing-trends-2022 (Accessed: January 2026).
¹⁰ LawGeex (n.d.) ‘AI vs Human Lawyers: Contract Review Performance’. Available at: https://images.law.com/contrib/content/uploads/documents/397/5408/lawgeex.pdf (Accessed: January 2026).
¹¹ Raspberry Pi Foundation (2024) ‘Does AI-assisted coding boost novice programmers’ skills or is it just a shortcut?’. Available at: https://www.raspberrypi.org/blog/does-ai-assisted-coding-boost-novice-programmers-skills-or-is-it-just-a-shortcut/ (Accessed: January 2026).
¹² IT Revolution (2024) ‘New Research Reveals AI Coding Assistants Boost Developer Productivity by 26%: What IT Leaders Need to Know’. Available at: https://itrevolution.com/articles/new-research-reveals-ai-coding-assistants-boost-developer-productivity-by-26-what-it-leaders-need-to-know/ (Accessed: January 2026).