Core Thesis: Experts know more than they can say. Most of what makes them good was never written down. AI learns from what was written down. So there's a permanent gap between what AI can learn and what experts actually know. As AI gets better at the writable stuff, the unwritable stuff becomes the thing that matters most.

Introduction: The Doctor Who Did Not Know

At the turn of the twentieth century, a physician achieved a spectacularly high success rate in diagnosing early-stage syphilis. Laboratory tests consistently confirmed his judgments. Yet when hospital administrators asked him to explain his method, he could not. It took weeks of observation by two colleagues before the secret surfaced: the doctor had been unconsciously registering a slight tremor in his patients' eyes (Scott, 1998). His expertise was real, productive, and empirically verified. It was also, in a precise sense, illegible: invisible not only to outside observers but to its own bearer.

This case is more than a curiosity. That experts possess tacit knowledge is well established. The urgent question is whether the recorded output of human civilisation (text, images, video, code) contains enough of that knowledge for AI systems to reconstruct it. If expert performance draws substantially on skills that were never externalised, like the doctor's unconscious detection of an eye tremor, then the training data has a structural hole, and no amount of scale or modality will fill it.

This essay argues that expert knowledge operates across three distinct layers, each with a different relationship to encoding:

  • Inferable metafeatures (Layer 1): the patterns of reasoning, analogy, and parsimony that pervade the written record. No expert articulated these as a set, but language models have inferred them from the statistical regularities of the corpus.

  • Craft heuristics (Layer 2): practical judgments that could be verbalised but rarely are. The training data preserves only scattered fragments.

  • Embodied perceptual sensitivities (Layer 3): knowledge forged through direct engagement with the world, operating below conscious access. This layer was never externalised in any modality, and no current system can extract it.

The practical consequence is a paradox of value. The more effectively AI systems capture what can be formalised, the more the residual value concentrates in what cannot. This essay calls that concentration the illegibility premium, and argues that it grows, not shrinks, with every advance in AI capability.

I. The Fractal Trace: What the Training Corpus Contains (Layer 1)

The case for AI's capacity to extract expert knowledge is stronger than sceptics acknowledge. The written record of human inquiry is not merely a warehouse of conclusions. It is a compressed trace of the process by which those conclusions were reached: the hypotheses entertained and discarded, the analogies drawn across domains, the aesthetic judgments about which explanations are parsimonious and which are baroque. Centuries of scientific papers, philosophical arguments, engineering reports, and literary criticism encode what we might call the inferable metafeatures of expert reasoning: coherence, parsimony, analogy, negation, and the recursive interplay among them. No single expert articulated these as a system. They are latent regularities, implicit in how experts write, argue, and revise, recoverable by any sufficiently powerful learner that trains on the aggregate.

Alan Turing, the mathematician whose 1936 formulation of the universal computing machine laid the theoretical foundation for modern computation, anticipated this architecture of intelligence. In his 1939 doctoral thesis on ordinal logics (Turing, 1939), he described mathematical reasoning as "the exercise of a combination of two faculties, which we may call intuition and ingenuity." Ingenuity, the capacity to follow logical sequences, was formalisable. Intuition, the capacity for "spontaneous judgments which are not the result of conscious trains of reasoning" (quoted in Dyson, 2012), was what carried the mathematician from one formalised sequence to the next, across terrain that logic alone could not traverse. Crucially, Turing conceived of machines, his O-machines, that could incorporate oracular steps into otherwise mechanical processes. The architecture was hybrid from the beginning: formal logic punctuated by leaps that the formalism itself could not produce.

Modern language models have, in effect, learned a statistical approximation of those leaps. Trained on the textual traces of billions of such judgment calls, they have internalised not just what experts concluded but the patterns of traversal by which experts navigate ambiguity. This is why AI systems have autonomously contributed to the resolution of open Erdős conjectures verified by Terence Tao (Tao, 2025), generated de novo theoretical physics published in Physics Letters B (Hsu, 2025), and achieved resolution rates approaching 80% on benchmarks requiring end-to-end diagnosis and patching of real-world codebases (Jimenez et al., 2024). The metafeatures of inquiry, the compass rather than the map, are genuinely present in the corpus and genuinely extractable.

Yet even within this layer, the extraction has clear limits. When the same coding benchmarks are extended to long-horizon, evolving tasks that demand sustained architectural reasoning, performance drops sharply, from roughly 65% to approximately 21% (Jimenez et al., 2024). The compass works well in familiar terrain. It falters where the terrain demands judgment that metafeatures alone cannot supply.

These metafeatures are not merely artefacts of textual convention. They are the same patterns that experts exhibit when they reason at their best: the physicist favouring parsimony, the engineer reasoning by analogy to prior designs, the programmer insisting on coherence across a codebase. Experts rarely name these principles while practising them, but the principles shape the textual output, and the textual output is what the model trains on. This is the first layer of expert knowledge (Layer 1): inferable metafeatures. It is the layer where the extraction thesis holds most firmly, and where AI capability is advancing most rapidly. But its boundaries are already visible, and they point downward, toward the strata where the corpus thins.

II. Craft Heuristics at the Threshold of Language (Layer 2)

Between the inferable metafeatures that models can extract from the written record and the fully embodied intuitions that elude language entirely, there exists a middle stratum. This is the domain of craft heuristics: practical judgments that could, in principle, be articulated but that practitioners rarely bother to put into words, because the situations are too numerous, too context-dependent, or too obvious-seeming to warrant explicit treatment.

Donald Norman identifies this stratum precisely in his treatment of procedural knowledge: "the sort of knowledge that enables a person to be a skilled musician, to return a serve in tennis, or to move the tongue properly when saying the phrase 'frightening witches'" (Norman, 2013). Such knowledge is "difficult or impossible to write down and difficult to teach". Norman's formulation is careful: difficult, not categorically impossible. The tongue movements for "frightening witches" could, with sufficient phonetic analysis, be described. A tennis coach can articulate the biomechanics of a serve. The question is whether such articulations, even when they exist, capture enough of the operative knowledge to be actionable.

The evidence suggests they do not, at least not reliably. Adler and Van Doren make the point with characteristic directness in their treatment of practical reasoning: "a practical problem can only be solved by action itself. ... The kind of practical judgment which immediately precedes action must be highly particular. It can be expressed in words, but it seldom is" (Adler and Van Doren, 1972). The gap here is not ontological but economic and attentional. The practitioner could verbalise the judgment but has no reason to, because the verbalisation would be so particular, so hedged with contextual qualifiers, that it would be useless to anyone not already embedded in the same situation.

For language models, this creates a distinctive pattern of partial extraction. The corpus contains scattered traces of Layer 2 knowledge: the code review comment that says "this feels fragile" without explaining why, the clinical case report that describes a diagnosis but omits the twelve differential diagnoses the physician silently eliminated, the architectural decision record that states a choice but not the embodied familiarity with failure modes that motivated it. Peter Seibel's interviews with veteran programmers capture this stratum vividly. One programmer describes reaching "the peak of my abilities" as a state where "I had extremely trustworthy intuition. I would do things and they would just turn out right. ... Some of it, I'm sure, was experience that had simply gotten internalized so far down that I didn't have conscious access to the process" (Seibel, 2009). The textual trace here is a report about tacit knowledge, not the knowledge itself. A language model can learn that experienced programmers have trustworthy intuitions. It cannot learn the intuitions.

This explains a pattern visible across current AI benchmarks: strong results on well-structured tasks within established domains, degrading sharply as tasks become more open-ended, longer in horizon, and more dependent on accumulated situational judgment. The drop from approximately 65% to 21% on extended SWE-bench variants (Jimenez et al., 2024) is characteristic. The model has enough Layer 2 fragments to perform competently in the centre of the distribution but lacks the coherent, experience-grounded judgment to handle the tails. It can approximate the median practitioner's craft but not the master's.

III. Embodied Perception Beyond the Corpus (Layer 3)

The deepest stratum of expert knowledge is not merely difficult to articulate. It is non-linguistic in its formation and operation: perceptual sensitivities forged through direct physical engagement with the world, operating below the threshold of conscious access and therefore absent from the textual record on which current language models predominantly train. A caveat is necessary here. Multimodal models that incorporate video, audio, and sensor data may eventually access traces of this stratum that text cannot carry. A model trained on footage of clinical examinations might learn to detect the cues a diagnostician registers unconsciously. But even this possibility underscores the central point: the knowledge itself is not linguistic, and text-based training, which remains the backbone of current large language model architectures, cannot reach it.

Scott's syphilis diagnostician is the paradigm case. The eye tremor was a valid cue, reliably associated with early syphilis, and the physician's perceptual system had learned to detect it through thousands of examinations. But the learning occurred entirely within the sensorimotor loop of clinical practice. It never passed through language, not because the doctor chose not to describe it, but because he did not know he was doing it. The knowledge existed as a trained perceptual sensitivity, not as a proposition, and no amount of retrospective interview could have surfaced it without the independent observation that ultimately did (Scott, 1998).

This is not an isolated phenomenon. It is characteristic of expertise across domains. Scott generalises the principle through the concept of mētis, the Greek term for practical wisdom acquired through experience: "One powerful indication that they all require mētis is that they are exceptionally difficult to teach apart from engaging in the activity itself" (Scott, 1998). His examples are deliberately diverse: navigation, farming, craft production, medical diagnosis. In each case, the knowledge that separates the competent from the masterful is knowledge that resists extraction from practice into language.

The implications for language model training are stark. If this knowledge never enters text, it never enters the corpus. If it never enters the corpus, no architecture, no matter how deep or broadly trained, can extract it. The limitation is not computational but modal: the training signal simply does not contain the information. Dartnell makes the civilisational stakes of this point explicit in his account of medical knowledge transfer: it takes up to a decade of hands-on hospital training to achieve competency as a specialist physician, "all of this with training and hands-on demonstrations provided by someone already proficient. If this cycle of knowledge transfer breaks with the collapse of civilization, it will be impossible to teach yourself the necessary practical skills and interpretative expertise from textbooks alone" (Dartnell, 2014). Textbooks, like corpora, contain the trace but not the residue.

Bunnie Huang provides the engineering analogue. Despite decades of development in computer-aided design, "traditional craft still matters, because CAD tools haven't brought about the ability to simulate our mistakes before we make them" (Huang, 2017). The hand-feel of a solder joint, the visual assessment of a circuit board for manufacturing defects, the physical intuition for how a mechanism will behave under stress: these are perceptual skills that CAD models do not capture because they operate in the medium of physical interaction, not symbolic representation. The same gap that separates a CAD model from a craftsperson's hands separates a language model from a practitioner's perception.

IV. The Flow Paradox: Expertise Against Its Own Articulation

The three-layer model gains additional force from a phenomenon documented across multiple domains: expert performance depends on the suppression of the very articulatory mode that generates training data. The state in which practitioners do their best work is precisely the state in which they are least likely to produce the textual traces on which language models depend.

The pattern is remarkably consistent. In programming, Seibel's interviewees describe their best work as emerging from a state of complete absorption: "I have noticed over the years, the really good code I would write was when I'm in complete flow — just totally unaware of time: not even really thinking about the program, just sitting there in a relaxed state just typing this stuff and watching it come out on the screen" (Seibel, 2009). The corollary is equally telling: the code produced under conscious deliberation, when "something's saying, 'No, no, no, this is wrong, wrong, wrong,'" is the code that fails. The programmer's peak performance occurs when the linguistic, deliberative system has been silenced.

In creative practice, Byron Howard, a director at Disney Animation, distils the principle into a maxim: "If you think, you stink" (Catmull and Wallace, 2014). The injunction is not anti-intellectual. It is a recognition that the skilled practitioner's instrument, whether a guitar, a camera angle, or a narrative structure, must be operated through a mode of engagement that conscious reflection disrupts. Gallwey documents the same phenomenon in athletic performance with experimental precision: "images are better than words, showing better than telling, too much instruction worse than none, and that trying often produces negative results" (Gallwey, 1974). The element of the stroke that the student tried to remember was the one element he failed to execute. Everything absorbed without verbal instruction was reproduced flawlessly.

Steve Jobs connected this to contemplative practice, describing how sustained meditation creates "room to hear more subtle things — that's when your intuition starts to blossom and you start to see things more clearly and be in the present more" (Isaacson, 2011). The common thread across programming, animation, athletics, and contemplation is that expert perception operates in a register that verbal articulation actively interferes with.

This creates a systematic sampling bias in the corpus. What gets written down is the expert's retrospective rationalisation: the post-hoc account of why a decision was made, reconstructed through the deliberative system after the fact. What does not get written down, because it cannot be, is the real-time perceptual process that produced the decision. Language models trained on this corpus learn the rationalisation, not the perception. They learn what experts say about their expertise, not the expertise itself.

The paradox is structural for text-based training, not contingent. It cannot be resolved by collecting more text, because the text-generation process is inherently filtered through a linguistic bottleneck that excludes precisely the information that matters most. Sensor data, eye tracking, and physiological monitoring might one day capture traces of what the expert's body does during flow. But the current training paradigm, and the vast majority of the historical corpus, is linguistic. You could interview every master programmer on earth about their flow states and still not capture what their nervous systems are actually doing when the code "just comes out right."

V. The Illegibility Premium

Expert knowledge distributes across three layers with decreasing extractability. AI systems are rapidly automating the first: the inferable metafeatures (Layer 1) that LLMs have genuinely captured. They are partially automating the second: the craft heuristics (Layer 2) of which the corpus preserves only scattered fragments. But they cannot reach the third: the embodied perceptual sensitivities (Layer 3) that never enter language at all. As the upper layers yield to formalisation, the competitive and epistemic value of human contribution concentrates increasingly in that deepest stratum. This concentration is what we might call the illegibility premium: the economic and cognitive value that accrues specifically to knowledge that symbol-processing systems, whether statistical or rule-based, cannot capture.

The premium is not new. Scott's entire analysis in Seeing Like a State demonstrates that high-modernist schemes fail when they attempt to replace mētis with techne, substituting legible, centrally planned systems for the illegible, locally adapted knowledge of practitioners. The disastrous collectivisation of agriculture, the failures of scientific forestry, the social destruction wrought by authoritarian urban planning: these are all instances of the illegibility premium being disregarded, with catastrophic results (Scott, 1998). What is new is the scale and sophistication of the formalising technology. Language models are the most powerful mētis-extraction apparatus ever constructed. And precisely because they are so powerful, the residue they cannot extract becomes more consequential, not less.

Tetlock provides the essential caveat from the forecasting domain: "Whether intuition generates delusion or insight depends on whether you work in a world full of valid cues you can unconsciously register for future use" (Tetlock and Gardner, 2015). Expert intuition is not inherently reliable. It requires a structured environment with regular feedback, what Kahneman and Klein (2009) call a "kind" learning environment. The syphilis doctor operated in such an environment: thousands of examinations, each eventually confirmed or disconfirmed by laboratory tests. The illegibility premium does not attach to intuition per se but to calibrated intuition: perception that has been shaped by extensive engagement with a domain that provides valid, if sometimes delayed, feedback.

This distinction matters because it identifies the conditions under which the premium is genuine rather than illusory. Not all tacit knowledge is valuable. Some is mere habit, some is superstition, and some is the residue of training environments that no longer reflect current conditions. The premium attaches specifically to embodied perceptual skills that have been honed through extensive practice in feedback-rich environments. These are precisely the skills that Dartnell warns cannot be reconstituted from textbooks alone (Dartnell, 2014), and that Norman identifies as requiring demonstration and practice rather than instruction (Norman, 2013).

VI. Distributed Illegibility: From Individual to System

The illegibility premium operates not only within individual experts but across communities of practice. The same dynamic that makes a master craftsperson's perception irreducible to text makes a functioning craft tradition irreducible to a set of documented procedures.

Raymond's account of the Linux kernel's development provides a striking case. He had assumed that "the most important software needed to be built like cathedrals, carefully crafted by individual wizards or small bands of mages working in splendid isolation" (Raymond, 1999). Torvalds's bazaar model, releasing early and often with radical openness, seemed reckless for systems above a certain complexity threshold. Yet it worked, and it worked because the bazaar possessed a distributed form of practical judgment that no central plan could replicate: the collective mētis of thousands of contributors, each bringing domain-specific perceptual skills that could not have been specified in advance.

Ridley extends this observation to innovation more broadly: "even the cleverest in-house programmer is unlikely to be as smart as the collective efforts of ten thousand users at the 'bleeding edge' of a new idea" (Ridley, 2010). Much of what lead users "free-reveal" is articulable: bug reports, feature requests, documented usage patterns. This feedback enters corpora and is extractable by language models. But beneath the articulable reports lies the perceptual substrate that generated them: the user's ability to notice what is missing, to feel the friction of a clumsy interface, to sense an unexploited possibility. That noticing, shaped by each user's particular history of embodied interaction with the technology, is a form of distributed Layer 3 perception. The reports are its trace, not its substance.

The East Asian developmental states provide an institutional analogue. Studwell's analysis of Japan, Korea, and Taiwan reveals that successful industrial policy did not involve picking winners through theoretical analysis. Instead, "the state did not so much pick winners as weed out losers" (Studwell, 2013), using export performance as a feedback signal. The capacity to export told politicians what worked. This is mētis operating at the level of national economic management: practical judgment, responsive to feedback, resistant to theoretical formalisation, and dependent on sustained engagement with the messy particulars of industrial reality.

In each case, the distributed system's intelligence depends on a substrate of individual tacit knowledge that cannot be centralised without being destroyed. The bazaar works because its contributors possess perceptual skills that no cathedral architect could specify. The developmental state works because its feedback loops engage the practical judgment of thousands of firms. A language model trained on the outputs of these systems, the released code, the exported products, the published policies, captures the trace but not the distributed perceptual ecology that produced it.

VII. The Starving Ecology

The deepest risk of the current trajectory is not that AI will fail to replicate expert knowledge. It is that AI will succeed just enough to disrupt the conditions under which expert knowledge forms, while failing to provide an adequate substitute.

Consider the apprenticeship pipeline. Layer 3 knowledge, the embodied perceptual sensitivities that constitute the illegibility premium, is acquired through what Tetlock calls "bruising experience" (Tetlock and Gardner, 2015): direct, sustained, feedback-rich engagement with a domain. The junior developer who struggles through a codebase builds perceptual skills, a feel for code smells, an instinct for fragile abstractions, that the developer who receives an AI-generated patch does not. The materials science graduate student who manually characterises alloy samples develops a physical intuition for how microstructures behave under stress that no amount of reading can provide.

If AI systems handle the tasks that serve as the training ground for Layer 3 acquisition, the pipeline narrows. Fewer practitioners develop the deep perceptual skills that constitute the premium. The corpus, in turn, receives fewer traces of those skills, since fewer humans are generating the experience from which such traces emerge. Scott's warning about the syphilis doctor becomes systemic: the diagnostic skill was transmissible, but only through direct observation of practice. Remove the practice, and the skill disappears within a generation.

This is not a Luddite argument against automation. It is a structural observation about the ecology of knowledge. The illegibility premium depends on a continuous supply of practitioners who have traversed the full generative process, from novice fumbling through to masterful perception. If AI tools eliminate the fumbling, they eliminate the traversal, and with it the formation of the very knowledge that makes the premium real.

A counter-argument deserves acknowledgment: AI tools might accelerate traversal rather than eliminate it. A chess player training with an engine reaches expert perception sooner than one studying only books, because the engine compresses the feedback loop. The question is whether the acceleration preserves the embodied engagement that forms Layer 3 knowledge or bypasses it. In chess, the player still makes moves, still sees the board, still feels the consequences of error. In AI-assisted coding, the practitioner may receive a correct patch without ever having inhabited the problem space that produced it. The distinction is between tools that compress the journey and tools that skip it. The risk attaches to the latter.

Gallwey's observation about the Inner Game provides the micro-level mechanism: "too much instruction worse than none, and trying often produces negative results" (Gallwey, 1974). The implication is that learning, like performance, requires a mode of engagement that explicit, verbally mediated instruction can disrupt. AI tools that interpose themselves between the learner and the task, offering suggestions, corrections, and completions, risk creating what Gallwey would recognise as a chronic "trying" state: conscious, deliberative, and ultimately counterproductive to the development of the intuitive perception that defines mastery.

VIII. Designing for the Illegible

If the illegibility premium is real and growing, then system design must protect the conditions under which tacit knowledge forms rather than optimising them away. Several principles follow.

First, preserve the traversal. The value of expert knowledge lies not in the destination but in the perceptual sensitivities acquired during the journey. Systems should be designed to scaffold the journey rather than skip it: providing context, suggesting framings, and accelerating feedback loops, while leaving the core act of judgment to the human practitioner. A coding agent that explains the problem space and presents diagnostic options preserves the developer's traversal. One that silently generates a correct patch does not. The goal is what we might call positive friction: resistance that builds capability rather than merely impeding throughput.

Second, show the world, don't interrogate the expert. Norman observes that effective design distributes knowledge between the head and the world, reducing the need for explicit articulation by making relevant structure perceptually available (Norman, 2013). The syphilis diagnostician did not need to recall a rule about eye tremors. He needed to see the patient. AI collaboration interfaces that demand detailed prompts, explicit intent specifications, and verbal justifications of choices force experts to translate their perceptual mode into a linguistic one, disrupting the very process that makes their judgment valuable. The alternative is to present the practitioner with rich, well-structured context and let their trained perception do the work. A code review tool that surfaces the relevant diff with surrounding context engages the expert's pattern recognition. One that asks "what should I check for?" forces articulation of what the expert already sees.

Third, maintain the ecology. The distributed illegibility of communities of practice, the bazaar's collective mētis, the lead users' embodied feedback, depends on a population of practitioners with diverse, deep, experience-grounded knowledge. Industrial and educational policy should attend to the apprenticeship pipeline as infrastructure, not as an inefficiency to be automated away. Studwell's developmental states succeeded because they maintained the feedback loops between production and policy (Studwell, 2013). The analogous challenge for AI-augmented knowledge work is maintaining the feedback loops between practice and perception.

Fourth, respect the paradox. Expert performance depends on the suppression of conscious articulation. Systems that demand more articulation as a condition of collaboration will systematically degrade the performance of the practitioners they aim to support. The most effective human-AI collaboration may be the least linguistically mediated: systems that observe and adapt to the expert's behaviour rather than requiring the expert to translate their perception into language.

Conclusion: The Structural Hole

Language models have achieved something remarkable. By training on the textual and visual residue of human civilisation, they have inferred the metafeatures of expert inquiry: the patterns of reasoning, the heuristics of judgment, the recursive architecture of hypothesis and refinement that characterise productive intellectual work. This extraction is genuine, not superficial, and its practical consequences are already transforming scientific discovery, software engineering, and creative production.

But the textual residue is not the knowledge itself. It is the trace left by knowledge as it passed through the bottleneck of language. The most valuable expert knowledge, the calibrated perceptual sensitivities forged through sustained embodied engagement with feedback-rich domains, never passes through that bottleneck. It operates below the threshold of conscious access, degrades under explicit articulation, and is acquired only through the kind of direct practice that no corpus can contain.

The illegibility premium names this structural fact. As AI systems automate the inferable layers of expert knowledge, the value of the remainder does not diminish. It concentrates. The doctor who could not explain his diagnoses, the programmer whose best code emerged from states beyond thought, the craftsperson whose hands know what CAD cannot simulate: these are not relics of a pre-computational age. They are the permanent frontier of a knowledge ecology that formal systems can approach but never fully colonise.

The corpus has a structural hole. It is shaped exactly like the residue of human experience that language cannot carry. Recognising this hole, and designing systems that respect it, is the central challenge of the age of artificial intelligence.


References

Adler, M.J. and Van Doren, C. (1972) How to Read a Book. Revised edn. New York: Simon & Schuster.

Catmull, E. and Wallace, A. (2014) Creativity, Inc.: Overcoming the Unseen Forces That Stand in the Way of True Inspiration. New York: Random House.

Dartnell, L. (2014) The Knowledge: How to Rebuild Our World from Scratch. New York: Penguin Press.

Dyson, G. (2012) Turing's Cathedral: The Origins of the Digital Universe. New York: Pantheon Books.

Gallwey, W.T. (1974) The Inner Game of Tennis. New York: Random House.

Hsu, S.D.H. (2025) 'Relativistic covariance and nonlinear quantum mechanics: Tomonaga-Schwinger analysis', Physics Letters B, 862, p. 140053.

Huang, A. (2017) The Hardware Hacker: Adventures in Making and Breaking Hardware. San Francisco: No Starch Press.

Isaacson, W. (2011) Steve Jobs. New York: Simon & Schuster.

Jimenez, C.E., Yang, J., Wettig, A., Yao, S., Pei, K., Press, O. and Narasimhan, K. (2024) 'SWE-bench: Can language models resolve real-world GitHub issues?', in Proceedings of the Twelfth International Conference on Learning Representations.

Kahneman, D. and Klein, G. (2009) 'Conditions for intuitive expertise: A failure to disagree', American Psychologist, 64(6), pp. 515-526.

Norman, D.A. (2013) The Design of Everyday Things. Revised and expanded edn. New York: Basic Books.

Raymond, E.S. (1999) The Cathedral and the Bazaar: Musings on Linux and Open Source by an Accidental Revolutionary. Sebastopol: O'Reilly Media.

Ridley, M. (2010) The Rational Optimist: How Prosperity Evolves. London: Fourth Estate.

Scott, J.C. (1998) Seeing Like a State: How Certain Schemes to Improve the Human Condition Have Failed. New Haven: Yale University Press.

Seibel, P. (2009) Coders at Work: Reflections on the Craft of Programming. New York: Apress.

Studwell, J. (2013) How Asia Works: Success and Failure in the World's Most Dynamic Region. London: Profile Books.

Tao, T. (2025) 'The story of Erdős problem #1026', What's new [Blog]. Available at: https://terrytao.wordpress.com.

Tetlock, P.E. and Gardner, D. (2015) Superforecasting: The Art and Science of Prediction. New York: Crown Publishers.

Turing, A.M. (1939) 'Systems of logic based on ordinals', Proceedings of the London Mathematical Society, s2-45(1), pp. 161-228.