This essay is the third of three. The first argued that knowledge, in David Deutsch's sense, is information with causal power, acquired through genesis (the effortful traversal through which understanding is built) and encoding (the articulation of that understanding into transferable form). The second argued that while AI can generate stable components, composition requires its own genesis at the level of interaction: protocols, boundaries, and the theory of how stable things relate. What follows acknowledges a reversal: places where AI's structural capabilities exceed human capacity in ways that matter.


The previous two essays argued that genesis is essential for knowledge and that AI delivers structure without genesis. But there is a reversal to acknowledge: a place where AI's structural capabilities exceed human capacity in ways that matter.

In systems of sufficient scale and complexity, AI can see structure more comprehensively than you can.

Not always. Not in every way. But in specific, important cases: where the sheer scale of what must be held in simultaneous attention exceeds what any human mind can manage, and where the relevant information is distributed across the codebase rather than concentrated in formal specifications. In these cases, AI's capabilities reveal structural patterns and dependencies that escape human perception, not through superior intelligence but through architectural difference. This creates a new kind of collaboration: AI as archaeological instrument, surfacing the actual structure of systems too complex for human cognition to grasp directly.

This essay explores that reversal, its limits, and what it means for the human role when AI becomes the better reader of what exists.

The Cognitive Constraint

Modern software systems exceed human cognitive capacity. This is not a metaphor or an exaggeration. It is a structural fact about the mismatch between human working memory and system scale.

A human engineer can hold perhaps four to seven discrete chunks of information in working memory at once (Miller, 1956; Cowan, 2001). This is a constraint rooted in cognitive architecture, something that training and expertise can optimise but not transcend. You can learn to chunk more effectively, to treat a design pattern as a single unit rather than its constituent parts, but the number of chunks remains bounded.

A moderately complex codebase contains thousands of files, hundreds of modules, intricate webs of dependency. The "cathedral" of the system architecture cannot be held in mind while laying any single "brick" of a specific function. You work locally, with partial views, trusting that the parts you cannot see will behave as you assume.

This trust is often misplaced. As teams scale, knowledge becomes distributed and diluted. The engineer who understood why a particular design decision was made has left. The documentation has decayed. The system has drifted from its original conception through the accumulation of local, reasonable decisions that nobody saw adding up to systemic problems. Naur (1985) observed that the primary product of programming is not the code but the theory the programmer holds; when programmers leave, the theory leaves with them.

The gap between system complexity and human attention capacity is not incidental. It produces real consequences: knowledge dissipates, inconsistencies accumulate, and the decisions that made sense locally prove unwise at scale. This is entropy, but not the thermodynamic kind. It is informational entropy, the degradation that follows inevitably when the system exceeds the working memory that could hold and protect it.

AI's Structural Advantage

AI does not share this constraint. Current large language models can hold context windows exceeding 100,000 tokens, with some architectures supporting over one million tokens (Anthropic, 2024; Google, 2024): entire codebases, not just single files. This is not intelligence in the sense of insight or creativity. It is something more basic and, in its domain, more powerful: comprehensive simultaneous attention. Current AI architectures have limits and failure modes of their own, but acknowledging them does not affect the arguments this essay extends.

Consider what this enables:

An AI can check a thousand files with the same scrutiny it applies to ten. Pattern consistency (error handling conventions, naming standards, architectural rules) can be verified globally rather than sampled locally. The vigilance does not degrade with scale.

An AI can hold the architectural documentation in context while reviewing a specific line of code. The "forest" and the "tree" can be simultaneously present, allowing the model to notice when a local change violates global constraints that no human reviewer would have kept in mind.

An AI can access version history and identify regressions: patterns that were tried and failed months ago, now being inadvertently reintroduced by an engineer who was not present for the original failure.

An AI does not experience decision fatigue. The last file reviewed receives the same attention as the first. Deadline pressure does not cause the model to skip checks or approve changes it would otherwise flag.

These are not advantages in reasoning or judgment. They are structural advantages rooted in a different cognitive architecture, one not bounded by working memory limits or subject to attention decay. In systems of sufficient complexity, where the number of relationships that must be held simultaneously exceeds human working memory, and where the codebase follows irregular patterns that resist chunking into simpler abstractions, this architectural difference becomes consequential. In highly modular, well-documented systems where architecture is formally specified, the advantage diminishes: human review guided by good documentation may suffice.

The Patterns That Escape Human Notice

This structural advantage reveals problems that human review systematically misses: not because the problems are subtle, but because seeing them requires holding too much context at once. Consider examples across a typical system's layers:

Database. A query runs fast in development. In production, with a million rows and concurrent writes, it locks tables that block the checkout flow. The human reviewer sees correct SQL. An AI holding schema, indexes, and usage patterns across all services could see the contention.
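A small sketch makes the shape of this check concrete. It is illustrative only: the catalog metadata, table names, and thresholds are hypothetical, and a real analysis would consult the database's own planner and statistics.

```python
# Toy lint for lock contention risk: flag queries that filter on an unindexed
# column of a table that is both large and under concurrent writes.
# All schema metadata and thresholds here are hypothetical.

def flags_contention(query_tables, filter_columns, catalog):
    """Return (table, unindexed_columns) pairs likely to contend under load."""
    warnings = []
    for table in query_tables:
        meta = catalog[table]
        unindexed = [c for c in filter_columns.get(table, [])
                     if c not in meta["indexes"]]
        if unindexed and meta["rows"] > 100_000 and meta["writes_per_sec"] > 10:
            warnings.append((table, unindexed))
    return warnings

catalog = {
    "orders": {"rows": 1_000_000, "writes_per_sec": 50,
               "indexes": {"id", "customer_id"}},
}

# Filtering a million-row, write-hot table on an unindexed column: flagged.
print(flags_contention(["orders"], {"orders": ["status"]}, catalog))
# Filtering on an indexed column: clean.
print(flags_contention(["orders"], {"orders": ["customer_id"]}, catalog))
```

The point is not the heuristic itself but the inputs it needs: schema, indexes, and usage patterns held together, which is exactly what a local review of the SQL does not have.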

Communication. A new microservice calls an upstream API. That API already calls another service that calls back to the original. The cycle is invisible when reviewing any single service. An AI holding the full call graph could see the circular dependency before it causes cascading timeouts.
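Detecting such a cycle is mechanically simple once the full call graph is in view; the difficulty is only ever assembling that view. A depth-first search sketch, with hypothetical service names:

```python
# Find one cycle in a service call graph, if any, via depth-first search.
# Service names are hypothetical.

def find_cycle(call_graph):
    """Return one call cycle as a list of services, or None."""
    visiting, visited = set(), set()

    def dfs(node, path):
        visiting.add(node)
        path.append(node)
        for callee in call_graph.get(node, []):
            if callee in visiting:                    # back edge: a cycle
                return path[path.index(callee):] + [callee]
            if callee not in visited:
                cycle = dfs(callee, path)
                if cycle:
                    return cycle
        visiting.discard(node)
        visited.add(node)
        path.pop()
        return None

    for node in list(call_graph):
        if node not in visited:
            cycle = dfs(node, [])
            if cycle:
                return cycle
    return None

calls = {"checkout": ["inventory"], "inventory": ["pricing"], "pricing": ["checkout"]}
print(find_cycle(calls))  # ['checkout', 'inventory', 'pricing', 'checkout']
```

Reviewing any one of the three services shows only a reasonable outbound call; the cycle exists only in the whole graph.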

UI. A component re-renders on every state change. Harmless in isolation. But fifty instances on one screen, each triggering layout recalculation, produce visible lag. The human reviewer approves clean component code. An AI tracking instantiation counts and render paths could see the compound cost.
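The arithmetic of the compound cost is trivial; what is hard is knowing the instance count. A sketch, with hypothetical per-render costs, against a standard 16 ms frame budget:

```python
# Compound render cost: per-instance cost times instance count, checked
# against the frame budget. Component names and costs are hypothetical.

FRAME_BUDGET_MS = 16.0  # roughly 60 frames per second

def screen_overruns(screen, components):
    """Total per-frame layout cost of a screen; flag if it exceeds the budget."""
    total = sum(components[name]["cost_ms"] * count
                for name, count in screen.items())
    return total, total > FRAME_BUDGET_MS

components = {"PriceCell": {"cost_ms": 0.5}}

# One instance is harmless; fifty on one screen blow the frame budget.
print(screen_overruns({"PriceCell": 1}, components))   # (0.5, False)
print(screen_overruns({"PriceCell": 50}, components))  # (25.0, True)
```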

Probabilistic AI. A prompt template is updated. The change improves one use case. But the same template feeds three other workflows with different assumptions. The outputs shift in ways no single reviewer anticipates. An AI holding all downstream consumers and their expected distributions could flag the ripple effects.
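The check here is a fan-out lookup: which other workflows consume the template being edited? A minimal sketch, with hypothetical template and workflow names:

```python
# Warn when an edited prompt template feeds workflows beyond the one the
# edit was made for. Template and workflow names are hypothetical.

def ripple_warning(template, consumers, edited_for):
    """List the other workflows that share the template and will shift with it."""
    return [w for w in consumers.get(template, []) if w != edited_for]

consumers = {"summarise_v2": ["support_triage", "sales_digest", "exec_report"]}

print(ripple_warning("summarise_v2", consumers, edited_for="support_triage"))
# ['sales_digest', 'exec_report']
```

As with the other examples, the mechanism is simple; the prerequisite is a complete map of downstream consumers that no single reviewer holds.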

Each example shares a structure: local correctness, global cost. The problem emerges only when sufficient scale and coupling are held simultaneously in mind. Comprehensive attention could, in principle, reveal such patterns.

A caveat: these examples illustrate structural possibility, not demonstrated performance. Whether AI systems reliably catch such errors in practice remains an open empirical question. Systematic comparative studies of AI-assisted versus human-only code review are sparse; Sadowski et al. (2018) document human review practices but predate modern LLM capabilities. The claim is about what comprehensive attention makes possible, not about what current tools consistently achieve.

The Archaeological Function

This reframes AI's role. When confronting an existing system, especially one that has grown organically, lost its original architects, and accumulated undocumented decisions, AI becomes an archaeological instrument in a specific sense.

The system exists. Its structure contains information about its own organisation, its patterns, its failure modes. But that information is distributed across too many files for any human to synthesise. The original knowledge has dissipated. The current team inherits the artifact without inheriting the understanding that produced it.

AI can build a model of what the system actually is. It can trace dependencies, identify patterns, surface inconsistencies, map the actual architecture (as opposed to the documented architecture, which has inevitably drifted). It can answer questions about the codebase that no current team member can answer, because no current team member has traversed enough of the code to hold the relevant information in active attention.

This is genuinely valuable. The structural model, the comprehensive map of actual dependencies, patterns, and relationships, is often more accurate and more complete than anything the team possesses. It reveals what is actually there, how the pieces actually connect, where the actual patterns of usage lie.

But, and this is the limit that matters, the structural model is not the same as understanding in the sense of causal-explanatory knowledge: knowing not just what the structure is but why it came to be this way rather than some other way, and which elements of the structure are load-bearing versus accidental.

Structure Without Genesis, Revisited

Return to Hyppolite's distinction (Hyppolite, 1946/1974). Structure is the configuration as it stands. Genesis is the effortful process through which it came to be: the decisions, the constraints, the pressures, the struggles that shaped the structure into what it is. AI excels at structural archaeology: revealing what is here, how it connects, what patterns recur. AI cannot do genesis archaeology with certainty: revealing why this decision, what constraints shaped it, what alternatives were rejected, which choices were load-bearing and which were accidents.

A qualification. AI trained on many codebases can guess at genesis. It pattern-matches: modules shaped like this usually exist because of constraints like that. Such guesses may be useful starting points. But pattern-matching is not knowledge. The local context may differ from the training distribution. The actual reason may be idiosyncratic: a political compromise, a regulatory deadline, an incident no one documented. Probabilistic inference offers plausibility, not certainty. Hypotheses to investigate, not grounds for action.

Structure and genesis are differently accessible. Structure is in the code, recoverable by comprehensive analysis. Genesis is not. It lives in the history of decisions, the context of constraints, the pressures that shaped choices. Much of that history is nowhere: undocumented, unpreserved, gone when the original architects left.

Some of it is in version control, but version control captures what changed, not why. The commit message says "refactored authentication flow." It does not say "the previous flow broke under load during the October incident, and the Engineering Director mandated this approach over the technically superior alternative because it could ship in two weeks."

Some of it is in documentation, but documentation decays. It describes what someone intended at the time of writing, which may diverge from what actually happened as the system evolved. The architecture document shows clean module boundaries; the code shows a tangle of cross-cutting dependencies that accumulated after the document was written.

Some of it is in people's heads, but those people leave. The knowledge walks out the door. The new team inherits the artifact and must reverse-engineer not just what it does but why it is the way it is.

AI can tell you what the system is. It can offer probabilistic hypotheses about why the system is that way. It cannot tell you with certainty which decisions are load-bearing: the ones whose removal would cause cascade failures, versus the historical accidents and arbitrary choices that could be safely revised. That distinction requires genesis knowledge, and genesis is precisely what was never captured or has since become inaccessible.

The Observer History

In earlier work (Vasa, 2025), I introduced the concept of "observer history": the accumulated experiential knowledge of working within organisational systems that cannot be transferred to AI agents. The term draws on Maturana's biology of cognition: what an observer can perceive depends on their history of structural coupling with their environment (Maturana & Varela, 1987). You see what your history has shaped you to see. This is where that concept becomes concrete and consequential.

Observer history is the knowledge of: what actually happened (the decisions, the incidents, the evolution), what constraints actually shaped those decisions, which constraints are real versus phantom, and why particular choices were made when alternatives existed.

The code says this endpoint requires two-factor authentication. What the code does not say: a customer lost $40,000 because the previous version did not. The breach made the news. The CEO apologised publicly. The engineer who added the requirement did so at 2 a.m. the night after, hands still shaking. The requirement looks like defence in depth. It is actually a scar. Remove it, and you remove the one thing standing between the system and litigation.

The code says this cache exists with this expiration policy. What the code does not say: this cache exists because of an incident that no one documented. The system failed under load. Someone added the cache as an emergency fix. The expiration policy was chosen because it was "long enough" to help and "short enough" to not cause obvious staleness issues. There is no analysis backing those numbers. They are cargo cult values that happened to work and hardened into assumptions.

The code says this service must not be called more than once per transaction. What the code does not say: it double-charged 3,000 customers on Black Friday. The refunds took six weeks. The support team worked through Christmas. The founder personally called the angriest customers. The idempotency check looks like defensive programming. It is actually a tourniquet. Remove it, and the bleeding starts again.

This is observer history: the accumulated knowledge of what actually happened, what pressures actually shaped the system, what constraints are real versus phantom, which parts of the structure are deliberate and which are artifacts. AI can read the code. AI can hypothesise about common patterns. It cannot know with certainty that the documentation is obsolete, that the specification embeds a compromise, that the dependency structure reflects Conway's Law in action: "organizations which design systems are constrained to produce designs which are copies of the communication structures of these organizations" (Conway, 1968). The code mirrors the org chart of three years ago. The org chart has since changed. The code has not. AI cannot know this because it is not in the code. It is in the people who built and maintained it.

The Danger: Mistaking Structure for Understanding

Here is the trap, and it is easy to fall into.

AI provides a structural analysis of impressive comprehensiveness. The map is better than anything you had. The patterns are genuinely identified. The inconsistencies are correctly flagged. You feel that you now understand the system.

You do not. You have the structure. You do not have the genesis, not in the causal-explanatory sense that matters.

This matters when you try to change the system. The structural map tells you what connects to what. It does not tell you which connections are load-bearing in the sense of encoding undocumented constraints whose violation would cause failures. You make a change that the structural analysis suggests is safe: the dependency is there, but it looks vestigial, the pattern is inconsistent with the rest of the codebase, the module seems like an obvious candidate for refactoring.

The change ships. The system breaks in a way the structural analysis did not predict, because the structural analysis did not know that this particular connection, inconsistent though it appears, encodes a constraint that no one documented. The load-bearing decision looked like a historical accident. You could not tell the difference because you had structure without genesis.

This is the verification paradox. You can test that code does what specifications require. You cannot test for what was never specified. The constraint that lives in someone's head, or lived there before they left, is exactly what your tests do not cover. Generate-check loops work for bounded problems with clear success criteria. They fail silently for the unspecified.

The danger is not that AI gives you wrong structural information. The structural analysis is often accurate. The danger is that accurate structural information creates confidence that is not warranted for action, because the information that would let you act safely (the genesis, the observer history, the undocumented constraints) is exactly what the structural analysis cannot provide with certainty. Comprehensive attention to structure is necessary but not sufficient for safe modification. In Nyāya terms, the hetu (the evidential sign: comprehensive structural attention) does not guarantee the sādhya (the conclusion to be established: warranted confidence for action); it provides a necessary input that must be supplemented by genesis knowledge.

The Productive Collaboration

This analysis points toward collaboration rather than rejection of AI's role. The collaboration has a specific shape, and understanding it requires following the hypothesis through to action.

Step one: AI performs structural archaeology. What is the actual dependency graph? What patterns recur? Where are the inconsistencies? The output is a comprehensive structural map that exceeds what any human could produce manually.

Step two: Humans interpret the map through observer history, where it exists. This dependency looks vestigial: is it? This pattern is inconsistent: is it debt or deliberate? Sometimes the current team knows. Often they do not. The knowledge left with the people who had it.

Step three: When observer history is absent, humans must reconstruct it. This is where the work becomes distinctly human. You find the engineer who built that module, now at another company, and buy them coffee. You search old Slack threads for the incident that prompted the cache. You ask the support team why customers complained about that workflow. You read the commit messages from the month the architecture changed. AI can surface the questions. Only humans can pursue the answers through organisational memory and relationship.

Step four: Interpretation becomes hypothesis. Whether drawn from intact observer history or reconstructed through conversation, the interpretation is rarely certain. "This dependency probably encodes the constraint from the 2019 payment incident." "This pattern likely reflects the org structure before the platform team split." The hypothesis states what you believe and why.

Step five: AI tests the hypothesis against structure. "If this dependency encodes that constraint, where would we see its effects? What else would break if we removed it?" The human provides the theory. AI checks consistency with the evidence. Sometimes the structure corroborates. Sometimes it contradicts. The hypothesis sharpens.

Step six: The hypothesis shapes the action. This is the downstream payoff. A confident hypothesis (strong observer history, structural corroboration) justifies direct action. An uncertain hypothesis (reconstructed, partially corroborated) justifies cautious action: staged rollouts, feature flags, monitoring, easy reversal. A weak hypothesis (no observer history, structural ambiguity) justifies defensive action: change nothing until you know more, or change with extensive safeguards and the assumption that rollback will be needed.

The hypothesis is not academic. It is the bridge between structural knowledge and safe modification. Without it, you either act blindly (and break things) or freeze (and change nothing). With it, you act proportionally: confidence determines pace, uncertainty determines safeguards.
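The proportionality rule in step six can be stated compactly. The following sketch is a caricature, not a decision procedure: the two inputs and three policy labels are hypothetical simplifications of what is really a graded judgment.

```python
# Hypothesis confidence determines the action policy. The inputs and policy
# labels here are hypothetical simplifications.

def action_policy(observer_history, structural_corroboration):
    """Map hypothesis strength to how a change should be made."""
    if observer_history and structural_corroboration:
        return "direct"     # confident hypothesis: act directly
    if observer_history or structural_corroboration:
        return "staged"     # uncertain hypothesis: flags, monitoring, easy reversal
    return "defensive"      # weak hypothesis: learn more, or assume rollback

print(action_policy(True, True))    # direct
print(action_policy(False, True))   # staged
print(action_policy(False, False))  # defensive
```

Confidence determines pace; uncertainty determines safeguards.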

The Question of Preservation

One final movement of thought.

If observer history is essential and observer history walks out the door when people leave, how do we preserve it?

The obvious answer, documentation, fails in practice. Documentation decays. It describes intentions at the time of writing, not outcomes as they actually unfolded. It is written once and rarely updated. The system drifts, and the documentation becomes a historical artifact rather than a living map.

A better answer might be: AI as institutional memory.

Not AI that generates code, but AI that captures the reasoning behind decisions. Not documentation that describes the system, but dialogue that preserves the why. When an engineer makes a decision, they explain it to the AI. The explanation is preserved, searchable, connected to the code it concerns. When a future engineer encounters the code, the AI can surface not just what the code does but why it is that way, what constraints shaped it, what the alternatives were.

There is a partial precedent for this: decision records (Nygard, 2011; Tyree & Akerman, 2005) that document architectural choices and their rationales. The limitation of traditional decision records is maintenance burden and discoverability. AI could lower the burden (capturing explanations through natural dialogue rather than formal documentation) and improve discoverability (surfacing relevant decisions when engineers encounter related code).
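A minimal sketch suggests the shape such a store could take: rationales keyed to the code they concern, surfaced when an engineer opens related code. The fields, paths, and example record are hypothetical; real decision-record tooling is richer than this.

```python
# Sketch of a decision-record store keyed to code paths. Fields and the
# example record are hypothetical.

from dataclasses import dataclass

@dataclass
class DecisionRecord:
    path: str        # the code the decision concerns
    decision: str
    rationale: str   # the "why" that commit messages omit
    date: str

records = []

def capture(path, decision, rationale, date):
    """Preserve the reasoning behind a decision, linked to the code."""
    records.append(DecisionRecord(path, decision, rationale, date))

def decisions_for(path):
    """Surface recorded rationales when an engineer encounters related code."""
    return [r for r in records if path.startswith(r.path)]

capture("billing/charge.py", "idempotency check on charge()",
        "Double-charged customers during a Black Friday incident.", "2023-11-27")

print([r.decision for r in decisions_for("billing/charge.py")])
```

The capture step is the hard part, and it is organisational, not technical: someone has to say the rationale out loud while it is still known.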

There is a practical path to this. AI coding agents accelerate precisely the work that humans find tedious but necessary: maintaining build scripts, addressing linter warnings, keeping string literals and magic numbers in their proper places, structuring code for internationalisation and localisation, refactoring overlong methods into coherent units, fixing the minor glitches that accumulate into user frustration. This is discipline work. Humans find it draining because it demands vigilance without reward. AI systems find it trivial because vigilance without fatigue is what they offer. The discipline that is hard for humans is easy for machines.

When AI handles this maintenance burden, engineers gain time. More importantly, they gain cognitive space. The mental overhead of remembering to run the linter, of tracking down that one hardcoded value, of dreading the thousand-line method that needs splitting: that overhead dissipates. What remains is the work that requires judgment, context, interpretation.

This creates an opening. The time returned to engineers could be used to capture the decisions that usually go undocumented. The collaboration becomes bidirectional: AI scans the codebase and identifies trails that should have been in the documentation but were left out (why does this module exist? what constraint does this pattern encode? when was this exception handler added and what incident prompted it?). The engineer, no longer consumed by discipline work, has the bandwidth to answer. The answers accumulate into institutional memory. The observer history that would have walked out the door gets preserved, not through heroic documentation efforts but through the ordinary rhythm of AI-assisted development.

This remains speculative. The tooling does not exist in mature form. But the direction is suggestive: AI's role might include not just structural analysis but preservation of genesis, capturing the observer history that would otherwise dissipate.

The limit remains: captured explanations are still representations, not the tacit knowledge itself. The engineer who explains their decision captures what they can articulate. The tacit dimension of knowledge, the pattern recognition, the felt sense of what will work based on experiences that resist verbalisation (Polanyi, 1966), may resist capture. But partial preservation exceeds none. A system with captured decision rationales, even incomplete ones, is more navigable than one where genesis has entirely dissipated.

Conclusion

Where does human genesis most matter? The first essay argued for encoding at whatever level you operate. The second argued for composition theory that lives between the things composed. This essay argues for observer history and the interpretation of structure.

Across all three, the claim is consistent: genesis remains essential. It relocates as AI capability grows. It does not disappear.

AI can see the cathedral. Humans must know why the stones are placed as they are. That knowledge is built through traversal, preserved through institutional memory, applied through the collaboration between structural comprehension and interpretive understanding.

The archaeological reversal is real. And human genesis remains irreducible: not despite AI's capabilities, but precisely because of what those capabilities cannot reach.


References

Anthropic. (2024). Claude 3 model card. Retrieved from https://www.anthropic.com

Conway, M. E. (1968). How do committees invent? Datamation, 14(4), 28-31.

Cowan, N. (2001). The magical number 4 in short-term memory: A reconsideration of mental storage capacity. Behavioral and Brain Sciences, 24(1), 87-114.

Deutsch, D. (2011). The Beginning of Infinity: Explanations That Transform the World. Allen Lane.

Google. (2024). Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context. arXiv preprint.

Hyppolite, J. (1946/1974). Genesis and Structure of Hegel's Phenomenology of Spirit (S. Cherniak & J. Heckman, Trans.). Northwestern University Press.

Maturana, H. R., & Varela, F. J. (1987). The Tree of Knowledge: The Biological Roots of Human Understanding. Shambhala.

Miller, G. A. (1956). The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review, 63(2), 81-97.

Naur, P. (1985). Programming as theory building. Microprocessing and Microprogramming, 15(5), 253-261.

Nygard, M. T. (2011). Documenting architecture decisions. Retrieved from https://cognitect.com/blog/2011/11/15/documenting-architecture-decisions

Polanyi, M. (1966). The Tacit Dimension. University of Chicago Press.

Sadowski, C., Söderberg, E., Church, L., Sipko, M., & Bacchelli, A. (2018). Modern code review: A case study at Google. Proceedings of the 40th International Conference on Software Engineering: Software Engineering in Practice, 181-190.

Tyree, J., & Akerman, A. (2005). Architecture decisions: Demystifying architecture. IEEE Software, 22(2), 19-27.

Vasa, R. (2025). AI changed what's possible. Knowing what's worth doing did not get easier. AI Engineering Leaflet. Retrieved from https://aiengineering.leaflet.pub/3mdby4o5cbk2i