
Crystals of Code

The vibe flows, but also crystallizes. How LLMs, code, and crystals are the same mathematics — and what that means for building with AI.

The thesis
LLM text generation and crystal growth are not analogous — they are mathematically identical. Both are instances of sampling from Boltzmann distributions defined by energy functions, where temperature controls the order-disorder tradeoff, and local rules produce global structure through pattern propagation. A single prompt is a nucleation event. The output is the crystal that grows from it. Proven formally in December 2025.

The Identity

The softmax function that governs every token choice in every LLM IS the Boltzmann distribution from statistical physics. Not "like it." IS it.

LLM token selection:
P(token_i) = exp(z_i / T) / Σ_j exp(z_j / T)

Statistical mechanics (Boltzmann-Gibbs):
P(state_i) = exp(-E_i / kT) / Σ_j exp(-E_j / kT)

Same equation. The identification is exact:

LLM | Physics | Mathematics
Logit z_i | Negative energy -E_i/k | Unnormalized log-probability
Temperature T | Temperature T | Entropy-energy tradeoff parameter
Σ_j exp(z_j/T) | Partition function Z | Normalizing constant
softmax(z_i/T) | Occupation probability p_i | Boltzmann distribution
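The identification in the table above is checkable in a few lines. A minimal sketch in plain Python (no ML library): softmax over logits at temperature T and the Boltzmann distribution over energies E_i = -z_i (units with k = 1) produce the same probabilities.

```python
import math

def softmax(logits, T=1.0):
    """LLM token distribution: P(i) = exp(z_i/T) / sum_j exp(z_j/T)."""
    exps = [math.exp(z / T) for z in logits]
    Z = sum(exps)                      # the partition function
    return [e / Z for e in exps]

def boltzmann(energies, T=1.0):
    """Physics: P(i) = exp(-E_i/T) / sum_j exp(-E_j/T), with k = 1."""
    exps = [math.exp(-E / T) for E in energies]
    Z = sum(exps)
    return [e / Z for e in exps]

logits = [2.0, 1.0, 0.5]
energies = [-z for z in logits]        # identify logit with negative energy

for T in (0.5, 1.0, 2.0):
    p_llm = softmax(logits, T)
    p_phys = boltzmann(energies, T)
    assert all(abs(a - b) < 1e-12 for a, b in zip(p_llm, p_phys))
```

The logit values are arbitrary illustrations; the equality holds for any logits and any T > 0.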

Zhao et al. (2025, arXiv:2512.15605) proved the full formal bijection: every autoregressive language model implicitly defines an energy landscape over the space of all possible token sequences. Every energy-based model has a unique autoregressive decomposition. The negative log-probability of a sequence under the LLM is its energy:

E(x) = -Σ_t log p(x_t | x_{<t}) + const

Shocking connection: the free energy principle

The Boltzmann distribution is the unique distribution that minimizes the Helmholtz free energy F = ⟨E⟩ - T·S, where S is Shannon entropy. The softmax attention mechanism implicitly minimizes free energy at every step. High logit = low energy = more attention. Temperature controls the tradeoff: low T → attend to the single best match (energy minimization), high T → attend broadly (entropy maximization). This is precisely the energy-entropy tradeoff in thermodynamics. Jaynes 1957, Baroni et al. 2024
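The minimization claim can be checked numerically. A sketch on a three-state system (energies and temperature are arbitrary illustrative values): the Boltzmann distribution's free energy F = ⟨E⟩ - T·S beats every randomly sampled competitor.

```python
import math, random

def free_energy(p, E, T):
    """Helmholtz free energy F = <E> - T*S, with Shannon entropy S in nats."""
    avg_E = sum(pi * Ei for pi, Ei in zip(p, E))
    S = -sum(pi * math.log(pi) for pi in p if pi > 0)
    return avg_E - T * S

def boltzmann(E, T):
    w = [math.exp(-Ei / T) for Ei in E]
    Z = sum(w)
    return [wi / Z for wi in w]

E = [0.0, 1.0, 3.0]
T = 0.7
p_star = boltzmann(E, T)
F_star = free_energy(p_star, E, T)

# Any other normalized distribution has free energy >= F_star (Gibbs inequality).
random.seed(0)
for _ in range(1000):
    w = [random.random() for _ in E]
    q = [wi / sum(w) for wi in w]
    assert free_energy(q, E, T) >= F_star - 1e-9
```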

The Growth Process

Both processes are sequential, locally determined, and produce emergent global order from local rules.

Crystal Growth | LLM Generation | Shared Mathematics
Seed crystal provides initial lattice template | Prompt provides initial pattern template | Boundary condition / initial state
Growth proceeds atom by atom | Generation proceeds token by token | Sequential sampling from Boltzmann distribution
Unit cell repeats: local rules → global order | Learned patterns repeat: local prediction → coherent text | Translational symmetry / pattern propagation
Low T → rigid, perfect crystal | Low T → deterministic, repetitive output | Ground state / energy minimum
High T → liquid/gas, no structure | High T → creative, eventually incoherent | Maximum entropy / thermal disorder
Defects (vacancies, dislocations) | Errors (hallucinations, contradictions) | Broken local symmetry / imperfect attachment

Compression IS crystallization

A crystal takes a chaotic liquid and extracts the pattern — the unit cell — then propagates it. O(N) information collapses to O(1). LLM training does the same: it takes the chaotic soup of the internet and extracts patterns into weights. Deletang et al. (2024, ICLR) showed that Chinchilla 70B literally outperforms PNG at image compression and FLAC at audio compression — because prediction = compression = understanding.

Structure | Description | Kolmogorov Complexity
Perfect crystal | Periodic | O(1): just the unit cell
Quasicrystal | Ordered, aperiodic | O(1): projection rules from higher dimension
Defective crystal | Mostly periodic + defects | O(1) + O(d): unit cell + defect catalog
Glass | Disordered, frozen | O(N): must specify every atom
Liquid | Disordered, flowing | O(N) per timestep

Kolmogorov 1965, Li & Vitanyi 2008, Krivovichev 2012, Estevez-Rams & Gonzalez-Ferez 2009

Crystal growth is information compression. LLM training is information compression. The Third Law of Thermodynamics (S=0 at T=0) is an information-theoretic statement: the ground state requires zero bits to describe beyond the rules. Zurek 1989
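The O(1) vs O(N) distinction is visible with any off-the-shelf compressor. A sketch using zlib as a crude proxy for Kolmogorov complexity: a periodic "crystal" of bytes collapses to almost nothing, while random "glass" is essentially incompressible.

```python
import os
import zlib

n = 100_000
crystal = b"ABCD" * (n // 4)    # periodic: the whole string is one unit cell
glass = os.urandom(n)           # frozen disorder: every byte must be specified

c_crystal = len(zlib.compress(crystal, 9))
c_glass = len(zlib.compress(glass, 9))

print(f"periodic {n} bytes -> {c_crystal} compressed")
print(f"random   {n} bytes -> {c_glass} compressed")
# The crystal compresses to a tiny fraction of its size; the glass does not.
```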

Phase Transitions

Grokking = crystallization

When training an LLM on modular arithmetic, it first memorizes (stores each example = amorphous/liquid state), then suddenly generalizes (discovers the algorithm = crystallization). Levi et al. (ICLR 2024) proved this is a first-order phase transition — mathematically identical to water freezing:

  MEMORIZATION CIRCUIT               GENERALIZATION CIRCUIT
  (amorphous / glass)                (crystal)

  Lookup table                       Algorithm
  High weight norm                   Low weight norm
  O(N) complexity                    O(1) complexity
  Stores each example                Encodes the rule

  -------- weight decay slowly makes crystal favorable -------->
  -------- nucleation barrier delays the transition ----------->
  ====> GROKKING (first-order phase transition) ====>

The two circuits coexist during the transition, exactly like ice and water at 0°C. Nanda et al. (2023) showed the generalization circuit for modular addition learns Fourier features — embedding numbers on a circle and computing angle sums. A compact, crystalline algorithm.

Power et al. 2022, Nanda et al. 2023, Levi et al. ICLR 2024, Varma et al. 2023, Chen et al. ICLR 2025
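The circle algorithm Nanda et al. describe can be sketched by hand (an illustration of the idea, not the learned circuit itself): embed each residue as an angle on the unit circle, add angles via complex multiplication, and read the sum back off the circle.

```python
import math

p = 7  # modulus

def embed(n):
    """Place residue n on the unit circle at angle 2*pi*n/p."""
    theta = 2 * math.pi * n / p
    return (math.cos(theta), math.sin(theta))

def add_on_circle(a, b):
    """Compute (a + b) mod p by rotating a's embedding by b's angle."""
    ax, ay = embed(a)
    bx, by = embed(b)
    # complex multiplication = angle addition
    cx, cy = ax * bx - ay * by, ax * by + ay * bx
    theta = math.atan2(cy, cx) % (2 * math.pi)
    return round(theta * p / (2 * math.pi)) % p

# The crystalline algorithm is exact for every pair of residues.
for a in range(p):
    for b in range(p):
        assert add_on_circle(a, b) == (a + b) % p
```

The compactness is the point: O(1) trigonometry replaces an O(N) lookup table.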

Scaling laws = thermodynamic limits

Chinchilla loss formula (Hoffmann et al. 2022)
L(N, D) = L_∞ + A/N^α + B/D^β

L_∞ is the irreducible entropy of language — genuine unpredictability. No model can beat it. This is the ground state energy. The correction terms A/N^α and B/D^β are finite-size scaling corrections, identical in form to corrections in statistical mechanics.
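A sketch of the formula using the fit constants reported by Hoffmann et al. (L_inf = 1.69, A = 406.4, B = 410.7, alpha = 0.34, beta = 0.28; treat the exact numbers as illustrative):

```python
def chinchilla_loss(N, D, L_inf=1.69, A=406.4, B=410.7, alpha=0.34, beta=0.28):
    """L(N, D) = L_inf + A/N^alpha + B/D^beta  (Hoffmann et al. 2022 fit)."""
    return L_inf + A / N**alpha + B / D**beta

# Scaling up parameters N and data D only shrinks the correction terms;
# L_inf, the irreducible entropy, is the floor no model can pass.
for N, D in [(1e9, 2e10), (7e10, 1.4e12), (1e12, 2e13)]:
    print(f"N={N:.0e}  D={D:.0e}  L={chinchilla_loss(N, D):.3f}")
```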

Shocking connection: universality classes

The renormalization group (Wilson, Nobel 1982) explains why microscopically different systems have identical critical exponents at phase transitions. Magnets, fluids, and percolation share the same physics near criticality. If grokking is a genuine phase transition, its critical exponents define a universality class — and completely different architectures (transformers, MLPs, CNNs) trained on different tasks should exhibit the same critical behavior. Early evidence supports this. Doshi et al. 2024, Bahri et al. 2024

Symmetry Breaking

Crystallization IS spontaneous symmetry breaking

A liquid has continuous rotational and translational symmetry — it looks the same everywhere, in every direction. When it crystallizes, that continuous symmetry snaps to a discrete subgroup: one of the 230 space groups in 3D. The liquid "chose" a lattice orientation. Nothing forced it. The symmetry of the laws is preserved, but the symmetry of the state is broken.

Mathematically: if the system's symmetry group is G and its ground state has residual symmetry H, the space of equivalent ground states is the coset space G/H. The broken generators produce massless excitations — Goldstone bosons (Goldstone 1961). In crystals, these are phonons. In ferromagnets, magnons.

A prompt IS explicit symmetry breaking

Before a prompt, the LLM's output distribution is symmetric across all possible topics. The prompt breaks this symmetry — it selects a direction in output space, exactly like applying a magnetic field to a paramagnet. The specificity of the prompt = the strength of the field.

Symmetry Concept | Crystal | LLM
Spontaneous breaking | Liquid → crystal (choosing a lattice) | Training (choosing weights from random init)
Explicit breaking | Applied field / substrate | Prompt / system instruction
Goldstone modes | Phonons (easy deformations) | Easy refactoring directions within the paradigm
Cascading breaking | G → H → K (cooling a mineral) | Progressive refinement: topic → style → specifics

What CAN'T break

Mermin-Wagner theorem

Continuous symmetries cannot spontaneously break in ≤2 dimensions at finite temperature. Thermal fluctuations destroy long-range order. Prediction: very shallow networks lack the "dimensionality" for certain types of long-range coherence.

Mermin & Wagner 1966, Nobel 2016 (Kosterlitz-Thouless)

Topological protection

Some states are protected by topology, not symmetry. You cannot destroy them without closing the bulk energy gap. Topological insulators have surface states immune to disorder. Time crystals (2017) break time-translation symmetry — repeating in time the way crystals repeat in space.

Haldane, Nobel 2016. Wilczek 2012, Zhang et al. 2017

Higher Dimensions

Crystals in arbitrary dimensions

Dimension | Space Groups | Notable Lattice | Connection
3D | 230 | FCC, BCC, diamond | All natural crystals
4D | 4,783 | 24-cell lattice | Quasicrystal projections
8D | | E8 lattice: densest sphere packing | Viazovska proof (Fields Medal 2022), Lie groups, string theory
24D | | Leech lattice: 196,560 kissing number | Monster group, monstrous moonshine, error-correcting codes

Quasicrystals ARE higher-dimensional crystals projected down

A Penrose tiling (2D, fivefold symmetry, aperiodic) is literally a 2D slice of a 5D periodic lattice. An icosahedral quasicrystal (Shechtman, Nobel 2011) is a projection from 6D. The cut-and-project method: take a higher-dimensional periodic lattice, cut through it at an irrational angle, project nearby points. Result: ordered but aperiodic. O(1) complexity.
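Cut-and-project fits in a few lines in one dimension. A sketch of the Fibonacci chain, the 1D quasicrystal: slice the 2D square lattice at the irrational slope 1/φ and record which lattice lines the slice crosses.

```python
import math

phi = (1 + math.sqrt(5)) / 2  # the golden ratio

def fibonacci_chain(n):
    """Cut-and-project in 1D: walk a line of slope 1/phi through the 2D
    integer lattice; each step crosses a horizontal lattice line (L) or
    not (S). The result is ordered but aperiodic."""
    return "".join(
        "L" if math.floor((k + 1) / phi) - math.floor(k / phi) else "S"
        for k in range(1, n + 1)
    )

chain = fibonacci_chain(30)
print(chain)
assert "SS" not in chain     # aperiodic order: no two shorts ever adjacent
assert "LLL" not in chain    # and never three longs in a row
```

The whole infinite aperiodic sequence is generated by one line of projection rules: O(1) complexity, exactly as the table above claims.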

Shocking connection: LLM embeddings ARE high-dimensional lattices

A transformer's residual stream is a ~4,096-dimensional space. Concepts are encoded as directions. Recent research shows cyclical concepts (months, days) arrange on circles, ordinal concepts (small/medium/large) on lines, categories form clusters (Chalnev et al. 2025). The model represents more concepts than it has dimensions through superposition — encoding features as nearly-orthogonal directions, tolerating small interference. This exploits the same geometry as high-dimensional lattice packings: the Johnson-Lindenstrauss lemma guarantees that random directions in high-D space are nearly orthogonal. Elhage et al. 2022, Park et al. 2024
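The near-orthogonality claim is easy to test directly. A sketch with random unit vectors at the ~4,096-dimensional width the text assumes (only 50 vectors, to keep pure Python fast):

```python
import math, random

random.seed(0)
d = 4096       # residual-stream width assumed in the text
n_vecs = 50    # kept small so pure Python stays fast

def rand_unit(dim):
    """A random direction: Gaussian components, normalized to unit length."""
    v = [random.gauss(0, 1) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

vecs = [rand_unit(d) for _ in range(n_vecs)]
sims = [
    abs(sum(a * b for a, b in zip(vecs[i], vecs[j])))
    for i in range(n_vecs) for j in range(i + 1, n_vecs)
]
# Typical |cosine| is ~1/sqrt(d) ≈ 0.016: every pair is nearly orthogonal,
# so far more than d features can coexist with small interference.
print(f"max |cosine| over {len(sims)} pairs: {max(sims):.3f}")
```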

Aperiodic tilings and computation

Wang (1961) asked: given a set of tiles, can they tile the plane? Berger (1966) proved this is undecidable — equivalent to the halting problem. Turing machines can be encoded as Wang tile sets. If the machine halts, the tiles can't tile the plane. Crystal growth IS computation. Tiling IS the halting problem.

The 2023 discovery of the hat monotile (Smith, Myers, Kaplan, Goodman-Strauss) — a single shape that tiles the plane but only aperiodically — shows that aperiodic order can emerge from the simplest possible rules.

Shocking connection: lattices = error-correcting codes

Construction A (Leech, Sloane) converts binary error-correcting codes into lattices. The Hamming [8,4,4] code produces the E8 lattice. The Golay [24,12,8] code produces the Leech lattice. Information theory and geometry are the same subject. Shannon's channel coding theorem (maximize information per symbol) and the sphere-packing problem (maximize density per dimension) are dual statements. Conway & Sloane 1999

Software as Condensed Matter

The phase diagram of software

Phase | Physical | Software | Key Property
Gas | No interactions | Brainstorming, pseudocode | Maximum entropy, no structure
Liquid | Short-range order | Prototyping | Reshapes easily, no rigidity
Glass | Frozen disorder | Legacy spaghetti code | Metastable. Looks solid. No long-range order.
Polycrystal | Ordered grains, disordered boundaries | Microservices | Fault isolation at boundaries, but overhead
Single crystal | Complete long-range order | Well-structured monolith | Maximum consistency, but brittle: cracks propagate
Quasicrystal | Ordered, aperiodic | Event-driven / microkernel | Ordered but not periodic. No single point of failure.

Technical debt = crystal defects

Defect | Crystal | Code
Point (vacancy) | Missing atom | Missing abstraction, TODO
Point (interstitial) | Extra atom | Unnecessary dependency
Dislocation | Line defect, propagates under stress | Broken interface propagating through call chains
Stacking fault | Wrong layer sequence | Wrong abstraction level
Twin boundary | Mirror plane in crystal | Duplicated functionality
Grain boundary | Misoriented regions | Module boundary with convention mismatch
Void | Missing region | Dead code

The Taylor hardening law: stress to deform ∝ √(dislocation density). Software analog: effort to modify code ∝ √(technical debt density).

Refactoring IS annealing

Simulated annealing: heat the system (accept disorder), slowly cool (enforce constraints) → lower-energy state. Refactoring: relax constraints (accept temporary breakage), incrementally re-impose structure. Cool too fast = new glass (new spaghetti). Cool slowly enough = crystal (clean architecture).
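The textbook algorithm, sketched on a deliberately rugged 1D landscape (the landscape and cooling schedule are illustrative choices):

```python
import math, random

random.seed(1)

def energy(x):
    """Rugged 1D landscape: a parabola with many sinusoidal local traps."""
    return x * x + 10 * math.sin(3 * x) + 10

x = 8.0     # start far from the global minimum
T = 5.0     # start hot: accept lots of disorder
while T > 1e-3:
    x_new = x + random.gauss(0, 0.5)
    dE = energy(x_new) - energy(x)
    # Metropolis rule: always accept downhill; accept uphill with
    # Boltzmann probability exp(-dE/T).
    if dE < 0 or random.random() < math.exp(-dE / T):
        x = x_new
    T *= 0.999  # cool slowly; cool too fast and you freeze into a glass

print(f"final x = {x:.3f}, energy = {energy(x):.3f}")
```

The same loop describes refactoring: the temporary breakage is the uphill moves, and the schedule is the discipline of re-imposing structure gradually.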

LLM code generation IS epitaxial growth

Epitaxy = growing a crystal on an existing substrate, where the new material's structure is determined by the substrate. When an LLM reads your codebase (substrate) and generates code (growth layer), the generated layer inherits the substrate's lattice: its naming conventions, idioms, and architecture.

The context window = interaction range. A 200K-token model has a longer "coherence length" than a 4K model — it can maintain crystallographic consistency over larger codebases.

Design patterns ARE unit cells

A design pattern (Singleton, Factory, Observer) is a repeatable structural unit that propagates through a codebase exactly like a unit cell in a crystal. It contains all the information needed to instantiate itself at any point. The codebase's space group is the complete set of architectural symmetry operations.

Fowler 1999, Martin 2003, Lehman 1996. See research/07-software-as-matter.md for the full mapping.

Non-Equilibrium: The Real Physics

Crystal growth is NOT an equilibrium process. It's a dissipative structure (Prigogine, Nobel 1977) — it requires continuous energy input and entropy export. Systems far from equilibrium can self-organize into states more ordered than equilibrium.

LLM inference IS a dissipative structure

Cut the power and the structure collapses. It exists only while being driven.

The edge of chaos

Computation is maximized at the boundary between order and chaos (Langton's λ parameter). Too ordered → repetitive, no information processing. Too chaotic → noise, no information retention. The optimal LLM temperature lives at this edge.
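The order-disorder dial is directly measurable as Shannon entropy of the token distribution. A sketch with made-up logits:

```python
import math

def softmax(logits, T):
    exps = [math.exp(z / T) for z in logits]
    Z = sum(exps)
    return [e / Z for e in exps]

def entropy(p):
    """Shannon entropy in bits."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

logits = [4.0, 2.0, 1.0, 0.5, 0.1]
for T in (0.1, 0.5, 1.0, 2.0, 10.0):
    H = entropy(softmax(logits, T))
    print(f"T={T:5.1f}  entropy={H:.3f} bits")
# T -> 0: H -> 0 (crystal: deterministic, repetitive).
# T -> inf: H -> log2(5) (gas: uniform noise). The edge lies between.
```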

Dendritic instability = runaway repetition

The Mullins-Sekerka instability (1963): a growing crystal face becomes unstable — protrusions grow faster (they see steeper gradients), producing tree-like dendrites. The LLM analog: when the model locks onto a pattern, it self-reinforces through the context window, producing repetitive loops. This is dendritic overgrowth.

Surface tension (capillarity) stabilizes crystals against short-wavelength fragmentation. In LLMs, attention and learned constraints act as "surface tension" — preventing output from fragmenting into noise.

Shocking connection: Turing patterns in transformers

Turing's 1952 morphogenesis paper showed that diffusion + local reactions produce spatial patterns. A transformer has the same structure: the MLP layers are local reactions (nonlinear computation at each position), and attention is diffusion (mixing information across positions). LayerNorm acts as the fast-diffusing inhibitor. The transformer IS a reaction-diffusion system, and its ability to produce structured output follows from the same mathematics as animal stripe patterns. Turing 1952. See research/09-non-equilibrium.md

The Formal Bridge

Hopfield = attention = energy minimization

Ramsauer et al. (2021) proved that the transformer attention mechanism IS the update rule of a modern Hopfield network. Hopfield networks are spin systems — physical systems that minimize energy. John Hopfield shared the 2024 Nobel Prize in Physics for this. Therefore:

Attention IS energy minimization. Each attention head searches for the stored pattern (memory) that best matches the current query — by gradient descent on an energy function. The softmax attention weights are the Boltzmann probabilities of the stored patterns.
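The update rule fits in a few lines. A sketch of one modern-Hopfield step on toy patterns (beta plays the role of inverse temperature; the stored patterns and the corrupted query are made up):

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    Z = sum(exps)
    return [e / Z for e in exps]

def hopfield_update(patterns, query, beta=4.0):
    """One step of the modern Hopfield rule: xi' = X^T softmax(beta * X xi).
    This is softmax attention with the stored patterns serving as both
    keys and values (the Ramsauer et al. 2021 identification)."""
    scores = [beta * sum(q * p for q, p in zip(query, pat)) for pat in patterns]
    attn = softmax(scores)  # Boltzmann probabilities of the stored patterns
    dim = len(query)
    return [sum(attn[k] * patterns[k][i] for k in range(len(patterns)))
            for i in range(dim)]

stored = [[1, 0, 0, 1], [0, 1, 1, 0], [1, 1, 0, 0]]   # the "memories"
noisy = [0.9, 0.2, -0.1, 0.8]                          # corrupted first memory
retrieved = hopfield_update(stored, noisy)
print([round(x, 3) for x in retrieved])
# One update step snaps the noisy query (nearly) onto the stored pattern.
```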

Spin glasses and loss landscapes

Parisi's replica symmetry breaking (Nobel 2021) describes the energy landscape of spin glasses — disordered magnets with competing interactions. The landscape is ultrametric: states organize into a hierarchical tree of nested basins. The loss landscape of neural networks has the same structure. SGD = thermal fluctuations. Weight decay = annealing pressure. Batch size = heat capacity.

Diffusion models ARE crystallization

Diffusion models (DDPM, score matching) reverse a noising process. The forward process = melting. The reverse process = crystallization. The score function ∇x log p(x) is the negative gradient of the energy — the force field that guides atoms to lattice sites.

The complete correspondence

Crystal Growth | LLM Generation | Mathematics
Supersaturated solution | Prompt + model weights | Initial conditions
Nucleation | First tokens generated | Symmetry breaking
Unit cell | Learned pattern/template | Repeating structural unit
Growth front | Generation position | Interface between order and disorder
Temperature | Temperature parameter | Boltzmann T
Crystal face | Consistent style | Symmetry constraint
Defect | Hallucination | Broken local symmetry
Grain boundary | Topic change | Interface between ordered regions
Annealing | Fine-tuning / RLHF | Controlled thermal treatment
Polymorphism | Multiple valid completions | Degenerate ground states
Phase transition | Grokking / emergence | Order parameter discontinuity
Dendritic instability | Repetitive loops | Mullins-Sekerka instability
Epitaxy | In-context code generation | Growth on existing substrate
Twinning | Code duplication | Mirror symmetry defect

Chemical Bonding

The crystal analogy captures pattern propagation — but LLMs generating code do something more specific. They bind to existing APIs, extending code at attachment points like a molecule docking into a binding pocket. This is not accidental. It is the dominant mode of LLM code generation.

The binding hypothesis
An LLM generating code is not inventing from scratch. It identifies binding sites on existing APIs — function signatures, class interfaces, import patterns — and attaches new functional groups at those points. Induction heads are the molecular recognition apparatus. Training data frequency defines a Gibbs free energy landscape over the space of possible bindings.

Where the analogy works

Valence & binding sites

An API’s valence is its combining capacity — the number of parameters, required arguments, and configuration options it exposes. A function with three required parameters has valence 3. An unsatisfied required parameter is a radical — reactive until filled. A zero-argument pure function is a noble gas — inert, self-contained.

Chemical Property | Code Analog | Quality
Valence number | Parameter count / required args | Strong
Unsatisfied valence (radical) | Required param without default | Strong
Noble gas (full shell) | Zero-argument pure function | Strong
Multivalent atom (carbon) | Highly configurable API (pandas, React) | Strong
Coordination number | Number of callers / dependents | Moderate

Highly multivalent APIs serve as structural hubs in codebases, just as carbon serves as the backbone of organic chemistry. Libraries like pandas, React, and Express have high “carbon-like” valence.
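The valence metric can be computed mechanically with Python's inspect module. A sketch ("valence" here is this page's metaphor, not a standard software metric):

```python
import inspect

def valence(func):
    """Count required parameters: the 'combining capacity' of an API
    in the sense used above (illustrative metric only)."""
    sig = inspect.signature(func)
    return sum(
        1 for p in sig.parameters.values()
        if p.default is p.empty
        and p.kind in (p.POSITIONAL_ONLY, p.POSITIONAL_OR_KEYWORD, p.KEYWORD_ONLY)
    )

def noble_gas():                 # valence 0: inert, self-contained
    return 42

def radical(required_arg):       # valence 1: reactive until filled
    return required_arg

def carbon(a, b, c, d=None):     # valence 3, plus one optional binding site
    return (a, b, c, d)

assert valence(noble_gas) == 0
assert valence(radical) == 1
assert valence(carbon) == 3
```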

Lock-and-key vs. induced fit

Fischer’s 1894 lock-and-key model maps to static type systems: the type signature is the lock, the argument is the key. The compiler is the molecular recognition apparatus. But Fischer was wrong about rigidity — Koshland’s 1958 induced fit model (enzyme adapts to substrate) maps to duck typing and dynamic dispatch.

Binding Model | Chemical Analog | Language Implementation | Error Rate
Rigid fit | Fischer's lock-and-key | Rust borrow checker, Haskell types | Very low (compile-time)
Semi-rigid | Modern enzyme model | TypeScript strict, Go interfaces | Low
Induced fit | Koshland's model | Python duck typing | Higher (runtime)
Promiscuous | Enzyme promiscuity | JavaScript type coercion | Very high

Just as enzymes with higher specificity produce fewer unwanted byproducts, languages with stricter type systems produce fewer runtime errors. And just as high-specificity enzymes are slower to evolve new functions, strictly-typed codebases are slower to adapt.

Bond types = coupling strength

Bond Type | Code Coupling | Example
Covalent | Direct call + shared mutable state | obj.method() with mutation
Ionic | Event-driven with typed contracts | TypeScript event emitter
Hydrogen | Interface/protocol conformance | Go interface, Python Protocol
Van der Waals | Shared conventions | JSON naming conventions across services
Metallic | Shared mutable global state | Global vars, shared DB

Covalent code bonds are hard to break — just like covalent chemical bonds. Van der Waals forces are individually weak but collectively enable gecko adhesion; naming conventions are individually minor but collectively enable codebases to function. Metallic bonding (delocalized electrons) maps to shared mutable state: rapid communication (fast reads), impossible to reason about locally.

The LLM as catalyst

A catalyst lowers activation energy without being consumed. The LLM does exactly this: it lowers the barrier to producing working code while emerging unchanged from every generation. And like a real catalyst, it can be poisoned; with outdated context, deprecated API usage rises to 70-90%.

Wang et al. ICSE 2025, CodeHalu AAAI 2025

Allosteric regulation: action at a distance

In biochemistry, an allosteric effector binds at a site other than the active site, changing the protein’s conformation and indirectly altering its function. Software has precise analogs: config files, feature flags, environment variables, dependency injection, and the CSS cascade all change behavior at distant locations through indirect binding events.

Shocking connection: induction heads ARE molecular recognition

Anthropic’s induction heads (Olsson et al. 2022) implement fuzzy pattern matching: if the context contains [A][B]…[A], predict [B]. This is directly analogous to molecular recognition in biochemistry — binding doesn’t require exact shape matching, just sufficient complementarity. When the LLM sees requests.get(, induction circuits retrieve patterns that followed it in training data, recognizing the binding site and attaching the complementary functional group. Olsson et al. 2022, Anthropic 2025
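Stripped of attention heads and embeddings, the match-and-copy logic is a short loop. A toy sketch of the circuit's behavior, not the actual learned mechanism:

```python
def induction_predict(context):
    """Toy induction head: if the context ends in token A and contains an
    earlier [A][B] pair, predict B (the match-and-copy circuit of
    Olsson et al. 2022, reduced to its bare logic)."""
    current = context[-1]
    for i in range(len(context) - 2, -1, -1):  # most recent match wins
        if context[i] == current:
            return context[i + 1]
    return None  # no binding site found in context

code_tokens = ["requests", ".", "get", "(", "url", ")", "\n",
               "requests", ".", "get", "("]
print(induction_predict(code_tokens))  # the earlier "(" was followed by "url"
```

Real induction heads do this fuzzily, over learned representations rather than exact tokens, which is what makes the molecular-recognition comparison apt: complementarity, not identity.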

Where the analogy is bogus

No conservation laws

Code can be created from nothing and destroyed without residue. A refactoring can reduce 1,000 lines to 100 without losing functionality. A prompt can produce 10,000 lines from 10 words. You cannot write balanced equations for code transformations.

No spatial locality

Chemical bonds depend on force laws that decay with distance, like Coulomb's 1/r². Code has no physical space. Any two functions can call each other regardless of “distance.” Steric hindrance, bond angles, molecular geometry — all meaningless. (Exception: the LLM context window imposes a locality constraint, but it's topological, not spatial.)

Trivially reversible

In chemistry, breaking a covalent bond requires energy proportional to bond strength. In code, git revert cleaves any bond at zero thermodynamic cost. The barriers are cognitive and economic, not physical.

No equilibrium

Chemical systems reach thermodynamic equilibrium. Software never does. Per Lehman’s Laws, a used system must be continually adapted or it degrades. Software is permanently far from equilibrium.

Non-fungible components

Every hydrogen atom is identical. No two functions are, even with the same signature. Two sort(list) -> list implementations with different algorithms are chemically “identical” (same valence) but computationally distinct.

Intentionality

Chemistry has no concept of purpose. Software is designed to accomplish goals. The teleological dimension of code has no chemical analog.

Hallucinated bonds: when the chemistry goes wrong

When an LLM can’t find a real binding site, it invents one — fabricating plausible API methods that don’t exist. The CodeHalu framework (AAAI 2025) categorizes these into four failure modes:

Hallucination Type | Chemical Analog | Rate
Resource: nonexistent API | Imaginary element | 25-43% of API misuses
Naming: wrong method name | Wrong IUPAC name | 29-41% of method calls
Mapping: wrong types | Isomer confusion | Pervasive
Logic: wrong behavior model | Wrong reaction mechanism | Hardest to detect

The generated code has the right “shape” — plausible function names, reasonable parameter patterns — but the bond target is imaginary. The De-Hallucinator (2024) mitigates this through iterative grounding, functionally identical to computational docking validation in drug design.

The free energy landscape of libraries

Training data creates a Gibbs free energy surface over possible code outputs. Popular libraries sit in deep energy wells; novel libraries face barriers. The “LLMs Love Python” study (2025) quantified this: NumPy imported unnecessarily in 48% of cases. Polars (faster pandas alternative) used in 0% of cases. Models contradict their own language recommendations 83% of the time.

Shocking connection: kinetic vs. thermodynamic products

In chemistry, the kinetically favored product (low activation energy, easy to reach) may not be the thermodynamically favored product (lowest total energy). LLM coding agents consistently produce the kinetic product — conservative, conventional code — rather than the thermodynamic product — potentially better but harder-to-reach architectural improvements. When Cursor tested agents with optimistic concurrency, they became risk-averse, “making only tiny safe changes.” They are trapped in local energy minima. Baez & Pollard 2017, arXiv:2503.17181

Full analysis: research/11-chemical-bonding.md (670 lines, 8,400 words)

What the Physics Predicts

If the correspondence is real — and the mathematics says it is — then results from crystal physics make testable predictions about LLMs:

1. Nucleation theory → prompt engineering

Critical nucleus size = minimum prompt length to "lock in" a direction. Too short → the generation wanders (weak supersaturation). A well-crafted prompt = heterogeneous nucleation on a prepared substrate — the barrier is lower.

2. Mullins-Sekerka → repetition failures

Fast growth is inherently unstable. The faster you push generation, the more susceptible to dendritic instability (repetitive patterns). Prediction: there exists an optimal generation speed that maximizes quality.

3. Grain boundary engineering → module interfaces

Maximizing low-Σ CSL boundaries dramatically improves material properties. Prediction: code quality depends on the types of interfaces between modules, not just module quality.

4. Mermin-Wagner → depth requirements

Continuous symmetry can't break in ≤2D. If model depth maps to dimensionality, very shallow networks can't learn certain types of order.

5. Quasicrystal theory → optimal architectures

The most interesting structures are projections from higher dimensions: ordered but not periodic, structured but not rigid. The best software architectures might be "projections" of simpler high-dimensional designs.

6. Naica cave → patience produces perfection

The largest crystals on Earth (12m selenite, Naica, Mexico) grew at 0.5mm per millennium under minimal supersaturation for 500,000 years. The most perfect code comes from low-temperature, long-duration growth — not from fast, high-energy sprints.

Sources & Deep Dives

This page distills 650KB+ of research across 11 chapters. The raw research lives in /crystal-code/research/.

Foundational papers

Autoregressive LMs are Secretly EBMs
Zhao et al. 2025 · arXiv:2512.15605

The formal bijection between autoregressive models and energy-based models.

Attention Is All You Need
Vaswani et al. 2017 · NeurIPS

The transformer architecture. Everything starts here.

Toy Models of Superposition
Elhage et al. 2022 · arXiv:2209.10652

How neural networks encode more features than dimensions.

Grokking as a First Order Phase Transition
Levi et al. 2024 · ICLR 2024

Proves the grokking transition is mathematically identical to crystallization.

Language Modeling Is Compression
Deletang et al. 2024 · ICLR 2024

Chinchilla 70B outperforms PNG and FLAC. Prediction = compression.

Training Compute-Optimal LLMs
Hoffmann et al. 2022 · arXiv:2203.15556

Chinchilla scaling laws. The thermodynamic limit formula for LLMs.

Hopfield Networks Is All You Need
Ramsauer et al. 2021 · ICLR 2021

Proves transformer attention = Hopfield energy minimization.

Circuit Tracing in Language Models
Anthropic 2025 · transformer-circuits.pub

Attribution graphs reveal how Claude performs multi-hop reasoning.

Crystal physics

Classical Nucleation Theory
Gibbs 1876, Volmer & Weber 1926, Becker & Doring 1935

ΔG* = 16πγ³ / (3·Δg_v²). The nucleation barrier.

Prenucleation Clusters
Gebauer, Volkel & Colfen 2008 · Science

Non-classical nucleation: stable clusters exist even below solubility.

BCF Theory: Spiral Growth
Burton, Cabrera & Frank 1951 · Phil. Trans.

How screw dislocations enable crystal growth at low supersaturation.

Mullins-Sekerka Instability
Mullins & Sekerka 1963-64 · J. Appl. Phys.

Why flat crystal interfaces become unstable and form dendrites.

Naica Cave Megacrystals
Garcia-Ruiz et al. 2007, Van Driessche et al. 2011

12m crystals grown at 0.5mm/millennium. Patience = perfection.

The 230 Space Groups
Fedorov 1891, Schoenflies 1891, Barlow 1894

Complete classification of 3D crystal symmetry. Hahn (ed.) 2005, Int. Tables Vol. A.

Chemical bonding & code binding

CodeHalu: Code Hallucinations via Execution-based Verification
Tian et al. 2025 · AAAI 2025

Taxonomy of code hallucinations: mapping, naming, resource, logic. Four failure modes of wrong bonds.

How and Why LLMs Use Deprecated APIs
Wang et al. 2025 · ICSE 2025

Deprecated API usage rises to 70-90% with outdated context. Catalyst poisoning quantified.

LLMs Love Python
arXiv:2503.17181 · 2025

NumPy imported unnecessarily in 48% of cases. Polars used 0%. The free energy landscape of library bias.

In-context Learning and Induction Heads
Olsson et al. 2022 · Anthropic

The match-and-copy circuit: molecular recognition in transformers.

Compositional Framework for Reaction Networks
Baez & Pollard 2017 · arXiv:1704.02051

Category theory bridge: chemical reaction composition and software composition share formal structure.

Mechanistic Interpretability of Code Correctness
arXiv:2510.02917 · 2024-2025

SAEs reveal code correctness as anomaly detection. F1=0.821 for error detection vs. 0.504 for correctness.

Statistical mechanics & information theory

Renormalization Group
Wilson 1971, Nobel 1982

Why microscopically different systems share the same critical exponents.

Replica Symmetry Breaking
Parisi 1979, Nobel 2021

The ultrametric energy landscape of spin glasses = loss landscape of NNs.

More Is Different
Anderson 1972 · Science

The constructionist hypothesis fails. Each level of complexity requires new laws.

Maximum Entropy Principle
Jaynes 1957 · Phys. Rev.

The Boltzmann distribution IS the MaxEnt distribution given energy constraints.

Dissipative Structures
Prigogine, Nobel 1977

Systems far from equilibrium self-organize into states MORE ordered than equilibrium.

Kolmogorov Complexity
Kolmogorov 1965, Li & Vitanyi 2008

The shortest program that produces x. Crystals: O(1). Glass: O(N).

Sphere Packing in 8 and 24 Dimensions
Viazovska 2016, Fields Medal 2022

E8 and Leech lattice are optimal. Proved using modular forms.

Wang Tiles & Undecidability
Wang 1961, Berger 1966

Whether tiles can tile the plane is undecidable. Tiling = halting problem.

Research chapters

The full research (9,018 lines, 624KB) is organized into 11 chapters:

# | Chapter | Key Topics
01 | LLM Internals | Transformer math, mechanistic interpretability, softmax-Boltzmann proof
02 | Crystal Growth | Nucleation theory, BCF growth, defect physics, 230 space groups
03 | Higher Dimensions | E8, Leech lattice, quasicrystals, aperiodic tilings, codes ↔ lattices
04 | Statistical Mechanics | Ising model, Landau theory, renormalization group, universality
05 | Information Theory | Shannon, Kolmogorov, compression = prediction, crystal entropy
06 | Symmetry Breaking | Goldstone theorem, Higgs mechanism, topological order, time crystals
07 | Software as Matter | Phase diagram of software, tech debt as defects, design patterns as unit cells
08 | Category Theory | Functors Cryst → Type, sheaf theory, Yoneda lemma, free energy principle
09 | Non-Equilibrium | Dissipative structures, edge of chaos, Landauer's principle, Turing patterns
10 | Energy Models Bridge | ARM-EBM bijection, Hopfield = attention, spin glasses, the formal proof
11 | Chemical Bonding | API valence, lock-and-key vs. induced fit, bond types, LLM as catalyst, hallucinated bonds, where it's bogus