The vibe flows, but also crystallizes. How LLMs, code, and crystals are the same mathematics — and what that means for building with AI.
The softmax function that governs every token choice in every LLM IS the Boltzmann distribution from statistical physics. Not "like it." IS it.
P(token_i) = exp(logit_i / T) / Σ_j exp(logit_j / T)
P(state_i) = exp(-E_i / kT) / Σ_j exp(-E_j / kT)
Same equation. The identification is exact:
Zhao et al. (2025, arXiv:2512.15605) proved the full formal bijection: every autoregressive language model implicitly defines an energy landscape over the space of all possible token sequences. Every energy-based model has a unique autoregressive decomposition. The negative log-probability of a sequence under the LLM is its energy:
E(x) = -Σ_t log p(x_t | x_{<t}) + const
The Boltzmann distribution is the unique distribution that minimizes the Helmholtz free energy F = ⟨E⟩ - T·S, where S is Shannon entropy. The softmax attention mechanism implicitly minimizes free energy at every step. High logit = low energy = more attention. Temperature controls the tradeoff: low T → attend to the single best match (energy minimization), high T → attend broadly (entropy maximization). This is precisely the energy-entropy tradeoff in thermodynamics. Jaynes 1957, Baroni et al. 2024
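The free-energy claim can be checked numerically: among all distributions over the same support, the softmax output attains the lowest F = ⟨E⟩ - T·S. A minimal sketch (the logits and temperature are arbitrary illustrative values):

```python
import math, random

def free_energy(p, E, T):
    """F = <E> - T*S, with S the Shannon entropy in nats."""
    avg_E = sum(pi * Ei for pi, Ei in zip(p, E))
    S = -sum(pi * math.log(pi) for pi in p if pi > 0)
    return avg_E - T * S

def softmax(logits, T):
    m = max(l / T for l in logits)               # subtract max for stability
    w = [math.exp(l / T - m) for l in logits]
    Z = sum(w)
    return [x / Z for x in w]

logits = [2.0, 1.0, 0.5, -1.0]
E = [-l for l in logits]          # energy = negative logit
T = 0.7

p_star = softmax(logits, T)
F_star = free_energy(p_star, E, T)

# Every other distribution has free energy at least F_star.
rng = random.Random(0)
for _ in range(1000):
    q = [rng.random() for _ in logits]
    Z = sum(q)
    q = [x / Z for x in q]
    assert free_energy(q, E, T) >= F_star - 1e-9
```

Lowering T makes the ⟨E⟩ term dominate (sharp, greedy sampling); raising T makes the entropy term dominate (broad sampling), exactly the tradeoff described above.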
Both processes are sequential, locally determined, and produce emergent global order from local rules.
A crystal takes a chaotic liquid and extracts the pattern — the unit cell — then propagates it. O(N) information collapses to O(1). LLM training does the same: it takes the chaotic soup of the internet and extracts patterns into weights. Deletang et al. (2024, ICLR) showed that Chinchilla 70B literally outperforms PNG at image compression and FLAC at audio compression — because prediction = compression = understanding.
| Structure | Description | Kolmogorov Complexity |
|---|---|---|
| Perfect crystal | Periodic | O(1) — just the unit cell |
| Quasicrystal | Ordered, aperiodic | O(1) — projection rules from higher dimension |
| Defective crystal | Mostly periodic + defects | O(1) + O(d) — unit cell + defect catalog |
| Glass | Disordered, frozen | O(N) — must specify every atom |
| Liquid | Disordered, flowing | O(N) per timestep |
Kolmogorov 1965, Li & Vitanyi 2008, Krivovichev 2012, Estevez-Rams & Gonzalez-Ferez 2009
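Kolmogorov complexity is uncomputable, but a compressor is a practical upper bound, and the table's extremes can be checked directly. A sketch using zlib as the proxy:

```python
import random, zlib

rng = random.Random(0)

crystal = b"AB" * 5000                                  # periodic: one unit cell, repeated
glass = bytes(rng.getrandbits(8) for _ in range(10_000))  # disordered: no pattern to extract

c_crystal = len(zlib.compress(crystal, 9))
c_glass = len(zlib.compress(glass, 9))

assert c_crystal < 200    # collapses to ~O(1): unit cell + repeat count
assert c_glass > 9000     # ~O(N): essentially incompressible
```

The 10,000-byte "crystal" collapses to a few dozen bytes; the equally long "glass" barely shrinks at all.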
When training an LLM on modular arithmetic, it first memorizes (stores each example = amorphous/liquid state), then suddenly generalizes (discovers the algorithm = crystallization). Levi et al. (ICLR 2024) proved this is a first-order phase transition — mathematically identical to water freezing:
| Memorization circuit (amorphous / glass) | Generalization circuit (crystal) |
|---|---|
| Lookup table | Algorithm |
| High weight norm | Low weight norm |
| O(N) complexity | O(1) complexity |
| Stores each example | Encodes the rule |

Weight decay slowly makes the crystal favorable; the nucleation barrier delays the transition. The result: grokking, a first-order phase transition.
The two circuits coexist during the transition, exactly like ice and water at 0°C. Nanda et al. (2023) showed the generalization circuit for modular addition learns Fourier features — embedding numbers on a circle and computing angle sums. A compact, crystalline algorithm.
Power et al. 2022, Nanda et al. 2023, Levi et al. ICLR 2024, Varma et al. 2023, Chen et al. ICLR 2025
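The Fourier-feature algorithm is small enough to state in full: embed each residue as an angle on the circle, add angles with the trig angle-sum identities, read the residue back off. A sketch with p = 113, the modulus used in the grokking experiments (the helper names are mine):

```python
import math

p = 113  # modulus, as in the modular-addition grokking task

def embed(a):
    """Place residue a on the unit circle: the 'crystalline' representation."""
    theta = 2 * math.pi * a / p
    return math.cos(theta), math.sin(theta)

def add_via_angles(a, b):
    """Compute (a + b) mod p purely by rotating embeddings."""
    ca, sa = embed(a)
    cb, sb = embed(b)
    # angle-sum identities: cos(x+y), sin(x+y)
    c = ca * cb - sa * sb
    s = sa * cb + ca * sb
    theta = math.atan2(s, c) % (2 * math.pi)
    return round(theta * p / (2 * math.pi)) % p

for a in range(0, p, 7):
    for b in range(0, p, 11):
        assert add_via_angles(a, b) == (a + b) % p
```

No lookup table anywhere: the rule itself is encoded in the geometry, which is why the generalizing circuit has low weight norm and O(1) description length.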
L(N, D) = L∞ + A/N^α + B/D^β
L∞ is the irreducible entropy of language — genuine unpredictability. No model can beat it. This is the ground state energy. The correction terms A/N^α and B/D^β are finite-size scaling corrections, identical in form to corrections in statistical mechanics.
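Plugging in coefficients close to the published Chinchilla fits (values approximate, for illustration only) shows the loss descending monotonically toward the floor as parameters and data scale together:

```python
# Approximate Chinchilla-style coefficients (illustrative, not exact).
L_inf, A, alpha, B, beta = 1.69, 406.4, 0.34, 410.7, 0.28

def loss(N, D):
    """Scaling law: irreducible floor plus two finite-size corrections."""
    return L_inf + A / N**alpha + B / D**beta

# Loss falls strictly toward L_inf but never below it.
prev = float("inf")
for k in range(6, 13):
    N = 10**k
    L = loss(N, 20 * N)   # D ~ 20N, the compute-optimal ratio
    assert L_inf < L < prev
    prev = L
```

No matter how large N and D grow, the corrections only shrink toward zero: the ground state energy L∞ is untouchable.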
The renormalization group (Wilson, Nobel 1982) explains why microscopically different systems have identical critical exponents at phase transitions. Magnets, fluids, and percolation share the same physics near criticality. If grokking is a genuine phase transition, its critical exponents define a universality class — and completely different architectures (transformers, MLPs, CNNs) trained on different tasks should exhibit the same critical behavior. Early evidence supports this. Doshi et al. 2024, Bahri et al. 2024
A liquid has continuous rotational and translational symmetry — it looks the same everywhere, in every direction. When it crystallizes, that continuous symmetry snaps to a discrete subgroup: one of the 230 space groups in 3D. The liquid "chose" a lattice orientation. Nothing forced it. The symmetry of the laws is preserved, but the symmetry of the state is broken.
Mathematically: if the system's symmetry group is G and its ground state has residual symmetry H, the space of equivalent ground states is the coset space G/H. The broken generators produce massless excitations — Goldstone bosons (Goldstone 1961). In crystals, these are phonons. In ferromagnets, magnons.
Before a prompt, the LLM's output distribution is symmetric across all possible topics. The prompt breaks this symmetry — it selects a direction in output space, exactly like applying a magnetic field to a paramagnet. The specificity of the prompt = the strength of the field.
Continuous symmetries cannot spontaneously break in ≤2 dimensions at finite temperature. Thermal fluctuations destroy long-range order. Prediction: very shallow networks lack the "dimensionality" for certain types of long-range coherence.
Mermin & Wagner 1966, Nobel 2016 (Kosterlitz-Thouless)
Some states are protected by topology, not symmetry. You cannot destroy them without closing the bulk energy gap. Topological insulators have surface states immune to disorder. Time crystals (2017) break time-translation symmetry — repeating in time the way crystals repeat in space.
Haldane, Nobel 2016. Wilczek 2012, Zhang et al. 2017
| Dimension | Space Groups | Notable Lattice | Connection |
|---|---|---|---|
| 3D | 230 | FCC, BCC, diamond | All natural crystals |
| 4D | 4,783 | 24-cell lattice | Quasicrystal projections |
| 8D | — | E8 lattice: densest sphere packing | Viazovska proof (Fields Medal 2022), Lie groups, string theory |
| 24D | — | Leech lattice: 196,560 kissing number | Monster group, monstrous moonshine, error-correcting codes |
A Penrose tiling (2D, fivefold symmetry, aperiodic) is literally a 2D slice of a 5D periodic lattice. An icosahedral quasicrystal (Shechtman, Nobel 2011) is a projection from 6D. The cut-and-project method: take a higher-dimensional periodic lattice, cut through it at an irrational angle, project nearby points. Result: ordered but aperiodic. O(1) complexity.
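The one-dimensional version of cut-and-project is small enough to run: cutting Z² at the irrational slope 1/φ and projecting yields the Fibonacci chain, a quasicrystal with two tile lengths that never repeats. A sketch using the standard floor-difference form of that construction (not the Penrose case itself):

```python
import math

phi = (1 + math.sqrt(5)) / 2   # golden ratio: the irrational cut angle

# Gaps between consecutive projected lattice points: the Fibonacci word
# of long (2) and short (1) tiles.
seq = [math.floor((n + 1) * phi) - math.floor(n * phi) for n in range(200)]

assert set(seq) == {1, 2}   # ordered: exactly two tile lengths appear

# Aperiodic: no period up to 40 repeats exactly across the prefix.
for period in range(1, 41):
    assert any(seq[i] != seq[i + period] for i in range(len(seq) - period))
```

One line of arithmetic generates the whole infinite chain: ordered but aperiodic, O(1) complexity.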
A transformer's residual stream is a ~4,096-dimensional space. Concepts are encoded as directions. Recent research shows cyclical concepts (months, days) arrange on circles, ordinal concepts (small/medium/large) on lines, categories form clusters (Chalnev et al. 2025). The model represents more concepts than it has dimensions through superposition — encoding features as nearly-orthogonal directions, tolerating small interference. This exploits the same geometry as high-dimensional lattice packings: the Johnson-Lindenstrauss lemma guarantees that random directions in high-D space are nearly orthogonal. Elhage et al. 2022, Park et al. 2024
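The near-orthogonality claim is easy to test: sample random directions in a 4,096-dimensional "residual stream" and measure the worst-case interference between any pair. A sketch (50 directions for speed; the constants are illustrative):

```python
import math, random

d = 4096   # residual-stream width
n = 50     # number of random "concept" directions

rng = random.Random(0)

def rand_unit(d):
    """A uniformly random unit vector in d dimensions."""
    v = [rng.gauss(0, 1) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

vecs = [rand_unit(d) for _ in range(n)]

max_cos = max(
    abs(sum(a * b for a, b in zip(vecs[i], vecs[j])))
    for i in range(n) for j in range(i + 1, n)
)
# Typical interference is ~1/sqrt(d) ~ 0.016; even the worst pair stays small.
assert max_cos < 0.1
```

Every pair of "concepts" overlaps by only a percent or two: the geometric headroom that makes superposition workable.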
Wang (1961) asked: given a set of tiles, can they tile the plane? Berger (1966) proved this is undecidable — equivalent to the halting problem. Turing machines can be encoded as Wang tile sets. If the machine halts, the tiles can't tile the plane. Crystal growth IS computation. Tiling IS the halting problem.
The 2023 discovery of the hat monotile (Smith, Myers, Kaplan, Goodman-Strauss) — a single shape that tiles the plane but only aperiodically — shows that aperiodic order can emerge from the simplest possible rules.
Construction A (Leech, Sloane) converts binary error-correcting codes into lattices. The Hamming [8,4,4] code produces the E8 lattice. The Golay [24,12,8] code produces the Leech lattice. Information theory and geometry are the same subject. Shannon's channel coding theorem (maximize information per symbol) and the sphere-packing problem (maximize density per dimension) are dual statements. Conway & Sloane 1999
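Construction A is concrete enough to verify by hand: the lattice is every integer vector that reduces mod 2 to a codeword, and its minimal vectors come in exactly two types. A sketch that builds the [8,4,4] code and counts minimal vectors by type rather than enumerating the lattice:

```python
from itertools import product

# Generator rows of the [8,4,4] extended Hamming code (= Reed-Muller RM(1,3)).
G = [
    (1, 1, 1, 1, 1, 1, 1, 1),
    (0, 0, 0, 0, 1, 1, 1, 1),
    (0, 0, 1, 1, 0, 0, 1, 1),
    (0, 1, 0, 1, 0, 1, 0, 1),
]

codewords = set()
for bits in product((0, 1), repeat=4):
    w = tuple(sum(b * g for b, g in zip(bits, col)) % 2 for col in zip(*G))
    codewords.add(w)

assert len(codewords) == 16
weights = [sum(c) for c in codewords]
assert weights.count(4) == 14        # weight enumerator: 1 + 14z^4 + z^8

# Construction A: lattice = { x in Z^8 : x mod 2 is a codeword }.
# Minimal vectors have squared norm 4, in two families:
minimal = 2 * 8               # +/-2 in one coordinate (reduces to the zero word)
minimal += 14 * 2**4          # +/-1 on the support of each weight-4 codeword
assert minimal == 240         # the kissing number of E8
```

Sixteen codewords of information theory produce the 240 minimal vectors of E8 geometry: the duality stated above, made executable.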
| Phase | Physical | Software | Key Property |
|---|---|---|---|
| Gas | No interactions | Brainstorming, pseudocode | Maximum entropy, no structure |
| Liquid | Short-range order | Prototyping | Reshapes easily, no rigidity |
| Glass | Frozen disorder | Legacy spaghetti code | Metastable. Looks solid. No long-range order. |
| Polycrystal | Ordered grains, disordered boundaries | Microservices | Fault isolation at boundaries, but overhead |
| Single crystal | Complete long-range order | Well-structured monolith | Maximum consistency, but brittle — cracks propagate |
| Quasicrystal | Ordered, aperiodic | Event-driven / microkernel | Ordered but not periodic. No single point of failure. |
| Defect | Crystal | Code |
|---|---|---|
| Point (vacancy) | Missing atom | Missing abstraction, TODO |
| Point (interstitial) | Extra atom | Unnecessary dependency |
| Dislocation | Line defect, propagates under stress | Broken interface propagating through call chains |
| Stacking fault | Wrong layer sequence | Wrong abstraction level |
| Twin boundary | Mirror plane in crystal | Duplicated functionality |
| Grain boundary | Misoriented regions | Module boundary with convention mismatch |
| Void | Missing region | Dead code |
The Taylor hardening law: stress to deform ∝ √(dislocation density). Software analog: effort to modify code ∝ √(technical debt density).
Simulated annealing: heat the system (accept disorder), slowly cool (enforce constraints) → lower-energy state. Refactoring: relax constraints (accept temporary breakage), incrementally re-impose structure. Cool too fast = new glass (new spaghetti). Cool slowly enough = crystal (clean architecture).
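The cooling schedule can be sketched directly. A toy one-dimensional energy landscape stands in for a codebase's "energy surface" (the energy function and all constants are illustrative):

```python
import math, random

def energy(x):
    """A rugged 1-D landscape: many local minima, i.e. many spaghetti states."""
    return x * x + 4 * math.sin(5 * x)

def anneal(steps, T0, cooling, rng):
    x = 3.0                       # start in a high-energy, disordered state
    best = energy(x)
    T = T0
    for _ in range(steps):
        x_new = x + rng.gauss(0, 0.3)      # propose a local rearrangement
        dE = energy(x_new) - energy(x)
        # accept downhill always; uphill with Boltzmann probability exp(-dE/T)
        if dE < 0 or rng.random() < math.exp(-dE / T):
            x = x_new
        best = min(best, energy(x))
        T *= cooling                       # cool the system slowly
    return best

rng = random.Random(42)
best = anneal(steps=5000, T0=5.0, cooling=0.999, rng=rng)
assert best <= energy(3.0)    # never worse than the starting glass
```

Setting `cooling` close to 1 is the "cool slowly enough" regime; dropping it sharply quenches the walk into whatever local minimum is nearest, the annealing analog of a rushed refactor.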
Epitaxy = growing a crystal on an existing substrate, where the new material's structure is determined by the substrate. When an LLM reads your codebase (substrate) and generates code (growth layer):
The context window = interaction range. A 200K-token model has a longer "coherence length" than a 4K model — it can maintain crystallographic consistency over larger codebases.
A design pattern (Singleton, Factory, Observer) is a repeatable structural unit that propagates through a codebase exactly like a unit cell in a crystal. It contains all the information needed to instantiate itself at any point. The codebase's space group is the complete set of architectural symmetry operations.
Fowler 1999, Martin 2003, Lehman 1996. See research/07-software-as-matter.md for the full mapping.
Crystal growth is NOT an equilibrium process. It's a dissipative structure (Prigogine, Nobel 1977) — it requires continuous energy input and entropy export. Systems far from equilibrium can self-organize into states more ordered than equilibrium.
Cut the power and the structure collapses. It exists only while being driven.
Computation is maximized at the boundary between order and chaos (Langton's λ parameter). Too ordered → repetitive, no information processing. Too chaotic → noise, no information retention. The optimal LLM temperature lives at this edge.
The Mullins-Sekerka instability (1963): a growing crystal face becomes unstable — protrusions grow faster (they see steeper gradients), producing tree-like dendrites. The LLM analog: when the model locks onto a pattern, it self-reinforces through the context window, producing repetitive loops. This is dendritic overgrowth.
Surface tension (capillarity) stabilizes crystals against short-wavelength fragmentation. In LLMs, attention and learned constraints act as "surface tension" — preventing output from fragmenting into noise.
Turing's 1952 morphogenesis paper showed that diffusion + local reactions produce spatial patterns. A transformer has the same structure: the MLP layers are local reactions (nonlinear computation at each position), and attention is diffusion (mixing information across positions). LayerNorm acts as the fast-diffusing inhibitor. The transformer IS a reaction-diffusion system, and its ability to produce structured output follows from the same mathematics as animal stripe patterns. Turing 1952. See research/09-non-equilibrium.md
Ramsauer et al. (2021) proved that the transformer attention mechanism IS the update rule of a modern Hopfield network. Hopfield networks are spin systems — physical systems that minimize energy. John Hopfield shared the 2024 Nobel Prize in Physics for this. Therefore:
Parisi's replica symmetry breaking (Nobel 2021) describes the energy landscape of spin glasses — disordered magnets with competing interactions. The landscape is ultrametric: states organize into a hierarchical tree of nested basins. The loss landscape of neural networks has the same structure. SGD = thermal fluctuations. Weight decay = annealing pressure. Batch size = heat capacity.
Diffusion models (DDPM, score matching) reverse a noising process. The forward process = melting. The reverse process = crystallization. The score function ∇x log p(x) is the negative gradient of the energy — the force field that guides atoms to lattice sites.
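For a Gaussian density the claim is checkable in closed form: the score really is the negative gradient of the energy, a force pointing back toward the "lattice site." A minimal sketch (μ and σ are arbitrary):

```python
import math

mu, sigma = 1.5, 0.8   # the "lattice site" and its thermal width

def log_p(x):
    """Log-density of a Gaussian: -E(x) up to a constant."""
    return -((x - mu) ** 2) / (2 * sigma**2) - math.log(sigma * math.sqrt(2 * math.pi))

def score(x):
    """Analytic score d/dx log p(x): the restoring force toward mu."""
    return -(x - mu) / sigma**2

# The numeric gradient of log p matches the analytic score everywhere.
h = 1e-6
for x in (-2.0, 0.0, 1.5, 3.7):
    numeric = (log_p(x + h) - log_p(x - h)) / (2 * h)
    assert abs(numeric - score(x)) < 1e-4
```

Reverse diffusion follows this force field from noise back to structure, which is why the forward/reverse pair reads as melting and crystallization.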
| Crystal Growth | LLM Generation | Mathematics |
|---|---|---|
| Supersaturated solution | Prompt + model weights | Initial conditions |
| Nucleation | First tokens generated | Symmetry breaking |
| Unit cell | Learned pattern/template | Repeating structural unit |
| Growth front | Generation position | Interface between ordered and disordered |
| Temperature | Temperature parameter | Boltzmann T |
| Crystal face | Consistent style | Symmetry constraint |
| Defect | Hallucination | Broken local symmetry |
| Grain boundary | Topic change | Interface between ordered regions |
| Annealing | Fine-tuning / RLHF | Controlled thermal treatment |
| Polymorphism | Multiple valid completions | Degenerate ground states |
| Phase transition | Grokking / emergence | Order parameter discontinuity |
| Dendritic instability | Repetitive loops | Mullins-Sekerka instability |
| Epitaxy | In-context code generation | Growth on existing substrate |
| Twinning | Code duplication | Mirror symmetry defect |
The crystal analogy captures pattern propagation — but LLMs generating code do something more specific. They bind to existing APIs, extending code at attachment points like a molecule docking into a binding pocket. This is not accidental. It is the dominant mode of LLM code generation.
An API’s valence is its combining capacity — the number of parameters, required arguments, and configuration options it exposes. A function with three required parameters has valence 3. An unsatisfied required parameter is a radical — reactive until filled. A zero-argument pure function is a noble gas — inert, self-contained.
Highly multivalent APIs serve as structural hubs in codebases, just as carbon serves as the backbone of organic chemistry. Libraries like pandas, React, and Express have high “carbon-like” valence.
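The valence of a Python function can be read off mechanically. A minimal sketch using the standard `inspect` module (the example functions are hypothetical):

```python
import inspect

def valence(fn):
    """Count required parameters: the function's combining capacity."""
    params = inspect.signature(fn).parameters.values()
    return sum(
        1 for p in params
        if p.default is inspect.Parameter.empty
        and p.kind in (p.POSITIONAL_ONLY, p.POSITIONAL_OR_KEYWORD, p.KEYWORD_ONLY)
    )

def trivalent(a, b, c, verbose=False):   # three unsatisfied "bonds"
    return (a, b, c, verbose)

def noble_gas():                          # inert: nothing to bind
    return 42

assert valence(trivalent) == 3
assert valence(noble_gas) == 0
```

A call that omits one of the three required arguments leaves a radical: `TypeError` is the reactivity showing.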
Fischer’s 1894 lock-and-key model maps to static type systems: the type signature is the lock, the argument is the key. The compiler is the molecular recognition apparatus. But Fischer was wrong about rigidity — Koshland’s 1958 induced fit model (enzyme adapts to substrate) maps to duck typing and dynamic dispatch.
| Binding Model | Chemical Analog | Language Implementation | Error Rate |
|---|---|---|---|
| Rigid fit | Fischer’s lock-and-key | Rust borrow checker, Haskell types | Very low (compile-time) |
| Semi-rigid | Modern enzyme model | TypeScript strict, Go interfaces | Low |
| Induced fit | Koshland’s model | Python duck typing | Higher (runtime) |
| Promiscuous | Enzyme promiscuity | JavaScript type coercion | Very high |
Just as enzymes with higher specificity produce fewer unwanted byproducts, languages with stricter type systems produce fewer runtime errors. And just as high-specificity enzymes are slower to evolve new functions, strictly-typed codebases are slower to adapt.
| Bond Type | Code Coupling | Example |
|---|---|---|
| Covalent | Direct call + shared mutable state | obj.method() with mutation |
| Ionic | Event-driven with typed contracts | TypeScript event emitter |
| Hydrogen | Interface/protocol conformance | Go interface, Python Protocol |
| Van der Waals | Shared conventions | JSON naming conventions across services |
| Metallic | Shared mutable global state | Global vars, shared DB |
Covalent code bonds are hard to break — just like covalent chemical bonds. Van der Waals forces are individually weak but collectively enable gecko adhesion; naming conventions are individually minor but collectively enable codebases to function. Metallic bonding (delocalized electrons) maps to shared mutable state: rapid communication (fast reads), impossible to reason about locally.
A catalyst lowers activation energy without being consumed. The LLM does exactly this:
Wang et al. ICSE 2025, CodeHalu AAAI 2025
In biochemistry, an allosteric effector binds at a site other than the active site, changing the protein’s conformation and indirectly altering its function. Software has precise analogs: config files, feature flags, environment variables, dependency injection, and the CSS cascade all change behavior at distant locations through indirect binding events.
Anthropic’s induction heads (Olsson et al. 2022) implement fuzzy pattern matching: if the context contains [A][B]…[A], predict [B]. This is directly analogous to molecular recognition in biochemistry — binding doesn’t require exact shape matching, just sufficient complementarity. When the LLM sees requests.get(, induction circuits retrieve patterns that followed it in training data, recognizing the binding site and attaching the complementary functional group.
Olsson et al. 2022, Anthropic 2025
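The [A][B]…[A] → [B] rule reduces to a few lines. A deliberately literal sketch of match-and-copy (real induction heads do this softly, in parallel, over learned representations rather than exact tokens):

```python
def induction_predict(tokens):
    """Find the most recent earlier occurrence of the final token
    and predict the token that followed it: match, then copy."""
    last = tokens[-1]
    for i in range(len(tokens) - 2, -1, -1):   # scan the context backwards
        if tokens[i] == last:
            return tokens[i + 1]
    return None

# [A][B] ... [A] -> [B]
assert induction_predict(["requests", ".get(", "url", ")", "requests"]) == ".get("
assert induction_predict(["x", "y", "z"]) is None
```

The exact-match condition is where the analogy to binding sharpens: attention scores implement a graded complementarity rather than this hard equality test.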
Code can be created from nothing and destroyed without residue. A refactoring can reduce 1,000 lines to 100 without losing functionality. A prompt can produce 10,000 lines from 10 words. You cannot write balanced equations for code transformations.
Chemical bonds depend on 1/r² force laws. Code has no physical space. Any two functions can call each other regardless of “distance.” Steric hindrance, bond angles, molecular geometry — all meaningless. (Exception: the LLM context window imposes a locality constraint, but it’s topological, not spatial.)
In chemistry, breaking a covalent bond requires energy proportional to bond strength. In code, git revert cleaves any bond at zero thermodynamic cost. The barriers are cognitive and economic, not physical.
Chemical systems reach thermodynamic equilibrium. Software never does. Per Lehman’s Laws, a used system must be continually adapted or it degrades. Software is permanently far from equilibrium.
Every hydrogen atom is identical. No two functions are, even with the same signature. Two sort(list) -> list implementations with different algorithms are chemically “identical” (same valence) but computationally distinct.
Chemistry has no concept of purpose. Software is designed to accomplish goals. The teleological dimension of code has no chemical analog.
When an LLM can’t find a real binding site, it invents one — fabricating plausible API methods that don’t exist. The CodeHalu framework (AAAI 2025) categorizes these into four failure modes:
| Hallucination Type | Chemical Analog | Rate |
|---|---|---|
| Resource: nonexistent API | Imaginary element | 25-43% of API misuses |
| Naming: wrong method name | Wrong IUPAC name | 29-41% of method calls |
| Mapping: wrong types | Isomer confusion | Pervasive |
| Logic: wrong behavior model | Wrong reaction mechanism | Hardest to detect |
The generated code has the right “shape” — plausible function names, reasonable parameter patterns — but the bond target is imaginary. The De-Hallucinator (2024) mitigates this through iterative grounding, functionally identical to computational docking validation in drug design.
Training data creates a Gibbs free energy surface over possible code outputs. Popular libraries sit in deep energy wells; novel libraries face barriers. The “LLMs Love Python” study (2025) quantified this: NumPy imported unnecessarily in 48% of cases. Polars (faster pandas alternative) used in 0% of cases. Models contradict their own language recommendations 83% of the time.
In chemistry, the kinetically favored product (low activation energy, easy to reach) may not be the thermodynamically favored product (lowest total energy). LLM coding agents consistently produce the kinetic product — conservative, conventional code — rather than the thermodynamic product — potentially better but harder-to-reach architectural improvements. When Cursor tested agents with optimistic concurrency, they became risk-averse, “making only tiny safe changes.” They are trapped in local energy minima. Baez & Pollard 2017, arXiv:2503.17181
Full analysis: research/11-chemical-bonding.md (670 lines, 8,400 words)
If the correspondence is real — and the mathematics says it is — then results from crystal physics make testable predictions about LLMs:
Critical nucleus size = minimum prompt length to "lock in" a direction. Too short → the generation wanders (weak supersaturation). A well-crafted prompt = heterogeneous nucleation on a prepared substrate — the barrier is lower.
Fast growth is inherently unstable. The faster you push generation, the more susceptible to dendritic instability (repetitive patterns). Prediction: there exists an optimal generation speed that maximizes quality.
Maximizing low-Σ CSL boundaries dramatically improves material properties. Prediction: code quality depends on the types of interfaces between modules, not just module quality.
Continuous symmetry can't break in ≤2D. If model depth maps to dimensionality, very shallow networks can't learn certain types of order.
The most interesting structures are projections from higher dimensions: ordered but not periodic, structured but not rigid. The best software architectures might be "projections" of simpler high-dimensional designs.
The largest crystals on Earth (12m selenite, Naica, Mexico) grew at 0.5mm per millennium under minimal supersaturation for 500,000 years. The most perfect code comes from low-temperature, long-duration growth — not from fast, high-energy sprints.
This page distills 650KB+ of research across 11 chapters. The raw research lives in /crystal-code/research/.
The formal bijection between autoregressive models and energy-based models.
The transformer architecture. Everything starts here.
How neural networks encode more features than dimensions.
Proves the grokking transition is mathematically identical to crystallization.
Chinchilla 70B outperforms PNG and FLAC. Prediction = compression.
Chinchilla scaling laws. The thermodynamic limit formula for LLMs.
Proves transformer attention = Hopfield energy minimization.
Attribution graphs reveal how Claude performs multi-hop reasoning.
ΔG* = 16πγ³ / (3·Δg_v²). The nucleation barrier.
Non-classical nucleation: stable clusters exist even below solubility.
How screw dislocations enable crystal growth at low supersaturation.
Why flat crystal interfaces become unstable and form dendrites.
12m crystals grown at 0.5mm/millennium. Patience = perfection.
Complete classification of 3D crystal symmetry. Hahn (ed.) 2005, Int. Tables Vol. A.
Taxonomy of code hallucinations: mapping, naming, resource, logic. Four failure modes of wrong bonds.
Deprecated API usage rises to 70-90% with outdated context. Catalyst poisoning quantified.
NumPy imported unnecessarily in 48% of cases. Polars used 0%. The free energy landscape of library bias.
The match-and-copy circuit: molecular recognition in transformers.
Category theory bridge: chemical reaction composition and software composition share formal structure.
SAEs reveal code correctness as anomaly detection. F1=0.821 for error detection vs. 0.504 for correctness.
Why microscopically different systems share the same critical exponents.
The ultrametric energy landscape of spin glasses = loss landscape of NNs.
The constructionist hypothesis fails. Each level of complexity requires new laws.
The Boltzmann distribution IS the MaxEnt distribution given energy constraints.
Systems far from equilibrium self-organize into states MORE ordered than equilibrium.
The shortest program that produces x. Crystals: O(1). Glass: O(N).
E8 and Leech lattice are optimal. Proved using modular forms.
Whether tiles can tile the plane is undecidable. Tiling = halting problem.
The full research (9,018 lines, 624KB) is organized into 11 chapters:
| # | Chapter | Key Topics |
|---|---|---|
| 01 | LLM Internals | Transformer math, mechanistic interpretability, softmax-Boltzmann proof |
| 02 | Crystal Growth | Nucleation theory, BCF growth, defect physics, 230 space groups |
| 03 | Higher Dimensions | E8, Leech lattice, quasicrystals, aperiodic tilings, codes ↔ lattices |
| 04 | Statistical Mechanics | Ising model, Landau theory, renormalization group, universality |
| 05 | Information Theory | Shannon, Kolmogorov, compression = prediction, crystal entropy |
| 06 | Symmetry Breaking | Goldstone theorem, Higgs mechanism, topological order, time crystals |
| 07 | Software as Matter | Phase diagram of software, tech debt as defects, design patterns as unit cells |
| 08 | Category Theory | Functors Cryst → Type, sheaf theory, Yoneda lemma, free energy principle |
| 09 | Non-Equilibrium | Dissipative structures, edge of chaos, Landauer's principle, Turing patterns |
| 10 | Energy Models Bridge | ARM-EBM bijection, Hopfield = attention, spin glasses, the formal proof |
| 11 | Chemical Bonding | API valence, lock-and-key vs. induced fit, bond types, LLM as catalyst, hallucinated bonds, where it’s bogus |