Functions All the Way Down
1.2% success — Pattern matching collapses without compositional reasoning.
What if we remove variables entirely?
Unlambda (David Madore) is pure combinator calculus. No variable names, no lambda abstractions—just function application and primitive combinators.
Everything is a function. Numbers are functions. Booleans are functions. Data structures are functions. Computation happens through reduction, not substitution.
i — Identity: ix → x
k — Constant: kxy → x (keep first, discard second)
s — Substitution: sxyz → xz(yz) (distribute the argument z to both functions, then apply)
S and K alone are Turing-complete. Everything else is convenience.
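The completeness claim can be checked directly. Here is a minimal sketch that encodes the three combinators as curried Python lambdas (Python stands in for Unlambda syntax here) and derives identity from S and K alone, since SKKx → Kx(Kx) → x:

```python
# Combinators as curried Python lambdas (an illustrative sketch,
# not Unlambda syntax).
S = lambda x: lambda y: lambda z: x(z)(y(z))  # sxyz -> xz(yz)
K = lambda x: lambda y: x                     # kxy  -> x
I = lambda x: x                               # ix   -> x

# i is derivable from s and k alone: SKKx -> Kx(Kx) -> x
I_from_SK = S(K)(K)

assert I_from_SK(42) == 42
assert I_from_SK("hello") == "hello"
```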
Programs execute through term rewriting:
No variable substitution. Just combinators transforming into other combinators.
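The rewrite process can be sketched as a toy interpreter for the pure `/s/k/i fragment (an illustrative sketch; real Unlambda also has I/O and control builtins such as .x and r, which this ignores):

```python
# A minimal reducer for the pure `, s, k, i fragment of Unlambda
# (illustrative sketch, not a full interpreter).

def parse(src):
    """Parse a backtick expression into a nested tuple tree."""
    def go(i):
        if src[i] == '`':
            f, i = go(i + 1)   # function subterm
            a, i = go(i)       # argument subterm
            return (f, a), i
        return src[i], i + 1   # primitive combinator
    tree, _ = go(0)
    return tree

def step(t):
    """Apply one leftmost-outermost rewrite; return (tree, changed)."""
    if isinstance(t, tuple):
        f, a = t
        if f == 'i':                                   # `ix -> x
            return a, True
        if isinstance(f, tuple) and f[0] == 'k':       # ``kxy -> x
            return f[1], True
        if (isinstance(f, tuple) and isinstance(f[0], tuple)
                and f[0][0] == 's'):                   # ```sxyz -> ``xz`yz
            x, y, z = f[0][1], f[1], a
            return ((x, z), (y, z)), True
        nf, changed = step(f)                          # else reduce inside
        if changed:
            return (nf, a), True
        na, changed = step(a)
        return (f, na), changed
    return t, False

def reduce_term(t):
    changed = True
    while changed:
        t, changed = step(t)
    return t

def show(t):
    return '`' + show(t[0]) + show(t[1]) if isinstance(t, tuple) else t

# ```skki: S K K applied to i rewrites to `ki`ki, then to i.
assert show(reduce_term(parse('```skki'))) == 'i'
```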
Booleans: true = k, false = `ki (kxy → x selects the first argument; `ki applied to x and y returns y)
Numbers: Church numerals (iteration functions)
Pairs: Encoded as selector functions
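These encodings can be sketched with curried Python lambdas (Church-style; the names TRUE, SUCC, PAIR are illustrative, not Unlambda primitives):

```python
# Everything-is-a-function encodings, sketched in Python.
K = lambda x: lambda y: x
I = lambda x: x

TRUE  = K        # true  x y -> x
FALSE = K(I)     # false x y -> y   (`ki in Unlambda)

# Church numerals: n is "apply f to x, n times"
ZERO = lambda f: lambda x: x
SUCC = lambda n: lambda f: lambda x: f(n(f)(x))
TWO  = SUCC(SUCC(ZERO))

# Pairs as selector functions: a pair holds a and b,
# and hands them to whichever selector it is given.
PAIR = lambda a: lambda b: lambda sel: sel(a)(b)
FST  = lambda p: p(TRUE)
SND  = lambda p: p(FALSE)

assert TRUE("yes")("no") == "yes"
assert FALSE("yes")("no") == "no"
assert TWO(lambda n: n + 1)(0) == 2
assert FST(PAIR(1)(2)) == 1 and SND(PAIR(1)(2)) == 2
```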
Schönfinkel showed in 1924 that S and K suffice to express any function definable with bound variables—what we would now call Turing completeness, a decade before Turing's work. Variables are human convenience, not computational necessity: any lambda-calculus term can be mechanically translated into S and K via bracket abstraction.
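The mechanical translation is bracket abstraction; here is a minimal sketch using the three standard rules (variables as strings, applications as 2-tuples—a naive translator, not an optimized one):

```python
# Bracket abstraction: eliminate a variable v from a term t,
# yielding a pure S/K term T such that T v rewrites to t.
# (Illustrative sketch of Schönfinkel's translation.)

def occurs(v, t):
    return t == v if isinstance(t, str) else occurs(v, t[0]) or occurs(v, t[1])

def abstract(v, t):
    if t == v:
        return (('S', 'K'), 'K')          # [v]v      = I = SKK
    if isinstance(t, str) or not occurs(v, t):
        return ('K', t)                   # [v]t      = Kt  (v not free in t)
    return (('S', abstract(v, t[0])),     # [v](f a)  = S ([v]f) ([v]a)
            abstract(v, t[1]))

# Interpret the resulting combinator term with Python lambdas
# to check that it behaves correctly.
S = lambda x: lambda y: lambda z: x(z)(y(z))
K = lambda x: lambda y: x

def ev(t, env):
    if isinstance(t, str):
        return env[t]
    return ev(t[0], env)(ev(t[1], env))

# Translate the two-variable lambda term  λx. λy. x
# into variable-free S/K form, then test it behaves like "true".
term = abstract('x', abstract('y', 'x'))
f = ev(term, {'S': S, 'K': K})
assert f(1)(2) == 1
```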
With Python (millions of examples), LLMs fake reasoning through pattern matching. With Unlambda (thousands of examples), they achieve only 1.2% success. They can't compose functions from first principles—they need to have seen similar code before.
In Unlambda, the program structure directly determines what it computes. Every character matters: a single misplaced backtick changes the entire application tree, and a malformed expression doesn't fail subtly—it fails to parse at all. A model gets no partial credit for code that merely looks plausible.
EsoLang-Bench (2026) tested Unlambda across 80 problems. Best result: GPT-5.2 at 1.2%.
The Failure Mode: Most errors are parse failures. Models can't even produce well-formed combinator expressions, let alone correct ones.
Deep Nesting: Tracking arity through ``s``s``ski requires precise logical reasoning.
No Scaffolding: Training data is 5,000-100,000× scarcer than Python. Statistical learning breaks down.
This isn't a training problem—it's a fundamental limitation. Transformers do correlation mining, not deductive reasoning. Compositional logic requires genuine understanding, not pattern matching.
Each language reveals different failure modes:
Brainfuck: 6.2% — Imperative, stateful tape
Whitespace: 0.0% — Can't tokenize invisible syntax
Unlambda: 1.2% — Can't reason about function composition
Befunge: 11.2% — Can't navigate 2D space
Unlambda and Brainfuck are opposite paradigms: imperative mutation vs functional reduction. Both minimal, both Turing-complete, both defeat LLMs through different cognitive demands.