
Unlambda

Functions All the Way Down

EsoLang-Bench 2026

1.2% success — Pattern matching collapses without compositional reasoning.

The Question

What if we remove variables entirely?

Unlambda (David Madore) is pure combinator calculus. No variable names, no lambda abstractions—just function application and primitive combinators.

Consequence

Everything is a function. Numbers are functions. Booleans are functions. Data structures are functions. Computation happens through reduction, not substitution.

How It Works

The Combinators

i — Identity: `ix → x

k — Constant: ``kxy → x (keep the first argument, discard the second)

s — Substitution: ```sxyz → ``xz`yz, i.e. xz(yz) (distribute z to both x and y, then apply)

S and K alone are Turing-complete. Everything else is convenience.
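The three combinators translate directly into curried functions. A minimal sketch in Python (the names I, K, S, ident are ours, chosen for illustration) shows that S and K alone already give you identity:

```python
# The three Unlambda combinators as curried Python functions.
I = lambda x: x                                # `ix -> x
K = lambda x: lambda y: x                      # ``kxy -> x
S = lambda x: lambda y: lambda z: x(z)(y(z))   # ```sxyz -> ``xz`yz

# S and K suffice: ``skk behaves exactly like i.
# S(K)(K)(z) = K(z)(K(z)) = z
ident = S(K)(K)
print(ident(42))   # prints: 42
```

This is why i is listed as a convenience: it can always be spelled ``skk.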

Reduction

Programs execute by repeatedly applying the reduction rules:

```skkx → ``kx`kx → x

First the s rule copies x into both positions; then the k rule keeps the first argument and discards the duplicate.

No variable substitution. Just combinators transforming into other combinators.
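The whole evaluation model fits in a few lines: parse the backtick prefix notation into an application tree, then rewrite with the i/k/s rules until no rule applies. A minimal sketch in Python (the function names parse, step, and normalize are ours; error handling is omitted):

```python
def parse(src):
    """Parse `XY prefix notation into nested tuples; atoms are single chars."""
    def go(i):
        if src[i] == '`':
            f, i = go(i + 1)   # function part
            a, i = go(i)       # argument part
            return (f, a), i
        return src[i], i + 1
    tree, _ = go(0)
    return tree

def step(t):
    """One leftmost-outermost reduction step; returns (new_tree, changed?)."""
    if isinstance(t, tuple):
        f, a = t
        if f == 'i':                                   # `ix -> x
            return a, True
        if isinstance(f, tuple) and f[0] == 'k':       # ``kxy -> x
            return f[1], True
        if (isinstance(f, tuple) and isinstance(f[0], tuple)
                and f[0][0] == 's'):                   # ```sxyz -> ``xz`yz
            x, y, z = f[0][1], f[1], a
            return ((x, z), (y, z)), True
        f2, ch = step(f)                               # otherwise recurse
        if ch:
            return (f2, a), True
        a2, ch = step(a)
        return ((f, a2) if ch else t), ch
    return t, False

def normalize(t):
    changed = True
    while changed:
        t, changed = step(t)
    return t

print(normalize(parse('```skkx')))   # prints: x
```

Note what is absent: there is no environment and no substitution of variables, only trees rewritten into other trees.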

Data as Functions

Booleans: true = k, false = `ki (true selects the first of two arguments, false the second)

Numbers: Church numerals (iteration functions)

Pairs: Encoded as selector functions
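These encodings are easiest to see in a language with named functions. A small sketch in Python (names like pair, fst, and add are ours, not part of any standard encoding):

```python
# Booleans as two-argument selectors (k and `ki in Unlambda).
true  = lambda x: lambda y: x
false = lambda x: lambda y: y

# Pairs as selector functions: a pair stores a and b, then hands them
# to whatever selector it is given.
pair = lambda a: lambda b: lambda sel: sel(a)(b)
fst  = lambda p: p(true)
snd  = lambda p: p(false)

# Church numerals: the number n is "apply f n times".
two   = lambda f: lambda x: f(f(x))
three = lambda f: lambda x: f(f(f(x)))
add   = lambda m: lambda n: lambda f: lambda x: m(f)(n(f)(x))

print(fst(pair(1)(2)))                        # prints: 1
print(add(two)(three)(lambda n: n + 1)(0))    # prints: 5
```

Nothing here is data in the usual sense: a pair is just a function waiting for a selector, and a number is just an iteration count.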

Why It Matters

1. Variables Are Syntactic Sugar

Schönfinkel showed in 1924 that any expression written with variables can be rewritten using S and K alone. Variables are human convenience, not computational necessity: lambda calculus can be mechanically translated to SK combinators via bracket abstraction.
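The mechanical translation is short enough to state in full. A sketch of the classic bracket-abstraction rules in Python (terms are strings for atoms and 2-tuples for applications; the names free_in and abstract are ours):

```python
def free_in(v, t):
    """Does variable v occur anywhere in term t?"""
    if isinstance(t, tuple):
        return free_in(v, t[0]) or free_in(v, t[1])
    return t == v

def abstract(v, t):
    """Eliminate v from t: return T such that `Tx reduces to t[v := x]."""
    if t == v:
        return 'i'                      # lambda v. v        =>  i
    if not free_in(v, t):
        return ('k', t)                 # v not free in M    =>  `kM
    f, a = t                            # t must be an application here
    return (('s', abstract(v, f)), abstract(v, a))   # lambda v. MN  =>  ``s[M][N]

# lambda x. x x  becomes  ``sii:
print(abstract('x', ('x', 'x')))   # prints: (('s', 'i'), 'i')
```

Applying the three rules repeatedly removes every variable from a lambda term, which is exactly why Unlambda can refuse to have any.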

2. The Illusion Collapses

With Python (millions of examples), LLMs fake reasoning through pattern matching. With Unlambda (thousands of examples), they achieve only 1.2% success. They can't compose functions from first principles—they need to have seen similar code before.

3. Syntax = Semantics

In Unlambda, the program structure directly determines what it computes. Every character matters: a single misplaced backtick changes the entire parse tree, and an expression with too few arguments won't parse at all. There is no boilerplate for a near-miss to hide in.

The Research

EsoLang-Bench (2026) tested Unlambda across 80 problems. Best result: GPT-5.2 at 1.2%.

The Failure Mode: Most errors are compile-time failures. Models can't even produce valid combinator expressions, let alone correct ones.

Deep Nesting: Tracking arity through ``s``s``ski requires precise logical reasoning.

No Scaffolding: Training data is 5,000-100,000× scarcer than Python. Statistical learning breaks down.

The Discovery

This isn't a training problem—it's a fundamental limitation. Transformers do correlation mining, not deductive reasoning. Compositional logic requires genuine understanding, not pattern matching.

Connection to Other Languages

Each language reveals different failure modes:

Brainfuck: 6.2% — Imperative, stateful tape

Whitespace: 0.0% — Can't tokenize invisible syntax

Unlambda: 1.2% — Can't reason about function composition

Befunge: 11.2% — Can't navigate 2D space

Unlambda and Brainfuck are opposite paradigms: imperative mutation vs functional reduction. Both minimal, both Turing-complete, both defeat LLMs through different cognitive demands.

Further Reading

📊 EsoLang-Bench: official benchmark (2026)

📄 Research Paper: arXiv:2603.09678

🌐 Official Page: David Madore's Unlambda documentation

🎓 SKI Calculus: theoretical foundation