Code Made Invisible
0.0% success: the only language where every frontier model fails completely.
What if only invisible characters mattered?
Whitespace (Brady & Morris, 2003) is a Turing-complete language where only spaces, tabs, and linefeeds have meaning. Everything else is ignored as comments.
Your code is invisible. You can hide working programs inside any text document. The void becomes the program.
Instructions are short sequences of the three characters: S S T T L (push a number), T L S S (output a character). There are twenty-four instructions in total.
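The push encoding follows the standard Whitespace scheme: the opcode S S, then a sign character (S for positive, T for negative), the magnitude in binary (S = 0, T = 1), and a terminating linefeed. A minimal sketch, with a hypothetical `visualize` helper that renders the invisible characters as S/T/L:

```python
def encode_push(n: int) -> str:
    """Encode 'push n' as Whitespace: opcode S S, sign, binary digits, L."""
    sign = " " if n >= 0 else "\t"                     # S = positive, T = negative
    bits = format(abs(n), "b")                         # magnitude in binary
    body = bits.replace("0", " ").replace("1", "\t")   # S = 0, T = 1
    return "  " + sign + body + "\n"                   # '  ' is the S S opcode

def visualize(ws: str) -> str:
    """Render invisible characters as S/T/L for inspection (helper, not part of the language)."""
    return ws.replace(" ", "S").replace("\t", "T").replace("\n", "L")

print(visualize(encode_push(3)))    # SSSTTL
print(visualize(encode_push(-1)))   # SSTTL
```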
Stack machine with three components:
Value Stack: LIFO with arbitrary-precision integers
Call Stack: Separate stack for function calls (enables recursion)
Heap: Sparse map for persistent storage
Operations: push/pop, arithmetic, heap access, labels + jumps, I/O.
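The three components can be sketched as a toy interpreter. This is not a full Whitespace VM: instructions arrive pre-decoded as tuples with hypothetical names (`push`, `outc`, `ret`, and so on) rather than parsed from raw spaces and tabs, so only the machine model is shown.

```python
class WhitespaceVM:
    """Toy model of Whitespace's machine: value stack, call stack, heap."""

    def __init__(self, program):
        self.program = program
        self.stack = []    # value stack: Python ints are arbitrary-precision
        self.calls = []    # call stack: return addresses for call/ret
        self.heap = {}     # heap: sparse address -> value map
        self.out = []

    def run(self) -> str:
        labels = {arg: i for i, (op, arg) in enumerate(self.program) if op == "label"}
        pc = 0
        while pc < len(self.program):
            op, arg = self.program[pc]
            pc += 1
            if op == "push":    self.stack.append(arg)
            elif op == "add":   b, a = self.stack.pop(), self.stack.pop(); self.stack.append(a + b)
            elif op == "store": v, addr = self.stack.pop(), self.stack.pop(); self.heap[addr] = v
            elif op == "load":  self.stack.append(self.heap[self.stack.pop()])
            elif op == "label": pass
            elif op == "call":  self.calls.append(pc); pc = labels[arg]
            elif op == "ret":   pc = self.calls.pop()
            elif op == "outc":  self.out.append(chr(self.stack.pop()))
            elif op == "end":   break
        return "".join(self.out)

# Push 'i' then 'H'; outc pops the top first, so this prints "Hi".
vm = WhitespaceVM([("push", ord("i")), ("push", ord("H")),
                   ("outc", None), ("outc", None), ("end", None)])
print(vm.run())   # Hi
```

The separate call stack is what makes recursion safe: `call` and `ret` never touch the value stack, so a function can leave results there across returns.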
Every tested model—GPT-5.2, Gemini 3 Pro, O4-mini, Qwen3, Kimi K2—achieves exactly 0%. No other esoteric language is completely unsolved. Whitespace is the hardest computational challenge for LLMs.
BPE and SentencePiece tokenizers collapse runs of whitespace for efficiency. The difference between pushing 2 (S S S T S L) and pushing 3 (S S S T T L) is a single space swapped for a tab, yet the model's vocabulary cannot represent that distinction. It's architectural blindness, not a training gap.
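A toy stand-in for that collapsing behavior (a single regex normalizer, not a real BPE vocabulary) makes the failure concrete: two different programs become indistinguishable once whitespace runs are normalized.

```python
import re

def naive_tokenize(text: str) -> list[str]:
    # Toy stand-in for a whitespace-collapsing preprocessing step:
    # every run of whitespace becomes one separator before splitting.
    return re.sub(r"\s+", " ", text).split(" ")

push_2 = "   \t \n"    # S S S T S L  -> push 2
push_3 = "   \t\t\n"   # S S S T T L  -> push 3

print(push_2 == push_3)                                  # False: different programs
print(naive_tokenize(push_2) == naive_tokenize(push_3))  # True: identical after normalization
```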
Code doesn't need to be visible to execute. You can embed Whitespace programs in Python comments, HTML whitespace, or email signatures, which proves that syntax is orthogonal to semantics.
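Extraction is trivial because an interpreter keeps only three characters. A minimal sketch; note that the host text here deliberately contains no visible spaces, since ordinary inter-word spaces would otherwise become part of the hidden program.

```python
def extract_whitespace(text: str) -> str:
    # A Whitespace interpreter keeps only space, tab, and linefeed;
    # every other byte is a comment.
    return "".join(c for c in text if c in " \t\n")

# Host text with no visible spaces, so only the hidden payload survives:
host = "#looks_like_a_normal_comment\t\t \n#another_line"
print(extract_whitespace(host) == "\t\t \n")   # True
```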
EsoLang-Bench (2026) tested five frontier models across 80 problems. Results:
Models achieving 85-95% on standard Python benchmarks drop to 0-11% on esoteric languages. Whitespace is the absolute floor—zero models can solve even Easy-tier problems.
Each esoteric language tests a different failure mode:
Brainfuck: 6.2% — Logic errors in sequential reasoning
Whitespace: 0.0% — Tokenization failure (can't see the code)
Unlambda: 1.2% — Can't compose functions without examples
Befunge: 11.2% — Spatial reasoning breakdown
Whitespace is unique: other languages fail at reasoning, Whitespace fails at perception.