Code Made Invisible
0.0% success: the only language where every frontier model fails completely.
What if only invisible characters mattered?
Whitespace (Brady & Morris, 2003) is a Turing-complete language where only spaces, tabs, and linefeeds have meaning. Everything else is ignored as comments.
Your code is invisible. You can hide working programs inside any text document. The void becomes the program.
Instructions are short sequences of the three characters: S S T T L (push a number), T L S S (output a character). There are twenty-four instructions in total.
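The push encoding follows the standard Whitespace scheme: the opcode S S, then a sign character (S for positive, T for negative), the magnitude in binary (S = 0, T = 1), and a terminating linefeed. A minimal sketch, with a hypothetical `visualize` helper that renders the invisible characters as S/T/L:

```python
def encode_push(n: int) -> str:
    """Encode 'push n' as Whitespace: opcode S S, sign, binary digits, L."""
    sign = " " if n >= 0 else "\t"                     # S = positive, T = negative
    bits = format(abs(n), "b")                         # magnitude in binary
    body = bits.replace("0", " ").replace("1", "\t")   # S = 0, T = 1
    return "  " + sign + body + "\n"                   # '  ' is the S S opcode

def visualize(ws: str) -> str:
    """Render invisible characters as S/T/L for inspection (helper, not part of the language)."""
    return ws.replace(" ", "S").replace("\t", "T").replace("\n", "L")

print(visualize(encode_push(3)))    # SSSTTL
print(visualize(encode_push(-1)))   # SSTTL
```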
Stack machine with three components:
Value Stack: LIFO with arbitrary-precision integers
Call Stack: Separate stack for function calls (enables recursion)
Heap: Sparse map for persistent storage
Operations: push/pop, arithmetic, heap access, labels + jumps, I/O.
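The three components can be sketched as a toy interpreter. This is not a full Whitespace VM: instructions arrive pre-decoded as tuples with hypothetical names (`push`, `outc`, `ret`, and so on) rather than parsed from raw spaces and tabs, so only the machine model is shown.

```python
class WhitespaceVM:
    """Toy model of Whitespace's machine: value stack, call stack, heap."""

    def __init__(self, program):
        self.program = program
        self.stack = []    # value stack: Python ints are arbitrary-precision
        self.calls = []    # call stack: return addresses for call/ret
        self.heap = {}     # heap: sparse address -> value map
        self.out = []

    def run(self) -> str:
        labels = {arg: i for i, (op, arg) in enumerate(self.program) if op == "label"}
        pc = 0
        while pc < len(self.program):
            op, arg = self.program[pc]
            pc += 1
            if op == "push":    self.stack.append(arg)
            elif op == "add":   b, a = self.stack.pop(), self.stack.pop(); self.stack.append(a + b)
            elif op == "store": v, addr = self.stack.pop(), self.stack.pop(); self.heap[addr] = v
            elif op == "load":  self.stack.append(self.heap[self.stack.pop()])
            elif op == "label": pass
            elif op == "call":  self.calls.append(pc); pc = labels[arg]
            elif op == "ret":   pc = self.calls.pop()
            elif op == "outc":  self.out.append(chr(self.stack.pop()))
            elif op == "end":   break
        return "".join(self.out)

# Push 'i' then 'H'; outc pops the top first, so this prints "Hi".
vm = WhitespaceVM([("push", ord("i")), ("push", ord("H")),
                   ("outc", None), ("outc", None), ("end", None)])
print(vm.run())   # Hi
```

The separate call stack is what makes recursion safe: `call` and `ret` never touch the value stack, so a function can leave results there across returns.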
Every tested model—GPT-5.2, Gemini 3 Pro, O4-mini, Qwen3, Kimi K2—achieves exactly 0%. No other esoteric language is completely unsolved. Whitespace is the hardest computational challenge for LLMs.
BPE and SentencePiece tokenizers collapse runs of whitespace for efficiency. The difference between pushing 2 (S S S T S L) and pushing 3 (S S S T T L) is a single space swapped for a tab, yet the model's vocabulary cannot represent that distinction. It's architectural blindness, not a training gap.
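A toy stand-in for that collapsing behavior (a single regex normalizer, not a real BPE vocabulary) makes the failure concrete: two different programs become indistinguishable once whitespace runs are normalized.

```python
import re

def naive_tokenize(text: str) -> list[str]:
    # Toy stand-in for a whitespace-collapsing preprocessing step:
    # every run of whitespace becomes one separator before splitting.
    return re.sub(r"\s+", " ", text).split(" ")

push_2 = "   \t \n"    # S S S T S L  -> push 2
push_3 = "   \t\t\n"   # S S S T T L  -> push 3

print(push_2 == push_3)                                  # False: different programs
print(naive_tokenize(push_2) == naive_tokenize(push_3))  # True: identical after normalization
```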
Code doesn't need to be visible to execute. You can embed Whitespace programs in Python comments, HTML whitespace, or email signatures, which proves that syntax is orthogonal to semantics.
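Extraction is trivial because an interpreter keeps only three characters. A minimal sketch; note that the host text here deliberately contains no visible spaces, since ordinary inter-word spaces would otherwise become part of the hidden program.

```python
def extract_whitespace(text: str) -> str:
    # A Whitespace interpreter keeps only space, tab, and linefeed;
    # every other byte is a comment.
    return "".join(c for c in text if c in " \t\n")

# Host text with no visible spaces, so only the hidden payload survives:
host = "#looks_like_a_normal_comment\t\t \n#another_line"
print(extract_whitespace(host) == "\t\t \n")   # True
```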
EsoLang-Bench (2026) tested five frontier models across 80 problems. Results:
Models achieving 85-95% on standard Python benchmarks drop to 0-11% on esoteric languages. Whitespace is the absolute floor—zero models can solve even Easy-tier problems.
Each esoteric language tests a different failure mode:
Brainfuck: 6.2% — Logic errors in sequential reasoning
Whitespace: 0.0% — Tokenization failure (can't see the code)
Unlambda: 1.2% — Can't compose functions without examples
Befunge: 11.2% — Spatial reasoning breakdown
Whitespace is unique: other languages fail at reasoning, Whitespace fails at perception.