Tractable Research Program
Three phases: (1) accelerate BFF complexity through environmental structure, (2) introduce external resources to measure within-lifetime adaptation, (3) embed ARC-like tasks to test abstraction emergence. Each phase has code, metrics, and expected outcomes.
Get the code:
```bash
git clone https://krons.fiu.wtf/bff/ts/
cd ts && npm install && npm run build
```
Break the plateau. Make complexity grow faster and further than baseline BFF.
Resource gradients create evolutionary "hotspots" where novel traits emerge faster.
Implementation:
```typescript
// Modify soup.ts: add a region-specific fitness bonus.
// Coordinates are assumed normalized to [0, 1), giving a 2x2 grid of regions.
getRegionBonus(code: string, x: number, y: number): number {
  const region = Math.floor(x * 2) + Math.floor(y * 2) * 2; // 0..3
  const bonusChar = ['+', '-', '>', '<'][region];
  return code.includes(bonusChar) ? 1.2 : 1.0;
}
```
Metrics:
Expected outcome: Higher sustained diversity than uniform soup. Specialists emerge for each region.
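The diversity claim needs a concrete measure. A minimal sketch, assuming soup programs are available as code strings (the function name `opcodeEntropy` is mine, not from the repo): Shannon entropy of the soup-wide opcode distribution, sampled each epoch.

```typescript
// Shannon entropy (in bits) of the opcode distribution across the soup.
// Higher entropy = more even use of the instruction set = more diversity.
// `programs` is assumed to be an array of BFF code strings.
function opcodeEntropy(programs: string[]): number {
  const counts = new Map<string, number>();
  let total = 0;
  for (const code of programs) {
    for (const op of code) {
      counts.set(op, (counts.get(op) ?? 0) + 1);
      total++;
    }
  }
  if (total === 0) return 0;
  let h = 0;
  for (const n of counts.values()) {
    const p = n / total;
    h -= p * Math.log2(p);
  }
  return h;
}
```

Logged per epoch, a heterogeneous soup should hold a higher entropy plateau than the uniform baseline.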
Arms races drive complexity. Parasites force hosts to evolve defenses; hosts force parasites to evolve exploits.
Implementation:
```typescript
// Parasite detection: does the program read from an external tape
// more than half as often as it reads its own code?
isParasitic(prog: Program): boolean {
  const result = BFF.exec(prog.code);
  return result.inputReads > result.selfReads * 0.5;
}
```
Metrics:
Expected outcome: Based on Avida research, expect 2-3× complexity increase vs baseline.
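Comparing complexity against baseline requires a measurement. A common proxy in the artificial-life literature is compressed length as a stand-in for Kolmogorov complexity; a sketch using Node's built-in zlib (the helper name is illustrative):

```typescript
import { deflateSync } from 'zlib';

// Kolmogorov-complexity proxy: length of the zlib-compressed soup.
// The raw number is not meaningful on its own; track its trajectory
// across epochs for treatment vs. baseline runs.
function compressedLength(soup: Uint8Array): number {
  return deflateSync(Buffer.from(soup)).length;
}
```

Log this once per epoch for both conditions and compare the curves, not single values.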
Isolated populations explore different fitness peaks. Periodic migration spreads innovations.
Implementation:
```typescript
// Island migration: each island receives copies of the top-K replicators
// from its ring-topology neighbor.
migrateIslands(islands: Soup[], topK: number): void {
  const migrants = islands.map(s =>
    [...s.programs] // copy first so sorting doesn't reorder the island itself
      .sort((a, b) => b.replicated - a.replicated)
      .slice(0, topK)
  );
  islands.forEach((s, i) => {
    const source = (i - 1 + islands.length) % islands.length;
    s.programs.push(...migrants[source].map(p => ({ ...p }))); // shallow copies
  });
}
```
Metrics:
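One candidate metric for the island experiment (the helper name is mine): Jaccard distance between two islands' sets of unique genotypes, which should rise between migration events and drop immediately after them.

```typescript
// Jaccard distance between two islands' sets of unique genotypes.
// 0 = identical populations, 1 = no shared genotypes.
function genotypeDivergence(a: string[], b: string[]): number {
  const setA = new Set(a);
  const setB = new Set(b);
  let shared = 0;
  for (const g of setA) if (setB.has(g)) shared++;
  const union = setA.size + setB.size - shared;
  return union === 0 ? 0 : 1 - shared / union;
}
```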
Introduce external resources that programs can sense and manipulate. Measure whether organisms adapt within their lifetime, not just across generations.
If programs can affect external state that influences their replication, they may develop "behaviors" beyond pure self-copying.
Implementation:
```typescript
// Environmental food whose location cycles through four tape addresses
// (200, 210, 220, 230), advancing one phase every 100 epochs.
updateEnvironment(tape: Uint8Array, epoch: number): void {
  const phase = Math.floor(epoch / 100) % 4;
  const foodAddr = 200 + phase * 10;
  tape.fill(0, 200, 256); // clear the food region
  tape[foodAddr] = 255;   // current food location
}
```
Metrics:
If environmental changes follow learnable patterns, organisms that anticipate outcompete those that merely react.
Implementation:
Metrics:
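A hedged sketch of how anticipation could be scored against the periodic food schedule above, assuming the interpreter can report tape-write events (the `WriteEvent` shape and helper name are my assumptions, not repo API): writes that land on the *next* phase's food address before the phase actually switches count as anticipatory.

```typescript
// A write event: which tape address a program wrote to, and when.
interface WriteEvent { epoch: number; addr: number; }

// Fraction of writes that target the NEXT phase's food address
// (food at 200 + phase*10, phase advancing every 100 epochs).
// A reactive program only touches the current address; an anticipatory
// one pre-positions output at the upcoming address.
function anticipationScore(writes: WriteEvent[]): number {
  if (writes.length === 0) return 0;
  let anticipatory = 0;
  for (const w of writes) {
    const phase = Math.floor(w.epoch / 100) % 4;
    const nextFood = 200 + ((phase + 1) % 4) * 10;
    if (w.addr === nextFood) anticipatory++;
  }
  return anticipatory / writes.length;
}
```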
True learning = same organism improves performance during execution. Not evolution—the same program gets better.
Implementation:
```typescript
// Track execution efficiency over time: run the same program under growing
// step budgets and record output produced per step.
measureAdaptation(prog: Program): number[] {
  const efficiencies: number[] = [];
  for (let window = 100; window <= 1000; window += 100) {
    const result = BFF.exec(prog.code, [], window); // window = step budget
    efficiencies.push(result.output.length / window);
  }
  return efficiencies; // an increasing series suggests within-lifetime learning
}
```
Key metric: Efficiency slope. Positive slope = within-lifetime improvement. Zero/negative = no learning.
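The slope can be computed by ordinary least squares over the series returned by measureAdaptation (the helper name is mine):

```typescript
// Least-squares slope of efficiency vs. window index.
// Positive = within-lifetime improvement; zero/negative = no learning.
function efficiencySlope(eff: number[]): number {
  const n = eff.length;
  if (n < 2) return 0;
  const xMean = (n - 1) / 2;
  const yMean = eff.reduce((s, y) => s + y, 0) / n;
  let num = 0, den = 0;
  for (let i = 0; i < n; i++) {
    num += (i - xMean) * (eff[i] - yMean);
    den += (i - xMean) * (i - xMean);
  }
  return num / den;
}
```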
Embed simplified ARC-like tasks into the soup. Test whether evolved programs can solve transformation puzzles and transfer to novel variants.
Environment presents input → output pairs. Programs that produce correct output get replication bonus.
Task examples (simplified ARC):
```typescript
// Score a program on a transformation task: the fraction of expected
// output bytes it reproduces.
scoreTransform(prog: Program, input: number[], expected: number[]): number {
  const result = BFF.exec(prog.code, input);
  let matches = 0;
  for (let i = 0; i < expected.length; i++) {
    if (result.output[i] === expected[i]) matches++;
  }
  return matches / expected.length;
}
```
Train on task variants A, B, C. Test on unseen variant D. Does performance transfer?
Example (increment family):
If programs solve unseen variants, they've abstracted the concept of incrementing, not just memorized specific cases.
Metrics:
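One simple way to report transfer (the helper name is mine): the gap between the mean score on trained variants and the score on the held-out variant.

```typescript
// Transfer gap: mean score on trained variants minus held-out variant score.
// Near zero = the abstraction transferred; large positive = memorization.
function transferGap(trainScores: number[], testScore: number): number {
  const trainMean =
    trainScores.reduce((s, x) => s + x, 0) / trainScores.length;
  return trainMean - testScore;
}
```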
Instead of optimizing for task performance, use novelty search. Reward behavioral diversity.
Novelty metric (sparseness):

\[\rho(x) = \frac{1}{k} \sum_{i=1}^{k} \operatorname{dist}(x, \mu_i)\]

where \(\mu_i\) are the k-nearest neighbors of \(x\) in behavior space. Higher sparseness \(\rho(x)\) = more novel behavior.
Behavior characterization:
Hypothesis: Novelty search finds programs that solve tasks via abstraction rather than memorization, because memorization is "easy" (not novel) while abstraction is "rare" (novel).
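The sparseness computation follows directly from its definition (the helper name is mine): mean Euclidean distance from a behavior vector to its k nearest neighbors in the archive.

```typescript
// Sparseness of behavior vector x relative to an archive of past behaviors:
// mean Euclidean distance to its k nearest neighbors.
function sparseness(x: number[], archive: number[][], k: number): number {
  const dist = (a: number[], b: number[]) =>
    Math.sqrt(a.reduce((s, ai, i) => s + (ai - b[i]) ** 2, 0));
  const nearest = archive
    .map(mu => dist(x, mu))       // distance to every archived behavior
    .sort((a, b) => a - b)        // ascending
    .slice(0, k);                 // k nearest
  return nearest.reduce((s, d) => s + d, 0) / nearest.length;
}
```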
How do we know if a program is "learning" vs just being a product of evolution?
Operational definition:
| Signal | Evolution | Learning |
|---|---|---|
| Timescale | Across generations | Within lifetime |
| Entity | Population improves | Individual improves |
| Mechanism | Selection + variation | Internal state update |
| Transfer | Offspring inherit | Same entity applies elsewhere |
```
bff/ts/
├── src/
│   ├── bff.ts      # BFF interpreter
│   ├── soup.ts     # Primordial soup simulation
│   ├── cli.ts      # Command line interface
│   └── index.ts    # Library exports
├── package.json
└── tsconfig.json
```
```bash
# Clone and build
git clone https://krons.fiu.wtf/bff/ts/
cd ts && npm install && npm run build

# Run basic simulation
node dist/cli.js simulate --size 200 --epochs 5000 --verbose

# Interactive REPL
node dist/cli.js repl
```
Each experiment requires modifications to soup.ts. Fork the repo, create a branch for each experiment:
```bash
git checkout -b exp-1.1-spatial-heterogeneity
# Edit src/soup.ts
npm run build
node dist/cli.js simulate --config experiments/spatial.json
```