← hub

Memory Systems

How agents remember. Five architectures from PostgreSQL clusters to markdown files.

Core Finding

The simplest systems (markdown files + grep) and the most complex (3-tier chronicles, 4-channel retrieval) converge on human-readable storage. Production data validates this: Manus ($2-3B Meta acquisition), Claude Code ($2.5B run-rate), OpenClaw (310k stars) all use markdown as primary memory. ElizaOS's pgvector is the only horizontally-scalable option for multi-tenancy. Memory architecture determines what scale you can serve.

The Five Approaches

Arizuko: Zero Infrastructure

Storage: diary/YYYYMMDD.md + facts/<topic>.md
Retrieval: grep/ripgrep + LLM context window
Embeddings: None

Deterministic keyword search. Files are git-versioned, human-readable, zero vendor lock-in. No database, no embeddings, no vector search.

Production validation: Claude Code ($2.5B run-rate) and OpenClaw (310k stars) started this way.

When to use: Personal assistant, small teams (<10 people), <1k files. Scales to 10k with SQLite FTS5. Upgrade when multi-tenant or semantic search at scale needed.
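A minimal sketch of this retrieval loop, assuming the diary/facts layout above (the function name and signature are illustrative, not Arizuko's actual API):

```python
import re
from pathlib import Path

def grep_memory(root: Path, pattern: str, max_hits: int = 5) -> list[str]:
    """Deterministic keyword retrieval over markdown memory files.

    Scans every *.md file under root, returns matching lines tagged with
    their file name, ready to paste into the LLM context window.
    No embeddings, no database, no index to maintain.
    """
    rx = re.compile(pattern, re.IGNORECASE)
    hits: list[str] = []
    for md in sorted(root.rglob("*.md")):
        for line in md.read_text().splitlines():
            if rx.search(line):
                hits.append(f"{md.name}: {line.strip()}")
                if len(hits) >= max_hits:
                    return hits
    return hits
```

In practice the same effect is had with `grep -ri` or `ripgrep` from a shell tool call; the point is that retrieval is a pure text scan, so the memory store stays git-diffable.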

OpenClaw: Hybrid Search

Storage: SQLite + sqlite-vec + FTS5
Retrieval: Vector similarity + BM25 keyword, weighted merge
Embeddings: OpenAI, Gemini, local (node-llama)

Hybrid search catches exact keyword matches that pure vector misses. Embedding cache by content hash. File watcher syncs memory/ and sessions/ directories. Single SQLite file, no server.

When to use: Local-first personal assistant. Hybrid retrieval (70% vector + 30% keyword default). Single-process, cannot share across channels.
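The weighted merge can be sketched as follows; the min-max normalization step is an assumption, since OpenClaw's exact score-combination logic isn't specified here:

```python
def hybrid_merge(vector_scores: dict[str, float],
                 keyword_scores: dict[str, float],
                 w_vector: float = 0.7,
                 w_keyword: float = 0.3) -> list[tuple[str, float]]:
    """Merge two doc-id -> score maps into one ranked list.

    Scores from cosine similarity and BM25 live on different scales,
    so each channel is min-max normalized to [0, 1] before the
    weighted sum (default 70% vector, 30% keyword).
    """
    def normalize(scores: dict[str, float]) -> dict[str, float]:
        if not scores:
            return {}
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0
        return {doc: (s - lo) / span for doc, s in scores.items()}

    v, k = normalize(vector_scores), normalize(keyword_scores)
    merged = {doc: w_vector * v.get(doc, 0.0) + w_keyword * k.get(doc, 0.0)
              for doc in set(v) | set(k)}
    return sorted(merged.items(), key=lambda x: x[1], reverse=True)
```

A document that only one channel found still gets ranked, which is exactly how exact-keyword hits survive a weak vector match.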

ElizaOS: Horizontally Scalable

Storage: PostgreSQL + pgvector (or PGLite for local)
Retrieval: Vector similarity only
Embeddings: Multiple dimensions (256-3072), pluggable providers

Built-in relationship tracking, room/world scoping, multi-agent coordination. Only vector search, no keyword fallback.

When to use: Multi-tenant systems serving thousands of users; it is the only horizontally-scalable option here (via PostgreSQL replication). Tradeoff: vector-only retrieval can miss exact keyword matches.
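A conceptual sketch of room-scoped, vector-only recall in plain Python. ElizaOS does this in SQL via pgvector; the dataclass and function names here are illustrative, not its actual API:

```python
import math
from dataclasses import dataclass

@dataclass
class Memory:
    room_id: str          # scope: which room/world this memory belongs to
    text: str
    embedding: list[float]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def recall(memories: list[Memory], query_emb: list[float],
           room_id: str, top_k: int = 3) -> list[Memory]:
    """Vector-only retrieval scoped to one room: filter, then rank by cosine."""
    scoped = [m for m in memories if m.room_id == room_id]
    return sorted(scoped, key=lambda m: cosine(m.embedding, query_emb),
                  reverse=True)[:top_k]
```

With pgvector the filter and the similarity ordering become a single indexed SQL query, which is what makes the approach shard- and replica-friendly.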

Hindsight: Multi-Channel Fusion

Storage: FAISS/ChromaDB + epistemic networks
Retrieval: 4-channel (semantic + BM25 + graph + temporal) with RRF fusion
Performance: 91.4% on LongMemEval (validated)

Epistemic networks separate world/experience/opinion knowledge. Graph spreading activation applies a configurable decay per hop. Disposition-aware synthesis: CARA reflection tunes retrieval by skepticism/literalism/empathy traits, each scored 1-5.

When to use: Complex reasoning requiring graph relationships, temporal context, validated performance. ~5-8k LOC novel architecture. Not designed for multi-tenancy.
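Reciprocal Rank Fusion itself is compact; a sketch of merging the four channel rankings (the channel contents below are hypothetical):

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: score(d) = sum over channels of 1 / (k + rank).

    Each channel contributes only ranks, not raw scores, so semantic,
    BM25, graph, and temporal results can be merged without any
    cross-channel score calibration. k=60 is the conventional constant.
    """
    scores: dict[str, float] = {}
    for ranked in rankings:
        for rank, doc in enumerate(ranked, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)
```

A document ranked moderately by every channel beats one ranked first by a single channel, which is the property that makes 4-channel fusion robust.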

Not Compared Above

NanoClaw

CLAUDE.md files per group. Zero database. Model's context window IS the retrieval. Elegant for one group, cannot share knowledge across tenants.

Muaddib

3-tier markdown chronicles (short/mid/long term). Human-readable, auditable. Each channel's memory walled in its own QEMU micro-VM. Zero cross-channel recall.

Decision Matrix

Multi-tenant (1000+ users): ElizaOS (only horizontally-scalable option)
Personal assistant (<1k files): Arizuko (zero-infra markdown)
Keyword precision required: OpenClaw (hybrid vector + BM25)
Complex reasoning (graph relations): Hindsight (4-channel + epistemic nets)
Audit trail (compliance/debugging): Arizuko, NanoClaw, Muaddib (markdown + git)
Intuitive UX > performance: MemPalace (spatial metaphor)

Key Tradeoffs

Markdown vs Vector: Unit Economics

With prompt caching (10x cheaper), reading markdown context costs less than embedding + vector search for <10k files. Claude 3.5 Haiku cached reads: $0.30/M vs OpenAI text-embedding-3-small: $0.02/M. But embeddings scale better beyond 10k files.
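A back-of-envelope version of this comparison, using the prices above and a hypothetical corpus of 5,000 markdown files at ~500 tokens each:

```python
# Hypothetical corpus assumptions (not from measured data):
files, tokens_per_file = 5_000, 500
corpus_tokens = files * tokens_per_file            # 2,500,000 tokens

cached_read_per_m = 0.30   # Claude 3.5 Haiku cached input, $/M tokens
embed_per_m = 0.02         # OpenAI text-embedding-3-small, $/M tokens

# One-time cost to embed the whole corpus:
embed_cost = corpus_tokens / 1e6 * embed_per_m     # dollars

# Per-query cost to pull, say, 20 relevant files into cached context:
read_tokens = 20 * tokens_per_file
read_cost_per_query = read_tokens / 1e6 * cached_read_per_m  # dollars
```

At this scale both figures are fractions of a dollar, so per-token price is not the deciding factor; the markdown route's real saving is avoiding the vector store and its re-indexing pipeline entirely.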

Hybrid Search Advantage

OpenClaw's BM25 + vector catches queries like "that conversation about the Redis migration" that pure vector search misses. Default 70% vector, 30% keyword. Configurable weights.

Multi-Tenancy Bottleneck

ElizaOS is the only system designed for horizontal scaling (PostgreSQL replication). SQLite (OpenClaw) is single-process. Markdown files (Arizuko, NanoClaw) are per-group. Hindsight/MemPalace are per-agent. Memory architecture determines deployment scale.

Benchmark Paradox (MemPalace)

MemPalace's headline 96.6% measures the baseline ChromaDB pipeline, not the palace features. Enabling room filtering degrades accuracy to 89.4% (-7.2 points); enabling compression, to 84.2% (-12.4 points). Lesson: benchmark features ON vs OFF, not just the final score.

Production Validation

The markdown-first approach (Arizuko, NanoClaw, Muaddib) has the broadest production validation: Claude Code ($2.5B run-rate), OpenClaw (310k stars), and Manus all use markdown as primary memory.

Hindsight's 91.4% on LongMemEval validates multi-channel fusion (semantic + BM25 + graph + temporal). ElizaOS's PostgreSQL + pgvector is the only proven horizontally-scalable option for multi-tenant deployments.

When to Upgrade

From markdown to SQLite FTS5: >1k files, need faster keyword search

From SQLite to PostgreSQL: Multi-tenant (>100 users), need horizontal scaling

From keyword to hybrid: Need semantic similarity + exact keyword precision

From hybrid to multi-channel: Complex reasoning requiring graph relations, temporal context, validated performance
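The first upgrade step can be sketched with Python's built-in sqlite3 module. This requires an SQLite build with FTS5 enabled (standard CPython distributions include it); the paths and note contents are illustrative:

```python
import sqlite3

# In-memory DB for the sketch; a real upgrade points at a file
# living alongside the markdown it indexes.
db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE notes USING fts5(path, body)")
db.executemany(
    "INSERT INTO notes (path, body) VALUES (?, ?)",
    [
        ("diary/20250101.md", "Discussed the Redis migration plan."),
        ("facts/postgres.md", "PostgreSQL replication enables horizontal scaling."),
    ],
)
# BM25-ranked keyword search replaces a linear grep over thousands of files.
rows = db.execute(
    "SELECT path FROM notes WHERE notes MATCH ? ORDER BY rank", ("redis",)
).fetchall()
```

The markdown files stay the source of truth; the FTS5 table is a disposable index that can be rebuilt from them at any time.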