Transform codebases into queryable knowledge graphs. 71.5x fewer tokens per query.
Research completed: 2026-04-07
Pass 1 - Deterministic: tree-sitter extracts code structure (classes, functions, imports, call graphs) without LLM involvement. Fast, local, reproducible.
Pass 2 - Semantic: Claude subagents analyze docs, PDFs, images to extract concepts and relationships. Parallel execution, merged into NetworkX graph.
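A minimal sketch of the Pass-1 idea. The real pipeline uses tree-sitter for language-agnostic parsing; Python's stdlib `ast` stands in here so the example is self-contained, and all names in the demo source are illustrative:

```python
import ast

def extract_structure(source: str) -> dict:
    """Pass-1 style structural extraction: classes, functions, imports,
    and call sites, with no LLM involved. Deterministic and reproducible
    because it is pure parsing."""
    tree = ast.parse(source)
    out = {"classes": [], "functions": [], "imports": [], "calls": []}
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef):
            out["classes"].append(node.name)
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            out["functions"].append(node.name)
        elif isinstance(node, ast.Import):
            out["imports"].extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            out["imports"].append(node.module or "")
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            out["calls"].append(node.func.id)
    return out

demo = """
import jwt

class AuthModule:
    def login(self, user):
        return issue_token(user)
"""
print(extract_structure(demo))
```

Each extracted name becomes a node; each import and call site becomes an EXTRACTED edge.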
EXTRACTED: Directly discovered from source (imports, function calls, explicit mentions).
INFERRED: Reasonable deduction by Claude with confidence score 0.0-1.0. Example: "similar to X" → (A, X, SIMILAR_TO, 0.7).
AMBIGUOUS: Unclear or contradictory, flagged for human review.
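The three-tier taxonomy can be modeled as edges carrying provenance and a confidence score. A hedged sketch: the field names and the 0.5 review threshold are illustrative, not Graphify's actual schema:

```python
from dataclasses import dataclass

@dataclass
class Edge:
    source: str
    target: str
    relation: str
    tier: str          # "EXTRACTED" | "INFERRED" | "AMBIGUOUS"
    confidence: float  # 1.0 for extracted facts, 0.0-1.0 for inferences

    def needs_review(self) -> bool:
        # AMBIGUOUS edges always go to a human; low-confidence
        # inferences can be routed there too.
        return self.tier == "AMBIGUOUS" or (
            self.tier == "INFERRED" and self.confidence < 0.5)

# A doc saying "similar to X" becomes a hedged INFERRED edge:
e = Edge("A", "X", "SIMILAR_TO", "INFERRED", 0.7)
print(e.needs_review())  # False: confident enough to keep automatically
```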
Leiden community detection identifies clusters based on graph edge density, not vector embeddings. The graph structure itself is the similarity signal. Discovers natural modules in code, shared patterns across systems.
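Leiden (like Louvain before it) searches for the node partition that maximizes modularity. Production use needs `leidenalg`/`igraph`, but the objective itself fits in stdlib Python, and a toy two-clique graph shows why the natural split wins on edge density alone:

```python
def modularity(edges, partition):
    """Newman-Girvan modularity Q: fraction of edges inside communities
    minus the fraction expected under random degree-preserving rewiring.
    Leiden searches for the partition maximizing this quantity."""
    m = len(edges)
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    q = 0.0
    for community in partition:
        nodes = set(community)
        inside = sum(1 for u, v in edges if u in nodes and v in nodes)
        d_tot = sum(deg[n] for n in nodes)
        q += inside / m - (d_tot / (2 * m)) ** 2
    return q

# Two 3-cliques joined by a single bridge edge:
edges = [("a1", "a2"), ("a1", "a3"), ("a2", "a3"),
         ("b1", "b2"), ("b1", "b3"), ("b2", "b3"),
         ("a1", "b1")]
good = modularity(edges, [{"a1", "a2", "a3"}, {"b1", "b2", "b3"}])
bad = modularity(edges, [{"a1", "b1"}, {"a2", "a3", "b2", "b3"}])
print(round(good, 3), round(bad, 3))  # the natural split scores higher
```

No embeddings anywhere: the score depends only on which edges land inside communities.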
PreToolUse hooks let AI assistants query the graph before reading raw files. Git integration rebuilds graph on commits/branch switches. Platform-agnostic: Claude Code, Codex, OpenCode, OpenClaw, Factory Droid.
Scenario: Developer asks "How does the authentication module work?"
Without Graphify: LLM must read auth.py (2.3k tokens), user.py (1.8k), session.py (1.5k), middleware.py (0.9k), config.py (0.6k), tests (3k) ≈ 10k tokens to find relevant context.
With Graphify: Query graph for "authentication" → returns subgraph: AuthModule → uses → {JWTHandler, SessionStore, UserModel} → called_by → {LoginEndpoint, RefreshEndpoint}. Graph response: ~140 tokens. Then read ONLY the 2-3 relevant files suggested by graph.
Result: 140 tokens (query) + 4k (targeted reads) = 4,140 tokens vs 10k. But averaged across many queries, graph structure reuse compounds: same subgraph answers "who calls auth?", "what depends on sessions?", etc. 71.5x is aggregate efficiency across diverse queries.
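The query in the scenario above can be sketched with a plain adjacency structure (the real graph lives in NetworkX; node names are taken from the scenario):

```python
# Edges are (relation, target) pairs keyed by source node.
GRAPH = {
    "AuthModule": [("uses", "JWTHandler"), ("uses", "SessionStore"),
                   ("uses", "UserModel"),
                   ("called_by", "LoginEndpoint"),
                   ("called_by", "RefreshEndpoint")],
    "JWTHandler": [("defined_in", "auth.py")],
    "SessionStore": [("defined_in", "session.py")],
}

def subgraph(node, depth=1):
    """Return edges within `depth` hops of `node` -- the ~140-token
    answer the LLM sees instead of the raw files."""
    frontier, seen, out = [node], {node}, []
    for _ in range(depth):
        nxt = []
        for n in frontier:
            for rel, tgt in GRAPH.get(n, []):
                out.append((n, rel, tgt))
                if tgt not in seen:
                    seen.add(tgt)
                    nxt.append(tgt)
        frontier = nxt
    return out

for edge in subgraph("AuthModule"):
    print(edge)
```

The same stored subgraph answers "who calls auth?" and "what depends on sessions?" with no further file reads, which is where the aggregate savings come from.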
Our /agents hub analyzes 7 systems across ~30 pages. Graphify transforms this into a queryable knowledge base.
| Graph Element | Example Nodes | Example Relationships |
|---|---|---|
| Systems | OpenClaw, Muaddib, ElizaOS, Hermes, NemoClaw, Cline, Claude Code | NemoClaw extends OpenClaw |
| Patterns | Network isolation, doom loops, blueprints, context compaction (13 total) | Stripe implements network_isolation |
| Concepts | Subagents, memory, tools, security, routing, plugins | tools concept applies_to all_systems |
| Tech | tree-sitter, pgvector, QEMU, Landlock, seccomp, vis.js | ElizaOS uses pgvector |
| Implementations | Ring buffer hash, 70/20/10 truncation, Leiden clustering | doom_loop_detection seen_in brainpro |
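The table rows reduce to (subject, relation, object) triples, which is all a pattern query needs. A hedged stand-in for the NetworkX graph, with the triple list abridged:

```python
# Names come from the table above; the real graph holds many more edges.
TRIPLES = [
    ("NemoClaw", "extends", "OpenClaw"),
    ("Stripe", "implements", "network_isolation"),
    ("ElizaOS", "uses", "pgvector"),
    ("doom_loop_detection", "seen_in", "brainpro"),
]

def query(relation=None, obj=None):
    """All subjects matching an optional relation/object pattern."""
    return [s for s, r, o in TRIPLES
            if (relation is None or r == relation)
            and (obj is None or o == obj)]

print(query(relation="uses", obj="pgvector"))  # → ['ElizaOS']
```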
Running Leiden on our agent research graph would likely discover clusters like these, based purely on edge density:

- **Isolation and sandboxing:** Muaddib, NemoClaw, Stripe, network isolation, QEMU, Landlock, seccomp. Dense connections around security and untrusted code execution.
- **Multi-channel reach:** OpenClaw, ElizaOS, pgvector, 24+ channels, plugin marketplace, multi-tenancy. Systems optimizing for audience reach.
- **Interactive dev tooling:** Cline, Claude Code, doom loops, context compaction, MCP, interactive workflows. Single-user developer tools.
- **Memory:** ElizaOS pgvector, OpenClaw hybrid search, Muaddib 3-tier chronicles, NanoClaw CLAUDE.md. Different memory strategies.
Query: "Which systems prioritize multi-tenancy?"
Graph response: OpenClaw (24+ channels, one process), ElizaOS (PostgreSQL horizontal scale), Hermes (11+ channels). Muaddib explicitly rejects multi-tenancy (QEMU VMs are single-user).
Token cost: ~200 tokens vs 15k reading all system overviews.
Query: "Show me all isolation techniques and their tradeoffs."
Graph response: {Network isolation: cuts network, keeps filesystem} → Stripe. {QEMU micro-VMs: strongest isolation, max 8 concurrent} → Muaddib. {Landlock + seccomp + netns: sandboxes whole bot, not per-user} → NemoClaw. {bwrap: per-execution sandbox} → Claude Code.
Token cost: ~400 tokens vs 25k reading security sections across 7 systems.
Input: existing markdown in refs/, patterns.md, comparison.md. tree-sitter for code examples. Claude analyzes research notes. Manual seed: SYSTEMS.toml with metadata.
EXTRACTED from explicit statements. INFERRED from implicit connections (Claude analyzes "similar to", "inspired by"). Confidence scoring. Flag AMBIGUOUS for review.
Run Leiden clustering. Validate discovered clusters against intuitive categorization. Visualize with color-coded communities.
graph.html with vis.js interactive visualization. Click nodes, filter relationships, search. Embed on main hub page. Link to deep-dive pages.
Export graph.json. AGENTS_GRAPH.md with query patterns. PreToolUse hooks: query graph before file reads. Telegram bot: graph queries from chat.
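A PreToolUse hook script might look like this sketch. The payload field names (`tool_name`, `tool_input`, `file_path`) follow Claude Code's hook input; the graph.json layout shown here (file → related nodes) is an assumption for illustration:

```python
import json

def graph_context(payload, graph):
    """Return related graph nodes for a file the model is about to Read,
    so structural context arrives before the raw bytes. Returns None
    when the hook has nothing to add."""
    if payload.get("tool_name") != "Read":
        return None
    path = payload.get("tool_input", {}).get("file_path", "")
    related = graph.get(path)
    if not related:
        return None
    return f"graph: {path} relates to {', '.join(related)}"

# A real hook script reads the payload from stdin (json.load(sys.stdin))
# and the graph from the exported graph.json; here we feed both directly:
payload = json.loads(
    '{"tool_name": "Read", "tool_input": {"file_path": "auth.py"}}')
print(graph_context(payload, {"auth.py": ["JWTHandler", "SessionStore"]}))
```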
Claude Code handles 34M codebase searches/week with Haiku Explore agents, not knowledge graphs. Understanding why reveals the tradeoffs.
Graph: 5-30 min build time before first query. User waits or works with stale graph.
Explore: Instant. First query answered in 3-5 sec on a fresh clone. Zero-friction onboarding wins.
Graph: Represents codebase at time T. Code changes → stale. Options: auto-rebuild (expensive), manual update (users forget), git hooks (miss uncommitted changes).
Explore: Always queries live filesystem. Sees uncommitted WIP, current branch, temp files. Zero sync needed.
At 34M runs/week: Haiku costs $17k/week (34M × 2k tokens × $0.25/M). Graph maintenance: 100k graphs × churn × rebuild cost >> $17k.
Anthropic's calculus: Stateless agents scale O(queries). Graphs scale O(users × codebases × churn).
Graph INFERRED relationships: a confidence of 0.7 means roughly a 30% chance the edge is hallucinated. "Similar to X" based on what evidence?
Explore + ripgrep: Exact matches, 100% precision. Regex support, line-level accuracy. LLM interprets, doesn't search.
Graphs excel: Structural queries ("trace call graph A→B→C"). Transitive relationships.
Graphs fail: "Find TODO mentioning 'security'", regex patterns, git-aware queries, exact line matches. Explore handles all.
Graph build: 500k tokens ($0.125). Break-even: 10 queries. Average session: 3-5 queries.
Reality: For 95% of users, graph ROI is negative. Power users (100+ queries) are <5%.
Graphify's metric: Query "How does auth work?" → Without graph: read 6 files (10k tokens). With graph: 140 tokens. 71.5x savings!
Assumptions: (1) User would read ALL 6 files (unlikely—they'd grep first), (2) Graph correctly identifies relevant files (assumes perfect recall), (3) Graph query alone answers question (usually you still read code).
Realistic comparison: Grep "auth" → 20 matches, read 2-3 files → 5k tokens. Graph query (140) + read 2-3 files → 5.14k tokens. Real savings: ~0%.
Where 71.5x holds: Power users, 100+ queries, never reading raw code (just querying structure). Rare.
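The break-even arithmetic as a toy model. The numbers come from this section; note that a 10-query break-even on a 500k-token build implies roughly 50k tokens saved per query, which the realistic comparison above disputes:

```python
def break_even_queries(build_tokens, price_per_mtok, saved_tokens_per_query):
    """Queries needed before the one-time graph-build cost is repaid by
    per-query token savings (one-shot model: ignores rebuilds forced by
    code churn, which only push break-even further out)."""
    build_cost = build_tokens * price_per_mtok / 1e6
    savings_per_query = saved_tokens_per_query * price_per_mtok / 1e6
    return build_cost / savings_per_query

# Section's figures: 500k-token build at $0.25/M ($0.125), break-even 10.
print(break_even_queries(500_000, 0.25, 50_000))  # → 10.0
```

Since the token price cancels, break-even depends only on build tokens divided by tokens saved per query; a 3-5 query session never gets there unless per-query savings are large.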
| Use Case | Why Graphs Win | Claude Code Alternative |
|---|---|---|
| Architectural queries | Transitive relationships (A→B→C→D). Trace call chains, module dependencies. | Multiple Explore calls iteratively (3-5 queries, 15 sec, $0.05 vs 1 graph query, instant, $0.0002). But only 1% of queries need this. |
| Cross-codebase patterns | Relationships span repos. "How do 7 agent systems handle memory?" | Claude Code scoped to single repo. Graphify's sweet spot: multi-repo research hubs (our /agents use case). |
| Concept discovery | Leiden clustering surfaces structure humans didn't label. | Claude Code doesn't attempt this. User asks "explain architecture" → Sonnet reads + synthesizes. Not automated discovery. |
Takeaway: Graphify and Claude Code solve different problems. Graphify: multi-repo research, architecture discovery, power users. Claude Code: instant onboarding, always-fresh, millions of users, typical usage (3-5 queries/session). For our /agents hub (7 systems, cross-cutting analysis), graphs win. For single-codebase daily workflow, Haiku Explore wins.
Sources: Claude Code Subagent Docs, DEV Community Analysis, Claude Code leaked source analysis.
| Component | Technology | Purpose |
|---|---|---|
| Code parsing | tree-sitter | Deterministic AST extraction without LLM |
| Semantic analysis | Claude vision | Extract concepts from docs, PDFs, images |
| Graph structure | NetworkX | Python graph library, persistence to JSON |
| Clustering | Leiden algorithm | Community detection via modularity optimization |
| Visualization | vis.js | Interactive HTML graph rendering |
| Platforms | Claude Code, Codex, OpenCode, etc. | PreToolUse hooks, platform-agnostic integration |