Arizuko Research

Multi-tenant agent platform architecture, security model, coordination patterns, and skill development

System Model

Folders are agents. Each folder is a tenant with isolated memory, persona, skills, and routes. Agents speak MCP over a unix socket, and every tool call goes through `gated`, so the host controls what agents can reach. Defense-in-depth combines container isolation, DNS filtering, per-group MCP sockets, and secret injection.

Architecture

Tenancy Model

Each folder is a tenant. Authority is scoped by path depth:

Tier	Folder Path	Authority	Tools Available
0	`root`	Unrestricted	All MCP tools, migrations, vhost management
1	`world/`	World-scoped	Routing, group creation, token issuance, observe
2	`world/team/`	Send-only	Reply, like, send_file, observe (read-only)

Folder depth determines the tier, and gated enforces grants at tool-call time. Agents cannot raise their own privileges: they can only delegate work to children or hand work up to their parent.
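
As a rough illustration of how depth could map to tier, here is a minimal sketch. The function name `tier_for_folder` and the choice to clamp deeper paths to tier 2 are assumptions made for this example; the table above only lists tiers 0 through 2.

```python
from pathlib import PurePosixPath

# Authority labels taken from the tier table above.
TIER_AUTHORITY = {0: "Unrestricted", 1: "World-scoped", 2: "Send-only"}


def tier_for_folder(folder: str) -> int:
    """Return the tier implied by folder depth.

    Assumption: `root` (or an empty path) is tier 0, and anything deeper
    than two components is clamped to tier 2, the most restricted tier
    listed in the table.
    """
    if folder in ("", "root"):
        return 0
    depth = len(PurePosixPath(folder.strip("/")).parts)
    return min(depth, 2)
```

Under these assumptions, `tier_for_folder("world/")` gives 1 and `tier_for_folder("world/team/")` gives 2.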

Execution Model

When a message arrives:

Gateway matches route predicates (platform=, chat_jid=, sender=)
Starts a Docker container for the matched folder
Mounts /home/node/ (group workspace), /workspace/web/ (optional web surface)
Bridges MCP unix socket from host into container
Claude Code agent runs; all tool calls go through gated via MCP socket
Container exits after turn completes; workspace persists

Containers are ephemeral. All state lives in /home/node/ (volume-mounted) or messages.db (host-side SQLite in WAL mode).

Groups, Topics, Sessions

Groups are folders. Topics scope conversations within a group. Sessions are Claude Code's execution context. Fork a topic to branch a conversation while preserving parent context.

Routing

Routes are rows in the routes table, each with seq (priority), match (space-separated key=glob predicates), and target (a folder or a :daemon: entry).

seq  match                          target
100  platform=telegram room=main    corp/sales
200  platform=slack                 corp/eng
300  chat_jid=web:*                 :daemon:webd

First match wins. gated resolves predicates at message-receive time. Engagement overrides routing for N turns after a reply.
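
The first-match rule can be sketched in a few lines. This is an illustration, not gated's implementation: it assumes lower seq values are tried first, that a predicate fails when the message lacks the key, and that globs follow shell-style matching. The names `parse_match` and `resolve` are hypothetical, and the engagement override is not modeled.

```python
from fnmatch import fnmatchcase
from typing import Optional

# Rows mirror the example table: (seq, match, target).
ROUTES = [
    (100, "platform=telegram room=main", "corp/sales"),
    (200, "platform=slack", "corp/eng"),
    (300, "chat_jid=web:*", ":daemon:webd"),
]


def parse_match(match: str) -> dict[str, str]:
    """Split 'key=glob key=glob' into a {key: glob} mapping."""
    return dict(pair.split("=", 1) for pair in match.split())


def resolve(message: dict[str, str], routes=ROUTES) -> Optional[str]:
    """Return the target of the lowest-seq route whose predicates all match."""
    for _seq, match, target in sorted(routes):
        predicates = parse_match(match)
        if all(
            key in message and fnmatchcase(message[key], glob)
            for key, glob in predicates.items()
        ):
            return target
    return None  # no route matched
```

With the example rows, a Telegram message from room `main` resolves to `corp/sales`, while a Telegram message from any other room matches nothing and returns `None`.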

Observable Architecture

Groups can observe other groups' inbound messages without becoming the active agent:

observe_group(source) — subscribe to another folder's messages
set_group_open(true) — expose this group to sibling ambient context
Observed messages surface in <observed> blocks (gateway-injected)

Use for parent monitoring children, sibling awareness, or root aggregation. Observed context is capped per turn (env: OBSERVE_WINDOW_MESSAGES, OBSERVE_WINDOW_CHARS). See spec 6/F.
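
One plausible way to apply both caps is to keep the newest messages that fit. The sketch below assumes that policy; the source does not say whether old or new messages are dropped first. The default values are placeholders, not documented settings, and `cap_observed` is a hypothetical name.

```python
import os


def cap_observed(messages: list[str], max_messages: int, max_chars: int) -> list[str]:
    """Keep the newest messages that fit within both limits, oldest first."""
    kept: list[str] = []
    used = 0
    # Guard against max_messages == 0, since messages[-0:] is the whole list.
    window = messages[-max_messages:] if max_messages > 0 else []
    for text in reversed(window):
        if used + len(text) > max_chars:
            break
        kept.append(text)
        used += len(text)
    kept.reverse()
    return kept


# Env var names come from the text above; the defaults are assumptions.
MAX_MESSAGES = int(os.environ.get("OBSERVE_WINDOW_MESSAGES", "20"))
MAX_CHARS = int(os.environ.get("OBSERVE_WINDOW_CHARS", "4000"))
```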

Security

Threat Model

Arizuko assumes:

Agents are untrusted. LLM output can be malicious or compromised.
Tool calls are the attack surface. Every MCP tool call is a potential privilege escalation.
Network egress is a risk. Agents can exfiltrate data if unrestricted.

Defense: isolation at container, network, MCP, and secret layers.

Defense-in-Depth

Each primitive has its own lifecycle and isolation boundary:

Primitive	Lifecycle	Isolation	Use
Group	Persistent	Folder boundary	Agent identity, memory, persona, skills
Topic	Transient	Session ID	One conversation thread, forkable
Session	Ephemeral	Claude Code session	LLM context window, reset via `/new`

Each defense layer targets a distinct threat:

Layer	Mechanism	Threat Mitigated
Container	Docker isolation, no `--privileged`	Process escape, host filesystem access
Network	Crackbox DNS filtering + allowlist	Exfiltration, C2 communication
MCP	Per-group unix socket, grants enforced by `gated`	Cross-tenant tool access, privilege escalation
Secrets	AES-256-GCM at rest, env-var injection, no disk persistence	Secret theft via disk access or container inspect

Crackbox

Per-folder egress sandbox. Wraps each container with:

DNS filtering: allowlist of permitted hostnames per folder
HTTPS via CONNECT: proxy intercepts TLS, validates hostname
Seccomp + Landlock: syscall filtering, filesystem isolation
No new privileges: --security-opt no-new-privileges

An agent can reach only what the operator explicitly permits. Example allowlist: ["api.anthropic.com", "*.github.com", "pypi.org"]. See the crackbox docs and SECURITY.md.
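
To make the allowlist semantics concrete, here is a minimal sketch that treats entries as shell-style globs. That interpretation is an assumption; Crackbox's actual matching rules are not described here, and `is_allowed` is a hypothetical helper.

```python
from fnmatch import fnmatchcase

# Example allowlist from the text above.
ALLOWLIST = ["api.anthropic.com", "*.github.com", "pypi.org"]


def is_allowed(hostname: str, allowlist=ALLOWLIST) -> bool:
    """Return True if the hostname matches any allowlist pattern."""
    # Normalize case and drop the trailing dot of a fully qualified name.
    host = hostname.rstrip(".").lower()
    return any(fnmatchcase(host, pattern.lower()) for pattern in allowlist)
```

Note that under glob matching `*.github.com` covers `api.github.com` but not the bare `github.com`, so an operator who wants both would list both.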

MCP Socket Isolation

Each group gets its own unix socket (/ipc/<group_id>.sock). gated brokers every tool call, enforces grants, and logs actions. Agents cannot reach other groups' sockets — Docker mounts only the agent's own socket.

Secrets Injection

Secrets are encrypted with AES-256-GCM in secrets.db. At container start, gated decrypts them and injects them as environment variables. Nothing is persisted to disk inside the container, and secrets are never written to /home/node/ or logs.

Patterns

Agent-to-Agent Coordination

Three MCP tools for hierarchical coordination:

delegate_group(group, prompt, chatJid) — hand work down to a child. Child runs async, parent doesn't block.
escalate_group(prompt, chatJid) — hand work up to parent. Parent responds back through this child.
observe_group(source) — watch another group's messages without taking over.

Use delegate for specialist work. Use escalate when authority is needed. Use observe for ambient awareness (sibling monitoring, aggregation).

Topic Forking

fork_topic(parent, child) — create a new topic from another's session state. Child gets a fresh session ID but starts with parent's Claude Code context. Use for:

Side-conversations that need parent context
Parallel workstreams without polluting parent
Focused exploration that can later merge findings back

Dispatch Trigger Design

Skills can specify ALWAYS or NEVER triggers in SKILL.md:

ALWAYS:
- "download this video" → acquire
- "transcribe this" → acquire

NEVER:
- Static pages → do NOT use agent-browser, use acquire instead

The gateway reads these at session start and surfaces them in a <skills> block. The agent uses them for fast dispatch without re-reading every SKILL.md.
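
A parser for the trigger block shown above might look like the following. It is a sketch that assumes the exact layout of the example (a heading line ending in a colon, then `- pattern → instruction` bullets); the real gateway parser may differ, and `parse_triggers` is a hypothetical name.

```python
def parse_triggers(skill_md: str) -> dict[str, list[tuple[str, str]]]:
    """Collect (pattern, instruction) pairs under ALWAYS: and NEVER: headings."""
    triggers: dict[str, list[tuple[str, str]]] = {"ALWAYS": [], "NEVER": []}
    section = None
    for raw in skill_md.splitlines():
        line = raw.strip()
        if line.endswith(":") and line.rstrip(":") in triggers:
            section = line.rstrip(":")
        elif section and line.startswith("- ") and "→" in line:
            pattern, instruction = line[2:].split("→", 1)
            triggers[section].append((pattern.strip().strip('"'), instruction.strip()))
        elif line and not line.startswith("- "):
            section = None  # any other non-blank text ends the trigger block
    return triggers
```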

Stateless Iteration Loops (Ralph)

Pattern for long-running tasks that survive session resets:

Write progress to ~/state/.json
Each turn: read state, process next batch, write state
On session reset: re-read state, continue from last checkpoint
No in-memory state — stateless across turns

The name comes from the "ralph loop", a stateless iteration pattern. Use it for multi-page scrapes, batch processing, or any task that spans more than one session.
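
The loop can be sketched as a single function that is safe to call once per turn. The state file location is passed in by the caller, the state schema (`next_index`, `results`) is invented for this example, and the upper-casing step stands in for real work.

```python
import json
import os
from pathlib import Path


def load_state(path: Path) -> dict:
    """Read the checkpoint, or start fresh if none exists yet."""
    if path.exists():
        return json.loads(path.read_text())
    return {"next_index": 0, "results": []}


def save_state(path: Path, state: dict) -> None:
    """Write to a temp file, then rename, so a crash never leaves half a file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(state))
    os.replace(tmp, path)


def run_turn(path: Path, items: list[str], batch_size: int = 2) -> bool:
    """Process one batch and checkpoint. Returns True when all items are done."""
    state = load_state(path)
    start = state["next_index"]
    for item in items[start:start + batch_size]:
        state["results"].append(item.upper())  # stand-in for real work
    state["next_index"] = min(start + batch_size, len(items))
    save_state(path, state)
    return state["next_index"] >= len(items)
```

Because every call starts by reading the file, a session reset between calls changes nothing: the next `run_turn` picks up at `next_index`.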

Resume Tokens

For APIs with pagination or long-running operations:

Store resume token in ~/state/ or ~/facts/
On next turn, check for token, resume from checkpoint
Example: GitHub API pagination (page=N), Stripe list cursors
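
A cursor-based version of this idea is sketched below. The `fetch_page` callable is a stand-in for whatever API client is in use, and the token file layout (`{"cursor": ...}`) is an assumption for illustration. The token file's parent directory is expected to exist.

```python
import json
from pathlib import Path
from typing import Callable, Optional

# A page fetcher takes a cursor (None for the first page) and returns
# (items, next_cursor); next_cursor is None on the last page.
Fetcher = Callable[[Optional[str]], tuple[list[str], Optional[str]]]


def resume_fetch(token_path: Path, fetch_page: Fetcher, max_pages: int) -> tuple[list[str], bool]:
    """Fetch up to max_pages starting from the stored token.

    Returns (items, finished). The token file is deleted once the listing
    is exhausted, so the next run starts from the beginning.
    """
    cursor = json.loads(token_path.read_text())["cursor"] if token_path.exists() else None
    collected: list[str] = []
    for _ in range(max_pages):
        items, cursor = fetch_page(cursor)
        collected.extend(items)
        if cursor is None:
            token_path.unlink(missing_ok=True)
            return collected, True
        # Persist after every page so an interrupted turn loses at most one page.
        token_path.write_text(json.dumps({"cursor": cursor}))
    return collected, False
```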

Fingerprint-Based Change Detection

Detect file changes without storing full content:

Compute SHA-256 of file, store in ~/state/fingerprints.json
On next turn, recompute and compare
Only process changed files

Use for monitoring codebases, watching config files, or tracking external resources.
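
The three steps above fit in a short helper. This is a sketch: the store format (a JSON object mapping path to hex digest) is an assumption, and `sha256_of` and `changed_files` are hypothetical names.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash the file in chunks so large files are not loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def changed_files(paths: list[Path], store: Path) -> list[Path]:
    """Return files whose hash differs from the stored one, then update the store."""
    previous = json.loads(store.read_text()) if store.exists() else {}
    current = {str(p): sha256_of(p) for p in paths}
    store.write_text(json.dumps(current, indent=2))
    return [p for p in paths if previous.get(str(p)) != current[str(p)]]
```

On the first run every file counts as changed, because nothing has been recorded yet.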

Multi-Agent Pipeline Orchestration

Chain multiple agents for staged processing:

ingest → classify → route → specialist → review → publish

Each stage is a separate group. Use delegate_group or direct message routing. Example: support intake → triage agent → specialist teams → summary agent → publish.

Skills

MCP vs In-Context Skills

Aspect	MCP Tools	In-Context Skills
Context window	Tool name + description only	Full SKILL.md in system prompt
Authority	Host-enforced grants	Agent-enforced (prompt-based)
State	Stateless, always available	Loaded at session start, persists
Best for	Platform operations (routing, groups, tokens)	Workflow patterns, domain knowledge, multi-step procedures

Rule: Use MCP for side effects (send message, create group, issue token). Use in-context skills for knowledge and workflow guidance.

Skill Development

Each skill is a directory under ~/.claude/skills/<name>/ with:

SKILL.md — description, workflow, ALWAYS/NEVER triggers
Optional: scripts, templates, reference files

The gateway reads SKILL.md at session start and injects it into the system prompt. Skills compose: an agent can invoke multiple skills per turn.

Skill Marketplace Patterns

(Research in progress. Placeholder for skill discovery, versioning, dependencies, and performance evals.)

Skill discovery: local index, remote registry?
Versioning: semver, lock file?
Dependencies: skill-to-skill references?
Evals: measurable performance improvements?

Performance Improvements

Measurable improvements from skill-based workflows:

Dispatch latency: ALWAYS/NEVER triggers reduce skill-matching overhead
Context efficiency: tight skill descriptions vs bloated prompts
Eval-driven iteration: A/B test skill variants, track success rate