
Designing Simplicity & Taste Tools for LLMs

AST-based style analysis for nudging code generation toward boring code

The Problem

LLMs generate functionally correct code that often violates the subtle style patterns that make code maintainable. They overuse abstractions, introduce premature complexity, and miss the "boring code" wisdom that experienced developers apply instinctively.

Current approaches:

  - Few-shot prompting: hand-curate 2-5 example snippets for every prompt
  - Custom instructions: style rules written and maintained by hand
  - Linters (Pylint/Ruff): enforce generic standards, not project-specific taste
  - RAG: retrieves similar code but never states the style explicitly

What's missing: a tool that extracts structural style patterns from a codebase and converts them into natural language fragments that nudge LLMs toward the project's actual coding philosophy.

The Solution: Style Fragment Extraction

Scan a codebase's AST, classify structural patterns, and output natural language "style fragments" for LLM system prompts.

Example Output

style_fragments:
  - "Guard clauses preferred over nested ifs (87% early-return pattern)"
  - "Error handling: check-and-return, minimal nesting (Go if-err style)"
  - "Switch/match over if-else chains (3:1 ratio)"
  - "Functions average 25 LOC, max 80 — short and focused"
  - "Single return value + error, no multi-return beyond (val, err)"
  - "Flat control flow: avg nesting depth 1.2, max 3"

These fragments go into an LLM system prompt to nudge generated code toward the codebase's actual style.
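
The injection step itself is just string assembly. A minimal sketch, assuming a simple prompt template (build_system_prompt and its wording are illustrative, not a fixed API):

```python
# Illustrative sketch: prepend extracted style fragments to an LLM system prompt.
# The function name and template wording are assumptions, not the tool's API.

def build_system_prompt(base_prompt: str, fragments: list[str]) -> str:
    if not fragments:
        return base_prompt
    style_block = "\n".join(f"- {frag}" for frag in fragments)
    return f"{base_prompt}\n\nFollow this codebase's conventions:\n{style_block}"

prompt = build_system_prompt(
    "You are a coding assistant.",
    [
        "Guard clauses preferred over nested ifs (87% early-return pattern)",
        "Flat control flow: avg nesting depth 1.2, max 3",
    ],
)
```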

What Makes This Novel

No existing tool does this. Current AST-based tools:

  - ast-grep: source-like pattern rules for linting and rewriting
  - semgrep/opengrep: security and correctness rules
  - go/ast and other native parsers: single-language analysis
  - Tree-sitter itself: parsing infrastructure with no style layer on top

None of them covers the full pipeline: scan codebase → extract structural patterns → output natural language fragments for prompting.

Technical Architecture

                    ┌─────────────┐
                    │  Tool CLI   │
                    └──────┬──────┘
                           │
              ┌────────────┼────────────┐
              │            │            │
        ┌─────▼──────┐ ┌───▼────┐ ┌─────▼─────┐
        │ regex      │ │ tree-  │ │ go/ast    │
        │ counter    │ │ sitter │ │ (Go only) │
        │ (baseline) │ │ (all)  │ │           │
        └─────┬──────┘ └───┬────┘ └─────┬─────┘
              │            │            │
              └────────────┼────────────┘
                           │
                    ┌──────▼──────┐
                    │  metrics +  │
                    │  style      │
                    │  fragments  │
                    └─────────────┘
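
The regex-counter box is the zero-dependency baseline: crude line-level pattern counts used to sanity-check the AST layers. A minimal sketch (the Go-flavored patterns are illustrative):

```python
import re

# Baseline layer sketch: count guard-style early returns with plain regex.
# Deliberately crude (single-line bodies only, no comment awareness), which is
# exactly why the tree-sitter layer sits next to it. Patterns are illustrative.
GUARD_RE = re.compile(r"if\s+[^{]*\{\s*return\b[^}]*\}")

def count_guards(source: str) -> int:
    return len(GUARD_RE.findall(source))

go_src = (
    "if err != nil { return err }\n"
    "if ok { x = 1 }\n"
)
```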

Layer 1: Tree-sitter (multi-language AST)

Tree-sitter is an incremental parsing library, originally built at GitHub, with 80+ language grammars. It produces concrete syntax trees that preserve every token.

Go bindings:

  - smacker/go-tree-sitter: 34 bundled grammars, widely used
  - tree-sitter/go-tree-sitter: the official bindings

Speed: ~91 SLOC/ms. A 6,000-line file parses in ~80ms. At 16 workers, 10k files × 200 LOC ≈ 1-2s.
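
The cluster estimate is self-consistent; a quick back-of-envelope check using the numbers above:

```python
# Sanity-check the throughput claim: 10k files × 200 LOC at ~91 SLOC/ms.
SLOC_PER_MS = 91
total_sloc = 10_000 * 200                        # 2,000,000 SLOC
sequential_s = total_sloc / SLOC_PER_MS / 1000   # ≈ 22 s on one core
parallel_s = sequential_s / 16                   # ≈ 1.4 s at 16 workers
```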

Query language: S-expression patterns matching AST node types and fields.

;; guard clause detection
(if_statement
  consequence: (block
    (return_statement) @guard_return))

;; nested if detection
(if_statement
  consequence: (block
    (if_statement) @nested_if))

;; early return
(function_declaration
  body: (block
    (return_statement) @early_return
    (_)))

Layer 2: Pattern Classification

For each language, write Tree-sitter queries to detect structural patterns:

| Signal | Query Pattern |
| --- | --- |
| Guard clause | if → single return/continue/break, no else |
| Nested if | if inside if consequence |
| Early return | return not at end of function body |
| Switch vs if-else | Ratio of switch/match to if-else chains |
| Error check-and-return | if err != nil { return } (Go), ? (Rust), catch (TS) |
| Function length | Node span of function bodies |
| Parameter count | Function parameter list length |
| Callback nesting | Function literals inside function calls inside function calls |
Layer 3: Style Fragment Generation

After classifying patterns across the codebase, generate natural language fragments:

def generate_fragments(pattern_stats):
    fragments = []

    # Guard clause preference
    if pattern_stats['guard_clause_ratio'] > 0.7:
        fragments.append(
            f"Guard clauses preferred over nested ifs "
            f"({pattern_stats['guard_clause_ratio']:.0%} early-return pattern)"
        )

    # Error handling style
    if pattern_stats['check_and_return_ratio'] > 0.8:
        fragments.append(
            "Error handling: check-and-return, minimal nesting (Go if-err style)"
        )

    # Function length
    avg_loc = pattern_stats['avg_function_loc']
    max_loc = pattern_stats['max_function_loc']
    fragments.append(
        f"Functions average {avg_loc} LOC, max {max_loc} — short and focused"
    )

    # Control flow complexity
    avg_nesting = pattern_stats['avg_nesting_depth']
    max_nesting = pattern_stats['max_nesting_depth']
    fragments.append(
        f"Flat control flow: avg nesting depth {avg_nesting:.1f}, max {max_nesting}"
    )

    return fragments

Boring Code Wisdom Integration

The tool should detect and surface boring code patterns:

  - Guard clauses and early returns instead of deep nesting
  - Short, single-purpose functions with few parameters
  - Plain check-and-return error handling over clever control flow
  - Flat structure: switch/match over long if-else chains

Implementation Roadmap

Phase 1: MVP (single language)

  1. Integrate smacker/go-tree-sitter with Go grammar
  2. Implement 5-6 core pattern queries (guard clause, nesting, function length)
  3. Generate basic style fragments
  4. CLI: tool analyze --lang=go ./src → outputs fragments as JSON
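
The CLI contract from step 4 might look like the following; the flag names and JSON shape are assumptions that mirror the command above, with the analysis pass stubbed out:

```python
import argparse
import json

# Sketch of the Phase-1 CLI surface: `tool analyze --lang=go ./src` → JSON.
# analyze() is a stub standing in for the tree-sitter pass; field names and
# flags are design assumptions, not a shipped interface.

def analyze(path: str, lang: str) -> dict:
    fragments = [
        "Guard clauses preferred over nested ifs (87% early-return pattern)",
    ]
    return {"lang": lang, "path": path, "style_fragments": fragments}

def run(argv: list[str]) -> str:
    parser = argparse.ArgumentParser(prog="tool")
    sub = parser.add_subparsers(dest="command", required=True)
    analyze_cmd = sub.add_parser("analyze")
    analyze_cmd.add_argument("--lang", required=True)
    analyze_cmd.add_argument("path")
    args = parser.parse_args(argv)
    return json.dumps(analyze(args.path, args.lang), indent=2)
```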

Phase 2: Multi-language

  1. Add Rust, Python, TypeScript grammars
  2. Language-specific pattern queries (Rust ?, Python context managers)
  3. Unified fragment format across languages

Phase 3: LLM Integration

  1. Auto-inject fragments into system prompts (VSCode extension, GitHub Copilot plugin)
  2. Feedback loop: track generated code → measure adherence to fragments → refine queries
  3. Community library of pattern queries per domain (web backends, CLIs, data pipelines)

Alternatives Considered

ast-grep

Rust CLI built on tree-sitter with 13.4k stars. Source-like pattern syntax + YAML rule combinators. More readable than S-expressions but subprocess-only (no Go library). Good secondary tool for complex rules.

rule:
  pattern: if $COND { return $VAL }
  not:
    has:
      kind: else_clause

semgrep / opengrep

Semgrep went proprietary in late 2024; OpenGrep is the community fork (LGPL-2.1, 2.4k stars, 40+ languages). OCaml-based, no Go bindings, and heavier (~12s for 500k LOC of Python). Overkill for style extraction — best suited to security rules.

Per-language native parsers

For a multi-language tool, per-language native parsers aren't practical from Go. go/ast is useful for type-aware Go-specific analysis alongside tree-sitter, but tree-sitter covers everything else.

Existing Tools Landscape

Codebase Context Extraction Tools

These tools pack codebases into AI-friendly formats but don't extract style patterns:

| Tool | Approach | Limitation |
| --- | --- | --- |
| Repomix | Packs codebase into XML/markdown/JSON | Raw context, no style analysis |
| CTX | Organizes codebase into structured docs | Metadata extraction, not patterns |
| Code2Prompt | Converts codebase to single prompt | File concatenation, no analysis |
| Codebase-Digest | AI-friendly packer with 60+ prompts | Generic templates, not codebase-specific |

LLM Code Quality Evaluation Tools

| Tool | What It Measures | Gap |
| --- | --- | --- |
| SimCopilot | Scope sensitivity, contextual dependencies | Evaluates existing code, doesn't guide generation |
| Copilot Arena | User preferences via IDE (4.5M+ suggestions) | Post-generation feedback, not proactive guidance |
| CodeRAG-Bench | Whether retrieval improves generation | Measures RAG impact, not style adherence |
| Pylint/Ruff | PEP 8 compliance, naming conventions | Generic standards, not project-specific taste |

Style Guidance Research

Few-shot prompting is the current state of the art for style control: 2-5 representative examples in the prompt reliably steer output toward the target style, but they must be curated by hand for every codebase.

RAG for code — Retrieval-Augmented Generation retrieves relevant snippets from the codebase to inform generation (survey). Improves factual accuracy but doesn't extract or communicate style patterns explicitly.

Neural steering — Advanced research on identifying style-specific neurons and deactivating unwanted style patterns (paper). Requires fine-tuning access, not practical for most users.

The Gap This Tool Fills

What Exists vs What's Missing

Existing tools:

  - Context packers (Repomix, Code2Prompt): ship raw code to the model
  - Evaluation harnesses (SimCopilot, Copilot Arena): judge code after generation
  - Linters (Pylint/Ruff): enforce generic standards

Missing: Automated extraction of structural style patterns → natural language fragments → proactive LLM guidance

No tool automatically answers: "How does THIS codebase handle errors? Guard clauses or nested ifs? Pure functions or stateful objects?"

Research-Backed Best Practices

Effective LLM Style Guidance (Current State)

  1. Custom instructions — Define conventions once, apply to all interactions (GitHub Copilot, Claude)
  2. Few-shot examples — 2-5 representative implementations showing target style
  3. Explicit directives — "Follow PEP 8" + persona ("senior Python developer who writes idiomatic code")
  4. Codebase context — Provide surrounding code, relevant imports, function dependencies
  5. RAG retrieval — Fetch similar code from the repo to ground generation

This tool automates steps 2-4 by extracting patterns and generating both examples and directives from the actual codebase.
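
Step 2 in particular could be automated by selecting the most "typical" functions as few-shot examples, e.g. those closest to the codebase's median length. A hedged sketch with illustrative data (a real selector would combine several extracted metrics):

```python
import statistics

# Sketch: pick few-shot example functions that best represent the codebase,
# scored here by distance from the median function length. The function names
# and LOC figures below are made up for illustration.

def pick_examples(function_loc: dict[str, int], k: int = 2) -> list[str]:
    median = statistics.median(function_loc.values())
    by_typicality = sorted(function_loc, key=lambda n: abs(function_loc[n] - median))
    return by_typicality[:k]

funcs = {"parse_header": 24, "render_row": 26, "main": 120, "init_flags": 3}
```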

Measuring Adherence

Research shows effective metrics combine:

  - Structural checks: re-run the extraction queries on generated code
  - Complexity metrics: cognitive complexity and nesting depth
  - Human preference: Copilot Arena-style pairwise judgments
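
One concrete adherence score, in the spirit of the Phase 3 feedback loop: re-run the same extraction on generated code and compare pattern ratios against the codebase baseline. A sketch (the averaging formula is an assumption; the stat keys reuse the shape consumed by generate_fragments above):

```python
# Sketch: score generated code by how closely its pattern ratios track the
# codebase baseline (1.0 = identical ratios). The formula is an assumption.

def adherence(baseline: dict[str, float], generated: dict[str, float]) -> float:
    shared = baseline.keys() & generated.keys()
    if not shared:
        return 0.0
    mean_deviation = sum(abs(baseline[k] - generated[k]) for k in shared) / len(shared)
    return 1.0 - mean_deviation

base = {"guard_clause_ratio": 0.87, "check_and_return_ratio": 0.90}
gen = {"guard_clause_ratio": 0.80, "check_and_return_ratio": 0.90}
```
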

Open Questions

  - How many fragments fit in a system prompt before they dilute each other?
  - Do fragments transfer across models, or do they need per-model tuning?
  - What should the tool emit for codebases with inconsistent or conflicting styles?

Comprehensive References

AST Analysis Tools

  - smacker/go-tree-sitter (548 stars, 34 bundled grammars)
  - tree-sitter/go-tree-sitter (official, 228 stars)
  - Tree-sitter query syntax (official documentation)
  - ast-grep (13.4k stars, Rust CLI)
  - ast-grep YAML rules (documentation)
  - OpenGrep (2.4k stars, community fork)
  - go/ast stdlib (Go standard library)

LLM Code Quality & Evaluation

  - CodeRabbit: AST-grep + LLM (production example)
  - SimCopilot (LLM code completion eval)
  - Copilot Arena (code LLM evaluation platform)
  - CodeRAG-Bench (retrieval augmentation eval)
  - Cognitive Complexity (SonarSource paper)

Style Guidance & Prompting Research

  - Few-Shot LLM Code Synthesis (research on example selection)
  - Show and Tell (style control strategies)
  - Style-Specific Neurons (neural steering for LLMs)
  - Prompting LLMs for Code (guidelines paper)
  - RAG for Code Survey (retrieval-augmented generation)
  - Repository-Level Prompts (context generation for LLMs)
  - Context Engineering (Anthropic guide)

Codebase Context Tools

  - Repomix (AI-friendly codebase packer)
  - CTX (Context Hub Generator)
  - Code2Prompt (codebase-to-prompt CLI)
  - Codebase-Digest (60+ coding prompts)
  - Context Generator (MCP, Context as Code)
  - Claude Code Prompts (community prompt library)

Practical Guides

  - GitHub Copilot Prompting (official documentation)
  - JetBrains: AI Agent Guidelines (coding guidelines for AI)
  - Cody Codebase Understanding (Sourcegraph blog)
