This file contains important information for Claude working on this codebase.
Codebase Curator is an AI-powered codebase analysis system that enables Claude to deeply understand and work with any codebase through the MCP (Model Context Protocol).
- Two-Claude Architecture - One Claude (Curator) helps another Claude (Coding) understand codebases
- Session Persistence ✅ - Fixed! Sessions now properly maintain context across commands using --resume
- Dynamic Timeouts ✅ - Different tools get different timeouts (Task: 10min, Bash: 5min, Read: 2min)
- Smart Grep 🚀 - Semantic code search with concept groups, AND/OR/NOT searches, and cross-references
- 🔥 Hierarchical Hash Tree ✅ - Incremental indexing with Bun.hash() for lightning-fast file change detection
- 🎯 Live Monitoring ✅ - Real-time codebase overview dashboard with unique file tracking
- MCP Tool Discovery ✅ - Help Claudes discover smart-grep with compelling examples
The project is now organized as a monorepo with packages:
- Packages (for distribution):
  - src/packages/semantic-core/ - Core semantic indexing engine
    - Semantic analysis and indexing
    - File change detection (HashTree)
    - Incremental indexing
    - Configuration management (exclusions, patterns)
    - Language extractors (TypeScript, Python, Go, Rust)
  - src/packages/smartgrep/ - Standalone semantic search package
  - src/packages/codebase-curator/ - Full suite package (future)
- Services (shared business logic):
  - src/services/curator/ - Curator-specific services
    - CuratorService.ts - Main orchestration service
    - CuratorProcessService.ts - Manages Claude CLI processes
    - CuratorPrompts.ts - Contains prompts for Curator Claude
  - src/services/session/ - Session management
    - SessionService.ts - Handles conversation history (corruption issue FIXED!)
  - src/services/indexing/ - Now part of semantic-core package
  - src/services/semantic/ - Now part of semantic-core package
- Tools (CLI interfaces):
  - src/tools/smartgrep/ - Smart grep CLI with completions and man page
  - src/tools/monitor/ - Real-time monitoring CLI
  - src/tools/curator-cli/ - Curator command-line interface
- MCP Servers (AI interfaces):
  - src/mcp-servers/codebase-curator/ - MCP server for Coding Claude
- Shared (common utilities):
  - src/shared/config/ - Configuration management
  - src/shared/types/ - Shared TypeScript types
  - src/shared/utils/ - Common utilities
- MCP Server Logging
  - Use console.error() for logging in MCP contexts, NOT console.log()
  - All stdout must be valid JSON for the MCP protocol
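As a rough illustration of the logging rule above, the sketch below keeps diagnostics on stderr and reserves stdout for JSON messages. The helper names (formatLog, log, sendMessage) and the log prefix are hypothetical and not taken from the codebase; a real MCP server would normally let its transport layer own stdout.

```typescript
// Hypothetical helpers; only the console.error()/console.log() rule comes from this document.
type LogLevel = "debug" | "info" | "warn" | "error";

// Build a diagnostic line. Kept pure so it is easy to test.
function formatLog(level: LogLevel, message: string): string {
  return `[curator:${level}] ${message}`;
}

// Diagnostics go to stderr so they never mix with protocol traffic on stdout.
function log(level: LogLevel, message: string): void {
  console.error(formatLog(level, message));
}

// The only writer to stdout: one JSON document per line.
function sendMessage(message: unknown): string {
  const line = JSON.stringify(message);
  console.log(line);
  return line;
}

log("info", "indexing started");
sendMessage({ jsonrpc: "2.0", id: 1, result: { ok: true } });
```

Routing every stdout write through a single function like this makes a stray console.log() easier to spot in review.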
- Session Persistence ✅ FIXED
  - Claude CLI creates immutable sessions - each resume creates a new session ID
  - We properly save and load the latest session ID
  - Context is preserved even though IDs change
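The bookkeeping this implies can be sketched as follows. This is a minimal in-memory illustration, not the actual SessionService: the class name, the CliReply shape, and keying by project path are all assumptions, and persistence to disk is left out.

```typescript
// Hypothetical shape of one CLI reply; only the "new ID on every resume"
// behavior comes from the notes above.
interface CliReply {
  sessionId: string;
  text: string;
}

class LatestSessionTracker {
  // project path -> most recent session ID (assumed keying scheme)
  private latest = new Map<string, string>();

  // The ID to pass to --resume, or undefined when no session exists yet.
  resumeId(project: string): string | undefined {
    return this.latest.get(project);
  }

  // Always overwrite: the ID we resumed from is stale once a reply arrives.
  record(project: string, reply: CliReply): void {
    this.latest.set(project, reply.sessionId);
  }
}

const tracker = new LatestSessionTracker();
tracker.record("/repo", { sessionId: "session-a", text: "overview" });
// Resuming "session-a" yields a reply carrying a different ID, which replaces it.
tracker.record("/repo", { sessionId: "session-b", text: "follow-up" });
```

The point of the design is that the stored ID is treated as a moving pointer to the newest session rather than a stable identifier.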
- Dynamic Timeouts
  - Implemented in CuratorProcessService.getDynamicTimeout()
  - Task: 600s, Bash: 300s, Read: 120s, LS/Glob: 60s
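A per-tool timeout lookup of this kind might look like the sketch below. The signature of the real getDynamicTimeout() is not shown here, so timeoutForTool is a hypothetical stand-in, and the fallback value for unlisted tools is an assumption; only the five durations come from the list above.

```typescript
const SECOND = 1000;

// Durations from the notes above, converted to milliseconds.
const TOOL_TIMEOUTS_MS = new Map<string, number>([
  ["Task", 600 * SECOND],
  ["Bash", 300 * SECOND],
  ["Read", 120 * SECOND],
  ["LS", 60 * SECOND],
  ["Glob", 60 * SECOND],
]);

// Assumed fallback for tools that are not in the table.
const DEFAULT_TIMEOUT_MS = 120 * SECOND;

// Hypothetical stand-in for CuratorProcessService.getDynamicTimeout().
function timeoutForTool(toolName: string): number {
  return TOOL_TIMEOUTS_MS.get(toolName) ?? DEFAULT_TIMEOUT_MS;
}

console.error(`Task timeout: ${timeoutForTool("Task")} ms`);
```

A Map is used instead of a plain object so that tool names such as "constructor" cannot accidentally resolve to inherited properties.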
- Testing Commands

      # Run specific tests
      bun test tests/testScript.test.ts

      # Run all tests
      bun test
- 🔥 Incremental Indexing System ✅ IMPLEMENTED!
  - HashTree.ts: Bun.hash() for fast file change detection with 500ms debouncing
  - IncrementalIndexer.ts: Only reprocesses changed files, silent mode for clean output
  - Live Monitoring: Real-time dashboard showing unique files changed (not duplicate events)
  - Now part of @codebase-curator/semantic-core package for reusability

      # Live monitoring with codebase overview
      bun run monitor watch --overview

      # Static codebase analysis
      bun run monitor overview

      # Technical status and integrity checks
      bun run monitor status
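The idea behind hash-based change detection can be sketched as below. This is not the HashTree.ts implementation: it uses a flat map instead of a hierarchical tree, omits the 500ms debounce, and substitutes a small FNV-1a-style hash over UTF-16 code units so the example runs outside Bun. All function and type names are hypothetical.

```typescript
type HashFn = (content: string) => string;

// Portable stand-in for Bun.hash(): FNV-1a-style hash over UTF-16 code units.
const fnv1a: HashFn = (content) => {
  let h = 0x811c9dc5;
  for (let i = 0; i < content.length; i++) {
    h ^= content.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return (h >>> 0).toString(16);
};

type Snapshot = Map<string, string>; // file path -> content hash

// Hash every file's content once.
function snapshot(files: Map<string, string>, hash: HashFn = fnv1a): Snapshot {
  const result: Snapshot = new Map();
  files.forEach((content, path) => result.set(path, hash(content)));
  return result;
}

interface Changes {
  added: string[];
  modified: string[];
  removed: string[];
}

// Compare two snapshots; only the paths listed here need reindexing.
function diffSnapshots(prev: Snapshot, next: Snapshot): Changes {
  const changes: Changes = { added: [], modified: [], removed: [] };
  next.forEach((hash, path) => {
    const old = prev.get(path);
    if (old === undefined) changes.added.push(path);
    else if (old !== hash) changes.modified.push(path);
  });
  prev.forEach((_hash, path) => {
    if (!next.has(path)) changes.removed.push(path);
  });
  return changes;
}

// Collapse duplicate watcher events into unique file paths.
function uniquePaths(events: string[]): string[] {
  return Array.from(new Set(events));
}
```

Comparing hashes rather than reparsing is what lets an indexer skip unchanged files, and deduplicating paths is one way to report unique files instead of raw event counts.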
- 🔍 Smart Grep - Semantic Code Search ✅ FULLY FEATURED!
  - Concept Groups: smartgrep group auth searches ALL auth patterns
  - Advanced Patterns: AND (&), OR (|), NOT (!), regex (/pattern/)
  - Type Filters: --type function,class,variable,string,comment
  - Cross-References: smartgrep refs "functionName" shows all usages
  - Changes Impact: smartgrep changes analyzes uncommitted changes
  - 20+ Concept Groups: auth, error, api, database, cache, etc.
  - 📖 Story Mode: smartgrep story extracts narrative patterns from strings

      # List all concept groups
      bun run smartgrep group list

      # Search concept group
      bun run smartgrep group error --type function

      # Advanced searches
      bun run smartgrep "error&string"          # AND search
      bun run smartgrep "login|signin|auth"     # OR search
      bun run smartgrep "!test" --type function # NOT search

      # Find references
      bun run smartgrep refs "processPayment"

      # Analyze uncommitted changes
      bun run smartgrep changes           # Full impact analysis
      bun run smartgrep changes --compact # One-line risk assessment

      # Framework-specific searches (NEW!)
      # IMPORTANT: When using Bash tool, use SINGLE QUOTES for $ symbols and special characters
      bun run smartgrep '$state'      # Find Svelte 5 runes (MUST use single quotes!)
      bun run smartgrep '$derived'    # Find Svelte derived runes
      bun run smartgrep "onMount"     # Find Svelte lifecycle hooks
      bun run smartgrep "defineProps" # Find Vue composition API
      bun run smartgrep "client:load" # Find Astro client directives
      bun run smartgrep '{#if'        # Find Svelte template directives (single quotes!)

      # DO NOT escape with backslash - it returns 0 results!
      # BAD:  bun run smartgrep "\$state" # This won't work!
      # GOOD: bun run smartgrep '$state'  # This works!

      # Story Mode - Extract narrative patterns from codebase
      bun run smartgrep story # Full codebase story analysis
      # Shows:
      # - Process flows (how things work step by step)
      # - Error scenarios (what can go wrong and recovery)
      # - System boundaries (external APIs, DBs, files)
      # - Recurring patterns (retry, validation, lifecycle)
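To make the AND/OR/NOT/regex pattern syntax above concrete, here is a minimal sketch of how such a pattern string could be classified and matched against a single piece of text. It is not smartgrep's parser: the Query type, the function names, case-insensitive substring matching, and the order in which operators are checked are all assumptions.

```typescript
type Query =
  | { kind: "and"; terms: string[] }
  | { kind: "or"; terms: string[] }
  | { kind: "not"; term: string }
  | { kind: "regex"; regex: RegExp }
  | { kind: "literal"; term: string };

// Classify a pattern string. Mixed operators are not handled in this sketch.
function parsePattern(pattern: string): Query {
  if (pattern.length > 2 && pattern.startsWith("/") && pattern.endsWith("/")) {
    return { kind: "regex", regex: new RegExp(pattern.slice(1, -1)) };
  }
  if (pattern.startsWith("!")) return { kind: "not", term: pattern.slice(1) };
  if (pattern.includes("&")) return { kind: "and", terms: pattern.split("&") };
  if (pattern.includes("|")) return { kind: "or", terms: pattern.split("|") };
  return { kind: "literal", term: pattern };
}

// Test one piece of text against a parsed query (case-insensitive by assumption).
function matches(query: Query, text: string): boolean {
  const has = (term: string) => text.toLowerCase().includes(term.toLowerCase());
  switch (query.kind) {
    case "and":
      return query.terms.every(has);
    case "or":
      return query.terms.some(has);
    case "not":
      return !has(query.term);
    case "regex":
      return query.regex.test(text);
    case "literal":
      return has(query.term);
  }
}

const query = parsePattern("login|signin|auth");
console.error(matches(query, "handleSignIn"));
```

Parsing into a tagged union first keeps the matching step a simple exhaustive switch, which is one reasonable way to add operators later without touching existing cases.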
- Check for console.log() calls that should be console.error()
- Verify all stdout output is valid JSON
- Look for MaxListenersExceeded warnings (already fixed with setMaxListeners)
- Check cache directory permissions (falls back to temp dir if needed)
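When chasing the first two items in this checklist, a small checker over captured server output can help. The sketch below assumes stdout was captured as a string with one message per line; the function name is hypothetical.

```typescript
// Return the 1-based numbers of non-empty lines that fail to parse as JSON.
function findNonJsonLines(stdout: string): number[] {
  const bad: number[] = [];
  stdout.split("\n").forEach((line, index) => {
    if (line.trim() === "") return; // ignore blank lines, including a trailing one
    try {
      JSON.parse(line);
    } catch {
      bad.push(index + 1);
    }
  });
  return bad;
}

const captured = '{"jsonrpc":"2.0","id":1}\nStarting server...\n{"jsonrpc":"2.0","id":2}\n';
console.error(findNonJsonLines(captured)); // reports line 2, the stray log message
```

Any reported line number usually points at a console.log() call that should have been console.error().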
- Process Spawning: The curator spawns a separate Claude CLI process for analysis
- Session Reuse: First overview takes ~2 minutes; subsequent questions are instant
- Anthropic Caching: API caching reduces costs for repeated context
- Streaming: Files are streamed, never fully loaded into memory
- 🚀 Incremental Performance: Hash tree enables sub-second updates by only processing changed files
- Bun-Native Speed: Bun.hash() + file watching provides near-instant change detection
- Smart-Grep Cache: Semantic index persisted, instant searches after first index
- Emergent Understanding: Discover patterns, don't prescribe them
- Language Agnostic Core: Analysis algorithms work across languages
- Modular Extension: Easy to add new tools without breaking existing ones
- Practical Focus: Provide actionable insights, not academic analysis
- Two-Claude architecture with MCP
- Session persistence with --resume flag
- Dynamic timeouts for different tools
- Smart-grep with full semantic search
- Concept groups with intuitive group command
- Hierarchical hash tree with Bun.hash()
- Incremental indexing with debouncing
- Live monitoring dashboard
- MCP tool discovery helpers
- .curator directory exclusion
- Unique file tracking (vs event counts)
- Multi-language support (TypeScript, Python, Go, Rust)
- Professional shell completions for all CLI tools
- Human-friendly curator CLI with chat mode
- Story Mode - narrative extraction from codebase strings
- Enhanced Monitoring: More detailed code metrics
- Performance Optimization: Parallel indexing
- More Languages: Java, C#, Ruby, PHP, etc.

    // Root package.json
    {
      "workspaces": ["src/packages/*"]
    }

- @codebase-curator/semantic-core - Core indexing engine
- @codebase-curator/smartgrep - Semantic search CLI
- @codebase-curator/codebase-curator - Full suite (coming soon)

    # NPM Package (when published)
    npm install -g @codebase-curator/smartgrep

    # Standalone Binary (future releases); -o saves the download to a file instead of printing it to the terminal
    curl -L https://github.com/RLabs-Inc/codebase-curator/releases/latest/download/smartgrep-macos-arm64 -o smartgrep

    # Development
    bun install # Installs workspace dependencies

    # Search for literal term
    bun run smartgrep "handleAuth"

    # Rebuild semantic index
    bun run smartgrep --index

    # Find references
    bun run smartgrep refs "apiClient"

    # Extract codebase story
    bun run smartgrep story

    # List all available groups
    bun run smartgrep group list

    # Search built-in concept group
    bun run smartgrep group auth

    # Add custom concept group
    bun run smartgrep group add payments charge,bill,invoice,transaction

    # Search your custom group
    bun run smartgrep group payments --type function

    # Remove custom group
    bun run smartgrep group remove payments

    # AND search (must contain both)
    bun run smartgrep "error&handler"

    # OR search (contains any)
    bun run smartgrep "login|signin|auth"

    # NOT search (exclude term)
    bun run smartgrep "!test" --type function

    # Regex search
    bun run smartgrep "/handle.*Event/" --regex

    # Single type
    bun run smartgrep "auth" --type function

    # Multiple types
    bun run smartgrep "user" --type function,class

    # All types: function, class, variable, string, comment, import

    # Concept group + type filter + sorting
    bun run smartgrep group error --type function --sort usage

    # Pattern + file filter + max results
    bun run smartgrep "service" --file "*.ts" --max 10

    # Compact output for scanning
    bun run smartgrep group api --compact

    # Custom group with filters
    bun run smartgrep group myapi --type class,function --sort usage

Smart Grep now defaults to compact summary mode specifically designed for Claudes:

    # Default behavior - Compact summary (200-300 tokens)
    bun run smartgrep "authService"

    # Full detailed output when needed (2000-3000 tokens)
    bun run smartgrep "authService" --full

Why This Matters:
- 90% reduction in context usage - More searches before hitting limits
- Focused information - Definition, signature, top usage, breaking changes
- Smart suggestions - Next searches based on current results
- Instant answers - No scrolling through hundreds of lines
What You Get in Compact Mode:
- Primary Definition with full signature (constructor/parameters)
- Top 3 Usage Locations with actual code context
- Breaking Changes - Specific functions that call this code
- Patterns Detected - async/await, errors thrown, related terms
- Next Suggestions - Contextual follow-up searches
Context Management Tips:
- Use compact mode (default) for exploration and quick lookups
- Use --full only when you need to see ALL occurrences
- Chain searches using the suggested "NEXT" commands
- Combine with filters to narrow results: --type function --file "*.ts"
Custom groups are saved to .curatorconfig.json in your project root:

    {
      "customGroups": {
        "payments": ["charge", "bill", "invoice", "transaction"],
        "frontend": {
          "name": "frontend",
          "description": "Frontend-specific patterns",
          "emoji": "🎨",
          "terms": ["component", "props", "state", "render", "ui"]
        }
      }
    }

Remember: The goal is to help AI assistants write code that truly fits into existing codebases, not just syntactically correct code in isolation.
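
The .curatorconfig.json example shows custom groups in two shapes: a bare array of terms and a full object. As a sketch of how a consumer might treat both uniformly, the types and the normalizeGroups function below are hypothetical; only the field names are taken from the example.

```typescript
interface ConceptGroup {
  name: string;
  description?: string;
  emoji?: string;
  terms: string[];
}

// A custom group is either a bare term list or a full object.
type CustomGroupEntry = string[] | ConceptGroup;

interface CuratorConfig {
  customGroups?: Record<string, CustomGroupEntry>;
}

// Convert every entry to the full object form; bare arrays take their key as the name.
function normalizeGroups(config: CuratorConfig): ConceptGroup[] {
  return Object.entries(config.customGroups ?? {}).map(([key, entry]) =>
    Array.isArray(entry) ? { name: key, terms: entry } : entry
  );
}

const exampleConfig: CuratorConfig = {
  customGroups: {
    payments: ["charge", "bill", "invoice", "transaction"],
    frontend: {
      name: "frontend",
      description: "Frontend-specific patterns",
      emoji: "🎨",
      terms: ["component", "props", "state", "render", "ui"],
    },
  },
};

console.error(normalizeGroups(exampleConfig).map((group) => group.name));
```

Normalizing once at load time means the rest of the code only ever deals with one shape.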