Codebase Curator provides deep semantic understanding across 10 languages and file types, enabling powerful cross-language search and analysis.

TypeScript/JS:
- Extensions: `.ts`, `.tsx`, `.js`, `.jsx`, `.mjs`, `.cjs`
- Features:
  - Babel AST parsing for accuracy
  - JSX/TSX component understanding
  - Arrow functions, async/await patterns
  - Class inheritance tracking
  - Import/export analysis
  - Cross-references for function calls and instantiations

Python:
- Extensions: `.py`
- Features:
  - Indentation-aware parsing
  - Class method extraction with full paths (e.g., `AuthService.authenticate`)
  - Decorator tracking (`@login_required`, `@property`)
  - Docstring extraction (multi-line)
  - Magic method recognition (`__init__`, `__str__`)
  - Constant detection (ALL_CAPS)

Go:
- Extensions: `.go`
- Features:
  - Package-level understanding
  - Interface and struct definitions
  - Method receivers (`func (s *Service) Method()`)
  - Channel operations
  - Embedded types (composition)
  - Import tracking

Rust:
- Extensions: `.rs`
- Features:
  - Trait definitions and implementations
  - Macro definitions (`macro_rules!`)
  - Lifetime parameter handling
  - Derive attributes (`#[derive(Debug, Clone)]`)
  - Module structure (`mod`, `pub mod`)
  - Associated types and constants

Swift:
- Extensions: `.swift`
- Features:
  - Protocol definitions and conformance
  - Extension tracking
  - SwiftUI property wrappers (`@State`, `@Binding`)
  - Access modifiers (`private`, `public`, `internal`)
  - Computed properties
  - Interface Builder annotations (`@IBOutlet`, `@IBAction`)

Shell:
- Extensions: `.sh`, `.bash`, `.zsh`, `.fish`, `.bashrc`, `.zshrc`
- Features:
  - Shebang detection for files without extensions
  - Function declarations
  - Variable exports
  - Alias definitions
  - Trap commands
  - Here-doc detection
  - Command option parsing (`getopts`)
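
Shebang detection for extensionless files could work roughly as in the sketch below. The helper name `detectShellFromShebang` and the list of recognized shells are assumptions made for illustration; the project's actual logic is not shown in this document.

```typescript
// Hypothetical helper: decide whether an extensionless file is a shell script
// by inspecting its first line. The set of shells is an assumption.
const KNOWN_SHELLS = new Set(["sh", "bash", "zsh", "fish"])

function detectShellFromShebang(content: string): string | null {
  const firstLine = content.split("\n", 1)[0].trim()
  if (!firstLine.startsWith("#!")) return null

  // Split "#!/usr/bin/env bash -e" into ["/usr/bin/env", "bash", "-e"].
  const parts = firstLine.slice(2).trim().split(/\s+/)
  if (parts[0] === "") return null

  // Take the basename of the interpreter path.
  let interpreter = parts[0].split("/").pop() ?? ""
  // With "env", the real interpreter is the next argument that is not a flag.
  if (interpreter === "env") {
    interpreter = parts.slice(1).find((p) => !p.startsWith("-")) ?? ""
  }
  return KNOWN_SHELLS.has(interpreter) ? interpreter : null
}
```

A file starting with `#!/usr/bin/env zsh` would be treated as a zsh script here, while a `python3` shebang or a file with no shebang yields `null`.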

JSON:
- Extensions: `.json`, `.jsonc`, `.json5`
- Special Handling:
  - `package.json`: NPM scripts as functions, dependencies as imports
  - `tsconfig.json`: Compiler options, path mappings
  - General JSON: Hierarchical key extraction
- Features:
  - JSONC comment removal
  - Nested object traversal
  - Array handling
  - Cross-references for file paths
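
To make the `package.json` handling concrete, here is a minimal sketch of mapping NPM scripts to function-like entries and dependencies to import-like entries. The `Entry` shape and the function name `extractPackageJson` are hypothetical; the real extractor's types are not given here.

```typescript
// Assumed, simplified entry shape; the project's real types may differ.
interface Entry {
  name: string
  kind: "function" | "import"
  detail: string
}

// Hypothetical mapping: each script becomes a "function", each dependency an "import".
function extractPackageJson(content: string): Entry[] {
  const pkg = JSON.parse(content) as {
    scripts?: Record<string, string>
    dependencies?: Record<string, string>
  }
  const entries: Entry[] = []
  for (const [name, command] of Object.entries(pkg.scripts ?? {})) {
    entries.push({ name, kind: "function", detail: command })
  }
  for (const [name, version] of Object.entries(pkg.dependencies ?? {})) {
    entries.push({ name, kind: "import", detail: version })
  }
  return entries
}
```

Treating scripts as functions means a search such as `smartgrep "scripts" --file "package.json"` can surface them alongside code definitions.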

YAML:
- Extensions: `.yaml`, `.yml`
- Context-Aware Parsing:
  - GitLab CI pipelines (`.gitlab-ci.yml`)
  - GitHub Actions workflows
  - Docker Compose services
  - Kubernetes manifests
  - Ansible playbooks
- Features:
  - Multi-line string handling
  - Anchor and alias support
  - Indentation-based context

TOML:
- Extensions: `.toml`
- Special Handling:
  - `Cargo.toml`: Rust package configuration
  - `pyproject.toml`: Python project configuration
- Features:
  - Table and nested table support
  - Array of tables
  - Multi-line strings
  - Inline table extraction

.env:
- Extensions: `.env`, `.env.*` (e.g., `.env.local`, `.env.production`)
- Features:
  - Variable categorization (database, auth, API, etc.)
  - Sensitive value masking (passwords, tokens, keys)
  - Comment preservation
  - Multi-line value support
  - Cross-references for ports and URLs
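
Sensitive value masking can be pictured with the sketch below. The key pattern, the `****` placeholder, and the names `parseEnv` and `EnvEntry` are assumptions for illustration. The sketch also skips comments and multi-line values, which the real extractor is described as handling.

```typescript
// Hypothetical pattern for keys whose values should never be stored in the index.
const SENSITIVE_KEY = /(PASSWORD|SECRET|TOKEN|KEY)/i

interface EnvEntry {
  key: string
  value: string
  masked: boolean
}

function parseEnv(content: string): EnvEntry[] {
  const entries: EnvEntry[] = []
  for (const rawLine of content.split("\n")) {
    const line = rawLine.trim()
    // Simplification: blank lines and comments are skipped rather than preserved.
    if (line === "" || line.startsWith("#")) continue
    const eq = line.indexOf("=")
    if (eq <= 0) continue
    const key = line.slice(0, eq).trim()
    const value = line.slice(eq + 1).trim()
    const masked = SENSITIVE_KEY.test(key)
    entries.push({ key, value: masked ? "****" : value, masked })
  }
  return entries
}
```

With this approach `DATABASE_URL` keeps its value for cross-referencing, while a key such as `JWT_SECRET` is indexed by name only.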

Concept groups work across ALL languages:

    # Find ALL authentication patterns in any language
    smartgrep group auth
    # Results might include:
    # - TypeScript: authenticate(), useAuth(), AuthProvider
    # - Python: @login_required, authenticate_user()
    # - Go: func Authenticate(), type AuthToken
    # - Rust: impl Auth for User
    # - Config: JWT_SECRET, AUTH_URL, oauth settings

    # Python decorators
    smartgrep "@" --file "*.py"
    # Go interfaces
    smartgrep "type.*interface" --file "*.go"
    # Rust traits
    smartgrep "trait" --file "*.rs"
    # Swift protocols
    smartgrep "protocol" --file "*.swift"
    # NPM scripts
    smartgrep "scripts" --file "package.json"
    # Docker services
    smartgrep "services:" --file "docker-compose*.yml"

    # Find all uses of a function across languages
    smartgrep refs "authenticate"
    # See who imports what
    smartgrep refs "AuthService"
    # Track configuration usage
    smartgrep refs "DATABASE_URL"

All language extractors implement this interface:

    interface LanguageExtractor {
      canHandle(filePath: string): boolean
      extract(content: string, filePath: string): ExtractionResult
    }

    interface ExtractionResult {
      definitions: SemanticInfo[]
      references: CrossReference[]
    }

To add support for a new language:
- Create a new extractor in `src/packages/semantic-core/src/extractors/`
- Implement the `LanguageExtractor` interface
- Add it to the `SemanticService.ts` extractors array
- Export it from `src/packages/semantic-core/src/index.ts`
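
As a sketch of the second step, a new extractor might look like the following. `RubyExtractor` is purely hypothetical, and the fields of `SemanticInfo` and `CrossReference` are assumed here because their definitions are not shown above; only `LanguageExtractor` and `ExtractionResult` follow the documented interface.

```typescript
// Assumed minimal shapes; the real types in semantic-core likely carry more fields.
interface SemanticInfo {
  term: string
  type: string
  filePath: string
  line: number
}
interface CrossReference {
  targetName: string
  filePath: string
  line: number
}

interface ExtractionResult {
  definitions: SemanticInfo[]
  references: CrossReference[]
}
interface LanguageExtractor {
  canHandle(filePath: string): boolean
  extract(content: string, filePath: string): ExtractionResult
}

// Hypothetical extractor: records Ruby `def` definitions and nothing else.
class RubyExtractor implements LanguageExtractor {
  canHandle(filePath: string): boolean {
    return filePath.endsWith(".rb")
  }

  extract(content: string, filePath: string): ExtractionResult {
    const definitions: SemanticInfo[] = []
    content.split("\n").forEach((text, index) => {
      const match = /^\s*def\s+([A-Za-z_]\w*[?!]?)/.exec(text)
      if (match) {
        definitions.push({ term: match[1], type: "function", filePath, line: index + 1 })
      }
    })
    // Cross-reference extraction is left out of this sketch.
    return { definitions, references: [] }
  }
}
```

A line-by-line regular expression keeps the example short; a production extractor would more likely use a real parser, as the TypeScript extractor does with Babel.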
- Incremental Indexing: Only changed files are re-processed
- Streaming Parsers: Large files are processed in chunks
- Parallel Extraction: Multiple files processed concurrently
- Smart Caching: Semantic index persisted between runs
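
Incremental indexing needs some way to tell whether a file changed since the last run. This document does not say how that is detected, so the sketch below assumes content hashing; `ChangeTracker` and `fnv1a` are hypothetical names, and the FNV-1a style hash is only a small dependency-free stand-in.

```typescript
// FNV-1a style hash over UTF-16 code units, used only as a stand-in content hash.
function fnv1a(text: string): number {
  let hash = 0x811c9dc5
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i)
    hash = Math.imul(hash, 0x01000193)
  }
  return hash >>> 0
}

// Hypothetical tracker: remembers the last hash seen for each file path.
class ChangeTracker {
  private hashes = new Map<string, number>()

  // True when the file is new or its content differs from the previous call.
  needsReindex(filePath: string, content: string): boolean {
    const hash = fnv1a(content)
    if (this.hashes.get(filePath) === hash) return false
    this.hashes.set(filePath, hash)
    return true
  }
}
```

Persisting the path-to-hash map between runs would pair naturally with the cached semantic index, so unchanged files can be skipped on startup.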
| Language | Functions | Classes | Variables | Imports | Comments | Cross-Refs |
|---|---|---|---|---|---|---|
| TypeScript/JS | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Python | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Go | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Rust | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Swift | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Shell | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ |
| JSON | ❌ | ❌ | ✅ | ✅* | ❌ | ✅ |
| YAML | ❌ | ❌ | ✅ | ❌ | ✅ | ✅ |
| TOML | ❌ | ❌ | ✅ | ❌ | ✅ | ✅ |
| .env | ❌ | ❌ | ✅ | ❌ | ✅ | ✅ |
*JSON imports are dependencies in package.json

- Use type filters for precise searches:

      smartgrep "process" --type function --file "*.py"

- Combine patterns for complex searches:

      smartgrep "async&test" --type function # Async test functions

- Leverage concept groups for broad searches:

      smartgrep group error --max 20 # All error handling patterns

- Check cross-references before refactoring:

      smartgrep refs "OldClassName" # See all usages before renaming

We're considering adding support for:
- Java/Kotlin
- C#/.NET
- Ruby
- PHP
- C/C++
- Elixir
- Clojure
Have a language request? Open an issue!