tacit-code/indexer: AST-based indexer for 8 languages. Creates minified project abstractions respecting .gitignore. Hooks auto-update PROJECT_INDEX.json outside AI context. Outputs to a gitignored directory. Enables precise code targeting without context pollution. Real-time watching, parallel processing, multiple exports.

indexer-ai

High-performance universal code indexer optimized for AI assistants and modern development tools.

  • Production Ready: Enterprise-grade performance with Worker Threads and async I/O
  • Performance: 3.6x faster indexing with parallel processing (10,000 files in ~25s)
  • Clean Codebase: Major cleanup completed, removing 260+ instances of dead code

🚀 Latest Updates (September 2025)

Code Quality Improvements

  • Dead Code Removal: Eliminated 67 unused functions, 145 unused exports, 48 unused imports
  • Parser Consolidation: Merged 3 Python parsers into 1 unified Tree-sitter parser
  • VS Code Extension: Removed to focus on core indexing functionality
  • Logger Migration: Replaced 398 console.log statements with proper Logger class
  • Syntax Fixes: Fixed all cleanup-related syntax errors across 9 files

Performance Enhancements (v2.0.2)

  • Worker Threads: Parallel parsing using all CPU cores
  • Async I/O: Non-blocking file operations throughout
  • Node.js 20 LTS: Built on the Node.js 20 runtime and its optimizations
  • ES2023 Target: Modern JavaScript features for better performance

Performance Benchmarks

Workload        Before    After    Improvement
100 files       ~2s       ~0.7s    2.8x faster
1,000 files     ~15s      ~5s      3x faster
10,000 files    ~90s      ~25s     3.6x faster
Memory          300MB     180MB    40% reduction

Features

Core Capabilities

  • Lightning Fast: Index 10,000+ files in ~25 seconds with Worker Threads
  • Multi-Language: 8 languages - JavaScript, TypeScript, Python, Go, SQL, GraphQL, YAML, Astro
  • AI-Optimized: Built for Claude, GPT-4, and other LLM assistants
  • Real-time Updates: File watching with automatic index refresh
  • Organization-Wide: Index entire organizations and monorepos automatically
  • Cross-Repository: Track API calls and dependencies between services
  • Modern Architecture: ES2023, async/await, Worker Threads

🆕 Advanced Features

  • Call Graph Analysis: Bidirectional function call tracking and dead code detection
  • AI Compression: 50-70% size reduction with token-aware optimization
  • Worker Thread Pool: Automatic parallel processing for large codebases
  • Streaming Support: Handle massive files without memory issues
  • Impact Analysis: Track cascading effects of code changes
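Impact analysis can be pictured as a walk over reversed dependency edges: start from the changed file and keep following "who imports this?" links until nothing new turns up. A minimal sketch, assuming a hypothetical `DependencyMap` (file → files it imports) and a hypothetical `impactedBy` helper; it illustrates the idea rather than the indexer's actual data model.

```typescript
// Hypothetical shape: each file maps to the files it imports.
type DependencyMap = Record<string, string[]>;

// Invert "A imports B" into "B is imported by A".
function reverseEdges(deps: DependencyMap): Map<string, string[]> {
  const reversed = new Map<string, string[]>();
  for (const [file, imports] of Object.entries(deps)) {
    for (const target of imports) {
      const dependents = reversed.get(target) ?? [];
      dependents.push(file);
      reversed.set(target, dependents);
    }
  }
  return reversed;
}

// Breadth-first walk: every file that directly or transitively depends on
// `changed`, in discovery order. `changed` itself is excluded.
function impactedBy(changed: string, deps: DependencyMap): string[] {
  const reversed = reverseEdges(deps);
  const seen = new Set<string>([changed]);
  const queue: string[] = [changed];
  const impacted: string[] = [];
  while (queue.length > 0) {
    const current = queue.shift()!;
    for (const dependent of reversed.get(current) ?? []) {
      if (!seen.has(dependent)) {
        seen.add(dependent);
        impacted.push(dependent);
        queue.push(dependent);
      }
    }
  }
  return impacted;
}

const deps: DependencyMap = {
  "web/checkout.ts": ["shared/api-client.ts"],
  "shared/api-client.ts": ["shared/config.ts"],
  "api/server.ts": ["shared/config.ts"],
  "tools/lint.ts": [],
};

console.log(impactedBy("shared/config.ts", deps));
```

Because visited files are tracked, import cycles end the walk instead of looping forever.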

Export Formats

  • JSON: Complete index with compression options (standard, compressed, minified)
  • Markdown: Human-readable documentation
  • Mermaid: Interactive diagrams for VS Code/Cursor
  • GraphViz: Professional dependency graphs
  • ASCII: Terminal-friendly visualizations
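To give a sense of what the Mermaid export involves, the sketch below turns a list of dependency edges into a `graph TD` definition. The `Edge` type and `toMermaid` function are hypothetical; only the Mermaid syntax (`graph TD`, `id["label"]`, `-->`) is standard. Node IDs are generated (`n0`, `n1`, ...) because file paths contain characters such as `/` and `.` that are awkward as raw Mermaid IDs.

```typescript
// Hypothetical edge type: `from` imports or calls `to`.
interface Edge {
  from: string;
  to: string;
}

// Render edges as a Mermaid flowchart. File paths become quoted labels on
// generated IDs (n0, n1, ...) so "/" and "." never appear in Mermaid IDs.
function toMermaid(edges: Edge[]): string {
  const ids = new Map<string, string>();
  const lines: string[] = ["graph TD"];

  const idFor = (path: string): string => {
    let id = ids.get(path);
    if (id === undefined) {
      id = `n${ids.size}`;
      ids.set(path, id);
      // #quot; is Mermaid's entity code for a double quote inside a label.
      lines.push(`  ${id}["${path.replace(/"/g, "#quot;")}"]`);
    }
    return id;
  };

  for (const { from, to } of edges) {
    const a = idFor(from);
    const b = idFor(to);
    lines.push(`  ${a} --> ${b}`);
  }
  return lines.join("\n");
}

console.log(
  toMermaid([
    { from: "src/app.ts", to: "src/db.ts" },
    { from: "src/app.ts", to: "src/log.ts" },
  ]),
);
```

Pasting the printed text into a Mermaid renderer draws `src/app.ts` with arrows to `src/db.ts` and `src/log.ts`.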

🎉 Now Available on npm!

Install globally in seconds:

npm install -g indexer-ai

# Use the ultra-short command (4 chars!)
idxr

Quick Start

Requirements

  • Node.js >= 20.0.0 (LTS recommended)
  • npm or yarn
  • 4GB RAM recommended for large codebases

Installation Prerequisites

Build Tools (Linux/WSL and macOS)

For tree-sitter and native dependencies to compile:

# Ubuntu/Debian/WSL
sudo apt-get update
sudo apt-get install -y build-essential python3

# macOS (if needed)
xcode-select --install

Installation

# Install globally from npm (recommended)
npm install -g indexer-ai

# Quick usage with short command
idxr                    # Shortest command (4 chars!)
indexer                 # Alternative command
indexer-ai              # Full package name

# Or install from source
git clone https://github.com/tacit-code/indexer.git
cd indexer
yarn install && yarn build
npm install -g .

Basic Usage

# Smart mode - analyzes everything automatically
idxr                     # Quick 4-character command!
# or
indexer

# Index entire organization (all repos in subdirectories)
cd /your/organization
idxr

# Index specific project with Worker Threads (automatic for >50 files)
idxr scan /path/to/project

# Index with specific options
indexer scan --parallel 8 --output custom-index.json

# Watch mode with real-time updates
idxr watch

# Interactive chat with Claude about your codebase
idxr chat

# Query the index
idxr query "function.*Auth" --fuzzy
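`idxr query` takes a pattern plus flags such as `--fuzzy` (and `--regex`, per the CLI reference). The sketch below shows one common way those two modes could differ over a list of symbol names: regex mode tests the pattern with `RegExp`, and fuzzy mode here is a simple in-order subsequence match. The `SymbolEntry` shape and both functions are hypothetical, not the indexer's matcher.

```typescript
// Hypothetical index entry; the real PROJECT_INDEX.json schema may differ.
interface SymbolEntry {
  name: string;
  kind: "function" | "class";
  file: string;
}

// Regex mode: the pattern is a JavaScript regular expression over names.
function regexSearch(symbols: SymbolEntry[], pattern: string): SymbolEntry[] {
  const re = new RegExp(pattern);
  return symbols.filter((s) => re.test(s.name));
}

// Fuzzy mode (one simple choice): every query character must appear in the
// name in order, ignoring case. Shorter matching names are listed first.
function fuzzySearch(symbols: SymbolEntry[], query: string): SymbolEntry[] {
  const q = query.toLowerCase();
  const isSubsequence = (name: string): boolean => {
    const lower = name.toLowerCase();
    let i = 0;
    for (let j = 0; j < lower.length && i < q.length; j++) {
      if (lower[j] === q[i]) i++;
    }
    return i === q.length;
  };
  return symbols
    .filter((s) => isSubsequence(s.name))
    .sort((a, b) => a.name.length - b.name.length);
}
```

Regex mode is precise but unforgiving; the subsequence rule lets `auth` find both `AuthController` and `validateAuthToken`.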

Multi-Repository & Organization Indexing

The indexer automatically detects and analyzes entire organization structures, monorepos, and multi-repository setups without configuration.

Organization-Wide Indexing

# Index your entire organization
cd /path/to/organization  # Parent directory containing all repos
indexer                    # Automatically indexes ALL repositories

# Example: Clone Global organization structure
/clone-global/
├── indexer/         # This tool
├── backend/         # API services
├── frontend/        # Web applications
├── mobile/          # Mobile apps
├── skills/          # Microservices
└── data-ops/        # Data pipelines

# Run from parent directory:
cd /clone-global
indexer  # Creates comprehensive cross-repository knowledge graph

Automatic Detection

The SmartIndexer automatically detects:

  • Monorepo structures: lerna.json, yarn workspaces, pnpm workspaces
  • Multi-repository setups: Multiple .git directories
  • Service architectures: Microservices, APIs, frontends
  • Shared dependencies: Cross-repository imports and libraries
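Detection of this kind usually comes down to looking for marker files. The sketch below classifies a directory from a list of relative paths, which keeps it free of file-system calls. The markers (lerna.json, pnpm-workspace.yaml, a `workspaces` field in package.json, several nested `.git` directories) follow the first two bullets above, but `detectLayout` and its input shape are assumptions, not SmartIndexer's code.

```typescript
type Layout = "monorepo" | "multi-repo" | "single-repo";

interface DetectionInput {
  paths: string[]; // relative paths under the scanned root, directories included
  rootPackageJson?: { workspaces?: unknown }; // parsed root package.json, if any
}

function detectLayout({ paths, rootPackageJson }: DetectionInput): Layout {
  const has = (p: string): boolean => paths.includes(p);
  // Yarn and npm workspaces live in the root package.json "workspaces" field;
  // pnpm uses pnpm-workspace.yaml; Lerna uses lerna.json.
  const hasWorkspaces = rootPackageJson?.workspaces !== undefined;
  if (has("lerna.json") || has("pnpm-workspace.yaml") || hasWorkspaces) {
    return "monorepo";
  }
  // Repositories one level down, e.g. "backend/.git" and "frontend/.git".
  const nestedRepos = paths.filter((p) => /^[^/]+\/\.git$/.test(p));
  return nestedRepos.length > 1 ? "multi-repo" : "single-repo";
}
```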

Cross-Repository Analysis

Tracks relationships across your entire codebase:

  • Frontend → Backend: API calls, GraphQL queries, REST endpoints
  • Service → Service: Inter-service communication, event streams
  • Shared Libraries: Import/export dependencies, version tracking
  • Database Schemas: Cross-service data flows and dependencies
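Linking a frontend call to a backend endpoint amounts to matching the call's HTTP method and URL path against route patterns. A minimal sketch, assuming Express-style `:param` segments in hypothetical `Route` records and a hypothetical `linkCalls` helper; call sites that build URLs dynamically cannot be resolved by a static matcher like this one.

```typescript
interface Route {
  method: string;
  pattern: string; // e.g. "/api/users/:id"
  service: string;
}

interface ApiCall {
  method: string;
  url: string; // path as written at the call site, e.g. "/api/users/42"
  file: string;
}

interface CallLink {
  call: ApiCall;
  route: Route | undefined; // undefined when no route matches
}

// Segment-by-segment comparison; a ":name" segment matches any non-empty segment.
function matches(pattern: string, path: string): boolean {
  const p = pattern.split("/");
  const u = (path.split("?")[0] ?? "").split("/"); // ignore the query string
  if (p.length !== u.length) return false;
  return p.every((seg, i) => (seg.startsWith(":") ? u[i] !== "" : seg === u[i]));
}

// Pair each call with the first route whose method and path both match.
function linkCalls(calls: ApiCall[], routes: Route[]): CallLink[] {
  return calls.map((call) => ({
    call,
    route: routes.find(
      (r) => r.method.toUpperCase() === call.method.toUpperCase() && matches(r.pattern, call.url),
    ),
  }));
}
```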

Generated Outputs

.indexer-output/current/
├── PROJECT_INDEX.json          # Combined index of ALL repositories
├── service-graph.json          # Complete dependency graph
├── multi-repo-overview.md      # Visual architecture diagram
├── multi-repo-interactive.html # Interactive dependency explorer
└── [repo-name]/                # Individual repository indexes
    └── PROJECT_INDEX.json      # Repo-specific index

Use Cases

  • Architecture Documentation: Auto-generate system architecture diagrams
  • Dependency Analysis: Find all consumers of an API endpoint
  • Impact Assessment: See affected services before making changes
  • Code Navigation: Jump between repos following API calls
  • AI Context: Give LLMs complete understanding of your entire system

Benefits for AI Assistants

When you provide the generated PROJECT_INDEX.json to Claude, GPT-4, or other AI assistants:

  • Complete Context: AI understands your entire organization from a single file
  • Cross-Repo Intelligence: AI can trace API calls across service boundaries
  • Accurate Suggestions: AI knows exact function signatures and dependencies
  • Reduced Token Usage: Compressed index uses 50-70% fewer tokens than raw code
  • System-Wide Refactoring: AI can suggest changes considering all affected services

Performance Configuration

Optimize for Your System

# Maximum performance (uses all CPU cores)
indexer scan --parallel $(nproc)

# Memory-constrained environment
indexer scan --parallel 2 --max-memory 256

# Disable Worker Threads for debugging
indexer scan --no-workers

# Incremental mode for large codebases
indexer scan --incremental

Environment Variables

# Performance
INDEXER_PARALLEL=8           # Number of parallel workers
INDEXER_MAX_MEMORY=1000      # Max memory in MB
INDEXER_USE_WORKERS=true     # Enable Worker Threads

# Node.js 20+ optimizations
NODE_OPTIONS="--max-old-space-size=4096"  # 4GB heap
UV_THREADPOOL_SIZE=16        # Larger thread pool

# AI Features
ANTHROPIC_API_KEY=sk-ant-... # Claude integration
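A common way to combine CLI flags, the environment variables above, and defaults is "flag beats environment beats default". The sketch below assumes that precedence (this README does not state it), takes the environment as a plain object so it can be tested without `process.env`, and folds in the rule that Worker Threads switch on automatically above 50 files. `resolveConfig` is hypothetical and its default values are placeholders, not the indexer's.

```typescript
// Hypothetical shapes; flag names mirror the CLI options shown above.
interface CliFlags {
  parallel?: number; // --parallel
  maxMemory?: number; // --max-memory (MB)
  noWorkers?: boolean; // --no-workers
}

interface ResolvedConfig {
  parallel: number;
  maxMemoryMb: number;
  useWorkers: boolean;
}

type Env = Record<string, string | undefined>;

// Positive integer from an env string, or undefined if unset or malformed.
function positiveInt(raw: string | undefined): number | undefined {
  if (raw === undefined || raw.trim() === "") return undefined;
  const n = Number(raw);
  return Number.isInteger(n) && n > 0 ? n : undefined;
}

// Assumed precedence: CLI flag, then environment variable, then default.
function resolveConfig(
  flags: CliFlags,
  env: Env,
  cpuCount: number,
  fileCount: number,
): ResolvedConfig {
  const parallel = flags.parallel ?? positiveInt(env["INDEXER_PARALLEL"]) ?? cpuCount;
  // 1000 MB mirrors the example value above; the real default may differ.
  const maxMemoryMb = flags.maxMemory ?? positiveInt(env["INDEXER_MAX_MEMORY"]) ?? 1000;
  const workersAllowed = flags.noWorkers !== true && env["INDEXER_USE_WORKERS"] !== "false";
  // Workers only start above a size threshold (">50 files" per this README).
  const useWorkers = workersAllowed && fileCount > 50;
  return { parallel, maxMemoryMb, useWorkers };
}
```

In Node.js this could be called as `resolveConfig(flags, process.env, os.availableParallelism(), files.length)`.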

Architecture

Modern Tech Stack

  • Runtime: Node.js 20 LTS with native ES modules support
  • Language: TypeScript 5.3+ with ES2023 target
  • Parallelization: Worker Threads for CPU-intensive parsing
  • Async I/O: Promises-based file system operations
  • Parsing: Tree-sitter (Python), Babel (JS/TS), native AST parsers

Performance Architecture

┌─────────────────────────────────────┐
│             Main Thread             │
│  ┌─────────────────────────────┐    │
│  │   Orchestration Layer       │    │
│  └──────────┬──────────────────┘    │
│             │                       │
│  ┌──────────▼──────────────────┐    │
│  │   Worker Thread Pool        │    │
│  │  ┌────┐ ┌────┐ ...  ┌────┐  │    │
│  │  │ W1 │ │ W2 │      │ Wn │  │    │
│  │  └────┘ └────┘      └────┘  │    │
│  └─────────────────────────────┘    │
│                                     │
│  ┌─────────────────────────────┐    │
│  │   Async I/O Layer           │    │
│  └─────────────────────────────┘    │
└─────────────────────────────────────┘
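The orchestration layer's job is to keep at most N parse jobs in flight and collect results in input order. The sketch below shows that scheduling with plain promises; in the real tool each job would be posted to a Worker Thread, which this example deliberately leaves out so it runs anywhere. `runPool` and its signature are hypothetical.

```typescript
// Run `task` over every item with at most `limit` tasks in flight.
// Results keep the order of `items`, regardless of completion order.
// A rejected task rejects the whole run (Promise.all semantics).
async function runPool<T, R>(
  items: readonly T[],
  limit: number,
  task: (item: T) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }
  const results = new Array<R>(items.length);
  let next = 0; // index of the next item to hand out

  // Each lane pulls the next unclaimed item until none remain. JavaScript runs
  // this code on one thread, so `next++` cannot race between lanes.
  const lane = async (): Promise<void> => {
    while (next < items.length) {
      const index = next++;
      results[index] = await task(items[index] as T);
    }
  };

  const lanes = Array.from({ length: Math.min(limit, items.length) }, () => lane());
  await Promise.all(lanes);
  return results;
}
```

Pulling work from a shared index (rather than pre-splitting the list into N chunks) keeps every lane busy even when some files take much longer to parse than others.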

API Server

REST API

# Start API server
indexer api --port 4000

# Endpoints
POST /api/index          # Build index
GET  /api/index/status   # Get status
POST /api/query          # Query index
GET  /api/stats          # Statistics
POST /api/ai/analyze     # AI analysis

GraphQL API

query {
  index {
    files {
      path
      functions {
        name
        complexity
      }
    }
  }
}

Advanced Features

Multi-Repository Analysis

# Analyze entire organization
indexer multi-repo /path/to/org --cross-dependencies

# Generate knowledge graph
indexer export mermaid --multi-repo

AI-Powered Analysis

# Security scanning
indexer ai security-scan

# Bug prediction
indexer ai predict-bugs --confidence 0.8

# Code smell detection
indexer ai detect-smells

Call Graph Analysis

# Find unused code
indexer analyze dead-code

# Trace execution paths
indexer analyze call-paths main

# Detect circular dependencies
indexer analyze circular
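Both `dead-code` and `circular` boil down to traversals of the call graph. A minimal sketch, assuming a hypothetical adjacency map from each function to the functions it calls and a caller-supplied list of entry points: anything unreachable from an entry point is reported as unused, and a depth-first search that tracks the current path reports the first cycle it closes. Real dead-code detection must also account for exports, dynamic dispatch, and reflection, which this ignores.

```typescript
type CallGraph = Record<string, string[]>; // caller -> callees

// Functions not reachable from any entry point (forward reachability).
function findDeadCode(graph: CallGraph, entryPoints: string[]): string[] {
  const reached = new Set<string>();
  const stack = [...entryPoints];
  while (stack.length > 0) {
    const fn = stack.pop()!;
    if (reached.has(fn)) continue;
    reached.add(fn);
    stack.push(...(graph[fn] ?? []));
  }
  return Object.keys(graph).filter((fn) => !reached.has(fn));
}

// One cycle as a node list (first node repeated at the end), or null if acyclic.
function findCycle(graph: CallGraph): string[] | null {
  const done = new Set<string>(); // fully explored, known to be cycle-free
  const onPath: string[] = []; // current DFS path

  const visit = (fn: string): string[] | null => {
    const at = onPath.indexOf(fn);
    if (at !== -1) return [...onPath.slice(at), fn]; // back edge closes a cycle
    if (done.has(fn)) return null;
    onPath.push(fn);
    for (const callee of graph[fn] ?? []) {
      const cycle = visit(callee);
      if (cycle) return cycle;
    }
    onPath.pop();
    done.add(fn);
    return null;
  };

  for (const fn of Object.keys(graph)) {
    const cycle = visit(fn);
    if (cycle) return cycle;
  }
  return null;
}
```

The recursion depth grows with the longest call chain, so a version for very deep graphs would switch to an explicit stack.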

Configuration

.indexer.yml

version: 2
performance:
  parallel: 8
  useWorkers: true
  maxMemory: 1000
  cache: true

include:
  - "**/*.{js,jsx,ts,tsx,py,go,sql}"

ignore:
  - "**/node_modules/**"
  - "**/dist/**"

export:
  formats:
    json:
      compression: true
      maxSize: 10MB
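The `include` and `ignore` lists are glob patterns. To show how a file gets selected, the sketch below converts the subset of glob syntax used in this config (`**/`, `*`, `?`, `{a,b}`) into regular expressions and keeps a path only if it matches an include pattern and no ignore pattern. `globToRegExp` and `shouldIndex` are a simplified, hypothetical stand-in, not the matcher the indexer uses; full `.gitignore` semantics (negation, anchoring, directory-only rules) are out of scope, and paths are assumed relative with forward slashes.

```typescript
// Convert a glob to an anchored RegExp. Supports "**/", "**", "*", "?",
// and one level of "{a,b}" alternation; everything else is literal.
function globToRegExp(glob: string): RegExp {
  let re = "";
  let inBraces = false;
  for (let i = 0; i < glob.length; i++) {
    const c = glob.charAt(i);
    if (c === "*" && glob.charAt(i + 1) === "*") {
      if (glob.charAt(i + 2) === "/") {
        re += "(?:.*/)?"; // "**/": zero or more leading directories
        i += 2;
      } else {
        re += ".*"; // bare "**": anything, including "/"
        i += 1;
      }
    } else if (c === "*") {
      re += "[^/]*"; // "*": any run of characters within one path segment
    } else if (c === "?") {
      re += "[^/]"; // "?": exactly one character within a segment
    } else if (c === "{") {
      re += "(?:";
      inBraces = true;
    } else if (c === "}" && inBraces) {
      re += ")";
      inBraces = false;
    } else if (c === "," && inBraces) {
      re += "|";
    } else {
      re += c.replace(/[.+^$(){}|[\]\\]/g, "\\$&"); // escape regex metacharacters
    }
  }
  return new RegExp(`^${re}$`);
}

function shouldIndex(path: string, include: string[], ignore: string[]): boolean {
  const hit = (globs: string[]): boolean => globs.some((g) => globToRegExp(g).test(path));
  return hit(include) && !hit(ignore);
}

const include = ["**/*.{js,jsx,ts,tsx,py,go,sql}"];
const ignore = ["**/node_modules/**", "**/dist/**"];
```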

Integrations

IDE Support

  • Cursor: Native integration with AI features
  • WebStorm: Via REST API
  • Vim/Neovim: LSP integration

CI/CD

  • GitHub Actions: Pre-built workflows
  • GitLab CI: Docker images available
  • Jenkins: Plugin support
  • CircleCI: Orb available

Monitoring

  • Datadog: APM and metrics integration
  • New Relic: Performance monitoring
  • Sentry: Error tracking
  • Grafana: Custom dashboards

Architecture

Core Components

  • SmartIndexer (src/core/smart-indexer.ts) - Orchestrates all features automatically, detects project structure
  • Indexer (src/core/indexer.ts) - Main indexing engine, coordinates parsers, builds dependency graphs
  • CacheManager (src/core/cache-manager.ts) - LRU cache with 500MB limit, TTL-based expiration (1 hour)
  • FileWatcher (src/core/watcher.ts) - Real-time monitoring with Chokidar, 300ms debounced updates
  • WorkerPool (src/core/worker-pool.ts) - Thread pool for parallel parsing, automatic scaling
  • CallGraphAnalyzer (src/core/call-graph-analyzer.ts) - Function call tracking, dead code detection
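The CacheManager description (LRU eviction, a 500MB cap, a one-hour TTL) maps onto a small data structure: a `Map` kept in recency order, a running byte total, and an expiry timestamp per entry. The sketch below illustrates those parameters and is not the CacheManager source; `LruTtlCache` is hypothetical, entry sizes are supplied by the caller, and the clock is injectable so expiry can be tested without waiting.

```typescript
interface CacheEntry<V> {
  value: V;
  bytes: number;
  expiresAt: number; // ms timestamp from the injected clock
}

interface CacheOptions {
  maxBytes?: number;
  ttlMs?: number;
  now?: () => number;
}

class LruTtlCache<V> {
  // A Map iterates in insertion order, so its first key is the least recently used.
  private readonly entries = new Map<string, CacheEntry<V>>();
  private totalBytes = 0;
  private readonly maxBytes: number;
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(options: CacheOptions = {}) {
    this.maxBytes = options.maxBytes ?? 500 * 1024 * 1024; // 500MB (read as MiB here)
    this.ttlMs = options.ttlMs ?? 60 * 60 * 1000; // 1 hour
    this.now = options.now ?? (() => Date.now());
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.delete(key); // expired: drop it and report a miss
      return undefined;
    }
    this.entries.delete(key); // re-insert to mark as most recently used
    this.entries.set(key, entry);
    return entry.value;
  }

  set(key: string, value: V, bytes: number): void {
    this.delete(key);
    if (bytes > this.maxBytes) return; // would never fit: skip caching
    this.entries.set(key, { value, bytes, expiresAt: this.now() + this.ttlMs });
    this.totalBytes += bytes;
    // Evict from the least recently used end until back under budget.
    for (const oldKey of this.entries.keys()) {
      if (this.totalBytes <= this.maxBytes) break;
      this.delete(oldKey);
    }
  }

  delete(key: string): void {
    const entry = this.entries.get(key);
    if (entry !== undefined) {
      this.totalBytes -= entry.bytes;
      this.entries.delete(key);
    }
  }

  get size(): number {
    return this.entries.size;
  }
}
```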

Design Patterns

  • Parser Plugin System - Extensible language support via common interface
  • Factory Pattern - Dynamic parser selection based on file extension
  • Worker Thread Pool - Parallel processing with automatic scaling
  • Event-Driven Updates - Real-time index updates via EventEmitter
  • Repository Pattern - Abstracted storage through CacheManager
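The Parser Plugin System and Factory Pattern bullets describe a familiar shape: every language parser implements one interface, and a registry picks the parser by file extension. A minimal sketch of that shape; `LanguageParser`, `ParserRegistry`, and the regex-based toy parser are hypothetical (the real parsers build ASTs with Tree-sitter and Babel, per the tech stack above).

```typescript
interface ParsedSymbol {
  name: string;
  kind: "function" | "class";
}

// Common interface every language plugin implements.
interface LanguageParser {
  readonly language: string;
  readonly extensions: readonly string[];
  parse(source: string): ParsedSymbol[];
}

class ParserRegistry {
  private readonly byExtension = new Map<string, LanguageParser>();

  register(parser: LanguageParser): void {
    for (const ext of parser.extensions) this.byExtension.set(ext.toLowerCase(), parser);
  }

  // Factory step: choose a parser from the file extension, or undefined if unsupported.
  forFile(path: string): LanguageParser | undefined {
    const dot = path.lastIndexOf(".");
    if (dot === -1 || dot < path.lastIndexOf("/")) return undefined; // no extension
    return this.byExtension.get(path.slice(dot).toLowerCase());
  }
}

// Toy plugin: a line-based regex scan standing in for a real AST parser.
const toyPythonParser: LanguageParser = {
  language: "python",
  extensions: [".py"],
  parse(source) {
    const symbols: ParsedSymbol[] = [];
    for (const m of source.matchAll(/^\s*(def|class)\s+([A-Za-z_]\w*)/gm)) {
      symbols.push({ name: m[2] ?? "", kind: m[1] === "def" ? "function" : "class" });
    }
    return symbols;
  },
};
```

Adding a language then means registering one more object that satisfies `LanguageParser`; the indexing loop never changes.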

API Server

REST API (Port 4000)

# Start API server
idxr api

# Or with authentication
idxr api --enable-auth true

Endpoints:

  • POST /api/index - Trigger indexing
  • POST /api/ai/analyze - Comprehensive AI analysis
  • POST /api/ai/predict-bugs - Bug prediction
  • POST /api/ai/analyze-security - Security scanning
  • GET /api/health - Health check
  • GET /api/stats - Statistics

GraphQL API

# GraphQL endpoint
http://localhost:4000/graphql

The endpoint serves a full schema with queries, mutations, and subscriptions for real-time updates.

WebSocket Support

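// Real-time updates from Node.js via the `ws` package (npm install ws).
// Node 20 has no global WebSocket; browsers use addEventListener instead of .on
import WebSocket from 'ws';

const ws = new WebSocket('ws://localhost:4000/ws');
ws.on('message', (data) => {
  console.log('File updated:', data.toString());
});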

AI Integration

Claude SDK Integration

The indexer includes deep integration with Claude via the Anthropic SDK:

# Set up Claude
export ANTHROPIC_API_KEY="sk-ant-..."

# Run AI analysis
idxr analyze --ai

Specialized AI Agents

  • SecurityAgent - OWASP compliance, vulnerability detection, security patterns
  • PerformanceAgent - Optimization opportunities, memory leaks, bottlenecks
  • ArchitectureAgent - Pattern detection, SOLID principles, best practices
  • TestingAgent - Coverage analysis, test generation, quality metrics
  • Streaming analysis with real-time updates
  • Session management for continued conversations
  • Automatic bug prediction and security scanning
  • Code quality assessment and refactoring suggestions
  • Architecture recommendations and pattern detection

External Integrations

Slack Integration

Real-time notifications and monitoring:

# Configure Slack
export SLACK_TOKEN="xoxb-..."
export SLACK_CHANNEL="#dev-monitoring"

# Enable Slack bot
idxr slack --enable

Features:

  • Bug detection with severity classification
  • Code quality alerts
  • Performance degradation notifications
  • Security vulnerability alerts

Linear Integration

Automatic ticket creation for issues:

export LINEAR_API_KEY="lin_api_..."
export LINEAR_TEAM_ID="TEAM_ID"

Datadog Monitoring

Performance metrics and APM:

export DD_API_KEY="..."
export DD_APP_KEY="..."

# Send metrics
idxr monitor --datadog

Advanced Configuration

Configuration File (.indexerrc.yaml)

version: 2
name: my-project

include:
  - "**/*.{js,jsx,ts,tsx,py,go,sql}"
ignore:
  - "**/node_modules/**"
  - "**/dist/**"
  - "**/.git/**"

performance:
  parallel: true
  workers: 8
  cache: true
  maxFileSize: 2MB
  timeout: 120000

ai:
  model: claude-opus-4-1-20250805
  maxConcurrentAgents: 4
  sessionTimeout: 1800000

export:
  outputDirectory: .indexer-output
  formats:
    json:
      compression: true
      minify: false

integrations:
  slack:
    enabled: true
    channel: "#dev-alerts"
  linear:
    enabled: true
    autoCreate: true
  datadog:
    enabled: true
    tags:
      - "env:production"
      - "service:indexer"

Performance Tips

  1. Use Worker Threads for codebases >50 files (automatic)
  2. Enable incremental mode for large projects
  3. Configure parallel workers based on CPU cores
  4. Use compression for large indexes
  5. Enable caching for repeated operations
  6. Set appropriate memory limits for your system
  7. Use the --no-workers flag in WSL to avoid native module issues
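Incremental mode only pays off if the indexer can tell which files changed since the last run. One common approach, shown here as a sketch rather than the indexer's actual mechanism, is to store a content hash per file and diff the old and new maps. `diffSnapshots` is hypothetical, and the FNV-1a-style hash (computed over UTF-16 code units) is a small dependency-free stand-in; a real tool might instead use SHA-256 from `node:crypto` or file modification times.

```typescript
// 32-bit FNV-1a-style hash over UTF-16 code units: fast and dependency-free,
// but not collision-resistant. Enough to illustrate change detection.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

interface ChangeSet {
  added: string[];
  modified: string[];
  removed: string[];
}

// Compare stored hashes (path -> hash) with current contents (path -> text).
// Only `added` and `modified` files need re-parsing; `next` is saved for the next run.
function diffSnapshots(
  previous: Map<string, string>,
  currentFiles: Map<string, string>,
): { changes: ChangeSet; next: Map<string, string> } {
  const next = new Map<string, string>();
  const changes: ChangeSet = { added: [], modified: [], removed: [] };
  for (const [path, content] of currentFiles) {
    const hash = fnv1a(content);
    next.set(path, hash);
    const old = previous.get(path);
    if (old === undefined) changes.added.push(path);
    else if (old !== hash) changes.modified.push(path);
  }
  for (const path of previous.keys()) {
    if (!currentFiles.has(path)) changes.removed.push(path);
  }
  return { changes, next };
}
```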

CLI Command Reference

Core Commands

# Initialize and scan (smart mode - recommended)
idxr                        # Automatic everything
idxr init                   # Initialize with prompts
idxr scan [path]           # Scan specific directory
idxr scan --parallel 8     # Use 8 worker threads
idxr scan --no-workers     # Disable workers (WSL fix)

# File watching
idxr watch                 # Monitor changes
idxr watch --debounce 500  # Custom debounce (ms)

# Query and search
idxr query "pattern"       # Search functions/classes
idxr query ".*Controller" --regex  # Regex search
idxr stats                 # Show statistics

# Export formats
idxr export json           # JSON format
idxr export markdown       # Markdown documentation
idxr export graphviz       # DOT graph
idxr export mermaid        # Mermaid diagram
idxr export ascii          # Terminal visualization

# API server
idxr api                   # Start API server
idxr api --port 5000       # Custom port
idxr api --enable-auth     # With authentication

# AI features
idxr analyze --ai          # AI code analysis
idxr chat                  # Interactive Claude chat
idxr predict-bugs          # Bug prediction
idxr security-scan         # Security analysis

# Integrations
idxr slack --enable        # Enable Slack bot
idxr monitor --datadog     # Send Datadog metrics

# Utilities
idxr clean                 # Clear cache
idxr health                # Health check
idxr config --list         # Show configuration
idxr validate              # Validate index
idxr migrate               # Migrate old indexes
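`idxr watch --debounce 500` sets the quiet period the watcher waits for before re-indexing (the Core Components section lists 300ms debounced updates for FileWatcher). A trailing debounce that batches changed paths can be sketched with explicit timestamps, which keeps it deterministic and testable; in a real watcher `flushDue` would be driven by a timer. `ChangeBatcher` and its methods are hypothetical.

```typescript
// Collects changed paths and releases them as one batch once no new change
// has arrived for `windowMs` milliseconds (a trailing debounce).
class ChangeBatcher {
  private readonly pending = new Set<string>();
  private lastChangeAt = -Infinity;
  private readonly windowMs: number;

  constructor(windowMs = 300) {
    this.windowMs = windowMs;
  }

  // Record a change; every new change pushes the flush deadline back.
  record(path: string, now: number): void {
    this.pending.add(path);
    this.lastChangeAt = now;
  }

  // Return the sorted, de-duplicated batch if the quiet period has elapsed, else [].
  flushDue(now: number): string[] {
    if (this.pending.size === 0 || now - this.lastChangeAt < this.windowMs) return [];
    const batch = [...this.pending].sort();
    this.pending.clear();
    return batch;
  }
}
```

Batching means a burst of saves (for example a branch checkout touching hundreds of files) triggers one re-index instead of hundreds.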

Command Options

Most commands support these common flags:

--config <path>     # Custom config file
--output <path>     # Output location
--format <type>     # Output format
--quiet            # Suppress output
--verbose          # Detailed logging
--debug            # Debug mode
--profile          # Performance profiling
--no-cache         # Disable caching
--force            # Force operation
--help             # Show help

Troubleshooting

Common Issues

Out of Memory

# Increase Node.js heap size
NODE_OPTIONS="--max-old-space-size=8192" idxr scan

Worker Thread Issues (WSL/Linux)

# Disable workers to avoid native module errors
idxr scan --no-workers

# Or rebuild native modules
npm rebuild tree-sitter

Slow Performance

# Check Worker Thread status
idxr debug --workers

# Profile performance
idxr scan --profile

# Limit file size processing
idxr scan --max-file-size 1MB

Parser Errors

# Use fallback parser
idxr scan --parser-fallback

# Skip problematic files
idxr scan --skip-errors

# Exclude specific patterns
idxr scan --exclude "*.min.js"

Installation Issues

# Clean install
rm -rf node_modules package-lock.json
npm install

# Global installation from source
npm run build
npm install -g .

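# Permission issues (avoid sudo; npm 7+ no longer supports --unsafe-perm)
mkdir -p ~/.npm-global && npm config set prefix ~/.npm-global
export PATH="$HOME/.npm-global/bin:$PATH"
npm install -g indexer-ai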

Development

Building

npm install
npm run build     # Compiles TypeScript and Worker scripts
npm run dev       # Watch mode
npm test          # Run tests

Testing

npm test              # Run all tests (~25% coverage)
npm run test:unit     # Unit tests
npm run test:e2e      # End-to-end tests (Cypress)
npm run test:coverage # Generate coverage report

Current Test Status: 6 test suites passing with ~25% coverage. Target: 80%.

Contributing

See CONTRIBUTING.md for development guidelines.

License

MIT © Clone Global


Built for the future of AI-assisted development 🚀
