A web-based interface for experimenting with and tuning Semantic Chunking settings. This tool provides a visual way to test and configure the semantic-chunking library's settings to get optimal results for your specific use case. Once you've found the best settings, you can generate code to implement them in your project.
- Real-time text chunking with live preview
- Interactive controls for all chunking parameters
- Visual feedback for similarity thresholds
- Model selection and configuration
- Results download in JSON format
- Code generation for your settings
- Example texts for testing
- Dark mode interface
- Syntax highlighting of JSON results and code samples
- Line wrapping toggle for JSON results
- Node.js (v18 or higher recommended)
- npm (comes with Node.js)
- Clone the repository:
```bash
git clone https://github.com/jparkerweb/semantic-chunking.git
```
- Navigate to the webui directory:
```bash
cd semantic-chunking/webui
```
- Install dependencies:
```bash
npm install
```
- Start the server:
```bash
npm start
```
- Open your browser and visit:
```
http://localhost:3000
```
- Document Name: Name for your input text
- Text to Chunk: Your input text to be processed
- Max Token Size: Maximum size for each chunk (50-2500 tokens)
- Similarity Threshold: Base threshold for semantic similarity (0.1-1.0)
- Similarity Sentences Lookahead: Number of sentences to look ahead when calculating similarity (1-10)
- Dynamic Threshold Bounds: Lower and upper bounds for dynamic similarity threshold adjustment
- Combine Chunks: Enable/disable chunk combination phase
- Combine Chunks Similarity Threshold: Threshold for combining similar chunks
- Embedding Model: Choose from various supported embedding models
- DType: Select the data type for the model, affecting precision and performance (e.g., `fp32`, `fp16`, `q8`)
- Device: Choose the processing device (`cpu` or `webgpu`)
- Return Token Length: Include token count in results
- Return Embedding: Include embeddings in results
- Chunk Prefix: Add prefix to chunks (useful for RAG applications)
- Exclude Chunk Prefix in Results: Remove prefix from final results
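To make the Similarity Threshold and Combine Chunks settings concrete, here is a minimal sketch of how a threshold can decide chunk boundaries from sentence embeddings. This is an illustration only, not the semantic-chunking library's actual implementation; use the UI's generated code for real settings.

```javascript
// Cosine similarity between two embedding vectors of equal length.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Group sentence indices into chunks: start a new chunk whenever the
// similarity to the previous sentence falls below the threshold.
// (Illustrative sketch only — not the library's actual algorithm.)
function chunkByThreshold(embeddings, threshold) {
  const chunks = [[0]];
  for (let i = 1; i < embeddings.length; i++) {
    if (cosineSimilarity(embeddings[i - 1], embeddings[i]) >= threshold) {
      chunks[chunks.length - 1].push(i);
    } else {
      chunks.push([i]);
    }
  }
  return chunks;
}
```

Raising the threshold produces more, smaller chunks (sentences must be more similar to stay together); lowering it produces fewer, larger chunks.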
The Advanced Merge Settings section (collapsed by default) provides fine-grained control over the chunk optimization algorithm:
- Max Merges Per Pass: Absolute limit on merges per iteration (default: 500)
- Max Optimization Passes: Maximum iterations before stopping (default: 100)
- Merges Per Pass (%): Percentage of candidates to process per pass (default: 40)
- Uncapped Candidate Threshold: Below this count, all candidates merge (default: 12)
These settings are for advanced users tuning chunk quality. The algorithm sorts merge candidates by similarity and processes them in priority order across multiple passes for better global optimization.
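The pass-based strategy described above can be sketched as follows. This is an illustrative outline under assumed parameter names (matching the labels in the UI), not the library's actual code:

```javascript
// Sketch of the pass-based merge loop: sort candidates by similarity,
// process a budget of them each pass, repeat until done or capped.
// Parameter names mirror the UI labels but are otherwise hypothetical.
function runMergePasses(candidates, {
  maxMergesPerPass = 500,          // absolute limit per iteration
  maxOptimizationPasses = 100,     // maximum iterations before stopping
  mergesPerPassPercent = 40,       // % of candidates processed per pass
  uncappedCandidateThreshold = 12, // below this count, merge everything
} = {}) {
  let merged = 0;
  for (let pass = 0; pass < maxOptimizationPasses && candidates.length > 0; pass++) {
    // Highest-similarity candidates are merged first.
    candidates.sort((a, b) => b.similarity - a.similarity);

    // Small candidate sets are processed in full; larger sets take only
    // a percentage per pass, capped by the absolute per-pass limit.
    const budget = candidates.length <= uncappedCandidateThreshold
      ? candidates.length
      : Math.min(
          Math.ceil(candidates.length * mergesPerPassPercent / 100),
          maxMergesPerPass
        );

    merged += candidates.splice(0, budget).length;
  }
  return merged;
}
```

Processing only a fraction of candidates per pass lets later passes re-rank the remaining candidates after earlier merges, which is what gives the "better global optimization" across passes.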
Use the provided example texts to test different scenarios:
- `similar.txt`: Text with high semantic similarity between sentences
- `different.txt`: Text with low semantic similarity between sentences
- View chunked results in real-time
- See chunk count, average token length, and processing time
- Download results as JSON
- Get generated code with your current settings
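The downloaded JSON contains one object per chunk. The shape below is an illustrative assumption (field names may differ); check an actual download from the UI for the exact structure:

```json
[
  {
    "document_name": "my-document",
    "chunk_number": 1,
    "text": "First chunk of semantically related sentences...",
    "token_length": 182,
    "embedding": [0.0123, -0.0456, 0.0789]
  }
]
```

The `token_length` and `embedding` fields appear only when the corresponding "Return Token Length" and "Return Embedding" options are enabled.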
The web UI is built with:
- `semantic-chunking` library for text processing
- Express.js for the backend
- Vanilla JavaScript (ES6+) for the frontend
- CSS3 for styling
This project is licensed under the MIT License - see the LICENSE file for details.
If you enjoy this package, please consider sending me a tip to support my work 😀
