
🍱 Semantic Chunking Web UI

A web-based interface for experimenting with and tuning Semantic Chunking settings. This tool provides a visual way to test and configure the semantic-chunking library's settings to get optimal results for your specific use case. Once you've found the best settings, you can generate code to implement them in your project.

Features

  • Real-time text chunking with live preview
  • Interactive controls for all chunking parameters
  • Visual feedback for similarity thresholds
  • Model selection and configuration
  • Results download in JSON format
  • Code generation for your settings
  • Example texts for testing
  • Dark mode interface
  • Syntax highlighting of JSON results and code samples
  • Line wrapping toggle for JSON results

[Screenshot: semantic-chunking web UI]

Getting Started

Prerequisites

  • Node.js (v18 or higher recommended)
  • npm (comes with Node.js)

Installation

  1. Clone the repository:
     git clone https://github.com/jparkerweb/semantic-chunking.git
  2. Navigate to the webui directory:
     cd semantic-chunking/webui
  3. Install dependencies:
     npm install
  4. Start the server:
     npm start
  5. Open your browser and visit:
     http://localhost:3000

Usage

Basic Controls

  • Document Name: Name for your input text
  • Text to Chunk: Your input text to be processed
  • Max Token Size: Maximum size for each chunk (50-2500 tokens)
  • Similarity Threshold: Base threshold for semantic similarity (0.1-1.0)
  • Similarity Sentences Lookahead: Number of sentences to look ahead when calculating similarity (1-10)

Advanced Settings

  • Dynamic Threshold Bounds: Lower and upper bounds for dynamic similarity threshold adjustment
  • Combine Chunks: Enable/disable chunk combination phase
  • Combine Chunks Similarity Threshold: Threshold for combining similar chunks
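Conceptually, the dynamic threshold bounds keep any runtime adjustment of the similarity threshold inside a fixed window. An illustrative clamp (not the library's actual code):

```javascript
// Illustrative only: constrain a dynamically adjusted similarity
// threshold to the configured lower/upper bounds.
function clampThreshold(value, lowerBound, upperBound) {
  return Math.min(upperBound, Math.max(lowerBound, value));
}
```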

Model Settings

  • Embedding Model: Choose from the supported embedding models
  • DType: Data type for the model, trading precision against performance (fp32, fp16, or q8)
  • Device: Processing device (cpu or webgpu)
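These settings could be expressed as a configuration fragment like the following (the key names and the model id are illustrative assumptions, not the library's confirmed API):

```javascript
// Hypothetical model-settings fragment; exact option keys may differ.
const modelOptions = {
  embeddingModel: 'Xenova/all-MiniLM-L6-v2', // assumed example model id
  dtype: 'q8',    // one of: fp32, fp16, q8 — precision vs. performance
  device: 'cpu'   // cpu or webgpu
};
```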

Output Settings

  • Return Token Length: Include token count in results
  • Return Embedding: Include embeddings in results
  • Chunk Prefix: Add prefix to chunks (useful for RAG applications)
  • Exclude Chunk Prefix in Results: Remove prefix from final results
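To illustrate the prefix options: some embedding models used in RAG pipelines expect a task prefix at embedding time, and the exclude option removes that prefix from the returned chunks. A sketch of the idea (not the library's implementation; the "search_document" prefix is just an example):

```javascript
// Prepend a task prefix to each chunk before embedding.
function applyPrefix(chunks, prefix) {
  return chunks.map(text => `${prefix}: ${text}`);
}

// Strip the prefix again so the final results contain only the raw text,
// as with "Exclude Chunk Prefix in Results".
function stripPrefix(chunks, prefix) {
  return chunks.map(text =>
    text.startsWith(`${prefix}: `) ? text.slice(prefix.length + 2) : text);
}
```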

Advanced Merge Settings

The Advanced Merge Settings section (collapsed by default) provides fine-grained control over the chunk optimization algorithm:

  • Max Merges Per Pass: Absolute limit on merges per iteration (default: 500)
  • Max Optimization Passes: Maximum iterations before stopping (default: 100)
  • Merges Per Pass (%): Percentage of candidates to process per pass (default: 40)
  • Uncapped Candidate Threshold: Below this count, all candidates merge (default: 12)

These settings are for advanced users tuning chunk quality. The algorithm sorts merge candidates by similarity and processes them in priority order across multiple passes for better global optimization.
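One way the four settings above could interact, sketched as a helper that decides how many of the similarity-sorted candidates to merge in a single pass (an illustration of the documented behavior, not the library's actual code):

```javascript
// Illustrative: how many merge candidates to process in one pass,
// given the advanced merge settings and their documented defaults.
function mergesThisPass(candidateCount, {
  maxMergesPerPass = 500,
  mergesPerPassPct = 40,
  uncappedCandidateThreshold = 12
} = {}) {
  // Below the uncapped threshold, all candidates merge in this pass.
  if (candidateCount < uncappedCandidateThreshold) return candidateCount;
  // Otherwise take a percentage of candidates, capped by the absolute limit.
  const byPercent = Math.ceil(candidateCount * mergesPerPassPct / 100);
  return Math.min(byPercent, maxMergesPerPass);
}
```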

Example Texts

Use the provided example texts to test different scenarios:

  • similar.txt: Text with high semantic similarity between sentences
  • different.txt: Text with low semantic similarity between sentences

Results

  • View chunked results in real-time
  • See chunk count, average token length, and processing time
  • Download results as JSON
  • Get generated code with your current settings

Development

The web UI is built with:

  • semantic-chunking library for text processing
  • Express.js for the backend
  • Vanilla JavaScript (ES6+) for the frontend
  • CSS3 for styling

License

This project is licensed under the MIT License - see the LICENSE file for details.

Appreciation

If you enjoy this package, please consider sending me a tip to support my work 😀