
🍱 Semantic Chunking Web UI

A web-based interface for experimenting with and tuning Semantic Chunking settings. This tool provides a visual way to test and configure the semantic-chunking library's settings to get optimal results for your specific use case. Once you've found the best settings, you can generate code to implement them in your project.

Features

  • Real-time text chunking with live preview
  • Interactive controls for all chunking parameters
  • Visual feedback for similarity thresholds
  • Model selection and configuration
  • Results download in JSON format
  • Code generation for your settings
  • Example texts for testing
  • Dark mode interface
  • Syntax highlighting of JSON results and code samples
  • Line wrapping toggle for JSON results

[Screenshot: semantic-chunking web UI]

Getting Started

Prerequisites

  • Node.js (v18 or higher recommended)
  • npm (comes with Node.js)

Installation

  1. Clone the repository:
     git clone https://github.com/jparkerweb/semantic-chunking.git
  2. Navigate to the webui directory:
     cd semantic-chunking/webui
  3. Install dependencies:
     npm install
  4. Start the server:
     npm start
  5. Open your browser and visit:
     http://localhost:3000

Usage

Basic Controls

  • Document Name: Name for your input text
  • Text to Chunk: Your input text to be processed
  • Max Token Size: Maximum size for each chunk (50-2500 tokens)
  • Similarity Threshold: Base threshold for semantic similarity (0.1-1.0)
  • Similarity Sentences Lookahead: Number of sentences to look ahead when calculating similarity (1-10)

Advanced Settings

  • Dynamic Threshold Bounds: Lower and upper bounds for dynamic similarity threshold adjustment
  • Combine Chunks: Enable/disable chunk combination phase
  • Combine Chunks Similarity Threshold: Threshold for combining similar chunks
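Conceptually, the dynamic threshold bounds keep any runtime adjustment of the similarity threshold inside a fixed window. An illustrative clamp (not the library's actual code):

```javascript
// Illustrative only: constrain a dynamically adjusted similarity
// threshold to the configured lower/upper bounds.
function clampThreshold(value, lowerBound, upperBound) {
  return Math.min(upperBound, Math.max(lowerBound, value));
}
```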

Model Settings

  • Embedding Model: Choose from the supported embedding models
  • DType: Data type for the model, trading precision against performance (fp32, fp16, or q8)
  • Device: Processing device (cpu or webgpu)
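These settings could be expressed as a configuration fragment like the following (the key names and the model id are illustrative assumptions, not the library's confirmed API):

```javascript
// Hypothetical model-settings fragment; exact option keys may differ.
const modelOptions = {
  embeddingModel: 'Xenova/all-MiniLM-L6-v2', // assumed example model id
  dtype: 'q8',    // one of: fp32, fp16, q8 — precision vs. performance
  device: 'cpu'   // cpu or webgpu
};
```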

Output Settings

  • Return Token Length: Include token count in results
  • Return Embedding: Include embeddings in results
  • Chunk Prefix: Add prefix to chunks (useful for RAG applications)
  • Exclude Chunk Prefix in Results: Remove prefix from final results
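To illustrate the prefix options: some embedding models used in RAG pipelines expect a task prefix at embedding time, and the exclude option removes that prefix from the returned chunks. A sketch of the idea (not the library's implementation; the "search_document" prefix is just an example):

```javascript
// Prepend a task prefix to each chunk before embedding.
function applyPrefix(chunks, prefix) {
  return chunks.map(text => `${prefix}: ${text}`);
}

// Strip the prefix again so the final results contain only the raw text,
// as with "Exclude Chunk Prefix in Results".
function stripPrefix(chunks, prefix) {
  return chunks.map(text =>
    text.startsWith(`${prefix}: `) ? text.slice(prefix.length + 2) : text);
}
```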

Advanced Merge Settings

The Advanced Merge Settings section (collapsed by default) provides fine-grained control over the chunk optimization algorithm:

  • Max Merges Per Pass: Absolute limit on merges per iteration (default: 500)
  • Max Optimization Passes: Maximum iterations before stopping (default: 100)
  • Merges Per Pass (%): Percentage of candidates to process per pass (default: 40)
  • Uncapped Candidate Threshold: Below this count, all candidates merge (default: 12)

These settings are for advanced users tuning chunk quality. The algorithm sorts merge candidates by similarity and processes them in priority order across multiple passes for better global optimization.
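One way the four settings above could interact, sketched as a helper that decides how many of the similarity-sorted candidates to merge in a single pass (an illustration of the documented behavior, not the library's actual code):

```javascript
// Illustrative: how many merge candidates to process in one pass,
// given the advanced merge settings and their documented defaults.
function mergesThisPass(candidateCount, {
  maxMergesPerPass = 500,
  mergesPerPassPct = 40,
  uncappedCandidateThreshold = 12
} = {}) {
  // Below the uncapped threshold, all candidates merge in this pass.
  if (candidateCount < uncappedCandidateThreshold) return candidateCount;
  // Otherwise take a percentage of candidates, capped by the absolute limit.
  const byPercent = Math.ceil(candidateCount * mergesPerPassPct / 100);
  return Math.min(byPercent, maxMergesPerPass);
}
```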

Example Texts

Use the provided example texts to test different scenarios:

  • similar.txt: Text with high semantic similarity between sentences
  • different.txt: Text with low semantic similarity between sentences

Results

  • View chunked results in real-time
  • See chunk count, average token length, and processing time
  • Download results as JSON
  • Get generated code with your current settings

Development

The web UI is built with:

  • semantic-chunking library for text processing
  • Express.js for the backend
  • Vanilla JavaScript (ES6+) for the frontend
  • CSS3 for styling

License

This project is licensed under the MIT License - see the LICENSE file for details.

Appreciation

If you enjoy this package, please consider sending me a tip to support my work 😀