Commit 5baaca6

Remove UMAP tooling from core

Drop visualize_umap.py and related deps/docs so core stays focused on scan/search.

Author: william ghysels
1 parent 088f01d, commit 5baaca6

6 files changed: 11 additions and 375 deletions.
.github/workflows/ci.yml

Lines changed: 1 addition & 1 deletion
@@ -14,4 +14,4 @@ jobs:
           python-version: "3.11"
       - name: Compile Python (syntax check)
         run: |
-          python -m py_compile image_database.py visualize_umap.py
+          python -m py_compile image_database.py
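
The same check can be reproduced locally before pushing. The following is a minimal sketch using the standard-library `py_compile` module; the helper name `syntax_check` is hypothetical and not part of this repository.

```python
import py_compile
import tempfile
from pathlib import Path


def syntax_check(paths):
    """Return (path, message) pairs for every file that fails to compile."""
    failures = []
    for path in paths:
        # Compile into a throwaway directory so no __pycache__ entry is left behind.
        with tempfile.TemporaryDirectory() as tmp:
            try:
                # doraise=True raises PyCompileError instead of printing to stderr.
                py_compile.compile(
                    str(path), cfile=str(Path(tmp) / "out.pyc"), doraise=True
                )
            except py_compile.PyCompileError as exc:
                failures.append((str(path), exc.msg))
    return failures
```

Calling `syntax_check(["image_database.py"])` from the directory that contains the script should return an empty list when the file compiles. Like the CI step, this only catches syntax errors; it does not import or run the module.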

CONTRIBUTING.md

Lines changed: 1 addition & 1 deletion
@@ -10,4 +10,4 @@ Please open a pull request for any change.
 
 ### Checks
 
-CI runs a lightweight syntax check (`python -m py_compile`) on `image_database.py` and `visualize_umap.py`.
+CI runs a lightweight syntax check (`python -m py_compile`) on `image_database.py`.

README.md

Lines changed: 8 additions & 29 deletions
@@ -11,7 +11,6 @@ A searchable image database using SigLIP 2 (CLIP) embeddings and SQLite-vec for
 - **Image Search**: Find similar images using a reference image
 - **Combined Search**: Combine text and image queries with weighted blending
 - **Interactive Mode**: Load model once and run multiple queries
-- **3D Visualization**: UMAP-based 3D visualization of image embeddings with clustering
 - **HTML Gallery**: Beautiful search results with image previews and direct file access
 
 ![Search Results Example](browser.png)
@@ -34,7 +33,7 @@ cd CLIP-database
 
 2. Install dependencies:
 ```bash
-cd github
+cd core
 pip install -r requirements.txt
 ```

@@ -53,7 +52,7 @@ pip install sqlite-vec
 Scan a directory and build the image database:
 
 ```bash
-cd github
+cd core
 python image_database.py scan /path/to/images --db "/path/to/database.db"
 ```

@@ -64,15 +63,15 @@ Options:
 - `--limit`: Limit number of images to process (for testing)
 
 ```bash
-cd github
+cd core
 python image_database.py scan /path/to/images --batch-size 75 --inference-batch-size 16 --profile --limit 100
 ```
 
 ### Searching Images
 
 #### Text Search
 ```bash
-cd github
+cd core
 python image_database.py search "a red car" -k 20 --db "/path/to/database.db"
 ```

@@ -83,25 +82,25 @@ python image_database.py search "a red car" --db "/path/to/database.db" -k 20
 
 #### Image Search
 ```bash
-cd github
+cd core
 python image_database.py search /path/to/image.jpg --image -k 20
 ```
 
 #### Combined Search
 ```bash
-cd github
+cd core
 python image_database.py search "sunset" --query2 /path/to/image.jpg --weights 0.7 0.3 -k 20
 ```
 
 #### Negative Prompts
 ```bash
-cd github
+cd core
 python image_database.py search "nature" --negative "buildings" -k 20
 ```
 
 #### Interactive Mode
 ```bash
-cd github
+cd core
 python image_database.py search --interactive
 ```

@@ -114,25 +113,6 @@ In interactive mode:
 - Type `quit` or `exit` to end session
 
 
-### 3D Visualization
-
-Generate a UMAP 3D visualization of all image embeddings:
-
-```bash
-cd github
-python visualize_umap.py
-```
-
-This will:
-1. Load embeddings from the database
-2. Compute UMAP projections (cached for future runs)
-3. Cluster embeddings for color coding
-4. Generate an interactive HTML visualization
-
-Open the generated HTML file in your browser and click on points to see image previews.
-
-**Note:** The HTML results include "Open Image" and "Open Folder" links that use the `localexplorer:` protocol. To use these links, install a browser extension like [Local Explorer](https://chrome.google.com/webstore/detail/local-explorer/llbiblehpbpeflfgjcdfcpcakjhddedi) for Chrome/Edge. Without the extension, images will still display, but the file/folder links won't work.
-
 ## Model
 
 This project uses [SigLIP 2 SO400M](https://huggingface.co/google/siglip2-so400m-patch14-224) from Google, which provides:
@@ -154,7 +134,6 @@ The SQLite database contains:
 - Use `--inference-batch-size` to optimize GPU memory usage
 - Enable `--profile` to identify bottlenecks
 - The database uses WAL mode for better concurrent access
-- UMAP projections are cached to avoid recomputation
 
 ## License

config.json.example

Lines changed: 1 addition & 4 deletions
@@ -2,8 +2,5 @@
 "database_dir": "C:\\MyExampleProject",
 "model_cache_dir": "C:\\MyExampleProject\\models",
 "results_dir": "results",
-"thumbnails_dir": "thumbnails",
-"umap_output_file": "umap_3d_visualization.html",
-"umap_cache_file": "umap_projections_cache.pkl",
-"umap_metadata_file": "umap_image_metadata.json"
+"thumbnails_dir": "thumbnails"
 }
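
The `thumbnails_dir` line changes only because it becomes the last entry and JSON does not allow a trailing comma. A local `config.json` copied from the old example may still carry the three removed keys. The sketch below lists them, assuming such a copy exists; the helper name `stale_umap_keys` is hypothetical, and whether `image_database.py` tolerates unknown keys is not shown in this commit.

```python
import json

# Keys this commit removes from config.json.example.
REMOVED_KEYS = ("umap_output_file", "umap_cache_file", "umap_metadata_file")


def stale_umap_keys(config_text):
    """Return the removed UMAP keys that still appear in a JSON config string."""
    config = json.loads(config_text)
    return [key for key in REMOVED_KEYS if key in config]
```

A possible call is `stale_umap_keys(open("config.json").read())`; an empty list means the local config already matches the trimmed example.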

requirements.txt

Lines changed: 0 additions & 3 deletions
@@ -5,7 +5,4 @@ tqdm>=4.66.0
 numpy>=1.24.0
 sqlite-vec>=0.0.1
 sentencepiece>=0.1.99
-umap-learn>=0.5.5
-plotly>=5.18.0
-pandas>=2.0.0
 PyMuPDF>=1.23.0
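
Removing entries from `requirements.txt` does not uninstall them from an existing environment. The sketch below reports which of the three dropped distributions are still installed, using the standard-library `importlib.metadata`; the helper name `still_installed` is hypothetical. Another package may still depend on one of them, so the result is informational rather than a list of things to uninstall.

```python
from importlib import metadata

# Distributions this commit drops from requirements.txt.
REMOVED = ("umap-learn", "plotly", "pandas")


def still_installed(names=REMOVED):
    """Map each named distribution that is still installed to its version."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Not installed in this environment; nothing to report.
            continue
    return found
```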
