about https://dharmalekha.info/search

as a reminder: `https://github.com/erc-dharma/project-documentation/issues/7`
regarding fuzzy settings for Old Javanese, the following gives a useful list: https://chromewebstore.google.com/detail/sealang-plus/elnlcjojbmjmhheahimkikghiajeloha though it implies a slightly different transliteration system, so the list would need to be adapted

@michaelnmmeyer wrote:
> The new search system is at: https://dharmalekha.info/search. I will eventually move it to https://dharmalekha.info/texts when done with the interface.

might I suggest the reverse, and possibly the use of "find" rather than "search", i.e. make both texts as such and strings in texts findable via https://dharmalekha.info/find or https://dharmalekha.info/search? It might be desirable to separate, right from the start, searching metadata about texts (i.e., searching in teiHeader) from searching within texts, by presenting two different search boxes.

@michaelnmmeyer wrote:
> I invite you to tell me what you want the matching behavior to be like. My plan is to define one or more matching modes. For instance, there could be a "default" mode which is case-insensitive and ignores hyphens, a "Tamil" mode that does the same and also treats 'k' and 'g' as equivalent, a "Sanskrit" mode that ignores spaces, etc.

MATCHING BEHAVIOR
- I approve of the idea of modes, though I would like us to try not to call them by the names of specific languages (as one of the fundamental ambitions of DHARMA was and is to make terminological affinities findable across language boundaries)
- we could have two basic modes, 'precise' and 'loose'
- there could be a generic (default) version of 'loose', with features such as
-- 1. being case-insensitive, 
-- 2. ignoring hyphens, 
-- 3. ignoring any milestone elements and any `<g>` elements
-- 4. ignoring any transliterated virāmas ·
-- 5. ignoring differences ē/e and ō/o
- and then there could be custom settings for 'loose'
-- 1. ignoring difference between voiced/unvoiced, 
-- 2. ignoring difference between aspirated/unaspirated, 
-- 3. ignoring difference between dental/retroflex plosives,
-- 4. ignoring difference between sibilants
-- 5. ignoring difference between vowel characters with or without macron (i.e., between vowels transliterated as long or short)
-- 6. ignoring difference between characters with and without diacritics (like searching in google docs)
-- 7. ignoring difference between sequences CC and CəC
- I would like us to offer a selection of regular expressions
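
To make the 'loose' features listed above concrete, here is a minimal Python sketch of how they could be expressed as a folding function applied to both the query and the text. It is only an illustration under assumptions, not a description of how the search system works: the names `fold` and `matches`, the option names, and the character tables are all hypothetical, and the input is assumed to be plain text from which milestone and `<g>` elements have already been removed.

```python
import re
import unicodedata

# Hypothetical equivalence tables; the real classes would have to be agreed on.
VOICED_TO_UNVOICED = str.maketrans("gj\u1e0ddb", "kc\u1e6dtp")  # g j ḍ d b -> k c ṭ t p
RETROFLEX_TO_DENTAL = str.maketrans("\u1e6d\u1e0d", "td")       # ṭ ḍ -> t d
SIBILANTS_TO_S = str.maketrans("\u015b\u1e63", "ss")            # ś ṣ -> s
PLOSIVES = "kgcj\u1e6d\u1e0dtdpb"
VOWELS = "aeiou\u0101\u012b\u016b\u0113\u014d\u0259"            # a e i o u ā ī ū ē ō ə

ASPIRATE = re.compile(f"([{PLOSIVES}])h")
# A "consonant" here is any letter that is not in VOWELS (a simplification).
CONSONANT = f"[^\\W\\d_{VOWELS}]"
SCHWA_BETWEEN_CONSONANTS = re.compile(f"(?<={CONSONANT})\u0259(?={CONSONANT})")


def _drop_combining(s, only=None):
    """Remove combining marks: all of them, or just those listed in `only`."""
    decomposed = unicodedata.normalize("NFD", s)
    kept = [
        ch for ch in decomposed
        if not (unicodedata.combining(ch) and (only is None or ch in only))
    ]
    return unicodedata.normalize("NFC", "".join(kept))


def fold(text, *, voicing=False, aspiration=False, retroflex=False,
         sibilants=False, vowel_length=False, diacritics=False, schwa=False):
    """Reduce a string to a comparison key for 'loose' matching."""
    # Generic 'loose': case-insensitive, no hyphens, no virāma dots, ē/e and ō/o merged.
    s = unicodedata.normalize("NFC", text.casefold())
    s = s.replace("-", "").replace("\u00b7", "")
    s = s.replace("\u0113", "e").replace("\u014d", "o")
    # Custom 'loose' settings, each one optional.
    if aspiration:
        s = ASPIRATE.sub(r"\1", s)
    if voicing:
        s = s.translate(VOICED_TO_UNVOICED)
    if retroflex:
        s = s.translate(RETROFLEX_TO_DENTAL)
    if sibilants:
        s = s.translate(SIBILANTS_TO_S)
    if vowel_length:
        s = _drop_combining(s, only="\u0304")  # combining macron only
    if diacritics:
        s = _drop_combining(s)
    if schwa:
        s = SCHWA_BETWEEN_CONSONANTS.sub("", s)
    return s


def matches(query, text, **options):
    """True if the folded query occurs in the folded text."""
    return fold(query, **options) in fold(text, **options)
```

With this sketch, `matches("camundyai", "[nama]ś cāmuṇḍyai", diacritics=True)` would succeed, while the same query in 'precise' mode (plain substring comparison, no folding) would not. Folding both sides with the same options keeps the comparison symmetric, so a user can type either the marked or the unmarked form.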

PRESENTATION OF SEARCH RESULTS
I would like this to be more space-efficient, and suggest that we display only the title and the complete file name. E.g., compared to this

<img width="1101" height="387" alt="Image" src="https://github.com/user-attachments/assets/ecb49972-471c-41e5-9fa3-193fc37fd92f" />

displaying only 
```
Camundi
tfc-nusantara-epigraphy/DHARMA_INSIDENKCamundi
⟨01⟩ (0) [nama]ś cāmuṇḍyai
```
might be sufficient



about https://dharmalekha.info/search #408
