5w0rdf15h/paliscript

paliscript — Pali script transliteration for Python (Thai · Sinhala · IAST)

Pali script transliteration for Python. Converts between Thai Pali script, Sinhala script, and IAST (International Alphabet of Sanskrit Transliteration / romanization) — the three scripts used for Pali in Theravada Buddhist texts, including the Pali Canon (Tipitaka).

Useful for Pali scholars, Buddhist text digitization projects (SuttaCentral, bilara, CSCD), NLP researchers working with Pali, and anyone building tools for Theravada texts.

  • Zero dependencies — stdlib only, runs anywhere Python 3.10+ does
  • Single file — drop paliscript.py into your project, or pip install
  • Bidirectional — every conversion round-trips: Thai ↔ IAST ↔ Sinhala
  • 190 tests — including cross-script sutta phrase verification against bilara-data

Try it online

Don't want to install anything? Use the free online transliteration tool at rianthai.pro/pali/transliteration — it is powered by paliscript and lets you convert between Thai Pali, Sinhala, and IAST directly in the browser.

Install

pip install paliscript

Or just copy paliscript.py — it's a single standalone file.

Quick start

Note: Examples below use aspiration=AspirationStyle.DIGRAPH for readability. The default output uses dotted-H (e.g. dḣammā) for unambiguous round-tripping. See Aspiration styles.

from paliscript import to_iast, to_thai, sinhala_to_iast, iast_to_sinhala, AspirationStyle

# Thai Pali → IAST romanization
to_iast("กุสลา ธมฺมา", aspiration=AspirationStyle.DIGRAPH)    # → "kusalā dhammā"
to_iast("พุทฺโธ", aspiration=AspirationStyle.DIGRAPH)          # → "buddho"

# Sinhala → IAST romanization
sinhala_to_iast("කුසලා ධම්මා", aspiration=AspirationStyle.DIGRAPH)  # → "kusalā dhammā"

# IAST → Sinhala
iast_to_sinhala("mettā", aspiration=AspirationStyle.DIGRAPH)   # → "මෙත්තා"

# IAST → Thai Pali
to_thai("nibbāna", aspiration=AspirationStyle.DIGRAPH)         # → "นิพฺพาน"

# Cross-script via IAST pivot: Thai → IAST → Sinhala
iast = to_iast("พุทฺโธ", aspiration=AspirationStyle.DIGRAPH)   # Thai → IAST
iast_to_sinhala(iast, aspiration=AspirationStyle.DIGRAPH)       # IAST → Sinhala: "බුද්ධො"

Dotted-H is the default — unambiguous and safe for round-tripping:

to_iast("ธมฺมา")                                       # "dḣammā" (default: dotted-H)
to_iast("ธมฺมา", aspiration=AspirationStyle.DIGRAPH)    # "dhammā" (traditional digraph)
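The pivot pattern from the quick start (Thai → IAST → Sinhala) and the round-trip property can be wrapped in two small helpers. This is a minimal sketch, not part of paliscript: `make_pivot` and `round_trips` are hypothetical names, and the demo assumes that `to_iast`, `to_thai`, and `iast_to_sinhala` all use the same default aspiration style when called without arguments.

```python
from typing import Callable

Converter = Callable[[str], str]


def make_pivot(to_pivot: Converter, from_pivot: Converter) -> Converter:
    """Compose two converters through an intermediate (pivot) representation."""
    def convert(text: str) -> str:
        return from_pivot(to_pivot(text))
    return convert


def round_trips(text: str, forward: Converter, backward: Converter) -> bool:
    """Return True if converting forward and then backward restores the input."""
    return backward(forward(text)) == text


if __name__ == "__main__":
    try:
        # Function names as used in the quick start above.
        from paliscript import to_iast, to_thai, iast_to_sinhala
    except ImportError:
        print("paliscript is not installed; skipping the demo")
    else:
        # Keep the default (dotted-H) style on both legs so nothing is lost
        # in the IAST pivot.
        thai_to_sinhala = make_pivot(to_iast, iast_to_sinhala)
        print(thai_to_sinhala("พุทฺโธ"))
        print(round_trips("ธมฺมา", to_iast, to_thai))
```

Passing the converters in as arguments keeps the helpers independent of any one script pair, so the same check can be reused for Sinhala ↔ IAST.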

CLI usage

paliscript --to-iast "กุสลา ธมฺมา"
# kusalā dḣammā

paliscript --to-iast --aspiration digraph "กุสลา ธมฺมา"
# kusalā dhammā

paliscript --to-iast --script sinhala "කුසලා ධම්මා"
# kusalā dḣammā

paliscript --from-iast --script sinhala "kusalā dḣammā"
# කුසලා ධම්මා

paliscript --to-thai "kusalā dhammā" --aspiration digraph
# กุสลา ธมฺมา

echo "เมตฺตา" | paliscript --to-iast
# mettā

Also works standalone without installing: python paliscript.py --to-iast "เมตฺตา"
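If the CLI has to be driven from another Python process, a thin `subprocess` wrapper is one option. The sketch below uses only the flags shown above (`--to-iast`, `--from-iast`, `--to-thai`, `--script`, `--aspiration`) and feeds the text on stdin as in the `echo` example. `build_args` and `run_cli` are hypothetical helpers, and combining stdin input with `--script` or `--aspiration` is an assumption rather than documented behaviour.

```python
import shutil
import subprocess
from typing import Optional


def build_args(direction: str,
               script: Optional[str] = None,
               aspiration: Optional[str] = None) -> list[str]:
    """Build an argument list such as ['paliscript', '--to-iast', '--script', 'sinhala']."""
    args = ["paliscript", direction]
    if script is not None:
        args += ["--script", script]
    if aspiration is not None:
        args += ["--aspiration", aspiration]
    return args


def run_cli(text: str, direction: str,
            script: Optional[str] = None,
            aspiration: Optional[str] = None) -> str:
    """Pipe text to the paliscript CLI on stdin and return its stdout."""
    if shutil.which("paliscript") is None:
        raise RuntimeError("the paliscript command was not found on PATH")
    result = subprocess.run(
        build_args(direction, script, aspiration),
        input=text,
        capture_output=True,
        encoding="utf-8",  # Thai and Sinhala text needs an explicit UTF-8 pipe
        check=True,        # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout.strip()
```

Within Python code, importing the library functions directly is simpler; a wrapper like this is mainly useful when the CLI is installed in a different environment.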

Aspiration styles (dotted-H vs digraph)

Aspirated consonants have two IAST representations:

Style | Example | Notes
Dotted-H (default) | kḣ, dḣ, bḣ | Unambiguous: each aspirate is one token
Digraph | kh, dh, bh | Traditional, but ambiguous with standalone h

If you are exchanging text with bilara-data, SuttaCentral exports, or standard Pali dictionaries, use AspirationStyle.DIGRAPH.
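To see why the digraph style cannot be reversed safely, consider a plain string substitution from dotted-H to digraph. This is an illustrative sketch only: `dotted_to_digraph` is a hypothetical helper, not a paliscript function, and the library's own conversion may work differently.

```python
import unicodedata

DOTTED_H_LOWER = "\u1e23"  # ḣ, LATIN SMALL LETTER H WITH DOT ABOVE
DOTTED_H_UPPER = "\u1e22"  # Ḣ, LATIN CAPITAL LETTER H WITH DOT ABOVE


def dotted_to_digraph(text: str) -> str:
    """Rewrite dotted-H aspirates (kḣ, dḣ, bḣ, ...) as plain digraphs (kh, dh, bh, ...)."""
    # NFC first, so that 'h' + combining dot above is treated like the
    # precomposed character.
    text = unicodedata.normalize("NFC", text)
    return text.replace(DOTTED_H_LOWER, "h").replace(DOTTED_H_UPPER, "H")


if __name__ == "__main__":
    aspirate = "k\u1e23a"  # a single aspirated consonant plus a vowel
    cluster = "kha"        # k followed by a standalone h
    print(dotted_to_digraph(aspirate) == dotted_to_digraph(cluster))
```

Both inputs collapse to the same digraph string, so going back from digraph to dotted-H needs a rule for deciding whether `kh` is one aspirate or two letters. That is the ambiguity the table refers to.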

Script coverage

Feature | Thai Pali | Sinhala | IAST
Vowels | 10 | 18 standalone + 18 dependent | Latin + diacritics
Consonants | 33 | 41 (incl. ligatures, ś, ṣ, f) | Latin + diacritics
Virama | Phinthu ฺ (U+0E3A) | Hal kirīma ් (U+0DCA) |

Thai and Sinhala text is expected in the standard Unicode blocks: Thai (U+0E00–U+0E7F) and Sinhala (U+0D80–U+0DFF). Input is NFC-normalized before processing.
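The effect of NFC normalization and of the two Unicode blocks can be illustrated with the standard library alone. The sketch below is not taken from paliscript: `detect_script` is a hypothetical helper that classifies a string by the first Thai or Sinhala code point it finds and returns "other" for everything else, including IAST.

```python
import unicodedata

THAI_BLOCK = range(0x0E00, 0x0E80)     # U+0E00 to U+0E7F
SINHALA_BLOCK = range(0x0D80, 0x0E00)  # U+0D80 to U+0DFF


def detect_script(text: str) -> str:
    """Return 'thai', 'sinhala', or 'other' based on the first matching code point."""
    for ch in unicodedata.normalize("NFC", text):
        cp = ord(ch)
        if cp in THAI_BLOCK:
            return "thai"
        if cp in SINHALA_BLOCK:
            return "sinhala"
    return "other"


if __name__ == "__main__":
    # 'a' followed by a combining macron (two code points) composes to the
    # single precomposed character ā under NFC.
    decomposed = "a\u0304"
    composed = unicodedata.normalize("NFC", decomposed)
    print(len(decomposed), len(composed))

    for sample in ("ธมฺมา", "ධම්මා", "dhammā"):
        print(sample, detect_script(sample))
```

Normalizing first means visually identical IAST strings compare equal regardless of whether the diacritics were typed as precomposed characters or as combining marks.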

Scope and limitations

This library handles Pali-language texts only, specifically texts written in Thai Pali script, Sinhala Pali script, or IAST. It is not a general-purpose Thai or Sinhala transliterator: modern Thai and Sinhala characters that do not appear in the Pali alphabet pass through unchanged.

Origin

Transliteration tables and algorithm by Bhante Buddhañāṇo Thera, originally implemented as LibreOffice StarBasic macros for Pali text processing in monastic and academic contexts. Rewritten in Python with his permission. IAST conventions are verified against bilara-data (SuttaCentral, Mahasangiti edition).

License

MIT
