A tool that reads a DOCX book file, detects Tibetan chapter title paragraphs by their font styling, and replaces each matched title with a formatted header table containing a Nalanda University logo, the styled chapter title, and a QR code linking to the chapter's URL.

Install the package with pip:

    pip install .

For development:

    pip install -e ".[dev]"

PDF conversion requires one of the following (optional; DOCX output is always produced):

- LibreOffice (recommended): best fidelity for Tibetan fonts and table formatting

      sudo apt install libreoffice  # Debian/Ubuntu

- Pandoc + XeLaTeX (fallback): handles Unicode/Tibetan well

      sudo apt install pandoc texlive-xetex  # Debian/Ubuntu

If neither is installed, the tool will produce the DOCX output and skip PDF generation.
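
The fallback order described above can be sketched as a small lookup on `PATH`. This is an illustrative sketch only, not the tool's actual code: the function name `pick_pdf_converter` is hypothetical, and the executable names checked (`libreoffice`, `soffice`, `pandoc`, `xelatex`) are assumptions about how these programs are typically installed.

```python
import shutil
from typing import Callable, Optional


def pick_pdf_converter(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Return "libreoffice", "pandoc", or None depending on what is on PATH.

    `which` is injectable so the choice can be exercised without
    installing anything; it defaults to shutil.which.
    """
    # LibreOffice is preferred. Its launcher is commonly called
    # "libreoffice" or "soffice" depending on the platform (assumption).
    if which("libreoffice") or which("soffice"):
        return "libreoffice"
    # The Pandoc route needs both pandoc and the XeLaTeX engine.
    if which("pandoc") and which("xelatex"):
        return "pandoc"
    # Neither toolchain was found: keep the DOCX and skip the PDF.
    return None
```

Calling `pick_pdf_converter()` with no arguments inspects the real `PATH`; a return value of `None` corresponds to the DOCX-only case.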

Run the tool from the command line:

    nalanda-docx-format book.docx chapters.yaml nalanda_logo.png -o ./output/

| Argument | Description |
|---|---|
| `book.docx` | Source DOCX book file |
| `chapters.yaml` | YAML chapter lookup dictionary |
| `nalanda_logo.png` | Nalanda University logo image (PNG) |
| `-o` / `--output` | Output directory (default: `./output/`) |
| `-v` / `--verbose` | Enable verbose (DEBUG) logging |

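
For readers who want to script around the tool, the argument table maps naturally onto `argparse`. The sketch below is one way such an interface could be declared; the destination names `book`, `chapters`, and `logo` are hypothetical, and the real parser may be organized differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare three positional inputs plus the two documented options."""
    parser = argparse.ArgumentParser(prog="nalanda-docx-format")
    # Positional arguments, in the order shown in the usage line.
    parser.add_argument("book", help="Source DOCX book file")
    parser.add_argument("chapters", help="YAML chapter lookup dictionary")
    parser.add_argument("logo", help="Nalanda University logo image (PNG)")
    # Optional arguments with the documented default.
    parser.add_argument("-o", "--output", default="./output/",
                        help="Output directory")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="Enable verbose (DEBUG) logging")
    return parser


# Parse a sample command line instead of sys.argv.
args = build_parser().parse_args(
    ["book.docx", "chapters.yaml", "nalanda_logo.png", "-v"]
)
```

Here `-o` is omitted, so `args.output` falls back to `./output/`, while `args.verbose` is set by the `-v` flag.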

The chapter lookup is a YAML file keyed by chapter ID:

    chapter_1:
      chapter_title: "༄༅། །ཆོས་ཀྱི་དབྱིངས་སུ་བསྟོད་པ།"
      chapter_url: "https://example.com/chapter1"
    chapter_2:
      chapter_title: "༄༅། །དཔེ་མེད་པར་བསྟོད་པ།"
      chapter_url: "https://example.com/chapter2"

Each entry must have:

- `chapter_title`: the exact Tibetan title text as it appears in the DOCX (font: Monlam Uni Ouchen5, 18pt)
- `chapter_url`: URL to encode in the QR code

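
Because a missing key would only surface later as an unmatched title or an empty QR code, it can help to check the lookup up front. The following is a minimal sketch, assuming the YAML has already been parsed into a plain dict (for example with PyYAML's `yaml.safe_load`); the function name `validate_chapters` and the shape of its return value are hypothetical, not part of the tool.

```python
import unicodedata

REQUIRED_KEYS = ("chapter_title", "chapter_url")


def validate_chapters(chapters: dict) -> dict:
    """Check every entry and return {NFC-normalized title: (chapter_id, url)}."""
    by_title = {}
    for chapter_id, entry in chapters.items():
        if not isinstance(entry, dict):
            raise ValueError(f"{chapter_id}: expected a mapping")
        # Treat absent and empty values the same way.
        missing = [key for key in REQUIRED_KEYS if not entry.get(key)]
        if missing:
            raise ValueError(f"{chapter_id}: missing {', '.join(missing)}")
        # Normalize so that later comparisons against DOCX text are stable.
        title = unicodedata.normalize("NFC", entry["chapter_title"])
        by_title[title] = (chapter_id, entry["chapter_url"])
    return by_title


# The dict below mirrors the structure of the YAML example.
sample = {
    "chapter_1": {
        "chapter_title": "༄༅། །ཆོས་ཀྱི་དབྱིངས་སུ་བསྟོད་པ།",
        "chapter_url": "https://example.com/chapter1",
    },
    "chapter_2": {
        "chapter_title": "༄༅། །དཔེ་མེད་པར་བསྟོད་པ།",
        "chapter_url": "https://example.com/chapter2",
    },
}
index = validate_chapters(sample)
```

Indexing by normalized title rather than by chapter ID makes the later lookup from a detected DOCX title a single dictionary access.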
- Opens the DOCX as a ZIP archive and parses the XML directly using lxml
- Scans all paragraphs for runs styled with Monlam Uni Ouchen5 at 18pt (the title font)
- Matches detected titles against the YAML chapter lookup (NFC-normalized for Tibetan text)
- For each match, generates a QR code and builds a header table with three columns:
  - Logo (10% width)
  - Styled title text (80% width)
  - QR code (10% width)
- Replaces the original title paragraph with the header table
- Saves the modified DOCX and optionally converts to PDF
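
The first three steps above (open the archive, find title-styled runs, match against the lookup) can be sketched with the standard library alone. This is an illustration under stated assumptions, not the tool's implementation: it swaps in `xml.etree.ElementTree` where the tool uses lxml, it only looks at direct run formatting (fonts inherited from styles are ignored), and the helpers `is_title_run`, `find_titles`, and `make_demo_docx` are hypothetical names. The demo archive holds only `word/document.xml`, so it is not a complete DOCX that Word would open.

```python
import io
import unicodedata
import zipfile
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}
TITLE_FONT = "Monlam Uni Ouchen5"
TITLE_HALF_POINTS = "36"  # w:sz is stored in half-points, so 18pt == 36


def is_title_run(run: ET.Element) -> bool:
    """True if the run's direct formatting uses the title font at 18pt."""
    rpr = run.find("w:rPr", NS)
    if rpr is None:
        return False
    fonts = rpr.find("w:rFonts", NS)
    font_ok = fonts is not None and TITLE_FONT in fonts.attrib.values()
    # Accept either the regular or the complex-script size element.
    sizes = [el.get(f"{{{W}}}val")
             for tag in ("w:sz", "w:szCs") for el in rpr.findall(tag, NS)]
    return font_ok and TITLE_HALF_POINTS in sizes


def find_titles(docx_bytes: bytes, lookup: dict) -> list:
    """Return (paragraph_index, chapter_id) for paragraphs matching the lookup."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    wanted = {unicodedata.normalize("NFC", title).strip(): chapter_id
              for chapter_id, title in lookup.items()}
    hits = []
    for index, para in enumerate(root.iter(f"{{{W}}}p")):
        # Concatenate the text of title-styled runs only.
        text = "".join(t.text or ""
                       for run in para.findall("w:r", NS) if is_title_run(run)
                       for t in run.findall("w:t", NS))
        key = unicodedata.normalize("NFC", text).strip()
        if key in wanted:
            hits.append((index, wanted[key]))
    return hits


def make_demo_docx(paragraphs) -> bytes:
    """Build a tiny in-memory archive; each item is (text, font, half_points)."""
    body = "".join(
        f'<w:p><w:r><w:rPr><w:rFonts w:ascii="{font}" w:cs="{font}"/>'
        f'<w:sz w:val="{size}"/><w:szCs w:val="{size}"/></w:rPr>'
        f"<w:t>{text}</w:t></w:r></w:p>"
        for text, font, size in paragraphs
    )
    xml = (f'<?xml version="1.0" encoding="UTF-8"?>'
           f'<w:document xmlns:w="{W}"><w:body>{body}</w:body></w:document>')
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("word/document.xml", xml)
    return buf.getvalue()


title = "༄༅། །ཆོས་ཀྱི་དབྱིངས་སུ་བསྟོད་པ།"
demo = make_demo_docx([
    ("Body text", "Times New Roman", 24),
    (title, TITLE_FONT, 36),
    (title, TITLE_FONT, 24),  # right font, wrong size: must not match
])
hits = find_titles(demo, {"chapter_1": title})
```

In the demo only the second paragraph (index 1) carries both the title font and the 18pt size, so it is the only one reported. Normalizing both sides to NFC guards against the same Tibetan syllable being encoded with differently ordered combining marks in the DOCX and in the YAML.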
The tool produces:

- `output/<bookname>.docx`: modified DOCX with header tables
- `output/<bookname>.pdf`: PDF version (if LibreOffice or Pandoc is available)
- `output/qr_<chapter_id>.png`: generated QR code images

Run the tests with:

    PYTHONPATH=src pytest