A comprehensive toolkit for splitting, editing, and managing book manuscripts in DOCX format.
- Split books by heading styles (Heading 1, Heading 2, etc.)
- Merge chapters back into a complete book
- Extract images into chapter-specific directories
- Preserve all formatting, styles, and embedded images
- Lossless round trip (split→merge produces identical output)
- Manifest-based editing - Reorder, exclude, or merge chapters via YAML
- Comment extraction - Extract Google Doc comments and match to chapters
- Chapter renumbering - Automatically renumber chapters sequentially
- Style simplification - Consolidate styles to canonical set
- Empty heading removal - Clean formatting artifacts
- Multiple versions - Create short, full, and alternate versions
- Version control friendly - Text-based manifest for git tracking
- Batch operations - Process all chapters at once
memoir-split/
├── scripts/ # All Python tools and utilities
│ ├── split_book.py # Split book into chapters
│ ├── merge_book.py # Merge chapters back
│ ├── build_book.py # Build from manifest
│ ├── extract_comments.py
│ ├── match_comments_improved.py
│ └── ...
├── docs/ # Documentation
│ ├── QUICK_START.md
│ ├── MANIFEST_GUIDE.md
│ ├── COMMENTS_GUIDE.md
│ └── ...
├── output/ # Split chapter files (DOCX)
├── data/ # Comments, credentials, matches
│ ├── credentials.json # Google API credentials (private)
│ ├── comments.json # Extracted comments
│ └── comment_matches_improved.json
├── archive/ # Old/test files
├── book_structure.yaml # Manifest file
├── requirements.txt
└── README.md
New! Automated workflow for updating chapters from Google Drive:
# 1. Download from Google Drive
# File > Download > Microsoft Word (.docx)
# 2. Run refresh script (auto-detects download and processes it)
python3 scripts/refresh_from_google_drive.py
# That's it! Chapters are split and renumbered automatically.
See REFRESH_WORKFLOW.md for details.
pip install -r requirements.txt
Or install manually (the manifest and comment tools also need PyYAML and the Google API libraries):
pip install python-docx Pillow
- Open your Google Doc
- Go to File > Download > Microsoft Word (.docx)
- The file is saved to ~/Downloads/ by default
python3 scripts/refresh_from_google_drive.py
This automatically:
- Finds the latest download
- Splits into chapters
- Renumbers sequentially
- Extracts all images
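The refresh script's exact detection logic isn't documented here. As a minimal sketch, assuming "latest download" means the most recently modified `.docx` in `~/Downloads`, the first step could look like this (`find_latest_docx` is a hypothetical name, not part of the toolkit):

```python
from pathlib import Path
from typing import Optional


def find_latest_docx(downloads_dir: Path) -> Optional[Path]:
    """Return the most recently modified .docx in downloads_dir, or None."""
    candidates = [
        p for p in downloads_dir.glob("*.docx")
        # Skip Word's temporary lock files such as "~$mybook.docx".
        if p.is_file() and not p.name.startswith("~$")
    ]
    if not candidates:
        return None
    # The newest modification time wins.
    return max(candidates, key=lambda p: p.stat().st_mtime)


if __name__ == "__main__":
    latest = find_latest_docx(Path.home() / "Downloads")
    print(latest if latest else "No .docx files found in ~/Downloads")
```

Picking by modification time rather than by filename means a re-download named `mybook (1).docx` is still found.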
python3 scripts/split_book.py mybook.docx
This will:
- Split the book by Heading 1 style
- Create an output directory with:
  - chapter_01_Title.docx
  - chapter_02_Title.docx
  - chapter_01_images/ (if the chapter has images)
  - chapter_02_images/ (if the chapter has images)
  - etc.
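The exact sanitization rules split_book.py applies to headings aren't spelled out here. A sketch that produces names consistent with the examples in this README (assuming unsafe filename characters are dropped and spaces become underscores; `chapter_filename` and `images_dirname` are hypothetical helpers):

```python
import re


def chapter_filename(index: int, title: str, max_len: int = 50) -> str:
    """Build a name like 'chapter_02_Getting_Started.docx' from a heading."""
    # Replace characters that are unsafe in filenames with a space.
    cleaned = re.sub(r'[\\/:*?"<>|]+', " ", title)
    # Collapse all whitespace into single underscores and cap the length.
    slug = "_".join(cleaned.split())[:max_len].rstrip("_")
    return f"chapter_{index:02d}_{slug or 'Untitled'}.docx"


def images_dirname(index: int) -> str:
    """Per-chapter image folder, e.g. 'chapter_01_images'."""
    return f"chapter_{index:02d}_images"


if __name__ == "__main__":
    print(chapter_filename(2, "Getting Started"))
    print(images_dirname(2))
```

Zero-padding the index keeps chapters sorted correctly in file browsers up to chapter 99.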
Specify an output directory:
python3 scripts/split_book.py mybook.docx --output chapters
Use a different heading level (e.g., Heading 2):
python3 scripts/split_book.py mybook.docx --heading-level 2
Full example:
python3 scripts/split_book.py mybook.docx -o my_chapters -l 1
Options:
- input_file - Path to your DOCX file (required)
- -o, --output - Output directory (default: output)
- -l, --heading-level - Heading level for chapters (1-9, default: 1)
- -h, --help - Show help message
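If you want to wrap the splitter in your own scripts, the documented options map onto `argparse` roughly as follows. This is an illustration of the flags listed above, not the script's actual source; `-h/--help` is added by argparse automatically.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Argument parser mirroring the documented split_book.py options."""
    parser = argparse.ArgumentParser(
        description="Split a DOCX book into chapter files by heading style."
    )
    parser.add_argument("input_file", help="Path to your DOCX file")
    parser.add_argument("-o", "--output", default="output",
                        help="Output directory (default: output)")
    # Restrict the level to 1-9, matching Word's built-in Heading 1..9 styles.
    parser.add_argument("-l", "--heading-level", type=int, default=1,
                        choices=range(1, 10), metavar="{1-9}",
                        help="Heading level for chapters (default: 1)")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    # e.g. "Heading 2" when run with --heading-level 2
    style_name = f"Heading {args.heading_level}"
    print(args.input_file, args.output, style_name)
```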
After running the script, you'll get:
output/
├── chapter_01_Introduction.docx
├── chapter_01_images/
│ ├── image_001.jpg
│ └── image_002.png
├── chapter_02_Getting_Started.docx
├── chapter_02_images/
│ ├── image_001.jpg
│ ├── image_002.jpg
│ └── image_003.png
├── chapter_03_Advanced_Topics.docx
└── ...
If you see "ERROR: No chapters found!", the script will list available heading styles in your document. You may need to:
- Check which heading style is actually used in your document
- Use the --heading-level option with the correct level
- Or update your document to use consistent heading styles
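To check which styles a document actually uses without running the splitter, you can read the DOCX directly, since it is a ZIP archive whose body lives in `word/document.xml`. A minimal standard-library sketch (not one of the toolkit's scripts); note that it reports style IDs such as `Heading1`, which can differ from the display names such as `Heading 1` that python-docx's `paragraph.style.name` returns:

```python
import sys
import zipfile
import xml.etree.ElementTree as ET
from collections import Counter

# WordprocessingML namespace used in word/document.xml.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def paragraph_style_ids(docx_path: str) -> Counter:
    """Count paragraph style IDs (e.g. 'Heading1') used in a DOCX."""
    with zipfile.ZipFile(docx_path) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    counts = Counter()
    for para in root.iter(f"{W}p"):
        pstyle = para.find(f"{W}pPr/{W}pStyle")
        # Paragraphs without an explicit style fall back to the default style.
        style = pstyle.get(f"{W}val") if pstyle is not None else "(default)"
        counts[style] += 1
    return counts


if __name__ == "__main__":
    for style, n in paragraph_style_ids(sys.argv[1]).most_common():
        print(f"{n:5d}  {style}")
```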
If images are missing from the output:
- Make sure images are embedded in the document (not just linked)
- Google Docs should embed images when exporting to DOCX
- Some image formats may not be supported
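Whether an image is embedded or only linked can be checked inside the DOCX package: embedded images are stored under `word/media/`, while linked ones appear in `word/_rels/document.xml.rels` as image relationships with `TargetMode="External"`. A small diagnostic sketch using only the standard library (`image_report` is a hypothetical helper, not part of the toolkit):

```python
import sys
import zipfile
import xml.etree.ElementTree as ET

REL_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"
RELS_PART = "word/_rels/document.xml.rels"


def image_report(docx_path: str) -> dict:
    """List embedded image parts and externally linked image targets."""
    with zipfile.ZipFile(docx_path) as zf:
        names = zf.namelist()
        # Embedded images live as package parts under word/media/.
        embedded = sorted(n for n in names if n.startswith("word/media/"))
        linked = []
        if RELS_PART in names:
            rels = ET.fromstring(zf.read(RELS_PART))
            for rel in rels.iter(f"{REL_NS}Relationship"):
                # Linked images point outside the package instead of to a part.
                if (rel.get("Type", "").endswith("/image")
                        and rel.get("TargetMode") == "External"):
                    linked.append(rel.get("Target"))
    return {"embedded": embedded, "linked": linked}


if __name__ == "__main__":
    print(image_report(sys.argv[1]))
```

Anything in the `linked` list will not be extracted, because the image data is not in the file.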
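If you get permission errors:
- Make sure you have write permissions in the output directory
- On macOS/Linux, if you run the script directly (./scripts/split_book.py) instead of through python3, you may need to make it executable:
chmod +x scripts/split_book.py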
Extract comments and match them to split chapters:
# Set up Google API credentials (one-time setup)
# See docs/GOOGLE_API_SETUP.md for detailed instructions
# Extract comments from your Google Doc
python3 scripts/extract_comments.py --doc-id YOUR_DOCUMENT_ID
# Match comments to chapter files (99.5% match rate!)
python3 scripts/match_comments_improved.py
# Review the report
open data/comment_matches_improved.md
See docs/COMMENTS_GUIDE.md for complete instructions.
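The algorithm in match_comments_improved.py isn't described here. For intuition, the Google Drive API v3 returns the text an anchored comment highlights in `quotedFileContent.value`; assuming the extracted comments keep that text, a naive baseline matcher just searches each chapter's text for it after normalizing case, curly quotes, and whitespace. Chapter text could be read with python-docx as `"\n".join(p.text for p in Document(path).paragraphs)`. `normalize` and `match_comment` are hypothetical helpers:

```python
from typing import Dict, Optional


def normalize(text: str) -> str:
    """Lowercase, straighten curly quotes, and collapse whitespace."""
    for curly, straight in (("\u2018", "'"), ("\u2019", "'"),
                            ("\u201c", '"'), ("\u201d", '"')):
        text = text.replace(curly, straight)
    return " ".join(text.lower().split())


def match_comment(quoted: str, chapters: Dict[str, str]) -> Optional[str]:
    """Return the first chapter whose text contains the quoted snippet."""
    needle = normalize(quoted)
    if not needle:
        return None  # the comment isn't anchored to any text
    for name, text in chapters.items():
        if needle in normalize(text):
            return name
    return None
```

Exact substring search fails when the quoted text was edited after the comment was made, which is presumably why a more tolerant matcher is worth having.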
Easily reorder, exclude, or merge chapters using a YAML manifest:
# View current book structure
python3 scripts/build_book.py --show-structure
# Edit the manifest file
open book_structure.yaml
# Build the book according to manifest
python3 scripts/build_book.py --output my_book.docx
See docs/QUICK_START.md or docs/MANIFEST_GUIDE.md for details.
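The real schema of book_structure.yaml is documented in docs/MANIFEST_GUIDE.md and isn't reproduced here. Purely as an illustration, assuming each entry names a chapter file and may carry an `exclude` flag (both hypothetical field names), turning a loaded manifest (for example, the result of PyYAML's `yaml.safe_load`) into a build order could look like this:

```python
from typing import Dict, List

# Hypothetical entries, shaped the way a manifest might look once loaded.
EXAMPLE_ENTRIES = [
    {"file": "chapter_02_Getting_Started.docx"},
    {"file": "chapter_01_Introduction.docx"},
    {"file": "chapter_03_Advanced_Topics.docx", "exclude": True},
]


def build_order(entries: List[Dict]) -> List[str]:
    """Return chapter files in manifest order, skipping excluded entries."""
    order: List[str] = []
    seen = set()
    for entry in entries:
        name = entry["file"]
        if name in seen:
            # Listing a chapter twice is almost certainly an editing mistake.
            raise ValueError(f"{name} appears more than once in the manifest")
        seen.add(name)
        if not entry.get("exclude", False):
            order.append(name)
    return order


if __name__ == "__main__":
    print(build_order(EXAMPLE_ENTRIES))
```

Keeping order and inclusion in a text file is what makes the structure diffable in git: moving a line reorders the book.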
Renumber chapters to be sequential:
python3 scripts/renumber_chapters.py
Clean up formatting artifacts:
# Analyze document
python3 scripts/remove_empty_headings.py mybook.docx --analyze
# Remove empty headings
python3 scripts/remove_empty_headings.py mybook.docx -o cleaned.docx
Consolidate styles to a canonical set:
python3 scripts/simplify_styles.py mybook.docx -o simplified.docx
Here's a typical workflow for managing a memoir:
# 1. Download from Google Docs
# File > Download > Microsoft Word (.docx)
# 2. Split into chapters
python3 scripts/split_book.py mybook.docx --heading-level 2
# 3. Renumber chapters if needed
python3 scripts/renumber_chapters.py
# 4. Extract comments from Google Doc
python3 scripts/extract_comments.py --doc-id YOUR_DOC_ID
python3 scripts/match_comments_improved.py
# 5. Edit individual chapters (in Word, LibreOffice, or another DOCX editor)
# 6. Use manifest to reorder/exclude chapters
python3 scripts/build_book.py --show-structure
# Edit book_structure.yaml as needed
# 7. Build final book
python3 scripts/build_book.py --output final_draft.docx
When ready to publish, convert to EPUB (ebook) and PDF (print):
# Build both formats at once
python3 scripts/build_all.py --cover cover.jpg
# Or build individually
python3 scripts/export_markdown.py # DOCX → Markdown
python3 scripts/build_epub.py # Markdown → EPUB
python3 scripts/build_pdf.py # Markdown → PDF
Output:
- book.epub - Ebook for Kindle, Apple Books, etc.
- book.pdf - Print-ready PDF for KDP Print, IngramSpark, etc.
See docs/PUBLISHING_GUIDE.md for complete publishing workflow.
- docs/QUICK_START.md - Quick reference for manifest-based editing
- docs/MANIFEST_GUIDE.md - Complete guide to manifest system
- docs/COMMENTS_GUIDE.md - Extract and match Google Doc comments
- docs/PUBLISHING_GUIDE.md - Convert to EPUB and PDF for publication
- docs/GOOGLE_API_SETUP.md - Set up Google API credentials
- docs/RENUMBERING_SUMMARY.md - Chapter renumbering details
- Python 3.7 or higher
- python-docx library
- Pillow (PIL) library
- PyYAML library (for manifest system)
- Google API libraries (for comment extraction)
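To confirm the Python packages above are importable before running any script, a quick check with `importlib.util.find_spec` works without importing them. This is a convenience sketch, not part of the toolkit; it uses the import names `docx` (python-docx), `PIL` (Pillow), and `yaml` (PyYAML), and leaves out the Google API libraries because their exact packages aren't named here:

```python
import importlib.util
from typing import Dict, Iterable

# Import names for python-docx, Pillow, and PyYAML respectively.
REQUIRED = ["docx", "PIL", "yaml"]


def check_modules(modules: Iterable[str]) -> Dict[str, bool]:
    """Map each top-level import name to whether this interpreter can find it."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, ok in check_modules(REQUIRED).items():
        print(f"{'ok     ' if ok else 'MISSING'}  {name}")
```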
Install all requirements:
pip install -r requirements.txt
How split_book.py works:
- Opens the DOCX file using python-docx
- Scans all paragraphs for the specified heading style
- Groups content between headings into chapters
- For each chapter:
  - Creates a new DOCX document
  - Copies the chapter content with formatting
  - Extracts embedded images to a dedicated directory
  - Saves the chapter file with a sanitized filename
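The grouping step is the heart of the splitter. The sketch below shows the idea on plain `(style name, text)` pairs, such as python-docx would give you with `[(p.style.name, p.text) for p in Document(path).paragraphs]`. It is a simplified illustration, not the script's code: it ignores formatting, tables, and images, and it assumes content before the first matching heading should be kept as a separate "(front matter)" group, which may not be how split_book.py handles it.

```python
from typing import List, Tuple

Paragraph = Tuple[str, str]  # (style name, text)


def group_chapters(paragraphs: List[Paragraph], heading_level: int = 1
                   ) -> List[Tuple[str, List[Paragraph]]]:
    """Group paragraphs into (title, paragraphs) chapters at each heading."""
    heading_style = f"Heading {heading_level}"
    chapters: List[Tuple[str, List[Paragraph]]] = []
    for style, text in paragraphs:
        if style == heading_style:
            # A matching heading starts a new chapter and belongs to it.
            chapters.append((text.strip(), [(style, text)]))
        else:
            if not chapters:
                # Content before the first chapter heading.
                chapters.append(("(front matter)", []))
            chapters[-1][1].append((style, text))
    return chapters
```

Lower-level headings (Heading 2 when splitting at level 1) simply stay inside their chapter, which is why choosing the right `--heading-level` matters.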
Tips:
- Use consistent heading styles in your Google Doc before exporting
- Heading 1 is typically used for chapters
- If your book is structured differently (for example, chapters at Heading 2), split with --heading-level 2
- The script preserves formatting, so your chapters will look like the original