glmmedia-ocr

Self-contained command-line tool that converts PDFs and images to structured Markdown using a local GLM-OCR model served through Ollama. It manages the full pipeline automatically: Ollama lifecycle, model pulling, PDF rendering, layout detection via PP-DocLayoutV3, and result merging. It is distributed through both npm and pip and exposes 50+ configurable options.

A production-grade OCR CLI that runs entirely locally by default, with no cloud dependencies unless the optional cloud mode is enabled.

Pipeline

PDF rendering (pypdfium2) → image normalization → layout detection (PP-DocLayoutV3 via HuggingFace) → OCR via Ollama's generate endpoint → structured Markdown output with page separators.
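
The OCR step of this pipeline can be sketched as a single request to Ollama's generate endpoint. This is a minimal illustration under stated assumptions, not the tool's actual code: the model tag `glm-ocr`, the prompt text, and the helper names `build_generate_payload` and `ocr_image` are hypothetical. The endpoint path, the default port, and the `images` and `stream` fields follow Ollama's public REST API.

```python
import base64
import json
import urllib.request

# Default address of a locally running Ollama server.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_payload(image_bytes, model="glm-ocr",
                           prompt="Convert this page to Markdown."):
    """Build the JSON body for one OCR request.

    The model tag and prompt are assumptions made for illustration.
    """
    return {
        "model": model,
        "prompt": prompt,
        # Ollama expects images as base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        # Ask for one complete JSON reply instead of a token stream.
        "stream": False,
    }


def ocr_image(image_bytes, url=OLLAMA_URL, timeout=300):
    """Send one page image to Ollama and return the generated text."""
    body = json.dumps(build_generate_payload(image_bytes)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]
```

Calling `ocr_image(open("page.png", "rb").read())` would return the model's text for that page, provided an Ollama server is running and the model has been pulled.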

Highlights

Zero-config: auto-starts Ollama, pulls the ~2.2 GB model if missing, and manages virtual environments
Handles PDFs, PNG, JPEG, WebP, BMP, TIFF, GIF
Error recovery: failed pages get placeholders and processing continues
Optional cloud mode via Zhipu Cloud MaaS backend
Clean shutdown with signal traps
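
The error-recovery behaviour listed above (a failed page gets a placeholder and the run continues) can be illustrated with a short sketch. The placeholder text, the `---` page separator, and the function name `merge_pages` are assumptions chosen for the example; the tool's real output format may differ.

```python
def merge_pages(page_images, ocr_fn, separator="\n\n---\n\n"):
    """OCR each page in order and join the results into one Markdown string.

    A page whose OCR call raises is replaced by a placeholder comment so
    that one bad page does not abort the whole document.
    Returns the merged Markdown and the 1-based numbers of failed pages.
    """
    sections = []
    failed = []
    for number, image in enumerate(page_images, start=1):
        try:
            sections.append(ocr_fn(image))
        except Exception as exc:
            # Record the failure and keep going with the next page.
            failed.append(number)
            sections.append(f"<!-- page {number}: OCR failed ({exc}) -->")
    return separator.join(sections), failed
```

Passing the OCR function in as a parameter keeps the merging logic independent of the backend, so the same loop would serve a local or a cloud call.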
