Muntin Ledger
We read with docling, by IBM Research.
docling is the open-source library that turns a PDF or a scanned invoice photo into structured tables and text. We did not write it. Without it, Muntin Ledger would not exist.
What docling does for you
docling reads page geometry and identifies tables, headers, and line-item rows. It hands the structured cells to our own engine, which maps them to the canonical Invoice fields you see in the ledger. Beyond docling and our own per-vendor templates, nothing else processes your invoice content. The full processing stack is named at /promises.
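As a rough illustration of that hand-off, the sketch below maps extracted table header cells to canonical field names. It is not our engine: the field names, the synonym lists, and the `map_headers` helper are hypothetical, and the standard-library `difflib` stands in for whatever matcher a production system would use.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Optional

# Hypothetical canonical fields and header synonyms, for illustration only.
CANONICAL_SYNONYMS = {
    "description": ["description", "item", "details"],
    "quantity": ["quantity", "qty", "units"],
    "unit_price": ["unit price", "price", "rate"],
    "amount": ["amount", "total", "line total"],
}


def normalise(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in text.lower())
    return " ".join(cleaned.split())


def best_field(header: str, threshold: float = 0.8) -> Optional[str]:
    """Return the canonical field with the closest synonym, or None if too far."""
    target = normalise(header)
    best_name, best_score = None, 0.0
    for field, synonyms in CANONICAL_SYNONYMS.items():
        for synonym in synonyms:
            score = SequenceMatcher(None, target, synonym).ratio()
            if score > best_score:
                best_name, best_score = field, score
    return best_name if best_score >= threshold else None


def map_headers(headers: List[str]) -> Dict[str, Optional[str]]:
    """Map each raw header cell to a canonical field name (None if unmatched)."""
    return {header: best_field(header) for header in headers}


if __name__ == "__main__":
    # Header cells as they might arrive from an extracted table.
    print(map_headers(["Qty.", "Unit Price", "Description", "Notes"]))
```

Headers that match no synonym closely enough map to `None` rather than to a guess, so an unrecognised column surfaces for review instead of silently landing in the wrong field.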
The project
- Repository: github.com/docling-project/docling
- License: MIT
- Authored by the Deep Search Team at IBM Research
- Sister projects we track: docling-core, docling-parse, docling-ibm-models
How to cite docling
The docling team requests the following citation. If you publish work that uses Muntin Ledger's extraction output, please cite docling alongside us.
@techreport{Docling,
author = {Deep Search Team},
month = {8},
title = {Docling Technical Report},
url = {https://arxiv.org/abs/2408.09869},
eprint = {2408.09869},
doi = {10.48550/arXiv.2408.09869},
version = {1.0.0},
year = {2024}
}
Read the report: arxiv.org/abs/2408.09869
How we honour the dependency
- Naming. We name docling on this page, in our privacy policy, in our DPA, and on the invoice review surface itself where the "reads with docling" line travels alongside the extracted record.
- Upstream contributions. We open an issue for every real-invoice case we find surprising, and patch upstream where a fix is ours to write. Merged PRs are listed below as they land. This is what an open-source dependency deserves: real reports and real patches, not a logo on a slide.
- No vendor lock. We pin docling versions but do not vendor a fork. Operators who want to verify our build can read the service Dockerfile against the upstream tag.
- If you want to support docling yourself, the project takes pull requests and issues directly. The link is above. We are not standing between you and the maintainers.
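The pinning claim above is something an operator can check mechanically. Here is a minimal sketch, assuming the service pins docling with a requirements-style line such as `docling==X.Y.Z` and that upstream tags look like `vX.Y.Z`. Both formats, the example version numbers, and the `pin_matches_tag` helper are assumptions for illustration, not a description of our actual Dockerfile.

```python
import re
from typing import Optional

# Matches an exact pin for the docling package itself, not docling-core etc.
# The requirements-style format is an assumption made for this sketch.
PIN_PATTERN = re.compile(
    r"^\s*docling\s*==\s*([0-9]+(?:\.[0-9]+)*)\s*(?:#.*)?$",
    re.MULTILINE,
)


def pinned_version(requirements_text: str) -> Optional[str]:
    """Return the exact docling version pinned in the text, or None."""
    match = PIN_PATTERN.search(requirements_text)
    return match.group(1) if match else None


def pin_matches_tag(requirements_text: str, upstream_tag: str) -> bool:
    """True when the pinned version equals the upstream tag minus a leading 'v'."""
    pinned = pinned_version(requirements_text)
    tag_version = upstream_tag[1:] if upstream_tag.startswith("v") else upstream_tag
    return pinned is not None and pinned == tag_version


if __name__ == "__main__":
    # Hypothetical pins; the version numbers are placeholders.
    example = "rapidfuzz==3.9.0\ndocling-core==2.1.0\ndocling==2.5.0  # pinned\n"
    print(pinned_version(example), pin_matches_tag(example, "v2.5.0"))
```

A range specifier such as `docling>=2.0` deliberately yields `None`: a range is not a pin, so there is nothing to compare against a single upstream tag.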
Merged-upstream PRs
Empty for now. The first PR ships when we have a real improvement to upstream — not before. We will not ship empty-calorie commits to feel good about this section.
Other shoulders we stand on
- Hono — the apps/api Worker framework
- Next.js — the apps/web App Router
- Cloudflare Workers, D1, KV, R2, Email Routing
- Neon — managed Postgres for tenant-scoped records
- Fly.io — compute for the docling + extract services
- Tauri — the desktop shell
- Tailwind v4 — the design-token bridge
- Resend — magic-link delivery
- rapidfuzz — header synonyms + vendor resolver fuzz
- pytest, vitest — the gates that keep this honest
Want to be named here for a contribution that helped us? Email hello@muntin.digital with a link.