Threat model
Internal-facing security artifact, intentionally publicly
readable. This is the same threat model the engineering team works
against, including the threats we have NOT yet fully mitigated.
Honesty here is more useful than completeness.
Rewritten 2026-05-11 to reflect the v4 zero-LLM architecture
(docs/plan-v4-delta.md). The previous top-5 led with "Compromised LLM
provider", which no longer applies; the v4 plan calls for the
top-5 below: Fly compromise, template-store exfiltration, template
drift / poisoning, Neon credential leak, KMS legal process.
Last updated: 2026-05-11
Top five threats
1. Fly compromise (plaintext invoice content in worker memory)
Threat. An attacker compromises the docling worker Machine on Fly during the brief window where it holds plaintext invoice content in memory. Customer invoices flow through the docling worker (services/docling) before encryption at rest; if a compromised host can read process memory or write to the filesystem, the plaintext is exposed for the seconds it takes to extract.
Defenses (current).
- Ephemeral per-job Machines. Per infra/fly/docling-ephemeral.toml, every extraction spins a fresh Fly Machine, processes one invoice, terminates. There is no shared pool: tenant A's job cannot leave artifacts that tenant B's job could read because the Machine is gone (PR-7).
- Read-only rootfs + tmpfs /tmp. Per the Dockerfile, the container runs as a non-root user (10001:10001) with all capabilities dropped. Only the per-Machine tmpfs at /tmp is writable. There is no SSH on the Machine (Fly logs are the only access path).
- Network egress allowlist. scripts/deploy-docling-ephemeral.sh pins the per-Machine egress to Cloudflare R2 + Fly's 6PN internal mesh, denying everything else. A compromised worker cannot exfiltrate to the public internet.
- Runtime attestation. services/docling/attestation.py emits a signed attestation (image hash + dep hashes + machine id) on startup. apps/api's POST /v1/attestation ingests it into the audit chain. Mismatched attestations alarm via the runbook at runbooks/tenant-isolation-incident.md.
- No long-term storage on Fly. Filesystem + process memory are
scrubbed at job end (Machine termination is the scrubber).
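The runtime attestation defense above can be illustrated with a minimal sketch. It assumes an HMAC-SHA256 signature over a canonical JSON payload; the actual signing scheme in services/docling/attestation.py is not described here and may well be asymmetric. The function names, field names, and key handling are hypothetical.

```python
import hashlib
import hmac
import json


def _canonical(payload: dict) -> bytes:
    # Canonical JSON so signer and verifier hash identical bytes.
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode()


def build_attestation(image_hash: str, dep_hashes: dict[str, str],
                      machine_id: str, key: bytes) -> dict:
    """Sign image hash + dep hashes + machine id (HMAC is an assumption)."""
    payload = {
        "image_hash": image_hash,
        "dep_hashes": dep_hashes,
        "machine_id": machine_id,
    }
    signature = hmac.new(key, _canonical(payload), hashlib.sha256).hexdigest()
    return {"payload": payload, "signature": signature}


def verify_attestation(attestation: dict, key: bytes,
                       expected_image_hash: str) -> bool:
    """Return True only if the signature is valid and the image hash matches."""
    expected_sig = hmac.new(
        key, _canonical(attestation["payload"]), hashlib.sha256
    ).hexdigest()
    if not hmac.compare_digest(expected_sig, attestation["signature"]):
        return False  # payload was altered or signed with a different key
    return hmac.compare_digest(
        attestation["payload"]["image_hash"], expected_image_hash
    )
```

A verifier on the ingest side would treat a False result as a mismatched attestation and raise the alarm rather than append the record silently.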
Residual risk. A zero-day in docling or its OCR runtime that escalates to host RCE before the Machine terminates. Mitigation: pip-audit on every CI run + cosign-signed images. The exposure window is bounded by the per-job Machine lifetime.
2. Template-store exfiltration
Threat. An attacker reads extraction_templates + template_observations for a tenant they do not own. Templates encode the operator's confirmed field fingerprints + values; exfiltration leaks vendor relationships, layout fingerprints, and the operator's correction history. This is a privacy-class breach even though templates are not invoice content per se.
Defenses (current).
- Postgres FORCE ROW LEVEL SECURITY. Migration 0004_templates.sql enables RLS + FORCE on extraction_templates and template_observations. The policy calls current_setting('app.org_id') without the fail-open ", true" second argument, so an unset GUC raises an error rather than yielding NULL. The same expression is used for both USING (read) and WITH CHECK (write); the WITH CHECK clause prevents an authenticated session for org A from inserting rows with org_id = 'org_b' (PR-9 part 1 F2).
- Connection role separation. Production connects as a non-owner
Postgres role; the migration role retains owner rights for schema changes only. FORCE applies the policy to non-owners.
- GUC discipline at the app boundary. All Neon-backed queries
must be wrapped in sql.transaction([SET LOCAL app.org_id, ...]) so the GUC and the query share a connection (audit F1).
- Cross-tenant integration test. infra/postgres/tests/0004_rls_cross_tenant.sql is a runnable script that asserts org_A's role cannot SELECT or INSERT against org_B's rows. Runs against any Postgres in CI.
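The GUC discipline above amounts to "set the org, then query, inside one transaction". A minimal sketch of that pairing follows, written in Python for illustration only (the application's own helper is sql.transaction). The function name is hypothetical and the %s placeholders assume a psycopg-style driver. set_config(name, value, true) is the parameterizable equivalent of SET LOCAL.

```python
def scoped_statements(org_id: str, query: str,
                      params: tuple = ()) -> list[tuple[str, tuple]]:
    """Return the statements to run, in order, inside ONE transaction."""
    if not isinstance(org_id, str) or not org_id.strip():
        # Fail closed at the app boundary, mirroring the policy's behavior.
        raise ValueError("org_id is required; refusing to build an unscoped query")
    return [
        # Third argument true = is_local: the value is discarded when the
        # transaction ends, so it cannot leak to the next user of a pooled
        # connection. (This is unrelated to current_setting's missing_ok flag.)
        ("SELECT set_config('app.org_id', %s, true)", (org_id,)),
        (query, params),
    ]
```

The caller is expected to execute both statements on the same connection within a single transaction; running them on separate pooled connections would leave the query without the GUC and the policy would reject it.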
Residual risk. A privileged-but-non-owner role with access to the bypass surface (e.g., one that can SET ROLE to a superuser). Mitigated by Neon's role hierarchy + the KMS-wrapped JSON blob in json_blob_kms_wrapped (DBAs cannot read records without HSM access). The RLS-bypass scenario is drilled quarterly; see the Drills section below.
3. Template drift / poisoning attack
Threat. A malicious caller writes template observations against another extraction's (vendor_id, layout_hash) pair, polluting the per-rule fingerprints so subsequent extractions silently mis-extract on the victim's invoices. This is a v4-specific attack because templates are now load-bearing.
Defenses (current).
- Confirm-field ownership verification. In apps/api/src/routes/templates.ts, POST /v1/templates/extractions/:id/confirm-field calls extractions_store.getOwnershipForOrg(orgId, extractionId) and rejects with 404 if the extraction does not exist or belongs to another org, AND with 409 if the body's vendor_id / layout_hash does not match the engine's recorded values for that extraction (PR-9 part 1 F3).
- Idempotent observations. template_store.record_observation is idempotent on (extraction_id, field_path, fingerprint) so a retry, refresh, or duplicate Confirm cannot artificially advance a rule toward promotion (PR-9 audit F5).
- Per-org RLS on template_observations. Even with route-layer bypass, RLS rejects cross-org writes.
- Drift recovery test. services/extract/tests/test_drift_recovery.py exercises the legitimate drift path (vendor-side layout change -> engine creates fresh candidate -> 2 user confirms -> silent recovery on the third invoice). A poisoned template would fail this test on the next CI run.
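The first two defenses above (the 404 / 409 ownership check and the idempotent observation key) can be sketched together with an in-memory model. This is not the route handler in templates.ts or the real template_store; the class names, the 200 success code, and the in-memory storage are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Extraction:
    org_id: str
    vendor_id: str
    layout_hash: str


@dataclass
class TemplateStore:
    """Hypothetical in-memory stand-in for the extraction and template stores."""
    extractions: dict[str, Extraction] = field(default_factory=dict)
    observations: set[tuple[str, str, str]] = field(default_factory=set)

    def confirm_field(self, org_id: str, extraction_id: str, vendor_id: str,
                      layout_hash: str, field_path: str, fingerprint: str) -> int:
        extraction = self.extractions.get(extraction_id)
        # Missing and foreign extractions get the same 404.
        if extraction is None or extraction.org_id != org_id:
            return 404
        # The body must match what the engine recorded for this extraction.
        if (vendor_id, layout_hash) != (extraction.vendor_id, extraction.layout_hash):
            return 409
        # Keyed on (extraction_id, field_path, fingerprint): adding the same
        # key twice leaves the set unchanged, so a retry is a no-op.
        self.observations.add((extraction_id, field_path, fingerprint))
        return 200
```

Because the observation key is a set member, a double-clicked Confirm or a client retry counts once toward promotion, which is the property the idempotency defense relies on.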
Residual risk. Insider with both DB access AND knowledge of the synthetic fingerprint scheme. Mitigated by Defense 1 of "Insider exfiltration" (no standing prod access, two-person approval).
4. Neon credential leak
Threat. The Postgres connection string (DATABASE_URL) leaks via log line, error trace, build artifact, or compromised CI runner. With the connection string an attacker reads every tenant's records.
Defenses (current).
- Connection role is non-owner. A leaked connection still hits RLS (see the Threat #2 defenses above) so the attacker sees zero rows unless they ALSO know an app.org_id value to set per-session.
- Zero customer content in logs. scripts/privacy-ci.sh greps for console.* / print( in the customer-data path (services/, apps/api/, tools/pii-scrubber/). The typed logger that ships in the next CI hardening pass will replace ad-hoc log lines with structured events that strip values.
- Sentry payload scrubber. tools/pii-scrubber runs in Sentry.beforeSend so any incidental error trace that captures request bodies is stripped before persistence.
- Secret rotation runbook. runbooks/key-rotation.md documents the 24-hour rotation drill for DATABASE_URL, JWT_SECRET, and the KMS CMK.
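As an illustration of the log and error-trace leak path these defenses target, the sketch below redacts the credential part of a Postgres connection URL before a line is emitted. It is not the tools/pii-scrubber implementation; the regex, the function name, and the [REDACTED] marker are assumptions, and a password containing a literal "@" is not handled.

```python
import re

# Matches postgres:// or postgresql:// URLs that carry userinfo before "@".
_CONN_URL = re.compile(
    r"\b(postgres(?:ql)?://)([^\s:/@]+)(?::[^\s@]*)?@",
    re.IGNORECASE,
)


def redact_connection_strings(line: str) -> str:
    """Replace user[:password] in any Postgres URL with a fixed marker."""
    return _CONN_URL.sub(r"\1[REDACTED]@", line)
```

The host and database name are left intact on purpose so the log line stays useful for debugging; only the userinfo segment is removed.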
Residual risk. A compromised CI runner with both DATABASE_URL AND the ability to forge app.org_id GUC values. Mitigated by the named-customer-ticket gate on production access; investigated on every drill.
5. KMS legal process (subpoena / lawful access)
Threat. A government request for customer data lands at AWS (KMS holder) or Cloudflare (R2 / D1 holder). The plan-v4 commitment is "we cannot decrypt records without KMS"; the architectural question is whether a law-enforcement KMS request would compel us to participate in decryption.
Defenses (current).
- KMS scope is narrow. Per docs/sub-processors.md, AWS holds KMS keys only -- no AWS compute, no S3 for customer content. A KMS request is therefore a key-release request, not a content request.
- Per-tenant DEKs. Each tenant's records are encrypted under a
per-tenant DEK wrapped by the KMS CMK. A targeted subpoena gets one tenant's records; a blanket subpoena would get every tenant's records but would also be the kind of process Cloudflare and AWS challenge.
- Warrant canary at docs/warrant-canary.md removes a sentence if any of the listed processes have been received, so customers can detect the gag-order case.
- BYOK tier (Phase 4). When BYOK ships, the customer holds the
key; we cannot decrypt at all. Self-hosted tier (Phase 4) removes Muntin from the legal process entirely.
Residual risk. US-government National Security Letter with gag order targeting our KMS account. Mitigated by the canary + BYOK as the customer-side escape.
Legacy entry: insider exfiltration (now Defense 1 of #1 above)
The original v2 top-5 led with this. The v4 architecture re-cast it as a defense within #1 / #2: there is no standing prod access, every access is audit-logged into the customer's own chain. We retain the threat as a perpetual concern but it is no longer in the top 5.
Residual risk. A Muntin employee with valid time-boxed access can still see plaintext during the access window. Documented in the DPA so customers know what they are trusting.
Sub-threats we still track (not in the v4 top-5)
The pre-pivot top-5 (compromised LLM provider, docling supply-chain, account takeover, subpoena) collapsed into either Defense lines above or sub-threats below. We retain the discipline of naming them.
Account takeover (phished operator). Magic-link auth + 15-minute TTL on the link + HTTP-only / SameSite=Lax cookie + 14-day session TTL. Risk-based step-up on new payee / new integration destination ships pre-private-beta. New-payee changes require second-admin email ack before activation. The operator can still be phished; we raise cost without eliminating the risk.
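The 15-minute magic-link TTL can be sketched as a signed, timestamped token. This is a minimal illustration, assuming an HMAC-SHA256 signature and a hypothetical "email|issued_at" body; it is not the production auth code, and it omits single-use enforcement, which a real magic link would also need.

```python
import base64
import hashlib
import hmac
import time
from typing import Optional

LINK_TTL_SECONDS = 15 * 60  # the 15-minute link TTL stated above


def issue_token(email: str, key: bytes, now: Optional[float] = None) -> str:
    """Return "<b64 body>.<b64 signature>" for a magic link."""
    issued_at = int(time.time() if now is None else now)
    body = f"{email}|{issued_at}".encode()
    sig = hmac.new(key, body, hashlib.sha256).digest()
    return (base64.urlsafe_b64encode(body).decode()
            + "." + base64.urlsafe_b64encode(sig).decode())


def verify_token(token: str, key: bytes,
                 now: Optional[float] = None) -> Optional[str]:
    """Return the email if the token is authentic and unexpired, else None."""
    current = time.time() if now is None else now
    try:
        body_b64, sig_b64 = token.split(".")
        body = base64.urlsafe_b64decode(body_b64)
        sig = base64.urlsafe_b64decode(sig_b64)
    except ValueError:  # wrong shape or invalid base64
        return None
    expected = hmac.new(key, body, hashlib.sha256).digest()
    if not hmac.compare_digest(sig, expected):
        return None
    # The body is parsed only after the signature has been checked.
    email, issued_at = body.decode().rsplit("|", 1)
    if not 0 <= current - int(issued_at) <= LINK_TTL_SECONDS:
        return None
    return email
```

A short TTL narrows the window in which a forwarded or intercepted link is usable; it does not stop an operator from clicking a phishing link, which is why the step-up and second-admin checks exist.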
docling supply-chain. Pinned dep versions + pip-audit on every CI run + cosign-signed images + SBOMs published per release. The no-LLM gate (scripts/no-llm-ci.sh) catches accidental imports of banned packages. The network egress allowlist (Threat #1 Defense) plus the KMS-wrapped records described under Threat #2 (DBAs cannot read records without HSM access) contain a successful supply-chain attack to the Machine lifetime.
Compromised LLM provider. Not applicable in v4. No LLM is in the customer-data path. The no-LLM gate enforces this on every CI run. The Phase-4 self-hosted tier may add an optional local Ollama / vLLM endpoint; that is opt-in and out of the default path.
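An import gate of the kind described above can be sketched as a static scan. The real gate is a shell script (scripts/no-llm-ci.sh) whose contents are not shown here; the version below is a Python illustration only, and the banned-package list is a hypothetical example, not the project's actual list.

```python
import ast

# Hypothetical ban list for illustration; the real list lives in the CI gate.
BANNED = frozenset({"openai", "anthropic", "langchain"})


def banned_imports(source: str, banned: frozenset = BANNED) -> list[str]:
    """Return the sorted top-level banned packages imported by `source`."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # level == 0 means an absolute import; relative imports refer to
            # the project's own modules and are skipped.
            names = [node.module]
        else:
            continue
        for name in names:
            top_level = name.split(".")[0]
            if top_level in banned:
                found.add(top_level)
    return sorted(found)
```

A CI step would run this over every file in the customer-data path and fail the build on a non-empty result. Parsing the syntax tree avoids false positives from comments and strings, though it does not catch dynamic imports such as importlib.import_module.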
Threats we considered and chose NOT to defend against at GA
- DDoS at the edge. Cloudflare's free DDoS protection is the
current defense. Above that we accept brief unavailability rather than pay enterprise rates.
- Nation-state-level adversary. The architecture is built for
the realistic threat model of a small-restaurant SaaS, not Mossad. Customers with Mossad-level threat models should evaluate self-hosting (when it ships in Phase 4).
- Quantum cryptanalysis. AES-256 + ECDH is current best
practice. Post-quantum migration is a long-horizon item.
Drills
Quarterly tabletop exercises for the five threats above. Output: updates to runbooks at runbooks/. Customers may request the results summary via security@muntin.digital.
How to report a threat we missed
security@muntin.digital. PGP key + bug-bounty policy at docs/security.txt. We do not sue good-faith researchers.