Record of Processing Activities (RoPA)
Draft, awaiting counsel review. This document is the
Article 30 GDPR record kept by Muntin Digital LLC as a Processor
(and, for sign-up data, as a Controller). It enumerates each
processing activity with its purpose, legal basis, sub-processors,
retention, security measures, and international-transfer posture.
The published version (post counsel review) lives at
muntin.digital/ledger/ropa and supersedes this draft.
Last updated: 2026-05-29 (draft) · Version: 0.1 (post-pivot, pre-private-beta) — added A4 (marketing-site analytics: self-hosted, cookieless Plausible CE).
Scope and reading order
This RoPA covers four processing activities:
- A1 — Account management (sign-up, sign-in, sessions).
- A2 — Invoice extraction (the full pipeline: ingest, docling
layout, deterministic engine, per-vendor templates, audit chain, storage).
- A3 — Integrations (QuickBooks Online and Xero posting). This
activity is planned but disabled by env flag at private beta. It is listed here because the sub-processor list at docs/sub-processors.md pre-discloses Intuit and Xero, so that the 30-day notice period under DPA §6 has already run by the time the env flag is switched on.
- A4 — Marketing-site analytics (aggregate traffic measurement on
the public marketing site only, via self-hosted, cookieless Plausible Community Edition). No personal data is stored; the marketing site is not the authenticated product.
Each section uses the same nine-field table (A2, A3, and A4 each append one activity-specific row), so the document is grep-friendly for counsel review.
A1 — Account management
| Field | Value |
|---|---|
| Activity | Account management: sign-up, magic-link sign-in, session lifecycle (access + refresh tokens), /settings/security device-list surface. |
| Purpose | Authenticate the operator who will confirm invoice verdicts; bind each verdict to an identifiable user for the audit chain; let the operator revoke sessions and devices. |
| Legal basis (GDPR Art. 6) | (b) performance of a contract (the Terms of Service); for security-only audit entries, (f) legitimate interests (preventing abuse of an authenticated service), balanced against the right not to be profiled — IPs are not joined to any analytics system. |
| Categories of data subjects | The operator (a natural person who signs into the workspace). Workspace owner, bookkeeper, or invited collaborator. |
| Categories of personal data | Email address; chosen workspace name (which may contain a personal name); role; SHA-256 of the User-Agent string at session creation; request IP on security-only audit rows; magic-link token (15 min TTL). |
| Recipients (sub-processors) | Cloudflare (Workers, KV for magic-link tokens with 15-minute TTL, D1 for sessions metadata); Resend (delivering the magic-link email); Neon (Postgres sessions and users tables); AWS KMS (envelope-encrypting per-tenant DEKs). |
| Retention | Sessions: 30 days for refresh-window state, immediately revoked on sign-out. Magic-link tokens: 15 minutes, deleted on first use. Account records: until account closure plus a 30-day grace, then purged. Security-only audit rows: 7 years (financial-records overlap). |
| Security measures | TLS 1.3 in transit; AES-256-GCM at rest; per-tenant DEKs wrapped by AWS KMS; row-level security on the sessions and users tables per infra/postgres/migrations/0015_rls_data_plane.sql patterns; PII scrubber removes SSNs, US phone numbers, emails, and ACH-shaped account numbers from any Sentry error event before persistence; two-person production-access controls. |
| International transfers | US-only at GA (AWS us-east-2, Cloudflare US edge). EU residency is a Phase-4 deliverable; SCCs Module 2 (Controller-to-Processor) attach by reference to the DPA for EU customers until then. |
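
The token and fingerprint handling described in the table can be illustrated with a minimal sketch. It is not the production code: the class `MagicLinkStore`, the function `user_agent_fingerprint`, and the in-memory dictionary are hypothetical stand-ins (the table names Cloudflare KV as the actual token store). Only the 15-minute TTL, the delete-on-first-use rule, and the SHA-256 of the User-Agent come from this record.

```python
import hashlib
import secrets
import time

MAGIC_LINK_TTL_SECONDS = 15 * 60  # 15-minute TTL, as stated in the Retention row


def user_agent_fingerprint(user_agent: str) -> str:
    """SHA-256 hex digest of the User-Agent string; the raw string is not kept."""
    return hashlib.sha256(user_agent.encode("utf-8")).hexdigest()


class MagicLinkStore:
    """Hypothetical in-memory stand-in for a TTL-backed token store."""

    def __init__(self, clock=time.time):
        self._clock = clock   # injectable so expiry can be tested
        self._tokens = {}     # token -> (email, expires_at)

    def issue(self, email: str) -> str:
        token = secrets.token_urlsafe(32)
        self._tokens[token] = (email, self._clock() + MAGIC_LINK_TTL_SECONDS)
        return token

    def redeem(self, token: str):
        # pop() removes the token on first use, whether or not it is still valid.
        entry = self._tokens.pop(token, None)
        if entry is None:
            return None
        email, expires_at = entry
        if self._clock() >= expires_at:
            return None
        return email
```

Under these assumptions, redeeming a token twice returns the email once and then `None`, and a token redeemed at or after the 15-minute mark returns `None`.
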
A2 — Invoice extraction
This is the core processing activity. The deterministic engine in services/extract/engine.py and the docling wrapper in services/docling/main.py are the production reference; nothing in this RoPA describes a future or hypothetical pipeline.
| Field | Value |
|---|---|
| Activity | Invoice extraction: ingest an invoice (upload, paste, snap, email-forward); store the ciphertext in residency-pinned R2; run docling to recover page layout, table structure, and reading order; run the deterministic engine to resolve the vendor against the seeded catalog and apply the per-vendor template the operator has confirmed (or, on early invoices from a new vendor, a synonym dictionary + position heuristics fallback); validate the result against a canonical schema; persist as an encrypted record; append the corresponding audit-chain entry; surface the draft verdict for operator review. |
| Purpose | Convert vendor invoices into structured records the operator can post to their accounting system. Compute insights (vendor spend trends, price-hike anomalies, duplicate-bill detection) by deterministic SQL over the persisted records. |
| Legal basis (GDPR Art. 6) | (b) performance of the Terms of Service when the customer is also the controller of the personal data inside the invoice content (e.g. their own employees, their own vendors). For personal data of third parties appearing incidentally in invoice content, Muntin processes only on documented instructions from the customer (Controller) under Art. 28; the customer is responsible for establishing the third-party legal basis. |
| Categories of data subjects | The customer's employees and authorised users (signup data overlap with A1); the customer's vendors (names, addresses, tax IDs from invoice headers); any natural person whose data incidentally appears in invoice content (rare; e.g. a contractor named on a service invoice). |
| Categories of personal data | Vendor identifiers (legal name, trade name, address, tax ID); banking identifiers when the invoice carries them (account numbers, routing numbers); payment terms; remit-to addresses; line-item descriptions (free text; may incidentally name a person); the operator's user_id (linked to A1). |
| Recipients (sub-processors) | Cloudflare (R2 for ciphertext invoice files, Workers for orchestration); Fly.io (services/docling and services/extract plaintext-in-memory during the request lifecycle only, with tmpfs at /tmp and per-job ephemeral Machines per infra/fly/docling-ephemeral.toml); Neon (encrypted records in extractions, templates in extraction_templates, chain entries in audit_events); AWS KMS (wraps per-tenant DEKs); AWS S3 Object Lock (WORM mirror for audit-log metadata, Phase-4). |
| Retention | Raw invoice files: per org.retention_seconds; default 24 hours after extraction, customer-configurable 1 hour to 90 days. Extracted records: until the operator deletes them or closes the account (then purged within 30 days, with chain tombstones replacing target references). Audit-log entries: 7 years (financial-records retention). Computed verdicts: until the underlying record is deleted, or cleared on Mark Expected. |
| Security measures | Per-tenant Data Encryption Keys wrapped by AWS KMS; row-level security on every Postgres table holding Customer Data (extractions, extraction_templates, template_observations, documents, audit_events per infra/postgres/migrations/0015_rls_data_plane.sql); docling Machines are ephemeral per-job with no shared state and tmpfs scratch; Worker memory holds plaintext only during the request lifecycle; CI gate scripts/no-llm-ci.sh fails any commit that imports an LLM SDK or reaches an LLM HTTP endpoint; the PII scrubber at tools/pii-scrubber/redaction.py strips SSNs, US phone numbers, emails, and ACH-shaped account numbers from any error event or audit-target reference before persistence; signed container images (cosign), SBOMs per release; audit-chain integrity verifiable by the customer via GET /v1/audit/verify; 24-hour notification SLA for any chain-integrity event (DPA §10.2). |
| International transfers | US-only at GA; EU residency in Phase 4. SCCs Module 2 attach by reference to the DPA. No third party outside the sub-processor list at docs/sub-processors.md processes invoice content. No LLM provider is in the customer-data path — the v2 architecture's zero-retention Anthropic hop was removed in the v4 pivot. |
| Automated decision-making | None with legal or similarly significant effects (GDPR Art. 22): every verdict requires operator confirmation. See DPA §12 and the corresponding section of the Privacy Policy. |
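
For reviewers who want a concrete picture of the scrubbing step named under Security measures, the following is a minimal sketch only. The regular expressions, the placeholder strings, and the function name `scrub` are assumptions made for illustration; the actual rules live in tools/pii-scrubber/redaction.py and may differ, in particular in what counts as "ACH-shaped".

```python
import re

# Hypothetical patterns, applied in order. The real rules in
# tools/pii-scrubber/redaction.py may be stricter or broader.
_PATTERNS = [
    ("[EMAIL]", re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")),
    ("[SSN]", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    # US phone numbers with separators, e.g. 614-555-0142 or (614) 555-0142.
    ("[PHONE]", re.compile(r"(?<!\d)\(?\d{3}\)?[-. ]\d{3}[-. ]\d{4}(?!\d)")),
    # Assumption: "ACH-shaped" is approximated as a bare run of 8 to 17 digits.
    ("[ACCOUNT]", re.compile(r"\b\d{8,17}\b")),
]


def scrub(text: str) -> str:
    """Replace each matched span with a fixed placeholder before persistence."""
    for placeholder, pattern in _PATTERNS:
        text = pattern.sub(placeholder, text)
    return text
```

Replacing matches with fixed placeholders rather than deleting them keeps the surrounding error message readable while removing the value itself.
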
A3 — Integrations (planned, env-flag-disabled)
This activity is listed for completeness and for advance disclosure per DPA §6. It is not currently processing customer data; both the operator opt-in and the worker env flag must be set before any data flows.
| Field | Value |
|---|---|
| Activity | Post approved invoice records to a customer's QuickBooks Online realm or Xero tenant via OAuth2. One-way only (Muntin → QBO/Xero). Optional per-workspace. |
| Purpose | Move operator-approved verdicts into the customer's accounting system so they do not retype them. |
| Legal basis (GDPR Art. 6) | (b) performance of the Terms of Service (when the customer explicitly connects and opts in). |
| Categories of data subjects | Same as A2 — the customer's vendors and any natural persons appearing in invoice content. |
| Categories of personal data | Vendor identifiers; bill line items; invoice totals; payment terms; the OAuth2 token issued by Intuit or Xero (KMS-wrapped at rest). |
| Recipients (sub-processors) | Intuit (QuickBooks Online API) — _planned, opt-in, env-flag-gated_. Xero (Xero API) — _planned, opt-in, env-flag-gated_. Both are pre-listed at docs/sub-processors.md so the 30-day notice has run before activation. |
| Retention | OAuth tokens until the customer disconnects the integration or closes the account. The customer's QBO/Xero retention is governed by Intuit/Xero policy and is outside Muntin's control. |
| Security measures | OAuth tokens KMS-wrapped at rest; per-org access scope; integration sync attempts logged into the audit chain with success/failure and timestamp; each env flag (INTEGRATIONS_QBO_ENABLED, INTEGRATIONS_XERO_ENABLED) is a Worker environment variable (not a feature flag), so toggles are atomic and visible in the deploy log. |
| International transfers | Intuit and Xero are US-based for the active regions; SCCs attach via the sub-processor agreement. |
| Status | Planned, disabled by env flag at private beta. Active enablement triggers a Privacy Policy update and a sub-processor announcement-list email at least 30 days in advance. |
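
The two-condition gate described in this section (operator opt-in and the env flag both set) can be sketched as follows. This is an illustrative Python rendering, not the Worker code: the function name `integration_enabled`, the provider keys `"qbo"` and `"xero"`, and the convention that the flag is enabled by the string `"true"` are all assumptions. Only the two variable names come from this record.

```python
import os

# Variable names are taken from the Security measures row.
_FLAG_BY_PROVIDER = {
    "qbo": "INTEGRATIONS_QBO_ENABLED",
    "xero": "INTEGRATIONS_XERO_ENABLED",
}


def integration_enabled(provider: str, workspace_opted_in: bool, env=None) -> bool:
    """Return True only when BOTH the workspace opt-in and the env flag are set."""
    env = os.environ if env is None else env
    # Assumption: the flag counts as enabled only when its value is "true".
    flag_on = env.get(_FLAG_BY_PROVIDER[provider], "").strip().lower() == "true"
    return bool(workspace_opted_in) and flag_on
```

With the flag unset, the gate stays closed even for a workspace that has opted in, which matches the "not currently processing customer data" status above.
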
A4 — Marketing-site analytics
Aggregate traffic measurement on the public marketing site (ledger.muntin.digital) only. The tracker is first-party (apps/web/public/assets/p.js, loaded by app/(marketing)/_components/PlausibleAnalytics.tsx) and posts to a same-origin proxy (app/api/event/route.ts) that forwards to a self-hosted Plausible Community Edition instance on our own Fly.io tenant. It never loads inside the authenticated product (app/(product)/**).
| Field | Value |
|---|---|
| Activity | Cookieless, aggregate analytics for the public marketing site only. A first-party script sends a pageview beacon to a same-origin proxy that forwards to self-hosted Plausible CE. No tracker runs in the authenticated product. |
| Purpose | Aggregate marketing-traffic measurement (which marketing pages are visited, roughly how often, from what referrer) so we can tell whether the public site is doing its job. No per-visitor profile; no advertising; no cross-site tracking. |
| Legal basis (GDPR Art. 6) | (f) legitimate interests — measuring traffic to our own marketing site. The balancing test is light because the processing is cookieless, stores no personal data, builds no profile, and honours Global Privacy Control (/.well-known/gpc). No consent banner is required because no cookies or other client-side identifiers are stored. |
| Categories of data subjects | Visitors to the public marketing site. (Not the authenticated operator acting inside the product — that is A1.) |
| Categories of personal data | None stored. The visitor's IP address and User-Agent are used transiently by the CE instance only to derive a daily-rotating, then-discarded hash (hash(daily_salt + domain + UA + IP)); the salt is destroyed every 24h and neither the IP nor the UA is retained. The same-origin proxy forwards UA + IP transit-only and logs neither. The result is aggregate counts. |
| Recipients (sub-processors) | Fly.io (hosts the self-hosted Plausible CE instance on the same tenant as services/docling / services/extract and our self-hosted Sentry). Plausible-the-company is not a sub-processor — the software is self-hosted; no analytics data leaves our Fly instance. See the Fly.io footnote in docs/sub-processors.md. |
| Retention | Aggregate counts only, retained in the CE instance. No raw event log of IPs or User-Agents is kept. The per-day salt that would make a hash linkable is discarded every 24 hours, so hashes cannot be correlated across days. |
| Security measures | First-party only — the script is script-src 'self' (/assets/p.js) and the beacon is connect-src 'self' (POST /api/event); no third-party host is contacted and no CSP widening was needed. TLS 1.3 in transit. The proxy runs server-side (runtime = "nodejs") and forwards UA + IP transit-only without logging them. Cookieless by construction. |
| International transfers | US — the self-hosted Plausible CE instance runs on Fly.io iad (US-East), the same region as the rest of the customer-data plane. No transfer to any third-party analytics vendor occurs. |
| Screening DPIA | Not required. No special-category data (Art. 9), no profiling, no systematic monitoring of a publicly accessible area in the Art. 35(3) sense (the processing is cookieless aggregate counting of our own marketing pages, not tracking of individuals). This is distinct from the invoice-extraction DPIA at docs/dpia-invoice-extraction.md, which is unaffected. |
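
The daily-salted hash described under Categories of personal data, hash(daily_salt + domain + UA + IP), can be illustrated with a short sketch. SHA-256 is used here purely as a stand-in; this record does not state which hash function the CE instance uses. The function names and the field separator are likewise assumptions.

```python
import hashlib
import secrets


def new_daily_salt() -> bytes:
    """Fresh random salt; the previous day's salt is discarded, not archived."""
    return secrets.token_bytes(16)


def visitor_hash(daily_salt: bytes, domain: str, user_agent: str, ip: str) -> str:
    """Derive a same-day visitor identifier without storing the IP or UA."""
    # A separator keeps field boundaries unambiguous (an illustrative choice).
    material = daily_salt + "\x00".join([domain, user_agent, ip]).encode("utf-8")
    # Stand-in hash function; the CE instance may use a different one.
    return hashlib.sha256(material).hexdigest()
```

Within one day the same visitor yields the same value, so visits can be counted once. After the salt rotates, the same inputs produce an unrelated value, and once the old salt is destroyed the earlier value cannot be recomputed.
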
Maintenance
This RoPA is reviewed annually and on every material change to a processing activity. Material changes include: adding or removing a sub-processor; changing the retention default for any data category; expanding the categories of personal data processed under an existing activity; changing the legal basis. Each change goes through the same pull-request flow as the rest of the docs; counsel signs off before the published version is updated.