
Three layers: a short note at the top, the key lines with our take in the middle, the full source at the bottom.

CI script

privacy-ci.sh

The privacy-CI umbrella. Runs every privacy gate (no-LLM, sub-processor, demo-no-persistence, more) on every PR.

Repo path: scripts/privacy-ci.sh
Language: Shell

What this is

An umbrella script that runs the privacy gates on every code change. Think of it as the supervisor that runs each check in order, fails the build if any one of them fails, and prints a single verdict at the end.

What it proves

Backs every privacy promise at once. Each individual check has its own page; this is the script that makes sure none of them gets accidentally skipped.

What to look for in the source below

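  • The numbered gates, run in order. In the file as shown there are five: secrets-shaped files, API-key patterns, "do not commit" markers, console.* / print() logging in customer-data paths, and PII-scrubber regex parity.
  • A shared fail flag that any gate can set, and a single non-zero exit at the end, so every failing gate is reported in one run and engineers still get one clear pass/fail signal.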
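
The "collect every failure, exit once" idea can be sketched in isolation. This is a minimal illustration, not the script itself: `gate_ok`, `gate_bad`, and `run_gates` are hypothetical names, and the real file inlines its gates instead of calling functions.

```shell
#!/usr/bin/env bash
# Sketch only: gate_ok, gate_bad and run_gates are hypothetical stand-ins.
set -uo pipefail

gate_ok()  { return 0; }                              # a check that passes
gate_bad() { echo "FAIL: example gate"; return 1; }   # a check that fails

# Run every gate named in the arguments. A failure is recorded in
# $fail but does not stop the loop, so later gates still report.
run_gates() {
  local fail=0 gate
  for gate in "$@"; do
    "$gate" || fail=1
  done
  return "$fail"
}

if run_gates gate_ok gate_bad gate_ok; then
  echo "Privacy gates: OK."
else
  echo "Privacy gates failed."
fi
```

The loop keeps going after `gate_bad` fails, so a pull request with two unrelated problems shows both in a single CI run.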

#!/usr/bin/env bash
#
# Sprint-0 privacy gates.
#
# Forward defenses for the commitments in README.md and PLAN.md. These
# rules will tighten as the codebase grows; today they catch obvious
# accidents and signal that the gates exist.
#
# Run locally with: pnpm privacy-ci
# CI runs the same script via .github/workflows/ci.yml.
#
# Filenames with spaces are handled via null-terminated streams
# (git ls-files -z, xargs -0). All grep -EnH searches that operate
# on file lists pipe through xargs -0 to stay safe.
#
# PII scrubber cross-language parity gate (gate 5 below):
#   - Canonical Python: tools/pii-scrubber/redaction.py
#   - Shared TS:        packages/pii-scrub/src/index.ts
#   The four rule regex source strings (SSN / EMAIL / PHONE / BANK)
#   MUST stay byte-identical across the two files. A failure here
#   means a rule was edited in one language and not the other; the
#   fix is to mirror the change. The runtime parity vectors live in
#   apps/api/tests/pii-scrub-parity.test.ts and the Python equivalent;
#   this static grep is the cheap front-line check that fires before
#   either suite runs.

set -euo pipefail

fail=0
say() { printf '%s\n' "$@"; }

# All tracked files, NUL-separated, excluding directories we never
# scan. The trailing NUL keeps the stream parseable by xargs -0 even
# when filenames contain spaces or newlines.
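mapfile -d '' files < <(
  # Anchor each alternative so only these exact paths are dropped;
  # an unanchored '\.git' would also skip .github/ and .gitignore.
  git ls-files -z \
    | grep -zEv '^(node_modules/|\.git/|pnpm-lock\.yaml$)' \
    || true
)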

# Helper: emit a NUL-separated list of files filtered by extension regex.
filter_by_ext() {
  local pattern="$1"
  local f
  for f in "${files[@]}"; do
    if [[ "$f" =~ $pattern ]]; then
      printf '%s\0' "$f"
    fi
  done
}

# 1. No secrets-shaped files committed (use .env.example, not .env).
banned=()
for f in "${files[@]}"; do
  if [[ "$f" =~ (^|/)\.env(\..*)?$ ]] && [[ ! "$f" =~ \.env\.example$ ]] && [[ ! "$f" =~ \.env\..+\.example$ ]]; then
    banned+=("$f")
  elif [[ "$f" =~ \.(pem|key)$ ]] || [[ "$f" =~ (^|/)id_rsa$ ]] || [[ "$f" =~ (^|/)id_ed25519$ ]]; then
    banned+=("$f")
  fi
done
if [[ ${#banned[@]} -gt 0 ]]; then
  say "FAIL: secrets-shaped files committed:"
  printf '  %s\n' "${banned[@]}"
  fail=1
fi

# 2. Obvious API-key patterns in tracked text files.
key_pattern='\.(ts|tsx|js|mjs|cjs|py|json|yaml|yml|toml|md|sh)$|\.env\.example$'
key_hits=$(
  filter_by_ext "$key_pattern" \
    | xargs -0 -r grep -EnH 'sk-[A-Za-z0-9]{20,}|xoxb-[A-Za-z0-9-]+|AKIA[0-9A-Z]{16}|ghp_[A-Za-z0-9]{20,}|ghs_[A-Za-z0-9]{20,}' \
      2>/dev/null \
    | grep -v 'scripts/privacy-ci.sh' \
    || true
)
if [[ -n "$key_hits" ]]; then
  say "FAIL: probable API key in tracked file:"
  say "$key_hits"
  fail=1
fi

# 3. No "DO NOT COMMIT" / "TODO: remove" markers.
marker_pattern='\.(ts|tsx|js|mjs|cjs|py|json|yaml|yml|toml|md|sh|css|html)$'
markers=$(
  filter_by_ext "$marker_pattern" \
    | xargs -0 -r grep -EnH 'DO[ _-]?NOT[ _-]?COMMIT|TODO:[[:space:]]*remove|XXX:[[:space:]]*remove' \
      2>/dev/null \
    | grep -v 'scripts/privacy-ci.sh' \
    | grep -v 'CONTRIBUTING.md' \
    || true
)
if [[ -n "$markers" ]]; then
  say "FAIL: 'do not commit' marker in tracked file:"
  say "$markers"
  fail=1
fi

# 4. No console.* or print() in customer-data code paths.
#    services/, apps/api/, and tools/pii-scrubber/ are where invoice
#    content flows; logging there must go through a typed logger
#    (when it lands). apps/web/ is exempt today.
target_dirs=()
[[ -d services ]] && target_dirs+=("services")
[[ -d apps/api ]] && target_dirs+=("apps/api")
[[ -d tools/pii-scrubber ]] && target_dirs+=("tools/pii-scrubber")

if [[ ${#target_dirs[@]} -gt 0 ]]; then
  log_hits=$(
    find "${target_dirs[@]}" -type f \
        \( -name '*.ts' -o -name '*.tsx' -o -name '*.js' \
           -o -name '*.mjs' -o -name '*.cjs' -o -name '*.py' \) \
        -print0 2>/dev/null \
      | xargs -0 -r grep -EnH 'console\.(log|info|debug|warn|error)\b|^[[:space:]]*print\(' \
        2>/dev/null \
      | grep -vE '\.test\.|/tests?/|/cli/' \
      || true
  )
  if [[ -n "$log_hits" ]]; then
    say "FAIL: console.* or print() in customer-data code path."
    say "Use the typed logger; see PLAN.md privacy commitments."
    say "$log_hits"
    fail=1
  fi
fi

# 5. PII scrubber regex source-string parity (TS <-> Python).
#    The canonical Python at tools/pii-scrubber/redaction.py and the
#    shared TS package at packages/pii-scrub/src/index.ts must carry
#    byte-identical regex source strings for the four rules. We grep
#    each pattern out of both files; any rule that does not appear
#    in BOTH fails the gate. This catches a one-sided edit before
#    the runtime parity-vector suite even runs.
ts_scrub="packages/pii-scrub/src/index.ts"
py_scrub="tools/pii-scrubber/redaction.py"
if [[ -f "$ts_scrub" ]] && [[ -f "$py_scrub" ]]; then
  # The four canonical regex sources, stored exactly as they appear
  # in BOTH files (Python r"..." and TS /.../ share the source text).
  # If you change one, change both AND update this list.
  declare -a SCRUB_RULES=(
    '\b\d{3}-\d{2}-\d{4}\b'
    '\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'
    '\b(?:\+?1[-.\s]?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}\b'
    '\b\d{9}\b[\s\-/]+(?:[A-Za-z]{1,10}[\s\-/]+)?\d{4,17}\b'
  )
  for rule in "${SCRUB_RULES[@]}"; do
    if ! grep -qF -- "$rule" "$ts_scrub"; then
      say "FAIL: pii-scrub regex missing in TS canonical ($ts_scrub):"
      say "  $rule"
      fail=1
    fi
    if ! grep -qF -- "$rule" "$py_scrub"; then
      say "FAIL: pii-scrub regex missing in Python canonical ($py_scrub):"
      say "  $rule"
      fail=1
    fi
  done
fi

if [[ $fail -ne 0 ]]; then
  say
  say "Privacy gates failed. See README.md and PLAN.md for the rules."
  exit 1
fi

say "Privacy gates: OK."
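
Gate 5 relies on `grep -F`, which treats each rule as literal text and not as a pattern, so the regex metacharacters in the rule cannot change what is matched. The sketch below reproduces that check on throwaway files. `rule_in_both` and the temporary file contents are hypothetical, chosen only to illustrate the behaviour; they are not taken from the real scrubber sources.

```shell
#!/usr/bin/env bash
# Sketch only: rule_in_both and the temp files are hypothetical.
set -euo pipefail

# Succeeds only when the literal text of $1 appears in both files.
# -F disables regex interpretation; -- protects rules starting with '-'.
rule_in_both() {
  local rule="$1" a="$2" b="$3"
  grep -qF -- "$rule" "$a" && grep -qF -- "$rule" "$b"
}

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

ssn='\b\d{3}-\d{2}-\d{4}\b'
# %s inserts the rule verbatim; printf does not expand backslashes in arguments.
printf 'SSN = re.compile(r"%s")\n' "$ssn" > "$tmp/redaction.py"
printf 'export const SSN = /%s/g;\n' "$ssn" > "$tmp/index.ts"

if rule_in_both "$ssn" "$tmp/redaction.py" "$tmp/index.ts"; then
  echo "parity: OK"
fi

# Simulate a one-sided edit: the TS copy loses its trailing \b.
printf 'export const SSN = /%s/g;\n' '\b\d{3}-\d{2}-\d{4}' > "$tmp/index.ts"
if ! rule_in_both "$ssn" "$tmp/redaction.py" "$tmp/index.ts"; then
  echo "parity: FAIL (edited in one language only)"
fi
```

A check like this only confirms that the listed text is present in both files. It says nothing about whether the two regex engines behave the same on real input, which is why the script's header points to the runtime parity vectors as the second line of defence.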

This is the file as it lives at the moment of this build. The canonical history lives in git. If you want the full history or a specific commit, write to hello@muntin.digital.
