
Three layers: a short note at the top, the key lines with our take in the middle, the full source at the bottom.

CI script

no-llm-ci.sh

Scans tracked source and config files for LLM SDK imports, LLM vendor API hosts, and LLM API key names on every pull request. Fails the build when one slips in.

Repo path: scripts/no-llm-ci.sh
Language: Shell

What this is

A short shell script that runs on every code change before anything ships. It scans the project's tracked source and config files and refuses to let the build finish if any of them imports a language-model SDK, names an LLM vendor's API host, or references an LLM API key variable.

What it proves

Backs the promise that no language model ever reads the content of your invoice. The script is the gate; if someone tried to slip an LLM library in, the build would stop here. Read the promise →

What to look for in the source below

  • A list of library-name patterns the script searches for (anthropic, openai, instructor, cohere, together, and the Google and Vertex clients).
  • An exit code of 1 when a match is found: that is the line that fails the build.
  • No per-branch allow-list and no override flag. The script applies to every commit; the only exclusions are docs/, the gate scripts themselves, and tests/fixtures/.

The lines that carry the weight

The library-name patterns it watches for

Lines 40-60

LLM_IMPORT_PATTERN='(^|[^a-zA-Z_.])(anthropic|openai|instructor|cohere|together)([^a-zA-Z_]|$)|google\.generativeai|vertexai|@anthropic-ai/sdk|@google/generative-ai'

LLM_HTTP_PATTERN='api\.anthropic\.com|api\.openai\.com|generativelanguage\.googleapis\.com|api\.cohere\.(ai|com)|api\.together\.xyz'

# 1. No imports of LLM SDKs. Pre-private-beta audit F5: the broad
#    `/tests?/` exclusion is gone -- a malicious test could
#    `from openai import OpenAI` and not trip the gate. Now we only
#    exclude `tests/fixtures/` (intentionally inert data) and the
#    line-comment exclusion is precise (^[[:space:]]*#) instead of
#    the prior `"""` swallow.
import_hits=$(
  printf '%s\0' "${files[@]}" \
    | xargs -0 -r grep -EnH \
        -e '^[[:space:]]*import[[:space:]]+.*("|'"'"')(anthropic|openai|instructor|cohere|together|@anthropic-ai/sdk|@google/generative-ai|vertexai|google\.generativeai)' \
        -e '^[[:space:]]*from[[:space:]]+("|'"'"')?(anthropic|openai|instructor|cohere|together|@anthropic-ai/sdk|@google/generative-ai|vertexai|google\.generativeai)' \
        -e '^[[:space:]]*(import|from)[[:space:]]+(anthropic|openai|instructor|cohere|together|vertexai)([[:space:]]|$)' \
      2>/dev/null \
    | grep -vE '^[^:]*:[0-9]+:[[:space:]]*#' \
    | grep -vE '/tests?/fixtures/' \
    || true
)

Plain English

Each pattern names a known language-model SDK or library. If any of them appears in an import statement in a tracked source file, the script records the hit and goes on to fail the build. The list is broad by design: better to flag an unfamiliar name and have a human check than to miss one.
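To make the matching concrete, here is a minimal sketch that runs one of the three import expressions (copied from the script) plus the two exclusion filters against throwaway files. The temp directory, file names, and sample lines are made up for illustration and are not part of the repo.

```shell
# Sketch only: sample files are hypothetical; the regexes are copied from the gate.
tmp=$(mktemp -d)
mkdir -p "$tmp/src" "$tmp/tests/fixtures"
printf '%s\n' 'import json' 'from openai import OpenAI' '# import anthropic' > "$tmp/src/app.py"
printf '%s\n' 'import cohere' > "$tmp/tests/fixtures/sample.py"

pattern='^[[:space:]]*(import|from)[[:space:]]+(anthropic|openai|instructor|cohere|together|vertexai)([[:space:]]|$)'

# grep -nH prints "file:line:text"; the two filters then drop
# comment-only lines and anything under tests/fixtures/.
hits=$(
  grep -EnH "$pattern" "$tmp/src/app.py" "$tmp/tests/fixtures/sample.py" \
    | grep -vE '^[^:]*:[0-9]+:[[:space:]]*#' \
    | grep -vE '/tests?/fixtures/' \
    || true
)

# Strip the temp prefix so the hit reads like a repo-relative path.
printf '%s\n' "${hits#"$tmp"/}"
rm -rf "$tmp"
```

In this sketch only the `from openai import OpenAI` line in src/app.py survives: `import json` is not on the list, the commented-out import does not start with `import` or `from`, and the fixture file is filtered out by path.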

The line that fails the build

Lines 78-92

  say "$http_hits"
  fail=1
fi

# 3. No LLM-only environment variables (ANTHROPIC_API_KEY,
#    OPENAI_API_KEY etc.) in code or wrangler/fly configs. .env.example
#    files are in the .env-shaped exclusion above; this catches the
#    code references that would consume such a key.
env_hits=$(
  printf '%s\0' "${files[@]}" \
    | xargs -0 -r grep -EnH \
        '\b(ANTHROPIC_API_KEY|OPENAI_API_KEY|COHERE_API_KEY|VERTEX_PROJECT|GEMINI_API_KEY|TOGETHER_API_KEY)\b' \
      2>/dev/null \
    | grep -vE '^[^:]*:[0-9]+:[[:space:]]*#' \
    | grep -vE '/tests?/fixtures/|scripts/no-llm-ci\.sh' \

Plain English

When a match is found, the script prints the file, line number, and text of each hit and sets fail=1. After all three checks have run, a non-zero fail makes the script exit with code 1. Continuous integration sees the non-zero exit and refuses to merge the change. There is no override flag: the only way through is to remove the offending reference.
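The accumulate-then-exit shape described here can be sketched in a few lines. The function name `run_gate` and the sample hit strings are hypothetical; the real script inlines its three checks rather than looping over arguments.

```shell
# Sketch only: run_gate and its inputs are hypothetical stand-ins for the
# three inlined checks (imports, HTTP hosts, env vars) in the real script.
run_gate() {
  fail=0
  for hits in "$@"; do
    # Every non-empty result is reported; none short-circuits the others.
    if [ -n "$hits" ]; then
      printf 'FAIL: %s\n' "$hits"
      fail=1
    fi
  done
  return "$fail"
}

status=0
run_gate "" "src/app.py:3:import openai" "src/cfg.py:9:OPENAI_API_KEY" || status=$?
echo "gate exit status: $status"
```

Because the flag is only checked at the end, a single run reports every offending line at once instead of stopping at the first one.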

The full file (109 lines)

#!/usr/bin/env bash
#
# no-LLM-import gate (PR-6).
#
# Privacy invariant from the v4 plan: no module in this repo imports
# an LLM SDK, and no HTTP destination matches an LLM-vendor pattern.
# The architecture is deterministic end-to-end -- if a future change
# accidentally pulls anthropic / openai / instructor / cohere / a
# Google or Vertex client back into the customer-data path, this gate
# fails the build before the change ships.
#
# Run locally with: bash scripts/no-llm-ci.sh
# CI runs the same script via .github/workflows/ci.yml.
#
# Exclusions:
#   * docs/  -- the plan documents intentionally name LLM vendors as
#               part of the architectural decision history. They are
#               not imported by code.
#   * scripts/no-llm-ci.sh + scripts/check-verboten-phrases.mjs --
#               the gate files name the patterns they enforce.
#   * services/extract/main.py docstring + tests/* docstrings --
#               assertion strings reference the banned tokens.

set -euo pipefail

fail=0
say() { printf '%s\n' "$@"; }

# All tracked code files, NUL-separated. Pre-private-beta audit F5
# extends the file glob with *.ipynb / *.yaml / *.yml so notebooks
# and CI configs cannot pull in an LLM SDK invisibly.
mapfile -d '' files < <(
  git ls-files -z \
    -- '*.ts' '*.tsx' '*.js' '*.mjs' '*.cjs' '*.py' '*.toml' \
       '*.ipynb' '*.yaml' '*.yml' \
    | grep -zEv '^(node_modules|\.git|pnpm-lock\.yaml|docs/|scripts/(no-llm-ci\.sh|check-verboten-phrases\.mjs)$)' \
    || true
)

LLM_IMPORT_PATTERN='(^|[^a-zA-Z_.])(anthropic|openai|instructor|cohere|together)([^a-zA-Z_]|$)|google\.generativeai|vertexai|@anthropic-ai/sdk|@google/generative-ai'

LLM_HTTP_PATTERN='api\.anthropic\.com|api\.openai\.com|generativelanguage\.googleapis\.com|api\.cohere\.(ai|com)|api\.together\.xyz'

# 1. No imports of LLM SDKs. Pre-private-beta audit F5: the broad
#    `/tests?/` exclusion is gone -- a malicious test could
#    `from openai import OpenAI` and not trip the gate. Now we only
#    exclude `tests/fixtures/` (intentionally inert data) and the
#    line-comment exclusion is precise (^[[:space:]]*#) instead of
#    the prior `"""` swallow.
import_hits=$(
  printf '%s\0' "${files[@]}" \
    | xargs -0 -r grep -EnH \
        -e '^[[:space:]]*import[[:space:]]+.*("|'"'"')(anthropic|openai|instructor|cohere|together|@anthropic-ai/sdk|@google/generative-ai|vertexai|google\.generativeai)' \
        -e '^[[:space:]]*from[[:space:]]+("|'"'"')?(anthropic|openai|instructor|cohere|together|@anthropic-ai/sdk|@google/generative-ai|vertexai|google\.generativeai)' \
        -e '^[[:space:]]*(import|from)[[:space:]]+(anthropic|openai|instructor|cohere|together|vertexai)([[:space:]]|$)' \
      2>/dev/null \
    | grep -vE '^[^:]*:[0-9]+:[[:space:]]*#' \
    | grep -vE '/tests?/fixtures/' \
    || true
)
if [[ -n "$import_hits" ]]; then
  say "FAIL: LLM SDK import in tracked code:"
  say "$import_hits"
  fail=1
fi

# 2. No LLM HTTP destinations as bare strings.
http_hits=$(
  printf '%s\0' "${files[@]}" \
    | xargs -0 -r grep -EnH "$LLM_HTTP_PATTERN" \
      2>/dev/null \
    | grep -vE '^[^:]*:[0-9]+:[[:space:]]*#' \
    | grep -vE '/tests?/fixtures/' \
    || true
)
if [[ -n "$http_hits" ]]; then
  say "FAIL: LLM HTTP destination in tracked code:"
  say "$http_hits"
  fail=1
fi

# 3. No LLM-only environment variables (ANTHROPIC_API_KEY,
#    OPENAI_API_KEY etc.) in code or wrangler/fly configs. .env.example
#    files are in the .env-shaped exclusion above; this catches the
#    code references that would consume such a key.
env_hits=$(
  printf '%s\0' "${files[@]}" \
    | xargs -0 -r grep -EnH \
        '\b(ANTHROPIC_API_KEY|OPENAI_API_KEY|COHERE_API_KEY|VERTEX_PROJECT|GEMINI_API_KEY|TOGETHER_API_KEY)\b' \
      2>/dev/null \
    | grep -vE '^[^:]*:[0-9]+:[[:space:]]*#' \
    | grep -vE '/tests?/fixtures/|scripts/no-llm-ci\.sh' \
    || true
)
if [[ -n "$env_hits" ]]; then
  say "FAIL: LLM-only env var consumed in tracked code:"
  say "$env_hits"
  fail=1
fi

if [[ $fail -ne 0 ]]; then
  say
  say "no-LLM gate failed. The customer-data path is deterministic."
  say "If you have a legitimate reason to add an LLM dependency, that"
  say "decision is a v4-plan amendment, not a code-review comment."
  exit 1
fi

say "no-LLM gate: OK."

See also

This is the file as it lives at the moment of this build. The canonical history lives in git. If you want the full history or a specific commit, write to hello@muntin.digital.
