API reference · PHI handling

Three modes. Pick one.

Velora supports three PHI handling postures. They share an API surface — the analytics endpoints behave identically regardless of which mode your tenant is on. The difference is what crosses our wire and what we hold at rest.

The three modes

01 · Server-mode

Encrypted at rest in customer-isolated vaults.

Send one header: X-Velora-Customer-Id. We hold the HMAC key (encrypted at rest with AES-256-GCM) and the {token → original} mapping in a vault scoped to your tenant, with a 90-day TTL.

BAA · audit-logged · server holds map

02 · Client-mode

PHI not persisted on Velora.

You generate an HMAC secret and send it on the request as X-Velora-Vault-Key. Velora tokenizes in process memory and drops the originals — never written to disk, DB, backups, or logs.

BAA · in-memory only · you hold map

03 · Sidecar-mode

PHI never leaves your VPC.

Run the velora-vault tokenizer inside your perimeter — Docker, Python, or TypeScript. CSVs tokenize locally before upload; only tokens cross the wire. Send X-Velora-Vault-Mode: pretokenized.

Zero PHI on Velora · for the strictest counsel

The shipped client-mode does deliver "no PHI at rest on Velora." It does not deliver "Velora never sees plaintext" — your data still passes through TLS termination on our edge before being scrubbed in process memory. If your compliance posture distinguishes between those two claims (and many do), sidecar is the mode you need.

Full HIPAA narrative

For BAA scope discussion, breach-notification posture, and the full Safe Harbor identifier mapping, see the HIPAA technical reference. This page is the developer integration guide.

When to pick sidecar

Counsel requires plaintext-never. Your privacy office or board demands that Velora cannot see plaintext PHI at any point. Client-mode tokenizes in our memory; sidecar tokenizes in yours.
BAA scope minimization. Reducing what we see reduces what is in scope for our BAA, breach-notification obligations, and your vendor-risk reviews.
Data residency / state-level rules. State Medicaid contracts, EU residency overlays, or carrier addenda forbidding PHI from crossing certain network boundaries.
Independent crypto custody. Your security program wants the HMAC secret in your own KMS / Vault / Secrets Manager, with rotation on your schedule, not ours.

Pick something else when: your team is small and integration is the bottleneck (server-mode requires nothing on your side), your debugging cycle depends on Velora support inspecting underlying values (in sidecar we cannot — only you can), or your CSV headers are non-canonical and you want our server-side AI column-mapper to figure them out (the sidecar runs without network egress and ships without the AI mapper).

Install the tokenizer

Three install paths. Docker is the recommended default; pip and npm are for teams that already manage Python or Node services inside their VPC.

Docker (recommended)

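docker pull velora/vault:0.1.0

# Load the key from a file into the environment. A bare `-e VELORA_VAULT_KEY`
# then forwards it without putting the value on the docker command line (or in ps).
export VELORA_VAULT_KEY="$(cat /path/to/your/vault.key)"

# Mount a directory (containing in.csv), not single files: bind-mounting a
# not-yet-existing out.csv makes Docker create a directory in its place.
# The directory must be writable by the container user (uid 10001).
docker run --rm \
  -e VELORA_VAULT_KEY \
  -v "$PWD/data:/data" \
  velora/vault:0.1.0 \
  tokenize --product=instant_audit --output /data/out.csv /data/in.csv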

The image is python:3.13-slim-bookworm-based, runs as a non-root user (vault, uid 10001), has no network egress at runtime by design, and ships stdlib-only — no transitive dependencies. New schema versions ship as new image tags; the container never reaches out for updates on its own.

Private registry today

During early access the image lives in a private registry. Publication to Docker Hub is planned as a follow-on release. Until then, request access via support@velora.health.

pip

# Source-tree install (supported during early access)
pip install -e packages/velora-vault

# Once installed
velora-vault version       # 0.1.0
velora-vault list-products # instant_audit, claims_replay, ...

npm (in flight)

The TypeScript / JS port is in flight. It mirrors the Python implementation byte-for-byte (same HMAC construction, same schema, byte-equivalent tokens), and its release is gated on a cross-language test suite. Until it lands, use the Python package or the Docker image.

Five-step quickstart

1. Generate a 32-byte secret

openssl rand -hex 32
# 7d3a9f1c4e8b2a6d5f0c3e9a8b7d6c5e4f3a2b1c9d8e7f6a5b4c3d2e1f0a9b8c
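If your provisioning scripts are in Python rather than shell, the standard library's `secrets` module produces an equivalent value. This is a sketch, not a velora-vault command. `generate_vault_secret` is a hypothetical name.

```python
import secrets

def generate_vault_secret() -> str:
    # 32 random bytes from the OS CSPRNG, hex-encoded to 64 characters:
    # the same shape `openssl rand -hex 32` prints.
    return secrets.token_hex(32)
```

Write the result straight into your secrets manager rather than printing or logging it.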

2. Store it in your secrets manager

AWS Secrets Manager, GCP Secret Manager, HashiCorp Vault — whatever your team uses for production secrets. Do not commit it. Do not paste it into Slack, issue trackers, or shell history. For local dev:

# Read the key from a file so the value itself never lands in shell history.
export VELORA_VAULT_KEY="$(cat /path/to/your/vault.key)"

The CLI deliberately refuses a --key=<plaintext> flag — the secret should never land in shell history or ps output. Use --key-env (default VELORA_VAULT_KEY) or --key-file only.

3. Tokenize a CSV

velora-vault tokenize \
  --product=instant_audit \
  --key-env=VELORA_VAULT_KEY \
  --write-map=local-map.json \
  claims.csv > claims.tokenized.csv

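This applies the instant_audit schema: it HMACs direct identifiers, generalizes DOB to year and ZIP to its first three digits, and leaves analytics columns (CPT codes, billed amounts, carrier id) untouched. --write-map also writes the token → original map to local-map.json. That file holds plaintext PHI, so keep it inside your perimeter under the same controls as the source CSV.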

4. Preview first when shapes are new

velora-vault preview --product=instant_audit claims.csv

preview classifies headers without touching row data. It tells you what would be tokenized, generalized, kept, or dropped, and it is the fastest way to confirm that a new claims-file layout maps to the schema you expect.
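To make "headers only" concrete, here is a hypothetical sketch of header classification. The `SCHEMA` contents are illustrative, not the shipped instant_audit schema. Normalization follows the rule stated under the equivalence guarantee: lowercase, then collapse non-alphanumerics to `_`. Whether leading or trailing underscores are stripped is not specified, and this sketch does not strip them. Unknown headers are reported as `unknown`, matching the pass-through-and-flag behavior described later.

```python
import re

# Illustrative schema: normalized header -> action.
SCHEMA = {
    "ssn": "tokenize",
    "member_id": "tokenize",
    "dob": "generalize",
    "zip": "generalize",
    "cpt": "keep",
    "billed_amount": "keep",
    "member_name": "drop",
}

def normalize_header(header: str) -> str:
    # Lowercase, then collapse each run of non-alphanumerics to a single "_".
    return re.sub(r"[^a-z0-9]+", "_", header.lower())

def preview(headers: list[str]) -> dict[str, str]:
    """Map each raw header to an action without reading any row data."""
    return {h: SCHEMA.get(normalize_header(h), "unknown") for h in headers}
```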

5. Upload with the pretokenized header

curl -X POST \
  -H "Authorization: Bearer <your-jwt>" \
  -H "X-Velora-Vault-Mode: pretokenized" \
  -F "file=@claims.tokenized.csv" \
  https://api.velora.health/api/v2/secure-upload/instant_audit

The header tells the server you ran the sidecar. The server then validates that every tokenize-marked column already holds a token-shaped value (<PREFIX>_<16 hex>) and rejects the upload with 400 if any plaintext slipped through.
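You can run a similar check locally before uploading to catch a skipped sidecar run early. This is a sketch under assumptions. It assumes lowercase hex, one expected prefix per column, and that empty cells count as failures. The server's exact rules may differ. `find_plaintext` is a hypothetical name.

```python
import csv
import io
import re

def find_plaintext(csv_text: str, expected_prefixes: dict[str, str]) -> list[tuple[int, str]]:
    """Return (data_row_number, column) for each tokenize column whose value is not token-shaped."""
    # One pattern per column: that column's prefix, "_", then 16 lowercase hex chars.
    patterns = {col: re.compile(re.escape(prefix) + r"_[0-9a-f]{16}")
                for col, prefix in expected_prefixes.items()}
    problems = []
    for row_number, row in enumerate(csv.DictReader(io.StringIO(csv_text)), start=1):
        for column in sorted(patterns):
            value = row.get(column) or ""
            if not patterns[column].fullmatch(value):
                problems.append((row_number, column))
    return problems
```

A wrong prefix in the right column (for example an `NPI_` token in the ssn column) is flagged too, mirroring the semantic-prefix check described under the token format.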

Token format

NPI_a3f7c812e9d40b56
└┬┘ └────────┬───────┘
prefix     16 hex chars (HMAC-SHA-256 truncated)

Tokens are 20 characters: a three-letter semantic prefix, an underscore, and the first 16 hex chars of HMAC-SHA-256(secret, "column:value"). Truncating to 64 bits is effectively collision-free at the row volumes we expect (< 10M rows / customer / year). Prefixes are semantic so the server-side validator can detect "sidecar wasn't run" cases: a column declared as ssn that arrives holding anything other than SSN_<16 hex> is rejected before analytics run.
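The format and the collision claim can both be checked with a few lines of standard-library Python. This is a sketch, not the production scrubber. It assumes the raw 32-byte secret is used directly as the HMAC key and that `column` is already normalized. This page does not spell out the real key derivation. The collision estimate uses the standard birthday bound, 1 − exp(−n(n−1)/2^65), for n distinct values in a 64-bit space.

```python
import hashlib
import hmac
import math

def make_token(key: bytes, column: str, value: str, prefix: str) -> str:
    # Assumption: `key` is the raw secret and `column` is already normalized.
    message = f"{column}:{value}".encode("utf-8")
    digest = hmac.new(key, message, hashlib.sha256).hexdigest()  # 64 lowercase hex chars
    return f"{prefix}_{digest[:16]}"  # first 16 hex chars = 64 bits

def collision_probability(n: int, bits: int = 64) -> float:
    # Birthday bound for n distinct values hashed into a 2**bits space.
    return -math.expm1(-n * (n - 1) / 2 ** (bits + 1))
```

For 10M distinct values, `collision_probability(10_000_000)` is about 2.7 × 10⁻⁶, which supports "effectively collision-free" at that volume.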

Equivalence guarantee

The load-bearing property

Tokens produced by velora-vault for a given (secret, column, value) are byte-equivalent to tokens produced by Velora's server-side scrubber for the same inputs.

If this guarantee breaks, your sidecar tokens stop joining with anything Velora analyzes and analytics return empty results silently. The guarantee is pinned by a cross-language test suite. Four moving parts must stay byte-equivalent: the HMAC algorithm + key derivation, the hex-encoded truncation length (16 chars), the prefix table, and the header normalization function (lowercase + collapse non-alnum to _).
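An equivalence check compares a candidate implementation against a reference across fixed test vectors. The sketch below is hypothetical. A real suite would pin golden token bytes in both repositories rather than compute the reference live. `PREFIXES` is an illustrative subset of the prefix table, and the key is again assumed to be used directly as the HMAC key.

```python
import hashlib
import hmac
import re
from typing import Callable

TokenFn = Callable[[bytes, str, str], str]

PREFIXES = {"ssn": "SSN", "npi": "NPI"}  # illustrative subset of the prefix table

def normalize_header(header: str) -> str:
    # Lowercase, then collapse each run of non-alphanumerics to a single "_".
    return re.sub(r"[^a-z0-9]+", "_", header.lower())

def reference_token(key: bytes, header: str, value: str) -> str:
    column = normalize_header(header)
    digest = hmac.new(key, f"{column}:{value}".encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{PREFIXES[column]}_{digest[:16]}"

def mismatches(candidate: TokenFn, key: bytes,
               vectors: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Return the (header, value) pairs where candidate disagrees with the reference."""
    return [(h, v) for h, v in vectors if candidate(key, h, v) != reference_token(key, h, v)]
```

An implementation that skips header normalization (hashing "SSN:…" instead of "ssn:…") disagrees on every vector. That is exactly the silent-empty-results failure this guarantee exists to prevent.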

Security model

In sidecar mode, here is what each side sees:

Surface	Velora	You
Plaintext PHI	no	yes
HMAC secret	no	yes
Token → original map	no	yes
Tokens (HMAC outputs)	yes	yes
Generalized quasi-identifiers	yes	yes
Analytics columns (CPT, billed amts, carrier id)	yes	yes

HMAC-SHA-256 is one-way. There is no untokenize(token, secret) function; recovery from a token to its original goes through your local map. If you lose the map but keep the secret, you can rebuild the map by re-tokenizing your source data. If you lose both the map and the secret, every token produced under that secret is permanently opaque.

This is a deliberate property: even if Velora is compelled to turn over every byte we hold for a customer, we cannot produce plaintext PHI for sidecar customers. We do not have it.
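Because there is no inverse, rebuilding a lost map means re-running the forward function over your source rows. Here is a minimal sketch, assuming the same token construction as the earlier format description (raw key used directly, already-normalized column names). `rebuild_map` is a hypothetical helper.

```python
import hashlib
import hmac

def make_token(key: bytes, column: str, value: str, prefix: str) -> str:
    digest = hmac.new(key, f"{column}:{value}".encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{prefix}_{digest[:16]}"

def rebuild_map(rows: list[dict[str, str]], key: bytes,
                prefixes: dict[str, str]) -> dict[str, str]:
    """Re-tokenize source rows to recover {token -> original}; HMAC has no inverse."""
    mapping: dict[str, str] = {}
    for row in rows:
        for column, prefix in prefixes.items():
            value = row.get(column)
            if value:  # skip missing or empty cells
                mapping[make_token(key, column, value, prefix)] = value
    return mapping
```

The rebuilt map holds plaintext PHI, just like the original, so store it accordingly.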

What is NOT supported

Be explicit about the trade-offs. Sidecar mode means:

No row-level operational debugging. Our support team cannot inspect the underlying value for a specific token — only you can.
No member-level cross-customer benchmarks. Tokens are per-customer; the same member in two tenants produces two different tokens by design. Provider-level benchmarks (NPI is public per NPPES) are unaffected.
No AI column-mapper for weird headers. Server- and client-mode benefit from a server-side AI mapper; the sidecar runs without egress and ships without it. Feed canonical or schema-aliased headers; unknowns pass through unchanged and are flagged on stderr.
No automatic schema updates. The container does not phone home. Pull a newer image / package tag to adopt a new schema version.

Common 400 errors

`400 vault mode required`

The Velora build serving your tenant may predate the X-Velora-Vault-Mode: pretokenized acceptance path. Alternatively, you sent conflicting mode signals on one request: sending pretokenized together with X-Velora-Vault-Key and X-Velora-Customer-Id is rejected as ambiguous.
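A client-side guard can catch the conflicting-signals case before the request is sent. The sketch below is hypothetical. It assumes that any two mode headers together count as conflicting, which is stricter than the three-header example above. It also matches header names case-insensitively, as HTTP does.

```python
MODE_SIGNALS = ("X-Velora-Customer-Id", "X-Velora-Vault-Key", "X-Velora-Vault-Mode")

def mode_signal_conflicts(headers: dict[str, str]) -> list[str]:
    """Return the mode headers present if more than one is set, else an empty list."""
    present = {name.lower() for name in headers}
    found = [name for name in MODE_SIGNALS if name.lower() in present]
    return found if len(found) > 1 else []
```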

`400 plaintext detected in tokenize column`

A column declared as tokenize-required in the schema arrived holding a value that doesn't match the <PREFIX>_<16 hex> shape. Most common causes: the sidecar wasn't run on this file, it was run with the wrong --product slug, or you're on an old schema version that doesn't recognize a new column. Run velora-vault preview --product=<slug> input.csv to confirm what would happen, then re-tokenize.

For the full demo bundle (synthetic claims CSV + byte-equivalent reference output + bash/Python/Node quickstarts), see Code samples. For BAA scoping and Safe Harbor identifier mapping, see the HIPAA technical reference.