Settle — Technical Architecture Document

Section 0

Architecture Principles

These principles are not aspirational — they are constraints. Every design decision in this document can be traced back to at least one of them. When a tradeoff arises, principles at the top of this list take precedence.

P-01

Defense in Depth

No single control stands alone. Encryption at the field level exists even when disk encryption is present. Auth checks exist at the middleware layer even when a route guard is in place. Every sensitive operation has multiple independent barriers.

P-02

Least Privilege

Database roles, API service accounts, and R2 credentials carry only the permissions they need and nothing more. Executors cannot see co-executor data; workers cannot read payment data. Privilege is granted explicitly, never inherited.

P-03

Fail Fast with Context

Errors surface immediately with enough context to debug. Silent failures — especially in the notification service and benefit scanner — are worse than visible crashes. Every error includes a correlation ID, estate context, and the operation that failed.

P-04

Auditability as a First-Class Feature

Every mutation to an estate record, every access to an SSN, every notification sent, and every rule version used to generate a task plan is written to an immutable audit log. This satisfies legal discovery requirements and enables debugging of long-lived workflows.

P-05

Workflows Must Never Lose State

An estate workflow lasts 16–18 months. The state machine is authoritative and must be recoverable from the database alone. In-memory state is only a performance optimization; it is never the source of truth.

P-06

Separation of Concerns

The rules engine does not send notifications. The notification service does not query benefit databases. Clear service boundaries make each component independently testable, deployable, and replaceable. The monorepo structure enforces this through module boundaries.

P-07

UPL Guardrails in Architecture

Unauthorized Practice of Law is a structural risk. The system guides; it does not advise. Content generated by the rules engine is framed as procedural guidance, not legal counsel. The architecture must make it technically impossible for the system to render a legal opinion about a specific estate situation.

P-08

Reversibility Over Optimization

At Year 1 scale, premature optimization creates accidental complexity. Prefer designs that can be replaced when requirements become clear over those that are theoretically optimal today. Postgres JSON columns over a dedicated rules engine database is the canonical example of this tradeoff.

P-09

Idempotency for External Calls

Any operation that calls an external service — Lob, Resend, NAUPA, institution APIs — must be idempotent. Retries are safe. Duplicate physical letters or duplicate cancellation requests are not. Every outbound call carries an idempotency key derived from the estate and operation context.

Section 1

System Overview

Settle is architecturally a modular monolith deployed on Fly.io, backed by Neon Postgres and Cloudflare R2. It is not a microservices system — at Year 1 scale (500 estates), the operational overhead of distributed services outweighs the benefits. Service boundaries are enforced at the module level within the codebase, making extraction straightforward when scale demands it.

Deployment Topology

All server-side compute runs on Fly.io. The SvelteKit application serves both the frontend (SSR) and the API routes. A separate Fly.io machine type runs background workers (notification service, benefit scanner) on a schedule or queue trigger. Neon Postgres provides the primary data store with automatic branching for staging environments. Cloudflare R2 stores documents, accessed via presigned URLs generated by the API. The browser transfers files using these short-lived URLs and never holds R2 credentials.

Component	Platform	Machine Type	Count (Y1)	Scaling Trigger
api-server	Fly.io	shared-cpu-2x · 512MB	2 (HA)	CPU > 70% for 2 min
notification-worker	Fly.io	shared-cpu-1x · 256MB	1	Queue depth > 50
benefit-scanner	Fly.io	shared-cpu-1x · 256MB	1	Cron-triggered
postgres	Neon	0.25 CU · autoscale	1 primary	Neon autoscale
document-vault	Cloudflare R2	Object storage	—	Serverless
session-store	Upstash Redis	Serverless Redis	—	Serverless

Section 2

Frontend Architecture

The frontend is a SvelteKit application. Static assets and prerendered pages are served from Cloudflare's global CDN edge network, which gives them sub-50ms TTFB globally without any CDN configuration overhead; server-rendered routes run on the Fly.io api-server described under Deployment Topology. The application uses a hybrid rendering strategy: marketing pages are prerendered for SEO and edge caching; unauthenticated flows such as signup and login are server-rendered; the authenticated estate application is server-rendered for initial load, then transitions to client-side navigation.

SSR Strategy

Route Pattern	Rendering Mode	Rationale
/	prerender	Marketing page, fully static, cached at edge
/how-it-works, /pricing	prerender	SEO-critical, no dynamic content
/signup, /login	SSR	CSRF token injection, form handling
/estate/[id]/dashboard	SSR + CSR	Initial data load server-side; subsequent navigation client-side
/estate/[id]/tasks	SSR + CSR	Task plan rendered on server, mutations via client fetch
/estate/[id]/documents	SSR + CSR	File list server-rendered; upload/preview client-side

Grief-Aware UX Architecture

The people using Settle are in one of the most cognitively impaired states a person can experience. The architecture must account for this — not just the visual design, but the caching, session, and error recovery behavior of the application.

Grief fog is a real cognitive phenomenon. Users may start a task, navigate away, return hours later, and not remember what they were doing. The application must preserve context aggressively and never ask a user to re-enter information they have already provided.

Client-Side State Preservation

SvelteKit's +page.server.ts load functions cache estate data with a 60-second stale-while-revalidate window. Form state is auto-saved to localStorage every 5 seconds during intake flows, keyed by estate ID and form name. If the browser tab is closed mid-form, the user resumes exactly where they left off on next visit.

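// src/lib/stores/formPersist.ts
// Persists in-progress form state to localStorage, keyed by estate ID and form name.
interface PersistedForm<T> {
  data: T;
  savedAt: number; // epoch milliseconds of the last auto-save
}

export function createPersistedForm<T = unknown>(estateId: string, formKey: string) {
  const storageKey = `settle:form:${estateId}:${formKey}`;
  // localStorage does not exist during SSR, so every method is a no-op on the server.
  const hasStorage = typeof localStorage !== 'undefined';

  return {
    // Returns the last saved snapshot, or null if none exists or it cannot be parsed.
    restore: (): PersistedForm<T> | null => {
      if (!hasStorage) return null;
      try {
        const raw = localStorage.getItem(storageKey);
        return raw ? (JSON.parse(raw) as PersistedForm<T>) : null;
      } catch {
        return null; // corrupted entry: treat as no saved state
      }
    },
    save: (data: T) => {
      if (!hasStorage) return;
      localStorage.setItem(storageKey, JSON.stringify({ data, savedAt: Date.now() }));
    },
    clear: () => {
      if (hasStorage) localStorage.removeItem(storageKey);
    }
  };
}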

Session Continuity for Long-Lived Workflows

Estate workflows span 16–18 months. Standard session expiry of 24–72 hours is inappropriate. Sessions are configured with a 30-day sliding window. The session cookie carries only the session ID; all session data (user ID, estate ID, role, last active page) lives in Redis. On each authenticated request, the session TTL is refreshed. The "last active page" is stored and presented as a resume prompt on next login.
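The sliding window described above can be sketched as follows. This is a minimal illustration, not the Settle implementation: an in-memory map stands in for Redis, and the names SessionData, InMemorySessionStore, and touch are hypothetical.

```typescript
// Sketch only: an in-memory stand-in for the Redis session store.
// All names here are illustrative assumptions.
const SESSION_TTL_MS = 30 * 24 * 60 * 60 * 1000; // 30-day sliding window

interface SessionData {
  userId: string;
  estateId?: string;
  role: string;
  lastActivePage?: string;
}

class InMemorySessionStore {
  private sessions = new Map<string, { data: SessionData; expiresAt: number }>();

  create(sessionId: string, data: SessionData, now: number): void {
    this.sessions.set(sessionId, { data, expiresAt: now + SESSION_TTL_MS });
  }

  // Called on each authenticated request: records the page for the resume
  // prompt and slides the expiry forward by the full TTL.
  touch(sessionId: string, page: string, now: number): SessionData | null {
    const entry = this.sessions.get(sessionId);
    if (!entry || entry.expiresAt <= now) {
      this.sessions.delete(sessionId);
      return null; // expired or unknown: the user must log in again
    }
    entry.data.lastActivePage = page;
    entry.expiresAt = now + SESSION_TTL_MS;
    return entry.data;
  }
}
```

Passing the clock in as `now` keeps the expiry logic testable without waiting 30 days; against Redis, the equivalent of `touch` would be a read followed by resetting the key's TTL.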

Offline Support

Families in rural areas, or using cellular connections at a funeral home, may have intermittent connectivity. The application uses a Service Worker registered at the app root to cache the application shell and the current estate's task plan for offline viewing. Mutations made offline are queued and replayed when connectivity is restored.

Asset Type	Cache Strategy	TTL
App shell (HTML, CSS, JS)	Cache-first after first load	Stale until new deploy
Estate task plan	Stale-while-revalidate	10 minutes
Document metadata	Stale-while-revalidate	5 minutes
Document files (PDFs)	Cache on demand (explicit user action)	1 hour
API responses (GET)	Network-first with cache fallback	60 seconds

Offline mutation queue: When a user marks a task complete while offline, the mutation is stored in IndexedDB. A background sync event replays the queue when connectivity is restored. The UI shows a subtle "syncing" indicator. Conflict resolution is last-write-wins at the task level — an acceptable tradeoff given the low concurrency of a single-estate workflow.
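One way to apply the same last-write-wins rule on the client before replay is sketched below. It is an assumption-laden illustration: a plain array stands in for the IndexedDB-backed queue, `send` stands in for the call to the task PATCH endpoint, and the names QueuedMutation, collapseQueue, and replayQueue are hypothetical.

```typescript
// Sketch only: an array stands in for the IndexedDB queue, and `send`
// stands in for the network call. All names are illustrative.
interface QueuedMutation {
  taskId: string;
  status: string;
  queuedAt: number; // client timestamp in ms
}

// Last-write-wins at the task level: keep only the newest queued mutation per task.
function collapseQueue(queue: QueuedMutation[]): QueuedMutation[] {
  const latest = new Map<string, QueuedMutation>();
  for (const m of queue) {
    const prev = latest.get(m.taskId);
    if (!prev || m.queuedAt >= prev.queuedAt) latest.set(m.taskId, m);
  }
  return Array.from(latest.values()).sort((a, b) => a.queuedAt - b.queuedAt);
}

// Replays the collapsed queue in order. Mutations that fail to send are
// returned so the caller can keep them queued for the next sync event.
async function replayQueue(
  queue: QueuedMutation[],
  send: (m: QueuedMutation) => Promise<boolean>
): Promise<QueuedMutation[]> {
  const remaining: QueuedMutation[] = [];
  for (const m of collapseQueue(queue)) {
    const ok = await send(m).catch(() => false);
    if (!ok) remaining.push(m);
  }
  return remaining;
}
```

Collapsing before replay means a user who toggled the same task several times while offline produces a single request on reconnect.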

Section 3

Backend API

The API is implemented as SvelteKit server routes (+server.ts files), co-located with the frontend. This is not a tradeoff in Year 1 — it means a single deployment, shared TypeScript types between server and client, and zero serialization overhead. The API layer follows REST conventions with resource-oriented URLs and standard HTTP semantics.

Authentication and RBAC

Authentication is session-based with cookies. Sessions are stored in Upstash Redis with a 30-day sliding expiry. The session token is a cryptographically random 256-bit value (generated with crypto.getRandomValues). The session payload includes the user ID, estate ID (if applicable), role, and a fingerprint of the request IP and user agent for anomaly detection.
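A minimal sketch of token generation and request fingerprinting under these constraints follows. The function names are hypothetical, and hashing the IP and user agent with SHA-256 is an assumption; the document does not specify how the fingerprint is computed.

```typescript
import { createHash, webcrypto } from 'node:crypto';

// Sketch only: function names are illustrative assumptions.

// 256 bits from the platform CSPRNG, hex-encoded for use as the cookie value.
function newSessionToken(): string {
  const bytes = webcrypto.getRandomValues(new Uint8Array(32));
  return Buffer.from(bytes).toString('hex');
}

// Assumed fingerprint scheme: SHA-256 over IP and user agent. Storing a hash
// rather than the raw values keeps the session payload compact.
function requestFingerprint(ip: string, userAgent: string): string {
  return createHash('sha256').update(`${ip}\n${userAgent}`).digest('hex');
}
```

A fingerprint mismatch on a later request would be a signal for anomaly detection rather than an automatic logout, since IP addresses change legitimately on mobile connections.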

Why session auth over JWTs? See ADR-004. Short version: estate workflows are long-lived. JWTs with short expiry require refresh token infrastructure that's more complex than the problem it solves. Sessions can be invalidated instantly on the server, which matters critically when an attorney is removed from an estate or a family reports unauthorized access.

Role Definitions

Role	Description	Key Permissions	Restrictions
executor	Named executor of the estate	Full CRUD on estate, tasks, documents, notifications	Cannot delete estate record (soft delete only)
co_executor	Named co-executor	Same as executor; all actions are co-attributed	Cannot remove executor or self
attorney	Legal counsel on the estate	Read all; write case notes; upload documents	Cannot send notifications or modify task status
family_viewer	Family member with view-only access	Read task status, document list (not contents), estate summary	No writes; no PII fields; no financial data
admin	Settle staff (internal only)	All estates; audit log access; legal rule management	All access is logged; no unilateral PII access

RBAC is enforced at two layers: the route handler level (SvelteKit hooks check the session role before the handler executes) and the database level (Postgres row-level security policies that reference the current session's estate ID). This means a misconfigured route cannot accidentally return data from another estate — the database policy will reject the query.
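The route-layer half of this check can be sketched as a pure function. This is an illustration only: the role names come from the table above, but SessionInfo, isAuthorized, and the shape of the allowed-role list are assumptions, and admin access (all estates, fully logged) would need a separate path that is not shown.

```typescript
// Sketch only: role names come from the role table; everything else is illustrative.
type Role = 'executor' | 'co_executor' | 'attorney' | 'family_viewer' | 'admin';

interface SessionInfo {
  userId: string;
  estateId: string | null;
  role: Role;
}

// Route-layer check: the session must belong to the requested estate and
// hold one of the roles the route allows. The database row-level security
// policy is the second, independent barrier and is not modeled here.
function isAuthorized(
  session: SessionInfo | null,
  requestedEstateId: string,
  allowedRoles: readonly Role[]
): boolean {
  if (!session) return false;
  if (session.estateId !== requestedEstateId) return false;
  return allowedRoles.includes(session.role);
}
```

In a SvelteKit hook, a false result would translate to a 401 or 403 response before the route handler runs.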

Key API Endpoints

Estate Management

Method	Path	Description	Required Role
POST	/api/estates	Create new estate (intake)	Authenticated user
GET	/api/estates/:id	Get estate summary	executor, co_executor, attorney, family_viewer
PATCH	/api/estates/:id	Update estate (assets discovered, status change)	executor, co_executor
GET	/api/estates/:id/tasks	Get full task plan with rule version metadata	executor, co_executor, attorney, family_viewer
POST	/api/estates/:id/tasks/regenerate	Regenerate task plan (new assets/state change)	executor, co_executor
PATCH	/api/estates/:id/tasks/:taskId	Update task status, add notes	executor, co_executor
POST	/api/estates/:id/documents	Request presigned upload URL	executor, co_executor, attorney
POST	/api/estates/:id/notifications	Queue notification (tier specified in body)	executor, co_executor
GET	/api/estates/:id/benefits	Get benefit scan results	executor, co_executor
POST	/api/estates/:id/benefits/scan	Trigger benefit scan for estate	executor, co_executor

Sample: Create Estate

// POST /api/estates
// Request Body:
{
  "deceased": {
    "firstName": "Margaret",
    "lastName": "Chen",
    "dateOfBirth": "1942-03-15",
    "dateOfDeath": "2026-03-01",
    "stateOfResidence": "CA",
    "ssn": "XXX-XX-XXXX"  // encrypted in transit and at rest
  },
  "executor": {
    "relationship": "child"
  },
  "estateProfile": {
    "hasRealProperty": true,
    "estimatedAssetValue": "250000-500000",
    "hasWill": true,
    "hasTrust": false
  }
}

// 201 Created Response:
{
  "estateId": "est_01j9k3m...",
  "status": "intake_complete",
  "taskPlan": {
    "generatedAt": "2026-04-03T14:22:00Z",
    "ruleSetVersion": "CA-2026.1",
    "taskCount": 34,
    "requiredDeadlines": [
      { "task": "file_probate_petition", "dueByDays": 30 }
    ]
  }
}

// 422 Error (invalid state):
{
  "error": "INVALID_STATE_CODE",
  "message": "stateOfResidence must be a valid 2-letter US state code",
  "field": "deceased.stateOfResidence"
}

Section 4

State Rules Engine

The rules engine is the most architecturally distinctive component in Settle. It must encode legally correct, state-specific probate rules for all 50 states, remain maintainable by non-engineers (or at least by lawyers working with engineers), support versioning and rollback, and produce an auditable trail of exactly which rule set generated which task plan for a given estate.

This is a legal knowledge base, not a workflow engine. The rules engine's job is to answer one question: given the characteristics of this estate, in this state, what tasks must be completed and in what order? It does not manage task execution — that is the estate state machine. The distinction matters for testing, maintenance, and UPL compliance.

Rule Set Schema

Each state has a versioned JSON rule set stored in Postgres. The JSON document defines the full task graph for that state, with conditions that filter and customize tasks based on estate attributes. The schema is designed to be readable by a paralegal reviewing the document — not just a developer.

// Table: state_rule_sets
// Column: rules JSONB

{
  "state": "CA",
  "version": "CA-2026.1",
  "effectiveDate": "2026-01-01",
  "legalSources": [
    "Cal. Prob. Code § 13100",
    "Cal. Prob. Code § 8000"
  ],
  "probateThreshold": {
    "grossEstateValueCents": 18450000,
    "realPropertyIncluded": true,
    "source": "Cal. Prob. Code § 13100 (adjusted annually)"
  },
  "smallEstateAffidavit": {
    "available": true,
    "maxValueCents": 18450000,
    "waitDays": 40,
    "form": "DE-305"
  },
  "tasks": [
    {
      "id": "obtain_death_certificates",
      "category": "immediate",
      "priority": 1,
      "title": "Order certified death certificates",
      "description": "Order at least 10 certified copies from VitalChek...",
      "conditions": [],
      "deadlineDays": 7,
      "requiredFor": ["open_estate_account", "notify_ssa"],
      "legalGuidance": "Procedural step. Order via VitalChek or local registrar."
    },
    {
      "id": "file_probate_petition",
      "category": "probate",
      "priority": 2,
      "title": "File petition for probate",
      "conditions": [
        {
          "field": "estate.requiresProbate",
          "operator": "eq",
          "value": true
        }
      ],
      "deadlineDays": 30,
      "court": "Superior Court, Probate Division",
      "forms": ["DE-111", "DE-140"],
      "filingFee": {
        "baseFeeCents": 39500,
        "source": "Cal. Gov. Code § 70650"
      }
    }
  ]
}

Rule Evaluation

The rules engine evaluates a rule set against an EstateContext object — a snapshot of all estate attributes relevant to task generation. Evaluation is a pure function with no side effects: given the same EstateContext and RuleSet, it always produces the same task list. This property makes it trivially testable.

// src/lib/rules/evaluator.ts

export interface EstateContext {
  estateId: string;
  stateCode: string;
  dateOfDeath: Date;
  estimatedGrossValueCents: number;
  hasRealProperty: boolean;
  hasWill: boolean;
  hasTrust: boolean;
  hasMinorChildren: boolean;
  hasVeteranStatus: boolean;
  hasBusinessInterests: boolean;
  requiresProbate: boolean; // derived: grossValue > threshold
}

export function evaluateRuleSet(
  context: EstateContext,
  ruleSet: StateRuleSet
): GeneratedTask[] {
  const tasks: GeneratedTask[] = [];

  for (const rule of ruleSet.tasks) {
    const conditionsMet = rule.conditions.every(
      (c) => evaluateCondition(c, context)
    );

    if (conditionsMet) {
      tasks.push({
        ruleId: rule.id,
        ruleSetVersion: ruleSet.version,
        title: rule.title,
        description: rule.description,
        category: rule.category,
        priority: rule.priority,
        // Explicit null check so that a 0-day deadline is not dropped as falsy
        dueDateCalc: rule.deadlineDays != null
          ? addDays(context.dateOfDeath, rule.deadlineDays)
          : null,
        legalGuidance: rule.legalGuidance,
        forms: rule.forms ?? [],
        status: 'pending'
      });
    }
  }

  return tasks.sort((a, b) => a.priority - b.priority);
}

function evaluateCondition(
  condition: RuleCondition,
  context: EstateContext
): boolean {
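  const value = getNestedValue(context, condition.field);
  switch (condition.operator) {
    case 'eq': return value === condition.value;
    // Ordering comparisons apply only to numbers; any other type fails the condition
    case 'gt': return typeof value === 'number' && value > condition.value;
    case 'lt': return typeof value === 'number' && value < condition.value;
    case 'in': return Array.isArray(condition.value) && condition.value.includes(value);
    default: throw new Error(`Unknown operator: ${condition.operator}`);
  }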
}
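The evaluator calls getNestedValue, which is not shown in this document. One possible shape is sketched below as an assumption. Note that the sample rule uses the path "estate.requiresProbate", so this sketch assumes the context is exposed to conditions under an `estate` key; the actual lookup convention may differ.

```typescript
// Sketch only: one possible getNestedValue. The real helper is not shown
// in this document, so this signature and behavior are assumptions.
function getNestedValue(root: unknown, path: string): unknown {
  let current: unknown = root;
  for (const key of path.split('.')) {
    // Stop early if the path runs through a missing or non-object value
    if (current === null || typeof current !== 'object') return undefined;
    current = (current as Record<string, unknown>)[key];
  }
  return current;
}

// Assumed wrapping: conditions address the context under an `estate` key.
const exampleScope = {
  estate: { requiresProbate: true, estimatedGrossValueCents: 25000000 }
};
const requiresProbate = getNestedValue(exampleScope, 'estate.requiresProbate');
```

Returning undefined for a missing path, rather than throwing, means a condition on an absent field simply fails to match.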

Versioning and the Audit Trail

Rule sets are immutable once published. When a legal change requires updating California's rules, a new version CA-2026.2 is created — the old version is never modified. Every generated task carries the ruleSetVersion that produced it. This creates a complete audit trail: any task in any estate can be traced to the exact rule text that created it, at the time it was created.

Column	Purpose
state_rule_sets.version	Unique identifier, format `STATE-YYYY.N` (e.g., `CA-2026.2`)
state_rule_sets.status	`draft`, `review`, `published`, `superseded`
state_rule_sets.superseded_by	FK to newer version; forms a linked list of rule history
tasks.rule_set_version	Captured at task generation time; never updated
estate_events.rule_regeneration	Audit event when task plan is regenerated; records old and new versions

When new assets are discovered mid-process (e.g., a retirement account found three months in), the executor can trigger task plan regeneration. The engine runs against the current published rule set for the estate's state, adds any new tasks that aren't already present, and emits an estate_event recording the delta. Existing tasks are never deleted during regeneration — they may be superseded but the record of their creation is preserved.
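The add-only behavior of regeneration can be sketched as a delta computation. This is a hypothetical illustration: the field names mirror the tasks table, but computeRegenerationDelta and its return shape are assumptions, and matching tasks by rule ID alone is a simplification.

```typescript
// Sketch only: field names mirror the tasks table; the function name and
// return shape are illustrative assumptions.
interface ExistingTask {
  ruleId: string;
  ruleSetVersion: string;
}

interface CandidateTask {
  ruleId: string;
  ruleSetVersion: string;
  title: string;
}

interface RegenerationDelta {
  added: CandidateTask[];      // tasks to insert
  unchangedRuleIds: string[];  // already present; left untouched
}

// Compares freshly generated tasks against what the estate already has.
// Nothing is ever removed: the delta only lists additions.
function computeRegenerationDelta(
  existing: ExistingTask[],
  generated: CandidateTask[]
): RegenerationDelta {
  const presentRuleIds = new Set(existing.map((t) => t.ruleId));
  return {
    added: generated.filter((t) => !presentRuleIds.has(t.ruleId)),
    unchangedRuleIds: generated
      .filter((t) => presentRuleIds.has(t.ruleId))
      .map((t) => t.ruleId)
  };
}
```

The resulting delta is what the estate_event for the regeneration would record, alongside the old and new rule set versions.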

Rule Update Process

1

Legal Change Detected

A paralegal or attorney identifies a statutory change (new probate threshold, new required form). They file a GitHub issue tagged legal-rule-change with the cite and effective date.

2

Draft New Rule Version

An engineer creates a new JSON rule set document by copying the current published version and making the required changes. The new version is inserted with status = 'draft' and effective_date set to the statutory effective date.

3

Automated Test Suite

Every state rule set has a corresponding test file with estate fixtures covering the key branching conditions. CI runs all rule tests on every PR. A rule change that breaks an existing fixture must be explicitly acknowledged in the PR description.

4

Legal Review

The draft rule set and the diff from the previous version are reviewed by a licensed attorney in the relevant state. The Settle admin interface provides a human-readable diff view of the rule changes. Review approval is recorded in the rule_set_reviews table with reviewer identity and timestamp.

5

Publish

On or after the effective_date, the admin publishes the new version. The old version's superseded_by is set. New estates in that state will use the new version immediately. Existing estates continue with their current task plan unless the executor triggers regeneration.

6

Notification to Affected Estates

If the rule change affects currently open estates (e.g., a new required filing), the system identifies affected estates and surfaces a notification: "A legal rule change in [State] may affect your estate. Review your task plan." No tasks are automatically added without executor acknowledgment.

UPL Guardrail: The legalGuidance field in every task rule is a procedural description, not legal advice. The schema enforces that this field cannot exceed 500 characters and must pass a content review that rejects second-person advisory language ("you should," "you must consult"). The admin UI renders a warning if submitted guidance contains these patterns.
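The automated part of this check can be sketched as a small validator. The 500-character limit and the two example phrases come from the guardrail above; the pattern list, the function name checkLegalGuidance, and the result shape are assumptions, and a real review would likely cover more phrasings.

```typescript
// Sketch only: the limit and the two phrases come from the guardrail text;
// the rest is an illustrative assumption.
const MAX_GUIDANCE_LENGTH = 500;
const ADVISORY_PATTERNS: RegExp[] = [/\byou should\b/i, /\byou must consult\b/i];

interface GuidanceCheck {
  ok: boolean;
  problems: string[];
}

function checkLegalGuidance(text: string): GuidanceCheck {
  const problems: string[] = [];
  if (text.length > MAX_GUIDANCE_LENGTH) {
    problems.push(`exceeds ${MAX_GUIDANCE_LENGTH} characters`);
  }
  for (const pattern of ADVISORY_PATTERNS) {
    // Patterns carry no global flag, so test() is stateless across calls
    if (pattern.test(text)) problems.push(`advisory language matched: ${pattern.source}`);
  }
  return { ok: problems.length === 0, problems };
}
```

Returning the list of problems, rather than a bare boolean, lets the admin UI show the reviewer exactly which pattern triggered the warning.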

Section 5

Data Architecture

Postgres via Neon is the primary data store for all structured data. R2 stores documents. There is no separate analytics database at Year 1 — Neon's read replica capability provides read scaling without operational overhead. The schema is designed for the long-lived, event-sourced nature of estate workflows.

Core Schema

estates Primary Entity

CREATE TABLE estates (
  id              TEXT PRIMARY KEY DEFAULT gen_estate_id(), -- 'est_' prefix + ulid
  status          TEXT NOT NULL DEFAULT 'intake',           -- intake|active|closing|closed|archived
  state_code      CHAR(2) NOT NULL,
  opened_at       TIMESTAMPTZ NOT NULL DEFAULT now(),
  closed_at       TIMESTAMPTZ,
  rule_set_version TEXT,                                     -- FK: state_rule_sets.version
  estimated_value_cents  BIGINT,
  -- Postgres generated columns cannot contain subqueries or reference other
  -- tables, so this is a plain column. It must be maintained by the application
  -- (or a trigger) as: estimated_value_cents > probate_threshold_cents of the
  -- published rule set for state_code.
  requires_probate       BOOLEAN,
  metadata        JSONB NOT NULL DEFAULT '{}'               -- flexible attributes
);

CREATE INDEX idx_estates_state ON estates(state_code);
CREATE INDEX idx_estates_status ON estates(status)
  WHERE status NOT IN ('closed','archived');

deceased Sensitive PII

CREATE TABLE deceased (
  id              TEXT PRIMARY KEY DEFAULT gen_id(),
  estate_id       TEXT NOT NULL REFERENCES estates(id),
  first_name      TEXT NOT NULL,
  last_name       TEXT NOT NULL,
  date_of_birth   DATE,
  date_of_death   DATE NOT NULL,
  state_of_residence CHAR(2) NOT NULL,
  -- Column-level encrypted fields (AES-256-GCM, application-layer)
  ssn_encrypted   BYTEA,                                    -- encrypted SSN
  ssn_last4       CHAR(4),                                  -- unencrypted for display/lookup
  -- Access to ssn_encrypted is audit-logged at application layer
  created_at      TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE UNIQUE INDEX idx_deceased_estate ON deceased(estate_id);

tasks Core Workflow

CREATE TABLE tasks (
  id              TEXT PRIMARY KEY DEFAULT gen_id(),
  estate_id       TEXT NOT NULL REFERENCES estates(id),
  rule_id         TEXT NOT NULL,                            -- e.g. 'file_probate_petition'
  rule_set_version TEXT NOT NULL,                           -- immutable at creation
  title           TEXT NOT NULL,
  category        TEXT NOT NULL,                            -- immediate|probate|financial|notifications|etc
  status          TEXT NOT NULL DEFAULT 'pending',          -- pending|in_progress|complete|skipped|n_a
  priority        INTEGER NOT NULL DEFAULT 50,
  due_date        DATE,
  completed_at    TIMESTAMPTZ,
  completed_by    TEXT REFERENCES users(id),
  notes           TEXT,
  is_generated    BOOLEAN NOT NULL DEFAULT TRUE,             -- false = manually added
  superseded_at   TIMESTAMPTZ,                              -- set if task replaced on regeneration
  created_at      TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE INDEX idx_tasks_estate ON tasks(estate_id) WHERE superseded_at IS NULL;
CREATE INDEX idx_tasks_status  ON tasks(estate_id, status)
  WHERE status IN ('pending','in_progress');

state_rule_sets Legal Knowledge Base

CREATE TABLE state_rule_sets (
  version               TEXT PRIMARY KEY,                   -- 'CA-2026.1'
  state_code            CHAR(2) NOT NULL,
  status                TEXT NOT NULL DEFAULT 'draft',      -- draft|review|published|superseded
  effective_date        DATE NOT NULL,
  superseded_by         TEXT REFERENCES state_rule_sets(version),
  probate_threshold_cents BIGINT NOT NULL,
  rules                 JSONB NOT NULL,                     -- full rule set document
  reviewed_by           TEXT,                              -- attorney name on record
  reviewed_at           TIMESTAMPTZ,
  published_by          TEXT REFERENCES users(id),
  published_at          TIMESTAMPTZ,
  created_at            TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE INDEX idx_rules_state_published ON state_rule_sets(state_code)
  WHERE status = 'published';

notifications Audit-Critical

CREATE TABLE notifications (
  id              TEXT PRIMARY KEY DEFAULT gen_id(),
  estate_id       TEXT NOT NULL REFERENCES estates(id),
  institution_id  TEXT REFERENCES institutions(id),
  tier            SMALLINT NOT NULL CHECK (tier IN (1, 2, 3)),
  channel         TEXT NOT NULL,                            -- api|mail|phone_script
  status          TEXT NOT NULL DEFAULT 'queued',           -- queued|processing|sent|delivered|failed
  idempotency_key TEXT UNIQUE NOT NULL,                    -- prevents duplicate sends
  external_id     TEXT,                                     -- Lob letter ID, API confirmation, etc.
  queued_at       TIMESTAMPTZ NOT NULL DEFAULT now(),
  sent_at         TIMESTAMPTZ,
  delivered_at    TIMESTAMPTZ,
  failed_at       TIMESTAMPTZ,
  failure_reason  TEXT,
  retry_count     SMALLINT NOT NULL DEFAULT 0,
  payload         JSONB NOT NULL                            -- full request payload, redacted of PII
);

CREATE INDEX idx_notif_estate   ON notifications(estate_id);
CREATE INDEX idx_notif_status   ON notifications(status) WHERE status IN ('queued','processing');
-- No separate unique index is needed on idempotency_key: the UNIQUE column
-- constraint above already creates one.

audit_log Immutable · Append-Only

CREATE TABLE audit_log (
  id              BIGSERIAL PRIMARY KEY,
  estate_id       TEXT,                                     -- nullable for admin/system events
  actor_id        TEXT NOT NULL,                            -- user ID or 'system'
  actor_role      TEXT NOT NULL,
  action          TEXT NOT NULL,                            -- 'read_ssn', 'task_complete', etc.
  resource_type   TEXT NOT NULL,
  resource_id     TEXT,
  ip_address      INET,
  user_agent      TEXT,
  metadata        JSONB NOT NULL DEFAULT '{}',
  created_at      TIMESTAMPTZ NOT NULL DEFAULT now()
);

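-- Append-only enforced via Postgres row-level security. RLS does not apply to
-- superusers or (unless FORCE ROW LEVEL SECURITY is set) to the table owner,
-- so the application must connect as a separate, non-owner role.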
ALTER TABLE audit_log ENABLE ROW LEVEL SECURITY;
CREATE POLICY audit_insert_only ON audit_log
  FOR INSERT WITH CHECK (TRUE);
CREATE POLICY audit_no_update ON audit_log
  FOR UPDATE USING (FALSE);
CREATE POLICY audit_no_delete ON audit_log
  FOR DELETE USING (FALSE);

CREATE INDEX idx_audit_estate   ON audit_log(estate_id) WHERE estate_id IS NOT NULL;
CREATE INDEX idx_audit_action   ON audit_log(action, created_at DESC);
CREATE INDEX idx_audit_actor    ON audit_log(actor_id, created_at DESC);

Field-Level Encryption

The most sensitive fields — SSN, financial account numbers, medical record numbers — are encrypted at the application layer before being written to Postgres. Disk encryption (provided by Neon) is a necessary baseline but insufficient alone: it does not protect against a compromised database credential or a SQL injection vulnerability that returns raw rows. Field-level encryption ensures that even a full database dump is useless without the encryption keys.

Encryption Approach

Algorithm: AES-256-GCM with a random 96-bit nonce per encryption operation
Key Management: Per-estate derived keys using HKDF from a master key stored in Fly.io Secrets. The master key can be rotated without a blocking re-encryption of all records; re-encryption runs as a background job.
Storage format: BYTEA column stores nonce (12 bytes) || ciphertext || auth_tag (16 bytes)
Key derivation: HKDF(masterKey, salt=estateId, info="settle-v1-field-encryption")

// src/lib/crypto/fieldEncryption.ts

import { createCipheriv, createDecipheriv, hkdf, randomBytes } from 'node:crypto';
// Not shown in this document and assumed to exist in the codebase:
//   deriveEstateKey(estateId): per-estate key via hkdf over MASTER_KEY
//   auditLog: the audit log writer

const MASTER_KEY = Buffer.from(process.env.FIELD_ENCRYPTION_KEY!, 'hex');
const ALGORITHM = 'aes-256-gcm';
const NONCE_BYTES = 12;
const TAG_BYTES = 16;

export async function encryptField(
  plaintext: string,
  estateId: string
): Promise<Buffer> {
  const derivedKey = await deriveEstateKey(estateId);
  // A fresh random nonce per operation; a nonce must never repeat under the same key
  const nonce = randomBytes(NONCE_BYTES);
  const cipher = createCipheriv(ALGORITHM, derivedKey, nonce);
  const encrypted = Buffer.concat([
    cipher.update(plaintext, 'utf8'),
    cipher.final()
  ]);
  const authTag = cipher.getAuthTag();
  // Pack: nonce (12) + ciphertext + authTag (16)
  return Buffer.concat([nonce, encrypted, authTag]);
}

export async function decryptField(
  cipherBuffer: Buffer,
  estateId: string
): Promise<string> {
  // Audit log this access BEFORE decryption
  await auditLog.write({ action: 'field_decryption', estateId });

  const derivedKey = await deriveEstateKey(estateId);
  const nonce = cipherBuffer.subarray(0, NONCE_BYTES);
  const authTag = cipherBuffer.subarray(cipherBuffer.length - TAG_BYTES);
  const ciphertext = cipherBuffer.subarray(NONCE_BYTES, cipherBuffer.length - TAG_BYTES);

  const decipher = createDecipheriv(ALGORITHM, derivedKey, nonce);
  decipher.setAuthTag(authTag);
  // final() throws if the auth tag does not verify, so tampered data never decodes.
  // Concatenating buffers before decoding avoids splitting multi-byte UTF-8 characters.
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString('utf8');
}
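The listing above calls deriveEstateKey without showing it. A minimal sketch consistent with the key derivation line under Encryption Approach follows. It is an assumption, not the actual helper: SHA-256 as the HKDF hash is not stated in this document, and this version takes the master key as a parameter and runs synchronously to keep the example self-contained.

```typescript
import { hkdfSync } from 'node:crypto';

// Sketch only. Assumptions: SHA-256 as the HKDF hash, a synchronous call,
// and the master key passed in explicitly rather than read from the environment.
const HKDF_INFO = 'settle-v1-field-encryption';
const KEY_BYTES = 32; // AES-256 key length

function deriveEstateKey(masterKey: Buffer, estateId: string): Buffer {
  // salt = estateId, info = fixed context string, per the key derivation above
  const derived = hkdfSync(
    'sha256',
    masterKey,
    Buffer.from(estateId, 'utf8'),
    HKDF_INFO,
    KEY_BYTES
  );
  return Buffer.from(derived);
}
```

Because the estate ID is the salt, two estates never share a key, and the derivation is deterministic, so no per-estate key needs to be stored.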

Fields That Require Encryption

Table	Column	Sensitivity	Display Fallback
deceased	ssn_encrypted	Critical	ssn_last4 (unencrypted)
financial_accounts	account_number_encrypted	Critical	account_last4
financial_accounts	routing_number_encrypted	High	institution name
benefits	policy_number_encrypted	High	insurer name + type
deceased	medical_record_numbers	High	provider name only

Document Vault (Cloudflare R2)

Documents — death certificates, wills, financial statements — are stored in R2 with server-side encryption (SSE-C using a per-estate key). The API never streams document content directly; instead it issues short-lived (5-minute) presigned URLs for download and upload. Documents are organized in R2 by a path scheme that does not expose PII in the object key.

R2 path format: estates/{estateId}/{documentType}/{ulid}.{ext}

On upload, the Document Processor worker intercepts the R2 object:created event, performs virus scanning (ClamAV via a Fly.io sidecar), attempts OCR classification, and writes document metadata to Postgres. If the document fails the virus scan, it is moved to a quarantine prefix and the estate is notified.

Section 6

Notification Service Architecture

The notification service is the most operationally complex component in Settle. It must send actual cancellation requests to real institutions, generate and mail physical letters via Lob, and produce call scripts — each with fundamentally different reliability, latency, and tracking requirements. A failure in Tier 1 must not affect Tier 2; a physical letter sent twice is a worse failure mode than a digital request sent twice.

Tier 1

Automated API Cancellation

Direct API calls to subscription services
Netflix, Spotify, utility APIs
Idempotent via service-side dedup
Confirmation number stored in DB
Retry on 5xx, not on 4xx
Success = 200/204 from service

Tier 2

Managed Physical Mail

Lob API for letter generation
Death cert PDF merged into template
Lob provides USPS tracking
Delivery webhook updates status
Never send twice (idempotency key)
CASS-certified addressing required

Tier 3

Guided Call Scripts

Purely computational — no external calls
Template rendering + personalization
Returns structured script object
Includes estimated hold times
No retry needed (idempotent by nature)
Logged to notifications table

Worker Architecture

The notification worker is a long-running Fly.io process that polls the pg-boss job queue. It is isolated from the API server to prevent notification failures from impacting user-facing requests. The worker processes jobs by tier in priority order: Tier 2 (physical mail, longest lead time) processes first, then Tier 1, then Tier 3.

Tier 1 — API Cancellation

Tier 1 notifications call institution APIs directly to cancel subscriptions or notify of a death. The key engineering challenges are: (1) each institution has a different API contract, (2) idempotency must be enforced even if the request succeeds but the network drops before the response arrives, and (3) the system must distinguish between "institution confirmed cancellation" and "institution returned 200 but nothing happened."

// Institution adapter interface — each institution implements this
interface InstitutionAdapter {
  institutionId: string;
  sendNotification(
    payload: NotificationPayload,
    idempotencyKey: string
  ): Promise<NotificationResult>;
}

interface NotificationResult {
  success: boolean;
  externalId?: string;       // confirmation number from institution
  confirmedAction?: 'cancelled' | 'notified' | 'pending_review';   // what the institution confirmed it did
  errorCode?: string;
  retryable: boolean;          // false for 4xx, true for 5xx/network errors
}

// Idempotency key construction
const idempotencyKey = `tier1:${estateId}:${institutionId}:${notificationType}`;
// This key is stable across retries. Postgres UNIQUE constraint on
// notifications.idempotency_key ensures no duplicate processing.
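The retry rule and the "200 but nothing happened" case can be sketched as a small classifier that an adapter might call after normalizing the institution's reply. The `AdapterResponse` shape and the error code strings are assumptions for illustration; only the result fields mirror the interface above.

```typescript
type ConfirmedAction = "cancelled" | "notified" | "pending_review";

interface Tier1Result {
  success: boolean;
  externalId?: string;
  confirmedAction?: ConfirmedAction;
  errorCode?: string;
  retryable: boolean;
}

// Hypothetical normalized view of an institution's reply.
interface AdapterResponse {
  status: number; // HTTP status; 0 means no response arrived (network failure)
  confirmationNumber?: string;
  confirmedAction?: ConfirmedAction;
}

function classifyResponse(res: AdapterResponse): Tier1Result {
  // Network failures and 5xx are transient: retry with the same idempotency key.
  if (res.status === 0) {
    return { success: false, errorCode: "NETWORK_ERROR", retryable: true };
  }
  if (res.status >= 500) {
    return { success: false, errorCode: `HTTP_${res.status}`, retryable: true };
  }
  // Anything else outside 2xx (including 4xx) will not improve on retry.
  if (res.status < 200 || res.status >= 300) {
    return { success: false, errorCode: `HTTP_${res.status}`, retryable: false };
  }
  // A 2xx without a confirmed action is not treated as success.
  if (res.confirmedAction === undefined) {
    return { success: false, errorCode: "UNCONFIRMED_2XX", retryable: false };
  }
  return {
    success: true,
    externalId: res.confirmationNumber,
    confirmedAction: res.confirmedAction,
    retryable: false,
  };
}
```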

Tier 2 — Physical Mail via Lob

Physical mail has the highest failure cost of the three tiers. A letter sent to a wrong address wastes money and time, but more importantly, it may delay asset recovery for a grieving family. The Tier 2 handler performs address validation via Lob's CASS-certified address API before creating the letter. Only after validation succeeds does it proceed to letter creation.

// Tier 2 handler flow
async function handleTier2(job: NotificationJob) {
  // 1. Check idempotency: was this letter already sent?
  const existing = await db.notifications.findByIdempotencyKey(job.idempotencyKey);
  if (existing?.external_id) {
    // Already sent to Lob; return the stored letter ID without creating another
    return { success: true, externalId: existing.external_id, alreadySent: true };
  }

  // 2. Validate address via Lob before creating letter
  const addressVerification = await lob.usVerifications.verify({
    primary_line: job.recipientAddress.line1,
    city: job.recipientAddress.city,
    state: job.recipientAddress.state,
    zip_code: job.recipientAddress.zip
  });
  if (addressVerification.deliverability === 'undeliverable') {
    throw new NotificationError('UNDELIVERABLE_ADDRESS', { retryable: false });
  }

  // 3. Fetch death certificate presigned URL from R2
  const deathCertUrl = await r2.getPresignedUrl(job.estateId, 'death_certificate');

  // 4. Create letter via Lob API
  const letter = await lob.letters.create(
    {
      description: `Estate notification: ${job.estateId}`,
      // Letters take a name plus address_* fields, so map the verified
      // address instead of passing the verification `components` through as-is
      to: {
        name: job.institutionName,
        address_line1: addressVerification.primary_line,
        address_line2: addressVerification.secondary_line,
        address_city: addressVerification.components.city,
        address_state: addressVerification.components.state,
        address_zip: addressVerification.components.zip_code
      },
      from: SETTLE_RETURN_ADDRESS,
      file: job.templateId,
      merge_variables: {
        deceasedName: job.deceasedName,
        institutionName: job.institutionName,
        estateExecutor: job.executorName,
        deathCertificateEnclosure: true
      }
    },
    // Lob reads the idempotency key from the Idempotency-Key request header,
    // not the request body; how headers are passed depends on the SDK version
    { 'Idempotency-Key': job.idempotencyKey }
  );

  // 5. Store Lob letter ID for tracking
  await db.notifications.update(job.notificationId, {
    status: 'sent',
    external_id: letter.id,
    sent_at: new Date()
  });

  return { success: true, externalId: letter.id, alreadySent: false };
}

Lob sends delivery status webhooks when a letter is in-transit, delivered, or returned. The webhook handler updates notifications.status and, on delivery failure (returned mail), creates a new task for the executor to verify the institution's address.
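The webhook handling above can be sketched as a pure mapping from a Lob event to a status update. The status names and the `WebhookOutcome` shape are assumptions; the sketch reads only the event type ID and the referenced letter ID from the payload, and it ignores event types it does not track.

```typescript
// Illustrative status values for notifications.status.
type LetterStatus = "in_transit" | "delivered" | "returned";

// Minimal subset of a Lob webhook payload used by this sketch.
interface LobWebhookEvent {
  event_type: { id: string };
  reference_id: string; // ID of the letter the event refers to
}

interface WebhookOutcome {
  letterId: string;
  status: LetterStatus;
  createAddressTask: boolean; // true when the executor should verify the address
}

const STATUS_BY_EVENT = new Map<string, LetterStatus>([
  ["letter.in_transit", "in_transit"],
  ["letter.delivered", "delivered"],
  ["letter.returned_to_sender", "returned"],
]);

// Returns null for event types this handler does not track.
function interpretLobEvent(event: LobWebhookEvent): WebhookOutcome | null {
  const status = STATUS_BY_EVENT.get(event.event_type.id);
  if (status === undefined) return null;
  return {
    letterId: event.reference_id,
    status,
    createAddressTask: status === "returned",
  };
}
```

Keeping the mapping pure leaves signature verification and the database write to the surrounding handler, which makes the status logic easy to test on its own.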

Tier 3 — Call Script Generation

Tier 3 is a pure computation: given an estate context and institution profile, generate a structured call script. No external API calls are made. The script object is returned synchronously to the client and also stored in notifications for record-keeping. The handler uses a template system with institution-specific overrides for hold queues, required account information, and department routing.
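A minimal sketch of that computation follows. The field names, default values, and template wording are all hypothetical; the point is only that institution-specific overrides fall back to defaults and that rendering has no external dependencies.

```typescript
interface EstateContext {
  deceasedName: string;
  executorName: string;
  dateOfDeath: string;
}

// Institution-specific overrides; every override is optional.
interface InstitutionProfile {
  name: string;
  department?: string;
  estimatedHoldMinutes?: number;
  requiredInfo?: string[];
}

interface CallScript {
  institutionName: string;
  department: string;
  estimatedHoldMinutes: number;
  requiredInfo: string[];
  opening: string;
}

// Hypothetical defaults used when an institution has no override.
const DEFAULTS = {
  department: "Customer Service",
  estimatedHoldMinutes: 15,
  requiredInfo: ["Account number", "Date of death"],
};

const OPENING_TEMPLATE =
  "Hello, my name is {{executorName}}. I am the executor for {{deceasedName}}, " +
  "who passed away on {{dateOfDeath}}. Could you connect me with {{department}}?";

// Replaces {{name}} placeholders; throws on a missing variable so a
// broken template fails loudly instead of producing a partial script.
function render(template: string, vars: Record<string, string>): string {
  return template.replace(/\{\{(\w+)\}\}/g, (_match: string, key: string) => {
    const value = vars[key];
    if (value === undefined) throw new Error(`Missing template variable: ${key}`);
    return value;
  });
}

function generateCallScript(estate: EstateContext, inst: InstitutionProfile): CallScript {
  const department = inst.department ?? DEFAULTS.department;
  return {
    institutionName: inst.name,
    department,
    estimatedHoldMinutes: inst.estimatedHoldMinutes ?? DEFAULTS.estimatedHoldMinutes,
    requiredInfo: inst.requiredInfo ?? DEFAULTS.requiredInfo,
    opening: render(OPENING_TEMPLATE, {
      executorName: estate.executorName,
      deceasedName: estate.deceasedName,
      dateOfDeath: estate.dateOfDeath,
      department,
    }),
  };
}
```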

Reliability Model

The critical distinction: a Tier 1 failure means a subscription was not cancelled and may continue charging the estate; a Tier 2 failure means a physical letter was not sent and a deadline may be missed. Either outcome, if it happens silently, is worse than a system error that surfaces to the executor with a retry option.

Tier	Retry Strategy	Backoff	Max Retries	Failure Action
Tier 1	Retry on 5xx and network errors; no retry on 4xx	Exponential: 1m, 5m, 30m, 2h	4	Mark failed; create manual task for executor
Tier 2	Retry on Lob API errors; no retry on address validation failure	Exponential: 5m, 30m, 4h	3	Mark failed; notify executor to verify address
Tier 3	No retry needed (pure computation)	—	0	Surface error with context; log for debugging
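The table above can be encoded as a fixed delay schedule per tier. This is a sketch under the assumption that the worker tracks how many retries a job has already used; the function name and parameters are illustrative.

```typescript
type Tier = 1 | 2 | 3;

const MINUTE_MS = 60 * 1000;

// Delays before each retry, taken from the table: one entry per allowed retry.
const RETRY_DELAYS_MS: Record<Tier, number[]> = {
  1: [1, 5, 30, 120].map((m) => m * MINUTE_MS),
  2: [5, 30, 240].map((m) => m * MINUTE_MS),
  3: [],
};

// Returns the delay before the next retry, or null when the job should be
// marked failed (error is not retryable, or retries are exhausted).
function nextRetryDelayMs(tier: Tier, retriesUsed: number, retryable: boolean): number | null {
  if (!retryable) return null;
  return RETRY_DELAYS_MS[tier][retriesUsed] ?? null;
}
```

A null result is the point at which the failure action in the last column applies.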

Section 7

Benefit Discovery Architecture

Benefit discovery is the process of finding assets and entitlements the family may not know about: unclaimed property held by states, life insurance policies, pension benefits, VA entitlements. Some sources have APIs; some require web scraping; some require manual submission. The architecture must handle all three and present results with appropriate confidence signals so families act on real findings, not false positives.

Source Taxonomy

Source	Data	Integration Type	Auth Required	Rate Limit
NAUPA / MissingMoney	Unclaimed property (all 50 states)	HTTP scrape	None	Per-state limits, ~1 req/10s
NAIC Life Policy Locator	Life insurance policies	Form submission	Registration required	Manual review cycle (days)
PBGC Pension Search	Defined-benefit pension benefits	HTTP scrape	None	Unknown; respect robots.txt
VA Benefits	Veteran burial/death benefits	API (va.gov)	API key	Documented per-token limits
SSA Death Benefits	Lump-sum death payment	Manual guidance	—	—
MIB (Medical Info Bureau)	Insurance application records	Manual guidance	—	—
State unclaimed property (direct)	State treasury holdings	HTTP scrape	Varies by state	Vary by state

Scanner Architecture

The benefit scanner is a Fly.io worker that runs on a cron schedule (once per week per active estate) and on-demand when triggered by the executor. It uses a source adapter pattern — each external source has an adapter implementing a common interface — allowing new sources to be added without modifying the core scanner logic.

// Source adapter interface
interface BenefitSourceAdapter {
  sourceId: string;
  type: 'api' | 'scrape' | 'manual_guidance';
  canAutoScan: boolean;

  scan(
    context: ScanContext
  ): Promise<BenefitScanResult[]>;
}

interface ScanContext {
  deceasedName: { first: string; last: string };
  deceasedSsn?: string;         // decrypted only for sources that require it
  dateOfDeath: Date;
  stateOfResidence: string;
  hasVeteranStatus: boolean;
}

interface BenefitScanResult {
  sourceId: string;
  confidence: 'confirmed' | 'probable' | 'possible';
  benefitType: string;
  estimatedValueCents?: number;
  claimUrl?: string;
  manualStepsRequired?: string[];
  rawData: Record<string, unknown>;
  scannedAt: Date;
}

Caching Strategy

External benefit databases must not be queried on every page load. Scan results are cached in Postgres with a scanned_at timestamp. The frontend shows the cached result with a freshness indicator. Scans are throttled: no source is queried more than once per 24 hours per estate, regardless of how many times the executor views the benefits page. This prevents accidental hammering of NAUPA or PBGC from a user repeatedly refreshing the page.
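The throttle and freshness indicator reduce to two small functions over the stored scanned_at timestamp. This is an illustrative sketch; the function names and label wording are assumptions.

```typescript
const MIN_SCAN_INTERVAL_MS = 24 * 60 * 60 * 1000;

// True when the source may be queried again for this estate.
// A null timestamp means the source has never been scanned.
function shouldQuerySource(lastScannedAt: Date | null, now: Date): boolean {
  if (lastScannedAt === null) return true;
  return now.getTime() - lastScannedAt.getTime() >= MIN_SCAN_INTERVAL_MS;
}

// Human-readable freshness label for a cached result.
function freshnessLabel(scannedAt: Date, now: Date): string {
  const hours = Math.floor((now.getTime() - scannedAt.getTime()) / (60 * 60 * 1000));
  if (hours < 1) return "Updated less than an hour ago";
  if (hours < 24) return `Updated ${hours}h ago`;
  return `Updated ${Math.floor(hours / 24)}d ago`;
}
```

Because the check runs against the persisted timestamp, page refreshes only ever read the cache; the decision to call an external source stays with the scanner.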

SSN handling in the scanner: The SSN is only decrypted immediately before a scan that requires it (currently only NAUPA for some states). It is passed as a transient string within the worker's memory and is never logged, stored in the scan result, or included in audit logs. The audit log records that a scan requiring SSN access was performed, with the estate ID and timestamp, but not the SSN value itself.

Confidence Model

Confirmed

Name and SSN matched a specific record in the source database. A dollar amount is known. Example: NAUPA returned an exact unclaimed property record. Action: prompt executor to file claim immediately.

Probable

Name matched without SSN confirmation, or record exists but amount is unknown. Example: PBGC found a pension record with matching employer name. Action: display with guidance to verify and claim.

Possible

Based on estate profile characteristics (e.g., deceased worked in a state with a large unclaimed property backlog), a benefit may exist even though no matching record was found. Action: display as guidance with manual lookup steps.
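The three levels can be read as a decision over what a scan actually matched. The sketch below assumes a hypothetical `MatchEvidence` shape; whether a "possible" result is worth surfacing at all is left to the caller, which knows the estate profile.

```typescript
type Confidence = "confirmed" | "probable" | "possible";

// Hypothetical summary of what a source adapter was able to match.
interface MatchEvidence {
  recordFound: boolean;
  nameMatched: boolean;
  ssnMatched: boolean;
  amountKnown: boolean;
}

function classifyConfidence(e: MatchEvidence): Confidence {
  const matchedRecord = e.recordFound && e.nameMatched;
  // Confirmed: name and SSN matched a specific record with a known amount.
  if (matchedRecord && e.ssnMatched && e.amountKnown) return "confirmed";
  // Probable: a record matched by name, but SSN or amount is missing.
  if (matchedRecord) return "probable";
  // Possible: nothing matched; any finding is inferred from the estate profile.
  return "possible";
}
```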

Manual-guidance sources (NAIC, MIB, SSA): These sources cannot be automated. For these, the scanner generates structured guidance: the exact URL to visit, the form to complete, the information to have on hand, and the expected response timeline. This guidance is surfaced as a "Possible" benefit with a checklist of manual steps. The executor can mark it as completed when done.
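For manual-guidance sources, an adapter only needs to turn static guidance into a result with a checklist. The following sketch assumes a hypothetical `ManualGuidance` record; the result fields follow the BenefitScanResult interface shown earlier, and any URL or form name would come from reviewed content rather than code.

```typescript
// Hypothetical static guidance record maintained per source.
interface ManualGuidance {
  sourceId: string;
  benefitType: string;
  url: string;
  formName: string;
  infoToHaveOnHand: string[];
  expectedTimeline: string;
}

interface GuidanceResult {
  sourceId: string;
  confidence: "possible";
  benefitType: string;
  claimUrl: string;
  manualStepsRequired: string[];
  rawData: Record<string, unknown>;
  scannedAt: Date;
}

// Builds a "Possible" result whose manual steps form the executor's checklist.
function toGuidanceResult(g: ManualGuidance, now: Date): GuidanceResult {
  return {
    sourceId: g.sourceId,
    confidence: "possible",
    benefitType: g.benefitType,
    claimUrl: g.url,
    manualStepsRequired: [
      `Visit ${g.url}`,
      `Complete: ${g.formName}`,
      `Have on hand: ${g.infoToHaveOnHand.join(", ")}`,
      `Expected response: ${g.expectedTimeline}`,
    ],
    rawData: {},
    scannedAt: now,
  };
}
```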

Section 8

Security Architecture

Settle handles death certificates, Social Security numbers, financial account numbers, and medical history. A breach of this data would compound the harm to a family that is already grieving; it is a categorical failure. Security is not a feature to be added in Year 2: every component in this document has been designed with Defense in Depth as a first principle.

Threat Model Summary

Threat Vector	Severity	Primary Mitigation	Secondary Mitigation
SQL injection → PII exfiltration	Critical	Parameterized queries (Drizzle ORM); no string concatenation	Field-level encryption renders exfiltrated SSNs useless
IDOR: accessing another estate's data	Critical	Postgres RLS policies keyed to session estate ID	API middleware ownership check before every handler
Credential theft (session hijack)	Critical	HttpOnly, Secure, SameSite=Strict session cookie	Request fingerprinting; IP change triggers re-auth prompt
Document vault unauthorized access	Critical	Presigned URLs valid for 5 minutes only; R2 bucket not public	SSE-C per-estate keys; document access is audit-logged
Compromised database credential	High	Neon per-role least-privilege credentials; no superuser in app	Field encryption; SSNs unreadable without app-layer keys
Supply chain attack (npm)	High	Dependabot; lockfile integrity checked in CI	Minimal dependency philosophy; audit npm packages quarterly
Unauthorized Practice of Law	High	Content policy in rules engine schema; architectural separation of guidance vs advice	Legal review required for all rule set publications
Over-retention of sensitive data	Medium	Retention policy: 7 years post-estate closure; automated deletion jobs	Right-to-erasure workflow for CCPA compliance

Defense in Depth: Layer View

CDN

Layer 1 — Edge (Cloudflare)

DDoS mitigation, WAF rules blocking common injection patterns, bot management. Rate limiting at the IP level: 100 requests/minute to API routes, 10 authentication attempts/minute per IP. All traffic forced to HTTPS with HSTS preloading.

APP

Layer 2 — Application (SvelteKit hooks)

Session validation on every authenticated request. RBAC check before handler execution. CSRF protection via double-submit cookie on all state-mutating requests. Input validation with Zod schemas — reject malformed input before any database interaction. Content Security Policy headers on all responses.

DB

Layer 3 — Database (Postgres RLS + Roles)

Row-Level Security policies ensure a database connection authenticated as settle_app can only read rows belonging to the estate in the session context. A separate read-only role settle_analytics has no access to the PII tables. The audit_log table is append-only at the Postgres policy level.
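One common way to key RLS to a session context is a transaction-scoped setting that the policy reads. The sketch below is an assumption about how this could look, not Settle's actual schema: the setting name `app.current_estate_id`, the `tasks` table, the policy name, and the UUID-typed `estate_id` column are all illustrative.

```typescript
// Illustrative policy DDL. current_setting() raises an error when the setting
// is missing, so a connection that never set the context reads nothing.
const ESTATE_ISOLATION_POLICY_SQL = `
ALTER TABLE tasks ENABLE ROW LEVEL SECURITY;
CREATE POLICY estate_isolation ON tasks
  FOR ALL TO settle_app
  USING (estate_id = current_setting('app.current_estate_id')::uuid);
`;

interface ParameterizedQuery {
  text: string;
  values: string[];
}

const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Builds the statement to run at the start of each request's transaction.
// The third argument (is_local = true) scopes the setting to that transaction,
// so pooled connections do not leak context between requests.
function setEstateContext(estateId: string): ParameterizedQuery {
  if (!UUID_RE.test(estateId)) throw new Error("estateId must be a UUID");
  return {
    text: "SELECT set_config('app.current_estate_id', $1, true)",
    values: [estateId],
  };
}
```

The API middleware ownership check still runs before this; the policy is the second, independent barrier.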

ENC

Layer 4 — Field Encryption (AES-256-GCM)

SSN, financial account numbers, and medical record identifiers are encrypted at the application layer before being written to Postgres. The encryption keys are derived per-estate from a master key stored in Fly.io Secrets — never in the database or application code. A full database dump without the application keys is useless for extracting these fields.
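Per-estate key derivation and field encryption can be sketched with Node's built-in crypto module. This is a minimal illustration under stated assumptions: HKDF-SHA256 is one reasonable derivation function (the document does not name one), and the `settle:field:` info label and the iv + tag + ciphertext layout are hypothetical.

```typescript
import { createCipheriv, createDecipheriv, hkdfSync, randomBytes } from "node:crypto";

// Derives a 32-byte key for one estate from the master key.
// The info string binds the key to its purpose and to the estate ID.
function deriveEstateKey(masterKey: Buffer, estateId: string): Buffer {
  const derived = hkdfSync("sha256", masterKey, Buffer.alloc(0), `settle:field:${estateId}`, 32);
  return Buffer.from(derived);
}

// Encrypts a field with AES-256-GCM. Output is base64 of iv (12) + tag (16) + ciphertext.
function encryptField(key: Buffer, plaintext: string): string {
  const iv = randomBytes(12); // fresh random nonce for every encryption
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return Buffer.concat([iv, cipher.getAuthTag(), ciphertext]).toString("base64");
}

// Decrypts a field; throws if the key is wrong or the data was tampered with.
function decryptField(key: Buffer, encoded: string): string {
  const raw = Buffer.from(encoded, "base64");
  const decipher = createDecipheriv("aes-256-gcm", key, raw.subarray(0, 12));
  decipher.setAuthTag(raw.subarray(12, 28));
  return Buffer.concat([decipher.update(raw.subarray(28)), decipher.final()]).toString("utf8");
}
```

Because GCM authenticates the ciphertext, a value encrypted for one estate fails to decrypt under another estate's key rather than returning garbage.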

R2

Layer 5 — Document Storage (R2 SSE-C)

Documents are encrypted at rest in R2 using customer-supplied keys (SSE-C), with a unique key per estate derived from the same master key as field encryption. Documents are never accessible via a public URL. All access is via presigned URLs generated server-side, valid for 5 minutes, scoped to a single object, and audit-logged.

AUD

Layer 6 — Audit Logging (Immutable)

Every access to a sensitive field, every document download, every task mutation, and every rule set publication is written to the append-only audit_log table. Postgres RLS prevents any application role from updating or deleting audit records. Logs are retained for 7 years per legal hold requirements and are exportable per estate for legal discovery.

Data Retention and the Right to Erasure

Estate records are retained for 7 years following estate closure, aligned with common statutes of limitations for executor liability. At the 7-year mark, a background job initiates deletion: PII fields are overwritten with null values, documents are deleted from R2, and the estate record is anonymized (names replaced with hashed identifiers). The estate's task completion record and financial summary are retained in anonymized form for aggregate analytics.

For CCPA right-to-erasure requests during an active estate, the request is held pending estate closure (deletion during active administration would be legally problematic). The request is logged and honored automatically at closure plus a 90-day cooling-off period.
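The two deletion paths above reduce to a single date calculation. This sketch assumes closure timestamps are stored in UTC and that an erasure request is a simple flag on the estate; both are illustrative choices.

```typescript
const RETENTION_YEARS = 7;
const ERASURE_COOLING_OFF_DAYS = 90;
const DAY_MS = 24 * 60 * 60 * 1000;

// Returns when the deletion job should run, or null while the estate is
// still active (erasure requests are held until closure).
function deletionDueAt(closedAt: Date | null, erasureRequested: boolean): Date | null {
  if (closedAt === null) return null;
  if (erasureRequested) {
    return new Date(closedAt.getTime() + ERASURE_COOLING_OFF_DAYS * DAY_MS);
  }
  const due = new Date(closedAt.getTime());
  // Calendar-year arithmetic; a Feb 29 closure rolls over to Mar 1.
  due.setUTCFullYear(due.getUTCFullYear() + RETENTION_YEARS);
  return due;
}
```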

Section 9

Scaling Strategy

Settle's growth trajectory — 500 estates in Year 1, 5,000 in Year 2, 50,000 in Year 3 — spans two orders of magnitude. The architecture is sized for Year 1 today and designed to scale to Year 3 without rearchitecting the core data model or notification service. Each year has a clear set of scaling gates that trigger architectural evolution.

Year 1 — Foundation

Active estates: 500

Concurrent users (peak): ~50

Modular monolith on Fly.io
Neon 0.25 CU, autoscale
Single notification worker
Upstash Redis for sessions
Benefit scanner: weekly cron

Year 2 — Vertical Scale

Active estates: 5,000

Concurrent users (peak): ~500

Add Neon read replica for analytics
Scale API to 4 instances
Notification worker: 2 instances
Add connection pooling (PgBouncer)
R2 bucket per region (2 regions)

Year 3 — Horizontal Scale

Active estates: 50,000

Concurrent users (peak): ~5,000

Extract notification service to standalone API
Extract benefit scanner to standalone service
Neon: scale to 4+ CU with dedicated compute
Consider Postgres partitioning on tasks by estate
Dedicated analytics database (ClickHouse)

Key Scaling Decisions and Their Triggers

When	Trigger	Action	Complexity Cost
~2,000 estates	DB p99 query time > 100ms	Add PgBouncer connection pooler; add Neon read replica for reports	Low — operational only
~5,000 estates	Notification queue depth consistently > 100	Scale notification worker to 3 concurrent instances	Low — Fly.io scaling config only
~10,000 estates	API P99 latency > 500ms or error rate > 0.1%	Separate API server and notification worker into distinct Fly apps; independent scaling	Medium — deploy config and inter-service auth
~30,000 estates	Tasks table exceeds 10M rows; scan times degrade	Partition tasks table by estate_id range; add partial indexes on active estates	Medium — zero-downtime migration required
~50,000 estates	Benefit scanner can't complete weekly runs within window	Extract scanner to dedicated service; parallelize by state/source	High — separate service with its own queue and auth

The modular monolith pays off here. Because service boundaries are enforced at the module level from Day 1 (the notification worker, benefit scanner, and rules engine are separate TypeScript modules with defined interfaces), extracting them to standalone services at Year 3 is primarily a deployment change, not a rewrite. Module code should need little or no change; the work is in deployment topology, inter-service auth, and queue ownership, as the complexity column above reflects.
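The triggers in the table above can be kept as data and evaluated against current metrics, for example in a periodic ops check. This is an illustrative sketch: the `Metrics` shape and gate IDs are assumptions, and a real check would look at sustained values rather than a single sample.

```typescript
// Hypothetical snapshot of the metrics the scaling table refers to.
interface Metrics {
  dbP99Ms: number;
  notificationQueueDepth: number;
  apiP99Ms: number;
  apiErrorRate: number; // fraction: 0.001 means 0.1%
  tasksRowCount: number;
  scannerFinishedWithinWindow: boolean;
}

interface ScalingGate {
  id: string;
  fired: (m: Metrics) => boolean;
}

// Thresholds copied from the scaling table; IDs are illustrative.
const GATES: ScalingGate[] = [
  { id: "add-pooler-and-read-replica", fired: (m) => m.dbP99Ms > 100 },
  { id: "scale-notification-workers", fired: (m) => m.notificationQueueDepth > 100 },
  { id: "split-api-and-worker-apps", fired: (m) => m.apiP99Ms > 500 || m.apiErrorRate > 0.001 },
  { id: "partition-tasks-table", fired: (m) => m.tasksRowCount > 10_000_000 },
  { id: "extract-benefit-scanner", fired: (m) => !m.scannerFinishedWithinWindow },
];

// Returns the IDs of every gate whose trigger condition currently holds.
function firedGateIds(m: Metrics): string[] {
  return GATES.filter((g) => g.fired(m)).map((g) => g.id);
}
```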

Section 10

Architecture Decision Records

Every significant architectural decision is recorded here with its context, the options considered, the decision made, and the tradeoffs accepted. These records are immutable once a decision is implemented — new decisions supersede rather than modify them.

ADR-001

Postgres JSONB for State Rules Engine (vs Dedicated Rules Engine)

Accepted

Context

The system must represent, evaluate, and version legal rules for 50 states with different probate thresholds, required forms, and filing deadlines. Options ranged from a dedicated business rules engine (Drools, OpenL Tablets) to a custom DSL to JSON data in Postgres.

Decision

Store rule sets as versioned JSONB documents in Postgres, evaluated by a pure TypeScript function in the application layer. Rule logic is simple conditional evaluation — no forward chaining, no conflict resolution, no complex inference. A full rules engine would solve problems we don't have while adding operational complexity we can't justify at Year 1 scale.

Tradeoffs

Chosen: JSONB in Postgres

No new infrastructure to operate
Rule sets are versioned in the database
TypeScript evaluator is trivially testable
Readable by paralegals with training
Easy to extend the schema as rules grow

Alternative: Drools / OpenL

Industry-standard for complex rule systems
Forward-chaining and conflict resolution built in
Requires Java runtime or separate service
High operational overhead for 50-state static rules
Over-engineered for conditional task filtering

Consequence

If rules become significantly more complex — mutual exclusions, forward-chained triggers, conflict resolution between state and federal rules — this decision should be revisited. The evaluator module is the natural extraction point.
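To make "simple conditional evaluation" concrete, here is a sketch of a pure evaluator over a JSONB-style rule set. The schema (`Condition`, `Rule`, `RuleSet`), the fact fields, and the task IDs are hypothetical and do not represent any real state's rules.

```typescript
// Hypothetical facts derived from the estate record.
interface EstateFacts {
  grossEstateValueCents: number;
  hasRealProperty: boolean;
}

// A condition compares one fact against a literal value.
type Condition =
  | { field: "grossEstateValueCents"; op: "lte" | "gt"; value: number }
  | { field: "hasRealProperty"; op: "eq"; value: boolean };

// A rule contributes its task when every condition holds.
interface Rule {
  taskId: string;
  when: Condition[];
}

// Shape of one versioned JSONB document.
interface RuleSet {
  state: string;
  version: number;
  rules: Rule[];
}

function holds(c: Condition, facts: EstateFacts): boolean {
  if (c.field === "hasRealProperty") return facts.hasRealProperty === c.value;
  return c.op === "lte"
    ? facts.grossEstateValueCents <= c.value
    : facts.grossEstateValueCents > c.value;
}

// Pure function: same rule set and facts always yield the same tasks.
// The version is returned so it can be written to the audit log.
function evaluate(ruleSet: RuleSet, facts: EstateFacts): { ruleSetVersion: number; taskIds: string[] } {
  const taskIds = ruleSet.rules
    .filter((rule) => rule.when.every((c) => holds(c, facts)))
    .map((rule) => rule.taskId);
  return { ruleSetVersion: ruleSet.version, taskIds };
}
```

There is no chaining here: rules never trigger other rules, which is what keeps the evaluator small enough to test exhaustively.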

ADR-002

Column-Level Encryption (vs Disk Encryption Only)

Accepted

Context

Neon provides disk-level encryption at rest (AES-256). The question is whether application-layer column-level encryption is also required for the highest-sensitivity fields (SSN, financial account numbers).

Decision

Implement column-level AES-256-GCM encryption in the application for SSN, financial account numbers, routing numbers, and medical record identifiers. Disk encryption is not sufficient because it does not protect against: a compromised database credential returning raw rows, a SQL injection vulnerability, a misconfigured query in the application, or a cloud provider employee with storage access. Column encryption adds a defense layer that is independent of all of these.

Tradeoffs

Chosen: Column-Level Encryption

Encrypted at-rest data useless without app keys
Independent of database security posture
Supports field-level access audit logging
Enables per-estate key derivation

Alternative: Disk Encryption Only

Zero application complexity
Fields are queryable/indexable
Does not protect against SQL injection
Does not protect against credential compromise

Consequence

Encrypted columns cannot be indexed or searched directly. For SSN, we store the last 4 digits in a separate unencrypted column for display and lookup purposes. Full SSN is only decrypted on explicit access, which is audit-logged. Search by SSN is not a supported use case in the product.

ADR-003

Lob for Physical Mail (vs Building Mail Infrastructure)

Accepted

Context

Tier 2 notifications require generating and mailing physical letters — often including a copy of the death certificate — to financial institutions. Options include using a mail API service (Lob), partnering with a print/mail fulfillment vendor, or building the capability in-house.

Decision

Use Lob's letter API. Lob provides CASS-certified address verification, USPS tracking, delivery webhooks, and secure document handling with SOC 2 Type II certification. Building this in-house would require print vendor relationships, postage accounts, CASS certification, and return mail handling — none of which is core product differentiation.

Tradeoffs

Chosen: Lob API

CASS address verification included
USPS tracking and delivery webhooks
SOC 2 Type II certified
No vendor relationship management
Higher per-letter cost vs volume contracts

Alternative: In-House / Vendor

Lower per-letter cost at volume
Full control over design and timing
Requires CASS certification
Return mail handling complexity
Significant operational overhead

Consequence

Lob's per-letter pricing (~$1.50–$3.50 including postage) is acceptable at Year 1–2 volume. At Year 3 (50,000 estates × multiple letters each), a volume contract negotiation with Lob or a migration to a direct print vendor should be evaluated. The Tier 2 handler interface makes this migration a swap of the Lob adapter only.

ADR-004

Session-Based Auth (vs JWT)

Accepted

Context

Settle requires authentication for a user population with a 16–18 month engagement window, role changes (adding attorneys, removing co-executors), and the potential for emergency access revocation. The choice between stateless JWT auth and stateful session auth carries meaningfully different security properties.

Decision

Session-based authentication with an HttpOnly, Secure, SameSite=Strict cookie storing a cryptographically random session token. Sessions are stored in Upstash Redis with a 30-day sliding expiry. The primary driver is instant revocability: when an attorney is removed from an estate or a user reports a compromised account, the session can be invalidated immediately by deleting it from Redis. With JWTs, the token remains valid until expiry regardless of server-side state changes.

Tradeoffs

Chosen: Session Auth

Instant session revocation on any device
Role changes take effect immediately
No complex token refresh infrastructure
Requires Redis for session storage
Every request hits Redis (fast, but a dependency)

Alternative: JWT (short-lived)

Stateless — no session store required
Horizontally scales without shared state
Cannot revoke before expiry without a denylist (= session store again)
Refresh token complexity
Wrong security model for role-change-heavy workflows
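The revocation property that drives this decision can be shown with an in-memory stand-in for the Redis session store. This is a sketch only: the class, method names, and the Map-based storage are assumptions, and a real implementation would use Redis key expiry instead of comparing timestamps.

```typescript
import { randomBytes } from "node:crypto";

const SESSION_TTL_MS = 30 * 24 * 60 * 60 * 1000; // 30-day sliding expiry

interface SessionRecord {
  userId: string;
  expiresAt: number; // epoch milliseconds
}

// In-memory stand-in for the Redis-backed session store.
class InMemorySessionStore {
  private readonly sessions = new Map<string, SessionRecord>();

  // Issues a cryptographically random token for the cookie.
  create(userId: string, now: number): string {
    const token = randomBytes(32).toString("hex");
    this.sessions.set(token, { userId, expiresAt: now + SESSION_TTL_MS });
    return token;
  }

  // Returns the user ID for a live session and slides its expiry forward.
  validate(token: string, now: number): string | null {
    const record = this.sessions.get(token);
    if (record === undefined || record.expiresAt <= now) {
      this.sessions.delete(token);
      return null;
    }
    record.expiresAt = now + SESSION_TTL_MS;
    return record.userId;
  }

  // Revokes every session for a user; effective on the very next request.
  revokeAllForUser(userId: string): number {
    let removed = 0;
    this.sessions.forEach((record, token) => {
      if (record.userId === userId) {
        this.sessions.delete(token);
        removed++;
      }
    });
    return removed;
  }
}
```

Removing an attorney from an estate would call the equivalent of `revokeAllForUser`, after which any cookie they still hold fails validation.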

ADR-005

SvelteKit (vs Next.js)

Accepted

Context

The frontend required SSR capabilities, a co-located API layer, TypeScript support, and a modern reactive component model. The two leading options were SvelteKit and Next.js (React).

Decision

SvelteKit. The key factors: (1) SvelteKit's server routes co-locate API logic with page logic, reducing cognitive overhead for a small team; (2) Svelte's compiled output is typically smaller than a comparable React + React DOM bundle, which matters for users on mobile connections; (3) SvelteKit's form actions provide a clean, progressive-enhancement model for the intake flow; (4) the load function pattern makes the server/client data boundary explicit and is well-suited to the estate data model.

Tradeoffs

Chosen: SvelteKit

Smaller bundle size; faster on mobile
Co-located API routes reduce context switching
Form actions suit intake flows well
Smaller ecosystem than React
Fewer available engineers in hiring market

Alternative: Next.js (React)

Largest frontend ecosystem
React Server Components for complex UIs
Larger engineer hiring pool
App router complexity overhead
Larger runtime footprint

ADR-006

Modular Monolith (vs Microservices from Day 1)

Accepted

Context

The system has several conceptually distinct services: the rules engine, notification service, and benefit scanner. The architectural question is whether to deploy these as separate services immediately or co-locate them in a single deployed application.

Decision

Deploy as a modular monolith: enforce service boundaries via TypeScript module boundaries and strict inter-module interfaces, but deploy as a single Fly.io application (with the notification worker and benefit scanner as separate Fly machines). At 500 estates per year, the operational overhead of service meshes, inter-service auth, distributed tracing, and independent deployment pipelines is not justified. The module boundaries ensure extraction is a deployment change, not a rewrite, when scale demands it.

Consequence

A single TypeScript build artifact. A bug in the notification module could crash the API server. Mitigation: notification and scanner workers run as separate Fly machines from the API, so a worker crash does not affect user-facing routes. The API server imports the rules engine synchronously, so a rules engine bug affects the API. Mitigation: extensive test coverage of the evaluator, plus feature flags to disable the rules engine at the route level.

ADR-007

PgBoss for Job Queue (vs Redis-backed Queue)

Accepted

Context

The notification service and benefit scanner require a reliable job queue. The main options were a Redis-backed queue (BullMQ) or a Postgres-backed queue (PgBoss).

Decision

PgBoss. Because Postgres is already the primary data store, PgBoss requires no additional infrastructure. Job state, retry history, and dead-letter queues are all in the same database as the business data. This means job records and the notifications table can be queried in the same transaction, and the entire job history is available for operational debugging without a separate Redis cluster.

Tradeoffs

Chosen: PgBoss (Postgres)

No additional infrastructure
Transactional job creation with business data
Full job history queryable in SQL
Lower throughput ceiling than Redis (~1,000 jobs/sec)
Adds write load to primary database

Alternative: BullMQ (Redis)

Higher throughput (10,000+ jobs/sec)
Better real-time queue monitoring
Additional infrastructure to operate
Job state not colocated with business data
Not necessary at Year 1 volume

ADR-008

Confidence-Tiered Benefit Results (vs Binary Found/Not-Found)

Accepted

Context

The benefit scanner may find records that match by name but not SSN, or may infer likely benefits from estate characteristics without finding a specific record. The UX question is how to present these results to a grieving family without creating false expectations or causing them to overlook legitimate findings.

Decision

Use a three-level confidence model: Confirmed (SSN + name match with a specific dollar amount), Probable (name match or record exists without full confirmation), and Possible (inferred from estate characteristics). Each level has different UI treatment, different call-to-action copy, and different task urgency. This avoids the false precision of a binary model while not overwhelming the executor with speculative results.

Consequence

The "Possible" tier relies on heuristic inference from estate characteristics rather than database results. This is intentional: a deceased veteran in California with no VA record returned by the scan probably still has benefits that should be claimed, and displaying nothing would be a disservice. The content associated with "Possible" results must be clearly labeled as guidance, not as a confirmed finding; this is an active UPL guardrail concern.

ADR-009

Neon Postgres (vs PlanetScale / Supabase)

Accepted

Context

The system requires a managed Postgres-compatible database with good developer experience, branching for staging environments, and a clear scaling path. The primary contenders were Neon, PlanetScale (MySQL-based), and Supabase.

Decision

Neon. The decisive factors: (1) true Postgres compatibility — no MySQL dialect differences, no constraints on foreign keys or multi-statement transactions; (2) database branching for staging environments is first-class and maps well to the PR-based development workflow; (3) autoscaling compute from zero means no idle cost during development and low initial COGS; (4) Row Level Security works as expected with no Neon-specific constraints.

Tradeoffs

Chosen: Neon

Actual Postgres, not a Postgres-compatible dialect
Database branching for staging
Scale to zero in dev
Connection pooling via Neon serverless proxy
Newer product; some enterprise features still maturing

Alternative: Supabase

Postgres with auth, storage, realtime built in
Could replace Redis for sessions
Tighter coupling to Supabase ecosystem
BYO auth/storage already planned
More opinionated platform lock-in

ADR-010

Immutable Audit Log via Postgres RLS (vs External SIEM)

Accepted

Context

Audit logging of all PII access and estate mutations is required for compliance and legal discovery. Options include an append-only Postgres table with RLS enforcement, a dedicated audit database, or shipping logs to an external SIEM (Datadog, Splunk).

Decision

Append-only Postgres table with RLS policies preventing UPDATE and DELETE for all application roles. At Year 1 volume, the audit log will accumulate ~5M rows per year — comfortably handled by Postgres with appropriate indexes. An external SIEM adds $3,000–$10,000/year in operational cost and complexity that is not justified until the volume of events warrants real-time alerting and cross-product correlation.

Consequence

At Year 3 scale (~50,000 estates), the audit log will have 50–100M rows. Partition the table by month at that point. Export to cold storage (R2 or S3 Glacier) for records older than 2 years. Begin evaluating Datadog or similar at 10M rows/year to enable real-time anomaly detection on PII access patterns.
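Monthly partitioning could be driven by a small helper that generates the DDL for each upcoming month. This sketch assumes the audit_log table has been declared with range partitioning on a timestamp column; the partition naming scheme is illustrative.

```typescript
function pad2(n: number): string {
  return n < 10 ? `0${n}` : String(n);
}

// Builds the DDL for one monthly partition. Range bounds are inclusive of the
// first day of the month and exclusive of the first day of the next month.
function monthlyPartitionDdl(year: number, month: number): string {
  if (!Number.isInteger(year) || !Number.isInteger(month) || month < 1 || month > 12) {
    throw new RangeError("year must be an integer and month must be 1-12");
  }
  const nextYear = month === 12 ? year + 1 : year;
  const nextMonth = month === 12 ? 1 : month + 1;
  const name = `audit_log_${year}_${pad2(month)}`;
  return (
    `CREATE TABLE IF NOT EXISTS ${name} PARTITION OF audit_log ` +
    `FOR VALUES FROM ('${year}-${pad2(month)}-01') TO ('${nextYear}-${pad2(nextMonth)}-01');`
  );
}
```

A scheduled job could run this a month ahead so that inserts never arrive before their partition exists.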