Settle — System Architecture
A guided estate administration platform built for the 2.8 million families navigating probate each year. This document defines the architecture for every technical decision from field-level encryption to 50-state legal rules.
Architecture Principles
These principles are not aspirational — they are constraints. Every design decision in this document can be traced back to at least one of them. When a tradeoff arises, principles at the top of this list take precedence.
System Overview
Settle is architecturally a modular monolith deployed on Fly.io, backed by Neon Postgres and Cloudflare R2. It is not a microservices system — at Year 1 scale (500 estates), the operational overhead of distributed services outweighs the benefits. Service boundaries are enforced at the module level within the codebase, making extraction straightforward when scale demands it.
Deployment Topology
All server-side compute runs on Fly.io. The SvelteKit application serves both the frontend (SSR) and the API routes. A separate Fly.io machine type runs background workers (notification service, benefit scanner) on a schedule or queue trigger. Neon Postgres provides the primary data store with automatic branching for staging environments. Cloudflare R2 stores documents, accessed via pre-signed URLs generated by the API; the browser never holds R2 credentials, even when uploading.
| Component | Platform | Machine Type | Count (Y1) | Scaling Trigger |
|---|---|---|---|---|
| api-server | Fly.io | shared-cpu-2x · 512MB | 2 (HA) | CPU > 70% for 2 min |
| notification-worker | Fly.io | shared-cpu-1x · 256MB | 1 | Queue depth > 50 |
| benefit-scanner | Fly.io | shared-cpu-1x · 256MB | 1 | Cron-triggered |
| postgres | Neon | 0.25 CU · autoscale | 1 primary | Neon autoscale |
| document-vault | Cloudflare R2 | Object storage | — | Serverless |
| session-store | Upstash Redis | Serverless Redis | — | Serverless |
Frontend Architecture
The frontend is built with SvelteKit. Static assets and prerendered pages are served from Cloudflare's global CDN edge network, which gives them sub-50ms TTFB globally without any CDN configuration overhead. The SvelteKit application uses a hybrid rendering strategy: marketing pages are prerendered for SEO; signup and login are server-rendered; the authenticated estate application is server-rendered for initial load and then transitions to client-side navigation.
SSR Strategy
| Route Pattern | Rendering Mode | Rationale |
|---|---|---|
| / | prerender | Marketing page, fully static, cached at edge |
| /how-it-works, /pricing | prerender | SEO-critical, no dynamic content |
| /signup, /login | SSR | CSRF token injection, form handling |
| /estate/[id]/dashboard | SSR + CSR | Initial data load server-side; subsequent navigation client-side |
| /estate/[id]/tasks | SSR + CSR | Task plan rendered on server, mutations via client fetch |
| /estate/[id]/documents | SSR + CSR | File list server-rendered; upload/preview client-side |
Grief-Aware UX Architecture
The people using Settle are in one of the most cognitively impaired states a person can experience. The architecture must account for this — not just the visual design, but the caching, session, and error recovery behavior of the application.
Client-Side State Preservation
SvelteKit's +page.server.ts load functions cache estate data with a 60-second stale-while-revalidate window. Form state is auto-saved to localStorage every 5 seconds during intake flows, keyed by estate ID and form name. If the browser tab is closed mid-form, the user resumes exactly where they left off on next visit.
// src/lib/stores/formPersist.ts
export function createPersistedForm(estateId: string, formKey: string) {
  const storageKey = `settle:form:${estateId}:${formKey}`;
  return {
    // Returns the saved { data, savedAt } envelope, or null if none is usable.
    restore: () => {
      const raw = localStorage.getItem(storageKey);
      if (!raw) return null;
      try {
        return JSON.parse(raw);
      } catch {
        return null; // corrupted entry: treat as absent rather than break the form
      }
    },
    save: (data: unknown) => {
      localStorage.setItem(storageKey, JSON.stringify({
        data,
        savedAt: Date.now()
      }));
    },
    clear: () => localStorage.removeItem(storageKey)
  };
}
Session Continuity for Long-Lived Workflows
Estate workflows span 16–18 months. Standard session expiry of 24–72 hours is inappropriate. Sessions are configured with a 30-day sliding window. The session cookie carries only the session ID; all session data (user ID, estate ID, role, last active page) lives in Redis. On each authenticated request, the session TTL is refreshed. The "last active page" is stored and presented as a resume prompt on next login.
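The sliding window can be sketched as a pure function over a session store. This is a minimal illustration, not the production code: the `Map` stands in for Redis, and `SessionRecord`, `touchSession`, and `SESSION_TTL_MS` are hypothetical names introduced here.

```typescript
// Hypothetical in-memory stand-in for the Redis session store.
interface SessionRecord {
  userId: string;
  estateId?: string;
  role: string;
  lastActivePage: string;
  expiresAt: number; // epoch milliseconds
}

const SESSION_TTL_MS = 30 * 24 * 60 * 60 * 1000; // 30-day sliding window

// Refreshes the TTL and last active page on each authenticated request.
// Returns the refreshed session, or null if it is missing or expired.
function touchSession(
  store: Map<string, SessionRecord>,
  sessionId: string,
  currentPage: string,
  now: number
): SessionRecord | null {
  const session = store.get(sessionId);
  if (!session || session.expiresAt <= now) {
    store.delete(sessionId);
    return null;
  }
  const refreshed: SessionRecord = {
    ...session,
    lastActivePage: currentPage,
    expiresAt: now + SESSION_TTL_MS
  };
  store.set(sessionId, refreshed);
  return refreshed;
}
```

With Redis, the same effect would typically come from resetting the key's expiry on each request; the in-memory version only makes the expiry arithmetic explicit.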
Offline Support
Families in rural areas, or using cellular connections at a funeral home, may have intermittent connectivity. The application uses a Service Worker registered at the app root to cache the application shell and the current estate's task plan for offline viewing. Mutations made offline are queued and replayed when connectivity is restored.
| Asset Type | Cache Strategy | TTL |
|---|---|---|
| App shell (HTML, CSS, JS) | Cache-first after first load | Stale until new deploy |
| Estate task plan | Stale-while-revalidate | 10 minutes |
| Document metadata | Stale-while-revalidate | 5 minutes |
| Document files (PDFs) | Cache on demand (explicit user action) | 1 hour |
| API responses (GET) | Network-first with cache fallback | 60 seconds |
Backend API
The API is implemented as SvelteKit server routes (+server.ts files), co-located with the frontend. This is not a tradeoff in Year 1 — it means a single deployment, shared TypeScript types between server and client, and zero serialization overhead. The API layer follows REST conventions with resource-oriented URLs and standard HTTP semantics.
Authentication and RBAC
Authentication is session-based with cookies. Sessions are stored in Upstash Redis with a 30-day sliding expiry. The session token is a cryptographically random 256-bit value (generated with crypto.getRandomValues). The session payload includes the user ID, estate ID (if applicable), role, and a fingerprint of the request IP and user agent for anomaly detection.
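A 256-bit token of this kind might be generated as follows. This is a sketch only: `generateSessionToken` is a hypothetical name, and hex encoding for the cookie value is an assumption the document does not specify.

```typescript
import { webcrypto } from 'node:crypto';

// 32 random bytes = 256 bits, hex-encoded to a 64-character value.
// Hex encoding is an assumption; any URL-safe encoding would do.
function generateSessionToken(): string {
  const bytes = webcrypto.getRandomValues(new Uint8Array(32));
  return Array.from(bytes, (b) => b.toString(16).padStart(2, '0')).join('');
}
```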
Role Definitions
| Role | Description | Key Permissions | Restrictions |
|---|---|---|---|
| executor | Named executor of the estate | Full CRUD on estate, tasks, documents, notifications | Cannot delete estate record (soft delete only) |
| co_executor | Named co-executor | Same as executor; all actions are co-attributed | Cannot remove executor or self |
| attorney | Legal counsel on the estate | Read all; write case notes; upload documents | Cannot send notifications or modify task status |
| family_viewer | Family member with view-only access | Read task status, document list (not contents), estate summary | No writes; no PII fields; no financial data |
| admin | Settle staff (internal only) | All estates; audit log access; legal rule management | All access is logged; no unilateral PII access |
RBAC is enforced at two layers: the route handler level (SvelteKit hooks check the session role before the handler executes) and the database level (Postgres row-level security policies that reference the current session's estate ID). This means a misconfigured route cannot accidentally return data from another estate — the database policy will reject the query.
Key API Endpoints
Estate Management
| Method | Path | Description | Required Role |
|---|---|---|---|
| POST | /api/estates | Create new estate (intake) | Authenticated user |
| GET | /api/estates/:id | Get estate summary | executor, co_executor, attorney, family_viewer |
| PATCH | /api/estates/:id | Update estate (assets discovered, status change) | executor, co_executor |
| GET | /api/estates/:id/tasks | Get full task plan with rule version metadata | executor, co_executor, attorney, family_viewer |
| POST | /api/estates/:id/tasks/regenerate | Regenerate task plan (new assets/state change) | executor, co_executor |
| PATCH | /api/estates/:id/tasks/:taskId | Update task status, add notes | executor, co_executor |
| POST | /api/estates/:id/documents | Request presigned upload URL | executor, co_executor, attorney |
| POST | /api/estates/:id/notifications | Queue notification (tier specified in body) | executor, co_executor |
| GET | /api/estates/:id/benefits | Get benefit scan results | executor, co_executor |
| POST | /api/estates/:id/benefits/scan | Trigger benefit scan for estate | executor, co_executor |
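The route-level half of the RBAC check could look roughly like the sketch below. The action names, `ALLOWED_ROLES`, `SessionContext`, and `isAuthorized` are hypothetical; the role lists are copied from the Required Role column above. The admin role is not listed for these endpoints, so this sketch denies it.

```typescript
type Role = 'executor' | 'co_executor' | 'attorney' | 'family_viewer' | 'admin';

// Hypothetical action names, one per endpoint family in the table above.
type Action =
  | 'estate:read'        // GET /api/estates/:id
  | 'estate:update'      // PATCH /api/estates/:id
  | 'task:update'        // PATCH /api/estates/:id/tasks/:taskId
  | 'document:upload'    // POST /api/estates/:id/documents
  | 'notification:queue' // POST /api/estates/:id/notifications
  | 'benefit:read';      // GET /api/estates/:id/benefits

const ALLOWED_ROLES: Record<Action, readonly Role[]> = {
  'estate:read': ['executor', 'co_executor', 'attorney', 'family_viewer'],
  'estate:update': ['executor', 'co_executor'],
  'task:update': ['executor', 'co_executor'],
  'document:upload': ['executor', 'co_executor', 'attorney'],
  'notification:queue': ['executor', 'co_executor'],
  'benefit:read': ['executor', 'co_executor']
};

interface SessionContext {
  userId: string;
  estateId: string;
  role: Role;
}

function isAuthorized(session: SessionContext, action: Action, estateId: string): boolean {
  // Ownership check first: the session is scoped to a single estate.
  if (session.estateId !== estateId) return false;
  return ALLOWED_ROLES[action].includes(session.role);
}
```

The database-level RLS policy remains the backstop: even if a handler skips this check, a query for another estate's rows returns nothing.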
Sample: Create Estate
// POST /api/estates
// Request Body:
{
"deceased": {
"firstName": "Margaret",
"lastName": "Chen",
"dateOfBirth": "1942-03-15",
"dateOfDeath": "2026-03-01",
"stateOfResidence": "CA",
"ssn": "XXX-XX-XXXX" // encrypted in transit and at rest
},
"executor": {
"relationship": "child"
},
"estateProfile": {
"hasRealProperty": true,
"estimatedAssetValue": "250000-500000",
"hasWill": true,
"hasTrust": false
}
}
// 201 Created Response:
{
"estateId": "est_01j9k3m...",
"status": "intake_complete",
"taskPlan": {
"generatedAt": "2026-04-03T14:22:00Z",
"ruleSetVersion": "CA-2026.1",
"taskCount": 34,
"requiredDeadlines": [
{ "task": "file_probate_petition", "dueByDays": 30 }
]
}
}
// 422 Error (invalid state):
{
"error": "INVALID_STATE_CODE",
"message": "stateOfResidence must be a valid 2-letter US state code",
"field": "deceased.stateOfResidence"
}
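The 422 response above could be produced by a validator along these lines. This is a sketch: `validateStateOfResidence` and `ValidationError` are hypothetical names, and the code set assumes exactly the 50 states (no DC or territories), matching the document's "50-state" scope.

```typescript
interface ValidationError {
  error: string;
  message: string;
  field: string;
}

// Assumption: the 50 states only, with no DC or territories.
const US_STATE_CODES: ReadonlySet<string> = new Set([
  'AL', 'AK', 'AZ', 'AR', 'CA', 'CO', 'CT', 'DE', 'FL', 'GA',
  'HI', 'ID', 'IL', 'IN', 'IA', 'KS', 'KY', 'LA', 'ME', 'MD',
  'MA', 'MI', 'MN', 'MS', 'MO', 'MT', 'NE', 'NV', 'NH', 'NJ',
  'NM', 'NY', 'NC', 'ND', 'OH', 'OK', 'OR', 'PA', 'RI', 'SC',
  'SD', 'TN', 'TX', 'UT', 'VT', 'VA', 'WA', 'WV', 'WI', 'WY'
]);

// Returns null when valid, or the 422 error body when not.
function validateStateOfResidence(value: unknown): ValidationError | null {
  if (typeof value === 'string' && US_STATE_CODES.has(value)) return null;
  return {
    error: 'INVALID_STATE_CODE',
    message: 'stateOfResidence must be a valid 2-letter US state code',
    field: 'deceased.stateOfResidence'
  };
}
```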
State Rules Engine
The rules engine is the most architecturally distinctive component in Settle. It must encode legally correct, state-specific probate rules for all 50 states, remain maintainable by non-engineers (or at least by lawyers working with engineers), support versioning and rollback, and produce an auditable trail of exactly which rule set generated which task plan for a given estate.
Rule Set Schema
Each state has a versioned JSON rule set stored in Postgres. The JSON document defines the full task graph for that state, with conditions that filter and customize tasks based on estate attributes. The schema is designed to be readable by a paralegal reviewing the document — not just a developer.
// Table: state_rule_sets
// Column: rules JSONB
{
"state": "CA",
"version": "CA-2026.1",
"effectiveDate": "2026-01-01",
"legalSources": [
"Cal. Prob. Code § 13100",
"Cal. Prob. Code § 8000"
],
"probateThreshold": {
"grossEstateValueCents": 18450000,
"realPropertyIncluded": true,
"source": "Cal. Prob. Code § 13100 (adjusted annually)"
},
"smallEstateAffidavit": {
"available": true,
"maxValueCents": 18450000,
"waitDays": 40,
"form": "DE-305"
},
"tasks": [
{
"id": "obtain_death_certificates",
"category": "immediate",
"priority": 1,
"title": "Order certified death certificates",
"description": "Order at least 10 certified copies from VitalChek...",
"conditions": [],
"deadlineDays": 7,
"requiredFor": ["open_estate_account", "notify_ssa"],
"legalGuidance": "Procedural step. Order via VitalChek or local registrar."
},
{
"id": "file_probate_petition",
"category": "probate",
"priority": 2,
"title": "File petition for probate",
"conditions": [
{
"field": "estate.requiresProbate",
"operator": "eq",
"value": true
}
],
"deadlineDays": 30,
"court": "Superior Court, Probate Division",
"forms": ["DE-111", "DE-140"],
"filingFee": {
"baseFeeCents": 39500,
"source": "Cal. Gov. Code § 70650"
}
}
]
}
Rule Evaluation
The rules engine evaluates a rule set against an EstateContext object — a snapshot of all estate attributes relevant to task generation. Evaluation is a pure function with no side effects: given the same EstateContext and RuleSet, it always produces the same task list. This property makes it trivially testable.
// src/lib/rules/evaluator.ts
import { addDays } from 'date-fns'; // assumed date helper; any equivalent addDays works
export interface EstateContext {
estateId: string;
stateCode: string;
dateOfDeath: Date;
estimatedGrossValueCents: number;
hasRealProperty: boolean;
hasWill: boolean;
hasTrust: boolean;
hasMinorChildren: boolean;
hasVeteranStatus: boolean;
hasBusinessInterests: boolean;
requiresProbate: boolean; // derived: grossValue > threshold
}
export function evaluateRuleSet(
context: EstateContext,
ruleSet: StateRuleSet
): GeneratedTask[] {
const tasks: GeneratedTask[] = [];
for (const rule of ruleSet.tasks) {
const conditionsMet = rule.conditions.every(
(c) => evaluateCondition(c, context)
);
if (conditionsMet) {
tasks.push({
ruleId: rule.id,
ruleSetVersion: ruleSet.version,
title: rule.title,
description: rule.description,
category: rule.category,
priority: rule.priority,
dueDateCalc: rule.deadlineDays
? addDays(context.dateOfDeath, rule.deadlineDays)
: null,
legalGuidance: rule.legalGuidance,
forms: rule.forms ?? [],
status: 'pending'
});
}
}
return tasks.sort((a, b) => a.priority - b.priority);
}
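// Resolves a dotted path such as "estate.requiresProbate" against the context.
// Rule documents prefix fields with "estate."; EstateContext is flat, so the
// prefix is dropped before lookup.
function getNestedValue(context: EstateContext, field: string): unknown {
  const path = field.replace(/^estate\./, '').split('.');
  return path.reduce<unknown>(
    (obj, key) => (obj as Record<string, unknown> | undefined)?.[key],
    context
  );
}

function evaluateCondition(
  condition: RuleCondition,
  context: EstateContext
): boolean {
  const value = getNestedValue(context, condition.field);
  const expected = condition.value;
  switch (condition.operator) {
    case 'eq': return value === expected;
    // Ordering comparisons only apply to numbers; anything else fails the condition.
    case 'gt': return typeof value === 'number' && typeof expected === 'number' && value > expected;
    case 'lt': return typeof value === 'number' && typeof expected === 'number' && value < expected;
    case 'in': return Array.isArray(expected) && expected.includes(value);
    default: throw new Error(`Unknown operator: ${condition.operator}`);
  }
}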
Versioning and the Audit Trail
Rule sets are immutable once published. When a legal change requires updating California's rules, a new version CA-2026.2 is created — the old version is never modified. Every generated task carries the ruleSetVersion that produced it. This creates a complete audit trail: any task in any estate can be traced to the exact rule text that created it, at the time it was created.
| Column | Purpose |
|---|---|
| state_rule_sets.version | Unique identifier, format STATE-YYYY.N (e.g., CA-2026.2) |
| state_rule_sets.status | draft, review, published, superseded |
| state_rule_sets.superseded_by | FK to newer version; forms a linked list of rule history |
| tasks.rule_set_version | Captured at task generation time; never updated |
| estate_events.rule_regeneration | Audit event when task plan is regenerated; records old and new versions |
When new assets are discovered mid-process (e.g., a retirement account found three months in), the executor can trigger task plan regeneration. The engine runs against the current published rule set for the estate's state, adds any new tasks that aren't already present, and emits an estate_event recording the delta. Existing tasks are never deleted during regeneration — they may be superseded but the record of their creation is preserved.
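The regeneration delta can be sketched as a comparison of rule IDs. The names below (`computeRegenerationDelta`, `RegenerationDelta`, and the trimmed task shapes) are hypothetical, and treating only non-superseded tasks as "already present" is an assumption the document does not spell out.

```typescript
interface ExistingTask {
  ruleId: string;
  ruleSetVersion: string;
  supersededAt: Date | null;
}

interface GeneratedTask {
  ruleId: string;
  ruleSetVersion: string;
  title: string;
}

interface RegenerationDelta {
  added: GeneratedTask[];   // tasks to insert
  alreadyPresent: string[]; // rule IDs left untouched
}

// Existing tasks are never deleted; the delta only lists what to add.
function computeRegenerationDelta(
  existing: readonly ExistingTask[],
  generated: readonly GeneratedTask[]
): RegenerationDelta {
  // Assumption: a superseded task does not block re-adding its rule.
  const activeRuleIds = new Set(
    existing.filter((t) => t.supersededAt === null).map((t) => t.ruleId)
  );
  const added: GeneratedTask[] = [];
  const alreadyPresent: string[] = [];
  for (const task of generated) {
    if (activeRuleIds.has(task.ruleId)) alreadyPresent.push(task.ruleId);
    else added.push(task);
  }
  return { added, alreadyPresent };
}
```

The returned delta is what the estate_event would record alongside the old and new rule set versions.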
Rule Update Process
- A legal change is recorded as legal-rule-change with the cite and effective date.
- A new rule set version is created with status = 'draft' and effective_date set to the statutory effective date.
- The review is recorded in the rule_set_reviews table with reviewer identity and timestamp.
- On the effective_date, the admin publishes the new version. The old version's superseded_by is set. New estates in that state will use the new version immediately. Existing estates continue with their current task plan unless the executor triggers regeneration.
- The legalGuidance field in every task rule is a procedural description, not legal advice. The schema enforces that this field cannot exceed 500 characters and must pass a content review that rejects second-person advisory language ("you should," "you must consult"). The admin UI renders a warning if submitted guidance contains these patterns.
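The content check on legalGuidance might be implemented along the lines below. This is a sketch: `reviewLegalGuidance` and the constant names are hypothetical, and the pattern list contains only the two phrases the document names.

```typescript
const MAX_GUIDANCE_LENGTH = 500;

// Only the two phrases named in the document; a real list would be longer.
const ADVISORY_PATTERNS: readonly RegExp[] = [
  /\byou should\b/i,
  /\byou must consult\b/i
];

// Returns a list of human-readable issues; an empty list means the text passes.
function reviewLegalGuidance(text: string): string[] {
  const issues: string[] = [];
  if (text.length > MAX_GUIDANCE_LENGTH) {
    issues.push(`legalGuidance exceeds ${MAX_GUIDANCE_LENGTH} characters`);
  }
  for (const pattern of ADVISORY_PATTERNS) {
    if (pattern.test(text)) {
      issues.push(`legalGuidance contains advisory language matching ${pattern.source}`);
    }
  }
  return issues;
}
```

Returning a list rather than throwing lets the admin UI render every warning at once.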
Data Architecture
Postgres via Neon is the primary data store for all structured data. R2 stores documents. There is no separate analytics database at Year 1 — Neon's read replica capability provides read scaling without operational overhead. The schema is designed for the long-lived, event-sourced nature of estate workflows.
Core Schema
CREATE TABLE estates (
id TEXT PRIMARY KEY DEFAULT gen_estate_id(), -- 'est_' prefix + ulid
status TEXT NOT NULL DEFAULT 'intake', -- intake|active|closing|closed|archived
state_code CHAR(2) NOT NULL,
opened_at TIMESTAMPTZ NOT NULL DEFAULT now(),
closed_at TIMESTAMPTZ,
rule_set_version TEXT, -- FK: state_rule_sets.version
estimated_value_cents BIGINT,
requires_probate BOOLEAN, -- set by the application at intake and on regeneration:
-- estimated_value_cents > probate_threshold_cents of the published rule set
-- for state_code (a Postgres generated column cannot contain a subquery)
metadata JSONB NOT NULL DEFAULT '{}' -- flexible attributes
);
CREATE INDEX idx_estates_state ON estates(state_code);
CREATE INDEX idx_estates_status ON estates(status)
WHERE status NOT IN ('closed','archived');
CREATE TABLE deceased (
id TEXT PRIMARY KEY DEFAULT gen_id(),
estate_id TEXT NOT NULL REFERENCES estates(id),
first_name TEXT NOT NULL,
last_name TEXT NOT NULL,
date_of_birth DATE,
date_of_death DATE NOT NULL,
state_of_residence CHAR(2) NOT NULL,
-- Column-level encrypted fields (AES-256-GCM, application-layer)
ssn_encrypted BYTEA, -- encrypted SSN
ssn_last4 CHAR(4), -- unencrypted for display/lookup
-- Access to ssn_encrypted is audit-logged at application layer
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE UNIQUE INDEX idx_deceased_estate ON deceased(estate_id);
CREATE TABLE tasks (
id TEXT PRIMARY KEY DEFAULT gen_id(),
estate_id TEXT NOT NULL REFERENCES estates(id),
rule_id TEXT NOT NULL, -- e.g. 'file_probate_petition'
rule_set_version TEXT NOT NULL, -- immutable at creation
title TEXT NOT NULL,
category TEXT NOT NULL, -- immediate|probate|financial|notifications|etc
status TEXT NOT NULL DEFAULT 'pending', -- pending|in_progress|complete|skipped|n_a
priority INTEGER NOT NULL DEFAULT 50,
due_date DATE,
completed_at TIMESTAMPTZ,
completed_by TEXT REFERENCES users(id),
notes TEXT,
is_generated BOOLEAN NOT NULL DEFAULT TRUE, -- false = manually added
superseded_at TIMESTAMPTZ, -- set if task replaced on regeneration
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE INDEX idx_tasks_estate ON tasks(estate_id) WHERE superseded_at IS NULL;
CREATE INDEX idx_tasks_status ON tasks(estate_id, status)
WHERE status IN ('pending','in_progress');
CREATE TABLE state_rule_sets (
version TEXT PRIMARY KEY, -- 'CA-2026.1'
state_code CHAR(2) NOT NULL,
status TEXT NOT NULL DEFAULT 'draft', -- draft|review|published|superseded
effective_date DATE NOT NULL,
superseded_by TEXT REFERENCES state_rule_sets(version),
probate_threshold_cents BIGINT NOT NULL,
rules JSONB NOT NULL, -- full rule set document
reviewed_by TEXT, -- attorney name on record
reviewed_at TIMESTAMPTZ,
published_by TEXT REFERENCES users(id),
published_at TIMESTAMPTZ,
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE INDEX idx_rules_state_published ON state_rule_sets(state_code)
WHERE status = 'published';
CREATE TABLE notifications (
id TEXT PRIMARY KEY DEFAULT gen_id(),
estate_id TEXT NOT NULL REFERENCES estates(id),
institution_id TEXT REFERENCES institutions(id),
tier SMALLINT NOT NULL CHECK (tier IN (1, 2, 3)),
channel TEXT NOT NULL, -- api|mail|phone_script
status TEXT NOT NULL DEFAULT 'queued', -- queued|processing|sent|delivered|failed
idempotency_key TEXT UNIQUE NOT NULL, -- prevents duplicate sends
external_id TEXT, -- Lob letter ID, API confirmation, etc.
queued_at TIMESTAMPTZ NOT NULL DEFAULT now(),
sent_at TIMESTAMPTZ,
delivered_at TIMESTAMPTZ,
failed_at TIMESTAMPTZ,
failure_reason TEXT,
retry_count SMALLINT NOT NULL DEFAULT 0,
payload JSONB NOT NULL -- full request payload, redacted of PII
);
CREATE INDEX idx_notif_estate ON notifications(estate_id);
CREATE INDEX idx_notif_status ON notifications(status) WHERE status IN ('queued','processing');
-- No separate unique index is needed: the UNIQUE constraint on idempotency_key already creates one.
CREATE TABLE audit_log (
id BIGSERIAL PRIMARY KEY,
estate_id TEXT, -- nullable for admin/system events
actor_id TEXT NOT NULL, -- user ID or 'system'
actor_role TEXT NOT NULL,
action TEXT NOT NULL, -- 'read_ssn', 'task_complete', etc.
resource_type TEXT NOT NULL,
resource_id TEXT,
ip_address INET,
user_agent TEXT,
metadata JSONB NOT NULL DEFAULT '{}',
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
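-- Append-only enforced via Postgres row-level security. With RLS enabled,
-- any command without a permissive policy is denied, so UPDATE and DELETE
-- are rejected. FORCE applies the policies to the table owner as well.
-- Roles that read the log need their own SELECT policy.
ALTER TABLE audit_log ENABLE ROW LEVEL SECURITY;
ALTER TABLE audit_log FORCE ROW LEVEL SECURITY;
CREATE POLICY audit_insert_only ON audit_log
FOR INSERT WITH CHECK (TRUE);
CREATE POLICY audit_no_update ON audit_log
FOR UPDATE USING (FALSE);
CREATE POLICY audit_no_delete ON audit_log
FOR DELETE USING (FALSE);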
CREATE INDEX idx_audit_estate ON audit_log(estate_id) WHERE estate_id IS NOT NULL;
CREATE INDEX idx_audit_action ON audit_log(action, created_at DESC);
CREATE INDEX idx_audit_actor ON audit_log(actor_id, created_at DESC);
Field-Level Encryption
The most sensitive fields — SSN, financial account numbers, medical record numbers — are encrypted at the application layer before being written to Postgres. Disk encryption (provided by Neon) is a necessary baseline but insufficient alone: it does not protect against a compromised database credential or a SQL injection vulnerability that returns raw rows. Field-level encryption ensures that even a full database dump is useless without the encryption keys.
Encryption Approach
- Algorithm: AES-256-GCM with a random 96-bit nonce per encryption operation
- Key Management: Per-estate derived keys using HKDF from a master key stored in Fly.io Secrets. The master key can be rotated without downtime; re-encrypting existing records under the new key runs as a background job.
- Storage format: BYTEA column stores nonce (12 bytes) || ciphertext || auth_tag (16 bytes)
- Key derivation: HKDF(masterKey, salt=estateId, info="settle-v1-field-encryption")
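// src/lib/crypto/fieldEncryption.ts
import { createCipheriv, createDecipheriv, hkdf, randomBytes } from 'node:crypto';
// auditLog is the application's audit writer; its import is omitted in this excerpt.

const MASTER_KEY = Buffer.from(process.env.FIELD_ENCRYPTION_KEY!, 'hex');
const ALGORITHM = 'aes-256-gcm';

// Per-estate key: HKDF(masterKey, salt=estateId, info="settle-v1-field-encryption").
// SHA-256 as the HKDF hash is an assumption; the spec names only HKDF.
function deriveEstateKey(estateId: string): Promise<Buffer> {
  return new Promise((resolve, reject) => {
    hkdf('sha256', MASTER_KEY, estateId, 'settle-v1-field-encryption', 32,
      (err, key) => (err ? reject(err) : resolve(Buffer.from(key))));
  });
}

export async function encryptField(
  plaintext: string,
  estateId: string
): Promise<Buffer> {
  const derivedKey = await deriveEstateKey(estateId);
  const nonce = randomBytes(12); // fresh 96-bit nonce per encryption
  const cipher = createCipheriv(ALGORITHM, derivedKey, nonce);
  const encrypted = Buffer.concat([
    cipher.update(plaintext, 'utf8'),
    cipher.final()
  ]);
  const authTag = cipher.getAuthTag();
  // Pack: nonce (12) + ciphertext + authTag (16)
  return Buffer.concat([nonce, encrypted, authTag]);
}

export async function decryptField(
  cipherBuffer: Buffer,
  estateId: string
): Promise<string> {
  // Audit log this access BEFORE decryption
  await auditLog.write({ action: 'field_decryption', estateId });
  const derivedKey = await deriveEstateKey(estateId);
  const nonce = cipherBuffer.subarray(0, 12);
  const authTag = cipherBuffer.subarray(cipherBuffer.length - 16);
  const ciphertext = cipherBuffer.subarray(12, cipherBuffer.length - 16);
  const decipher = createDecipheriv(ALGORITHM, derivedKey, nonce);
  decipher.setAuthTag(authTag);
  // Join the raw bytes before decoding so multi-byte characters are never split
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString('utf8');
}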
Fields That Require Encryption
| Table | Column | Sensitivity | Display Fallback |
|---|---|---|---|
| deceased | ssn_encrypted | Critical | ssn_last4 (unencrypted) |
| financial_accounts | account_number_encrypted | Critical | account_last4 |
| financial_accounts | routing_number_encrypted | High | institution name |
| benefits | policy_number_encrypted | High | insurer name + type |
| deceased | medical_record_numbers | High | provider name only |
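The display fallback for SSNs can be derived once at write time, before the full value is encrypted. The helpers below are a hypothetical sketch (`ssnLast4` and `maskSsn` are not names from the codebase); the masked format mirrors the "XXX-XX-XXXX" placeholder used in the API sample.

```typescript
// Extracts the last four digits for the unencrypted ssn_last4 column.
function ssnLast4(ssn: string): string {
  const digits = ssn.replace(/\D/g, '');
  if (digits.length !== 9) {
    throw new Error('SSN must contain exactly 9 digits');
  }
  return digits.slice(-4);
}

// Renders the display value without ever touching ssn_encrypted.
function maskSsn(last4: string): string {
  return `XXX-XX-${last4}`;
}
```

Because display paths read only ssn_last4, routine page loads never trigger a decryption or its audit log entry.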
Document Vault (Cloudflare R2)
Documents — death certificates, wills, financial statements — are stored in R2 with server-side encryption (SSE-C using a per-estate key). The API never streams document content directly; instead it issues short-lived (5-minute) presigned URLs for download and upload. Documents are organized in R2 by a path scheme that does not expose PII in the object key.
R2 path format: estates/{estateId}/{documentType}/{ulid}.{ext}
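A key builder for this path scheme might look like the sketch below. `buildObjectKey` is a hypothetical helper, and the character allowlists are an assumption added to keep free text (and therefore PII) out of object keys.

```typescript
// Builds estates/{estateId}/{documentType}/{ulid}.{ext}
function buildObjectKey(
  estateId: string,
  documentType: string,
  ulid: string,
  ext: string
): string {
  // Assumption: document types are fixed snake_case identifiers, never user text.
  if (!/^[a-z0-9_]+$/.test(documentType)) {
    throw new Error(`Invalid document type: ${documentType}`);
  }
  if (!/^[a-z0-9]+$/.test(ext)) {
    throw new Error(`Invalid extension: ${ext}`);
  }
  return `estates/${estateId}/${documentType}/${ulid}.${ext}`;
}
```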
On upload, the Document Processor worker intercepts the R2 object:created event, performs virus scanning (ClamAV via a Fly.io sidecar), attempts OCR classification, and writes document metadata to Postgres. If virus scanning fails, the document is moved to a quarantine prefix and the estate is notified.
Notification Service Architecture
The notification service is the most operationally complex component in Settle. It must send actual cancellation requests to real institutions, generate and mail physical letters via Lob, and produce call scripts — each with fundamentally different reliability, latency, and tracking requirements. A failure in Tier 1 must not affect Tier 2; a physical letter sent twice is a worse failure mode than a digital request sent twice.
Tier 1 (API cancellation):
- Direct API calls to subscription services
- Netflix, Spotify, utility APIs
- Idempotent via service-side dedup
- Confirmation number stored in DB
- Retry on 5xx, not on 4xx
- Success = 200/204 from service
Tier 2 (physical mail via Lob):
- Lob API for letter generation
- Death cert PDF merged into template
- Lob provides USPS tracking
- Delivery webhook updates status
- Never send twice (idempotency key)
- CASS-certified addressing required
Tier 3 (call script generation):
- Purely computational, with no external calls
- Template rendering + personalization
- Returns structured script object
- Includes estimated hold times
- No retry needed (idempotent by nature)
- Logged to the notifications table
Worker Architecture
The notification worker is a long-running Fly.io process that polls the pg-boss job queue. It is isolated from the API server to prevent notification failures from impacting user-facing requests. The worker processes jobs by tier in priority order: Tier 2 (physical mail, longest lead time) processes first, then Tier 1, then Tier 3.
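The tier ordering can be expressed as a simple comparator. The sketch below is illustrative only: `QueuedJob`, `TIER_RANK`, and `orderForProcessing` are hypothetical names, and breaking ties by queue time is an assumption. In practice the queue's own priority setting would likely do this work.

```typescript
type Tier = 1 | 2 | 3;

interface QueuedJob {
  id: string;
  tier: Tier;
  queuedAt: number; // epoch milliseconds
}

// Lower rank is processed first: Tier 2, then Tier 1, then Tier 3.
const TIER_RANK: Record<Tier, number> = { 2: 0, 1: 1, 3: 2 };

// Returns a new array; ties within a tier fall back to oldest-first (assumption).
function orderForProcessing(jobs: readonly QueuedJob[]): QueuedJob[] {
  return [...jobs].sort(
    (a, b) => TIER_RANK[a.tier] - TIER_RANK[b.tier] || a.queuedAt - b.queuedAt
  );
}
```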
Tier 1 — API Cancellation
Tier 1 notifications call institution APIs directly to cancel subscriptions or notify of a death. The key engineering challenges are: (1) each institution has a different API contract, (2) idempotency must be enforced even if the request succeeds but the network drops before the response arrives, and (3) the system must distinguish between "institution confirmed cancellation" and "institution returned 200 but nothing happened."
// Institution adapter interface — each institution implements this
interface InstitutionAdapter {
institutionId: string;
sendNotification(
payload: NotificationPayload,
idempotencyKey: string
): Promise<NotificationResult>;
}
interface NotificationResult {
success: boolean;
externalId?: string; // confirmation number from institution
confirmedAction?: string; // 'cancelled' | 'notified' | 'pending_review'
errorCode?: string;
retryable: boolean; // false for 4xx, true for 5xx/network errors
}
// Idempotency key construction
const idempotencyKey = `tier1:${estateId}:${institutionId}:${notificationType}`;
// This key is stable across retries. Postgres UNIQUE constraint on
// notifications.idempotency_key ensures no duplicate processing.
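The effect of the UNIQUE constraint can be illustrated with an in-memory stand-in. `tier1IdempotencyKey` and `InMemoryNotificationTable` are hypothetical; in production the database rejects the duplicate insert rather than a `Set`.

```typescript
function tier1IdempotencyKey(
  estateId: string,
  institutionId: string,
  notificationType: string
): string {
  return `tier1:${estateId}:${institutionId}:${notificationType}`;
}

// Stand-in for the notifications table and its UNIQUE(idempotency_key) constraint.
class InMemoryNotificationTable {
  private readonly keys = new Set<string>();

  // Returns true if the row was inserted, false if the key already exists.
  insert(idempotencyKey: string): boolean {
    if (this.keys.has(idempotencyKey)) return false;
    this.keys.add(idempotencyKey);
    return true;
  }
}
```

Because the key contains no timestamp or attempt counter, a retry of the same notification always maps to the same row.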
Tier 2 — Physical Mail via Lob
Physical mail has the highest failure cost of the three tiers. A letter sent to a wrong address wastes money and time, but more importantly, it may delay asset recovery for a grieving family. The Tier 2 handler performs address validation via Lob's CASS-certified address API before creating the letter. Only after validation succeeds does it proceed to letter creation.
// Tier 2 handler flow
async function handleTier2(job: NotificationJob) {
  // 1. Check idempotency: was this letter already sent?
  const existing = await db.notifications.findByIdempotencyKey(job.idempotencyKey);
  if (existing?.external_id) {
    // Already sent to Lob; return the stored letter ID
    return { success: true, externalId: existing.external_id, alreadySent: true };
  }

  // 2. Validate address via Lob before creating letter
  const addressVerification = await lob.usVerifications.verify({
    primary_line: job.recipientAddress.line1,
    city: job.recipientAddress.city,
    state: job.recipientAddress.state,
    zip_code: job.recipientAddress.zip
  });
  if (addressVerification.deliverability === 'undeliverable') {
    throw new NotificationError('UNDELIVERABLE_ADDRESS', { retryable: false });
  }

  // 3. Fetch death certificate presigned URL from R2
  const deathCertUrl = await r2.getPresignedUrl(job.estateId, 'death_certificate');

  // 4. Create letter via Lob API
  const letter = await lob.letters.create(
    {
      description: `Estate notification: ${job.estateId}`,
      // A letter's "to" takes address_* fields, not the verification's parsed
      // components, so map the standardized values across.
      to: {
        name: job.institutionName,
        address_line1: addressVerification.primary_line,
        address_line2: addressVerification.secondary_line,
        address_city: addressVerification.components.city,
        address_state: addressVerification.components.state,
        address_zip: addressVerification.components.zip_code
      },
      from: SETTLE_RETURN_ADDRESS,
      file: job.templateId,
      merge_variables: {
        deceasedName: job.deceasedName,
        institutionName: job.institutionName,
        estateExecutor: job.executorName,
        deathCertificateEnclosure: true,
        deathCertificateUrl: deathCertUrl // assumption: the template embeds the certificate from this URL
      }
    },
    // Lob idempotency header prevents a duplicate letter if the network fails
    // (lob-node style headers argument; confirm against the SDK version in use)
    { 'idempotency-key': job.idempotencyKey }
  );

  // 5. Store Lob letter ID for tracking
  await db.notifications.update(job.notificationId, {
    status: 'sent',
    external_id: letter.id,
    sent_at: new Date()
  });
  return { success: true, externalId: letter.id, alreadySent: false };
}
Lob sends delivery status webhooks when a letter is in-transit, delivered, or returned. The webhook handler updates notifications.status and, on delivery failure (returned mail), creates a new task for the executor to verify the institution's address.
Tier 3 — Call Script Generation
Tier 3 is a pure computation: given an estate context and institution profile, generate a structured call script. No external API calls are made. The script object is returned synchronously to the client and also stored in notifications for record-keeping. The handler uses a template system with institution-specific overrides for hold queues, required account information, and department routing.
Reliability Model
| Tier | Retry Strategy | Backoff | Max Retries | Failure Action |
|---|---|---|---|---|
| Tier 1 | Retry on 5xx and network errors; no retry on 4xx | Exponential: 1m, 5m, 30m, 2h | 4 | Mark failed; create manual task for executor |
| Tier 2 | Retry on Lob API errors; no retry on address validation failure | Exponential: 5m, 30m, 4h | 3 | Mark failed; notify executor to verify address |
| Tier 3 | No retry needed (pure computation) | — | 0 | Surface error with context; log for debugging |
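The retry table translates directly into a lookup. The sketch below uses hypothetical names (`BACKOFF_MINUTES`, `nextRetryDelayMinutes`); the delays are taken from the Backoff column, and a `null` result means the job should take the Failure Action instead.

```typescript
type Tier = 1 | 2 | 3;

// Delay before each retry, in minutes, from the Backoff column above.
const BACKOFF_MINUTES: Record<Tier, readonly number[]> = {
  1: [1, 5, 30, 120],
  2: [5, 30, 240],
  3: []
};

// retryCount is the number of retries already attempted (notifications.retry_count).
// Returns the delay before the next retry, or null when the job should be marked failed.
function nextRetryDelayMinutes(
  tier: Tier,
  retryCount: number,
  retryable: boolean
): number | null {
  if (!retryable) return null;
  const schedule = BACKOFF_MINUTES[tier];
  return retryCount < schedule.length ? schedule[retryCount] : null;
}
```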
Benefit Discovery Architecture
Benefit discovery is the process of finding assets and entitlements the family may not know about: unclaimed property held by states, life insurance policies, pension benefits, VA entitlements. Some sources have APIs; some require web scraping; some require manual submission. The architecture must handle all three and present results with appropriate confidence signals so families act on real findings, not false positives.
Source Taxonomy
| Source | Data | Integration Type | Auth Required | Rate Limit |
|---|---|---|---|---|
| NAUPA / MissingMoney | Unclaimed property (all 50 states) | HTTP scrape | None | Per-state limits, ~1 req/10s |
| NAIC Life Policy Locator | Life insurance policies | Form submission | Registration required | Manual review cycle (days) |
| PBGC Pension Search | Defined-benefit pension benefits | HTTP scrape | None | Unknown; respect robots.txt |
| VA Benefits | Veteran burial/death benefits | API (va.gov) | API key | Documented per-token limits |
| SSA Death Benefits | Lump-sum death payment | Manual guidance | — | — |
| MIB (Medical Info Bureau) | Insurance application records | Manual guidance | — | — |
| State unclaimed property (direct) | State treasury holdings | HTTP scrape | Varies by state | Varies by state |
Scanner Architecture
The benefit scanner is a Fly.io worker that runs on a cron schedule (once per week per active estate) and on-demand when triggered by the executor. It uses a source adapter pattern — each external source has an adapter implementing a common interface — allowing new sources to be added without modifying the core scanner logic.
// Source adapter interface
interface BenefitSourceAdapter {
sourceId: string;
type: 'api' | 'scrape' | 'manual_guidance';
canAutoScan: boolean;
scan(
context: ScanContext
): Promise<BenefitScanResult[]>;
}
interface ScanContext {
deceasedName: { first: string; last: string };
deceasedSsn?: string; // decrypted only for sources that require it
dateOfDeath: Date;
stateOfResidence: string;
hasVeteranStatus: boolean;
}
interface BenefitScanResult {
sourceId: string;
confidence: 'confirmed' | 'probable' | 'possible';
benefitType: string;
estimatedValueCents?: number;
claimUrl?: string;
manualStepsRequired?: string[];
rawData: Record<string, unknown>;
scannedAt: Date;
}
Caching Strategy
External benefit databases must not be queried on every page load. Scan results are cached in Postgres with a scanned_at timestamp. The frontend shows the cached result with a freshness indicator. Scans are throttled: no source is queried more than once per 24 hours per estate, regardless of how many times the executor views the benefits page. This prevents accidental hammering of NAUPA or PBGC from a user repeatedly refreshing the page.
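The 24-hour throttle reduces to a timestamp comparison per source. In this sketch, `sourcesDueForScan` and `MIN_SCAN_INTERVAL_MS` are hypothetical names, and the map of last scan times stands in for the scanned_at values stored in Postgres.

```typescript
const MIN_SCAN_INTERVAL_MS = 24 * 60 * 60 * 1000; // once per 24 hours per estate

// Returns the sources that may be queried now for one estate.
// lastScannedAt maps sourceId to the most recent scanned_at for that estate.
function sourcesDueForScan(
  sourceIds: readonly string[],
  lastScannedAt: ReadonlyMap<string, Date>,
  now: Date
): string[] {
  return sourceIds.filter((id) => {
    const last = lastScannedAt.get(id);
    return last === undefined || now.getTime() - last.getTime() >= MIN_SCAN_INTERVAL_MS;
  });
}
```

Page views never call this function; only the cron job and the explicit scan endpoint do, so refreshing the benefits page reads cached rows only.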
Confidence Model
Security Architecture
Settle handles death certificates, Social Security numbers, financial account numbers, and medical history. A breach of this data would harm a family that is already grieving, and is a categorical failure. Security is not a feature to be added in Year 2: every component in this document has been designed with Defense in Depth as a first principle.
Threat Model Summary
| Threat Vector | Severity | Primary Mitigation | Secondary Mitigation |
|---|---|---|---|
| SQL injection → PII exfiltration | Critical | Parameterized queries (Drizzle ORM); no string concatenation | Field-level encryption renders exfiltrated SSNs useless |
| IDOR: accessing another estate's data | Critical | Postgres RLS policies keyed to session estate ID | API middleware ownership check before every handler |
| Credential theft (session hijack) | Critical | HttpOnly, Secure, SameSite=Strict session cookie | Request fingerprinting; IP change triggers re-auth prompt |
| Document vault unauthorized access | Critical | Presigned URLs valid for 5 minutes only; R2 bucket not public | SSE-C per-estate keys; document access is audit-logged |
| Compromised database credential | High | Neon per-role least-privilege credentials; no superuser in app | Field encryption; SSNs unreadable without app-layer keys |
| Supply chain attack (npm) | High | Dependabot; lockfile integrity checked in CI | Minimal dependency philosophy; audit npm packages quarterly |
| Unauthorized Practice of Law | High | Content policy in rules engine schema; architectural separation of guidance vs advice | Legal review required for all rule set publications |
| Over-retention of sensitive data | Medium | Retention policy: 7 years post-estate closure; automated deletion jobs | Right-to-erasure workflow for CCPA compliance |
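The IDOR row above names an API-level ownership check as the second line of defense behind Postgres RLS. A minimal sketch of such a check follows; `SessionContext`, `checkEstateAccess`, and the choice of 401 and 403 status codes are assumptions for illustration, not the documented middleware.

```typescript
// Hypothetical session shape: the session is bound to a single estate.
interface SessionContext {
  userId: string;
  estateId: string;
}

type AccessDecision =
  | { allowed: true }
  | { allowed: false; status: 401 | 403 };

// Compares the estate in the request path with the estate in the session.
// Intended to run before every handler, independent of the RLS policies.
function checkEstateAccess(
  session: SessionContext | null,
  requestedEstateId: string
): AccessDecision {
  if (session === null) return { allowed: false, status: 401 };
  if (session.estateId !== requestedEstateId) return { allowed: false, status: 403 };
  return { allowed: true };
}
```

Because the check relies only on the session and the requested ID, a bug in a handler's query cannot widen access: the request is rejected before the handler runs, and RLS still applies if the check is ever bypassed.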
Defense in Depth: Layer View
The application role settle_app can only read rows belonging to the estate in the session context. A separate read-only role, settle_analytics, has no access to the PII tables. The audit_log table is append-only at the Postgres policy level.
Postgres RLS prevents any application role from updating or deleting audit records. Logs are retained for 7 years per legal hold requirements and are exportable per estate for legal discovery.
Data Retention and the Right to Erasure
Estate records are retained for 7 years following estate closure, aligned with common statutes of limitations for executor liability. At the 7-year mark, a background job initiates deletion: PII fields are overwritten with null values, documents are deleted from R2, and the estate record is anonymized (names replaced with hashed identifiers). The estate's task completion record and financial summary are retained in anonymized form for aggregate analytics.
For CCPA right-to-erasure requests during an active estate, the request is held pending estate closure (deletion during active administration would be legally problematic). The request is logged and honored automatically at closure plus a 90-day cooling-off period.
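The two timelines described above (7 years after closure by default, or closure plus 90 days when an erasure request is on file) reduce to simple date arithmetic. The sketch below uses hypothetical function names and assumes that closure timestamps are stored in UTC.

```typescript
const RETENTION_YEARS = 7;
const ERASURE_COOLING_OFF_DAYS = 90;
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Default path: deletion becomes due 7 calendar years after closure.
// Note: a Feb 29 closure rolls forward to Mar 1 in a non-leap target year.
function retentionDeletionDate(closedAt: Date): Date {
  const due = new Date(closedAt.getTime());
  due.setUTCFullYear(due.getUTCFullYear() + RETENTION_YEARS);
  return due;
}

// Erasure path: the held request is honored 90 days after closure.
function erasureEffectiveDate(closedAt: Date): Date {
  return new Date(closedAt.getTime() + ERASURE_COOLING_OFF_DAYS * MS_PER_DAY);
}

// Picks the applicable due date for the deletion job.
function deletionDueDate(closedAt: Date, erasureRequested: boolean): Date {
  return erasureRequested ? erasureEffectiveDate(closedAt) : retentionDeletionDate(closedAt);
}
```

For example, an estate closed on 2025-03-15 would be due for deletion on 2032-03-15 by default, or on 2025-06-13 if an erasure request was logged during administration.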
Scaling Strategy
Settle's growth trajectory — 500 estates in Year 1, 5,000 in Year 2, 50,000 in Year 3 — spans two orders of magnitude. The architecture is sized for Year 1 today and designed to scale to Year 3 without rearchitecting the core data model or notification service. Each year has a clear set of scaling gates that trigger architectural evolution.
Year 1 (500 estates)
- Modular monolith on Fly.io
- Neon 0.25 CU, autoscale
- Single notification worker
- Upstash Redis for sessions
- Benefit scanner: weekly cron
Year 2 (5,000 estates)
- Add Neon read replica for analytics
- Scale API to 4 instances
- Notification worker: 2 instances
- Add connection pooling (PgBouncer)
- R2 bucket per region (2 regions)
Year 3 (50,000 estates)
- Extract notification service to standalone API
- Extract benefit scanner to standalone service
- Neon: scale to 4+ CU with dedicated compute
- Consider Postgres partitioning on tasks by estate
- Dedicated analytics database (ClickHouse)
Key Scaling Decisions and Their Triggers
| When | Trigger | Action | Complexity Cost |
|---|---|---|---|
| ~2,000 estates | DB p99 query time > 100ms | Add PgBouncer connection pooler; add Neon read replica for reports | Low — operational only |
| ~5,000 estates | Notification queue depth consistently > 100 | Scale notification worker to 3 concurrent instances | Low — Fly.io scaling config only |
| ~10,000 estates | API P99 latency > 500ms or error rate > 0.1% | Separate API server and notification worker into distinct Fly apps; independent scaling | Medium — deploy config and inter-service auth |
| ~30,000 estates | Tasks table exceeds 10M rows; scan times degrade | Partition tasks table by estate_id range; add partial indexes on active estates | Medium — zero-downtime migration required |
| ~50,000 estates | Benefit scanner can't complete weekly runs within window | Extract scanner to dedicated service; parallelize by state/source | High — separate service with its own queue and auth |
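The triggers in the table can be expressed as data so that a dashboard or alerting job can report which gates have tripped. This is a sketch under assumptions: the `PlatformMetrics` field names and gate identifiers are hypothetical, and it checks a single metrics sample, whereas a trigger such as "consistently > 100" would need a sustained window in practice.

```typescript
// Hypothetical snapshot of the metrics named in the trigger column.
interface PlatformMetrics {
  dbP99Ms: number;
  notificationQueueDepth: number;
  apiP99Ms: number;
  apiErrorRate: number; // fraction, so 0.001 means 0.1%
  tasksRowCount: number;
  scannerFinishedInWindow: boolean;
}

interface ScalingGate {
  id: string;
  action: string;
  isTripped: (m: PlatformMetrics) => boolean;
}

// Thresholds are taken from the table above.
const SCALING_GATES: ScalingGate[] = [
  { id: 'db-latency', action: 'Add PgBouncer and a Neon read replica',
    isTripped: (m) => m.dbP99Ms > 100 },
  { id: 'notification-backlog', action: 'Scale notification worker to 3 instances',
    isTripped: (m) => m.notificationQueueDepth > 100 },
  { id: 'api-latency-or-errors', action: 'Split API server and notification worker into separate Fly apps',
    isTripped: (m) => m.apiP99Ms > 500 || m.apiErrorRate > 0.001 },
  { id: 'tasks-table-size', action: 'Partition tasks table by estate_id range',
    isTripped: (m) => m.tasksRowCount > 10000000 },
  { id: 'scanner-window', action: 'Extract benefit scanner to a dedicated service',
    isTripped: (m) => !m.scannerFinishedInWindow },
];

// Returns every gate whose trigger condition holds for this sample.
function trippedGates(m: PlatformMetrics): ScalingGate[] {
  return SCALING_GATES.filter((g) => g.isTripped(m));
}
```

Keeping the gates keyed to metrics rather than estate counts matches the table's intent: the estate counts are estimates of when a trigger is likely to fire, while the measured condition is what prompts the change.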
Architecture Decision Records
Every significant architectural decision is recorded here with its context, the options considered, the decision made, and the tradeoffs accepted. These records are immutable once a decision is implemented — new decisions supersede rather than modify them.
Rules engine
Option A: rule sets stored in the database with a TypeScript evaluator
- No new infrastructure to operate
- Rule sets are version-controlled with the database
- TypeScript evaluator is trivially testable
- Readable by paralegals with training
- Easy to extend the schema as rules grow
Option B: dedicated rules engine
- Industry-standard for complex rule systems
- Forward-chaining and conflict resolution built in
- Requires Java runtime or separate service
- High operational overhead for 50-state static rules
- Over-engineered for conditional task filtering
PII encryption
Option A: application-layer field encryption
- Encrypted at-rest data useless without app keys
- Independent of database security posture
- Supports field-level access audit logging
- Enables per-estate key derivation
Option B: database-managed encryption only
- Zero application complexity
- Fields are queryable/indexable
- Does not protect against SQL injection
- Does not protect against credential compromise
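To make the "per-estate key derivation" point concrete, here is a minimal sketch of application-layer field encryption using Node's built-in crypto module. The choice of HKDF-SHA256 for derivation and AES-256-GCM for encryption, the `estate:` info prefix, and the function names are all assumptions for illustration; the document does not specify the algorithms or key management Settle uses.

```typescript
import { createCipheriv, createDecipheriv, hkdfSync, randomBytes } from 'node:crypto';

// Derives a 32-byte key for one estate from a master key (assumed to
// come from a secrets manager). Different estate IDs yield different keys.
function deriveEstateKey(masterKey: Buffer, estateId: string): Buffer {
  return Buffer.from(hkdfSync('sha256', masterKey, Buffer.alloc(0), `estate:${estateId}`, 32));
}

// Encrypts one field value. Output layout: 12-byte IV, 16-byte auth tag,
// then ciphertext, all base64-encoded for storage in a text column.
function encryptField(key: Buffer, plaintext: string): string {
  const iv = randomBytes(12);
  const cipher = createCipheriv('aes-256-gcm', key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, 'utf8'), cipher.final()]);
  const tag = cipher.getAuthTag();
  return Buffer.concat([iv, tag, ciphertext]).toString('base64');
}

// Reverses encryptField. Throws if the key is wrong or the data was altered.
function decryptField(key: Buffer, encoded: string): string {
  const raw = Buffer.from(encoded, 'base64');
  const iv = raw.subarray(0, 12);
  const tag = raw.subarray(12, 28);
  const ciphertext = raw.subarray(28);
  const decipher = createDecipheriv('aes-256-gcm', key, iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString('utf8');
}
```

With a scheme like this, a value exfiltrated from the database is unreadable without the master key held by the application, and a key derived for one estate cannot decrypt another estate's fields. The tradeoff listed for the alternative still applies: encrypted fields cannot be queried or indexed directly.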
Physical mail
Option A: third-party print-and-mail API
- CASS address verification included
- USPS tracking and delivery webhooks
- SOC 2 Type II certified
- No vendor relationship management
- Higher per-letter cost vs volume contracts
Option B: direct print-and-mail operation under a volume contract
- Lower per-letter cost at volume
- Full control over design and timing
- Requires CASS certification
- Return mail handling complexity
- Significant operational overhead
Session management
Option A: server-side sessions stored in Redis
- Instant session revocation on any device
- Role changes take effect immediately
- No complex token refresh infrastructure
- Requires Redis for session storage
- Every request hits Redis (fast, but a dependency)
Option B: stateless tokens with refresh
- Stateless: no session store required
- Horizontally scales without shared state
- Cannot revoke before expiry without a denylist (= session store again)
- Refresh token complexity
- Wrong security model for role-change-heavy workflows
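The revocation and role-change points can be illustrated with a small server-side session store. This sketch uses an in-memory Map as a stand-in for Redis; the class name, method names, and record shape are hypothetical.

```typescript
// Hypothetical session record; in production this would live in Redis.
interface SessionRecord {
  userId: string;
  role: string;
  expiresAtMs: number;
}

class InMemorySessionStore {
  private readonly sessions = new Map<string, SessionRecord>();

  create(sessionId: string, record: SessionRecord): void {
    this.sessions.set(sessionId, record);
  }

  // Looked up on every request, so deletions and edits apply immediately.
  get(sessionId: string, nowMs: number): SessionRecord | null {
    const record = this.sessions.get(sessionId);
    if (record === undefined) return null;
    if (record.expiresAtMs <= nowMs) {
      this.sessions.delete(sessionId);
      return null;
    }
    return record;
  }

  // Logs the user out on every device; returns how many sessions were removed.
  revokeAllForUser(userId: string): number {
    let removed = 0;
    this.sessions.forEach((record, id) => {
      if (record.userId === userId) {
        this.sessions.delete(id);
        removed += 1;
      }
    });
    return removed;
  }

  // Applies a role change to all of the user's live sessions.
  changeRole(userId: string, role: string): void {
    this.sessions.forEach((record) => {
      if (record.userId === userId) record.role = role;
    });
  }
}
```

A stateless token would keep carrying the old role until it expired; here the next request after `changeRole` or `revokeAllForUser` already sees the new state, at the cost of one store lookup per request.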
Frontend framework
Option A: SvelteKit
- Smaller bundle size; faster on mobile
- Co-located API routes reduce context switching
- Form actions great for intake flows
- Smaller ecosystem than React
- Fewer available engineers in hiring market
Option B: React-based framework
- Largest frontend ecosystem
- React Server Components for complex UIs
- Larger engineer hiring pool
- App router complexity overhead
- Larger runtime footprint
Job queue
Option A: Postgres-backed job queue
- No additional infrastructure
- Transactional job creation with business data
- Full job history queryable in SQL
- Lower throughput ceiling (~1,000 jobs/sec) than Redis
- Adds write load to primary database
Option B: Redis-backed job queue
- Higher throughput (10,000+ jobs/sec)
- Better real-time queue monitoring
- Additional infrastructure to operate
- Job state not colocated with business data
- Not necessary at Year 1 volume
Database platform
Option A: Neon
- True Postgres (actual Postgres, not merely Postgres-compatible)
- Database branching for staging
- Scale to zero in dev
- Connection pooling via Neon serverless proxy
- Newer product; some enterprise features still maturing
Option B: Supabase
- Postgres with auth, storage, realtime built in
- Could replace Redis for sessions
- Tighter coupling to Supabase ecosystem
- BYO auth/storage already planned
- More opinionated platform lock-in