Architecture
Written for Phase 2 planning (2026-05-18). Some details are outdated:
- Label count: describes a 47-label schema; shipped weights use 21 BIO labels (Tier 2)
- Tokenizer: describes 16K vocab; the A1 48K tokenizer exists but is not yet used in a stable classifier
- Repo layout: references a packages/ directory; the current repo uses root workspaces
- Model config: 6-layer/256d/4-head is accurate for current shipped weights
See /docs/status for current shipped state.
System shape
┌─────────────────────────────────────────────────────────────────┐
│                          Mailwoman SDK                          │
│                                                                 │
│ ┌──────────────────┐   ┌────────────────────────────────┐       │
│ │ Tokenization     │──▶│ Classifiers (rule + neural)    │       │
│ │ (existing)       │   │                                │       │
│ └──────────────────┘   │  ┌──────────────────────────┐  │       │
│                        │  │ rule: house_number       │  │       │
│                        │  │ rule: postcode           │  │       │
│                        │  │ rule: street_prefix      │  │       │
│                        │  │ rule: whos_on_first      │  │       │
│                        │  │ ...                      │  │       │
│                        │  │                          │  │       │
│                        │  │ neural: NeuralSequence   │──┼───┐   │
│                        │  └──────────────────────────┘  │   │   │
│                        └────────────────────────────────┘   │   │
│                                        │                    │   │
│                                        ▼                    │   │
│                        ┌────────────────────────────────┐   │   │
│                        │ ClassifierPolicy (per-tag)     │   │   │
│                        └────────────────────────────────┘   │   │
│                                        │                    │   │
│                                        ▼                    │   │
│                        ┌────────────────────────────────┐   │   │
│                        │ Solver (existing)              │   │   │
│                        │ ExclusiveCartesianSolver +     │   │   │
│                        │ filters + augmenters           │   │   │
│                        └────────────────────────────────┘   │   │
│                                        │                    │   │
│                                        ▼                    │   │
│                        ┌────────────────────────────────┐   │   │
│                        │ Ranked Solutions               │   │   │
│                        └────────────────────────────────┘   │   │
│                                                             │   │
└─────────────────────────────────────────────────────────────┼───┘
                                                              │
                        ┌─────────────────────────────────────┴──┐
                        │ @mailwoman/neural (new package)        │
                        │                                        │
                        │  ┌──────────────────────────────────┐  │
                        │  │ NeuralSequenceClassifier         │  │
                        │  │ - lazy-loads ONNX model          │  │
                        │  │ - SentencePiece tokenizer        │  │
                        │  │ - emits ClassificationProposal   │  │
                        │  └──────────────────────────────────┘  │
                        │                   │                    │
                        │                   ▼                    │
                        │  ┌──────────────────────────────────┐  │
                        │  │ Weights packages (per locale)    │  │
                        │  │ @mailwoman/neural-weights-en-us  │  │
                        │  │ @mailwoman/neural-weights-fr-fr  │  │
                        │  │ @mailwoman/neural-weights-ja-jp  │  │
                        │  └──────────────────────────────────┘  │
                        └────────────────────────────────────────┘
Monorepo layout
packages/
  core/                   # existing: tokenization, span/section/phrase, classifier base
  classifiers/            # existing rule classifiers, pulled out of core for clarity
  neural/                 # NEW: ONNX runtime + SentencePiece + sequence classifier
  neural-weights-en-us/   # NEW: ONNX model weights for US English
  neural-weights-fr-fr/   # NEW: ONNX model weights for FR French
  corpus/                 # NEW: TS adapters, alignment, synthesis, Parquet output
  corpus-python/          # NEW: Python training pipeline, NOT published to npm
  studio/                 # FUTURE: Phase 5 web UI for human correction
  sdk/                    # existing: public consumer API
  server/                 # existing: HTTP service
  cli.ts                  # existing
The split between neural/ (runtime) and neural-weights-*/ (data) is deliberate. Users running US-only deployments should not download French weights. Weights packages are versioned independently from the runtime.
Key abstractions
ComponentTag
The canonical union of address component types. Defined in reference/SCHEMA.md. Single source of truth. Adding a tag requires a written rationale.
ClassificationProposal
The shape every classifier produces. Replaces the current ad-hoc shape used by rule classifiers. See reference/INTERFACES.md.
interface ClassificationProposal {
  span: Span
  component: ComponentTag
  confidence: number // 0..1
  source: "rule" | "neural" | "merged"
  source_id: string // 'house_number' for rule, 'neural-v0.3-en-us' for neural
  penalty: number
  metadata?: Record<string, unknown>
}
Existing rule classifiers wrap their output in this shape. Neural classifier emits the same shape. The solver does not distinguish.
ClassifierPolicy
Per-component configuration that decides which classifier(s) get authority for that component. This is the Ship of Theseus dial.
interface ClassifierPolicy {
  component: ComponentTag
  mode: "rule_only" | "neural_only" | "both" | "neural_preferred" | "rule_preferred"
  confidence_threshold?: number
  locale?: string // policy can vary per locale
}
Default policy is rule_only for everything until neural is shipped. Migration happens one component at a time, gated on golden-set metrics.
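The per-component gating described above could be sketched roughly as follows. This is a minimal illustration, not the shipped filter: the `Proposal` and `Policy` types are trimmed stand-ins for the interfaces above, `applyPolicy` is a hypothetical name, and the fallback semantics of the two `*_preferred` modes (use the preferred source if it produced anything, otherwise the other) are an assumption, as is applying `confidence_threshold` to neural proposals only.

```typescript
// Trimmed stand-ins for the document's interfaces (illustrative only).
type Source = "rule" | "neural" | "merged"
type Mode = "rule_only" | "neural_only" | "both" | "neural_preferred" | "rule_preferred"

interface Proposal {
  component: string
  confidence: number
  source: Source
}

interface Policy {
  component: string
  mode: Mode
  confidence_threshold?: number
}

// Hypothetical filter: keeps only proposals from classifiers the policy authorizes.
// Assumption: the threshold gates neural proposals; "merged" proposals are ignored here.
function applyPolicy(proposals: Proposal[], policy: Policy): Proposal[] {
  const threshold = policy.confidence_threshold ?? 0
  const mine = proposals.filter((p) => p.component === policy.component)
  const rule = mine.filter((p) => p.source === "rule")
  const neural = mine.filter((p) => p.source === "neural" && p.confidence >= threshold)

  switch (policy.mode) {
    case "rule_only":
      return rule
    case "neural_only":
      return neural
    case "both":
      return [...rule, ...neural]
    case "neural_preferred":
      return neural.length > 0 ? neural : rule
    case "rule_preferred":
      return rule.length > 0 ? rule : neural
  }
}
```

Because every mode is an explicit branch, migrating one component from `rule_only` to `neural_preferred` changes one policy entry and nothing else.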
LocaleProfile
Encapsulates locale-specific behavior: which weights package to load, which rule classifiers apply, which synthesis rules govern training data generation.
interface LocaleProfile {
  locale: string // 'en-US', 'fr-FR', 'ja-JP'
  weightsPackage: string // npm package name
  ruleClassifiers: string[] // identifiers of rule classifiers active for this locale
  componentsSupported: ComponentTag[] // not every locale supports every tag (e.g. JP has no `street`)
  policy: ClassifierPolicy[]
}
Locale strategy
Locales are first-class. The system does not assume English. Every classifier (rule or neural) declares which locales it serves.
- A US-only deployment loads en-US profile, English rule classifiers, US weights.
- A multi-locale deployment loads all configured profiles. Locale detection per input is a separate concern handled by a LocaleDetector (Phase 2 work; v0 can require explicit locale).
- New locale = new LocaleProfile + new weights package + (optionally) new rule classifiers. Core code does not change.
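As a rough illustration of "new locale = new profile, core code does not change", the sketch below registers an en-US profile in a small registry that has no default locale. The `LocaleRegistry` class and the component list are hypothetical; only the package name and rule classifier identifiers come from the diagrams above.

```typescript
// Simplified stand-ins for the document's types; tag strings are illustrative.
type ComponentTag = string
type PolicyMode = "rule_only" | "neural_only" | "both" | "neural_preferred" | "rule_preferred"

interface ClassifierPolicy {
  component: ComponentTag
  mode: PolicyMode
}

interface LocaleProfile {
  locale: string
  weightsPackage: string
  ruleClassifiers: string[]
  componentsSupported: ComponentTag[]
  policy: ClassifierPolicy[]
}

// Hypothetical registry: core holds no default locale, so profiles must opt in.
class LocaleRegistry {
  private readonly profiles = new Map<string, LocaleProfile>()

  register(profile: LocaleProfile): void {
    this.profiles.set(profile.locale, profile)
  }

  resolve(locale: string): LocaleProfile {
    const profile = this.profiles.get(locale)
    if (profile === undefined) {
      throw new Error(`No LocaleProfile registered for "${locale}"`)
    }
    return profile
  }
}

// Assumed component list for illustration; the canonical union lives in reference/SCHEMA.md.
const usComponents: ComponentTag[] = ["house_number", "street", "postcode", "locality", "region", "country"]

const enUS: LocaleProfile = {
  locale: "en-US",
  weightsPackage: "@mailwoman/neural-weights-en-us",
  ruleClassifiers: ["house_number", "postcode", "street_prefix"],
  componentsSupported: usComponents,
  // Default from the text: rule_only for everything until neural is shipped.
  policy: usComponents.map((component): ClassifierPolicy => ({ component, mode: "rule_only" })),
}

const registry = new LocaleRegistry()
registry.register(enUS)
```

Asking the registry for an unregistered locale throws instead of silently falling back to English, which matches the "no locale by default" stance.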
Inference path
input string
    │
    ▼
Tokenization (existing Mailwoman code)
    │
    ▼
For each Section:
    │
    ├──▶ Rule classifiers produce ClassificationProposals
    │
    ├──▶ NeuralSequenceClassifier (if locale supports it):
    │      1. Tokenize section with SentencePiece (WASM)
    │      2. Run ONNX inference via onnxruntime-node
    │      3. Decode BIO labels back to character spans
    │      4. Emit ClassificationProposals
    │
    ▼
ClassifierPolicy filter:
    For each component, keep only proposals from authorized classifiers
    │
    ▼
Existing ExclusiveCartesianSolver
    │
    ▼
Ranked solutions
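Step 3 of the neural branch (decoding BIO labels back to character spans) is the least obvious part of this path, so here is a minimal sketch of it. It assumes the tokenizer already supplied character offsets per token and that each token carries the softmax probability of its chosen label; `TokenPrediction`, `DecodedSpan`, and `decodeBio` are illustrative names, and treating an orphan `I-<tag>` as the start of a new span is an assumption of this sketch.

```typescript
interface TokenPrediction {
  start: number // character offset, inclusive
  end: number // character offset, exclusive
  label: string // "O", "B-<tag>" or "I-<tag>"
  probability: number // softmax probability of the chosen label
}

interface DecodedSpan {
  start: number
  end: number
  component: string
  confidence: number // mean per-token probability over the span
}

interface OpenSpan {
  start: number
  end: number
  component: string
  probs: number[]
}

function closeSpan(span: OpenSpan): DecodedSpan {
  const sum = span.probs.reduce((a, b) => a + b, 0)
  return { start: span.start, end: span.end, component: span.component, confidence: sum / span.probs.length }
}

function decodeBio(tokens: TokenPrediction[]): DecodedSpan[] {
  const spans: DecodedSpan[] = []
  let open: OpenSpan | null = null

  for (const tok of tokens) {
    const tag = tok.label === "O" ? null : tok.label.slice(2)
    const continues = open !== null && tok.label.startsWith("I-") && tag === open.component

    if (open !== null && continues) {
      // Extend the current span to cover this token.
      open.end = tok.end
      open.probs.push(tok.probability)
      continue
    }

    // Anything else ends the current span; B-<tag> (or an orphan I-<tag>) starts a new one.
    if (open !== null) spans.push(closeSpan(open))
    open = tag === null ? null : { start: tok.start, end: tok.end, component: tag, probs: [tok.probability] }
  }

  if (open !== null) spans.push(closeSpan(open))
  return spans
}
```

Each decoded span would then be wrapped into a ClassificationProposal with `source: "neural"` before reaching the policy filter.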
Training path (Python, internal)
Source data (OSM, WOF, BAN, OpenAddresses, government registries)
    │
    ▼  TypeScript adapters (corpus/adapters/*.ts)
Canonical rows: { raw, components, country, source, source_id }
    │
    ▼  Alignment (corpus/align.ts)
BIO-labeled rows: { raw, tokens, labels, country, source, source_id }
    │
    ▼  Synthesis (corpus/synthesize.ts)
Augmented rows: + case variants, abbreviations, typos, ordering perturbations
    │
    ▼  Parquet shards (versioned)
corpus-vX.Y.Z/{train,val,test}-*.parquet
    │
    ▼  Python training (corpus-python/)
HuggingFace Transformers + PyTorch
    │
    ▼  ONNX export + int8 quantization
neural-weights-{locale}/model.onnx + tokenizer.model
    │
    ▼  Eval on golden set
If golden_F1 >= threshold: publish
Model architecture (what the neural classifier IS)
Family: encoder-only transformer + per-token classification head. Standard NER (named entity recognition) shape. Each token of the input address string gets exactly one BIO label from a fixed set (O + B-<tag> + I-<tag> for each tag in the ComponentTag union; currently 47 labels = 1 + 2 × 23 tags).
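The label arithmetic can be made concrete with a few lines. The sketch below derives a BIO label list from a tag list; `bioLabels` is an illustrative helper and the three tags shown are only examples taken from this document, not the full union.

```typescript
// Builds the BIO label space: one "O" label plus a B-/I- pair per tag.
function bioLabels(tags: readonly string[]): string[] {
  const labels: string[] = ["O"]
  for (const tag of tags) {
    labels.push(`B-${tag}`, `I-${tag}`)
  }
  return labels
}

// Example with three tags from this document.
const exampleLabels = bioLabels(["house_number", "street", "postcode"])
```

With 23 tags this yields the 47 labels quoted above; by the same formula, a 21-label set corresponds to 10 tags.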
Why this family, not the others
| Family | Used for | Why not us |
|---|---|---|
| Encoder-only + classification head (BERT / RoBERTa / DeBERTa + NER heads) | Per-token tagging; same number of output labels as input tokens | This is us. Address parsing is a tagging problem, not a generation problem. |
| Encoder-decoder (T5, BART) | Seq2seq: translation, summarization | The output isn't a different sequence; it's structured tags per input token. Encoder-decoder is overkill and lossy for this shape. |
| Decoder-only (GPT, Llama) | Autoregressive generation | We annotate, we don't generate. A decoder LLM could approximate via constrained decoding at 1000× the cost. |
| Conditional random field (CRF) | Pre-transformer per-token tagging with Markov features | This is libpostal. We're replacing the CRF with a transformer encoder in front of an effectively-similar tag space. |
Concrete config (Phase 2 / issue #10)
- Backbone: HuggingFace BertForTokenClassification with custom small config: 6 layers, 256 hidden dim, 4 attention heads, 1024 FF intermediate, max position 128. Trained from scratch (not fine-tuned from a pretrained English-internet BERT; the address vocabulary is too narrow and too unlike natural language for that pretraining to help).
- Tokenizer: SentencePiece (unigram), vocab_size=16000, character_coverage=0.9995, byte_fallback=true. Trained on the corpus, not picked off the shelf. Tokenizer version is locked into corpus version.
- Output head: linear projection → 47 logits per token (the BIO label space derived from ComponentTag).
- Inference: softmax over labels, take argmax per token, decode BIO spans back to character offsets via SentencePiece's offset map, emit ClassificationProposal per span with mean per-token softmax probability as confidence.
- Size target: < 30 MB int8-quantized ONNX, < 40 MB including tokenizer. Total parameter count in the low millions.
- Runtime: onnxruntime-node, CPU execution provider default, no Python at inference time.
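To sanity-check the size target, a back-of-envelope parameter count for this config can be sketched as below. It assumes the standard BERT layout (word, position, and token-type embeddings with LayerNorm; Q/K/V/output projections; a two-layer feed-forward block; two LayerNorms per layer; a linear classification head with no pooler) and a token-type vocabulary of 2. `countParameters` is an illustrative helper, not project code.

```typescript
interface SmallBertConfig {
  vocabSize: number
  hidden: number
  layers: number
  intermediate: number
  maxPositions: number
  typeVocabSize: number // assumption: 2, the usual BERT default
  numLabels: number
}

// Counts weights and biases under the standard BERT layout described in the lead-in.
function countParameters(c: SmallBertConfig): number {
  const layerNorm = 2 * c.hidden // scale + bias
  const embeddings = (c.vocabSize + c.maxPositions + c.typeVocabSize) * c.hidden + layerNorm
  const attention = 4 * (c.hidden * c.hidden + c.hidden) // Q, K, V, output projections
  const feedForward = c.hidden * c.intermediate + c.intermediate + c.intermediate * c.hidden + c.hidden
  const perLayer = attention + feedForward + 2 * layerNorm
  const head = c.hidden * c.numLabels + c.numLabels
  return embeddings + c.layers * perLayer + head
}

const phase2Config: SmallBertConfig = {
  vocabSize: 16000,
  hidden: 256,
  layers: 6,
  intermediate: 1024,
  maxPositions: 128,
  typeVocabSize: 2,
  numLabels: 47,
}

const totalParams = countParameters(phase2Config)
```

Under these assumptions the total comes to roughly 8.9M parameters, so at one byte per weight the int8 model would sit near 9 MB before ONNX overhead, comfortably inside the < 30 MB target. Note that the number of attention heads does not change the count.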
The cousin: spaCy's ja_core_news_trf
The closest commercial reference point is spaCy's Japanese transformer pipeline:
| | spaCy ja_core_news_trf | mailwoman planned |
|---|---|---|
| Backbone | cl-tohoku/bert-base-japanese-char-v2 (BERT-base: 12 layers, 768d) | small BERT (6 layers, 256d), ~10× smaller |
| Vocab | 6,144 char-level (Japanese requires character-level for OOV handling) | 16,000 SentencePiece (multilingual; bytes fall back for OOV) |
| Pretrained | Yes (Tohoku-Univ. Inui Lab BERT) | No (from scratch on address corpus) |
| Task heads | 3 on shared encoder: morphologizer (POS), parser (dependency), NER (22 entity types) | 1 on dedicated encoder: NER for 23 address component types |
| Model size | 320 MB | < 40 MB int8 |
| NER F1 (their domain) | 83.31 ENTS_F on general Japanese NER | TBD; the rule baseline is > 95% on country / region for the golden set, so the floor is high |
| Inference runtime | Python (spaCy) | Node (onnxruntime-node) |
| License | CC BY-SA 3.0 | AGPL-3.0 |
spaCy validates the family choice: when the industry-standard NER toolkit's flagship Japanese model is BERT-encoder + token-classification head, we're in the right shape. The trade-offs we make vs them:
- Smaller (10× fewer parameters): we're a narrower task. General-purpose Japanese NER must handle 22 entity types across all of news text; address parsing handles 23 component types in addresses only. Narrow task tolerates narrower model.
- From scratch, not fine-tuned: their cl-tohoku backbone was pretrained on Japanese Wikipedia + news. Our domain is far enough from natural-language text that the pretraining transfer is weak; training from scratch on a domain-pure corpus gets us a smaller, faster, more accurate model.
- Single head, not multi-head: they share one encoder across morphologizer + parser + NER. We have one head only, so the encoder can fully specialize for component disambiguation. (Multi-head is a future extension: see below.)
Multi-head extension (Phase 4+)
The spaCy pipeline pattern (one shared encoder + multiple task heads) is a powerful future direction. Candidates if/when we want them:
- Venue vs address-part head: a binary classifier per token to help compositional disambiguation (Buffalo Health Clinic Buffalo NY: is "Buffalo" venue-token or locality-token?). Useful for [[project-mailwoman-bitter-lesson]]'s kryptonite cases.
- Locale detection head: predicts en-US vs fr-FR vs ja-JP for the whole input, lets LocaleDetector (currently a separate concern) ride the same encoder.
- Completeness head: predicts whether the input is a complete address, a fragment, or noise; feeds graceful-failure signaling per [[project-mailwoman-graceful-failure]].
- Confidence head: a separate calibration learner trained on held-out data, replacing the per-token softmax with a calibrated probability.
None of these are Phase 0-3 scope. Filed for Phase 4 territory or beyond. spaCy's existence is the evidence the pattern works at scale.
Frameworks + tooling stance: what we use, what we don't, what we steal
Use: HuggingFace Transformers + PyTorch for training (Python-only, single-shot, output is the ONNX file). onnxruntime-node for inference (TypeScript / Node, the user-facing path). SentencePiece for tokenization on both sides. @dsnp/parquetjs (patched; see .yarn/patches/ in the repo) for corpus output. That's the production stack; anything not on this list is either dead weight or duplicative.
Don't use: spaCy as a dependency, either runtime or training. Reasons:
- Runtime: spaCy is Python-only. The plan's "TypeScript-first" stance is a hard constraint from the epic; adding Python at inference time defeats the deployment-regression-for-Node-users argument that's the project's reason to exist. spaCy's ONNX export path (spacy-onnx-export) exists but is rarely production-grade.
- Training: spaCy's config-driven training is nice (spacy.cfg + Thinc) but pays off mostly when pipeline composition matters or the task is novel. Our task (single-head NER on BertForTokenClassification) is the most-trodden path in HuggingFace Transformers. spaCy adds an abstraction layer for marginal convenience. Operator's "less Python the better" stance argues against it even for training where Python is unavoidable.
Steal the patterns, not the tool:
| What | From | Where to apply |
|---|---|---|
| Release notes template (label scheme table + transformer config block + per-component F1 + compatibility matrix) | spaCy's GitHub release format for models like ja_core_news_trf-3.7.0 | Phase 3 (#11) @mailwoman/neural-weights-* package READMEs + model card JSON |
| Eval reporting conventions (per-component P/R/F1 + confusion matrix layout, displaCy-style visualization patterns) | spaCy benchmark reports | Phase 2 (#10) eval reports |
| "Training described entirely by a config file, code is just the runner" hygiene | spacy.cfg pattern | corpus-python/train.py config shape; adopt the pattern without taking spaCy as a dep |
| Vintage / language-model-version conventions | spaCy's ja_core_news_trf-3.7.0 naming (<lang>_<domain>_<size>_<framework>-<version>) | Weights package versioning + ONNX file naming |
Optional Phase 3 baseline comparison: run a fine-tuned xx_ent_wiki_sm or en_core_web_trf against the same golden set, report comparison numbers in the model card. General NER tagging LOC / ORG where we expect locality / venue is informative: it shows the address-specific gain over general NER. Operator-time, ~1 hour for an honest baseline run. Not in the install dependency tree; one-off comparison only.
Casual descriptions (for audiences who don't want the architecture deep-dive)
- Technical: "Small encoder-only transformer (BERT-class, from scratch, ~few-million params, less than 40 MB int8 ONNX), per-token classification head emitting BIO-tagged address-component spans. Runs in Node via onnxruntime-node, no Python at inference."
- Practitioner: "Tiny BERT-class NER model specialized for address parsing: a learned replacement for libpostal's CRF, sized to fit in a Node process. Architecturally the same family as spaCy's ja_core_news_trf, ~10× smaller because the task is narrower."
- User-facing: "A small ONNX model that reads an address string and tells you, with per-component confidence, what each piece of it means."
- For the bitter-lesson audience: "The contextual-parser half of a contextual-parser-plus-constraint-solver geocoder front-end. The transformer proposes; the policy registry + solver dispose."
Notably NOT: a translator (no language pair, no autoregressive output). Not a generative model.
Training cadence vs. plan (2026-05-18 reframing)
The original phase plan (phases/PHASE_2_training.md) targeted a single training run hitting >95% per-component F1 in ~2 weeks. Two iterations in, that framing has been replaced by an iteration cadence: each cycle is ~3-7 days of focused work (corpus refinement → train 6-10h GPU → eval → ship), and each cycle produces a shippable artifact even if it doesn't hit the >95% bar.
Why the change:
- >95% F1 in one shot was aspirational. Address parsing from scratch on a noisy real-world corpus is harder than that target acknowledged. Real ship gates are lower (~0.85 F1 on coarse components is genuinely useful in production with proper calibration).
- Shipping below-target IS the bitter-lesson framing. Per [[project-mailwoman-graceful-failure]], the success metric is graceful failure, not max F1. A 0.85-F1 model with 0.88 calibration in its conf>0.9 bucket is more useful than a 0.95-F1 model with 0.33 calibration (the v0.1.0 situation).
- Ship-of-Theseus coexistence makes shipping safe. The neural classifier adds proposals alongside the rule classifiers via the policy registry; we don't have to wait for parity to start using it.
- Iteration > single-run. Each cycle produces real artifacts: ONNX weights, model card, eval report, npm package shape, demo updates. Compounding wins.
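The calibration figures quoted above (0.33 vs 0.88 in the conf>0.9 bucket) can be read as the observed accuracy of predictions whose confidence exceeds 0.9. That reading is an assumption of this sketch, as are the `ScoredPrediction` shape and the `bucketAccuracy` helper; the project's actual eval code may compute it differently.

```typescript
interface ScoredPrediction {
  confidence: number // model confidence for the predicted label, 0..1
  correct: boolean // whether the prediction matched the golden label
}

// Observed accuracy among predictions above a confidence cutoff.
// Returns null when the bucket is empty, since accuracy is undefined there.
function bucketAccuracy(preds: ScoredPrediction[], minConfidence: number): number | null {
  const bucket = preds.filter((p) => p.confidence > minConfidence)
  if (bucket.length === 0) return null
  const hits = bucket.filter((p) => p.correct).length
  return hits / bucket.length
}
```

A well-calibrated model should score close to its stated confidence in this bucket; a value like 0.33 means the model is confidently wrong about two thirds of the time.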
Iteration history (live)
| Iteration | Scope | Wall-clock | Outcome |
|---|---|---|---|
| v0.1.0 | First Tier 1 ship; sequential-data overfit due to corpus imbalance | ~3 days | Honest below-target ship; F1 ~0.03 macro, calibration 0.33 (confidently wrong) |
| v0.2.0 | source_weights mechanism + relaxed coarse gate (Phase 2 §6 recipe) | ~3 days | 9× macro-F1 (0.037 → 0.335); calibration tightened 2.6× (0.33 → 0.88 in conf>0.9 bucket); still below 95% targets but ship-worthy |
| v0.3.0 → v3.0.0 (shipped 2026-05-22) | Tier 2 label expansion (B/I-venue, B/I-street, B/I-house_number) + linear-chain CRF decoder + dual loss (CE + 0.05·CRF NLL) + corpus rebuild adding NAD (57.9M) → 677M aligned rows. npm major-bumped from 2.0.6 → 3.0.0. | ~1 day (overlap with adapter+demo work) | macro F1 0.32 on golden v0.1.2 (4,535 entries); capability-surface win (house_number 0.78, venue 0.39, street 0.27); coarse F1 regressed (region 0.83 → 0.18): under-trained at step-1800 ship + label-space dilution. CRF makes orphan-I-* decode structurally impossible (Saint Petersburg fix verified on the live demo). |
| v0.4.0 (next, #116) | Recover the coarse-F1 regression + meet #57 fine floors + harden the dual loss. Per-token CRF NLL norm, longer training (step-5000+), class-weighted CE, source-weight rebalance, JS-side Viterbi + vocab-from-model-card. Reuses corpus-v0.3.0. | ~3-5 days | Targets: coarse F1 back to ≥0.6 region/locality + ≥0.7 postcode; venue ≥0.6, street ≥0.7 (the #57 floors); calibration ≥0.85; training runs to step 10K+ without divergence. |
| v0.5.0+ | Tier 3 organization/POI venue + top-k decoding for Resolver | ~5-7 days | Deferred until v0.4.0's coarse recovery lands |
| Phase 3 (Integration & Ship) runs in parallel with iterations | npm publish pipeline, @mailwoman/neural SDK glue | ~5-7 days | @mailwoman/neural@0.x.y on npm |
| Phase 4 (Resolver) | Source annotations, top-k disambiguation, gazetteer lookup | Deferred | Operator-gated |
Why more data doesn't blow up the iteration cost
A single training run is fixed-step-count, not fixed-epoch. v0.2.0's 50K steps × 128 effective batch = 6.4M examples seen ≈ 2.4% of the 263M-row corpus per pass. Adding NAD (62M) + OA-CA (10M) + state backfills bumps the corpus to ~340M rows; same 50K steps still finishes in ~6.5-10h. The model is nowhere near training-data-limited; more corpus → more diversity per example seen, not more wall-clock.
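The fixed-step arithmetic is easy to reproduce. The helper below is a small illustrative calculation (the name `corpusFractionSeen` is made up here) using only the step count, batch size, and corpus sizes quoted in the paragraph above.

```typescript
// With a fixed step budget, examples seen depend on steps and batch size only,
// so a bigger corpus lowers the fraction sampled without adding wall-clock time.
function corpusFractionSeen(steps: number, effectiveBatch: number, corpusRows: number): { examplesSeen: number; fraction: number } {
  const examplesSeen = steps * effectiveBatch
  return { examplesSeen, fraction: examplesSeen / corpusRows }
}

const v020 = corpusFractionSeen(50_000, 128, 263_000_000) // the 263M-row corpus
const afterNad = corpusFractionSeen(50_000, 128, 340_000_000) // the ~340M-row corpus
```

Both runs see the same 6.4M examples; the fraction of the corpus touched drops from about 2.4% to about 1.9%, which is why the extra rows buy diversity rather than training time.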
What ACTUALLY costs more time per iteration:
- Vocabulary tier expansion: adding venue/street/house_number labels requires adapter alignment updates (every adapter's align.ts emits the new labels where source data supports it) + golden set verification + corpus rebuild (~260 min). This is the bulk of v0.3.0 work.
- CRF decoder + label smoothing: small training-time additions, big calibration/coherence wins, lands in the same retrain.
- Per-iteration golden expansion: already automated via expand-golden.ts + promote-golden.ts (PR #48 / #49); each iteration can re-eval against the latest golden version.
Re-framed success metric per iteration
Not "did F1 hit 95%" but "is the artifact independently shippable and a measurable improvement over the prior iteration?" The eval ledger at evals/scores-by-version.json makes this empirical: every shipped run gets a row, with corpus + eval-set sha-pinning so the deltas are honest apples-to-apples.
Why this shape
- Classifier-as-plugin preserves Mailwoman's existing extensibility. New rule classifiers, new neural variants, future learned rankers all conform to the same interface.
- Per-component policy allows incremental rollout without big-bang risk. A regression in neural street parsing doesn't affect country parsing.
- Locale profiles prevent the architecture from quietly becoming Anglocentric-by-default. If the abstractions can't express a non-Anglo locale cleanly, we fix the abstractions, not the locale.
- Separate weights packages keep npm install times sane and let users opt in to languages they need.
- Training in Python, inference in TS uses each ecosystem for what it's best at and avoids the dual-language maintenance tax in production code.
Output shape (Phase 3 concern, not Phase 2)
The neural model emits per-token BIO labels regardless of how its output is presented. The DECODER chooses the format. Three legitimate shapes, in increasing expressivity:
| Shape | Preserves | Loses | Round-trip fidelity |
|---|---|---|---|
| JSON object {locality: "Paris", postcode: "75005"} | tag → value | order, repetition, hierarchy, untagged tokens | Country-specific formatter required; reorder errors are silent. Libpostal/Pelias compatible. |
| Tuple array [["locality", "Paris"], ["postcode", "75005"]] | tag → value + original order + repetition | hierarchy | Reconstruction is array.map(([_, v]) => v).join(" "); trivially faithful to ordering. |
| S-expression (locality "Paris" (postcode "75005")) | order + hierarchy + arbitrary attributes (:conf, :src) | nothing material | The most expressive: lets the parsed output say "this postcode is INSIDE this locality." Natural carrier for Phase 4 source annotations. |
Operator framing (2026-05-18): address parsing is inherently lossy; you can't extract the original formatted address from the parsed result. The closest you can get is by applying a formatter to the parsed result. S-expressions may better preserve the original shape because they encode hierarchy + ordering + attributes in one tree.
Implementation plan (Phase 3, #11): ship all three decoders behind a --format json|tuple|sexp CLI flag. Default = JSON for libpostal-compat; S-expr as power-user mode. Round-trip eval (parse(format(parse(raw))) == parse(raw)) tells us which shape preserves the most fidelity; if S-expr wins, flip default.
No model retraining required. This is post-processing on the BIO output. The same neural model serves all three formats.
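Since all three formats are post-processing over the same decoded spans, the decoders can be tiny. The sketch below assumes the spans have already been arranged into a small tree (`SpanNode` is a hypothetical shape, and how hierarchy gets assigned is out of scope here); the flat formats walk the tree parent-first, which is an assumption about ordering.

```typescript
interface SpanNode {
  component: string
  value: string
  children?: SpanNode[]
}

// Parent-first walk; the flat formats discard the nesting.
function flattenNodes(nodes: SpanNode[]): SpanNode[] {
  const out: SpanNode[] = []
  for (const node of nodes) {
    out.push(node)
    out.push(...flattenNodes(node.children ?? []))
  }
  return out
}

// JSON object: tag -> value. A repeated tag silently overwrites the earlier value.
function toJsonObject(nodes: SpanNode[]): Record<string, string> {
  const result: Record<string, string> = {}
  for (const node of flattenNodes(nodes)) result[node.component] = node.value
  return result
}

// Tuple array: keeps order and repetition, drops hierarchy.
function toTuples(nodes: SpanNode[]): Array<[string, string]> {
  return flattenNodes(nodes).map((n): [string, string] => [n.component, n.value])
}

// S-expression: keeps order and hierarchy; JSON.stringify handles quoting and escaping.
function toSexp(node: SpanNode): string {
  const children = (node.children ?? []).map((child) => " " + toSexp(child)).join("")
  return `(${node.component} ${JSON.stringify(node.value)}${children})`
}

const paris: SpanNode = {
  component: "locality",
  value: "Paris",
  children: [{ component: "postcode", value: "75005" }],
}
```

For the `paris` node, `toSexp` reproduces the table's example `(locality "Paris" (postcode "75005"))`, while `toJsonObject([paris])` and `toTuples([paris])` give the two flatter shapes.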
Existing isp-nexus formatter (isp-nexus/universe/mailwoman/postal/formatting.ts) handles flat-object → multi-line-string for shipping-label rendering. Port it as the JSON-decoder's inverse direction; build a separate sexp→string formatter for the S-expr path.
Source attribution (Phase 4, #12, deferred)
Differentiator vs libpostal and Pelias. Libpostal says "this token is a city"; Pelias adds hit/miss provenance against ES indexes. Mailwoman's goal (operator framing): "this token is a locality identified by the wof-admin gazetteer with 0.97 confidence."
The corpus already carries source per-row (every alignment + augmentation tags provenance). What's missing is propagating that provenance through the inference output.
Decision (operator, 2026-05-18): hybrid Option C, model + Resolver.
- Model emits BIO labels + per-token confidence (already does)
- Resolver at inference time maps (span, label) → (source-class, src-conf) via fast lookups against pre-indexed gazetteers (wof, ban, tiger, nppes, hrsa, imls-pls, state-*)
- Output annotations live as S-expression attributes:
(locality "Paris" :src wof-admin :conf 0.97
  (postcode "75005" :src wof-postalcode :conf 0.93))
Key insight: provenance is already flowing through training
source_weights (shipped 2026-05-18 in PR #44) affects what the trained model emphasizes during training even though the model itself doesn't emit source labels. The Resolver at inference time surfaces what was implicit in training. The data path:
corpus row (source-stamped)
  → source_weights filter at data loader
  → training distribution (model internalizes source priors)
  → model emits BIO + confidence (no source field)
  → Resolver looks up labeled spans against gazetteers
  → output: BIO + label + confidence + source-class + src-conf
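The last two steps of that data path might look something like the sketch below. Everything here is hypothetical and deferred to Phase 4: the `GazetteerIndex` is a plain in-memory map standing in for the pre-indexed lookups, the key normalization is a guess, and the `wof-admin` / `wof-postalcode` names and confidence values are borrowed from the S-expression example purely for illustration.

```typescript
interface LabeledSpan {
  text: string
  label: string
  confidence: number // model confidence for the label
}

interface SourceHit {
  sourceClass: string
  srcConf: number
}

interface ResolvedSpan extends LabeledSpan {
  source: SourceHit | null // null when no gazetteer confirms the span
}

// Hypothetical in-memory index keyed by "<label>:<normalized text>".
type GazetteerIndex = Map<string, SourceHit>

function gazetteerKey(label: string, text: string): string {
  return `${label}:${text.trim().toLowerCase()}`
}

// The model output is left untouched; provenance is attached alongside it.
function resolveSpans(spans: LabeledSpan[], index: GazetteerIndex): ResolvedSpan[] {
  return spans.map((span) => ({
    ...span,
    source: index.get(gazetteerKey(span.label, span.text)) ?? null,
  }))
}

const demoIndex: GazetteerIndex = new Map([
  [gazetteerKey("locality", "Paris"), { sourceClass: "wof-admin", srcConf: 0.97 }],
  [gazetteerKey("postcode", "75005"), { sourceClass: "wof-postalcode", srcConf: 0.93 }],
])
```

Keeping the lookup outside the model preserves the split argued for in this section: the model identifies spans, the Resolver attaches provenance.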
Rejected alternatives
- Resolver-only at inference, no source-aware training: model has no internal calibration toward gazetteer-confirmed tokens; weaker.
- Multi-task neural head emitting (label, source-class) per token: ~15 × 8 = 120 sub-classes; bigger model, harder to retrain, loses the clean separation between span-identification (model job) and provenance (Resolver job).
Anti-patterns to avoid
- A mode: 'auto' setting in ClassifierPolicy that "intelligently" picks rule vs neural. Hidden state, hard to debug. Always explicit.
- Hardcoding 'en-US' as default in core. Default should be "no locale, classifiers must opt in."
- Loading all weights at module init. Lazy-load on first classification request.
- Mixing inference and resolution. The parser does not know about coordinates, place IDs, or WOF resolution. That's Phase 4.
- Adding tags to ComponentTag without updating the schema doc and the alignment logic in the same commit. Schema drift kills the corpus.
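The lazy-loading rule is simple to honor with a small memoizing wrapper. The sketch below is generic and hypothetical (`Lazy` and `fakeLoadWeights` are not project code); in practice the factory would start the real model load and `T` would likely be a promise of the loaded session.

```typescript
// Defers an expensive load until first use and reuses the result afterwards.
class Lazy<T> {
  private value: T | undefined
  private loaded = false
  private readonly factory: () => T

  constructor(factory: () => T) {
    this.factory = factory
  }

  get(): T {
    if (!this.loaded) {
      this.value = this.factory()
      this.loaded = true
    }
    return this.value as T
  }
}

// Stand-in loader that counts invocations; a real one would read the ONNX file.
let loadCount = 0
function fakeLoadWeights(): { locale: string } {
  loadCount += 1
  return { locale: "en-US" }
}

// Constructing the wrapper at module init costs nothing; the load happens on get().
const usWeights = new Lazy(fakeLoadWeights)
```

Nothing is loaded until the first `usWeights.get()` call, and repeated calls return the same object without loading again.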