Phase 3: TypeScript Integration & Ship
Goal: integrate the trained model into Mailwoman via NeuralSequenceClassifier, wire it through the policy system, publish @mailwoman/neural@0.1.0 and weight packages to npm. End state: a user can npm install @mailwoman/neural @mailwoman/neural-weights-en-us and get neural classification for coarse components.
Duration estimate: 1.5 weeks.
Branch: neural/phase-3-integration
Depends on: Phase 2 complete with weights packages on disk.
Pre-flight
- Phase 2 weights packages exist and pass eval
- You have npm publish access (or know how to coordinate with the maintainer for publishing)
- onnxruntime-node installs cleanly in the dev environment
Tasks
1. Package scaffolding
- Create packages/neural/ workspace
- Dependencies: onnxruntime-node, SentencePiece WASM (pick one: @bloomberg/sentencepiece or a maintained alternative; verify and document in DECISIONS.md), @mailwoman/core
- Strict TypeScript config
2. SentencePiece tokenizer wrapper
- packages/neural/src/tokenizer.ts
- Loads the SentencePiece model from a weights package
- Exposes encode(text: string): { ids: number[]; tokens: string[]; offsets: [number, number][] }
- The offsets field is critical: it's how we map model output BIO labels back to character spans in the original input.
⚠ Tokenizer parity check (the single most important test in this phase):
- packages/neural/test/tokenizer-parity.test.ts
- Loads 10k (raw, tokens) pairs from corpus-v0.1.0
- Re-tokenizes each raw with the TS SentencePiece
- Asserts byte-for-byte equality with the stored tokens
- If a single example fails, the integration is broken. Fix before continuing.
3. ONNX inference wrapper
- packages/neural/src/onnx-runner.ts
- Loads ONNX model via onnxruntime-node
- Exposes infer(tokenIds: number[]): Promise<number[][]> returning logits per token
- Handles batching (group multiple sections for one inference call)
- Configurable execution provider (CPU default, CUDA via env var)
- Lazy loading: model loads on first infer call, not on construction (unless warmup: true)
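The lazy-loading rule above can be sketched without touching the ONNX runtime at all. In the sketch below, Session, RunnerOptions, and LazyOnnxRunner are hypothetical names, and the loader is injected so the example stays self-contained. In the real wrapper the loader would presumably create an onnxruntime-node InferenceSession.

```typescript
// Hypothetical minimal session interface; a real one would wrap an
// onnxruntime-node InferenceSession.
interface Session {
  run(tokenIds: number[]): Promise<number[][]>;
}

interface RunnerOptions {
  warmup?: boolean; // when true, start loading at construction time
}

class LazyOnnxRunner {
  private readonly loadSession: () => Promise<Session>;
  private session: Promise<Session> | undefined;

  constructor(loadSession: () => Promise<Session>, options: RunnerOptions = {}) {
    this.loadSession = loadSession;
    if (options.warmup) {
      // Start loading now. The promise is cached, so a load error is
      // swallowed here and raised again by the first infer call.
      this.getSession().catch(() => undefined);
    }
  }

  // Caches the promise (not the resolved session) so that concurrent
  // first calls share a single load.
  private getSession(): Promise<Session> {
    if (this.session === undefined) {
      this.session = this.loadSession();
    }
    return this.session;
  }

  async infer(tokenIds: number[]): Promise<number[][]> {
    const session = await this.getSession();
    return session.run(tokenIds);
  }
}
```

One consequence of caching the promise is that a failed load stays failed for the lifetime of the runner. Whether to retry is a design decision this sketch does not make.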
4. BIO decoding
- packages/neural/src/decode.ts
- Input: logits per token, original token offsets
- Output: list of (span, component, confidence) tuples
- BIO decoding rules:
  - B-T starts a new span
  - I-T continues the previous span if it was B-T or I-T, else treat as B-T (graceful)
  - O ends any open span
- Confidence per span = mean softmax probability of the predicted label across the span's tokens
- Test cases: well-formed BIO, ill-formed BIO (recovery), all-O input, single-token spans
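The decoding rules and the confidence definition above can be sketched as follows. This is one possible reading, not the shipped decode.ts. It assumes labels are strings of the form "O", "B-country", "I-country" indexed by logit position, and that offsets are [start, end) character pairs. DecodedSpan and decodeBio are illustrative names.

```typescript
type Offset = [number, number];

interface DecodedSpan {
  start: number; // character offset, inclusive
  end: number; // character offset, exclusive
  component: string; // e.g. "country"
  confidence: number; // mean softmax probability over the span's tokens
}

interface OpenSpan {
  start: number;
  end: number;
  component: string;
  probs: number[];
}

// Numerically stable softmax over one token's logits.
function softmax(logits: number[]): number[] {
  const max = Math.max(...logits);
  const exps = logits.map((x) => Math.exp(x - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

function finish(open: OpenSpan): DecodedSpan {
  const mean = open.probs.reduce((a, b) => a + b, 0) / open.probs.length;
  return { start: open.start, end: open.end, component: open.component, confidence: mean };
}

function decodeBio(logits: number[][], offsets: Offset[], labels: string[]): DecodedSpan[] {
  const spans: DecodedSpan[] = [];
  let open: OpenSpan | null = null;
  for (let i = 0; i < logits.length; i++) {
    const probs = softmax(logits[i]);
    const best = probs.indexOf(Math.max(...probs));
    const label = labels[best];
    const [start, end] = offsets[i];
    // Strip the "B-" / "I-" prefix; "O" carries no component.
    const component = label === "O" ? null : label.slice(2);
    if (label.charAt(0) === "I" && open !== null && open.component === component) {
      // I-T directly after B-T or I-T of the same type: extend the span.
      open.end = end;
      open.probs.push(probs[best]);
      continue;
    }
    // B-T, O, or an ill-formed I-T: close whatever is open first.
    if (open !== null) spans.push(finish(open));
    // An ill-formed I-T is treated as B-T and opens a new span.
    open = component === null ? null : { start, end, component, probs: [probs[best]] };
  }
  if (open !== null) spans.push(finish(open));
  return spans;
}
```

An I-T tag whose type differs from the open span is handled the same way as an I-T with nothing open: the open span is closed and a new one begins.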
5. NeuralSequenceClassifier
- packages/neural/src/sequence-classifier.ts
- Implements Classifier from @mailwoman/core
- Constructor takes NeuralSequenceClassifierOptions, resolves and loads the weights package
- classify(section, context):
  - Tokenize the section's text
  - Run ONNX inference
  - Decode BIO labels to spans
  - Filter by minConfidence
  - Map spans back to Span objects in the original input (use SentencePiece offsets, then translate to original-input offsets via section's start offset)
  - Emit ClassificationProposal[] with source: 'neural', source_id from the model card version
- Returns empty array if model fails to load (graceful degradation per reference/OPERATIONS.md)
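The last three classify steps (confidence filter, offset translation, proposal emission) might look roughly like this. The SectionSpan and NeuralProposal shapes are hypothetical stand-ins: the real Span and ClassificationProposal types live in @mailwoman/core and may differ. The sketch also assumes offsets are JavaScript string indices.

```typescript
// Hypothetical shapes for illustration only.
interface SectionSpan {
  start: number; // offset relative to the section text
  end: number;
  component: string;
  confidence: number;
}

interface NeuralProposal {
  component: string;
  start: number; // offset relative to the original input
  end: number;
  text: string;
  confidence: number;
  source: "neural";
  source_id: string;
}

function toProposals(
  input: string,
  sectionStart: number,
  spans: SectionSpan[],
  minConfidence: number,
  sourceId: string,
): NeuralProposal[] {
  return spans
    // Drop spans below the configured confidence floor.
    .filter((span) => span.confidence >= minConfidence)
    .map((span): NeuralProposal => {
      // Section-relative offsets become input-relative by adding the
      // section's start offset.
      const start = sectionStart + span.start;
      const end = sectionStart + span.end;
      return {
        component: span.component,
        start,
        end,
        text: input.slice(start, end),
        confidence: span.confidence,
        source: "neural",
        source_id: sourceId,
      };
    });
}
```

Carrying the sliced text on the proposal makes offset bugs easy to spot in tests, since a wrong section offset shows up as the wrong substring.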
6. Weights package loader
- packages/neural/src/weights.ts
- loadWeights(packageName: string): Promise<{ modelBytes, tokenizerBytes, modelCard }>
- Resolves package via Node module resolution
- Reads files from package's distribution directory
- Caches in memory across instances of NeuralSequenceClassifier
7. Solver integration
- In packages/core/src/solver.ts (or wherever the existing solver entry lives), add the neural classifier path:
  - If a LocaleProfile has a weightsPackage, instantiate a NeuralSequenceClassifier for that locale
  - Run it alongside rule classifiers during the classify phase
  - All proposals flow through PolicyRegistry.apply before solving
- No solver logic changes needed: the abstraction from Phase 0 was designed for this.
8. Policy migration: country → neural_preferred
The first Ship of Theseus step.
- Update packages/core/src/policy-defaults.ts:
  - country: from rule_only to neural_preferred with confidence_threshold: 0.8
- Run the full test suite + golden set eval
- Verify on the golden set that
countryaccuracy improved (or held) and no other component regressed - If anything regresses, revert the policy change and investigate. The migration is gated on metrics, not on intent.
▶ Repeat the same for region if Phase 2 metrics justify. Coarse components only. Do not migrate locality or below in Phase 3; defer until Tier 2 trains.
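Since the migration is gated on metrics, the gate itself is worth making mechanical. The sketch below compares per-component golden-set scores before and after the policy change. The Metrics shape, the checkMigrationGate name, and the tolerance parameter are assumptions for illustration. The default tolerance of 0 matches the "improved (or held)" wording.

```typescript
// Component name -> golden-set score (F1 or accuracy), in [0, 1].
type Metrics = Record<string, number>;

interface GateResult {
  pass: boolean;
  regressions: string[]; // components whose score dropped or went missing
}

// Every component present before the change must score at least as well
// afterwards (within the tolerance). A component missing from the "after"
// run counts as a regression.
function checkMigrationGate(before: Metrics, after: Metrics, tolerance: number = 0): GateResult {
  const regressions = Object.keys(before).filter((component) => {
    const afterScore = after[component];
    return afterScore === undefined || afterScore < before[component] - tolerance;
  });
  return { pass: regressions.length === 0, regressions };
}
```

If the gate fails, the regressions list names exactly which components to investigate after reverting the policy change.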
9. End-to-end test
- packages/neural/test/e2e.test.ts
- Test: parse("123 Main St, Portland, OR 97215", { locale: 'en-US' })
  - Neural classifier emits country: USA (inferred even though "USA" is absent from input)
  - Rule classifier still emits postcode: 97215
  - Final solution has both components
  - source fields are correctly populated
- Test: same with locale: 'fr-FR' and a French address
- Test: missing weights package → graceful degradation, rule-only solution
10. CLI surfacing
- npx mailwoman parse --locale en-US --neural ... opts into neural classifier
- npx mailwoman parse --locale en-US --no-neural ... opts out (rule-only)
- Default: neural on if weights package is installed, off otherwise
- Debug output: when neural is on, show which classifier produced each component in the output
11. Documentation
- packages/neural/README.md: installation, usage, configuration, the LocaleProfile extension point
- Update root README.md with a new "Neural" section
- Migration guide: how an existing Mailwoman user opts into the neural classifier
- Performance numbers from the benchmark suite
12. Performance & benchmarks
- packages/neural/bench/ with vitest-bench or simple scripts
- Benchmark: cold load time, warm classification latency p50/p95/p99, throughput, memory footprint
- All within budgets in reference/OPERATIONS.md
reference/OPERATIONS.md - If neural is slower than budgeted: profile. Likely culprits: ONNX session not reused across calls, batching not enabled, SentencePiece WASM not warm.
13. Publish
- Final version bumps:
  - @mailwoman/core: minor bump (added types)
  - @mailwoman/classifiers: minor bump (adapter additions)
  - @mailwoman/neural@0.1.0: new, marked beta
  - @mailwoman/neural-weights-en-us@0.1.0: new, marked beta
  - @mailwoman/neural-weights-fr-fr@0.1.0: new, marked beta
- Check licenses on every package's package.json: AGPL-3.0 for code; weights packages need their own license decision (see Phase 6 license note)
- npm publish each. Use --access public for scoped packages.
- Verify install in a clean directory: npm install @mailwoman/neural @mailwoman/neural-weights-en-us and run a sample parse.
⚠ Before publishing weights, settle the license question. OSM is ODbL (share-alike). WOF is CC0. BAN is Licence Ouverte. A model trained on ODbL data is arguably ODbL; talk to a lawyer or pick a clear license posture and document it.
Success criteria checklist
- Tokenizer parity test green on 10k examples
- End-to-end test green for en-US and fr-FR
- Benchmarks within budget
- Golden set: country F1 improved (or held), nothing regressed
- npx mailwoman parse works with --locale and uses neural classifier when available
- Packages published to npm
- Documentation updated
- Branch tagged neural-phase-3-complete
- Blog post draft (optional but recommended) in docs/blog/2026-...-neural-classifier.md
When the human will want a check-in
After Phase 3 ships, pause. The human (the project creator) should look at:
- Real-world feedback from a couple of test deployments
- The blog post draft
- License decisions on weights packages
- Whether to proceed to Tier 2 (street-level) or pause to gather usage data
Do not begin Phase 4 (geocoder fusion) or even Tier 2 of Phase 1/2 (street-level training) without explicit confirmation. The point of shipping is to learn.
When to call this phase done
When npm install @mailwoman/neural @mailwoman/neural-weights-en-us followed by a parse() call works on a clean machine and produces output with source: 'neural' proposals for the country component.