Skip to main content

Status

This page is the single source of truth for what works in Mailwoman right now.

Verified as of 2026-06-10 (night-10) โ€” release 4.2.0

Numbers below are sourced from the shipped model-card.json and the parity scorecard (2026-06-10), which is the authoritative per-tag table. For which release added which capability, see Releases & capabilities.

Packagesโ€‹

All publishable workspaces release in lockstep; the current npm version is 4.2.0.

PackageDescription
mailwomanCLI + high-level AddressParser + runtime pipeline factory
@mailwoman/coreTokenization, classification, solver, decoder, policy registry
@mailwoman/codexPer-address-system postal reference data (USPS suffixes, ZIP types)
@mailwoman/classifiersRule-based classifiers
@mailwoman/neuralONNX runtime + SentencePiece tokenizer + anchor inference
@mailwoman/neural-weights-en-us / -fr-frTrained model bundles
@mailwoman/normalize / query-shape / locale-gate / kind-classifier / phrase-grouperPipeline stages 1โ€“2.7
@mailwoman/resolver-wof-sqlite / -wasmWOF gazetteer resolver (Node SQLite / browser WASM)
@mailwoman/neural-webBrowser-side neural classifier (onnxruntime-web)

Model weights (4.2.0)โ€‹

PropertyValue
Lineagev1.0.2-consolidation-runB @ step 20k โ€” the v1.0 parity consolidation (unit + affix + country anchor + multi-locale), corpus v0.4.12
Architecture29.3M params โ€” 6-layer encoder, 384 hidden, 6 heads, 48K-vocab SentencePiece embedding
Label vocabulary33 BIO labels over 16 component tags (incl. unit, po_box, street_prefix/_suffix, intersection_a/b)
Training corpuscorpus-v0.4.12-consolidation ยท tokenizer v0.6.0-a0
AnchorsPostcode anchor (PCB1) + gazetteer soft anchor (country candidate-tags), both live at inference
DecodeViterbi with frozen BIO structural mask (CRF at inference; CE-only training)
Size28.4 MB int8 ONNX (112.9 MB fp32) ยท opset 17 ยท max sequence 128

Per-tag health (summary)โ€‹

The full per-tag table โ€” two lenses, golden-dev and real-OOD โ€” lives in the parity scorecard. The shape of it:

  • Healthy: postcode (98/99 US/FR), house_number (96/91), venue-US (90), street (79 US).
  • Fixed by the parity campaign: unit 0 โ†’ 92.3 on real-OOD designators (the 4.1.0 headline).
  • Shipped by the consolidation (v4.2.0): locality +12.8, region +10.7, country 0 โ†’ 89.8 (gazetteer soft anchor), affixes now exist (0 โ†’ 64.9/48.8 at Pโ‰ˆ100), DE 90.9. Stated re-baselines: US street 76.2 (โˆ’2.3), unit 90.6 (โˆ’1.7) โ€” capacity ceiling measured, see #492.
  • Open gaps: FR venue/region (#330), intersections (#487).

Capability-arena truth (whole-parse, vs the v0 rules system): rules still win on clean/canonical input, the neural parser wins decisively on noisy/degraded input (+21pp), both are weak on rare edge formats. The arbitration layer (#478) exists to make the pipeline take the better of both per component.

Runtime pipeline โ€” stage statusโ€‹

StageNameStatusNotes
1Normalizeโœ… defaultNFC, whitespace collapse, locale-aware case folding
2Locale gate๐Ÿšฉ opt-inShips (@mailwoman/locale-gate); not wired as factory default
2.5Kind classifierโœ… defaultRule-based; fast-paths bare postcodes and single localities
2.7Phrase grouperโœ… defaultRule-based span proposals before classification
3Token classifyโœ… defaultNeural encoder + rule classifiers, hybrid via policy registry (per-component modes)
4Sequence correctโœ… defaultViterbi over the frozen BIO mask; postcode/unit label repairs
5Reconcile๐Ÿšฉ opt-inJoint decoding with WOF concordance scoring (forceJointReconcile + parseWithLogits); argmax is the default
6Resolve๐Ÿšฉ opt-inWOF SQLite gazetteer: admin/postcode-level candidates with coordinates, exact-match tiering, coordinate-first postcode disambiguation, opt-in hierarchyCompletion/cityStateFallback/includeAncestors

Confidenceโ€‹

Every node carries a confidence score; pass a calibrator (isotonic, shipped per-locale in the weights bundle as calibration.json) to make those scores honest probabilities โ€” see Confidence calibration for what the numbers mean.

Browser demoโ€‹

The live demo at mailwoman.sister.software/demo runs the int8 model (28.4 MB) via @mailwoman/neural-web plus a slim WOF "hot" database via @mailwoman/resolver-wof-wasm, with artifacts served from a versioned releases.json pointer. It calls the classifier's argmax path directly; joint reconcile is not active in the demo.

Model line (June 2026)โ€‹

The shipped weights are 4.2.0 โ€” the v1.0 consolidation flag-plant (#466, closed). The next model-side step is the architecture escalation (#492, operator-gated); the next pipeline step is arbitration (#478). The ordered work queue to the full geocoder is epic #488.

What is not supported todayโ€‹

  • Rooftop / address-point geocoding โ€” the resolver returns place-level candidates. The address-point tier (#476) and interpolation (#483) are queued.
  • Reverse geocoding (#484) and autocomplete-as-a-product (#190).
  • Multi-locale ensemble (en-US + fr-FR weights in the same runtime).
  • Korean addresses (no open data path). Japanese addresses are supported via the postcode-route resolver (Geographic Rule Engine).
  • A supported server API โ€” mailwoman/server ships an experimental Express server (/api/parse, /api/resolve); production service hardening is #485.