API reference
Entry points
createRuntimePipeline(opts?)
The recommended entry point. Returns a one-call pipeline function that runs all six stages.
```ts
import { createRuntimePipeline } from "mailwoman"
```

```ts
declare function createRuntimePipeline(opts?: {
  classifier?: AddressClassifier // Stage 3: typically NeuralAddressClassifier
  resolver?: Resolver // Stage 6: typically createWofResolver(...)
  detectLocale?: LocaleDetector // Stage 2 override (caller-trust stub by default)
  classifyKind?: KindClassifier // Stage 2.5 override (rule-based by default)
  groupPhrases?: PhraseGrouper // Stage 2.7 override (rule-based by default)
}): (raw: string, runOpts?: PipelineOpts) => Promise<PipelineResult>
```
NeuralAddressClassifier
The neural model wrapper. It loads the ONNX weights and a SentencePiece tokenizer.
```ts
import { NeuralAddressClassifier } from "@mailwoman/neural"

// From the npm weights package
const classifier = await NeuralAddressClassifier.loadFromWeights({ locale: "en-US" })

// From explicit file paths
const classifierFromFiles = await NeuralAddressClassifier.loadFromWeights({
  modelPath: "./model.onnx",
  tokenizerPath: "./tokenizer.model",
})

// Parse a single address
const tree: AddressTree = await classifier.parse("350 5th Ave, New York, NY 10118")

// Parse with logits exposed (for downstream joint-reconcile)
const { tree: treeWithLogits, logits, pieces } = await classifier.parseWithLogits("350 5th Ave, New York, NY 10118")
```
createWofResolver(opts)
Creates a resolver backed by a Who's On First SQLite distribution.
```ts
import { createWofResolver } from "@mailwoman/core/resolver"

const resolver = await createWofResolver({
  dbPath: "./wof.sqlite",
  // Optional: locale hint for name-language preference
  locale: "en-US",
})

// Resolve a parsed tree
const resolvedTree: AddressTree = await resolver.resolveTree(tree)

// Opt-in resolver enrichments (both off by default, so default output stays byte-stable):
//   hierarchyCompletion: recover the dropped locality of a dual-role place (Berlin, Milano)
//   includeAncestors: stamp each resolved node's containment chain onto metadata.ancestors
// See concepts/dual-role-places.md.
const enriched = await resolver.resolveTree(tree, { hierarchyCompletion: true, includeAncestors: true })
```
The resolver is the second half of a deliberate division of labor. Gazetteer knowledge also flows into the parse itself as soft anchor features (a postcode anchor, gazetteer candidate-tag clues). The lexicon informs the model but never overrides it. See "What Mailwoman is" for the architecture, and "Closed-vocab fields: model-first" for why "informs, never overrides" is load-bearing.
Output types
AddressTree
The result of parsing. It holds the raw input and a list of root nodes, and each root may nest child nodes.
```ts
interface AddressTree {
  raw: string
  roots: AddressNode[]
}
```
AddressNode
A single parsed component.
```ts
interface AddressNode {
  tag: ComponentTag // "locality", "street", "house_number", "postcode", etc.
  value: string // The raw text span, e.g. "New York"
  span: Span // { start: number, end: number } character offsets
  confidence: number // 0..1, mean per-token softmax probability
  source: "rule" | "neural" // Which classifier produced this node
  children: AddressNode[] // Nested components (e.g. street contains street_prefix)
  alternatives?: AddressAlternative[] // Resolver candidates when --candidates is set
  interpretations?: Interpretation[] // Extra roles this one span plays: a city-state is region AND locality (#413)
}
```
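The confidence field is documented as a mean per-token softmax probability. The sketch below shows that arithmetic under two assumptions. First, it takes confidence to be the mean, over the span's tokens, of the softmax probability of the node's own tag. Second, it uses a hypothetical `[token][tag]` logits layout. Neither the real layout of `parseWithLogits` output nor the exact averaging is specified in this section.

```typescript
// Numerically stable softmax over one token's per-tag logits.
function softmax(logits: number[]): number[] {
  const max = Math.max(...logits)
  const exps = logits.map((x) => Math.exp(x - max))
  const sum = exps.reduce((a, b) => a + b, 0)
  return exps.map((e) => e / sum)
}

// Mean, over a span's tokens, of the softmax probability of the node's tag.
// tokenLogits[i][k] = logit of tag k for token i (a hypothetical layout).
function spanConfidence(tokenLogits: number[][], tagIndex: number): number {
  if (tokenLogits.length === 0) return 0
  const probs = tokenLogits.map((row) => softmax(row)[tagIndex] ?? 0)
  return probs.reduce((a, b) => a + b, 0) / probs.length
}
```

Because this is an arithmetic mean, one uncertain token pulls the whole span's confidence down.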
A node usually plays one role (its tag). A dual-role place plays several on one span: Berlin is both a region and a locality. The extra roles ride in interpretations (each a { tag, placeId, lat, lon, … }), and the flat projections surface every role: decodeAsJson emits both region and locality, and decodeAsXml lists roles="region locality". See concepts/dual-role-places.md.
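The sketch below walks an AddressTree so that every role surfaces, in the spirit of the flat projections described above. `collectRoles` and the trimmed `TreeNode` shape are hypothetical illustrations, not the decodeAsJson implementation. The sketch assumes each interpretation's tag applies to the same span text as its node.

```typescript
// Trimmed shapes: only the AddressNode fields this sketch reads.
interface Interpretation {
  tag: string
}
interface TreeNode {
  tag: string
  value: string
  children: TreeNode[]
  interpretations?: Interpretation[]
}

// Hypothetical helper: a depth-first walk that records every role each span
// plays, i.e. its primary tag plus any extra tags carried in `interpretations`.
function collectRoles(roots: TreeNode[]): Record<string, string[]> {
  const out: Record<string, string[]> = {}
  const add = (tag: string, value: string): void => {
    const list = out[tag]
    if (list) list.push(value)
    else out[tag] = [value]
  }
  const visit = (node: TreeNode): void => {
    add(node.tag, node.value)
    for (const extra of node.interpretations ?? []) add(extra.tag, node.value)
    node.children.forEach(visit)
  }
  roots.forEach(visit)
  return out
}
```

For a Berlin node tagged region with a locality interpretation, the result lists "Berlin" under both region and locality.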
AddressAlternative
A resolver candidate for an ambiguous component.
```ts
interface AddressAlternative {
  placeId: number // WOF ID
  name: string // Canonical name
  placetype: string // "locality", "region", "country", etc.
  lat: number
  lon: number
  parentId?: number // WOF parent_id for hierarchy traversal
  score: number // Match × population ranking score
}
```
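parentId makes the hierarchy traversable. The sketch below follows those links upward, nearest ancestor first. It assumes a hypothetical in-memory `places` index keyed by WOF ID. A real lookup would query the WOF database, and the IDs in any example data are made up. The `seen` set guards against malformed data that loops back on itself.

```typescript
// Only the AddressAlternative fields this sketch reads.
interface Alt {
  placeId: number
  name: string
  placetype: string
  parentId?: number
}

// Hypothetical helper: follow parentId links through an index of known places.
// Stops at a missing parent or at an ID already visited (cycle guard).
function ancestorsOf(start: Alt, places: Map<number, Alt>): Alt[] {
  const chain: Alt[] = []
  const seen = new Set<number>([start.placeId])
  let parentId = start.parentId
  while (parentId !== undefined && !seen.has(parentId)) {
    const parent = places.get(parentId)
    if (!parent) break
    chain.push(parent)
    seen.add(parentId)
    parentId = parent.parentId
  }
  return chain
}
```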
PipelineResult
The full output from createRuntimePipeline.
```ts
interface PipelineResult {
  input: string
  normalized: NormalizedInputLite // NFC-normalized text + offset map
  queryShape: QueryShapeLite // Script class, postcode format hits, token classes
  locale: LocaleHint // Detected or caller-provided locale
  kind: QueryKindResult // postcode_only | structured_address | intersection | ...
  phraseProposals: PhraseProposal[] // Stage 2.7 span boundary proposals
  tree: AddressTree // Parsed + (optionally) resolved output
  timing: Record<string, number> // Per-stage wall-clock in ms
  path: "fast-path" | "full" // Whether the pipeline took the fast-path shortcut
}
```
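The timing field is a plain record keyed by stage. The sketch below is a hypothetical helper that summarizes it. The sum approximates end-to-end latency only if stages run one after another. A fast-path run may record fewer stages, and the stage key names depend on what the pipeline records.

```typescript
// Hypothetical helper: summed stage time and the slowest stage of one run.
function summarizeTiming(timing: Record<string, number>): { totalMs: number; slowest: string | undefined } {
  let totalMs = 0
  let slowest: string | undefined
  let slowestMs = -Infinity
  for (const [stage, ms] of Object.entries(timing)) {
    totalMs += ms
    if (ms > slowestMs) {
      slowest = stage
      slowestMs = ms
    }
  }
  return { totalMs, slowest }
}
```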
Pipeline options
```ts
interface PipelineOpts {
  locale?: string // BCP-47 tag, e.g. "en-US". Default: caller-trust stub
  signal?: AbortSignal // Cancel the pipeline between stages
  resolveOpts?: {
    candidatesPerLookup?: number // How many alternatives to surface (default: 1)
  }
  forceJointReconcile?: boolean // Use joint decoding instead of argmax fallback
}
```
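signal cancels the pipeline between stages rather than in the middle of one. The sketch below shows what that contract means for a sequential stage runner. `runStages`, `Stage`, and `AbortLike` are hypothetical, not the pipeline's internals. `AbortLike` is a structural subset of AbortSignal, so a real `AbortController`'s `signal` fits it. The AbortError-named Error is this sketch's choice, not a documented error type.

```typescript
// Structural subset of AbortSignal; AbortController.signal satisfies it.
interface AbortLike {
  readonly aborted: boolean
}

type Stage<T> = (input: T) => Promise<T>

// Runs stages in order and checks the signal before each one, so cancellation
// takes effect at stage boundaries; a stage already running is not interrupted.
async function runStages<T>(input: T, stages: Stage<T>[], signal?: AbortLike): Promise<T> {
  let value = input
  for (const stage of stages) {
    if (signal?.aborted) {
      const err = new Error("Pipeline aborted")
      err.name = "AbortError"
      throw err
    }
    value = await stage(value)
  }
  return value
}
```

On the caller side with the documented API, you would create an `AbortController`, pass `{ signal: controller.signal }` as runOpts, and call `controller.abort()` to stop at the next stage boundary.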
forceJointReconcile requires a classifier that supports parseWithLogits (NeuralAddressClassifier does). When it is enabled, Stage 5 runs beam search over phrase proposals × classifier top-K × resolver candidates, using WOF concordance scoring. Joint decoding is currently opt-in behind this flag.
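The generic beam search below makes the search structure concrete. It is not Mailwoman's Stage 5. The slot layout (one list of scored options per decision), the additive log scores, and the `compat` callback standing in for WOF concordance scoring are all assumptions for illustration.

```typescript
interface ScoredOption {
  label: string
  logScore: number // e.g. a log-probability from a classifier or resolver
}

interface Hypothesis {
  labels: string[]
  score: number
}

// Generic beam search: extend each kept hypothesis with every option in the
// next slot, add a caller-supplied compatibility bonus, keep the best `width`.
function beamSearch(
  slots: ScoredOption[][],
  width: number,
  compat: (prev: string[], next: string) => number = () => 0,
): Hypothesis[] {
  let beam: Hypothesis[] = [{ labels: [], score: 0 }]
  for (const options of slots) {
    const expanded: Hypothesis[] = []
    for (const hyp of beam) {
      for (const opt of options) {
        expanded.push({
          labels: [...hyp.labels, opt.label],
          score: hyp.score + opt.logScore + compat(hyp.labels, opt.label),
        })
      }
    }
    expanded.sort((a, b) => b.score - a.score)
    beam = expanded.slice(0, width)
  }
  return beam
}
```

The width sets the tradeoff. With a width of 1 this reduces to greedy decoding, which can prune a combination that a later compatibility bonus would have made the winner.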
CLI
```sh
mailwoman parse [options] <address string>
```
| Flag | Description |
|---|---|
| `--locale <tag>` | BCP-47 locale (default: en-US) |
| `--format json\|tuple\|xml` | Output format (default: json) |
| `--resolve` | Run resolver against WOF SQLite |
| `--resolve-db <path>` | Path to WOF SQLite distribution |
| `--candidates <n>` | Surface up to n alternative resolutions per node |
| `--debug` | Emit full PipelineResult as JSON |
| `--no-neural` | Skip neural classifier (normalize + queryShape + resolver only) |
| `--isolated` | Legacy rule-only parser (bypasses the pipeline) |
| `--benchmark <n>` | Run pipeline n times and emit per-stage p50/p95/p99 |
| `--policy <c>=<m>` | Per-component policy override (repeatable), e.g. `--policy postcode=rule_only` |
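--benchmark reports per-stage p50/p95/p99. The CLI's exact percentile method is not stated here. The sketch below uses the nearest-rank definition, which is one common choice, and aggregates a hypothetical list of per-run timing records shaped like PipelineResult.timing.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of all
// samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("percentile of an empty sample set")
  const sorted = [...samples].sort((a, b) => a - b)
  const rank = Math.min(sorted.length, Math.max(1, Math.ceil((p * sorted.length) / 100)))
  return sorted[rank - 1]!
}

type Summary = { p50: number; p95: number; p99: number }

// Hypothetical helper: per-stage p50/p95/p99 from one timing record per run.
function benchmarkSummary(runs: Record<string, number>[]): Record<string, Summary> {
  const byStage: Record<string, number[]> = {}
  for (const timing of runs) {
    for (const [stage, ms] of Object.entries(timing)) {
      const list = byStage[stage]
      if (list) list.push(ms)
      else byStage[stage] = [ms]
    }
  }
  const out: Record<string, Summary> = {}
  for (const [stage, samples] of Object.entries(byStage)) {
    out[stage] = { p50: percentile(samples, 50), p95: percentile(samples, 95), p99: percentile(samples, 99) }
  }
  return out
}
```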
Policy modes
| Mode | Behavior |
|---|---|
| rule_only | Use only the rule classifier for this component |
| neural_only | Use only the neural classifier |
| both | Run both, solver picks the best |
| neural_preferred | Prefer neural, fall back to rule if confidence < threshold |
| rule_preferred | Prefer rule, fall back to neural if confidence < threshold |
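The sketch below reads the table above as code. `applyPolicy` is a hypothetical helper. The default threshold of 0.5 and the highest-confidence pick for `both` are assumptions, because this section does not state the real threshold or how the solver chooses.

```typescript
type PolicyMode = "rule_only" | "neural_only" | "both" | "neural_preferred" | "rule_preferred"

interface Candidate {
  value: string
  confidence: number // 0..1
  source: "rule" | "neural"
}

// Hypothetical per-component selection. The threshold default and the
// highest-confidence rule for "both" are assumptions, not the real solver.
function applyPolicy(
  mode: PolicyMode,
  rule: Candidate | undefined,
  neural: Candidate | undefined,
  threshold = 0.5,
): Candidate | undefined {
  switch (mode) {
    case "rule_only":
      return rule
    case "neural_only":
      return neural
    case "neural_preferred":
      // Fall back to rule when neural is missing or below threshold.
      return neural && neural.confidence >= threshold ? neural : (rule ?? neural)
    case "rule_preferred":
      // Fall back to neural when rule is missing or below threshold.
      return rule && rule.confidence >= threshold ? rule : (neural ?? rule)
    case "both":
      if (!rule) return neural
      if (!neural) return rule
      return neural.confidence >= rule.confidence ? neural : rule
  }
}
```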
Browser imports
```ts
// Neural classifier (WASM-based onnxruntime-web)
import { loadNeuralClassifierFromUrls } from "@mailwoman/neural-web"
// WOF resolver (WASM-based sqlite-wasm)
import { createWasmResolver } from "@mailwoman/resolver-wof-wasm"
// Map rendering (MapLibre + Protomaps)
import { StyleSpecificationComposer } from "@mailwoman/cartographer"
```
The browser path mirrors the Node API. classifier.parse() and classifier.parseWithLogits() have the same signatures. The demo at /demo is the reference browser integration.
Component tags
The full ComponentTag union (source of truth: core/types/component.ts):
| Tag | Description | Example |
|---|---|---|
| country | Sovereign state | USA |
| region | First-level admin | OR |
| locality | City, town | Portland |
| dependent_locality | Sub-locality | Brooklyn |
| postcode | Postal code | 97215 |
| subregion | County-level admin | Multnomah County |
| house_number | Building number | 6220 |
| street | Street name | Salmon St |
| street_prefix | Directional prefix | SE |
| street_suffix | Street type | Street |
| venue | Named place | Mt Tabor Park |
| unit | Apartment/suite | Apt 4B |
| intersection_a | First crossing street | 5th Ave |
| intersection_b | Second crossing street | 42nd St |
| attention | Care-of line | c/o Jane Doe |
| po_box | Post office box | PO Box 1234 |
| cedex | FR postal routing | CEDEX 08 |
The current neural model (v0.4.0) emits 10 of these: country, region, locality, dependent_locality, postcode, subregion, cedex, venue, street, house_number. The remaining tags are rule-classifier-only.
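The neural/rule split can also be written down as data, for example when reviewing per-component `--policy` overrides. The constants below were transcribed by hand from the table and the v0.4.0 note above. core/types/component.ts remains the source of truth, and these names are illustrative, not exports of the package.

```typescript
// Hand-transcribed from the Component tags table (illustrative, not an export).
const COMPONENT_TAGS = [
  "country", "region", "locality", "dependent_locality", "postcode", "subregion",
  "house_number", "street", "street_prefix", "street_suffix", "venue", "unit",
  "intersection_a", "intersection_b", "attention", "po_box", "cedex",
] as const
type ComponentTag = (typeof COMPONENT_TAGS)[number]

// Tags the v0.4.0 neural model emits, per the note above.
const NEURAL_TAGS: ReadonlySet<ComponentTag> = new Set<ComponentTag>([
  "country", "region", "locality", "dependent_locality", "postcode",
  "subregion", "cedex", "venue", "street", "house_number",
])

// The remaining tags come only from the rule classifier with this model.
const RULE_ONLY_TAGS: ComponentTag[] = COMPONENT_TAGS.filter((t) => !NEURAL_TAGS.has(t))
```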