Skip to main content

API reference

Entry pointsโ€‹

createRuntimePipeline(opts?)โ€‹

The recommended entry point. Returns a one-call pipeline function that runs all six stages.

import { createRuntimePipeline } from "mailwoman"

function createRuntimePipeline(opts?: {
classifier?: AddressClassifier // Stage 3 โ€” typically NeuralAddressClassifier
resolver?: Resolver // Stage 6 โ€” typically createWofResolver(...)
detectLocale?: LocaleDetector // Stage 2 override (caller-trust stub by default)
classifyKind?: KindClassifier // Stage 2.5 override (rule-based by default)
groupPhrases?: PhraseGrouper // Stage 2.7 override (rule-based by default)
}): (raw: string, runOpts?: PipelineOpts) => Promise<PipelineResult>

NeuralAddressClassifierโ€‹

The neural model wrapper. Loads ONNX weights + SentencePiece tokenizer.

import { NeuralAddressClassifier } from "@mailwoman/neural"

// From npm weights package
const classifier = await NeuralAddressClassifier.loadFromWeights({ locale: "en-US" })

// From explicit file paths
const classifier = await NeuralAddressClassifier.loadFromWeights({
modelPath: "./model.onnx",
tokenizerPath: "./tokenizer.model",
})

// Parse a single address
const tree: AddressTree = await classifier.parse("350 5th Ave, New York, NY 10118")

// Parse with logits exposed (for downstream joint-reconcile)
const { tree, logits, pieces } = await classifier.parseWithLogits("350 5th Ave, New York, NY 10118")

createWofResolver(opts)โ€‹

Creates a resolver backed by a Who's On First SQLite distribution.

import { createWofResolver } from "@mailwoman/core/resolver"

const resolver = await createWofResolver({
dbPath: "./wof.sqlite",
// Optional: locale hint for name-language preference
locale: "en-US",
})

// Resolve a parsed tree
const resolvedTree: AddressTree = await resolver.resolveTree(tree)

// Opt-in resolver enrichments (both off by default โ†’ byte-stable):
// hierarchyCompletion โ€” recover the dropped locality of a dual-role place (Berlin, Milano)
// includeAncestors โ€” stamp each resolved node's containment chain onto metadata.ancestors
// See concepts/dual-role-places.md.
const enriched = await resolver.resolveTree(tree, { hierarchyCompletion: true, includeAncestors: true })

The resolver is the second half of a deliberate division of labor: gazetteer knowledge also flows into the parse itself as soft anchor features (postcode anchor, gazetteer candidate-tag clues) โ€” the lexicon informs the model, it never overrides it. See What Mailwoman is for the architecture and Closed-vocab fields: model-first for why "informs, never overrides" is load-bearing.

Output typesโ€‹

AddressTreeโ€‹

The result of parsing. A flat list of root nodes with parent-child nesting.

interface AddressTree {
raw: string
roots: AddressNode[]
}

AddressNodeโ€‹

A single parsed component.

interface AddressNode {
tag: ComponentTag // "locality", "street", "house_number", "postcode", etc.
value: string // The raw text span, e.g. "New York"
span: Span // { start: number, end: number } character offsets
confidence: number // 0..1, mean per-token softmax probability
source: "rule" | "neural" // Which classifier produced this node
children: AddressNode[] // Nested components (e.g. street contains street_prefix)
alternatives?: AddressAlternative[] // Resolver candidates when --candidates is set
interpretations?: Interpretation[] // Extra roles this one span plays โ€” a city-state is region AND locality (#413)
}

A node usually plays one role (its tag). A dual-role place plays several on one span: Berlin is both a region and a locality. The extra roles ride in interpretations (each a { tag, placeId, lat, lon, โ€ฆ }), and the flat projections surface every role โ€” decodeAsJson emits both region and locality, decodeAsXml lists roles="region locality". See concepts/dual-role-places.md.

AddressAlternativeโ€‹

A resolver candidate for an ambiguous component.

interface AddressAlternative {
placeId: number // WOF ID
name: string // Canonical name
placetype: string // "locality", "region", "country", etc.
lat: number
lon: number
parentId?: number // WOF parent_id for hierarchy traversal
score: number // Match ร— population ranking score
}

PipelineResultโ€‹

The full output from createRuntimePipeline.

interface PipelineResult {
input: string
normalized: NormalizedInputLite // NFC-normalized text + offset map
queryShape: QueryShapeLite // Script class, postcode format hits, token classes
locale: LocaleHint // Detected or caller-provided locale
kind: QueryKindResult // postcode_only | structured_address | intersection | ...
phraseProposals: PhraseProposal[] // Stage 2.7 span boundary proposals
tree: AddressTree // Parsed + (optionally) resolved output
timing: Record<string, number> // Per-stage wall-clock in ms
path: "fast-path" | "full" // Whether the pipeline took the fast-path shortcut
}

Pipeline optionsโ€‹

interface PipelineOpts {
locale?: string // BCP-47 tag, e.g. "en-US". Default: caller-trust stub
signal?: AbortSignal // Cancel the pipeline between stages
resolveOpts?: {
candidatesPerLookup?: number // How many alternatives to surface (default: 1)
}
forceJointReconcile?: boolean // Use joint decoding instead of argmax fallback
}

forceJointReconcile requires a classifier with parseWithLogits support (the NeuralAddressClassifier has this). When enabled, Stage 5 runs beam search over phrase proposals ร— classifier top-K ร— resolver candidates with WOF concordance scoring. Currently opt-in behind this flag.

CLIโ€‹

mailwoman parse [options] <address string>
FlagDescription
--locale <tag>BCP-47 locale (default: en-US)
--format json|tuple|xmlOutput format (default: json)
--resolveRun resolver against WOF SQLite
--resolve-db <path>Path to WOF SQLite distribution
--candidates <n>Surface up to n alternative resolutions per node
--debugEmit full PipelineResult as JSON
--no-neuralSkip neural classifier (normalize + queryShape + resolver only)
--isolatedLegacy rule-only parser (bypasses the pipeline)
--benchmark <n>Run pipeline n times and emit per-stage p50/p95/p99
--policy <c>=<m>Per-component policy override (repeatable). e.g. --policy postcode=rule_only

Policy modesโ€‹

ModeBehavior
rule_onlyUse only the rule classifier for this component
neural_onlyUse only the neural classifier
bothRun both, solver picks the best
neural_preferredPrefer neural, fall back to rule if confidence < threshold
rule_preferredPrefer rule, fall back to neural if confidence < threshold

Browser importsโ€‹

// Neural classifier (WASM-based onnxruntime-web)
import { loadNeuralClassifierFromUrls } from "@mailwoman/neural-web"

// WOF resolver (WASM-based sqlite-wasm)
import { createWasmResolver } from "@mailwoman/resolver-wof-wasm"

// Map rendering (MapLibre + Protomaps)
import { StyleSpecificationComposer } from "@mailwoman/cartographer"

The browser path mirrors the Node API. classifier.parse() and classifier.parseWithLogits() have the same signatures. The demo at /demo is the reference browser integration.

Component tagsโ€‹

The full ComponentTag union (source of truth: core/types/component.ts):

TagDescriptionExample
countrySovereign stateUSA
regionFirst-level adminOR
localityCity, townPortland
dependent_localitySub-localityBrooklyn
postcodePostal code97215
subregionCounty-level adminMultnomah County
house_numberBuilding number6220
streetStreet nameSalmon St
street_prefixDirectional prefixSE
street_suffixStreet typeStreet
venueNamed placeMt Tabor Park
unitApartment/suiteApt 4B
intersection_aFirst crossing street5th Ave
intersection_bSecond crossing street42nd St
attentionCare-of linec/o Jane Doe
po_boxPost office boxPO Box 1234
cedexFR postal routingCEDEX 08

The current neural model (v0.4.0) emits 10 of these: country, region, locality, dependent_locality, postcode, subregion, cedex, venue, street, house_number. The remaining tags are rule-classifier-only.