Getting started

Mailwoman parses address strings into structured components and resolves them against open gazetteer data. It runs in Node.js and the browser.

Installation

npm install mailwoman @mailwoman/neural @mailwoman/neural-weights-en-us

This installs the CLI, the neural runtime, and the US English model weights (~25 MB). For French addresses, add @mailwoman/neural-weights-fr-fr.

Your first parse

Node.js

import { NeuralAddressClassifier } from "@mailwoman/neural"

const classifier = await NeuralAddressClassifier.loadFromWeights({ locale: "en-US" })
const tree = await classifier.parse("350 5th Ave, New York, NY 10118")

for (const node of tree.roots) {
  console.log(`${node.tag}: "${node.value}" (confidence: ${node.confidence.toFixed(2)})`)
}

Output:

house_number: "350" (confidence: 0.94)
street: "5th Ave" (confidence: 0.72)
locality: "New York" (confidence: 0.65)
region: "NY" (confidence: 0.88)
postcode: "10118" (confidence: 0.91)

Those confidence numbers are calibrated probabilities, not vibes: pass the bundled per-locale calibrator and "0.88" means the label is right about 88% of the time. See Confidence calibration for how to load it and what the numbers mean.
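Because each node carries its own confidence, one common pattern is to accept high-confidence components and route the rest to review. The sketch below is not part of Mailwoman: `ParsedNode` is a hypothetical local type that mirrors only the three fields printed above (`tag`, `value`, `confidence`), `splitByConfidence` is an illustrative helper, and the 0.8 threshold is arbitrary.

```typescript
// Hypothetical local type: mirrors only the fields shown in the example output.
interface ParsedNode {
  tag: string
  value: string
  confidence: number
}

// Partition nodes into those at or above the threshold and those below it.
function splitByConfidence(nodes: ParsedNode[], threshold: number) {
  const accepted: ParsedNode[] = []
  const needsReview: ParsedNode[] = []
  for (const node of nodes) {
    if (node.confidence >= threshold) {
      accepted.push(node)
    } else {
      needsReview.push(node)
    }
  }
  return { accepted, needsReview }
}

// Sample data copied from the example output above.
const sampleNodes: ParsedNode[] = [
  { tag: "house_number", value: "350", confidence: 0.94 },
  { tag: "street", value: "5th Ave", confidence: 0.72 },
  { tag: "locality", value: "New York", confidence: 0.65 },
  { tag: "region", value: "NY", confidence: 0.88 },
  { tag: "postcode", value: "10118", confidence: 0.91 },
]

const split = splitByConfidence(sampleNodes, 0.8)
// In this sample, street and locality fall below 0.8.
console.log(split.needsReview.map((node) => node.tag))
```

With real parser output you would pass `tree.roots` in place of `sampleNodes`, assuming the nodes expose the same three fields.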

CLI

mailwoman parse "350 5th Ave, New York, NY 10118"

Add --resolve to look up components against the Who's On First gazetteer:

mailwoman parse "350 5th Ave, New York, NY 10118" --resolve --resolve-db ./wof.sqlite

Output format options: --format json (default), --format tuple, --format xml.

Browser

import { loadNeuralClassifierFromUrls } from "@mailwoman/neural-web"

const classifier = await loadNeuralClassifierFromUrls({
  modelUrl: "/mailwoman/model.onnx",
  tokenizerUrl: "/mailwoman/tokenizer.model",
})

const tree = await classifier.parse("350 5th Ave, New York, NY 10118")

See the demo page for a live example with map integration.

Using the staged pipeline

The classifier.parse() call above uses a direct neural path. For the full staged pipeline (normalize → query shape → kind classify → phrase group → token classify → resolve), use createRuntimePipeline:

import { createRuntimePipeline } from "mailwoman"
import { NeuralAddressClassifier } from "@mailwoman/neural"

const classifier = await NeuralAddressClassifier.loadFromWeights({ locale: "en-US" })
const pipeline = createRuntimePipeline({ classifier })

const result = await pipeline("350 5th Ave, New York, NY 10118")
// result.tree: the parsed address
// result.timing: per-stage wall-clock breakdown
// result.queryShape: structural input priors
// result.kind: query category (structured_address, postcode_only, etc.)
// result.phraseProposals: span boundaries from the phrase grouper

Fast-path routing: bare postcodes and single localities skip the neural classifier automatically.
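To make the fast-path idea concrete, here is a minimal sketch of what a "bare postcode" pre-check could look like. This is an illustration only, not Mailwoman's routing logic: `looksLikeBareUsPostcode` is a hypothetical helper, and it recognizes only US ZIP and ZIP+4 formats.

```typescript
// Hypothetical helper: true when the whole input is a US ZIP or ZIP+4 code.
// Built with the RegExp constructor; the pattern is anchored at both ends so
// a postcode embedded in a longer address does not match.
const BARE_US_POSTCODE = new RegExp("^\\d{5}(?:-\\d{4})?$")

function looksLikeBareUsPostcode(input: string): boolean {
  return BARE_US_POSTCODE.test(input.trim())
}

console.log(looksLikeBareUsPostcode("10118"))
console.log(looksLikeBareUsPostcode("350 5th Ave, New York, NY 10118"))
```

An input that passes a check like this needs no token classification at all, which is why skipping the neural stage for such queries is cheap and safe.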

Adding resolution

The resolver turns parsed components into place IDs and coordinates:

import { createRuntimePipeline } from "mailwoman"
import { createWofResolver } from "@mailwoman/core/resolver"

// `classifier` is the NeuralAddressClassifier loaded in the previous example
const resolver = await createWofResolver({ dbPath: "./wof.sqlite" })
const pipeline = createRuntimePipeline({ classifier, resolver })

const result = await pipeline("350 5th Ave, New York, NY 10118")
// result.tree roots now carry wof:id + lat/lon for matched components

Honest caveats

  • The resolver is administrative/postcode-level, not rooftop. It returns place centroids (locality, region, postcode), not delivery-point coordinates.
  • Neural model quality varies by component. house_number F1 is 0.79. street and venue are lower. Coarse components (country, region, locality) are better served by the rule-based WOF dictionary classifiers in the hybrid pipeline.
  • Non-Latin scripts have limited support. The current tokenizer (v0.1.0, 16K vocab) falls back to raw bytes on CJK, Cyrillic, and other non-Latin scripts.
  • Browser cold load is ~60 MB (25 MB model + 35 MB gazetteer). Cached after first visit.
  • Full-parse exact match is low (~8% on the golden set). The model is early-stage. The architecture is additive: rule classifiers handle what the model doesn't.
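Given the non-Latin caveat above, it can help to flag such inputs before parsing so callers know the result may be degraded. The sketch below uses only standard JavaScript Unicode property escapes; `hasNonLatinLetters` is a hypothetical helper and not a Mailwoman API.

```typescript
// Matches any letter (\p{L}) whose script is not Latin. Built with the RegExp
// constructor so the pattern does not depend on the TypeScript compile target.
const NON_LATIN_LETTER = new RegExp("(?!\\p{Script=Latin})\\p{L}", "u")

// Hypothetical helper: true when the input contains at least one non-Latin
// letter. Digits, spaces, and punctuation are ignored.
function hasNonLatinLetters(input: string): boolean {
  return NON_LATIN_LETTER.test(input)
}

for (const sample of ["350 5th Ave, New York", "Café de Flore, Paris", "東京都千代田区", "Москва"]) {
  console.log(sample, hasNonLatinLetters(sample))
}
```

Accented Latin letters such as "é" still count as Latin here, so French addresses are not flagged.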

Where to go next