Addresses that break geocoders
What is an address? covered the data model. This article goes the other way: a tour of address shapes that consistently break parsers and resolvers, with concrete examples for each. If you are building or evaluating a geocoder, this is the failure-mode catalogue worth keeping next to your test suite.
Falsehoods about addresses
This article series is inspired by and cites Michael Tandy's excellent, exhaustive original — the canonical catalogue of address falsehoods, maintained since 2013. Tandy's article is a taxonomy of assumptions that break parsers, validators, and databases. This series expands on that taxonomy, adding historical context on how geocoders have handled (or failed to handle) each category, and what Mailwoman's neural approach changes.
From Pelias to Mailwoman
A short lineage for readers who already know Pelias.
Glossary
Every technical term used in this documentation, defined once. Skim it before reading the concepts/ track, or come back when a word is unfamiliar.
How it used to work
Mailwoman v1 (the pre-2026 version, still living on as the rule classifiers inside v2) parses an address in four steps. This article walks through each one with a concrete example.
How it will work
This article describes where Mailwoman is heading. The work is tracked in GitHub issues and the plan/ directory. Status is current as of May 2026.
Rule-based classifiers
A rule classifier is a small piece of hand-written code that labels tokens. Pelias and Mailwoman v1 are built almost entirely on rule classifiers. Mailwoman v2 keeps them and adds the neural classifier alongside — see How it works now for the hybrid.
The knowledge ladder
The staged pipeline is a contract for decomposition by what each layer knows. Every stage is the rightful home of a particular kind of information; pushing work to the wrong stage produces fragile systems that try to learn things from data that they could have looked up, or look up things that they could have learned. This article catalogues the layers, what each one knows, and the two layers we don't ship yet but should.
The pipeline contract
You don't have to take Mailwoman's pipeline as-is. The runtime coordinator (createRuntimePipeline) accepts each of the six stages as an injectable function or interface; an integrator can swap any of them for a custom implementation without forking the core.
The staged pipeline
Reading Addresses that break geocoders makes one thing obvious: no single model handles every failure class well. Different failures want different fixes. Some want preprocessing rules, some want a small classifier, some want a transformer, some want a resolver that returns candidates instead of pretending to be sure.
What is an address?
Addresses look obvious. You see one every day. But parsing them turns out to be one of the harder problems in natural language processing, because a single string can mean many places, and a single place can be written many ways.