Understanding Mailwoman
This track is for readers who want to understand why Mailwoman exists, what problem it solves, what it parses, and how it works, in that order. The articles build on each other, but each is self-contained enough to be read independently.
The four layers
Tier 0: The problem (articles 2–7)
These six articles establish the domain. If you are sceptical that address parsing needs a neural model, start here.
- How mail delivery actually works – the postal system is already fuzzy. It tolerates ambiguity through human intervention.
- How humans break addresses – the failure taxonomy. Real input is messier than any database expects.
- The database fallacy – why "just store all addresses" is economically infeasible and geometrically wrong.
- The tokenization tautology – why traditional rule-based parsers hit a structural ceiling.
- The 90% trap – why 90% geocoder coverage is deceptively expensive.
- Why a neural parser? – the bitter-lesson argument applied to address parsing. Bridges Tier 0 → Tier 1.
Tier 0.5: Postal address concepts (articles 8–12)
These five articles explain the fundamental concepts of postal addressing: the things a parser actually parses and a resolver actually resolves. New as of May 2026.
- What is a postcode? – postcodes are routing instructions, not geographic areas. International comparison.
- What is a ZIP Code and how is it structured? – the US 11-digit system in detail. Why US-trained parsers fail on non-US postcodes.
- What is a concordance? – how the resolver validates that parsed components form a coherent real-world place.
- What is an intersection address? – crossing streets as locations. Why not all addresses have building numbers.
- How can a building have two addresses? – mailing vs 911 vs utility vs management. An address is a protocol, not a property.
Tier 1: The architecture (articles 13–20)
These describe Mailwoman's design. Most existed before May 2026 and have been reordered to follow the domain articles.
- What is an address? – the data model, moved from concepts.
- Addresses that break geocoders – concrete failure examples, moved from concepts.
- From Pelias to Mailwoman – the short historical bridge.
- How it used to work – Mailwoman v1 (rule-based) in detail.
- How it works now – the current rule + neural hybrid.
- How it will work – the near-future roadmap.
- The knowledge ladder – the decomposition principle behind the staged pipeline.
- The staged pipeline – the Mailwoman runtime end-to-end.
Reference
- Glossary – every technical term defined on first use.
Appendix
- Why not just use Google's API? – pricing, terms, lock-in, and when renting is the right choice.
- Why not use geocode.earth? – the open-source hosted alternative and its Pelias parser ceiling.
The case for simple geocoders
Steel-manning the reasonably defensible compromises that work for most applications.
- Overview – when simple is the right choice, and when it isn't.
- Normalize to match – strip, lowercase, fuzzy-match against a known database.
- Postcode-only – extract the postcode, centroid the result.
- Gazetteer-first – skip parsing, treat as information retrieval.
- Regex-anchored fields – extract the 3-4 fields you care about, ignore the rest.
- Locality-only – find the city, centroid it.
- Human-in-the-loop – don't parse, suggest, let the user confirm.
- Close-enough – define your precision requirement, pick the cheapest approach, stop.
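To make the "normalize to match" idea from the list above concrete, here is a minimal sketch in Python. It is an illustration only, not Mailwoman code: the `KNOWN_ADDRESSES` table, the `normalize` and `match` helpers, and the 0.8 similarity cutoff are all assumptions chosen for the example, and the standard-library `difflib` stands in for whatever fuzzy matcher a real system would use.

```python
import difflib
import re

# Hypothetical reference table: normalized address -> (lat, lon).
KNOWN_ADDRESSES = {
    "10 downing street london": (51.5034, -0.1276),
    "1600 pennsylvania avenue nw washington": (38.8977, -77.0365),
}


def normalize(raw: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    cleaned = re.sub(r"[^\w\s]", " ", raw.lower())
    return " ".join(cleaned.split())


def match(raw: str, cutoff: float = 0.8):
    """Return (matched_key, coords) for the closest known address, or None.

    The cutoff is an assumed threshold; tune it against your own data.
    """
    candidates = difflib.get_close_matches(
        normalize(raw), list(KNOWN_ADDRESSES), n=1, cutoff=cutoff
    )
    if not candidates:
        return None
    key = candidates[0]
    return key, KNOWN_ADDRESSES[key]


if __name__ == "__main__":
    print(match("10 Downing St., London"))
    print(match("zzz"))
```

This approach only works when the input is already close to an entry in the reference table. It has no notion of address components, which is the trade-off the simple-geocoder articles weigh.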
Falsehoods about addresses
Inspired by and citing Michael Tandy's original catalogue. Each article takes one category of falsehood, explains how traditional geocoders handled it, and what Mailwoman's neural approach changes.
- Overview – the taxonomy and why it matters for Mailwoman.
- Numbers in addresses – zero, negative, fractions, duplicates, ranges.
- Street names – missing suffixes, numbered streets, recurring names, no streets at all.
- Postcodes – leading zeros, multi-city, per-building, missing postcodes.
- Administrative hierarchy – no states, no counties, duplicate cities, city-states.
- Address format – non-ASCII, variable ordering, special characters, changing addresses.
- Shapes and dimensions – not a point, not a polygon, not a building, not at ground level, not unique per coordinate.
- Precision and frontages – not one correct coordinate, not always "close enough," not necessarily the front door.
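As a small illustration of the "leading zeros" falsehood listed above, the following Python sketch shows how a postcode loses information when it is treated as a number. The `postcode_from_int` helper and the fixed width of 5 (a US five-digit ZIP Code) are assumptions for the example; they are not part of Mailwoman.

```python
def postcode_from_int(value: int, width: int = 5) -> str:
    """Re-pad a postcode that was stored as an integer.

    This only works when the width is known in advance (assumed here to be
    a US five-digit ZIP Code). Storing postcodes as strings avoids the
    problem entirely.
    """
    return str(value).zfill(width)


if __name__ == "__main__":
    as_number = int("02134")  # the leading zero is lost
    print(as_number)
    print(postcode_from_int(as_number))
```

Padding cannot recover postcodes whose length varies by country, which is one reason to keep them as text from the start.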
Exotic point-of-interest queries
Not every geocoder query is an address. A large fraction of real-world searches ask for points of interest: named places, categories, brands, landmarks, and transit infrastructure.
- Overview – what counts as a POI query vs an address query.
- Amenity queries – water fountain, gas station, ATM, pharmacy.
- Franchise and brand queries – Walmart, McDonalds, Hilton, Starbucks.
- Regional variants – servo, bodega, ใใฏใ, off-licence, chemist.
- Landmark queries – Eiffel Tower, Golden Gate Bridge, Empire State Building.
- Transit queries – subway station, bus stop, airport, train station.
Reading order
If you are sceptical about the whole project, read Tier 0 in order (articles 2–7). It should take about 45 minutes. If you are still sceptical after that, the project has failed to make its case, so please open an issue.
If you are new to geocoding but curious, start with What is an address? (article 13), then read Tier 0. The data-model article grounds the domain arguments.
If you are from the Pelias world, read From Pelias to Mailwoman (article 15), then How it works now (article 17). Those two articles bridge your existing knowledge to the current system.
If you want to go deeper, see the concepts/ track: per-topic deep dives into tokenization, BIO labels, the CRF decoder, ONNX runtime, training pipeline, corpus construction, and more.
A note on language
The team behind Mailwoman speaks many languages, and English is a second language for many readers. Our goal in these docs is plain English with technical terms defined on first use. If a sentence is hard to follow, that is a documentation bug, so please open an issue.
Where this lives in the repo
docs/
├── articles/
│   ├── understanding/   ← you are here
│   ├── concepts/        ← per-concept deep dives
│   ├── plan/            ← operator + agent technical plan
│   └── evals/           ← per-version evaluation reports
└── src/pages/demo/      ← the live browser-side demo