Addresses that break geocoders
What is an address? covered the data model. This article goes the other way: a tour of address shapes that consistently break parsers and resolvers, with concrete examples for each. If you are building or evaluating a geocoder, this is the failure-mode catalogue worth keeping next to your test suite.
Amenity queries
An amenity query asks for the nearest instance of a generic category. The user doesn't care which gas station — they care which one is closest, open, or cheapest. The geocoder's job is to translate a category label into a set of candidate locations and rank them.
Exotic point-of-interest queries
Not every geocoder query is an address. A large fraction of real-world searches ask for points of interest — named places, categories of things, brands, landmarks, and transit infrastructure. "Find the nearest gas station" is not an address. "Where is the Eiffel Tower?" is not an address. "Show me every Hilton in Manhattan" is not an address.
Falsehoods about address format
The falsehoods
Falsehoods about address shapes and dimensions
An address is not necessarily a point on a map. It is not necessarily a polygon. It is not necessarily a discrete building. It is not necessarily at ground level. It is not necessarily the only address at its coordinates. And it is not necessarily stationary.
Falsehoods about addresses
This article series is inspired by and cites Michael Tandy's excellent, exhaustive original — the canonical catalogue of address falsehoods, maintained since 2013. Tandy's article is a taxonomy of assumptions that break parsers, validators, and databases. This series expands on that taxonomy, adding historical context on how geocoders have handled (or failed to handle) each category, and what Mailwoman's neural approach changes.
Falsehoods about administrative hierarchy
The falsehoods
Falsehoods about numbers in addresses
The falsehoods
Falsehoods about postcodes
The falsehoods
Falsehoods about street names
The falsehoods
Franchise and brand queries
A franchise query names a specific chain or brand. The user doesn't want the nearest restaurant — they want the nearest McDonalds. The brand name is the query, and the geocoder's job is to resolve it to one or more specific locations.
Gazetteer-first geocoding
The most radical simplification: don't parse at all. Treat the input as an information retrieval problem. Tokenize the string, try every token and n-gram against a gazetteer index, return the best placetype match. The parser is a search engine. The resolver is the search result.
How humans break addresses
Users do not type addresses the way gazetteers store them. They type what they know, in the order they think of it, with the spellings their keyboard supports, trusting autocomplete suggestions they didn't verify. A parser that only handles well-formed addresses fails on real input.
How mail delivery actually works
Before you can judge a parser, you need to know what it is parsing for. Address parsing is not an academic exercise in string labelling — it exists to route physical objects to physical places through a system designed in the 19th century and patched ever since.
Human-in-the-loop geocoding
Don't parse. Don't resolve. Don't guess. Show the user what you think they meant and let them confirm. The parser is a suggestion engine, not an authority. The user is the ground truth.
Landmark queries
A landmark query names a specific, usually unique, place. The user doesn't want the nearest instance of a category — they want the Eiffel Tower, the Golden Gate Bridge, the Empire State Building. The geocoder's job is to recognize the landmark name and resolve it to a single coordinate or small candidate set.
Locality-only geocoding
Parse only the locality — the city, town, or village. Everything else is supplementary. Geocode to city-level accuracy. This is sufficient for statistical aggregation, market analysis, regional routing, and most applications that don't need to find a specific building.
Postcode-only geocoding
The fastest geocoder extracts the postcode, looks it up in a postcode-to-coordinate table, and returns the centroid. It doesn't parse the street, the building number, or the locality. It doesn't need to. The postcode is the most machine-readable part of any address, and for many applications, postcode-level accuracy is sufficient.
Regional variant queries
A regional variant is a local term for an amenity, brand, or category that differs from the global or standard name. "Servo" means gas station in Australia. "Bodega" means corner store in New York City. "マクド" (makudo) means McDonalds in the Kansai region of Japan. In their region, these are the standard — the everyday word millions of people reach for, every bit as legitimate as the global name.
Resolver and Who's On First
Parsing answers "what kind of thing is each part of this string?". Resolving answers "where is the resulting place?". They are different jobs and Mailwoman keeps them apart on purpose.
The database fallacy
There is a persistent belief in geocoding: "If we just had a database of all addresses, this problem would be trivial." The belief is wrong in three independent ways. Together, they make the database approach not merely incomplete but structurally inadequate.
Transit queries
A transit query names a station, stop, airport, terminal, or interchange. The user wants to find the transit facility — to depart from it, arrive at it, or navigate near it. Transit queries sit between landmark queries (named stations are landmarks) and amenity queries (unnamed bus stops are amenities).
What is a postcode?
A postcode is a routing instruction, not a geographic area. It tells a postal service how to sort and deliver mail. It does not tell a geocoder where a building is, what municipality it belongs to, or what polygon contains it. Confusing these two things is the most common error in address geocoding — and the source of a surprising fraction of production bugs.
What is an intersection address?
An intersection address names a location by the crossing of two streets. Unlike a conventional street address ("350 5th Ave"), it does not specify a building number. It says "find where these two streets meet." This is common in urban navigation, emergency dispatch, and some countries (notably Japan) where the street-address system is entirely different.
Why not just use Google's API?
Google's Geocoding API is the default choice for geocoding. It is fast, globally comprehensive, and returns results in milliseconds. For many applications, calling Google is the right move. For applications that need to own their address data, audit their results, or operate at scale, it is the expensive kind of cheap.
Why not use geocode.earth?
If Google's API is the proprietary default, geocode.earth is the open-source default. It runs Pelias — the geocoder Mailwoman forked — as a hosted service with global coverage, open data, and transparent pricing. For many applications, geocode.earth is the right answer. For applications that need to own their parser, it has the same fundamental limitation as every hosted API: you do not own the parsing layer.