Addresses that break geocoders
"What is an address?" covered the data model. This article goes the other way: a tour of address shapes that consistently break parsers and resolvers, with concrete examples for each. If you are building or evaluating a geocoder, this is the failure-mode catalogue worth keeping next to your test suite.
Falsehoods about geocoded precision and frontages
"Close enough" is a statement about your use case, not about the coordinate. A geocode that is correct for statistical aggregation may be catastrophically wrong for emergency dispatch. And the coordinate itself answers a question — "the front door" — that nobody bothered to define.
Falsehoods about numbers in addresses
The falsehoods
Falsehoods about street names
The falsehoods
How can a building have two addresses?
A single physical building can have multiple valid addresses. For commercial buildings, multifamily housing, corner properties, and any structure that touches more than one administrative system, that is the normal state of affairs. A parser that assumes "one building = one address" will fail silently on a large fraction of real-world queries.
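One way to avoid the "one building = one address" assumption is to store the relationship as many-to-one from the start. The sketch below is illustrative only: the class name BuildingIndex, the building identifier, and the two addresses are all made up, and the normalization is deliberately crude.

```python
from collections import defaultdict


def _key(address: str) -> str:
    # Crude normalization for lookup: lowercase and collapse whitespace.
    return " ".join(address.lower().split())


class BuildingIndex:
    """Hypothetical index where many addresses point at one building."""

    def __init__(self):
        self._by_address = {}                  # normalized address -> building id
        self._by_building = defaultdict(set)   # building id -> addresses as entered

    def add(self, building_id: str, address: str) -> None:
        self._by_address[_key(address)] = building_id
        self._by_building[building_id].add(address)

    def resolve(self, address: str):
        # Returns the building id, or None if the address is unknown.
        return self._by_address.get(_key(address))

    def aliases(self, building_id: str):
        return sorted(self._by_building.get(building_id, set()))


# A made-up corner property that fronts two streets.
index = BuildingIndex()
index.add("bldg-1", "100 Main St")
index.add("bldg-1", "1 Oak Ave")
```

Both addresses resolve to the same building, and the building can report every address it is known by.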
How humans break addresses
Users do not type addresses the way gazetteers store them. They type what they know, in the order they think of it, with the spellings their keyboard supports, trusting autocomplete suggestions they didn't verify. A parser that only handles well-formed addresses fails on real input.
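As one small example of meeting users where they are, the sketch below folds away differences in case, diacritics, and whitespace before matching, so that an address typed on a keyboard without accents can still match the stored form. The function name fold is an assumption for illustration, and folding is only appropriate for matching, not for display.

```python
import unicodedata


def fold(text: str) -> str:
    """Fold case, diacritics, and whitespace for matching purposes only."""
    # NFKD splits accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks, keeping the base letters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # casefold() is a more aggressive lower(); it also maps "ß" to "ss".
    return " ".join(stripped.casefold().split())
```

With this, "Rue de l'Église" and "rue de l'eglise" fold to the same string, as do "Hauptstraße 5" and "hauptstrasse 5". It does nothing for misspellings, reordered fields, or wrong autocomplete picks, which need separate handling.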
Regex-anchored fields
Most applications don't need a full address parse. They need three or four fields: the postcode, the state, the street number, maybe the street name. Extract those with regexes. Ignore everything else. The unparsed tokens are "the rest" — available for display, label printing, and downstream processing, but not part of the geocoding decision.
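A minimal sketch of the idea, assuming US-style input with a trailing ZIP code and a two-letter state in front of it. The patterns and the function name extract_anchors are illustrative assumptions, not a recommended production set.

```python
import re

# Hypothetical anchor patterns for US-style addresses.
ANCHORS = (
    # 5-digit ZIP, optionally ZIP+4, at the end of the string.
    ("postcode", re.compile(r"\b(\d{5}(?:-\d{4})?)\s*$")),
    # Two capital letters immediately before the trailing ZIP.
    ("state", re.compile(r"\b([A-Z]{2})\b(?=,?\s*\d{5}(?:-\d{4})?\s*$)")),
    # Leading digits, optionally with one letter suffix ("12B").
    ("street_number", re.compile(r"^\s*(\d+[A-Za-z]?)\b")),
)


def extract_anchors(raw: str) -> dict:
    """Pull out the anchored fields; everything else is kept as 'rest'."""
    fields = {}
    spans = []
    for name, pattern in ANCHORS:
        match = pattern.search(raw)
        if match:
            fields[name] = match.group(1)
            spans.append(match.span(1))
    # Blank out matched spans right to left so earlier offsets stay valid.
    rest = raw
    for start, end in sorted(spans, reverse=True):
        rest = rest[:start] + " " + rest[end:]
    fields["rest"] = " ".join(rest.split()).strip(" ,")
    return fields
```

For the input "350 5th Ave, New York, NY 10118" this yields postcode "10118", state "NY", street number "350", and "5th Ave, New York" as the rest. Fields that do not match are simply absent, which lets the caller decide how much to trust the result.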
The tokenization tautology
Traditional address parsers split the input into tokens, classify each token independently, then try to reassemble the pieces into a coherent parse. This sequence contains a structural circularity: you cannot group tokens correctly without knowing their types, and you cannot type them correctly without knowing their groups. The traditional architecture resolves this with heuristics, exceptions, and solver post-processing. The exception pile grows without bound.
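The circularity is easy to reproduce. The toy classifier below looks at one token at a time, as the traditional pipeline does; the lookup table and label names are invented for illustration. Because it sees no context, it can only return candidate types, and the same token gets the same candidates wherever it appears.

```python
# Hypothetical lookup of tokens that have more than one plausible type.
AMBIGUOUS = {
    "st": {"street_suffix", "saint_prefix"},   # "Street" or "Saint"
    "n": {"directional", "street_name"},       # "North" or a street called "N"
}


def classify(token: str) -> set:
    """Context-free classification: returns every type the token could be."""
    key = token.lower().strip(".,")
    if key.isdigit():
        # A bare number may be a house number or part of a street name.
        return {"house_number", "street_name"}
    return set(AMBIGUOUS.get(key, {"street_name"}))


tokens = "12 St James St".split()
labels = [classify(t) for t in tokens]
```

Both occurrences of "St" come back with the same two candidates. Choosing "Saint" for the first and "Street" for the second requires knowing that "St James" forms a group, which is exactly the information the classification step was supposed to produce.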
What is an intersection address?
An intersection address names a location by the crossing of two streets. Unlike a conventional street address ("350 5th Ave"), it does not specify a building number; it says "find where these two streets meet." This form is common in urban navigation and emergency dispatch, and in some countries (notably Japan) where the street-address system works entirely differently.
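Detecting this shape can be sketched as a split on a connector between two street names. The connector list ("&", "@", "and", "at") and the function name parse_intersection are assumptions for illustration. A street whose own name contains "and" or "at" would be split in the wrong place, so a real parser would need a check against known street names.

```python
import re

# Hypothetical connectors that join two street names.
SEPARATOR_RE = re.compile(r"\s*(?:&|@|\band\b|\bat\b)\s*", re.IGNORECASE)
# A leading number followed by whitespace suggests a conventional address.
HOUSE_NUMBER_RE = re.compile(r"^\d+\s")


def parse_intersection(raw: str):
    """Return (street_a, street_b), or None if this is not an intersection."""
    parts = SEPARATOR_RE.split(raw.strip())
    if len(parts) != 2:
        return None
    street_a, street_b = (p.strip(" ,") for p in parts)
    if not street_a or not street_b:
        return None
    if HOUSE_NUMBER_RE.match(street_a):
        # "350 5th Ave & ..." starts with a building number, so reject it.
        return None
    return street_a, street_b
```

Here "5th Ave & W 34th St" parses to the pair ("5th Ave", "W 34th St"), while "350 5th Ave" returns None. Resolving the pair to a coordinate is a separate step that needs street geometry.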