11 docs tagged with "en-us"

Falsehoods about postcodes

The falsehoods

How can a building have two addresses?

A single physical building can have multiple valid addresses. For commercial buildings, multifamily housing, corner properties, and any structure that touches more than one administrative system, that is the normal state of affairs. A parser that assumes "one building = one address" will fail silently on a large fraction of real-world queries.

How mail delivery actually works

Before you can judge a parser, you need to know what it is parsing for. Address parsing is not an academic exercise in string labelling — it exists to route physical objects to physical places through a system designed in the 19th century and patched ever since.

Normalize to match

The simplest geocoder doesn't parse addresses at all. It normalizes them — strips punctuation, lowercases, expands abbreviations — and matches the result against a known database of addresses. The "parser" is a fuzzy string matcher. The "resolver" is a hash table lookup.

Postcode-only geocoding

The fastest geocoder extracts the postcode, looks it up in a postcode-to-coordinate table, and returns the centroid. It doesn't parse the street, the building number, or the locality. It doesn't need to. The postcode is the most machine-readable part of any address, and for many applications, postcode-level accuracy is sufficient.

Regex-anchored fields

Most applications don't need a full address parse. They need three or four fields: the postcode, the state, the street number, maybe the street name. Extract those with regexes. Ignore everything else. The unparsed tokens are "the rest" — available for display, label printing, and downstream processing, but not part of the geocoding decision.

The 90% trap

A geocoder that correctly parses 90% of addresses sounds good. It sounds especially good when the alternative — building or buying a better one — costs engineering time or API fees. The trap is that the 10% tail is not random. It is concentrated in the populations you most need to serve: rural areas, developing economies, multifamily housing, and anyone whose address doesn't look like a suburban US single-family home.

What is a postcode?

A postcode is a routing instruction, not a geographic area. It tells a postal service how to sort and deliver mail. It does not tell a geocoder where a building is, what municipality it belongs to, or what polygon contains it. Confusing these two things is the most common error in address geocoding — and the source of a surprising fraction of production bugs.

What is a ZIP Code?

The US ZIP Code is the most influential postal code system in the world — not because it is the best, but because US-origin address data dominates geocoding training sets and shapes what parsers expect. Understanding its structure is essential for understanding why US-trained parsers fail on international addresses.

Why not just use Google's API?

Google's Geocoding API is the default choice for geocoding. It is fast, globally comprehensive, and returns results in milliseconds. For many applications, calling Google is the right move. For applications that need to own their address data, audit their results, or operate at scale, it is the expensive kind of cheap.

Why not use geocode.earth?

If Google's API is the proprietary default, geocode.earth is the open-source default. It runs Pelias — the geocoder Mailwoman forked — as a hosted service with global coverage, open data, and transparent pricing. For many applications, geocode.earth is the right answer. For applications that need to own their parser, it has the same fundamental limitation as every hosted API: you do not own the parsing layer.