11 docs tagged with "hubris"

Close-enough geocoding

Not every application needs sub-meter accuracy. For many, "in the right city" is close enough. A geocode that places a customer in the correct metropolitan area is sufficient for market analysis, sales territory assignment, and regional logistics. The coordinate doesn't need to be exact; it needs to be useful.

Gazetteer-first geocoding

The most radical simplification: don't parse at all. Treat the input as an information retrieval problem. Tokenize the string, try every token and n-gram against a gazetteer index, return the best placetype match. The parser is a search engine. The resolver is the search result.
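As a rough illustration of the idea, here is a minimal sketch in Python. The gazetteer contents, the `PLACETYPE_RANK` ordering, and the function names are all assumptions made for this example; a real gazetteer index would be far larger and would likely use a proper search backend rather than a dict.

```python
import re

# Hypothetical toy gazetteer: normalized place name -> placetype.
GAZETTEER = {
    "york": "locality",
    "new york": "locality",
    "springfield": "locality",
    "illinois": "region",
    "united states": "country",
}

# Assumed ranking for this sketch: more specific placetypes win.
PLACETYPE_RANK = {"locality": 3, "region": 2, "country": 1}


def ngrams(tokens, max_n=3):
    """Yield every contiguous n-gram of 1..max_n tokens as a string."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def gazetteer_lookup(text):
    """Return (name, placetype) for the best gazetteer hit, or None."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    candidates = [
        # Sort key: placetype rank first, then longer n-grams.
        (PLACETYPE_RANK[GAZETTEER[gram]], len(gram.split()), gram)
        for gram in ngrams(tokens)
        if gram in GAZETTEER
    ]
    if not candidates:
        return None
    _, _, name = max(candidates)
    return name, GAZETTEER[name]


print(gazetteer_lookup("350 5th Ave, New York, NY 10118"))
```

Preferring the longer n-gram is what lets "new york" beat "york" here; other tie-breaking rules (population, proximity to a known region) are equally plausible.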

Human-in-the-loop geocoding

Don't parse. Don't resolve. Don't guess. Show the user what you think they meant and let them confirm. The parser is a suggestion engine, not an authority. The user is the ground truth.

Locality-only geocoding

Parse only the locality — the city, town, or village. Everything else is supplementary. Geocode to city-level accuracy. This is sufficient for statistical aggregation, market analysis, regional routing, and most applications that don't need to find a specific building.

Normalize to match

The simplest geocoder doesn't parse addresses at all. It normalizes them — strips punctuation, lowercases, expands abbreviations — and matches the result against a known database of addresses. The "parser" is a fuzzy string matcher. The "resolver" is a hash table lookup.
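The normalize-then-look-up steps can be sketched roughly as follows. The abbreviation table, the `KNOWN_ADDRESSES` entries, and the coordinates are placeholders invented for this example, and the sketch uses an exact dictionary lookup; a fuzzy fallback is left out to keep it short.

```python
import re

# Hypothetical abbreviation table; a real one would be locale-specific
# and would need to handle ambiguity (e.g. "st" as "saint").
ABBREVIATIONS = {"st": "street", "ave": "avenue", "rd": "road", "blvd": "boulevard"}


def normalize(address):
    """Lowercase, strip punctuation, collapse whitespace, expand abbreviations."""
    text = re.sub(r"[^\w\s]", " ", address.lower())
    return " ".join(ABBREVIATIONS.get(token, token) for token in text.split())


# Hypothetical reference table keyed by normalized address.
# Coordinates are placeholder values, not real data.
KNOWN_ADDRESSES = {
    normalize("123 Main Street, Springfield"): (10.0, 20.0),
}


def geocode(address):
    """Return the stored coordinate for a known address, or None."""
    return KNOWN_ADDRESSES.get(normalize(address))


print(geocode("123 MAIN ST., Springfield"))
```

Because both the reference data and the query pass through the same `normalize` function, variations in case, punctuation, and abbreviation collapse to the same key.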

Postcode-only geocoding

The fastest geocoder extracts the postcode, looks it up in a postcode-to-coordinate table, and returns the centroid. It doesn't parse the street, the building number, or the locality. It doesn't need to. The postcode is the most machine-readable part of any address, and for many applications, postcode-level accuracy is sufficient.
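A minimal sketch of this approach, assuming US-style five-digit ZIP codes. The `POSTCODE_CENTROIDS` table and its coordinates are placeholders invented for this example; other countries would need their own pattern and table.

```python
import re

# Assumed pattern: US ZIP or ZIP+4. Other postcode formats need other patterns.
ZIP_PATTERN = re.compile(r"\b\d{5}(?:-\d{4})?\b")

# Hypothetical postcode-to-centroid table. Coordinates are placeholders.
POSTCODE_CENTROIDS = {
    "62701": (10.0, 20.0),
    "10118": (30.0, 40.0),
}


def geocode_by_postcode(address):
    """Return the centroid for the last ZIP-like token in the address, or None."""
    matches = ZIP_PATTERN.findall(address)
    if not matches:
        return None
    # Take the last match so a five-digit house number at the start
    # of the string is not mistaken for the postcode.
    postcode = matches[-1][:5]
    return POSTCODE_CENTROIDS.get(postcode)


print(geocode_by_postcode("12345 Main St, Springfield, IL 62701-1234"))
```

Taking the last match rather than the first is a heuristic that fits addresses written with the postcode at the end; it would need revisiting for formats that lead with the postcode.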

Regex-anchored fields

Most applications don't need a full address parse. They need three or four fields: the postcode, the state, the street number, maybe the street name. Extract those with regexes. Ignore everything else. The unparsed tokens are "the rest" — available for display, label printing, and downstream processing, but not part of the geocoding decision.
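One way this could look in practice, assuming US-style addresses with the street number first and the state and ZIP last. The patterns and the `extract_fields` helper are illustrative assumptions, not a description of any particular implementation.

```python
import re

# Assumed anchors for US-style addresses; each pattern targets one field.
PATTERNS = {
    "postcode": re.compile(r"\b\d{5}(?:-\d{4})?$"),           # ZIP at end of string
    "state": re.compile(r"\b[A-Z]{2}\b(?=\s+\d{5})"),         # two capitals before the ZIP
    "street_number": re.compile(r"^\d+\b"),                   # digits at start of string
}


def extract_fields(address):
    """Pull out the anchored fields and keep the remaining tokens as 'rest'."""
    address = address.strip()
    fields = {}
    spans = []
    for name, pattern in PATTERNS.items():
        match = pattern.search(address)
        if match:
            fields[name] = match.group()
            spans.append(match.span())
    # Remove matched spans from right to left so earlier offsets stay valid.
    rest = address
    for start, end in sorted(spans, reverse=True):
        rest = rest[:start] + rest[end:]
    # Whatever was not matched is kept, with commas and extra spaces tidied.
    fields["rest"] = " ".join(rest.replace(",", " ").split())
    return fields


print(extract_fields("123 Main St, Springfield, IL 62701"))
```

Fields that fail to match are simply absent from the result, so callers can decide per field whether a miss matters.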

The 90% trap

A geocoder that correctly parses 90% of addresses sounds good. It sounds especially good when the alternative — building or buying a better one — costs engineering time or API fees. The trap is that the 10% tail is not random. It is concentrated in the populations you most need to serve: rural areas, developing economies, multifamily housing, and anyone whose address doesn't look like a suburban US single-family home.

The case for simple geocoders

Not every geocoding problem needs a neural parser. This series steel-mans the reasonably defensible compromises that work for most applications — the architectures that ship in days, cover 90% of addresses, and cost nothing to maintain. Each article describes one approach, when it works, what it loses, and where Mailwoman fits (or doesn't).

Why not just use Google's API?

Google's Geocoding API is the default choice for geocoding. It is globally comprehensive and returns results in milliseconds. For many applications, calling Google is the right move. For applications that need to own their address data, audit their results, or operate at scale, it is the expensive kind of cheap.

Why not use geocode.earth?

If Google's API is the proprietary default, geocode.earth is the open-source default. It runs Pelias, the geocoder Mailwoman forked, as a hosted service with global coverage, open data, and transparent pricing. For many applications, geocode.earth is the right answer. For applications that need control over how addresses are parsed, it shares the fundamental limitation of every hosted API: you do not own the parsing layer.