Skip to main content

14 docs tagged with "locality"

View all tags

A walkthrough — NY-NY Steakhouse, Houston, TX

The knowledge ladder explains why the v0.5.0 pipeline grew two new information layers (Stage 2.7 phrase grouper, expanded Stage 5 reconcile). This article walks through what they actually do on one concrete input, end-to-end.

Addresses that break geocoders

What is an address? covered the data model. This article goes the other way: a tour of address shapes that consistently break parsers and resolvers, with concrete examples for each. If you are building or evaluating a geocoder, this is the failure-mode catalogue worth keeping next to your test suite.

Close-enough geocoding

Not every application needs sub-meter accuracy. For most applications, "in the right city" is close enough. A geocode that places a customer in the correct metropolitan area is sufficient for market analysis, sales territory assignment, and regional logistics. The coordinate doesn't need to be correct — it needs to be useful.

Dual-role places and hierarchy completion

Some places are two things at once. Berlin is a city and a federal state. Milano is a comune and the province named after it. Bristol is a city and a unitary authority. Utrecht is a city and a province. A census of our gazetteer found about 122 of these across nine countries — Italy alone has 70 (its provinces are nearly all named after their capital), then Spain (14), the UK (12), Japan (10), Korea (7), and a handful each for France, Germany, the Netherlands, and China.

Falsehoods about address shapes and dimensions

An address is not necessarily a point on a map. It is not necessarily a polygon. It is not necessarily a discrete building. It is not necessarily at ground level. It is not necessarily the only address at its coordinates. And it is not necessarily stationary.

Falsehoods about geocoded precision and frontages

"Close enough" is a statement about your use case, not about the coordinate. A geocode that is correct for statistical aggregation may be catastrophically wrong for emergency dispatch. And the coordinate itself answers a question — "the front door" — that nobody bothered to define.

How can a building have two addresses?

A single physical building can have multiple valid addresses. For commercial buildings, multifamily housing, corner properties, and any structure that touches more than one administrative system, that is the normal state of affairs. A parser that assumes "one building = one address" will fail silently on a large fraction of real-world queries.

How humans break addresses

Users do not type addresses the way gazetteers store them. They type what they know, in the order they think of it, with the spellings their keyboard supports, trusting autocomplete suggestions they didn't verify. A parser that only handles well-formed addresses fails on real input.

Human-in-the-loop geocoding

Don't parse. Don't resolve. Don't guess. Show the user what you think they meant and let them confirm. The parser is a suggestion engine, not an authority. The user is the ground truth.

Locality-only geocoding

Parse only the locality — the city, town, or village. Everything else is supplementary. Geocode to city-level accuracy. This is sufficient for statistical aggregation, market analysis, regional routing, and most applications that don't need to find a specific building.

Resolver and Who's On First

Parsing answers "what kind of thing is each part of this string?". Resolving answers "where is the resulting place?". They are different jobs and Mailwoman keeps them apart on purpose.

The database fallacy

There is a persistent belief in geocoding: "If we just had a database of all addresses, this problem would be trivial." The belief is wrong in three independent ways. Together, they make the database approach not merely incomplete but structurally inadequate.

What is a concordance?

In Mailwoman's architecture, a concordance is the resolver's answer to the question: "Do these parsed components form a coherent place in the real world?" It is the mechanism that prevents the parser from emitting a parse that is structurally valid but geographically impossible — like "Paris, Texas" labelled as "Paris, Île-de-France" or "NY-NY Steakhouse, Houston TX" with "NY" tagged as a region.