
11 docs tagged with "resolver"


A walkthrough — NY-NY Steakhouse, Houston, TX

The knowledge ladder explains why the v0.5.0 pipeline grew two new information layers (Stage 2.7 phrase grouper, expanded Stage 5 reconcile). This article walks through what they actually do on one concrete input, end-to-end.

Dual-role places and hierarchy completion

Some places are two things at once. Berlin is a city and a federal state. Milano is a comune and the province named after it. Bristol is a city and a unitary authority. Utrecht is a city and a province. A census of our gazetteer found about 122 of these across nine countries. Italy alone has 70 (nearly all of its provinces are named after their capitals), followed by Spain (14), the UK (12), Japan (10), Korea (7), and a handful each in France, Germany, the Netherlands, and China.
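
A census like this can be approximated by grouping gazetteer records by name within a country and keeping the names that carry both a settlement placetype and an administrative one. This is a minimal sketch under stated assumptions: `Place` is a simplified, hypothetical record (real WOF records carry far more fields), the split into `SETTLEMENT` and `ADMIN_AREA` is an assumption, and exact-name matching will miss variants such as "Milan" vs "Milano".

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, NamedTuple, Set, Tuple


class Place(NamedTuple):
    # Hypothetical, simplified gazetteer record; not the real WOF schema.
    name: str
    placetype: str
    country: str


# Assumed placetype buckets for this sketch.
SETTLEMENT = {"locality"}
ADMIN_AREA = {"region", "county", "localadmin"}


def find_dual_role(places: Iterable[Place]) -> Dict[Tuple[str, str], Set[str]]:
    """Group records by (country, case-folded name) and keep the groups that
    contain both a settlement and an administrative-area placetype."""
    roles: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for p in places:
        roles[(p.country, p.name.casefold())].add(p.placetype)
    return {
        key: types
        for key, types in roles.items()
        if types & SETTLEMENT and types & ADMIN_AREA
    }


def per_country(dual: Dict[Tuple[str, str], Set[str]]) -> Counter:
    """Tally dual-role names per country, in the spirit of the census."""
    return Counter(country for country, _name in dual)
```

Two same-named localities in different countries (Paris, France and Paris, Texas) are not dual-role here; only a name that is both a settlement and an administrative area inside one country is counted.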

FST gazetteer prior

The FST (finite-state transducer) gazetteer prior is the structure that lets the neural classifier benefit from everything Who's On First already knows. The neural model knows grammar; the gazetteer knows places. The FST is the bridge — pre-computed at build time so the classifier can consult it at inference time without paying gazetteer-lookup costs per token.
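
The build-time/inference-time split can be sketched with a plain nested-dict trie standing in for the FST. A real FST is also minimised (it shares suffixes as well as prefixes) and is far more compact, but the lookup it serves is the same: walk the prebuilt structure from each token and emit the placetypes of any gazetteer name that matches. The names `build_prior` and `token_priors`, and the per-token set-of-placetypes output, are assumptions for illustration, not Mailwoman's actual feature format.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical sentinel key marking "a gazetteer name ends here".
# Assumes no real token is ever equal to this string.
END = "\x00placetypes"


def build_prior(entries: Dict[str, Iterable[str]]) -> dict:
    """Build time: index multi-token gazetteer names -> placetypes in a
    nested-dict trie (a stand-in for a real, minimised FST)."""
    root: dict = {}
    for name, placetypes in entries.items():
        node = root
        for tok in name.lower().split():
            node = node.setdefault(tok, {})
        node.setdefault(END, set()).update(placetypes)
    return root


def token_priors(root: dict, tokens: List[str]) -> List[Set[str]]:
    """Inference time: for each input token, collect the placetypes of every
    gazetteer name that covers it. Only the prebuilt structure is walked;
    no gazetteer query is issued."""
    lowered = [t.lower() for t in tokens]
    priors: List[Set[str]] = [set() for _ in tokens]
    for start in range(len(lowered)):
        node = root
        for end in range(start, len(lowered)):
            node = node.get(lowered[end])
            if node is None:
                break  # no gazetteer name continues with this token
            if END in node:
                # Every token in tokens[start..end] is covered by this name.
                for i in range(start, end + 1):
                    priors[i] |= node[END]
    return priors
```

The classifier would then receive these sets as extra per-token features alongside its own grammar-driven signal.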

Gazetteer-first geocoding

The most radical simplification: don't parse at all. Treat the input as an information retrieval problem. Tokenize the string, try every token and n-gram against a gazetteer index, return the best placetype match. The parser is a search engine. The resolver is the search result.
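
The retrieval loop described above fits in a few lines. This sketch assumes an in-memory `index` mapping lowercased names to placetypes, n-grams of up to three tokens, and a made-up tie-break: longer n-grams win, then more specific placetypes. `SPECIFICITY` and `best_match` are assumptions, not Mailwoman's ranking.

```python
import re
from typing import Dict, Iterator, List, Optional, Tuple

# Assumed specificity ranking used to break ties between equally long matches.
SPECIFICITY = {"country": 1, "region": 2, "county": 3, "locality": 4, "neighbourhood": 5}


def ngrams(tokens: List[str], max_n: int) -> Iterator[str]:
    """Yield every n-gram, longest first, down to single tokens."""
    for n in range(min(max_n, len(tokens)), 0, -1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i : i + n])


def best_match(
    text: str, index: Dict[str, List[str]], max_n: int = 3
) -> Optional[Tuple[str, str]]:
    """Try every n-gram against the gazetteer index and return the best
    (n-gram, placetype) hit: longest n-gram first, then most specific placetype."""
    tokens = re.findall(r"\w+", text.lower())
    hits = [(gram, pt) for gram in ngrams(tokens, max_n) for pt in index.get(gram, [])]
    if not hits:
        return None
    return max(hits, key=lambda hit: (len(hit[0].split()), SPECIFICITY.get(hit[1], 0)))
```

On "NY-NY Steakhouse, Houston TX" this picks `("houston", "locality")`, but `ny` still produced region hits along the way; a pure lookup has no way to know that reading is wrong.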

How it works now

Mailwoman v2 (current as of May 2026) runs addresses through a six-stage pipeline. Each stage adds one kind of knowledge the stages below it cannot easily derive. Rule classifiers and a neural classifier coexist. A policy registry decides whose vote wins for each address component.
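
A policy registry of this kind can be sketched as a per-component trust order over classifier sources. Everything here is hypothetical: the `Vote` shape, the source names `"rules"` and `"neural"`, and the example orders in `POLICY`. The sketch covers only the arbitration step, not the six stages around it.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Vote:
    source: str        # which classifier voted, e.g. "rules" or "neural"
    label: str         # address component, e.g. "postcode" or "locality"
    span: str          # text the classifier assigned to that component
    confidence: float


# Hypothetical registry: per component, the order in which sources are trusted.
POLICY: Dict[str, List[str]] = {
    "postcode": ["rules", "neural"],
    "locality": ["neural", "rules"],
}
DEFAULT_ORDER = ["neural", "rules"]


def arbitrate(votes: List[Vote]) -> Dict[str, Vote]:
    """Return one winning vote per component, following the registry."""
    by_label: Dict[str, List[Vote]] = {}
    for vote in votes:
        by_label.setdefault(vote.label, []).append(vote)

    winners: Dict[str, Vote] = {}
    for label, candidates in by_label.items():
        for source in POLICY.get(label, DEFAULT_ORDER):
            preferred = [v for v in candidates if v.source == source]
            if preferred:
                # Highest-confidence vote from the most trusted source that voted.
                winners[label] = max(preferred, key=lambda v: v.confidence)
                break
        else:
            # No listed source voted: fall back to the most confident vote.
            winners[label] = max(candidates, key=lambda v: v.confidence)
    return winners
```

In this sketch, keeping the policy in a table rather than inside classifier code means changing whose vote wins for a component is a data edit, not a retrain.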

Resolver and Who's On First

Parsing answers "What kind of thing is each part of this string?" Resolving answers "Where is the resulting place?" They are different jobs, and Mailwoman keeps them apart on purpose.
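
The separation can be made concrete as two function types with only labelled components passing between them. `Component`, `geocode`, the toy parser and resolver, and the place ID `"place:paris-tx"` are illustrative assumptions, not Mailwoman's API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Component:
    label: str  # parsing's answer: what kind of thing this part is
    text: str


Parser = Callable[[str], List[Component]]
Resolver = Callable[[List[Component]], Optional[str]]


def geocode(text: str, parse: Parser, resolve: Resolver) -> Optional[str]:
    """Compose the two jobs. The resolver sees only labelled components,
    never the raw string, so each side can be replaced or tested alone."""
    return resolve(parse(text))


def toy_parse(text: str) -> List[Component]:
    # Toy parser: assumes input shaped exactly like "<locality>, <region>".
    locality, region = (part.strip() for part in text.split(",", 1))
    return [Component("locality", locality), Component("region", region)]


# Toy resolver backed by a hypothetical lookup table of place IDs.
PLACES: Dict[Tuple[str, str], str] = {("Paris", "Texas"): "place:paris-tx"}


def toy_resolve(components: List[Component]) -> Optional[str]:
    by_label = {c.label: c.text for c in components}
    return PLACES.get((by_label.get("locality"), by_label.get("region")))
```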

The knowledge ladder

The staged pipeline is a contract for decomposition by what each layer knows. Every stage is the rightful home of a particular kind of information; pushing work to the wrong stage produces fragile systems that try to learn things from data that they could have looked up, or look up things that they could have learned. This article catalogues the layers, what each one knows, and the two layers we don't ship yet but should.

What is a concordance?

In Mailwoman's architecture, a concordance is the resolver's answer to the question: "Do these parsed components form a coherent place in the real world?" It is the mechanism that prevents the parser from emitting a parse that is structurally valid but geographically impossible — like "Paris, Texas" labelled as "Paris, Île-de-France" or "NY-NY Steakhouse, Houston TX" with "NY" tagged as a region.
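
One minimal way to express that check, assuming each gazetteer record carries a single parent link: look up every locality and region matching the parsed names, and accept only the pairs where the region appears in the locality's ancestor chain. `Record`, `ancestors`, `ids_named` and `concordant_pairs` are hypothetical; real WOF records can carry more than one hierarchy, which this sketch ignores.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Record:
    # Hypothetical, simplified gazetteer record with a single parent link.
    id: int
    name: str
    placetype: str
    parent_id: Optional[int]


def ancestors(rec_id: int, gaz: Dict[int, Record]) -> List[int]:
    """Follow parent links up to the root; stops at unknown parents."""
    chain: List[int] = []
    parent = gaz[rec_id].parent_id
    while parent is not None and parent in gaz:
        chain.append(parent)
        parent = gaz[parent].parent_id
    return chain


def ids_named(gaz: Dict[int, Record], placetype: str, name: str) -> List[int]:
    """All record IDs of the given placetype whose name matches (case-insensitive)."""
    return [r.id for r in gaz.values()
            if r.placetype == placetype and r.name.casefold() == name.casefold()]


def concordant_pairs(locality: str, region: str,
                     gaz: Dict[int, Record]) -> List[Tuple[int, int]]:
    """Return every (locality_id, region_id) pair in which the named locality
    really lies inside the named region. An empty list means the parse is
    structurally valid but geographically impossible."""
    return [(loc, reg)
            for loc in ids_named(gaz, "locality", locality)
            for reg in ids_named(gaz, "region", region)
            if reg in ancestors(loc, gaz)]
```

With both Parises in the gazetteer, "Paris" + "Texas" yields only the Texan locality, and "Houston" + "New York" yields nothing, which is the signal to reject or re-rank that parse.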

Who's On First — data model and gotchas

Who's On First (WOF) is the best open gazetteer available. It's also one of the strangest datasets you'll encounter as a developer. This article documents the gotchas — the structural quirks that trip up new consumers — and the tooling Mailwoman built to work around them.