3 docs tagged with "gazetteer"

FST gazetteer prior

The FST (finite-state transducer) gazetteer prior is the structure that lets the neural classifier benefit from everything Who's On First already knows. The neural model knows grammar; the gazetteer knows places. The FST is the bridge between them: it is precomputed at build time, so the classifier can consult it at inference time without paying a gazetteer lookup for every token.
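
To make the build-time versus inference-time split concrete, here is a minimal sketch. It uses a plain token trie as a simplified stand-in for a real FST, and every name in it (`build_prior`, `tag_tokens`, the sample entries and placetypes) is hypothetical rather than taken from Mailwoman or Who's On First. The structure is built once from gazetteer names, and at inference time each token is tagged by walking it, with the longest matching name winning.

```python
from typing import Dict, List, Optional


class TrieNode:
    """One state in the lookup structure (a trie standing in for an FST)."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.label: Optional[str] = None  # placetype if a place name ends here


def build_prior(entries: Dict[str, str]) -> TrieNode:
    """Build step: compile place names into a token trie, done once."""
    root = TrieNode()
    for name, placetype in entries.items():
        node = root
        for token in name.lower().split():
            node = node.children.setdefault(token, TrieNode())
        node.label = placetype
    return root


def tag_tokens(root: TrieNode, tokens: List[str]) -> List[Optional[str]]:
    """Inference step: tag each token with the placetype of the longest match."""
    tags: List[Optional[str]] = [None] * len(tokens)
    i = 0
    while i < len(tokens):
        node = root
        best_end, best_label = -1, None
        j = i
        # Walk as far as the trie allows, remembering the last complete name.
        while j < len(tokens) and tokens[j].lower() in node.children:
            node = node.children[tokens[j].lower()]
            j += 1
            if node.label is not None:
                best_end, best_label = j, node.label
        if best_label is not None:
            for k in range(i, best_end):
                tags[k] = best_label
            i = best_end
        else:
            i += 1
    return tags


# Hypothetical sample data, not real WOF records.
PRIOR = build_prior({
    "New York": "locality",
    "New York City": "locality",
    "Brooklyn": "borough",
})

tokens = "350 5th Ave New York NY".split()
print(tag_tokens(PRIOR, tokens))
# [None, None, None, 'locality', 'locality', None]
```

In a real pipeline the per-token tags would be fed to the classifier as extra features. A production FST would also minimize shared suffixes and carry output weights, which this sketch leaves out.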

The WOF hierarchy gap

Who's On First is a place gazetteer. Mailwoman is an address parser. The two are misaligned at one specific point in the hierarchy, and that misalignment shapes how the neural model fails on street-level inputs.

Who's On First — data model and gotchas

Who's On First (WOF) is the best open gazetteer available. It's also one of the strangest datasets you'll encounter as a developer. This article documents the gotchas (the structural quirks that trip up new consumers) and the tooling Mailwoman built to work around them.