29 docs tagged with "architecture"

A walkthrough — NY-NY Steakhouse, Houston, TX

The knowledge ladder explains why the v0.5.0 pipeline grew two new information layers (Stage 2.7 phrase grouper, expanded Stage 5 reconcile). This article walks through what they actually do on one concrete input, end-to-end.

Attention and bidirectional context

The neural classifier's encoder is a 6-layer transformer with bidirectional

Containment and tree projection

After the encoder produces emissions, the FST priors add their biases, and

Corpus poisoning vulnerability

The empirical-learner nature of the neural parser

CRF decoder

A Conditional Random Field (CRF) is a structured-prediction layer that sits on top of a per-token classifier and finds the best whole-sequence assignment of labels, subject to learnable transition rules between consecutive labels.

From Pelias to Mailwoman

A short lineage for readers who already know Pelias.

FST gazetteer prior

The FST (finite-state transducer) gazetteer prior is the structure that lets the neural classifier benefit from everything Who's On First already knows. The neural model knows grammar; the gazetteer knows places. The FST is the bridge — pre-computed at build time so the classifier can consult it at inference time without paying gazetteer-lookup costs per token.

FST priors as shallow fusion

The neural classifier doesn't know what Boston is. It knows that the

How it used to work

Mailwoman v1 (the pre-2026 version, still living on as the rule classifiers inside v2) parses an address in four steps. This article walks through each one with a concrete example.

How it will work

This article describes where Mailwoman is heading. The work is tracked in GitHub issues and the plan/ directory. Status is current as of May 2026.

How it works now

Mailwoman v2 (current as of May 2026) runs addresses through a six-stage pipeline. Each stage adds one kind of knowledge the stages below it cannot easily derive. Rule classifiers and a neural classifier coexist. A policy registry decides whose vote wins for each address component.

How the model reasons

What kind of computation does mailwoman's neural parser do? When it looks at

Neural classification

The neural classifier is a small transformer encoder that reads an address as a sequence of tokens and emits one BIO label for each token. This article explains what a transformer encoder is, what makes Mailwoman's specific model the shape it is, and what the model is and is not capable of.

ONNX runtime

ONNX (Open Neural Network Exchange) is a standardized format for serializing trained neural networks. ONNX Runtime is the family of inference engines that consume those files. Mailwoman uses ONNX so the same trained model can run in Node.js, browsers, mobile devices, or anywhere else with an ONNX Runtime build.

Reconcile — why empty parses dominate without an inclusion bonus

A short article on the one non-obvious knob inside Stage 5's joint decoder. If you skip it the reconciler converges on the empty parse for every input.

Resolver and Who's On First

Parsing answers "what kind of thing is each part of this string?". Resolving answers "where is the resulting place?". They are different jobs and Mailwoman keeps them apart on purpose.

Rule-based classifiers

A rule classifier is a small piece of hand-written code that labels tokens. Pelias and Mailwoman v1 are built almost entirely on rule classifiers. Mailwoman v2 keeps them and adds the neural classifier alongside — see How it works now for the hybrid.

Street-supplement architecture

This is the design reference for the work that fills the WOF hierarchy gap. It synthesizes the architectural decisions reached during the v0.6.1 postmortem and the subsequent design consult. Code in neural/, resolver-wof-sqlite/, and core/ refers back here; this article tells you why each piece looks the way it does.

Synonymy and homonymy — canonicalize, disambiguate, never drop a span

An address parser has to hold two facts at once that feel contradictory until you

The knowledge ladder

The staged pipeline is a contract for decomposition by what each layer knows. Every stage is the rightful home of a particular kind of information; pushing work to the wrong stage produces fragile systems that try to learn things from data that they could have looked up, or look up things that they could have learned. This article catalogues the layers, what each one knows, and the two layers we don't ship yet but should.

The pipeline contract

You don't have to take Mailwoman's pipeline as-is. The runtime coordinator (createRuntimePipeline) accepts each of the six stages as an injectable function or interface; an integrator can swap any of them for a custom implementation without forking the core.

The staged pipeline

Reading Addresses that break geocoders makes one thing obvious: no single model handles every failure class well. Different failures want different fixes. Some want preprocessing rules, some want a small classifier, some want a transformer, some want a resolver that returns candidates instead of pretending to be sure.

The tokenization tautology

Traditional address parsers split the input into tokens, classify each token independently, then try to reassemble the pieces into a coherent parse. This sequence contains a structural circularity: you cannot group tokens correctly without knowing their types, and you cannot type them correctly without knowing their groups. The traditional architecture resolves this with heuristics, exceptions, and solver post-processing. The exception pile grows without bound.

The WOF hierarchy gap

Who's On First is a place gazetteer. Mailwoman is an address parser. The two are misaligned at one specific point in the hierarchy — and that misalignment shapes how the neural model fails on street-level inputs.

Tokenization

Tokenization is the step that turns an input string into a list of small pieces called tokens. Every downstream component — rule classifiers, the neural classifier, the solver, the resolver — works with tokens, not with raw strings.

Viterbi and BIO validity

After the encoder produces per-token emission logits and the FST priors

What is a concordance?

In Mailwoman's architecture, a concordance is the resolver's answer to the question: "Do these parsed components form a coherent place in the real world?" It is the mechanism that prevents the parser from emitting a parse that is structurally valid but geographically impossible — like "Paris, Texas" labelled as "Paris, Île-de-France" or "NY-NY Steakhouse, Houston TX" with "NY" tagged as a region.

What Mailwoman is — a name for the thing we're building

If you've tried to describe Mailwoman to someone and reached for "it's like

Why a neural parser?

The first five articles in this track established the problem: