18 docs tagged with "neural"

A walkthrough — NY-NY Steakhouse, Houston, TX

The knowledge ladder explains why the v0.5.0 pipeline grew two new information layers (Stage 2.7 phrase grouper, expanded Stage 5 reconcile). This article walks through what they actually do on one concrete input, end-to-end.

Attention and bidirectional context

The neural classifier's encoder is a 6-layer transformer with bidirectional

BIO labels

BIO labelling is the trick that lets a token classifier (which decides one token at a time) emit spans (groups of consecutive tokens that mean one thing together). It is the standard approach for sequence labelling tasks in NLP — Named Entity Recognition, part-of-speech tagging, address parsing.
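
To make the span idea concrete, here is a minimal sketch of turning per-token BIO labels back into spans. The function name `bio_to_spans` and the label names (`house_number`, `street`, `locality`, `region`) are illustrative assumptions, not Mailwoman's actual API or label set.

```python
from typing import List, Tuple


def bio_to_spans(tokens: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    """Group tokens into (type, text) spans from per-token BIO labels."""
    spans: List[Tuple[str, str]] = []
    current_type = None
    current_tokens: List[str] = []

    def close_span() -> None:
        # Emit the span that is currently open, if any.
        if current_type is not None:
            spans.append((current_type, " ".join(current_tokens)))

    for token, label in zip(tokens, labels):
        if label.startswith("B-"):
            # "B-" always begins a new span, closing any open one.
            close_span()
            current_type = label[2:]
            current_tokens = [token]
        elif label.startswith("I-") and current_type == label[2:]:
            # "I-" continues the open span of the same type.
            current_tokens.append(token)
        else:
            # "O", or an "I-" tag with no matching open span: this sketch
            # closes the open span and drops the token.
            close_span()
            current_type = None
            current_tokens = []

    close_span()
    return spans


# Hypothetical label names, for illustration only.
example_tokens = ["123", "Main", "Street", "Houston", "TX"]
example_labels = ["B-house_number", "B-street", "I-street", "B-locality", "B-region"]
example_spans = bio_to_spans(example_tokens, example_labels)
```

With these inputs, "Main" and "Street" are merged into one `street` span because the second token carries an `I-` tag of the same type.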

Confidence calibration

Every span the parser emits carries a number. When the model says it's 94% sure, is it actually right 94% of the time?
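
One common way to put a number on that question is expected calibration error (ECE): bucket predictions by stated confidence, then compare each bucket's average confidence with its observed accuracy. The sketch below is a generic illustration; whether Mailwoman measures calibration this way is an assumption, and the function name and bin count are made up for the example.

```python
from typing import List


def expected_calibration_error(
    confidences: List[float], correct: List[bool], n_bins: int = 10
) -> float:
    """Weighted average gap between stated confidence and observed accuracy."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    if not confidences:
        return 0.0

    # Equal-width bins over [0, 1]; a confidence of exactly 1.0 goes in the last bin.
    bins: List[List[int]] = [[] for _ in range(n_bins)]
    for i, conf in enumerate(confidences):
        bins[min(int(conf * n_bins), n_bins - 1)].append(i)

    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(confidences[i] for i in members) / len(members)
        accuracy = sum(1 for i in members if correct[i]) / len(members)
        ece += (len(members) / total) * abs(avg_conf - accuracy)
    return ece


# Ten spans at 0.9 confidence, nine of them right: well calibrated.
well_calibrated = expected_calibration_error([0.9] * 10, [True] * 9 + [False])
# Ten spans at 0.9 confidence, only five right: overconfident by about 0.4.
overconfident = expected_calibration_error([0.9] * 10, [True] * 5 + [False] * 5)
```

A value near zero means the stated confidences can be read as probabilities; a large value means they cannot.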

CRF decoder

A Conditional Random Field (CRF) is a structured-prediction layer that sits on top of a per-token classifier and finds the best whole-sequence assignment of labels, subject to learnable transition rules between consecutive labels.
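
The "best whole-sequence assignment" is typically found with Viterbi decoding. The sketch below assumes per-token emission scores and a label-to-label transition matrix are already available. In a trained CRF the transition scores are learned; here they are set by hand, start and end transitions are omitted, and the three labels are hypothetical.

```python
from typing import List


def viterbi(emissions: List[List[float]], transitions: List[List[float]]) -> List[int]:
    """Return the highest-scoring label index sequence.

    emissions[t][j]   : score of label j at token t
    transitions[i][j] : score of moving from label i to label j
    """
    if not emissions:
        return []
    n_labels = len(transitions)
    score = list(emissions[0])
    backpointers: List[List[int]] = []

    for t in range(1, len(emissions)):
        new_score: List[float] = []
        pointers: List[int] = []
        for j in range(n_labels):
            # Best previous label for arriving at label j.
            best_i = max(range(n_labels), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)

    # Walk the backpointers from the best final label to the start.
    best = max(range(n_labels), key=lambda i: score[i])
    path = [best]
    for pointers in reversed(backpointers):
        best = pointers[best]
        path.append(best)
    path.reverse()
    return path


LABELS = ["O", "B-street", "I-street"]  # hypothetical label set
transitions = [[0.0] * 3 for _ in range(3)]
transitions[0][2] = -1e4  # "O" followed by "I-street" is effectively forbidden

# Token 0 slightly prefers "O"; token 1 prefers "I-street".
emissions = [[2.0, 1.5, 0.0], [0.0, 1.0, 2.0]]
greedy = [max(range(3), key=lambda j: row[j]) for row in emissions]
decoded = viterbi(emissions, transitions)
```

Picking the best label per token independently yields `O, I-street`, which is not a valid BIO sequence. With the transition penalty in place, the decoder prefers `B-street, I-street` because it scores the sequence as a whole.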

From Pelias to Mailwoman

A short lineage for readers who already know Pelias.

FST gazetteer prior

The FST (finite-state transducer) gazetteer prior is the structure that lets the neural classifier benefit from everything Who's On First already knows. The neural model knows grammar; the gazetteer knows places. The FST is the bridge — pre-computed at build time so the classifier can consult it at inference time without paying gazetteer-lookup costs per token.
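
The build-time versus inference-time split can be illustrated with a much simpler structure. The sketch below swaps the FST for a plain token trie, which is less compact than a real transducer but shows the same idea: compile the gazetteer once, then walk it cheaply per input. The entries, place types, and function names are all hypothetical, and how the matches are fed to the classifier is not shown here.

```python
from typing import Dict, List, Tuple

END = "$type"  # reserved key marking that a complete place name ends at this node


def build_trie(entries: Dict[str, str]) -> dict:
    """Build-time step: compile place names into a token trie."""
    root: dict = {}
    for name, placetype in entries.items():
        node = root
        for token in name.lower().split():
            node = node.setdefault(token, {})
        node[END] = placetype
    return root


def longest_matches(tokens: List[str], trie: dict) -> List[Tuple[int, int, str]]:
    """Inference-time step: return (start, end, placetype) for the longest matches."""
    matches: List[Tuple[int, int, str]] = []
    i = 0
    while i < len(tokens):
        node = trie
        best = None
        for j in range(i, len(tokens)):
            node = node.get(tokens[j].lower())
            if node is None:
                break
            if END in node:
                best = (i, j + 1, node[END])  # keep extending for a longer match
        if best is not None:
            matches.append(best)
            i = best[1]
        else:
            i += 1
    return matches


# Hypothetical gazetteer entries, for illustration only.
trie = build_trie({
    "new york": "locality",
    "new york city": "locality",
    "houston": "locality",
    "texas": "region",
})
hits = longest_matches("NY-NY Steakhouse Houston Texas".split(), trie)
```

Each match is a half-open token range, so `(2, 3, "locality")` covers the single token "Houston".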

FST priors as shallow fusion

The neural classifier doesn't know what Boston is. It knows that the

How it will work

This article describes where Mailwoman is heading. The work is tracked in GitHub issues and the plan/ directory. Status is current as of May 2026.

How it works now

Mailwoman v2 (current as of May 2026) runs addresses through a six-stage pipeline. Each stage adds one kind of knowledge the stages below it cannot easily derive. Rule classifiers and a neural classifier coexist. A policy registry decides whose vote wins for each address component.
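
As a rough illustration of a per-component policy, the sketch below resolves competing votes using a fixed source precedence for each component. The `Vote` shape, the precedence table, and the component names are assumptions made for this example; the excerpt above does not say how Mailwoman's actual registry is structured.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Vote:
    source: str       # "rule" or "neural"
    component: str    # e.g. "postcode", "street"
    value: str
    confidence: float


# Hypothetical policy: for each component, which source is trusted first.
POLICY: Dict[str, List[str]] = {
    "postcode": ["rule", "neural"],
    "street": ["neural", "rule"],
}
DEFAULT_ORDER = ["neural", "rule"]


def resolve(votes: List[Vote]) -> Dict[str, Vote]:
    """Pick one winning vote per component according to POLICY."""
    by_component: Dict[str, List[Vote]] = {}
    for vote in votes:
        by_component.setdefault(vote.component, []).append(vote)

    winners: Dict[str, Vote] = {}
    for component, candidates in by_component.items():
        for source in POLICY.get(component, DEFAULT_ORDER):
            from_source = [v for v in candidates if v.source == source]
            if from_source:
                # Within the preferred source, take the most confident vote.
                winners[component] = max(from_source, key=lambda v: v.confidence)
                break
    return winners


votes = [
    Vote("rule", "postcode", "77002", 0.80),
    Vote("neural", "postcode", "77003", 0.95),
    Vote("rule", "street", "Main", 0.60),
    Vote("neural", "street", "Main St", 0.90),
]
winners = resolve(votes)
```

In this sketch the rule classifier wins the postcode even though the neural vote is more confident, because the policy for that component ranks the source first and confidence second.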

How the model reasons

What kind of computation does Mailwoman's neural parser do? When it looks at

Neural classification

The neural classifier is a small transformer encoder that reads an address as a sequence of tokens and emits one BIO label for each token. This article explains what a transformer encoder is, what makes Mailwoman's specific model the shape it is, and what the model is and is not capable of.

ONNX runtime

ONNX (Open Neural Network Exchange) is a standardized format for serializing trained neural networks. ONNX Runtime is the family of inference engines that consume those files. Mailwoman uses ONNX so the same trained model can run in Node.js, browsers, mobile devices, or anywhere else with an ONNX Runtime build.

The knowledge ladder

The staged pipeline is a contract for decomposition by what each layer knows. Every stage is the rightful home of a particular kind of information; pushing work to the wrong stage produces fragile systems that try to learn things from data that they could have looked up, or look up things that they could have learned. This article catalogues the layers, what each one knows, and the two layers we don't ship yet but should.

The tokenization tautology

Traditional address parsers split the input into tokens, classify each token independently, then try to reassemble the pieces into a coherent parse. This sequence contains a structural circularity: you cannot group tokens correctly without knowing their types, and you cannot type them correctly without knowing their groups. The traditional architecture resolves this with heuristics, exceptions, and solver post-processing. The exception pile grows without bound.