Concept deep dives
This track has one article per concept. The articles are independent, so you can read them in any order. If you are new to Mailwoman, read understanding/ first.
Suggested reading order
If you want to follow the data and code from input to output, this order works well:
- Tokenization — splitting strings into tokens
- Rule-based classifiers — the Mailwoman v1 approach
- BIO labels — how the neural model marks spans
- Neural classification — what a transformer encoder does
- CRF decoder — fixing structurally invalid label sequences
- Training pipeline — from corpus to model file
- Corpus construction — how the training data is built
- ONNX runtime — running the model in production
- Resolver and Who's On First — turning labels into coordinates
Additional articles:
- Joint decoding walkthrough — how the reconciler picks coherent interpretations
- Reconcile empty-parse bonus — handling the "model gives up" failure mode
- Dual-role places — completing the dropped locality of a city-state or capital-seat province
- Synthetic corpus validation — how we validate LLM-generated training data
- DeepSeek reasoning budget — the adversarial corpus generation pipeline
- Staged pipeline contract — implementer-facing TypeScript contracts
Articles moved to understanding/ in May 2026:
- What is an address? — now at Tier 1 position 8
- Addresses that break geocoders — now at Tier 1 position 9
- The knowledge ladder — now at Tier 1 position 14
- The staged pipeline — now at Tier 1 position 15
Dependency map
Some concepts assume others:
You can short-circuit: if you know what a transformer is, skip directly to CRF decoder. If you want a tour of the training side, jump to Training pipeline.
Article shape
Each article takes about 5–10 minutes to read and includes:
- A short motivation: why this concept matters in Mailwoman.
- An explanation that defines its terms on first use.
- A diagram where structure helps (built with Mermaid).
- A short code or data example.
- Pointers into the source code for readers who want to go further.
- A "See also" list.