Concept deep dives

This track has one article per concept. The articles are independent — read them in any order. If you are new to Mailwoman, read understanding/ first.

Suggested reading order

A useful order if you want to follow the data and code from input to output:

  1. Tokenization — splitting strings into tokens
  2. Rule-based classifiers — the Mailwoman v1 approach
  3. BIO labels — how the neural model marks spans
  4. Neural classification — what a transformer encoder does
  5. CRF decoder — fixing structurally-invalid label sequences
  6. Training pipeline — from corpus to model file
  7. Corpus construction — how the training data is built
  8. ONNX runtime — running the model in production
  9. Resolver and Who's On First — turning labels into coordinates

Additional articles:

Articles moved to understanding/ in May 2026:

Dependency map

Some concepts assume others:

You can short-circuit: if you know what a transformer is, skip directly to CRF decoder. If you want a tour of the training side, jump to Training pipeline.

Article shape

Each article takes about 5–10 minutes to read and includes:

  • A short motivation: why this concept matters in Mailwoman.
  • An explanation that defines its terms on first use.
  • A diagram where structure helps (built with Mermaid).
  • A short code-or-data example.
  • Pointers into the source code for readers who want to go further.
  • A "See also" list.