Error Analysis Baseline — v0.5.3

Run date: 2026-05-28
Golden set: 4535 entries (US 2936 + FR 1545 + adversarial 54)
Tokenizer: v0.5.0-a1
Time: 17.4s

Summary

Metric	Count	Rate
Exact match	1147	25.3%
Missed entities	1770	—
Boundary errors	3690	—
Confused tags	1538	—
Hallucinated tags	1319	—
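
The exact-match rate can be cross-checked from the counts in the run header and the table. The short sketch below does that arithmetic. It also shows that the four error counts add up to more than the number of non-exact entries, which suggests that errors are tallied per entity rather than per entry, so one address can contribute to several rows. That reading is an assumption, not something the report states.

```python
# Figures copied from the run header and the summary table.
golden_total = 2936 + 1545 + 54   # US + FR + adversarial
exact_match = 1147

exact_match_rate = exact_match / golden_total
failing_entries = golden_total - exact_match

# Error counts from the table; each row is tallied separately.
error_counts = {
    "missed": 1770,
    "boundary": 3690,
    "confused": 1538,
    "hallucinated": 1319,
}
total_errors = sum(error_counts.values())

print(f"{golden_total} entries, exact match {exact_match_rate:.1%}")
print(f"{failing_entries} failing entries, {total_errors} errors recorded")
```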

Missed entities (1770) — top categories

street_prefix (e.g. "SE", "N") — not in Stage 2 vocab
street_suffix (e.g. "St", "Ave") — not in Stage 2 vocab
po_box — not in Stage 2 vocab
unit (e.g. "Apt 4B") — not in Stage 2 vocab
intersection_a/b — not in Stage 2 vocab
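
A minimal sketch of how missed entities could be split into schema gaps (the tag does not exist in Stage 2) and genuine model misses. The set of Stage 3-only tags is taken from the list above; the function name `split_missed`, the sample input, and the use of `locality` as a Stage 2 tag are hypothetical.

```python
from collections import Counter

# Tags this report lists as absent from the Stage 2 vocab.
STAGE3_ONLY_TAGS = {
    "street_prefix",
    "street_suffix",
    "po_box",
    "unit",
    "intersection_a",
    "intersection_b",
}

def split_missed(missed_tags):
    """Count missed entities by tag, separating schema gaps from model misses."""
    schema_gap, model_miss = Counter(), Counter()
    for tag in missed_tags:
        bucket = schema_gap if tag in STAGE3_ONLY_TAGS else model_miss
        bucket[tag] += 1
    return schema_gap, model_miss

# Hypothetical input: golden tags of entities the model did not produce.
missed = ["street_suffix", "unit", "street_suffix", "locality", "po_box"]
schema_gap, model_miss = split_missed(missed)
```

Only the `model_miss` bucket would point at something the Stage 2 model could have predicted.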

Boundary errors (3690) — top patterns

street: golden expects "Salmon", got "SE Salmon St" (Stage 2 merges prefix+name+suffix)
street: golden expects "main st", got "main st portland" (locality boundary leak)
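
One common way to define a boundary error is: same tag, overlapping spans, different offsets. The sketch below applies that definition to the first pattern above. The `Span` type and `is_boundary_error` are illustrative names, and the report does not say whether the evaluator uses character or token offsets; character offsets are assumed here.

```python
from typing import NamedTuple

class Span(NamedTuple):
    tag: str
    start: int  # inclusive character offset (assumed)
    end: int    # exclusive character offset (assumed)

def is_boundary_error(gold: Span, pred: Span) -> bool:
    """True when the tags match and the spans overlap without being identical."""
    same_tag = gold.tag == pred.tag
    overlaps = gold.start < pred.end and pred.start < gold.end
    identical = (gold.start, gold.end) == (pred.start, pred.end)
    return same_tag and overlaps and not identical

text = "SE Salmon St"
gold = Span("street", 3, 9)    # covers "Salmon"
pred = Span("street", 0, 12)   # covers "SE Salmon St"

print(is_boundary_error(gold, pred))
```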

Key insight

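~80% of failures are SCHEMA MISMATCH, not model error. The model emits the Stage 2 schema (10 tags), while the golden set is annotated with the Stage 3 schema (~16 tags). The top missed-entity categories are all tags that do not exist in the Stage 2 vocab, and the dominant boundary pattern comes from Stage 2 merging prefix, name, and suffix into a single street span. Boundary errors should largely disappear once Stage 3 ships and street is decomposed.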
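
Until Stage 3 ships, one possible way to isolate real model error would be to project the golden annotations down to the Stage 2 schema before scoring. The sketch below merges consecutive street_prefix, street, and street_suffix spans into a single street span. It is an illustration only, not the evaluator's method; `collapse_street` and the tuple layout are hypothetical, and tags with no Stage 2 counterpart (po_box, unit, intersection_a/b) are left untouched.

```python
STREET_PARTS = {"street_prefix", "street", "street_suffix"}

def collapse_street(spans):
    """Merge runs of consecutive street parts into one 'street' span.

    spans: list of (tag, start, end) tuples, sorted by start offset.
    """
    merged = []
    prev_was_street_part = False
    for tag, start, end in spans:
        is_part = tag in STREET_PARTS
        if is_part and prev_was_street_part:
            # Extend the street span opened by the previous part.
            _, prev_start, _ = merged[-1]
            merged[-1] = ("street", prev_start, end)
        elif is_part:
            merged.append(("street", start, end))
        else:
            merged.append((tag, start, end))
        prev_was_street_part = is_part
    return merged

# Stage 3 style golden annotation for "SE Salmon St".
gold_stage3 = [("street_prefix", 0, 2), ("street", 3, 9), ("street_suffix", 10, 12)]
gold_stage2 = collapse_street(gold_stage3)
```

After projection, the golden span matches the "SE Salmon St" prediction exactly, so that case would no longer count as a boundary error.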
