Resolver ranking: exact-match tiering + parent fallback (2026-05-30)
Direction-C resolver-depth PR1. Two ranking fixes that attack the verified dominant failure mode of the end-to-end resolver โ wrong-admin cascades from a mis-resolved region โ measured on the 2406-row WOF-bootstrap eval.
The bug, root-causedโ
The end-to-end resolver eval ceiling was 68.9% Acc@1 with a p90 coordinate error of 1090 km. Decomposing the failures pinned the cause precisely:
- Of neural's 749 eval failures, 100% resolved to a place โ just the wrong one (0% were unresolved). So the ceiling is a resolver-ranking problem, not a parser-coverage problem.
- 77% of the wrong resolutions are >100 km off (median wrong-error 333 km, p90 2624 km) โ distances that mean wrong state, not a local near-miss.
Tracing the wrong-state cases exposed the mechanism. For Portland, ME the
resolver resolved the region "ME" to Missouri, then correctly scoped the
locality to its (wrong) parent โ Portland, MO. The findPlace("ME", region, US)
candidate list:
ME โ Missouri score 6.83 pop 6.2M โ won
Maine score 6.04 pop 1.4M โ exact alias `ME`, but lost
Maine โ Maine score 23.2 (full-name match scores ~17 higher)
Maine is the only US region with the alias ME (verified in the gazetteer's
names table; Missouri's alias is MO). It still lost โ because the ranking adds
population as a large additive boost (populationBoost, up to +4), and
Missouri's larger population overcame Maine's bm25 edge on the 2-letter query.
Fix 1 โ exact-match tiering (the aligned fix)โ
The principle (and why it aligns with population/importance rather than overriding it): population is a prominence prior. Its job is to break ties among candidates that match the query equally well โ e.g. "Springfield" โ Springfield IL over Springfield MA, both exact name matches, population correctly picks the bigger. It was never meant to promote a candidate that matches the query worse. The additive scheme accidentally let it do exactly that, because +4 of population exceeds the bm25 gap between an exact alias match and a non-match.
The fix makes ranking two-keyed: match quality is the primary key, prominence (population) the secondary key within a tier. A candidate whose name or any alias equals the query (case-folded) ranks above any partial match; the existing weighted sum โ including population โ orders candidates within a tier.
Springfield, ILstill works: both IL and MA Springfields are exact name matches โ same tier โ population decides โ IL (bigger). Unchanged.Portland, MEnow works: only Maine has the exact aliasMEโ higher tier โ population never gets to override it โ Maine.
So population/importance keeps doing exactly what it was tuned for (surfacing the
famous Springfield/New York/Chicago among equally-good name matches); it simply
no longer leaks across match-quality tiers. Implemented in
resolver-wof-sqlite/lookup.ts (RankingWeights.exactMatchTiering, default true,
one cheap indexed lookup over <schema>.names for the over-fetched candidate ids).
Known limitation: tiering re-ranks within the over-fetched window (limit*4).
An exact match that falls outside that window isn't rescued. For the region-abbrev
case the window is comfortably sufficient (a handful of states match a 2-letter
query); pushing the exact-match term into the SQL ORDER BY is a follow-up if a
wider case ever needs it.
Fix 2 โ parent fallbackโ
parentId is a hard descendant filter in the backend. If a parent resolves
wrong, or the gazetteer hierarchy is incomplete (a real locality whose ancestors
chain is missing its region), the filtered lookup returns nothing and the node
goes unresolved. Fix: when a parent-constrained lookup returns zero, retry once
without the parent constraint (keeping the country constraint, so resolution
still can't wander cross-border). Prefers a parent-scoped hit; never sacrifices
recall. core/resolver/resolve.ts (ResolveOpts.parentFallback, default true).
Results (2406-row WOF-bootstrap, v0.7.2 model)โ
One toggle apart (--exact-tiering/--parent-fallback off vs on), same model, same gazetteer:
| metric | baseline (off) | both fixes (on) | ฮ |
|---|---|---|---|
| neural-only Acc@1 โ all | 68.9% | 77.4% | +8.5pp |
| neural-only โ canonical | 77.1% | 88.0% | +10.9pp |
| neural-only โ perturbed | 64.8% | 72.1% | +7.3pp |
| v0-via-adapter โ all | 63.7% | 70.0% | +6.3pp |
| arbiter โ all | 72.0% | 80.0% | +8.0pp |
| oracle โ all | 77.9% | 86.4% | +8.5pp |
| coord error p90 (km) | 1090 | 211 | 5.2ร lower |
| neural resolved-but-wrong | 749 | 544 | โ205 (โ27%) |
The headline: +8.5pp end-to-end Acc@1 and a 5.2ร cut in p90 coordinate error (1090 km โ 211 km โ the cross-state tail collapses, exactly as the "ME โ Missouri" root cause predicted). Every baseline improves; the gains are largest on the canonical subset (+10.9pp) where clean two-letter region codes are most common. Failure attribution confirms the mechanism: neural's resolver-side errors drop from 749 to 544 (โ27%) with parser-side errors still at 0 โ the fix moved wrong-state resolutions into correct ones, not coverage.
The 211 km p90 that remains is the genuine admin-centroid tail (large rural counties / sprawling metros whose centroid is far from the labeled point) plus a residual same-state ambiguity โ the target for the learned re-ranker and the street-level tier, not for this PR.
Reproduceโ
# baseline (both fixes off) vs both on โ one toggle apart:
node --experimental-strip-types scripts/eval/resolver-eval.ts \
--eval /tmp/wof-bootstrap/eval.jsonl --country US \
--exact-tiering {true|false} --parent-fallback {true|false} \
--model <v0.7.2.onnx> --tokenizer <v0.6.0-a0> --model-card <card> \
--wof admin-global-priority.db,postalcode-us.db
Both fixes are A/B flags (default on) so the change is one toggle from the prior behaviour and can be reverted per-call.