
Session Report: 2026-05-28

Summary

Two major infrastructure shifts: (1) migrated from self-hosted docs to GitHub Pages + HF Bucket for model assets, and (2) expanded WOF data from US-only to 7-country global coverage with a multi-script tokenizer retrain.

WebGPU debugging arc

The bug

The demo's ONNX model produced correct results via WASM but garbage via WebGPU on Safari/iOS: all-locality output with 0.2-0.4 confidence. Playwright tests (headless, no GPU) always passed, masking the bug entirely.

Root cause

onnxruntime-web ships two WebGPU execution providers in the same package:

  • import "onnxruntime-web" → JSEP (old, broken slice kernel on Metal)
  • import "onnxruntime-web/webgpu" → Native EP (correct on all backends)

The default import uses JSEP. Chrome's Dawn/Vulkan masked the bug; Safari's Metal exposed it.

Fix

One import path change: onnxruntime-web → onnxruntime-web/webgpu. WebGPU now works on Chrome, Safari macOS, and iOS Safari, verified on real devices.

Takeaway

Headless CI cannot catch GPU-specific bugs. The verify skill's Playwright tests were passing while real users on Safari and iOS saw garbage. Added diagnostics (backend indicator, force-WASM toggle, build stamp) to make this class of issue visible.
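
To make the diagnostics concrete, here is a minimal sketch of how a force-WASM toggle could drive backend selection and the backend indicator. The helper names (pickExecutionProviders, backendLabel) and the BackendEnv shape are hypothetical and not taken from the demo's code; only the onnxruntime-web/webgpu import path and the executionProviders session option come from onnxruntime-web itself.

```typescript
// Sketch only: helper and type names are hypothetical, not the demo's code.
// In the browser, the session would be created from the WebGPU entry point:
//   import * as ort from "onnxruntime-web/webgpu";
//   const session = await ort.InferenceSession.create(modelUrl, {
//     executionProviders: pickExecutionProviders(env),
//   });

type Backend = "webgpu" | "wasm";

interface BackendEnv {
  hasWebGpu: boolean; // e.g. the result of `"gpu" in navigator`
  forceWasm: boolean; // state of the Force WASM toggle
}

// Returns the providers to request, in priority order.
// WASM is always listed last so there is a CPU fallback.
function pickExecutionProviders(env: BackendEnv): Backend[] {
  if (env.forceWasm || !env.hasWebGpu) return ["wasm"];
  return ["webgpu", "wasm"];
}

// Text for a backend indicator. It reports the backend that was requested
// first, which is an assumption: the runtime may still fall back.
function backendLabel(env: BackendEnv, modelBytes: number): string {
  const backend = pickExecutionProviders(env)[0];
  const megabytes = (modelBytes / (1024 * 1024)).toFixed(1);
  return `${backend} (${megabytes} MB)`;
}
```

Flipping the toggle on a device that shows bad output gives a quick A/B check: if the WASM result is correct and the WebGPU result is not, the problem is in the GPU path rather than in the model or tokenizer.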

Reference: Technical doc. Blog post: "Our model worked in CI but broke on every real device".

Infrastructure migration

GitHub Pages + HF Bucket

Replaced the playpen nginx + rsync deployment with:

  • GitHub Pages: Docusaurus builds and deploys via Actions on push to main
  • HF Bucket: model.onnx, tokenizer.model, fst-en-US.bin, wof-hot.db served from CDN
  • Version selector: demo page fetches releases.json manifest, lets users switch between v0.5.3, v0.5.2, v0.4.0, v0.1.0
  • Cache-busting: stale-while-revalidate on HTML, version-qualified asset URLs
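
The version selector and version-qualified URLs can be sketched as follows. This is an illustration under assumptions: the manifest is assumed to carry a defaultVersion field plus a hypothetical versions array, the en-us/<version>/<file> layout is inferred from the bucket paths listed in this report, and the base URL in the usage note is a placeholder.

```typescript
// Sketch only: `versions` is a hypothetical field; the real releases.json
// schema may differ.
interface ReleasesManifest {
  defaultVersion: string;
  versions: string[];
}

// Falls back to the manifest default when the requested version is unknown.
function resolveVersion(manifest: ReleasesManifest, requested: string | null): string {
  if (requested !== null && manifest.versions.indexOf(requested) !== -1) {
    return requested;
  }
  return manifest.defaultVersion;
}

// Builds a version-qualified asset URL. Because the version is part of the
// path, a new release never collides with a cached copy of an old one.
function assetUrl(baseUrl: string, version: string, file: string): string {
  const base = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  return `${base}/en-us/${encodeURIComponent(version)}/${encodeURIComponent(file)}`;
}
```

For example, assetUrl("https://cdn.example", "v0.5.3", "model.onnx") yields https://cdn.example/en-us/v0.5.3/model.onnx, so only the small manifest and the HTML need revalidation.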

Demo improvements

  • Build stamp footer (commit hash + timestamp)
  • Backend indicator (webgpu/wasm + model size)
  • Force WASM toggle
  • CodeBlock for XML output (syntax highlighting + copy)
  • Example buttons clear stale results
  • FST provenance metadata (expandable details)

Global WOF expansion

Data pipeline

Built a unified SQLite from 7 priority countries (US, FR, JP, CN, KR, DE, GB):

| Metric | Value |
| --- | --- |
| GeoJSON files | 1.74M |
| Admin places | 1,288,749 |
| Name variants | 10,233,886 |
| Languages | 20+ |
| Build time | 185 s (about 3 min) |
| Frozen DB | 1.09 GB |

The pipeline implements the WAL + Freeze design brief: WAL mode during ingest, then checkpoint, journal_mode=DELETE, index creation, ANALYZE, and VACUUM INTO to produce the frozen artifact.
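
The freeze sequence can be sketched as an ordered list of SQLite statements. This is a minimal illustration, not the pipeline's code: the SqlRunner interface, the function names, and the example index are hypothetical, and the exact checkpoint mode the pipeline uses is an assumption.

```typescript
// Sketch only: any object with an exec(sql) method can act as the runner.
interface SqlRunner {
  exec(sql: string): void;
}

// Statement to run before ingest starts.
const INGEST_PRAGMAS: string[] = ["PRAGMA journal_mode=WAL;"];

// Ordered statements that turn the ingest database into a frozen artifact.
function freezeStatements(frozenPath: string, indexSql: string[]): string[] {
  // SQL string literal: single quotes are escaped by doubling them.
  const quotedPath = `'${frozenPath.replace(/'/g, "''")}'`;
  return [
    "PRAGMA wal_checkpoint(TRUNCATE);", // fold the WAL back into the main file
    "PRAGMA journal_mode=DELETE;",      // leave WAL mode
    ...indexSql,                        // build indexes after the bulk load
    "ANALYZE;",                         // collect planner statistics
    `VACUUM INTO ${quotedPath};`,       // write a compacted copy
  ];
}

function freeze(db: SqlRunner, frozenPath: string, indexSql: string[]): void {
  for (const sql of freezeStatements(frozenPath, indexSql)) {
    db.exec(sql);
  }
}
```

Building indexes after ingest rather than before avoids maintaining them row by row during the bulk load. Note that VACUUM INTO expects the target file not to exist yet (or to be empty).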

Tokenizer retrain

v0.6.0-a0 tokenizer trained on 2.19M multi-script WOF names:

| Script | v0.5.0-a1 (old) | v0.6.0-a0 (new) |
| --- | --- | --- |
| Chinese | 50-75% byte-fallback | 0% |
| Japanese | 58-60% | 0% |
| Korean | 41% | 0% |
| Thai | 30% | 0% |
| Aggregate | 36.6% | 0.0% |

Issue #120 target was less than 5%. Hit 0%. Same 48K vocab, 28 seconds to train.
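
One way such a byte-fallback rate could be measured is sketched below. Two assumptions are labeled here and in the code: that the tokenizer emits SentencePiece-style byte pieces of the form <0xNN>, and that the rate is the share of byte pieces among all pieces. The report does not state the exact definition used for Issue #120.

```typescript
// Sketch only: assumes SentencePiece-style byte pieces such as "<0xE6>".
const BYTE_PIECE = /^<0x[0-9A-Fa-f]{2}>$/;

// `tokenized` holds one array of pieces per input name.
// Returns the fraction of pieces that are byte-fallback pieces (0 to 1).
function byteFallbackRate(tokenized: string[][]): number {
  let total = 0;
  let byteCount = 0;
  for (const pieces of tokenized) {
    for (const piece of pieces) {
      total += 1;
      if (BYTE_PIECE.test(piece)) byteCount += 1;
    }
  }
  return total === 0 ? 0 : byteCount / total;
}
```

A name that falls back to bytes costs one piece per UTF-8 byte, so a CJK character typically becomes three pieces; this is why a high fallback rate inflates sequence length as well as hurting accuracy.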

Global FST

Multi-language FST with Unicode-aware normalization:

| Metric | US-only (old) | Global (new) |
| --- | --- | --- |
| States | 60K | 2.08M |
| Places | 94K | 1.25M |
| Name insertions | 128K | 4.16M |
| Binary size | 9 MB | 302 MB |
| CJK queries | Impossible | Working |

"東京" → Tokyo, "北京" → Beijing, "서울" → Seoul, "大阪" → Osaka.

CJK normalization fixes

Three files needed Unicode-aware regexes:

  1. fst-matcher.ts: normalizeTokens(), [^a-z0-9\s] → [\p{P}\p{S}]
  2. fst-builder.ts: languages: ["*"] support for all-language indexing
  3. fst-prior.ts: hasAlnum check and normalizeFstToken(), ASCII → Unicode property escapes
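
The effect of the regex change can be seen in a small sketch. The function names below are illustrative rather than the project's actual normalizeTokens() or normalizeFstToken(), and the surrounding steps (lowercasing, whitespace collapsing) are assumptions; only the two character classes come from the list above.

```typescript
// Sketch only: illustrative helpers, not the project's implementation.
const ASCII_ONLY = /[^a-z0-9\s]/g;
// Built with the RegExp constructor so the `u` flag and property escapes
// do not depend on the compile target.
const PUNCT_OR_SYMBOL = new RegExp("[\\p{P}\\p{S}]", "gu");
const HAS_ALNUM = new RegExp("[\\p{L}\\p{N}]", "u");

// Old behavior: everything outside ASCII letters, digits, and whitespace is
// removed, which deletes CJK text entirely.
function normalizeAscii(text: string): string {
  return text.toLowerCase().replace(ASCII_ONLY, "").replace(/\s+/g, " ").trim();
}

// New behavior: only punctuation and symbols are removed, in any script.
function normalizeUnicode(text: string): string {
  return text.toLowerCase().replace(PUNCT_OR_SYMBOL, " ").replace(/\s+/g, " ").trim();
}

// Unicode-aware "contains a letter or digit" check.
function hasAlnum(token: string): boolean {
  return HAS_ALNUM.test(token);
}
```

With these definitions, normalizeAscii("東京") returns an empty string while normalizeUnicode("東京") returns "東京", and both return "st louis mo" for "St. Louis, MO".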

Piscina lock fix

The wof/prepare CLI command's Piscina workers were opening concurrent SQLite connections, causing SQLITE_BUSY. Fixed per the design brief: workers return ParsedPlace[] to the main thread, main thread handles all DB writes in batched transactions.
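
The single-writer pattern can be sketched as follows. The PlaceWriter interface, the ParsedPlace fields, and the batch size are hypothetical; in the real command the input arrays would be whatever the Piscina workers resolve with.

```typescript
// Sketch only: ParsedPlace fields and the PlaceWriter interface are
// hypothetical stand-ins for the real types.
interface ParsedPlace {
  id: number;
  name: string;
}

interface PlaceWriter {
  begin(): void;
  insert(place: ParsedPlace): void;
  commit(): void;
  rollback(): void;
}

// Runs on the main thread only. Workers parse and return ParsedPlace[];
// nothing but this function touches the database, so there is exactly one
// writer and no contention for the write lock.
function writeInBatches(places: ParsedPlace[], writer: PlaceWriter, batchSize: number = 1000): number {
  let transactions = 0;
  for (let start = 0; start < places.length; start += batchSize) {
    writer.begin();
    try {
      for (const place of places.slice(start, start + batchSize)) {
        writer.insert(place);
      }
      writer.commit();
    } catch (err) {
      writer.rollback(); // keep the database consistent if a batch fails
      throw err;
    }
    transactions += 1;
  }
  return transactions;
}
```

Batching keeps the number of commits small, which matters because each commit is a durability point.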

FST V4 format

The global FST's root state has 488K edges, which overflowed the u16 edgeCount field (maximum 65,535). Bumped to V4 with u32 edge/place counts per state (STATE_ENTRY_SIZE 12 → 16). Backwards compatible.
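
The overflow and the wider entry can be illustrated with a short sketch. The field order and the idea that each entry holds two offsets and two counts are assumptions chosen to match the stated sizes (4 + 2 + 4 + 2 = 12 bytes before, 4 × 4 = 16 bytes after); the real V4 layout may differ, and 488000 stands in for the approximate "488K" figure.

```typescript
// Sketch only: the entry layout is a guess consistent with the 12 -> 16 byte
// change, not the actual file format.
const STATE_ENTRY_SIZE_V4 = 16;

interface StateEntry {
  edgeOffset: number;
  edgeCount: number;
  placeOffset: number;
  placeCount: number;
}

// What a u16 field would have stored for a given count.
function truncateToU16(count: number): number {
  return count & 0xffff;
}

function writeStateV4(view: DataView, index: number, state: StateEntry): void {
  const base = index * STATE_ENTRY_SIZE_V4;
  view.setUint32(base, state.edgeOffset, true); // little-endian
  view.setUint32(base + 4, state.edgeCount, true);
  view.setUint32(base + 8, state.placeOffset, true);
  view.setUint32(base + 12, state.placeCount, true);
}

function readStateV4(view: DataView, index: number): StateEntry {
  const base = index * STATE_ENTRY_SIZE_V4;
  return {
    edgeOffset: view.getUint32(base, true),
    edgeCount: view.getUint32(base + 4, true),
    placeOffset: view.getUint32(base + 8, true),
    placeCount: view.getUint32(base + 12, true),
  };
}
```

Here truncateToU16(488000) evaluates to 29248, a silently wrong edge count of the kind a 16-bit field would produce, while the u32 field round-trips the value unchanged.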

v0.5.4 training

Running on Modal A100 with:

  • v0.6.0-a0 tokenizer (multi-script, 0% CJK byte-fallback)
  • v0.5.1 recipe (wof-admin: 2.0, constant LR, no label smoothing, 100K steps)
  • Per-tag F1 logging
  • Step 48000/100000 at time of writing

HF Bucket inventory

| Path | Size | Description |
| --- | --- | --- |
| en-us/v0.5.3/* | ~75 MB | Current model release (4 versions available) |
| tokenizer/v0.6.0-a0/* | 1.9 MB | Multi-script tokenizer |
| wof/admin-global-priority-7country.db | 1.09 GB | Global admin DB |
| wof/fst-global-7country-all-langs.bin | 302 MB | Global multi-language FST |

Issues closed

  • #120: Tokenizer retrain (0% byte-fallback achieved)
  • #98: Phase B browser demo (closed previous session)
  • #47: Phase 3.x browser demo (closed previous session)

Night shift addendum (02:00–02:40 UTC)

v0.5.4 shipped

  • Training completed (100K steps, A100, ~2.5h)
  • Export → fp32 117 MB → int8 28.1 MB
  • Demo presets: 6/6 correct on int8
  • Error analysis: 17.0% exact match on 4,535 golden entries (cross-tokenizer comparison invalid per eval protocol: schema mismatch dominates; Stage 3 will close most boundary errors)
  • Uploaded to HF en-us/v0.5.4/ with model.onnx, tokenizer.model, fst-en-US.bin, model-card.json
  • en-us/releases.json updated, defaultVersion: v0.5.4
  • neural-weights-en-us package version bumped to 0.5.4

Stage 3 corpus adapters shipped

All three priority adapters now emit street_prefix, street_suffix, unit from existing structured input, so no rescraping is needed.

  • TIGER: decomposeStreet() parses FULLNAME using libpostal/en directionals + street_types. 8 unit tests pass.
  • NAD: Uses NAD's structured St_PreDir/St_PreTyp/St_PosTyp/St_PosDir fields directly. Unit/Building/Floor/Room chain into unit tag.
  • BAN: French street types are leading words (Rue, Avenue, Boulevard). decomposeFrStreet() uses libpostal/fr/street_types. 6 unit tests pass.
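
As an illustration of the decomposition idea for English street names, here is a minimal sketch. It is not the adapters' decomposeStreet(): the word lists are tiny hand-written stand-ins for the libpostal dictionaries, and the mapping of directionals and types onto street_prefix and street_suffix is an assumption.

```typescript
// Sketch only: tiny hypothetical word lists; the real adapters read the
// libpostal dictionaries instead.
const DIRECTIONALS: string[] = ["n", "s", "e", "w", "ne", "nw", "se", "sw"];
const STREET_TYPES: string[] = ["st", "ave", "blvd", "rd", "dr", "ln"];

interface StreetParts {
  street_prefix: string | null;
  street: string;
  street_suffix: string | null;
}

function isIn(list: string[], word: string | undefined): boolean {
  return word !== undefined && list.indexOf(word.toLowerCase()) !== -1;
}

function decomposeStreetSketch(fullName: string): StreetParts {
  const words = fullName.trim().split(/\s+/).filter((w) => w.length > 0);
  const prefix: string[] = [];
  const suffix: string[] = [];

  // Leading directional, e.g. "N" in "N Main St".
  if (words.length > 1 && isIn(DIRECTIONALS, words[0])) {
    const first = words.shift();
    if (first !== undefined) prefix.push(first);
  }
  // Trailing directional, e.g. "NW" in "Main St NW".
  if (words.length > 1 && isIn(DIRECTIONALS, words[words.length - 1])) {
    const last = words.pop();
    if (last !== undefined) suffix.unshift(last);
  }
  // Trailing street type, e.g. "St".
  if (words.length > 1 && isIn(STREET_TYPES, words[words.length - 1])) {
    const last = words.pop();
    if (last !== undefined) suffix.unshift(last);
  }

  return {
    street_prefix: prefix.length > 0 ? prefix.join(" ") : null,
    street: words.join(" "),
    street_suffix: suffix.length > 0 ? suffix.join(" ") : null,
  };
}
```

The length guards keep at least one word for the street itself, so a bare name such as "Broadway" is returned unchanged with no prefix or suffix.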

ACTIVE_TAGS remains STAGE2; bump to STAGE3 when v0.6.0 training is ready.

Japanese WOF adapter prototype

wof-admin-jp walks the global SQLite parent chain for every 丁目 (chome) in the JP repo. Produces 6,373 synthetic JP training rows across 47 prefectures, 269 localities. Top: 東京 (Tokyo), 2,251 rows. See the blog post "Why Japanese addresses break Western parsers".
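
The parent-chain walk can be sketched over an in-memory lookup table. Everything here is hypothetical: the PlaceRow shape, the placetype labels, and the sample rows are illustrative, and the real adapter reads from the global SQLite database rather than from an object.

```typescript
// Sketch only: row shape and field names are assumptions for illustration.
interface PlaceRow {
  id: number;
  parentId: number | null;
  placetype: string;
  name: string;
}

// Follows parentId links from a starting place up to the root.
// The `seen` guard stops the walk if the data contains a cycle.
function walkParentChain(places: Record<number, PlaceRow>, startId: number): PlaceRow[] {
  const chain: PlaceRow[] = [];
  const seen: Record<number, boolean> = {};
  let current: PlaceRow | undefined = places[startId];
  while (current !== undefined && seen[current.id] !== true) {
    seen[current.id] = true;
    chain.push(current);
    current = current.parentId === null ? undefined : places[current.parentId];
  }
  return chain;
}

// Japanese addresses are written largest unit first, so the chain is
// reversed before joining (no separators between components).
function toJapaneseOrder(chain: PlaceRow[]): string {
  return chain.slice().reverse().map((row) => row.name).join("");
}
```

Given hypothetical rows for 一丁目 under 丸の内 under 千代田区 under 東京都, walking from the chome and applying toJapaneseOrder yields 東京都千代田区丸の内一丁目.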

Per-locale FSTs on HF

Seven locale FSTs uploaded to hf://buckets/sister-software/mailwoman/fst/:

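| Locale | Size | States |
| --- | --- | --- |
| en-US | 20.9 MB | 160K |
| en-GB | 3.7 MB | 33K |
| fr-FR | 10.2 MB | 72K |
| ja-JP | 13.0 MB | 116K |
| ko-KR | 7.1 MB | 55K |
| zh-CN | 92.5 MB | 589K |
| de-DE | 8.1 MB | 70K |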

CJK queries (東京, 北京, 서울) now resolve correctly. Deep demo wiring (locale detection → dynamic FST swap) is deferred as out of scope for tonight.