Phase 5: Studio (Human-in-the-Loop Correction)

Goal: a web UI where humans can paste an address, see the parse, correct spans, and submit corrections. Corrections feed back into corpus retraining. This is the commercial differentiator: enterprises will pay for the ability to correct their own address data.

Status: deferred until Phase 3 has shipped and gathered usage. Do not begin Phase 5 without explicit confirmation.

This document is a sketch. A detailed plan will be written when Phase 5 begins.

Why this matters

  • Address parsing has a long tail of locale-specific edge cases that no amount of pretraining catches
  • Customers with their own address datasets (real estate, logistics, government compliance) need to fix model output for their data
  • The studio is what turns a free library into a commercial product line

Sketch of components

Frontend

  • React or Svelte (project creator's call)
  • Paste address → see live parse with span boundaries
  • Drag span edges to adjust boundaries
  • Right-click span to relabel component
  • Submit correction with optional notes
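
The span-editing interactions above reduce to two small operations on character offsets, whichever frontend framework is chosen. The following is a minimal sketch of that logic in Python, not the studio's implementation: the `Span` type, the `move_edge` and `relabel` helpers, and the clamping rules (spans stay non-empty and never overlap a neighbour) are all assumptions made for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Span:
    label: str  # component name, e.g. "street" (hypothetical label set)
    start: int  # inclusive character offset into the raw address
    end: int    # exclusive character offset into the raw address


def move_edge(spans, index, edge, new_offset, text_len):
    """Return a new list with one edge of spans[index] moved to new_offset.

    Assumes `spans` is sorted by start and non-overlapping. The offset is
    clamped so the span stays non-empty, stays inside the text, and does
    not cross into a neighbouring span.
    """
    span = spans[index]
    lower = spans[index - 1].end if index > 0 else 0
    upper = spans[index + 1].start if index + 1 < len(spans) else text_len
    if edge == "start":
        updated = replace(span, start=max(lower, min(new_offset, span.end - 1)))
    elif edge == "end":
        updated = replace(span, end=min(upper, max(new_offset, span.start + 1)))
    else:
        raise ValueError("edge must be 'start' or 'end'")
    return spans[:index] + [updated] + spans[index + 1:]


def relabel(spans, index, new_label):
    """Return a new list with spans[index] assigned a different component label."""
    return spans[:index] + [replace(spans[index], label=new_label)] + spans[index + 1:]
```

Both helpers return new lists rather than mutating in place, so the original parse remains available to be stored next to the corrected one.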

Backend

  • Postgres table: corrections { id, raw, original_parse, corrected_parse, user_id, project_id, created_at, notes }
  • API: POST /corrections, GET /corrections?project_id=...
  • Auth: minimal (project tokens, no full user system in v1)
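
As a rough illustration of the backend shape, the sketch below models the corrections row and the two endpoints with an in-memory stand-in for the Postgres table. The field names come from the table definition above; the `CorrectionStore` class, its method names, the token-to-project mapping, and the use of plain dicts for parses are assumptions, not a decided design.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Correction:
    raw: str
    original_parse: dict
    corrected_parse: dict
    user_id: str
    project_id: str
    notes: Optional[str] = None
    id: str = field(default_factory=lambda: str(uuid.uuid4()))
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class CorrectionStore:
    """In-memory stand-in for the corrections table (hypothetical helper)."""

    def __init__(self, project_tokens):
        self._tokens = dict(project_tokens)  # token -> project_id
        self._rows = []

    def _project_for(self, token):
        # Minimal auth: a token either maps to a project or is rejected.
        try:
            return self._tokens[token]
        except KeyError:
            raise PermissionError("unknown project token") from None

    def post_correction(self, token, raw, original_parse, corrected_parse,
                        user_id, notes=None):
        """Rough equivalent of POST /corrections."""
        row = Correction(raw, original_parse, corrected_parse, user_id,
                         self._project_for(token), notes)
        self._rows.append(row)
        return row

    def get_corrections(self, token, project_id):
        """Rough equivalent of GET /corrections?project_id=..."""
        if self._project_for(token) != project_id:
            raise PermissionError("token does not belong to this project")
        return [r for r in self._rows if r.project_id == project_id]
```

Deriving `project_id` from the token on write, rather than trusting a client-supplied value, keeps one project from filing corrections under another project's name.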

Retraining loop

  • Nightly job: dump new corrections to /data/corpus/sources/corrections/
  • Tag with high sample weight
  • Trigger retraining (manual confirmation, not automatic, because corrections can be adversarial)
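
The nightly job could look roughly like the sketch below. It is illustrative only: the JSONL layout, the file naming, the weight value, and the `dump_corrections` and `should_retrain` names are all assumptions (the plan says only that the weight should be "high"). The output directory is passed in as a parameter; in production it would be the corrections source directory named above.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

# Assumed value for illustration; the plan does not fix a number.
CORRECTION_SAMPLE_WEIGHT = 5.0


def dump_corrections(corrections, out_dir, since, now=None):
    """Write corrections created at or after `since` to a dated JSONL file.

    `corrections` is an iterable of dicts with at least raw, corrected_parse,
    project_id and created_at (a timezone-aware datetime).
    Returns (path, count), or (None, 0) when there is nothing new.
    """
    now = now or datetime.now(timezone.utc)
    new_rows = [c for c in corrections if c["created_at"] >= since]
    if not new_rows:
        return None, 0
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"corrections-{now:%Y%m%d}.jsonl"
    with path.open("w", encoding="utf-8") as fh:
        for c in new_rows:
            record = {
                "raw": c["raw"],
                "corrected_parse": c["corrected_parse"],
                "project_id": c["project_id"],
                "sample_weight": CORRECTION_SAMPLE_WEIGHT,
                "source": "corrections",
            }
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")
    return path, len(new_rows)


def should_retrain(new_count, confirmed_by):
    """Retraining proceeds only with new data AND a named human sign-off."""
    return new_count > 0 and bool(confirmed_by)
```

Keeping the dump and the retraining decision as separate steps means the nightly job can run unattended while the retraining trigger still waits for a person to review what was collected.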

What Phase 3 should leave in good shape

  • ClassificationProposal carries enough info to round-trip through human correction
  • The output format includes character offsets (it does) so span editing is mechanical
  • Telemetry hooks exist to count "user accepted the parse as-is" vs "user corrected"; this informs which components most need correction
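
One way the accepted-versus-corrected telemetry could be computed is to diff the original and corrected parses per component. The sketch below assumes a parse is a dict mapping a component label to a `(start, end)` character-offset pair; the real `ClassificationProposal` shape is not specified here, and `diff_components` and `ParseTelemetry` are hypothetical names.

```python
from collections import Counter


def diff_components(original, corrected):
    """Return the labels whose span was changed, added, or removed."""
    labels = set(original) | set(corrected)
    return {label for label in labels if original.get(label) != corrected.get(label)}


class ParseTelemetry:
    """Counts parses accepted as-is vs corrected, and which components changed."""

    def __init__(self):
        self.accepted = 0
        self.corrected = 0
        self.by_component = Counter()

    def record(self, original, corrected):
        changed = diff_components(original, corrected)
        if changed:
            self.corrected += 1
            self.by_component.update(changed)
        else:
            self.accepted += 1

    def most_corrected(self, n=3):
        """Components that were corrected most often, highest count first."""
        return self.by_component.most_common(n)
```

A per-component counter like this is enough to answer which components most need correction without storing any address text in the telemetry itself.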

Open questions for when Phase 5 begins

  • Is this open source or commercial-only?
  • How does the correction format become training data? (BIO labels need to be re-derived from corrected components)
  • Quality gating: are all corrections trusted equally, or do each project's corrections affect only that project's model?
  • Does the studio host its own model fine-tuned on the project's corrections?
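
On the question of turning corrections into training data, one possible route from corrected character-offset spans to BIO labels is sketched below. It assumes whitespace tokenization and a `(label, start, end)` span tuple, neither of which is confirmed by this document; the model's real tokenizer may differ, and `bio_labels` is a hypothetical helper.

```python
import re


def bio_labels(raw, spans):
    """Derive (token, tag) pairs from corrected character-offset spans.

    `spans` is a list of (label, start, end) with end exclusive. A token is
    tagged only if it lies entirely inside a span: the first such token gets
    B-<label>, later ones get I-<label>. Everything else is tagged O.
    """
    opened = set()  # indexes of spans that already produced a B- tag
    tagged = []
    for match in re.finditer(r"\S+", raw):
        tag = "O"
        for i, (label, start, end) in enumerate(spans):
            if start <= match.start() and match.end() <= end:
                tag = ("I-" if i in opened else "B-") + label
                opened.add(i)
                break
        tagged.append((match.group(), tag))
    return tagged
```

For the input `"221B Baker Street London"` with spans `house_number` at 0 to 4, `street` at 5 to 17, and `city` at 18 to 24, this yields `B-house_number`, `B-street`, `I-street`, `B-city`. Tokens that straddle a span boundary fall back to `O` in this sketch; whether to split, snap, or reject such corrections remains part of the open question.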