Confidence calibration
Every span the parser emits carries a confidence number. When the model says it's 94% sure, is it actually right 94% of the time?
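One common way to check this is to group spans by their reported confidence and compare each group's average confidence with how often its spans were actually right. The sketch below assumes you have each span's confidence as a float in [0, 1] and a true/false label saying whether the span was correct. The function names and the choice of ten equal-width bins are illustrative assumptions, not part of the parser.

```python
def reliability_bins(confidences, correct, n_bins=10):
    """Group predictions into equal-width confidence bins.

    Returns one (mean_confidence, accuracy, count) tuple per non-empty bin.
    Hypothetical helper: the inputs are assumed to be parallel sequences.
    """
    totals = [[0.0, 0, 0] for _ in range(n_bins)]  # confidence sum, hits, count
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 is placed in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        totals[idx][0] += conf
        totals[idx][1] += int(ok)
        totals[idx][2] += 1
    return [(c / n, h / n, n) for c, h, n in totals if n > 0]


def expected_calibration_error(confidences, correct, n_bins=10):
    """Count-weighted average gap between confidence and accuracy per bin."""
    bins = reliability_bins(confidences, correct, n_bins)
    total = sum(n for _, _, n in bins)
    return sum(n / total * abs(acc - conf) for conf, acc, n in bins)
```

If 100 spans are all reported at 0.94 and 94 of them are correct, the gap is essentially zero: the number means what it says. If only 50 of them are correct, the gap is about 0.44, and the model is overconfident.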