Catching an AI Grading Its Own Homework
The author built an AI interview coach that scored user responses and adapted its coaching plan based on those evaluations. The evaluator component, however, could read the coach's own feedback and weave it into a coherent narrative, effectively grading its own work.
Why it matters
When an AI evaluator can read the output it is meant to judge, it inherits that output's framing, creating self-reinforcing bias and overconfidence. Carefully separating what each component can see matters most in high-stakes applications like coaching or evaluation.
Key Points
- The author built an AI interview coach with a feedback loop for scoring, planning, and evaluation
- The evaluator had access to the coach's feedback and built a coherent narrative from it, leading to overly optimistic evaluations
- The author fixed this by cutting off the evaluator's access to the coaching content, limiting it to the score data and other metrics
Details
The author built an AI interview coach called Aria that scored spoken answers, detected communication patterns, and adapted its coaching plan based on the evaluations. The evaluator component, however, had access to Aria's feedback and could build a coherent narrative about coaching progress even when the score data was ambiguous. Because it was effectively grading its own work, the evaluator consistently produced optimistic assessments.

To fix this, the author cut off the evaluator's access to the coaching content, limiting its input to score deltas, task statuses, pattern observations, and coverage data. Deprived of the coach's narrative, the evaluator was forced to make data-driven assessments.
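The fix described above amounts to an allowlist on the evaluator's input. A minimal sketch of that idea, assuming a dict-shaped session record (all field names here are illustrative assumptions, not taken from the original system):

```python
# Hypothetical sketch: build the evaluator's input from metrics only, so the
# coach's narrative feedback never reaches the component that judges progress.
# Field names are illustrative assumptions, not from the author's system.

# Only quantitative signals are allowed through to the evaluator.
EVALUATOR_ALLOWLIST = {
    "score_deltas",
    "task_statuses",
    "pattern_observations",
    "coverage",
}

def build_evaluator_input(session: dict) -> dict:
    """Strip coaching narrative; keep only the allowlisted metric fields."""
    return {k: v for k, v in session.items() if k in EVALUATOR_ALLOWLIST}

# Example session record mixing metrics with the coach's narrative feedback.
session = {
    "score_deltas": {"clarity": 0.4, "structure": -0.1},
    "task_statuses": {"star_story": "done", "pacing_drill": "pending"},
    "pattern_observations": ["filler words reduced", "long pauses persist"],
    "coverage": {"behavioral": 0.7, "technical": 0.3},
    "coach_feedback": "Great progress! Your storytelling is really coming together.",
}

evaluator_input = build_evaluator_input(session)
assert "coach_feedback" not in evaluator_input  # the narrative stays out
```

An allowlist (rather than a blocklist on `coach_feedback`) is the safer design choice here: any new narrative field added to the session later is excluded by default instead of silently leaking through to the evaluator.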