Automated Feedback Systems for Enhanced Tutor Training

23 May 2025

Abstract and 1 Introduction

2. Background

2.1 Effective Tutoring Practice

2.2 Feedback for Tutor Training

2.3 Sequence Labeling for Feedback Generation

2.4 Large Language Models in Education

3. Method

3.1 Dataset and 3.2 Sequence Labeling

3.3 GPT Facilitated Sequence Labeling

3.4 Metrics

4. Results

4.1 Results on RQ1

4.2 Results on RQ2

5. Discussion

6. Limitations and Future Work

7. Conclusion

8. Acknowledgments

9. References

APPENDIX

A. Lesson Principles

B. Input for Fine-Tuning GPT-3.5

C. Scatter Matrix of the Correlation on Outcome-based Praise

D. Detailed Results of Fine-Tuned GPT-3.5 Model's Performance

2.2 Feedback for Tutor Training

Feedback in the learning process is widely recognized for its significant impact on learning outcomes [50, 17, 18, 37, 23, 21], with effects ranging from strongly positive [50, 17] to occasionally negative [18], depending on the content and method of delivery. As Hattie and Timperley [21] highlight, the effectiveness of feedback is closely linked to its relevance to the learning context, its timing relative to initial instruction, and its focus on addressing misconceptions or incorrect reasoning. In particular, immediate, explanatory feedback, which clarifies why certain responses are correct or incorrect, plays a crucial role in promoting active engagement and thoughtful practice among learners [53, 37, 21, 23, 17, 50]. The growing importance of feedback has motivated the adoption of automated feedback systems in educational settings, such as OnTask, which allows educators to provide scalable feedback based on conditional rules applied to students' academic activities and performance [49]. Yet the application of such systems to tutor training remains under-explored.
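The conditional-rule idea behind systems like OnTask can be illustrated with a minimal sketch. This is not OnTask's actual API; the field names, thresholds, and message wording below are illustrative assumptions.

```python
# Illustrative sketch (not OnTask's real interface): scalable feedback
# produced by applying simple conditional rules to a student's activity
# and performance record. Field names and thresholds are hypothetical.
def rule_based_feedback(record):
    """Return a feedback message chosen by the first matching rule."""
    if record["quiz_score"] >= 0.8:
        return "Great work -- your recent quiz score shows strong mastery."
    if record["logins_last_week"] == 0:
        return "We haven't seen you this week; try the next practice set."
    # Default rule when no specific condition fires.
    return "Keep practicing, and review the worked examples before the next quiz."

print(rule_based_feedback({"quiz_score": 0.9, "logins_last_week": 3}))
```

Because each rule is just a condition over logged data, an instructor can author messages once and have them delivered at scale, which is the property that makes such systems attractive for feedback delivery.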

An important method of deploying automated feedback in tutor training is templated feedback. Templated feedback, which includes specific references to the desired and less-desired elements of a tutor's response, is informed by earlier findings on the effectiveness of a rich, data-driven error-diagnosis taxonomy driving template-based feedback [1]. Our study employs natural language processing (NLP) techniques to automate the identification of desirable and less desirable elements within tutor responses, facilitating the generation of templated explanatory feedback.
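The pipeline this implies can be sketched in a few lines: token-level sequence labels (e.g. from a fine-tuned model) identify the desirable and less desirable spans, and a template turns them into an explanatory message. The label names (effort- vs. outcome-based praise, echoing the appendix) and the template wording here are illustrative assumptions, not the study's exact scheme.

```python
# Hypothetical sketch: assemble templated explanatory feedback from
# token-level sequence labels. "PRAISE_EFFORT" (desirable) and
# "PRAISE_OUTCOME" (less desirable) are assumed label names.
def templated_feedback(tokens, labels):
    """Fill a feedback template with the labeled spans of a tutor response."""
    desired = [t for t, l in zip(tokens, labels) if l == "PRAISE_EFFORT"]
    less_desired = [t for t, l in zip(tokens, labels) if l == "PRAISE_OUTCOME"]
    parts = []
    if desired:
        parts.append(f'Nice use of effort-based praise ("{" ".join(desired)}").')
    if less_desired:
        parts.append(
            f'The phrase "{" ".join(less_desired)}" praises the outcome; '
            "try praising the student's effort instead."
        )
    return " ".join(parts) if parts else "No praise components detected."

tokens = ["You", "are", "so", "smart", "!"]
labels = ["PRAISE_OUTCOME", "PRAISE_OUTCOME", "PRAISE_OUTCOME", "PRAISE_OUTCOME", "O"]
print(templated_feedback(tokens, labels))
```

The template stays fixed while the labeled spans vary per response, which is what makes the feedback both explanatory (it quotes the tutor's own words) and scalable.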

This paper is available on arxiv under CC BY 4.0 DEED license.

Authors:

(1) Jionghao Lin, Carnegie Mellon University (jionghal@cs.cmu.edu);

(2) Eason Chen, Carnegie Mellon University (easonc13@cmu.edu);

(3) Zeifei Han, University of Toronto (feifei.han@mail.utoronto.ca);

(4) Ashish Gurung, Carnegie Mellon University (agurung@andrew.cmu.edu);

(5) Danielle R. Thomas, Carnegie Mellon University (drthomas@cmu.edu);

(6) Wei Tan, Monash University (wei.tan2@monash.edu);

(7) Ngoc Dang Nguyen, Monash University (dan.nguyen2@monash.edu);

(8) Kenneth R. Koedinger, Carnegie Mellon University (koedinger@cmu.edu).