no code implementations • 2 May 2024 • Jionghao Lin, Zifei Han, Danielle R. Thomas, Ashish Gurung, Shivang Gupta, Vincent Aleven, Kenneth R. Koedinger
Our findings indicate that: 1) using a few-shot approach, the GPT-4 model effectively identifies correct/incorrect trainees' responses from three training lessons with an average F1 score of 0.84 and an AUC score of 0.85; and 2) using the few-shot approach, the GPT-4 model adeptly rephrases incorrect trainees' responses into desired responses, achieving performance comparable to that of human experts.
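The few-shot setup described above can be sketched as a prompt that interleaves a handful of labeled trainee responses before the response to be classified. This is a minimal illustration only; the example texts, labels, and the `build_few_shot_prompt` helper are hypothetical and not drawn from the paper.

```python
# Hypothetical labeled examples of trainee responses (invented for illustration).
FEW_SHOT_EXAMPLES = [
    ("Great effort spotting that mistake; let's walk through it together.", "correct"),
    ("That's wrong. Try again.", "incorrect"),
]

def build_few_shot_prompt(examples, new_response):
    """Format labeled examples, then append the unlabeled response to classify."""
    lines = ["Classify each trainee response as correct or incorrect."]
    for text, label in examples:
        lines.append(f"Response: {text}\nLabel: {label}")
    # The model is expected to complete the final "Label:" field.
    lines.append(f"Response: {new_response}\nLabel:")
    return "\n\n".join(lines)

prompt = build_few_shot_prompt(FEW_SHOT_EXAMPLES, "You should have known that already.")
print(prompt)
```

The assembled string would then be sent to the model (e.g. GPT-4 via an API call, omitted here to keep the sketch self-contained).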
no code implementations • 4 Feb 2024 • Zifei Han, Jionghao Lin, Ashish Gurung, Danielle R. Thomas, Eason Chen, Conrad Borchers, Shivang Gupta, Kenneth R. Koedinger
The results indicate that the RAG prompt demonstrated more accurate performance (assessed by the level of hallucination and correctness in the generated assessment texts) and lower financial costs than the other strategies evaluated.
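The retrieval step behind a RAG-style prompt can be sketched as scoring candidate lesson passages against the text to be assessed and prepending the best match as grounding context. Everything below is an assumption for illustration: the passages are invented, and real systems would use embedding similarity rather than word overlap.

```python
# Hypothetical lesson passages (invented for illustration).
PASSAGES = [
    "Praise student effort when a student makes an error before correcting it.",
    "Open-ended questions encourage students to explain their reasoning.",
]

def retrieve(query, passages):
    """Return the passage sharing the most words with the query (toy scorer)."""
    query_words = set(query.lower().split())
    return max(passages, key=lambda p: len(query_words & set(p.lower().split())))

def build_rag_prompt(query, passages):
    """Prepend the retrieved passage as context for the assessment prompt."""
    context = retrieve(query, passages)
    return f"Context: {context}\n\nAssess this trainee response: {query}"

prompt = build_rag_prompt("When the student makes an error, I praise effort", PASSAGES)
print(prompt)
```

Grounding the prompt in retrieved source text is what the study credits for the lower hallucination rate relative to the other prompting strategies.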
no code implementations • 6 Jan 2024 • Sanjit Kakarla, Danielle Thomas, Jionghao Lin, Shivang Gupta, Kenneth R. Koedinger
By analyzing 50 real-life tutoring dialogues, we find both GPT-3.5-Turbo and GPT-4 demonstrate proficiency in assessing the criteria related to reacting to students making errors.
no code implementations • 27 Jun 2023 • Jionghao Lin, Danielle R. Thomas, Feifei Han, Shivang Gupta, Wei Tan, Ngoc Dang Nguyen, Kenneth R. Koedinger
Research demonstrates that learners engaging in the process of producing explanations to support their reasoning can have a positive impact on learning.