Predicting Students’ Difficulties From a Piece of Code

Based on hundreds of thousands of hours of data about how students learn in massive open online courses, educational machine learning promises to help students who are learning to code. In most classrooms, however, students and assignments lack the historical data that these data-hungry algorithms require. Previous work on predicting dropout is data-hungry and, moreover, requires the code to be syntactically correct. Because we deal with beginners' code in a text-based language, our models are trained on noisy student text; almost 40% of the code in our datasets contains parsing errors. In this article, we compare two machine learning models that predict whether students need help, regardless of whether their code compiles. That is, we compare two methods for automatically predicting whether students will be able to solve a programming exercise on their own. The first model is a heavily feature-engineered approach that implements pedagogical theories of the relation between student interaction patterns and the probability of dropout; it requires a rich history of student interaction. The second method is based on a short program (which may contain errors) written by a student, together with a few hundred attempts by their classmates at the same exercise. This second method uses natural language processing techniques; it is based on the intuition that beginners' code may be closer to a natural language than to a formal one, and it is inspired by previous work on predicting people's fluency when learning a second natural language.
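The abstract does not specify the second model's architecture, but its core idea — treating possibly unparseable student code as plain text and learning from classmates' attempts at the same exercise — can be illustrated with a minimal sketch. The sketch below is an assumption-laden stand-in, not the paper's model: it uses regex tokenization (which works even when the code has syntax errors) and a simple multinomial Naive Bayes classifier over token counts; the class name `NeedsHelpClassifier` and the toy data are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(code):
    # Regex tokenization never invokes a parser, so it works on the
    # ~40% of beginner submissions that contain parsing errors.
    return re.findall(r"[A-Za-z_]\w*|[^\sA-Za-z_]", code)


class NeedsHelpClassifier:
    """Hypothetical stand-in for an NLP-style model: multinomial
    Naive Bayes over code tokens, trained on classmates' attempts."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing
        self.counts = {True: Counter(), False: Counter()}
        self.docs = {True: 0, False: 0}
        self.vocab = set()

    def fit(self, attempts, needs_help):
        # attempts: classmates' submissions (strings, possibly broken)
        # needs_help: True if that student could not finish on their own
        for code, label in zip(attempts, needs_help):
            self.docs[label] += 1
            self.counts[label].update(tokenize(code))
        self.vocab = set(self.counts[True]) | set(self.counts[False])
        return self

    def predict(self, code):
        # Returns True if the student is predicted to need help.
        total_docs = self.docs[True] + self.docs[False]
        scores = {}
        for label in (True, False):
            score = math.log(self.docs[label] / total_docs)
            denom = sum(self.counts[label].values()) + self.alpha * len(self.vocab)
            for tok in tokenize(code):
                score += math.log((self.counts[label][tok] + self.alpha) / denom)
            scores[label] = score
        return scores[True] > scores[False]
```

The design point the sketch tries to capture is the one the abstract emphasizes: because the representation is token-level rather than syntax-tree-level, non-compiling code is first-class input rather than an error case.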
