Harnessing Machine Learning and Generative AI: A New Era in Online Tutoring Systems
Discover how the convergence of machine learning and generative AI is revolutionizing online tutoring, enabling systems that evolve into better teachers, continuously refining their instructional methods based on student data and feedback.
Personal human tutors set the gold standard in education by providing tailored instruction that addresses students' individual needs and challenges. This one-on-one format enables an accelerated learning process by adapting to the student's prior knowledge, delivering appropriate learning activities, and providing immediate feedback on responses and questions. Similarly, intelligent tutoring systems (ITSs) are a digital learning technology that democratizes the benefits of personal tutoring, providing access to learning materials and adaptive instruction to millions of users worldwide. ITSs promote equitable learning experiences that play an integral role in narrowing educational achievement gaps and, in certain contexts, have been found to be as effective as human tutoring [1].
Central to the functionality of ITSs are machine learning algorithms that enable these systems to mimic the diverse strategies used by human tutors. For example, tutoring systems can learn to assess students' evolving knowledge states by analyzing log data describing sequences of student interactions, such as responses to practice questions and engagement with readings and videos. This enables the adaptive selection of concepts and learning activities, ensuring that students are neither overwhelmed by difficulty nor bored by practicing skills they have already mastered. Furthermore, ITSs use a combination of rule-based and natural language processing (NLP) techniques to understand and respond to students' inputs, enabling immediate and targeted feedback.
Although tutoring systems have advanced significantly in recent decades, teaching countless students, they still struggle to fully understand and adapt to individual student needs. These challenges stem largely from the limitations of their algorithms, which often fail to capture the intricacies of human learning, and from the intensive demands placed on instructional designers to manually craft workflows for every conceivable teaching scenario. Excitingly, recent breakthroughs in machine learning and generative artificial intelligence (AI), empowered by the vast data available in today's tutoring platforms, herald a new era for these systems. Drawing on a series of case studies, this article explores how these advances are being tailored to meet the unique demands of educational environments, enhancing both the efficacy and accessibility of personalized learning experiences.
Machine Learning Meets Instructional Complexity
When I started working on education as a trained machine learning scientist, the question of how we can support human learners seemed a daunting challenge. I knew the first step required for our algorithms to make meaningful advancements was the definition of a well-posed machine learning problem. This involves (i) identifying a concrete task we want to perform, (ii) determining a performance measure that tells us how we are doing, and (iii) identifying sources of training data for our algorithms. What I did not know was how to apply this formal framework in a real-world educational setting.
Education, and particularly tutoring, is a complex and multifaceted domain for several reasons. Firstly, human teachers perform not just one, but a multitude of tasks: curating learning materials, conducting lessons, and grading assignments. Teachers also form social bonds with students and provide career advice. Secondly, the approach to teaching can vary significantly among educators. For instance, math concepts may be taught through practical, everyday problems or through more formal presentations. Furthermore, individual concepts may be taught sequentially or in an interleaved fashion. In the educational literature these types of decisions are referred to as "instructional design choices," and in many contexts we have not reached a consensus on what is most effective [2]. Thirdly, ideal instructional design often differs from one student to another. Effective teachers consider factors such as students' prior achievements and motivation, and may adjust levels of autonomy accordingly. Lastly, measuring educational outcomes presents its own challenges. Unlike medical professionals, who can reference precise measurements of patients' vitals, educators need to infer intangible factors such as students' knowledge and motivation from tangible indicators like homework submissions and classroom behavior.
Reflecting on these complexities, it is crucial to recognize the ability of our teachers to guide us not only in acquiring fundamental skills but also in discovering our passions and becoming confident and capable members of society. To fully harness the potential of modern machine learning and generative AI, we must acknowledge the complexities of education, understand existing instructional practices, and engage in a dialogue with human teachers and students to pinpoint real-world challenges. In the context of online tutoring systems, the rest of this article will explore this process through three case studies. These studies illustrate how we can support educators to enhance established pedagogical practices, make more effective instructional design decisions, and create novel learning experiences by overcoming limitations of previous technologies.
Supporting Established Pedagogical Practice
When developing a tutoring system for a specific domain, for instance, 8th-grade algebra, instructional designers face two key questions: (1) Which learning activities should be included (e.g., lecture videos, practice problems)? (2) How can we support users during these activities (e.g., through hints, worked examples)? Guided by these questions, modern ITSs offer personalized tutoring experiences by structuring the learning process into two nested loops, as illustrated in Figure 1. The outer loop reflects on the learner's current knowledge state to select the most effective learning activity at each point in time (e.g., a practice question of appropriate difficulty). Following this, the inner loop provides targeted feedback and support as the learner engages with the selected activity (e.g., by highlighting an incorrect solution step). Upon completion, the outer loop selects the next activity, typically repeating the process until the system is confident that the learner has mastered the relevant skills.
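To make this two-loop structure concrete, the following sketch simulates it in a few lines of Python. The simulated learner, the mastery threshold, and the knowledge-state update rule are toy assumptions for illustration, not details of any particular ITS.

```python
import random

# A minimal, runnable simulation of the two nested loops. The simulated
# learner, the mastery threshold, and the update rule are toy
# assumptions, not details of any particular ITS.

MASTERY = 0.95
activities = [{"skill": "fractions", "difficulty": d} for d in (0.2, 0.5, 0.8)]
p_know = {"fractions": 0.3}  # estimated probability the skill is known

while any(p < MASTERY for p in p_know.values()):          # outer loop
    activity = random.choice(activities)                  # select an activity
    for attempt in range(3):                              # inner loop (toy: 3 steps)
        correct = random.random() < p_know[activity["skill"]]
        if not correct:
            print("Hint: revisit the worked example.")    # targeted support
    # crude knowledge-state update once the activity is completed
    p_know[activity["skill"]] = min(1.0, p_know[activity["skill"]] + 0.25)
```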
Reasoning about this workflow, instructional designers make design choices that specify the exact pedagogy they want to implement in their systems. Two prominent examples of design choices adopted by many current ITSs are Bloom's mastery learning and the Goldilocks principle. Mastery learning postulates that students should achieve a high level of understanding in the current skill before moving to more advanced ones. This principle ensures learners build a strong foundation for mastering subsequent skills. Meanwhile, the Goldilocks principle suggests learning is most effective when the difficulty of the content is neither too hard nor too easy, but just right for the student's current ability level. This principle helps fine-tune the challenges presented to the learners, facilitating engagement and effective practice.
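As a sketch of how these two principles can be combined into a sequencing policy, consider the following function. The 0.95 mastery threshold and the 0.70 target success rate are common illustrative values, not figures prescribed by any particular system.

```python
MASTERY_THRESHOLD = 0.95  # assumed mastery criterion, illustrative only
TARGET_SUCCESS = 0.70     # "just right": neither too hard nor too easy

def next_question(skill_order, p_mastery, questions, predict_correct):
    """Mastery learning: stay on the first unmastered skill in the
    prerequisite order. Goldilocks: within that skill, pick the question
    whose predicted success rate is closest to the target."""
    for skill in skill_order:
        if p_mastery[skill] < MASTERY_THRESHOLD:
            pool = [q for q in questions if q["skill"] == skill]
            return min(pool, key=lambda q: abs(predict_correct(q) - TARGET_SUCCESS))
    return None  # every skill mastered; the session can end

# Toy usage with a predictor derived from question difficulty:
questions = [{"skill": "s1", "difficulty": d} for d in (0.2, 0.5, 0.8)]
print(next_question(["s1"], {"s1": 0.4}, questions,
                    predict_correct=lambda q: 1 - q["difficulty"]))
```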
To effectively implement mastery learning and the Goldilocks principle as learning activity sequencing strategies within a tutoring system, the software must track the student's evolving knowledge state over time. From a machine learning perspective, this presents a supervised sequence learning task, often referred to as knowledge tracing. The primary goal is to predict a student's likelihood of answering future questions correctly based on their prior interactions with the system. To facilitate these predictions, ITSs collect various types of log data, such as submitted answers, hint usage, and time spent on lesson materials. Using this log data, we can train knowledge tracing algorithms and assess their ability to accurately predict student performance using techniques such as cross-validation. Leveraging these predictions allows ITSs to monitor whether students have mastered the skills associated with the current activity and to choose subsequent tasks that are suitably challenging.
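One classic instance of knowledge tracing is Bayesian knowledge tracing (BKT), which maintains a per-skill probability that the student has mastered the skill and updates it after every answer. The sketch below implements the standard BKT equations with made-up parameter values; it illustrates the technique rather than the model used in any specific system.

```python
# Standard BKT update equations; the guess, slip, and learn parameters
# are made-up values for illustration.

def bkt_update(p_know, correct, p_guess=0.2, p_slip=0.1, p_learn=0.15):
    """Posterior probability the skill is known after observing one answer."""
    if correct:
        posterior = p_know * (1 - p_slip) / (
            p_know * (1 - p_slip) + (1 - p_know) * p_guess)
    else:
        posterior = p_know * p_slip / (
            p_know * p_slip + (1 - p_know) * (1 - p_guess))
    return posterior + (1 - posterior) * p_learn  # chance of learning this step

def predict_correct(p_know, p_guess=0.2, p_slip=0.1):
    """Predicted probability that the next answer on this skill is correct."""
    return p_know * (1 - p_slip) + (1 - p_know) * p_guess

# Replaying a logged answer sequence (0 = incorrect, 1 = correct):
p = 0.1
for correct in [0, 0, 1, 1, 1]:
    p = bkt_update(p, correct)
    print(round(predict_correct(p), 2))
```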
While various knowledge tracing algorithms have been developed, they still face limitations that current research efforts aim to overcome. For example, most existing algorithms make predictions based solely on the sequence and correctness of answered questions. By incorporating additional data types, such as response times, hint usage, and semantic relationships between learning materials, the accuracy of knowledge tracing can be enhanced, allowing more detailed modeling of the human learning process [3]. Additionally, ITSs rely on data from previous learners to analyze the interactions of future learners, which leads to a cold-start problem whenever a new course is introduced. Transfer learning can address this issue by leveraging data from existing courses, enabling first-time adopters to benefit from adaptive instruction [4]. Overall, these advances improve the capability of tutoring systems to monitor and adapt to learners' evolving knowledge states, thereby supporting the implementation of established pedagogical strategies.
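To illustrate how such additional signals can enter a model, the sketch below treats knowledge tracing as a plain supervised learning problem over hand-crafted interaction features; the feature set and the tiny dataset are fabricated for demonstration purposes only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Each row describes one student-question interaction; the label is the
# correctness of the response. Features here are illustrative:
# [prior correct answers on skill, prior attempts on skill,
#  hints used on last attempt, log response time in seconds]
X = np.array([
    [0, 0, 0, np.log(35.0)],
    [1, 1, 1, np.log(80.0)],
    [1, 2, 0, np.log(20.0)],
    [2, 3, 0, np.log(15.0)],
])
y = np.array([0, 1, 0, 1])  # correctness labels from the interaction log

model = LogisticRegression().fit(X, y)
# Predicted success probability for a new interaction:
print(model.predict_proba([[2, 4, 0, np.log(18.0)]])[0, 1])
```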
Making Better Instructional Design Choices
In their pursuit of effective learning experiences, designers of ITSs are confronted with numerous design decisions. These range from selecting general instructional design principles, such as mastery learning for activity sequencing (as discussed previously), to curating individual learning and practice materials. While designers rely on their domain expertise and consider the impacts of various choices, predicting the optimal decision remains challenging, and thousands of case-by-case decisions have to be made while preparing the content of a single course. This second case study explores how reinforcement learning can enhance a large-scale online tutoring system, enabling the ITS to improve learning outcomes by making data-driven design decisions and refinements based on student interaction data.
To put theory into practice, we collaborated with the CK-12 Foundation, a non-profit organization that supports millions of students across various subjects and grade levels through its Flexbook 2.0 system. A key feature of this platform is adaptive practice. The workflow develops students' skills by tracking their knowledge state, selecting practice questions of appropriate difficulty, and providing support in the form of hints, as illustrated in Figure 2. Over the years, the courses within this tutoring system have been continuously refined, accumulating a vast content base featuring tens of thousands of practice questions. We found CK-12's instructional designers had crafted multiple hints for individual questions, ranging from one- or two-sentence hints and keyword definitions to more comprehensive explanatory paragraphs. This wide variety led us to the question: Which exact hints yield the best learning outcomes when provided after an incorrect student response?
We approached this task by framing it as a multi-armed bandit problem, where our goal was to learn an assistance policy that selects the optimal hint for each practice question based on a carefully designed reward function. In collaboration with CK-12 experts, we compiled a comprehensive list of potential learning outcome measures, including students' session performance, question-solving times, and self-reported confidence levels. Using offline policy evaluation techniques, we then analyzed student log data to study trade-offs that can occur when optimizing policies for these different measures. This allowed us to assess the impacts of various potential policies without deploying them in the online system. We discovered that the hints that aid students the most when reattempting the current question are not always best for their performance on future questions. Furthermore, we found that no single hint type, such as keyword definitions, consistently outperforms the others, highlighting the need for data-driven algorithms capable of making effective case-by-case decisions. Ultimately, we decided on a reward function prioritizing two key outcomes: engagement through immediate problem-solving support and the provision of generalizable insights that help students solve subsequent questions within the practice session.
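To give a flavor of offline policy evaluation, the sketch below implements inverse propensity scoring (IPS), one standard estimator for this setting. The logged records, the reward definition, and the candidate policy are toy stand-ins, not CK-12 data or necessarily the exact estimator used in the study.

```python
# Each record: question, hint shown, probability that the logging policy
# showed that hint, and the observed reward (e.g., whether the student
# solved the next question in the session).
log = [
    {"question": "q1", "hint": "keyword", "p_log": 0.5, "reward": 1.0},
    {"question": "q1", "hint": "worked",  "p_log": 0.5, "reward": 0.0},
    {"question": "q2", "hint": "keyword", "p_log": 0.5, "reward": 0.0},
    {"question": "q2", "hint": "worked",  "p_log": 0.5, "reward": 1.0},
]

def candidate_policy(question, hint):
    """Probability that the new policy would show this hint (toy example)."""
    preferred = {"q1": "keyword", "q2": "worked"}
    return 1.0 if hint == preferred[question] else 0.0

def ips_value(log, policy):
    """IPS estimate of the candidate policy's expected reward."""
    return sum(r["reward"] * policy(r["question"], r["hint"]) / r["p_log"]
               for r in log) / len(log)

# Prints 1.0: this candidate always picks the hint that earned the reward.
print(ips_value(log, candidate_policy))
```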
After everything was in place, we deployed the assistance policy trained using this reward function inside the platform and conducted an A/B evaluation, which verified significant improvements in student learning outcomes [5]. The reinforcement learning approach equipped the ITS with the ability to automatically refine its teaching strategies by revising design choices based on student interaction data. The related software infrastructure also enabled CK-12's instructional designers to analyze how the learning materials they created impact the learning process. They can now refer to examples of effective hints and identify questions where content can be revised. To further enhance assistance within adaptive practice, we have started employing causal machine learning techniques to better understand which types of hints are most beneficial for different student groups. Additionally, we are developing algorithms capable of assessing the effectiveness of hints based on textual features by leveraging existing log data [6], thereby minimizing the need for future live content evaluations.
Generative AI as an Engine for Future Learning Technologies
In the context of tutoring systems, generative AI serves as an enabling technology that addresses several long-standing challenges. Firstly, creating one hour of ITS content can take human experts hundreds of hours, depending on the depth of instructional design and the authoring tools used [7]. This significant investment typically restricts ITSs to core subject areas and key demographic groups. By accelerating system authoring, potentially by an order of magnitude, generative AI can increase the range of topics covered and the diversity of learners served. Secondly, previous NLP techniques often limited ITSs' ability to fully comprehend learners' textual inputs and to meet their personal needs, for example by answering questions or clarifying misconceptions [8]. In this third case study, we illustrate how generative AI is fundamentally redefining the way tutoring systems operate, drawing from our experience designing and evaluating Ruffle&Riley [9].
Ruffle&Riley represents a novel type of tutoring system that enables instructional designers to realize tutoring experiences by building on the foundations provided by large language models (LLMs). This begins by assisting educators in defining what they want to teach. The system uses an AI-assisted content authoring process that automatically generates a tutoring script from existing lesson texts. This script consists of a series of guiding questions, each with discussion points aligned to the learning objectives, and can be easily edited by instructional designers to meet their specific needs. The system then orchestrates a free-form conversational learning activity that centers on the topics specified in the tutoring script in a learning-by-teaching format, as shown in Figure 3. In this format, the learner engages with two LLM-based agents: Ruffle, who acts as a student, and Riley, who acts as a professor. The learner's task is to teach the topics in the script to Ruffle, with guidance and support from Riley.
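A minimal sketch of this authoring step might look as follows; here `call_llm` is a placeholder for whatever chat-completion API is used, and the prompt wording is an illustration rather than the actual Ruffle&Riley prompt.

```python
# `call_llm` stands in for any chat-completion API; the prompt wording
# is our own illustration, not the actual Ruffle&Riley prompt.

AUTHORING_PROMPT = """You are an instructional designer.
Read the lesson text below and produce a tutoring script as JSON:
a list of guiding questions, each with two to four expected discussion
points aligned with the lesson's learning objectives.

Lesson text:
{lesson_text}
"""

def generate_tutoring_script(lesson_text, call_llm):
    """Return a draft script that designers can then review and edit."""
    return call_llm(AUTHORING_PROMPT.format(lesson_text=lesson_text))
```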
Ruffle&Riley emulates human tutors via a strategy known as expectation misconception tailoring (EMT) [8]. Ruffle (the student agent) initiates the interaction by asking the learner to explain a guiding question. Based on the learner's response, it then continues to pose follow-up questions until all key points expected in the tutoring script are covered. Concurrently, Riley (the professor agent) monitors the learner's responses for misconceptions and offers targeted clarifications. This workflow requires the tutoring system to understand and respond to student inputs within context. Traditionally, such systems needed domain experts to predefine dialogue moves and conversational turn management, which constrained them to predefined teaching scenarios. In contrast, Ruffle&Riley automates these processes using LLMs. It integrates both the tutoring script and conversational strategies into the agents' prompts, employing the LLMs to translate the instructional content into a dynamic learning workflow based on the instructional design choices specified in the prompts.
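The sketch below shows how such a workflow can be wired up around two prompted agents. The prompt texts and the `call_llm(system, history)` signature are again illustrative assumptions, not the system's actual implementation.

```python
# Two prompted agents sharing one conversation history; prompt texts and
# the call_llm(system, history) signature are illustrative assumptions.

RUFFLE_SYSTEM = """You are Ruffle, a curious student. Ask the user to
explain the current guiding question, then pose follow-up questions
until every expected discussion point below has been covered:
{script_item}"""

RILEY_SYSTEM = """You are Riley, a professor. Check the user's latest
explanation for misconceptions relative to the lesson text below and,
only if needed, offer a brief clarification grounded in that text:
{lesson_text}"""

def tutoring_turn(script_item, lesson_text, history, user_msg, call_llm):
    """One conversational turn of the EMT-style workflow."""
    history = history + [("user", user_msg)]
    # Ruffle drives the dialogue toward the uncovered discussion points.
    ruffle = call_llm(RUFFLE_SYSTEM.format(script_item=script_item), history)
    # Riley monitors the same history and may add a clarification.
    riley = call_llm(RILEY_SYSTEM.format(lesson_text=lesson_text), history)
    return ruffle, riley
```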
In our user studies, participants using Ruffle&Riley reported high levels of engagement. They found the system helpful and enjoyable, with some dialogues spanning up to 50 turns. This highlights the underlying GPT-4 model's ability to facilitate coherent and extended conversational tutoring. Numerous open questions remain regarding the optimal use of generative AI in educational settings, particularly concerning the accuracy and reliability of explanations provided by LLM-based agents. In Ruffle&Riley, we addressed this by ensuring the agent's responses remain within the confines of the current lesson text. Looking ahead, AI-based authoring tools are expected to enable educators and researchers to focus more on refining instructional design, rather than on content creation and technical execution. This shift has the potential to accelerate the evaluation of instructional design principles, leading to enhancements in ITSs and broader insights for the field of learning science.
Reflecting on these three case studies, we explored how machine learning and generative AI can enhance ITSs by personalizing and adapting learning experiences. The first case study highlighted the use of supervised learning to develop predictive models that generate insights into the human learning process (e.g., by assessing student proficiency across multiple skills). Tutoring systems leverage these insights to implement pedagogical strategies, such as selecting practice questions at the appropriate difficulty level. In the second case study, we examined how reinforcement learning leverages usage data to help domain experts make data-driven design decisions, like determining the most effective hints for practice questions, thereby refining instructional strategies automatically over time. The third case study introduced a recent ITS that uses LLMs for AI-assisted content authoring and facilitating free-form conversational tutoring. This demonstrated how foundation models can create personalized content on-demand, enabling tutoring systems to operate outside the boundaries of content predefined by instructional designers.
As we move forward, it is evident that we are entering a new era for online tutoring systems. Educational platforms such as CK-12, Khan Academy, and Duolingo now offer AI-driven conversational learning experiences, making advanced technologies accessible to millions of learners worldwide. Furthermore, machine learning enables tutoring systems to refine their instructional methods and content continuously based on student interactions and feedback. While these advancements promise to enhance the effectiveness of online learning, it is crucial to keep ethical considerations in focus and to examine how these technologies affect students' lives. Ensuring that these technologies are developed and implemented with a strong ethical framework and a deep understanding of the unique demands of educational environments is essential, as it will help safeguard the integrity of educational practices and foster a trustworthy environment for learners.
[1] VanLehn, K. The relative effectiveness of human tutoring, intelligent tutoring systems, and other tutoring systems. Educational Psychologist 46, 4 (2011), 197–221.
[2] Koedinger, K. R. et al. Instructional complexity and the science to constrain it. Science 342, 6161 (2013), 935–937.
[3] Schmucker, R. et al. Assessing the performance of online students - new data, new approaches, improved accuracy. Journal of Educational Data Mining 14, 1 (2022), 1–45.
[4] Schmucker, R. and Mitchell, T. M. Transferable student performance modeling for intelligent tutoring systems. In Proceedings of the International Conference on Computers in Education (ICCE '22). APSCE, 2022, 13–23.
[5] Schmucker, R. et al. Learning to give useful hints: Assistance action evaluation and policy improvements. In Responsive and Sustainable Educational Futures: 18th European Conference on Technology Enhanced Learning (EC-TEL 2023). Proceedings. Springer-Verlag, Berlin, Heidelberg, 2023, 383–398.
[6] Zhang, T. et al. Learning to compare hints: Combining insights from student logs and large language models. In Proceedings of the 2024 AAAI Conference on Artificial Intelligence. PMLR 257, 2024, 162–169.
[7] Aleven, V. et al. Example-tracing tutors: Intelligent tutor development for non-programmers. International Journal of Artificial Intelligence in Education 26 (2016), 224–269.
[8] Nye, B. D. et al. AutoTutor and family: A review of 17 years of natural language tutoring. International Journal of Artificial Intelligence in Education 24 (2014), 427–469.
[9] Schmucker, R. et al. Ruffle&Riley: Insights from designing and evaluating a large language model-based conversational tutoring system. In Olney, A. M. et al. (eds) Artificial Intelligence in Education. AIED 2024. Lecture Notes in Computer Science (LNAI), vol 14829. Springer, Cham, 2024; https://doi.org/10.1007/978-3-031-64302-6_6.
Robin Schmucker is a Ph.D. candidate in the Machine Learning Department at Carnegie Mellon University, advised by Prof. Tom Mitchell. His research focuses on machine learning for education, particularly in the context of large-scale online learning systems. At the CK-12 Foundation, his algorithms for student knowledge modeling and content selection serve millions of learners worldwide.
Figure 1. Intelligent tutoring systems (ITSs) provide personalized learning experiences by adaptively selecting learning activities and supporting users in the learning process.
Figure 2. Machine learning can help us make better instructional design choices, for example, by determining which exact hint among multiple candidates is best for a particular practice question.
Figure 3. Examples of LLM-driven workflows inside Ruffle&Riley. Users are asked to teach Ruffle (the student agent) in a free-form conversation while receiving support from Riley (the professor agent).
This work is licensed under a Creative Commons Attribution International 4.0 License.