Ethics, Governance, and User Mental Models for Large Language Models in Computing Education

By Kyrie Zhixuan Zhou, Zachary Kilhoffer, Madelyn Rose Sanfilippo, Ted Underwood, Ece Gumusel, Mengyi Wei, Abhinav Choudhry, and Jinjun Xiong


Tags: Artificial intelligence, Computer-assisted instruction, Management of computing and information systems, Surveys and overviews


Large language models (LLMs) have advanced significantly in recent years, leading to widespread adoption of the powerful products they enable. As of 2024, millions of people use LLM-powered chatbots, like OpenAI's ChatGPT and Baidu's Ernie, for various purposes. LLMs can interpret and generate natural and computer languages to assist with cognitively demanding tasks, offering creativity support, task automation, and additional functionalities. But what does this mean for education?

The education space is rich with potential LLM use cases, leading to an emerging body of literature on the use of LLMs in higher education. Research suggests LLMs can help engage students, facilitate collaboration, and personalize learning experiences. Researchers have highlighted these and other benefits, as well as risks, of LLMs in education.

The ethics of LLMs in computer science education and related computing disciplines are notable for several reasons. First, many students in computing disciplines will work on LLMs and related products and serve as decision-makers in various industries in the future. Second, students in computing disciplines can use two essential capabilities LLMs offer: writing text and writing code. This is a profound innovation with significant effects on coding education. Traditionally, early computing classes focus on coding assignments, but LLMs offer a way for students to generate code without learning to write it, comprehending its meaning, or relying on other people. Third, students in computing disciplines are better positioned to understand how LLMs function, which may have interesting impacts on their LLM usage and ethical concerns.

Existing research on LLMs in computing education concerns both practical and ethical aspects. It remains unclear if and how novice programmers validate LLM outputs, which is essential for effective usage. Lau and Guo, therefore, emphasized the need to build a theory of how artificial intelligence (AI) novices construct mental models on LLMs [1]. Further gaps include: 1. how computing curricula can use LLMs toward greater equity and access, rather than furthering a digital divide; 2. how introductory computing curricula and assessment thereof may evolve, given the ease of doing certain types of coursework with LLMs; and 3. what type of policies are desirable to govern LLM usage in computing education.

To bridge these research gaps, we interviewed 20 computing education stakeholders, including four undergraduate students (U1–U4), four master's students (M1–M4), four doctoral students (D1–D4), four professors (P1–P4), and four industrial practitioners (I1–I4). The students and teachers were mostly in computer science or information science. Participants shared their experiences and opinions on LLMs, in particular, ChatGPT, in computing education. Based on these, we extracted mental models of ChatGPT use in terms of a coding tool, a writing tool, and an information tool. Our participants held nuanced and intertwined ethical concerns and overwhelmingly supported explicit, but permissive, LLM policies.

ChatGPT Use and Experience

Our participants used LLMs, mostly ChatGPT-3.5, for writing (N=16), Q&A (N=15), and coding (N=14) in diverse academic and industrial settings. Other LLMs, like those powering Baidu's Ernie or Microsoft's Copilot, were mostly mentioned as comparisons. The participants understood the utility and ethics of LLMs through the lens of their experience with ChatGPT.

Many participants perceived ChatGPT positively as "the great invention of humans" (U2), an efficiency booster, and a good tool for education. Compared to other AI tools and expert systems that our participants had used, ChatGPT had multiple advantages. ChatGPT users enjoyed more autonomy and control compared to other AI tools, according to U3, who added: "It's an active use of AI. You're taking control. Other products provide AI-generated things for you, and you can use them in no other way." I1 compared the general-purpose ChatGPT to other specific-purpose AI tools such as TikTok filters: "[Other AI tools are for] specific usage and can't be asked questions. ChatGPT is a universal, general agent. You can ask ChatGPT everything." I3 and I4 valued the personalization ChatGPT affords compared to other AI implementations. Participants learned how to write better prompts to make the best use of ChatGPT's powerful Q&A and feedback capability.
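The prompt refinement participants described lends itself to a short illustration. The following Python sketch, a minimal example written against the official openai client (v1.x), sends a vague prompt and a more specific one so the feedback quality can be compared; the model name, prompts, and helper function are our illustrative assumptions, not anything a participant reported.

    # A minimal sketch of prompt refinement, assuming the openai Python client
    # (v1.x) and an OPENAI_API_KEY in the environment. Prompts and model name
    # are illustrative assumptions, not from the study.
    from openai import OpenAI

    client = OpenAI()

    def ask(prompt: str) -> str:
        """Send a single user prompt and return the model's reply."""
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",  # participants mostly used ChatGPT-3.5
            messages=[{"role": "user", "content": prompt}],
        )
        return response.choices[0].message.content

    # A vague prompt versus a more specific one asking for targeted feedback.
    vague = "Is my sorting code good?"
    specific = (
        "Review this Python insertion sort for correctness and time complexity, "
        "and suggest one test case that could expose an off-by-one error:\n"
        "def insertion_sort(xs):\n"
        "    for i in range(1, len(xs)):\n"
        "        j = i\n"
        "        while j > 0 and xs[j - 1] > xs[j]:\n"
        "            xs[j - 1], xs[j] = xs[j], xs[j - 1]\n"
        "            j -= 1\n"
        "    return xs"
    )

    print(ask(vague))
    print(ask(specific))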

Disadvantages of ChatGPT included token limits (M1, I4: "Claude can handle longer text [compared to ChatGPT]"), generating hallucinations (P3), deepfake risks (U4), lack of integration with search engines (D1: "Copilot is better than ChatGPT in coding as it's combined with search engines"), and being closed-source (D4: "CLIP is trainable and fine-tuning is affordable"). Participants typically took ChatGPT outputs with a grain of salt. They applied their own critical thinking to learn, ensure the programs were correct, or prevent essays from being identified as ChatGPT-generated. Notably, some disadvantages of ChatGPT, such as hallucinations and deepfake risks, are shared across LLMs.

Ethical Concerns

Inaccuracy. Twelve participants recalled inaccurate responses or factual errors when interacting with ChatGPT. Repeated incorrect answers upset U3. D3 estimated an accuracy rate of 60–70%, so she always needed to use her judgment. D1 similarly gave an accuracy rate of 60%, but both acknowledged ChatGPT was more accurate than search engines (30–40% accuracy, D1). P2 attributed inaccurate answers to the fact that ChatGPT always confidently generated an answer and never said, "I don't know." When D1 tested ChatGPT on historical knowledge, such as historical Chinese figures, it often erred. ChatGPT may generate code with bugs or misunderstand user instructions. The inaccuracy of ChatGPT responses may naturally lead to low student grades on assignments and essays (U4). P2 thought professors should communicate with students to help them understand this issue and make good use of ChatGPT.

Hallucination. When LLMs output false information as if it were true, this is called a hallucination. Hallucinations were prevalent in our participants' ChatGPT use, especially when ChatGPT responded to domain-specific questions. Five participants discussed hallucination in code or algorithm generation, e.g., suggesting non-existent functions and libraries (I1), making up HTML elements (I3), and using non-existent figure plotting packages (D3). Eleven participants complained about hallucinated references from ChatGPT, which hurt its utility as a literature review tool. P3 suggested other customized LLMs that give better references. P1 suggested ChatGPT may help with checking references instead of constructing false references if used wisely. P2 took an ontological approach, teaching students to think about what truth is—this has "become a more important digital literacy skill in the LLM era."
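One lightweight precaution against hallucinated packages and functions, in the spirit of the verification our participants described, is to check that an LLM-suggested module and attribute actually exist before running generated code. The Python sketch below uses the standard importlib machinery; the suggested names are hypothetical examples of hallucinations, not ones our participants reported verbatim.

    # A minimal sketch: verify that LLM-suggested packages and functions exist
    # before trusting generated code. The suggestions below are hypothetical.
    import importlib
    import importlib.util

    def exists(module_name, attr=None):
        """Return True if the module (and, optionally, an attribute on it) exists."""
        try:
            if importlib.util.find_spec(module_name) is None:
                return False
        except ModuleNotFoundError:
            return False  # a parent package in a dotted name is missing
        if attr is None:
            return True
        module = importlib.import_module(module_name)
        return hasattr(module, attr)

    # Suppose an LLM suggested these names for plotting a figure:
    suggestions = [
        ("matplotlib.pyplot", "plot"),       # real, if matplotlib is installed
        ("matplotlib.pyplot", "autochart"),  # hallucinated attribute
        ("plotmagic", None),                 # hallucinated package
    ]

    for module_name, attr in suggestions:
        label = f"{module_name}.{attr}" if attr else module_name
        status = "ok" if exists(module_name, attr) else "NOT FOUND (possible hallucination)"
        print(f"{label}: {status}")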

Bias. Nine participants were not worried about bias carried by ChatGPT. U2 observed ChatGPT was good at avoiding answering potentially biased questions. Often, it gave a disclaimer: "I'm just a model…." Most participants felt bias in ChatGPT may impose stereotypes or Western-centric views but contextualized the risk. I1 thought university students could understand and recognize bias, but it was more of a concern for younger students. M4 acknowledged ChatGPT's bias but thought it displayed bias less than search engines and many of his professors. He liked the safety feature of ChatGPT, which allows people to report harmful responses. Some participants proposed countermeasures (N=5) to mitigate the impact of biased responses. M3 tried to prevent bias by prompting ChatGPT appropriately, e.g., explicitly asking ChatGPT not to treat different genders or races in a stereotypical way. P1 actively taught about bias in her class, for example, asking students to discuss gendered language or what a computer programmer looks like. Her thinking was that students would be equipped to identify bias in LLMs. P3 suggested not using AI in critical applications such as hiring.
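M3's countermeasure can be made concrete with a system message. The sketch below shows one hypothetical phrasing, again assuming the openai client; the instruction wording and model name are our assumptions, not M3's actual prompt.

    # A hypothetical sketch of M3's countermeasure: a system message asking the
    # model to avoid stereotyped outputs. Wording and model name are assumed.
    from openai import OpenAI

    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system",
             "content": ("When examples involve people, do not assume gender, "
                         "race, or nationality, and avoid stereotyped roles.")},
            {"role": "user",
             "content": "Write a short vignette about a programmer's workday."},
        ],
    )
    print(response.choices[0].message.content)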

Privacy. People had varied levels of concern regarding privacy leakage when interacting with ChatGPT; four participants were not concerned at all. Some participants viewed the privacy issue contextually. P1 thought the privacy issue was not as severe as other LLM ethics issues: "Google has tons of user information. We don't know how it's used, but we still want to use free Gmail." I4 criticized OpenAI's lack of transparency, stating that dark patterns tricked people into giving their data for training ChatGPT: "It's not legal to use your data for training, but they make it look legal by collecting your data in an unnoticeable manner." Specific concerns included privacy leakages, privacy-preserving practices, third-party data flows in the use of ChatGPT, and inequities that might exacerbate digital privacy divides.


Universities should prepare students for the LLM era and, in the meantime, adopt necessary regulations and restrictions to help them learn.


Academic integrity. Nine participants raised cheating as a concern for academic integrity because it may hurt students' learning efforts. U3 acknowledged: "Sometimes I hit ChatGPT with all the questions without thinking by myself. I do this even if I have AI literacy. I'm not learning from the problem-solving process." A lack of shared understanding of what constituted cheating, knowledge, or education further complicated the issue. I1 illustrated this point, "How do you define you're studying? What is knowledge? Before, we learned and memorized knowledge from previous people. Now we just ask ChatGPT. I don't know if students learn less, but more efficiently for sure." Six participants believed cheating only occurred when ChatGPT was misused; otherwise, it was a good learning tool. U4 suggested using detection tools to curb ChatGPT's strong generative power in helping students cheat in exams. However, three other participants mentioned potential errors and biases introduced by detection tools for AI-generated content. U1 argued translated content was more likely to be classified as AI-generated, leading to more false positives for non-native English speakers who used translation tools for their assignments.

LLM Usage, Ethics, and User Mental Models

Mental models are "the subjective and hypothetical mental representation that integrates an individual's memory, knowledge, perception, and assumptions of the target system" [2]. To unite LLM usage patterns with ethics, thinking about user mental models can be useful. Our participants perceived ChatGPT as a writing tool, a coding tool, and an information tool. Each mental model for ChatGPT has specific use cases, possible alternatives (e.g., Copilot for coding), ethical implications, and practical precautions, as shown in Figure 1.

Mental Model 1: LLMs as a writing tool. Most participants (N=16) used LLMs, in particular ChatGPT, as a writing tool to write new texts or improve existing ones. While most students claimed not to use ChatGPT to write essays or assignments per se, ChatGPT still helped write emails, draft "first try" paragraphs in academic essays, and paraphrase prompts into academic style. Before LLMs were deployed as convenient chatbots, tools existed for translation and grammar/spelling checking, but no comparable tools could draft original text. A few participants who were not native English speakers used ChatGPT more heavily for proofreading and also for translation tasks. Participants generally expressed that, as a writing tool, ChatGPT posed the greatest risk to academic integrity, requiring very precise definitions of plagiarism (which are typically lacking).

Mental Model 2: LLMs as a coding tool. Most participants (N=14) used ChatGPT to assist them in coding tasks. Academic integrity was less of an issue here, as code plagiarism has different standards than writing (e.g., copying a function from StackOverflow is not usually problematic) [3]. However, participants emphasized the possible tradeoff between "doing coding" and "learning coding" for beginners. Additionally, many emphasized the responsibility of coders to understand and test code, especially before deploying it in sensitive contexts.
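The responsibility to understand and test generated code can be discharged with ordinary unit tests. The sketch below pairs a hypothetical, deliberately buggy function of the kind an LLM might produce (not taken from the study) with a test that checks it against a trusted oracle; the failing cases expose the off-by-one error before the code is ever deployed.

    import unittest

    # Hypothetical LLM-generated function with a deliberate off-by-one bug:
    # because the loop condition is `lo < hi` rather than `lo <= hi`, the
    # last remaining candidate is never checked.
    def binary_search(xs, target):
        lo, hi = 0, len(xs) - 1
        while lo < hi:  # bug: should be lo <= hi
            mid = (lo + hi) // 2
            if xs[mid] == target:
                return mid
            elif xs[mid] < target:
                lo = mid + 1
            else:
                hi = mid - 1
        return -1

    class TestGeneratedCode(unittest.TestCase):
        def test_against_oracle(self):
            # list.index serves as a trusted oracle on sorted, distinct inputs.
            cases = [([5], 5), ([1, 3, 5, 7], 7), ([1, 3, 5, 7], 4), ([], 2)]
            for xs, target in cases:
                expected = xs.index(target) if target in xs else -1
                self.assertEqual(binary_search(xs, target), expected)

    if __name__ == "__main__":
        unittest.main()  # fails on ([5], 5) and ([1, 3, 5, 7], 7)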

Mental Model 3: LLMs as an information tool. Our participants (N=15) also used ChatGPT for information retrieval and summarization. Many participants discussed the advantages and disadvantages of ChatGPT for this use and shared when ChatGPT works better for them than Wikipedia, Google, and other information tools. Here, ChatGPT is less of a novelty than in writing or coding, but it was a useful alternative to search engines or asking a human. For example, treating ChatGPT as a Q&A bot was helpful in quickly summarizing an unfamiliar topic or retrieving a good definition. A few participants also used ChatGPT for tasks where it was likely to fail, such as accurately sourcing information for references or literature reviews.

Policy and Governance of LLMs in CS Education

Don't ban LLMs, regulate them. All participants agreed that LLMs should not be banned, as they were useful for learning and were prevalent in the workplace. Universities should prepare students for the LLM era and, in the meantime, adopt necessary regulations and restrictions to help them learn. U1 gave an example of how LLMs helped students learn to code: "Professors should not ban AI uses. AI is quite useful because it can generate code correctly. It's useful for students to learn to code. Professors may not have enough time to teach it in class." Banning was also viewed as impractical.

Another consensus was that students should not use LLMs for everything and that restrictions should be implemented. M3 leaned toward more restrictive LLM policies given the "lazy nature of human beings." M2 mentioned an incident in a co-op course preparing students for internships and full-time jobs, where students used ChatGPT to write their cover letters and emails intended for hiring managers even after the teacher had warned against such use. M1 thought freshmen and entry-level students should use ChatGPT less in their coursework; otherwise, they would lose the chance to learn to code.

According to our participants, university policies regarding LLM use should be flexible and contextual. D4 believed LLM use in different scenarios needed to be treated differently—using LLMs to answer homework questions was not acceptable while using them to refine reports was fine as long as the students highlighted the parts generated by LLMs. P1 added that LLMs were new, and more observations were needed to inform an effective policy.

There was no consensus regarding who was responsible for ensuring students use LLMs ethically. Some thought students held responsibility. For example, P1 said students were responsible as it was their own learning process and, ultimately, students decide how to learn with integrity. Five participants put the responsibility on universities or departments. I1 thought departments should start from a high level by guiding teachers and students, since students and teachers were confused about what responsible use of LLMs would look like. Five participants agreed it was the joint responsibility of students, professors, and universities to ensure students use LLMs ethically.


Any education policy on LLMs needs to clarify the boundary between acceptable and unacceptable LLM usage as precisely and specifically as possible.


In the LLM era, we must rethink education, assessment, and academic integrity. Reconstructing assessments and assignments was a commonly expressed solution. I2, a teaching assistant, tended to ask students to write essays that LLMs could not generate. She would ask students to write about specific points, e.g., how a particular argument relates to a particular lecture, information LLMs would not have access to. U2 would like to see changes to class syllabi, with a new focus on innovation instead of memorization to prepare students for a future where AI is everywhere. U4 thought the curriculum needed to be updated and made more challenging. Correspondingly, D2 wanted to see more developed, constructive policies and wording regarding what constitutes plagiarism when using LLM-generated content.

Proactive LLM education for students and professors. Instead of regulating LLM use, more participants wanted to see proactive education to teach students how to use LLMs ethically and responsibly. P3 thought tech ethics was always a conversation to be had and should be integrated from Day 1. She told us how she would convey the message to students in her course, "In my class, I'll tell my students, 'You're gonna use it [ChatGPT]. ChatGPT is a baseline, and you need to make your answers better than that.'" From a student's perspective, M4 suggested having courses on how to use LLMs. He thought how to use LLMs and eliminating their misuse were important parts of digital literacy in the LLM era. Though P3 had not seen university rules or professors prohibiting LLMs, she thought universities should be responsible for delivering ethics courses to teach and educate students to make their own decisions. Professors should be educated about LLMs in a way that allows them to realize their benefits for student learning (D3).

Discussion and Takeaways

Ethics of LLMs in computing education. The participants' views on ethical concerns raised in the literature confirmed the relevance of hallucination, bias, inaccuracy, etc., for LLMs in computing education. Additionally, our participants raised diverse concerns, ranging from potential monopoly to the energy consumption of training LLMs. We find evidence of a complex interaction between LLM usage patterns and ethical concerns. For example, our participants commonly opined that novice coders would not learn if they relied on ChatGPT's generative power, which could in turn result in deploying unethical code. This means the tension between LLMs as a learning tool and LLMs as a coding tool also implicates accountability and responsibility. Other remaining challenges include inaccurate or biased detection tools and a lack of shared understanding of plagiarism, educational assessment, and knowledge formation.

Almost unanimously, the participants believed critical thinking and verification are required to combat inaccuracy and hallucination. Further, students should be held responsible for LLM-generated content, whether text or code, echoing previous research [4]. The influence of anticipated guilt, detection probability, sanction severity, and personal reputational risk on students' ethical use of LLMs [5] was supported by our sample.

It is important to regulate ethical risks embedded in LLMs that do not exclusively pertain to educational or academic contexts, e.g., privacy. Our participants worried about the privacy of students, research participants, or clients even when they were desensitized to risks to their own privacy. Professors refused to feed student assignments into LLMs due to concerns about student privacy, copyright, and consent. Students worried that data breaches would reveal their LLM use, suggesting an interaction between privacy and academic integrity concerns. We find support for a privacy gap that could widen with time (the paid version of ChatGPT affords better privacy protection), threatening equitable educational opportunities and inviting further digital divides.

Adjusting policy and education. LLMs have brought two major challenges to computing education: First, they might impact the effectiveness of student learning, and second, they might make assessing students much harder for teachers. However, instead of banning LLMs, which is impractical and likely to fail, our study suggests universities should prepare students for the LLM era and, in the meantime, adjust policy and education to help students learn.

We should think about what could be considered ethical, reasonable use of LLMs in education. One way is to look at earlier technological advances as precedents, like how math curricula adapted to the spread of calculators [6]. Any education policy on LLMs needs to clarify the boundary between acceptable and unacceptable LLM usage as precisely and specifically as possible. Based on our study, acceptable usage presumably involves students acknowledging that they used an LLM and citing it; students should not present an LLM's work as their own. Much trickier is defining what uses are acceptable in a computer science context, as reusing open-source code has become a norm. Failing to clarify LLM usage in computing education will have several consequences. First, some students who could benefit from using LLMs will not, for fear of violating rules of academic integrity. This may be especially problematic for international and EFL (English as a foreign language) students. Second, well-meaning students who use LLMs risk accidentally violating academic integrity rules. Finally, with the prevalent use of LLMs, curricula, especially for entry-level computer science courses, may remain poorly suited for assessing student learning outcomes, hurting students' learning and fairness.

We need to adjust approaches to policy and education by:

  • Developing new and responsive sociotechnical governance for LLMs that extend beyond privacy regulation.
  • Reconceiving educational policy beyond one-size-fits-all approaches, moving toward flexible, adaptive, and contextual educational policy.
  • Building accountability measures to ensure those responsible for the development, deployment, and control of LLMs are engaged in governance and impacted by enforcement measures associated with emergent governance.
  • Revitalizing education to intentionally transform engagement between learners and teachers amidst these new technologies and embed responsible computing literacy within education.

References

[1] Lau, S. and Guo, P. From "ban it till we understand it" to "resistance is futile": How university programming instructors plan to adapt as more students use AI code generation and explanation tools such as ChatGPT and GitHub Copilot. In Proceedings of the 2023 ACM Conference on International Computing Education Research - Volume 1 (ICER '23). ACM, New York, 2023, 106–121; https://doi.org/10.1145/3568813.3600138.

[2] Hu, X. and Twidale, M. A scoping review of mental model research in HCI from 2010 to 2021. In HCI International 2023 – Late Breaking Papers: 25th International Conference on Human-Computer Interaction, HCII 2023, Copenhagen, Denmark, July 23–28, 2023, Proceedings, Part I. Springer-Verlag, Berlin, Heidelberg, 2023, 101–125; https://doi.org/10.1007/978-3-031-48038-6_7.

[3] Lakhani, K. R. and Wolf, R. G. Why hackers do what they do: Understanding motivation and effort in free/open source software projects. Sept. 15, 2003. SSRN; https://ssrn.com/abstract=443040.

[4] Laato, S., Morschheuser, B., Hamari, J., and Björne, J. AI-assisted learning with ChatGPT and large language models: Implications for higher education. In 2023 IEEE International Conference on Advanced Learning Technologies (ICALT). IEEE, 2023, 226–230.

[5] Florentin, G. F. N., Niu, B., and Eivazinezhad, S. Generative conversational AI and academic integrity: A mixed method investigation to understand the ethical use of LLM chatbots in higher education. August 22, 2023. SSRN; https://ssrn.com/abstract=4548263.

[6] Waits, B. and Pomerantz, H. The role of calculators in math education. Department of Mathematics of The Ohio State University, 1997, 39–43.

Authors

Kyrie Zhixuan Zhou is a Ph.D. candidate in the School of Information Sciences at the University of Illinois at Urbana-Champaign. His research interests are broadly in tech accessibility, tech ethics, and tech education. He aspires to design, govern, and teach about ICT/AI experience for vulnerable populations.

Zachary Kilhoffer is a tech policy researcher and Ph.D. candidate at the University of Illinois at Urbana-Champaign. With his background in governance and technical ML studies, Kilhoffer aims to standardize development and deployment procedures to make AI systems more fair, accountable, transparent, and ethical. In his free time, Kilhoffer likes woodworking, sci-fi, and spending time with his cats Theodore Roosevelt (Teddy) and Franklin Delano Roosevelt (Frankie).

Madelyn Rose Sanfilippo is an assistant professor of information sciences at the University of Illinois at Urbana-Champaign and an expert on data governance, technology policy, and privacy. She is a Public Voices Fellow with The OpEd Project and a general editor for Cambridge Studies on Governing Knowledge Commons.

Ted Underwood is a professor of information sciences and English at the University of Illinois at Urbana-Champaign. He has written extensively about the interaction of machine learning and literary culture.

Ece Gumusel is a Ph.D. candidate in information science with a minor in computer science at Indiana University, Bloomington. Her doctoral research focuses on user privacy dynamics in conversational text-based AI chatbots through mixed-method approaches. Her research agenda encompasses areas such as usable privacy, privacy compliance, human-computer interaction, and social informatics. Prior to pursuing her Ph.D., she earned her LL.M. (Master of Laws) in intellectual property and technology law at the University of Illinois at Urbana-Champaign, and her LL.B. (Bachelor of Laws) at Başkent University.

Mengyi Wei is a Ph.D. candidate in cartography and visualization analytics at the Technical University of Munich and is a member of the Doctoral Training Network (DTN) of EIT Urban Mobility. Her work focuses on the social impact of AI ethics, aiming to help the public better understand and address AI ethics issues through visual analytics.

Abhinav Choudhry has a bachelor's in computer science engineering from Rajiv Gandhi Technological University, a master's in finance from Delhi University, and a master's in environmental science and policy from Cornell University, and is pursuing a Ph.D. in information sciences at the University of Illinois Urbana-Champaign. He has worked in financial services with Citibank in mortgages and as an associate researcher with the Tata-Cornell Institute at Cornell University, and he has diverse project and internship experience in food, agricultural finance, biodiversity conservation, and renewable energy. His current research covers AI for speech-language therapy; financial literacy and planning for older adults; conversational AI-based mHealth for older adults, cancer survivors, and those with chronic diseases; and changes in information search behavior induced by generative AI.

Dr. Jinjun Xiong is the Empire Innovation Professor in the Department of Computer Science and Engineering at the University at Buffalo (UB). He serves as the scientific director of the National AI Institute for Exceptional Education, which aims to advance foundational AI technologies for applications that help children with speech and language processing challenges. Dr. Xiong is also the director of the SUNY-UB Institute for Artificial Intelligence and Data Science. His research interests cover across-stack AI systems research, including AI applications, algorithms, tooling, and computer architectures. Many of his research results have been adopted in IBM's products and tools. He has published more than 160 peer-reviewed papers at top AI and systems conferences; his publications have won eight Best Paper Awards and received nine Best Paper Award nominations.

Figures

Figure 1. LLM user mental models in computing education. (Missing bullets in the figure indicate that the mental model does not relate to the ethical concern.)


This work is licensed under a Creative Commons Attribution 4.0 International License.

The Digital Library is published by the Association for Computing Machinery. Copyright © 2024 ACM, Inc.