Generative AI's Impact on Student Performance
Generative AI's Impact on Student Performance
Johannes Voshaara
ORCID: https://orcid.org/0000-0003-0276-4509
LinkedIn: https://www.linkedin.com/in/johannes-voshaar-431aa6201
Jochen Zimmermanna
ORCID: https://orcid.org/0000-0002-1189-7007
a
Faculty of Business Studies and Economics, University of Bremen, Bremen,
Germany
This version: May 2024
Abstract
This study evaluates the impact of students’ usage of generative artificial intelligence
(GenAI) tools such as ChatGPT on their academic performance. We analyze student essays
using GenAI detection systems to identify GenAI users among the cohort. Employing
multivariate regression analysis, we find that students using GenAI tools score on average
6.71 (out of 100) points lower than non-users. While GenAI tools may offer benefits for
learning and engagement, the way students actually use it correlates with diminished
academic outcomes. Exploring the underlying mechanism, additional analyses show that the
effect is particularly detrimental to students with high learning potential, suggesting an effect
whereby GenAI tool usage hinders learning. Our findings provide important empirical
evidence for the ongoing debate on the integration of GenAI in higher education and
underscores the necessity for educators, institutions, and policymakers to carefully consider
its implications for student performance.
The launch of OpenAI’s user-friendly and conversational ChatGPT in November 2022 made
generative artificial intelligence (GenAI) models widely accessible to a broad audience, regardless
of technical proficiency (Kishore et al. 2023). ChatGPT can process and generate natural language
question-answering, and machine translation (Lee 2023; Zhu et al. 2023). These novel
characteristics led to a tremendous surge in public attention, with over 100 million monthly active
users two months after its launch (Ahangama 2023; Gregor 2024). The launch spurred heated
discussions on the implications of GenAI across various sectors of society. Researchers and media
voiced concern about its potential for spreading misinformation, undermining trust (Hsu and
Thompson 2023), and threatening democratic processes and social cohesion (Ferrara 2024).
The popularity of ChatGPT among students has led to extensive debate over what role
GenAI applications should play in higher education (Abdaljaleel et al. 2024; Katavic et al. 2023;
Ngo 2023; Strzelecki 2023; Tiwari et al. 2023). GenAI tools offer considerable benefits such as
enabling personalized learning and adaptive instruction, enhancing learning efficiency and student
engagement, as well as providing intelligent tutoring systems including real-time feedback, hints,
and scaffolding (Chen et al. 2022; Kishore et al. 2023). Nevertheless, these tools may hinder
students’ ability to think independently and critically and to solve problems; they also harbor
strong potential for perpetuating biases and misinformation (Kasneci et al. 2023; Kishore et al.
2023). Some educational institutions have thus prohibited the use of ChatGPT at school or blocked
This debate has been informed by recent research examining the implications of GenAI
applications in higher education settings. Studies have found that GenAI enhances student
1
understanding, engagement, and academic writing by simplifying information (Engelmann et al.
2023; Pavlik 2023), providing grammatical feedback (Perkins 2023; Wu et al. 2023), and fostering
creativity (Qadir 2023). Other research has also found detrimental effects, such as GenAI leading
2022), reduced social interaction (Eager and Brunton 2023), and superficial learning (Rasul et al.
2023). Studies across the board have identified various ways GenAI can support or hinder student
learning. However, the overall effect on students’ performance remains unclear. We seek to bridge
this gap by addressing the research question: How does students’ GenAI usage affect their exam
performance?
To do so, we collect a sample of student data and empirically assess the effect of GenAI
usage of students’ academic performance. We employ a fixed effects regression model controlling
for numerous factors affecting the exam score. Because students’ use of GenAI for writing case
study essays is not directly observable, we harness the capabilities of ZeroGPT, a renowned GenAI
detection system. We conduct a set of robustness checks to ensure our findings hold under different
approaches for identifying GenAI usage and we address potential endogeneity issues using entropy
balancing. In additional analyses, we utilize an identification strategy with exam retakers to further
examine the causal effect and disaggregate the effect by considering students’ learning potential.
Our results show that students who use GenAI score significantly lower on exams. This finding
holds after including control variables, fixed effects, and several robustness checks. The negative
effect is particularly large for students with high learning potential, indicating that GenAI use
We contribute to the existing literature in several ways. First, we extend the information
systems (IS) research on GenAI applications by examining its implications in higher education. A
2
number of IS studies emphasize the technical capabilities and power of GenAI tools and its
disruptive potential (Ahangama 2023; Chen et al. 2020; Kasneci et al. 2023; Lee 2023; Mu et al.
2022; Radford et al. 2018). We shed light on how GenAI impacts critical areas of society by
documenting empirical evidence on the effect of GenAI usage on exam scores. Second, we
performance. Educational institutions and researchers have discussed the costs and benefits of
GenAI extensively, ultimately asking whether tools such as ChatGPT should be banned in higher
education (Chhina et al. 2023; Kishore et al. 2023; Van Slyke et al. 2023). We not only provide
evidence on the implications of GenAI for academic performance, but are also able to address
effects on different user groups. Third, by using detection systems to identify GenAI usage, we
extend research on the functionalities of GenAI detectors, discuss various approaches to identify
AI-generated content, and document empirical findings on which GenAI detectors provide
trustworthy results.
The paper is structured as follows: Section 2 covers the conceptual basics of GenAI and
large language models (LLMs) and summarizes the relevant literature. In Section 3, we describe
our multivariate model, discuss AI detectors, and provide details on our sample and descriptive
statistics. Section 4 presents the empirical results, robustness checks, and additional analyses,
which are discussed in detail in Section 5. In Section 6, the paper concludes with a summary, an
GenAI refers to machine learning techniques (e.g., neural networks) to create seemingly novel and
meaningful data instances or artifacts based on patterns and relationships in training data
3
(Feuerriegel et al. 2024; Tao et al. 2023). These artifacts appear in various forms such as text,
images, sound, and video (Alavi et al. 2024). LLMs are a subset of GenAI models capable of
processing and creating natural language by applying learning technologies to extensive datasets
(Lee 2023). They can comprehend context and create textual data outputs similar to human
language without requiring specific input formats (Brown et al. 2020; Teubner et al. 2023; von
GenAI constitutes the larger technological infrastructure required for the practical
implementation of LLMs, including the actual model and user-facing components, their modality,
and corresponding data processing (Feuerriegel et al. 2024). Such implementation enables users
to enter input data and instructions conditioning the LLM, which is referred to as prompting
(Feuerriegel et al. 2024; von Brackel-Schmidt et al. 2023). With the emergence of conversational
LLMs (e.g., models with a chat-based interface), prompting shifted from one-off inputs toward
multi-step interactions (von Brackel-Schmidt et al. 2023). Such GenAI models are capable of
completing various tasks, such as developing creative ideas, software coding, or textual content
creation with high accuracy in grammar and wording (Yuan and Chen 2023). These capabilities
render GenAI applications particularly interesting for knowledge work as in academia and higher
A considerable body of literature has rapidly emerged discussing how GenAI influences learning
behavior and success. GenAI seems to offer several benefits for learning, potentially supporting
academic performance. For example, numerous studies discuss how GenAI chatbots can serve as
virtual tutors. Fauzi et al. (2023) and Gilson et al. (2023) report that ChatGPT can help students
by providing accurate answers to unclear questions, which can lead to better understanding and
4
knowledge retention. In this way, GenAI has the potential to replace search engines by responding
directly to questions rather than providing information from which the answer must be pieced
together (AlAfnan et al. 2023). Pavlik (2023) and Engelmann et al. (2023) highlight the ability of
GenAI to summarize or simplify information into a shorter or less complex form. Students can use
this to understand and process textual learning material better or faster (Calderon et al. 2023;
Sallam et al. 2023). Qadir (2023) emphasizes the usefulness of ChatGPT in enhancing students’
creativity by helping them brainstorm ideas and organize their thoughts. Students can also use
GenAI as an academic writing assistant that helps with phrasing (Lund et al. 2023), correcting
grammar (Wu et al. 2023), or providing feedback (Perkins 2023). Research has also shown more
latent benefits. Cotton et al. (2023) document increased student engagement and collaboration,
while other studies mention higher motivation when studying is supported by ChatGPT (Ali et al.
2023; Fauzi et al. 2023). By providing plain language explanations, giving feedback on grammar,
less privileged students, such as non-native speakers or those with communication disabilities
Other studies report potential negative consequences of the use of GenAI on academic
performance. Markauskaite et al. (2022) argue that AI-assisted learning may decrease learning
Excessive usage also reduces opportunities for interactions with teachers and peers, impacting
social and emotional learning. Similarly, Eager and Brunton (2023) find that AI-enabled learning
determinant of academic performance (Jain and Kapoor 2015). Research also argues that students’
approaches to learning are affected by the use of GenAI. Easy access to answers without the need
5
for close and detailed engagement with materials may lead to superficial learning (Rasul et al.
2023), which could hinder students’ ability to deeply understand learning materials (Crawford et
al. 2023a). Sallam et al. (2023) discuss that the quick and easy answers from AI software impede
is no AI available in exams. Crawford et al. (2023b) mention that inaccuracies and lack of
students’ knowledge, ultimately reducing learning quality. Moreover, the extent to which GenAI
facilitates writing scientific texts and essays for students can also have negative effects on learning
(Lund et al. 2023). According to Milano et al. (2023) this diminishes the effort involved in crafting
well-written and argued texts – effort that helps in understanding course materials and which has
Research has thus identified various ways in which GenAI can support or hinder learning,
tools in higher education requires investigation of this tangible effect, which so far has not been
thoroughly explored. While existing studies have documented several individual effects of GenAI
usage on performance, our study seeks to examine its overall impact. To do so, we analyze how
students' GenAI use influences their exam scores. Focusing on exam scores provides a measure
that encapsulates the individual effects of GenAI on learning, offering a comprehensive view of
6
3 Data and Methodology
To answer the research question, we utilize the educational setting of our institution’s first-year
introductory accounting class. To detect GenAI users among the cohort, we rely on case study
essays our students submitted during the semester. The case study concerns a knowledge transfer
exercise for students to immerse themselves in the course material and enhance comprehension.
frequently used online GenAI detector. This measure of GenAI usage allows us to empirically
assess its impacts on exam performance using a fixed effects ordinary least square (OLS)
regression. In line with related research (e.g., Chiu et al. 2023; Eskew and Faley 1988), we control
for various factors that have been shown to affect exam performance. The full OLS model reads
as follows:
Our dependent variable is Exam Score, which is a continuous measure indicating the
percentage of points a student achieved in the final exam. While the minimum is zero, the actual
(achievable) maximum is 96.67 (100). The variable of interest is GenAI User, an indicator variable
taking the value one if a student uses GenAI for studying and for producing work that the instructor
intended to be written without such assistance, and zero otherwise. Based on the indicated
7
probability of our GenAI detector, we classify students as GenAI User if the text is more likely to
preparedness and achievements prior to higher education. We include A-Level Grade as a common
predictor for academic performance (e.g., Lento 2018; Massoudi et al. 2017). We also include two
of maturity and experience (e.g., Guney 2009; Hartnett et al. 2004; Voshaar et al. 2023b). Second,
we control for academic behavior by including session Attendance and the number of Attempts at
taking the final exam (e.g., Cheng and Ding 2021; Massoudi et al. 2017; Voshaar et al. 2023a).
Third, previous studies have found correlations between exam performance and gender as well as
course of study (e.g., Aldamen et al. 2015; Hu et al. 2023; Wecks et al. 2023). We include Course
of study-fixed effects and a dummy variable indicating students’ gender (Female). In addition, we
introduce a novel control variable by adding a dummy indicating whether a student is a LinkedIn
User.2 LinkedIn usage has been found to be correlated with exam performance (Paul et al. 2012).
Also, because GenAI acceptance among students is driven by personal innovativeness (Strzelecki
2023), which is also correlated with (new) social media usage (Aldahdouh et al. 2020; Wijesundara
and Xixiang 2018), we adopt LinkedIn User as a potentially important control variable for our
analysis.
1
Alternatively, and to rule out potential biases, we use higher (0.6) and lower (0.4) thresholds in ZeroGPT as well as
other GenAI detection tools to define our variable of interest (also see our robustness checks).
2
We also show results without including the variable LinkedIn User.
8
3.2 Generative AI Detection Systems
Constructing our variable of interest requires us to differentiate between students who utilize
GenAI to write their case study essays and those who do not. For this, we leverage the capabilities
of a GenAI detection tool. As the availability of GenAI models has become more widespread, the
development and dissemination of AI detection tools has also accelerated (e.g., Dalalah and
Dalalah 2023). Many detection tools are now available online for a fee or free of charge. They
process the inputted text by splitting it into individual tokens and predicting the probability that a
specific token will be followed by the subsequent sequence (Crothers et al. 2023). The detector
also analyzes a text’s perplexity, which refers to the use of random elements and idiosyncrasies
typical of human writing and speech (Walters 2023). If the detector identifies high predictability
and low perplexity, it is probable that the text is AI-written and is recognized as such. In this
instance, the GenAI text detector provides a qualitative (i.e., “Your file content is AI/GPT
generated”) or quantitative (i.e., “63%”) evaluation of the probability that the text was generated
by AI.
After assessing the capabilities of the most frequently used GenAI detectors and their
suitability for German language texts, and upon reviewing the scientific literature, we opted to use
ZeroGPT (https://www.zerogpt.com/) for multiple reasons.3 First, previous research has found
ZeroGPT to be among the best detector tools, and has been consistently and accurately identifying
AI-generated texts (Aremu 2023; Liang et al. 2023; Walters 2023; Weber-Wulff et al. 2023). Its
capabilities in detecting human-generated texts have also been shown to be precise (Aremu 2023;
Liang et al. 2023; Pegoraro et al. 2023; Weber-Wulff et al. 2023). Second, it is essential to select
3
For a comprehensive overview on the effectiveness and capabilities of 16 AI text detectors, we refer the reader to
Walters (2023).
9
a detector that does not have high rates of false positives or negatives. While false negatives lead
to GenAI use remaining undetected, false positives might lead to unwarranted accusations against
students. ZeroGPT has been shown to perform well at avoiding both (Walters 2023). Third,
ZeroGPT is specifically designed to identify content generated by the most popular GenAI models,
such as ChatGPT 3.5 and 4, Gemini, and LLaMA. Finally, ZeroGPT claims to be multilingual,
ZeroGPT uses machine learning algorithms and natural language processing techniques to
analyze textual data and identify patterns common to GenAI-generated text (Alhijawi et al. 2024;
ZeroGPT 2024). The tool offers a front-end for inputting texts, which it passes along to a pretrained
model in the back-end. It outputs the proportion of the tokens estimated to be AI-generated, which
is more detailed than the binary outputs of several other detectors (e.g., Copyleaks).
Our study is based on a broad sample of business, economics, and management students taking an
introductory financial accounting course at a German university in the winter term 2023/2024. To
obtain the required data for our analysis, we have drawn on several data sources. First, using an
online survey at the beginning of the semester, we collected data on student characteristics that
might influence exam success. Second, we retained the students’ case study essays throughout the
semester, and processed and analyzed them in terms of the use of GenAI after the final exam.
Third, we obtained the final exam scores from the central examination office to evaluate the impact
on academic performance.
Starting with an initial sample of 572 students who participated in the survey, we first
excluded students who did not hand in an essay (N = 243). Additionally, we excluded those
students who did not take the final exam (N = 127). Finally, we dropped the observations with
10
missing data for the required variables (N = 9). This leaves us with a final sample of 193 students.
Given the 502 students in the final exam, our sample accounts for about 38% of the underlying
Table 1 reports the descriptive statistics of the student characteristics for the full sample in
Panel A.4 The mean exam score is 45.39, indicating that our sample students on average fall below
the 50% threshold. This reflects the high failure rates commonly observed among higher-education
introductory accounting courses (Prinsloo et al. 2010; Sanders and Willis 2009). The binary
variable of interest GenAI User has a mean of 0.306, indicating that 30.6% of the students in our
sample (i.e., 59 students) are identified as GenAI users by ZeroGPT. The mean value of ZeroGPT
indicates that on average 35.4% of students’ texts are flagged as AI-generated. The average student
in our sample has taken the final exam for the first time (mean Attempt = 1.425) and has attended
fewer than half of the offered tutorials (mean Attendance = 0.447). A rather small subset of 19.2%
Panel B of Table 1 presents descriptive statistics divided into GenAI users and non-users.
The last column presents the results of a two-tailed t-test. The average GenAI user achieves 9.027
(p-value < 0.01) fewer exam points, has a lower A-level grade (0.152; p-value < 0.1), and a higher
number of attempts compared to non-users. This gives an initial indication of poorer exam
performance among GenAI users compared to non-users. However, because not only GenAI usage
but also student characteristics may influence exam performance, the association of GenAI usage
4
Pearson correlations are reported in Online Appendix B (https://tinyurl.com/zjehfa3n) and do not show any indication
of multicollinearity issues, as the highest absolute value is 0.331.
11
and exam performance calls for multivariate examination taking control variables into
consideration.
Table 2 shows the results of our multivariate regression. The GenAI Usage dummy variable, being
the only independent variable in column (1), displays a significantly negative coefficient. In
column (2), we add the control variables commonly found in literature explaining exam
performance. The coefficient of the variable of interest remains significant and slightly lower.
Finally, column (3) presents the results of our main model, (Equation 1).5 The control variables
primarily show significance and signs in line with the literature. For the variable of interest (GenAI
User), we continue to find a statistically significant and negative coefficient. According to the
model, students using GenAI score 6.71 points lower in the final exam, which is substantial, as the
mean student scores 45.39 points. Thus, on average, the scores of GenAI users are about 15%
lower than that of the mean non-user. Given that the passing threshold is at 41 points, GenAI use
can tip the scales toward failing the exam – at least statistically. The empirical evidence provides
a clear picture of a negative influence of students’ GenAI usage on their exam scores.
5
The link test (Pregibon 1980; Tukey 1949), a significant F-test, and the coefficient of determination (adjusted R2) all
indicate a well-fitted model. As the Breusch and Pagan (1979) and Cook and Weisberg (1983)-test detects no
heteroscedasticity (p-value of 0.56), we refrain from using robust standard errors.
12
4.2 Robustness Checks
We conduct a set of robustness checks to ensure the reliability and validity of our results. Our
initial step involves scrutinizing the robustness of the GenAI User variable. In our main analysis,
we identify GenAI users based on a threshold of over 50% in the written case study essays, as
determined by ZeroGPT. This approach yields 30.6% of our sample as GenAI users. To test
whether this share is realistic, we conduct an anonymous survey among all students in our sample
sample), 30.0% state that they used GenAI tools for the written case study, aligning well with our
measured value. Identifying about a third of our population as GenAI users is also consistent with
findings reported elsewhere in the literature. von Garrel and Mayer (2023) conducted a
representative survey among German university students and find that 34.8% of them report using
AI-based tools for studying occasionally, frequently, or very often. Considering the variation in
reported usage rates across different studies and online reports (Abdaljaleel et al. 2024), we
proceed to test alternative thresholds for the detection tool. Adjusting the threshold to 0.6 reduces
GenAI users to 20.2%, while a threshold of 0.4 increases them to 40.9%. The results in columns
(1) and (2) of Table 3 with the adjusted thresholds remain robust and unchanged.
Additionally, we ensure the robustness of our findings by using alternative detection tools.
We utilize Originality.AI, given its prominence in the literature (Walters 2023) and its claims to
confirming the observed effect, is not statistically significant. Looking at the detector score
indeed face difficulties with German language texts. Except for a few cases, all values are either
13
exactly 0%, 50%, or 100%, showing an apparent dysfunction. Despite the difficulties of
Originality.AI for German texts, the GenAI User coefficient remains negative and shows a similar
Given this potential for bias with German texts, we repeat the robustness check using an
AI detector particularly designed for the German language, developed at the University of Applied
Sciences Wedel (Tlok et al. 2023). According to the developer, this tool’s outputs are probabilities
and thus not directly comparable to other tools. As shown in Online Appendix D
and a uniform distribution across the remaining value range. A more likely than not classification
would be impractical. Instead, we run a pre-test with 12 student seminar papers written before
GenAI was available and modify them using ChatGPT 4, creating a paired sample of known GenAI
and non-GenAI texts on the same topics as our main study. The German detector consistently
shows values below 10% for human-written and above 10% for AI-generated texts. Adopting this
threshold for our robustness check, we identify 36.3% of our sample as GenAI users, which is
similar to the main analysis, our survey, and the values reported in previous literature (von Garrel
and Mayer 2023). Column (4) presents the results using the German detector. The GenAI User
coefficient is now even more negative and significant, suggesting improved accuracy due to the
robustness check by manually computing the propensity of GenAI usage. A growing body of
research analyzes AI-generated texts and how to distinguish them from human-written ones. The
literature reports that AI-generated texts typically exhibit lower readability, higher lexical richness,
and a greater number of adjectives than human-written texts (Martínez et al. 2024; Muñoz-Ortiz
14
et al. 2023; Shah et al. 2023). AI texts tend to have more words per sentence and sentences per
paragraph, contributing to lower readability scores (Deveci et al. 2023; Pehlivanoğlu et al. 2023).
Lexical richness essentially refers to the ratio of unique words measurable by the metric Herdan’s
C (Herdan 1960). As another metric, we consider the proportion of adjectives used as another
metric (Markowitz et al. 2023). We conduct a principal component analysis to consolidate these
three variables into a single vector.6 This results in a factor variable ranging from 0 to 1 indicating
AI markers. In column (5), we use this factor as variable of interest and find that the presence of
lexical characteristics of AI-generated texts in a student’s work correlates with lower exam scores,
Beyond the robustness of our measure, we also account for potential endogeneity issues.
The group-wise comparison between GenAI users and non-users in Panel B of Table 1 indicates
that GenAI users have a lower A-level grade and a higher number of attempts. Academic
preparedness or experience might affect both exam performance and GenAI usage. Our approach
already addresses potential endogeneity to some extent, as our control variables can capture such
balance the distributions of control variables between GenAI users and non-users and subsequently
repeat the analysis (Hainmueller 2012). The results in column (6) underscore our main findings
6
The Kaiser-Meyer-Olkin measure of 0.657 (Kaiser and Rice 1974) and a significant Bartlett (1937) test of sphericity
(p-value < 0.01) indicate that the variables are highly correlated and collectively measure the construct of a text being
written by GenAI.
7
This approach primarily addresses observable sample selection bias, while this issue might also arise from omitted
correlated variables (unobserved). However, our course of study-fixed effects mitigate this to some extent
(Wooldridge 1995) and the Ramsey (1969) RESET test indicates no omitted variables (p-value of 0.764). Additionally,
potential instrumental variables related to technical affinity and engagement do not show any correlation with GenAI
User, allowing us to discount endogeneity due to omitted correlated variables.
15
4.3 Additional Analyses
Our main results show a negative impact of students’ GenAI usage on their exam performance. To
further explore the effect and the mechanism at work, we complement our findings by conducting
additional analyses. We explore the documented effect for different levels of student engagement
and cognitive abilities and measure how GenAI use affects performance improvement when
First, we apply an identification strategy to further analyze the causal effect. In our first
additional analysis, we identify and match all repeating students who did not pass the exam in the
year before our main analysis, when GenAI models were not yet available for student use (Pre
GenAI).8 This leaves us with a sample of matched observations before and after broad GenAI
availability containing (Pre) GenAI Users (N = 15) and (Pre) Non-Users (N = 12). Figure 1 shows
the distribution of exam scores for each group and the differences in mean exam scores between
the groups and time periods (within group) along with their statistical significance. Due to the
The Pre GenAI period solely consists of students who failed the exam. We observe a
statistically significant improvement in exam score in the next attempt for both groups (Post
GenAI).9 However, the GenAI User group shows a substantially lower increase. While both
subsamples perform equally well in their second attempt, those in the GenAI User group reach far
8
The Pre GenAI semester ended in January 2023. The first publicly available GenAI model (ChatGPT 3) was launched
only a few weeks before the exam but had very few users, difficult access, and extensive downtime at that early stage.
Therefore, an effect on the exam performance can be ruled out, allowing us to consider this semester as a Pre GenAI
period.
9
An increased performance among repeating students aligns with related research attributing the effect to increased
commitment (e.g., Martínez and Martinez 1992; Voshaar et al. 2023a; Wecks et al. 2023).
16
more points than the Non-User group in the attempt before in the Pre GenAI period. In the attempt
after GenAI was widely available, the Non-User group increases their exam scores to a greater
extent than the GenAI User group. Consequently, we observe a learning-hindering for students
In our second additional analysis, we perform split sample analyses using two measures of
student capabilities and engagement. If the documented effect in our main results is indeed
attributable to hindering learning, the effect should be more pronounced for students where
individual learning and comprehending the content would otherwise have fallen on fertile ground.
To approximate this characteristic, we use students’ A-level grades and attendance at tutorials.
While the first addresses academic preparedness, pre-university achievements, and cognitive
abilities, the latter measures engagement. We conduct a median split for both measures to create
two samples with low and high A-level grades and attendance to gain additional insights into the
mechanism behind the effect. Table 4 presents the results of the additional split sample analysis.
Columns (1) and (2) include the two regressions for the split samples by A-level grade, while
For the split sample of students with good A-level grades (and therefore strong cognitive
of GenAI User to be highly significant and negative (column (1)). While this aligns with our main
results, the coefficient’s magnitude is almost doubled compared to the one in column (3) of
Table 2. In contrast, the coefficient is positive but insignificant for students with A-level grades
below the median (Column (2)). A similar picture emerges when utilizing students’ tutorial
attendance (and hence engagement) for median split. While the students with higher attendance
17
perform worse not only statistically significantly but to an educationally impactful extent when
using GenAI, the effect is insignificant for low-attendance students (columns (3) and (4),
respectively). This suggests that the documented negative effect of GenAI use in the main analysis
is primarily due to the effect on students with good A-level grades and high attendance.
We can conclude that the impact of GenAI use on exam performance varies depending on
students’ prior academic achievements and/or cognitive abilities as well as engagement during the
semester. We find using GenAI to be detrimental to the exam scores of higher achieving and more
engaged students. This confirms the learning-hindering mechanism as those students who would
have been well equipped to understand the learning materials suffer particularly from the forgone
opportunity to engage with the course content. When compared intra-group with other students
with good prerequisites and who prepare for and write the essays themselves, the disadvantage of
Our main results show that GenAI usage negatively affects students’ exam scores. While the
literature has found many aspects of GenAI that can have a positive or negative influence on
academic performance, it is unclear whether the benefits or the downsides prevail. We observe
clear evidence of a negative overall effect. Positive aspects such as summarizing information
(Pavlik 2023), increasing study motivation (Ali et al. 2023; Fauzi et al. 2023), or providing plain-
language explanations (Sullivan et al. 2023) may still occur. However, our results show that these
are overshadowed by the negative effects that have been described in previous studies (Crawford
repeating students and find that students opting for GenAI usage for learning and academic writing
18
purposes do not exhibit an increase in their exam scores similar to that of their peers not using
GenAI. We ascribe this result primarily to a learning-hindering mechanism when using GenAI.
For example, when students let GenAI write an essay on a complex and challenging topic, instead
of exploring, grappling with, and mastering the content and then writing the essay themselves,
students waste the opportunity to learn and to experience the inherent rewards of figuring things
out. Research such as Milano et al. (2023) warn of this, stating that GenAI’s role in facilitating
academic writing is dependable to the point of negatively impacting students’ learning. Similarly,
the ready availability of quick answers may reduce the intensity of students’ engagement with the
subject matter and ultimately deter their learning, as argued by Cotton et al. (2023), Rudolph et al.
The results of our split sample regressions further support the learning-hindering
mechanism, as students with high learning potential are especially impacted by GenAI usage. The
significantly negative effect for the students in our sample with good A-level grades or high
attendance – which can be reasonably equated to higher levels of skill or commitment – indicates
that these students have more to lose when they do not immerse themselves in the subject matter.
And indeed, we find no effect for the opposite group, who are less predisposed to assimilate
knowledge due to lower attendance or cognitive ability. We can however document the impact of
GenAI usage for these students when looking at the results of the identification strategy analysis.
This subsample solely consists of repeating students with attendance and A-Level grades that are
below average. While we document a significant performance increase in the next exam attempt,
GenAI usage hampers this improvement. Thus, the negative GenAI influence becomes evident
when there is considerable learning potential, providing further support for the learning-hindering
mechanism.
19
These findings align with the constructivist theory of learning (cf. Bada and Olusegun
2015), which highlight the importance of being actively involved in the learning process to achieve
deeper understanding and build knowledge. In this sense, writing a case study essay is valuable
not merely as an end in itself but also as an instrument to assist students in developing skills in
planning, completing, and editing their written work (Dweck 1986) as well as engaging with,
exploring, and understanding subject-related content. When students immerse themselves in their
subjects, they are more likely to experience an eureka moment that enhance comprehension. By
using GenAI for essay writing, students may bypass this essential cognitive process of
comprehension, analysis, and summarization. This might similarly occur when GenAI is used to
study for exams.10 If for instance GenAI is used to simplify or explain complex topics, students
might use it as a shortcut that makes learning seem easier, but which actually prevents them from
Our findings have important implications for students, educators, and educational
institutions. For students, the results suggest a cautious approach to using GenAI for learning.
While GenAI may appear to ease the learning process, it can adversely affect learning
performance. Students should be mindful of the potential drawbacks and consider integrating
GenAI as a supplementary tool rather than a primary resource for grappling with complex topics.
Educators likewise need to take students’ use of GenAI into account when designing curricula. It
is essential to provide tasks and learning materials that promote deep learning and minimize the
potential for GenAI to diminish engagement with the subject matter. Strategies could include
incorporating in-class discussions, handwritten assignments, and other methods that encourage
10
Our detection solely identifies GenAI use in essay writing, yet it is likely that GenAI users apply this versatile
technology for various academic tasks. This is supported by our survey, as 26.7% indicate using GenAI broadly for
academic purposes, close to the 30.0% reporting that they use it for the essay.
20
active learning and critical thinking. Lastly, this study offers valuable insights for educational
institutions regarding GenAI policies in higher education. Although our results point to negative
implications of GenAI use on student performance, we do not advocate for outright bans. As with
many revolutionary information systems, there are both positive and negative aspects of GenAI.
The negative results of this study may rather show that higher education does not yet harness the
full potential of GenAI. Educational institutions should guide educators on how to instruct students
in the proper use of GenAI and develop policies that mitigate its negative effects while amplifying
its benefits. Similar to the disruptions of higher education caused by calculators and the internet,
banning is not a practical solution. Students will inevitably encounter GenAI outside the university
setting and must learn to use it effectively rather than confining themselves to getting by without
it.
The present study contributes to the rich debate on how GenAI will affect learning in (higher)
education by evaluating the tangible effects of GenAI usage on exam performance. We address an
important research gap, as performance effects have not yet been examined but are nevertheless
crucial when discussing how to adapt education to the age of GenAI. Our findings reveal that using
GenAI tools for writing essays and likely for other learning purposes significantly decreases exam
mechanism through with GenAI usage negatively affects exam scores. Our study thus has
Some of the limitations of our study warrant further attention. First, using GenAI detector
tools to identify GenAI users among our students comes with the risk of inaccurate detection
results. Not identifying all GenAI users (or too many) in our sample would affect our findings.
21
Having said that, we improve the robustness of our results by using several different GenAI
detector tools. These tools are particularly suited to German language applications or have distinct
operating principles. We also find the share of detected GenAI users to align with the numbers
found in related research (von Garrel and Mayer 2023) and through our own anonymous survey.
Second, our results may lack generalizability as our study is limited to a financial accounting class
at one German university. Finally, our study does not account for usage behavior and hence usage
intensity. Different intensities of use might come with different impacts on exam performance.
In addressing these limitations, future research might explore students’ GenAI usage
behavior. This could help in understanding the implications more comprehensively. Future
research might also evaluate the impact of GenAI tools from an educator’s perspective. Our study
takes a student-centric viewpoint, but GenAI also affects the daily work of educators. Exploring
the threats and opportunities from this perspective would help institutions position themselves in
22
References
Abdaljaleel, M., Barakat, M., Alsanafi, M., Salim, N. A., Abazid, H., Malaeb, D., Mohammed, A. H., Hassan,
B. A. R., Wayyes, A. M., Farhan, S. S., Khatib, S. E., Rahal, M., Sahban, A., Abdelaziz, D. H., Mansour,
N. O., AlZayer, R., Khalil, R., Fekih-Romdhane, F., Hallit, R., Hallit, S., and Sallam, M. 2024. "A
Multinational Study on the Factors Influencing University Students’ Attitudes and Usage of ChatGPT,"
Scientific Reports (14:1), pp. 1983-1997.
Ahangama, S. 2023. "An Investigation of Domain-Based Social Influence on ChatGPT-Related Twitter
Data," ICIS 2023 Proceedings.
AlAfnan, M. A., Dishari, S., Jovic, M., and Lomidze, K. 2023. "ChatGPT as an Educational tool:
Opportunities, Challenges, and Recommendations for Communication, Business Writing, and
Composition Courses," Journal of Artificial Intelligence and Technology (3:2), pp. 60-68.
Alavi, M., Leidner, D. E., and Mousavi, R. 2024. "Knowledge Management Perspective of Generative
Artificial Intelligence," Journal of the Association for Information Systems (25:1), pp. 1-12.
Aldahdouh, T. Z., Nokelainen, P., and Korhonen, V. 2020. "Technology and Social Media Usage in Higher
Education: The Influence of Individual Innovativeness," SAGE Open (10:1).
Aldamen, H., Al-Esmail, R., and Hollindale, J. 2015. "Does Lecture Capturing Impact Student Performance
and Attendance in an Introductory Accounting Course?," Accounting Education (24:4), pp. 291-317.
Alhijawi, B., Jarrar, R., AbuAlRub, A., and Bader, A. 2024. "Deep Learning Detection Method for Large
Language Models-Generated Scientific Content," available at arXiv: 2403.00828.
Ali, J. K. M., Shamsan, M. A. A., Hezam, T. A., and Mohammed, A. A. Q. 2023. "Impact of ChatGPT on
Learning Motivation," Journal of English Studies in Arabia Felix (2:1).
Aremu, T. 2023. "Unlocking Pandora's Box: Unveiling the Elusive Realm of AI Text Detection," Available
at SSRN: 4470719.
Bada, S. O., and Olusegun, S. 2015. "Constructivism Learning Theory: A Paradigm for Teaching and
Learning," Journal of Research & Method in Education (5:6), pp. 66-70.
Bangert-Drowns, R., Hurley, M. M., and Wilkinson, B. 2004. "The Effects of School-Based Writing-to-
Learn Interventions on Academic Achievement: A Meta-Analysis," Review of Educational Research
(74), pp. 29-58.
Bartlett, M. S. 1937. "Properties of Sufficiency and Statistical Tests," Proceedings of the Royal Society of
London. Series A, Mathematical and Physical Sciences (160:901), pp. 268-282.
Benbya, H., Strich, F., and Tamm, T. 2024. "Navigating Generative Artificial Intelligence Promises and
Perils for Knowledge and Creative Work," Journal of the Association for Information Systems (25:1),
pp. 23-36.
Breusch, T., and Pagan, A. 1979. "A Simple Test for Heteroscedasticity and Random Coefficient Variation,"
Econometrica (47:5), pp. 1287-1294.
Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry,
G., and Askell, A. 2020. "Language Models are Few-Shot Learners," Advances in Neural Information
Processing Systems (33), pp. 1877-1901.
Calderon, T. G., Gao, L., and Cardoso, R. L. 2023. "Generative Artificial Intelligence in the Classroom: A
Financial Accounting Experience," in Advances in Accounting Education: Teaching and Curriculum
Innovations, T.G. Calderon (ed.). Emerald Publishing Limited, pp. 125-144.
Chen, M., Radford, A., Child, R., Wu, J., Jun, H., Luan, D., and Sutskever, I. 2020. "Generative Pretraining
from Pixels," International Conference on Machine Learning, pp. 1691-1703.
Chen, X., Zou, D., Xie, H., Cheng, G., and Liu, C. 2022. "Two Decades of Artificial Intelligence in Education
Contributors, Collaborations, Research Topics, Challenges, and Future Directions," Educational
Technology & Society (25:1), pp. 28-47.
Cheng, P., and Ding, R. 2021. "The Effect of Online Review Exercises on Student Course Engagement and
Learning Performance: A Case Study of an Introductory Financial Accounting Course at an
International Joint Venture University," Journal of Accounting Education (54:1), p. 100699.
Chhina, S., Antony, B., and Firmin, S. 2023. "Navigating the Terrain of Large Language Models in Higher
Education: A Systematic Literature Review," ACIS 2023 Proceedings.
Chiu, C., King, R., and Crossin, C. 2023. "Using Colour-Coded Digital Annotation for Enhanced Case-Based
Learning Outcomes," Accounting Education (32:2), pp. 201-221.
Cook, R. D., and Weisberg, S. 1983. "Diagnostics for Heteroscedasticity in Regression," Biometrika (70:1).
23
Cotton, D. R. E., Cotton, P. A., and Shipway, J. R. 2023. "Chatting and Cheating: Ensuring Academic
Integrity in the Era of ChatGPT," Innovations in Education and Teaching International (61:2), pp.
228-239.
Crawford, J., Cowling, M., and Allen, K.-A. 2023a. "Leadership is Needed for Ethical ChatGPT: Character,
Assessment, and Learning Using Artificial Intelligence (AI)," Journal of University Teaching &
Learning Practice (20:3).
Crawford, J., Cowling, M., Ashton-Hay, S., Kelder, J.-A., Middleton, R., and Wilson, G. S. 2023b. "Artificial
Intelligence and Authorship Editor Policy: ChatGPT, Bard, Bing AI, and Beyond," Journal of University
Teaching & Learning Practice (20:5).
Crothers, E. N., Japkowicz, N., and Viktor, H. L. 2023. "Machine-Generated Text: A Comprehensive Survey
of Threat Models and Detection Methods," IEEE Access (11), pp. 70977-71002.
Dalalah, D., and Dalalah, O. M. A. 2023. "The False Positives and False Negatives of Generative AI Detection
Tools in Education and Academic Research: The Case of ChatGPT," The International Journal of
Management Education (21:2), p. 100822.
Deveci, C. D., Baker, J. J., Sikander, B., and Rosenberg, J. 2023. "A Comparison of Cover Letters Written
by ChatGPT-4 or Humans," Danish Medical Bulletin (70:11).
Dweck, C. S. 1986. "Motivational Processes Affecting Learning," American Psychologist (41:10), pp. 1040-
1048.
Eager, B., and Brunton, R. 2023. "Prompting Higher Education Towards AI-Augmented Teaching and
Learning Practice," Journal of University Teaching & Learning Practice (20:5).
Engelmann, B., Haak, F., Kreutz, C. K., Khasmakhi, N. N., and Schaer, P. 2023. "Text Simplification of
Scientific Texts for Non-Expert Readers," available at arXiv: 2307.03569.
Eskew, R. K., and Faley, R. H. 1988. "Some Determinants of Student Performance in the First College-Level
Financial Accounting Course," The Accounting Review (63:1), pp. 137–147.
Fauzi, F., Tuhuteru, L., Sampe, F., Ausat, A. M. A., and Hatta, H. R. 2023. "Analysing the Role of ChatGPT
in Improving Student Productivity in Higher Education," Journal on Education (5:4), pp. 14886-14891.
Ferrara, E. 2024. "GenAI Against Humanity: Nefarious Applications of Generative Artificial Intelligence
and Large Language Models," Journal of Computational Social Science (Forthcoming).
Feuerriegel, S., Hartmann, J., Janiesch, C., and Zschech, P. 2024. "Generative AI," Business & Information
Systems Engineering (66:1), pp. 111-126.
Gilson, A., Safranek, C. W., Huang, T., Socrates, V., Chi, L., Taylor, R. A., and Chartash, D. 2023. "How Does
ChatGPT Perform on the United States Medical Licensing Examination (USMLE)? The Implications of
Large Language Models for Medical Education and Knowledge Assessment," JMIR Medical Education
(9:1).
Gregor, S. 2024. "Responsible Artificial Intelligence and Journal Publishing," Journal of the Association
for Information Systems (25:1), pp. 48-60.
Guney, Y. 2009. "Exogenous and Endogenous Factors Influencing Students' Performance in
Undergraduate Accounting Modules," Accounting Education (18:1), pp. 51-73.
Gunning, R. 1952. The Technique of Clear Writing. McGraw-Hill.
Hainmueller, J. 2012. "Entropy Balancing for Causal Effects: A Multivariate Reweighting Method to
Produce Balanced Samples in Observational Studies," Political Analysis (20:1), pp. 25-46.
Hartnett, N., Römcke, J., and Yap, C. 2004. "Student Performance in Tertiary‐Level Accounting: An
International Student Focus," Accounting & Finance (44:2), pp. 163-185.
Herdan, G. 1960. Type-Token Mathematics. The Hague: Mouton.
Hill, A. D., Johnson, S. G., Greco, L. M., O’Boyle, E. H., and Walter, S. L. 2021. "Endogeneity: A Review and
Agenda for the Methodology-Practice Divide Affecting Micro and Macro Research," Journal of
Management (47:1), pp. 105-143.
Hsu, T., and Thompson, S. A. 2023. "Disinformation Researchers Raise Alarms About A.I. Chatbots." The
New York Times, from https://www.nytimes.com/2023/02/08/technology/ai-chatbots-
disinformation.html
Hu, Y., Nath, N., Zhu, Y., and Laswad, F. 2023. "Accounting Students’ Online Engagement, Choice of Course
Delivery Format and Their Effects on Academic Performance," Accounting Education (32), pp. 1-36.
Jain, T., and Kapoor, M. 2015. "The Impact of Study Groups and Roommates on Academic Performance,"
Review of Economics and Statistics (97:1), pp. 44-54.
Johnson, A. 2023. "ChatGPT in Schools: Here’s Where it’s Banned - and How it Could Potentially Help
Students." Forbes.
24
Kaiser, H. F., and Rice, J. 1974. "Little Jiffy," Educational and Psychological Measurement (34:1), pp. 111-
117.
Kasneci, E., Sessler, K., Küchemann, S., Bannert, M., Dementieva, D., Fischer, F., Gasser, U., Groh, G.,
Günnemann, S., Hüllermeier, E., Krusche, S., Kutyniok, G., Michaeli, T., Nerdel, C., Pfeffer, J., Poquet,
O., Sailer, M., Schmidt, A., Seidel, T., Stadler, M., Weller, J., Kuhn, J., and Kasneci, G. 2023. "ChatGPT
for Good? On Opportunities and Challenges of Large Language Models for Education," Learning and
Individual Differences (103).
Katavic, R., Pahuja, A., and Syed, T. A. 2023. "Navigating the Use of ChatGPT in Classrooms: A Study of
Student Experiences," ICIS 2023 Proceedings.
Kishore, S., Hong, Y., Nguyen, A., and Qutab, S. 2023. "Should ChatGPT be Banned at Schools? Organizing
Visions for Generative Artificial Intelligence (AI) in Education," ICIS 2023 Proceedings.
Lee, M. 2023. "A Mathematical Investigation of Hallucination and Creativity in GPT Models," Mathematics
(11:10).
Lento, C. 2018. "Student Usage of Assessment-Based and Self-Study Online Learning Resources in
Introductory Accounting," Issues in Accounting Education (33:4), pp. 13–31.
Liang, W., Yuksekgonul, M., Mao, Y., Wu, E., and Zou, J. 2023. "GPT Detectors are Biased Against Non-
Native English Writers," available at ArXiv: 2304.02819.
Lund, B. D., Wang, T., Mannuru, N. R., Nie, B., Shimray, S., and Wang, Z. 2023. "ChatGPT and a New
Academic Reality: Artificial Intelligence‐Written Research Papers and the Ethics of the Large Language
Models in Scholarly Publishing," Journal of the Association for Information Science and Technology
(74:5), pp. 570-581.
Markauskaite, L., Marrone, R., Poquet, O., Knight, S., Martinez-Maldonado, R., Howard, S., Tondeur, J.,
De Laat, M., Buckingham Shum, S., Gašević, D., and Siemens, G. 2022. "Rethinking the Entwinement
Between Artificial Intelligence and Human Learning: What Capabilities Do Learners Need for a World
with AI?," Computers and Education: Artificial Intelligence (3), p. 100056.
Markowitz, D. M., Hancock, J. T., and Bailenson, J. N. 2023. "Linguistic Markers of Inherently False AI
Communication and Intentionally False Human Communication: Evidence From Hotel Reviews,"
Journal of Language and Social Psychology (43:1), pp. 63-82.
Martínez, G., Hernández, J. A., Conde, J., Reviriego, P., and Merino, E. 2024. "Beware of Words: Evaluating
the Lexical Richness of Conversational Large Language Models," available at arXiv: 2402.15518.
Martínez, J. G., and Martinez, N. C. 1992. "Re-Examining Repeated Testing and Teacher Effects in a
Remedial Mathematics Course," The British Journal of Educational Psychology (62:3), pp. 356-363.
Massoudi, D., Koh, S., Hancock, P. J., and Fung, L. 2017. "The Effectiveness of Usage of Online Multiple
Choice Questions on Student Performance in Introductory Accounting," Issues in Accounting
Education (32:4), pp. 1-17.
Milano, S., McGrane, J. A., and Leonelli, S. 2023. "Large Language Models Challenge the Future of Higher
Education," Nature Machine Intelligence (5:4), pp. 333-334.
Mu, N., Kirillov, A., Wagner, D., and Xie, S. 2022. "Slip: Self-Supervision Meets Language-Image Pre-
raining," European Conference on Computer Vision: Springer, pp. 529-544.
Muñoz-Ortiz, A., Gómez-Rodríguez, C., and Vilares, D. 2023. "Contrasting Linguistic Patterns in Human
and LLM-Generated Text," available at ArXiv: 2308.09067.
Ngo, T. T. A. 2023. "The Perception by University Students of the Use of ChatGPT in Education,"
International Journal of Emerging Technologies in Learning (18:17).
Originality.AI. 2024. "Originality AI Plagiarism and Fact Checker." from https://originality.ai/
Paul, J. A., Baker, H. M., and Cochran, J. D. 2012. "Effect of Online Social Networking on Student Academic
Performance," Computers in Human Behavior (28:6), pp. 2117-2127.
Pavlik, J. V. 2023. "Collaborating With ChatGPT: Considering the Implications of Generative Artificial
Intelligence for Journalism and Media Education," Journalism & Mass Communication Educator
(78:1), pp. 84-93.
Pegoraro, A., Kumari, K., Fereidooni, H., and Sadeghi, A.-R. 2023. "To ChatGPT, or Not To ChatGPT: That
is the Question!," available at arXiv: 2304.01487.
Pehlivanoğlu, M. K., Syakura, M. A., and Duru, N. 2023. "Enhancing Paraphrasing in Chatbots Through
Prompt Engineering: A Comparative Study on ChatGPT, Bing, and Bard," 8th International Conference
on Computer Science and Engineering: IEEE, pp. 432-437.
Perkins, M. 2023. "Academic Integrity Considerations of AI Large Language Models in the Post-Pandemic
Era: ChatGPT and Beyond," Journal of University Teaching and Learning Practice (20:2).
25
Pregibon, D. 1980. "Goodness of Link Tests for Generalized Linear Models," Applied Statistics (29:1), pp.
15-14.
Prinsloo, P., Müller, H., and Du Plessis, A. 2010. "Raising awareness of the risk of failure in first-year
accounting students," Accounting Education: an international journal (19:1-2), pp. 203-218.
Qadir, J. 2023. "Engineering Education in the Era of ChatGPT: Promise and Pitfalls of Generative AI for
Education," IEEE Global Engineering Education Conference IEEE, pp. 1-9.
Radford, A., Narasimhan, K., Salimans, T., and Sutskever, I. 2018. "Improving Language Understanding by
Generative Pre-Training." OpenAI
Ramsey, J. B. 1969. "Tests for Specification Errors in Classical Linear Least-Squares Regression Analysis,"
Journal of the Royal Statistical Society Series B: Statistical Methodology (31:2), pp. 350-371.
Rasul, T., Nair, S., Kalendra, D., Robin, M., de Oliveira Santini, F., Ladeira, W. J., Sun, M., Day, I., Rather,
R. A., and Heathcote, L. 2023. "The Role of ChatGPT in Higher Education: Benefits, Challenges, and
Future Research Directions," Journal of Applied Learning and Teaching (6:1).
Rudolph, J., Tan, S., and Tan, S. 2023. "War of the Chatbots: Bard, Bing Chat, ChatGPT, Ernie and Beyond.
The New AI Gold Rush and Its Impact on Higher Education," Journal of Applied Learning and
Teaching (6:1).
Sallam, M., Salim, N. A., Barakat, M., and Ala'a, B. 2023. "ChatGPT Applications in Medical, Dental,
Pharmacy, and Public Health Education: A Descriptive Study Highlighting the Advantages and
Limitations," Narra J (3:1).
Sanders, D. E., and Willis, V. F. 2009. "Setting the PACE for Student Success in Intermediate Accounting,"
Issues in Accounting Education (24:3), pp. 319-337.
Shah, A., Ranka, P., Dedhia, U., Prasad, S., Muni, S., and Bhowmick, K. 2023. "Detecting and Unmasking
Ai-Generated Texts Through Explainable Artificial Intelligence Using Stylistic Features," International
Journal of Advanced Computer Science and Applications (14:10).
Strzelecki, A. 2023. "To Use or Not to Use ChatGPT in Higher Education? A Study of Students’ Acceptance
and Use of Technology," Interactive Learning Environments (31), pp. 1-14.
Sullivan, M., Kelly, A., and McLaughlan, P. 2023. "ChatGPT in Higher Education: Considerations for
Academic Integrity and Student Learning," Journal of Applied Learning & Teaching (6:1).
Tao, Y., Yoo, C., and Animesh, A. 2023. "AI Plus Other Technologies? The Impact of ChatGPT and Creativity
Support Systems on Individual Creativity," ICIS 2023 Proceedings.
Teubner, T., Flath, C. M., Weinhardt, C., van der Aalst, W., and Hinz, O. 2023. "Welcome to the Era of
ChatGPT et al," Business & Information Systems Engineering (65:2), pp. 95-101.
Tiwari, C. K., Bhat, M. A., Khan, S. T., Subramaniam, R., and Khan, M. A. I. 2023. "What Drives Students
Toward ChatGPT? An Investigation of the Factors Influencing Adoption and Usage of ChatGPT,"
Interactive Technology and Smart Education (2023).
Tlok, T., Annuth, H., and Pawłowski, M. 2023. "Robuste Erkennung von KI-generierten Texten in deutscher
Sprache," in: Master Thesis at Department of Data Science & Artificial Intelligence – FH Wedel.
Tukey, J. W. 1949. "One Degree of Freedom for Non-additivity," Biometrics (5:3), pp. 232-242.
Van Slyke, C., Johnson, R., and Sarabadani, J. 2023. "Generative Artificial Intelligence in Information
Systems Education: Challenges, Consequences, and Responses," Communications of the Association
for Information Systems (53:1), pp. 1-21.
von Brackel-Schmidt, C., Kučević, E., Memmert, L., Tavanapour, N., Cvetkovic, I., Bittner, E. A., and
Böhmann, T. 2023. "A User-Centric Taxonomy for Conversational Generative Language Models," ICIS
2023 Proceedings.
von Garrel, J., and Mayer, J. 2023. "Artificial Intelligence in Studies - Use of ChatGPT and Ai-Based Tools
Among Students in Germany," Humanities and Social Sciences Communications (10:1), p. 799.
Voshaar, J., Knipp, M., Loy, T., Zimmermann, J., and Johannsen, F. 2023a. "The impact of using a mobile
app on learning success in accounting education," Accounting Education), pp. 222-247.
Voshaar, J., Wecks, J. O., Johannsen, F., Knipp, M., Loy, T., and Zimmermann, J. 2023b. "Supporting
Students in the Transition to Higher Education: Evidence from a Mobile App in Accounting Education,"
International Conference in Information Systems, Hyderabad.
Walters, W. H. 2023. "The Effectiveness of Software Designed to Detect AI-Generated Writing: A
Comparison of 16 AI Text Detectors," Open Information Science (7:1), p. 20220158.
Weber-Wulff, D., Anohina-Naumeca, A., Bjelobaba, S., Foltýnek, T., Guerrero-Dib, J., Popoola, O., Šigut,
P., and Waddington, L. 2023. "Testing of Detection Tools for AI-Generated Text," International
Journal for Educational Integrity (19:1), p. 26.
26
Wecks, J. O., Voshaar, J., and Zimmermann, J. 2023. "Using Machine Learning to Address Individual
Learning Needs in Accounting Education," Available at SSRN).
Wijesundara, T. R., and Xixiang, S. 2018. "Social Networking Sites Acceptance: The Role of Personal
Innovativeness in Information Technology," International Journal of Business and Management
(13:8).
Wilson, D., Burnett, P., Valacich, J. S., and Jenkins, J. 2023. "Human or AI? Using Digital Behavior to Verify
Essay Authorship," ICIS 2023 Proceedings.
Wooldridge, J. M. 1995. "Selection Corrections for Panel Data Models Under Conditional Mean
Independence Assumptions," Journal of Econometrics (68:1), pp. 115-132.
Wu, H., Wang, W., Wan, Y., Jiao, W., and Lyu, M. R. 2023. "ChatGPT or Grammarly? Evaluating ChatGPT
on Grammatical Error Correction Benchmark," available at ArXiv: 2303.13648.
Yuan, Z., and Chen, H. 2023. "The Impact of ChatGPT on the Demand for Human Content Generating and
Editing Services: Evidence from an Online Labor Market," ICIS 2023 Proceedings.
ZeroGPT. 2024. "AI Text Detector." from https://www.zerogpt.com
Zhu, Q., Zhang, X., and Luo, J. 2023. "Biologically Inspired Design Concept Generation Using Generative
Pre-Trained Transformers," Journal of Mechanical Design (145:4), p. 041409.
27
Figures and Tables
|Mean
Group Pre GenAI Post GenAI
Diff|
GenAI
User 12.76**
(N = 15)
Non-User
18.91***
(N = 12)
|Mean
6.86 0.72
Diff|
Figure 1 presents the bootstrap distributions (1,000 replications) and the 95% confidence interval of the exam score
from the four groups of (Pre) GenAI User and Non-User before and after the public release of GenAI. Also, the
mean differences between GenAI user and non-user performance are reported as absolute values and tested by a
paired sample (two-sample) t-test for within- (between-)group difference. ***, **, * indicate statistical significance
at the 1%, 5%, and 10% level, respectively (two-tailed).
28
Panel A:
N Mean Median SD P25 P75
Student Data
Exam Score 193 45.39 45.83 22.23 27.50 61.11
GenAI User 193 0.306 0.462
ZeroGPT 193 0.354 0.296 0.277 0.119 0.556
A-Level Grade 193 2.290 2.200 0.607 1.800 2.700
Attempt 193 1.425 1 1.223 1 1
Attendance (relative) 193 0.447 0.444 0.322 0.111 0.778
Vocational Training 193 0.135 0.342
Voluntary Service 193 0.363 0.482
Female 193 0.472 0.500
LinkedIn User 193 0.192 0.395
Panel B:
GenAI User Non-User
Student Data by GenAI Usage |Diff.|
Variables N Mean N Mean
Exam Score 59 39.120 134 48.147 9.027 ***
A-Level Grade 59 2.184 134 2.337 0.152 *
Attempt 59 1.763 134 1.276 0.486 **
Attendance (relative) 59 0.413 134 0.463 0.051
Vocational Training 59 0.085 134 0.157 0.072
Voluntary Service 59 0.322 134 0.381 0.059
Female 59 0.390 134 0.508 0.118
LinkedIn User 59 0.221 134 0.179 0.041
Table 1 presents the descriptive statistics of student characteristics in Panel A. For binary variables, only means and
standard deviations are presented. Panel B shows student characteristics disaggregated by GenAI usage. The last
column presents the difference in mean values and the significance level of a two-tailed t-test (chi-squared test) for
continuous (binary) variables. ***, **, * indicate statistical significance at the 1%, 5%, and 10% level, respectively.
29
Variables (1) (2) (3)
GenAI User -8.48 ** -6.32 ** -6.71 **
(-2.40) (-2.01) (-2.17)
A-Level Grade 12.04 *** 11.42 ***
(4.98) (4.79)
Attempt 0.93 0.76
(0.77) (0.63)
Attendance (relative) 17.99 *** 18.30 ***
(3.73) (3.86)
Vocational Training 10.54 ** 10.80 **
(2.41) (2.52)
Voluntary Service 2.20 1.76
(0.72) (0.59)
Female -9.54 *** -10.16 ***
(-3.23) (-3.50)
LinkedIn User 9.60 ***
(2.75)
Constant Included Included Included
Course of Study-Fixed Effects Included Included Included
N 193 193 193
Adj. R² 0.03 0.28 0.30
Table 1 presents the results of regressing GenAI usage on the exam score. Column (1) depicts the standalone effect
of the GenAI usage dummy. In column (2), we add control variables commonly found in literature explaining exam
performance. Column (3) adds another control variable to better explain the independent variable representing our
main results. Bold font indicates the variable of interest. ***, **, * indicate statistical significance at the 1%, 5%, and
10% level (two-tailed), respectively. t-values are presented in parentheses. All variables are defined in Appendix A
(https://tinyurl.com/zjehfa3n).
Table 2. Regression Results on the Impact of GenAI Usage on Exam Performance
30
(1) (2) (3) (4) (5) (6)
Threshold 0.4 Threshold 0.6 Originality.AI German Manual Balanced
Variables Detector Computation Sample
GenAI User -7.10 ** -7.17 ** -4.14 -8.53 *** -2.66 *** -6.51 **
(-2.43) (-2.02) (-1.38) (-2.90) (-2.70) (-2.07)
A-Level Grade 11.64 *** 11.42 *** 11.42 *** 11.21 *** 11.62 *** 8.88 ***
(4.92) (4.78) (4.73) (4.75) (4.87) (3.03)
Attempt 0.89 0.84 0.29 0.94 0.79 2.05 **
(0.74) (0.69) (0.24) (0.79) (0.66) (1.98)
Attendance (relative) 17.70 *** 18.38 *** 18.01 *** 16.81 *** 15.55 *** 17.75 ***
(3.75) (3.87) (3.78) (3.58) (3.25) (2.95)
Vocational Training 10.37 ** 11.25 *** 11.98 *** 10.93 ** 12.09 *** 8.75
(2.41) (2.63) (2.79) (2.58) (2.86) (1.65)
Voluntary Service 1.81 2.07 2.13 1.57 1.24 1.52
(0.60) (0.69) (0.71) (0.53) (0.41) (0.45)
Female -10.14 *** -9.29 *** -9.97 *** -10.20 *** -10.18 *** -10.49 ***
(-3.50) (-3.18) (-3.41) (-3.54) (-3.50) (-3.22)
LinkedIn User 10.75 *** 9.25 *** 9.05 ** 10.26 *** 7.89 ** 10.53 ***
(3.05) (2.65) (2.58) (2.96) (2.25) (2.82)
Constant Included Included Included Included Included Included
Course of Study-FE Included Included Included Included Included Included
N 193 193 193 193 193 193
Adj. R² 0.31 0.30 0.29 0.32 0.31 0.20
Table 3 presents the results of the robustness checks. In columns (1) and (2), we reduced (> 0.4) or increased (> 0.6) the threshold
of the AI detector value to be classified in the GenAI User group. Columns (3) and (4) use alternative AI detectors. Column (5)
includes a manual computed score that represents AI detection. In column (6), we again present our main results but with an
entropy-balanced sample. Bold font indicates the variable of interest. ***, **, * indicate statistical significance at the 1%, 5%, and
10% level (two-tailed), respectively. t-values are presented in parentheses. All variables are defined in Appendix A
(https://tinyurl.com/zjehfa3n).
31
(1) (2) (3) (4)
Variables Higher Lower Higher Lower
A-Level Grade A-Level Grade Attendance Attendance
ChatGPT User -12.27 *** 2.09 -11.90 *** -2.92
(-2.73) (0.46) (-2.74) (-0.64)
A-Level Grade 11.96 ** 24.13 *** 11.58 *** 12.64 ***
(2.37) (3.25) (3.60) (3.31)
Attempt -2.03 1.21 -1.78 2.27
(-0.87) (0.87) (-0.71) (1.44)
Attendance (relative) 17.18 ** 12.07 * 18.62 34.93 *
(2.24) (1.89) (1.65) (1.83)
Vocational Training 4.41 24.40 *** 10.33 * 11.93 *
(0.76) (3.80) (1.77) (1.76)
Voluntary Service -1.17 3.97 -2.80 8.47 *
(-0.27) (0.94) (-0.67) (1.83)
Female -9.65 ** -9.44 ** -11.07 *** -8.22 *
(-2.25) (-2.30) (-2.76) (-1.71)
LinkedIn User 9.87 ** 6.00 16.44 *** 0.81
(2.12) (1.12) (3.37) (0.15)
Constant Included Included Included Included
Course of Study-FE Included Included Included Included
N 103 90 104 89
Adj. R² 0.27 0.28 0.30 0.12
Table 4 presents the regression results using split samples. In columns (1) and (2), we repeat our main regression
analysis on a restricted sample only containing students with above- (below-)median A-Level Grade. Columns (3)
and (4) present the main regression separately for students with above- and below-median attendance. Bold font
indicates the variable of interest. ***, **, * indicate statistical significance at the 1%, 5%, and 10% level (two-tailed),
respectively. t-values are presented in parentheses. All variables are defined in Appendix A
(https://tinyurl.com/zjehfa3n).
Table 4. Results of Split Sample Regressions
32