A R T I C L E   I N F O

Keywords:
Applications in subject areas
Post-secondary education
Teaching/learning strategies
Gamification

A B S T R A C T

The advent of Generative Artificial Intelligence (GenAI) and large language models such as LLaMA, PaLM2, GPT, Gemini, and Claude has revolutionized education by generating human-like text and contextually relevant responses. Our research investigates the impact of structured prompt training on students' learning in data analysis and programming. We experimented with 157 first-year engineering students divided into three groups: a control group (G1: internet access, no GenAI), experimental group 1 (G2: internet and GenAI without prompt training), and experimental group 2 (G3: internet and GenAI with prompt training). The prompt training session included techniques such as few-shot prompting, chain prompting, and the CLEAR framework. We assessed participants' performance in data analysis tasks using Python, with pre-tests and post-tests measuring their programming skills across three Bloom's taxonomy levels (understanding, application, and analysis). ANOVA on post-test scores showed significant differences among the groups, with G3 (with prompt training) outperforming G2 (without prompt training) and the control group across all three levels, evidenced by higher mean scores (G3: 6.60, G2: 4.94, Control: 4.28); a similar pattern was observed in task completion. These results underscore the effectiveness of structured prompt training in enhancing students' data analysis and programming skills. Our study highlights the potential of GenAI and structured prompt training to transform educational practices and suggests future research directions, including integrating prompt engineering within human-AI collaboration.
* Corresponding author.
E-mail addresses: arggarg1996@gmail.com (A. Garg), nisumba_soodhani@iitb.ac.in (K. Nisumba Soodhani), ramkumar.rajendran@iitb.ac.in (R. Rajendran).
1. LLaMA, PaLM2, GPT-3, and Claude 2.
https://doi.org/10.1016/j.caeai.2025.100380
Received 1 October 2024; Received in revised form 10 January 2025; Accepted 3 February 2025
Extending beyond these domains, generative AI has also shown remarkable promise in programming, and recent studies have illuminated its impactful role in this field. An experimental study showcased how the programming tool Codex, powered by generative AI, outperformed learners in a CS1 class on a rainfall problem, ranking in the top quartile (Denny et al., 2023). Another investigation used the flake8 tool to assess code generated by AI against the PEP8 coding style, revealing a minimal syntax error rate of 2.88% (Feng et al., 2023). A notable study involved Github's generative AI platform, which initially failed to solve 87 Python problems; however, applying prompt engineering techniques enabled it to resolve approximately 60.9% of them successfully (Finnie-Ansley et al., 2022). These findings collectively highlight the efficacy of generative AI in code generation. Despite these advancements, there is a notable gap in empirical research examining the influence of prompt training in the education domain. While prompt engineering has been shown to enhance the performance of generative AI in solving problems, its application and effectiveness in the context of data analysis education remain underexplored. This area needs further investigation to determine how targeted prompt training can improve learning outcomes and practical skills for students in data analysis courses.

2.2. Challenges in using GenAI

Alongside these advancements, generative AI also faces significant challenges. One major issue is hallucination in AI models, where the generated content deviates significantly from practical constraints or reality (Mukherjee & Chang, 2023). This problem arises because AI models, unlike traditional rule-based systems, learn from examples encountered during their training phase and lack an explicitly codified set of rules to separate fact from fiction (Varsha, 2023). Consequently, AI models may produce creative content that deviates too far from practical constraints (hallucination) or remains too rigidly within the confines of existing data (memorization) (Prasai, 2023; Ji et al., 2023). Striking a balance between novelty and usefulness is crucial for generating original content that is both relevant and practically applicable in a given context (Mukherjee & Chang, 2023). Understanding and mitigating hallucination is essential for developing AI systems that generate trustworthy and practical content (Sovrano et al., 2023). Additionally, AI models, trained on vast datasets, generate content based on probabilities without understanding real-world contexts, leading to potentially misleading and irrelevant outputs (Ebert & Louridas, 2023). Ensuring the accuracy and reliability of AI-generated content is therefore crucial.

These challenges can be addressed both from the designer's end and from the learner's end. From the designers' perspective, remedies include fine-tuning the model and developing human-AI collaboration (Leiser et al., 2023). From the learner's end, one promising solution is prompt engineering. Prompt engineering extends the capabilities of large language models (LLMs) and vision-language models (VLMs) by providing task-specific instructions (Sahoo et al., 2024). These prompts guide model outputs without modifying core parameters, allowing seamless integration into diverse tasks (Sahoo et al., 2024; Shah, 2024). This approach enhances model efficacy, enabling success in applications ranging from question answering to commonsense reasoning. Prompting leverages pre-trained knowledge to generate desired behaviors, enhancing adaptability and reducing the need for extensive retraining (Kucharavy, 2024; Sahoo et al., 2024; Shah, 2024). By systematically categorizing various prompting techniques and highlighting their strengths and limitations, prompt engineering offers a practical way to address the challenges posed by generative AI (Brown et al., 2020; Wei et al., 2022; Radford et al., 2019).

2.3. Prompt engineering

The journey of prompt engineering begins with zero-shot prompting, where models perform tasks based on general instructions without specific training data, as demonstrated by Radford et al. (2019). Next, few-shot prompting introduces a few examples to improve model understanding, with Brown et al. (2020) showing that providing high-quality examples enhances performance on complex tasks. Chain-of-thought (CoT) prompting, introduced by Wei et al. (2022), advances the technique by enabling step-by-step reasoning, significantly improving performance in reasoning tasks. To automate this process, Zhang et al. (2022) proposed Automatic Chain-of-Thought (Auto-CoT) prompting, which generates diverse reasoning chains automatically, improving accuracy and robustness without manual effort. Further refining CoT prompting, Wang et al. (2022) introduced self-consistency, generating multiple reasoning chains and selecting the most consistent final answer to enhance accuracy through diverse approaches. Finally, Logical Chain-of-Thought (LogiCoT) prompting, proposed by Zhao et al. (2023), integrates symbolic logic principles to verify each reasoning step, reducing errors and hallucinations.

To capture the nuance of these prompting techniques, frameworks have been designed to facilitate their practical implementation in real-world applications. Prompting frameworks play a crucial role in bridging the gap between a model's capabilities and the practical needs of users. They provide the necessary infrastructure for integrating external tools, maintaining historical information, and ensuring structured and safe outputs (Liu et al., 2023). For example, frameworks like LangChain and Semantic Kernel enable LLMs to interact with databases, web browsers, and other external systems, thus overcoming the LLMs' inherent limitations and extending their applicability (Liu et al., 2023). In this study we have chosen the CLEAR (Concise, Logical, Explicit, Adaptive, Reflective) framework, summarized in Table 1, which offers a systematic strategy for crafting prompts that harness the full potential of AI language models. It is not merely about the individual components of clarity, context, formatting, or verbosity control; it is about how these elements synergize through a comprehensive, adaptable framework to elevate the entire communication process with AI (Lo, 2023).

Table 1
Elements of the CLEAR framework provide guidance for writing structured prompts (Lo, 2023).

Characteristic | Description
Concise | Prompts should be clear and focused on the task's core elements to guide the AI towards relevant and precise responses.
Logical | Prompts need a logical flow of ideas, helping the AI understand context and relationships between concepts, resulting in coherent outputs.
Explicit | Prompts must clearly specify the expected output format, content, or scope to avoid irrelevant responses.
Adaptive | Prompts should be flexible, allowing experimentation with different structures and settings to balance creativity and specificity.
Reflective | Continuous evaluation and refinement of prompts are crucial, using insights from previous responses to improve future interactions.

3. Methods

3.1. Experimental procedure

The study emphasizes two critical data analysis concepts: data aggregation and merging. Aggregation simplifies data, revealing trends and easing novices into complex tasks, much like learning the alphabet before forming sentences (McKinney, 2022). Merging integrates diverse datasets, essential for the data landscape, offering a unified view (McKinney, 2022). The data aggregation and merging tasks are designed so that they cannot be completed with non-programming software like Excel and Tableau. We selected Python programming because of its relevance in real-world applications, along with market demand, as highlighted by Ying and Zhang (2019). The dataset includes dummy data of tablet usage by school students from September to January, capturing attributes such as student video usage, student ID, school ID, view count, and last access date and time. Each month's dataset contains over 10,000 observations, providing a comprehensive view of student engagement with video content. The dataset draws inspiration from a school education program in which students are provided tablets to enhance learning. The following are the problem statements of the task:

• T1: Calculate the total daily video usage for each student across all months. (In the dataset, students watched the content multiple times a day. Participants need to calculate the total daily video usage for each day and repeat this task for all four months of the dataset, October to January.)
• T2: Given the unique data capture cycle of student video usage (the 26th of one month to the 25th of the next), compute the monthly total video usage for each student. For example, compute the total video usage for October (1st October to 31st October).
• T3: Calculate the monthly video usage for each school over all the months. (Each school has multiple students, so aggregate the total video usage for each month and compute the total video usage during the intervention period for each school.)

Datasets and tasks are provided to students, and the output CSV/Excel file is collected at the end of the task.

3.2. Participants

We grouped learners into three groups: the control group (access to the internet but no access to GenAI), experimental group 1 (access to the internet and GenAI without prompt training), and experimental group 2 (access to the internet and GenAI with prompt training). In total, 157 learners participated in this study; all were first-year undergraduate students from an engineering college. Though the students were from different streams, all of them were first-year undergraduate students for whom the curriculum remains the same regardless of their specialization. The stream-specific courses start from the second year onwards.

The sample size is predicated upon several critical metrics, namely effect size, power, and alpha level. According to the methodologies outlined by Endres et al. (2017), an effect size of 0.35 was selected, representing the expected magnitude of differences between the groups. To achieve a statistically robust analysis, a power level of 0.8 was established. This power level signifies an 80% probability of detecting true effects, thereby minimizing the risk of Type II errors (Cohen, 2013). An alpha level of 0.05 was set to control for the likelihood of Type I errors, ensuring the precision and accuracy of the results. Utilizing these parameters, a GPower analysis was conducted to determine the required sample size for the study. GPower, a widely recognized tool for power analysis, indicated that a sample size of 125 participants would be necessary to achieve reliable results using the one-way ANOVA technique.

To facilitate effective comparative analysis, the study flow for the various groups is shown in Fig. 1. A survey on familiarity with the use of GenAI was collected, followed by the pre-test, tasks, and post-test. Prompt training was conducted for group 3 before the start of the task. Ethical approval was obtained from the Institute Review Board of the authors' institution.

Fig. 1. The study flow for all three groups, highlighting where their procedures differ.

3.3. Measures

3.3.1. Data literacy survey

In line with Hamilton et al., who define data literacy as ``the ability to ask and answer questions about collecting, analyzing, and making sense of data,'' a customized four-item Likert-scale survey was developed to evaluate participants' competence in using generative AI for data analysis (Hamilton et al., 2009). This measure encompasses two primary dimensions: (1) confidence in applying AI-driven solutions, and (2) frequency of AI usage in practical data-analysis scenarios. The scale design drew on several sources in the existing literature. Oguguo et al. highlighted critical dimensions of student data literacy, informing the survey's emphasis on tangible skill assessment (Oguguo et al., 2020). Zheng provided methodological insights into teaching and learning data analytics, which guided item development regarding practical AI usage (Zheng, 2019). Finally, da Silveira et al. underscored the importance of programming languages in cultivating analytical proficiency, shaping the survey's focus on technical fluency (Balreira et al., 2023). Together, these references helped ensure that the survey items captured both the theoretical underpinnings of data literacy and its real-world application within generative AI contexts.

Additionally, the data literacy survey questionnaire was meticulously reviewed by experts to ensure alignment with the curriculum requirements of engineering colleges in India. Further statistical evaluations, including a Cronbach's alpha value of 0.71, confirmed the questionnaire's reliability, making it a robust tool for assessing data literacy skills within this educational context.

3.3.2. Pre-test and post-test of domain knowledge

A set of 15 multiple-choice questions was used for the pre-test and post-test, focusing on the concepts of aggregation and merging. These questions assessed the participants' comprehensive understanding of the concepts through Python programming. Designed across three levels of Bloom's taxonomy - understanding (L1), applying (L2), and analyzing (L3) - the test comprised five questions for each level. The questions were adapted to assess concepts of data aggregation and data merging in Python, drawing inspiration from the official Pandas documentation and the book Python for Data Analysis by McKinney (2022).

• Item Development and Expert Review: Each item underwent face- and content-validity checks with domain experts (academics and industry professionals). We refined or removed ambiguous items during the development phase based on expert feedback. This review ensured that questions accurately assessed data aggregation and merging skills at the intended Bloom's taxonomy level.
• Statistical Item Analysis: We conducted item analysis on the pre-test scores to evaluate question quality. Cronbach's alpha for the overall test was 0.42; while modest, this was deemed acceptable given the exploratory nature and specific context of this study. While this value is below the conventional benchmark of 0.70, we believe it can be partly attributed to:
  – The short test length (15 items split across three distinct cognitive levels).
  – The multidimensional nature of the instrument (covering understanding, application, and analysis).

Despite this modest alpha, the iterative review process ensures construct and content validity in line with our educational objectives.

3.3.3. Task score

The tasks mentioned in Section 3.1 were evaluated using a rubric. Task completion scores were calculated across three tasks, including the total score. For Task 1 and Task 2, each task is worth a maximum of 4
points, where students earn 1 point per correctly produced data output file for each month, as specified in the dataset. Thus, students who generate all four required files correctly receive the full 4 points for these tasks. Task 3, however, has a maximum score of 2 points and requires the correct merging of data across all schools. Here, students receive 1 point for a correctly merged dataset but lose 1 point if any school is missing from the merged data, reflecting the importance of comprehensive data integration. Overall, the scoring system emphasizes accuracy in file generation and completeness in data merging.

3.3.4. System usability scale

The System Usability Scale (SUS) was adapted to capture feedback on the generative AI tool used in data analysis. This scale is a widely utilized 10-item questionnaire that has been validated across numerous domains to measure perceived ease of use and user satisfaction. The SUS uses a balanced mix of positive and negative items scored on a 5-point scale, suitable for evaluating the tool's effectiveness and user experience. A comprehensive evaluation of the tool's impact on user proficiency can be achieved by triangulating SUS scores with task performance and learning metrics (Brooke, 1996).

3.4. Experience survey

The Experience Survey was conducted to gather detailed feedback from participants regarding their use of prompts in data analysis tasks. The survey focused on the usability of prompts and suggestions for improvement. The feedback collected through this survey was analyzed to identify common themes and areas for improvement.

3.5. Prompt training

In the previous description of the control and experimental groups, it was noted that one subset of participants was granted access to generative AI and underwent training in prompt formulation. To facilitate this, a one-hour instructional session was organized before the main task, as depicted in Fig. 2. During this session, participants were introduced to a handout that explains the fundamentals of generative AI and various prompting techniques such as zero-shot prompting, few-shot prompting, and chain prompting. It also includes the CLEAR framework for prompting and prompting strategies from OpenAI. Students read through the handout to understand these basic concepts, followed by a short group discussion to assess their understanding and address any questions they may have.

Then, students apply their knowledge by writing a prompt for a given question, shown in Fig. 3. This exercise allows them to reflect on their understanding and practice their prompt-writing skills. After five minutes, we present a structured prompt for the same question and facilitate a discussion to analyze its structure and the concepts employed. This helps students connect the learned concepts to the prompt, reinforcing their comprehension.

In the next phase, students work on a data analysis problem, shown in Fig. 4, using a provided dataset that is different from the task dataset. They write a prompt to generate Python code for the analysis, applying their previous learning and seeking assistance online or from ChatGPT if needed. After their attempt, we present a structured prompt for the same problem and engage in a discussion to decode and explain the strategies used. Students then use the prompt to generate and run the Python code, ensuring it functions as expected.
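To make the handout's techniques concrete, the sketch below contrasts zero-shot, few-shot, chain-style, and CLEAR-style prompts for a small data-analysis question of the kind used in the session. All wording, and column names such as school_id and view_count, are illustrative assumptions, not the actual handout text.

```python
# Illustrative prompt styles; all wording here is hypothetical, the
# session's actual prompts are shown in Figs. 3 and 4 of the paper.

zero_shot = "Write Python code that computes each student's total daily video usage."

few_shot = """You translate task descriptions into pandas code.

Task: total view count per school.
Code: df.groupby('school_id')['view_count'].sum()

Task: total daily video usage per student.
Code:"""

chain_prompt = """Let's solve this step by step:
1. Load the monthly CSV files into a single DataFrame.
2. Parse the last-access timestamp and extract the calendar date.
3. Group by student ID and date, summing the video usage.
4. Write one output CSV per month.
Now write the complete Python code implementing these steps."""

# A CLEAR-style prompt keeps the request concise, logically ordered,
# explicit about the output, and invites iteration on the result.
clear_prompt = """Role: You are a Python tutor for first-year engineers.
Task: Generate pandas code for the aggregation described above.
Output: a single runnable script, followed by a line-by-line explanation.
If any column name is ambiguous, ask before writing code."""
```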
Table 2
Establishing baseline homogeneity on the basis of participants' demographics, data literacy survey responses, and pre-test scores.

Measure | G1 (Control) | G2 | G3 | Test statistic
Gender: Male | 46 | 29 | 42 | 𝜒²(2) = 2.66, p = 0.2645
Gender: Female | 18 | 5 | 17 |

Data Literacy Survey (Mean (S.D.))
Confidence in performing data analysis tasks | 2.53 (1.08) | 2.53 (0.98) | 2.73 (0.86) | F(2, 154) = 0.742, p = 0.4777
Confidence in programming with Python | 2.64 (0.87) | 2.38 (0.67) | 2.54 (0.60) | F(2, 154) = 1.026, p = 0.3610
Frequency of using generative AI for data analysis | 3.09 (1.20) | 2.65 (1.08) | 3.15 (1.17) | F(2, 154) = 2.632, p = 0.0751
Confidence in using generative AI for data analysis | 3.08 (1.03) | 2.50 (0.86) | 2.93 (0.75) | F(2, 154) = 4.238, p = 0.0899

Pre-test Scores (Mean ± S.D.)
Total Score | 3.85 (2.50) | 3.47 (2.71) | 3.83 (2.14) | F(2, 144) = 0.736, p = 0.481
This instructional design ensures that students not only grasp the theoretical aspects of prompting but also apply them practically, creating a comprehensive learning experience. By integrating reading, discussion, application, and reflection, we foster a holistic learning environment that promotes deeper understanding and skill development in prompt writing and generative AI.

To ensure that no undue advantage was conferred in the task domain, the live example provided was unrelated to the data analysis tasks. It is worth noting that, post-study, the remaining two groups were also educated on prompt formulation to uphold ethical standards.

3.6. Statistical analysis

All statistical analyses were performed utilizing the Real Stats add-on for MS Excel. To compare data literacy scores among the three groups (one control and two experimental), a one-way Analysis of Variance (ANOVA) was employed, as it is suitable for assessing mean differences across multiple independent groups. Gender-based effects were analyzed using a chi-square test, appropriate for examining associations between categorical variables. The chi-square assumptions were satisfied because the observations were independent, the sample size exceeded 20, and the expected cell frequencies were sufficiently large.

4. Results

4.1. Baseline homogeneity

Baseline homogeneity was assessed among all 157 participants using a data literacy survey, demographic analysis, and pre-test scores. All participants were first-year engineering students in India, aged between 18 and 20, with a common curriculum background. Group details are provided in Table 2. The survey included Likert scale-based questions designed to gauge participants' skills in data analysis and programming. The responses were recorded on a 1-5 scale, with 1 indicating low confidence or frequency and 5 indicating high confidence or frequency. ANOVA was conducted for each question to test for significant differences between the groups. The results, including means, standard deviations, and p-values, are presented in Table 2. No significant differences were found among the groups for any question (p > 0.05), confirming the homogeneity in baseline skills. Additionally, the chi-square test confirmed no significant gender-based effect, 𝜒²(2) = 2.66, p = 0.2645.

These results indicate no significant differences among the groups in their pre-test scores for any level (p > 0.05), confirming baseline homogeneity in programming skills related to data analysis. This ensures that any observed effects in the main study are due to the experimental conditions rather than pre-existing differences among the participants.
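For replication, the checks described in Section 3.6 can also be run outside Excel. The sketch below uses scipy rather than the Real Stats add-on the study used; the score lists are placeholders, while the gender-by-group contingency table reuses the counts reported in Table 2.

```python
# A minimal sketch of the Section 3.6 homogeneity checks using scipy.
# Score lists are placeholders, not the study data; the contingency
# table reuses the gender counts from Table 2.
from scipy import stats

g1_scores = [3, 4, 2, 5, 3]  # hypothetical pre-test scores, control group (G1)
g2_scores = [4, 3, 3, 5, 2]  # hypothetical scores, GenAI without training (G2)
g3_scores = [3, 5, 4, 4, 3]  # hypothetical scores, GenAI with training (G3)

# One-way ANOVA across the three independent groups.
f_stat, p_value = stats.f_oneway(g1_scores, g2_scores, g3_scores)
print(f"F = {f_stat:.3f}, p = {p_value:.4f}")

# Chi-square test of independence on the gender-by-group table; with the
# Table 2 counts this closely reproduces the reported chi2(2) = 2.66,
# p = 0.2645.
table = [[46, 29, 42],   # male counts for G1, G2, G3
         [18, 5, 17]]    # female counts for G1, G2, G3
chi2, p, dof, _ = stats.chi2_contingency(table)
print(f"chi2({dof}) = {chi2:.2f}, p = {p:.4f}")
```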
Fig. 3. Structured prompt example 1, explaining the concepts and strategies used to write the prompt; this example is for the UPSC problem.
Fig. 4. Structured prompt for a given dataset problem, asking the model to generate Python code and to explain the code.
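As a concrete illustration of the Fig. 4 workflow, the sketch below shows the kind of pandas code a structured prompt might elicit for task T1 (total daily video usage per student). The file pattern and column names (tablet_usage_*.csv, student_id, access_time, video_usage) are assumptions for illustration; the study's actual dataset schema is not reproduced here.

```python
# Hypothetical implementation of task T1; file and column names are assumed.
import glob

import pandas as pd

# Load the monthly CSV files into one DataFrame.
frames = [pd.read_csv(path, parse_dates=["access_time"])
          for path in sorted(glob.glob("tablet_usage_*.csv"))]
df = pd.concat(frames, ignore_index=True)

# Total daily usage: group by student and calendar date, then sum.
df["date"] = df["access_time"].dt.date
daily = (df.groupby(["student_id", "date"], as_index=False)["video_usage"]
           .sum())

# One output file per month, mirroring the task's expected deliverables.
daily["month"] = pd.to_datetime(daily["date"]).dt.to_period("M").astype(str)
for month, chunk in daily.groupby("month"):
    chunk.drop(columns="month").to_csv(f"daily_usage_{month}.csv", index=False)
```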
4.2. Effect of prompt training and generative AI on programming learning

The impact of the intervention on participants' programming knowledge was analyzed using ANOVA on post-test scores for the understanding (L1), application (L2), and analysis (L3) levels, as well as the total score, as reported in Table 3. The table presents the results of the 144 students who completed the post-test. Out of the 157 students who consented to participate in the study, 13 did not take the post-test. The results showed significant differences among the groups for all levels, indicating a substantial impact of the intervention. G3 consistently outperformed G2 and G1, with the highest mean scores across all levels. For L1 (understanding), G3 scored 2.69 (s.d. = 0.88), G2 scored 2.31 (s.d. = 0.87), and G1 scored 1.39 (s.d. = 0.35). Similar trends were observed for L2 (application) and L3 (analysis), with G3 scoring significantly higher than the other groups. The total post-test scores also revealed G3's superior performance (6.60, s.d. = 2.69) compared to G2 (4.94, s.d. = 2.25) and G1 (4.28, s.d. = 1.17). These results underscore the effectiveness of the intervention - prompt training with the integration of generative AI - particularly for G3.

Further analysis using Tukey's HSD test revealed significant differences between specific groups. These findings highlight the effectiveness of the intervention for G3, suggesting that the training program significantly enhanced their programming skills in data analysis compared to G1 and G2. Additionally, at the understanding level (L1), G2 performed substantially better than G1, highlighting the effectiveness of generative AI compared to internet search. At the same time, the G1-G2 difference at levels 2 and 3 is not significant, which highlights the need for prompting skills.

4.3. Impact of intervention on task completion scores

To evaluate the impact of the intervention, we analyzed the task completion scores across three tasks and the total score for all 157 participants. Each task was rated based on a detailed rubric, with scores out of 10. The ANOVA results in Table 4 indicate significant differences among the groups for all tasks, highlighting the intervention's effect. G3 consistently scored highest across all tasks and the total score. For Task 1, G3 scored 3.41 (s.d. = 0.75), significantly higher than G2 (2.74, s.d. = 0.83) and G1 (1.11, s.d. = 0.94), with F(2, 154) = 117.38, p < 0.001. A similar trend was observed for Task 2, with G3 outperforming G2 and G1. For Task 3, G2 and G3 outperformed G1, but there was no significant difference between G2 and G3. The total score analysis also showed significant differences, with G3 scoring 8.76 (s.d. = 1.43) compared to G2 (6.91, s.d. = 2.34) and G1 (2.00, s.d. = 2.26), with F(2, 154) = 183.15, p < 0.001.
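The post hoc comparisons reported in Tables 3 and 4 can be reproduced with standard tooling. Below is a minimal sketch using statsmodels' Tukey HSD, with placeholder score arrays rather than the study data, which were analyzed with the Real Stats Excel add-on.

```python
# A minimal sketch of a Tukey HSD post hoc comparison across three groups.
# The score arrays are placeholders, not the study data.
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

scores = np.array([4.1, 4.5, 3.9, 5.0, 4.8,   # hypothetical G1 totals
                   6.4, 6.9, 6.1, 7.0, 6.5,   # hypothetical G2 totals
                   8.3, 8.9, 8.6, 9.1, 8.7])  # hypothetical G3 totals
groups = np.array(["G1"] * 5 + ["G2"] * 5 + ["G3"] * 5)

result = pairwise_tukeyhsd(endog=scores, groups=groups, alpha=0.05)
print(result)  # pairwise mean differences with adjusted p-values
```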
Table 3
Comparison of post-test scores using ANOVA, Welch's ANOVA, and post hoc tests, indicating significant differences in post-test scores at all levels and total score among the three groups. Means with standard deviations in parentheses; the last three columns give pairwise post hoc p-values.

Level | G1 | G2 | G3 | ANOVA | Welch's ANOVA | G1 vs G3 | G1 vs G2 | G2 vs G3
L1: Understanding | 1.39 (0.35) | 2.31 (0.87) | 2.69 (0.88) | F(2, 141) = 37.22, p = 1.05E-13 | F(2, 71.76) = 43.075, p = 5.13E-13 | 8.92E-14 | 2.76E-06 | 0.097
L2: Application | 1.56 (0.46) | 1.38 (0.44) | 2.24 (0.41) | F(2, 141) = 22.28, p = 3.90E-09 | F(2, 80.53) = 22.8298, p = 1.40E-08 | 7.91E-07 | 0.410 | 9.01E-08
L3: Analysis | 1.32 (0.40) | 1.28 (0.34) | 1.84 (0.44) | F(2, 141) = 12.12, p = 1.39E-05 | F(2, 82.99) = 11.7119, p = 3.31E-05 | 7.42E-05 | 0.967 | 3.59E-04
Total Score | 4.28 (1.17) | 4.94 (2.25) | 6.60 (2.69) | F(2, 141) = 39.29, p = 2.75E-14 | F(2, 74.41) = 38.4089, p = 3.48E-12 | 1.48E-14 | 0.092 | 1.30E-06
Table 4
Task scores across groups G1, G2, and G3. Means with standard deviations in parentheses; the last three columns give pairwise post hoc p-values.

Task | G1 | G2 | G3 | ANOVA | Welch's ANOVA | G1 vs G3 | G1 vs G2 | G2 vs G3
Task 1 | 1.11 (0.94) | 2.74 (0.83) | 3.41 (0.75) | F(2, 154) = 117.38, p = 1.08E-31 | F(2, 86.37) = 113.31, p = 8.63E-26 | 1.69E-14 | 1.51E-14 | 9.62E-04
Task 2 | 0.75 (1.57) | 2.59 (1.94) | 3.66 (1.12) | F(2, 154) = 57.81, p = 1.87E-19 | F(2, 77.23) = 69.91, p = 7.29E-18 | 1.69E-14 | 1.63E-07 | 3.55E-03
Task 3 | 0.28 (0.70) | 1.59 (0.82) | 1.69 (0.73) | F(2, 154) = 66.15, p = 1.84E-21 | F(2, 82.76) = 68.59, p = 2.07E-21 | 1.69E-14 | 9.02E-14 | 0.78
Total Score | 2.00 (2.26) | 6.91 (2.34) | 8.76 (1.43) | F(2, 154) = 183.15, p = 1.94E-41 | F(2, 77.97) = 198.05, p = 6.75E-47 | 1.69E-14 | 1.69E-14 | 9.69E-05
Table 5
Hake gain of pre- and post-test scores, total task score, and Spearman correlation for groups G1, G2, and G3.
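For reference, the normalized (Hake) gain reported in Table 5 is conventionally defined as g = (post - pre) / (max - pre). Below is a minimal sketch of this computation and the accompanying Spearman correlation, with hypothetical scores; the 15-point maximum follows the test described in Section 3.3.2, and whether the authors used exactly this maximum is an assumption.

```python
# A minimal sketch of the Hake gain and Spearman correlation analysis.
# All scores below are hypothetical, not the study data.
from scipy.stats import spearmanr

MAX_SCORE = 15  # the pre/post tests had 15 items (Section 3.3.2); assumed max

def hake_gain(pre: float, post: float, max_score: float = MAX_SCORE) -> float:
    """Normalized gain: fraction of the possible improvement achieved."""
    return (post - pre) / (max_score - pre)

pre = [3, 5, 4, 6, 2]            # hypothetical pre-test scores
post = [6, 9, 7, 12, 5]          # hypothetical post-test scores
task_totals = [5, 7, 6, 9, 4]    # hypothetical task completion scores

gains = [hake_gain(p, q) for p, q in zip(pre, post)]
rho, p_value = spearmanr(gains, task_totals)
print(f"Spearman rho = {rho:.2f}, p = {p_value:.4f}")
```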
Table 4 shows the mean and standard deviation (Mean ± S.D.) of task scores and total scores across groups G1, G2, and G3. ANOVA results indicate significant differences, with post hoc test p-values highlighting specific group differences.

To further understand the relationship between programming knowledge and data analysis task performance, we conducted a Hake gain analysis and a non-linear correlation analysis (Spearman's correlation) between the Hake gain scores and the data analysis task completion scores. The Hake gain analysis results are summarized in Table 5.

Table 5 presents the Hake gain of pre- and post-test scores, total task score, and Spearman correlation for groups G1, G2, and G3. Of the 157 participants, 121 completed both the pre- and post-tests as well as the task. The correlation coefficients (r) and p-values indicate the strength and significance of the relationship between Hake gain and total task scores within each group.

There is a strong positive correlation between programming knowledge and data analysis task performance across all groups. Group 3 scored high in both areas, demonstrating a strong association between programming knowledge and better performance in data analysis tasks. Group 2 showed a significant role of programming knowledge in their intermediate performance levels, while Group 1, despite having lower scores, also displayed a strong association between programming knowledge and task performance. These findings emphasize that programming knowledge is a critical factor in enhancing data analysis task performance across all groups.

4.4. System usability scale results

To evaluate the usability of ChatGPT, a comparison of System Usability Scale (SUS) scores was conducted between experimental group 1 (G2), which did not receive prompt training, and experimental group 2 (G3), which received prompt training. An independent samples t-test assuming equal variances revealed a statistically significant difference in SUS scores between the two groups, with G3 reporting a mean SUS score of 77.33 (SD = 9.64) and G2 a mean score of 73.16 (SD = 10.18),
t(91) = 1.967, p = 0.026 (one-tailed). The effect size (r = 0.202) indicates a small but meaningful impact of prompt training on usability. These results suggest that prompt training significantly improves users' perceived usability of ChatGPT, underscoring its importance for enhancing user experience.

5. Discussion

5.1. Effect of generative AI and prompt training on programming knowledge

To address RQ1, ``What is the impact of integrating Generative AI with prompt engineering on first-year engineering students' learning in the domain of programming for data analysis?'', we compared the performance of the different groups. Specifically, Group 3 (G3), which utilized both Generative AI and prompt engineering, outperformed Group 1 (G1), which relied on traditional internet searches, across all three levels of programming knowledge. This demonstrates the effectiveness of the combined approach of Generative AI and prompt engineering over traditional methods. However, when examining the effect of Generative AI alone, the results indicate significant improvement only at the understanding level, but not at the other two levels. This finding suggests that while Generative AI alone contributes to learning, its full potential is realized when combined with structured prompt engineering.

To verify the role of prompt engineering, we compared Group 2 (G2), which used Generative AI without prompt training, to Group 3 (G3). The comparison revealed that no significant differences emerged at the understanding level, which corresponds to the lower-order thinking skills of Bloom's taxonomy. Nonetheless, the overall scores and performance at higher cognitive levels were significantly better in G3, highlighting the critical role of prompting skills in enhancing programming learning. These findings align with several recent studies emphasizing the role of Generative AI as a learning tool and the efficacy of prompting techniques. The enhanced learner performance observed in our study can be further explained by the higher System Usability Scale (SUS) scores reported by participants in G3 compared to G2. A more interactive and supportive environment with high usability reduces cognitive load, allowing participants to focus more on domain learning. Participants reported that prompt techniques, which involved asking the AI to explain the syntax and underlying concepts of generated code, significantly aided their understanding of programming intricacies, an advantage over both the control group (G1) and the experimental group without prompt training (G2).

Our finding that integrating Generative AI with prompt engineering significantly enhances first-year engineering students' learning outcomes in programming for data analysis is corroborated by several recent studies. For instance, Chan and Hu (2023) observed that ChatGPT's ability to handle repetitive tasks allows students to focus on advanced learning, enhancing their performance. Similarly, Alneyadi and Wardat (2023) demonstrated a clear improvement in post-test scores for students using ChatGPT, highlighting its effectiveness in educational settings. Additionally, Boubker (2024) found that personalized approaches with ChatGPT improve students' perceived usefulness and learning effectiveness, aligning with our results that structured prompt training enhances understanding and mastery of programming concepts. These studies collectively support our conclusion that combining Generative AI with prompt engineering is more effective than using traditional methods or AI alone.

5.2. Effect of generative AI and prompt training on data analysis task completion

To address RQ2, ``What is the impact of integrating Generative AI with prompt engineering on first-year engineering students in data analysis task completion?'', we analyzed the performance of the different groups in the given tasks. We found that Group 3 (G3), which utilized both Generative AI and prompt engineering, consistently outperformed all other groups. The task scores for G3 were significantly higher across all three individual tasks and at the total task score level, underscoring the combined benefits of Generative AI and prompt techniques. This was contrasted with Group 1 (G1), which relied solely on traditional methods, and Group 2 (G2), which used Generative AI without prompt training. When comparing G2 to G1, we found significant differences in all task scores, both individually and in total, indicating that even without prompt training, Generative AI substantially improves task completion rates compared to traditional methods. This finding aligns with previous research in programming, where code generated by AI tools often surpasses the quality of code written by novice learners within a limited time frame. Moreover, the higher System Usability Scale (SUS) scores reported by participants in G3 further support these results, indicating a more user-friendly and supportive learning environment. Recent studies show accuracy enhancements of up to 13.79% in code generation tasks and a 12% gain in complex reasoning tasks (Li et al., 2023). These improvements support our finding that integrating Generative AI with prompt engineering significantly enhances first-year engineering students' task completion in data analysis, highlighting the effectiveness of structured prompting in educational settings.

Additionally, a Spearman's correlation analysis revealed a strong positive correlation (approximately 0.8) between programming knowledge and task completion scores, suggesting that participants who performed well in the tasks also showed improved programming knowledge. This indicates that while Generative AI facilitates task completion, the retention and deeper understanding of programming concepts develop over time. Our study demonstrates that integrating Generative AI with prompt engineering significantly improves data analysis task completion among first-year engineering students. This integration not only enhances immediate task performance but also supports deeper learning and retention of programming concepts, making it a valuable tool in educational settings.

5.3. Experience survey feedback

The Learning Experience Survey results provide additional support for the efficacy of integrating Generative AI with prompt engineering. Participants reported that the diverse explanations offered by the AI catered to various learning styles, fostering a deeper understanding of programming concepts. The structured and iterative nature of prompt crafting promoted reflective learning and skill reinforcement. Additionally, the interactive and immediate feedback from the AI fueled curiosity and encouraged continuous discovery. These qualitative insights corroborate our quantitative findings, demonstrating that prompt engineering, combined with Generative AI, not only improves performance in programming tasks but also enriches the overall learning experience.

Additionally, participants noted that prompt training improved their questioning skills, emphasizing the importance of context in generating code, breaking down problems, and logically structuring queries for better responses. This suggests that prompt engineering not only enhances the effectiveness of Generative AI tools but also boosts learners' efficiency and problem-solving abilities during experiential learning.

5.4. Implications for educational theories and pedagogy

Our results show that structured prompt training (G3) fosters gains not only at the lower levels of Bloom's taxonomy (understanding) but also at higher levels (application and analysis). By teaching students how to formulate effective prompts and interpret AI-generated feedback, educators can scaffold progression across cognitive domains, moving learners from basic comprehension toward more complex analytical and evaluative tasks. This finding underscores the value of explicit prompt engineering instruction in helping novice programmers advance along Bloom's hierarchy more rapidly and effectively.

From a constructionist perspective, the prompt-training process functions as an effective scaffold, guiding students through a cycle of
experimentation, feedback, and reflection. Rather than passively receiving information, learners actively construct their programming knowledge by crafting prompts, testing AI responses, and refining their understanding of both code logic and data-analysis techniques. This iterative process echoes the hallmarks of cognitive apprenticeship, where novices learn expert strategies (e.g., chaining prompts, using the CLEAR framework) through guided practice. Integrating these scaffolds into a programming curriculum can thus enhance engagement and accelerate the development of problem-solving skills.

Lastly, blending Generative AI with structured prompt training closely aligns with 21st-century educational objectives that emphasize critical thinking, digital fluency, and adaptable problem-solving. By interacting with AI-driven tools, students in G3 cultivated skills in planning queries, evaluating AI outputs, and iterating on solutions, all of which are indispensable for modern engineering roles. Moreover, when basic or repetitive coding tasks are offloaded to AI, instructors can reorient class time to deeper conceptual discussions and collaborative projects, fostering the creativity, communication, and self-directed inquiry advocated by contemporary learning frameworks.

In practice, these findings suggest that instructors should consider incorporating explicit prompt-engineering lessons, covering both technical (e.g., how to specify parameters or contexts) and cognitive (e.g., how to break down a complex task into smaller prompts) dimensions, into early programming curricula. Doing so can help students leverage AI more effectively for task completion and skill-building, while also honing their ability to articulate, analyze, and refine problem-solving strategies. Ultimately, by combining Generative AI with thoughtfully designed scaffolds, educators can foster robust learning environments that promote immediate task success.

6. Conclusion and future work

6.1. Conclusion

The integration of generative AI in education is transforming teaching and learning methodologies. This study explores the potential of structured prompt training to enhance the educational utility of AI-generated content. Our findings demonstrate that prompt training significantly improves learning outcomes, engagement, and the quality of AI interactions. In data analysis and programming tasks, participants who received structured prompt training outperformed those who did not. By delivering context-specific cues and structured guidance, prompt training significantly improves problem-solving skills by addressing the limitations of AI-generated responses, which often lack contextual relevance.

The study has the following limitations. The gender imbalance among participants and the lack of comparative gender analysis limit the generalizability of the findings. Additionally, the short duration of the study poses challenges in assessing the long-term effects of the interventions. The lack of detailed interaction analysis and quantification of prompting skills during task execution makes it difficult to understand the intricacies of the learning process.

6.2. Future work

Future research should address the limitations identified in this study to enhance our understanding of the impact of generative AI in education. Longitudinal studies are essential to explore the sustained impact of structured prompt training on learning outcomes over time. A more balanced gender representation and comparative gender analysis are necessary to ensure the results are applicable across different demographics. Multi-modal analysis, incorporating visual and auditory cues, should be pursued to gain deeper insights into student-AI interactions and cognitive processes. This approach can lead to more personalized and effective educational interventions. Developing sophisticated interaction analysis systems will be key to providing real-time feedback and adapting to individual learning styles, thereby optimizing learning outcomes. Theoretical advancements are needed to refine existing frameworks and guide the design of AI-driven educational interventions. New approaches to prompt engineering and user interface design can enhance the usability and effectiveness of AI tools in the learning-teaching domain.

CRediT authorship contribution statement

Ashish Garg: Writing -- original draft, Methodology, Investigation, Formal analysis, Data curation, Conceptualization. K. Nisumba Soodhani: Writing -- review & editing, Writing -- original draft, Methodology, Investigation. Ramkumar Rajendran: Writing -- review & editing, Validation, Supervision, Software, Methodology, Investigation, Funding acquisition, Conceptualization.

Ethical considerations and data availability

The study was approved by the Institution Review Board ethical committee with ID: IIT-IRB/2021/006. Informed consent was obtained from all participants, and their privacy rights were strictly observed. The data can be obtained by sending request emails to the corresponding author.

Declaration of generative AI and AI-assisted technologies in the writing process

During the preparation of this work, the author(s) used ChatGPT 3.5 in order to check for grammatical errors and improve readability. After using this tool/service, the authors reviewed and edited the content as needed and take full responsibility for the content of the publication.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

We acknowledge the SBI Foundation Hub for Data Science & Analytics, IIT Bombay, for supporting the work done in this project. We thank the teachers and students who participated in this study, whose contributions were invaluable to this work.

References

Aktay, S., Gök, S., & Uzunoğlu, D. (2023). ChatGPT in education. Türk Akademik Yayınlar Dergisi (TAY Journal), 7(2), 378--406.
Alier, M., García-Peñalvo, F., & Camba, J. D. (2024). Generative artificial intelligence in education: From deceptive to disruptive.
Alneyadi, S., & Wardat, Y. (2023). ChatGPT: Revolutionizing student achievement in the electronic magnetism unit for eleventh-grade students in Emirates schools. Contemporary Educational Technology, 15(4), Article ep448.
Balreira, D. G., Silveira, T. L. d., & Wickboldt, J. A. (2023). Investigating the impact of adopting Python and C languages for introductory engineering programming courses. Computer Applications in Engineering Education, 31(1), 47--62.
Barbas, M. P., Vieira, A. T., & Branco, P. D. (2023). The importance of ChatGPT training for higher education: Case study. In International conference on design and digital communication (pp. 695--705). Springer.
Boubker, O. (2024). From chatting to self-educating: Can AI tools boost student learning outcomes? Expert Systems with Applications, 238, Article 121820.
Bozkurt, A. (2023). Generative artificial intelligence (AI) powered conversational educational agents: The inevitable paradigm shift. Asian Journal of Distance Education, 18(1).
Brooke, J. (1996). System usability scale (SUS). In Usability evaluation in industry.
Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., et al. (2020). Language models are few-shot learners. Advances in Neural Information Processing Systems, 33, 1877--1901.
Brynjolfsson, E., Li, D., & Raymond, L. R. (2023). Generative AI at work. Tech. Rep., National Bureau of Economic Research.
Cao, L., & Dede, C. (2023). Navigating a world of generative AI: Suggestions for educators. The Next Level Lab at Harvard Graduate School of Education, 5(2).
Chan, C. K. Y., & Hu, W. (2023). Students' voices on generative AI: Perceptions, benefits, and challenges in higher education. International Journal of Educational Technology in Higher Education, 20(1), Article 43.
Chen, X., Zou, D., Xie, H., Cheng, G., & Liu, C. (2022). Two decades of artificial intelligence in education. Educational Technology & Society, 25(1), 28--47.
Cohen, J. (2013). Statistical Power Analysis for the Behavioral Sciences. Routledge.
Dave, D. M., Mandvikar, S., & Engineer, P. A. (2023). Augmented intelligence: Human-AI collaboration in the era of digital transformation. International Journal of Engineering Applied Sciences and Technology, 8(6), 24--33.
Denny, P., Kumar, V., & Giacaman, N. (2023). Conversing with Copilot: Exploring prompt engineering for solving CS1 problems using natural language. In Proceedings of the 54th ACM technical symposium on computer science education V. 1 (pp. 1136--1142).
Donoho, D. (2024). Data science at the singularity. Harvard Data Science Review, 6(1).
Doughty, J., Wan, Z., Bompelli, A., Qayum, J., Wang, T., Zhang, J., Zheng, Y., Doyle, A., Sridhar, P., Agarwal, A., et al. (2024). A comparative study of AI-generated (GPT-4) and human-crafted MCQs in programming education. In Proceedings of the 26th Australasian computing education conference (pp. 114--123).
Ebert, C., & Louridas, P. (2023). Generative AI for software practitioners. IEEE Software, 40(4), 30--38.
Endres, T., Carpenter, S., Martin, A., & Renkl, A. (2017). Enhancing learning by retrieval: Enriching free recall with elaborative prompting. Learning and Instruction, 49, 13--20.
Feng, Y., Vanam, S., Cherukupally, M., Zheng, W., Qiu, M., & Chen, H. (2023). Investigating code generation performance of ChatGPT with crowdsourcing social data. In 2023 IEEE 47th annual computers, software, and applications conference (COMPSAC) (pp. 876--885). IEEE.
Feuerriegel, S., Hartmann, J., Janiesch, C., & Zschech, P. (2024). Generative AI. Business & Information Systems Engineering, 66(1), 111--126.
Finnie-Ansley, J., Denny, P., Becker, B. A., Luxton-Reilly, A., & Prather, J. (2022). The robots are coming: Exploring the implications of OpenAI Codex on introductory programming. In Proceedings of the 24th Australasian computing education conference (pp. 10--19).
Fui-Hoon Nah, F., Zheng, R., Cai, J., Siau, K., & Chen, L. (2023). Generative AI and ChatGPT: Applications, challenges, and AI-human collaboration.
Gozalo-Brizuela, R., & Garrido-Merchan, E. C. (2023). ChatGPT is not all you need. A state of the art review of large generative AI models. Preprint, arXiv:2301.04655.
Guo, K., Zhong, Y., Li, D., & Chu, S. K. W. (2023). Effects of chatbot-assisted in-class debates on students' argumentation skills and task motivation. Computers and Education, 203, Article 104862.
Hadi, M. U., Qureshi, R., Shah, A., Irfan, M., Zafar, A., Shaikh, M. B., Akhtar, N., Wu, J., Mirjalili, S., et al. (2023). Large language models: A comprehensive survey of its applications, challenges, limitations, and future prospects. Authorea Preprints.
Hamilton, L., Halverson, R., Jackson, S. S., Mandinach, E., Supovitz, J. A., Wayman, J. C., Pickens, C., Martin, E., & Steele, J. L. (2009). Using student achievement data to support instructional decision making.
Hong, W. C. H. (2023). The impact of ChatGPT on foreign language teaching and learning: Opportunities in education and research. Journal of Educational Technology and Innovation, 5(1).
Hutson, J., & Schnellmann, A. (2023). The poetry of prompts: The collaborative role of generative artificial intelligence in the creation of poetry and the anxiety of machine influence. Global Journal of Computer Science and Technology: D, 23(1).
Jeon, J., & Lee, S. (2023). Large language models in education: A focus on the complementary relationship between human teachers and ChatGPT. Education and Information Technologies, 28(12), 15873--15892.
Ji, Z., Lee, N., Frieske, R., Yu, T., Su, D., Xu, Y., Ishii, E., Bang, Y., Chen, D., Dai, W., Chan, H. S., Madotto, A., & Fung, P. (2023). Survey of hallucination in natural language generation. arXiv:2202.03629.
Khazanchi, R., & Khazanchi, P. (2024). Generative AI to improve special education teacher preparation for inclusive classrooms. Exploring New Horizons: Generative Artificial Intelligence and Teacher Education, 159.
Kshetri, N., Dwivedi, Y. K., Davenport, T. H., & Panteli, N. (2023). Generative artificial intelligence in marketing: Applications, opportunities, challenges, and research agenda.
Kucharavy, A. (2024). Adapting LLMs to downstream applications. In Large language models in cybersecurity: Threats, exposure and mitigation (pp. 19--29). Cham, Switzerland: Springer Nature.
Lee, J. H., Shin, D., & Noh, W. (2023). Artificial intelligence-based content generator technology for young English-as-a-foreign-language learners' reading enjoyment. RELC Journal, 54(2), 508--516.
Leiser, F., Eckhardt, S., Knaeble, M., Maedche, A., Schwabe, G., & Sunyaev, A. (2023). From ChatGPT to FactGPT: A participatory design study to mitigate the effects of large language model hallucinations on users. In Proceedings of Mensch und Computer 2023 (pp. 81--90).
Li, Q., Fu, L., Zhang, W., Chen, X., Yu, J., Xia, W., Zhang, W., Tang, R., & Yu, Y. (2023). Adapting large language models for education: Foundational capabilities, potentials, and challenges. arXiv preprint arXiv:2401.08664.
Liu, X., Wang, J., Sun, J., Yuan, X., Dong, G., Di, P., Wang, W., & Wang, D. (2023). Prompting frameworks for large language models: A survey. Preprint, arXiv:2311.12785.
Lo, L. S. (2023). The CLEAR path: A framework for enhancing information literacy through prompt engineering. Journal of Academic Librarianship, 49(4), Article 102720.
McKinney, W. (2022). Python for Data Analysis. O'Reilly Media, Inc.
Michel-Villarreal, R., Vilalta-Perdomo, E., Salinas-Navarro, D. E., Thierry-Aguilera, R., & Gerardou, F. S. (2023). Challenges and opportunities of generative AI for higher education as explained by ChatGPT. Education Sciences, 13(9), 856.
Mukherjee, A., & Chang, H. (2023). The creative frontier of generative AI: Managing the novelty-usefulness tradeoff. Preprint, arXiv:2306.03601.
Nasution, M. K., Syah, R., & Elveny, M. (2023). What is data science. In Data science with semantic technologies (pp. 1--25). CRC Press.
Oguguo, B., Nannim, F. A., Okeke, A. O., Ezechukwu, R. I., Christopher, G. A., & Ugorji, C. O. (2020). Assessment of students' data literacy skills in southern Nigerian universities. Universal Journal of Educational Research, 8(6), 2717--2726.
Park, D., An, G.-t., Kamyod, C., & Kim, C. G. (2023). A study on performance improvement of prompt engineering for generative AI with a large language model. Journal of Web Engineering, 22(8), 1187--1206.
Pesovski, I., Santos, R., Henriques, R., & Trajkovik, V. (2024). Generative AI for customizable learning experiences. Sustainability, 16(7), 3034.
Prasai, S. (2023). Algorithmic Hypnosis.
Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., Sutskever, I., et al. (2019). Language models are unsupervised multitask learners. OpenAI Blog, 1(8), 9.
Rahman, M. M., & Watanobe, Y. (2023). ChatGPT for education and research: Opportunities, threats, and strategies. Applied Sciences, 13(9), 5783.
Rogel-Salazar, J. (2023). Statistics and Data Visualisation with Python. Chapman and Hall/CRC.
Sahoo, P., Singh, A. K., Saha, S., Jain, V., Mondal, S., & Chadha, A. (2024). A systematic survey of prompt engineering in large language models: Techniques and applications. Preprint, arXiv:2402.07927.
Schroeder, K. T., Hubertz, M., Van Campenhout, R., & Johnson, B. G. (2022). Teaching and learning with AI-generated courseware: Lessons from the classroom. Online Learning, 26(3), 73--87.
Seo, K., Tang, J., Roll, I., Fels, S., & Yoon, D. (2021). The impact of artificial intelligence on learner-instructor interaction in online learning. International Journal of Educational Technology in Higher Education, 18, 1--23.
Shah, C. (2024). From prompt engineering to prompt science with human in the loop. Preprint, arXiv:2401.04122.
Siontis, K. C., Attia, Z. I., Asirvatham, S. J., & Friedman, P. A. (2024). ChatGPT hallucinating: Can it get any more humanlike?
Sovrano, F., Ashley, K., & Bacchelli, A. (2023). Toward eliminating hallucinations: GPT-based explanatory AI for intelligent textbooks and documentation. In CEUR workshop proceedings, no. 3444 (pp. 54--65). CEUR-WS.
Steiss, J., Tate, T., Graham, S., Cruz, J., Hebert, M., Wang, J., Moon, Y., Tseng, W., Warschauer, M., & Olson, C. B. (2024). Comparing the quality of human and ChatGPT feedback of students' writing. Learning and Instruction, 91, Article 101894.
Varsha, P. (2023). How can we manage biases in artificial intelligence systems - a systematic literature review. International Journal of Information Management Data Insights, 3(1), Article 100165.
Walter, Y. (2024). Embracing the future of artificial intelligence in the classroom: The relevance of AI literacy, prompt engineering, and critical thinking in modern education. International Journal of Educational Technology in Higher Education, 21(1), 15.
Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., & Zhou, D. (2022). Self-consistency improves chain of thought reasoning in language models. Preprint, arXiv:2203.11171.
Wei, J., Wang, X., Schuurmans, D., Bosma, M., Xia, F., Chi, E., Le, Q. V., Zhou, D., et al. (2022). Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems, 35, 24824--24837.
Weisz, J. D., He, J., Muller, M., Hoefer, G., Miles, R., & Geyer, W. (2024). Design principles for generative AI applications. In Proceedings of the CHI conference on human factors in computing systems (pp. 1--22).
Ying, F., & Zhang, Z. (2019). Data visualization analysis of big data recruitment positions in Hangzhou based on Python. Review of Computer Engineering Studies, 6(4).
Yu, H., & Guo, Y. (2023). Generative artificial intelligence empowers educational reform: Current status, issues, and prospects. Frontiers in Education: Vol. 8 (p. 1183162). Frontiers Media SA.
Zhai, X. (2023). ChatGPT for next generation science learning. XRDS: Crossroads, The ACM Magazine for Students, 29(3), 42--46.
Zhang, Z., Zhang, A., Li, M., & Smola, A. (2022). Automatic chain of thought prompting in large language models. Preprint, arXiv:2210.03493.
Zhao, X., Li, M., Lu, W., Weber, C., Lee, J. H., Chu, K., & Wermter, S. (2023). Enhancing zero-shot chain-of-thought reasoning in large language models through logic. Preprint, arXiv:2309.13339.
Zheng, Y. (2019). A comparison of tools for teaching and learning data analytics. In Proceedings of the 20th annual SIG conference on information technology education (p. 160).