
Computers and Education: Artificial Intelligence 8 (2025) 100380

Contents lists available at ScienceDirect

Computers and Education: Artificial Intelligence

journal homepage: www.sciencedirect.com/journal/computers-and-education-artificial-intelligence

Enhancing data analysis and programming skills through structured prompt training: The impact of generative AI in engineering education

Ashish Garg, K. Nisumba Soodhani *, Ramkumar Rajendran
Centre for Educational Technology, Indian Institute of Technology Bombay, India

A R T I C L E  I N F O

Keywords:
Applications in subject areas
Post-secondary education
Teaching/learning strategies

A B S T R A C T

The advent of Generative Artificial Intelligence (GenAI) and large language models such as LLaMA, PaLM2, GPT, Gemini, and Claude has revolutionized education by generating human-like text and contextually relevant responses. Our research investigates the impact of structured prompt training on students' learning in data analysis and programming. We experimented with 157 first-year engineering students divided into three groups: a control group (internet access, no GenAI), experimental group 1 (internet and GenAI without prompt training), and experimental group 2 (internet and GenAI with prompt training). The prompt training session covered techniques such as few-shot prompting, chain prompting, and the CLEAR framework. We assessed participants' performance in data analysis tasks using Python, with pre-tests and post-tests measuring their programming skills across three Bloom's taxonomy levels (understanding, application, and analysis). ANOVA on post-test scores showed significant differences among the groups, with G3 (with prompt training) outperforming G2 (without prompt training) and the control group across all three levels, as evidenced by higher mean scores (G3: 6.60, G2: 4.94, Control: 4.28); a similar pattern was observed in task completion. These results underscore the effectiveness of structured prompt training in enhancing students' data analysis and programming skills. Our study highlights the potential of GenAI and structured prompt training to transform educational practices and suggests future research directions, including integrating prompt engineering within human-AI collaboration.

1. Introduction

In recent years, the advent of Generative artificial intelligence (GenAI) and large language models (LLMs)¹ has revolutionized various fields like healthcare, business, and education (Gozalo-Brizuela & Garrido-Merchan, 2023) (Bozkurt, 2023). These models can generate human-like text and provide instant, contextually relevant responses. As such, they have become valuable tools for students and educators, offering support in various educational activities such as writing assistance, supplementary learning resources, guidance in learning programming languages, and support for instructors in evaluation and content generation (Rahman & Watanobe, 2023) (Michel-Villarreal et al., 2023) (Steiss et al., 2024). Initially, GenAI tools were primarily used for text generation. However, with advancements in natural language processing (NLP) and machine learning, GenAI has transitioned into more interactive roles such as creating personalized learning experiences, providing feedback, tutoring students in specific subjects, and generating various modalities such as audio, image, and video (Gozalo-Brizuela & Garrido-Merchan, 2023) (Bozkurt, 2023). These capabilities have opened new avenues for enhancing educational outcomes and addressing diverse learning needs.

The benefits of incorporating GenAI in educational settings are manifold. Firstly, these tools offer on-demand assistance, allowing students to access help whenever needed, which is particularly beneficial in self-paced learning environments (Cao & Dede, 2023) (Seo et al., 2021). Secondly, these models can cater to different learning styles by providing explanations in multiple formats, such as text, summaries, or even step-by-step guides, and have been shown to improve the performance and engagement of students (Pesovski et al., 2024). Thirdly, GenAI can assist educators by automating repetitive tasks such as evaluating assessments and creating lesson plans, thus freeing up time for more personalized instruction and interaction with students (Zhai, 2023) (Khazanchi & Khazanchi, 2024) (Schroeder et al., 2022).

* Corresponding author.
E-mail addresses: arggarg1996@gmail.com (A. Garg), nisumba_soodhani@iitb.ac.in (K. Nisumba Soodhani), ramkumar.rajendran@iitb.ac.in (R. Rajendran).
¹ LLaMA, PaLM2, GPT-3, and Claude 2.

https://doi.org/10.1016/j.caeai.2025.100380
Received 1 October 2024; Received in revised form 10 January 2025; Accepted 3 February 2025
Available online 10 February 2025
2666-920X/© 2025 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
LLMs are trained on vast datasets that include both accurate information and misinformation, which can sometimes lead to the generation of incorrect or biased responses (Doughty et al., 2024). Additionally, the lack of contextual understanding often results in generic or irrelevant information (Siontis et al., 2024) (Weisz et al., 2024) (Michel-Villarreal et al., 2023). Moreover, a lack of skill in how to effectively use these tools can result in suboptimal outcomes (Yu & Guo, 2023) (Barbas et al., 2023). One way to minimize these challenges is to fine-tune AI models with improved datasets and feedback, ensuring greater accuracy and relevance (Dave et al., 2023). Another suggested technique is prompt engineering, which involves crafting precise and context-specific queries to guide AI responses (Park et al., 2023) (Alier et al., 2024). Studies show that prompting methods like zero-shot and few-shot prompting allow models to handle new tasks with minimal training data, leveraging pre-existing knowledge for accurate responses (Radford et al., 2019) (Brown et al., 2020). Another prompting method, Chain-of-Thought (CoT) prompting, improves reasoning by breaking down tasks into step-by-step processes, enhancing accuracy in complex problem-solving (Wei et al., 2022) (Zhang et al., 2022). In educational contexts, prompt training could empower students to better leverage AI tools for learning, leading to improved academic performance and a deeper understanding of the subject matter (Walter, 2024). However, there are limited studies analyzing the effects of prompting skills on student performance, suggesting a need for further research in this area.

Building on this understanding, our study considers prompt training as an intervention and assesses its effect on students' learning in the domain of data analysis and programming. Recent research in data science indicates that proficiency in data analysis skills enables individuals to effectively interpret complex datasets and uncover meaningful patterns (Donoho, 2024) (Nasution et al., 2023). For example, programming languages like Python automate data processing and facilitate reproducible analyses, enhancing efficiency and accuracy (Rogel-Salazar, 2023). Moreover, the job market increasingly demands these competencies, with employers seeking professionals who can transform raw data into actionable insights (Donoho, 2024) (Nasution et al., 2023).

To study the impact of prompt training, we designed an experiment in the domain of data analysis using prompt training with ChatGPT. Our study involves three groups of participants:

• Control Group: Participants have access to the internet but do not have access to generative AI tools. (G1)
• Experimental Group 1: Participants have access to the internet and GenAI (ChatGPT) without any prompt training. (G2)
• Experimental Group 2: Participants have access to the internet and GenAI (ChatGPT) with prompt training. (G3)

All the groups were given a three-hour task measuring data analysis skills and focusing on two basic concepts - data aggregation and merging. While experimental group 1 simply had access to the internet and ChatGPT, experimental group 2 additionally underwent an hour-long training session to learn essential prompting skills for using ChatGPT. This session educated students on prompting techniques such as few-shot prompting, chain prompting, and the CLEAR framework of prompting (Lo, 2023). Using this study design, we propose to answer these research questions:

• RQ1: How do the learning outcomes, as measured by post-test scores across different task levels as defined by Bloom's taxonomy, differ between the control group and the experimental groups (with and without prompt training)?
• RQ2: How does the use of ChatGPT, with and without prompt training, impact the performance of students in data analysis tasks compared to a control group with no access to generative AI tools?

In this study, 157 learners (M = 117, F = 40) participated, all of them first-year undergraduate students from an engineering college. The following measures were used: a data literacy survey, pre-test, post-test, and the System Usability Scale; additionally, feedback on using prompts was obtained through experience surveys. The results indicate significant differences between the control group and the experimental groups in post-test scores at different levels of Bloom's taxonomy and in task performance. Within the experimental groups, experimental group 2 (G3) performed significantly better than group G2, which underscores the effect of prompt training.

The paper is structured as follows: The Literature Review section provides an overview of existing research on the use of large language models in education and the theoretical framework of prompt training. The Methodology section details the experimental design, data collection measures, and analytical techniques employed in this study. The Results section presents the findings, highlighting the differences in performance across the control and experimental groups. The Discussion section interprets the results, compares them with existing literature, and explores the implications for educational practice and future research. Finally, the Conclusion summarizes the key findings, discusses the limitations of the study, and suggests directions for future research.

2. Background

2.1. Generative AI in education

Generative AI employs machine and deep learning to autonomously create new data, marking a significant difference from traditional AI tasks (Feuerriegel et al., 2024). Large Language Models (LLMs), which are constructed from extensive neural networks, are at the core of this technology; they excel in processing and generating language-based data, producing outputs that closely mimic human creations (Brynjolfsson et al., 2023). Their underlying architectures, particularly the transformer model with self-attention mechanisms, enable LLMs to capture intricate linguistic patterns and relationships, facilitating advanced text generation (Hadi et al., 2023). The transformative potential of generative AI extends across multiple fields, including healthcare, business, research, and education (Fui-Hoon Nah et al., 2023). By leveraging its ability to generate high-quality content, these sectors have seen significant advancements in efficiency, creativity, and engagement. For example, in business, it enhances customer service through specialized chatbots and personalized marketing strategies (Kshetri et al., 2023).

Generative AI has demonstrated applications in education, enhancing teaching and learning experiences. A few studies have highlighted its role in increasing student engagement and motivation. For instance, Lee et al. (2023) found that AI-based content generators improved reading enjoyment and interest among elementary students, making learning more engaging and interactive. Similarly, Guo et al. (2023) highlighted how AI-based chatbots can enhance task motivation and argumentation skills in undergraduate students, fostering critical thinking and effective communication. In a study by Aktay et al. (2023), fourth-grade students found ChatGPT engaging and beneficial for academic achievement, suggesting its use in various subjects. In the context of language learning, Jeon and Lee (2023) emphasized the complementary relationship between human teachers and ChatGPT, showcasing its role in providing personalized feedback and practice for language learners. Hong (2023) highlighted the opportunities ChatGPT presents in foreign language teaching, stressing the importance of ethical discussions with students about its use. Moreover, the broader impact of generative AI on educational methods and creative collaboration has been explored. Chen et al. (2022) reviewed two decades of AI in education, underscoring its potential to revolutionize instructional methods and student engagement. Hutson and Schnellmann (2023) discussed the collaborative role of AI in creating poetry, emphasizing the need to balance AI assistance and independent learning. These studies collectively illustrate the versatile applications of Generative AI in education, promoting personalized, efficient, and engaging learning experiences across different educational levels and subjects.
Extending beyond these domains, generative AI has also shown remarkable promise in programming, and recent studies have illuminated its impactful role in this field. An experimental study showcased how the programming tool Codex, powered by generative AI, outperformed learners in a CS1 class on a rainfall problem, ranking in the top quartile (Denny et al., 2023). Another investigation used the flake8 tool to assess code generated by AI against the PEP8 coding style, revealing a minimal syntax error rate of 2.88% (Feng et al., 2023). A notable study involved GitHub's generative AI platform, which initially failed to solve 87 Python problems; however, applying prompt engineering techniques enabled it to resolve approximately 60.9% of them successfully (Finnie-Ansley et al., 2022). These findings collectively highlight the efficacy of generative AI in code generation. Despite these advancements, there is a notable gap in empirical research examining the influence of prompt training in the education domain. While prompt engineering has been shown to enhance the performance of generative AI in solving problems, its application and effectiveness in the context of data analysis education remain underexplored. This area needs further investigation to determine how targeted prompt training can improve learning outcomes and practical skills for students in data analysis courses.

2.2. Challenges in using GenAI

Alongside these advancements, generative AI also faces significant challenges. One major issue is hallucination in AI models, where the generated content deviates significantly from practical constraints or reality (Mukherjee & Chang, 2023). This problem arises because AI models, unlike traditional rule-based systems, learn from examples encountered during their training phase and lack an explicitly codified set of rules to separate fact from fiction (Varsha, 2023). Consequently, AI models may produce creative content that deviates too far from practical constraints (hallucination) or remains too rigidly within the confines of existing data (memorization) (Prasai, 2023) (Ji et al., 2023). Striking a balance between novelty and usefulness is crucial for generating original content that is both relevant and practically applicable in a given context (Mukherjee & Chang, 2023). Understanding and mitigating hallucination is essential for developing AI systems that generate trustworthy and practical content (Sovrano et al., 2023). Additionally, AI models, trained on vast datasets, generate content based on probabilities without understanding real-world contexts, leading to potentially misleading and irrelevant outputs (Ebert & Louridas, 2023). Ensuring the accuracy and reliability of AI-generated content is therefore crucial.

Ways to address these challenges exist both at the designer's end and at the learner's end. From the designers' perspective, the options include fine-tuning the model and developing human-AI collaboration (Leiser et al., 2023). From the learner's end, one promising solution is prompt engineering. Prompt engineering extends the capabilities of large language models (LLMs) and vision-language models (VLMs) by providing task-specific instructions (Sahoo et al., 2024). These prompts guide model outputs without modifying core parameters, allowing seamless integration into diverse tasks (Sahoo et al., 2024) (Shah, 2024). This approach enhances model efficacy, enabling success in applications ranging from question answering to commonsense reasoning. Prompting leverages pre-trained knowledge to generate desired behaviors, enhancing adaptability and reducing the need for extensive retraining (Kucharavy, 2024) (Sahoo et al., 2024) (Shah, 2024). By systematically categorizing various prompting techniques and highlighting their strengths and limitations, prompt engineering offers a practical way to address the challenges posed by generative AI (Brown et al., 2020) (Wei et al., 2022) (Radford et al., 2019).

2.3. Prompt engineering

The journey of prompt engineering begins with zero-shot prompting, where models perform tasks based on general instructions without specific training data, as demonstrated by Radford et al. (2019). Next, few-shot prompting introduces a few examples to improve model understanding, with Brown et al. (2020) showing that providing high-quality examples enhances performance on complex tasks. Chain-of-thought (CoT) prompting, introduced by Wei et al. (2022), advances the technique by enabling step-by-step reasoning, significantly improving performance in reasoning tasks. To automate this process, Zhang et al. (2022) proposed Automatic Chain-of-Thought (Auto-CoT) prompting, which generates diverse reasoning chains automatically, improving accuracy and robustness without manual effort. Further refining CoT prompting, Wang et al. (2022) introduced self-consistency, which generates multiple reasoning chains and selects the most consistent final answer to enhance accuracy through diverse approaches. Finally, Logical Chain-of-Thought (LogiCoT) prompting, proposed by Zhao et al. (2023), integrates symbolic logic principles to verify each reasoning step, reducing errors and hallucinations.
To capture the nuance of these prompting techniques, frameworks are designed to facilitate their practical implementation in real-world applications. Prompting frameworks play a crucial role in bridging the gap between the model's capabilities and the practical needs of users. They provide the necessary infrastructure for integrating external tools, maintaining historical information, and ensuring structured and safe outputs (Liu et al., 2023). For example, frameworks like LangChain and Semantic Kernel enable LLMs to interact with databases, web browsers, and other external systems, thus overcoming the LLMs' inherent limitations and extending their applicability (Liu et al., 2023). In this study we have chosen the CLEAR (Concise, Logical, Explicit, Adaptive, Reflective) framework, summarized in Table 1, which offers a systematic strategy for crafting prompts that harness the full potential of AI language models. It is not merely about the individual components of clarity, context, formatting, or verbosity control; it is about how these elements synergize through a comprehensive, adaptable framework to elevate the entire communication process with AI (Lo, 2023).
Table 1
Elements of the CLEAR framework that provide guidance for writing structured prompts (Lo, 2023).

Concise: Prompts should be clear and focused on the task's core elements to guide the AI towards relevant and precise responses.
Logical: Prompts need a logical flow of ideas, helping the AI understand context and relationships between concepts, resulting in coherent outputs.
Explicit: Prompts must clearly specify the expected output format, content, or scope to avoid irrelevant responses.
Adaptive: Prompts should be flexible, allowing experimentation with different structures and settings to balance creativity and specificity.
Reflective: Continuous evaluation and refinement of prompts are crucial, using insights from previous responses to improve future interactions.

3. Methods

3.1. Experimental procedure

The study emphasizes two critical data analysis concepts: data aggregation and merging. Aggregation simplifies data, revealing trends and easing novices into complex tasks, much like learning the alphabet before forming sentences (McKinney, 2022). Merging integrates diverse datasets, which is essential in the data landscape, offering a unified view (McKinney, 2022). The data aggregation and merging tasks are designed so that they cannot be completed with non-programming software like Excel and Tableau. We selected Python programming because of its relevance in real-world applications, along with market demand, as highlighted by Ying and Zhang (2019). The dataset includes dummy data of tablet usage by school students from September to January, capturing attributes such as student video usage, student ID, school ID, view count, and last access date and time. Each month's dataset contains over 10,000 observations, providing a comprehensive view of student engagement with video content. This dataset draws inspiration from a school education program in which students are provided tablets to enhance learning. The following are the problem statements of the task (a sketch of how they might be solved in pandas follows the list):

• T1: Calculate the total daily video usage for each student across all months. (In the dataset, students watched the content multiple times a day. Participants need to calculate the total daily video usage for each day and repeat this task for all four months of the dataset, October to January.)
• T2: Given the unique data capture cycle of student video usage (the 26th of one month to the 25th of the next), compute the monthly total video usage for each student. For example, compute the total video usage for October (1st October to 31st October).
• T3: Calculate the monthly video usage for each school over all the months. (Each school has multiple students, so aggregate the total video usage for each month and compute the total video usage during the intervention period for each school.)

Datasets and tasks are provided to students, and the output CSV/Excel file is collected at the end of the task.
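As a rough illustration of what completing T1-T3 involves, the following pandas sketch performs the three aggregations. The file names and column names (student_id, school_id, video_usage, last_access) are assumptions for illustration; the study's actual dataset schema is not published.

import pandas as pd

# Assumed file and column names; the study's real schema is not published.
months = ["october", "november", "december", "january"]
frames = [pd.read_csv(f"usage_{m}.csv", parse_dates=["last_access"]) for m in months]
df = pd.concat(frames, ignore_index=True)

# T1: total daily video usage per student (students watch multiple times a day).
df["date"] = df["last_access"].dt.date
daily = df.groupby(["student_id", "date"], as_index=False)["video_usage"].sum()

# T2: monthly totals per student; because the capture cycle runs from the 26th
# to the 25th, the monthly files must be combined before slicing calendar months.
df["month"] = df["last_access"].dt.to_period("M")
monthly = df.groupby(["student_id", "month"], as_index=False)["video_usage"].sum()

# T3: monthly video usage per school, merging in the student-to-school mapping
# and aggregating over all of a school's students.
per_school = monthly.merge(
    df[["student_id", "school_id"]].drop_duplicates(), on="student_id"
).groupby(["school_id", "month"], as_index=False)["video_usage"].sum()

per_school.to_csv("t3_school_monthly_usage.csv", index=False)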
3.2. Participants

We grouped learners into three groups: the control group (access to the internet but no access to GenAI), experimental group 1 (access to the internet and GenAI without prompt training), and experimental group 2 (access to the internet and GenAI with prompt training). In total, 157 learners participated in this study, all of them first-year undergraduate students from an engineering college. Though the students were from different streams, all of them are first-year undergraduates for whom the curriculum remains the same regardless of their specialization; the stream-specific courses start from the second year onwards.

The sample size is predicated upon several critical metrics, namely effect size, power, and alpha level. According to the methodologies outlined by Endres et al. (2017), an effect size of 0.35 was selected, representing the expected magnitude of differences between the groups. To achieve a statistically robust analysis, a power level of 0.8 was established. This power level signifies an 80% probability of detecting true effects, thereby minimizing the risk of Type II errors (Cohen, 2013). An alpha level of 0.05 was set to control the likelihood of Type I errors, ensuring the precision and accuracy of the results. Utilizing these parameters, a GPower analysis was conducted to determine the required sample size for the study. GPower, a widely recognized tool for power analysis, indicated that a sample size of 125 participants would be necessary to achieve reliable results using the one-way ANOVA technique.
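For readers who wish to reproduce this computation outside G*Power, statsmodels offers an equivalent solver. Note that the resulting N depends on whether the 0.35 effect size is interpreted as Cohen's f or as another convention, so this is a sketch rather than a replication of the 125 figure.

from statsmodels.stats.power import FTestAnovaPower

# A priori sample size for a one-way ANOVA with three groups. The effect_size
# argument is Cohen's f; if the study's 0.35 followed a different convention,
# the returned total N will differ from the 125 reported above.
n_total = FTestAnovaPower().solve_power(
    effect_size=0.35,  # expected magnitude of group differences
    alpha=0.05,        # Type I error rate
    power=0.80,        # 1 - Type II error rate
    k_groups=3,        # G1, G2, G3
)
print(round(n_total))  # total participants required across all groups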
To facilitate effective comparative analysis, the study flow for the various groups is shown in Fig. 1. A survey on familiarity with the use of GenAI was collected, followed by the pre-test, tasks, and post-test. Prompt training was conducted for group 3 before the start of the task. Ethical approval was obtained from the Institute Review Board of the authors' institution.

3.3. Measures

3.3.1. Data literacy survey

In line with Hamilton et al., who define data literacy as ``the ability to ask and answer questions about collecting, analyzing, and making sense of data,'' a customized four-item Likert-scale survey was developed to evaluate participants' competence in using generative AI for data analysis (Hamilton et al., 2009). This measure encompasses two primary dimensions: (1) confidence in applying AI-driven solutions, and (2) frequency of AI usage in practical data-analysis scenarios. The scale design drew on several sources in the existing literature. Oguguo et al. highlighted critical dimensions of student data literacy, informing the survey's emphasis on tangible skill assessment (Oguguo et al., 2020). Zheng provided methodological insights into teaching and learning data analytics, which guided item development regarding practical AI usage (Zheng, 2019). Finally, da Silveira et al. underscored the importance of programming languages in cultivating analytical proficiency, shaping the survey's focus on technical fluency (Balreira et al., 2023). Together, these references helped ensure that the survey items captured both the theoretical underpinnings of data literacy and its real-world application within generative AI contexts.

Additionally, the data literacy survey questionnaire was meticulously reviewed by experts to ensure alignment with the curriculum requirements of engineering colleges in India. Further statistical evaluations, including a Cronbach's alpha value of 0.71, confirmed the questionnaire's reliability, making it a robust tool for assessing data literacy skills within this educational context.

3.3.2. Pre-test and post-test of domain knowledge

A set of 15 multiple-choice questions was used for the pre-test and post-test, focusing on the concepts of aggregation and merging. These questions assessed the participants' comprehensive understanding of the concepts through Python programming. Designed across three levels of Bloom's taxonomy, understanding (L1), applying (L2), and analyzing (L3), the test comprised five questions for each level. The questions were adapted to assess concepts of data aggregation and data merging in Python, drawing inspiration from the official Pandas documentation and the book Python for Data Analysis by McKinney (2022).

• Item Development and Expert Review: Each item underwent face- and content-validity checks with domain experts (academics and industry professionals). We refined or removed ambiguous items during the development phase based on expert feedback. This review ensured that questions accurately assessed data aggregation and merging skills at the intended Bloom's taxonomy level.
• Statistical Item Analysis: We conducted item analysis on the pre-test scores to evaluate question quality. Cronbach's alpha was 0.42 for the overall test, which, while modest, was deemed acceptable given the exploratory nature and specific context of this study. While this value is below the conventional benchmark of 0.70, we believe it can be partly attributed to:
  – The short test length (15 items split across three distinct cognitive levels).
  – The multidimensional nature of the instrument (covering understanding, application, and analysis).

Despite this modest alpha, the iterative review process ensures construct and content validity in line with our educational objectives. The standard alpha computation is sketched below.
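A minimal sketch of the usual Cronbach's alpha computation behind such item analyses; the score matrix here is placeholder data, not the study's responses.

import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents, n_items) score matrix."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()  # sum of per-item variances
    total_variance = items.sum(axis=1).var(ddof=1)    # variance of summed scores
    return k / (k - 1) * (1 - item_variances / total_variance)

# Example: 15 dichotomously scored items (0/1) for 144 test-takers.
rng = np.random.default_rng(7)
scores = rng.integers(0, 2, size=(144, 15))  # placeholder data only
print(cronbach_alpha(scores))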


Fig. 1. Study flow for all three groups and the differences in the process between them.

3.3.3. Task score

The tasks described in Section 3.1 were evaluated using a rubric. Task completion scores were calculated across the three tasks, including a total score. For Task 1 and Task 2, each task is worth a maximum of 4 points, where students earn 1 point per correctly produced data output file for each month, as specified in the dataset. Thus, students who generate all four required files correctly receive the full 4 points for these tasks. Task 3, however, has a maximum score of 2 points and requires the correct merging of data across all schools. Here, students receive 1 point for a correctly merged dataset but lose 1 point if any school is missing from the merged data, reflecting the importance of comprehensive data integration. Overall, the scoring system emphasizes accuracy in file generation and completeness in data merging.

3.3.4. System usability scale

The System Usability Scale (SUS) was adapted to capture feedback on the Generative AI tool used in data analysis. This scale is a widely utilized 10-item questionnaire that has been validated across numerous domains to measure perceived ease of use and user satisfaction. The SUS uses a balanced mix of positive and negative items scored on a 5-point scale, suitable for evaluating the tool's effectiveness and user experience. A comprehensive evaluation of the tool's impact on user proficiency can be achieved by triangulating SUS scores with task performance and learning metrics (Brooke, 1996).
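Under the standard SUS scoring rule, odd-numbered (positive) items contribute their score minus 1, even-numbered (negative) items contribute 5 minus their score, and the sum is scaled by 2.5 to a 0-100 range. A minimal sketch:

def sus_score(responses):
    """Standard SUS scoring: ten 1-5 Likert responses -> 0-100 scale."""
    odd = sum(responses[i] - 1 for i in range(0, 10, 2))   # items 1, 3, 5, 7, 9
    even = sum(5 - responses[i] for i in range(1, 10, 2))  # items 2, 4, 6, 8, 10
    return (odd + even) * 2.5

print(sus_score([4, 2, 5, 1, 4, 2, 5, 2, 4, 1]))  # example respondent -> 85.0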
3.4. Experience survey

The Experience Survey was conducted to gather detailed feedback from participants regarding their use of prompts in data analysis tasks. The survey focused on the usability of prompts and suggestions for improvement. The feedback collected through this survey was analyzed to identify common themes and areas for improvement.

3.5. Prompt training

As described above for the control and experimental groups, one subset of participants was granted access to Generative AI and underwent training in prompt formulation. To facilitate this, a one-hour instructional session was organized before the main task, as depicted in Fig. 2. During this session, participants are introduced to a handout that explains the fundamentals of generative AI and various prompting techniques such as zero-shot prompting, few-shot prompting, and chain prompting. It also includes the CLEAR framework for prompting and strategies from OpenAI. Students read through the handout to understand these basic concepts, followed by a short group discussion to assess their understanding and address any questions they may have.

Then, students apply their knowledge by writing a prompt for a given question mentioned in Fig. 3. This exercise allows them to reflect on their understanding and practice their prompt-writing skills. After five minutes, we present a structured prompt for the same question and facilitate a discussion to analyze its structure and the concepts employed. This helps students connect the learned concepts to the prompt, reinforcing their comprehension.

In the next phase, students work on a data analysis problem mentioned in Fig. 4 using a provided dataset that is different from the task dataset. They write a prompt to generate Python code for the analysis, applying their previous learning and seeking assistance online or from ChatGPT if needed. After their attempt, we present a structured prompt for the same problem and engage in a discussion to decode and explain the strategies used. Students then use the prompt to generate and run the Python code, ensuring it functions as expected.

This instructional design ensures that students not only grasp the theoretical aspects of prompting but also apply them practically, creating a comprehensive learning experience. By integrating reading, discussion, application, and reflection, we foster a holistic learning environment that promotes deeper understanding and skill development in prompt writing and generative AI.

To ensure that no undue advantage was conferred in the task domain, the live example provided was unrelated to data analysis tasks. It is worth noting that, post-study, the remaining two groups were also educated on prompt formulation to uphold ethical standards.

Fig. 2. The flowchart outlines the structured approach used to train participants in writing effective prompts, incorporating generative AI, prompting techniques, the CLEAR framework, and OpenAI strategies and examples.

3.6. Statistical analysis

All statistical analyses were performed utilizing the Real Stats add-on for MS Excel. To compare data literacy scores among the three groups (one control and two experimental), a one-way Analysis of Variance (ANOVA) was employed, as it is suitable for assessing mean differences across multiple independent groups. Gender-based effects were analyzed using a chi-square test, which is appropriate for examining associations between categorical variables. The chi-square assumptions were satisfied because the observations were independent, the sample size exceeded 20, expected frequencies were at least 5 in every category, and sampling was random.

To ensure group homogeneity before evaluating treatment effects, pre-test scores were compared across groups using ANOVA. The impact of prompt training and generative AI usage on outcomes was also analyzed through ANOVA, allowing for the assessment of significant intervention effects. Although standard ANOVA is generally robust to moderate violations of the homogeneity-of-variance assumption, Welch's ANOVA provides an additional safeguard when group sizes and/or variances differ significantly. For both Welch's ANOVA and ANOVA, the normality assumption was assessed using the Shapiro-Wilk test (p > 0.05), indicating no deviation from normality. The variance assumption was checked using Levene's test (p > 0.05), confirming the homogeneity of variances.

Post-hoc comparisons were conducted using Tukey's Honestly Significant Difference (HSD) test to identify specific differences between group means. The assumptions for Tukey HSD were met because the data satisfied the ANOVA assumptions for normality (p > 0.05 via the Shapiro-Wilk test) and homogeneity of variances (p > 0.05 via Levene's test), with independent observations and a significant ANOVA result (p < 0.05). To investigate the relationship between programming knowledge and data analysis task performance, Hake gain analysis was performed to measure improvements in understanding. Non-linear correlations were then explored using Spearman's rank correlation coefficient to assess the relationship between Hake gain scores and task completion performance.
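Although the analyses reported here were run with an Excel add-on, the same pipeline of assumption checks, omnibus tests, and post-hoc comparisons can be expressed in Python with SciPy and statsmodels. The sketch below uses randomly generated placeholder scores, not the study data.

import numpy as np
from scipy import stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd
from statsmodels.stats.oneway import anova_oneway

# Placeholder post-test scores shaped like the three groups (not real data).
rng = np.random.default_rng(0)
g1, g2, g3 = rng.normal(4.3, 1.2, 64), rng.normal(4.9, 2.3, 34), rng.normal(6.6, 2.7, 59)

for label, g in zip(("G1", "G2", "G3"), (g1, g2, g3)):
    print(label, "Shapiro-Wilk p:", stats.shapiro(g).pvalue)  # normality per group
print("Levene p:", stats.levene(g1, g2, g3).pvalue)           # equal variances

print(stats.f_oneway(g1, g2, g3))                     # standard one-way ANOVA
print(anova_oneway([g1, g2, g3], use_var="unequal"))  # Welch's ANOVA

scores = np.concatenate([g1, g2, g3])
groups = ["G1"] * len(g1) + ["G2"] * len(g2) + ["G3"] * len(g3)
print(pairwise_tukeyhsd(scores, groups, alpha=0.05))  # Tukey HSD post hoc

# Spearman's rank correlation, e.g. between Hake gain and task score arrays:
# print(stats.spearmanr(hake_gains, task_scores))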

4. Results

4.1. Baseline homogeneity

Baseline homogeneity was assessed among all 157 participants using a data literacy survey, demographic analysis, and pre-test scores. All participants were first-year engineering students in India, aged between 18 and 20, with a common curriculum background. Group details are provided in Table 2.

Table 2
Establishing baseline homogeneity on the basis of participants' demographics, data literacy survey responses, and pre-test scores.

Category | G1 (N = 64) | G2 (N = 34) | G3 (N = 59) | ANOVA/χ², p-value
Gender (Male/Female) | 46/18 | 29/5 | 42/17 | χ²(2) = 2.66, p = 0.2645
Data Literacy Survey, Mean (S.D):
Confidence in performing data analysis tasks | 2.53 (1.08) | 2.53 (0.98) | 2.73 (0.86) | F(2, 154) = 0.742, p = 0.4777
Confidence in programming with Python | 2.64 (0.87) | 2.38 (0.67) | 2.54 (0.60) | F(2, 154) = 1.026, p = 0.3610
Frequency of using generative AI for data analysis | 3.09 (1.20) | 2.65 (1.08) | 3.15 (1.17) | F(2, 154) = 2.632, p = 0.0751
Confidence in using generative AI for data analysis | 3.08 (1.03) | 2.50 (0.86) | 2.93 (0.75) | F(2, 154) = 4.238, p = 0.0899
Pre-test Scores, Mean (S.D):
Total Score | 3.85 (2.50) | 3.47 (2.71) | 3.83 (2.14) | F(2, 144) = 0.736, p = 0.481

The survey included Likert scale-based questions designed to gauge participants' skills in data analysis and programming. The responses were recorded on a 1-5 scale, with 1 indicating low confidence or frequency and 5 indicating high confidence or frequency. ANOVA was conducted for each question to test for significant differences between the groups. The results, including means, standard deviations, and p-values, are presented in Table 2. No significant differences were found among the groups for any question (p > 0.05), confirming homogeneity in baseline skills. Additionally, the chi-square test confirmed no significant gender-based effect, χ²(2) = 2.66, p = 0.2645.

These results indicate no significant differences among the groups in their pre-test scores for any level (p > 0.05), confirming baseline homogeneity in programming skills related to data analysis. This ensures that any observed effects in the main study are due to the experimental conditions rather than pre-existing differences among the participants.

Fig. 3. Structured prompt example 1, explaining the concepts and strategies used to write the prompt; this example is for the UPSC problem.

Fig. 4. Structured prompt for a given dataset problem, generating Python code and asking for an explanation of the code.

4.2. Effect of prompt training and generative AI on programming learning

The impact of the intervention on participants' programming knowledge was analyzed using ANOVA on post-test scores at the understanding (L1), application (L2), and analysis (L3) levels, as well as on the total score, as reported in Table 3. The table presents the results of the 144 students who completed the post-test; of the 157 students who consented to participate in the study, 13 did not take the post-test. The results showed significant differences among the groups at all levels, indicating a substantial impact of the intervention. G3 consistently outperformed G2 and G1, with the highest mean scores across all levels. For L1 (understanding), G3 scored 2.69 (s.d. = 0.88), G2 scored 2.31 (s.d. = 0.87), and G1 scored 1.39 (s.d. = 0.35). Similar trends were observed for L2 (application) and L3 (analysis), with G3 scoring significantly higher than the other groups. The total post-test scores also revealed G3's superior performance (6.60, s.d. = 2.69) compared to G2 (4.94, s.d. = 2.25) and G1 (4.28, s.d. = 1.17). These results underscore the effectiveness of the intervention, prompt training integrated with generative AI, particularly for G3.

Further analysis using Tukey's HSD test revealed significant differences between specific groups. These findings highlight the effectiveness of the intervention for G3, suggesting that the training program significantly enhanced their programming skills in data analysis compared to G2 and G1. Additionally, at the understanding level (L1), G2 performed substantially better than G1, highlighting the effectiveness of generative AI compared to internet search. At the same time, the G2-G1 difference at levels 2 and 3 is not significant, which highlights the need for prompting skills.

4.3. Impact of intervention on task completion scores

To evaluate the impact of the intervention, we analyzed the task completion scores across the three tasks and the total score for all 157 participants. Each task was rated based on a detailed rubric, with scores out of 10. The ANOVA results in Table 4 indicate significant differences among the groups for all tasks, highlighting the intervention's effect. G3 consistently scored highest across all tasks and on the total score. For Task 1, G3 scored 3.41 (s.d. = 0.75), significantly higher than G2 (2.74, s.d. = 0.83) and G1 (1.11, s.d. = 0.94), with F(2, 154) = 117.38, p < 0.001. A similar trend was observed for Task 2, with G3 outperforming G2 and G1. For Task 3, G2 and G3 outperformed G1, but there was no significant difference between G2 and G3. The total score analysis also showed significant differences, with G3 scoring 8.76 (s.d. = 1.43) compared to G2 (6.91, s.d. = 2.34) and G1 (2.00, s.d. = 2.26), with F(2, 154) = 183.15, p < 0.001.
Table 3
Comparison of post-test scores using ANOVA, Welch's ANOVA, and post hoc tests, indicating significant differences in post-test scores at all levels and in the total score among the three groups.

Level | G1 (Mean ± S.D) | G2 (Mean ± S.D) | G3 (Mean ± S.D) | ANOVA F(df), p-value | Welch's ANOVA F'(df1, df2), p-value | G1 vs G3 (p) | G2 vs G1 (p) | G3 vs G2 (p)
L1: Understanding | 1.39 (0.35) | 2.31 (0.87) | 2.69 (0.88) | F(2, 141) = 37.22, 1.05E-13 | 43.075 (2, 71.76), 5.13E-13 | 8.92E-14 | 2.76E-06 | 0.097
L2: Application | 1.56 (0.46) | 1.38 (0.44) | 2.24 (0.41) | F(2, 141) = 22.28, 3.90E-09 | 22.8298 (2, 80.53), 1.40E-08 | 7.91E-07 | 0.410 | 9.01E-08
L3: Analysis | 1.32 (0.40) | 1.28 (0.34) | 1.84 (0.44) | F(2, 141) = 12.12, 1.39E-05 | 11.7119 (2, 82.99), 3.31E-05 | 7.42E-05 | 0.967 | 3.59E-04
Total Score | 4.28 (1.17) | 4.94 (2.25) | 6.60 (2.69) | F(2, 141) = 39.29, 2.75E-14 | 38.4089 (2, 74.41), 3.48E-12 | 1.48E-14 | 0.092 | 1.30E-06

Table 4
Task scores across groups G1, G2, and G3.

Level | G1 (Mean ± S.D) | G2 (Mean ± S.D) | G3 (Mean ± S.D) | ANOVA F(df), p-value | Welch's ANOVA F'(df1, df2), p-value | G1 vs G3 (p) | G2 vs G1 (p) | G3 vs G2 (p)
Task 1 | 1.11 (0.94) | 2.74 (0.83) | 3.41 (0.75) | F(2, 154) = 117.38, 1.08E-31 | 113.31 (2, 86.37), 8.63E-26 | 1.69E-14 | 1.51E-14 | 9.62E-04
Task 2 | 0.75 (1.57) | 2.59 (1.94) | 3.66 (1.12) | F(2, 154) = 57.81, 1.87E-19 | 69.91 (2, 77.23), 7.29E-18 | 1.69E-14 | 1.63E-07 | 3.55E-03
Task 3 | 0.28 (0.70) | 1.59 (0.82) | 1.69 (0.73) | F(2, 154) = 66.15, 1.84E-21 | 68.59 (2, 82.76), 2.07E-21 | 1.69E-14 | 9.02E-14 | 0.78
Total Score | 2.00 (2.26) | 6.91 (2.34) | 8.76 (1.43) | F(2, 154) = 183.15, 1.94E-41 | 198.05 (2, 77.97), 6.75E-47 | 1.69E-14 | 1.69E-14 | 9.69E-05

Table 5
Hake gain of pre- and post-test, total task score, and Spearman correlation for groups G1, G2, and G3.

Group | Hake Gain (Mean ± S.D) | Total Task Score (Mean ± S.D) | Spearman Correlation
G1 | 0.0401 ± 0.2609 | 2.00 ± 2.26 | r(42) = 0.72, p = 4.48E-08
G2 | 0.1274 ± 0.0878 | 6.91 ± 2.34 | r(25) = 0.90, p = 1.42E-10
G3 | 0.2463 ± 0.0044 | 8.76 ± 1.43 | r(48) = 0.77, p = 2.93E-12

Table 4 shows the mean and standard deviation (Mean ± S.D) of task scores and total scores across groups G1, G2, and G3. The ANOVA results indicate significant differences, with post hoc test p-values highlighting specific group differences.

To further understand the relationship between programming knowledge and data analysis task performance, we conducted a Hake gain analysis and a non-linear correlation analysis (Spearman's correlation) between the Hake gain scores and the data analysis task completion scores. The Hake gain analysis results are summarized in Table 5; the computation is sketched below.
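The Hake gain referred to here is the normalized gain of Hake (1998), g = (post - pre) / (max - pre); a minimal sketch, assuming the maximum score is the 15-point test total:

def hake_gain(pre: float, post: float, max_score: float = 15.0) -> float:
    """Normalized gain g = (post - pre) / (max_score - pre), after Hake (1998)."""
    return (post - pre) / (max_score - pre)

print(hake_gain(pre=3.8, post=6.6))  # e.g. 0.25 for a G3-like score profile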
Table 5 presents the Hake gain of pre- and post-test, the total task score, and the Spearman correlation for groups G1, G2, and G3. Of the 157 participants, 121 completed both the pre- and post-tests as well as the task. The correlation coefficients (r) and p-values indicate the strength and significance of the relationship between Hake gain and total task scores within each group.

There is a strong positive correlation between programming knowledge and data analysis task performance across all groups. Group 3 scored high in both areas, demonstrating a strong association between programming knowledge and better performance in data analysis tasks. Group 2 showed a significant role of programming knowledge in its intermediate performance levels, while Group 1, despite having lower scores, also displayed a strong association between programming knowledge and task performance. These findings emphasize that programming knowledge is a critical factor in enhancing data analysis task performance across all groups.

4.4. System usability scale results

To evaluate the usability of ChatGPT, a comparison of System Usability Scale (SUS) scores was conducted between experimental group 1 (G2), which did not receive prompt training, and experimental group 2 (G3), which received prompt training. An independent samples t-test assuming equal variances revealed a statistically significant difference in SUS scores between the two groups, with G3 reporting a mean SUS score of 77.33 (SD = 9.64) and G2 a mean score of 73.16 (SD = 10.18), t(91) = 1.967, p = 0.026 (one-tailed). The effect size (r = 0.202) indicates a small but meaningful impact of prompt training on usability. These results suggest that prompt training significantly improves users' perceived usability of ChatGPT, underscoring its importance for enhancing user experience.

5. Discussion

5.1. Effect of generative AI and prompt training on programming knowledge

To address RQ1, ``What is the impact of integrating Generative AI with prompt engineering on first-year engineering students' learning in the domain of programming for data analysis?'', we compared the performance of the different groups. Specifically, Group 3 (G3), which utilized both Generative AI and prompt engineering, outperformed Group 1 (G1), which relied on traditional internet searches, across all three levels of programming knowledge. This demonstrates the effectiveness of the combined approach of Generative AI and prompt engineering over traditional methods. However, when examining the effect of Generative AI alone, the results indicate significant improvement only at the understanding level, but not at the other two levels. This finding suggests that while Generative AI alone contributes to learning, its full potential is realized when combined with structured prompt engineering.

To verify the role of prompt engineering, we compared Group 2 (G2), which used Generative AI without prompt training, to Group 3 (G3). The comparison revealed that no significant differences emerged at the understanding level, which corresponds to the lower-order thinking skills of Bloom's taxonomy. Nonetheless, the overall scores and performance at higher cognitive levels were significantly better in G3, highlighting the critical role of prompting skills in enhancing programming learning. These findings align with several recent studies emphasizing the role of Generative AI as a learning tool and the efficacy of prompting techniques. The enhanced learner performance observed in our study can be further explained by the higher System Usability Scale (SUS) scores reported by participants in G3 compared to G2. A more interactive and supportive environment with high usability reduces cognitive load, allowing participants to focus more on domain learning. Participants in G3 reported that prompt techniques, which involved asking the AI to explain the syntax and underlying concepts of generated code, significantly aided their understanding of programming intricacies compared to both the control group (G1) and experimental group 1 (G2).

Our findings that integrating Generative AI with prompt engineering significantly enhances first-year engineering students' learning outcomes in programming for data analysis are corroborated by several recent studies. For instance, Chan and Hu (2023) observed that ChatGPT's ability to handle repetitive tasks allows students to focus on advanced learning, enhancing their performance. Similarly, Alneyadi and Wardat (2023) demonstrated a clear improvement in post-test scores for students using ChatGPT, highlighting its effectiveness in educational settings. Additionally, Boubker (2024) found that personalized approaches with ChatGPT improve students' perceived usefulness and learning effectiveness, aligning with our results that structured prompt training enhances understanding and mastery of programming concepts. These studies collectively support our conclusion that combining Generative AI with prompt engineering is more effective than using traditional methods or AI alone.

5.2. Effect of generative AI and prompt training on data analysis task completion

To address RQ2, ``What is the impact of integrating Generative AI with prompt engineering on first-year engineering students in data analysis task completion?'', we analyzed the performance of the different groups on the given tasks. We found that Group 3 (G3), which utilized both Generative AI and prompt engineering, consistently outperformed all other groups. The task scores for G3 were significantly higher across all three individual tasks and at the total task score level, underscoring the combined benefits of Generative AI and prompting techniques. This was contrasted with Group 1 (G1), which relied solely on traditional methods, and Group 2 (G2), which used Generative AI without prompt training. When comparing G2 to G1, we found significant differences in all task scores, both individually and in total, indicating that even without prompt training, Generative AI substantially improves task completion rates compared to traditional methods. This finding aligns with previous research in programming, where code generated by AI tools often surpasses the quality of code written by novice learners within a limited time frame. Moreover, the higher System Usability Scale (SUS) scores reported by participants in G3 further support these results, indicating a more user-friendly and supportive learning environment. Recent studies show that accuracy enhancements of up to 13.79% have been observed in code generation tasks (Li et al., 2023), and a 12% gain in complex reasoning tasks has been achieved (Li et al., 2023). These improvements support our findings that integrating Generative AI with prompt engineering significantly enhances first-year engineering students' task completion in data analysis, highlighting the effectiveness of structured prompting in educational settings.

Additionally, a Spearman's correlation analysis revealed a strong positive correlation (approximately 0.8) between programming knowledge and task completion scores, suggesting that participants who performed well in tasks also showed improved programming knowledge. This indicates that while Generative AI facilitates task completion, the retention and deeper understanding of programming concepts develop over time. Our study demonstrates that integrating Generative AI with prompt engineering significantly improves data analysis task completion among first-year engineering students. This integration not only enhances immediate task performance but also supports deeper learning and retention of programming concepts, making it a valuable tool in educational settings.

5.3. Experience survey feedback

The Learning Experience Survey results provide additional support for the efficacy of integrating Generative AI with prompt engineering. Participants reported that the diverse explanations offered by AI catered to various learning styles, fostering a deeper understanding of programming concepts. The structured and iterative nature of prompt crafting promoted reflective learning and skill reinforcement. Additionally, the interactive and immediate feedback from AI fueled curiosity and encouraged continuous discovery. These qualitative insights corroborate our quantitative findings, demonstrating that prompt engineering, combined with Generative AI, not only improves performance in programming tasks but also enriches the overall learning experience.

Additionally, participants noted that prompt training improved their questioning skills, emphasizing the importance of context in generating code, breaking down problems, and logically structuring queries for better responses. This suggests that prompt engineering not only enhances the effectiveness of Generative AI tools but also boosts learners' efficiency and problem-solving abilities during experiential learning.

5.4. Implications for educational theories and pedagogy

Our results show that structured prompt training (G3) fosters gains not only at the lower levels of Bloom's taxonomy (understanding) but also at higher levels (application and analysis). By teaching students how to formulate effective prompts and interpret AI-generated feedback, educators can scaffold progression across cognitive domains, moving learners from basic comprehension toward more complex analytical and evaluative tasks. This finding underscores the value of explicit prompt-engineering instruction in helping novice programmers advance along Bloom's hierarchy more rapidly and effectively.

From a constructionist perspective, the prompt-training process functions as an effective scaffold, guiding students through a cycle of experimentation, feedback, and reflection.
mentation, feedback, and r­flection. Rather than passively receiving in­ and adapting to individual learning styles, thereby optimizing learn­
formation, learners actively construct their programming knowledge by ing outcomes. Theoretical advancements are needed to r­fine existing
crafting prompts, testing AI responses, and r­fining their understanding frameworks and guide the design of AI-driven educational interventions.
of both code logic and data-analysis techniques. This iterative process New approaches to prompt engineering and user interface design can en­
echoes the hallmarks of cognitive apprenticeship, where novices learn hance the usability and effectiveness of AI tools in the learning-teaching
expert strategies (e.g., chaining prompts, using the CLEAR framework) domain.
through guided practice. Integrating these scaffolds into a programming
curriculum can thus enhance engagement and accelerate the develop­ CRediT authorship contribution statement
ment of problem-solving skills.
Lastly, blending Generative AI with structured prompt training closely aligns with 21st-century educational objectives that emphasize critical thinking, digital fluency, and adaptable problem-solving. Through their interactions with AI-driven tools, students in G3 cultivated the habits of planning queries, evaluating AI outputs, and iterating on solutions, all of which are indispensable in modern engineering roles. Moreover, when basic or repetitive coding tasks are offloaded to AI, instructors can reorient class time toward deeper conceptual discussions and collaborative projects, fostering the creativity, communication, and self-directed inquiry advocated by contemporary learning frameworks.
In practice, these findings suggest that instructors should consider incorporating explicit prompt-engineering lessons into early programming curricula, covering both technical dimensions (e.g., how to specify parameters or context) and cognitive dimensions (e.g., how to break a complex task down into smaller prompts). Doing so can help students leverage AI more effectively for task completion and skill-building, while also honing their ability to articulate, analyze, and refine problem-solving strategies. Ultimately, by combining Generative AI with thoughtfully designed scaffolds, educators can foster robust learning environments that promote immediate task success.
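One way to teach that cognitive dimension is to have students draft the decomposition themselves before consulting the model. The sketch below shows such an exercise; the task statement and sub-prompts are invented teaching material for illustration, not prompts drawn from the study.

```python
# Illustrative only: decomposing one complex data-analysis task into smaller,
# self-contained prompts that can be verified one at a time. The wording is
# invented teaching material, not the study's prompt set.

COMPLEX_TASK = "Which product category drives the seasonal swings in revenue?"

SUB_PROMPTS = [
    "Describe the columns a typical retail sales CSV contains and show how "
    "to load it with pandas.",
    "Write pandas code that aggregates revenue by category and month.",
    "Given a category-by-month revenue table, suggest a statistic that "
    "quantifies seasonality and show how to compute it.",
    "Write matplotlib code that plots the three most seasonal categories.",
]

print(f"Complex task: {COMPLEX_TASK}\n")
for step, sub_prompt in enumerate(SUB_PROMPTS, start=1):
    # Students submit each sub-prompt separately and check the answer
    # before moving on, rather than pasting the whole task at once.
    print(f"Prompt {step}: {sub_prompt}")
```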
6. Conclusion and future work

6.1. Conclusion

The integration of generative AI in education is transforming teaching and learning methodologies. This study explores the potential of structured prompt training to enhance the educational utility of AI-generated content. Our findings demonstrate that prompt training significantly improves learning outcomes, engagement, and the quality of AI interactions: in the data analysis and programming tasks, participants who received structured prompt training outperformed those who did not. By delivering context-specific cues and structured guidance, prompt training markedly improves problem-solving skills, addressing a key limitation of AI-generated responses, which often lack contextual relevance.
The study has the following limitations. The gender imbalance among participants and the lack of a comparative gender analysis limit the generalizability of the findings. Additionally, the short duration of the study poses challenges in assessing the long-term effects of the interventions. Finally, the absence of detailed interaction analysis and of any quantification of prompting skills during task execution makes it difficult to capture the intricacies of the learning process.
6.2. Future work

Future research should address the limitations identified in this study to deepen our understanding of the impact of generative AI in education. Longitudinal studies are essential to explore the sustained impact of structured prompt training on learning outcomes over time. A more balanced gender representation and a comparative gender analysis are necessary to ensure the results apply across different demographics. Multi-modal analysis, incorporating visual and auditory cues, should be pursued to gain deeper insights into student-AI interactions and cognitive processes; this can lead to more personalized and effective educational interventions. Developing sophisticated interaction analysis systems will be key to providing real-time feedback and adapting to individual learning styles, thereby optimizing learning outcomes. Theoretical advancements are needed to refine existing frameworks and guide the design of AI-driven educational interventions, and new approaches to prompt engineering and user interface design can enhance the usability and effectiveness of AI tools in the learning-teaching domain.
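As a first step toward the interaction analysis systems envisioned here, each student-AI exchange could be logged with enough structure to quantify prompting behavior offline. The sketch below is speculative: the fields, the JSONL format, and the interactions.jsonl filename are assumptions made for illustration, not part of the study's tooling.

```python
# Speculative sketch: structured logging of student-AI exchanges so that
# prompting behavior (frequency, revision depth, latency) can be quantified
# later. Field names and the JSONL format are illustrative assumptions.

import json
import time
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class Exchange:
    student_id: str
    timestamp: float
    prompt: str
    response: str
    revision_of: Optional[int]  # index of the prompt this one refines, if any


def log_exchange(path: str, exchange: Exchange) -> None:
    """Append one exchange as a JSON line for later offline analysis."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(exchange)) + "\n")


log_exchange("interactions.jsonl", Exchange(
    student_id="s042",
    timestamp=time.time(),
    prompt="Why does df.groupby('region').mean() drop my text columns?",
    response="(model reply stored here)",
    revision_of=None,
))
```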
CRediT authorship contribution statement

Ashish Garg: Writing – original draft, Methodology, Investigation, Formal analysis, Data curation, Conceptualization. K. Nisumba Soodhani: Writing – review & editing, Writing – original draft, Methodology, Investigation. Ramkumar Rajendran: Writing – review & editing, Validation, Supervision, Software, Methodology, Investigation, Funding acquisition, Conceptualization.

Ethical considerations and data availability

The study was approved by the Institutional Review Board ethical committee with ID: IIT-IRB/2021/006. Informed consent was obtained from all participants, and their privacy rights were strictly observed. The data can be obtained by sending a request email to the corresponding author.

Declaration of generative AI and AI-assisted technologies in the writing process

During the preparation of this work, the authors used ChatGPT 3.5 to check for grammatical errors and improve readability. After using this tool, the authors reviewed and edited the content as needed and take full responsibility for the content of the publication.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

We acknowledge the SBI Foundation Hub for Data Science & Analytics, IIT Bombay, for supporting the work done in this project. We thank the teachers and students who participated in this study, whose contributions were invaluable to this work.