17th IEEE International Conference of Software Testing, Verification and Validation Workshops (ICSTW-2024)
Toronto, Canada, pp. 331-338, DOI: https://doi.org/10.1109/ICSTW60967.2024.00064, May 2024
Factors Influencing the Performance of Students in
Software Automated Test Tools Course
Susmita Haldar
Mary Pierce
Luiz Fernando Capretz
School of Information Technology
Fanshawe College
London, Canada
shaldar@fanshawec.ca
Faculty of Business,
Info Technology and Pt Studies
Fanshawe College
London, Canada
mpierce@fanshawec.ca
Department of Electrical and Computer Engineering
Western University
London, Canada
lcapretz@uwo.ca
Abstract—Formal software testing education is important for
developing effective QA professionals. Courses for training software
testing students usually cover various aspects of quality assurance
approaches. Automated Test Tools is one of the
software testing students. Automated Test Tools is one of the
core courses in the software testing post-graduate curriculum
due to the high demand for automated testers in the workforce.
It is important to understand which factors are affecting student
performance in the automated testing course to be able to assist
the students early on based on their needs. The metrics
considered for predicting student performance in this course
include student engagement, grades on individual deliverables,
and grades in prerequisite courses. This study identifies the
impact of assessing students through individual vs. group
activities and theoretical vs. practical components, and the effect
of prerequisite courses on the final grade. To carry
out this research, student data was collected from the automated
test tools course of a community college-based postgraduate
certificate program in software testing. The dataset contained
student records from the years 2021 to 2022 and consisted
of information from five different semesters. Various machine
learning algorithms were applied to develop an effective model
for predicting students’ performance in the automated software
testing tools course, and finally, important features affecting
the students’ performance were identified. The predictive model
of the automated test tools course, developed by applying the
logistic regression technique, showed the best performance, with
an accuracy score of 90%.
Index Terms—Software Testing Education, Automated Testing,
Selenium, Quality Assurance, Student Engagement
I. INTRODUCTION
Software testing is critical to delivering software with fewer
defects and, in turn, better quality. Industries rely on their
QA teams to verify that the software has been fully tested. Manual testing can be very labor-intensive, monotonous, and error-prone. As a result, to ensure the delivery of quality products,
workplaces focus on automating manual test scripts to increase
test coverage. Automated testing requires specialized skills
in addition to having the mindset of a manual tester, which
includes an understanding of coding concepts, knowledge of
the IT domain, and a basic understanding of software testing.
However, a software testing career is a less favorable choice
within the software engineering domain [1]. Capretz et al. [2]
conducted a study on the software testing profession, where
they identified the factors that motivate or demotivate software
testing professionals to sustain their software testing career and
found that only 25% of the selected respondents from Canada
were motivated to take software testing as a career option.
Therefore, when hiring from the small pool of experienced
QA professionals, companies want to ensure that these individuals can recommend the tools required for automating the
manual testing effort, and can develop and execute automated test
scripts to achieve higher test coverage.
To bridge this gap, the post-graduate certificate program in
software testing offers a dedicated course in automated test
tools, which is worth four credits and is taken in the
second level of a two-level postgraduate program. Students
are only permitted to take this course after completing prerequisite courses covering foundation-level concepts in software
testing and the Java programming language. The college needs
to be careful to require only the prerequisite courses that are
essential for students’ learning. Keeping unnecessary prerequisites in a course can demotivate students who are ready to
take on the challenges of advanced concepts. It can also affect
the student retention strategy as the students will be restricted
from taking this course if they have failed the prerequisite
courses in level one. On the other hand, exceptional students
may want to take on a challenge by taking an advanced course
in their program to keep them motivated without waiting for
completion of the prerequisite courses.
If only theoretical concepts are taught in a software testing
course, the student’s knowledge may be far from real-world expectations [3]. At the same time, teaching hands-on
problems only may not cultivate the critical thinking strategies
required for developing an effective automated software testing
(AST) solution. In addition, instructors need to understand
whether practical exercises are affecting the students’ final
grades compared to assessments on theoretical quizzes or
assignments to customize the course delivery to meet the needs
of the students.
Student engagement factors, such as the number of times the
student logged into the learning management system and the
number of recorded video contents completed by the student,
can assist the instructor and the academic advisor in guiding
the students to focus on studying. The numbers of assignment
and quiz submissions demonstrate whether the
student missed tests or assignments because they were unaware
of them. This research will also assess the contribution of
student engagement to students’ final grades. A minimum C
grade is desirable, as a cumulative GPA lower than a C
can affect a student’s academic standing.
Teaching students AST concepts, techniques, and theories
is a challenging task, especially where the focus is also on vocational learning outcomes and essential employability skills
in addition to course learning outcomes. Software engineering
students often do not gain the essential skills and knowledge
they need to succeed in the IT industry [4]. Although most
Computer Science programs offer some testing concepts in
their programs, they often fail to remain aligned with the
realities of the IT industry [5].
Applying machine learning algorithms, this paper investigates whether students’ performance in the automated software
testing tools course can be predicted from their assessments
on milestone deliverables, their engagement in class, their prerequisite courses, etc.
The rest of this paper is organized as follows. Section II
provides background on software testing education, highlighting automated software testing education. Section III describes
how the AST course has been structured and presents the
methodology for developing the model that predicts students’
performance in the AST tools course. Section IV presents the
results of this study. Section V provides analysis and discussion,
and Section VI discusses threats to validity. Finally, Section VII
concludes the paper and suggests future work.
II. BACKGROUND AND LITERATURE SURVEY
A. Software Testing Education
There has been an emphasis on considering software testing
in the Software Engineering or Computer Science curriculum.
Garousi et al. [6] conducted a study on the state of software
testing education in Canadian and American universities and
identified the strengths and areas for improvement. As part
of their recommendations, they encouraged a systematic software testing curriculum. They also remarked that Computer
Science and software engineering graduates need to be able
to perform testing and quality assurance tasks on the software
they produce after graduation. They found that even some
top university programs were not offering any courses in the
software testing domain.
B. Automated Software Testing Education
Teaching only certain testing techniques may not provide
the skillset needed to apply automated testing in the industry.
As a result, offering automated test tools as a separate course
allows professors to place proper emphasis on teaching students
how to utilize various AST tools effectively instead of focusing
on manual testing only. Barrett et al. [7] surveyed 25 courses
offered by 14 universities in Sweden. Their analysis of
the basic curriculum suggests that the utilization of AST tools
was incorporated alongside other components in the majority of the
courses, but automated testing was not offered as a separate
course in most of the selected universities.
C. Student Performance Prediction
Student performance prediction is an emerging research
area, but this approach has not been applied to software
testing students, even though this control group can demonstrate different characteristics than others. For instance, in
previous studies, one of the challenges shown with teaching software testing was keeping students’ motivation in
testing courses alive compared to their motivation in other
subjects [8], [9]. Different researchers applied various techniques for teaching software testing, such as using a gamebased approach [10], [11], utilizing free and open-source
software [12], and using Selenium [13] for creating automated
test solutions [14], etc.
Moubayed et al. [15] applied the Apriori rule-based algorithm
to identify the relationship between student engagement
and student performance. They found that highly engaged
students performed well in their courses.
Burman et al. [16] applied a multi-class Support Vector
Machine (SVM) classification model to classify learners
into high, average, and low categories according to their
academic scores. They compared a linear kernel with a radial
basis function (RBF) kernel; the RBF kernel produced more
accurate results than the linear kernel, achieving an accuracy
of approximately 91%.
Bhutto et al. [17] applied logistic regression and SVM
for student performance prediction. They obtained 73% and
70% accuracy using logistic regression and SVM respectively.
Bujang et al. [18] used Decision Tree, SVM, Naive Bayes,
KNN, Logistic Regression, and Random Forest (RF) on a
real student course grade dataset comprising 1,282 records.
They proposed a multiclass prediction model, based on the
Synthetic Minority Oversampling Technique (SMOTE), to reduce
the over-fitting and misclassification caused by imbalanced
multi-classification. Their proposed model, integrated with RF,
gave the highest F1 score of 99.5%.
Khan et al. [19] used data mining techniques based on
LMS activity logs and applied a rule-based algorithm for
predicting student performance. They found that there is a
considerable correlation between student performance and
several different factors, such as resource views, activity gaps,
grades from the previous semester, grades from prerequisite
courses, and evaluations of first-term tests. In this research,
we also consider student engagement metrics when evaluating
student performance in the automated software testing tools
course. Professors and academic advisors can use this study to
spot students who need extra assistance so they can intervene early.
Shi et al. [20] analyzed the characteristics of college students’ learning behaviors and explored the prediction of learning
effect by constructing a machine learning model based on
information literacy learning behavior characteristics. Out of the
several algorithms they attempted, the RF model showed the
best performance, with an accuracy of 92.50%, a precision of
84.56%, a recall of 94.81%, an F1-score of 89.39%, and a
Kappa coefficient of 0.859.
Fig. 1. Methodology applied for predicting the final performance of students of Automated Test Tools course
Jayasundara et al. [21] conducted a study on building an
explainable boosting student performance model with a dataset
of university entrance exam performance. They considered
the interpretability of the model with respect to demographic
information such as gender, caste, parents’ education, and previous
educational background. The developed model achieved a
maximum F1 score of 77% when predicting average grades
for students, while the detection rates for other categories, such
as good, bad, and excellent grades, were relatively low.
This research contributes to the software testing education
community by helping to identify which factors are
important for a course in automated testing.
III. METHODOLOGY
The methodology is illustrated in Fig. 1. The first step
was to collect the data and apply feature engineering and
pre-processing. Selected machine learning algorithms were then
applied to develop the student performance model. The
performance of the developed models was evaluated according
to the selected criteria. Finally, the SHAP technique was applied
to the best-performing model to understand its feature importance.
The details are described below.
A. Data Collection
The development of the model was initiated with data
collection from the one-year postgraduate software testing
certificate program available under the Department of Information Technology at Fanshawe College of Applied Arts and
Technology, located in London, Ontario. Automated Test Tools
is a second-level course among a total of 11 courses available
in the program. The data was collected from various sources,
especially from the database where grading information is
stored. The dataset was anonymized because it contains protected
information. Student-related information, such as date of birth,
actual student ID, etc., was removed from the dataset. The
dataset consists of students from five different semesters; this
period covers classes that were delivered online during the
pandemic. The total sample size of the dataset is 223.
This work utilized established machine learning algorithms
applicable to classification problems with mid-sized datasets.
The expected output was split into five different categories:
D, C, B, A, and A+.
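To make the target construction concrete, the sketch below maps final percentages to these five categories; the cut-off values are assumptions based on a typical Ontario college grading scale, not figures reported in this study.

```python
# Hypothetical mapping from a final percentage to the five target categories.
# The thresholds are assumed (typical Ontario college scale), not from the paper.
import pandas as pd

def to_letter_grade(pct: float) -> str:
    if pct >= 90:
        return "A+"
    elif pct >= 80:
        return "A"
    elif pct >= 70:
        return "B"
    elif pct >= 60:
        return "C"
    else:
        return "D"

# Example: derive the target column from an anonymized grade table.
df = pd.DataFrame({"final_pct": [93.5, 71.0, 58.2]})
df["letter_grade_category"] = df["final_pct"].apply(to_letter_grade)
```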
B. Feature Selection
The evaluation strategy of the Automated Test Tools course
has been illustrated in Table I.
TABLE I
PRIMARY DISTRIBUTION OF GRADE COMPONENTS

Grade Item         | Submission Type       | Type of Assignment | Worth
Research Projects  | Group submission      | Practical focused  | 45%
In-class exercises | Individual submission | Practical focused  | 15%
Quizzes            | Individual submission | Theory focused     | 40%
The research projects are worth 45% of the final grade and
are conducted over the semester through group collaboration.
Forty percent of the overall grade comes from class tests,
and the remaining 15% is dedicated to in-class exercises.
The research component is divided into three projects, with the
same teams of 3-4 students working together for the full term.
At the professor’s discretion, this component is often split into
40% for the team projects and 5% for individual class attendance.
The three group projects reflect the development of automated
test solutions. The projects involve software testing-related
activities: gathering requirements, designing and implementing
an automated regression suite repository using an open-source
tool called Selenium IDE [22], generating automated test
data using available open-source tools, and finally building an
automated test solution using Selenium WebDriver, JUnit [23],
and TestNG [24] and deploying the code in the DevOps
tool Jenkins [25].
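To give a flavor of the automated checks these projects build, the sketch below shows a minimal Selenium WebDriver test. The course implements its solutions in Java with JUnit and TestNG, so this Python version only mirrors the idea; the URL and locator are placeholders.

```python
# Minimal Selenium WebDriver regression check (illustrative only; the course
# projects use Java with JUnit/TestNG). URL and locator are placeholders.
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()  # assumes a local Chrome/chromedriver setup
try:
    driver.get("https://example.org/")               # hypothetical system under test
    heading = driver.find_element(By.TAG_NAME, "h1") # locate a page element
    assert "Example" in heading.text, "unexpected page heading"
finally:
    driver.quit()                                    # always release the browser
```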
Table II shows a sample rubric for Project 1, which is worth
approximately 13.33% of the final grade. The following are the
mapped course learning outcomes for this project:
• Discuss what it means to implement an AST solution.
• Describe the benefits of implementing AST.
• Design an automated testing strategy for a baseline software application.
As the description suggests, the research projects require
technical knowledge to explore various options for implementing an effective AST solution.
TABLE II
RUBRIC FOR PROJECT 1, A GROUP PROJECT FOR IMPLEMENTING AN AUTOMATED TESTING SOLUTION USING SELENIUM IDE

SUT Selection
• Excellent: The selected website was an excellent candidate because of the opportunity to test various features, and the justification for its selection was provided in detail.
• Average: The selected website was considered average, as it offers some features for testing but with a limited scope; the document lacks sufficient reasoning for its selection.
• Poor: The selected website was a poor candidate for this AST project due to not having enough testable features, and the document did not provide proper justification.

Requirements Traceability Matrix
• Excellent: The RTM was very thorough and demonstrated how requirements are linked to the test cases.
• Average: Some requirements are mapped to test cases.
• Poor: The creation of the RTM was done poorly.

Selection of Test Cases
• Excellent: Effort in creating the test cases is very prominent, and a sufficient number of test cases were attempted.
• Average: Test case selection shows an average effort.
• Poor: Not enough test cases were automated.

Test Scripts Document
• Excellent: Actual results for all the scenarios are completed and traceable.
• Average: Actual results for major scenarios are completed and traceable.
• Poor: Actual results for scenarios are incomplete or not traceable.

Automated Scripts
• Excellent: Automated tools were used comprehensively, and results were generated correctly.
• Average: Automated tools were used adequately, and most of the results were generated correctly.
• Poor: The automated tool was insufficiently used, resulting in incomplete results.

Documentation
• Excellent: Carefully followed all instructions, adhering to document format/presentation, spelling, and grammar standards, including providing sufficient screenshots.
• Average: Followed most of the provided instructions on document format/presentation, spelling, and grammar, including providing some screenshots.
• Poor: Did not follow instructions, and the document did not contain all the requested contents.

Source Code
• Excellent: Source code is submitted and accompanied by thorough documentation and clear explanations, ensuring a comprehensive understanding of its functionality.
• Average: Source code was submitted, but it is only somewhat comprehensive.
• Poor: Source code was not submitted, or the submitted source code is of poor quality and not comprehensive.

Critical Thinking & Problem-Solving Skills
• Excellent: The implemented solution demonstrates excellent critical thinking and problem-solving skills.
• Average: The implemented solution demonstrates some critical thinking and problem-solving skills.
• Poor: The implemented solution demonstrates limited critical thinking and problem-solving skills.
The group projects give the students exposure to the practical
aspects of software automation through collaboration, interaction,
and brainstorming with team members, and through the successful
implementation of the projects.
Grade items on the tests, worth 40% of the total mark,
assess students on their understanding of theoretical software
testing concepts and are equally distributed among four
different tests. The tests are individual deliverables and usually
consist of multiple-choice questions along with a few scenario-based questions. The in-class exercises, which account for 15% of
the final grade, let the students compare manual and automated
efforts, implement automated solutions using an open-source
tool called Katalon Recorder [26], and finally write test cases
using Selenium WebDriver [13].
As part of the data preparation and preprocessing activities,
the research, test, and in-class exercise scores were converted
to percentages for each student, and these percentage scores
were used. The system also records how many times the student
accessed the course page, how many of the available contents
or videos the student reviewed, the number of assignments
submitted in the course, and the number of quizzes completed.
These are indicators of student engagement. For instance, the
number of available content modules students complete, along
with their motivation to complete the available recorded video
lectures, may demonstrate that students are engaged in this course.
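A sketch of this preparation step in pandas is shown below; the file name and raw column names are hypothetical stand-ins for the anonymized gradebook and LMS exports.

```python
# Hypothetical feature-engineering step: convert raw component totals to
# percentages and keep the LMS engagement counters as-is.
import pandas as pd

raw = pd.read_csv("att_gradebook_anon.csv")  # hypothetical anonymized export

# Percentage scores for the three graded components.
raw["ResearchScore"] = raw["research_total"] / raw["research_max"] * 100
raw["TestScore"] = raw["test_total"] / raw["test_max"] * 100
raw["InClassExerciseScore"] = raw["exercise_total"] / raw["exercise_max"] * 100

# Engagement counters come straight from the LMS activity logs.
engagement_cols = ["ContentCompleted", "QuizCompleted",
                   "#OfAssignmentSubmissions", "CourseAccess"]
features = raw[["ResearchScore", "TestScore", "InClassExerciseScore",
                *engagement_cols]]
```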
Afterward, the student’s grades in the two prerequisite
courses were added to the dataset. As discussed in previous
sections, an interest of this study was to assess whether the
grade in the prerequisite courses assisted with the student’s
performance prediction. The students’ past failure history in
the prerequisite courses was calculated and included as a
separate input feature for developing the predictive model.
A value of 1 indicates that the student needed to retake the
prerequisite course, and 0 indicates that the student passed
the course on the first attempt.
The prerequisite course ’Test Methodology’ gives students
a background in quality assurance methodologies, including black-box, white-box, grey-box, unit, and other testing
methods. The other prerequisite course is ’Coding for Tests’,
which examines the practices and procedures related to
creating and debugging software and prepares the
student to write code, using a procedural approach initially
and then migrating to an object-oriented approach. The final
feature list is presented in Table III, which shows the 12
features selected as input for developing the model that
predicts the letter-grade-category field.
C. Data Pre-processing
The data cleaning step verified the dataset for any null or
redundant values. The data was split into 70% for training and
30% for testing to ensure that both training and testing datasets
contained representation from each selected grade category
TABLE III
AUTOMATED TEST TOOL - STUDENT PERFORMANCE DATA

ContentCompleted (Numeric): The total number of views of the contents that were uploaded to the course home page.
QuizCompleted (Numeric): Total number of quizzes completed. Students are required to write four quizzes.
#OfAssignmentSubmissions (Numeric): Total number of assignments submitted.
CourseAccess (Numeric): Number of times the student has accessed the course home page.
CodingforTest score (Numeric): Student’s grade in the prerequisite course ”Coding for Test”. This course gives exposure to writing code.
TestMethodology score (Numeric): Student’s grade in the prerequisite course ”Testing Methodology”. This course introduces the student to a myriad of QA methodologies, including black-box, white-box, grey-box, unit, and other testing methods.
Retook TestMethodology (Numeric): A value of 0 if the student never failed the prerequisite ”Test Methodology” course, and 1 otherwise.
Retook CodingForTest (Numeric): A value of 0 if the student never failed the prerequisite ”Coding for Test” course, and 1 otherwise.
TestScore (Numeric): Total score on the tests taken in the course. This is worth 40% of the final grade.
ResearchScore (Numeric): Total score on research projects. The research projects are done in group settings.
InClassExerciseScore (Numeric): Total score on in-class exercises done as independent assignments, worth 15% of the final grade.
MidtermScore (Numeric): The summation of all assignment scores earned before the midterm reporting deadline. This may reflect the scores on tests, research, and in-class exercises completed before the midterm due date.
The training and testing data were normalized using RobustScaler [27], a functionality available in the scikit-learn library
for Python, to scale features using statistics that are robust to
outliers. RobustScaler removes the median and scales the data
according to the quartile range.
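A minimal sketch of this split-and-scale step follows, assuming the prepared features and the letter-grade target are stored in a hypothetical att_features.csv.

```python
# Stratified 70/30 split followed by RobustScaler normalization.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import RobustScaler

data = pd.read_csv("att_features.csv")           # hypothetical prepared dataset
X = data.drop(columns=["letter_grade_category"]) # the 12 features of Table III
y = data["letter_grade_category"]                # D, C, B, A, A+

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30,
    stratify=y,        # keep all five grade categories in both splits
    random_state=42)   # fixed seed is an assumption for reproducibility

scaler = RobustScaler()                 # removes the median, scales by the IQR
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)       # fit on training data only, no leakage
```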
D. Applied Machine Learning Algorithms
As our dependent variable consists of five categories (A+, A,
B, C, and D), we applied machine learning algorithms that are
commonly used as classifiers for multi-classification problems,
i.e., problems with more than two possible classes.
In the literature, Alamri et al. [28] surveyed the machine
learning algorithms employed in explainable student performance
studies; Decision Tree and rule-based algorithms appear to have
been successful in developing models for multi-class classification
problems. In a similar vein, this study also considered the
explainability of the model in terms of feature selection and
first investigated the effectiveness of the simpler machine learning
techniques of Logistic Regression [29], SVM [30], Decision Tree,
and Random Forest, before moving to the more complex Deep
Feed-Forward Neural Network (DFFNN). Our study compared
these classification algorithms in their ability to predict student
performance.
Logistic Regression predicts the probability that an instance
belongs to a particular class [31]. The target variable has
five different ordinal grade categories. Multiclass logistic
regression extends binary classification using the one-versus-all
(also called one-versus-rest) method. A one-vs-rest (OVR)
classifier involves training a single classifier per class, with
the samples of that class as positive samples and all other
samples as negatives.
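A minimal scikit-learn sketch of this one-vs-rest decomposition is shown below, reusing the logistic regression settings of Table IV and the split from the preprocessing sketch; note that the study's tuned model used the multinomial setting, so the explicit wrapper here only illustrates the OVR idea.

```python
# One binary logistic regression per grade category (one-vs-rest).
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

ovr_lr = OneVsRestClassifier(
    LogisticRegression(C=120, class_weight="balanced", penalty="l2",
                       solver="lbfgs", max_iter=1000))  # Table IV values
ovr_lr.fit(X_train, y_train)   # X_train/y_train from the preprocessing sketch
y_pred = ovr_lr.predict(X_test)
```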
Support Vector Machine (SVM) is a supervised learning
technique that aims to classify the data. It uses a hyperplane
to divide the dataset into classes with a gap as wide as
possible, known as the margin. To handle this multi-classification
problem of predicting the selected five types of grades, the
problem was broken down into multiple binary classification
problems. As with logistic regression, the one-vs-rest technique
was used [32].
The Decision Tree classifier can be applied to both binary and
multiclass classification problems. This algorithm is useful
for relatively small datasets with a simple underlying
structure, and the model is easily interpretable [33].
Random Forest [34] is an ensemble classification technique. This algorithm constructs many decision trees, like a
forest, with random attribute values.
The Deep Feed-Forward Neural Network [35] is the simplest
type of artificial neural network and has various applications in
machine learning. A DFFNN is a suitable candidate for multiclass
classification and works with larger datasets.
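The sketch below fits the five model families on the same split for comparison, using the Table IV values where stated; scikit-learn's MLPClassifier stands in for the study's DFFNN (it offers no dropout layer), so it approximates rather than reproduces the exact network.

```python
# Fit and score the five classifier families on the same train/test split.
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier  # approximate DFFNN stand-in

models = {
    "Logistic Regression": LogisticRegression(C=120, class_weight="balanced",
                                              solver="lbfgs", max_iter=1000),
    "SVM": SVC(C=23, gamma=0.05, kernel="rbf"),
    "Decision Tree": DecisionTreeClassifier(criterion="entropy"),
    "Random Forest": RandomForestClassifier(n_estimators=100, max_depth=30,
                                            min_samples_leaf=1,
                                            min_samples_split=5),
    "DFFNN (approx.)": MLPClassifier(hidden_layer_sizes=(45, 36, 27, 18),
                                     activation="relu", solver="adam",
                                     batch_size=40, max_iter=700),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"{name}: test accuracy = {model.score(X_test, y_test):.2f}")
```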
E. Evaluation Metrics
We used the traditional evaluation methodology for
comparing the effectiveness of the selected multi-class classification models: Accuracy, Precision, Recall, F1-Score, and
ROC AUC score.
Accuracy is the ratio of correctly predicted instances to
the total instances:

Accuracy = Number of Correct Predictions / Total Number of Predictions    (1)
Although accuracy is considered an important metric for
measuring the performance of a problem, additional measures
are required to ensure the result is not biased toward predicting
a single category of students. To ensure the reliability of the
model, all different types of grades should be predicted. The
following evaluation metrics were considered in addition to
accuracy. Precision is the ratio of true positive predictions
to the total positive predictions. It focuses on the accuracy
of positive predictions. Precision has been defined by the
following equation:
Precision = True Positives / (True Positives + False Positives)    (2)
Recall is the ratio of true positive predictions to the total
actual positive instances. It measures the ability of the model
to capture all positive instances.
Recall = True Positives / (True Positives + False Negatives)    (3)
The F1 score is the harmonic mean of precision and recall,
providing a balanced measure between the two:

F1 Score = 2 × (Precision × Recall) / (Precision + Recall)    (4)

The AUC ROC curve is a graphical representation of the model’s
ability to distinguish between positive and negative instances.
The AUC (Area Under the Curve) summarizes the ROC curve
into a single value, so the ROC AUC score is a single number
that summarizes the classifier’s performance across all possible
classification thresholds.
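The sketch below computes these five metrics with scikit-learn on the held-out split, assuming the fitted ovr_lr classifier and y_pred from the earlier sketches; weighted averaging over the five grade classes is an assumption, as the averaging mode is not stated in the paper.

```python
# Evaluation metrics of Eqs. (1)-(4) plus the one-vs-rest ROC AUC score.
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

print("Accuracy :", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred, average="weighted"))
print("Recall   :", recall_score(y_test, y_pred, average="weighted"))
print("F1 score :", f1_score(y_test, y_pred, average="weighted"))

# Multi-class ROC AUC needs per-class probabilities, not hard labels.
y_proba = ovr_lr.predict_proba(X_test)
print("ROC AUC  :", roc_auc_score(y_test, y_proba, multi_class="ovr"))
```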
F. Feature Importance
After the best model is selected, the features that contributed
to the model’s development are analyzed using a model-agnostic
approach. To make the model interpretation reliable,
SHapley Additive exPlanations (SHAP), a game-theory-based
approach, was used to explain the machine learning
model [36]. SHAP values are commonly used to find a consistent
and impartial explanation of how each feature affects the
model’s prediction. SHAP assigns an importance value to each
feature in the model, based on game theory; the magnitude of
the value shows how strong the feature’s influence is.
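A sketch of how such an analysis can be produced follows, using SHAP's model-agnostic KernelExplainer over the fitted classifier's probability output; the background sample size and plotting details are assumptions, as the paper does not state which explainer was used.

```python
# Model-agnostic SHAP analysis of the fitted classifier (illustrative setup).
import shap

background = shap.sample(X_train, 50)  # small background set keeps it tractable
explainer = shap.KernelExplainer(ovr_lr.predict_proba, background)
shap_values = explainer.shap_values(X_test)

# Bar summary plot of mean |SHAP| per feature, one colour per grade category,
# analogous to Fig. 2.
shap.summary_plot(shap_values, X_test, plot_type="bar",
                  feature_names=list(X.columns),
                  class_names=list(ovr_lr.classes_))
```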
IV. RESULTS
Table IV shows the parameters selected for each algorithm
after hyperparameter tuning with grid search.
TABLE IV
HYPERPARAMETERS UTILIZED IN THE MACHINE LEARNING MODELS

Logistic Regression: C: 120, class_weight: balanced, max_iter: 1000, multi_class: multinomial, penalty: l2, solver: lbfgs.
SVM: estimator__C: 23, estimator__gamma: 0.05, estimator__kernel: rbf.
Random Forest: max_depth: 30, min_samples_leaf: 1, min_samples_split: 5, n_estimators: 100.
Decision Tree: criterion: entropy, max_depth: None, min_samples_leaf: 1, min_samples_split: 2, splitter: best.
Deep Neural Network: batch_size: 40, epochs: 700, model__activation: relu, model__dropout_rate: 0.2, model__hidden_units: (45, 36, 27, 18), model__optimizer: adam.
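A sketch of the kind of grid search that yields Table IV's settings is shown below for logistic regression; the candidate grids are illustrative, not the exact ones searched in the study.

```python
# Exhaustive grid search with 5-fold cross-validation (illustrative grids).
from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import LogisticRegression

param_grid = {
    "C": [1, 10, 120],
    "class_weight": [None, "balanced"],
    "solver": ["lbfgs"],
    "max_iter": [1000],
}
search = GridSearchCV(LogisticRegression(), param_grid, cv=5,
                      scoring="accuracy")
search.fit(X_train, y_train)
print(search.best_params_)  # compare with the Table IV values
```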
Table V summarizes the results of applying the selected
algorithms. Logistic Regression outperformed all other models
on multiple evaluation metrics: it had the highest accuracy,
precision, recall, and F1 score, and the second-highest ROC AUC
score. The next best-performing model was the Deep Feed-Forward
Neural Network, with an accuracy of 85% compared to
90% for Logistic Regression, and a precision of 83%, which
is lower than the 89% precision of Logistic Regression and
equal to that of the Random Forest classifier. The recall
and F1 score of the DFFNN are in second position
compared to the other models presented in this study.
The AUC ROC score of the DFFNN was the highest
among all models, which means it distinguished predictions from
each of the categories well. The Random Forest algorithm scored
about 80% on all the evaluation metrics, slightly lower than
the DFFNN and lower than Logistic Regression in all
categories. The SVM and Decision Tree classifiers were the
poorest performers compared to Logistic Regression, with scores
between 70% and 79% in all categories except the AUC ROC score.
We therefore considered the Logistic Regression model the
best-performing model: although its ROC AUC score of 98%
is slightly lower than the DFFNN’s 99%, all other evaluation
metrics achieved higher values for the Logistic Regression model.
To verify the feature importance, the SHAP method
was applied to the best-performing logistic regression model. The
generated summary plot of SHAP values is shown in
Fig. 2. The x-axis of the bar graph shows the mean absolute
SHAP values, while the y-axis lists the feature contributions
from highest to lowest ranking. The feature importance
is calculated from the aggregated values of student records
from five different semesters. The legend shows the five selected
student grade categories: A+, A, B, C, and D. By examining
the colored segments, we can understand how each feature
influences the model’s predictions for each category. This
detailed visualization allows interpretation of feature importance,
shedding light on the specific role of each feature in
differentiating between grade types.
This bar graph shows that the highest contributor to the final
grade is the research score, followed by the test score and the
in-class exercise score. As these scores are direct components of
the final grade, their strong contribution to the model aligns
with expectations.
Although the research score has the highest overall impact on
the model, a more granular view shows that for students with
A+ grades, the test score was more important than the research
score. On the other hand, for students with a D grade, the
highest impact came from the research score.
The next important feature shown in the bar graph of this
student performance model is the midterm score. This feature
is not a direct component of the final grade but is an indication
of a student’s progress in the middle of the term. A satisfactory
or unsatisfactory status is assigned based on the score accumulated
from the other components up to the midterm grade submission
time. The midterm grade is usually reported between the
6th and 7th weeks of a 15-week term. This result illustrates how
the student’s performance at the halfway point of the term
contributes to predicting the final grade. The result
shows the midterm score is an important, but not the most
important, contributor.
TABLE V
RESULT OF RUNNING THE CLASSIFIERS

Model Name                       | Accuracy | Precision | Recall | F1 Score | AUC
Logistic Regression              | 90%      | 89%       | 90%    | 89%      | 98%
SVM                              | 75%      | 70%       | 75%    | 72%      | 97%
Decision Tree Classifier         | 78%      | 77%       | 78%    | 77%      | 77%
Random Forest Classifier         | 84%      | 83%       | 84%    | 82%      | 97%
Deep Feed Forward Neural Network | 85%      | 83%       | 85%    | 84%      | 99%
Fig. 2. Feature Importance from SHAP model
This could be because the first several weeks of classes deliver
fundamental concepts, while the more challenging concepts and
assignments appear in the second part of the course, as they
require an understanding of advanced concepts.
The next attribute in the student grade prediction model
is the prerequisite Test Methodology score. This is not a direct
component of the final grade. However, as this course provides
foundation-level knowledge of testing, the student’s competency
in it was measured using their grades, and it appears to affect
students’ outcomes in the AST tools course.
However, the score in the prerequisite course is not the
primary contributor, as students can still manage to do well in
the research work through group study, independent work toward
tests, and attentiveness during in-class exercises. At the same
time, this prerequisite course should not be waived, since the
SHAP model demonstrates that it is one of the important
indicators in this model. Also, not having enough background
from this prerequisite course may prevent students from achieving
the learning objectives of this course.
The next three features, reflecting student engagement, are
the number of assignment submissions, course access, and
content completed, in order of their contributions to the model.
If students do not submit all their assignments as expected, it
suggests they were not regular in class. Strong students will try
to abide by the due dates and submit their assignments
accordingly, whereas students who skip their assignments
entirely may be indicating that they are not serious about their
grades. The course access attribute captures the number of times
the student accessed the course home page. This set of student
records reflects the period when students were moved to virtual
classes. Most of the students attended the synchronous lectures,
and keeping up to date with the lectures can assist with their
overall performance in the course. After the synchronous lectures,
professors would post the video lectures and other materials
for students to review. The content-completed feature shows that
students’ completion or review of the content also helped with
their overall performance, and vice versa.
Another component used in developing this model was the
grade in the prerequisite Coding for Tests course, in which the
Java programming language was introduced. The Coding for
Tests course did impact the students’ overall performance
prediction, but except for students who received D grades,
the student’s background in the prerequisite testing methodology
course appears to carry more weight. In the AST course, through
the group research projects, students review and practice the
skills learned in the Coding for Tests course. The score on the
research projects is important, as this component is worth 45%
of the final grade; the SHAP model confirms this by showing
the research score as the strongest contributor in this course.
As the students need to write code in the automated test tools
course, they would certainly struggle without prior knowledge
of coding.
QuizCompleted is an attribute from the student engagement
category. Although this factor impacts the outcome of the
student performance prediction model, it is not one of the
major contributors. The reason could be that the majority of
students do not skip their quizzes, as each test is worth 10%
of their total score; in exceptional scenarios, they request a
rescheduling of the exam.
The last two factors affecting this model are the students’
histories of failure in the prerequisite courses of test methodology
and coding for tests, respectively. As students cannot take the
automated testing course without passing these two prerequisites,
previous failure history in these courses does not significantly
impact this student performance model. The main factors are
whether the students work hard on their group projects, study
for their tests, and participate effectively in the in-class exercises,
followed by their background in the testing methodology course
and their engagement with the course material.
V. ANALYSIS AND DISCUSSION
The performance of the student performance prediction
model is comparable to that of previous studies. For instance,
the best-performing model developed by Burman et al. [16]
had an accuracy of 91%, whereas our implementation using
the logistic regression model showed an accuracy of 90%.
However, other evaluation criteria such as precision, recall,
F1-score, and ROC AUC score were not measured in Burman
et al.’s study. Since the result contains multiple grade categories,
we wanted to ensure that all grade categories were properly
represented through evaluation criteria in addition to accuracy.
The precision, recall, F1-score, and ROC AUC score of the
developed model are 89%, 90%, 89%, and 98%, respectively.
These high scores give confidence in the developed model.
Bujang et al. [18] achieved an F1 score of 99.5%, compared
to 89% for our best-performing model. In their study, an
oversampling strategy was used to generate more data for each
grade category; without applying oversampling techniques, we
were able to achieve close to the 90% range for the F1 measure.
Compared to other studies, this study was also able to show how
prerequisite courses and course engagement can impact students’
academic performance.
The SHAP explanation of the obtained model, by ranking the
research projects highest, shows that students can learn
effectively when they work in a group setting. Independent study
to gain theoretical background is also important, as it can lead to
doing well in the research work. Student engagement through
in-class activities is important for students’ success in their
courses. The practical components applied during in-class
exercises use open-source tools, in addition to in-house
applications, when validating the selected software under test.
This combined approach of theory and hands-on experience
develops effective QA professionals who are ready for the job
market.
VI. THREATS TO VALIDITY
This work was based on data accumulated over five semesters
during the pandemic period. More data, including data from
the post-pandemic period, may help us generalize the findings.
In addition, depending on the program and student enrollment
requirements, prerequisite requirements can vary. This study was
based on a postgraduate certificate program where students
usually have an IT background, with a diploma in the business
or IT domain, with or without work experience. The findings
may vary slightly for students taking software testing courses
without any Computer Science background.
VII. CONCLUSION
This study investigated the performance of a student
performance prediction model for the software automated test
tools course in a software testing post-graduate certificate
program. Student performance prediction is an emerging
research area, but it had not previously been applied to assessing
academic performance in the software quality assurance domain.
Software testing students show different personality traits and
often have less motivation toward a software testing career
compared to other software engineering students.
Five different machine learning algorithms were attempted,
and logistic regression performed best out of these models,
with scores of approximately 90% in accuracy, precision,
recall, and F1 score, and a ROC AUC score of 98%. This study
found that a student’s regular submission of deliverables,
engagement in class, and background in testing foundations and
programming concepts, along with dedication to group and
independent work and a hybrid knowledge of theoretical and
practical concepts for developing effective software testing
solutions, help with performance prediction.
This study can be extended by investigating whether other
prerequisite courses should be added to this course. This could
be achieved by assessing the grades of other level-one courses
in the program. A few other level-one courses cover a curriculum
of soft skills such as communication, academic integrity, and
applied project management. Our future software testing
professionals will require both soft skills and technical skills to
be ready for the real world.
Finally, this study developed an effective student performance
model for the software automated test tools course, with the
capability of explaining the important factors for predicting
student performance.
ACKNOWLEDGMENT
The authors would like to acknowledge Dr. Dev Sainani,
Associate Dean of the School of Information Technology at
Fanshawe College, for supporting this research work. The authors
also extend their thanks to Learning Systems Services, Robert R.
Downie, the Institutional Research Department, and the REB
board of Fanshawe College for assisting with the data collection
process.
REFERENCES
[1] R. d. S. Santos, L. F. Capretz, C. V. C. de Magalhães, and R. Souza, “Myths and Facts about a Career in Software Testing: The Perspectives of Students and Practitioners,” in 2023 IEEE 35th International Conference on Software Engineering Education and Training (CSEE&T), Aug. 2023, pp. 120–120, ISSN: 2377-570X. [Online]. Available: https://ieeexplore.ieee.org/document/10229341
[2] L. F. Capretz, P. Waychal, J. Jia, D. Varona, and Y. Lizama, “Studies on the Software Testing Profession,” in 2019 IEEE/ACM 41st International Conference on Software Engineering: Companion Proceedings (ICSE-Companion), May 2019, pp. 262–263, ISSN: 2574-1934. [Online]. Available: https://ieeexplore.ieee.org/document/8802688
[3] S. M. Melo, V. X. S. Moreira, L. N. Paschoal, and S. R. S.
Souza, “Testing Education: A Survey on a Global Scale,” in
Proceedings of the XXXIV Brazilian Symposium on Software
Engineering, ser. SBES ’20. New York, NY, USA: Association
for Computing Machinery, Dec. 2020, pp. 554–563. [Online].
Available: https://dl.acm.org/doi/10.1145/3422392.3422483
[4] D. Oguz and K. Oguz, “Perspectives on the gap between the software
industry and the software engineering education,” IEEE Access, vol. 7,
pp. 117 527–117 543, 2019.
[5] M. Aniche, F. Hermans, and A. van Deursen, “Pragmatic software
testing education,” in Proceedings of the 50th ACM Technical
Symposium on Computer Science Education, ser. SIGCSE ’19. New
York, NY, USA: Association for Computing Machinery, 2019, p. 414–
420. [Online]. Available: https://doi.org/10.1145/3287324.3287461
[6] V. Garousi and A. Mathur, “Current State of the Software
Testing Education in North American Academia and Some
Recommendations for the New Educators,” in 2010 23rd IEEE
Conference on Software Engineering Education and Training,
Mar. 2010, pp. 89–96, ISSN: 2377-570X. [Online]. Available:
https://ieeexplore.ieee.org/abstract/document/5463575
[7] A. A. Barrett, E. Paul Enoiu, and W. Afzal, “On the
Current State of Academic Software Testing Education in
Sweden,” in 2023 IEEE International Conference on Software
Testing, Verification and Validation Workshops (ICSTW), Apr.
2023, pp. 397–404, ISSN: 2159-4848. [Online]. Available:
https://ieeexplore.ieee.org/document/10132264
[8] D. Towey and T. Y. Chen, “Teaching software testing
skills: Metamorphic testing as vehicle for creativity and
effectiveness in software testing,” in 2015 IEEE International
Conference on Teaching, Assessment, and Learning for
Engineering (TALE), Dec. 2015, pp. 161–162. [Online]. Available:
https://ieeexplore.ieee.org/abstract/document/7386036
[9] G. Fraser, A. Gambi, M. Kreis, and J. M. Rojas, “Gamifying a software
testing course with code defenders,” in Proceedings of the 50th ACM
Technical Symposium on Computer Science Education, 2019, pp. 571–
577.
[10] B. S. Clegg, J. M. Rojas, and G. Fraser, “Teaching
Software Testing Concepts Using a Mutation Testing Game,”
in 2017 IEEE/ACM 39th International Conference on Software
Engineering: Software Engineering Education and Training
Track (ICSE-SEET), May 2017, pp. 33–36. [Online]. Available:
https://ieeexplore.ieee.org/abstract/document/7964327
[11] J. Andrews, “Killer app: A eurogame about software quality,” in 2013
26th International Conference on Software Engineering Education and
Training (CSEE&T), May 2013, pp. 319–323.
[12] L. Deng, J. Dehlinger, and S. Chakraborty, “Teaching Software
Testing with Free and Open Source Software,” in 2020 IEEE
International Conference on Software Testing, Verification and
Validation Workshops (ICSTW), Oct. 2020, pp. 412–418. [Online].
Available: https://ieeexplore.ieee.org/document/9155837
[13] SeleniumHQ, “Selenium webdriver: From foundations to framework,”
https://www.selenium.dev/documentation/en/webdriver/, Open Source
Community, 2022.
[14] I. S. Elgrably and S. Ronaldo Bezerra Oliveira, “Model for
teaching and training software testing in an agile context,”
in 2020 IEEE Frontiers in Education Conference (FIE),
Oct. 2020, pp. 1–9, ISSN: 2377-634X. [Online]. Available:
https://ieeexplore.ieee.org/abstract/document/9274117
[15] A. Moubayed, M. Injadat, A. Shami, and H. Lutfiyya, “Relationship Between Student Engagement and Performance in E-Learning Environment Using Association Rules,” in 2018 IEEE World Engineering Education Conference (EDUNINE), Mar. 2018, pp. 1–6. [Online]. Available: https://ieeexplore.ieee.org/abstract/document/8451005
[16] I. Burman and S. Som, “Predicting students academic performance using support vector machine,” in 2019 Amity International Conference on Artificial Intelligence (AICAI). IEEE, 2019, pp. 756–759.
[17] E. S. Bhutto, I. F. Siddiqui, Q. A. Arain, and M. Anwar, “Predicting students’ academic performance through supervised machine learning,” in 2020 International Conference on Information Science and Communication Technology (ICISCT), 2020, pp. 1–6.
[18] S. D. A. Bujang, A. Selamat, R. Ibrahim, O. Krejcar, E. Herrera-Viedma, H. Fujita, and N. A. M. Ghani, “Multiclass Prediction Model for Student Grade Prediction Using Machine Learning,” IEEE Access, vol. 9, pp. 95608–95621, 2021. [Online]. Available: https://ieeexplore.ieee.org/document/9468629
[19] M. Khan, S. Naz, Y. Khan, M. Zafar, M. Khan, and G. Pau, “Utilizing machine learning models to predict student performance from LMS activity logs,” IEEE Access, vol. 11, pp. 86953–86962, 2023.
[20] Y. Shi, F. Sun, H. Zuo, and F. Peng, “Analysis of learning behavior characteristics and prediction of learning effect for improving college students’ information literacy based on machine learning,” IEEE Access, vol. 11, pp. 50447–50461, 2023.
[21] S. Jayasundara, A. Indika, and D. Herath, “Interpretable Student Performance Prediction Using Explainable Boosting Machine for Multi-Class Classification,” in 2022 2nd International Conference on Advanced Research in Computing (ICARC), Feb. 2022, pp. 391–396. [Online]. Available: https://ieeexplore.ieee.org/abstract/document/9753867
[22] Selenium, “Selenium IDE,” https://www.selenium.dev/selenium-ide/, Year.
[23] JUnit Team, JUnit 5 User Guide, JUnit Contributors, 2022. [Online]. Available: https://junit.org/junit5/docs/current/user-guide/
[24] C. Beust and A. Popescu, TestNG: The Testing Framework for Java, TestNG Team, 2022. [Online]. Available: https://testng.org/doc/
[25] Jenkins Community, Jenkins Documentation, Jenkins Project, 2022. [Online]. Available: https://www.jenkins.io/doc/
[26] Katalon LLC, “Katalon Recorder,” https://www.katalon.com/, Year.
[27] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay, “Scikit-learn: Machine learning in Python,” Journal of Machine Learning Research, vol. 12, pp. 2825–2830, 2011.
[28] R. Alamri and B. Alharbi, “Explainable Student Performance Prediction Models: A Systematic Review,” IEEE Access, vol. 9, pp. 33132–33143, 2021. [Online]. Available: https://ieeexplore.ieee.org/abstract/document/9360749
[29] D. W. Hosmer Jr., S. Lemeshow, and R. X. Sturdivant, Applied Logistic Regression. John Wiley & Sons, 2013, vol. 398.
[30] M. Hearst, S. Dumais, E. Osuna, J. Platt, and B. Scholkopf, “Support vector machines,” IEEE Intelligent Systems and their Applications, vol. 13, no. 4, pp. 18–28, 1998.
[31] S. Menard, Applied Logistic Regression Analysis. Sage, 2002, no. 106.
[32] S. Suthaharan, “Support vector machine,” in Machine Learning Models and Algorithms for Big Data Classification: Thinking with Examples for Effective Learning. Springer, 2016, pp. 207–235.
[33] B. Charbuty and A. Abdulazeez, “Classification based on decision tree algorithm for machine learning,” Journal of Applied Science and Technology Trends, vol. 2, no. 1, pp. 20–28, 2021.
[34] L. Breiman, “Random forests,” Machine Learning, vol. 45, pp. 5–32, 2001.
[35] G. Bebis and M. Georgiopoulos, “Feed-forward neural networks,” IEEE Potentials, vol. 13, no. 4, pp. 27–31, 1994.
[36] S. M. Lundberg and S.-I. Lee, “A unified approach to interpreting model predictions,” in Advances in Neural Information Processing Systems 30, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, Eds. Curran Associates, Inc., 2017, pp. 4765–4774. [Online]. Available: http://papers.nips.cc/paper/7062-a-unified-approach-to-interpreting-model-predictions.pdf