DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
DATA MINING (22CS63)
PROJECT REPORT
on
CAR PRICE PREDICTION
Submitted in partial fulfillment of the requirement for the award of Degree of
Bachelor of Engineering
in
Computer Science and Engineering
Submitted by:
HARSHITH KOLLURU 1NT22CS073
GAGAN S KUNKANAD 1NT22CS067
Under the Guidance of
Dr. Vijaya Shetty S
Professor, Dept. of CS&E, NMIT
Department of Computer Science and Engineering
(Accredited by NBA Tier-1)
2025-2026
Table of Contents
Abstract
1. Introduction
1.1 Motivation
1.2 Problem Domain
1.3 Aim and Objectives
2. Data Source and Data Quality
2.1 Dataset Used
2.2 Data Preprocessing
3. Methods & Models
3.1 Data Mining Questions
3.2 Data Mining Algorithms
3.3 Data Mining Models
4. Model Evaluation & Discussion
5. Conclusion & Future Direction
6. Reflection Portfolio
References
Appendices
a. Link to the dataset chosen
b. Python Codes Implemented
c. Setup to execute the code
Table of Figures
Fig 3.1 Stacked Model being used
Fig 4.1 Top 10 Features of Dataset identified
Fig 4.2 Distribution of Selling Price with Frequency
Fig 4.3 Predicted vs Actual Selling Price Plot
Fig 4.4 Residual Plot
NITTE MEENAKSHI INSTITUTE OF TECHNOLOGY
(AN AUTONOMOUS INSTITUTION, AFFILIATED TO VISVESVARAYA TECHNOLOGICAL UNIVERSITY, BELAGAVI, APPROVED BY AICTE & GOVT. OF KARNATAKA)
CERTIFICATE
This is to certify that the project entitled Car Price Prediction is an authentic work carried out by
HARSHITH KOLLURU (1NT22CS073) and GAGAN S KUNKANAD (1NT22CS067), bonafide students of
Nitte Meenakshi Institute of Technology, Bangalore, in partial fulfillment of the requirements for the
award of the degree of Bachelor of Engineering in COMPUTER SCIENCE AND ENGINEERING of
Visvesvaraya Technological University, Belagavi, during the academic year 2025-2026. It is
certified that all corrections and suggestions indicated during the internal assessment have been
incorporated in the report. This project has been approved as it satisfies the academic
requirements in respect of the project work presented for the said degree.
Internal Guide: Dr. Vijaya Shetty S, Professor, Dept. of CSE, NMIT, Bangalore
Signature of the HOD: Dr. S Meenakshi Sundaram, Professor & Head, Dept. of CSE, NMIT, Bangalore
Signature of the Principal: Dr. H. C. Nagaraj, Principal, NMIT, Bangalore
DECLARATION
We hereby declare that, for this learning activity project work:
(i) The project work is our original work
(ii) This Project work has not been submitted for the award of any degree or examination at any
other university/College/Institute.
(iii) This Project Work does not contain other persons’ data, pictures, graphs or other information,
unless specifically acknowledged as being sourced from other persons.
(iv) This Project Work does not contain other persons’ writing, unless specifically acknowledged
as being sourced from other researchers. Where other written sources have been quoted, then:
a) their words have been re-written but the general information attributed to them has been
referenced;
b) where their exact words have been used, their writing has been placed inside quotation
marks, and referenced.
(v) This Project Work does not contain text, graphics or tables copied and pasted from the
Internet, unless specifically acknowledged, and the source being detailed in the thesis and in
the References sections.
NAME USN Signature
HARSHITH KOLLURU 1NT22CS073
GAGAN S KUNKANAD 1NT22CS067
Date:
ACKNOWLEDGEMENT
The satisfaction and euphoria that accompany the successful completion of any task would be
incomplete without the mention of the people who made it possible, whose constant guidance
and encouragement crowned our effort with success. We express our sincere gratitude to our
Principal, Dr. H. C. Nagaraj, Nitte Meenakshi Institute of Technology, for providing the necessary facilities.
We wish to thank our HoD, Dr. S Meenakshi Sundaram for the excellent environment created
to further educational growth in our college. We also thank him for the invaluable guidance
provided which has helped in the creation of a better project.
We would also like to thank Dr. Vijaya Shetty S, Professor, Department of Computer Science &
Engineering, for her periodic inspection and timely evaluation of the project, and for her help in
bringing the project to its present form.
Thanks to our Departmental Project coordinators. We also thank all our friends, teaching and
non-teaching staff at NMIT, Bangalore, for all the direct and indirect help provided in the
completion of the project.
NAME USN Signature
HARSHITH KOLLURU 1NT22CS073
GAGAN S KUNKANAD 1NT22CS067
Date:
ABSTRACT
This report presents a comprehensive study on the application of machine learning techniques for
predicting the selling price of used cars based on historical sales data. The project encompasses
the development of a robust data pipeline, including data collection, preprocessing, exploratory
data analysis, feature engineering, model selection, and evaluation. Key preprocessing steps
involved handling missing values, encoding categorical variables, and normalizing numerical
features to ensure data integrity and model effectiveness.
A variety of regression algorithms were explored, with particular emphasis on ensemble learning
methods. The final predictive model utilizes a stacking regressor that integrates both linear and
non-linear base models, thereby leveraging their complementary strengths. Extensive
hyperparameter tuning and cross-validation were performed to optimize model performance and
mitigate overfitting.
The proposed solution achieved an R² score of 0.97 on unseen test data, indicating a high level of
predictive accuracy. The results demonstrate that machine learning-driven approaches can
significantly enhance the transparency, efficiency, and reliability of used car price estimation.
The project concludes with the deployment of a user-oriented application, underscoring the
practical value and real-world applicability of the developed system in the automotive market.
1. Introduction
1.1 Motivation
In today’s digital economy, transparency and data-driven decision-making are more important
than ever, especially in industries like automotive resale where pricing can be highly subjective.
The process of buying or selling a used car is often fraught with uncertainty, as both buyers and
sellers struggle to determine a fair and accurate price. Traditional valuation methods frequently
rely on personal judgment or limited market data, leading to inconsistencies and potential
mistrust between parties.
Recognizing these challenges, we were motivated to explore how data science and machine
learning could introduce greater consistency and fairness into the used car market. With the
increasing availability of comprehensive historical sales data, there is a significant opportunity to
apply advanced analytics to predict car prices more accurately. Our goal was to develop a
predictive model that leverages these data resources to provide reliable price estimates, thereby
streamlining transactions and improving confidence for all stakeholders.
By addressing this real-world problem, our project aims to demonstrate the transformative
potential of machine learning in creating transparent, efficient, and equitable solutions within the
automotive resale industry.
1.2 Problem Domain
Used car pricing is shaped by a wide range of factors, including the vehicle’s age, fuel type,
mileage, brand reputation, and ownership history. Additional elements such as service records,
accident history, and prevailing market trends can further complicate the valuation process. In
the absence of standardized pricing practices, these variables are often assessed inconsistently,
which can lead to unfair advantages or disadvantages for both buyers and sellers. Such
inconsistencies not only create confusion but also undermine trust and transparency in the used
car market.
This project seeks to address these challenges by introducing a data-driven, standardized
approach to used car pricing using machine learning techniques. By analyzing historical sales
data and identifying key patterns among the various influencing factors, our goal is to develop a
predictive model that delivers objective and accurate price estimates. Through this approach, we
aim to foster greater transparency and fairness, ultimately contributing to a more trustworthy and
efficient secondary automobile market.
1.3 Aim and Objectives
To analyze a real-world car dataset and extract meaningful insights.
To process and transform data into a form suitable for modeling.
To apply and compare different regression algorithms.
To build an ensemble model that combines the strengths of multiple base models.
To visualize results for effective communication and understanding.
To understand the full life-cycle of a machine learning pipeline.
2. Data Source and Data Quality
2.1 Dataset Used
The dataset used in this study was obtained from Kaggle, titled "Vehicle dataset from CarDekho".
It comprises 301 records with 9 attributes. These include both numerical features (e.g.,
Present_Price, Kms_Driven) and categorical features (e.g., Fuel_Type, Transmission).
2.2 Data Preprocessing
Preprocessing played a vital role in the quality and success of the model:
Feature Engineering: a Car_Age feature was derived from the year of manufacture, and Kms_Driven was log-transformed to reduce skew.
Categorical Encoding: categorical attributes such as Fuel_Type and Transmission were converted into numeric form.
Feature Scaling & Polynomial Features: numerical features were scaled, and polynomial feature expansion was applied to capture non-linear relationships.
These steps significantly improved model performance and made the data more suitable for
learning.
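The steps above can be sketched as follows. The column names follow the CarDekho dataset described in Section 2.1, but the small DataFrame here is an illustrative stand-in for the real data, not the project's actual preprocessing script:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Illustrative sample standing in for the CarDekho data.
df = pd.DataFrame({
    "Year": [2014, 2017, 2011],
    "Present_Price": [5.59, 9.85, 4.15],
    "Kms_Driven": [27000, 6900, 52000],
    "Fuel_Type": ["Petrol", "Diesel", "Petrol"],
    "Transmission": ["Manual", "Manual", "Automatic"],
})

# Feature engineering: derive Car_Age and log-transform Kms_Driven.
df["Car_Age"] = 2025 - df["Year"]
df["Kms_Driven_log"] = np.log1p(df["Kms_Driven"])
df = df.drop(columns=["Year", "Kms_Driven"])

# Categorical encoding: one-hot encode Fuel_Type and Transmission.
df = pd.get_dummies(df, columns=["Fuel_Type", "Transmission"], drop_first=True)

# Feature scaling, then degree-2 polynomial expansion.
X_scaled = StandardScaler().fit_transform(df)
X_poly = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X_scaled)

print(X_poly.shape)  # 5 base columns expand to 20 polynomial features
```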
3. Methods & Models
3.1 Data Mining Questions
What car features most influence the selling price?
Can polynomial feature transformation improve model performance?
Which regression model gives the best generalization on unseen data?
3.2 Data Mining Algorithms
We evaluated multiple regression algorithms:
Linear Regression (for baseline performance)
Lasso Regression (to penalize less useful features)
Decision Tree Regressor (captures non-linearity)
Gradient Boosting Regressor (robust ensemble method)
Stacking Regressor (combines multiple models for superior performance)
3.3 Data Mining Models
This code implements a stacking ensemble regressor using four different regression models as
base learners and Linear Regression as the meta-model. Here’s a breakdown of the models and
key parameters used:
Base Models:
Linear Regression: A standard regression model that fits a linear relationship between
features and the target variable.
Lasso Regression (alpha=0.1): A linear model with L1 regularization, which helps in feature
selection by penalizing the absolute values of coefficients. The parameter alpha=0.1 controls
the strength of the regularization, with higher values leading to more regularization.
Decision Tree Regressor (max_depth=5): A tree-based model that splits data into branches
to predict continuous outcomes. The parameter max_depth=5 limits the depth of the tree to
prevent overfitting by restricting how many times the tree can split.
Gradient Boosting Regressor (n_estimators=150, learning_rate=0.1, max_depth=3): An
ensemble model that builds trees sequentially to correct errors of previous
trees. n_estimators=150 sets the number of boosting stages, learning_rate=0.1 controls the
contribution of each tree, and max_depth=3 restricts the depth of individual trees to prevent
overfitting.
Meta-Model:
Linear Regression: Used as the final estimator to combine the predictions of the base models,
learning how to best weight their outputs for improved accuracy.
Stacking Regressor:
Combines the predictions of all base models using the meta-model for a more robust and
accurate prediction. The model is trained on the training data with stacking_model.fit(X_train,
y_train).
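A minimal sketch of this stacking setup, using exactly the base-model parameters stated above. The data here is synthetic for self-containment; the real project trains on the preprocessed CarDekho features:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, StackingRegressor
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data standing in for the car features.
X, y = make_regression(n_samples=200, n_features=5, noise=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Four base learners with the parameters described in the text.
base_models = [
    ("lr", LinearRegression()),
    ("lasso", Lasso(alpha=0.1)),
    ("dt", DecisionTreeRegressor(max_depth=5)),
    ("gbr", GradientBoostingRegressor(n_estimators=150, learning_rate=0.1, max_depth=3)),
]

# Linear Regression as the meta-model that weights the base predictions.
stacking_model = StackingRegressor(estimators=base_models,
                                   final_estimator=LinearRegression())
stacking_model.fit(X_train, y_train)
print(round(stacking_model.score(X_test, y_test), 3))
```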
Fig 3.1: Stacked Model being used
4. Model Evaluation & Discussion
Feature Importance Visualization
To gain insights into which features most significantly influenced the predictions of the
Gradient Boosting Regressor, we conducted a feature importance analysis after applying
polynomial feature expansion. The following steps outline the process:
Extraction of Feature Importance:
The attribute feature_importances_ of the trained Gradient Boosting Regressor (gbr_model)
was used to obtain the relative importance of each input feature. These importance scores
reflect the contribution of each feature to the model’s predictive performance.
Retrieval of Feature Names:
After polynomial transformation, the feature space includes both original and newly
generated polynomial features. The method poly.get_feature_names_out(X.columns) was
employed to retrieve the names of all features present in the transformed dataset.
Construction of the Importance DataFrame:
A pandas DataFrame was created to pair each feature name with its corresponding
importance score, facilitating easier analysis and visualization.
Ranking and Selection of Top Features:
The DataFrame was sorted in descending order based on the importance scores. The top 10
most influential features were then selected to highlight those with the greatest impact on the
model’s predictions.
This analysis not only identifies the most critical factors (including polynomial feature
combinations) affecting the model’s output but also enhances the interpretability and
transparency of the predictive process. The results can guide further feature engineering and
inform stakeholders about the key drivers of used car prices in the dataset.
Fig 4.1: Top 10 Features of Dataset identified
Target Distribution Visualization
Fig 4.2: Distribution of Selling Price with Frequency
A histogram of the Selling_Price column was plotted using Seaborn’s histplot function with 30
bins and an orange color scheme. The kde=True parameter adds a smooth density curve to the
plot. This visualization helps reveal the distribution, central tendency, and spread of selling
prices in the dataset, providing useful insights for further analysis and modeling.
Training Set Performance
The model demonstrates excellent training performance, achieving a very high R² score of 0.98,
which indicates it explains 98% of the variance in selling prices. The low MAE (0.46) and
RMSE (0.67) values further suggest highly accurate predictions with minimal average error on
the training data.
Test Set Performance
The model maintains strong performance on the test data, with an R² score of 0.97, indicating it
captures 97% of the variance in selling prices. The low MAE (0.43) and RMSE (0.71) values
confirm that the model provides accurate and reliable predictions on unseen data, demonstrating
good generalization.
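The metrics quoted above (MAE, RMSE, R²) are computed with scikit-learn as sketched below. The arrays here are illustrative placeholders, not the project's actual predictions:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Placeholder actual and predicted selling prices (in lakhs).
y_test = np.array([3.35, 7.25, 2.85, 4.60, 9.10])
y_pred = np.array([3.10, 7.50, 2.95, 4.40, 8.80])

mae = mean_absolute_error(y_test, y_pred)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))  # root of MSE
r2 = r2_score(y_test, y_pred)

print(f"MAE={mae:.2f}  RMSE={rmse:.2f}  R2={r2:.2f}")
```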
Regression Fit Visualization
Fig 4.3: Predicted vs Actual Selling Price Plot
Insights from Predicted vs Actual Selling Price Plot
The scatter plot compares predicted selling prices against actual selling prices for the test data.
Most points closely follow the red diagonal line, indicating strong agreement between predicted
and actual values. The tight clustering around the line and the narrow confidence band suggest
high predictive accuracy and minimal bias. Overall, the model demonstrates excellent
performance in estimating used car prices, with only minor deviations for a few higher-priced
cars.
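A plot of this kind can be sketched as follows, with synthetic values standing in for the model's test-set predictions; the red diagonal marks perfect agreement:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for script use
import matplotlib.pyplot as plt
import numpy as np

# Synthetic actual prices with small, unbiased prediction errors.
rng = np.random.default_rng(1)
y_test = rng.uniform(0.5, 10, 60)
y_pred = y_test + rng.normal(0, 0.4, 60)

fig, ax = plt.subplots()
ax.scatter(y_test, y_pred, alpha=0.7)
ax.plot([0, 11], [0, 11], color="red")  # perfect-prediction diagonal
ax.set_xlabel("Actual Selling Price")
ax.set_ylabel("Predicted Selling Price")
fig.savefig("pred_vs_actual.png")
```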
Residual Analysis
Fig 4.4: Residual Plot
Insights from Residual Plot
The residual plot displays the differences between actual and predicted selling prices against the
predicted values. Most residuals are scattered closely around the zero line, indicating that the
model’s predictions are generally unbiased and errors are randomly distributed. There are a few
outliers, but no clear pattern or systematic deviation is observed, suggesting that the model
captures the underlying relationships well and does not suffer from major issues like
heteroscedasticity or non-linearity. The results demonstrate minimal overfitting. The residuals
being centered around zero confirms a low prediction bias. These findings validate the
robustness of our pipeline.
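A residual plot of this kind can be sketched as follows; the residuals here are synthetic, drawn to be centered at zero like the well-behaved errors described above:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for script use
import matplotlib.pyplot as plt
import numpy as np

# Synthetic predicted prices and zero-centered residuals (actual - predicted).
rng = np.random.default_rng(2)
y_pred = rng.uniform(0.5, 10, 60)
residuals = rng.normal(0, 0.4, 60)

fig, ax = plt.subplots()
ax.scatter(y_pred, residuals, alpha=0.7)
ax.axhline(0, color="red", linestyle="--")  # zero-error reference line
ax.set_xlabel("Predicted Selling Price")
ax.set_ylabel("Residual (Actual - Predicted)")
fig.savefig("residual_plot.png")
```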
5. Conclusion & Future Direction
Conclusion
Working on this project has been both a technically enriching and intellectually fulfilling
experience. Our primary objective was to develop a robust machine learning model that could
accurately predict the selling prices of used cars based on multiple features. We successfully
built a sophisticated stacking ensemble model that integrates linear regression, lasso regression,
decision trees, and gradient boosting—culminating in a strong, generalizable predictor.
Through systematic preprocessing, feature engineering (like the creation of Car_Age and the
log transformation of Kms_Driven), and the application of polynomial features to capture non-
linearity, we were able to transform raw tabular data into an optimized format for learning. The
stacking model's performance—R² of 0.9664 on test data—demonstrates high accuracy and low
generalization error.
Key Learnings:
Value of Ensemble Methods:
We learned firsthand that combining multiple models through ensemble techniques, like
stacking, can significantly boost performance compared to relying on a single algorithm. This
approach allowed us to leverage the unique strengths of different models and achieve more
reliable results.
Importance of Feature Engineering:
One of our biggest takeaways was the critical role of feature engineering. Creating new features,
selecting the most relevant variables, and properly scaling the data had a direct and noticeable
impact on the model’s accuracy. This process taught us how thoughtful data preparation can
make or break a machine learning project.
Power of Visualization:
Visualizing the data and model results helped us understand not just the numbers, but also the
story behind them. Tools like scatter plots and residual plots were essential for diagnosing
issues, interpreting results, and communicating our findings clearly.
Collaboration and Problem-Solving:
Throughout the project, we worked closely as a team, sharing ideas and troubleshooting
challenges together. This collaborative environment helped us develop our communication
skills and learn from each other’s perspectives.
Future Work
This project has inspired us to think about how we can take our work further:
Web Deployment:
We are excited about the prospect of deploying our model as a web application, making it
accessible to anyone who wants a data-driven estimate for their used car.
Expanding the Feature Set:
In the future, we hope to include more detailed features, such as specific car models, brands,
and geographic locations, to make our predictions even more accurate and relevant.
Model Interpretability:
We also recognize the importance of making our model’s decisions understandable. Exploring
interpretability tools like SHAP or LIME will help us explain our predictions and build trust
with users.
6. Reflection Portfolio
This project provided a comprehensive, hands-on opportunity to bridge academic concepts with
real-world machine learning applications. Beyond technical outcomes, it fostered critical
professional competencies essential for aspiring data scientists. Below, we summarize our key
insights and growth areas:
1. Data Understanding and Preparation
Working with raw, unstructured data underscored the importance of meticulous data exploration
and cleaning. We developed strategies to address missing values, outliers, and inconsistencies:
skills crucial for transforming imperfect real-world datasets into reliable modeling inputs.
2. Preprocessing and Feature Engineering
Through iterative experimentation, we recognized how preprocessing choices (e.g., log
transformations, polynomial feature expansion) directly influence model performance. Feature
engineering emerged as both an art and a science, requiring domain intuition and empirical
validation.
3. Model Development and Ensemble Learning
By implementing and comparing diverse algorithms, from linear regression to gradient boosting,
we deepened our understanding of their theoretical assumptions and practical trade-offs. The
stacking ensemble highlighted the power of combining models to balance bias, variance, and
interpretability.
4. Evaluation and Communication
We refined our ability to critically assess model performance using metrics like MAE, RMSE,
and R². Visualization tools (e.g., residual plots, regression diagnostics) became indispensable for
diagnosing errors and communicating results to stakeholders.
5. Collaboration and Project Management
Navigating team workflows, version control, and task delegation mirrored real-world data
science environments. These experiences emphasized the importance of clear communication,
adaptability, and iterative problem-solving in collaborative projects.
Broader Implications
This project demonstrated how machine learning can address tangible challenges in industries
like automotive resale, where transparency and fairness are paramount. By delivering a robust,
data-driven pricing framework, we showcased the potential of predictive analytics to enhance
market efficiency and stakeholder trust.
Preparedness for Future Challenges
The technical and soft skills developed through this work, from coding proficiency to critical
thinking, have equipped us to tackle complex data problems across domains. We are now better
positioned to contribute meaningfully to future projects, whether in academic research, industry
applications, or entrepreneurial ventures.
References
[1] Scikit-learn Developers, "Scikit-learn: Machine Learning in Python," [Online].
Available: https://scikit-learn.org/. [Accessed: May 18, 2025].
[2] N. Birla, "Vehicle Dataset from Cardekho," Kaggle, [Online].
Available: https://www.kaggle.com/datasets/nehalbirla/vehicle-dataset-from-cardekho.
[Accessed: May 18, 2025].
[3] Python Software Foundation, "Python 3 Documentation," [Online].
Available: https://docs.python.org/3/. [Accessed: May 18, 2025].
[4] S. S. Patil and S. S. Patil, "Used Car Price Prediction System," International Journal of
Scientific Research in Science and Technology, vol. 11, no. 3, pp. 108–113, 2024. [Online].
Available: https://www.ijsrst.com/index.php/home/article/view/IJSRST24113108
[5] B. N. Bala, "Price Prediction for Used Cars (Data Science Project)," GitHub, [Online].
Available: https://github.com/bala-1409/Price-Prediction-for-Used-Cars-Datascience-Project.
[Accessed: May 18, 2025].
[6] S. Sharma and A. Kumar, "Comparative Analysis of Machine Learning Algorithms for
Used Car Price Prediction," International Journal of Current Science Research and Review, vol.
7, no. 2, pp. 123–130, 2024. [Online]. Available: https://ijcsrr.org/comparative-analysis-of-
machine-learning-algorithms-for-used-car-price-prediction/
[7] M. A. Rahman, "Used Car Price Prediction and Valuation using Data Mining
Techniques," RIT Scholar Works, 2019. [Online].
Available: https://repository.rit.edu/cgi/viewcontent.cgi?article=12220&context=theses
Appendices
a. Link to Dataset:
https://www.kaggle.com/datasets/nehalbirla/vehicle-dataset-from-cardekho
b. Python Codes Implemented:
Available in below GitHub repository:
https://github.com/GMLDEV/DATA_MINING.git
c. Setup to Execute the Code:
Python 3.10+
Google Colab
Required Libraries:
pandas
numpy
matplotlib
seaborn
scikit-learn