Used Car Price Prediction: B.E. (CSE) VI Semester Case Study

This document summarizes a case study report on used car price prediction. The report describes building a machine learning model to accurately predict used car prices based on vehicle features. It introduces random forest regression to analyze a dataset from an online car marketplace. The model aims to help customers and dealers determine fair market prices. The report outlines the purpose, scope, methodology, and background of the project, which uses Python libraries for data analysis and model training. The case study evaluates machine learning techniques for price prediction to benefit the growing used car industry.

A Case Study Report on

USED CAR PRICE PREDICTION

Submitted in partial fulfilment for the requirements of

B.E. (CSE) VI Semester Case Study

in

COMPUTER SCIENCE AND ENGINEERING

by

K. Venkata Ajay Kumar (160118733119)

D. Raviteja (160118733308)

Department of Computer Science and Engineering,
Chaitanya Bharathi Institute of Technology (Autonomous)
(Affiliated to Osmania University, Hyderabad)
Hyderabad, TELANGANA (INDIA) – 500 075

CERTIFICATE
This is to certify that the project entitled Used Car Price Prediction, submitted to the Computer Science and
Engineering Department, Chaitanya Bharathi Institute of Technology, in partial fulfilment of the requirements
for the course Mini Project, is a bona fide record of work done by Venkata Ajay Kumar Kadiyala (1601-18-733-119)
and Domakonda RaviTeja (1601-18-733-308) from August 2020 to November 2020 under our
guidance and supervision.

Mentor: Mr. R. Srikanth, Assistant Professor, Department of CSE, CBIT
Supervisor: Dr. Kolla Morarjee, Assistant Professor, Department of CSE, CBIT
Head of the Department: Dr. Y. Rama Devi, Department of CSE, CBIT

ACKNOWLEDGEMENTS

We would like to express our heartfelt gratitude to our mentor, Mr. R. Srikanth (Assistant Professor),
for his invaluable guidance and constant support, along with his capable
instruction and persistent encouragement.
We are grateful to our Head of Department, Dr. Y. Ramadevi (Professor), for her steady
support and the provision of every resource required for the completion of this case study.
We would like to take this opportunity to thank our Incharge, Sri Kolla Morarjee, as well
as the management of the institute, for having created an excellent learning atmosphere.
Our thanks are due to all members of the staff and our friends for providing us with the
help required to carry out the groundwork of this project.

Table of Contents

Abstract
1. Introduction
1.1 Purpose
1.2 Intended Audience and Reading Suggestions
1.3 Product Scope
1.4 References
2. Background Information
3. Scope of the case study
4. Design and Implementation
4.1 Overall Description
4.1.1 Product Perspective
4.1.2 Product Functions
4.1.3 User Classes and Characteristics
4.1.4 Operating Environment
4.1.5 Design and Implementation Constraints
4.1.6 User Documentation
4.1.7 Assumptions and Dependencies
4.2 External Interface Requirements
4.2.1 User Interfaces
4.2.2 Hardware Interfaces
4.2.3 Software Interfaces
4.2.4 Communications Interfaces
4.3 System Features
4.3.1 Data Preprocessing
4.3.2 Data Training and Modelling
4.3.3 Proposed Model
4.4 Other Nonfunctional Requirements
4.4.1 Performance Requirements
4.4.2 Safety Requirements
4.4.3 Security Requirements
4.4.4 Software Quality Attributes
4.4.5 Business Rules
4.5 Design and Implementation
4.5.1 DFD
4.5.2 ER Diagram
4.5.3 Use Case diagram
4.5.4 Sequence And Collaboration diagram
4.5.5 Activity and State chart diagram
4.5.6 Component and Deployment Diagram
4.5.7 Web application
5. Result Analysis and Recommendations
6. Conclusion and Discussion
References

ABSTRACT

The production rates of cars have been rising progressively during the past decade, with almost 92
million cars being produced in the year 2019. This has given the used car market a significant boost,
and it is now a fast-growing industry. The recent arrival of online portals and websites has created
a need for customers, clients, dealers and sellers to stay up to date with current trends in order to
know the actual value of any used car in the current market. Machine learning has numerous
real-life applications, and one of the most prominent is solving prediction problems, which can be
posed on any number of topics.

This case study focuses on one such application: a web application built using Python. Using a
machine learning algorithm, Random Forest, we predict the price of a used car by building a
statistical model from the provided data with a given set of attributes.

1. Introduction

1.1 Purpose

Determining whether the listed price of a used car is fair is a challenging task, because many factors
drive a used vehicle's price on the market. The focus of this project is to develop a machine learning
model that can accurately predict the price of a used car based on its features, so that buyers can make
informed purchases. We implement and evaluate the method on a dataset consisting of the sale prices
of different makes and models across cities. The model used is a Random Forest model. Using the
model's outcomes, we then show the best price for the used car.

1.2 Intended Audience and Reading Suggestions

This project is a prototype for used car price prediction, and its use is restricted to the college
premises. It has been implemented under the guidance of a college professor.
The project is useful for estimating the price at which an old car can be sold, in line with its
market value.

1.3 Product Scope

The Used Car Price Prediction system is a Flask web application that predicts car prices from a given
set of features. The dataset has the columns Car_Name, Year, Selling_Price, Present_Price, Kms_Driven,
Fuel_Type, Seller_Type, Transmission, and Owner; Selling_Price is the value to be predicted and the
remaining columns are the independent features. The dataset is available on Kaggle and is provided by
cardekho.com.

The code is written in Python 3.6.

Most of the project has been developed using Python as the programming language of choice, with
the following libraries:

Scikit-Learn: regression models and cross-validation techniques.
Spark-Sklearn: parallelization of the hyperparameter tuning process.
Pandas: data analysis.

1.4 References

[1] Vehicle Data Set, retrieved from https://www.kaggle.com/nehalbirla/vehicle-dataset-from-cardekho

[2] https://www.temjournal.com/content/81/TEMJournalFebruary2019_113_118.pdf

[3] http://ripublication.com/irph/ijict_spl/ijictv4n7spl_17.pdf

[4] https://www.youtube.com/watch?v=p_tpQSY1aTs

2. Background Information

We utilized several classic and state-of-the-art methods, including ensemble learning techniques, with
a 90%-10% split between training and test data. To reduce the time required for training, we used 500
thousand examples from our dataset. Random Forest is our baseline method. For most of the model
implementations, we used the open-source Scikit-Learn library.

Random Forest is an ensemble learning based regression model. As the name suggests, it builds
multiple decision trees and combines them into an ensemble that collectively produces a prediction.
The benefit of this model is that the trees can be built in parallel and are relatively uncorrelated,
so the errors of any individual tree tend to be averaged out rather than propagated. This
uncorrelated behavior is partly ensured by Bootstrap Aggregation, or bagging, which provides the
randomness required to produce robust and uncorrelated trees. This model was chosen to account for
the large number of features in the dataset and to compare a bagging technique with gradient
boosting methods.
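
The bagging idea described above can be illustrated with a minimal sketch. It is not the report's implementation: the data is synthetic, the helper names fit_bagged_trees and predict_bagged are hypothetical, and it assumes NumPy and scikit-learn are installed. A full Random Forest additionally samples a random subset of features at each split, which this sketch leaves out.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in data (not the report's dataset): price falls with age.
age = rng.uniform(0, 15, size=200)
price = 10.0 - 0.5 * age + rng.normal(0, 0.5, size=200)
X = age.reshape(-1, 1)


def fit_bagged_trees(X, y, n_trees=25):
    """Fit each tree on its own bootstrap sample (rows drawn with replacement)."""
    trees = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(X), size=len(X))
        trees.append(DecisionTreeRegressor(max_depth=4).fit(X[idx], y[idx]))
    return trees


def predict_bagged(trees, X):
    """Average the per-tree predictions to obtain the ensemble prediction."""
    return np.mean([tree.predict(X) for tree in trees], axis=0)


trees = fit_bagged_trees(X, price)
new_car, old_car = predict_bagged(trees, np.array([[1.0], [12.0]]))
print(f"1-year-old: {new_car:.2f}, 12-year-old: {old_car:.2f}")
```

Because every tree sees a different resample of the rows, the individual trees disagree slightly, and averaging them smooths out those differences.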

Used car sales are increasing globally, driven by the rising price of new cars and by customers who
lack the funds to buy them. Car production has also risen steadily over the past decade, with almost
92 million cars produced in 2019, which has given the used car market a significant boost and made
it a fast-growing industry.

A used car price prediction system is therefore needed to assess the worth of a car from a variety
of features, supporting both business finance decisions and customer purchases.

3. Scope of the case study

This study was performed to understand the concepts behind machine learning techniques.
The experimental results indicate that applying these techniques increased the accuracy
of the model by reducing overfitting.

The Used Car Price Prediction system is a Flask web application that predicts car prices from a given
set of features. The dataset has the columns Car_Name, Year, Selling_Price, Present_Price, Kms_Driven,
Fuel_Type, Seller_Type, Transmission, and Owner; Selling_Price is the value to be predicted and the
remaining columns are the independent features. The dataset is available on Kaggle and is provided by
cardekho.com.

The application predicts the actual market value of a car from its characteristics, which helps
both buyers and sellers.

The application successfully predicts the price of a used car and gives the best selling price to
the user.

We would like to deploy our model in a real-time system. If possible, we would
also like to create an Indian dataset and see how well our model performs on it.

The prediction application will be deployed on the Heroku cloud platform in future. The project will
be very helpful to customers and dealers who are buying or selling cars.

The application will help customers check whether they have paid the correct
price for a used car.

4.Design and Implementation

4.1 Overall Description

4.1.1 Product Perspective


We propose a methodology that uses a machine learning model, namely Random Forest, to predict the
prices of used cars from their features. The price is estimated from the features
mentioned above. The details of this model on the used car dataset, along with its
accuracy, are discussed in Section 5. We then deploy a website that displays our results and
can predict the price of a car given its features. This deployed service is the
result of our work, and it incorporates the data and the ML model with the features.

4.1.2 Product Functions


First, we collect the data about used cars and identify the important features that reflect the price.

Second, we preprocess the data, removing entries with NA values and discarding features that are not
relevant for the prediction of the price.

Third, we apply the random forest model to the preprocessed dataset, with the features as inputs and
the price as output.

Finally, we deploy a web page as a service that incorporates all the features of the used
cars and the random forest model to predict the price of a car.

The resulting application predicts the best price for the given car details.

4.1.3 User Classes and Characteristics

India has one of the largest automobile markets in the world. Every day, many owners sell their cars
after using them for some time to another buyer, who becomes the second or third owner, and so on.
Platforms such as cars24.com, cardekho.com and OLX.com give these owners a place
where they can sell their used cars, but deciding what the price of the car should be is the hardest
question. Machine learning algorithms can offer a solution to this problem.

This application predicts the car value based on the following characteristics:

1. Model year
2. Showroom price
3. Kilometers driven
4. Owner type (e.g. 1st owner)
5. Fuel type
6. Seller type (Individual/Dealer)
7. Transmission type (Manual/Auto)

Together, these characteristics determine the best selling price for the vehicle.

The features that are available to the user are:

• Accuracy: The proposed system aims for a high level of accuracy. All operations
should be performed correctly, so that whatever information the system returns
is accurate.
• Reliability: The reliability of the proposed system will be high for the reasons stated
above, and because information will be stored properly.
• No Redundancy: In the proposed system, utmost care will be taken that no
information is repeated anywhere, in storage or otherwise. This ensures
economical use of storage space and consistency in the data stored.
• Easy to Operate: The system should be easy to operate, and it should be possible to
develop it within a short period of time and within the limited budget of the user.

4.1.4 Operating Environment
The product runs on Windows, macOS and Linux. It can be started from any terminal, command prompt
or IDE and is easy to operate on any of these operating systems. First, we train a model on the dataset
using the Random Forest algorithm. Using this model, we run the application in
Anaconda, which displays a web page asking for the details of the user's car.
The application can be operated in offline mode. The hardware configuration includes a 40 GB hard
disk, a 15" colour monitor and a 122-key keyboard. The basic input devices required are a keyboard
and a mouse.

4.1.5 Design and Implementation Constraints


The model described in this document is to be deployed on the Heroku platform, so
input is supplied to it through the web UI.

4.1.6 User Documentation


End users can refer to this document.

4.1.7 Assumptions and Dependencies


The assumptions are:

• The code should be error-free.
• The application should be user-friendly so that it is easy to use.
• The application should give accurate and fast results to the user.

The dependencies are:

• The specific hardware, software and libraries on which the application will run.
• The project will be developed and run on the basis of the listed requirements and
specifications.
• The end users should have a proper understanding of the product.
• The model should be trained to achieve accuracy.

4.2 External Interface Requirements

4.2.1 User Interfaces
The user is asked to enter the following details, which are used to calculate the selling price of the car:

1. Model year
2. Showroom price
3. Kilometers driven
4. Owner type (e.g. 1st owner)
5. Fuel type
6. Seller type (Individual/Dealer)
7. Transmission type (Manual/Auto)
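
As an illustration of how the seven form inputs could be collected and checked before they reach the model, here is a minimal sketch using only the Python standard library. The class name CarDetails, its field names and the validation rules are hypothetical and are not taken from the report's code; the allowed seller and transmission values mirror the options listed above.

```python
from dataclasses import dataclass

# Allowed values mirror the options listed in the form (an assumption).
SELLER_TYPES = ("Individual", "Dealer")
TRANSMISSIONS = ("Manual", "Auto")


@dataclass
class CarDetails:
    model_year: int
    showroom_price: float
    kms_driven: int
    owner: int  # number of previous owners
    fuel_type: str
    seller_type: str
    transmission: str

    def validate(self):
        """Return a list of problems; an empty list means the input is usable."""
        problems = []
        if self.showroom_price <= 0:
            problems.append("showroom_price must be positive")
        if self.kms_driven < 0:
            problems.append("kms_driven must not be negative")
        if self.owner < 0:
            problems.append("owner must not be negative")
        if not self.fuel_type:
            problems.append("fuel_type is required")
        if self.seller_type not in SELLER_TYPES:
            problems.append(f"seller_type must be one of {SELLER_TYPES}")
        if self.transmission not in TRANSMISSIONS:
            problems.append(f"transmission must be one of {TRANSMISSIONS}")
        return problems


good = CarDetails(2015, 7.5, 42000, 0, "Petrol", "Dealer", "Manual")
bad = CarDetails(2015, -1.0, 42000, 0, "Petrol", "Broker", "Manual")
print(good.validate())
print(bad.validate())
```

Validating the input up front keeps malformed form submissions from reaching the prediction step.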

4.2.2 Hardware Interfaces

GPU: Graphics processor (NVIDIA), minimum 2 GB

The current release of TensorFlow supports GPU computing only through NVIDIA toolkits and software.

Storage disk (optional): SSD, minimum 400 MB/s read speed

4.2.3 Software Interfaces
Software: Anaconda, Python 3.x (3.8 or earlier)

Editor: VS Code / PyCharm / Sublime / Spyder

Framework: Flask

Packages: pandas, numpy, sklearn, seaborn

4.2.4 Communications Interfaces
There are no external communications interface requirements, as we are using a specific
data set.

4.3 System Features
4.3.1 Data Preprocessing:

Before training any model with any algorithm, data preprocessing is the first and most significant
step. It consists of several steps:

Step 1: Import libraries: The essential libraries we used for data preprocessing are Pandas for data
manipulation and analysis, Numpy for numerical analysis, and Matplotlib and Seaborn for
visualizing the data and its statistics.

Step 2: Import the dataset: We first downloaded the dataset from Kaggle and then loaded it
using the pandas library.

Step 3: Taking care of missing data: After examining the dataset, we found no missing
values in it.
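
Steps 2 and 3 can be sketched as follows. The three rows are made up for illustration and only reuse the column names listed in this report; with the real data, pd.read_csv would be pointed at the CSV file downloaded from Kaggle instead of an in-memory string.

```python
import io

import pandas as pd

# Made-up sample rows using the column names listed in the report.
SAMPLE_CSV = """Car_Name,Year,Selling_Price,Present_Price,Kms_Driven,Fuel_Type,Seller_Type,Transmission,Owner
car_a,2014,3.35,5.59,27000,Petrol,Dealer,Manual,0
car_b,2013,4.75,9.54,43000,Diesel,Dealer,Manual,0
car_c,2017,7.25,9.85,6900,Petrol,Individual,Automatic,0
"""

# Step 2: load the dataset into a DataFrame.
df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Step 3: count missing values per column.
missing_per_column = df.isnull().sum()
print(missing_per_column)

# If any values were missing, dropping those rows is one simple option.
df_clean = df.dropna()
```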

Step 4: Encoding categorical data: The dataset contains categorical values such as fuel type,
owner type and seller type, which need to be encoded numerically so that the model can be
trained on them. To do this, we used the get_dummies() function of pandas, which converts the
categorical values in the dataset into binary indicator columns.
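
A minimal sketch of this encoding step on a few made-up rows is shown below. The drop_first=True and dtype=int arguments are choices made for this illustration, not settings confirmed by the report; drop_first removes one indicator column per feature because it is implied by the others.

```python
import pandas as pd

# Made-up rows; only the column names come from the report.
df = pd.DataFrame({
    "Present_Price": [5.59, 9.54, 9.85, 4.15],
    "Fuel_Type": ["Petrol", "Diesel", "Petrol", "CNG"],
    "Seller_Type": ["Dealer", "Dealer", "Individual", "Individual"],
    "Transmission": ["Manual", "Manual", "Automatic", "Manual"],
})

# Replace each categorical column with 0/1 indicator columns.
encoded = pd.get_dummies(
    df,
    columns=["Fuel_Type", "Seller_Type", "Transmission"],
    drop_first=True,
    dtype=int,
)
print(encoded.columns.tolist())
```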

Step 5: Splitting the dataset into training and test sets: To split the dataset, we used
scikit-learn (sklearn), Python's machine learning library. Its model_selection module builds
the test set by picking random rows from the available data; the model is trained on the remaining
rows and its predictions are checked against the held-out test rows, as is usual in supervised learning.
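
The split can be sketched with train_test_split on placeholder arrays. The test_size=0.1 value follows the 90%-10% split mentioned in the background section, while random_state=0 is an assumption added here only to make the example repeatable.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: 100 rows with 2 features each, plus one target per row.
X = np.arange(200).reshape(100, 2)
y = np.arange(100)

# Hold out a random 10% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=0
)
print(X_train.shape, X_test.shape)
```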

Step 6: Feature scaling: Since all the data is already available in a standard format, we did not
apply any feature scaling. Tree-based models such as Random Forest are also insensitive to the
scale of individual features.

4.3.2 Data Training and Modelling:
To train and develop a model, we first need to identify the dependent and independent variables. To
do this, we computed the correlation between each variable and the output and then
separated the variables into X and y, where X contains all the
independent variables and y holds the dependent variable, which in our model is the selling price of the
used car. Using the train_test_split function from sklearn.model_selection, the
dataset is then divided into training and test sets, which are used to find the best hyperparameters
for the model.
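
The correlation check and the separation into X and y might look like the sketch below. The five rows are made up, and only the column names come from the report.

```python
import pandas as pd

# Made-up rows; only the column names come from the report.
df = pd.DataFrame({
    "Year": [2011, 2013, 2015, 2017, 2018],
    "Present_Price": [4.0, 5.5, 7.0, 9.0, 9.8],
    "Kms_Driven": [80000, 60000, 42000, 15000, 5000],
    "Selling_Price": [1.5, 2.8, 4.5, 7.0, 8.5],
})

# Correlation of every feature with the target.
correlations = df.corr()["Selling_Price"].drop("Selling_Price")
print(correlations.sort_values(ascending=False))

# Independent variables (X) and dependent variable (y).
X = df.drop(columns="Selling_Price")
y = df["Selling_Price"]
```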

4.3.3 Proposed Model:


The proposed model is an application of a machine learning algorithm, the Random Forest
algorithm. First, the dataset, which is available on Kaggle, is loaded for further exploration.
After the data preprocessing steps, such as handling missing values and one-hot encoding of
categorical values, the dataset is divided into two parts: (1) a training dataset and (2) a test
dataset. The test data is picked randomly from the original dataset. We then apply the Random
Forest algorithm and tune its hyperparameters to find the best values for prediction. Once
the model predicts a result, the application prints the selling price of the car based on the given parameters.
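
A hedged sketch of training and tuning is given below. The report does not say which tuning method was used, so RandomizedSearchCV is an assumption, as are the parameter grid and the synthetic data that stands in for the Kaggle dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV, train_test_split

rng = np.random.default_rng(42)

# Synthetic stand-in data: the price depends on showroom price and age.
n = 300
present_price = rng.uniform(1, 20, n)
age = rng.uniform(0, 15, n)
X = np.column_stack([present_price, age])
y = present_price * (0.9 - 0.04 * age) + rng.normal(0, 0.3, n)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=42
)

# Candidate hyperparameter values (illustrative, not the report's grid).
param_distributions = {
    "n_estimators": [50, 100, 200],
    "max_depth": [4, 8, None],
    "min_samples_leaf": [1, 2, 5],
}

# Try 5 random combinations, each scored with 3-fold cross-validation.
search = RandomizedSearchCV(
    RandomForestRegressor(random_state=42),
    param_distributions=param_distributions,
    n_iter=5,
    cv=3,
    random_state=42,
)
search.fit(X_train, y_train)

# R^2 of the best model on the held-out test rows.
r2 = search.best_estimator_.score(X_test, y_test)
print(search.best_params_, round(r2, 3))
```

The search keeps the combination with the best cross-validated score and refits it on the full training set, so best_estimator_ can be used directly for prediction.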

4.4 Other Nonfunctional Requirements

4.4.1 Performance Requirements


Performance is one of the most important aspects of this system. The system should respond
quickly enough that the user sees the best selling price without delay.

4.4.2 Safety Requirements


No specific safeguards are required, and there are no actions that must be
prevented.

4.4.3 Security Requirements


The dataset must be kept secure, because changes in the dataset can
change the predicted selling price. Only authorized users should be given access to the backend of
the application.

4.4.4 Software Quality Attributes


This application remains reusable over time, because the dataset can be updated. The end user
should be aware that the outputs obtained from the model may not always match what is
expected.

4.4.5 Business Rules


Any individual or business organization, such as OLX, Cardekho or a car showroom, can use
this application to predict the price of a used car based on its present requirements.

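4.5 Design and Implementation

4.5.1 Data Flow Diagram
[Figure: data flow diagram]

4.5.2 ER Diagram
[Figure: ER diagram]

4.5.3 Use Case and UML Diagram
[Figure: use case diagram]

4.5.4 Sequence and Collaboration Diagram
[Figure: sequence and collaboration diagrams]

4.5.5 Activity and State Chart Diagram
[Figure: activity and state chart diagrams]

4.5.6 Component and Deployment Diagram
[Figure: component and deployment diagrams]
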
4.5.7 Web application

• The data set is taken from Kaggle. We have used the Random Forest algorithm.
• Random Forest is a popular supervised machine learning algorithm. It can be used for
both classification and regression problems.
• It is based on the concept of ensemble learning, which is the process of combining
multiple models to solve a complex problem and improve the performance of the
overall model.

5. Result Analysis and Recommendation

After the user enters the details of their car, the details are processed by the
machine learning algorithm, i.e. the Random Forest model trained on the data set taken
from Kaggle.

Based on this market valuation, the best price is displayed to the user
on the website.
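
The request flow described above can be sketched as a small Flask route. Everything in this example is illustrative: the /predict route, the form field names and the predict_price placeholder (a made-up depreciation rule standing in for the trained Random Forest model) are assumptions, not the report's actual application code.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def predict_price(present_price, age_years):
    """Placeholder for the trained model: a made-up depreciation rule."""
    return round(present_price * max(0.1, 1 - 0.08 * age_years), 2)


@app.route("/predict", methods=["POST"])
def predict():
    form = request.form
    try:
        present_price = float(form["present_price"])
        age_years = int(form["age_years"])
    except (KeyError, ValueError):
        # Missing or non-numeric fields are rejected with HTTP 400.
        return jsonify(error="present_price and age_years must be numbers"), 400
    return jsonify(selling_price=predict_price(present_price, age_years))


# Exercise the route without starting a server.
with app.test_client() as client:
    response = client.post(
        "/predict", data={"present_price": "10", "age_years": "5"}
    )
    print(response.get_json())
```

In a real deployment the placeholder function would be replaced by a call to the trained model, and the application would be started with a production server rather than the test client.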

This application predicts the actual market value of a car from its characteristics, which helps
both buyers and sellers.

This application successfully predicts the price of a used car and gives the best selling price to the user.

6. Conclusion and Future Work

This application predicts the actual market value of a car from its characteristics, which helps
both buyers and sellers.

This application successfully predicts the price of a used car and gives the best selling price to
the user.

We will make this application more user-friendly with a better UI and will add more
features to the current application.

This prediction application will be deployed on the Heroku cloud platform.

We also plan to develop a user-friendly mobile application, which will be easier to use.

References

[1] Vehicle Data Set, retrieved from https://www.kaggle.com/nehalbirla/vehicle-dataset-from-cardekho

[2] https://www.temjournal.com/content/81/TEMJournalFebruary2019_113_118.pdf

[3] http://ripublication.com/irph/ijict_spl/ijictv4n7spl_17.pdf

[4] https://www.youtube.com/watch?v=p_tpQSY1aTs
