Lung Cancer Detection Report


INTERNSHIP AND PROJECT REPORT

UTTARANCHAL INSTITUTE OF TECHNOLOGY

TOPIC - LUNG CANCER DETECTION USING MACHINE LEARNING


Submitted in partial fulfilment of the requirements for the award of the
Degree of Bachelor of Technology in Computer Science & Engineering

Name: Abhishek Ranjan
University Roll No.: 2201010019, Semester/Branch: 5th / CSE-1

SUBMITTED TO: Ms. STUTI BHATT

Department of Computer Science & Engineering


UIT, UTTARANCHAL UNIVERSITY
Dehradun (Uttarakhand), 248001.
CERTIFICATE
DECLARATION

I hereby declare that the Project Report on “LUNG CANCER DETECTION USING
MACHINE LEARNING” is an authentic record of my own work, carried out as a part of the
Minor Industrial Training, submitted in partial fulfilment for the award of the degree of
Bachelor of Technology (B.Tech) in Computer Science & Engineering at University Institute
of Technology (UIT), Uttaranchal University, Dehradun.
This work was completed under the valuable guidance and supervision of Ms. Stuti Bhatt,
whose expertise and support greatly contributed to the successful completion of this project. I
confirm that this report is a reflection of my own efforts, original research, and design
process as per the requirements set forth by the institution. I also declare that any sources or
references used in this work have been duly acknowledged and credited within the report,
adhering to academic integrity standards.
This project has strengthened my understanding of Machine Learning principles through
building the lung cancer detection project. The main objective of this project is to develop a
machine learning model that can classify and predict lung cancer based on relevant features
extracted from the dataset. The project involves data preprocessing, feature selection, model
training, and evaluation to achieve a high-performing classifier.

Date:
Signature of Student: Abhishek Ranjan
University Roll No.: 2201010019
ACKNOWLEDGEMENT

First and foremost, I would like to express my sincere gratitude to my mentor,
Ms. Stuti Bhatt, for her insightful guidance and encouragement during my project,
Lung Cancer Detection Using Machine Learning. Her mentorship has been invaluable
in helping me navigate the complexities of model design, inspiring me to reach a
higher level of proficiency in building a machine learning model for lung cancer
detection.
I am also deeply thankful to Prof. Dr. Madhu Kirola, Head of the Department of
Computer Science & Engineering, Uttaranchal University, for unwavering
support and a commitment to fostering an environment of learning and
exploration, which has provided me with the foundation to develop my technical and
professional skills. I am also grateful to the professors and mentors from the
NPTEL program, who have provided a strong foundation in Machine Learning.
Additionally, I extend my gratitude to the faculty and staff of the Computer
Science & Engineering Department for their resources, guidance, and feedback,
which have significantly contributed to the success of this project.
Lastly, I am grateful to my family, friends, and colleagues for their support,
which has kept me motivated throughout this journey.
Thank you all for making this project a successful learning experience and a
significant step forward in applying machine learning to healthcare.

Signature of Student: Abhishek Ranjan


University Roll No.: 2201010019
Table of Contents

About Industry

INTRODUCTION OF MACHINE LEARNING


Project on Lung Cancer Detection
About Industry

NPTEL (National Programme on Technology Enhanced Learning) is an initiative by seven
Indian Institutes of Technology (IITs) and the Indian Institute of Science (IISc), funded by the
Ministry of Education, Government of India. The goal of NPTEL is to provide accessible,
high-quality education in various fields, primarily focusing on engineering, technology,
management, and science, using online platforms.

Key Features of NPTEL:


1. Online Courses:
o NPTEL offers a wide range of online courses, including free video lectures,
tutorials, and assignments in diverse subjects like Computer Science,
Electrical Engineering, Civil Engineering, Mechanical Engineering,
Mathematics, Physics, and more.
o Courses are typically created and taught by faculty from top institutions like
IITs and IISc, ensuring high-quality content.
2. Certification:
o Learners can enroll in NPTEL courses for free, but there is an option to appear
for a proctored exam to earn a certificate.
o These certificates are highly valued and can be used to enhance resumes and
career opportunities.
3. Flexible Learning:
o Courses are self-paced, allowing learners to access content anytime and
anywhere.
o This flexibility makes it ideal for students, working professionals, and lifelong
learners.
4. Local Chapters:
o NPTEL has a network of Local Chapters in colleges and universities across
India.
o These chapters facilitate guidance and help students register, learn, and even
conduct local exams.
5. Online Learning Platform:
o NPTEL's content is hosted on platforms like YouTube, providing free access to
video lectures.
o It also partners with platforms like Swayam for additional interactive content
and certification options.
6. Assignments and Exams:
o NPTEL courses include weekly assignments and regular exams to assess
understanding and encourage active learning.
o These assignments help students practice and evaluate their grasp of the
subject matter.

Benefits of NPTEL:
• Quality Education: Courses are taught by expert professors from top institutes,
ensuring high academic standards.
• Cost-Effective: Most content is free to access; the only cost is for exams if certification
is desired.
• Career Enhancement: Earning a certificate from NPTEL can significantly enhance job
prospects and academic credentials.
• Continuous Learning: NPTEL encourages a culture of continuous learning, helping
professionals upskill in their careers.
Overall, NPTEL is a valuable resource for anyone seeking to improve their knowledge, skills,
and qualifications through quality online education.
INTRODUCTION OF MACHINE LEARNING

Machine Learning (ML) is a subfield of Artificial Intelligence (AI) that focuses on the
development of algorithms and statistical models to enable computers to perform specific
tasks without explicit instructions. These tasks involve recognizing patterns, making
decisions, and predicting outcomes based on data. ML has become a crucial technology in
various fields, including healthcare, finance, marketing, and more.
Machine Learning is a method of data analysis that automates analytical model building. It is
based on the idea that systems can learn from data, identify patterns, and make decisions
with minimal human intervention.

There are three main types of machine learning:

1. Supervised Learning:
o The algorithm learns from labeled data.
o The model predicts outcomes based on past experiences.
o Example algorithms: Linear Regression, Decision Trees, Support Vector
Machines (SVM), etc.

2. Unsupervised Learning:
o The algorithm works on unlabeled data.
o It finds hidden patterns or intrinsic structures.
o Example algorithms: K-means Clustering, Principal Component Analysis (PCA),
etc.

3. Reinforcement Learning:
o The algorithm learns through trial and error.
o It receives rewards or penalties based on its actions.
o Example algorithms: Q-Learning, Deep Q Networks (DQN), etc.

Applications of Machine Learning

Machine Learning is widely used in several domains:
• Healthcare: Disease prediction, medical imaging analysis, personalized medicine.
• Finance: Fraud detection, risk assessment, algorithmic trading.
• Retail: Recommendation systems, customer segmentation, demand forecasting.
• Manufacturing: Predictive maintenance, quality control, process optimization.

Project on Lung Cancer Detection


Lung cancer is one of the leading causes of cancer-related deaths globally. Early detection of
lung cancer can significantly improve patient outcomes and survival rates. Traditional
diagnostic methods, such as biopsies and imaging, can be invasive, time-consuming, and
costly. Machine learning offers a way to analyze large amounts of medical data quickly and
accurately to detect lung cancer early.

Lung Cancer Detection using Machine Learning:


1. Objective:

• The goal of this project is to develop a machine learning model that classifies whether a
given sample indicates the presence of lung cancer, using medical data such as
demographic information, imaging features, and patient history. By leveraging various
machine learning algorithms, the project aims to assist in early diagnosis, improve
patient outcomes, and support healthcare professionals in decision-making processes.

• This project will explore different classification techniques, evaluate their
performance using appropriate metrics (accuracy, precision, recall, etc.), and
optimize the model for real-world application in lung cancer detection.

• Machine learning algorithms can be applied to predict lung cancer based on
various factors such as medical history, lifestyle habits, and medical imaging data. By
training models on historical data of lung cancer patients and their characteristics,
algorithms can learn the patterns associated with the disease and predict the
likelihood of its development in new individuals.
2. Dataset:
• Data Source: The dataset consists of medical data related to lung cancer, including
features like age, gender, smoking history, and results from medical tests (e.g.,
imaging results, biomarkers).
• Size: The dataset contains thousands of records of patients with labeled outcomes
(cancerous or non-cancerous).
• Features: Demographic data, medical history, clinical test results, imaging data (if
available).
• The dataset for this project was taken from Kaggle.

3. Steps Involved in the Project:


a. Data Collection:
o Gather data from reliable sources, including public datasets and medical
research databases.
o Ensure data quality and completeness.
b. Data Preprocessing:
o Handle missing values, outliers, and inconsistent data.
o Normalize/standardize data to ensure all features contribute equally to the
model.
o Split data into training and testing sets.
c. Feature Engineering:
o Select relevant features for the model.
o Extract useful patterns and attributes from raw data.
d. Model Selection:
o Choose appropriate machine learning algorithms for classification.
o Consider different models like:
▪ Logistic Regression: For binary classification.
▪ KNN: Particularly useful when the dataset is small, the
relationships between data points are clear, and the decision
boundary is not overly complex.
▪ Random Forest: For handling large datasets and complex interactions.
▪ Support Vector Machine (SVM): For high-dimensional data.
▪ Neural Networks: For deep learning applications.
e. Model Training and Evaluation:
o Train the selected models using the training dataset.
o Use cross-validation to detect overfitting and estimate how well the model generalizes.
o Evaluate the models using metrics like accuracy, precision, recall, F1-score,
and AUC (Area Under the ROC Curve).
f. Hyperparameter Tuning:
o Optimize model performance using techniques like Grid Search or Random
Search.
o Fine-tune parameters to improve accuracy and reduce errors.
g. Model Deployment:
o Deploy the trained model for real-time or batch predictions.
o Use frameworks like Flask or Django for building a web interface to interact
with the model.
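
The preprocessing, training, cross-validation, and tuning steps above can be sketched with scikit-learn as follows. This is only an illustrative sketch, not the project's actual code: the data is synthetic (generated with `make_classification` as a stand-in for the Kaggle table), and the choice of Logistic Regression, the parameter grid, and the `f1_macro` scoring are assumptions made for the example. The split is done before scaling, and the scaler sits inside the pipeline, so that no statistics from held-out data leak into training.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the patient table (NOT the Kaggle data):
# 300 rows, 8 numeric features, roughly 85% of rows in one class.
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=4,
    weights=[0.15, 0.85], random_state=0,
)

# Hold out a test set first; stratify keeps the class ratio in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# The scaler is part of the pipeline, so it is refit on the training
# portion of every cross-validation fold.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Grid search over the regularisation strength C with 5-fold cross-validation.
search = GridSearchCV(
    pipeline,
    param_grid={"clf__C": [0.01, 0.1, 1.0, 10.0]},
    cv=5,
    scoring="f1_macro",
)
search.fit(X_train, y_train)

best_C = search.best_params_["clf__C"]
# score() applies the same scorer used for the search (macro-averaged F1).
test_f1_macro = search.score(X_test, y_test)
```

Macro-averaged F1 is used here instead of accuracy because it weights both classes equally, which matters when one class is much rarer than the other.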

4. Workflow:

5. Libraries & Algorithms Used:


LIBRARIES:-
i. NUMPY
ii. PANDAS
iii. SCIKIT-LEARN
iv. MATPLOTLIB
ALGORITHMS:-
a. K NEAREST NEIGHBOUR
b. DECISION TREE
c. RANDOM FOREST
d. LOGISTIC REGRESSION
• The Integrated Development Environment (IDE) used is Jupyter, and the project is
written in the Python programming language.
A. K-Nearest Neighbors (KNN) Algorithm:

The K-Nearest Neighbors (KNN) algorithm is a simple yet powerful supervised learning
technique used for both classification and regression tasks. It operates on the principle of
proximity, where the algorithm predicts the class or value of a new data point based on the
classes or values of its nearest neighbors. The algorithm's performance depends on the
choice of "k," the number of neighbors considered, as well as the distance metric used to
calculate the proximity between data points.

How it Works
• Calculate the distance between the new data point and all the existing data points in
the training set.
• Select the "k" nearest neighbors based on the calculated distances.
• For classification, assign the class of the majority of the "k" nearest neighbors to the
new data point.
• For regression, average the values of the "k" nearest neighbors to predict the new
data point's value.
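
The classification steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not the project's implementation: the function name `knn_predict`, the Euclidean distance metric, and the toy data points are all assumptions made for the example.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Step 1: distance from the query to every training point (Euclidean here).
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Step 2: keep the k closest neighbours.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Step 3: majority vote over the neighbours' labels.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy data (hypothetical): two features per row, labels 0 and 1.
train_points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
                (4.0, 4.2), (4.1, 3.9), (3.8, 4.0)]
train_labels = [0, 0, 0, 1, 1, 1]

prediction_near_origin = knn_predict(train_points, train_labels, (1.1, 1.0), k=3)
prediction_far = knn_predict(train_points, train_labels, (4.0, 4.0), k=3)
```

Because every prediction scans the whole training set, the cost grows with the number of stored points, which is the reason KNN can be slow on large datasets.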
Strengths
• Simple to implement and understand.
• No assumptions about data distribution.
• Can handle both classification and regression problems
Limitations
• Can be slow for large datasets.
• Sensitive to the choice of "k" and distance metric.
• Prone to overfitting with low values of "k"; very high values of "k" instead
over-smooth the decision boundary and can underfit.

B. Decision Trees

Decision trees are built by recursively partitioning the data based on the feature that
best separates the classes or predicts the target variable at each node. The splitting
criteria, such as Gini impurity or entropy, are used to determine the optimal feature
for partitioning.
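
The two splitting criteria mentioned above can be computed directly from the class labels at a node. The sketch below is illustrative only; the helper names `gini`, `entropy`, and `weighted_impurity` are chosen for this example and are not taken from the project.

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits: -sum(p * log2(p)) over the classes present."""
    n = len(labels)
    return -sum(
        (count / n) * math.log2(count / n) for count in Counter(labels).values()
    )


def weighted_impurity(left, right, impurity=gini):
    """Impurity of a split: child impurities weighted by child size."""
    n = len(left) + len(right)
    return len(left) / n * impurity(left) + len(right) / n * impurity(right)


# A perfectly mixed node is maximally impure for two classes.
mixed_node = [0, 0, 1, 1]
gini_mixed = gini(mixed_node)          # 0.5
entropy_mixed = entropy(mixed_node)    # 1.0 bit

# A split that separates the classes completely has zero impurity,
# while a split that leaves both children mixed gains nothing.
perfect_split = weighted_impurity([0, 0], [1, 1])
useless_split = weighted_impurity([0, 1], [0, 1])
```

At each node the tree evaluates candidate splits and keeps the one with the lowest weighted impurity.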

C. Random Forests

Random forests are ensembles of decision trees that are trained on different subsets of
the data and features. This ensemble approach reduces variance and improves
prediction accuracy by combining the predictions of multiple trees. Random forests
are more robust to overfitting than a single decision tree and can handle large
datasets efficiently.
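
Why combining many trees helps can be illustrated with a simple calculation. The sketch below assumes that each tree is right with the same probability and that the trees err independently; real trees in a forest are correlated, so this is an idealised upper bound on the benefit rather than a description of an actual forest. The function name `majority_vote_accuracy` is chosen for the example.

```python
from math import comb


def majority_vote_accuracy(n_voters, p_correct):
    """Probability that a majority of n independent voters is correct.

    Assumes n_voters is odd and every voter is correct with probability
    p_correct, independently of the others (a simplifying assumption).
    """
    needed = n_voters // 2 + 1
    return sum(
        comb(n_voters, k) * p_correct**k * (1 - p_correct) ** (n_voters - k)
        for k in range(needed, n_voters + 1)
    )


# With individual accuracy 0.7, the vote becomes more reliable as trees are added.
single_tree = majority_vote_accuracy(1, 0.7)
three_trees = majority_vote_accuracy(3, 0.7)
many_trees = majority_vote_accuracy(101, 0.7)
```

Training each tree on a different subset of rows and features is what pushes the trees toward making different mistakes, so that the vote can cancel them out.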

D. Logistic Regression

• Logistic regression is a powerful supervised learning algorithm used for classification
tasks. Unlike linear regression, which predicts continuous values, logistic regression
predicts the probability of a data point belonging to a specific class. It uses a sigmoid
function to map the input features to a probability between 0 and 1, which is then used
to classify the data points.
• Logistic Regression is used when the dependent variable (target) is categorical.
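
The mapping from features to a probability and then to a class can be sketched as follows. The weights, bias, feature values, and the 0.5 threshold are hypothetical values chosen for illustration; in practice the weights and bias are learned from the training data.

```python
import math


def sigmoid(z):
    """Map any real number to the interval (0, 1), avoiding overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    exp_z = math.exp(z)
    return exp_z / (1.0 + exp_z)


def predict_proba(features, weights, bias):
    """Probability of class 1: sigmoid of a weighted sum of the features."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def predict_class(features, weights, bias, threshold=0.5):
    """Assign class 1 when the predicted probability reaches the threshold."""
    return 1 if predict_proba(features, weights, bias) >= threshold else 0


# Hypothetical model with two features; these numbers are not fitted values.
weights = [1.5, -0.5]
bias = -1.0

probability = predict_proba([2.0, 1.0], weights, bias)   # z = 1.5
label = predict_class([2.0, 1.0], weights, bias)
```

The weighted sum `z` is the log-odds of class 1, which is why the decision boundary of logistic regression is linear in the features.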
Strengths

• Simple to understand and implement.
• Provides probabilities for predictions.
• Can handle both binary and multi-class classification.

Limitations

• Assumes a linear relationship between the features and the log-odds of the target
variable.
• Can be sensitive to outliers.
• Prone to overfitting with high dimensionality.
6. Model Evaluation Metrics:
• Accuracy: The percentage of all instances that are predicted correctly.
• Precision: The proportion of predicted positives that are true positives.
• Recall: The proportion of actual positives that the model identifies.
• F1 Score: The harmonic mean of precision and recall.
• ROC Curve and AUC: The ROC curve plots the true positive rate against the false
positive rate across decision thresholds; AUC summarizes it as a single number.

7. Challenges:
• Imbalanced dataset due to fewer positive cases (cancer cases) compared to negative
ones.
• Feature selection for identifying the most impactful medical attributes.
• Dealing with high-dimensional data when medical images are involved.

8. Results and Findings:


a) Result Of Knn Algorithm:-
b) Result Of Decision Tree Algorithm:-
c) Result Of Random Forest Classifier:-
d) Result Of Logistic Regression Algorithm:-
1) The Random Forest, Logistic Regression, and Decision Tree algorithms
demonstrated the best performance in terms of accuracy and robustness for lung cancer
detection in this project.
2) The model's performance metrics, reported per class (0 and 1):
a) Accuracy: 96.77%
b) Precision: class 0: 50%, class 1: 98%
c) Recall: class 0: 50%, class 1: 98%
d) F1 Score: class 0: 50%, class 1: 98%
The 50% scores for class 0 show that the high overall accuracy mainly reflects
performance on class 1.
3) The feature importance analysis indicated that smoking history and specific biomarkers
were significant indicators of lung cancer.
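
To show how accuracy and per-class precision, recall, and F1 relate, the sketch below computes them from confusion-matrix counts. The counts are an assumption: the report does not give the actual confusion matrix, and these numbers were chosen only because they reproduce percentages of the same size as those reported above (62 test samples, 2 of them in class 0). The function name `classification_metrics` is likewise chosen for the example.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 for the class treated as positive."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts (NOT taken from the project's output):
# class 1: 59 predicted correctly, 1 missed; class 0: 1 correct, 1 wrong.
class_1 = classification_metrics(tp=59, fp=1, fn=1, tn=1)

# The same confusion matrix viewed with class 0 as the positive class.
class_0 = classification_metrics(tp=1, fp=1, fn=1, tn=59)

accuracy_percent = round(class_1["accuracy"] * 100, 2)
```

With these counts a single extra mistake on class 0 would drop its precision and recall to 0%, while accuracy would barely move, which is why per-class metrics are more informative than accuracy on imbalanced data.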

9. Conclusion:
Machine Learning offers a powerful toolset for early detection of diseases such as lung
cancer. This project demonstrated how data-driven approaches could significantly aid in
medical diagnosis, potentially saving lives by detecting conditions earlier than traditional
methods. With further refinement and access to more data, these models can achieve even
higher accuracy and reliability.

10. Future Work:

• Explore deep learning models like Convolutional Neural Networks (CNN) for analyzing
medical imaging data (CT scans, X-rays).
• Increase the dataset size to improve the model's generalization.
• Implement a user-friendly interface for medical professionals to utilize the detection
model.
• Integrate the ML model into a clinical decision support system for automated,
real-time diagnostics.

THANK YOU
