Face Detection Project Report White Book
Master of Science
(Computer Science)
Guide
Swati Mourya
HOD
Department of Information Technology and Computer Science
S K Somaiya College, Somaiya Vidyavihar University
By
Rohit Nagula
Roll no: 31031522021
Subject: Project
Institution: S K Somaiya College
Year: 2024
Teacher in Charge: Swati Mourya
Title of the Project: Face Detection
Location: Vidyavihar
Duration:
This is to certify that the project entitled “Face detection” is a bonafide work of
“Rohit Nagula” (Exam seat No. 31031522021) submitted to the S K Somaiya College in
partial fulfillment of the requirement for the award of the degree of “M.Sc. in the
subject of Computer Science”.
I consider that the dissertation has reached the required standard and fulfils the
requirements of the rules and regulations relating to the nature of the degree. The
contents embodied in the dissertation have not been submitted for the award of any
other degree or diploma in this or any other university.
Date:
Place:
I certify that
a) The work contained in the thesis is original and has been done by myself
under the supervision of my supervisor.
b) The work has not been submitted to any other Institute for any degree or
diploma.
c) I have conformed to the norms and guidelines given in the Ethical Code of
Conduct of the Institute.
d) Whenever I have used materials (data, theoretical analysis, and text) from
other sources, I have given due credit to them by citing them in the text of
the thesis and giving their details in the references.
e) Whenever I have quoted written materials from other sources, I have given due
credit to the sources by citing them.
f) The plagiarism test shows that the similarity index of the whole thesis is within
25% and that of any single paper is less than 10%, as per the university
guidelines.
Date:
Place:
----------------------------------
Signature
Rohit Nagula
31031522021
Department of Information
CERTIFICATE
Seal
Examiner Approval sheet
This dissertation/project report entitled “Face Detection” by Rohit Nagula is approved for the degree of
Master of Science in the subject of Computer Science.
Examiners
1.---------------------------------------------
Place:
Date:
Acknowledgement
Write an acknowledgement of at most one page. The candidate should convey his/her appreciation to
all who have played a role in the completion of his/her project work. The supervisor, head of
the department, faculty members, lab mates, etc. may be acknowledged. Controversial statements and
non-academic or abusive sentiments are not allowed on this page. At the end, the student should put
his/her signature.
1 Title page
2 Student details
3 Certificate of authentication
4 Declaration by the student
5 Department certificate
6 Acknowledgement
7 Contents
8 List of Abbreviations
9 List of Figures
10 List of Tables
11 Abstract and keywords
Face detection in unconstrained conditions has been a difficult problem for years because of
varying expressions, brightness, and colour fringing. Recent studies show that deep learning
approaches can achieve impressive performance in recognizing different objects and patterns.
Face detection and alignment in unconstrained environments are challenging because of varying
poses, illumination, and occlusions. Identifying a person from a picture has been popularized by
the mass media; however, it is less reliable than fingerprint or retina scanning.
Recent research shows that deep learning techniques can achieve impressive performance on these
two tasks. In this report, I propose a deep cascaded multi-task framework that exploits the
inherent correlation between detection and alignment to boost their performance.
In particular, the framework adopts a cascaded structure with three stages of carefully designed
deep convolutional networks that predict face and landmark locations in a coarse-to-fine manner.
In addition, for the training process, I propose a new online hard sample mining strategy that
improves performance automatically without manual sample selection.
With every passing day, we are becoming more and more dependent on technology to carry out even
the most basic of our actions. Face detection helps us in many ways: sorting the photos in a
mobile phone gallery by recognizing the faces in them, unlocking a phone with a mere glance, and
adding biometric information in the form of face images to the country’s unique ID database
(Aadhaar) as an acceptable biometric input for verification.
This project lays out the basic terminology required to understand the implementation of face
detection using OpenCV, the open-source computer vision library originally developed by Intel.
It also presents a practical implementation of face detection.
The aim of the project is to implement face detection and recognition for the faces the script
has been trained on. The input is taken from a webcam, and the recognized faces are displayed
along with their names in real time. On a larger scale, the project could be extended into a
biometric attendance system that replaces the time-consuming process of manual attendance.
Keywords:
Face detection is a fundamental task in computer vision that involves identifying and locating
human faces in digital images. Its applications span numerous fields such as biometric
authentication, surveillance systems, social media, and user interaction.
Traditional face detection methods include techniques like Haar Cascades and Histogram of
Oriented Gradients (HOG) combined with Support Vector Machines (SVM). These methods,
while effective, often struggle with accuracy and robustness in varying conditions such as
different lighting, occlusions, and facial expressions.
The advent of machine learning and deep learning has revolutionized face detection techniques,
making them more accurate and reliable. Convolutional Neural Networks (CNNs) and their
variants, such as R-CNN, Fast R-CNN, and Faster R-CNN, have set new benchmarks in face
detection performance.
These deep learning models leverage large datasets and powerful computational resources to
learn complex features and patterns, resulting in significantly improved detection accuracy.
The evolution of face detection has been marked by significant advancements, transitioning from
early heuristic-based methods to sophisticated deep learning models. This progression reflects
the broader trends in artificial intelligence and computer vision, showcasing the increasing
complexity and capability of computational methods.
Early Approaches
Initially, face detection algorithms relied heavily on hand-crafted features and straightforward
classifiers. The Viola-Jones object detection framework, introduced in 2001, was one of the first
effective methods for face detection. It utilized Haar-like features and a cascade of boosted
classifiers to detect faces efficiently. Despite its real-time performance, the Viola-Jones method
faced challenges with variations in pose, illumination, and occlusions.
With the advent of machine learning, more robust methods emerged. Techniques like the
Histogram of Oriented Gradients (HOG) combined with Support Vector Machines (SVM)
improved the ability to capture essential facial features. While these methods offered enhanced
performance over heuristic-based approaches, they still encountered difficulties in handling
complex backgrounds and diverse facial expressions.
The most significant advancements in face detection have been driven by deep learning.
Convolutional Neural Networks (CNNs) have revolutionized the field by learning hierarchical
features directly from raw image data. Models such as the Multi-task Cascaded Convolutional
Networks (MTCNN) and Single Shot MultiBox Detector (SSD) have demonstrated superior
performance, effectively managing variations in scale, rotation, and lighting conditions.
Recent developments have focused on enhancing the speed and scalability of face detection
systems. Techniques like You Only Look Once (YOLO) and its variants have enabled real-time
face detection by predicting bounding boxes and class probabilities in a single forward pass of
the network. These advancements are crucial for applications requiring immediate feedback,
such as surveillance and interactive systems.
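Detectors that predict many candidate boxes in one pass, such as YOLO and SSD, usually finish with a non-maximum suppression (NMS) step so that each face is reported once. The following is a minimal, generic greedy NMS in NumPy, assuming boxes in (x1, y1, x2, y2) pixel coordinates and an illustrative overlap (IoU) threshold of 0.5; it is a sketch of the idea, not the exact post-processing of any particular model.

```python
import numpy as np

def iou(box, boxes):
    """Intersection over union of one (x1, y1, x2, y2) box with each row of `boxes`."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the best-scoring box, drop boxes that overlap it by more
    than iou_threshold, and repeat on what is left. Returns the kept indices."""
    boxes = np.asarray(boxes, dtype=float)
    order = np.argsort(scores)[::-1]  # highest score first
    keep = []
    while order.size > 0:
        best, rest = order[0], order[1:]
        keep.append(int(best))
        order = rest[iou(boxes[best], boxes[rest]) <= iou_threshold]
    return keep

# Two overlapping detections of one face plus one detection of another face
boxes = [[10, 10, 60, 60], [12, 12, 62, 62], [100, 100, 150, 150]]
scores = [0.9, 0.8, 0.75]
print(non_max_suppression(boxes, scores))  # [0, 2]
```

The first two boxes overlap with an IoU of about 0.85, so only the higher-scoring one survives.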
1.2 Research Hypothesis
Hypothesis: Advanced deep learning models, such as Convolutional Neural Networks (CNNs),
provide significantly higher accuracy and robustness in face detection tasks compared to
traditional machine learning methods and heuristic-based approaches.
Elaboration: Traditional methods, such as Viola-Jones and HOG + SVM, rely on hand-
crafted features which often fail under varying conditions like different poses, lighting,
and occlusions. CNNs, however, learn hierarchical and abstract features from raw data,
enabling them to adapt and generalize better across diverse conditions, leading to more
accurate and consistent face detection.
Deep learning models are designed to capture complex patterns in data, which
enhances their ability to detect faces accurately. This results in higher recall and
precision rates, meaning the models are less likely to miss faces (reducing false
negatives) and less likely to misidentify non-faces as faces (reducing false positives).
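The two rates can be made concrete with a small worked example. The counts below are hypothetical; precision is TP / (TP + FP) and recall is TP / (TP + FN), where TP, FP, and FN are true positives, false positives, and false negatives.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Precision: share of reported faces that are real faces.
    Recall: share of real faces that were reported."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return precision, recall

# Hypothetical test set with 100 annotated faces: the detector finds 90 of them
# (10 false negatives) and also fires on 30 regions that are not faces.
p, r = precision_recall(true_positives=90, false_positives=30, false_negatives=10)
print(f"precision = {p:.2f}, recall = {r:.2f}")  # precision = 0.75, recall = 0.90
```

Here the detector misses few faces (high recall) but raises many false alarms (lower precision).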
Hypothesis: Implementing real-time face detection systems using state-of-the-art models like
YOLO and MTCNN will meet or exceed performance benchmarks for speed and accuracy,
making them suitable for applications requiring immediate feedback.
Elaboration: Data augmentation techniques, such as rotating, scaling, flipping, and color
adjustments, artificially increase the diversity of the training dataset. This helps the model to
become more robust and better at recognizing faces under various transformations and
distortions, leading to improved performance on unseen data.
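Two of the transformations listed above, horizontal flipping and brightness (colour) adjustment, can be sketched directly on a NumPy image array. The 50% flip probability and the +/-20% brightness range are illustrative assumptions; rotation and scaling need interpolation and are usually left to library routines such as Keras' ImageDataGenerator.

```python
import numpy as np

def augment(image, rng):
    """Randomly flip and re-light an H x W x 3 uint8 image; returns a new uint8 array."""
    out = image.astype(np.float32)
    if rng.random() < 0.5:
        out = out[:, ::-1, :]  # horizontal flip: a mirrored face is still a face
    factor = rng.uniform(0.8, 1.2)  # brightness change of up to +/-20%
    out = np.clip(out * factor, 0, 255)
    return out.astype(np.uint8)

rng = np.random.default_rng(seed=0)
face = np.full((64, 64, 3), 128, dtype=np.uint8)  # stand-in for a real face crop
batch = [augment(face, rng) for _ in range(4)]  # four different variants of one image
```

Because each call draws fresh random numbers, every training epoch can see a slightly different version of the same face.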
Transfer learning involves using pre-trained models on large datasets as a starting point, which
helps in faster convergence and better accuracy. This approach is particularly beneficial when
the available training data is limited, as the pre-trained models already possess a strong ability
to recognize basic features, which can be fine-tuned for face detection tasks.
Hypothesis: Combining face detection with alignment algorithms, such as in the MTCNN
framework, will result in improved accuracy in detecting faces under varied poses and lighting
conditions.
Hypothesis: Face detection models trained on large, diverse datasets will perform better in real-world
scenarios compared to models trained on limited or less varied datasets, highlighting the
importance of comprehensive training data in the development of robust face detection systems.
Elaboration: Training on diverse datasets ensures that the model encounters a wide range of
facial features, skin tones, and backgrounds. This exposure enables the model to generalize better
and perform more accurately across different demographic groups and environmental conditions,
reducing the likelihood of biases. Ensuring diversity in the training data helps mitigate biases
that can arise from overrepresentation of certain demographics.
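One way to check for such bias is to break the evaluation down by group instead of reporting a single score. The following minimal sketch assumes that each annotated face in a test set carries a (hypothetical) group label and a flag saying whether the detector found it. It computes recall separately for each group, so a large gap between groups would suggest that some groups are under-represented in the training data.

```python
from collections import defaultdict

def recall_by_group(records):
    """records: (group, detected) pairs, one per annotated face in the test set.
    Returns {group: fraction of that group's faces the detector found}."""
    found, total = defaultdict(int), defaultdict(int)
    for group, detected in records:
        total[group] += 1
        found[group] += int(detected)
    return {group: found[group] / total[group] for group in total}

# Hypothetical results: the detector misses far more faces in group_b
records = ([("group_a", True)] * 95 + [("group_a", False)] * 5
           + [("group_b", True)] * 80 + [("group_b", False)] * 20)
print(recall_by_group(records))  # {'group_a': 0.95, 'group_b': 0.8}
```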
Chapter 2: -
2.3 Techniques/Methodology: -
1. Providing input: The first and foremost step in any model is to provide the input. Input
can take many forms, such as text, numbers, or images, depending on the type of problem. As
our problem is face recognition, we take images of people as the input.
In order to recognize a face, we first have to detect where the face is located in the image,
because we cannot always be sure that the provided image contains only the person's face.
There may be other objects in the image, and many images will show the person's entire body,
which we do not need in order to recognize his/her face. So we need to remove the unnecessary
parts of the image. This step is referred to here as noise removal. In the strict sense, noise
is the corrupted pixels in an image that may lead to inaccurate results; it can be introduced
when the photo is captured or during transmission. Here we use the HAAR cascade classifier for
face detection.
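Once a face has been detected, discarding everything outside it amounts to slicing the image array. The following minimal sketch assumes the detector returns boxes as (x, y, w, h), the layout used by OpenCV's CascadeClassifier.detectMultiScale, and adds a hypothetical `margin` parameter that keeps a little context around the face.

```python
import numpy as np

def crop_face(image, box, margin=0.1):
    """Return the part of `image` inside box = (x, y, w, h), enlarged by `margin`
    (a fraction of the box size) on every side and clipped to the image borders."""
    x, y, w, h = box
    dx, dy = int(w * margin), int(h * margin)
    top, left = max(y - dy, 0), max(x - dx, 0)
    bottom = min(y + h + dy, image.shape[0])  # rows are the first array axis
    right = min(x + w + dx, image.shape[1])   # columns are the second
    return image[top:bottom, left:right]

frame = np.zeros((480, 640, 3), dtype=np.uint8)  # placeholder for a camera frame
face = crop_face(frame, (300, 200, 100, 100))
print(face.shape)  # (120, 120, 3)
```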
Paul Viola and Michael Jones described the HAAR cascade classifier in their 2001 paper “Rapid
object detection using a boosted cascade of simple features”. The cascade function is trained
from a large number of positive and negative images. Positive images contain the object we want
our classifier to identify; negative images are images of everything else, which do not contain
the object we want to detect. HAAR features are used to locate the face so that the rest of the
image can be discarded. The different HAAR features are edge features, line features, and
four-rectangle features.
A HAAR cascade combines a large set of these features into a cascade of classifiers to produce
an efficient result. Each feature produces a single value, calculated by subtracting the sum of
the pixel intensities under the white rectangle from the sum of the pixel intensities under the
black rectangle. The HAAR-like features are evaluated over the image with a sliding window that
starts at the upper-left corner and ends at the lower-right corner. The scan is repeated several
times, with different window sizes, to detect human faces of different sizes in the image.
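Viola and Jones made exhaustive scanning fast by computing feature values from an integral image (summed-area table), which gives the sum of any rectangle in four lookups. The following is a minimal NumPy sketch. The window coordinates, and the choice of the top half of the two-rectangle feature as "white" and the bottom half as "black", are illustrative assumptions.

```python
import numpy as np

def integral_image(gray):
    """ii[r, c] = sum of gray[:r, :c]; the extra zero row/column avoids bounds checks."""
    ii = np.zeros((gray.shape[0] + 1, gray.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = gray.astype(np.int64).cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, x, y, w, h):
    """Sum of the pixels in the w x h rectangle whose top-left corner is (x, y): four lookups."""
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

def two_rect_feature(ii, x, y, w, h):
    """Two-rectangle edge feature over a w x h window: the top half is taken as the
    white rectangle, the bottom half as the black one; value = black sum - white sum."""
    half = h // 2
    white = rect_sum(ii, x, y, w, half)
    black = rect_sum(ii, x, y + half, w, half)
    return black - white

window = np.zeros((24, 24), dtype=np.uint8)
window[12:, :] = 50  # dark top half, brighter bottom half: a horizontal edge
ii = integral_image(window)
print(two_rect_feature(ii, 0, 0, 24, 24))  # 14400
```

The integral image is built once per image. After that, every feature at every window position costs the same few lookups, whatever the rectangle size.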
The process flow of the HAAR cascade classifier is shown in the figure above. The classifier
takes an input window and tests whether it is a face. If a stage decides that it is not a face,
the window is rejected and the algorithm exits for that window; otherwise the window is passed on
to the next stage. The output of this step is shown in the picture:
Fig 5.7: Output of the face detection stage
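The early-exit flow of the cascade can be sketched as a loop over stages. The stage functions and thresholds below (mean brightness, then contrast) are hypothetical stand-ins chosen only to make the control flow visible. A real HAAR cascade uses boosted combinations of HAAR features at each stage.

```python
import numpy as np

def cascade_classify(window, stages):
    """Return True only if the window passes every stage, in order.
    stages: list of (score_fn, threshold) pairs, cheapest stage first."""
    for score_fn, threshold in stages:
        if score_fn(window) < threshold:
            return False  # rejected: the remaining, more expensive stages never run
    return True  # survived all stages: reported as a face

# Hypothetical toy stages that record when they are called
calls = []

def brightness_stage(window):
    calls.append("brightness")
    return window.mean()

def contrast_stage(window):
    calls.append("contrast")
    return window.std()

stages = [(brightness_stage, 20.0), (contrast_stage, 5.0)]
dark = np.zeros((24, 24))  # a blank window fails the first stage
print(cascade_classify(dark, stages), calls)  # False ['brightness']
```

Because most windows in an image contain no face and are rejected by the first cheap stage, the average cost per window stays low.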
The next step in the process is to extract the features of the face using a HOG feature
extractor.
5.4 HOG (Histogram of Oriented Gradients) Algorithm: Now we take the detected face and extract
features from it using a histogram of oriented gradients. For that, we first need to find the
gradients in the image. An image is a collection of pixels. When we move from left to right
pixel by pixel, we find that at some point there is a sudden change in the pixel value, e.g.
from a black pixel (low intensity value) to a white pixel (high intensity value). This change in
intensity is called a gradient: going from a darker tone to a lighter tone is a positive
gradient, and vice versa. Moving from left to right gives the horizontal gradient and, as
expected, moving from top to bottom gives the vertical gradient. The image is then divided into
small cells, and the gradient directions in each cell are collected into a histogram weighted by
gradient magnitude; together, these histograms form the HOG feature vector.
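The gradient and histogram steps can be sketched in NumPy as follows. The sketch assumes a grayscale cell array, the simple [-1, 0, 1] difference kernel used by Dalal and Triggs, and 9 unsigned orientation bins. It omits block normalisation and the interpolation of votes between neighbouring bins, both of which full HOG implementations (for example scikit-image's `hog`) include.

```python
import numpy as np

def gradients(gray):
    """Horizontal (gx) and vertical (gy) gradients with the [-1, 0, 1] kernel;
    border pixels are left at zero. Positive values mean dark-to-light."""
    g = gray.astype(np.float64)
    gx = np.zeros_like(g)
    gy = np.zeros_like(g)
    gx[:, 1:-1] = g[:, 2:] - g[:, :-2]  # left to right
    gy[1:-1, :] = g[2:, :] - g[:-2, :]  # top to bottom
    return gx, gy

def cell_histogram(gx, gy, bins=9):
    """Orientation histogram for one cell: unsigned angles in [0, 180) degrees,
    each pixel voting with its gradient magnitude (no interpolation between bins)."""
    magnitude = np.hypot(gx, gy)
    angle = np.degrees(np.arctan2(gy, gx)) % 180.0
    hist, _ = np.histogram(angle, bins=bins, range=(0.0, 180.0), weights=magnitude)
    return hist

cell = np.zeros((8, 8))
cell[:, 4:] = 100  # dark left half, bright right half: a vertical edge
gx, gy = gradients(cell)
print(cell_histogram(gx, gy))  # all the weight lands in the 0-20 degree bin
```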
The pose of the face is not always the same. In some images the person's face may be slightly
tilted, or only the side of the face may be visible. Even then, we should be able to recognize
the face. For that reason, we warp each picture so that the eyes and lips are always in the same
place in the image. To do this, we locate 68 specific points (called landmarks) that exist on
every face, such as the top of the chin, the outside edge of each eye, and the inner edge of each
eyebrow.
Figure 5.12 Face landmarks estimation
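The warping step can be sketched from two landmark positions: compute the tilt of the line between the eye centres and build the rotation that undoes it. The sketch assumes the eye centres are already known (for example, as the mean of each eye's landmark points). The matrix has the same form as the one returned by OpenCV's `cv2.getRotationMatrix2D` with scale 1, so it could be passed to `cv2.warpAffine` to rotate the actual image.

```python
import numpy as np

def eye_alignment_matrix(left_eye, right_eye):
    """2x3 affine matrix that rotates about the midpoint between the eyes so that
    the line joining the eye centres becomes horizontal (image coordinates, y pointing down)."""
    (lx, ly), (rx, ry) = left_eye, right_eye
    theta = np.arctan2(ry - ly, rx - lx)       # tilt of the eye line, in radians
    cx, cy = (lx + rx) / 2.0, (ly + ry) / 2.0  # rotation centre
    c, s = np.cos(theta), np.sin(theta)
    # Same layout as cv2.getRotationMatrix2D((cx, cy), np.degrees(theta), 1.0)
    return np.array([[c, s, (1 - c) * cx - s * cy],
                     [-s, c, s * cx + (1 - c) * cy]])

# Hypothetical eye centres of a tilted face (pixels)
M = eye_alignment_matrix((30.0, 40.0), (70.0, 60.0))
left = M @ np.array([30.0, 40.0, 1.0])
right = M @ np.array([70.0, 60.0, 1.0])
print(np.round(left, 2), np.round(right, 2))  # both eyes now share the same y value
```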
Classifier:
This last step is actually the easiest step in the whole process. All we have to do is find the
person whose measurements are closest to those of our test image. We use a simple linear SVM
classifier for this. Given labelled training examples, a linear SVM finds the hyperplane that
separates the data according to the labels or outputs you have defined, with the widest possible
margin.
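Finding "the person with the closest measurements" amounts to a nearest-neighbour search over stored feature vectors. The sketch assumes each enrolled person is represented by one feature vector (a hypothetical `gallery` array with matching `names`). It uses Euclidean distance, with an assumed rejection threshold of 0.6 for unknown faces. It illustrates the matching idea rather than the SVM itself.

```python
import numpy as np

def closest_person(query, gallery, names, max_distance=0.6):
    """Name of the gallery entry nearest to `query` (Euclidean distance),
    or None when even the best match is farther than max_distance."""
    distances = np.linalg.norm(gallery - query, axis=1)  # one distance per known person
    best = int(np.argmin(distances))
    return names[best] if distances[best] <= max_distance else None

# Hypothetical 3-D measurement vectors for two enrolled people
gallery = np.array([[0.1, 0.2, 0.3],
                    [0.9, 0.8, 0.7]])
names = ["person_a", "person_b"]
print(closest_person(np.array([0.12, 0.18, 0.31]), gallery, names))  # person_a
print(closest_person(np.array([5.0, 5.0, 5.0]), gallery, names))     # None
```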
Running this classifier takes milliseconds, and its result is the name of the person. Support
vector machines (SVMs) are formulated to solve a classical two-class pattern recognition problem.
We adapt the SVM to face recognition by modifying the interpretation of the SVM classifier's
output and by devising a representation of facial images that is concordant with a two-class
problem. A traditional SVM returns a binary value, the class of the object. To train our SVM
algorithm, we formulate the problem in a difference space, which explicitly captures the
dissimilarities between two facial images.
This is a departure from traditional face-space or view-based approaches, which encode each
facial image as a separate view of a face. In difference space, we are interested in two
classes: the dissimilarities between images of the same individual, and the dissimilarities
between images of different people. These two classes are the input to the SVM algorithm, which
generates a decision surface separating them. For face recognition, we reinterpret the decision
surface to produce a similarity metric between two facial images. This allows us to construct a
face-recognition algorithm.
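The difference-space idea can be sketched with synthetic data. Every pair of face descriptors becomes one training sample, their element-wise absolute difference, labelled +1 for the same person and -1 for different people, and a linear SVM is trained on those samples. The descriptors, the use of the absolute difference (so that pair order does not matter), and the training settings are assumptions for illustration. The SVM is trained by plain subgradient descent on the hinge loss rather than with a library solver, and its signed decision value serves as the similarity score.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=500):
    """Linear SVM via full-batch subgradient descent on
    lam/2 * ||w||^2 + mean(max(0, 1 - y * (X @ w + b))), with labels y in {+1, -1}."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        active = y * (X @ w + b) < 1  # samples inside the margin or misclassified
        w -= lr * (lam * w - X[active].T @ y[active] / len(y))
        b += lr * y[active].sum() / len(y)
    return w, b

# Hypothetical data: 3 people, 5 noisy 8-D face descriptors each
rng = np.random.default_rng(0)
centres = rng.normal(size=(3, 8))
faces = [(p, centres[p] + 0.1 * rng.normal(size=8)) for p in range(3) for _ in range(5)]

# Difference space: one sample per image pair, +1 = same person, -1 = different people
pairs, labels = [], []
for i in range(len(faces)):
    for j in range(i + 1, len(faces)):
        (pi, fi), (pj, fj) = faces[i], faces[j]
        pairs.append(np.abs(fi - fj))
        labels.append(1.0 if pi == pj else -1.0)
X, y = np.array(pairs), np.array(labels)
w, b = train_linear_svm(X, y)

def similarity(f1, f2):
    """Signed score from the SVM decision surface: higher means more likely the same person."""
    return float(np.abs(f1 - f2) @ w + b)
```

With data like this, pairs of images of the same person should receive positive scores and pairs of different people negative ones. Moving the threshold away from 0 trades false accepts against false rejects.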
Implementation: -
Step 1: Setup
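Install Necessary Libraries: First, install the required libraries, including TensorFlow, OpenCV,
and MTCNN (for example, with pip install tensorflow opencv-python mtcnn). Then load and display a
test image:
import cv2
image = cv2.imread('face_image.jpg')  # OpenCV loads images in BGR channel order
cv2.imshow('Input image', image)
cv2.waitKey(0)  # keep the window open until a key is pressed
cv2.destroyAllWindows()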
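Using MTCNN for Face Detection: MTCNN (Multi-task Cascaded Convolutional Networks) is a robust
method for face detection and alignment.
from mtcnn import MTCNN
detector = MTCNN()
# MTCNN expects RGB input, while OpenCV loads images as BGR
faces = detector.detect_faces(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
for face in faces:
    x, y, w, h = face['box']  # each result also holds 'confidence' and 'keypoints'
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imshow('Detected faces', image)
cv2.waitKey(0)
cv2.destroyAllWindows()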
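from tensorflow.keras.preprocessing.image import ImageDataGenerator
# Rescale pixels to [0, 1] and augment; the augmentation settings are illustrative assumptions
datagen = ImageDataGenerator(rescale=1./255, rotation_range=20, horizontal_flip=True)
train_generator = datagen.flow_from_directory('path_to_training_data',
    target_size=(224, 224), batch_size=32, class_mode='binary')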
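Model Definition
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, Flatten, Dense, Dropout
model = Sequential([
    Input(shape=(224, 224, 3)),
    # A convolution layer precedes each pooling layer; the filter counts are assumptions
    Conv2D(32, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(512, activation='relu'),
    Dropout(0.5),
    Dense(1, activation='sigmoid')  # probability of the positive class
])
model.compile(optimizer='adam', loss='binary_crossentropy',
    metrics=['accuracy'])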
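Model Training
# Define callbacks: save the best weights seen so far to best_model.h5 and stop early
# when the training loss stops improving (TensorFlow 2.x tf.keras; Keras 3 requires
# a file name ending in '.weights.h5' when save_weights_only=True)
from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping
callbacks = [ModelCheckpoint('best_model.h5', monitor='loss', save_best_only=True, save_weights_only=True),
    EarlyStopping(monitor='loss', patience=3)]
history = model.fit(train_generator, epochs=10, callbacks=callbacks)  # epoch count is an assumption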
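Model Evaluation
# Load test data (rescaled only, no augmentation)
test_datagen = ImageDataGenerator(rescale=1./255)
test_generator = test_datagen.flow_from_directory('path_to_test_data',
    target_size=(224, 224), batch_size=32, class_mode='binary')
test_loss, test_accuracy = model.evaluate(test_generator)
print(f'Test accuracy: {test_accuracy:.3f}')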
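import numpy as np
model.load_weights('best_model.h5')  # best weights saved during training
cap = cv2.VideoCapture(0)  # default webcam
while True:
    ret, frame = cap.read()
    if not ret:
        break
    # Preprocess like the training images: RGB, 224x224, pixel values scaled to [0, 1]
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    input_frame = np.expand_dims(cv2.resize(rgb, (224, 224)) / 255.0, axis=0)
    predictions = model.predict(input_frame, verbose=0)
    # Sigmoid output = probability of class index 1 (class folders are indexed alphabetically)
    score = float(predictions[0][0])
    cv2.putText(frame, f'Score: {score:.2f}', (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
    cv2.imshow('Face detection', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):  # press 'q' to quit
        break
cap.release()
cv2.destroyAllWindows()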
This implementation demonstrates a complete pipeline for face detection using deep learning,
from data preprocessing and augmentation to training a custom model, evaluating its
performance, and deploying it in a real-time application using a webcam.
Chapter 3: -
Results
3.1 Introduction: -
Face detection is a fundamental task in computer vision, with applications ranging
from security and surveillance to personal electronics and digital marketing. The
ability to accurately identify and locate human faces in images and videos forms
the basis for advanced applications such as facial recognition, emotion detection,
and augmented reality.
This section provides a detailed introduction to the results obtained from our
research on face detection, focusing on methodologies, performance metrics, and
their implications for practical applications.
Chapter 4: -
Conclusions and Future Prospects
4.1 Conclusions: -
In conclusion, our exploration into face detection technologies has revealed several key insights
and advancements:
Robustness and Adaptability: The ability of deep learning models to handle variations in pose,
lighting conditions, and occlusions highlights their robustness and adaptability in diverse
scenarios. This is crucial for applications requiring reliable face detection across different
settings, including surveillance, human-computer interaction, and healthcare.
4.2 Future Prospects: -
Looking ahead, several avenues for future research and development in face detection include:
Multi-modal Integration: Integration of face detection with other modalities such as voice
recognition and gesture detection to enhance multimodal interaction systems.
Ethical Considerations: Addressing ethical concerns related to privacy, bias, and fairness in the
deployment of face detection technologies, ensuring responsible and transparent practices.
Domain-specific Applications: Tailoring face detection models for specific domains such as
healthcare (patient monitoring), retail (customer analytics), and automotive (driver monitoring)
to maximize their impact and effectiveness.
4.3 Bibliography: -
Viola, P., & Jones, M. (2001). Rapid object detection using a boosted cascade of simple features.
IEEE Computer Society Conference on Computer Vision and Pattern Recognition.
Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. IEEE
Computer Society Conference on Computer Vision and Pattern Recognition.
Zhang, K., Zhang, Z., Li, Z., & Qiao, Y. (2016). Joint face detection and alignment using
multitask cascaded convolutional networks. IEEE Signal Processing Letters.
Redmon, J., Divvala, S., Girshick, R., & Farhadi, A. (2016). You only look once: Unified, real-
time object detection. IEEE Conference on Computer Vision and Pattern Recognition.
Schroff, F., Kalenichenko, D., & Philbin, J. (2015). FaceNet: A unified embedding for face recognition
and clustering. IEEE Conference on Computer Vision and Pattern Recognition.
Zhang, S., Zhu, X., Lei, Z., Shi, H., Wang, X., & Li, S. Z. (2017). S3FD: Single shot scale-invariant
face detector. IEEE International Conference on Computer Vision.
Somaiya Vidyavihar University
Vidyavihar East, Mumbai 400077, India
W: www.somaiya.edu