A CAL Project Report
                                    on
“DETECTION AND CLASSIFICATION OF
            CONSUMED FOOD ITEMS”
USING DEEP LEARNING ALGORITHM
to be submitted in partial fulfilment of the requirements for the course on
       Data Mining and Business Intelligence – ITA5007
                               (B2+TB2)
                                   by
                      Ashok Rajbanshi (20MCA0271)
                    Jeffery M. Lawrence (21MCA0137)
                       Frank Therattil (21MCA0184)
                         Winter Semester 2021-2022
                      TABLE OF CONTENTS
ABSTRACT
1. Introduction
2. Review 1 (Survey, Analysis)
    a. Problem definition
    b. Dataset Description
3. Review 2 (Design)
    a. Methodology
           i. Module Description
                1. Data exploration
                2. Pre-processing
         ii. Algorithms used
                1. Justification for choosing the models
         iii. Flow diagram of the model
         iv. Dataset after preprocessing
         v. Dataset split (train and test)
4. Review 3 (Code)
    a. Implementation
           i. Software and hardware description
         ii. Output screenshots
    b. Confusion Matrix
    c. Comparison of the models used
    d. Comparison graph
5. Conclusion
6. References
                                         ABSTRACT
This project aims to detect and classify 22 different food items from input images. The system
accepts an image of a food item as input and identifies and classifies it using a model that was
trained and tested on previously collected images.
The project uses an image dataset of 22 food classes, each containing 100 images of a single food
item. The system uses the MobileNetV2 deep learning model to classify the image supplied by the
user, and it reports the name of the food item and its category (healthy or unhealthy).
   1. INTRODUCTION
Food-related photos have become popular through social networks, food recommendation systems
and dietary assessment systems, and social networking sites are now flooded with them. For
instance, sharing dining-out experiences on social networks is a recent trend. People are
increasingly interested in discovering and sharing new cuisines and in knowing more about the
food they consume. Many food recognition methods based on different visual representations have
been proposed in recent years, but most of them are limited to a few food classes in controlled
settings. Accurate food recognition from visual information alone is still a difficult task. In
contrast to rigid objects, food items are deformable and show high intra-class variability:
diverse cooking styles and seasonings lead to different appearances of the same food. Moreover,
different foods share many ingredients, so the differences between some food classes are
difficult to detect. Differences in the appearance and presentation of the same dish at various
restaurants add to the complexity of recognizing it.
Techniques for multi-class image classification include SVM, KNN and Artificial Neural
Networks. Transfer learning has shown promising results in the field of image classification. It
is a deep learning technique in which a model trained on one problem stores the knowledge it has
learned, and the same model is then reused for other, similar problems. The Convolutional Neural
Network (CNN) is a deep learning architecture that has gained popularity in image recognition
tasks due to its high accuracy and robustness.
The purpose of this project is to present a system that detects and classifies food items we
consume regularly, using a pre-trained MobileNetV2 model. The model's performance is assessed
based on accuracy and loss. The user provides an image as input; the system performs detection
and classification on that image and outputs the name of the food item. We have used Streamlit,
an open-source app framework for Python that helps create web apps for data science and machine
learning in a short time. It is compatible with major Python libraries such as scikit-learn,
Keras, PyTorch, SymPy (LaTeX), NumPy, pandas and Matplotlib. With its help we have created a web
application that lets the user upload an image and sends it to the food detection model, which
returns the name of the detected food item. In the web application we have created two lists,
one for healthy food items and another for unhealthy food items. The application checks which
list the predicted food name belongs to and reports the category accordingly. We have also used
web scraping, with the help of a Google API, to fetch the calories of the food item (per 100 g).
The outputs of the web application are the category, the prediction and the calories.
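The two-list category check described above can be sketched as follows. This is a minimal illustration, not the application's actual code: the report does not say which items belong to which list, so the split between `HEALTHY_FOODS` and `UNHEALTHY_FOODS` is an assumption, and `categorize` is a hypothetical helper name. Labels that are not obviously in either group are deliberately left out and fall through to "unknown".

```python
# Hypothetical assignment of class labels to the two lists; the report
# does not give the actual contents, so this split is only an assumption.
HEALTHY_FOODS = {
    "cauliflower", "tomato", "pineapple", "banana", "cabbage",
    "apple", "grapes", "mango", "corn", "carrot", "soup",
}
UNHEALTHY_FOODS = {
    "pizza", "french_fries", "burger", "hot_dog", "ice_cream",
    "samosa", "donuts",
}


def categorize(food_name: str) -> str:
    """Return the category of a predicted food label."""
    label = food_name.strip().lower()
    if label in HEALTHY_FOODS:
        return "healthy"
    if label in UNHEALTHY_FOODS:
        return "unhealthy"
    return "unknown"  # label appears in neither list


print(categorize("apple"))   # healthy
print(categorize("Burger"))  # unhealthy
```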
   2. Survey & Dataset Collection
       1. Problem Definition:
       Social networking sites are now flooded with food-related photos. For instance,
       sharing dining-out experiences on social networks is a recent trend. People are
       increasingly interested in discovering and sharing new cuisines and in knowing more
       about the food they consume. Many food recognition methods based on different visual
       representations have been proposed in recent years, but most of them are limited to a
       few food classes in controlled settings. Accurate food recognition from visual
       information alone is still a difficult task.
       2. Dataset Description:
       Data Source: https://www.kaggle.com/datasets/kmader/food41
       Data Source: https://www.kaggle.com/datasets/kritikseth/fruit-and-vegetable-image-recognition?select=validation
       We have used two datasets, Food-101 and the fruit-and-vegetable dataset, and created
       our own custom dataset by combining a few classes from each. The custom dataset
       consists of 22 classes of food items and about 2,600 images in total, with 100
       training images, 10 test images and 10 validation images per class. The food classes
       are: 'pizza', 'french_fries', 'chicken_curry', 'cauliflower', 'burger', 'tomato',
       'omlette', 'hot_dog', 'ice_cream', 'samosa', 'pineapple', 'banana', 'cheese',
       'cabbage', 'apple', 'grapes', 'mango', 'corn', 'momos', 'donuts', 'carrot', 'soup'.
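As a quick consistency check on the figures above, the sketch below lists the 22 class labels as given and computes the image count implied by 100 training, 10 test and 10 validation images per class. The variable names are illustrative. Under these per-class counts the implied total is 2,640, slightly above the reported figure; the report does not explain the difference.

```python
# Class labels exactly as listed in the dataset description.
CLASS_NAMES = [
    "pizza", "french_fries", "chicken_curry", "cauliflower", "burger",
    "tomato", "omlette", "hot_dog", "ice_cream", "samosa", "pineapple",
    "banana", "cheese", "cabbage", "apple", "grapes", "mango", "corn",
    "momos", "donuts", "carrot", "soup",
]
# Nominal number of images per class in each split.
IMAGES_PER_CLASS = {"train": 100, "test": 10, "validation": 10}

num_classes = len(CLASS_NAMES)
per_split_totals = {
    split: count * num_classes for split, count in IMAGES_PER_CLASS.items()
}
implied_total = sum(per_split_totals.values())

print(num_classes)        # 22
print(per_split_totals)   # {'train': 2200, 'test': 220, 'validation': 220}
print(implied_total)      # 2640
```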
3. Methodology:
The proposed work is implemented in Python using a Convolutional Neural Network (CNN) model
and transfer learning. The model was trained on 100 images per class for 22 classes and then
used to predict the food class. A new input image goes through image processing stages such as
resizing and colour space conversion before it is fed to the trained model. The model extracts
features from the input and predicts the class whose learned features match best. The user
uploads a food image from their system using a GUI. The image goes through the preprocessing
stages and is then passed to the MobileNetV2 (CNN) model for classification.
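The last two steps of this pipeline, scaling pixel values and turning the model's output into a label, can be sketched in plain Python. The scaling mirrors what `mobilenet_v2.preprocess_input` does in Keras (mapping 0..255 to the range -1..1). `scale_pixel` and `decode_prediction` are hypothetical helper names, and the three-class probability vector is made up for illustration.

```python
def scale_pixel(value: int) -> float:
    """Map an 8-bit pixel value (0..255) to the range [-1, 1]."""
    return value / 127.5 - 1.0


def decode_prediction(probabilities, class_names):
    """Return (label, probability) for the highest-scoring class."""
    if len(probabilities) != len(class_names):
        raise ValueError("one probability is expected per class")
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return class_names[best], probabilities[best]


# Illustrative values only, not real model output.
classes = ["apple", "burger", "pizza"]
scores = [0.05, 0.15, 0.80]
print(scale_pixel(0), scale_pixel(255))     # -1.0 1.0
print(decode_prediction(scores, classes))   # ('pizza', 0.8)
```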
       i.      Module Description:
               1. Data Exploration: In this step we create a label for each class so
                   that the model can learn which class a given image belongs to.
               2. Pre-Processing:
                   To pre-process our data we have used mobilenet_v2.preprocess_input
                   together with ImageDataGenerator(), the Keras image augmentation
                   utility. This step is performed for the training, test and
                   validation datasets.
                   Image augmentation techniques available with Keras ImageDataGenerator:
                       • Rotations
                       • Shifts
                       • Flips
                       • Brightness
                       • Zoom
                   Code for preprocessing:
             import tensorflow as tf

             # Apply only the MobileNetV2 input scaling; no augmentation
             # options are set on these generators.
             train_generator = tf.keras.preprocessing.image.ImageDataGenerator(
                 preprocessing_function=tf.keras.applications.mobilenet_v2.preprocess_input
             )
             test_generator = tf.keras.preprocessing.image.ImageDataGenerator(
                 preprocessing_function=tf.keras.applications.mobilenet_v2.preprocess_input
             )
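The augmentation types listed in the pre-processing step map onto `ImageDataGenerator` arguments roughly as sketched below. The code shown in the report sets only `preprocessing_function`, so every numeric value here is an illustrative assumption rather than the setting used in the project, and `make_augmenting_generator` is a hypothetical helper. TensorFlow is imported inside the helper so that the settings can be inspected without it.

```python
# Assumed augmentation settings; all values are illustrative.
AUGMENTATION_KWARGS = {
    "rotation_range": 20,             # rotations, in degrees
    "width_shift_range": 0.1,         # horizontal shifts, fraction of width
    "height_shift_range": 0.1,        # vertical shifts, fraction of height
    "horizontal_flip": True,          # random left-right flips
    "brightness_range": (0.8, 1.2),   # range of brightness factors
    "zoom_range": 0.2,                # random zoom in or out
}


def make_augmenting_generator():
    """Build a generator with augmentation plus MobileNetV2 input scaling."""
    import tensorflow as tf  # imported lazily; requires TensorFlow

    return tf.keras.preprocessing.image.ImageDataGenerator(
        preprocessing_function=tf.keras.applications.mobilenet_v2.preprocess_input,
        **AUGMENTATION_KWARGS,
    )
```

Augmentation of this kind is normally applied to the training generator only, so that test and validation images are evaluated unmodified.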
ii.       Algorithms used:
          In this model we have used MobileNetV2, a convolutional neural network
          that is 53 layers deep. It is fast, small in size, and more accurate than
          comparable lightweight models such as MobileNetV1 and ShuffleNet. We have
          used the pretrained weights of MobileNetV2 for transfer learning.
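A transfer-learning setup of this kind is commonly built in Keras along the following lines. This is a sketch under assumptions, not the project's actual model: the 224×224 input size, the frozen base, the single softmax head, the Adam optimizer and the helper name `build_model` are all choices made here for illustration.

```python
NUM_CLASSES = 22             # number of food classes in the dataset
INPUT_SHAPE = (224, 224, 3)  # assumed input size (MobileNetV2 default)


def build_model(num_classes: int = NUM_CLASSES):
    """Return a MobileNetV2-based classifier with a new softmax head."""
    import tensorflow as tf  # imported lazily; requires TensorFlow

    base = tf.keras.applications.MobileNetV2(
        input_shape=INPUT_SHAPE,
        include_top=False,    # drop the 1000-class ImageNet head
        weights="imagenet",   # pretrained weights (downloaded on first use)
        pooling="avg",        # global average pooling over the feature map
    )
    base.trainable = False    # keep the pretrained features fixed

    inputs = tf.keras.Input(shape=INPUT_SHAPE)
    features = base(inputs, training=False)
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(features)

    model = tf.keras.Model(inputs, outputs)
    model.compile(
        optimizer="adam",
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

Freezing the base means only the new output layer is trained, which suits a small dataset of about 100 images per class.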
          MobileNetV2:
          • MobileNetV2 has two types of blocks: a residual block with a stride of 1,
            and a block with a stride of 2 for downsizing.
          • Both types of block have 3 layers.
          • The first layer is a 1×1 convolution with ReLU6.
          • The second layer is the depthwise convolution.
          • The third layer is another 1×1 convolution, but without any non-linearity.
            It is claimed that if ReLU is used again, the deep network only has the
            power of a linear classifier on the non-zero volume part of the output
            domain.
          • The blocks use an expansion factor t, with t = 6 for all main experiments.
          • If the input has 64 channels, the internal output has 64 × t = 64 × 6 = 384
            channels.
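The three-layer block described above can be traced shape by shape. The sketch below only computes tensor shapes and does not run any convolution. The 14×14 spatial size and the 64 output channels in the example are assumptions chosen to match the 64-channel case in the text, the helper names are hypothetical, and the rule in `has_residual` reflects the usual convention that the shortcut is added only when input and output shapes match.

```python
def bottleneck_shapes(height, width, in_channels, out_channels, stride, t=6):
    """Return the (H, W, C) output shape of each of the three layers."""
    expanded = in_channels * t
    out_h = (height + stride - 1) // stride   # "same" padding: ceil(H / stride)
    out_w = (width + stride - 1) // stride
    return [
        ("1x1 expansion conv + ReLU6", (height, width, expanded)),
        ("3x3 depthwise conv + ReLU6", (out_h, out_w, expanded)),
        ("1x1 linear projection conv", (out_h, out_w, out_channels)),
    ]


def has_residual(in_channels, out_channels, stride):
    """The shortcut is used only when input and output shapes match."""
    return stride == 1 and in_channels == out_channels


for name, shape in bottleneck_shapes(14, 14, 64, 64, stride=1):
    print(f"{name}: {shape}")
# 1x1 expansion conv + ReLU6: (14, 14, 384)
# 3x3 depthwise conv + ReLU6: (14, 14, 384)
# 1x1 linear projection conv: (14, 14, 64)
```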
Overall architecture:
   where t: expansion factor, c: number of output channels, n: repeating number, s:
    stride. 3×3 kernels are used for spatial convolution.
    Impact of the Linear Bottleneck:
   • With the removal of ReLU6 at the output of each bottleneck module, accuracy is
     improved.
   ImageNet Classification:
   Figure: ImageNet Top-1 Accuracy
       • MobileNetV2 outperforms MobileNetV1 and ShuffleNet with comparable
         model size and computational cost.
       • When MobileNetV2 is compared with other deep learning models on the
         ImageNet dataset, the total CPU time it consumed for detection was 75 ms,
         the minimum among the models in the figure.
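Using the t, c, n, s notation defined under "Overall architecture", the bottleneck stages can be written out and traced in a few lines. The stage values below are recalled from the published MobileNetV2 architecture table for a 224×224 input, not taken from this report, so they should be checked against the original paper. The variable names are illustrative.

```python
# (t, c, n, s) per bottleneck stage, as recalled from the MobileNetV2 paper.
STAGES = [
    (1, 16, 1, 1),
    (6, 24, 2, 2),
    (6, 32, 3, 2),
    (6, 64, 4, 2),
    (6, 96, 3, 1),
    (6, 160, 3, 2),
    (6, 320, 1, 1),
]

size = 224 // 2   # the initial 3x3 convolution has stride 2: 224 -> 112
blocks = 0
for t, c, n, s in STAGES:
    # Only the first block of a stage uses stride s; the rest use stride 1.
    size //= s
    blocks += n
    print(f"t={t} c={c:3d} n={n} s={s} -> {size}x{size}x{c}")

print("bottleneck blocks:", blocks)             # 17
print("final feature map:", size, "x", size)    # 7 x 7
```

Counting three convolutions per block plus the first and last convolution gives 17 × 3 + 2 = 53, which is consistent with the 53-layer depth quoted earlier under this counting convention.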
iii.   Flow Diagram of the Model:
iv.    Dataset split:
       Our dataset contains about 2,600 images across 22 food classes. For each
       class we have used 100 images for training, 10 images for testing and 10
       images for validation.
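A per-class split with these sizes could be produced as sketched below. The report does not describe how its split was made, so the seeded shuffle, the helper name `split_class_files` and the file names are assumptions for illustration only.

```python
import random


def split_class_files(files, n_train=100, n_test=10, n_val=10, seed=0):
    """Shuffle one class's file names and cut them into three disjoint lists."""
    needed = n_train + n_test + n_val
    if len(files) < needed:
        raise ValueError(f"need {needed} files, got {len(files)}")
    shuffled = list(files)
    random.Random(seed).shuffle(shuffled)   # seeded for a repeatable split
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    val = shuffled[n_train + n_test:needed]
    return train, test, val


# Hypothetical file names for a single class.
apple_files = [f"apple_{i:03d}.jpg" for i in range(120)]
train, test, val = split_class_files(apple_files)
print(len(train), len(test), len(val))   # 100 10 10
```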
4. Review 3:
   a) Implementation:
        i. Software and hardware description:
            Software:
                • Python 3.9
                • Jupyter Notebook
               Python libraries:
                • TensorFlow
                • Keras
                • Matplotlib
                • NumPy
                • pandas
                • scikit-learn
            Hardware:
            OS: Windows 8/10/11, Linux or macOS
            RAM: 4 GB
        ii. Output Screenshots:
Input: apple.jpg
Input: Burger.jpeg
Input: French_fries.jpg
Input: Mango.jpg
Input: omelette.jpg
Input: pizza.jpeg
Input: samosa.jpg
Input: soup.jpg
Input: pineapple.jpg
b) Confusion Matrix:
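A confusion matrix of this kind counts, for each true class, how often each class was predicted. The sketch below builds one in plain Python. The labels and predictions are made-up toy values, not results from this project, and `confusion_matrix` and `accuracy` are helpers written here for illustration (scikit-learn offers an equivalent `confusion_matrix` function).

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for actual, predicted in zip(y_true, y_pred):
        matrix[index[actual]][index[predicted]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal (correct predictions)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total if total else 0.0


# Toy data for illustration only.
labels = ["apple", "burger", "pizza"]
y_true = ["apple", "apple", "burger", "pizza", "pizza", "pizza"]
y_pred = ["apple", "burger", "burger", "pizza", "pizza", "apple"]
cm = confusion_matrix(y_true, y_pred, labels)
for label, row in zip(labels, cm):
    print(f"{label:>6}: {row}")
print("accuracy:", accuracy(cm))
```

Off-diagonal entries show which classes are confused with each other; in the toy data, one apple is predicted as burger and one pizza as apple.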
c) Comparison Graph:
     Accuracy vs val_accuracy:
     Loss vs val_loss:
5. Conclusion:
In this study we have learned how the MobileNetV2 model works and how it can be used for
multi-class classification of a food image dataset. The MobileNetV2 model was seen to
outperform the other CNN model in terms of accuracy. It can be concluded that when a large
dataset is not available, transfer learning is a better choice than a conventional CNN. We have
also seen how a Google API can be used along with web scraping to extract the calories of the
food item predicted by our model. In future work, ingredient identification within a particular
class of food could be added, and a more sophisticated image classification tool could be
developed using more than 22 classes.
6. References:
            • https://www.irjet.net/archives/V8/i8/IRJET-V8I8102.pdf
            • https://paperswithcode.com/method/mobilenetv2
            • https://books.aijr.org/index.php/press/catalog/book/114/chapter/1068
            • https://www.researchgate.net/publication/341129298_Food_Image_Classification_with_Improved_MobileNet_Architecture_and_Data_Augmentation