2609 BDA Final
Title of Micro project: Load the Dataset and Store in a Data-Frame using Pandas
Academic Year: 2023-2024                         Program Code: AN
Course: Big Data Analytics                       Course Code: 22684
Submitted by
                                  Institute Code: 0141
                                   CERTIFICATE
Certified that this micro project report titled “Load the Dataset and Store in a
Data-Frame using Pandas” is the bona fide work of Mr. Sarang Jagdale, Roll No. 2609,
of the third year diploma in Artificial Intelligence and Machine Learning for the
course Big Data Analytics [BDA], code 22684, during the academic year 2023-2024,
who carried out the micro project work under my supervision.
                                ACKNOWLEDGEMENT
We would like to express our special thanks and gratitude to our teachers, who gave us the
opportunity to do this wonderful micro project on the topic “Load the Dataset and Store in a
Data-Frame using Pandas”. The project also helped us do a lot of research, and we came to
know about many new things. We are really thankful to all who helped us with this micro
project.
Secondly, we would also like to thank our parents and friends, who helped us a lot in finalizing
this project within the limited time frame.
Name Signature
Sarang Jagdale
  ALL INDIA SHRI SHIVAJI MEMORIAL SOCIETY’S POLYTECHNIC, PUNE -1
 VISION:
 MISSION:
  M1: Empower the students by inculcating various technical and soft skills.
  M2: Upgrade teaching-learning process and industry-institute interaction continuously.
 Vision
      Mission
     M1: To fulfill industrial requirement in the area of artificial intelligence and machine
     learning.
                                   INDEX
1. Title
2. Certificate
3. Acknowledgement
4. Annexure I
5. Annexure II
6. Annexure III
7. Annexure IV
8. Log Book
                                                                                 Annexure-I
                                 Micro-Project Proposal
Title of Micro-Project: Load the Dataset and Store in a Data-Frame using Pandas
 Aim: -
 To load the dataset and store it in a data frame using pandas.
 Pandas Data Frame is a structure that contains two-dimensional data and its corresponding labels.
 Benefits: -
 •   Helps to develop the skill of creating programs using logical statements.
 •   This project will build the ability to use Python more effectively.
 •   The micro-project teaches how to understand and apply logic to solve
     different problems and find solutions for them.
      4.0 Action Plan
Sr.   Details of Activity                            Planned      Planned       Name of Responsible
No.                                                  Start date   Finish date   Team Members
1.    Introduction to Micro-project: Study             01/01/24     03/01/24       Sarang Jagdale
      for selecting Micro project topic
     5.0 Resources Required
Sr. Name of               Specifications                                  Qty.       Remarks
No. Resources/material
1.    Computer System              Laptop i5 11th gen, RAM –7GB                  1
2.    Operating System                      Windows 11                           1
3.          Printer                             -                                -
4.    Internet/Websites   https://github.com/topics/pandas?l=java
                                                                    Ms.R.G. Waghmare
                                                (To be approved by the Concerned Teacher)
                                                                                Annexure-II
                                    Micro-Project Report
Title of Micro-Project: Load the Dataset and Store in a Data-Frame using Pandas
  1.0 Rationale:
  Data Frames are similar to SQL tables or the spreadsheets that you work with in Excel or
  Calc. In many cases, Data Frames are faster, easier to use, and more powerful than tables or
  spreadsheets because they’re an integral part of the Python and NumPy ecosystems.
  A Data Frame is a two-dimensional data structure, i.e., data is aligned in a tabular fashion in rows and
  columns. We can perform basic operations on rows and columns, such as selecting, deleting, adding, and
  renaming. Column Selection: To select a column in a Pandas Data Frame, we access it by its
  column name.
  Aim: -
  To load the dataset and store it in a data frame using pandas.
  Pandas Data Frame is a structure that contains two-dimensional data and its corresponding labels.
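
  The aim can be illustrated with a minimal sketch. The column names and values below are hypothetical sample data, not the dataset used in this project, and the CSV text is held in memory only so that the example runs without a file on disk. With a real file, the path would be passed to pd.read_csv() instead.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for a real CSV file on disk.
csv_text = """name,age,city
Asha,21,Pune
Ravi,23,Mumbai
Meera,22,Nagpur
"""

# pd.read_csv accepts a file path or any file-like object.
# With a real file this would be, for example, pd.read_csv("students.csv").
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)   # (number of rows, number of columns)
print(df.head())  # first few rows of the loaded data
```

  The first line of the CSV text becomes the column labels, and each following line becomes one row of the Data Frame.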
  Benefits: -
  •   Helps to develop the skill of creating programs using logical statements.
  •   This project will build the ability to use Python more effectively.
  •   The micro-project teaches how to understand and apply logic to solve
      different problems and find solutions for them.
       5.0 Actual Methodology Followed
        Sr. No./             Date                                  Work Done
        Hour No.
           1.            03/01/24                               Finalize the Topic
           2.            05/01/24                              Distribution of Work
           3.            09/01/24                              Distribution of Topic
           4.            14/01/24                         Collecting Images/Information
           5.            17/01/24                               Starting animation
           6.            24/01/24                             Completing animation
           7.            27/01/24                           Creating a Word Document
           8.            03/02/24                              Inserting information
           9.            13/02/24                            Arranged the Information
           10.           27/02/24                            Proofread the Information
           11.           05/03/24                          Editing the Word Document
           12.           13/03/24                            Review from the Teacher
           13.           24/03/24              Editing the Project Report as per Teacher’s suggestion
           14.           01/04/24                        Proofread and Finalize the Report
           15.           01/04/24                               Finalize the report
           16.           08/04/24                         Final submission of the Report
  7.0 Output of the Micro-Project:
In the real world, a Pandas DataFrame is usually created by loading a dataset from existing storage,
such as a SQL database, a CSV file, or an Excel file. A DataFrame can also be created from
lists, a dictionary, a list of dictionaries, and so on. Here is one of the ways in which
a DataFrame can be created: Creating a DataFrame using a list: a DataFrame
can be created using a single list or a list of lists.
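
The ways of creating a DataFrame mentioned above can be sketched as follows. All names and values are illustrative and are not taken from the project dataset.

```python
import pandas as pd

# From a single list: the list becomes one column.
df_single = pd.DataFrame(["Pune", "Mumbai", "Nagpur"], columns=["city"])

# From a list of lists: each inner list is one row.
df_rows = pd.DataFrame(
    [["Asha", 21], ["Ravi", 23], ["Meera", 22]],
    columns=["name", "age"],
)

# From a dictionary: keys become column names, values become the columns.
df_dict = pd.DataFrame({"name": ["Asha", "Ravi"], "age": [21, 23]})

# From a list of dictionaries: each dictionary is one row.
df_records = pd.DataFrame(
    [{"name": "Asha", "age": 21}, {"name": "Ravi", "age": 23}]
)

print(df_rows)
```

The dictionary form and the list-of-dictionaries form describe the same table here, one column-wise and the other row-wise.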
Dealing with Rows and Columns:
A DataFrame is a two-dimensional data structure, i.e., data is aligned in a tabular fashion in rows and
columns. We can perform basic operations on rows and columns, such as selecting, deleting, adding, and
renaming. Column Selection: To select a column in a Pandas DataFrame, we access it by its
column name.
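
A short sketch of the column operations named above (selecting, adding, renaming, and deleting). The DataFrame and its column names are hypothetical examples.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Asha", "Ravi", "Meera"], "age": [21, 23, 22]})

# Select one column (returns a Series) or several columns (returns a DataFrame).
ages = df["age"]
subset = df[["name", "age"]]

# Add a column computed from an existing one.
df["age_next_year"] = df["age"] + 1

# Rename a column; rename() returns a new DataFrame by default.
df = df.rename(columns={"age_next_year": "next_age"})

# Delete a column; drop() also returns a new DataFrame by default.
df = df.drop(columns=["next_age"])

print(df.columns.tolist())
```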
Row Selection:
Pandas provides dedicated indexers to retrieve rows from a DataFrame. The DataFrame.loc[] indexer
retrieves rows by label. Rows can also be selected by passing integer positions to the
DataFrame.iloc[] indexer.
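
As a minimal sketch of row selection, the example below uses a hypothetical DataFrame whose index holds student names, so that label-based and position-based selection can be compared side by side.

```python
import pandas as pd

df = pd.DataFrame(
    {"age": [21, 23, 22], "city": ["Pune", "Mumbai", "Nagpur"]},
    index=["Asha", "Ravi", "Meera"],
)

# .loc selects rows by index label.
row_by_label = df.loc["Ravi"]               # one row, returned as a Series
rows_by_labels = df.loc[["Asha", "Meera"]]  # several rows, returned as a DataFrame

# .iloc selects rows by integer position, starting at 0.
row_by_position = df.iloc[0]
rows_by_slice = df.iloc[0:2]                # the end position is excluded

print(rows_by_labels)
```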
Indexing and Selecting Data:
Indexing in pandas means simply selecting particular rows and columns of data from a DataFrame.
Indexing could mean selecting all the rows and some of the columns, some of the rows and all of the
columns, or some of each of the rows and columns. Indexing can also be known as Subset Selection.
Indexing a DataFrame using the indexing operator [] : The term indexing operator refers to the square
brackets following an object. The .loc and .iloc indexers also use the indexing operator to make
selections. Here, the indexing operator refers to df[].
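
The behaviour of df[] depends on what is placed inside the brackets. The sketch below, on a hypothetical DataFrame, shows the common cases.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "name": ["Asha", "Ravi", "Meera"],
        "age": [21, 23, 22],
        "city": ["Pune", "Mumbai", "Nagpur"],
    }
)

# A single label selects one column as a Series.
one_col = df["age"]

# A list of labels selects several columns as a DataFrame.
two_cols = df[["name", "city"]]

# A boolean mask selects the rows where the condition is True.
older = df[df["age"] > 21]

# A slice selects rows by position.
first_two = df[0:2]

print(older)
```

A label or list of labels selects columns, while a boolean mask or a slice selects rows.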
Indexing a DataFrame using .iloc[ ] :
The .iloc[] indexer allows us to retrieve rows and columns by position. To do that, we specify
the positions of the rows that we want, and the positions of the columns that we want as well.
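
Selecting rows and columns together by position can be sketched as below. The DataFrame is a hypothetical example, and positions start at 0.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "name": ["Asha", "Ravi", "Meera"],
        "age": [21, 23, 22],
        "city": ["Pune", "Mumbai", "Nagpur"],
    }
)

# A single cell: row position 0, column position 1.
first_age = df.iloc[0, 1]

# Specific rows and columns given as lists of positions.
corners = df.iloc[[0, 2], [0, 2]]

# All rows, first two columns (the end position of a slice is excluded).
left_part = df.iloc[:, 0:2]

# Negative positions count from the end.
last_row = df.iloc[-1]

print(corners)
```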
Filling Missing Values:
Pandas also provides functions for filling null values in a DataFrame. The interpolate() function
fills NA values as well, but it uses an interpolation technique to estimate the
missing values rather than a hard-coded value.
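
The difference between a hard-coded fill value and interpolation can be sketched as follows. The temperature readings are made-up values, and fillna() is included here only for contrast with interpolate().

```python
import numpy as np
import pandas as pd

# Hypothetical readings with two missing values.
df = pd.DataFrame({"temp": [20.0, np.nan, 24.0, np.nan, 28.0]})

# Hard-coded fill: every missing value becomes 0.
filled_constant = df.fillna(0)

# Linear interpolation: each missing value is estimated from its neighbours.
filled_linear = df.interpolate(method="linear")

print(filled_constant["temp"].tolist())
print(filled_linear["temp"].tolist())
```

With linear interpolation, the gap between 20.0 and 24.0 is filled with 22.0, and the gap between 24.0 and 28.0 with 26.0.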
Dropping missing values using dropna() :
To drop null values from a DataFrame, we use the dropna() function. This function drops
rows or columns that contain null values, in different ways.
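
A minimal sketch of the different ways dropna() can be applied, using a small hypothetical DataFrame with a few missing values:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "a": [1.0, np.nan, 3.0],
        "b": [4.0, 5.0, np.nan],
        "c": [7.0, 8.0, 9.0],
    }
)

# Drop every row that contains at least one null value.
rows_dropped = df.dropna()

# Drop every column that contains at least one null value.
cols_dropped = df.dropna(axis=1)

# Drop only the rows in which all values are null (none in this example).
all_null_dropped = df.dropna(how="all")

print(rows_dropped)
print(cols_dropped)
```

dropna() returns a new DataFrame and leaves the original unchanged unless the result is assigned back.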
Iterating over Rows and Columns:
Iteration is a general term for taking each item of something, one after another. A Pandas DataFrame
consists of rows and columns, so iterating over a DataFrame directly works like iterating over a
dictionary and yields the column labels. Iterating over rows: To iterate over rows, we can use
iterrows() or itertuples(). The third function, iteritems() (replaced by items() in pandas 2.0),
iterates over columns rather than rows.
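
The iteration functions can be sketched as below on a hypothetical two-row DataFrame. The example uses items() rather than iteritems(), because iteritems() was removed in pandas 2.0.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Asha", "Ravi"], "age": [21, 23]})

# Iterating over the DataFrame itself yields the column labels, like dict keys.
labels = [label for label in df]

# iterrows() yields (index, Series) pairs, one pair per row.
for index, row in df.iterrows():
    print(index, row["name"], row["age"])

# itertuples() yields one namedtuple per row.
names = [record.name for record in df.itertuples(index=False)]

# items() yields (column label, Series) pairs, one pair per column.
column_lengths = {label: len(column) for label, column in df.items()}

print(labels, names, column_lengths)
```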
Output of the Micro-Project
   a. The project’s main application is to load a dataset and store it in a DataFrame using pandas.
   b. This project will help to load a dataset and store it in a DataFrame using pandas.
Ms.R.G. Waghmare
                                         Annexure - III
                          Rubric for Assessment of Micro Project
                           Precautions and      conclusion. but        precautions and   precautions and
                           Conclusions          clarity is not there   conclusion.       conclusion.
                           omitted, some        in presentation.       Sufficient        Enough tables,
                           details are wrong.   But not enough         graphic           charts and
                                                graphic                description       sketches
                                                description
7.   Presentation of the   Major information    Includes major         Includes major    Well organized,
     Micro-Project         is not included,     information but        information but   Includes major
                           information is not   not well               not well          information,
                           well organized.      organized not          organized not     presented well.
                                                presented well.        presented well.
                                                                                   Annexure IV
Title of the Micro-project: Load the Dataset and Store it in Data-Frame using Pandas
                       Log Book of the Student (Hourly Work Report)
                               Academic Year: 2023-2024
Name of Student: Sarang Jagdale
Title of the Project: Load the Dataset and Store it in Data-Frame using Pandas
Course: Big Data Analytics [BDA]                 Course Code: 22684
Semester: AN6I
 Sr. No.        Date          Time                         Work Done
Ms.R.G. Waghmare
                                 Rubrics Used for Evaluation of a Micro Project
         Assessment of micro project based on rubrics for performance in group activity: (Marks to be
         given out of 06)
         Assessment of performance in individual presentation/Viva of micro project: (Marks to be given
         out of 04)
                   Scale used for assessment: Poor (1-3), Average (4-5), Good (6-8), Excellent (9-10)
             A) Process and Product Assessment (A):
         Rubric
                     Characteristics to be assessed                              Marks Obtained out of 10
         No.
             1       Relevance to course
             2       Literature review/information collection
             3       Completion of target as per project proposal
             4       Analysis of data and representation
             5       Quality of prototype/model
             6       Report Preparation
                                                             Total Out of (60)
                         Process and Product Assessment (A): Total Out of (06)
B) Individual Presentation/Viva(B)
                          Evaluation Sheet for the Micro Project
Academic Year: 2023-2024                             Name of Faculty: Ms.R.G.Waghmare
Course: Big Data Analytics [BDA]
Course Code: 22684
Semester: AN6I
Title of the Project: Load the Dataset and Store it in Data-Frame using Pandas