2609 BDA Final

Uploaded by

harshkale38

Institute Code: 0141

Title of Micro project: Load the Dataset and Store in a Data-Frame using Pandas
Academic Year: 2023-2024 Program Code: AN
Course: Big Data Analytics Course Code:
22684

Submitted by

Roll No Name Sign of student

2609 Sarang Jagdale

Under the Guidance of:


- Ms.R.G. Waghmare

CERTIFICATE

Certified that this micro project report titled “Load the Dataset and Store in a
Data-Frame using Pandas” is the bona fide work of Mr. Sarang Jagdale, Roll No. 2609, of
the third year diploma in Artificial Intelligence and Machine Learning, for the course
Big Data Analytics [BDA], code 22684, during the academic year 2023-2024, who
carried out the micro project work under my supervision.

Name & signature of Course Teacher


Ms.R.G.Waghmare

ACKNOWLEDGEMENT

We would like to express our sincere gratitude to our teachers, who gave us the
opportunity to carry out this micro project on the topic “Load the Dataset and Store in a
Data-Frame using Pandas”. The project involved a good deal of research and introduced us
to many new things. We are thankful to everyone who helped us complete it.
Secondly, we would also like to thank our parents and friends, who helped us a lot in finalizing
this project within the limited time frame.

Name Signature

Sarang Jagdale

ALL INDIA SHRI SHIVAJI MEMORIAL SOCIETY’S POLYTECHNIC, PUNE -1

Department of Artificial Intelligence and Machine Learning


VISION AND MISSION OF THE INSTITUTE

• VISION:

Achieve excellence in quality technical education by imparting knowledge,
skills and abilities to build a better technocrat.

• MISSION:

M1: Empower the students by inculcating various technical and soft skills.
M2: Upgrade teaching-learning process and industry-institute interaction
continuously.

VISION AND MISSION OF THE AI & ML DEPARTMENT

• Vision

To serve the society by imparting knowledge in artificial intelligence and machine
learning along with professional skills to build a responsible human being.

• Mission

M1: To fulfill industrial requirement in the area of artificial intelligence and machine
learning.

M2: To motivate students for continuous learning with entrepreneurial skills.

M3: To inculcate ethical values, soft skills and leadership skills in students for
overall personality development.



PROGRAM OUTCOMES (POs)
PO1 Basic and Discipline specific knowledge: Apply knowledge of basic mathematics,
science and engineering fundamentals and engineering specialization to solve the
engineering problems.
PO2 Problem analysis: Identify and analyze well-defined engineering problems using
codified standard methods.
PO3 Design/ development of solutions: Design solutions for well-defined technical
problems and assist with the design of systems components or processes to meet
specified needs.
PO4 Engineering Tools, Experimentation and Testing: Apply modern engineering
tools and appropriate technique to conduct standard tests and measurements.
PO5 Engineering practices for society, sustainability and environment: Apply
appropriate technology in context of society, sustainability, environment and
ethical practices.
PO6 Project Management: Use engineering management principles individually, as a
team member or a leader to manage projects and effectively communicate about
well-defined engineering activities.
PO7 Life-long learning: Ability to analyze individual needs and engage in updating in
the context of technological changes.

PROGRAM SPECIFIC OUTCOMES (PSOs)

Students will be able to:


PSO 1: Apply computing knowledge with standard practices to develop software.

PSO 2: Maintain Computer Hardware and Software System.

INDEX

Sr. No. | Content
1. | Title
2. | Certificate
3. | Acknowledgement
4. | Annexure I
5. | Annexure II
6. | Annexure III
7. | Annexure IV
8. | Log Book
9. | Rubrics Used for Evaluation
10. | Evaluation Sheet

Annexure-I
Micro-Project Proposal

Title of Micro-Project: Load the Dataset and Store in a Data-Frame using Pandas

1.0 Aims/Benefits of the Micro-Project

Aim: -
To load a dataset and store it in a DataFrame using pandas.
A Pandas DataFrame is a structure that contains two-dimensional data and its corresponding labels.

Benefits: -
• Helps to develop the skill of creating programs using logical statements.
• Builds the ability to use Python more effectively.
• Helps in understanding how to apply logic to solve different problems and find
solutions for them.

2.0 Course Outcomes Addressed

C22684.a: Describe Big data and Big Data Analytics.

C22684.b: Apply the Big data and Big Data Analytics.

3.0 Proposed Methodology


1. Arrangement of groups and representatives for groups that are not usually represented as
partners in main projects.
2. Capacity building and networking in relation to the role as partners in micro projects.
3. Collected materials related to project.
4. Support development of more need and user driven projects.
5. Contribute to the maximum requirements of project.
6. An eligible project idea addressing one of the four Priority Axes and a work plan for a
micro project including a description of how the capacity building and networking should
take place.
7. The project involves maximum three partners. From three partners, the contributions of
micro project are distributed.
8. An eligible Lead member who will guide the group members and analyze the data.
9. Eligible match finding the proper information.
10. Softcopy corrections by respective teachers.
11. Completion of the micro project properly.
12. Final copy and submission.

4.0 Action Plan
Sr. No. | Details of Activity | Planned Start Date | Planned Finish Date | Name of Responsible Team Members
1. | Introduction to Micro-project: Study for selecting Micro project topic | 01/01/24 | 03/01/24 | Sarang Jagdale
2. | Introduction to Micro-project: Discussion about selected Micro project topic with concerned Course Teacher | 03/01/24 | 05/01/24 | Sarang Jagdale
3. | Introduction to Micro-project: Finalize and Study for selected topic | 07/01/24 | 09/01/24 | Sarang Jagdale
4. | Drafting Proposals | 12/01/24 | 14/01/24 | Sarang Jagdale
5. | Proposal submission | 15/01/24 | 17/01/24 | Sarang Jagdale
6. | Micro project Proposal Presentation | 22/01/24 | 24/01/24 | Sarang Jagdale
7. | Making Changes in presentation, if suggested by concerned teacher | 25/01/24 | 27/01/24 | Sarang Jagdale
8. | Executing Micro-Project: Study from different resources | 01/02/24 | 03/02/24 | Sarang Jagdale
9. | Executing Micro-Project: Collect information from studied resources | 11/02/24 | 13/02/24 | Sarang Jagdale
10. | Executing Micro-Project: Arrange collected information | 25/02/24 | 27/02/24 | Sarang Jagdale
11. | Executing Micro project | 03/03/24 | 05/03/24 | Sarang Jagdale
12. | Drafting Methodology | 11/03/24 | 13/03/24 | Sarang Jagdale
13. | Drafting Literature Review | 22/03/24 | 24/03/24 | Sarang Jagdale
14. | Drafting Result, Discussion | 28/03/24 | 01/04/24 | Sarang Jagdale
15. | Micro project Presentation | 01/04/24 | 01/04/24 | Sarang Jagdale
16. | Micro Project final submission | 08/04/24 | 08/04/24 | Sarang Jagdale

5.0 Resources Required
Sr. No. | Name of Resources/Material | Specifications | Qty. | Remarks
1. | Computer System | Laptop i5 11th gen, RAM – 7GB | 1 |
2. | Operating System | Windows 11 | 1 |
3. | Printer | - | - |
4. | Internet/Websites | https://github.com/topics/pandas?l=java | |

Names of Team Members with Roll Nos.


Roll No Name

2609 Sarang Jagdale

Ms.R.G. Waghmare
(To be approved by the Concerned Teacher)

Annexure-II

Micro-Project Report
Title of Micro-Project: Load the Dataset and Store in a Data-Frame using Pandas

1.0 Rationale:
DataFrames are similar to SQL tables or the spreadsheets that you work with in Excel or
Calc. In many cases, DataFrames are faster, easier to use, and more powerful than tables or
spreadsheets because they are an integral part of the Python and NumPy ecosystems.

A DataFrame is a two-dimensional data structure, i.e., data is aligned in a tabular fashion in rows and
columns. We can perform basic operations on rows and columns, such as selecting, deleting, adding, and
renaming. For example, to select a column in a Pandas DataFrame, we access it by its column name.

2.0 Aim/Benefits of the Micro-Project:

Aim: -
To load a dataset and store it in a DataFrame using pandas.
A Pandas DataFrame is a structure that contains two-dimensional data and its corresponding labels.

Benefits: -
• Helps to develop the skill of creating programs using logical statements.
• Builds the ability to use Python more effectively.
• Helps in understanding how to apply logic to solve different problems and find
solutions for them.

3.0 Course Outcomes Addressed

C22684.a: Describe Big data and Big Data Analytics.


C22684.b: Apply the Big data and Big Data Analytics.

4.0 Literature Review:


https://github.com/topics/memory-game?l=java
https://github.com/melongbob/MatchingGame

5.0 Actual Methodology Followed
Sr. No./Hour No. | Date | Work Done
1. | 03/01/24 | Finalize the Topic
2. | 05/01/24 | Distribution of Work
3. | 09/01/24 | Distribution of Topic
4. | 14/01/24 | Collecting Images/Information
5. | 17/01/24 | Starting animation
6. | 24/01/24 | Completing animation
7. | 27/01/24 | Creating a Word Document
8. | 03/02/24 | Inserting information
9. | 13/02/24 | Arranged the Information
10. | 27/02/24 | Proofread the Information
11. | 05/03/24 | Editing the Word Document
12. | 13/03/24 | Review from the Teacher
13. | 24/03/24 | Editing the Project Report as per Teacher’s suggestion
14. | 01/04/24 | Proofread and Finalize the Report
15. | 01/04/24 | Finalize the report
16. | 08/04/24 | Final submission of the Report

6.0 Actual Resources Used


Sr. No. | Name of Resources/Material | Specifications | Qty. | Remarks
1. | Computer System | Laptop i5 11th gen, RAM – 7GB | 1 |
2. | Operating System | Windows 11 | - |
3. | Printer | - | - |
4. | Internet/Websites | https://github.com/topics/memory-game?l=java | 7 |

7.0 Output of the Micro-Project:

Creating a Pandas Data Frame:

In the real world, a Pandas DataFrame is usually created by loading a dataset from existing storage,
such as a SQL database, a CSV file, or an Excel file. A DataFrame can also be created from
lists, a dictionary, a list of dictionaries, and so on. Here are some of the ways in which we can
create a DataFrame.

Creating a DataFrame using a list:

A DataFrame can be created from a single list or a list of lists.
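
As a minimal sketch of the loading step, the example below reads CSV data into a DataFrame with pd.read_csv and also builds one from a list of lists. The CSV text, the column names, and the values are made up for illustration; a real project would pass the path of its own dataset file to pd.read_csv instead of the in-memory buffer used here.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a dataset file on disk.
csv_text = "name,age,city\nAsha,21,Pune\nRavi,23,Mumbai\nMeera,22,Nashik\n"

# pd.read_csv accepts a file path or any file-like object.
df_csv = pd.read_csv(io.StringIO(csv_text))

# A DataFrame can also be built directly from a list of lists.
rows = [["Asha", 21], ["Ravi", 23], ["Meera", 22]]
df_list = pd.DataFrame(rows, columns=["name", "age"])

print(df_csv.shape)   # (rows, columns) of the loaded dataset
print(df_list)
```

Using io.StringIO keeps the sketch self-contained; pd.read_excel and pd.read_sql play the same role for Excel files and SQL databases.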

Create a Pandas DataFrame from a dict of ndarrays/lists:

To create a DataFrame from a dict of ndarrays/lists, all the arrays must be of the same length. If an
index is passed, its length must equal the length of the arrays. If no index is passed, the index
defaults to range(n), where n is the array length.
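
The rules above can be sketched as follows. The column names, values, and index labels are hypothetical and chosen only to show the default index, an explicit index, and the error raised for arrays of unequal length.

```python
import numpy as np
import pandas as pd

# Dict values may be lists or NumPy arrays, as long as the lengths match.
data = {
    "name": ["Asha", "Ravi", "Meera"],
    "marks": np.array([82, 74, 91]),
}

df_default = pd.DataFrame(data)                           # index is 0, 1, 2
df_labeled = pd.DataFrame(data, index=["r1", "r2", "r3"])  # explicit index

# Arrays of different lengths are rejected with a ValueError.
try:
    pd.DataFrame({"a": [1, 2, 3], "b": [4, 5]})
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc

print(df_default)
print(df_labeled)
print(type(mismatch_error).__name__)
```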

Dealing with Rows and Columns:

A DataFrame is a two-dimensional data structure, i.e., data is aligned in a tabular fashion in rows and
columns. We can perform basic operations on rows and columns, such as selecting, deleting, adding, and
renaming.

Column Selection:

In order to select a column in a Pandas DataFrame, we access it by its column name.
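
The basic column operations named above (selecting, adding, renaming, deleting) might look like this on a small, made-up DataFrame. The column names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["Asha", "Ravi"], "age": [21, 23], "city": ["Pune", "Mumbai"]}
)

ages = df["age"]                               # select a column by name
df["age_next_year"] = df["age"] + 1            # add a new column
df = df.rename(columns={"city": "home_city"})  # rename a column
df = df.drop(columns=["age"])                  # delete a column

print(list(df.columns))
```

rename and drop return new DataFrames by default, which is why the results are assigned back to df.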

Row Selection:

Pandas provides dedicated indexers for retrieving rows from a DataFrame. The DataFrame.loc[] indexer
retrieves rows by label, and rows can also be selected by integer position using the
DataFrame.iloc[] indexer.
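
A short sketch of both ways of selecting a row, assuming a hypothetical DataFrame whose index holds student names:

```python
import pandas as pd

df = pd.DataFrame(
    {"age": [21, 23, 22], "city": ["Pune", "Mumbai", "Nashik"]},
    index=["Asha", "Ravi", "Meera"],
)

row_by_label = df.loc["Ravi"]    # select by index label
row_by_position = df.iloc[0]     # select by integer position

print(row_by_label)
print(row_by_position)
```

Each result is a Series whose index is the DataFrame's column names.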

Indexing and Selecting Data:

Indexing in pandas simply means selecting particular rows and columns of data from a DataFrame.
Indexing could mean selecting all the rows and some of the columns, some of the rows and all of the
columns, or some of each of the rows and columns. Indexing is also known as subset selection.

Indexing a DataFrame using the indexing operator [] :

The indexing operator refers to the square brackets following an object. The .loc and .iloc indexers
also use square brackets to make selections. In this section, the indexing operator refers to df[].

Selecting a single column:

In order to select a single column, we simply put the name of the column between the brackets.
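
As an illustration with made-up data, a single name inside the brackets returns a Series, while a list of names returns a DataFrame:

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["Asha", "Ravi"], "age": [21, 23], "city": ["Pune", "Mumbai"]}
)

one_column = df["name"]             # single name -> Series
two_columns = df[["name", "city"]]  # list of names -> DataFrame

print(type(one_column).__name__)
print(type(two_columns).__name__)
```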

Indexing a DataFrame using .loc[ ] :

This indexer selects data by the labels of the rows and columns.
The df.loc indexer selects data in a different way than the plain indexing operator: it can select
subsets of rows or columns, and it can also select subsets of rows and columns simultaneously.

Selecting a single row :

In order to select a single row using .loc[], we pass a single row label to .loc[].
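
The sketch below shows a few label-based selections with .loc on a hypothetical DataFrame indexed by name. Note that label slices include both endpoints, unlike ordinary Python slices.

```python
import pandas as pd

df = pd.DataFrame(
    {"age": [21, 23, 22], "city": ["Pune", "Mumbai", "Nashik"]},
    index=["Asha", "Ravi", "Meera"],
)

single_row = df.loc["Asha"]                          # one row label
some_rows = df.loc[["Asha", "Meera"]]                # list of row labels
rows_and_cols = df.loc[["Asha", "Meera"], ["city"]]  # rows and columns together
label_slice = df.loc["Asha":"Ravi"]                  # both endpoints included
older_cities = df.loc[df["age"] > 21, "city"]        # Boolean mask plus column

print(rows_and_cols)
print(older_cities)
```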

Indexing a DataFrame using .iloc[ ] :

This indexer retrieves rows and columns by integer position. To use it, we specify the positions of
the rows that we want and, optionally, the positions of the columns that we want as well.
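
A minimal sketch of position-based selection, again on made-up data. Positions start at 0, and slice end points are excluded, as in ordinary Python slicing.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "name": ["Asha", "Ravi", "Meera"],
        "age": [21, 23, 22],
        "city": ["Pune", "Mumbai", "Nashik"],
    }
)

first_row = df.iloc[0]             # first row as a Series
first_two = df.iloc[0:2]           # rows at positions 0 and 1
cell = df.iloc[1, 2]               # row position 1, column position 2
block = df.iloc[[0, 2], [0, 1]]    # chosen rows and chosen columns

print(cell)
print(block)
```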

Working with Missing Data:

Missing data occurs when no information is provided for one or more items or for a whole unit.
It is a very common problem in real-life datasets. In pandas, missing data is represented by NA
(Not Available) values, typically NaN or None.

Checking for missing values using isnull() and notnull() :

In order to check for missing values in a Pandas DataFrame, we use the isnull() and notnull()
methods. Both return a Boolean mask of the same shape as the data.
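
For example, with a small DataFrame that contains hypothetical gaps, the masks can be inspected directly or summed to count missing values per column:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"name": ["Asha", "Ravi", None], "marks": [82.0, np.nan, 91.0]}
)

missing_mask = df.isnull()               # True where a value is missing
missing_per_column = df.isnull().sum()   # count of missing values per column
marks_present = df["marks"].notnull()    # True where a value is present

print(missing_mask)
print(missing_per_column)
```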

Filling missing values using fillna(), replace() and interpolate() :

All of these functions help fill null values in a DataFrame. fillna() and replace() substitute a
value that we supply, whereas interpolate() estimates the missing values using an interpolation
technique (linear by default) rather than a hard-coded value.
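
The difference between the approaches can be sketched on a single made-up column named temp. The fill values 0 and -1.0 are arbitrary choices for the illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"temp": [10.0, np.nan, 30.0, np.nan, 50.0]})

constant = df.fillna(0)              # fill every gap with a fixed value
forward_filled = df.ffill()          # carry the previous value forward
replaced = df.replace(np.nan, -1.0)  # replace NaN like any other value
interpolated = df.interpolate()      # linear interpolation by default

print(interpolated)
```

Here linear interpolation fills each gap with the midpoint of its two neighbours, giving 20.0 and 40.0.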

Dropping missing values using dropna() :

In order to drop null values from a DataFrame, we use the dropna() function. Depending on its
arguments, it drops the rows or columns that contain null values.
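
A brief sketch of the main dropna() options, using a hypothetical DataFrame in which column c is entirely empty:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "a": [1.0, np.nan, 3.0],
        "b": [4.0, 5.0, 6.0],
        "c": [np.nan, np.nan, np.nan],
    }
)

rows_any = df.dropna()                   # drop rows with any missing value
cols_all = df.dropna(axis=1, how="all")  # drop columns that are entirely missing
rows_after = cols_all.dropna()           # then drop rows that still have gaps
rows_thresh = df.dropna(thresh=2)        # keep rows with at least 2 non-missing values

print(rows_after)
```

Because every row has a gap in column c, the plain dropna() call removes all rows; dropping the empty column first preserves more data.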

Iterating over rows and columns:

Iteration is a general term for taking each item of something, one after another. A Pandas DataFrame
consists of rows and columns, and iterating over a DataFrame directly yields its column labels, much
like iterating over a dictionary.

Iterating over rows and columns:

To iterate over rows, we can use iterrows(), which yields (index, row) pairs, or itertuples(), which
yields one named tuple per row. iteritems() iterated over columns rather than rows; it was removed in
pandas 2.0, and items() should be used in its place.
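
The three iteration styles can be sketched as follows on a small made-up DataFrame. items() is used for columns because iteritems() is not available in pandas 2.0 and later.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Asha", "Ravi"], "age": [21, 23]})

# iterrows() yields (index, row) pairs; each row is a Series.
row_pairs = [(index, row["name"]) for index, row in df.iterrows()]

# itertuples() yields one named tuple per row and is usually faster.
row_tuples = [(t.name, t.age) for t in df.itertuples(index=False)]

# items() yields (column name, column Series) pairs.
column_names = [col_name for col_name, _ in df.items()]

print(row_pairs)
print(row_tuples)
print(column_names)
```

For large datasets, vectorized column operations are generally preferable to any of these loops.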


8.0 Skill Developed/Learning Outcomes of this Micro-Project:

a. Working in a team and as an individual.


b. Presenting information in proper sequence.
c. Developed skills to apply python concepts.
d. Improved analysis skill.
Applications of this Micro project:

a. The project’s main application is to load the dataset and store it in a dataframe using pandas
b. This project will help to load the dataset and store it in a dataframe using pandas

Names of Team Members with Roll Nos.


1. 2609 Sarang Jagdale

Ms.R.G. Waghmare

(To be evaluated by the Concerned Teacher)

Annexure - III
Rubric for Assessment of Micro Project

Scale: Poor (Marks 1-3), Average (Marks 4-5), Good (Marks 6-8), Excellent (Marks 9-10)

1. Relevance to the Course
Poor: Related to very few LOs.
Average: Related to some LOs.
Good: Addressed at least one CO.
Excellent: Addressed more than one CO.

2. Literature Review/information collection
Poor: Not more than two sources (Primary and Secondary), very old reference.
Average: At least 5 relevant sources, at least 2 latest.
Good: At least 7 relevant sources, most latest.
Excellent: About 10 relevant sources, most latest.

3. Completion of Target as per Project proposal
Poor: Completed less than 50%.
Average: Completed 50 to 60%.
Good: Completed 60 to 70%.
Excellent: Completed more than 70%.

4. Analysis of Data and representation
Poor: Sample size small, data neither organized nor presented well.
Average: Sufficient and appropriate sample, enough data generated but not organized and not presented well. No or poor inferences drawn.
Good: Sufficient and appropriate sample, enough data generated which is organized and presented well. But poor inferences drawn.
Excellent: Enough data collected by sufficient and appropriate sample size. Proper inferences drawn by organizing and presenting data through tables, charts and graphs.

5. Quality of prototype/Model
Poor: Incomplete fabrication/assembly.
Average: Just assembled/fabricated and parts are not functioning well. Not in proper shape, dimensions beyond tolerance limit. Appearance/finish is shabby.
Good: Well assembled/fabricated with proper functioning parts. In proper shape, within tolerance dimensions and good finish. But no creativity in design and use of material.
Excellent: Well assembled/fabricated with proper functioning parts. In proper shape, within tolerance dimensions and good finish/appearance. Creativity in design and use of material.

6. Report Preparation
Poor: Very short, poor quality sketches. Details about methods, materials, precautions and conclusions omitted, some details are wrong.
Average: Nearly sufficient and correct details about methods, materials, precautions and conclusion, but clarity is not there in presentation. Not enough graphic description.
Good: Detailed, correct and clear description of methods, materials, precautions and conclusion. Sufficient graphic description.
Excellent: Very detailed, correct, clear description of methods, materials, precautions and conclusion. Enough tables, charts and sketches.

7. Presentation of the Micro-Project
Poor: Major information is not included, information is not well organized.
Average: Includes major information but not well organized, not presented well.
Good: Includes major information but not well organized, not presented well.
Excellent: Well organized, includes major information, presented well.

8. Viva
Poor: Could not reply to considerable number of questions.
Average: Replied to considerable number of questions but not very properly.
Good: Replied properly to considerable number of questions.
Excellent: Replied to most of the questions properly.

Annexure IV

Micro Project Evaluation Sheet

Name of Student: Sarang Jagdale Enrollment No: 2101410027

Name of Program: TYAN Semester: AN-6-I

Course Title: Big Data Analytics [BDA] Code: 22684

Title of the Micro-project: Load the Dataset and Store it in Data-Frame using Pandas

Course Outcomes Achieved: -


C22684.a: Describe Big data and Big Data Analytics.
C22684.b: Apply the Big data and Big Data Analytics.

Sr. No. | Characteristic to be assessed | Poor (Marks 1-3) | Average (Marks 4-5) | Good (Marks 6-8) | Excellent (Marks 9-10) | Sub Total

(A) Process and Product Assessment (Convert above total marks out of 6 Marks)
1 | Relevance to the course | | | | |
2 | Literature Review/information collection | | | | |
3 | Completion of the Target as per project proposal | | | | |
4 | Analysis of Data and representation | | | | |
5 | Quality of the Prototype/Model | | | | |
6 | Report Preparation | | | | |

(B) Individual Presentation/Viva (Convert above total marks out of 4 Marks)
7 | Presentation | | | | |
8 | Viva | | | | |

(A) Process and Product Assessment (6 Marks) | (B) Individual Presentation/Viva (4 Marks) | Total Marks (10)

Comments/ suggestions about Team work/ Leadership/Inter-Personal communication


(If any)
……………………………………………………………………………………………
Name and Designation of the Teacher…………………………………….
Dated Signature……………………………………………………………

Log Book of the Student (Hourly Work Report)
Academic Year: 2023-2024
Name of Student: Sarang Jagdale
Title of the Project: Load the Dataset and Store it in Data-Frame using Pandas
Course: Big Data Analytics [BDA] Course Code: 22684
Semester: AN6I

Sr. No. | Date | Time | Work Done
1. | 03/01/24 | 4 PM - 5 PM | Study for selecting Micro project topic
2. | 05/01/24 | 4 PM - 5 PM | Discussion about selected Micro project topic with concerned Course Teacher
3. | 09/01/24 | 4 PM - 5 PM | Finalize and Study for selected topic
4. | 14/01/24 | 4 PM - 5 PM | Drafting Proposals
5. | 17/01/24 | 4 PM - 5 PM | Proposal submission
6. | 24/01/24 | 4 PM - 5 PM | Micro project Proposal Presentation
7. | 27/01/24 | 4 PM - 5 PM | Making Changes in presentation, if suggested by concerned teacher
8. | 03/02/24 | 4 PM - 5 PM | Study from different resources
9. | 13/02/24 | 4 PM - 5 PM | Collect information from studied resources
10. | 27/02/24 | 4 PM - 5 PM | Arrange collected information
11. | 05/03/24 | 4 PM - 5 PM | Executing Micro project
12. | 13/03/24 | 4 PM - 5 PM | Drafting Methodology
13. | 24/03/24 | 4 PM - 5 PM | Drafting Literature Review
14. | 01/04/24 | 4 PM - 5 PM | Drafting Result, Discussion
15. | 01/04/24 | 4 PM - 5 PM | Micro project Presentation
16. | 08/04/24 | 4 PM - 5 PM | Micro Project final submission

Ms.R.G. Waghmare

Rubrics Used for Evaluation of a Micro Project

Program/Semester/Master: AN6I Course Code: 22684
Course: Big Data Analytics [BDA]
Title of the Micro project: Load the Dataset and Store it in Data-Frame using Pandas
Course Outcome Achieved: -
C22684.a: Describe Big data and Big Data Analytics.
C22684.b: Apply the Big data and Big Data Analytics.

Assessment of micro project based on rubrics for performance in group activity: (Marks to be
given out of 06)
Assessment of performance in individual presentation/Viva of micro project: (Marks to be given
out of 04)
Scale used for assessment: Poor (1-3), Average (4-5), Good (6-8), Excellent (9-10)

A) Process and Product Assessment (A):

Rubric No. | Characteristics to be assessed | Marks Obtained out of 10
1 | Relevance to course |
2 | Literature review/information collection |
3 | Completion of target as per project proposal |
4 | Analysis of data and representation |
5 | Quality of prototype/model |
6 | Report Preparation |
Total Out of (60) |
Process and Product Assessment (A): Total Out of (06) |

B) Individual Presentation/Viva (B)

Roll No. | Enrollment No. | Name of Student | Rubric 7: Individual Presentation (Marks out of 10) | Rubric 8: Individual Viva (Marks out of 10) | Individual Presentation/Viva Total, addition of marks in Rubric 7 to 8 (Marks out of 20) | Individual Presentation/Viva, convert out of 08 marks into out of 4 (B) (Marks out of 04) | Total (A+B) (Marks out of 10)
2609 | 2101410027 | Sarang Jagdale | | | | |

Name & signature of Faculty

Evaluation Sheet for the Micro Project
Academic Year: 2023-2024 Name of Faculty: Ms.R.G.Waghmare
Course: Big Data Analytics [BDA]
Course Code: 22684
Semester: AN6I
Title of the Project: Load the Dataset and Store it in Data-Frame using Pandas

COs addressed by the Micro Project:


CO1. Understand Big Data and its analytics in the real world
CO2. Implement the Arrays and functions in Android.
CO3. Create event-based web forms using Android.
CO4. Create Menus and navigations in web Pages.

Major Learning Outcomes achieved by students by doing the project:

(a) Practical Outcomes:


a. Working in a team and as an individual.
b. Presenting information in proper sequence.
c. Developed skills to apply python concepts.
d. Improved analysis skill.

(b) Unit Outcomes in Cognitive domain:


1c. Develop Android to implement loop for solving the given iterative problem.
2b. Perform the specified string manipulation operation on the given String(s).
3a. Analyze the Big Data framework like Hadoop and NOSQL to efficiently store
and process Big Data to generate analytics
3b. Implement Big Data Activities using Hive
6c. Understand Big Data and its analytics in the real world

(c) Outcomes in Affective Domain:


a. Follow safety practices.
c. Demonstrate working as a leader/a team member.
d. Follow ethical Practices

Comments/Suggestions about team work/leadership/inter-personal communication


Roll No. | Student Name | Marks out of (6) for performance in group activity | Marks out of (4) for performance in oral/presentation | Total (10)
2609 | Sarang Jagdale | | |

(Dated Signature of Faculty)
