20ad41e2 - Data Science
Course Category: Professional Elective
Credits: 3
Course Type: Theory
Lecture-Tutorial-Practical: 3-0-0
Prerequisite: Transformation Techniques, Linear Algebra and Python Programming.
Sessional Evaluation: 40
Univ. Exam Evaluation: 60
Total Marks: 100
Objectives:
To impart knowledge on the basics of data science, data manipulation and
exploratory data analysis concepts that are vital for data science.
To develop skills for applying tools and techniques to analyze, visualize and
interpret data.
Course Outcomes:
Upon successful completion of the course, the students will be able to:
CO1 Demonstrate knowledge of the concepts of data science to perform
mathematical computations using efficient storage and data handling methods in
NumPy.
CO2 Apply data preparation and exploration methods using Pandas to perform
data manipulation.
CO3 Determine data transformation and string manipulation techniques.
CO4 Recognize methods for combining and merging datasets.
CO5 Create data visualizations using charts, plots and histograms to identify
trends, patterns and outliers in data using Matplotlib and Seaborn.
CO6 Construct methods to analyze and interpret time series data to extract
meaningful statistics.
Course Content
UNIT-I
Introduction to Data Science: Basic terminologies of data science, Types of data, Five
steps of data science, Arrays and vectorized computation using NumPy - The NumPy
ndarray: A multidimensional array object, Universal functions: Fast element-wise Array
functions, Array-oriented Programming with arrays, File input and output with arrays,
Linear algebra, pseudorandom number generation.
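A minimal sketch of the NumPy topics named in this unit. The sample values and variable names are made up for illustration and are not taken from the syllabus or the text book.

```python
import io
import numpy as np

# All values below are made-up sample data for illustration.
a = np.arange(6, dtype=float).reshape(2, 3)   # 2x3 ndarray: [[0, 1, 2], [3, 4, 5]]

squared = np.square(a)      # universal function applied element-wise
row_sums = a.sum(axis=1)    # array-oriented reduction, no Python loop

# File input and output: save to an in-memory buffer and load it back.
buffer = io.BytesIO()
np.save(buffer, a)
buffer.seek(0)
restored = np.load(buffer)

# Linear algebra: solve the system m @ x = b.
m = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([3.0, 5.0])
x = np.linalg.solve(m, b)

# Pseudorandom number generation with a fixed seed for repeatability.
rng = np.random.default_rng(seed=0)
noise = rng.standard_normal(3)
```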
UNIT-II
Data Exploration with Pandas: Process of exploring data, Pandas data structures –
Series, Data frame, Index objects; Essential functionality, Summarizing and computing
descriptive statistics - Correlation and covariance, Unique values, Value counts and
membership; Data loading, Storage, and file formats - Reading and writing data in text
format, Binary data formats, Interacting with web APIs, Interacting with databases.
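A minimal sketch of the Pandas topics named in this unit, assuming a small made-up table. The column names and values are hypothetical, and the text round trip uses an in-memory string instead of a real file.

```python
import io
import pandas as pd

# Hypothetical sample data; humidity is exactly twice temp by construction.
df = pd.DataFrame({
    "city": ["A", "B", "A", "C"],
    "temp": [30.0, 25.0, 32.0, 20.0],
    "humidity": [60.0, 50.0, 64.0, 40.0],
})

summary = df[["temp", "humidity"]].describe()   # descriptive statistics
corr = df["temp"].corr(df["humidity"])          # correlation
cov = df["temp"].cov(df["humidity"])            # covariance

cities = df["city"].unique()                    # unique values
counts = df["city"].value_counts()              # value counts
mask = df["city"].isin(["A", "C"])              # membership

# Reading and writing data in text format (CSV held in memory).
csv_text = df.to_csv(index=False)
roundtrip = pd.read_csv(io.StringIO(csv_text))
```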
UNIT-III
Data Cleaning, Preparation: Handling missing data, Data transformation, String
manipulation - String object methods, Regular expressions, Vectorized string functions
in Pandas.
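A minimal sketch of the cleaning topics named in this unit. The records, column names and the regular expressions are hypothetical and chosen only to show each technique once.

```python
import re
import numpy as np
import pandas as pd

# Hypothetical raw records with missing and untidy values.
raw = pd.DataFrame({
    "name": ["  alice ", "BOB", None, "carol"],
    "email": ["alice@example.com", "bob@example.org", "n/a", "carol@example.com"],
    "score": [90.0, np.nan, 75.0, 90.0],
})

# Handling missing data: fill the missing score with the column mean.
filled = raw["score"].fillna(raw["score"].mean())

# Data transformation: map numeric scores to labels (missing stays missing).
grade = raw["score"].map({90.0: "A", 75.0: "B"})

# Vectorized string functions: missing names are skipped automatically.
names = raw["name"].str.strip().str.title()

# Regular expression inside a vectorized method: capture the mail domain.
domains = raw["email"].str.extract(r"@([\w.]+)$", expand=False)

# Plain Python regular expression on a single string.
tokens = re.split(r"\s+", "data   cleaning  basics")
```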
UNIT-IV
Data Wrangling: Join, combine and reshape - Hierarchical indexing,
Combining and merging datasets, Reshaping and pivoting.
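A minimal sketch of the wrangling topics named in this unit, assuming two small made-up tables. The table and column names are hypothetical.

```python
import pandas as pd

# Hypothetical tables for illustration.
students = pd.DataFrame({"id": [1, 2, 3], "name": ["Ann", "Ben", "Cy"]})
marks = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "subject": ["math", "stats", "math", "stats"],
    "mark": [80, 70, 65, 90],
})

# Merging: a left join keeps the student who has no marks.
merged = pd.merge(students, marks, on="id", how="left")

# Combining: stack two frames row-wise.
more_students = pd.DataFrame({"id": [4], "name": ["Dee"]})
combined = pd.concat([students, more_students], ignore_index=True)

# Hierarchical indexing: a Series indexed by (id, subject).
indexed = marks.set_index(["id", "subject"])["mark"]

# Reshaping and pivoting: move the subject level into columns.
unstacked = indexed.unstack("subject")
wide = marks.pivot(index="id", columns="subject", values="mark")
```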
UNIT-V
Data Visualization with Matplotlib: Plotting and visualization - A brief Matplotlib API
primer, Plotting with Pandas and Seaborn, Other Python visualization tools; Data
aggregation and group operations - GroupBy mechanics, Data aggregation, Apply:
General split-apply-combine, Pivot tables and Cross-tabulation.
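A minimal sketch of the aggregation and group operations named in this unit, assuming a made-up sales table with hypothetical column names. It covers only the GroupBy half of the unit and leaves plotting out so that it runs without a display.

```python
import pandas as pd

# Hypothetical sales records for illustration.
sales = pd.DataFrame({
    "region": ["N", "N", "S", "S", "S"],
    "product": ["x", "y", "x", "x", "y"],
    "amount": [10, 20, 5, 15, 30],
})

grouped = sales.groupby("region")["amount"]

totals = grouped.sum()                  # GroupBy mechanics: one total per region
stats = grouped.agg(["mean", "max"])    # data aggregation with several functions

# Apply: split by region, compute a custom statistic, combine the results.
spread = grouped.apply(lambda s: s.max() - s.min())

# Pivot table and cross-tabulation over region and product.
table = sales.pivot_table(index="region", columns="product",
                          values="amount", aggfunc="sum")
counts = pd.crosstab(sales["region"], sales["product"])
```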
UNIT-VI
Time Series Analysis: Date and time data types and tools, Time series basics, Date
ranges, Frequencies, and shifting, Time zone handling, Periods and period arithmetic,
Resampling and frequency conversion - Downsampling, upsampling and interpolation,
Resampling with periods; Moving window functions.
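A minimal sketch of the time series topics named in this unit, assuming a made-up daily series. The dates, values and the choice of the "Asia/Kolkata" time zone are illustrative assumptions.

```python
import pandas as pd

# Hypothetical daily series for illustration.
idx = pd.date_range("2024-01-01", periods=6, freq="D")   # date range
ts = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], index=idx)

shifted = ts.shift(1)                    # shifting: values move one day later

# Time zone handling: attach UTC, then convert to another zone.
localized = ts.tz_localize("UTC").tz_convert("Asia/Kolkata")

# Periods and period arithmetic.
next_month = pd.Period("2024-01", freq="M") + 1
as_periods = ts.to_period("M")

# Downsampling: aggregate daily values into two-day bins.
two_day = ts.resample("2D").sum()

# Upsampling and interpolation: every other day back to daily values.
sparse = ts.iloc[::2]
daily = sparse.resample("D").interpolate()

# Moving window function: three-day rolling mean.
rolling = ts.rolling(window=3).mean()
```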
Text Books & Reference Books
TEXT BOOKS:
1. Wes McKinney, Python for Data Analysis, O'Reilly, 2nd Edition, 2017.
REFERENCE BOOKS:
1. Sinan Ozdemir, Principles of Data Science, Packt Publishers, 2nd Edition, 2018.
2. Rachel Schutt, Cathy O'Neil, Doing Data Science: Straight Talk from the
Frontline, O'Reilly, 2014.
E-Resources
1. https://swayam.gov.in/nd1_noc19_cs60/preview
2. https://towardsdatascience.com/
3. https://www.w3schools.com/datascience/
4. https://github.com/jakevdp/PythonDataScienceHandbook
5. https://www.kaggle.com
PO1 PO2 PO3 PO4 PO5 PO6 PO7 PO8 PO9 PO11 PO12
CO1 2 3
CO2 1
CO3 2
CO4 2
CO5 3 1
CO6 3 1