
8 LO5 Lect 1

Uploaded by Ali Azgar Katha

LO5

Prepare Data for Modeling


Topics covered in LO5 & LO6
• Introduction to various Python libraries
• Filtering and selecting data
• Concatenating and transforming data
• Data visualization best practices
• Visualizing data
• Creating a plot
• Creating statistical data graphics
• Performing basic math and linear algebra
• Correlation analysis
• Multivariate analysis
• Data sourcing via web scraping
Objective of today’s session
After attending this session, you should know about:
• Introduction to the pandas library
• Filtering and selecting
• Treating missing values
Coding languages for data science
• Python
• R
• Julia
• Go

• Python is a high-level, interpreted programming language that is useful for a wide variety of applications.
• It is one of the official programming languages used at Google.
Benefits of using Python
• It is easy to learn, and its code is highly human-readable.
• It has an extensive set of well-supported data science libraries.
• It has the largest user base of all data science languages.
• It is also used to build predictive web applications and for many other tasks beyond data science.
Python is a popular language
Python is the most popular language in data science
Why use Python for working with data
Python is useful for:
• Data science, data analytics, and data engineering
• Useful in both a professional and an academic environment
• Python is an open-source programming language
• Web development
• Application development
• Game development
Main Python libraries for data science
Introduction to the pandas library
• pandas is valued for fast data cleaning and preparation, powerful analysis capabilities, and ease of use for data visualization and machine learning.
• It is compatible with NumPy arrays and matrices.
• It is built on top of NumPy.
• The pandas counterparts of one-dimensional arrays and two-dimensional matrices are called Series and DataFrames.
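To make the array/Series and matrix/DataFrame correspondence concrete, here is a minimal sketch that wraps NumPy data in pandas objects and converts it back. The variable names and values are illustrative only.

```python
import numpy as np
import pandas as pd

# A one-dimensional NumPy array becomes a pandas Series (with labels added)
arr = np.array([10, 20, 30])
s = pd.Series(arr, index=["a", "b", "c"])

# A two-dimensional NumPy array (a matrix) becomes a pandas DataFrame
matrix = np.arange(6).reshape(2, 3)
df = pd.DataFrame(matrix, columns=["x", "y", "z"])

print(s)
print(df)

# Because pandas is built on NumPy, the data converts back to an array easily
back = df.to_numpy()
print(back.shape)  # (2, 3)
```

The Series and DataFrame add labels (the index and column names) on top of the plain NumPy values, which is what makes label-based selection possible.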

Jupyter shortcuts: https://yoursdata.net/jupyter-lab-shortcut-and-magic-functions-tips/


Indexing in pandas
• An index is a list of integers or labels used to identify rows and columns; index labels are usually, though not necessarily, unique.
To select data, we use:
• A set of square brackets, […..]
• The .loc[] indexer
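A minimal sketch of the two selection styles listed above, using a small hypothetical table. Square brackets are shown selecting columns by name; .loc[] is shown selecting by row label, or by a (row label, column name) pair.

```python
import pandas as pd

# Hypothetical example data; the row labels "r1".."r3" form the index
df = pd.DataFrame(
    {"name": ["Ana", "Ben", "Chen"], "score": [88, 92, 79]},
    index=["r1", "r2", "r3"],
)

# Square brackets with one column name return that column as a Series
scores = df["score"]

# Square brackets with a list of names return a DataFrame of those columns
subset = df[["name", "score"]]

# .loc[] selects by label: a whole row, or a single (row, column) cell
row_r2 = df.loc["r2"]
ben_score = df.loc["r2", "score"]
print(ben_score)  # 92
```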
Introducing the pandas library
• A DataFrame object is essentially a spreadsheet of rows and columns.
• Individually, its rows and columns are Series objects in the pandas library.
• DataFrames are indexable.
• A Series object is a single row or column, and it is always indexed.
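The claim that each row and each column of a DataFrame is a Series can be checked directly. This sketch uses a small hypothetical table of readings; note how each extracted Series carries its own index.

```python
import pandas as pd

# Hypothetical 2x2 table of readings, indexed by day
df = pd.DataFrame({"temp": [21.5, 23.0], "humidity": [40, 55]},
                  index=["mon", "tue"])

col = df["temp"]      # one column
row = df.loc["mon"]   # one row

print(type(col).__name__)  # Series
print(type(row).__name__)  # Series

# Each Series keeps an index: the column is indexed by the row labels,
# while the row is indexed by the column names
print(list(col.index))  # ['mon', 'tue']
print(list(row.index))  # ['temp', 'humidity']
```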
Comparison operators in pandas
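A minimal sketch of comparison operators used for filtering, with hypothetical price data. A comparison on a column yields a boolean Series (a mask), and passing that mask inside square brackets keeps only the matching rows.

```python
import pandas as pd

# Hypothetical item prices
df = pd.DataFrame({"item": ["pen", "book", "lamp", "mug"],
                   "price": [2.5, 15.0, 30.0, 8.0]})

# A comparison on a column returns a boolean Series, one value per row
mask = df["price"] > 10
print(mask.tolist())  # [False, True, True, False]

# Using the mask inside square brackets keeps only the rows where it is True
expensive = df[mask]

# Combine conditions with & (and), | (or), ~ (not); each condition needs parentheses
mid_range = df[(df["price"] >= 5) & (df["price"] <= 20)]
print(mid_range["item"].tolist())  # ['book', 'mug']
```

The parentheses matter: & and | bind more tightly than > and <= in Python, so leaving them out raises an error or compares the wrong values.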
Code demonstration
• Introduction to Jupyter Notebook
• Plain indexing
• Data slicing
• Arithmetic comparisons
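The slicing and arithmetic-comparison steps of the demonstration might look like the following sketch; the data is hypothetical and the live demo may have used different examples. The key difference shown is that positional slices exclude the stop position, while label slices include the stop label.

```python
import pandas as pd

# Hypothetical data with string row labels
df = pd.DataFrame({"value": [10, 20, 30, 40, 50]},
                  index=["a", "b", "c", "d", "e"])

# .iloc slices by integer position; the stop position is excluded
by_position = df.iloc[1:3]    # rows 'b' and 'c'

# .loc slices by label; the stop label is included
by_label = df.loc["b":"d"]    # rows 'b', 'c' and 'd'

# Arithmetic on a column is element-wise, and the result can be compared
doubled = df["value"] * 2
over_50 = df[doubled > 50]    # rows where 2 * value > 50

print(over_50.index.tolist())  # ['c', 'd', 'e']
```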

Modules vs. packages: https://ajaytech.co/2020/04/21/modules-vs-packages-vs-libraries-vs-frameworks/
Random seed: https://www.youtube.com/watch?v=8B1z3xwNy2s
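A minimal NumPy sketch of what a random seed does, independent of the linked video: generators started from the same seed produce the same "random" numbers, which makes results reproducible. The seed value 42 is arbitrary.

```python
import numpy as np

# Two generators created with the same seed produce identical sequences
rng_a = np.random.default_rng(seed=42)
rng_b = np.random.default_rng(seed=42)

sample_a = rng_a.integers(0, 100, size=5)
sample_b = rng_b.integers(0, 100, size=5)
print(np.array_equal(sample_a, sample_b))  # True

# The older global-seed style, still common in tutorials, behaves the same way
np.random.seed(42)
x1 = np.random.rand(3)
np.random.seed(42)
x2 = np.random.rand(3)
print(np.array_equal(x1, x2))  # True
```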
Summary
• Introduction to the pandas library
• Filtering and selecting
• Treating missing values
Himanshu Patel, Instructor
Saskatchewan Polytechnic
email: patelh@saskpolytech.ca
Mining building, Saskatoon