Lecture 01
Firoz Anwar
Source: https://www.edureka.co/blog/what-is-data-science/
Source: https://www.dataquest.io/blog/what-is-data-science/
Source: https://data-flair.training/blogs/data-science-applications/
§ Time Series Analysis
§ A method of analysing time-series data to extract meaningful patterns and
characteristics of the data
§ Time series often show seasonality, i.e. variations specific to a particular
time frame. For example, if you look at the sales of a woollen jacket
over time, you will invariably find higher sales in winter.
§ Trend
A general direction in which
something is developing or changing.
§ Seasonality
Predictable pattern that recurs or
repeats over regular intervals,
typically within a year or less.
§ Irregular fluctuation
§ Variations that occur due to sudden
causes and are unpredictable.
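The three components above are often combined in an additive model: value = trend + seasonality + irregular fluctuation. The sketch below builds a synthetic monthly series in exactly that way (all numbers are hypothetical) and recovers the trend with a centred 12-month moving average. This is one common way to estimate a trend, not necessarily the one used in this course.

```python
import math
import random

random.seed(0)  # makes the irregular fluctuations reproducible

# Hypothetical monthly series: linear trend + yearly seasonality + noise
n_months = 48
trend = [100 + 2 * t for t in range(n_months)]
season = [10 * math.sin(2 * math.pi * t / 12) for t in range(n_months)]
noise = [random.gauss(0, 1) for _ in range(n_months)]
series = [tr + s + e for tr, s, e in zip(trend, season, noise)]

def centred_moving_average(values, period=12):
    """Estimate the trend with a centred (2 x period) moving average."""
    half = period // 2
    estimate = [None] * len(values)  # undefined near both ends
    for t in range(half, len(values) - half):
        window = values[t - half:t + half + 1]
        # the two end points get half weight so the window is centred on t
        total = sum(window[1:-1]) + 0.5 * (window[0] + window[-1])
        estimate[t] = total / period
    return estimate

trend_hat = centred_moving_average(series)
# What remains after removing the trend is seasonality + irregular fluctuation
detrended = [y - tr for y, tr in zip(series, trend_hat) if tr is not None]
```

Averaging over one full seasonal cycle cancels the seasonal swing, so the moving average tracks the underlying trend while smoothing the noise.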
Source: https://towardsdatascience.com/
§ Typical Steps
§ Understanding the Data
§ Hypothesis
§ Feature Extraction
§ Exploratory Data Analysis (EDA)
§ Forecasting with Multiple Models
§ Naive approach, moving average, simple exponential smoothing, Holt's linear
trend model, Autoregressive Integrated Moving Average (ARIMA), SARIMAX,
etc.
§ Model Evaluation
§ Mean Squared Error (MSE), Root Mean Squared Error (RMSE), etc.
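The three simplest forecasters in the list, and the two error measures, each fit in a few lines. The sketch below uses a short hypothetical series; the window size of 3 and the smoothing factor alpha = 0.5 are arbitrary illustrative choices.

```python
import math

def naive_forecast(history):
    # Naive approach: the next value equals the last observed value
    return history[-1]

def moving_average_forecast(history, window=3):
    # Moving average: mean of the last `window` observations
    recent = history[-window:]
    return sum(recent) / len(recent)

def simple_exponential_smoothing(history, alpha=0.5):
    # SES: level = alpha * observation + (1 - alpha) * previous level
    level = history[0]
    for y in history[1:]:
        level = alpha * y + (1 - alpha) * level
    return level  # used as a flat forecast for the next step

def mse(actual, predicted):
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    return math.sqrt(mse(actual, predicted))

# Hypothetical series; one-step-ahead forecasts for the last 3 points
series = [12, 14, 13, 15, 16, 18, 17, 19]
train_end = 5
actual, naive_pred = [], []
for t in range(train_end, len(series)):
    actual.append(series[t])
    naive_pred.append(naive_forecast(series[:t]))
print("Naive MSE:", mse(actual, naive_pred))
print("Naive RMSE:", rmse(actual, naive_pred))
```

RMSE is the square root of MSE, so it is expressed in the same units as the data, which makes it easier to interpret when comparing models.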
“Around 25-50 billion devices are
expected to be connected to the
Internet by 2020.” (Mahdavinejad et
al. 2017)
Source: https://www.channelfutures.com/
§ HTTP(S)/JSON
§ Plain text
§ Binary data
§ XML
§ Proprietary
§ Stream Data
§ Periodic data collection from devices
§ API endpoints
§ Window-based descriptive statistics
§ Seasonal pattern
§ Trend pattern
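Window-based descriptive statistics summarise a stream one fixed-size window at a time, so the full history never has to be stored. A minimal sketch, assuming readings arrive one at a time from a device (the readings are hypothetical) and a sliding window of 4 samples:

```python
from collections import deque
import statistics

def sliding_window_stats(stream, size=4):
    """Yield (mean, min, max, stdev) for every full sliding window over the stream."""
    window = deque(maxlen=size)  # the oldest reading drops out automatically
    for reading in stream:
        window.append(reading)
        if len(window) == size:
            yield (statistics.mean(window), min(window), max(window),
                   statistics.stdev(window))

readings = [20.1, 20.4, 21.0, 22.3, 22.1, 21.7]  # e.g. temperature samples
for mean, lo, hi, sd in sliding_window_stats(readings):
    print(f"mean={mean:.2f} min={lo} max={hi} sd={sd:.2f}")
```

Because it is a generator, the same function works on an unbounded stream as well as on a list.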
Source: https://www.i-scoop.eu/internet-of-things-guide/
CGM (Continuous Glucose Monitoring)
§ “Glucose Concentration can be Predicted Ahead in Time From
Continuous Glucose Monitoring Sensor Time-Series” by Sparacino et
al.
§ Parameter estimation
§ Weighted Linear Regression on
sampling window
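The parameter-estimation step can be pictured as fitting a straight line to the most recent window of CGM samples, with newer samples weighted more heavily, and then extrapolating the line forward. This is an illustrative sketch, not the exact method of Sparacino et al.: the exponential forgetting weights (factor `mu`), the 5-minute sampling interval, the 30-minute horizon and the glucose values are all assumptions.

```python
def weighted_linear_fit(times, values, weights):
    """Closed-form weighted least squares for y = a + b * t."""
    sw = sum(weights)
    t_bar = sum(w * t for w, t in zip(weights, times)) / sw
    y_bar = sum(w * y for w, y in zip(weights, values)) / sw
    cov = sum(w * (t - t_bar) * (y - y_bar)
              for w, t, y in zip(weights, times, values))
    var = sum(w * (t - t_bar) ** 2 for w, t in zip(weights, times))
    slope = cov / var
    intercept = y_bar - slope * t_bar
    return intercept, slope

def predict_ahead(times, values, horizon, mu=0.8):
    # Hypothetical forgetting factor: the newest sample gets weight 1,
    # the one before it mu, then mu**2, and so on
    n = len(times)
    weights = [mu ** (n - 1 - i) for i in range(n)]
    a, b = weighted_linear_fit(times, values, weights)
    return a + b * (times[-1] + horizon)

# Hypothetical window of glucose readings (mg/dL), one every 5 minutes
times = [0, 5, 10, 15, 20, 25]
glucose = [110, 114, 119, 123, 128, 132]
print(round(predict_ahead(times, glucose, horizon=30), 1))
```

Sliding the window forward and refitting at every new sample gives a continuously updated prediction.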
Source: https://towardsdatascience.com/
§ Data analysis is the heart of data science
§ Combining various kinds of analytics, including advanced analytics, shows
the bigger picture.
Introduction to Python Programming
Why Python Programming
§ Why programming?
- Programming is a tool to realise your data analysis ideas
- Data science relies heavily on programming (why?)
§ Why Python programming?
- Interpreted programming language
- Can run interactively (in a terminal or in IPython)
- Other conveniences: Jupyter Notebook, Python Markdown, ...
- Easy and flexible syntax
- Powerful third-party package support
- Convenient interface with other languages such as C/C++
The classic Hello, World! program
Python version
print("Hello, World!")
Java version
public class HelloWorldApp {
    public static void main(String[] args) {
        System.out.println("Hello, World!");
    }
}
C++ version
#include <iostream>
int main() {
    std::cout << "Hello, World!" << std::endl;
    return 0;
}
Basics of Python Programming
Output and input
## Hello, World
## I love programming
## I love Python
or
## Hello, World
## I love programming
## I love Python
## "Hello, World"
## Hello, World
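The outputs above can be produced in several ways. One plausible way in Python 3 (an assumption, not necessarily the lecture's original code) is three separate `print` calls, or a single call on a string with embedded newline characters; wrapping the text in single quotes lets the double quotes themselves be printed.

```python
# Three separate print calls, one line each
print("Hello, World")
print("I love programming")
print("I love Python")

# or: one print call on a string containing newline characters
message = "Hello, World\nI love programming\nI love Python"
print(message)

# Single quotes around the text let double quotes be printed literally
print('"Hello, World"')
print("Hello, World")
```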
s = "Hello"
t = "World"
print(s, t)
## Hello World
Keyboard input method
Read a string
Read an integer
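In Python 3, `input()` always returns the typed text as a string, so reading a number needs an explicit conversion. A minimal sketch; the helper names `read_string` and `read_int` are hypothetical.

```python
def read_string(prompt="Enter a string: "):
    # input() returns whatever the user typed, as a str
    return input(prompt)

def read_int(prompt="Enter an integer: "):
    # int() converts the text; it raises ValueError for non-numeric input
    return int(input(prompt))

# Usage (interactive):
# name = read_string("Your name: ")
# age = read_int("Your age: ")
```

In Python 2, which the original slides target, `raw_input()` played the role of Python 3's `input()`.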
a = 5 // 2   # a = 2 (floor division)
b = 5 / 2    # b = 2.5 (true division in Python 3)
§ Relational operators (>, >=, <, <=, ==, !=)
- Used to compare values, e.g. numbers and strings
- == (equality) vs = (assignment)
if a == b:
    print("a equals b")
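A few comparisons make the operators concrete. Every relational expression evaluates to `True` or `False`; strings are compared character by character (lexicographically) and case-sensitively. The values below are illustrative.

```python
a, b = 3, 5
print(a < b, a >= b, a != b)   # True False True

# Strings compare lexicographically, character by character
print("apple" < "banana")      # True
print("Python" == "python")    # False: comparison is case-sensitive

# = assigns, == compares
c = a            # assignment: c now refers to 3
print(c == a)    # True
```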
C = float(input("Enter a Celsius value: "))
F = 9.0 / 5 * C + 32
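Wrapping the formula F = 9/5 × C + 32 in a function makes it easy to check against known values. In Python 2 the `9.0` was needed to avoid integer division; in Python 3, `/` always performs true division.

```python
def celsius_to_fahrenheit(c):
    # F = 9/5 * C + 32
    return 9 / 5 * c + 32

print(celsius_to_fahrenheit(100))  # 212.0 (boiling point of water)
print(celsius_to_fahrenheit(-40))  # -40.0 (the two scales meet here)
```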
mark = int(input("Enter your mark: "))
if mark < 50:
    print("Fail")
else:
    print("Pass")
Example: grade calculator II
mark = int(input("Enter your mark: "))
if mark < 50:
    print("Fail")
else:
    if mark < 65:
        print("Pass")
    else:
        if mark < 75:
            print("Credit")
        else:
            if mark < 85:
                print("D")
            else:
                print("HD")
Example: grade calculator II (using elif statement)
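With `elif`, the same thresholds as the nested version can be written as one flat chain. A sketch, wrapped in a function (the name `grade` is hypothetical) so that it can be called for several marks:

```python
def grade(mark):
    # Conditions are checked top to bottom; the first true branch wins
    if mark < 50:
        return "Fail"
    elif mark < 65:
        return "Pass"
    elif mark < 75:
        return "Credit"
    elif mark < 85:
        return "D"
    else:
        return "HD"

for m in (45, 50, 70, 80, 90):
    print(m, grade(m))
```

Because each branch is only reached when all earlier conditions failed, `mark < 65` here already implies `mark >= 50`.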
s = input("Enter an 8-character string: ")
while len(s) != 8:
    print("Input error")
    s = input("Enter an 8-character string: ")
Example: range-based loop
Loops can be used to go through all items in a list
# print all items in the list
xlist = [1, 3, 5, 7, 9]
for x in xlist:
    print(x, end=" ")
print()
## 1 3 5 7 9
# sum over all items in the list
xlist = [1, 3, 5, 7, 9]
total = 0
for x in xlist:
    total = total + x
print("sum =", total)
## sum = 25
How about searching for a number in the list?
Or finding the maximum/minimum value?
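Both questions follow the same loop pattern: visit each item once and keep track of what you have seen so far. A sketch with hypothetical helper names (`contains`, `find_max`); a minimum is found the same way with `<` instead of `>`.

```python
def contains(xlist, target):
    # Linear search: check each item until a match is found
    for x in xlist:
        if x == target:
            return True
    return False

def find_max(xlist):
    # Assume the first item is the largest, then update while scanning
    largest = xlist[0]
    for x in xlist[1:]:
        if x > largest:
            largest = x
    return largest

xlist = [1, 3, 5, 7, 9]
print(contains(xlist, 7), contains(xlist, 4))  # True False
print(find_max(xlist))                          # 9
```

Python's built-ins do the same jobs directly: `7 in xlist`, `max(xlist)` and `min(xlist)`.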
Specifying range
range(stop)
range(start, stop[, step])
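The `stop` value is never included, and `step` may be negative to count down. In Python 3, `range` produces its values lazily, so `list()` is used here to display them.

```python
print(list(range(5)))          # [0, 1, 2, 3, 4]
print(list(range(2, 6)))       # [2, 3, 4, 5]
print(list(range(1, 10, 2)))   # [1, 3, 5, 7, 9]
print(list(range(10, 0, -3)))  # [10, 7, 4, 1]
```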
Use a function
# xlist holds 0, 1, 2, ..., 9
xlist = range(10)
# the following code prints the sum of xlist
total = 0
for x in xlist:
    total = total + x
print("sum =", total)
## sum = 45
# xlist holds 1, 3, 5, 7, 9
xlist = range(1, 10, 2)
# the following code prints the sum of xlist
total = 0
for x in xlist:
    total = total + x
print("sum =", total)
## sum = 25
Code reuse
# define the sum function
def sumFunc(xlist):
    total = 0
    for x in xlist:
        total = total + x
    print("sum =", total)
# xlist holds 0, 1, 2, ..., 9
xlist = range(10)
# the following code prints the sum of xlist
sumFunc(xlist)
## sum = 45
# xlist holds 1, 3, 5, 7, 9
xlist = range(1, 10, 2)
# the following code prints the sum of xlist
sumFunc(xlist)
## sum = 25
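A function that returns its result, instead of printing it, is even more reusable: the caller decides whether to print the value, store it, or use it in further calculations. A sketch (the name `sum_func` is hypothetical):

```python
def sum_func(xlist):
    # Return the total instead of printing it, so callers can reuse it
    total = 0
    for x in xlist:
        total = total + x
    return total

evens_total = sum_func(range(0, 10, 2))       # 0 + 2 + 4 + 6 + 8
print("sum =", evens_total)                   # sum = 20
print(sum_func(range(10)) == sum(range(10)))  # True: matches the built-in sum()
```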
Summary
List
- A list is a collection of values
food = ["chicken", "beef", "egg", "milk"]
- '[' and ']' are used to define the list
- Items are separated by commas; a list item can be any object, even
another list
Lists behave like arrays in C++ and Java and follow similar (zero-based)
indexing rules.
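Indexing starts at 0 as in C++ and Java, but Python lists add two conveniences those arrays lack: negative indices, which count from the end, and slices, which extract a sub-list.

```python
food = ["chicken", "beef", "egg", "milk"]
print(food[0])      # chicken
print(food[-1])     # milk
print(food[1:3])    # ['beef', 'egg']  (index 1 up to, not including, 3)

nested = [1, "two", [3, 4]]   # items can be of any type, even another list
print(nested[2][0])           # 3
```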
List operations
Create a list
Search a list
x = ["hello", "world"]
# return the position of "world" in the list
pos = x.index("world") # pos = 1
# raises a ValueError if the item is not found
pos = x.index("work")  # ValueError: "work" is not in the list
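Since `index()` raises an exception for a missing item, a search usually either checks membership first or catches the `ValueError`. A sketch of both; using `-1` as a "not found" marker is an illustrative choice, not a Python convention for lists.

```python
x = ["hello", "world"]

# Option 1: test membership with `in` first
if "work" in x:
    print(x.index("work"))
else:
    print("not found")

# Option 2: catch the ValueError raised by index()
try:
    pos = x.index("work")
except ValueError:
    pos = -1   # hypothetical marker meaning "not found"
print(pos)     # -1
```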
Modify a list
Initial: x = [0,1,2] (each line below starts from this initial list)
x = x + [3] # x = [0,1,2,3]
x.append(3) # x = [0,1,2,3]
x = x + [3,4,5] # x = [0,1,2,3,4,5]
x.extend([3,4,5]) # x = [0,1,2,3,4,5]
x.insert(1,5) # x = [0,5,1,2]
x.insert(0,3) # x = [3,0,1,2]
x = [3] + x # x = [3,0,1,2]
x = [3,4] + x # x = [3,4,0,1,2]
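The operations above give the same contents but behave differently: `append`, `extend` and `insert` change the existing list in place, whereas `+` builds a new list. The difference shows up when two names refer to the same list:

```python
x = [0, 1, 2]
y = x             # y is another name for the same list object
x.append(3)       # modifies the list in place
print(y)          # [0, 1, 2, 3]: y sees the change

x = x + [4]       # builds a new list and rebinds x to it
print(x, y)       # [0, 1, 2, 3, 4] [0, 1, 2, 3]
```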
Iteration and List Comprehension
List items can be iterated in a loop
for x in range(5):
    print(x)
## 0
## 1
## 2
## 3
## 4
for x in [0,1,2,3,4]:
    print(x)
## 0
## 1
## 2
## 3
## 4
Return the list index in a loop
## 0 Hello
## 1 world
## 2 Python
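The built-in `enumerate` yields (index, item) pairs, which produces output like the lines above without maintaining a counter by hand. The word list is inferred from that output.

```python
words = ["Hello", "world", "Python"]
for i, word in enumerate(words):   # enumerate yields (index, item) pairs
    print(i, word)
```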
List comprehension offers easy and natural ways to construct lists
## [0, 1, 4, 9, 16]
## [0, 2, 4, 6, 8]
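Comprehensions that produce the two lists above could look as follows; the second list can equally be written with a filter or as `[2 * x for x in range(5)]`.

```python
squares = [x ** 2 for x in range(5)]
evens = [x for x in range(10) if x % 2 == 0]
print(squares)   # [0, 1, 4, 9, 16]
print(evens)     # [0, 2, 4, 6, 8]
```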
Tuples
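A tuple is like a list written with parentheses, except that it cannot be changed after it is created. A minimal sketch of creating, unpacking and (failing to) modify one; the values are illustrative.

```python
point = (3, 4)             # a tuple of two values
x, y = point               # tuple unpacking
print(x, y)                # 3 4
try:
    point[0] = 10          # tuples are immutable
except TypeError as err:
    print("TypeError:", err)
```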
Modify a dictionary
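Dictionaries are modified through their keys: assigning to a new key adds an entry, assigning to an existing key overwrites it, and `del`, `pop` and `update` remove or merge entries. A sketch with hypothetical data:

```python
ages = {"Alice": 30, "Bob": 25}
ages["Carol"] = 41                # add a new key
ages["Bob"] = 26                  # overwrite an existing key
del ages["Alice"]                 # remove a key (KeyError if absent)
removed = ages.pop("Dave", None)  # pop with a default never raises
ages.update({"Eve": 35})          # merge in another dictionary
print(ages)                       # {'Bob': 26, 'Carol': 41, 'Eve': 35}
```

Since Python 3.7, dictionaries keep their keys in insertion order, which is why the printed order is predictable.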
fid = open("mytext.txt")
# print each line of mytext.txt in a loop
for line in fid:
    print(line)
fid.close()
## First line
##
## Second line
##
## Three lines in total
fid = open("mytext.txt")
# print each line of mytext.txt in a loop; strip() removes the trailing newline
for line in fid:
    print(line.strip())
fid.close()
## First line
## Second line
## Three lines in total
Write to files
fid.write(line)
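To write, the file must be opened in write mode (`"w"`), and `write()` does not add a newline on its own. A sketch that writes a few lines and reads them back, using a `with` block so the file is closed automatically; the helper names and the temporary file location are illustrative assumptions.

```python
import os
import tempfile

def write_lines(path, lines):
    # "w" creates the file, or empties it if it already exists
    with open(path, "w") as fid:
        for line in lines:
            fid.write(line + "\n")   # add the newline explicitly

def read_lines(path):
    with open(path) as fid:
        return [line.rstrip("\n") for line in fid]

path = os.path.join(tempfile.gettempdir(), "output.txt")
write_lines(path, ["First line", "Second line"])
print(read_lines(path))   # ['First line', 'Second line']
```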