Unit 4
Internet of Things
Creating an interactive environment
Network of devices connected together
IOT: Remote Data Logging
Collect data from the devices in the network
Send the data to a server/remote machine
Control the network remotely
System Overview:
A network of temperature and humidity sensors connected with a Raspberry Pi
Read data from the sensor
Send it to a Server
Save the data in the server
IOT: Remote Data Logging (contd.)
Requirements
DHT Sensor
4.7K ohm resistor
Jumper wires
Raspberry Pi
DHT Sensor
Digital Humidity and Temperature Sensor (DHT)
PIN 1, 2, 3, 4 (from left to right)
PIN 1 - 3.3V-5V power supply
PIN 2 - Data
PIN 3 - Not connected
PIN 4 - Ground
Sensor - Raspberry Pi Interface
Connect pin 1 of the DHT sensor to the 3.3V pin of the Raspberry Pi
Socket Programming:
Creates a two-way communication between two nodes in a network
The nodes are termed Server and Client
Server performs the task/service requested by the client
Creating a socket:
s = socket.socket (SocketFamily, SocketType, Protocol=0)
SocketFamily can be AF_UNIX or AF_INET
SocketType can be SOCK_STREAM or SOCK_DGRAM
The Protocol argument defaults to 0
Sending Data to a Server (contd..)
Server:
s = socket.socket()          # creating a socket object
host = socket.gethostname()  # local machine name/address

Client:
s = socket.socket()          # creating a socket object
host = socket.gethostname()  # getting local machine name
port = 12345                 # assigning a port
s.connect((host, port))
print(s.recv(1024))
s.close()
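The two fragments above can be assembled into one complete, runnable Python 3 sketch. The server is run in a background thread here purely so the example is self-contained, localhost stands in for `socket.gethostname()`, and the greeting text is our own illustration; port 12345 follows the slides:

```python
import socket
import threading

HOST = "127.0.0.1"   # localhost; the slides use socket.gethostname()
PORT = 12345         # port assigned in the slides
ready = threading.Event()

def run_server():
    # Server: bind, listen, and send a greeting to the first client
    srv = socket.socket()  # TCP socket (AF_INET, SOCK_STREAM by default)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((HOST, PORT))
    srv.listen(1)
    ready.set()                      # tell the client it is safe to connect
    conn, addr = srv.accept()
    conn.send(b"Hello from server")
    conn.close()
    srv.close()

server_thread = threading.Thread(target=run_server)
server_thread.start()

# Client: connect and receive up to 1024 bytes
ready.wait()
s = socket.socket()
s.connect((HOST, PORT))
received = s.recv(1024)
s.close()
server_thread.join()
print(received.decode())
```

In the slides the server and client run on two different machines, so no thread is needed there; the structure of each half is otherwise the same.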
Sending Data to a Server (contd..)
def sensordata():
    GPIO.setmode(GPIO.BOARD)
    GPIO.setwarnings(False)
    sensor = Adafruit_DHT.AM2302
    humidity, temperature = Adafruit_DHT.read_retry(sensor, 17)
    return (humidity, temperature)

while True:
    data, address = sock.recvfrom(4096)
    with open("DataLog.txt", "a") as f:
        mess = data.decode()
        f.write(mess)
        print(mess)
Result
The client takes a reading from the sensor and sends it to the server.
The server receives the data from the client and saves it in a text file, DataLog.txt.
Implementation of IoT with Raspberry Pi: Part 3
Internet of Things
Creating an interactive environment
Network of devices connected together
IOT: Remote Data Logging
Collect data from the devices in the network
Send the data to a server/remote machine
Processing the data
Respond to the network
IOT: Remote Data Logging
System Overview:
A network of temperature and humidity sensors connected with a Raspberry Pi
Read data from the sensor
Send it to a Server
Save the data in the server
Data Splitting
Plot the data
IOT: Remote Data Logging (contd.)
Requirements
DHT Sensor
4.7K ohm resistor
Jumper wires
Raspberry Pi
DHT Sensor
Digital Humidity and Temperature Sensor (DHT)
PIN 1, 2, 3, 4 (from left to right)
PIN 1 - 3.3V-5V power supply
PIN 2 - Data
PIN 3 - Not connected
PIN 4 - Ground
Sensor - Raspberry Pi Interface
Connect pin 1 of the DHT sensor to the 3.3V pin of the Raspberry Pi
Connect pin 2 of the DHT sensor to any input pin of the Raspberry Pi; here we have used pin 11
Connect pin 4 of the DHT sensor to the ground pin of the Raspberry Pi
Read Data from the Sensor
Use the Adafruit library for the DHT22 sensor to read the sensor data
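A minimal read sketch using the Adafruit_DHT library, assuming the data pin is wired to GPIO17 (board pin 11, as in this deck). Since the library and sensor only exist on a Raspberry Pi, the sketch falls back to a stubbed reading elsewhere; the fallback values 55.0/24.5 are invented for illustration:

```python
try:
    import Adafruit_DHT  # present on the Raspberry Pi

    def read_dht():
        # AM2302/DHT22 on GPIO17; read_retry retries on checksum failures
        humidity, temperature = Adafruit_DHT.read_retry(Adafruit_DHT.AM2302, 17)
        return humidity, temperature
except ImportError:
    def read_dht():
        # Stub for machines without the sensor (values are made up)
        return 55.0, 24.5

h, t = read_dht()
print("humidity=%s%% temperature=%sC" % (h, t))
```

On real hardware `read_retry` can still return `None` values if every retry fails, so production code should check for that before logging.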
Sending Data to a Server
Sending data to server using socket programming
Create a client and server
Establish connection between the two
Send data from the client to the server
Save the data in a file
Data Processing
Source: MATPLOTLIB, John Hunter, Darren Dale, Eric Firing, Michael Droettboom and the Matplotlib development team, 2012 - 2016
Data Processing (contd..)
Data Processing (contd..)
Basic Plot:
Data Processing (contd..)
Sending Data to a Server (contd..)
Client:
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)  # create UDP socket
server_address = ('10.14.3.194', 10001)

def sensordata():
    GPIO.setmode(GPIO.BOARD)
    GPIO.setwarnings(False)
    sensor = Adafruit_DHT.AM2302
    humidity, temperature = Adafruit_DHT.read_retry(sensor, 17)
    return (humidity, temperature)

try:
    while True:
        h, t = sensordata()
        message = str(h) + ',' + str(t)  # send data
        print('sending "%s"' % message)
        sock.sendto(message.encode(), server_address)
finally:
    print('closing socket')
    sock.close()
Sending Data to a Server (contd..)
Server:
plt.ion()
fig = plt.figure(num=1, figsize=(6, 6))

def coverage_plot(data, i):
    hum = data.split(",")[0]
    tem = data.split(",")[1]
    print('temp=' + str(tem) + ' iter=' + str(i))
    plt.title('IoT Temperature and Humidity Monitor')
    ax = fig.add_subplot(121)
    ax.plot(tem, i, c='r', marker=r'$\Theta$')
    plt.xlabel('Temp ($^0 C$)')
    ax.grid()
    ax = fig.add_subplot(122)
    ax.plot(hum, i, c='b', marker=r'$\Phi$')
    plt.xlabel('Humidity ($\%$)')
    ax.grid()
    fig.show()
    fig.canvas.draw()

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# Bind the socket to the port
server_address = ('10.14.3.194', 10001)
sock.bind(server_address)
Sending Data to a Server (contd..)
Server:
i = 0
while True:
    data, address = sock.recvfrom(4096)
    with open("DataLog.txt", "a") as f:
        mess = data.decode()
        f.write(mess)
        coverage_plot(mess, i)
        print(mess)
        i += 1
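The send/receive pair above can be exercised end-to-end without the sensor or the plotting. This self-contained Python 3 sketch substitutes fixed stand-in values for the DHT reading, uses localhost with an OS-assigned port instead of the slides' ('10.14.3.194', 10001), and logs one datagram to DataLog.txt:

```python
import socket

# Server side: a bound UDP socket (port 0: let the OS pick a free port)
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("127.0.0.1", 0))
server_address = sock.getsockname()   # the slides use ('10.14.3.194', 10001)

# Client side: send one "humidity,temperature" reading (fixed stand-in values)
client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
h, t = 55.0, 24.5
message = str(h) + "," + str(t)
client.sendto(message.encode(), server_address)

# Server side: receive the datagram and append it to the log file
data, address = sock.recvfrom(4096)
mess = data.decode()
with open("DataLog.txt", "a") as f:
    f.write(mess + "\n")
print(mess)

client.close()
sock.close()
```

Because UDP is connectionless, the server never accepts a connection; it simply reads whatever datagram arrives, which is why the slides' client can start sending before the server logic runs.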
Output
The reading from the sensor is sent to the server and saved in a text file.
Two different plots for temperature and humidity data.
Output
Software-Defined Networking – Part I
Restructuring the Current Network Infrastructure
Overview of Current Network
When data is sent, the OSPF protocol routes the packet through the switches along whichever route OSPF finds to be best.
Each switch is a vertically integrated box, bundling applications, an operating system, and hardware.
Switches forward traffic in a distributed manner; they do not have a global view of the network.
Limitations in Current Network
Each device bundles the full stack: applications (routing, mobility management, etc. – thousands of lines of code), an operating system, and specialized, cost-expensive packet-forwarding hardware (millions of gates, ~10GB RAM).
These networks need to be made efficient by overcoming these challenges and limitations.
Current Network to SDN
The operating system (control plane) is separated from the forwarding hardware (data plane).
Origin of SDN
2006: At Stanford university, a team proposes a clean-slate security
architecture (SANE) to control security policies in a centralized
manner instead of doing it at edges.
2008: The idea of software-defined network is originated from
OpenFlow project (ACM SIGCOMM 2008).
2009: Stanford publishes OpenFlow V1.0.0 specs.
June 2009: Nicira network is founded.
March 2011: Open Networking Foundation is formed.
Oct 2011: First Open Networking Summit. Many industries (Juniper, Cisco, etc.) announced plans to incorporate SDN.
SDN Architecture
Network Function Virtualization
Notes:
- Northbound API
- Southbound API (via OpenFlow)
APIs in SDN
Southbound API
Used to communicate between control layer and infrastructure layer.
OpenFlow protocol is used.
Northbound API
Used to communicate between control layer and application layer.
Standard APIs are used.
East-Westbound APIs
Used to communicate among multiple controllers in the control layer.
Basic Concepts of SDN
Separate control logic from hardware switches
Define the control logic in a centralized manner
Control the entire network including individual switches
Communication between the application, control, and data planes is done through APIs
Components/Attributes of SDN
Hardware switches
Controller
Applications
Flow-Rules
Application programming interfaces (APIs)
Current Status of SDN
Rule Placement I
Flow tables reside in TCAM, which is very fast and enables high-speed lookups; this is why such specialized memories are required.
Rule Placement Challenges II
On receiving a request, for which no flow-rule is present in the switch,
the switch sends a PACKET-IN message to the controller.
The controller decides a suitable flow-rule for the request.
The flow-rule is inserted at the switch.
Typically, 3-5ms delay is involved in a new rule placement
OpenFlow Protocol
Only one protocol is available for rule placement – OpenFlow.
It has different versions – 1.0, 1.1, 1.2, 1.3, etc. – supporting different numbers of match-fields.
Different match-fields
Source IP
Destination IP
Source Port
Priority
etc.
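As a toy illustration of match-fields (not the OpenFlow wire format; the field names here are our own shorthand), a flow table can be modelled as a list of rules, each matching on source/destination IP and source port, with the highest-priority match winning:

```python
# Each rule: match fields (None = wildcard), a priority, and an action
flow_table = [
    {"src_ip": "10.0.0.1", "dst_ip": None, "src_port": None,
     "priority": 10, "action": "forward:2"},
    {"src_ip": None, "dst_ip": "10.0.0.9", "src_port": 80,
     "priority": 20, "action": "drop"},
]

def lookup(packet):
    """Return the action of the highest-priority matching rule, or None."""
    best = None
    for rule in flow_table:
        fields_match = all(
            rule[f] is None or rule[f] == packet.get(f)
            for f in ("src_ip", "dst_ip", "src_port")
        )
        if fields_match and (best is None or rule["priority"] > best["priority"]):
            best = rule
    return best["action"] if best else None  # None would trigger a PACKET-IN

action = lookup({"src_ip": "10.0.0.1", "dst_ip": "10.0.0.5", "src_port": 4321})
```

A real switch performs this lookup in TCAM in a single cycle rather than scanning rules in software; the no-match case is exactly the situation that produces the PACKET-IN message described above.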
How long should a flow-rule be kept at the switch?
Hard timeout
All rules are deleted from the switch at hard timeout.
This can be used to reset the switch.
Soft timeout
If NO flow is received associated with a rule for a particular time, the rule is
deleted.
This is used to empty the rule-space by deleting an unused rule.
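The two timeout kinds can be mimicked with a small simulation (timestamps in seconds; the rule names and the timeout values are invented for illustration):

```python
SOFT_TIMEOUT = 5.0   # idle time after which an unused rule is evicted
HARD_TIMEOUT = 60.0  # age after which every rule is deleted (switch reset)

# rule name -> (time installed, time a matching flow was last seen)
rules = {"r1": (0.0, 1.0), "r2": (0.0, 58.0)}

def expire(rules, now):
    """Drop rules that exceeded the soft (idle) or hard (age) timeout."""
    alive = {}
    for name, (installed, last_seen) in rules.items():
        if now - installed >= HARD_TIMEOUT:
            continue                  # hard timeout: unconditional delete
        if now - last_seen >= SOFT_TIMEOUT:
            continue                  # soft timeout: idle rule deleted
        alive[name] = (installed, last_seen)
    return alive

# At t=59s, r1 has been idle for 58s and is soft-timed-out; r2 was just seen
print(sorted(expire(rules, 59.0)))
# At t=60s the hard timeout clears everything
print(sorted(expire(rules, 60.0)))
```

In OpenFlow these correspond to the per-rule idle_timeout and hard_timeout fields; the switch performs this eviction itself, without involving the controller.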
OpenFlow Switch Software
Packet-IN
Flow-Rule
Switch
Hierarchical (tree) Architecture
Ring Architecture
Mesh Architecture - Reliable
Control Mechanisms
Distributed
The control decisions can be taken in a distributed manner
Ex: each subnetwork is controlled by a different controller.
Centralized
The control decisions are taken in a centralized manner.
Ex: A network is controlled by a single controller.
Backup Controller
Security I
Enhanced security using SDN
Firewall
HTTP proxy
Intrusion detection system (IDS)
Controller Configuration Software
Pox
Nox
FloodLight
OpenDayLight [Popular!]
ONOS [Popular!]
Software-Defined IoT Networking – Part I
Recent Advances of SDN in IoT
IoT Architecture
Benefits of Integrating SDN in IoT
Intelligent routing decisions can be deployed using SDN
Simplification of information collection, analysis and decision making
Visibility of network resources – network management is simplified based on user, device and application-specific requirements
Intelligent traffic pattern analysis and coordinated decisions
SDN for IoT I
SDN for IoT II
Rule-placement at access devices, while considering mobility and heterogeneity of end-users
SDN for IoT IV
The number of control messages in the network is higher using Soft-WSN than in a traditional WSN. This is due to the PACKET-IN messages: each time a node receives a new packet, it asks the controller for the appropriate forwarding logic.
Software-Defined WSN III
SDN-WISE (Galluccio et al., IEEE INFOCOM ’15)
A software-defined WSN platform is designed
Flow-table for rule placement at sensor nodes is designed
Any programming language can be used through the API to program the nodes in real-time
SDN-WISE Protocol Stack
The sensor node includes:
IEEE 802.15.4 protocol
Micro control unit (MCU)
Above the IEEE 802.15.4 stack, the forwarding layer consists of flow-rules.
INPP – In-Network Packet Processing
Source: Towards Programmable Enterprise WLANs with Odin, ACM HotSDN 2012
ODIN II
Conversion of 802.11
LVAP – Light virtual AP
Ubi-Flow I
Mobility management in SDIoT
Scalable control of the APs
Fault tolerance
Flow-Scheduling
Network partition
Network matching
Load balancing
Source: UbiFlow: Mobility Management in Urban-scale Software Defined IoT, IEEE INFOCOM 2015
Mobi-Flow I
Mobility-aware flow-rule placement in SDIoT
Mininet-WiFi
Can be used to deploy a network
Supports both wired and wireless network
Wired – Ethernet protocol
Wireless – WiFi protocol (IEEE 802.11 group)
ONOS
Can be used to place the controllers
Summary
SDN-based solution approaches are useful to deal with
different challenges present in an IoT network
Issues with interference and mobility can be managed using
software-defined approaches
Data Handling and Analytics – Part I
What is Data Handling
Data handling
Ensures that research data is stored, archived or disposed of in a safe and secure manner during and after the conclusion of a research project
Includes the development of policies and procedures to manage
data handled electronically as well as through non‐electronic
means.
“Big data shall mean the data of which the data volume,
acquisition speed, or data representation limits the capacity of
using traditional relational methods to conduct effective analysis
or the data which may be effectively processed with important
horizontal zoom technologies.”
[National Institute of Standards and Technology (NIST)]
Types of Data
Structured data
Data that can be easily organized.
Usually stored in relational databases.
Structured Query Language (SQL) manages structured data in
databases.
It accounts for only 20% of the total available data today in the
world.
Unstructured data
Information that does not possess any pre‐defined model.
Traditional RDBMSs are unable to process unstructured data.
Enhances the ability to provide better insight to huge datasets.
It accounts for 80% of the total data available today in the world.
Characteristics of Big Data
Big Data is characterized by 7 Vs –
Volume
Velocity
Variety
Variability
Veracity
Visualization
Value
Characteristics of Big Data (Contd.)
Volume
Quantity of data that is generated
Sources of data are added continuously
Example of volume ‐
• 30TB of images will be generated every night from the Large Synoptic Survey Telescope (LSST)
• 72 hours of video are uploaded to YouTube every minute
Characteristics of Big Data (Contd.)
Velocity
Refers to the speed of generation of data
Data processing time decreasing day‐by‐day in order to provide
real‐time services
Older batch processing technology is unable to handle high
velocity of data
Example of velocity –
140 million tweets per day on average (according to a survey conducted
in 2011)
New York Stock Exchange captures 1TB of trade information during each trading session
Characteristics of Big Data (Contd.)
Variety
Refers to the category to which the data belongs
No restriction over the input data formats
Data mostly unstructured or semi‐structured
Example of variety –
Pure text, images, audio, video, web, GPS data, sensor data,
SMS, documents, PDFs, flash etc.
Characteristics of Big Data (Contd.)
Variability
Refers to data whose meaning is constantly changing.
Meaning of the data depends on the context.
Data appear as an indecipherable mass without structure
Example:
Language processing, Hashtags, Geo‐spatial data, Multimedia, Sensor events
Veracity
Veracity refers to the biases, noise and abnormality in data.
It is important in programs that involve automated decision‐making, or feeding the data into an unsupervised machine learning algorithm.
Veracity isn’t just about data quality; it’s about data understandability.
Characteristics of Big Data (Contd.)
Visualization
Presentation of data in a pictorial or graphical format
Enables decision makers to see analytics presented visually
Identify new patterns
Value
It means extracting useful business information from scattered
data.
Includes a large volume and variety of data
Easy to access and delivers quality analytics that enable informed decisions
Data Handling Technologies
Cloud computing
Essential characteristics according to NIST
On‐demand self service
Broad network access
Resource pooling
Rapid elasticity
Measured service
Basic service models provided by cloud computing
Infrastructure‐as‐a‐Service (IaaS)
Platform‐as‐a‐Service (PaaS)
Software‐as‐a‐Service (SaaS)
Data Handling Technologies (Contd.)
Internet of Things (IoT)
According to Techopedia, IoT “describes a future where
every day physical objects will be connected to the
internet and will be able to identify themselves to other
devices.”
Data Handling Technologies (Contd.)
Data handling at data centers
Storing, managing, and organizing data.
Estimates and provides necessary processing capacity.
Provides sufficient network infrastructure.
Effectively manages energy consumption.
Replicates data to keep backup.
Develops business-oriented strategic solutions from big data.
Helps business personnel to analyze existing data.
Discovers problems in business operations.
Flow of Data
Data Sources
Enterprise data Bio‐medical data
Online trading and analysis Masses of data generated
data. by gene sequencing.
Production and inventory data. Data from medical clinics and
Sales and other financial data. medical R&Ds.
IoT data Other fields
Data from industry, Fields such as – computational
agriculture, traffic, biology, astronomy, nuclear
transportation research etc
Medical‐care data,
Data from public departments,
and families.
16
Data Acquisition
Data collection
Log files or record files that are automatically generated by data sources
to record activities for further analysis.
Sensory data such as sound wave, voice, vibration, automobile, chemical,
current, weather, pressure, temperature etc.
Complex and varied data collected through mobile devices, e.g. geographical location, 2D barcodes, pictures, videos etc.
Data transmission
After collecting data, it will be transferred to storage system for further
processing and analysis of the data.
Data transmission can be categorized as – Inter‐DCN transmission and
Intra‐DCN transmission.
Data Acquisition (Contd.)
Data pre‐processing
Collected datasets suffer from noise, redundancy, inconsistency etc.; thus, pre‐processing of data is necessary.
Pre‐processing of relational data mainly follows – integration,
cleaning, and redundancy mitigation
Integration is combining data from various sources and provides users
with a uniform view of data.
Cleaning is identifying inaccurate, incomplete, or unreasonable
data, and then modifying or deleting such data.
Redundancy mitigation is eliminating data repetition through detection, filtering and compression of data to avoid unnecessary transmission.
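The three pre-processing steps can be sketched on toy sensor records (the record layout and the plausibility threshold are our own illustration): integration merges two sources into one uniform view, cleaning drops unreasonable readings, and redundancy mitigation removes duplicates:

```python
# Two data sources with the same record layout: (sensor_id, temperature)
source_a = [("s1", 24.5), ("s2", 25.1), ("s2", 25.1)]
source_b = [("s3", 24.9), ("s4", -273.0)]   # -273.0 is clearly bogus

# Integration: combine the sources into a uniform view
integrated = source_a + source_b

# Cleaning: delete unreasonable data (here, outside a plausible range)
cleaned = [(sid, t) for sid, t in integrated if -40.0 <= t <= 80.0]

# Redundancy mitigation: drop exact duplicate records, keeping order
seen = set()
deduplicated = []
for record in cleaned:
    if record not in seen:
        seen.add(record)
        deduplicated.append(record)

print(deduplicated)
```

At scale these same steps run as distributed jobs rather than in-memory loops, but the logical pipeline (integrate, clean, deduplicate) is the one described above.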
Data Storage
File system
Distributed file systems that store massive data and ensure – consistency,
availability, and fault tolerance of data.
GFS is a notable example of a distributed file system that supports large‐scale file storage, though its performance is limited in the case of small files.
Hadoop Distributed File System (HDFS) and Kosmosfs are other notable file systems, derived from the design of GFS.
Databases
Emergence of non‐traditional relational databases (NoSQL) in order to deal
with the characteristics that big data possess.
Three main NoSQL databases – Key‐value databases, column‐oriented
databases, and document‐oriented databases.
Data Handling Using Hadoop
Reliable, scalable, distributed data handling
What is Hadoop
Building Blocks of Hadoop
Hadoop Common
A module containing the utilities that support the other Hadoop components
Hadoop Distributed File System (HDFS)
Provides reliable data storage and access across the nodes
MapReduce
Framework for applications that process large datasets in parallel.
Yet Another Resource Negotiator (YARN)
Next‐generation MapReduce, which assigns CPU, memory and storage to applications
running on a Hadoop cluster.
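Production MapReduce jobs are typically written in Java against the Hadoop API; the map, shuffle, and reduce phases themselves can be sketched in a few lines of pure Python on the classic word-count example (the input sentences are invented):

```python
from collections import defaultdict

documents = ["big data on hadoop", "hadoop handles big data"]

# Map: emit (word, 1) pairs from each input record
mapped = [(word, 1) for doc in documents for word in doc.split()]

# Shuffle: group the intermediate pairs by key
groups = defaultdict(list)
for word, count in mapped:
    groups[word].append(count)

# Reduce: aggregate each key's values
word_counts = {word: sum(counts) for word, counts in groups.items()}

print(word_counts)
```

In Hadoop the map and reduce functions run in parallel across the cluster's datanodes and the shuffle moves intermediate pairs over the network, but the dataflow is exactly this three-phase pipeline.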
Hadoop Distributed File System (HDFS)
Centralized node
Namenode
Maintains metadata info about files
Distributed node
Datanode
Store the actual data
Files are divided into blocks
Each block is replicated
Source:
http://hadoop.apache.org/docs/r1.2.1/hdfs_design.html
Name and Data Nodes
Namenode
Stores filesystem metadata.
Maintains two in‐memory tables, to map the datanodes to the blocks, and vice versa
Datanode
Stores actual data
Data nodes can talk to each other to rebalance and replicate data
Data nodes update the namenode with the block information periodically
Before updating, datanodes verify the checksums.
Job and Task Trackers
Job Tracker –
Runs with the Namenode
Receives the user’s job
Decides how many tasks will run (number of mappers)
Decides where to run each mapper (concept of locality)
Task Tracker –
Runs on each datanode
Receives the task from Job Tracker
Always in communication with the Job Tracker, reporting progress
Source: http://developeriq.in/articles/2015/aug/11/an-introduction-to-apache-hadoop-for-big-data/
Hadoop Master / Slave Architecture
Master‐slave shared‐nothing architecture
Master
Executes operations like opening, closing,
and renaming files and directories.
Determines the mapping of blocks to
Datanodes.
Slave
Serves read and write requests from the
file system’s clients.
Performs block creation, deletion, and
replication as instructed by the Namenode.
Source:
http://ankitasblogger.blogspot.in/2011/01/hadoop-cluster-setup.html
Data Handling and Analytics – Part II
What is Data Analytics
“Data analytics (DA) is the process of examining data sets in
order to draw conclusions about the information they
contain, increasingly with the aid of specialized systems and
software. Data analytics technologies and techniques are
widely used in commercial industries to enable organizations
to make more‐informed business decisions and by scientists
and researchers to verify or disprove scientific models,
theories and hypotheses.”
[An admin's guide to AWS data management]
Types of Data Analysis
Qualitative Analysis
o Deals with the analysis of data that is categorical in nature
Quantitative Analysis
o Quantitative analysis refers to the process by which numerical
data is analyzed
Qualitative Analysis
Data is not described through numerical values
Described by some sort of descriptive context such as text
Data can be gathered by many methods such as interviews, videos and
audio recordings, field notes
Data needs to be interpreted
The grouping of data into identifiable themes
Qualitative analysis can be summarized by three basic principles:
Notice things
Collect things
Think about things
Quantitative Analysis
Quantitative analysis refers to the process by which numerical data is
analyzed
Involves descriptive statistics such as mean, median, standard deviation
The following are often involved with quantitative analysis:
Statistical models
Analysis of variables
Data dispersion
Analysis of relationships between variables
Contingence and correlation
Regression analysis
Statistical significance
Precision
Error limits
Comparison
Qualitative Data: data is observed; examples are color, smell, taste, etc.
Quantitative Data: data is measured; examples are volume, weight, etc.
Advantages
Allows for the identification of important (and often mission‐
critical) trends
Helps businesses identify performance problems that require
some sort of action
Can be viewed in a visual manner, which leads to faster and
better decisions
Better awareness regarding the habits of potential customers
It can provide a company with an edge over their competitors
Statistical models
A statistical model is a mathematical equation formulated in the form of relationships between variables.
Data dispersion
A measure of statistical dispersion is a nonnegative real number that is zero if all
the data are the same and increases as the data becomes more diverse.
Data dispersion (Contd.)
Range
The range is calculated by simply taking the difference between the maximum and
minimum values in the data set.
Average absolute deviation
The average absolute deviation (or mean absolute deviation) of a data set is the average of the
absolute deviations from the mean.
Variance
Variance is the expectation of the squared deviation of a random variable from its mean
Standard deviation
Standard deviation (SD) is a measure that is used to quantify the amount of variation
or dispersion of a set of data values
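All four dispersion measures above can be computed with the standard library on a small sample (the numbers are an invented example; population formulas are used, hence `pvariance`/`pstdev`):

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(data) - min(data)                    # range: max minus min
mean = statistics.mean(data)
mad = sum(abs(x - mean) for x in data) / len(data)    # average absolute deviation
variance = statistics.pvariance(data)                 # population variance
std_dev = statistics.pstdev(data)                     # population standard deviation

print(data_range, mean, mad, variance, std_dev)
```

Note that the standard deviation is simply the square root of the variance, so it is expressed in the same units as the data, which is why it is usually the quoted measure.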
Contingence and correlation
In statistics, a contingency table (also known as a cross tabulation or crosstab) is a
type of table in a matrix format that displays the (multivariate) frequency
distribution of the variables.
Contingence and correlation (Contd.)
Correlation is a technique for investigating the relationship between two
quantitative, continuous variables
Correlations are useful because they can indicate a predictive relationship that can be
exploited in practice
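Pearson's correlation coefficient for two small samples can be computed directly from its definition (the data points are invented; y is deliberately a perfect linear function of x so the coefficient comes out at 1):

```python
import math

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]          # perfectly linear in x

def pearson(x, y):
    # r = covariance term / (product of the deviation norms)
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

r = pearson(x, y)
print(r)
```

Values near +1 or -1 indicate a strong linear relationship (and hence predictive power), while values near 0 indicate no linear relationship; correlation says nothing about causation.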
Regression analysis
In statistical modeling, regression analysis is a statistical process for estimating the
relationships among variables
Regression analysis (Contd.)
The estimation target is a function of the independent variables called the
regression function
Characterize the variation of the dependent variable around the regression
function which can be described by a probability distribution
Regression analysis is widely used for prediction and forecasting, where its use has
substantial overlap with the field of machine learning
Regression analysis is also used to understand which among the independent
variables are related to the dependent variable
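For a single independent variable, the least-squares regression function has a closed form; a direct computation on invented points (chosen to lie exactly on y = 2x + 1 so the fit is exact):

```python
x = [0, 1, 2, 3, 4]
y = [1, 3, 5, 7, 9]            # y = 2x + 1 exactly

n = len(x)
mx = sum(x) / n
my = sum(y) / n

# Slope and intercept of the least-squares regression line y = a*x + b
a = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
    sum((xi - mx) ** 2 for xi in x)
b = my - a * mx

def predict(xi):
    # The fitted regression function, used for prediction/forecasting
    return a * xi + b

print(a, b, predict(10))
```

With real, noisy data the points scatter around the fitted line, and that scatter around the regression function is exactly the probability distribution of the dependent variable mentioned above.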
Statistical significance
Statistical significance is the likelihood that the difference in conversion rates
between a given variation and the baseline is not due to random chance
Statistical significance level reflects the risk tolerance and confidence level
There are two key variables that go into determining statistical significance:
Sample size
Effect size
Statistical significance (Contd.)
Sample size refers to the sample size of the experiment
The larger your sample size, the more confident you can be in the result of the
experiment (assuming that it is a randomized sample)
The effect size is just the standardized mean difference between the two groups
If a particular experiment is replicated, the different effect size estimates from each study can easily be combined to give an overall best estimate of the effect size
Precision and Error limits
Precision refers to how close estimates from different samples are to each other
When the standard error is small, estimates from different samples will be close in
value and vice versa
Precision and Error limits (Contd.)
The limits of error are the maximum overestimate and the maximum underestimate
from the combination of the sampling and the non‐sampling errors