Unit 4

The document discusses the implementation of IoT using Raspberry Pi, focusing on remote data logging with temperature and humidity sensors, data transmission to a server, and data processing using socket programming. It also introduces Software Defined Networking (SDN), highlighting its architecture, APIs, and advantages over traditional networking. The document outlines the challenges and current status of SDN in modern networks.

Uploaded by

tejthunder60

UNIT - IV

 Implementation of IoT with Raspberry Pi

 Introduction to Software-Defined Networking (SDN)

 SDN for IoT

 Data Handling and Analytics


Implementation of IoT with Raspberry Pi: Part 2

Internet of Things
 Creating an interactive environment
 Network of devices connected together
IoT: Remote Data Logging
 Collect data from the devices in the network
 Send the data to a server/remote machine
 Control the network remotely

System Overview:
 A network of temperature and humidity sensors connected to a Raspberry Pi
 Read data from the sensors
 Send it to a server
 Save the data on the server
IoT: Remote Data Logging (contd.)
Requirements
 DHT Sensor
 4.7K ohm resistor
 Jumper wires
 Raspberry Pi
DHT Sensor
 Digital Humidity and Temperature (DHT) sensor
 Pins 1, 2, 3, 4 (from left to right)
 Pin 1: 3.3V-5V power supply
 Pin 2: Data
 Pin 3: Null (not connected)
 Pin 4: Ground
Sensor-Raspberry Pi Interface
 Connect pin 1 of the DHT sensor to the 3.3V pin of the Raspberry Pi
 Connect pin 2 of the DHT sensor to any input pin of the Raspberry Pi; here we have used physical pin 11 (GPIO17)
 Connect the 4.7K ohm resistor between pin 1 (power) and pin 2 (data) as a pull-up
 Connect pin 4 of the DHT sensor to a ground pin of the Raspberry Pi
Read Data from the Sensor
 Adafruit provides a library to work with the DHT22 sensor
 Install the library on the Raspberry Pi
 Use the function Adafruit_DHT.read_retry() to read data from the sensor

Source: ADAFRUIT DHTXX SENSORS, Lady Ada, 2012-07-29
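
The read step described above can be sketched as follows. This is a minimal sketch, assuming the legacy Adafruit_DHT Python package is installed and the sensor's data pin is wired to GPIO17 (physical pin 11). The helper name read_dht22 and its reader argument are illustrative additions, not part of the library; the reader hook only exists so the logic can be tried without hardware.

```python
def read_dht22(pin=17, reader=None):
    """Return (humidity, temperature) rounded to one decimal, or None on a failed read.

    `reader` is a hypothetical hook for trying the function without hardware;
    when it is omitted, the Adafruit_DHT library is used.
    """
    if reader is None:
        import Adafruit_DHT  # imported lazily: normally only installed on the Pi
        reader = lambda: Adafruit_DHT.read_retry(Adafruit_DHT.DHT22, pin)
    humidity, temperature = reader()
    # read_retry returns (None, None) when every retry fails
    if humidity is None or temperature is None:
        return None
    return round(humidity, 1), round(temperature, 1)
```

On the Raspberry Pi, calling read_dht22() with no arguments reads the real sensor; a caller should be prepared for a None result and simply try again later.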


Program: DHT22 interfaced with Raspberry Pi
[Figure: program listing and code output]
Sending Data to a Server
Sending data to Server using network protocols
 Create a server and client
 Establish connection between the server and the client
 Send data from the client to the server

Socket Programming:
 Creates a two-way communication between two nodes in a network
 The nodes are termed as Server and Client
 Server performs the task/service requested by the client

Creating a socket:
s = socket.socket(SocketFamily, SocketType, Protocol=0)
 SocketFamily can be AF_UNIX or AF_INET
 SocketType can be SOCK_STREAM (TCP) or SOCK_DGRAM (UDP)
 Protocol defaults to 0
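
As a small illustration of the socket types listed above, the sketch below creates one TCP and one UDP socket and reports their family and type. The helper name describe is an illustrative assumption; everything else is the standard library socket module.

```python
import socket

def describe(sock):
    """Return the socket family and type as readable names."""
    return (sock.family.name, sock.type.name)

# TCP: connection-oriented byte stream
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as tcp_sock:
    tcp_info = describe(tcp_sock)

# UDP: connectionless datagrams
with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as udp_sock:
    udp_info = describe(udp_sock)

print(tcp_info, udp_info)
```

Using the socket as a context manager closes it automatically when the block ends.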
Sending Data to a Server (contd.)

Server:
import socket

s = socket.socket()            # create a socket object
host = socket.gethostname()    # local machine name/address
port = 12345                   # port number for the server (must match the client)
s.bind((host, port))           # bind to the port
s.listen(5)                    # listen for incoming client connections
while True:
    c, addr = s.accept()       # accept the connection request from the client
    print('Connected to', addr)
    c.send(b'Connection Successful')
    c.close()                  # close the client connection
Sending Data to a Server (contd.)

Client:
import socket

s = socket.socket()            # create a socket object
host = socket.gethostname()    # local machine name (the server runs on the same machine)
port = 12345                   # port number the server is listening on
s.connect((host, port))
print(s.recv(1024).decode())
s.close()
Sending Data to a Server (contd.)

Client Code: Obtain readings from the sensor

import Adafruit_DHT
import RPi.GPIO as GPIO

def sensordata():
    GPIO.setmode(GPIO.BOARD)
    GPIO.setwarnings(False)
    sensor = Adafruit_DHT.AM2302
    # The second argument is the GPIO (BCM) number: GPIO17 is physical pin 11
    humidity, temperature = Adafruit_DHT.read_retry(sensor, 17)
    return (humidity, temperature)

This function returns the humidity and temperature values read from the DHT sensor.


Sending Data to a Server (contd.)

Client Code: Connecting to the server and sending the data

import socket
import sys

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)  # create a UDP socket
server_address = ('10.14.3.194', 10001)

try:
    while True:
        h, t = sensordata()
        message = str(h) + ',' + str(t)
        # Send data
        print('sending "%s"' % message, file=sys.stderr)
        sent = sock.sendto(message.encode(), server_address)
finally:
    print('closing socket', file=sys.stderr)
    sock.close()
Sending Data to a Server (contd.)
Server Code: Receive data from client and save it

import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)  # create a UDP socket
server_address = ('10.14.3.194', 10001)
sock.bind(server_address)  # bind the socket to the port

while True:
    data, address = sock.recvfrom(4096)
    with open("DataLog.txt", "a") as f:  # the with block closes the file
        mess = data.decode()
        f.write(mess + "\n")  # one reading per line
        print(mess)
Result
 The client takes readings from the sensor and sends them to the server
 The server receives the data from the client and saves it in a text file, DataLog.txt
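
The client-to-server logging path can be tried on a single machine without a sensor. The sketch below is an assumption-laden stand-in for the setup above: it uses the loopback address and an OS-assigned port instead of 10.14.3.194:10001, sends one made-up reading, and writes the log to a temporary directory. The helper name log_one_datagram is illustrative.

```python
import os
import socket
import tempfile

def log_one_datagram(server_sock, log_path):
    """Receive one UDP datagram and append it to the log file as one line."""
    data, _addr = server_sock.recvfrom(4096)
    record = data.decode()
    with open(log_path, "a") as f:
        f.write(record + "\n")
    return record

# Server side: bind to a free port on the loopback interface
server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(("127.0.0.1", 0))
server.settimeout(2.0)  # do not block forever if nothing arrives
server_address = server.getsockname()

# Client side: send one made-up "humidity,temperature" reading
client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client.sendto("45.2,27.1".encode(), server_address)

log_path = os.path.join(tempfile.mkdtemp(), "DataLog.txt")
received = log_one_datagram(server, log_path)
print(received)

client.close()
server.close()
```

Because UDP is connectionless, the client can send before the server calls recvfrom; the datagram waits in the server socket's receive buffer.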
Implementation of IoT with Raspberry Pi: Part 3
Internet of Things
 Creating an interactive environment
 Network of devices connected together
IoT: Remote Data Logging
 Collect data from the devices in the network
 Send the data to a server/remote machine
 Process the data
 Respond to the network
IoT: Remote Data Logging

System Overview:
 A network of temperature and humidity sensors connected to a Raspberry Pi
 Read data from the sensors
 Send it to a server
 Save the data on the server
 Split the data
 Plot the data
Read Data from the Sensor

Use the Adafruit library for the DHT22 sensor to read the sensor data
Sending Data to a Server
 Sending data to the server using socket programming
 Create a client and server
 Establish a connection between the two
 Send data from the client to the server
 Save the data in a file
Data Processing

Data from the client needs to be processed before it can be used further
 Data splitting/filtering
 Data plotting
Data Processing
Data splitting/filtering:
 Data from the client is saved in a text file
 The values are separated by a comma (',')
message = str(h) + ',' + str(t)
 The split() function splits a string into multiple strings at the specified separator/delimiter.
Example:
Data = 'sunday,monday,tuesday'   # Data is a string with 3 words separated by commas
Data.split(",")                  # split the data wherever a "," is found
['sunday', 'monday', 'tuesday']  # gives 3 different strings as output

Source: HOW TO USE SPLIT IN PYTHON, PythonForBeginners, Sep 26, 2012
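
Applied to the logged sensor records, the same split() call can be sketched as below. The function names parse_record and parse_log and the sample lines are illustrative assumptions; the record format is the "humidity,temperature" string built by the client.

```python
def parse_record(line):
    """Split one 'humidity,temperature' line into two floats."""
    humidity_text, temperature_text = line.strip().split(",")
    return float(humidity_text), float(temperature_text)

def parse_log(lines):
    """Parse many lines, skipping blanks and failed readings such as 'None,None'."""
    readings = []
    for line in lines:
        try:
            readings.append(parse_record(line))
        except ValueError:
            continue  # wrong number of fields or non-numeric text
    return readings

sample = ["45.2,27.1\n", "None,None\n", "\n", "46.0,26.9\n"]
print(parse_log(sample))
```

Skipping malformed lines matters here because a failed sensor read is sent as the text 'None,None', which cannot be converted to numbers.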


Data Processing

Plotting the data:

 Matplotlib is a Python library used for 2D plotting
 plot(x, y): plots the values x and y
 xlabel('X Axis'): labels the x-axis
 ylabel('Y Axis'): labels the y-axis
 title("Simple Plot"): adds a title to the plot

Source: MATPLOTLIB, John Hunter, Darren Dale, Eric Firing, Michael Droettboom and the Matplotlib development team, 2012 - 2016
Data Processing (contd..)

Plotting the data:

import matplotlib.pyplot as myplot
myplot.plot([1, 2, 3, 4])
myplot.ylabel('Y-Axis')
myplot.show()
When a single list is passed, its values are used for the y-axis; the x-axis values are generated automatically, starting from 0.

Source: MATPLOTLIB, John Hunter, Darren Dale, Eric Firing, Michael Droettboom and the Matplotlib development team, 2012 - 2016
Data Processing (contd.)

Basic Plot:
[Figure: output of the basic plot example]
Data Processing (contd.)

Some other common functions used in plotting:
 figure(): creates a new figure
 grid(): enables or disables the axis grid in the plot
 ion(): turns on interactive mode
 subplot(): adds a subplot to a figure
 close(): closes the current figure window
 scatter(): makes a scatter plot of the given points

Source: MATPLOTLIB, John Hunter, Darren Dale, Eric Firing, Michael Droettboom and the Matplotlib development team, 2012 - 2016
Sending Data to a Server (contd.)
Client:
import socket
import sys
import Adafruit_DHT
import RPi.GPIO as GPIO

def sensordata():
    GPIO.setmode(GPIO.BOARD)
    GPIO.setwarnings(False)
    sensor = Adafruit_DHT.AM2302
    # The second argument is the GPIO (BCM) number: GPIO17 is physical pin 11
    humidity, temperature = Adafruit_DHT.read_retry(sensor, 17)
    return (humidity, temperature)

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)  # create a UDP socket
server_address = ('10.14.3.194', 10001)
try:
    while True:
        h, t = sensordata()
        message = str(h) + ',' + str(t)
        # Send data
        print('sending "%s"' % message, file=sys.stderr)
        sent = sock.sendto(message.encode(), server_address)
finally:
    print('closing socket', file=sys.stderr)
    sock.close()
Sending Data to a Server (contd.)
Server:
import socket
import matplotlib.pyplot as plt

def coverage_plot(data, i):
    # Convert the text fields to numbers before plotting
    hum = float(data.split(",")[0])
    tem = float(data.split(",")[1])
    print('temp=' + str(tem) + ' iter=' + str(i))
    plt.ion()
    fig = plt.figure(num=1, figsize=(6, 6))
    fig.suptitle('IoT Temperature and Humidity Monitor')
    ax = fig.add_subplot(121)
    ax.plot(tem, i, c='r', marker=r'$\Theta$')
    plt.xlabel('Temp ($^0 C$)')
    ax.grid()
    ax = fig.add_subplot(122)
    ax.plot(hum, i, c='b', marker=r'$\Phi$')
    plt.xlabel(r'Humidity ($\%$)')
    ax.grid()
    fig.show()
    fig.canvas.draw()

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# Bind the socket to the port
server_address = ('10.14.3.194', 10001)
sock.bind(server_address)
Sending Data to a Server (contd.)
Server:
i = 0
while True:
    data, address = sock.recvfrom(4096)
    with open("DataLog.txt", "a") as f:  # the with block closes the file
        mess = data.decode()
        f.write(mess + "\n")  # one reading per line
        coverage_plot(mess, i)
        print(mess)
        i += 1
Output
 The readings from the sensor are sent to the server and saved in a text file.
 Two different plots are drawn, one for temperature and one for humidity.
[Figure: output plots]
Software-Defined Networking – Part I
Restructuring the Current Network Infrastructure
Overview of Current Network
[Figure: User I and User II connected through a network of switches]

The OSPF protocol executes at the switches

• Every switch implements OSPF.

• When data is sent, OSPF routes each packet through the switches along the route that it finds to be the best.
Overview of Current Network
[Figure: a switch on the path between User I and User II has been attacked]

The switch has been attacked, so traffic needs to be routed through an alternate path.
At present, there is no centralized control.
Limitations in Current Network
[Figure: several switches, each with its own hardware, OS, and app layers]

Each of these switches runs different layers: the hardware, the operating system, and the applications running on top.

Switches forward traffic in a distributed manner. They do not have a global view of the network.
Limitations in Current Network

 The vendor-specific architecture of switches limits dynamic configuration according to application-specific requirements.
(Dynamic configuration based on application-specific requirements cannot be done easily.)

 Switches must be configured according to the installed operating system (OS).
(Each switch may run a different operating system and has to be configured according to that particular operating system, and there is no centralized control.)
 Centralized control is not feasible in a traditional network.
Limitations in Current Network
[Figure: a traditional switch. Labels: apps (routing, mobility management, etc.); thousands of lines of code; operating system; cost-expensive; specialized packet forwarding hardware; millions of gates, ~10GB RAM]
These networks need to be made efficient by overcoming these limitations.

SDN addresses these limitations by separating the applications and the operating system from the hardware.
Current Network to SDN
[Figure: switches in the current network, each with its own app, OS, and hardware layers]
Current Network to SDN
[Figure: apps running on a single operating system above multiple packet forwarding hardware elements, which form the data plane]
Origin of SDN
 2006: At Stanford University, a team proposes a clean-slate security architecture (SANE) to control security policies in a centralized manner instead of at the edges.
 2008: The idea of software-defined networking originates from the OpenFlow project (ACM SIGCOMM 2008).
 2009: Stanford publishes the OpenFlow v1.0.0 specification.
 June 2009: Nicira Networks is founded.
 March 2011: The Open Networking Foundation is formed.
 Oct 2011: First Open Networking Summit. Many industry players (Juniper, Cisco) announce plans to incorporate SDN.
SDN Architecture
[Figure: SDN architecture with network function virtualization]

Notes
- Northbound API
- Southbound API (via OpenFlow)
APIs in SDN
 Southbound API
 Used to communicate between control layer and infrastructure layer.
 OpenFlow protocol is used.
 Northbound API
 Used to communicate between control layer and application layer.
 Standard APIs are used.
 East-Westbound APIs
 Used to communicate among multiple controllers in the control layer.
Basic Concepts of SDN
 Separate control logic from hardware switches
 Define the control logic in a centralized manner
 Control the entire network including individual switches
 Communication between the application, control, and data planes is done through APIs

Components/Attributes of SDN
 Hardware switches
 Controller
 Applications
 Flow-Rules
 Application programming interfaces (APIs)
Current Status of SDN

 Companies such as Google have started to implement SDN in their datacenter networks.

 Current networks need to be migrated to SDN in a phased manner.

 Operational cost and the delay caused by link failures can be significantly reduced.
Challenges in SDN
 Rule placement
 Controller placement

Rule Placement I

 Switches forward traffic based on a rule, the 'flow-rule', defined by the centralized controller.
 Traditionally, every switch (L3 switch/router) maintains a routing table. In SDN, every switch maintains a flow table.
 Flow-rule: an entry in the flow table.
 Each rule has a specific format, which is defined by a protocol (e.g., OpenFlow).
Rule Placement II

Example of a flow-rule based on the OpenFlow protocol
[Figure: example flow-rule]

Rule Placement III

 How to define/place the rules at the switches while considering the available TCAM.
 How to define rules so that fewer PACKET-IN messages are sent to the controller.
Rule Placement Challenges I
 The size of ternary content-addressable memory (TCAM) at the switches is limited.
 Only a limited number of rules can be inserted.
 Fast processing is done using TCAM at the switches.
 TCAM is very expensive.
The rules are placed in OpenFlow flow tables, which are stored in the switches in a specialized memory, TCAM.

TCAM is very limited in size, so only a limited number of rules can be inserted in the flow tables.

TCAM is very fast, which allows fast packet processing; that is why this specialized memory is required.
Rule Placement Challenges II
 On receiving a request for which no flow-rule is present in the switch, the switch sends a PACKET-IN message to the controller.
 The controller decides a suitable flow-rule for the request.
 The flow-rule is inserted at the switch.
 Typically, a delay of 3-5 ms is involved in placing a new rule.
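
The table-miss behaviour described above, combined with the limited rule space, can be sketched with a toy model. Everything here is an assumption made for illustration: the class names, matching on destination IP only, a capacity counted in rules, and evicting the oldest rule when the table is full. Real switches and controllers are considerably more involved.

```python
class Controller:
    """Toy controller: answers every PACKET-IN with an action for that destination."""
    def __init__(self, routes):
        self.routes = routes          # destination IP -> output port
        self.packet_in_count = 0

    def packet_in(self, dst_ip):
        self.packet_in_count += 1
        return self.routes.get(dst_ip, "drop")

class Switch:
    """Toy switch with a flow table of limited size (standing in for TCAM)."""
    def __init__(self, controller, capacity):
        self.controller = controller
        self.capacity = capacity
        self.flow_table = {}          # destination IP -> action

    def forward(self, dst_ip):
        if dst_ip not in self.flow_table:            # table miss
            action = self.controller.packet_in(dst_ip)
            if len(self.flow_table) >= self.capacity:
                # assumed policy: evict the oldest rule (dicts keep insertion order)
                oldest = next(iter(self.flow_table))
                del self.flow_table[oldest]
            self.flow_table[dst_ip] = action
        return self.flow_table[dst_ip]

controller = Controller({"10.0.0.2": 1, "10.0.0.3": 2})
switch = Switch(controller, capacity=1)
destinations = ["10.0.0.2", "10.0.0.2", "10.0.0.3", "10.0.0.2"]
actions = [switch.forward(ip) for ip in destinations]
print(actions, controller.packet_in_count)
```

With room for only one rule, alternating destinations keep evicting each other, so three of the four packets trigger a PACKET-IN. This is the kind of control overhead that careful rule placement tries to avoid.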
OpenFlow Protocol
 OpenFlow is the most widely used protocol for rule placement.
 Its versions (1.0, 1.1, 1.2, 1.3, etc.) support different numbers of match-fields.

 Different match-fields
 Source IP
 Destination IP
 Source Port
 Priority
 etc.
How long should a flow-rule be kept at the switch?

 Hard timeout
 A rule is deleted from the switch a fixed time after it is installed, whether or not it is still matching traffic.
 This can be used to periodically flush rules from the switch.

 Soft (idle) timeout
 If no flow associated with a rule is received for a particular time, the rule is deleted.
 This is used to free rule-space by deleting unused rules.
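
The two timeouts can be sketched as follows. The sketch follows the OpenFlow convention in which each flow entry carries its own hard and idle (soft) timeout, with 0 meaning disabled. The class name FlowRule and the timestamps are illustrative assumptions.

```python
class FlowRule:
    def __init__(self, installed_at, hard_timeout, idle_timeout):
        self.installed_at = installed_at
        self.last_matched = installed_at
        self.hard_timeout = hard_timeout   # seconds since installation; 0 disables
        self.idle_timeout = idle_timeout   # seconds without a match; 0 disables

    def match(self, now):
        """Record that a packet matched this rule at time `now`."""
        self.last_matched = now

    def expired(self, now):
        hard = self.hard_timeout and now - self.installed_at >= self.hard_timeout
        idle = self.idle_timeout and now - self.last_matched >= self.idle_timeout
        return bool(hard or idle)

# A rule that never sees traffic expires through the idle timeout
unused = FlowRule(installed_at=0, hard_timeout=30, idle_timeout=10)

# A rule that keeps matching traffic survives until the hard timeout
busy = FlowRule(installed_at=0, hard_timeout=30, idle_timeout=10)
for t in (5, 12, 20, 28):
    busy.match(t)

results = (unused.expired(9), unused.expired(10), busy.expired(29), busy.expired(30))
print(results)
```

The busy rule is removed at 30 seconds even though it matched a packet 2 seconds earlier, which is the difference between the two timeout types.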
OpenFlow Switch Software

 Indigo: Open source; it runs on Mac OS X.
 LINC: Open source; it runs on Linux, Solaris, Windows, MacOS, and FreeBSD.
 Pantou: Turns a commercial wireless router/access point into an OpenFlow-enabled switch. OpenFlow runs on OpenWRT.
 Of13softswitch: User-space software switch based on the Ericsson TrafficLab 1.1 softswitch.
 Open vSwitch: Open source; it is the most popular one today.
Software-Defined Networking – Part II
Restructuring the Current Network Infrastructure
Controller Placement I

 Controllers define flow-rules according to application-specific requirements.
 The controllers must be able to handle all incoming requests from the switches.
 Rules should be placed without incurring much delay.
 Typically, a controller can handle 200 requests per second (through a single thread).

The controller controls the overall flow in the network.
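
The capacity figure above gives a quick lower bound on how many controllers a network needs. The sketch below is back-of-the-envelope arithmetic, assuming the load is spread perfectly evenly across controllers; the 50 switches and 30 PACKET-IN messages per second are made-up numbers.

```python
import math

REQUESTS_PER_SECOND_PER_CONTROLLER = 200   # figure quoted above (single thread)

def controllers_needed(packet_in_rate):
    """Minimum number of controllers so that none exceeds its request capacity."""
    return max(1, math.ceil(packet_in_rate / REQUESTS_PER_SECOND_PER_CONTROLLER))

# Hypothetical load: 50 switches, each raising 30 PACKET-IN messages per second
rate = 50 * 30
print(rate, controllers_needed(rate))
```

For this hypothetical load, 1500 / 200 = 7.5, so at least 8 controllers would be needed before considering where to place them.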
Controller Placement II

 Logically, the controllers are connected to the switches at one-hop distance.
 Physically, they are connected to the switches at multi-hop distance.
 If a large network has very few controllers, the network might become congested with control packets (i.e., PACKET-IN messages).
Different architectures are used for controller placement.
One basic architecture is the flat architecture.
Flat Architecture
[Figure: a controller and a switch exchanging PACKET-IN and flow-rule messages]
Hierarchical (tree) Architecture
Ring Architecture
Mesh Architecture - Reliable
[Figure: mesh architecture connecting User I and User II]
Control Mechanisms

 Distributed
 The control decisions can be taken in a distributed manner
 Ex: each subnetwork is controlled by a different controller

 Centralized
 The control decisions are taken in a centralized manner.
 Ex: a network is controlled by a single controller.
Backup Controller

 What happens if a controller goes down?
 A backup controller is introduced
 A replica of the main controller is created
 If the main controller goes down, the backup controller takes control of the network to provide uninterrupted network management.

Security I
 Enhanced security using SDN
 Firewall
 Proxy
 HTTP
 Intrusion detection system (IDS)
Controller Configuration Software
 Pox
 Nox
 FloodLight
 OpenDayLight [Popular!]
 ONOS [Popular!]
Software-Defined IoT Networking – Part I
Recent Advances of SDN in IoT
IoT Architecture
[Figure: IoT architecture]
Benefits of Integrating SDN in IoT
 Intelligent routing decisions can be deployed using SDN
 Simplification of information collection, analysis and decision
making
 Visibility of network resources – network management is
simplified based on user, device and application-specific
requirements
 Intelligent traffic pattern analysis and coordinated
decisions
SDN for IoT I
SDN for IoT II

Control of end-devices, such as sensors and actuators
SDN for IoT III

Rule-placement at access devices, while considering mobility and heterogeneity of end-users
SDN for IoT IV

Rule-placement and traffic engineering at backbone networks
SDN for IoT V

Flow classification and enhanced security at data center networks
Wireless Sensor Network I
 Challenges
 Real-time programming of sensor nodes
 Vendor-specific architecture
 Resource constrained – heavy computation cannot be performed
 Limited memory – cannot insert too many control programs
Wireless Sensor Network II
 Opportunities
 Can we program the sensor nodes in real-time?
 Can we change the forwarding path in real-time?
 Can we integrate different sensor nodes in a WSN?
Software-Defined WSN I
 Sensor OpenFlow (Luo et al., IEEE Comm. Letters ’12)
 Value-centric data forwarding
 Forward the sensed data if it exceeds a certain value
 ID-centric data forwarding
 Forward the sensed data based on the ID of the source node

A real-life implementation of this method has not been done.
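
The two forwarding styles can be illustrated with a pair of predicates. This is only a sketch of the idea, not Sensor OpenFlow itself; the field names, node IDs, and threshold are hypothetical.

```python
def value_centric(reading, threshold):
    """Forward only readings whose sensed value exceeds the threshold."""
    return reading["value"] > threshold

def id_centric(reading, allowed_sources):
    """Forward only readings that originate from the listed source nodes."""
    return reading["source_id"] in allowed_sources

readings = [
    {"source_id": "n1", "value": 24.0},
    {"source_id": "n2", "value": 31.5},
    {"source_id": "n3", "value": 35.0},
]
hot = [r["source_id"] for r in readings if value_centric(r, threshold=30.0)]
from_n1_n3 = [r["source_id"] for r in readings if id_centric(r, {"n1", "n3"})]
print(hot, from_n1_n3)
```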


Software-Defined WSN II
 Soft-WSN (Bera et al., IEEE SJ ’16)
 Sensor Device Management
 Sensor management
 Multiple sensors can be implemented in a single sensor board
 Sensors can be used depending on application-specific requirements
 Delay management
 Delay for sensing can be changed dynamically in real-time
 Active-Sleep Management
 States of active and sleep mode can be changed dynamically
Software-Defined WSN III
 Soft-WSN
 Topology Management
 Node-specific management – forwarding logic of a particular sensor can
be modified
 Network-specific management
 Forward all traffic of a node in the network
 Drop all traffic of a node in the network

Experimental results show that network performance can be improved using software-defined WSN over traditional WSN
Soft-WSN: Result I

The packet delivery ratio in the network increases using Soft-WSN compared to the traditional WSN.
Soft-WSN: Result II

The number of replicated data packets is reduced using Soft-WSN compared to the traditional WSN.
Soft-WSN: Result III

The number of control messages in the network is higher using Soft-WSN than with the traditional WSN. This is due to the PACKET-IN messages in the network: each time a node receives a new packet, it asks the controller for adequate forwarding logic.
Software-Defined WSN IV
 SDN-WISE (Galluccio et al., IEEE INFOCOM ’15)
 A software-defined WSN platform is designed
 Flow-table for rule placement at sensor nodes is designed
 Any programming language can be used through API to program the
nodes in real-time
SDN-WISE Protocol Stack
 A sensor node includes
 IEEE 802.15.4 protocol
 Micro control unit (MCU)
 Above the IEEE 802.15.4 stack, the forwarding layer consists of flow-rules.
 INPP: In-Network Packet Processing

Source: Galluccio et al., IEEE INFOCOM ’15

Summary
 SDN is useful to manage and control IoT network
 Wireless sensor nodes and network can be controlled using
SDN-based applications
 Network performance can be improved significantly using
SDN-based approaches over the traditional approaches
Software-Defined IoT Networking – Part II
Recent Advances of SDN in IoT
SDN for IoT - Recap
 Software-Defined WSN
 Different approaches: Sensor OpenFlow, Soft-WSN, SDN-WISE
 Software-Defined WSN can improve overall performance over traditional WSN
Traditional (Wireless) Mobile Network
 Problems in Traditional Mobile Network
 Difficult to scale: static, over-provisioned networks are too inflexible to manage mobile traffic with high demand
 Difficult to manage: often leads to misconfigurations
 Inflexible: introducing a new service takes too much time because the hardware architecture is inflexible
 Cost-expensive: both capital expenditure and operational expenditure are high

*Based on information from Open Networking Foundation (ONF)


SDN for Mobile Networking I
 Flow-Table Paradigm of SDN
 Well suited for end-to-end communication over multiple technologies
such as WiFi, 3G, 4G, etc.
 Logically Centralized Control
 Particularly useful for efficient base-station coordination for
addressing inter-cell interference



SDN for Mobile Networking II
 Path Management
 Data can be routed based on service requirements without depending
on core routing policies
 Network Virtualization
 Abstracts the physical resources from the network services
 Helps in providing seamless connectivity and service differentiation
among users



SDWMN - Use Case: Interference Management
[Figure: traditional mobile network vs. software-defined mobile network. Annotation: signals of eNodeB 2 will not affect signals of eNodeB 3]
SDWMN - Use Case: Mobile Traffic Management
[Figure: mobile traffic offloading based on OpenFlow. ANDSF: access network discovery and selection function]
Key Benefits
 Centralized control of devices manufactured by multiple
vendors
 Higher rate of integration of new services
 Abstracted network control and management
 Network abstracted from the user
Rule Placement at Access Devices
 Challenges
 Standard OpenFlow does not support wireless networks
 A modified version of OpenFlow is required
 Typically, users are mobile, so the network is highly dynamic
 Frequent changes in rule placement are also required
 Heterogeneous devices are present in the network
 How can such heterogeneous devices be supported on a single platform?
Approaches
 ODIN
 Ubi-Flow
 Mobi-Flow
ODIN I
 An agent is placed at access points to communicate with
controller
 Two components are present
 Odin agent – placed with the physical devices
 Odin master – placed at the controller end

Source: Towards Programmable Enterprise WLANs with Odin, ACM HotSDN 2012
ODIN II
 Conversion of 802.11
 LVAP – Light virtual AP
Ubi-Flow I
 Mobility management in SDIoT
 Scalable control of the APs
 Fault tolerance
 Flow-Scheduling
 Network partition
 Network matching
 Load balancing

Source: UbiFlow: Mobility Management in Urban-scale Software Defined IoT, IEEE INFOCOM 2015
Mobi-Flow I
 Mobility-aware flow-rule placement in SDIoT

Source: Mobility-Aware Flow-Table Implementation in Software-Defined IoT, IEEE GLOBECOM 2016


Mobi-Flow II
 Proactive rule placement depending on users’ movement in
the network
 Approach
 Predict location of end-users at (t+1) time, while the users are at (t)
time
 Place flow-rules at the APs which can be associated to the users based
on their predicted locations
Mobi-Flow III
 Location prediction
 Order-K Markov predictor: uses the last k location instances to predict the next location
 Flow-rule placement
 Linear programming can be used to select the optimal AP
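
An order-k Markov predictor can be sketched as a table that counts which location followed each sequence of k locations. This is a generic illustration of the technique, not the Mobi-Flow implementation; the class name, the AP labels, and the movement history are made up.

```python
from collections import Counter, defaultdict

class OrderKMarkovPredictor:
    def __init__(self, k):
        self.k = k
        self.counts = defaultdict(Counter)   # last-k context -> next-location counts

    def train(self, history):
        """Count, for every run of k locations, which location came next."""
        for i in range(len(history) - self.k):
            context = tuple(history[i:i + self.k])
            self.counts[context][history[i + self.k]] += 1

    def predict(self, recent):
        """Return the most frequent successor of the last k locations, or None."""
        context = tuple(recent[-self.k:])
        if context not in self.counts:
            return None                      # this context was never observed
        return self.counts[context].most_common(1)[0][0]

history = ["AP1", "AP2", "AP3", "AP1", "AP2", "AP3", "AP1", "AP2", "AP4"]
predictor = OrderKMarkovPredictor(k=2)
predictor.train(history)
print(predictor.predict(["AP1", "AP2"]))
```

In this history the pair (AP1, AP2) is followed by AP3 twice and by AP4 once, so AP3 is predicted. A controller could then pre-install flow-rules at the predicted AP before the user arrives.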
Mobi-Flow IV
[Figure: message overhead in the network; energy consumption in the network]

Control message overhead and energy consumption can be reduced significantly using Mobi-Flow compared to conventional flow-rule placement schemes.
Rule Placement at Backbone
Network
 Existing rule placement schemes for wired network can be
used
 Load balancing is an important issue due to the dynamic
nature of the IoT network
 Dynamic resource allocation can also be integrated
Data Center Networking
 Mice flows: wildcard rules can be placed to deal with mice flows
 Elephant flows: exact-match rules are useful
 Flows need to be classified before inserting flow-rules at the switches, so that they are forwarded adequately in the network
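
The classify-then-place step can be sketched as below. The 10 MiB threshold, the field names, and the rule layout are assumptions chosen for illustration; real classifiers typically use measured rates or durations rather than a single byte count.

```python
ELEPHANT_BYTES = 10 * 1024 * 1024   # assumed threshold: 10 MiB

def classify_flow(byte_count, threshold=ELEPHANT_BYTES):
    """Label a flow by its size so far."""
    return "elephant" if byte_count >= threshold else "mice"

def choose_rule(flow):
    """Exact-match rule for elephant flows, wildcard rule for mice flows."""
    if classify_flow(flow["bytes"]) == "elephant":
        return {"match": {"src": flow["src"], "dst": flow["dst"]}, "kind": "exact"}
    return {"match": {"src": "*", "dst": flow["dst"]}, "kind": "wildcard"}

flows = [
    {"src": "10.0.0.1", "dst": "10.0.1.9", "bytes": 4_000},
    {"src": "10.0.0.2", "dst": "10.0.1.9", "bytes": 800_000_000},
]
rules = [choose_rule(f) for f in flows]
print([r["kind"] for r in rules])
```

One wildcard rule can cover many short flows to the same destination, which saves flow-table space, while a large flow gets its own exact-match rule so it can be steered individually.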
Anomaly Detection in IoT Network

 Monitor the network through OpenFlow to detect any anomaly in the network
 This can be done by monitoring each flow in the network
 We can also collect the port statistics of the switches
 An anomaly may generate a large number of packets in the network, which can be detected by monitoring the flows
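
Flow monitoring of this kind can be sketched by comparing per-flow packet counters between two polls. The function name, flow labels, counter values, and threshold are hypothetical; in a real deployment the counters would come from the switches' flow statistics.

```python
def detect_anomalies(previous, current, max_new_packets):
    """Return the flow IDs whose packet counter grew by more than
    max_new_packets between two polls of the flow statistics."""
    anomalous = []
    for flow_id, count in current.items():
        growth = count - previous.get(flow_id, 0)
        if growth > max_new_packets:
            anomalous.append(flow_id)
    return sorted(anomalous)

poll_1 = {"h1->h2": 1_000, "h3->h2": 900}
poll_2 = {"h1->h2": 1_150, "h3->h2": 250_900, "h4->h2": 40}
suspects = detect_anomalies(poll_1, poll_2, max_new_packets=10_000)
print(suspects)
```

A fixed threshold is the simplest possible detector; it only illustrates the idea that a sudden surge in a flow's packet count stands out in the statistics.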
Experimenting with Wireless Network

 Mininet-WiFi
 Can be used to deploy a network
 Supports both wired and wireless network
 Wired – Ethernet protocol
 Wireless – WiFi protocol (IEEE 802.11 group)
 ONOS
 Can be used to place the controllers
Summary
 SDN-based solution approaches are useful to deal with
different challenges present in an IoT network
 Issues with interference and mobility can be managed using
software-defined approaches
Data Handling and Analytics – Part I
What is Data Handling
 Data handling
 Ensures that research data is stored, archived, or disposed of in a safe and secure manner during and after the conclusion of a research project
 Includes the development of policies and procedures to manage data handled electronically as well as through non-electronic means.

 Today, the main data concern is Big Data, due to
o Heavy traffic generated by IoT devices
o Huge amounts of data generated by the deployed sensors
What is Big Data
 “Big data technologies describe a new generation of technologies
and architectures, designed to economically extract value from very
large volumes of a wide variety of data, by enabling the high-
velocity capture, discovery, and/or analysis.”
[Report of International Data Corporation (IDC)]

 “Big data shall mean the data of which the data volume,
acquisition speed, or data representation limits the capacity of
using traditional relational methods to conduct effective analysis
or the data which may be effectively processed with important
horizontal zoom technologies.”
[National Institute of Standards and Technology (NIST)]
Types of Data
 Structured data
 Data that can be easily organized.
 Usually stored in relational databases.
 Structured Query Language (SQL) manages structured data in
databases.
 It accounts for only 20% of the total available data today in the
world.

 Unstructured data
 Information that does not possess any pre-defined model.
 Traditional RDBMSs are unable to process unstructured data.
 Enhances the ability to provide better insight to huge datasets.
 It accounts for 80% of the total data available today in the world.
Characteristics of Big Data
Big Data is characterized by 7 Vs –
 Volume
 Velocity
 Variety
 Variability
 Veracity
 Visualization
 Value
Characteristics of Big Data (Contd.)

 Volume
 Quantity of data that is generated
 Sources of data are added continuously

Example of volume -
• 30TB of images will be generated every night from the Large Synoptic Survey Telescope (LSST)
• 72 hours of video are uploaded to YouTube every minute
Characteristics of Big Data (Contd.)
 Velocity
 Refers to the speed of generation of data
 Data processing time decreasing day‐by‐day in order to provide
real‐time services
 Older batch processing technology is unable to handle high
velocity of data

Example of velocity –
 140 million tweets per day on average (according to a survey conducted
in 2011)
 New York Stock Exchange captures 1TB of trade information during each trading session
Characteristics of Big Data (Contd.)

 Variety
 Refers to the category to which the data belongs
 No restriction over the input data formats
 Data mostly unstructured or semi‐structured

Example of variety –
Pure text, images, audio, video, web, GPS data, sensor data,
SMS, documents, PDFs, flash etc.
Characteristics of Big Data (Contd.)
 Variability
 Refers to data whose meaning is constantly changing.
 Meaning of the data depends on the context.
 Data appear as an indecipherable mass without structure
Example:
Language processing, Hashtags, Geo‐spatial data, Multimedia, Sensor events
 Veracity
 Veracity refers to the biases, noise and abnormality in data.
 It is important in programs that involve automated decision‐
making, or feeding the data into an unsupervised machine
learning algorithm.
 Veracity isn’t just about data quality, it’s about data understandability.
Characteristics of Big Data (Contd.)
 Visualization
 Presentation of data in a pictorial or graphical format
 Enables decision makers to see analytics presented visually
 Identify new patterns

 Value
 It means extracting useful business information from scattered
data.
 Includes a large volume and variety of data
 Easy to access and delivers quality analytics that enables informed decisions
Data Handling Technologies
 Cloud computing
Essential characteristics according to NIST
 On‐demand self service
 Broad network access
 Resource pooling
 Rapid elasticity
 Measured service
Basic service models provided by cloud computing
 Infrastructure‐as‐a‐Service (IaaS)
 Platform‐as‐a‐Service (PaaS)
 Software‐as‐a‐Service (SaaS)
Data Handling Technologies (Contd.)
Internet of Things (IoT)
 According to Techopedia, IoT “describes a future where
every day physical objects will be connected to the
internet and will be able to identify themselves to other
devices.”
 Sensors embedded into various devices and machines and
deployed into fields.
 Sensors transmit sensed data to remote servers via the
Internet.
Data Handling Technologies (Contd.)
 Data handling at data centers
 Storing, managing, and organizing data.
 Estimates and provides necessary processing capacity.
 Provides sufficient network infrastructure.
 Effectively manages energy consumption.
 Replicates data to keep backup.
 Develop business oriented strategic solutions from big
data.
 Helps business personnel to analyze existing data.
 Discovers problems in business operations.
Flow of Data
 Generation: enterprise data, IoT data, bio‐medical data, other data
 Acquisition: data collection, data transportation, data pre‐processing
 Storage: Hadoop, MapReduce, NoSQL databases
 Analysis: Bloom filter, parallel computing, hashing and indexing
Data Sources
 Enterprise data
 Online trading and analysis data.
 Production and inventory data.
 Sales and other financial data.
 IoT data
 Data from industry, agriculture, traffic, transportation,
 Medical‐care data,
 Data from public departments, and families.
 Bio‐medical data
 Masses of data generated by gene sequencing.
 Data from medical clinics and medical R&Ds.
 Other fields
 Fields such as computational biology, astronomy, nuclear research, etc.
Data Acquisition
 Data collection
 Log files or record files that are automatically generated by data sources
to record activities for further analysis.
 Sensory data such as sound wave, voice, vibration, automobile, chemical,
current, weather, pressure, temperature etc.
 Complex and varied data collected through mobile devices, e.g.
geographical location, 2D barcodes, pictures, videos, etc.
 Data transmission
 After collection, data is transferred to a storage system for further
processing and analysis.
 Data transmission can be categorized as inter‐DCN transmission (between
data center networks) and intra‐DCN transmission (within a data center network).
Data Acquisition (Contd.)
 Data pre‐processing
 Collected datasets suffer from noise, redundancy, inconsistency,
etc., so pre‐processing of data is necessary.
 Pre‐processing of relational data mainly involves integration,
cleaning, and redundancy mitigation.
 Integration combines data from various sources and provides users
with a uniform view of the data.
 Cleaning identifies inaccurate, incomplete, or unreasonable
data, and then modifies or deletes such data.
 Redundancy mitigation eliminates data repetition through
detection, filtering, and compression of data to avoid unnecessary
transmission.
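The three pre‐processing steps above can be sketched as a small pipeline. This is only an illustration under assumed inputs: the record fields (`sensor`, `id`, `temp_c`, `temperature`), the plausible temperature range, and all function names are hypothetical and not taken from the slides.

```python
def integrate(*sources):
    """Integration: merge records from several sources into one uniform list."""
    merged = []
    for source in sources:
        for record in source:
            # Normalise field names so every record exposes the same keys.
            merged.append({
                "sensor": record.get("sensor") or record.get("id"),
                "temp_c": record.get("temp_c", record.get("temperature")),
            })
    return merged


def clean(records, low=-40.0, high=85.0):
    """Cleaning: drop incomplete records and readings outside an assumed range."""
    return [
        r for r in records
        if r["sensor"] is not None
        and r["temp_c"] is not None
        and low <= r["temp_c"] <= high
    ]


def deduplicate(records):
    """Redundancy mitigation: keep only the first copy of each identical record."""
    seen = set()
    unique = []
    for r in records:
        key = (r["sensor"], r["temp_c"])
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique


def preprocess(*sources):
    """Run integration, then cleaning, then redundancy mitigation."""
    return deduplicate(clean(integrate(*sources)))
```

Calling `preprocess(source_a, source_b)` on two lists of dictionaries returns one list with uniform keys, without incomplete or out‐of‐range readings, and without exact repeats.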
Data Storage
 File system
 Distributed file systems that store massive data and ensure – consistency,
availability, and fault tolerance of data.
 GFS is a notable example of a distributed file system that supports large‐
scale file storage, though its performance is limited in the case of small files.
 Hadoop Distributed File System (HDFS) and Kosmosfs are other notable file
systems, open‐source derivatives of the GFS design.
 Databases
 Emergence of non‐relational databases (NoSQL) in order to deal
with the characteristics that big data possesses.
 Three main types of NoSQL databases – key‐value databases, column‐oriented
databases, and document‐oriented databases.
Data Handling Using Hadoop
Reliable, scalable, distributed data handling
What is Hadoop
 Hadoop is a software framework for
distributed processing of large datasets
across large clusters of computers.
 Hadoop is an open-source implementation of
Google's GFS and MapReduce.
 Apache Hadoop's MapReduce and Hadoop
Distributed File System (HDFS)
components were originally derived from
Google's MapReduce and Google File
System (GFS), respectively.
Source: https://www.cloudnloud.com/hadoop-hdfs-operations/
Building Blocks of Hadoop
 Hadoop Common
 A module containing the utilities that support the other Hadoop components
 Hadoop Distributed File System (HDFS)
 Provides reliable data storage and access across the nodes
 MapReduce
 Framework for applications that process large datasets in parallel.
 Yet Another Resource Negotiator (YARN)
 Next‐generation MapReduce, which assigns CPU, memory and storage to applications
running on a Hadoop cluster.
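To make the MapReduce building block more concrete, the following is a single‐process sketch of the map, shuffle, and reduce phases using the classic word‐count example. It is not Hadoop's API: the function names are hypothetical, and a real cluster would run the map and reduce calls in parallel on different nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Chain the three phases over a list of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

Because each call to `map_phase` depends only on its own line, and each call to `reduce_phase` only on its own key, both phases can be spread across machines without coordination.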
Hadoop Distributed File System (HDFS)
 Centralized node
 Namenode
 Maintains metadata info about files
 Distributed node
 Datanode
 Store the actual data
 Files are divided into blocks
 Each block is replicated
Source: http://hadoop.apache.org/docs/r1.2.1/hdfs_design.html
Name and Data Nodes
 Namenode
 Stores filesystem metadata.
 Maintains two in‐memory tables, to map the datanodes to the blocks, and vice versa
 Datanode
 Stores actual data
 Data nodes can talk to each other to rebalance and replicate data
 Data nodes update the namenode with the block information periodically
 Before updating the namenode, datanodes verify the checksums.
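The division of labour between namenode and datanodes can be illustrated with a toy model. This is a sketch under strong assumptions, not HDFS code: the class name, the 4‐byte block size, the replication factor of 2, and the round‐robin placement are all made up for readability, and the placement only yields distinct nodes while the replication factor does not exceed the number of datanodes.

```python
from collections import defaultdict

BLOCK_SIZE = 4          # bytes per block; tiny, for illustration only
REPLICATION = 2         # copies kept of each block (assumed value)


class ToyNamenode:
    """Holds metadata only: which blocks exist and where their replicas live."""

    def __init__(self, datanodes):
        self.datanodes = list(datanodes)
        self.block_to_nodes = defaultdict(list)   # block id -> datanodes
        self.node_to_blocks = defaultdict(list)   # datanode -> block ids
        self.storage = defaultdict(dict)          # stands in for datanode disks

    def put(self, name, data):
        """Split data into blocks and place each replica on a different node."""
        block_ids = []
        for i in range(0, len(data), BLOCK_SIZE):
            index = i // BLOCK_SIZE
            block_id = f"{name}#{index}"
            for r in range(REPLICATION):
                node = self.datanodes[(index + r) % len(self.datanodes)]
                self.storage[node][block_id] = data[i:i + BLOCK_SIZE]
                self.block_to_nodes[block_id].append(node)
                self.node_to_blocks[node].append(block_id)
            block_ids.append(block_id)
        return block_ids

    def get(self, block_ids, failed=()):
        """Reassemble a file, skipping any datanode listed as failed."""
        parts = []
        for block_id in block_ids:
            node = next(n for n in self.block_to_nodes[block_id] if n not in failed)
            parts.append(self.storage[node][block_id])
        return b"".join(parts)
```

The two dictionaries mirror the two in‐memory tables mentioned above, and `get(..., failed={"dn1"})` shows why replication matters: the file can still be read when one datanode is unavailable.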
Job and Task Trackers
 Job Tracker –
 Runs with the Namenode
 Receives the user’s job
 Decides on how many tasks will run (number
of mappers)
 Decides on where to run each mapper
(concept of locality)
 Task Tracker –
 Runs on each datanode
 Receives the task from Job Tracker
 Always in communication with the Job
Tracker, reporting progress
Source: http://developeriq.in/articles/2015/aug/11/an-introduction-to-apache-hadoop-for-big-data/
Hadoop Master / Slave Architecture
 Master‐slave shared‐nothing architecture
 Master
 Executes operations like opening, closing,
and renaming files and directories.
 Determines the mapping of blocks to
Datanodes.
 Slave
 Serves read and write requests from the
file system’s clients.
 Performs block creation, deletion, and
replication as instructed by the Namenode.
Source: http://ankitasblogger.blogspot.in/2011/01/hadoop-cluster-setup.html
Data Handling and Analytics – Part II
What is Data Analytics
 “Data analytics (DA) is the process of examining data sets in
order to draw conclusions about the information they
contain, increasingly with the aid of specialized systems and
software. Data analytics technologies and techniques are
widely used in commercial industries to enable organizations
to make more‐informed business decisions and by scientists
and researchers to verify or disprove scientific models,
theories and hypotheses.”
[An admin's guide to AWS data management]
Types of Data Analysis
 Qualitative Analysis
o Deals with the analysis of data that is categorical in nature
 Quantitative Analysis
o Quantitative analysis refers to the process by which numerical
data is analyzed
Qualitative Analysis
 Data is not described through numerical values
 Described by some sort of descriptive context such as text
 Data can be gathered by many methods such as interviews, videos and
audio recordings, field notes
 Data needs to be interpreted
 The grouping of data into identifiable themes
 Qualitative analysis can be summarized by three basic principles :
 Notice things
 Collect things
 Think about things
Quantitative Analysis
 Quantitative analysis refers to the process by which numerical data is
analyzed
 Involves descriptive statistics such as mean, median, standard deviation
 The following are often involved with quantitative analysis:
 Statistical models
 Analysis of variables
 Data dispersion
 Analysis of relationships between variables
 Contingency and correlation
 Regression analysis
 Statistical significance
 Precision
 Error limits
Comparison
Qualitative Data                         Quantitative Data
Data is observed                         Data is measured
Involves descriptions                    Involves numbers
Emphasis is on quality                   Emphasis is on quantity
Examples are color, smell, taste, etc.   Examples are volume, weight, etc.
Advantages
 Allows for the identification of important (and often mission‐
critical) trends
 Helps businesses identify performance problems that require
some sort of action
 Can be viewed in a visual manner, which leads to faster and
better decisions
 Better awareness regarding the habits of potential customers
 It can provide a company with an edge over their competitors
Statistical models
 A statistical model is a mathematical equation (or set of equations)
formulated as relationships between variables.
 A statistical model illustrates how a set of random variables is
related to another set of random variables.
 A statistical model is represented as the ordered pair (X, P)
 X denotes the set of all possible observations
 P refers to the set of probability distributions on X
Statistical models (Contd.)
 Statistical models are broadly categorized as
 Complete models
 Incomplete models
 A complete model has as many variables as
equations
 An incomplete model does not have the same number of
variables as equations
Statistical models (Contd.)
 Building a statistical model involves the following steps:
 Data Gathering
 Descriptive Methods
 Thinking about Predictors
 Building of model
 Interpreting the Results
Analysis of variance
 Analysis of Variance (ANOVA) is a parametric statistical
technique used to compare datasets.
 ANOVA is best applied where more than 2 populations or
samples are meant to be compared.
 To perform an ANOVA, we must have a continuous response
variable and at least one categorical factor (e.g. age, gender)
with two or more levels (e.g. Locations 1, 2)
 ANOVAs require data from approximately normally distributed
populations
Analysis of variance (Contd.)
 Assumptions required to perform ANOVA –
 Independence of cases
 The sample should be selected randomly
 There should not be any pattern in the selection of the
sample
 Normality
 Distribution of each group should be normal
 Homogeneity
 Variance within each group should be approximately the same
across groups (e.g. should not compare data from cities with
those from slums)
Analysis of variance (Contd.)
 Analysis of variance (ANOVA) has three types:
 One way analysis
o One fixed factor (levels set by investigator). Factors: age,
gender, etc.
 Two way analysis
o Two factor variables
 K‐way analysis
o k factor variables
Analysis of variance (Contd.)
 Total Sum of square
 In statistical data analysis, the total sum of squares (TSS or SST) is a quantity that
appears as part of a standard way of presenting results of such analyses. It is defined
as being the sum, over all observations, of the squared differences of each observation
from the overall mean.
 F‐ratio
 The ratio of the variance between groups to the variance within groups
 The F ratio is approximately 1.0 when the null hypothesis is true and tends to be greater than
1.0 when the null hypothesis is false.
 Degrees of freedom
 The number of degrees of freedom is the number of values in the final calculation of a
statistic that are free to vary.
 For a one‐way ANOVA with k groups and N observations in total, the between‐group
variance has k − 1 degrees of freedom and the within‐group variance has N − k.
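As a worked illustration of these quantities, the sketch below computes the total sum of squares and the F ratio for a one‐way ANOVA from scratch. The function name and return layout are hypothetical; the F ratio is taken as the between‐group mean square divided by the within‐group mean square, with k − 1 and N − k degrees of freedom for k groups and N observations.

```python
def one_way_anova(groups):
    """Return (SST, SSB, SSW, F) for a list of groups of numeric observations."""
    all_values = [x for g in groups for x in g]
    n, k = len(all_values), len(groups)
    grand_mean = sum(all_values) / n

    # Total sum of squares: squared distance of every observation from the overall mean.
    sst = sum((x - grand_mean) ** 2 for x in all_values)

    # Between-group sum of squares: how far each group mean sits from the overall mean.
    ssb = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)

    # Within-group sum of squares: spread of observations around their own group mean.
    ssw = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

    # F = (SSB / (k - 1)) / (SSW / (n - k)); assumes SSW is not zero.
    f_ratio = (ssb / (k - 1)) / (ssw / (n - k))
    return sst, ssb, ssw, f_ratio
```

For the groups `[1, 2, 3]`, `[2, 3, 4]`, and `[5, 6, 7]`, the group means are 2, 3, and 6, which gives SSB = 26, SSW = 6, SST = 32, and F = (26 / 2) / (6 / 6) = 13. A value this far above 1.0 suggests that the group means differ.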
Data dispersion
 A measure of statistical dispersion is a nonnegative real number that is zero if all
the data are the same and increases as the data becomes more diverse.
 Examples of dispersion measures:
 Range
 Average absolute deviation
 Variance and Standard deviation
Data dispersion (Contd.)
 Range
 The range is calculated by simply taking the difference between the maximum and
minimum values in the data set.
 Average absolute deviation
 The average absolute deviation (or mean absolute deviation) of a data set is the average of the
absolute deviations from the mean.
 Variance
 Variance is the expectation of the squared deviation of a random variable from its mean
 Standard deviation
 Standard deviation (SD) is a measure that is used to quantify the amount of variation
or dispersion of a set of data values
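The four measures above can be computed with Python's standard `statistics` module. This is a minimal sketch: the function name `dispersion` is hypothetical, and the population forms of variance and standard deviation (`pvariance`, `pstdev`) are an assumed choice, since the slides do not say whether the data are a sample or a whole population.

```python
import statistics


def dispersion(values):
    """Return the range, average absolute deviation, variance and standard deviation."""
    mean = statistics.fmean(values)
    return {
        "range": max(values) - min(values),
        "avg_abs_dev": sum(abs(x - mean) for x in values) / len(values),
        "variance": statistics.pvariance(values),   # population variance
        "std_dev": statistics.pstdev(values),       # population standard deviation
    }
```

For `[2, 4, 4, 4, 5, 5, 7, 9]` the mean is 5, so the range is 7, the average absolute deviation is 12 / 8 = 1.5, the variance is 32 / 8 = 4, and the standard deviation is 2. For data that are all identical, every measure is zero.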
Contingency and correlation
 In statistics, a contingency table (also known as a cross tabulation or crosstab) is a
type of table in a matrix format that displays the (multivariate) frequency
distribution of the variables.
 Provides a basic picture of the interrelation between two variables
 A crucial problem of multivariate statistics is finding the (direct) dependence structure
underlying the variables contained in high‐dimensional contingency tables
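A two‐variable contingency table can be built by counting how often each pair of categories occurs together. The sketch below uses only the standard library; the function name and the example categories are hypothetical.

```python
from collections import Counter


def contingency_table(rows, cols):
    """Count how often each (row category, column category) pair occurs."""
    counts = Counter(zip(rows, cols))
    row_labels = sorted(set(rows))
    col_labels = sorted(set(cols))
    # Missing pairs count as 0 because Counter returns 0 for unseen keys.
    table = [[counts[(r, c)] for c in col_labels] for r in row_labels]
    return row_labels, col_labels, table
```

For example, with sensor locations `["indoor", "indoor", "outdoor", "outdoor", "outdoor"]` and statuses `["ok", "fault", "ok", "ok", "fault"]`, the rows are `indoor` and `outdoor`, the columns are `fault` and `ok`, and the table is `[[1, 1], [1, 2]]`.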
Contingency and correlation (Contd.)
 Correlation is a technique for investigating the relationship between two
quantitative, continuous variables
 Pearson's correlation coefficient (r) is a measure of the strength of the linear association
between the two variables.

 Correlations are useful because they can indicate a predictive relationship that can be
exploited in practice
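Pearson's r can be computed directly from its definition: the sum of the products of the deviations from the two means, divided by the square root of the product of the two sums of squared deviations. The sketch below assumes two equal‐length numeric lists, neither of which is constant; the function name is hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Sum of products of deviations from the two means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

The result lies between −1 and 1. Values near 1 or −1 indicate a strong linear relationship, and values near 0 indicate a weak one.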
Regression analysis
 In statistical modeling, regression analysis is a statistical process for estimating the
relationships among variables
 Focuses on the relationship between a dependent variable and one or more
independent variables
 Regression analysis estimates the conditional expectation of the dependent
variable given the independent variables
Regression analysis (Contd.)
 The estimation target is a function of the independent variables called the
regression function
 It is also of interest to characterize the variation of the dependent variable around the
regression function, which can be described by a probability distribution
 Regression analysis is widely used for prediction and forecasting, where its use has
substantial overlap with the field of machine learning
 Regression analysis is also used to understand which among the independent
variables are related to the dependent variable
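The simplest case, one independent variable and a straight‐line regression function, can be estimated by ordinary least squares. This is a sketch of that special case only; the function names are hypothetical, and the x values are assumed not to be all equal.

```python
def fit_line(xs, ys):
    """Least-squares estimates (intercept, slope) for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Slope: co-variation of x and y divided by the variation of x.
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    # The fitted line always passes through the point of means.
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(model, x):
    """Evaluate the fitted regression function at a new x."""
    intercept, slope = model
    return intercept + slope * x
```

Fitting `xs = [0, 1, 2, 3]` and `ys = [1, 3, 5, 7]` gives an intercept of 1 and a slope of 2, so the prediction at x = 4 is 9. This is the forecasting use mentioned above.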
Statistical significance
 Statistical significance is the likelihood that the difference in conversion rates
between a given variation and the baseline is not due to random chance
 Statistical significance level reflects the risk tolerance and confidence level
 There are two key variables that go into determining statistical significance:
 Sample size
 Effect size
Statistical significance (Contd.)
 Sample size refers to the number of observations in the experiment
 The larger your sample size, the more confident you can be in the result of the
experiment (assuming that it is a randomized sample)
 The effect size is just the standardized mean difference between the two groups
 If a particular experiment is replicated, the different effect size estimates from each
study can easily be combined to give an overall best estimate of the effect size
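One common way to compute a standardized mean difference is Cohen's d, which divides the difference of the two group means by their pooled standard deviation. The slides do not name a specific formula, so this choice is an assumption, as is the function name.

```python
import math
import statistics


def cohens_d(group_a, group_b):
    """Standardized mean difference between two groups, using the pooled SD."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)   # sample variance of group A
    var_b = statistics.variance(group_b)   # sample variance of group B
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.fmean(group_a) - statistics.fmean(group_b)) / pooled_sd
```

For `[2, 4, 6]` and `[1, 3, 5]` both sample variances are 4, the pooled standard deviation is 2, and the means are 4 and 3, so d = 0.5. Because d is unit‐free, estimates from replicated experiments can be compared or combined.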
Precision and Error limits
 Precision refers to how close estimates from different samples are to each other
 The standard error is a measure of precision
 When the standard error is small, estimates from different samples will be close in
value, and vice versa
 Precision is inversely related to standard error
Precision and Error limits (Contd.)
 The limits of error are the maximum overestimate and the maximum underestimate
from the combination of the sampling and the non‐sampling errors
 The margin of error is defined as –
 Margin of error = Critical value × Standard deviation of the statistic
 Critical value: Determines the tolerance level of error.
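For the sample mean, the standard deviation of the statistic is the standard error, that is, the sample standard deviation divided by the square root of the sample size. The sketch below applies the formula above to that case. The function names are hypothetical, and the default critical value of 1.96 is an assumption corresponding to a 95% confidence level under a normal approximation.

```python
import math
import statistics


def margin_of_error(sample, critical_value=1.96):
    """Critical value times the standard error of the sample mean."""
    standard_error = statistics.stdev(sample) / math.sqrt(len(sample))
    return critical_value * standard_error


def confidence_interval(sample, critical_value=1.96):
    """Interval of mean minus and plus the margin of error."""
    mean = statistics.fmean(sample)
    moe = margin_of_error(sample, critical_value)
    return mean - moe, mean + moe
```

Because the standard error shrinks as the sample grows, a larger sample with the same spread gives a narrower interval, which matches the inverse relation between precision and standard error.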