Split Architecture for Large Scale Wide Area Networks

SPARC ICT-258457 Deliverable D3.3 Split Architecture for Large Scale Wide Area Networks Editors: Wolfgang John; Ericsson (EA) Deliverable type: Report (R) Dissemination level: (Confidentiality) Public (PU) Contractual delivery date: M26 Actual delivery date: M27 Version: 1.0 Total number of pages: 129 Keywords: Split Architecture, OpenFlow, Carrier-Grade, Network Virtualization, Resiliency, OAM, QoS, Service Creation, Scalability, Energy-Efficient Networking, Multi-Layer Networking Abstract This deliverable defines a carrier-grade split architecture based on requirements identified during the SPARC project. It presents the SplitArchitecture proposal, the SPARC concept for Software Defined Networking (SDN) introduced for large-scale wide area networks such as access/aggregation networks, and evaluates technical issues against architectural trade-offs. First we present the control and management architecture of the proposed SplitArchitecture. Here, we discusses a recursive control architecture consisting of hierarchically stacked control planes and provide initial considerations regarding network management integration to SDN in general and SplitArchitecture in particular. Next, OpenFlow extensions to support the carrier-grade SplitArchitecture are discussed. These are: a) Openness & Extensibility, extending OpenFlow with more advanced processing functionalities on both data and control planes; b) Virtualization, enabling a flexible way of partitioning the network into virtual networks while providing full isolation between these partitions; c) OAM, presenting a solution for both technology-specific OAM (MPLS BFD) and a novel technology agnostic flow OAM; d) Resiliency approaches for surviving link failures or failures of controller or forwarding elements; e) Bootstrapping and topology discovery issues, discussing current discovery of network devices and their interconnections; f) Service creation solutions for integration of residential customer services in the form of PPP, and business customer services in form of pseudo-wires (PWE); g) Energy-efficient networking approaches for energy savings and requirements on the OpenFlow protocol; h) QoS aspects, showing how traditional QoS tools such as packet classification, metering, coloring, policing, shaping and scheduling can be realized in an OpenFlow environment; (i) and Multilayer aspects outlining different stages of packet-optical integration. In addition, we discuss selected deployment and adoption scenarios faced by modern operator networks, such as service creation scenarios and peering aspects, i.e., how to interconnect with legacy networks. Finally, we indicate how our SplitArchitecture approach meets carrier grade scalability requirements in access/aggregation network scenarios. WP3, Deliverable 3.3 Split Architecture - SPARC Disclaimer This document contains material which is the copyright of certain SPARC consortium parties and may not be reproduced or copied without permission. In the case of Public (PU): All SPARC consortium parties have agreed to full publication of this document. In the case of Restricted to Program (PP): All SPARC consortium parties have agreed to make this document available on request to other framework program participants. In the case of Restricted to Group (RE): All SPARC consortium parties have agreed to full publication of this document. However this document is written for / being used by an <organization / other project / company, etc.> as <a contribution to standardization / material for consideration in product development, etc.>. In the case of Consortium Confidential (CO): The information contained in this document is the proprietary confidential information of the SPARC consortium and may not be disclosed except in accordance with the consortium agreement. The commercial use of any information contained in this document may require a license from the proprietor of that information. Neither the SPARC consortium as a whole, nor any specific party of the SPARC consortium warrant that the information contained in this document is acceptable for use, nor that use of the information is free from risk, and accepts no liability for any loss or damage suffered by any person or institution using this information. Imprint [Project title] [Short title] [Number and title of work package] [Document title] [Editors] [Work package leader] [Task leader] Split Architecture for carrier grade networks SPARC WP3 – Architecture Split Architecture for Large- Scale Wide Area Networks Wolfgang John Wolfgang John Wolfgang John Copyright notice © 2012 Participants in project SPARC (as specified in the Partner and Author list on page 7) © SPARC consortium 2012 Page 2 of 129 WP3, Deliverable 3.3 Split Architecture - SPARC Executive summary In this deliverable, we present our final proposal for a carrier-grade split architecture, introducing Software Defined Networking (SDN) for large-scale wide area networks. Based on our conclusions from D3.1, we focus on OpenFlow as an enabling technology our proposed SplitArchitecture. However, earlier SPARC deliverables (D2.1 and D3.1) made it clear that current OpenFlow implementations do not fulfill carrier requirements. Thus, this deliverable presents a suitable framework for controlling carrier-grade operator networks and investigates missing features in OpenFlow as identified during the SPARC project in the context of access/aggregation network scenarios. The overall conclusion of the SPARC project is that it is technically feasible to apply an OpenFlow-based split architecture to the carrier domain. This novel architecture paradigm promises improved network design and operation in large-scale networks with millions of customers, offering high levels of availability, flexibility and automation. The results of this project prove that these promises are valid and definitely deserve further attention by the networking industry in general and telecommunications operators in particular. The figure below depicts a high-level representation of the main building blocks considered in our SplitArchitecture. In addition to the split between data and control planes, as commonly discussed in the context of SDN, we will also discuss a split of the data plane into forwarding and processing. Furthermore, we provide initial considerations of how network management-related functions can be integrated into the SplitArchitecture. Schematic SplitArchitecture Overview SPARC Control and Management Architecture (Section 4) The requirements for a split architecture solution as defined in WP2, include the key aspects of support for virtualization of network resources in order to allow sharing of infrastructure among multiple operators; and increased flexibility by allowing deployment of new services in parallel to existing legacy protocol stacks. Furthermore, common carrier-grade requirements still apply, including service isolation, QoS enforcement, NMS integration, OAM provisioning, resiliency measures and scalability. To meet this diverse set of requirements, in our SplitArchitecture proposal the control functions are organized as a stack of control planes, connected to each other via OpenFlow (see figure on schematic SplitArchitecture depicted above). Each control plane consumes services from lower control planes and acts as a controller according to the OpenFlow terminology, and, at the same time offers services to control blocks in higher planes and acts as a datapath element in this role. To facilitate this architecture, we define extensions to the OpenFlow protocol with flowspace management, which allows a control plane to express those parts of the overall flowspace it is actually willing to control. Additionally, we introduce a more generalized transport endpoint concept that maps onto the OpenFlow definition of “port” for datapath elements and supports transport endpoints on different planes in the stacked control layer. A control plane contains several functional sub-blocks. First, an internal control block hosts the backplane control logic providing connectivity services within the sub-domain under control. Second, an external control block contains the protocol logic, which interacts with external peer control entities which manage other sub-domains. Finally, an optional interface might expose transport endpoints offered by this control entity via the OpenFlow protocol. © SPARC consortium 2012 Page 3 of 129 WP3, Deliverable 3.3 Split Architecture - SPARC As mentioned, we define a recursive control layer by stacking control planes recursively. This hierarchical carrier-grade control architecture enables operators to deploy several control planes with minimal interference and assign flows dynamically based on given policies. Furthermore, it gives a network operator tight control over the level of detail of which data plane details are exposed to higher control planes or to third parties. We provide considerations with regard to a suitable management framework flanking the controller architecture as well. When traditional network management definitions are applied to a generic SDN model with both a centralized control plane and a centralized management plane, we conclude that it is difficult to differentiate precisely between control and management functions in the context of SDN. Therefore, we present a proposal of how to integrate network management and control with the flexibility to choose whether to place network management functions (NMF) within an SDN controller or a separate network management system (NMS). Final functional assignment should be done on a by-case basis, depending on the exact scenario and use case in question. Parameters affecting such an analysis include the scale of the network (number of devices and geographical spread), existing legacy infrastructure in place, type of transport technologies in use, and type of services to be supported. Thus, in certain scenarios, either the controller based NMF or the external NMS could be designed minimalistically or even be omitted. In this deliverable, we present a way to handle the design choice of where to place network management functions. In our recommendations for carrier grade networks, we used timeliness and automatic configuration as the differentiators between control and management functions. However, depending on the specific use case and the technology used, the results of such an assessment might look quite different from case to case. SPARC extensions to OpenFlow and network element functions beyond forwarding (Section 5) In the section above, we outline a scalable, flexible control and management architecture. However, the network elements and their control interface (i.e. OpenFlow) will also require extensions, since we can conclude from our earlier deliverables (D2.1 and D3.1) that current OpenFlow-based implementations and standards do not fulfill carrier requirements. Below, we summarize the OpenFlow extensions discussed in this document. All these protocol extensions also imply additional functionality on the network elements that go beyond pure packet and flow forwarding:    Openness and Extensibility: firstly, we suggest extensions required to implement the proposed recursive control plane, specifically a more flexible port management by replacing OpenFlow’s current physical port model with a generalized transport endpoint model. Furthermore, we realized that more advanced processing functionalities at the data plane are desirable in many situations. OpenFlow currently provides processing of packets through stateless, lightweight actions (e.g., pushing tags, updating header fields, etc.). For statefull processing (i.e. taking proceeding packets of the flow into account), we propose using separate processing instances on a datapath element, which can be addressed by a lightweight “process” action type. Finally, we also revisit virtual ports and discuss their applicability for the use case of OAM, which requires execution of parts of the state machine directly on the datapath to meet strict timing constraints. In this final case, we specifically propose two types of virtual ports: a pre-/post-filter virtual port attached to a physical port, for cases in which we want to avoid passing OAM messages through the datapath element’s forwarding engine (e.g., for connectivity checks); and a terminating virtual port, which terminates OAM messages that actually traverse the entire forwarding engine and thus also tests flow table entries (fate sharing). Virtualization: a crucial feature of future carrier-grade networks is network virtualization, enabling multi-service and multi-operator scenarios on a shared physical network infrastructure. In this deliverable, we propose improvements to the reliability, isolation and automation aspects of an OpenFlow-based network virtualization system. Regarding isolation, we identify the current lack of QoS primitives in OpenFlow as a major barrier. We also propose extensions to enable automation of virtual network setup and management. OAM: one important requirement for carrier-grade networks is the availability of proper OAM solutions. Regarding OAM integration in OpenFlow, we identified two contradicting aspects: on the one hand, integrating existing OAM tools (e.g., Ethernet or MPLS OAM) provides the desired compatibility with legacy OAM toolsets and offers wellknown standard functionalities. On the other hand, integrating existing OAM tools in an OpenFlow environment requires integrating several technology-specific toolsets, which will substantially increase the complexity of datapath elements. In this deliverable, we discuss both aspects: a technology-dependent OAM solution and a novel, technology-agnostic generic flow OAM solution. As the technology-dependent OAM example, we detail how to implement an MPLS BFD based continuity check for both OpenFlow versions 1.0 and 1.1. For a technologyagnostic flow OAM, we propose a generic OAM module as a separate process on each datapath element. To still ensure fate sharing of OAM and data traffic, we suggest a virtual data packet approach that allows the OAM tool to test the entire forwarding engine. © SPARC consortium 2012 Page 4 of 129 WP3, Deliverable 3.3       Split Architecture - SPARC Network Resiliency: resiliency and reliability are important requirements for carrier-grade networks. In this deliverable, we first study mechanisms to ensure data plane resiliency in an OpenFlow scenario. We show how data plane resiliency can be realized through rerouting, restoration and protection. We also discuss control plane resiliency by outlining scenarios of how out-of-band and in-band control networks can be used together to achieve increased robustness in the control network. Control Channel Bootstrapping and Topology Discovery: current OpenFlow specifications do not describe how initial address assignment and control channel setup are performed. This is especially challenging in case of an inband control network, since the datapath elements need to be able to establish IP connectivity towards the network control in the absence of a node configuration protocol. We will propose a method that facilitates automatic bootstrapping of datapath elements in such an in-band control case. Furthermore, we will present our extensions to the topology discovery module that is implemented in the NOX controller. Service Creation: we define service creation as the configuration process of network functions at service creation points within (or at the border of) the access/aggregation network to provide services to various types of customers. Besides connectivity, the configured functions include, among others, authentication, authorization and accounting (AAA) aspects. We propose integration of residential customer services in the form of PPP and business customer services in the form of pseudowires (PWE) in an OpenFlow-controlled operator network. Energy-Efficient Networking: centralized control software like OpenFlow offers additional options for reducing network energy consumption. We discuss possible energy saving approaches and functions (e.g., network topology optimization, burst mode operation, adaptive link rate). We then propose two sets of extensions for energy efficiency: one set of functions relating to port features (e.g., switching them on/off, enabling and configuring energy-efficiency functions etc.); and another set relating to configuration and monitoring of components of the switch itself (e.g., switch internal power management, switch temperature monitoring etc.). QoS: support for QoS mechanisms is generally considered a key requirement for carrier-grade networks. Furthermore, QoS plays an even more important role in a virtualized, multi-provider, multi-service operator network enabled by SplitArchitecture. In this deliverable, we show how traditional QoS tools such as packet classification, metering, coloring, policing, shaping and scheduling can be realized in an OpenFlow environment. Multilayer Aspects: we discuss the extension of OpenFlow toward control of configurable transport technologies, like wavelength and TDM switched networks. We used the example of circuit switched optical layers, i.e., packetoptical integration. We first discuss the additional requirements placed on a control framework by optical network equipment. We then outline possible phases for realizing packet-optical integration in an OpenFlow environment. Finally, we detail a proposal for GMPLS-aware multi-layer/multi-region extensions to OpenFlow. Implementation scenarios of the SPARC SplitArchitecture (Section 6) We differentiate between four types of how to integrate an OpenFlow-based SplitArchitecture into carrier gradenetworks: 1. Basic emulation of transport service: OpenFlow data and control plane emulates and replaces legacy transport technologies (e.g., Ethernet, MPLS). 2. Enhanced emulation of transport services: As in 1., OpenFlow is used to provide transport services. Additional features and functions are added to both the data and control planes in order to comply with carrier-grade requirements. 3. Service node virtualization: In addition to transport services, OpenFlow also takes control of (distributed) service node functionalities, including service creation, authentication and authorization. 4. All-OpenFlow network: OpenFlow also controls other network domains, e.g., customer premises (RGWs) and the operator’s core domain. Considering that current OpenFlow 1.x is sufficient to provide integration type 1, the main focus of this deliverable is on integration type 2, i.e., studying possible extensions to OpenFlow in order to fulfill carrier-grade requirements, but we are also starting to touch upon integration type 3 in some cases (specifically in Sections 5.6 and 6.2 about service creation). In Section 6, we present implementation scenarios of the proposed carrier-grade SplitArchitecture in an operator network, specifically in access/aggregation networks. Besides OpenFlow-controlled transport connectivity, we propose three evolutionary approaches for OpenFlow integration of service creation functions in access/aggregation networks, depending on which elements and functionalities are to be controlled by OpenFlow. These approaches include centralized OpenFlow control of a single service creation point at the IP edge only (e.g., BRAS), a decentralized model expanding OpenFlow control to aggregation devices (e.g., DSLAM), and finally a complete OpenFlow controlled approach including all devices in the access/aggregation domain. © SPARC consortium 2012 Page 5 of 129 WP3, Deliverable 3.3 Split Architecture - SPARC Regarding a decentralized OpenFlow control model, we further discuss general implementation options for service creation. In addition to the need for legacy support for PPPoE, we identify another important requirement on OpenFlow improvements as seen from SPARC: the decision logic for authentication and authorization, which we will discuss specifically in order to outline the requirements and the potential options in more detail. As examples of residential customer services, we outline the implementation of a “SPARC BRAS” and a “SPARC DHCP++”. The “SPARC BRAS” transforms current service creation design into the SPARC design principle of the split between forwarding and processing. This discussion includes a detailed analysis of how OpenFlow fulfills the requirements for a BRAS/BNG as defined by the Broadband Forums TR-101 specification. The “SPARC DHCP++” essentially introduces a new set of protocols that needs to be supported in the OpenFlow environment. In many scenarios, OpenFlow typically does not control all parts of the network structure. As an example, access/aggregation networks might be realized in a SplitArchitecture design, whereas the core network segment to which these networks connect still has legacy IP/MPLS as the predominate technology. This leads to an additional requirement: The SplitArchitecture domain must cooperate with legacy control planes in a suitable peering or horizontal interworking model. We discuss different options for connecting domains controlled by the SPARC control framework to legacy IP/MPLS control planes. As a result, we propose to update the controller architecture with a generalized network visor within control planes that provide a virtual router model of their part of the SplitArchitecture domain. An NNI protocol proxy can then use this virtual router to steer the communication with legacy control planes (e.g. the IP/MPLS), using relevant legacy protocols (OSPF-TE, LDP, BGP, etc.) that run as part of the controller. Finally, we examined the feasibility of a SplitArchitecture with a numerical scalability study based on an idealized deployment model of an access/aggregation network. The results show that there are no stability concerns for static scenarios. The resulting requirements from the numerical model are in the order of magnitude or even below the capabilities of existing control and datapath devices. However, dynamic behavior (e.g., reconfiguration due to link failures) might raise scalability concerns, especially when strict time constraints exist. However, we can show that changing the connection structures through more careful network planning can increase scalability significantly even in the face of dynamic behavior. © SPARC consortium 2012 Page 6 of 129 WP3, Deliverable 3.3 Split Architecture - SPARC List of partners and authors Organization/Company Authors DTAG Mario Kind, Steffen Topp, F.-Joachim Westphal, Andreas Gladisch EICT Andreas Köpsel, Hagen Woesner EAB Wolfgang John, Zhemin Ding, Alisa Devlic ACREO Pontus Sköldström, Viktor Nordell ETH András Kern, David Jocha, Attila Takacs IBBT Dimitri Staessens, Sachin Sharma © SPARC consortium 2012 Page 7 of 129 WP3, Deliverable 3.3 Split Architecture - SPARC Table of contents Executive summary ........................................................................................................................................... 3 List of authors .................................................................................................................................................... 7 Table of contents ............................................................................................................................................... 8 List of figures and/or list of tables ................................................................................................................... 10 1 Introduction .............................................................................................................................................. 12 1.1 Project Context ................................................................................................................................ 12 1.2 Relation to Other Work Packages ................................................................................................... 12 1.3 Scope of the Deliverable.................................................................................................................. 12 1.4 Report Outline ................................................................................................................................. 12 2 Review of SPARC use-case access/aggregation network ........................................................................ 13 2.1 Refinement of Requirements ........................................................................................................... 13 2.2 Summary of Requirements .............................................................................................................. 15 3 Introduction to SplitArchitecture.............................................................................................................. 16 3.1 State of the art .................................................................................................................................. 16 3.1.1 Visions and specifications of the ONF ........................................................................................ 17 3.1.2 Evolution and status of the OpenFlow protocol .......................................................................... 17 3.1.3 SDN model and status of OF-Config........................................................................................... 18 3.1.4 Vision of SDN in Academia ........................................................................................................ 19 3.2 SPARC Vision on SplitArchitecture ............................................................................................... 20 4 Carrier-Grade Control and Management Architecture ............................................................................. 22 4.1 A recursive control plane for SplitArchitecture............................................................................... 22 4.1.1 Controlling a single network element using SDN ....................................................................... 23 4.1.2 Controlling multiple network elements with a single controller ................................................. 24 4.1.3 Mapping the carrier-grade Architecture on the SPARC implementations .................................. 26 4.1.4 Flowspace Management .............................................................................................................. 27 4.1.5 In-depth recursive controller architecture .................................................................................... 28 4.2 Introducing management functions to SplitArchitecture ................................................................. 29 4.2.1 Definition of Network Management ............................................................................................ 29 4.2.2 Modern view of transport networks ............................................................................................. 30 4.2.3 Evolution of network management for different architectures .................................................... 31 4.2.4 Network management for SDN ................................................................................................... 32 4.2.5 SPARC management integration proposal .................................................................................. 33 4.2.6 Analyzing the placement of Network Management functions .................................................... 35 4.2.7 Combined SPARC network management and recursive control framework .............................. 37 5 OpenFlow Extensions for Carrier-Grade SplitArchitecture ..................................................................... 40 5.1 Openness and Extensibility ............................................................................................................. 40 5.1.1 Extensions for a Recursive Architecture ..................................................................................... 40 5.1.2 The Various Processing Types .................................................................................................... 42 5.1.3 State-full Packet Processing and Action Process......................................................................... 42 5.1.4 Advanced Processing using Virtual Ports.................................................................................... 43 5.1.5 Split State Machines and an Event/Action API ........................................................................... 44 5.1.6 Defining New Action Types at Run-Time................................................................................... 45 5.2 Virtualization and Isolation ............................................................................................................. 46 5.2.1 What is network virtualization? ................................................................................................... 46 5.2.2 Requirements for the virtualization system ................................................................................. 47 5.2.3 Customer traffic mapping ............................................................................................................ 49 5.2.4 Network virtualization techniques for OpenFlow ....................................................................... 49 5.2.5 Improvement proposal ................................................................................................................. 53 5.2.6 Proof-of-concept implementation ................................................................................................ 54 5.3 Operations and Maintenance (OAM) Tools .................................................................................... 56 5.3.1 Background: existing OAM toolset ............................................................................................. 56 5.3.2 Mapping OAM element roles to SplitArchitecture ..................................................................... 57 5.3.3 MPLS BFD-based Continuity Check for OpenFlow ................................................................... 58 5.3.4 Technology-agnostic flow OAM ................................................................................................. 61 © SPARC consortium 2012 Page 8 of 129 WP3, Deliverable 3.3 Split Architecture - SPARC 5.4 Network Resiliency ......................................................................................................................... 65 5.4.1 Data plane resiliency ................................................................................................................... 66 5.4.2 Control channel resiliency ........................................................................................................... 67 5.5 Control Channel Bootstrapping and Topology Discovery .............................................................. 68 5.5.1 Control-Channel Bootstrapping in an in-band OpenFlow network ............................................. 68 5.5.2 SPARC Extension to the Topology Discovery Mechanism ........................................................ 74 5.6 Service Creation .............................................................................................................................. 76 5.6.1 Service creation phases ................................................................................................................ 76 5.6.2 Relationship to requirements and OpenFlow 1.1......................................................................... 77 5.6.3 Residential customer service creation (PPP and beyond) ............................................................ 78 5.6.4 Business customer service creation based on MPLS pseudo-wires............................................. 79 5.6.5 Overall conclusions for service creation...................................................................................... 80 5.7 Energy-Efficient Networking .......................................................................................................... 81 5.7.1 Current approaches to reducing network power consumption .................................................... 81 5.7.2 Sustainable networking with OpenFlow...................................................................................... 82 5.8 Quality of Service ............................................................................................................................ 83 5.8.1 Queuing management, scheduling, traffic shaping...................................................................... 85 5.8.2 Improvement proposal ................................................................................................................. 86 5.9 Multilayer Aspects: Packet-Optical Integration .............................................................................. 88 5.9.1 GMPLS and OpenFlow ............................................................................................................... 89 5.9.2 Virtual ports vs. adaptation actions ............................................................................................. 90 5.9.3 Three Levels of Integration ......................................................................................................... 90 6 Implementing Carrier-Grade SplitArchitecture in an Operator Network................................................. 94 6.1 OpenFlow in Access/Aggregation Networks .................................................................................. 95 6.2 Implementing OpenFlow for Residential and Business Services in Carrier Environments ............ 95 6.2.1 Residential customer service with OpenFlow ............................................................................. 96 6.2.2 Business customer services based on MPLS pseudo-wires with OpenFlow ............................. 107 6.3 Split Control of Transport Networks ............................................................................................. 107 6.3.1 Cooperation with legacy control planes .................................................................................... 107 6.3.2 Semi-centralized control plane for MPLS access/aggregation/core networks .......................... 108 6.4 Scalability Characteristics of Access/Aggregation Networks ....................................................... 112 6.4.1 Introduction to the scalability study .......................................................................................... 112 6.4.2 Numerical model ....................................................................................................................... 113 7 Conclusions ............................................................................................................................................ 121 Abbreviations ................................................................................................................................................ 123 References ..................................................................................................................................................... 127 © SPARC consortium 2012 Page 9 of 129 WP3, Deliverable 3.3 Split Architecture - SPARC List of figures and/or list of tables Figure 1: Relation of SPARC work packages ................................................................................................. 12 Figure 2: The access/aggregation network ...................................................................................................... 13 Figure 3 Existing architectural design principles, based on research by Stanford and specified by ONF ...... 16 Figure 4 ONFs SDN architecture including OpenFlow and OF-Config [42].................................................. 19 Figure 5 Enhancement to the initial OpenFlow model .................................................................................... 19 Figure 6 SplitArchitecture defined by SPARC................................................................................................ 20 Figure 7: Control plane organized in control blocks, OpenFlow as remote SAP ............................................ 23 Figure 8: A legacy network domain with distributed autonomous protocol stacks. Yellow boxes depict control blocks and green boxes datapath elements. ......................................................................................... 24 Figure 9: Split of control and data plane ......................................................................................................... 25 Figure 10: Functional blocks in a control block .............................................................................................. 25 Figure 11: Recursive Architecture of Virtual Nodes and Modular Control Plane .......................................... 26 Figure 12: Core functional blocks of a Layer. ................................................................................................. 28 Figure 13: Three planes of telecommunication networks: a centralized management plane, managing both the distributed control and data planes. ................................................................................................................. 31 Figure 14: Network management introduced to OpenFlow-based SDN via a fully separated, external NMS 32 Figure 15: Network management introduced to SDN by integration of selected network management functions (NMF) with the controller. .............................................................................................................. 33 Figure 16: Proposal for a carrier split architecture with integrated NM .......................................................... 34 Figure 17: Combined control and management framework in the recursive controller architecture .............. 37 Figure 18: Port Management Messages and Port life cycle ............................................................................ 41 Figure 19: Actions vs. Processing Entities ...................................................................................................... 42 Figure 20: ActionProcess and Processing Instances........................................................................................ 43 Figure 21: Virtual Port Concept ...................................................................................................................... 44 Figure 22: Proposed logical OpenFlow architecture ....................................................................................... 44 Figure 23: Programmable Datapath Model ..................................................................................................... 46 Figure 24: Different virtualization models for OpenFlow. .............................................................................. 50 Figure 25: A single TCP/SSL session used by a FlowVisor (left) vs one connection per controller (right). .. 51 Figure 26: A combination of an encapsulated forwarding plane with flow table splitting, an in-band Ethernetbased control network and multiple isolated OpenFlow Instances. Translation is performed between the OpenFlow Instances and the fast-path hardware, and is configurable through a privileged OpenFlow Instance. ........................................................................................................................................................... 53 Figure 27: The Master Controller (left) sees the full physical topology. ........................................................ 54 Figure 28: The various virtualization and customer tables and groups inside a virtualized switch. Solid lined boxes indicate regular OpenFlow datapath resources, whereas dashed lines indicate additions to realize the proposed encapsulation-based virtualization system. ...................................................................................... 55 Figure 29: SplitArchitecture OAM Configuration .......................................................................................... 58 Figure 30: Ingress-side OAM signaling generation......................................................................................... 59 Figure 31 : Egress-side OAM signaling reception .......................................................................................... 60 Figure 32: BFD implementation for OpenFlow version 1.1. Normal incoming packets are processed in the flowtable and forwarded to the Fast Failover group, which uses the first working LSP to forward the packets. Incoming OAM packets are sent through a channel for processing in the external module. The external module generates OAM packets and injects them directly in the LSP groups, whose liveness the module controls via a control channel. ......................................................................................................................... 61 Figure 33: Flow OAM Architecture ................................................................................................................ 62 Figure 34: Flow ID Module ............................................................................................................................. 63 Figure 35: Virtual Data Packet ........................................................................................................................ 64 Figure 36: Recovery mechanism for OpenFlow networks .............................................................................. 66 Figure 37: In-band Network topology ............................................................................................................. 68 Figure 38: DHCP client and server interaction ............................................................................................... 70 Figure 39: Format of DHCP server configuration file..................................................................................... 70 Figure 40: Action of the bootstrapping application on the PACKET-IN event .............................................. 72 Figure 41: The NOX Routing Mechanism ...................................................................................................... 74 Figure 42: A) NOX modified Mechanism (Spanning Tree solution) (B) without Spanning Tree Creation 75 © SPARC consortium 2012 Page 10 of 129 WP3, Deliverable 3.3 Split Architecture - SPARC Figure 43: Service creation models BRAS with PPPoE and DHCP++ based on DSLAM ............................. 79 Figure 44: Typical pseudo-wire frame structure for Ethernet emulation ........................................................ 80 Figure 45: Multilayer Traffic Engineering (MLTE) ........................................................................................ 81 Figure 46: Power Management........................................................................................................................ 82 Figure 47: Energy consumption across the functional blocks of a high-end core router [9] ........................... 82 Figure 48: Logical operation of a 10 Gigabit Ethernet switch with line cards, from [34]............................... 84 Figure 49: QoS processing pipeline where packets from different flows are classified and metered before going through the routing/switching stage. The packets that were not dropped are then scheduled, shaped, and rewritten. ................................................................................................................................................... 85 Figure 50: The hierarchical and the flat QoS model........................................................................................ 88 Figure 51: Representing a leaf of one QoS hierarchy as the root of another, virtualized one. ........................ 88 Figure 52: GMPLS Multi-Region hybrid node, composed of a packet OpenFlow switch and a circuit (TDM or WDM) switch. ............................................................................................................................................. 89 Figure 53: Interworking of OpenFlow and GMPLS........................................................................................ 90 Figure 54: Direct control of optical network elements by the OF controller. Path computation is still being done in a vendor’s control plane...................................................................................................................... 91 Figure 55: Integration of a PCE into a multilayer OpenFlow controller ......................................................... 91 Figure 56: Ericsson proposal for an OpenFlow multi-layer/multi-region switch architecture ........................ 92 Figure 57: Circit flow table entry .................................................................................................................... 93 Figure 58: Three models for attachment of OpenFlow in access/aggregation networks ................................. 95 Figure 59: General implementation options for service creation .................................................................... 96 Figure 60: AAA integration in PPPoE and OpenFlow .................................................................................... 97 Figure 61: SPARC BRAS options in contrast to today’s residential model .................................................... 98 Figure 62: SPARC DHCP++ in contrast to today's residential model .......................................................... 105 Figure 63: SPARC DHCP++ integration options for first node (DSLAM) and AGS node .......................... 106 Figure 64: Pseudo-wire processing in Ethernet over MPLS pseudo-wires (based on Fig.3 of RFC4448) ... 107 Figure 65: Single dissemination area option ................................................................................................. 109 Figure 66: ABR is under OpenFlow control ................................................................................................. 109 Figure 67: ABR is not under OpenFlow control ........................................................................................... 110 Figure 68: Revised controller architecture .................................................................................................... 111 Figure 69: Deployment scenario for scalability studies ................................................................................ 112 Figure 70: Simplified domain view ............................................................................................................... 113 Figure 71: Tunnels considered ...................................................................................................................... 114 Figure 72: Number of equipment units.......................................................................................................... 114 Figure 73: Number of tunnels........................................................................................................................ 115 Figure 74: Number of flow entries ................................................................................................................ 115 Figure 75: The effect of a link down: tunnels ............................................................................................... 116 Figure 76: The effect of a link-down: flow mods .......................................................................................... 116 Figure 77: Recovery times ............................................................................................................................. 117 Figure 78: IPTV channel change ................................................................................................................... 118 Figure 79: IPTV channel change time ........................................................................................................... 118 Figure 80: Ring topology............................................................................................................................... 119 Figure 81: Alternative OpenFlow domain internal tunnel structure .............................................................. 119 Figure 82: Topology and domain internal tunnel structure effects ................................................................ 120 Table 1 Overview of OpenFlow releases ........................................................................................................ 17 Table 2: Assessment of NM functions and potential for control plane integration ......................................... 35 Table 3: List of study topics requiring OpenFlow extensions ......................................................................... 40 Table 4: Existing virtualization solutions with OpenFlow vs. SPARC requirements ..................................... 49 Table 5: Options for the transport of encoded information in OpenFlow ....................................................... 89 © SPARC consortium 2012 Page 11 of 129 WP3, Deliverable 3.3 Split Architecture - SPARC 1 Introduction 1.1 Project Context The SPARC project (“Split Architecture for carrier-grade networks”) is aimed at implementing a new split in the architecture of Internet components. In order to better support network design and operation in large-scale networks for millions of customers, with high automation and high reliability, the project will investigate splitting the traditionally monolithic IP router architecture into separable forwarding and control elements. The project will implement a prototype of this architecture based on the OpenFlow concept and demonstrate the functionality at selected international events with high industry awareness, e.g., the MPLS Congress. The project, if successful, will open the field for new business opportunities by lowering the entry barriers present in current components. It will build on OpenFlow and GMPLS technology as starting points, investigating if and how the combination of the two can be extended, and study how to integrate IP capabilities into operator networks emerging from the data center with simpler and standardized technologies. 1.2 Relation to Other Work Packages WP1 Project management WP1 Project WP2 Use Case & Business s Scenarios WP3 Architecture WP4 Prototyping WP WP5 Validation Validation Performan Evaluati Evaluation WP6 Dissemination Figure 1: Relation of SPARC work packages In the “workflow” of the work packages, WP3 is embedded between WP2 (Use Cases / Business Scenarios) and WP4 (Prototyping). WP3 will define the SplitArchitecture taking use cases and requirements of WP2 into account and will analyze technical issues with the SplitArchitecture. Moreover, this architecture will be evaluated against certain architectural trade-offs. WP4 will implement a selected subset of the resulting architecture, and feasibility will be validated in WP5. WP6 disseminates the result at international conferences and publications. 1.3 Scope of the Deliverable In this deliverable, we present our final proposal for a split architecture for large-scale wide area networks, such as carrier-grade operator networks. Based on our conclusions from D3.1, we focus on OpenFlow as an enabling technology for the split architecture. Earlier SPARC deliverables (D2.1 and D3.1) made it clear that current OpenFlow implementations do not fulfill carrier requirements. In this deliverable, we therefore discuss a suitable framework for controlling carrier-grade operator networks and investigating missing features in OpenFlow as identified during the SPARC project in the context of access/aggregation network scenarios. 1.4 Report Outline In order to set the stage, we start the deliverable in Section 2 with a summary of the use cases and requirements for a split architecture we defined and listed in SPARC deliverable D2.2. In Section 3, we then provide an overview of current software-defined networking (SDN) models, and compare them to the envisioned carrier-grade SplitArchitecture concept. Section 4 describes the control and management architecture of the proposed SplitArchitecture. Here, we discusses a recursive control architecture consisting of hierarchically stacked control planes and provide initial considerations regarding network management integration with SDN and SplitArchitecture. In Section 5 we propose required extensions to OpenFlow to support the envisioned carrier-grade SplitArchitecture. This section proposes the necessary extensions with respect to carrier-grade requirements based on missing features as identified in earlier SPARC deliverables (D2.1 and D3.1). Topics include openness and extensibility, virtualization and isolation, OAM, resiliency aspects, control channel bootstrapping and topology discovery, service creation, energy-efficient networking, QoS and multilayer aspects. Finally, in Section 6 we present selected deployment scenarios of a carrier-grade SplitArchitecture with OpenFlow. We show how OpenFlow-based SplitArchitectures can be adopted for relevant scenarios prevalent in modern operator networks, such as service creation, general access/aggregation, network scenarios and peering aspects. Finally, we present a numerical scalability study indicating the feasibility of a splitarchitecture approach in access/aggregation network scenarios in terms of scalability requirements. © SPARC consortium 2012 Page 12 of 129 WP3, Deliverable 3.3 2 Split Architecture - SPARC Review of SPARC use-case access/aggregation network The access/aggregation network is the linking part between customers and core networks, where typically services are hosted. The general network architectures are depicted in A recursive control plane for SplitArchitecture. Figure 2: The access/aggregation network Beside a vertical link, there exists two additional planes. First, there is a link with an optical transport network. Second, there exists a control or management plane hosting services like AAA, auto-configuration or service management. The use case access/aggregation represents a manifold and diverse area. In SPARC, it was split in five areas (called use cases as well):      2.1 Seamless MPLS or SDN approaches to MPLS Transport Multi-service/-provider environments (Service Creation) Mobile backhaul Software Defined Networking application in context of IEEE 802.11 compliant devices Dynamic control composition Refinement of Requirements Early in the SPARC project, we defined a set of 67 detailed requirements in WP2. In order to allow concentration on the most important ones, the total of 67 requirements was reduced in deliverable D2.1 in order to concentrate only on those requirements that are not already fulfilled with respect to existing architecture concepts and available implementations. Four groups of general important requirements have been identified. The first group covers all required “Modifications and extensions for the data path element” or the SplitArchitecture itself. The other three groups deal with needed extensions of carrier-grade operation of ICT networks. The aspects related to the operation of an ICT network “authentication, authorization and auto configuration” (not to be mixed up with “AAA”, as accounting is use-casespecific) are covered in a second group; “OAM” in the sense of facilitating network operation and troubleshooting in the third group; “network management, security and control” of the behavior of the network and protocols in the fourth group. Within network management, the aspects for the use of policies in network environments are included. Additional to the work in WP2, also the assessment of existing SplitArchitecture approaches, such as ForCES, GMPLS/PCE, and most importantly OpenFlow, revealed a number of issues and open questions that need to be considered for future carrier-grade Split Architectures (see deliverable D3.1). The following topics have been identified that require special attention in the current architecture study: © SPARC consortium 2012 Page 13 of 129 WP3, Deliverable 3.3      Split Architecture - SPARC Requirement group “Network virtualization”, e.g. to ensure strict virtual network isolation and integrity and handling of overlapping address spaces should be handled Requirement group “Recovery and redundancy” including open question not only with respect to the data plane of the network, but also with respect to controller and control plane failures Requirement group “Multilayer control” include integration of circuit switching and multilayer coordination for optimization or resiliency purposes Requirement group “OAM” functionalities for service management Requirement group “Scalability” to be considered for the data plane and proposing controller architecture In the following architectural deliverable D3.2, the specific functions have been investigated and additional refinement of the different groups has been performed (each requirement group now effectively represents a general network function or feature):       Requirement group “Modifications and extensions to the data path elements” has been broken down: o Requirement group “Openness and Extensibility” developing ways of how to extend OpenFlow to support a more complete, stateful processing on data path elements in order to enable OpenFlow support for further technologies with high relevance to carrier networks, such as PBB, VPLS, PPPoE, GRE, etc. o Requirement group “Multilayer” aspects is on the extension of OpenFlow in order to control nonEthernet-based layer 1 technologies (as specified by IEEE 802.3 study group), especially the integration of circuit switched optical layers into OpenFlow (packet-optical integration). Requirement group “Authentication, authorization and auto configuration,” is covered by the broader topic of “Service Creation” Requirement group “OAM” identifies a technology-dependent OAM solution (i.e., MPLS BFD) and a novel technology-agnostic Flow OAM solution Requirement group “Network management” was divided into several subgroups: o Requirement group “Network Management” covers the general framework for implementation of network management functions, fault and performance management covered by “OAM” configuration management o Requirement group “Quality of Service” o Requirement group “Resiliency” is commonly seen as one key attribute of carrier-grade networks i.e., the ability to detect and recover from incidents within a 50ms interval without impacting users o In order to facilitate this centralized network management operation, it was identified that automatic (Requirement group) “Control Channel Bootstrapping and Topology Discovery” is an important feature o Requirement group “Energy-Efficient Networking” provides functionalities to increase the energy efficiency of modern and future networks Requirement group “Virtualization and Isolation” enabling multiservice (within the responsibility of one operator) and multi-operator scenarios on a single set of a physical network infrastructure Requirement group “Scalability” is another key feature of the SPARC controller architecture [not covered in D2.1]. In this final architecture deliverable (D3.3), we add the topics of a “Recursive Control Plane” architecture (also referred to as hierarchical controller concept) and a separate discussion on “Network management” (not to be confused with the other requirement groups on QoS, resiliency, etc.). © SPARC consortium 2012 Page 14 of 129 WP3, Deliverable 3.3 2.2 Split Architecture - SPARC Summary of Requirements The list of harmonized requirement groups is the result of the evolving discussions during the entire project duration and has been aligned between the final deliverables in all technical work packages (WP2 to WP5). The final requirement groups have a focus on the access/aggregation use case only and are listed below: (a) Recursive Control Plane (b) Network Management (c) Openness and Extensibility (d) Virtualization and Isolation (e) OAM (technology-specific MPLS OAM / technology-agnostic Flow OAM) (f) Network Resiliency (g) Control Channel Bootstrapping and Topology Discovery (h) Service Creation (i) Energy-Efficient Networking (j) Quality of Service (k) Multilayer Aspects (l) Scalability Note that we do not claim this list to be exhaustive. The above listed features have been chosen solely based on our own assessment of feature importance. In the continuous process of requirement specification, concepts and prototypical implementation, the scope has gradually been extended. However, in this deliverable we cover only the listed requirement groups listed above. Overall, we conclude that the required network functions listed above have been analyzed and specified in detail where appropriate solutions could be found. With the focus on access/aggregation, it has been possible to develop a comprehensive set of solutions covering most requirements, and most solutions have been developed, implemented and tested (see deliverables of WP4 and WP5). In this deliverable, we cover requirement groups (a) and (b) in Section 4, groups (c) to (k) in Section 5, and finally assess requirement group (l) in Section 6.4. © SPARC consortium 2012 Page 15 of 129 WP3, Deliverable 3.3 Split Architecture - SPARC 3 Introduction to SplitArchitecture 3.1 State of the art Today the design of network elements (i.e., switches, routers) follows a monolithic design pattern, i.e., each networking element integrates functions for control, forwarding and processing. Forwarding and processing capabilities are consolidated in the data plane where control capabilities are centralized in its own dependent control plane of a current network element. Usually it is not possible to access the interface in-between these functional blocks of today’s network element, as depicted in Figure 3 (I). The upper control plane performs path computation primarily. This plane consists of network-wide distributed algorithms that enable construction of the essential forwarding information base for data plane operations. The compound of control and data plane is adjusted entirely to the provided service. The data plane performs packet forwarding, such as prefix matching, and is instructed by the control plane directly [54]. Splitting this design into functional independent planes (control and data) and opening the interface in between could be a successful way to relieve existing hardware and infrastructure of legacy architectures, thus facilitating the evolution and deployment of new protocols, technologies and architectures. Stanford’s OpenFlow proposal [45] constitutes an independent control and data plane where the interface in-between is publicly accessible, as represented in Figure 3 (II). Here, OpenFlow is used to orchestrate intercommunication between separated planes. Applications (apps) are pieces of software coupled to the centralized control plane (typically called the controller). “Application” is a generic term that in this context could cover both network and service-related functions. Figure 3 II shows applications as separate entities; Figure 3 III shows network services (network-related functions) and business applications (service related-functions). business applications app control control app app control control network services OpenFlow data data OpenFlow data data (I) today’s network design SDN control software data (II) generic OpenFlow architecture proposed initially by Stanford data data data data (III) SDN specified by the ONF Figure 3 Existing architectural design principles, based on research by Stanford and specified by ONF In this deliverable, we study how to extend the generic approach shown in Figure 3 (II) leading to our SplitArchitecture design proposal, in which we focus on the fulfillment of carrier-grade requirements (e.g., OAM, resiliency, network virtualization, etc.). As a result, carrier networks could benefit from the advantages of SplitArchitecture and enable network operators to have a disjointed evolution of data path and control mechanisms, which have the potential to pave the way toward more dynamic control of services and connectivity in carrier networks. In deliverable D3.1, we discussed three architectural approaches that could be used to implement a split architecture. The IETF’s ForCES framework (RFCs 3746 and 5810), IETF’s GMPLS/PCE [69][70] and the OpenFlow approach by Stanford University [45], which is now specified by the Open Networking Foundation (ONF [43]), were considered. While GMPLS/PCE provides a decoupling of control and data plane, most control plane functions are still distributed to each individual network element. Only the PCE architecture allows placement of parts of the control intelligence to a separate software component that can be accessed via a standardized interface, i.e. the PCEP protocol. Furthermore, GMPLS is a signaling protocol set that is useful within the control plane (e.g. for NNI); however, it does not specify the control connection between data and control planes. Both ForCES and OpenFlow specifically target the control interface between the control and data planes. The ForCES framework initially seemed more mature in some aspects when compared to OpenFlow. ForCES provides a configuration model, allowing the specification of different technologies via libraries. OpenFlow, on the other hand, provides a more detailed, but also more rigid node model, which makes OpenFlow simpler but less flexible than ForCES. However, the strong industry support recently observed for OpenFlow, and the lack of support for ForCES in academia and industry, makes OpenFlow the more interesting and evolving technology for split architecture today. © SPARC consortium 2012 Page 16 of 129 WP3, Deliverable 3.3 3.1.1 Split Architecture - SPARC Visions and specifications of the ONF During 2011, the Open Networking Foundation (ONF) was founded as a non-profit consortium, consisting of network operators, equipment manufacturers, software suppliers and chip technology providers. The ONF is dedicated to developing and standardizing a software-defined network (SDN) architecture. SDN was coined as a new name for the concept of separating control and data plane as first proposed by Stanford when introducing OpenFlow. The ONF adopted OpenFlow and SDN approach by Stanford and specified it as depicted in Figure 3 (III). The separated control plane is represented by the SDN control software in this architecture approach. ONF itemized applications from Figure 3 (II) into network services and business applications. Network services represent basic network functions such as routing, topology discovery and policy management. In contrast business applications are functions for service generation or a service itself. Business applications are entirely decoupled from the SDN control software (control plane) and operate on an abstracted view of the underlying network. Abstraction is provided by the interface between the applications and control software. ONF uses OpenFlow for communication between the SDN control plane and particular data plane entities, as also proposed by Stanford (Figure 3 (II)). The northbound interface between the SDN control software and business applications is currently under discussion. The ONF defined SDN as “an emerging network architecture where network control is decoupled from forwarding and is directly programmable … Network intelligence is (logically) centralized in software-based SDN controllers, which maintain a global view of the network” [44]. According to the ONF, the key features of SDN are thus: (a) Separation of control and data plane; (b) Centralized control plane (or controller) with global network view; and (c) Programmability by external software modules or applications via the controller. The current focus in the ONF lies on the southbound controller interface, i.e., the interface between the separated control and data planes. This interface has been instantiated through the OpenFlow protocol, as depicted in Figure 3 (III). In the execution of the ONF approach SDN control software is instanced by even one or more OpenFlowControllers, while individual data plane entities are represented by specific, independent OpenFlow capable switches. The ONF identified OF as the unique major component of its proposed SDN concept. However, other aspects, such as commercialization and promotion, are also covered by the ONF. The ONF is organized as independent working groups and supervised by the board of directors and a separate technical advisory group. Current WGs consider the following aspects regarding OpenFlow: Archtiecture & Framework; Extensibility; Configuration & Management; Testing & Interoperability; coexistence of conventional and OpenFlow enabled forwarding mechanisms (Hybrid switches and networks); and Market Education. The WGs with most relevance in terms of protocol standardization are the Extensibility group, driving the OpenFlow specification, and the Config & Management WG, driving the specification of the newly introduced SDN configuration mechanism OF-Config [42]. 3.1.2 Evolution and status of the OpenFlow protocol The OpenFlow protocol was initially specified by the universities of Stanford and Berkley. With the launch of ONF, all specification activities concerning OpenFlow have been taken over, resulting in standardization of more consistent protocol versions. The extensibility working group has driven aspects such as integration of IPv6, sophisticated matching strategies and flexibility. Version 1.2 is the first standard of the protocol evolved solely by the ONF and builds significantly on the predecessor version 1.1. In the course of enhancement to version 1.3, more and more missing aspects were addressed and added to the upcoming versions. The evolution of the protocol is outlined in the following table [41]. Table 1 assumes OpenFlow 1.0 as reference and covers major increments of the respective release. OpenFlow 1.1 (Feb. 2011) Table 1 Overview of OpenFlow releases Multiple table processing: An OpenFlow pipeline may consist of separate flow tables which are concatenated. Groups: The abstraction of a group enables to use a set of ports as a discrete entity for forwarding packets properly, such as multicast or multipath. MPLS and VLAN assistance: Support of sophisticated manipulation mechanisms (i.e. add, modify, delete) of MPLS and VLAN labels. The proposed methods are not limited to single level VLAN tagging, but consider queue-in-queue tagging as well. Virtual port concept: Virtual ports are used to represent forwarding abstraction such as tunnels. Connection failure: Either switch continues to work in standalone or secure mode instead of using an emergency flow cache once connectivity with the controller is lost. © SPARC consortium 2012 Page 17 of 129 WP3, Deliverable 3.3 OpenFlow 1.2 (Dec. 2011) Split Architecture - SPARC Extensibility match assistance: Fixed structure of ofp_match changed to a type-length-value (TLV) structure. IPv6 support: Capabilities concerning IPv6 matching and header rewriting (i.e. source and destination address, protocol number, traffic class and ICMPv6 information) have been added. Controller role change: Switches are controlled by only one controller (master) but may maintain connectivity to a set of controllers (slaves) concurrently. This can be used for failover, where the roles of controllers change immediately on connection loss to the master. OpenFlow 1.3 (April 2012) Table miss entry: The previously assigned usage of table configuration flags is replace by a specific table-miss entry to take care of any non-matched packet. Advanced IPv6 support: Added ability to match the presence of several IPv6 extension headers (i.e. fragmentation, hop-by-hop). Meters: Meters are attached to flow entries to enable reliable measurement and control of the rate of packets per flow. 3.1.3 SDN model and status of OF-Config In parallel with the evolution of OpenFlow, the ONF configuration and management working group (WG) focussed on how to configure/monitor aspects of the datapath elements that are not associated with the OpenFlow protocol. Current versions of OpenFlow do not cover management and configuration features sufficiently (e.g., address have to be configured manually for control channel establishment). This WG thus specified OF-Config, a mechanism for configuring OpenFlow capable devices. The configuration mechanism is a first step to adding previously lacking management and configuration capabilities to SDN, which we also identified as an important carrier-grade requirement within SPARC. OF-Config 1.0 was released in January 2012, and is based on NETCONF (RFC 4741), a transactional protocol that uses remote procedure calls on top of a secure transport channel to manage configurations on remote devices. Instead of focusing on a complete range of functions, the purpose of this first specification was to define a schema to ensure a consistent representation of configuration elements in the protocol. To specify the data model of the switch configuration, the WG made use of two specification options. As the NETCONF is XML-based, the data model used during configuration can be specified using XML schemas. However, since the XML schemas lack support for specifying behavioral constraints, the OF-Config specification was given as the YANG model (RFC 6020). The first version of the configuration protocol, OF-Config 1.0, included only a limited set of functions, such as assignment of a set of controllers, separate configuration of related resources (i.e. ports, queues) and remote manipulation. In OF-Config 1.1, additional functions such as certificate handling, capability discovery and the configuration of three basic tunnel endpoints have been added. Further functions expected in future versions include topology discovery, capability configuration, advanced tunnel configuration and instantiation and resource specification for the logical switches. The WG is also discussing whether broader OAM functions, such as fault and performance monitoring, should be part of OF-Config or should instead be discussed in separate future specifications. To summarize, the functional scope of the OF-Config (version 1.1) protocol is the following: 1. The assignment of one or more OpenFlow controllers 2. The configuration of queues and ports 3. The ability to remotely change some aspects of ports (e.g. up/down) 4. Configuration of ceritificates for secure communication between the OpenFlow Logical Switches and OpenFlow Controllers 5. Discovery of capabilities of an OpenFlow Logical Switch 6. Configuration of a small set of tunnel types such as IP-in-GRE, NV-GRE and VxLAN Figure 4 shows the coexistence of OF and OF-Config within the current SDN architecture as defined by ONF. In this architecture, an OpenFlow capable switch, which is a physical or virtual network element, is hosting one or more OpenFlow logical switches. The logical switches represent the actual OpenFlow network elements, which are controlled by one or more OpenFlow Controllers using the OpenFlow protocol. Network applications on top of the OF Controller use the network via the OF Controller’s northbound API (NB API). Finally, an OF Configuration Point represents the service that communicates via the NETCONF-based OF-Config protocol with an OpenFlow capable switch and partitions resources among OF logical switches (such as ports and queues). Currently, the relationship between the OF Controller and OF Configuration Point is deliberately not defined by the ONF. © SPARC consortium 2012 Page 18 of 129 WP3, Deliverable 3.3 Split Architecture - SPARC Apps NB API OF configuration point Controller OpenFlow OF-Config Resources (ports, queues) Resources (ports, queues) OF logical switch OF logical switch OpenFlow Capable switch Figure 4 ONFs SDN architecture including OpenFlow and OF-Config [42] 3.1.4 Vision of SDN in Academia Concurrently to the SDN model defined by the ONF, further alternative SDN architecture attempts have been made in academia. The main aspect discussed by academics is the integration of an additional element called hypervisor, responsible for virtualization and abstraction of the network. Scott Shenker, professor at the University of California, Berkley, has done the most notable work. His vision of an SDN architecture is approximated in Figure 5 (II). business applications business applications control program nypervisor network services SDN control software network operating system OpenFlow data data (I) SDN specified by the ONF OpenFlow data data data data (II) SDN enhanced by S. Shenker Figure 5 Enhancement to the initial OpenFlow model Shenker separated network services as specified by the ONF (Figure 5 (I)) into control programs and services provided by the network operation system itself. In our understanding, the terms “SDN control software” as introduced by the ONF and “network operation system” as proposed by Shenker can be used interchangeable. Control programs (e.g. different kinds of routing algorithms) are network-related functions, which are not part of the essential set of functions provided by the network operation system itself (e.g. topology discovery). These programs can be implemented in different ways and substitute for one another, while functions provided by the network operation system constitute the set of fundamental functions that must be provided. The nypervisor (an amalgam of network and hypervisor) has a global view of the underlying topology and its resources (e.g. address spaces). Here the essential function of decoupling the upper layer entities from the underlying topology is organized by providing an abstracted total global view. The preferred position of the nypervisor is discussed by academics controversially and several different proposals exist in the literature. As depicted in Figure 5 (II) Shenker prefers to have the nypervisor atop the network operation system. © SPARC consortium 2012 Page 19 of 129 WP3, Deliverable 3.3 3.2 Split Architecture - SPARC SPARC Vision on SplitArchitecture Based on the separation of control and data plane of current network elements and influenced by simultaneous ongoing trends, SPARC evolved the concept of SplitArchitecture. SplitArchitecture enhances the various control separation approaches integrated in different existing architectures (e.g. GMPLS or ForCES, OpenFlow Figure 3 (II)) and from the work of both the ONF and academia (cf. Figure 5). In general, SplitArchitecture acknowledges the principles of SDN with a split between data and control plane as well as the introduction of a kind of network operation system. A high-level graphical representation of the concept of SplitArchitecture as proposed by SPARC is shown in Figure 6. There are three substantial differences to the previous concepts proposed by the ONF and academia. Figure 6 SplitArchitecture defined by SPARC (1) First, the split between control program and network operating system (NOS) with the help of the nypervisor and two different abstractions is reduced to one abstraction (see Figure 5 II) - the filtered, abstract network view. This means that a network hypervisor is not mandatory and can be replaced by a basic set of filter function establishing a meaningful abstraction. Currently, the discussed design follows a pragmatic way of hierarchical controllers separating the network view by filtering the available addressing scheme and therefore granting access for control programs to parts of the resources only. In SPARC’s concept of SplitArchitecture, control planes in a hierarchical controller unify the network operation system, the hypervisor functionality as proposed by Shenker (Figure 4 (II)), and control applications. The hierarchical controller concept as defined by SPARC means that several control planes are stacked upon one another recursively. In this hierarchy, each plane acts as controller toward data path elements in a lower plane and as single data plane entity toward higher planes. Further discussions about the control layer and the proposed controller architecture in SPARC can be found in Section 4.1. (2) Secondly, we specifically discuss the role of network management in an SDN environment. Trying to apply traditional management definitions a generic SDN model we conclude that it is difficult todifferentiate precisely between control and management functions in the context of SDN, where both the control and management planes are centralized. As a result, we present a proposal network management integration with the flexibility to choose whether to place network management functions (NMF) within an SDN controller or a separate network management system (NMS). The proposed management architecture that is based on the current SDN model defined by the ONF (cf. Figure 4). In our proposal, the control plane entities includes a network management function (NMF) module consisting of an OF configuration point and an equivalent monitoring point, responsible for configuration and monitoring interfaces (OF-config and OF-mon). This NMF shares the network view of the controller, including topology information and updates to this view in terms of alarms and notifications from the data plane. In terms of functionality, the NMF can take over responsibility for configuration, fault, and performance management functions that are useful in the control layer. The controller and the NMF module can also interact with an external NMS for functions beyond the scope of the configuration and monitoring interface provided by the NMF in the controller. The network management considerations are presented in Section 4.2. © SPARC consortium 2012 Page 20 of 129 WP3, Deliverable 3.3 Split Architecture - SPARC (3) The third difference is an additional split in the data plane between forwarding and the processing of related procedures. Various aspects motivate this split. Forwarding decisions are done most efficiently at the edge of the networks, but the network elements at the edge lack sophisticated processing capabilities in many cases (e.g. DSLAM). Processing capabilities are today spread around the network environment. Compared with other technologies, processing capabilities are evolving fast, so a separation of related entities could ease innovation. In addition new protocols could be integrated in existing environments more easily, for example by supporting general-purpose computing hardware. Moreover, at the end of the life cycle, legacy protocols could be phased out by moving the desired processing capabilities to other locations while keeping forwarding decisions with the same network device. Referring to the access/aggregation domain, an example could be the PPPoE recognition in a DSLAM, but moving the BRAS functions from dedicated, single-purpose devices to a general computing hardware or a data center. Section 5 discusses the necessary extensions to protocols and data path elements to fulfill carrier-grade requirements, which includes discussions on these specific topics in the Sections 5.1 on Extensibility and in 5.6 on Service Creation. © SPARC consortium 2012 Page 21 of 129 WP3, Deliverable 3.3 4 Split Architecture - SPARC Carrier-Grade Control and Management Architecture In this section, we will first present our architectural considerations regarding a suitable control plane for the carriergrade SplitArchitecture, i.e., applying SDN principles to wide-area operator networks. Besides discussing the separation of control and forwarding, which is essential to SDN, we will also provide a discussion about how network management functions could be integrated into the SplitArchitecture in Subsection 4.2. Extensions required to datapath elements and the OpenFlow protocols itself will then be detailed in the subsequent Section 5. 4.1 A recursive control plane for SplitArchitecture While OpenFlow is “flattening the layers” in that the datapath executes a single match for multiple packet header fields, this does not mean that the control plane has to be flat as well. Figure 6 introduced the basic concept of the SPARC SplitArchitecture. It shows how any controller in recursive architecture is a network application to the next lower controller. The proposed architectural extensions to a split control plane based on OpenFlow do not dictate any constraints on control plane designers with regard to organization of the control plane. Instead, it gives the freedom of adapting the control architecture according to needs and business models. At the same time, recursive stacking, flowspace management, and advanced processing allows distributing controllers to adapt to network capabilities. The freedom to distribute or centralize controllers increases scalability, adds resiliency over a single-controller architecture and allows plugging in legacy equipment (and legacy control planes). A carrier-grade SDN-aware control architecture should also support legacy control protocol operations, as network operators require a smooth upgrading path for deploying SDN enabled devices and (sub-) domains within a network environment still based on legacy control protocols. This also includes the network management system and all OAM operations. To sum up, the main requirements discussed in SPARC deliverables D2.1 and D3.1 are: 1. Enable virtualization of network resources for sharing physical infrastructures among several operators. 2. Allow deployment of new control plane architectures and services in parallel to existing legacy protocol stacks. 3. Maintain all requirements in terms of service isolation, OAM provisioning, QoS enforcement, and NMS integration as defined in existing networks today (these topics are discussed in Sections 5.2, 5.3, 5.8 and 4.2 respectively). The OpenFlow API defines all basic service primitives for implementing remotely accessible Service Access Points (SAP) for OpenFlow capable switches. These SAPs are different from the ones defined in ISO/OSI [73], as the architectural split enforced by OpenFlow is different from OSI. Typically, there is only the control of the data flow going through this SAP, eventhough OpenFlow also allows data frames to be relayed to the controller through that API. The elements of the APIs, which a SplitArchitecture controller provides upwards, will obviously differ for network management, monitoring, and control purposes. For the latter purpose, we propose OpenFlow itself to export an SAP to client control layers for accessing a server control layer’s communication services in the recursive control architecture. This OpenFlow-realized SAP also defines the information model: it models provided communication service with one forwarding device. This representation automatically hides the details of how the service is actually implemented resulting in a compact representation through the filtered network view. This design is inline with the abstract node representation provided by GMPLS [69]. We aim towards organizing a carrier-grade control plane as a stack of protocol layers, However, contrary to the monolithic controllers (such as NOX and Trema) seen today in network elements, we split the control plane into an theoretically arbitrary number of control blocks implementing layers and use OpenFlow as the base interface between these control blocks. Furthermore, splitting a protocol stack into several logical control blocks and using OpenFlow as glue among them, allows us a distribution of functional blocks across several locations and devices within the control plane, thus simplifying instantiation and attachment of new control blocks at run-time for either load balancing purposes or addition of new capabilities, processing or forwarding capacities. © SPARC consortium 2012 Page 22 of 129 WP3, Deliverable 3.3 Split Architecture - SPARC Figure 7: Control plane organized in control blocks, OpenFlow as remote SAP However, OpenFlow lacks some crucial ingredients for such software based modular control plane: moving to higher layers in a protocol stack involves typically a de-multiplexing step, i.e. a Service Access Point as defined by OpenFlow can be used by several higher-layer instances in parallel. As OpenFlow defines a 1:n relationship between the control plane and datapath elements in the data plane, a single layer (n-1) entity only can attach at any time to layer (n). To solve this limitation, we introduce flowspace management. With flowspace registrations, a controller entity, when attaching to a datapath element (= lower-layer instance), expresses the part of the overall flowspace available at the SAP that it is willing (and able) to control. When several controller entities connect to an existing SAP, the datapath element uses the existing flowspace registrations for demultiplexing Packet-In (Flow-Removed, etc.) events and sending the event to the controller responsible for that flow. The general properties we have introduced in this section so far are summarized below:   We proposed OpenFlow as an open API to build a stack of control blocks. Each control block consumes services from lower-layer entities and acts here as a controller according to the OpenFlow terminology, and, at the same, offers services to higher-layer entities and acts as a datapath element in this role. A control block also behaves as an OpenFlow proxy, as it is datapath and controller at the same time, though it typically implements broader set of services rather than acting as a proxy. We extend the OpenFlow protocol with flowspace management that allows a controlling entity to express parts of the overall flowspace accessible at the SAP it is actually willing to control. This provides fine control of the (de-)multiplexing function of the server control block and allows several control entities to use the same OpenFlow SAP in parallel without interfering with each others. Note that these design principles do not define or restrict a control layer’s internal structure and architecture. A control plane designer is free to decompose the desired control functions in any arbitrary control hierarchy. One can collect all functions into one OpenFlow controller making the APIs between the control blocks internal to the controller. Of course, this does not preclude opening up these APIs. This paradigm of “fat controllers” was first adapted in the OpenFlow community and the most widespread controller implementations, such as NOX, follows this concept. The other extreme is when the OpenFlow-based SAP is defined between the atomic control blocks. 4.1.1 Controlling a single network element using SDN Figure 7 depicts an example of how two control planes A and B attached to a single datapath element and implement a protocol stack for a single network element. Both control planes consist of a set of control blocks that are mutually connected via OpenFlow interfaces. Multiple controlling entities may register flowspaces at the same Service Access Point for parallel control of the underlying datapath element. Flowspace registrations of parallel control blocks may or may not overlap. An overlapping flowspace allows definition of “catch-all” controllers, e.g. for implementing monitoring devices or handling denial-of-service attacks. The OpenFlow specification introduces the concept of ports. The OpenFlow port model covers physical ports only (or logical ports like trunk interfaces or simple tunnel endpoints). While physical ports do not exist at higher protocol layers, they may define specific transport endpoints for this layer. As an example, one may consider a controller block located in the Ethernet layer. To offer Ethernet transport services, the Ethernet controller will offer Ethernet transport © SPARC consortium 2012 Page 23 of 129 WP3, Deliverable 3.3 Split Architecture - SPARC endpoints (= endpoints with a specific MAC address assigned) and use the physical ports exposed by the underlying datapath element. A generalized understanding is that any port is a resource that an OpenFlow datapath element offers for control by one or more OpenFlow controllers. The transport service that the port is offering differs when moving up the stack of controllers, at the same time enlarging the geographic extension of the transport domain. While physical ports typically connect to one opposite port (or more for broadcast-and-select networks), learned MAC addresses define the “ports” of an Ethernet controller, eventually creating the notion of a “big switch”. We extend the OpenFlow port model and introduce a more generalized one that maps to either physical ports or transport endpoints (for details see Section 5.1.1). 4.1.2 Controlling multiple network elements with a single controller In non-SDN enabled networks, all network elements contain specific control stack tailored for their needs, e.g., a switch provides spanning tree and reverse learning for supporting Ethernet forwarding while a router implements IGP protocols to support IP forwarding. Introducing a split architecture concept does not necessarily change this situation. Extracting the control logic from each network element and shifting it towards the control plane still maintains the previous situation: each control block is still an autonomous entity and all these entities establish and synchronize shared state (see Figure 8). A network element makes routing decisions and programs its forwarding autonomously. Figure 8: A legacy network domain with distributed autonomous protocol stacks. Yellow boxes depict control blocks and green boxes datapath elements. The control plane designer may decide to divide the network into smaller control domains and designate a control element to be responsible for (1) supervising all control procedures performed within the control domain; and (2) establishing some form of information exchange with designated control entities for some other control domains. As part of this latter task, the control entities maintain an abstract node model of the owned control domain. As the translation between the physical sub-domain and the abstract node model is local to the controller, it can decide the amount of details it shares. This allows reduction of the amount information advertised among the control entities and thus enhances the scalability of the control plane. In Figure 9, the network operator has split the network domain in three sub-domains. Two of them are controlled by dedicated centralized control entities; while the third is kept fully distributed. From an external perspective and with respect to its adjacent nodes, each sub-domain behaves as a single virtual node with a set of ingress/egress ports that connect to neighbor nodes not under control of the sub-domain’s control block. Considering the sub-domain’s internal operations, packets received by any ingress port will be either terminated within the sub-domain (e.g. at an emulated transport endpoint within the control plane) or sent via some egress port to an end system or a network domain in the next hop. Please note that two protocol layers operate in this situation: one protocol layer controls the sub-domain’s internal operations and provides packet transport services in the virtual node’s backplane, while the other protocol layer interacts with entities outside of the sub-domain and uses for its operation the packet transport services of the backplane controlling entity for its operation. Note that any combination of centralized and distributed control can exist in a network. The details of the abstract model for the control entities are restricted only by the protocol they use. © SPARC consortium 2012 Page 24 of 129 WP3, Deliverable 3.3 Split Architecture - SPARC Figure 9: Split of control and data plane The functional entities discussed above are organized into two control blocks: 1. The internal control block controls the sub-domain’s internal operation, i.e. transport of packets across the sub-domain’s internal backplane. Since a sub-domain hides all of its internal operations from the surrounding environment, the protocols and solutions adopted for managing the backplane operation are out of scope of our architecture. In principle, network designers can choose any solution that meets their requirements. However, that solution must provide packet transport services among the ingress/egress nodes of the sub-domain. This resembles the principles of an IEEE 802.11 compliant distribution system. The backplane control block must detect the ingress/egress nodes of the sub-domain through some form of topology discovery. 2. The eexternal control block interoperates with peering control entities. In our example discussed here, we split the network domain into two abstract nodes, which actually implement two control domains formed by three and four nodes respectively; and three physical nodes having dedicated control plane entities. In a hybrid scenario with legacy and SDN enabled network elements, the network operator would select the protocol used for interworking of the new formed abstract nodes based on the legacy protocol stacks, in order to avoid updates to the legacy network elements. For the access/aggregation use case considered in SPARC, the IP/MPLS control protocols, OSPF, LDP, RSVP-TE, BGP etc, will provide the necessary glue among the control domains. In a non-hybrid scenario, the network operator may choose either standardized protocols like in the hybrid case, or any state sharing mechanism between the SDN controllers, such as HyperFlow [72]. Figure 10: Functional blocks in a control block Besides the above two control blocks, a third block is required: this module exposes transport endpoints via OpenFlow for accessing the transport services offered by this control block (optional). © SPARC consortium 2012 Page 25 of 129 WP3, Deliverable 3.3 Split Architecture - SPARC The clustering of network elements in an abstract node is not limited to one control layer. Rather, we can apply this scheme recursively on top of abstract nodes again. However, for such recursive architecture, a control block must expose a datapath-like interface towards the client control entities. The depth of such a hierarchy is in principle unlimited; however, for practical reasons, it should be limited to a maximum value1. In Section 4.1.1 we introduced some design guidelines (OpenFlow as northbound interface, flowspace reservation) for controlling a single network element (NE) with an SDN control plane. Since a hierarchy of virtual node control entities behaves like a single node towards the control plane, we can combine both principles in a single framework (see Figure 11). Figure 11: Recursive Architecture of Virtual Nodes and Modular Control Plane 4.1.3 Mapping the carrier-grade Architecture on the SPARC implementations As we discussed above, the suggested control plane design does not define or restrict a control layer’s internal structure and architecture. This also means that the number of such control layers and the mapping of control blocks into these layers depend only on the network operator’s design decisions. Within the SPARC project, we identified two cases for developing and implementing SDN-aware control solutions for transport and service functions: an MPLS-based aggregation and core network, and a Broadband Network Gateway (BNG) service node. For scalable control of the MPLS based aggregation, we adopted the design principle of splitting the network domain into control domains: the aggregation network is split into several control domains, while the legacy distributed control plane is retained in the core (see Figure 8). Within each aggregation control domains the control blocks required for provisioning MPLS transport services (PWE, MPLS tunnel provisioning, OAM, recovery) as well as communicating with the core control domain are grouped into one control layer, which contains the major functional groups shown in Figure 10. As an example of service node virtualization, we implemented a control plane based on the design principles depicted in Section 4.1.1. The carrier-grade control plane consists of several control blocks and uses OpenFlow (and its protocol extensions, presented and applied in Sections 5 and 6, respectively) to create a Broadband Network Gateway (BNG) for access/aggregation domains. Each control block uses the same modularization principles internally, i.e., a control block adopts the OpenFlow interface for loading adaptation and termination functions and for exposing Ethernet and IP based transport endpoints. An initial implementation of a control plane following this design was demonstrated at the Open Networking Summit in April 2012. A detailed description of this prototype is available in SPARC deliverable D4.3. 1 OpenFlow limits the maximum to 1024, as any name space will at least use one bit. In practical applications the number should of course be as low as possible, limiting the processing delay in the control stack. The actual number will be a trade-off between performance (single control block) and modularity, allowing for separation of concerns, better testing and virtualization. © SPARC consortium 2012 Page 26 of 129 WP3, Deliverable 3.3 4.1.4 Split Architecture - SPARC Flowspace Management In the preceding sections, we introduced control blocks as basic building blocks of a protocol stack. Each control block acts as a proxy entity, i.e. it operates as datapath element offering services to higher layers and a controller for using services from lower layers. We adopt the OpenFlow API for binding such control blocks to each other. The OpenFlow 1.x series of specifications defines a 1:n relationship between controllers and datapath elements, which means a single controller may control multiple datapath elements, but a datapath element can only have a single controller. We propose an extension to the OpenFlow framework named flowspace management that relies on the slicing of flowspaces. Before we go further, let us briefly cover some properties of flowspaces and possible options for slicing them. Packet header information is typically structured into fields as a MAC address (source and destination), VLAN tag, IP addresses, protocol type, port, MPLS tag #1,#2, … etc. OpenFlow defines 14 of these headers in version 1.1 of the specification. The different headers correspond to multiple layers and allow the scalability of communication by reducing the number of communicating entities per layer. It is now possible to view all these headers as one large tag that gives an identifier to an individual packet or flow. This flattening of the namespace is an attractive feature of OpenFlow because it reduces the total numbers of layers (which typically translate into specific boxes in a carrier-grade network). Still, however, the functions at the border of the network need to restructure this flat label into meaningful headers that can be processed. The many-to-many relation of adjacent layers also enables the parallel deployment of different control planes and allows an the individual assignment of resources to one of the deployed control planes. This requires an appropriate solution for resource slicing or virtualization at any of the server control layers. When multiple controllers share a single underlying controller or datapath element, the slicing between them requires multiplexing/demultiplexing. Controllers of the control layer (n+1) need to be addressed from the underlying controller/datapath element. This demultiplexing of messages (e.g., the Packet_In message) needs to be encoded in the flow itself, as there is no additional information than the packet header. This means that the known endpoints of layer n are exposed to the layer (n+1) controller. Each controlling entity of a client layer must be aware of which resource slices are allocated to it. This information can be obtained from management entities, as it is done in GENI’s FlowVisor and Opt-In Manager solution [15]. The controlling entity can also poll the server layer to get the usable resources. As an alternative the controlling entity may request resources it is willing to control. A layer (n) controller exposes the known endpoints (which are addresses from flowspace n). A layer (n+1) controller then requests the slice by specifying a subset of this list. Multiple parallel layer (n+1) controllers would therefore exclusively share the set of layer (n) endpoints. For example, an Ethernet controller would pick the Ethernet ports to be controlled by it: two Ethernet controllers on one switch would be possible, and the slicing would take place on physical ports. As a second example, one layer higher, a number of MAC addresses that is known to an Ethernet controller (through MAC learning, for instance) can be divided among multiple IPv4 controllers, corresponding to multiple routers on a single Ethernet switch. These routers would receive their own MAC addresses for the IP router ports from the layer (n), in this case the Ethernet controller. A control layer receives two categories of configuration requests: downstream from a controlling entity (residing in a higher layer) designated to control the offered resources, and upstream from a controlled entity (residing in a lower layer) providing triggers. For example, a trigger can be a PDU from the data plane or a notification about the changes in the resources offered by the controlled entity. To sum up: the service API between adjacent control layers shall provide a means of issuing configuration requests toward a lower control layer, receiving configuration triggers from lower control layers along with enhanced management features – such as flowspace management, virtualization, and control slice isolation. The service API will also be exposed to external content providers and thus need authentication and security features. We define the following extensions for flowspace management:   We add a set of messages for signaling flowspace registrations between a controller and a datapath element. Both the Fsp-Open and Fsp-Close messages carry a flowspace description. This flowspace description consists of a structure ofp_match, i.e. which means we use the same mechanism for describing flows as used by OpenFlow. A controller may restrict its control to a single flow or request all flows traversing a datapath element by sending an all-wildcard flowspace description. A controller may send multiple flowspace registrations in parallel. © SPARC consortium 2012 Page 27 of 129 WP3, Deliverable 3.3     4.1.5 Split Architecture - SPARC A datapath element stores all incoming flowspace registrations in a local flowspace table. The datapath element checks any event requiring a controller notification against the flowspace table. The flowspace entry with the most precise match (in terms of exact hits, wildcard hits, and priority field) wins this competition and the associated controller entity is used as destination for the event notification. Flowspace management can operate in either overlapping or non-overlapping mode. If multiple flowspace entries match in overlapping mode, the event is sent to all controller entities with matching entries. In nonoverlapping mode a datapath element rejects a flowspace registration attempt that overlaps with an already accepted flowspace. For flowspace registrations, we can either adopt a soft-state or hard-state approach similar to Flow-Mod entries. When soft-state registrations expire, the datapath element automatically removes them, unless the controller refreshes the entry. Flowspace management does not affect the controller role model as introduced with OpenFlow version 1.2. Controllers can adopt one of the roles defined there: master, slave, or equal. In-depth recursive controller architecture Over the last ten years research in the area of Internet architecture has been influenced by the “clean slate” approach, largely driven by the difficulty of introducing even small changes in the services provided by - and the protocols installed in - production Internet routers. Researchers had been frustrated by the apparent block of innovation in the “real world”, so a part escaped into a Gedankenexperiment called clean-slate research, exploring what could be a reasonable new architecture for the Internet, fixing the weak points of the current architecture, among others: mobility support, route oscillations, a centralized and error-prone name resolution structure, and information replication. Clean-slate research helped in identifying basic principles of network layering (RNA, RINA), introducing new concepts like content-centric networking (CCN) and flow switching. The latter evolved into OpenFlow and promises to be the main paradigm of networking in the coming years. We also expect the previous two results to play a major role in networking over the coming years, in part enabled by OpenFlow itself. The potential benefits of CCN are the easy replication of much needed information in the network, reducing the overall load in the network, and - among other things - reducing the vulnerability to DDoS attacks. The main finding of the recursive network architecture was that the layering in today’s network stack follows a recursive pattern. A layer in a network according to RNA or RINA does not follow the OSI/ISO model, but in a sense is orthogonal to OSI’s functional split. Analyzing the multi-layered networks of today (like IP over MPLS over Ethernet over WDM) one can observe that a certain set of functions is present in each of those layers, in a way, replicating all seven OSI layers in each of the “real” layers of a carrier network. Therefore, recursive layering splits the stack according to name spaces (the “scope”) instead of functions. All functions that are required to establish communications between two or more processes must be present in each layer. Figure 12: Core functional blocks of a Layer. © SPARC consortium 2012 Page 28 of 129 WP3, Deliverable 3.3 - Resource information exchange (protocol) o - A network node has to forward and multiplex frames (or flows) from neighboring nodes. This is done by evaluating the address of the respective layer and looking up the forwarding table. The latter returns one or more points of attachment in the next lower layer, and results in the necessity to write these addresses into the frame before forwarding it. In the case of IP this means that the routing table lookup returns a next hop IP address that then needs to be resolved into the destination’s lower-layer address, which is a MAC address in most cases. Access Control o - Typically, a link between endpoints of a layer is controlled by some form of error checksum and flow control. The combined error and flow control of IP is implemented in TCP: Ethernet has the option to use flow control and all layers down to the PHY assure some form of CRC. Forwarding and multiplexing o - OSPF, Spanning Tree and LLDP all consist of three parts: a handshake for neighbour discovery, a subsequent exchange of resource information (be it link state in OSPF, distance vector in STP, or plain port identifiers in LLDP), and a subsequent calculation of a topology, and, using a CSPF, a routing table. Error and flow control o - Split Architecture - SPARC Access control means that a new node is only allowed to communicate within a network layer after a certain process that assigns an address to work in the name space that defines the layer. DHCP, PPP’s NCP are examples for this procedure in IP. The situation in Ethernet is different because the layer relies on fixed and a-priori assigned MAC addresses (creating ambiguities when it comes to dynamic creation of virtual machines). Directory (Name resolution) o Each layer relies on a mechanism that resolves the names of its name space to point-of-attachment addresses in the next lower layer. This mechanism is ARP for IP, or the process of MAC learning and storing the learned addresses along with the next lower name space, ports, in the forwarding table of an Ethernet switch. One of the possible upper layers of IP is nowadays predominant, and uses URL names that are resolved to IP addresses in the domain name system DNS. Other name spaces (like SIP) are used on top of IP as well, indicating that any knowledge of higher layer names in a lower layer would be potentially harmful. Introducing management functions to SplitArchitecture 4.2 Before we discuss the integration of network management (NM) into a generic SDN and our SplitArchitecture network, we will provide some background by giving a historical perspective on network management. We start with a functional definition of network management based on ISO and ITU-T models, and a brief recap of the traditional layering of telecommunication networks (i.e. data, control and management planes). Next, we discuss the relation between the ITU TMN framework and the three planes by discussing the evolution of network management for different architectures, leading to the SDN-based SplitArchitecture concept as discussed in this deliverable. 4.2.1 Definition of Network Management A network management model describes a set of recommendations or a framework for managing a transport network. Several network management models have been defined through the years by different standardization bodies, focusing on management of different network technologies. In order to define network management, we use the established network management model from the International Organization for Standardization (ISO) as a baseline together with the ITU-Ts Telecommunications Management Network (TMN) framework. ISO defined five functional areas of network management: Fault, Configuration, Accounting, Performance, and Security management - the so called OSI FCAPS model [46]. In this deliverable, we follow the network-centric perspective of FCAPS to divide network management functions. The functions are thus grouped as followed:    Fault management: detection, isolation, correction and notification of faults in the network. Configuration management: configuration of the network devices, provision of circuits and services. Accounting management: collection and storage of data on network resource usage, deliver payment and accounting information. © SPARC consortium 2012 Page 29 of 129 WP3, Deliverable 3.3   Split Architecture - SPARC Performance management: collection and storage of operational statistics on resource usage for network optimization and planning. Security management: secure access to network elements, resources and services. The ITU-T introduced the Telecommunications Management Network (TMN) framework [48] as a reference for how to operate and manage telecommunication networks. The TMN defines logical layers orthogonal to network management functions. There are five management layers, each providing the appropriate FCAPS functionality [47] according to the layer definition. The layers are network element layer, element management layer, network management layer, service management layer, and business management layer.      4.2.2 Business management layer: functions related to business aspects, which includes rather strategically and tactical management rather than operational management, as considered in this deliverable. Service management layer: creation, handling, implementation and monitoring, and charging for the services build on top of the transport connectivity managed by the network management layer. Network management layer: distribution of network resources, configuration, control, and supervision the network consisting of network elements. Element management layer: handling of individual network elements or groups of network elements; including detection and handling of equipment errors, collection of statistics for accounting, and logging of event and performance data. Network element layer: providing an interface to the network elements, as well as instances of modules providing the required functionalities to support all FCAPS functions. Modern view of transport networks In this deliverable, we discuss the split between control and data planes as one of the core concepts of SDN and SplitArchitecture. In this section, we start to discuss a third important plane in transport networks: the management plane. Indeed, today’s telecommunication networks are divided architecturally into three planes: management plane, control plane, and data plane. According to the ITU-T’s generic protocol reference model for telecommunication networks [51], the user (data) plane is represented by user entities (hardware and software components) that deal with the transport of the user information ensuring switching, multiplexing, flow control and data integrity. The control plane is responsible for control related functions to establish, manage and release communications to transport information among user entities, while the management plane (as the name indicates) is responsible for management-related functions. The establishment of a communication channel is the result of cooperation between the control and management entities and the information transfer service provided. The communication channel may have different characteristics: e.g., connection-oriented, connectionless, on-demand, permanent etc. According to the IETF GMPLS control plane framework [56], the control plane encompasses dynamic provisioning of paths, routing, path computation, signaling, traffic engineering, and path recovery. On the other hand, the fault, configuration, performance, and security management functions are placed in the management plane as defined by ITUT [49], IETF also adds requirements for object and information models to be put in the management plane, as they are needed to manage networks and network elements [57]. With regard to the network management of packet transport networks, data, control, and management planes are defined as follows: The management plane addresses router configuration, collection of statistics, and optionally fault and performance management. The control plane exchanges connectivity and reachability information between the routers that is needed to build a routing table and computes and identifies a path between communication endpoints based on the link cost (and other types of) information. The control plane is also involved in routing of packets by identifying the outgoing interface and the next hop router to which the packet should be forwarded. The data plane receives and processes all the inbound packets, either by forwarding them to a specific interface, discarding them, or processing them specifically according to the differentiated service traffic policies. © SPARC consortium 2012 Page 30 of 129 WP3, Deliverable 3.3 Split Architecture - SPARC Management Plane Control Plane Data Plane Figure 13: Three planes of telecommunication networks: a centralized management plane, managing both the distributed control and data planes. The control plane is positioned between the data and management planes (see Figure 13). Network elements are controlled either by the control plane or by both the management and the control plane. The management plane configures and supervises the control plane. While the control plane can make certain decisions and control the data plane, the management plane has the ultimate control over both the control plane and the data plane entities. 4.2.3 Evolution of network management for different architectures Traditionally, the network performs routing and switching functions by distributing the control and forwarding logic to all the network elements that are part of a network infrastructure. A network management system (NMS) is implemented as a centralized system on top of this distributed control plane, collecting information remotely from the network (Figure 13). In heterogeneous transport networks, a number of vendor-specific and technology-specific NM solutions have been deployed, managing a specific type of transport network elements. However, these different NM solutions are not integrated, and have proprietary interfaces towards their control plane, which have not been standardized. To enable dynamic, policy-driven control of transport networks, the ITU-T defined an architecture for Automatically Switched Optical Networks (ASON), while the IETF was implementing a similar idea with the generalized MPLS (GMPLS) concept, focusing on control plane signaling. ASON/GMPLS defines a unified control plane for different data path technologies (i.e., different types of switching). It also includes definitions regarding management interfaces towards the control plane and the data plane. Prior to this, each data path technology had its own control plane and its own interface(s) towards the management plane. A unified control plane enables on-demand provisioning of end-to-end services (optical circuits) to assign bandwidth dynamically on demand, thus enabling new switched services with dynamic bandwidth requirements. The optical circuits (and their bandwidth) previously provisioned statically, a priori in order to provide bandwidth guarantees for end-to-end IP traffic, and could not be modified subsequently. However, after high-demand applications appeared with strict requirements of latency, loss, bandwidth and reliability in terms of protection and resilience, the need for support of dynamic bandwidth assignment based on the actual demand (i.e., on-demand bandwidth provisioning) arose. The ASON/GMPLS approach of addressing these provisioning issues was to add intelligence to the control plane. It integrated parts of the provisioning and configuration process into the control plane, automatically updating the network information database, all with the goal of simplifying network operations, reducing its cost, and quickly responding to failures by dynamically rerouting traffic. The GMPLS unified control plane offers a single interface to a management plane towards the control plane, while in the past the NMS had to have different interfaces for each different control plane. The functions that remain in the management plane are configuration and monitoring of the transport and control plane entities or services. In ASON/GMPLS, we can observe the trend of moving some management functions into the control plane. The ITU-T specified the relation between ASON and the TMN architecture as the framework for ASON management in [50]. The relationship between management, control and data (or transport) plane is similar as described above, with the management plane directing both the control and the transport plane, while the control plane itself directs the data plane and reports northbound to the management plane. The control plane in this definition takes over functions that have been part of traditional TMN layers for service management, network management and element management, such as connection and call control, neighbour discovery and routing control. © SPARC consortium 2012 Page 31 of 129 WP3, Deliverable 3.3 Split Architecture - SPARC However, the ASON/GMPLS control plane is still distributed (except an optionally centralized PCE), i.e. control functions are considered part of the network element layer. Thus, the datapath elements also have network knowledge and are involved in the topology discovery. 4.2.4 Network management for SDN In the meantime, the SDN concept moved the network design into another direction, by making the datapath elements dumb and centralizing the control plane, e.g. by replacing distributed routing algorithms on the network elements by centralized route calculations performed in an SDN controller. Signaling in this architecture is done by the SDN controller via its southbound interface towards the datapath elements (e.g. OpenFlow). SDN has emerged as a new trend of building networks, by moving the intelligence from the data plane to the control plane. Hence, SDN is supposed to deliver cost-efficient solutions that are easier to manage, resulting in expected savings in both CAPEX and OPEX. Compared to ASON/GMPLS, the control and data planes are not only decoupled in this design, but the control plane is also logically centralized in an SDN controller. Management Plane Network Mangement System Proprietary NM Control Plane Proprietary NM OpenFlow Controller OpenFlow Data Plane OpenFlow Datapath Element Figure 14: Network management introduced to OpenFlow-based SDN via a fully separated, external NMS An obvious way to introduce a complete network management framework to an SDN network is to include a management plane on top of both the data and the control planes, according to the traditional network model (Figure 14). Network management would be fully contained within the management plane, adding additional management interface(s) to the network elements. While this might seem to be a feasible solution, we believe that this approach to the problem is not optimal, considering that both control and management planes are centralized elements in SDN. The SDN architecture blurs the boundary between the control and the management planes. An SDN controller contains and maintains an updated network view, computes the paths, creates the forwarding rules and installs these rules on datapath elements - and in doing so provisions all the connections. Some of these functions (e.g., configuration of routers, topology discovery, service provisioning) can be traditionally seen as network management functions. The fully separated solution in Figure 14 implies that some types of events (e.g. topology updates or port state changes) would be duplicated and sent both to the controller and the NMS, and both entities would be required to keep their own central view of the network topology and state. Furthermore, certain state changes could trigger reaction from both entities. Without a certain level of synchronization between controller and NMS, this could result in an inconsistent state of the network and its elements. Finally, some notifications are only sent to the NMS. However, modern controller applications might very well require this information to react in a close-to-real-time fashion. As an example, consider device management data about power consumption, which is a main input to a power optimization-based traffic engineering policy. An alternative way to introduce network management to SDN is to acknowledge the existence of traditional network management functions in the control plane. As a result, parts of the traditional management plane functions could be placed in the control plane. We will refer to these functions as network management functions (NMF). In the Figure 15, these functions are depicted as a separate module in the control plane. The actual realization of these functions is implementation specific, and it would be possible to integrate these functions completely into the controller software. The controller uses the southbound OpenFlow interface for control of the datapath elements. The NMF module would use open standard network management protocols and data models as southbound interfaces, such as the NETCONFbased OF-config by ONF. Due to the lack of a defined northbound interface, we assume that any API to an NMS can be implemented, similar to the fully separated solution in Figure 14. © SPARC consortium 2012 Page 32 of 129 WP3, Deliverable 3.3 Split Architecture - SPARC Management Plane Network Mangement System Proprietary API Control Plane OpenFlow Controller OpenFlow Data Plane NMF NM API OpenFlow Datapath Element Figure 15: Network management introduced to SDN by integration of selected network management functions (NMF) with the controller. The advantages of a dedicated NMF module, implementing certain network management functions in the control plane, are    Removal of duplicate functions in two centralized elements (i.e. the controller and the NMS), thus limiting notification overhead and possible race-conditions of reactive-actions. Allowing for more timely reaction to management data (e.g., fault and performance) in the control plane, e.g. for TE purposes. Replacing proprietary management protocols and data models with open and standardized “southbound” interfaces towards the network elements. There are however some detailed questions to be answered in a scenario as depicted above. First, we need to define the exact set of functions to be included into a NMF module within the SDN control plane. Second, we need to extend existing protocols (e.g. OpenFlow of OF-config) or define new open standards to support the southbound interaction of NMF with the datapath elements. In the next subsection, we will present our proposal of a network management architecture for SDN, which will provide some initial answers to these questions. 4.2.5 SPARC management integration proposal The ONF did not yet define any complete network management framework for SDN. As described in Section 3.1.3, the recent ONF view includes the OF-Configuration point as a separate logical entity that configures datapath elements. However, no interface for exchanging the data between the OF-Configuration point and the controller has been defined. Additionally, there are no northbound interfaces defined for interaction between the applications and the OFConfiguration point, or between the applications and the controller, which could be used for management. Moreover it is not clear how network monitoring should be performed or whether a new interface needs to be defined or if the existing protocols can be extended for this purpose. We propose a generic network management solution similar to Figure 15 for a carrier-grade split architecture transport network. The proposed architecture, depicted in Figure 16, is based on the current SDN model defined by the ONF. © SPARC consortium 2012 Page 33 of 129 WP3, Deliverable 3.3 Split Architecture - SPARC Network Mangement System Management Plane Proprietary API (Proprietary API) Adapter NB API OpenFlow Controller Control Plane NMF Controller Shared data OpenFlow Data Plane Config Point Monitoring Point OFConfig OpenFlow Datapath Element OFMon? Alarm/ Notifiction OAM Tools Figure 16: Proposal for a carrier split architecture with integrated NM The real-time parts of a dataplane element are controlled via the OpenFlow protocol. Additionally, each datapath element needs to provide a NETCONF interface and support for the OF-config configuration scheme. In order to enable additional fault and performance management, OAM tools responsible for performing the actual measurements are required. A separate discussion on OAM tools for the OpenFlow-based SplitArchitecture can be found in Section 5.3. Finally, a monitoring interface is also required to report on results and alarms. In Figure 16, we call this interface OFMon, which could be realized as a separate interface or as an extension to OpenFlow. An example of a separate interface is the SNMP protocol, with its possibilities to allow retrieval of measurement and status data via the monitoring points, and to send alarms to the monitoring point asynchronously via traps. Alternatively, the OpenFlow protocol itself might also be suited for this purpose due to its real-time characteristics. While OpenFlow already supports retrieval of flow and port-related statistics, it lacks OAM-related reporting structures. It should be straightforward to add new message types for both asynchronous alarm messages and periodic measurement results to OpenFlow. The exact specification of these messages would be dependent on the specific alarm and OAM tool. The SDN control plane consists in the first hand of a controller in accordance with the definition of SDN [44]. The controller maintains a global view of the network, e.g. in form of graphs with annotated nodes and edges. The controller also provides the southbound OpenFlow interface for deploying the packet forwarding rules. We propose that the OpenFlow controller includes a network management function (NMF) module for configuring as well as fault and performance monitoring. Being part of the same controller, the NMF shares the network view of the controller, including topology information and updates to this view in term of alarms or notification from the data plane. The OF configuration point is part of the NMF module, providing an OF-config interface to the datapath elements. Additionally, the NMF module comprises an equivalent monitoring point, providing the monitoring interface (OF-Mon) towards the data plane. In terms of functionality, the NMF can take over responsibilty for configuration, fault, and performance management functions that are useful in the control layer, as discussed below in Section 4.2.6. Through the controllers northbound API 2 , the controller and the NMF module interact with an external NMS. These interactions include configuration of the controller, the NMF and policy input by the NMS, and updates of network state, fault and performance data towards the NMS by the controller. The management plane is represented by an external network management system (NMS). In a typical carrier network, NMS entities will be represented by commercial NMS solutions, but could potentially also be implemented as customized NM applications or even simple CLI interfaces. In most operator environments, however, we assume that some form of NMS is already in place and could be adapted to manage the SDN domain as well as legacy equipment. To take advantage of the controllers northbound API, a lightweight NM adapter application is used to translate the controller northbound API to the proprietary API interfacing the NMS. If the management functions required for the SDN domain go beyond the scope of the configuration and monitoring interface provided by the NMF, a further device or vendor specific management interface from the NMS to the devices might be required. Such specific management functions may include many device management tasks, as identified in Table 2. In general, the external NMS is responsible for all remaining FCAPS functions not covered by the control layer, as discussed in the next section. 2 The northbound interface of a SDN controller is yet undefined by subject of ongoing discussions in the ONF. Here, we assume that each controller implementation has its own common northbound API defined. © SPARC consortium 2012 Page 34 of 129 WP3, Deliverable 3.3 Split Architecture - SPARC In general, this generic architectural proposal makes it possible to flexibly choose whether to place network management functions within a controller or in an NMS on a per-case basis, depending on the exact scenario and usecase in question. Parameters affecting such analyses include the scale of the network (number of devices and geographical spread), existing legacy infrastructure in place, type of transport technologies in use and type of services to be supported. Thus in certain scenarios, either the controller based NMF or the external NMS could be designed to be minimalistic or even be non-existent. In the next section, we will present a way to handle the design choice of where to place network management functions, and provide our recommendations for carrier type networks. 4.2.6 Analyzing the placement of Network Management functions The key question in the proposed NM architecture is the assignment of traditional network management functions to the layers in an SDN environment (e.g. the generic architecture depicted in Figure 16). In other words, we assess which functionalities should be integrated into the SDN controller, and which should remain in an external NMS. Note that in the current SDN model by the ONF, the OpenFlow controller is already responsible for the updated network view, path computation, determination and configuration of forwarding rules and provisioning of connections. The OF configuration point supports additional configuration functions, such as controller assignment, resource configuration, certificate handling, capability discovery and basic configuration of tunnel endpoints. In Table 2, we list common network management functions of the TMN layers and asses the responsibilities for them in an SDN environment. Besides functions that are already part of the control plane, the analysis as to whether a NM function should be placed within the control plane or stay in the management plane is based on the following three questions: Q1: Is the function already included in the ONF SDN model or the carrier-grade controller framework as proposed by SPARC? If not, Q2: Does the information provided by the NM function help the controller framework to configure and steer the network in timely and automated fashion in order to provide carrier-grade performance? If so, Q3: Do the southbound controller interfaces defined by the ONF allow straight-forward support for the NM function (i.e. would required extensions to the OF / OF-Config protocols be simple and keep the protocol “open”, without bloating or overloading it with vendor or device specific elements)? Table 2: Assessment of NM functions and potential for control plane integration NM function Element management functions: Firmware management Device monitoring (temp., etc) Device monitoring: Power consumption Control network bootstrapping Resource and capability discovery Logical swtich instatiation Control channel (addresses and credentials) Fault detection (equipment) Alarm management Logging of alarms Logging of statistical data Resource usage (cpu, buffer, queue-length) Network management functions: Topology discovery (creation of network view) Path computation & setup Flow table management Tunnel management Traffic engineering (creation of QoS paths) Fault detection (link level) Link performance monitoring Network performance optimization Resiliency measures Service management functions: Accounting User management and AAA Service definition and administration Service OAM configuration QoS management (service delay, loss) SLA management © SPARC consortium 2012 FCAPS Groups Q3 open Q1 included? Q2 timely? interfaces? Proposed CP integration config performance performance config config config config / security fault configuration fault, accounting performance, accounting performance no no no no yes yes yes no no no no no no no yes¹ no no no no no yes no no yes² no no OF-mon no OF, OF-config OF-config OF-config no OF-config no no OF-mon no no yes¹ no yes yes yes no yes no no yes² config config config config config fault performance performance performance/config yes yes yes yes yes yes³ no no yes yes yes yes yes yes yes yes yes yes OF OF OF OF-config OF OF-mon OF-mon OF, OF-config OF, OF-config yes yes yes yes yes yes yes yes yes accounting no no no no accounting / security no no no no config no no no no yes* config no yes OF-config performance no yes OF-mon yes* accounting no no no no ¹ for energy-aware networking (see section 5.7) ² for logical switches sharing switch resources (see section 5.2.4) ³ implemented in SPARC as BFD (see section 5.3.3) * assuming service controller functionality in the CP, as in SPARC D4.3 Page 35 of 129 WP3, Deliverable 3.3 Split Architecture - SPARC Table 2 lists the most common network management function required by the SPARC use case of a carrier-grade access/aggregation network. We grouped the functions according to the TMN layers element- to service management. The second column indicates the broader function in terms of FCAPS. Columns Q1-3 refer to the three questions stated above, helping us to assess for which functions it makes sense to be integrated in an SDN controller. Finally, the last column provides our recommendation, based the results of the columns Q1-3. 4.2.6.1 Element management layer functions Functions in the element management layer are mainly related to bootstrapping, device configuration and monitoring, as well as logging of statistical data and alarms received from the OAM tools at the network element layer. Most of these functions are not time-critical in terms of controller reaction, or require hardware or vendor specific interfaces (e.g. firmware management) that might not need to be integrated into open SDN standards. As an example, bootstrapping of the control network connection could be done via existing auto-configuration mechanisms like DHCP (see Section 5.5). However, configuration of the devices is currently one of the prime tasks of OF-config. Hence, we suggest integration of device configuration in the control plane, given that the function can be supported by extending the OF-config configuration scheme and can thus be communicated via the NETCONF-based OF-config protocol. Possible exceptions in terms of required quick controller reactions could be certain performance data of specific device parameters. As one example is energy aware networking, which we propose in Section 5.7. In this case, the controller needs updated power consumption figures for interfaces and devices in the network in order to be able to react accordingly, e.g by multilayer traffic engineering (MLTE) topology optimization, or by reconfiguring interfaces to burst- or adaptive link rate mode (see Section 5.7). Another example is network virtualization, which is discussed in detail in Section 5.2. In a virtualized scenario, the physical resources (e.g. CPU, memory, queues) in a datapath element are shared among several tenants by slicing the physical network element into several isolated logical OpenFlow switches. To ensure the service level agreements (SLA) with these tenants, it might be necessary for the controller to react in cases when physical resources become scarce and logical switches are threatened with resource starvation. In these scenarios, the controller needs fault notifications or continuous performance data updates from the device, which allows it to react, for example, through resource reassignment or even migration of logical switches to other network elements. 4.2.6.2 Network management layer functions Functions in the network management layer involve configuration and provisioning of the network as well as steering and monitoring the traffic. To a large degree, these functions already covered by SDN controllers and communicated via OpenFlow – and, of late, via OF-Config as well. Topology discovery, for example, is most commonly done via a controller-based LLDP-like mechanism (see Section 5.5). Functions related to link, path and tunnel provisioning and configuration are the very essence of many controllers and are nearly fully supported by existing OpenFlow specifications. Fault and performance monitoring, however, are currently only supported in rudimentary fashion by ONF protocols, e.g. by polling flow, port, group and queue statistics or receiving asynchronous port status messages via OpenFlow. However, fault and performance data is essential for the controller to realize protection and restoration, as well as performance optimization. In Section 5.3 we propose more advanced carrier-grade OAM tools for more finegrained and specific fault and performance monitoring of the network. As a monitoring channel (OF-mon), we propose either a monitoring extension to OpenFlow or, alternatively, adaption of an existing monitoring interface, e.g. SNMP. 4.2.6.3 Service management layer functions Network services, such as residential internet access or E-LINE connectivity, provided to customers are provisioned on the top of the common packet based transport infrastructure. The services are managed in the service management layer according to the TMN framework, which includes functions for service creation, implementation and monitoring. Service creation typically touches a few service nodes on the edges of the transport network, which provide the underlying end-to-end transport connectivity which is managed by the network management layer. In some SDN environments, service management might not be supported by the SDN controller, which means that related management functions would need to be performed by an external NMS. For a carrier-grade controller framework, we assume a control plane that already includes service creation functionality. In SPARC, we proposed to organize the controller into transport and service control regions, as also realized in the SPARC prototype described in SPARC Deliverable D4.3. These regions can be implemented, for example, through a recursive control plane architecture, as proposed in Section 4.1. In this case, the services can take advantage of control plane service level fault and performance management to enable fast reaction to any type of service degradation. The fault and performance management functions need support from service OAM tools on the network elements as well as a monitoring channel between the controller and the data plane (OF-mon). © SPARC consortium 2012 Page 36 of 129 WP3, Deliverable 3.3 Split Architecture - SPARC However, not all service management functions are necessarily time critical or need to share topology data with the controller. For example, service billing, SLA management and accounting do not have to be done in the control plane, and should remain in an existing NMS system. In this case, the required information is exported by the controller through the northbound API and the NMS-specific adaptor application. Concluding the Analysis 4.2.6.4 While the analysis in this section provides initial recommendations on the placement of management functions, a final assessment needs to be done on a per-case basis, depending on the exact scenario and use case in question. According to our analysis for carrier-grade networks, typical network management layer functions are best placed within the controller framework of an SDN architecture. Additionally, the controller should have access to OAM tool information from device, network and service levels, which is critical for effective performance and fault management. On the other hand, the remaining device and service management layer functions to not require time-critical controller reaction or are not straightforward to implement with existing and planned southbound controller interfaces. An external NMS in a carrier SDN scenario might be more suitable to take over vendor and device-specific device management tasks, as well as service and business-level policy management and accounting tasks. We started our discussion of network management by considering traditional models and definitions, which we then tried to apply to a generic SDN model. It an SDN scenario, both management and control are performed centralized, decoupled from the actual datapath elements. Based on this observation, we conclude that it is difficult to differentiate precisely between control and management in the context of SDN. Within SPARC, we used timeliness and automatic configuration (i.e. real-time reactiveness) as the differentiator between control and management functions. However, depending on the specific use-case and the technology used, the results of such an assessment might look quite different from case to case. We believe that our generic NM integration proposal (cf. Figure 16) allows enough flexibility for the placement of specific network management functions to cope with this architectural tradeoff. 4.2.7 Combined SPARC network management and recursive control framework Within SPARC, we initially had a strong focus on the control plane architecture and only started to consider network management aspects very recently. In the following paragraphs, we will provide our initial ideas on a combined control/management architecture. However, a mature network management solution requires further discussions beyond the scope of SPARC. The network management proposal presented so far targets a rather generic SDN architecture, taking current SDN models as defined by OpenFlow and the ONF into account (cf. Figure 3 (II) and (III)). In the following paragraphs, we discuss how to integrate the proposed generic management framework with the hierarchical, recursive controller architecture discussed in Section 4.1. We will revisit the question about the role of a network management system (NMS) in a hierarchical control layer scenario. Furthermore, we will clarify the functions of additional, managementspecific modules in this scenario, which includes the suggestion network management function (NMF) module in control entities as well as OAM tools within (logical) datapath elements. We start by considering a recursive control architecture as depicted in Figure 11. In this control architecture, we stack multiple control planes, where each plane acts as controller for the lower control planes, while at the same time providing a filtered, abstracted view of its own control plane to higher layers via a virtual datapath node. The concept of virtual nodes makes it possible to (re-)use OpenFlow as the interface between the control planes. We outline this basic architecture in Figure 17. CTRL N M S CTRL CTRL OF-config / OF-mon Mgmt API CTRL control plane (n+1) OpenFlow OAM DP NMF CTRL control plane (n) DP DP DP DP DP control plane (n -1) Figure 17: Combined control and management framework in the recursive controller architecture © SPARC consortium 2012 Page 37 of 129 WP3, Deliverable 3.3 Split Architecture - SPARC We propose integrating the SPARC network management proposal (depicted in Figure 16) as highlighted in red color in Figure 17. The additional new entities added to the hierarchical controller architecture are OAM tools and the network management function (NMF) component in the control framework, as well as an external network management system (NMS)3. Furthermore, additional interfaces between the stacked control planes are provided through the configuration and monitoring protocols OF-config and OF-mon 4, as introduced in Section 4.2.5. We currently do not define the management interface between the individual control planes and the NMS, but we assume the control entities can adapt to a management interface supported by the NMS. Obviously, the OpenFlow-related management APIs OF-config and/or OF-mon are good candidates as well. 4.2.7.1 Management interfaces The main purpose of the network management in this scenario is configuration and monitoring of the control and data planes. Regarding configuration, the hierarchical control architecture requires additional functions besides the traditional management functions listed in Table 2. As pointed out in Section 4.1, these are flowspace management and a more flexible port management by replacing OpenFlow’s current physical port model with a generalized transport endpoint model. However, given the recent introduction of OF-config, both flowspace management and endpoint (i.e. port) management could also be performed through extensions to the ONF configuration protocol. Furthermore, with regard to an external NMS, the additional functions could also be carried out centrally by using a horizontal management interface. So far, we considered a distributed method, i.e. via OpenFlow protocols from a higher control plane (n) to the lower, serving control planes (n-1). As discussed above, this partitioning of management functions needs to be done with the specific use-case and the capabilities of the network management system in mind. Regarding monitoring, a monitoring interface (OF-mon) needs to facilitate the handling of alarms and events. This includes management of the OAM tools as well as event correlation and alarm propagation based on the output of the OAM tools. Either the NMF in the controller or the separate NMS needs to provide an event correlation function with the purpose of identifying and locating the possible cause of the potential failure or service degradation. This event correlation function can be centralized in the NMS, in which case the NMS could only control the number of events through the configuration of a collection time interval in OAM tools. Obviously, this type of centralized event correlation poses extra requirements on the NMS infrastructure in terms of memory and computing power. Another alternative is to perform this function within the NMF module in each control plane. In this case, the NMF module needs to include capabilities to suppress alarms, reducing their number and correlating them before forwarding them to higher control planes or the NMS. In this scenario, the data collection and event correlation functionality is effectively distributed along the controller hierarchy. However, this requires coordination among different control planes and the NMS, which is a study subject for future work. 4.2.7.2 Distributed vs centralized network management As pointed out earlier, it is hard to define a strict boundary between network management and control in SDN networks. Thus, we propose a flexible placement of specific network management functions in order to cope with the architectural tradeoff to be found for each use-case. This flexibility is indicated with the fat, red arrow in Figure 17. At one extreme, network management functions can be placed centrally in an external NMS, or at the other extreme be distributed completely to different control planes of the hierarchical controller. In the latter case, the element and network management functions listed in Table 2 would be placed in lower control planes, whereas service management function would naturally reside in higher planes of the control architecture. A possible alternative is to propagate registrations, configurations, and notifications through the hierarchical stacked controller planes, where only one plane (e.g. the top or bottom plane) would be connected to the NMS via a horizontal management interface. However, the lack of direct interaction between an NMS and the lower layer controllers (including physical datapath elements) can be a shortcoming, due to the need to propagate all the messages through the whole chain of controllers. By messages, we refer to the configuration and initiation of each control entity (in terms of flowspaces, logical endpoints, and network management functions) and datapath elements (in terms of device management, performance and fault monitoring), as well as the propagation of notifications and alarms through the layered hierarchy. We use the term “network management system” as a generic term for the higher four layers of the ITU-T TMN model, ranging from business to element management. 4 When the recursive control plane was discussed, the ONF has not yet specified any management related protocols: OF-Config was not yet released; and a monitoring interface is starting to be discussed in the ONF only at the time of writing this deliverable (Sept. 2012). Note that “OF-Mon” is currently only a working title used by SPARC. 3 © SPARC consortium 2012 Page 38 of 129 WP3, Deliverable 3.3 Split Architecture - SPARC These shortcomings can obviously be addressed by connecting each control plane via a horizontal management interface to the NMS. The benefit of this approach is that it enables each control plane to be configured independently of the other layers. In this case, NMS can be connected to each control plane and be used to configure, initiate, and manage all the controllers and monitoring tools. The measurement data and notifications do not need to traverse the whole hierarchy of controllers in order to reach the NMS, which can reduce the complexity of the solution. For example, there is no need to implement the forwarding of messages or keep the notification state in the controller, in case the connection with the upper controller(s) is lost. However, connecting the NMS to each control plane requires adding more interfaces, which might require some management overhead, but could result in more efficient data delivery by not needing to rely on a single channel for the interaction of all controllers with NMS. In practice, a hybrid solution might be desirable, e.g. allowing notification of both the NMS and higher control planes in case of certain events. However, this would require strict assignment of responsibilities between the centralized and distributed management functions. © SPARC consortium 2012 Page 39 of 129 WP3, Deliverable 3.3 5 Split Architecture - SPARC OpenFlow Extensions for Carrier-Grade SplitArchitecture In the duration of the SPARC project, we concluded that current OpenFlow-based implementations do not fulfill carrier requirements, thus protocol extensions and standardization of certain functionalities are needed. As documented in Section 2.1, the original requirements defined in deliverable D2.1 have been refined with each consecutive deliverable of WP2 and WP3, leading to the final list of requirement groups (i.e. network features) presented in Section 2.2. In Section 4, we have covered requirement groups (a) and (b) on control and management architecture. In this section, we provide proposals for OpenFlow extensions defined to fulfill requirement groups (c) to (k) as listed in Table 3. All these protocol extensions also imply additional functionality on the network elements which go beyond pure packet and flow forwarding. For each topic, we provide an introduction and the motivation by outlining current state-of-the-art solutions. We then describe our proposed improvements to an OpenFlow-based SplitArchitecture in order to enable the respective feature. Specific technical details, such as extensions to OpenFlow configuration procedures or OpenFlow protocol messages, can for some topics be found in SPARC Deliverable D4.2 “OpenFlow protocol suite extensions”, as indicated in the table. Table 3: List of study topics requiring OpenFlow extensions Section 5.1 Requirement group, i.e. study topic Extensions defined in D4.2 5.1 (c) Openness and Extensibility Yes 5.2 (d) Virtualization and Isolation Yes 5.3.3 (e) OAM: technology-specific MPLS OAM Yes 5.3.4 (e) OAM: technology-agnostic Flow OAM No 5.4 (f) Network Resiliency No 5.5 (g) Control Channel Bootstrapping and Topology Discovery No 5.6 (h) Service Creation Yes 5.7 (i) Energy-Efficient Networking Yes 5.8 (j) Quality of Service No 5.9 (k) Multilayer Aspects: Packet-Opto integration No Openness and Extensibility A closer look at the OpenFlow processing framework and its present capabilities immediately reveals one of its major deficiencies: its rather limited set of supported protocols. Initially invented in a campus-networking environment, OpenFlow 1.0 supports a basic set of common protocols and frame formats like Ethernet, VLAN, and ARP/IPv4. The OpenFlow specification authors have continuously added new protocols to the evolving specification like MPLS in OF1.1, IPv6 in OF1.2, or PBB in OF1.3. This for sure makes OpenFlow more useful for specific use cases and environments, but a general framework capable of adding support for yet unsupported protocols seems desirable. Just refer to the access/aggregation use cases discussed in this and the accompanying documents from WP2, where protocol support for PPP and PPPoE is a mandatory requirement. 5.1.1 Extensions for a Recursive Architecture OpenFlow defines in a generic manner a Service Access Point and its basic primitives: Port-Status, Port-Modify, Packet-In, Packet-Out and its more advanced version Flow-Modify. In our recursive architecture as introduced in Section 3 we use the OpenFlow API as interface between a stacked series of transport controllers (in that sense a datapath element is also a controller, a PHY-port controller). Each transport controller defines transport endpoints on its specific layer, e.g. an Ethernet controller de-multiplexes based on MAC addresses and exposes Ethernet ports to the © SPARC consortium 2012 Page 40 of 129 WP3, Deliverable 3.3 Split Architecture - SPARC next higher layer (it provides Ethernet transport services via these transport endpoints). An IP controller exposes IP transport endpoints to the next higher layer, thus providing IP transport services. In the original concept of OpenFlow, a port is a well-defined entity: a physical (or logical) port with Ethernet like properties and configuration parameters. However, in our recursive architecture, a port (i.e. a protocol specific transport endpoint) may define additional or different configuration parameters, e.g. an IP port defines a local source IP address assigned to this port, a network mask, and a peer address (in case of a point-to-point link). The OpenFlow specification defines a fixed C-structure that is limited to Ethernet parameters for its Port-Status messages. We propose to use an extension to OpenFlow that utilizes a TLV-based approach for specifying port specific configuration parameters in struct ofp_port similar to the TLV based approach for struct ofp_match. Figure 18: Port Management Messages and Port life cycle In the recursive architecture, ports are de-multiplexing transport endpoints and except within a PHY-controller (i.e. an Ethernet based datapath) or an optical datapath (i,e, a fiber based datapath), these ports are logical entities used for demultiplexing flows. We replace the physical port model defined in OpenFlow with a more generalized transport endpoint model. This includes the physical port model as used in OpenFlow so far, but adds some flexibility:   We add an additional protocol message to OpenFlow for management of transport endpoints in layer (n) by layer (n+1). This Transport-Endpoint protocol message controls the CRUD5 lifecycle of a transport endpoint. We replace the so far statically defined port description with a TLV based eXtensible Port Parameter set (XPP). A transport endpoint stores a number of layer specific configuration parameters in an XPP set. For a controller entity on layer (n) we define a protocol extension that allows creation of transport endpoints on layer (n-1), e.g. an IP controller may request a new Ethernet transport endpoint and access to its communication services. OpenFlow defines Port-Status messages for signaling port management events from a datapath (i.e. layer (n-1)) to a controller (i.e. layer (n)), as ports in the original specification are primarily physical ports. For the recursive architecture, we propose a symmetrical Port-Status message, i.e. a controller may open a new transport endpoint (i.e. a port) in the adjacent lower layer. Figure 23Figure 18 depicts the life cycle of a transport endpoint deployed in layer (n) and controlled by layer (n+1). All commands (Transport-Endpoint, Port-Status, and Port-Modify) adopt a TLV-based port configuration structure that replaces the current static C-structure. We call this an eXtensible Port Parameter set (or XPP set for short). Port-Status and Port-Modify messages are asymmetric messages in the original OpenFlow specification, i.e. OpenFlow lacks the ability to signal a result status via an acknowledgment back to the originator of the operation. Upon reception of a Transport-Endpoint message for creation of a new transport endpoint, the datapath element sends a Port-Status or PortModify message back with the result of the requested operation and the current port configuration. A port creation operation might fail, e.g. when the layer (n) instance is a PHY-port controller or when a transport endpoint address is already in use by another endpoint. In addition, the set of extensible port parameters defines the type of ports exposed by a layer (n). Via a Features Request message, a layer (n+1) entity should obtain the set of valid port parameters for the specific layer. 5 CRUD stands for “create, read, updated, delete”. Note that Figure 21 visualizes “update” as “modify”. © SPARC consortium 2012 Page 41 of 129 WP3, Deliverable 3.3 5.1.2 Split Architecture - SPARC The Various Processing Types OpenFlow defines an action based processing framework and a surrounding forwarding logic for packet manipulation where individual actions define specific atomic processing operations like pushing a header tag, decrementing a field or setting some header field to a specific value. This processing model supports a number of use cases (e.g. emulating a simple IP router with decrementing TTLs, setting source MAC addresses, and so on), but it also implies a number of constraints. All actions defined in OpenFlow are lightweight and state-less in nature, i.e. an action never takes into account state from the history of preceding packets. However, there are some use cases that require a more advanced processing framework, either because they define dependencies to preceding packets, or because they need a more advanced processing logic. One example is block-chaining encryption codes that require directly results from preceding packet operations as its input. Use cases that contain encryption schemes cannot easily be realized using OpenFlow. Another example is OAM support, which is discussed in more detail in Section 5.3. OAM endpoints typically have some specific time constraints on detecting and reacting on failure conditions. Those timing constraints can only be fulfilled such when all time critical components are deployed directly in the data plane. OAM processing implies a more advanced programming logic: generating packets, setting timers, running a finite state machine, and so on. An example is our BFD implementation done in SPARC that is generating test messages and runs timers for detecting test message losses. However, some use cases force us to define a processing logic beyond the existing packet manipulation framework. OAM is a typical use case here, where the OAM endpoint typical injects and removes test messages and runs some internal timer based logic for detecting loss of such test messages. The various OAM frameworks typically define sets of different OAM types, e.g. pure connectivity checks for testing the physical links, or OAM associations that check the proper configuration of all flow tables of datapath elements along a specific flow path. For such non-packet related processing, an additional processing framework based on virtual ports seems quite useful. Figure 19: Actions vs. Processing Entities Thus, we end up with three groups of processing requirements and means to implement these processing types: 1. State-less, lightweight packet processing (supported already since OpenFlow version 1.0) 2. State-full processing that stores and takes into account the history of preceding packets.  Solution approach: We propose Processing Instances and a new action named Process. 3. Parts of complex state machines requiring execution directly on the datapath to meet strict timing constraints.  Solution approach: We revisit virtual ports and discuss their applicability for the use case OAM. We discuss state-full packet processing actions in the next subsection and re-consider the virtual port concept as nonpacket processing related processing framework afterwards. 5.1.3 State-full Packet Processing and Action Process We propose the following extension to the OpenFlow packet-processing framework: We define processing instances that are persistent (or at least long living) entities following a CRUD approach, i.e. they are explicitly created, updated, and deleted on/from the datapath very similar to group table entries as defined by OpenFlow 1.1. A processing instance contains a specific packet processing logic and maintains state across several consecutive packets, i.e. it may store an entire flow packet history. It acts as a packet filter, i.e. it filters and processes packets. When leaving the processing instance, packets are re-injected into OpenFlow’s existing action execution logic. Loading and configuring a processing instance’s internal logic is out of scope of OpenFlow: a proprietary API may co-exist with the OpenFlow interface. Similar to group table entries, a processing instance obtains a unique identifier for reference purposes. © SPARC consortium 2012 Page 42 of 129 WP3, Deliverable 3.3 Split Architecture - SPARC In addition to processing instances, we define a new action for OpenFlow named ActionProcess. An ActionProcess filters a packet through the processing instance referred to by the Processing Instance ID (ProcInstID) stored within ActionProcess. Figure 20: ActionProcess and Processing Instances Action “Process”: When executed, this action redirects packets to a specific processing entity instance. An Action “Process” decouples a Flow-Mod entry from the processing entity. All OpenFlow 1.1 action-specific constraints also apply to the Action “Process” i.e., only a single action of any type may reside within an ActionSet. OpenFlow 1.1 defines an ordered list of actions, i.e., all actions within an ActionSet must be reordered according to this ordering policy before actually executing the ActionList. All “Set” related actions (lightweight processing) are executed after decrementing TTL values (position #5) and before any QoS-related actions are applied (position #7), therefore Action “Process” should occur before or after position #6. No restrictions apply to using Action “Process” inside a group table entry. 5.1.4 Advanced Processing using Virtual Ports Beyond plain packet processing, some use cases require a more advanced processing framework. As an example, consider OAM related use cases, where specific probe messages are interleaved with the existing stream of packets for testing proper operation of a specific (stitched set of) link(s). A similar problem arises in the PPPoE/PPP related use case, where LCP-ECHO request/reply exchanges monitor the state of a specific PPPoE/PPP session. Due to the strict timing constraints of most OAM schemes, we aim towards deploying all of its time critical parts on the datapath element and keep only non-critical parts in the control plane’s slow path. Consider two typical examples of OAM schemes: a connectivity check (CC) scheme that tests the physical link between two datapath elements, and an OAM association that monitors proper configuration of flow tables along a specific flow path. Figure 21 depicts both scenarios: for the connectivity check use case we want to avoid passing our test messages through the datapath element’s forwarding engine. In case of a loss indication the OAM association endpoint could not determine, whether the problem is caused by the OF forwarding engine or the physical link. Thus, we propose to introduce pre-/post-filters attached to a physical in-/out-port (see part (a) in Figure 21 for details). A filter actually defines a (set of) flow-match(es) and extracts all matching packets from the packet flow. An attached virtual port consumes these packets and takes appropriate actions. A typical example for such a pre-filtering virtual port is an IEEE 802.1ag compliant OAM scheme, where we want to redirect all Ethernet frames with a specific EtherType (e.g. 0x8902) to the virtual port. © SPARC consortium 2012 Page 43 of 129 WP3, Deliverable 3.3 Split Architecture - SPARC Figure 21: Virtual Port Concept However, we might want to deploy other OAM schemes with a different scope for testing physical connectivity as well as existence of specific flow table entries along a flow path. In such a scenario, a virtual port should also stress the flow tables on the first and last datapath element. We define a terminating virtual port (see Figure 21 part (b) for details) that behaves like a physical port. All packets generated by this virtual port are actually traversing the datapath element’s forwarding engine and thus, test the datapath’s internal forwarding logic including flow table entries. However, for enabling true fate sharing among data and OAM packets, all OAM packets must also use the flow table entries defined for normal data packets. 5.1.5 Split State Machines and an Event/Action API We have discussed briefly some constraints arising from OAM related use cases for an advanced processing framework. Due to the tight timing constraints, some monitoring must occur within the data plane. For the PPPoE/PPP example, the LCP-OAM functionality if part of a larger state machine, whose dominant part is running within the control plane while the time critical OAM part executes for each PPP session within the data plane. This example defines a split finite state machine, i.e. some parts of a state machine are executed in the data plane, while the remaining parts execute in the control plane. As both are parts of a now split state machine, we need some means to synchronize state between the two sub-state machines. This problem of synchronizing split state machines may occur either for state-full action-based processing (see Section 5.1.3) or virtual ports as discussed in Section 5.1.4. A processing entity is in principle logically split into two halves (see Figure 22): A top handler resides in the control plane while the bottom handler is located within the datapath element. For synchronizing state both handlers exchange events and actions among each other, effectively initiating state transitions. Note that both top and bottom handlers may be NULL handlers, i.e., all processing is done entirely either in the control plane (bottom handler is empty) or no controlling instance exists at all (top handler is empty). Control-plane-only processing moves all packets of a flow into the “slow path,” thus presumably degrading the flow’s forwarding performance significantly. Figure 22: Proposed logical OpenFlow architecture © SPARC consortium 2012 Page 44 of 129 WP3, Deliverable 3.3 Split Architecture - SPARC We propose a generic event/action API for signaling events and state transitions as well as required actions between control and data plane. The API should be usable for both virtual ports as well as processing instances. Some mandatory ingredients are:       A namespace for identifying processing instances as well as virtual ports A namespace for identifying a state machines counterpart uniquely in the control plane, A set of OpenFlow protocol messages for sending notifications for actions, events, and state transitions A data model to share the same state machine between data plane and control plane An encoding for events, actions, and state transitions, probably based on the Specification and Description Language (SDL) [67] or some other appropriate language Management messages to deploy parts of a split finite state machine on a datapath and attach this to its counterpart in the control plane We leave a final proposal for further study. 5.1.6 Defining New Action Types at Run-Time Beyond a plain extension of the data structures and message formats defined by OpenFlow, adding support for a new protocol entails also changes to a number of functional elements in a datapath element, e.g. the packet parser, the matching logic within the OpenFlow pipeline, and the packet processing logic implemented by the datapath element. OpenFlow defines an abstract view on a datapath element and its capabilities, effectively decoupling the logical from the various physical architectures adopted by hardware manufacturers. For building high performance hardware based switching/routing devices, the hardware manufacturer maps the logical architecture on the available hardware elements. ASIC-based designs impose additional constraints for extensibility, as they restrict extensibility by their internal design and the set of supported protocols that has been defined by the chip set manufacturer [17][18]. However, network processors have become more powerful in recent years and may perform advanced processing tasks at adequate speed. On the other hand, Intel is promoting chip sets for implementing fast network elements using their Intel I/O Acceleration Technology [68], so fast general purpose processing units are (or at least will be soon) available. Such programmable processing environments allow us to rethink the processing framework. 1. Since version 1.2, OpenFlow supports a TLV-based framework for defining matches as well as actions (see ActionSetField in the specification). This enables a datapath element to reveal per-table capabilities in terms of available actions and matches. The available namespace of OpenFlow extensible match fields is 23bits wide, where 16 bits contain a class identifier. Currently, exclusively assigned to ONF members, this namespace allows for a wide variety of additional header fields. While we can add new actions with means like firmware upgrades, the set of available processing capabilities remains static. 2. Exposing new non-standardized actions is a useful extension to the current OpenFlow specification. However, in the light of the generally programmable processing units entering the networking market, programming a datapath element with new, yet unknown actions and matches, seems feasible. The datapath model as defined by the ONF hides any datapath specific details. We propose to extend the OpenFlow data model with a general processing environment. While the control plane developer defines new actions and matches in the language of this virtual machine, a hardware manufacturer will map these instructions to the real hardware setup. Based on our recursive architecture (please refer to Section 4 for details), hybrid designs may couple fast ASIC-based switching chip sets and programmable processing units within a domain. Such a domain may expose itself as a single datapath to higher layers, while its internal logic is redirecting flows based on their processing capabilities within the domain (i.e. the domain backplane). This domain controlling logic must define a constraint based routing, where the constraints are actually the processing capabilities (i.e. the supported actions and matches) of a specific datapath element within the domain. Making a datapath programmable by the control plane is the most flexible extension framework for OpenFlow. Such an approach requires the definition of a model for parsing and matching packets and for a virtualized processing. With such a design the control plane is able to specify new actions and the virtual machine within the datapath interprets and maps these new actions/matches to the underlying forwarding and processing backend. © SPARC consortium 2012 Page 45 of 129 WP3, Deliverable 3.3 Split Architecture - SPARC Figure 23: Programmable Datapath Model Figure 23 depicts the logical layers in a programmable datapath element. We can distinguish four layers above the hardware forwarding and processing layer:     The control plane designer defines a dynamic action using a programming language designed for network packet processing such as packetC or NFtables [58][66]. Some typical compound functions may reoccur frequently, like implementing a L3 or L2 routing/switching forwarding element, RIB-to-FIB mappings, etc. The datapath designer may specify a library of such compound functions with optimized hardware mapping and expose this to the control plane developers. In either case, the newly defined action results in a program based on the programming language. In Figure 23 a control module sends a code block (as either source code or in an intermediate representation) down to the datapath element. The OpenFlow management endpoint in the datapath receives the new action definition. Both the management endpoint and the virtual machine define a common layer, as it is agnostic with respect to the underlying forwarding and processing engine. This is called a Hardware Independent Convergence Layer (HICL). The virtual machine is responsible in interpreting/compiling the new action definition into a representation suitable for the hardware driver. A Hardware Abstraction Layer effectively decouples the hardware dependent driver and the virtual machine, i.e. both VM and hardware driver can be implemented and provided by separate vendors/manufacturers. The hardware driver defines the Hardware Dependent Convergence Layer (HDCL). The driver may use an arbitrary (open or proprietary) API for programming the forwarding and processing backend. The lowest layer is the forwarding and processing backend, e.g. ASIC; NPU, or CPU based ones. Such architecture is currently under discussion within the ONF. 5.2 5.2.1 Virtualization and Isolation What is network virtualization? Network virtualization as such is not a new idea; in fact, many existing implementations exist on multiple layers in the network stack. These existing techniques are applicable in many situations, e.g., to improve network resource utilization through sharing among different tenants, to provide a logical separation of traffic between different entities, to simplify network management, and to provide secure connectivity over untrusted networks. For example, to provide end-to-end, point-to-point or multipoint-to-multipoint connectivity, there are a number of different Virtual Private Network (VPN) techniques. There are many ways to create a VPN, for example, on top of layer 2, layer 3 or even layer 4 networks to provide the user with an interface that emulates a direct or routed/switched connection at the specific layer. For example, VPLS operates at layer 2 by creating MPLS pseudo-wire tunnels between a number of provider edge routers; these routers then provide a layer 2 interface to various customers and switch the layer 2 traffic over the tunnels. From the customer’s point of view, the endpoints act as if they are connected to a standard layer 2 switch. Similar techniques exist for providing connectivity on top of many types of networks, as well as © SPARC consortium 2012 Page 46 of 129 WP3, Deliverable 3.3 Split Architecture - SPARC providing different kinds of connectivity. Almost all possible combinations exist: Ethernet over IP, IP over ATM, ATM over MPLS, etc. One important aspect of VPN services is that the customer or user of the service has no control over how the service is actually implemented – he is only aware of the endpoint connections. From the provider’s point of view, the techniques used to implement VPNs allow him better network utilization, since multiple customers can share network resources, including both links and routers/switches. Using, for example, Virtual LANs (VLANs) on the Ethernet layer or Virtual Routing and Forwarding (VRF) on the IP layer, it is possible to create multiple forwarding tables inside a switch or router, forwarding traffic in different manners depending on the forwarding table assignment. This capability can be used either as a mechanism for providing services such as VPNs, or as a way of simplifying management of the network by splitting it into multiple separated domains. Assigning traffic to the different forwarding tables inside a router/switch can be done in several ways, for example by adding a tag to the traffic that identifies which forwarding table should be used – this is the case with VLANs. VRF does not use any explicit tag: With VRF, traffic is assigned to forwarding tables based on other criteria such as which link it arrived on. The link may be an actual physical link, or a virtual link implemented using a tunneling protocol such as GRE, IPSEC, etc. While there are many techniques for virtualizing parts of the network - the nodes, the links, and for creating virtual connectivity between end-points, typically they are applied separately and not as an integrated service. For example, it is common to lease virtual links for connecting different enterprise branches, but typically one has no control over routing and filtering over that link. OpenFlow, on the other hand, gives us the possibility of combining the existing techniques / concepts and provide a more comprehensive solution, namely a virtual network. A virtual network service would provide not only end-to-end connectivity, that hides all the details of how it is actually implemented, but would also give the customer complete control of how traffic is forwarded and handled inside the network, allowing him to treat it as a part of his own network. An environment such as a multi-tenant WAN or a fixed mobile-converged network, where multiple customers and competing service providers share a single physical network through network virtualization, imposes many requirements on the virtualization system. Not only must the system make sure that the traffic is separated between customers, ideally no information should be able to leak between the virtual “partitions” (unless previously agreed otherwise); it must also enforce any established SLAs. In addition, the system should be as flexible as possible in order to avoid costly interaction and coordination between the parties involved, thus reducing operational costs. In the following sections, we first investigate the derived requirements from Deliverable D2.1 and show how they are fulfilled by existing solutions for OpenFlow-based virtualization. We then investigate two aspects regarding virtualization – a) how to map or assign the customer’s traffic into and out of the virtual networks and, b) what options are available for implementing the virtualization system itself and what changes are required to the protocol as well as the OpenFlow switch model in order to implement it in a carrier-grade manner. 5.2.2 Requirements for the virtualization system In Deliverable 2.1 a number of requirements are derived and several of those apply directly to network virtualization, in particular these requirements: R-2 The Split Architecture should support multiple providers. R-3 The Split Architecture should allow sharing of a common infrastructure, to enable multiservice or multiprovider operation. R-4 The Split Architecture should avoid interdependencies of administrative domains in a multiprovider scenario. R-11 The Split Architecture should support best practices for QoS with four differentiated classes according to the definition documented by the Metro Ethernet Forum. R-16 The Split Architecture must control the access to the network and specific services on an individual service provider basis R-42 Data center virtualization must ensure high availability. While most of these are quite general requirements, they highlight some issues with the existing implemented OpenFlow virtualization techniques: the FlowVisor and Multiple Switch Instances (a type of software-isolated virtualization). These existing solutions are described in more detail in Section 7.3-7.4 of Deliverable D3.1, but a short description is provided here. The FlowVisor is a type of controller that acts as an OpenFlow protocol proxy, sitting between a number of controllers and switches, which forwards or denies OpenFlow commands from and to the controllers based on predefined policies. For example, the policies may restrict one controller to only send commands concerning a specific VLAN range and in that way virtualizes the network by restricting the view the different controllers have over the network. The virtual view © SPARC consortium 2012 Page 47 of 129 WP3, Deliverable 3.3 Split Architecture - SPARC is not only restricted to what the controller can see but also what rules they may install, or put in another way, it restricts the connected controllers to interact only with a well-defined part of the total flowspace. Using Multiple Switch Instances, the switch runs multiple OpenFlow instances, allowing multiple controllers to connect to a single switch. Each of these OpenFlow instances is restricted through configuration to be able to use only a subset of the ports on the switch, or a number of VLANs on a particular port. More detail about the impact of the different derived requirements on the virtualization system: R-2 – Multiple providers Using a shared network, where multiple providers each have the ability of controlling part of the network via their own virtual network instance, increases the requirements for isolation in the virtualization system compared to the situation where a single provider uses virtual networks as a means to separate services. In the latter case, for example, information leakage between virtual networks or unfair bandwidth sharing between virtual networks is not a major problem since the single provider already knows the leaked information and is only “stealing” bandwidth from itself. In the first case, both of these problems are major issues and thus the virtual networks need to be strictly isolated from each other. R-3 – Multiple services and multiple operators on a single link Running multiple services, each in a virtual network, does not put very heavy requirements on isolation in terms of information leakage or fair use of the control channel. However, good QoS support may be important depending on what the services are. This would include common QoS concepts like prioritization, traffic shaping, etc. Sharing a single link for multiple operators requires some way of distinguishing packets belonging to the different operators, with more flexibility than reserving whole links per operator. Either this could involve defining strict limits on what part of the address space each operator is able to use, or somehow marking the packets with for example operator-reserved VLAN tags or MPLS labels. R-4 – Administrative interdependencies When traffic enters the virtual network at the edge of the network, some mapping is necessary in order to map the incoming packets to the particular virtual network to which they should belong. This might be all traffic entering on a particular port, or traffic with some specific characteristics, such as certain VLAN tags, IP subnets, MPLS label ranges, etc. This is one area where flexibility is important in order to reduce the administrative interdependencies if, for example, two providers want to utilize or map the same VLAN range to their different virtual networks. R-42 – High availability Either a robust virtualization system should be easy to duplicate or protect by other measures in order to maintain high availability, or it should be constructed in such a way that, from an availability point of view, it makes no difference if it is there or not. Existing solutions have some problems meeting these requirements, as summarized in Table 4: SPARC Requirement Multiple providers Multiple services Multiple operators per link FlowVisor Multiple Switch Instances Level of support Poor Implementation-dependent Main reasons OpenFlow lacks support for proper data plane isolation, good control plane isolation is possible Data and control plane isolation may be good depending on the configuration and implementation Level of support Poor Poor Main reasons Lack of proper QoS support in OpenFlow Lack of proper QoS support in OpenFlow Level of support OK Poor Main reasons Only supports non-overlapping flowspaces Typically separate ports per operator or non-overlapping VLAN tagging © SPARC consortium 2012 Page 48 of 129 WP3, Deliverable 3.3 Administrative interdependency Split Architecture - SPARC Level of support OK OK Main reasons Flowspaces may not overlap, which requires coordination Typically separate ports per provider or non-overlapping VLAN tagging Additional single point of failure that has to be replicated Same as without multiple switch instances High availability Table 4: Existing virtualization solutions with OpenFlow vs. SPARC requirements As can be seen in this table, the current solutions do not provide adequate support for the requirements in many areas, therefore we have investigated what possibilities are available in order to create a system that fulfills the requirements to a higher degree. 5.2.3 Customer traffic mapping Regardless of how the actual virtualization system is implemented, there is a problem at the edge between the customer and the provider of the virtualized network. How should traffic be mapped from the customer network onto the virtual network? At the edge nodes of the virtual network the customers may each be connected to an interface of their own, or they may share an interface through some means, for example through some tunneling protocol or simply by using different parts of the address space on some layer. Additionally, this may not be consistent for all the customers’ connections to the network but may be different at various points of access. The virtualization system must be informed about how these connections are made in order to assign the incoming and outgoing traffic to the correct virtual network. The system may also require other virtualization related parameters such as the IP address, bandwidth requirements, priority of the traffic, and – depending on the virtualization system – what part of the flowspace should be reserved for each particular Virtual Network. In the current FlowVisor implementation, these parameters are specified in a policy file on a per-switch basis, which can be a very time consuming and inflexible way of configuring the network. There is software to simplify the process, for example, the Opt-In Manager which is a web interface that allows users to define a network-wide flowspace they wish to control and to forward their request to an administrator for approval. While this simplifies the process, it still requires manual intervention for each slice. One way of increasing the flexibility would be to allow automatic configuration either directly through the OpenFlow protocol itself or via an external protocol. Such a protocol could allow the network administrator to define some highlevel SLA parameters such as maximum total bandwidth, as well as some limitations on the total flowspace the customer is allowed to use. Detailed configuration could then be left to the customer’s controller, which automatically (depending on the applications running) could define the VNs he needs, within his assigned flowspace, as well as how his traffic should be mapped to these VNs. The same protocol could serve multiple purposes, but primarily it would be used to define how traffic should be handled at the edges of the network. However, it could also be used to specify the topology of the virtual network. For example, if the customer would prefer that the virtualization system abstracts the entire physical topology into an abstract node before presenting it to him, or if he would like a certain topology to be created using virtual links it could be specified using the same protocol. 5.2.4 Network virtualization techniques for OpenFlow A number of different models of how to perform virtualization in the context of OpenFlow can be imagined. All of these models contain three major parts that have to be virtualized: the control channel, the software part of the switch (the OpenFlow Instance, typically running on a general purpose CPU in the switch), and finally the hardware fast-path forwarding part of the switch. First, we will examine some different virtualization models before we delve deeper into the options available for the three parts of the system. Most essential to the virtualization process is some kind of translation / hypervisor unit that translates values between the real physical view and the different virtual views, a unit similar to a virtual memory management unit in computer architectures and the various kinds of hypervisors used to create virtual machines. As with computer virtualization hypervisors, there are many different options for how and where to implement it, but it has to be somewhere between the application logic and the physical fast-path hardware. In Figure 24 five different models are shown: 1. The FlowVisor approach, where the translation unit resides outside of the switches and is shared by multiple switches and multiple controllers. 2. Each switch runs a translation unit that distinguishes between different connected controllers and performs the translation at the protocol level inside the OpenFlow instance on the switch. © SPARC consortium 2012 Page 49 of 129 WP3, Deliverable 3.3 Split Architecture - SPARC 3. Each switch runs multiple OpenFlow instances and with translation done between each OpenFlow instance and the fast-path forwarding hardware. 4. Each switch runs multiple OpenFlow instances and even the fast-path hardware has been split into several forwarding modules, one per OpenFlow instance. Translation is performed by restricting how ports are connected to the forwarding modules; the physical links may be split into multiple ports by using VLANs or MPLS labels, for example. 5. A model with multiple translation units, one responsible for virtualizing a single switch into multiple virtual ones, and another responsible for connecting multiple virtual switches and creating a virtual network representation. The first is responsible for virtualizing and isolating the switch resources, while the second connects to each virtual switch to create a virtual network topology, e.g., by presenting multiple switches as a single switch and managing tunnels that are presented as physical links similar to what is done in [29] and [30]. 6. Figure 24: Different virtualization models for OpenFlow. 5.2.4.1 Control channel isolation To ensure isolation between the control channels, the network that connects the controller to the switches (be it out-ofband or in-band) has to support some kind of rate limiting to prevent one virtual network from using all the bandwidth on the control channel, e.g., by forwarding a large amount of packets to the controller. The same is true in the other direction; the different controllers should not be able to disturb each other by transmitting large amounts of data to the switch. In the simplest scenario, with only one switch and one controller, these two problems could be taken care of in the switch and the controller respectively by limiting the amount of data they are allowed to transmit. However, if one increases the amount of switches in the system, one will reach a point where the aggregate amount of (still limited per VN per switch) traffic from all switches is enough to cause disruption. Additionally, in both out-of-band and in-band cases, the control traffic may be competing with other traffic for network resources, for example different control traffic using the same out-of-band network, or data traffic in the case of in-band control channels. This traffic may be using its own flow control, like TCP, but this is not always necessarily true. If the only traffic on the control network is the different TCP connections between controllers and switches, the built-in flow control will react to congestion and limit the bandwidth use of each connection so that the TCP connections get an approximately equal share each. However, depending on how the different virtual networks are operating, they may likely have different bandwidth requirements for the control channel: One may rely heavily on sending data traffic to the controller for further analysis, whereas another may only be sending OpenFlow commands over the control channel. In both the out-of-band and in-band cases, network-wide QoS reservations for the control traffic can solve the problem and provide fairness in the use of control channel resources. For example, a tenant that needs to analyze data traffic in the controller will probably consume more bandwidth than one that does not have that need – TCP flow control is not enough to satisfy such requirements. Depending on the virtualization model used, local (per VN in the controllers and switches), rate limiting may still be necessary. When using a FlowVisor, the control channels for the different VNs are multiplexed over a single TCP/SSL connection, so it is impossible for intermediate network nodes to look inside to differentiate between different flows and enforce QoS for the various controllers. On the other hand, if the switches allow the different controllers to connect and control the virtual networks using, e.g., different source or destination IP addresses or port numbers, the control traffic network can easily distinguish between the different connections and thus enforce QoS policies. © SPARC consortium 2012 Page 50 of 129 WP3, Deliverable 3.3 Split Architecture - SPARC Figure 25: A single TCP/SSL session used by a FlowVisor (left) vs one connection per controller (right). 5.2.4.2 Out-of-band vs. in-band control network An out-of-band control network has many advantages compared to the in-band counterpart in the OpenFlow approach. It is both simpler and easier to design in a reliable manner. However, out-of-band control networks might not be possible in some scenarios, for example in widely geographically distributed OLTs in access networks. Even if an outof-band network is possible, it would be more expensive than an in-band solution due to an entire extra network and extra ports on the hardware. Given these considerations, in-band control channels are as important as out-of-band channels, if not more. In an in-band scenario, the control channel is vulnerable to mis-configured flow table entries since all packets, including control packets, traverse them. For example, the in-band solution implemented by OpenvSwitch [31] requires the operator to pre-configure each switch with the neighboring switches’ as well as its own MAC addresses and the controllers’ IP addresses in order to install “invisible” FlowEntries that send control traffic to be processed by the local (non-OpenFlow) networking stack. This solution is quite fragile and configuration intensive, which could be operationally expensive in a large network. However, with a robust automated topology and controller discovery algorithm (see Section 5.5), even a quite complicated solution could be managed. It is important to note that such an automated discovery algorithm does not necessarily have to operate on all the virtual networks’ control networks – these can be managed by the virtualization system itself. In the in-band case, it is also very important that sufficient QoS support is available so that control traffic can be guaranteed priority over data traffic, in order not to lose control over switches in case of data traffic overload. This should be kept in mind when designing the QoS extensions discussed in Section 5.8. Additionally, QoS should not only be applied to control traffic entering the switch from the in-band channel, but also to outgoing traffic, without the traffic necessarily passing through the flow table when leaving the switch. This may require that an invisible / non-mutable high-priority queue is reserved for traffic from the local network stack. 5.2.4.3 OpenFlow state isolation By OpenFlow Instance we refer to the software part of the OpenFlow implementation of a switch that is running on the normal switch CPU which is responsible for processing OpenFlow messages and performing anything else that is not handled by the hardware itself (e.g., implementing some actions, handling counters). Depending on the virtualization model and the implementation, it may also contain a translation unit that has to keep the different connected controllers separated. However, running multiple OFIs as separate processes decreases the risk of them affecting each other, e.g., if misconfiguations that cause packets to go to the wrong VN. This could still occur in the datapath or in the translation unit, but the fewer ways the different VNs share, the less likely it should be. Additionally, running them as separate processes reduces the risk of problems spreading from one VN to another through other software bugs that could cause the software to crash. Depending on the implementation, an OFI may contain memory tables that could overflow, or it may use more can than its fair share of switch CPU when processing protocol messages.. If the switch operating system supports it, this can be limited through further isolation by a virtualization system, such as the lightweight Linux Containers [32], which enables strict management of switch memory and CPU usage by the OFIs. Local virtual ports can be used to send flows to the switch’s local networking stack, e.g., to implement an in-band control network. In order to support the multiple channels corresponding to each OFI, a predetermined range of Local port numbers could be used, one for each OFI. Alternatively, if the different OFI instances are multiplexed through a single Local virtual port using their respective IP addresses, it could lead to fairness and starvation issues. © SPARC consortium 2012 Page 51 of 129 WP3, Deliverable 3.3 Split Architecture - SPARC The translation unit within the OFI is responsible for mapping the virtual identifiers to their physical counterparts as well as enforcing the separation between the different virtual “identifier spaces.” The complexity of the translation unit depends on the choice of the virtualization model. For example, if per-VN encapsulation is used, similar to model (d) in Figure 24, the payload of the Packet_In and Packet_Out OpenFlow messages needs to be modified based on the port and VN they are associated with, and they should have the correct QoS markings applied for fair resource allocation. In contrast, correct translation and mapping of per-VN port numbers and port counters applies to all the virtualization models discussed in Figure 13. For example, for statistics monitoring, per-VN port counters could be implemented through per-VN flow table entries. One of the OFIs has a privileged role and has the authority to configure the translation unit as well as all the flow tables (without going through the translation unit). This process should be running at a higher priority than the others in order to increase the likelihood that the network owner can control the switch in case of issues with the other OFIs. Configuration of the flow tables can be performed through the existing OpenFlow protocol, which could be extended with, e.g., a vendor extension to also be able to define the VNs and configure the translation unit. Such an extension would contain, e.g., the VNID, the ports that should be included, which tables the VN should use and similar information. 5.2.4.4 Isolation in the forwarding plane In the forwarding plane several things should be virtualized and isolated from each other, firstly the actual entries in flow table(s), the physical links (or parts of them) and any processing modules (e.g., OAM modules as presented in Section 5.3, or other processing resources reachable by the Action Process presented in Section 5.1.3). The exact demarcation between the OpenFlow instance and the hardware is not very clear and depends on the implementation. There are two approaches to keeping the virtual networks separated on the link level: partitioning and encapsulation. With partitioning the link is split into multiple partitions by dividing the total flowspace into non-overlapping logical chunks (called “slices”). For example one can reserve a part of the total VLAN identifier range or IP address range for each individual Virtual Network. To ensure that this separation is effective, all the virtual networks must be partitioned based on the same part of the flowspace; one cannot create some partitions based on VLAN identifiers and some based on IP addresses since it may be ambiguous as to which virtual network a packet with both a VLAN and IP header belongs to. However, these restrictions are strictly “link local,” e.g., the same VLAN range could be reused for a different virtual network on another port on the same switch. Additionally one could apply translation to the field used to partition the flowspace between virtual networks. For example, if traffic belonging to two different virtual networks enters a switch from different ports but with the same VLAN, it is possible to translate one of them to a different VLAN identifier while traversing the network, and at the egress translated back to the original value. With encapsulation, some type of encapsulation separates the traffic when it traverses a shared link. Before transmitting a packet, each switch adds some kind of link-local encapsulation per virtual network; the encapsulation is used at the receiver to assign the packet to a certain virtual network and then removed before the packet enters the packet processing pipeline. This leaves each virtual network with a complete flowspace that has not been sliced to create separation, thus making the system more flexible at the cost of the extra processing needed to add and remove the encapsulation at each hop. It also requires the switches to support some kind of encapsulation protocol such as IPSEC, L2TP, GRE, MPLS, or PBB. Currently the only supported encapsulation method is MPLS, which is supported with the Ericsson extensions to OpenFlow Version 1.0 and by default from OpenFlow Version 1.1 onwards. When it comes to the flow table(s) there are multiple approaches as well, either partitioning or splitting. With partitioning the same restrictions are applied as in the case of the partitioned link sharing above. Each virtual network controller is restricted to insert flow table entries that 1) a have a match that is within the specific flowspace assigned to the particular virtual network, and 2) do not have any actions that would move the packet into a different virtual network, for example by changing the VLAN. Here again the approach has similar drawbacks as in the previous case – one needs to make sure that the partitioned flowspaces do not overlap, which causes a loss of flexibility. Splitting divides the flow table into multiple pieces, giving each virtual network access to a specific piece. This can be done either logically through some internal logic that is specific to the particular brand of switch, or by having physically separate tables. The logical split is of course more interesting since it is more flexible and cheaper (by not using multiple hardware components). With OpenFlow Version 1.1, a logical split can be constructed by restricting the use of the multiple tables available, for example by allowing each Virtual Network access to only ten tables instead of the full range. While the OpenFlow protocol cannot address more than 2 8 tables, this does not limit the potential number of VNs to 256. The flow table identifier has a local scope in each OpenFlow protocol session. If the switch hardware can support more than 256 flow tables, they can all be used through translation, but not by a single OpenFlow session. Processing units, like virtual ports and Action Process actions (described in Section 5.1.3); can be shared between virtual networks if they are able to distinguish between packets from different virtual networks. For example, in the case of the BFD OAM module discussed in Section 5.3.3, the module uses a Maintenance End Point Identifier (MEP-ID) to distinguish between different BFD sessions when transmitting and receiving packets. An identical MEP-ID could exist © SPARC consortium 2012 Page 52 of 129 WP3, Deliverable 3.3 Split Architecture - SPARC in several virtual networks and therefore it is necessary to combine the MEP-ID with some kind of virtual network identifier to create a unique session identifier. This most likely applies to all kinds of processing units, as well as other related things, such as packet counters, which also should be kept on a per-virtual-network basis and not (only) globally within each switch. So far, we have only discussed the ability to distinguish between different virtual networks and not mentioned enforcing fair use of resources. This could be handled through the standard QoS mechanisms (classification, metering and coloring, policing, marking, queuing and shaping) by applying them on a per-VN basis. However, support for QoS functionality is currently limited in OpenFlow, so we do not go into detail here but instead refer to Section 5.8. With the improvement suggested there, it would be possible to create guaranteed per-VN bandwidth profiles. 5.2.5 Improvement proposal By combining the ideas presented above, we can construct a complete virtualization system that provides a high degree of isolation, both in the datapath and on the control level, which is flexible and requires a low degree of interaction between the operators. The system presented here is shown in Figure 26 and is based on model c) in Figure 24. Figure 26: A combination of an encapsulated forwarding plane with flow table splitting, an in-band Ethernet-based control network and multiple isolated OpenFlow Instances. Translation is performed between the OpenFlow Instances and the fast-path hardware, and is configurable through a privileged OpenFlow Instance. The datapath is based on model c) in Figure 24 to which we add encapsulation-based link separation combined with flow table partitioning. This results in a very flexible datapath virtualization system. Isolation between the VNs is handled by applying the resource allocation and QoS mechanisms discussed in Section 5.8. This can be achieved by dedicating the ingress and egress flow tables on the datapath for this purpose. Link separation can be achieved, for example, through specific, predetermined MPLS labels, where at each link they are used to encode the VNID (MPLS replaces the existing EtherType with its own, therefore each VNID needs multiple MPLS labels assigned to it, one per EtherType used). The OFIs could be separated through the virtualization mechanisms of the OS, and they connect to the datapath hardware through the translation unit. The OFI that handles the Master Controller (MC), usually operated by the network owner, configures the translation unit as well as all the ingress and egress flow tables (shown as MC tables in Figure 26). This OFI should run at a higher priority on the local switch CPU so that the MC communications with the switch are given higher priority in order to handle special events such as failure scenarios. Configuration of the MC tables can be performed through the existing OpenFlow protocol, with minor restrictions enforced for each VN controller. The translation unit also needs configuration in order define the different virtual networks, and this can be achieved through non-OpenFlow channels or by OpenFlow protocol extensions containing the definition of a particular VN on a switch (e.g., the VNID, MPLS labels belonging to it, the ports belonging to it, etc.). The control network could be implemented as an in-band IP network and separated into two parts: the master control network and the virtual control channels. The virtual control channels are managed by the MC, which is responsible for routing and establishing QoS as well as reconfiguring the control network in case of failures or topology changes. etc. The master control network, however, needs to be bootstrapped through manual configuration or via a dynamic discovery protocol. © SPARC consortium 2012 Page 53 of 129 WP3, Deliverable 3.3 5.2.6 Split Architecture - SPARC Proof-of-concept implementation In order to test the various ideas described above a proof-of-concept implementation was constructed. However, the implementation does not exactly follow the improvement proposal since it was judged to be too difficult to implement within the timeframe of the project. Instead, the design is based on model b) of Figure 24. The major difference compared to the improvement proposal is that the translation unit is placed inside a single OFI instead of underneath multiple OFIs. The reason for this is that in the switch implementation the OFI and the “fast path” is difficult to separate. This modification does not impact the protocol extensions at all, and impacts the implementation of the translation unit very little. The proof of concept implementation consists of three parts: 1. An OpenFlow 1.1 capable software switch extended with a translation unit for virtualization 2. Vendor extensions to the OpenFlow protocol to install/remove a virtualization profile from a switch’s translation unit as well as to identify the identity of the controller during the protocol connection handshake 3. A NOX application that reads a description of VNs from a file and based on the descriptions installs flow table entries and group entries to the all the involved switches as well as programming the translation unit in each switch The overall function can be seen in Figure 27. On the left side the Master Controller can be seen connected to the physical OpenFlow switch topology, as well as the topology “seen” from the controller. The Master Controller (using a Virtualization Manager application) then installs a reduced virtual topology to the switches. The installation phase consists of programming the translation unit on each switch as well as installing a number of flow table entries and group table entries. On the right side a Virtual Controller is connected to the physical topology, however, it is only able to “see” the virtual topology since the translation unit is hiding and/or translating all relevant messages. Figure 27: The Master Controller (left) sees the full physical topology. The Virtualization Manager running on the Master Controller configures the switches with a virtual topology. On the right side of the figure, a Customer Controller connects to the switches and is only able to see the virtual topology configured previously. Dataplane virtualization Dataplane virtualization is performed through VLAN-tagging on shared links and by reserving a number of flow tables per virtual network inside each switch. Since OpenFlow 1.1 supports multiple VLAN-tags per packet this does not interfere with any existing customer VLANs. In Figure 28 the processing pipeline for three virtual networks within a switch is shown. © SPARC consortium 2012 Page 54 of 129 WP3, Deliverable 3.3 Split Architecture - SPARC Figure 28: The various virtualization and customer tables and groups inside a virtualized switch. Solid lined boxes indicate regular OpenFlow datapath resources, whereas dashed lines indicate additions to realize the proposed encapsulation-based virtualization system. When a packet first enters the switch on the left hand side of the figure, it first enters the Virtualization table. Here the packet is handled based on the incoming port: if it is a port shared by multiple networks the VLAN tag is examined, removed, and, based on the VLAN-ID, the packet is forwarded to the first of the tables reserved for the customer registered to that VLAN-ID. If the incoming port is dedicated to a particular customer the packet is immediately forwarded to the customer tables without any modification. Once the packet is inside the Customer tables whatever the customer has programmed the tables to do will be executed. There are no restrictions on the either the types of matches nor actions that the customer can apply here. This includes modifications to the packet, sending the packet to the controller, etc. When all customer tables have been traversed, the packet may enter one or more Customer groups, or it may go directly towards the outgoing port. Again, there are no restrictions placed on the customer groups, all actions and group-types are available for the customer to use. Finally, before leaving the system, the packet goes through a Virtualization group. Depending on which port the virtualization group is attached to, it either does nothing, which is the case if the outgoing port is a dedicated customer port. However, if the outgoing port is a shared port, a VLAN tag is pushed onto the packet and the packet is assigned to a per-customer QoS queue. After being queued on the port, the packet may finally leave the switch. Rule installation and translation unit The Virtualization Manager application is responsible for installing the virtualization rules in the virtualization table as well as creating the virtualization groups and configuring the queues. Once these have been configured the translation unit is programmed with the relevant per VN information such as which ports are customer ports and which are system ports, which flow tables and per-port queues belong to this VN. Once this has been done, a customer controller can connect and identify itself to the switch. The translation unit uses the information it has received in order to modify the OpenFlow messages sent to and received from the customer controller. For example, when a flow entry is installed, the translation unit modifies any “output” commands to go through the associated virtualization group instead of going to the port. The reverse process is performed when the controller requests information about a flow, if the flow refers to a virtualization group the messages to the controller is modify back to show an “output” action instead. Similar things are done for all state messages such as the one enumerating available ports, installed groups, available flow tables etc. In this fashion, only the information that the master controller has authorized can be seen by the customer controller. © SPARC consortium 2012 Page 55 of 129 WP3, Deliverable 3.3 Split Architecture - SPARC Operations and Maintenance (OAM) Tools 5.3 OpenFlow, as an attractive element of the open split-architecture platform, aims to eliminate the weaknesses caused by proprietary hardware design and supply, i.e., the single sourced system software supply closely coupled with equipment vendors. However, this open but immature protocol still has significant drawbacks compared to the closed, traditional protocols: It only supports partial configuration of the switches. One important aspect that cannot be provisioned through the current OpenFlow solutions is OAM. In earlier SPARC Deliverables (D3.1 and D2.1) OAM has been identified as an essential requirement for carrier-grade operator networks in order to facilitate network operation and troubleshooting. For instance, D2.1 proposes the following requirements: “For operation and maintenance, a number of functions are already widely defined and implemented. This includes identification of link failures, connectivity checks and loopbacks in the data plane. In addition, it should be possible to send test signals and measure the overall performance. Standards to be supported are ITU-T Y.1731, IEEE 802.3ag/ah and respective IETF standards for IP and MPLS. Interfaces of switches, routers, etc., provide additional information for monitoring and operation like link down, etc. This includes monitoring of links between interfaces as well.” Specifically, three requirements regarding Operations and Maintenance have been derived in D2.1: R-25 The Split Architecture should support OAM mechanisms according to the applied data plane technologies. R-26 The Split Architecture should make use of OAM functions provided by the interface. R-27 The Split Architecture shall support the monitoring of links between interfaces. We see a few challenges with OAM in a SplitArchitecture:    How to map traditional, technology-specific OAM elements to SplitArchitecture, i.e., how to integrate specific OAM tools (e.g., Ethernet OAM, MPLS OAM and IP OAM) into an OpenFlow-based SplitArchitecture. How to provide a generalized, technology-agnostic flow OAM for SplitArchitecture. Given that virtualization enables multi-operator scenarios, how can we support a multi-carrier service OAM and provide automatic OAM configuration in this environment? In the following subsections, we will discuss the challenges listed above. We will start by giving an overview of the most prevailing existing packet OAM toolset, namely Ethernet Service OAM (IEEE 802.1ag, ITU-T Y.1731)) and MPLS-TP OAM (BFD, LSP-ping or ITU-T Y.1731 based). We will discuss how the traditional OAM functions and roles map to our split-architecture design. As an example of how technology-specific OAM tools can be integrated, we will then outline the MPLS BFD solution as implemented in the SPARC prototype. Finally, we will go one step further and consider the case of a novel technology-agnostic flow OAM by providing an initial architectural solution. 5.3.1 Background: existing OAM toolset According to the state-of-the-art design, OAM toolsets are technology-dependent, i.e., they are bound to the specific data plane technology they have been developed for (e.g., Ethernet or MPLS(-TP)). As we will outline in the following subsections, the existing toolsets have very similar purposes, such as the detection of a connectivity loss or a violation of a delay constraint. However, the methods and architectures used to reach those goals are different, which makes the various OAM solutions incompatible. 5.3.1.1 Ethernet Service OAM Ethernet Service OAM is a key component of operation, administration and maintenance for carrier Ethernet based networks. It specifies protocols, procedures and managed objects for end-to-end fault detection, verification and isolation. Ethernet service OAM defines a hierarchy of up to eight OAM levels, allowing users, service providers and operators to run independent OAMs at their own level. It introduces the concept of a Maintenance Association (MA) that is used to monitor the integrity of a single service instance by exchanging CFM (Connectivity Fault Management) messages. The scope of a Maintenance Association is determined by the Management Domain (MD), which describes a network region where connectivity and performance is managed. Each MA associates two or more Maintenance Association Endpoints (MEP) and allows Maintenance Association Intermediate Points (MIP) to support fault detection and isolation. The continuity check protocol is used for fault detection. Each MEP can periodically transmit connectivity check messages (CCM) and track CCMs received from other MEPs in the same maintenance association. © SPARC consortium 2012 Page 56 of 129 WP3, Deliverable 3.3 Split Architecture - SPARC A unicast Loopback Message is used for fault verification. It is typically performed after fault detection. It can also confirm successful initiation or restoration of connectivity. Loopback Messages (LBMs) are transmitted by operator command. The receiving MP responds to the LBM with a unicast Loopback Reply (LBR). A multicast Link-Trace Message (LTM) is transmitted in order to perform path discovery and fault isolation. The LTM is transmitted by operator command. The intercepting MP sends a unicast Link-Trace Reply (LTR) to the originator of the LTM. The originating MEP collects the LTRs and provides sufficient information to construct the sequence of MPs that would be traversed by a data frame sent to the target MAC address. There are usually two types of Ethernet service performance management: Delay Measurement (DM) and Loss Measurement (LM). DM is performed between a pair of MEPs. It can be used for on-demand measurement of frame delay and frame delay variation. An MEP maintains the timestamp at the transmission time of the ETH-DM frame. LM is performed between a pair of MEPs, which maintain two local counters for each peer MEP and for each priority class – TxFCl: counter for data frames transmitted toward the peer MEP; RxFCI: counter for data frames received from the peer MEP. OAM frames are not counted. 5.3.1.2 MPLS(-TP) OAM MPLS-TP OAM is still intensely discussed by the community, and is thus documented in IETF drafts only. Generally, there are two types of MPLS-TP OAM under discussion: The first method is to enhance the available MPLS OAM toolset (LSP Ping and BFD) to meet the OAM requirements of MPLS-TP. LSP Ping provides diagnostic tools for connectivity checks of MPLS tunnels, testing both data and control plane aspects. LSP Ping can be run periodically or in on-demand fashion. Essentially, LSP ping verifies the status of a complete LSP by inserting “echo” requests into the MPLS tunnel with a specific IP destination address. Usage of a dedicated address prevents the packet from being routed further at the egress LSR of the MPLS tunnel. On reception of an “echo” request, the designation LSR responds by sending an “echo” reply back to the originator of the request. For MPLS-TP, LSP Ping should be extended with tracing functionality (traceroute i.e., link-trace). Furthermore, LSP Ping should support point-to-multipoint LSPs and should be able to run without an underlying IP. BFD (Bidirectional Forwarding Detection) is used for very fast proactive detection of data plane failures by connectivity checks. BFD is realized by “hello packets” exchanged between neighboring LSRs in regular, configurable intervals. If hello packets are not received as expected, a connectivity failure with the particular neighbor is detected. BFD packets can also be used to provide loopback functionality with replied received packets at the neighboring node. For working with MPLS-TP, BFD will be extended, e.g., for working without reliance on IP/UDP functionality. The second method is to develop MPLS-TP OAM based on Y.1731 Ethernet service OAM. The basic idea is that Y.1731 specifies a set of OAM procedures and related packet data unit (PDU) formats that meet the transport network requirements for OAM. The actual PDU formats are technology agnostic and could be carried over different encapsulations, e.g., MPLS Generic Associated Channel [59]. 5.3.2 Mapping OAM element roles to SplitArchitecture Extending SDN enabled network domains with OAM functions raises two core questions: An OAM solution may aim towards monitoring and supervising the internal integrity of an SDN domain without interacting with external networking domains. However, interworking among SDN domains and legacy domains may occur frequently, at least in migration phases moving legacy infrastructures towards SDN. As discussed previously, we face a wide variety of OAM toolsets in legacy domains tailored to providing services on their individual layer, be it Ethernet, MPLS, IP, etc. We investigate potential solutions for enabling interworking among OpenFlow based and legacy networking domains first. Beyond interworking with legacy OAM, a network operator may want to monitor and supervise internal operation and health of his SDN enabled domain. Here, the need for a dedicated SDN based OAM toolset arises. We investigate potential solutions for providing OAM functionality within an SDN domain in a following section. While the OpenFlow protocol itself does not support any OAM solutions, it does not prohibit adopting technologyspecific toolsets. For example, to support Ethernet OAM, Ethernet OAM modules could be configured in every node considering the control and management architecture presented in Section 3, as in the following figure: © SPARC consortium 2012 Page 57 of 129 WP3, Deliverable 3.3 Split Architecture - SPARC Controller N M S Config Point/ OAM mgmt Controller Split Architecture domain Ethernet OAM Ethernet OAM Datapath Datapath Ethernet OAM Datapath Figure 29: SplitArchitecture OAM Configuration This configuration works well with Ethernet traffic flows. However, a main feature of an OpenFlow-based SplitArchitecture (e.g., OpenFlow) is the support of various traffic flows such as Ethernet, MPLS or IP. Thus, in the case of MPLS traffic flows, the Ethernet OAM module would become ineffective and an additional MPLS OAM module would have to be added to the datapath elements. As a result, datapath elements would contain different OAM configuration modules and OAM generator/parsing modules for different traffic flows. This contradicts to the original notion of SplitArchitectures having simple datapath elements, and this type of OAM integration would significantly increase the complexity of data forwarding devices. An additional problem with multiple OAM modules for different flow technologies is their different configuration models, which further complicates both datapath and control elements. Furthermore, OpenFlow allows routing and switching of packet traffic that does not strictly follow Ethernet switching, MPLS forwarding, or IP forwarding. The OpenFlow switch allows “flow switching,” i.e., switching is done based on arbitrary combinations of bits in the flowspace, which currently mainly consists of the Ethernet, MPLS and IP header fields. Finally, a general problem for OAM integration into current OpenFlow scenarios is the lack of an explicit OAM indication, which allows dispatching of OAM packets without interfering with OpenFlow match structures. This section is devoted to proposing improvements to the OpenFlow switch model to implement OAM feature sets. During the design phase, we were faced with two contradictory aspects:   Compatibility with legacy OAM toolsets. Avoiding the need of developing and running dozens of toolsets at the same switch. The first aspect is relevant when such logical connections are monitored that span OpenFlow and non-OpenFlow domains, as done in the integration of IEEE 802.1ag OAM to OpenFlow by SARA Computing & Networking Service [20]. In such cases, the data plane switch must implement all technology-specific OAM features of the data layer. Considering the monitoring of OpenFlow domain internal connectivity only, this former aspect becomes irrelevant. As the datapath in OpenFlow domain is layering-agnostic, the goal is to provide a unified flow OAM using only a single OAM module in order to monitor all flows in this specific domain. In the upcoming two subsections we will present improvement proposal methods for both, technology-dependent and technology-agnostic OAM toolsets. As an example of a technology-dependent solution, we present MPLS BFD for OpenFlow, which has also been implemented as part of the SPARC demonstrator. The second, more novel solution is technology-agnostic and provides a unified, generic OAM module for all flows. This solution is outlined only as an architectural concept, and many details relevant for implementation are still the subject of future work. 5.3.3 MPLS BFD-based Continuity Check for OpenFlow OpenFlow MPLS BFD-based continuity check (CC) uses the virtual ports concept. A virtual port is identified with a port number within the port table exactly like a physical port, but additional actions not part of the standard OpenFlow ActionSet may be performed on the packet. Virtual ports can be chained. A virtual port chain on the input side resides between the input port where the packet enters and the first flow table, whereas a chain on the output side starts after the flow table and ends at a physical port where the packet exits. The output side chains can be addressed by entries of any flow tables. OpenFlow 1.0 included a few built-in virtual ports for actions such as forwarding a packet to the controller. But those virtual ports are reserved for specific purposes and cannot be configured through the OpenFlow interface. An © SPARC consortium 2012 Page 58 of 129 WP3, Deliverable 3.3 Split Architecture - SPARC extension is defined for OpenFlow 1.0 that allows the dynamic creation and configuration of virtual ports in order to implement MPLS-related packet manipulation. To reduce the amount of resources (e.g., timers) needed to generate monitoring packets, the OpenFlow MPLS protection switching makes use of the group concept to perform multicast. The protection switching also uses this feature to replicate single monitoring packets and then send them, after being modified to identify a single BFD session, out through multiple ports. Since the standard OpenFlow 1.0 neither supports the group concept nor includes any other multicast method, the group concept has been added to the MPLS-enabled OpenFlow 1.0 switch implementation. 5.3.3.1 Ingress-side BFD control packet insertion into the MPLS G-ACh control channel Figure 30 illustrates how BFD control packets are generated and inserted into LSPs on the ingress side. A virtual port represents a BFD packet generator. The BFD packet generator generates a BFD packet template with a configured interval, for example each second, each millisecond, etc (see  in Figure 30). The BFD template packet does not contain any information related to a particular BFD session; however, the packet generator fills in the timing fields of the packet. The BFD packet template also contains the required associated channels (ACH) TLVs (with empty values) and the generic associated channel label (GAL) (see [59] for further details) on top. The switch input source port is set to the BFD packet generator representing the virtual port number upon input to the flow table. Figure 30: Ingress-side OAM signaling generation The template packet is sent to the flow table for matching (see  in Figure 30). The flow table has been programmed with a flow rule that matches the incoming BFD packet template, specifically the “In Port” field matches the virtual port number of the BFD packet generator, and the MPLS field matches the GAL label number (13). The flow action forwards the packet to a multicast group that handles packets with the same generation interval as the BFD packet generator. If there are two packet generators with different intervals, the packets generated by them are forwarded to different BFD multicast groups (see  in Figure 30) based on the differing input port numbers. When the multicast group receives a packet, it replicates it and forwards it to its configured output ports. These virtual ports are field-modifier virtual ports (see  in Figure 30). The field-modifier virtual port fills in the missing fields currently empty in the packet, i.e., the BFD packet fields for the particular BFD session: the timer values, the MEP-ID and the state variables, such as remote defect indication (RDI) [60]. Thus, this virtual port represents the source functions of the MEP (source MEP). The MEP-ID is used to do a lookup in an associative data structure that contains all the BFD session information (state variables, descriptors, output virtual port number, etc.) for all the BFD sessions on the egress side of the LSP. Once the BFD packet has been completed, the packet is forwarded to an MPLS tunnel ingress virtual port, where the BFD control packet is treated in the same fashion as incoming user data traffic: The MPLS label for the LSP is pushed onto the label stack and the packet multiplexed into the MPLS LSP through the output physical port. © SPARC consortium 2012 Page 59 of 129 WP3, Deliverable 3.3 5.3.3.2 Split Architecture - SPARC Egress-side BFD control packet processing Figure 31 illustrates how BFD packets are processed on the egress side of the LSP. Incoming BFD packets enter the switch like any other packet on the same MPLS LSP (see  in Figure 31). An incoming MPLS packet is matched in the flow table, and the packet is forwarded to an MPLS tunnel egress virtual port (see  in Figure 31). This port is configured to pop the top label and examine any labels still on the stack. If the new top label is the GAL label, the packet is removed from the regular packet processing pipeline, but passed to the G-ACh module within the virtual port. The G-ACh module examines the channel type of the packet and the packet is sent to a Link Failure Detection module depending on the channel type (see  in Figure 31). The Link Failure Detection module performs a lookup in the BFD sessions’ associated data structure storage using the MEP-ID found in the BFD packet as the key (see  in Figure 31). If a session is found, the associated BFD session data and failure timer for the session are updated appropriately. Figure 31 : Egress-side OAM signaling reception 5.3.3.3 BFD transmission timer updates During its lifetime a BFD session uses at least two different transmission timer intervals, one before the session has been established (longer than 1 second) and another (depending on the failure detection resolution) once the BFD session has been established. This requires that a BFD session is able to move between different BFD packet generators. To reiterate, each BFD session common timer value is represented by a BFD template packet generator and a corresponding Multicast group per timer value, and each particular BFD session is represented by a field modifier virtual port. In order to move a BFD session, when the timer changes, the multicast groups has to be reconfigured during operation in order to change the actual rate of template packets going to the field modifier ports. This design might seem overly complicated but it was done in order to reduce the number of individual high-resolution timers required by sharing them between BFD sessions with the same timing requirements. 5.3.3.1 Updated design for Openflow version 1.1 The implementation described above was originally designed for the 1.0 version of the OpenFlow protocol and datapath model. With the release of OpenFlow version 1.1, the BFD implementation was updated to make use of the new mechanisms available in the new version. OpenFlow version 1.1 introduced the concept of “Fast Failover” groups. These groups contain buckets with references to either other groups or ports, when a group receives a packet it will forward the packet to the first of these buckets that is alive. The liveness of a port or group is determined by a liveness flag that is set by an external entity. The new Fast Failover mechanism allowed for a less invasive design than the previous, since it is no longer necessary to modify the flowtable itself in order to perform the protection switching. This allowed us to design a more generic model, which required fewer modifications of the data plane as well as to the protocol itself. With the failover mechanism already in place we can move a large part of the BFD functionality into an external module and put a more generic OAM structure in place that could be reused for other OAM protocols apart from BFD. © SPARC consortium 2012 Page 60 of 129 WP3, Deliverable 3.3 Split Architecture - SPARC Figure 32: BFD implementation for OpenFlow version 1.1. Normal incoming packets are processed in the flowtable and forwarded to the Fast Failover group, which uses the first working LSP to forward the packets. Incoming OAM packets are sent through a channel for processing in the external module. The external module generates OAM packets and injects them directly in the LSP groups, whose liveness the module controls via a control channel. The updated design can be seen in Figure 32, in this design the OAM packets (in this case: BFD packets) are processed entirely by an external OAM process. The external process creates and sends them to the OpenFlow software switch using some type of data channel, in our case, we are using UNIX sockets as a low-latency and simple means to transport the packets between the processes but other channels such as UDP/IP or shared memory could be used. When an OAM packet arrives on a data channel, the switch forwards the packet directly into a group for processing, bypassing the normal packet pipeline that does packet classification and flowtable processing. When an OAM packet enters a group it is treated exactly like any packet that has gone through the normal pipeline and from that point shares fate with the normal data packets. When the switch receives OAM packets the same updated MPLS processing scheme that was used in the 1.0 version is applied, but instead of sending the packet for internal processing in the software switch itself, the packet is forwarded to the external OAM module with is responsible for decoding it and associating it with a particular OAM session. In case of failure the external OAM module uses a control channel to update the liveness of the affected group, which will cause traffic to switch to a different path in the Fast Failover group (in case one is available). In case of a failure, the same notification mechanism is used as in the previous version; however, since all OAM session information is in the external module, the notification to the controller only contains the new liveness state and the group number. 5.3.4 Technology-agnostic flow OAM Technology-specific OAM solutions, as described in the MPLS BFD example above, have the advantage of straightforward integration with legacy domains in order to provide end-to-end monitoring of connections spanning multiple domains (including OpenFlow domains). However, technology-specific OAM solutions do not take the diversity of the OpenFlow flowspace into account, and target only specific flow types (e.g., MPLS flows in the BFD example). In order to support OAM functionalities for the entire flowspace, this approach would require a number of separate OAM tools (e.g., Ethernet OAM monitors Ethernet Traffic; MPLS OAM monitors MPLS traffic; IP OAM monitors IP traffic), which might lead to an unacceptable increase of complexity in the datapath elements. Examining the different OAM technologies, we realized that they all have similar goals for the fault or failure in the network – detect, locate and report. In a SplitArchitecture domain, with decoupled control and data forwarding plane, different traffic flows (Ethernet, MPLS or IP) are all treated equally by the data forwarding devices – and differences are only relevant to the controlling elements. Then the interesting question arises: In an OpenFlow-based SplitArchitecture environment, can we decouple the OAM monitoring traffic from the actual flow traffic (Ethernet, MPLS or IP)? There are a few issues with decoupling OAM from regular flow traffic that need to be considered for a proposed solution. First, even if OAM traffic is decoupled, fate sharing needs to be ensured. Fate sharing means that OAM traffic © SPARC consortium 2012 Page 61 of 129 WP3, Deliverable 3.3 Split Architecture - SPARC needs to take exactly the same path as the actual flow traffic, i.e., it must go through the same link, the same node and use the same filtering table as the data traffic to be monitored. Secondly, there needs to be a way to explicitly mark OAM packets in order to enable the datapath elements to detect and handle OAM packets accordingly. Finally, some OAM may have technology-specific OAM requirements. Independent OAM module 5.3.4.1 We propose an independent OAM module that is not associated with data traffic. This independent OAM module only generates or parses OAM PDUs (Protocol Data Units). To create a “fate sharing” path between the actual data traffic flow and OAM flow, we propose a Flow ID encapsulation/decapsulation module. This Flow ID module is associated with actual data traffic flow to be monitored so that the OAM traffic will have the same Flow ID and pass through the same link and the same node as the data traffic flow. This general split-architecture OAM proposal has three advantages when compared to other solutions:    Ubiquity: One OAM module supporting different traffic flows. Granularity: Supporting many services from a single carrier or many services from multiple carriers. Uniformity: Simplified and standard configuration and provisioning process. : 1 2 Controller populates OAM encap/decap table to monitor Flow X / Controller Node B MIP Node A Ingress MEP Flow Classifier ... FWD Configuration: monitor Flow X with parameters from node A to Node C Output Classifier Flow ID Encapsulation/ Decapsulation FWD Node C Egress MEP Output OAM Module OAM Module 3 Generate OAM PDU template with relevant fields (OAM Protocol ID, MD, etc.) 4 FWD Output Flow ID Encapsulation/ Decapsulation Flow ID Encapsulation/ Decapsulation OAM Module Classifier 5 Flow with OAM indication is diverted to OAM module. OAM module processes the OAM PDU to detect if there are defects. Flow with OAM indication diverted to OAM module. Certain OAM info is processed in intermediate nodes (e.g. Loopback), or it is just reflected back to normal data path (e.g. CCM). Figure 33: Flow OAM Architecture Figure 33 contains a schematic overview of the proposed flow OAM, depicting the Flow ID module as well as the OAM module. To highlight the procedure of a typical OAM session including configuration, we list the necessary steps for an exemplary continuity check (CC) session: 1. The OAM configuration module in the controller receives a network management command to monitor traffic flow with certain parameters (Flow ID such as VLAN or MPLS label) from Node A to Node C. The OAM configuration module also manually or dynamically generates the corresponding MD (Maintenance Domain), MA (Maintenance Association), MEP and MIP. 2. The controller populates the Flow ID encapsulation/decapsulation tables of all nodes based on the traffic flow information, which is to be monitored; for example, if the traffic flow is MPLS, the OAM PDU must be encapsulated with the correct Flow ID (MPLS label). © SPARC consortium 2012 Page 62 of 129 WP3, Deliverable 3.3 Split Architecture - SPARC 3. The OAM module in Node A generates an OAM packet template and fills the fields of the OAM PDU, such as OAM Protocol ID, MD, MEP, etc. In this case, the OAM module will create a CCM OAM PDU. This OAM PDU is fed into the Flow ID module which encapsulates this OAM PDU with correct Flow IDs (e.g., Ethernet MAC, VLAN or MPLS label) based on the Flow ID encapsulation/decapsulation table. 4. When the intermediate Node B receives traffic flow from Node A, the traffic flow with OAM indication (e.g., OAM EtherType) will be guided to the Flow ID module. The Flow ID module will strip the Flow ID and send the OAM PDU to the OAM Module for further processing. In this case, OAM PDU is a CCM OAM PDU, thus the intermediate Node B will not process this OAM PDU. The OAM PDU will be sent back to the normal traffic flow path. 5. When MEP Node C receives traffic flow from Node B, the traffic flow will be matched against a default flow table which has a special flow entry to guide traffic flow with OAM Type (e.g., OAM EtherType) to the Flow ID module. The Flow ID module will strip the Flow ID and send the OAM PDU to the OAM Module for further processing. In this case, the OAM module recognizes it is the destination of this CCM OAM PDU. The OAM module finally processes the OAM PDU to detect if there is a problem. This is a general Flow OAM mechanism for SplitArchitecture. In practice, we can reuse all the existing implementation in IEEE 802.1ag and Y.1731. For example, the OAM PDU can be based on Ethernet OAM, whereas the actual traffic can be MPLS or IP or Ethernet. However, it is necessary to define an identifier for OpenFlow OAM indication. We suggest reusing the Ethernet OAM EtherType 0x8902, but a new special Flow OAM type or other fields in the OpenFlow match structure might also be used. Furthermore, since OAM is not part of current OpenFlow implementations [19], the selected OAM identifier thus needs to be considered by the OpenFlow parser (i.e., classifier) within each datapath element. Besides configuration routines, this modification of the OpenFlow classifier is also the major extension to the OpenFlow specifications required in order to implement the proposed flow OAM mechanisms. In order to support multi-service and/or multi-operator scenarios, the Flow ID module can be implemented as a service multiplex entity as in Figure 34. The Flow ID service multiplex entity has one General SAP (Service Access Point), and a number of multiplexed SAPs, each of them assigned to a single VLAN identifier, or a MPLS label, or a generic Flow Identifier depending on the configuration of the controller. Upon receiving an OAM packet from the General SAP, the Flow ID multiplex entity uses the identifier (VLAN ID, MPLS label, etc.) or a generic flow identifier to select one of its corresponding multiplexed SAPs to present the OAM PDU to the specific OAM module instance. Similarly, upon receiving an OAM PDU from a specific OAM module instance from the multiplexed SAP, the identifier associated with this SAP is added to the OAM PDU before going to the General SAP. Table 0 ... General SAP Multiiplexed SAPs SAP1 SAP2 OAM OAM Instance 1 Instance 2 .... SAPn OAM Instance n Figure 34: Flow ID Module For technology-specific OAM requirements which an Ethernet OAM PDU cannot satisfy, we may define new PDU types to extend the functionality. Furthermore, it would also possible be possible to base OAM instances on MPLS BFD instead of Ethernet OAM. The Flow OAM architecture in 5.3.4.1 assumes that an OAM indication is considered by the OpenFlow classifier, so that the OAM flow can be detected and diverted to the corresponding OAM instance in the common OAM module. In other words, the architecture proposed actually decouples OAM traffic from regular data traffic. For this reason, fate sharing between the OAM flow and the data flow, which is to be monitored, needs to be enforced. However, we have not yet given a detailed description of how to ensure fate sharing. We will therefore present two proposals of how to achieve fate sharing. One is to create a virtual data packet in datapath elements in order to test the forwarding engine. The other approach is to use the META data header in the internal flow processing of each node. © SPARC consortium 2012 Page 63 of 129 WP3, Deliverable 3.3 Split Architecture - SPARC Fate sharing with virtual data packets 5.3.4.2 The idea is to test the forwarding engine with a virtual data packet created from the information carried in the payload of the actual OAM packet. The next hop for the OAM packet is selected based on the forwarding decision made for the virtual data packet. Before exiting at the output port, the internal OAM logic strips off the virtual header and puts back the OAM information into the payload of the original OAM packet. Virtual DATA packet is handled by FWD engine like normal DATA packets DATA FWD FWD engine engine Classifier Classifier 0x8902 Output Output port port creates removes fwd table ofp_match OAM Virtual DATA packet is constructed from ofp_match structure carried by OAM packet dst MAC … src MAC Flow Actions DP#1 0x8902 Flow Actions DP#2 OFP header Fixed Flow OAM header Flow Actions DP#3 ofp_match … … Figure 35: Virtual Data Packet Virtual data packets enforce fate sharing, since OAM packets travel through the same links, the same node and the same filtering tables as the corresponding data traffic. We depict the process in Figure 35 and describe its details next: Figure 35       OAM packets (highlighted in green) arrive at datapath elements interleaved with data packets (yellow) for the corresponding flow on a certain interface. The classifier, which has been updated to recognize OAM packets based on some ID (e.g., EtherType 0x8902 as used by Ethernet OAM) diverts the OAM packet to the Flow ID encap/decap module and the corresponding OAM module instance (simplified as a single OAM module box in the figure). The OAM packet payload carries information about the packet headers used in the corresponding data packets (e.g. in the form of an OpenFlow match structure). The OAM module uses this information in order to create a virtual data packet whose header matches those of regular data packets of the specific flow. The newly created virtual data packet is inserted into the forwarding engine like any regular data packet. Since it has an identical header as regular data packets, it will also match the same rules and actions as corresponding data packets from the monitored flow. The OAM module collects the virtual data packet before it is sent out at an output port. The header of the virtual data packet is adjusted in accordance with the actions performed by the OpenFlow forwarding engine. The virtual packet header is stripped off and the header information is stored in the OAM packets payload. The OAM packet is forwarded at an output port as chosen by the forwarding engine or the controller. Thus, the OAM packet is again interleaved with the data traffic of the monitored flow. A flow OAM solution implementing the virtual data packet approach as depicted above would result in a single, generic OAM module in each datapath element, covering the entire flowspace as offered by OpenFlow, and at the same time would ensure fate sharing of OAM and data traffic. While this approach presents an initial concept for an OpenFlow OAM solution, we acknowledge that it has some limitations and is thus still the subject of ongoing discussions and future work. First of all, the separate handling of OAM packets at each datapath element could have undesired effects on OAM performance measurements (e.g., delay) or cause packet reordering. And second, this type of generic flow OAM works only within an OpenFlow domain, but does not provide compatibility with legacy domains (i.e., traditional OAM tools). However, legacy OAM needs to be taken into account in scenarios where an OpenFlow domain is adjacent to non-OpenFlow domains, as in the access/aggregation use case described in Deliverable D2.1. © SPARC consortium 2012 Page 64 of 129 WP3, Deliverable 3.3 5.3.4.3 Split Architecture - SPARC Fate sharing with metadata encapsulation A different solution would be to add additional OpenFlow metadata of fixed length to each packet inside an OpenFlow domain. In order not to interfere with current standards, there are two options – either we add the additional metadata at the beginning of the packet or at the end of the packet. Since packets are of variable length, stripping bits off the packet may be easier at the beginning of the packet (i.e., before any protocol header). The OpenFlow metadata is always stripped by each OpenFlow datapath element for inspection by a module in the node, and added again to the packet on the line out. It is therefore not interfering with the switching in any way, but it enables additional management of flows without interfering with standardized protocols. OpenFlow switches which do not follow this OAM standard must perform the strip and add operation transparently (without processing the contents of the header). Adding OpenFlow metadata to packets essentially corresponds to adding an extra encapsulation layer within OpenFlow domains, which would allow indication of special traffic (such as OAM or in-band controller traffic) without the need to interfere with the OpenFlow match structure. The disadvantages of this approach are that all switches must support the stripping operation. Furthermore, the OpenFlow domain would have a decreased effective MTU size. Advantages are simplicity and that all networked packets with the same headers (including OAM packets) follow the same path through the network, i.e., fate sharing is easy to ensure. Furthermore, all the processing of the metadata can be done in parallel to the switching operation. For example, a 16-bit OpenFlow metadata field preceding packets in an OpenFlow domain could be specified as follows:     5.4 Packet class (3 bits) o (000) Data packet o (001) OpenFlow control packet (e.g., for in-band OpenFlow) o (010) OAM packet o (011-111) currently undefined Protection class (3 bits) o (000) unprotected flow o (001) restored flow o (010) 1:1 protected flow o (011) 1:1 protected flow with restoration after dual failure o (100) 1+1 protected flow o (101) 1+1 protected flow with restoration after dual failure o (110-111) currently undefined Currently unused (9 bits) Parity check bit (1 bit) Network Resiliency Network resilience is the ability to provide and maintain an acceptable level of service in the presence of failures. Resilience is achieved in carrier-grade networks by first designing the network topology with failures in mind in order to provide alternate paths. The next step is adding the ability to detect failures and react to them using proper recovery mechanisms. The resilience mechanisms that are currently used in carrier-grade networks are divided into two categories: reactive restoration and proactive protection. In the case of protection, redundant paths are preplanned and reserved before a failure occurs. Hence, when a failure occurs, no additional signaling is needed to establish the protected path and traffic can immediately be redirected. However, in the case of restoration, the recovery paths can be either preplanned or dynamically allocated, but the resources needed by the recovery paths are not reserved until a failure occurs. Thus, when a failure occurs, additional signaling is needed to establish the restoration path. We divide this section into two parts: The first describes data plane failures managed by an out-of-band control plane; the second describes failure management when the failure is located in the control path. © SPARC consortium 2012 Page 65 of 129 WP3, Deliverable 3.3 5.4.1 Split Architecture - SPARC Data plane resiliency The failure can be detected in OpenFlow by a Loss of Signal (LOS). This causes an OpenFlow port to change the state from up to down. The OpenFlow port is the port bounded to the OpenFlow instance to transmit and receive packets. This mechanism only detects link-local failures, for example, it says nothing about failures in forwarding engines on the path. As the link-local failures can be used to detect failures in restoration, the available detection method can be used in the restoration mechanism. However, since path protection requires an end-to-end failure detection, we require an additional method to detect those failures. As explained in Section 5.3.3, BFD can be used to detect end-to-end failures on a path in a network. The same protocol can be used to trigger protection in OpenFlow networks. Fast restoration in OpenFlow networks can be implemented in the controller. It requires an immediate action from the controller after a notification of a change in a link status. Failure recovery can be performed by removing the existing flow entries affected by the failure and installing new entries in the affected switches as fast as possible following a failure notification. The restoration mechanism can be seen in Figure 36 A, which consists of the OpenFlow switches A, B, C, D and E. Assuming the controller knows the network topology, we can calculate a path from a source node to the destination node. In Figure 36 A, the controller first installs the path <ABC> by adding the flow entries in the switches A, B and C. Once the controller receives the failure notification message of the link BC, it calculates the new path, <ADEC>. For the OpenFlow switch A, as the flow in the flow entry for the working path <ABC> and restoration path <ADEC> is identical but the action is different for OpenFlow switch A (i.e., to forward to the switch B or D), the controller modifies the flow entry at A. In addition, for the restoration path <ADEC>, there are no flow entries installed in the nodes D, E, and C, so the controller adds these entries in the respective switches. The flow entry in C for the working path <ABC> and restoration path <ADEC> is likely different since the incoming port is assumed to be a part of the matching header in the flow entry. Once all the affected flows have been updated/installed in all the switches, the flow is recovered. After the immediate action of restoration, the controller can clean up the other nodes by deleting the flow entries at B and C related to the older path <ABC>. Deleting (B) Flows One Flow Entry (B) Deleting Flows (C) Working Path Modifying Flows (C) One Group Entry One Flow Entry Working Path Adding Flows (A) Restoration Path (A) Protection Path (E) Adding Flows Adding Flows (D) One Flow Entry (Working Path) (D) One Flow Entry (A) Restoration One Flow Entry (Protection Path) (E) One Flow Entry (B) Protection Figure 36: Recovery mechanism for OpenFlow networks The total time to recover from failure using restoration depends on: 1. The amount of time it takes for a switch to detect a failure. 2. The propagation delay between the switch and the controller. 3. The time spent by the controller to calculate the new path. 4. The time spent by the controller to transmit the messages to modify/delete/add flow entries in the switches. 5. Time spent by the switches to modify/add the flow entries. Our experiments carried out in WP5 Deliverable D5.2 have shown that the restoration time in such a recovery scenario depends on the number of flows to be restored via the alternative path. While the experiments showed that low-cost OpenFlow devices can restore traffic, its dependency on a centralized controller means that it will be hard to achieve 50 ms restoration in a large-scale carrier-grade network. In fact, the restoration time increases linearly with the number of flows, which indicates that the total recovery time is actually dominated by the flow specific messages and modifications listed above (i.e. list items 4. and 5.). © SPARC consortium 2012 Page 66 of 129 WP3, Deliverable 3.3 Split Architecture - SPARC During the recovery time, i.e. the time between failure detection and complete restoration, packets may be lost. In order to reduce the packet loss resulting from a delay in executing the restoration action, we can switch over to the preestablished protection. Protection eliminates the need for the controller to update the datapath elements for modifying and adding new flow entries in order to establish an alternative path. This is accomplished by precomputing the protected path and provisioning it along with the working path. In this case, recovery is fast and the total recovery time is constant, i.e. it does not depend on the number of affected flows (as shown in WP5 Deliverabe D5.2). On the other hand, protection requires more overhead in terms of additional “idle” entries in the flow tables. To implement a protection scheme, we can use the group table concept (specified for OpenFlow v1.1 ). Unlike a flow table, the group table consists of group entries which in turn contain a number of actions. To execute any specific entry in the group table, a flow entry forwards the packet to a group entry with a specific group ID. Each group entry consists of the group ID (which must be unique), a group type and a number of action buckets. An action bucket consists of an alive status (e.g., watch port and watch group in OpenFlow v1.1) and a set of actions that are to be executed if the associated alive status has a certain value. OpenFlow introduces the fast failover group type in order to perform fast failover without needing to involve the controller. This group type is important for our protection mechanism. Any group entry of this type consists of two or more action buckets with a well-defined order. A bucket is considered alive if its associated alive status is within a specific range (i.e., watch port or watch group is not equal to 0xffffffff). The first action bucket describes what to do with the packet under the normal condition. On the other hand, if this action bucket is declared as unavailable, for example due to a change in status of a bucket (i.e., 0xffffffff), the packet is treated according to a “next” bucket, until an available bucket is found. The status of the bucket can be changed by the OpenFlow switch by the monitored port going into “down state” or through other mechanisms, e.g., if a BFD session declared the bucket as unavailable. In our protection mechanism we used BFD to declare the bucket unavailable. The protection mechanism for OpenFlow can be seen in Figure 36 B. When a packet arrives at the OpenFlow switch (A), the controller installs two disjoint paths in the OpenFlow network: one in <ABC> (working path) and the other one in <ADEC> (protected path). The OpenFlow switch (A) is the node that actually needs to take the switching action on the failure condition, i.e., to send the packet to B on the normal condition and to send the packet to D on the failure condition. For this particular flow of the packet, the group table concept can be applied at OpenFlow switch (A), which may contain two action buckets: one for output port B and the other for output port D. Thus, one entry can be added in the flow table of the OpenFlow switch (A) which points the packet to the above entry in the group table. Since the group entries do not require any flow information, it can be added proactively in the OpenFlow switch (A). For the other switches, B and C for the working path, and D, E and C for the protection path, only a normal flow entry can be added. Thus in our case, the switch in C contains two flow entries, one for the working path <ABC> and other for the protecting path <ADEC>. Once a failure is detected by BFD, the OpenFlow switch (A) can change the alive status of the group entry to make the specific bucket unavailable for the action. Thus, action related to the second bucket, i.e., whose output port is D, can be taken when the failure is detected in the working path. As the flow entries in D, E and C related to the <ADEC> path are already present; there is no need to establish a new path in these switches once the failure is detected. 5.4.2 Control channel resiliency This subsection discusses recovery from a control channel failure, i.e., when the connection between the controller and the OpenFlow switches fails. Earlier, we considered the failure in data plane links, i.e., the links between the OpenFlow switches. However, because OpenFlow is a centralized architecture (relying on the controller to take action when a new flow is introduced in the network), reliability of the control plane is also an important issue. There are multiple options for control plane resiliency. One can provide two controllers, each on a separate control network, and when a connection to one controller is lost, the switch can switch over to the backup network. This is a very expensive solution. Control plane resiliency can also be obtained by having a redundant connection to the controller, where restoration and protection mechanisms can be applied in the out-of-band network. However, in some cases, it is difficult to provide redundant path for an out-of-band control network. Another option is to try to restore the connection to the controller by routing the control traffic over the data network, i.e., an in-band solution. When a switch loses connection to the OpenFlow controller, it can send its control traffic to a neighboring switch which forwards the traffic to the controller via its own control channel. This requires that the controller detects such messages and establishes flow entries for routing the control traffic through the neighboring switch. This solution is an intermediate step toward full in-band control. An effective scheme for control plane resiliency in carrier grade networks may be to implement out-of-band control until the failure occurs and switching to in-band control for switches that lose the controller connection after a failure. Thus, when the switch in out-of-band control network looses the connection, it can discover the controller via in-band control channel discovery, which is discussed in Section 5.5.1. © SPARC consortium 2012 Page 67 of 129 WP3, Deliverable 3.3 Split Architecture - SPARC Control Channel Bootstrapping and Topology Discovery 5.5 Current OpenFlow specifications do not describe how initial address assignment and control channel setup are performed. In this section, we discuss a method that facilitates automatic bootstrapping of the control network for datapath elements. The bootstrapping procedure for newly connected datapath elements requires at least three configuration steps: 1. Establishment of a data control network which will implement the IP connectivity required by the OpenFlow protocols (i.e. OpenFlow and OF-config) between the datapath elements and the controllers. 2. Assignment of connection identifiers for connecting the datapath element to an OpenFlow controller (or alternatively to an OF-configuration point if the recent ONF model is considered). The connection identifiers required are at least the local address and the address of the OpenFlow controller. If non-default values are used, this may also include transport protocols and port numbers. Assuming an IP based control network, network address configuration can be done via DHCP. 3. Instantiation of an OpenFlow (or OF-config) session with the OpenFlow controller through the control network. Once an OpenFlow or OF-config session is established, all further configuration and setup of the datapath element can be done remotely via the controller. In the case of a dedicated out-of-band control network, the first step (1.) is automatically satisfied. When this network is implemented with “legacy” network control (e.g., spanning tree or IGP) the other two steps, namely the address autoconfiguration (2.) and OpenFlow session establishment (3.) are realized with already standardized procedures and protocols. However, in the case of an in-band control network, the datapath elements need to be able to establish IP connectivity towards the network control in the absence of a node configuration protocol. In following subsections, we discuss bootstrapping of OpenFlow datapath elements in an in-band control network scenario. Furthermore, we will present our extensions to the topology discovery module that is implemented in the NOX controller. 5.5.1 Control-Channel Bootstrapping in an in-band OpenFlow network In an in-band control network, the control-plane traffic is sent in the same communication channel used to transport the associated user data or management traffic. An example of the In-band OpenFlow topology is shown in Figure 37 where the control-plane traffic of the switches B or C passes the same connection as the data-plane traffic. The fundamental principle of in-band control in OpenFlow is that an OpenFlow switch must recognize the control traffic in the data plane traffic without involving the OpenFlow controller. In order to support in-band control, the Stanford reference switch implementation also includes an in-band control plane. In addition to in-band control, the reference switch implementation implements a discovery module in its local networking stack. This discovery module runs a DHCP client to configure the IP address of the LOCAL port and to discover the IP address of the controller. Controller Port = 1 A Port = 4 Port = 2 B Port = 1 DHCP Server Port = 3 Port = 2 Port = 1 Port = 2 C Figure 37: In-band Network topology In-band control in a switch (with the reference 1.0 software) recognizes its own control messages but it does not recognize all the control messages from the other switches. In Figure 37, these messages are DHCP messages from the switches B and C, which are not recognized by in-band control of the switch A. Thus, when these messages reach to the switch A, these are forwarded to the controller for an action. Now if switch A connects to the controller via OpenFlow, the controller can reply the switch A for the action. We implement an application in the controller that can recognize these messages and can respond with an appropriate action. We call our implemented application as a bootstrapping application. In addition to this bootstrapping application, we modify in-band control of the switch such that in-band rules are applied to the packets only when the switch is not connected to the controller, using the reasoning that the controller should have complete control once it has established a connection with the controller. © SPARC consortium 2012 Page 68 of 129 WP3, Deliverable 3.3 Split Architecture - SPARC We propose our solution because all the switches in the OpenFlow topology need a connection with the controller. The current solution in the OpenFlow reference implementation connects the switch A to the controller but it is not able to connect the switches B and C to the controller (in Figure 37). Section 5.5.1.1 describes working of in-band control in the OpenFlow reference software; Section 5.5.1.2 describes the controller connection to the switch A with the OpenFlow reference software; Section 5.5.1.3 describes our contribution to in-band control. 5.5.1.1 In-band control module of the reference switch implementation In-band control in the reference switch is an application on the OpenFlow switch. It is like a controller application that receives PACKET-IN event and replies with PACKET-OUT or FLOW-MOD messages. The OpenFlow switch communicates with this application before transmitting the packet to the controller. The packets are sent to the controller only when this application is not able to handle the packet. In-band control in the reference switch performs MAC learning on the source address. It handles the following packets when the switch does not have a matching flowentry in the FlowTable. (1) All the packets with the LOCAL port as the incoming port (in_port). (2) All the packets with the destination MAC address (dl_dst) as the LOCAL port’s MAC address. (3) All the ARP packets from the controller i.e. the source MAC address (dl_src) as the controller’s MAC address and destination MAC address as the broadcast address. (4) All the packets that are sent to (or from) the controller. These are specifically TCP packets with dl_src as the controller MAC address and tp_src as 6633, or dl_dst as the controller’s MAC address and tp_dst as 6633 (Controller TCP port = 6633). This case may happens when the controller traffic is to (or from) other switches. The switches own control traffic follows (1) or (2) case. In the case (1) (above), in-band control floods the packet when the destination MAC address is the broadcast address. However, if the destination MAC address is not the broadcast address, it performs the MAC learning lookup. The performed action on packet is flood when MAC learning lookup does not know the output port for the destination. However, if it knows the output port then the flow-entry is added in the flow-table and packet is forwarded according to the flow-entry. In the case (2), in-band control performs MAC learning on the source address and forwards the packet to the LOCAL port. In the case (3), in-band control floods the packet. In the cae (4), in-band control performs MAC learning and output port is decided by performing the MAC learning lookup. If the output port is found, the flow-entry is added, otherwise the packet is flooded. 5.5.1.2 OpenFlow session establishment in the reference switch implementation As explained above, the controller can establish a connection with the switch A in the topology shown in Figure 37. We describe the connection of the switch A with the controller in this section. The OpenFlow reference software implements a discovery module to establish a connection with the controller. This discovery module runs a DHCP client. The OpenFlow reference switch assumes that the DHCP server is present in the controller or it is connected to the one of the ports of the switch that directly connects the controller to the OpenFlow topology (e.g. shown in Figure 37). The DHCP client and server interaction is shown in Figure 38. We describe exchange of messages between the DHCP client and the server together with in-band control processing in the switch A. In this description, the DHCP client runs in the switch A and the DHCP server is connected with the port 4 of the switch A (Figure 37). © SPARC consortium 2012 Page 69 of 129 WP3, Deliverable 3.3 Split Architecture - SPARC DHCP Client DHCP Server STEP 1 STEP 2 Client sends DHCPDISCOVER message with a vendor-specific identifier “OpenFlow” The DHCP server receives the DHCPDISCOVER and offers the available IP address to the client by sending DHCP OFFER. This DHCP OFFER message also contains the same vendor-specific identifier with the string containing the location of the controller STEP 3 Client receives the DHCPOFFER and sends the DHCPREQUEST requesting the IP address lease offered STEP 4 The DHCP server receives the DHCPREQUEST and grants the IP address by officially sending DHCPACK STEP 5 Client receives the DHCP ACK and configures its own IP address. Figure 38: DHCP client and server interaction In the first step, the DHCP client in A transmits the DHCPDISCOVER message on the LOCAL port, which contains a vendor specific identifier “OpenFlow” (STEP 1 in Figure 38). The DHCPDISCOVER message has the source MAC address as the MAC address of the LOCAL port of A, destination MAC address as the broadcast address, source UDP port as 68 and destination UDP port as 67. As the incoming port of the message is the LOCAL port (the case (1) of inband control), in-band control handles this packet. Now, as the destination address of the DHCPDISCOVER message is the broadcast address, in-band control floods the message from each of the ports including Port 4 of A. The DHCP server receives this packet from the Port 4 of A and chooses the free IP address. Figure 39 shows the format of the configuration file (proposed by the OpenFlow reference implementation) of the DHCP server which chooses the IP address 192.168.1.20 through 192.168.1.30 to the DHCP clients that send the vendor specific identifier as “OpenFlow” in the DHCPDISCOVER message. In addition to the IP address for a DHCP client, the following configuration file also allows to send a string in the DHCPOFFER message containing the location of the controller i.e. tcp:192.31.1.1. default-lease-time 600; max-lease-time 7200; option space openflow; option openflow.controller-vconn code 1 = text; class "OpenFlow" { match if option vendor-class-identifier = "OpenFlow"; vendor-option-space openflow; option openflow.controller-vconn "tcp:192.31.1.1"; option vendor-class-identifier "OpenFlow"; } subnet 192.31.1.0 netmask 255.255.255.0 { pool { allow members of "OpenFlow"; range 192.31.1.20 192.31.1.30; } } Figure 39: Format of DHCP server configuration file Now the DHCP server sends the DHCPOFFER message to the DHCP client at the switch A (STEP 2 in Figure 38). The DHCPOFFER message has the source MAC address as the MAC address of the DHCP server, destination MAC address as the MAC address of the LOCAL port of A, UDP source port as 67 and UDP destination port as 68. The OpenFlow switch A receives this DHCPOFFER from the Port 4. Now as the destination address of the message is the LOCAL port of the switch A, in-band control handles this packet (due the case (2) of in-band control). Thereafter, inband control transmits the packet to the LOCAL port. The LOCAL port handles this packet to the discovery module. © SPARC consortium 2012 Page 70 of 129 WP3, Deliverable 3.3 Split Architecture - SPARC The discovery module parses the vendor specific identifier string (tcp:192.31.1.1) and sends the DHCPOFFER message to the DHCP client. The DHCP client receives this DHCPOFFER and sends the DHCPREQUEST to request the IP address offered (STEP 3 in Figure 38). Like the DHCPDISCOVER, the DHCPREQUEST message also contains the vendor specific identifier “OpenFlow”. The DHCPREQUEST message has the source MAC address as the MAC address of the LOCAL port, destination MAC address as the broadcast address, source UDP port as 68 and destination UDP port as 67. As the incoming port of this DHCPREQUEST message is the LOCAL port, the message is handled by in-band control (case (1)). MAC learning in in-band control floods this message as the destination address of the DHCPREQUEST message is the broadcast address. The server receives the DHCPREQUEST from Port 4 and grants the IP address by sending the DHCPACK message (STEP 4 Figure 38). The DHCPACK message has the source MAC address as the MAC address of the DHCP server, destination MAC address as the MAC address of the LOCAL port of A, UDP source port as 67 and UDP destination port as 68. The OpenFlow switch A receives this DHCPACK from the Port 4. Now as the destination MAC address of the DHCPACK is the MAC address of the LOCAL port, in-band control forwards this message to the LOCAL port. The LOCAL port handles this to the discovery module which forwards this to the DHCP client and sets the IP address of the controller from the parsed vendor specific string (tcp:192.31.1.1). The DHCP client receives this message from the discovery module and configures the IP address of the LOCAL port (STEP 5 in Figure 38). After STEP 5 of DHCP client/server interaction, the discovery module in the switch A transmits the ARP message to the controller (192.31.1.1) from the LOCAL port. The source MAC address of the ARP message is the MAC address of the LOCAL port. Now as the incoming port of the packet is the LOCAL port (case (1) of in-band control), in-band control handles this message. In-band control floods this packet as the destination MAC address of the packet is the broadcast address. In Figure 37, the Port 1 of the switch A connects to the controller. Thus, the packet reaches to the controller by the Port 1. The controller receives this message and sends the ARP reply. The OpenFlow switch A receives this ARP reply from the Port 1. Now, as the destination address of the ARP reply is the LOCAL port (the case (2) in in-band control), in-band control handles this packet. In-band control learns the MAC address of the controller and forwards the packet to the LOCAL port. The LOCAL port handles this to the discovery module which then performs the TCP 3 way handshake with the controller. Thus, now the OpenFlow switch A becomes connected with the controller. 5.5.1.3 SPARC contribution to in-band control channel bootstrapping In parallel with the switch A in the OpenFlow topology shown in Figure 37, the DHCP client of the switch B and C also transmit the DHCP messages but these messages do not reach to the DHCP server. This is because when these messages reach to the switch A, in-band control of the switch A does not recognise this traffic (because these do not fall into case (1) to case (4) of in-band control). If switch A is not connected to the OpenFlow controller, these messages are dropped, otherwise, these are sent to the controller for the action. We implement an application in the controller so that the controller can decide the action of these messages. We call our application as a bootstrapping application. We also modify in-band control of the OpenFlow reference implementation such that a switch does not handle any control messages after it connects with the controller. Our bootstrapping application works on the PACKET-IN and datapath-join event. It maintains the topology database (TD), a list of the datapath IDs that are connected to the controller (JOINED-IDs), tables that consist the MAC address of the LOCAL port verses datapath id (TABLE-1) and the IP address of the LOCAL port verses datapath id (TABLE2), and a list (LIST) which contains JOINED and SENDER as two variables. The JOINED variable in LIST contains the MAC address of the LOCAL port of a switch such that if this switch is connected to the controller, the datapath id in the SENDER has to transmit LLDP packet from its ports. These LLDP packets are the probe packets which has the similar format as the LLDP packet. These are used to add or update links in the topology database. The topology database (TD) contains the ID of a node (OpenFlow switch, DHCP server and the controller) and the link between the nodes. The ID of the switches in our case is the MAC address of the LOCAL port. In the initial step, the bootstrapping application in the controller assigns the unique ID to the controller and the DHCP server, and initializes the topology database (TD) with the ID of the controller and the DHCP server, Other variables like JOINED-ID, TABLE-1, TABLE2 and LIST are initialized with the NULL value, The pseudo code of our bootstrapping application on the packet-In event is shown in Figure 40. © SPARC consortium 2012 Page 71 of 129 WP3, Deliverable 3.3 Split Architecture - SPARC STOP STEP 1 Packet-In Event Received STEP 3 Update or add the link the topology database (TD) yes STEP 2 Is the packet type LLDP no Extract the flow from the packet-in message Let the In-port is P, source address is S, destination address is D, tp-src =TS and tp-dst = TD. We refer S here as MAC or IP address. One of the address whichever available for comparison below can be used in our application Store the datapath id verses IP address at D in TABLE-2. The datapath-id is the datapath id of the sender of the PACKET-IN message STEP 4 STEP 7 yes yes D =sender of the packet-in message AND ( S= controller address OR TS =67) STEP 6 STEP 8 Path exists in the topology database between S and D TS !=67 AND IP address in D is valid no yes no STEP 5 In STEP 5, the sender of the packet-in message is the MAC address or IP STEP 9 Create a link between S and D in the topology database. In this link, port P of D is connected with S address from TABLE-1 or TABLE-2 which contains datapath id vs MAC or IP address no STEP 11 S exists in the topology database yes STEP 12 yes STEP 10 TD = 67 no Add S in the topology database STEP 13 1) add a link between S and sender of the packet-in message in the topology database. In this link, port P of the sender is connected with the port 0 of S. We will update the port of the S once it receives the LLDP from the sender in STEP 3 2) add source MAC address of the flow as JOINED and sender of the packet-in as SENDER into the LIST STEP 15 STOP Send PACKETOUT to Flood the Packet STOP no no STEP 14 Path to DHCP server known yes STEP 17 Add a flow entry in sender of the Packet-in where output port is LOCAL yes STEP 16 D=LOCAL Port address of the sender of the packet-in (MAC or IP) no STEP 19 Calculate the path from S to D Address of the LOCAL port of datapath id of the sender of the packet-in can be find from TABLE-1 or TABLE-2 (STEP 16) STEP 18 yes Path exists in the topology database between S and D no STEP 20 Establish the path from the sender of the packet-in message to D STEP 21 JOINED in the LIST is present in JOINEDID list yes STEP 22 Transmit the LLDP packets from the datapath id of the SENDER in the LIST no STOP Figure 40: Action of the bootstrapping application on the PACKET-IN event The action of the bootstrapping application on the datapath-join event is described in the Pseudo code written below: In the datapath join event, the controller receives the FEATURE REPLY message from a switch. The FEATURE REPLY consists of the switch datapath ID and information about switch ports including LOCAL port. The information of the switch ports consists of the MAC address of the port which is important for our bootstrapping application. In the part of the datapath-event, our bootstrapping application adds the datapath ID into the JOINED-ID list, adds the datapath ID and the MAC address of the LOCAL port in the TABLE-1, and adds of the MAC address of the LOCAL port in the TD if it is not present. At this time, the bootstrapping application checks that if the MAC address of the LOCAL port is present in the JOINED variable of the LIST or not. If it is present then the bootstrapping application forwards the LLDP packet from the each of the ports of the switch whose datapath id is present in the SENDER variable of the LIST. © SPARC consortium 2012 Page 72 of 129 WP3, Deliverable 3.3 Split Architecture - SPARC We explain working of above bootstrapping application together with the DHCP client and modified in-band control of the OpenFlow reference software. We take an example of the OpenFlow topology shown in Figure 37 to describe the bootstrapping of switches with our application. Our bootstrapping application recognises the DHCP messages. A message is recognised as a DHCP message if the UDP source or destination port of the message is 67. The description of the controller connection with the OpenFlow switch A (Figure 37) in our modified in-band control is similar to the description with the unmodified in-band control (described above). Our bootstrapping application comes into picture after it receives the data-path join event of the switch A. In the part of the data-path join event of A, the bootstrapping application adds the datapath ID of A into the JOINED-ID list, adds the MAC address of the LOCAL port and datapath id of A in the TABLE-1, and adds the MAC address of the LOCAL port into TD. When the switch is connected with the controller, the controller sends flow-remove message to the switch that triggers flushing of all the entries from the switch. As now the switch is connected and modified in-band control does not handle this message, the message is sent to the controller for the action. The controller receives this message as the PACKETIN message (STEP 1 in Figure 40). As the packet type in the PACKET-IN is not the LLDP (STEP 2 in Figure 40), the flow is extracted from the packet-in message (STEP 4 in Figure 40). Let S and D are the source and destination MAC address of the flow. Now in our case, destination (D) of the packet is the sender of the packet-in message and source (S) is equal to the MAC address of the controller (STEP 5). The bootstrapping application adds the link between the controller and the switch A in its topology database (STEP 9) after following STEP 6, 7, 8 in Figure 40. Now, as the destination of the flow (STEP 16) is the LOCAL port of the switch A (sender of this packet in message) itself, the flow entry is added in the switch A where output port is LOCAL and thus now the acknowledgement of the flow-remove message is handled to the discovery module. At present (when the switch A has established a connection to the controller), the switch B and C are in the initial phase of transmitting DHCPDISCOVER messages, because these messages do not reach to the DHCP server. The switch B and switch C continues transmitting DHCPDISCOVER messages until it does not receive the reply from the DHCP server. Let us take the case when the switch A receives the DHCPDISCOVER message of the switch B from the Port 2 (after A has established a connection with the controller). The DHCPDISCOVER message of B has the source MAC address as the MAC address of the LOCAL port of B, destination MAC address as the broadcast address, source UDP port as 68 and destination UDP port as 67. The switch A sends this message to the controller as the PACKET-IN message. The controller receives the PACKET-IN message and generates the PACKET-IN event ((STEP 1 in Figure 40). As the packet is not LLDP one (STEP 2 in Figure 40), the flow is extracted from the packet-in message (STEP 4 in Figure 40). Now it follows the STEP 5 and reaches STEP 10 in Figure 40. As the destination port of the flow is 67, it reaches to STEP 11 in Figure 40. As S (switch B) is not present in TD (STEP 11), it is added in the topology database (TD) and the link is added in the topology database (TD) where port 2 of the switch A connects to the switch B (STEP 13). At this time, the bootstrapping application does not know the output port of the switch B that is connected to the port 2 of A. So, it adds the source MAC address of the flow in the JOINED variable of the LIST and datapath id of the sender of the PACKET-IN (switch A) into the SENDER variable of the LIST (STEP 13). Now as the bootstrapping application does not know the path to the DHCP server (STEP 14), it sends a packet-out message to the switch A to flood the DHCPDISCOVER message of switch B. Thereafter, the DHCPDISCOVER message reaches to the DHCP server via Port 4 of the switch A. The DHCP server receives this message and sends the DHCPOFFER to the switch B via the switch A. This DHCPOFFER message has the source MAC address as the MAC address of the DHCP server, destination MAC address as the MAC address of the LOCAL port of B, UDP source port as 67 and UDP destination port as 68. The switch A receives this packet and sends this to the controller as PACKET-IN message (STEP 1 in Figure 40) . Now the bootstrapping application reaches to the STEP 18 after following STEP 2, 4, 5, 10, 16. The destination the DHCPOFFER is the switch B. As there is a path to the switch B (LOCAL port of switch A) from the switch A in the TD, the controller calculates the path from A to B from TD (STEP 19), and it establishes the flow-entry in the switch A with the action Port 2 and forwards this DHCPOFFER to the switch B. Now switch B receives this and sends DHCPREQUEST. After the exchange of the all DHCP messages and TCP-3 way handshake, the switch B connects to the controller. Thus the controller receives the datapath join event of the switch B. In the part of the data-path join event, the bootstrapping application adds the datapath ID of the switch B into the JOINED-ID list, adds the LOCAL port and datapath id in the TABLE-1. Now, as the MAC address of the LOCAL port of the switch B is present in the JOINED variable of LIST, the bootstrapping application forwards the LLDP packet from the each of the port of the datapath id in the SENDER variable of LIST ( i.e. switch A). Now when the switch B receives this LLDP packet, it sends the LLDP packet to the controller as the PACKET-IN message (STEP 1). As the packet is LLDP (STEP 2), the bootstrapping application updates the link (in the TD) between switch B and switch A (STEP 3) with output port of B as 1. Thus, the bootstrapping also derives the topology of the network. © SPARC consortium 2012 Page 73 of 129 WP3, Deliverable 3.3 5.5.2 Split Architecture - SPARC SPARC Extension to the Topology Discovery Mechanism Our topology discovery mechanism borrows the mechanism defined by the NOX original routing mechanism. The NOX routing mechanism implements three modules for routing a packet in an OpenFlow network. These modules are discovery, authentication and routing modules (Figure 41). The discovery module discovers links between the OpenFlow switches. It uses probe packets to discover the link which have similar as LLDP packet format. The authenticator module creates a Flow_in_event containing the source and destination access points, which the Routing module then listens for and uses to set up the flow’s route through the network.. Routing NModule Discovery N Module Authenticator N Module N NOX Event Handling subsystem Figure 41: The NOX Routing Mechanism The routing module discovers non-OpenFlow switches by MAC learning and route a packet to the destination nonOpenFlow switch. The MAC learning algorithm tracks source address of packets to discover the switches. It floods the packet if the destination is unknown. An OpenFlow network may have loops in its topology. Hence, the flooded packets may persist indefinitely or until TTL expire. Thus, MAC learning in the routing module may not function correctly since nodes may receive packets from multiple ports. The current solution in Ethernet networks to prevent loops is to draw a spanning tree and flood the packet around that spanning tree. We implement two algorithms to prevent loops in MAC learning. In the first algorithm, the controller performs MAC learning on each OpenFlow node and the packet is flooded along the spanning tree, and in the second algorithm, MAC learning is performed on the OpenFlow network and the packet is flooded outside of the OpenFlow network. Thus in first case, each OpenFlow node behaves like an Ethernet switch and in second case, an OpenFlow network controlled by a controller behaves like an Ethernet switch. In the first algorithm where each OpenFlow node behaves like an Ethernet switch, the controller needs to draw a spanning tree of an OpenFlow network topology. The NOX discovery module learns the OpenFlow topology via the original topology discovery method (periodically sends out LLDP formatted probe packets to the node and waits for relaying back these packets by other switches). We used this topology to draw a spanning tree in the OpenFlow topology. We implemented Kruskal's Algorithm to draw a spanning tree. The sequence steps in our implementation are shown Figure 42A. The steps are shown after a packet-in event. The controller generates the packet-in event when it receives the packet-in message from the OpenFlow switch. If the packet contained in the packet-in message is not a LLDP packet (Figure 42A) and its destination is unknown, it is flooded along the spanning tree. On the other hand, if the packet is a LLDP packet then the NOX performs discovery. Furthermore, if the packet is not a LLDP packet and destination of the packet is known, the controller calculates the shortest path and establishes Flow Entries. © SPARC consortium 2012 Page 74 of 129 WP3, Deliverable 3.3 Split Architecture - SPARC Packet-In Event Received Perform discovery yes Is the packet a LLDP message Packet-In Event Received Perform discovery yes no Calculate a shortest path and establish Flow Entries in the OpenFlow switches yes Is the destination for the packet known Is the packet a LLDP message no Calculate a shortest path and establish Flow Entries in the OpenFlow switches yes Is the destination for the packet known no no Is a Spanning Tree drawn Find the nonOpenFlow links in the topology no yes Draw a Spanning Tree Flood the packet along the non-OpenFlow links Flood the packet along the Spanning Tree STOP Figure 42: A) NOX modified Mechanism (Spanning Tree solution) STOP (B) without Spanning Tree Creation In the second algorithm where an OpenFlow network behaves like an Ethernet switch, the controller needs to know the non-OpenFlow links to flood the packet outside of the OpenFlow network. The non-OpenFlow links are the links in the topology connected to the non-OpenFlow switches. In our implementation of this algorithm, reception or non-reception of a LLDP packet declares a link as an OpenFlow or a non-OpenFlow link. In the production network, there could be two situations (1) a non-OpenFlow switch does not transmit any LLDP packet (may be because it does not run its own LLDP protocol) (2) a non-OpenFlow switch transmits a LLDP packet to the OpenFlow network (may be because it runs its own LLDP protocol). We consider both the situations while declaring a link as a non-OpenFlow link. The controller in our implementation registers the MAC addresses of all the Ethernet interfaces of the OpenFlow switches when it receives the feature-reply message in the initial phase of connection. The controller declares a link as a non-OpenFlow link in one of the following situations (1) if it has not received any LLDP packet from the link (2) it has received the LLDP packet but the MAC address of the LLDP packet is not the registered MAC address of any of the Ethernet interface of the OpenFlow switches. The sequence of steps that are followed in this implementation is shown in Figure 42B. Like first algorithm, all the actions are taken upon packet-in event. The packet is first checked whether it is a LLDP packet or whether destination is known for the non-LLDP packet. Now if the destination is not known then the controller finds the non-OpenFlow links in the topology and floods the packet along the non-OpenFlow links (Figure 42B). The OpenFlow switch can send a first few bytes of the packet or all the bytes of the packet to the controller in the packet-in message. It depends on configuration of the switch. In the second implemented algorithm, we transmitted all the bytes of the packet to the controller because the packet in packet-in message is flooded to the destination nonOpenFlow switch in this implementation © SPARC consortium 2012 Page 75 of 129 WP3, Deliverable 3.3 5.6 Split Architecture - SPARC Service Creation Service creation is a very general concept. In the telecommunication industry the concept describes service creation points (SCP) which are the points in the network where network functions with customer-specific or product-specific parameters needs to be configured. These are customer facing services, in network technologies often referred to as the service itself, and often referred to as a network service or a user session. For example, mobile networks provide adequate means for addressing various service slices via so called “Access Point Names” (APN) that tie a user session and its established PDP context (the tunnel actually connecting the user and the service creation gateway) to the service slice. Currently such a comprehensive architectural solution is lacking in fixed networks, e.g., in the architecture defined by the Broadband Forum for access aggregation domains. In this case a BRAS grants access to a user session which comprises only the default service slice (i.e., IP access) and does not allow differentiation between various service slices. Besides the network service, there is also a transport function which typically aggregates multiple network services. One could say that network services are tunneled through transport services. For example, customers are using VLAN IDs to split between different services and in a certain forwarding engine; the VLAN IDs are mapped to MPLS labels in order to reduce configuration efforts since one only needs to configure a single transport service that can carry all the network services. Therefore, there might not be one-to-one mapping between a transport function, and the network services correlation with transport functions might not be given, or there may be a wish split transport and service (a service is a virtual connection between two points in this context, potentially stacked and transported within other virtual connections). Typical SCPs are located at the edge of the telecommunication network, for example the BRAS that is used to create residential services. Service creation exists in various forms for different customer groups, e.g., an access service provided to residential customers is fundamentally different from the one provided to business customers. Typical examples for both cases are presented in the following section. Nonetheless, there are some requirements and process steps that are common to both cases and these will be explained first. Another important aspect to be mentioned with regard to service creation is single point provisioning. Today, for certain protocols and their configurations, operators cannot cope with the growing complexity of the network. For example, the scalability of VLAN identifiers is limited to 4094 unique identifiers, but operators have many more users to organize in a typical network domain or segment. Technologies like Provider Bridge (Q-in-Q, IEEE 802.1ad) overcome this limitation, but require additional configuration efforts at the border of both technology variants. Therefore, the number of such provisioning points has to be minimized. This consequently requires good network design and/or qualified control plane protocols. 5.6.1 Service creation phases Service creation denotes the process of connecting and granting access for a user and the creation of access to specific service slices, taking into account different limitations stemming from policy restrictions, which in turn are influenced by contractual constraints, etc. Service creation comprises several phases: Attachment phase: A user must establish physical connectivity to the operator’s network termination or authentication point. A network operator may adopt various options for establishing initial connectivity for the user, e.g., a legacy PPPoE/PPP session, a DHCP/IP-based solution, or an IPv6 link layer auto-configuration procedure. The network operator’s policy will presumably demand prevention of user access to other service slices in this phase. Note that in this scenario there is a strict distinction between user identity and network attachment point. However, in existing solutions, this is not always the case, e.g., the line ID on a DSLAM access node identifies both the user ID and network attachment point. . Default session phase: This phase is only applicable in cases where the user is not or could not be authenticated. e.g., the user does not have authentication information, but still requires network connectivity for emergency calls, resolving configuration/purchasing issues or to contact helpline services. Here special protocols or protocol configurations should be used such as PPPoE/PPP session establishment without authentication or DHCP-based IP configuration with limited access rights to a special service slice, e.g., a landing page or emergency VoIP system. If a user has sufficient authentication information, one should skip this phase and directly move to the authentication phase. Authentication phase: Access to the user’s service slices is granted based on proper user authentication, i.e., the network operator is able to identify the user and allocate a session for handling all management-related activities concerning service slice access. Again, a network operator should not be restricted in his selection of an authentication scheme and an authentication wrapper protocol for exchanging the necessary authentication PDUs – PPP, IEEE802.1x, PANA are some examples of authentication protocols that allow integration of various authentication schemes. Typically, this also includes some form of “binding” the user session between the network operator and user by deriving some keying material for message authentication and probably encryption. © SPARC consortium 2012 Page 76 of 129 WP3, Deliverable 3.3 Split Architecture - SPARC Authorization (and signaling) phase: Once a user session has been established, various options for attaching the user to service slices exist: a) some form of dedicated signaling via a specific signaling protocol (resembling SIP-style signaling in mobile networks for accessing x-CSCF functions); b) some policy-driven automatic attachment based on user/operator contracts that correlates to the automatic attachment to the Internet service slice as done today in the existing access/aggregation architectures; c) some form of implicit signaling where the network performs some form of deep packet inspection in order to determine which is the appropriate service slice. Session phase: User attachment is a complex process beyond the three phases discussed thus far. For attaching a user to a service slice, the management subsystem must ensure several constraints. Typically, a service slice consists of several service slice gateways and transport services between the user’s network attachment point and the service slice gateway. It may be desirable to provide an adequate level of service quality as well. Furthermore, when IP connectivity is required for joining a service slice, compatibility of the addressing scheme adopted in the service slice and the addresses assigned during the initial attachment phase must be ensured. Today residential gateways (RGW) in fixed network environments are typically given a single IP address for accessing various service slices. A more advanced scheme may coordinate the assignment of different identifiers for different service slices on an RGW. The single IP address model also potentially stresses the service slice gateway with the need to do network address translation for providing services in a transparent manner, i.e., NAT helper applications may be required. 5.6.2 Relationship to requirements and OpenFlow 1.1 Overall, service creation relates to a multitude of the requirements detailed in D2.1. The list is as follows:           R-1: A wide variety of services/service bundles should be supported. R-2: The Split Architecture should support multiple providers. R-3: The Split Architecture should allow sharing of a common infrastructure to enable multi-service or multiprovider operation. R-4: The Split Architecture should avoid interdependencies of administrative domains in a multi-provider scenario. R-8: The Split Architecture should support automatic transfer methods for distribution of customer profiles and policies in network devices. R-10: The Split Architecture should provide sufficient customer identification. R-21: The Split Architecture should monitor information required for management purposes. R-23: The Split Architecture should extract accounting information. R-29: It should be possible to define chains of processing functions to implement complex processing. R-30: The Split Architecture shall support deployment of legacy and future protocol/service-aware processing functions. Obviously, R-1 – R-4 need to be covered by the service creation approach. Multiservice support is highly relevant both in current and future x-Play networks with business customer support. In multi-provider environments, it is important to distinguish between the various customers in relation to the appropriate provider. Here R-10 is demanded as well. In conjunction with customer identification, support for the configuration of customer-specific entities in edge devices (R8) as service creation is required in telecommunication networks. In addition, R-21 and R-23 are relevant with the demand for management information in general and customer-specific terms. One of the most important requirements is the support of the deployment of legacy processing functions (like PPPoE or other tunneling protocols) as defined in R30. In conjunction with PPPoE, for example, requirement R-29 is important as PPPoE is a mixture of different protocols and requires flexible decision logic (the complexity of PPP can be pointed out by 104 RFC’s with “PPP” in the title). Comparing the requirements in this section and the desired network functions from the previous section with the implemented OpenFlow switch specification version 1.1.0, it is rather difficult to outline specific missing features in the specification. However, the following existing features could be reused:     Security mechanisms preventing unrestricted or authorized access could be implemented as a combination of default flow entries with appropriate counters and pointers to the authentication platform; counters must become more dynamic, allowing the submission of traps and notifications to network management systems. DDoS attacks could be prevented by appropriate counters for specific protocol requests like ARP requests; counters must become more dynamic allowing the submission of traps and notifications to network management systems. Spoofing could be prevented by appropriate flow entries after authentication. Accounting information collection mechanisms could be based on various counters. © SPARC consortium 2012 Page 77 of 129 WP3, Deliverable 3.3      Split Architecture - SPARC Management support is given to a limited extent with various “read state message” – here especially counters must become more dynamic allowing the submission of traps and notifications to network management systems. Authentication and related authorization of the datapath could be based on the setting of appropriate flow entries; however, the decision logic is currently beyond the scope of the OpenFlow specification. Forwarding to RADIUS entity, auto-configuration functions could be based on appropriate flow entries; however, the decision logic for these functions is currently beyond the scope of the OpenFlow specification. Configuration of profiles could be based on group tables; it is unclear how sophisticated the potential configuration options are. Limited support for IPv6 could be enabled by flow entries matching the IPv6 EtherType, but the specification lacks more advanced forwarding analytics. Note that missing features do not have to be implemented in OpenFlow per se. However, it is necessary to point to functions and features which should be available to support service creation models. The essential missing features are:      5.6.3 Modular decision logic for authentication/authorization and support for related protocols (depending on service creation models). Auto-configuration mechanism for customer services. Sophisticated management features like profile-based/policy-based network control and common protocol support like SNMP, potentially based on more dynamic counter mechanisms. Support for PPPoE matching and encapsulation/decapsulation. Support for IPv6 matching as well as modification actions. Residential customer service creation (PPP and beyond) For residential customers, service creation and single point provisioning requires a combination of different network functions like authentication, authorization and accounting, (auto-)configuration of layers 2 – 7, and security for the network infrastructure. In addition, it must be possible to trace and analyze problems and support today’s current protocols. There are two typical protocol suites/models:   The most common one is the Point-to-Point Protocol (PPP) in combination with RADIUS, in current deployments as a PPP over Ethernet (PPPoE) variant. PPP is a combination of a data plane encapsulation and protocol suite, which provides several functions like support for different data plane protocols, autoconfiguration of layers 2 and 3, different authentication mechanisms and an interaction model with RADIUS. This interaction is implemented through a Broadband Remote Access Service (BRAS). The BRAS functionality is typically provided by a router and requires a high degree of processing and memory in the router compared to other operations. The BRAS is the mediation point between PPP and RADIUS. RADIUS provides AAA and configuration options as well as services between centralized user databases (like OSS and BSS systems) and the BRAS. The schematic diagram is illustrated in Figure 43. The second model is an adaptation of the Dynamic Host Configuration Protocol (DHCP) with several extensions, called DHCP++ in this deliverable. DHCP has its roots in LAN environments and provides autoconfiguration support for layers 3-7. For carrier-grade support it requires additional protocols for AAA and security, and potentially auto-configuration for layer 2. Here a variety of options are discussed. Figure 43 only illustrates a model with AAA entry in the DSLAM. © SPARC consortium 2012 Page 78 of 129 WP3, Deliverable 3.3 Split Architecture - SPARC Service creation in BRAS with PPPoE RADIUS Service creation in DSLAM with DHCP++ AAA and User Profile server RADIUS IP Edge BRAS AGS2 PPPoE tunnel AAA and User Profile server AGS1 AGS2 Configuration server (incl. L3) DSLAM with PPPoE Intermediate Agent RGW Configuration server (incl. L3) AGS1 RGW DSLAM including RADIUS client with authentication/security gateway (e.g. IEEE802.1x) and DHCP-to-RADIUS transition (or DHCP relay / DHCP server function (dotted line)) Note: Other DHCP++ models are possible, but require additional security functions between service creation point and customer. Other mechanisms for AAA could apply as well. Figure 43: Service creation models BRAS with PPPoE and DHCP++ based on DSLAM An important aspect is the transition from IPv4 to IPv6 in carrier networks. This aspect is not dedicated to service creation only, but it must be mentioned here explicitly. Several models are now available for the introduction of IPv6 in existing carrier environments like PPPv6, DHCPv6 or IPv6 route advertisements. In summary, a service creation model has to cover the following aspects:    Support for legacy protocols in general, PPPoE or DHCP-based IPv6 model as potential future production model. Support for common user databases like RADIUS. Security mechanisms o Authentication of users. o Authorization of users. o Security mechanisms preventing unrestricted or authorized access.      Discarding any data except authorization requests until successful authorization. Restriction of access to resources associated to a user. Protection of network infrastructure against any misbehavior like DDoS attacks (not necessarily with ill-effects) or spoofing. Collection of accounting information. Support for management.  Auto-configuration of common L3-L7 functions like IP address, subnet mask, etc. 5.6.4 Business customer service creation based on MPLS pseudo-wires Business customers typically require connectivity between different locations emulating a LAN environment over public telecommunication services. Service creation relies on the appropriate configuration of desired virtual networks. For example, a pseudo-wire is a generic point-to-point tunneling concept for emulating various services over packetbased networks. Several pseudo-wire-based services have been defined, e.g., TDM circuit emulation, ATM emulation, SONET/SDH emulation, and Ethernet emulation to provide business customers with sufficient options for interconnecting locations. Ethernet on MPLS pseudo-wires can be used to provide an emulated Ethernet connection also known as an E-Line service in MEF terminology. Typically, E-Line services are used to connect two branch offices at the Ethernet layer. Ethernet pseudo-wires are also often used to create Virtual Private LAN Services (VPLS) by establishing a mesh of MPLS LSPs between the provider edge routers and implementing an Ethernet to/from the LSP © SPARC consortium 2012 Page 79 of 129 WP3, Deliverable 3.3 Split Architecture - SPARC switch in these routers (similar to a normal Ethernet switch that handles the LSPs like ports) – in MEF terminology these are called E-LANs. MPLS-based VPLS services have become very popular, partly because of the possibility of performing extensive traffic engineering on MPLS LSPs, something that can be difficult with other packet-based networks. There are several ways to create an E-Line/E-LAN service within an OpenFlow network. Since OpenFlow incorporates the Ethernet layer, one could imagine a network application running in the controller that simply installs Ethernet forwarding rules along a path in the OpenFlow network. However, since the MAC address space is flat, it would result in one flow table entry per OpenFlow switch for each MAC address that exists in the customers network, and that is something that does not scale very well. Additionally, there is a risk of MAC address collisions between E-Lines/ELANs belonging to different customers (MAC addresses are not as unique as they should be, especially if they have been automatically generated and assigned to a virtual machine). Collisions between different E-Lines could be resolved through translation of the MAC addresses at the provider edge, however, this causes further complications in troubleshooting the network and might increase complexity if multiple controllers are involved, since this would have to be synchronized between the provider edge routers. Pseudo-wire encapsulation resolves the scalability issues as well as the collision problem. Instead of requiring one flow entry per customer MAC address on all provider nodes, it requires two flow entries per E-Line service – one for each direction of the MPLS LSP carrying the pseudo-wire. Since the Ethernet frames are encapsulated, there is no risk of collisions – the customer MAC addresses are only “visible” at the provider edge nodes. The overall pseudo-wire architecture is described in RFC3985, and the detailed specification for MPLS networks in RFC4385 and RFC4448. The first step when tunneling an Ethernet frame in MPLS PWE is to encapsulate the frame (source and destination MAC address, EtherType, and payload) with a PWE control word. The control word is used to provide various optional features, for example, it may contain a frame sequence number in order to enforce ordered delivery of the tunneled frames. This is optional when emulating Ethernet because Ethernet as such does not guarantee ordered delivery, therefore the control word can be left empty without violating the specification. Two MPLS labels are prepended on top of the control word, one for identifying the pseudo-wire tunnel and one for identifying the LSP. Finally, a new Ethernet header is added – this header does not contain any customer MAC addresses – instead it refers to the originating and next-hop provider router. Once the outer Ethernet header as been added, the frame is completed (see Figure 44). Figure 44: Typical pseudo-wire frame structure for Ethernet emulation 5.6.5 Overall conclusions for service creation This section summarizes the general missing aspects in the current OpenFlow specification for supporting service creation. They are:    Detailed, complete service creation model based on OpenFlow for carrier environments. Sophisticated push operation of flow entries during bootstrap from controller to switch (access node, e.g., DSLAM) for initial protection of aggregation network. Profile-based/policy-based configuration possibilities are required. © SPARC consortium 2012 Page 80 of 129 WP3, Deliverable 3.3    Split Architecture - SPARC Potentially fine granular configuration of flow entries for protocols or user-specific authorized access to services and/or providers (the latter in the case of multi-provider forwarding). Legacy support, e.g., for PPPoE, MPLS Pseudo-Wire or other tunneling protocols, especially from business customer demands. General support for IPv6. In Section 6.2 we will follow up on the issues raised here and present a detailed model for implementation of service creation for both residential and business customers in an OpenFlow-based SplitArchitecture environment. 5.7 Energy-Efficient Networking The global concern about climate change on our planet is also influencing the ICT sector. Currently ICT accounts for 2 percent to 4 percent of carbon emissions worldwide. About 40 percent to 60 percent of these emissions can be attributed to energy consumption in the user phase, whereas the remainder originates in other life cycle phases (material extraction, production, transport, and end-of-life). By 2020 the share of ICT in worldwide carbon emissions is estimated to double in a “business as usual” scenario. Thus, an important goal for future networking is the reduction of its carbon footprint. One way to gain higher energy efficiency in networking is the wider use of optical technologies, since optical signals consume less energy than electrical signals. Additionally, there has been an increase in research worldwide related to sustainable networking in recent years, with initiatives such as the current EU-funded Network of Excellence TREND [5], COST Action 804 [6], the GreenTouch consortium [7] and the CANARIE-funded GreenStar Network [8] which also has European members, including the SPARC members IBBT and Ericsson. 5.7.1 Current approaches to reducing network power consumption 5.7.1.1 Network topology optimization Different strategies to save power in networks are possible. At the highest level one can investigate whether optimizations are possible in the network topology. Currently networks are designed to handle peak loads. This means that when the loads are lower, there is overcapacity in the network. At night the traffic load can be only 25 percent to 50 percent of the load during the day. This lower load could allow for a more simplified network topology at night which in turn enables switching off certain links. Additionally, switching off these links allows for line cards to be switched off and thus leads to reduced node power consumption. A concept that implements this principle is MLTE (multilayer traffic engineering). The MLTE approach can lead to power savings of 50 percent during low load periods. However, many access networks are organized in tree structures, so shutting down links is not a feasible option. This means that dynamic topology optimization cannot be applied in all network scenarios, and is not feasible in access networks. Figure 45: Multilayer Traffic Engineering (MLTE) 5.7.1.2 Burst mode operation Given a certain static network topology, further optimizations can be achieved by burst mode operation. In burst mode operation, packets are buffered in a network node and then sent over the link at the maximum rate. In between the bursts the line can be powered down. This strategy can be useful mainly in access networks due to the “burstiness” of the © SPARC consortium 2012 Page 81 of 129 WP3, Deliverable 3.3 Split Architecture - SPARC traffic. However, the difference in power consumption between different link rates is mainly manifested at the higher bit rates. Furthermore, burst mode operation works with very small time scales, so the number of components which can be switched off is limited. Finally, burst mode operation requires larger packet buffers which also need power. Hence it is yet unclear whether this strategy can lead to significant power optimization in reality. 5.7.1.3 Adaptive link rate Another approach for static network topologies is to use adaptive link rates. Adaptive link rate also exploits the “burstiness” of the traffic. The approach is based on the principle that lower link rates lead to lower power consumption in the network equipment. Saving energy is possible by estimating the average link rate required on a line and adapting the link rate to this level. However, similar to burst mode operation, adaptive link rate requires larger packet buffers, which might reduce some of the actual power savings. Figure 46: Power Management 5.7.2 Sustainable networking with OpenFlow Figure 47: Energy consumption across the functional blocks of a high-end core router [9] Centralizing the control software often means an automatic reduction in switch power consumption offset by consumption of a new element. As we can see from [9], most of the power consumption stems from the Forwarding Engine (32%) and the Cooling/Power Supply (35%). However, a rather small, but still significant amount (11%) of the power is consumed by the control plane. SplitArchitecture with OpenFlow enables us to reduce this part by moving the routing tables (RIB)/routing engine and control plane functionality to the controller and keeping only the forwarding engine (FIB) in the switch with a smaller OpenFlow software component and extra hardware for handling communication with the controller. However, the controller will consume more power due to additional computational loads. As a result, it is unclear if an OpenFlow architecture per se will actually reduce power consumption compared to conventional network architectures. © SPARC consortium 2012 Page 82 of 129 WP3, Deliverable 3.3 5.7.2.1 Split Architecture - SPARC OpenFlow protocol extensions for energy-efficient networking The OpenFlow architecture enables us to implement optimization of network topology, e.g., MLTE operations as an application in the OpenFlow controller. In order to reap the maximum benefit, the controller should be able to power up/power down parts of the switch on demand as a function of the energy-efficient algorithms in the application. To enable power management for static network topologies as well, we need to add the burst and adaptive link modes in the switches and advertise them to the controller. On the controller side, it should be extended to allow control of such energy-efficient features. This means that there are two sets of extensions designed within SPARC for energy efficiency:   The first set of functions relates to port features: switching them on/off, enabling functions on the ports (Adaptive Line Rate / Burst Mode) and setting parameters for these functions (for instance, burst length for burst mode operation). We also need to disseminate these port capabilities to the controller. The second set relates to configuration and monitoring of components of the switch itself, which do not relate directly to forwarding, for instance internal power management and monitoring switch temperature. To enable energy-efficient networking applications to run on the OpenFlow controller, there needs to be an interface towards the datapath elements to control the energy-efficiency related functions. Thus, we first proposed to add some extra messages to the OpenFlow specification which not only indicate the status of a port on the switch, but also allows us to control the individual ports. In D4.2 we will present details of our proposed extensions for dissemination of power management capabilities, for monitoring the related switch parameters and for controlling these capabilities. However, with the recent addition of OF-config to the OpenFlow architecture (see Section 3.1.3) there is an additional interface available, dedicated for configuration tasks. Following the discussion in Section 4.2.6, it would make sense to include this configuration possibilites to the network management function in the control framework, since it helps to configure and steer the network in timely-fashion6. Due to the recency of OF-config, it is still unclear where the energy awareness features go in the updated ONF SDN architecture. As of yet, OF-config provides means for setting the following parameters for port configuration: no-receive, no-forward, no-packetin, admin-state. To support energyefficient networking, the OF-config data model could be extended with a set of parameters allowing configuration of energy efficiency features, as listed above. The other extension set, relating to monitoring and management of the switch itself is also not unambiguously defined. Regarding additional switch capabilities, the OF-config specification readily support capability discovery, which could be extended in a straight-forward way to additionally discover capabilities designed for energy efficiency. 5.8 Quality of Service To understand the QoS requirements, the different logical blocks of a 10 Gigabit Ethernet switch model with line cards and a backplane, is depicted in Figure 48 with the potential stress points highlighted. This model is from an IXIA white paper [34]. It addresses six typical stress points, which are designed to operate at line speed but under certain loads may become congested and cause increased latency, packet drops and other problems:      Ingress packet buffer: The ingress packet buffer stores all received packets that are waiting to be processed. These buffers may overflow if upstream nodes are transmitting packets faster than the switch can processes them. Packet classification: The packet classifier uses the parsed header information incoming packets in order to classify them into different service classes. Depending on the design and the types of packets received, packet classification may not be able to operate fast enough. For example, complicated packets with multiple levels of headers (e.g., tunneled packets) may require multiple table lookups that increase the required amount of processing time per packet in order to classify them. Traffic management: The traffic management modules are responsible for applying QoS through the mechanisms discussed below. During high load, these modules may be heavily stressed as queue management algorithms such as random early detection are activated in an attempt to reduce the load. Crossbar switch and backplane interconnect: The crossbar switch and backplane interconnect are responsible for transferring packets between different connected line cards. Depending on the architecture of the particular switch, the backplane may cause blocking under certain traffic patterns. Advanced queuing and scheduling algorithms, combined with fast interconnects, can limit this problem or even completely resolve it. Multicast replication and queues: Multicast replication is usually performed in two stages, one at the ingress line card in order to multicast the packet to different egress line cards, and another at the egress line card(s) in order to 6 Note that the extensions detailed in D4.2 were designed prior to the release of OF-config, thus only OpenFlow extensions have been considered so far. © SPARC consortium 2012 Page 83 of 129 WP3, Deliverable 3.3  Split Architecture - SPARC multicast the packet to multiple ports. Multicast packets compete for the same resources as the unicast packets and may be the cause of congestion both at the egress ports and when going through the crossbar and the backplane. Control plane: The rate at which the control plane is able to update the tables used by the switch, for example in forwarding tables, may cause problems in error conditions when a high number of changes needs to be performed. As we can see, congestion can be caused by traffic consisting of complicated packets or multicast traffic. However, even simple unicast traffic may be the most obvious cause of congestion when multiple ports are trying to forward traffic through the same outgoing port and the combined packet rate/bit rate is higher than the line rate of the outgoing port. This may cause congestion first in the egress packet buffer, which in turn may cause blocking in the crossbar switch, which in turn may cause congestion in the incoming packet buffer. Figure 48: Logical operation of a 10 Gigabit Ethernet switch with line cards, from [34]. © SPARC consortium 2012 Page 84 of 129 WP3, Deliverable 3.3 5.8.1 Split Architecture - SPARC Queuing management, scheduling, traffic shaping Typical quality of service (QoS) implementations in most routers and switches are constructed from a number of tools that are able to affect incoming packets in order to prioritize certain traffic, ensure proper bandwidth usage, smoothen traffic flows and reduce congestion in the network. Normally five tools are used: classification, traffic policing, traffic scheduling, traffic shaping, and packet rewriting. Figure 49 illustrates an example of QoS processing chaining; the order of the tools (or the tools used) in the figure are in no way canonical – different QoS goals may require different tools in a different order. Exactly which capabilities are needed by the different tools again depends on the requirements in a particular instance, as well as where they should be placed in the packet processing chain from incoming port to outgoing port. As shown in the figure, traditional devices may allow some QoS actions to take place before the packet enters the actual switching/routing stage, for example policing, or allow queue management actions on the incoming packet buffers. For a more detailed discussion of these subjects, see [35] and [36]. Figure 49: QoS processing pipeline where packets from different flows are classified and metered before going through the routing/switching stage. The packets that were not dropped are then scheduled, shaped, and rewritten. Classifier The classifier is responsible for recognizing different traffic flows based on a number of criteria, for example, Layer 2 and Layer 3 addresses, TCP/UDP port number, explicit packet marking (DSCP/802.1p), incoming physical port, etc. Classified packets are placed in a class of service that shares a common QoS treatment in the switch. Metering and coloring Traffic metering and coloring measures the packet rate and assigns colors to the incoming packets based on their arrival rate. Usually a dual-rate mechanism is employed: Packets not exceeding the first rate are colored green, packets exceeding the first rate but not the second are colored yellow, and finally packets that exceed both rates are colored red. Typical implementations may use token bucket-based algorithms such as the “two-rate three-color marker” [rfc2698] or “virtual scheduling” as used in ATM networks [37]. Policing Traffic policing is used to enforce a maximum transmission rate based on the colors assigned to the packets. For example, red packets may be dropped immediately while yellow packets are candidates for dropping, whereas green packets should be allowed through without any restrictions. Shaping Traffic shaping is used to not only enforce a maximum transmission rate, but also to smoothen the traffic flow. This may be based on the same coloring used for policing, or new measurements may be made. Instead of dropping the offending packets, they are placed in a buffer until they may be transmitted without violating the defined packet rate or bit rate. However, if the shaping buffer fills up, packets may be dropped here as well. Scheduling Packet scheduling is responsible for selecting packets from queues representing different service classes and placing them in the output queue in a particular order. Many different scheduling/queue management algorithms exist, from a basic “round-robin” scheme to more complicated schemes such as “priority-based deficit-weighted round-robin” and congestion avoidance algorithms such as flow-based random early detection. Rewriting If one wishes to carry explicit service class or priority markers in the packets themselves (for use by other routers/switches), these are written to the packets by the packet rewriter which modifies the packets before transmission. © SPARC consortium 2012 Page 85 of 129 WP3, Deliverable 3.3 5.8.2 Split Architecture - SPARC Improvement proposal OpenFlow has basic support for some of the methods described above. Since OpenFlow is continuously evolving we will try to show the differences between the various versions (1.0, 1.1, and 1.3) and what has changed between them. Classification can be performed by matching packets in the flow table(s). In OpenFlow 1.0, multiple matching per packet is not possible since there is only one flow table and multiple passes through that table is not allowed. Thus it is not possible to decouple service class classification from flow classification (i.e., using one match for determining service class, and a different one for determining how to forward the packets). If one wishes to have a generic rule for a flow to perform forwarding but apply different QoS schemes, one needs to combine the forwarding and QoS rules, multiplying the number of flow table entries (e.g., with four different QoS service classes the number of entries becomes four times larger). In OpenFlow 1.1 multiple matches are possible, so there is no need to combine the rules, and the number of entries may only increase with a constant number at the cost of an additional match. The metadata field available in OpenFlow 1.1 – which is associated with a packet while it traverses the processing pipeline – could be used to temporarily store the assigned service class for use in later matching stages if necessary. The support for multiple lookups remains in OpenFlow 1.3, with the addition of “Extensible match support”. Extensible matching replaces the previously static matching structure (i.e. which header fields could be matched and how matching was performed) with a more flexible TLV-based structure, that can be extended by vendors. This could potentially allow more advanced classification than was previously supported. The major issue with classification is solved in OpenFlow 1.3 through multiple lookups and extensible matching for more advanced classification. However, OpenFlow still lacks a consistent way to color packets and change their behavour (e.g. in queuing and scheduling) based on the this coloring. The metadata field is an ad-hoc way of carrying the color information but a more consistent way would be preferable, for example by adding a color field to the perpacket context data carried through the processing pipeline. Metering and coloring is not available in ether OpenFlow versions 1.0 or 1.1. What is possible is to collect statistics on the number of matched packets and bytes per flow entry. However, these cannot affect how packets are processed within the switch but are purely for collecting statistics. Our suggestion regarding improving metering and coloring would be to implement metering and coloring as a processing action that measures the packet rate or bit rate and writes the determined color to the metadata field. Following tables could then use the metadata to allow the packet to continue in the processing pipeline or to drop the packet. Since shaping uses similar mechanisms, it seems it should be as easy to implement as a processing action. However, since shaping requires buffering of packets that exceed the packet rate or bit rate, it becomes more complicated. This would require that the processing pipeline be capable of temporarily storing some packets somewhere in the pipeline, continue processing incoming packets, and then pick up processing of the stored packets mid-pipeline when the shaping mechanism allows it. OpenFlow version 1.3 includes per-flow meter support. These meters are configured in a metering table and can be accessed from the flowtable on a per-flow basis. In the current state it support multiple bands per meter, meaning that different actions can be applied based on the packet rate. Currently only two actions are defined, drop and DSCPremark. With the drop action packets can be dropped if they exceed the limit, making it possible to do policing with this type of meter band. DSCP-remark allows the meter to mark the IP header of a packet based on the current packet rate, which together with flexible matching could be used for decisions further along in the processing pipeline on the switch, or by other switches/routers in the network. This limited type of coloring is however only applicable to IP packets. To improve the mechanism in 1.3 it would be natural to extend the existing metering framework to add support for more coloring methods that remark packets (e.g. using priority bits in Ethernet and CoS bits in MPLS). Additionally support for coloring that does not modify the packet would be useful, for example through the metadata field or a dedicated color field. Policing before entering the flowtable is not supported by any of the OpenFlow versions, and only version 1.3 supports it within the flowtables with the mechanisms described for metering and coloring. However, this may leave the flowtables and associated processing vulnerable to congestion since it is not possible to limit the rate of packets entering the processing pipeline directly after packet classification has been performed (i.e. once the switch has parsed the packet, not classified in terms of QoS class of service assignment). One could imagine a solution to this where policing units (or meters) can be executed before the flowtables: However, these meters would need a similar matching structure © SPARC consortium 2012 Page 86 of 129 WP3, Deliverable 3.3 Split Architecture - SPARC to the one already available in the flowtables, which makes this solution would less elegant since we end up with a structure that is more or less identical to flowtables. However, in most situations where the flowtables are overwhelmed by a long complex processing pipeline it would be enough to do policing early in the pipeline and in that way reduce the load on the rest of it. Shaping is supported by all OpenFlow version using the maximum-rate queues. Since packets can be implicitly marked with which queue they should go through it is possible to shape on both unicast and multicast flows, if somewhat awkwardly. It is not possible to shape a flow before its packets are sent to the port-queue, so for example shaping a flow and then spreading the packets over multiple outgoing ports is not possible; the shaping can only be performed after the packets have been sent to the ports. This makes it difficult to implement certain shaping schemes. For example, if one flow is spread over multiple ports using a Select group (which sends to one outgoing port based on a criterion, e.g. a hash of the packet header or by simple round robin) it is not possible to shape the actual flow. Rather, one is forced to shape each individual part of the flow that has been created by the Select group. Shaping could be improved using the same mechanisms described for metering and coloring, e.g. by adding shaping support to the meter bands. Packet rewriting is supported by all OpenFlow versions through built-in actions. OpenFlow version 1.0 supports modifying the Ethernet 802.1p priorities and IP ToS bits, this is extended in Version 1.1 by MPLS traffic class bits. In OpenFlow 1.3 the ToS bits of an IPv6 header can be modified as well. All these fields may also be used as matching targets in the flow table(s). Matching and modifying these fields makes it possible to interact with legacy systems on the QoS level, for example to utilize the QoS classification performed and written by legacy systems. Scheduling has limited support in all versions, through queues that attach to a port. In OpenFlow 1.0 and 1.1 only a single type of queue is supported, namely the minimum guaranteed rate queue, which acts as a traffic shaper attached to an outgoing port. This queue allows to reserve a percentage of the outgoing bandwidth of a port, but it is not defined whether it should consider priorities, e.g., via the different packet marking mechanisms. This queuing concept could be extended with more advanced queuing mechanisms, supporting different kinds of queuing algorithms such as random early detection (RED). Additionally they could take into account not only the explicit packet markings (i.e., 802.1p, IP ToS or MPLS TC), but also use parts of the metadata field (only available from OpenFlow version 1.1) to allow packet prioritization without explicit marking. For example, the packet metering functions could implicitly modify bits 0-3 of the metadata, or these metadata bits could be modified by explicit metadata modification actions. OpenFlow version 1.3 introduces another type of queue, the maximum rate queue, which can be used to shape outgoing traffic. One major issue with the current queuing mechanism is that the queues only attach to outgoing ports – it is not possible to chain queues with other queues. This makes it difficult to construct hierarchical queuing structures, something that highly simplifies construction of typical QoS schemes. Using hierarchical schemes one can easily design complex QoS setups like the one depicted on the left in Figure 50, where two different organizations share the bandwidth of a single link. Within their shares they have further subdivided the bandwidth between a number of services, some with lowlatency requirements and some with less stringent latency requirements. The bandwidth values are the guaranteed values for these classes. However, if there is leftover bandwidth one level higher up in the hierarchy, this bandwidth may be used without violating any guarantees. For example, organization B may be using 75 Mbps of the total capacity if organization A is not currently utilizing all of its guaranteed bandwidth. Packets that do not match any of the defined QoS service classes may also utilize any leftover bandwidth, if no guarantees are broken. This type of sharing of the leftover bandwidth is difficult to manage in a fair way with a flat organization (on the right in Figure 50), whereas with a hierarchical organization it is easy to determine how the share of the leftover bandwidth should be distributed among the different service classes. © SPARC consortium 2012 Page 87 of 129 WP3, Deliverable 3.3 Split Architecture - SPARC Figure 50: The hierarchical and the flat QoS model A hierarchical QoS model would also fit very nicely in the virtualization concepts discussed in Section 5.2. In this case, the network operator could create and attach queues to the ports per virtual network. These queues would then be presented as physical ports to their respective virtual networks, which could attach their QoS classes directly to this queue (see Figure 52). This would not only ensure isolation between the different virtual networks and allows complex queuing setups within the virtual networks, but could also allow the creation of additional virtual networks within previously defined virtual networks (i.e., recursion). Figure 51: Representing a leaf of one QoS hierarchy as the root of another, virtualized one. 5.9 Multilayer Aspects: Packet-Optical Integration The term multi-layer needs an up-front clarification: In GMPLS terminology (RFC5212), a region is defined by a switching technology. Multi-region nodes in exhibit at least two different interface switching capabilities (ISC) An ISC represents one out of six currently defined switching technologies for the interface: PSC (Packet Switch Capable), L2SC (Layer-2 Switch Capable), TDM capable, LSC (Lambda Switch Capable), FSC (Fiber Switch Capable), and DCSC (Data Channel Switch Capable). This section deals with the integration of optical networks and OpenFlow. Multiple layers may well be formed within a single switching technology, e.g., TDM, multiplexing fine-grained circuits into coarse-grained. The following considerations therefore consider multi-region nodes (which are always multi-layer, as well). © SPARC consortium 2012 Page 88 of 129 WP3, Deliverable 3.3 Split Architecture - SPARC Hybrid node Packet Circuit Cct id Match / action / stats 1 2 3 4 port Lab el in Adaptation and Output action 1 2 3 4 Figure 52: GMPLS Multi-Region hybrid node, composed of a packet OpenFlow switch and a circuit (TDM or WDM) switch. Introduction of current generation optical networks into the domain controlled by OpenFlow is a tedious task because of the complexity of optical signal transmission:    While an Ethernet port in a conventional OpenFlow switch will have little more status than being “UP” or “DOWN”, WDM ports are characterized by the number of channels available, the nature of the transceivers (e.g., short/long range), available bit rate and in the near future even the size of the available spectrum. The transmission via Ethernet cable is assumed error-free (because the signal is terminated and regenerated at each switch). However, the quality of the received signal in an optical path varies as function of the length of the optical path, the number of hops without full regeneration, the number of active wavelengths, the modulation scheme, etc. This has the side effect that signal quality is attributed to an optical path (a flow) instead of to a port. While actions in OpenFlow are typically coded as packet header modifications, and output actions, which can be arbitrarily defined, optical crossconnects differ in the available switching capabilities. As an example, ROADMs can be fixed or colorless, directive or directionless, contention-prone or contention-less. These switching constraints are non-existent in Ethernet, and consequentially in OpenFlow switches. In addition, the number of transponders in a so-called “hybrid switch” (switch with non-homogeneous ports, see Section 4.2.1 of RFC5212) limits the throughput. While a label in OpenFlow can be directly matched on the packet headers, the label information for wavelength or OTN channels is indirect. Encoding for labels beyond packet transmission has been introduced along with GMPLS (RFC3471) and has been extended. All these labels, however, are only visible in the control plane and need to be mapped to a certain configuration of the switching matrix in an optical crossconnect. 5.9.1 GMPLS and OpenFlow Most of the problems described above were tackled some five years ago in the IETF GMPLS working groups (ccamp) when defining GMPLS extensions for wavelength switched networks (WSON). There is no obvious reason to invent new encodings for ports, labels, adjustment capabilities and optical impairments [38]. Encodings IETF GMPLS/WSON OpenFlow 1.1.0 spec Port RFC4202 (PSC,L2SC, TDM, LSC, FSC) RFC4206, RFC5212, RFC 6002 (DCSC) Section A.2.1: RFC 3471, RFC4606 (TDM), RFC6205(WDM), draft-farrkingelccamp-flexigrid-lambda-label-03 (FlexGrid) Section A.2.3: Action (Adjustment capacity) RFC 5212, RFC5339, RFC 6001 (IACD) Section A3.6: Supported action bit field in struct ofp_table_stats Flow stats (Impairment encoding) RFC6566,draft-bernstein-wsonimpairment-encode-01.txt Section A3.6: Match (Flow label) enum ofp_port_features struct ofp_match struct ofp_flow_stats Table 5: Options for the transport of encoded information in OpenFlow © SPARC consortium 2012 Page 89 of 129 WP3, Deliverable 3.3 Split Architecture - SPARC The question here is rather how and where these encodings can be introduced in OpenFlow. It turns out (see Table 5) that potentially all of the peculiarities of optical transmission can be architecturally covered by the existing OpenFlow standard 1.1.0. However, some of the mappings require a different understanding of the nature of a flow in optical networks. This is especially the case for the use of ofp_flow_stats for the encoding of optical impairments. In fact, it is the different nature of the optical flow that prohibits flow statistics relying on packet counts. 5.9.2 Virtual ports vs. adaptation actions As discussed in Section 5.1.2, there would be a functional equivalence between virtual ports and potentially complex “processing actions” if the latter would be able to keep state information. For optical networks relying on circuit switching, one could create virtual ports that identify an established circuit, thereby hiding the optical details of this circuit and making the port appear again as a regular OpenFlow port. All configuration of this virtual port would go through either the configuration protocol (OF-config) or OpenFlow itself. This would cover not only creating/updating/deleting the port, but also manage its internal behavior. The virtual port concept would split a multi-region hybrid node into two nodes (see Figure 52), a conventional (packet-switched) OpenFlow controlled part and another part that may or may not be under the control of OpenFlow. Adaptation actions have the potential to pull the encapsulation/decapsulation into the domain controlled by OpenFlow, making the use of a separate configuration interface for virtual ports superfluous/obsolete. Capabilities are bound to tables, starting from OpenFlow 1.1, which means that a specific encapsulation/decapsulation action like, e.g., the Generic Framing Procedure (GFP), could be called as an action on a flow. However, output actions are not yet port-specific, which means that a GFP-mapped TDM signal could end up in an Ethernet port by misconfiguration. It would be helpful to associate ports to tables for a clean configuration of hybrid nodes. 5.9.3 Three Levels of Integration OpenFlow can be gradually introduced to control optical network equipment, leveraging on existing control plane implementations. This may be advantageous as it saves development cost and may as well protect IPR used in the Path Computation Elements [70] of vendors. The following three phases follow the implementation time line in the project OFELIA as presented in [39]. Adaptation of Overlay model 5.9.3.1 One of the GMPLS architecture [69] considered routing models defines dedicated GMPLS control plane instances to the different transport technologies. These instances communicate with each other via a User-to-Network interface. In one possible implementation, a single abstract node represents the whole server domain. The optical transport plane takes care of the optical related attributes (ports, labels, actions) while the packet forwarding is configured via OpenFlow. This way the details of optical transport domainare hidden from the OpenFlow part and the optical domain appears as the backplane of a single Ethernet switch (Figure 53). Operation is such that a packet_in message from one of the hybrid switches triggers the setup of a light path via the GMPLS UNI [71]. This means that the flow_mod entries that are generated by the OF controller for the hybrid switches are following the establishment of a light path between the transponders (appearing as virtual ports between the packet and the circuit switch part of the hybrid switches). OpenFlow controller GMPLS (PCE) Hybrid node Hybrid node Packet Match / action /stats Circuit 1 2 3 4 C c t i d 1 2 3 4 p o r t L a b e l Adaptation and Output action Adaptation and Output action i n L a b e l i n OXC node Circuit C c t i d p o r t L a b e l i n Circuit p o r t Packet C c t i d 1 2 3 4 1 2 3 4 Match / action /stats Adapta tion and Output action Figure 53: Interworking of OpenFlow and GMPLS © SPARC consortium 2012 Page 90 of 129 WP3, Deliverable 3.3 Split Architecture - SPARC Abstracted optical layer 5.9.3.2 A second step of integration is the OpenFlow control of optical nodes (Figure 54). This means that each network element has an agent translating the received OpenFlow messages into local (typically SNMP) control commands. A flow_mod from the controller is then used to configure an entry in the switching matrix. Path computation will still be done in a GMPLS PCE, but all configuration of the network elements would now go through the OF controller. While the optical routing is remaining in the GMPLS control plane with the PCE evaluating OSPF-TE messages from GMPLS nodes to create a view of the topology, OpenFlow replaces signaling. The controller requests an ERO (explicit routed object) from the PCE using PCEP, and then configures the nodes accordingly. On the OpenFlow side this step will require encoding of label types and adaptation actions. Ports can still be considered abstract, as they only appear after a feasibility check OpenFlow controller PCEP GMPLS (PCE) SNMP Hybrid node Hybrid node Pac ket Circuit Matc h / ac tion /stats 1 2 3 4 C c t i d 1 2 3 4 p o r t L a b e l Adaptation and Output action Adaptation and Output action i n L a b e l i n Circuit p o r t 1 2 3 4 OXC node Circuit C c t i d p o r t L a b e l i n Pac ket C c t i d 1 2 3 4 Matc h / ac tion /stats Adapta tion and Output action Figure 54: Direct control of optical network elements by the OF controller. Path computation is still being done in a ve dor’s co trol pla e. Impairment-aware OpenFlow 5.9.3.3 The ultimate step integrating packet and optical transmission will be the mapping of the attached transport technologies (i.e., ISC) to flow tables. This will require association of ports to flow tables, the definition of impairment-annotated flow_stats and the path computation as an integral part of the multi-layer OpenFlow controller, as indicated in Figure 55. OpenFlow controller PCE new (optical) matches, capabilities, flow stats Hybrid node Hybrid node Packet Match / action /stats Circuit 1 2 3 4 C c t i d 1 2 3 4 p o r t L a b e l Adaptation and Output action Adaptation and Output action i n L a b e l i n OXC node Circuit C c t i d p o r t L a b e l i n Circuit p o r t Packet C c t i d 1 2 3 4 1 2 3 4 Match / action /stats Adapta tion and Output action Figure 55: Integration of a PCE into a multilayer OpenFlow controller © SPARC consortium 2012 Page 91 of 129 WP3, Deliverable 3.3 Split Architecture - SPARC Two main propositions exist in the literature on how to implement this step of full integrated circuit based transport networks to SDN. First we will briefly introduce the one proposed by the University of Stanford, and then we discuss the second solution, proposed by Ericsson. OpenFlow Circuit Switched Addendum by Stanford University: First, there is the OpenFlow Circuit Switched Addendum v.03 [52], which is an extension to OpenFlow 1.0. It consists of seven additions to the OpenFlow 1.0 specification. The Addendum proposes extensions to support certain transport layer technologies, in particular time division multiplexing (TDM). A basic circuit switched cross-connection table is defined inside the OpenFlow switch. This cross-connect table is to be kept separate from the usual OpenFlow packet flow table. The circuit switch flow table has four fields per input and output ports. These include the port, the lambda, the virtual port and the TDM signal and time-slots (starting time slot in the SONET/SDH encoding). Unfortunately, these extensions re-define from scratch the circuit resources used by the TDM and other transport technologies. This adds unnecessary complication to the OpenFlow protocol and compatibility issues with existing control planes (e.g. GMPLS). Ericsson’s GMPLS-aware Multi-Layer/Multi-Region Extensions to OpenFlow: Second, Ericsson further enhanced the circuit switched addendum by reusing GMPLS encodings and the logical concept of GMPLS label switches paths (LSP) [53]. Compared to the former addendum, this solution removes the complexity of re-defining circuit and optical resource encodings and emulates LSP nesting features. The Ericsson ML/MR proposal also contains a circuit flow table, which has however a different use than the current packet flow table; the former will only represent existing connections while the latter serves in a per packet lookup process. The fundamental difference between circuit switched and packet switched OpenFlow is therefore the fact that the circuit flow table is not used to lookup packets. The OpenFlow controller is responsible for setting up the circuits’ cross-connections in the switch using the OpenFlow protocol and treating messages received from the switch regarding the current state of connections. The circuit cross-connections are established in a proactive way, i.e., no packet is forwarded to the controller for circuit flows. However, a packet sent to the controller can trigger the establishment of a new circuit cross-connect (e.g. pre-configured cross-connects, similar to virtual TE-links in GMPLS). The extensions proposed by Ericsson consider hybrid switches with both circuit based and packet based interfaces. This is not to be confused with the OpenFlow hybrid terminology defined by the ONF. The Ericsson OpenFlow extensions partly rely on existing GMPLS features, specifically on GMPLS’ way of provisioning new connections with the standardized label encodings. This implies that GMPLS routing function are taken over by the centralized OpenFlow controller, which furthermore contains traffic engineering (TE) and path computation (PCE) applications. OpenFlow Controller OpenFlow Protocol OpenFlow ML/MR Switch M C Table 1 I M Table n C M I Group Table GI GT C C I Circuit Flow Table Packet Flow Table Table 0 cct ID in port out port L A B E GEN IN L OUT adaptation actions AB Switch Hardware APIs Switch Hardware Drivers Packet Switch Fabric Packet Switch Ports HW Packet Processing Tables TDM Switch Fabric TDM Switch Ports WDM Switch Fabric Line Ports add/drop ports Figure 56: Ericsson proposal for an OpenFlow multi-layer/multi-region switch architecture © SPARC consortium 2012 Page 92 of 129 WP3, Deliverable 3.3 Split Architecture - SPARC The proposed node architecture of an OpenFlow ML/MR switch is shown in Figure 56. Packet flow tables (left side of the figure) are consulted on the fly for each packet to determine its forwarding and required actions. Circuit flows (right side of the figure) represent existing physical circuits established by the switch. The circuit IDs serve as virtual ports to other flows. A circuit ID is a virtual port to which incoming packet flows can be forwarded. Other circuit flows can also point to a circuit ID and hence represent circuit hierarchies (the equivalent to GMPLS LSP nesting). The circuit flows do not affect the on the fly processing of packets. The proposed architecture is an extension to the OpenFlow 1.1 specification, the left side of Figure 56 is therefore left unmodified. CCT ID in port out port Gen Label label in label out (encoding,ST,G-PID) (e.g. TDM/WDM) (e.g. TDM/WDM) adaptation actions Figure 57: Circit flow table entry Figure 57 shows that each entry in the circuit flow table consists of a set of circuit identifiers and descriptive fields. Again, the circuit table is just an internal representation of existing cross-connects inside the switch and a new entry is added each time the controller signals the establishment of a new circuit. For the case of bidirectional circuits, the circuit will occupy two entries in the circuit flow table, as resources (In/Out Labels) may not be symmetrical in both directions. The flow table fields in Figure 57 are defined as follows:      Circuit Identifier (CCT ID): a 32 bit unsigned integer represents the circuit flow and also corresponds to a virtual port to which other flows can be forwarded. In Port/Out Port: a 32 bit unsigned integer represents the incoming/outgoing port number between which the circuit cross-connects have been programmed. General Label (Signal Specification): a 32 bit unsigned integer represents the information required to fully characterize the cross-connect as part of an end-to-end circuit. It comprises of the following GMPLS specified attributes: encoding, switching type and payload identifier. The OpenFlow extension adopts the code point domains defined by IETF RFC 3473: o Encoding: an eight bit unsigned integer that designates the signal type of the connection within the transport technology class. For example, both SONET and OTN are denoted with TDM switching capabilities, but different encoding code points identify them. o Switch Type: an eight bit unsigned integer that designates the switching type used on the link. This is particularly important for hybrid switches, which has interfaces supporting more than one region. o G-PID: a sixteen bit unsigned integer that designates the payload of the client signal carried in the circuit. In Label/Out Label: a vector of 32 bit unsigned integer represents the incoming/outgoing label following GMPLS standardized labels per technology. Adaptation actions: the adaptation of the signal from the input towards the output port. This field can also be used in the future for specific technology related actions (e.g. related to optical technologies). The establishment of a new circuit flow (and hence its addition to the circuit flow table) must carry enough information to allow the switch to program its cross-connections. To be able to signal the new circuit flow cross-connect, the controller first needs to know the features of the switch, its ports, and the available resources. The information stored in the circuit flow table comes from the controller and is sufficient for the switch to establish the circuit connection. To this end, the controller needs to keep an updated view of the switch’s resources and state. © SPARC consortium 2012 Page 93 of 129 WP3, Deliverable 3.3 6 Split Architecture - SPARC Implementing Carrier-Grade SplitArchitecture in an Operator Network SplitArchitecture and SDN provide a significant degree of freedom to network planners and designers as they remove several constraints and boundaries found in legacy architectures. An operator network is typically structured in transport domains. Today the size and scope of such transport domains are fixed by manual network planning, as different datapath elements typically provide different functions. However, in SplitArchitecture the differences among datapath elements (e.g. switching and routing devices) start to disappear, as datapath elements are capable of providing both functions in parallel. This enables network operators to adapt a fixed transport domain and its boundaries based on dynamic conditions (e.g., load situation, etc.), of course within the limits defined by the physical deployment and wiring of datapath elements. We identified several types of integrating OpenFlow in carrier networks: 1. Emulation of transport services: As a first step, OpenFlow may be introduced in transport domains (e.g., Ethernet, MPLS, optics, etc.) by replacing legacy network devices with OpenFlow-compliant datapath elements and deploying a control plane that emulates behavior of the legacy transport technology in use, e.g., an Ethernet domain, an MPLS domain, etc. All nodes connected to such an OpenFlow enhanced transport domain still use legacy protocols for providing service and remain unaltered. OpenFlow in its versions 1.0 and 1.1 provide all the means to control Ethernet transport domains in such a scenario. However, support for enhanced Ethernet or MPLS services (e.g., those from the Metro Ethernet Forum), including OAM and reliability features, is beyond scope of OpenFlow 1.0/1.1. 2. Enhanced emulation of transport services: For a carrier-grade SplitArchitecture, a number of mandatory features and functions must be added to OpenFlow in order to fully comply with OAM requirements (among others), resiliency and scalability needs. OpenFlow lacks support for such advanced functions in versions 1.0 and 1.1 and must be extended accordingly to emulate carrier-grade transport services. Basic MPLS support was added to OpenFlow 1.1, but support (e.g., for MPLS-specific OAM schemes like BFD) is still lacking. We cover some of the necessary extensions to OpenFlow 1.0 and 1.1 in Section 5 of this deliverable, including OAM, advanced processing, interaction with legacy stacks, resiliency, and multilayer operation in OpenFlow. Again, all service nodes in this second integration scenario remain unaltered. 3. Service node virtualization: Thus far we have focused on ways of emulating legacy transport domains with OpenFlow. However, besides such basic transport services, carrier-grade operator networks provide a number of additional functional elements, e.g., for authentication and authorization, service creation, enforcing quality of service, etc. Most of these functions are today located on a limited set of network devices; the discussions in Deliverable D2.1 have documented the exposed position of the Broadband Router Access Service Gateway (BRAS) in carrier-grade operator networks according to the architecture defined by the Broadband Forum and deployed by most operators. A third integration level for a SplitArchitecture is virtualization of such service node functions inside OpenFlow. This involves control plane as well as datapath elements to cope with more advanced processing needs, as interfacing with more legacy protocol stacks must be supported. For the access/aggregation use case, we will showcase the virtualization of service nodes in OpenFlow in more detail in Section 3 based on an access/aggregation PPP/PPPoE UNI example. 4. All-OpenFlow-Network: Obviously, OpenFlow deployments may be pushed forward to other network domains as well, e.g., for controlling residential gateways (RGW) in customer premises networks or toward the operator’s core domain. Controlling RGWs may simplify service slicing and service deployment in customer premises networks, but defines new constraints on an operator’s control plane in a SplitArchitecture: Controlling customer-owned devices outside of the network operator’s area of responsibility may impose additional security requirements. However, these security implications are beyond the scope of this deliverable. Existing OpenFlow versions (1.x) suffice for integration of OpenFlow for pure emulation of transport services (type 1 listed above). Consequently, this deliverable is mainly targeting enhancements to the emulation of transport services (type 2) by extending the architecture and protocols as described in Sections 4 and 5. However, besides OpenFlowbased configuration of transport nodes, this deliverable also starts to look into service node virtualization (type 3), i.e. how service functionalities can be implemented through a centralized control architecture, discussed so far in Section 5.6. In the following sections, we will outline generic approaches of how to realize service creation in Access/aggregation network with OpenFlow. We will then present specific examples for residential and busioness service creation (BRAS, DHCP and PWE, respectively). © SPARC consortium 2012 Page 94 of 129 WP3, Deliverable 3.3 Split Architecture - SPARC OpenFlow in Access/Aggregation Networks 6.1 One essential question in the design of the SplitArchitecture is how to integrate OpenFlow into the existing network architecture. There are three aspects to consider: The first aspect is dealing with the level of the hierarchy at which we introduce OpenFlow. The second aspect relates to the number of hierarchy levels controlled by OpenFlow. Finally, the third aspect focuses on which functionalities are configured through OpenFlow. Considering these aspects, we will present three evolutionary approaches for today’s residential service creation as in Section 5.6.3. Today’s residential model Centralised OpenFlow control BRAS OF Controller AGS2 Decentralised OpenFlow control Complete OpenFlow control automatic configuration less network devices AGS1 Tunnel DSLAM OF Controller Tunnel OF Controller RGW Figure 58: Three models for attachment of OpenFlow in access/aggregation networks The “Centralized OpenFlow Control” model would be very similar to today’s model with the exception that the OpenFlow controller would manage the central IP edge. This would result in a centralized network element in which potentially all customer traffic must be managed through OpenFlow with fine granular control – requiring powerful hardware. In addition, this model requires, similar to today’s model, a mechanism to maintain routes to forward all packets from the RGW to the central element – the IP edge. In the “Decentralized OpenFlow Control” model an OpenFlow controller would configure the DSLAMs. The difference to the previous model is that the DSLAMs control a significantly smaller amount of flows and bandwidth, but the OpenFlow controller needs to handle more connections to different DSLAMs than in the centralized model. The connection between the device and the OpenFlow controller, or between the OpenFlow controller and the network management system, requires additional network connectivity, which is implemented either with an out-of-band dedicated control plane network or multiplexed with the data links resulting in an in-band control network. Again, similar to the centralized model, the connection between DSLAM and the IP edge needs to be more automatic and the OpenFlow controller/subsystem needs to trigger the configuration of required transport connections. In the “Complete OpenFlow Control” approach all devices, including the DSLAM, are controlled by an OpenFlow controller. Besides managing service configuration at the DSLAM, the controller is also responsible for provisioning transport functions of all forwarding devices along the network path to the IP edge. Note that, for performance and compliance reasons, mechanisms other than OpenFlow may be used for managing these transport functions, but how they are implemented depends on the capabilities of the OpenFlow controller The decision favoring or rejecting a particular model is not part of the discussion in this section. Most likely there will be no clear general preference for a particular model in real network deployments, as this depends on several other aspects such as the size of the network, the availability of a hybrid OpenFlow switch model, which functionality needs be controlled, etc. The discussion of these models for the use of service creation continues in the following Section 6.2. 6.2 Implementing OpenFlow for Residential and Business Services in Carrier Environments In Section 5.4 service creation was introduced and detailed for residential and business services. Aspects of OpenFlow and related implementations were presented. From a technical perspective, the level of detail was not sufficient. Thus a detailed proposal for residential and business customers is presented here. © SPARC consortium 2012 Page 95 of 129 WP3, Deliverable 3.3 6.2.1 Split Architecture - SPARC Residential customer service with OpenFlow As described in Section 5.4, there are several solutions available with varying implementation options, see Figure 59 below for a general overview. Today PPPoE with  BRAS OF design today Out of scope  SPARC BRAS Split in architecture between forwarding and processing  SPARC DHCP++ Unification of production Figure 59: General implementation options for service creation Option 0, “OF design today,” follows the design rules and principles of the OpenFlow development done up to Version 1.1 – add any new protocol structure to the base protocol. As already discussed, this could result in an “explosion” of required protocol options, and hardware support might not be implemented in any case. In addition, the split between forwarding and processing could not be integrated. Therefore, this solution is considered out of scope. Option 1, “SPARC BRAS,” transforms current service creation design into the SPARC design principle of the split between forwarding and processing. The essential aspect in this solution is the required support for PPPoE processing, which could be hardware supported or emulated in various ways. Overall, several options are discussed in the following subsections. Option 2, “SPARC DHCP++,” uses the split of forwarding and processing, but essentially introduces a new set of protocols which needs to be supported in the OpenFlow environment. From a carrier’s perspective, the model provides an integrative approach for the various service creation models used today. The target of this solution is to not introduce new required protocol support to the base specification, but to propose additions for required features in OpenFlow. Beside the need for legacy support for PPPoE, this section details another important requirement on OpenFlow improvements as seen from SPARC: The decision logic for authentication and authorization. The authentication aspect is discussed a separate subsection in order to outline the requirements and the potential options in more detail. 6.2.1.1 Authentication In Section 5.4, authentication has been identified as one of the requirements and important phases of service creation. Some more detailed information on the specific models of authentication for residential customers in the models based on PPP as used in the model SPARC BRAS and DHCP as used in the model SPARC DHCP++ is presented here. The level of detail of authentication can differ depending on the desired level of information, ability of fine granular management and demands from legal aspects (which are omitted here). In general, there are two different levels: the authentication of a single user and the authentication of a connection per port. In the latter case, it could not be assumed that only one customer is authenticated. For example, fixed Internet dial-in environments (e.g., POTS, ISDN) were authenticating a single user only (and demanded the function for multiple users per dial-in connection) until the widespread deployment of residential gateways in xDSL environments, which typically uses only one set of user name/password for the whole group of users connected to them. Therefore, the most common model today is the authentication of a connection per port. Similar models exist in mobile environments as well, where the definition of connection/port differs, but uses the same principle. The authentication of a single customer could be required in different situations/cases:   Each customer needs to be authenticated in the case of multiple customers per connection per port. Customer authentication in nomadism and a number of mobility cases. Another important reason for authenticating on the port level is the configuration of a customer / service specific profile (parameters such as QoS profiles, nomadism, access to service platforms, etc.). In general the same reasoning applies for the authentication of a single user and therefore this model should be combined with the authentication of a port. Port-level authentication is typically handled through the help of an (virtual) identifier (defined as Line ID) and a mechanism to include this identifier in the connection setup process. In principle, the Line ID could be anything that © SPARC consortium 2012 Page 96 of 129 WP3, Deliverable 3.3 Split Architecture - SPARC uniquely identifies the port (the process of creation/provision of the LineID is not yet standardized). Since the authentication is performed centrally at the BRAS node, a mechanism to pass the Line ID (however defined) to the BRAS is essential. The mechanism to include the Line ID in the connection setup process depends on the protocol used in the service creation. For PPPoE, this is done by the PPPoE Intermediate Agent. In DHCP a similar function exists with option 82 (DHCP Relay Agent Information Option) function, integrated into the DSLAM. With OpenFlow authentication can be a rather simple or extremely complex processes depending on the two outlined authentication targets (single user or connection/port). Both mechanisms in today’s carrier environments (PPPoE Intermediate Agent and DHCP option 82) could be supported easily with the right filtering mechanisms in the DSLAM and forwarding to appropriate processing, or through the use of the hybrid mode and ignoring the authentication messages from the OpenFlow-configured part of the switch. On the other hand, the port information is integrated in any OpenFlow packet-in message sent to the controller. Therefore, the information about the port could be used in any controller application that in turn could, e.g., fetch it from a database attached to the controller. This approach would require OpenFlow support in the DSLAM or any other network device terminating the customer’s access line. Otherwise appropriate port information would not be available for the controller or would have to be added to the request (packet-in message) with appropriate mechanisms like adding a virtual identifier (e.g., VLAN ID) at the ingress of the DSLAM (the port) and the correlation of VLAN ID with the respective DSLAM port. Figure 60 depicts the principle of the PPPoE model and the potential application of the split control functions. Resource configuration is an essential part of service creation, but will be documented in an upcoming SPARC deliverable – it is included in the illustration in order to show the difference between the two models. Today service creation is performed centrally in the BRAS and the configuration information is transmitted via RADIUS. In an OpenFlow environment, an AAA application has to take over the task, or at least needs a link to the resource configuration system/function via the OpenFlow controller. RADIUS AAA and User Profile server Resource configuration* BRAS AAA and User Profile server AAA App AGS2 PPPoE AAA inform. AGS1 DSLAM with PPPoE Intermediate Agent OF Controller AAA inform. RGW * Or via OF controller Figure 60: AAA integration in PPPoE and OpenFlow Authorization is the step after authentication and could be handled in a simple fashion with OpenFlow too. The controller (or rather, an authorization application running on top of it) can simple send appropriate replies in “Packetout” messages and configure the forwarding and processing policies at the DSLAM using e.g. flow modification messages. 6.2.1.2 Example: SPARC BRAS We depict three deployment scenarios of a split BRAS (Broadband Remote Access Server) in Figure 61. Note that here we consider a BRAS to be a software component deployed on a Broadband Remote Access Router (BB-RAR) or Broadband Network Gateway (BNG) as defined in BBF TR-101. Therefore, BRAS and BNG are used synonymously within this document: a) The first scenario covers a typical legacy BRAS deployment. The BRAS is deployed on a BNG and the latter acts as a termination point towards the Ethernet based access/aggregation domain. The BRAS is a central point containing all functions defined within TR-101 [61]. b) The second scenario replaces the BRAS/BNG device with an SDN based solution consisting of an extended datapath element and a control plane implementation that emulates the behavior of a legacy BRAS device. This scenario allows a smooth deployment of SDN-based elements within a legacy deployment and defines a migration path. Necessary extensions to the datapath element must provide the requested protocol encapsulation services, policy enforcement, and OAM functions. © SPARC consortium 2012 Page 97 of 129 WP3, Deliverable 3.3 Split Architecture - SPARC c) In the third scenario, the BNG functions are distributed on different devices within the access/aggregation domain. The BNG provides VLAN support, QoS enforcement, hierarchical scheduling, user traffic isolation, IP forwarding services, multicast support, ARP processing, DHCP specific functions, security, and OAM support. Contrary to the centralized scenario (b), SDN allows distribution of these functions among all nodes within the access/aggregation domain, thus leading to load sharing, relieving the centralized BRAS device. Today’s residential model BRAS PPPoE App RADIUS new RADIUS OF Controller OF Controller AGS2 AGS2 AGS1 SPARC BRAS Option 2 PPPoE App BRAS RADIUS PPPoE Session (data and control) SPARC BRAS Option 1 Transparent for PPPoE (*) PPPoE Data Session AGS1 AGS2 Transparent for PPPoE (*) PPPoE Data Session DSLAM DSLAM RGW RGW * Except for PPPoE Intermediate Agent PPPoE Control Session * Except for PPPoE Intermediate Agent AGS1 PPPoE Control Session (*) DSLAM RGW * PPPoE Intermediate Agent required (line ID) Figure 61: SPARC BRAS optio s i co trast to today’s reside tial odel We have seen from the discussion in the previous section that SplitArchitecture allows network operators the freedom to distribute control and data plane functions across several network elements. In a legacy access/aggregation domain the BRAS, as a special purpose service node, provides all authentication and authorization services for specific service types and traffic shaping in a centralized manner. With SDN techniques, a network operator may virtualize all these functions in the control plane and may enforce them in the data plane at different locations. SDN allows an operator to define a smooth migration strategy: in a first phase legacy network elements (e.g. BRAS) may be replaced with SDN enabled ones. An SDN enabled BRAS must support the same set of protocols and network functions as its legacy counterpart in order to act as a drop-in replacement. Besides BRAS compliant devices, a network operator may also replace the aggregation switching devices and/or access nodes with SDN enabled versions. After replacing all legacy devices, a network operator may decide to maintain the old set of protocol suits on all these devices. This in fact results in emulating a legacy domain and all network functions at their intended locations, respectively. As an alternative, a network operator may also decide to distribute or relocate functions inside the aggregation domain and thus, changing the network protocol stacks deployed on individual network elements. In either case the SDN framework must provide adequate means to emulate all network functions defined by the network operator for his aggregation domain, e.g. as defined by the Broadband Forums TR-101 specification [61]. For an exhaustive list of requirements for a revised BRAS (named Broadband Network Gateway or BNG for short), please refer to the aforementioned document. We limit our discussion on mapping these functions to an OpenFlow-based SDN framework. TR101 Issue 2 chapter 5 defines the requirements for a Broadband Network Gateway (BNG) for the following categories: - VLAN support - Quality of Service and Hierarchical Scheduling - Multicast - ARP processing - DHCP relaying - OAM - Security Functions © SPARC consortium 2012 Page 98 of 129 WP3, Deliverable 3.3 Split Architecture - SPARC We compare the requested protocols with the current version of OpenFlow at the time of writing, i.e. OpenFlow v1.3. High level Architectural Requirements R-10 The Broadband Network Gateway MUST be able to terminate the Ethernet layer and corresponding encapsulation protocols. R-11 The Broadband Network Gateway MUST be able to implement the counterpart of the functions added to the Access Node for access loop identification, Ethernet-based QoS, security and OAM. R-12 Following TR-059 QoS principles, the Broadband Network Gateway SHOULD be able to extend its QoS and congestion management logic (e.g. hierarchical scheduler) to address over-subscribed Ethernetbased topologies.  Supported  In principle a BRAS/BNG acts as an IP routing device and forwards packets between  Supported  An IP router terminates L2 transport domains and requires Ethernet transport endpoints in each of these domains. Thus, MAC address rewriting is required for replacing source and destination MAC addresses in the Ethernet header.  the operator’s core network and the access/aggregation domain and vice versa. Since version 1.1, OpenFlow datapath elements provide support for IPv4 based layer 3 forwarding and since OF 1.2, also for IPv6 datagrams. Also supported: TTL decrement, IP ECN and DSCP fields for matching and queueing. Unsupported  OpenFlow 1.3 lacks PPPoE/PPP support, although some preliminary extensions for PPP termination have been defined. VLAN support R-190 The Broadband Network Gateway MUST be able to attach a single S-Tag to untagged frames in the downstream direction. R-191 The Broadband Network Gateway MUST be able to double-tag frames (S-C-VID pair) in the downstream direction. R-192 The Broadband Network Gateway MUST be capable of associating one or more VLAN identifications with a physical Ethernet aggregation port. These may be S-VIDs or S-C-VID pairs. R-193 The Broadband Network Gateway MUST support a one-to-one mapping between an S-VID or S-CVID pair and a user PPPoE or IPoE session. R-194 The Broadband Network Gateway MUST support a one-to-many mapping between an S-VID or S-CVID pair and a user PPPoE or IPoE sessions, where multiple PPPoE and/or IPoE sessions from the same user are within the same S-VID or S-C-VID pair. R-195 The Broadband Network Gateway MUST support a one-to-many mapping between a S-VID or S-CVID pair and users sessions, where multiple PPPoE and/or IPoE sessions from multiple users are within the same S-VID or S-C-VID pair.  Supported  OpenFlow supports IEEE 802.1ad style Q-in-Q and IEEE 802.1aq MAC-in-MAC virtual LANs. QoS – Hierarchical Scheduling / Policing R-196 The Broadband Network Gateway MUST be able to perform at least 3-level HS towards the Ethernet aggregation network. R-197 The Broadband Network Gateway SHOULD be able to perform 4-level HS towards the Ethernet aggregation network. R-198 The Broadband Network Gateway MUST be able to identify the root level by a single physical port. © SPARC consortium 2012 Page 99 of 129 WP3, Deliverable 3.3 Split Architecture - SPARC R-199 The Broadband Network Gateway SHOULD be able to identify the root level by a group of physical ports. R-200 The Broadband Network Gateway MUST be able to identify the second level (and potentially the third) by either 1 or 2 below. R-201 The Broadband Network Gateway SHOULD identify the second level (and potentially the third) by a combination of 1 and 2 above. R-202 The Broadband Network Gateway MUST identify the access loop by: a single C-VID, S-VID or S-CVID pair, or by the User Line Identification (described in Section 3.9). R-203 The Broadband Network Gateway MUST identify the logical port or session by a C-VID, S-VID or S-CVID pair, by the User Line Identification, by IP address, or by PPPoE session. R-204 The Broadband Network Gateway MUST be able to map between IP traffic classes and the Ethernet priority field. R-205 The Broadband Network Gateway MUST support marking Ethernet drop precedence within at least 2 traffic classes and MUST support configurable mapping from both the classes as well as drop precedence to the 8 possible values of the Ethernet priority field. R-206 The Broadband Network Gateway MUST support marking Ethernet direct indication of drop precedence within all supported traffic classes based on setting the DEI bit value of the S-Tag header. R-207 The Broadband Network Gateway, when receiving information about Broadband line rate parameters through PPP or DHCP, MUST NOT apply the information in an additive fashion when multiple sessions are active on the same Broadband line (the underlying rate is shared by all the sessions on a given line although each session will report the rate independently). R-208 The Broadband Network Gateway MUST support the application of ingress policing on a per user basis. R-209 The Broadband Network Gateway MUST support the application of Ingress policing on a per C-VID, S-VID or S-C-VID pair basis. R-210 The Broadband Network Gateway SHOULD support the application of ingress policing of a group of sessions or flows for a given user.  Out-of-scopeImplementing a scheduling policy is out of scope of the OpenFlow specification. OpenFlow defines the SetQueue action that can be used for mapping specific flows to QoS enhanced queues on an outgoing port. The queue-id namespace is a 32-bit number which should be sufficient for implementing the set of required queues. Multicast R-261 The Broadband Network Gateway MUST support multicast routing capabilities per TR-092 Appendix A “Multicast Support.” R-262 The Broadband Network Gateway MUST support IGMPv3. Note: IGMP v3 includes support for endpoints using earlier IGMP versions. R-263 The Broadband Network Gateway MUST support IGMPv2 group to source address mapping for IGMP v2 to PIM/SSM compatibility. R-264 The Broadband Network Gateway MUST provide the following statistics: … (see TR 101 Issue 2 for details) R-265 The Broadband Network Gateway MUST support forwarding the multicast traffic on the same Layer 2 interface on which it receives the IGMP joins. R-266 The Broadband Network Gateway MUST support the following configurable parameters per port (i.e. physical or logical port (VLAN), but not per end user). This allows the Broadband Network Gateway to enforce service level agreements in real-time. R-267 The Broadband Network Gateway MUST support IGMP immediate leave as part of the IGMP router function. © SPARC consortium 2012 Page 100 of 129 WP3, Deliverable 3.3 Split Architecture - SPARC R-268 The Broadband Network Gateway MUST immediately send Group Specific Queries out of an interface if it receives an IGMP query solicitation message (i.e. a Group Leave for group „0.0.0.0‟).   Supported  All multicast related requirements can be fulfilled by the functions defined for datapath elements in the OpenFlow specification. Supported in slow path  All IGMP specific functions will be dealt with in the slow path, i.e. in the control plane. IGMP processing and Hierarchical Scheduling R-276 A Broadband Network Gateway supporting hierarchical scheduling MUST support dynamic adjustment of the user-facing QoS shapers to reflect changes in the number of multicast groups joined by a user. (These adjustments would be inclusive of all levels of the hierarchy). R-277 A Broadband Network Gateway supporting hierarchical scheduling MUST be able to trigger dynamic adjustment of the user-facing QoS shapers based on the tracking of IGMP messages received on both regular user-facing interfaces as well as on the appropriate multicast VLAN, and also based on local knowledge of the peak-rate of multicast streams. The correlation mechanism to identify the proper scheduler node with an associated multicast group or groups is an implementation option of the BNG. The PPPoE session VLAN and IPoE multicast VLAN may or may not be the same. R-278 A Broadband Network Gateway supporting hierarchical scheduling SHOULD debit the amount of traffic offered by a given multicast group from the user-facing QoS shapers based on a provisioned association between a multicast group and a peak information rate. R-279 A Broadband Network Gateway supporting hierarchical scheduling MAY debit on a packet by packet basis the amount of traffic offered by a given multicast group from the user-facing QoS shapers on a real time basis. ARP processing R-211 For a given IP interface (say in subnet Z), the Broadband Network Gateway MUST be able to work in „Local Proxy ARP‟ mode: routing IP packets received from host X on this interface to host Y (X and Y are in subnet Z) back via the same interface. Any ICMP redirect messages that are usually sent on such occasions MUST be suppressed. R-212 The Broadband Network Gateway MUST respond to ARP requests received on this interface for IP addresses in subnet Z with its own MAC address. This requirement refers to both N:1 VLANs as well as to several 1:1 VLANs sharing the same IP interface on the BNG.  Supported  OpenFlow supports ARP as a native protocol in the data plane and can emulate MAC addresses in its action lists or sets. DHCP relay R-213 The Broadband Network Gateway MUST be able to function as a DHCP Relay Agent as described in RFC 951 “BOOTP”, RFC 2131”DHCP” and RFC 3046 “DHCP Relay Agent Information Option” on selected untrusted interfaces. R-214 The Broadband Network Gateway MUST be able to disable the DHCP Relay Agent on selected interfaces. R-215 The Broadband Network Gateway MUST be able to function as a DHCP relay agent on selected trusted interfaces, from which it does not discard packets arriving with option-82 already present, and does not add or replace option-82 in these packets. R-216 The Broadband Network Gateway MUST be able to function as a DHCP relay agent on selected trusted interfaces and MUST NOT strip out option-82 from the corresponding server-originated packets it relays downstream. © SPARC consortium 2012 Page 101 of 129 WP3, Deliverable 3.3 Split Architecture - SPARC R-217 The Broadband Network Gateway, when functioning as a DHCP Relay Agent, MUST discard any DHCP packets with non-zero „giaddr‟ in the DHCP request from the client. R-218 The Broadband Network Gateway, when functioning as a DHCP Relay Agent, MUST send the DHCP packets downstream as Layer 2 unicast or Layer 2 broadcast, according to the broadcast bit in the request. R-219 The Broadband Network Gateway, when functioning as a DHCP relay agent, MUST be able to transparently forward any DHCP option information other than for option 82.  Supported on slow path  DHCP is not supported by OpenFlow. All requirements defined in R-213 – R-219 must be implemented in the control plane (i.e. in the slow path). As the number of DHCP packets is typically low, this does not seem to be a critical limitation. OAM Short Intra-Carrier Maintenance Level R-354 The BNG MUST support an outward-facing Maintenance association End Point (MEP) on a per userfacing port and per S-VLAN basis. R-355 The BNG MUST support initiating a Loopback Message (LBM) towards its peer MEPs and receiving the associated Loopback Reply (LBR), for the MEP(s) on the user-facing port. R-356 The BNG MUST support receiving a Loopback Message (LBM) from its peer MEPs and initiating the associated Loopback Reply (LBR), for the MEP(s) on the user-facing port. R-357 The BNG MUST support initiating a Link Trace Message (LTM) towards its peer MEPs and receiving the associated Link Trace Reply (LTR) messages, for the MEP(s) on the user-facing port. R-358 The BNG MUST support receiving a Link Trace Message (LTM) from its peer MEPs and initiating the associated Link Trace Reply (LTR), for the MEP(s) on the user-facing port. R-359 For business customers and/or premium customers requiring proactive monitoring, the BNG SHOULD support generating Continuity Check Messages (CCMs) towards its peer MEPs for the MEP(s) on the userfacing port. R-360 The BNG MUST support turning off sending CCMs for the MEP(s) on the user-facing port, while keeping the associated MEP active. R-361 The BNG SHOULD be configurable to assume continuity exists from a remote MEP while not receiving CCMs from this MEP. R-362 The BNG MUST support receiving AIS messages on the MEP(s) on the user-facing port. R-363 The BNG SHOULD trigger the appropriate alarms for Loss of Continuity. A TR 101 compliant access/aggregation domain utilizes Ethernet like OAM mechanisms based on IEEE 802.1ag-2007 and ITU Y.1731. In a hybrid environment, SDN enabled devices must emulate legacy OAM behavior for interacting with non-SDN legacy devices. There has been no native OAM support defined in OpenFlow so far, although some basic ingredients for brewing an OAM solution exist (group tables, buckets, liveness of buckets, etc.). In Section 5.3, we introduced and discussed three proposals for defining a flow OAM solution for OpenFlow in order to address the specific OAM needs of SDN frameworks. A brief summary:   Carrier-grade OAM solutions require usually high precision timers (e.g. < 50ms for protecting voice carrying connectivity) for detecting loss of connectivity. An OAM endpoint’s core logic must detect a loss of connectivity on a primary path within a specific period of time, typically by exchanging echo messages with its peer, and maintain a second alternative path in case of a failure. OpenFlow provides the basic functionality in terms of group table entries and buckets for this task. However, the SDN control channel induces additional delay, OAM endpoints should not be instantiated within the slow path (=control plane) and must be deployed directly in the user plane. In case of a failure event, the OAM endpoint’s core logic must notify the control plane about this incident any further countermeasures. OpenFlow lacks specific notification functions for signaling such error conditions currently, but either OpenFlow’s ERROR message or an experimental message may be used for sending such indications to the control plane. © SPARC consortium 2012 Page 102 of 129 WP3, Deliverable 3.3   Split Architecture - SPARC Usually, OAM frameworks define specifically tailored OAM packet formats for achieving their monitoring goal. We use specific virtual ports for creating BFD OAM packets within our demo implementation, but a more capable framework for defining arbitrary packet creation may be defined later within OpenFlow (see the packetC discussion in Section 5.1). We assume that creation, configuration, and destruction of virtual ports for OAM packet management happen via a proprietary management interface of the datapath element. Final conclusion  An OpenFlow based SDN framework can provide the necessary OAM functionality, but additional logic and an additional interface for managing OAM endpoints and signaling state and events is required. OAM  Carrier Maintenance Level R-364 For 1:1 VLANs, the BNG MUST support using a Multicast LBM towards its peer MEP. OAM  Customer Maintenance Level R-365 The BNG MUST support receiving a unicast or multicast Loopback Message (LBM) from its peer MEPs and initiating the associated Loopback Reply (LBR), for the MEP(s) on the user-facing port. R-366 The BNG SHOULD be able to be configured to assume continuity exists from a remote RG MEP while not receiving CCMs from this MEP. R-367 The BNG MUST be able to be configured to discard all incoming LTMs on a per user-facing port, per S-VLAN and per C-VLAN basis. R-368 The BNG MUST support rate limiting of received CFM Ethernet OAM messages arriving on a per user-facing port. Security Functions (Source IP spoofing) R-220 The Broadband Network Gateway MUST only respond to user ARP requests when they originate with the proper IP source address and are received on the appropriate 802.1q VLAN, or 802.1ad stacked VLAN. R-221 The Broadband Network Gateway MUST be able to detect and discard ARP requests and reply messages with „sender protocol address‟ other than the one assigned (i.e. spoofed). Specifically, the Broadband Network Gateway MUST NOT update its ARP table entries based on received ARP requests. R-222 The DHCP relay agent in the Broadband Network Gateway MUST inspect downstream DHCP ACK packets, discover mapping of IP address to MAC address and populate its ARP table accordingly. R-223 The DHCP relay agent in the Broadband Network Gateway SHOULD follow the lease time and lease renewal negotiation, and be able to terminate any user sessions and remove the corresponding ARP table entry when the lease time has expired. R-224 Having the knowledge of MAC to IP mapping (achievable by following R-222 and R-223), the Broadband Network Gateway MUST NOT send broadcast ARP requests to untrusted devices (i.e. RGs).  Supported in slow path  These requirements cannot be fulfilled on a datapath element within the OpenFlow specification. However, an implementation may be moved in the control plane. Mandatory datapath element extensions   Provide support for PPP-over-Ethernet encapsulation/decapsulation according to RFC 2516. The current OpenFlow specification (v1.3 at time of writing) supports flexible matches for new protocols. However, push and pop operations for PPPoE and PPP must be added to the OpenFlow specification. The SET-FIELD action adopts the OpenFlow Extensible Match (OXM) TLVs and can be used for setting protocol header fields within PPPoE and PPP. For both PPPoE and PPP the following OXM TLVs should be defined: © SPARC consortium 2012 Page 103 of 129 WP3, Deliverable 3.3 Header field PPPoE session id      Split Architecture - SPARC Description 16bit session id  identifies jointly with Access Concentrator MAC address and user MAC address uniquely the user session PPPoE code Identifies the PPPoE frame type during 4-way handshake, i.e. PADI, PADO, PADR, PADS, and PADT PPPoE version PPPoE version, defined as value “1” by RFC 2516 PPPoE type PPPoE type, defined as value “1” by RFC 2516 PPP protocol The PPP protocol field (one byte or two bytes long) defined according to RFC 1661 An Ethernet based access/aggregation domain adopts IEEE 802.1ag and Y.1731 for OAM. In TR-101 the authors have defined four levels of OAM: customer, carrier, intra-carrier, and access link. The endpoints of these maintenance domains depend on the intended model (broadband access vs. wholesale service, etc.). However, a datapath element used for emulating BNG services must provide support for Ethernet connectivity fault management. Refer to Section 5.3 for a description of the CFM OAM toolset. Beginning with version 1.1 OpenFlow provides basic support for implementing OAM services using so-called groups. Groups contain buckets and ActionLists and allow implementation of fast-failover strategies in the case of a network port failure. For the extended IEEE 802.1ad based model of TR-101 OpenFlow provides necessary functions since version 1.1. Implementers may decide to support an arbitrary depth of chained VLAN tags and provide push/pop operations in order to insert or remove tags accordingly. For IP forwarding services, a datapath element must be enabled to decrement TTL values in the IPv4 header. This functionality has been added to the OpenFlow specification since version 1.1. Policy enforcement and QoS support as a 3-level hierarchical scheduler must be available (see the final paragraphs of Section 5.8 for details on hierachical QoS models). QoS management is out of scope of the base OpenFlow specification and is expected to be configured via a third-party proprietary interface. However, OpenFlow provides means to queue packets to specific CoS queues on a network interface (see action ActionSetQueue). OpenFlow lacks efficient means for implementing traffic shaping strategies. However, we assume that such functionality is defined outside of the OpenFlow datapath element via a proprietary interface. Since version 1.2 of the OpenFlow specification, a precise definition of a datapath element’s classifier has been omitted. For supporting PPPoE/PPP, a datapath must be enabled to parse PPPoE and PPP protocol headers and to match these against the above defined OXM TLVs. With OpenFlow version 1.3, a control plane developer may define an arbitrary sequence of push operations, i.e. it is up to the control plane to ensure that the final sequence of headers (PPPoE within VLAN within VLAN within Ethernet) is useful. The datapath may not check validity of the defined order of push commands. Modules for emulating a BRAS/BNG in the control plane The following list provides a set of mandatory functions for a BRAS/BNG emulation in the control plane. An architecture for implementing such a control plane is detailed in deliverable D4.3.         A module for providing an emulated PPPoE access concentrator functionality bound to an operator defined MAC address must be available. Functionality for hosting a PPPoE session must be available. Functionality for hosting a PPP session must be available. The PPP module must be enabled to connect to AAA interfaces for session management, e.g. Radius or Diameter. An IP routing module must be available capable of defining Flow-Mod entries for L3 based forwarding. The IP routing module must take into account the session’s authentication state. Non-authenticated sessions must not be forwarded (except L2TP encapsulated towards a remote BRAS/BNG for wholesale services). Functionality for hosting IP-over-Ethernet or IP-over-MPLS should be available. A management function for observing and configuring OAM state must be available. A traffic shaping control module must be available. © SPARC consortium 2012 Page 104 of 129 WP3, Deliverable 3.3    Split Architecture - SPARC A VLAN control module must be available supporting IEEE 802.1ad. A DHCP relay control module must be available. Multicast support must be provided. Example SPARC DHCP++ 6.2.1.3 This example is splitting data and control plane of the SPARC BRAS model Option 1 while leveraging some advantages of this separation. Fundamentally, it changes nothing of the requirements as detailed in Section 5.6 and is relative similar to Option 2 of the SPARC BRAS example detailed in the previous section. The key change is that, instead of PPP(oE) and its integrated protocols, DHCP is used in combination with other protocols and mechanisms. Typically, the required protocols are already available at customer’s residential gateways (RGW), but commonly not used for the WAN side / interface. Therefore, it would require some modifications of the RGW, but this could be done by firmware updates and would require no hardware upgrades. How the legacy devices could be updated or how firmware will be deployed is out of scope of the SPARC project and thus not covered in this deliverable. As stated before, the DHCP++ proposal splits between data and control plane. Furthermore, it requires support of forwarding decisions in terms of forwarding entries in the FIB of the devices (potentially DSLAM, AGS1, AGS2, edge router) and implementation of QoS profiles in scheduler, shaper or police engine in the devices (see Section 5.8). Figure 62: SPARC DHCP++ in contrast to today's residential model The current model is already explained in detail in Section 5.6.3. The important aspect is that the data and control part of the PPPoE session is encapsulated and sent within the same session, presented in Figure 62 left side. In SPARC DHCP++, a DHCP++ app is introduced on top of the OpenFlow controller which is connected to either the edge of the network (edge router) or to any other device in the data plane chain from customer to Internet (more specifically: DSLAM, AGS1 or AGS2 as depicted with dotted lines in Figure 62 right side). Unfortunately, different implementation options with different requirements will exist concurrently. In the following paragraphs, we concentrate our analysis first on design options related to authentication, and then briefly discuss also the required routing function. First, it is important to acknowledge that the desired authentication target is important (see Section 6.2.1.1 for details). In current fixed access networks, a router on customer premises (RGW) performs the authentication by sending credentials to the network, thus applies a per port-based authentication scheme. If more than one user needs to be authenticated (referred to as customer authentication in Section 6.2.1.1), the model becomes more complex and requires more advanced models (e.g. see Protocol for Carrying Authentication for Network Access (PANA) defined in [62]). Such a model is out of scope for this document and subject to future work. This deliverable will concentrate on the perport based authentication. Again, different models might be implementable, shown in Figure 63 below. In the following, two general models with different implementation options are discussed. Essential is the support of DHCP Option 82, the DHCP Relay Agent Information Option (standardized in [63]) in one or the other way. For now this function is integrated in the DSLAM and adds a Line Identifier based on the port which could be used for authentication (see Section 6.2.1.1 for details) and correlation of customer profiles. In OpenFlow networks, one could use the port identifier directly. © SPARC consortium 2012 Page 105 of 129 WP3, Deliverable 3.3 Split Architecture - SPARC Figure 63: SPARC DHCP++ integration options for first node (DSLAM) and AGS node The first possible model is the attachment of the OpenFlow controller with the DHCP++ app (and thus connectivity to RADIUS) at the first node (the DSLAM) as shown in the middle of Figure 63. The DSLAM is OpenFlow enabled. Here forwarding entries are installed in a way that any traffic despite DHCP messages for the attachment of the client is dropped (similar to the IEEE 802.1X Port Access Entity [64]). After a successful DHCP ACK message (and related configuration of network parameters in the client e.g. address, subnet mask, default gateway, etc), the forwarding entries are updated so that Internet access is granted. This initial set of forwarding rules must add some port information. As detailed previously, here one has two options. In option one, the DHCP option 82 is used (which would result in some kind of hybrid node) and the DHCP++ app forwards this ID to the RADIUS server in order to receive appropriate profile and policing information. In option two, the DSLAM will forward the incoming packet with a port information and based on some additional data base information, e.g. the port information in an OpenFlow Packet-In message. Here the DHCP++ app forwards the Line ID to the RADIUS (like in the DHCP option 82 mechanism) or it forwards information like the port number and the DSLAM ID to the RADIUS and the RADIUS itself must figure out the client based on this information. Optional is the configuration of profiles and policies in the other devices. This could be done during the procurement of the customer or during the connection setup. In general, the implications are similar to the considerations discussed in the evolutionary approaches described in Section 6.1, and each operator has to decide which model is the best fit for its respective requirements. The second possible model is the attachment of the OpenFlow controller with the DHCP++ app at the OpenFlow enabled AGS1, shown on the right side of Figure 63. In principle, the concept is similar to the previous model, but requires smaller modifications. The general problem is the notification of the port. Again, this could be done in two different ways. First, at the DSLAM, a unique identifier for the datapath (e.g. VLAN ID) representing a tunnel is attached per port in each packet. Based on the ingress port of the AGS1 and this VLAN ID, the DHCP++ app can then identify the incoming port. The other option is to use the DHCP option 82 in the DSLAM and appropriate processing by the DHCP++ app. Other aspects like configuration of profile and policies are similar to the aspects covered by first node integration option. Another possibility is the integration at the AGS2 node. Here the model would be the same like for the integration at AGS1 node. A second major aspect beside the authentication target is the integration of the routing function into the service creation architecture. This is important for the following reasons:     Client requires a default gateway, e.g. for addressing ARP requests Scalability / security demand might require some termination of the broadcast domain Scalability of the OpenFlow platform might require some independent network segments Organization of backbone networks and related routing environment might require some subnetting In the SPARC BRAS model, these issues could be handled in a rather simple manner because of the termination of the PPPoE sessions at the BRAS and the required routing functionality in the PPPoE function. Therefore, the BRAS could be changed from being a router (with PPPoE support to customer side) to a switch. In the SPARC DHCP++ model, this transformation could not be applied easily. Again several options for the integration are possible and will depend on the © SPARC consortium 2012 Page 106 of 129 WP3, Deliverable 3.3 Split Architecture - SPARC existing platform (attachment to legacy) or the desired targets of the platform to be built. Therefore, we give only some hints for possible options in this deliverable.    6.2.2 Several options are detailed for the integration of MPLS environments in Section 6.3, similar models could be applied Integration / interworking of routing function (e.g. RouteFlow [65]) into / with the DHCP++ app Shift of routing function to some core networks, e.g. Label Edge Router Business customer services based on MPLS pseudo-wires with OpenFlow Business customers use a number of different technologies to interconnect locations, and carriers provide different options for service creation of these services. In order to analyze the impact of OpenFlow on the service creation, and to have a meaningful relation to the work done in the demonstrator development in SPARC WP4, it was restricted to Ethernet services and MPLS first. In Figure 64 a provider edge router providing pseudo-wires is illustrated; while this example deals with Ethernet over MPLS, the same general architecture is used for multiple types of pseudo-wires. Frames are received from a customer edge switch/router and are first processed in the “Native Service Processing” (NSP) module – this refers to Ethernetspecific processing such as modifying VLAN tags, priority marking, bridging between different pseudo-wires, etc. Once through the NSP module, the “PW Termination” module is responsible for maintaining the pseudo-wire by, e.g., encapsulating/decapsulating frames and performing any necessary processing such as buffering in case of ordered delivery. Finally the packets are delivered to the “MPLS Tunnel” for MPLS encapsulation and transmission across the network. Figure 64: Pseudo-wire processing in Ethernet over MPLS pseudo-wires (based on Fig.3 of RFC4448) In OpenFlow Version 1.1 this model can be followed, for example, by having one flow table per module, with the encapsulation/decapsulation actions implemented as vendor specific actions, and any additional processing managed either through process actions or virtual ports (as discussed in Section 5.1.2). If we focus on MPLS pseudo-wires with Ethernet emulation (and for the moment ignoring other types of pseudo-wires), we need a number of new actions corresponding to the different steps described above. First, at the ingress of the pseudo-wire, we need an action that takes an incoming Ethernet frame and turns it into the payload of a new frame. Once the new frame has been created, we need to add the control word. From this point on we can use existing actions to push MPLS labels, set the correct EtherType, and add correct MAC source and destination addresses. At the egress of the tunnel this should be mirrored by the inverse actions, in particular one to strip the outer headers and convert the payload back into an Ethernet frame. 6.3 Split Control of Transport Networks 6.3.1 Cooperation with legacy control planes As discussed in the introduction to this section (Section 6), we identified four types of introducing OpenFlow to an access/aggregation network. In the first two types, OpenFlow-based control solutions configure the transport nodes, while in the third type the service functionalities are implemented through an OpenFlow-based centralized control scheme. In these three types, some nodes and/or some functions of those nodes are configured with protocols other than OpenFlow; only the fourth type considers OpenFlow to be the sole forwarding node configuration protocol. There are several OpenFlow integration options (as discussed in Section 6.1) for the third type: the centralized and the distributed OpenFlow control. In both options, the intermediate transport nodes are still not configured via OpenFlow, but provisioned through the management plane or by making use of legacy control protocols. For example, an Ethernetbased aggregation switch can be configured using SNMP. Therefore, the OpenFlow controller should be able to interact with a management/control entity responsible for provisioning the transport connections. For this purpose, an interface between the legacy management/control entity and the OpenFlow controller is essential. This raises the demand for © SPARC consortium 2012 Page 107 of 129 WP3, Deliverable 3.3 Split Architecture - SPARC cooperation with legacy management or control planes. In our above example, the OpenFlow controller must be able to communicate with an entity responsible for managing Ethernet transport and thus for provisioning connectivity between the service nodes (e.g., DSLAM, BRAS). The fourth integration type, in which all aspects of all aggregation nodes are configured via OpenFlow, does not require such “vertical” cooperation with non-OpenFlow control entities. Even in this latter case, the whole network, as shown in Figure 2, is still not under the control of OpenFlow due to these aspects:   Mature control planes are deployed in some network segments and any gain of substituting them with OpenFlow is not clear, and Covering a whole network of thousands of nodes with a single controller entity raises scalability issues. In any of the discussed implementation cases, the OpenFlow controller must be able to cooperate with the other domains of the operators network. It is important to emphasize that any kind of control solution (centralized, distributed, static, dynamic, etc.) can be deployed in those domains. Therefore, the controller must be able to cooperate with the control function of those domains; this is referred to as horizontal interworking or peering and raises two major issues. The various control functions use different resource description models. For instance, the MPLS control plane was designed as a protocol running between physical nodes. Hence, the internal structure of routers is less relevant and information about the internal details of the nodes is not disclosed as a simplified view only. This view encodes the router with its interfaces and capacity information, assigned only to these interfaces. GMPLS control follows similar abstractions even in multilayer cases: All information is tied/bound to the interfaces of the nodes. In the case of WSON, this abstraction level is changed by adding further details of the node’s internal capabilities, but it still uses a generic model. On the other hand, an OpenFlow-based transport controller has much more detailed information about the managed domain. For horizontal interworking with legacy domains controllers via MPLS, GMPLS, etc., that information will be essentially filtered: A virtualized view of the managed domain is derived and provided toward the peering control entities. Furthermore, the control plane entities may allow for different control plane network implementations: For example, MPLS supports an in-band control plane, where the protocol messages travel together with the regular data traffic. GMPLS is also able to operate with in-band control channels, but it also supports use of the out-of-band control plane. While the SplitArchitecture inherently supports the out-of-band control network, it can provide in-band options as well: The controller is able to instruct the data plane nodes to inject the protocol messages into the data stream toward the peer control node and to demultiplex the protocol messages received from the peer node. 6.3.2 Semi-centralized control plane for MPLS access/aggregation/core networks Considering typical network structure as shown in Figure 2, the network consists of two parts: The access/aggregation using various forwarding technologies (e.g., Ethernet or MPLS), whereas IP/MPLS is the predominant technology in the core network segments. This also implies the control plane used in the core. This means that the controller, which manages the transport connections in the access/aggregation network segment, must be able to exchange IP/MPLS control protocol messages with the distributed IP/MPLS control plane of the core. The IP/MPLS control plane has both of the major issues discussed above. As IP/MPLS uses link state routing protocols (either OSPF or ISIS) to keep the topology databases synchronized at the protocol speakers, a very simple network model is used. According to this model each protocol speaker advertises only its identifiers to the attached network and the detected adjacent routers, and by default does not report any information about its internal capabilities or structure. It additionally uses signaling protocols, e.g., LDP, to provision end-to-end MPLS label switched paths. However, the label distribution mechanism allows the adjacent nodes to agree on the used label value, but it does not instruct any node about how to configure its internal elements. This means that the controller must implement an IP/MPLS control-compliant view of the managed domain and a mapping mechanism between the physical network and the logical representation. As discussed, the IP/MPLS control plane is an in-band control plane, so the OpenFlow controller must be aware of that. An extension to the link state routing protocols allows the assignment of further opaque attributes to the link. These additional attributes are also disseminated to other protocol speakers, although they do not carry any relevant information for the routing protocol. Assigning link characteristics such as available bandwidth, delay, etc., as opaque attributes supports implementation of traffic engineering in IP/MPLS networks. However, to provision trafficengineered LSPs, an additional protocol is used: RSVP-TE. The IGP protocol must implement additional structures to advertise the opaque attributes as well, and such extended protocols are referred to as IGP-TE (e.g., OSPF-TE). These extensions convert the link database to a Traffic Engineering Database (TEDB). © SPARC consortium 2012 Page 108 of 129 WP3, Deliverable 3.3 Split Architecture - SPARC Options for connecting the controller to the IP/MPLS control plane 6.3.2.1 The dissemination areas of the link state IGPs (OSPF/ISIS) define a structure for the IP/MPLS control plane. Adding all protocol speakers to a common dissemination area will result in an accurate view of the network at all speakers, allowing proper calculation of the LSPs. The drawback of such a solution is the scalability because the number of nodes of the same dissemination area increases. Several documents (see [40]) report that the while the core network can be covered by a single dissemination area, the whole network cannot. Therefore, in this section we discuss two alternatives of how a controller managing an access/aggregation domain could be attached to the distributed IP/MPLS control plane of the core. A possible implementation is when the controllers act as simple IP/MPLS protocol speakers and they are attached directly to the core network’s control plane, just like the simple core routers. Then the controllers and the core routers share the same dissemination area (OSPF area) as shown in Figure 65. OpenFlow CTRL Access AGS1 NNI AGS2 LSR LSR LER OpenFlow domain Single MPLS core & aggregation domain Figure 65: Single dissemination area option Consequently, each controller has the same topology database as all the other core routers and controllers. Based on the shared database, every controller or router can initiate an LSP configuration to all other routers or controllers. The signaling protocols, LDP and the RSVP-TE are used to manage the LSPs. In some cases, the IP/MPLS network splits into multiple dissemination areas. Area Border Routers (ABRs) reside at the border of the dissemination areas. Thus the controller can be part of either the backbone dissemination area or any of the stub/attached areas. In the former option, the ABR can be considered part of the OpenFlow controller domain (as shown on Figure 66). This option is roughly similar to the single dissemination area option because the controller communicates with all nodes in the backbone using OSPF-TE, LDP or RSVP-TE for dissemination. Relying only on these protocols, it is possible to create LSPs in the core area only. Spanning LSPs covering multiple dissemination areas require additional protocols:   One possibility is to introduce MP-BGP as described in the seamless MPLS concept [40]. This means that the controller must support the MP-BGP protocol as well as extended router and LSP redistribution mechanisms. Another option is to introduce some parts of the GMPLS’ multi-domain extensions based on signaling extensions of RSVP-TE. Since such an LSP spans multiple dissemination areas, the source node has an accurate view of the local area only, and it has connectivity information only for the remote one. This affects the path calculation mechanisms used. It is possible to calculate the path domain-by-domain, but it will not be optimal. A better solution is to adapt the Backward Recursive Path Computation (BRPC) algorithm and use the PCEP to synchronize the calculated path fragments. To support this option the controller must implement the extended version of RSVP-TE as well as PCEP. NNI OpenFlow CTRL Access AGS1 AGS2 ABR LSR ABR OpenFlow domain MPLS aggregation domain MPLS core domain Figure 66: ABR is under OpenFlow control © SPARC consortium 2012 Page 109 of 129 WP3, Deliverable 3.3 Split Architecture - SPARC In the latter option (see Figure 67) the controller is part of an attached dissemination area. Just like the other multi-area option, it implements the basic protocols to configure the LSPs within the area and applies BGP or multi-domain RSVP-TE for configuring the end-to-end path. The significant difference is that the controllers are not directly involved in the core network configuration, and they have a limited view. The pros and cons of the two multi-area options are discussed as part of the scalability evaluation in Section 6.4. OpenFlow CTRL Access AGS1 NNI AGS2 ABR LSR ABR OpenFlow domain MPLS aggregation domain MPLS core domain Figure 67: ABR is not under OpenFlow control Based on the implementation cases discussed, the NNI interface running between the controller and the IP/MPLS domain shall implement the following protocols:      A link state IGP protocol (either OSPF or ISIS) to share connectivity information. TE extensions of the above IGP protocols to share TE attributes and/or provide detailed information about the internal structure of the OpenFlow domain. To signal intra-domain MPLS connections, LDP can be used for best effort, RSVP-TE for TE-enabled connections. To signal inter-domain MPLS connections, MP-BGP can be used for best effort, RSVP-TE (RFC5151) with optional PCEP support for TE-enabled connections. To support multicast, mLDP or RSVP-TE (RFC4875) can be used. 6.3.2.2 Domain representation model The IP/MPLS control plane design assumes that the protocol speakers are actually routers, and they could consider all other speakers as routers. As a result, that information model focuses on the links running between these routers, and very limited information is disclosed about the internals of the routers. This model was kept when TE capabilities were added: The TE attributes were tied/bound to the link descriptors (see Traffic Engineering Link, TE Link concept). One alternative is to keep the IP/MPLS model as is, and the controller is then developed with functions to support the existing models. As an alternative the IP/MPLS information model may be extended, similar to the WSON extensions for GMPLS [27]. However, it will violate the assumption of not making any updates to the core network. Therefore, this latter alternative is not discussed here. A trivial consequence is that the OpenFlow controller must implement an appropriate mapping function between the controller’s internal model and the IP/MPLS information model because the IP/MPLS model is not sufficient to describe all aspects of the OpenFlow domain. This mapping function can be implemented in many ways. A possible realization is the emulation of the control plane (see for instance QuagFlow [28]). The OpenFlow domain switches are replicated as emulated routers running the IP/MPLS control plane. This creates a logical view of the OpenFlow domain topology and all control plane actions are emulated. If more routers run in the emulated environment, they will synchronize their state even though they are running in the same controller. Instead of replicating the whole OpenFlow domain with emulated IP/MPLS routers, we propose representing the whole OpenFlow domain as a single IP/MPLS router. This single virtual router is considered during the interaction with the legacy IP/MPLS control plane. Its identifiers and virtual interfaces and are advertised in OSPF, and its content determines the sent signaling messages (LDP or RSVP-TE) as well. Upon reception of any signaling messages its content will be uploaded. Besides eliminating the unnecessary domain internal state synchronizations, this approach has scalability advantages as well: The rest of the IP/MPLS does not need to take care of the internal structure of the OpenFlow domain. © SPARC consortium 2012 Page 110 of 129 WP3, Deliverable 3.3 Split Architecture - SPARC Updated controller architecture 6.3.2.3 The general control plane architecture described in Section 4.1 does not cope with all the requirements dictated by the considered MPLS-based access/aggregation scenario. That hierarchical organization of controller layers does not consider peering control plane entities at the same control layer. To support such peering control plane entities, the control plane model must be extended as described below. According to the above examples, the peering control planes may use different approaches to manage their supervised network segments. Therefore, direct exchange of the internal data models of the network segments is impossible without any agreed translation functions. Even if such translation functions were defined, implementing would not be recommended due to scalability and privacy issues. For example, one operator may not want to disclose all internal information to other operators. Furthermore, sharing the information describing databases may place unacceptable processing burdens on the interoperating control planes: One controller would process each and every change in the databases of other controllers. Router virtualization is an obvious choice to alleviate both problems. In this case, the peering controller plane must support a common information model and associated procedure set. The virtualization models used here are roughly the same as those used in hierarchical interworking, i.e., the managed network domain can be represented as a switch, or a set of switches with a physical/emulated topology. The associated procedures sit on the top of this virtualization model and are implemented by protocols. In order to enhance flexibility, the virtualization model and the associated procedures are detached, i.e., the different sets of protocols may use the same virtualization model, and the protocols actually used may be configured. The resulting updated control architecture is depicted in Figure 68. Client Controller Network Configuration Network Configuration Topology & resource info Virtual topology & resources Vertial Interworking (UNI Protocol Proxy) NNI View Domain Maintenance Generalized Network Visor UNI View Configuration instructions PDU Virtual topology & resources Horizontal Interworking (NNI Protocol Proxy) Peer Controller PDU PDU OpenFlow OpenFlow domain domain Figure 68: Revised controller architecture Here the controller is comprised of three major elements:    The Domain Maintenance module manages the OpenFlow domain, deploys flows, reacts to topology changes etc. A Generalized Network Visor realizes the virtualization feature by managing virtual (emulated) topologies. The namespace and resource management is also implemented here. Interworking modules implement the protocol and functional modules to communicate with other control plane entities. These modules operate on top of the virtualized topologies provided by the generalized visor, which controls the access of the interworking modules to the OpenFlow switches. With these extensions a controller implementing the split transport control plane will comprise several key modules. The Domain Maintenance module is responsible for managing the OpenFlow domain. It authenticates the connected data plane switches, and it maintains an appropriate view of the topology of the available resources of the managed network domain. It also provisions all desired aspects of flows within the OpenFlow domain: calculating paths fulfilling the traffic engineering objectives, deploying monitoring endpoints and the protection infrastructure. It also provides an interface to other modules. Through this interface the other modules can post configuration requests to the Domain Maintenance module and they can receive reports of events in the managed domain. © SPARC consortium 2012 Page 111 of 129 WP3, Deliverable 3.3 Split Architecture - SPARC The Generalized Network Visor supports the various virtualization models and implements a virtual router model. The essential transformation functions between the virtual model and the managed topology are realized by the visor as well. For example, a configuration request in the virtual router may trigger a request for a flow establishment from the Domain Maintenance module. The result of a successful deployment of such a flow may result in installation of a new forwarding entry in the virtual router. The virtual router is used by the NNI protocol proxy that steers the communication with the IP/MPLS protocol control plane. For example, it can integrate all relevant legacy protocols (OSPF-TE, LDP, BGP, etc.) that run as part of the controller. Another option is to make use of external protocol implementations. In this case the controller hosts a proxy or kernel part of the protocol stack and uses stack-specific protocols (e.g., the zebra protocol) to communicate with the protocol implementation. As a third option, external protocol stacks are used, but the virtual router model is exported and the controller acts as a switch by providing standard switch forwarding configuration interfaces (e.g., SNMP). Scalability Characteristics of Access/Aggregation Networks 6.4 This section presents our scalability investigations regarding the access/aggregation use case. First we provide a high level introduction and then we present the results of our numerical analysis before finally showing the simulation results. 6.4.1 Introduction to the scalability study There are many aspects of scalability, e.g., the scalability of an architecture, of a protocol, of a given scenario. We are focusing on the generic SplitArchitecture concept and the main use-case of the project. 6.4.1.1 Deployment – network topology Our deployment scenario is based on D2.1 and serves as a basis for the scalability study. Figure 69 shows the common view of the project partners in the schematic view of the access/aggregation network area. Base Station Base Station Business Customer Business Customer IPTV Server Service Edge (BNG) CE RGW AN AGS1 AGS2 CE RGW AN AGS1 AGS2 CE RGW AN AGS1 AGS2 EN IPTV Server Core Mobile GW Figure 69: Deployment scenario for scalability studies The OpenFlow domain consists of Access Nodes (AN), first-level aggregation switches (AGS1), second-level aggregation switches (AGS2) and edge nodes (EN). All other entities, both on the client side (Customer Equipment, Residential Gateway, Business Customer [BC], mobile Base Station [BS]) and the server side (Service Edge, IPTV server, Mobile GW) are not part of the OpenFlow domain. Note that multiple access/aggregation domains can be connected to the core network and they can be OpenFlow-based, but they have no direct (OpenFlow) interaction. Based on D2.1 we have three scenarios that cover current and future deployment numbers.    In the “Today” scenario there is 1 PoP location (corresponding to our EN) for 500,000 households (which are represented by the RGWs). This covers about 1 million inhabitants. In the “Future” scenario 1 PoP location will cover around 2,000,000 customers. In the “Long-term” scenario 1 PoP location will serve 4,000,000 customers. © SPARC consortium 2012 Page 112 of 129 WP3, Deliverable 3.3 Split Architecture - SPARC In all scenarios the number of devices will relate to each other as: customers devices (CE) >> customer edge (RGW) >> access nodes (AN) >> edge nodes (EN). Additionally, the sum of access nodes (AN) relates to the sum of aggregation nodes (AGS1+AGS2) and edge node (EN) as 10:1. Based on these assumptions, network topologies can be drawn for the different scenarios. 6.4.1.2 Deployment - services Regarding the services provided by using this network, we have made the following assumptions based on D2.1. There are residential services, namely the simple service (e.g., PPPoE) for Internet access and IPTV. The simple service provides bidirectional connectivity between the RGW and the service edge. The IPTV service provides bidirectional connectivity between the RGW and the IPTV server, where the direction from the IPTV server dominates, and typically the same data is simultaneously sent to multiple RGWs. There are business services, namely the simple service (as for residential customers), the Point-to-Point (PtP) and the Multipoint to Multipoint (MPtMP). The PtP service connects two business customers, while the MPtMP connects multiple business customers. All of these services are bidirectional. The mobile backhaul service connects a base station to a mobile GW in a bidirectional way. 6.4.1.3 Scalability concerns There are theoretical constraints due to protocol limitations, e.g., the maximum number of fields or objects, maximum length or value of fields or objects, maximum size of packets, etc. The OpenFlow protocol is relevant in our case. Other protocols that may be affected are the distributed control plane protocols, e.g., MPLS (OSPF, LDP, RSVP). Our focus is not on the analysis of a given protocol, but rather on a more generic approach to the split control plane. Protocolspecific issues are beyond our scope. Computational resource limits may be a bottleneck at the central controller (in such a case these calculations may be made in a decentralized way, e.g., single central logical view vs. distributed physical view). The complexity of the algorithms used can be checked here, e.g., NP hardness. However, the time budgeted for running an algorithm is important – from this point of view the P complexity class could be still too difficult in some practical cases. The control network’s capacity limits the overall control traffic, thus introducing upper limitations on the frequency and size of node configuration commands. This can have an effect on the possible network size (e.g., the number of nodes/ports/links) or on the reaction time of the overall network. The control network introduces non-negligible delays due to the geographical distance between switches and the controller. This results in a lower limit on reaction time even if the controller had infinite computational power. 6.4.2 Numerical model This section describes our numerical model and the results of the of the study based on this model. 6.4.2.1 High-level description of the model The model is based on Section 6.4.1 and it includes the network nodes, services and scenarios identified there, with the following assumptions. Clients (RGW, BC, BS) are connected only to the AN, not to higher level aggregation nodes. SE is not directly interconnected to an AGS2. The MPtMP business service is not modeled. The UNI interface is on the left side of the AN, while the NNI is on the right side of the EN, see Figure 71. Controller Service Edge RGW AN AGS1 AGS2 Business Customer AN AGS1 AGS2 Base Station AN AGS1 AGS2 EN IPTV Core Server Mobile GW Figure 70: Simplified domain view © SPARC consortium 2012 Page 113 of 129 WP3, Deliverable 3.3 Split Architecture - SPARC Controller Service tunnel RGW BC BS End-to-end transport tunnel Transport segment AN AGS1 AGS2 EN Service Edge Core Figure 71: Tunnels considered To transport the above services appropriately, the transport connection architecture is assumed as depicted in Figure 71. The blue connections are OpenFlow domain internal transport segment tunnels. There are multiple options for organizing these tunnels. There can be a segment tunnel connecting each AN to the EN as shown in the figure. End-to-end tunnels (shown as a red line) are defined between nodes, e.g., the AN and the SE. Such tunnels overlap OpenFlow and non-OpenFlow domains. Service tunnels (shown as a green line) within the E2E tunnels identify the service, e.g., the user in the case of an AN where multiple RGWs are connected. The exception is IPTV, which is treated as a special E2E tunnel in the model. This triple-layer connectivity set provides compatibility with the IP/MPLS core, flexibility within the OpenFlow domain, and it also supports scalability (by aggregating connections). Static requirements from the model 6.4.2.2 The figure below (Figure 72) shows the amount of equipment units to be managed by the controller according to the main scenarios based on our model. Two alternatives are offered for each scenario, both fulfilling the requirements. Note the we will later concentrate on one alternative only because the results of the alternative topologies have the same order of magnitude results in each case. 10 000 000 1 000 000 Number of equipments 100 000 AGS2 10 000 AGS1 AN 1 000 RGW 100 10 1 Sparc today Sparc future Sparc long-term scenario Figure 72: Number of equipment units Figure 72 shows that the number of OpenFlow switches in an OpenFlow domain is around 10,000 / 20,000 / 50,000 for the specific scenarios. © SPARC consortium 2012 Page 114 of 129 WP3, Deliverable 3.3 Split Architecture - SPARC Figure 73 shows the number of tunnels terminating or crossing a given type of node. 100 000 000 10 000 000 EN segment Tunnel number per equipment 1 000 000 EN E2E EN service 100 000 AGS2 segment AGS2 E2E AGS2 service 10 000 AGS1 segment AGS1 E2E AGS1 service 1 000 AN segment AN E2E 100 AN service 10 1 "Sparc today" "Sparc future" "Sparc long-term" network scenario Figure 73: Number of tunnels Note that a flow entry isn’t needed for each type of tunnel in each type of node – for example, an aggregation switch is unaware of the service tunnels passing by because they are encapsulated in E2E tunnels which are encapsulated in transport segment tunnels. This figure illustrates why encapsulation is needed here and why automatic advertising of all interfaces (e.g., via LDP) is not recommended for use here. Figure 74 below shows the number of flow entries to be supported by the network entities in our access/aggregation use case for the different scenarios. Controller (SUM) EN AGS2 AGS1 AN Flow entries per equipment 100 000 000 10 000 000 1 000 000 100 000 10 000 1 000 100 10 1 Sparc today Sparc future Sparc long-term Scenario Figure 74: Number of flow entries We can see that for the lower part of the aggregation about one hundred, while for the upper part of the aggregation thousands of flow entries have to be supported by the switches. The controller has to manage from one to ten million flow entries overall in the whole domain. Based on the static analysis the following KPIs have been calculated for the scenarios (today/future/long-term) for a given OpenFlow access/aggregation domain:      Number of OF switches in an OF domain: 10,000 / 20,000 / 50,000 Number of UNI interfaces: 500,000 / 2,000,000 / 5,000,000 Number of NNI interfaces: 1 Number of flow entries in a switch, depending on aggregation level: hundreds – thousands – tens of thousands Number of actions of a flow entry: 1 – 2 © SPARC consortium 2012 Page 115 of 129 WP3, Deliverable 3.3 Split Architecture - SPARC Dynamic results from the model 6.4.2.3 To address the dynamic behavior of the OpenFlow domain, we considered the effects of two type of events. The first one was a link going down an AN-AGS1, an AGS1-AGS2 or an AGS2-EN link, to be handled by the controller. We assume redundancy in the topology, so all traffic can be rerouted to a backup path. Regarding the reconfiguration of the flow entries, we calculated two values for each link-down:   The best case, where there is a direct alternative link between the same two nodes, resulting in limiting the modifications only to those two nodes. The worst case, where the alternative link is connected to a different (but same aggregation level) node, with the most disjoint path to the EN compared to the original failed one. In this case the flow reconfigurations are not limited to the nodes directly affected, and higher aggregation level nodes will also be reconfigured. Figure 75 shows how a link-down affects the various tunnels according to the model. 100 000 000 10 000 000 EN segment Tunnel number per equipment 1 000 000 EN E2E EN service AGS2 segment 100 000 AGS2 E2E AGS2 service 10 000 AGS1 segment AGS1 E2E 1 000 AGS1 service AN segment AN E2E 100 AN service 10 1 "Sparc today 1" "Sparc today 2" "Sparc future "Sparc future 1" 2" "Sparc longterm 1" "Sparc longterm 2" network scenario Figure 75: The effect of a link down: tunnels From the best case we can derive requirements for the switches, while from the worst case we can derive requirements for the controller. Figure 76 shows the required flow modification commands for restoring the tunnels via the controller. 100 000 10 000 1 000 best worst 100 10 1 AN-AGS1 AGS1AGS2 Sparc today AGS2-EN AN-AGS1 AGS1AGS2 Sparc future AGS2-EN AN-AGS1 AGS1AGS2 AGS2-EN Sparc long-term Figure 76: The effect of a link-down: flow mods © SPARC consortium 2012 Page 116 of 129 WP3, Deliverable 3.3 Split Architecture - SPARC Flow_mods to be handled by a switch in the case of a non-AGS2-EN link failure are in the magnitude range of tens and a few hundreds. For the controller, the magnitude range is in the hundreds of flow entries. In the case of an AGS2-EN link failure, the switch has to be able to handle a few thousand of flow_mods, while the controller magnitude range is a thousand or a few ten thousands. Taking into account the time available for sending these changes results in Figure 77. These diagrams can be used to derive requirements, e.g., if 1 sec. can be used for the controller to send the configuration messages (see horizontal axis), then this results in a given flow mod/sec controller requirement (vertical axis). Recovery (future, best case) 100000 Flow mods / sec 10000 1000 AN-AGS1 link AGS1-AGS2 link AGS2-EN link 100 10 0, 65 0, 85 1, 05 1, 25 1, 45 1, 65 1, 85 2, 05 2, 25 2, 45 2, 65 2, 85 3, 05 3, 25 3, 45 3, 65 3, 85 4, 05 4, 25 4, 45 4, 65 4, 85 0, 45 0, 25 0, 05 1 Recovery time Recovery (future, worst case) 1000000 100000 Flow mods / sec 10000 AN-AGS1 link AGS1-AGS2 link AGS2-EN link 1000 100 10 0, 25 0, 45 0, 65 0, 85 1, 05 1, 25 1, 45 1, 65 1, 85 2, 05 2, 25 2, 45 2, 65 2, 85 3, 05 3, 25 3, 45 3, 65 3, 85 4, 05 4, 25 4, 45 4, 65 4, 85 0, 05 1 Recovery time Figure 77: Recovery times Note that the time refers only to sending the flow_mod messages – other aspects are not shown here. The second type of event we considered is a burst of IPTV channel changes. This could happen if a significant percent of the users change channels, for example in the case of the beginning of a sport event or the evening news. The figure below shows the number of flow_mod messages the controller has to send. The best case refers to the situation where the video stream is already at the AN, whereas in the worst case the path for the stream has to be configured from the core for each request (which is slightly above the realistic scenario, but definitely an upper limit). The result is shown in Figure 78. © SPARC consortium 2012 Page 117 of 129 WP3, Deliverable 3.3 Split Architecture - SPARC IPTV channel change 1% of users change simultaneously 90 000 80 000 70 000 Flow mods 60 000 50 000 flow modifications (best) flow modifications (worst) 40 000 30 000 20 000 10 000 0 "today" "future" "long-term" Scenario Figure 78: IPTV channel change While the best case is a requirement for the AN and a lower limit for the controller, the worst case is a pessimistic upper limit requirement for the controller. Note: 1 percent of IPTV users change channels simultaneously in this example, but since it is scaled linearly, it is easy make calculations for other numbers. Taking into account the time available for sending these changes results in Figure 79. Note that the time refers only to sending the flow_mod messages – other aspects are not shown here. Channel change (future, 1% ) 350 000 300 000 flow mod / s 250 000 200 000 best worst 150 000 100 000 50 000 4, 9 4, 6 4 4, 3 3, 7 3, 4 3, 1 2, 8 2, 5 2, 2 1, 9 1, 6 1 1, 3 0, 7 0, 4 0, 1 0 time (s) Figure 79: IPTV channel change time If we assume 1 second response time for the channel change request and 1 percent of the users change, then the flow mod rate per second to be issued by the controller is in the best case 2,000 / 8,.000 / 20,000, while in the worst case 8,000 / 32,000 / 80,000. © SPARC consortium 2012 Page 118 of 129 WP3, Deliverable 3.3 Split Architecture - SPARC Comparison – existing switch and controller performance 6.4.2.4 We found the following related numbers in the DevoFlow [10] paper, which discusses a different use case and setup, however it provides numbers for expectable switch and controller performance, regardless of the actual setup: “We found that the switch completes roughly 275 flow setups per second. This number is in line with what others have reported [11].” This performance is far below that needed to provide carrier-grade controller-based resiliency during a link-down event, however, it may be suitable for best effort type of service. Regarding the IPTV channel change event, this switch performance seems to suit the lower parts of the access/aggregation network. “[12] report that one NOX controller can handle ‘at least 30K new flow installs per second while maintaining a sub-10 ms flow install time ... The controller's CPU is the bottleneck.’ ” “Maestro [13] is a multi-threaded controller that can install about twice as many flows per second as NOX, without additional latency.” This reported controller performance seems to be in the order of magnitude suitable for handling link-down effects in the network in a carrier-grade sense. For the IPTV channel change, it depends on the specific requirements. Updates to the initial topology and tunnel structure 6.4.2.5 The high flow_mod values in the case of link failure restoration close to the EN are the result of the many tunnels between the ANs and the EN. To decrease this high number of flow_mod messages, those tunnels could be aggregated, thus having only a single transport segment tunnel between neighboring nodes. In the case of restoration, an alternative segment tunnel can be built between the same two nodes over other multiple nodes. This increases the static number of flow entries per forwarding device and the traffic volume. However, it drastically decreases the message number in the restoration case. Figure 82 shows the comparison of a tree (snowflake) topology (Figure 70), a tree + ring combination topology (Figure 80), and the same topology with the single hop transport segments (Figure 81). Order of magnitude differences are evident. Controller RGW AN AGS1 Service Edge AGS2 EN Business Customer AN AGS1 AGS2 Base Station AN AGS1 AGS2 IPTV Core Server EN Mobile GW Figure 80: Ring topology Controller Transport segment RGW BC BS AN AGS1 AGS2 EN Core Service Edge Figure 81: Alternative OpenFlow domain internal tunnel structure © SPARC consortium 2012 Page 119 of 129 WP3, Deliverable 3.3 Split Architecture - SPARC 1 000 000 Flow mods 100 000 10 000 Tree, E2E segments Ring, E2E segments Ring, 1hop segments 1 000 100 10 1 best case worst case AN-AGS1 best case worst case AGS1-AGS2 best case worst case AGS2-EN Figure 82: Topology and domain internal tunnel structure effects However, reducing the required flow_mod messages has specific disadvantages as well, such as more static flow entries or increased traffic volume in the case of restoration. A detailed analysis is beyond the scope of this deliverable. 6.4.2.6 Conclusions from the scalability study Looking at the static behavior, requirements from the model seem to be in the order of magnitude or below the capabilities of existing controllers and switches. Regarding the dynamic behavior, there are scalability concerns, especially when strict time constraints are given. We showed that by changing the connection structures scalability can be significantly increased. © SPARC consortium 2012 Page 120 of 129 WP3, Deliverable 3.3 7 Split Architecture - SPARC Conclusions This deliverable defined the proposed carrier-grade SplitArchitecture, applying the concept of Software Defined Networking (SDN) to operator networks. Taking the use cases and requirements defined in WP2 into account, we presented our proposal and evaluated technical issues against certain architectural trade-offs. First, we considered a control and management architecture, consisting of a hierarchical, recursive control plane that enables operators to deploy several control planes with minimal interference and assign flows dynamically based on given policies. Additionally, we outlined an initial proposal of how to integrate network management and OpenFlow control with the flexibility to choose for each management function whether to integrate it into an SDN controller or place it in a separate network management system. Next, we discussed the required extensions to the OpenFlow protocol for supporting the carrier-grade SplitArchitecture. These include general extensions for openness & extensibility, virtualization support, OAM solutions, resiliency measures, bootstrapping and topology discovery issues, service creation solutions, energy-efficient networking approaches, QoS aspects, and multilayer considerations. In addition, we outlined selected deployment and adoption scenarios faced by modern operator networks. Specifically, we discussed procedures for scenarios such as service creation, general access/aggregation network scenarios and peering aspects, i.e., how to interconnect with legacy networks. Finally, a numerical scalability study indicates the feasibility of the SplitArchitecture approach in access/aggregation network scenarios in terms of scalability. As overarching goals, the SPARC project defined i) A SplitArchitecture blueprint for carrier-grade networks, and ii) Required extensions to the OpenFlow protocol. We are confident that this deliverable provides a comprehensive blueprint for control-, data- and management planes, as well as protocol aspects, as summarized above. Here, we would like to highlight two specific requirements with respect to carrier-grade networks, namely scalability and availability.   Regarding scalability, our goal for the SplitArchitecture blueprint was to meet requirements to support large-scale deployments for carrier-grade networks. Specifically, the SplitArchitecture device shall be able to control forwarding devices that could count in the order of hundreds. In fact, the targeted use case determined the size of a network even in the order of thousands of nodes (see deliverable D2.1). This confirmed the requirement for a controller device to be able to control forwarding devices that could count in the order of thousands. We conclude that practical software architectures do not preclude maintaining OpenFlow connections toward such number of equipment. However, the rate of configuration data exchanged between the controller and the switches may raise scalability concerns. Our numerical scalability study showed that the control traffic required to proactively deploy transport connections with OpenFlow will not cause scalability limitations in the considered network sizes. On the other hand, reactive operation impairs the scalability of the SplitArchitecture. The measurements performed in deliverable D5.2 confirmed that tasks requiring strict time constraints, such as 50 milliseconds in the case of recovery or fast detection of link failures, cannot be realized via current OpenFlow based split control. This observation led to OpenFlow switch function proposals, together with OpenFlow protocol extensions for providing configuration support for these functions, such as OAM and protection, as discussed in this deliverable. Regarding availability, our goal for the SplitArchitecture blueprint was to ensure that the availability of networking services shall be equivalent to that of traditional technologies. We conclude that the availability of applications and services is dependent on the reliability of the server hosting them and the ability of the network to connect the user to the service. Server reliability itself is not affected by introducing SplitArchitecture, and therefore the main impact is by facilitating the implementation of custom protection and restoration per flow. As discussed in this deliverable and shown in deliverable D5.2, the proposed extensions to OpenFlow can enable sub-50 milliseconds protection for high-availability services while providing restoration for others. We must note that restoration in SplitArchitecture is slower than in traditional networks due to the centralized nature of the controller. However, OpenFlow can enable sub-second restoration in networks that previously had no such capability in a very cost-efficient solution. In SPARC, we applied the concept of SDN/SplitArchitecture to the network operator domain with the promise of improving network design and operation in large-scale networks with multi-million customers, high availability and high automation. Furthermore, SplitArchitecture should open the field for new market players by lowering the entry barriers that exist in the complexity of individual components. From the technical point of view, we have been able to present and to demonstrate an architecture that can fulfill scalability and availability requirements, as well as many other specific requirements such as network management and OAM, virtualization and QoS support. From the business point of view, two things should be noted. First, our techno-economic analysis in WP2 confirms that there exists a potential for optimization of costs in terms of capital (CAPEX) as well as operational expenditures (OPEX) for carrier networks. Second, our analysis of the business environment, including a questionnaire and an analysis of © SPARC consortium 2012 Page 121 of 129 WP3, Deliverable 3.3 Split Architecture - SPARC standardization organizations, indicates that major players already present in today’s telecommunication industry might be the leaders in a software defined telecommunication network environment as well. Hence, the conclusion of the SPARC project is that it is both technically feasible and economically beneficial to apply SDN/SplitArchitecture to the carrier domain. Furthermore, the results of this project clearly indicate that the promises made by this novel architecture paradigm in terms of simplified and automated operation are still valid and definitely deserve further attention not only by academia, but also the network industry in general and telecom operators in particular. The positive conclusions of the SPARC project also manifest itself in many ongoing and planned contributions to the main SDN standardization body, the ONF, by the SPARC ONF members (Ericsson and DTAG). During the initial period of the project, the SPARC project was concentrating on the setup of an ONF working group dealing with questions regarding a general architecture and use cases, which was eventually successful with the installation of the Architecture and Framework WG (including a SPARC member in the design team). Currently, also the Configuration & Management WG receives much input from SPARC partners based on the content on network management, OAM and bootstrapping as provided in this deliverable (Sections 4.2, 5.3 and 5.5). While these efforts will continue, the SPARC ideas on service creation (Sections 5.6 and 6.2), as well as the concepts for a controller architecture and openness and extensibility (presented in Sections 4.1 and 5.1) are planned to be discussed within the newly founded ONF Architecture and Framework WG. Additionally, SPARC members contribute actively to the New Transport Discussion Group within the ONF with the results of the multilayer discussions in Section 5.9, specifically the GMPLS-aware extensions for OpenFlow (Section 5.9). As the next steps for carrier-grade SDN solutions, the SPARC project suggests continuing research towards full service node virtualization. SPARC already considered first SDN scenarios for service creation (e.g. split BRAS or SPARC DHCP), but mostly focused on an enhanced emulation of transport services in order to comply with legacy network technologies (e.g. support for MPLS and related OAM). However, the concept of split service creation could evolve to fast and flexible service deployment, ranging from connectivity services for residential business and backhaul, but also more sophisticated ones such as VoIP and IPTV. This would require study of programmable network datapath elements with generic hardware abstraction, enabling support for these flexible services. Furthermore, evolving the control plane for flexible service creation requires support for unified service and transport control along with additional necessary protocol extensions. Functions to consider include support for mobility, AAA and advanced service OAM. © SPARC consortium 2012 Page 122 of 129 WP3, Deliverable 3.3 Split Architecture - SPARC Abbreviations 3GPP Third-generation partnership program CPU Central Processing Unit ADSL Asymmetric Digital Subscriber Line CRC Cyclic Redundancy Check AES Advanced Encryption Standard CR-LDP Constraint-based LDP AGS Aggregation Switch CRUD Create, read, update, delete ANDSF Access Network Discovery Selection Function CSCF Call Session Control Function AP Access Point DCF Distributed Coordination Function API Application Programming Interface DHCP Dynamic Host Configuration Protocol ARP Address Resolution Protocol DHT Distributed Hash Table AS Autonomous System DiffServ Differentiated Services, IETF ATM Asynchronous Transfer Mode DNS Domain Name Server AWG Arrayed-waveguide Grating DOCSIS BB Broadband Data Over Cable Service Interface Specification BBA Broadband Access DPI Deep Packet Inspection BBF Broadband Forum DRR Deficit Round Robin BB-RAR Broadband Remote-Access-Router (SCP for Fast Internet) DS Differentiated Services DSCP Diff Serve Code Point BE Best-Effort DSL Digital Subscriber Line BFD Bidirectional Forwarding Detection DSLAM BG Broadband Aggregation Gateway Digital Subscriber Line Access Multiplexer (network side of ADSL line) BGP Border Gateway Protocol; Distance Vector Routing protocol of IETF (EGP) DWDM (Dense) Wave-Division-Multiplex dWRED Distributed WRED BRAS Broadband Remote Access Server / Service DXC Digital Cross-Connect BRPC Backward Recursive Path Computation ECMP Equal Cost Multi-Path BSS Basic Service Set ECN Explicit Congestion Notification CAR Committed Access Rate ECR Egress Committed Rate CBO Class-Based Queuing EGP Exterior Gateway Protocol CBWFQ Class-Based Weighted Fair Queuing EIGRP Enhanced IGRP CCM Continuity Check Message EN Edge Node CDMA Code Division Multiple Access ePDG Evolved Packet Data Network Gateway CE Control Element ESS Extended Service Set CHG Customer HG; HG in customer site FE Forwarding Element CIDR Classless Inter-Domain Routing FEC Forwarding Equivalence Class CIPM Cisco IP Manager FEC Forward Error Correction CIR Committed Information Rate FIB Forwarding Information Base CLI Command Line Interface FMC Fixed Mobile Convergence CLNC Customer LNS, LNS in customer site ForCES Forwarding and Control Element Separation CORBA Common Object Request Broker Architecture FPGA Field Programmable Gate Array CoS Class of Service FSC Fiber Switching CP Connectivity Provider FTTCab Fiber to the Cabinet CPE Customer Premise Equipment FTTH Fiber to the Home © SPARC consortium 2012 Page 123 of 129 WP3, Deliverable 3.3 Split Architecture - SPARC FTTLEx Fiber to the Local Exchange LAC L2TP Access Concentrator FW Firewall LAN Local Area Network GbE Gigabit Ethernet LDP Label Distribution Protocol GFP Generic Framing Procedure LER GLONASS Globalnaja Nawigazionnaja Sputnikowaja Sistema (Russian satellite system) Label Edge Router; MPLS-based router with MPLS, IP-VPN and QoS edge support LER-BB Broadband LER; LER for DS and higher GMPLS Generalized Multi-Protocol Label Switching L-GW Local Gateway GNSS Global Navigation Satellite System LLDP Link Layer Discovery Protocol GPON Gigabit Passive Optical Network L-LSP Label-inferred LSP GPS Global Positioning System LMP Link Management Protocol GRE Generic Route Encapsulation LNS L2TP Network Server GTS Generic Traffic Shaping LSP Label Switch Path GUI Graphical User Interface LSR HCF Hybrid Coordination Function Label Switch Router; MPLS-based router in the inner IP network. Only IGP knowledge. HDLC High-level Data Link Control LTE Long Term Evolution HG Home Gateway MAC Media Access Control HIP Host Identify Protocol MAN Metropolitan Area Network IACD Interface Adjustment Capability Descriptor MEF Metro Ethernet Forum ICR Ingress Committed Rate MGW Media Gateway ICT Information and Communication Technology MIB Management Information Base IEEE Institute of Electrical and Electronics Engineers MLD Multicast Listener Discovery MLTE Multilayer Traffic Engineering MME Mobility Management Entity MPLS Multi-Protocol Label Switching IETF Internet Engineering Task Force (www.ietf.org) IF Interface ISC Interface Switching Capabilities IGMP Internet Group Management Protocol IGP Interior Gateway Protocol IGRP Interior Gateway Routing Protocol IntServ Integrated Services, IETF IP Internet Protocol IPTV IP television ISCD Interface Switching Capability Descriptor ISDN Integrated Services Digital Network IS-IS Intermediate System - Intermediate System; Link State Routing Protocol from OSI (IGP) ISO International Organization for Standardization ISP Internet Service Provider ITIL IT Infrastructure Library ITU International Telecommunication Union L2F Layer 2 Forwarding L2TP Layer 2 Tunnel Protocol © SPARC consortium 2012 MPLS-TP MPLS Transport Profile MSC Mobile Switch Controller MTU Maximum Transmission Unit NAT Network Address Translation NGN Next-Generation Network NIC Network Interface Controller NMS Network Management System NNI Network-to-Network Interface NP Network Provider NSP Native Service Processing NTP Network Time Protocol OAM “Operation, Administration and Maintenance” or “Operations and Maintenance” ODU Optical Data Unit OF OpenFlow OFDM Orthogonal Frequency Division Multiplexing OLT Optical Line Termination OSI Open Systems Interconnection Page 124 of 129 WP3, Deliverable 3.3 Split Architecture - SPARC OSNR Optical Signal-to-Noise Ratio RTT Round Trip Time OSPF Open Shortest Path First; Link State Routing Protocol from IETF (IGP) SAE System Architecture Evolution SAP Service Access Point OTN Optical Transport Network SBC Session Border Controller OTU Optical Transport Unit SCP Service Creation Point OXC Optical Cross-Connect SDH Synchronous Digital Hierarchy PANA Protocol for carrying Authentication for Network Access RSVP (-TE) ReSource reserVation Protocol (-Traffic Engineering) PBB-TE Provider Backbone Bridge Traffic Engineering SDN Software Defined Networking PCEP Path Computation Element Communication Protocol SDU Service Data Unit SE Service Edge PDU Protocol Data Unit SGW Serving Gateway PE Provider Edge; Service Creation Point for IPVPN SHG Separate HG, separate HG device with virtual HG PER Provider Edge Router SIP Session Initiation Protocol PGW Packet Data Network Gateway SIPTO Selective IP Traffic Offload PIM Protocol Independent Multicast SLA Service Level Agreement PIP Physical Infrastructure Provider SLNS PMIP Proxy Mobile IP Separate LNS, separate LNS device with virtual LNS PoP Point of Presence SMS Service Management System POTS Plain Old Telephony Service SNMP Simple Network Management Protocol PPP Point-to-Point Protocol SONET Synchronous Optical Network PPPoE PPP over Ethernet SP Service Provider PSTN Public Switched Telephone Network SPARC Split Architecture for carrier-grade networks PVC Permanent Virtual Circuit (permanent L2 connection, e.g., Frame Relay, ATM) SSID Service Set Identifier SSM Source Specific Multicast PWE PseudoWire Emulation STA Station QoE Quality of Experience STM QoS Quality of Service; general for differentiated quality of services or absolute quality of services. Synchronous Transfer Module (STM-1: 155 Mbit/s, STM-4: 622 Mbit/s, STM-16: 2.5 Gbit/s; STM-64: 10 Gbit/s) TCAM Ternary Content Addressable Memory QPSK Quadrature Phase-Shift Keying TCM Tandem Connection Monitoring RADIUS Remote Authentication Dial-In User Service TCP Transmission Control Protocol RAR Remote Access Router (SCP for OCN) TDM Time Division Multiplexing RARP Reverse ARP TE Traffic Engineering RFC Request for Comment (in IETF) TKIP Temporal Key Integrity Protocol RGW Residential Gateway ToR Top of the Rack RIB Routing Information Bases ToS Type of Service RIP Routing Information Protocol; Distance Vector Routing Protocol from IETF (EGP) TR Technical Report (from BBF) TTL Time to live ROADM Reconfigurable Optical Add-Drop Multiplexer UDP User Datagram Protocol RR Route Reflector for BGP/MP-BGP UNI User Network Interface RTP Real Time Protocol © SPARC consortium 2012 Page 125 of 129 WP3, Deliverable 3.3 Split Architecture - SPARC VEB Virtual Ethernet Bridges VSI Virtual Station Interface VEPA Virtual Ethernet Port Aggregator WAN Wide Area Network VLAN Virtual LAN WDM Wavelength Division Multiplexing VM Virtual Machine WEP Wired Equivalent Privacy vNIC Virtual NIC WFQ Weighted Fair Queuing VoIP Voice over IP WP Work Package VPLS Virtual Private LAN Service WSON Wavelength Switched Optical Network VPN Virtual Private Network WT Working Text (from BBF) © SPARC consortium 2012 Page 126 of 129 WP3, Deliverable 3.3 Split Architecture - SPARC References [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] [14] [15] [16] [17] [18] [19] [20] [21] [22] [23] K. Kompella, Y. Rekther, “Virtual Private LAN service (VPLS): Using BGP for Auto-Discovery and Signaling,” IETF RFC 4761, 2007 (accessed: 2011-09-05). Metro Ethernet Forum (MEF), online: http://metroethernetforum.org/index.php (accessed: 2011-0905) OpenFlow 1.2 Proposals, wiki online: http://www.openflow.org/wk/index.php/ OpenFlow_1_2_proposal (accessed: 2011-09-05) W. Simpson, “The Point-to-Point Protocol (PPP),” IETF RFC 1661, 1994 (accessed: 2011-09-05) Toward Real Energy-efficient Network Design (TREND), online: www.fp7-trend.eu (accessed: 201109-05) Energy efficiency in large scale distributed systems (IC804), online http://www.irit.fr/cost804/ (accessed: 2011-09-05) GreenTouch, online: http://www.greentouch.org (accessed: 2011-09-05) GreenStar Network, online: http://www.greenstarnetwork.com/ (accessed: 2011-09-05) R.S. Tucker, R. Parthiban, J. Baliga, K. Hinton, R.W.A. Ayre, W.V. Sorin, “Evolution of WDM optical IP networks: A cost and energy perspective,” IEEE Journal of Lightwave Technology 27(3), 243-252, 2009 Andrew R. Curtis, Jeffrey C. Mogul, Jean Tourrilhes, Praveen Yalagandula, Puneet Sharma, Sujata Banerjee: “DevoFlow: Scaling Flow Management for High-Performance Networks.” SIGCOMM’11. R. Sherwood, G. Gibb, K.-K. Yap, G. Appenzeller, M. Casado, N. McKeown, and G. Parulkar: “Can the production network be the testbed?” In OSDI, 2010. A. Tavakoli, M. Casado, T. Koponen, and S. Shenker: “Applying NOX to the Datacenter.” In HotNets, 2009. Z. Cai, A. L. Cox, and T. S. E. Ng. “Maestro: A System for Scalable OpenFlow Control”. Tech. Rep. TR10-08, Rice University, 2010. A.Farrel, I.Bryskin. “GMPLS – Architecture and Applications.” Morgan Kaufman, Elsevier 2006. Expedient. Control framework for OpenFlow test beds. Available online: http://yuba.stanford.edu/~jnaous/expedient/ and http://groups.geni.net/geni/wiki/OpenFlow/Expedient ICT-258457 SPARC Deliverable D2.1 “Initial Definition of Use Cases and Carrier Requirements.” Available online: http://www.fp7-sparc.eu/project/deliverables/ M.Casado, T.Koponen, D.Moon, S.Shenker. “Rethinking Packet Forwarding Hardware.” Seventh ACM Workshop on Hot Topics in Networks (HotNets-VII), Calgary, Alberta, Canada, October 6-7, 2008 J.C.Mogul, P.Yalagandula, J.Tourrilhes, R.McGeer, S.Banerjee, T.Connors, P.Sharma. “API Design Challenges for Open Router Platforms on Proprietary Hardware.” Seventh ACM Workshop on Hot Topics in Networks (HotNets-VII), Calgary, Alberta, Canada, October 6-7, 2008 OpenFlow 1.1 Specification, online: www.openflow.org/documents/openflow-spec-v1.1.0.pdf (accessed: 2011-11-29) SARA Computing & Networking Services, “dot1ag-utils.” Online: https://noc.sara.nl/nrg/dot1agutils/index.html (accessed: 2011-12-01) L. Yang, R. Dantu, T. Anderson, R. Gopal. “Forwarding and Control Element Separation (ForCES) Framework.” IETF RFC 3746, 2004 (accessed: 2011-12-20). Rob Sherwood, Glen Gibby, Kok-Kiong Yapy, Guido Appenzellery, Martin Casado, Nick McKeowny, Guru Parulkary. "FlowVisor: A Network Virtualization Layer," http://www.openflowswitch.org/downloads/technicalreports/openflow-tr-2009-1-flowvisor.pdf (accessed: 2011-12-20) Gude, N., Koponen, T., Pettit, J., Pfaff, B., Casado, M., McKeown, N., and Shenker, S. 2008. NOX: toward an operating system for networks. SIGCOMM Comput. Commun. Rev. 38, 3 (Jul. 2008), 105110. http://doi.acm.org/10.1145/1384609.1384625 © SPARC consortium 2012 Page 127 of 129 WP3, Deliverable 3.3 Split Architecture - SPARC [24] Beacon Openflow controller, https://openflow.stanford.edu/display/Beacon/Home (accessed 2011-1220) [25] Trema Openflow framework, http://trema.github.com/trema/ (accessed 2011-12-20) [26] John Day, Ibrahim Matta, Karim Mattar."Networking is IPC: a guiding principle to a better internet." CoNEXT, 2008 [27] Y. Lee, G. Bernstein, Wataru Imajuku.“ Framework for GMPLS and PCE Control of Wavelength Switched Optical Networks (WSON),” IETF Internet draft (informational), 2011, https://tools.ietf.org/html/draft-ietf-ccamp-rwa-wson-framework-12 (accessed 2011-12-20) [28] M.R. Nascimento, C.E. Rothenberg, M.R. Salvador, M.F. Magalhães,"QuagFlow: partnering Quagga with OpenFlow," SIGCOMM, 2010 [29] Casado, M. and Koponen, T. and Ramanathan, R. and Shenker, S., "Virtualizing the network forwarding plane," Workshop on Programmable Routers for Extensible Services of Tomorrow, 2010 [30] Salvadori, E. and Corin, R.D. and Gerola, M. and Broglio, A. and De Pellegrini, F., "Demonstrating generalized virtual topologies in an openflow network," ACM SIGCOMM, 2011 [31] OpenvSwitch, "Open vSwitch: An Open Virtual Switch," http://openvswitch.org (accessed 2012-1222) [32] lxc Linux Containers, http://lxc.sourceforge.net (accessed 2012-12-22) [33] Foster, N. and Harrison, R. and Freedman, M.J. and Monsanto, C. and Rexford, J. and Story, A. and Walker, D., "Frenetic: A network programming language," ACM SIGPLAN, 2011 [34] IXIA, "10-Gigabit Ethernet Switch Performance Testing," http://www.ixiacom.com/pdfs/library/white_papers/10ge.pdf (accessed 2012-12-22) [35] Barreiros, M. and Lundqvist, P., "QOS-enabled Networks: Tools and Foundations," Wiley, 2011 [36] Farrel, A. et al., "Network Quality of Service Know It All," Morgan Kaufmann, 2009 [37] Perros, H.G., "An introduction to ATM networks," Wiley, 2002 [38] Shirazipour M., Tatipamula M., “Design Considerations for OpenFlow Extensions Toward MultiDomain, Multi-Layer, and Optical Networks”, ECOC OFELIA Workshop, 2011 [39] Authenrieth, A.: “Extending OpenFlow to Optical Wavelength Switching – Challenges, Requirements, and Integration Model”, ECOC OFELIA Workshop, 2011, slides to be found at http://www.fp7ofelia.eu/news-and-events/workshops/ecoc-2011-ofelia-workshop/ (also for [38]) [40] N. Leymann, B. Decraene, C. Filsfils, M. Konstantynowicz, D. Steinberg, "Seamless MPLS Architecture," IETF Internet draft (informational), 2011 [41] OpenFlow 1.3 Specification, online: https://www.opennetworking.org/images/stories/downloads/specification/openflow-spec-v1.3.0.pdf (accessed: 2012-07-11) [42] OpenFlow-Config 1.1, online: https://www.opennetworking.org/images/stories/downloads/ofconfig/of-config-1.1.pdf (accessed: 2012-07-11) [43] Open Networking Foundation (ONF), online: https://www.opennetworking.org/ (accessed: 2012-0720) [44] Open Networking Foundation: “Software-Defined Networking: The New Norm for Networks”, ONF White Paper, 2012, online: https://www.opennetworking.org/images/stories/downloads/whitepapers/wp-sdn-newnorm.pdf (accessed: 2012-07-20) [45] McKeown, N. et al, “OpenFlow: enabling innovation in campus networks”, ACM SIGCOMM CCR, 38 (2), 2008 [46] Raman, L., “OSI system and network management”, Communications Magazine 36 (3), IEEE, 1998 [47] ITU-T Recommendation M.3400, “TMN management functions”, 2000 [48] ITU-T Recommendation M.3010, “Principles for a telecommunications management network”, 2000 [49] ITU-T Recommendation G.7710/Y.1701, “Common equipment management function Requirements”, 2012 [50] ITU-T Recommendation G.7718/Y.1709, “Framework for ASON management”, 2005 © SPARC consortium 2012 Page 128 of 129 WP3, Deliverable 3.3 Split Architecture - SPARC [51] ITU-T Recommendation I.322, “Generic protocol reference model for telecommunication networks”, 1999 [52] Das, S., “Extensions to the OpenFlow protocol in support of circuit switching”, addendum to OpenFlow Protocol Specification v1.0 – Circuit Switch Addendum v0.3, 2010, online: http://www.openflow.org/wk/index.php/PAC.C (accessed: 2012-07-11) [53] Shirazipour M. et al., “Realizing Packet-Optical Integration with SDN and OpenFlow 1.1 Extensions”, Workshop on Software Defined Networks (SDN’12), IEEE, 2012 [54] Rexford, J. et al., “Network-wide decision making: toward a wafer-thin control plane”, HotNets III, 2004,online: http://www.cs.cmu.edu/~acm/papers/rexford-hotnetsIII.pdf [55] Keymile, “White Paper: Advantages of Voice-over-IP with IP-based multi-service access nodes (IP MSAN)”, available at http://www.keymile.com [56] Andersson, L. et al., “MPLS Transport Profile (MPLS-TP) Control Plane Framework”, IETF RFC 6373, 2012 (accessed: 2012-08-30). [57] Lam K. et al., “Network Management Requirements for MPLS-based Transport Networks” IETF RFC 5951, 2007 (accessed: 2011-09-05). [58] Duncan R. et al., “PacketC Language and Parallel Processing of Masked Databases”, http://ieeexplore.ieee.org/xpl/articleDetails.jsp?reload=true&arnumber=5599193&contentType=Confe rence+Publications [59] Bocci M., Vigoureux M. et al., “RFC 5586 - MPLS Generic Associated Channel”, available online: http://tools.ietf.org/html/rfc5586 [60] Allan D., Swallow G., et al. “RFC 6428 - Proactive Connectivity Verification, Continuity Check, and Remote Defect Indication for the MPLS Transport Profile”, November 2011, available online: http://tools.ietf.org/html/rfc6428 [61] Broadband Forum Technical Report TR-101 Issue 2 “Migration to Ethernet-Based DSL Aggregation”, July 2011, available online. http://www.broadband-forum.org/technical/download/TR-101_Issue-2.pdf [62] Forsberg D., Ohba Y., Patil B., Tschofenig H. and Yegin A., “RFC 5191 - Protocol for Carrying Authentication for Network Access (PANA)”, May 2008, available online http://tools.ietf.org/html/rfc5191 [63] Patrick M., “RFC 3046 DHCP Relay Agent Information Option”, January 2001, available online http://tools.ietf.org/html/rfc3064 [64] Seaman Mick, “IEEE Standard for Local and metropolitan area networks--Port-Based Network Access Control”, February 5th 2010, ISBN 9780738161464 [65] RouteFlow Project, available online at https://sites.google.com/site/routeflow/ [66] NFTables Project, available online at http://en.wikipedia.org/wiki/Nftables [67] Specification and Description Language (SDL) – available online at http://www.itu.int/rec/T-REC-z [68] Intel I/O Acceleration Technology (IOAT), available online at http://www.intel.com/content/www/us/en/wireless-network/accel-technology.html [69] E. Mannie, “Generalized Multi-Protocol Label Switching (GMPLS) Architecture,” IETF RFC 3945, 2004 (accessed: 2012-10-29). [70] A. Farrel, J.-P. Vasseur, J. Ash, “A Path Computation Element (PCE)-Based Architecture,” IETF RFC 4655, 2006 (accessed: 2012-10-29). [71] G. Swallow et al., “Generalized Multiprotocol Label Switching (GMPLS) User-Network Interface (UNI): Resource ReserVation Protocol-Traffic Engineering (RSVP-TE) Support for the Overlay Model”, RFC 4208, 2005 (access: 2012-10-29) [72] A. Tootoonchian and G. Yashar, "HyperFlow: A distributed control plane for OpenFlow." Proceedings of the 2010 internet network management conference on Research on enterprise networking, 2010. [73] OSI, “Information technology -- Open Systems Interconnection -- Basic Reference Model: The Basic Model, ISO/IEC 7498-1:1994”, 1994, Available online at: http://standards.iso.org/ittf/PubliclyAvailableStandards/s020269_ISO_IEC_7498-1_1994(E).zip (accessed: 2012-10-30) © SPARC consortium 2012 Page 129 of 129

Log In

Split Architecture for Large Scale Wide Area Networks

Related papers

Related papers