
    Tor Skeie

    Clouds offer flexible and economically attractive compute and storage solutions for enterprises. However, the effectiveness of cloud computing for high-performance computing (HPC) systems remains questionable. When clouds are deployed on lossless interconnection networks, like InfiniBand (IB), challenges related to load balancing, low-overhead virtualization, and performance isolation hinder full utilization of the underlying interconnect. Moreover, cloud data centers are highly dynamic environments, rendering the static network reconfigurations typically used in IB systems infeasible. In this paper, we present a framework for a self-adaptive network architecture for HPC clouds based on lossless interconnection networks, demonstrated by means of our implemented IB prototype. Our solution, based on a feedback control and optimization loop, enables the lossless HPC network to dynamically adapt to varying traffic patterns, current resource availability, and workload distributions, in accordance with service provider-defined policies. Furthermore, we present IBAdapt, a simplified rule-based language for service providers to specify the adaptation strategies used by the framework. Our self-adaptive IB network prototype is demonstrated using state-of-the-art industry software. The results obtained on a test cluster demonstrate the feasibility and effectiveness of the framework in improving Quality-of-Service compliance in HPC clouds.
    Exascale computing systems are being built with thousands of nodes. A key component of these systems is the interconnection network. The high number of components significantly increases the probability of failure. If failures occur in the interconnection network, they may isolate a large fraction of the machine. For this reason, an efficient fault-tolerant mechanism is needed to keep the system interconnected, even in the presence of faults. A recently proposed topology for these large systems is the hybrid KNS family, which provides excellent performance and connectivity at a reduced hardware cost. This paper presents a fault-tolerant routing methodology for the KNS topology that degrades performance gracefully in the presence of faults and tolerates a reasonably large number of faults without disabling any healthy node. In order to tolerate network failures, the methodology uses a simple mechanism: for some source-destination pairs, and only if necessary, packets are forwarded to the destination node through a set of intermediate nodes (without being ejected from the network) in order to avoid the faults. The evaluation results show that the methodology tolerates a large number of faults. Furthermore, the methodology offers graceful performance degradation; for instance, performance degrades only 1% for a 2D network with 1024 nodes and 1% faulty links.
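    The intermediate-node mechanism described above can be illustrated with a minimal sketch. The following toy example uses a 2D mesh with dimension-order routing rather than the paper's KNS topology, and all function names are our own illustration, not the authors' implementation: if the minimal path crosses a faulty link, the packet is instead sent through an intermediate node whose two minimal sub-paths both avoid the faults.

    ```python
    from itertools import product

    def dor_path(src, dst):
        """Dimension-order (X then Y) minimal path on a 2-D mesh."""
        path, (x, y) = [src], src
        while x != dst[0]:
            x += 1 if dst[0] > x else -1
            path.append((x, y))
        while y != dst[1]:
            y += 1 if dst[1] > y else -1
            path.append((x, y))
        return path

    def crosses_fault(path, faulty_links):
        """True if any hop of the path traverses a faulty link."""
        return any(frozenset((a, b)) in faulty_links
                   for a, b in zip(path, path[1:]))

    def route(src, dst, dims, faulty_links):
        """Use the direct minimal route when it is fault-free; otherwise
        detour through an intermediate node I whose sub-paths src->I and
        I->dst both avoid the faults (packet is never ejected en route)."""
        direct = dor_path(src, dst)
        if not crosses_fault(direct, faulty_links):
            return direct
        for mid in product(range(dims[0]), range(dims[1])):
            if mid in (src, dst):
                continue
            a, b = dor_path(src, mid), dor_path(mid, dst)
            if not crosses_fault(a, faulty_links) and \
               not crosses_fault(b, faulty_links):
                return a + b[1:]
        return None  # more faults than this simple scheme tolerates
    ```

    Only source-destination pairs whose minimal path is broken pay the detour cost; all other pairs keep their original routes.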
    As the size of high-performance computing systems grows, the number of events requiring a network reconfiguration, as well as the complexity of each reconfiguration, is likely to increase. In large systems, the probability of component failure is high. At the same time, with more network components, ensuring high utilization of network resources becomes challenging. Reconfiguration in interconnection networks, like InfiniBand (IB), typically involves computation and distribution of a new set of routes in order to maintain connectivity and performance. In general, current routing algorithms do not consider the existing routes in a network when calculating new ones. Such configuration-oblivious routing might result in substantial modifications to the existing paths, and the reconfiguration becomes costly as it potentially involves a large number of source-destination pairs. In this paper, we propose a novel routing algorithm for IB-based fat-tree topologies, SlimUpdate. SlimUpdate employs techniques to preserve existing forwarding entries in switches to ensure a minimal routing update, without any performance penalty, and with minimal computational overhead. We present an implementation of SlimUpdate in OpenSM, and compare it with the current de facto fat-tree routing algorithm. Our experiments and simulations show a decrease of up to 80% in the number of total path modifications when using SlimUpdate routing, while achieving similar or even better performance than the fat-tree routing in most reconfiguration scenarios.
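    The core idea of preserving existing forwarding entries can be shown in a few lines. This is a hypothetical sketch of the general technique, not the SlimUpdate or OpenSM implementation: for each destination, the old output port is kept whenever it is still usable, so only entries invalidated by the topology change are modified.

    ```python
    def reroute_preserving(old_table, valid_ports):
        """Recompute a switch forwarding table with minimal changes.
        `old_table`: {dst: port} before reconfiguration.
        `valid_ports`: {dst: set of ports still usable towards dst}.
        Keeps the existing port when still valid; otherwise picks the
        least-loaded valid port (load = entries assigned in this pass)."""
        new_table, load = {}, {}
        for dst, ports in sorted(valid_ports.items()):
            keep = old_table.get(dst)
            if keep in ports:
                port = keep
            else:
                port = min(ports, key=lambda p: (load.get(p, 0), p))
            new_table[dst] = port
            load[port] = load.get(port, 0) + 1
        return new_table

    def modified_entries(old_table, new_table):
        """Number of forwarding entries that must be updated in hardware."""
        return sum(1 for d in new_table if old_table.get(d) != new_table[d])
    ```

    A configuration-oblivious algorithm could rewrite every entry; preserving valid entries keeps the update cost proportional to the actual damage.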
    An increasing number of interconnect technologies rely on source routing to forward packets through the network. It is therefore important to develop methods for fault tolerance that are well suited for source-routed networks. Dynamic fault tolerance allows the network to remain available through the occurrence of faults, as opposed to static fault tolerance, which requires the network to be halted for reconfiguration. Source routing readily supports the source node choosing a different path when a fault occurs, but with this approach, packets already in the network will be lost. Local dynamic fault tolerance, where the packet is routed around the fault locally, prevents much of the traffic from being lost during failures, but it is cumbersome to achieve in source-routed networks, since packets encountering a fault need to follow a path different from that encoded in the packet header. In this paper we present a mechanism to achieve local dynamic fault tolerance in source-routed fat trees, a topology in widespread use in supercomputer systems, and compare it with endpoint dynamic fault tolerance. We also show that by combining the two approaches we achieve performance superior to either of the two individually.
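    The local-detour idea can be sketched abstractly: a packet carries its route in the header, and the switch adjacent to a failed link splices a locally known detour into the remaining route instead of dropping the packet. This is our simplified illustration, not the paper's fat-tree mechanism; the data structures are hypothetical.

    ```python
    def deliver(route, faulty_links, local_detours):
        """Walk a source route hop by hop. When the next link has failed,
        splice in a locally known detour around it (local dynamic fault
        tolerance) rather than losing the packet and retrying from the source.
        `route`: list of node ids; `local_detours[(u, v)]`: replacement hops
        from u to v (excluding u, ending at v)."""
        path, queue = [route[0]], list(route[1:])
        while queue:
            u, v = path[-1], queue[0]
            if frozenset((u, v)) in faulty_links:
                detour = local_detours.get((u, v))
                if detour is None:
                    return None      # no local alternative: packet is lost
                queue[:1] = detour   # rewrite the remainder of the route
                continue
            path.append(queue.pop(0))
        return path
    ```

    Endpoint fault tolerance would instead return None for every in-flight packet and let the source resend on a new path; combining both covers in-flight traffic and future traffic.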
    Virtualization of computing resources is becoming increasingly important both for high-end servers and multi-core CPUs. In a virtualized system, the set of resources that constitute a virtual compute entity should be spatially separated from each other. Dividing the cores on a chip, or the CPUs in a high-end server, into disjoint sets for each task is a trivial problem.
    Interconnection networks play a key role in the fault tolerance of massively parallel computers, since faults may isolate a large fraction of the machine containing many healthy nodes. In this paper, we present a methodology to design fully adaptive fault-tolerant routing algorithms for direct interconnection networks that can be applied to different regular topologies. The methodology is mainly based on the selection of an intermediate node (if needed) for each source-destination pair. Packets are adaptively routed to the intermediate node and, from this node, they are adaptively forwarded to their destination. This methodology requires only one additional virtual channel, even for tori. Evaluation results show that the methodology is 7-fault tolerant, and for up to 14 faults, more than 99% of the combinations are tolerated, without significantly degrading performance in the presence of faults.
    ... Frank Olaf Sem-Jacobsen, Åshild Grønstad Solheim, Olav Lysne, Tor Skeie, and Thomas Sødring, Department of Informatics / Networks and Distributed Systems ... The second dataset is the continuous black line, which we call the network performance ratio. ...
    Virtualization is the key to efficient resource utilization and elastic resource allocation in cloud computing. It enables consolidation, the on-demand provisioning of resources, and elasticity through live migration. Live migration makes it possible to optimize resource usage by moving virtual machines (VMs) between physical servers in an application-transparent manner. It does, however, require a flexible, high-performance, scalable virtualized I/O architecture to reach its full potential. This is challenging to achieve with high-speed networks such as InfiniBand and remote direct memory access enhanced Ethernet, because these devices usually maintain their connection state in the network device hardware. Fortunately, the single root I/O virtualization (SR-IOV) specification addresses the performance and scalability issues. With SR-IOV, each VM has direct access to a hardware-assisted virtual device without the overhead introduced by emulation or para-virtualization. However, SR-IOV does not address the migration of the network device state. In this paper we present and evaluate the first available prototype implementation of live migration over SR-IOV enabled InfiniBand devices.
    Computer architectures for high performance computing have traditionally been based on an assumption of one parallel application running alone on one machine. The current trend is, however, that huge computer installations offer compute power to a set of users or customers, each demanding only a subset of the available compute resources. This places new requirements on the architecture, in that it must support dynamic partitioning of the resources into several virtual servers as demand changes. We introduce a novel framework which supports flexible formation of such virtual servers while preventing interference between the communication of different virtual servers. This paper investigates the impacts of a shared interconnection network on applications running on virtual compute servers. We show that the interconnect performance supplied to each job is highly unpredictable, and that a job can experience a performance degradation of 97% when its traffic interferes with the traffic of concurrent jobs. With a minor reduction in the utilization of each processing node, this can be considerably improved through a combination of routing-containment in the interconnection network and a carefully designed resource allocation strategy.
    Massively parallel computing systems are being built with thousands of nodes. Because of the high number of components, it is critical to keep these systems running even in the presence of failures. Interconnection networks play a key-role in these systems, and this paper proposes a fault-tolerant routing methodology for use in such networks. The methodology supports any minimal routing function
    A modern supercomputer or large-scale server consists of a huge set of components that perform processing functions and various forms of input/output and memory functions. All of the components unite in a complex collaboration to perform the tasks of the entire system. The communication between these components that allows this collaboration to take place is supported by an infrastructure called the interconnection network.
    Understanding the nature of traffic in high-speed communication systems is essential for achieving QoS in these networks. A first step towards this goal is understanding how basic QoS mechanisms work and affect network predictability, before we introduce more complex mechanisms such as admission control. In this paper we analyse the effect of a DiffServ-inspired QoS concept applied to virtual cut-through networks. The main findings from our study are that (i) throughput differentiation can be achieved by weighting of virtual lanes (VL) and by classifying VLs as either low or high priority, (ii) the balance between VL weighting and VL load is not crucial when the network is operating below the saturation point, and (iii) jitter, however, is large, and good jitter characteristics seem unachievable with such a relative scheme.
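    The two differentiation mechanisms named in finding (i) can be sketched together. This toy arbiter (our illustration; VL numbering, weights, and packet names are assumptions, not taken from the paper) gives high-priority VLs strict precedence over each transmission slot and shares the remaining slots among low-priority VLs in proportion to their weights.

    ```python
    from collections import deque

    def vl_schedule(queues, weights, high_priority, slots):
        """One packet is sent per slot. High-priority VLs always win the slot
        (strict priority); otherwise low-priority VLs share it in weighted
        round-robin order. `queues`: {vl: deque of packets};
        `weights`: {low-priority vl: weight}; `high_priority`: set of VLs."""
        wrr = [vl for vl, w in sorted(weights.items())
               for _ in range(w) if vl not in high_priority]
        out, i = [], 0
        for _ in range(slots):
            hp = next((vl for vl in sorted(high_priority) if queues[vl]), None)
            if hp is not None:
                out.append((hp, queues[hp].popleft()))
                continue
            for _ in range(len(wrr)):      # skip over empty low-priority VLs
                vl = wrr[i % len(wrr)]
                i += 1
                if queues[vl]:
                    out.append((vl, queues[vl].popleft()))
                    break
        return out
    ```

    With weights 2:1, the low-priority VLs receive bandwidth in a 2:1 ratio once the high-priority VLs are drained, which is the throughput differentiation the study measures; the scheme says nothing about jitter, matching finding (iii).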
    End-to-end congestion control is the main method of congestion control in the Internet, and achieving consistently low queuing latency with end-to-end methods is a very active area of research. Even so, achieving consistently low queuing latency in the Internet remains an unsolved problem. Therefore, we ask: "What are the fundamental limits of end-to-end congestion control?" We find that the unavoidable queuing latency for best-case end-to-end congestion control is on the order of hundreds of milliseconds under conditions that are common in the Internet. Our argument depends on two things: the latency of congestion signaling, bounded at minimum by the speed of light, and the fact that link capacity may change rapidly for an end-to-end path in the Internet.
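    The shape of this argument can be made concrete with a back-of-the-envelope calculation (our illustration of the reasoning, not the paper's model): a sender keeps transmitting at a rate matched to the old bottleneck capacity until the congestion signal arrives, and the queue built up in the meantime must then drain at the new, lower capacity.

    ```python
    def queuing_latency_after_capacity_drop(rate_bps, new_cap_bps, signal_delay_s):
        """Queuing latency that builds up while the sender is still unaware
        of a capacity drop. The sender sends at `rate_bps` for `signal_delay_s`
        seconds after the bottleneck falls to `new_cap_bps`; the excess bits
        queue at the bottleneck and drain at the new capacity."""
        excess_bits = (rate_bps - new_cap_bps) * signal_delay_s  # queue growth
        return excess_bits / new_cap_bps  # seconds of queuing latency

    # Example: capacity halves (100 -> 50 Mbit/s) and the congestion signal
    # takes one 100 ms round trip: the queue represents 100 ms of latency.
    latency = queuing_latency_after_capacity_drop(100e6, 50e6, 0.100)
    ```

    When capacity halves, the queuing latency equals the signaling delay itself, so no end-to-end scheme can react faster than its control loop: with Internet-scale propagation delays this is tens to hundreds of milliseconds.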
    In September 2020, the Broadband Forum published a new industry standard for measuring network quality. The standard centers on the notion of quality attenuation. Quality attenuation is a measure of the distribution of latency and packet loss between two points connected by a network path. A vital feature of the quality attenuation idea is that we can express detailed application requirements and network performance measurements in the same mathematical framework. Performance requirements and measurements are both modeled as latency distributions. To the best of our knowledge, existing models of the 802.11 WiFi protocol do not permit the calculation of complete latency distributions without assuming steady-state operation. We present a novel model of the WiFi protocol. Instead of computing throughput numbers from a steady-state analysis of a Markov chain, we explicitly model latency and packet loss. Explicitly modeling latency and loss allows for both transient and steady-state analysis.
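    The "same mathematical framework" point can be sketched directly: represent each hop's latency as a discrete distribution, compose hops by convolution (the latency of independent hops adds), and state a requirement as a percentile over the same kind of distribution. The numbers below are toy values of our own, not measurements from the standard.

    ```python
    def convolve(d1, d2):
        """Compose two per-hop latency distributions ({latency_ms: prob}).
        End-to-end latency over independent hops is the sum of the per-hop
        latencies, so the combined distribution is the convolution."""
        out = {}
        for l1, p1 in d1.items():
            for l2, p2 in d2.items():
                out[l1 + l2] = out.get(l1 + l2, 0.0) + p1 * p2
        return out

    def meets_requirement(dist, latency_ms, min_fraction):
        """A requirement expressed in the same framework: at least
        `min_fraction` of packets must arrive within `latency_ms`."""
        return sum(p for l, p in dist.items() if l <= latency_ms) >= min_fraction
    ```

    Because requirements and measurements share one representation, checking compliance reduces to comparing two distributions, which is what makes the quality attenuation framing useful.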
    Nowadays, the use of multimedia applications that present QoS requirements is increasing rapidly. Advanced Switching (AS) is a new interconnection network technology that expands the capabilities of PCI Express. AS provides mechanisms that can be used to support QoS. Specifically, an AS fabric permits us to employ virtual channels, egress link scheduling, and an admission control mechanism to differentiate between traffic flows. In this paper we examine these mechanisms and show how to provide QoS based on bandwidth and latency requirements. Furthermore, a new algorithm, Weighted Fair Queuing Credit Aware, is proposed as a specific implementation of one of the schedulers suggested by the AS specification.
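    A simplified sketch can show what "credit aware" adds to weighted fair queuing: a flow's head packet is only eligible for transmission if enough link-level credits remain for its size. This is our own illustration under that assumption; the paper's Weighted Fair Queuing Credit Aware algorithm may differ in detail, and the per-flow finish-time bookkeeping below is a simplification of full WFQ virtual time.

    ```python
    def wfq_credit_aware(flows, credits, packets):
        """Weighted fair queuing with a credit check.
        `flows`: {flow: weight}; `packets`: {flow: list of packet sizes};
        `credits`: link credits available (same units as packet sizes).
        Among flows whose head packet fits in the remaining credits, the
        one with the smallest virtual finish time is served next."""
        finish = {f: 0.0 for f in flows}          # simplified finish times
        heads = {f: list(ps) for f, ps in packets.items()}
        order = []
        while True:
            eligible = [f for f in flows if heads[f] and heads[f][0] <= credits]
            if not eligible:
                break                             # out of credits or packets
            f = min(eligible, key=lambda f: finish[f] + heads[f][0] / flows[f])
            size = heads[f].pop(0)
            finish[f] += size / flows[f]
            credits -= size
            order.append((f, size))
        return order
    ```

    With scarce credits the scheduler stops rather than overcommitting the link; with sufficient credits it degenerates to plain WFQ, serving flows in proportion to their weights.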
    In large high-performance computing systems, the probability of component failure is high. At the same time, for a sustained system performance, reconfiguration is often needed to ensure high utilization of available resources. Reconfiguration in interconnection networks, like InfiniBand (IB), typically involves computation and distribution of a new set of routes in order to maintain connectivity and performance. In general, current routing algorithms do not consider the existing routes in a network when calculating new ones. Such configuration-oblivious routing might result in substantial modifications to the existing paths, and the reconfiguration becomes costly as it potentially involves a large number of source–destination pairs. In this paper, we propose a novel routing algorithm for IB-based fat-tree topologies, SlimUpdate. SlimUpdate employs path preservation techniques to achieve a decrease of up to 80% in the number of total path modifications, as compared to the OpenSM’s fat-tree routing algorithm, in most reconfiguration scenarios. Furthermore, we present a metabase-aided re-routing method for fat-trees, based on destination leaf-switch multipathing. Our proposed method significantly reduces network reconfiguration overhead, while providing greater routing flexibility. On successive runs, our proposed method saves up to 85% of the total routing time over the traditional re-routing scheme. Based on the metabase-aided routing, we also present a modified SlimUpdate routing algorithm to dynamically optimize routes for a given MPI node order.
    Rerouting around faulty components and migration of jobs both require reconfiguration of data structures in the Queue Pairs residing in the hosts on an InfiniBand cluster. In this paper we report an implementation of dynamic reconfiguration of such host side data-structures. Our implementation preserves the Queue Pairs, and lets the application run without being interrupted. With this implementation, we demonstrate
    The paper first gives an overview of the functions required for providing Internet connectivity and mobility management for mobile ad-hoc networks (MANETs). Internet gateway selection is one of these functions. Since multiple Internet gateways might exist in the same MANET domain, a hybrid metric for Internet gateway selection is proposed as a replacement for the shortest hop-count metric. The hybrid metric provides load balancing of intra/inter-MANET traffic. Simulation results show that ad-hoc routing protocols using our proposed metric achieve better performance in terms of packet delivery ratio and transmission delay, at the cost of slightly increased signalling overhead.
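    The general shape of a hybrid gateway-selection metric can be sketched as a normalised blend of path length and gateway load. The weighting, normalisation constants, and function names below are our own assumptions for illustration, not the metric defined in the paper.

    ```python
    def hybrid_metric(hops, load, alpha=0.5, max_hops=10, max_load=100):
        """Blend of hop count and gateway load, each normalised to [0, 1].
        `alpha` trades path length against load; lower metric is better.
        All constants here are illustrative assumptions."""
        return alpha * (hops / max_hops) + (1 - alpha) * (load / max_load)

    def select_gateway(gateways):
        """Pick the gateway with the lowest hybrid metric.
        `gateways`: {name: (hop_count, current_load)}."""
        return min(gateways, key=lambda g: hybrid_metric(*gateways[g]))
    ```

    A pure shortest hop-count rule would send all traffic to the nearest gateway even when it is overloaded; the blended metric shifts flows toward a slightly more distant but lightly loaded gateway, which is the load-balancing effect the abstract describes.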

    And 137 more