ES2237346B2

ES2237346B2 - TOLERANT ROADING MECHANISM TO HIGHLY SCALABLE FAULTS.

Info

Publication number: ES2237346B2
Application number: ES200500530A
Authority: ES
Inventors: Valentin Puente Varona; Jose Angel Gregorio Monasterio; Fernando Valle Alonso; Ramon Beivide Palacio
Original assignee: Universidad de Cantabria
Current assignee: Universidad de Cantabria
Priority date: 2005-03-01
Filing date: 2005-03-01
Publication date: 2006-07-16
Anticipated expiration: 2025-03-01
Also published as: ES2237346A1

Abstract

Mecanismo de encaminamiento tolerante a fallos altamente escalable denominado S-Immunet, que se caracteriza por ser un mecanismo eficiente para tolerar fallos en redes de interconexión de computadores paralelos y distribuidos. El mecanismo está basado, por un lado, en un método de reencaminar los mensajes cuando se produce un fallo en la red y por otro lado, en una estructura hardware específica del aparato encaminador de mensajes. Las principales diferencias con los mecanismos previos de tolerancia a fallos son que la invención puede ser aplicada a redes muy grandes (miles de nodos) de tipo k-ary n-cube; no produce una sobrecarga de la red en ausencia de fallo; reconfiguración automática y transparente a la aplicación; reparados los componentes se recupera el rendimiento de la red antes del fallo y el nuevo mecanismo además es capaz de tolerar cualquier número de fallos de enlace y cualquier combinación espacial y temporal de fallos.Highly scalable fault-tolerant routing mechanism called S-Immunet, which is characterized as an efficient mechanism to tolerate faults in interconnection networks of parallel and distributed computers. The mechanism is based, on the one hand, on a method of re-routing the messages when a network failure occurs and on the other hand, on a specific hardware structure of the message routing apparatus. The main differences with the previous fault tolerance mechanisms are that the invention can be applied to very large networks (thousands of nodes) of type k-ary n-cube; it does not produce an overload of the network in the absence of failure; automatic and transparent reconfiguration to the application; Once the components are repaired, the network performance is recovered before the failure and the new mechanism is also able to tolerate any number of link failures and any spatial and temporal combination of failures.

Description

Mecanismo de encaminamiento tolerante a fallos altamente escalable.Fault-tolerant routing mechanism highly scalable

Technical sector

La invención está relacionada con las redes de interconexión de múltiples computadores (multicomputadores ó multiprocesadores) y más concretamente con la tolerancia a fallos que puedan producirse en las redes que los componen.The invention is related to the networks of interconnection of multiple computers (multicomputers or multiprocessors) and more specifically with fault tolerance that can occur in the networks that compose them.

State of the art

Lograr elevadas tasas de disponibilidad es objetivo deseable en la mayoría de los computadores. Sin embargo, lograr esta meta en grandes computadores paralelos con tareas de carácter crítico deja de ser una opción para convertirse en una necesidad. Cualquier computador de este tamaño debe disponer de mecanismos eficientes para tolerar la aparición de fallos a todos sus niveles, pues la multiplicidad de componentes incrementa de forma considerable su tiempo medio entre fallos (MTBF).Achieving high availability rates is desirable goal in most computers. But nevertheless, achieve this goal in large parallel computers with tasks of critical character ceases to be an option to become a need. Any computer of this size must have efficient mechanisms to tolerate the occurrence of failures at all its levels, because the multiplicity of components increases by considerable way its mean time between failures (MTBF).

El contexto en el que se centra esta invención está en el subsistema de interconexión. Los problemas de tolerancia a fallos dentro de la red de interconexión se pueden subdividir entre errores o fallos a nivel de enlace, errores a nivel de encaminador y a nivel de red. Nosotros nos centraremos sobre el nivel de red. A este nivel, el problema fundamental no esta relacionado con las técnicas de implementación, como ocurre en los otros dos niveles, siendo un problema más amplio. Cuando una red de interconexión sufre un fallo en uno de sus recursos (enlaces o nodos) inmediatamente sus propiedades topológicas se ven alteradas. Esto no resulta difícil de solventar desde el punto de vista del mecanismo de encaminamiento empleado para acercar los paquetes a destino. Sin embargo, ese cambio afecta a otros aspectos más sutiles. El más importante es que las estrategias empleadas para evitar cualquier posible anomalía en la red libre de fallo dejan de ser válidas. De acuerdo a esto, después de un fallo en un componente, incluso en redes con centenares de miles de componentes, puede quedar completamente inutilizada. El tipo de anomalía más íntimamente ligada a los aspectos topológicos es el bloqueo entre paquetes o deadlock. La producción de un deadlock en una parte de la red como consecuencia del fallo y posterior invalidación del mecanismo para su evitación puede hacer inútil el sistema completo.The context in which this invention focuses is on the interconnection subsystem. Fault tolerance problems within the interconnection network can be subdivided between errors or failures at the link level, errors at the router level and at the network level. We will focus on the network level. At this level, the fundamental problem is not related to the implementation techniques, as occurs in the other two levels, being a broader problem. When an interconnection network suffers a failure in one of its resources (links or nodes) immediately its topological properties are altered. This is not difficult to solve from the point of view of the routing mechanism used to bring the packets to their destination. However, that change affects other more subtle aspects. The most important is that the strategies used to avoid any possible failure in the fault-free network are no longer valid. Accordingly, after a failure in one component, even in networks with hundreds of thousands of components, it can be completely disabled. The type of anomaly most closely linked to the topological aspects is the blocking between packets or deadlock . The production of a deadlock in a part of the network as a result of the failure and subsequent invalidation of the mechanism for its avoidance can render the entire system useless.

Este problema no es nuevo y las soluciones propuestas por otros autores pasan desde eliminar aquellos recursos sanos que como consecuencia del fallo pueden inducir el deadlock [1], incluir nuevos recursos físicos o virtuales para solventar la situación [2]-[10] o reconfigurar adecuadamente la información topológica de la red empleando un mecanismo de evitación del deadlock lo suficientemente versátil. En [11] los autores de esta invención propusieron un nuevo mecanismo en este sentido, llamado Immunet. Sus principales virtudes eran que toleraba cualquier combinación espacio-temporal en los fallos, siempre y cuando la red permaneciera conexa.This problem is not new and the solutions proposed by other authors go from eliminating those healthy resources that as a consequence of the failure can induce the deadlock [1], include new physical or virtual resources to solve the situation [2] - [10] or reconfigure suitably the topological information of the network using a sufficiently versatile deadlock avoidance mechanism. In [11] the authors of this invention proposed a new mechanism in this regard, called Immunet. Its main virtues were that it tolerated any spatio-temporal combination in failures, as long as the network remained connected.

Sin embargo Immunet presenta una serie de deficiencias que la hacen casi impracticable cuando se trata de redes con decenas de miles de nodos. La pérdida de rendimiento observada después de un número de fallos reducido tiende a ser elevada. Esto limitaba la aplicabilidad de la idea en los grandes supercomputadores paralelos, que es donde la tolerancia a fallos tiene mas sentido. La invención que aquí se presenta, S- Immunet, es una evolución del mecanismo que permite lograr con total éxito una muy baja pérdida de rendimiento a muy bajo coste.However Immunet presents a series of deficiencies that make it almost impracticable when it comes to networks with tens of thousands of nodes. Loss of performance observed after a reduced number of failures tends to be high. This limited the applicability of the idea in large parallel supercomputers, which is where fault tolerance It makes more sense. The invention presented here, S-Immunet, It is an evolution of the mechanism that allows to achieve with total success a very low loss of performance at a very low cost.

La invención se apoya en la explotación de las características típicas de las redes de interconexión empleadas en estos sistemas. Así como la aplicabilidad de Immunet era completamente genérica, S- Immunet se especializa en redes k-ary n-cube. De esta manera, el nuevo mecanismo, además de tolerar de forma dinámica la aparición de fallos en la red, logra pérdidas de rendimiento por debajo del 1% cuando lo empleamos en una red con 1024 nodos con un solo fallo de enlace (los mejores resultados los obtenía la antigua versión que degeneraba hasta un 15% con el mismo fallo). Además, la invención permite la recuperación dinámica del rendimiento inicial de la red de interconexión cuando se lleva a cabo la reparación. Es decir, una vez que todos los fallos han ido siendo eliminados, el mecanismo es capaz de detectar la nueva situación y sin detener la ejecución de las aplicaciones que estén ejecutándose en la máquina paralela, puede llevar a cabo la mencionada recuperación.The invention is based on the exploitation of the typical characteristics of the interconnection networks used in these systems. Just as Immunet's applicability was completely generic, S-Immunet specializes in k-ary n-cube networks. In this way, the new mechanism, in addition to dynamically tolerating the occurrence of network failures, achieves performance losses below 1% when used in a network with 1024 nodes with a single link failure (the best results they obtained them the old version that degenerated up to 15% with the same fault). Furthermore, the invention allows the dynamic recovery of the initial performance of the interconnection network when the repair is carried out. That is, once all the faults have been eliminated, the mechanism is able to detect the new situation and without stopping the execution of the applications that are running on the parallel machine, it can carry out the mentioned recovery.

[1][one]: NR Adiga, GS Almasi, Y Aridor, M Bae, Rajkishore Barik, et al., "An Overview of the BlueGene/L Supercomputer", Supercomputing \underline{2002}.NR Adiga , GS Almasi , and Aridor , M Bae , Rajkishore Barik , et al ., "An Overview of the BlueGene / L Supercomputer", Supercomputing \ underline {2002}.

[2][2]: R.V. Boppana and S. Chalasani. "Fault-tolerant wormhole routing algorithms for mesh networks". IERF, Trans. on Computers, vol. 44, no.7, pp. 848-864, July \underline{1995}.RV Boppana and S. Chalasani . "Fault-tolerant wormhole routing algorithms for mesh networks". IERF, Trans. on Computers , vol. 44, no.7, pp. 848-864, July \ underline {1995}.

[3][3]: J. Bruck, R. Cypher, and C. Ho. "Fault-tolerant meshes with small degree". SIAM Journal of Computing, vol. 26, no. 6, pp. 1764-1784, December \underline{1997}.J. Bruck , R. Cypher , and C. Ho . "Fault-tolerant meshes with small degree". SIAM Journal of Computing , vol. 26, no. 6, pp. 1764-1784, December \ underline {1997}.

[4][4]: R. Casado, A. Bermudez, F. J. Quiles, J. L. Sanches, and J. Duato, "Performance evaluation of dynamic reconfiguration in high-speed local area networks". International Symposium on High-Performance Computer Architecture (HPCA), January \underline{2000}.R. Casado , A. Bermudez , FJ Quiles , JL Sanches , and J. Duato , "Performance evaluation of dynamic reconfiguration in high-speed local area networks". International Symposium on High-Performance Computer Architecture (HPCA), January \ underline {2000}.

[5][5]: S. Chalasani and R V. Boppana, "Communication in Multicomputers with Nonconvex Faulty". IEEE Trans. on Computers, vol. 46, no.5, pp. 616-622, \underline{1997}.S. Chalasani and R V. Boppana , "Communication in Multicomputers with Nonconvex Faulty". IEEE Trans. on Computers , vol. 46, no.5, pp. 616-622, \ underline {1997}.

[6][6]: M.S. Chen and K.G. Shin, "Adaptive Fault-Tolerant Routing in Hypercube Multicomputers", IEEE Trans. on Computers, vol. 39, no.12, pp. 1406-1416, December \underline{1990}.MS Chen and KG Shin , "Adaptive Fault-Tolerant Routing in Hypercube Multicomputers", IEEE Trans. on Computers , vol. 39, no.12, pp. 1406-1416, December \ underline {1990}.

[7][7]: D. Avresky, "Embedding and Reconfiguration of Spanning Trees in Faulty Hypercubes", IEEE Transactions on Parallel and Distributed Systems vol. 10 no.3, pp. 211-222, March \underline{1999}.D. Avresky , "Embedding and Reconfiguration of Spanning Trees in Faulty Hypercubes", IEEE Transactions on Parallel and Distributed Systems vol. 10 no.3, pp. 211-222, March \ underline {1999}.

[8][8]: M. Galles, "Spider: a high-speed network interconnect", IEEE Micro, vol. 17, no.1, pp. 34-39, Jan-Feb. \underline{1997}.M. Galles , "Spider: a high-speed network interconnect", IEEE Micro , vol. 17, no.1, pp. 34-39, Jan-Feb. \ underline {1997}.

[9][9]: P.T. Gaughan and S. Yalamanchili, "A Family of Fault-Tolerant Routing Protocols for Direct Multiprocessor Networks", IEEE Trans. on Parallel and Distributed Systems, vol. 6, no.5, pp. 482-497, May \underline{1995}.PT Gaughan and S. Yalamanchili , "A Family of Fault-Tolerant Routing Protocols for Direct Multiprocessor Networks", IEEE Trans. on Parallel and Distributed Systems , vol. 6, no.5, pp. 482-497, May \ underline {1995}.

[10][10]: Rpang, T. Pinkston, "The Double Scheme: deadlock-free reconfiguration of Cut-through Networks", International Conference of Parallel Processing (ICPP) August \underline{2000}. Rpang , T. Pinkston , "The Double Scheme: deadlock- free reconfiguration of Cut-through Networks", International Conference of Parallel Processing (ICPP) August \ underline {2000}.

[11][eleven]: V. Puente, J.A. Gregorio, F. Vallejo and R. Beivide, "Immunet: A Cheap and Robust Fault-Tolerant Packet Routing Mechanism". The 31st Annual International Symposium on Computer Architecture (ISCA2004), pp. 198-209, Munchen, Germany, June \underline{2004}.V. Puente , JA Gregorio , F. Vallejo and R. Beivide , "Immunet: A Cheap and Robust Fault-Tolerant Packet Routing Mechanism". The 31st Annual International Symposium on Computer Architecture (ISCA2004), pp. 198-209, Munchen, Germany, June \ underline {2004}.

[12][12]: V. Puente, C. Izu, R. Beivide, J.A. Gregorio, F. Vallejo and J.M. Prellezo, "The Adaptive Bubble Router", Journal of Parallel and Distributed Computing. vol 61, no. 9, September \underline{2001}.V. Puente , C. Izu , R. Beivide , JA Gregorio , F. Vallejo and JM Prellezo , "The Adaptive Bubble Router", Journal of Parallel and Distributed Computing . vol 61, no. 9, September \ underline {2001}.

Description of the invention Breve descripción de la invenciónBrief Description of the Invention

La invención, denominada S-Immunet, es un mecanismo eficiente para tolerar fallos en redes de interconexión de computadores paralelos y distribuidos. El mecanismo está basado, por una parte, en un método de reencaminar los mensajes cuando se produce un fallo en la red y por otra parte, en una estructura hardware específica del aparato encaminador de mensajes. Las principales diferencias con los mecanismos previos de tolerancia a fallos son que la invención puede ser aplicada a redes muy grandes (miles de nodos) de tipo k-ary n-cube y además la reparación de los componentes es tenida en cuenta y es recuperado el rendimiento de la red antes del fallo. El mecanismo es capaz de tolerar cualquier número de fallos de enlace y cualquier combinación espacial y temporal de fallos; después de cualquier fallo, permite llevar a cabo una reconfiguración automática y además es transparente a la aplicación que esta ejecután-
dose en el sistema multiprocesador. El mecanismo además no produce una sobrecarga de la red en ausencia de fallos.The invention, called S-Immunet, is an efficient mechanism for tolerating faults in interconnection networks of parallel and distributed computers. The mechanism is based, on the one hand, on a method of re-routing the messages when a network failure occurs and on the other hand, on a specific hardware structure of the message routing apparatus. The main differences with the previous fault tolerance mechanisms are that the invention can be applied to very large networks (thousands of nodes) of type k-ary n-cube and also the repair of the components is taken into account and the network performance before failure. The mechanism is capable of tolerating any number of link failures and any spatial and temporal combination of failures; After any failure, it allows an automatic reconfiguration to be carried out and is also transparent to the application that is being executed.
taking place in the multiprocessor system. The mechanism also does not cause network overload in the absence of failures.

Brief description of the content of the figures

Figura 1. Ejemplo de una rotura de un enlace en una red toroidal (3-ary 2-cubo).Figure 1. Example of a broken link in a toroidal network (3-ary 2-cube).

Figura 2. Nodo padre y nodos hijos del nodo 5 tras la comunicación del fallo por el nodo 4.Figure 2. Parent node and child nodes of node 5 after communication of the fault by node 4.

Figura 3. Camino o "tour" a lo largo del árbol y que formará el Anillo o Camino Seguro.Figure 3. Road or "tour" along the tree and that will form the Ring or Safe Path.

Figura 4. Ejemplos de uso de los canales de escape de primer y segundo orden sobre una red toroidal de dos dimensiones.Figure 4. Examples of use of the channels of first and second order escape over a two toroidal network dimensions.

Figura 5. Estructura del encaminador de mensajes que puede ser empleado en la invención.Figure 5. Structure of the message router which can be used in the invention.

Detailed description of the invention

Ante la aparición en la red de interconexión de uno o varios fallos de enlace o de nodo, S-Immunet permite la obtención automática de un "Anillo" que abarca todos los nodos de la red y aplicando sobre él la técnica conocida como Buffer Flow Control, BFC [12], dicho anillo pasa a ser un "Anillo o Camino Seguro" para que cualquier paquete alcance su destino en presencia de fallos. El proceso de obtención del Anillo se lleva a cabo sin tirar ningún paquete en tránsito.Before the appearance in the interconnection network of one or several link or node failures, S-Immunet allows the automatic obtaining of a "Ring" that covers all the nodes of the network and applying on it the technique known as Buffer Flow Control , BFC [12], said ring becomes a "Ring or Safe Path" so that any package reaches its destination in the presence of failures. The process of obtaining the Ring is carried out without throwing any package in transit.

El Anillo Seguro se obtiene a partir de la generación de un árbol (spanning tree), que siempre existe embebido en cualquier topología, y cuya raíz es el nodo que detectó el fallo. Llevando a cabo un "recorrido" (tour) a lo largo del mencionado árbol y aplicando sobre él la técnica BFC mencionada se garantiza la existencia de un Anillo Seguro para el intercambio de paquetes entre todos los nodos supervivientes de la red.The Safe Ring is obtained from the generation of a tree ( spanning tree ), which always exists embedded in any topology, and whose root is the node that detected the fault. Carrying out a "tour" (tour) along said shaft and applying on it the BFC mentioned technique ensures the existence of a Ring sure to exchange packets between all network nodes survivors.

La generación del árbol se lleva a cabo mediante un proceso trivial de difusión (broadcast) a sus inmediatos vecinos de la situación de fallo. Con la respuesta, o no, de sus vecinos cada nodo crea una tabla provisional de rutas que permite la actualización de las tablas de ruta necesarias para soportar el cambio topológico que ha forzado el fallo.The generation of the tree is carried out by means of a trivial process of diffusion ( broadcast ) to its immediate neighbors of the fault situation. With the response, or not, of its neighbors each node creates a provisional route table that allows the updating of the necessary route tables to support the topological change that has forced the failure.

Por ejemplo, se supone que en la red de la figura 1 se ha roto el enlace entre los nodos #4 y #1 y que el nodo #5 ha recibido una notificación de fallo del nodo #4 (y por consiguiente se ha convertido en su nodo padre) y sus inmediatos vecinos #3 y #8 han respondido a su reemisión (y se han convertido en sus hijos ordenados local y arbitrariamente). A partir de ese instante, el nodo #5 únicamente necesita recordar cuál es puerto correspondiente al "padre" en el árbol y cuál el de cada uno de los "hijos", como se muestra en la figura 2.For example, it is assumed that in the network of the figure 1 the link between nodes # 4 and # 1 has been broken and that node # 5 has received a failure notification from node # 4 (and therefore it has become its parent node) and its immediate neighbors # 3 and # 8 have responded to their remission (and have become their children ordered locally and arbitrarily). From that moment, the node # 5 only needs to remember which is corresponding port to the "father" in the tree and which of each of the "children", as shown in figure 2.

Es obvio que el recorrido a lo largo del árbol (figura 3) se genera sin más que cada nodo reenvíe los paquetes aplicando las reglas de la tabla 1. Además, cualquier paquete inyectado en este camino alcanzará su destino (aunque en el peor caso tenga que recorrer todo el anillo).It is obvious that the path along the tree (figure 3) is generated without more than each node forwarding the packets applying the rules in table 1. In addition, any package injected on this path will reach its destination (although in the worst if you have to go through the entire ring).

TABLE 1 Reglas para reenviar mensajes tras la aparición de fallosRules for forwarding messages after the appearance of failures

Paquetes provenientes depackages coming from Deben ser enviados aThey must be sent to hijo ison i hijo i+1son i + 1 padrefather Hijo 1 (primer hijo)Son 1 (first child) último hijolast child padre ^{(1)}father ^ (1)} puerto de inyeccióninjection port padre ^{(1)}father ^ (1)} canal adaptativoadaptive channel padre ^{(1)}father ^ (1)} (1) Una excepción es la raíz del árbol (no tiene padre). Los paquetes deben ser enviados al primer hijo.(1) An exception is the root of the tree (has no father). Packages must be sent to the first child

Una vez que cada nodo dispone de la información necesaria para enviar paquetes a cualquier otro nodo de la red, se actualizan las tablas de rutas definitivas resultantes de la nueva situación de fallo y se lleva a cabo el envío, tanto de los nuevos paquetes que se generen por las aplicaciones, como los que se encontraban en las colas de tránsito (buffers) esperando la reconfiguración.Once each node has the necessary information to send packets to any other node in the network, the definitive route tables resulting from the new fault situation are updated and the sending of both new packets is carried out. generated by applications, such as those found in traffic queues ( buffers ) waiting for reconfiguration.

La capacidad de soportar cualquier combinación espacio-temporal de fallos se consigue mediante del empleo de un identificador único de fallo, EPL (Emergency Priority Level), generado con cada nuevo fallo y dependiente del propio identificador (ID) del nodo que lo detecta y del número de fallos previos que haya soportado dicho nodo hasta ese instante. Por ejemplo, en una red de N nodos, si un encaminador cuyo identificador es ID=x detecta un nuevo fallo, generará un EPL igual a tN+x, siendo t el número de reconfiguraciones previas experimentadas por el nodo. Así, el valor EPL generado tras la detección de un fallo siempre es distinto y por tanto se genera un solo árbol.The ability to support any spatio-temporal combination of faults is achieved through the use of a unique fault identifier, EPL ( Emergency Priority Level ), generated with each new fault and dependent on the identifier (ID) of the node that detects it and the number of previous failures that node has supported until that moment. For example, in a network of N nodes, if a router whose identifier is ID = x detects a new fault, it will generate an EPL equal to tN + x , where t is the number of previous reconfigurations experienced by the node. Thus, the EPL value generated after the detection of a fault is always different and therefore a single tree is generated.

Aunque la metodología es completamente general, cuando disponemos de una red compuesta por miles de nodos, el fallo en un solo componente, o de unos pocos, (son las situaciones más probables) puede producir una degradación en el rendimiento muy acusada, como consecuencia de la elevada longitud del Anillo Seguro resultante. Además, si el componente se repara, el mecanismo no es capaz de recuperar el funcionamiento original libre de fallos. Mediante S-Immunet se resuelven estos dos problemas para redes de cualquier tamaño con una estructura del tipo k-ary n-cube.Although the methodology is completely general, when we have a network composed of thousands of nodes, the failure in a single component, or a few, (are the most likely situations) can cause a very marked performance degradation, as a consequence of the high length of the resulting Safe Ring. In addition, if the component is repaired, the mechanism is not able to recover the original malfunction free operation. By means of S-Immunet, these two problems are solved for networks of any size with a structure of the type k-ary n-cube .

La invención asocia a la red de interconexión tres redes virtuales diferentes. La primera red es completamente adaptativa y con encaminamiento mínimo. En la segunda red virtual, denominada "red virtual de escape de primer orden", se supondrá que la red se encuentra libre de fallo, aunque esto no sea cierto. Por lo tanto, los paquetes en esta red se encaminarán en orden de dimensión (X, Y, Z,..). En una situación de fallo es imposible lograr que todo el tráfico alcance su destino porque el camino en alguna dimensión estará cortado. Cuando un paquete que se encuentre empleando esta red alcance un recurso en fallo que impida seguir aplicando encaminamiento en orden de dimensión, estará habilitado para emplear la tercera red virtual, denominada "red virtual de escape de segundo orden". En esta red aplicaremos el Anillo Seguro descrito anteriormente como mecanismo de evitación de bloqueo. Puesto que sabemos que la red virtual de escape de segundo orden es libre de bloqueo, independientemente de la configuración de los fallos, toda la red será libre de fallo.The invention associates with the interconnection network Three different virtual networks. The first network is completely adaptive and with minimal routing. In the second virtual network, called "first-order virtual escape network", it it will assume that the network is free from failure, even if this is not true. Therefore, the packets in this network will be routed in dimension order (X, Y, Z, ..). In a fault situation it is impossible to get all traffic to reach its destination because the path in some dimension will be cut. When a package is find using this network reach a failed resource that prevents continue applying routing in order of dimension, it will be enabled to use the third virtual network, called "network second-order virtual escape. "In this network we will apply the Safe Ring described above as a mechanism to avoid blocking. Since we know that the second virtual escape network Order is lock free, regardless of configuration of failures, the entire network will be free from failure.

Bajo las condiciones previamente expuestas, se dispone de dos redes virtuales con posibilidad de bloqueo y una tercera libre de bloqueo bajo cualquier circunstancia. Los tipos de tráfico que pueden emplear cada una de las tres redes virtuales y que el encaminador se encargará de determinar son los siguientes:Under the conditions previously stated, it It has two virtual networks with the possibility of blocking and one third block free under any circumstances. The types of traffic that can be used by each of the three virtual networks and that the router will determine are the following:

--: La red virtual adaptativa está disponible para cualquiera que sea el tipo de tráfico y su recorrido previo.The net Adaptive virtual is available for whatever type of traffic and its previous route.

--: La red virtual de escape de primer orden está disponible para todos aquellos paquetes que no han encontrado un fallo en su camino a destino, provengan de inyección, de la red virtual adaptativa o estén avanzando por ella misma.The net First-order virtual escape is available to all those packages that have not found a bug on their way to destination, come from injection, from the adaptive virtual network or are advancing by herself.

--: La red de escape de segundo orden podrá ser empleada exclusivamente por los paquetes que habiendo circulado previamente por la red de escape de primer orden encontraron un recurso en fallo en su camino.The net Second-order exhaust may be used exclusively by the packages that having previously circulated through the exhaust network of First order they found a resource in failure in their way.

En la Figura 4 se muestran varios ejemplos de rutas para diferentes paquetes. Esta política resulta extremadamente simple de aplicar e implica únicamente un bit de sobrecarga por paquete. Este tipo de política de asignación permite limitar la aplicación del Anillo Seguro solamente a aquel tráfico que se encuentre frontalmente con un recurso en fallo. Cuando el número de recursos en fallo es relativamente bajo, la mayor parte del tráfico fluirá como si la red estuviera libre de fallos. Así, la pérdida de rendimiento debido al Anillo Seguro se ve eliminada en su mayor parte, pues el número de paquetes que alcanzan los fallos es reducido, siendo, en consecuencia, el nivel de ocupación de la red de escape de segundo orden muy reducido y logrando que el efecto de la longitud del Anillo Seguro sea prácticamente despreciable.Several examples of routes for different packages. This policy results extremely simple to apply and involves only a bit of package overload This type of allocation policy allows limit the application of the Safe Ring only to that traffic that is frontally with a resource in failure. When he number of resources in failure is relatively low, most of traffic will flow as if the network were free from failures. So, the loss of performance due to the Safe Ring is eliminated for the most part, as the number of packages that reach failures are reduced, being, consequently, the level of occupation of the second order exhaust network very small and making the Effect of the length of the Ring Sure is practically negligible.

En una situación libre de fallo no es necesario aplicar ninguna clase de mecanismo adicional para evitar el bloqueo originalmente propuesto en [12]. Aplicando encaminamiento en orden de dimensión en una de las tres redes virtuales de que disponemos es suficiente (la red de escape de primer orden). La red de escape de segundo orden es empleada como un segundo canal adaptativo adicional.In a fault-free situation it is not necessary apply any kind of additional mechanism to avoid blocking originally proposed in [12]. Applying routing in order of dimension in one of the three virtual networks that we have it is enough (the first order exhaust network). Exhaust network second order is used as a second adaptive channel additional.

Por último, la invención permite la recuperación completa del rendimiento tras la reparación completa de la red. Así, si suponemos que se desencadena un proceso de reconfiguración en la red después de cualquier clase de reconfiguración, resulta sencillo determinar si todos los nodos de la red tienen todos sus enlaces funcionando. Para una red k-ary n-cube, cualquiera de los encaminadores libres de fallos debe contar con 2*n enlaces bi-direccionales operativos. Globalmente, la red ha de disponer de 2*n *k^{2} enlaces en funcionamiento. Por tanto, conocida la cantidad, denominada w, de enlaces operativos, la red estará libre de fallo si y solo si w= 2*n*k^{2}.Finally, the invention allows full recovery of performance after complete network repair. Thus, if we assume that a reconfiguration process is triggered in the network after any kind of reconfiguration, it is easy to determine if all the nodes of the network have all their links functioning. For a k-ary n-cube network , any of the fault-free routers must have 2 * n operational bi-directional links. Globally, the network must have 2 * n * k2 links in operation. Therefore, known the amount, called w, of operational links, the network will be free of failure if and only if w = 2 * n * k2.

W puede conocerse fácilmente durante la actualización de las tablas en el proceso de reconfiguración. Cada nodo envía un paquete de actualización hacia su nodo padre con el identificador del nodo emisor, el nivel de prioridad EPL y el número de enlaces disponibles en ese nodo particular. Así, el nodo raíz determina el número de enlaces correctos w. Si w= 2*n*k^{2}, el nodo raíz envía un paquete de control específico junto con el nivel EPL de la reconfiguración a sus inmediatos hijos y estos lo retransmitían a los suyos. Cada nodo que reciba este paquete desechará el uso del Anillo Seguro y a partir de ese momento empleará los recursos previamente dedicados a la red de escape de segundo orden (Anillo Seguro) a encaminar los paquetes de forma adaptativa. Repitiendo este proceso en todos los nodos de la red se dispondrá otra vez de dos canales adaptativos y se restaurará el rendimiento original de la red. W can easily be known during the updating of the tables in the reconfiguration process. Each node sends an update packet to its parent node with the sender node identifier, the EPL priority level and the number of links available on that particular node. Thus, the root node determines the number of correct links w. If w = 2 * n * k2, the root node sends a specific control packet along with the EPL level of the reconfiguration to its immediate children and they retransmit it to their children. Each node that receives this packet will discard the use of the Secure Ring and from then on it will use the resources previously dedicated to the second order exhaust network (Safe Ring) to route the packets adaptively. By repeating this process on all the nodes of the network, two adaptive channels will be available again and the original network performance will be restored.

Example of embodiment of the invention

Para mejor comprensión del invento, a continuación se describe un ejemplo de aplicación. Suponer la existencia de una red de interconexión de computadores del tipo k-ary n-cube como la de la figura 1 pero que puede estar compuesta por miles de nodos o elementos de cómputo. Cada uno de los encaminadores que forman la red tiene una estructura como la de la figura 5. Tres canales virtuales (sin fallos, dos serán adaptativos y uno de escape), un elemento de conmutación o crossbar que permite la interconexión de las entradas y las salidas, así como con los elementos de cómputo local. La selección de los canales de entrada que tendrán acceso a los de salida la lleva a cabo un Árbitro. La Unidad de Encaminamiento (RU) emplearía dos tablas. Una pequeña, mediante la que se indica los puertos que constituyen el camino de escape en cada momento (Tabla Escape) y la otra tabla (Tabla Adaptativa) tendrá una entrada por cada nodo de la red y será empleada para encaminar los mensajes dependiendo del nodo destino.For a better understanding of the invention, an application example is described below. Assume the existence of a network of interconnection of computers of the type k-ary n-cube as in Figure 1 but which may be composed of thousands of nodes or computational elements. Each of the routers that make up the network has a structure like the one in Figure 5. Three virtual channels (without failures, two will be adaptive and one escape), a switching or crossbar element that allows the interconnection of the inputs and Departures, as well as with the local computing elements. The selection of the input channels that will have access to the output channels is carried out by an Arbitrator. The Routing Unit (RU) would use two tables. A small one, indicating the ports that constitute the escape path at each moment (Escape Table) and the other table (Adaptive Table) will have an entry for each node of the network and will be used to route the messages depending on the node destination.

Si un nodo desea enviar un mensaje a otro que se encuentra a una distancia (\Deltax, \Deltay) 5 empleará prioritariamente cualquiera de los dos canales virtuales adaptativos (Tabla Adaptativa). En caso de que no pueda hacerlo, intentará entrar en el canal de escape que siempre seguirá una ruta en orden de dimensión (DOR). Es decir, recorrerá primero el anillo de las Xs hasta agotar esta componente de su dirección (la distancia \Deltax sea 0), luego continuaría por el anillo correspondiente al eje de las Ys hasta su destino (la distancia \Deltay sea 0). No obstante, en su avance volverá a los canales adaptativos siempre que pueda.If a node wishes to send a message to another that is at a distance (\ Delta x , \ Delta y ) 5 it will use either of the two adaptive virtual channels (Adaptive Table). In case you cannot do it, you will try to enter the escape channel that will always follow a route in order of dimension (DOR). Ie first traverse the ring Xs while supplies this component address (the distance \ Delta x is 0), then continue through the corresponding ring to the axis of the Ys to destination (the distance \ Delta and is 0) . However, in your advance you will return to adaptive channels whenever you can.

Si algún encaminador detecta un fallo en alguno de sus enlaces con algún vecino, entra en un estado de fallo, detiene la comunicación con todos los demás nodos, se convierte en el nodo raíz y, empleando las líneas hardware, se lo comunica a sus vecinos. Estos lo interpretarán como el comienzo de un proceso de reconfiguración y tras responder al requerimiento se comunicarán a su vez con sus vecinos y así hasta alcanzar el último nodo de la red. Cada uno de los encaminadores apuntará en su pequeña tabla (Tabla Escape) cuál fue el puerto por el que recibió el requerimiento de paso a modo de fallo (nodo padre a partir de entonces) y quienes de sus vecinos le han ido respondieron al suyo (nodos hijos en el orden de la respuesta). Tras un corto periodo de tiempo, todos los encaminadores tienen su pequeña tabla, de forma que juntos forman un árbol (como el de la figura 3) cuya raíz es el nodo que detectó el fallo. Por tanto, cualquier encaminador que desee enviar un paquete a cualquier otro, dispone de un camino para hacerlo. Si además para introducir un nuevo paquete en este camino se añade la condición de mantener un hueco adicional, este camino es "seguro", es decir, no podrán ocurrir bloqueos mutuos entre paquetes. A partir de este instante, en un sencillo proceso de difusión, vuelven a actualizarse las tablas adaptativas para tener en cuenta el cambio topológico producido por el enlace roto.If a router detects a failure in any of your links with a neighbor, enters a state of failure, stops communication with all other nodes, becomes the root node and, using the hardware lines, communicates it to its neighbors. These will interpret it as the beginning of a process of reconfiguration and after responding to the requirement they will be communicated to turn with its neighbors and so on until reaching the last node of the net. Each of the routers will point at their small table (Escape Table) what was the port through which he received the step-by-step failure requirement (parent node from then) and those of your neighbors have responded to yours (children nodes in the order of the response). After a short period of time, all routers have their little table, so that together form a tree (like the one in figure 3) whose root is the node that detected the failure. Therefore, any router that you want to send a package to any other, you have a way to do what. If also to introduce a new package in this way the condition of maintaining an additional gap is added, this path is "safe," that is, mutual blockages cannot occur between packages. From this moment, in a simple process of diffusion, adaptive tables are updated again to have consider the topological change produced by the broken link.

A partir de ese momento, si un nodo desea enviar un mensaje a otro que se encuentra a una distancia (\Deltax, \Deltay), igual que cuando se encuentra libre de fallos empleará prioritariamente el único canal adaptativo de que ahora dispone el encaminador (Tabla Adaptativa actualizada). En caso de que no pueda hacerlo intentará entrar en el canal de escape de primer orden, que siempre seguirá una ruta DOR como si no hubiese ningún enlace roto. Si en el camino DOR el paquete se encuentra con un enlace roto y por lo tanto no puede avanzar, en este instante pasará al camino de escape de segundo orden o Anillo Seguro que se acaba de crear con la reconfiguración a costa de uno de los canales adaptativos.From that moment on, if a node wishes to send a message to another that is at a distance (\ Delta x , \ Delta y ), just as when it is free of faults, it will use the only adaptive channel currently available to the router (Updated Adaptive Table). In case you can't do it, you will try to enter the first-order escape channel, which will always follow a DOR route as if there were no broken links. If on the DOR path the package encounters a broken link and therefore cannot move forward, it will now go to the second order escape route or Safe Ring that has just been created with the reconfiguration at the expense of one of the channels adaptive

Así, con un número pequeño de enlaces rotos, la degradación del rendimiento será muy pequeña porque lo más probable es que el paquete no se encuentre con el enlace en mal estado, sobremanera en las redes con un gran número de nodos que es además donde más degradación se produce cuando no se aplica el mecanismo en el que se basa la invención que se esta describiendo.Thus, with a small number of broken links, the performance degradation will be very small because most likely is that the package is not with the link in bad condition, greatly in networks with a large number of nodes which is also where more degradation occurs when the mechanism is not applied in which is based on the invention that is being described.

Si el enlace roto es reparado, se producirá un proceso de reconfiguración y durante la actualización de las tablas se lleva a cabo también la comunicación a los nodos padre de los enlaces en correcto estado, hasta llegar al nodo raíz que puede determinar si todos los enlaces de la red están libres de fallos. Si es así, emitirá un paquete específico para indicar a todos los nodos de la red que a partir de ese momento pueden volver a usar el canal de escape de segundo orden como un segundo canal adaptativo. Es decir, se restaura el rendimiento original de la red.If the broken link is repaired, there will be a reconfiguration process and during tables update communication is also carried out to the parent nodes of the links in correct condition, until you reach the root node that can Determine if all network links are free from bugs. Yes If so, it will issue a specific package to indicate all network nodes that from then on can reuse the second order escape channel as a second adaptive channel. That is, the original network performance is restored.

Si durante éste o en cualquier otro instante ocurriese un nuevo fallo, la existencia de un único identificador EPL garantiza una nueva y correcta actualización de las tablas de encaminamiento.If during this or at any other time a new fault occurs, the existence of a unique identifier EPL guarantees a new and correct update of the tables of routing

Claims

1. Highly scalable fault-tolerant routing mechanism that is characterized by its application, especially in regular direct networks of the large k-ary n-cube type and in which a message routing device is distinguished in each node network with table-based routing; a method to reroute messages based on the automatic creation of an escape path over a ring; and a method to recover network performance after the repair of failures.

2. Highly scalable fault tolerant routing mechanism according to claim 1, characterized in that the message routing device of each node consists of at least three virtual channels per physical channel. In the absence of failures two of the channels are used as adaptive and the third as deterministic.

3. Highly scalable fault-tolerant routing mechanism according to claim 1, characterized in that the routing device messages from each node in the presence of one or more faults automatically reassigns one of the adaptive channels to form part of a large ring that joins all the nodes of the network and that can be used as a second order escape channel.

4. Highly scalable fault-tolerant routing mechanism according to claim 1, characterized in that the method for re-routing messages, based on the automatic creation of an escape path over a ring, is carried out by creating a spanning tree composed of all the nodes of the network and whose root is the one that detected the fault.

5. Highly scalable fault-tolerant routing mechanism, according to claim 1, characterized in that the method for recovering network performance after fault repair is based on automatic communication to the root node, by any node of the network , of any repair detected.