SYSTEM AND METHOD FOR REAL-TIME MONITORING AND ANALYSIS OF NETWORK TRAFFIC AND CONTENT

SPECIFICATION

CROSS REFERENCE TO RELATED APPLICATIONS

This application claims priority to a United States Provisional
Application entitled "Method and System for Time-Efficient Monitoring and Analysis of Network Traffic and Content" Serial No. 60/567,084, which was filed on April 30,
2004, and is incorporated by reference.
FIELD OF THE INVENTION

The present invention relates generally to a method and system that enables a user to monitor and analyze network traffic and content. More particularly, a user will be able to sift through a high volume of packets of information flowing in real-time to look for patterns, locate resources, aggregate data and correlate information.

BACKGROUND OF THE INVENTION

The Internet is the fastest growing medium of all time. Part of its popularity and phenomenal growth has been attributed to the invention of the search engine. Today, more than 319 million web searches are performed daily by the world's 665 million internet users. But there remains a need in the art for a technology that can tap into humanity's digital communications (including voice, video and data, etc.), automatically organizing and filing away the information it encounters into a collection of massive databases that can be analyzed in real-time.
SUMMARY OF THE INVENTION

The present invention exemplifies a real-time performance monitoring platform for IP networks. The platform allows service providers to observe and monitor the dynamics of packet traffic in large networks in real-time. The platform consists of network probes which capture traffic at line rate and perform detailed analysis, measure quality of service metrics along network paths between pairs of probes, and store all analysis and measurement results in relational tables. The platform also consists of a centralized management console which allows users to
issue queries to extract and compute data from tables stored on probes in real-time, provides a front-end for interactive visualization of traffic, and manages and configures the network probes. A typical installation of the platform consists of a collection of hardware probes placed at strategic monitoring points in the network and the web-based appliance centralized management console. An operator configures and operates the system via the console, which presents information collected from the network probes.

The present invention utilizes distributed query propagation and processing to achieve real-time performance. Network probes autonomously detect the presence of other probes during power up and create a peer-to-peer network encompassing other probes in the network. Queries issued from the console are propagated along a minimal delay spanning tree to all probes. Intermediate results are incrementally aggregated at each probe so that the amount of time required to execute a query is proportional only to the length of the longest path. This allows queries for very large networks to be completed in near real-time.

Network parameters that an operator utilizing the present invention may wish to monitor include, but are not limited to, flow statistics, traffic composition, quality of service statistics, and TCP statistics. Flow statistics may include top flows in the network ranked by bit rate, packet rate, or byte count, as well as source and destination addresses, ports, Type-of-Service, application, protocol, packet size histogram and TCP statistics for TCP flows. Traffic composition may include the breakdown of total traffic by application type, Type-of-Service and protocol for the whole network and at each monitoring point. Quality of service statistics may include one-way/round-trip end-to-end/hop-by-hop delay, the average, max, min and histogram of end-to-end/hop-by-hop jitter, and the end-to-end/hop-by-hop loss rate. TCP statistics may include statistics for zero-window events, push bit set, urgent bit set, and resets, as well as the number of open/closed connections and average connection setup times.

In accordance with an exemplary embodiment of the present invention, this monitoring and analysis tool allows the information carried by computer networks to be automatically captured, organized and stored in databases that can be searched in real-time. Like a web search engine which allows users to see the billions of web pages on the Internet in terms of keywords and phrases, this tool makes accessible information about the network's structure and the content flowing through
its millions of connections. When deployed in corporate networks, the present invention allows its users to sift through billions of packets of information flowing in real-time to look for patterns, locate resources, aggregate data or correlate information. An exemplary embodiment of the present invention can analyze information generated by all popular network applications including the web, email, instant messaging, etc. and can be extended to process information generated by other proprietary applications. This analysis is performed across each point in the network where the present invention is installed, actively aggregating, correlating and processing information in parallel. Both simple keyword searches like a web search engine, as well as sophisticated structured database queries may be used, making it a unique tool for mining, processing and analyzing a network and its traffic content. Network service providers can use an exemplary embodiment of the present invention to visualize and analyze network usage patterns, perform tests, correlate distributed measurements, and to understand the real-time dynamics of traffic that flows through their global networks. Security providers can use an exemplary embodiment of the present invention to detect unauthorized network intrusions, isolate suspicious online activity or log packets of malicious traffic. Governments and law enforcement agencies can use an exemplary embodiment of the present invention to search for specific patterns of communication, correlate events in space and time and track network traffic to physical locations. Finally, information research organizations can use an exemplary embodiment of the present invention to understand online consumer behavior, identify trends and analyze their impact as they are occurring. Some corporate users, such as network service providers, can utilize an exemplary embodiment of the present invention to generate new revenue streams by offering new services to their customers. Examples of such services include real-time traffic analysis, pattern searching or security monitoring. Other corporate users can simply benefit from the real-time speed of the present invention to reduce response times in the event of failures, identify problems before they occur or to improve overall network resource utilization. The present invention is in essence a supercomputer constructed from a federation of off-the-shelf PCs, each augmented with specialized hardware (described below, exemplary embodiment of the acquisition element 210) that allows the
processing of network traffic at rates above a billion bits per second. The present invention achieves its speed through parallel processing controlled by distributed algorithms (e.g., the echo pattern discussed below and in separate filings U.S. Serial No. 60/461,221 and International Application No. PCT/US2004/010646, hereby incorporated by reference) designed to maximize search speed and robustness. The present invention can theoretically search, extract or process information from a network of 200 million nodes in as fast as 2 seconds; accessing each node sequentially would require more than 3 weeks using the fastest PC available today. The present invention has the potential to change the way large networked systems are managed, maintained and used in fundamental ways. By allowing the collection and processing of information on a scale not possible previously, the technology can be a key enabler for whole new classes of applications that exploit real-time searching, data correlation and aggregation. Indeed many of the potential applications of the technology are just being developed, in areas as diverse as infrastructure performance management, cyber defense and homeland security.
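The incremental, tree-structured aggregation described above can be illustrated with a minimal sketch. The Python fragment below is not taken from any implementation of the invention (probe discovery and the echo pattern itself are described below and in the incorporated applications); the Probe structure, the aggregate() function and the byte-count merge are hypothetical names chosen only to show why the number of sequential merging steps tracks the depth of the spanning tree rather than the number of probes.

    # Minimal sketch (hypothetical names): incremental aggregation of partial
    # results up a spanning tree of probes.  Each probe merges its own local
    # measurement with the already-merged results of its children, so the work
    # finished at the root depends on tree depth, not on the number of probes.
    from dataclasses import dataclass, field
    from typing import List, Tuple

    @dataclass
    class Probe:
        name: str
        local_bytes: int                      # bytes observed locally in the interval
        children: List["Probe"] = field(default_factory=list)

    def aggregate(probe: Probe) -> Tuple[int, int]:
        """Return (total_bytes, depth) for the subtree rooted at this probe."""
        total, depth = probe.local_bytes, 0
        for child in probe.children:
            child_total, child_depth = aggregate(child)
            total += child_total                 # merge the child's partial result
            depth = max(depth, child_depth + 1)  # completion time tracks the longest path
        return total, depth

    if __name__ == "__main__":
        leaves = [Probe(f"probe-{i}", local_bytes=1000 + i) for i in range(4)]
        mid = [Probe("probe-a", 500, leaves[:2]), Probe("probe-b", 700, leaves[2:])]
        root = Probe("start-node", 300, mid)
        total, depth = aggregate(root)
        print(f"network-wide byte count: {total}, longest path: {depth} hops")

Under this sketch, doubling the number of probes while keeping the tree balanced adds only one additional level of merging, which is the property the completion-time estimate above relies on.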
BRIEF DESCRIPTION OF THE DRAWINGS

Further objects, features and advantages of the invention will become apparent from the following detailed description taken in conjunction with the accompanying figures showing illustrative embodiments of the invention, in which:

Figure 1 is a block diagram depicting a system and method for searching for network traffic and content data.

Figure 2 is a block diagram depicting the functional elements of a network probe.

Figure 3 illustrates the network architecture of a scalable analysis element of a network probe.

Figure 4 illustrates an exemplary distributed architecture of the present invention.

Figure 5 is a matrix representing an exemplary active traffic flow.

Figure 6 illustrates an exemplary network instrumented with the present invention.

Figure 7 illustrates wire-speed monitoring of a network probe.

Figure 8 is a screenshot of an exemplary configuration page for a centralized management console.

Figure 9 illustrates how to connect an exemplary network probe to a switch/router for monitoring.

Figure 10 is a screenshot of an exemplary configuration page for a network probe.

Figure 11 illustrates two network probes connected together via the synchronization ports.

Figure 12 is a screenshot of an exemplary configuration page for a network probe.

Figures 13, 14, 15, and 16 are screenshots of exemplary matrices and charts representing monitored traffic flows which can be viewed using the centralized management console.

Figure 17 is a screenshot illustrating a web-interface supported by the centralized management console which allows users to type in queries.

Figures 18 and 19 are screenshots illustrating tools a user can use to generate matrices and charts to view monitored traffic flow data.

Figure 20 is a screenshot of an exemplary query results chart which can be viewed using the centralized management console.

Throughout the figures, the same reference numerals and characters, unless otherwise stated, are used to denote like features, elements, components or portions of the illustrated embodiments. Moreover, while the subject invention will now be described in detail with reference to the figures, it is done so in connection with the illustrative embodiments. It is intended that changes and modifications can be made to the described embodiments without departing from the true scope and spirit of the subject invention as defined by the appended claims.
DETAILED DESCRIPTION OF THE INVENTION

Figure 4 illustrates an exemplary network enabled by the present invention. The network consists of a collection of network probes, 410, connected to switches/routers to be monitored, 415, and a centralized management console
(console), 420. The network probes consist logically of a search, data collection, correlation and aggregation engine (called the query engine), 430, an interface tap for passive wire-speed packet monitoring and protocol analysis (called the tap), 440, and an input, 450, for synchronized time source, 455. The network probes, 410, are
placed at strategic points in the network and collaboratively create a distributed searchable, data collection, correlation, and aggregation grid. The centralized management console, 420, provides an interface to the distributed system and allows the creation and dispatch of searches to the system.

The network probes are typically placed at significant demarcation points in the network. These points may correspond to administrative boundaries (e.g., between a provider and its customer, or between two provider networks), logical boundaries (e.g., between the edge and core of a network), or any point where significant utility can be realized from detailed traffic analysis (e.g., frequently congested points or along critical paths). Each probe supports a frame capture interface that is typically attached to a mirrored or tapped port on a switch/router and a traffic injection interface that is connected to the same switch/router. Traffic statistics captured by each probe are correlated in real-time with data from other probes across the network, to provide network-wide views on the conditions and states of various paths and regions in the network. Each probe can also be configured to periodically inject test traffic into the network to estimate the quality of service of paths between probes.

The query engine, 430, in a probe is optimized to execute queries which involve either retrieving data stored on the probe, or aggregating intermediate results to be returned. Depending on the nature of the computation being performed, the query engine can typically execute hundreds of queries per second. The tap, 440, is a hardware-assisted packet filter that can decode and process Layer 1 and above packets at wire-speed. Under the control of a real-time operating system, the tap can also be programmed to inject packets into the network at precise intervals to simulate loading conditions. High-level call generators are bundled that allow the simulation of most IETF and VoIP protocols. The tap may be chosen from a wide variety of hardware interfaces and speeds ranging from Fast Ethernet and Gigabit Ethernet up to OC-192.

An exemplary hardware specification for a centralized management console is a Xeon Class 1U rackmount server running Red Hat Linux 9 with 512MB RAM, 73GB SCSI HD, and 1 x 10/100/1000BaseTX Management Port. An exemplary configuration page for a console is shown in Figure 8. An exemplary hardware specification for a network probe is a Xeon Class 1U rackmount appliance with 4 RJ-45 ports: 1 x Management Port
(10/100/1000BaseTX), 1 x Traffic Injection Port (10/100/1000BaseTX), 1/2 x Capture Port (GIGE/OC12/OC48/OC192), and 1 x Synchronization Port.

Figure 9 illustrates how to connect a network probe to a switch/router for monitoring. The Management Port, 910, is connected to the Management Network, 920. The Management Network 920 is a logical network, interconnecting devices and applications that are used to monitor and control the network being managed. The Management Network 920 may be a physically separate network or share the physical resources of the managed network. The Injection Port, 930, is connected to the port closest to the router/switch, 940. The Capture Port, 950, is connected to the router/switch, 940. The network probe uses specialized hardware (described below, exemplary embodiment of the acquisition element 210) to timestamp each frame as it is captured. The synchronization port, 960, allows probes to get timing from an external reference clock (GPS or CDMA) with 50-100ns accuracy. A synchronization port may be hooked to a synchronization port of another probe or it may be connected to an external timing source. Figure 11 illustrates two network probes connected via their respective synchronization ports. Network Probe 1, 1110, synchronizes with an NTP server, 1120, via the Management Port, 1130. Network Probe 2, 1140, synchronizes with Network Probe 1 via the Synchronization Ports, 1150. Exemplary configuration pages for a network probe are shown in Figures 10 and 12.

The present invention employs both simple keywords as well as structured queries for searching its infrastructure. When searching with keywords, the present invention operates like a search engine, checking its distributed repositories for patterns that match the keyword. For structured queries, the present invention preferably utilizes a query language which is modeled after the Structured Query Language (SQL), the industry-standard language for relational databases. The syntax and semantics of the query language used in the present invention are preferably designed to resemble SQL. From a systems integration viewpoint, there is only a slight difference between writing scripts and programs to collect, aggregate and correlate information in the network and writing database applications. The complexities that are typically associated with distributed programming may be abstracted away by the language. In effect, an SQL-like query language turns a network into a large real-time database whose contents can be quickly retrieved. The queries return information in tabular format, preferably encapsulated in the Extensible Markup Language (XML). These results can be posted through a number of channels,
such as socket and web-based APIs, to third party applications, management systems or stored in enterprise databases. The results may also be displayed to a user through a web-based API for generating charts and reports. Figures 18 and 19 illustrate the types of charts and reports that may be generated. Figure 20 illustrates how query results may be displayed to a user. Queries created at the console are dispatched in parallel across the network to the network probes along a minimal delay spanning tree and the results are propagated back along the spanning tree with aggregation being performed by intermediate network probes. The dispatch point, forwarding and aggregation paths traversed by a query are chosen so as to minimize the overall delay. The manner in which queries are dispatched and forwarded across the network is described by the echo pattern, discussed below and in separate filings U.S. Serial No. 60/461,221 and International Application No. PCT/US2004/010646, hereby incorporated by reference. In general, the speed at which queries complete is proportional to the diameter of the network and not on the number of nodes. As a result, real-time performance in data collection, aggregation and correlation can be achieved even across very large networks. In addition to speed, the distributed architecture of the present invention allows processing and network transport load to be spread out across multiple probes in the network. Thus, the console never becomes a bottleneck regardless of the size of the network. A secondary benefit is that there is no one single point of failure in the system. Even if parts of the physical infrastructure fail, the distributed algorithms (e.g., the echo pattern discussed below and in separate filings U.S. Serial No. 60/461,221 and International Application No. PCT/US2004/010646, hereby incorporated by reference) employed automatically route queries around failed nodes and links. Also, since the network probes do not depend on the console for query execution, the system will continue to function even if the console is offline. In one embodiment, users may dispatch queries directly to network probes. Figure 1 is a block diagram depicting a system and method for searching for network traffic and content data. First, the user at computer 105 submits a query, shown by arrow 110, to the centralized management console 115 (console). Next, the console transmits the query to a network probe for processing, as shown by arrow 120. At 130, that network probe in turn propagates the query to other network probes across the network (which may in turn propagate the query further, 140) where
the required data is collected. At 150, each network probe aggregates data from child nodes when appropriate and returns the collected data to the parent. Ultimately the required data is sent back to the console at step 160. Finally, at step 170, the console formats the query results for user presentation.

In an exemplary embodiment of the present invention, configuration of both the centralized management console and the network probes is accomplished by using an SSL-enabled web-browser to connect to the IP address of the units' management interface. In an exemplary embodiment of the present invention, the default IP addresses of the management interface on the console and a network probe are 192.168.1.100 and 192.168.1.50, respectively. To initially configure each network probe, the user must supply the IP address of the console's management interface. Once this is accomplished, further detailed configuration of a probe can be achieved from the console's web interface. Each network probe is supplied with the address of the console during initial configuration. In the boot-up process, and periodically afterwards, each probe registers its address with the console.

An exemplary embodiment of the present invention uses a query language modeled after the Structured Query Language (SQL). The query language used in the present invention is a flexible SQL-like language (encapsulated in XML) designed to allow the querying, aggregation and correlation of management information from network nodes to be specified without programming. The queries can be very compact, performing operations in a single statement that would ordinarily require thousands of lines of code. The queries can be stored as templates and reused in many applications with similar data retrieval and processing requirements by changing only a few parameters in the templates. An exemplary embodiment of the present invention includes an extensive library of pre-built query templates and functions for computing common network statistics. Query tables can be exported in XML to third party applications for other uses. The following is a non-exhaustive list of examples of searches that may be conducted on a system according to the present invention:
The SQL-like language includes extensions to SQL, such as the definition of the start node, which relate to the distributed nature of the network database and the real-time quality of network information. The syntax of an exemplary query is given as follows:
SELECT <columns> FROM <tables>
[ ON <startnode> [ FOR <hops> ]]
[ WHERE <conditions> ]
[ GROUP BY <groups> [ HAVING <having> ]]
[ ORDER BY <ordering> ]
[ LIMIT <limit> ]
Here, <columns>, <groups> and <ordering> refer to columns of the virtual global tables (see below) or to operations on these columns, <tables> refers to the names of virtual global tables, <startnode> refers to the IP address of the start node of the query, <hops> restricts the execution of the query to a specific distance from the start node, <conditions> and <having> refer to boolean expressions that filter the rows returned by the query and <limit> specifies the maximum number of rows returned. A query in this SQL-like language is executed against a set of global virtual tables, which consist of all data records that make up the local tables stored on the network probes. For each type of local table, there is a corresponding virtual global table with the same structure. An exemplary implementation includes the following global virtual tables; namely, the Device, Interface, System and Flow
tables. Each global query is translated into three SQL sub-queries, which are executed on the network probes against their local tables. This translation process is described below. For illustration purposes, the following are three example queries together with possible returned results:
• Query A: "List the heaviest flows currently in the network" Query A lists addresses and ports of the top 3 IP flows in the network, ordered by their bit rate in the three-minute interval between 2004-04-18 05:23:00 and 2004-04-18 05:26:00 UTC.
SELECT MAX((ByteCount*8)/SamplingInterval) as BitRate, SrcIp, DstIp, DstPort
FROM Flow
WHERE Timestamp <= "2004-04-18 05:26:00" and Timestamp >= "2004-04-18 05:23:00"
GROUP BY SrcIp, DstIp, DstPort
ORDER BY BitRate DESCENDING
LIMIT 3
Result:
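For context on how a statement of this form could be evaluated against a single probe's local Flow table, the following is a minimal, illustrative sketch. The use of SQLite, the simplified schema and the sample rows are assumptions made solely for this example; the specification does not tie the probes to any particular database engine, and standard SQL spells the DESCENDING keyword as DESC.

    # Illustrative only: a simplified, hypothetical Flow table and a Query-A-style
    # statement executed against a probe's local relational store (SQLite here,
    # though the specification does not mandate any particular engine).
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("""CREATE TABLE Flow (
        SrcIp TEXT, DstIp TEXT, DstPort INTEGER,
        ByteCount INTEGER, SamplingInterval INTEGER, Timestamp TEXT)""")
    rows = [
        ("10.0.0.1", "10.0.1.9", 80,  1200000, 60, "2004-04-18 05:24:00"),
        ("10.0.0.2", "10.0.1.7", 443,  800000, 60, "2004-04-18 05:24:30"),
        ("10.0.0.3", "10.0.1.5", 21,   200000, 60, "2004-04-18 05:25:10"),
    ]
    conn.executemany("INSERT INTO Flow VALUES (?,?,?,?,?,?)", rows)

    top3 = conn.execute("""
        SELECT MAX((ByteCount*8)/SamplingInterval) AS BitRate, SrcIp, DstIp, DstPort
        FROM Flow
        WHERE Timestamp <= '2004-04-18 05:26:00' AND Timestamp >= '2004-04-18 05:23:00'
        GROUP BY SrcIp, DstIp, DstPort
        ORDER BY BitRate DESC
        LIMIT 3""").fetchall()
    for bitrate, src, dst, port in top3:
        print(src, dst, port, bitrate, "bit/s")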
• Query B: "Find the gateway with most FTP traffic currently" Query B identifies the subnet that generates the highest volume of FTP traffic through a single gateway over the last 5 seconds. The result of the query identifies that gateway and provides the volume of the subnet's traffic through the gateway router. The function SUBNET() takes two arguments, the first an IP address and the second a subnet address, and returns true if the given address is in the specified subnet.
SELECT DeviceIp, InterfaceSubnet, SUM(ByteCount) as Volume
FROM Device, Interface, Flow
WHERE SUBNET(SrcIp, InterfaceSubnet) = true and Application = "FTP"
and Timestamp <= "2004-04-18 05:23:05" and Timestamp >= "2004-04-18 05:23:00"
GROUP BY DeviceIp, InterfaceSubnet
ORDER BY Volume DESCENDING
LIMIT 1
Result:
• Query C: "Identify all flows currently traversing two given routers." Query C identifies all flows traversing the two routers 192.168.1.1 and 192.168.4.1 during the past 5 seconds. Here, the aggregate function SET_CONCAT() coalesces all records in a group into a single record by concatenating the value of each record and removing duplicates. The function STRSTR() takes two strings and returns true if the second string is found within the first. Note that the column PathSet returned by the query provides an unordered set of the addresses of routers traversed by each flow.
SELECT SrcIp, DstIp, SrcPort, DstPort, SET_CONCAT(DeviceIp) as PathSet
FROM Flow, Device
WHERE Timestamp <= "2004-04-18 05:23:05" and Timestamp >= "2004-04-18 05:23:00"
GROUP BY SrcIp, DstIp, SrcPort, DstPort
HAVING STRSTR(PathSet, "192.168.1.1") and STRSTR(PathSet, "192.168.4.1")
Result:

The process of translating a global query to SQL sub-queries is tightly coupled with the state machine of the echo pattern and utilizes it to accomplish two key tasks - data transport and incremental aggregation. In the former, the explorer messages are used to propagate SQL sub-queries while the echo messages are used to carry back the results. In the latter, the echo pattern state machine is used to trigger incremental aggregation when the results are returned from a network probe's neighbors. The process begins when a query is submitted to the start node defined in the query, for example, see query G below. The start node translates G into three SQL sub-queries S1, S2, and S3. S1 and S2 are propagated via explorer messages to all network probes in the system. On each network probe, including the start node, S1 is executed against the local database to yield a temporary local table named TEMP_TABLE. If a network probe is a leaf node in the execution tree, the records in the network probe's TEMP_TABLE are carried back to its parent via echo messages and TEMP_TABLE is deleted. Otherwise, the records carried in each echo message received from its neighbors are appended to its local TEMP_TABLE. When the last echo message is received on the network probe, S2 is executed against its TEMP_TABLE and the result returned via an echo message to the network probe's parent. Its local TEMP_TABLE is then deleted. Finally, when all echoes have returned to the start node, S3 is executed against the node's TEMP_TABLE, producing the results for the global query G. The mapping from global query G to sub-queries S1, S2, and S3 is explained below, choosing query C for G:
G:
SELECT SrcIp, DstIp, SrcPort, DstPort, SET_CONCAT(DeviceIp) as PathSet
FROM Flow, Device
WHERE Timestamp <= "2004-04-18 05:23:05" and Timestamp >= "2004-04-18 05:23:00"
GROUP BY SrcIp, DstIp, SrcPort, DstPort
HAVING STRSTR(PathSet, "192.168.1.1") and STRSTR(PathSet, "192.168.4.1")
G is mapped into S1 as follows: G is pre-pended with a statement that
creates a temporary table TEMP_TABLE and the "HAVING" clause in G is deleted.
The "HAVING" clause in G is deleted because group-level filtering, which is specified by the clause, can only be performed at the start node after all results have returned.
S1:
CREATE TEMP_TABLE
SELECT SrcIp, DstIp, SrcPort, DstPort, SET_CONCAT(DeviceIp) as PathSet
FROM Flow, Device
WHERE Timestamp <= "2004-04-18 05:23:05" and Timestamp >= "2004-04-18 05:23:00"
GROUP BY SrcIp, DstIp, SrcPort, DstPort
G is mapped into S2 as follows: The tables specified in the "FROM" clause are replaced with TEMP_TABLE, the "WHERE" and "HAVING" clauses in G are deleted and all aggregate functions (i.e., SET_CONCAT() in this case) in the "SELECT" clause are re-applied to their corresponding columns (i.e., PathSet in this case) in TEMP_TABLE. The "WHERE" clause is dropped because record filtering has already been performed by sub-query S1, while the "HAVING" clause is omitted for the same reason as in sub-query S1.
S2:
SELECT SrcIp, DstIp, SrcPort, DstPort, SET_CONCAT(PathSet)
FROM TEMP_TABLE
GROUP BY SrcIp, DstIp, SrcPort, DstPort
G is mapped into S3 as follows: The tables specified in the "FROM" clause are replaced with TEMP_TABLE, the "WHERE" clause is deleted and all aggregate functions specified in the "SELECT" clause are replaced by column names.
The replacement of aggregate functions with column names is necessary because aggregation has been completed by S2. The primary purpose of S3 is to filter out groups of records which do not satisfy the conditions specified in the "HAVING" clause in G.
S3:
SELECT SrcIp, DstIp, SrcPort, DstPort, PathSet
FROM TEMP_TABLE
GROUP BY SrcIp, DstIp, SrcPort, DstPort
HAVING STRSTR(PathSet, "192.168.1.1") and STRSTR(PathSet, "192.168.4.1")
Note that S3 is not executed on the start node if G does not include the "HAVING" clause, since its purpose is to eliminate extraneous groups. The present invention distributes sub-queries and aggregates results in an asynchronous manner controlled by the state machine of the echo pattern. In general, the order in which sub-query S2 is executed among network probes is not deterministic. However, operations that are both commutative and associative do not depend on the ordering of their operands. The present invention exploits this property by allowing these operations to be computed incrementally on each network probe, driven by the state machine of the echo pattern. As an example, the SET_CONCAT() function used to illustrate sub-query S2 above is a commutative and associative string aggregate function that can be computed incrementally at each network probe. In this case, an echo carrying a list of IP addresses from a network probe's neighbors is simply concatenated with the list from the previous echo. A number of SQL numerical aggregate functions, such as
MIN(), MAX(), SUM() also have these properties. However, other functions, such as the SQL AVG() aggregate function, do not. In such cases, distinct distributed versions of these functions will have to be implemented that support the aggregation of partial results in an incremental distributed manner. Fortunately, most commercial Relational Database Management Systems support the development of these user-defined aggregate functions. In an exemplary embodiment of the present invention, users can issue queries in three ways: (1) interactive querying - the centralized management console supports a web-interface that allows the user to type in an SQL statement, as exemplified in Figure 17. The returned result is a table in XML displayed on the user's browser as illustrated in Figure 20; (2) issuing queries via HTTP post - web-based applications can post SQL queries to the centralized management console and receive results back as an XML table; (3) issuing queries via a well-known TCP socket - applications can use a socket-based interface to post SQL queries to the centralized management console. The syntax of the SQL used is compliant with SQL-92. In addition to the standard keywords and capabilities of SQL-92 (select, from, where, group by, having, order by, limit), an exemplary embodiment of the present invention also includes a comprehensive library of extended "select" and "group by" functions
for computing network-specific statistics (e.g., histograms, jitter, path, etc.). Figure 2 is a block diagram depicting the functional elements of a network probe. A query, sent from the console as in Fig. 1, arrow 110, is processed by each receiving network probe. An exemplary embodiment of the network probe has four main functional elements: acquisition, analysis, storage, and processing. First, at 210, the acquisition element captures network packets and frames from a monitoring point in the network. Next, these packets are passed to the analysis element, 220, which decodes and computes statistics, indices and summaries. The resulting data produced by the analysis element is then stored in a high performance database, 230. Finally, this data may be accessed in real time when processing a search, query or computation by the processing element, 240. An exemplary embodiment of the acquisition element 210 of the network probe can be constructed from high performance off-the-shelf Network Interface Cards (NIC) that support a Direct Memory Access (DMA) interface, an example of which is the DAG series of NICs produced by Endace Measurement
Systems. Other interrupt-based NICs can also be employed to construct this element; however, the tradeoff may be in the loss of some packets when the link utilization is high. In terms of system software required to develop the element, NIC vendors will either supply a proprietary kernel driver with their hardware or include standard drivers that will work in a number of popular operating systems. In either case, this element can be developed to operate over the supplied proprietary driver or to utilize open-source packet filter libraries such as libpcap (available at http://sourceforge.net/projects/libpcap/).
An exemplary embodiment of the analysis element 220 of a network probe is based on an architecture that utilizes parallel pipelining to speed up packet processing and analysis. The advancement of optical and gigabit Ethernet technologies has led to a corresponding increase in the speed of network interfaces. Consequently, designers of protocol analysis and packet decoding systems face a constant challenge in crafting solutions that can keep abreast of the latest interface technologies. The analysis functional element 220 of the network probes utilizes a multi-staged tree pipelining architecture that exploits parallel processing to allow the decoding and analysis of a packet stream at arbitrary speeds. The element itself can consist of a single protocol analysis element or a network of active nodes each of
which contains a protocol analysis element coupled with the distributed processing element of a network probe. Figure 3 illustrates the architecture of the analysis element 220 in the network. The square boxes, 300, represents active nodes with distributed SQL processing functionality described below (processing element 240). Each active node, 300, is a location for a network probe, such as a switch or router. The rounded rectangles, 310, represent the protocol analysis functionality of each active node that captures, decodes and stores packets on a local database for subsequent querying. The thick solid arrow, 320, represents the original packet stream captured at rate R by active node 1. The dotted arrows, 330, represent the packet stream after it has been processed by active node 1, 301, which includes packets as well as timestamps. There are a total of n active nodes in the system arranged into a hierarchy of log(n) stages. Each active node with the exception of active node 1, 301, is capable of capturing packets at a rate less than R/2 and processing it at a rate of R/n. Active node 1, 301, is capable of capturing and timestamping packets at rate R but processing it at rate R/n. Furthermore, in Figure 3, all packets arriving on active node 1, 301, are timestamped. A fraction of these (R/n), is directed to the protocol analysis function on the node. The remaining fraction of packets is transmitted, together with their timestamps, to active node 2, 302, and active node 3, 303, in equal proportions. On active node 2, 302, and active node 3, 303, a fraction of the received packets and timestamps (R/n) is directed to the protocol analysis function on each node while the remaining is transmitted in equal proportions to their peers in stage 3 where the process is repeated. At each stage, the rate of incoming packets and timestamps is reduced by the action of active nodes in the previous stage. In the final stage, this rate is less than R/n so that all of the packets and timestamp can be processed by the protocol analysis function on the node. The architecture of the system has advantages which include, but are not limited to the following: • Active nodes can be added to increase the rate in which packets can be processed without limitation. • Each new active node added only lengthens the pipeline by log n stages.
• The total number of stages in the pipeline can be shortened further by configuring each active node to transmit to more than two peers (e.g., three or more). The resulting pipeline will be an k-ary tree instead of a binary tree as shown in Figure 4. Aside from this, no other changes are required in the architecture. An exemplary embodiment of the storage element 230 of the network probe depicted in Figure 2 can be constructed from any off-the-shelf relational database system that supports the Structured Query Language (SQL). The database schema of each network probe consists of a collection of relational database tables that contain information generated by the analysis element 220. Each table contains records of identical structures, which can be accessed via a local interface. Each such record contains data gathered by the network probe from its attached router via an access protocol — SNMP or Command Line Interface. In a preferred embodiment, network probes share the same schema, meaning that the type and the structure of information they collect are identical. In general, each table holds information about traffic generated or consumed by a single type of application. The structure of each is thus highly specific to the type of application it represents. The list below provides the schema for tables representing data gathered from the traffic generated by a number of commonly used applications. • Email transaction table This table may contain records representing email transactions between mail clients and servers. By way of example, each record contains, but is not limited to the following list of fields: source and destination IP addresses and port, sender, receiver, subject, cc, bcc, mail agent, mail protocol (SMTP, ESTMP, POP, IMAP, etc.), content type, action (sending, receiving, relaying), and timestamp. • HTTP transaction table This table may contain records representing HTTP transactions between browsers, proxies and web servers. By way of example, each record contains, but is not limited to the following list of fields: source and destination IP address and port, HTTP-version, transaction type (request or response), method (get, post, head, put, delete, trace, connect), request-URI, status code, user-agent, from, host, content-type, length, and timestamp • SIP transaction table
This table may contain records representing SIP calls between SIP user agents, gateways and servers. By way of example, each record contains, but is not limited to the following list of fields: source and destination IP address and port, SIP- version, transaction type (request or response), method (register, invite, ack, cancel, bye, options), request-URI, status code, from, to, caller-ID, contact, and timestamp.
• FTP transaction table
This table may contain records representing FTP transactions between FTP clients and servers. By way of example, each record contains, but is not limited to the following list of fields: source and destination IP address and port, command
(username, password, cwd, cdup, logout, put, get, delete, rmdir, mkdir), target (filename, directory name, user name, password), and timestamp.
• Instant messaging log table
This table may contain records representing instant messages between instant messaging clients and servers. By way of example, each record contains, but is not limited to the following list of fields: source and destination IP address and port, protocol (YAHOO, MSN,AOL, ICQ, IRC), from, to, cc, datetime, subject, and timestamp.
• P2P activity table This table may contain records representing P2P application searches and search results. By way of example, each record contains, but is not limited to the following list of fields: source and destination IP address and port, protocol (Gnutella, Fastrack, etc.), descriptor Header (Ping, Pong, Query, QueryHit, Push), search criteria string, shared file names, shared file sizes, user, and timestamp. In addition to the above per-application tables, the network probe also contains a number of tables that represent general IP, TCP and layer 2 traffic. These tables may include, but are not limited to the following:
• Flow table
This table may contain records representing network traffic flows sampled from a sequence of packets captured from the network. By way of example, for an IP network, a record would contain fields which include, but are not limited to the following: source and destination IP address and ports, layer 2 source and destination address, IP packet header options, for example, Type-of-Service, TTL
etc., packet count and byte count, sampling duration, and timestamp of the sample.
• Frame table
This table may contain records representing layer 2 frames captured by the acquisition element from the network. By way of example, for an Ethernet-based network, a record would contain fields which include, but are not limited to the following: source and destination MAC addresses, frame count and byte count, sampling duration, and timestamp of the sample.
• TCP connection table
This table may contain records corresponding to TCP connection events observed in the network. By way of example, a record would contain fields which include, but are not limited to the following: source and destination IP address and port of the connection, connection event type (e.g., SYN, SYNACK, FIN, RESET, etc.), TCP sequence number, and timestamp of the event.
An exemplary embodiment of the processing element 240 of the network probe depicted in Figure 2 is based on an architecture that utilizes distributed SQL queries to collect and process information from the network. As networks grow larger and more dynamic, the ability to create network views in real-time becomes increasingly important. To compute such views, measurements must be taken from various points in a network and statistical methods applied to infer state abstractions. Since the quality of the inference depends on the number of measurements available, the ability to collect and process large amounts of data quickly and efficiently is the key to computing real-time views in the context of decentralized management. An exemplary embodiment of the processing element 240 utilizes the relational model as the fundamental abstraction to model network data and compute network views, for the following reasons. First, the relational model, providing the basis for most commercial database management systems today, is known for its flexibility and expressiveness for formulating queries through the use of a relational algebra. Second, the maturity of this technology and the availability of tools and resources for manipulating relational data reduce the cost of developing, maintaining and operating new applications. Finally, the large amount of data that is currently available in enterprise relational databases (e.g., in the form of inventories, service contracts, usage records) could be merged with network measurements, providing
integrated views that encompass multiple domains and business functions. Therefore, in accordance with the representation of network data as tables maintained on network nodes, a network view may be understood as the result of a (distributed) relational query on these tables. An exemplary embodiment of the processing element 240 supports the creation of, but is not limited to, the following categories of real-time views of the network: 1. Views of traffic flows; 2. Views of traffic composition; 3. Views of end-to-end quality of service statistics, including packet delay, jitter and loss; 4. Views of statistical measures of traffic characteristics such as frame or packet sizes; 5. Views of end-to-end connection statistics of connection-oriented protocols like TCP; 6. Views of topology information, such as the connectivity distribution of the network nodes or the current number of sinks of IP multicast sessions; 7. Views derived from statistical correlation of point measurements, e.g., the coefficient of correlation between traffic volume and delay between two end-points. The dynamics of the system operates as follows. A network administrator sends a query via HTTP to the centralized management console, which dispatches it to a start node (a network probe) in the network. For processing a query in the network, the system uses the echo pattern and a query aggregator that is invoked by the pattern on each node. During the expansion phase of echo, sub queries are distributed to the targeted node and the query aggregator performs the local database access. During the contraction phase of echo, partial results carried back by echo messages are incrementally added to the results obtained from the local query, on all nodes of echo's execution tree. At the end of the contraction phase, the resulting table of the global query is sent from the start node, which is the root node of the execution tree, to the console, which forwards it to the administrator. To better understand how a sub query is resolved on a network probe, consider Query A. When the sub query is executed on a network probe, which we will
call X (and assuming that X is not a leaf of the echo's execution tree), a local table is constructed that initially holds data of the top ten flows passing through X's attached router. As partial results from X's neighbors are received via echo messages, additional rows containing data of the top ten flows from X's neighbors' routers are added to the local table. After an echo message is received from the last neighbor in the set N, the query aggregator on X re-executes the original sub-query on the local table, retaining only records of the top ten flows. This table is then sent as part of an echo message to X's parent.
An exemplary embodiment of the centralized management console includes a set of PHP scripts running on an Apache HTTP server and a MySQL database daemon. A MySQL database daemon also runs on each network probe to interpret the local queries generated by the query aggregator and to insert records with data collected from the attached routers.
An exemplary embodiment of the present invention includes a method which allows network operators to visualize and track thousands of active traffic flows in a network in real-time. Much of today's network traffic is comprised of sustained high-volume 'streams' of data between pairs of computers. At the lowest level, each of these streams is the aggregation of thousands of individual packets from the same source headed towards the same destination. However, modeling traffic in the network in terms of 'streams' or flows instead of disparate packets provides better insight into the dynamics and utilization of resources in a network. For example, a common route taken by the heaviest flows is often congested; similarly, a node on the intersection of many heavy flows is more likely to be overloaded. While the benefits of modeling network traffic as flows are widely recognized, there are very few tools in the industry today capable of tracking and visualizing flows. There are two primary reasons for this. First, there are very few vendors with devices capable of examining every packet on a link at high speeds (upwards of several million packets per second) in order to derive flow characteristics. Secondly, tracking the dynamics of a flow across the network requires the correlation of information from many different points in the network. The capability of a network probe of the present invention to capture and analyze traffic streams in real-time makes it an ideal source for flow information. An exemplary method of the present invention utilizes packet-level information in the network to create a graphical network-view of active flows in a continuously scrolling
display. Much as the scrolling stock ticker provides a real-time view of the performance of the stock market, this method provides a network operator with a live view of the traffic dynamics in a network in significant detail.
A flow is a sequence of datagrams with identical source and destination addresses and port numbers. The construction of a flow matrix requires a table of records, one for each datagram captured in the network. Each record should include the following information:
• Source and destination addresses and port numbers of the datagram
• Size of the datagram
• A timestamp indicating when the packet was captured
Figure 5 shows an exemplary active traffic flow matrix of a network with many active flows. Network flows in the matrix are represented as horizontal lines of varying thickness, 510. Time is represented on the horizontal axis and is marked out in discrete intervals at the top of the matrix, 520. Each flow can vary in thickness over a minimum pre-determined period of time (e.g., 3 seconds). This minimum period is called a slice and represents the sampling time window of statistics about the flow. For each slice of each flow, statistics such as its average bit rate or packet rate can be computed. These statistics can be represented in the matrix visually as an indication of the dynamics of the flow. For example, in Figure 5, the thickness of a slice represents the average bit rate of the flow over the slice. The matrix scrolls in the direction of the time axis, providing a continuously updated view of network traffic over a window of time proportional to its width. Because there may be more flows in the network than is practical to display, the matrix may represent only samples of the most interesting flows, for example, the top 25 flows ranked by bit rate. Colors are used in the matrix to represent flows with similar characteristics, 530; for example, flows carrying similar kinds of traffic (e.g., video) may be similarly colored. The vertical position of a flow in the matrix can be used to indicate several properties of the flow. For example, flows which have been in the matrix longer may be placed lower, or flows representing applications with real-time constraints may be positioned higher than others with no time constraints. The movement of sinking and floating flows may be used by the matrix to highlight subtle trends in the traffic dynamics of the network.
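The slice statistics behind the matrix can be illustrated with a short sketch. The function and field names below are invented for this example; the 3-second slice and the top-25 cut-off simply mirror the figures used in the description above.

    # Minimal sketch of the per-slice statistics behind the flow matrix: group
    # captured datagram records into flows (same addresses and ports), bucket them
    # into fixed slices (3 seconds here), and keep the top flows by average bit rate.
    from collections import defaultdict

    SLICE_SECONDS = 3

    def slice_bit_rates(records, top_n=25):
        """records: iterable of (src, dst, sport, dport, size_bytes, timestamp_s)."""
        totals = defaultdict(int)          # (flow, slice index) -> bytes in that slice
        for src, dst, sport, dport, size, ts in records:
            flow = (src, dst, sport, dport)
            totals[(flow, int(ts // SLICE_SECONDS))] += size

        rates = {k: (byte_total * 8) / SLICE_SECONDS for k, byte_total in totals.items()}
        # Rank flows by their busiest slice and keep the top_n for display.
        best = defaultdict(float)
        for (flow, _slice), rate in rates.items():
            best[flow] = max(best[flow], rate)
        top = sorted(best, key=best.get, reverse=True)[:top_n]
        return {flow: {s: r for (f, s), r in rates.items() if f == flow} for flow in top}

    if __name__ == "__main__":
        sample = [("10.0.0.1", "10.0.9.9", 1234, 80, 1500, t * 0.01) for t in range(900)]
        for flow, per_slice in slice_bit_rates(sample, top_n=5).items():
            print(flow, per_slice)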
A matrix, such as that shown in Figure 5, may also permit drill-down analysis of selected flows. For example, clicking on a flow, 510, may display a graph of the flow rate, the source and destination addresses of the flow, and the intermediate nodes traversed by the flow. Alternatively, right-clicking on a flow, 510, may bring up a menu that permits users to search for similar flows in a number of ways, for example, flows traversing the common set of nodes or flows originating or terminating from the same region in the network. Because the active traffic flow matrix can capture the dynamics of thousands of flows in a single display, a matrix diagram of a network is, in essence, a compact representation of the complete traffic dynamics of the network over a specific window in time. Additional examples of traffic flow matrices and charts are shown in Figures 13, 14, 15, and 16. In an exemplary embodiment of the present invention, the centralized management console has several screens that use Java applets to provide real-time scrolling views of network traffic, therefore to view all screens on the console a browser would be used that supports a plug-in for Sun's Java standard edition version 1.4.2 and above. Figure 6 illustrates an example of how a typical provider's network can be instrumented with the present invention. Today, operators have networks that typically consist of dozens of points of presence (POPs) and span multiple peering points over a wide geographical distance. Getting a good insight into the end-to-end behavior of traffic in the network is non-trivial since the path of a typical packet may span multiple provider's networks during which it may be encapsulated and de- capsulated any number of times. In order to effectively monitor such connections, traffic statistics from various points in the network have to be correlated. To make matters worse, customers with service to multiple POPs may request for 'network- wide' service level agreements so that their traffic observes guaranteed bounds regardless of the end-point addresses. Verifying such service level agreements requires measurement and aggregation of traffic statistics from multiple hops on a per-customer basis. Typical monitoring systems today cannot perform these computations in real-time and service level agreement violations are belatedly reported, if at all. The real-time aggregation and correlation capabilities of the present invention allows operators to take a proactive approach by spotting problems before or as they are occurring. In Figure 6, network probes, 610, are attached at all ingress and egress
routers at POPs, 620, in a provider's network as well as to some routers within a POP. The network probes can be grouped into clusters depending on whether they are monitoring edge routers/switches or internal routers. The network probes monitor all traffic entering and exiting their attached router at wire-speed, analyzing source- destination addresses, ports and decoding selected protocols. The network probes also generate active test traffic between each POP to evaluate the performance of the service and network infrastructure. Because each network probe has access to a synchronized time source, one-way delay measurements with sub-millisecond accuracy can be obtained between any pair of measurement points. All measurements are cross correlated in real-time to yield edge-to-edge (630), POP-to-POP (640), end- to-end (650), and even hop-to-hop (660) metrics. These consolidated measurements can be exported via the console in XML for inclusion into the provider's operations, administration, and maintenance systems. Figure 7 illustrates how the network probes, 710, accomplish active and passive monitoring on a router or switch. Passive monitoring is achieved through the use of mirrored ports, 720. These ports are available on select routers and switches and allow incoming traffic from all ports to be duplicated onto the mirrored port. Attaching a network probe to the mirrored port allows the probe to access all traffic passing through the monitored device, 750. This is indicated by the solid line, 730. In active monitoring, the network probe is also attached to a regular port which is used to inject specially marked packets into the network (indicated by dotted lines, 740). As the injected traffic passes through the probe instrumented devices, 750, they are passively monitored by the probes. To ensure maximum accuracy, network probes may be equipped with special hardware to timestamp packets as they arrive from the mirrored ports without any CPU intervention. With GPS synchronized clocks, the accuracy of these timestamps are on the order of 100ns. In a similar way to delay measurements, other one-way metrics such as loss, jitter, throughput, packet count, and packet size distribution can also be obtained. Detailed statistics pertaining to TCP and other application layer protocols can also be measured and correlated. Finally, a network probe can also monitor attached devices via traditional management channels such as SNMP, Telnet and Cisco Netflow. Although the present invention has been described in connection with
specific exemplary embodiments, it should be understood that various changes, substitutions and alterations can be made to the disclosed embodiments without departing from the spirit and scope of the invention as set forth in the appended claims.