WO2001001297A1 - Data aggregator architecture for data mining - Google Patents
- Publication number
- WO2001001297A1 (application PCT/US2000/018183)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- event
- data
- aggregator
- events
- computer
- Prior art date
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
Definitions
- the present invention relates to uses of a computer infrastructure and more particularly, to improved methods of implementing data mining processes utilizing a number of novel approaches to achieve various purposes.
- Data mining is unlike a traditional well-organized and robust database that a user might use to formulate a specific database query. Data mining refers to situations that have three major characteristics.
- data mining refers to a situation where the user is looking for information in the data but is not sure what that information might be.
- One way to capture the data in a computer system or a computer network for future data mining uses the programming technique of raising events throughout the software code. Raising events, or more appropriately, making a "raise event" call, is a well known programming technique of capturing data at a particular point in the software code and outputting it to a computer screen, disk or other storage device. For example, raise event calls are used frequently in the debugging of software to output variable values at particular points in the code so that the code may be debugged. In another example, a typical business application software program might have raise event calls sprinkled liberally through the code in order to capture large volumes of any and all types of data.
- FIG. 1 illustrates a prior art computing infrastructure for use by users accessing a target web site.
- a user puts in an access request 102 that goes through a web server router 104 which is linked to a plurality of web servers 106a-e.
- Web server router 104 randomly assigns a web server among web servers 106a-e to respond to the user's access request 102.
- Web servers 106a-e are linked to an application server router 110, which assigns an application server from the plurality of application servers 112a-d.
- Each of these application servers 112a-d is also linked to a database router 120, which in turn is linked to databases 124a-c.
- Database router 120 may assign a database among databases 124a-c in response to an access request as needed.
- each server (which may be a database server, an application server or a web server), is connected to at least one router, which plays the intermediate role of assigning the appropriate server in response to an access request.
- Each router is generally linked to like servers in a given cluster, and selects one of these servers for assignment based on factors such as machine load and existence of previously cached data.
- Information for data mining is generated during normal operation of these servers, which means that the data gathering is tied into the business application code, and that the code will have to be modified and recompiled when changes to the data gathering strategies need to be made.
- the data generated is stored within this infrastructure, which does not allow for expansion in terms of data storage capability. Therefore, what is needed is an improved expandable computer infrastructure that will allow for a smooth extension of data storage capabilities as well as improved schemes that will allow the business applications and the gathering of generated data to function independently of one another.
- a method of creating an event for a user activity includes determining that a user activity is useful for data mining, creating an event for this user activity, assigning an event name to the created event, recording the event name in an event table and setting the event to a default state, placing calls in appropriate locations in computer code to raise the event when its corresponding user activity occurs, and compiling the computer code to allow each event to be raised when its corresponding user activity occurs.
- this method may further include setting up a listener to listen for events selected from a list of documented events.
- the setting up of a listener would include the steps of reviewing the list of documented events, selecting at least one event for a particular business need, and writing handler code for each selected event which would handle that particular selected event should it be raised.
- a method of processing an event includes receiving an indication of an event when that event is raised, obtaining an event identifier for that event, looking up the identifier in an event table to determine if the event is turned on, and if the event is turned on, a handler for that event is invoked and run. As the handler runs, it may also capture data to an aggregator.
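The event processing steps just described (receive the raised event, look up its identifier in the event table, run the handler only if the event is turned on, and capture the result to an aggregator) might be sketched as follows. This is a minimal illustration, not the patent's implementation: `event_table`, `handlers`, `aggregator` and `raise_event` are hypothetical names, and the aggregator is modelled as an in-memory list.

```python
from typing import Any, Callable, Dict, List

# Hypothetical names throughout; the patent does not specify an API.
event_table: Dict[str, bool] = {"login_success": True, "login_failure": False}
handlers: Dict[str, Callable[[Dict[str, Any]], Dict[str, Any]]] = {}
aggregator: List[Dict[str, Any]] = []  # stand-in for a remote aggregator


def raise_event(event_id: str, payload: Dict[str, Any]) -> bool:
    """Process a raised event; return True if a handler ran."""
    if not event_table.get(event_id, False):
        return False  # event is turned off (or unknown): nothing is collected
    handler = handlers.get(event_id)
    if handler is None:
        return False  # no handler has been written for this event
    aggregator.append(handler(payload))  # handler output is captured
    return True


handlers["login_success"] = lambda p: {"event": "login_success", "user": p["user"]}
handlers["login_failure"] = lambda p: {"event": "login_failure", "user": p["user"]}

ran_on = raise_event("login_success", {"user": "alice"})
ran_off = raise_event("login_failure", {"user": "bob"})
```

In this sketch the second call is ignored because `login_failure` is turned off in the table, even though a handler exists for it.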
- the event processing may include the signaling of a daemon thread.
- a server used within this computer infrastructure includes a code module, a handler module, a filtering/policy module and an event processor.
- Possible variations in the server structure may include a filter/policy module in the server having an event table with a plurality of events or a code module having a plurality of raise event calls embedded in the code.
- This first embodiment advantageously provides for a dynamic selection of the type of information that is captured to look for patterns and other information through the use of dynamic raise event calls.
- the dynamic events may be embedded in the code and can be turned on or off depending on what information is to be collected for data mining.
- In a second embodiment of the invention, an aggregator intelligent director computer includes a processing capability database, a load status database, an aggregator selector and a configurator.
- the aggregator intelligent director computer may optionally include a usage history database.
- In another aspect of the second embodiment, an aggregator network includes an aggregator intelligent director linked to a number of aggregators.
- the aggregator network may also be further expanded by linking each of the aggregators to another aggregator intelligent director, which in turn is linked to a second set of aggregators.
- In yet another aspect of the second embodiment, a computer network includes a number of web servers, a number of business logic servers, a number of database servers and an aggregator network.
- the aggregator network includes an aggregator intelligent director.
- a method for capturing data includes receiving data from a server, checking databases in an aggregator intelligent director computer, selecting an aggregator using an aggregator intelligent director computer, and delivering the data to the aggregator.
- the method may further include accepting feedback from the aggregator and including that feedback in the databases.
- the method may include configuration of the aggregator.
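As a rough sketch of the data capture method above (receive data, check the director's databases, select an aggregator, deliver the data, and fold the aggregator's feedback back into the databases), consider the following. The class names and the "least loaded" selection rule are assumptions made for illustration; the only database modelled is a load status table.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Aggregator:
    name: str
    stored: List[str] = field(default_factory=list)

    def deliver(self, data: str) -> int:
        """Store the data and return feedback: the new item count."""
        self.stored.append(data)
        return len(self.stored)


class Director:
    """Hypothetical director that keeps only a load status 'database'."""

    def __init__(self, aggregators: List[Aggregator]) -> None:
        self.aggregators = {a.name: a for a in aggregators}
        self.load_status: Dict[str, int] = {a.name: 0 for a in aggregators}

    def capture(self, data: str) -> str:
        # Check the database and select the least loaded aggregator
        # (ties are broken by name so the choice is deterministic).
        name = min(self.load_status, key=lambda n: (self.load_status[n], n))
        # Deliver the data, then record the aggregator's feedback.
        self.load_status[name] = self.aggregators[name].deliver(data)
        return name


director = Director([Aggregator("a"), Aggregator("b")])
choices = [director.capture(f"record-{i}") for i in range(3)]
```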
- the second embodiment advantageously provides an improved aggregate architecture for storage of captured data for data mining that allows virtually unlimited expansion in capacity by adding machines as needed. This advantageously eliminates the need to replace machines already present within the infrastructure.
- a policy module that is decoupled from a primary business application is disclosed.
- This policy module includes an event table having a number of events and a toggling switch to toggle the events on or off.
- the policy module may further include a counter to increment occurrences of each of the events.
- the toggling switch in the policy module may toggle the events on or off independent of a main business application that is running alongside the policy module.
- the third embodiment provides for decoupling the policy module from the primary business application. This advantageously allows a policy module to be created, modified and run independently of operations relating to the primary business application. This policy within the policy module may be modified as needed to fulfill data mining requirements by toggling events on and off as needed. This approach avoids recompiling of code and allows for tailoring of policies without affecting normal operation of the primary business operation, therefore resulting in greater efficiency.
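A decoupled policy module of this kind could look roughly like the sketch below: an event table, a toggle, and a per-event counter, none of which require touching or recompiling the business code. `PolicyModule` and its method names are hypothetical, and counting every occurrence (rather than only occurrences of enabled events) is an assumption.

```python
from typing import Dict, Iterable


class PolicyModule:
    """Event table with on/off flags and occurrence counters."""

    def __init__(self, event_names: Iterable[str], default_on: bool = False) -> None:
        names = list(event_names)
        self.enabled: Dict[str, bool] = {n: default_on for n in names}
        self.counts: Dict[str, int] = {n: 0 for n in names}

    def toggle(self, name: str) -> bool:
        """Flip an event's flag and return its new state."""
        self.enabled[name] = not self.enabled[name]
        return self.enabled[name]

    def record(self, name: str) -> bool:
        """Count an occurrence; report whether data should be collected."""
        self.counts[name] += 1
        return self.enabled[name]


policy = PolicyModule(["login_failure", "item_viewed"])
before = policy.record("login_failure")   # event is off by default
policy.toggle("login_failure")            # policy change at run time
after = policy.record("login_failure")    # same business code path, now collected
```

The business application only ever calls `record`; changing what is collected is a matter of calling `toggle` on the policy object.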
- Figure 1 illustrates the prior art computing infrastructure for use by users accessing a target web site.
- Figure 2 illustrates an improved computing infrastructure for use by users accessing a target web site having greater expansion capabilities through the use of intelligent directors.
- Figure 3 shows a simplified illustration of the computing infrastructure of Figure 2 having a number of aggregator networks connected to the servers for information downloading.
- Figure 4 illustrates an exemplary aggregator computer network having aggregator intelligent directors and aggregator computers forming a tree topology.
- Figure 5 illustrates components of an exemplary aggregator intelligent director.
- Figure 6 illustrates an exemplary server computer having a code module, a filtering/policy module, a handler module and an event processor.
- Figure 7 illustrates a process flow of creating an event for use in data mining.
- Figure 8 illustrates a process flow of setting up a listener for capturing selected events.
- Figure 9 illustrates a process flow of processing an event once that event is raised.
- Figure 10 illustrates a process flow to filter the collecting of data by taking actions in response to events raised.
- Figures 11 and 12 illustrate a computer system 900 suitable for implementing embodiments of the present invention.
- the first embodiment relates to an improved method that allows for dynamic selection of the type of information that is captured to look for patterns and other information through the use of dynamic raise event calls.
- events that have been created by placing dynamic raise event calls in the code may be turned on and off depending on what information needs to be collected. This allows for flexibility in modifying data mining policies; each policy may be tailored to the proper problem domain without going through extensive procedures such as modifying and recompiling the main code to achieve that purpose.
- the second embodiment relates to an improved aggregate architecture to allow for flexible expansion of the storage capacity for collecting data for data mining purposes.
- an aggregate architecture may start with a basic topology and be expanded by adding additional aggregate intelligent directors and aggregator computers to increase storage capacity. Conversely, the topology may be modified by removing aggregator intelligent directors and aggregator computers when they are not needed.
- the third embodiment relates to an innovative feature of decoupling the filtering/policy module from the primary business application.
- the decoupling feature advantageously allows data collection to take place without interference from the operations of the primary business application.
- the filtering/policy module is a separate entity from the code module that handles the main sequence of operations. Altering the filtering/policy module by turning selected events on and off will have the effect of modifying the data collection strategy without affecting the primary structure of the business code module.
- Figure 2 illustrates an improved computing infrastructure for use by users accessing a target web site, wherein the routers that are used in the prior art infrastructure to direct users to their assigned servers are replaced by intelligent directors.
- The access request 102 goes through a web logic intelligent director 204, which is linked to a plurality of web servers 106a-e.
- Web logic intelligent director 204 receives feedback from web servers 106a-e, and based on the feedback, selects a web server among web servers 106a-e to respond to the user's access request 102.
- Web servers 106a-e, which are linked to a business logic intelligent director 210, have the capability to request a business logic server based on the user's needs via business logic intelligent director 210.
- Director 210 selects a business logic server from the plurality of business logic servers 112a-d based on the information it receives from the requesting web server and feedback from the business logic servers. Each of these business logic servers 112a-d is also linked to a database intelligent director 220, which in turn is linked to databases 124a-c. This enables a business logic server to select a database via database intelligent director 220. Like the other intelligent directors, database intelligent director 220 selects a database based on the requesting server, which is a business logic server in this case, as well as feedback from the databases it is linked to.
- each server, which may be a database server, a business logic server, or a web server, is connected to at least one intelligent director, which, among other functions, plays the intermediate role of assigning the appropriate server in response to an access request.
- the pair of lines pointing in opposite directions between any two machines illustrates the logical view of the connections that exist between these machines.
- the connection is enabled by a network interface card (NIC) and a bus having bidirectional functionality to allow input and output.
- An intelligent director is generally linked to like servers in a given cluster and selects one of these servers for assignment based on feedback received from these servers.
- FIG. 3 shows a simplified illustration of the computing infrastructure of Figure 2 having a number of aggregator computer networks connected to the servers for information downloading.
- Web servers 304a-c are linked to an aggregator intelligent director 306 which serves as a foundation for each aggregator computer network.
- Aggregator intelligent director 306 is, in turn, linked to aggregator computers 308a-b. Intelligent directors other than the aggregator intelligent directors are not shown in this figure for ease of illustration.
- Information collected at business logic servers 324a-c and database servers 354a-c may be processed in a similar fashion via paths 325 and 355 to aggregator intelligent directors 326 and 356. These directors will assign an aggregator, for example, 328a, 328b or 358. Information collected in this manner may be used for data mining either in an offline mode where data is collected for a long period of time and then processed later to look for patterns, or alternatively, in a real-time mode where the data is dynamically processed as it is being collected and decisions are made on the spot or with a very short lag time.
- each server of a cluster in this exemplary infrastructure is connected to an aggregator intelligent director, which may in turn be connected to an aggregator computer network which may include a single aggregator computer or multiple aggregator computers.
- aggregators may stand alone, or alternatively, they may be expanded further through the use of additional aggregator intelligent directors into a hierarchy of aggregators.
- Aggregators and aggregator intelligent directors may be linked in a variety of combinations to form different aggregator networks having different topologies, depending on what is needed. For example, one topology might have all aggregators linked to one aggregator intelligent director, while another topology might have each aggregator linked to an individual aggregator intelligent director.
- FIG. 4 illustrates an exemplary aggregator computer network having aggregator intelligent directors and aggregator computers that form an exemplary tree topology.
- aggregator intelligent director 400 is linked to multiple aggregator computers 410a-d.
- Aggregators 410b-d are expanded by linking to aggregator intelligent director 420, which in turn is linked to two aggregators 430a-b.
- Aggregator 430b is then linked to aggregator intelligent director 440, which in turn is linked to aggregators 450a-b.
- This chain of aggregator intelligent directors and aggregators may continue as long as it is deemed reasonable and necessary.
- aggregator 410c may be assigned to respond to that request. If aggregator 410c does not have adequate capacity to store all the information provided, aggregator 410c may send a request to aggregator intelligent director 420, which may select aggregator 430b based on the nature of the request and feedback from its downstream aggregators, which in this case would be 430a-b. Aggregator 430b would continue to receive the information, but in the event that aggregator 430b has inadequate capacity, it may request additional reinforcement through aggregator intelligent director 440, which may assign an aggregator 450a or 450b depending on what is needed as well as the feedback it receives from these aggregators.
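The overflow behaviour in this tree topology can be illustrated with a small sketch. It assumes a simple policy that the patent does not prescribe: each aggregator stores data locally while it has room, and otherwise asks its downstream director, which picks the aggregator with the most free capacity in its subtree. Capacities are counted in items, and all class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Director:
    aggregators: List["Aggregator"] = field(default_factory=list)

    def select(self) -> "Aggregator":
        # Assumed policy: prefer the aggregator whose subtree has most room.
        return max(self.aggregators, key=lambda a: a.free())


@dataclass
class Aggregator:
    name: str
    capacity: int
    downstream: Optional[Director] = None
    stored: List[str] = field(default_factory=list)

    def free(self) -> int:
        """Free slots here plus free slots reachable downstream."""
        own = self.capacity - len(self.stored)
        if self.downstream is None:
            return own
        return own + sum(a.free() for a in self.downstream.aggregators)

    def store(self, item: str) -> str:
        """Store locally if possible, else hand off downstream."""
        if len(self.stored) < self.capacity:
            self.stored.append(item)
            return self.name
        if self.downstream is None:
            raise RuntimeError(f"{self.name} is full and has no downstream director")
        return self.downstream.select().store(item)


# Part of the Figure 4 tree, with one slot per aggregator for brevity.
d440 = Director([Aggregator("450a", 1), Aggregator("450b", 1)])
d420 = Director([Aggregator("430a", 1), Aggregator("430b", 1, d440)])
a410c = Aggregator("410c", 1, d420)

placements = [a410c.store(f"item-{i}") for i in range(5)]
```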
- FIG. 5 illustrates components of an exemplary aggregator intelligent director 500.
- the components of aggregator intelligent director 500 include an aggregator selector 502, a usage history database 504, a processing capability database 506, a load status database 508 and a configurator 510.
- Also included but not shown are an aggregator directory, a latency database, a log file database and a network traffic database.
- aggregator intelligent director 500 receives a request for an aggregator at 512 over a network connection from one of the web servers, business logic servers or a database server. Upon receipt of this request, aggregator selector 502 looks to usage history database 504, processing capability database 506, load status database 508 and information from the other databases to determine which aggregator can best be assigned to handle this particular request.
- These databases store status information about the aggregators and are linked to the aggregator intelligent director and may be updated by receiving feedback from these aggregators as shown in 514.
- the aggregator selector selects an aggregator computer based upon these databases and assigns an aggregator to handle the request and returns the aggregator selection in 516.
- the databases also provide information to configurator 510, which may reconfigure the aggregators over connection 518 using the feedback information provided. This is especially relevant if the aggregators are heterogeneous; for example, certain aggregators may have special data mining or data gathering functions that other aggregators do not have.
- Information received from the aggregators via path 514 is stored in exemplary blocks 504, 506, 508 and in the other databases not shown. For ease of illustration, not every database is shown in Figure 5. Note that some of the information is static and may be received as part of the registration process that the aggregators undergo as they are installed into the network. An example of such static information is aggregator processing capability and geographic location. Other information may be dynamically received by the director from the aggregators. Still other information may be derived from the information received dynamically and/or statically (such as the average latency time for aggregators, which may be calculated periodically based on average network latency between the server and an aggregator).
- Usage history database 504 is used to predict usage demands placed on the aggregators and to prospectively balance the data to capture among the aggregators if needed. Thus, an anticipated surge in usage does not overwhelm any particular aggregator. For example, if one particular aggregator is always overwhelmed each Saturday evening, that information is kept in database 504 and can be used in the future to deflect a sudden influx of data from that aggregator on a future Saturday evening.
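One hedged way to picture the usage history database is as a table of past load samples keyed by aggregator, day of week and hour, from which an expected load can be averaged. The `UsageHistory` class below, its keying scheme and the 0.0 to 1.0 load scale are assumptions for illustration only.

```python
from collections import defaultdict
from datetime import datetime
from typing import DefaultDict, List, Tuple

Slot = Tuple[str, int, int]  # (aggregator name, weekday 0-6, hour 0-23)


class UsageHistory:
    def __init__(self) -> None:
        self._loads: DefaultDict[Slot, List[float]] = defaultdict(list)

    def record(self, name: str, when: datetime, load: float) -> None:
        """Remember the load an aggregator saw in this weekly time slot."""
        self._loads[(name, when.weekday(), when.hour)].append(load)

    def expected_load(self, name: str, when: datetime) -> float:
        """Average past load for the same weekday and hour (0.0 if unknown)."""
        samples = self._loads.get((name, when.weekday(), when.hour), [])
        return sum(samples) / len(samples) if samples else 0.0

    def pick(self, names: List[str], when: datetime) -> str:
        """Choose the aggregator expected to be least busy at that time."""
        return min(names, key=lambda n: self.expected_load(n, when))


history = UsageHistory()
history.record("agg1", datetime(2000, 7, 1, 20), 0.95)  # a Saturday evening
history.record("agg1", datetime(2000, 7, 8, 20), 0.90)  # the next Saturday
history.record("agg2", datetime(2000, 7, 1, 20), 0.30)

choice = history.pick(["agg1", "agg2"], datetime(2000, 7, 15, 20))
```

Here data arriving on a later Saturday evening is steered away from `agg1`, which was heavily loaded in that slot on earlier Saturdays.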
- Processing capability database 506 may track the processing capability (again in number of transactions per second, the number of servers that can be supported concurrently, etc.) of the individual aggregators in order to ascertain how much bandwidth a particular aggregator may have, how much has been used, and how much is available. Information pertaining to the processing capability of the aggregators may powerfully impact the routing decision since a more powerful aggregator may be able to capture data faster than a less powerful aggregator even if the more powerful aggregator may appear to be more heavily loaded.
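The point that a heavily loaded but powerful aggregator can still be the better choice is easy to see with numbers. The sketch below ranks aggregators by remaining headroom (capacity minus current load, both in transactions per second); the function name and the figures are illustrative assumptions.

```python
from typing import Dict, Tuple


def pick_by_headroom(aggregators: Dict[str, Tuple[int, int]]) -> str:
    """aggregators maps name -> (capacity_tps, current_load_tps)."""
    return max(aggregators, key=lambda n: aggregators[n][0] - aggregators[n][1])


status = {
    "big": (1000, 700),   # 70% loaded, 300 tps of headroom
    "small": (200, 50),   # 25% loaded, 150 tps of headroom
}
choice = pick_by_headroom(status)
```

Judged by percentage load alone, `small` looks idle, yet `big` can absorb twice as many additional transactions per second.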
- the aggregator processing capability may be obtained when the aggregator is first installed and registered within the system.
- Load status database 508 may track information pertaining to the level of load currently experienced by the individual aggregators (in number of transactions per second, the number of servers currently using an aggregator, CPU load, for example). This information may be tracked for aggregators individually and/or as a group average.
- Network traffic data pertaining to specific connections within and between the various sites may be obtained through the use of appropriate network sniffers or other software tools and furnished to a network traffic database (not shown) within director 500 so that the appropriate calculations pertaining to average latency can be made.
- An aggregator average latency database may track the average latency to be expected if a particular aggregator is employed to service a data capture request.
- the average latency may be calculated based on the processing capability of the aggregator, how remote it is from the server that issues the capture request and the speed of the network connection.
- An aggregator log file database may track the operational status of each aggregator to determine, for example, the length of time that the aggregator has been in operation without failure and other types of log file data.
- An aggregator directory may track information pertaining to the list of aggregators available to the system, their remote/local status (geographic distribution), their relative weight which reflects the relative preference with which data should be directed to the individual aggregators (e.g., due to network conditions or other factors), and the like.
- the remote/local status of the aggregators may also impact routing decisions.
- Servers of a given service may be widely dispersed over a large geographic area, both to improve service to a worldwide customer base and also to provide some measure of protection against natural or man-made catastrophes that may impact a given geographic location.
- the aggregators that capture data from these servers may also be well dispersed.
- When one of the servers prepares to capture data for later data mining (for example, as in step 920), it accesses selector 502 to ascertain the appropriate aggregator computer to which the data may be sent. The decision pertaining to which aggregator to assign the data may be made based on the current relative load levels on the aggregators, their history of usage, their processing capabilities or other factors. This information is periodically received by director 500 from the aggregators through path 514. Using the databases available, an aggregator selector 502 then selects one of the aggregators to service the pending data capture request and transmits the selection to the requesting server via path 516. Aggregator selector 502 may select an aggregator for data capture using any of a wide variety of methodologies.
- the heuristics are limitless and may depend upon the nature of the aggregators, the data to be captured, and the actual implementation of the overall system.
- the information contained within databases 504-508 and those not shown may be used in many different ways to choose an aggregator for data capture. A few techniques will now be discussed.
- selector 502 may route the incoming data capture request to one of the aggregators using a conventional routing methodology (such as round robin, based on the relative load levels, and the like).
- selector 502 may intelligently route the incoming data capture request to a specific aggregator that can service the request in the most timely manner. Additionally and/or alternatively, all aggregators may be loaded with data capture requests gradually and equally so as to minimize the impact on the whole aggregator network if any random aggregator fails.
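The two conventional methodologies named above, round robin and load-based routing, can each be expressed in a few lines. This is a generic sketch rather than the selector's actual logic; the function names and the load figures are made up for illustration.

```python
from itertools import cycle
from typing import Dict, Iterator, List


def round_robin(names: List[str]) -> Iterator[str]:
    """Yield aggregator names in a fixed rotating order, forever."""
    return cycle(names)


def least_loaded(loads: Dict[str, int]) -> str:
    """Return the aggregator with the lowest current load."""
    return min(loads, key=loads.get)


rotation = round_robin(["a", "b", "c"])
rr_picks = [next(rotation) for _ in range(4)]

load_pick = least_loaded({"a": 5, "b": 2, "c": 9})
```

Round robin needs no feedback from the aggregators, whereas the load-based rule depends on the load status information the director receives.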
- If one of the aggregators is particularly powerful in terms of data storage capacity, it may be wise to avoid allowing that powerful aggregator to handle a high percentage of the data capture requests, to lessen the adverse impact in the event of failure by that aggregator.
- The selection of an additional aggregator to provide additional fault tolerance protection for stored data may be made by reviewing the load status database, the latency data and/or the processing capability database kept at director 500, without regard to whether the additional aggregator is "local" or "remote." This may very well be the case if the aggregators of the system network are connected through reliable, high speed network connections and the geographical distinction may therefore be less important. In this case, the aggregator that is most lightly loaded may well be selected to capture redundant data.
- a director may be co-located with the router that routes the traffic to the aggregators, or it may be implemented separately from the router. It should be kept in mind that although Figure 3 shows a director 500 for each of the web server stage, the business logic stage, and the database stage, there is no requirement that there must be a director for each stage. The provision of a single director may be made for all servers, or some servers may do without a director and dump data directly to an aggregator computer.
BUSINESS LOGIC SERVER
- Figure 6 illustrates an exemplary business logic server 600 having a business code module 602, a filtering/policy module 604, a handler module 606 and an event processor 608.
- the information that is to be used for data mining is collected by the servers by a procedure termed "raising an event". For example, the process of a user logging in through a web server may raise a number of events such as the login name, the IP address the user is logging in from, and whether the login is successful.
- Code module 602 provides the code that drives the functions of server 600.
- Code module 602 may represent any suitable business application software that is running on server 600. This software is the focus of the data collection efforts for data mining.
- raise event calls are placed to signal when the event does occur, i.e., is raised. Therefore, code module 602 may contain a number of raise event calls as shown, which allow notice to be given when an event occurs so that the data may be collected. Use of a raise event call is well known in the art.
- raise event calls may be sprinkled liberally throughout the code, wherever it is possible that data might need to be collected.
- Handler module 606 includes a number of handler subroutines, for example, one for each event to be raised. There may also be instances when a handler is created to accommodate more than one event or, alternatively, where the occurrence of an event causes more than one handler to be invoked. There may also be instances where handlers are not created for certain events, such as when events are raised but the information generated by the occurrence of these events does not need to be collected at this time. Each handler subroutine, tailored to its corresponding event, is invoked when two conditions are met, namely, when an event corresponding to that handler occurs and if that particular event had previously been chosen to generate information for future data collection as explained in Figure 9.
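The handler relationships described above (one handler serving several events, one event invoking several handlers, and events with no handler at all) amount to a many-to-many registry. The sketch below uses a hypothetical `on` decorator and `dispatch` function; the set of enabled events stands in for the event table.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, Dict, List, Set

Handler = Callable[[str, Dict[str, Any]], str]
registry: DefaultDict[str, List[Handler]] = defaultdict(list)


def on(*event_names: str) -> Callable[[Handler], Handler]:
    """Register one handler for one or more event names."""
    def decorator(fn: Handler) -> Handler:
        for name in event_names:
            registry[name].append(fn)
        return fn
    return decorator


def dispatch(event_name: str, payload: Dict[str, Any], enabled: Set[str]) -> List[str]:
    """Invoke handlers only if the event occurred AND is turned on."""
    if event_name not in enabled:
        return []
    return [fn(event_name, payload) for fn in registry.get(event_name, [])]


@on("login_success", "login_failure")        # one handler, two events
def audit(name: str, payload: Dict[str, Any]) -> str:
    return f"audit:{name}"


@on("login_failure")                          # second handler for the same event
def alert(name: str, payload: Dict[str, Any]) -> str:
    return f"alert:{payload['ip']}"


both = dispatch("login_failure", {"ip": "10.0.0.1"}, {"login_failure"})
off = dispatch("login_success", {}, set())    # handler exists, event is off
unhandled = dispatch("logout", {}, {"logout"})  # event is on, no handler written
```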
- Event processor 608 may be any suitable software module for implementing Figures 9 and 10.
- Filtering module 604 includes a conceptual event table that lists all the events that have been created in code module 602 and can be raised when the user activity corresponding to that event occurs.
- The events in this event table may be turned on or off, depending on the purpose. For example, if the task at hand is to monitor security, the security-related events would be turned on. The same would hold true for other tasks such as load balancing, cross-selling, dynamic decision making, etc. Events may be turned on or off through the use of a simple flag for each event or other similar technique.
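Turning on the events that belong to a given task can be modelled as applying a named profile to the per-event flags. The profile names and event names below are invented for illustration, and grouping events into profiles is itself an assumption; the text only requires a flag per event.

```python
from typing import Dict, Set

# Hypothetical mapping from a data mining task to the events it needs.
PROFILES: Dict[str, Set[str]] = {
    "security": {"login_failure", "ip_address"},
    "cross_selling": {"item_viewed"},
}


def apply_profile(event_table: Dict[str, bool], task: str) -> None:
    """Set each event's flag according to the chosen task."""
    wanted = PROFILES[task]
    for name in event_table:
        event_table[name] = name in wanted


event_table = {"login_failure": False, "ip_address": False, "item_viewed": True}
apply_profile(event_table, "security")
```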
- the filtering module may also be provided with logic to enable the module to make certain strategic decisions based on feedback received. An exemplary process by which an exemplary filtering module operates is provided in the flowcharts that will be described later in Figures 9 and 10.
FLOW DIAGRAMS FOR EVENT PROCESSING
- Figure 7 illustrates a process flow 700 of creating an event for use in data mining.
- A user activity is examined to determine if it is a candidate for data mining, that is, if that particular type of user activity is suitable for data collection. This is achieved by having the programmer analyze the primary business application module being written. If a user activity is deemed to be a suitable candidate for data mining, an event having a unique label is created for that user activity in 710, followed by the documenting of the existence of that user activity event by recordation of its name in 714. For example, if a login involves ten steps, some of these steps might be selected to create events.
- Some events that may be created during the login process include bi-state events such as login success or failure, or singular events such as user name, login time, IP address, etc.
- Other types of events may relate to specific business applications such as human resources, accounting, budgeting, planning, etc.
- Events related to human resources include: employee enrollment, time card entry, vacation request, benefits administration, etc.
- The programmer can be opportunistic and place raise event calls in every location where data that may be remotely interesting is generated; the programmer is only required to make judgment calls to eliminate repetitive or redundant data gathering. This eliminates the risk of omitting what might later be considered important data, without raising concerns of ending up with an overabundance of unnecessary data.
- The events are turned on and off dynamically using the event table of module 604.
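As a rough illustration of liberally placed raise event calls that are switched on and off dynamically, consider the following Python sketch. The function `raise_event`, the dictionary `enabled_events`, the `login` routine and the event names are hypothetical stand-ins, not part of the described system.

```python
# Hypothetical stand-in for the event table of module 604.
enabled_events = {"login_success": True, "login_failure": True, "user_name": False}
captured = []  # stand-in for whatever the handlers would collect


def raise_event(name, **data):
    """Raise-event call: does nothing unless the event is currently on."""
    if enabled_events.get(name, False):
        captured.append((name, data))


def login(user, password, stored_passwords):
    # Business logic with raise-event calls placed wherever data of
    # possible interest is generated.
    raise_event("user_name", user=user)
    ok = stored_passwords.get(user) == password
    raise_event("login_success" if ok else "login_failure", user=user)
    return ok


passwords = {"alice": "s3cret"}
login("alice", "s3cret", passwords)
login("alice", "wrong", passwords)

# Turn another event on at run time; login() itself is not modified.
enabled_events["user_name"] = True
login("alice", "s3cret", passwords)
```

Because the calls cost little when an event is off, the business code can contain many of them while only the selected events produce data.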
- Figure 8 illustrates a process flow 800 of setting up a listener for capturing a selected event by that listener.
- The process begins with an expert reviewing the list of documented events in 804. Then, based on knowledge of each domain, the expert selects an event that is appropriate for a particular business need in 810. At this point, the event may be turned on via its flag in the event table of module 604. Advantageously, the event may be toggled on or off at any future time through the event table, which decouples data collection from the actual business application software.
- Handler code for the selected event is written to handle that event should it be raised. The function of the handler code is primarily to respond to the raise event call and capture the data generated for later storage into an aggregator.
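One way to picture the relationship between selected events and their handler code is the Python sketch below. The decorator `handles`, the `dispatch` function, the in-memory `aggregator` list and the event names are assumptions made for illustration; in the described system the captured data would be stored in a separate aggregator.

```python
from collections import defaultdict

handlers = defaultdict(list)  # event name -> list of handler callables
aggregator = []               # in-memory stand-in for aggregator storage


def handles(*event_names):
    """Register the decorated function as a handler for each named event."""
    def decorator(func):
        for name in event_names:
            handlers[name].append(func)
        return func
    return decorator


@handles("login_success", "login_failure")
def capture_login(event_name, data):
    # One handler accommodating more than one event.
    aggregator.append({"event": event_name, **data})


@handles("login_failure")
def note_failure(event_name, data):
    # A second handler invoked by the same event.
    aggregator.append({"event": "failure_audit", "user": data["user"]})


def dispatch(event_name, data):
    # Events for which no handler was written are silently skipped.
    for handler in handlers.get(event_name, []):
        handler(event_name, data)


dispatch("login_failure", {"user": "bob", "ip": "192.0.2.7"})
dispatch("vacation_request", {"user": "bob"})  # no handler: nothing captured
```

The sketch covers the three cases mentioned for the handler module: a handler shared by several events, an event with several handlers, and an event with no handler at all.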
- FIG. 9 illustrates a process flow 900 of processing an event once that event is raised.
- Process flow 900 is controlled by the event processor 608 and begins when an event is raised in 904.
- A daemon thread is signaled as in 906, followed by the obtaining of an event identifier as in 908. Then, in 910, the event is looked up in the event table of filtering module 604 to determine if the event has been turned on, that is, if it has been chosen for data collection. There may also be other qualifiers besides a bi-state decision in the event table to determine whether data generated by the occurrence of a particular event will be collected.
- The other qualifiers may be incorporated in the policy module, and a simplified event table limited to a bi-state qualifier selection may be used for this event processing procedure.
- The decision is made in 914, wherein if the raised event has been turned on, the handler for that event is invoked in 916. Once the handler is invoked, it runs as in 918 and captures data to the aggregator as in 920 by transferring the data over a network connection from the server.
- The handler functions may also be expanded to include utilities other than capturing data, such as online data mining in response to the occurrence of a corresponding event.
- The daemon thread is suspended as in 922. If the event has not been turned on, the process flow proceeds from decision point 914 directly to 922 to suspend the daemon thread. Once the daemon thread is suspended, it waits for the next time it is signaled, for example, when another event is raised.
- The use of daemon threads in processing events allows for asynchronous performance of the event processor.
- A daemon thread takes over the task of determining whether the event is turned on, and if so, it invokes the corresponding handler for the event to perform handler functions such as capturing data, dumping the data to an aggregator, etc.
- The code that raised the event proceeds along its path to perform its primary business logic functions. If the event processor were instead designed as a synchronous machine, it would not be possible to achieve optimal performance by using multiple processors that run in parallel. If there is only a single processor, however, the system would perform as a synchronous machine without the capability to maximize performance, and the users would not be able to take advantage of the asynchronous characteristics of this inventive infrastructure.
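Process flow 900 can be approximated in Python with a daemon thread fed by a queue. This is only a sketch under stated assumptions: the queue stands in for the signal and suspend steps, the list `aggregator` stands in for a network transfer to an aggregator, and all names (`raise_event`, `event_processor`, `capture`, the event identifiers) are hypothetical.

```python
import queue
import threading

event_table = {"login_failure": True, "time_card_entry": False}
aggregator = []          # stand-in for data sent over the network (920)
pending = queue.Queue()  # raised events waiting for the daemon thread


def capture(event_id, data):
    aggregator.append((event_id, data))


handlers = {"login_failure": capture, "time_card_entry": capture}


def event_processor():
    while True:
        # The thread sleeps here until an event is raised (906), then
        # obtains the event identifier (908).
        event_id, data = pending.get()
        try:
            handler = handlers.get(event_id)
            # Look up the event in the table (910) and decide (914).
            if event_table.get(event_id, False) and handler is not None:
                handler(event_id, data)  # invoke and run handler (916-920)
        finally:
            pending.task_done()  # back to waiting (922)


threading.Thread(target=event_processor, daemon=True).start()


def raise_event(event_id, **data):
    # Returns immediately; the caller continues with its business logic.
    pending.put((event_id, data))


raise_event("login_failure", user="mallory")
raise_event("time_card_entry", user="alice", hours=8)
pending.join()  # demo only: wait until the daemon has drained the queue
```

The caller never waits on the handler, which is the asynchronous behavior the description attributes to the daemon thread.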
- Figure 10 illustrates a process flow 1000 to filter the process of collecting data by taking actions in response to events raised. This process may be used to discover irregularities in the use of the software or correlation amongst events.
- The filtering process is directed by the filtering/policy module 604, which includes the event table listing all the events that can be raised as well as whether they are on or off. The events may be toggled on and off manually, or the toggling process may be accomplished by automating the code.
- Filtering process 1000 provides an example of how such code automation may be accomplished. The process begins at 1004 when notification from the event processor signals that an event has been raised or that a handler has been invoked and is running in response to the event being raised.
- A counter specific to each type of event is provided, and this counter is incremented each time an event of that type occurs in 1010. If the count threshold for a particular event type is exceeded, as determined in 1014, some intervening acts may take place; for example, some events may be toggled on or off as in 1016, and the counters may be reset as in 1018. If the count threshold has not been exceeded, process flow 1000 comes to a halt as shown in 1022.
- The event processor sends a number of notifications of unsuccessful login events, which may indicate unscrupulous users trying to use illegal passwords to break into the system.
- The implemented counter has a threshold of five login failures. Once there are five consecutive login failures, the event for login IP address is toggled on and the machine starts to store the IP addresses that all subsequent logins originate from in order to capture the addresses of the unscrupulous users. At this point, the counter for login failures may be reset. Other unusual events that may be counted include repeated access failures, sequential scanning of large numbers of employee records, etc. Other response mechanisms in the filtering module other than toggling events on and off may include, by way of example, blocking logins from a suspect IP address, sending warning messages to alert responsible personnel such as a systems administrator about security risks, etc.
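The login-failure example of process flow 1000 might look like the following Python sketch. The class `LoginFailurePolicy` and the event names are hypothetical, and resetting the counter on a successful login is an assumption used here to model the word "consecutive"; the source does not say how a run of failures is broken.

```python
class LoginFailurePolicy:
    """Counts login failures and toggles an event on at a threshold."""

    def __init__(self, event_table, threshold=5):
        self.event_table = event_table
        self.threshold = threshold
        self.failures = 0

    def notify(self, event_name):
        if event_name == "login_success":
            # Assumption: a success breaks the run of consecutive failures.
            self.failures = 0
            return
        if event_name != "login_failure":
            return
        self.failures += 1                    # increment counter (1010)
        if self.failures >= self.threshold:   # threshold check (1014)
            # Start collecting the IP address of subsequent logins (1016).
            self.event_table["login_ip_address"] = True
            self.failures = 0                 # reset counter (1018)


event_table = {"login_failure": True, "login_success": True,
               "login_ip_address": False}
policy = LoginFailurePolicy(event_table)

# Four failures, a success, then four more: never five in a row.
for name in ["login_failure"] * 4 + ["login_success"] + ["login_failure"] * 4:
    policy.notify(name)
ip_on_after_broken_run = event_table["login_ip_address"]

policy.notify("login_failure")  # fifth consecutive failure
```

Other responses mentioned in the description, such as blocking an address or alerting an administrator, would be added at the same point where the event is toggled on.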
- The improved data collection techniques have many inherent advantages which solve many of the problems encountered with earlier methods, for example, shortage in storage space, processing bottlenecks, and inflexibility in the selection of data to be collected.
- An important advantage is that the improved approach allows the business logic and the data collection to operate independently of one another.
- The improved data collection techniques also allow for flexibility in selecting what data is to be collected, usage of aggregators to isolate storage capability so that the data collection process does not affect the regular functions of the servers, and expansion capabilities in terms of data storage, among other advantages.
COMPUTER SYSTEM EMBODIMENT
- Figures 11 and 12 illustrate a computer system 900 suitable for implementing embodiments of the present invention.
- Figure 11 shows one possible physical form of the computer system.
- The computer system may have many physical forms, ranging from an integrated circuit, a printed circuit board and a small handheld device up to a huge super computer.
- Computer system 900 includes a monitor 902, a display 904, a housing 906, a disk drive 908, a keyboard 910 and a mouse 912.
- Disk 914 is a computer-readable medium used to transfer data to and from computer system 900.
- FIG. 12 is an example of a block diagram for computer system 900. Attached to system bus 920 are a wide variety of subsystems.
- Processor(s) 922 are also referred to as central processing units, or CPUs.
- Memory 924 includes random access memory (RAM) and read-only memory (ROM).
- A fixed disk 926 is also coupled bi-directionally to CPU 922; it provides additional data storage capacity and may also include any of the computer-readable media described below.
- Fixed disk 926 may be used to store programs, data and the like and is typically a secondary storage medium (such as a hard disk) that is slower than primary storage. It will be appreciated that the information retained within fixed disk 926, may, in appropriate cases, be incorporated in standard fashion as virtual memory in memory 924.
- Removable disk 914 may take the form of any of the computer-readable media described below.
- CPU 922 is also coupled to a variety of input/output devices such as display 904, keyboard 910, mouse 912 and speakers 930.
- An input/output device may be any of: video displays, track balls, mice, keyboards, microphones, touch-sensitive displays, transducer card readers, magnetic or paper tape readers, tablets, styluses, voice or handwriting recognizers, biometrics readers, or other computers.
- CPU 922 optionally may be coupled to another computer or telecommunications network using network interface 940. With such a network interface, it is contemplated that the CPU might receive information from the network, or might output information to the network in the course of performing the above-described method steps.
- Method embodiments of the present invention may execute solely upon CPU 922 or may execute over a network such as the Internet in conjunction with a remote CPU that shares a portion of the processing.
- Embodiments of the present invention further relate to computer storage products with a computer-readable medium that has computer code thereon for performing various computer-implemented operations.
- The media and computer code may be those specially designed and constructed for the purposes of the present invention, or they may be of the kind well known and available to those having skill in the computer software arts.
- Examples of computer-readable media include, but are not limited to: magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and holographic devices; magneto-optical media such as floptical disks; and hardware devices that are specially configured to store and execute program code, such as application-specific integrated circuits (ASICs), programmable logic devices (PLDs) and ROM and RAM devices.
- Examples of computer code include machine code, such as produced by a compiler, and files containing higher level code that are executed by a computer using an interpreter.
- The computer infrastructure is by no means limited to what is described. In fact, it may include servers that also generate data for data mining but are functionally different from the servers provided in the above examples.
- The aggregator network may have a wide variety of topologies; for example, the aggregators at a certain level may be linked to multiple aggregator intelligent directors. Accordingly, the present embodiments are to be considered as illustrative and not restrictive, and the invention is not to be limited to the details given herein, but may be modified within the scope and equivalents of the appended claims.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU59071/00A AU5907100A (en) | 1999-06-30 | 2000-06-29 | Data aggregator architecture for data mining |
Applications Claiming Priority (6)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US34525999A | 1999-06-30 | 1999-06-30 | |
US34517099A | 1999-06-30 | 1999-06-30 | |
US34522599A | 1999-06-30 | 1999-06-30 | |
US09/345,259 | 1999-06-30 | ||
US09/345,170 | 1999-06-30 | ||
US09/345,225 | 1999-06-30 |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2001001297A1 (en) | 2001-01-04 |
WO2001001297A9 (en) | 2002-07-25 |
Family
ID=27407675
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2000/018183 WO2001001297A1 (en) | 1999-06-30 | 2000-06-29 | Data aggregator architecture for data mining |
Country Status (2)
Country | Link |
---|---|
AU (1) | AU5907100A (en) |
WO (1) | WO2001001297A1 (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5446885A (en) * | 1992-05-15 | 1995-08-29 | International Business Machines Corporation | Event driven management information system with rule-based applications structure stored in a relational database |
US5873095A (en) * | 1996-08-12 | 1999-02-16 | Electronic Data Systems Corporation | System and method for maintaining current status of employees in a work force |
US5890150A (en) * | 1997-01-24 | 1999-03-30 | Hitachi, Ltd. | Random sampling method for use in a database processing system and a database processing system based thereon |
-
2000
- 2000-06-29 AU AU59071/00A patent/AU5907100A/en not_active Abandoned
- 2000-06-29 WO PCT/US2000/018183 patent/WO2001001297A1/en active Application Filing
Also Published As
Publication number | Publication date |
---|---|
AU5907100A (en) | 2001-01-31 |
WO2001001297A9 (en) | 2002-07-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11656915B2 (en) | Virtual systems management | |
CA2503987C (en) | System and method for performance management in a multi-tier computing environment | |
US7743142B2 (en) | Verifying resource functionality before use by a grid job submitted to a grid environment | |
Appleby et al. | Oceano-SLA based management of a computing utility | |
US7296268B2 (en) | Dynamic monitor and controller of availability of a load-balancing cluster | |
US7711845B2 (en) | Apparatus, method and system for improving application performance across a communications network | |
US7788375B2 (en) | Coordinating the monitoring, management, and prediction of unintended changes within a grid environment | |
CN102075556B (en) | Method for designing service architecture with large-scale loading capacity | |
Meng et al. | State monitoring in cloud datacenters | |
US20030212643A1 (en) | System and method to combine a product database with an existing enterprise to model best usage of funds for the enterprise | |
JP2007500908A (en) | Method, apparatus, and program for autonomic failover | |
CN105847237A (en) | Safety management method and device based on NFV (Network Function Virtualization) | |
CN111800484B (en) | Service anti-destruction replacing method for mobile edge information service system | |
WO2001001297A1 (en) | Data aggregator architecture for data mining | |
JP2004164610A (en) | Management device | |
CN110266564A (en) | The method of the method and control device and its execution of detection device and its execution | |
CN117573479B (en) | A condition monitoring method and system architecture for multi-source targets of information systems | |
CN115550365B (en) | A processing method and electronic device | |
CN118413536B (en) | A resource processing method and device based on edge computing | |
Kim et al. | Highly available and efficient load cluster management system using SNMP and Web | |
CN118101521A (en) | Network connectivity status detection method, device and storage medium | |
WO2024235296A1 (en) | Application autoscaling method and system | |
WO2025057236A1 (en) | Method and system for distributing a traffic load using an interface | |
CN120128542A (en) | A traffic distribution system based on redundant switches | |
Haney et al. | Load-balancing for mysql |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states |
Kind code of ref document: A1 Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG UZ VN YU ZA ZW |
|
AL | Designated countries for regional patents |
Kind code of ref document: A1 Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101) | ||
REG | Reference to national code |
Ref country code: DE Ref legal event code: 8642 |
|
AK | Designated states |
Kind code of ref document: C2 Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG UZ VN YU ZA ZW |
|
AL | Designated countries for regional patents |
Kind code of ref document: C2 Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG |
|
COP | Corrected version of pamphlet |
Free format text: PAGES 1/11-11/11, DRAWINGS, REPLACED BY NEW PAGES 1/11-11/11; DUE TO LATE TRANSMITTAL BY THE RECEIVING OFFICE |
|
122 | Ep: pct application non-entry in european phase | ||
NENP | Non-entry into the national phase |
Ref country code: JP |