Fault root cause analysis method and analysis apparatus
Technical Field
The present invention relates to the fields of data mining and network management, and in particular to a fault root cause analysis method and an analysis apparatus.
Background
With the development of network technology, broadband routers are increasingly widely deployed and play a critical role in networks. However, faults inevitably occur while a broadband router is running. If the cause of a fault is not determined and the fault is not cleared in time, the network may be temporarily interrupted, causing inconvenience and losses to enterprises. It is therefore necessary to determine the cause of a network fault promptly and to clear broadband router faults.
Because the network logs generated by a broadband router contain most of the information related to its operation, technical personnel can locate the cause of a broadband router fault (that is, the fault root cause) by analyzing these logs. However, in the course of making the present invention, the inventors found that current fault root cause analysis mostly relies on manual log analysis. This involves extensive human participation and consumes a great deal of manpower and time, and locating the root cause of a network fault also requires considerable expertise. As a result, root cause analysis is inefficient and network faults cannot be cleared quickly and promptly.
Summary of the Invention
To solve this problem, embodiments of the present invention provide a fault root cause analysis method and an analysis apparatus, addressing the problem that existing fault root cause analysis requires large amounts of manpower and time to analyze network logs, which results in low analysis efficiency and an inability to clear network faults promptly.
To achieve the above objective, the embodiments of the present invention adopt the following technical solutions:
According to a first aspect, an embodiment of the present invention provides a fault root cause analysis method, performed by an analysis apparatus. The method may include:
Determine a fault time point of a network device;
Obtain a first log information set generated by the network device within a first time period, where the first log information set includes M classes of log information, M is an integer greater than or equal to 1, and the first time period runs from a first moment before the fault time point to a second moment after the fault time point;
Analyze each of the M classes of log information according to a preset analysis strategy to obtain N classes of root cause logs among the M classes, where the N classes of root cause logs are the log information generated when the network device fails, M ≥ N ≥ 1, and the preset analysis strategy is a predetermined rule describing how logs appear when the network device fails;
Determine the cause of the network device failure according to the N classes of root cause logs.
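The following Python sketch illustrates the first two steps under stated assumptions: each log entry is assumed to carry a timestamp and an already-extracted log type (for example a message template ID), and the offsets before and after that define the first and second moments are hypothetical parameters, since the method leaves their values open. LogEntry and first_log_set are illustrative names, not part of the described apparatus.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class LogEntry:
    timestamp: datetime
    log_type: str  # assumed: class of the log message, e.g. a template ID
    text: str = ""


def first_log_set(logs, fault_time, before, after):
    """Group the logs in [fault_time - before, fault_time + after] by log type.

    Each key of the returned dict is one of the M log classes of the
    first log information set.
    """
    start, end = fault_time - before, fault_time + after
    grouped = defaultdict(list)
    for entry in logs:
        if start <= entry.timestamp <= end:
            grouped[entry.log_type].append(entry)
    return dict(grouped)


# Usage with hypothetical log types "A" and "B":
t0 = datetime(2020, 1, 1, 12, 0)
logs = [
    LogEntry(t0 - timedelta(minutes=90), "A"),  # outside the first time period
    LogEntry(t0 - timedelta(minutes=5), "B"),
    LogEntry(t0 + timedelta(minutes=3), "B"),
]
groups = first_log_set(logs, t0, timedelta(minutes=30), timedelta(minutes=10))
```

Only class "B" falls inside the window here, so M = 1 for this toy input.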
Because the network device may generate at least one class of log information (root cause logs) when a fault occurs, and because these log classes show clear characteristic patterns near the fault time point, the inventors analyzed in advance, drawing on extensive prior experience, the log information generated near a large number of fault time points, and mined the following characteristic patterns of root cause logs: (1) the classes of log information generated when a fault occurs usually appear together, repeatedly and continuously, near the fault point; (2) a class of log information generated when a fault occurs usually appears frequently over a long time period and increases abruptly at the fault time point.
Therefore, in one possible implementation of the first aspect, analyzing each of the M classes of log information according to the preset analysis strategy to obtain the N classes of root cause logs among the M classes may include:
Divide the M log types corresponding to the M classes of log information into i different log combinations, where each log combination contains at least one of the M log types, no two log combinations contain the same set of log types, and i is an integer greater than or equal to 1;
Traverse the i log combinations to determine at least one root cause log combination among them, where a root cause log combination is a log combination that appears frequently and persistently within the first time period;
Process the at least one root cause log combination;
Determine the at least one class of log information corresponding to the processed root cause log combinations as the N classes of root cause logs.
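One straightforward, assumed way to form the i distinct log combinations is to enumerate every non-empty subset of the M log types, which gives i = 2^M - 1 combinations. The sketch below does that with itertools.combinations; the max_size cap is a hypothetical addition for keeping the enumeration tractable when M is large, and the method itself does not prescribe how the combinations are chosen.

```python
from itertools import chain, combinations


def log_combinations(log_types, max_size=None):
    """Enumerate distinct, non-empty log combinations over the given log types.

    Without max_size this returns all 2**M - 1 subsets of the M types;
    max_size (an assumed knob) limits how many types one combination may hold.
    """
    types = sorted(set(log_types))
    limit = len(types) if max_size is None else min(max_size, len(types))
    return [
        frozenset(combo)
        for combo in chain.from_iterable(
            combinations(types, k) for k in range(1, limit + 1)
        )
    ]
```

For three log types this yields seven combinations; capping max_size at 2 drops the single three-type combination.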
Optionally, for any log combination, determining that the log combination is a root cause log combination may include:
Divide the first time period into at least one time window, and divide each time window into at least one small time window;
Calculate a first frequency with which the log combination appears in a given time window, where the first frequency is the ratio of the number of small time windows in that time window in which the log combination appears to the total number of small time windows in that time window;
If the first frequency is greater than a first preset threshold, determine that the log combination is a frequent log combination in that time window;
Calculate a second frequency with which the frequent log combination appears within the first time period, where the second frequency is the ratio of the number of time windows in the first time period in which the log combination is frequent to the total number of time windows in the first time period;
If the second frequency is greater than a second preset threshold, determine that the frequent log combination is a root cause log combination.
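A minimal sketch of the first-frequency and second-frequency test for one log combination follows. It assumes each small time window has already been reduced to the set of log types observed in it, and it interprets "the combination appears" as every log type of the combination appearing at least once in the small window; both the input layout and that interpretation are assumptions. The function names are hypothetical.

```python
def occurs(combo, types_seen):
    # Assumed reading: the combination appears only if all of its types appear.
    return combo <= types_seen


def is_root_cause_combination(combo, windows, theta1, theta2):
    """Return (is_root_cause, second_frequency) for one log combination.

    windows: list of time windows; each time window is a list of small time
    windows, and each small time window is the set of log types seen in it.
    theta1, theta2: first and second preset thresholds.
    """
    frequent_windows = 0
    for window in windows:
        hits = sum(1 for small in window if occurs(combo, small))
        first_freq = hits / len(window)  # first frequency in this time window
        if first_freq > theta1:  # frequent log combination in this window
            frequent_windows += 1
    second_freq = frequent_windows / len(windows)  # second frequency
    return second_freq > theta2, second_freq
```

Keeping the second frequency in the return value is a design choice that makes the later pruning step, which compares second frequencies, straightforward.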
Optionally, processing the at least one root cause log combination may include:
If the at least one root cause log combination determined by traversing the i log combinations includes a first root cause log combination and a second root cause log combination, and the first root cause log combination is contained in the second root cause log combination, processing the at least one root cause log combination includes:
When the second frequency of the first root cause log combination is greater than that of the second root cause log combination, retaining the first root cause log combination;
When the second frequency of the first root cause log combination is less than that of the second root cause log combination, removing the first root cause log combination.
Alternatively, if the at least one root cause log combination determined by traversing the i log combinations includes a third root cause log combination, and the third root cause log combination was already a root cause log combination before the fault time point, processing the at least one root cause log combination includes:
Removing the third root cause log combination.
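The two processing rules can be sketched as a filter over the candidate root cause log combinations, each mapped to its second frequency. This is an illustrative sketch: the text does not say what to do when the two second frequencies are equal, and here such a subset is kept. The function name and the pre_fault_root_causes input (combinations found by running the same test on a period before the fault) are hypothetical.

```python
def prune_root_cause_combinations(candidates, pre_fault_root_causes=frozenset()):
    """Apply the two processing rules to candidate root cause combinations.

    candidates: dict mapping a combination (frozenset of log types) to its
    second frequency.
    pre_fault_root_causes: combinations that were already root cause
    combinations before the fault time point.
    """
    kept = {}
    for combo, freq in candidates.items():
        # Third-combination rule: drop combinations that predate the fault.
        if combo in pre_fault_root_causes:
            continue
        # First/second-combination rule: drop a combination strictly contained
        # in another candidate that has a higher second frequency.
        dominated = any(
            combo < other and freq < other_freq
            for other, other_freq in candidates.items()
        )
        if not dominated:
            kept[combo] = freq
    return kept
```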
It should be noted that, in the embodiments of the present invention, a log combination is a combination containing at least one log type, and a time window is a time interval. When the time windows are divided, the time windows may be of equal or unequal size, and likewise the small time windows may be of equal or unequal size. A log combination appearing in a time window means that the log information corresponding to the log types contained in the combination appeared within the time interval corresponding to that time window.
The first and second preset thresholds can be set as required; the embodiments of the present invention impose no limitation on them. If the first frequency of a log combination is greater than the first preset threshold, the logs in the combination appear frequently during some period, and the combination is determined to be a frequent log combination. If its first frequency is less than or equal to the first preset threshold, the corresponding log information can be regarded as log information generated during normal operation of the network device. If the second frequency of a frequent log combination is greater than the second preset threshold, the logs that appear frequently during some period keep appearing frequently throughout the first time period; that is, the frequent log combination appears repeatedly and without interruption within the first time period, which matches the pattern of root cause logs. The frequent log combination is therefore determined to be a root cause log combination, and the log information corresponding to its log types is determined to be log information generated when the network device fails. If the second frequency of a frequent log combination is less than or equal to the second preset threshold, the logs that appear frequently during some period do not persist within the first time period, and the corresponding log information can likewise be regarded as log information generated during normal operation of the network device.
In addition, by analyzing a large number of fault-related logs and normal logs, the inventors found that normal logs tend to appear periodically, are relatively evenly distributed, and appear frequently throughout the whole log. Fault root cause logs, by contrast, increase abruptly around the fault point but almost never appear in the logs corresponding to non-fault patterns. This is consistent with the information-theoretic principle that the more frequently an event occurs, the less information it carries. Therefore, in another possible implementation of the first aspect, analyzing each of the M classes of log information according to the preset analysis strategy to obtain the N classes of root cause logs among the M classes may include:
Determine M outlier values in one-to-one correspondence with the M classes of log information, where an outlier value represents how frequently a class of log information appears within a second time period and its degree of abrupt change (mutation), and the second time period includes the first time period;
Select the N largest of the M outlier values, and determine the N classes of log information in the first log information set corresponding to these N largest outlier values as the N classes of root cause logs.
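Selecting the N largest outlier values is a standard top-k query. The sketch below assumes the outlier values were computed for the R log classes of the second time period and keeps only the M classes present in the first log information set before picking the N largest; the function name is hypothetical.

```python
import heapq


def top_n_root_cause_logs(outliers, first_set_types, n):
    """Return the N log classes of the first log set with the largest outliers.

    outliers: dict mapping each of the R log classes to its outlier value.
    first_set_types: the M log classes present in the first log information set.
    """
    candidates = {t: outliers[t] for t in first_set_types if t in outliers}
    return heapq.nlargest(n, candidates, key=candidates.get)
```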
Optionally, a second log information set generated by the network device within the second time period may be obtained, where the second log information set includes at least one piece of log information and each piece of log information corresponds to a time point;
Preprocess the second log information set to obtain a first log behavior matrix, where the first log behavior matrix includes Q log behavior vectors, each log behavior vector covers one time interval and contains R elements, R is the number of log types in the second log information set, and R ≥ M; the j-th element of a log behavior vector is the number of log entries of the j-th class within the time interval of that vector;
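The preprocessing step can be sketched as bucketing timestamped log entries into Q consecutive intervals and counting each log type per interval. Equal-length intervals and the (timestamp, log_type) input format are assumptions made for brevity; the method also allows intervals of unequal length. The function name is hypothetical.

```python
from datetime import datetime, timedelta


def log_behavior_matrix(entries, start, interval, q):
    """Build a Q x R count matrix from (timestamp, log_type) pairs.

    Row k is the log behavior vector of [start + k*interval,
    start + (k+1)*interval); column j counts the j-th log type.
    """
    types = sorted({log_type for _, log_type in entries})  # the R log types
    col = {t: j for j, t in enumerate(types)}
    matrix = [[0] * len(types) for _ in range(q)]
    for ts, log_type in entries:
        k = (ts - start) // interval  # floor division of timedeltas gives an int
        if 0 <= k < q:
            matrix[k][col[log_type]] += 1
    return types, matrix
```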
Calculate the outlier value of each of the R classes of log information according to a formula composed of a frequency term and a mutation term (defined below), obtaining R outlier values in one-to-one correspondence with the R classes of log information;
Obtain, from the R outlier values, the M outlier values in one-to-one correspondence with the M classes of log information.
The time interval is relatively long, typically tens of minutes, and the time intervals of different log behavior vectors may be equal or unequal.
In the formula, the frequency term represents how frequently the j-th of the R classes of log information appears in the second log information set, and the mutation term represents the degree of abrupt change of the j-th class in the second log information set; $q_j$ is the number of log behavior vectors that contain the j-th class of log information, $c_{k+1,j}$ is the total number of log entries of the j-th class in the (k+1)-th time interval, and $c_{k,j}$ is the total number of log entries of the j-th class in the k-th time interval.
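Because the formula itself is not reproduced here, the following sketch uses an assumed stand-in that is merely consistent with the quantities defined above: a frequency term log(Q / q_j), which is large for log classes that appear in few log behavior vectors (in line with the information-theoretic argument), multiplied by a mutation term equal to the largest increase c_{k+1,j} - c_{k,j} between adjacent intervals. This is not the formula of the method; it only shows how such a score can be computed from the log behavior matrix. The function name is hypothetical.

```python
import math


def outlier_values(types, matrix):
    """Hypothetical outlier value per log type from a Q x R behavior matrix.

    ASSUMED form (not the method's own formula):
        score_j = log(Q / q_j) * max(0, max_k (c[k+1][j] - c[k][j]))
    """
    q_total = len(matrix)  # Q
    scores = {}
    for j, log_type in enumerate(types):
        column = [row[j] for row in matrix]  # c[k][j] for k = 0..Q-1
        q_j = sum(1 for c in column if c > 0)  # vectors containing type j
        rarity = math.log(q_total / q_j) if q_j else 0.0  # frequency term
        jumps = [column[k + 1] - column[k] for k in range(len(column) - 1)]
        burst = max([0] + jumps)  # largest increase between adjacent intervals
        scores[log_type] = rarity * burst
    return scores
```

With this assumed form, a log class that appears in every interval gets log(Q / Q) = 0 and therefore a zero score regardless of its volume, mirroring the observation that periodic normal logs carry little information.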
It should be noted that the two approaches above for obtaining the N classes of root cause logs among the M classes of log information may be performed separately or in combination, to locate the exact cause of a network fault more accurately.
Finally, because each piece of log information records an activity of the network device at a point in time, the N classes of root cause logs can be obtained directly and the records they contain used as the cause of the network device failure. Alternatively, an existing analysis method can be applied to the N classes of root cause logs to determine the most fundamental failure cause that led to them.
In addition, on the basis of obtaining the N classes of root cause logs with the first approach, each log type in the merged set of log types can be recorded together with the number of times it appears in the at least one root cause log combination, and the log information corresponding to the log type with the highest count can be taken directly as the fundamental cause of the network device failure.
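Counting how often each log type occurs across the processed root cause log combinations can be sketched with collections.Counter. The function name is hypothetical, and ties are broken arbitrarily because the text does not specify a tie rule.

```python
from collections import Counter


def most_frequent_log_type(root_cause_combinations):
    """Return the log type that appears in the most root cause combinations.

    Returns None when there are no combinations; ties are broken arbitrarily.
    """
    counts = Counter(t for combo in root_cause_combinations for t in combo)
    if not counts:
        return None
    log_type, _ = counts.most_common(1)[0]
    return log_type
```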
According to a second aspect, an embodiment of the present invention further provides an analysis apparatus for analyzing the fault root cause of a network device. The analysis apparatus may include:
A determining unit, configured to determine a fault time point of the network device;
An acquiring unit, configured to obtain a first log information set generated by the network device within a first time period, where the first log information set includes M classes of log information, M is an integer greater than or equal to 1, and the first time period runs from a first moment before the fault time point to a second moment after the fault time point;
An analysis unit, configured to analyze, according to a preset analysis strategy, each of the M classes of log information obtained by the acquiring unit to obtain N classes of root cause logs among the M classes, where the N classes of root cause logs are the log information generated when the network device fails, M ≥ N ≥ 1, and the preset analysis strategy is a predetermined rule describing how logs appear when the network device fails;
The determining unit is further configured to determine the cause of the network device failure according to the N classes of root cause logs.
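As a hypothetical software decomposition of the second-aspect apparatus, each unit can be modelled as an injected callable, so that either analysis strategy can be plugged into the analysis unit. The class and field names are illustrative assumptions, and the unit interfaces (a float fault time, log types as strings) are simplified for the sketch.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class AnalysisApparatus:
    """Hypothetical mapping of the apparatus units onto injected callables."""

    determine_fault_time: Callable[[], float]  # determining unit, first role
    acquire_logs: Callable[[float], Sequence[str]]  # acquiring unit
    analyze: Callable[[Sequence[str]], List[str]]  # analysis unit
    explain: Callable[[List[str]], str]  # determining unit, second role

    def run(self) -> str:
        fault_time = self.determine_fault_time()
        first_log_set = self.acquire_logs(fault_time)
        root_cause_logs = self.analyze(first_log_set)
        return self.explain(root_cause_logs)
```

Injecting the units keeps the orchestration fixed while allowing the log-combination strategy, the outlier-value strategy, or both to be swapped in as the analyze callable.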
Because the network device may generate at least one class of log information (root cause logs) when a fault occurs, and because these log classes show clear characteristic patterns near the fault time point, the inventors analyzed in advance, drawing on extensive prior experience, the log information generated near a large number of fault time points, and mined the following characteristic patterns of root cause logs: (1) the classes of log information generated when a fault occurs usually appear together, repeatedly and continuously, near the fault point; (2) a class of log information generated when a fault occurs usually appears frequently over a long time period and increases abruptly at the fault time point.
Therefore, in one possible implementation of the second aspect, the analysis unit may be configured to:
Divide the M log types corresponding to the M classes of log information into i different log combinations, where each log combination contains at least one of the M log types, no two log combinations contain the same set of log types, and i is an integer greater than or equal to 1;
Traverse the i log combinations to determine at least one root cause log combination among them, where a root cause log combination is a log combination that appears frequently and persistently within the first time period;
Process the at least one root cause log combination;
Determine the at least one class of log information corresponding to the processed root cause log combinations as the N classes of root cause logs.
For any log combination, the analysis unit may be configured to:
Divide the first time period into at least one time window, and divide each time window into at least one small time window;
Calculate a first frequency with which the log combination appears in a given time window, where the first frequency is the ratio of the number of small time windows in that time window in which the log combination appears to the total number of small time windows in that time window;
If the first frequency is greater than a first preset threshold, determine that the log combination is a frequent log combination in that time window;
Calculate a second frequency with which the frequent log combination appears within the first time period, where the second frequency is the ratio of the number of time windows in the first time period in which the log combination is frequent to the total number of time windows in the first time period;
If the second frequency is greater than a second preset threshold, determine that the frequent log combination is a root cause log combination.
It should be noted that, in the embodiments of the present invention, a log combination is a combination containing at least one log type, and a time window is a time interval. When the time windows are divided, the time windows may be of equal or unequal size, and likewise the small time windows may be of equal or unequal size. A log combination appearing in a time window means that the log information corresponding to the log types contained in the combination appeared within the time interval corresponding to that time window.
The first and second preset thresholds can be set as required; the embodiments of the present invention impose no limitation on them. If the first frequency of a log combination is greater than the first preset threshold, the logs in the combination appear frequently during some period, and the combination is determined to be a frequent log combination. If its first frequency is less than or equal to the first preset threshold, the corresponding log information can be regarded as log information generated during normal operation of the network device. If the second frequency of a frequent log combination is greater than the second preset threshold, the logs that appear frequently during some period keep appearing frequently throughout the first time period; that is, the frequent log combination appears repeatedly and without interruption within the first time period, which matches the pattern of root cause logs. The frequent log combination is therefore determined to be a root cause log combination, and the log information corresponding to its log types is determined to be log information generated when the network device fails. If the second frequency of a frequent log combination is less than or equal to the second preset threshold, the logs that appear frequently during some period do not persist within the first time period, and the corresponding log information can likewise be regarded as log information generated during normal operation of the network device.
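To make the threshold logic concrete, here is a small worked example with made-up counts (not measured data): a first time period split into four time windows of five small time windows each, with hypothetical first and second preset thresholds of 0.6 and 0.5.

```python
# Worked example with made-up counts (illustrative only, not measured data).
small_windows_per_window = 5
hits_per_window = [4, 5, 1, 4]  # small windows in which the combination appears
theta1, theta2 = 0.6, 0.5  # hypothetical first and second preset thresholds

first_freqs = [h / small_windows_per_window for h in hits_per_window]
frequent = [f > theta1 for f in first_freqs]  # frequent in each time window?
second_freq = sum(frequent) / len(frequent)

print(first_freqs)  # [0.8, 1.0, 0.2, 0.8]
print(second_freq)  # 0.75
print("root cause" if second_freq > theta2 else "normal")  # root cause
```

Three of the four windows exceed the first threshold, so the combination is both frequent and persistent and is classified as a root cause log combination; had only one window exceeded it, the second frequency of 0.25 would mark the logs as normal.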
In addition, by analyzing a large number of fault-related logs and normal logs, the inventors found that normal logs tend to appear periodically, are relatively evenly distributed, and appear frequently throughout the whole log. Fault root cause logs, by contrast, increase abruptly around the fault point but almost never appear in the logs corresponding to non-fault patterns. This is consistent with the information-theoretic principle that the more frequently an event occurs, the less information it carries. Therefore, in another possible implementation of the second aspect, the analysis unit may be configured to:
Determine M outlier values in one-to-one correspondence with the M classes of log information, where an outlier value represents how frequently a class of log information appears within a second time period and its degree of abrupt change (mutation), and the second time period includes the first time period;
The analysis unit is further configured to select the N largest of the M outlier values, and determine the N classes of log information in the first log information set corresponding to these N largest outlier values as the N classes of root cause logs.
Optionally, a second log information set generated by the network device within the second time period may be obtained, where the second log information set includes at least one piece of log information and each piece of log information corresponds to a time point;
Preprocess the second log information set to obtain a first log behavior matrix, where the first log behavior matrix includes Q log behavior vectors, each log behavior vector covers one time interval and contains R elements, R is the number of log types in the second log information set, and R ≥ M; the j-th element of a log behavior vector is the number of log entries of the j-th class within the time interval of that vector;
Calculate the outlier value of each of the R classes of log information according to a formula composed of a frequency term and a mutation term (defined below), obtaining R outlier values in one-to-one correspondence with the R classes of log information;
Obtain, from the R outlier values, the M outlier values in one-to-one correspondence with the M classes of log information.
The time interval is relatively long, typically tens of minutes, and the time intervals of different log behavior vectors may be equal or unequal.
In the formula, the frequency term represents how frequently the j-th of the R classes of log information appears in the second log information set, and the mutation term represents the degree of abrupt change of the j-th class in the second log information set; $q_j$ is the number of log behavior vectors that contain the j-th class of log information, $c_{k+1,j}$ is the total number of log entries of the j-th class in the (k+1)-th time interval, and $c_{k,j}$ is the total number of log entries of the j-th class in the k-th time interval.
It should be noted that the two approaches above for obtaining the N classes of root cause logs among the M classes of log information may be performed separately or in combination, to locate the exact cause of a network fault more accurately.
Finally, because each piece of log information records an activity of the network device at a point in time, the determining unit may be configured to:
Directly obtain the N classes of root cause logs and use the records they contain as the cause of the network device failure;
Alternatively, apply an existing analysis method to the N classes of root cause logs to determine the most fundamental failure cause that led to them;
Alternatively, on the basis of the analysis unit obtaining the N classes of root cause logs with the first approach, record each log type in the merged set of log types together with the number of times it appears in the at least one root cause log combination, and take the log information corresponding to the log type with the highest count directly as the fundamental cause of the network device failure.
According to a third aspect, an embodiment of the present invention further provides an analysis apparatus for analyzing the fault root cause of a network device. The analysis apparatus may include:
A processor, configured to determine a fault time point of the network device;
A receiver, configured to obtain a first log information set generated by the network device within a first time period, where the first log information set includes M classes of log information, M is an integer greater than or equal to 1, and the first time period runs from a first moment before the fault time point to a second moment after the fault time point;
The processor is configured to analyze, according to a preset analysis strategy, each of the M classes of log information obtained by the receiver to obtain N classes of root cause logs among the M classes, where the N classes of root cause logs are the log information generated when the network device fails, M ≥ N ≥ 1, and the preset analysis strategy is a predetermined rule describing how logs appear when the network device fails;
The processor is further configured to determine the cause of the network device failure according to the N classes of root cause logs.
Because the network device may generate at least one class of log information (root cause logs) when a fault occurs, and because these log classes show clear characteristic patterns near the fault time point, the inventors analyzed in advance, drawing on extensive prior experience, the log information generated near a large number of fault time points, and mined the following characteristic patterns of root cause logs: (1) the classes of log information generated when a fault occurs usually appear together, repeatedly and continuously, near the fault point; (2) a class of log information generated when a fault occurs usually appears frequently over a long time period and increases abruptly at the fault time point.
Therefore, in one possible implementation of the third aspect, the processor may be configured to:
Divide the M log types corresponding to the M classes of log information into i different log combinations, where each log combination contains at least one of the M log types, no two log combinations contain the same set of log types, and i is an integer greater than or equal to 1;
Traverse the i log combinations to determine at least one root cause log combination among them, where a root cause log combination is a log combination that appears frequently and persistently within the first time period;
Process the at least one root cause log combination;
Determine the at least one class of log information corresponding to the processed root cause log combinations as the N classes of root cause logs.
For any log combination, the processor may be configured to:
Divide the first time period into at least one time window, and divide each of the at least one time window into at least one small time window;
Calculate a first frequency of the log combination in any given time window, the first frequency being the ratio of the number of small time windows in that time window in which the log combination occurs to the total number of small time windows in that time window;
If the first frequency is greater than a first preset threshold, determine that the log combination is a frequent log combination in that time window;
Calculate a second frequency of the frequent log combination in the first time period, the second frequency being the ratio of the number of time windows in the first time period in which the frequent log combination occurs to the total number of time windows in the first time period;
If the second frequency is greater than a second preset threshold, determine that the frequent log combination is a root-cause log combination.
It should be noted that, in the embodiments of the present invention, a log combination is a combination of at least one log type, and a time window is a time interval. When the time windows are divided, the time windows may or may not be of equal size, and likewise the small time windows may or may not be of equal size. A log combination is said to occur in a time window when log information of the log types contained in the combination occurs within the time interval corresponding to that time window.
First predetermined threshold value and the second predetermined threshold value can be arranged as required, the embodiment of the present invention does not limit this, if the first frequency of a daily record combination correspondence is greater than the first predetermined threshold value, then represent in this daily record combination of sets and at a time frequently occur, be defined as frequent daily record combination, if first frequency corresponding to this daily record combination is less than or equal to the first predetermined threshold value, can think log information corresponding to this daily record combination be the network equipment normal time some log informations of producing, if the second frequency of frequent daily record combination is greater than the second predetermined threshold value, then represent that at a time frequent interior the continuing of first time period that aim at day occurred frequently occurs, namely this frequent daily record is combined as in first time period and repeats and the uninterrupted daily record occurred, meet the rule that fault root occurs because of daily record, determine that this frequent daily record is combined as root because of daily record and combines, this daily record is combined the log information that log information corresponding to the Log Types that comprises is defined as producing when the network equipment breaks down, if the second frequency of frequent daily record combination is less than or equal to the second predetermined threshold value, then represent that aiming at day of at a time frequently occurring can not continuation occur in first time period, can think log information corresponding to this daily record combination be the network equipment normal time some log informations of producing.
In addition, analysis of a large number of fault-related logs and normal logs shows that normal logs tend to appear periodically, are relatively evenly distributed, and occur frequently throughout the whole log. Fault root-cause logs, by contrast, show a sudden increasing trend at the fault point, yet almost never appear in the logs corresponding to non-fault patterns. This agrees with the information-theoretic principle that the more frequently something occurs, the less information it carries. Therefore, in another implementation of the third aspect, the processor may be configured to:
Determine M anomaly values in one-to-one correspondence with the M classes of log information, where an anomaly value represents how frequently a class of log information occurs within a second time period and the degree of abrupt change in its occurrence, the second time period containing the first time period;
The processor is configured to: obtain the N largest anomaly values from the M anomaly values, and determine the N classes of log information in the first log information set that correspond to these N largest anomaly values as the N classes of root-cause logs.
Optionally, a second log information set generated by the network device within the second time period may be obtained; the second log information set contains at least one piece of log information, each piece corresponding to a time point;
Preprocess the second log information set to obtain a first log behavior matrix. The first log behavior matrix contains Q log behavior vectors; each log behavior vector occupies one time interval and contains R elements, where R is the number of log types corresponding to the second log information set and R ≥ M. The j-th element of a log behavior vector represents the number of pieces of class-j log information within that vector's time interval;
According to the formula, calculate the anomaly value of each of the R classes of log information, obtaining R anomaly values in one-to-one correspondence with the R classes of log information;
Obtain, from the R anomaly values, the M anomaly values in one-to-one correspondence with the M classes of log information.
Here, the time interval is relatively large, generally tens of minutes, and the time intervals of different log behavior vectors may or may not be equal.
In the formula, one term represents the frequency with which class-j log information among the R classes occurs in the second log information set, and another term represents the degree of abrupt change of class-j log information in the second log information set; $q_j$ is the number of log behavior vectors containing class-j log information; $c_{k+1,j}$ denotes the total number of pieces of class-j log information in the (k+1)-th time interval; and $c_{k,j}$ denotes the total number of pieces of class-j log information in the k-th time interval.
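Because the anomaly-value formula itself is not reproduced in this text, the sketch below uses a hypothetical stand-in built only from the symbols defined above: an IDF-like frequency term $\log(Q/q_j)$, which assigns zero to classes present in every interval (in line with the information-theoretic remark), multiplied by the largest increase $c_{k+1,j}-c_{k,j}$ between consecutive intervals. It is not the patented formula; it only illustrates how the first log behavior matrix feeds the anomaly values and the top-N selection. The function names are illustrative.

```python
import math
from typing import List, Sequence


def anomaly_values(counts: Sequence[Sequence[int]]) -> List[float]:
    """counts[k][j] is c_{k,j}: the number of class-j logs in the k-th time interval.

    The scoring below is a hypothetical stand-in, not the formula of the source.
    """
    q_total = len(counts)   # Q log behavior vectors
    r = len(counts[0])      # R log types
    values = []
    for j in range(r):
        column = [row[j] for row in counts]
        q_j = sum(1 for c in column if c > 0)  # vectors that contain class-j logs
        if q_j == 0:
            values.append(0.0)
            continue
        # Frequency term (assumed IDF-like): classes present in every interval score 0.
        rarity = math.log(q_total / q_j)
        # Abrupt-change term (assumed): largest increase c_{k+1,j} - c_{k,j}, floored at 0.
        jump = max((column[k + 1] - column[k] for k in range(q_total - 1)), default=0)
        values.append(rarity * max(jump, 0))
    return values


def top_n_classes(values: Sequence[float], candidates: Sequence[int], n: int) -> List[int]:
    # Restrict to the M candidate classes (those in the first log information set),
    # then return the N classes with the largest anomaly values.
    return sorted(candidates, key=lambda j: values[j], reverse=True)[:n]
```

In this stand-in, a class that appears steadily in every interval scores zero however often it occurs, while a class that is usually absent and then jumps sharply scores highest, which is the qualitative behavior the text attributes to root-cause logs.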
It should be noted that the above two ways of obtaining the N classes of root-cause logs from the M classes of log information may be performed separately or in combination, so as to locate the exact cause of a network fault more accurately.
Finally, because a piece of log information is a record of an active behavior of the network device at a time point, the processor may be configured to:
Directly take the N classes of root-cause logs and use the record information corresponding to them as the cause of the network device failure;
Alternatively, analyze the N classes of root-cause logs with an existing analysis method to determine the most fundamental failure cause that gave rise to them.
Alternatively, when the processor obtains the N classes of root-cause logs from the M classes of log information in the first way, it may record, for each log type in the merged at least one log type, the number of times that type appears in the at least one root-cause log combination, and directly take the log information corresponding to the log type with the highest count as the fundamental cause of the network device failure.
In summary, the embodiments of the present invention provide a fault root cause analysis method and an analysis device that: determine the fault time point of a network device; obtain a first log information set generated by the network device within a first time period, the first log information set containing M classes of log information, M being an integer greater than or equal to 1, and the first time period running from a first moment before the fault time point to a second moment after the fault time point; analyze each of the M classes of log information according to a preset analysis strategy to obtain N classes of root-cause logs among them; and determine the cause of the network device failure according to the N classes of root-cause logs. In this way, the various classes of log information near the fault time point are analyzed automatically, the log information that matches the pattern by which root-cause logs occur when a fault happens is identified, and the fundamental cause of the failure is determined from it. The root cause of network device failures is thus analyzed automatically, which improves the efficiency of fault root cause analysis.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings required for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a schematic block diagram of the fault root cause analysis provided by an embodiment of the present invention;
Fig. 2 is a structural diagram of the analysis device 20 provided by an embodiment of the present invention;
Fig. 3 is a flowchart of the fault root cause analysis method provided by an embodiment of the present invention;
Fig. 4 is a structural diagram of the analysis device 30 provided by an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
The general principle of the present invention is as follows. First, data mining and machine learning are performed on the occurrence patterns of root-cause logs near a large number of fault points to discover the rule by which root-cause logs occur when a fault happens. Then, a suitable mathematical method is devised according to this rule to analyze, in real time, the log information near a fault point in the online logs. If the log information near the fault point contains a class of log information that matches the rule, that class is determined to be the root-cause log of the network device failure, and the fundamental cause of the network failure is then determined from it. In this way, log information is analyzed automatically with a suitable analysis method to find root-cause logs, which improves the efficiency of fault root cause analysis.
For example, Fig. 1 is a schematic block diagram of the fault root cause analysis provided by an embodiment of the present invention. As shown in Fig. 1, data mining and machine learning are performed on offline logs to obtain the occurrence rules of fault root-cause logs (for example, root-cause logs repeat continuously near the fault point, or show a sudden increasing trend at the fault point). The online logs then go through three stages, namely log integration, fault point confirmation, and root cause analysis, to locate the precise cause of the network failure, which is fed back to testers in an analysis report. Log integration mainly comprises log clustering, which processes logs of the same type uniformly; fault point confirmation mainly refers to confirming the time point at which the network device fails; root cause analysis mainly comprises analyzing the log information near the fault point according to the root-cause occurrence rules obtained from the offline logs, obtaining the log information that matches those rules, and determining from it the precise cause of the network device failure. It should be noted that, in the block diagram of Fig. 1, offline logs are the logs used for training in the present invention, and online logs are the actual logs to which the present invention is applied.
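The log integration stage is described only as clustering logs of the same type. One common way to do this, assumed here rather than taken from the source, is to mask variable fields such as IP addresses and numbers so that messages differing only in those fields share a template, and then to treat each template as one log type. The masking rules and function names below are hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical masking rules, applied in order: variable fields are replaced by
# placeholders so that messages differing only in those fields share a template.
_MASKS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<IP>"),
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<HEX>"),
    (re.compile(r"\b\d+\b"), "<NUM>"),
]


def template_of(message: str) -> str:
    for pattern, placeholder in _MASKS:
        message = pattern.sub(placeholder, message)
    return message


def cluster_logs(messages: List[str]) -> Tuple[List[int], Dict[str, int]]:
    """Assign each message a log-type id; messages sharing a template share an id."""
    type_ids: Dict[str, int] = {}
    labels = []
    for msg in messages:
        tpl = template_of(msg)
        # setdefault assigns the next free id only the first time a template is seen.
        labels.append(type_ids.setdefault(tpl, len(type_ids)))
    return labels, type_ids
```

Masking the IP pattern before bare numbers keeps an address from being split into four separate `<NUM>` placeholders; real deployments would likely need device-specific rules or a dedicated log-parsing algorithm.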
The method provided by the present invention may be performed by the analysis device 20 shown in Fig. 2 to analyze and locate faults of the network device 10. The analysis device 20 may be any one of a switch, a router, a network management device, a Web server, a software-defined networking (Software Defined Network, SDN) controller, and the like. Optionally, as shown in Fig. 2, the analysis device 20 may comprise a processor 2011, a memory 2012, a receiver 2013, a transmitter 2014, and at least one communication bus 2015, which connects these components and enables communication between them;
The receiver 2013 may be used to exchange data with external network elements, for example, to collect the log information generated by the network device 10.
The memory 2012 may be a volatile memory, such as a random-access memory (RAM); or a non-volatile memory, such as a read-only memory (ROM), a flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); or a combination of the above types of memory.
The processor 2011 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present invention, for example, one or more digital signal processors (DSP) or one or more field-programmable gate arrays (FPGA). The processor 2011 is configured to: first perform data mining and machine learning on offline logs to obtain the occurrence rules of fault root-cause logs; then perform log integration and fault point confirmation on the online logs to obtain the fault point at which the network device fails and the log information near that fault point; perform root cause analysis on that log information according to the root-cause occurrence rules to obtain the log information that matches them; and determine the fundamental cause of the network device failure from that log information.
The transmitter 2014 may be used to exchange data with external network elements; for example, it may be a human-machine interface that feeds back the failure cause located by the processor 2011 to testers.
The communication bus 2015 may be divided into an address bus, a data bus, a control bus, and so on, and may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. For ease of illustration, the bus is represented by only one thick line in Fig. 2, which does not mean that there is only one bus or only one type of bus.
Optionally, the processor 2011 is configured to determine the fault time point of the network device.
Here, the fault time point is the time point at which the network device fails. Because the network device may fail several times within a time period, the fault time point may refer to the time point of any one of these failures.
The receiver 2013 is configured to obtain a first log information set generated by the network device within a first time period. The first log information set contains M classes of log information, M being an integer greater than or equal to 1, and the first time period is the period from a first moment before the fault time point determined by the processor 2011 to a second moment after that fault time point.
The duration from the first moment to the fault time point and the duration from the fault time point to the second moment may be set as required, which is not limited in the embodiments of the present invention; the first and second moments are chosen on the principle that only log information near the fault time point is obtained.
Preferably, the log information from 20 minutes before to 40 minutes after the fault time point may be obtained and used as the first log information set for the first time period; alternatively, only the log information within a period after the fault time point (for example, within 60 minutes) may be obtained and used as the first log information set.
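Selecting the first log information set around the fault time point amounts to a simple time filter. The sketch below assumes logs are available as hypothetical `(timestamp, log_type)` pairs and uses the 20-minute and 40-minute spans from the preferred example as defaults; setting `before` to zero and `after` to 60 minutes gives the alternative after-only window.

```python
from datetime import datetime, timedelta
from typing import List, Set, Tuple

# Hypothetical record shape for this sketch: (timestamp, log type).
LogRecord = Tuple[datetime, str]


def first_log_set(logs: List[LogRecord], fault_time: datetime,
                  before: timedelta = timedelta(minutes=20),
                  after: timedelta = timedelta(minutes=40)) -> List[LogRecord]:
    # Keep records whose timestamp lies in [fault_time - before, fault_time + after].
    start, end = fault_time - before, fault_time + after
    return [rec for rec in logs if start <= rec[0] <= end]


def log_classes(records: List[LogRecord]) -> Set[str]:
    # The M classes of log information present in the selected set.
    return {log_type for _, log_type in records}
```

Because the bounds are parameters, the same function covers any choice of first and second moments, consistent with the statement that these durations may be set as required.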
The processor 2011 is configured to analyze, according to a preset analysis strategy, each of the M classes of log information obtained by the receiver 2013, to obtain N classes of root-cause logs among the M classes. The N classes of root-cause logs are log information generated when the network device fails, M ≥ N ≥ 1, and the preset analysis strategy is a predetermined rule by which logs occur when the network device fails.
The processor 2011 is further configured to determine the cause of the network device failure according to the N classes of root-cause logs it has obtained.
When a fault occurs, the network device may generate at least one class of log information (root-cause logs), and the occurrence of these classes near the fault time point follows clear characteristic rules. Therefore, drawing on a large number of field cases, the inventors analyzed in advance the log information generated near a large number of fault time points and mined the characteristic rules by which fault root-cause logs occur: (1) the at least one class of log information generated when a fault occurs usually appears in combination, repeatedly and continuously, near the fault point; (2) a class of log information generated when a fault occurs usually occurs frequently within a long time period and shows a sudden increasing trend at the fault time point. Based on this, the inventors propose an analysis strategy that can identify log information matching these root-cause occurrence rules; the M classes of log information obtained are analyzed according to this strategy to identify the classes that match the rules. Optionally, the processor 2011 may obtain the root-cause logs in either of the following two ways:
(1) Divide the M log types corresponding to the M classes of log information into i different log combinations, where each log combination contains at least one of the M log types, different log combinations contain different log types, and i is an integer greater than or equal to 1;
Traverse the i log combinations and determine at least one root-cause log combination among them, where a root-cause log combination is a log combination that occurs frequently and persistently within the first time period;
Process the at least one root-cause log combination;
Determine the at least one class of log information corresponding to the at least one processed root-cause log combination as the N classes of root-cause logs.
For any log combination, determining that the log combination is a root-cause log combination may comprise:
Divide the first time period into at least one time window, and divide each of the at least one time window into at least one small time window;
Calculate a first frequency of the log combination in any given time window, the first frequency being the ratio of the number of small time windows in that time window in which the log combination occurs to the total number of small time windows in that time window;
If the first frequency is greater than a first preset threshold, determine that the log combination is a frequent log combination in that time window;
Calculate a second frequency of the frequent log combination in the first time period, the second frequency being the ratio of the number of time windows in the first time period in which the frequent log combination occurs to the total number of time windows in the first time period;
If the second frequency is greater than a second preset threshold, determine that the frequent log combination is a root-cause log combination.
It should be noted that, in the embodiments of the present invention, a log combination is a combination of at least one log type, and a time window is a time interval. When the time windows are divided, the time windows may or may not be of equal size, and likewise the small time windows may or may not be of equal size. A log combination is said to occur in a time window when log information of the log types contained in the combination occurs within the time interval corresponding to that time window.
First predetermined threshold value and the second predetermined threshold value can be arranged as required, the embodiment of the present invention does not limit this, if the first frequency of a daily record combination correspondence is greater than the first predetermined threshold value, then represent in this daily record combination of sets and at a time frequently occur, be defined as frequent daily record combination, if first frequency corresponding to this daily record combination is less than or equal to the first predetermined threshold value, can think log information corresponding to this daily record combination be the network equipment normal time some log informations of producing, if the second frequency of frequent daily record combination is greater than the second predetermined threshold value, then represent that at a time frequent interior the continuing of first time period that aim at day occurred frequently occurs, namely this frequent daily record is combined as in first time period and repeats and the uninterrupted daily record occurred, meet the rule that fault root occurs because of daily record, determine that this frequent daily record is combined as root because of daily record and combines, this daily record is combined the log information that log information corresponding to the Log Types that comprises is defined as producing when the network equipment breaks down, if the second frequency of frequent daily record combination is less than or equal to the second predetermined threshold value, then represent that aiming at day of at a time frequently occurring can not continuation occur in first time period, can think log information corresponding to this daily record combination be the network equipment normal time some log informations of producing.
In addition, analysis of a large number of fault-related logs and normal logs shows that normal logs tend to appear periodically, are relatively evenly distributed, and occur frequently throughout the whole log. Fault root-cause logs, by contrast, show a sudden increasing trend at the fault point, yet almost never appear in the logs corresponding to non-fault patterns. This agrees with the information-theoretic principle that the more frequently something occurs, the less information it carries. For this reason, the inventors propose an anomaly value calculation method that matches log behavior patterns and select fault logs based on the resulting anomaly values, implemented as in way (2):
(2) Determine M anomaly values in one-to-one correspondence with the M classes of log information, where an anomaly value represents how frequently a class of log information occurs within a second time period and the degree of abrupt change in its occurrence, the second time period containing the first time period;
Obtain the N largest anomaly values from the M anomaly values, and determine the N classes of log information in the first log information set that correspond to these N largest anomaly values as the N classes of root-cause logs.
Optionally, a second log information set generated by the network device within the second time period may be obtained; the second log information set contains at least one piece of log information, each piece corresponding to a time point;
Preprocess the second log information set to obtain a first log behavior matrix. The first log behavior matrix contains Q log behavior vectors; each log behavior vector occupies one time interval and contains R elements, where R is the number of log types corresponding to the second log information set and R ≥ M. The j-th element of a log behavior vector represents the number of pieces of class-j log information within that vector's time interval;
According to the formula, calculate the anomaly value of each of the R classes of log information, obtaining R anomaly values in one-to-one correspondence with the R classes of log information;
Obtain, from the R anomaly values, the M anomaly values in one-to-one correspondence with the M classes of log information.
Here, the time interval is relatively large, generally tens of minutes, and the time intervals of different log behavior vectors may or may not be equal.
In the formula, one term represents the frequency with which class-j log information among the R classes occurs in the second log information set, and another term represents the degree of abrupt change of class-j log information in the second log information set; $q_j$ is the number of log behavior vectors containing class-j log information; $c_{k+1,j}$ denotes the total number of pieces of class-j log information in the (k+1)-th time interval; and $c_{k,j}$ denotes the total number of pieces of class-j log information in the k-th time interval.
It should be noted that the above two ways may be performed separately or in combination to locate the exact cause of a network fault more accurately. For example, way (1) may first be used to determine that class-1 and class-5 log information form a log combination that occurs frequently and persistently; the anomaly values of class-1 and class-5 log information are then calculated according to way (2). If the first log information set corresponds to class-1 and class-5 logs, and the anomaly value of the class-1 logs is among the N largest anomaly values, class-1 log information is determined to be the root-cause log of the fault. This improves the accuracy of network failure cause analysis.
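The combined use of the two ways in this example can be sketched as an intersection: a class found by way (1) is kept only if way (2) also ranks it among the N largest anomaly values. This is a minimal sketch with hypothetical names; the anomaly values are assumed to be already computed, and the toy values in the test are illustrative only.

```python
from typing import Dict, Hashable, List, Sequence


def combine_ways(way1_classes: Sequence[Hashable], anomaly: Dict[Hashable, float],
                 first_set_classes: Sequence[Hashable], n: int) -> List[Hashable]:
    # Way (2): the N classes of the first log information set with the largest anomaly values.
    top_n = set(sorted(first_set_classes, key=lambda c: anomaly[c], reverse=True)[:n])
    # Keep only the classes found by way (1) that are also among those N.
    return [c for c in way1_classes if c in top_n]
```

For instance, with way (1) returning classes 1 and 5, anomaly values `{1: 3.2, 5: 0.4, 7: 2.0}` and N = 2, only class 1 survives, mirroring the example above.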
Because a piece of log information is a record of an active behavior of the network device at a time point, the processor 2011 may be configured to directly take the N classes of root-cause logs and use the record information corresponding to them as the cause of the network device failure;
It may also analyze the N classes of root-cause logs with an existing analysis method to determine the most fundamental failure cause that gave rise to them;
When the processor 2011 uses way (1), it may also record, for each log type in the merged at least one log type, the number of times that type appears in the at least one root-cause log combination, and directly take the log information corresponding to the log type with the highest count as the fundamental cause of the network device failure.
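The counting step of the last option can be sketched with `collections.Counter`: merge the log types of all root-cause log combinations, count how many combinations contain each type, and return the most frequent one. The function name is hypothetical, and because the source does not specify tie handling, ties are broken arbitrarily in this sketch.

```python
from collections import Counter
from typing import FrozenSet, Iterable, Optional


def most_frequent_log_type(root_cause_combos: Iterable[FrozenSet[str]]) -> Optional[str]:
    # Count, for every log type in the merged set, how many root-cause combinations contain it.
    counts = Counter(t for combo in root_cause_combos for t in combo)
    if not counts:
        return None
    # Ties are broken arbitrarily here (iteration order of the frozensets).
    return counts.most_common(1)[0][0]
```

For example, across the combinations {A, B}, {A}, {B, C} and {A, C}, type A appears three times and is returned as the candidate fundamental cause.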
In summary, the embodiments of the present invention provide an analysis device that: determines the fault time point of a network device; obtains a first log information set generated by the network device within a first time period, the first log information set containing M classes of log information, M being an integer greater than or equal to 1, and the first time period running from a first moment before the fault time point to a second moment after the fault time point; analyzes each of the M classes of log information according to a preset analysis strategy to obtain N classes of root-cause logs among them; and determines the cause of the network device failure according to the N classes of root-cause logs. In this way, the various classes of log information near the fault time point are analyzed automatically, the log information that matches the pattern by which root-cause logs occur when a fault happens is identified, and the fundamental cause of the failure is determined from it. The root cause of network device failures is thus analyzed automatically, which improves the efficiency of fault root cause analysis.
For ease of description, Embodiment one below describes in detail, in the form of steps, the fault root cause analysis method performed by the analysis device 20 of the present invention. The illustrated steps may also be performed in a computer system other than the analysis device 20, for example as a set of executable instructions. For instance, the method of the present invention may also be performed by the network device 10; that is, the units of the analysis device 20 shown in Fig. 2 that perform the method may also be included in the network device 10, which then performs the fault root cause analysis method provided by the present invention. In addition, although a logical order is shown in the flowchart, in some cases the steps shown or described may be performed in an order different from the one given here.
Embodiment one
Fig. 3 is a flowchart of the fault root cause analysis method provided by an embodiment of the present invention. The method is performed by the analysis device 20 shown in Fig. 2 to perform fault root cause analysis on the network device 10. As shown in Fig. 3, the method may comprise:
S101: Determine the fault time point of the network device.
The fault time point is the time point at which the network device fails. Because the network device may fail several times within a time period, the fault time point may refer to the time point of any one of those failures.
Optionally, the fault time point of the network device may be determined by an existing method, or by the following method:
Obtain at least one log message generated by the network device within a time period;
Process the at least one log message to form a second log behavior matrix. The second log behavior matrix includes X log behavior vectors; each log behavior vector occupies one small time interval and includes Y elements, where Y is the number of log types, and the y-th element of a log behavior vector is the number of log messages belonging to class y within that vector's small time interval;
Compute over the log behavior vectors in the second log behavior matrix according to a preset model to determine the fault time of the network device, where the preset model is used to select the log behavior vectors that match the behavioral characteristics of the network device when it fails.
The at least one log message records the activity of the network device within the time period. Each log message describes one independent activity of the network device and may include information such as the timestamp of the event, the host or module name, the event severity, a brief description, and the event message. It should be noted that the network device may fail several times within the time period.
Optionally, the analysis device may obtain the at least one log message of the network device by an existing log collection technique, for example a web crawler; details are not repeated here.
Optionally, the analysis device may process the at least one log message as follows to form the second log behavior matrix:
Convert the content format of each obtained log message into a preset log format;
Classify the format-converted log messages, replace each log message with the identifier of the class it belongs to, and thereby form a time series of class identifiers;
Divide the time series according to preset small time intervals;
For each small time interval, count the occurrences of each class identifier within that interval and arrange the counts into a Y-dimensional log behavior vector;
Arrange all log behavior vectors in time order to form the second log behavior matrix.
The preset log format may be set as required; the embodiments of the present invention do not limit it. For example, a log message may include the fields timestamp / host name / event severity / brief information / event message (Timestamp/Device/Event severity/Brief information/Event message), and each field may be normalized to the format shown in Table 1 below: the "timestamp" of a log message is expressed in a form such as "Apr 21 2015 02:34:25", and the "event severity" is expressed by a number denoting the level. In that case, if a log message has the timestamp 2015-11-11 09:00:00, the timestamp must be converted to "Nov 11 2015 09:00:00".
Table 1 Log normalization format
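The timestamp normalization in the example above can be sketched with Python's standard `datetime` module. The input format `%Y-%m-%d %H:%M:%S` and the function name `normalize_timestamp` are assumptions for illustration; the text does not specify how the conversion is implemented.

```python
from datetime import datetime


def normalize_timestamp(raw: str) -> str:
    """Hypothetical helper: convert "2015-11-11 09:00:00" to "Nov 11 2015 09:00:00"."""
    parsed = datetime.strptime(raw, "%Y-%m-%d %H:%M:%S")
    # %b is the abbreviated month name (English under Python's default C locale)
    return parsed.strftime("%b %d %Y %H:%M:%S")


print(normalize_timestamp("2015-11-11 09:00:00"))  # Nov 11 2015 09:00:00
```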
The class identifier represents the log type. For example, if the log message "Apr 21 2015 12:12:12 User login" belongs to log type 1, it may be represented by the number "1".
Preferably, the format-converted log messages may be classified by hierarchical clustering, a classic algorithm in artificial intelligence. A cluster analysis tool that measures string similarity with the q-gram algorithm uses the q-gram distance as the dissimilarity between log messages, clusters the format-converted log messages, and obtains the optimal number of log types by adjusting the clustering parameter q. Different values of q lead to different clustering results; based on extensive experiments, q is preferably 3 in the present invention, a value that has little effect on the log clustering result. The specific implementation is not repeated here.
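The classification step can be sketched as follows: Ukkonen's q-gram distance (the L1 distance between the q-gram count profiles of two strings) with q = 3, fed into a simple agglomerative clustering. The average-linkage rule, the stopping distance `max_distance`, and all function names are assumptions for illustration, not the tool the text refers to.

```python
from collections import Counter
from itertools import combinations


def qgram_profile(text, q=3):
    # Multiset of all overlapping substrings of length q
    return Counter(text[i:i + q] for i in range(len(text) - q + 1))


def qgram_distance(a, b, q=3):
    # Sum of absolute differences between the two q-gram count profiles
    pa, pb = qgram_profile(a, q), qgram_profile(b, q)
    return sum(abs(pa[g] - pb[g]) for g in set(pa) | set(pb))


def hierarchical_cluster(messages, q=3, max_distance=20.0):
    """Average-linkage agglomerative clustering; returns class ids 1..K per message."""
    dist = {(i, j): qgram_distance(messages[i], messages[j], q)
            for i, j in combinations(range(len(messages)), 2)}

    def linkage(c1, c2):
        pairs = [(min(i, j), max(i, j)) for i in c1 for j in c2]
        return sum(dist[p] for p in pairs) / len(pairs)

    clusters = [[i] for i in range(len(messages))]
    while len(clusters) > 1:
        d, a, b = min((linkage(clusters[a], clusters[b]), a, b)
                      for a, b in combinations(range(len(clusters)), 2))
        if d > max_distance:
            break  # remaining clusters are too dissimilar to merge
        clusters[a].extend(clusters[b])
        del clusters[b]  # b > a, so index a stays valid

    labels = [0] * len(messages)
    for class_id, members in enumerate(clusters, start=1):
        for m in members:
            labels[m] = class_id
    return labels


msgs = ["User admin login", "User guest login",
        "Interface GE0/1 down", "Interface GE0/2 down"]
print(hierarchical_cluster(msgs))
```

Raising `max_distance` merges more messages into fewer log types; tuning it, together with q, plays the role of the parameter adjustment described above.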
The small time intervals corresponding to the log behavior vectors may or may not be equal and may be set as required, for example 1 minute or 5 minutes; the embodiments of the present invention do not limit this. For example, if the number of log types is Y and the preset small time intervals divide the time series into X time periods T1, ..., TX, the second log behavior matrix constructed is:
$$\begin{pmatrix} x_{T_1,1} & x_{T_1,2} & \cdots & x_{T_1,Y} \\ x_{T_2,1} & x_{T_2,2} & \cdots & x_{T_2,Y} \\ \vdots & \vdots & \ddots & \vdots \\ x_{T_X,1} & x_{T_X,2} & \cdots & x_{T_X,Y} \end{pmatrix}$$
where $(x_{T_1,1}, x_{T_1,2}, \ldots, x_{T_1,Y})$ is the log behavior vector of small time interval T1, and its y-th element $x_{T_1,y}$ is the number of log messages belonging to class y. For example, suppose there are 10 log types, identified one-to-one by the numbers 1 to 10. If 100 log messages are obtained in small time interval T1, of which 10 have class identifier 1, 20 have class identifier 3, and 70 have class identifier 7, then the log behavior vector of T1 is (10, 0, 20, 0, 0, 0, 70, 0, 0, 0).
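The matrix construction can be sketched as below, assuming equal small time intervals, timestamps in seconds, and class identifiers 1..Y already assigned by the classification step. The name `build_behavior_matrix` and the `(timestamp, class_id)` input shape are hypothetical.

```python
def build_behavior_matrix(events, num_types, interval):
    """events: (timestamp_seconds, class_id) pairs with class ids 1..num_types.

    Returns one row (log behavior vector) per small time interval, where
    row[y - 1] counts the class-y log messages seen in that interval.
    """
    if not events:
        return []
    events = sorted(events)
    start = events[0][0]
    num_intervals = int((events[-1][0] - start) // interval) + 1
    matrix = [[0] * num_types for _ in range(num_intervals)]
    for ts, class_id in events:
        row = int((ts - start) // interval)
        matrix[row][class_id - 1] += 1
    return matrix


# Reproduces the T1 example: 10 class-1, 20 class-3, 70 class-7 messages in one minute
events = [(0, 1)] * 10 + [(1, 3)] * 20 + [(2, 7)] * 70
print(build_behavior_matrix(events, num_types=10, interval=60))
```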
Optionally, computing over the log behavior vectors in the second log behavior matrix according to the preset model to determine the fault time of the network device may be done in either of the following two ways:
(1) Calculate the log frequency (the total number of log messages) and the log variety (the number of distinct log types) of each log behavior vector;
For any log behavior vector in the second log behavior matrix, calculate the log frequency variance and the log variety variance between that vector and at least one log behavior vector adjacent to it;
If the mean of the log frequency variance and the log variety variance is greater than a predetermined threshold, determine the small time interval corresponding to that log behavior vector as the time at which the network device failed.
The predetermined threshold may be obtained by analyzing a large number of fault logs; the present invention does not limit it here. If the mean of the log frequency variance and the log variety variance is greater than the threshold, the log frequency and log variety of the vector have changed abruptly, indicating a network fault in that time period; if the mean is less than or equal to the threshold, the log frequency and log variety of the vector reflect the behavior of the network device in normal operation.
It should be noted that the at least one adjacent log behavior vector may be several vectors before the given vector, several vectors after it, or several vectors on both sides; their number may be set as required, and the embodiments of the present invention do not limit this. Preferably, based on extensive experiments, the adjacent vectors are the four log behavior vectors immediately following the given vector.
For example, let the calculated log frequency variance and log variety variance be $a_i$ and $b_i$ respectively. If
$$\frac{a_i + b_i}{2} > \lambda_1,$$
where $\lambda_1$ is the predetermined threshold obtained by analyzing a large number of fault logs, the time corresponding to this vector is determined as the fault time.
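The detection rule of way (1) can be sketched as follows. This is a minimal illustration under stated assumptions: the variance is the population variance over the vector and the `window` vectors that follow it (four, per the preferred setting), and the threshold value used in the example is hypothetical rather than one obtained from fault-log analysis.

```python
from statistics import pvariance


def detect_fault_intervals(matrix, threshold, window=4):
    """Return indices t whose (a_t + b_t) / 2 exceeds the threshold (lambda_1)."""
    freq = [sum(row) for row in matrix]                      # log frequency
    variety = [sum(1 for x in row if x > 0) for row in matrix]  # log variety
    faults = []
    for t in range(len(matrix)):
        neighbours = range(t, min(t + window + 1, len(matrix)))
        if len(neighbours) < 2:
            continue  # no following vector to compare against
        a = pvariance([freq[i] for i in neighbours])
        b = pvariance([variety[i] for i in neighbours])
        if (a + b) / 2 > threshold:
            faults.append(t)
    return faults


# Steady traffic with one abrupt burst at interval 6 (hypothetical data)
matrix = [[5, 0, 0]] * 6 + [[50, 30, 20]] + [[5, 0, 0]] * 6
print(detect_fault_intervals(matrix, threshold=100))
```

Because the window looks forward, every interval whose window reaches the burst is flagged along with the burst interval itself; combining this with way (2), as described below, narrows the candidates.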
It should be noted that for periodic logs, the number of log messages generated per unit time does not change; that is, the log frequency is constant. For periodic logs, abrupt changes in the frequency value are therefore not meaningful for fault detection in the above way and can distort the detection result. To address this, the present invention proposes a log weighting method based on information retrieval techniques that takes into account the distribution of each class of logs and effectively improves the accuracy of delimiting the fault time. Optionally, when the network device produces periodic logs, the embodiments of the present invention also perform the following processing before calculating the log frequency and log variety of each log behavior vector:
Weight the y-th element of each log behavior vector according to a weighting formula defined in terms of the following quantities:
the y-th element is any element of the log behavior vector; $n_y$ is the number of small time intervals in which class-y log messages occur, i.e., class-y log messages occurred in $n_y$ small time intervals; and Std(y) is the distribution variance of class-y log messages.
The distribution variance of class-y log messages is the variance between the number of class-y log messages in the given log behavior vector and the numbers of class-y log messages in all other log behavior vectors of the second log behavior matrix.
For example, suppose two log behavior vectors are (10, 0, 20, 0, 0, 0, 70, 0, 0, 0) and (10, 0, 20, 0, 20, 0, 30, 0, 10, 10); that is, both small time intervals produce 100 log messages and have the same log frequency. Weighting each element of the two vectors with the weighting formula gives (11.7307, 0, 4.79, 0, 0, 0, 2.348, 0, 0, 0) and (2.5597, 0, 3.9780, 0, 2.67, 0, 30, 0, 5.648, 10). The characteristic values of the two log behaviors now differ, and using them in place of the raw log frequency locates the fault time more accurately.
(2) Traverse each of the X log behavior vectors, compare the similarity between the vector and the log behavior vector that follows it in time, and obtain the comparison value corresponding to that vector;
After traversing all X log behavior vectors, sort the comparison values, which correspond one-to-one to the X vectors, in descending order;
Determine the small time intervals of the log behavior vectors corresponding to the top k values after sorting as the times at which the network device failed, where k is an integer greater than or equal to 1.
Optionally, the similarity between a log behavior vector and the log behavior vector that follows it may be compared according to a similarity formula to obtain the comparison value corresponding to that vector, where t denotes the small time interval of the log behavior vector and $x_{t,y}$ denotes the y-th element of the t-th log behavior vector (row t of the matrix).
Here, k is an integer greater than or equal to 1 and may be chosen empirically. Alternatively, a threshold may be set, and the k log behavior vectors whose comparison values exceed it are determined as abnormal log behavior vectors, i.e., network device fault points.
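The comparison formula of way (2) is not reproduced in this text, so the following sketch substitutes the Euclidean distance between a vector and its successor as the comparison value (larger means less similar) and returns the top k intervals. The distance measure and the function name are assumptions, not the formula the text refers to.

```python
import math


def top_k_change_points(matrix, k=1):
    """Score each vector by its Euclidean distance to the next one; return top-k indices."""
    scores = [(math.dist(matrix[t], matrix[t + 1]), t) for t in range(len(matrix) - 1)]
    # sort is stable, so equal scores keep their time order
    scores.sort(key=lambda s: s[0], reverse=True)
    return [t for _, t in scores[:k]]


matrix = [[5, 0, 0]] * 6 + [[50, 30, 20]] + [[5, 0, 0]] * 6
print(top_k_change_points(matrix, k=2))
```

The last vector has no successor and therefore receives no comparison value in this sketch.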
It should be noted that the two ways above may be performed separately or in combination to locate the exact time of the network fault more accurately. For example, way (1) may first determine that the frequency and variety of the log behavior vectors in rows 1 and 5 changed abruptly, making them candidate fault points; then only the similarity of rows 1 and 5 is compared according to way (2) to determine whether row 1 or row 5 is the fault point.
S102: Obtain the first log information set generated by the network device in the first time period. The first log information set includes M classes of log information, M is an integer greater than or equal to 1, and the first time period is the period from a first moment before the fault time point to a second moment after the fault time point.
The duration from the first moment to the fault time point and the duration from the fault time point to the second moment may be set as required; the embodiments of the present invention do not limit this. The present invention determines the first and second moments on the principle of obtaining the log information near the fault time point.
Preferably, the log information from 20 minutes before the fault time point to 40 minutes after it may be obtained as the first log information set of the first time period; alternatively, only the log information within a period after the fault time point (for example, 60 minutes) may be obtained as the first log information set.
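Selecting the first time period is a simple time filter. This minimal sketch assumes each log is a `(datetime, message)` pair and uses the preferred 20-minute and 40-minute offsets as defaults; the function name is hypothetical.

```python
from datetime import datetime, timedelta


def logs_in_first_period(logs, fault_time,
                         before=timedelta(minutes=20), after=timedelta(minutes=40)):
    """Keep logs from (fault_time - before) to (fault_time + after), inclusive."""
    start, end = fault_time - before, fault_time + after
    return [(ts, msg) for ts, msg in logs if start <= ts <= end]


fault = datetime(2015, 11, 11, 9, 0, 0)
logs = [(fault - timedelta(minutes=30), "a"), (fault - timedelta(minutes=10), "b"),
        (fault + timedelta(minutes=39), "c"), (fault + timedelta(minutes=41), "d")]
print([msg for _, msg in logs_in_first_period(logs, fault)])  # ['b', 'c']
```

Passing `before=timedelta(0)` and `after=timedelta(minutes=60)` gives the alternative of taking only the 60 minutes after the fault time point.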
Next, the content format of each log message in the obtained first log information set may be converted into the preset log format, and the format-converted log messages may then be classified by hierarchical clustering to determine that the first log information set includes M classes of log information. Alternatively, building directly on S101, the log types contained in the second log behavior matrix may be queried: if M classes of log information appear within the first time period of the second log behavior matrix, the first log information set is determined to correspond to M classes of log information.
S103: Analyze each of the M classes of log information according to the preset analysis strategy to obtain N classes of root-cause logs among the M classes. The N classes of root-cause logs are the log information generated when the network device fails, M ≥ N ≥ 1, and the preset analysis strategy is the predetermined pattern in which logs appear when the network device fails.
When the network device fails, it may generate at least one class of log information (root-cause logs), and the occurrence of these classes near the fault time point shows clear characteristic patterns. Drawing on a large number of fault cases, the inventors therefore analyzed in advance the log information generated near many fault time points and mined the characteristic patterns of fault root-cause logs: (1) the classes of log information generated when a fault occurs usually appear together, repeatedly and continuously, near the fault point; (2) a class of log information generated when a fault occurs usually appears frequently over a long period and rises sharply at the fault time point. On this basis, the inventors propose an analysis strategy based on the occurrence pattern of root-cause logs; according to this strategy, the M classes of log information obtained in S102 are analyzed to identify the classes that match the pattern. Optionally, the root-cause logs may be obtained in either of the following two ways:
(1) Divide the M log types corresponding to the M classes of log information into i different log combinations, where each combination includes at least one of the M log types, different combinations include different sets of log types, and i is an integer greater than or equal to 1;
Traverse the i log combinations and determine at least one root-cause log combination among them, where a root-cause log combination is a log combination that appears frequently and persistently within the first time period;
Process the at least one root-cause log combination;
Determine the at least one class of log information corresponding to the processed root-cause log combination(s) as the N classes of root-cause logs.
For any log combination, determining that it is a root-cause log combination may include:
Divide the first time period into at least one time window, and divide each time window into at least one small time window;
Calculate the first frequency with which the log combination appears in a given time window, the first frequency being the ratio of the number of small time windows in which the combination appears to the total number of small time windows in that time window;
If the first frequency is greater than a first predetermined threshold, determine that the log combination is a frequent log combination in that time window;
Calculate the second frequency with which the frequent log combination appears in the first time period, the second frequency being the ratio of the number of time windows in which it appears as a frequent log combination to the total number of time windows in the first time period;
If the second frequency is greater than a second predetermined threshold, determine that the frequent log combination is a root-cause log combination.
It should be noted that in the embodiments of the present invention, a log combination is a combination of at least one log type, and a time window is a time interval. When dividing time windows, the time windows may or may not be of equal size, and likewise the small time windows. A log combination appearing in a time window means that log messages of the log types contained in the combination occurred within the time interval corresponding to that window.
The first and second predetermined thresholds may be set as required; the embodiments of the present invention do not limit them. If the first frequency of a log combination is greater than the first threshold, the combination appears frequently at that time and is determined as a frequent log combination; if it is less than or equal to the first threshold, the corresponding log messages can be regarded as ordinary logs generated while the network device operates normally. If the second frequency of a frequent log combination is greater than the second threshold, the combination that appears frequently at some time also persists throughout the first time period; that is, it repeats and appears without interruption in the first time period, matching the pattern of fault root-cause logs. The frequent log combination is then determined as a root-cause log combination, and the log messages of the log types it contains are determined as log information generated when the network device fails. If the second frequency is less than or equal to the second threshold, the combination that appears frequently at some time does not persist within the first time period, and the corresponding log messages can be regarded as ordinary logs generated while the network device operates normally.
For example, suppose the first log information set corresponds to three log types (1, 2, 3), the first time period is divided into 7 time windows, each time window is divided into 12 small time windows, the first predetermined threshold is 3/4, and the second predetermined threshold is 1/2. The three log types yield seven log combinations: (1, 2, 3), (1, 2), (1, 3), (2, 3), (1), (2), and (3). If combination (1, 2, 3) appears in 10 small time windows of the first time window, its first frequency there is 10/12 > 3/4, so it is a frequent log combination in the first time window. Traversing all time windows, if (1, 2, 3) is a frequent log combination in the 1st, 2nd, 4th, 6th, and 7th time windows, its second frequency is 5/7 > 1/2, so (1, 2, 3) is determined as a root-cause log combination; that is, the log messages of types 1, 2, and 3 are log messages generated when the network device fails. Similarly, traversing the other combinations (1, 2), (1, 3), (2, 3), (1), (2), and (3) determines that the three combinations (1, 2), (1, 3), and (2) are also root-cause log combinations.
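The two-level frequency test can be sketched as below. The sketch assumes the first time period has already been split into time windows, each given as a list of small windows, and each small window as the set of log types seen in it; a combination "appears" in a small window when all of its types are present. Function names and the data layout are hypothetical.

```python
from itertools import combinations


def all_combinations(types):
    # Every non-empty subset of the log types, as sorted tuples
    types = sorted(types)
    return [c for r in range(len(types), 0, -1) for c in combinations(types, r)]


def root_cause_combinations(windows, types, first_threshold=3 / 4, second_threshold=1 / 2):
    """Return {combination: second frequency} for combinations passing both thresholds."""
    result = {}
    for combo in all_combinations(types):
        needed = set(combo)
        frequent_windows = 0
        for small_windows in windows:
            hits = sum(1 for seen in small_windows if needed <= seen)
            first_freq = hits / len(small_windows)
            if first_freq > first_threshold:
                frequent_windows += 1  # frequent log combination in this window
        second_freq = frequent_windows / len(windows)
        if second_freq > second_threshold:
            result[combo] = second_freq
    return result


# Mirrors the example: (1, 2, 3) in 10 of 12 small windows of windows 1, 2, 4, 6, 7
frequent = [{1, 2, 3}] * 10 + [set()] * 2
sparse = [{1, 2, 3}] * 2 + [set()] * 10
windows = [frequent, frequent, sparse, frequent, sparse, frequent, frequent]
print(root_cause_combinations(windows, [1, 2, 3])[(1, 2, 3)])  # 5/7
```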
In addition, the root-cause log combinations obtained above may be nested; for example, root-cause log combination (1, 2, 3) contains root-cause log combination (1, 2). If the second frequency of the contained combination is less than that of the combination containing it, the contained combination can be rejected; if the second frequency of the contained combination is much greater than that of the containing combination, both may be retained, each as an independent fault phenomenon. Optionally, if the root-cause log combinations determined by traversing the i log combinations include a first root-cause log combination and a second root-cause log combination, and the first is contained in the second, processing the at least one root-cause log combination may include:
When the second frequency of the first root-cause log combination is greater than that of the second root-cause log combination, the first root-cause log combination is not rejected;
When the second frequency of the first root-cause log combination is less than that of the second root-cause log combination, the first root-cause log combination is rejected.
For example, among the root-cause log combinations (1, 2, 3), (1, 2), (1, 3), and (2) determined above, if the second frequency of (1, 2, 3) is greater than those of (1, 2) and (2), and the second frequency of (1, 3) is greater than that of (1, 2, 3), then (1, 2) and (2) are rejected, and only (1, 2, 3) and (1, 3) are merged; that is, class-1, class-2, and class-3 log information are determined as the root-cause logs.
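A sketch of the nesting rule, assuming each root-cause log combination is a tuple of log types mapped to its second frequency. Equal second frequencies are kept here, a case the text leaves open, and each combination is compared against the original (unpruned) set. The frequencies in the example are hypothetical values chosen to match the ordering described above.

```python
def prune_nested(combos):
    """Drop a combination when a strict superset has a higher second frequency."""
    kept = dict(combos)
    for inner, f_inner in combos.items():
        for outer, f_outer in combos.items():
            if set(inner) < set(outer) and f_inner < f_outer:
                kept.pop(inner, None)
                break
    return kept


combos = {(1, 2, 3): 0.8, (1, 2): 0.6, (1, 3): 0.9, (2,): 0.6}
print(sorted(prune_nested(combos)))  # [(1, 2, 3), (1, 3)]
```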
Finally, the logs before a fault also contain some log combinations that appear frequently and persistently. Such combinations are usually routine inspection logs generated under normal conditions and are unrelated to the fault. Therefore, if the determined root-cause log combinations include a third root-cause log combination that is also a root-cause log combination before the fault time point, processing the at least one root-cause log combination may further include:
Rejecting the third root-cause log combination.
A root-cause log combination before the fault time point is a log combination that appears frequently and persistently in a time period before the fault time point. Optionally, it may be determined with the same method used above for the first time period: obtain a time period before the fault time point, divide it into at least one time window, and divide each time window into at least one small time window; calculate the first frequency of a log combination in a time window; if the first frequency is greater than the first predetermined threshold, calculate the second frequency of the combination in that time period; and if the second frequency is greater than the second predetermined threshold, determine that the combination is a root-cause log combination.
For example, if the root-cause log combinations in the first time period are (1, 2, 3) and (1, 4), and (1, 2, 3) is also a root-cause log combination before the fault time point, then (1, 2, 3) is rejected.
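Rejecting pre-fault combinations and merging the survivors into root-cause log classes reduces to set operations. This sketch assumes combinations are tuples of log types and that the pre-fault root-cause combinations were mined with the same procedure; the function name is hypothetical.

```python
def root_cause_log_types(fault_combos, baseline_combos):
    """Reject combinations already persistent before the fault; merge the rest."""
    remaining = [c for c in fault_combos if c not in baseline_combos]
    types = set()
    for combo in remaining:
        types.update(combo)
    return sorted(types)


# The example above: (1, 2, 3) is also persistent before the fault, so it is rejected
print(root_cause_log_types({(1, 2, 3), (1, 4)}, {(1, 2, 3)}))  # [1, 4]
```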
In addition, analysis of a large number of fault-related logs and normal logs shows that normal logs tend to appear periodically, are fairly evenly distributed, and appear frequently throughout the log. Fault root-cause logs, by contrast, rise sharply near the fault point but almost never appear in the logs of non-fault periods. This is consistent with the principle in information theory that content appearing more frequently carries less information. The inventors therefore propose an outlier calculation method suited to log behavior patterns and select fault logs based on the resulting outlier values, as described in way (2):
(2) Determine M outlier values in one-to-one correspondence with the M classes of log information, where an outlier value represents how frequently a class of log information appears within a second time period and its mutation degree, and the second time period includes the first time period;
Take the N largest outlier values among the M, and determine the N classes of log information in the first log information set corresponding to them as the N classes of root-cause logs.
Optionally, obtain a second log information set generated by the network device within the second time period, the second log information set including at least one log message, each corresponding to a time point;
Preprocess the second log information set to obtain a first log behavior matrix. The first log behavior matrix includes Q groups of log behavior vectors; each group occupies one time interval and includes R elements, where R is the number of log types corresponding to the second log information set and R ≥ M, and the j-th element of a log behavior vector is the number of class-j log messages within that vector's time interval;
Calculate the outlier value of each of the R classes of log information according to an outlier formula, obtaining R outlier values in one-to-one correspondence with the R classes;
Take from the R outlier values the M outlier values corresponding one-to-one to the M classes of log information.
These time intervals are relatively long, typically tens of minutes, and the time intervals of different groups of log behavior vectors may or may not be equal.
The first log behavior matrix is constructed in the same way as the second log behavior matrix built in S101 when determining the fault time point: convert the content format of each log message in the second log information set into the preset log format; classify the format-converted log messages and replace each with its class identifier to form a time series of class identifiers; divide the time series according to preset time intervals; for each time interval, count the occurrences of each class identifier and arrange the counts into an R-dimensional log behavior vector; and arrange all log behavior vectors in time order to form the first log behavior matrix. The difference is that in the second log behavior matrix built for fault time point determination, the time interval of each log behavior vector is short, typically a few seconds or a few minutes, whereas in the first log behavior matrix built for outlier calculation, the time interval of each group of log behavior vectors is long, typically tens of minutes. Accordingly, when the second log behavior matrix built in S101 is reused directly as the first log behavior matrix, its X log behavior vectors are grouped several at a time into Q groups (X > Q), so that each group covers a longer time interval for the outlier calculation.
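Reusing the fine-grained matrix from S101 amounts to summing consecutive log behavior vectors. The sketch below assumes a fixed, hypothetical `group_size` of consecutive vectors per group, whereas the text also allows groups of unequal length.

```python
def regroup(matrix, group_size):
    """Sum every group_size consecutive rows; the last group may be shorter."""
    groups = []
    for start in range(0, len(matrix), group_size):
        block = matrix[start:start + group_size]
        groups.append([sum(column) for column in zip(*block)])
    return groups


# Three fine-grained vectors, two log types, grouped two at a time (X = 3, Q = 2)
print(regroup([[1, 0], [2, 1], [0, 3]], 2))  # [[3, 1], [0, 3]]
```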
In the formula, one term represents how frequently class-j log information (among the R classes of log information) appears in the second log information set,
and the other term represents the abrupt change (mutation content) of class-j log information in the second log information set;
$q_j$ is the number of groups of log behavior vectors that contain class-j log information;
$c_{k+1,j}$ is the total number of class-j log entries in the (k+1)-th time interval;
and $c_{k,j}$ is the total number of class-j log entries in the k-th time interval.
For example, suppose the first log behavior matrix contains 100 groups of log behavior vectors and five log types, 1 to 5. Class-1 log information appears in groups 1, 5, 6, 9 and 10, and the numbers of class-1 log entries in the time intervals of these groups are 100, 20, 30, 60 and 90 respectively. The frequency term of class-1 log information is then log 20;
the mutation term of class-1 log information is log 100;
and the outlier value of class-1 log information is log 20 × log 100.
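Because the formula itself is missing from this text, the sketch below is only a reconstruction chosen to agree with the worked example. It assumes the frequency term is log(Q / q_j) (100 / 5 = 20 in the example) and the mutation term is the logarithm of the largest absolute change |c_{k+1,j} − c_{k,j}| between consecutive intervals (the drop from 100 in group 1 to 0 in group 2 gives 100). The names `outlier_scores` and `top_n_classes` are hypothetical.

```python
import math
from typing import List


def outlier_scores(matrix: List[List[int]]) -> List[float]:
    """matrix[k][j] is c_{k,j}, the count of class-j logs in coarse interval k.
    Returns one outlier value per log class (column), under the assumed form
    log(Q / q_j) * log(max_k |c_{k+1,j} - c_{k,j}|)."""
    Q = len(matrix)
    R = len(matrix[0]) if matrix else 0
    scores = []
    for j in range(R):
        column = [row[j] for row in matrix]
        q_j = sum(1 for c in column if c > 0)  # groups containing class j
        if q_j == 0:
            scores.append(0.0)  # class never appears in the second log set
            continue
        frequency_term = math.log(Q / q_j)
        max_jump = max(
            (abs(column[k + 1] - column[k]) for k in range(Q - 1)), default=0
        )
        mutation_term = math.log(max_jump) if max_jump > 0 else 0.0
        scores.append(frequency_term * mutation_term)
    return scores


def top_n_classes(scores: List[float], n: int) -> List[int]:
    """Indices (0-based) of the n classes with the largest outlier values."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:n]


# Worked example: 100 groups; class 1 occurs in groups 1, 5, 6, 9 and 10.
counts = {1: 100, 5: 20, 6: 30, 9: 60, 10: 90}
matrix = [[counts.get(k, 0), 1] for k in range(1, 101)]  # column 2: steady periodic log
scores = outlier_scores(matrix)
print(math.isclose(scores[0], math.log(20) * math.log(100)))  # True
print(top_n_classes(scores, 1))  # [0]
```

The steady second column scores 0, which illustrates the intent stated in the text: logs that appear everywhere at a constant rate carry little information.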
It should be noted that the two manners above may be performed separately or in combination to locate the cause of a network fault more accurately. For example, manner (1) may first determine that class-1 and class-5 log information form a log combination that appears frequently and persistently; manner (2) is then used to calculate the outlier values of class-1 and class-5 log information. If the first log information set contains class-1 and class-5 logs and the outlier value of class-1 logs is among the top N largest outlier values, class-1 log information is determined to be the root-cause log of the fault. This improves the accuracy of network fault cause analysis.
S104: Determine, according to the N classes of root-cause logs, the cause of the fault of the network device.
Because each log entry records an action of the network device at a point in time, the N classes of root-cause logs can be obtained directly, and the information they record can be used as the cause of the network device fault. In addition, because a single fault of the network device may produce multiple log entries, the embodiments of the present invention may also analyze the N classes of root-cause logs with an existing analysis method to determine the most fundamental fault cause that gave rise to them.
Alternatively, on the basis of manner (1) in S103, the number of times each of the at least one merged log type appears in the at least one root-cause log combination may be counted, and the log information corresponding to the log type with the highest count is taken directly as the fundamental cause of the network device fault.
For example, suppose manner (1) in S103 determines 7 root-cause log combinations: (11, 12, 13, 14, 15, 16), (11, 14, 16, 17), (11, 28, 35), (11, 28, 8), (11, 31), (11, 34) and (11, 35, 8). Log type 11 appears 7 times across these 7 combinations, more than any other log type, so the information recorded by class-11 log information can be taken directly as the most fundamental cause of the network device fault for subsequent analysis.
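Counting how often each log type recurs across the root-cause log combinations takes only a few lines. A minimal Python sketch, using the combinations from the example above as input; the function name `most_common_log_type` is hypothetical, and ties between log types are broken arbitrarily.

```python
from collections import Counter
from typing import Iterable, Tuple


def most_common_log_type(combinations: Iterable[Iterable[int]]) -> Tuple[int, int]:
    """Return (log_type, count) for the log type that appears in the largest
    number of root-cause log combinations."""
    counts: Counter = Counter()
    for combo in combinations:
        counts.update(set(combo))  # count each type once per combination
    if not counts:
        raise ValueError("no root-cause log combinations given")
    return counts.most_common(1)[0]


combos = [(11, 12, 13, 14, 15, 16), (11, 14, 16, 17), (11, 28, 35),
          (11, 28, 8), (11, 31), (11, 34), (11, 35, 8)]
print(most_common_log_type(combos))  # (11, 7)
```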
In summary, the embodiments of the present invention provide a fault root cause analysis method: determine a fault time point of a network device; obtain a first log information set generated by the network device in a first time period, where the first log information set includes M classes of log information, M is an integer greater than or equal to 1, and the first time period runs from a first moment before the fault time point to a second moment after the fault time point; analyze each of the M classes of log information according to a preset analysis policy to obtain N classes of root-cause logs among them; and determine the cause of the network device fault according to the N classes of root-cause logs. In this way, the various classes of log information near the fault time point are analyzed automatically to find the log information that follows the occurrence pattern of root-cause logs during a fault, and the fundamental cause of the network device fault is determined from that log information. This automates root cause analysis of network device faults and improves its efficiency.
Based on the embodiments of the present invention, the following embodiment further provides an analysis device 30, which is preferably used to implement the method in the above method embodiment.
Embodiment two
Fig. 4 is a structural diagram of an analysis device 30 according to an embodiment of the present invention. The analysis device 30 may be any one of a switch, a router, a network management device, a Web server, a software-defined networking (Software Defined Network, SDN) controller, or similar devices, and is configured to perform the method described in Embodiment one. As shown in Fig. 4, the analysis device 30 may include:
Determining unit 201, configured to determine a fault time point of a network device.
The fault time point is the time point at which the network device fails. Because the network device may fail more than once within a time period, the fault time point may be the time point of any one of these faults.
Acquiring unit 202, configured to obtain a first log information set generated by the network device in a first time period, where the first log information set includes M classes of log information, M is an integer greater than or equal to 1, and the first time period runs from a first moment before the fault time point determined by the determining unit 201 to a second moment after the fault time point.
The duration from the fault time point back to the first moment and the duration from the fault time point to the second moment may be set as required, and the embodiments of the present invention impose no limitation on them; the only principle is that the log information obtained lies near the fault time point.
Preferably, the log information from the 20 minutes before the fault time point and the 40 minutes after it may be obtained and used as the first log information set for the first time period; alternatively, only the log information from a time period after the fault time point (for example, 60 minutes) may be obtained and used as the first log information set.
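As a sketch of the preferred window (20 minutes before and 40 minutes after the fault time point), the following Python filter keeps only the log entries inside that span. Representing each entry as a `(datetime, message)` pair and treating both bounds as inclusive are assumptions made for illustration; `logs_near_fault` is a hypothetical name.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

LogEntry = Tuple[datetime, str]


def logs_near_fault(logs: List[LogEntry], fault_time: datetime,
                    before: timedelta = timedelta(minutes=20),
                    after: timedelta = timedelta(minutes=40)) -> List[LogEntry]:
    """Keep the entries whose timestamps fall in [fault_time - before, fault_time + after]."""
    start, end = fault_time - before, fault_time + after
    return [(ts, msg) for ts, msg in logs if start <= ts <= end]


fault = datetime(2015, 4, 21, 12, 0)
logs = [(datetime(2015, 4, 21, 11, 30), "a"), (datetime(2015, 4, 21, 11, 45), "b"),
        (datetime(2015, 4, 21, 12, 30), "c"), (datetime(2015, 4, 21, 12, 50), "d")]
print([msg for _, msg in logs_near_fault(logs, fault)])  # ['b', 'c']
```

The alternative in the text (only the 60 minutes after the fault) corresponds to calling the function with `before=timedelta(0)` and `after=timedelta(minutes=60)`.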
Analysis unit 203, configured to analyze, according to a preset analysis policy, each of the M classes of log information obtained by the acquiring unit 202, to obtain N classes of root-cause logs among the M classes, where the N classes of root-cause logs are the log information generated when the network device fails, M ≥ N ≥ 1, and the preset analysis policy is the predetermined pattern in which logs appear when the network device fails.
The determining unit 201 is further configured to determine, according to the N classes of root-cause logs obtained by the analysis unit 203, the cause of the network device fault.
Further, the determining unit 201 may determine the fault time point of the network device with an existing manual method, or may locate it with the following method:
Obtain at least one log entry generated by the network device within a time period;
Process the at least one log entry to form a second log behavior matrix, where the second log behavior matrix includes X log behavior vectors, each log behavior vector covers one small time interval and includes Y elements, Y is the number of log types, and the y-th element of a log behavior vector is the number of class-y log entries within the small time interval of that vector;
Evaluate the log behavior vectors in the second log behavior matrix according to a preset model to determine the fault occurrence time of the network device, where the preset model is used to filter out the log behavior vectors that match the behavioral characteristics of the network device when it fails.
The at least one log entry records the actions of the network device within the time period, and each log entry describes one independent action of the network device. A log entry may include information such as the timestamp of the event, the host or module name, the event level, a summary, and the event message. It should be noted that the network device may fail more than once within the time period.
Optionally, the determining unit 201 may process the at least one log entry with the following method to form the second log behavior matrix:
Convert the content of each obtained log entry into a preset log format;
Classify the converted log entries and replace each entry with the identifier of its class, forming a time series of class identifiers;
Divide the time series according to a preset small time interval;
For each small time interval, count the occurrences of each class identifier within it and arrange the counts into a Y-dimensional log behavior vector;
Arrange all log behavior vectors in chronological order to form the second log behavior matrix.
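The counting steps above can be sketched in Python as follows, assuming each entry has already been converted and classified into a `(timestamp_in_seconds, class_id)` pair with class identifiers 1 to Y, and that all small time intervals have the same length (the text also allows unequal ones). `build_behavior_matrix` is a hypothetical helper; with the data of the worked example for interval T1 in this section it reproduces the vector (10, 0, 20, 0, 0, 0, 70, 0, 0, 0).

```python
from typing import List, Sequence, Tuple


def build_behavior_matrix(events: Sequence[Tuple[float, int]], start: float,
                          interval: float, num_intervals: int,
                          num_types: int) -> List[List[int]]:
    """events holds (timestamp_in_seconds, class_id) pairs, class ids 1..num_types.
    Row r counts the entries falling in [start + r*interval, start + (r+1)*interval),
    so rows come out in chronological order regardless of input order."""
    matrix = [[0] * num_types for _ in range(num_intervals)]
    for ts, class_id in events:
        row = int((ts - start) // interval)
        if 0 <= row < num_intervals and 1 <= class_id <= num_types:
            matrix[row][class_id - 1] += 1  # column y-1 holds class y
    return matrix


# Interval T1: 10 entries of class 1, 20 of class 3, 70 of class 7.
events = [(1.0, 1)] * 10 + [(2.0, 3)] * 20 + [(3.0, 7)] * 70
matrix = build_behavior_matrix(events, start=0.0, interval=60.0,
                               num_intervals=2, num_types=10)
print(matrix[0])  # [10, 0, 20, 0, 0, 0, 70, 0, 0, 0]
```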
The preset log format may be set as required, and the embodiments of the present invention impose no limitation on it.
A class identifier indicates a log type. For example, if the log entry "Apr 21 2015 12:12:12 User login" belongs to log type 1, the entry can be represented by the number "1".
Preferably, each converted log entry may be classified with hierarchical clustering, a classic algorithm in artificial intelligence. A cluster analysis tool based on the q-gram algorithm measures the similarity between strings, the q-gram distance serves as the dissimilarity between different log entries, and the converted log entries are clustered accordingly; the optimal number of log types is obtained by adjusting the clustering parameter q. Different values of q produce different results; based on extensive experiments, q is preferably 3 in the present invention, a value at which the parameter has little effect on the log clustering result. Implementation details are not repeated here.
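One way to realize this clustering step is sketched below in Python. It assumes the q-gram distance is the sum, over all q-grams, of the absolute difference in their counts in the two strings (Ukkonen's definition), and it uses single-linkage clustering cut at a fixed distance threshold, implemented with union-find. The threshold value and the helper names `qgram_distance` and `cluster_logs` are illustrative assumptions, not parameters given in the text.

```python
from collections import Counter
from typing import List


def qgram_profile(s: str, q: int = 3) -> Counter:
    """Multiset of all length-q substrings of s."""
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))


def qgram_distance(a: str, b: str, q: int = 3) -> int:
    """Sum over all q-grams of the absolute difference of their counts."""
    pa, pb = qgram_profile(a, q), qgram_profile(b, q)
    return sum(abs(pa[g] - pb[g]) for g in set(pa) | set(pb))


def cluster_logs(messages: List[str], threshold: int, q: int = 3) -> List[int]:
    """Single-linkage clustering cut at `threshold`: two messages share a class
    if a chain of pairwise q-gram distances <= threshold links them.
    Returns a 1-based class identifier for every message."""
    parent = list(range(len(messages)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(messages)):
        for j in range(i + 1, len(messages)):
            if qgram_distance(messages[i], messages[j], q) <= threshold:
                parent[find(i)] = find(j)

    class_ids: dict = {}
    labels = []
    for i in range(len(messages)):
        root = find(i)
        class_ids.setdefault(root, len(class_ids) + 1)
        labels.append(class_ids[root])
    return labels


labels = cluster_logs(["User u1 login", "User u2 login",
                       "Interface eth0 down", "Interface eth1 down"], threshold=10)
print(labels)  # [1, 1, 2, 2]
```

Messages that differ in a single character are 6 apart with q = 3 (three q-grams change on each side), so a threshold of 10 keeps them together while separating unrelated messages.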
The small time intervals of the log behavior vectors may or may not be equal and may be set as required, for example to 1 minute or 5 minutes; the embodiments of the present invention impose no limitation on them. For example, if the number of log types is Y and X time periods T1, ..., TX are obtained by dividing according to the preset small time interval, the constructed second log behavior matrix is:

$$
\begin{pmatrix}
x_{T_1,1} & x_{T_1,2} & \cdots & x_{T_1,Y} \\
x_{T_2,1} & x_{T_2,2} & \cdots & x_{T_2,Y} \\
\vdots & \vdots & \ddots & \vdots \\
x_{T_X,1} & x_{T_X,2} & \cdots & x_{T_X,Y}
\end{pmatrix}
$$

where $(x_{T_1,1}, x_{T_1,2}, \ldots, x_{T_1,Y})$ is the log behavior vector of small time interval T1, and its y-th element $x_{T_1,y}$ is the number of log entries belonging to class y. For example, suppose there are 10 log types, identified one-to-one by the numbers 1 to 10. If 100 log entries are obtained in small time interval T1, of which 10 have class identifier 1, 20 have class identifier 3 and 70 have class identifier 7, the log behavior vector of T1 is (10, 0, 20, 0, 0, 0, 70, 0, 0, 0).
Optionally, the determining unit 201 is specifically configured to determine the fault occurrence time of the network device in either of the following two manners:
(1) Calculate the log frequency and the log kind of each log behavior vector;
For any log behavior vector in the second log behavior matrix, calculate the log frequency variance and the log kind variance between that log behavior vector and at least one log behavior vector adjacent to it;
If the mean of the log frequency variance and the log kind variance is greater than a preset threshold, determine the small time interval of that log behavior vector as the fault occurrence time of the network device.
The preset threshold can be obtained by analyzing a large number of fault logs, and the present invention imposes no limitation on it. If the mean of the log frequency variance and the log kind variance is greater than the preset threshold, the log frequency and log kind of the log behavior vector have changed abruptly, indicating that a network fault occurred in that time period; if the mean is less than or equal to the preset threshold, the log frequency and log kind of the log behavior vector reflect the behavior of the network device under normal operation.
It should be noted that the at least one log behavior vector adjacent to a given log behavior vector may be several vectors before it, several vectors after it, or several vectors on both sides; their number may be set as required, and the embodiments of the present invention impose no limitation on it. Preferably, based on extensive experiments, the adjacent vectors are the four log behavior vectors immediately after the given vector.
For example, let the calculated log frequency variance and log kind variance be $a_i$ and $b_i$ respectively. If

$$\frac{a_i + b_i}{2} > \lambda_1,$$

where $\lambda_1$ is the preset threshold obtained by analyzing a large number of fault logs, the time corresponding to this vector is determined as the fault time.
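Manner (1) can be sketched in Python as below. The sketch assumes that the log frequency of a vector is the total number of log entries in its interval, that the log kind is the number of distinct log types present, and that each variance is the population variance over the vector and the four vectors that follow it (the preferred neighborhood above). `detect_by_variance` is a hypothetical name, and the last four rows, which lack a full neighborhood, are simply skipped.

```python
from statistics import pvariance
from typing import List


def detect_by_variance(matrix: List[List[int]], threshold: float,
                       neighbors: int = 4) -> List[int]:
    """Return the row indices i for which (a_i + b_i) / 2 > threshold, where a_i
    and b_i are the population variances of log frequency and log kind over
    row i and the `neighbors` rows after it."""
    freq = [sum(row) for row in matrix]                       # entries per interval
    kinds = [sum(1 for c in row if c > 0) for row in matrix]  # distinct types per interval
    flagged = []
    for i in range(len(matrix) - neighbors):  # rows without a full neighborhood are skipped
        a_i = pvariance(freq[i:i + neighbors + 1])
        b_i = pvariance(kinds[i:i + neighbors + 1])
        if (a_i + b_i) / 2 > threshold:
            flagged.append(i)
    return flagged


# 14 quiet intervals with one burst at row 5.
matrix = [[5, 0, 0] for _ in range(14)]
matrix[5] = [50, 10, 10]
print(detect_by_variance(matrix, threshold=100.0))  # [1, 2, 3, 4, 5]
```

Every neighborhood that contains the burst is flagged, so the burst row appears together with the rows just before it; manner (2) or the combination described later can narrow this down.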
It should be noted that for periodic logs the number of entries generated per unit time does not change; that is, their log frequency is constant. For periodic logs, an abrupt change in frequency is therefore not meaningful for fault detection in the manner above and distorts the detection result. To address this, the present invention proposes a log weighting method based on information retrieval techniques that takes the distribution of every class of log into account and effectively improves the accuracy of delimiting the fault time. Optionally, when the network device generates periodic logs, the embodiments of the present invention further perform the following processing before calculating the log frequency and log kind of each log behavior vector:
Assign a weight to the y-th element of each log behavior vector according to the weighting formula;
Here, the y-th element is any element of the log behavior vector;
$n_y$ is the number of small time intervals in which class-y log information appears, that is, class-y log information appears in $n_y$ small time intervals; Std(y) is the distribution variance of class-y log information.
The distribution variance of class-y log information is the variance between the number of class-y log entries in the given log behavior vector and the numbers of class-y log entries in all other log behavior vectors of the second log behavior matrix.
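The weighting formula itself is not reproduced in this text, so the Python sketch below only computes its two inputs as defined above, n_y and Std(y) (taken here as the population variance of column y of the matrix), without assuming how they are combined. It also shows why periodic logs can be told apart: a type logged at a constant rate has zero distribution variance. `weighting_inputs` is a hypothetical name.

```python
from statistics import pvariance
from typing import List, Tuple


def weighting_inputs(matrix: List[List[int]]) -> List[Tuple[int, float]]:
    """For every log type y (column of the matrix) return (n_y, Std(y)):
    n_y is the number of small intervals in which type y appears, and Std(y)
    is taken as the population variance of the type-y counts over all rows."""
    result = []
    for column in zip(*matrix):
        n_y = sum(1 for c in column if c > 0)
        result.append((n_y, float(pvariance(column))))
    return result


# Type 1 is a periodic heartbeat (one entry per interval); type 2 bursts once.
matrix = [[1, 0], [1, 4], [1, 0], [1, 0]]
print(weighting_inputs(matrix))  # [(4, 0.0), (1, 3.0)]
```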
(2) Traverse each of the X log behavior vectors and compare the similarity between the log behavior vector and the adjacent log behavior vector that follows it in time, obtaining a comparison value for that log behavior vector;
After the traversal, sort the comparison values, which correspond one-to-one to the X log behavior vectors, in descending order;
Determine the small time intervals of the log behavior vectors corresponding to the top k values after sorting as the fault occurrence time of the network device, where k is an integer greater than or equal to 1.
Optionally, the similarity between a log behavior vector and the adjacent log behavior vector that follows it may be compared according to a formula
to obtain the comparison value of that log behavior vector, where t is the small time interval of the log behavior vector and
$x_{t,y}$ is the y-th element of the log behavior vector in row t.
In the embodiments of the present invention, k is an integer greater than or equal to 1 and may be chosen empirically. Alternatively, a threshold may be set, and the log behavior vectors whose comparison values exceed the threshold are determined as abnormal log behavior vectors, that is, as the fault occurrence points of the network device.
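Since the comparison formula is also missing from this text, the Python sketch below substitutes the Euclidean distance between row t and row t + 1 as the comparison value (a larger value means the rows are less similar); this choice is an assumption, as is the name `rank_by_dissimilarity`. It then returns the k rows with the largest values.

```python
import math
from typing import List


def rank_by_dissimilarity(matrix: List[List[int]], k: int) -> List[int]:
    """Compare each row t with row t + 1 and return the indices of the k rows
    whose comparison value (Euclidean distance, an assumed stand-in for the
    text's formula) is largest."""
    values = [(math.dist(matrix[t], matrix[t + 1]), t) for t in range(len(matrix) - 1)]
    values.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep time order
    return [t for _, t in values[:k]]


matrix = [[5, 0], [5, 0], [5, 0], [40, 9], [5, 0], [5, 0]]
print(rank_by_dissimilarity(matrix, k=2))  # [2, 3]
```

The threshold variant described above would instead keep every `t` whose distance exceeds a chosen cut-off.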
It should be noted that the two manners above may be performed separately or in combination to locate the time of a network fault more accurately. For example, manner (1) may first determine that the frequency and kind of the log behavior vectors in rows 1 and 5 change abruptly, making them candidate fault occurrence points; manner (2) then compares the similarity of only rows 1 and 5 to determine whether row 1 or row 5 is the fault occurrence point.
Further, the network device may generate at least one class of log information (root-cause logs) when a fault occurs, and these classes of log information exhibit clear characteristic patterns near the fault time point. The inventors therefore analyzed in advance the log information generated near a large number of fault time points and mined the characteristic patterns of root-cause logs: (1) the classes of log information generated when a fault occurs usually appear in combination, repeatedly and continuously, near the fault point; (2) a class of log information generated when a fault occurs usually appears frequently over a long time period and shows a sudden increase at the fault time point. On this basis, the inventors propose an analysis policy based on the occurrence patterns of root-cause logs, which is used to analyze the M classes of log information obtained and identify the classes that match these patterns. Optionally, the analysis unit 203 may obtain the root-cause logs in either of the following two manners:
(1) Divide the M log types corresponding to the M classes of log information into i different log combinations, where each log combination includes at least one of the M log types, each log combination includes a different set of log types, and i is an integer greater than or equal to 1;
Traverse the i log combinations to determine at least one root-cause log combination among them, where a root-cause log combination is a log combination that appears frequently and persistently in the first time period;
Process the at least one root-cause log combination;
Determine the at least one class of log information corresponding to the at least one processed root-cause log combination as the N classes of root-cause logs.
For any log combination, the analysis unit 203 may be configured to:
Divide the first time period into at least one time window and divide each of the at least one time window into at least one small time window;
Calculate a first frequency with which the log combination appears in any time window, where the first frequency is the ratio of the number of small time windows in which the log combination appears within the time window to the number of small time windows that the time window includes;
If the first frequency is greater than a first preset threshold, determine the log combination as a frequent log combination in that time window;
Calculate a second frequency with which the frequent log combination appears in the first time period, where the second frequency is the ratio of the number of time windows in which the frequent log combination appears within the first time period to the number of time windows that the first time period includes;
If the second frequency is greater than a second preset threshold, determine the frequent log combination as a root-cause log combination.
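The two-level frequency test can be sketched in Python as follows. The sketch assumes equal-length time windows and small time windows (the last one may be shorter), log entries given as `(timestamp_in_seconds, log_type)` pairs, and that a combination appears in a window when every one of its log types is logged there at least once; the names `appears` and `is_root_cause_combination` are hypothetical.

```python
from typing import FrozenSet, List, Tuple

Event = Tuple[float, int]  # (timestamp in seconds, log type)


def appears(events: List[Event], combo: FrozenSet[int], start: float, end: float) -> bool:
    """A combination appears in [start, end) if each of its types is logged there."""
    present = {log_type for ts, log_type in events if start <= ts < end}
    return combo <= present


def is_root_cause_combination(events: List[Event], combo: FrozenSet[int],
                              period_start: float, period_end: float,
                              window: float, small_window: float,
                              th1: float, th2: float) -> bool:
    total_windows = 0
    frequent_windows = 0
    w_start = period_start
    while w_start < period_end:
        w_end = min(w_start + window, period_end)
        small_total = 0
        small_hits = 0
        s_start = w_start
        while s_start < w_end:
            s_end = min(s_start + small_window, w_end)
            small_total += 1
            small_hits += appears(events, combo, s_start, s_end)
            s_start = s_end
        total_windows += 1
        if small_hits / small_total > th1:  # first frequency above threshold 1
            frequent_windows += 1
        w_start = w_end
    # second frequency above threshold 2
    return total_windows > 0 and frequent_windows / total_windows > th2


events = ([(s + 1.0, 1) for s in range(0, 600, 30)]
          + [(s + 2.0, 2) for s in range(0, 600, 30)]
          + [(10.0, 3)])
print(is_root_cause_combination(events, frozenset({1, 2}), 0, 600, 120, 30, 0.5, 0.5))  # True
print(is_root_cause_combination(events, frozenset({1, 3}), 0, 600, 120, 30, 0.5, 0.5))  # False
```

Here the 600-second period splits into five 120-second windows of four 30-second small windows each; types 1 and 2 are logged in every small window, while type 3 shows up only once.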
It should be noted that in the embodiments of the present invention, a log combination is a combination of at least one log type, and a time window is a time interval. When the time windows are divided, the time windows may or may not be equal in size, and the same holds for the small time windows. A log combination appearing in a time window means that log information of the log types included in the combination occurred in the time interval corresponding to that window.
The first preset threshold and the second preset threshold may be set as required, and the embodiments of the present invention impose no limitation on them. If the first frequency of a log combination is greater than the first preset threshold, the combination appears frequently at some moment and is determined as a frequent log combination; if it is less than or equal to the first preset threshold, the log information corresponding to the combination can be regarded as ordinary log information generated while the network device is operating normally. If the second frequency of a frequent log combination is greater than the second preset threshold, the combination that appeared frequently at some moment keeps appearing frequently throughout the first time period, that is, it repeats and appears without interruption in the first time period. It therefore matches the occurrence pattern of root-cause logs and is determined as a root-cause log combination, and the log information of the log types it includes is determined as the log information generated when the network device failed. If the second frequency of a frequent log combination is less than or equal to the second preset threshold, the combination that appeared frequently at some moment does not persist in the first time period, and its log information can be regarded as ordinary log information generated while the network device is operating normally.
In addition, the root-cause log combinations obtained may contain one another; for example, root-cause log combination (1, 2, 3) contains root-cause log combination (1, 2). In that case, if the second frequency of the contained combination is less than that of the combination containing it, the contained root-cause log combination can be rejected; but if the second frequency of the contained combination is much greater than that of the combination containing it, both combinations can be retained as independent fault phenomena. Optionally, if the at least one root-cause log combination determined by traversing the i log combinations includes a first root-cause log combination and a second root-cause log combination, and the first is contained in the second, the analysis unit 203 may further be configured to:
When the second frequency of the first root-cause log combination is greater than that of the second root-cause log combination, retain the first root-cause log combination;
When the second frequency of the first root-cause log combination is less than that of the second root-cause log combination, reject the first root-cause log combination.
Finally, the logs before the fault also contain some log combinations that appear frequently and persistently; such combinations are usually normal inspection logs unrelated to the fault. Therefore, if the at least one root-cause log combination determined by traversing the i log combinations includes a third root-cause log combination that was already a root-cause log combination before the fault time point, the analysis unit 203 may further be configured to:
Reject the third root-cause log combination.
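Both pruning rules can be sketched in Python as below, taking as input the second frequency of each root-cause log combination (represented as a frozenset of log types) and the set of combinations that already qualified before the fault time point. The text does not say what happens when two second frequencies are equal; this sketch keeps the contained combination in that case, which is an assumption. `prune_combinations` is a hypothetical name.

```python
from typing import Dict, FrozenSet, Set

Combo = FrozenSet[int]


def prune_combinations(second_freq: Dict[Combo, float],
                       pre_fault_roots: Set[Combo]) -> Dict[Combo, float]:
    """Drop combinations that were already root-cause combinations before the
    fault (routine inspection logs), then drop any combination strictly
    contained in another candidate with a higher second frequency."""
    candidates = {c: f for c, f in second_freq.items() if c not in pre_fault_roots}
    return {
        combo: freq
        for combo, freq in candidates.items()
        if not any(combo < other and freq < other_freq
                   for other, other_freq in candidates.items())
    }


second_freq = {frozenset({1, 2}): 0.6, frozenset({1, 2, 3}): 0.8,
               frozenset({4}): 0.9, frozenset({5}): 0.7, frozenset({5, 6}): 0.5}
kept = prune_combinations(second_freq, pre_fault_roots={frozenset({4})})
print(sorted(sorted(c) for c in kept))  # [[1, 2, 3], [5], [5, 6]]
```

(1, 2) is dropped because (1, 2, 3) contains it with a higher second frequency, whereas (5) survives because its second frequency exceeds that of (5, 6).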
In addition, analysis of a large number of fault-related logs and normal logs shows that normal logs usually appear periodically, are distributed fairly evenly, and occur frequently throughout the logs, whereas root-cause logs show a sudden increase near the fault point but almost never appear in the logs of non-fault patterns. This agrees with the information-theoretic principle that the more frequently something occurs, the less information it carries. The inventors therefore propose an outlier calculation method that fits log behavior patterns and select fault logs based on the resulting outlier values, implemented as described in (2):
(2) The determining unit 201 is further configured to determine M outlier values in one-to-one correspondence with the M classes of log information, where an outlier value represents how frequently a class of log information appears within a second time period and how abruptly it changes, and the second time period includes the first time period;
The analysis unit 203 is further configured to obtain the top N largest outlier values from the M outlier values and determine the N classes of log information in the first log information set that correspond to the top N largest outlier values as the N classes of root-cause logs.
Optionally, the acquiring unit 202 may further be configured to:
Before the determining unit 201 determines the M outlier values in one-to-one correspondence with the M classes of log information, obtain a second log information set generated by the network device within the second time period, where the second log information set includes at least one log entry and each log entry corresponds to a time point;
The determining unit 201 is configured to preprocess the second log information set obtained by the acquiring unit 202 to obtain a first log behavior matrix, where the first log behavior matrix includes Q groups of log behavior vectors, each group covers one time interval and includes R elements, R is the number of log types corresponding to the second log information set, R ≥ M, and the j-th element of a log behavior vector is the number of class-j log entries within the time interval of that vector;
Calculate, according to the outlier formula,
the outlier value of each of the R classes of log information, obtaining R outlier values in one-to-one correspondence with the R classes of log information;
Obtain, from the R outlier values, the M outlier values in one-to-one correspondence with the M classes of log information.
Here, the time interval is relatively long, typically tens of minutes, and the time intervals of different groups of log behavior vectors may or may not be equal.
In the formula, one component represents how frequently class-j log information (among the R classes) occurs in the second log information set, and another component represents the mutation (abrupt-change) content of class-j log information in the second log information set. q_j is the number of log behavior vectors that contain class-j log information, c_{k+1,j} is the total number of class-j log entries in the (k+1)-th time interval, and c_{k,j} is the total number of class-j log entries in the k-th time interval.
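The quantities defined above can be read directly off the log behavior matrix. The sketch below computes only q_j and the interval-to-interval changes c_{k+1,j} − c_{k,j} for each column j; how these are combined into the anomaly value is given by the formula of this embodiment and is deliberately not reproduced here. The name `anomaly_ingredients` is hypothetical, and the matrix is assumed to be a list of Q rows with R counts each.

```python
def anomaly_ingredients(matrix):
    """Per-log-type quantities used by the anomaly-value formula.

    Hypothetical helper. For each log type j (column j of the Q x R log
    behavior matrix) returns:
      q[j]     -- number of log behavior vectors (rows) in which type j occurs
      diffs[j] -- list of c[k+1][j] - c[k][j] for consecutive intervals k
    """
    if not matrix:
        return [], []
    r = len(matrix[0])
    q = [sum(1 for row in matrix if row[j] > 0) for j in range(r)]
    diffs = [
        [matrix[k + 1][j] - matrix[k][j] for k in range(len(matrix) - 1)]
        for j in range(r)
    ]
    return q, diffs
```

For the matrix [[2, 0], [0, 1], [1, 0]], this gives q = [2, 1] and the change sequences [-2, 1] for the first log type and [1, -1] for the second.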
It should be noted that the two modes above may be performed separately or in combination, so that the exact cause of a network fault can be located more accurately. For example, mode (1) may first determine that class-1 and class-5 log information form a log combination that occurs frequently and persistently, and the anomaly values of the class-1 and class-5 log information are then calculated according to mode (2). If the first log information set corresponds to class-1 and class-5 logs, and the anomaly value of the class-1 logs is among the M largest anomaly values, the class-1 log information is determined to be the root-cause log of the fault. This improves the accuracy of network fault cause analysis.
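The combined use of the two modes can be sketched as a filter: candidates produced by mode (1) are kept only if their mode (2) anomaly value ranks among the largest ones. This is a minimal illustration assuming the anomaly values are already available as a mapping from log type to value; the function name `root_cause_logs` and the sample values are hypothetical.

```python
def root_cause_logs(candidate_types, anomaly_values, top_m):
    """Keep mode (1) candidates whose mode (2) anomaly value is in the top_m.

    Hypothetical helper.
    candidate_types: log types that mode (1) found in a frequent, persistent
                     log combination.
    anomaly_values:  mapping log type -> anomaly value computed by mode (2).
    top_m:           how many of the largest anomaly values count as "top".
    """
    ranked = sorted(anomaly_values, key=anomaly_values.get, reverse=True)
    top = set(ranked[:top_m])
    return [t for t in candidate_types if t in top]
```

With illustrative values `{1: 9.0, 2: 7.5, 3: 4.0, 5: 1.2}`, candidates `[1, 5]` and `top_m=2`, only class 1 survives, matching the example in the text.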
Because a piece of log information records the activity of the network device at a time point, the determining unit 201 may directly take the recorded information corresponding to the N classes of root-cause logs as the reason the network device failed;
alternatively, an existing analysis method may be applied to the N classes of root-cause logs to determine the most fundamental fault cause that gave rise to them;
or, building on mode (1) adopted by the analysis unit 203, the number of times each of the at least one merged log type occurs in the at least one root-cause log combination that contains it is recorded, and the log information corresponding to the log type with the highest count is taken directly as the root cause of the network device failure.
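The counting option above can be sketched as follows, assuming each root-cause log combination is given as a list of log types. In this sketch a type is counted at most once per combination, and ties go to the type encountered first; both choices are assumptions, as is the function name `most_frequent_log_type`.

```python
def most_frequent_log_type(combinations):
    """Return the log type that occurs in the most root-cause log combinations.

    Hypothetical helper. Each combination is an iterable of log types; a
    type is counted at most once per combination. Ties go to the type that
    was encountered first. Returns None if there are no log types.
    """
    counts = {}
    for combo in combinations:
        for log_type in dict.fromkeys(combo):  # de-duplicate, keep order
            counts[log_type] = counts.get(log_type, 0) + 1
    if not counts:
        return None
    # max() over a dict scans keys in insertion order, so ties keep the first.
    return max(counts, key=counts.get)
```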
As can be seen from the above, the embodiments of the present invention provide an analysis device that determines the fault time point of a network device; obtains a first log information set generated by the network device within a first time period, where the first log information set includes M classes of log information, M is an integer greater than or equal to 1, and the first time period extends from a first moment before the fault time point to a second moment after the fault time point; analyzes each of the M classes of log information according to a preset analysis strategy to obtain N classes of root-cause logs among the M classes; and determines, according to the N classes of root-cause logs, the reason the network device failed. In this way, the classes of log information around the fault time point are analyzed automatically, the log information that matches the pattern root-cause logs follow when a fault occurs is identified, and the root cause of the network device failure is determined from that log information. Root-cause analysis of network device failures is thus automated, and its efficiency is improved.
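The extraction of the first log information set can be sketched as a time-window filter around the fault time point, grouped by log type so that M is the number of resulting groups. The sketch assumes `(timestamp, log_type)` records, an inclusive window, and offsets `before` and `after` measured in the same unit as the timestamps; the function name `first_log_set` is hypothetical.

```python
from collections import defaultdict


def first_log_set(logs, fault_time, before, after):
    """Group logs from [fault_time - before, fault_time + after] by log type.

    Hypothetical helper. logs is an iterable of (timestamp, log_type) pairs.
    Returns a dict log_type -> list of timestamps; its length is M, the
    number of log classes in the first log information set.
    """
    start, end = fault_time - before, fault_time + after
    by_type = defaultdict(list)
    for ts, log_type in logs:
        if start <= ts <= end:  # inclusive window (an assumption)
            by_type[log_type].append(ts)
    return dict(by_type)
```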
Those skilled in the art can clearly understand that, for convenience and brevity of description, reference may be made to the corresponding processes in the foregoing method embodiments for the specific working processes of the system and units described above; details are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed system, device, and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. The division into units is merely a division by logical function, and other divisions are possible in an actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed.
Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network elements. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in hardware, or in hardware combined with software functional units.
An integrated unit implemented as a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform some of the steps of the methods described in the embodiments of the present invention. The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
A person of ordinary skill in the art will understand that all or some of the steps of the methods in the foregoing embodiments may be completed by a program instructing relevant hardware (such as a processor). The program may be stored in a computer-readable storage medium, and the storage medium may include a read-only memory, a random access memory, a magnetic disk, an optical disc, or the like.
Finally, it should be noted that the foregoing embodiments are merely intended to illustrate, not to limit, the technical solutions of the present invention. Although the present invention has been described in detail with reference to the foregoing embodiments, persons of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments or make equivalent replacements of some of their technical features, and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.