
CN102135907B - Alarm processing method of super-large scale cluster - Google Patents

Info

Publication number
CN102135907B
Authority
CN
China
Prior art keywords
information
message
buffer
message buffer
multithreading
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN 201110069524
Other languages
Chinese (zh)
Other versions
CN102135907A (en)
Inventor
史登连
赵欢
王清
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongke Tenglong Information Technology Co.,Ltd.
Original Assignee
Dawning Information Industry Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dawning Information Industry Co Ltd
Priority to CN 201110069524
Publication of CN102135907A
Application granted
Publication of CN102135907B
Legal status: Active
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides an alarm processing method for super-large-scale clusters that uses three buffers: a message buffer, a multithreading buffer, and a pending message buffer. Processing comprises an alarm information flow, a recovery information flow, and an event information flow. The method addresses the performance problems of alarm processing in super-large-scale clusters (more than 2000 computers): it can process alarm information at a rate of 100 messages per second and relieves the performance bottlenecks that arise at this scale.

Description

An alarm processing method for ultra-large-scale clusters
Technical field
The present invention relates to the field of cluster monitoring, and in particular to an alarm processing method for ultra-large-scale clusters.
Background
A monitoring system for a high-performance computing cluster produces a large volume of performance alarm data. At the same time, administrators need to grasp the state of the cluster promptly and monitor it in real time, so the alarm data produced by the equipment must be delivered to users or administrators in a sensible way that lets problems be resolved quickly. With traditional approaches, processing the alarm information of an ultra-large-scale cluster (more than 2000 computers) runs into various performance bottlenecks. The present technique addresses the performance problems of alarm processing at this scale, namely the performance of large-scale data verification, storage, update, and query.
The prior art does not make full use of parallelism: for example, it does not use fine-grained concurrent data structures such as ConcurrentHashMap, and the reception and storage of information are not carried out in parallel.
Summary of the invention
To solve the above problems, the present invention uses three parallel buffers, which effectively improves processing performance.
An alarm processing method for ultra-large-scale clusters, the method comprising:
alarm information processing, recovery information processing, and event information processing;
wherein the alarm information processing is:
A. obtain a message from the message buffer;
B. insert the message into the pending message buffer;
C. check the message type; if it is alarm information, process it with the alarm information processing object;
D. check whether the pending message buffer contains a message that requires this message to be delayed; if so, delay processing of this message; if not, proceed to the next step;
E. check whether the multithreading temporary buffer contains a duplicate alarm message; if not, insert the alarm message into the multithreading temporary buffer; if so, delay processing of this message;
F. check whether the active alarm information view in the database contains a duplicate message;
G. if there is a duplicate, use a database update operation: increment the repeat-count field of the alarm in the table by 1 and update the alarm time; if there is no duplicate, use a database insert operation;
H. delete the message from the multithreading temporary buffer;
I. delete the message from the pending message buffer;
J. the thread repeats step A, invoking other handlers at the same time;
the recovery information processing flow is:
A1. obtain a message from the message buffer;
B1. insert the message into the pending message buffer;
C1. check the message type; if it is recovery information, process it with the recovery information processing object;
D1. check whether the pending message buffer contains a message that requires this message to be delayed; if so, delay processing of this message; if not, proceed to the next step;
E1. check whether the multithreading temporary buffer contains a duplicate message; if not, insert this message into the multithreading temporary buffer; if so, delay processing of this message;
F1. check whether the active alarm information view in the database contains the alarm message corresponding to this recovery message;
G1. if so, update the database with this message; if not, check the recovery message's discardable flag: if the flag is false, delay processing of this message and set the flag to true; if the flag is true, discard the message;
H1. delete the message from the multithreading temporary buffer;
I1. delete the message from the pending message buffer;
J1. the thread repeats step A1, invoking other handlers at the same time;
the event information processing is:
A2. obtain a message from the message buffer;
B2. insert the message into the pending message buffer;
C2. check the message type; if it is event information, process it with the event information processing object;
D2. store the event information directly in the database.
Preferably, the message buffer stores host IDs and references to preprocessing blocking queues.
Preferably, insert and remove operations on the multithreading buffer are performed by thread-pool threads.
Preferably, insert and remove operations on the pending message buffer are performed by multiple threads.
Preferably, when the message buffer contains two identical alarm messages that differ only in time, each processing thread first checks whether the pending message buffer contains a message earlier than its own; if so, processing of its message is delayed.
Preferably, when the message buffer contains an alarm message and its corresponding recovery message, the two processing threads first check whether the pending message buffer contains a message earlier than their own; if so, processing of that message is delayed.
Preferably, the discardable flag defaults to false; a value of true indicates that one wait has already been performed, and the message is then discarded.
The present invention effectively addresses the performance problems of alarm processing in ultra-large-scale clusters: it can process 100 alarm messages per second, and it resolves the various performance bottlenecks encountered when processing alarm information for clusters of more than 2000 computers.
Description of the drawings
Fig. 1 is the processing flow chart of the present invention.
Detailed description
The present invention uses three parallel buffers: the message buffer, the multithreading buffer, and the pending message buffer. The details of each buffer are as follows:
Message buffer (parallel)
ConcurrentHashMap<String, PriorityBlockingQueue<PretreatInformation>> hmInfomationBuffer
The key of the map stores the host ID; the value stores a reference to that host's preprocessing blocking queue.
1. Messages are grouped by host.
2. Within each host, messages are prioritized by alarm generation time, level, and alarm recovery.
3. The data structure is filled with messages received through the web service. One key may be served by several inserting threads; each thread handles the insertion for only one key and terminates after inserting.
4. For reading, each key corresponds to exactly one thread, so reads are single-threaded per key.
5. The priority blocking queue orders messages by priority: time > level > information type (alarm information, recovery information). Retrieving a message is a blocking operation.
6. ConcurrentHashMap is thread-safe for its atomic operations. It is more efficient under concurrency because it stores and reads data in segments, so its locks are finer-grained and contention is lower.
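The message buffer described above can be sketched in Java roughly as follows. This is an illustration under stated assumptions, not the patented implementation: the fields of `PretreatInformation`, the `Type` enum, the class name `MessageBufferSketch`, the queue capacity, and the sort directions (earlier time first, higher level first, alarm before recovery) are all hypothetical; the text only gives the ordering time > level > information type.

```java
import java.util.Comparator;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.PriorityBlockingQueue;

class MessageBufferSketch {
    // Declaration order is used for sorting: ALARM before RECOVERY (assumption).
    enum Type { ALARM, RECOVERY }

    // Hypothetical message record; the patent names the type but not its fields.
    static final class PretreatInformation {
        final String hostId;
        final long time;   // generation time, e.g. epoch milliseconds
        final int level;   // severity; larger means more severe (assumption)
        final Type type;

        PretreatInformation(String hostId, long time, int level, Type type) {
            this.hostId = hostId;
            this.time = time;
            this.level = level;
            this.type = type;
        }
    }

    // Priority order from the text: time, then level, then information type.
    static final Comparator<PretreatInformation> ORDER =
        Comparator.comparingLong((PretreatInformation p) -> p.time)
            .thenComparing(Comparator.comparingInt((PretreatInformation p) -> p.level).reversed())
            .thenComparing((PretreatInformation p) -> p.type);

    // key = host ID, value = that host's preprocessing blocking queue
    final ConcurrentHashMap<String, PriorityBlockingQueue<PretreatInformation>> hmInfomationBuffer =
        new ConcurrentHashMap<>();

    // computeIfAbsent creates each host's queue exactly once, even under concurrent inserts.
    private PriorityBlockingQueue<PretreatInformation> queueFor(String hostId) {
        return hmInfomationBuffer.computeIfAbsent(hostId, k -> new PriorityBlockingQueue<>(64, ORDER));
    }

    // Called by receiver threads.
    void put(PretreatInformation info) {
        queueFor(info.hostId).put(info);
    }

    // Called by the single reader thread of a host; blocks until a message is available.
    PretreatInformation take(String hostId) throws InterruptedException {
        return queueFor(hostId).take();
    }
}
```

Because the queue is unbounded, `put` never blocks; only `take` does, which matches the statement that retrieval is a blocking operation.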
Multithreading buffer (parallel)
ConcurrentHashMap<Information, String> mapMultiThreadsBufferMap: key = information, value = thread name
This buffer requires synchronization across threads.
1. Insertions are performed by multiple threads from multiple thread pools. One key may be served by several threads in a thread pool; each thread handles the insertion for only one key.
2. For removal, several threads in the thread pool may serve one key; each thread removes only the entry for its own key from the ConcurrentHashMap.
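One way to use this buffer for duplicate detection is an atomic check-and-insert, sketched below. The `Information` fields, the equality rule (same host and alarm item, ignoring time), and the method names `tryClaim` and `release` are assumptions made for illustration; the patent does not define how two messages are judged to be duplicates.

```java
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

class MultiThreadBufferSketch {
    // Hypothetical key type: two messages are duplicates when host and alarm item match,
    // regardless of timestamp (assumption).
    static final class Information {
        final String hostId;
        final String alarmItem;
        final long time;

        Information(String hostId, String alarmItem, long time) {
            this.hostId = hostId;
            this.alarmItem = alarmItem;
            this.time = time;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Information)) return false;
            Information other = (Information) o;
            return hostId.equals(other.hostId) && alarmItem.equals(other.alarmItem);
        }

        @Override
        public int hashCode() {
            return Objects.hash(hostId, alarmItem);
        }
    }

    // key = information, value = name of the thread currently processing it
    final ConcurrentHashMap<Information, String> mapMultiThreadsBufferMap = new ConcurrentHashMap<>();

    // Atomically claims the message. Returns false if a duplicate is already being
    // processed, in which case the caller should delay this message.
    boolean tryClaim(Information info, String threadName) {
        return mapMultiThreadsBufferMap.putIfAbsent(info, threadName) == null;
    }

    // Releases the claim; the entry is removed only if it is owned by the given thread.
    boolean release(Information info, String threadName) {
        return mapMultiThreadsBufferMap.remove(info, threadName);
    }
}
```

Using `putIfAbsent` rather than a separate `containsKey` followed by `put` keeps the check and the insert in a single atomic step, so two threads cannot both claim the same alarm.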
Pending message buffer (parallel)
ConcurrentHashMap<String, Information> mapInfoToDealBufferMap: key = id (a generated random number), value = information
1. Insertions are performed by multiple threads. Each key corresponds to one thread, and each thread handles the insertion for only one key, so insertion is single-threaded per key.
2. For removal, several threads in the thread pool may serve one key; each thread removes only the entry for its own key from the ConcurrentHashMap.
Alarm information processing flow (multi-threaded in parallel within each host)
1. Obtain a message from the message buffer.
2. Insert the message into the pending message buffer.
3. Check the message type; if it is alarm information, process it with the alarm information processing object.
4. Check whether the pending message buffer contains a message that requires this message to be delayed; if so, delay processing of this message; if not, proceed to the next step.
5. Check whether the multithreading temporary buffer contains a duplicate alarm message; if not, insert the alarm message into the multithreading temporary buffer; if so, delay processing of this message.
6. Check whether the active alarm information view in the database contains a duplicate message.
7. If there is a duplicate, use a database update operation: increment the repeat-count field of the alarm in the table by 1 and update the alarm time. If there is no duplicate, use a database insert operation.
8. Delete the message from the multithreading temporary buffer.
9. Delete the message from the pending message buffer.
10. The thread repeats step 1, invoking other handlers at the same time.
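Steps 2 and 4 through 9 of the alarm flow can be sketched as follows. This is a simplified illustration, not the patented implementation: an in-memory map stands in for the database's active alarm view, a string `key` stands in for the equality of two alarm messages, and the names `Alarm`, `ActiveAlarm`, and `process` are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class AlarmFlowSketch {
    // Hypothetical alarm message: `id` is the pending-buffer key, `key` identifies the alarm.
    static final class Alarm {
        final String id;
        final String key;
        final long time;

        Alarm(String id, String key, long time) {
            this.id = id;
            this.key = key;
            this.time = time;
        }
    }

    // Stand-in for one row of the database's active alarm view.
    static final class ActiveAlarm {
        final long firstTime;
        long lastTime;
        int repeatCount;

        ActiveAlarm(long time) {
            this.firstTime = time;
            this.lastTime = time;
        }
    }

    final ConcurrentHashMap<String, Alarm> pending = new ConcurrentHashMap<>();   // pending message buffer
    final ConcurrentHashMap<String, String> inFlight = new ConcurrentHashMap<>(); // multithreading buffer
    final Map<String, ActiveAlarm> activeAlarms = new ConcurrentHashMap<>();      // in-memory "database"

    // Returns true if the alarm was stored, false if it must be delayed and retried later.
    boolean process(Alarm alarm) {
        pending.put(alarm.id, alarm);                                    // step 2
        boolean earlierPending = pending.values().stream()               // step 4
            .anyMatch(p -> p.key.equals(alarm.key) && p.time < alarm.time);
        if (earlierPending) {
            return false;
        }
        String threadName = Thread.currentThread().getName();
        if (inFlight.putIfAbsent(alarm.key, threadName) != null) {       // step 5
            return false;
        }
        try {
            activeAlarms.compute(alarm.key, (k, row) -> {                // steps 6 and 7
                if (row == null) {
                    return new ActiveAlarm(alarm.time);                  // no duplicate: insert
                }
                row.repeatCount++;                                       // duplicate: update
                row.lastTime = alarm.time;
                return row;
            });
        } finally {
            inFlight.remove(alarm.key);                                  // step 8
        }
        pending.remove(alarm.id);                                        // step 9
        return true;
    }
}
```

A delayed alarm stays in the pending buffer, so a later retry of `process` with the same message simply overwrites its own entry before running the checks again.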
Recovery information processing flow (multi-threaded in parallel within each host)
1. Obtain a message from the message buffer.
2. Insert the message into the pending message buffer.
3. Check the message type; if it is recovery information, process it with the recovery information processing object.
4. Check whether the pending message buffer contains a message that requires this message to be delayed; if so, delay processing of this message; if not, proceed to the next step.
5. Check whether the multithreading temporary buffer contains a duplicate message; if not, insert this message into the multithreading temporary buffer; if so, delay processing of this message.
6. Check whether the active alarm information view in the database contains the alarm message corresponding to this recovery message.
7. If so, update the database with this message. If not, check the recovery message's discardable flag (its initial default value is false; a value of true indicates that one wait has already been performed, so the message is discarded). If the flag is false, delay processing of this message and set the flag to true; if it is true, discard the message.
8. Delete the message from the multithreading temporary buffer.
9. Delete the message from the pending message buffer.
10. The thread repeats step 1, invoking other handlers at the same time.
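The discardable-flag logic of step 7 amounts to "wait once, then give up". A minimal sketch, assuming a set of alarm keys as a stand-in for the database's active alarm view; the names `Recovery`, `Outcome`, and `handle` are hypothetical, and modelling the database update as removal from the active set is an assumption.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class RecoveryFlowSketch {
    enum Outcome { STORED, DELAYED, DISCARDED }

    // Hypothetical recovery message; the discardable flag starts as false, as in the text.
    static final class Recovery {
        final String alarmKey;
        boolean discardable = false;

        Recovery(String alarmKey) {
            this.alarmKey = alarmKey;
        }
    }

    // Stand-in for the active alarm view: keys of alarms that have not yet been recovered.
    final Set<String> activeAlarmKeys = ConcurrentHashMap.newKeySet();

    // Step 7 of the recovery flow.
    Outcome handle(Recovery r) {
        if (activeAlarmKeys.remove(r.alarmKey)) {
            return Outcome.STORED;      // matching alarm found: record the recovery
        }
        if (!r.discardable) {
            r.discardable = true;       // wait once in case the alarm is still being stored
            return Outcome.DELAYED;
        }
        return Outcome.DISCARDED;       // already waited once: drop the message
    }
}
```

The single retry gives a concurrently processed alarm time to reach the database, while the flag bounds the wait so that a recovery message with no matching alarm cannot circulate forever.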
Event information processing flow
1. Obtain a message from the message buffer.
2. Insert the message into the pending message buffer.
3. Check the message type; if it is event information, process it with the event information processing object.
4. Store the event information directly in the database.
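All three flows begin by checking the message type and handing the message to the matching processing object. A small sketch of that dispatch follows, with the event handler writing straight to storage. The `Type` enum, the `Message` class, and the two lists standing in for the database and for the alarm and recovery flows are assumptions; the lists are not thread-safe and only serve a single-threaded illustration.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

class DispatchSketch {
    enum Type { ALARM, RECOVERY, EVENT }

    // Hypothetical message carrying only a type and a text payload.
    static final class Message {
        final Type type;
        final String text;

        Message(Type type, String text) {
            this.type = type;
            this.text = text;
        }
    }

    final List<String> eventTable = new ArrayList<>();  // stand-in for the database event table
    final List<String> forwarded = new ArrayList<>();   // messages handed to the alarm/recovery flows
    final Map<Type, Consumer<Message>> handlers = new EnumMap<>(Type.class);

    DispatchSketch() {
        // Events need no duplicate or ordering checks, so they are stored directly.
        handlers.put(Type.EVENT, m -> eventTable.add(m.text));
        handlers.put(Type.ALARM, m -> forwarded.add("alarm:" + m.text));
        handlers.put(Type.RECOVERY, m -> forwarded.add("recovery:" + m.text));
    }

    // The type check: pick the processing object for this message type.
    void dispatch(Message m) {
        handlers.get(m.type).accept(m);
    }
}
```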
Delayed-processing scenario analysis
Delayed processing is needed in the following scenarios:
Scenario one: the message buffer contains two identical alarm messages that differ only in time.
The thread pool obtains both messages at the same time, and two threads begin processing them. Processing them directly may cause:
1. Both messages perform an insert operation, so one of the two fails to be stored.
2. The later of the two messages is stored in the database first and the other is applied as an update; the first alarm time is then later than the second alarm time, which is a logical anomaly.
Therefore, both threads must first check whether the pending message buffer contains a message earlier than their own; if so, their message must be delayed.
Scenario two: the message buffer contains an alarm message and the recovery message corresponding to it.
The thread pool obtains both messages at the same time, and two threads begin processing them. Processing them directly may cause:
The recovery message is discarded, and the alarm is never recovered.
Therefore, both threads must first check whether the pending message buffer contains a message earlier than their own; if so, their message must be delayed.
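The check shared by both scenarios can be sketched as a scan of the pending message buffer for an earlier message about the same alarm. The `Information` fields, the use of a UUID as the generated random id, and the method names are assumptions made for illustration.

```java
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

class PendingBufferSketch {
    // Hypothetical message: alarms and recoveries for the same alarm share `alarmKey`.
    static final class Information {
        final String alarmKey;
        final long time;
        final boolean recovery;  // true for a recovery message, false for an alarm

        Information(String alarmKey, long time, boolean recovery) {
            this.alarmKey = alarmKey;
            this.time = time;
            this.recovery = recovery;
        }
    }

    // key = generated random id, value = information
    final ConcurrentHashMap<String, Information> mapInfoToDealBufferMap = new ConcurrentHashMap<>();

    // Inserts the message under a fresh random id and returns that id.
    String add(Information info) {
        String id = UUID.randomUUID().toString();
        mapInfoToDealBufferMap.put(id, info);
        return id;
    }

    // True if another pending message for the same alarm is earlier than `info`.
    boolean mustDelay(Information info) {
        return mapInfoToDealBufferMap.values().stream()
            .anyMatch(o -> o != info && o.alarmKey.equals(info.alarmKey) && o.time < info.time);
    }

    void remove(String id) {
        mapInfoToDealBufferMap.remove(id);
    }
}
```

In scenario one the later alarm is delayed until the earlier one has left the buffer, so the insert happens before the update. In scenario two the recovery message waits for its alarm. Messages with identical timestamps are not separated by this check and would rely on the multithreading buffer instead.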

Claims (7)

1. An alarm processing method for ultra-large-scale clusters, characterized in that the method comprises:
alarm information processing, recovery information processing, and event information processing;
wherein the alarm information processing is:
A. obtain a message from the message buffer;
B. insert the message into the pending message buffer;
C. check the message type; if it is alarm information, process it with the alarm information processing object;
D. check whether the pending message buffer contains a message that requires this message to be delayed; if so, delay processing of this message; if not, proceed to the next step;
E. check whether the multithreading temporary buffer contains a duplicate alarm message; if not, insert the alarm message into the multithreading temporary buffer; if so, delay processing of this message;
F. check whether the active alarm information view in the database contains a duplicate message;
G. if there is a duplicate, use a database update operation: increment the repeat-count field of the alarm in the table by 1 and update the alarm time; if there is no duplicate, use a database insert operation;
H. delete the message from the multithreading temporary buffer;
I. delete the message from the pending message buffer;
J. the thread repeats step A, invoking other handlers at the same time;
the recovery information processing flow is:
A1. obtain a message from the message buffer;
B1. insert the message into the pending message buffer;
C1. check the message type; if it is recovery information, process it with the recovery information processing object;
D1. check whether the pending message buffer contains a message that requires this message to be delayed; if so, delay processing of this message; if not, proceed to the next step;
E1. check whether the multithreading temporary buffer contains a duplicate message; if not, insert this message into the multithreading temporary buffer; if so, delay processing of this message;
F1. check whether the active alarm information view in the database contains the alarm message corresponding to this recovery message;
G1. if so, update the database with this message; if not, check the recovery message's discardable flag: if the flag is false, delay processing of this message and set the flag to true; if the flag is true, discard the message;
H1. delete the message from the multithreading temporary buffer;
I1. delete the message from the pending message buffer;
J1. the thread repeats step A1, invoking other handlers at the same time;
the event information processing is:
A2. obtain a message from the message buffer;
B2. insert the message into the pending message buffer;
C2. check the message type; if it is event information, process it with the event information processing object;
D2. store the event information directly in the database.
2. The alarm processing method for ultra-large-scale clusters as claimed in claim 1, characterized in that the message buffer stores host IDs and references to preprocessing blocking queues.
3. The alarm processing method for ultra-large-scale clusters as claimed in claim 1, characterized in that insert and remove operations on the multithreading buffer are performed by thread-pool threads.
4. The alarm processing method for ultra-large-scale clusters as claimed in claim 1, characterized in that insert and remove operations on the pending message buffer are performed by multiple threads.
5. The alarm processing method for ultra-large-scale clusters as claimed in claim 1, characterized in that when the message buffer contains two identical alarm messages whose times differ from each other, it is first checked whether the pending message buffer contains a message earlier than this message; if so, processing of this message is delayed.
6. The alarm processing method for ultra-large-scale clusters as claimed in claim 1, characterized in that when the message buffer contains an alarm message and its corresponding recovery message, the two processing threads first check whether the pending message buffer contains a message earlier than this message; if so, processing of this message is delayed.
7. The alarm processing method for ultra-large-scale clusters as claimed in claim 1, characterized in that the discardable flag defaults to false; a value of true indicates that one wait has already been performed, and the message is then discarded.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201110069524 CN102135907B (en) 2011-03-22 2011-03-22 Alarm processing method of super-large scale cluster

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 201110069524 CN102135907B (en) 2011-03-22 2011-03-22 Alarm processing method of super-large scale cluster

Publications (2)

Publication Number Publication Date
CN102135907A CN102135907A (en) 2011-07-27
CN102135907B true CN102135907B (en) 2013-01-16

Family

ID=44295699

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201110069524 Active CN102135907B (en) 2011-03-22 2011-03-22 Alarm processing method of super-large scale cluster

Country Status (1)

Country Link
CN (1) CN102135907B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101634955A (en) * 2009-08-21 2010-01-27 中兴通讯股份有限公司 Method for processing event in radio frequency identification system and reader management terminal (RMT)

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7032226B1 (en) * 2000-06-30 2006-04-18 Mips Technologies, Inc. Methods and apparatus for managing a buffer of events in the background
US20020029266A1 (en) * 2000-09-07 2002-03-07 Edwin Tse Parallel processing architecture for alarm management network entities
JP2010204972A (en) * 2009-03-04 2010-09-16 Hitachi Ltd Computer system and statistical information acquisition method in the same

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
JP特开2010-204972A 2010.09.16

Also Published As

Publication number Publication date
CN102135907A (en) 2011-07-27

Similar Documents

Publication Publication Date Title
CN104778245B (en) Similar track method for digging and device based on magnanimity license plate identification data
TWI501097B (en) System and method of analyzing text stream message
CN101950293B (en) Log extraction method and device
CN104598565B (en) A kind of K mean value large-scale data clustering methods based on stochastic gradient descent algorithm
CN102024018A (en) On-line recovering method of junk metadata in distributed file system
Deters et al. Evaluating the definition of" stone free status" in contemporary urologic literature.
Li et al. Toward effective traffic sign detection via two-stage fusion neural networks
CN105637489A (en) Asynchronous garbage collection in a distributed database system
CN106130777A (en) System safeguarded by a kind of industrial equipment based on cloud computing
CN103530383A (en) Method for filtering safe RFID middleware redundant data
CN102135907B (en) Alarm processing method of super-large scale cluster
Yu et al. Mining frequent co-occurrence patterns across multiple data streams.
Hai et al. Mining time relaxed gradual moving object clusters
CN107992590B (en) Big data system beneficial to information comparison
CN105046217A (en) Face recognition large data amount concurrency scheme processing method
CN118885615A (en) A public opinion analysis device and method based on large language model
Li et al. Multi-view clustering integrating anchor attribute and structural information
CN105183536A (en) Optimistic time management method based on GPU
Wang et al. Tunnel security management based on association rule mining under Hadoop platform
CN105468494A (en) I/O intensive application identification method
WO2013135059A1 (en) Method for rapid data classification
CN103744899A (en) Distributed environment based mass data rapid classification method
CN107992474A (en) A kind of stream data Topics Crawling method and its system
CN110765037A (en) UBIFS data flash memory system capable of intelligently identifying thermal data
CN204695318U (en) Network security controller of computer

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20210819

Address after: Room 111-1, 1st floor, building 23, No.8 yard, Dongbei Wangxi Road, Haidian District, Beijing 100193

Patentee after: Zhongke Tenglong Information Technology Co.,Ltd.

Address before: 300384 Xiqing District, Tianjin Huayuan Industrial Zone (outside the ring) 15 1-3, hahihuayu street.

Patentee before: DAWNING INFORMATION INDUSTRY Co.,Ltd.
