Detailed Description of the Embodiments
To make the above objects, features, and advantages of the present application clearer, the embodiments of the present application are described in detail below with reference to the accompanying drawings.
Embodiment 1
Refer to Fig. 2, which is a flowchart of an embodiment of a parallel data processing method of the present application. The method comprises the following steps:
Step 201: The master device learns from a data source which pending data need to be processed, and creates one task for each item of pending data;
Specifically, this step may comprise: the master device obtains from the data source an identifier list of the pending data to be processed, the identifier list maintaining the data identifiers of all pending data; the master device extracts the identifier of each item of pending data from the identifier list and, after creating a task for that item, puts the extracted identifier into the task.
For example, refer to Fig. 3, which is a schematic diagram of a system application scenario of the present application. As shown in Fig. 3, the data source holds an identifier list of the pending data to be processed, and the identifier list maintains the data identifiers of all pending data; for example, the address information of a data item may serve as its data identifier. After the master device obtains the identifier list from the data source, it can determine from the data identifiers in the list which data at the data source are pending data to be processed. After the master device creates a task for an item of pending data, it extracts the data identifier of that item from the identifier list and puts the extracted identifier into the task. For example, after creating task 1 for pending data A, the master device extracts the data identifier of pending data A from the identifier list and puts it into task 1.
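The task creation of step 201 can be illustrated with a minimal sketch. It is not taken from the application: the names `Task`, `TaskState`, and `create_tasks`, the numbering of tasks from 1, and the example addresses are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class TaskState(Enum):
    """The four execution states named in the method."""
    UNEXECUTED = "unexecuted"
    EXECUTING = "executing"
    EXECUTED = "executed"
    MERGED = "merged"


@dataclass
class Task:
    task_id: int
    data_id: str  # data identifier of the pending data, e.g. its address
    state: TaskState = TaskState.UNEXECUTED


def create_tasks(identifier_list):
    """Create one task per data identifier and put the identifier into it."""
    return [Task(task_id=i, data_id=data_id)
            for i, data_id in enumerate(identifier_list, start=1)]


# Hypothetical identifier list obtained from the data source.
tasks = create_tasks(["ftp://source/data/A", "ftp://source/data/B"])
```

Each task carries only the identifier, not the data itself, so a slave device can later fetch the pending data directly from the data source.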
Step 202: When the master device receives a task request message sent by a slave device, it allocates a task to the requesting slave device, and it merges the execution results returned by slave devices; in addition, it dynamically records the execution state of each task, the execution state being one of: unexecuted, executing, executed, and merged;
For example, unlike the task push mechanism in the existing Hadoop system, in the embodiments of the present application the master device allocates tasks based on requests from slave devices; that is, only when the master device receives a task request message from a slave device does it allocate a task to the requesting slave device. When a slave device returns the execution result of a task, the master device merges the returned execution result. Meanwhile, the master device dynamically records the execution state of each task, the execution state being one of: unexecuted, executing, executed, and merged.
Specifically, the master device dynamically records the execution state of each task as follows: after the master device creates a task, it marks the created task as unexecuted; when the master device receives an execution result returned by a slave device, it marks the completed task as executed; and when the master device finds a task in the executed state and has merged its execution result, it marks the merged task as merged.
It should be noted that the time at which the master device creates a task, the time at which it receives an execution result returned by a slave device, and the cycle time at which it checks for tasks in the executed state have no strict order relative to one another; therefore, the embodiments of the present application do not limit the execution order of the above three marking processes. For example, when the master device newly creates a task, the new task has been neither allocated to a slave device nor executed by one, so the new task is marked as unexecuted. Whenever the master device receives the execution result of a task A returned by a slave device, it marks the completed task A as executed. Whenever the check cycle for tasks in the executed state arrives and the execution results have been merged, the master device marks the merged tasks as merged.
It should also be noted that, besides periodically checking whether any task is in the executed state, the master device may instead perform this check each time it receives an execution result returned by a slave device; the embodiments of the present application do not limit this. Of course, the former approach can effectively reduce system power consumption.
Step 203: The master device outputs the execution results of the merged tasks.
It should be noted that the master device may output the results of merged tasks in advance, or it may output the execution results of all merged tasks once all tasks are in the merged state. For example, the master device periodically checks the execution state of all tasks and, when the execution state of every task is merged, outputs the execution results of all merged tasks.
Refer to Fig. 4, which is a state transition diagram of a task in the present application. As shown in Fig. 4, when a task is created, its state is unexecuted. When a slave device requests a task and the master device selects a task from those in the unexecuted state and allocates it to the slave device, the state of the task changes from unexecuted to executing. If the master device does not receive an execution result from the slave device within a preset time after the task is allocated, the state of the task changes from executing back to unexecuted. When the slave device completes the task and returns the execution result to the master device, the state of the task changes from executing to executed. After the master device merges a task in the executed state, the state of the task changes from executed to merged.
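The transitions described for Fig. 4 can be written down as a small table of allowed state changes. The following is only a sketch: the names `TaskState`, `ALLOWED_TRANSITIONS`, and `transition` are assumptions, and rejecting any other transition with an error is an illustrative choice that the application does not prescribe.

```python
from enum import Enum


class TaskState(Enum):
    UNEXECUTED = "unexecuted"
    EXECUTING = "executing"
    EXECUTED = "executed"
    MERGED = "merged"


# The four transitions described for the state transition diagram.
ALLOWED_TRANSITIONS = {
    (TaskState.UNEXECUTED, TaskState.EXECUTING),  # allocated to a requesting slave
    (TaskState.EXECUTING, TaskState.UNEXECUTED),  # no result within the preset time
    (TaskState.EXECUTING, TaskState.EXECUTED),    # slave returned the execution result
    (TaskState.EXECUTED, TaskState.MERGED),       # master merged the execution result
}


def transition(current, new):
    """Return the new state, or raise if the change is not one of the four above."""
    if (current, new) not in ALLOWED_TRANSITIONS:
        raise ValueError(f"illegal transition: {current.value} -> {new.value}")
    return new


state = TaskState.UNEXECUTED                        # state of a newly created task
state = transition(state, TaskState.EXECUTING)
state = transition(state, TaskState.EXECUTED)
state = transition(state, TaskState.MERGED)
```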
In the prior art, to monitor whether task processing has become abnormal, the master device must repeatedly poll a plurality of slave devices for their execution status, which lowers task execution efficiency as well as the stability and availability of the system. To further improve task execution efficiency and the stability and availability of the system, the method of the embodiments of the present application preferably further comprises: the master device periodically checks, among the tasks in the executing state, whether there is any task for which no execution result has been returned within a preset time, and if so, re-marks each such task as unexecuted.
For example, the master device allocates task A to a slave device, marks task A as executing, and at the same time starts a timer whose duration is the preset time. When the timer expires, if the master device still has not received the execution result of task A, it re-marks task A as unexecuted.
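The timeout check can be sketched as follows. This is an illustration under assumptions, not the application's implementation: the names `Task`, `allocate`, and `requeue_timed_out` are hypothetical, states are plain strings, and the current time is passed in explicitly instead of using a real timer so that the behavior is easy to follow.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    name: str
    state: str = "unexecuted"
    allocated_at: Optional[float] = None  # time of the latest allocation


def allocate(task, now):
    """Mark the task as executing and remember when it was handed out."""
    task.state = "executing"
    task.allocated_at = now


def requeue_timed_out(tasks, now, preset_time):
    """Re-mark as unexecuted every executing task with no result within preset_time."""
    requeued = []
    for task in tasks:
        if task.state == "executing" and now - task.allocated_at >= preset_time:
            task.state = "unexecuted"
            task.allocated_at = None
            requeued.append(task)
    return requeued


task_a = Task("A")
allocate(task_a, now=0.0)
early = requeue_timed_out([task_a], now=5.0, preset_time=10.0)   # timer still running
late = requeue_timed_out([task_a], now=10.0, preset_time=10.0)   # timer has expired
```

The master device never contacts the slave device here; it infers the abnormal condition only from the missing result.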
At this point, the master device holds not only newly created tasks marked as unexecuted but also tasks that were re-marked as unexecuted because they were not completed within the preset time. When the master device receives a task request message from a slave device, it may select any task currently in the unexecuted state and allocate it to the requesting slave device. Preferably, newly created tasks in the unexecuted state are allocated to the requesting slave device first; after all newly created unexecuted tasks have been allocated, the tasks re-marked as unexecuted are allocated to the requesting slave device one by one, in order of the time at which each was first allocated.
To illustrate the allocation process simply, suppose the master device holds five tasks in the unexecuted state, where task 1, task 2, and task 3 are newly created unexecuted tasks, task 4 and task 5 are tasks re-marked as unexecuted, and task 4 was first allocated earlier than task 5. Initially, the master device preferentially allocates task 1, task 2, and task 3 to slave devices; after task 1, task 2, and task 3 have all been allocated, the master device allocates task 4 to a slave device first and then task 5.
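The preferred allocation order can be reproduced with a short sketch. The names `Task`, `pick_next_task`, and `allocate` and the field `first_allocated_at` are assumptions for illustration; a task whose `first_allocated_at` is `None` is treated as newly created, and the time values are arbitrary.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    name: str
    state: str = "unexecuted"
    first_allocated_at: Optional[float] = None  # None means never allocated


def pick_next_task(tasks):
    """New unexecuted tasks first, then re-marked ones by first allocation time."""
    candidates = [t for t in tasks if t.state == "unexecuted"]
    if not candidates:
        return None
    new_tasks = [t for t in candidates if t.first_allocated_at is None]
    if new_tasks:
        return new_tasks[0]
    return min(candidates, key=lambda t: t.first_allocated_at)


def allocate(tasks, now):
    """Serve one task request: pick a task and mark it as executing."""
    task = pick_next_task(tasks)
    if task is None:
        return None
    task.state = "executing"
    if task.first_allocated_at is None:
        task.first_allocated_at = now
    return task


# The five-task example: tasks 4 and 5 were re-marked, task 4 first allocated earlier.
tasks = [
    Task("task 5", first_allocated_at=2.0),
    Task("task 1"),
    Task("task 4", first_allocated_at=1.0),
    Task("task 2"),
    Task("task 3"),
]
order = [allocate(tasks, now=10.0).name for _ in range(5)]
# order is ["task 1", "task 2", "task 3", "task 4", "task 5"]
```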
As can be seen from the above embodiment, the master device no longer pushes tasks to slave devices but allocates a task to a slave device only upon receiving that slave device's task request message; moreover, the master device no longer manages the information of all slave devices in the cluster by maintaining a node information list, but instead creates a task for each item of pending data and dynamically records the execution state of each task. Therefore, from the master device's point of view, a slave device can join the cluster at any time and request tasks from the master device, or leave the cluster at any time, so that the cluster scale can be adjusted quickly when resources are insufficient or wasted.
In addition, the execution of tasks on slave devices is no longer monitored by the master device throughout; the master device only maintains task states. If no result is returned within a certain time after a task is allocated, the master device deems the execution of the task abnormal, re-marks the task as unexecuted, and reallocates it. This further improves task execution efficiency and the stability and availability of the system.
Embodiment 2
The parallel data processing method is described in detail below in terms of the interaction between the master device and the slave devices. Refer to Fig. 5, which is an interaction flowchart of parallel data processing of the present application. As shown in Fig. 5, the interaction flow comprises:
Step 501: The master device obtains from the data source the identifier list of the pending data to be processed;
The data source may be an FTP server, a database (DB), or a file system. From the identifier list, the master device can learn which data are pending data.
Step 502: The master device creates one task for each item of pending data identified in the identifier list, uses a task queue to maintain all tasks, and marks each newly created task as unexecuted;
When creating the tasks, the master device also puts the data identifier of each item of pending data into the corresponding task.
Step 503: The master device receives a task request message sent by a slave device;
Step 504: The master device takes a task in the unexecuted state from the task queue, allocates it to the requesting slave device, and changes the state of the task from unexecuted to executing;
Step 505: After receiving the task sent by the master device, the slave device parses the task to obtain the data identifier of the pending data;
Step 506: The slave device obtains the pending data from the data source according to the data identifier;
Step 507: The slave device analyzes and computes on the obtained pending data;
Steps 505 to 507 above are the process by which the slave device executes the task. The method of analyzing and computing on the pending data may be the same as in the prior art, so the embodiments of the present application do not describe it further.
Step 508: The slave device returns the result of the computation and analysis to the master device and sends the master device a request message for the next task;
Step 509: The master device receives the execution result returned by the slave device and changes the state of the task from executing to executed;
Step 510: The master device checks whether the task queue contains any task in the executed state; if so, it merges the execution result of the task and changes the state of the task from executed to merged; if not, it waits for the next check;
The master device may check for tasks in the executed state periodically, or the check may be triggered each time a slave device returns an execution result.
Step 511: The master device checks whether all tasks in the task queue are in the merged state; if so, it outputs the execution results of all merged tasks; if not, it waits for the next check;
The master device may check for the merged state periodically.
Step 512: The master device checks, among the tasks in the executing state, whether there is any task for which no execution result has been returned within the preset time, and if so, re-marks each such task as unexecuted.
It should be noted that steps 510 to 512 have no strict execution order relative to steps 501 to 509, nor among themselves; each of these steps may be executed whenever its next check time arrives.
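The interaction of steps 501 to 512 can be simulated in a single process, as in the sketch below. Everything named here is an assumption for illustration: the `Master` class and its methods, the dictionary layout of the task queue, the `slave_execute` stand-in for the slave device's analysis, and the explicit `now` arguments that replace real timers. Ignoring a result for a task that is no longer in the executing state is also an assumption; the application does not say how late results are handled.

```python
class Master:
    def __init__(self, identifier_list, preset_time):
        # Steps 501-502: one task per data identifier, all marked unexecuted.
        self.tasks = {
            task_id: {"data_id": data_id, "state": "unexecuted",
                      "allocated_at": None, "result": None}
            for task_id, data_id in enumerate(identifier_list, start=1)
        }
        self.preset_time = preset_time
        self.merged = []

    def request_task(self, now):
        # Steps 503-504: hand an unexecuted task to the requesting slave device.
        for task_id, task in self.tasks.items():
            if task["state"] == "unexecuted":
                task["state"] = "executing"
                task["allocated_at"] = now
                return task_id, task["data_id"]
        return None

    def report_result(self, task_id, result):
        # Step 509: executing -> executed (late results are ignored, an assumption).
        task = self.tasks[task_id]
        if task["state"] == "executing":
            task["state"] = "executed"
            task["result"] = result

    def merge_executed(self):
        # Step 510: merge executed results; here merging is simply collecting them.
        for task in self.tasks.values():
            if task["state"] == "executed":
                self.merged.append(task["result"])
                task["state"] = "merged"

    def output_if_done(self):
        # Step 511: output only once every task is in the merged state.
        if all(t["state"] == "merged" for t in self.tasks.values()):
            return list(self.merged)
        return None

    def requeue_timed_out(self, now):
        # Step 512: executing -> unexecuted when the preset time has elapsed.
        for task in self.tasks.values():
            if (task["state"] == "executing"
                    and now - task["allocated_at"] >= self.preset_time):
                task["state"] = "unexecuted"
                task["allocated_at"] = None


def slave_execute(data_id):
    # Steps 505-507: stand-in for fetching the data and analyzing it.
    return data_id.upper()


master = Master(["a", "b", "c"], preset_time=10.0)
lost = master.request_task(now=0.0)      # task 1 goes to a slave that never answers
job = master.request_task(now=0.0)       # task 2 goes to a healthy slave
master.report_result(job[0], slave_execute(job[1]))
master.merge_executed()
master.requeue_timed_out(now=10.0)       # task 1 is re-marked as unexecuted
while (job := master.request_task(now=11.0)) is not None:
    master.report_result(job[0], slave_execute(job[1]))  # step 508: return, ask again
master.merge_executed()
final_output = master.output_if_done()
```

In this run the slave device holding task 1 never reports back, so task 1 is reallocated after the preset time and the output is still complete.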
As can be seen from the above embodiment, the master device no longer pushes tasks to slave devices but allocates a task to a slave device only upon receiving that slave device's task request message; moreover, the master device no longer manages the information of all slave devices in the cluster by maintaining a node information list, but instead creates a task for each item of pending data and dynamically records the execution state of each task. Therefore, from the master device's point of view, a slave device can join the cluster at any time and request tasks from the master device, or leave the cluster at any time, so that the cluster scale can be adjusted quickly when resources are insufficient or wasted.
In addition, the execution of tasks on slave devices is no longer monitored by the master device throughout; the master device only maintains task states. If no result is returned within a certain time after a task is allocated, the master device deems the execution of the task abnormal, re-marks the task as unexecuted, and reallocates it. This further improves task execution efficiency and the stability and availability of the system.
Embodiment 3
Corresponding to the above parallel data processing method, the embodiments of the present application further provide a parallel data processing apparatus. Refer to Fig. 6, which is a structural diagram of an embodiment of a parallel data processing apparatus of the present application. The apparatus comprises a task creation module 601, a task allocation module 602, a merging module 603, a dynamic recording module 604, and a result output module 605. The internal structure and connections of the apparatus are further described below in conjunction with its working principle.
The task creation module 601 is configured to learn from a data source which pending data need to be processed and to create one task for each item of pending data;
The task allocation module 602 is configured to allocate a task to the requesting slave device when a task request message sent by a slave device is received;
The merging module 603 is configured to merge the execution results returned by slave devices;
The dynamic recording module 604 is configured to dynamically record the execution state of each task, the execution state being one of: unexecuted, executing, executed, and merged;
The result output module 605 is configured to output the execution results of the merged tasks.
Preferably, refer to Fig. 7, which is a structural diagram of another embodiment of a parallel data processing apparatus of the present application. As shown in Fig. 7, the apparatus further comprises a re-marking module 606, configured to check, among the tasks in the executing state, whether there is any task for which no execution result has been returned within a preset time, and if so, to re-mark each such task as unexecuted.
Preferably, refer to Fig. 8, which is a structural diagram of an embodiment of the task creation module of the present application. The task creation module comprises a list obtaining submodule 801 and an identifier extraction submodule 802, wherein:
The list obtaining submodule 801 is configured to obtain from the data source the identifier list of the pending data to be processed, the identifier list maintaining the data identifiers of all pending data;
The identifier extraction submodule 802 is configured to extract the identifier of each item of pending data from the identifier list and, after a task is created for each item of pending data, to put the extracted identifier into the task.
Preferably, the dynamic recording module comprises: a first marking submodule, configured to mark a task as unexecuted after the task is created; a second marking submodule, configured to mark the completed task as executed when an execution result returned by a slave device is received; and a third marking submodule, configured to mark a task as merged when a periodic check finds the task in the executed state and its execution result has been merged.
For the parallel data processing apparatus in Fig. 7, preferably, the task allocation module comprises: a first allocation submodule, configured to preferentially allocate newly created tasks in the unexecuted state to the requesting slave device; and a second allocation submodule, configured to allocate, after all newly created unexecuted tasks have been allocated, the tasks re-marked as unexecuted to the requesting slave device one by one, in order of the time at which each was first allocated.
As can be seen from the above embodiment, the master device no longer pushes tasks to slave devices but allocates a task to a slave device only upon receiving that slave device's task request message; moreover, the master device no longer manages the information of all slave devices in the cluster by maintaining a node information list, but instead creates a task for each item of pending data and dynamically records the execution state of each task. Therefore, from the master device's point of view, a slave device can join the cluster at any time and request tasks from the master device, or leave the cluster at any time, so that the cluster scale can be adjusted quickly when resources are insufficient or wasted.
In addition, the execution of tasks on slave devices is no longer monitored by the master device throughout; the master device only maintains task states. If no result is returned within a certain time after a task is allocated, the master device deems the execution of the task abnormal, re-marks the task as unexecuted, and reallocates it. This further improves task execution efficiency and the stability and availability of the system.
Embodiment 4
The embodiments of the present application further provide a parallel data processing system. Refer to Fig. 9, which is a structural diagram of an embodiment of a parallel data processing system of the present application. The system comprises a cluster composed of one master device 901 and a plurality of slave devices 902. The internal structure and connections of the system are further described below in conjunction with its working principle.
The master device 901 is configured to learn from a data source which pending data need to be processed, create one task for each item of pending data, allocate a task to the requesting slave device when a task request message sent by a slave device is received, merge the execution results returned by slave devices, dynamically record the execution state of each task, the execution state being one of unexecuted, executing, executed, and merged, and output the merged results of the merged tasks;
The slave device 902 is configured to send a task request message to the master device, execute the allocated task after receiving it from the master device, and return the execution result to the master device.
Preferably, the master device 901 is further configured to check, among the tasks in the executing state, whether there is any task for which no execution result has been returned within a preset time, and if so, to re-mark each such task as unexecuted.
As can be seen from the above embodiment, the master device no longer pushes tasks to slave devices but allocates a task to a slave device only upon receiving that slave device's task request message; moreover, the master device no longer manages the information of all slave devices in the cluster by maintaining a node information list, but instead creates a task for each item of pending data and dynamically records the execution state of each task. Therefore, from the master device's point of view, a slave device can join the cluster at any time and request tasks from the master device, or leave the cluster at any time, so that the cluster scale can be adjusted quickly when resources are insufficient or wasted.
In addition, the execution of tasks on slave devices is no longer monitored by the master device throughout; the master device only maintains task states. If no result is returned within a certain time after a task is allocated, the master device deems the execution of the task abnormal, re-marks the task as unexecuted, and reallocates it. This further improves task execution efficiency and the stability and availability of the system.
It should be noted that a person of ordinary skill in the art will understand that all or part of the flows in the methods of the above embodiments can be implemented by a computer program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the flows of the embodiments of the above methods. The storage medium may be a magnetic disk, an optical disc, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), or the like.
The parallel data processing method, apparatus, and parallel data processing system provided by the present application have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present application, and the description of the above embodiments is intended only to help in understanding the method of the present application and its core idea. Meanwhile, a person of ordinary skill in the art may, following the idea of the present application, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting the present application.