
CN102279730B - Parallel data processing method, device and system - Google Patents


Info

Publication number
CN102279730B
Authority
CN
China
Prior art keywords
task
equipment
main equipment
pending data
execution result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201010200891.7A
Other languages
Chinese (zh)
Other versions
CN102279730A (en)
Inventor
岑文初
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba China Network Technology Co Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd filed Critical Alibaba Group Holding Ltd
Priority to CN201010200891.7A priority Critical patent/CN102279730B/en
Publication of CN102279730A publication Critical patent/CN102279730A/en
Priority to HK12101872.7A priority patent/HK1161386A1/en
Application granted granted Critical
Publication of CN102279730B publication Critical patent/CN102279730B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Debugging And Monitoring (AREA)

Abstract

The embodiments of the invention disclose a parallel data processing method, device, and system. The method comprises the following steps: a master device obtains from a data source the data that needs to be processed and creates one task for each item of pending data; on receiving a task acquisition request message sent by a slave device, the master device allocates a task to the requesting slave device, merges the execution results returned by slave devices, and dynamically records the execution state of each task, the execution state being one of not executed, executing, executed, and merged; and the master device outputs the execution results of the merged tasks. The embodiments of the invention allow the cluster scale of the system to be adjusted quickly when resources are insufficient or wasted.

Description

Parallel data processing method, device, and parallel data processing system
Technical Field
The present application relates to the fields of communications and computer technology, and in particular to a parallel data processing method, a parallel data processing device, and a parallel data processing system.
Background
With the development of Web 2.0 technology, business data in Internet applications and Internet platforms, such as user behavior data and platform system data, is growing on a massive scale. To meet the need to process such massive business data (for example, an Internet site platform needs to analyze and compute over user behavior data and platform system data), distributed parallel data processing technology emerged. It uses multiple computers working cooperatively to jointly process massive amounts of data.
Currently, one of the most widely used distributed parallel computing frameworks on large Internet site platforms is the Hadoop framework. Refer to Fig. 1, which is a schematic structural diagram of the Hadoop framework in the prior art. As shown in Fig. 1, the system comprises one master device (Master) and one cluster of slave devices (Slave), where every slave device has the logical functions of a data node (DataNode) and a subtask tracker (TaskTracker). The DataNode is responsible for storing business data. The TaskTracker is responsible for executing the tasks pushed by the master device, that is, processing the business data stored in the DataNode and partially merging the task execution results. In terms of logical functions, the master device comprises a name node (NameNode) and a job tracker (JobTracker). The NameNode manages the business data stored on each slave device, and the JobTracker is responsible for starting, tracking, and scheduling each slave device.
However, the inventor found during research that in the existing Hadoop system the master device manages the information of all slave devices in the cluster by maintaining a node information list, formulates a task allocation algorithm based on the information of all slave devices in that list, and pushes tasks to each slave device according to the algorithm. Consequently, when the system runs short of resources and slave devices must be added dynamically, or when resources are wasted and slave devices must be removed, the master device must first update the node information list it maintains and then formulate a new task allocation algorithm based on the updated list, so that it can push tasks to each slave device according to that algorithm and the slave devices can process the data in parallel.
It can be seen that, in the existing Hadoop system, the data processing method and the corresponding parallel data processing system involve a cumbersome procedure when slave devices are added or removed. This hinders dynamically adding or removing slave devices, and the cluster scale cannot be adjusted quickly when resources are insufficient or wasted.
Summary of the Invention
To solve the above technical problem, the embodiments of the present application provide a parallel data processing method and a parallel data processing system that allow the system to adjust the cluster scale quickly when resources are insufficient or wasted.
The embodiments of the present application disclose the following technical solutions:
A parallel data processing method, comprising: a master device learns from a data source which pending data needs to be processed, and creates one task for each item of pending data; on receiving a task acquisition request message sent by a slave device, the master device allocates a task to the requesting slave device, merges the execution results returned by slave devices, and dynamically records the execution state of each task, the execution state being one of not executed, executing, executed, and merged; the master device outputs the execution results of the merged tasks.
A parallel data processing device, comprising: a task creation module, configured to learn from a data source which pending data needs to be processed and to create one task for each item of pending data; a task allocation module, configured to allocate a task to the requesting slave device on receiving a task acquisition request message sent by a slave device; a merging module, configured to merge the execution results returned by slave devices; a dynamic recording module, configured to dynamically record the execution state of each task, the execution state being one of not executed, executing, executed, and merged; and a result output module, configured to output the execution results of the merged tasks.
As can be seen from the above embodiment, the master device no longer pushes tasks to the slave devices; instead, it allocates a task to a slave device only on receiving a task acquisition request message sent by that slave device. In addition, the master device no longer manages the information of all slave devices in the cluster by maintaining a node information list; instead, it creates one task for each item of pending data and dynamically records the execution state of each task. Therefore, from the master device's point of view, a slave device can join the cluster at any time and request a task from the master device, or leave the cluster at any time, so the cluster scale can be adjusted quickly when resources are insufficient or wasted.
Brief Description of the Drawings
To illustrate the technical solutions of the embodiments of the present application or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, a person of ordinary skill in the art can derive other drawings from these drawings without creative effort.
Fig. 1 is a schematic structural diagram of the Hadoop framework in the prior art;
Fig. 2 is a flowchart of an embodiment of a parallel data processing method of the present application;
Fig. 3 is a schematic diagram of a system application scenario of the present application;
Fig. 4 is a state transition diagram of a task in the present application;
Fig. 5 is an interaction flowchart of parallel data processing in the present application;
Fig. 6 is a structural diagram of an embodiment of a parallel data processing device of the present application;
Fig. 7 is a structural diagram of another embodiment of a parallel data processing device of the present application;
Fig. 8 is a structural diagram of an embodiment of the task creation module of the present application;
Fig. 9 is a structural diagram of an embodiment of a parallel data processing system of the present application.
Detailed Description
To make the above objects, features, and advantages of the present application clearer, the embodiments of the present application are described in detail below with reference to the drawings.
Embodiment 1
Refer to Fig. 2, which is a flowchart of an embodiment of a parallel data processing method of the present application. The method comprises the following steps:
Step 201: The master device learns from a data source which pending data needs to be processed, and creates one task for each item of pending data.
Specifically, this step may comprise: the master device obtains from the data source an identifier list of the pending data to be processed, the identifier list holding the data identifiers of all pending data; the master device extracts the identifier of each item of pending data from the identifier list and, after creating a task for that item, puts the extracted identifier into the task.
For example, refer to Fig. 3, which is a schematic diagram of a system application scenario of the present application. As shown in Fig. 3, the data source holds an identifier list of the pending data to be processed, and the list holds the data identifiers of all pending data; for example, the address information of a data item can serve as its data identifier. Once the master device has obtained the identifier list from the data source, it can tell from the data identifiers in the list which data is pending. After creating a task for an item of pending data, the master device extracts the data identifier of that item from the identifier list and puts it into the task. For example, after creating task 1 for pending data A, the master device extracts the data identifier of pending data A from the identifier list and puts it into task 1.
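The task creation step can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not the implementation described by the application: the Task class, the create_tasks function, and the example address strings are hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Task:
    """One task per item of pending data (hypothetical structure)."""
    task_id: int
    data_id: str  # data identifier, e.g. the address of the pending data


def create_tasks(id_list: List[str]) -> List[Task]:
    """Create one task per data identifier and store the identifier in the task."""
    return [Task(task_id=i, data_id=data_id)
            for i, data_id in enumerate(id_list, start=1)]


# Assumed example: an identifier list as it might be obtained from the data source.
tasks = create_tasks(["ftp://source/A", "ftp://source/B"])
```

Because each task carries only the identifier, the slave device that later receives the task can fetch the actual data from the data source itself.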
Step 202: On receiving a task acquisition request message sent by a slave device, the master device allocates a task to the requesting slave device and merges the execution results returned by slave devices; it also dynamically records the execution state of each task, the execution state being one of not executed, executing, executed, and merged.
For example, unlike the task push mechanism of the existing Hadoop system, in the embodiments of the present application the master device allocates tasks based on requests from the slave devices; that is, the master device allocates a task to a slave device only after receiving a task acquisition request message from it. When a slave device returns the execution result of a task, the master device merges the returned result. Meanwhile, the master device dynamically records the execution state of each task: not executed, executing, executed, or merged.
Specifically, the master device dynamically records the execution state of each task as follows: after creating a task, the master device marks the created task as not executed; on receiving an execution result returned by a slave device, it marks the completed task as executed; and on finding a task in the executed state and merging its execution result, it marks the merged task as merged.
Note that there is no strict ordering among the time at which the master device creates a task, the time at which it receives an execution result from a slave device, and the cycle on which it checks for tasks in the executed state. The embodiments of the present application therefore do not restrict the order of the three marking operations above. For example, when the master device creates a new task, the task has not yet been allocated to or executed by any slave device, so it is marked as not executed. Whenever the master device receives the execution result of a task A from a slave device, it marks the completed task A as executed. Whenever the check cycle for executed tasks arrives, the master device merges the execution results of the tasks found and marks them as merged.
Note also that, besides checking periodically for tasks in the executed state, the master device may instead run the check each time it receives an execution result from a slave device; the embodiments of the present application do not restrict this. The former approach, of course, effectively reduces system power consumption.
Step 203: The master device outputs the execution results of the merged tasks.
Note that the master device may output the result of a merged task ahead of time, or it may output the execution results of all merged tasks once every task is in the merged state. For example, the master device periodically checks the execution states of all tasks and, when all of them are merged, outputs the execution results of all merged tasks.
Refer to Fig. 4, which is a state transition diagram of a task in the present application. As shown in Fig. 4, when a task is created, its state is not executed. When a slave device requests a task and the master device selects one of the tasks in the not executed state and allocates it to that slave device, the task's state changes from not executed to executing. If the master device does not receive the execution result from the slave device within a preset time after allocation, the task's state changes from executing back to not executed. When the slave device completes the task and returns the execution result to the master device, the task's state changes from executing to executed. After the master device merges a task in the executed state, the task's state changes from executed to merged.
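The state transitions described for Fig. 4 can be written down as a small table. The following Python sketch is an illustration only; the State enumeration, the TRANSITIONS table, and the advance function are hypothetical names, and the application does not prescribe any particular data structure.

```python
from enum import Enum


class State(Enum):
    NOT_EXECUTED = "not executed"
    EXECUTING = "executing"
    EXECUTED = "executed"
    MERGED = "merged"


# Allowed transitions, following the textual description of Fig. 4.
TRANSITIONS = {
    State.NOT_EXECUTED: {State.EXECUTING},                  # allocated to a slave device
    State.EXECUTING: {State.EXECUTED, State.NOT_EXECUTED},  # result returned, or timeout
    State.EXECUTED: {State.MERGED},                         # result merged by the master
    State.MERGED: set(),                                    # final state
}


def advance(current: State, target: State) -> State:
    """Return the target state, or raise if the description of Fig. 4 does not allow the move."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target
```

Keeping the transitions in one table makes it easy to reject moves the description never mentions, such as going from merged back to executing.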
In the prior art, to monitor whether an exception occurs while a task is being processed, the master device has to poll multiple slave devices repeatedly for their execution status, which lowers task execution efficiency as well as system stability and availability. Preferably, to further improve task execution efficiency as well as system stability and availability, the method of the embodiments of the present application further comprises: among the tasks in the executing state, the master device periodically checks whether any task has failed to return an execution result within the preset time and, if so, re-marks that task as not executed.
For example, the master device allocates task A to a slave device, marks task A as executing, and at the same time starts a timer whose duration is the preset time. If the master device still has not received the execution result of task A when the timer expires, it re-marks task A as not executed.
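A periodic timeout check of this kind might look like the following Python sketch. It is an assumption-laden illustration: the dictionary layout, the requeue_timed_out function, and the explicit now argument (used instead of a real timer so the example is deterministic) are all hypothetical.

```python
from typing import Dict, List


def requeue_timed_out(tasks: Dict[str, dict], now: float, timeout: float) -> List[str]:
    """Re-mark as 'not executed' every executing task whose result is overdue."""
    requeued = []
    for name, task in tasks.items():
        if task["state"] == "executing" and now - task["assigned_at"] > timeout:
            task["state"] = "not executed"
            requeued.append(name)
    return requeued


# Hypothetical task records kept by the master device.
tasks = {
    "A": {"state": "executing", "assigned_at": 0.0},   # overdue at now=100
    "B": {"state": "executing", "assigned_at": 90.0},  # still within the preset time
    "C": {"state": "executed", "assigned_at": 0.0},    # result already returned
}
requeued = requeue_timed_out(tasks, now=100.0, timeout=30.0)
```

In this sketch the master device inspects only its own records and never contacts a slave device, which is the point of replacing polling with a timeout.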
At this point the master device holds, besides newly created tasks marked as not executed, tasks that were re-marked as not executed because they were not completed within the preset time. On receiving a task acquisition request message from a slave device, the master device may select any task currently in the not executed state and allocate it to the requesting slave device. Preferably, newly created tasks in the not executed state are allocated to the requesting slave device first; once all of them have been allocated, the tasks re-marked as not executed are allocated to the requesting slave device one by one in the order of the time at which each was first allocated.
To illustrate the allocation process simply, suppose the master device holds five tasks in the not executed state. Tasks 1, 2, and 3 are newly created tasks in the not executed state; tasks 4 and 5 have been re-marked as not executed, and task 4 was first allocated earlier than task 5. Initially, the master device allocates tasks 1, 2, and 3 to slave devices first. After tasks 1, 2, and 3 have all been allocated, the master device allocates task 4 and then task 5.
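The preferred allocation order in the five-task example can be reproduced with a short Python sketch. The Allocator class and its method names are hypothetical, and the first-allocation times 10.0 and 20.0 are made-up values chosen only so that task 4 precedes task 5.

```python
from typing import List, Optional, Tuple


class Allocator:
    """Hypothetical allocator: new tasks first, then re-marked tasks by first allocation time."""

    def __init__(self) -> None:
        self.new_tasks: List[str] = []                # newly created, never allocated
        self.requeued: List[Tuple[float, str]] = []   # (first allocation time, name)

    def add_new(self, name: str) -> None:
        self.new_tasks.append(name)

    def add_requeued(self, name: str, first_allocated: float) -> None:
        self.requeued.append((first_allocated, name))
        self.requeued.sort()  # earliest first allocation comes first

    def next_task(self) -> Optional[str]:
        """Called when a slave device requests a task."""
        if self.new_tasks:
            return self.new_tasks.pop(0)
        if self.requeued:
            return self.requeued.pop(0)[1]
        return None


alloc = Allocator()
alloc.add_requeued("task 5", first_allocated=20.0)
alloc.add_requeued("task 4", first_allocated=10.0)
for name in ("task 1", "task 2", "task 3"):
    alloc.add_new(name)

order = [alloc.next_task() for _ in range(5)]
# order is ["task 1", "task 2", "task 3", "task 4", "task 5"]
```

Tasks 4 and 5 are deliberately registered in the wrong order to show that the first-allocation time, not the insertion order, decides which re-marked task goes out first.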
As can be seen from the above embodiment, the master device no longer pushes tasks to the slave devices; instead, it allocates a task to a slave device only on receiving a task acquisition request message sent by that slave device. In addition, the master device no longer manages the information of all slave devices in the cluster by maintaining a node information list; instead, it creates one task for each item of pending data and dynamically records the execution state of each task. Therefore, from the master device's point of view, a slave device can join the cluster at any time and request a task from the master device, or leave the cluster at any time, so the cluster scale can be adjusted quickly when resources are insufficient or wasted.
In addition, the master device no longer monitors the execution of tasks on the slave devices from start to finish; it only maintains task states. If a task's result is not returned within a certain time after the task is allocated, the master device concludes that the execution of the task has encountered an exception, re-marks the task as not executed, and allocates it again. This further improves task execution efficiency as well as system stability and availability.
Embodiment 2
The parallel data processing method is described in detail below in terms of the interaction between the master device and a slave device. Refer to Fig. 5, which is an interaction flowchart of parallel data processing in the present application. As shown in Fig. 5, the interaction flow comprises:
Step 501: The master device obtains from the data source an identifier list of the pending data to be processed.
The data source may be an FTP server, a database (DB), or a file system. From the identifier list, the master device can tell which data is pending.
Step 502: The master device creates one task for each item of pending data identified in the identifier list, maintains all tasks in a task queue, and marks each newly created task as not executed.
When creating a task, the master device also puts the data identifier of the corresponding pending data into the task.
Step 503: The master device receives a task acquisition request message sent by a slave device.
Step 504: The master device takes a task in the not executed state from the task queue, allocates it to the requesting slave device, and changes the task's state from not executed to executing.
Step 505: After receiving the task sent by the master device, the slave device parses the task to obtain the data identifier of the pending data.
Step 506: The slave device obtains the pending data from the data source according to the data identifier.
Step 507: The slave device analyzes and computes over the pending data obtained.
Steps 505 to 507 above are the process by which the slave device executes a task. The analysis and computation applied to the pending data may use the same methods as the prior art, so they are not described further here.
Step 508: The slave device returns the result of the analysis and computation to the master device and sends the master device a request message for the next task.
Step 509: The master device receives the execution result returned by the slave device and changes the task's state from executing to executed.
Step 510: The master device checks whether the task queue contains any task in the executed state. If so, it merges the task's execution result and changes the task's state from executed to merged; if not, it waits for the next check.
The master device may check for tasks in the executed state periodically, or the check may be triggered each time a slave device returns an execution result.
Step 511: The master device checks whether all tasks in the task queue are in the merged state. If so, it outputs the execution results of all merged tasks; if not, it waits for the next check.
The master device may check for the merged state periodically.
Step 512: Among the tasks in the executing state, the master device checks whether any task has failed to return an execution result within the preset time and, if so, re-marks that task as not executed.
Note that steps 510 to 512 have no strict execution order relative to steps 501 to 509, nor among themselves; each of them is performed whenever its next check falls due.
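The interaction flow of steps 501 to 511 can be walked through with a single-process Python simulation. This is a sketch under several assumptions and not the implementation of the application: the Master class, the run_slave function, the in-memory dictionary standing in for the data source, and the use of summation as both the "analysis and computation" and the merge operation are all hypothetical. The timeout check of step 512 is omitted for brevity.

```python
from typing import Dict, List, Optional


class Master:
    def __init__(self, id_list: List[str]) -> None:
        # Steps 501-502: one task per data identifier, all marked 'not executed'.
        self.state: Dict[str, str] = {d: "not executed" for d in id_list}
        self.results: Dict[str, int] = {}
        self.merged_total = 0

    def request_task(self) -> Optional[str]:
        # Steps 503-504: on request, hand out a task that is 'not executed'.
        for data_id, st in self.state.items():
            if st == "not executed":
                self.state[data_id] = "executing"
                return data_id
        return None

    def submit_result(self, data_id: str, result: int) -> None:
        # Step 509: record the returned result and mark the task 'executed'.
        self.results[data_id] = result
        self.state[data_id] = "executed"

    def merge_executed(self) -> None:
        # Step 510: merge (here: add up) every executed result.
        for data_id, st in self.state.items():
            if st == "executed":
                self.merged_total += self.results[data_id]
                self.state[data_id] = "merged"

    def output(self) -> Optional[int]:
        # Step 511: output only once every task is 'merged'.
        if all(st == "merged" for st in self.state.values()):
            return self.merged_total
        return None


def run_slave(master: Master, source: Dict[str, List[int]]) -> None:
    # Steps 505-508: pull a task, fetch the data by identifier, compute,
    # return the result, then request the next task.
    while (data_id := master.request_task()) is not None:
        master.submit_result(data_id, sum(source[data_id]))


source = {"A": [1, 2, 3], "B": [4, 5]}  # stand-in for the data source
master = Master(list(source))
run_slave(master, source)
master.merge_executed()
total = master.output()  # 15
```

Note that the Master class holds no reference to any slave device; it reacts only to request_task and submit_result calls, which mirrors the request-based allocation described above.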
As can be seen from the above embodiment, the master device no longer pushes tasks to the slave devices; instead, it allocates a task to a slave device only on receiving a task acquisition request message sent by that slave device. In addition, the master device no longer manages the information of all slave devices in the cluster by maintaining a node information list; instead, it creates one task for each item of pending data and dynamically records the execution state of each task. Therefore, from the master device's point of view, a slave device can join the cluster at any time and request a task from the master device, or leave the cluster at any time, so the cluster scale can be adjusted quickly when resources are insufficient or wasted.
In addition, the master device no longer monitors the execution of tasks on the slave devices from start to finish; it only maintains task states. If a task's result is not returned within a certain time after the task is allocated, the master device concludes that the execution of the task has encountered an exception, re-marks the task as not executed, and allocates it again. This further improves task execution efficiency as well as system stability and availability.
Embodiment 3
Corresponding to the parallel data processing method above, the embodiments of the present application also provide a parallel data processing device. Refer to Fig. 6, which is a structural diagram of an embodiment of a parallel data processing device of the present application. The device comprises a task creation module 601, a task allocation module 602, a merging module 603, a dynamic recording module 604, and a result output module 605. The internal structure and connections of the device are described further below in conjunction with its operating principle.
The task creation module 601 is configured to learn from a data source which pending data needs to be processed and to create one task for each item of pending data;
the task allocation module 602 is configured to allocate a task to the requesting slave device on receiving a task acquisition request message sent by a slave device;
the merging module 603 is configured to merge the execution results returned by slave devices;
the dynamic recording module 604 is configured to dynamically record the execution state of each task, the execution state being one of not executed, executing, executed, and merged;
the result output module 605 is configured to output the execution results of the merged tasks.
Preferably, refer to Fig. 7, which is a structural diagram of another embodiment of a parallel data processing device of the present application. As shown in Fig. 7, the device further comprises a re-recording module 606, configured to check, among the tasks in the executing state, whether any task has failed to return an execution result within the preset time and, if so, to re-mark that task as not executed.
Preferably, refer to Fig. 8, which is a structural diagram of an embodiment of the task creation module of the present application. The task creation module comprises a list acquisition submodule 801 and an identifier extraction submodule 802, wherein
the list acquisition submodule 801 is configured to obtain from the data source an identifier list of the pending data to be processed, the identifier list holding the data identifiers of all pending data;
the identifier extraction submodule 802 is configured to extract the identifier of each item of pending data from the identifier list and, after a task is created for that item, to put the extracted identifier into the task.
Preferably, the dynamic recording module comprises: a first marking submodule, configured to mark a created task as not executed after the task is created; a second marking submodule, configured to mark a completed task as executed on receiving an execution result returned by a slave device; and a third marking submodule, configured to mark a task as merged when a periodic check finds the task in the executed state and its execution result has been merged.
For the parallel data processing device in Fig. 7, preferably, the task allocation module comprises: a first allocation submodule, configured to allocate newly created tasks in the not executed state to the requesting slave device first; and a second allocation submodule, configured to allocate, once the newly created tasks in the not executed state have all been allocated, the tasks re-marked as not executed to the requesting slave device one by one in the order of the time at which each was first allocated.
As can be seen from the above embodiment, the master device no longer pushes tasks to the slave devices; instead, it allocates a task to a slave device only on receiving a task acquisition request message sent by that slave device. In addition, the master device no longer manages the information of all slave devices in the cluster by maintaining a node information list; instead, it creates one task for each item of pending data and dynamically records the execution state of each task. Therefore, from the master device's point of view, a slave device can join the cluster at any time and request a task from the master device, or leave the cluster at any time, so the cluster scale can be adjusted quickly when resources are insufficient or wasted.
In addition, the master device no longer monitors the execution of tasks on the slave devices from start to finish; it only maintains task states. If a task's result is not returned within a certain time after the task is allocated, the master device concludes that the execution of the task has encountered an exception, re-marks the task as not executed, and allocates it again. This further improves task execution efficiency as well as system stability and availability.
Embodiment 4
The embodiments of the present application also provide a parallel data processing system. Refer to Fig. 9, which is a structural diagram of an embodiment of a parallel data processing system of the present application. The system comprises one master device 901 and a cluster formed by a plurality of slave devices 902. The internal structure and connections of the system are described further below in conjunction with its operating principle.
The master device 901 is configured to learn from a data source which pending data needs to be processed; to create one task for each item of pending data; on receiving a task acquisition request message sent by a slave device, to allocate a task to the requesting slave device; to merge the execution results returned by slave devices; to dynamically record the execution state of each task, the execution state being one of not executed, executing, executed, and merged; and to output the merged result of the merged tasks.
The slave device 902 is configured to send a task acquisition request message to the master device, to execute the allocated task after receiving it from the master device, and to return the execution result to the master device.
Preferably, the master device 901 is further configured to check, among the tasks in the executing state, whether any task has failed to return an execution result within the preset time and, if so, to re-mark that task as not executed.
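The way a cluster of slave devices can pull work without the master device keeping a node list can be illustrated with Python threads. This is a loose analogy under stated assumptions, not the system of Fig. 9: a queue.Queue stands in for the master device's task queue, each thread stands in for a slave device 902, and squaring a number stands in for task execution. All names are hypothetical.

```python
import queue
import threading
from typing import List

# Stand-in for the master's queue of tasks in the 'not executed' state.
task_queue: "queue.Queue[int]" = queue.Queue()
for n in range(20):
    task_queue.put(n)

results: List[int] = []
lock = threading.Lock()


def slave() -> None:
    """Keep requesting tasks until none are left, then leave the cluster."""
    while True:
        try:
            n = task_queue.get_nowait()  # task acquisition request
        except queue.Empty:
            return
        with lock:
            results.append(n * n)        # return the execution result


# Any number of slaves can be started; the queue keeps no record of them.
workers = [threading.Thread(target=slave) for _ in range(3)]
for w in workers:
    w.start()
for w in workers:
    w.join()
```

Changing the number 3 to any other positive worker count requires no change on the queue side, which is the property the embodiments rely on for adjusting the cluster scale.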
As can be seen from the above embodiment, the master device no longer pushes tasks to the slave devices; instead, it allocates a task to a slave device only on receiving a task acquisition request message sent by that slave device. In addition, the master device no longer manages the information of all slave devices in the cluster by maintaining a node information list; instead, it creates one task for each item of pending data and dynamically records the execution state of each task. Therefore, from the master device's point of view, a slave device can join the cluster at any time and request a task from the master device, or leave the cluster at any time, so the cluster scale can be adjusted quickly when resources are insufficient or wasted.
In addition, the master device no longer monitors the execution of tasks on the slave devices from start to finish; it only maintains task states. If a task's result is not returned within a certain time after the task is allocated, the master device concludes that the execution of the task has encountered an exception, re-marks the task as not executed, and allocates it again. This further improves task execution efficiency as well as system stability and availability.
It should be noted that those of ordinary skill in the art will understand that all or part of the flows in the methods of the above embodiments can be implemented by a computer program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, can include the flows of the embodiments of the above methods. The storage medium can be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
The parallel data processing method, device, and system provided by the present application have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present application, and the description of the above embodiments is intended only to help in understanding the method of the present application and its core idea. Meanwhile, those of ordinary skill in the art may, based on the idea of the present application, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting the present application.

Claims (8)

1. A parallel data processing method, characterized by comprising:
a master device learning, from a data source, the pending data that needs to be processed, and creating one task for each item of pending data;
the master device, upon receiving a task acquisition request message sent by a slave device, allocating a task to the slave device that sent the request; merging the execution results returned by slave devices; and dynamically recording the execution state of each task, the execution state comprising not executed, executing, executed, and merged; and
the master device outputting the execution result of the merged tasks;
wherein the master device learning, from the data source, the pending data that needs to be processed and creating one task for each item of pending data specifically comprises: the master device obtaining, from the data source, an identifier list of the pending data that needs to be processed, the identifier list holding the data identifiers of all the pending data; and the master device extracting the identifier of each item of pending data from the identifier list and, after creating one task for each item of pending data, putting the extracted identifier into the task.
2. The parallel data processing method according to claim 1, characterized in that the method further comprises:
the master device checking, among the tasks in the executing state, whether any task has not returned an execution result within a preset time, and if so, re-marking each task that has not returned an execution result within the preset time as not executed.
3. parallel data processing method according to claim 1 and 2, is characterized in that, the executing state of described each task of main equipment dynamically recording comprises:
When main equipment creates after a task, by the task flagging creating, be execution; And,
When main equipment receives the execution result returning from equipment, by complete task flagging, be executed; And,
When main equipment is checked through the task in executed state, and after execution result is merged, by merged task flagging for merging.
4. The parallel data processing method according to claim 2, characterized in that the master device, upon receiving a task acquisition request message sent by a slave device, allocating a task to the slave device that sent the request specifically comprises:
preferentially allocating newly created tasks in the not-executed state to the slave device that sent the request; and
after the newly created tasks in the not-executed state have all been allocated, allocating the tasks that were re-marked as not executed to the slave device that sent the request, one by one, in the chronological order in which they were first allocated.
5. A parallel data processing device, characterized by comprising:
a task creation module, configured to learn, from a data source, the pending data that needs to be processed and to create one task for each item of pending data;
a task allocation module, configured to allocate, upon receiving a task acquisition request message sent by a slave device, a task to the slave device that sent the request;
a merging module, configured to merge the execution results returned by slave devices;
a dynamic recording module, configured to dynamically record the execution state of each task, the execution state comprising not executed, executing, executed, and merged; and
a result output module, configured to output the execution result of the merged tasks;
wherein the task creation module comprises:
a list obtaining submodule, configured to obtain, from the data source, an identifier list of the pending data that needs to be processed, the identifier list holding the data identifiers of all the pending data; and
an identifier extraction submodule, configured to extract the identifier of each item of pending data from the identifier list and, after one task has been created for each item of pending data, to put the extracted identifier into the task.
6. The parallel data processing device according to claim 5, characterized by further comprising:
a re-recording module, configured to check, among the tasks in the executing state, whether any task has not returned an execution result within a preset time, and if so, to re-mark each task that has not returned an execution result within the preset time as not executed.
7. A parallel data processing system, characterized by comprising a master device and a plurality of slave devices, wherein:
the master device is configured to: learn, from a data source, the pending data that needs to be processed, and create one task for each item of pending data; upon receiving a task acquisition request message sent by a slave device, allocate a task to the slave device that sent the request; merge the execution results returned by slave devices; dynamically record the execution state of each task, the execution state comprising not executed, executing, executed, and merged; and output the merged result of the merged tasks;
each slave device is configured to: send a task acquisition request message to the master device; after receiving a task allocated by the master device, execute the allocated task; and return the execution result to the master device;
wherein the master device learning, from the data source, the pending data that needs to be processed and creating one task for each item of pending data specifically comprises: the master device obtaining, from the data source, an identifier list of the pending data that needs to be processed, the identifier list holding the data identifiers of all the pending data; and the master device extracting the identifier of each item of pending data from the identifier list and, after creating one task for each item of pending data, putting the extracted identifier into the task.
8. The parallel data processing system according to claim 7, characterized in that the master device is further configured to check, among the tasks whose state is executing, whether any task has not returned an execution result within a preset time, and if so, to re-mark each task that has not returned an execution result within the preset time as not executed.
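As an informal illustration, not part of the claims, the allocation order recited in claim 4 can be sketched in Python. The task layout (dicts with the keys `"id"`, `"state"`, and `"first_assigned_at"`) and the function name `pick_next_task` are hypothetical, and a value of `None` for `"first_assigned_at"` is assumed to mean that the task is newly created and was never allocated.

```python
def pick_next_task(tasks):
    """Choose the next task to hand out, or None if nothing is assignable.

    Each task is a dict with the hypothetical keys "id", "state", and
    "first_assigned_at" (None for a task that was never allocated).
    """
    waiting = [t for t in tasks if t["state"] == "not executed"]
    fresh = [t for t in waiting if t["first_assigned_at"] is None]
    if fresh:
        return fresh[0]  # newly created tasks take priority
    # Otherwise fall back to re-marked tasks, earliest first allocation first.
    requeued = sorted(waiting, key=lambda t: t["first_assigned_at"])
    return requeued[0] if requeued else None
```

With this ordering, a task that timed out once does not delay fresh work, and among timed-out tasks the one that has been waiting longest since its first allocation is retried first.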
CN201010200891.7A 2010-06-10 2010-06-10 Parallel data processing method, device and system Active CN102279730B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201010200891.7A CN102279730B (en) 2010-06-10 2010-06-10 Parallel data processing method, device and system
HK12101872.7A HK1161386A1 (en) 2010-06-10 2012-02-24 Method, device and system for parallel data processing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201010200891.7A CN102279730B (en) 2010-06-10 2010-06-10 Parallel data processing method, device and system

Publications (2)

Publication Number Publication Date
CN102279730A CN102279730A (en) 2011-12-14
CN102279730B CN102279730B (en) 2014-02-05

Family

ID=45105202

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201010200891.7A Active CN102279730B (en) 2010-06-10 2010-06-10 Parallel data processing method, device and system

Country Status (2)

Country Link
CN (1) CN102279730B (en)
HK (1) HK1161386A1 (en)

Families Citing this family (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103188306B (en) * 2011-12-30 2016-04-27 中国移动通信集团公司 Distributed preprocess method and system
CN103294527A (en) * 2012-02-29 2013-09-11 深圳市思乐网络技术有限责任公司 Method, system, and server for processing network task
CN103729257B (en) * 2012-10-16 2017-04-12 阿里巴巴集团控股有限公司 Distributed parallel computing method and system
CN104102475B (en) * 2013-04-11 2018-10-02 腾讯科技(深圳)有限公司 The method, apparatus and system of distributed parallel task processing
CN103475520B (en) * 2013-09-10 2017-04-26 聚好看科技股份有限公司 Service processing control method and device in distribution network
CN103559036A (en) * 2013-11-04 2014-02-05 北京中搜网络技术股份有限公司 Data batch processing system and method based on Hadoop
CN103685492B (en) * 2013-12-03 2017-01-25 北京智谷睿拓技术服务有限公司 Dispatching method, dispatching device and application of Hadoop trunking system
CN104462304A (en) * 2014-11-28 2015-03-25 北京奇虎科技有限公司 Information processing method and device
CN105204941A (en) * 2015-08-18 2015-12-30 耿懿超 Data processing method and data processing device
CN105844717A (en) * 2016-01-08 2016-08-10 乐卡汽车智能科技(北京)有限公司 Information processing method and system, and control device
CN106201984A (en) * 2016-07-15 2016-12-07 青岛海信电器股份有限公司 A kind of method for reading data and device
CN108021430B (en) * 2016-10-31 2021-11-05 杭州海康威视数字技术股份有限公司 Distributed task processing method and device
CN106850409B (en) * 2017-01-24 2019-12-10 腾讯科技(深圳)有限公司 Method, equipment and system for processing message chain breaking task
CN107402956B (en) * 2017-06-07 2020-02-21 网易有道信息技术(杭州)有限公司 Data processing method and device for large task and computer readable storage medium
CN107688496B (en) * 2017-07-24 2020-12-04 深圳壹账通智能科技有限公司 Task distributed processing method and device, storage medium and server
CN109508228A (en) * 2017-09-15 2019-03-22 深圳竹云科技有限公司 A kind of data processing method, task execution device and task generating device
CN108153678A (en) * 2018-01-17 2018-06-12 北京网信云服信息科技有限公司 A kind of test assignment processing method and processing device
CN108600008B (en) * 2018-04-24 2021-12-17 致云科技有限公司 Server management method, server management device and distributed system
CN110704517B (en) * 2018-06-21 2023-01-17 北京国双科技有限公司 Method and device for generating task, storage medium and processor
CN109255515A (en) * 2018-07-24 2019-01-22 武汉空心科技有限公司 A kind of task exploitation cloud platform based on page metering and unit time distribution
CN109146250A (en) * 2018-07-24 2019-01-04 武汉空心科技有限公司 Task exploitation delivery method and system based on page metering
CN111431951B (en) * 2019-01-09 2022-05-17 阿里巴巴集团控股有限公司 Data processing method, node equipment, system and storage medium
CN114077448B (en) * 2020-08-11 2024-09-27 深圳云天励飞技术股份有限公司 Data management method and related equipment
CN112181662B (en) * 2020-10-13 2023-05-02 深圳壹账通智能科技有限公司 Task scheduling method and device, electronic equipment and storage medium
CN113535675A (en) * 2020-12-04 2021-10-22 高慧军 Data maintenance method based on big data and big data server
CN114826811A (en) * 2021-01-28 2022-07-29 南宁富桂精密工业有限公司 Data transmission method and system

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101561767A (en) * 2008-04-16 2009-10-21 上海聚力传媒技术有限公司 Method and device for executing tasks based on operating system
CN101566957A (en) * 2008-04-25 2009-10-28 恩益禧电子股份有限公司 Information processing system and task execution control method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2010039526A (en) * 2008-07-31 2010-02-18 Toshiba Corp Computer program and master computer

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
JP特开2010-39526A 2010.02.18

Also Published As

Publication number Publication date
HK1161386A1 (en) 2012-08-24
CN102279730A (en) 2011-12-14

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C53 Correction of patent of invention or patent application
CB03 Change of inventor or designer information

Inventor after: Cen Wenchu

Inventor before: Fan Hangcheng

COR Change of bibliographic data

Free format text: CORRECT: INVENTOR; FROM: FAN HANGCHENG TO: CEN WENCHU

REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 1161386

Country of ref document: HK

C14 Grant of patent or utility model
GR01 Patent grant
REG Reference to a national code

Ref country code: HK

Ref legal event code: GR

Ref document number: 1161386

Country of ref document: HK

TR01 Transfer of patent right

Effective date of registration: 20221122

Address after: No. 699, Wangshang Road, Binjiang District, Hangzhou, Zhejiang

Patentee after: Alibaba (China) Network Technology Co.,Ltd.

Address before: Box four, 847, capital building, Grand Cayman Island capital, Cayman Islands, UK

Patentee before: ALIBABA GROUP HOLDING Ltd.
