CN113722057A

CN113722057A - Big data cluster processing method and system, electronic device and storage medium

Info

Publication number: CN113722057A
Application number: CN202110270156.1A
Authority: CN
Inventors: 陈金龙
Original assignee: JD Digital Technology Holdings Co Ltd
Current assignee: JD Digital Technology Holdings Co Ltd
Priority date: 2021-03-12
Filing date: 2021-03-12
Publication date: 2021-11-30

Abstract

The present invention provides a big data cluster processing method and system, an electronic device and a storage medium. The big data cluster processing method includes: defining a script for a host in a big data cluster to generate a first timed task script; creating a first task execution plan under the cluster, and adding the first timed task script to the corresponding first task execution plan; and, based on the first task execution plan, pushing the first timed task script to the hosts in the big data cluster at regular times. In the present invention, a timed task script is defined, a task execution plan is created under a designated big data cluster, the script is added to the task execution plan, and the script is pushed to the hosts under the cluster and run there at the times set by the execution plan. This greatly improves the efficiency of big data cluster operation and maintenance work and saves operation and maintenance costs.

Description

Big data cluster processing method and system, electronic device and storage medium

Technical Field

The present invention relates to the field of computer technologies, and in particular, to a big data cluster processing method and system, an electronic device, and a storage medium.

Background

A big data cluster is a cluster that performs data acquisition, data storage and data analysis for big data. Currently, different big data clusters are managed by their own independent management systems. When managing a big data cluster, operation and maintenance staff often need to configure timed tasks on the hosts of the cluster to perform work such as process state collection and data statistics, and every time a timed task is set up, modified or deleted, they must write a task script and configure a crontab entry on each host.

In the prior art, when a big data cluster is large, the time spent managing scheduled tasks rises sharply, the execution state of the tasks is hard to view, and the task results are hard to collect.

Disclosure of Invention

The invention provides a big data cluster processing method and system, an electronic device and a storage medium, which are used to overcome the above technical defects of the prior art.

The invention provides a big data cluster processing method, which comprises the following steps:

performing script definition on a host in a big data cluster to generate a first timed task script;

creating a first task execution plan under a big data cluster, and adding the first timed task script into the corresponding first task execution plan;

and, based on the first task execution plan, pushing the first timed task script to the hosts under the big data cluster at regular times.

According to the big data cluster processing method provided by the invention, the method further comprises any one or a combination of the following steps:

respectively modifying the first timed task script and the first task execution plan, and pushing the modified first timed task script to the hosts under the big data cluster at regular times based on the modified first task execution plan;

respectively updating the first timed task script and the first task execution plan to generate a second timed task script and a second task execution plan, and, based on the second task execution plan, pushing the second timed task script to the hosts under the big data cluster at regular times;

and deleting the first timed task script and the first task execution plan, and pushing the deletion to the hosts under the big data cluster.

According to the big data cluster processing method provided by the present invention, after the first timed task script, or the modified first timed task script, is pushed to the hosts under the big data cluster at regular times, the method further includes:

monitoring the state of the first task execution plan, and reporting the state of the first task execution plan to the server side corresponding to the big data cluster;

and the timed pushing of the second timed task script to the hosts under the big data cluster further includes:

monitoring the state of the second task execution plan, and reporting the state of the second task execution plan to the server side corresponding to the big data cluster.

According to the big data cluster processing method provided by the present invention, after the first timed task script is pushed to the hosts under the big data cluster at regular times based on the first task execution plan, the method further includes:

collecting the task operation results fed back by the hosts under the big data cluster, and reporting them to the server side corresponding to the big data cluster.

The invention also provides a big data cluster processing system, which comprises:

the script definition module is used for carrying out script definition on a host in the big data cluster and generating a first timed task script;

the script distribution module is used for creating a first task execution plan under the big data cluster and adding the first timed task script into the corresponding first task execution plan;

and the timed pushing module is used for pushing the first timed task script to the hosts under the big data cluster at regular times based on the first task execution plan.

According to the big data cluster processing system provided by the invention, the big data cluster processing system comprises any one or a combination of the following components:

the modification module is used for respectively modifying the first timed task script and the first task execution plan, and pushing the modified first timed task script to the hosts under the big data cluster at regular times based on the modified first task execution plan;

the updating module is used for respectively updating the first timed task script and the first task execution plan to generate a second timed task script and a second task execution plan, and, based on the second task execution plan, pushing the second timed task script to the hosts under the big data cluster at regular times;

and the deleting module is used for deleting the first timed task script and the first task execution plan, and pushing the deletion to the hosts under the big data cluster.

According to a big data cluster processing system provided by the present invention, the big data cluster processing system comprises:

the state monitoring module is used for monitoring the state of the first task execution plan or the second task execution plan, and reporting that state to the server side corresponding to the big data cluster;

and the operation result collection module is used for collecting the task operation results fed back by the hosts under the big data cluster, and reporting them to the server side corresponding to the big data cluster.

The present invention also provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and capable of running on the processor, wherein the processor implements the steps of any of the big data cluster processing methods described above when executing the program.

The present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the big data cluster processing method as described in any of the above.

In the present invention, a timed task script is defined, a task execution plan is created under a specified big data cluster, the script is added to the task execution plan, the script is pushed at fixed times to the hosts under the cluster and run according to the execution plan, and the task execution state and result are reported to the system server side. Because the first timed task script is pushed to the hosts under the big data cluster at regular times based on the first task execution plan, a later change to the first timed task script needs to be made only once to take effect on all hosts under the specified big data cluster. This greatly improves the operation and maintenance efficiency of the big data cluster and saves operation and maintenance costs.

Drawings

In order to more clearly illustrate the technical solutions of the present invention or the prior art, the drawings needed for the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.

Fig. 1 is a schematic flowchart of a big data cluster processing method according to an embodiment of the present invention;

Fig. 2 is a second schematic flowchart of a big data cluster processing method according to an embodiment of the present invention;

Fig. 3 is a third schematic flowchart of a big data cluster processing method according to an embodiment of the present invention;

Fig. 4 is a fourth schematic flowchart of a big data cluster processing method according to an embodiment of the present invention;

Fig. 5 is a schematic diagram of a big data cluster processing system provided by the present invention;

Fig. 6 is a schematic structural diagram of an electronic device provided by the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

An embodiment of the invention discloses a big data cluster processing method executed by a computer system. A script is uploaded to the computer system, a timed task is then created on the computer system based on that script, the hosts on which it is to act are selected, and the creation is confirmed with a click. Referring to Fig. 1, the big data cluster processing method includes the following steps:

S1: performing script definition on a host in a big data cluster to generate a first timed task script;

Specifically, the task operations that need to be executed regularly on the big data cluster or a host are determined and written as a first timed task script in the Shell or Python programming language, and the script upload interface of the task center is then called to upload the first timed task script to the task center. A script is an executable file written in a specific descriptive language according to a certain format; here, the first timed task script is generated by script definition for the hosts in the big data cluster.
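The upload step can be illustrated with a minimal Python sketch (Python being one of the two script languages named above). It assumes an in-memory script library; the class and method names (ScriptLibrary, upload, get) and the content-derived script id are hypothetical and are not the task center's actual interface.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict

# Only the two script languages named in the description are accepted.
ALLOWED_LANGUAGES = {"shell", "python"}


@dataclass(frozen=True)
class Script:
    script_id: str
    name: str
    language: str
    content: str


class ScriptLibrary:
    """Hypothetical in-memory stand-in for the task center's script library."""

    def __init__(self) -> None:
        self._scripts: Dict[str, Script] = {}

    def upload(self, name: str, language: str, content: str) -> str:
        """Store a timed task script and return its id (the 'upload interface')."""
        lang = language.lower()
        if lang not in ALLOWED_LANGUAGES:
            raise ValueError(f"unsupported script language: {language}")
        if not content.strip():
            raise ValueError("script content is empty")
        # Content-derived id (an assumption): re-uploading the same name and body
        # maps to the same entry instead of creating a duplicate.
        digest = hashlib.sha256(f"{name}\n{content}".encode("utf-8")).hexdigest()
        script_id = digest[:12]
        self._scripts[script_id] = Script(script_id, name, lang, content)
        return script_id

    def get(self, script_id: str) -> Script:
        return self._scripts[script_id]

    def __len__(self) -> int:
        return len(self._scripts)
```

For example, `ScriptLibrary().upload("collect_process_state", "Shell", "ps -eo pid,stat,comm\n")` stores a process-state collection script and returns its 12-character id.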

S2: creating a first task execution plan under a big data cluster, and adding the first timed task script into the corresponding first task execution plan;

Specifically, the timed task creation interface of the task center is called, and parameters such as the first timed task script to be executed by the timed task, the target big data cluster or target host of the timed task, and the period at which the timed task is triggered are set, which completes the creation of the timed task. The first task execution plan is thereby created automatically under the big data cluster.
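A possible shape for the task execution plan, as a minimal sketch under stated assumptions: the plan carries the three parameters named above (script, target cluster or target hosts, trigger period), the period is a fixed interval in seconds rather than a crontab expression, and the cluster inventory is a plain dictionary. TaskPlan, TaskLibrary and create_plan are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TaskPlan:
    """Parameters named in the text: script, target cluster or hosts, trigger period."""
    plan_id: str
    script_id: str
    interval_seconds: int
    target_cluster: Optional[str] = None
    target_hosts: List[str] = field(default_factory=list)
    next_run: float = 0.0


class TaskLibrary:
    def __init__(self, inventory: Dict[str, List[str]]) -> None:
        # Hypothetical cluster inventory: cluster name -> host names.
        self.inventory = inventory
        self.plans: Dict[str, TaskPlan] = {}

    def create_plan(self, plan_id: str, script_id: str, interval_seconds: int,
                    now: float, target_cluster: Optional[str] = None,
                    target_hosts: Optional[List[str]] = None) -> TaskPlan:
        if interval_seconds <= 0:
            raise ValueError("trigger period must be positive")
        if not target_cluster and not target_hosts:
            raise ValueError("a target cluster or target hosts must be given")
        if target_cluster and target_cluster not in self.inventory:
            raise KeyError(f"unknown cluster: {target_cluster}")
        if plan_id in self.plans:
            raise ValueError(f"plan already exists: {plan_id}")
        plan = TaskPlan(plan_id, script_id, interval_seconds, target_cluster,
                        list(target_hosts or []), next_run=now + interval_seconds)
        self.plans[plan_id] = plan
        return plan

    def resolve_hosts(self, plan: TaskPlan) -> List[str]:
        # Resolved at push time (a design assumption), so hosts added to the
        # cluster after the plan was created are still reached.
        if plan.target_cluster:
            return list(self.inventory[plan.target_cluster])
        return list(plan.target_hosts)
```

Targeting a cluster rather than a fixed host list is what lets one plan cover every host of that cluster.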

S3: based on the first task execution plan, pushing the first timed task script to the hosts under the big data cluster at regular times.

Specifically, the task center puts the timed task into a task pool, triggers it periodically, automatically pushes the first timed task script to the target big data cluster or target host in time, and executes the first timed task script there to complete the timed task operation. The periodic triggering can be implemented with an existing timer or scheduler.
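One way to realize the task pool and its periodic triggering is a min-heap ordered by next trigger time. The sketch below is an illustration only: the description leaves the timer to an existing implementation, and the names, the fixed-interval period and the skip-missed-runs policy are assumptions.

```python
import heapq
from typing import List, Tuple


class TaskPool:
    """Min-heap of (next_run, plan_id, interval): the earliest-due plan is on top."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, str, int]] = []

    def add(self, plan_id: str, first_run: float, interval: int) -> None:
        if interval <= 0:
            raise ValueError("interval must be positive")
        heapq.heappush(self._heap, (first_run, plan_id, interval))

    def pop_due(self, now: float) -> List[str]:
        """Return the plans whose trigger time has arrived and reschedule them."""
        fired: List[str] = []
        while self._heap and self._heap[0][0] <= now:
            run_at, plan_id, interval = heapq.heappop(self._heap)
            fired.append(plan_id)
            # Advance from the planned time rather than 'now' to avoid drift, and
            # skip slots missed while the scheduler was idle instead of replaying them.
            next_run = run_at + interval
            while next_run <= now:
                next_run += interval
            heapq.heappush(self._heap, (next_run, plan_id, interval))
        return fired
```

A driver would call `pop_due` with the current time in a loop (for example once per second) and hand each returned plan id to the push step.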

With the big data cluster processing method provided by the invention, big data cluster operation and maintenance staff need to modify the first timed task script only once for the change to take effect on all hosts under the specified big data cluster.

The big data cluster processing method provided by the invention comprises any one or a combination of the following steps:

respectively modifying the first timed task script and the first task execution plan, and pushing the modified first timed task script to the hosts under the big data cluster at regular times based on the modified first task execution plan; in this way, operation and maintenance staff need to modify the timed task execution plan and the timed task script only once for the change to take effect on all hosts under the specified big data cluster;

and deleting the first timed task script and the first task execution plan, and pushing the deletion to the hosts under the big data cluster. The invention thus provides a complete timed task management flow in which script definition, script distribution, timed task execution, task state monitoring and task result collection are performed uniformly for the hosts in the big data cluster, which improves the efficiency of big data cluster operation and maintenance.

Preferably, after the first timed task script, or the modified first timed task script, is pushed to the hosts under the big data cluster at regular times, the method further includes: monitoring the state of the first task execution plan, and reporting that state to the server side corresponding to the big data cluster.

Further, after the first timed task script is pushed to the hosts under the big data cluster at regular times based on the first task execution plan, the method further includes:

collecting the task operation results fed back by the hosts under the big data cluster, and reporting them to the server side corresponding to the big data cluster. In this way the task execution state can be checked and the task results collected: the execution results are returned to the system, which stores each received task execution result in the log library and updates the execution state and next execution time of the task in the task library.
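The bookkeeping described here (results into the log library, state and next execution time into the task library) can be sketched in Python as follows. The record layouts, the state names and the rule that any non-zero exit code marks the run as failed are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class TaskRecord:
    """Row in the (hypothetical) task library."""
    task_id: str
    interval_seconds: int
    next_run: float
    state: str = "PENDING"


@dataclass
class LogEntry:
    """Row in the (hypothetical) log library: one host's result for one run."""
    task_id: str
    host: str
    started_at: float
    exit_code: int
    output: str


class ResultCollector:
    def __init__(self, task_library: Dict[str, TaskRecord]) -> None:
        self.task_library = task_library
        self.log_library: List[LogEntry] = []

    def report(self, task_id: str, started_at: float,
               host_results: Dict[str, Tuple[int, str]]) -> str:
        # 1. Store every host's result in the log library.
        for host, (exit_code, output) in host_results.items():
            self.log_library.append(LogEntry(task_id, host, started_at, exit_code, output))
        # 2. Update the execution state and next execution time in the task library.
        task = self.task_library[task_id]
        failed = any(code != 0 for code, _ in host_results.values())
        task.state = "FAILED" if failed else "SUCCESS"
        task.next_run = started_at + task.interval_seconds
        return task.state
```

Because `next_run` is advanced whatever the outcome, a failed task is simply run again at its next trigger in this sketch.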

To further understand the method of the present embodiment, in a specific example, as shown in fig. 2, the method for processing a big data cluster of the present embodiment includes:

Step 2011: defining a script for a host in the big data cluster, and generating a first timed task script;

First, the task operations that need to be executed at fixed times on the big data cluster or a host are determined and written as a first timed task script in the Shell or Python programming language, and the script upload interface of the task center is then called to upload the first timed task script to the task center.

Step 2012: creating a first task execution plan under the big data cluster, and adding the first timed task script to the corresponding first task execution plan;

The script uploaded by the user is stored in a script library, the scheduled task created by the user is stored in a task library, the results of task execution are stored in a log library, and the task list in the task library is scanned in real time. When a task in the task list reaches its preset execution time, the system parses the task, obtains the script and target hosts required to execute it, extracts the script from the script library, sends it to the target hosts over the SFTP protocol, and executes it on the target hosts.
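The scan-and-dispatch cycle described above can be sketched as follows. The SFTP transfer and remote execution are hidden behind a hypothetical Transport interface so the logic runs without real hosts; a production version might implement it with an SSH/SFTP client library such as paramiko (`SFTPClient.putfo` for the upload, `SSHClient.exec_command` for the run), but that wiring, the remote directory and the interpreter field are assumptions, not part of the described system.

```python
import shlex
from dataclasses import dataclass
from typing import Dict, List, Protocol, Tuple


class Transport(Protocol):
    """Hypothetical interface; a real one would wrap an SFTP/SSH client."""

    def upload(self, host: str, remote_path: str, content: str) -> None: ...

    def execute(self, host: str, command: str) -> Tuple[int, str]: ...


@dataclass
class Task:
    task_id: str
    script_name: str
    interpreter: str  # e.g. "sh" or "python3" (assumed field)
    hosts: List[str]
    next_run: float


REMOTE_DIR = "/tmp/taskcenter"  # hypothetical remote location


def scan_and_dispatch(tasks: List[Task], script_library: Dict[str, str],
                      transport: Transport, now: float) -> List[Tuple[str, str, int, str]]:
    """One scan of the task list: push and run every task that is due."""
    results = []
    for task in tasks:
        if task.next_run > now:
            continue  # not yet at its execution time
        content = script_library[task.script_name]  # extract from the script library
        remote_path = f"{REMOTE_DIR}/{task.task_id}/{task.script_name}"
        command = f"{task.interpreter} {shlex.quote(remote_path)}"
        for host in task.hosts:
            transport.upload(host, remote_path, content)
            code, output = transport.execute(host, command)
            results.append((task.task_id, host, code, output))
    return results


class FakeTransport:
    """In-memory stand-in used to exercise the scan without real hosts."""

    def __init__(self) -> None:
        self.files: Dict[Tuple[str, str], str] = {}
        self.commands: List[Tuple[str, str]] = []

    def upload(self, host: str, remote_path: str, content: str) -> None:
        self.files[(host, remote_path)] = content

    def execute(self, host: str, command: str) -> Tuple[int, str]:
        self.commands.append((host, command))
        return 0, "ok"
```

In this sketch the dispatcher does not advance `next_run`; updating the execution state and next execution time is left to the result-collection step, which the description assigns to the task and log libraries.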

Step 2013: based on the first task execution plan, pushing the first timed task script to the hosts under the big data cluster at regular times;

The task center puts the timed task into a task pool, triggers it periodically, automatically pushes the first timed task script to the target big data cluster or target host, and executes the first timed task script to complete the timed task operation.

Step 2014: collecting the task operation results fed back by the hosts under the big data cluster, and reporting them to the server side corresponding to the big data cluster;

After the first timed task script has been executed and the timed task operation is complete, the task operation results fed back by the hosts under the big data cluster are preferably collected. A result fed back by a host indicates either success or failure; if the run failed, the task is run again in the next execution. The task operation results are reported to the server side corresponding to the big data cluster so that the server side can obtain them.

Step 2015: respectively modifying the first timed task script and the first task execution plan, and pushing the modified first timed task script to the hosts under the big data cluster at regular times based on the modified first task execution plan;

That is, the first timed task script and the first task execution plan are each modified, giving a modified first task execution plan, and the modified first timed task script is pushed to the hosts under the big data cluster at regular times. Operation and maintenance staff therefore need to modify the timed task execution plan and the timed task script only once for the change to take effect on all hosts under the specified big data cluster.
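The "modify once, effective everywhere" property follows from keeping a single central copy of each script and plan and resolving the target hosts at push time. A minimal sketch under that assumption (CentralTaskStore and its methods are hypothetical names):

```python
from typing import Dict, List, Optional


class CentralTaskStore:
    """Single copy of each script and plan, instead of one crontab per host."""

    def __init__(self, cluster_hosts: Dict[str, List[str]]) -> None:
        self.cluster_hosts = cluster_hosts
        self.scripts: Dict[str, str] = {}
        self.plans: Dict[str, dict] = {}

    def add(self, plan_id: str, script_name: str, content: str,
            cluster: str, interval_seconds: int) -> None:
        self.scripts[script_name] = content
        self.plans[plan_id] = {"script": script_name, "cluster": cluster,
                               "interval": interval_seconds}

    def modify(self, plan_id: str, content: Optional[str] = None,
               interval_seconds: Optional[int] = None) -> None:
        """One edit, made in place; nothing is touched on the hosts here."""
        plan = self.plans[plan_id]
        if content is not None:
            self.scripts[plan["script"]] = content
        if interval_seconds is not None:
            plan["interval"] = interval_seconds

    def next_push(self, plan_id: str) -> Dict[str, str]:
        """What the next timed push sends: the current script, to every host."""
        plan = self.plans[plan_id]
        content = self.scripts[plan["script"]]
        return {host: content for host in self.cluster_hosts[plan["cluster"]]}
```

A single `modify` call is enough for the next push to carry the new script to every host of the cluster.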

Step 2016: monitoring the state of the first task execution plan, and reporting it to the server side corresponding to the big data cluster.

The state of the modified first task execution plan is monitored and reported to the server side corresponding to the big data cluster, enabling real-time monitoring and management.

That is, the script uploaded by the user is stored in the script library, the scheduled task created by the user is stored in the task library, the result of task execution is stored in the log library, and the task list in the task library is scanned in real time. When a task in the task list reaches its predetermined execution time, the embodiment of the invention parses the first task execution plan, obtains the first timed task script and target hosts it requires, extracts the first timed task script from the script library, sends it to the target hosts over the SFTP protocol, and executes it on the target hosts.

To further understand the method of the present embodiment, in a specific example, as shown in fig. 3, the method for processing a big data cluster of the present embodiment includes:

Step 3011: defining a script for the hosts in the big data cluster, and generating a first timed task script;

Step 3012: creating a first task execution plan under the big data cluster, and adding the first timed task script to the corresponding first task execution plan;

Step 3013: based on the first task execution plan, pushing the first timed task script to the hosts in the big data cluster at regular times;

Step 3014: collecting the task operation results fed back by the hosts in the big data cluster, and reporting them to the server side corresponding to the big data cluster;

Step 3015: respectively updating the first timed task script and the first task execution plan to generate a second timed task script and a second task execution plan, and, based on the second task execution plan, pushing the second timed task script to the hosts under the big data cluster at regular times;

When an update is needed, only the first timed task script and the first task execution plan have to be updated, generating a second timed task script and a second task execution plan, and the second timed task script is then pushed to the hosts under the big data cluster at regular times. The update is thus achieved efficiently.

The embodiment of the invention parses the second task execution plan, obtains the second timed task script and target hosts required to execute it, extracts the second timed task script from the script library, sends it to the target hosts over the SFTP protocol, and executes it on the target hosts.
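The distinction between modifying (step 2015) and updating (step 3015) can be illustrated by keeping every version: an update appends a second version instead of overwriting the first, and the scheduler pushes whichever version is current. This versioning scheme, and the names below, are assumptions for illustration; the description does not say whether the first version is retained.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class PlanVersion:
    version: int
    script: str
    interval_seconds: int


class VersionedPlans:
    """An update appends a new (second, third, ...) version instead of editing in place."""

    def __init__(self) -> None:
        self._history: Dict[str, List[PlanVersion]] = {}

    def create(self, name: str, script: str, interval_seconds: int) -> PlanVersion:
        if name in self._history:
            raise ValueError(f"plan already exists: {name}")
        first = PlanVersion(1, script, interval_seconds)
        self._history[name] = [first]
        return first

    def update(self, name: str, script: str, interval_seconds: int) -> PlanVersion:
        history = self._history[name]
        new = PlanVersion(history[-1].version + 1, script, interval_seconds)
        history.append(new)
        return new

    def current(self, name: str) -> PlanVersion:
        """The version the scheduler should push from now on."""
        return self._history[name][-1]

    def previous(self, name: str, version: int) -> PlanVersion:
        return self._history[name][version - 1]
```

Retaining the first version is a choice of this sketch; it would allow a rollback by re-pushing an older version, at the cost of storing every revision.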

Step 3016: monitoring the state of the second task execution plan, and reporting it to the server side corresponding to the big data cluster.
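A sketch of the state monitoring and reporting step: per-host states are folded into one plan state and serialized as a JSON report for the server side corresponding to the big data cluster. The aggregation rule, the state names and the report fields are hypothetical; the description does not specify a report format.

```python
import json
import time
from typing import Dict, Optional


def plan_state(host_states: Dict[str, str]) -> str:
    """Fold per-host states into one plan state (the precedence rule is assumed)."""
    if not host_states:
        return "PENDING"
    states = set(host_states.values())
    if states == {"PENDING"}:
        return "PENDING"
    if states & {"PENDING", "RUNNING"}:
        return "RUNNING"
    if "FAILED" in states:
        return "FAILED"
    return "SUCCESS"


def build_status_report(plan_id: str, cluster: str, host_states: Dict[str, str],
                        next_run: float, reported_at: Optional[float] = None) -> str:
    """JSON payload a monitor could send to the server side (fields are hypothetical)."""
    payload = {
        "plan_id": plan_id,
        "cluster": cluster,
        "state": plan_state(host_states),
        "hosts": host_states,
        "next_run": next_run,
        "reported_at": reported_at if reported_at is not None else time.time(),
    }
    return json.dumps(payload, sort_keys=True)
```

Sending the aggregated state together with the per-host detail lets the server side show a one-word plan status while still pointing at the hosts that failed.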

To further understand the method of the present embodiment, in a specific example, as shown in fig. 4, the method for processing a big data cluster of the present embodiment includes:

Step 4011: performing script definition on a host in the big data cluster, and generating a first timed task script;

Step 4012: creating a first task execution plan under the big data cluster, and adding the first timed task script to the corresponding first task execution plan;

Step 4013: based on the first task execution plan, pushing the first timed task script to the hosts under the big data cluster at regular times;

Step 4014: collecting the task operation results fed back by the hosts in the big data cluster, and reporting them to the server side corresponding to the big data cluster;

Step 4015: deleting the first timed task script and the first task execution plan, and pushing the deletion to the hosts under the big data cluster.

If some content needs to be deleted, the first timed task script and first task execution plan corresponding to that content can simply be deleted and the deletion pushed to the hosts under the big data cluster, which makes the operation convenient.
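Deleting a plan and pushing the deletion to the hosts might look like the following sketch, which removes the plan, drops its script only when no other plan still uses it, and returns the cleanup command to run on each host of the cluster. The remote directory and the rm-based cleanup are assumptions.

```python
import shlex
from typing import Dict, List, Tuple

REMOTE_DIR = "/tmp/taskcenter"  # hypothetical location the scripts were pushed to


def delete_plan(plans: Dict[str, dict], scripts: Dict[str, str],
                cluster_hosts: Dict[str, List[str]], plan_id: str) -> List[Tuple[str, str]]:
    """Delete a plan (and its script if unused) and return per-host cleanup commands."""
    plan = plans.pop(plan_id)  # once removed, the scheduler no longer triggers it
    # Keep the script if another plan still references it.
    if all(other["script"] != plan["script"] for other in plans.values()):
        scripts.pop(plan["script"], None)
    command = f"rm -rf -- {shlex.quote(REMOTE_DIR + '/' + plan_id)}"
    return [(host, command) for host in cluster_hosts[plan["cluster"]]]
```

The returned (host, command) pairs would be handed to the same push channel used for the scripts themselves.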

The big data cluster processing system provided by the present invention is described below; the system described below and the method described above correspond to each other and may be read with reference to each other.

The embodiment of the invention discloses a big data cluster processing system, which is shown in figure 5 and comprises the following components:

the script definition module 10 is configured to perform script definition on a host in the big data cluster and generate a first timed task script;

Specifically, the task operations that need to be executed regularly on the big data cluster or a host are determined and written as a first timed task script in the Shell or Python programming language, and the script upload interface of the task center is then called to upload the first timed task script to the task center.

the script distribution module 20 is configured to create a first task execution plan under the big data cluster and add the first timed task script to the corresponding first task execution plan;

Specifically, the timed task creation interface of the task center is called, and parameters such as the first timed task script to be executed by the timed task, the target big data cluster or target host of the timed task, and the period at which the timed task is triggered are set, which completes the creation of the timed task.

That is, the script uploaded by the user is stored in the script library, the scheduled task created by the user is stored in the task library, the result of task execution is stored in the log library, and the task list in the task library is scanned in real time. When a task in the task list reaches its preset execution time, the system parses the task, obtains the script and target hosts required to execute it, extracts the script from the script library, sends it to the target hosts over the SFTP protocol, and executes it on the target hosts;

and the timed pushing module 30 is configured to push the first timed task script to the hosts in the big data cluster at regular times based on the first task execution plan.

Specifically, the task center puts the timed task into a task pool, triggers it periodically, automatically pushes the first timed task script to the target big data cluster or target host, and executes the first timed task script to complete the timed task operation.

In the big data cluster processing system provided by the invention, a timed task script is defined, a task execution plan is created under the specified big data cluster, the script is added to the task execution plan, the script is pushed at regular times to the hosts under the cluster and run according to the execution plan, and the task execution state and result are reported to the system server side. Because the first timed task script is pushed to the hosts under the big data cluster at regular times based on the first task execution plan, operation and maintenance staff need to modify the first timed task script only once for the change to take effect on all hosts under the specified big data cluster.

The big data cluster processing system provided by the invention comprises any one or a combination of the following components:

the modification module is used for respectively modifying the first timed task script and the first task execution plan, and pushing the modified first timed task script to the hosts under the big data cluster at regular times based on the modified first task execution plan; in this way, operation and maintenance staff need to modify the timed task execution plan and the timed task script only once for the change to take effect on all hosts under the specified big data cluster;

and the deleting module is used for deleting the first timed task script and the first task execution plan, and pushing the deletion to the hosts under the big data cluster. The invention thus provides a complete timed task management flow in which script definition, script distribution, timed task execution, task state monitoring and task result collection are performed uniformly for the hosts in the big data cluster, which improves the efficiency of big data cluster operation and maintenance.

The big data cluster processing system provided by the invention comprises:

the operation result collection module is used for collecting the task operation results fed back by the hosts under the big data cluster, and reporting them to the server side corresponding to the big data cluster. In this way the task execution state can be checked and the task results collected: the execution results are returned to the system, which stores each received task execution result in the log library and updates the execution state and next execution time of the task in the task library.

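Fig. 6 illustrates the physical structure of an electronic device, which may include a processor 610, a communications interface 620, a memory 630 and a communication bus 640, where the processor 610, the communications interface 620 and the memory 630 communicate with one another via the communication bus 640. The processor 610 may invoke logic instructions in the memory 630 to perform a big data cluster processing method comprising: performing script definition on a host in a big data cluster to generate a first timed task script; creating a first task execution plan under the big data cluster, and adding the first timed task script to the corresponding first task execution plan; and, based on the first task execution plan, pushing the first timed task script to the hosts under the big data cluster at regular times.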

In addition, when implemented in the form of software functional units and sold or used as independent products, the logic instructions in the memory 630 may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk.

In another aspect, the present invention also provides a computer program product comprising a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions which, when executed by a computer, enable the computer to perform the big data cluster processing method described above.

In yet another aspect, the present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the big data cluster processing method described above.

The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.

Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.

Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. A big data cluster processing method, characterized by comprising the following steps:

2. The big data cluster processing method according to claim 1, further comprising any one or a combination of:

3. The big data cluster processing method according to claim 2, wherein after the first timed task script, or the modified first timed task script, is pushed to the hosts under the big data cluster at regular times, the method further comprises:

4. The big data cluster processing method according to claim 1, wherein after the first timed task script is pushed to the hosts under the big data cluster at regular times based on the first task execution plan, the method further comprises:

5. A big data cluster processing system, comprising:

6. The big data cluster processing system of claim 5, wherein the big data cluster processing system comprises any one or a combination of:

7. The big data cluster processing system of claim 6, wherein the big data cluster processing system comprises:

8. The big data cluster processing system of claim 6, wherein the big data cluster processing system comprises:

9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the big data cluster processing method according to any of claims 1 to 4 when executing the program.

10. A non-transitory computer readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the big data cluster processing method according to any of claims 1 to 4.