Disclosure of Invention
The invention provides a method and a device for identifying risk problems in a dependency relationship of batch operations, aiming at solving the technical problem that the risk problems in the dependency relationship of the batch operations are difficult to find in the prior art.
In order to achieve the above object, according to one aspect of the present invention, there is provided a risk problem identification method in a dependency relationship of a batch job, the method including:
acquiring a preset directed graph structure, wherein the directed graph structure comprises: the graph nodes correspond to the jobs one by one, and the directed edges are used for representing the job dependency relationship;
searching the directed graph structure for graph nodes that are not marked as traversed and whose corresponding job type is automatic, marking each graph node found as traversed, and pushing it onto a stack;
executing a plurality of searching steps until the stack is empty, wherein each time the searching step is executed, a graph node is selected from the stack as a current node and a target node corresponding to the current node is searched for in the directed graph structure; if the target node is found, the target node is updated to be the current node and the search for a target node continues, the searching step ending when no target node of the current node can be found; wherein the target node is a graph node that is not marked as traversed in the directed-edge direction of the current node; each time the target node is found, the target node is marked as traversed and, if the target node is not in the stack, the target node is pushed onto the stack; the current node before updating is popped each time the target node is updated to be the current node, and the current node is popped when no target node of the current node can be found; and each time the target node is found, first risk problem information is generated if the job type corresponding to the target node is not dependent, and second risk problem information is generated if the target node is in the stack.
Optionally, the method for identifying risk problems in the dependency relationship of batch jobs further includes:
if the stack is empty, and no graph node which is not marked as traversed and the corresponding operation type is automatic exists in the directed graph structure, judging whether a graph node which is not marked as traversed exists in the directed graph structure;
if so, generating third risk problem information.
Optionally, the method for identifying risk problems in the dependency relationship of batch jobs further includes:
acquiring a full set of job definition rule SQL statements and a full set of job dependency relationship rule SQL statements;
parsing the job definition rule SQL statements to obtain the full set of job definition information, wherein each piece of job definition information comprises: a job ID and a job type;
parsing the job dependency relationship rule SQL statements to obtain job dependency relationship information, wherein each piece of job dependency relationship information comprises: a preceding job ID and a succeeding job ID;
and if both the preceding job ID and the succeeding job ID in the parsed job dependency relationship information have corresponding job definition information, recording the parsed job dependency relationship information.
Optionally, the method for identifying risk problems in the dependency relationship of batch jobs further includes:
and establishing the directed graph structure according to the full set of job definition information and the recorded job dependency relationship information, wherein, when the directed graph structure is established, each job ID is used as a graph node, the job type is used as an attribute of the graph node, and the directed edges connecting the graph nodes are established according to the job dependency relationships.
Optionally, the method for identifying risk problems in the dependency relationship of batch jobs further includes:
and if the preceding job ID and/or the succeeding job ID in the parsed job dependency relationship information has no corresponding job definition information, generating fourth risk problem information.
Optionally, the selecting a graph node from the stack as the current node specifically includes:
and taking the graph node at the top of the stack as the current node.
In order to achieve the above object, according to another aspect of the present invention, there is provided an apparatus for identifying risk problems in dependency relationships of a batch job, the apparatus including:
the directed graph structure acquiring unit is used for acquiring a preset directed graph structure, wherein the directed graph structure comprises: the graph nodes correspond to the jobs one by one, and the directed edges are used for representing the job dependency relationship;
the node stacking unit is used for searching the graph nodes which are not marked as traversed and the corresponding operation types are automatic from the directed graph structure, and marking the searched graph nodes as traversed and stacked;
a first risk problem identification unit for performing a plurality of search steps until the stack is empty, selecting a graph node from the stack as a current node each time a search step is performed, searching a target node corresponding to the current node in the directed graph structure, if the target node is searched, updating the target node to the current node, continuing searching the target node, ending the searching step until the target node of the current node cannot be searched, wherein the target node is a graph node which is not marked as traversed in the directed edge direction of the current node, marking the target node as traversed when the target node is searched each time, stacking the target node if the target node is not in the stack, popping the current node before updating when the target node is updated each time, and popping the current node when the target node of the current node cannot be searched; when the target node is found every time, if the operation type corresponding to the target node is not dependent, first risk problem information is generated, and if the target node is in a stack, second risk problem information is generated.
Optionally, the apparatus for identifying risk problems in the dependency relationship of batch jobs further includes:
and the second risk problem identification unit is used for judging whether a graph node which is not marked as traversed exists in the directed graph structure or not when a stack is empty and the graph node which is not marked as traversed and the corresponding job type is automatic does not exist in the directed graph structure, and generating third risk problem information when the graph node which is not marked as traversed exists.
Optionally, the apparatus for identifying risk problems in the dependency relationship of batch jobs further includes:
a rule SQL statement acquisition unit, configured to acquire a full set of job definition rule SQL statements and a full set of job dependency relationship rule SQL statements;
a first parsing unit, configured to parse the job definition rule SQL statements to obtain the full set of job definition information, wherein each piece of job definition information comprises: a job ID and a job type;
a second parsing unit, configured to parse the job dependency relationship rule SQL statements to obtain job dependency relationship information, wherein each piece of job dependency relationship information comprises: a preceding job ID and a succeeding job ID;
and a job dependency relationship information recording unit, configured to record the parsed job dependency relationship information when both the preceding job ID and the succeeding job ID in the parsed job dependency relationship information have corresponding job definition information.
Optionally, the apparatus for identifying risk problems in the dependency relationship of batch jobs further includes:
and a directed graph structure establishing unit, configured to establish the directed graph structure according to the full set of job definition information and the recorded job dependency relationship information, wherein, when the directed graph structure is established, each job ID is used as a graph node, the job type is used as an attribute of the graph node, and the directed edges connecting the graph nodes are established according to the job dependency relationships.
Optionally, the apparatus for identifying risk problems in the dependency relationship of batch jobs further includes:
and a third risk problem identification unit, configured to generate fourth risk problem information if the preceding job ID and/or the succeeding job ID in the parsed job dependency relationship information has no corresponding job definition information.
In order to achieve the above object, according to another aspect of the present invention, there is also provided a computer device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the steps in the risk problem identification method in the batch job dependency relationship when executing the computer program.
In order to achieve the above object, according to another aspect of the present invention, there is also provided a computer-readable storage medium storing a computer program which, when executed in a computer processor, implements the steps in the risk problem identification method in the batch job dependency relationship described above.
The invention has the beneficial effects that:
according to the method and the device, the dependency relationships of the batch jobs are represented by a directed graph structure, and risk problem identification is then performed on the directed graph structure, so that risk problems in the dependency relationships of batch jobs are effectively identified, and erroneous or risky parts of the dependency relationships can be found in time and fed back to developers.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
It should be noted that the terms "comprises" and "comprising," and any variations thereof, in the description and claims of the present invention and the above-described drawings, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be noted that the embodiments and features of the embodiments may be combined with each other without conflict. The present invention will be described in detail below with reference to the embodiments with reference to the attached drawings.
It should be noted that the method and apparatus for identifying risk problems in dependency relationships of batch jobs in the following embodiments of the present invention may be applied to the financial field and may also be applied to other technical fields.
The invention provides a method for monitoring the dependency relationships of batch jobs in real time, which comprises: parsing rule SQL statements and loading them into memory to obtain job definitions and job dependency relationships; converting the job definitions and job dependency relationships into a directed graph structure; performing a traversal search on the directed graph to obtain the state of the graph, such as isolated nodes and cyclic dependencies; and finally outputting the corresponding risk information.
Fig. 1 is a first flowchart of a method for identifying a risk problem in a dependency relationship of a batch job according to an embodiment of the present invention, and as shown in fig. 1, in an embodiment of the present invention, the method for identifying a risk problem in a dependency relationship of a batch job according to the present invention includes steps S101 to S103.
Step S101, obtaining a preset directed graph structure, wherein the directed graph structure comprises: the graph nodes correspond to the jobs one by one, and the directed edges are used for representing the job dependency relationship.
In the embodiment of the invention, the directed graph structure of the batch operation is established based on the dependency relationship of the batch operation. The directed graph structure comprises a plurality of graph nodes, each graph node corresponds to one job, and the directed graph structure also comprises directed edges, and the directed edges point to another graph node from one graph node and are used for representing the dependency relationship between the jobs.
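As an illustration only, the directed graph structure described above might be held in memory as follows. This is a minimal sketch: the class name `JobGraph`, its field names, and the choice to point each edge from the preceding job to the succeeding job are assumptions of the sketch, not details fixed by the embodiment. The numeric job-type codes follow the "0-automatic" and "1-dependent" labels used in the traversal procedure of this description.

```python
from dataclasses import dataclass, field
from typing import Dict, List

AUTOMATIC = 0  # job type "0-automatic"
DEPENDENT = 1  # job type "1-dependent"


@dataclass
class JobGraph:
    """Directed graph with one node per job and one edge per dependency."""
    # job ID -> job type (the node attribute)
    job_types: Dict[str, int] = field(default_factory=dict)
    # job ID -> IDs of the jobs its outgoing edges point to
    successors: Dict[str, List[str]] = field(default_factory=dict)

    def add_job(self, job_id: str, job_type: int) -> None:
        self.job_types[job_id] = job_type
        self.successors.setdefault(job_id, [])

    def add_dependency(self, preceding: str, succeeding: str) -> None:
        # Assumed direction: preceding job -> succeeding job.
        self.successors[preceding].append(succeeding)


graph = JobGraph()
graph.add_job("A", AUTOMATIC)
graph.add_job("B", DEPENDENT)
graph.add_dependency("A", "B")
print(graph.successors)  # {'A': ['B'], 'B': []}
```

An adjacency list keyed by job ID keeps the lookup of a node's outgoing edges cheap, which is the operation the traversal search performs most often.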
Step S102, searching the directed graph structure for graph nodes that are not marked as traversed and whose corresponding job type is automatic, marking each graph node found as traversed, and pushing it onto a stack.
In the embodiment of the invention, each job has a corresponding job type, and the job types comprise an automatic type and a dependent type.
In the embodiment of the invention, the directed graph structure is searched repeatedly for graph nodes that are not marked as traversed and whose corresponding job type is automatic; each time such a graph node is found, it is marked as traversed and pushed onto the stack, until the directed graph structure contains no graph node that is not marked as traversed and whose corresponding job type is automatic.
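Step S102 can be sketched as a single pass over the job types. The function name `push_start_nodes` and the use of a plain dictionary for the job types are hypothetical choices made for this illustration.

```python
AUTOMATIC = 0  # job type "0-automatic"
DEPENDENT = 1  # job type "1-dependent"


def push_start_nodes(job_types):
    """Mark every automatic job as traversed and push it onto the stack."""
    traversed = set()
    stack = []
    for job_id, job_type in job_types.items():
        if job_type == AUTOMATIC and job_id not in traversed:
            traversed.add(job_id)
            stack.append(job_id)
    return stack, traversed


stack, traversed = push_start_nodes({"A": AUTOMATIC, "B": DEPENDENT, "C": AUTOMATIC})
print(stack)  # ['A', 'C']
```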
Step S103, executing a plurality of searching steps until the stack is empty, wherein each time the searching step is executed, a graph node is selected from the stack as a current node and a target node corresponding to the current node is searched for in the directed graph structure; if the target node is found, the target node is updated to be the current node and the search for a target node continues, the searching step ending when no target node of the current node can be found; wherein the target node is a graph node that is not marked as traversed in the directed-edge direction of the current node; each time the target node is found, the target node is marked as traversed and, if the target node is not in the stack, the target node is pushed onto the stack; the current node before updating is popped each time the target node is updated to be the current node, and the current node is popped when no target node of the current node can be found; and each time the target node is found, first risk problem information is generated if the job type corresponding to the target node is not dependent, and second risk problem information is generated if the target node is in the stack.
In the embodiment of the invention, the first risk problem information is used for indicating the node type error. In an embodiment of the present invention, when the target node is found each time, if the job type corresponding to the target node is not dependent, the first risk problem information is generated according to the node ID of the target node.
In the embodiment of the present invention, the second risk problem information is used to indicate that the job dependency relationships contain a cyclic dependency. In an embodiment of the present invention, each time the target node is found, if the target node is already in the stack, the second risk problem information is generated according to the node ID of the target node.
In an embodiment of the present invention, the selecting a graph node from a stack as a current node specifically includes: and taking the graph node at the top of the stack as the current node.
Therefore, the dependency relationships of the batch jobs are represented by a directed graph structure, and risk problem identification is performed on the directed graph structure, so that risk problems in the dependency relationships of batch jobs are effectively identified, and erroneous or risky parts of the dependency relationships can be found in time and fed back to developers.
Fig. 2 is a second flowchart of a method for identifying a risk problem in a dependency relationship of a batch job according to an embodiment of the present invention, and as shown in fig. 2, in an embodiment of the present invention, the method for identifying a risk problem in a dependency relationship of a batch job further includes steps S201 to S202.
Step S201, if the stack is empty, and there is no graph node in the directed graph structure that is not marked as traversed and the corresponding job type is automatic, determining whether there is a graph node in the directed graph structure that is not marked as traversed.
Step S202, if such a graph node exists, generating third risk problem information.
In an embodiment of the present invention, the third risk problem information is used to indicate that there is an isolated job. In this step, if there is a graph node that is not marked as traversed in the directed graph structure, the third risk problem information is generated according to the node ID of the graph node that is not marked as traversed.
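The check in steps S201 to S202 can be illustrated with a short sketch. The function names, the message text, and the simplified reachability pass (which stands in for the full search of step S103 and performs no type or cycle checks) are assumptions made for this example. It also shows a consequence of the check: a closed loop of dependent jobs that no automatic job leads into is never entered by the search, so its jobs surface here as untraversed nodes.

```python
AUTOMATIC = 0  # job type "0-automatic"
DEPENDENT = 1  # job type "1-dependent"


def traverse_from_automatic(job_types, successors):
    """Simplified stand-in for the search: mark all jobs reachable from automatic jobs."""
    stack = [job_id for job_id, t in job_types.items() if t == AUTOMATIC]
    traversed = set(stack)
    while stack:
        current = stack.pop()
        for target in successors.get(current, []):
            if target not in traversed:
                traversed.add(target)
                stack.append(target)
    return traversed


def third_risk_info(job_types, traversed):
    """One message per graph node that was never marked as traversed."""
    return [f"isolated job: {job_id}" for job_id in job_types if job_id not in traversed]


job_types = {"A": AUTOMATIC, "B": DEPENDENT, "X": DEPENDENT, "Y": DEPENDENT}
successors = {"A": ["B"], "X": ["Y"], "Y": ["X"]}
traversed = traverse_from_automatic(job_types, successors)
print(third_risk_info(job_types, traversed))
# ['isolated job: X', 'isolated job: Y']
```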
Fig. 3 is a third flowchart of a method for identifying a risk problem in a dependency relationship of a batch job according to an embodiment of the present invention, and as shown in fig. 3, in an embodiment of the present invention, the method for identifying a risk problem in a dependency relationship of a batch job further includes steps S301 to S304.
Step S301, acquiring a full set of job definition rule SQL statements and a full set of job dependency relationship rule SQL statements.
Step S302, parsing the job definition rule SQL statements to obtain the full set of job definition information, wherein each piece of job definition information comprises: a job ID and a job type.
Step S303, parsing the job dependency relationship rule SQL statements to obtain job dependency relationship information, wherein each piece of job dependency relationship information comprises: a preceding job ID and a succeeding job ID.
In the embodiment of the invention, the job definitions and the job dependency relationships are each written as rule SQL statements, from which the job definition information and the job dependency relationship information are respectively obtained in simplified, abstracted form.
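The parsing in steps S302 and S303 might look like the sketch below. The description does not give the form of the rule SQL statements, so the `INSERT` statement shapes, the table names `job_def` and `job_dep`, the column names, and the function names are all hypothetical and chosen only to make the example runnable. Statements that do not match the assumed form are collected separately rather than silently dropped.

```python
import re

# Hypothetical statement forms; the real rule SQL format is not specified here.
DEF_RE = re.compile(
    r"^\s*INSERT\s+INTO\s+job_def\s*\(\s*job_id\s*,\s*job_type\s*\)\s*"
    r"VALUES\s*\(\s*'([^']+)'\s*,\s*([01])\s*\)\s*;?\s*$",
    re.IGNORECASE,
)
DEP_RE = re.compile(
    r"^\s*INSERT\s+INTO\s+job_dep\s*\(\s*pre_job_id\s*,\s*post_job_id\s*\)\s*"
    r"VALUES\s*\(\s*'([^']+)'\s*,\s*'([^']+)'\s*\)\s*;?\s*$",
    re.IGNORECASE,
)


def parse_definitions(statements):
    """Return ({job_id: job_type}, [statements that do not match the form])."""
    definitions, malformed = {}, []
    for statement in statements:
        match = DEF_RE.match(statement)
        if match:
            definitions[match.group(1)] = int(match.group(2))
        else:
            malformed.append(statement)
    return definitions, malformed


def parse_dependencies(statements):
    """Return ([(preceding_id, succeeding_id)], [statements that do not match])."""
    dependencies, malformed = [], []
    for statement in statements:
        match = DEP_RE.match(statement)
        if match:
            dependencies.append((match.group(1), match.group(2)))
        else:
            malformed.append(statement)
    return dependencies, malformed


defs, bad = parse_definitions([
    "INSERT INTO job_def (job_id, job_type) VALUES ('J1', 0);",
    "INSERT INTO job_def (job_id, job_type) VALUES ('J2', 1);",
    "INSERT job_def J3",
])
print(defs, bad)  # {'J1': 0, 'J2': 1} ['INSERT job_def J3']
```

Regular expressions are sufficient here only because the assumed statements are rigidly formatted; a full SQL parser would be the safer choice for anything less constrained.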
Step S304, if both the preceding job ID and the succeeding job ID in the parsed job dependency relationship information have corresponding job definition information, recording the parsed job dependency relationship information.
In an embodiment of the present invention, the method for identifying risk problems in dependency relationships of batch jobs further includes:
and if the preceding job ID and/or the succeeding job ID in the parsed job dependency relationship information has no corresponding job definition information, generating fourth risk problem information.
As shown in fig. 4, the following risk problems may be found when parsing the rule SQL statements of the job definitions and job dependency relationships:
SQL statement is malformed: the SQL statement is not written strictly in the prescribed statement form.
Job ID is undefined: the job ID is used in a job dependency relationship but does not appear in any job definition SQL statement.
In one embodiment of the present invention, the fourth risk problem information is used to indicate that a job definition does not exist. In an embodiment of the present invention, this step specifically generates the fourth risk problem information according to the job ID for which no corresponding job definition information exists.
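The check against the job definitions can be sketched as follows. The function name `record_valid_dependencies`, the tuple representation of a dependency, and the message text are hypothetical. A dependency is recorded only when both of its job IDs are defined; otherwise one message is produced for each undefined job ID.

```python
def record_valid_dependencies(definitions, dependencies):
    """Split parsed dependencies into recorded ones and fourth-risk messages."""
    recorded, risks = [], []
    for preceding, succeeding in dependencies:
        missing = [job_id for job_id in (preceding, succeeding)
                   if job_id not in definitions]
        if missing:
            for job_id in missing:
                risks.append(f"job definition does not exist: {job_id}")
        else:
            recorded.append((preceding, succeeding))
    return recorded, risks


definitions = {"J1": 0, "J2": 1}
dependencies = [("J1", "J2"), ("J2", "J9")]
recorded, risks = record_valid_dependencies(definitions, dependencies)
print(recorded)  # [('J1', 'J2')]
print(risks)     # ['job definition does not exist: J9']
```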
In an embodiment of the present invention, the method for identifying risk problems in dependency relationships of batch jobs further includes:
and establishing the directed graph structure according to the full set of job definition information and the recorded job dependency relationship information, wherein, when the directed graph structure is established, each job ID is used as a graph node, the job type is used as an attribute of the graph node, and the directed edges connecting the graph nodes are established according to the job dependency relationships.
In an embodiment of the present invention, when new job definition rule SQL statements and new job dependency relationship rule SQL statements are generated, the newly generated rule SQL statements may be parsed to obtain job definition information and job dependency relationship information; new nodes are then added directly to the directed graph structure according to the job definition information, and a new directed edge is added to the directed graph structure when both the preceding job ID and the succeeding job ID in the job dependency relationship information have corresponding job definition information.
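Building the graph and then extending it incrementally can be sketched as below. The dictionary layout, the function names `build_graph` and `apply_new_rules`, and the decision to return the rejected dependencies to the caller are assumptions of this illustration.

```python
def build_graph(definitions, recorded_dependencies):
    """Create nodes from job definitions and edges from recorded dependencies."""
    graph = {
        "job_types": dict(definitions),                        # node attribute
        "successors": {job_id: [] for job_id in definitions},  # outgoing edges
    }
    for preceding, succeeding in recorded_dependencies:
        graph["successors"][preceding].append(succeeding)
    return graph


def apply_new_rules(graph, new_definitions, new_dependencies):
    """Add newly parsed nodes and edges; return dependencies with undefined IDs."""
    rejected = []
    for job_id, job_type in new_definitions.items():
        graph["job_types"][job_id] = job_type
        graph["successors"].setdefault(job_id, [])
    for preceding, succeeding in new_dependencies:
        if preceding in graph["job_types"] and succeeding in graph["job_types"]:
            graph["successors"][preceding].append(succeeding)
        else:
            rejected.append((preceding, succeeding))
    return rejected


graph = build_graph({"J1": 0, "J2": 1}, [("J1", "J2")])
rejected = apply_new_rules(graph, {"J3": 1}, [("J2", "J3"), ("J3", "J4")])
print(graph["successors"])  # {'J1': ['J2'], 'J2': ['J3'], 'J3': []}
print(rejected)             # [('J3', 'J4')]
```

Adding the new nodes before the new edges lets a dependency refer to a job defined in the same batch of rule statements.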
Fig. 5 is a schematic diagram of a search process of traversing a directed graph structure according to an embodiment of the present invention, and as shown in fig. 5, in an embodiment of the present invention, the search process of traversing a directed graph structure according to the present invention is specifically as follows:
(1) Find an untraversed job ID whose job type is 0-automatic as a starting node, mark it as traversed, and push it onto the stack; if no untraversed starting node exists, continue with step (5).
(2) Find the next untraversed node along a directed edge of the current node:
if there is no next untraversed node, pop the current node; if the stack is then empty, continue with step (1); otherwise take the node at the top of the stack as the current node and continue with step (2);
if there is a next untraversed node, take it as the current node and continue with step (3).
(3) Check the job type corresponding to the current node:
if the job type is 0-automatic, record a risk problem (generate first risk problem information) indicating that the job type of the node does not match, and continue with step (4);
if the job type is 1-dependent, continue with step (4).
(4) Check whether the current node is in the stack:
if it is in the stack, record a risk problem (generate second risk problem information) indicating that the job dependency relationships of the node contain a cyclic dependency, and continue with step (2);
if it is not in the stack, mark it as traversed, push it onto the stack, and continue with step (2).
(5) Traverse all nodes one by one and record the nodes that carry no traversed mark (generate third risk problem information); these nodes are isolated nodes.
(6) Output the risk problem information and end the search process.
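The search process above can be sketched as an iterative depth-first search with an explicit stack. The function name `find_risks`, the tuple form of the risk records, and the data layout are hypothetical. One deliberate deviation is labeled here and in the code: the sketch examines every outgoing edge of the current node, not only edges that lead to untraversed nodes. This is an assumption of the sketch, made because every node on the stack is already marked as traversed, so a check restricted to untraversed nodes could not observe a node that is on the stack.

```python
AUTOMATIC = 0  # job type "0-automatic"
DEPENDENT = 1  # job type "1-dependent"


def find_risks(job_types, successors):
    """Return a list of (risk_kind, job_id) records for the job graph."""
    traversed, on_stack, risks = set(), set(), []
    for start, job_type in job_types.items():
        # Step (1): pick an untraversed automatic job as the starting node.
        if job_type != AUTOMATIC or start in traversed:
            continue
        traversed.add(start)
        stack = [start]
        on_stack.add(start)
        next_edge = {start: 0}  # index of the next outgoing edge to examine
        while stack:
            current = stack[-1]  # the top of the stack is the current node
            edges = successors.get(current, [])
            i = next_edge[current]
            if i == len(edges):
                # Step (2): no further edge to follow, so pop the current node.
                stack.pop()
                on_stack.discard(current)
                continue
            next_edge[current] = i + 1
            target = edges[i]
            # Step (3): a job that depends on another job should be dependent.
            # Sketch assumption: checked on every edge, not only new targets.
            if job_types[target] != DEPENDENT:
                risks.append(("type_error", target))
            # Step (4): a target already on the stack closes a cycle.
            if target in on_stack:
                risks.append(("cycle", target))
            elif target not in traversed:
                traversed.add(target)
                stack.append(target)
                on_stack.add(target)
                next_edge[target] = 0
    # Step (5): jobs never marked as traversed are reported as isolated.
    for job_id in job_types:
        if job_id not in traversed:
            risks.append(("isolated", job_id))
    return risks


job_types = {"A": AUTOMATIC, "B": DEPENDENT, "C": DEPENDENT,
             "D": AUTOMATIC, "E": DEPENDENT}
successors = {"A": ["B"], "B": ["C"], "C": ["B", "D"]}
print(find_risks(job_types, successors))
# [('cycle', 'B'), ('type_error', 'D'), ('isolated', 'E')]
```

Keeping the nodes of the current path on the stack until all of their outgoing edges have been examined is what allows the membership test in step (4) to recognize a cyclic dependency.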
It can be seen from the above embodiments that, by abstracting the dependency relationships of the batch jobs into simplified rule SQL statements and parsing those statements into a directed graph structure, the present invention implements a feasible method for checking the dependency relationships of batch jobs in real time, can discover in time the risk problems contained in the job dependency relationships, and provides a guarantee for the correct scheduling of the batch jobs.
It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system such as a set of computer-executable instructions and that, although a logical order is illustrated in the flowcharts, in some cases, the steps illustrated or described may be performed in an order different than presented herein.
Based on the same inventive concept, the embodiment of the present invention further provides a device for identifying risk problems in dependency relationships of batch jobs, which can be used to implement the method for identifying risk problems in dependency relationships of batch jobs described in the foregoing embodiment, as described in the following embodiment. Because the principle of solving the problems by the risk problem identification device in the batch job dependency relationship is similar to the risk problem identification method in the batch job dependency relationship, embodiments of the risk problem identification device in the batch job dependency relationship can refer to embodiments of the risk problem identification method in the batch job dependency relationship, and repeated parts are not described again. As used hereinafter, the term "unit" or "module" may be a combination of software and/or hardware that implements a predetermined function. Although the means described in the embodiments below are preferably implemented in software, an implementation in hardware, or a combination of software and hardware is also possible and contemplated.
Fig. 6 is a first structural block diagram of an apparatus for identifying risk problems in a dependency relationship of a batch job according to an embodiment of the present invention, and as shown in fig. 6, in an embodiment of the present invention, the apparatus for identifying risk problems in a dependency relationship of a batch job according to the present invention includes:
a directed graph structure obtaining unit 1, configured to obtain a preset directed graph structure, where the directed graph structure includes: the graph nodes correspond to the jobs one by one, and the directed edges are used for representing the job dependency relationship;
the node stacking unit 2 is configured to search a graph node which is not marked as traversed and whose corresponding job type is automatic from the directed graph structure, mark the searched graph node as traversed and stack the graph node;
a first risk problem identification unit 3 for performing a plurality of search steps until the stack is empty, selecting a graph node from the stack as a current node each time a search step is performed, searching a target node corresponding to the current node in the directed graph structure, if the target node is searched, updating the target node to the current node, continuing searching the target node, ending the searching step until the target node of the current node cannot be searched, wherein the target node is a graph node which is not marked as traversed in the directed edge direction of the current node, marking the target node as traversed when the target node is searched each time, stacking the target node if the target node is not in the stack, popping the current node before updating when the target node is updated each time, and popping the current node when the target node of the current node cannot be searched; when the target node is found every time, if the operation type corresponding to the target node is not dependent, first risk problem information is generated, and if the target node is in a stack, second risk problem information is generated.
In an embodiment of the present invention, the apparatus for identifying risk problems in dependency relationships of batch jobs according to the present invention further includes:
and the second risk problem identification unit is used for judging whether a graph node which is not marked as traversed exists in the directed graph structure or not when a stack is empty and the graph node which is not marked as traversed and the corresponding job type is automatic does not exist in the directed graph structure, and generating third risk problem information when the graph node which is not marked as traversed exists.
Fig. 7 is a second structural block diagram of an apparatus for identifying risk problems in a dependency relationship of a batch job according to an embodiment of the present invention, and as shown in fig. 7, in an embodiment of the present invention, the apparatus for identifying risk problems in a dependency relationship of a batch job according to the present invention further includes:
a rule SQL statement acquisition unit 4, configured to acquire a full set of job definition rule SQL statements and a full set of job dependency relationship rule SQL statements;
a first parsing unit 5, configured to parse the job definition rule SQL statements to obtain the full set of job definition information, wherein each piece of job definition information comprises: a job ID and a job type;
a second parsing unit 6, configured to parse the job dependency relationship rule SQL statements to obtain job dependency relationship information, wherein each piece of job dependency relationship information comprises: a preceding job ID and a succeeding job ID;
a job dependency relationship information recording unit 7, configured to record the parsed job dependency relationship information when both the preceding job ID and the succeeding job ID in the parsed job dependency relationship information have corresponding job definition information;
and a directed graph structure establishing unit 8, configured to establish the directed graph structure according to the full set of job definition information and the recorded job dependency relationship information, wherein, when the directed graph structure is established, each job ID is used as a graph node, the job type is used as an attribute of the graph node, and the directed edges connecting the graph nodes are established according to the job dependency relationships.
In an embodiment of the present invention, the apparatus for identifying risk problems in dependency relationships of batch jobs according to the present invention further includes:
and a third risk problem identification unit, configured to generate fourth risk problem information if the preceding job ID and/or the succeeding job ID in the parsed job dependency relationship information has no corresponding job definition information.
To achieve the above object, according to another aspect of the present application, there is also provided a computer apparatus. As shown in fig. 8, the computer device comprises a memory, a processor, a communication interface and a communication bus, wherein a computer program that can be run on the processor is stored in the memory, and the steps of the method of the above embodiment are realized when the processor executes the computer program.
The processor may be a Central Processing Unit (CPU). The Processor may also be other general purpose processors, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) or other Programmable logic devices, discrete Gate or transistor logic devices, discrete hardware components, or a combination thereof.
The memory, which is a non-transitory computer readable storage medium, may be used to store non-transitory software programs, non-transitory computer executable programs, and units, such as the corresponding program units in the above-described method embodiments of the present invention. The processor executes various functional applications of the processor and the processing of the work data by executing the non-transitory software programs, instructions and modules stored in the memory, that is, the method in the above method embodiment is realized.
The memory may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created by the processor, and the like. Further, the memory may include high speed random access memory, and may also include non-transitory memory, such as at least one disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory optionally includes memory located remotely from the processor, and such remote memory may be coupled to the processor via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The one or more units are stored in the memory and when executed by the processor perform the method of the above embodiments.
The specific details of the computer device may be understood by referring to the corresponding related descriptions and effects in the above embodiments, and are not described herein again.
In order to achieve the above object, according to another aspect of the present application, there is also provided a computer-readable storage medium storing a computer program which, when executed in a computer processor, implements the steps in the risk problem identification method in batch job dependencies described above. It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic Disk, an optical Disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a Flash Memory (Flash Memory), a Hard Disk (Hard Disk Drive, abbreviated as HDD) or a Solid State Drive (SSD), etc.; the storage medium may also comprise a combination of memories of the kind described above.
It will be apparent to those skilled in the art that the modules or steps of the present invention described above may be implemented by a general purpose computing device, they may be centralized on a single computing device or distributed across a network of multiple computing devices, and they may alternatively be implemented by program code executable by a computing device, such that they may be stored in a storage device and executed by a computing device, or fabricated separately as individual integrated circuit modules, or fabricated as a single integrated circuit module from multiple modules or steps. Thus, the present invention is not limited to any specific combination of hardware and software.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.