CN109344135A

CN109344135A - File-lock-based parallel seismic processing job scheduling method with automatic load balancing

Info

Publication number: CN109344135A
Application number: CN201811214596.XA
Authority: CN
Inventors: 薛东川; 张金淼; 李维新; 朱振宇; 刘永江; 张云鹏; 王小六; 黄小刚; 糜芳; 江南森
Original assignee: China National Offshore Oil Corp CNOOC; CNOOC Research Institute Co Ltd
Current assignee: China National Offshore Oil Corp CNOOC; CNOOC Research Institute Co Ltd
Priority date: 2018-10-18
Filing date: 2018-10-18
Publication date: 2019-02-15

Abstract

The present invention relates to a file-lock-based parallel seismic processing job scheduling method with automatic load balancing. Its steps are: an index file is created on a shared disk, the index file containing only a starting shot number that specifies the next seismic trace gather to be migrated; the compute nodes are logged into remotely and the processing job is submitted; the processing processes on the nodes queue, under a file-lock synchronization mechanism, to read and modify the index file, and write their current information to a log file; each processing process completes the specified seismic data processing and, on completion, writes its current information to the log file; the previous two steps are repeated until all data have been processed; finally, the parallel job log file is checked and any missing seismic data are reprocessed. The present invention improves the practicability, safety and operational efficiency of parallel jobs and facilitates efficient processing of massive seismic data.

Description

File-lock-based parallel seismic processing job scheduling method with automatic load balancing

Technical field

The present invention relates to the intersection of oil and gas seismic exploration and high-performance computing, and in particular to a file-lock-based parallel seismic processing job scheduling method with automatic load balancing.

Background art

As oil and gas exploration becomes increasingly difficult, the petroleum industry has introduced a series of complex exploration techniques, such as 3D seismic exploration, high-density seismic exploration, wide-azimuth seismic exploration and multicomponent seismic surveys. These methods have sharply increased the volume of seismic data to be processed, and processing massive seismic data efficiently has gradually become a bottleneck for technical development. Mainstream seismic data processing software in the industry has developed parallel processing techniques for the most time-consuming processing stages. These techniques usually implement multi-node parallel computing with MPI (Message Passing Interface) programming. MPI is a language-independent communication protocol for writing parallel programs; it supports point-to-point and broadcast communication, and its goals are high performance, scalability and portability. In practice, however, MPI, although effective, has some clear shortcomings. For example, when an MPI parallel program starts, the data to be processed must be distributed to the compute nodes in advance (generally in equal shares). Consequently, once the job has started, adding or removing compute nodes requires first killing all processes running on the existing nodes, then redistributing the data across the adjusted set of nodes and restarting the parallel job. For seismic processing jobs that involve hundreds of nodes and terabytes of data, this procedure is cumbersome and time-consuming. The inability to add or remove compute nodes flexibly during a parallel computation is therefore one shortcoming of MPI parallelism.

Another shortcoming of MPI parallelism is that the failure of a single node can crash the entire parallel job. Production seismic data processing runs on PC clusters, and complex computations (such as pre-stack depth migration) commonly use hundreds of compute nodes or more. In a parallel program written with MPI, any problem on any single node, whether the master node or a slave node (such as a node crash or a process being killed unexpectedly), causes the entire parallel job to collapse. The more nodes participate in the computation, and the more jobs run on those nodes, the higher the probability of such an event. Processing massive seismic data over long periods on a shared cluster therefore places severe demands on the stability of both the MPI runtime environment and the parallel program itself. Furthermore, because an MPI parallel program distributes the data to be processed to each compute node in advance, it can hardly achieve true automatic load balancing. In a heterogeneous cluster, or in a shared cluster whose compute nodes run jobs submitted by different users, the computing performance each node can offer the current job differs. It often happens that high-performance nodes finish their allotted tasks first and then sit idle for a long time, while lower-performance nodes still have a large amount of unprocessed data. This is the well-known "wooden bucket effect": the overall efficiency of a parallel processing job is determined by its slowest compute node.
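The wooden bucket effect can be stated with an idealized model that is not part of the original text: N trace gathers of equal cost, P nodes, and node i needing t_i seconds per gather, with I/O and scheduling overhead ignored. With an equal pre-split (P dividing N), the job ends when the slowest node finishes; if instead each free node claims the next gather (the approach developed below), the combined rate of all nodes sets the run time, to within one gather:

```latex
T_{\mathrm{static}} = \frac{N}{P}\,\max_{1 \le i \le P} t_i ,
\qquad
\frac{N}{\sum_{i=1}^{P} t_i^{-1}} \;\le\; T_{\mathrm{dynamic}} \;\le\; \frac{N}{\sum_{i=1}^{P} t_i^{-1}} + \max_{1 \le i \le P} t_i .
```

The lower bound holds because node i can finish at most T/t_i gathers by time T; the upper bound holds because, at the moment the last gather is claimed, every node has been busy continuously since the start.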

MPI parallel jobs in seismic data processing therefore have three major defects: (1) nodes are difficult to add or remove; (2) node crashes cause severe losses; (3) load imbalance. How to improve the actual operating efficiency of parallel processing jobs for massive seismic data in complex computing environments has become a technical problem urgently requiring a solution.

Summary of the invention

In view of the above problems, the object of the present invention is to provide a file-lock-based parallel seismic processing job scheduling method with automatic load balancing that improves the practicability, safety and operational efficiency of parallel jobs and facilitates efficient processing of massive seismic data.

To achieve the above object, the present invention adopts the following technical solution: a file-lock-based parallel seismic processing job scheduling method with automatic load balancing, comprising the following steps: 1) creating an index file on a shared disk, the index file containing only a starting shot number, which specifies the next seismic trace gather to be migrated; 2) logging in remotely to the compute nodes and submitting the processing job; 3) the processing processes on each node queuing under a file-lock synchronization mechanism to read and modify the index file, and writing the current information of each processing process to a log file; 4) each processing process completing the specified seismic data processing and, on completion, writing its current information to the log file; 5) repeating steps 3) to 4) until all data have been processed; 6) checking the parallel job log file and reprocessing any missing seismic data.

Further, in step 1), the shared disk is a physical disk that every compute node can access over a high-speed network, and every compute node has read and write permission on the shared index file on that disk.

Further, in step 1), the index file is an ASCII text file whose content is only a starting shot number in integer form; this number may be a SHOT, CDP or OFFSET number and must be consistent with the storage order of the seismic data.

Further, in step 2), the processing job is submitted by remote login as follows: 2.1) build a node list: the node list is a text file in which each line records the host name of one node participating in the computation; 2.2) write a script file that uses the Linux remote shell command rsh or ssh to log in, one by one, to the compute nodes in the node list and start the processing of the next step.

Further, in step 3), to prevent the index file from being rewritten or read by other nodes while one node is reading or rewriting it, a file lock is applied to the index file whenever a processing process reads or writes it. After locking, the index file operation sequence becomes: a processing process obtains authorization from the operating system -> the file is unlocked for that process -> read/write operation -> the file is locked again -> the processing process returns the authorization to the operating system. The processing processes in the queue repeat this sequence until every process in the queue has completed its operation.

Further, the index value that a processing process in the queue reads from the index file specifies the seismic trace gather that process is to handle, and the modified index value written back to the index file specifies the next seismic trace gather to be processed. After reading the index value, the processing process writes its current information to the log file; the log record includes the host name, date and time, index number and execution status.

Further, in step 4), the processing process performs pre-stack reverse time migration on the specified seismic trace gather. Pre-stack reverse time migration has a very large computational load but small network overhead, so multi-node parallel processing achieves very high parallel efficiency. After migration of a single shot is completed, the processing process writes its current information to the log file; the log record includes the host name, date and time, index number and execution status, and is used for monitoring the parallel job.

Further, in step 5), no processing task is ever assigned to a specific compute node during seismic data processing. Instead, each node in the node list queues to obtain from the index file the index number of the seismic data to be processed. After finishing the current seismic trace gather, if unprocessed data remain (that is, the current index number is less than or equal to the maximum index value), the node rejoins the end of the queue to obtain more seismic data; if no unprocessed data remain, the process terminates.

Further, during seismic data processing, if the computing performance of the nodes differs, a low-performance node processes data slowly and therefore obtains fewer seismic gathers from the queue, while a high-performance node processes data quickly and obtains more. This "the more capable do more work" behavior adjusts in real time to the actual load on the compute nodes, achieving automatic multi-node load balancing and the highest overall parallel computing efficiency.

Further, in step 6), after the parallel processing job is completed, the log file is checked for the completion status of each trace gather, and data missing because a node crashed or a processing process was killed are identified; the index numbers of the missing data are collected into a list, and these data are reprocessed and added to the final results.

By adopting the above technical solution, the present invention has the following advantages: 1. Compute nodes can easily be added or removed as needed during parallel processing of massive seismic data, improving the flexibility of such parallel jobs. 2. The crash of a single node does not bring down the job, and the resulting loss is small (limited to the gather being processed at the time), which significantly improves the stability of parallel processing of massive seismic data. 3. Automatic multi-node load balancing is achieved, which greatly improves the overall execution efficiency of parallel processing of massive seismic data. 4. The job scheduling method, which achieves automatic multi-node load balancing for seismic data processing based on file-lock synchronization, is simple, practical and highly effective.

Description of the drawings

Fig. 1 shows an example node list;

Fig. 2 shows an example csh script that logs in remotely to the nodes in the node list and starts the processing job;

Fig. 3 shows an example of multiple nodes racing to read and write a file;

Fig. 4 shows an example of sequential file reads and writes after file-lock synchronization;

Fig. 5 shows an example parallel processing job log file;

Fig. 6 shows an example of node status monitoring while 50 nodes complete pre-stack reverse time migration of 2100 shots in parallel;

Fig. 7 shows the number of shot gathers processed by each of the 50 nodes in Fig. 6;

Fig. 8 shows the time each of the 50 nodes in Fig. 7 spent completing its processing task.

Detailed description of the embodiments

The present invention is described in detail below with reference to the drawings and embodiments.

The present invention provides a file-lock-based parallel seismic processing job scheduling method with automatic load balancing. The method is a parallel job scheduling method that achieves automatic multi-node load balancing for seismic data processing based on file-lock synchronization. It is described in detail below using pre-stack reverse time migration of a two-dimensional seismic line as an example. The method comprises the following steps:

1) Create the index file.

An index file is created on the shared disk. It is an ASCII text file whose content is only a starting shot number in integer form (for example, 1001), which specifies the next seismic trace gather to be migrated (the gather whose trace-header shot number equals 1001). This number may be a SHOT, CDP or OFFSET number, and must be consistent with the storage order of the seismic data. Because pre-stack reverse time migration processes data shot by shot, the seismic data are also stored in shot order to improve retrieval efficiency. All compute nodes have read permission on the seismic data and velocity model files, and read and write permission on the index file.

The shared disk is a physical disk that every compute node can access over a high-speed network (for example, 10 Gigabit Ethernet), and every compute node has read and write permission on the shared index file on that disk.
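As an illustration of step 1), the following Python sketch creates and reads an index file in the format described above (a single ASCII integer). It is not taken from the patent: the function names and the write-then-rename approach are assumptions, chosen so that no node can ever observe a half-written file.

```python
import os
import tempfile


def create_index_file(path: str, start_shot: int) -> None:
    """Write the starting shot number as a single ASCII integer line."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory, then rename it into place,
    # so no reader ever sees a partially written index file.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".index.")
    with os.fdopen(fd, "w", encoding="ascii") as f:
        f.write(f"{start_shot}\n")
    os.chmod(tmp_path, 0o664)  # rw for owner and group, so other nodes' processes can update it
    os.replace(tmp_path, path)


def read_index_file(path: str) -> int:
    """Return the shot number currently stored in the index file."""
    with open(path, encoding="ascii") as f:
        return int(f.read().strip())
```

For the example above this would be called as `create_index_file("/shared/rtm_job/index.txt", 1001)`, where the path is hypothetical.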

2) Log in remotely to the compute nodes and submit the processing job.

2.1) Build the node list. The node list is a text file in which each line records the host name of one node participating in the computation (the node list file shown in Fig. 1 contains 23 participating compute nodes).

2.2) Then write a script file that uses the Linux remote shell command rsh or ssh to log in, one by one, to the compute nodes in the node list and start the processing of the next step. Fig. 2 shows an example script written with csh commands. In practice, depending on the number of threads used by the processing program and the memory usage on the compute nodes, multiple processing processes can be started on the same node (for example, uncommenting lines 6-7 in Fig. 2 starts two processing processes).
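The patent's launcher is a csh script (Fig. 2), which is not reproduced in this text. As a hedged stand-in, the Python sketch below reads a node list in the format of step 2.1) and builds one remote-shell command per processing process; `procs_per_node` mirrors the option of starting several processes per node. The job command string and all function names are hypothetical, and `launch` simply runs each command with `subprocess.run`.

```python
import subprocess
from typing import List


def read_node_list(path: str) -> List[str]:
    """Return host names, one per line; blank lines and '#' comments are skipped."""
    nodes = []
    with open(path) as f:
        for line in f:
            name = line.split("#", 1)[0].strip()
            if name:
                nodes.append(name)
    return nodes


def build_launch_commands(nodes: List[str], job_cmd: str,
                          procs_per_node: int = 1,
                          remote_shell: str = "ssh") -> List[List[str]]:
    """Build one remote-shell command per processing process to start."""
    # -n: take stdin from /dev/null; "nohup ... &" detaches the remote process so
    # the remote shell returns immediately and the loop moves on to the next node.
    remote = f"nohup {job_cmd} > /dev/null 2>&1 &"
    return [[remote_shell, "-n", host, remote]
            for host in nodes
            for _ in range(procs_per_node)]


def launch(commands: List[List[str]]) -> None:
    """Start the processing processes node by node."""
    for argv in commands:
        subprocess.run(argv, check=False)
```

Adding nodes to a running job amounts to calling `launch` again with a node list that contains only the new hosts; nothing already running is touched.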

3) The processing processes on each node queue under the file-lock synchronization mechanism to read and write the index file, and write their current information to the log file.

When multiple processing processes read and write the index file simultaneously, a "race condition" can occur: while one node is reading (or rewriting) the index file, its content is rewritten (or read) by other nodes. This not only causes processing processes to read wrong data but also often causes network congestion or even network paralysis. Fig. 3 shows a mild example of such a race. Without any control, after the parallel processing job was started, 188 nodes read and wrote the index file simultaneously, and many nodes processed the same seismic data repeatedly; shot 1071 alone was processed by at least four nodes: b03n032y, b03n038y, b03n031y and b08n100y. When more nodes simultaneously read and write larger amounts of data, network paralysis often results.

To solve this problem, the present invention applies a file lock to the index file whenever a processing process reads or writes it. After locking, all processes that want to access the index file wait in a queue, and only the process that has obtained authorization from the operating system can read or write the file. The index file operation sequence then becomes: a processing process obtains authorization from the operating system -> the file is unlocked for that process -> read/write operation -> the file is locked again -> the processing process returns the authorization to the operating system. The processing processes in the queue repeat this sequence until every process in the queue has completed its operation. Fig. 4 shows an example of processing processes reading and writing the index file synchronously after it is locked; no race condition occurs after locking. The index value a processing process reads from the index file specifies the seismic trace gather that process is to handle, and the modified index value written back to the index file specifies the next seismic trace gather to be processed. As shown in Fig. 4, the index value in the index file for pre-stack reverse time migration is the SHOT number. The processing process on node b03n041y reads 1100 from the index file, increments the index value by 1, and writes it back to the index file. This means that b03n041y will migrate the seismic data whose shot number equals 1100, and the processing process that next accesses the index file (on node b09n119y) will read the index value 1101. In addition, after reading the index value, the processing process writes its current information to the log file; the log record includes the host name, date and time, index number and execution status (as shown in Fig. 5).
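A minimal Python sketch of the lock-protected read-modify-write in step 3), assuming a POSIX system where `fcntl.flock` is available and honored by the shared filesystem (on some network filesystems POSIX record locks via `fcntl.lockf` are the more reliable choice). Holding one exclusive lock across the read, the increment and the write is what prevents two processes from claiming the same shot, as happened with shot 1071 in Fig. 3. The function name is hypothetical.

```python
import fcntl
import os


def claim_next_index(index_path: str) -> int:
    """Atomically read the index, write index + 1 back, and return the value read."""
    # Each caller opens its own descriptor; "r+" reads and writes without truncating.
    with open(index_path, "r+", encoding="ascii") as f:
        fcntl.flock(f, fcntl.LOCK_EX)  # blocks until no other holder remains
        try:
            current = int(f.read().strip())
            f.seek(0)
            f.write(f"{current + 1}\n")
            f.truncate()               # drop any leftover characters
            f.flush()
            os.fsync(f.fileno())       # push the new value to the shared disk
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)
    return current
```

With the index file holding 1100, the process on b03n041y would get 1100 back and leave 1101 in the file for the next caller, matching the sequence in Fig. 4.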

4) Each processing process completes the seismic data processing specified by its index value and, on completion, writes its current information to the log file.

The processing process performs pre-stack reverse time migration on the specified seismic trace gather. Pre-stack reverse time migration is characterized by a very large computational load and small network overhead, so multi-node parallel processing achieves very high parallel efficiency (in general, computing throughput increases linearly with the number of nodes). After migration of a single shot is completed, the processing process writes its current information to the log file; the log record includes the host name, date and time, index number and execution status (as shown in Fig. 5) and is used for monitoring the parallel job.
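Steps 3) and 4), repeated as in step 5), form a simple self-scheduling loop. The sketch below is one possible Python rendering, not the authors' program: the log line layout (`host date time index status`, with status `start` or `end`) is an assumed format modeled on the fields listed above, and `process_shot` is a hypothetical callback standing in for the migration of one shot gather. The claim function repeats the lock-protected index update of step 3) so that the block is self-contained.

```python
import fcntl
import socket
import time
from typing import Callable


def _claim(index_path: str) -> int:
    """Lock, read the index, write index + 1, unlock; return the value read."""
    with open(index_path, "r+", encoding="ascii") as f:
        fcntl.flock(f, fcntl.LOCK_EX)
        try:
            current = int(f.read().strip())
            f.seek(0)
            f.write(f"{current + 1}\n")
            f.truncate()
            f.flush()
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)
    return current


def _log(log_path: str, index: int, status: str) -> None:
    """Append one record: host, date, time, shot index, status."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    line = f"{socket.gethostname()} {stamp} {index} {status}\n"
    with open(log_path, "a", encoding="ascii") as f:
        fcntl.flock(f, fcntl.LOCK_EX)  # keep records from different processes intact
        try:
            f.write(line)
            f.flush()
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)


def run_worker(index_path: str, log_path: str, max_index: int,
               process_shot: Callable[[int], None]) -> int:
    """Claim and process shots until the index passes max_index; return the count."""
    done = 0
    while True:
        shot = _claim(index_path)
        if shot > max_index:
            return done              # no data left: this process terminates
        _log(log_path, shot, "start")
        process_shot(shot)           # e.g. migrate the gather with this shot number
        _log(log_path, shot, "end")
        done += 1
```

No shot is ever assigned to a node in advance; a process that crashes simply stops claiming, and the remaining processes absorb the rest of the work.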

5) implementation steps 3 are repeated)~step 4), until total data has been processed into.

As shown in Fig. 6, 50 compute nodes were used to perform pre-stack reverse time migration of 2100 shots of seismic data, nominally 42 shots per node. During seismic data processing, however, no processing task is ever assigned to a specific compute node. Instead, each node in the node list queues to obtain from the index file the seismic data (index number) to be processed. After finishing the current seismic trace gather, if unprocessed data remain (that is, the current index number is less than or equal to the maximum index value), the node rejoins the end of the queue to obtain more seismic data; if no unprocessed data remain, the process terminates. During this processing, if the computing performance of the nodes differs, a low-performance node processes data slowly and naturally obtains fewer seismic gathers from the queue, while a high-performance node processes data quickly and naturally obtains more. This "the more capable do more work" behavior adjusts in real time to the actual load on the compute nodes, neatly achieving automatic multi-node load balancing and the highest overall parallel computing efficiency.

As shown in Fig. 7, the amount of data processed by the 50 nodes is not the nominal 42 shots each: because of differences in computing performance, node 18 completed only 12 shots, node 29 completed 33 shots, node 16 completed 35 shots, and nodes 2, 3, 4, 6 and others completed 44 shots. Fig. 8 shows the time the 50 nodes of Fig. 7 spent completing their data processing. The time taken by each node differs slightly, but all nodes finish their processing tasks at essentially the same time, and no node sits idle waiting for a long period; compared with MPI parallel processing with data divided equally in advance, this improves parallel computing efficiency.

In addition, if nodes need to be added to the current processing task during seismic data processing, there is no need to terminate the processing processes currently running on the nodes: it suffices to create a node list containing only the added nodes and launch this processing job directly as in step 2). If some of the participating nodes need to be removed, their processing processes are simply killed (by replacing the start operation in step 2) with a kill operation). The loss caused by killing processes is the same as that caused by node crashes: only as many trace gathers are missing as there were processes killed.
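To illustrate why the queue balances load, the following Python sketch compares an MPI-style equal pre-split with the shared-index queue in a simplified model: each node has a fixed, hypothetical time per shot, and I/O and lock waits are ignored. It is a simulation for intuition only, not a reproduction of the measurements in Figs. 6 to 8.

```python
import heapq
from typing import List, Tuple


def static_makespan(n_shots: int, secs_per_shot: List[float]) -> float:
    """MPI-style pre-split: shots divided as evenly as possible before the run."""
    base, extra = divmod(n_shots, len(secs_per_shot))
    return max((base + (1 if i < extra else 0)) * t
               for i, t in enumerate(secs_per_shot))


def queue_schedule(n_shots: int, secs_per_shot: List[float]) -> Tuple[List[int], float]:
    """Shared-index queue: the node that becomes free first claims the next shot."""
    free_heap = [(0.0, node) for node in range(len(secs_per_shot))]
    heapq.heapify(free_heap)
    shots_done = [0] * len(secs_per_shot)
    makespan = 0.0
    for _ in range(n_shots):
        free_at, node = heapq.heappop(free_heap)  # earliest-free node gets the lock
        finish = free_at + secs_per_shot[node]
        shots_done[node] += 1
        makespan = max(makespan, finish)
        heapq.heappush(free_heap, (finish, node))
    return shots_done, makespan
```

For example, with per-shot times of 1, 1, 2 and 4 (arbitrary units) and 40 shots, `static_makespan` returns 40.0 because the slowest node must finish its 10 shots, whereas in `queue_schedule` the slowest node simply claims fewer shots and all four nodes finish close together.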

6) Check the parallel job log file and reprocess any missing seismic data.

After the parallel processing job is completed, the log file is checked for the completion status of each shot gather, and data missing because a node crashed or a processing process was killed are identified (if the processing record of a shot has a start time but no end time, the data for that shot are missing). The shot numbers of the missing data are collected into a list, and these data are reprocessed and added to the final results.
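A hedged Python sketch of the check in step 6), applying the "start time but no end time" criterion above. It assumes each log line has the form `host date time index status` with status `start` or `end`; this layout is an assumption, since the actual format is the one in Fig. 5, which is not reproduced here. Function names are hypothetical.

```python
from typing import Iterable, List


def find_interrupted_shots(log_lines: Iterable[str]) -> List[int]:
    """Shots that have a 'start' record but no 'end' record in the job log."""
    started, finished = set(), set()
    for line in log_lines:
        fields = line.split()
        if len(fields) != 5:
            continue  # skip blank or truncated lines
        index, status = fields[3], fields[4]
        if not index.isdigit():
            continue
        if status == "start":
            started.add(int(index))
        elif status == "end":
            finished.add(int(index))
    return sorted(started - finished)


def write_rerun_list(shots: List[int], path: str) -> None:
    """Save the missing shot numbers, one per line, for a reprocessing job."""
    with open(path, "w", encoding="ascii") as f:
        f.writelines(f"{shot}\n" for shot in shots)
```

Because a killed process leaves exactly one unmatched `start` record, the list has one entry per crash or kill, consistent with the loss described for step 5).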

The above embodiments are only intended to illustrate the present invention, and each step may be varied. Improvements and equivalent substitutions of individual steps made according to the principles of the present invention on the basis of its technical solution shall not be excluded from the scope of protection of the present invention.

Claims

1. A file-lock-based parallel seismic processing job scheduling method with automatic load balancing, characterized by comprising the following steps:

1) creating an index file on a shared disk, the index file containing only a starting shot number, which specifies the next seismic trace gather to be migrated;

2) logging in remotely to the compute nodes and submitting the processing job;

3) the processing processes on each node queuing under a file-lock synchronization mechanism to read and modify the index file, and writing the current information of each processing process to a log file;

4) each processing process completing the specified seismic data processing and, on completion, writing its current information to the log file;

5) implementation steps 3 are repeated)~step 4), until total data has been processed into；

2. The method according to claim 1, characterized in that, in step 1), the shared disk is a physical disk that every compute node can access over a high-speed network, and every compute node has read and write permission on the shared index file on that disk.

3. The method according to claim 1 or 2, characterized in that, in step 1), the index file is an ASCII text file whose content is only a starting shot number in integer form; this number is a SHOT, CDP or OFFSET number and must be consistent with the storage order of the seismic data.

4. The method according to claim 1, characterized in that, in step 2), the processing job is submitted by remote login as follows:

2.1) building a node list: the node list is a text file in which each line records the host name of one node participating in the computation;

2.2) writing a script file that uses the Linux remote shell command rsh or ssh to log in, one by one, to the compute nodes in the node list and start the processing of the next step.

5. The method according to claim 1, characterized in that, in step 3), to prevent the index file from being rewritten or read by other nodes while one node is reading or rewriting it, a file lock is applied to the index file whenever a processing process reads or writes it; after locking, the index file operation sequence becomes: a processing process obtains authorization from the operating system -> the file is unlocked for that process -> read/write operation -> the file is locked again -> the processing process returns the authorization to the operating system; the processing processes in the queue repeat this sequence until every process in the queue has completed its operation.

6. The method according to claim 5, characterized in that the index value that a processing process in the queue reads from the index file specifies the seismic trace gather that process is to handle, and the modified index value written back to the index file specifies the next seismic trace gather to be processed; after reading the index value, the processing process writes its current information to the log file, the log record including the host name, date and time, index number and execution status.

7. The method according to claim 1, characterized in that, in step 4), the processing process performs pre-stack reverse time migration on the specified seismic trace gather; pre-stack reverse time migration has a very large computational load but small network overhead, so multi-node parallel processing achieves very high parallel efficiency; after migration of a single shot is completed, the processing process writes its current information to the log file, the log record including the host name, date and time, index number and execution status, for use in monitoring the parallel job.

8. The method according to claim 1, characterized in that, in step 5), no processing task is ever assigned to a specific compute node during seismic data processing; each node in the node list queues to obtain from the index file the index number of the seismic data to be processed; after finishing the current seismic trace gather, if unprocessed data remain, that is, the current index number is less than or equal to the maximum index value, the node rejoins the end of the queue to obtain more seismic data, and if no unprocessed data remain, the process terminates.

9. The method according to claim 8, characterized in that, during seismic data processing, if the computing performance of the nodes differs, a low-performance node processes data slowly and obtains fewer seismic gathers from the queue, while a high-performance node processes data quickly and obtains more; this "the more capable do more work" behavior adjusts in real time to the actual load on the compute nodes, achieving automatic multi-node load balancing and the highest overall parallel computing efficiency.

10. The method according to claim 1, characterized in that, in step 6), after the parallel processing job is completed, the log file is checked for the completion status of each trace gather, and data missing because a node crashed or a processing process was killed are identified; the index numbers of the missing data are collected into a list, and these data are reprocessed and added to the final results.