Detailed Description
Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings. The embodiments described below by referring to the drawings are illustrative and intended to explain the present application and should not be construed as limiting the application.
The terms "first," "second," and the like in the description, the claims, and the drawings of embodiments of the application are used to distinguish between similar objects and not necessarily to describe a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate, such that the embodiments of the application described herein may be implemented, for example, in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, system, article, or apparatus.
In the prior art, data stored in one server cluster cannot be called directly from another server cluster. If a computing task processed by any one of a plurality of server clusters depends on data stored in other server clusters, that data must first be migrated to the current server cluster so that the current server cluster can process the computing task normally. However, when the source data in the other server clusters subsequently changes, the migrated copy in the current server cluster does not, so the migrated copy becomes inconsistent with the changed source data.
In the prior art, consistency between the migrated copy in the current server cluster and the latest source data in the other server clusters is generally maintained in one of two modes. In the first mode, all of the source data in the other server clusters is migrated to the current server cluster again before each computing task is processed. In the second mode, a third-party tool is used before each computing task is processed to migrate again only that part of the source data that is inconsistent with the migrated copy in the current server cluster.
When consistency between the source data and its migrated copy is achieved in the first mode, data that has not changed is also migrated to the current server cluster again, which wastes migration resources and consumes a large amount of migration time. The second mode saves migration time compared with the first mode but requires more verification time. In both modes, the current server cluster is therefore inefficient at processing computing tasks that depend on data in other server clusters. In addition, in both modes only the latest version of the migrated data is stored in the target server cluster: when the data changes, the latest migrated data overwrites the historical migrated data from before the change, so the historical migrated data cannot be reused.
The technical solution of the present application, and how it solves the above technical problems, are described in detail below with specific embodiments. The following embodiments may be combined with each other, and the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.
Fig. 1 is a schematic structural diagram of a computing task execution system according to an exemplary embodiment of the present application. The system includes a target server cluster and a plurality of other server clusters. The plurality of other server clusters includes an original server cluster, namely the other server cluster 1. The number of other server clusters is one or more; for example, the plurality of other server clusters shown in Fig. 1 further includes the other server cluster 2 and the other server cluster n.
The target server cluster is configured to: acquire file information of a data file in the original server cluster on which a computing task to be executed depends, where the file information includes a first file number and a first latest modification time, and the original server cluster is any server cluster other than the target server cluster among the plurality of server clusters; determine, based on the file information, whether the data file is consistent with the historical data file acquired previously; if so, execute the computing task; and if not, acquire the data file and execute the computing task based on the data file. Each server cluster in the plurality of other server clusters stores data files on which computing tasks executed by the target server cluster depend, and is configured to provide those data files to the target server cluster.
For the execution principle and the interaction process of the components in this system embodiment, such as the target server cluster and the original server cluster among the plurality of other server clusters, reference may be made to the following description of the method embodiments.
Fig. 2 is a flowchart of a method for executing a computing task according to an exemplary embodiment of the present application, where an execution subject of the method may be a target server cluster of a plurality of server clusters, and the method at least includes the following steps S101 to S102:
S101, acquiring file information of a data file in an original server cluster on which a computing task to be executed depends, wherein the file information comprises a first file number and a first latest modification time, and the original server cluster is any server cluster other than the target server cluster among the plurality of server clusters.
Optionally, the number of servers in any of the plurality of server clusters may be one or more.
Optionally, a computing task refers to a series of operations or instructions that need to be processed by a server.
Optionally, the data file refers to a file storing data. Data in a server cluster is stored in file form, where the data file is a set of multiple sub-data files and a sub-data file is the minimum file unit for storing data.
By way of example, a data file may be understood as a folder, and a sub-data file as a file in that folder, each such file having data stored therein.
Optionally, the data file includes a plurality of sub-data files. In S101, acquiring the file information of the data file in the original server cluster on which the computing task to be executed depends, where the file information includes the first file number and the first latest modification time, includes the following steps S0001-S0003:
S0001, acquiring a plurality of pieces of metadata information corresponding to the plurality of sub-data files, counting the pieces of metadata information, and taking that count as the first file number of the data file, wherein the pieces of metadata information correspond to the sub-data files one to one.
S0002, acquiring the corresponding sub-file modification times from the plurality of pieces of metadata information, and taking the maximum of these sub-file modification times as the first latest modification time.
Optionally, the first latest modification time may be a Unix timestamp, i.e., a 10-digit integer. For details of the Unix timestamp, reference may be made to the prior art; they are not repeated here.
S0003, composing, based on the first file number and the first latest modification time, the file information of the data file in the original server cluster on which the computing task to be executed depends.
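Steps S0001-S0003 can be illustrated with a minimal Python sketch. The names `SubFileMetadata`, `FileInfo`, and `build_file_info` are hypothetical and do not appear in the application, and the sketch assumes the metadata records have already been retrieved from the original server cluster and that modification times are Unix timestamps in seconds.

```python
from dataclasses import dataclass
from typing import Iterable, NamedTuple


@dataclass(frozen=True)
class SubFileMetadata:
    """Hypothetical metadata record, one per sub-data file."""
    path: str
    modification_time: int  # assumed to be a Unix timestamp in seconds


class FileInfo(NamedTuple):
    """File information of a data file: count plus latest modification time."""
    file_count: int    # the "first file number"
    latest_mtime: int  # the "first latest modification time"


def build_file_info(metadata: Iterable[SubFileMetadata]) -> FileInfo:
    records = list(metadata)
    # S0001: one metadata record per sub-data file, so the record count
    # is the file number.
    file_count = len(records)
    # S0002: the latest modification time is the maximum over all sub-files
    # (0 is an arbitrary placeholder chosen here for an empty data file).
    latest_mtime = max((m.modification_time for m in records), default=0)
    # S0003: compose the file information from the two values.
    return FileInfo(file_count, latest_mtime)


records = [
    SubFileMetadata("part-0", 1700000000),
    SubFileMetadata("part-1", 1700000500),
    SubFileMetadata("part-2", 1700000100),
]
info = build_file_info(records)
```

In this sketch only metadata is read; the content of the sub-data files is never transferred, which is what keeps the check inexpensive.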
Optionally, the metadata information refers to feature information describing the sub-data file, such as access rights to the sub-data file, file address information, file modification time, and the like.
Optionally, the file information refers to file characteristic information and file number information describing the data file, wherein the file characteristic information includes a latest file modification time for the data file.
Optionally, the target server cluster is used for processing the computing task to be executed, and the original server cluster stores data required for processing the computing task to be executed.
S102, determining, based on the file information, whether the data file is consistent with the historical data file acquired previously; if so, executing the computing task; if not, acquiring the data file and executing the computing task based on the data file.
It should be understood that when the data file is inconsistent with the previously acquired historical data file, the data file in the original server cluster has changed. The latest data file therefore needs to be acquired from the original server cluster, and the computing task is executed based on that latest data file, which ensures that the computing task is executed correctly.
Optionally, the historical data file refers to a version of the data file that the target server cluster obtained from the original server cluster during an earlier execution of the computing task. Different versions arise because the data file in the original server cluster on which the computing task depends may be partially modified between executions, so that across multiple executions the target server cluster obtains a plurality of data files that differ from one another in part.
Optionally, when the change to the data file involves a sub-data file whose file modification time is earlier than the first latest modification time of the data file, for example when such a sub-data file is deleted, the first latest modification time is unchanged but the number of files changes. Judging whether the data file has been modified by the first latest modification time alone is therefore not accurate enough. Accordingly, when the first file number is consistent with the file number of the historical data file and the first latest modification time is consistent with the second latest modification time of the historical data file, the data file is considered consistent with the historical data file acquired last time.
In summary, judging both the file number of the data file and its latest modification time covers the cases in which the data file is modified by deleting a sub-data file or by modifying the content of a sub-data file, so that whether the data file has been modified can be determined accurately.
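The two-part comparison described above can be sketched as follows. This is an illustrative Python example only; `FileInfo` and `is_consistent` are hypothetical names, and the timestamps are made-up values. It shows why the file number is checked in addition to the latest modification time: deleting an older sub-data file leaves the maximum modification time unchanged.

```python
from typing import NamedTuple


class FileInfo(NamedTuple):
    file_count: int    # file number of the data file
    latest_mtime: int  # latest modification time (Unix seconds, assumed)


def is_consistent(current: FileInfo, historical: FileInfo) -> bool:
    """Consistent when both the file number and the latest time match."""
    return (current.file_count == historical.file_count
            and current.latest_mtime == historical.latest_mtime)


historical = FileInfo(file_count=3, latest_mtime=1700000500)

# An older sub-data file was deleted: the latest time is the same,
# but the file number differs.
after_delete = FileInfo(file_count=2, latest_mtime=1700000500)

# The content of a sub-data file was modified: the file number is the same,
# but the latest time moved forward.
after_edit = FileInfo(file_count=3, latest_mtime=1700000900)

unchanged = is_consistent(historical, historical)
delete_detected = not is_consistent(after_delete, historical)
edit_detected = not is_consistent(after_edit, historical)
```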
Optionally, in order to accurately determine whether the data file is consistent with the history data file acquired previously, the method further includes the following steps S1021-S1023:
S1021, determining a first character string for indicating the first file number and the first latest modification time based on the first file number and the first latest modification time.
Optionally, in the foregoing S1021, determining the first character string for indicating the first file number and the first latest modification time based on the first file number and the first latest modification time includes the following steps S01-S02:
S01, splicing the first file number and the first latest modification time as character strings to obtain a third character string.
Optionally, in the foregoing S01, splicing the first file number and the first latest modification time as character strings to obtain the third character string includes the following steps S0101-S0103:
S0101, converting the first file number into a fourth character string of the character string type.
S0102, converting the first latest modification time into a fifth character string of the character string type.
S0103, splicing the fourth character string and the fifth character string to obtain the third character string.
S02, calculating the third character string through a preset function to obtain the first character string used for indicating the first file number and the first latest modification time.
Optionally, the preset function in S02 is a hash function. The hash function comprises any one of SHA-1 function, MD5 function, SHA-2 function and SHA-3 function.
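Steps S01-S02 can be sketched in Python with the standard `hashlib` module. The function name `first_string` and the choice of SHA-256 (a member of the SHA-2 family) as the default are assumptions made for illustration; the application only requires that the preset function be a hash function.

```python
import hashlib


def first_string(file_count: int, latest_mtime: int,
                 algorithm: str = "sha256") -> str:
    """Derive the version string from the file number and latest time."""
    fourth = str(file_count)    # S0101: file number as a character string
    fifth = str(latest_mtime)   # S0102: latest modification time as a string
    third = fourth + fifth      # S0103: splice the two strings
    # S02: apply the preset (hash) function to the spliced string.
    return hashlib.new(algorithm, third.encode("utf-8")).hexdigest()


digest = first_string(3, 1700000500)
digest_after_delete = first_string(2, 1700000500)
```

Because the text describes the timestamp as a 10-digit integer, plain concatenation stays unambiguous here; if the width of either value could vary, inserting a separator between the two strings would be one way to keep distinct inputs from producing the same spliced string.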
It should be understood that, to facilitate understanding of how the first character string is generated, reference may be made to Fig. 3, which is a schematic flow chart of generating the first character string according to an exemplary embodiment of the present application. The flow includes S301-S305:
S301, acquiring metadata information corresponding to each of the plurality of sub-data files in the data file.
S302, acquiring a plurality of file modification times from the plurality of pieces of metadata information, and acquiring the number of pieces of metadata information.
S303, taking the maximum of the file modification times as the first latest modification time, and the number of pieces of metadata information as the first file number.
S304, splicing the first latest modification time of the character string type and the first file number of the character string type to obtain a third character string.
S305, carrying out hash function processing on the third character string to obtain a first character string.
S1022, if the first character string is the same as the second character string corresponding to the historical data file, the data file is considered consistent with the historical data file acquired previously.
S1023, if the first character string is different from the second character string corresponding to the historical data file, the data file is considered inconsistent with the historical data file acquired previously.
Optionally, in order to distinguish the data files with different versions, the method further comprises creating a corresponding relation between the first character string and the data files, and storing the corresponding relation.
Optionally, different versions of the data file refer to the versions produced each time the data file has changed since the computing task was first executed. For example, if the data file has not changed between the first execution of the computing task and the most recent execution, only one version of the data file exists in the target server cluster; if the data file has changed three times over that period, three versions of the data file (historical data files) exist in the target server cluster.
Optionally, the first character string may be used as a version identifier of the data file, so that the data file of a desired version can be obtained directly through the first character string.
In summary, using the first character string as a version number to identify the data file allows multiple versions of the data file to be stored in the target server cluster. This avoids the situation in which only the latest data file is retained and the historical data file is lost through direct overwriting and therefore cannot be reused.
It should be understood that, to facilitate understanding of how the target server cluster stores data files of different versions, reference may be made to Fig. 4, which is a schematic flow chart of determining how to execute a computing task based on the first character string according to an exemplary embodiment of the present application. First, it is determined whether the target server cluster already stores the obtained first character string. If so, the data file on which the computing task depends has not changed in the original server cluster, and the computing task is executed directly based on the data file corresponding to the first character string stored in the target server cluster, where that data file is the historical data file stored in the target server cluster (the latest version of the data file stored there). If not, the data file on which the computing task depends has changed in the original server cluster: the latest data file is obtained from the original server cluster, a correspondence between the first character string and the data file is constructed and stored, and the first character string serves as the version number of the data file. Finally, the computing task is executed based on the data file.
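The flow of Fig. 4 can be sketched as a small in-memory version store. This Python example is illustrative only: `VersionStore`, `resolve`, `run_task`, and the `fetch` callback are hypothetical names, the data file is represented as raw bytes, and a real target server cluster would persist the correspondence rather than keep it in a dictionary.

```python
from typing import Callable, Dict, TypeVar

T = TypeVar("T")


class VersionStore:
    """Keeps every migrated version of a data file, keyed by first string."""

    def __init__(self) -> None:
        self._versions: Dict[str, bytes] = {}

    def resolve(self, first_string: str, fetch: Callable[[], bytes]) -> bytes:
        # If the first string is not stored, the data file has changed:
        # fetch the latest data file and record the correspondence.
        if first_string not in self._versions:
            self._versions[first_string] = fetch()
        # Otherwise the stored version is reused and nothing is migrated.
        return self._versions[first_string]

    def version_count(self) -> int:
        return len(self._versions)


def run_task(store: VersionStore, first_string: str,
             fetch: Callable[[], bytes], task: Callable[[bytes], T]) -> T:
    """Execute the computing task on the version named by first_string."""
    return task(store.resolve(first_string, fetch))


fetch_calls = []


def fetch_from_original_cluster() -> bytes:
    # Stand-in for migrating the data file from the original server cluster.
    fetch_calls.append(1)
    return b"payload-v1"


store = VersionStore()
first_result = run_task(store, "abc", fetch_from_original_cluster, len)
second_result = run_task(store, "abc", fetch_from_original_cluster, len)
```

In this sketch the second call with the same first character string reuses the stored version, and earlier versions remain available under their own keys instead of being overwritten.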
Optionally, in the foregoing step S102, the step of obtaining the data file includes the following steps S001-S002:
S001, acquiring data file address information corresponding to the computing task.
S002, acquiring the data file from the location indicated by the data file address information.
In summary, the scheme acquires file information of the data file in the original server cluster on which the computing task to be executed depends, where the file information includes a first file number and a first latest modification time and the original server cluster is any server cluster other than the target server cluster among the plurality of server clusters; determines, based on the file information, whether the data file is consistent with the historical data file acquired previously; executes the computing task if so; and, if not, acquires the data file and executes the computing task based on it. Because the file information is used to determine whether the data file in the original server cluster is consistent with the data file in the target server cluster, only inconsistent data files are migrated and consistent data is not. This avoids unnecessary migration time and additional verification time, improves the efficiency with which the target server cluster processes the computing task, and improves the efficiency of data preparation before the computing task is executed. Whether the data set has changed is determined from the first file number and the first latest modification time, and only changed data files are migrated, so that the target server cluster relies on the latest data file in the original server cluster when executing the computing task while saving the processing resources that would otherwise be spent on unnecessary data migration and on data comparison.
In addition, the first character string is used as a version identifier, the data files which are changed each time are migrated to the target server cluster, and the data files migrated each time are identified by the first character string, so that a plurality of versions of data files are stored in the target server cluster.
Fig. 5 is a schematic structural diagram of a computing task execution device according to an exemplary embodiment of the present application, where the device is applicable to a target server cluster in a plurality of server clusters, and the device includes:
An obtaining unit 51, configured to obtain file information of a data file in an original server cluster on which a computing task to be executed depends, where the file information includes a first file number and a first latest modification time, and the original server cluster is any server cluster other than the target server cluster among the plurality of server clusters;
and a determining unit 52, configured to determine, based on the file information, whether the data file is consistent with the historical data file acquired previously; if so, execute the computing task; and if not, acquire the data file and execute the computing task based on the data file.
Optionally, the device is further configured to consider the data file consistent with the historical data file acquired last time when the first file number is consistent with the file number of the historical data file and the first latest modification time is consistent with the second latest modification time of the historical data file.
Optionally, the device is further configured to determine, based on the first file number and the first latest modification time, a first character string for indicating the first file number and the first latest modification time, and, if the first character string is identical to a second character string corresponding to the historical data file, to consider the data file consistent with the historical data file acquired previously.
Optionally, when determining the first character string based on the first file number and the first latest modification time, the device is specifically configured to splice the first file number and the first latest modification time as character strings to obtain a third character string, and to calculate the third character string through a preset function to obtain the first character string for indicating the first file number and the first latest modification time.
Optionally, the device is further configured to create a correspondence between the first character string and the data file and to store the correspondence.
Optionally, when acquiring the data file, the device is specifically configured to acquire the data file address information corresponding to the computing task and to acquire the data file from the location indicated by the data file address information.
Optionally, in the device, the preset function is a hash function.
It should be understood that apparatus embodiments and method embodiments may correspond with each other and that similar descriptions may refer to the method embodiments. To avoid repetition, no further description is provided here. Specifically, the apparatus may perform the above method embodiments, and the foregoing and other operations and/or functions of each module in the apparatus are respectively for corresponding flows in each method in the above method embodiments, which are not described herein for brevity.
The apparatus of the embodiments of the present application is described above in terms of functional modules with reference to the accompanying drawings. It should be understood that a functional module may be implemented in hardware, by instructions in software, or by a combination of hardware and software modules. Specifically, each step of the method embodiments of the present application may be completed by an integrated logic circuit of hardware in a processor and/or by instructions in software form, and the steps of the method disclosed in connection with the embodiments of the present application may be directly executed by a hardware decoding processor or by a combination of hardware and software modules in the decoding processor. Optionally, the software modules may be located in a storage medium well established in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, registers, and the like. The storage medium is located in a memory, and the processor reads information in the memory and, in combination with its hardware, performs the steps in the above method embodiments.
Fig. 6 is a schematic block diagram of an electronic device provided by an embodiment of the present application, which may include:
a memory 601 and a processor 602, the memory 601 being adapted to store a computer program and to transfer the program code to the processor 602. In other words, the processor 602 may call and run a computer program from the memory 601 to implement the method in the embodiment of the present application.
For example, the processor 602 may be used to perform the method embodiments described above in accordance with instructions in the computer program.
In some embodiments of the application, the processor 602 may include, but is not limited to:
a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, discrete hardware components, or the like.
In some embodiments of the application, the memory 601 includes, but is not limited to:
Volatile memory and/or non-volatile memory. The non-volatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), which acts as an external cache. By way of example and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct Rambus RAM (DR RAM).
In some embodiments of the application, the computer program may be split into one or more modules that are stored in the memory 601 and executed by the processor 602 to perform the methods provided by the application. The one or more modules may be a series of computer program instruction segments capable of performing the specified functions, which are used to describe the execution of the computer program in the electronic device.
As shown in fig. 6, the electronic device may further include:
a transceiver 603, the transceiver 603 being connectable to the processor 602 or the memory 601.
The processor 602 may control the transceiver 603 to communicate with other devices, and in particular, may send information or data to other devices or receive information or data sent by other devices. The transceiver 603 may include a transmitter and a receiver. The transceiver 603 may further include antennas, the number of which may be one or more.
It will be appreciated that the various components in the electronic device are connected by a bus system that includes, in addition to a data bus, a power bus, a control bus, and a status signal bus.
The present application also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a computer, enables the computer to perform the method of the above-described method embodiments. Alternatively, embodiments of the present application also provide a computer program product comprising instructions which, when executed by a computer, cause the computer to perform the method of the method embodiments described above.
When implemented in software, the above embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions according to embodiments of the application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device, such as a server or data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a digital video disc (DVD)), or a semiconductor medium (e.g., a solid state drive (SSD)), or the like.
Those of ordinary skill in the art will appreciate that the various illustrative modules and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the several embodiments provided by the present application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative: the division into modules is merely a division by logical function, and other divisions are possible in an actual implementation; for instance, multiple modules or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the coupling, direct coupling, or communication connection shown or discussed between components may be an indirect coupling or communication connection via some interfaces, devices, or modules, and may be in electrical, mechanical, or other forms.
The modules illustrated as separate components may or may not be physically separate, and components shown as modules may or may not be physical modules, i.e., may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. For example, functional modules in various embodiments of the present application may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module.
The foregoing describes merely specific embodiments of the present application, and the protection scope of the present application is not limited thereto. Any variation or substitution that a person skilled in the art can readily conceive of within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.