CN113495894B - Data synchronization method, device, equipment and storage medium

Info

Publication number: CN113495894B
Application number: CN202010251108.3A
Authority: CN
Inventors: 齐彩会; 刘江波
Original assignee: Beijing Jingdong Zhenshi Information Technology Co Ltd
Current assignee: Beijing Jingdong Zhenshi Information Technology Co Ltd
Priority date: 2020-04-01
Filing date: 2020-04-01
Publication date: 2024-07-16
Anticipated expiration: 2040-04-01
Also published as: CN113495894A

Abstract

The present disclosure provides a data synchronization method, apparatus, device, and storage medium. The method comprises: receiving a change log from an SQL database, wherein the change log comprises change data information; determining the data operation type recorded in the change log; and, when the data operation type is an update operation, querying the updated data file corresponding to a primary key of the change log, querying the original data file corresponding to the primary key in a Hadoop Distributed File System (HDFS), and replacing the original data file with the updated data file in the HDFS based on an Hbase write operation. The method can improve the availability and stability of the database.

Description

Data synchronization method, device, equipment and storage medium

Technical Field

The disclosure relates to the field of computer technology, and in particular, to a data synchronization method, a device, equipment and a storage medium.

Background

In the related art, database data in SQL Server is synchronized by cyclically scanning the database log and, on each pass, obtaining a certain number of change logs for the data tables in the database on the primary server. When the operation type of a change log is a modification operation, the change log records only the hexadecimal string that changed relative to the original data. To process the modification, the primary key and the changed data table must first be parsed, and all field data in the data table must then be queried using the primary key to obtain the changed data information. The parsed data is then distributed to other nodes of the computer network to complete the data update.

When this method handles a modification operation, the data table in the SQL database must be accessed again to obtain the changed data information, which imposes a significant performance cost on the database and affects the availability and stability of the database system.

The related art also uses a zipper table to store data, so the data cannot be synchronized and updated in real time.

It should be noted that the information disclosed in the above background section is only for enhancing understanding of the background of the present disclosure and thus may include information that does not constitute prior art known to those of ordinary skill in the art.

Disclosure of Invention

The disclosure aims to provide a data synchronization method, a device, equipment and a storage medium, which can improve the availability and stability of a database.

Other features and advantages of the present disclosure will be apparent from the following detailed description, or may be learned in part by the practice of the disclosure.

According to one aspect of the present disclosure, there is provided a data synchronization method including: receiving a change log from an SQL database, wherein the change log comprises change data information; determining the data operation type recorded in the change log; and, when the data operation type is an update operation, querying the updated data file corresponding to a primary key of the change log, querying the original data file corresponding to the primary key in a Hadoop Distributed File System (HDFS), and replacing the original data file with the updated data file in the HDFS based on an Hbase write operation.

In some embodiments, the above method further comprises: when the data operation type is an add operation, storing the primary key of the change log and its corresponding data file in the HDFS based on an Hbase write operation.

In some embodiments, the above method further comprises: when the data operation type is a delete operation, marking the data corresponding to the primary key of the change log as deleted in the HDFS.

In some embodiments, the above method further comprises: reading the data file from the HDFS based on a Hive read operation.

In some embodiments, the data file stored in the HDFS is a hexadecimal data file.

In some embodiments, reading a data file from the HDFS based on a Hive read operation includes: converting the data file into decimal data through a preconfigured parser based on the Hive read operation, and reading out the converted data file.

In some embodiments, receiving a change log from an SQL database includes: receiving change logs that are periodically acquired from the SQL database.

According to another aspect of the present disclosure, there is provided a data synchronization apparatus including: a log receiving module configured to receive a change log from an SQL database, wherein the change log comprises change data information; a type determining module configured to determine the data operation type recorded in the change log; and a file replacement module configured to, when the data operation type is an update operation, query the updated data file corresponding to a primary key of the change log, query the original data file corresponding to the primary key in a Hadoop Distributed File System (HDFS), and replace the original data file with the updated data file in the HDFS based on an Hbase write operation.

According to yet another aspect of the present disclosure, there is provided a computer apparatus comprising: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to perform any of the methods described above via execution of the executable instructions.

According to yet another aspect of the present disclosure, there is provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements any of the methods described above.

According to the data synchronization method provided by the embodiments of the disclosure, when the data operation type of the received change log is an update operation, the updated data file corresponding to the primary key of the change log is queried, the original data file corresponding to the primary key is queried in the HDFS, and the original data file is replaced with the updated data file in the HDFS based on an Hbase write operation. Because the data files are stored in the HDFS, data synchronization does not need to handle parsing, and the data table in the SQL database does not need to be queried again, so the performance of the SQL database system is not affected.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure. It will be apparent to those of ordinary skill in the art that the drawings in the following description are merely examples of the disclosure and that other drawings may be derived from them without undue effort.

Fig. 1 is a flow chart illustrating a method of data synchronization according to an exemplary embodiment.

Fig. 2 is an architecture diagram of a data synchronization system, according to an example embodiment.

Fig. 3 is a flow chart illustrating interaction in a data synchronization system according to an exemplary embodiment.

Fig. 4 is a flow chart illustrating another data synchronization method according to an exemplary embodiment.

Fig. 5 is a flow chart illustrating another data synchronization method according to an exemplary embodiment.

Fig. 6 is a flow chart illustrating another data synchronization method according to an exemplary embodiment.

Fig. 7 is a block diagram illustrating a data synchronization device according to an example embodiment.

Fig. 8 is a schematic diagram of a computer system according to an exemplary embodiment.

Detailed Description

Example embodiments will now be described more fully with reference to the accompanying drawings. However, the exemplary embodiments may be embodied in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.

Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus a repetitive description thereof will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in software or in one or more hardware modules or integrated circuits or in different networks and/or processor devices and/or microcontroller devices.

In the description of the present disclosure, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more such features.

The steps of the data synchronization method in the exemplary embodiments of the present disclosure will be described in more detail with reference to the accompanying drawings and embodiments.

Fig. 1 is a flow chart illustrating a method of data synchronization according to an exemplary embodiment. The method provided by the embodiments of the present disclosure may be performed by a client equipped with Hbase. Hbase is a distributed, column-oriented, open source database.

As shown in fig. 1, the data synchronization method 10 includes:

In step S102, a change log in an SQL (Structured Query Language) database is received.

Wherein the change log includes change data information.

The system shown in fig. 3 may include Manager and Agent nodes shown in fig. 2.

As shown in figs. 2 and 3, the Manager may be a server on which a database manager is installed, and the Agent may be an Hbase client.

The Manager may receive the change log file in the SQL server and distribute the change log file to the Agent node of the computer network that hosts the Hbase client.

The change log in the database may be obtained, for example, with the fn_dblog function built into SQL Server, which parses the operation log. The fn_dblog function may return the SQL Server log in the form of data table records.

Interaction between the Manager and the SQL database may be achieved through JDBC (Java Database Connectivity). JDBC is an application programming interface in the Java programming language that specifies how client programs access a database, and it provides methods for querying and updating data in the database.

A Manager may correspond to one or more Agent nodes.
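The fan-out from one Manager to several Agent nodes might be sketched as follows. The routing rule (hashing the primary key so that every change to a given row reaches the same Agent) is an assumption made for illustration; the disclosure does not state how records are assigned to Agents. `ChangeRecord`, `route`, and `distribute` are hypothetical names.

```python
import zlib
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeRecord:
    """One parsed change-log entry (hypothetical shape)."""
    primary_key: str
    operation: str   # "add", "update", or "delete"
    hex_data: str    # changed data as a hexadecimal string


def route(record: ChangeRecord, agent_count: int) -> int:
    """Pick an Agent index; a stable hash keeps one row's changes on one Agent."""
    return zlib.crc32(record.primary_key.encode("utf-8")) % agent_count


def distribute(records, agent_count):
    """Group records by target Agent, preserving log order within each group."""
    batches = defaultdict(list)
    for record in records:
        batches[route(record, agent_count)].append(record)
    return dict(batches)
```

Routing by primary key is one possible choice; it keeps the add, update, and delete entries for a single row in their original order on a single Agent.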

The system may cyclically scan the log file of the SQL database to obtain the change log in the SQL database.

Alternatively, the system may acquire the change log in the SQL database at preset time intervals.
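Acquisition at preset intervals might look like the loop below. This is a minimal sketch: `fetch_since` stands in for whatever call returns log rows newer than a given position, and the `position` field is a hypothetical monotonically increasing marker, neither of which is defined in the disclosure. The `sleep` parameter is injectable so the loop can be exercised without waiting.

```python
import time
from typing import Callable, List


def poll_change_log(
    fetch_since: Callable[[int], List[dict]],
    handle: Callable[[dict], None],
    interval_seconds: float,
    max_polls: int,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Fetch new log rows every `interval_seconds`; return the last position seen."""
    last_position = 0
    for _ in range(max_polls):
        # Only rows after the last processed position are requested,
        # so each change is handled once.
        for row in fetch_since(last_position):
            handle(row)
            last_position = max(last_position, row["position"])
        sleep(interval_seconds)
    return last_position
```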

In step S104, the data operation type of the change log record is determined.

The data operation type of the change log record may be determined, for example, at an Agent node of the Hbase client.

The data operation types of the change log record may include, for example: update operation, add operation, or delete operation.
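Determining the operation type can be sketched as a lookup on the log row. The sketch assumes each row carries an `Operation` field with SQL Server log operation names such as `LOP_INSERT_ROWS`, `LOP_MODIFY_ROW`, and `LOP_DELETE_ROWS`; the disclosure does not list these names, so the mapping is illustrative only.

```python
from enum import Enum


class OperationType(Enum):
    ADD = "add"
    UPDATE = "update"
    DELETE = "delete"


# Assumed mapping from log operation names to the three types in the text.
_OPERATION_MAP = {
    "LOP_INSERT_ROWS": OperationType.ADD,
    "LOP_MODIFY_ROW": OperationType.UPDATE,
    "LOP_MODIFY_COLUMNS": OperationType.UPDATE,
    "LOP_DELETE_ROWS": OperationType.DELETE,
}


def classify(log_row: dict):
    """Return the OperationType for a log row, or None if it is not a data change."""
    return _OPERATION_MAP.get(log_row.get("Operation"))
```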

When the data operation type of the change log record is an update operation, step S106 is performed.

In step S106, when the data operation type is an update operation, the updated data file corresponding to the primary key of the change log is queried, the original data file corresponding to the primary key is queried in the Hadoop Distributed File System (HDFS), and the original data file is replaced with the updated data file in the HDFS based on the Hbase write operation.

As shown in fig. 3, the Hbase target data table may be stored in the HDFS.

When the data operation type of the change log record is an update operation, the updated data file corresponding to the primary key (Rowkey) of the change log is queried; the data file may be, for example, the data field corresponding to the primary key. Based on the primary key, the original data file corresponding to the primary key is queried in the Hbase target data table in the HDFS, and the original data file is replaced with the updated data file in the HDFS based on the Hbase write operation.

Replacing the original data file with the updated data file may mean, for example, replacing the original data file directly with the whole updated data file, or replacing only the corresponding portion of the original data file with the changed portion of the updated data file.

The data file may be, for example, a hexadecimal file.

For example, the changed hexadecimal file may be recorded in the change log. When the data operation type recorded in the change log is an update operation, the changed hexadecimal file can be used to update the corresponding part of the original data file in the HDFS directly. The data table in the SQL database does not need to be queried again, which reduces IO (Input/Output) operations on the SQL database and avoids affecting the performance of the SQL database system.
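A partial replacement of this kind might be sketched as below, with a plain dictionary standing in for the Hbase target data table. The `byte_offset` parameter is an assumption: the disclosure says the changed hexadecimal data replaces the corresponding part of the original, but does not say how that part is located.

```python
def apply_update(table: dict, row_key: str, changed_hex: str, byte_offset: int = 0) -> str:
    """Overwrite part of the stored hex data for `row_key` with `changed_hex`."""
    row = table[row_key]             # look up the original data by primary key
    original = row["data"]
    start = byte_offset * 2          # two hex characters per byte
    end = start + len(changed_hex)
    if end > len(original):
        raise ValueError("changed fragment extends past the stored data")
    # Only the changed span is rewritten; the source database is never re-queried.
    row["data"] = original[:start] + changed_hex + original[end:]
    return row["data"]
```

Passing the whole updated record with `byte_offset=0` covers the direct-replacement case when the record length is unchanged.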

Hbase write operations support batch writing of data, which improves data writing efficiency.
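The idea of batch writing can be illustrated with a small buffer that groups row writes before sending them on. This is an in-memory stand-in, not the Hbase client API; `BatchWriter`, `batch_size`, and the row layout are hypothetical.

```python
class BatchWriter:
    """Collect row writes and flush them to the backing table in groups."""

    def __init__(self, table: dict, batch_size: int):
        if batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        self._table = table
        self._batch_size = batch_size
        self._pending = []
        self.flush_count = 0

    def put(self, row_key: str, hex_data: str) -> None:
        """Queue one row; flush automatically when the batch is full."""
        self._pending.append((row_key, {"data": hex_data, "isdelete": False}))
        if len(self._pending) >= self._batch_size:
            self.flush()

    def flush(self) -> None:
        """Write all queued rows at once."""
        if not self._pending:
            return
        # A real client would send one request per batch; here it is one dict update.
        self._table.update(self._pending)
        self._pending.clear()
        self.flush_count += 1
```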

Storing the data in the HDFS also allows the data to be synchronized and updated in real time.

According to the data synchronization method provided by the embodiments of the disclosure, when the data operation type of the received change log is an update operation, the updated data file corresponding to the primary key of the change log is queried, the original data file corresponding to the primary key is queried in the HDFS, and the original data file is replaced with the updated data file in the HDFS based on an Hbase write operation. Because the data files are stored in the HDFS, data synchronization does not need to handle parsing, and the data table in the SQL database does not need to be queried again, so the performance of the SQL database system is not affected and the availability and stability of the database system are improved.

After step S104 of the data synchronization method 10 shown in fig. 1, the data synchronization method 20 shown in fig. 4 further includes:

In step S108, when the data operation type is an add operation, the primary key of the change log and its corresponding data file are stored in the HDFS based on the Hbase write operation.

When the data operation type is an add operation, based on the Hbase write operation, the primary key field of the change log is inserted into the primary key field of the Hbase target data table, and the data file corresponding to the primary key of the change log is inserted into the data field of the Hbase target data table.

The Hbase target data table is stored in the HDFS.

The data file may be, for example, hexadecimal data information, or trimmed hexadecimal data information. Trimming the hexadecimal data information can remove useless information such as a header.
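The add operation with optional trimming might be sketched as follows, again with a dictionary standing in for the Hbase target data table. The fixed four-byte header length is purely an assumption for illustration; the disclosure does not describe the record layout.

```python
HEADER_BYTES = 4  # assumed header length; the real layout is not given in the text


def strip_header(hex_record: str, header_bytes: int = HEADER_BYTES) -> str:
    """Drop the leading header bytes from a hexadecimal record."""
    return hex_record[header_bytes * 2:]   # two hex characters per byte


def apply_add(table: dict, row_key: str, hex_record: str, trim: bool = True) -> None:
    """Insert the primary key and its data into the target table."""
    data = strip_header(hex_record) if trim else hex_record
    # The primary key goes into the key field, the payload into the data field.
    table[row_key] = {"data": data, "isdelete": False}
```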

After step S104 of the data synchronization method 10 shown in fig. 1, the data synchronization method 30 shown in fig. 5 further includes:

In step S110, when the data operation type is a delete operation, the data corresponding to the primary key of the change log is marked as deleted in the HDFS.

When the data operation type is a delete operation, the data corresponding to the primary key of the change log is marked as deleted in the Hbase target data table of the HDFS. For example, the logical delete field isdelete corresponding to the primary key may be set to 1 or true.
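Logical deletion of this kind can be sketched as setting a flag rather than removing the row. The `isdelete` field name follows the text; the dictionary-based table and the `live_rows` helper are hypothetical.

```python
def apply_delete(table: dict, row_key: str) -> bool:
    """Mark the row as deleted; return False if the primary key is unknown."""
    row = table.get(row_key)
    if row is None:
        return False
    row["isdelete"] = True   # the data stays in place, only the flag changes
    return True


def live_rows(table: dict) -> dict:
    """Rows a reader should see: everything not flagged as deleted."""
    return {key: row for key, row in table.items() if not row.get("isdelete")}
```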

Fig. 6 is a flow chart illustrating another data synchronization method according to an exemplary embodiment. The method provided by the embodiments of the present disclosure may be performed by a client equipped with Hive.

After step S106 of the data synchronization method 10 shown in fig. 1, the data synchronization method 40 shown in fig. 6 further includes:

In step S402, a data file is read from the HDFS based on the Hive read operation.

In this embodiment, the data of the change log is written into the HDFS based on the Hbase write operation, and the HDFS file written by Hbase is read based on the Hive read operation, so that reading and writing are handled by separate processes.

During the Hive read operation, the data file may be converted into decimal data by a preconfigured parser. The preconfigured parser may be, for example, the org.apache.hadoop.hive.hbase.HBaseStorageHandler class.

For example, hexadecimal data files stored in the HDFS can be converted into decimal data files through the org.apache.hadoop.hive.hbase.HBaseStorageHandler class, so that Hive can directly use the HDFS files written by Hbase, achieving real-time synchronized updates while supporting modification, addition, and deletion operations.
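The hexadecimal-to-decimal conversion itself can be illustrated in isolation. The sketch below is not the Hive storage handler; it assumes, for illustration only, that a record consists of fixed-width unsigned integer fields, and both `field_bytes` and `byteorder` are hypothetical parameters.

```python
def hex_to_decimal_fields(hex_data: str, field_bytes: int = 4, byteorder: str = "big") -> list:
    """Split a hex string into fixed-width fields and decode each as an unsigned integer."""
    raw = bytes.fromhex(hex_data)
    if len(raw) % field_bytes != 0:
        raise ValueError("data length is not a multiple of the field width")
    return [
        int.from_bytes(raw[i:i + field_bytes], byteorder)
        for i in range(0, len(raw), field_bytes)
    ]
```

For instance, `hex_to_decimal_fields("0000000a000000ff")` yields the two decimal values 10 and 255 under these assumptions.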

It is noted that the above-described figures are merely schematic illustrations of processes involved in a method according to exemplary embodiments of the present disclosure, and are not intended to be limiting. It will be readily appreciated that the processes shown in the above figures do not indicate or limit the temporal order of these processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, for example, among a plurality of modules.

The following are device embodiments of the present disclosure that may be used to perform method embodiments of the present disclosure. For details not disclosed in the embodiments of the apparatus of the present disclosure, please refer to the embodiments of the method of the present disclosure.

As shown in fig. 7, the data synchronization device 50 includes: a log receiving module 502, a type determining module 504, and a file replacement module 506.

The log receiving module 502 is configured to receive a change log in the SQL database, where the change log includes change data information. The type determining module 504 is configured to determine the data operation type of the change log record. The file replacement module 506 is configured to, when the data operation type is an update operation, query the updated data file corresponding to the primary key of the change log, query the original data file corresponding to the primary key in the Hadoop Distributed File System (HDFS), and replace the original data file with the updated data file in the HDFS based on an Hbase write operation.
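One way to picture the three modules working together is the composition below. It is a sketch under stated assumptions: the class names mirror the module names in the text, but the entry fields (`primary_key`, `operation`, `hex_data`) and the dictionary-based table are hypothetical, and only the update path is shown.

```python
class LogReceivingModule:
    """Buffers received change-log entries until they are processed."""

    def __init__(self):
        self._queue = []

    def receive(self, change_log: list) -> None:
        self._queue.extend(change_log)

    def drain(self) -> list:
        entries, self._queue = self._queue, []
        return entries


class TypeDeterminingModule:
    """Reads the data operation type recorded in an entry."""

    def determine(self, entry: dict) -> str:
        return entry["operation"]


class FileReplacementModule:
    """Replaces the original data for a primary key with the updated data."""

    def __init__(self, table: dict):
        self._table = table

    def replace(self, entry: dict) -> None:
        row = self._table[entry["primary_key"]]   # original data, found by primary key
        row["data"] = entry["hex_data"]           # overwritten with the updated data


class DataSyncDevice:
    def __init__(self, table: dict):
        self.receiver = LogReceivingModule()
        self.type_determiner = TypeDeterminingModule()
        self.replacer = FileReplacementModule(table)

    def process(self, change_log: list) -> int:
        """Apply every update entry in the log; return how many were applied."""
        self.receiver.receive(change_log)
        updated = 0
        for entry in self.receiver.drain():
            if self.type_determiner.determine(entry) == "update":
                self.replacer.replace(entry)
                updated += 1
        return updated
```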

In some embodiments, the data synchronization device 50 further comprises: a file storage module configured to, when the data operation type is an add operation, store the primary key of the change log and its corresponding data file in the HDFS based on an Hbase write operation.

In some embodiments, the data synchronization device 50 further comprises: a data identification module configured to, when the data operation type is a delete operation, mark the data corresponding to the primary key of the change log as deleted in the HDFS.

In some embodiments, the data synchronization device 50 further comprises: a file reading module configured to read the data file from the HDFS based on a Hive read operation.

In some embodiments, the file reading module includes a file reading unit configured to convert the data file into decimal data through a preconfigured parser based on the Hive read operation, and to read out the converted data file.

In some embodiments, the log receiving module 502 includes a log receiving unit configured to receive change logs that are periodically acquired from the SQL database.

According to the data synchronization device provided by the embodiments of the disclosure, when the data operation type of the received change log is an update operation, the updated data file corresponding to the primary key of the change log is queried, the original data file corresponding to the primary key is queried in the HDFS, and the original data file is replaced with the updated data file in the HDFS based on an Hbase write operation. Because the data files are stored in the HDFS, data synchronization does not need to handle parsing, and the data table in the SQL database does not need to be queried again, so the performance of the SQL database system is not affected and the availability and stability of the database system are improved.

It should be noted that the block diagrams shown in the above figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in software or in one or more hardware modules or integrated circuits or in different networks and/or processor devices and/or microcontroller devices.

Fig. 8 is a schematic diagram of a computer device according to an exemplary embodiment. It should be noted that the computer device shown in fig. 8 is only an example, and should not impose any limitation on the functions and the application scope of the embodiments of the present invention.

As shown in fig. 8, the computer device 800 includes a Central Processing Unit (CPU) 801, which can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 802 or a program loaded from a storage section 808 into a Random Access Memory (RAM) 803. The RAM 803 also stores various programs and data required for the operation of the computer device 800. The CPU 801, ROM 802, and RAM 803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.

The following components are connected to the I/O interface 805: an input section 806 including a keyboard, a mouse, and the like; an output section 807 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 808 including a hard disk or the like; and a communication section 809 including a network interface card such as a LAN card, a modem, or the like. The communication section 809 performs communication processing via a network such as the internet. A drive 810 is also connected to the I/O interface 805 as needed. A removable medium 811 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 810 as needed so that a computer program read out therefrom is installed into the storage section 808 as needed.

In particular, according to embodiments of the present invention, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present invention include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network via the communication section 809, and/or installed from the removable media 811. The above-described functions defined in the system of the present invention are performed when the computer program is executed by a Central Processing Unit (CPU) 801.

The computer readable medium shown in the present invention may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.

The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The units involved in the embodiments of the present invention may be implemented in software or in hardware. The described units may also be provided in a processor, for example, described as: a processor includes a transmitting unit, an acquiring unit, a determining unit, and a first processing unit. The names of these units do not constitute a limitation on the unit itself in some cases, and for example, the transmitting unit may also be described as "a unit that transmits a picture acquisition request to a connected server".

As another aspect, the present invention also provides a computer-readable medium that may be contained in the apparatus described in the above embodiments, or may exist alone without being assembled into the apparatus. The computer-readable medium carries one or more programs which, when executed by a device, cause the device to perform:

receiving a change log in an SQL database, wherein the change log comprises change data information;

determining the data operation type of the change log record; and

when the data operation type is an update operation, querying the updated data file corresponding to the primary key of the change log, querying the original data file corresponding to the primary key in a Hadoop Distributed File System (HDFS), and replacing the original data file with the updated data file in the HDFS based on an Hbase write operation.

The exemplary embodiments of the present invention have been particularly shown and described above. It is to be understood that this invention is not limited to the precise arrangements and instrumentalities described herein; on the contrary, the invention is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims

1. A method of data synchronization, comprising:

determining the data operation type of the change log record;

when the data operation type is an update operation, querying an updated data file corresponding to a primary key of the change log based on the primary key, and querying an original data file corresponding to the primary key in a Hadoop Distributed File System (HDFS) based on the primary key, wherein, in the HDFS, based on an Hbase write operation, a changed part of the updated data file is used to replace a corresponding part of the original data file;

2. The method as recited in claim 1, further comprising:

when the operation type is a delete operation, in the HDFS, data corresponding to a primary key of the change log is identified as deleted.

3. The method according to any one of claims 1-2, further comprising:

reading the data file from the HDFS based on a Hive read operation.

4. A method according to claim 3, wherein the data files stored in the HDFS are hexadecimal data files.

5. The method of claim 4, wherein reading a data file from the HDFS based on a Hive read operation comprises:

converting the data file into decimal data through a preconfigured parser based on the Hive read operation, and reading out the converted data file.

6. The method of claim 1, wherein receiving a change log in an SQL database comprises: receiving change logs that are periodically acquired from the SQL database.

7. A data synchronization device, comprising:

the log receiving module is used for receiving a change log in the SQL database, wherein the change log comprises change data information;

the type determining module is used for determining the data operation type of the change log record;

a file replacement module, configured to, when the data operation type is an update operation, query an updated data file corresponding to a primary key of the change log based on the primary key, and query, based on the primary key, an original data file corresponding to the primary key in a Hadoop Distributed File System (HDFS), wherein, based on an Hbase write operation, a changed part of the updated data file is used to replace a corresponding part of the original data file; and

a file storage module, configured to, when the data operation type is an add operation, insert the primary key field of the change log into the primary key field of the Hbase target data table based on the Hbase write operation, and insert the data file corresponding to the primary key of the change log into the data field of the Hbase target data table.

8. The apparatus as recited in claim 7, further comprising:

a file storage module, configured to store the primary key of the change log and its corresponding data file in the HDFS based on an Hbase write operation when the data operation type is an add operation.

9. A computer device, comprising: a memory, a processor, and executable instructions stored in the memory and executable by the processor, wherein the processor implements the method of any of claims 1-6 when executing the executable instructions.

10. A computer readable storage medium having stored thereon computer executable instructions which when executed by a processor implement the method of any of claims 1-6.