
CN117194558A - Full-data synchronization method, device, equipment and medium - Google Patents

Full-data synchronization method, device, equipment and medium

Info

Publication number
CN117194558A
Authority
CN
China
Prior art keywords
full
data
data file
file
full data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310294028.XA
Other languages
Chinese (zh)
Inventor
吕仁朋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Inspur IGO Cloud Chain Information Technology Co Ltd
Original Assignee
Shandong Inspur IGO Cloud Chain Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Inspur IGO Cloud Chain Information Technology Co Ltd
Priority to CN202310294028.XA
Publication of CN117194558A

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the present application disclose a full data synchronization method, device, equipment and medium. The method includes: obtaining a full data synchronization request sent by a third-party system; determining the location of the full data file based on the full data synchronization request, and pulling the full data file to a local SFTP server; dividing the full data file on the local SFTP server into multiple data units, and comparing each data unit with the historical full data file, so as to save the contents of the full data file into different data files based on the comparison results; and generating different SQL statements for the different data files and executing those statements against the database to complete the full data synchronization. This method addresses the problem of data redundancy.

Description

A full data synchronization method, device, equipment and medium

Technical Field

The present application relates to the field of data processing technology, and in particular to a full data synchronization method, device, equipment and medium.

Background

Full data refers to all the data in the database system that currently needs to be migrated. With the spread of the Internet and big data technology, data warehouses have gradually become mainstream. Because a data warehouse offers large storage capacity and can process structured data, the data in a database can be synchronized to a data warehouse for processing.

During data migration, a business system needs to synchronize data from third-party systems, such as order data, user data, enterprise data, and transaction records. However, the third-party system provides only full data. Because the data volume is so large, it is difficult to guarantee that the data is migrated accurately, which leads to problems such as data redundancy.

Summary of the Invention

The embodiments of the present application provide a full data synchronization method, device, equipment and medium to solve the following technical problem: because the data volume is so large, it is difficult to guarantee that the data is migrated accurately, which leads to problems such as data redundancy.

The embodiments of the present application adopt the following technical solutions:

An embodiment of the present application provides a full data synchronization method. The method includes: obtaining a full data synchronization request sent by a third-party system; determining the location of the full data file based on the full data synchronization request, and pulling the full data file to a local SFTP server; dividing the full data file on the local SFTP server into multiple data units, and comparing each data unit with the historical full data file, so as to save the contents of the full data file into different data files based on the comparison results; and generating different SQL statements for the different data files and executing those statements against the database to complete the full data synchronization.

In the embodiments of the present application, a file transfer protocol is used because it is suited to transferring large files; when full data is synchronized, using files as the carrier helps ensure transfer efficiency. In addition, by dividing the full data file on the local SFTP server into multiple data units and comparing those units with the historical full data file, the embodiments can determine which data is the same as the historical full data file and which data differs from it. The full data file is parsed into several data files (new, modified, deleted, and so on), which are finally converted into SQL statements, so that each kind of data is stored differently and the problem of full data redundancy is resolved.

In one implementation of the present application, after the full data file is pulled to the local SFTP server, the method further includes: determining the push frequency corresponding to the full data file; setting local storage directories for full data files based on the push frequency; and storing the full data file in the corresponding local storage directory based on the push frequency, and periodically backing up the full data files in the local storage directories.

In one implementation of the present application, before the full data file on the local SFTP server is divided into multiple data units, the method further includes: if the full data file does not already exist on the local SFTP server, determining that the full data file is a new-data file; and generating the corresponding SQL statements for the new-data file and executing them against the database.

In one implementation of the present application, dividing the full data file on the local SFTP server into multiple data units and comparing each data unit with the historical full data file, so as to save the contents of the full data file into different data files based on the comparison results, specifically includes: if the full data file exists on the local SFTP server, dividing the full data file into multiple data units with one row per unit, and labeling each data unit with a unique identifier; and comparing the data units with the corresponding data units of the historical full data file, so as to save the contents of the full data file into different data files based on the comparison results.

In one implementation of the present application, comparing the data units with the corresponding data units of the historical full data file, so as to save the contents of the full data file into different data files based on the comparison results, specifically includes: where data in the full data file does not exist in the historical full data, storing that data in a new-data file; where data in the full data file exists in the historical full data but has been modified, storing the modified data in a modified-data file; storing data that exists in the historical full data but not in the full data file in a deleted-data file; and storing data that exists, unchanged, in both the historical full data and the full data file in an unchanged-data file.

In one implementation of the present application, generating different SQL statements for the different data files and executing those statements against the database to complete the full data synchronization specifically includes: generating INSERT SQL statements from the new-data file; generating UPDATE SQL statements from the modified-data file; and generating DELETE statements from the deleted-data file.

In one implementation of the present application, determining the location of the full data file based on the full data synchronization request and pulling the full data file to the local SFTP server specifically includes: logging in to the third-party system's SFTP server with a preset user name and a preset password; determining, based on the full data synchronization request, the location of the full data file on the third-party system's SFTP server; and pulling the full data file to the local SFTP server.

An embodiment of the present application provides a full data synchronization device, including: a full data synchronization request acquisition unit, which obtains a full data synchronization request sent by a third-party system; a full data file acquisition unit, which determines the location of the full data file based on the full data synchronization request and pulls the full data file to a local SFTP server; a full data comparison unit, which divides the full data file on the local SFTP server into multiple data units and compares each data unit with the historical full data file, so as to save the contents of the full data file into different data files based on the comparison results; and an SQL statement generation unit, which generates different SQL statements for the different data files and executes those statements against the database to complete the full data synchronization.

An embodiment of the present application provides full data synchronization equipment, including: at least one processor; and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to: obtain a full data synchronization request sent by a third-party system; determine the location of the full data file based on the full data synchronization request, and pull the full data file to a local SFTP server; divide the full data file on the local SFTP server into multiple data units, and compare each data unit with the historical full data file, so as to save the contents of the full data file into different data files based on the comparison results; and generate different SQL statements for the different data files and execute those statements against the database to complete the full data synchronization.

An embodiment of the present application provides a non-volatile computer storage medium storing computer-executable instructions, the computer-executable instructions being configured to: obtain a full data synchronization request sent by a third-party system; determine the location of the full data file based on the full data synchronization request, and pull the full data file to a local SFTP server; divide the full data file on the local SFTP server into multiple data units, and compare each data unit with the historical full data file, so as to save the contents of the full data file into different data files based on the comparison results; and generate different SQL statements for the different data files and execute those statements against the database to complete the full data synchronization.

At least one of the above technical solutions adopted in the embodiments of the present application can achieve the following beneficial effects. A file transfer protocol is suited to transferring large files; when full data is synchronized, using files as the carrier helps ensure transfer efficiency. In addition, by dividing the full data file on the local SFTP server into multiple data units and comparing those units with the historical full data file, the embodiments can determine which data is the same as the historical full data file and which data differs from it. The full data file is parsed into several data files (new, modified, deleted, and so on), which are finally converted into SQL statements, so that each kind of data is stored differently and the problem of full data redundancy is resolved.

Description of the Drawings

To explain the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below are only some of the embodiments recorded in the present application; those of ordinary skill in the art can obtain other drawings from them without creative effort. In the drawings:

Figure 1 is a flow chart of a full data synchronization method provided by an embodiment of the present application;

Figure 2 is a component block diagram of a full data synchronization device provided by an embodiment of the present application;

Figure 3 is a schematic structural diagram of a full data synchronization device provided by an embodiment of the present application;

Figure 4 is a schematic structural diagram of full data synchronization equipment provided by an embodiment of the present application.

Detailed Description

The embodiments of the present application provide a full data synchronization method, device, equipment and medium.

To help those skilled in the art better understand the technical solutions in the present application, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are only some of the embodiments of the present application, not all of them. All other embodiments obtained by those of ordinary skill in the art from the embodiments of this specification without creative effort fall within the scope of protection of the present application.

Full data refers to all the data in the database system that currently needs to be migrated. With the spread of the Internet and big data technology, data warehouses have gradually become mainstream. Because a data warehouse offers large storage capacity and can process structured data, the data in a database can be synchronized to a data warehouse for processing.

During data migration, a business system needs to synchronize data from third-party systems, such as order data, user data, enterprise data, and transaction records. However, the third-party system provides only full data. Because the data volume is so large, it is difficult to guarantee that the data is migrated accurately, which leads to problems such as data redundancy.

To solve the above problems, the embodiments of the present application provide a full data synchronization method, device, equipment and medium. A file transfer protocol is suited to transferring large files; when full data is synchronized, using files as the carrier helps ensure transfer efficiency. In addition, by dividing the full data file on the local SFTP server into multiple data units and comparing those units with the historical full data file, the embodiments can determine which data is the same as the historical full data file and which data differs from it. The full data file is parsed into several data files (new, modified, deleted, and so on), which are finally converted into SQL statements, so that each kind of data is stored differently and the problem of full data redundancy is resolved.

The technical solutions proposed in the embodiments of the present application are described in detail below with reference to the drawings.

Figure 1 is a flow chart of a full data synchronization method provided by an embodiment of the present application. As shown in Figure 1, the full data synchronization method includes the following steps:

S101. Obtain the full data synchronization request sent by the third-party system.

In one embodiment of the present application, the third-party system sends a full data synchronization HTTP request to the business system. At this point the full data itself is not sent to the business system; the request carries only the location of the full data, that is, the location on the file server where the full data file is stored.

Specifically, the business system receives the full data synchronization request sent by the third-party system, and the request includes the location of the file holding the full data that is currently to be synchronized. From the received request, the business system can determine where the current full data is located.
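The patent does not specify the format of the synchronization request, so the following is only a minimal sketch of how a business system might extract the file location from it. The JSON field names (`sftp_host`, `sftp_port`, `file_path`) and the `SyncRequest` type are hypothetical and are not taken from the source.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class SyncRequest:
    """Location of a full data file, as announced by the third-party system."""
    host: str
    port: int
    remote_path: str


def parse_sync_request(body: str) -> SyncRequest:
    """Extract the file location from the HTTP request body.

    The field names are assumptions made for this sketch; the patent only
    says that the request carries the location of the full data file.
    """
    payload = json.loads(body)
    missing = [k for k in ("sftp_host", "file_path") if k not in payload]
    if missing:
        raise ValueError(f"sync request is missing fields: {missing}")
    return SyncRequest(
        host=payload["sftp_host"],
        port=int(payload.get("sftp_port", 22)),  # 22 is the standard SFTP/SSH port
        remote_path=payload["file_path"],
    )
```

Rejecting a request that lacks the location up front keeps the later pull step from failing halfway through with a less informative error.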

S102. Determine the location of the full data file based on the full data synchronization request, and pull the full data file to the local SFTP server.

In one embodiment of the present application, the business system logs in to the third-party system's SFTP server with a preset user name and a preset password. Based on the full data synchronization request, it determines the location of the full data file on the third-party system's SFTP server, and then pulls the full data file to the local SFTP server.

Specifically, when the third-party system sends a full data synchronization request to the business system, the request includes the location of the full data file. The business system logs in to the third-party system's SFTP server with the user name and password it was given in advance, and pulls the full data file to the local SFTP server.
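As a rough illustration of the pull step, the sketch below downloads one file over SFTP using the third-party `paramiko` library, which the patent does not name; any SFTP client would do. The function names, the `local_root` directory (assumed to be the directory served by the local SFTP server), and the choice to keep the remote file name are all assumptions of this sketch.

```python
import posixpath
from pathlib import Path


def local_target(local_root: str, remote_path: str) -> Path:
    """Map a remote file path to a file of the same name under the local root."""
    return Path(local_root) / posixpath.basename(remote_path)


def pull_full_file(host: str, port: int, username: str, password: str,
                   remote_path: str, local_root: str) -> Path:
    """Download one full data file over SFTP and return its local path."""
    import paramiko  # third-party dependency, imported lazily

    target = local_target(local_root, remote_path)
    target.parent.mkdir(parents=True, exist_ok=True)
    transport = paramiko.Transport((host, port))
    try:
        # Password login with the preset credentials. No host key is checked
        # here; a real deployment should verify the server's host key.
        transport.connect(username=username, password=password)
        sftp = paramiko.SFTPClient.from_transport(transport)
        try:
            sftp.get(remote_path, str(target))
        finally:
            sftp.close()
    finally:
        transport.close()
    return target
```

The nested `try`/`finally` blocks close the SFTP channel and the transport even when the download fails, so a failed pull does not leak connections to the third-party server.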

In one embodiment of the present application, the push frequency corresponding to the full data file is determined, and local storage directories for full data files are set based on the push frequency. The full data file is stored in the corresponding local storage directory based on the push frequency, and the full data files in the local storage directories are backed up periodically.

Specifically, the local storage directories for full data files are set in advance according to how often the files are pushed. For example, if the full data file is pushed every day, each day's full data file is stored in a folder named after the date, and the files are backed up periodically so that historical full data files remain available for lookup.
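The date-based layout described above might look like the following sketch. The frequency names (`"daily"`, `"monthly"`), the folder name formats, and the function names are illustrative assumptions; the patent only gives the daily case as an example and does not say how backups are scheduled.

```python
import shutil
from datetime import date
from pathlib import Path


def storage_dir(root: Path, push_frequency: str, day: date) -> Path:
    """Return the folder for a file received on `day` (naming is an assumption)."""
    if push_frequency == "daily":
        return root / day.strftime("%Y-%m-%d")
    if push_frequency == "monthly":
        return root / day.strftime("%Y-%m")
    raise ValueError(f"unsupported push frequency: {push_frequency}")


def store(src: Path, root: Path, push_frequency: str, day: date) -> Path:
    """Copy a pulled file into the folder that matches its push frequency."""
    dest_dir = storage_dir(root, push_frequency, day)
    dest_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(src, dest_dir / src.name))


def backup(root: Path, backup_root: Path) -> None:
    """Mirror the storage tree; meant to be run periodically, e.g. by a scheduler.

    `dirs_exist_ok` requires Python 3.8 or later.
    """
    shutil.copytree(root, backup_root, dirs_exist_ok=True)
```

Keeping one folder per push period means the previous period's file is always easy to locate when the comparison step needs the historical full data file.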

S103. Divide the full data file on the local SFTP server into multiple data units, and compare each data unit with the historical full data file, so as to save the contents of the full data file into different data files based on the comparison results.

In one embodiment of the present application, if the full data file does not already exist on the local SFTP server, the full data file is determined to be a new-data file. The corresponding SQL statements are generated for the new-data file and executed against the database.

Specifically, if the third-party full data file does not exist on the local SFTP server, the full data has not yet been synchronized to the business system, and all of it can be imported. The third-party system's full data file is first converted into the business system's own data files. Data import files fall into four types: new-data files, modified-data files, deleted-data files, and unchanged-data files. When the third-party full data file does not exist on the local SFTP server, all of the converted data import files are new-data files. INSERT SQL statements are generated from the new-data files and executed against the database.

In one embodiment of the present application, if the full data file exists on the local SFTP server, the full data file is divided into multiple data units with one row per unit, and each data unit is labeled with a unique identifier. The data units are compared with the corresponding data units of the historical full data file, so as to save the contents of the full data file into different data files based on the comparison results.

In one embodiment of the present application, data in the full data file that does not exist in the historical full data is stored in a new-data file. Data in the full data file that exists in the historical full data but has been modified is stored in a modified-data file. Data that exists in the historical full data but not in the full data file is stored in a deleted-data file. Data that exists, unchanged, in both the historical full data and the full data file is stored in an unchanged-data file.

Specifically, when the third-party full data file exists on the local SFTP server, the third-party full data file is compared with the historical third-party full data file, and the comparison yields four parts: added data, deleted data, modified data, and unchanged data.

Data comparison has a precondition: each row of the data file is treated as one data unit, and each unit has a unique data identifier. The current third-party full data is compared row by row with the historical third-party full data. Rows of the current full data that do not exist in the historical full data are stored in the new-data file. Rows that exist in the historical full data but have been modified are stored in the modified-data file. Rows that exist in the historical full data but not in the current full data file are stored in the deleted-data file. Rows that exist, unchanged, in both are stored in the unchanged-data file.
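The row-by-row comparison can be sketched as below. This is one possible reading of the description, assuming the full data files are CSV text with a header row and that one column (here called `key`) holds the unique identifier; the patent fixes neither the file format nor the identifier column, and the function names are hypothetical.

```python
import csv
import io


def index_rows(text: str, key: str) -> dict:
    """Parse CSV text and index each row (one data unit) by its unique identifier."""
    return {row[key]: row for row in csv.DictReader(io.StringIO(text))}


def classify(current: dict, history: dict) -> dict:
    """Split rows into the four categories described in the text."""
    result = {"added": [], "modified": [], "deleted": [], "unchanged": []}
    for key, row in current.items():
        if key not in history:
            result["added"].append(row)        # only in the current file
        elif row != history[key]:
            result["modified"].append(row)     # in both, but content differs
        else:
            result["unchanged"].append(row)    # in both, identical
    for key, row in history.items():
        if key not in current:
            result["deleted"].append(row)      # only in the historical file
    return result
```

Passing an empty `history` covers the first synchronization, where no historical file exists: every row is classified as added. Indexing by identifier makes each lookup constant time, so the comparison is linear in the number of rows rather than quadratic.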

S104. Generate different SQL statements for the different data files, and execute those statements against the database to complete the full data synchronization.

In one embodiment of the present application, INSERT SQL statements are generated from the new-data file, UPDATE SQL statements are generated from the modified-data file, and DELETE statements are generated from the deleted-data file.

Specifically, the new-data file yields INSERT SQL statements, the modified-data file yields UPDATE SQL statements, the deleted-data file yields DELETE statements, and the unchanged-data file is not processed. Once the generated SQL statements have been executed against the database, the full data synchronization is complete.
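The mapping from data files to SQL statements might be sketched as follows. The patent does not name a database, so this example uses Python's built-in `sqlite3` module purely for illustration; the `diff` dictionary layout (lists of row dictionaries under `added`, `modified`, `deleted`, `unchanged`), the table name, and the function names are assumptions of the sketch.

```python
import sqlite3


def build_statements(table: str, key: str, diff: dict) -> list:
    """Turn classified rows into (sql, params) pairs; unchanged rows are skipped.

    `table` and the column names are interpolated into the SQL text, so they
    must come from trusted configuration, never from the data file itself.
    Row values are always passed as bound parameters.
    """
    statements = []
    for row in diff.get("added", []):
        cols = list(row)
        marks = ", ".join("?" for _ in cols)
        sql = f"INSERT INTO {table} ({', '.join(cols)}) VALUES ({marks})"
        statements.append((sql, [row[c] for c in cols]))
    for row in diff.get("modified", []):
        cols = [c for c in row if c != key]
        if not cols:
            continue  # nothing besides the identifier to update
        sets = ", ".join(f"{c} = ?" for c in cols)
        sql = f"UPDATE {table} SET {sets} WHERE {key} = ?"
        statements.append((sql, [row[c] for c in cols] + [row[key]]))
    for row in diff.get("deleted", []):
        statements.append((f"DELETE FROM {table} WHERE {key} = ?", [row[key]]))
    return statements


def apply_statements(conn: sqlite3.Connection, statements: list) -> None:
    """Execute all statements in one transaction (commit on success, rollback on error)."""
    with conn:
        for sql, params in statements:
            conn.execute(sql, params)
```

Running the whole batch in a single transaction is a design choice of this sketch, not something the patent states; it means a failure partway through leaves the table as it was before the synchronization started.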

In the embodiments of the present application, a file transfer protocol is used because it is suited to transferring large files; when full data is synchronized, using files as the carrier helps ensure transfer efficiency. In addition, by dividing the full data file on the local SFTP server into multiple data units and comparing those units with the historical full data file, the embodiments can determine which data is the same as the historical full data file and which data differs from it. The full data file is parsed into several data files (new, modified, deleted, and so on), which are finally converted into SQL statements, so that each kind of data is stored differently and the problem of full data redundancy is resolved.

Figure 2 is a component block diagram of a full data synchronization device provided by an embodiment of the present application. As shown in Figure 2, the full data synchronization device includes a third-party system, a business system, a database, a third-party SFTP file server, and a local SFTP file server. The third-party system stores the full data file on the third-party SFTP file server and sends an HTTP full data synchronization request that tells the business system where the file is located. After receiving the request, the business system pulls the full data file from the third-party SFTP file server and stores it on the local SFTP file server. The full data on the local SFTP file server is then parsed to determine the new-data, modified-data, and deleted-data files; these files are converted into SQL statements, and the generated statements are executed against the database, which completes the full data synchronization.

Figure 3 is a schematic structural diagram of a full data synchronization device provided by an embodiment of the present application. As shown in Figure 3, the full data synchronization device 300 includes a full data synchronization request acquisition unit 301, a full data file acquisition unit 302, a full data comparison unit 303, and an SQL statement generation unit 304.

The full data synchronization request acquisition unit 301 obtains the full data synchronization request sent by the third-party system.

The full data file acquisition unit 302 determines the location of the full data file based on the full data synchronization request, and pulls the full data file to the local SFTP server.

The full data comparison unit 303 divides the full data file on the local SFTP server into multiple data units and compares each data unit with the historical full data file, so as to save the contents of the full data file into different data files based on the comparison results.

The SQL statement generation unit 304 generates different SQL statements for the different data files and executes those statements against the database to complete the full data synchronization.

进一步的,装置还包括:Furthermore, the device also includes:

全量数据文件存放单元,确定出所述全量数据文件对应的推送频率;基于所述推送频率设置全量文件本地存放目录;基于所述推送频率将所述全量数据文件存放至不同的本地存放目录下,并定期对本地存放目录下的全量数据文件进行备份。The full data file storage unit determines the push frequency corresponding to the full data file; sets a local storage directory for the full file based on the push frequency; stores the full data file in different local storage directories based on the push frequency, And regularly back up all data files in the local storage directory.

Further, the device also includes:

a new data file processing unit, which determines that the full data file is a new data file when the full data file does not exist in the local SFTP server; generates corresponding SQL statements from the new data file; and applies the different SQL statements to the database.

Further, the device also includes:

a full data file comparison unit, which, when the full data file exists in the local SFTP server, divides the full data file into multiple data units row by row and labels each data unit with a unique identifier; and compares the multiple data units with the multiple data units corresponding to the historical full data file, so as to save the full data file into different data files based on the comparison result.
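
A minimal Python sketch of the row-wise division and unique identification, assuming a delimited text file in which one column already holds a unique key. The function name `split_into_units`, the comma separator, and the choice of the first column as the identifier are all hypothetical; the application does not state how the identifier is derived.

```python
from typing import Dict


def split_into_units(text: str, key_index: int = 0, sep: str = ",") -> Dict[str, str]:
    """Split a full data file into row units keyed by a unique identifier.

    Assumption of this sketch: column `key_index` of every row is unique.
    """
    units: Dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank rows
        unique_id = line.split(sep)[key_index].strip()
        if unique_id in units:
            raise ValueError(f"duplicate identifier: {unique_id}")
        units[unique_id] = line
    return units
```

With both the current and the historical file reduced to such a mapping, comparing the two files becomes a key-by-key comparison of two dictionaries.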

Further, the device also includes:

a full data file dividing unit, which, when the full data file does not exist in the historical full data, stores the data corresponding to the full data file into a new data file; when the full data file exists in the historical full data and has been modified, stores the modified full data file into a modified data file; stores data that exists in the historical full data but not in the full data file into a deleted data file; and stores data that exists in both the historical full data and the full data file into an unchanged data file.
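
The four-way split can be illustrated with a short Python sketch. It assumes the rows of the current and historical files are already available as mappings from a unique identifier to row content, and it reads "unchanged" as "present in both with identical content"; the names `DiffResult` and `classify` are hypothetical.

```python
from typing import Dict, NamedTuple


class DiffResult(NamedTuple):
    added: Dict[str, str]      # rows only in the current file
    modified: Dict[str, str]   # rows in both, content differs (current content kept)
    deleted: Dict[str, str]    # rows only in the historical file
    unchanged: Dict[str, str]  # rows in both with identical content


def classify(current: Dict[str, str], historical: Dict[str, str]) -> DiffResult:
    """Compare current rows with historical rows, keyed by unique identifier."""
    added = {k: v for k, v in current.items() if k not in historical}
    modified = {
        k: v for k, v in current.items() if k in historical and historical[k] != v
    }
    deleted = {k: v for k, v in historical.items() if k not in current}
    unchanged = {
        k: v for k, v in current.items() if k in historical and historical[k] == v
    }
    return DiffResult(added, modified, deleted, unchanged)
```

Each of the four mappings would then be written to its own data file; only the first three lead to database changes.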

Further, the device also includes:

an SQL statement generation unit, which generates an insert SQL statement from the new data file, an update SQL statement from the modified data file, and a delete statement from the deleted data file.
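
A hedged Python sketch of generating and applying the three kinds of statements. The table `goods`, its columns `id` and `name`, the use of SQLite, and the function names are assumptions for illustration only; the application names neither a schema nor a database product. Parameterized statements are used here as a design choice so that row values are never spliced into the SQL text.

```python
import sqlite3
from typing import Dict, List, Tuple

Statement = Tuple[str, tuple]


def build_statements(
    added: Dict[str, str],
    modified: Dict[str, str],
    deleted: Dict[str, str],
    table: str = "goods",  # hypothetical table; must be a trusted constant
) -> List[Statement]:
    """Turn the new, modified, and deleted rows into parameterized SQL."""
    stmts: List[Statement] = []
    for key, name in added.items():
        stmts.append((f"INSERT INTO {table} (id, name) VALUES (?, ?)", (key, name)))
    for key, name in modified.items():
        stmts.append((f"UPDATE {table} SET name = ? WHERE id = ?", (name, key)))
    for key in deleted:
        stmts.append((f"DELETE FROM {table} WHERE id = ?", (key,)))
    return stmts


def apply_statements(conn: sqlite3.Connection, stmts: List[Statement]) -> None:
    """Run all statements in one transaction: commit on success, roll back on error."""
    with conn:
        for sql, params in stmts:
            conn.execute(sql, params)
```

Unchanged rows produce no statement, which is where the reduction in redundant writes would come from in such an implementation.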

Further, the device also includes:

a full data file pulling unit, which logs in to the SFTP server of the third-party system with a preset user name and a preset password; determines the location of the full data file in the SFTP server of the third-party system based on the full data request; and pulls the full data file to the local SFTP server.

FIG. 4 is a schematic structural diagram of full data synchronization equipment provided by an embodiment of the present application. As shown in FIG. 4, the full data synchronization equipment includes:

at least one processor; and

a memory communicatively connected to the at least one processor; wherein

the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to:

obtain the full data synchronization request sent by a third-party system;

determine the location of the full data file based on the full data request, and pull the full data file to the local SFTP server;

divide the full data file in the local SFTP server into multiple data units, and compare each data unit with the historical full data file, so as to save the full data file into different data files based on the comparison result;

generate different SQL statements for the different data files, and apply the different SQL statements to the database to complete full data synchronization.

An embodiment of the present application also provides a non-volatile computer storage medium storing computer-executable instructions, the computer-executable instructions being configured to:

obtain the full data synchronization request sent by a third-party system;

determine the location of the full data file based on the full data request, and pull the full data file to the local SFTP server;

divide the full data file in the local SFTP server into multiple data units, and compare each data unit with the historical full data file, so as to save the full data file into different data files based on the comparison result;

generate different SQL statements for the different data files, and apply the different SQL statements to the database to complete full data synchronization.

The embodiments in this application are described in a progressive manner. For identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, because the device, equipment, and non-volatile computer storage medium embodiments are substantially similar to the method embodiments, they are described more briefly; for relevant details, refer to the corresponding description of the method embodiments.

Specific embodiments of the present application have been described above. Other embodiments are within the scope of the appended claims. In some cases, the actions or steps recited in the claims can be performed in an order different from that in the embodiments and still achieve the desired results. In addition, the processes depicted in the figures do not necessarily require the particular order shown, or sequential order, to achieve the desired results. In certain implementations, multitasking and parallel processing are also possible or may be advantageous.

The above descriptions are merely embodiments of the present application and are not intended to limit it. Those skilled in the art may make various modifications and changes to the embodiments of the present application. Any modification, equivalent replacement, or improvement made within the spirit and principles of the embodiments of this application shall fall within the scope of the claims of this application.

Claims (10)

1. A full data synchronization method, the method comprising:
acquiring a full data synchronization request sent by a third-party system;
determining the location of a full data file based on the full data request, and pulling the full data file to a local SFTP server;
dividing the full data file in the local SFTP server into a plurality of data units, and comparing each data unit with a historical full data file, so as to save the full data file into different data files based on a comparison result; and
generating different SQL statements for the different data files, and applying the different SQL statements to a database to complete full data synchronization.
2. The full data synchronization method according to claim 1, wherein after pulling the full data file to the local SFTP server, the method further comprises:
determining the push frequency corresponding to the full data file;
setting a local storage directory for full data files based on the push frequency; and
storing the full data files under different local storage directories based on the push frequency, and periodically backing up the full data files under the local storage directories.
3. The full data synchronization method according to claim 1, wherein before dividing the full data file in the local SFTP server into a plurality of data units, the method further comprises:
determining that the full data file is a new data file when the full data file does not exist in the local SFTP server; and
generating corresponding SQL statements from the new data file, and applying the different SQL statements to the database.
4. The full data synchronization method according to claim 1, wherein dividing the full data file in the local SFTP server into a plurality of data units, and comparing each data unit with a historical full data file, so as to save the full data file into different data files based on the comparison result, specifically comprises:
when the full data file exists in the local SFTP server, dividing the full data file into a plurality of data units row by row, and labeling each of the plurality of data units with a unique identifier; and
comparing the plurality of data units with the plurality of data units corresponding to the historical full data file, so as to save the full data file into different data files based on the comparison result.
5. The full data synchronization method according to claim 4, wherein comparing the plurality of data units with the plurality of data units corresponding to the historical full data file, so as to save the full data file into different data files based on the comparison result, comprises:
storing data corresponding to the full data file into a new data file when the full data file does not exist in the historical full data;
storing the modified full data file into a modified data file when the full data file exists in the historical full data and has been modified;
storing data that exists in the historical full data but not in the full data file into a deleted data file; and
storing data that exists in both the historical full data and the full data file into an unchanged data file.
6. The full data synchronization method according to claim 1, wherein generating different SQL statements for the different data files, and applying the different SQL statements to the database to complete full data synchronization, specifically comprises:
generating an insert SQL statement from the new data file;
generating an update SQL statement from the modified data file; and
generating a delete statement from the deleted data file.
7. The full data synchronization method according to claim 1, wherein determining the location of the full data file based on the full data request, and pulling the full data file to the local SFTP server, specifically comprises:
logging in to the SFTP server of the third-party system with a preset user name and a preset password;
determining the location of the full data file in the SFTP server of the third-party system based on the full data request; and
pulling the full data file to the local SFTP server.
8. A full data synchronization device, comprising:
a full data synchronization request acquisition unit, which acquires a full data synchronization request sent by a third-party system;
a full data file acquisition unit, which determines the location of a full data file based on the full data request, and pulls the full data file to a local SFTP server;
a full data comparison unit, which divides the full data file in the local SFTP server into a plurality of data units, compares each data unit with a historical full data file, and saves the full data file into different data files based on a comparison result; and
an SQL statement generation unit, which generates different SQL statements for the different data files, and applies the different SQL statements to a database to complete full data synchronization.
9. Full data synchronization equipment, comprising:
at least one processor; and
a memory communicatively connected to the at least one processor; wherein
the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to:
acquire a full data synchronization request sent by a third-party system;
determine the location of a full data file based on the full data request, and pull the full data file to a local SFTP server;
divide the full data file in the local SFTP server into a plurality of data units, and compare each data unit with a historical full data file, so as to save the full data file into different data files based on a comparison result; and
generate different SQL statements for the different data files, and apply the different SQL statements to a database to complete full data synchronization.
10. A non-volatile computer storage medium storing computer-executable instructions, the computer-executable instructions being configured to:
acquire a full data synchronization request sent by a third-party system;
determine the location of a full data file based on the full data request, and pull the full data file to a local SFTP server;
divide the full data file in the local SFTP server into a plurality of data units, and compare each data unit with a historical full data file, so as to save the full data file into different data files based on a comparison result; and
generate different SQL statements for the different data files, and apply the different SQL statements to a database to complete full data synchronization.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310294028.XA CN117194558A (en) 2023-03-21 2023-03-21 Full-data synchronization method, device, equipment and medium

Publications (1)

Publication Number Publication Date
CN117194558A true CN117194558A (en) 2023-12-08

Family

ID=89002292

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310294028.XA Pending CN117194558A (en) 2023-03-21 2023-03-21 Full-data synchronization method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN117194558A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118394849A (en) * 2024-06-26 2024-07-26 杭州古珀医疗科技有限公司 Method and device for comparing difference of full-scale data in medical field

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108200220A (en) * 2018-04-08 2018-06-22 武汉斗鱼网络科技有限公司 A kind of method of data synchronization, server and storage medium
CN109710584A (en) * 2018-12-20 2019-05-03 浪潮软件集团有限公司 A method and device for realizing two-way synchronization of files by using a cloud message service platform
CN113821517A (en) * 2021-11-23 2021-12-21 太平金融科技服务(上海)有限公司深圳分公司 Data synchronization method, device, equipment and storage medium
CN115062090A (en) * 2022-06-23 2022-09-16 平安银行股份有限公司 Data synchronization method, system, device and computer readable storage medium
CN115757311A (en) * 2022-10-26 2023-03-07 南银法巴消费金融有限公司 Method, storage medium and device for quasi-real-time file synchronization based on SFTP

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination