
CN113342750A - File data comparison method, device, equipment and storage medium - Google Patents


Info

Publication number
CN113342750A
Authority
CN
China
Prior art keywords
file
data
files
sorted
difference
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110724780.4A
Other languages
Chinese (zh)
Other versions
CN113342750B (en)
Inventor
徐继盛
万磊
李毅
钱进
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
WeBank Co Ltd
Original Assignee
WeBank Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by WeBank Co Ltd filed Critical WeBank Co Ltd
Priority to CN202110724780.4A priority Critical patent/CN113342750B/en
Publication of CN113342750A publication Critical patent/CN113342750A/en
Priority to PCT/CN2021/140732 priority patent/WO2023273235A1/en
Application granted granted Critical
Publication of CN113342750B publication Critical patent/CN113342750B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/16 File or folder operations, e.g. details of user interfaces specifically adapted to file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/04 Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • General Physics & Mathematics (AREA)
  • Development Economics (AREA)
  • Technology Law (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Human Computer Interaction (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present application provides a data comparison method for files, a data comparison device for files, data comparison equipment for files, and a computer-readable storage medium. The method includes: performing equal-ratio splitting on an acquired reconciliation file to obtain N split files; dividing the N split files into M data partitions according to the user identifiers carried by the transactions associated with the reconciliation file, where each of the M data partitions corresponds to one user identifier and contains m sub-files; cleaning and classifying the data of the m sub-files in each data partition according to transaction type and transaction time information, and performing equal-ratio splitting on all cleaned and classified files to obtain n files to be sorted; sorting the n files to be sorted in the M data partitions according to the transaction time information to obtain n sorted files; and comparing the data of the n sorted files based on a difference comparison algorithm.

Description

File data comparison method, device, equipment and storage medium
Technical Field
The embodiments of the present application relate to the technical field of data processing in financial technology (Fintech), and relate to, but are not limited to, a file data comparison method, a file data comparison device, and a computer-readable storage medium.
Background
With the development of computer technology, more and more technologies are being applied in the financial field, and the traditional financial industry is gradually shifting toward financial technology (Fintech). However, because the financial industry requires security and real-time performance, financial technology also places higher demands on these technologies.
In the Fintech field, WeBank's transaction products have very large numbers of users and transactions. Faced with hundreds of millions of existing users and single-day transactions, verifying whether each user's daily transactions have been processed correctly becomes a problem. For example, the "change-through" product in the WeChat client lets users initiate money-market-fund subscription and redemption transactions in real time. The shares held by the user are processed in real time, the transaction processing records are persisted to a database, and corresponding reconciliation files are generated every day: an end-of-day subscription reconciliation file for subscription transactions and an end-of-day redemption reconciliation file for redemption transactions. The reconciliation file uses a dedicated protocol format and records one transaction per line. Billions of transactions are sent to the WeChat financing system through the reconciliation files; the WeChat financing system must check the content of the reconciliation file against the users' real-time transaction records, and any inconsistent data must be corrected using the content of the reconciliation file as the reference.
In the related art, reconciliation follows the steps shown in FIG. 1. First, the reconciliation file is read directly and the content of each line is parsed. Second, the key fields obtained by parsing are used to match transaction data in the database. Finally, the possible matching results are handled as follows. If the reconciliation file has no record of a transaction but the database does, the transaction must be deleted and rolled back. If the reconciliation file has a record of a transaction but the database does not, the transaction must be added and processed. If both the reconciliation file and the database have a record of the transaction, there are two cases: if the transaction data is inconsistent, the transaction must be processed using the reconciliation file as the reference; if the transaction data is consistent, no processing is needed. The related art therefore at least suffers from low processing efficiency and long processing time, because a large file is read, parsed, and processed directly all at once.
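The matching outcomes of the related-art flow can be sketched as follows. This is only an illustration under stated assumptions: the records are held in hypothetical in-memory dictionaries keyed by a key field, and the names `classify_matches`, `file_records`, and `db_records` do not come from the application.

```python
from typing import Dict, List


def classify_matches(file_records: Dict[str, str],
                     db_records: Dict[str, str]) -> Dict[str, List[str]]:
    """Sort every key into one of the four outcomes of the related-art flow.

    Both arguments are hypothetical maps of key field -> record content.
    The reconciliation file is treated as the reference.
    """
    result: Dict[str, List[str]] = {
        "delete": [], "add": [], "update": [], "consistent": []
    }
    for key in sorted(file_records.keys() | db_records.keys()):
        in_file = key in file_records
        in_db = key in db_records
        if not in_file and in_db:
            # Only the database has the record: delete and roll back.
            result["delete"].append(key)
        elif in_file and not in_db:
            # Only the reconciliation file has the record: add it.
            result["add"].append(key)
        elif file_records[key] != db_records[key]:
            # Both have it but the data differs: the file wins.
            result["update"].append(key)
        else:
            # Both have it and the data matches: nothing to do.
            result["consistent"].append(key)
    return result
```

Holding both sides in memory like this is exactly what stops scaling for very large files, which is the inefficiency the application sets out to address.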
Disclosure of Invention
The embodiments of the present application provide a file data comparison method, a file data comparison device, and a computer-readable storage medium, which aim to solve at least the problem in the related art that, during reconciliation, reading a large file while directly parsing and processing it results in low processing efficiency and long processing time.
The technical scheme of the embodiment of the application is realized as follows:
an embodiment of the present application provides a data comparison method for files, the method including:
performing equal-ratio splitting on an acquired reconciliation file to obtain N split files;
dividing the N split files into M data partitions according to the user identifiers carried by the transactions associated with the reconciliation file, where each of the M data partitions corresponds to one user identifier and contains m sub-files;
cleaning and classifying the data of the m sub-files in each data partition according to transaction type and transaction time information, and performing equal-ratio splitting on all cleaned and classified files to obtain n files to be sorted;
sorting the n files to be sorted in the M data partitions according to the transaction time information to obtain n sorted files;
and comparing the data of the n sorted files based on a difference comparison algorithm.
An embodiment of the present application provides a data comparison device for files, the device including:
a processing module, configured to perform equal-ratio splitting on an acquired reconciliation file to obtain N split files;
the processing module being further configured to divide the N split files into M data partitions according to the user identifiers carried by the transactions associated with the reconciliation file, where each of the M data partitions corresponds to one user identifier and contains m sub-files;
the processing module being further configured to clean and classify the data of the m sub-files in each data partition according to transaction type and transaction time information, and to perform equal-ratio splitting on all cleaned and classified files to obtain n files to be sorted;
the processing module being further configured to sort the n files to be sorted in the M data partitions according to the transaction time information to obtain n sorted files;
and a reconciliation module, configured to compare the data of the n sorted files based on a difference comparison algorithm.
An embodiment of the present application provides an apparatus, including:
a memory for storing executable instructions; and a processor for implementing the method described above when executing the executable instructions stored in the memory.
Embodiments of the present application provide a computer-readable storage medium storing executable instructions that, when executed, cause a processor to implement the method described above.
The embodiment of the application has the following beneficial effects:
N split files are obtained by performing equal-ratio splitting on the acquired reconciliation file; the N split files are divided into M data partitions according to the user identifiers carried by the transactions associated with the reconciliation file, where each of the M data partitions corresponds to one user identifier and contains m sub-files; the m sub-files in each data partition are cleaned and classified according to transaction type and transaction time information, and all cleaned and classified files are split in equal ratio to obtain n files to be sorted; the n files to be sorted in the M data partitions are sorted according to the transaction time information to obtain n sorted files; and the data of the n sorted files is compared based on a difference comparison algorithm. In other words, the present application first splits the reconciliation file so that the large file is parsed and processed in fragments, which speeds up processing; it then sorts the files within each partition, which improves the accuracy of file processing and avoids the processing failures that are likely when unordered files are processed directly.
Drawings
FIG. 1 is a schematic diagram of a reconciliation flow in the related art;
FIG. 2 is a schematic diagram of an alternative architecture of a server according to an embodiment of the present application;
FIG. 3 is an optional schematic flowchart of a file data comparison method according to an embodiment of the present application;
FIG. 4 is a schematic flowchart of file splitting provided in an embodiment of the present application;
FIG. 5 is a diagram illustrating a result of file splitting provided by an embodiment of the present application;
FIG. 6 is a schematic overall flowchart of a file data comparison method according to an embodiment of the present disclosure;
FIG. 7 is a graph illustrating the results of data cleansing provided by an embodiment of the present application;
FIG. 8 is a diagram illustrating the results of file numbering provided by an embodiment of the present application;
FIG. 9 is a flow chart illustrating sorting of data in file blocks according to an embodiment of the present application;
FIG. 10 is a diagram illustrating the result of ordering data within file blocks according to an embodiment of the present application;
FIG. 11 is a diagram illustrating the result of data sorting between two file blocks according to an embodiment of the present application;
FIG. 12 is a diagram illustrating the result of data sorting among three file blocks according to an embodiment of the present application;
FIG. 13 is a schematic diagram of data ordering between two different numbered files provided by an embodiment of the present application;
FIG. 14 is a schematic diagram of a process for exporting a file from a database according to an embodiment of the present application;
FIG. 15 is a diagram illustrating the results of exporting a file from a database provided by an embodiment of the present application;
FIG. 16 is a schematic flow chart of exporting a file from a database according to an embodiment of the present application;
FIG. 17 is a schematic diagram of reconciliation files with database files in different partitions according to an embodiment of the present application;
FIG. 18 is a schematic diagram illustrating a comparison between a reconciliation file and a database file provided by an embodiment of the present application;
FIG. 19 is a diagram illustrating the result of de-duplicating the difference file between the reconciliation file and the database file provided by an embodiment of the present application;
FIG. 20 is a schematic flowchart illustrating a process of de-duplicating a difference file between a reconciliation file and a database file according to an embodiment of the present application;
FIG. 21 is a schematic flowchart of removing duplicate file blocks by calculating sha1 values according to an embodiment of the present application;
FIG. 22 is a schematic diagram of information related to key-value pairs of associated data provided by an embodiment of the present application;
FIG. 23 is a schematic flowchart of reconciliation provided in an embodiment of the present application.
Detailed Description
In order to make the objectives, technical solutions and advantages of the present application clearer, the present application will be described in further detail with reference to the attached drawings, the described embodiments should not be considered as limiting the present application, and all other embodiments obtained by a person of ordinary skill in the art without creative efforts shall fall within the protection scope of the present application.
In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and may be combined with each other without conflict. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the embodiments of the present application belong. The terminology used in the embodiments of the present application is for the purpose of describing the embodiments of the present application only and is not intended to be limiting of the present application.
An exemplary application of the file data comparison device provided in the embodiments of the present application is described below. The device may be implemented as any terminal with an on-screen display function, such as a notebook computer, a tablet computer, a desktop computer, a mobile device (e.g., a mobile phone, a portable music player, a personal digital assistant, a dedicated messaging device, or a portable game device), or an intelligent robot, and may also be implemented as a server. An exemplary application in which the file data comparison device is implemented as a server is described below.
Referring to fig. 2, fig. 2 is a schematic structural diagram of a server 100 according to an embodiment of the present application, where the server 100 shown in fig. 2 includes: at least one processor 110, at least one network interface 120, a user interface 130, and memory 150. The various components in server 100 are coupled together by a bus system 140. It is understood that the bus system 140 is used to enable connected communication between these components. The bus system 140 includes a power bus, a control bus, and a status signal bus in addition to a data bus. For clarity of illustration, however, the various buses are labeled as bus system 140 in fig. 2.
The Processor 110 may be an integrated circuit chip having Signal processing capabilities, such as a general purpose Processor, a Digital Signal Processor (DSP), or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, or the like, wherein the general purpose Processor may be a microprocessor or any conventional Processor, or the like.
The user interface 130 includes one or more output devices 131, including one or more speakers and/or one or more visual display screens, that enable the presentation of media content. The user interface 130 also includes one or more input devices 132 including user interface components that facilitate user input, such as a keyboard, mouse, microphone, touch screen display, camera, other input buttons and controls.
The memory 150 may be removable, non-removable, or a combination thereof. Exemplary hardware devices include solid state memory, hard disk drives, optical disk drives, and the like. Memory 150 optionally includes one or more storage devices physically located remotely from processor 110. The memory 150 includes volatile memory or nonvolatile memory, and may include both volatile and nonvolatile memory. The nonvolatile Memory may be a Read Only Memory (ROM), and the volatile Memory may be a Random Access Memory (RAM). The memory 150 described in embodiments herein is intended to comprise any suitable type of memory. In some embodiments, memory 150 is capable of storing data to support various operations, examples of which include programs, modules, and data structures, or subsets or supersets thereof, as exemplified below.
An operating system 151 including system programs for processing various basic system services and performing hardware-related tasks, such as a framework layer, a core library layer, a driver layer, etc., for implementing various basic services and processing hardware-based tasks;
a network communication module 152 for communicating with other computing devices via one or more (wired or wireless) network interfaces 120, exemplary network interfaces 120 including: Bluetooth, Wireless Fidelity (Wi-Fi), Universal Serial Bus (USB), etc.;
an input processing module 153 for detecting one or more user inputs or interactions from one of the one or more input devices 132 and translating the detected inputs or interactions.
In some embodiments, the device provided in the embodiments of the present application may be implemented in software. FIG. 2 shows a file data comparison device 154 stored in the memory 150. The file data comparison device 154 may be the file data comparison device in the server 100, may be software in the form of programs, plug-ins, and the like, and includes the following software modules: a processing module 1541 and a reconciliation module 1542. These modules are logical and can therefore be combined arbitrarily or further split depending on the functions implemented. The functions of the respective modules are explained below.
In other embodiments, the apparatus provided in the embodiments of the present Application may be implemented in hardware, and for example, the apparatus provided in the embodiments of the present Application may be a processor in the form of a hardware decoding processor, which is programmed to execute the data comparison method of the file provided in the embodiments of the present Application, for example, the processor in the form of the hardware decoding processor may be implemented by one or more Application Specific Integrated Circuits (ASICs), DSPs, Programmable Logic Devices (PLDs), Complex Programmable Logic Devices (CPLDs), Field-Programmable Gate arrays (FPGAs), or other electronic components.
The file data comparison method provided by the embodiments of the present application is described below in conjunction with the exemplary application and implementation of the server 100. Referring to FIG. 3, FIG. 3 is an optional flowchart of the file data comparison method provided in the embodiments of the present application, which is described with reference to the steps shown in FIG. 3.
Step S201, performing equal-ratio splitting on the acquired reconciliation file to obtain N split files.
In the embodiments of the present application, once the reconciliation file has been acquired, the reconciliation file (the large file) is split into sub-files in equal-ratio blocks according to a large-file fragment parsing algorithm, yielding N split files, each of which ends at a line break. Splitting the reconciliation file into sub-files makes it possible to exploit the parallel computing of a distributed system and process the sub-files simultaneously, which speeds up processing.
In other embodiments of the present application, referring to FIG. 4, if the reconciliation file is relatively small, for example smaller than 10 MB, it does not need to be split, and at reconciliation time the difference comparison algorithm provided by the present application is applied directly for data comparison. Reconciliation files are generally large, for example larger than 10 MB; such a file is split in equal ratio into N sub-files, which then await further processing such as data partitioning, that is, partitioning files along the customer dimension as described below.
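A minimal sketch of splitting at line breaks is given below. It assumes the file content is already available as a byte string and that lines end with "\n"; the function name `split_on_line_breaks` and the parameter `n_parts` are hypothetical, and the application does not specify how block boundaries are chosen.

```python
from typing import List


def split_on_line_breaks(data: bytes, n_parts: int) -> List[bytes]:
    """Split data into at most n_parts chunks of roughly equal size.

    Every chunk boundary is moved forward to the next line break, so no
    record (one line per transaction) is cut in half.
    """
    if n_parts < 1:
        raise ValueError("n_parts must be at least 1")
    # Target chunk size: ceiling of len(data) / n_parts, at least 1 byte.
    target = max(1, -(-len(data) // n_parts))
    chunks: List[bytes] = []
    start = 0
    while start < len(data):
        end = min(start + target, len(data))
        if end < len(data):
            # Extend the chunk to the first line break at or after end - 1.
            newline = data.find(b"\n", end - 1)
            end = len(data) if newline == -1 else newline + 1
        chunks.append(data[start:end])
        start = end
    return chunks
```

Because each chunk holds only whole lines, the chunks can be handed to separate workers and parsed independently.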
Step S202, dividing the N split files into M data partitions according to the user identifiers carried by the transactions associated with the reconciliation file.
Each of the M data partitions corresponds to one user identifier and contains m sub-files. Here, the user identifier that the server assigns to the user, for example an account number, is associated with the partition number of a data partition, so that the N split files can be divided into M data partitions along the customer dimension.
In the embodiments of the present application, when a user registers an account, the server generates a globally unique account identifier (ID) that contains the number of the partition to which the user belongs. For example, if a user's 16-digit account ID is 0010000000000001, the first three digits, 001, are the partition number and the last 13 digits are an auto-increment sequence within that partition; this account is required for every transaction the user performs.
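The account ID layout in the example above can be decoded as in the following sketch. The 3 + 13 digit layout comes from the example; the names `AccountId` and `parse_account_id` are hypothetical.

```python
from typing import NamedTuple


class AccountId(NamedTuple):
    partition: str  # first three digits, e.g. "001"
    sequence: int   # last 13 digits, auto-increment within the partition


def parse_account_id(account_id: str) -> AccountId:
    """Split a 16-digit account ID into partition number and sequence."""
    if len(account_id) != 16 or not (account_id.isascii() and account_id.isdigit()):
        raise ValueError(f"expected 16 ASCII digits, got {account_id!r}")
    return AccountId(partition=account_id[:3], sequence=int(account_id[3:]))
```

The partition number is kept as a string so that leading zeros, as in 001, are preserved.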
The system is deployed on servers and partitioned by customer. For example, with 40 existing partitions, when different customers register accounts with WeBank, each customer is registered in one of the 40 partitions according to preset rules. Illustratively, FIG. 5 shows split files among the N split files being divided into three of the M data partitions, whose partition numbers are 001, 002, and 003.
In other words, after the acquired reconciliation file has been split in equal ratio into N split files, each of the N split files is read and parsed line by line and partitioned according to the system partition to which the user account belongs, producing intermediate fragment sub-files; each partition produces a set of sub-file fragments. Here the data in the N files must be distributed across the M partitions, and each fragment file under a partition is stored at a certain size, taking 10 MB as an example. Note that the files produced by partitioning in this step are unordered: the data is only split once according to the partition to which the user account belongs. Once the partition is determined, data is first written to one file; if that file exceeds the configured size, a second file is started, and so on until all the data has been written to files in the designated partition.
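The partitioning with size-based rollover can be sketched as below. This is an in-memory illustration only: it assumes "|"-separated lines with the user account as the second field and the partition number as the first three digits of the account. The name `partition_lines` is hypothetical, and rolling over just before the limit would be exceeded is one possible reading of the size rule.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def partition_lines(lines: Iterable[str],
                    max_bytes: int = 10 * 1024 * 1024) -> Dict[str, List[List[str]]]:
    """Group lines by partition number, rolling over to a new sub-file
    whenever adding a line would push the current one past max_bytes.

    Returns partition number -> list of sub-files, where each sub-file is
    a list of lines in arrival order (still unsorted).
    """
    subfiles: Dict[str, List[List[str]]] = defaultdict(list)
    sizes: Dict[str, int] = {}  # bytes written to the current sub-file
    for line in lines:
        account = line.split("|")[1]   # assumed field position
        partition = account[:3]        # partition number prefix
        size = len(line.encode("utf-8")) + 1  # +1 for the line break
        current = sizes.get(partition)
        if current is None or (current > 0 and current + size > max_bytes):
            subfiles[partition].append([])  # start a new sub-file
            sizes[partition] = 0
        subfiles[partition][-1].append(line)
        sizes[partition] += size
    return dict(subfiles)
```

A real implementation would stream to files on disk rather than hold the lines in lists; the grouping and rollover logic would stay the same.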
Step S203, cleaning and classifying the data of the m sub-files in each data partition according to transaction type and transaction time information, and performing equal-ratio splitting on all cleaned and classified files to obtain n files to be sorted.
Reconciliation files are generally unordered, so processing them directly is likely to fail, in which case the reconciliation file has to be parsed and processed two or more times; moreover, the transaction types must be processed in a required order. Therefore, in the present application, the m sub-files in each of the M data partitions are cleaned and classified according to transaction type and transaction time information, and all cleaned and classified files are split in equal ratio to obtain n files to be sorted. Cleaning and classifying the data by these two factors, transaction type and transaction time information, effectively improves sorting efficiency.
Step S204, sorting the n files to be sorted in the M data partitions according to the transaction time information to obtain n sorted files.
In the embodiments of the present application, once the n files to be sorted in the M data partitions have been obtained through data cleaning and classification, they are sorted using the transaction time information as the sorting dimension to obtain n sorted files, and each sorted file is stored at a certain size, taking 2 MB as an example. Sorting the files within the partitions in this way improves the accuracy of file processing.
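Sorting one file by transaction time can be sketched as follows. The sketch assumes "|"-separated lines with the transaction date and transaction time as the fourth and fifth fields, both written as fixed-width, zero-padded digit strings (for example 20210629 and 093015) so that string order equals chronological order; these formats and the name `sort_by_transaction_time` are assumptions, not details given by the application.

```python
from typing import List, Tuple


def sort_by_transaction_time(lines: List[str]) -> List[str]:
    """Return the lines ordered by (transaction date, transaction time)."""
    def sort_key(line: str) -> Tuple[str, str]:
        fields = line.split("|")
        return fields[3], fields[4]  # assumed positions of date and time
    # sorted() is stable, so records with equal timestamps keep file order.
    return sorted(lines, key=sort_key)
```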
Step S205, comparing the data of the n sorted files based on a difference comparison algorithm.
In an implementable embodiment, referring to FIG. 6, FIG. 6 shows the overall flow of the file data comparison method of the present application: first, the reconciliation file (the large file) is split in equal ratio into N sub-files; then business-logic data processing such as sorting is performed on each of the N sub-files by partition, and the processed data is compared. Processing the large file by splitting first and then sorting, as provided by the present application, improves both file reading efficiency and reconciliation accuracy.
According to the file data comparison method provided by the embodiments of the present application, N split files are obtained by performing equal-ratio splitting on the acquired reconciliation file; the N split files are divided into M data partitions according to the user identifiers carried by the transactions associated with the reconciliation file, where each of the M data partitions corresponds to one user identifier and contains m sub-files; the m sub-files in each data partition are cleaned and classified according to transaction type and transaction time information, and all cleaned and classified files are split in equal ratio to obtain n files to be sorted; the n files to be sorted in the M data partitions are sorted according to the transaction time information to obtain n sorted files; and the data of the n sorted files is compared based on a difference comparison algorithm. In other words, the present application first splits the reconciliation file so that the large file is parsed and processed in fragments, which speeds up processing; it then sorts the files within each partition, which improves the accuracy of file processing and avoids the processing failures that are likely when unordered files are processed directly.
In some embodiments, in step S203, according to the transaction type and the transaction time information, data cleaning and classifying are performed on m sub-files in each data partition, and all cleaned and classified files are split in an equal ratio to obtain n files to be sorted, which may be implemented by the following steps:
a11, reading each subfile in the m subfiles, and traversing the transaction type and transaction time information of each row of data in each subfile.
And A12, processing all the line data in each subfile according to the cleaning classification conditions with the jth transaction type and one hour of transaction time information to obtain all cleaned and classified files.
Wherein the transaction types include a jth transaction type. It should be noted that the data in all the files after the sorting is washed out is unordered. In the embodiment of the application, the data are sorted according to the transaction time.
In the embodiments of the present application, the transaction types include at least procurement and redemption.
For each of the M data partitions, each subfile is read and every row is traversed; the data is cleaned and classified by transaction type and transaction time range, for example one hour, and stored in different files. Illustratively, the data fields in each line of the reconciliation file of the present application are separated by "|". Some key fields are listed here, in the following format: transaction serial number | user account | transaction type | transaction date | transaction time | transaction amount | transaction share | remark.
Referring to FIG. 7, for file processing within a partition, data of different transaction types is stored in designated files according to transaction time period, such as one hour, where a transaction type field of 0 denotes purchase and 1 denotes redemption, and non-critical fields are replaced with. FIG. 7 shows the classified files obtained after cleaning the files in the partition, including: purchase 09-hour transaction data, that is, transaction data whose transaction type is purchase and whose transaction time falls in the 09 hour; redemption 09-hour transaction data, that is, transaction data whose transaction type is redemption and whose transaction time falls in the 09 hour; and redemption 10-hour transaction data, that is, transaction data whose transaction type is redemption and whose transaction time falls in the 10 hour.
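Illustratively, the cleaning and classification described above can be sketched as follows. This is a minimal sketch and not the implementation of the application: it assumes the "|"-separated field order listed above and a transaction time written as HHMMSS (as in the value 090002 used later), and the function name classify_lines and the sample rows are hypothetical.

```python
from collections import defaultdict

# Field positions follow the "|"-separated layout listed above (assumption).
TYPE_FIELD, TIME_FIELD = 2, 4

def classify_lines(lines):
    """Group rows by transaction type and by the hour of the transaction time."""
    buckets = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split("|")
        tx_type = fields[TYPE_FIELD]       # "0" = purchase, "1" = redemption
        hour = fields[TIME_FIELD][:2]      # assumes HHMMSS, e.g. "090002" -> "09"
        buckets[f"{tx_type}_{hour}"].append(line)
    return dict(buckets)

# Hypothetical sample rows, for illustration only.
rows = [
    "S001|U1|0|20210629|090002|100.00|100|-",
    "S002|U1|1|20210629|092005|50.00|50|-",
    "S003|U1|1|20210629|100102|20.00|20|-",
]
groups = classify_lines(rows)
```

Each key of groups corresponds to one classified file, for example purchase 09-hour data or redemption 10-hour data.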
A13, performing equal-ratio splitting on all the cleaned and classified files to obtain w split files.
The w split files comprise w files to be sorted that have the j-th transaction type and transaction time information within one hour, and the w files corresponding to all transaction types together form the n files to be sorted.
Here, after data cleaning and classification, the file data in each partition is collected into different files by transaction type and hour range. Some hours may carry a larger volume of transaction data; following the file splitting principle described earlier, once a file reaches 10 MB, subsequent data is written to a second file. There may therefore be w files, that is, many transaction files, for the same transaction type and the same hour.
In some embodiments, in step S204, sorting the n files to be sorted in the M data partitions according to the transaction time information to obtain n sorted files may be implemented by the following steps:
A21, numbering each of the w files to obtain a plurality of files numbered 1 through w.
For the w files of a given transaction type and a given hour, each file being 10 MB for example, the w files of that hour are numbered from 1 to w. As shown in FIG. 8, the transaction type is purchase and the transaction time is the 09 hour, which corresponds to w files; each of the w files is numbered to obtain a plurality of files numbered 1 to w, namely purchase 09-hour transaction data file 1, file 2, file 3, ..., file w. The data in all files is still unordered at this point. In the present application, files are named in the form "{transaction type}_{transaction period}_{file number}". For example, the file name for purchase in the 09 hour is 0_09_000001, the file name for redemption in the 09 hour is 1_09_000001, and the file number is an incrementing 6-digit integer.
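Illustratively, the naming rule and the 10 MB splitting principle can be sketched together as follows. This is a minimal sketch under stated assumptions: the function names file_name and assign_files are hypothetical, rows are kept in memory rather than written to disk, and a new file is started when adding a row would exceed the size limit.

```python
MAX_BYTES = 10 * 1024 * 1024  # 10 MB threshold taken from the description

def file_name(tx_type, hour, number):
    """Build "{transaction type}_{transaction period}_{file number}"."""
    return f"{tx_type}_{hour:02d}_{number:06d}"

def assign_files(lines, tx_type, hour, max_bytes=MAX_BYTES):
    """Assign rows to numbered files, rolling over at the size limit."""
    files, number, size = {}, 1, 0
    for line in lines:
        n = len(line.encode("utf-8"))
        if size and size + n > max_bytes:
            number, size = number + 1, 0   # start the next numbered file
        files.setdefault(file_name(tx_type, hour, number), []).append(line)
        size += n
    return files

# Tiny limit used only to make the rollover visible.
parts = assign_files(["aaaa\n"] * 5, tx_type=0, hour=9, max_bytes=10)
```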
A22, according to the file memory mapping mode, for the files numbered 1 to w, reading in parallel, each time, a file block of a preset size from the file numbered i, to obtain a plurality of file blocks of the same size for number i.
In the embodiment of the application, during sorting, the differently numbered files among the w files are sorted in parallel. The sorting of the file numbered i, for example file number 1, is described here as an example; files with other numbers are sorted in the same way. For file number 1 with a size of 10 MB, a 2 MB file block is read each time, so file number 1 is divided equally into 5 blocks for reading.
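Illustratively, reading a file in fixed-size blocks through a memory mapping can be sketched as follows. This is a minimal sketch using the Python standard mmap module; the function name read_blocks is hypothetical. It cuts purely by byte count, so a row may straddle two blocks; an implementation would need to align block boundaries with line breaks, which the description does not detail. An empty file cannot be memory-mapped and is not handled here.

```python
import mmap

def read_blocks(path, block_size=2 * 1024 * 1024):
    """Yield consecutive fixed-size byte blocks of a memory-mapped file."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; ACCESS_READ keeps the mapping read-only.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            for start in range(0, len(mm), block_size):
                yield mm[start:start + block_size]
```

With the default block size, a 10 MB file yields 5 blocks of 2 MB each, matching the example above.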
A23, reading a file block k in a plurality of file blocks with the same number i, and analyzing each line of data in the file block k in parallel to obtain the transaction time information of each line of data in the file block k.
Illustratively, a file block k among the plurality of equally sized file blocks of file number 1 is read, and each line of data in file block k is parsed in parallel to obtain its transaction time information; for example, the first 2 MB block of data is read and parsed line by line, and the transaction time is taken as the basis for sorting.
A24, if the (i + 1)-th line of data in the file block k is read, comparing the (i + 1)-th line of data with the preceding i lines of data, determining the target position of the (i + 1)-th line of data in the file block k, and inserting the (i + 1)-th line of data into the target position to obtain the sorted file block k.
The transaction time of the (i + 1) th line of data at the target position in the sorted file blocks k is after the transaction time of the (i) th line of data at the previous adjacent position of the target position and before the transaction time of the (i + 2) th line of data at the next adjacent position of the target position.
Here, for file block k, each time a line of data is read, it is compared with the lines already read in file block k to find the position where its time is greater than or equal to the preceding time and less than the following time; the line is inserted at that position and the subsequent data is shifted back by one line. The sorted file block is then written back to the first 2 MB file block of file number 1, which completes the sorting of file block k.
Illustratively, referring to FIG. 9, file block 1 contains 6 rows of data. After the first row is read, it is compared only with the next row; because 090002 is earlier than 092005 in the next row, the position of the first row is unchanged. After the second row is read, 092005 is later than 090002; compared with the next row, 092005 is also later than 090102, which indicates that the row containing 092005 should be exchanged with the row containing 090102. After the exchange, the third row is read; 092005 is later than 090102 and earlier than 094002. The subsequent rows are read and sorted in the same way until the time of every row in file block 1 is greater than or equal to the preceding time and less than the following time, at which point the sorting of file block 1 is complete. The sorted file block 1 is shown in FIG. 10.
In the embodiment of the present application, a plurality of sorted file blocks are obtained by sorting each file block k among the plurality of equally sized file blocks of number i; for example, 5 sorted file blocks are obtained for file number 1.
File blocks 2, 3, 4 and 5 of file number 1 are sorted using the same method as file block 1; each sorted file block k is written back to the file numbered i, and the internal sorting result is persisted to disk, so that each file block is sorted.
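Illustratively, the in-block insertion sorting by transaction time can be sketched as follows. This is a minimal sketch, assuming the transaction time is the fifth "|"-separated field and is a fixed-width HHMMSS string so that string comparison matches time order. The function names are hypothetical; the times 090002, 092005, 090102 and 094002 come from the example above, and the remaining sample values are invented for illustration.

```python
def tx_time(line):
    """Return the transaction time field of a "|"-separated row."""
    return line.split("|")[4]

def insertion_sort_block(lines):
    """Insert each row at the position where its time is >= the preceding
    time and < the following time, shifting later rows back by one."""
    ordered = []
    for line in lines:
        pos = len(ordered)
        # Walk back past rows with a strictly later time; rows with an equal
        # time keep their original relative order.
        while pos > 0 and tx_time(ordered[pos - 1]) > tx_time(line):
            pos -= 1
        ordered.insert(pos, line)
    return ordered

times = ["090002", "092005", "090102", "094002", "091500", "090002"]
block = [f"S{i}|U1|0|20210629|{t}|1.00|1|-" for i, t in enumerate(times)]
sorted_block = insertion_sort_block(block)
```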
A25, based on a multi-row matching sorting mode, sorting the individually sorted file blocks of number i against one another to obtain all sorted file blocks of number i.
In the embodiment of the present application, A24 yields file blocks that are each sorted internally; the file blocks are then sorted against one another using multi-row matching, so as to achieve sorting across the plurality of file blocks of number i.
Here, to sort file block 1 and file block 2 against each other, both are read, and for m consecutive rows of file block 2 (where m may be 1) a position is found in file block 1 that is greater than or equal to the preceding time and less than the following time. The m rows are inserted at that position and the subsequent rows of file block 1 are shifted backward, so that the last m rows of file block 1 move down into file block 2. The m rows moved into file block 2 are likewise compared in the reverse direction and placed at their sorted positions in file block 2, so that the two file blocks are sorted.
Illustratively, referring to FIG. 11, file block 1 and file block 2 are read, and the position in file block 1 that is greater than or equal to the preceding time and less than the following time is found for 2 consecutive rows of file block 2, namely rows 4 to 5 of file block 1. The 2 consecutive rows of file block 2 are inserted at this position and the subsequent rows of file block 1 are shifted backward, so that rows 7 to 8 of file block 1 move down into file block 2. The 2 rows moved into file block 2 are likewise compared in the reverse direction to find positions in file block 2 that are greater than or equal to the preceding time and less than the following time; for example, the shifted row 7 of file block 1 is inserted at row 4 of file block 2, and the shifted row 8 of file block 1 is inserted at row 6 of file block 2. Once these rows are placed at their sorted positions in file block 2, the two file blocks are sorted.
Thus, file block 1 and file block 2 are sorted again in memory, and the sorted results are written back to the first and second 2 MB file blocks of file number 1.
Similarly, referring to FIG. 12, file block 3 is compared with file block 1 and file block 2 respectively to find a suitable position, and m rows of file block 3 are moved to the suitable position in file block 1 or file block 2. Correspondingly, the m surplus rows after insertion are moved down into file block 2 or file block 3; if they are moved into file block 2, the m rows left over after sorting continue to move down into file block 3. In the end, the three file blocks are sorted.
File blocks 4 and 5 are processed in the same way: file block 4 is compared with file blocks 1, 2 and 3 respectively to select suitable positions for insertion sorting, and file block 5 is compared with file blocks 1, 2, 3 and 4 respectively to select suitable positions for insertion sorting. Finally, the sorting across the file blocks of file number 1 is complete.
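Illustratively, the net effect of sorting two already sorted file blocks against each other can be sketched as follows. This is not the row-shifting procedure of the application: the sketch substitutes a standard two-way merge (heapq.merge from the Python standard library) and then re-splits the result at the original row count of the first block, which gives the same final arrangement, with the earliest rows in the first block and the overflow in the second. The function names and sample rows are hypothetical.

```python
import heapq

def tx_time(line):
    """Return the transaction time field of a "|"-separated row."""
    return line.split("|")[4]

def merge_blocks(block_a, block_b):
    """Merge two time-sorted blocks and redistribute the rows so that
    block_a keeps its row count and holds the earliest rows."""
    merged = list(heapq.merge(block_a, block_b, key=tx_time))
    return merged[:len(block_a)], merged[len(block_a):]

def row(serial, t):
    # Hypothetical sample row; only the time field matters here.
    return f"{serial}|U1|0|20210629|{t}|1.00|1|-"

block_1 = [row("S1", "090002"), row("S2", "090102"), row("S3", "092005")]
block_2 = [row("S4", "090050"), row("S5", "093000")]
new_1, new_2 = merge_blocks(block_1, block_2)
```

Applying the same step to each later block against the blocks before it extends the ordering to all blocks of one file.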
A26, sorting the files with numbers from 1 to w based on the sorting mode of multi-line matching to obtain n sorted files.
In the embodiment of the application, the files numbered 2 to w are sorted in the same way as file number 1. The sorting of differently numbered files can proceed in parallel, using techniques such as multithreading or distributed clusters, so that each of the w files is sorted independently.
Further, the sorting method for file number 1 is extended to sorting across files. As shown in FIG. 13, p rows of data of file block 1 of file number 2 are compared with file block 1 of file number 1; if a suitable position exists, the p rows are inserted at that position, and all file blocks of file number 1 are shifted down by p rows, with the overflow moving into file number 2, where suitable positions are again found in the reverse direction and the rows are stored. These steps are repeated until all data rows of all file blocks of file number 1 and file number 2 are sorted.
Further, file numbers 3 to w are processed in the same way by comparison with file numbers 1 and 2, and the sorting of the files for the purchase 09 hour is finally completed.
In some embodiments, the step S205 performs data comparison on the n sorted files based on the difference comparison algorithm, which may be implemented by the following steps:
A31, exporting database file i from the database according to the head-row transaction time field and the tail-row transaction time field of all sorted file blocks of number i.
And all the sorted file blocks with the number i have the same data partition identification as the database file i.
In the embodiment of the application, during reconciliation, data is exported from the database according to the same rules as the aforementioned data cleaning, and subfiles are exported by the delimited transaction type and transaction time range. To export the data, the time range of a sorted file in the partition is read first; taking a single file as an example, the transaction time fields of its head row and tail row are read directly. Transaction data within that range is then exported.
In the embodiment of the application, when a file is exported from the database, the transaction type can be obtained from the file name, so that the time range can be delimited when the transaction records are exported, and the database script can sort the records directly.
Further, exported files may be named by rule: the name of a database export file is the corresponding reconciliation file name with the prefix "db_" added. For example, if the sorted reconciliation file in the partition is named "0_09_000001", the database export file is named "db_0_09_000001".
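Illustratively, deriving the export parameters from a sorted reconciliation file can be sketched as follows. This is a minimal sketch, assuming the field layout listed earlier; the function names are hypothetical, and the database query itself is omitted because the description does not specify it. The times 090002 and 095716 are taken from the example in the description, and the other sample values are invented.

```python
def export_time_range(sorted_lines, time_field=4):
    """Read the transaction time of the head row and of the tail row."""
    first = sorted_lines[0].split("|")[time_field]
    last = sorted_lines[-1].split("|")[time_field]
    return first, last

def transaction_type_of(file_name):
    """The transaction type is the first component of the file name."""
    return file_name.split("_")[0]

def db_export_name(reconciliation_name, prefix="db_"):
    """Name the database export file by prefixing the reconciliation name."""
    return prefix + reconciliation_name

sorted_lines = [
    "S001|U1|0|20210629|090002|100.00|100|-",
    "S002|U1|0|20210629|093011|50.00|50|-",
    "S003|U1|0|20210629|095716|20.00|20|-",
]
time_range = export_time_range(sorted_lines)
```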
In some embodiments, there are two cases for sorting the database export files. First, if a partition has only one database, the database export file has already been sorted by rule at the export step. Second, if a partition uses multiple databases, there are multiple database export files corresponding to one reconciliation file. For example, referring to FIG. 14 and taking three databases as an example, the reconciliation file name is "0_09_000001" and the database export file names are "db1_0_09_000001", "db2_0_09_000001" and "db3_0_09_000001". In this case, the data rows of the three files are sorted and merged into one file, using the file sorting algorithm described above.
Illustratively, referring to FIG. 15 and FIG. 16, database file 1 is exported from the database according to the head-row transaction time field 090002 and the tail-row transaction time field 095716 of all file blocks numbered 1, and the file name of the exported database file 1 is db_09_000001. Before the export, the transaction type may also be checked; if the transaction types do not match, the export is stopped.
A32, calculating the first hash value of all the sorted file blocks with the number i based on the difference comparison algorithm.
In the embodiment of the present application, the difference comparison algorithm includes, but is not limited to, the message digest algorithm MD5.
A33, calculating a second hash value of database file i based on the difference comparison algorithm.
In this embodiment of the application, a difference comparison algorithm based on a message digest algorithm is used. For each partition shown in FIG. 17, the file comparison differences of that partition are obtained and the files with differences are screened out. Referring to FIG. 18, a value H is calculated for each file using the message digest algorithm MD5. Taking the two files 0_09_000001 and db_0_09_000001 as an example, if the H calculated for the reconciliation file 0_09_000001 is the same as the H calculated for the database export file db_0_09_000001, the reconciliation is consistent, no further processing is needed, and the pair is excluded directly.
If the H calculated for the reconciliation file 0_09_000001 differs from the H calculated for the database export file db_0_09_000001, the reconciliation is inconsistent and processing continues with the next step.
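Illustratively, the whole-file digest comparison can be sketched as follows. This is a minimal sketch using the Python standard hashlib module; the function names are hypothetical, and reading in chunks is an implementation choice made here so that a large file does not have to be loaded into memory at once.

```python
import hashlib

def file_digest(path, algorithm="md5", chunk_size=1 << 20):
    """Compute the hex digest H of a file, reading it chunk by chunk."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def files_match(reconciliation_path, db_export_path):
    """True if both files have the same digest, i.e. the pair is consistent."""
    return file_digest(reconciliation_path) == file_digest(db_export_path)
```

A pair for which files_match returns True can be excluded; any other pair goes on to block-level matching.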
A34, if the first hash value is different from the second hash value, determining that all the file blocks with the number i after sorting are different from the database file i.
A35, based on the data matching algorithm, removing the same file blocks in all the sorted file blocks with the number i from the database file i, and screening out the first difference file block in the database file i and the second difference file block in all the sorted file blocks with the number i.
Here, after the MD5 values of the files are compared, the completely consistent files are removed, leaving the files in which the reconciliation file and the database export file differ. Within a differing file, because the file is sorted, most blocks of consecutive rows are likely to be equal; identical file blocks can be removed by file block data matching, leaving only the differing portion.
A36, determining difference information between a first difference file block in the database file i and a second difference file block in all the sorted file blocks with the number i, and performing data comparison based on the difference information.
In some embodiments, a35, based on a data matching algorithm, removes the same file block as the file block with the number i from the database file i, and filters out a first difference file block in the database file i and a second difference file block in all the file blocks with the number i, which may be implemented by the following steps:
A351, if the number of rows of all sorted file blocks of number i differs from the number of rows of data contained in database file i, removing the head-row data and the tail-row data of all sorted file blocks of number i and of database file i at least once, to obtain all sorted file blocks of number i after row removal and the file blocks of database file i after row removal.
All sorted file blocks of number i after row removal and the file blocks of database file i after row removal have the same transaction time interval.
A352, calculating a third hash value of all sorted file blocks of number i after row removal based on the difference comparison algorithm.
A353, calculating a fourth hash value of the file blocks of database file i after row removal based on the difference comparison algorithm.
A354, if the third hash value is different from the fourth hash value, determining that all sorted file blocks of number i after row removal differ from the file blocks of database file i after row removal.
A355, based on a data matching algorithm, removing from the file blocks of database file i after row removal the file blocks identical to those among all sorted file blocks of number i after row removal, and screening out third difference file blocks in the file blocks of database file i after row removal and fourth difference file blocks in all sorted file blocks of number i after row removal.
A356, removing the tail-row data of the fourth difference file block of number i and of the third difference file block of database file i at least once, and screening out the first difference file block in database file i and the second difference file block of number i; the first difference file block in database file i and the second difference file block of number i have the same tail-row transaction time.
Referring to FIG. 19, FIG. 20 and FIG. 21: 1) Taking the reconciliation file 0_09_000002 and the database export file db_0_09_000002 as an example, the numbers of rows of the two files are compared first. There are generally three cases:
In the first case, file 0_09_000002 has more rows than db_0_09_000002.
In the second case, file 0_09_000002 has the same number of rows as db_0_09_000002.
In the third case, file 0_09_000002 has fewer rows than db_0_09_000002.
2) The head rows of the two files are then compared, and the head row with the earlier transaction time is removed.
3) The tail rows of the two files are then compared, and the tail row with the later transaction time is removed.
4) After the head rows and tail rows have been removed repeatedly until the head-row transaction times of the two files are equal, the tail-row transaction times are equal, and the numbers of data rows are equal, the message digest algorithm SHA-1 is used to compute and compare the SHA-1 values of the two files.
5) If the two SHA-1 values are equal, the equal file blocks are removed from the two files, leaving only the previously excluded rows of the two files. Go to step 7).
6) If the two SHA-1 values are not equal, the tail row is removed from both files at the same time and the tail-row transaction times are compared again; this is repeated until the tail-row transaction times of the two files are equal and the numbers of rows are equal, and then the procedure returns to step 5).
7) Steps 1) to 6) are repeated until no identical file block remains.
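Illustratively, one pass of steps 2) to 6) can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of the application: both inputs are lists of rows already sorted by transaction time, the function names are hypothetical, and when the head and tail times agree but the row counts differ, the sketch removes both tail rows, a case the description leaves open. The repetition of step 7) is not shown, and list.pop(0) is used for brevity rather than efficiency.

```python
import hashlib

def tx_time(line):
    """Return the transaction time field of a "|"-separated row."""
    return line.split("|")[4]

def sha1_of(lines):
    """SHA-1 digest of a block of rows."""
    return hashlib.sha1("\n".join(lines).encode("utf-8")).hexdigest()

def strip_matching_block(recon, db):
    """Trim head and tail rows until an identical block is found, then drop
    it and return the excluded rows of each file."""
    a, b = list(recon), list(db)
    head_a, head_b, tail_a, tail_b = [], [], [], []
    matched = False
    while a and b:
        if tx_time(a[0]) < tx_time(b[0]):      # step 2: earlier head row
            head_a.append(a.pop(0))
        elif tx_time(b[0]) < tx_time(a[0]):
            head_b.append(b.pop(0))
        elif tx_time(a[-1]) > tx_time(b[-1]):  # step 3: later tail row
            tail_a.insert(0, a.pop())
        elif tx_time(b[-1]) > tx_time(a[-1]):
            tail_b.insert(0, b.pop())
        elif len(a) == len(b) and sha1_of(a) == sha1_of(b):
            matched = True                     # steps 4-5: identical block
            break
        else:                                  # step 6: drop both tail rows
            tail_a.insert(0, a.pop())
            tail_b.insert(0, b.pop())
    if matched:
        return head_a + tail_a, head_b + tail_b
    # No identical block found: every row remains a candidate difference.
    return head_a + a + tail_a, head_b + b + tail_b
```

In this sketch the two returned lists are the rows that did not belong to the matched block; they are the input to the associative-array comparison described next.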
Further, two associative arrays may be used to store the sha1 values and data rows of two difference files, respectively.
In some embodiments, a36 determines difference information between a first difference file block in a database file i and a second difference file block in all sorted file blocks with number i, and performs data comparison based on the difference information, which may be implemented by the following steps:
A361, calculating a fifth hash value of the first difference file block in database file i based on the difference comparison algorithm, and recording a first associative array of database file i with the fifth hash value as the key and the data row of the first difference file block in database file i as the value.
A362, calculating a sixth hash value of the second difference file block of number i based on the difference comparison algorithm, and recording a second associative array of number i with the sixth hash value as the key and the data row of the second difference file block of number i as the value.
A363, in each partition, comparing the keys of the first associative array of database file i with the keys of the second associative array of number i, and removing the data rows with the same keys from the two associative arrays to obtain a third associative array of database file i and a fourth associative array of number i.
Wherein the key in the third associative array of database file i is the transaction serial number of each row of data and the value is the data row; the key in the fourth associative array of number i is the transaction serial number of each row of data and the value is the data row.
Illustratively, as shown in FIG. 22, an associative array is obtained whose keys are the transaction serial numbers of the rows of data and whose values are the data rows.
In the embodiment of the application, the second associative array of number i is denoted associative array A, the first associative array of database file i is denoted associative array B, the fourth associative array of number i is denoted associative array C, and the third associative array of database file i is denoted associative array D.
A364, determining difference information between the third associative array of database file i and the fourth associative array of number i, and performing data comparison based on the difference information.
In some embodiments, a364 determines difference information between the third associated array of the database file i and the fourth associated array of the number i, and performs data comparison based on the difference information, which may be implemented by the following steps:
A3641, if the fourth associative array of number i contains a first key that does not exist in the third associative array of database file i, determining that the difference information indicates that the reconciliation file contains data that does not exist in the database, determining the data row corresponding to the first key in the fourth associative array of number i, and adding that data row to the third associative array of database file i.
A3642, if the third associative array of database file i contains a second key that does not exist in the fourth associative array of number i, determining that the difference information indicates that the database contains data that does not exist in the reconciliation file, determining the data row corresponding to the second key in the third associative array of database file i, and deleting that data row.
A3643, if a third key exists in both the fourth associative array of number i and the third associative array of database file i, determining that the difference information indicates that the transaction data exists in both the reconciliation file and the database but is inconsistent, and replacing the data row corresponding to the third key in the third associative array of database file i with the data row corresponding to the third key in the fourth associative array of number i.
In one practical embodiment, referring to FIG. 23, the reconciliation of the present application is further explained with reference to associative array A, associative array B, associative array C and associative array D. After all files are processed, each partition is left with only two associative arrays of differences from the comparison of the reconciliation file with the database file. The difference associative array generated from the reconciliation file is denoted associative array A, and the difference associative array generated from the database file is denoted associative array B. At this point little data remains, but there is still a small chance that identical rows exist in both.
Within each partition, the two associative arrays are traversed respectively. The keys of associative array A are compared with the keys of associative array B, and data with the same key in both associative arrays is excluded. The data remaining in associative arrays A and B is remapped into a new associative array C and a new associative array D, with the unique transaction serial number of each row of data as the key and the data row as the value.
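Illustratively, the construction of associative arrays A and B and the remapping into C and D can be sketched as follows. This is a minimal sketch that assumes one SHA-1 value per data row and the transaction serial number as the first "|"-separated field; the function names and sample rows are hypothetical.

```python
import hashlib

def by_hash(lines):
    """Associative array keyed by the SHA-1 of each row (arrays A and B)."""
    return {hashlib.sha1(l.encode("utf-8")).hexdigest(): l for l in lines}

def by_serial(lines):
    """Associative array keyed by transaction serial number (arrays C and D)."""
    return {l.split("|")[0]: l for l in lines}

def remap_differences(recon_lines, db_lines):
    """Exclude rows whose hash appears on both sides, then re-key the rest."""
    A, B = by_hash(recon_lines), by_hash(db_lines)
    common = A.keys() & B.keys()
    C = by_serial(row for key, row in A.items() if key not in common)
    D = by_serial(row for key, row in B.items() if key not in common)
    return C, D

r1 = "S001|U1|0|20210629|090002|100.00|100|-"
r2 = "S002|U1|0|20210629|090102|50.00|50|-"
r2_db = "S002|U1|0|20210629|090102|55.00|50|-"   # same serial, other amount
r3 = "S003|U1|0|20210629|090200|20.00|20|-"
r4 = "S004|U1|0|20210629|090300|10.00|10|-"
C, D = remap_differences([r1, r2, r3], [r1, r2_db, r4])
```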
When the keys of associative array C are compared with the keys of associative array D, there are three cases:
First, a key exists in associative array C but not in associative array D, which indicates that the reconciliation file contains data that the database lacks. The key is used to look up the corresponding data row in associative array C, and that data row is added to associative array D; in this way the transaction serial number is newly added to associative array D together with its transaction data.
Second, a key does not exist in associative array C but exists in associative array D, which indicates that the database contains data that the reconciliation file lacks. The key is used to look up the corresponding data row in associative array D, and that data row is deleted, so that redundant, incorrect transaction information is removed from the database.
Third, a key exists in both associative array C and associative array D, which indicates that the transaction exists in both the reconciliation file and the database but the transaction data is inconsistent (consistent data has already been removed by the preceding algorithm). The key is used to look up the corresponding data row in associative array C, and the data row associated with the key in associative array D is replaced with the data row associated with the key in associative array C, so that the data in the database is consistent with the data in the reconciliation file.
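Illustratively, the three cases can be sketched as follows. This is a minimal sketch in which plain dictionaries stand in for associative arrays C and D; the function names and sample rows are hypothetical, and the corrections are applied to an in-memory copy rather than to a database.

```python
def plan_corrections(C, D):
    """Classify keys: C comes from the reconciliation file, D from the database."""
    to_add = {k: C[k] for k in C.keys() - D.keys()}      # case 1: only in C
    to_delete = {k: D[k] for k in D.keys() - C.keys()}   # case 2: only in D
    to_replace = {k: C[k] for k in C.keys() & D.keys()}  # case 3: in both
    return to_add, to_delete, to_replace

def apply_corrections(C, D):
    """Return a corrected copy of D that agrees with C."""
    to_add, to_delete, to_replace = plan_corrections(C, D)
    corrected = dict(D)
    corrected.update(to_add)
    for key in to_delete:
        del corrected[key]
    corrected.update(to_replace)
    return corrected

C = {"S002": "S002|U1|0|20210629|090102|50.00|50|-",
     "S003": "S003|U1|0|20210629|090200|20.00|20|-"}
D = {"S002": "S002|U1|0|20210629|090102|55.00|50|-",
     "S004": "S004|U1|0|20210629|090300|10.00|10|-"}
to_add, to_delete, to_replace = plan_corrections(C, D)
corrected = apply_corrections(C, D)
```

After the three kinds of correction are applied, the corrected copy of D holds the same rows as C.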
According to the file and data difference comparison algorithm of the present application, the sorting and partition design in the large-file fragment analysis and processing algorithm makes it convenient to use the parallel computing of a distributed system, and file block data matching is efficient; meanwhile, file processing accuracy is high and the probability of file processing failure is greatly reduced.
The following continues with an exemplary structure in which the file data comparison device 154 provided in the embodiment of the present application is implemented as software modules. In some embodiments, as shown in FIG. 2, the software modules of the file data comparison device 154 stored in the memory 150 may be a file data comparison device in the server 100, including:
the processing module 1541 is configured to split the acquired reconciliation file in an equal ratio to obtain N split files;
the processing module 1541 is configured to divide the N split files into M data partitions according to a user identifier of an exchange associated with the reconciliation file; each data partition in the M data partitions corresponds to one user identifier, and each data partition comprises m sub-files;
the processing module 1541 is configured to perform data cleaning and classification on m sub-files in each data partition according to the transaction type and the transaction time information, and perform equal-ratio splitting on all cleaned and classified files to obtain n files to be sorted;
the processing module 1541 is configured to sort, according to the transaction time information, the n files to be sorted in the M data partitions to obtain n sorted files;
the reconciliation module 1542 is configured to perform data comparison on the n sorted files based on a difference comparison algorithm.
In some embodiments, the processing module 1541 is configured to read each subfile of the m subfiles, and traverse the transaction type and transaction time information of each row of data in each subfile; process all rows of data in each subfile according to the cleaning and classification condition of having the j-th transaction type and transaction time information within one hour, to obtain all cleaned and classified files, where the transaction types include the j-th transaction type; and perform equal-ratio splitting on all the cleaned and classified files to obtain w split files, where the w split files comprise w files to be sorted that have the j-th transaction type and transaction time information within one hour, and the w files corresponding to all transaction types together form the n files to be sorted.
In some embodiments, the processing module 1541 is configured to number each file of each w files, to obtain a plurality of files numbered from number 1 to number w; according to a file memory mapping mode, for files with numbers from 1 to w, reading file blocks with preset sizes in the file with the number i in parallel every time to obtain a plurality of file blocks with the same size with the number i; reading a file block k in a plurality of file blocks with the same number i, and analyzing each line of data in the file block k in parallel to obtain transaction time information of each line of data in the file block k; if the (i + 1) th line of data in the file block k is read, comparing the (i + 1) th line of data with the previous (i) th line of data, determining the target position of the (i + 1) th line of data in the file block k, and inserting the (i + 1) th line of data into the target position to obtain a sorted file block k; the transaction time of the (i + 1) th line of data at the target position in the sorted file block k is after the transaction time of the (i) th line of data at the previous adjacent position of the target position and before the transaction time of the (i + 2) th line of data at the next adjacent position of the target position; sequencing all the sequenced file blocks with the number i based on a multi-row matching sequencing mode to obtain all the sequenced file blocks with the number i; and sequencing the files with the numbers from 1 to w based on a sequencing mode of multi-line matching to obtain n sequenced files.
In some embodiments, the reconciliation module 1542 is configured to: export a database file i from the database according to the first-row transaction time field and the last-row transaction field of all the sorted file blocks of number i, where all the sorted file blocks of number i and database file i have the same data partition identifier; calculate a first hash value of all the sorted file blocks of number i based on a difference comparison algorithm; calculate a second hash value of database file i based on the difference comparison algorithm; if the first hash value differs from the second hash value, determine that all the sorted file blocks of number i differ from database file i; based on a data matching algorithm, remove from database file i the file blocks that are identical to file blocks among all the sorted file blocks of number i, and screen out a first difference file block in database file i and a second difference file block among all the sorted file blocks of number i; and determine difference information between the first difference file block in database file i and the second difference file block among all the sorted file blocks of number i, and perform data comparison based on the difference information.
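A minimal sketch of the hash-based screening follows. The hash function is not named in this passage, so SHA-256 from the standard library is an assumption, as are the helper names `digest` and `differing_blocks` and the representation of a block as a list of row strings.

```python
import hashlib


def digest(rows):
    """Hash a block of rows; the newline separator keeps row boundaries distinct."""
    h = hashlib.sha256()  # assumed hash function
    for row in rows:
        h.update(row.encode("utf-8"))
        h.update(b"\n")
    return h.hexdigest()


def differing_blocks(file_blocks, db_blocks):
    """Drop blocks present on both sides; return (db_only, file_only) blocks."""
    file_by_hash = {digest(b): b for b in file_blocks}
    db_by_hash = {digest(b): b for b in db_blocks}
    common = file_by_hash.keys() & db_by_hash.keys()
    db_diff = [b for h, b in db_by_hash.items() if h not in common]
    file_diff = [b for h, b in file_by_hash.items() if h not in common]
    return db_diff, file_diff
```

Comparing one digest per block first means that row-level work is only needed for the blocks that remain after this screening.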
In some embodiments, the reconciliation module 1542 is configured to: if the number of rows in all the sorted file blocks of number i differs from the number of rows of data contained in database file i, remove the first-row data and the last-row data of all the sorted file blocks of number i and of database file i at least once, to obtain the row-trimmed sorted file blocks of number i and the row-trimmed file block of database file i, where the two have the same transaction time interval; calculate a third hash value of the row-trimmed sorted file blocks of number i based on the difference comparison algorithm; calculate a fourth hash value of the row-trimmed file block of database file i based on the difference comparison algorithm; if the third hash value differs from the fourth hash value, determine that the row-trimmed sorted file blocks of number i differ from the row-trimmed file block of database file i; based on the data matching algorithm, remove from the row-trimmed file block of database file i the file blocks that are identical to file blocks among the row-trimmed sorted file blocks of number i, and screen out a third difference file block in the row-trimmed file block of database file i and a fourth difference file block among the row-trimmed sorted file blocks of number i; and remove the last-row data of the fourth difference file block of number i and of the third difference file block of database file i at least once, to screen out the first difference file block in database file i and the second difference file block of number i, where the first difference file block in database file i and the second difference file block of number i have the same last-row transaction time.
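One way to picture the row-trimming step is to cut both sorted row lists down to the time interval they share. The sketch below is an interpretation under assumptions: rows are already sorted by transaction time, `time_of` is a hypothetical accessor supplied by the caller, and trimming to the common interval in one pass stands in for removing first and last rows "at least once".

```python
def align_interval(file_rows, db_rows, time_of):
    """Keep only rows that fall inside the time interval common to both sides."""
    if not file_rows or not db_rows:
        return [], []
    # The common interval starts at the later first row and ends at the earlier last row.
    start = max(time_of(file_rows[0]), time_of(db_rows[0]))
    end = min(time_of(file_rows[-1]), time_of(db_rows[-1]))

    def keep(rows):
        return [r for r in rows if start <= time_of(r) <= end]

    return keep(file_rows), keep(db_rows)
```

After alignment, the two sides can be hashed and compared again, since a remaining mismatch can no longer be explained by rows outside the shared interval.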
In some embodiments, the reconciliation module 1542 is configured to: calculate a fifth hash value of the first difference file block in database file i based on the difference comparison algorithm, and record a first associative array of database file i with the fifth hash value as the key and the data row of the first difference file block in database file i as the value; calculate a sixth hash value of the second difference file block of number i based on the difference comparison algorithm, and record a second associative array of number i with the sixth hash value as the key and the data row of the second difference file block of number i as the value; within each partition, compare the keys of the first associative array of database file i with the keys of the second associative array of number i and remove the data rows whose keys are the same in both associative arrays, to obtain a third associative array of database file i and a fourth associative array of number i, where in each of the third and fourth associative arrays the key is the transaction serial number of a row of data and the value is that data row; and determine difference information between the third associative array of database file i and the fourth associative array of number i, and perform data comparison based on the difference information.
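The two-stage use of associative arrays can be sketched with Python dictionaries. The sketch assumes one hash per data row, SHA-256 as the hash function, and a row layout whose first comma-separated field is the transaction serial number; the names `row_hash`, `serial_no`, and `residual_maps` are hypothetical.

```python
import hashlib


def row_hash(row):
    return hashlib.sha256(row.encode("utf-8")).hexdigest()  # assumed hash function


def serial_no(row):
    return row.split(",")[0]  # assumed: first field is the transaction serial number


def residual_maps(db_rows, file_rows):
    """Drop rows whose hashes match on both sides, then re-key the rest by serial number."""
    db_by_hash = {row_hash(r): r for r in db_rows}        # first associative array
    file_by_hash = {row_hash(r): r for r in file_rows}    # second associative array
    same = db_by_hash.keys() & file_by_hash.keys()
    db_by_serial = {serial_no(r): r for h, r in db_by_hash.items() if h not in same}
    file_by_serial = {serial_no(r): r for h, r in file_by_hash.items() if h not in same}
    return db_by_serial, file_by_serial                   # third and fourth arrays
```

Keying by hash first removes identical rows cheaply; keying the remainder by serial number makes it possible to tell a missing transaction from a changed one.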
In some embodiments, the reconciliation module 1542 is configured to: if the fourth associative array of number i contains a first key that does not exist in the third associative array of database file i, determine that the difference information indicates that the reconciliation file has data that the database does not have, determine the corresponding data row in the fourth associative array of number i based on the first key, and add that data row to the third associative array of database file i; if the fourth associative array of number i does not contain a second key that exists in the third associative array of database file i, determine that the difference information indicates that the reconciliation file lacks data that the database has, determine the corresponding data row in the third associative array of database file i based on the second key, and delete that data row from the third associative array of database file i; and if the fourth associative array of number i contains a third key that also exists in the third associative array of database file i, determine that the difference information indicates that the transaction data exists in both the reconciliation file and the database but is inconsistent, and replace the data row corresponding to the third key in the fourth associative array of number i with the data row corresponding to the third key in the fourth associative array of number i.
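The three difference cases can be sketched over two dictionaries keyed by transaction serial number. The sketch assumes that the reconciliation file is treated as authoritative when a key exists on both sides with different rows; that choice, the function name `apply_differences`, and the report format are illustrative assumptions.

```python
def apply_differences(db_by_serial, file_by_serial):
    """Return a reconciled copy of the database-side map and a report of changes."""
    reconciled = dict(db_by_serial)
    report = {"added": [], "deleted": [], "replaced": []}
    for key, row in file_by_serial.items():
        if key not in db_by_serial:
            reconciled[key] = row            # file has data the database lacks
            report["added"].append(key)
        elif db_by_serial[key] != row:
            reconciled[key] = row            # assumed: file row wins on conflict
            report["replaced"].append(key)
    for key in db_by_serial:
        if key not in file_by_serial:
            del reconciled[key]              # database has data the file lacks
            report["deleted"].append(key)
    return reconciled, report
```

Working on a copy leaves the input maps untouched, so the report can be reviewed before any change is written back.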
With the file data comparison device described above, the obtained reconciliation file is split in equal proportions to obtain N split files; the N split files are divided into M data partitions according to the user identifiers of the transactions associated with the reconciliation file, where each of the M data partitions corresponds to one user identifier and contains m subfiles; data cleaning and classification are performed on the m subfiles in each data partition according to transaction type and transaction time information, and all the cleaned and classified files are split in equal proportions to obtain n files to be sorted; the n files to be sorted in the M data partitions are sorted according to the transaction time information to obtain n sorted files; and data comparison is performed on the n sorted files based on a difference comparison algorithm. In other words, the present application first splits the reconciliation file so that a large file is analyzed and processed in fragments, which speeds up processing; it then sorts the files within each partition, which improves the accuracy of file processing and avoids the processing failures that are likely when unordered files are processed directly.
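The partitioning step in this summary, in which split files are distributed into one data partition per user identifier, can be sketched as follows. The accessor `user_of` and the rule of producing one subfile per combination of split file and user are assumptions made for illustration.

```python
from collections import defaultdict


def partition_by_user(split_files, user_of):
    """Distribute the rows of each split file into one partition per user identifier."""
    partitions = defaultdict(list)
    for rows in split_files:
        per_user = defaultdict(list)
        for row in rows:
            per_user[user_of(row)].append(row)
        for user, user_rows in per_user.items():
            # assumed: each (split file, user) pair yields one subfile
            partitions[user].append(user_rows)
    return dict(partitions)
```

Because each partition holds only one user's rows, the later cleaning, sorting, and comparison steps can run on the partitions independently.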
It should be noted that the description of the device embodiments of the present application is similar to that of the method embodiments and has similar beneficial effects, so it is not repeated here. For technical details not disclosed in the device embodiments, refer to the description of the method embodiments of the present application.
Embodiments of the present application provide a storage medium storing executable instructions, which when executed by a processor, will cause the processor to execute the method provided by the embodiments of the present application.
In some embodiments, the storage medium may be a computer-readable storage medium, such as a Ferroelectric Random Access Memory (FRAM), a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a flash memory, a magnetic surface memory, an optical disc, or a Compact Disc Read-Only Memory (CD-ROM), among other memories; or it may be any of various devices that include one of the above memories or any combination of them.
In some embodiments, executable instructions may be written in any form of programming language (including compiled or interpreted languages), in the form of programs, software modules, scripts or code, and may be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
By way of example, executable instructions may, but need not, correspond to files in a file system, and may be stored in a portion of a file that holds other programs or data, for example in one or more scripts in a HyperText Markup Language (HTML) document, in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, subprograms, or portions of code). By way of example, executable instructions may be deployed to be executed on one computing device, on multiple computing devices at one site, or on multiple computing devices distributed across multiple sites and interconnected by a communication network.
The above description is only an example of the present application, and is not intended to limit the scope of the present application. Any modification, equivalent replacement, and improvement made within the spirit and scope of the present application are included in the protection scope of the present application.

Claims (10)

1. A data comparison method for a file, characterized by comprising:
splitting an obtained reconciliation file in equal proportions to obtain N split files;
dividing the N split files into M data partitions according to user identifiers of the transactions associated with the reconciliation file, wherein each of the M data partitions corresponds to one user identifier, and each data partition contains m subfiles;
performing data cleaning and classification on the m subfiles in each data partition according to transaction type and transaction time information, and splitting all the cleaned and classified files in equal proportions to obtain n files to be sorted;
sorting the n files to be sorted in the M data partitions according to the transaction time information to obtain n sorted files; and
performing data comparison on the n sorted files based on a difference comparison algorithm.

2. The method according to claim 1, characterized in that the performing of data cleaning and classification on the m subfiles in each data partition according to transaction type and transaction time information, and the splitting of all the cleaned and classified files in equal proportions to obtain n files to be sorted, comprises:
reading each of the m subfiles, and traversing the transaction type and the transaction time information of each row of data in each subfile;
processing all rows of data in each subfile according to a cleaning and classification condition of having a j-th transaction type and the transaction time information being one hour, to obtain all the cleaned and classified files, wherein the transaction types include the j-th transaction type; and
splitting all the cleaned and classified files in equal proportions to obtain W split files, wherein the W split files include w files to be sorted that have the j-th transaction type and whose transaction time information is one hour, and the w files corresponding to all transaction types constitute the n files to be sorted.

3. The method according to claim 2, characterized in that the sorting of the n files to be sorted in the M data partitions according to the transaction time information to obtain n sorted files comprises:
numbering each file of each group of w files to obtain a plurality of files numbered 1 to w;
for the file numbered i among the files numbered 1 to w, reading file blocks of a preset size from the file numbered i in parallel each time, in a file memory-mapping manner, to obtain a plurality of equally sized file blocks of number i;
reading a file block k among the plurality of equally sized file blocks of number i, and parsing each row of data in file block k in parallel to obtain the transaction time information of each row of data in file block k;
if the (i+1)-th row of data in file block k is read, comparing the (i+1)-th row of data with the preceding i rows of data, determining a target position of the (i+1)-th row of data in file block k, and inserting the (i+1)-th row of data at the target position to obtain the sorted file block k, wherein the transaction time of the (i+1)-th row of data at the target position in the sorted file block k is after the transaction time of the i-th row of data at the position immediately preceding the target position and before the transaction time of the (i+2)-th row of data at the position immediately following the target position;
sorting all the sorted file blocks of number i against one another, based on a multi-row matching sorting method, to obtain all sorted file blocks of number i; and
sorting the files numbered 1 to w against one another, based on the multi-row matching sorting method, to obtain the n sorted files.

4. The method according to claim 3, characterized in that the performing of data comparison on the n sorted files based on the difference comparison algorithm comprises:
exporting a database file i from a database according to the first-row transaction time field and the last-row transaction field of all the sorted file blocks of number i, wherein all the sorted file blocks of number i and the database file i have the same data partition identifier;
calculating a first hash value of all the sorted file blocks of number i based on the difference comparison algorithm;
calculating a second hash value of the database file i based on the difference comparison algorithm;
if the first hash value differs from the second hash value, determining that all the sorted file blocks of number i differ from the database file i;
based on a data matching algorithm, removing from the database file i the file blocks that are identical to file blocks among all the sorted file blocks of number i, and screening out a first difference file block in the database file i and a second difference file block among all the sorted file blocks of number i; and
determining difference information between the first difference file block in the database file i and the second difference file block among all the sorted file blocks of number i, and performing data comparison based on the difference information.

5. The method according to claim 4, characterized in that the removing, based on the data matching algorithm, from the database file i of the file blocks that are identical to file blocks among all the sorted file blocks of number i, and the screening out of the first difference file block in the database file i and the second difference file block among all the sorted file blocks of number i, comprises:
if the number of rows in all the sorted file blocks of number i differs from the number of rows of data contained in the database file i, removing the first-row data and the last-row data of all the sorted file blocks of number i and of the database file i at least once, to obtain the row-trimmed sorted file blocks of number i and the row-trimmed file block of the database file i, wherein the row-trimmed sorted file blocks of number i and the row-trimmed file block of the database file i have the same transaction time interval;
calculating a third hash value of the row-trimmed sorted file blocks of number i based on the difference comparison algorithm;
calculating a fourth hash value of the row-trimmed file block of the database file i based on the difference comparison algorithm;
if the third hash value differs from the fourth hash value, determining that the row-trimmed sorted file blocks of number i differ from the row-trimmed file block of the database file i;
based on the data matching algorithm, removing from the row-trimmed file block of the database file i the file blocks that are identical to file blocks among the row-trimmed sorted file blocks of number i, and screening out a third difference file block in the row-trimmed file block of the database file i and a fourth difference file block among the row-trimmed sorted file blocks of number i; and
removing the last-row data of the fourth difference file block of number i and of the third difference file block of the database file i at least once, and screening out the first difference file block in the database file i and the second difference file block of number i, wherein the first difference file block in the database file i and the second difference file block of number i have the same last-row transaction time.

6. The method according to claim 4 or 5, characterized in that the determining of difference information between the first difference file block in the database file i and the second difference file block among all the sorted file blocks of number i, and the performing of data comparison based on the difference information, comprises:
calculating a fifth hash value of the first difference file block in the database file i based on the difference comparison algorithm, and recording a first associative array of the database file i with the fifth hash value as the key and the data row of the first difference file block in the database file i as the value;
calculating a sixth hash value of the second difference file block of number i based on the difference comparison algorithm, and recording a second associative array of number i with the sixth hash value as the key and the data row of the second difference file block of number i as the value;
within each partition, comparing the keys of the first associative array of the database file i with the keys of the second associative array of number i, and removing the data rows whose keys are the same in both associative arrays, to obtain a third associative array of the database file i and a fourth associative array of number i, wherein the key in the third associative array of the database file i is the transaction serial number of each row of data and the value is the data row, and the key in the fourth associative array of number i is the transaction serial number of each row of data and the value is the data row; and
determining difference information between the third associative array of the database file i and the fourth associative array of number i, and performing data comparison based on the difference information.

7. The method according to claim 6, characterized in that the determining of difference information between the third associative array of the database file i and the fourth associative array of number i, and the performing of data comparison based on the difference information, comprises:
if the fourth associative array of number i contains a first key that does not exist in the third associative array of the database file i, determining that the difference information indicates that the reconciliation file has data that the database does not have, determining the corresponding data row in the fourth associative array of number i based on the first key, and adding that data row to the third associative array of the database file i;
if the fourth associative array of number i does not contain a second key that exists in the third associative array of the database file i, determining that the difference information indicates that the reconciliation file lacks data that the database has, determining the corresponding data row in the third associative array of the database file i based on the second key, and deleting that data row from the third associative array of the database file i; and
if the fourth associative array of number i contains a third key that also exists in the third associative array of the database file i, determining that the difference information indicates that transaction data exists in both the reconciliation file and the database and that the transaction data is inconsistent, and replacing the data row corresponding to the third key in the fourth associative array of number i with the data row corresponding to the third key in the fourth associative array of number i.

8. A data comparison device for a file, characterized by comprising:
a processing module, configured to split an obtained reconciliation file in equal proportions to obtain N split files;
the processing module being further configured to divide the N split files into M data partitions according to user identifiers of the transactions associated with the reconciliation file, wherein each of the M data partitions corresponds to one user identifier, and each data partition contains m subfiles;
the processing module being further configured to perform data cleaning and classification on the m subfiles in each data partition according to transaction type and transaction time information, and to split all the cleaned and classified files in equal proportions to obtain n files to be sorted;
the processing module being further configured to sort the n files to be sorted in the M data partitions according to the transaction time information to obtain n sorted files; and
a reconciliation module, configured to perform data comparison on the n sorted files based on a difference comparison algorithm.

9. Data comparison equipment for a file, characterized by comprising:
a memory, configured to store executable instructions; and a processor, configured to implement the method according to any one of claims 1 to 7 when executing the executable instructions stored in the memory.

10. A computer-readable storage medium, characterized by storing executable instructions which, when executed by a processor, cause the processor to implement the method according to any one of claims 1 to 7.
CN202110724780.4A 2021-06-29 2021-06-29 File data comparison method, device, equipment and storage medium Active CN113342750B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110724780.4A CN113342750B (en) 2021-06-29 2021-06-29 File data comparison method, device, equipment and storage medium
PCT/CN2021/140732 WO2023273235A1 (en) 2021-06-29 2021-12-23 Data comparison method, apparatus and device for file, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110724780.4A CN113342750B (en) 2021-06-29 2021-06-29 File data comparison method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113342750A true CN113342750A (en) 2021-09-03
CN113342750B CN113342750B (en) 2023-11-17

Family

ID=77481343

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110724780.4A Active CN113342750B (en) 2021-06-29 2021-06-29 File data comparison method, device, equipment and storage medium

Country Status (2)

Country Link
CN (1) CN113342750B (en)
WO (1) WO2023273235A1 (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113656654A (en) * 2021-10-19 2021-11-16 云丁网络技术(北京)有限公司 Method, device and system for adding equipment
CN113837878A (en) * 2021-09-07 2021-12-24 中国银联股份有限公司 Data comparison method, device, equipment and storage medium
CN113886332A (en) * 2021-12-09 2022-01-04 广东睿江云计算股份有限公司 A large file difference comparison method, device, computer equipment and storage medium
CN114297193A (en) * 2021-12-10 2022-04-08 武汉达梦数据库股份有限公司 HASH-based data comparison method and device
CN114363321A (en) * 2021-12-30 2022-04-15 支付宝(杭州)信息技术有限公司 File transmission method, equipment and system
WO2023273235A1 (en) * 2021-06-29 2023-01-05 深圳前海微众银行股份有限公司 Data comparison method, apparatus and device for file, and storage medium

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116702024B (en) * 2023-05-16 2024-05-28 见知数据科技(上海)有限公司 Method, device, computer equipment and storage medium for identifying type of stream data
CN116308850B (en) * 2023-05-19 2023-09-05 深圳市四格互联信息技术有限公司 Account checking method, account checking system, account checking server and storage medium
CN116910631B (en) * 2023-09-14 2024-01-05 深圳市智慧城市科技发展集团有限公司 Array comparison method, device, electronic equipment and readable storage medium
CN117762873B (en) * 2023-12-20 2024-09-06 中邮消费金融有限公司 Data processing method, device, equipment and storage medium
CN118394849B (en) * 2024-06-26 2024-09-20 杭州古珀医疗科技有限公司 Method and device for comparing difference of full-scale data in medical field

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10586019B1 (en) * 2014-01-24 2020-03-10 The Pnc Financial Services Group, Inc. Automated healthcare cash account reconciliation method
CN111325617A (en) * 2020-01-22 2020-06-23 北京开科唯识技术有限公司 File-based account checking method and device, computer equipment and readable storage medium
CN112037003A (en) * 2020-09-17 2020-12-04 中国银行股份有限公司 File account checking processing method and device
CN112613964A (en) * 2020-12-25 2021-04-06 深圳鼎盛电脑科技有限公司 Account checking method, account checking device, account checking equipment and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113342750B (en) * 2021-06-29 2023-11-17 深圳前海微众银行股份有限公司 File data comparison method, device, equipment and storage medium

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023273235A1 (en) * 2021-06-29 2023-01-05 深圳前海微众银行股份有限公司 Data comparison method, apparatus and device for file, and storage medium
CN113837878A (en) * 2021-09-07 2021-12-24 中国银联股份有限公司 Data comparison method, device, equipment and storage medium
CN113837878B (en) * 2021-09-07 2024-05-03 中国银联股份有限公司 Data comparison method, device, equipment and storage medium
CN113656654A (en) * 2021-10-19 2021-11-16 云丁网络技术(北京)有限公司 Method, device and system for adding equipment
CN113656654B (en) * 2021-10-19 2022-05-10 云丁网络技术(北京)有限公司 Method, device and system for adding equipment
CN113886332A (en) * 2021-12-09 2022-01-04 广东睿江云计算股份有限公司 A large file difference comparison method, device, computer equipment and storage medium
CN113886332B (en) * 2021-12-09 2022-02-08 广东睿江云计算股份有限公司 A large file difference comparison method, device, computer equipment and storage medium
CN114297193A (en) * 2021-12-10 2022-04-08 武汉达梦数据库股份有限公司 HASH-based data comparison method and device
CN114363321A (en) * 2021-12-30 2022-04-15 支付宝(杭州)信息技术有限公司 File transmission method, equipment and system

Also Published As

Publication number Publication date
WO2023273235A1 (en) 2023-01-05
CN113342750B (en) 2023-11-17

Similar Documents

Publication Publication Date Title
CN113342750A (en) File data comparison method, device, equipment and storage medium
US20170109378A1 (en) Distributed pipeline optimization for data preparation
US10642814B2 (en) Signature-based cache optimization for data preparation
CN101553813A (en) Managing storage of individually accessible data units
CN108205571B (en) Key value data table connection method and device
US20070239663A1 (en) Parallel processing of count distinct values
WO2020211466A1 (en) Non-redundant gene clustering method and system, and electronic device
CN110851515B (en) A large data ETL model execution method and medium based on Spark distributed environment
US20200278980A1 (en) Database processing apparatus, group map file generating method, and recording medium
KR102425595B1 (en) System for performing searching and analysis based on in-memory computing for real-time data processing, analysis method, and computer program
CN116821053B (en) Data reporting methods, devices, computer equipment and storage media
WO2004038582A1 (en) Data processing method and data processing program
US11675751B2 (en) Systems and methods for capturing data schema for databases during data insertion
US11016978B2 (en) Joiner for distributed databases
CN116010345A (en) Method, device and equipment for realizing table service scheme of flow batch integrated data lake
CN108256284A (en) Drug virtual screening method
CN114996240B (en) Data table verification method, device, equipment, storage medium and program
Hagan et al. SparkDWM: a scalable design of a Data Washing Machine using Apache Spark
CN114237977B (en) Transaction rollback method, apparatus, computer device and storage medium
US11126401B2 (en) Pluggable sorting for distributed databases
CN117520306B (en) Data verification method and device and electronic equipment
CN117649240B (en) Suspicious account identification method, system, device, storage medium and program product
US11288447B2 (en) Step editor for data preparation
CN109542900B (en) Data processing method and device
CN118035505A (en) Data sampling method and related equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant