CN106776795B - Data writing method and device based on Hbase database - Google Patents

Info

Publication number
CN106776795B
Authority
CN
China
Prior art keywords
data
primary key
thread
written
database
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201611047256.3A
Other languages
Chinese (zh)
Other versions
CN106776795A (en
Inventor
黄健文
王刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to CN201611047256.3A
Publication of CN106776795A
Application granted
Publication of CN106776795B
Legal status: Expired - Fee Related
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a data writing method and device based on an Hbase database. The method comprises: taking the acquired thread identification code and the generated row primary key list as reference data; writing the data records stored in the cache memory, the row primary key values corresponding to the stored data records, the stored thread identification code and the stored row primary key list into the distributed file system of the database; after the writing is completed, taking the thread identification code and the row primary key list stored in the cache memory as the data to be compared; and comparing the reference data with the data to be compared. If the comparison result shows that a data record has not been written into the distributed file system of the database, the file to be written is written into the database again, which ensures the integrity of data storage. Compared with recording in a log file, the invention uses the row primary key list for comparison, which occupies very few system resources and therefore does not affect data storage efficiency.

Figure 201611047256

Description

Data writing method and device based on Hbase database
Technical Field
The invention belongs to the field of data storage, and particularly relates to a data writing method and device based on an Hbase database.
Background
At present, cloud storage systems adopt distributed storage, in which data is dispersed across a plurality of independent devices. On the one hand, this improves database performance and data reading efficiency; on the other hand, because of the distributed storage structure, a failure of one storage device affects only access to local data and does not paralyze the whole database, which further improves the safety and reliability of big data. The Hadoop Database (HBase) is such a distributed storage system. Although the HBase database can keep access to the rest of the data unaffected when a storage device fails, it cannot prevent a failure from occurring during the data writing process, after which the target data cannot be queried through an index.
In the prior art, write-ahead logging (WAL) is a standard method for ensuring data integrity. When the database crashes, it is restored from the log previously stored by the WAL. Because the log must record every storage operation, it occupies a large amount of the system's storage and I/O resources, and as the amount of stored data grows, the efficiency of data storage is reduced.
Disclosure of Invention
The invention provides a data writing method and device based on an Hbase database, and aims to solve the problem in the prior art that pre-stored logs occupy a large amount of system resources and thereby reduce data storage efficiency.
The invention provides a data writing method based on an Hbase database, which comprises the following steps: acquiring, from a thread, a data record corresponding to a file to be written, a row primary key value corresponding to the data record and an identification code of the thread, generating a row primary key list containing the correspondence between the data record and the row primary key value, and taking the acquired identification code of the thread and the generated row primary key list as reference data; writing the acquired data record, the row primary key value corresponding to the acquired data record, the acquired identification code of the thread and the generated row primary key list into a cache memory in a database; writing the data records stored in the cache memory, the row primary key values corresponding to the stored data records, the stored identification code of the thread and the stored row primary key list into a distributed file system in the database, and, after the writing is finished, taking the identification code of the thread and the row primary key list stored in the cache memory as data to be compared; and comparing the reference data with the data to be compared, and if the comparison result shows that a data record has not been written into the distributed file system of the database, writing the file to be written into the database again.
The invention provides a data writing device based on an Hbase database, which comprises: an obtaining module, configured to obtain, from a thread, a data record corresponding to a file to be written, a row primary key value corresponding to the data record, and an identification code of the thread, generate a row primary key list including a correspondence between the data record and the row primary key value, and use the obtained identification code of the thread and the generated row primary key list as reference data; a processing module, configured to write the obtained data record, the row primary key value corresponding to the obtained data record, the identification code of the obtained thread, and the generated row primary key list into a cache memory in a database; the processing module is further configured to write the data records stored in the cache memory, the row primary key values corresponding to the stored data records, the identification codes of the stored threads, and the stored row primary key list into the distributed file system in the database, and after the writing is completed, take the identification codes of the threads and the row primary key list stored in the cache memory as data to be compared; the processing module is further configured to compare the reference data with the data to be compared, and if the comparison result indicates that a data record does not exist in the distributed file system of the database, write the file to be written into the database again.
The data writing method and device based on the Hbase database provided by the invention acquire, from a thread, the data record corresponding to the file to be written, the row primary key value corresponding to the data record and the identification code of the thread, generate a row primary key list containing the correspondence between the data record and the row primary key value, and take the acquired identification code of the thread and the generated row primary key list as reference data. The acquired data record, its row primary key value, the identification code of the thread and the row primary key list are written into a cache memory in the database, and the contents of the cache memory are then written into a distributed file system in the database. After the writing is finished, the identification code of the thread and the row primary key list stored in the cache memory are taken as data to be compared, and the reference data is compared with the data to be compared. If the comparison result shows that a data record has not been written into the distributed file system of the database, the file to be written is written into the database again. In this way, after each write, the comparison determines whether all the data has been written into the database, which ensures the integrity of data storage.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention.
Fig. 1 is a schematic flow chart of an implementation of a data writing method based on an Hbase database according to a first embodiment of the present invention;
Fig. 2 is a schematic flow chart of an implementation of a data writing method based on an Hbase database according to a second embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a data writing device based on an Hbase database according to a third embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a data writing device based on an Hbase database according to a fourth embodiment of the present invention.
Detailed Description
In order to make the objects, features and advantages of the present invention more obvious and understandable, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, fig. 1 is a schematic flow chart of a data writing method based on an Hbase database according to a first embodiment of the present invention, which can be applied to a terminal with a data processing function, such as a computer, and the data writing method based on the Hbase database shown in fig. 1 mainly includes the following steps:
S101, acquiring, from a thread, a data record corresponding to a file to be written, a row primary key value corresponding to the data record and an identification code of the thread, generating a row primary key list containing the correspondence between the data record and the row primary key value, and taking the acquired identification code of the thread and the generated row primary key list as reference data.
The Hbase database includes a plurality of threads, which are used for allocation and scheduling. Each thread has its own identification code, i.e., the thread ID. In practical applications, a file to be written may be divided into a plurality of data records; most of the data records in a file to be written are allocated to one thread, although they may also be spread over a plurality of threads. Each data record corresponds to one row primary key value (rowkey). The row primary key list contains the correspondences between the data records and their row primary key values, and the list itself corresponds to the obtained thread ID.
Optionally, the row primary key list may also be associated with the file to be written: the correspondence between the row primary key list and the file to be written is stored, and the thread is notified to store this correspondence as well.
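The reference data of S101 can be pictured as a small immutable value holding the thread ID and the row primary key list. The following is a minimal sketch only; the names `ReferenceData` and `build_reference_data`, the string types and the sorting step are assumptions made for illustration and do not come from the patent text.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class ReferenceData:
    """Thread ID plus row primary key list, kept as the comparison baseline."""
    thread_id: str
    rowkey_list: Tuple[Tuple[str, str], ...]  # (rowkey, data record) pairs


def build_reference_data(thread_id: str, records: Dict[str, str]) -> ReferenceData:
    """records maps each rowkey to its data record, as obtained from the thread."""
    # Assumption: sort by rowkey so two lists with the same content compare equal.
    pairs = tuple(sorted(records.items()))
    return ReferenceData(thread_id=thread_id, rowkey_list=pairs)


reference = build_reference_data("thread-7", {"rk-b": "record B", "rk-a": "record A"})
```

Freezing the value is a design choice of the sketch: the baseline should not change between the moment it is taken and the later comparison.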
S102, writing the acquired data record, the row primary key value corresponding to the acquired data record, the identification code of the acquired thread and the generated row primary key list into a cache memory in a database.
In practical applications, the acquired data record, the row primary key value corresponding to the acquired data record, the acquired identification code of the thread and the generated row primary key list are written into the MemStore memory.
S103, writing the data records stored in the cache memory, the row primary key values corresponding to the stored data records, the stored identification code of the thread and the stored row primary key list into a distributed file system in the database, and, after the writing is finished, taking the identification code of the thread and the row primary key list stored in the cache memory as data to be compared.
Here, the data stored in the cache memory in S102 is written into an HFile in the distributed file system of the HBase database.
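Steps S102 and S103 can be sketched as staging rows in memory and then flushing them to a file. This is a simplified model under stated assumptions: `MemStoreSketch` is a hypothetical stand-in, not the HBase MemStore class, the HFile is simulated by a plain dictionary, and the snapshot returned after the flush is assumed to list only the rowkeys that actually reached the file.

```python
from typing import Dict, List, Tuple


class MemStoreSketch:
    """In-memory staging area; a hypothetical stand-in for the MemStore."""

    def __init__(self) -> None:
        self.thread_id: str = ""
        self.rows: Dict[str, str] = {}  # rowkey -> data record

    def put(self, thread_id: str, rows: Dict[str, str]) -> None:
        """S102: stage the thread ID and the rows in memory."""
        self.thread_id = thread_id
        self.rows.update(rows)

    def flush(self, hfile: Dict[str, str]) -> Tuple[str, List[str]]:
        """S103: copy staged rows into the simulated HFile, then return the
        thread ID and the rowkeys found in the file as the data to be compared."""
        hfile.update(self.rows)
        written = sorted(k for k in self.rows if k in hfile)
        return self.thread_id, written


hfile: Dict[str, str] = {}
memstore = MemStoreSketch()
memstore.put("thread-7", {"rk-a": "record A", "rk-b": "record B"})
to_compare = memstore.flush(hfile)
```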
S104, comparing the reference data with the data to be compared, and if the comparison result shows that a data record has not been written into the distributed file system of the database, writing the file to be written into the database again.
The purpose of the comparison is to ensure the integrity of the data written to the distributed file system. If the comparison result indicates that a data record has not been written into the distributed file system of the database, the file to be written is written into the database again, that is, steps S101 to S104 are executed again.
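The repeat-until-complete behaviour of S101 to S104 can be sketched as a loop around a single write pass. The function `write_with_verification`, the `write_once` callback and the simulated `flaky_writer` are hypothetical; the `max_attempts` bound is an added safeguard of this sketch, since the text itself only says the steps are executed again.

```python
from typing import Callable, Dict, List


def write_with_verification(
    thread_id: str,
    rows: Dict[str, str],
    write_once: Callable[[Dict[str, str]], List[str]],
    max_attempts: int = 3,
) -> int:
    """Repeat write-then-compare until every rowkey is confirmed.

    write_once performs one write pass and returns the rowkeys it persisted.
    Returns the number of attempts used; raises if max_attempts is exhausted.
    """
    reference = (thread_id, sorted(rows))            # reference data (S101)
    for attempt in range(1, max_attempts + 1):
        persisted = write_once(rows)                 # cache + file write (S102, S103)
        to_compare = (thread_id, sorted(persisted))  # data to be compared
        if to_compare == reference:                  # comparison (S104)
            return attempt
    raise RuntimeError("rows still missing after %d attempts" % max_attempts)


# Simulated writer that drops one row on its first pass only.
calls = {"n": 0}


def flaky_writer(rows: Dict[str, str]) -> List[str]:
    calls["n"] += 1
    keys = list(rows)
    return keys[:-1] if calls["n"] == 1 else keys


attempts = write_with_verification("thread-7", {"rk-a": "A", "rk-b": "B"}, flaky_writer)
```

In this run the first pass loses a row, the comparison detects the difference, and the second pass completes the write.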
In the embodiment of the invention, a data record corresponding to a file to be written, a row primary key value corresponding to the data record and an identification code of the thread are obtained from a thread, a row primary key list containing the correspondence between the data record and the row primary key value is generated, and the obtained identification code of the thread and the generated row primary key list are taken as reference data. The obtained data record, its row primary key value, the identification code of the thread and the row primary key list are written into a cache memory in the database, and the contents of the cache memory are then written into a distributed file system in the database. After the writing is finished, the identification code of the thread and the row primary key list stored in the cache memory are taken as data to be compared, and the reference data is compared with the data to be compared. If the comparison result shows that a data record has not been written into the distributed file system of the database, the file to be written is written into the database again. In this way, after each write, the comparison determines whether all the data has been written into the database, which ensures the integrity of data storage.
Referring to fig. 2, fig. 2 is a schematic flow chart of a data writing method based on an Hbase database according to a second embodiment of the present invention, which can be applied to a terminal with a data processing function, such as a computer, and the data writing method based on the Hbase database shown in fig. 2 mainly includes the following steps:
S201, sending the data records in the file to be written to a thread, so that the thread generates a corresponding row primary key value for each data record it receives.
One data record corresponds to one row primary key value. In practical applications, a file to be written may be divided into a plurality of data records; most of the data records in a file to be written are allocated to one thread, although they may also be spread over a plurality of threads. In practical applications, the thread generates the rowkey value with a hash algorithm.
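A hash-based rowkey can be sketched as follows. The text only says that a hash algorithm is used; the choice of SHA-256, hashing the record content itself, and the name `make_rowkey` are assumptions of this illustration.

```python
import hashlib


def make_rowkey(record: bytes) -> str:
    """Derive a rowkey from the record content with SHA-256 (an assumed choice)."""
    return hashlib.sha256(record).hexdigest()


key_first = make_rowkey(b"sensor-42,2016-11-23,17.5")
key_again = make_rowkey(b"sensor-42,2016-11-23,17.5")
key_other = make_rowkey(b"sensor-43,2016-11-23,18.1")
```

Because the key depends only on the input bytes, hashing the same record twice gives the same rowkey, while a different record gives a different one.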
S202, acquiring a data record corresponding to a file to be written, a row primary key value corresponding to the data record and an identification code of the thread from the thread, generating a row primary key list containing the corresponding relation between the data record and the row primary key value, and taking the acquired identification code of the thread and the generated row primary key list as reference data.
The row primary key list includes a plurality of corresponding relations between the data records and the row primary key values, wherein the row primary key list corresponds to the obtained thread ID.
Optionally, the thread may be further notified to store the corresponding relationship between the row primary key list and the file to be written.
S203, writing the obtained data record, the row primary key value corresponding to the obtained data record, the identification code of the obtained thread, and the generated row primary key list into a cache memory in the database.
In practical applications, the acquired data record, the row primary key value corresponding to the acquired data record, the acquired identification code of the thread and the generated row primary key list are written into the MemStore memory.
S204, writing the data records stored in the cache memory, the row primary key values corresponding to the stored data records, the stored identification code of the thread and the stored row primary key list into a distributed file system in the database, and, after the writing is finished, taking the identification code of the thread and the row primary key list stored in the cache memory as data to be compared.
Here, the data stored in the cache memory in S203 is written into an HFile in the distributed file system of the HBase database. In practical applications, the data records stored in the cache memory, the row primary key values corresponding to the stored data records, the stored identification code of the thread and the stored row primary key list are first forwarded to the HStore of the HBase database and then written into the HFile by way of a commit. After the HFile has been written, the identification code of the thread and the row primary key list in the cache memory are saved, as the data to be compared, on the server where the cache memory is located.
S205, comparing the reference data with the data to be compared, and if the comparison result shows that a data record has not been written into the distributed file system of the database, writing the file to be written into the database again.
The purpose of the comparison is to ensure the integrity of the data written to the distributed file system. If the comparison result indicates that a data record has not been written into the distributed file system of the database, the file to be written is written into the database again, that is, steps S201 to S205 are executed again.
Optionally, comparing the reference data with the data to be compared specifically includes:
judging whether the identification code of the thread in the data to be compared is consistent with the identification code of the thread in the reference data;
if the identification codes are consistent, comparing the row primary key list in the data to be compared with the row primary key list in the reference data;
if the row primary key list in the data to be compared is completely consistent with the row primary key list in the reference data, determining that the comparison result is that no data record remains unwritten, i.e., every data record has been written into the distributed file system of the database;
and if the row primary key list in the data to be compared is inconsistent with the row primary key list in the reference data, determining that the comparison result is that a data record has not been written into the distributed file system of the database.
That is, the thread IDs in the reference data and in the data to be compared are checked first, and only when the thread IDs are consistent are the row primary key lists in the reference data and in the data to be compared checked for consistency.
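The two-stage comparison described above can be sketched as a small function. The names `Snapshot`, `CompareResult` and `compare` are hypothetical; the `THREAD_MISMATCH` outcome in particular is an assumption, because the text does not say what happens when the thread IDs differ.

```python
from enum import Enum
from typing import List, Tuple

Snapshot = Tuple[str, List[str]]  # (thread ID, row primary key list)


class CompareResult(Enum):
    COMPLETE = "all records written"
    MISSING = "some record not written"
    THREAD_MISMATCH = "snapshots belong to different threads"  # assumed outcome


def compare(reference: Snapshot, to_compare: Snapshot) -> CompareResult:
    ref_thread, ref_keys = reference
    cmp_thread, cmp_keys = to_compare
    # Stage 1: the thread IDs must agree before the lists are examined.
    if ref_thread != cmp_thread:
        return CompareResult.THREAD_MISMATCH
    # Stage 2: the row primary key lists must be completely consistent.
    if sorted(ref_keys) == sorted(cmp_keys):
        return CompareResult.COMPLETE
    return CompareResult.MISSING


complete = compare(("t1", ["rk-a", "rk-b"]), ("t1", ["rk-b", "rk-a"]))
missing = compare(("t1", ["rk-a", "rk-b"]), ("t1", ["rk-a"]))
mismatch = compare(("t1", ["rk-a"]), ("t2", ["rk-a"]))
```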
Optionally, if the comparison result indicates that a data record has not been written into the distributed file system of the database, writing the file to be written into the database again specifically includes:
if the comparison result is that a data record has not been written into the distributed file system of the database, looking up, in the row primary key list of the reference data, the row primary key values that are missing from the row primary key list of the data to be compared;
acquiring from the thread, according to the identification code of the thread in the reference data or in the data to be compared, the row primary key list in which the missing row primary key values are located and the file to be written corresponding to that row primary key list, wherein the thread stores the correspondence between the row primary key list and the file to be written;
and writing the data records corresponding to the file to be written into the database again, and comparing the reference data with the data to be compared again, until the comparison result is that every data record has been written into the distributed file system of the database.
That is, the row primary key list in which the missing row primary key values are located is determined first, and the thread ID is found through that list. The file to be written that needs to be rewritten is then found through the correspondence, stored in the thread, between the row primary key list and the file to be written. Steps S201 to S205 are then executed again until the comparison result is that every data record has been written into the distributed file system of the database, so that data loss is avoided and the data is stored completely in the database.
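The lookup of missing row primary key values and of the file to be rewritten can be sketched with two helpers. The function names, the dictionary `thread_mappings` and the example file names are hypothetical; the mapping layout (thread ID, then rowkey list, then file) is an assumption about how a thread might store the correspondence.

```python
from typing import Dict, List, Tuple


def missing_rowkeys(reference_keys: List[str], compared_keys: List[str]) -> List[str]:
    """Rowkeys present in the reference list but absent from the compared list."""
    present = set(compared_keys)
    return [k for k in reference_keys if k not in present]


def files_to_rewrite(
    list_to_file: Dict[Tuple[str, ...], str], missing: List[str]
) -> List[str]:
    """list_to_file is the mapping a thread keeps: rowkey list -> file to be written."""
    wanted = set(missing)
    return sorted({f for keys, f in list_to_file.items() if wanted.intersection(keys)})


# Correspondences stored per thread (thread ID -> rowkey list -> file).
thread_mappings = {
    "thread-7": {
        ("rk-a", "rk-b", "rk-c"): "orders.csv",
        ("rk-x", "rk-y"): "users.csv",
    }
}

missing = missing_rowkeys(["rk-a", "rk-b", "rk-c"], ["rk-a", "rk-c"])
targets = files_to_rewrite(thread_mappings["thread-7"], missing)
```

Only the file whose row primary key list contains a missing key is selected for rewriting; the other file is left alone.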
It should be noted that, for the same file to be written that is rewritten, the respective generated rowkey values are the same, and at the same time, it can also be understood that the row primary key lists are the same, so that the same data records can be prevented from being repeatedly stored.
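Why identical rowkeys prevent duplicate storage can be sketched with a keyed store. This is a simplified model: the store is a plain dictionary, `write_file` is a hypothetical helper, and SHA-256 over the record text is an assumed hash. HBase itself keeps cell versions per rowkey, so the dictionary only models the logical view in which one rowkey yields one current record.

```python
import hashlib
from typing import Dict, List


def write_file(store: Dict[str, str], records: List[str]) -> None:
    """Write every record under a rowkey derived from its content."""
    for record in records:
        rowkey = hashlib.sha256(record.encode("utf-8")).hexdigest()
        store[rowkey] = record  # same rowkey: the entry is replaced, not duplicated


store: Dict[str, str] = {}
write_file(store, ["row 1", "row 2"])
size_after_first = len(store)
write_file(store, ["row 1", "row 2"])  # rewrite of the same file
size_after_second = len(store)
```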
Optionally, after comparing the reference data with the data to be compared, the method further includes:
if the comparison result is that every data record has been written into the distributed file system of the database, deleting the reference data and the data to be compared;
and sending deletion prompt information to the thread according to the identification code of the thread in the reference data or in the data to be compared, wherein the deletion prompt information is used for prompting the thread to delete the correspondence, stored in the thread, between the row primary key list and the file to be written.
If the comparison result is that every data record has been written into the distributed file system of the database, the reference data and the data to be compared are deleted, and the thread is notified to delete the stored correspondence between the row primary key list and the file to be written, so that part of the storage space is released and system resources are optimized.
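The cleanup after a successful comparison can be sketched as follows. `ThreadSketch`, `on_delete_prompt` and `cleanup_after_success` are hypothetical names, and holding both snapshots in one dictionary keyed by thread ID is an assumption of this illustration.

```python
from typing import Dict, List, Tuple


class ThreadSketch:
    """Hypothetical thread holding the rowkey-list-to-file correspondences."""

    def __init__(self, thread_id: str) -> None:
        self.thread_id = thread_id
        self.list_to_file: Dict[Tuple[str, ...], str] = {}

    def on_delete_prompt(self, rowkey_list: List[str]) -> None:
        """Drop the stored correspondence for this rowkey list, if present."""
        self.list_to_file.pop(tuple(rowkey_list), None)


def cleanup_after_success(
    snapshots: Dict[str, dict], thread: ThreadSketch, rowkey_list: List[str]
) -> None:
    # Delete the reference data and the data to be compared for this thread.
    snapshots.pop(thread.thread_id, None)
    # Prompt the thread, found by its ID, to delete its stored correspondence.
    thread.on_delete_prompt(rowkey_list)


keys = ["rk-a", "rk-b"]
snapshots = {"thread-7": {"reference": keys, "to_compare": keys}}
thread = ThreadSketch("thread-7")
thread.list_to_file[tuple(keys)] = "orders.csv"
cleanup_after_success(snapshots, thread, keys)
```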
In the embodiment of the invention, the data records in the file to be written are sent to a thread, so that the thread generates a corresponding row primary key value for each data record it receives. The data record corresponding to the file to be written, the row primary key value corresponding to the data record and the identification code of the thread are obtained from the thread, a row primary key list containing the correspondence between the data record and the row primary key value is generated, and the obtained identification code of the thread and the generated row primary key list are taken as reference data. The obtained data record, its row primary key value, the identification code of the thread and the row primary key list are written into a cache memory in the database, and the contents of the cache memory are then written into a distributed file system in the database. After the writing is finished, the identification code of the thread and the row primary key list stored in the cache memory are taken as data to be compared, and the reference data is compared with the data to be compared. If the comparison result shows that a data record has not been written into the distributed file system of the database, the file to be written is written into the database again. In this way, after each write, the comparison determines whether all the data has been written into the database, which ensures the integrity of data storage.
Referring to fig. 3, fig. 3 is a schematic structural diagram of a data writing apparatus based on an Hbase database according to a third embodiment of the present invention, and for convenience of description, only the parts related to the embodiment of the present invention are shown. The Hbase database-based data writing apparatus illustrated in fig. 3 may be an implementation subject of the Hbase database-based data writing method provided in the embodiments illustrated in fig. 1 and fig. 2. The data writing device based on the Hbase database illustrated in fig. 3 mainly includes: an acquisition module 301 and a processing module 302. The above functional modules are described in detail as follows:
an obtaining module 301, configured to obtain, from a thread, a data record corresponding to a file to be written, a row primary key value corresponding to the data record, and an identification code of the thread, generate a row primary key list including a correspondence between the data record and the row primary key value, and use the obtained identification code of the thread and the generated row primary key list as reference data;
a processing module 302, configured to write the obtained data record, the row primary key value corresponding to the obtained data record, the identification code of the obtained thread, and the generated row primary key list into a cache memory in a database;
the processing module 302 is further configured to write the data records stored in the cache memory, the row primary key values corresponding to the stored data records, the identification codes of the stored threads, and the stored row primary key list into the distributed file system in the database, and after the writing is completed, take the identification codes of the threads and the row primary key list stored in the cache memory as data to be compared;
the processing module 302 is further configured to compare the reference data with the data to be compared, and if the comparison result indicates that a data record has not been written into the distributed file system of the database, write the file to be written into the database again.
Each thread corresponds to an identification code of the thread, i.e., an ID of the thread. A file to be written may be divided into a plurality of data records, and most of the data records in a file to be written may be allocated to one thread, but there is also a possibility of being allocated to a plurality of threads. The row primary key list includes a plurality of corresponding relations between the data records and the row primary key values, wherein the row primary key list corresponds to the obtained thread ID. Optionally, the row primary key list may also be corresponding to the file to be written, the correspondence between the row primary key list and the file to be written is stored, and the thread is notified to store the correspondence between the row primary key list and the file to be written. The cache memory of the Hbase database is MemStore.
The purpose of the comparison is to ensure the integrity of the data written to the distributed file system. If the comparison result indicates that a data record has not been written into the distributed file system of the database, the processing module 302 writes the file to be written into the database again.
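The division into an obtaining module and a processing module can be sketched as two small classes. This is an illustration only: the class and method names are hypothetical, the cache and the distributed file system are simulated by dictionaries, and the snapshots are reduced to a thread ID plus a sorted rowkey list.

```python
from typing import Dict, List, Tuple

Snapshot = Tuple[str, List[str]]  # (thread ID, row primary key list)


class AcquisitionModule:
    """Plays the role of module 301: builds the reference data."""

    def obtain(self, thread_id: str, rows: Dict[str, str]) -> Snapshot:
        return thread_id, sorted(rows)


class ProcessingModule:
    """Plays the role of module 302: cache write, file-system write, comparison."""

    def __init__(self) -> None:
        self.cache: Dict[str, str] = {}
        self.file_system: Dict[str, str] = {}
        self.cached_thread_id = ""

    def write_to_cache(self, thread_id: str, rows: Dict[str, str]) -> None:
        self.cached_thread_id = thread_id
        self.cache = dict(rows)

    def write_to_file_system(self) -> Snapshot:
        """Write the cached rows out and return the data to be compared."""
        self.file_system.update(self.cache)
        written = sorted(k for k in self.cache if k in self.file_system)
        return self.cached_thread_id, written

    def needs_rewrite(self, reference: Snapshot, to_compare: Snapshot) -> bool:
        return reference != to_compare


rows = {"rk-a": "record A", "rk-b": "record B"}
acquisition, processing = AcquisitionModule(), ProcessingModule()
reference = acquisition.obtain("thread-7", rows)
processing.write_to_cache("thread-7", rows)
to_compare = processing.write_to_file_system()
rewrite_needed = processing.needs_rewrite(reference, to_compare)
```

In this run nothing is lost on the way to the simulated file system, so no rewrite is requested.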
For details that are not described in the present embodiment, please refer to the description of the embodiment shown in fig. 1, which is not described herein again.
It should be noted that, in the embodiment of the data writing device based on the Hbase database illustrated in fig. 3, the division of the functional modules is only an example, and in practical applications, the above functions may be allocated by different functional modules according to needs, for example, configuration requirements of corresponding hardware or convenience of implementation of software, that is, the internal structure of the data writing device based on the Hbase database is divided into different functional modules to complete all or part of the above described functions. In addition, in practical applications, the corresponding functional modules in this embodiment may be implemented by corresponding hardware, or may be implemented by corresponding hardware executing corresponding software. The above description principles can be applied to various embodiments provided in the present specification, and are not described in detail below.
In the embodiment of the present invention, the obtaining module 301 obtains, from a thread, a data record corresponding to a file to be written, a row primary key value corresponding to the data record and an identification code of the thread, generates a row primary key list containing the correspondence between the data record and the row primary key value, and takes the obtained identification code of the thread and the generated row primary key list as reference data. The processing module 302 writes the obtained data record, its row primary key value, the identification code of the thread and the row primary key list into a cache memory in the database, and then writes the contents of the cache memory into a distributed file system in the database. After the writing is completed, the identification code of the thread and the row primary key list stored in the cache memory are taken as data to be compared, and the processing module 302 compares the reference data with the data to be compared. If the comparison result shows that a data record has not been written into the distributed file system of the database, the file to be written is written into the database again. In this way, after each write, the comparison determines whether all the data has been written into the database, which ensures the integrity of data storage.
Referring to fig. 4, fig. 4 is a schematic structural diagram of a data writing device based on an Hbase database according to a fourth embodiment of the present invention; for convenience of description, only the parts related to the embodiment of the present invention are shown. The Hbase database-based data writing device illustrated in fig. 4 may be the execution subject of the Hbase database-based data writing method provided in the embodiments illustrated in fig. 1 and fig. 2. The device mainly includes: a sending module 401, an obtaining module 402, and a processing module 403, where the processing module 403 includes a comparison sub-module 4031; the processing module 403 further includes a searching sub-module 4032, an obtaining sub-module 4033, and a resetting sub-module 4034. The functional modules are described in detail as follows:
A sending module 401, configured to send the data records in the file to be written to a thread, so that the thread generates a corresponding row primary key value for each data record it receives;
An obtaining module 402, configured to obtain, from a thread, a data record corresponding to a file to be written, a row primary key value corresponding to the data record, and an identification code of the thread, generate a row primary key list including a correspondence between the data record and the row primary key value, and use the obtained identification code of the thread and the generated row primary key list as reference data;
a processing module 403, configured to write the obtained data record, the row primary key value corresponding to the obtained data record, the identification code of the obtained thread, and the generated row primary key list into a cache memory in a database;
the processing module 403 is further configured to write the data records stored in the cache memory, the row primary key values corresponding to the stored data records, the identification codes of the stored threads, and the stored row primary key list into the distributed file system in the database, and after the writing is completed, take the identification codes of the threads and the row primary key list stored in the cache memory as data to be compared;
the processing module 403 is further configured to compare the reference data with the data to be compared, and if the comparison result indicates that a data record has not been written into the distributed file system of the database, write the file to be written into the database again.
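The module responsibilities listed above can be sketched end to end in a few lines of Python. This is only an illustration under stated assumptions, not the patented implementation: `WriteBatch`, `write_and_check`, and the `flush` callable are hypothetical names, plain dictionaries stand in for the cache memory and the distributed file system, and the data to be compared is assumed to list only the rowkeys that actually reached the store.

```python
from dataclasses import dataclass, field


@dataclass
class WriteBatch:
    """Thread ID plus the row primary key list (rowkey -> data record)."""
    thread_id: str
    row_keys: dict = field(default_factory=dict)


def write_and_check(batch, flush):
    """Write a batch through a cache into a store, then compare.

    `flush` is a hypothetical callable standing in for the cache-to-DFS
    write; it returns the dict of entries that actually reached the store.
    Returns True when nothing is missing, False when a rewrite is needed.
    """
    # Reference data: thread ID and the row primary key list as obtained.
    reference = (batch.thread_id, set(batch.row_keys))

    # Step 1: write everything into the in-memory cache.
    cache = {"thread_id": batch.thread_id, "rows": dict(batch.row_keys)}

    # Step 2: flush the cache to the (simulated) distributed file system.
    stored = flush(cache["rows"])

    # Step 3: data to be compared, assumed to list only flushed rowkeys.
    to_compare = (cache["thread_id"], set(stored))

    # Step 4: a mismatch means the file must be written again.
    return reference == to_compare


def lossy_flush(rows):
    """Test double that silently drops the row with key 'k2'."""
    return {k: v for k, v in rows.items() if k != "k2"}


batch = WriteBatch("thread-1", {"k1": "rec-a", "k2": "rec-b"})
complete = write_and_check(batch, lambda rows: dict(rows))
incomplete = write_and_check(batch, lossy_flush)
```

Passing the flush step in as a callable keeps the sketch independent of any real HBase client and makes a partial write easy to simulate.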
One data record corresponds to one row primary key value. In practical applications, a file to be written may be divided into a plurality of data records; the data records of a file to be written are mostly allocated to one thread, but may also be allocated to a plurality of threads. In practical applications, the thread generates the rowkey value by a hash algorithm. The row primary key list includes a plurality of correspondences between data records and row primary key values, and the row primary key list corresponds to the obtained thread ID.
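As a rough illustration of the row primary key list described here, the following Python sketch hashes each data record to a rowkey and groups the result under a thread ID. The source does not name the hash algorithm or the record format, so MD5, the sample records, and the names `make_row_key` and `build_row_key_list` are all assumptions made for the example.

```python
import hashlib


def make_row_key(record: str) -> str:
    """Derive a rowkey from the record content (MD5 is an assumed choice)."""
    return hashlib.md5(record.encode("utf-8")).hexdigest()


def build_row_key_list(thread_id: str, records: list) -> dict:
    """Return the thread ID together with a rowkey -> record mapping."""
    return {
        "thread_id": thread_id,
        "row_keys": {make_row_key(r): r for r in records},
    }


# Hypothetical records split out of one file to be written.
records = ["user=1,name=a", "user=2,name=b", "user=3,name=c"]
key_list = build_row_key_list("thread-7", records)
```

Each entry pairs one record with exactly one rowkey, and the whole list is tied to the thread ID so that it can later serve as reference data.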
Optionally, the sending module 401 is further configured to notify the thread to store the corresponding relationship between the row primary key list and the file to be written.
Optionally, the processing module 403 includes: a comparison sub-module 4031;
the comparison sub-module 4031 is configured to determine whether the identification code of the thread in the data to be compared is consistent with the identification code of the thread in the reference data;
the comparison sub-module 4031 is further configured to compare the row primary key list in the data to be compared with the row primary key list in the reference data if the identification codes are consistent;
the comparison sub-module 4031 is further configured to determine, if the row primary key list in the data to be compared is completely consistent with the row primary key list in the reference data, that the comparison result is that no data record remains unwritten in the distributed file system of the database;
the comparison sub-module 4031 is further configured to determine, if the row primary key list in the data to be compared is inconsistent with the row primary key list in the reference data, that the comparison result is that a data record has not been written into the distributed file system of the database.
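The two-stage check performed by the comparison sub-module can be sketched as follows. This is a minimal illustration with assumed names (`CompareResult`, `compare`); the source does not say what happens when the thread identification codes differ, so the `THREAD_MISMATCH` outcome is an assumption, as is treating the row primary key lists as unordered sets.

```python
from enum import Enum


class CompareResult(Enum):
    ALL_WRITTEN = "no data record is missing"
    MISSING = "at least one data record was not written"
    THREAD_MISMATCH = "thread identification codes differ"  # assumed outcome


def compare(reference: dict, to_compare: dict) -> CompareResult:
    """Compare thread IDs first, then the row primary key lists."""
    if reference["thread_id"] != to_compare["thread_id"]:
        return CompareResult.THREAD_MISMATCH
    # Order is assumed not to matter, so the lists are compared as sets.
    if set(reference["row_keys"]) == set(to_compare["row_keys"]):
        return CompareResult.ALL_WRITTEN
    return CompareResult.MISSING


reference = {"thread_id": "t1", "row_keys": ["k1", "k2", "k3"]}
same = compare(reference, {"thread_id": "t1", "row_keys": ["k3", "k1", "k2"]})
short = compare(reference, {"thread_id": "t1", "row_keys": ["k1", "k2"]})
other = compare(reference, {"thread_id": "t2", "row_keys": ["k1", "k2", "k3"]})
```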
Optionally, the processing module 403 further includes: a searching sub-module 4032, an obtaining sub-module 4033, a resetting sub-module 4034, a deleting sub-module 4035, and a prompting sub-module 4036;
a searching sub-module 4032, configured to search, if the comparison result is that a data record has not been written into the distributed file system of the database, the row primary key list of the reference data for the row primary key values missing from the row primary key list of the data to be compared;
an obtaining sub-module 4033, configured to obtain from the thread, according to the identification code of the thread in the reference data or the data to be compared, the row primary key list containing the missing row primary key value and the file to be written corresponding to the obtained row primary key list, where the correspondence between row primary key lists and files to be written is stored in the thread;
a resetting sub-module 4034, configured to write the data records corresponding to the file to be written into the database again, and to compare the reference data with the data to be compared again, until the comparison result is that no data record remains unwritten in the distributed file system of the database;
a deleting sub-module 4035, configured to delete the reference data and the data to be compared if the comparison result is that no data record remains unwritten in the distributed file system of the database;
a prompting sub-module 4036, configured to send deletion prompt information to the thread according to the identification code of the thread in the reference data or the data to be compared, where the deletion prompt information is used to prompt deletion of the correspondence, stored in the thread, between the row primary key list and the file to be written.
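The search, obtain, and rewrite steps above amount to a set difference followed by a retry loop. The sketch below illustrates this under several assumptions: `find_missing`, `rewrite_until_complete`, and `FlakyWriter` are hypothetical names, a dictionary plays the role of the database, the thread's stored correspondence is modeled as a mapping from file name to rowkey/record pairs, and the `max_attempts` bound is added for safety even though the source simply repeats until the comparison succeeds.

```python
def find_missing(reference_keys, compared_keys):
    """Rowkeys present in the reference list but absent after the write."""
    return sorted(set(reference_keys) - set(compared_keys))


def rewrite_until_complete(files_by_thread, reference_keys, store, write,
                           max_attempts=5):
    """Rewrite affected files until no reference rowkey is missing.

    files_by_thread: file name -> {rowkey: record}, the correspondence the
    thread is assumed to keep. `write` is a hypothetical writer callable.
    Returns the number of rewrite rounds that were needed.
    """
    for attempt in range(max_attempts):
        missing = find_missing(reference_keys, store.keys())
        if not missing:
            return attempt
        for rows in files_by_thread.values():
            if any(key in rows for key in missing):
                write(store, rows)  # rewrite the whole file, not one row
    missing = find_missing(reference_keys, store.keys())
    if missing:
        raise RuntimeError(f"rowkeys still missing: {missing}")
    return max_attempts


class FlakyWriter:
    """Test double: drops rowkey 'k2' on its first call only."""

    def __init__(self):
        self.calls = 0

    def __call__(self, store, rows):
        self.calls += 1
        for key, record in rows.items():
            if self.calls == 1 and key == "k2":
                continue
            store[key] = record


files = {"file-a": {"k1": "r1", "k2": "r2"}, "file-b": {"k3": "r3"}}
reference = ["k1", "k2", "k3"]
store = {}
writer = FlakyWriter()
for rows in files.values():  # initial write; 'k2' is lost here
    writer(store, rows)
rounds = rewrite_until_complete(files, reference, store, writer)
```

Only the file whose row primary key list contains a missing value is rewritten; files that were stored completely are left untouched.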
It should be noted that, for the same file to be written that is rewritten, the respective generated rowkey values are the same, and at the same time, it can also be understood that the row primary key lists are the same, so that the same data records can be prevented from being repeatedly stored.
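The point about repeated writes can be seen in a small sketch: when the rowkey is derived deterministically from the record, writing the same file twice overwrites the existing rows instead of adding new ones. SHA-1 and the names `row_key` and `write_file` are assumptions for illustration only, and a dictionary keyed by rowkey stands in for the table.

```python
import hashlib


def row_key(record: str) -> str:
    """Deterministic rowkey: the same record always yields the same key."""
    return hashlib.sha1(record.encode("utf-8")).hexdigest()


def write_file(store: dict, records: list) -> None:
    """Store each record under its rowkey; an existing key is overwritten."""
    for record in records:
        store[row_key(record)] = record


store = {}
records = ["order=100,qty=2", "order=101,qty=5"]

write_file(store, records)
rows_after_first_write = len(store)

write_file(store, records)  # rewrite of the same file
rows_after_rewrite = len(store)
```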
For details of the embodiment, please refer to the description of the embodiment shown in fig. 1 and fig. 2, which is not repeated herein.
In this embodiment of the present invention, the sending module 401 sends the data records in the file to be written to the thread, so that the thread generates a corresponding row primary key value for each data record it receives. The obtaining module 402 obtains, from the thread, the data record corresponding to the file to be written, the row primary key value corresponding to the data record, and the identification code of the thread, generates a row primary key list including the correspondence between the data record and the row primary key value, and uses the obtained identification code of the thread and the generated row primary key list as reference data. The processing module 403 writes the obtained data record, the row primary key value corresponding to the obtained data record, the obtained identification code of the thread, and the generated row primary key list into a cache memory in a database, and then writes the data record stored in the cache memory, the row primary key value corresponding to the stored data record, the stored identification code of the thread, and the stored row primary key list into a distributed file system in the database. After the writing is completed, the identification code of the thread and the row primary key list stored in the cache memory are used as data to be compared, and the reference data is compared with the data to be compared; if the comparison result indicates that a data record has not been written into the distributed file system of the database, the file to be written is written into the database again. In this way, after each write, the comparison determines whether all of the data has been written into the database, which ensures the integrity of data storage.
In the embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative: the division of the modules is merely a logical division, and other divisions are possible in actual implementation, for example, multiple modules or components may be combined or integrated into another system, or some features may be omitted or not implemented. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or modules, and may be in an electrical, mechanical or other form.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical modules, may be located in one place, or may be distributed on a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present invention may be integrated into one processing module, or each of the modules may exist alone physically, or two or more modules are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode.
The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
It should be noted that, for the sake of simplicity, the above-mentioned method embodiments are described as a series of combined acts, but those skilled in the art should understand that the present invention is not limited by the described order of acts, as some steps may be performed in other orders or simultaneously according to the present invention. Further, those skilled in the art will appreciate that the embodiments described in the specification are preferred embodiments, and that the acts and modules involved are not necessarily required by the present invention.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In summary, because those skilled in the art may modify the method and device for writing data based on the Hbase database provided herein according to the concepts of the present disclosure, the content of this specification should not be construed as limiting the present disclosure.

Claims (10)

1. A data writing method based on an Hbase database, characterized by comprising:
obtaining, from a thread, a data record corresponding to a file to be written, a row primary key value corresponding to the data record, and an identification code of the thread, generating a row primary key list containing the correspondence between the data record and the row primary key value, and using the obtained identification code of the thread and the generated row primary key list as reference data;
writing the obtained data record, the row primary key value corresponding to the obtained data record, the obtained identification code of the thread, and the generated row primary key list into a cache memory in a database;
writing the data record stored in the cache memory, the row primary key value corresponding to the stored data record, the stored identification code of the thread, and the stored row primary key list into a distributed file system in the database, and after the writing is completed, using the identification code of the thread and the row primary key list stored in the cache memory as data to be compared;
comparing the reference data with the data to be compared, and if the comparison result is that a data record has not been written into the distributed file system of the database, writing the file to be written into the database again.

2. The method according to claim 1, characterized in that, before obtaining from the thread the data record corresponding to the file to be written, the row primary key value corresponding to the data record, and the identification code of the thread, the method further comprises:
sending the data records in the file to be written to the thread, so that the thread generates a corresponding row primary key value for each data record it receives.

3. The method according to claim 1, characterized in that comparing the reference data with the data to be compared comprises:
determining whether the identification code of the thread in the data to be compared is consistent with the identification code of the thread in the reference data;
if they are consistent, comparing the row primary key list in the data to be compared with the row primary key list in the reference data;
if the row primary key list in the data to be compared is completely consistent with the row primary key list in the reference data, the comparison result is that no data record remains unwritten in the distributed file system of the database;
if the row primary key list in the data to be compared is inconsistent with the row primary key list in the reference data, the comparison result is that a data record has not been written into the distributed file system of the database.

4. The method according to claim 3, characterized in that writing the file to be written into the database again if the comparison result is that a data record has not been written into the distributed file system of the database comprises:
if the comparison result is that a data record has not been written into the distributed file system of the database, searching the row primary key list of the reference data for the row primary key values missing from the row primary key list of the data to be compared;
obtaining from the thread, according to the identification code of the thread in the reference data or the data to be compared, the row primary key list containing the missing row primary key value and the file to be written corresponding to the obtained row primary key list, wherein the correspondence between row primary key lists and files to be written is stored in the thread;
writing the data records corresponding to the file to be written into the database again, and comparing the reference data with the data to be compared again, until the comparison result is that no data record remains unwritten in the distributed file system of the database.

5. The method according to claim 3, characterized in that, after comparing the reference data with the data to be compared, the method further comprises:
if the comparison result is that no data record remains unwritten in the distributed file system of the database, deleting the reference data and the data to be compared;
sending deletion prompt information to the thread according to the identification code of the thread in the reference data or the data to be compared, the deletion prompt information being used to prompt deletion of the correspondence, stored in the thread, between the row primary key list and the file to be written.

6. A data writing device based on an Hbase database, characterized in that the device comprises:
an obtaining module, configured to obtain, from a thread, a data record corresponding to a file to be written, a row primary key value corresponding to the data record, and an identification code of the thread, generate a row primary key list containing the correspondence between the data record and the row primary key value, and use the obtained identification code of the thread and the generated row primary key list as reference data;
a processing module, configured to write the obtained data record, the row primary key value corresponding to the obtained data record, the obtained identification code of the thread, and the generated row primary key list into a cache memory in a database;
the processing module is further configured to write the data record stored in the cache memory, the row primary key value corresponding to the stored data record, the stored identification code of the thread, and the stored row primary key list into a distributed file system in the database, and after the writing is completed, use the identification code of the thread and the row primary key list stored in the cache memory as data to be compared;
the processing module is further configured to compare the reference data with the data to be compared, and if the comparison result is that a data record has not been written into the distributed file system of the database, write the file to be written into the database again.

7. The device according to claim 6, characterized in that the device further comprises:
a sending module, configured to send the data records in the file to be written to the thread, so that the thread generates a corresponding row primary key value for each data record it receives.

8. The device according to claim 6, characterized in that the processing module comprises:
a comparison sub-module, configured to determine whether the identification code of the thread in the data to be compared is consistent with the identification code of the thread in the reference data;
the comparison sub-module is further configured to compare, if they are consistent, the row primary key list in the data to be compared with the row primary key list in the reference data;
the comparison sub-module is further configured to determine, if the row primary key list in the data to be compared is completely consistent with the row primary key list in the reference data, that the comparison result is that no data record remains unwritten in the distributed file system of the database;
the comparison sub-module is further configured to determine, if the row primary key list in the data to be compared is inconsistent with the row primary key list in the reference data, that the comparison result is that a data record has not been written into the distributed file system of the database.

9. The device according to claim 8, characterized in that the processing module further comprises:
a searching sub-module, configured to search, if the comparison result is that a data record has not been written into the distributed file system of the database, the row primary key list of the reference data for the row primary key values missing from the row primary key list of the data to be compared;
an obtaining sub-module, configured to obtain from the thread, according to the identification code of the thread in the reference data or the data to be compared, the row primary key list containing the missing row primary key value and the file to be written corresponding to the obtained row primary key list, wherein the correspondence between row primary key lists and files to be written is stored in the thread;
a resetting sub-module, configured to write the data records corresponding to the file to be written into the database again, and to compare the reference data with the data to be compared again, until the comparison result is that no data record remains unwritten in the distributed file system of the database.

10. The device according to claim 8, characterized in that the processing module further comprises:
a deleting sub-module, configured to delete the reference data and the data to be compared if the comparison result is that no data record remains unwritten in the distributed file system of the database;
a prompting sub-module, configured to send deletion prompt information to the thread according to the identification code of the thread in the reference data or the data to be compared, the deletion prompt information being used to prompt deletion of the correspondence, stored in the thread, between the row primary key list and the file to be written.
CN201611047256.3A 2016-11-23 2016-11-23 Data writing method and device based on Hbase database Expired - Fee Related CN106776795B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611047256.3A CN106776795B (en) 2016-11-23 2016-11-23 Data writing method and device based on Hbase database

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611047256.3A CN106776795B (en) 2016-11-23 2016-11-23 Data writing method and device based on Hbase database

Publications (2)

Publication Number Publication Date
CN106776795A CN106776795A (en) 2017-05-31
CN106776795B true CN106776795B (en) 2020-05-12

Family

ID=58974335

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611047256.3A Expired - Fee Related CN106776795B (en) 2016-11-23 2016-11-23 Data writing method and device based on Hbase database

Country Status (1)

Country Link
CN (1) CN106776795B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106776795B (en) * 2016-11-23 2020-05-12 黄健文 Data writing method and device based on Hbase database
CN107273462B (en) * 2017-06-02 2020-09-25 浪潮云信息技术股份公司 A method for constructing HBase cluster full-text index, data reading method and data writing method
CN109492001B (en) * 2018-10-15 2021-10-01 四川巧夺天工信息安全智能设备有限公司 Method for extracting fragment data in ACCESS database in classified manner
CN111506582A (en) * 2019-01-30 2020-08-07 普天信息技术有限公司 A data storage method and device
CN110096296A (en) * 2019-05-10 2019-08-06 广州品唯软件有限公司 A kind of caching control methods and equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103853727A (en) * 2012-11-29 2014-06-11 深圳中兴力维技术有限公司 Method and system for improving large data volume query performance
CN104077420A (en) * 2014-07-21 2014-10-01 北京京东尚科信息技术有限公司 Method and device for importing data into HBase database
CN104778182A (en) * 2014-01-14 2015-07-15 博雅网络游戏开发(深圳)有限公司 Data import method and system based on HBase (Hadoop Database)
WO2015109250A1 (en) * 2014-01-20 2015-07-23 Alibaba Group Holding Limited CREATING NoSQL DATABASE INDEX FOR SEMI-STRUCTURED DATA
CN106776795A (en) * 2016-11-23 2017-05-31 黄健文 Method for writing data and device based on Hbase databases

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8504542B2 (en) * 2011-09-02 2013-08-06 Palantir Technologies, Inc. Multi-row transactions

Also Published As

Publication number Publication date
CN106776795A (en) 2017-05-31


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee (Granted publication date: 20200512; Termination date: 20211123)