CN113704346B - Hbase table cold-hot data conversion method and device and electronic equipment - Google Patents
- Publication number
- CN113704346B; CN202010430568.2A
- Authority
- CN
- China
- Prior art keywords
- storage
- data
- interval
- time
- cold
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2282—Tablespace storage structures; Management thereof
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Computing Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The embodiments of the invention provide an HBase table cold-hot data conversion method and device, and electronic equipment. The method comprises: determining, among the hot storage intervals of an HBase table, a target hot storage interval for which the time elapsed from the end of its storage period to the current time exceeds a first preset duration; and changing the target hot storage interval into a cold storage interval so that the hot data stored in it is converted into cold data, wherein a cold storage interval is a storage interval that stores cold data on a second hard disk. A hot storage interval that meets the condition can thus be changed into a cold storage interval without migrating its data from a table that stores hot data to a table that stores cold data, which reduces the storage burden placed on the database by changes between cold and hot data.
Description
Technical Field
The invention relates to the technical field of data storage, and in particular to an HBase table cold-hot data conversion method and device, and electronic equipment.
Background
HBase (Hadoop Database) is a highly reliable, high-performance, column-oriented, scalable distributed database that supports real-time reads and writes, and its many advantages give it an important role in the Hadoop ecosystem.
In many HBase business scenarios, data must be stored in tiers according to how long ago it was generated. Data is divided into hot data and cold data on this basis: hot data is data generated less than a set threshold ago, and cold data is data generated more than the set threshold ago.
To balance storage cost against storage performance, hot data may be stored on an SSD (Solid State Disk), which has relatively high read/write performance, and cold data may be stored on a SATA (Serial Advanced Technology Attachment) hard disk, which has relatively low read/write performance.
In the process of implementing the present invention, the inventors found that the prior art has at least the following problem:
In conventional HBase storage technology, the table is the basic unit of data management, and cold and hot data are also distinguished by table. When hot data in one table becomes cold data, it must therefore be migrated to a table that stores cold data. In HBase, migrating data involves changing storage information such as how the data is queried, and excessive migration greatly increases the storage burden on HBase.
Disclosure of Invention
The embodiments of the invention aim to provide an HBase table cold-hot data conversion method that reduces the storage burden placed on the database by changes between cold and hot data. The specific technical solution is as follows:
An embodiment of the invention provides an HBase table cold-hot data conversion method, which comprises the following steps:
Determining, among the hot storage intervals contained in an HBase table, each hot storage interval for which the duration from the end time of its storage period to the current time is longer than a first preset duration, as a target hot storage interval, wherein a hot storage interval is a storage interval that stores hot data on a first hard disk, and the storage period corresponding to a storage interval indicates that the storage interval is used for storing data whose generation time falls within that storage period;
and changing the target hot storage interval into a cold storage interval so that the hot data stored in the target hot storage interval is converted into cold data, wherein a cold storage interval is a storage interval that stores cold data on a second hard disk, and the read-write performance of the first hard disk is higher than that of the second hard disk.
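The two steps above can be sketched as follows. This is only an illustration under stated assumptions, not the patented implementation: the `StorageInterval` class, its field names, the `tier` labels and the 30-day threshold are all hypothetical. The text only requires that a hot interval whose storage period ended more than the first preset duration ago be relabeled as cold in place.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class StorageInterval:
    """Hypothetical model of one storage interval of an HBase table."""
    period_start: datetime
    period_end: datetime
    tier: str = "hot"  # "hot" = first (faster) disk, "cold" = second (slower) disk


def convert_expired_hot_intervals(intervals, now, first_preset_duration):
    """Relabel hot intervals whose storage period ended longer ago than the threshold."""
    converted = []
    for interval in intervals:
        if interval.tier != "hot":
            continue  # only hot intervals are candidates
        if now - interval.period_end > first_preset_duration:
            interval.tier = "cold"  # the interval changes tier; rows stay in the same table
            converted.append(interval)
    return converted


now = datetime(2020, 5, 20)
intervals = [
    StorageInterval(datetime(2020, 3, 1), datetime(2020, 4, 1)),  # ended 49 days ago
    StorageInterval(datetime(2020, 5, 1), datetime(2020, 6, 1)),  # still open
]
changed = convert_expired_hot_intervals(intervals, now, timedelta(days=30))
```

Only the first interval is relabeled here, because its period ended more than 30 days before `now`.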
Further, each storage interval in the Hbase table corresponds to an interval identifier, wherein the interval identifier is a character string in a first preset format, and the duration from the end time of one storage period to the current time is inversely proportional to the value of the character string of the interval identifier corresponding to the storage period;
Determining, among the hot storage intervals contained in the HBase table, a hot storage interval for which the duration from the end time of its storage period to the current time is longer than the first preset duration comprises:
Encoding the current time according to a preset first encoding strategy, based on the first preset duration, to obtain a reference identifier, wherein the reference identifier is a character string in the first preset format that corresponds to the time that precedes the current time by the first preset duration;
and determining, among the hot storage intervals, each hot storage interval whose interval identifier has a character string value smaller than that of the reference identifier as a target hot storage interval.
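The identifier comparison above might look like the sketch below. The "first preset format" is not specified in this passage, so the `%Y%m` format (one interval per calendar month) and the function names are assumptions chosen only because fixed-width digit strings sort in the same order as the times they encode.

```python
from datetime import datetime, timedelta

ID_FORMAT = "%Y%m"  # assumed stand-in for the "first preset format"


def encode_reference_id(now, first_preset_duration):
    """Encode the time that precedes `now` by the first preset duration."""
    return (now - first_preset_duration).strftime(ID_FORMAT)


def select_target_hot_intervals(hot_interval_ids, reference_id):
    """Keep the hot intervals whose identifier is smaller than the reference identifier."""
    # Plain string comparison is enough because every identifier has the same width.
    return [interval_id for interval_id in hot_interval_ids if interval_id < reference_id]


reference_id = encode_reference_id(datetime(2020, 5, 20), timedelta(days=30))
targets = select_target_hot_intervals(["202003", "202004", "202005"], reference_id)
```

With these inputs the reference identifier is `"202004"`, so only the `"202003"` interval is selected. No timestamps need to be parsed at conversion time.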
Further, each storage interval in the HBase table comprises storage units;
Changing the target hot storage interval into a cold storage interval so that the hot data stored in the target hot storage interval is converted into cold data comprises:
Changing the storage policy of the storage units contained in the target hot storage interval from a hot storage policy to a cold storage policy, so that those storage units are transferred from the first hard disk to the second hard disk.
The embodiment of the invention also provides a data storage method, which comprises the following steps:
For data to be stored in an HBase table, determining the generation time of the data to be stored, wherein the HBase table comprises hot storage intervals and cold storage intervals, a hot storage interval is a storage interval that stores hot data on a first hard disk, a cold storage interval is a storage interval that stores cold data on a second hard disk, the read-write performance of the first hard disk is higher than that of the second hard disk, each storage interval in the HBase table corresponds to a storage period, and the storage period corresponding to a storage interval indicates that the storage interval is used for storing data whose generation time falls within that storage period;
and storing the data to be stored into a storage interval corresponding to a storage period to which the generation time of the data to be stored belongs in the Hbase table.
Further, each storage interval in the Hbase table corresponds to an interval identifier, wherein the interval identifier is a character string in a first preset format, and the duration from the end time of one storage period to the current time is inversely proportional to the value of the character string of the interval identifier corresponding to the storage period;
Before the data to be stored is stored in the storage interval corresponding to the storage period to which the generation time of the data to be stored belongs in the Hbase table, the method further comprises:
Encoding the generation time of the data to be stored according to a preset second encoding strategy to obtain a character string in the first preset format, wherein the character string is used as a data time identifier of the data to be stored, and the size of a character string value corresponding to the data time identifier of one data is inversely proportional to the duration between the generation time of the data and the current time;
The storing the data to be stored in the storage interval corresponding to the storage period to which the generation time of the data to be stored belongs in the Hbase table includes:
determining the storage interval in the HBase table whose interval identifier matches the data time identifier as a target storage interval;
and storing the data to be stored into the target storage interval.
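A minimal sketch of this write path, assuming the same hypothetical `%Y%m` identifier format for the "second encoding strategy" and modelling the table as a plain dictionary keyed by interval identifier. Neither the dictionary layout nor the function names come from the patent.

```python
from datetime import datetime

ID_FORMAT = "%Y%m"  # assumed identifier format, shared with the interval identifiers


def encode_data_time_id(generation_time):
    """Encode a record's generation time as its data time identifier."""
    return generation_time.strftime(ID_FORMAT)


def store(table, row_key, value, generation_time):
    """Write a record into the interval whose identifier matches its data time identifier."""
    data_time_id = encode_data_time_id(generation_time)
    if data_time_id not in table:
        raise KeyError(f"no storage interval for identifier {data_time_id}")
    table[data_time_id][row_key] = value
    return data_time_id


# Toy table: interval identifier -> rows held by that interval
table = {"202004": {}, "202005": {}}
target = store(table, "row-1", b"payload", datetime(2020, 5, 3, 12, 0))
```

Because the record lands in the interval for its generation period, a later hot-to-cold change of that interval covers the record without any per-row bookkeeping.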
Further, each storage interval in the HBase table comprises storage units;
the storing the data to be stored in the target storage interval includes:
and randomly storing the data to be stored into a storage unit contained in the target storage interval.
Further, each storage interval comprises a first preset number of storage units, each storage unit corresponds to a unit salt identifier, and the character string values of the unit salt identifiers of the storage units belonging to the same storage interval are the first preset number of consecutive values starting from a preset threshold;
Before determining the storage interval of which the corresponding interval identifier in the Hbase table is matched with the data time identifier as the target storage interval, the method further comprises the following steps:
acquiring a data row identifier of the data to be stored;
performing a hash operation on the data row identifier to obtain a hash result, wherein the hash result is a numerical value;
Taking the remainder of the hash result divided by the first preset number to obtain a remainder value;
Determining a data salt identifier of the data to be stored based on the sum of the remainder value and the preset threshold, wherein the character string value of the data salt identifier is that sum;
the storing the data to be stored in the target storage interval includes:
determining, among the storage units contained in the target storage interval, the storage unit whose unit salt identifier is the same as the data salt identifier as a target storage unit;
and storing the data to be stored into the target storage unit.
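The salting steps above reduce to a hash, a modulo and an offset. The text does not name a hash function, so `zlib.crc32` is used here purely as a deterministic stand-in. The function name, the unit count of 4 and the threshold of 10 are illustrative assumptions.

```python
import zlib


def data_salt_id(row_id, unit_count, preset_threshold):
    """Derive the salt identifier of a record from its data row identifier."""
    hash_value = zlib.crc32(row_id.encode("utf-8"))  # numeric hash result (stand-in hash)
    remainder = hash_value % unit_count              # always in [0, unit_count)
    return str(preset_threshold + remainder)         # string value equals the numeric sum


unit_count = 4          # assumed "first preset number" of units per interval
preset_threshold = 10   # assumed "preset threshold"

# Unit salt identifiers are consecutive values starting from the threshold
unit_salt_ids = [str(preset_threshold + n) for n in range(unit_count)]

salt = data_salt_id("device-42", unit_count, preset_threshold)
target_unit = unit_salt_ids.index(salt)  # position of the matching storage unit
```

Because the remainder is always smaller than the unit count, the computed salt identifier is guaranteed to match exactly one unit of the target interval, and the same row identifier always maps to the same unit.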
The embodiment of the invention also provides a method for establishing the Hbase table, which comprises the following steps:
Determining the time that precedes the current time by a first preset duration as a reference time;
Generating storage intervals based on the reference time, wherein the storage intervals comprise a first storage interval whose corresponding storage period has a start time later than the reference time and a second storage interval whose corresponding storage period has an end time not later than the reference time, and the storage period corresponding to a storage interval indicates that the storage interval is used for storing data whose generation time falls within that storage period;
Setting the first storage interval as a hot storage interval and the second storage interval as a cold storage interval, wherein a hot storage interval is a storage interval that stores hot data on a first hard disk, a cold storage interval is a storage interval that stores cold data on a second hard disk, and the read-write performance of the first hard disk is higher than that of the second hard disk.
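One way to picture the table-establishment steps is the sketch below. It makes several assumptions not stated in the text: a single open-ended cold interval covers everything up to the reference time, the hot periods have equal length and tile forward from the reference time, and a period starting exactly at the reference time is treated as hot. The dictionary layout and parameter names are hypothetical.

```python
from datetime import datetime, timedelta


def build_intervals(now, first_preset_duration, period_length, hot_count):
    """Generate one cold interval ending at the reference time, then hot intervals after it."""
    reference = now - first_preset_duration
    # Cold interval: its period ends no later than the reference time (start left open here).
    intervals = [{"start": None, "end": reference, "tier": "cold"}]
    start = reference
    for _ in range(hot_count):
        end = start + period_length
        # Hot interval: its period lies after the reference time.
        intervals.append({"start": start, "end": end, "tier": "hot"})
        start = end
    return reference, intervals


reference, intervals = build_intervals(
    now=datetime(2020, 5, 20),
    first_preset_duration=timedelta(days=30),
    period_length=timedelta(days=10),
    hot_count=4,
)
```

With a 30-day threshold and 10-day periods, three hot intervals cover the span from the reference time to `now`, and the fourth covers data generated after `now`.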
Further, the generating a storage section based on the reference time includes:
generating a first preset number of continuous character strings according to a first preset format, and using the character strings as interval identifiers of storage intervals contained in an Hbase table to be established;
And generating a storage interval corresponding to each interval identifier based on the reference time, wherein the interval identifier with the smallest character string value corresponds to the storage interval with the ending time smaller than the reference time, and the duration from the ending time of one storage period to the current time is inversely proportional to the character string value of the interval identifier corresponding to the storage period.
Further, before generating a first preset number of continuous strings according to the first preset format, and using the strings as the section identifier of the storage section included in the Hbase table to be established, the method further includes:
acquiring the first preset duration and a cold-hot data conversion granularity, wherein the cold-hot data conversion granularity is the granularity, in the time dimension, at which hot data is converted into cold data;
calculating the quotient of the first preset duration divided by the duration corresponding to the cold-hot data conversion granularity;
And taking the sum of the quotient and the second preset quantity as a first preset quantity.
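As a worked example of this arithmetic, the sketch below computes the first preset number from illustrative values. The 30-day duration, 10-day granularity and second preset number of 2 are assumptions, not values given in the patent.

```python
from datetime import timedelta


def first_preset_number(first_preset_duration, conversion_granularity, second_preset_number):
    """Number of interval identifiers: the quotient plus the second preset number."""
    quotient = first_preset_duration // conversion_granularity  # timedelta // timedelta -> int
    return quotient + second_preset_number


count = first_preset_number(
    first_preset_duration=timedelta(days=30),
    conversion_granularity=timedelta(days=10),
    second_preset_number=2,
)
# count == 5: three identifiers from the quotient, plus the two extra ones
```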
Further, the storage periods whose start time is later than the reference time and earlier than the current time all have the same duration, namely a second preset duration, and every two adjacent interval identifiers correspond to two adjacent storage periods;
before the generating the storage section corresponding to each section identifier based on the reference time, the method further comprises:
Calculating the ratio of the second preset duration to the duration corresponding to the data storage granularity as a third preset number, wherein the data storage granularity is the minimum time granularity of data storage;
Generating a third preset number of consecutive character strings according to a second preset format as unit salt identifiers;
The generating a storage section corresponding to each section identifier based on the reference time includes:
For each interval identifier, combining the interval identifier with each of the third preset number of unit salt identifiers to generate the third preset number of unit identifiers corresponding to that interval identifier;
And generating storage units corresponding to each unit identifier based on the reference time, wherein the storage units with the same interval identifier form a storage interval corresponding to the interval identifier.
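The construction of unit identifiers can be sketched as below. The "second preset format" and the order in which the two parts are combined are not given in this passage, so the two-digit zero-padded salt and the "interval identifier followed by salt" concatenation are assumptions, as are the sample identifiers.

```python
from datetime import timedelta


def unit_salt_ids(second_preset_duration, storage_granularity, width=2):
    """Generate consecutive salt identifiers; their count is the third preset number."""
    third_preset_number = second_preset_duration // storage_granularity
    return [f"{n:0{width}d}" for n in range(third_preset_number)]


def unit_ids(interval_ids, salt_ids):
    """Combine every interval identifier with every salt identifier."""
    # Units sharing an interval identifier together form that storage interval.
    return {iid: [iid + salt for salt in salt_ids] for iid in interval_ids}


salts = unit_salt_ids(timedelta(days=10), timedelta(days=1))
units = unit_ids(["0001", "0002"], salts)
```

Each interval here ends up with ten storage units, one per day of its 10-day period, so a policy change applied to all units sharing an interval identifier moves the whole interval.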
Further, the setting the first storage section as a hot storage section and the second storage section as a cold storage section includes:
Setting a storage policy of the storage units constituting the first storage section as a hot storage policy storing hot data, and setting a storage policy of the storage units constituting the second storage section as a cold storage policy storing cold data.
The embodiment of the invention also provides a Hbase table cold-hot data conversion device, which comprises:
a storage interval determining module, configured to determine, among the hot storage intervals contained in the HBase table, each hot storage interval for which the duration from the end time of its storage period to the current time is longer than a first preset duration, as a target hot storage interval, where a hot storage interval is a storage interval that stores hot data on a first hard disk, and the storage period corresponding to a storage interval indicates that the storage interval is used for storing data whose generation time falls within that storage period;
a storage interval changing module, configured to change the target hot storage interval into a cold storage interval so that the hot data stored in the target hot storage interval is converted into cold data, where a cold storage interval is a storage interval that stores cold data on a second hard disk, and the read-write performance of the first hard disk is higher than that of the second hard disk.
Further, each storage interval in the Hbase table corresponds to an interval identifier, wherein the interval identifier is a character string in a first preset format, and the duration from the end time of one storage period to the current time is inversely proportional to the value of the character string of the interval identifier corresponding to the storage period;
The storage interval determining module is specifically configured to encode the current time according to a preset first encoding policy, based on the first preset duration, to obtain a reference identifier, where the reference identifier is a character string in the first preset format that corresponds to the time that precedes the current time by the first preset duration, and to determine, among the hot storage intervals, each hot storage interval whose interval identifier has a character string value smaller than that of the reference identifier as a target hot storage interval.
Further, each storage interval in the HBase table comprises storage units;
The storage interval changing module is specifically configured to change the storage policy of the storage units contained in the target hot storage interval from a hot storage policy to a cold storage policy, so that those storage units are transferred from the first hard disk to the second hard disk.
The embodiment of the invention also provides a data storage device, which comprises:
The generation time determining module is configured to determine, for data to be stored in an HBase table, the generation time of the data to be stored, where the HBase table comprises hot storage intervals and cold storage intervals, a hot storage interval is a storage interval that stores hot data on a first hard disk, a cold storage interval is a storage interval that stores cold data on a second hard disk, the read-write performance of the first hard disk is higher than that of the second hard disk, each storage interval in the HBase table corresponds to a storage period, and the storage period corresponding to a storage interval indicates that the storage interval is used for storing data whose generation time falls within that storage period;
And the data storage module is used for storing the data to be stored into a storage interval corresponding to a storage period to which the generation time of the data to be stored belongs in the Hbase table.
Further, each storage interval in the Hbase table corresponds to an interval identifier, wherein the interval identifier is a character string in a first preset format, and the duration from the end time of one storage period to the current time is inversely proportional to the value of the character string of the interval identifier corresponding to the storage period;
the apparatus further comprises:
The first encoding module is configured to encode, according to a preset second encoding policy, the generation time of the data to be stored before the data storage module performs storing the data to be stored in a storage interval corresponding to a storage period to which the generation time of the data to be stored belongs in the Hbase table, to obtain a character string in the first preset format, where the character string is used as a data time identifier of the data to be stored, and the size of a character string value corresponding to the data time identifier of one data is inversely proportional to a duration between the generation time of the data and the current time;
The data storage module is specifically configured to determine the storage interval in the HBase table whose interval identifier matches the data time identifier as a target storage interval, and to store the data to be stored in the target storage interval.
Further, each storage interval in the HBase table comprises storage units;
The data storage module is specifically configured to randomly store the data to be stored in a storage unit included in the target storage interval.
Further, each storage interval comprises a first preset number of storage units, each storage unit corresponds to a unit salt identifier, and the character string values of the unit salt identifiers of the storage units belonging to the same storage interval are the first preset number of consecutive values starting from a preset threshold;
the apparatus further comprises:
The salt identifier determining module is configured to, before the data storage module determines the storage interval in the HBase table whose interval identifier matches the data time identifier as the target storage interval: acquire a data row identifier of the data to be stored; perform a hash operation on the data row identifier to obtain a hash result, where the hash result is a numerical value; take the remainder of the hash result divided by the first preset number to obtain a remainder value; and determine a data salt identifier of the data to be stored based on the sum of the remainder value and the preset threshold, where the character string value of the data salt identifier is that sum;
The data storage module is specifically configured to determine, among the storage units contained in the target storage interval, the storage unit whose unit salt identifier is the same as the data salt identifier as a target storage unit, and to store the data to be stored in the target storage unit.
The embodiment of the invention also provides a device for establishing the Hbase table, which comprises:
The reference time determining module is configured to determine the time that precedes the current time by a first preset duration as a reference time;
A storage interval generating module, configured to generate storage intervals based on the reference time, where the storage intervals comprise a first storage interval whose corresponding storage period has a start time later than the reference time and a second storage interval whose corresponding storage period has an end time not later than the reference time, and the storage period corresponding to a storage interval indicates that the storage interval is used for storing data whose generation time falls within that storage period;
The storage interval setting module is configured to set the first storage interval as a hot storage interval and the second storage interval as a cold storage interval, where a hot storage interval is a storage interval that stores hot data on a first hard disk, a cold storage interval is a storage interval that stores cold data on a second hard disk, and the read-write performance of the first hard disk is higher than that of the second hard disk.
Further, the storage interval generating module is specifically configured to generate, according to a first preset format, a first preset number of continuous strings as interval identifiers of storage intervals included in an Hbase table to be established, and generate, based on the reference time, a storage interval corresponding to each interval identifier, where an interval identifier with a minimum string value corresponds to a storage interval with an end time smaller than the reference time, and a duration from an end time to a current time of one storage period is inversely proportional to a size of the string value of the interval identifier corresponding to the storage period.
Further, the device further comprises:
The first preset number determining module is configured to, before the storage interval generating module generates the first preset number of consecutive character strings according to the first preset format as interval identifiers of the storage intervals contained in the HBase table to be established: obtain the first preset duration and a cold-hot data conversion granularity, where the cold-hot data conversion granularity is the granularity, in the time dimension, at which hot data is converted into cold data; calculate the quotient of the first preset duration divided by the duration corresponding to the cold-hot data conversion granularity; and use the sum of the quotient and a second preset number as the first preset number.
Further, the storage periods whose start time is later than the reference time and earlier than the current time all have the same duration, namely a second preset duration, and every two adjacent interval identifiers correspond to two adjacent storage periods;
the salt identifier generating module is configured to, before the storage interval generating module generates the storage interval corresponding to each interval identifier based on the reference time, calculate the ratio of the second preset duration to the duration corresponding to the data storage granularity as a third preset number, where the data storage granularity is the minimum time granularity of data storage, and generate a third preset number of consecutive character strings according to a second preset format as unit salt identifiers;
The storage interval generating module is specifically configured to, for each interval identifier, combine the interval identifier with each of the third preset number of unit salt identifiers to generate the third preset number of unit identifiers corresponding to that interval identifier, and to generate, based on the reference time, a storage unit corresponding to each unit identifier, where storage units having the same interval identifier form the storage interval corresponding to that interval identifier.
Further, the storage interval setting module is specifically configured to set a storage policy of a storage unit that constitutes the first storage interval to a hot storage policy that stores hot data, and set a storage policy of a storage unit that constitutes the second storage interval to a cold storage policy that stores cold data.
An embodiment of the invention also provides electronic equipment, which comprises a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with one another through the communication bus;
A memory for storing a computer program;
And the processor is used for implementing the steps of any of the above HBase table cold-hot data conversion methods when executing the program stored in the memory.
An embodiment of the invention also provides electronic equipment, which comprises a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with one another through the communication bus;
A memory for storing a computer program;
and the processor is used for implementing the steps of any of the above data storage methods when executing the program stored in the memory.
An embodiment of the invention also provides electronic equipment, which comprises a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with one another through the communication bus;
A memory for storing a computer program;
And the processor is used for implementing the steps of any of the above HBase table establishing methods when executing the program stored in the memory.
The embodiment of the invention also provides a computer readable storage medium, wherein a computer program is stored in the computer readable storage medium, and the computer program realizes the steps of any Hbase table cold-hot data conversion method when being executed by a processor.
The embodiment of the invention also provides a computer readable storage medium, wherein a computer program is stored in the computer readable storage medium, and the computer program realizes the steps of any data storage method when being executed by a processor.
The embodiment of the invention also provides a computer readable storage medium, wherein a computer program is stored in the computer readable storage medium, and the computer program realizes the steps of any Hbase table establishing method when being executed by a processor.
The embodiment of the invention also provides a computer program product containing instructions, which when run on a computer, cause the computer to execute any Hbase table cold-hot data conversion method.
The embodiment of the invention also provides a computer program product containing instructions, which when run on a computer, cause the computer to execute any of the above-mentioned data storage methods.
The embodiment of the invention also provides a computer program product containing instructions, which when run on a computer, cause the computer to execute any of the above-mentioned methods for establishing Hbase table.
According to the HBase table cold-hot data conversion method, device and electronic equipment provided by the embodiments of the invention, among the hot storage intervals contained in an HBase table, each hot storage interval for which the duration from the end time of its storage period to the current time is longer than a first preset duration is determined as a target hot storage interval, wherein a hot storage interval is a storage interval that stores hot data on a first hard disk and the storage period corresponding to a storage interval indicates that the storage interval is used for storing data whose generation time falls within that storage period; the target hot storage interval is then changed into a cold storage interval so that the hot data stored in it is converted into cold data, wherein a cold storage interval is a storage interval that stores cold data on a second hard disk and the read-write performance of the first hard disk is higher than that of the second hard disk. A hot storage interval that meets the condition can therefore be changed into a cold storage interval without migrating its data between tables, which reduces the storage burden placed on the database by changes between cold and hot data.
Of course, it is not necessary for any one product or method of practicing the invention to achieve all of the advantages set forth above at the same time.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below.
FIG. 1 is a flowchart of a method for converting cold-hot data in Hbase according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a storage section according to an embodiment of the present invention;
FIG. 3 is a flowchart of a method for determining a target hot storage section according to one embodiment of the present invention;
FIG. 4 is a flow chart of a method for storing data according to an embodiment of the present invention;
FIG. 5 is a flowchart of another method for storing data according to an embodiment of the present invention;
FIG. 6 is a flowchart of a salting method according to an embodiment of the present invention;
FIG. 7 is a flow chart of a method for storing data according to another embodiment of the present invention;
FIG. 8 is a schematic diagram of data rowkey to be stored according to one embodiment of the present invention;
FIG. 9 is a schematic diagram of a unit identifier composition according to an embodiment of the present invention;
FIG. 10 is a flowchart of a method for establishing an Hbase table according to an embodiment of the present invention;
FIG. 11 is a flowchart of another Hbase table establishing method according to an embodiment of the present invention;
FIG. 12 is a schematic diagram of a storage unit according to an embodiment of the present invention;
FIG. 13 is a schematic diagram of a Hbase table cold-hot data converting apparatus according to an embodiment of the present invention;
FIG. 14 is a schematic diagram of a data storage device according to an embodiment of the present invention;
FIG. 15 is a schematic diagram of an Hbase table establishing apparatus according to an embodiment of the present invention;
FIG. 16 is a schematic structural diagram of an electronic device corresponding to the method for converting cold-hot data in an Hbase table according to an embodiment of the present invention;
FIG. 17 is a schematic structural diagram of an electronic device corresponding to a data storage method according to an embodiment of the present invention;
FIG. 18 is a schematic structural diagram of an electronic device corresponding to the method for creating an Hbase table according to an embodiment of the present invention.
Detailed Description
In order to provide an implementation scheme that reduces the storage burden imposed on a database by changes between cold and hot data, the embodiments of the application provide an Hbase table cold-hot data conversion method, an Hbase table cold-hot data conversion device and electronic equipment, which are described below with reference to the accompanying drawings. The embodiments of the application and the features of the embodiments may be combined with each other provided that they do not conflict.
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention.
In one embodiment of the present invention, there is provided a method for converting cold-hot data in an Hbase table, as shown in fig. 1, comprising the steps of:
S101: among the hot storage sections contained in the Hbase table, determining a hot storage section for which the duration from the end time of its storage period to the current time is longer than a first preset duration as a target hot storage section, wherein a hot storage section is a storage section for storing hot data on a first hard disk, and the storage period corresponding to a storage section indicates that the storage section is used for storing data whose generation time falls within that storage period.
S102: changing the target hot storage section into a cold storage section, so that the hot data stored in the target hot storage section is converted into cold data, wherein a cold storage section is a storage section for storing cold data on a second hard disk, and the read-write performance of the first hard disk is higher than that of the second hard disk.
In the method for converting cold-hot data in the Hbase table shown in fig. 1, a storage section that meets the condition for storing cold data can be identified from the duration between the end time of its storage period and the current time, and that storage section can then be set as a storage section for storing cold data. The data stored in the storage section therefore does not need to be migrated item by item into a separate Hbase table for cold data. In other words, when hot data is converted into cold data, individual data items are not migrated; only the storage section in which the data is located is moved from the first hard disk to the second hard disk. The whole process does not change storage information such as the way the data is queried, which reduces the storage burden that changes between cold and hot data impose on Hbase.
As known to those skilled in the art, region is the basic unit of HBase management data.
In the above scheme provided by the embodiment of the present application, in S101 a storage section may represent a set of regions in the Hbase table, where the regions in the set represented by one storage section are jointly responsible for storing data whose generation time falls within the storage period corresponding to that storage section.
Each storage section included in the Hbase table has a corresponding storage period, and the storage period corresponding to a storage section indicates that the storage section is used for storing data whose generation time falls within that storage period. Optionally, the storage periods corresponding to any two storage sections do not intersect, and each storage period is independent.
Optionally, the first preset duration may be determined based on experience and requirements: it may be shorter for an Hbase table storing data with stricter timeliness requirements, and longer for an Hbase table storing data without timeliness requirements. Further, for convenience of use, the first preset duration may be expressed with a granularity given by a time unit such as year, month, day or hour, for example, 6 days.
The first preset duration may be used as the criterion for distinguishing cold data from hot data. For example, when the duration between the generation time of a piece of data and the current time is greater than the first preset duration, the data may be determined to be cold data, and to save storage cost it should be stored on a disk with lower storage cost.
In the embodiment of the present invention, in order to avoid frequent data migration, the fact that the duration between the generation time of a piece of data and the current time exceeds the first preset duration does not by itself mean that the storage location of the data will change. Only when the duration between the generation time and the current time exceeds the first preset duration for all data stored in the storage section where the data is located will the data be stored on the disk with lower storage cost.
Therefore, in the embodiment of the present invention, whether the storage section corresponding to a storage period needs to be changed into a cold storage section is determined from the duration between the end time of the storage period and the current time. When that duration is greater than the first preset duration, the duration between the generation time and the current time of every piece of data stored for that storage period is also greater than the first preset duration.
For example, fig. 2 is a schematic diagram of storage sections provided by the embodiment of the present invention, where the region set formed by region 1, region 2 and region 3 may be denoted as storage section A, and the region set formed by region 4, region 5 and region 6 may be denoted as storage section B.
For example, the storage period corresponding to storage section A is [November 1, 2019, November 4, 2019), which means that storage section A is used for storing data whose generation time is within the period from 0:00 on November 1, 2019 to 0:00 on November 4, 2019, and its end time is 0:00 on November 4, 2019. The storage period corresponding to storage section B is [November 4, 2019, November 7, 2019), which means that storage section B is used for storing data whose generation time is within the period from 0:00 on November 4, 2019 to 0:00 on November 7, 2019, and its end time is 0:00 on November 7, 2019.
If the first preset duration is 6 days and the current time is 0:00 on November 11, 2019, calculation shows that the duration from the end time of the storage period of storage section A to the current time is 7 days, which is longer than the first preset duration, while the duration from the end time of the storage period of storage section B to the current time is 4 days, which is not longer than the first preset duration. Storage section A is therefore the target hot storage section.
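The selection rule of S101 can be sketched in Python using the figures of the example above. This is a minimal illustration only: the names StorageSection and find_target_hot_sections are hypothetical, and the current time is assumed to be 0:00 on November 11, 2019, the value implied by the 7-day and 4-day durations.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class StorageSection:
    name: str
    period_start: datetime  # inclusive start of the storage period
    period_end: datetime    # exclusive end of the storage period
    is_hot: bool = True


def find_target_hot_sections(sections, now, first_preset_duration):
    """Return hot sections whose storage period ended more than
    first_preset_duration before `now`."""
    return [
        s for s in sections
        if s.is_hot and now - s.period_end > first_preset_duration
    ]


sections = [
    StorageSection("A", datetime(2019, 11, 1), datetime(2019, 11, 4)),
    StorageSection("B", datetime(2019, 11, 4), datetime(2019, 11, 7)),
]
targets = find_target_hot_sections(
    sections,
    now=datetime(2019, 11, 11),
    first_preset_duration=timedelta(days=6),
)
print([s.name for s in targets])  # ['A']
```

Only the end time of the storage period is compared, so every piece of data in a selected section is necessarily older than the first preset duration.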
Alternatively, the first hard disk may be a hard disk with excellent read/write performance, for example, an SSD.
In S102, the duration between the generation time of each piece of data stored in the target hot storage section and the current time is greater than the first preset duration. To convert this data into cold data, the target hot storage section may be changed into a cold storage section, that is, the storage location of the target hot storage section is changed from the first hard disk to the second hard disk.
Optionally, each storage section in the Hbase table includes storage units, where a storage unit may be a region in the Hbase table.
In this case, the hot storage policy of the storage units included in the target hot storage section may be modified to a cold storage policy, so that those storage units are transferred from the first hard disk to the second hard disk.
The hot storage policy may be ALL_SSD, i.e. the storage unit is stored on SSD hard disks, and the cold storage policy may be HOT, i.e. the storage unit is stored on ordinary hard disks, such as SATA disks. Here HOT is the name of an HDFS storage policy and does not mean that the data is hot data.
As known to those skilled in the art, after the storage policy of a storage unit is modified, the mover tool of HDFS can scan the storage policies of HDFS directories, automatically and at regular intervals, and migrate data whose actual storage location does not conform to its current storage policy, thereby transferring the storage unit from the first hard disk to the second hard disk.
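As a rough illustration of the policy change in S102, the following Python sketch only builds the HDFS command lines that would assign the HOT policy to the region directories of a target hot storage section and then invoke the mover. The directory layout, the table name "events" and the region names are assumptions made for the example, and the commands should be checked against the Hadoop version actually deployed.

```python
def region_directory(hbase_root, namespace, table, encoded_region):
    """Assumed on-disk layout: <root>/data/<namespace>/<table>/<region>."""
    return f"{hbase_root}/data/{namespace}/{table}/{encoded_region}"


def set_storage_policy_command(path, policy):
    """Build the HDFS command that assigns a storage policy to a directory."""
    return ["hdfs", "storagepolicies", "-setStoragePolicy",
            "-path", path, "-policy", policy]


def mover_command(paths):
    """Build the HDFS mover command that relocates blocks to match the policy."""
    return ["hdfs", "mover", "-p", *paths]


# Hypothetical regions that make up the target hot storage section.
region_dirs = [
    region_directory("/hbase", "default", "events", name)
    for name in ("region_1", "region_2", "region_3")
]
commands = [set_storage_policy_command(d, "HOT") for d in region_dirs]
commands.append(mover_command(region_dirs))
for cmd in commands:
    print(" ".join(cmd))
```

The sketch deliberately stops at building the argument lists; running them (for example with subprocess.run) requires a configured HDFS client.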
For a clearer understanding of the present solution, step S101 is described in detail below with reference to the method for determining the target hot storage section shown in fig. 3. In this method, each storage section in the Hbase table corresponds to a section identifier, the section identifier is a character string in a first preset format, and the duration from the end time of a storage period to the current time is inversely proportional to the character string value of the section identifier corresponding to that storage period. The method shown in fig. 3 includes:
S301: based on the first preset duration, encoding the current time according to a preset first encoding strategy to obtain a reference identifier, wherein the reference identifier is a character string in the first preset format corresponding to the time that is earlier than the current time by the first preset duration.
In this step, each storage section in the Hbase table corresponds to a section identifier, the section identifier is a character string in the first preset format, and the duration from the end time of a storage period to the current time is inversely proportional to the character string value of the section identifier corresponding to that storage period.
The first preset format may be determined according to actual storage conditions and experience. Character strings with different numbers of characters can form different numbers of identifiers; an Hbase table with a large number of storage sections needs a correspondingly large number of section identifiers, so the character strings can be given more characters. Optionally, the section identifier may be a 32-bit character string, with section identifiers ranging from 0x0000 to 0xFFFF.
For any two storage sections in the same Hbase table, the smaller the duration of the end time of the storage period reaching the current time is, the larger the character string value of the corresponding section identifier is.
For example, if the storage period corresponding to storage section A is [November 1, 2019, November 4, 2019), the storage period corresponding to storage section B is [November 4, 2019, November 7, 2019), and the current time is November 15, 2019, then the character string value of the section identifier of storage section A should be smaller than that of storage section B.
Optionally, when two storage periods of two storage intervals are adjacent, the string values of the corresponding interval identifiers are also adjacent.
For example, in the above example, since the end time of the storage period of storage section A is the start time of the storage period of storage section B, the section identifiers of storage section A and storage section B should be adjacent; for example, when the section identifier of storage section A is 0x0002, the section identifier of storage section B should be 0x0003.
In one embodiment of the present invention, the preset first encoding strategy may be determined according to the numerical relationship between the character string value of the section identifier of a storage section and its storage period.
For example, suppose the first preset duration is 6 days and, excluding the first and last storage sections, the storage period of each storage section lasts 3 days. The storage period corresponding to storage section A is [November 1, 2019, November 4, 2019), its section identifier is 0x0002, and the character string value of the section identifier is 2. The storage period corresponding to storage section B is [November 4, 2019, November 7, 2019), its section identifier is 0x0003, and the character string value is 3. The character string value of the section identifier thus increases by 1 every 3 days. When the current time is November 14, 2019, the reference time is November 8, 2019, which falls within the storage period that begins at 0:00 on November 7, 2019, immediately after the storage period of storage section B. The encoded reference identifier is therefore larger than the section identifier of storage section B, with a character string value that exceeds it by 1, that is, the reference identifier is 0x0004.
Optionally, the preset first encoding strategy may determine the reference identifier for the current time according to a conversion formula. The conversion formula may be: [(current time - initial time - first preset duration) / storage period duration + 1].
In the above conversion formula, the initial time is determined based on the table creation time of the Hbase table and the first preset duration; optionally, the initial time may be a time that is earlier than the table creation time by the first preset duration. The storage period duration may be the duration of the storage period corresponding to each storage section in the Hbase table other than the first and last storage sections.
Optionally, the result of the conversion formula is rounded, and the rounded result is used as the character string value of the reference identifier.
It should be noted that when the conversion formula is used to determine the reference identifier, a certain conversion relationship should also be satisfied between the section identifier of the storage section and the storage period.
For example, section identifier = [(storage period time - initial time) / storage period duration + 1], wherein the storage period time is any time included in the storage period other than the end time. For example, for the storage period [November 1, 2019, November 4, 2019), the storage period time should be at or after 0:00 on November 1, 2019 and before 0:00 on November 4, 2019, for example 0:00 on November 3, 2019.
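The two conversion formulas can be checked against the worked example with a short Python sketch. It rests on two assumptions not stated in the text: rounding means rounding down, and the initial time is 0:00 on October 29, 2019, which is the value that makes storage section A map to 0x0002 with 3-day storage periods. The function names are hypothetical.

```python
from datetime import datetime, timedelta


def encode_time(moment, initial_time, period):
    """[(moment - initial_time) / period + 1], rounded down, as a 0xNNNN string."""
    value = (moment - initial_time) // period + 1
    return f"0x{value:04X}"


def encode_reference(now, initial_time, first_preset_duration, period):
    """[(now - initial_time - first_preset_duration) / period + 1]."""
    return encode_time(now - first_preset_duration, initial_time, period)


initial_time = datetime(2019, 10, 29)  # assumed, inferred from the example
period = timedelta(days=3)

# Any time inside a storage period yields that section's identifier.
print(encode_time(datetime(2019, 11, 3), initial_time, period))  # 0x0002
print(encode_time(datetime(2019, 11, 4), initial_time, period))  # 0x0003

# Reference identifier for a current time of November 14 and a 6-day duration.
print(encode_reference(datetime(2019, 11, 14), initial_time,
                       timedelta(days=6), period))               # 0x0004
```

Because the reference identifier is just the section-identifier formula applied to the reference time, the two encodings stay comparable by construction.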
S302: among the hot storage sections, determining a hot storage section whose section identifier has a character string value smaller than that of the reference identifier as a target hot storage section.
In this step, a hot storage section whose section identifier has a character string value smaller than that of the reference identifier may be used as a target hot storage section. The reference time is the time that is earlier than the current time by the first preset duration, so when the character string value of a section identifier is smaller than that of the reference identifier, the duration from the end time of the storage period corresponding to that section identifier to the current time is longer than the first preset duration.
In the method for determining the target hot storage section shown in fig. 3, the current time is encoded according to the preset first encoding strategy, based on the first preset duration, to obtain a reference identifier, which is a character string in the first preset format corresponding to the time that is earlier than the current time by the first preset duration. Among the hot storage sections, those whose section identifiers have character string values smaller than that of the reference identifier are determined as target hot storage sections. The target hot storage sections can thus be determined conveniently and quickly by directly comparing the character string values of the section identifiers and the reference identifier.
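The comparison in S302 can be sketched as follows. The identifiers are taken from the worked example, the function names are hypothetical, and the character string value is assumed to be the hexadecimal number that the identifier spells out.

```python
def string_value(identifier):
    """Numeric value of an identifier such as '0x0003'."""
    return int(identifier, 16)


def select_target_hot_sections(hot_section_ids, reference_id):
    """Keep the hot sections whose identifier value is below the reference."""
    reference = string_value(reference_id)
    return [i for i in hot_section_ids if string_value(i) < reference]


hot_section_ids = ["0x0002", "0x0003", "0x0004"]
print(select_target_hot_sections(hot_section_ids, "0x0004"))
# ['0x0002', '0x0003']
```

No storage period needs to be looked up at this point; the identifiers alone decide which sections are converted.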
Correspondingly, in one embodiment of the present invention, there is provided a data storage method, as shown in fig. 4, including the following steps:
S401: for data to be stored in an Hbase table, determining the generation time of the data to be stored, wherein the Hbase table includes hot storage sections and cold storage sections, a hot storage section is a storage section for storing hot data on a first hard disk, a cold storage section is a storage section for storing cold data on a second hard disk, the read-write performance of the first hard disk is higher than that of the second hard disk, each storage section in the Hbase table corresponds to a storage period, and the storage period corresponding to a storage section indicates that the storage section is used for storing data whose generation time falls within that storage period.
S402: storing the data to be stored into the storage section of the Hbase table that corresponds to the storage period to which the generation time of the data to be stored belongs.
In the data storage method shown in fig. 4, because the Hbase table includes both hot storage sections and cold storage sections, whether the data to be stored goes into a hot storage section or a cold storage section can be determined from its generation time. Hot data and cold data can thus be stored in a single Hbase table at the same time, which reduces the storage burden that changes between hot and cold data impose on Hbase.
In the above solution provided by the embodiment of the present application, in S401 the hot storage sections and cold storage sections included in the Hbase table may be determined when the Hbase table is generated, or may be determined during data storage by using the method for converting cold-hot data in the Hbase table provided by the above embodiment.
In one embodiment, each storage interval in the Hbase table corresponds to a storage period, and the storage period corresponding to one storage interval indicates that the storage interval is used for storing data with the generation time being in the storage period.
For example, the Hbase table includes two storage sections, storage section C and storage section D. Storage section C corresponds to the storage period [November 7, 2019, November 10, 2019), which indicates that storage section C is used for storing data whose generation time is within the range from 0:00 on November 7, 2019 to 0:00 on November 10, 2019. Storage section D corresponds to the storage period [November 10, 2019, November 13, 2019), which indicates that storage section D is used for storing data whose generation time is within the range from 0:00 on November 10, 2019 to 0:00 on November 13, 2019.
The storage section C is a storage section for storing cold data on the second hard disk, and the storage section D is a storage section for storing hot data on the first hard disk.
The first hard disk may be an SSD, and the second hard disk may be a SATA disk.
In one embodiment, the generation time of the data to be stored may be the time at which the data was produced; for example, for log data generated in response to a client request, the generation time may be the time at which the log data was produced. Alternatively, the generation time of the data to be stored may be the time at which the data enters the distributed storage system where the Hbase table is located.
In one embodiment, the way in which the generation time of the data to be stored is determined may depend on the data type of the data, the type of generation time used, and the like. Optionally, the generation time may be stored in the metadata of the data to be stored, so that it can be determined by reading the metadata. Alternatively, the generation time may be carried as attribute information of the data to be stored, so that it can be acquired by reading the data itself. Alternatively, the generation time may be recorded by the distributed storage system where Hbase is located, so that it can be determined through Hbase.
In another embodiment, each piece of data to be stored in Hbase needs to undergo format conversion before being written, so that its data format meets the requirements of Hbase; the generation time of the data can therefore be obtained while the format conversion is performed.
In one embodiment, after the generation time of the data to be stored is determined, the data to be stored may be stored in the storage section of the Hbase table that corresponds to the storage period to which the generation time belongs.
For example, if the generation time of the data to be stored is November 12, 2019, the storage period of storage section C is [November 7, 2019, November 10, 2019), and the storage period of storage section D is [November 10, 2019, November 13, 2019), then the generation time belongs to the storage period corresponding to storage section D, and the data to be stored is therefore stored in storage section D.
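The lookup in S402 can be sketched in Python with the storage sections C and D of the example. The sketch assumes contiguous, half-open storage periods so that a sorted list of boundaries is enough; the names boundaries, sections and section_for are hypothetical.

```python
from bisect import bisect_right
from datetime import datetime

# Period boundaries in ascending order: section i covers
# [boundaries[i], boundaries[i + 1]).
boundaries = [datetime(2019, 11, 7), datetime(2019, 11, 10), datetime(2019, 11, 13)]
sections = [("C", "cold"), ("D", "hot")]


def section_for(generation_time):
    """Return the (name, kind) of the section whose period contains the time."""
    index = bisect_right(boundaries, generation_time) - 1
    if index < 0 or index >= len(sections):
        raise ValueError("no storage section covers this generation time")
    return sections[index]


print(section_for(datetime(2019, 11, 12)))  # ('D', 'hot')
print(section_for(datetime(2019, 11, 7)))   # ('C', 'cold')
```

A generation time that equals a boundary falls into the later section, which matches the half-open periods used in the example.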
In one embodiment of the present invention, another data storage method is provided, in which each storage section in the Hbase table corresponds to a section identifier, the section identifier is a character string in a first preset format, and the duration from the end time of a storage period to the current time is inversely proportional to the character string value of the section identifier corresponding to that storage period. As shown in fig. 5, the method includes the following steps:
S501: for data to be stored in the Hbase table, determining the generation time of the data to be stored.
In this step, the implementation manner is the same as or similar to S401, and will not be described here again.
S502: according to a second preset encoding strategy, encoding the generation time of the data to be stored to obtain a character string in a first preset format, wherein the character string is used as a data time identifier of the data to be stored, and the size of a character string value corresponding to the data time identifier of one data is inversely proportional to the duration between the generation time of the data and the current time.
In this step, since each storage section in the Hbase table corresponds to a section identifier, the section identifier is a string in a first preset format, and the duration from the end time of one storage period to the current time is inversely proportional to the value of the string of the section identifier corresponding to the storage period.
The character strings of the first preset format may be determined according to actual storage conditions and experience, the number of identifiers that can be formed by character strings with different character numbers is also different, for Hbase with a large number of storage sections, the requirement of the Hbase for section identifiers corresponding to the storage sections is also large, so that the character numbers of the character strings can be set more, optionally, the section identifiers can be not 32-bit character strings, and the range of the section identifiers is from 0x0000 to 0xFFFF.
Alternatively, the interval identifier of the storage interval may be incremented from 0x0000, where the string value of the interval identifier of 0x0000 is the smallest, so that the end time of the storage period of the storage interval corresponding to the interval identifier should be farthest from the current time, that is, 0x0000 corresponds to the initial storage interval in the Hbase table, and all the data smaller than the end time of the initial storage interval should be stored in the initial storage interval.
In one embodiment, the preset second encoding strategy may be determined according to the numerical relationship between the character string value of the section identifier of a storage section and its storage period.
In short, in order for the data to be stored to be correctly placed in the storage section for the storage period to which its generation time belongs, the data time identifier of the data to be stored may be the same as the section identifier corresponding to that storage period. The data time identifier corresponding to the generation time of the data to be stored may therefore be determined from the numerical relationship between the character string value of the section identifier of a storage section and its storage period.
For example, the Hbase table includes three storage sections, storage section 1, storage section 2 and storage section 3. The section identifier of storage section 1 is 0x0000, and its storage period covers times earlier than 0:00 on November 1, 2019. The section identifier of storage section 2 is 0x0001, and its storage period covers times at or after 0:00 on November 1, 2019 and earlier than 0:00 on November 2, 2019. The section identifier of storage section 3 is 0x0002, and its storage period covers times at or after 0:00 on November 2, 2019 and earlier than 0:00 on November 3, 2019. The character string value of the section identifier increases by 1 for every day, so the character string value of a section identifier can be determined as the number of days elapsed since the end time corresponding to section identifier 0x0000, plus 1. Accordingly, the data time identifier of the data to be stored can be determined in the same way from the number of days between its generation time and the end time of the storage period of the initial storage section.
Optionally, the preset second encoding strategy may be a conversion formula that generates the data time identifier for data generated at the current time, where the formula may be: [(current time - initial time) / storage period duration + 1].
In the above conversion formula, the initial time is determined based on the table creation time of the Hbase table and the first preset duration; optionally, the initial time may be a time that is earlier than the table creation time by the first preset duration. The storage period duration may be the duration of the storage period corresponding to each storage section in the Hbase table other than the first and last storage sections.
Optionally, the result of the above formula may be rounded, and the rounded result is used as the character string value of the data time identifier.
It should be noted that when the above formula is used to determine the data time identifier, a certain conversion relationship should be satisfied between the interval identifier of the storage interval and the storage period.
For example, section identifier = [(storage period time - initial time) / storage period duration + 1], wherein the storage period time is any time included in the storage period other than the end time.
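A minimal Python sketch of the second encoding strategy, using the three-section example above, might look as follows. It assumes that the initial time is the end time of the storage period of the initial storage section (0:00 on November 1, 2019), that storage periods last 1 day, that rounding means rounding down, and that identifiers grow by 1 per section (so storage section 3 is 0x0002); data older than the initial time is sent to section 0x0000. The function name is hypothetical.

```python
from datetime import datetime, timedelta


def data_time_identifier(generation_time, initial_time, period):
    """Encode a generation time as [(t - initial_time) / period + 1]."""
    if generation_time < initial_time:
        # Everything earlier than the initial time belongs to the
        # initial storage section.
        return "0x0000"
    value = (generation_time - initial_time) // period + 1
    return f"0x{value:04X}"


initial_time = datetime(2019, 11, 1)  # assumed end of the initial section
period = timedelta(days=1)

print(data_time_identifier(datetime(2019, 10, 20, 9, 0), initial_time, period))  # 0x0000
print(data_time_identifier(datetime(2019, 11, 1, 12, 0), initial_time, period))  # 0x0001
print(data_time_identifier(datetime(2019, 11, 2, 8, 30), initial_time, period))  # 0x0002
```

Since the same formula yields the section identifiers, a data time identifier computed this way equals the identifier of the section whose storage period contains the generation time.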
S503: and determining a storage section of which the corresponding section identifier is matched with the data time identifier in the Hbase table as a target storage section.
In this step, a data time identifier matches a section identifier when the two are the same or differ by a preset value.
Optionally, the section identifier of a storage section may include two identifiers, a start identifier and an end identifier; when the character string value of the data time identifier of the data to be stored lies between the start identifier and the end identifier, the data time identifier matches the section identifier of that storage section.
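The range-based matching of S503 can be illustrated with the sketch below. The section names and identifier ranges are invented for the example, and treating both the start identifier and the end identifier as inclusive is an assumption, since the text only says "between".

```python
def string_value(identifier):
    """Numeric value of an identifier such as '0x0005'."""
    return int(identifier, 16)


def find_target_section(data_time_id, sections):
    """sections maps a section name to its (start identifier, end identifier)."""
    value = string_value(data_time_id)
    for name, (start_id, end_id) in sections.items():
        if string_value(start_id) <= value <= string_value(end_id):
            return name
    return None  # no storage section matches the data time identifier


sections = {
    "section_1": ("0x0000", "0x0003"),
    "section_2": ("0x0004", "0x0007"),
}
print(find_target_section("0x0005", sections))  # section_2
```

Exact matching is the special case in which the start identifier and the end identifier of a section are the same.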
S504: and storing the data to be stored into the target storage interval.
In this step, the data to be stored may be stored in the determined target storage section.
Optionally, each storage section in the Hbase table includes storage units, where a storage unit may be a region in the Hbase table and a storage section is a set of regions.
At this time, the data to be stored may be randomly stored in the storage unit included in the target storage section.
The data to be stored is randomly stored in the storage units contained in the target storage interval, so that excessive data stored in a single storage unit can be avoided, and load balancing is ensured.
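The effect of random placement can be seen in a small Python sketch. The region names are hypothetical and the uniform choice is only one possible way of storing data randomly.

```python
import random
from collections import Counter


def pick_storage_unit(units, rng=random):
    """Choose one storage unit of the target section uniformly at random."""
    return rng.choice(units)


units = ["region_4", "region_5", "region_6"]  # hypothetical target section
rng = random.Random(0)                        # seeded only for repeatability
counts = Counter(pick_storage_unit(units, rng) for _ in range(3000))
print(counts)
```

Over many writes the counts per storage unit come out roughly equal, which is the load-balancing effect described above.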
In the data storage method shown in fig. 5, the target storage interval for storing the data to be stored can be determined rapidly and accurately through the data time identification.
Further, before step S504 is executed, the embodiment of the present invention further provides a salting-out method for avoiding excessive concentration of data. A storage interval includes a first preset number of storage units, each storage unit corresponds to a unit salting-out identifier, and the character string values corresponding to the unit salting-out identifiers of the storage units belonging to the same storage interval are a first preset number of consecutive values starting from a preset threshold. As shown in fig. 6, the method includes:
S601: and acquiring a data line identifier of the data to be stored.
In this step, the storage unit may be a region in the Hbase table, and the storage interval is a set of regions. The first preset number may be determined according to experience and actual requirements, and may represent the granularity, in regions, of cold-hot data conversion: only when all of the first preset number of regions meet the condition for converting hot data into cold data are they changed together into regions for storing cold data and stored on the hard disk with lower read-write performance and lower storage cost.
Salting-out (salt) is an operation in HBase for scattering stored data. Its purpose is to prevent data that is close in time from being concentrated in one region, which would cause that single region to overheat frequently.
Optionally, the character string values corresponding to the unit salting-out identifiers of the storage units belonging to the same storage interval are a first preset number of consecutive values starting from a preset threshold.
For example, when the first preset number is 3 and the preset threshold is 0, the unit salting-out identifiers of the storage units belonging to the same storage interval should be 3 consecutive values, such as 0, 1 and 2. These may be represented as hexadecimal character strings, namely 0x00, 0x01 and 0x02, respectively. When the preset threshold is 3, the unit salting-out identifiers may be 0x03, 0x04 and 0x05.
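The numbering of unit salting-out identifiers in this example can be reproduced with a short sketch. The function name `unit_salt_identifiers` and the two-digit hexadecimal formatting are assumptions chosen to match the strings shown in the example.

```python
def unit_salt_identifiers(first_preset_number: int, preset_threshold: int) -> list:
    """Generate consecutive unit salting-out identifiers starting at the threshold."""
    return [
        f"0x{preset_threshold + offset:02x}"  # two hex digits, as in 0x00 .. 0x05
        for offset in range(first_preset_number)
    ]
```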
It is known to those skilled in the art that each piece of data stored in Hbase needs its own unique rowkey (row key) for uniquely identifying the data, and this rowkey can serve as the data line identifier of the data to be stored.
Optionally, this step may obtain the rowkey of the data to be stored.
S602: and carrying out hash operation on the data line identifier to obtain a hash operation result of the data line identifier, wherein the hash operation result is a numerical value.
In this step, a hash operation maps any input character string to another character string. After the hash operation is performed on the data line identifier, optionally, the numerical value of the resulting character string may be calculated and used as the hash operation result.
Optionally, the hash operation may be MD4, MD5 or SHA-1 (Secure Hash Algorithm 1).
S603: and taking the remainder of the hash operation result divided by the first preset number to obtain a remainder value.
In this step, the remainder of the hash operation result divided by the first preset number is taken, so that every remainder value obtained is smaller than the first preset number.
For example, when the first preset number is 3, for any hash operation result, the remainder value obtained can only be one of the three values 0, 1 and 2. When the first preset number is 4, the remainder value can only be one of the four values 0, 1, 2 and 3. When the first preset number is 5, the remainder value can only be one of the five values 0, 1, 2, 3 and 4.
S604: and determining a data salting-out identifier of the data to be stored based on the sum of the remainder value and the preset threshold, wherein the character string value of the data salting-out identifier is that sum.
In this step, in order to make the data salting-out identifier correspond to a unit salting-out identifier, the sum of the remainder value obtained in step S603 and the preset threshold may be determined.
For example, if the remainder value obtained in step S603 is one of the three values 0, 1 and 2, and the preset threshold is 1, the character string value of the data salting-out identifier is one of the three values 1, 2 and 3, and the data salting-out identifier is one of 0x01, 0x02 and 0x03.
In the salting-out method shown in fig. 6, the data line identifier of the data to be stored is obtained; a hash operation is performed on it to obtain a numerical hash operation result; the remainder of the hash operation result divided by the first preset number is taken to obtain a remainder value; and the data salting-out identifier of the data to be stored is determined based on the sum of the remainder value and the preset threshold, the character string value of the data salting-out identifier being that sum.
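Steps S601 to S604 can be sketched as below. This is a minimal illustration rather than the patented implementation: MD5 is picked from the hash operations the text lists as options, the rowkey is assumed to be a UTF-8 string, and the function name `data_salt_identifier` and the two-digit hexadecimal output format are assumptions.

```python
import hashlib


def data_salt_identifier(rowkey: str, first_preset_number: int,
                         preset_threshold: int) -> str:
    """Derive a data salting-out identifier from a rowkey (S601 to S604)."""
    # S602: hash the data line identifier and read the digest as a number.
    digest = hashlib.md5(rowkey.encode("utf-8")).hexdigest()
    hash_value = int(digest, 16)
    # S603: the remainder is always in 0 .. first_preset_number - 1.
    remainder = hash_value % first_preset_number
    # S604: shift by the preset threshold so the result lines up with
    # the unit salting-out identifiers of the storage interval.
    return f"0x{remainder + preset_threshold:02x}"
```

Because the hash is deterministic, the same rowkey always yields the same identifier, while different rowkeys are spread across the storage units of the interval.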
In one embodiment, based on the foregoing embodiment, step S504 may be specifically implemented by the following steps. Fig. 7 shows a further data storage method according to an embodiment of the present invention, including:
S701: and determining a storage unit with the same unit salting-out identification as the data salting-out identification in the storage units contained in the target storage interval as a target storage unit.
In this step, the unit salting-out flag of the storage unit included in the target storage section is a numerical value incremented from the preset threshold value.
For example, if the preset threshold is 3 and the first preset number is 3, the unit salting-out identifiers of the storage units in the target storage section are 0x03, 0x04, and 0x05, and based on the salting-out method shown in fig. 6, the obtained data salting-out identifier is also one of 0x03, 0x04, and 0x05, so that the unit salting-out identifier identical to the data salting-out identifier can be determined, and the target storage unit can be further determined.
S702: and storing the data to be stored into the target storage unit.
In this step, the data to be stored may be stored in the target storage unit included in the target storage section.
In the data storage method shown in fig. 7, according to the embodiment of the present invention, the target storage unit can be further determined within the target storage interval by using the unit salting-out identifier and the data salting-out identifier, so that the overheating of a storage unit caused by data being concentrated in a single storage unit can be avoided.
In an embodiment of the present invention, the data time identifier, the data salting-out identifier and the data line identifier of the data to be stored may be combined into a new rowkey of the data to be stored. Optionally, fig. 8 is a schematic diagram of the rowkey of the data to be stored, where rowkey = data time identifier + data salting-out identifier + data line identifier of the data to be stored, and the data line identifier of the data to be stored may be the original rowkey of the data.
Correspondingly, each storage unit of each storage interval also has a unit identifier. Optionally, the unit identifier is composed of the interval identifier of the storage interval to which the storage unit belongs and the unit salting-out identifier of the storage unit. Fig. 9 is a schematic diagram of the composition of the unit identifier, where unit identifier = interval identifier + unit salting-out identifier.
Optionally, for a piece of data to be stored, only the data time identifier and the data salting-out identifier need to be calculated; the target storage unit can then be determined by finding the storage unit in the Hbase table whose unit identifier is the same, and the data to be stored is stored in that target storage unit.
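The rowkey composition and the lookup of the target storage unit can be sketched as follows. The sketch assumes that identifiers are fixed-width hexadecimal digit strings without the 0x prefix and that "+" means plain string concatenation; both function names are hypothetical.

```python
def build_rowkey(time_id: str, salt_id: str, original_rowkey: str) -> str:
    """rowkey = data time identifier + data salting-out identifier + original rowkey."""
    return time_id + salt_id + original_rowkey


def find_target_unit(unit_ids, time_id: str, salt_id: str):
    """Return the unit identifier equal to time_id + salt_id, or None if absent.

    Assumes exact matching; the text also allows identifiers that differ
    by a preset value, which is not modelled here.
    """
    wanted = time_id + salt_id
    return wanted if wanted in unit_ids else None
```

With unit identifiers 000200, 000201 and 000202, a record with time identifier 0002 and salting-out identifier 01 would be routed to unit 000201, and its new rowkey would start with that same prefix.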
Corresponding to the above-mentioned method for converting cold and hot data in an Hbase table shown in fig. 1 and the method for storing data shown in fig. 4 provided by the embodiment of the present invention, the embodiment of the present invention provides a method for establishing an Hbase table, as shown in fig. 10, including:
S1001: and determining the time which differs from the current time by a first preset duration as a reference time.
S1002: and generating a storage section based on the reference time, wherein the storage section comprises a first storage section of which the starting time of the corresponding storage period is not less than the reference time and a second storage section of which the ending time of the corresponding storage period is less than the reference time, and the storage period corresponding to one storage section represents that the storage section is used for storing data of which the generation time is positioned in the storage period.
S1003: the first storage interval is set as a hot storage interval, and the second storage interval is set as a cold storage interval, wherein the hot storage interval is a storage interval for storing hot data on the first hard disk, the cold storage interval is a storage interval for storing cold data on the second hard disk, and the read-write performance of the first hard disk is higher than that of the second hard disk.
In the method for establishing the Hbase table shown in fig. 10 according to the embodiment of the present invention, the Hbase to be established includes both a hot storage section for storing hot data and a cold storage section for storing cold data, so that the storage burden of the Hbase caused by the change of the hot and cold data is reduced.
In the above solution provided by the embodiment of the present application, in S1001, the reference time is a time earlier than the current time.
For example, if the first preset duration is 6 days and the current time is November 10, 2019, the reference time may be determined to be 0:00 on November 4, 2019.
Optionally, the first preset duration is a preset duration used for judging cold-hot data conversion: when the duration from the generation time of a piece of hot data to the current time is longer than the first preset duration, the hot data is converted into cold data. The reference time is therefore the cold-hot dividing time corresponding to the current time; that is, data generated before the reference time is cold data, and data generated at or after the reference time is hot data.
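The reference time and the resulting hot/cold split can be illustrated with a small sketch. The function names are hypothetical, and the example assumes timestamps are naive `datetime` values in a single time zone.

```python
from datetime import datetime, timedelta


def reference_time(current: datetime, first_preset_duration: timedelta) -> datetime:
    """The reference time lies one first preset duration before the current time."""
    return current - first_preset_duration


def is_hot(generation_time: datetime, current: datetime,
           first_preset_duration: timedelta) -> bool:
    """Data generated at or after the reference time is hot; earlier data is cold."""
    return generation_time >= reference_time(current, first_preset_duration)
```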
Optionally, a pre-partitioning method (Pre-Split) may be used to split out at least two regions with the reference time as a split point, where at least one region whose end time is less than the reference time serves as the second storage interval, and at least one region whose start time is not less than the reference time serves as the first storage interval.
For the above embodiment of the present invention, the starting time of the storage period corresponding to the first storage section is not less than the reference time, which means that the generation time of the data stored in the first storage section should not be less than the reference time, and the data stored in the first storage section should be hot data, and the data stored in the second storage section should be cold data.
Optionally, the storage position of the first storage section is set on the first hard disk with better read-write performance, and the storage position of the second storage section is set on the second hard disk with lower cost.
In one embodiment of the present invention, another method for establishing an Hbase table is provided, as shown in fig. 11, including:
S1101: and acquiring a first preset duration and a cold-hot data conversion granularity, wherein the cold-hot data conversion granularity represents the granularity of the hot data conversion cold data in the time dimension.
In this step, the granularity of cold-hot data conversion represents the granularity of hot data conversion cold data in the time dimension, that is, the stored data is temporally divided according to the granularity of cold-hot data conversion, and the data in the same granularity of cold-hot data conversion have the same cold-hot attribute.
For example, suppose the cold-hot data conversion granularity is 3 days, so that cold-hot data conversion is performed once every 3 days, the first preset duration is 6 days, and [0:00 on November 12, 2019, 0:00 on November 15, 2019) is one cold-hot data conversion granularity. When the current time is 0:00 on November 20, 2019, part of that period already differs from the current time by more than the first preset duration and thus satisfies the condition for converting hot data into cold data. However, hot data is not converted piecemeal within one granularity; the conversion waits until the entire granularity satisfies the condition, and then the whole granularity is converted into cold data. For example, when the current time is 0:00 on November 21, 2019, every moment in [0:00 on November 12, 2019, 0:00 on November 15, 2019) satisfies the condition, and the corresponding hot data is converted into cold data.
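The whole-granularity rule in this example can be checked with a short sketch. The function name `granularity_is_cold` is hypothetical, and the comparison uses "greater than or equal" on the end time because the storage period is treated as half-open, which is an assumption drawn from the bracket notation in the example.

```python
from datetime import datetime, timedelta


def granularity_is_cold(period_end: datetime, current: datetime,
                        first_preset_duration: timedelta) -> bool:
    """True once every moment of the period [start, period_end) is old enough.

    Since period_end itself is excluded, all moments in the period are more
    than first_preset_duration old as soon as current - period_end reaches it.
    """
    return current - period_end >= first_preset_duration
```

With a period ending at 0:00 on November 15, 2019 and a first preset duration of 6 days, the check is false at 0:00 on November 20 and becomes true at 0:00 on November 21.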
Optionally, the first preset duration and the cold-hot data conversion granularity may be determined according to actual requirements and experience.
Optionally, the cold-hot data conversion granularity may be used as the length of the storage period corresponding to a storage interval.
S1102: and calculating the quotient of the first preset duration divided by the duration corresponding to the cold-hot data conversion granularity.
In this step, the quotient of the first preset duration divided by the duration corresponding to the cold-hot data conversion granularity may be calculated. For example, if the first preset duration is 6 days and the cold-hot data conversion granularity is 3 days, the calculated quotient is 6/3 = 2; or if the first preset duration is 5 days and the cold-hot data conversion granularity is 2 days, 5/2 gives 2 with a remainder of 1, and the quotient is 2.
Calculating this quotient determines the number of granularities that can be contained in one first preset duration.
S1103: taking the sum of the quotient and the second preset quantity as the first preset quantity.
In this step, the second preset number may be determined according to requirements and experience. Optionally, the second preset number may be the total number of initial storage intervals and tail storage intervals, where there is exactly one initial storage interval and exactly one tail storage interval, and the tail storage interval is the storage interval whose storage period contains the current time. Optionally, the second preset number is 2.
For example, when the quotient of the first preset time length and the time length corresponding to the cold-hot data conversion granularity is 2, the first preset number is 4.
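The arithmetic of steps S1102 and S1103 can be written out as a brief sketch. The function name and the use of whole days as the unit are assumptions; the default of 2 for the second preset number follows the optional value given in the text.

```python
def first_preset_number(first_preset_duration_days: int,
                        conversion_granularity_days: int,
                        second_preset_number: int = 2) -> int:
    """Integer quotient of the two durations plus the second preset number."""
    quotient = first_preset_duration_days // conversion_granularity_days  # S1102
    return quotient + second_preset_number                               # S1103
```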
S1104: and generating a first preset number of continuous character strings according to a first preset format, and taking the character strings as interval identifiers of storage intervals contained in the Hbase table to be established.
In this step, the first preset format is a format of a section identifier, for example, the section identifier is a 32-bit string. When the first preset number is 4, the first preset number of consecutive strings may be 0x0000, 0x0001, 0x0002, and 0x0003, respectively, as section identifiers of the Hbase storage sections to be established.
From the above analysis, the smaller the character string value of an interval identifier, the longer the duration from the end time of the corresponding storage period to the current time. Therefore, among the first preset number of consecutive character strings obtained, the character string with the smallest value is the interval identifier of the initial storage interval, and the character string with the largest value is the interval identifier of the tail storage interval.
The span between the end time of the initial storage interval and the start time of the tail storage interval is divided, according to the cold-hot data conversion granularity, into a number of storage intervals equal to the quotient of the first preset duration divided by the duration corresponding to the cold-hot data conversion granularity.
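Generating the consecutive interval identifiers of step S1104 can be sketched as below. The function name `interval_identifiers` and the default width of four hexadecimal digits are assumptions chosen to reproduce the strings 0x0000 to 0x0003 from the example.

```python
def interval_identifiers(first_preset_number: int, hex_digits: int = 4) -> list:
    """Consecutive identifiers; the smallest is the initial interval, the largest the tail."""
    return [f"0x{index:0{hex_digits}x}" for index in range(first_preset_number)]
```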
S1105: and calculating the ratio of the second preset duration to the duration corresponding to the data storage granularity as a third preset number, wherein the data storage granularity is the minimum time granularity of data storage.
In this step, optionally, each storage interval whose start time is not less than the reference time and less than the current time has a storage period equal in length to the second preset duration, and the storage periods corresponding to any two adjacent interval identifiers are contiguous.
The data storage granularity is the minimum period of data storage. When the data storage granularity is 1 day, data is stored with a minimum storage period of 1 day, and data stored within the same 1-day period falls in the same storage interval.
Optionally, the data storage granularity is the minimum time granularity of the data stored in the Hbase table storage unit to be established.
The number of storage units that should be included in the storage interval can be determined by calculating the ratio of the second preset duration to the duration corresponding to the granularity of data storage.
For example, when the second preset time period is 3 days, the granularity of data storage is 1 day, and the calculated ratio is 3, which means that 3 storage units should be included in one storage area.
S1106: and generating a third preset number of consecutive character strings according to the second preset format, and taking the character strings as unit salting-out identifiers.
In this step, the second preset format may be determined according to actual requirements and experience, and the number of character combinations available to character strings in the second preset format may be greater than the third preset number.
For example, when the third preset number is 3, the character string of the second preset format needs to contain at least 2 characters.
Optionally, the character string in the second preset format is 16 bits and is expressed in hexadecimal as 0x00 to 0xFF.
In one embodiment, the third preset number of consecutive character strings may be generated starting from 0. Illustratively, when the third preset number is 3, the unit salting-out identifiers include 0x00, 0x01 and 0x02.
S1107: and respectively combining the interval identifiers with the third preset number of unit salting-out identifiers to generate a third preset number of unit identifiers corresponding to each interval identifier.
In this step, each section identifier and the unit salting-out identifier are combined.
Illustratively, the interval identifier includes 0x0000 to 0x0003, the unit salting-out identifier includes 0x00 to 0x02, and the unit identifier generated by the combination includes (0x000000, 0x000001, 0x000002), (0x000100, 0x000101, 0x000102), (0x000200, 0x000201, 0x000202) and (0x000300, 0x000301, 0x000302).
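The combination in step S1107 can be reproduced with a short sketch. The function name `unit_identifiers` is hypothetical, and the widths of four hexadecimal digits for the interval part and two for the salting-out part are assumptions taken from the example strings.

```python
def unit_identifiers(interval_count: int, units_per_interval: int) -> list:
    """One group of unit identifiers per interval: interval digits followed by salt digits."""
    return [
        [f"0x{interval:04x}{salt:02x}" for salt in range(units_per_interval)]
        for interval in range(interval_count)
    ]
```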
S1108: and generating storage units corresponding to each unit identifier based on the reference time, wherein the storage units with the same interval identifier form a storage interval corresponding to the interval identifier.
In this step, the reference time corresponds to the unit identifier with the smallest character string value, and the storage interval in which that unit identifier is located is the second storage interval.
The storage unit corresponding to each unit identifier may be generated by pre-splitting (Pre-Split).
Optionally, the storage units are regions; as known to those skilled in the art, each region corresponds to a start key and an end key.
Optionally, the unit identifiers may be sorted in lexicographic order. In the sorted result, every pair of adjacent unit identifiers is determined, and each pair corresponds to one region: the unit identifier sorted first in the pair serves as the start key of the region, and the unit identifier sorted second serves as the end key of the region.
Optionally, no start key is set for the first storage unit generated, so that by default the range from negative infinity to its end key is part of the storage period of the storage interval to which that storage unit belongs. Similarly, no end key may be set for the tail storage unit.
Illustratively, the unit identifiers include (0x000000, 0x000001, 0x000002), (0x000100, 0x000101, 0x000102) and (0x000200, 0x000201, 0x000202), and 9 storage units are generated correspondingly. Fig. 12 is a schematic diagram of the storage units, in which storage unit 1 has only an end key (0x000001), storage unit 9 has only a start key (0x000202), and the start and end keys of the remaining storage units 2 to 8 are (0x000001, 0x000002), (0x000002, 0x000100), (0x000100, 0x000101), (0x000101, 0x000102), (0x000102, 0x000200), (0x000200, 0x000201) and (0x000201, 0x000202), respectively.
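The key ranges listed for fig. 12 can be derived with the following sketch. It is an illustration of the pairing rule only and does not call any HBase API; the function name is hypothetical, `None` stands for an unset start or end key, and the smallest unit identifier is assumed not to be used as a split point, which is what the example implies.

```python
def regions_from_unit_ids(unit_ids) -> list:
    """Return (start_key, end_key) pairs, one per storage unit."""
    ordered = sorted(unit_ids)          # lexicographic order
    split_points = ordered[1:]          # smallest identifier is not a split point
    bounds = [None] + split_points + [None]
    return list(zip(bounds[:-1], bounds[1:]))
```

For the nine unit identifiers of the example this yields nine regions, the first open at the start and the last open at the end.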
S1109: the storage policy of the storage units constituting the first storage section is set as a hot storage policy for storing hot data, and the storage policy of the storage units constituting the second storage section is set as a cold storage policy for storing cold data.
In this step, the hot storage policy may be ALL_SSD (all solid-state disks), i.e. the storage unit is stored on SSD hard disks, and the cold storage policy may be HOT, i.e. the storage unit is stored on ordinary hard disks, such as SATA disks.
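The initial assignment of storage policies can be sketched as a simple mapping. The sketch does not invoke HDFS or HBase; it only records which policy name each storage interval would receive. The function name is hypothetical, and it assumes that at table creation only the interval with the smallest identifier is the second (cold) storage interval. Note that in this scheme the policy named HOT is the one used for cold data.

```python
def initial_storage_policies(interval_ids) -> dict:
    """Map each interval identifier to a storage policy name at table creation."""
    ordered = sorted(interval_ids, key=lambda text: int(text, 16))
    policies = {}
    for position, interval_id in enumerate(ordered):
        # Assumption: the smallest identifier is the cold interval (policy "HOT",
        # ordinary disks); all others are hot intervals (policy "ALL_SSD").
        policies[interval_id] = "HOT" if position == 0 else "ALL_SSD"
    return policies
```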
In the method for establishing the Hbase table shown in fig. 11 provided by the embodiment of the present invention, the Hbase table to be established includes both a hot storage interval for storing hot data and a cold storage interval for storing cold data, so that the storage burden that cold-hot data changes place on Hbase is reduced. The storage units in each storage interval of the Hbase table are identified by an interval identifier and a unit salting-out identifier. Because the interval identifier is determined according to the storage period of the storage interval, the stored data can be partitioned by time, which makes it convenient to manage data with the same or similar generation times together. Further, the unit salting-out identifier prevents too much data from being stored in a single storage unit and ensures load balancing among the storage units.
In one embodiment, after the Hbase table is established, each time a second preset duration elapses, a third preset number of storage units may be split off (split) at the tail storage unit, with their unit identifiers incremented in the manner described above.
Based on the same inventive concept, according to the method for converting the cold and hot data in the Hbase table provided by the embodiment of the present invention, the embodiment of the present invention further provides a device for converting the cold and hot data in the Hbase table, as shown in fig. 13, where the device includes:
a storage interval determining module 1301, configured to determine, among the thermal storage intervals included in the Hbase table, a thermal storage interval for which the duration from the end time of the corresponding storage period to the current time is greater than a first preset duration, as a target thermal storage interval, where a thermal storage interval is a storage interval for storing hot data on a first hard disk, and the storage period corresponding to a storage interval indicates that the storage interval is used to store data whose generation time lies in that storage period;
The storage section changing module 1302 is configured to change the target thermal storage section to a cold storage section for storing cold data, so that the thermal data stored in the target thermal storage section is converted into cold data, the cold storage section is a storage section for storing cold data on the second hard disk, and the read-write performance of the first hard disk is higher than the read-write performance of the second hard disk.
Further, each storage interval in the Hbase table corresponds to an interval identifier, the interval identifier is a character string in a first preset format, and the duration from the end time of one storage period to the current time is inversely proportional to the value of the character string of the interval identifier corresponding to the storage period;
The storage interval determining module is specifically configured to encode, according to a preset first encoding policy, a current time to obtain a reference identifier, where the reference identifier is a character string in a first preset format corresponding to a time different from the current time by a first preset time, and in each thermal storage interval, determine a thermal storage interval in which a value of the character string of the interval identifier is smaller than a value of the character string of the reference identifier, as a target thermal storage interval.
Further, each storage interval in the Hbase table includes storage units;
The storage interval changing module is specifically configured to change a thermal storage policy of a storage unit included in the target thermal storage interval to a cold storage policy, so that the storage unit included in the target thermal storage interval is transferred from the first hard disk to the second hard disk.
In the cold-hot data conversion device in the Hbase table shown in fig. 13, a storage interval meeting the condition for storing cold data can be determined according to the duration from the end time of its storage period to the current time, and that storage interval is then set as a storage interval for storing cold data. The data stored in it therefore need not be migrated piece by piece into another Hbase table for storing cold data. In other words, when hot data is converted into cold data, the individual data items are not migrated; only the storage interval containing them is moved from the first hard disk to the second hard disk, and the whole process involves no change to storage information such as the data query mode, thereby reducing the storage burden that cold-hot data changes place on Hbase.
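The behaviour of the two modules can be sketched as a single function over a policy table. This is a minimal illustration that does not touch HBase or HDFS: the function name is hypothetical, the policy names ALL_SSD and HOT follow the table-establishing embodiment, and the reference identifier is assumed to have been obtained already by encoding the current time.

```python
def convert_expired_hot_intervals(policies: dict, reference_id: str) -> list:
    """Switch hot intervals whose identifier is below the reference identifier to cold.

    policies maps interval identifier -> policy name and is updated in place.
    Returns the identifiers that were changed.
    """
    reference_value = int(reference_id, 16)
    changed = []
    for interval_id, policy in policies.items():
        if policy == "ALL_SSD" and int(interval_id, 16) < reference_value:
            policies[interval_id] = "HOT"  # cold policy: ordinary disks
            changed.append(interval_id)
    return changed
```

Only the policy attached to the interval changes; the rowkeys of the stored data, and therefore the way the data is queried, stay the same.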
Based on the same inventive concept, according to the method for storing data shown in fig. 4 provided by the embodiment of the present invention, the embodiment of the present invention further provides a device for storing data, as shown in fig. 14, where the device includes:
A generating time determining module 1401, configured to determine, for data to be stored in an Hbase table, a generating time of the data to be stored, where the Hbase table includes a hot storage section and a cold storage section, the hot storage section is a storage section for storing hot data on a first hard disk, the cold storage section is a storage section for storing cold data on a second hard disk, a read-write performance of the first hard disk is higher than a read-write performance of the second hard disk, each storage section in the Hbase table corresponds to a storage period, and a storage period corresponding to one storage section indicates that the storage section is used for storing data with the generating time located in the storage period;
The data storage module 1402 is configured to store data to be stored in a storage interval corresponding to a storage period to which a generation time of the data to be stored belongs in the Hbase table.
Further, each storage interval in the Hbase table corresponds to an interval identifier, the interval identifier is a character string in a first preset format, and the duration from the end time of one storage period to the current time is inversely proportional to the value of the character string of the interval identifier corresponding to the storage period;
The device further comprises:
The first encoding module is configured to, before the data storage module 1402 stores the data to be stored in the storage interval corresponding to the storage period to which the generation time of the data to be stored belongs in the Hbase table, encode the generation time of the data to be stored according to a preset second encoding policy to obtain a character string in the first preset format as the data time identifier of the data to be stored, where the character string value corresponding to the data time identifier of a piece of data is inversely proportional to the duration between the generation time of the data and the current time;
The data storage module is specifically configured to determine a storage interval in the Hbase table, where the storage interval is matched with the data time identifier, as a target storage interval, and store data to be stored in the target storage interval.
Further, each storage interval in the Hbase table includes storage units;
the data storage module is specifically configured to randomly store data to be stored in a storage unit included in the target storage interval.
Further, the storage interval comprises a first preset number of storage units, each storage unit corresponds to a unit salting-out identifier, and the character string values corresponding to the unit salting-out identifiers of the storage units belonging to the same storage interval are continuous first preset number of values from a preset threshold value;
The apparatus further comprises:
The salting-out identifier determining module is configured to, before the data storage module determines the storage interval in the Hbase table whose interval identifier matches the data time identifier as the target storage interval, acquire a data line identifier of the data to be stored; perform a hash operation on the data line identifier to obtain a hash operation result of the data line identifier, the hash operation result being a numerical value; take the remainder of the hash operation result divided by the first preset number to obtain a remainder value; and determine the data salting-out identifier of the data to be stored based on the sum of the remainder value and the preset threshold, where the character string value of the data salting-out identifier is that sum;
the data storage module is specifically configured to determine, among the storage units included in the target storage interval, the storage unit whose unit salting-out identifier is the same as the data salting-out identifier as a target storage unit, and store the data to be stored in the target storage unit.
In the data storage device shown in fig. 14, because the Hbase table includes the hot storage section and the cold storage section, the data to be stored can be determined to be stored in the hot storage section or in the cold storage section according to the generation time of the data to be stored, and the hot data and the cold data can be stored in one Hbase table at the same time, so that the storage burden of the Hbase caused by the change of the hot and cold data is reduced.
Based on the same inventive concept, according to the method for establishing an Hbase table provided by the embodiment of the present invention, the embodiment of the present invention further provides an apparatus for establishing an Hbase table, as shown in fig. 15, where the apparatus includes:
a reference time determining module 1501, configured to determine a time different from the current time by a first preset duration as a reference time;
a storage interval generating module 1502, configured to generate a storage interval based on the reference time, where the storage interval includes a first storage interval in which the start time of the corresponding storage period is greater than the reference time, and a second storage interval in which the end time of the corresponding storage period is not greater than the reference time, and the storage period corresponding to one storage interval indicates that the storage interval is used to store data whose generation time falls within that storage period;
The storage interval setting module 1503 is configured to set the first storage interval as a hot storage interval and set the second storage interval as a cold storage interval, where the hot storage interval is a storage interval for storing hot data on the first hard disk, the cold storage interval is a storage interval for storing cold data on the second hard disk, and the read-write performance of the first hard disk is higher than the read-write performance of the second hard disk.
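The hot/cold rule applied by modules 1501 to 1503 can be illustrated with a short sketch. It assumes that the reference time lies the first preset duration before the current time; the function name `classify_interval` and the labels `"HOT"` and `"COLD"` are hypothetical. A period that straddles the reference time satisfies neither condition stated above, so the sketch returns `None` for it rather than guessing.

```python
from datetime import datetime, timedelta
from typing import Optional


def classify_interval(start: datetime, end: datetime, now: datetime,
                      first_preset_duration: timedelta) -> Optional[str]:
    """Classify a storage period against the reference time."""
    reference = now - first_preset_duration  # assumed to lie in the past
    if start > reference:
        return "HOT"    # first storage interval: start time greater than reference
    if end <= reference:
        return "COLD"   # second storage interval: end time not greater than reference
    return None         # straddles the reference time; not covered by the rule
```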
Further, the storage interval generating module is specifically configured to generate a first preset number of continuous strings according to a first preset format, as interval identifiers of storage intervals included in the Hbase table to be established, and generate storage intervals corresponding to the interval identifiers based on a reference time, where the interval identifier with the smallest string value corresponds to the storage interval with the ending time smaller than the reference time, and a duration from the ending time of one storage period to the current time is inversely proportional to a size of the string value of the interval identifier corresponding to the storage period.
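One way to obtain interval identifiers with this ordering property is to encode a time as a fixed-width string, so that comparing character string values is equivalent to comparing times. The sketch below assumes a `yyyyMMdd` format with one storage interval per day; the format and the names `encode_time` and `ids_below_reference` are illustrative assumptions, not the first preset format of the embodiment.

```python
from datetime import datetime, timedelta
from typing import List


def encode_time(t: datetime) -> str:
    """Encode a time as a fixed-width string (assumed format: yyyyMMdd)."""
    return t.strftime("%Y%m%d")


def ids_below_reference(interval_ids: List[str], now: datetime,
                        first_preset_duration: timedelta) -> List[str]:
    """Return identifiers whose string value is smaller than the reference identifier."""
    reference_id = encode_time(now - first_preset_duration)
    # Fixed-width digit strings compare lexicographically in time order.
    return [i for i in interval_ids if i < reference_id]
```

With such an encoding, an older storage period always has a smaller identifier, so the storage intervals lying before the reference time can be found by string comparison alone.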
Further, the apparatus further comprises:
a first preset number determining module, configured to, before the storage interval generating module 1502 generates a first preset number of continuous strings according to a first preset format as interval identifiers of the storage intervals included in the Hbase table to be established, obtain a first preset duration and a hot-cold data conversion granularity, where the hot-cold data conversion granularity represents the granularity, in the time dimension, at which hot data is converted into cold data, calculate the quotient of the first preset duration divided by the duration corresponding to the hot-cold data conversion granularity, and use the sum of the quotient and a second preset number as the first preset number.
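The calculation of the first preset number is simple arithmetic and can be sketched as below. The function name `first_preset_number` and the example values are assumptions chosen for illustration; in particular, the value of the second preset number is not given in this passage.

```python
from datetime import timedelta


def first_preset_number(first_preset_duration: timedelta,
                        conversion_granularity: timedelta,
                        second_preset_number: int) -> int:
    """Number of storage intervals: quotient of the two durations plus an extra count."""
    if conversion_granularity <= timedelta(0):
        raise ValueError("conversion_granularity must be positive")
    # Floor division of two timedeltas yields an integer quotient.
    quotient = first_preset_duration // conversion_granularity
    return quotient + second_preset_number
```

For example, with a first preset duration of 30 days, a conversion granularity of 1 day, and an assumed second preset number of 2, the function returns 32.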
Further, the storage periods of the storage intervals whose start time is greater than the reference time and smaller than the current time have the same duration, namely the second preset duration, and the two storage periods corresponding to every two adjacent interval identifiers are adjacent;
the salting-out identifier generation module is configured to, before the storage interval generation module generates the storage interval corresponding to each interval identifier based on the reference time, calculate the ratio of the second preset duration to the duration corresponding to the data storage granularity as a third preset number, where the data storage granularity is the minimum duration granularity of data storage, and generate a third preset number of continuous character strings according to the second preset format as unit salting-out identifiers;
the storage interval generation module is specifically configured to, for each interval identifier, combine the interval identifier with each of the third preset number of unit salting-out identifiers to generate a third preset number of unit identifiers corresponding to the interval identifier, and generate the storage unit corresponding to each unit identifier based on the reference time, where storage units with the same interval identifier form the storage interval corresponding to that interval identifier.
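The generation of unit identifiers described above can be sketched as follows. The sketch assumes that unit salting-out identifiers are zero-padded numbers starting at 0 and that a unit identifier is formed by appending the unit salting-out identifier to the interval identifier; the function names, the string format, and the concatenation order are all illustrative assumptions.

```python
from datetime import timedelta
from typing import Dict, List


def unit_salt_ids(second_preset_duration: timedelta,
                  storage_granularity: timedelta, width: int = 2) -> List[str]:
    """Generate the third preset number of consecutive unit salting-out identifiers."""
    third_preset_number = second_preset_duration // storage_granularity
    return [str(i).zfill(width) for i in range(third_preset_number)]


def build_unit_ids(interval_ids: List[str], salts: List[str]) -> Dict[str, List[str]]:
    """Combine each interval identifier with every unit salting-out identifier."""
    return {iid: [iid + salt for salt in salts] for iid in interval_ids}
```

Storage units that share an interval identifier appear under the same key, mirroring the statement that they together form the storage interval for that identifier.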
Further, the storage interval setting module is specifically configured to set a storage policy of a storage unit that constitutes the first storage interval to a hot storage policy that stores hot data, and set a storage policy of a storage unit that constitutes the second storage interval to a cold storage policy that stores cold data.
In the apparatus for establishing the Hbase table shown in fig. 15 according to the embodiment of the present invention, the Hbase table to be established includes both a hot storage section for storing hot data and a cold storage section for storing cold data, so that the storage burden placed on Hbase by the change between hot and cold data is reduced.
The embodiment of the present invention also provides an electronic device, as shown in fig. 16, including a processor 1601, a communication interface 1602, a memory 1603, and a communication bus 1604, where the processor 1601, the communication interface 1602, and the memory 1603 perform communication with each other through the communication bus 1604,
A memory 1603 for storing a computer program;
the processor 1601 is configured to execute a program stored in the memory 1603, and implement the following steps:
Determining, among the hot storage sections contained in an Hbase table, a hot storage section for which the duration from the end time of its storage period to the current time is longer than a first preset duration as a target hot storage section, wherein a hot storage section is a storage section for storing hot data on a first hard disk, and the storage period corresponding to one storage section indicates that the storage section is used for storing data whose generation time falls within that storage period;
and changing the target hot storage section into a cold storage section for storing cold data, so that the hot data stored in the target hot storage section is converted into cold data, wherein the cold storage section is a storage section for storing cold data on a second hard disk, and the read-write performance of the first hard disk is higher than that of the second hard disk.
In the electronic device shown in fig. 16, a storage interval in the Hbase table that meets the condition for storing cold data can be determined according to the duration from the end time of its storage period to the current time, and that storage interval is then set as a storage interval for storing cold data. The data stored in the storage interval does not need to be transferred one by one to a separate Hbase table for storing cold data. That is, when hot data is converted into cold data, the individual data items are not moved; only the storage interval where the data is located is transferred from the first hard disk to the second hard disk, and the transfer does not change storage information such as the data query mode, thereby reducing the storage burden placed on Hbase by the change between hot and cold data.
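The two steps executed by the processor 1601 can be sketched as follows. This is an in-memory illustration only: the `StorageInterval` class, the `"HOT"` and `"COLD"` labels, and the function name are hypothetical, and the sketch merely flips a policy label rather than calling any real Hbase or file system interface.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class StorageInterval:
    interval_id: str
    period_end: datetime  # end time of the corresponding storage period
    policy: str           # "HOT" or "COLD" (assumed labels)


def convert_hot_to_cold(intervals: List[StorageInterval], now: datetime,
                        first_preset_duration: timedelta) -> List[str]:
    """Mark hot intervals whose period ended long enough ago as cold."""
    converted = []
    for interval in intervals:
        age = now - interval.period_end
        if interval.policy == "HOT" and age > first_preset_duration:
            interval.policy = "COLD"  # whole interval changes, not individual rows
            converted.append(interval.interval_id)
    return converted
```

The rows themselves are never touched in this sketch, which reflects the point made above: the unit of conversion is the storage interval, so row keys and query paths stay unchanged.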
The embodiment of the present invention further provides an electronic device, as shown in fig. 17, including a processor 1701, a communication interface 1702, a memory 1703 and a communication bus 1704, where the processor 1701, the communication interface 1702, the memory 1703 complete communication with each other through the communication bus 1704,
A memory 1703 for storing a computer program;
the processor 1701 is configured to execute the program stored in the memory 1703, and implement the following steps:
Determining, for data to be stored in an Hbase table, the generation time of the data to be stored, wherein the Hbase table comprises a hot storage interval and a cold storage interval, the hot storage interval is a storage interval for storing hot data on a first hard disk, the cold storage interval is a storage interval for storing cold data on a second hard disk, the read-write performance of the first hard disk is higher than that of the second hard disk, each storage interval in the Hbase table corresponds to a storage period, and the storage period corresponding to one storage interval indicates that the storage interval is used for storing data whose generation time falls within that storage period;
and storing the data to be stored into a storage interval corresponding to a storage period to which the generation time of the data to be stored belongs in the Hbase table.
In the electronic device shown in fig. 17, because the Hbase table includes both the hot storage section and the cold storage section, whether the data to be stored is stored in the hot storage section or in the cold storage section can be determined according to the generation time of the data to be stored. Hot data and cold data can thus be stored in one Hbase table at the same time, which reduces the storage burden placed on Hbase by the change between hot and cold data.
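Selecting the storage interval for a piece of data from its generation time can be sketched as below. The tuple layout, the function name `target_interval`, and the half-open boundary convention (start inclusive, end exclusive) are assumptions made for illustration; the text does not state how a generation time exactly on a period boundary is handled.

```python
from datetime import datetime
from typing import Optional, Sequence, Tuple

# (period start, period end, interval identifier); layout is an assumption
Period = Tuple[datetime, datetime, str]


def target_interval(periods: Sequence[Period], generated_at: datetime) -> Optional[str]:
    """Return the identifier of the interval whose storage period contains the time."""
    for start, end, interval_id in periods:
        if start <= generated_at < end:  # assumed half-open period
            return interval_id
    return None  # no storage period covers this generation time
```

Whether the selected interval is currently hot or cold plays no part in the lookup, which is why hot data and cold data can share one table.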
The embodiment of the present invention also provides an electronic device, as shown in fig. 18, including a processor 1801, a communication interface 1802, a memory 1803, and a communication bus 1804, where the processor 1801, the communication interface 1802, the memory 1803 complete communication with each other through the communication bus 1804,
A memory 1803 for storing a computer program;
the processor 1801 is configured to execute the program stored in the memory 1803, and implement the following steps:
Determining a time which is different from the current time by a first preset time length as a reference time;
Generating a storage interval based on the reference time, wherein the storage interval comprises a first storage interval of which the starting time of a corresponding storage time period is greater than the reference time and a second storage interval of which the ending time of the corresponding storage time period is not greater than the reference time, and the storage time period corresponding to one storage interval represents that the storage interval is used for storing data of which the generation time is positioned in the storage time period;
Setting the first storage interval as a hot storage interval and the second storage interval as a cold storage interval, wherein the hot storage interval is a storage interval for storing hot data on a first hard disk, the cold storage interval is a storage interval for storing cold data on a second hard disk, and the read-write performance of the first hard disk is higher than that of the second hard disk.
In the electronic device shown in fig. 18, the Hbase table to be established includes both a hot storage section for storing hot data and a cold storage section for storing cold data, so that the storage burden placed on Hbase by the change between hot and cold data is reduced.
It should be noted that other implementations of the above methods by the electronic devices are the same as those described in the foregoing method embodiments and are not repeated here.
The communication bus mentioned above for the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be classified as an address bus, a data bus, a control bus, or the like. For ease of illustration, the bus is represented by only one bold line in the figures, but this does not mean that there is only one bus or only one type of bus.
The communication interface is used for communication between the electronic device and other devices.
The memory may include random access memory (RAM), or may include non-volatile memory (NVM), such as at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor.
The processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), etc.; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In a further embodiment of the present invention, a computer readable storage medium is provided, in which a computer program is stored, which when being executed by a processor, implements the steps of any of the above-mentioned methods for converting cold and hot data in an Hbase table.
In yet another embodiment of the present invention, there is also provided a computer readable storage medium having stored therein a computer program which, when executed by a processor, implements the steps of any of the data storage methods described above.
In a further embodiment of the present invention, a computer readable storage medium is provided, in which a computer program is stored, which when executed by a processor, implements the steps of any of the above-mentioned Hbase table establishment methods.
In a further embodiment of the present invention, a computer program product comprising instructions which, when run on a computer, causes the computer to perform the method of cold-hot data conversion in an Hbase table according to any of the above embodiments is also provided.
In yet another embodiment of the present invention, there is also provided a computer program product containing instructions that, when run on a computer, cause the computer to perform the method of storing data of any of the above embodiments.
In a further embodiment of the present invention, a computer program product comprising instructions is provided which, when run on a computer, causes the computer to perform the method of establishing an Hbase table according to any of the embodiments described above.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the present invention are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example, by wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.) means. The computer readable storage medium can be any available medium that can be accessed by a computer or a Hbase table cold-hot data conversion device including one or more servers, data centers, etc. in which the available medium is integrated. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid state disk (SSD)), etc.
It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
In this specification, the embodiments are described in a related manner; for identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. In particular, the embodiments of the apparatus, the electronic device, the computer readable storage medium, and the computer program product are described relatively briefly because they are substantially similar to the method embodiments; for relevant details, see the corresponding parts of the description of the method embodiments.
The foregoing description is only of the preferred embodiments of the present invention and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention are included in the protection scope of the present invention.
Claims (14)
1. The Hbase table cold-hot data conversion method is characterized by comprising the following steps of:
Determining, among the hot storage sections contained in an Hbase table, a hot storage section for which the duration from the end time of its storage period to the current time is longer than a first preset duration as a target hot storage section, wherein a hot storage section is a storage section for storing hot data on a first hard disk, and the storage period corresponding to one storage section indicates that the storage section is used for storing data whose generation time falls within that storage period;
Changing the target hot storage section into a cold storage section for storing cold data, so that the hot data stored in the target hot storage section is converted into cold data, wherein the cold storage section is a storage section for storing cold data on a second hard disk, and the read-write performance of the first hard disk is higher than that of the second hard disk;
The number of storage intervals contained in the Hbase table is a first preset number;
the first preset number is obtained by the following steps:
acquiring a first preset duration and a cold-hot data conversion granularity, wherein the cold-hot data conversion granularity represents the granularity, in the time dimension, at which hot data is converted into cold data;
calculating the quotient of the first preset duration divided by the duration corresponding to the cold-hot data conversion granularity;
And taking the sum of the quotient and the second preset quantity as a first preset quantity.
2. The method of claim 1, wherein each storage section in the Hbase table corresponds to a section identifier, the section identifier is a string in a first preset format, and a duration from an end time of one storage period to a current time is inversely proportional to a value of the string of the section identifier corresponding to the storage period;
The determining, among the hot storage sections contained in the Hbase table, a hot storage section for which the duration from the end time of its storage period to the current time is longer than the first preset duration as a target hot storage section comprises:
Encoding the current time according to a preset first encoding strategy based on the first preset duration to obtain a reference identifier, wherein the reference identifier is a character string in the first preset format corresponding to the time that differs from the current time by the first preset duration;
and determining, among the hot storage sections, a hot storage section whose section identifier has a character string value smaller than the character string value of the reference identifier as the target hot storage section.
3. The method according to claim 1 or 2, characterized in that each storage section in the Hbase table comprises storage units;
The changing the target hot storage section into a cold storage section for storing cold data, so that the hot data stored in the target hot storage section is converted into cold data, comprises:
Modifying the hot storage policy of the storage units contained in the target hot storage section into a cold storage policy, so as to transfer the storage units contained in the target hot storage section from the first hard disk to the second hard disk.
4. A method of storing data, comprising:
Determining, for data to be stored in an Hbase table, the generation time of the data to be stored, wherein the Hbase table comprises a hot storage interval and a cold storage interval, the hot storage interval is a storage interval for storing hot data on a first hard disk, the cold storage interval is a storage interval for storing cold data on a second hard disk, the read-write performance of the first hard disk is higher than that of the second hard disk, each storage interval in the Hbase table corresponds to a storage period, and the storage period corresponding to one storage interval indicates that the storage interval is used for storing data whose generation time falls within that storage period;
Storing the data to be stored into a storage interval corresponding to a storage period to which the generation time of the data to be stored belongs in the Hbase table;
The storage interval comprises a first preset number of storage units, each storage unit corresponds to a unit salting-out identifier, and character string values corresponding to the unit salting-out identifiers of the storage units belonging to the same storage interval are the first preset number of values which are continuous from a preset threshold value;
Before the data to be stored is stored in the storage interval corresponding to the storage period to which the generation time of the data to be stored belongs in the Hbase table, the method further comprises:
acquiring a data line identifier of the data to be stored;
carrying out hash operation on the data line identifier to obtain a hash operation result of the data line identifier, wherein the hash operation result is a numerical value;
Taking the remainder of the hash operation result with respect to the first preset quantity to obtain a remainder value;
Determining a data salting-out identifier of the data to be stored based on the numerical sum of the remainder value and the preset threshold value, wherein the character string value of the data salting-out identifier is the numerical sum;
storing the data to be stored in a storage interval corresponding to a storage period to which the generation time of the data to be stored belongs in the Hbase table, including:
Determining a storage unit with the same unit salting-out identification as the data salting-out identification in storage units contained in a storage interval corresponding to a storage period to which the generation time of the data to be stored in the Hbase table belongs as a target storage unit;
and storing the data to be stored into the target storage unit.
5. The method of claim 4, wherein each storage section in the Hbase table corresponds to a section identifier, the section identifier is a string in a first preset format, and a duration from an end time of one storage period to a current time is inversely proportional to a value of the string of the section identifier corresponding to the storage period;
Before the data to be stored is stored in the storage interval corresponding to the storage period to which the generation time of the data to be stored belongs in the Hbase table, the method further comprises:
Encoding the generation time of the data to be stored according to a preset second encoding strategy to obtain a character string in the first preset format, wherein the character string is used as a data time identifier of the data to be stored, and the size of a character string value corresponding to the data time identifier of one data is inversely proportional to the duration between the generation time of the data and the current time;
The storing the data to be stored in the storage interval corresponding to the storage period to which the generation time of the data to be stored belongs in the Hbase table includes:
determining a storage interval of which the corresponding interval identifier in the Hbase table is matched with the data time identifier as a target storage interval;
and storing the data to be stored into the target storage interval.
6. The method of claim 5, wherein each storage interval in the Hbase table comprises storage units;
the storing the data to be stored in the target storage interval includes:
and randomly storing the data to be stored into a storage unit contained in the target storage interval.
7. The method for establishing the Hbase table is characterized by comprising the following steps:
Determining a time which is different from the current time by a first preset time length as a reference time;
Generating a storage interval based on the reference time, wherein the storage interval comprises a first storage interval of which the starting time of a corresponding storage time period is greater than the reference time and a second storage interval of which the ending time of the corresponding storage time period is not greater than the reference time, and the storage time period corresponding to one storage interval represents that the storage interval is used for storing data of which the generation time is positioned in the storage time period;
setting the first storage interval as a hot storage interval and the second storage interval as a cold storage interval, wherein the hot storage interval is a storage interval for storing hot data on a first hard disk, the cold storage interval is a storage interval for storing cold data on a second hard disk, and the read-write performance of the first hard disk is higher than that of the second hard disk;
The number of the generated storage intervals is a first preset number;
the first preset number is obtained by the following steps:
acquiring a first preset duration and a cold-hot data conversion granularity, wherein the cold-hot data conversion granularity represents the granularity, in the time dimension, at which hot data is converted into cold data;
calculating the quotient of the first preset duration divided by the duration corresponding to the cold-hot data conversion granularity;
And taking the sum of the quotient and the second preset quantity as a first preset quantity.
8. The method of claim 7, wherein the generating a storage interval based on the reference time instant comprises:
generating a first preset number of continuous character strings according to a first preset format, and using the character strings as interval identifiers of storage intervals contained in an Hbase table to be established;
And generating a storage interval corresponding to each interval identifier based on the reference time, wherein the interval identifier with the smallest character string value corresponds to the storage interval with the ending time smaller than the reference time, and the duration from the ending time of one storage period to the current time is inversely proportional to the character string value of the interval identifier corresponding to the storage period.
9. The method of claim 8, wherein the storage periods of the storage intervals whose start time is greater than the reference time and less than the current time have the same duration, which is a second preset duration, and the two storage periods corresponding to every two adjacent interval identifiers are adjacent;
before the generating the storage section corresponding to each section identifier based on the reference time, the method further comprises:
Calculating the ratio of the second preset time length to the time length corresponding to the data storage granularity, wherein the data storage granularity is the minimum time length granularity of data storage, and the ratio is used as a third preset quantity;
Generating a continuous third preset number of character strings according to a second preset format, and using the character strings as unit salting-out marks;
The generating a storage section corresponding to each section identifier based on the reference time includes:
For each interval identifier, respectively combining the interval identifier with the third preset number of unit salting-out identifiers to generate the third preset number of unit identifiers corresponding to each interval identifier;
And generating storage units corresponding to each unit identifier based on the reference time, wherein the storage units with the same interval identifier form a storage interval corresponding to the interval identifier.
10. The method of claim 9, wherein the setting the first storage interval as a hot storage interval and the second storage interval as a cold storage interval comprises:
Setting a storage policy of the storage units constituting the first storage section as a hot storage policy storing hot data, and setting a storage policy of the storage units constituting the second storage section as a cold storage policy storing cold data.
11. An Hbase table cold-hot data conversion device, comprising:
a storage interval determining module, configured to determine, among the hot storage intervals included in the Hbase table, a hot storage interval for which the duration from the end time of its storage period to the current time is greater than a first preset duration as a target hot storage interval, where a hot storage interval is a storage interval for storing hot data on a first hard disk, and the storage period corresponding to one storage interval indicates that the storage interval is used to store data whose generation time falls within that storage period;
The storage interval changing module is used for changing the target hot storage interval into a cold storage interval for storing cold data, so that the hot data stored in the target hot storage interval is converted into cold data, wherein the cold storage interval is a storage interval for storing cold data on a second hard disk, and the read-write performance of the first hard disk is higher than that of the second hard disk;
The number of storage intervals contained in the Hbase table is a first preset number;
the first preset number is obtained by the following steps:
acquiring a first preset duration and a cold-hot data conversion granularity, wherein the cold-hot data conversion granularity represents the granularity, in the time dimension, at which hot data is converted into cold data;
calculating the quotient of the first preset duration divided by the duration corresponding to the cold-hot data conversion granularity;
And taking the sum of the quotient and the second preset quantity as a first preset quantity.
12. A data storage device, comprising:
The generation time determining module is used for determining, for data to be stored in an Hbase table, the generation time of the data to be stored, wherein the Hbase table comprises a hot storage interval and a cold storage interval, the hot storage interval is a storage interval for storing hot data on a first hard disk, the cold storage interval is a storage interval for storing cold data on a second hard disk, the read-write performance of the first hard disk is higher than that of the second hard disk, each storage interval in the Hbase table corresponds to a storage period, and the storage period corresponding to one storage interval indicates that the storage interval is used for storing data whose generation time falls within that storage period;
the data storage module is used for storing the data to be stored into a storage interval corresponding to a storage period to which the generation time of the data to be stored belongs in the Hbase table;
The storage interval comprises a first preset number of storage units, each storage unit corresponds to a unit salting-out identifier, and character string values corresponding to the unit salting-out identifiers of the storage units belonging to the same storage interval are the first preset number of values which are continuous from a preset threshold value;
the apparatus further comprises:
The salting-out identification determining module is used for: acquiring a data line identifier of the data to be stored before the data storage module stores the data to be stored into the storage interval corresponding to the storage period to which the generation time of the data to be stored belongs in the Hbase table; performing a hash operation on the data line identifier to obtain a hash operation result, wherein the hash operation result is a numerical value; taking the remainder of the hash operation result divided by the first preset number to obtain a remainder value; and determining a data salting-out identifier of the data to be stored based on the numerical sum of the remainder value and the preset threshold value, wherein the character string value of the data salting-out identifier is the numerical sum;
The data storage module is specifically configured to determine, as a target storage unit, a storage unit with a unit salting-out identifier identical to the data salting-out identifier, from storage units included in a storage interval corresponding to a storage period to which the generation time of the data to be stored belongs in the Hbase table; and storing the data to be stored into the target storage unit.
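The salting steps of this claim (hash the data line identifier, take the remainder with respect to the first preset number, add the preset threshold, then pick the matching storage unit) can be sketched as follows. This is only an illustration under assumptions: the claim does not name a hash function, so CRC32 is used as a stand-in, and the function names, the dictionary that models the storage units, and the threshold value 100 are all hypothetical.

```python
import zlib


def data_salt_identifier(row_id: str, unit_count: int, preset_threshold: int) -> str:
    """Return the string value of threshold + (hash(row_id) mod unit_count)."""
    if unit_count <= 0:
        raise ValueError("unit_count must be positive")
    # CRC32 is an assumed stand-in for the unspecified hash operation;
    # it returns the same non-negative integer on every run.
    hash_value = zlib.crc32(row_id.encode("utf-8"))
    remainder = hash_value % unit_count
    return str(preset_threshold + remainder)


def target_storage_unit(units: dict, row_id: str, preset_threshold: int) -> list:
    """Select the unit whose unit identifier equals the data identifier."""
    salt = data_salt_identifier(row_id, len(units), preset_threshold)
    return units[salt]


if __name__ == "__main__":
    # Four hypothetical storage units with identifiers "100".."103".
    units = {str(100 + i): [] for i in range(4)}
    target_storage_unit(units, "row-42", 100).append("payload")
    print({key: len(rows) for key, rows in units.items()})
```

Because the unit identifiers are consecutive values starting from the threshold, the computed identifier always matches exactly one unit of the storage interval.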
13. An apparatus for establishing an Hbase table, comprising:
The reference time determining module is used for determining, as the reference time, a time that differs from the current time by a first preset duration;
A storage interval generating module, configured to generate storage intervals based on the reference time, where the storage intervals include a first storage interval whose corresponding storage period has a start time greater than the reference time, and a second storage interval whose corresponding storage period has an end time not greater than the reference time, and the storage period corresponding to a storage interval indicates that the storage interval is used to store data whose generation time falls within that storage period;
A storage interval setting module, configured to set the first storage interval as a hot storage interval and set the second storage interval as a cold storage interval, where the hot storage interval is a storage interval for storing hot data on a first hard disk, the cold storage interval is a storage interval for storing cold data on a second hard disk, and read-write performance of the first hard disk is higher than read-write performance of the second hard disk;
The number of the generated storage intervals is a first preset number;
the first preset number is obtained by the following steps:
acquiring a first preset duration and a cold-hot data conversion granularity, wherein the cold-hot data conversion granularity represents the granularity, in the time dimension, at which hot data is converted into cold data;
calculating the quotient of the first preset duration and the duration corresponding to the cold-hot data conversion granularity;
and taking the sum of the quotient and a second preset number as the first preset number.
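As a rough illustration of how the modules of this claim fit together, the sketch below derives a reference time, generates consecutive storage periods, and labels each one hot or cold. All names, the day-aligned periods, and the dates are assumptions. A period that straddles the reference time is labeled `None`, because the claim wording (start greater than, or end not greater than, the reference time) does not cover that case.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


def reference_time(now: datetime, first_preset_duration: timedelta) -> datetime:
    """Reference time: a first preset duration before the current time (assumed direction)."""
    return now - first_preset_duration


def classify(start: datetime, end: datetime, ref: datetime) -> Optional[str]:
    """Hot if the period starts after ref, cold if it ends at or before ref."""
    if start > ref:
        return "hot"
    if end <= ref:
        return "cold"
    return None  # period straddles ref; not covered by the claim wording


def generate_intervals(first_start: datetime, granularity: timedelta, count: int,
                       ref: datetime) -> List[Tuple[datetime, datetime, Optional[str]]]:
    """Build `count` consecutive periods of length `granularity` and label each."""
    intervals = []
    for i in range(count):
        start = first_start + i * granularity
        end = start + granularity
        intervals.append((start, end, classify(start, end, ref)))
    return intervals


if __name__ == "__main__":
    ref = reference_time(datetime(2020, 5, 20, 12), timedelta(days=7))
    for start, end, label in generate_intervals(datetime(2020, 5, 11), timedelta(days=1), 5, ref):
        print(start.date(), end.date(), label)
```

In this hypothetical run the reference time is noon on 2020-05-13, so the periods ending on or before that moment are labeled cold and the periods starting after it are labeled hot.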
14. An electronic device, comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with one another through the communication bus;
the memory is configured to store a computer program;
the processor is configured to carry out the method steps of any one of claims 1-3 or 4-6 when executing the program stored in the memory.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010430568.2A CN113704346B (en) | 2020-05-20 | 2020-05-20 | Hbase table cold-hot data conversion method and device and electronic equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010430568.2A CN113704346B (en) | 2020-05-20 | 2020-05-20 | Hbase table cold-hot data conversion method and device and electronic equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113704346A (en) | 2021-11-26 |
CN113704346B (en) | 2024-06-04 |
Family
ID=78645616
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010430568.2A Active CN113704346B (en) | 2020-05-20 | 2020-05-20 | Hbase table cold-hot data conversion method and device and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113704346B (en) |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103942289A (en) * | 2014-04-12 | 2014-07-23 | 广西师范大学 | Memory caching method oriented to range querying on Hadoop |
US9122588B1 (en) * | 2013-03-15 | 2015-09-01 | Virident Systems Inc. | Managing asymmetric memory system as a cache device |
KR20150128553A (en) * | 2014-05-08 | 2015-11-18 | 주식회사 알티베이스 | Hybrid DBMS and the method to manage table thereof |
WO2016023372A1 (en) * | 2014-08-14 | 2016-02-18 | 中兴通讯股份有限公司 | Data storage processing method and device |
CN105653524A (en) * | 2014-11-10 | 2016-06-08 | 阿里巴巴集团控股有限公司 | Data storage method, device and system |
WO2017092470A1 (en) * | 2015-12-01 | 2017-06-08 | 中兴通讯股份有限公司 | Data storage method and device |
CN109033360A (en) * | 2018-07-26 | 2018-12-18 | 腾讯科技(深圳)有限公司 | A kind of data query method, apparatus, server and storage medium |
CN110083649A (en) * | 2019-04-24 | 2019-08-02 | 北京电子工程总体研究所 | It is a kind of based on cold, warm, dsc data data management system and data managing method |
CN110795427A (en) * | 2019-09-27 | 2020-02-14 | 苏宁云计算有限公司 | Data separation storage method and device, computer equipment and storage medium |
CN110858210A (en) * | 2018-08-17 | 2020-03-03 | 阿里巴巴集团控股有限公司 | Data query method and device |
CN110888861A (en) * | 2019-11-12 | 2020-03-17 | 上海麦克风文化传媒有限公司 | Novel big data storage method |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10169355B2 (en) * | 2014-10-27 | 2019-01-01 | Tata Consultancy Services Limited | Knowledge representation in a multi-layered database |
US9696935B2 (en) * | 2015-04-24 | 2017-07-04 | Kabushiki Kaisha Toshiba | Storage device that secures a block for a stream or namespace and system having the storage device |
CA3050220A1 (en) * | 2018-07-19 | 2020-01-19 | Bank Of Montreal | Systems and methods for data storage and processing |
Non-Patent Citations (2)
Title |
---|
Cold Data Eviction Using Node Congestion Probability for HDFS based on Hybrid SSD; Nayoung Park et al.; SNPD 2015; full text * |
Survey of HDFS Storage and Optimization Techniques; Jin Guodong et al.; Journal of Software (软件学报); Vol. 31, No. 1; full text * |
Also Published As
Publication number | Publication date |
---|---|
CN113704346A (en) | 2021-11-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10642515B2 (en) | Data storage method, electronic device, and computer non-volatile storage medium | |
CN109885577B (en) | Data processing method, device, terminal and storage medium | |
US10880619B2 (en) | Verifying provenance of digital content | |
CN114077680B (en) | Graph data storage method, system and device | |
US10021050B2 (en) | Secure conversation and document binder | |
JP2014063217A (en) | Backup control program, backup control method, and information processor | |
CN114564446B (en) | File storage method, device, system and storage medium | |
EP3809708A1 (en) | Video data storage method and device in cloud storage system | |
CN111008181A (en) | A distributed file system storage policy switching method, system, terminal and storage medium | |
CN113704346B (en) | Hbase table cold-hot data conversion method and device and electronic equipment | |
CN101799785A (en) | Messaging device, information processing method and program | |
CN111628996A (en) | Electronic data communication method and system based on Internet of things | |
CN111190896B (en) | Data processing method, device, storage medium and computer equipment | |
CN112889039B (en) | Identification of records for post-cloning tenant identifier conversion | |
CN115270162B (en) | Multi-party calculation-based auditing and auditing pricing heterogeneous data online integration method and system | |
JP7328884B2 (en) | Data management computer and data management method | |
CN110968267B (en) | Data management method, device, server and system | |
CN116820323A (en) | Data storage method, device, electronic equipment and computer readable storage medium | |
JP7105717B2 (en) | Information processing device, extraction method, and program | |
US20200142867A1 (en) | Intelligent pushing method and system for related contents of newly-created entries | |
CN112232970A (en) | Data relationship identification method and device, storage medium and electronic equipment | |
JP6440256B2 (en) | How to search the database | |
CN117252160B (en) | Document editing method, device, equipment and medium | |
CN118626474B (en) | Data migration method, device and system | |
JP2006113663A (en) | Data storage system, its method, file server, terminal and program |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||