CN107704527B

CN107704527B - Data storage method, device and storage medium

Info

Publication number: CN107704527B
Application number: CN201710841916.3A
Authority: CN
Inventors: 钟超强; 毕杰山
Original assignee: Huawei Technologies Co Ltd
Current assignee: Huawei Cloud Computing Technologies Co Ltd
Priority date: 2017-09-18
Filing date: 2017-09-18
Publication date: 2020-05-08
Anticipated expiration: 2037-09-18
Also published as: WO2019052209A1; CN107704527A

Abstract

The application discloses a data storage method, a data storage device, and a storage medium, and belongs to the technical field of information processing. The method comprises: when at least one data record is acquired, determining, through a preset mapping/reduction model and based on the at least one data record, the bitmaps of the tag values in each bitmap index partition included in a bitmap index, so as to store the at least one data record into the corresponding bitmap index partitions. Because the bitmap index comprises at least one bitmap and each bitmap corresponds to one tag value, the bearer identifiers having a given tag value can be looked up through the bitmap index based on that tag value, which improves the efficiency of data queries based on tag values. In addition, the bitmaps of the tag values in the bitmap index partitions can be determined in parallel through the preset mapping/reduction model, which improves the efficiency of data storage.

Description

Data storage method, device and storage medium

Technical Field

The present application relates to the field of information processing technologies, and in particular, to a data storage method, an apparatus, and a storage medium.

Background

The Hadoop Database (HBase) is distributed, highly reliable, and high-performance, and stores data as key-value pairs, so more and more enterprises and users use HBase to construct data tables.

Typically, a data table comprises a plurality of rows of data records, each row comprising the identifier of a bearer and the tag value of each tag that the bearer has. For example, user A has two tag values, gender "female" and occupation "engineer", so the row corresponding to user A in the data table includes the identifier of user A, the tag value "female", and the tag value "engineer". That is, the data table records the correspondence between the identifier of a bearer and the tag values that the bearer has.

With this way of storing the data table, a query by bearer identifier is efficient. However, for a query by a certain tag value or a combination of tag values, the related art can only use a column value filter to check the tag values of the bearers row by row. Because the data table usually contains thousands or tens of thousands of rows, querying data by tag value is inefficient in the related art.

Disclosure of Invention

In order to solve the problem that the data query efficiency is low when data query is performed based on a tag value in the related art, the application provides a data storage method, a data storage device and a storage medium. The technical scheme is as follows:

in a first aspect, a data storage method is provided, the method including:

obtaining at least one data record, wherein each data record comprises a bearer identification and at least one label value;

based on the bearer identifier included in each data record, performing first-type classification on the at least one data record according to partition information of N first reduction partitions included in a preset mapping/reduction model to obtain at least one first mapping set, wherein each first mapping set corresponds to one first reduction partition;

the N first reduction partitions are determined according to partition information of N bitmap index partitions included in a bitmap index, where N is a positive integer, each bitmap index partition corresponds to one first reduction partition, each bitmap index partition includes at least one bitmap, each bitmap corresponds to one tag value, each bitmap includes at least one bitmap bit, and each bitmap bit is used for recording whether a bearer corresponding to one bearer identifier has a tag value corresponding to a current bitmap;

performing first-type reduction processing on the at least one first mapping set in parallel through a first reduction partition corresponding to the at least one first mapping set to obtain bitmaps of the tag values in each bitmap index partition;

and storing the obtained bitmap of the label value in each bitmap index partition into the corresponding bitmap index partition.

In the embodiment of the present invention, when at least one data record is obtained, the at least one data record may be stored into the bitmap index based on a preset mapping/reduction model, so that after the data is stored, the bearer identifier having a certain label value is searched for through the bitmap index based on the certain label value. In addition, the bitmap of the label value in each bitmap index partition can be determined in parallel through the preset mapping/reduction model, and the efficiency of storing at least one data record into N bitmap index partitions is improved.
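The steps of the first aspect can be illustrated with a minimal single-process sketch. It is not the patented implementation: the integer bearer identifiers, the two partition ranges in `PARTITION_STARTS`, the use of Python integers as bitmaps, and the rule that a bearer's bitmap bit is its offset within its partition are all assumptions made for illustration, and the reduction partitions are processed one after another here rather than in parallel.

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical layout: N = 2 bitmap index partitions over integer bearer IDs.
# Partition 0 covers [0, 100), partition 1 covers [100, ...).
PARTITION_STARTS = [0, 100]


def partition_of(bearer_id):
    """Return the index of the reduction partition whose range holds bearer_id."""
    return bisect_right(PARTITION_STARTS, bearer_id) - 1


def map_record(record):
    """First-type mapping: turn one data record into a mapping result."""
    bearer_id, tag_values = record
    return bearer_id, list(tag_values)


def classify(mapping_results):
    """First-type classification: group mapping results by reduction partition."""
    mapping_sets = defaultdict(list)
    for bearer_id, tags in mapping_results:
        mapping_sets[partition_of(bearer_id)].append((bearer_id, tags))
    return mapping_sets


def reduce_partition(partition, mapping_set):
    """First-type reduction: sort by bearer ID, then set one bit per tag value."""
    start = PARTITION_STARTS[partition]
    bitmaps = defaultdict(int)
    for bearer_id, tags in sorted(mapping_set, key=lambda r: r[0]):
        bit = bearer_id - start  # assumed bit allocation: offset in partition
        for tag in tags:
            bitmaps[tag] |= 1 << bit
    return dict(bitmaps)


def build_bitmap_index(records):
    """Run map, classify, and reduce; return {partition: {tag value: bitmap}}."""
    mapping_sets = classify(map(map_record, records))
    return {p: reduce_partition(p, s) for p, s in mapping_sets.items()}
```

Because each reduction partition only touches the bitmaps of its own bitmap index partition, the calls to `reduce_partition` are independent of one another, which is what allows them to run in parallel in a real mapping/reduction framework.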

Optionally, the partition information of each first reduction partition is composed of a bitmap index table identifier and bearer identifiers within a preset interval range;

the performing first-type classification on the at least one data record according to partition information of N first reduction partitions included in a preset mapping/reduction model to obtain at least one first mapping set includes:

performing a first type of mapping processing on the at least one data record in parallel through the preset mapping/reduction model to obtain at least one first mapping result, wherein each first mapping result comprises a bitmap index table identifier, a bearer identifier and at least one label value;

and classifying the at least one first mapping result according to the partition information of the N first reduction partitions to obtain at least one first mapping set.

Further, before the at least one data record is subjected to the first type classification according to the partition information of the N first reduction partitions included in the preset mapping/reduction model, each data record needs to be subjected to the first type mapping processing in parallel through the preset mapping/reduction model, so that at least one first mapping result after mapping is classified afterwards.

Optionally, the performing, in parallel, a first kind of reduction processing on the at least one first mapping set through a first reduction partition corresponding to each of the at least one first mapping set to obtain a bitmap of a tag value in each bitmap index partition includes:

for each first mapping set, determining a first reduction partition corresponding to the first mapping set;

sorting the first mapping results in the first mapping set through the first reduction partition according to the bearer identifier in each first mapping result in the first mapping set;

and for each sorted first mapping result, acquiring a bitmap of each label value in at least one label value included in the first mapping result from a bitmap index partition corresponding to the first reduction partition according to the sorting result, and updating the bitmap of the label value according to the bitmap bit of the bearer identifier.

For each reduction partition, the reduction partition processes a plurality of data belonging to the reduction partition according to a certain order, so that for each first mapping set, the first reduction partition corresponding to the first mapping set may sort the first mapping results in the first mapping set first, and process each first mapping result in the first mapping set according to the sorting result.

Optionally, before the updating the bitmap of the label value according to the bitmap bit of the bearer identifier, the method further includes:

when the first mapping result further comprises the bitmap bit of the bearer identifier, executing the operation of updating the bitmap of the tag value according to the bitmap bit of the bearer identifier; or

when the first mapping result does not comprise the bitmap bit of the bearer identifier, acquiring the bitmap bit of the bearer identifier, and executing the operation of updating the bitmap of the tag value according to the bitmap bit of the bearer identifier.

Since the bitmap bit of the bearer id needs to be determined first when updating the bitmap of a certain tag value, and the system may have configured the bitmap bit for the bearer id in advance, or may not configure the bitmap bit for the bearer id, the first mapping result may include the bitmap bit of the bearer id, or may not include the bitmap bit of the bearer id. When the first mapping result does not include the bitmap bit of the bearer identifier, the bitmap bit of the bearer identifier needs to be acquired before updating the bitmap of a certain label value.

Optionally, after obtaining the bitmap bit of the bearer identifier, the method further includes:

and storing the corresponding relation between the bitmap bit of the bearer identification and the bearer identification.

Further, when the first mapping result does not include the bitmap bit of the bearer identifier, after the bitmap bit of the bearer identifier is obtained, the corresponding relationship between the bitmap bit of the bearer identifier and the bearer identifier may also be stored, so as to subsequently query the bitmap bit of the bearer identifier according to the bearer identifier, or query the bearer identifier corresponding to the bitmap bit according to the bitmap bit.
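The two cases above (bitmap bit present in the mapping result, or not) can be sketched as follows. The `BitAllocator` class, the dictionary layout of a mapping result, and the "next unused bit" allocation policy are hypothetical; the source does not say how a bitmap bit is obtained, only that the correspondence is stored once it has been.

```python
class BitAllocator:
    """Stores the correspondence between bearer IDs and bitmap bits both ways."""

    def __init__(self):
        self.bit_by_bearer = {}  # bearer ID -> bitmap bit
        self.bearer_by_bit = {}  # bitmap bit -> bearer ID

    def get_or_assign(self, bearer_id):
        """Return the bearer's bitmap bit, assigning and storing one if needed."""
        if bearer_id not in self.bit_by_bearer:
            bit = len(self.bit_by_bearer)  # assumed policy: next unused bit
            self.bit_by_bearer[bearer_id] = bit
            self.bearer_by_bit[bit] = bearer_id
        return self.bit_by_bearer[bearer_id]


def apply_mapping_result(bitmaps, allocator, result):
    """Update the bitmap of every tag value in one mapping result.

    `result` is a dict with "bearer_id", "tags", and optionally "bit".
    A real system would also have to keep preconfigured bits and newly
    assigned bits from colliding; this sketch does not attempt that.
    """
    bit = result.get("bit")
    if bit is None:
        # The mapping result carries no bitmap bit: obtain one first.
        bit = allocator.get_or_assign(result["bearer_id"])
    for tag in result["tags"]:
        bitmaps[tag] = bitmaps.get(tag, 0) | (1 << bit)
    return bit
```

Keeping both dictionaries supports the two lookups mentioned above: from bearer identifier to bitmap bit when writing, and from bitmap bit back to bearer identifier when answering a query.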

Optionally, before the performing the first-class classification on the at least one data record according to partition information of N first reduction partitions included in a preset mapping/reduction model based on a bearer identifier included in each data record, the method further includes:

determining partition information of the bitmap index, wherein the partition information of the bitmap index is used for describing a set of bearer identifications corresponding to each bitmap index partition in the bitmap index;

and determining N first reduction partitions in the preset mapping/reduction model according to the partition information of the bitmap index.

Since the at least one data record is subjected to the first-class classification according to the partition information of the N first reduction partitions included in the preset mapping/reduction model, before the at least one data record is subjected to the first-class classification, the partition information of the N first reduction partitions included in the preset mapping/reduction model can be determined according to the partition information of the bitmap index.

Optionally, after the obtaining of the at least one data record, the method further includes:

based on the bearer identifier included in each data record, performing second-type classification on the at least one data record according to partition information of M second reduction partitions included in the preset mapping/reduction model to obtain at least one second mapping set, wherein each second mapping set corresponds to one second reduction partition;

the M second reduction partitions are determined according to partition information of the M data partitions included in the data table, M is a positive integer, each data partition corresponds to one second reduction partition, and each data partition is used for recording the correspondence between bearer identifiers and tag values;

performing second-type reduction processing on the at least one second mapping set in parallel through the second reduction partition corresponding to each of the at least one second mapping set to obtain the data in each data partition;

and storing the obtained data of each data partition into the corresponding data partition.

Further, in the embodiment of the present invention, when at least one data record is obtained, the at least one data record may also be stored into a data table based on the preset mapping/reduction model, so that the at least one data record is stored into the bitmap index and the data table at the same time. The data in each data partition can also be determined in parallel through the preset mapping/reduction model, which improves the efficiency of storing the at least one data record into the M data partitions.

Optionally, the partition information of each second reduction partition is composed of a data table identifier and bearer identifiers within a preset interval range;

the second classification of the at least one data record according to the partition information of the M second reduction partitions included in the preset mapping/reduction model includes:

performing second-class mapping processing on the at least one data record in parallel through the preset mapping/reduction model to obtain at least one second mapping result, wherein each second mapping result comprises the data table identifier, the bearer identifier and at least one label value;

and classifying the at least one second mapping result according to the partition information of the M second reduction partitions to obtain at least one second mapping set.

Further, before performing second-class classification on the at least one data record according to partition information of M second reduction partitions included in the preset mapping/reduction model, each data record needs to be subjected to second-class mapping processing in parallel through the preset mapping/reduction model, so that at least one second mapping result after mapping is classified later.

Optionally, N is less than or equal to M, N is greater than or equal to 2, each of the M data partitions belongs to a unique bitmap index partition, and each of the N bitmap index partitions contains at least one data partition.

In addition, in order to improve the efficiency of querying data from the data table and simultaneously improve the efficiency of querying data from the bitmap index partition, the M data partitions included in the data table and the N bitmap index partitions included in the bitmap index may satisfy the above condition, that is, the partition range of each data partition in the data table may be set to be smaller and the partition range of each bitmap index partition in the bitmap index may be set to be larger as appropriate.

In a second aspect, there is provided a data storage apparatus having the functionality to carry out the acts of the data storage method of the first aspect described above. The data storage device comprises at least one module, and the at least one module is used for realizing the data storage method provided by the first aspect.

In a third aspect, another data storage device is provided. The data storage device includes a processor and a memory. The memory is used to store a program that supports the data storage device in executing the data storage method provided in the first aspect, and to store data used to implement that method. The processor is configured to execute the program stored in the memory. The data storage device may further comprise a communication bus for establishing a connection between the processor and the memory.

In a fourth aspect, a computer-readable storage medium is provided, which has stored therein instructions, which, when run on a computer, cause the computer to perform the data storage method of the first aspect described above.

In a fifth aspect, there is provided a computer program product comprising instructions which, when run on a computer, cause the computer to perform the data storage method of the first aspect.

The technical solutions provided by this application bring the following beneficial effects:

in this application, when at least one data record is obtained, a bitmap of a tag value in each bitmap index partition included in a bitmap index may be determined based on the at least one data record through a preset mapping/reduction model, so as to store the at least one data record into the corresponding bitmap index partition. Because the bitmap index comprises at least one bitmap, and each bitmap corresponds to one label value, the bearer identifier with the label value can be searched through the bitmap index based on the label value, and the efficiency of data query based on the label value is improved. In addition, the bitmap of the label value in each bitmap index partition can be determined in parallel through the preset mapping/reduction model, and the efficiency of data storage is improved.

Drawings

FIG. 1 is a diagram illustrating a bitmap of a tag value according to an embodiment of the present invention;

FIG. 2 is a flow chart of a data storage method according to an embodiment of the present invention;

FIG. 3 is a flow chart of another data storage method provided by an embodiment of the invention;

FIG. 4 is a flow chart of another data storage method provided by the embodiments of the present invention;

FIG. 5A is a block diagram of a data storage device according to an embodiment of the present invention;

FIG. 5B is a block diagram of a first classification module according to an embodiment of the present invention;

FIG. 5C is a block diagram of another data storage device provided by an embodiment of the present invention;

FIG. 5D is a block diagram of a second classification module according to an embodiment of the present invention;

FIG. 6 is a block diagram of another data storage device according to an embodiment of the present invention.

Detailed Description

To aid understanding, terms related to the embodiments of the present invention are briefly described first.

Tags are a way of organizing content: they characterize certain features of data and help people describe and classify content. Common tags include gender, educational background, occupation, color, and the like. Optionally, tags are specified manually.

In one possible implementation, tags include enumerated tags and Boolean tags. An enumerated tag is a tag with a plurality of enumerated values; for example, educational background includes junior college, undergraduate, postgraduate, doctorate, and the like, and gender includes male and female. A Boolean tag only indicates whether the tag applies, such as whether the bearer owns a house, whether the bearer uses drugs, or whether the bearer has a criminal record.

When the tag is an enumerated tag, the tag value is the specific value of the tag. For example, for the tag educational background, the tag value is "undergraduate" when the educational background is undergraduate and "postgraduate" when it is postgraduate. When the tag is a Boolean tag, the tag value is the tag itself. For example, when the user owns a house, the tag value is "owns a house"; when the user has no criminal record, the corresponding tag value is "no criminal record".

Carrier (bearer): the object described by the respective tags. Optionally, the carrier may be a person, a car, a telephone number, a virtual user account, or the like. One carrier may have one tag or a plurality of tags. For example, when the carrier is a person, the tags describing the person may be gender, educational background, whether the person owns a house, whether the person has a criminal record, and the like. As another example, when the carrier is a car, the tags describing the car may be color, whether there is a traffic violation record, and the like.

Data table: the data records are established in the database by taking the carrier as an index. Each data record in the data table records the identifier of a carrier, records all the label values of the carrier, and records the corresponding relationship between the identifier of the carrier and the label values of the carrier.

Bitmap index: a secondary index established in the database with the tag values in the data table as the index. Optionally, the bitmap index records tag values and bitmaps, as well as the one-to-one correspondence between tag values and bitmaps. Each bitmap bit in a bitmap corresponds to the identifier of one carrier, and different bitmap bits correspond to the identifiers of different carriers; that is, the bitmap bits in a bitmap correspond one-to-one to the carrier identifiers. Each bitmap bit records whether the carrier corresponding to it has the tag value of the current bitmap (the bitmap in which the bitmap bit is located). For example, if a bitmap bit in the bitmap of a tag value is 1, the carrier corresponding to that bitmap bit has the tag value; if the bitmap bit is 0, the carrier does not have the tag value. The same bitmap bit in different bitmaps corresponds to the identifier of the same carrier.

Taking a virtual user account as the carrier, assume that there are 8 virtual user accounts: user1, user2, …, user8. The set of accounts with the tag value "online shopping expert" is user1, user4, and user8, and the set with the tag value "active forum member" is user1, user2, and user8. The bitmap bits allocated to the 8 virtual user accounts are 1, 2, 3, …, 8 in sequence, as shown in FIG. 1. The bitmap for the tag value "online shopping expert" is "10010001", and the bitmap for the tag value "active forum member" is "11000001". In the bitmap "10010001", the first "1" indicates that the virtual user account with bitmap bit 1 is an online shopping expert, the second "1" indicates the same for the account with bitmap bit 4, and the third "1" for the account with bitmap bit 8; the bitmap "11000001" for "active forum member" is read in the same way. As can be seen from FIG. 1, both user1 and user8 have the two tag values "online shopping expert" and "active forum member".
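The eight-account example can be reproduced with a short sketch. Bitmaps are kept as strings of "0" and "1" so that they read exactly like the values in the text; the helper names `to_bitmap`, `members_of`, and `bitmap_and` are introduced here for illustration only, and a production index would use a compact bit structure instead of strings.

```python
# Bitmap bits 1..8 are allocated to user1..user8 in order.
USERS = [f"user{i}" for i in range(1, 9)]


def to_bitmap(members):
    """Build the bitmap string for the set of accounts that have a tag value."""
    return "".join("1" if user in members else "0" for user in USERS)


def members_of(bitmap):
    """List the accounts whose bitmap bit is set."""
    return [user for user, bit in zip(USERS, bitmap) if bit == "1"]


def bitmap_and(a, b):
    """Bitwise AND of two bitmaps: accounts that have both tag values."""
    return "".join("1" if x == y == "1" else "0" for x, y in zip(a, b))


shopper = to_bitmap({"user1", "user4", "user8"})  # first tag value in the example
forum = to_bitmap({"user1", "user2", "user8"})    # second tag value in the example
both = members_of(bitmap_and(shopper, forum))
```

A query for a combination of tag values therefore reduces to a bitwise operation over two bitmaps, with no need to scan the data records row by row.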

Next, an application scenario of the embodiment of the present invention is introduced. In practice, a client generally queries data through a server. For example, when the client sends the server a tag query request for a certain bearer, the server looks up, in the pre-stored data table, the at least one tag value corresponding to the identifier of the bearer and determines the at least one tag corresponding to those tag values as the tags that the bearer has; in this case the query is efficient. As another example, when the client sends the server a query request for a certain tag value, the server checks the tag values of each bearer in the data table one by one to determine which bearers have that tag value; in this case the query is inefficient. Therefore, how the server stores the correspondence between bearers and tag values affects the efficiency of data queries that the client performs through the server. The embodiment of the invention applies to the scenario of how the server stores data.

That is, the data storage method provided by the embodiment of the present invention is applied to a server. There may be one server or multiple servers; optionally, multiple servers may provide database services for terminals as a server cluster. In one possible implementation, the server is provided with a database, which may be a distributed database such as HBase, MongoDB, Distributed Relational Database Service (DRDS), VoltDB, ScaleBase, and so on.

It should be noted that the method for storing data provided by the embodiment of the present invention mainly includes two parts, that is, at least one data record is stored in the bitmap index, and the at least one data record is stored in the data table. For convenience of description, the bitmap index and the data table provided by the embodiment of the present invention are introduced first.

The data table is used for recording the corresponding relation between the bearer identification and the label value, the bitmap index comprises at least one bitmap, each bitmap corresponds to one label value, each bitmap comprises at least one bitmap bit, and each bitmap bit is used for recording whether the bearer corresponding to one bearer identification has the label value corresponding to the current bitmap.

Further, in order to improve the efficiency of querying data from the data table and the bitmap index, the data table may be divided into M data partitions, with different data stored across the M data partitions in a distributed manner. Likewise, the bitmap index may be divided into N bitmap index partitions, with different data stored across the N bitmap index partitions in a distributed manner. That is, the data table includes M data partitions and the bitmap index includes N bitmap index partitions.

Data is stored in the data table so that the tag values corresponding to a bearer identifier can later be queried by that bearer identifier; therefore, to improve the efficiency of querying data from the data table, the range of bearer identifiers covered by each data partition (the first range) may be set relatively small. Data is stored in the bitmap index so that the corresponding bearer identifiers can later be looked up by tag value; because each bitmap index partition includes all the tag values in the tag definition table, to improve the efficiency of querying data from the bitmap index partitions, the range of bearer identifiers covered by each bitmap index partition (the second range) may be set relatively large. That is, each of the N bitmap index partitions covers the data of at least one data partition.

That is, in the embodiment of the present invention, M and N may satisfy the relationship that N is less than or equal to M, N is greater than or equal to 2, each of the M data partitions belongs to a unique bitmap index partition, and each of the N bitmap index partitions includes at least one data partition.
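The relationship between M and N can be expressed as a small validity check. This is a sketch under stated assumptions: partitions are modelled as half-open `(start, end)` intervals over integer bearer identifiers, and the function names are hypothetical.

```python
def owning_index_partition(data_partition, index_partitions):
    """Return the index of the single bitmap index partition that fully
    contains the data partition, or None if there is not exactly one."""
    lo, hi = data_partition
    owners = [
        i for i, (start, end) in enumerate(index_partitions)
        if start <= lo and hi <= end
    ]
    return owners[0] if len(owners) == 1 else None


def layout_is_valid(data_partitions, index_partitions):
    """Check 2 <= N <= M, that every data partition belongs to a unique
    bitmap index partition, and that every bitmap index partition contains
    at least one data partition."""
    m, n = len(data_partitions), len(index_partitions)
    if not 2 <= n <= m:
        return False
    owners = [owning_index_partition(d, index_partitions) for d in data_partitions]
    if any(owner is None for owner in owners):
        return False
    return set(owners) == set(range(n))
```

For instance, four data partitions of width 10 fit cleanly into two bitmap index partitions of width 20, whereas a bitmap index boundary that cuts through a data partition makes the layout invalid.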

Alternatively, the data partitions may be divided by specifying the number of the data partitions, or a partition interval of each data partition may be directly defined. In the embodiment of the present invention, a partition interval of each data partition is directly defined as an example. When the partition interval of each data partition is directly defined, for convenience of description, the set of bearer identifiers corresponding to each data partition is referred to as a first range, that is, the set of bearer identifiers corresponding to each data partition is the same. Each data partition is used for storing data records with bearer identifications located in the partition interval, and no intersection exists between each partition interval, so that the same data record is prevented from being stored in two different data partitions.

For example, the following partition intervals are set for the data table: data partition 1: [, a1), data partition 2: [a1, a2), data partition 3: [a2, a3), …, data partition 9: [a8, a9). Data partition 1 stores data records whose bearer identifiers fall in the partition interval [, a1), data partition 2 stores those in [a1, a2), data partition 3 stores those in [a2, a3), …, and data partition 9 stores those in [a8, a9). No two of the partition intervals [, a1), [a1, a2), [a2, a3), …, [a8, a9) intersect.
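Because the partition intervals are half-open and do not intersect, finding the data partition for a bearer identifier is a binary search over the interval start keys. The sketch below assumes that bearer identifiers are strings compared lexicographically and treats the boundary names a1 to a9 as literal keys; both are assumptions for illustration, since the source leaves the boundary values abstract.

```python
from bisect import bisect_right

# Start key of each of the 9 data partitions; "" stands for the open lower
# bound of data partition 1, written [, a1) in the text.
START_KEYS = ["", "a1", "a2", "a3", "a4", "a5", "a6", "a7", "a8"]
END_KEY = "a9"  # exclusive upper bound of data partition 9


def data_partition_for(bearer_id):
    """Return the 1-based data partition number for a bearer identifier,
    or None if the identifier lies at or beyond the last interval's end."""
    if bearer_id >= END_KEY:
        return None
    return bisect_right(START_KEYS, bearer_id)
```

An identifier equal to a boundary key falls into the partition that starts at that key, which matches the half-open notation: `data_partition_for("a1")` selects data partition 2, not data partition 1.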

In addition, each data partition of the data table may be automatically split or expanded. For example, a data partition accumulates more and more data over time; when its data amount reaches the split threshold, the server may split it into two data partitions, so that new data can still be written after the storage space of the original data partition fills up.

The bitmap index may be partitioned in a manner similar to the data table. For example, when the data partitions are divided by defining the partition interval of each data partition, the bitmap index partitions are likewise divided by defining the partition interval of each bitmap index partition. That is, the partition interval corresponding to each bitmap index partition is preset by the user, and the server divides the bitmap index into partitions according to the preset partition intervals. The ranges of bearer identifiers corresponding to the bitmap index partitions are also the same. For convenience of description, the range of bearer identifiers corresponding to each bitmap index partition is referred to as the second range.

For example, the non-overlapping partition intervals [b0, c0), [c0, d0), [d0, e0), [e0, f0), and [f0, j0) are set in advance. According to these partition intervals, the bitmap index can be divided into bitmap index partitions 1 to 5. Bitmap index partition 1 stores the bitmaps of the tag values for bearer identifiers in the partition interval [b0, c0), bitmap index partition 2 stores those for [c0, d0), bitmap index partition 3 those for [d0, e0), bitmap index partition 4 those for [e0, f0), and bitmap index partition 5 those for [f0, j0).

It should be noted that each bitmap index partition stores the bitmaps of the tag values for only a part of the bearer identifiers. Therefore, for the bitmap index as a whole, the bitmap of each tag value is formed by combining the partial bitmaps (sub-bitmaps) of that tag value in the individual bitmap index partitions.

That is, the same bitmap bits of different sub-bitmaps in each bitmap index partition correspond to the identity of the same bearer, and the same bitmap bits of different sub-bitmaps in different bitmap index partitions correspond to the identity of different bearers.

Since one sub-bitmap is established in each bitmap index partition for every tag value in the tag definition table, the number of sub-bitmaps in each bitmap index partition equals the number of tag values in the tag definition table. For example, if the tag definition table contains 10 tag values in total, each bitmap index partition contains 10 sub-bitmaps after it has established a sub-bitmap for every tag value.
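How the sub-bitmaps of one tag value combine across bitmap index partitions can be sketched as follows. The in-memory layout (a list of dictionaries, each holding the partition's bearer identifiers in bit order and one integer sub-bitmap per tag value) is an assumption chosen for brevity, not the storage format described in the source.

```python
def bearers_with_tag(index_partitions, tag):
    """Collect the bearer IDs that have `tag` by reading the tag's sub-bitmap
    in every bitmap index partition and translating set bits to bearer IDs."""
    result = []
    for partition in index_partitions:
        sub_bitmap = partition["sub_bitmaps"].get(tag, 0)
        # Bit i of every sub-bitmap in this partition refers to bearers[i].
        for bit, bearer_id in enumerate(partition["bearers"]):
            if (sub_bitmap >> bit) & 1:
                result.append(bearer_id)
    return result


# Hypothetical index with two partitions and two tag values per partition.
example_index = [
    {"bearers": ["b1", "b2"], "sub_bitmaps": {"female": 0b01, "engineer": 0b11}},
    {"bearers": ["c1", "c2"], "sub_bitmaps": {"female": 0b10, "engineer": 0b00}},
]
```

In `example_index`, bit 0 refers to `b1` in the first partition and to `c1` in the second, which illustrates why the same bitmap bit in different bitmap index partitions corresponds to different bearers while the same bit within one partition always refers to the same bearer.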

Optionally, since the bitmap index partition expansion or fragmentation may cause all bitmap index partitions in the bitmap index to need to be reconstructed, the cost is high, and in this embodiment, the server sets the bitmap index partition as non-splittable or non-expandable.

The following two embodiments will be used to illustrate the detailed process of storing at least one data record to the bitmap index and the data table, respectively.

Fig. 2 is a flowchart of a data storage method according to an embodiment of the present invention, which is applied to a scenario in which the at least one data record is stored in a bitmap index. As shown in fig. 2, the data storage method includes the steps of:

Step 201: Obtain at least one data record, each data record comprising a bearer identifier and at least one tag value.

Specifically, at least one piece of source data is obtained, each piece of source data includes a bearer identifier and at least one label, and for each piece of source data, a label value of each label in the at least one label is determined according to a preset label definition table, so as to obtain at least one label value.

The at least one piece of source data may be data stored in a Hadoop Distributed File System (HDFS). That is, when a client needs to store data, it sends the data to the server; the server first stores the data in the HDFS as source data and then performs the storage described here based on the source data in the HDFS. It should be noted that the server may obtain the at least one piece of source data from the HDFS according to a default path, or according to a preset path, which is not specifically limited in the embodiment of the present invention.

In addition, the tag definition table may be information that is acquired and stored in advance by the server. Alternatively, the tag definition table may be stored in a separate file, such as an Extensible Markup Language (XML) file, or in a third-party distributed storage system, such as ZooKeeper.

The preset tag definition table records a plurality of preset tag values. In one optional presetting manner, the tag values contained in a tag are set according to historical data, or the tag values contained in a tag are defined manually.

Table 1 shows one possible tag definition table. Of course, table 1 may include more or fewer labels, and is not limited thereto.

TABLE 1

Label	Tag value	Tag configuration information
Gender	Male, female	Resident memory
Education	Junior college, undergraduate, postgraduate, doctorate	Non-resident memory
Occupation	Student, teacher, individual, enterprise employee	Resident memory
Online shopping fan	Online shopping fan	Resident memory
Drug addict	Drug addict	Non-resident memory

Optionally, as shown in table 1, the tag definition table may further include tag configuration information, which indicates whether the tag value needs to reside in memory: the bitmap corresponding to a tag value that needs to reside in memory is kept resident in memory, and the bitmap corresponding to a tag value that does not need to reside in memory is not.

In table 1, a flag "resident memory" is set for a tag value requiring resident memory, and a flag "non-resident memory" is set for a tag value not requiring resident memory. It should be understood that the flag "resident memory" may also be set for tag values requiring resident memory, and no flag may be set for tag values not requiring resident memory, and table 1 sets the flag "non-resident memory" for tag values not requiring resident memory, just as an example.

Optionally, the tag definition table may further include a life cycle for each tag value, where the life cycle is the time period during which the tag value is valid; that is, at any time outside the life cycle, the tag value is invalid.

Optionally, the server may also assign a tag number to each tag value in table 1. When the mapping relation between the label value and the bitmap is stored, the label number can be used for replacing the label value, and the storage space can be saved compared with the storage of the label value. In addition, in the method, the corresponding tag value can be inquired according to the tag number, and the corresponding tag number can be inquired according to the tag value.
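The two-way lookup between tag values and tag numbers can be sketched as a pair of dictionaries. The tag value strings and the numbering starting at 1 are illustrative assumptions; the description does not specify how numbers are assigned.

```python
# Illustrative tag values; the spellings are assumptions for this sketch.
TAG_VALUES = [
    "gender: male",
    "gender: female",
    "education: junior college",
    "education: undergraduate",
]

# Forward and reverse lookups, built once from the same list.
TAG_TO_NUMBER = {tag: n for n, tag in enumerate(TAG_VALUES, start=1)}
NUMBER_TO_TAG = {n: tag for tag, n in TAG_TO_NUMBER.items()}


def encode(tag_values):
    """Replace tag values with their shorter tag numbers before storage."""
    return [TAG_TO_NUMBER[tag] for tag in tag_values]


def decode(tag_numbers):
    """Recover the tag values from stored tag numbers."""
    return [NUMBER_TO_TAG[n] for n in tag_numbers]
```

Storing a small integer in place of a repeated string is what yields the space saving mentioned above.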

For example, table 2 is a format of source data provided by the embodiment of the present invention, each row in table 2 represents a piece of source data, each piece of source data has a unique bearer identifier, and each piece of source data further includes at least one label corresponding to the bearer identifier.

TABLE 2

Bearer identifier	Labels
a01	Sex-male, study calendar-this department
a02	Gender, study, specialty, occupation, individual, and online shopping person
b01	Sex, study, special subject, occupation and enterprise employee
b02	Gender, female, study calendar, department, occupation and student
c01	Sex, male, school calendar, student,
c02	gender, study, specialty, occupation, enterprise employee, and online shopping host
d01	Sex, male, student, professional, enterprise employee, and online shopping host
d02	Gender, female, study calendar, department, occupation and student
d03	Sex, study, special subject, occupation and enterprise employee
e01	Sex, student, job, individual and drug addict
e02	Sex-woman, Study, subject, occupation and individual
e03	Sex, male, study, book, occupation and student
f01	Sex, student, professional, teacher,
f02	gender, study, specialty, occupation, enterprise employee, and online shopping host
f03	Sex, male, study, book, occupation and student
f04	Sex, female, student, professional, individual, drug addict

For the source data shown in table 2, table 3 shows the data records corresponding to the respective source data determined by the server according to table 1, wherein the content in each [ ] in table 3 represents a tag value. Each row in table 3 represents a data record, each data record including a bearer identification and at least one label value.

TABLE 3

In the embodiment of the present invention, after the server obtains the at least one data record, the at least one data record may be stored in the bitmap index through a preset mapping/reduction (map/reduce) model. For the sake of convenience in the following description, the preset mapping/reduction model is explained here.

The preset mapping/reduction model is a parallel computing model that mainly comprises two computing processes, namely a mapping process (map) and a reduction process (reduce). The mapping process classifies the data records according to the type of data to be stored, and the reduction process stores the data records into the corresponding files according to the reduction partitions corresponding to the data records.

The preset mapping/reduction model comprises a plurality of reduction partitions. Each reduction partition corresponds to one data interval and is used for processing the data belonging to that data interval, and different reduction partitions are processed in parallel. Because the different reduction partitions are processed in parallel, the bitmap of the tag value in each bitmap index partition can be determined in parallel through the preset mapping/reduction model.

In addition, the mapping process is to map each data record in parallel, so batch data can be processed in parallel through the preset mapping/reduction model, and the data processing efficiency is also improved.

It should be noted that, in the embodiment of the present invention, in addition to storing at least one data record in the bitmap index, the at least one data record may also be stored in the data table, that is, it is necessary to store the data record in both the data table and the bitmap index, so in order to distinguish the data table from the bitmap index, a data table identifier and a bitmap index table identifier are introduced herein, where the data table identifier is used to uniquely identify the data table, and the bitmap index table identifier is used to uniquely identify the bitmap index.

Because the data records need to be stored both in the corresponding data partitions of the data table and in the corresponding bitmap index partitions of the bitmap index, the reduction partitions of the preset mapping/reduction model can be set to the combination of the data partitions of the data table and the bitmap index partitions of the bitmap index. In this case, the data records can be stored directly into the corresponding data partitions and bitmap index partitions through the preset mapping/reduction model. For convenience of the following description, the N reduction partitions that correspond one to one to the N bitmap index partitions of the bitmap index are referred to as first reduction partitions, and the M reduction partitions that correspond one to one to the M data partitions of the data table are referred to as second reduction partitions.

Correspondingly, the mapping process of the preset mapping/reduction model also includes two different mapping processing processes, one is a corresponding mapping process when at least one data record is stored in the bitmap index, which is called a first type of mapping processing, and the other is a corresponding mapping process when at least one data record is stored in the data table, which is called a second type of mapping processing.

Similarly, the reduction process of the preset mapping/reduction model also includes two different reduction processing processes. One is the reduction process performed when at least one data record is stored in the bitmap index, which is called first-class reduction processing, and the other is the reduction process performed when at least one data record is stored in the data table, which is called second-class reduction processing.

Therefore, when at least one data record is acquired, in order to store the at least one data record into the corresponding bitmap index partitions through the preset mapping/reduction model, the N first reduction partitions included in the preset mapping/reduction model need to be determined first. Specifically, this may be implemented by step 202 described below.

Step 202: Determine N first reduction partitions in the preset mapping/reduction model.

Specifically, partition information of the bitmap index is determined, and the partition information of the bitmap index is used for describing a set of bearer identifications corresponding to each bitmap index partition in the bitmap index. And determining N first reduction partitions in the preset mapping/reduction model according to the partition information of the bitmap index, wherein each first reduction partition corresponds to one bitmap index partition. That is, the N first reduction partitions are determined according to partition information of the N bitmap index partitions included in the bitmap index.

It should be noted that, since each partition interval in the data table represents a set of bearer identifiers, and each partition interval in the bitmap index also represents a set of bearer identifiers, if the partition interval of the data table is directly used as the partition interval of M second reduction partitions in the preset mapping/reduction model, and the partition interval of the bitmap index is used as the partition interval of N first reduction partitions in the preset mapping/reduction model, an intersection may exist between the N first reduction partitions and the M second reduction partitions.

Therefore, in order to avoid possible intersection between different reduction partitions, a bitmap index table identifier for identifying the bitmap index is added to the partition intervals of the bitmap index, and the partition intervals of the bitmap index with the bitmap index table identifier added are determined as the partition intervals of the N first reduction partitions in the preset mapping/reduction model. That is, the partition information of each first reduction partition is composed of the bitmap index table identifier and the bearer identifiers of the preset interval range.

For example, the following partition intervals are set for the bitmap index in advance:

[b0, c0), [c0, d0), [d0, e0), [e0, f0), and [f0, j0).

At this time, the following first reduction partitions may be set for the preset mapping/reduction model:

[Bb0, Bc0), [Bc0, Bd0), [Bd0, Be0), [Be0, Bf0), and [Bf0, Bj0).

Here, B is an identifier for identifying the bitmap index, that is, the bitmap index table identifier. That is, the first reduction partitions [Bb0, Bc0), [Bc0, Bd0), [Bd0, Be0), [Be0, Bf0), and [Bf0, Bj0) are the reduction partitions that correspond one to one to the respective bitmap index partitions.
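Why the added table identifier removes the intersection can be checked with a small sketch. The bitmap index intervals come from the example; the data table intervals and the data table identifier "D" are hypothetical values chosen only for illustration.

```python
# Bitmap index partition intervals from the example (half-open).
BITMAP_INTERVALS = [("b0", "c0"), ("c0", "d0"), ("d0", "e0"),
                    ("e0", "f0"), ("f0", "j0")]
# Hypothetical data table partition intervals (not given in the text).
DATA_INTERVALS = [("b0", "d0"), ("d0", "j0")]


def with_table_id(table_id, intervals):
    """Prefix both bounds of every interval with a table identifier."""
    return [(table_id + low, table_id + high) for low, high in intervals]


def intersects(a, b):
    """True if half-open intervals [a0, a1) and [b0, b1) share a key."""
    return a[0] < b[1] and b[0] < a[1]


def any_intersection(xs, ys):
    """True if any interval in xs intersects any interval in ys."""
    return any(intersects(x, y) for x in xs for y in ys)


first_partitions = with_table_id("B", BITMAP_INTERVALS)
# "D" is an assumed data table identifier.
second_partitions = with_table_id("D", DATA_INTERVALS)
```

Without the prefixes the two interval sets overlap; with them, every key of a first reduction partition sorts before every key of a second reduction partition, so the two sets are disjoint.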

It is worth noting that, for the preset mapping/reduction model, different first reduction partitions process in parallel the data belonging to their own partition intervals. The at least one data record therefore needs to be classified according to the N first reduction partitions, so that each first reduction partition processes the data belonging to it.

That is, based on the bearer identifier included in each data record, the at least one data record is subjected to first-class classification according to partition information of N first reduction partitions included in a preset mapping/reduction model, so as to obtain at least one first mapping set, where each first mapping set corresponds to one first reduction partition, so that the first reduction partitions process data in the corresponding first mapping set. Specifically, the process can be realized by the following steps 203 to 204.

Step 203: Perform first-class mapping processing on the at least one data record in parallel through the preset mapping/reduction model to obtain at least one first mapping result, wherein each first mapping result comprises the bitmap index table identifier, the bearer identifier, and at least one tag value.

As shown in step 202, partition intervals of the N first reduction partitions in the preset mapping/reduction model are not actually partition intervals of bitmap index partitions in the bitmap index, and therefore, the first type of mapping process mainly adds a bitmap index table identifier to each data record, so as to subsequently determine the first reduction partition corresponding to each data record.

That is, a bitmap index table identifier is added to each data record to obtain a first mapping result.

It should be noted that the preset mapping/reduction model adds the bitmap index table identifier to the data records in parallel, that is, to all the data records at the same time. Therefore, the preset mapping/reduction model takes the same time to add the bitmap index table identifier to n data records as to 1 data record, which improves the efficiency of adding the bitmap index table identifier to the at least one data record.

In addition, for the first mapping result, the first mapping result may be recorded in a key-value format. Specifically, table 4 is a format of the first mapping result provided in the embodiment of the present invention, and as shown in table 4, the bitmap index table identifier and the bearer identifier in the first mapping result are collectively set as a key, and at least one tag value in the first mapping result is set as a value of the key.

TABLE 4

Mapping result	Key	Value
First mapping result	Bitmap index table identifier + bearer identifier	List of tag values

When the first mapping result is recorded in a key-value format, for each first mapping result, remark information may be further added to the corresponding value, where the remark information includes a generation time of each of the at least one tag value, or an internal Identification (ID) of each tag value. When the remark information includes the internal ID of each tag value, it indicates that the tag value can be replaced with the internal ID of the tag value to reduce the transmission amount of the data transmission process.

For example, B is the bitmap index table identifier. For the first data record "a01 -> {gender: male, education: undergraduate}" in table 3, the bearer identifier in the data record is a01, and the data record includes two tag values, "gender: male" and "education: undergraduate". The first type of mapping processing is performed on the data record through the preset mapping/reduction model to obtain a first mapping result, which comprises the bitmap index table identifier B, the bearer identifier a01, and the two tag values "gender: male" and "education: undergraduate". The first mapping result is recorded in the format shown in table 4, giving the first mapping result shown in table 5; that is, the first mapping result is recorded as data with the key Ba01 and the value {gender: male, education: undergraduate}.

TABLE 5

Key	Value
Ba01	{gender: male, education: undergraduate}

For the first mapping result shown in table 5, when the internal ID configured by the system is 1 for the tag value "gender: male", 2 for the tag value "gender: female", 3 for the tag value "education: undergraduate", and 4 for the tag value "education: junior college", the value corresponding to the key Ba01 shown in table 5 can be recorded as {1, 3}.
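The first type of mapping processing, with and without internal ID substitution, can be sketched as follows. The function name, the list representation of the value, and the tag value spellings are assumptions made for this illustration; only the key layout (table identifier followed by bearer identifier) and the ID assignments 1 to 4 come from the example.

```python
BITMAP_INDEX_TABLE_ID = "B"  # bitmap index table identifier from the example

# Internal IDs as configured in the example; tag value spellings are illustrative.
INTERNAL_ID = {
    "gender: male": 1,
    "gender: female": 2,
    "education: undergraduate": 3,
    "education: junior college": 4,
}


def first_type_map(bearer_id, tag_values, use_internal_ids=False):
    """Turn one data record into a key-value first mapping result.

    The key is the table identifier followed by the bearer identifier;
    the value is the list of tag values, or of their internal IDs.
    """
    key = BITMAP_INDEX_TABLE_ID + bearer_id
    if use_internal_ids:
        value = [INTERNAL_ID[tag] for tag in tag_values]
    else:
        value = list(tag_values)
    return key, value
```

Because each call depends only on its own record, the calls can be distributed across workers without coordination, which is what allows the mapping process to run in parallel.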

It should be noted that, for a certain bearer id, the system may already configure a corresponding bitmap bit for the bearer id, and at this time, the first mapping result further includes the bitmap bit of the bearer id. When the system does not configure the corresponding bitmap bit for the bearer id, the first mapping result does not include the bitmap bit of the bearer id at this time.

When the first mapping result includes the bitmap bit of the bearer id, if the key-value format is still used to record the first mapping result, the mapping result format shown in table 6 or table 7 can be obtained. As shown in table 6, at this time, the bitmap index table identifier, the bearer identifier, and the bitmap bit of the bearer identifier are collectively set as a key, and the value is still at least one tag value in the first mapping result.

As shown in table 7, the bitmap index table identifier and the bearer identifier may be set together as a key, and at least one label value and the bitmap bit of the bearer identifier may be set together as a value corresponding to the key.

For example, for the first data record "a01 -> {gender: male, education: undergraduate}" in table 3, if the system has already configured a bitmap bit for the bearer identifier a01 and that bitmap bit is 5, the first mapping result is recorded in the format shown in table 6, giving the first mapping result shown in table 8: the first mapping result is recorded as data with the key (Ba01, 5) and the value {gender: male, education: undergraduate}.

TABLE 6

TABLE 7

TABLE 8

Key	Value
(Ba01, 5)	{gender: male, education: undergraduate}
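The three shapes a first mapping result can take (no bitmap bit, bitmap bit in the key, bitmap bit in the value) can be sketched in one helper. The function name, its parameters, and the tuple layouts are assumptions for illustration; the description only states which fields belong to the key and which to the value.

```python
def map_with_bit(table_id, bearer_id, tag_values, bit=None, bit_in_key=True):
    """Build a first mapping result, optionally carrying the bitmap bit.

    bit=None          -> key: table id + bearer id, value: tag values
    bit_in_key=True   -> key: (table id + bearer id, bit), value: tag values
    bit_in_key=False  -> key: table id + bearer id, value: (tag values, bit)
    """
    key = table_id + bearer_id
    tags = list(tag_values)
    if bit is None:
        return key, tags
    if bit_in_key:
        return (key, bit), tags
    return key, (tags, bit)
```

Placing the bit in the key keeps the value identical to the bit-free case, whereas placing it in the value keeps the key identical; which is preferable depends on how the reduction side looks records up.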

After the first type of mapping processing is performed on the at least one data record through the preset mapping/reduction model, at least one first mapping result is obtained; that is, for each data record, a first mapping result in the format shown in table 4 or table 6 is obtained. Thereafter, first-class classification needs to be performed on the at least one first mapping result through the following step 204.

Step 204: Classify the at least one first mapping result according to the partition information of the N first reduction partitions to obtain at least one first mapping set, wherein each first mapping set corresponds to one first reduction partition.

For the preset mapping/reduction model, different reduction partitions process in parallel the data belonging to their own partition intervals. Therefore, each of the at least one first mapping result needs to be assigned to its corresponding first reduction partition.

For each first mapping result in the at least one first mapping result, according to the bearer identifier and the bitmap index table identifier in the first mapping result, a partition interval to which the bearer identifier in the first mapping result belongs is searched from partition intervals of the N first reduction partitions, so as to implement classification of the at least one first mapping result. After the classification, at least one first mapping set is obtained, and for each first mapping set, the first mapping set comprises at least one first mapping result.
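The classification in step 204 amounts to a range lookup on the prefixed key followed by grouping. The sketch below assumes keys of the form table identifier plus bearer identifier and uses binary search over the interval bounds; the function name and the use of Python's `bisect` module are illustrative choices, not part of the described method.

```python
from bisect import bisect_right
from collections import defaultdict

# Bounds of the first reduction partitions from the example:
# [Bb0, Bc0), [Bc0, Bd0), [Bd0, Be0), [Be0, Bf0), [Bf0, Bj0).
BOUNDS = ["Bb0", "Bc0", "Bd0", "Be0", "Bf0", "Bj0"]


def classify(first_mapping_results):
    """Group (key, value) results into first mapping sets by partition number."""
    mapping_sets = defaultdict(list)
    for key, value in first_mapping_results:
        # Number of bounds <= key gives the 1-based partition number.
        partition = bisect_right(BOUNDS, key)
        if partition == 0 or partition == len(BOUNDS):
            raise ValueError(f"{key!r} is outside every first reduction partition")
        mapping_sets[partition].append((key, value))
    return dict(mapping_sets)
```

Each resulting list is one first mapping set and can be handed to its own first reduction partition.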

Step 205: Perform first-class reduction processing on the at least one first mapping set in parallel through the first reduction partitions corresponding to the at least one first mapping set to obtain the bitmap of the tag value in each bitmap index partition.

Since the first reduction partitions corresponding to the at least one first mapping set perform the first-class reduction processing on the first mapping sets in parallel, the following explains the first-class reduction processing of a single first mapping set. Specifically, the first-class reduction processing is divided into the following two procedures:

(1) For each first mapping set, determine the first reduction partition corresponding to the first mapping set, and sort the first mapping results in the first mapping set through the first reduction partition according to the bearer identifier in each first mapping result in the first mapping set.

When the server performs the above-mentioned first kind of reduction processing through the preset mapping/reduction model, for each data record, since the data record has a corresponding first mapping result, the first mapping set to which the first mapping result belongs is determined through step 204. At this time, since the data records corresponding to the first mapping result belonging to the first mapping set are stored in the same bitmap index partition, the server sorts the first mapping result belonging to the first mapping set by the first reduction partition corresponding to the first mapping set, and sequentially stores the data in the first mapping set into the corresponding bitmap index partitions in the order after the sorting.

The first mapping results in the first mapping set are generally sorted by a default sorting method, which arranges them in ascending or descending lexicographic order of the bearer identifiers; embodiments of the present invention are not limited in this respect.

For example, the first mapping set includes three first mapping results whose bearer identifiers are a01, a02, and a03, and the three first mapping results may be sorted in the order a01, a02, a03.

(2) For each sorted first mapping result, in the sorted order, acquire the bitmap of each of the at least one tag value included in the first mapping result from the bitmap index partition corresponding to the first reduction partition, and update the bitmap of the tag value according to the bitmap bit of the bearer identifier, to obtain the bitmap of each tag value in the bitmap index partition corresponding to the first reduction partition.

As shown in step 203, for each data record, the first mapping result of the data record may include bitmap bits of the bearer identifier or may not include bitmap bits of the bearer identifier, so that updating the bitmap of each label value in the at least one label value included in the first mapping result according to the bitmap bits of the bearer identifier may be implemented in two ways:

In a first manner, when the first mapping result further includes a bitmap bit of the bearer identifier, the bitmap of the corresponding tag value is updated according to the bitmap bit of the bearer identifier.

In a second manner, when the first mapping result does not include the bitmap bit of the bearer identifier, the bitmap bit of the bearer identifier is obtained, and the bitmap of the corresponding label value is updated according to the bitmap bit of the bearer identifier.

In any case, updating the bitmap of each of the at least one label value included in the first mapping result requires determining a bitmap bit of the bearer identifier first, and after determining the bitmap bit of the bearer identifier, updating a value of the bitmap of the label value on the bitmap bit of the bearer identifier for the bitmap of each of the at least one label value included in the first mapping result.

In the embodiment of the present invention, the bitmap of the label value may be stored in the manner shown in fig. 1, that is, the value of the bitmap of the label value on each bitmap bit is 0 or 1, at this time, the value of the bitmap of the label value on the bitmap bit of the bearer identifier is updated, that is, the value of the bitmap of the label value on the bitmap bit of the bearer identifier is set to 1.

It should be noted that the bitmap of each tag value is determined by setting the value of the bitmap of the tag value on the bitmap bit of the bearer identifier to 1. Therefore, in the embodiment of the present invention, the value of the bitmap of each tag value on each bitmap bit is initialized in advance, that is, set to 0. Then, for a certain bitmap index partition, when a first mapping result is being processed, the bitmap bit of the bearer identifier included in the first mapping result is determined, and for each of the at least one tag value included in the first mapping result, the value of the sub-bitmap of the tag value on the bitmap bit of the bearer identifier is changed to 1; that is, the sub-bitmaps of the at least one tag value, and thus the bitmap index partition, are updated. No processing is performed on the tag values in the bitmap index partition other than the at least one tag value; that is, the values of the sub-bitmaps of the other tag values on the bitmap bit of the bearer identifier remain 0, indicating that the bearer does not have those tag values.

After the first of the first mapping results has been processed, the second first mapping result is processed. The difference is that the bitmap index partition is now updated on the basis of the bitmap index partition already updated according to the previous first mapping result; that is, when the sub-bitmaps of the at least one tag value in the current first mapping result are updated, the sub-bitmaps of the at least one tag value in the previous first mapping result already have the value 1 on the bitmap bit of the bearer identifier of the previous first mapping result.

That is, in the process of sequentially determining the bitmap of each of the at least one tag value included in each first mapping result, the bitmap of the tag value is updated on the basis of the bitmap of the tag value in the bitmap index partition as updated according to the previous first mapping result.
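The incremental update described above can be sketched with sub-bitmaps held as lists of 0/1 values and 1-based bitmap bits. The function names, the partition width, and the tag values of the second record are assumptions for illustration.

```python
def new_partition(tag_values, width):
    """Initialise every sub-bitmap of a bitmap index partition to all zeros."""
    return {tag: [0] * width for tag in tag_values}


def apply_mapping_result(partition, bit, tag_values):
    """Set the bearer's bitmap bit (1-based) to 1 for each tag value it has.

    Sub-bitmaps of all other tag values are left untouched, so they stay 0
    at this bit, meaning the bearer does not have those tag values.
    """
    for tag in tag_values:
        partition[tag][bit - 1] = 1


# Illustrative tag values and records; only the first record follows the text.
TAGS = ["gender: male", "gender: female",
        "education: junior college", "education: undergraduate"]
partition = new_partition(TAGS, width=4)
apply_mapping_result(partition, 1, ["gender: male", "education: undergraduate"])
apply_mapping_result(partition, 2, ["gender: female", "education: junior college"])
```

The second call operates on the partition already modified by the first, which mirrors the requirement that each first mapping result is applied on top of the state left by the previous one.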

For example, table 9 is an initialized bitmap index according to an embodiment of the present invention, and as shown in table 9, the bitmap index includes a plurality of bitmap index partitions, each bitmap index partition includes sub-bitmaps of all tag values, and the initialized value of each sub-bitmap of each tag value on each bitmap bit is 0.

TABLE 9

For the first data record "a01 -> {gender: male, education: undergraduate}" through the 9th data record (bearer identifier d03) in table 3, the first mapping results of the 9 data records are recorded in the format shown in table 5, and through the above step 204 the first mapping results of the 9 data records are classified into the first reduction partition corresponding to bitmap index partition 1. The 9 first mapping results are sorted according to their bearer identifiers a01, a02, b01, b02, c01, c02, d01, d02, and d03, giving the order: the first mapping result of the first data record, the first mapping result of the second data record, …, and the first mapping result of the 9th data record.

When it is determined that the bitmap bits configured by the system for the bearer identifiers a01, a02, b01, b02, c01, c02, d01, d02, and d03 are 1, 2, 3, 4, 5, 6, 7, 8, and 9, respectively, the first mapping result with the bearer identifier a01 includes two tag values, "gender: male" and "education: undergraduate". As can be seen from table 9, the sub-bitmaps corresponding to these two tag values in bitmap index partition 1 are the sub-bitmap of the first tag value and the sub-bitmap of the fourth tag value. The values of the sub-bitmaps of these two tag values on bitmap bit 1 are then updated to 1, giving bitmap index partition 1 shown in table 10.

TABLE 10

The first mapping result with the bearer identifier a02 includes four tag values. As can be seen from table 9, the sub-bitmaps corresponding to the four tag values in bitmap index partition 1 are the sub-bitmap of the second tag value, the sub-bitmap of the third tag value, the sub-bitmap of the sixth tag value, and the sub-bitmap of the penultimate tag value. On the basis of table 10, the values of the sub-bitmaps of the four tag values on bitmap bit 2 are then updated to 1, giving bitmap index partition 1 shown in table 11.

TABLE 11

The above steps are repeated until the 9 first mapping results corresponding to the 9 data records have been processed, so that the 9 data records are all stored in bitmap index partition 1; that is, the bitmap of each tag value in bitmap index partition 1 is obtained.

Optionally, the bitmap of a tag value may also be represented as an array, where the array of the tag value lists the bitmap bits whose value is "1" in the bitmap of the tag value. For example, if the bitmap corresponding to the tag value "drug addict" is "[0000000001000000...]", it may also be represented as the array [10], and if the bitmap corresponding to the tag value "online shopping fan" is "[0100011000000100...]", it may also be represented as the array [2, 6, 7, 14]. Representing the bitmap of a tag value as an array can save storage space, particularly when only a few bits are set.

At this time, the value of the bitmap of the label value on the bitmap bit of the bearer identifier is updated, that is, the bitmap bit of the bearer identifier is newly added in the array of the label value. For example, for a certain label value, the bitmap bit corresponding to the identifier of the bearer in the sub-bitmap is 3, the initial sub-bitmap corresponding to the label value is [1, 7], and after the value of the bitmap of the label value on the bitmap bit of the identifier of the bearer is updated, the sub-bitmap after the label value is updated is [1, 3, 7 ].
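The array representation and its update rule can be sketched as follows. The bit strings reuse the two examples above; the helper names and the choice to keep the array sorted with `bisect.insort` are assumptions for illustration.

```python
from bisect import insort


def bitmap_to_array(bits):
    """List the 1-based positions of the '1' characters in a bit string."""
    return [i for i, b in enumerate(bits, start=1) if b == "1"]


def set_bit(array, bit):
    """Record a bitmap bit in a sorted array, ignoring duplicates."""
    if bit not in array:
        insort(array, bit)
    return array
```

For instance, `bitmap_to_array("0100011000000100")` yields `[2, 6, 7, 14]`, and `set_bit([1, 7], 3)` yields `[1, 3, 7]`, matching the two examples in the text.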

In addition, when the first mapping result does not include the bitmap bit identified by the bearer identifier, it indicates that the system has not configured the corresponding bitmap bit for the bearer identifier in the data record before mapping the data record corresponding to the first mapping result, and at this time, after acquiring the bitmap of the bearer identifier, the corresponding relationship between the bitmap bit of the bearer identifier and the bearer identifier may also be stored.

Specifically, a first key-value pair indicating the mapping from the bearer identifier to the bitmap bit is determined according to the bitmap bit of the bearer identifier and the bearer identifier, where the key is the bearer identifier and the value is the bitmap bit of the bearer identifier. A second key-value pair indicating the mapping from the bitmap bit to the bearer identifier is also determined, where the key is the bitmap bit of the bearer identifier and the value is the bearer identifier. The first key-value pair and the second key-value pair are then stored.

That is, in the embodiment of the present invention, the bitmap bits of the bearer identifier and the bearer identifier are stored in a bidirectional mapping manner, so as to search for the corresponding bitmap bits according to the bearer identifier, or search for the corresponding bearer identifier according to the bitmap bits.
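The bidirectional mapping can be sketched as follows. This is only an illustration under stated assumptions: two in-memory dictionaries stand in for the key-value storage, and the class and attribute names are hypothetical.

```python
class BitmapBitDirectory:
    """Two-way lookup between bearer identifiers and bitmap bits."""

    def __init__(self) -> None:
        # First key-value pair: bearer identifier -> bitmap bit.
        self.bit_by_bearer: dict[str, int] = {}
        # Second key-value pair: bitmap bit -> bearer identifier.
        self.bearer_by_bit: dict[int, str] = {}

    def store(self, bearer_id: str, bit: int) -> None:
        """Store both directions of the mapping together."""
        self.bit_by_bearer[bearer_id] = bit
        self.bearer_by_bit[bit] = bearer_id


directory = BitmapBitDirectory()
directory.store("a02", 2)  # bearer a02 occupies bitmap bit 2
```

Storing both pairs allows a query result expressed as bitmap bits to be translated back into bearer identifiers without scanning the forward mapping.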

Step 206: and storing the obtained bitmap of the label value in each bitmap index partition into the corresponding bitmap index partition.

As described in step 205, within a first mapping set, the at least one bitmap of each first mapping result is determined on the basis of the at least one bitmap obtained for the previous first mapping result. Therefore, for any first mapping result, once its at least one bitmap has been determined, the at least one bitmap needs to be stored in the bitmap index partition corresponding to the first mapping result, so that the next first mapping result is processed against the updated bitmap index partition.

Therefore, after all the first mapping results belonging to the first mapping set are subjected to the first type of reduction processing through the first reduction partition corresponding to the first mapping set, the bitmap of each tag value in the bitmap index partition corresponding to the first reduction partition can be obtained, and at this time, the obtained bitmap of each tag value in the bitmap index partition can be directly stored in the bitmap index partition.

In the embodiment of the present invention, when at least one data record is obtained, a bitmap of a tag value in each bitmap index partition included in the bitmap index may be determined based on the at least one data record through a preset mapping/reduction model, so as to implement storage of the at least one data record into the corresponding bitmap index partition. Because the bitmap index comprises at least one bitmap, and each bitmap corresponds to one label value, the bearer identifier with the label value can be searched through the bitmap index based on the label value, and the efficiency of data query based on the label value is improved. In addition, the bitmap of the label value in each bitmap index partition can be determined in parallel through the preset mapping/reduction model, and the efficiency of data storage is improved.

Fig. 3 is a flowchart of a data storage method according to an embodiment of the present invention, which is applied to a scenario in which the at least one data record is stored in a data table. As shown in fig. 3, the data storage method includes the steps of:

step 301: at least one data record is obtained, each data record comprising a bearer identification and at least one tag value.

Implementation of step 301 is substantially the same as that of step 201 in fig. 2, and will not be described in detail here.

It should be noted that, as shown in step 201 in fig. 2, the preset mapping/reduction model includes M second reduction partitions, which correspond one-to-one to the M data partitions included in the data table. Storing the at least one data record into the data table requires the second type of mapping processing and the second type of reduction processing of the preset mapping/reduction model.

That is, when at least one data record is acquired, in order to partition the data corresponding to the at least one data record by using the preset mapping/reduction model, the M second reduction partitions included in the preset mapping/reduction model need to be determined first. Specifically, this may be implemented by step 302 described below.

Step 302: M second reduction partitions in the preset mapping/reduction model are determined.

Specifically, partition information of the data table is determined, where the partition information of the data table describes the set of bearer identifiers corresponding to each data partition in the data table. The M second reduction partitions in the preset mapping/reduction model are then determined according to the partition information of the data table, where each second reduction partition corresponds to one data partition.

As shown in step 202 in fig. 2, to avoid possible intersection between the partition intervals of different reduction partitions, a data table identifier for identifying the data table is added to each partition interval of the data table, and the partition intervals with the data table identifier added are determined as the partition intervals of the M second reduction partitions in the preset mapping/reduction model. That is, the partition information of each second reduction partition is composed of a data table identifier and bearer identifiers of a preset interval range.

For example, the following partition sections are set for the data table in advance:

[, a1), [a1, a2), [a2, a3), …, [a8, a9).

In this case, the following second reduction partitions may be set for the preset mapping/reduction model:

[, Aa1), [Aa1, Aa2), [Aa2, Aa3), …, [Aa8, Aa9).

Here, A is the identifier used to identify the data table, that is, the data table identifier. That is, the second reduction partitions [, Aa1), [Aa1, Aa2), [Aa2, Aa3), …, [Aa8, Aa9) correspond one-to-one to the data partitions.
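The derivation of the second reduction partition intervals from the data table's partition intervals can be sketched as follows. This is a hedged illustration: the function name is hypothetical, intervals are modelled as half-open (start, end) string pairs, and an empty string stands for the open lower end of the first interval.

```python
def prefixed_intervals(table_id: str, boundaries: list[str]) -> list[tuple[str, str]]:
    """Build half-open [start, end) intervals whose keys carry the table identifier."""
    keys = [table_id + boundary for boundary in boundaries]
    starts = [""] + keys[:-1]  # "" marks the open lower end of the first interval
    return list(zip(starts, keys))


# Data table boundaries a1 .. a9, data table identifier "A".
data_table_boundaries = [f"a{i}" for i in range(1, 10)]
second_reduction_intervals = prefixed_intervals("A", data_table_boundaries)
```

Because a bitmap index table would use a different identifier prefix, keys of the two tables cannot fall into each other's intervals.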

Similarly, since different second reduction partitions can process in parallel the data belonging to their own partition intervals, the at least one data record needs to be classified according to the M second reduction partitions, so that each second reduction partition processes the data belonging to it.

That is, based on the bearer identifier included in each data record, second-class classification is performed on the at least one data record according to the partition information of the M second reduction partitions included in the preset mapping/reduction model to obtain at least one second mapping set, where each second mapping set corresponds to one second reduction partition, so that each second reduction partition processes the data in its corresponding second mapping set. Specifically, this may be implemented by the following steps 303 to 304.

Step 303: and carrying out second type mapping processing on at least one data record in parallel through a preset mapping/reduction model to obtain at least one second mapping result, wherein each second mapping result comprises a data table identifier, a bearer identifier and at least one label value.

As can be seen from step 302, the partition intervals of the M second reduction partitions in the preset mapping/reduction model are not the actual partition intervals of the data partitions in the data table. Therefore, the second type of mapping processing mainly adds the data table identifier to each data record, so that the second reduction partition corresponding to each data record can subsequently be determined.

That is, for each data record, the preset mapping/reduction model adds a data table identifier to each data record to obtain a second mapping result.

It should be noted that the preset mapping/reduction model adds the data table identifier to each data record in parallel, that is, the preset mapping/reduction model adds the data table identifier to each data record at the same time. Therefore, the preset mapping/reduction model adds the data table identifier to 1 data record in the same time as the preset mapping/reduction model adds the data table identifier to n data records, and the efficiency of adding the data table identifier to the at least one data record is improved.

In addition, for the second mapping result, the second mapping result may be recorded in a key-value format. Specifically, table 12 is a format of a second mapping result provided in the embodiment of the present invention, and as shown in table 12, for the second mapping result, the data table identifier and the bearer identifier are collectively set as a key, and at least one tag value in the second mapping result is set as a value of the key.

TABLE 12

Mapping result	Key	Value
Second mapping result	Data table identifier + bearer identifier	List of tag values

Likewise, when the second mapping results are recorded in a key-value format, for each second mapping result, remark information may also be added to the corresponding value, and the remark information may be the remark information in the first mapping result in step 203 in fig. 2.

For example, A is the data table identifier. For the first data record "a01 -> {gender: male, education: undergraduate}" in table 3, the bearer identifier in this data record is a01, and the data record includes two tag values, "gender: male" and "education: undergraduate". The preset mapping/reduction model performs the second type of mapping processing on the data record to obtain a second mapping result, which includes the data table identifier A, the bearer identifier a01, and the two tag values "gender: male" and "education: undergraduate". When the second mapping result is recorded in the format shown in table 12, the second mapping result shown in table 13 is obtained; that is, the second mapping result is recorded as data with the key Aa01 and the value {gender: male, education: undergraduate}.

TABLE 13

Key	Value
Aa01	{gender: male, education: undergraduate}

For the second mapping result shown in table 13, when the internal ID configured by the system is 1 for the tag value "gender: male", 2 for the tag value "gender: female", 3 for the tag value "education: undergraduate", and 4 for the tag value "education: junior college", the value corresponding to the key Aa01 shown in table 13 can be recorded as {1, 3}.
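The second type of mapping processing for one data record can be sketched as follows. This is an illustrative assumption rather than the embodiment's implementation: the function name and the `INTERNAL_IDS` dictionary are hypothetical, and the tag strings are English renderings of the example tag values.

```python
# Hypothetical internal IDs, following the example configuration above.
INTERNAL_IDS = {
    "gender: male": 1,
    "gender: female": 2,
    "education: undergraduate": 3,
    "education: junior college": 4,
}


def second_type_map(table_id: str, bearer_id: str, tag_values: list[str]) -> tuple[str, list[int]]:
    """Return (key, value): key = table identifier + bearer identifier, value = internal tag IDs."""
    key = table_id + bearer_id
    value = sorted(INTERNAL_IDS[tag] for tag in tag_values)
    return key, value


key, value = second_type_map("A", "a01", ["gender: male", "education: undergraduate"])
```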

After the second type of mapping processing is performed on the at least one data record through the preset mapping/reduction model, at least one second mapping result is obtained; that is, for each data record, a second mapping result in the format shown in table 12 is obtained. Thereafter, second-class classification needs to be performed on the at least one second mapping result through the following step 304.

Step 304: and classifying the at least one second mapping result according to the partition information of the M second reduction partitions to obtain at least one second mapping set, wherein each second mapping set corresponds to one second reduction partition.

In the preset mapping/reduction model, different reduction partitions can process in parallel the data belonging to their own partition intervals. Therefore, each of the at least one second mapping result needs to be classified into its corresponding second reduction partition.

For each second mapping result in the at least one second mapping result, according to the bearer identifier and the data table identifier in the second mapping result, a partition interval to which the bearer identifier in the second mapping result belongs is searched from partition intervals of the M second reduction partitions, so as to implement classification of the at least one second mapping result. After the classification, at least one second mapping set is obtained, and for each second mapping set, the second mapping set comprises at least one second mapping result.
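The classification can be sketched as a binary search over the partition boundaries. The sketch below rests on several assumptions: keys are compared as plain strings, the split keys "Aa04" and "Aa07" are hypothetical, and the last interval is treated as open-ended for simplicity.

```python
from bisect import bisect_right
from collections import defaultdict


def classify(results: list[tuple[str, list[int]]], split_keys: list[str]) -> dict[int, list]:
    """Group (key, value) mapping results by the index of the interval containing the key."""
    mapping_sets = defaultdict(list)
    for key, value in results:
        # bisect_right places a key equal to a split key in the interval that starts there,
        # which matches half-open [start, end) intervals.
        mapping_sets[bisect_right(split_keys, key)].append((key, value))
    return dict(mapping_sets)


# Hypothetical intervals: [, Aa04), [Aa04, Aa07), [Aa07, ...)
second_mapping_sets = classify(
    [("Aa05", [2]), ("Aa01", [1, 3]), ("Aa07", [1])],
    ["Aa04", "Aa07"],
)
```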

Step 305: and carrying out second type reduction processing on the at least one second mapping set in parallel through the second reduction partition corresponding to the at least one second mapping set to obtain data in each data partition.

Since the second reduction partitions corresponding to the at least one second mapping set perform the second-type reduction processing on the at least one second mapping set in parallel, the following explains the process of performing the second-type reduction processing on one second mapping set. Specifically, as in step 205 in fig. 2, the second-type reduction processing is also divided into the following two processes:

(1) For each second mapping set, determining the second reduction partition corresponding to the second mapping set, and sorting the second mapping results in the second mapping set through the second reduction partition according to the bearer identifier in each second mapping result in the second mapping set.

The implementation manner of sorting the second mapping results in the second mapping set by the second reduced partition may refer to the implementation manner of sorting the first mapping results in the first mapping set by the first reduced partition in step 205 in fig. 2, and an embodiment of the present invention is not described in detail herein.

(2) For each sorted second mapping result, sequentially generating at least one record of the second mapping result according to the sorting result, wherein each record comprises a bearer identifier and a label value, so as to obtain the data in the data partition corresponding to the second mapping set.

After the second mapping results in the second mapping set are sorted, each second mapping result can be sequentially processed according to the sorting result through the second reduction partition corresponding to the second mapping set in the preset mapping/reduction model.

When the second mapping result is in the key-value format shown in table 12 in step 303, generating at least one record of the second mapping result means deleting the data table identifier from the key of the second mapping result to obtain data whose key is the bearer identifier and whose value is the at least one tag value, and then converting the obtained data into at least one record, where each record includes the bearer identifier and one tag value.

At this time, the at least one record may also be output in a key-value format, that is, for each record, the bearer identifier is used as a key, and the one tag value is used as a value, so as to obtain a record in a key-value format.

For example, for the second mapping result in table 13, the two records shown in table 14 can be obtained through step 305. As shown in table 14, the key of the first record is a01 with the value {gender: male}, and the key of the second record is a01 with the value {education: undergraduate}.

TABLE 14

Key	Value
a01	gender: male
a01	education: undergraduate
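The two processes of the second-type reduction processing can be sketched as follows. This is a simplified illustration: the function name is hypothetical, tag values are kept as strings rather than internal IDs, and sorting by key is assumed to be equivalent to sorting by bearer identifier because every key carries the same data table identifier.

```python
def second_type_reduce(table_id: str, mapping_set: list[tuple[str, list[str]]]) -> list[tuple[str, str]]:
    """Sort the mapping results, strip the table identifier, emit one record per tag value."""
    records = []
    for key, tags in sorted(mapping_set, key=lambda item: item[0]):
        if not key.startswith(table_id):
            raise ValueError(f"key {key!r} does not carry table identifier {table_id!r}")
        bearer_id = key[len(table_id):]  # delete the data table identifier from the key
        for tag in tags:
            records.append((bearer_id, tag))
    return records


records = second_type_reduce("A", [("Aa01", ["gender: male", "education: undergraduate"])])
```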

Step 306: and storing the obtained data of each data partition into the corresponding data partition.

For each second mapping set, when the data in the corresponding data partition is obtained through step 305, the data may be stored directly into that data partition. Since step 305 is executed in parallel for the different second mapping sets, in the embodiment of the present invention the preset mapping/reduction model allows data records belonging to different data partitions to be stored in parallel into the corresponding data partitions, which improves the efficiency of data storage.

In the embodiment of the present invention, when at least one data record is obtained, the data in each data partition included in the data table may be determined based on the at least one data record through the preset mapping/reduction model, so that the at least one data record is stored into the corresponding data partitions, which facilitates subsequently querying the tags of a given bearer based on the data table. In addition, the data in each data partition can be determined in parallel through the preset mapping/reduction model, which improves the efficiency of data storage.

It should be noted that, because different reduction partitions in the preset mapping/reduction model are processed in parallel, in the embodiment of the present invention the N first reduction partitions and the M second reduction partitions included in the preset mapping/reduction model make it possible to construct a data table and a bitmap index simultaneously from the at least one data record. This is explained in detail in the following embodiment.

Referring to fig. 4, an embodiment of the present invention provides a data storage method, which is applied to a scenario in which the at least one data record is stored into a data table and a bitmap index simultaneously. As shown in fig. 4, the method includes the following steps:

step 401: at least one data record is obtained, each data record comprising a bearer identification and at least one tag value.

Implementation of step 401 is substantially the same as that of step 201 in fig. 2, and will not be described in detail here.

After obtaining at least one data record, a bitmap index and a data table are simultaneously constructed through the following steps 402 to 406.

Step 402: N first reduction partitions and M second reduction partitions in the preset mapping/reduction model are determined.

The implementation of step 402 may refer to the implementation of step 202 in fig. 2 and step 302 in fig. 3.

That is, in order to implement simultaneous construction of the bitmap index and the data table, when at least one data record is obtained, N first reduction partitions corresponding to the N bitmap index partitions and M second reduction partitions corresponding to the M data partitions may be obtained simultaneously.

Step 403: performing first type mapping processing on at least one data record in parallel through the preset mapping/reduction model to obtain at least one first mapping result, wherein each first mapping result comprises a bitmap index table identifier, a bearer identifier and at least one label value; and meanwhile, carrying out second type mapping processing on at least one data record in parallel through the preset mapping/reduction model to obtain at least one second mapping result, wherein each second mapping result comprises a data table identifier, a bearer identifier and at least one label value.

The implementation manner of step 403 may refer to the implementation manners of step 203 in fig. 2 and step 303 in fig. 3.

That is, in the embodiment of the present invention, the first type mapping process in step 203 in fig. 2 and the second type mapping process in step 303 in fig. 3 may be processed in parallel, so as to obtain the first mapping result and the second mapping result of each data record at the same time.

Step 404: classifying the at least one first mapping result according to partition information of the N first reduction partitions to obtain at least one first mapping set, wherein each first mapping set corresponds to one first reduction partition; and meanwhile, classifying the at least one second mapping result according to the partition information of the M second reduction partitions to obtain at least one second mapping set, wherein each second mapping set corresponds to one second reduction partition.

Wherein, the implementation of step 404 may refer to the implementation of step 204 in fig. 2 and step 304 in fig. 3.

That is, the first class classification in step 204 in fig. 2 and the second class classification in step 304 in fig. 3 may be processed in parallel to achieve the classification of the at least one first mapping result and the classification of the at least one second mapping result at the same time.

Step 405: performing first-type reduction processing on the at least one first mapping set in parallel through the first reduction partition corresponding to the at least one first mapping set to obtain bitmaps of the tag values in each bitmap index partition; and meanwhile, carrying out second-class reduction processing on the at least one second mapping set in parallel through the second reduction partition corresponding to the at least one second mapping set to obtain data in each data partition.

The implementation of step 405 may refer to the implementation of step 205 in fig. 2 and step 305 in fig. 3.

That is, in the embodiment of the present invention, whether among the N first reduction partitions, among the M second reduction partitions, or between a first reduction partition and a second reduction partition, each reduction partition processes the data belonging to it in parallel with the others. That is, the data processed by the respective reduction partitions are independent of each other, so that the data in the respective bitmap index partitions and the data in the respective data partitions are determined at the same time.
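The independence of the reduction partitions can be illustrated with the following sketch, in which one first-type and one second-type reduction job are submitted to a thread pool and run without sharing any state. The reducer functions, the partition names, and the use of Python threads are all assumptions made for illustration; the two reducers are simplified stand-ins for the first-type and second-type reduction processing, and a real mapping/reduction framework would schedule reduction tasks itself.

```python
from concurrent.futures import ThreadPoolExecutor


def reduce_bitmap(mapping_set):
    """First-type stand-in: map each tag value to the sorted bitmap bits of its bearers."""
    bitmaps = {}
    for bit, tags in sorted(mapping_set):
        for tag in tags:
            bitmaps.setdefault(tag, []).append(bit)
    return bitmaps


def reduce_rows(mapping_set):
    """Second-type stand-in: emit one (bearer, tag) record per tag value, sorted by bearer."""
    return [(bearer, tag) for bearer, tags in sorted(mapping_set) for tag in tags]


def run_partitions(jobs):
    """Run every (reducer, mapping set) job concurrently, keyed by partition name."""
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(fn, data) for name, (fn, data) in jobs.items()}
        return {name: future.result() for name, future in futures.items()}


jobs = {
    "bitmap-index-partition-1": (reduce_bitmap, [(2, ["gender: female"]), (1, ["gender: male"])]),
    "data-partition-1": (reduce_rows, [("a02", ["gender: female"]), ("a01", ["gender: male"])]),
}
results = run_partitions(jobs)
```

Each job reads only its own mapping set and writes only its own result, which is what allows the bitmap index partitions and data partitions to be filled at the same time.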

Step 406: storing the obtained bitmap of the label value in each bitmap index partition into the corresponding bitmap index partition; and simultaneously, storing the obtained data of each data partition into the corresponding data partition.

Wherein, the implementation of step 406 may refer to the implementation of step 206 in fig. 2 and step 306 in fig. 3.

Because the different reduction partitions are independent of each other, the at least one data record can be stored in the bitmap index and the data table at the same time.

In the embodiment of the present invention, when at least one data record is obtained, the at least one data record may be simultaneously stored into the corresponding bitmap index partition and the corresponding data partition based on the at least one data record through a preset mapping/reduction model, so as to query a bearer identifier corresponding to a certain label value based on the bitmap index partition or query a label of a certain bearer based on the data partition, thereby improving efficiency of data storage.

In addition to the above data storage method, the embodiment of the present invention further provides a data storage apparatus. Referring to fig. 5A, the data storage apparatus 500 includes an obtaining module 501, a first classification module 502, a first reduction module 503, and a first storage module 504.

An obtaining module 501, configured to execute step 201 in fig. 2 or step 301 in fig. 3;

a first classification module 502, configured to perform, based on a bearer identifier included in each data record, first class classification on the at least one data record according to partition information of N first reduction partitions included in a preset mapping/reduction model, so as to obtain at least one first mapping set, where each first mapping set corresponds to one first reduction partition;

a first reduction module 503, configured to perform step 205 in fig. 2;

the first storage module 504 is configured to execute step 206 in fig. 2.

referring to fig. 5B, the first classification module 502 includes a first mapping unit 5021 and a first classification unit 5022:

a first mapping unit 5021, configured to perform step 203 in fig. 2;

the first classification unit 5022 is used for performing step 204 in fig. 2.

Optionally, the first reduction module 503 includes:

a determining unit, configured to determine, for each first mapping set, a first reduced partition corresponding to the first mapping set;

a sorting unit, configured to sort, according to a bearer identifier in each first mapping result in the first mapping set, the first mapping results in the first mapping set through the first reduction partition;

and the updating unit is used for acquiring a bitmap of each label value in at least one label value included in the first mapping result from a bitmap index partition corresponding to the first reduction partition according to the sorting result for each sorted first mapping result, and updating the bitmap of the label value according to the bitmap bit of the bearer identifier.

Optionally, the first reduction module 503 further includes:

a first execution unit, configured to, when the first mapping result further includes a bitmap bit of a bearer identifier, execute an operation of updating the bitmap of the tag value according to the bitmap bit of the bearer identifier; or

And a second execution unit, configured to, when the first mapping result does not include the bitmap bit of the bearer identifier, obtain the bitmap bit of the bearer identifier, and execute an operation of updating the bitmap of the label value according to the bitmap bit of the bearer identifier.

Optionally, the second execution unit is further configured to:

Optionally, the apparatus 500 further comprises:

a first determining module, configured to determine partition information of the bitmap index, where the partition information of the bitmap index is used to describe a set of bearer identifiers corresponding to each bitmap index partition in the bitmap index;

and the second determining module is used for determining N first reduction partitions in the preset mapping/reduction model according to the partition information of the bitmap index.

Optionally, referring to fig. 5C, the apparatus 500 further includes a second classification module 505, a second reduction module 506, and a second storage module 507:

a second classification module 505, configured to perform second-class classification on at least one data record according to partition information of M second reduction partitions included in the preset mapping/reduction model based on a bearer identifier included in each data record, so as to obtain at least one second mapping set, where each second mapping set corresponds to one second reduction partition;

a second reduction module 506, configured to perform step 305 in fig. 3;

a second storage module 507, configured to execute step 306 in fig. 3.

referring to fig. 5D, the second classification module 505 includes a second mapping unit 5051 and a second classification unit 5052:

a second mapping unit 5051, configured to perform step 303 in fig. 3;

a second classification unit 5052, configured to perform step 304 in fig. 3.

Fig. 6 is a schematic diagram of another data storage device according to an embodiment of the invention. The data storage apparatus 600 may be a computer device, which may be a server as described above, the data storage apparatus 600 comprising at least one processor 601, a communication bus 602, a memory 603 and at least one communication interface 604.

The processor 601 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling execution of the programs of the solution of the present invention.

The communication bus 602 may include a path that conveys information between the aforementioned components. The communication interface 604 may be implemented using any transceiver-like device and is used for communicating with other devices or communication networks, such as Ethernet, a radio access network (RAN), or a wireless local area network (WLAN).

The memory 603 may be, but is not limited to, a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM) or another type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact disc, laser disc, optical disc, digital versatile disc, Blu-ray disc, etc.), a magnetic disk storage medium or another magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. The memory may be self-contained and coupled to the processor via the bus. The memory may also be integrated with the processor.

The memory 603 is used for storing the program code that implements the solution of the present invention, and execution of the program code is controlled by the processor 601. The processor 601 is configured to execute the program code stored in the memory 603.

In a specific implementation, as an embodiment, the processor 601 may include one or more CPUs, such as CPU0 and CPU1 in fig. 6.

In a specific implementation, as an embodiment, the data storage apparatus 600 may include multiple processors, such as the processor 601 and the processor 608 in fig. 6. Each of these processors may be a single-core (single-CPU) processor or a multi-core (multi-CPU) processor. A processor herein may refer to one or more devices, circuits, and/or processing cores for processing data (e.g., computer program instructions).

In a specific implementation, as an embodiment, the data storage apparatus 600 may also include an output device 605 and an input device 606. The output device 605 is in communication with the processor 601 and may display information in a variety of ways. For example, the output device 605 may be a liquid crystal display (LCD), a light emitting diode (LED) display device, a cathode ray tube (CRT) display device, a projector, or the like. The input device 606 is in communication with the processor 601 and may accept user input in a variety of ways. For example, the input device 606 may be a mouse, a keyboard, a touch screen device, or a sensing device, among others.

The data storage apparatus 600 may be a general-purpose computer device or a special-purpose computer device. In a specific implementation, the data storage apparatus 600 may be a desktop computer, a portable computer, a web server, a personal digital assistant (PDA), a mobile phone, a tablet computer, a wireless terminal device, a communication device, an embedded device, or a device with a structure similar to that in fig. 6. The embodiment of the present invention does not limit the type of the data storage apparatus 600.

One or more software modules are stored in the memory of the data storage apparatus. The data storage apparatus may implement the software modules by means of the processor and the program code in the memory, thereby implementing the data storage method according to the above embodiments.

One embodiment of the present application also provides a computer storage medium having instructions stored therein; the data storage device (which may be a computer device, such as a server) executes the instructions, for example, a processor in the computer device executes the instructions, so that the data storage device implements the data storage method according to the above embodiments.

An embodiment of the present application provides a computer program product, comprising instructions; the data storage device (which may be a computer device, such as a server) executes the instructions, causing the data storage device to perform the data storage method of the above-described method embodiments.

The above-mentioned embodiments are provided not to limit the present application, and any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims

1. A method of data storage, the method comprising:

performing a first type of reduction processing on the at least one first mapping set in parallel through a first reduction partition corresponding to each of the at least one first mapping set to obtain a bitmap of a tag value in each bitmap index partition, wherein the first type of reduction processing refers to: for any first mapping set, determining a bitmap index partition corresponding to a first reduction partition corresponding to the any first mapping set, and updating a label value corresponding to a bearer identifier in a data record in the any first mapping set in each bitmap in the determined bitmap index partition according to the bearer identifier in the data record in the any first mapping set;

2. The method according to claim 1, wherein the partition information of each first reduction partition is composed of a bitmap index table identifier and a bearer identifier of a preset interval range;

3. The method according to claim 2, wherein the performing, in parallel, a first type of reduction processing on the at least one first mapping set through the corresponding first reduction partition of the at least one first mapping set to obtain the bitmap of the tag value in each bitmap index partition comprises:

4. The method of claim 3, wherein before updating the bitmap of label values according to the bitmap bits of the bearer identity, further comprising:

5. The method of claim 4, wherein after obtaining the bitmap bits of the bearer identity, further comprising:

6. The method according to any of claims 1 to 5, wherein before performing the first type of classification on the at least one data record according to partition information of N first reduction partitions included in a preset mapping/reduction model based on a bearer identifier included in each data record, further comprising:

7. The method of claim 1, wherein after obtaining the at least one data record, further comprising:

performing second type reduction processing on the at least one second mapping set in parallel through a second reduction partition corresponding to the at least one second mapping set, so as to obtain data in each data partition, where the second type reduction processing refers to: for any second mapping set, determining a data partition corresponding to a second reduction partition corresponding to the any second mapping set, and taking a label value corresponding to a bearer identifier in a data record in the any second mapping set as data in the determined data partition according to the bearer identifier in the data record in the any second mapping set;

8. The method according to claim 7, wherein the partition information of each second reduction partition is composed of a bearer data table identifier and a bearer identifier of a preset interval range;

9. The method of any of claims 1 to 5, 7 to 8, wherein N is less than or equal to M, N is greater than or equal to 2, each of the M data partitions belongs to a unique bitmap index partition, and each of the N bitmap index partitions contains at least one data partition.

10. A data storage device, characterized in that the device comprises:

an acquisition module, configured to acquire at least one data record, wherein each data record comprises a bearer identifier and at least one label value;

a first classification module, configured to perform a first type of classification on the at least one data record according to partition information of N first reduction partitions included in a preset mapping/reduction model based on the bearer identifier included in each data record, so as to obtain at least one first mapping set, wherein each first mapping set corresponds to one first reduction partition;

a first reduction module, configured to perform a first type of reduction processing on the at least one first mapping set in parallel through a first reduction partition corresponding to the at least one first mapping set, so as to obtain a bitmap of a tag value in each bitmap index partition, where the first type of reduction processing refers to: for any first mapping set, determining a bitmap index partition corresponding to a first reduction partition corresponding to the any first mapping set, and updating a label value corresponding to a bearer identifier in a data record in the any first mapping set in each bitmap in the determined bitmap index partition according to the bearer identifier in the data record in the any first mapping set;

and the first storage module is used for storing the obtained bitmap of the label value in each bitmap index partition into the corresponding bitmap index partition.
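The classification and reduction recited in claim 10 can be illustrated with a minimal sketch. It is not the claimed implementation. It assumes fixed-width partitions of `PARTITION_SIZE` bearer identifiers, represents each bitmap as a Python integer, and takes the bit position to be the bearer identifier's offset within its partition. The names `classify`, `reduce_partition`, and `build_index` are hypothetical.

```python
from collections import defaultdict

# Assumption: partition i covers bearer IDs
# [i * PARTITION_SIZE, (i + 1) * PARTITION_SIZE).
PARTITION_SIZE = 1000


def classify(records):
    """First type of classification: group records by reduction partition."""
    mapping_sets = defaultdict(list)
    for bearer_id, label_values in records:
        mapping_sets[bearer_id // PARTITION_SIZE].append((bearer_id, label_values))
    return mapping_sets


def reduce_partition(partition, mapping_set):
    """First type of reduction: build one bitmap per label value.

    Each bitmap is an int used as a bit set; bit k is set when the bearer
    at offset k within the partition has the label value.
    """
    bitmaps = defaultdict(int)
    base = partition * PARTITION_SIZE
    for bearer_id, label_values in mapping_set:
        for value in label_values:
            bitmaps[value] |= 1 << (bearer_id - base)
    return dict(bitmaps)


def build_index(records):
    # Partitions do not share state, so each call could run in its own worker.
    return {p: reduce_partition(p, s) for p, s in classify(records).items()}


records = [(1, ["woman", "engineer"]), (3, ["engineer"]), (1002, ["woman"])]
index = build_index(records)
```

Because each mapping set touches only the bitmaps of its own bitmap index partition, the per-partition reductions can run in parallel without coordination.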

11. The apparatus according to claim 10, wherein the partition information of each first reduction partition is composed of a bitmap index table identifier and a bearer identifier of a preset interval range;

the first classification module comprises:

the first mapping unit is used for performing first-class mapping processing on the at least one data record in parallel through the preset mapping/reduction model to obtain at least one first mapping result, and each first mapping result comprises a bitmap index table identifier, a bearer identifier and at least one label value;

and the first classification unit is used for classifying the at least one first mapping result according to the partition information of the N first reduction partitions to obtain at least one first mapping set.
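The mapping and classification units of claim 11 can be sketched as follows, under stated assumptions: the partition information is modelled as a sorted list of `(table identifier, inclusive lower bound)` pairs, bearer identifiers are non-negative integers, and the table name `tag_bitmap_index` is hypothetical. This illustrates the composite-key idea only and does not reproduce any HBase or Hadoop API.

```python
from bisect import bisect_right

INDEX_TABLE = "tag_bitmap_index"  # hypothetical bitmap index table identifier

# Assumed partition information: (table identifier, inclusive lower bound of
# the bearer-ID range), sorted so that a binary search finds the partition.
PARTITIONS = [(INDEX_TABLE, 0), (INDEX_TABLE, 1000), (INDEX_TABLE, 5000)]


def map_record(record):
    """First type of mapping: emit (table identifier, bearer ID, label values)."""
    bearer_id, label_values = record
    return (INDEX_TABLE, bearer_id, tuple(label_values))


def partition_of(mapping_result):
    """Return the index of the reduction partition whose range holds the key."""
    table_id, bearer_id, _ = mapping_result
    return bisect_right(PARTITIONS, (table_id, bearer_id)) - 1


def classify(records):
    """Group mapping results into one mapping set per reduction partition."""
    mapping_sets = {}
    for record in records:
        result = map_record(record)
        mapping_sets.setdefault(partition_of(result), []).append(result)
    return mapping_sets


sets = classify([(999, ["a"]), (1000, ["b"]), (7000, ["c"])])
```

Prefixing the key with the table identifier keeps mapping results for the bitmap index table apart from those for any other table that shares the same model.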

12. The apparatus of claim 11, wherein said first reduction module comprises:

13. The apparatus of claim 12, wherein said first reduction module further comprises:

14. The apparatus of claim 13, wherein the second execution unit is further configured to:

15. The apparatus of any of claims 10-14, further comprising:

16. The apparatus of claim 10, further comprising:

a second classification module, configured to perform a second type of classification on the at least one data record according to partition information of M second reduction partitions included in the preset mapping/reduction model based on the bearer identifier included in each data record, so as to obtain at least one second mapping set, wherein each second mapping set corresponds to one second reduction partition;

a second reduction module, configured to perform second-class reduction processing on the at least one second mapping set in parallel through a second reduction partition corresponding to the at least one second mapping set, so as to obtain data in each data partition, where the second-class reduction processing refers to: for any second mapping set, determining a data partition corresponding to a second reduction partition corresponding to the any second mapping set, and taking a label value corresponding to a bearer identifier in a data record in the any second mapping set as data in the determined data partition according to the bearer identifier in the data record in the any second mapping set;

and the second storage module is used for storing the obtained data of each data partition into the corresponding data partition.

17. The apparatus according to claim 16, wherein the partition information of each second reduction partition is composed of a bearer data table identifier and a bearer identifier of a preset interval range;

the second classification module comprises:

the second mapping unit is used for performing second-class mapping processing on the at least one data record in parallel through the preset mapping/reduction model to obtain at least one second mapping result, and each second mapping result comprises the data table identifier, the bearer identifier and at least one label value;

and the second classification unit is used for classifying the at least one second mapping result according to the partition information of the M second reduction partitions to obtain at least one second mapping set.

18. The apparatus of any of claims 10 to 14, 16 to 17, wherein N is less than or equal to M, N is greater than or equal to 2, each of the M data partitions belongs to a unique bitmap index partition, and each of the N bitmap index partitions contains at least one data partition.
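The relationship between the N bitmap index partitions and the M data partitions recited above can be checked with a short sketch. It assumes each partition is described by a half-open bearer-ID range `(low, high)`; the function name `check_nesting` and the sample ranges are hypothetical.

```python
def check_nesting(index_parts, data_parts):
    """Return True when 2 <= N <= M, every data partition lies inside exactly
    one bitmap index partition, and every bitmap index partition contains at
    least one data partition."""
    n, m = len(index_parts), len(data_parts)
    if not (2 <= n <= m):
        return False
    used = set()
    for d_low, d_high in data_parts:
        owners = [
            i for i, (low, high) in enumerate(index_parts)
            if low <= d_low and d_high <= high
        ]
        if len(owners) != 1:
            return False  # straddles a boundary or is not covered at all
        used.add(owners[0])
    return len(used) == n


index_parts = [(0, 2000), (2000, 4000)]
nested = check_nesting(index_parts, [(0, 1000), (1000, 2000), (2000, 4000)])
straddling = check_nesting(index_parts, [(0, 1000), (1500, 2500), (2500, 4000)])
```

Here `nested` is true because every data range fits inside one index range, while `straddling` is false because the range `(1500, 2500)` crosses the boundary at 2000.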

19. A data storage device, characterized in that the device comprises: a memory having instructions stored therein, and a processor that causes a data storage device to implement the data storage method of any of claims 1 to 9 by executing the instructions stored in the memory.

20. A computer-readable storage medium having instructions stored therein, execution of which by a data storage device causes the data storage device to implement the data storage method of any one of claims 1 to 9.