
CN110519319B - Method and device for splitting partitions - Google Patents

Publication number
CN110519319B
Authority
CN
China
Prior art keywords
target
unit
hbase table
rnn
timestamp
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810494401.5A
Other languages
Chinese (zh)
Other versions
CN110519319A (en)
Inventor
王玉华
王鹏宇
董明
李林森
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Hikvision Digital Technology Co Ltd
Original Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Hikvision Digital Technology Co Ltd
Priority to CN201810494401.5A
Publication of CN110519319A
Application granted
Publication of CN110519319B

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00: Network arrangements or protocols for supporting network services or applications
    • H04L67/01: Protocols
    • H04L67/10: Protocols in which an application is distributed across nodes in the network
    • H04L67/1097: Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to a method and a device for splitting partitions, and belongs to the field of communication. The method comprises the following steps: predicting a first number of partitions in an Hbase table to be occupied in a target time period according to data records stored in the Hbase table before the target time period starts; when the first number is larger than a second number, acquiring a partition with a storage rate exceeding a preset storage rate threshold value from the second number of partitions as a target partition, wherein the second number of partitions are partitions which are allocated in advance and need to occupy the Hbase table in the target time period; when the number of the acquired target partitions is smaller than or equal to a third number, splitting the target partitions at splitting time corresponding to the target partitions, wherein the splitting time is later than the acquiring time of the target partitions, and the third number is equal to the first number minus the second number. The number of affected clients can be reduced.

Description

Method and device for splitting partitions
Technical Field
The present application relates to the field of communications, and in particular, to a method and an apparatus for splitting a partition.
Background
Hbase (Hadoop Database) is a highly reliable, high-performance, column-oriented, scalable distributed storage system. By default, a newly created Hbase table contains only one Region; a Region is a partition of the Hbase table and is used for storing data.
A Region has a fixed capacity. Assuming that the capacity of a Region is N, when the Region runs short of space because more and more data is stored in it, the Region needs to be split into two Regions, each with a capacity of N, so that more data can be stored. Hbase can split a Region automatically: when a Region is full of data, it is automatically split into two new Regions, and the data of the original Region is stored in the two new Regions.
In the process of implementing the present application, the inventors found that the above manner has at least the following defects:
the Region is inaccessible to clients while it is being split into two new Regions. During a busy time period, when the service load is high, a large amount of data is written to or read from the Region, so the Region is likely to become full and be split during that period. During the same period, a large number of clients access the Region to request real-time service, so the real-time service cannot be provided to them while the split is in progress, and a large number of clients are affected.
Disclosure of Invention
In order to reduce the number of affected clients, the embodiments of the present application provide a method and an apparatus for splitting a partition. The technical solution is as follows:
in a first aspect, the present application provides a method of splitting a partition, the method comprising:
predicting a first number of partitions in an Hbase table to be occupied in a target time period according to data records stored in the Hbase table before the target time period starts;
when the first number is larger than a second number, acquiring a partition with a storage rate exceeding a preset storage rate threshold value from the second number of partitions as a target partition, wherein the second number of partitions are partitions which are allocated in advance and need to occupy the Hbase table in the target time period;
when the number of the acquired target partitions is smaller than or equal to a third number, splitting the target partitions at splitting time corresponding to the target partitions, wherein the splitting time is later than the acquiring time of the target partitions, and the third number is equal to the first number minus the second number.
Optionally, the predicting, according to the data record stored in the Hbase table before the target time period starts, a first number of partitions that need to be occupied in the Hbase table in the target time period includes:
calculating the average data volume of each data record in the Hbase table according to the stored data record total volume of the Hbase table and the used space capacity in the Hbase table;
predicting the number of data records to be stored generated in the target time period through a prediction model;
and calculating a first number of partitions needing to occupy in the Hbase table in the target time period according to the number of the data records to be stored, the average data volume and the partition capacity.
Optionally, before predicting that the first number of partitions in the Hbase table need to be occupied in the target time period, the method further includes:
and generating the prediction model according to the data record stored in the Hbase table before the target time period starts.
Optionally, the generating the prediction model according to the data record stored in the Hbase table before the target time period starts includes:
acquiring a first unit time set and a second unit time set according to the timestamps corresponding to the data records in the Hbase table, wherein the first unit time set comprises the number of first data records generated in each unit time between a first timestamp and a second timestamp, the second unit time set comprises the number of first data records generated in each unit time between the second timestamp and a third timestamp, the first timestamp is the earliest of the timestamps corresponding to the data records in the Hbase table, the third timestamp is the latest of those timestamps, and the second timestamp lies between the first timestamp and the third timestamp;
acquiring a first parameter value of at least one Recurrent Neural Network (RNN) parameter, and setting the RNN parameter of a first RNN according to the first parameter value of the at least one RNN parameter to obtain a second RNN;
generating the prediction model from the first set of unit times, the second set of unit times, and the second RNN.
Optionally, the generating the prediction model according to the first set of unit time, the second set of unit time, and the second RNN includes:
generating a first model by the second RNN according to the number of first data records in each unit time in the first unit time set;
predicting the number of second data records generated in each unit time between a second timestamp and a third timestamp through the first model to obtain a third unit time set;
and when the second unit time set and the third unit time set meet preset conditions, determining the first model as a prediction model.
Optionally, the method further includes:
when the second unit time set and the third unit time set do not meet preset conditions, acquiring a second parameter value corresponding to an RNN parameter, and setting the RNN parameter of the second RNN according to the second parameter value corresponding to the RNN parameter to obtain a third RNN;
generating the prediction model from the first set of unit times, the second set of unit times, and the third RNN.
Optionally, after the partition with the storage rate exceeding the preset storage rate threshold is obtained from the second number of partitions as the target partition, the method further includes:
determining the date on which the target partition was acquired, and selecting a time point on that date as the splitting time corresponding to the target partition, wherein the splitting time is later than a preset time point on that date.
Optionally, the method further includes:
storing the data packet in the cache space of the message system in a partition of Hbase according to the configuration file;
the configuration file comprises at least one topic related information and at least one object set, wherein the topic related information at least comprises a topic identifier, an Hbase table identifier and an object set identifier;
the object set comprises at least one field domain information, and the field domain information at least comprises a field name and a column family to which the field belongs.
Optionally, the storing the data packet in the cache space of the message system in the partition of the Hbase according to the configuration file includes:
according to a topic identifier corresponding to a cache space of the message system, obtaining topic related information comprising the topic identifier from the configuration file, wherein the topic related information further comprises an identifier of an Hbase table and a set identifier of an object set;
acquiring field content corresponding to each field name in an object set corresponding to the set identification from a data packet in a cache space of the message system;
and forming a data record by the obtained contents of the fields, and storing the data record in a partition of the Hbase table corresponding to the identifier of the Hbase table according to the column family to which the fields in the object set belong.
In a second aspect, the present application provides an apparatus for splitting partitions, the apparatus comprising:
the prediction module is used for predicting a first number of partitions needing to be occupied in the Hbase table in a target time period according to data records stored in the Hbase table before the target time period starts;
an obtaining module, configured to obtain, as a target partition, a partition whose storage rate exceeds a preset storage rate threshold from the second number of partitions when the first number is greater than a second number, where the second number of partitions are the pre-allocated partitions that need to be occupied in the Hbase table within the target time period;
the splitting module is used for splitting the target partition at the splitting time corresponding to the target partition when the number of the obtained target partitions is smaller than or equal to a third number, wherein the splitting time is later than the obtaining time of the target partition, and the third number is equal to the first number minus the second number.
Optionally, the prediction module includes:
a first calculating unit, configured to calculate an average data amount of each data record in an Hbase table according to a total data record amount stored in the Hbase table and a used space capacity in the Hbase table;
a prediction unit for predicting the number of data records to be stored generated within the target time period by a prediction model;
and the second calculating unit is used for calculating a first number of partitions which need to occupy the Hbase table in the target time period according to the number of the data records to be stored, the average data volume and the partition capacity.
Optionally, the apparatus further comprises:
and the generation module is used for generating the prediction model according to the data record stored in the Hbase table before the target time period starts.
Optionally, the generating module includes:
a first obtaining unit, configured to obtain a first unit time set and a second unit time set according to the timestamps corresponding to the data records in the Hbase table, where the first unit time set includes the number of first data records generated in each unit time between a first timestamp and a second timestamp, the second unit time set includes the number of first data records generated in each unit time between the second timestamp and a third timestamp, the first timestamp is the earliest of the timestamps corresponding to the data records in the Hbase table, the third timestamp is the latest of those timestamps, and the second timestamp lies between the first timestamp and the third timestamp;
a second obtaining unit, configured to obtain a first parameter value of at least one recurrent neural network (RNN) parameter, and set the RNN parameter of a first RNN according to the first parameter value of the at least one RNN parameter, to obtain a second RNN;
a generating unit configured to generate the prediction model according to the first set of unit times, the second set of unit times, and the second RNN.
Optionally, the generating unit is configured to:
generating a first model by the second RNN according to the number of first data records in each unit time in the first unit time set;
predicting the number of second data records generated in each unit time between a second timestamp and a third timestamp through the first model to obtain a third unit time set;
and when the second unit time set and the third unit time set meet preset conditions, determining the first model as a prediction model.
Optionally, the generating unit is further configured to:
when the second unit time set and the third unit time set do not meet preset conditions, acquiring a second parameter value corresponding to an RNN parameter, and setting the RNN parameter of the second RNN according to the second parameter value corresponding to the RNN parameter to obtain a third RNN;
generating the prediction model from the first set of unit times, the second set of unit times, and the third RNN.
Optionally, the apparatus further comprises:
the determining module is used for determining the date on which the target partition was acquired, and selecting a time point on that date as the splitting time corresponding to the target partition, wherein the splitting time is later than a preset time point on that date.
Optionally, the apparatus further comprises:
the storage module is used for storing the data packet in the cache space of the message system in the partition of the Hbase according to the configuration file;
the configuration file comprises at least one topic related information and at least one object set, wherein the topic related information at least comprises a topic identifier, an Hbase table identifier and an object set identifier;
the object set comprises at least one field domain information, and the field domain information at least comprises a field name and a column family to which the field belongs.
Optionally, the storage module includes:
a third obtaining unit, configured to obtain, according to a topic identifier corresponding to a cache space of the message system, topic related information including the topic identifier from a configuration file, where the topic related information further includes an identifier of an Hbase table and a set identifier of an object set;
a fourth obtaining unit, configured to obtain, from a data packet in a cache space of the message system, field content corresponding to each field name in an object set corresponding to the set identifier;
and the storage unit is used for forming a data record by the obtained contents of the fields, and storing the data record in a partition of the Hbase table corresponding to the identifier of the Hbase table according to the column family to which the fields in the object set belong.
In a sixth aspect, the present application provides a non-transitory computer-readable storage medium for storing a computer program, the computer program being loaded and executed by a processor to implement the method of the first aspect or of any optional implementation of the first aspect.
The technical scheme provided by the embodiment of the application can have the following beneficial effects:
by predicting the first number of partitions needed within the target time period, the third number of partitions to be split within the target time period can be derived from the first number. Therefore, when target partitions whose storage rate exceeds the preset storage rate threshold are detected, and the number of acquired target partitions is less than or equal to the third number, the target partitions can be split at a splitting time that falls in an idle time period when the service is idle, which reduces the number of affected clients. At the same time, the third number limits how many target partitions are split, which avoids splitting too many partitions and thereby wasting a large amount of partition storage resources.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.
FIG. 1 is a schematic diagram of a network architecture provided in an embodiment of the present application;
FIG. 2 is a flow chart of a method for splitting partitions according to an embodiment of the present application;
FIG. 3-1 is a flow chart of a method for splitting partitions according to an embodiment of the present application;
FIG. 3-2 is a flow chart of a method for generating a prediction model provided by an embodiment of the present application;
FIG. 3-3 is a waveform diagram of the number of records per unit time provided by an embodiment of the present application;
FIG. 4 is a schematic structural diagram of an apparatus for splitting partitions according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of a terminal provided in an embodiment of the present application.
With the above figures, there are shown specific embodiments of the present application, which will be described in more detail below. These drawings and written description are not intended to limit the scope of the inventive concepts in any manner, but rather to illustrate the inventive concepts to those skilled in the art by reference to specific embodiments.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.
Referring to fig. 1, an embodiment of the present application provides a network architecture, including:
a data acquisition end 1, a message system 2 and a distributed storage system 3. The message system 2 includes at least one cache space, and each cache space corresponds to a topic.
The message system may be Kafka, and a cache space in the message system may be a message queue in Kafka. The distributed storage system comprises a head node and a plurality of data nodes, wherein the head node is used for managing the data nodes, and the data nodes are used for storing data.
For each cache space in the message system, the cache space is used for caching the data packets that are received by the message system 2 and belong to the topic corresponding to that cache space.
The distributed storage system comprises a configuration file, wherein the configuration file comprises at least one piece of topic related information and at least one object set, and the topic related information at least comprises a topic identifier, an identifier of an Hbase table and a set identifier of an object set (Schema); at least one of a rowkey generation rule, message system cluster information to which the subject belongs, cluster information to which the Hbase table belongs, and the like may also be included, and rowkey is a unique identifier for identifying one data record.
The object set comprises at least one field domain information, and the field domain information at least comprises a field name and a column family to which the field belongs; at least one of a field type, whether the field is optional, whether the field indexes information, and the like may also be included.
Optionally, the data acquisition end 1 may be configured to acquire a data sequence, where the data sequence includes at least one data packet generated in time sequence, a packet header of each data packet includes a timestamp and a topic identifier of a topic to which the data packet belongs, and the data acquisition end 1 is further configured to send each acquired data packet to the message system 2.
Optionally, before sending each data packet, the data acquisition end 1 may further encapsulate each acquired data packet into a data packet in the PB (Protocol Buffers) format, and then send the encapsulated data packet to the message system 2.
Because the data acquisition end 1 encapsulates each acquired data packet into a data packet in the PB format, the data volume of each data packet is small, and parsing is faster and simpler.
The message system 2 is configured to receive a data packet sent by the data acquisition end 1, determine a topic corresponding to the topic identifier according to the topic identifier included in the data packet, and cache the data packet in a cache space corresponding to the topic.
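The routing described above can be sketched as follows. This is a minimal in-memory sketch under stated assumptions, not Kafka itself: the class name `MessageSystem`, the method `receive`, and the packet layout (a dict whose `header` holds `timestamp` and `topic_id`) are hypothetical names chosen for illustration.

```python
from collections import defaultdict, deque


class MessageSystem:
    """Toy stand-in for the message system: one cache space (queue) per topic."""

    def __init__(self):
        # topic identifier -> cache space holding packets in arrival order
        self.cache_spaces = defaultdict(deque)

    def receive(self, packet):
        # Assumption: the packet header carries the topic identifier.
        topic_id = packet["header"]["topic_id"]
        self.cache_spaces[topic_id].append(packet)


ms = MessageSystem()
ms.receive({"header": {"timestamp": 1, "topic_id": "vehicle"}, "body": {"plate": "A1"}})
ms.receive({"header": {"timestamp": 2, "topic_id": "face"}, "body": {"id": "f9"}})
ms.receive({"header": {"timestamp": 3, "topic_id": "vehicle"}, "body": {"plate": "B2"}})
```

Each topic ends up with its own queue, so a consumer that handles one topic only ever sees packets belonging to that topic.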
Optionally, the distributed storage system may store the data packets cached in each cache space of the message system according to the configuration file. The storage operation may be executed by the head node in the distributed storage system, and may be as follows:
for each cache space, according to the topic identifier corresponding to the cache space, obtaining topic related information including the topic identifier from the configuration file, wherein the topic related information also includes an identifier of an Hbase table and a set identifier of an object set; acquiring, from the data packet in the cache space, the field content corresponding to each field name in the object set corresponding to the set identifier; and combining the obtained field contents into a data record, and storing the data record in a Region of the Hbase table corresponding to the identifier of the Hbase table according to the column family to which each field in the object set belongs.
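As an illustration of the configuration-driven lookup described above, the sketch below turns one data packet into a record grouped by column family. The configuration layout, the key names (`topics`, `object_sets`, `field_name`, `column_family`), the sample table and field names, and the function `build_record` are all assumptions made for this example; the actual write to Hbase is deliberately left out.

```python
# Hypothetical configuration: one topic entry and one object set (schema).
CONFIG = {
    "topics": [
        {"topic_id": "vehicle", "hbase_table": "vehicle_records",
         "object_set_id": "vehicle_schema"},
    ],
    "object_sets": {
        "vehicle_schema": [
            {"field_name": "plate", "column_family": "info"},
            {"field_name": "speed", "column_family": "metrics"},
        ],
    },
}


def build_record(topic_id, packet, config):
    """Return (table identifier, {column family: {field name: content}})."""
    # Look up the topic related information by topic identifier.
    topic_info = next(t for t in config["topics"] if t["topic_id"] == topic_id)
    fields = config["object_sets"][topic_info["object_set_id"]]
    record = {}
    for field in fields:
        name = field["field_name"]
        if name in packet:  # fields missing from the packet are skipped
            record.setdefault(field["column_family"], {})[name] = packet[name]
    return topic_info["hbase_table"], record


table, record = build_record("vehicle", {"plate": "A1", "speed": 62, "extra": 1}, CONFIG)
```

Fields that are not listed in the object set (here `extra`) are ignored, so the schema in the configuration file decides what reaches the table.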
Referring to FIG. 2, an embodiment of the present application provides a method for splitting a partition. The method may be applied to the network architecture provided in the embodiment shown in FIG. 1, and the execution subject of the method may be the head node in the distributed storage system. The method includes:
Step 201: predicting a first number of Regions that need to be occupied in the Hbase table during the target time period, based on the data records stored in the Hbase table before the target time period starts.
A Region is a partition, specifically a partition in the Hbase table. Regions mentioned in other embodiments of the present application have the same meaning, which is not repeated below.
Optionally, an embodiment of the present application provides a first optional implementation manner to implement this step, where the first implementation manner includes the following operations 2011 to 2013, which are respectively:
2011: the average data amount of each data record in the Hbase table is calculated according to the stored data record total amount of the Hbase table and the used space capacity in the Hbase table.
2012: predicting, through the prediction model, the number of data records to be stored that are generated in the target time period.
2013: calculating a first number of partitions required to occupy in the Hbase table in the target time period according to the number of data records to be stored, the average data amount and the partition capacity.
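Operations 2011 to 2013 can be sketched as a small calculation. The text does not give the formula, so the sketch assumes that the average record size is the used capacity divided by the number of stored records, and that the result is rounded up to whole partitions; the function and parameter names are hypothetical.

```python
import math


def predict_partition_count(used_bytes, stored_records,
                            predicted_records, partition_capacity_bytes):
    # 2011: average data volume of one stored record (assumed formula)
    average_record_bytes = used_bytes / stored_records
    # 2013: space needed by the predicted records, in whole partitions
    required_bytes = predicted_records * average_record_bytes
    return math.ceil(required_bytes / partition_capacity_bytes)


# 2012 is represented here by a fixed forecast of 9,000,000 records.
first_number = predict_partition_count(
    used_bytes=8 * 1024**3,
    stored_records=4_000_000,
    predicted_records=9_000_000,
    partition_capacity_bytes=10 * 1024**3,
)
```

With these sample values the predicted records need about 18 GiB, so two partitions of 10 GiB each are required.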
Optionally, the embodiment of the present application provides a second optional implementation manner, and before performing step 201, the following operation of step 200 may also be performed in the second optional implementation manner.
Step 200: generating a prediction model according to the data records stored in the Hbase table before the target time period starts.
Optionally, the first optional implementation manner and the second optional implementation manner may be combined to form a method of splitting partitions.
Optionally, in combination with the second optional implementation manner, an embodiment of the present application provides a third optional implementation manner, and in the third optional implementation manner, the step 200 may include the following operations 2001 to 2003, which are respectively:
2001: acquiring a first unit time set and a second unit time set according to the timestamps corresponding to the data records in the Hbase table, wherein the first unit time set comprises the number of first data records generated in each unit time between the first timestamp and the second timestamp, the second unit time set comprises the number of first data records generated in each unit time between the second timestamp and the third timestamp, the first timestamp is the earliest of the timestamps corresponding to the data records in the Hbase table, the third timestamp is the latest of those timestamps, and the second timestamp lies between the first timestamp and the third timestamp.
2002: acquiring a first parameter value of at least one Recurrent Neural Network (RNN) parameter, and setting the RNN parameter of the first RNN according to the first parameter value of the at least one RNN parameter to obtain a second RNN.
2003: a prediction model is generated from the first set of unit times, the second set of unit times, and the second RNN.
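Operation 2001 amounts to bucketing record timestamps into unit-time counts and cutting the series at the second timestamp. The sketch below assumes integer timestamps in seconds and a caller-supplied second timestamp; the function name `unit_time_sets` and its parameters are hypothetical.

```python
from collections import Counter


def unit_time_sets(timestamps, unit_seconds, second_ts):
    """Split per-unit-time record counts into a training and a validation series."""
    first_ts, third_ts = min(timestamps), max(timestamps)
    total_units = (third_ts - first_ts) // unit_seconds + 1
    # Count how many records fall into each unit time, starting at first_ts.
    counts = Counter((ts - first_ts) // unit_seconds for ts in timestamps)
    series = [counts.get(i, 0) for i in range(total_units)]
    # Units before the second timestamp form the first set, the rest the second.
    split_index = (second_ts - first_ts) // unit_seconds
    return series[:split_index], series[split_index:]


timestamps = [0, 10, 70, 130, 140, 150, 200, 250, 290]
first_set, second_set = unit_time_sets(timestamps, unit_seconds=60, second_ts=180)
```

The first set is what a model would be trained on, and the second set is held back to check the predictions.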
Optionally, the first optional implementation manner, the second optional implementation manner, and the third optional implementation manner may be combined to form a method of splitting a partition.
Optionally, with reference to the third optional implementation manner, an embodiment of the present application provides a fourth optional implementation manner, and in the fourth optional implementation manner, the operation of the step 2003 may be:
generating a first model by the second RNN according to the number of first data records in each unit time in the first unit time set;
predicting the number of second data records generated in each unit time between the second timestamp and the third timestamp through the first model to obtain a third unit time set;
and when the second unit time set and the third unit time set meet preset conditions, determining the first model as a prediction model.
Optionally, the first optional implementation manner, the second optional implementation manner, the third optional implementation manner, and the fourth optional implementation manner may be combined to form a method of splitting a partition.
Optionally, with reference to the fourth optional implementation manner, an embodiment of the present application provides a fifth optional implementation manner, and in the fifth optional implementation manner, the method for splitting a partition may further include the following operations:
when the second unit time set and the third unit time set do not meet preset conditions, acquiring a second parameter value corresponding to the RNN parameter, and setting the RNN parameter of the second RNN according to the second parameter value corresponding to the RNN parameter to obtain a third RNN; a prediction model is generated from the first set of unit times, the second set of unit times, and the third RNN.
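The fit, validate and re-parameterize loop of the fourth and fifth implementation manners can be sketched as below. To stay self-contained, the sketch swaps the RNN for a simple moving-average forecaster whose window size plays the role of the RNN parameter; this stand-in is not an RNN. The preset condition is assumed to be a mean absolute error within a tolerance, which the text does not specify. All function names are hypothetical.

```python
def fit_moving_average(train, window):
    """Stand-in for training an RNN: returns a forecaster for `steps` unit times."""
    def forecast(steps):
        history = list(train)
        predictions = []
        for _ in range(steps):
            value = sum(history[-window:]) / window
            predictions.append(value)
            history.append(value)  # roll forward on the model's own output
        return predictions
    return forecast


def mean_abs_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def generate_prediction_model(first_set, second_set, parameter_values, tolerance):
    # Try the first parameter value, then the second, and so on.
    for value in parameter_values:
        model = fit_moving_average(first_set, value)  # the "first model"
        third_set = model(len(second_set))            # predicted counts
        # Assumed preset condition: prediction error within the tolerance.
        if mean_abs_error(second_set, third_set) <= tolerance:
            return model, value
    return None, None


model, chosen = generate_prediction_model(
    first_set=[10, 10, 10, 20, 20, 20],
    second_set=[20, 20],
    parameter_values=[6, 3],
    tolerance=1.0,
)
```

In this example the window of 6 averages over stale data and fails the check, while the window of 3 tracks the recent level and is accepted.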
Optionally, the first optional implementation manner, the second optional implementation manner, the third optional implementation manner, the fourth optional implementation manner, and the fifth optional implementation manner may be combined to form a method of splitting a partition.
Step 202: when the first number is larger than the second number, acquiring, from a second number of Regions, the Regions whose storage rate exceeds a preset storage rate threshold as target Regions, wherein the second number of Regions are the pre-allocated Regions that need to be occupied in the Hbase table during the target time period.
Optionally, this embodiment of the present application provides a sixth optional implementation manner, where in the sixth optional implementation manner, after the operation of step 202 is performed, the method for splitting a partition may further include the following operations:
determining the date on which the target partition was acquired, and selecting a time point on that date as the splitting time corresponding to the target partition, wherein the splitting time is later than a preset time point on that date.
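One way to pick such a splitting time is sketched below. The preset time point of 23:00 and the 30-minute offset are assumptions for illustration, as is the function name; the sketch returns `None` when no time point on the same date satisfies both conditions, a case the text does not describe.

```python
from datetime import datetime, time, timedelta


def choose_split_time(acquired_at, preset=time(23, 0), offset=timedelta(minutes=30)):
    """Pick a time on the acquisition date, after the preset point and after acquisition."""
    candidate = datetime.combine(acquired_at.date(), preset) + offset
    same_day = candidate.date() == acquired_at.date()
    if same_day and candidate > acquired_at:
        return candidate
    return None  # no suitable time point remains on that date


split_at = choose_split_time(datetime(2018, 5, 22, 14, 5))
too_late = choose_split_time(datetime(2018, 5, 22, 23, 45))
```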
Optionally, a sixth optional implementation manner may be combined with any one implementation manner of the first optional implementation manner, the second optional implementation manner, the third optional implementation manner, the fourth optional implementation manner, and the fifth optional implementation manner to form the method of splitting a partition.
Step 203: and when the number of the acquired target regions is less than or equal to a third number, splitting the target regions at splitting times corresponding to the target regions, wherein the splitting times are later than the acquiring times of the target regions, and the third number is equal to the first number minus the second number.
The preset storage rate threshold is a value smaller than 1, for example, a value such as 0.8, 0.7, or 0.9, and the splitting time may be a certain time in an idle time period during which the service is idle every day.
Optionally, an embodiment of the present application provides a seventh optional implementation manner, and in the seventh optional implementation manner, the method for splitting a partition may further include the following operations 204:
step 204: storing the data packet in the cache space of the message system in a partition of Hbase according to the configuration file; the configuration file comprises at least one topic related information and at least one object set, wherein the topic related information at least comprises a topic identifier, an Hbase table identifier and an object set identifier; the object set comprises at least one field domain information, and the field domain information at least comprises a field name and a column family to which the field belongs.
Optionally, a seventh optional implementation manner may be combined with any one implementation manner of the first optional implementation manner, the second optional implementation manner, the third optional implementation manner, the fourth optional implementation manner, and the sixth optional implementation manner to form the method of splitting the partition.
Optionally, an eighth optional implementation manner is provided in an embodiment of the present application, and in the eighth optional implementation manner, the operation 204 may be:
Acquiring, from the configuration file, the topic related information that includes the topic identifier corresponding to the cache space of the message system, wherein the topic related information further includes the identifier of the Hbase table and the set identifier of the object set.
Acquiring, from the data packet in the cache space of the message system, the field content corresponding to each field name in the object set corresponding to the set identifier.
Combining the acquired field contents into a data record, and storing the data record in a partition of the Hbase table corresponding to the identifier of the Hbase table according to the column family to which each field in the object set belongs.
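The three operations above can be sketched as follows. This is a minimal illustration only: the configuration layout, the key names (`topic_id`, `table_id`, `set_id`, `family`), and the sample topic are hypothetical, and the actual write to the Hbase table is left out.

```python
# Hypothetical configuration: topic related information plus object sets.
CONFIG = {
    "topics": [
        {"topic_id": "vehicle_pass", "table_id": "traffic", "set_id": "vehicle"},
    ],
    "object_sets": {
        "vehicle": [
            {"name": "plate", "family": "info"},
            {"name": "speed", "family": "metrics"},
        ],
    },
}


def build_record(topic_id, packet, config):
    """Return (table identifier, record grouped by column family) for one packet."""
    # Look up the topic related information that contains the topic identifier.
    topic = next(t for t in config["topics"] if t["topic_id"] == topic_id)
    # Look up the object set named by the set identifier.
    fields = config["object_sets"][topic["set_id"]]
    record = {}
    for field in fields:
        # Only fields listed in the object set are copied from the packet.
        record.setdefault(field["family"], {})[field["name"]] = packet.get(field["name"])
    return topic["table_id"], record


packet = {"plate": "A123", "speed": 62, "extra": "ignored"}
table_id, record = build_record("vehicle_pass", packet, CONFIG)
print(table_id, record)
```

Fields of the data packet that are not named in the object set are simply dropped, which matches the statement that a data record consists of some or all of the field contents of a data packet.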
Optionally, an eighth optional implementation manner may be combined with any one implementation manner of the first optional implementation manner, the second optional implementation manner, the third optional implementation manner, the fourth optional implementation manner, and the sixth optional implementation manner to form the method of splitting a partition.
In the embodiment of the present application, the first number of Regions required within the target time period is predicted, and the third number of Regions to be split within the target time period is derived from the first number. When a Region whose storage rate exceeds the preset storage rate threshold is detected within the target time period, that Region is taken as a target Region to be split. Because the preset storage rate threshold is smaller than 1, the target Region is not yet full when it is detected, so it is not split immediately; instead, it is split at a splitting time that falls in an idle time period during which the service is idle. The target Region can therefore continue to provide real-time service during the busy time period, which avoids the problem that real-time service cannot be provided to a large number of clients during the busy time period. Because fewer clients access the target Region during the idle time period, splitting it then reduces the number of affected clients. In addition, target Regions are split only while the number of acquired target Regions is less than or equal to the third number, so Regions that do not need to be split are not split in large numbers, which reduces the waste of resources.
For the embodiment shown in fig. 2, the present application provides an example of a method for splitting partitions, in which at least one target time period may be defined in advance, and a second number of regions that need to occupy in the Hbase table in each target time period is configured in advance.
The Hbase table has an automatic splitting Region function, which is used to automatically split a Region into two new regions when the Region is full, but in the embodiment of the present application, the automatic splitting Region function of the Hbase table may be turned off.
Alternatively, the target time period may be a one-week time, a one-month time, or a two-month time, etc. The time duration of each target time period may be equal or unequal. The second number of regions configured for each target time period may or may not be equal.
Taking a target time period of one month as an example, assume that five target time periods are defined in advance and that the number of Regions to be occupied in each target time period is configured as 6. That is, in each of the next five months, six Regions in the Hbase table may be used to store data records.
For any target time period, if so many data records are generated in the target time period that the second number of Regions pre-allocated to it is insufficient, some or all of the second number of Regions are split within the target time period to obtain more Regions. Referring to fig. 3-1, the method of splitting a Region includes:
step 301: calculating the average data amount of each data record in the Hbase table according to the total number of the data records stored in the Hbase table and the used space capacity in the Hbase table.
The Hbase table comprises at least one Region, each Region has equal capacity and is used for storing at least one data record, and each data record is composed of partial or all field contents contained in one data packet received by the message system.
This step may be implemented through the following steps 3011 to 3013:
3011: and acquiring the number of the data records stored in each Region in the Hbase table, and calculating the total number of the data records stored in the Hbase table according to the number of the data records stored in each Region.
Optionally, statistics may be directly performed on the data records stored in the Region to obtain the number of the data records stored in the Region.
3012: and acquiring the used space capacity in each Region in the Hbase table, and calculating the used space capacity in the Hbase table according to the used space capacity of each Region.
For each Region in the Hbase table, the data records are stored one by one in the Region. The attribute information of the Region includes the space capacity currently used by the Region, and may also include the space capacity currently free by the Region.
The used space capacity in each Region in the Hbase table can be directly read from the attribute information of the Region.
3013: calculating the average data amount of each data record in the Hbase table according to the total number of the data records stored in the Hbase table and the used space capacity in the Hbase table.
The average data size of each data record in the Hbase table may be obtained by dividing the used space capacity in the Hbase table by the total number of data records.
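Steps 3011 to 3013 amount to two sums and one division. The sketch below assumes a hypothetical `RegionStats` holder for the per-Region record count and used capacity; it is not an HBase API type, and the sample numbers are made up.

```python
from dataclasses import dataclass


@dataclass
class RegionStats:
    """Hypothetical per-Region statistics (not an HBase API type)."""
    record_count: int  # number of data records stored in the Region
    used_bytes: int    # used space capacity read from the Region's attributes


def average_record_size(regions):
    """Average data amount per data record across all Regions of the table."""
    total_records = sum(r.record_count for r in regions)  # step 3011
    total_used = sum(r.used_bytes for r in regions)       # step 3012
    if total_records == 0:
        raise ValueError("the table holds no data records")
    return total_used / total_records                     # step 3013


regions = [RegionStats(1000, 512_000), RegionStats(3000, 1_488_000)]
print(average_record_size(regions))  # 2,000,000 bytes / 4,000 records = 500.0
```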
Step 302: and predicting the number of the data records to be stored generated in the target time period through the prediction model.
The target time period in this step is any one of the defined target time periods, and the above-described steps 301 and 302 are performed before the start of the target time period. Alternatively, the above steps 301 and 302 may be performed on the last day of the previous target time period adjacent to the target time period.
In this step, the start time and the end time of the target period may be input to a prediction model by which the number of data records to be stored generated within the target period is predicted.
Optionally, a predictive model may also be generated prior to performing this step. The operation of generating the prediction model may be: the predictive model is generated from the data records stored in the Hbase table before the start of the target time period. Referring to fig. 3-2, the detailed process of the training operation may be:
3021: and acquiring a first unit time set and a second unit time set according to the time stamp corresponding to each data record stored in the Hbase table.
The first unit time set comprises a first data record number generated in each unit time between a first time stamp and a second time stamp, the second unit time set comprises a first data record number generated in each unit time between the second time stamp and a third time stamp, the first time stamp is the earliest time stamp from the time stamps corresponding to all the data records in the Hbase table, the third time stamp is the latest time stamp from the time stamps corresponding to all the data records in the Hbase table, and the second time stamp is located between the first time stamp and the third time stamp.
Specifically: the earliest timestamp (the first timestamp) and the latest timestamp (the third timestamp) are obtained from the timestamps corresponding to all data records stored in the Hbase table, and the time period between the first timestamp and the third timestamp is divided by the duration of a unit time to obtain S unit times; the data records generated in each of the S unit times are determined according to the timestamp corresponding to each data record stored in the Hbase table, and the first data record number of the data records generated in each unit time is counted; the S unit times are then divided into two parts, one being the first unit time set and the other being the second unit time set.
The first unit time set comprises M unit times, the second unit time set comprises N unit times, S = M + N, and the ratio between M and S is a preset ratio.
Optionally, the first number of data records in each unit time in the first unit time set and the first number of data records in each unit time in the second unit time set may be scaled to a unit range and normalized.
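Step 3021 can be sketched as follows. The bucketing rule (integer division by the unit duration), the rounding used to obtain M, and the min-max scaling with bounds taken from the caller are all assumptions made for illustration; the embodiment does not fix these details.

```python
def count_per_unit(timestamps, unit_seconds):
    """Count data records in each of the S unit times between the first and third timestamp."""
    first, third = min(timestamps), max(timestamps)
    s = (third - first) // unit_seconds + 1
    counts = [0] * s
    for ts in timestamps:
        counts[(ts - first) // unit_seconds] += 1
    return counts


def split_sets(counts, ratio):
    """Split S counts into the first M (M/S close to the preset ratio) and the last N."""
    m = round(ratio * len(counts))
    return counts[:m], counts[m:]


def scale_to_unit_range(counts, lo, hi):
    """Min-max scale counts into [0, 1] using the bounds lo and hi."""
    if hi == lo:
        return [0.0 for _ in counts]
    return [(c - lo) / (hi - lo) for c in counts]


stamps = [0, 10, 20, 70, 130, 140, 200, 250, 260, 290]  # seconds, made-up data
counts = count_per_unit(stamps, 60)          # [3, 1, 2, 1, 3], so S = 5
first_set, second_set = split_sets(counts, 0.8)  # M = 4, N = 1
print(scale_to_unit_range(first_set, min(first_set), max(first_set)))
```

Taking the scaling bounds from the first unit time set only keeps the second set untouched by the scaling step, which is one reasonable choice when the second set is later used to check the model.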
3022: and acquiring a parameter value of at least one RNN parameter, and setting the RNN parameter of the first RNN according to the parameter value of the at least one RNN parameter to obtain a second RNN.
The at least one RNN parameter includes at least one of a number of random seeds, a learning algorithm, a number of algorithm iterations, a weight initialization method, an optimization method, a learning rate, an input layer, an output layer, a direction of a recursive network algorithm, and the like.
Optionally, when generating the prediction model, the user may input a first parameter value corresponding to each RNN parameter. Accordingly, in this step, a first parameter value corresponding to each RNN parameter input by the user may be obtained.
For example, the user may input a first parameter value of 40 for the number of random seeds, SGD for the learning algorithm, 1 for the number of algorithm iterations, Xavier for the weight initialization method, AdaDelta for the optimization method, 0.0004 for the learning rate, 1 for the input layer, 10 for the output layer, and forward for the direction of the recursive network algorithm. Each first parameter value input by the user is then obtained, and each parameter of the first RNN is set according to the corresponding first parameter value.
3023: a first model is generated by the second RNN based on a first number of data records per unit time in the first set of unit times.
Optionally, the first number of data records per unit time in the first unit time set may be input into the second RNN, and an output result of the second RNN is obtained, where the output result is a function, and the function is used as the first model.
After the first model is generated, a prediction model may be generated from the first model, the first set of unit times, and the second set of unit times as follows 3024 and 3025.
3024: and predicting the number of second data records generated in each unit time between the second time stamp and the third time stamp through the first model to obtain a third unit time set.
Optionally, the second time stamp and the third time stamp are input into the first model, and the first model predicts and outputs a second number of data records generated in each unit time between the second time stamp and the third time stamp.
The number of unit times included in the third unit time set is equal to the number of unit times included in the second unit time set, that is, the third unit time set includes N unit times.
3025: and when the second unit time set and the third unit time set meet preset conditions, determining the first model as a prediction model.
Specifically, for each unit time that appears in both the second unit time set and the third unit time set, the difference between the first data record number and the second data record number in that unit time is calculated to obtain the difference value of the unit time; the difference value of each of the N unit times is calculated in this way, and each difference value exceeding a preset difference threshold is acquired from the difference values of the N unit times; if the ratio of the number of acquired difference values to N exceeds a preset ratio threshold, it is determined that the second unit time set and the third unit time set meet the preset condition, and if the ratio does not exceed the preset ratio threshold, it is determined that the second unit time set and the third unit time set do not meet the preset condition.
3026: and when the second unit time set and the third unit time set do not meet the preset conditions, acquiring a second parameter value of the at least one RNN parameter, and setting the RNN parameter of the second RNN according to the second parameter value of the at least one RNN parameter to obtain a third RNN.
Optionally, referring to fig. 3-3, a first waveform graph line varying with unit time may be further drawn in the coordinate system according to the first data record number per unit time in S unit times, and a second waveform graph line varying with unit time may be further drawn according to the predicted second data record number per unit time in N unit times in the third unit time set; the first and second plotted oscillograms are then displayed.
Thus, the user can compare the first waveform graph with the second waveform graph, adjust the parameter value of at least one RNN parameter according to the comparison result, namely obtain the second parameter value of the at least one RNN parameter, and input the adjusted second parameter value of the at least one RNN parameter. Correspondingly, a second parameter value of the at least one RNN parameter is obtained.
Alternatively, referring to fig. 3-3, the horizontal axis of the coordinate system is a time axis, the scale on the time axis is a unit time, and the scale on the vertical axis is the number of data records.
Next, a prediction model may be generated from the first set of unit times, the second set of unit times, and the third RNN. The detailed implementation is as follows:
3027: a second model is generated by the third RNN based on the first number of data records per unit time in the first set of unit times.
Optionally, the first number of data records per unit time in the first unit time set may be input into the third RNN, and an output result of the third RNN is obtained, where the output result is a function, and the function is used as the second model.
3028: and predicting the number of second data records generated in each unit time between the second time stamp and the third time stamp through the second model to obtain a third unit time set.
Optionally, the second time stamp and the third time stamp are input into a second model, and the second model predicts and outputs a second number of data records generated in each unit time between the second time stamp and the third time stamp.
The number of unit times included in the third unit time set is equal to the number of unit times included in the second unit time set, that is, the third unit time set includes N unit times.
3029: and when the second unit time set and the third unit time set meet preset conditions, determining the second model as a prediction model.
And when the second unit time set and the third unit time set do not meet the preset condition, re-acquiring the parameter value of at least one RNN parameter adjusted by the user, and then operating according to the steps 3026 to 3029 until a prediction model is obtained.
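The overall flow of steps 3022 to 3029 is a loop: set the parameters, generate a model from the first unit time set, predict the N unit times, check the result against the second unit time set, and repeat with adjusted parameter values if the check fails. The sketch below shows only that control flow. The `fit_constant` trainer is a stand-in that is not an RNN, `close_enough` is a placeholder check rather than the preset condition of step 3025, and the list of parameter values stands in for the values a user would enter one round at a time.

```python
def generate_prediction_model(first_set, second_set, parameter_values, fit, meets_condition):
    """Try parameter values in order until the check against the second set passes."""
    for params in parameter_values:
        model = fit(params, first_set)              # steps 3023 / 3027
        third_set = model(len(second_set))          # steps 3024 / 3028
        if meets_condition(second_set, third_set):  # steps 3025 / 3029
            return model
    raise RuntimeError("no parameter values produced an acceptable model")


def fit_constant(params, counts):
    """Stand-in trainer (not an RNN): predicts a constant level per unit time."""
    level = params["gain"] * sum(counts) / len(counts)
    return lambda n: [level] * n


def close_enough(actual, predicted):
    """Placeholder check: every prediction is within 2 records of the actual count."""
    return all(abs(a - p) <= 2 for a, p in zip(actual, predicted))


model = generate_prediction_model(
    [10, 12, 11, 11], [11, 12],
    [{"gain": 0.5}, {"gain": 1.0}],  # first attempt fails, second is accepted
    fit_constant, close_enough,
)
print(model(2))
```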
Step 303: and calculating a first number of regions required to occupy in the Hbase table in the target time period according to the number of data records to be stored, the average data amount and the Region capacity.
The method comprises the following steps: calculating the total data volume of the data records to be stored generated in the target time period according to the number of the data records to be stored and the average data volume; and calculating a first number of regions required to occupy in the Hbase table in the target time period according to the total data amount and the Region capacity.
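As a worked example of step 303, the sketch below multiplies the predicted record count by the average data amount and divides by the Region capacity. Rounding up to a whole Region is an assumption, since the embodiment does not state the rounding rule, and the sample figures are made up.

```python
import math


def required_regions(pending_records, avg_record_bytes, region_capacity_bytes):
    """First number: Regions needed to hold the records predicted for the target period."""
    if region_capacity_bytes <= 0:
        raise ValueError("Region capacity must be positive")
    total_bytes = pending_records * avg_record_bytes
    # Assumption: a partially filled Region still counts as one Region.
    return math.ceil(total_bytes / region_capacity_bytes)


# 19,000,000 predicted records of 500 bytes each, Regions of 1,000,000,000 bytes.
print(required_regions(19_000_000, 500.0, 1_000_000_000))  # 9.5 rounds up to 10
```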
Step 304: when the first number is greater than the second number, acquiring, from the second number of Regions, the Regions whose storage rate exceeds a preset storage rate threshold as target Regions.
The second number of Regions are the Regions in the Hbase table pre-allocated for occupation within the target time period.
When the first number is greater than the second number, more data records may be generated in the target time period than the pre-allocated second number of Regions can store, so a third number of Regions need to be split, the third number being the difference between the first number and the second number. For this purpose, in the embodiment of the present application, the Regions whose storage rates first reach the preset storage rate threshold in the target time period may be split.
Optionally, when the message system stores a data record in a Region in the Hbase table, the message system increases the used space capacity included in the configuration information of the Region. In this step, the storage rate of each Region in the second number of regions is also obtained in real time, and the obtaining process may be: for each Region, reading the used space capacity of the Region from the configuration information of the Region, and calculating the storage rate of the Region according to the used space capacity and the Region capacity.
In this step, the storage rate of each Region in the second number of regions is obtained in real time, and when the storage rate of a certain Region exceeds a preset storage rate threshold, the following operation of step 305 is performed.
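The storage rate check can be sketched as a ratio of used capacity to Region capacity compared against the threshold. The Region names and byte counts below are hypothetical, and the dictionary stands in for the values read from each Region's information.

```python
def storage_rate(used_bytes, capacity_bytes):
    """Fraction of the Region capacity that is already used."""
    return used_bytes / capacity_bytes


def find_target_regions(used_by_region, capacity_bytes, threshold=0.8):
    """Names of Regions whose storage rate exceeds the preset threshold."""
    return [
        name
        for name, used in used_by_region.items()
        if storage_rate(used, capacity_bytes) > threshold
    ]


used = {"region-1": 850, "region-2": 400, "region-3": 790}
print(find_target_regions(used, 1000))  # only region-1 (0.85) exceeds 0.8
```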
Step 305: and when the number of the acquired target regions is less than or equal to the third number, determining the splitting time corresponding to the target regions, wherein the splitting time is later than the acquiring time of the target regions.
Specifically, the current date of the acquisition time of the target Region is determined, a time point is selected within the current date, and the time point is determined as the splitting time corresponding to the target Region, the splitting time being later than the preset time point in the current date.
In this step, Regions may be restricted to being split after a preset time point of each day, the time after which is generally a period during which the service is relatively idle. For example, the service is usually idle after 10 p.m. each day, so the preset time point may be 10 p.m., 10:30 p.m., 11 p.m., or the like. In this way, Regions are split in a time period when the service is idle, and the number of affected clients is reduced as much as possible.
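One way to pick such a splitting time is sketched below. The 10 p.m. preset point follows the example above; the fixed 30-minute delay and the error raised when no time point remains on the current date are assumptions made for illustration.

```python
from datetime import datetime, time, timedelta


def choose_split_time(acquired_at, preset=time(22, 0), delay=timedelta(minutes=30)):
    """Pick a time on the acquisition date later than both the preset point and acquired_at."""
    preset_point = datetime.combine(acquired_at.date(), preset)
    split_at = max(preset_point, acquired_at) + delay
    if split_at.date() != acquired_at.date():
        # Assumption: the caller decides what to do when the day is almost over.
        raise ValueError("no time point left on the current date")
    return split_at


print(choose_split_time(datetime(2018, 5, 22, 14, 5)))  # 2018-05-22 22:30:00
```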
Step 306: and splitting the target Region at the splitting time corresponding to the target Region to obtain two new regions.
After splitting into two new regions, the data records already stored in the target Region may be stored in the two new regions.
Alternatively, the data records already stored in the target Region may be stored in the two new regions on average.
Optionally, if the number of target Regions acquired during the target time period is greater than the third number, the third number of Regions have already been split and the current Regions are sufficient to store the data records generated during the target time period, so splitting of the acquired target Regions is stopped.
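The stopping rule can be sketched as a budget check: the third number is the first number minus the second number, and a newly acquired target Region is split only while the count of acquired target Regions does not exceed that budget. The function name and the sample numbers are illustrative.

```python
def should_split(acquired_count, first_number, second_number):
    """True while the number of acquired target Regions is within the third number."""
    third_number = first_number - second_number
    # Target Regions are only acquired when the first number exceeds the second.
    return third_number > 0 and acquired_count <= third_number


# First number 10, second number 6: the third number is 4.
print([should_split(n, 10, 6) for n in range(1, 6)])
# [True, True, True, True, False]
```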
The target Region also corresponds to an identification segment, the identification segment comprises a starting identification and an ending identification, when a data record is stored in the target Region, a rowkey for uniquely identifying the data record is generated according to a generation rule of the rowkey, the rowkey is a unique identification of the data record, and the rowkey is located in the identification segment.
When the target Region is split into two new Regions, an identifier located in the identifier segment may be obtained, and a suffix is added to that identifier to obtain the separation identifier. Optionally, the obtained identifier may be the median or another value of the identifier segment.
The identifier segment of one new Region may be set as the segment from the start identifier to the separation identifier, and the identifier segment of the other new Region may be set as the segment from the separation identifier to the end identifier.
Note that, in this embodiment, the added suffix is short, consisting of only one or a few characters such as digits and/or letters, so the separation identifier is short and the start identifier and end identifier of each Region are also short. Consequently, after a data record is obtained and its rowkey is generated, the data record is stored in the Region whose identifier segment matches the rowkey; because the start identifier and end identifier of each identifier segment are short, matching is faster and storage is therefore faster.
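The construction of the two new identifier segments can be sketched as follows. The sketch assumes that identifiers are strings compared lexicographically and that each segment is half-open (start included, end excluded); the chosen identifier, the suffix, and the sample rowkeys are hypothetical.

```python
def split_segment(start_id, end_id, chosen_id, suffix="0"):
    """Return the identifier segments of the two new Regions."""
    separation_id = chosen_id + suffix  # short suffix keeps the boundary short
    if not (start_id < separation_id < end_id):
        raise ValueError("separation identifier must lie inside the segment")
    return (start_id, separation_id), (separation_id, end_id)


def segment_for(rowkey, segments):
    """Find the segment whose range contains the rowkey, or None."""
    for start, end in segments:
        if start <= rowkey < end:
            return (start, end)
    return None


segments = split_segment("a", "m", "f", suffix="5")
print(segments)                          # (('a', 'f5'), ('f5', 'm'))
print(segment_for("f3-0001", segments))  # ('a', 'f5')
print(segment_for("g-42", segments))     # ('f5', 'm')
```

Because the boundaries are only a few characters long, each comparison in `segment_for` touches very little data, which illustrates the matching-speed argument made above.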
In the embodiment of the present application, the first number of Regions required within the target time period is predicted, and the third number of Regions to be split within the target time period is derived from the first number. When the third number is not 0 and a Region whose storage rate exceeds the preset storage rate threshold is detected within the target time period, that Region is taken as a target Region to be split. Because the preset storage rate threshold is smaller than 1, the target Region is not yet full when it is detected, so it is not split immediately; instead, it is split at a splitting time that falls in an idle time period during which the service is idle. The target Region can therefore continue to provide real-time service during the busy time period, which avoids the problem that real-time service cannot be provided to a large number of clients during the busy time period. Because fewer clients access the target Region during the idle time period, splitting it then reduces the number of affected clients. In addition, target Regions are split only while the number of acquired target Regions is less than or equal to the third number, so Regions that do not need to be split are not split in large numbers, the storage space of many Regions is not left unused in the target time period, and the waste of storage resources is reduced.
The following are embodiments of the apparatus of the present application that may be used to perform embodiments of the method of the present application. For details which are not disclosed in the embodiments of the apparatus of the present application, reference is made to the embodiments of the method of the present application.
Referring to fig. 4, an embodiment of the present application provides a first alternative implementation manner, in which an apparatus 400 for splitting partitions is provided, where the apparatus 400 includes:
a predicting module 401, configured to predict, according to a data record stored in an Hbase table before a target time period starts, a first number of regions that need to be occupied in the Hbase table in the target time period;
an obtaining module 402, configured to obtain, as a target Region, a Region with a storage rate exceeding a preset storage rate threshold from the second number of regions when the first number is greater than the second number, where the second number of regions are pre-allocated regions that need to occupy in the Hbase table within the target time period;
a splitting module 403, configured to split the target Region at a splitting time corresponding to the target Region when the number of the obtained target regions is less than or equal to a third number, where the splitting time is later than an obtaining time of obtaining the target Region, and the third number is equal to the first number minus the second number.
Optionally, with reference to the first optional implementation manner of the embodiment of the present application, in a second optional implementation manner of the embodiment of the present application, the prediction module 401 includes:
a first calculating unit, configured to calculate an average data amount of each data record in an Hbase table according to a total data record amount stored in the Hbase table and a used space capacity in the Hbase table;
a prediction unit for predicting the number of data records to be stored generated within the target time period by a prediction model;
and the second calculating unit is used for calculating a first number of regions required to occupy in the Hbase table in the target time period according to the number of the data records to be stored, the average data volume and the Region capacity.
Optionally, with reference to the first optional implementation manner or the second optional implementation manner of the embodiment of the present application, in a third optional implementation manner of the embodiment of the present application, the apparatus 400 further includes:
and the generation module is used for generating the prediction model according to the data record stored in the Hbase table before the target time period starts.
Optionally, with reference to the third optional implementation manner of the embodiment of the present application, in a fourth optional implementation manner of the embodiment of the present application, the generating module includes:
a first obtaining unit, configured to obtain a first unit time set and a second unit time set according to timestamps corresponding to respective data records in the Hbase table, where the first unit time set includes a first number of data records generated in each unit time between a first timestamp and a second timestamp, the second unit time set includes a first number of data records generated in each unit time between the second timestamp and a third timestamp, the first timestamp is a timestamp that is earliest from a current timestamp in the timestamps corresponding to the respective data records in the Hbase table, the third timestamp is a timestamp that is latest from the current timestamp in the timestamps corresponding to the respective data records in the Hbase table, and the second timestamp is located between the first timestamp and the third timestamp;
a second obtaining unit, configured to obtain a first parameter value of the at least one RNN parameter, and set an RNN parameter of the first RNN according to the first parameter value of the at least one RNN parameter, to obtain a second RNN;
a generating unit configured to generate the prediction model according to the first set of unit times, the second set of unit times, and the second RNN.
Optionally, with reference to the fourth optional implementation manner of the embodiment of the present application, in a fifth optional implementation manner of the embodiment of the present application, the generating unit is configured to:
generating a first model by the second RNN according to a first number of data records per unit time in a first set of unit times;
predicting the number of second data records generated in each unit time between a second timestamp and a third timestamp through the first model to obtain a third unit time set;
and when the second unit time set and the third unit time set meet preset conditions, determining the first model as a prediction model.
Optionally, with reference to the fifth optional implementation manner of the embodiment of the present application, in a sixth optional implementation manner of the embodiment of the present application, the generating unit is further configured to:
when the second unit time set and the third unit time set do not meet preset conditions, acquiring a second parameter value corresponding to an RNN parameter, and setting the RNN parameter of the second RNN according to the second parameter value corresponding to the RNN parameter to obtain a third RNN;
generating the prediction model from the first set of unit times, the second set of unit times, and the third RNN.
Optionally, with reference to any optional implementation manner of the first to sixth optional implementation manners of the embodiment of the present application, in a seventh optional implementation manner of the embodiment of the present application, the apparatus 400 further includes:
and the determining module is used for determining the current date of the acquisition time of the target Region, selecting a time point in the current date and determining the time point as the splitting time corresponding to the target Region, wherein the splitting time is later than the preset time point in the current date.
Optionally, with reference to any optional implementation manner of the first to seventh optional implementation manners of the embodiment of the present application, in an eighth optional implementation manner of the embodiment of the present application, the apparatus 400 further includes:
a storage module, configured to store data packets from the cache space of the message system into a Region of the Hbase table according to a configuration file;
the configuration file comprises at least one topic related information and at least one object set, wherein the topic related information at least comprises a topic identifier, an Hbase table identifier and an object set identifier;
the object set comprises at least one field domain information, and the field domain information at least comprises a field name and a column family to which the field belongs.
Optionally, with reference to the eighth optional implementation manner of the embodiment of the present application, in a ninth optional implementation manner of the embodiment of the present application, the storage module includes:
a third obtaining unit, configured to obtain, according to a topic identifier corresponding to a cache space of the message system, topic related information including the topic identifier from the configuration file, where the topic related information further includes an identifier of an Hbase table and a set identifier of an object set;
a fourth obtaining unit, configured to obtain, from a data packet in a cache space of the message system, field content corresponding to each field name in an object set corresponding to the set identifier;
and the storage unit is used for forming a data record by the obtained contents of the fields, and storing the data record in the Region of the Hbase table corresponding to the identifier of the Hbase table according to the column family to which the fields in the object set belong.
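The configuration-driven storage described in the eighth and ninth optional implementation manners can be sketched as follows. The layout of `CONFIG`, the key names, the sample topic `vehicle_pass`, and the function `build_record` are all hypothetical and chosen only for illustration; the actual write to Hbase is deliberately left out, so the sketch only shows how a data record is formed and keyed by column family.

```python
from typing import Any, Dict, Tuple

# Hypothetical configuration: topic related information plus object sets.
CONFIG: Dict[str, Any] = {
    "topics": [
        {"topic_id": "vehicle_pass", "hbase_table": "t_vehicle", "object_set_id": "vehicle_fields"},
    ],
    "object_sets": {
        "vehicle_fields": [
            {"name": "plate", "column_family": "info"},
            {"name": "speed", "column_family": "metrics"},
        ],
    },
}


def build_record(config: Dict[str, Any], topic_id: str, packet: Dict[str, Any]) -> Tuple[str, Dict[str, Any]]:
    """Form one data record from a packet read from the topic's cache space.

    Returns the target Hbase table identifier and a mapping of
    "column_family:field_name" to field content.
    """
    topic = next((t for t in config["topics"] if t["topic_id"] == topic_id), None)
    if topic is None:
        raise KeyError(f"no topic related information for {topic_id!r}")
    fields = config["object_sets"][topic["object_set_id"]]
    record: Dict[str, Any] = {}
    for field in fields:
        name = field["name"]
        if name in packet:  # fields absent from the packet are skipped
            record[f"{field['column_family']}:{name}"] = packet[name]
    return topic["hbase_table"], record
```

Packet fields that are not listed in the object set are ignored, so the configuration file alone decides what is stored and in which column family.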
In the embodiment of the present application, the prediction module predicts the first number of Regions needed in the target time period, and the third number of Regions to be split in the target time period is obtained from the first number. When target Regions whose storage rate exceeds the preset storage rate threshold are detected in the target time period, and the number of acquired target Regions is less than or equal to the third number, the splitting module splits the target Regions at a splitting time that falls within an idle period in which few services are being provided, which reduces the number of affected clients. At the same time, the third number limits how many target Regions are split, which avoids splitting too many Regions and wasting a large amount of Region storage resources.
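The decision logic summarized above can be sketched as follows. The function name `select_regions_to_split` is hypothetical, the storage rate is assumed to be the used fraction of a Region's capacity, and returning no Regions when more targets are found than the third number is an assumption of this sketch, since that case is not covered in this passage.

```python
from typing import Dict, List


def select_regions_to_split(
    first_number: int,                # predicted number of Regions needed
    second_number: int,               # number of pre-allocated Regions
    storage_rates: Dict[str, float],  # Region id -> used fraction (assumed meaning of storage rate)
    threshold: float,                 # preset storage rate threshold
) -> List[str]:
    """Return the target Regions to split, or an empty list if none qualify."""
    if first_number <= second_number:
        return []  # the pre-allocated Regions are expected to be sufficient
    third_number = first_number - second_number
    targets = [rid for rid, rate in storage_rates.items() if rate > threshold]
    if len(targets) <= third_number:
        return targets
    return []  # assumption: do nothing when targets exceed the third number
```

For example, with 12 Regions predicted and 10 pre-allocated, at most 2 Regions whose storage rate exceeds the threshold would be handed to the splitting module.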
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
Fig. 5 is a block diagram illustrating the structure of a terminal 500 according to an exemplary embodiment of the present invention, where the terminal 500 may be an overall node in a distributed storage system. The terminal 500 may be a portable mobile terminal, such as a smart phone, a tablet computer, an MP3 player (Moving Picture Experts Group Audio Layer III), an MP4 player (Moving Picture Experts Group Audio Layer IV), a notebook computer, or a desktop computer. The terminal 500 may also be referred to by other names, such as user equipment, portable terminal, laptop terminal, or desktop terminal.
In general, the terminal 500 includes: a processor 501 and a memory 502.
The processor 501 may include one or more processing cores, such as a 4-core processor or an 8-core processor. The processor 501 may be implemented in at least one of the following hardware forms: a DSP (Digital Signal Processor), an FPGA (Field-Programmable Gate Array), or a PLA (Programmable Logic Array). The processor 501 may also include a main processor and a coprocessor, where the main processor, also called a CPU (Central Processing Unit), processes data in the awake state, and the coprocessor is a low-power processor that processes data in the standby state. In some embodiments, the processor 501 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content to be displayed on the display screen. In some embodiments, the processor 501 may also include an AI (Artificial Intelligence) processor for handling computational operations related to machine learning.
Memory 502 may include one or more computer-readable storage media, which may be non-transitory. Memory 502 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 502 is used to store at least one instruction for execution by processor 501 to implement a method of splitting regions as provided by method embodiments herein.
In some embodiments, the terminal 500 may further optionally include: a peripheral interface 503 and at least one peripheral. The processor 501, memory 502 and peripheral interface 503 may be connected by a bus or signal lines. Each peripheral may be connected to the peripheral interface 503 by a bus, signal line, or circuit board. Specifically, the peripheral device includes: at least one of radio frequency circuitry 504, touch screen display 505, camera 506, audio circuitry 507, positioning components 508, and power supply 509.
The peripheral interface 503 may be used to connect at least one peripheral related to I/O (Input/Output) to the processor 501 and the memory 502. In some embodiments, the processor 501, memory 502, and peripheral interface 503 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 501, the memory 502, and the peripheral interface 503 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
The radio frequency circuit 504 is used for receiving and transmitting RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuit 504 communicates with communication networks and other communication devices via electromagnetic signals. The radio frequency circuit 504 converts an electrical signal into an electromagnetic signal for transmission, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 504 includes an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency circuit 504 may communicate with other terminals via at least one wireless communication protocol. The wireless communication protocols include, but are not limited to: the World Wide Web, metropolitan area networks, intranets, the successive generations of mobile communication networks (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 504 may further include NFC (Near Field Communication) related circuits, which is not limited in this application.
The display screen 505 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display screen 505 is a touch display screen, the display screen 505 can also capture touch signals on or above its surface. The touch signal may be input to the processor 501 as a control signal for processing. In this case, the display screen 505 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, there may be one display screen 505, disposed on the front panel of the terminal 500; in other embodiments, there may be at least two display screens 505, disposed on different surfaces of the terminal 500 or arranged in a folding design; in still other embodiments, the display screen 505 may be a flexible display disposed on a curved surface or a folding surface of the terminal 500. Furthermore, the display screen 505 may be given a non-rectangular, irregular shape, that is, a shaped screen. The display screen 505 may be made of materials such as LCD (Liquid Crystal Display) or OLED (Organic Light-Emitting Diode).
The camera assembly 506 is used to capture images or video. Optionally, the camera assembly 506 includes a front camera and a rear camera. Generally, the front camera is disposed on the front panel of the terminal, and the rear camera is disposed on the rear surface of the terminal. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth-of-field camera, a wide-angle camera, and a telephoto camera, so that the main camera and the depth-of-field camera can be combined to realize a background blurring function, and the main camera and the wide-angle camera can be combined to realize panoramic shooting and VR (Virtual Reality) shooting functions or other combined shooting functions. In some embodiments, the camera assembly 506 may also include a flash. The flash may be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash is a combination of a warm-light flash and a cold-light flash, and can be used for light compensation at different color temperatures.
The audio circuit 507 may include a microphone and a speaker. The microphone is used to collect sound waves from the user and the environment, convert the sound waves into electrical signals, and input the electrical signals to the processor 501 for processing, or to the radio frequency circuit 504 to realize voice communication. For stereo sound collection or noise reduction, a plurality of microphones may be provided at different parts of the terminal 500. The microphone may also be an array microphone or an omnidirectional pickup microphone. The speaker is used to convert electrical signals from the processor 501 or the radio frequency circuit 504 into sound waves. The speaker may be a conventional diaphragm speaker or a piezoelectric ceramic speaker. When the speaker is a piezoelectric ceramic speaker, it can convert an electrical signal not only into a sound wave audible to humans, but also into a sound wave inaudible to humans for purposes such as distance measurement. In some embodiments, the audio circuit 507 may also include a headphone jack.
The positioning component 508 is used to determine the current geographic location of the terminal 500 for navigation or LBS (Location Based Service). The positioning component 508 may be based on the Global Positioning System (GPS) of the United States, the BeiDou system of China, or the Galileo system of the European Union.
The power supply 509 is used to power the various components in the terminal 500. The power supply 509 may be an alternating current source, a direct current source, a disposable battery, or a rechargeable battery. When the power supply 509 includes a rechargeable battery, the rechargeable battery may be a wired rechargeable battery or a wireless rechargeable battery. A wired rechargeable battery is charged through a wired line, and a wireless rechargeable battery is charged through a wireless coil. The rechargeable battery may also support fast-charging technology.
In some embodiments, terminal 500 also includes one or more sensors 510. The one or more sensors 510 include, but are not limited to: acceleration sensor 511, gyro sensor 512, pressure sensor 513, fingerprint sensor 514, optical sensor 515, and proximity sensor 516.
The acceleration sensor 511 may detect the magnitude of acceleration on the three coordinate axes of a coordinate system established with respect to the terminal 500. For example, the acceleration sensor 511 may be used to detect the components of gravitational acceleration on the three coordinate axes. The processor 501 may control the touch display screen 505 to display the user interface in a landscape view or a portrait view according to the gravitational acceleration signal collected by the acceleration sensor 511. The acceleration sensor 511 may also be used to collect motion data of a game or of the user.
The gyro sensor 512 may detect a body direction and a rotation angle of the terminal 500, and the gyro sensor 512 may cooperate with the acceleration sensor 511 to acquire a 3D motion of the user on the terminal 500. The processor 501 may implement the following functions according to the data collected by the gyro sensor 512: motion sensing (such as changing the UI according to a user's tilting operation), image stabilization at the time of photographing, game control, and inertial navigation.
The pressure sensor 513 may be disposed on a side frame of the terminal 500 and/or on a lower layer of the touch display screen 505. When the pressure sensor 513 is disposed on the side frame of the terminal 500, it can detect the user's grip signal on the terminal 500, and the processor 501 performs left-hand or right-hand recognition or a shortcut operation according to the grip signal collected by the pressure sensor 513. When the pressure sensor 513 is disposed on the lower layer of the touch display screen 505, the processor 501 controls an operable control on the UI according to the user's pressure operation on the touch display screen 505. The operable control includes at least one of a button control, a scroll bar control, an icon control, and a menu control.
The fingerprint sensor 514 is used for collecting a fingerprint of the user, and the processor 501 identifies the identity of the user according to the fingerprint collected by the fingerprint sensor 514, or the fingerprint sensor 514 identifies the identity of the user according to the collected fingerprint. Upon recognizing that the user's identity is a trusted identity, the processor 501 authorizes the user to perform relevant sensitive operations including unlocking the screen, viewing encrypted information, downloading software, paying, and changing settings, etc. The fingerprint sensor 514 may be provided on the front, back, or side of the terminal 500. When a physical button or a vendor Logo is provided on the terminal 500, the fingerprint sensor 514 may be integrated with the physical button or the vendor Logo.
The optical sensor 515 is used to collect the ambient light intensity. In one embodiment, the processor 501 may control the display brightness of the touch display screen 505 based on the ambient light intensity collected by the optical sensor 515. Specifically, when the ambient light intensity is high, the display brightness of the touch display screen 505 is increased; when the ambient light intensity is low, the display brightness of the touch display screen 505 is decreased. In another embodiment, the processor 501 may also dynamically adjust the shooting parameters of the camera assembly 506 based on the ambient light intensity collected by the optical sensor 515.
A proximity sensor 516, also referred to as a distance sensor, is typically disposed on the front panel of the terminal 500. The proximity sensor 516 is used to collect the distance between the user and the front surface of the terminal 500. In one embodiment, when the proximity sensor 516 detects that the distance between the user and the front surface of the terminal 500 gradually decreases, the processor 501 controls the touch display screen 505 to switch from the screen-on state to the screen-off state; when the proximity sensor 516 detects that the distance between the user and the front surface of the terminal 500 gradually increases, the processor 501 controls the touch display screen 505 to switch from the screen-off state to the screen-on state.
Those skilled in the art will appreciate that the configuration shown in Fig. 5 does not limit the terminal 500, which may include more or fewer components than those shown, combine some components, or use a different arrangement of components.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the application disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (18)

1. A method of splitting a partition, the method comprising:
predicting a first number of partitions in an Hbase table to be occupied in a target time period according to data records stored in the Hbase table before the target time period starts;
when the first number is larger than a second number, acquiring a partition with a storage rate exceeding a preset storage rate threshold value from the second number of partitions as a target partition, wherein the second number of partitions are partitions which are allocated in advance and need to occupy the Hbase table in the target time period;
when the number of the acquired target partitions is smaller than or equal to a third number, splitting the target partitions at splitting time corresponding to the target partitions, wherein the splitting time is later than the acquiring time of the target partitions, and the third number is equal to the first number minus the second number.
2. The method of claim 1, wherein predicting a first number of partitions in the Hbase table that need to be occupied during a target time period based on data records stored in the Hbase table prior to a start of the target time period comprises:
calculating the average data volume of each data record in the Hbase table according to the stored data record total volume of the Hbase table and the used space capacity in the Hbase table;
predicting the number of data records to be stored generated in the target time period through a prediction model;
and calculating a first number of partitions needing to occupy in the Hbase table in the target time period according to the number of the data records to be stored, the average data volume and the partition capacity.
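The calculation in claim 2 can be illustrated with a short sketch. The claim does not give an explicit formula, so rounding the predicted data volume up to whole partitions is an assumption of this example, as are the function name `first_number_of_partitions` and the use of bytes as the unit.

```python
import math


def first_number_of_partitions(
    total_records: int,             # data records already stored in the Hbase table
    used_bytes: int,                # used space capacity of the Hbase table
    predicted_records: int,         # output of the prediction model for the target time period
    partition_capacity_bytes: int,  # capacity of one partition
) -> int:
    """Estimate how many partitions the target time period will occupy."""
    if total_records <= 0 or partition_capacity_bytes <= 0:
        raise ValueError("total_records and partition_capacity_bytes must be positive")
    average_record_bytes = used_bytes / total_records
    predicted_bytes = predicted_records * average_record_bytes
    return math.ceil(predicted_bytes / partition_capacity_bytes)  # assumed rounding rule
```

For instance, 1,000,000 stored records occupying 2,000,000,000 bytes give an average of 2,000 bytes per record; 12,000,000 predicted records then need about 24 GB, which corresponds to 3 partitions of 10 GiB each.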
3. The method of claim 2, wherein before predicting the first number of partitions in the Hbase table that need to be occupied within the target time period, the method further comprises:
and generating the prediction model according to the data record stored in the Hbase table before the target time period starts.
4. The method of claim 3, wherein said generating the predictive model from data records stored in the Hbase table prior to the beginning of the target time period comprises:
acquiring a first unit time set and a second unit time set according to timestamps corresponding to all data records in the Hbase table, wherein the first unit time set comprises a first data record number generated in each unit time between a first timestamp and a second timestamp, the second unit time set comprises a first data record number generated in each unit time between the second timestamp and a third timestamp, the first timestamp is the earliest timestamp from the timestamps corresponding to all the data records in the Hbase table, the third timestamp is the latest timestamp from the timestamps corresponding to all the data records in the Hbase table, and the second timestamp is located between the first timestamp and the third timestamp;
acquiring a first parameter value in at least one Recurrent Neural Network (RNN) parameter, and setting the RNN parameter of the first RNN according to the first parameter value of the at least one RNN parameter to obtain a second RNN;
generating the prediction model from the first set of unit times, the second set of unit times, and the second RNN.
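The construction of the first and second unit time sets in claim 4 can be sketched as follows. The function name `unit_time_sets`, the use of integer second timestamps, and the `train_fraction` parameter that places the second timestamp between the first and third timestamps are assumptions of this example; the claim only requires that the second timestamp lie between the other two.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def unit_time_sets(
    timestamps: Sequence[int],     # record timestamps in seconds (assumed unit)
    unit_seconds: int,             # length of one unit time
    train_fraction: float = 0.7,   # assumed position of the second timestamp
) -> Tuple[List[int], List[int]]:
    """Count records per unit time and split the counts into two sets."""
    if not timestamps:
        raise ValueError("the Hbase table holds no data records")
    first_ts, third_ts = min(timestamps), max(timestamps)
    n_units = (third_ts - first_ts) // unit_seconds + 1
    counts = Counter((ts - first_ts) // unit_seconds for ts in timestamps)
    series = [counts.get(i, 0) for i in range(n_units)]
    cut = max(1, int(n_units * train_fraction))  # index of the unit containing the second timestamp
    return series[:cut], series[cut:]
```

The first list would be used to fit the model and the second list to check its predictions, in line with claims 5 and 6.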
5. The method of claim 4, wherein the generating the prediction model from the first set of unit times, the second set of unit times, and the second RNN comprises:
generating a first model by the second RNN according to a first number of data records per unit time in a first set of unit times;
predicting the number of second data records generated in each unit time between a second timestamp and a third timestamp through the first model to obtain a third unit time set;
and when the second unit time set and the third unit time set meet preset conditions, determining the first model as a prediction model.
6. The method of claim 5, wherein the method further comprises:
when the second unit time set and the third unit time set do not meet preset conditions, acquiring a second parameter value corresponding to an RNN parameter, and setting the RNN parameter of the second RNN according to the second parameter value corresponding to the RNN parameter to obtain a third RNN;
generating the prediction model from the first set of unit times, the second set of unit times, and the third RNN.
7. The method of claim 1, wherein after obtaining the partition with the storage rate exceeding the preset storage rate threshold from the second number of partitions as the target partition, the method further comprises:
determining the date on which the target partition was acquired, selecting a time point on that date, and determining the selected time point as the splitting time corresponding to the target partition, wherein the splitting time is later than a preset time point on that date.
8. The method of any of claims 1 to 7, further comprising:
storing the data packet in the cache space of the message system in a partition of Hbase according to the configuration file;
the configuration file comprises at least one topic related information and at least one object set, wherein the topic related information at least comprises a topic identifier, an Hbase table identifier and an object set identifier;
the object set comprises at least one field domain information, and the field domain information at least comprises a field name and a column family to which the field belongs.
9. The method of claim 8, wherein storing the packet in the buffer space of the message system in the partition of the Hbase according to the configuration file comprises:
according to a topic identifier corresponding to a cache space of the message system, obtaining topic related information comprising the topic identifier from the configuration file, wherein the topic related information further comprises an identifier of an Hbase table and a set identifier of an object set;
acquiring field content corresponding to each field name in an object set corresponding to the set identification from a data packet in a cache space of the message system;
and forming a data record by the obtained contents of the fields, and storing the data record in a partition of the Hbase table corresponding to the identifier of the Hbase table according to the column family to which the fields in the object set belong.
10. An apparatus for splitting partitions, the apparatus comprising:
the prediction module is used for predicting a first number of partitions needing to be occupied in the Hbase table in a target time period according to data records stored in the Hbase table before the target time period starts;
an obtaining module, configured to obtain, as a target partition, a partition whose storage rate exceeds a preset storage rate threshold from the second number of partitions when the first number is greater than a second number, where the second number of partitions is a partition that is pre-allocated and needs to occupy in the Hbase table within the target time period;
the splitting module is used for splitting the target partition at the splitting time corresponding to the target partition when the number of the obtained target partitions is smaller than or equal to a third number, wherein the splitting time is later than the obtaining time of the target partition, and the third number is equal to the first number minus the second number.
11. The apparatus of claim 10, wherein the prediction module comprises:
a first calculating unit, configured to calculate an average data amount of each data record in an Hbase table according to a total data record amount stored in the Hbase table and a used space capacity in the Hbase table;
a prediction unit for predicting the number of data records to be stored generated within the target time period by a prediction model;
and the second calculating unit is used for calculating a first number of partitions which need to occupy the Hbase table in the target time period according to the number of the data records to be stored, the average data volume and the partition capacity.
12. The apparatus of claim 11, wherein the apparatus further comprises:
and the generation module is used for generating the prediction model according to the data record stored in the Hbase table before the target time period starts.
13. The apparatus of claim 12, wherein the generating module comprises:
a first obtaining unit, configured to obtain a first unit time set and a second unit time set according to timestamps corresponding to respective data records in the Hbase table, where the first unit time set includes a first number of data records generated in each unit time between a first timestamp and a second timestamp, the second unit time set includes a first number of data records generated in each unit time between the second timestamp and a third timestamp, the first timestamp is a timestamp that is earliest from a current timestamp in the timestamps corresponding to the respective data records in the Hbase table, the third timestamp is a timestamp that is latest from the current timestamp in the timestamps corresponding to the respective data records in the Hbase table, and the second timestamp is located between the first timestamp and the third timestamp;
a second obtaining unit, configured to obtain a first parameter value of at least one recurrent neural network RNN parameter, and set an RNN parameter of the first RNN according to the first parameter value of the at least one RNN parameter, to obtain a second RNN;
a generating unit configured to generate the prediction model according to the first set of unit times, the second set of unit times, and the second RNN.
14. The apparatus of claim 13, wherein the generating unit is to:
generating a first model by the second RNN according to a first number of data records per unit time in a first set of unit times;
predicting the number of second data records generated in each unit time between a second timestamp and a third timestamp through the first model to obtain a third unit time set;
and when the second unit time set and the third unit time set meet preset conditions, determining the first model as a prediction model.
15. The apparatus of claim 14, wherein the generating unit is further configured to:
when the second unit time set and the third unit time set do not meet preset conditions, acquiring a second parameter value corresponding to an RNN parameter, and setting the RNN parameter of the second RNN according to the second parameter value corresponding to the RNN parameter to obtain a third RNN;
generating the prediction model from the first set of unit times, the second set of unit times, and the third RNN.
16. The apparatus of claim 10, wherein the apparatus further comprises:
the determining module is used for determining the current date of the obtaining time of the target partition, selecting a time point in the current date and determining the time point as the splitting time corresponding to the target partition, wherein the splitting time is later than the preset time point in the current date.
17. The apparatus of any of claims 10 to 16, further comprising:
the storage module is used for storing the data packet in the cache space of the message system in the partition of the Hbase according to the configuration file;
the configuration file comprises at least one topic related information and at least one object set, wherein the topic related information at least comprises a topic identifier, an Hbase table identifier and an object set identifier;
the object set comprises at least one field domain information, and the field domain information at least comprises a field name and a column family to which the field belongs.
18. The apparatus of claim 17, wherein the storage module comprises:
a third obtaining unit, configured to obtain, according to a topic identifier corresponding to a cache space of the message system, topic related information including the topic identifier from the configuration file, where the topic related information further includes an identifier of an Hbase table and a set identifier of an object set;
a fourth obtaining unit, configured to obtain, from a data packet in a cache space of the message system, field content corresponding to each field name in an object set corresponding to the set identifier;
and the storage unit is used for forming a data record by the obtained contents of the fields, and storing the data record in a partition of the Hbase table corresponding to the identifier of the Hbase table according to the column family to which the fields in the object set belong.
CN201810494401.5A 2018-05-22 2018-05-22 Method and device for splitting partitions Active CN110519319B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810494401.5A CN110519319B (en) 2018-05-22 2018-05-22 Method and device for splitting partitions

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810494401.5A CN110519319B (en) 2018-05-22 2018-05-22 Method and device for splitting partitions

Publications (2)

Publication Number Publication Date
CN110519319A CN110519319A (en) 2019-11-29
CN110519319B true CN110519319B (en) 2022-02-11

Family

ID=68621791

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810494401.5A Active CN110519319B (en) 2018-05-22 2018-05-22 Method and device for splitting partitions

Country Status (1)

Country Link
CN (1) CN110519319B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110888846B (en) * 2019-12-10 2020-10-23 北京北龙云海网络数据科技有限责任公司 Data memory management method and system

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6023706A (en) * 1997-07-11 2000-02-08 International Business Machines Corporation Parallel file system and method for multiple node file access
CN105988995A (en) * 2015-01-27 2016-10-05 杭州海康威视数字技术股份有限公司 HFile based data batch loading method
CN107169009A (en) * 2017-03-31 2017-09-15 北京奇艺世纪科技有限公司 A kind of data splitting method and device of distributed memory system
CN107480205A (en) * 2017-07-24 2017-12-15 北京京东尚科信息技术有限公司 A kind of method and apparatus for carrying out data partition
CN107943412A (en) * 2016-10-12 2018-04-20 阿里巴巴集团控股有限公司 A kind of subregion division, the method, apparatus and system for deleting data file in subregion

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101757307B1 (en) * 2013-08-20 2017-07-26 엘지전자 주식회사 Apparatus for transmitting media data via streaming service, apparatus for receiving media data via streaming service, method for transmitting media data via streaming service and method for receiving media data via streaming service

Non-Patent Citations (2)

Title
"BESIII Physics Data Storing and Processing on HBase and MapReduce"; Xiaofeng LEI; 21st International Conference on Computing in High Energy and Nuclear Physics; 2015-12-31; full text *
"MapReduce data balancing method based on an incremental partitioning strategy"; 王卓; Chinese Journal of Computers (计算机学报); 2016-01-31; full text *

Also Published As

Publication number Publication date
CN110519319A (en) 2019-11-29

Similar Documents

Publication Publication Date Title
CN110674022B (en) Behavior data acquisition method and device and storage medium
CN108259945B (en) Method and device for processing playing request for playing multimedia data
CN109451343A (en) Video sharing method, apparatus, terminal and storage medium
CN110248236B (en) Video playing method, device, terminal and storage medium
CN111327694B (en) File uploading method and device, storage medium and electronic equipment
CN111586431B (en) Method, device and equipment for live broadcast processing and storage medium
CN110196673B (en) Picture interaction method, device, terminal and storage medium
CN111836069A (en) Virtual gift presenting method, device, terminal, server and storage medium
CN110147503B (en) Information issuing method and device, computer equipment and storage medium
CN110569220B (en) Game resource file display method and device, terminal and storage medium
CN111177137A (en) Data deduplication method, device, equipment and storage medium
CN109451248B (en) Video data processing method and device, terminal and storage medium
CN110968815A (en) Page refreshing method, device, terminal and storage medium
CN111625315A (en) Page display method and device, electronic equipment and storage medium
CN111275607A (en) Interface display method and device, computer equipment and storage medium
CN108401194B (en) Time stamp determination method, apparatus and computer-readable storage medium
CN113032587A (en) Multimedia information recommendation method, system, device, terminal and server
CN113098781B (en) Session list processing method, device, server and storage medium
CN111694521B (en) Method, device and system for storing file
CN110519319B (en) Method and device for splitting partitions
CN110113669B (en) Method and device for acquiring video data, electronic equipment and storage medium
CN110851435B (en) Data storage method and device
CN110971692B (en) Method and device for opening service and computer storage medium
CN110213131B (en) Bandwidth determination method and device, computer equipment and storage medium
CN114785766A (en) Control method of intelligent equipment, terminal and server

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant