CN105786916B - Storage method and system for a hierarchical directory based on a large-capacity table
- Publication number: CN105786916B
- Application number: CN201410827977.0A
- Authority: CN (China)
- Prior art keywords: directory, path, data block, layer, value
- Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Classification: Information Retrieval, DB Structures and FS Structures Therefor
Abstract
The invention discloses a method and system for storing a hierarchical directory in a large-capacity table. The method includes: prefixing an identifier to the first letter of the lowest-layer object or directory of a given path, marking that object or directory as the lowest layer, and storing it; then obtaining the parent directory path of the object or directory and determining whether the parent directory path is the first layer; if it is not, prefixing an identifier to the first letter of the directory of the parent directory path, marking the parent directory path as the lowest layer, and storing that directory. Objects and directories of the same layer are thereby stored together, which improves retrieval efficiency.
Description
Technical field
The invention relates to the field of cloud computing, and in particular to a method and system for storing a hierarchical directory in a large-capacity table.
Background
Traditional relational databases perform and scale poorly for data with trillions of records. A large-capacity table can hold petabytes of data, and its volume can grow by tens of times as the business develops. Although a large-capacity table removes the capacity limit, the way it stores directories and files leaves the directories or files of the particular layer being queried scattered at random. This causes large continuous scans followed by path parsing, which consumes substantial I/O and CPU resources. For example, consider three captured paths that sort near the front, the middle, and the end in alphabetical order: /bn1/a_dp1/a_dp2/b, /bn1/a_dp1/mfile_dp2, and /bn1/a_dp1/z_dp2/z:
/bn1/a_dp1/a_dp2/b
........
/bn1/a_dp1/a_dp2/c/23/s
/bn1/a_dp1/ab_dp2/c/23/s
(60 billion object records in between)
........
/bn1/a_dp1/mfile_dp2
........
/bn1/a_dp1/a_dp2/c/23/s
/bn1/a_dp1/ab_dp2/c/23/s
(40 billion object records in between)
........
/bn1/a_dp1/z_dp2/z
Suppose one server can hold 10 billion object records and there are 10 servers. To retrieve the next-level directories or files under /bn1/a_dp1, all 10 servers holding the range from /bn1/a_dp1/a_dp2/b through /bn1/a_dp1/z_dp2/z must be fully scanned, and during the scan the object name of every record must be parsed layer by layer to cut out the third-level directory. A single directory lookup therefore requires a full scan of the storage of 10 servers, plus string truncation, matching, and deduplication over 100 billion object records. Because the default directory storage of a large-capacity table does not keep directories and objects of the same layer together, retrieval is extremely inefficient.
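The cost described above can be illustrated with a minimal Python sketch of the prior-art style listing. It is an illustration only, not code from the patent: the function name `list_children_by_full_scan` and the in-memory list standing in for the table are assumptions.

```python
def list_children_by_full_scan(all_paths, directory):
    """List the names directly under `directory` by scanning every record.

    Hypothetical stand-in for the default storage: each record is a full
    path, so every record must be read, truncated, and deduplicated.
    """
    seen = set()
    scanned = 0
    for path in all_paths:
        scanned += 1                       # every record costs one read
        if not path.startswith(directory):
            continue
        rest = path[len(directory):]       # string truncation
        name = rest.split("/", 1)[0]       # keep only the next layer
        if name:
            seen.add(name)                 # deduplication
    return sorted(seen), scanned


PATHS = [
    "/bn1/a_dp1/a_dp2/b",
    "/bn1/a_dp1/a_dp2/c/23/s",
    "/bn1/a_dp1/ab_dp2/c/23/s",
    "/bn1/a_dp1/mfile_dp2",
    "/bn1/a_dp1/z_dp2/z",
]
children, scanned = list_children_by_full_scan(PATHS, "/bn1/a_dp1/")
```

Here `scanned` equals the total number of records regardless of how many children the directory actually has, which is the behavior the background section criticizes.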
Summary of the invention
The technical problem to be solved by the invention is that the way a large-capacity table stores directories and objects does not keep directories and objects of the same layer together, which makes retrieval extremely inefficient.
According to one aspect of the invention, a method for storing a hierarchical directory in a large-capacity table is provided, including:
prefixing an identifier to the first letter of the lowest-layer object or directory of a given path, marking that object or directory as the lowest layer, and storing the object or directory;
obtaining the parent directory path of the object or directory and determining whether the parent directory path is the first layer; if it is not, prefixing an identifier to the first letter of the directory of the parent directory path, marking the parent directory path as the lowest layer, and storing the directory of the parent directory path.
Further, the object or directory is used as the key of the large-capacity table, and the number of objects in the layer directly below a directory is used as the value of that directory;
when an object is added, the value of the object's parent directory is incremented;
when an object is deleted, the object's key-value record is deleted first, and the value of the object's parent directory is then decremented.
Further, when an object is added and its parent directory does not exist, the directory is created first and its value is set to 1;
when an object is deleted and the value of its parent directory is 0 after the decrement, the directory is deleted.
Further, the start record and end record of each data block, and the server on which the data block is stored, are saved, where a data block contains object or directory records of the large-capacity table;
when a directory or object is retrieved, the start and end records of the data blocks in the metadata table are first used to find the data block corresponding to the directory or object being retrieved, the server storing that data block is then found, and that server is queried.
Further, the identifier is a leading character that makes the objects or directories of the same layer sort first.
According to another aspect of the invention, a system for storing a hierarchical directory in a large-capacity table is also provided, including:
a processing module, configured to prefix an identifier to the first letter of the lowest-layer object or directory of a given path, marking that object or directory as the lowest layer; and to obtain the parent directory path of the object or directory and determine whether the parent directory path is the first layer, and if it is not, to prefix an identifier to the first letter of the directory of the parent directory path, marking the parent directory path as the lowest layer;
a storage module, configured to store objects or directories.
Further, a maintenance module is configured to use the object or directory as the key of the large-capacity table and the number of objects in the layer directly below a directory as the value of that directory; when an object is added, the value of the object's parent directory is incremented; when an object is deleted, the object's key-value record is deleted first, and the value of the object's parent directory is then decremented.
Further, the maintenance module is configured so that, when an object is added and its parent directory does not exist, the directory is created first and its value is set to 1; and when an object is deleted and the value of its parent directory is 0 after the decrement, the directory is deleted.
Further, a record module is configured to save the start record and end record of each data block and the server on which the data block is stored, where a data block contains object or directory records of the large-capacity table;
when a directory or object is retrieved, the start and end records of the data blocks in the metadata table are first used to find the data block corresponding to the directory or object being retrieved, the server storing that data block is then found, and that server is queried.
Further, the identifier is a leading character that makes the objects or directories of the same layer sort first.
Compared with the prior art, the invention prefixes an identifier to the first letter of the lowest-layer object or directory of a given path, marks that object or directory as the lowest layer, and stores it; it then obtains the parent directory path of the object or directory and determines whether the parent directory path is the first layer, and if it is not, prefixes an identifier to the first letter of the directory of the parent directory path, marks the parent directory path as the lowest layer, and stores that directory. Objects and directories of the same layer are thereby stored together, which improves retrieval efficiency.
Other features and advantages of the invention will become clear from the following detailed description of exemplary embodiments with reference to the accompanying drawings.
Description of the drawings
The accompanying drawings, which form part of the specification, illustrate embodiments of the invention and, together with the description, serve to explain its principles.
The invention can be understood more clearly from the following detailed description with reference to the accompanying drawings, in which:
FIG. 1 is a flow chart of one embodiment of the method for storing a hierarchical directory in a large-capacity table.
FIG. 2 is a structural diagram of one embodiment of the system for storing a hierarchical directory in a large-capacity table.
FIG. 3 compares the hierarchical directory storage of the invention with the prior-art storage method.
Detailed description
Various exemplary embodiments of the invention will now be described in detail with reference to the accompanying drawings. Unless specifically stated otherwise, the relative arrangement of components and steps, the numerical expressions, and the numerical values set forth in these embodiments do not limit the scope of the invention.
It should also be understood that, for ease of description, the parts shown in the drawings are not drawn to scale.
The following description of at least one exemplary embodiment is merely illustrative and in no way limits the invention, its application, or its uses.
Techniques, methods, and devices known to those of ordinary skill in the relevant art may not be discussed in detail, but where appropriate they should be regarded as part of the granted specification.
In all examples shown and discussed here, any specific value should be interpreted as merely illustrative, not limiting. Other examples of the exemplary embodiments may therefore use different values.
Note that similar reference numerals and letters denote similar items in the following drawings, so once an item has been defined in one drawing it need not be discussed further in subsequent drawings.
To make the purpose, technical solution, and advantages of the invention clearer, the invention is described in further detail below with reference to specific embodiments and the accompanying drawings.
FIG. 1 is a flow chart of one embodiment of the method for storing a hierarchical directory in a large-capacity table. The method includes the following steps:
In step 110, an identifier is prefixed to the first letter of the lowest-layer object or directory of the given path, marking that object or directory as the lowest layer, and the object or directory is stored.
In step 120, the parent directory path of the object or directory is obtained, and it is determined whether the parent directory path is the first layer. If it is not, step 130 is performed; if it is, step 140 is performed.
In step 130, an identifier is prefixed to the first letter of the directory of the parent directory path, marking the parent directory path as the lowest layer, and the directory of the parent directory path is stored.
In step 140, no further identifier is added and nothing more is stored.
The identifier may be a leading character, for example the minimum character value (Character.MIN_VALUE). Character.MIN_VALUE is a character that cannot be printed or displayed, so for readability it is written as Δ here. Because the system sorts in character order by default, prefixing the leading character to the first letter of an object or directory name makes the objects and directories of the same layer sort first and be stored contiguously. Those skilled in the art will understand that Δ is only an example; any other symbol may be used to represent it, and this should not be construed as limiting the invention.
Take the object /bucketname1/a_depth1/z_depth2/z as an example: bucketname1 is the first layer, a_depth1 the second layer, z_depth2 the third layer, and z the lowest layer. That is, the path of an object or directory reads from left to right as first layer/second layer/third layer/fourth layer/……/n-th layer/lowest layer. The computed output for the object /bucketname1/a_depth1/z_depth2/z is:
/bucketname1/a_depth1/z_depth2/Δz
/bucketname1/a_depth1/Δz_depth2
/bucketname1/Δa_depth1
Because the first layer is the bucket (bucketname) layer and not an object, no identifier is added to the first layer.
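Steps 110 to 140 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patent's implementation: the function name `encode_keys` is hypothetical, `"\x00"` stands in for the leading character written as Δ, and directory keys are given a trailing "/" (the convention used in the stored listing of this description) even though the three output lines above omit it.

```python
MARK = "\x00"  # assumed stand-in for the leading character (shown as Δ)


def encode_keys(path):
    """Return the keys to store for `path`, lowest layer first.

    The marker is inserted before the last component of the path and of
    every ancestor directory, stopping before the first (bucket) layer.
    """
    is_dir = path.endswith("/")
    parts = [p for p in path.split("/") if p]
    keys = []
    # parts[0] is the bucket layer; it never receives the marker (step 140).
    for depth in range(len(parts), 1, -1):
        parent = "/" + "/".join(parts[:depth - 1]) + "/"
        name = parts[depth - 1]
        # Ancestors are always directories; the lowest layer may be either.
        suffix = "/" if (depth < len(parts) or is_dir) else ""
        keys.append(parent + MARK + name + suffix)
    return keys


keys = encode_keys("/bucketname1/a_depth1/z_depth2/z")
```

For the example path this yields three keys that correspond to the three output lines above, with Δ replaced by the marker character.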
The invention stores objects and directories of the same layer together, for example:
/bn1/Δa_dp1/
……
/bn1/a_dp1/Δa_dp2/
/bn1/a_dp1/Δab_dp2/
/bn1/a_dp1/Δmfile_dp2
/bn1/a_dp1/Δz_dp2/
……
/bn1/a_dp1/a_dp2/Δb
/bn1/a_dp1/a_dp2/Δc/
/bn1/a_dp1/ab_dp2/Δc/
……
/bn1/a_dp1/z_dp2/Δz
Entries ending with "/" are directories; entries without a trailing "/" are objects.
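The effect of this ordering on a directory listing can be sketched as a prefix range scan over the sorted keys. The sketch below is illustrative only: a sorted Python list stands in for the large-capacity table, `"\x00"` stands in for Δ, and `list_children` is a hypothetical helper name.

```python
from bisect import bisect_left

MARK = "\x00"  # assumed stand-in for the leading character (shown as Δ)


def list_children(sorted_keys, directory):
    """Return the names directly under `directory` (which ends with "/").

    All children share the prefix directory + MARK and are contiguous in
    sorted order, so the scan stops at the first key without that prefix.
    """
    prefix = directory + MARK
    start = bisect_left(sorted_keys, prefix)
    children = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break
        children.append(key[len(prefix):])  # only the marker prefix is stripped
    return children


TABLE = sorted([
    "/bn1/" + MARK + "a_dp1/",
    "/bn1/a_dp1/" + MARK + "a_dp2/",
    "/bn1/a_dp1/" + MARK + "ab_dp2/",
    "/bn1/a_dp1/" + MARK + "mfile_dp2",
    "/bn1/a_dp1/" + MARK + "z_dp2/",
    "/bn1/a_dp1/a_dp2/" + MARK + "b",
    "/bn1/a_dp1/a_dp2/" + MARK + "c/",
    "/bn1/a_dp1/ab_dp2/" + MARK + "c/",
    "/bn1/a_dp1/z_dp2/" + MARK + "z",
])
children = list_children(TABLE, "/bn1/a_dp1/")
```

Only the four entries of the queried layer are read; deeper records such as /bn1/a_dp1/a_dp2/Δb sort after the block and are never touched.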
A metadata table records the state of the large-capacity table, for example how its objects and directories are stored on the servers and in the data blocks on those servers. If a data block is 128 MB, for instance, the metadata table records the server holding each data block together with the block's start record and end record, so a lookup in the metadata table quickly locates the object or directory being queried. The metadata table may be kept on a single server or spread across several servers.
For example, to retrieve the next-level objects or directories under /bn1/a_dp1/, the start and end records of the data blocks in the metadata table are first used to find the data blocks corresponding to the directory or object being retrieved, the servers storing those data blocks are then found, and those servers are queried.
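The metadata lookup can be sketched as a range-overlap test between the key range of the queried layer and the start and end records of each data block. Everything named here is an assumption for illustration: the tuple layout `(start_key, end_key, server)`, the server names, the helper `servers_for_directory`, and the use of the highest Unicode code point as an upper bound for keys.

```python
MARK = "\x00"            # assumed stand-in for the leading character (Δ)
MAX_CHAR = "\U0010FFFF"  # assumed upper bound for any character in a key


def servers_for_directory(metadata, directory):
    """Return the servers whose data blocks may hold children of `directory`.

    `metadata` is a list of (start_key, end_key, server) tuples, one per
    data block. A block is relevant when its key range overlaps the range
    covered by the prefix directory + MARK.
    """
    lo = directory + MARK
    hi = directory + MARK + MAX_CHAR
    return [server for start, end, server in metadata
            if start <= hi and end >= lo]


# Hypothetical metadata table: three data blocks on three servers.
METADATA = [
    ("/bn1/" + MARK + "a_dp1/", "/bn1/a_dp1/" + MARK + "mfile_dp2", "server-1"),
    ("/bn1/a_dp1/" + MARK + "z_dp2/", "/bn1/a_dp1/a_dp2/" + MARK + "c/", "server-2"),
    ("/bn1/a_dp1/ab_dp2/" + MARK + "c/", "/bn1/a_dp1/z_dp2/" + MARK + "z", "server-3"),
]
servers = servers_for_directory(METADATA, "/bn1/a_dp1/")
```

In this example the children of /bn1/a_dp1/ span the first two blocks, so only those two servers need to be queried. A production metadata table would more likely be searched with a sorted index than with this linear pass.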
If the servers hold 100 billion records and the third-layer directory has 1 million records, the prior art needs 100 billion I/O operations, whereas the invention needs 1 million.
In addition, the prior art performs string truncation, matching, deduplication, and other operations on 100 billion object records, whereas with the invention each object record only needs its leading character removed.
In this embodiment, an identifier is prefixed to the first letter of the lowest-layer object or directory of a given path, marking that object or directory as the lowest layer, and the object or directory is stored; the parent directory path of the object or directory is then obtained and it is determined whether the parent directory path is the first layer; if it is not, an identifier is prefixed to the first letter of the directory of the parent directory path, marking the parent directory path as the lowest layer, and that directory is stored. Objects and directories of the same layer are thereby stored together, which improves retrieval efficiency.
In another embodiment of the invention, the object or directory is used as the key of the large-capacity table, which is an object directory hierarchical storage table, and the number of objects in the layer directly below a directory is used as the value of that directory.
When an object is added, the value of the object's parent directory is incremented; if the parent directory does not exist, the directory is created first and its value is set to 1.
When an object is deleted, the object's key-value record is deleted first and the value of the object's parent directory is then decremented; if the value is 0 after the decrement, the directory is deleted.
In this embodiment, under the directory-layer index, deleting an object from a directory also deletes the directory if it becomes empty. When objects are added or deleted, the object counts of the directories obtained by decomposing the path layer by layer are incremented or decremented, and a directory is deleted when its object count reaches 0. This provides add, delete, and maintenance management of directory files while improving retrieval performance and storage capacity.
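The counter maintenance can be sketched with a plain dictionary standing in for the key-value table. This is a sketch under assumptions, not the patent's implementation: the helper names are hypothetical, `"\x00"` stands in for Δ, object records carry no payload, and a newly created or removed directory is itself counted as a child of its own parent, which is one reading of the layer-by-layer description above.

```python
MARK = "\x00"  # assumed stand-in for the leading character (Δ)


def key_for(path):
    """Storage key: the marker is inserted before the last path component."""
    trailing = "/" if path.endswith("/") else ""
    head, _, name = path.rstrip("/").rpartition("/")
    return head + "/" + MARK + name + trailing


def parent_of(path):
    """Parent directory path, or None when the parent is the bucket layer."""
    head = path.rstrip("/").rpartition("/")[0]
    return head + "/" if head.count("/") >= 2 else None


def add_object(table, path):
    if key_for(path) in table:
        return                        # already stored; do not count twice
    table[key_for(path)] = None       # object record (payload omitted)
    parent = parent_of(path)
    while parent is not None:
        pkey = key_for(parent)
        if pkey in table:
            table[pkey] += 1          # existing directory: increment
            break
        table[pkey] = 1               # missing directory: create with value 1
        parent = parent_of(parent)    # the new directory is a new child too


def delete_object(table, path):
    if key_for(path) not in table:
        return
    del table[key_for(path)]          # delete the object's record first
    parent = parent_of(path)
    while parent is not None:
        pkey = key_for(parent)
        table[pkey] -= 1              # then decrement the parent directory
        if table[pkey] > 0:
            break
        del table[pkey]               # empty directory: delete it as well
        parent = parent_of(parent)


table = {}
add_object(table, "/bn1/a_dp1/a_dp2/b")
add_object(table, "/bn1/a_dp1/mfile_dp2")
snapshot = dict(table)
delete_object(table, "/bn1/a_dp1/a_dp2/b")
```

After the two additions, /bn1/a_dp1/ has the value 2 (one directory and one object beneath it) and /bn1/a_dp1/a_dp2/ has the value 1; deleting b removes the now-empty a_dp2 directory and brings the value of /bn1/a_dp1/ back to 1.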
FIG. 2 is a structural diagram of one embodiment of the system for storing a hierarchical directory in a large-capacity table. The system includes a processing module 210 and a storage module 220, where:
The processing module 210 is configured to prefix an identifier to the first letter of the lowest-layer object or directory of a given path, marking that object or directory as the lowest layer; and to obtain the parent directory path of the object or directory and determine whether the parent directory path is the first layer, and if it is not, to prefix an identifier to the first letter of the directory of the parent directory path, marking the parent directory path as the lowest layer.
The storage module 220 is configured to store objects or directories.
The identifier may be a leading character, for example the minimum character value (Character.MIN_VALUE). Character.MIN_VALUE is a character that cannot be printed or displayed, so for readability it is written as Δ here. Because the system sorts in character order by default, prefixing the leading character to the first letter of an object or directory name makes the objects and directories of the same layer sort first and be stored contiguously. Those skilled in the art will understand that Δ is only an example; any other symbol may be used to represent it, and this should not be construed as limiting the invention.
Take the object /bucketname1/a_depth1/z_depth2/z as an example: bucketname1 is the first layer, a_depth1 the second layer, z_depth2 the third layer, and z the lowest layer. That is, the path of an object or directory reads from left to right as first layer/second layer/third layer/fourth layer/……/n-th layer/lowest layer. The computed output for the object /bucketname1/a_depth1/z_depth2/z is:
/bucketname1/a_depth1/z_depth2/Δz
/bucketname1/a_depth1/Δz_depth2
/bucketname1/Δa_depth1
Because the first layer is the bucket (bucketname) layer and not an object, no identifier is added to the first layer.
The invention stores objects and directories of the same layer together, for example:
/bn1/Δa_dp1/
……
/bn1/a_dp1/Δa_dp2/
/bn1/a_dp1/Δab_dp2/
/bn1/a_dp1/Δmfile_dp2
/bn1/a_dp1/Δz_dp2/
……
/bn1/a_dp1/a_dp2/Δb
/bn1/a_dp1/a_dp2/Δc/
/bn1/a_dp1/ab_dp2/Δc/
……
/bn1/a_dp1/z_dp2/Δz
Entries ending with "/" are directories; entries without a trailing "/" are objects.
The system also includes a record module 230, configured to save the start record and end record of each data block and the server on which the data block is stored, where a data block contains object or directory records of the large-capacity table. The record module 230 may be a metadata table. If a data block is 128 MB, for instance, the metadata table records the server holding each data block together with the block's start record and end record, so a lookup in the metadata table quickly locates the object or directory being queried. The metadata table may be kept on a single server or spread across several servers.
For example, to retrieve the next-level objects or directories under /bn1/a_dp1/, the start and end records of the data blocks in the metadata table are first used to find the data blocks corresponding to the directory or object being retrieved, the servers storing those data blocks are then found, and those servers are queried.
If the servers hold 100 billion records and the third-layer directory has 1 million records, the prior art needs 100 billion I/O operations, whereas the invention needs 1 million.
In addition, the prior art performs string truncation, matching, deduplication, and other operations on 100 billion object records, whereas with the invention each object record only needs its leading character removed.
In this embodiment, an identifier is prefixed to the first letter of the lowest-layer object or directory of a given path, marking that object or directory as the lowest layer; the parent directory path of the object or directory is then obtained and it is determined whether the parent directory path is the first layer; if it is not, an identifier is prefixed to the first letter of the directory of the parent directory path, marking the parent directory path as the lowest layer. Objects and directories of the same layer are thereby stored together, which improves retrieval efficiency.
In another embodiment of the invention, the system also includes a maintenance module 240, configured to use the object or directory as the key of the large-capacity table, which is an object directory hierarchical storage table, and the number of objects in the layer directly below a directory as the value of that directory. When an object is added, the value of the object's parent directory is incremented; if the parent directory does not exist, the directory is created first and its value is set to 1. When an object is deleted, the object's key-value record is deleted first and the value of the object's parent directory is then decremented; if the value is 0 after the decrement, the directory is deleted.
In this embodiment, under the directory-layer index, deleting an object from a directory also deletes the directory if it becomes empty. When objects are added or deleted, the object counts of the directories obtained by decomposing the path layer by layer are incremented or decremented, and a directory is deleted when its object count reaches 0. This provides add, delete, and maintenance management of directory files while improving retrieval performance and storage capacity.
下面以一个具体实施例对本发明做进一步说明。The present invention will be further described below with a specific embodiment.
如图3所示,首先在10台配置为2路intel E5-2640v2、128GB内存、4T硬盘空间的服务器上部署分布式数据库,然后在其上创建对象目录分层存储表。As shown in Figure 3, first deploy a distributed database on 10 servers configured as 2-way intel E5-2640v2, 128GB memory, and 4T hard disk space, and then create an object directory hierarchical storage table on it.
当云存储用户上传对象时,先对路径最低层对象或目录的首字母前加前导符进行转义,然后获取所述对象或目录的父目录路径,并判断所述父目录路径是否为第一层,若不是,则在所述父目录路径的目录的首字母前加前导符进行转义,并存储所述父目录路径的目录,将同一层的对象和目录集中存储。When a cloud storage user uploads an object, first escape the leading character before the first letter of the object or directory at the lowest level of the path, then obtain the path of the parent directory of the object or directory, and determine whether the path of the parent directory is the first layer, if not, add a leading character before the first letter of the directory of the parent directory path to escape, and store the directory of the parent directory path, and centrally store the objects and directories of the same layer.
在添加对象时,该对象的上层目录对应的value递增,如果该对象的上层目录不存在,则先创建目录,并将所述目录对应的value设置为1,在删除对象时,先删除该对象的键值对记录,并将所述对象的上层目录的value递减,如果该对象的上层目录的value递减后为0,则将所述目录删除。When adding an object, the value corresponding to the upper-level directory of the object is incremented. If the upper-level directory of the object does not exist, create the directory first, and set the value corresponding to the directory to 1. When deleting the object, delete the object first The key-value pair record of the object, and decrement the value of the upper directory of the object, if the value of the upper directory of the object is decremented to 0, then delete the directory.
A metadata table records the state of the object directory hierarchical storage table, for example, which server, and which data block on that server, stores each object or directory in the table. When a directory or object is retrieved, the data block corresponding to it is first located using the start record and end record of each data block in the metadata table. The server that stores that data block is then identified, and the search is performed on that server.
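The lookup through the metadata table could look roughly like the sketch below. The record layout `BlockMeta`, the function `locate`, and the server names are assumptions made for illustration; the source only states that each data block has a start record and an end record.

```python
from bisect import bisect_right
from typing import NamedTuple, Optional


class BlockMeta(NamedTuple):
    start_key: str  # first record stored in the data block
    end_key: str    # last record stored in the data block
    server: str     # hypothetical identifier of the server holding the block


def locate(meta: list[BlockMeta], key: str) -> Optional[BlockMeta]:
    """Find the data block whose [start_key, end_key] range contains key.

    Assumes meta is sorted by start_key and that block ranges do not overlap.
    """
    starts = [m.start_key for m in meta]
    i = bisect_right(starts, key) - 1  # last block starting at or before key
    if i >= 0 and key <= meta[i].end_key:
        return meta[i]
    return None
```

Because the metadata entries are sorted, the block is found with a binary search, and only the server named in the matching entry needs to be queried.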
In the prior art, retrieving a directory or object requires a full scan of the servers' storage, together with multiple operations on the retrieved directories or objects, such as substring extraction, matching, and deduplication.
In an embodiment of the present invention, compared with a traditional database, only the data of the directory or object layer being queried needs to be traversed, and the only processing required on each directory or object record is removing the leading character. As a result, the computing and storage capacity of the present invention is scaled out horizontally by a factor of 10, further horizontal expansion is straightforward, and retrieval efficiency is improved by 5 orders of magnitude compared with the default storage method of the large-capacity table.
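To illustrate why only one layer has to be traversed, the sketch below lists the direct children of a directory from a sorted list of escaped keys. The marker character `MARKER`, the key layout, and the function `list_children` are assumptions for illustration, not the patented implementation.

```python
from bisect import bisect_left

MARKER = "\x01"  # hypothetical leading character; the source does not name one


def list_children(sorted_keys: list[str], directory: str) -> list[str]:
    """Return the names stored directly under directory.

    Assumes sorted_keys is sorted and that each key has the form
    "<parent path>/<MARKER><name>".
    """
    prefix = f"{directory.rstrip('/')}/{MARKER}"
    i = bisect_left(sorted_keys, prefix)  # jump to the first key of this layer
    names = []
    while i < len(sorted_keys) and sorted_keys[i].startswith(prefix):
        names.append(sorted_keys[i][len(prefix):])  # strip path and marker
        i += 1
    return names
```

The scan stops at the first key that does not share the prefix, so records from deeper layers are never touched, and the only per-record work is removing the prefix that ends with the leading character.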
The present invention has now been described in detail. Some details well known in the art have not been described, so as not to obscure the concept of the present invention. From the above description, those skilled in the art can fully understand how to implement the technical solutions disclosed herein.
The method and apparatus of the present invention may be implemented in many ways, for example, in software, hardware, firmware, or any combination of software, hardware, and firmware. The above order of the method steps is for illustration only; the steps of the method of the present invention are not limited to the order specifically described above unless otherwise specifically stated. In addition, in some embodiments, the present invention may also be implemented as programs recorded on a recording medium, the programs including machine-readable instructions for implementing the method according to the present invention. Thus, the present invention also covers a recording medium storing a program for executing the method according to the present invention.
Although some specific embodiments of the present invention have been described in detail by way of example, those skilled in the art should understand that the above examples are for illustration only and are not intended to limit the scope of the present invention. Those skilled in the art should also understand that the above embodiments may be modified without departing from the scope and spirit of the present invention. The scope of the present invention is defined by the appended claims.
Claims (8)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410827977.0A CN105786916B (en) | 2014-12-26 | 2014-12-26 | Storage method and system for a hierarchical directory based on a large-capacity table |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105786916A CN105786916A (en) | 2016-07-20 |
CN105786916B CN105786916B (en) | 2019-11-12 |
Family
ID=56388918
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410827977.0A Active CN105786916B (en) | 2014-12-26 | 2014-12-26 | Storage method and system for a hierarchical directory based on a large-capacity table |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105786916B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109144991B (en) * | 2017-06-15 | 2021-09-14 | 北京京东尚科信息技术有限公司 | Method and device for dynamic sub-metering, electronic equipment and computer-storable medium |
CN110797082A (en) * | 2019-10-24 | 2020-02-14 | 福建和瑞基因科技有限公司 | Method and system for storing and reading gene sequencing data |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101226552A (en) * | 2008-02-01 | 2008-07-23 | 北京乾坤化物数字技术有限公司 | Method for management of magnanimity information using directory composed of multidimensional structure tree |
CN103870588A (en) * | 2014-03-27 | 2014-06-18 | 杭州朗和科技有限公司 | Method and device used in database |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4945475A (en) * | 1986-10-30 | 1990-07-31 | Apple Computer, Inc. | Hierarchical file system to provide cataloging and retrieval of data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right | Effective date of registration: 20220128. Address after: 100007 Room 205-32, Floor 2, Building 2, No. 1 and No. 3, Qinglong Hutong A, Dongcheng District, Beijing. Patentee after: Tianyiyun Technology Co.,Ltd. Address before: No. 31, Financial Street, Xicheng District, Beijing, 100033. Patentee before: CHINA TELECOM Corp.,Ltd. |