CN108319634A - The directory access method and apparatus of distributed file system - Google Patents
- Publication number
- CN108319634A (CN201711347711.6A)
- Authority
- CN
- China
- Prior art keywords
- directory
- client
- memory cache
- file
- specified
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/13—File access structures, e.g. distributed indices
- G06F16/18—File system types
- G06F16/182—Distributed file systems
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Information Transfer Between Computers (AREA)
Abstract
The present invention provides a directory access method and apparatus for a distributed file system. The method includes: step 10: the client generates, in response to a user operation, an access request for a specified directory of the distributed file system; step 11: determine whether a first memory cache of the specified directory exists on the client, and if it exists, perform step 12; step 12: load, on the client, the directory entries in the first memory cache of the specified directory; step 13: further determine whether a local directory file exists on the client, the local directory file being used to store directory entries of the specified directory, and if it exists, perform step 14; step 14: load, on the client, the directory entries in the local directory file of the specified directory. The method and apparatus speed up access both to directories holding a small number of subdirectories or files and to directories holding a million or more, improving the user experience on the client.
Description
Technical field
The present invention relates to the field of computers, and in particular to a directory access method and apparatus for a distributed file system.
Background
At present, large-scale distributed file systems can provide petabyte-scale or even exabyte-scale data storage, so storage capacity is no longer the bottleneck of storage technology. In theory, a distributed file system can hold a practically unlimited number of files.
On the other hand, if a directory in a distributed file system contains millions of files or subdirectories, the client's memory cannot cache them all when the client accesses that directory over the network. The client therefore often runs short of memory or exhausts it: at best access slows down, and at worst the client crashes, harming the user experience.
No effective solution to this problem has yet been proposed.
Summary of the invention
In view of this, the present invention provides a directory access method and apparatus for a distributed file system, which solve the problem of slow access or client crashes when a client accesses a large number of files or directories.
The present invention provides a directory access method for a distributed file system, the method comprising:
Step 10: the client generates, in response to a user operation, an access request for a specified directory of the distributed file system;
Step 11: determine whether a first memory cache of the specified directory exists on the client; if it exists, perform step 12;
Step 12: load, on the client, the directory entries in the first memory cache of the specified directory;
Step 13: further determine whether a local directory file exists on the client, the local directory file being used to store directory entries of the specified directory; if it exists, perform step 14;
Step 14: load, on the client, the directory entries in the local directory file of the specified directory.
The present invention also provides a directory access apparatus for a distributed file system, the apparatus comprising:
User request generation module: the client generates, in response to a user operation, an access request for a specified directory of the distributed file system;
Comparison module: determines whether a first memory cache of the specified directory exists on the client; if it exists, the cache loading module is executed;
Cache loading module: loads, on the client, the directory entries in the first memory cache of the specified directory;
Comparison module 1: further determines whether a local directory file exists on the client, the local directory file being used to store directory entries of the specified directory; if it exists, the file loading module is executed;
File loading module: loads, on the client, the directory entries in the local directory file of the specified directory.
In the present invention, the directory entries of each specified directory are stored in two parts: the entries up to a first preset number of directory entries (for example, 100,000) are cached in the first memory cache of the specified directory on the client, and the entries beyond the first preset number are stored in a file on the client's local disk. When a user accesses a directory on the distributed file system server through the client, loading the corresponding entries from the cache and/or from local storage speeds up access to the files and subdirectories in that directory. In addition, the client of this application reserves only a bounded part of client memory, namely the first memory cache, for the directory entries of the specified directory downloaded from the distributed file system server. This does not affect the overall operation of the client system and avoids the memory shortage or exhaustion otherwise caused by accessing a large number of files or directories.
The directory access method and apparatus of the present invention speed up access to directories with few files as well as access to and queries on directories with a million or more files, improving the user experience on the client.
Brief description of the drawings
Fig. 1 shows a first embodiment of the directory access method for a distributed file system of the present invention;
Fig. 2 shows a second embodiment of the directory access method for a distributed file system of the present invention;
Fig. 3 shows a third embodiment of the directory access method for a distributed file system of the present invention;
Fig. 4 shows a fourth embodiment of the directory access method for a distributed file system of the present invention;
Fig. 5 is a structural diagram of the directory access apparatus for a distributed file system of the present invention.
Detailed description
In order to make the objectives, technical solutions and advantages of the present invention clearer, the invention is described in detail below with reference to the accompanying drawings and specific embodiments.
In the present invention, the directory entries that the client obtains when accessing the distributed file system server are stored on the client in two parts: the entries up to a first preset number of directory entries (for example, 100,000) are cached in the first memory cache of the specified directory on the client, and the entries beyond the first preset number are stored in a file on the client's local disk.
On this basis, the directory access method for a distributed file system of the present invention is proposed. As shown in Fig. 1, the method includes the following steps:
Step 10 (S101): the client generates, in response to a user operation, an access request for a specified directory of the distributed file system.
In step 10, the user operation may be a mouse click on the specified directory.
Step 11 (S102): determine whether a first memory cache of the specified directory exists on the client; if it exists, perform step 12;
Step 12 (S103): load, on the client, the directory entries in the first memory cache of the specified directory;
Step 13 (S104): further determine whether a local directory file exists on the client, the local directory file being used to store directory entries of the specified directory; if it exists, perform step 14;
Step 14 (S105): load, on the client, the directory entries in the local directory file of the specified directory.
In the method of Fig. 1, the local directory file stores directory entries of the specified directory. To make it easy to find, the local directory file can be kept in a preset storage location, such as a particular folder. Each local directory file holds the directory entries of one specified directory only, and different specified directories have different local directory files. Further, to make a given local directory file easy to locate, its file name can be derived from link information of the specified directory, for example the full path of the specified directory or the inode (index node) of the specified directory.
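The lookup in steps 11 to 14 can be sketched as follows. This is a minimal illustration, not the patented implementation: `FIRST_CACHE`, `LOCAL_DIR`, `local_file_for` and `load_directory` are hypothetical names, the local directory file is assumed to hold one entry name per line, and the file name is assumed to be the percent-encoded full path of the specified directory (an inode number would serve equally well).

```python
import os
import tempfile
import urllib.parse

# Hypothetical in-memory table: specified directory -> list of cached entry names.
FIRST_CACHE = {}
# Hypothetical preset storage location for all local directory files.
LOCAL_DIR = tempfile.mkdtemp()


def local_file_for(directory):
    """Name the local directory file after the full path of the directory."""
    # Percent-encoding keeps the name unique and free of path separators.
    return os.path.join(LOCAL_DIR, urllib.parse.quote(directory, safe=""))


def load_directory(directory):
    """Return locally held entries, or None if the server must be contacted."""
    if directory not in FIRST_CACHE:              # step 11: no first memory cache
        return None
    entries = list(FIRST_CACHE[directory])        # step 12: load cached entries
    path = local_file_for(directory)
    if os.path.exists(path):                      # step 13: local directory file?
        with open(path, encoding="utf-8") as f:   # step 14: load its entries
            entries.extend(line.rstrip("\n") for line in f)
    return entries
```

Returning `None` when the first memory cache is missing leaves the caller free to fall back to a request to the server.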
Optionally, step 12 further includes: subdividing the directory entries of the specified directory in the first memory cache into X directory groups, the client loading the entries of the X directory groups one group after another.
Optionally, step 14 further includes: subdividing the directory entries contained in the local directory file into Y directory groups, the client loading the entries of the Y directory groups one group after another.
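Loading in groups can be sketched with a small generator. The function name `iter_groups` and the choice of equal-sized contiguous groups are assumptions made for illustration; the text does not say how the X or Y groups are formed.

```python
def iter_groups(entries, group_count):
    """Yield at most group_count contiguous groups of entries, in order."""
    if group_count < 1:
        raise ValueError("group_count must be at least 1")
    # Ceiling division gives the group size; at least 1 so range() gets a valid step.
    size = max(1, -(-len(entries) // group_count))
    for start in range(0, len(entries), size):
        yield entries[start:start + size]
```

A client could iterate over `iter_groups(cached_entries, X)` and refresh its display after each group instead of after the whole listing.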
When a user accesses a directory of the distributed file system through the client, loading the corresponding entries from the cache and/or from local storage speeds up access to the files and subdirectories in that directory. At the same time, because the client of this application reserves only a bounded part of client memory, namely the first memory cache, for the directory entries downloaded from the distributed file system, the overall operation of the client system is not affected, and the memory shortage or exhaustion otherwise caused by accessing a large number of files or directories is avoided.
Further, if the first memory cache of a specified directory has not been updated within a preset time, the first memory cache of that specified directory is cleared.
Each time the user accesses a different specified directory, a first memory cache for that directory is created in memory. This "automatic memory clearing" policy prevents an ever-growing number of first memory caches from steadily consuming client memory and so keeps the client system responsive. In addition, "automatic memory clearing" ensures that the directory entries held in the first memory cache or the local directory file of Fig. 1 are recent. The client then only needs to load the entries it already holds and does not have to contact the distributed file system server for the latest directory entry information, which saves network traffic and improves access speed.
The preset time can be set according to experience, user requirements or the client's system memory, for example 30 seconds or 1 minute.
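The expiry rule can be sketched as a table of first memory caches that drops any cache not updated within the preset time. The class name `FirstCacheTable`, its methods, and the injectable `clock` parameter are assumptions for illustration only; the 30-second default is one of the example values given in the text.

```python
import time


class FirstCacheTable:
    """Per-directory first memory caches that expire when not updated."""

    def __init__(self, ttl_seconds=30.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock           # injectable so the rule can be tested
        self._caches = {}             # directory -> (entries, last update time)

    def put(self, directory, entries):
        """Create or update the cache of a directory and refresh its timestamp."""
        self._caches[directory] = (list(entries), self._clock())

    def get(self, directory):
        """Return the cached entries, or None if absent or already cleared."""
        self.evict_stale()
        item = self._caches.get(directory)
        return None if item is None else item[0]

    def evict_stale(self):
        """Clear every cache that has not been updated within the preset time."""
        now = self._clock()
        stale = [d for d, (_, updated) in self._caches.items()
                 if now - updated >= self._ttl]
        for directory in stale:
            del self._caches[directory]
```

Here eviction runs lazily on each lookup; a background timer would be an equally plausible design.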
Further, as shown in Fig. 2, step 11 of Fig. 1 also includes: if no first memory cache of the specified directory exists, perform step 21;
Step 21 (S201): the client sends the access request for the specified directory to the server of the distributed file system;
Step 22 (S202): the server of the distributed file system reads the directory entries of the specified directory according to the access request, subdivides them into N directory groups and sends the groups to the client;
Step 23 (S203): determine whether the total number of directory entries currently in the second memory cache of the specified directory is greater than or equal to the first preset number of directory entries; if so, perform step 24; if not, perform step 25;
Step 24 (S204): the client writes the directory entries of the received n-th directory group (n = 1, 2, ..., N) to its local directory file, loads (displays) the entries of the n-th directory group, and returns to step 23 until all N directory groups have been received;
Step 25 (S205): the client appends the directory entries of the received n-th directory group to the end of the second memory cache of the specified directory, loads (displays) the entries of the n-th directory group, and returns to step 23 until all N directory groups have been received;
Step 26 (S206): after the directory entries in the second memory cache of the specified directory have been saved to the first memory cache of the specified directory, clear the second memory cache.
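The receive loop of steps 23 to 26 can be sketched as below. The function `receive_groups` and its parameters are hypothetical, and a plain list stands in for the on-disk local directory file so that the sketch stays self-contained.

```python
def receive_groups(groups, first_preset, on_display=None):
    """Return (first_cache, local_file_entries) after all groups are received."""
    second_cache = []          # background cache filled while receiving
    local_file_entries = []    # stand-in for the local directory file
    for group in groups:
        if len(second_cache) >= first_preset:    # step 23: cache already full?
            local_file_entries.extend(group)     # step 24: spill to local file
        else:
            second_cache.extend(group)           # step 25: append to second cache
        if on_display is not None:
            on_display(group)                    # show each group as it arrives
    first_cache = list(second_cache)             # step 26: move to first cache
    second_cache.clear()
    return first_cache, local_file_entries
```

Because the check in step 23 happens before a whole group is appended, the second memory cache can end up holding more entries than the first preset number, as the first element of the result shows when `first_preset` is 3 and the groups are `[[1, 2], [3, 4], [5, 6]]`.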
Further, because the second memory cache may hold more than the first preset number of directory entries once all entries have been received, step 26 may, as shown in Fig. 3, also include the following steps:
Step 27 (S207): determine whether the total number of directory entries currently in the second memory cache of the specified directory is greater than the first preset number of directory entries; if so, perform step 28; if not, perform step 29;
Step 28 (S208): save the first entries of the second memory cache of the specified directory, up to the first preset number, to the first memory cache of the specified directory; write the remaining entries of the second memory cache to the local directory file of the specified directory; then clear the second memory cache;
Step 29 (S209): after the directory entries in the second memory cache of the specified directory have been saved to the first memory cache of the specified directory, clear the second memory cache.
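Steps 27 to 29 amount to splitting the second memory cache at the first preset number. A minimal sketch, with the hypothetical function `flush_second_cache` returning the overflow that a caller would then write to the local directory file:

```python
def flush_second_cache(second_cache, first_preset):
    """Return (first_cache, overflow) and empty second_cache in place."""
    if len(second_cache) > first_preset:            # step 27
        first_cache = second_cache[:first_preset]   # step 28: keep the first part
        overflow = second_cache[first_preset:]      # step 28: rest goes to disk
    else:
        first_cache = list(second_cache)            # step 29: everything fits
        overflow = []
    second_cache.clear()                            # steps 28 and 29: clear
    return first_cache, overflow
```

The text does not state how the overflow is ordered relative to entries that step 24 already wrote to the local directory file, so the sketch leaves that decision to the caller.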
Further, in step 24 of Fig. 2, the client writing the directory entries of the received n-th directory group to its local directory file includes: determining whether a local directory file exists. If not, a new local directory file is created and the entries of the n-th directory group are saved in it. If a local directory file exists, it is further determined whether the file was last written during a previous update or during the current update: if during a previous update, the original content of the local directory file is replaced with the entries of the n-th directory group; if during the current update, the entries of the n-th directory group are appended to the local directory file.
Whether the local directory file was last written during a previous update or during the current update can be decided by comparing the file's update timestamp with the current time: if the two are very close, for example less than 1 minute apart, the file belongs to the current update; otherwise it belongs to a previous update.
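The create, replace or append decision can be sketched using the file's modification time as the update timestamp. The function `update_local_file`, the one-entry-per-line file layout and the 60-second default window (taken from the "less than 1 minute" example) are assumptions for illustration.

```python
import os
import time


def update_local_file(path, group, now=None, same_update_window=60.0):
    """Write one directory group to the local directory file; return the mode used."""
    now = time.time() if now is None else now
    if not os.path.exists(path):
        mode = "w"    # no local directory file yet: create it
    elif now - os.path.getmtime(path) < same_update_window:
        mode = "a"    # written moments ago: same update, so append
    else:
        mode = "w"    # left over from a previous update: replace its content
    with open(path, mode, encoding="utf-8") as f:
        f.writelines(name + "\n" for name in group)
    return mode
```

A timestamp comparison is only a heuristic: two separate listings of the same directory started within the window would be treated as one update, so an explicit update identifier may be more robust in practice.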
In the method of Fig. 2, the server may choose the value of N according to the total number of directory entries in the specified directory. For example, if the total is less than 20, N may be set to 1; if the total is greater than 1000, N may be set to 100.
Fig. 2 covers the case in which no first memory cache of the specified directory exists when the user accesses the distributed file system through the client. This means that the user is probably accessing the specified directory for the first time, or that the first memory cache from an earlier access has been cleared automatically. In that case the directory entries of the specified directory can only be obtained by contacting the distributed file system server.
In the method of Fig. 2, the distributed file system divides the directory entries it reads into N directory groups and sends them to the client one group at a time. Each time the client receives a group, it first saves that group's entries in the second memory cache; once reception is complete, it transfers the entries in the second memory cache to the first memory cache. The second memory cache may be a background cache and, correspondingly, the first memory cache a foreground cache. In this application, the number of directory entries in the first and second memory caches is capped at the first preset number of directory entries, which may be the maximum number of entries the first memory cache can hold, for example 100,000.
Because the method of Fig. 2 divides the directory entries into N groups, the client can refresh the displayed entries while it is still receiving. It need not wait for the complete listing before showing anything to the user, which improves the user experience.
In the above methods, whether the client reads directory entries locally (from the first memory cache or the local directory file) or the distributed file system server reads them, the entries can be read in batches by constructing an iterator.
For example, the number of directory entries the iterator reads at a time is set to a fixed value, which can be changed as required; assume it is 128. After each batch of 128 entries, the iterator records the position (offset, the entry's sequence number) and the name (name, the entry's name) of the entry it has currently reached in the cache container. The next read of 128 entries uses offset or name to position the iterator directly, which avoids traversing the whole container again and speeds up reading.
Because offset and name are saved after every batch of 128 entries, the iterator can be repositioned from them even if a network fault interrupts reading, so no entries are skipped or misread. Likewise, even if the last directory entry has since been deleted, offset and name still locate the iterator position, so iteration continues to return accurate data.
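A resumable batch read along these lines can be sketched as follows. The sketch assumes that the container is a list kept sorted by entry name, that the cursor is the `(offset, name)` pair of the last entry returned, and that "the last directory entry" refers to that last entry read; `read_batch` and `BATCH_SIZE` are hypothetical names.

```python
import bisect

BATCH_SIZE = 128  # batch size used as the example in the text; adjustable


def read_batch(sorted_names, cursor=None, batch_size=BATCH_SIZE):
    """Return (batch, new_cursor); cursor is (offset, name) of the last entry read."""
    if cursor is None:
        start = 0
    else:
        offset, name = cursor
        if offset < len(sorted_names) and sorted_names[offset] == name:
            start = offset + 1    # fast path: the saved offset is still valid
        else:
            # The container changed (for example, the entry was deleted):
            # fall back to the name and resume after it in sorted order.
            start = bisect.bisect_right(sorted_names, name)
    batch = sorted_names[start:start + batch_size]
    if batch:
        cursor = (start + len(batch) - 1, batch[-1])
    return batch, cursor
```

The offset gives constant-time repositioning in the common case, and the name acts as a fallback that costs only a binary search when the offset no longer matches.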
As shown in Fig. 4, the directory access method for a distributed file system of this application further includes:
Step 30 (S301): when the client creates a new file in the distributed file system, add the directory entry of the new file to the first memory cache of the new file's parent directory;
Step 31 (S302): sort the directory entries in the first memory cache of the new file's parent directory, using the same sorting method as the distributed file system server uses for its directory entries;
Step 32 (S303): determine whether the number of directory entries in the first memory cache of the new file's parent directory is greater than the sum of the first preset number of directory entries and a second preset number of directory entries; if so, perform step 33;
Step 33 (S304): delete the excess directory entries from the first memory cache of the new file's parent directory; the excess entries are those in that cache beyond the first preset number of directory entries.
In step 31 of Fig. 4, the client and the distributed file system use the same sorting algorithm so that the order in which the client reads its directory entries matches the order in which the distributed file system reads them, and the displayed content is therefore consistent.
In step 32 of Fig. 4, the second preset number of directory entries avoids having to perform step 33 every time a file is added. Suppose the first preset number is 100,000 and the second preset number is 280: step 33 is performed only once the first memory cache of the specified directory holds more than 100,280 entries. Batching the trimming in this way reduces resource usage and improves efficiency.
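Steps 30 to 33 can be sketched as a sorted insert followed by a batched trim. The function `add_entry` is hypothetical, and plain string ordering is assumed to match the server's sort order, which the text does not specify.

```python
import bisect


def add_entry(first_cache, name, first_preset, second_preset):
    """Insert name into the sorted first_cache; return the entries trimmed off."""
    bisect.insort(first_cache, name)                      # steps 30 and 31
    trimmed = []
    if len(first_cache) > first_preset + second_preset:   # step 32
        trimmed = first_cache[first_preset:]              # step 33: excess entries
        del first_cache[first_preset:]
    return trimmed
```

With the example values from the text (100,000 and 280), the trim runs once per 281 insertions into a full cache rather than on every insertion.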
In addition, the directory access method for a distributed file system of this application further includes:
Step 40: when the client deletes a file from the distributed file system, determine whether the directory entry of the deleted file is held in the first memory cache of the deleted file's parent directory on the client; if so, delete that directory entry from the first memory cache of the parent directory.
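Step 40 reduces to a guarded removal from the parent directory's first memory cache. A minimal sketch, with `remove_entry` and the dictionary layout (parent directory mapped to a list of entry names) assumed for illustration:

```python
def remove_entry(first_caches, parent_dir, name):
    """Remove name from the parent's first memory cache; return True if removed."""
    entries = first_caches.get(parent_dir)
    if entries is None or name not in entries:
        return False      # the entry is not cached on the client: nothing to do
    entries.remove(name)
    return True
```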
The directory access method and apparatus of the present invention speed up access to directories with few files as well as access to and queries on directories with a million or more files, improving the user experience on the client.
The present invention also provides a directory access apparatus for a distributed file system. As shown in Fig. 5, the apparatus includes the following modules:
User request generation module: the client generates, in response to a user operation, an access request for a specified directory of the distributed file system;
Comparison module: determines whether a first memory cache of the specified directory exists on the client; if it exists, the cache loading module is executed;
Cache loading module: loads, on the client, the directory entries in the first memory cache of the specified directory;
Comparison module 1: further determines whether a local directory file exists on the client, the local directory file being used to store directory entries of the specified directory; if it exists, the file loading module is executed;
File loading module: loads, on the client, the directory entries in the local directory file of the specified directory.
Optionally, if the first memory cache of a specified directory has not been updated within a preset time, the first memory cache of that specified directory is cleared.
Optionally, the comparison module in Fig. 5 also includes: if no first memory cache of the specified directory exists, the user request sending module is executed;
User request sending module: the client sends the access request for the specified directory to the server of the distributed file system;
Server directory module: the server of the distributed file system reads the directory entries of the specified directory according to the access request, subdivides them into N directory groups and sends the groups to the client;
Comparison module 2: determines whether the number of directory entries currently in the second memory cache of the specified directory is greater than or equal to the first preset number of directory entries; if so, the local directory file update module is executed; if not, the second cache update module is executed;
Local directory file update module: the client writes the directory entries of the received n-th directory group (n = 1, 2, ..., N) to its local directory file, loads the entries of the n-th directory group, and returns to comparison module 2 until all N directory groups have been received;
Second cache update module: the client appends the directory entries of the received n-th directory group to the end of the second memory cache of the specified directory, loads the entries of the n-th directory group, and returns to comparison module 2 until all N directory groups have been received;
First cache update module: after the directory entries in the second memory cache of the specified directory have been saved to the first memory cache of the specified directory, clears the second memory cache.
Optionally, the first cache update module includes comparison module 3, first cache update module 1, and first cache update module 2.
Comparison module 3: determines whether the total number of directory entries in the second memory cache of the specified directory is greater than the first preset number of directory entries; if so, first cache update module 1 is executed; if not, first cache update module 2 is executed.
First cache update module 1: saves the leading entries of the second memory cache of the specified directory, up to the first preset number of directory entries, to the first memory cache of the specified directory; writes the remaining entries of the second memory cache to the local directory file of the specified directory; and clears the second memory cache.
First cache update module 2: saves the directory entries in the second memory cache of the specified directory to the first memory cache of the specified directory, then clears the second memory cache.
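The optional split performed by comparison module 3 and the two first cache update modules might look like the sketch below. The function name `flush_second_cache` and its parameters are hypothetical, and the local directory file is again represented by a plain list.

```python
def flush_second_cache(second_cache, local_file, first_preset):
    """Move the second memory cache into a new first memory cache.

    Returns the new first cache; entries beyond `first_preset` spill
    into `local_file`. The second cache is emptied in either case.
    """
    # Comparison module 3: does the second cache exceed the threshold?
    if len(second_cache) > first_preset:
        # First cache update module 1: keep the leading entries in memory,
        # write the rest to the local directory file.
        first_cache = second_cache[:first_preset]
        local_file.extend(second_cache[first_preset:])
    else:
        # First cache update module 2: everything fits in the first cache.
        first_cache = list(second_cache)
    second_cache.clear()
    return first_cache
```

This keeps the first memory cache bounded by the threshold even when the second cache overshot it while groups were being received.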
Optionally, the apparatus of Figure 5 further includes:
New file module: when the client creates a new file in the distributed file system, the directory entry of the new file is added to the first memory cache of the new file's parent directory;
Directory entry sorting module: sorts the directory entries in the first memory cache of the new file's parent directory, using the same sorting method that the server of the distributed file system applies to directory entries;
Comparison module 4: determines whether the number of directory entries in the first memory cache of the new file's parent directory is greater than the sum of the first preset number of directory entries and the second preset number of directory entries; if so, step 43 is performed;
Deletion module 1: deletes the excess directory entries from the first memory cache of the new file's parent directory, the excess directory entries being those in that cache beyond the first preset number of directory entries.
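As a rough illustration of the new-file path, the sketch below adds an entry, re-sorts the cache, and trims it once it grows past the combined threshold. The function name, the parameter names, and the default lexicographic ordering are assumptions; the source only requires that the client's ordering match the server's, which a caller could supply through `sort_key`.

```python
def add_new_entry(first_cache, name, first_preset, second_preset, sort_key=None):
    """Insert a new file's directory entry into the parent's first memory cache."""
    # New file module: add the entry to the parent directory's first cache.
    first_cache.append(name)
    # Directory entry sorting module: must mirror the server's ordering
    # (plain lexicographic order is assumed here when sort_key is None).
    first_cache.sort(key=sort_key)
    # Comparison module 4: trim only once the cache exceeds both presets combined.
    if len(first_cache) > first_preset + second_preset:
        # Deletion module 1: drop every entry beyond the first preset.
        del first_cache[first_preset:]
```

Under this reading, the second preset acts as slack: the cache may grow by that many entries before it is cut back to the first preset in one pass, instead of being trimmed on every file creation.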
Optionally, the apparatus of Figure 5 further includes:
Deletion module 2: when the client deletes a file in the distributed file system, it determines whether the directory entry of the deleted file is held in the client's first memory cache for the deleted file's parent directory; if so, that directory entry is removed from the first memory cache of the parent directory.
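The delete path is small enough to sketch in a few lines. The function name `remove_deleted_entry` is hypothetical, and the first memory cache is modelled as a list of entry names.

```python
def remove_deleted_entry(first_cache, name):
    """Drop a deleted file's entry from the parent's first memory cache.

    Returns True if the entry was cached and has been removed,
    False if it was not in the cache (nothing to do).
    """
    if name in first_cache:
        first_cache.remove(name)
        return True
    return False
```

Entries that were never cached in memory are left alone in this sketch; the module description does not say how entries held only in the local directory file are handled on deletion.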
Optionally, in the directory access apparatus of the distributed file system of the present invention, the local directory file is named after the inode of the specified directory.
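One possible way to realize inode-based naming on the client is sketched below. The cache root directory, the one-name-per-line file format, and the function names are all assumptions made for illustration; the source specifies only that the inode of the directory serves as the file name.

```python
from pathlib import Path


def local_dir_file(cache_root, dir_inode):
    """Path of the local directory file: the directory's inode is the file name."""
    return Path(cache_root) / str(dir_inode)


def append_entries(cache_root, dir_inode, names):
    """Append directory entry names to the local directory file, one per line."""
    path = local_dir_file(cache_root, dir_inode)
    with path.open("a", encoding="utf-8") as f:
        for name in names:
            f.write(name + "\n")
    return path
```

Keying the file by inode rather than by path means the lookup needs no path escaping, and the file name would stay valid if the directory were renamed, assuming the file system keeps the inode stable across renames.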
It should be noted that the embodiments of the directory access apparatus of the distributed file system of the present invention follow the same principle as the embodiments of the directory access method of the distributed file system, and the related parts may be read with reference to each other.
The above are merely preferred embodiments of the present invention and are not intended to limit its scope. Any modification, equivalent replacement, or improvement made within the spirit and principles of the technical solution of the present invention shall fall within the protection scope of the present invention.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711347711.6A CN108319634B (en) | 2017-12-15 | 2017-12-15 | Directory access method and device for distributed file system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711347711.6A CN108319634B (en) | 2017-12-15 | 2017-12-15 | Directory access method and device for distributed file system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108319634A (en) | 2018-07-24 |
CN108319634B (en) | 2021-08-06 |
Family
ID=62892003
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711347711.6A (Active), CN108319634B (en) | 2017-12-15 | 2017-12-15 | Directory access method and device for distributed file system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108319634B (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110287201A (en) * | 2019-07-02 | 2019-09-27 | 重庆紫光华山智安科技有限公司 | Data access method, device, equipment and storage medium |
CN110781137A (en) * | 2019-10-28 | 2020-02-11 | 柏科数据技术(深圳)股份有限公司 | Directory reading method and device for distributed system, server and storage medium |
CN110781159A (en) * | 2019-10-28 | 2020-02-11 | 柏科数据技术(深圳)股份有限公司 | Ceph directory file information reading method and device, server and storage medium |
CN114048185A (en) * | 2021-11-18 | 2022-02-15 | 北京聚存科技有限公司 | Method for transparently packaging, storing and accessing massive small files in distributed file system |
Citations (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101697168A (en) * | 2009-10-22 | 2010-04-21 | 中国科学技术大学 | Method and system for dynamically managing metadata of distributed file system |
CN102024020A (en) * | 2010-11-04 | 2011-04-20 | 曙光信息产业(北京)有限公司 | Efficient metadata memory access method in distributed file system |
CN102024019A (en) * | 2010-11-04 | 2011-04-20 | 曙光信息产业(北京)有限公司 | Suffix tree based catalog organizing method in distributed file system |
CN102024017A (en) * | 2010-11-04 | 2011-04-20 | 天津曙光计算机产业有限公司 | Method for traversing directory entries of distribution type file system in repetition-free and omission-free way |
CN102523301A (en) * | 2011-12-26 | 2012-06-27 | 深圳市创新科信息技术有限公司 | Method for caching data on client in cloud storage |
CN102541985A (en) * | 2011-10-25 | 2012-07-04 | 曙光信息产业(北京)有限公司 | Organization method of client directory cache in distributed file system |
CN102955808A (en) * | 2011-08-26 | 2013-03-06 | 腾讯科技(深圳)有限公司 | Data acquisition method and distributed file system |
CN103150394A (en) * | 2013-03-25 | 2013-06-12 | 中国人民解放军国防科学技术大学 | Distributed file system metadata management method facing to high-performance calculation |
CN103338242A (en) * | 2013-06-20 | 2013-10-02 | 华中科技大学 | Hybrid cloud storage system and method based on multi-level cache |
US20140214889A1 (en) * | 2013-01-30 | 2014-07-31 | Hewlett-Packard Development Company, L.P. | Anticipatorily Retrieving Information In Response To A Query Of A Directory |
US8805901B1 (en) * | 2011-07-19 | 2014-08-12 | Google Inc. | Geographically distributed file system |
CN104008152A (en) * | 2014-05-21 | 2014-08-27 | 华南理工大学 | Distributed file system architectural method supporting mass data access |
CN105095785A (en) * | 2014-05-22 | 2015-11-25 | 中兴通讯股份有限公司 | File access processing method, and file access method and device of distributed file system |
CN106686113A (en) * | 2017-01-19 | 2017-05-17 | 郑州云海信息技术有限公司 | A Distributed File System Intelligent Pre-reading Implementation Method |
Non-Patent Citations (2)
Title |
---|
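XIUQIAO LI et al.: "CEFLS: A Cost-Effective File Lookup Service in a Distributed Metadata File System", 《2012 12TH IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND GRID COMPUTING (CCGRID 2012)》 * |
冯幼乐 (Feng Youle): "Research and Implementation of Metadata Management Technology for Distributed File Systems" (分布式文件系统元数据管理技术研究与实现), China Master's Theses Full-text Database, Information Science and Technology Series * |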
Also Published As
Publication number | Publication date |
---|---|
CN108319634B (en) | 2021-08-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103150394B (en) | Distributed file system metadata management method facing to high-performance calculation | |
US10831736B2 (en) | Fast multi-tier indexing supporting dynamic update | |
US20200183906A1 (en) | Using an lsm tree file structure for the on-disk format of an object storage platform | |
CN104111804B (en) | A kind of distributed file system | |
CN110321325A (en) | File inode lookup method, terminal, server, system and storage medium | |
US20110191544A1 (en) | Data Storage and Access | |
CN107357896A (en) | Expansion method, device, system and the data base cluster system of data-base cluster | |
US20200081867A1 (en) | Independent evictions from datastore accelerator fleet nodes | |
US20160335243A1 (en) | Webpage template generating method and server | |
CN109947668A (en) | The method and apparatus of storing data | |
CN108319634A (en) | The directory access method and apparatus of distributed file system | |
US11599503B2 (en) | Path name cache for notifications of file changes | |
US20120317339A1 (en) | System and method for caching data in memory and on disk | |
CN109376125A (en) | Metadata storage method, apparatus, device and computer-readable storage medium | |
CN110347651A (en) | Method of data synchronization, device, equipment and storage medium based on cloud storage | |
CN102819586A (en) | Uniform Resource Locator (URL) classifying method and equipment based on cache | |
CN106326239A (en) | Distributed file system and file meta-information management method thereof | |
US20240028466A1 (en) | Storing Namespace Metadata in a Key Value Store to Facilitate Space Efficient Point In Time Snapshots | |
CN111198856A (en) | File management method and device, computer equipment and storage medium | |
US10812322B2 (en) | Systems and methods for real time streaming | |
JP7038864B2 (en) | Search server centralized storage | |
CN107368608A (en) | The HDFS small documents buffer memory management methods of algorithm are replaced based on ARC | |
CN107181773A (en) | Data storage and data managing method, the equipment of distributed memory system | |
EP4124970A1 (en) | Using a caching layer for key-value storage in a database | |
WO2020215580A1 (en) | Distributed global data deduplication method and device |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
CB02 | Change of applicant information | Address after: 518057 Shenzhen Software Park, No. 9, 501, 502, Science and Technology Middle Road, Nanshan District, Shenzhen City, Guangdong Province. Applicant after: Shenzhen Innovation Technology Co.,Ltd. Address before: 518057 Shenzhen Software Park, No. 9, 501, 502, Science and Technology Middle Road, Nanshan District, Shenzhen City, Guangdong Province. Applicant before: UITSTOR (USA) Inc. |
GR01 | Patent grant | |
PP01 | Preservation of patent right | Effective date of registration: 20241115. Granted publication date: 20210806. |