CN101419626B - An Application-Oriented File System Namespace Management Method

Info

Publication number: CN101419626B
Application number: CN2008102366002A
Authority: CN
Inventors: 冯丹; 施展; 朱春霖; 李志超; 赵恒�; 李勇; 邓聪林
Original assignee: Huazhong University of Science and Technology
Current assignee: Huazhong University of Science and Technology
Priority date: 2008-11-28
Filing date: 2008-11-28
Publication date: 2010-06-09
Anticipated expiration: 2028-11-28
Also published as: CN101419626A

Abstract

The invention relates to an application-oriented file system namespace management method in the technical field of computer storage. It addresses the problem that the metadata extensions in existing file systems do not include the application semantics of files, with the aim of shortening metadata access time and improving application access efficiency. The method comprises three steps: extracting application semantics, clipping the namespace, and accessing the clipped namespace. By extracting the application semantics that relate each application process to the files it accesses, the method clips the file system namespace down to a per-process namespace, so that the application's metadata access time is shortened and its access efficiency is improved. The method is suited to file systems whose namespaces are becoming increasingly complex.

Description

An Application-Oriented File System Namespace Management Method

Technical Field

The invention belongs to the technical field of computer storage, and in particular relates to an application-oriented file system namespace management method.

Background Art

With the growth of the personal computer market and the steady increase in the capacity of storage devices, the volume of personal data keeps growing. The file system is the foundation for managing user data, and its namespace is becoming more and more complex, which steadily degrades file system metadata access performance.

In a computer system, a file consists of two parts: metadata and data. Metadata describes attributes of the file such as its length and type. The file name is one item of the metadata and is used to identify the file. A file system is a collection of files, and the namespace of a file system is the set of names of the files it contains.

The namespace accessed by a process of an application program is the set of names of the files that the process accesses.

File system metadata plays an important role in semantic file systems, where much semantic information is expressed in the form of metadata. Current file systems do not give metadata enough attention, so they cannot make full use of it for content search, let alone provide advanced semantics-based associative data access. Many semantic file systems today extend the file attributes held in metadata to include several types of semantic information, and use this extended metadata to improve file retrieval efficiency.

The metadata extensions in current semantic file systems fall roughly into the following categories:

1. Running state information. Many areas need running state information, for example applications and compilers. Applications need short-term and long-term storage of state information such as license information, user information, passwords, DNS or SMTP servers, and ORB information. Compilers also produce information about a program's data types.

2. Data models. Database management systems, the Windows registry, and interface repositories all maintain information about data structures and schemas, such as data types, indexes, constraints, relationships, and interfaces.

3. Multimedia. Auditory and visual information such as pictures, video, and music cannot easily be converted into a form that is useful for queries. Metadata is therefore needed to support access to such content. This information is usually entered manually after the data has been classified by keyword, or entered after an image processing program has extracted characteristic patterns.

Although many current file systems, such as Microsoft's WinFS and Apple's Spotlight, provide semantic information to some degree, the semantic information they contain falls far short of what a semantic file system built on a widely used, open data model would offer. Most file systems use a hierarchical directory structure, which suits only file systems with small or medium-sized namespaces. In a file system holding data at large scale, classifying and retrieving files becomes very difficult. Users urgently need more effective ways to organize and retrieve files.

Summary of the Invention

The invention proposes an application-oriented file system namespace management method. It solves the problem that the metadata extensions in existing file systems do not include the application semantics of files, so as to shorten metadata access time and improve the efficiency of application access.

The application-oriented file system namespace management method of the invention comprises the following steps:

(1) Application semantics extraction step: run each application program in advance and track the file accesses made by the processes of each application program; add to each file an item of record metadata that stores the process names of the application programs that have accessed the file; then extract the application semantics from all of the record metadata, where each piece of application semantics describes the relationship between one process and the files it accesses; and save each piece of application semantics to a database in the form of an XML file, to await invocation;

(2) Namespace clipping step: when this step is invoked, retrieve from the database the application semantics that correspond to the process name of the application program, clip out the namespace associated with that process according to the application semantics, keep it resident in memory in the form of a DOM tree, and then return;

(3) Clipped namespace access step: run the application program for real; when a process of the application program accesses a file, the file system redirects the access request it receives to the clipped namespace associated with that process; if the redirection succeeds, the access request is completed; otherwise the namespace clipping step is invoked and the access request is then issued to the file system again.

The application-oriented file system namespace management method is further characterized in that:

(1) the application semantics extraction step performs the following procedures in sequence:

(1.1) start each application program and, whenever a file is accessed, proceed to the next procedure;

(1.2) add the name of the accessing process to the record metadata of the accessed file F; the format of the record metadata is A:B:C:…, where F is the name of the accessed file and A, B, and C are the process names of the application programs that access this file;

(1.3) after all application programs have finished, extract the application semantics from the record metadata of all accessed files; the format is A=F1:F2:F3:…, where A is a process name and F1, F2, and F3 are the names of the files it accessed;

(1.4) write the extracted application semantics of the processes of each application program as XML files, save them to the database, and end;

(2) the namespace clipping step performs the following procedures in sequence:

(2.1) when this step is invoked, obtain the ID parameter of the application process;

(2.2) use the ID parameter of the process to access its data structure and obtain the process name A;

(2.3) read from the database the XML file that contains the application semantics of process name A;

(2.4) parse the XML file that was read into a DOM tree, add to the DOM tree the metadata corresponding to every file name contained in the application semantics of process name A, and keep the DOM tree resident in memory, which yields the clipped namespace associated with A; end;

(3) the clipped namespace access step performs the following procedures:

(3.1) when process A of the application program accesses a file, it sends an access request to the file system; proceed to the next procedure;

(3.2) the file system obtains the ID parameter of process A from the request packet and redirects the access request to the clipped namespace associated with process A; proceed to the next procedure;

(3.3) the redirection operation checks whether a DOM tree corresponding to the ID parameter of process A exists in memory; if it exists, go to procedure (3.5); if not, proceed to the next procedure;

(3.4) invoke the namespace clipping step to obtain the clipped namespace associated with process A, then go to procedure (3.2);

(3.5) complete the file access request within the clipped namespace associated with process A, and let the application program continue;

(3.6) when the application program finishes and exits, remove from memory the clipped namespaces associated with all of the processes it comprises; end.
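The two string formats named in procedures (1.2) and (1.3) can be illustrated with a minimal Python sketch. The function names are hypothetical, and two behaviors are assumptions that the text does not state: process and file names are taken to contain neither ":" nor "=", and a process name is recorded only once per file.

```python
from typing import List, Tuple


def parse_record_metadata(record: str) -> List[str]:
    """Split a record-metadata string such as "A:B:C" into process names."""
    return [name for name in record.split(":") if name]


def add_accessor(record: str, process_name: str) -> str:
    """Return the record metadata with process_name appended.

    Assumption: a process that accesses the same file again is not
    recorded a second time.
    """
    names = parse_record_metadata(record)
    if process_name not in names:
        names.append(process_name)
    return ":".join(names)


def format_semantics(process_name: str, file_names: List[str]) -> str:
    """Build one piece of application semantics, e.g. "A=F1:F2:F3"."""
    return f"{process_name}={':'.join(file_names)}"


def parse_semantics(line: str) -> Tuple[str, List[str]]:
    """Split "A=F1:F2:F3" into the process name and its file names."""
    process_name, _, files = line.partition("=")
    return process_name, [f for f in files.split(":") if f]
```

The record metadata is keyed by file and lists processes, while the application semantics is keyed by process and lists files, so the extraction in procedure (1.3) amounts to inverting one mapping into the other.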

The invention extracts the association between the processes of an application program and the files they access (that is, the application semantics of the files) and saves each piece of application semantics to a database as an XML file, which avoids repeating the extraction after a system restart. It uses the application semantics of these files to clip out the namespace associated with each process of the application program, taking full account and advantage of the large difference between the namespace of the whole system and the namespace of an application process. This shortens the time an application process spends accessing metadata and improves metadata access performance, and it suits file systems whose namespaces are becoming increasingly complex.

Brief Description of the Drawings

Fig. 1 is a schematic flowchart of the invention;

Fig. 2 is a schematic flowchart of the application semantics extraction step of the invention;

Fig. 3 is a schematic flowchart of the namespace clipping step of the invention;

Fig. 4 is a schematic flowchart of the clipped namespace access step of the invention.

Detailed Description

Assume that an application program consists of two processes, named PA and PB, and that it accesses five files in total: File1, File2, File3, File4, and File5.

Fig. 2 shows the application semantics extraction step:

(1) start the application program and, whenever a file is accessed, proceed to the next procedure;

(2) add the name of the accessing process to the record metadata of the accessed file;

(3) after all application programs have finished, extract the application semantics from the record metadata of all accessed files; the application semantics are PA=File1:File2:File3 and PB=File4:File5;

(4) write the extracted application semantics of the processes of each application program as XML files and save them to the database;

(5) end.
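The extraction procedure above can be sketched in Python for the PA/PB example. This is only an illustration under stated assumptions: the XML element names (`process`, `file`), the table name `app_semantics`, and the use of SQLite as the database are all hypothetical, since the text specifies neither an XML schema nor a database product. The `RECORD_METADATA` dictionary stands in for the record metadata that would be stored with each file.

```python
import sqlite3
import xml.etree.ElementTree as ET
from collections import defaultdict

# Record metadata after the tracking run: file name -> "A:B:C" string.
# The values mirror the PA/PB example; the dict is a stand-in for
# per-file metadata kept by the file system.
RECORD_METADATA = {
    "File1": "PA", "File2": "PA", "File3": "PA",
    "File4": "PB", "File5": "PB",
}


def extract_semantics(record_metadata):
    """Invert file -> processes into process -> files (procedure 3)."""
    semantics = defaultdict(list)
    for file_name, record in record_metadata.items():
        for process_name in record.split(":"):
            if process_name:
                semantics[process_name].append(file_name)
    return dict(semantics)


def semantics_to_xml(process_name, file_names):
    """Serialize one piece of application semantics (hypothetical schema)."""
    root = ET.Element("process", name=process_name)
    for file_name in file_names:
        ET.SubElement(root, "file", name=file_name)
    return ET.tostring(root, encoding="unicode")


def save_semantics(conn, semantics):
    """Store one XML document per process name (procedure 4)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS app_semantics "
        "(process TEXT PRIMARY KEY, xml TEXT)"
    )
    for process_name, file_names in semantics.items():
        conn.execute(
            "INSERT OR REPLACE INTO app_semantics VALUES (?, ?)",
            (process_name, semantics_to_xml(process_name, file_names)),
        )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # a file path would persist across restarts
    save_semantics(conn, extract_semantics(RECORD_METADATA))
    for process, xml_text in conn.execute("SELECT * FROM app_semantics"):
        print(process, xml_text)
```

Keeping the XML in a persistent database rather than in memory is what lets the extraction run once in advance and be reused after a restart.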

Fig. 3 shows the namespace clipping step:

(1) when this step is invoked, obtain the ID parameter of the application process;

(2) use the ID parameter of the process to access its data structure and obtain the process name PA;

(3) read from the database the XML file that contains the application semantics of process name PA;

(4) parse the XML file that was read into a DOM tree, add to the DOM tree the metadata corresponding to File1, File2, and File3, which are contained in the application semantics of process name PA, and keep the DOM tree resident in memory, which yields the clipped namespace associated with PA;

(5) end.
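A minimal Python sketch of the clipping procedure follows, using the standard library's `xml.dom.minidom` as the DOM implementation. Everything outside the procedure itself is an assumption made for illustration: the process ID 101, the three in-memory dictionaries standing in for the process table, the semantics database, and the file system's metadata, the XML element names, and the `length` and `type` attributes.

```python
from xml.dom import minidom

# Hypothetical stand-ins for system state.
PROCESS_TABLE = {101: "PA"}  # process ID -> process name
SEMANTICS_DB = {             # process name -> stored XML file content
    "PA": '<process name="PA"><file name="File1"/>'
          '<file name="File2"/><file name="File3"/></process>',
}
FILE_METADATA = {            # file name -> metadata attributes
    "File1": {"length": "1024", "type": "text"},
    "File2": {"length": "2048", "type": "text"},
    "File3": {"length": "512", "type": "binary"},
}

# Clipped namespaces resident in memory: process ID -> DOM document.
CLIPPED_NAMESPACES = {}


def clip_namespace(pid):
    """Build and cache the clipped namespace for the process with this ID."""
    process_name = PROCESS_TABLE[pid]                      # procedures (1)-(2)
    dom = minidom.parseString(SEMANTICS_DB[process_name])  # procedures (3)-(4)
    for node in dom.getElementsByTagName("file"):
        # Attach each file's metadata to its node in the DOM tree.
        metadata = FILE_METADATA[node.getAttribute("name")]
        for key, value in metadata.items():
            node.setAttribute(key, value)
    CLIPPED_NAMESPACES[pid] = dom  # keep the tree resident in memory
    return dom


if __name__ == "__main__":
    print(clip_namespace(101).documentElement.toxml())
```

The cache is keyed by process ID here because the access procedure looks the DOM tree up by the ID parameter of the process.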

Fig. 4 shows the clipped namespace access procedure:

(1) when process PA of the application program accesses the file File1, it sends an access request to the file system; proceed to the next procedure;

(2) the file system obtains the ID parameter of process PA from the request packet and redirects the access request to the clipped namespace associated with process PA; proceed to the next procedure;

(3) the redirection operation checks whether a DOM tree corresponding to the ID parameter of process PA exists in memory; if it exists, go to procedure (5); if not, proceed to the next procedure;

(4) invoke the namespace clipping step to obtain the clipped namespace associated with process PA, then go to procedure (2);

(5) complete the access request for the file File1 within the clipped namespace associated with process PA, and let the application program continue;

(6) when the application program finishes and exits, remove from memory the clipped namespace associated with its process PA;

(7) end.
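The control flow of the access procedure can be sketched as a small Python class. This is a simplified model rather than a file system implementation: the class and method names are hypothetical, a plain dictionary stands in for the DOM tree, and returning `None` for a file that is absent from the clipped namespace is an assumption, because the text does not say how such a request is handled.

```python
class ClippedNamespaceFS:
    """Toy model of redirecting access requests to per-process namespaces."""

    def __init__(self, clip):
        self._clip = clip    # callable: process ID -> {file name: metadata}
        self._cache = {}     # process ID -> clipped namespace held in memory
        self.clip_calls = 0  # counts invocations of the clipping step

    def access(self, pid, file_name):
        """Serve one access request from the process with this ID."""
        if pid not in self._cache:              # procedure (3): no tree in memory
            self._cache[pid] = self._clip(pid)  # procedure (4): clip, then retry
            self.clip_calls += 1
        # Procedure (5): complete the request inside the clipped namespace.
        return self._cache[pid].get(file_name)

    def release(self, pids):
        """Procedure (6): drop the namespaces of an exiting application."""
        for pid in pids:
            self._cache.pop(pid, None)

    def cached_pids(self):
        return sorted(self._cache)


if __name__ == "__main__":
    fs = ClippedNamespaceFS(lambda pid: {"File1": {"length": 1024}})
    print(fs.access(101, "File1"))  # first request triggers clipping
    print(fs.access(101, "File1"))  # second request is served from memory
    fs.release([101])
```

Only the first request from a process pays for reading and parsing the XML; later requests search a namespace that holds just that process's files, which is where the shorter metadata access time described in the text would come from.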

Claims

1. An application-oriented file system namespace management method, comprising the following steps:

(1) an application semantics extraction step: running each application program in advance, tracking the file accesses made by the processes of each application program, and adding to each file an item of record metadata that stores the process names of the application programs that have accessed the file; then extracting application semantics from all of the record metadata, each piece of application semantics describing the relationship between one process and the files it accesses, and saving each piece of application semantics to a database in the form of an XML file, to await invocation;

(2) a namespace clipping step, which performs the following procedures in sequence:

(2.1) when this step is invoked, obtaining the ID parameter of the application process;

(2.2) using the ID parameter of the process to access its data structure and obtain the process name A;

(2.3) reading from the database the XML file that contains the application semantics of process name A;

(2.4) parsing the XML file that was read into a DOM tree, adding to the DOM tree the metadata corresponding to every file name contained in the application semantics of process name A, and keeping the DOM tree resident in memory, thereby obtaining the clipped namespace associated with A, and then returning;

(3) a clipped namespace access step, which performs the following procedures:

(3.1) when process A of the application program accesses a file, sending an access request to the file system, and proceeding to the next procedure;

(3.2) the file system obtaining the ID parameter of process A from the request packet and redirecting the access request to the clipped namespace associated with process A, and proceeding to the next procedure;

(3.3) the redirection operation checking whether a DOM tree corresponding to the ID parameter of process A exists in memory; if it exists, going to procedure (3.5); if not, proceeding to the next procedure;

(3.4) invoking the namespace clipping step to obtain the clipped namespace associated with process A, then going to procedure (3.2);

(3.5) completing the file access request within the clipped namespace associated with process A, and letting the application program continue;

(3.6) when the application program finishes and exits, removing from memory the clipped namespaces associated with all of the processes it comprises, and ending.

2. The application-oriented file system namespace management method according to claim 1, characterized in that:

(1) the application semantics extraction step performs the following procedures in sequence:

(1.1) starting each application program and, whenever a file is accessed, proceeding to the next procedure;

(1.2) adding the name of the accessing process to the record metadata of the accessed file F, the format of the record metadata being A:B:C:…, where F is the name of the accessed file and A, B, and C are the process names of the application programs that access this file;

(1.3) after all application programs have finished, extracting the application semantics from the record metadata of all accessed files, the format being A=F1:F2:F3:…, where A is a process name and F1, F2, and F3 are the names of the files it accessed;

(1.4) writing the extracted application semantics of the processes of each application program as XML files, saving them to the database, and ending.