[go: up one dir, main page]

CN117992409B - Distributed file system renaming operation method based on read-write lock ring detection mechanism - Google Patents

Distributed file system renaming operation method based on read-write lock ring detection mechanism Download PDF

Info

Publication number
CN117992409B
CN117992409B CN202410175368.5A CN202410175368A CN117992409B CN 117992409 B CN117992409 B CN 117992409B CN 202410175368 A CN202410175368 A CN 202410175368A CN 117992409 B CN117992409 B CN 117992409B
Authority
CN
China
Prior art keywords
node
read
renaming
common ancestor
write
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410175368.5A
Other languages
Chinese (zh)
Other versions
CN117992409A (en
Inventor
韩森
李�诚
李嘉豪
王一多
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Science and Technology of China USTC
Original Assignee
University of Science and Technology of China USTC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Science and Technology of China USTC filed Critical University of Science and Technology of China USTC
Priority to CN202410175368.5A priority Critical patent/CN117992409B/en
Publication of CN117992409A publication Critical patent/CN117992409A/en
Application granted granted Critical
Publication of CN117992409B publication Critical patent/CN117992409B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/16File or folder operations, e.g. details of user interfaces specifically adapted to file systems
    • G06F16/164File meta data generation
    • G06F16/166File name conversion
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/17Details of further file system functions
    • G06F16/176Support for shared access to files; File sharing support
    • G06F16/1767Concurrency control, e.g. optimistic or pessimistic approaches
    • G06F16/1774Locking methods, e.g. locking methods for file systems allowing shared and concurrent access to files
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/18File system types
    • G06F16/182Distributed file systems
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/52Program synchronisation; Mutual exclusion, e.g. by means of semaphores
    • G06F9/524Deadlock detection or avoidance

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a renaming operation method of a distributed file system based on a read-write lock ring detection mechanism, which is characterized in that a central node uniformly processes ring detection tasks of all renaming operations, and after receiving the renaming operation request, the central node performs ring detection in a read-write lock mode; the invention uses a lightweight distributed file system renaming operation ring detection mechanism, can effectively avoid the occurrence of orphan ring data loss, and effectively improves the concurrency and throughput of renaming operation through a locking scheme mechanism with finer granularity.

Description

Distributed file system renaming operation method based on read-write lock ring detection mechanism
Technical Field
The invention relates to the technical field of computer software, in particular to a renaming operation method of a distributed file system based on a read-write lock ring detection mechanism.
Background
With the rapid development of the internet, the industry generates huge business demands such as big data and training big models, the related business data volume is rapidly increased, and the storage capacity on a single machine is far from meeting the rapid development of the current business data volume. The industry has thus shifted the goal of increasing storage capacity from increasing storage capacity on a single machine to increasing the number of machines to store the amount of data required for a service using multiple machines, which is distributed storage, such as a distributed file system, a distributed database, and so on. In a single-machine storage system, all functional operations are not complex, because all read-write operations are performed on a single machine and the read-write consistency problem of other nodes is not involved, in a distributed operation environment, a plurality of copies are required to be stored on a plurality of machines due to a fault tolerance mechanism, so that the consistency problem of data in a production environment of the plurality of copies is caused, meanwhile, as the number of users increases, the storage system is required to expand capacity on storage capacity, and the parallel read-write requests of multiple clients are required to meet the performance requirements of high throughput and low delay.
In the distributed file system, an upper layer provides an interface for reading and writing files for a developer, and the processing of multiple copies and transaction consistency of a bottom layer is packaged in the bottom layer, so that the maintenance cost of the developer can be facilitated. Among the read-write interfaces provided by the file system for the upper layer, renaming operation (rename operation) is more complex, and includes path analysis on files or directories, locking of modified data and modification on data, so that the renaming operation needs to be packaged as a transaction at the bottom layer to ensure atomicity of the renaming operation, and meanwhile, under the requirement of concurrent reading and writing of a plurality of clients, the concurrency of the renaming operation needs to be ensured as much as possible. One challenge in implementing renaming operations is to meet both the atomicity of renaming operations at the time of operation and the concurrency of multiple threads as much as possible to meet high throughput and low latency performance requirements when designing a locking scheme for renaming operations.
Another requirement is that renaming operations not only need to satisfy atomicity of transactions, but also prevent the problem of orphan ring, and simply describe the problem, that is, when two or more clients operate on the directory tree of the file directory at the same time, ring formation between directories may occur, that is, starting from one directory, the parent directory is continuously searched upwards, and finally the directory itself is found and found. If this occurs, the parsing path of the distributed file system cannot find all the data in the directory and the subdirectories under the directory, which may cause data loss and seriously affect the data security of the user.
In the prior art papers, it is generally considered that the renaming operation occupies a relatively small operation in the whole distributed file system, so that the performance requirement of the renaming operation is often ignored when the performance of the distributed file system is considered, but in the latest big data workload in the industry, the renaming operation burst problem occurs in the distributed file system, and the phenomena of higher peak value, long duration and the like of the renaming operation request may occur in a period of time.
In the renaming operation locking mechanism used in the industry at present, there are four main processing methods:
(1) The earliest distributed file system was traditionally done by adding a global common lock to all renaming operations in the distributed file system, such as the Linux kernel, HDFS, lustre, etc. distributed file systems, in which case the renaming operations in the distributed file system are completely non-concurrent.
(2) The source directory of the renaming operation and all directories and files of the subtree under the directory are locked, in this way, the formation of orphan rings can be effectively prevented, and the renaming operation without father-son directory relation can be concurrent, but because the locking method locks all the subtrees, the locking cost rises with the rising of the size of the subtree, for example HopsFS data, the locking method needs the subsecond-level cost, and LocoFS also uses the locking method to ensure the data security of the renaming operation in the concurrent read-write process.
(3) The current industry can finish renaming operation ring detection, and a central node can be added to finish ring detection task exclusively. All renaming operation requests must pass through the same central node to detect the concurrency feasibility between renaming operations, but because the throughput of the central node is limited, and the concurrency degree of the renaming operations is still low due to locking, for example, the central node of a hundred-degree distributed file system CFS can obtain the smallest common ancestor node of a source node and a destination node, and then lock the smallest common ancestor node-source node and the smallest common ancestor node-destination node on two paths, so that most renaming operations under the smallest common ancestor node and its offspring nodes cannot be concurrent during locking in a coarse-grained locking manner. CephFS use the metadata server MDS to detect looping also adds to the performance overhead. Some distributed file systems may introduce broadcasting that further increases performance overhead, such as InfiniFS.
(4) Some distributed file systems completely forgo the central checking of renaming operations, such as JuiceFS, and while concurrency is greatly increased, loss of data is likely to occur when dangerous renaming concurrency operations occur.
In view of this, the present invention has been made.
Disclosure of Invention
The invention aims to provide a renaming operation method of a distributed file system based on a read-write lock ring detection mechanism, which improves the concurrency of renaming operation and the overall throughput of a cluster by using a lightweight locking method under the condition of ensuring the safety and the correctness of data.
The invention aims at realizing the following technical scheme:
a distributed file system renaming operation method based on a read-write lock ring detection mechanism comprises the following steps:
The distributed file system sends all received renaming operation requests to a central node;
After receiving a renaming operation request, the center node carries out ring detection in a read-write lock mode, and uses access metadata stored by the center node, determines the smallest public ancestor node of a renaming operation source node and a destination node through path analysis, and creates a write lock on the smallest public ancestor node;
and sending a renaming transaction request to a node of the lower metadata storage engine storing directory entries, executing a renaming transaction process, and releasing all read-write locks added in the previous locking process after the renaming transaction is executed.
According to the technical scheme provided by the invention, the lightweight distributed file system renaming operation ring detection mechanism is used, so that the occurrence of orphan ring data loss can be effectively avoided, and the concurrency and throughput of renaming operation are effectively improved through a locking scheme mechanism with finer granularity.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of a renaming operation method of a distributed file system based on a read-write lock ring detection mechanism according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of an orphan ring according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of an overall storage architecture of a distributed file system according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of metadata decoupling into content metadata and access metadata according to an embodiment of the present invention;
fig. 5 is a schematic diagram of a memory lock structure based on HashMap according to an embodiment of the present invention;
FIG. 6 is a diagram of the loop detection correctness of the smallest common ancestor node provided by an embodiment of the present invention;
Fig. 7 is a schematic diagram of concurrent execution of rename transactions provided in an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to fall within the scope of the invention.
The terms that may be used herein will first be described as follows:
The terms "comprises," "comprising," "includes," "including," "has," "having" or other similar referents are to be construed to cover a non-exclusive inclusion. For example, inclusion of a feature (e.g., a starting material, component, ingredient, carrier, dosage form, material, size, part, component, mechanism, apparatus, step, procedure, method, reaction condition, processing condition, parameter, algorithm, signal, data, product or article of manufacture, etc.) should be construed as including not only the feature explicitly recited, but also other features known in the art that are not explicitly recited.
The term "consisting of" means excluding any technical feature elements not explicitly listed. If such term is used in a claim, the term will cause the claim to be closed, such that it does not include technical features other than those specifically listed, except for conventional impurities associated therewith. If the term is intended to appear in only a clause of a claim, it is intended to limit only the elements explicitly recited in that clause, and the elements recited in other clauses are not excluded from the overall claim.
The distributed file system renaming operation method based on the read-write lock ring detection mechanism provided by the invention is described in detail below. What is not described in detail in the embodiments of the present invention belongs to the prior art known to those skilled in the art. The specific conditions are not noted in the examples of the present invention and are carried out according to the conditions conventional in the art or suggested by the manufacturer.
As shown in FIG. 1, a distributed file system renaming operation method based on a read-write lock ring detection mechanism mainly comprises the following steps:
and step 1, the distributed file system sends all received renaming operation requests to a central node.
In the embodiment of the invention, the distributed file system analyzes all renaming operation requests received from clients (clients) and sends the renaming operation requests to the central node of the metadata storage engine, so that all renaming operations can be subjected to ring detection on the same central node.
And step2, after receiving the renaming operation request, the central node performs ring detection in a read-write lock mode.
In the embodiment of the invention, the central node of the metadata storage engine uses a lightweight lock mechanism to perform ring detection on renaming operation requests and avoid generation of orphan rings.
(1) The central node utilizes the access metadata stored by the central node to determine the smallest common ancestor node for renaming the source node and the destination node through path analysis.
In the embodiment of the invention, the access metadata of the whole distributed file system is stored in the memory of the central node, wherein the access metadata comprises tree structure relations and read-write authorities among directories, the distributed file system needs to carry out path analysis on renamed parameters in the renaming process, the authority check on a user who executes renaming operation is needed in the path analysis process, if a certain directory does not have the read-write authorities of the current user, the renaming operation is terminated, so the read-write authorities of the directory need to be stored in the central node, and the request of the access and modification of the related directory of the distributed file system needs to be executed through the central node. The central node divides the parameters in the renaming operation request into directory name lists according to the levels, then matches the directory names of all levels to find out the smallest public ancestor node, and if the directory names of the first level nodes are not the same, the smallest public ancestor node is set as the root node.
(2) And resolving the parameters in the renaming operation request to the smallest common ancestor node, and writing a lock on the smallest common ancestor node in the memory so as to prevent possible deadlock in the locking process.
In the embodiment of the present invention, the parameters in the renaming operation request are paths of the file, for example,/A/B and/A/C in the renaming operation request rename (/ A/B,/A/C) are paths of the file, and are also parameters of the renaming operation.
In the embodiment of the invention, the memory of the central node stores access metadata of a distributed file system, namely inode numbers of a father directory (namely numbers for indexing the directory and file metadata nodes in the distributed file system), directory names and access authorities of the directory, the central node accesses own memory without using network communication, accesses the minimum public ancestor node downwards from the root node according to a recursive sequence, if the minimum public ancestor node is found to be not present or locked, the operation of error reporting and renaming is abandoned, if the minimum public ancestor node is successfully resolved, a mapping between a hash table record directory node and a read-write lock is created in the memory, and a new write lock (namely an exclusive lock) is created and stored in the hash table.
(3) And sequentially creating read locks for other nodes except the minimum common ancestor node from the series of nodes from the minimum common ancestor node to the destination node according to the sequence from top to bottom, namely, the minimum common ancestor node is not locked in the process, and the process is used for detecting whether the ancestor node of the destination node has source nodes participating in other renaming operations or not.
In the embodiment of the invention, a central node sequentially inquires an inode number (a metadata index node number) and a read-write authority of each node in a node sequence of a destination path (i.e. a path from the minimum common ancestor node to the destination node), establishes a read lock (i.e. a shared lock) for the corresponding node according to a hash table of the inode number (the metadata index node number) in a memory, if the corresponding node does not exist, indicates that the corresponding node is changed, releases the node on the minimum common ancestor at the moment, releases all the added read locks, gives up renaming operation, replies renaming operation failure information to an upper layer, and if a write lock (i.e. an exclusive lock) exists on the corresponding node in the hash table, blocks the release of the write lock waiting for the corresponding node, and under the condition, the state of the write lock is kept, and other renaming operation is proved to be needed to be released by waiting for other renaming operation.
(4) A write lock is created for the source node.
In the embodiment of the invention, the central node recursively analyzes the path of the source node (the path from the minimum common ancestor node to the source node) from the memory, and when the last node of the path of the source node, namely the source node, is found, a write lock is created for the source node in the hash table in the memory.
(5) The write lock of the smallest common ancestor node is released.
The primary role of write locks on the smallest common ancestor node is to avoid deadlock caused by the simultaneous occurrence of two r renaming operations that may produce an orphan ring.
And 3, the central node sends a renaming transaction request to a persistent storage dentry node (namely a node for storing the directory entry) of the metadata storage engine at the lower layer, a renaming transaction process is executed, and after the renaming transaction is executed, all read-write locks added in the previous locking process are released.
In the embodiment of the invention, metadata are decoupled into access metadata and content metadata, the access metadata is mainly stored in a memory by a central node, the access metadata and the content metadata are mainly stored in an SSD (Solid STATE DISK or Solid STATE DRIVE, solid hard disk) by a persistent storage dentry node (namely, the node for storing directory entries), renaming transaction is executed, the access metadata of a source node and a destination node and the content metadata of respective father directories are modified in an atomization mode, and because the modification involves reading and writing of a plurality of fragments and the access metadata are simultaneously stored by the central node and other persistent nodes, the whole renaming operation is required to be executed by a transaction interface so as to ensure the atomicity of the operation.
In the embodiment of the invention, in the process of ring detection by means of reading and writing locks, all nodes created with the reading locks and the writing locks are stored in a sequence, and after renaming transaction is executed, the reading locks and the writing locks of all nodes in the sequence are released.
According to the scheme provided by the embodiment of the invention, the write lock is added on the minimum public ancestor node, the read lock is used on the destination node path, the write lock is added on the source node, and finally the write lock on the minimum public node is released. Compared with the ring detection of the renaming operation of the traditional distributed file system, the method improves the concurrency of the renaming operation and improves the correlation performance of the whole cluster under the condition of ensuring the data security of the renaming operation.
In order to clearly show the technical scheme and the technical effects provided by the invention, the distributed file system renaming operation method based on the read-write lock ring detection mechanism provided by the embodiment of the invention is described in detail.
1. The distributed file system ring detects the cause of the generation.
Files and catalogues are used for managing data storage in the distributed file system, and hierarchical namespaces are adopted, so that operation on smaller files is facilitated. However, since the metadata management manner is more complex than the block storage and the object storage, the data security of the distributed file system needs to be maintained when the operation of the distributed file system is performed.
FIG. 2 illustrates the cause of orphan ring formation, renaming operations include moving a file or directory and subtrees under that directory to another directory, when two or more renaming operations are concurrently performed in the hierarchical namespace, provided that the source node of each renaming operation is moved into a descendant node under the source node of the other renaming operation, respectively.
As shown in fig. 2, when three renaming operations of rename (/ a/F,/a/B/D/F), rename (/ a/B,/a/C/E/B), rename (/ a/C,/a/F/G/C) occur simultaneously, namely, the F node moves into the D sub-node, the B node moves into the E sub-node, and the C node moves into the G sub-node, the C sub-node moves into the G sub-node, the ring structure is formed, which results in searching from the root node, metadata in the six nodes cannot be found, the directory represented by the six nodes does not exist in the view of the client, although the user does not perform any deletion operation, the directory data still exists in the data storage engine of the distributed file system, but the six nodes and all the sub-nodes thereof are lost due to the fact that the metadata is in the tree structure.
How to prevent the occurrence of an orphan ring is critical to prevent the source directory node of other renaming operations from occurring on a path from the smallest common ancestor node of the source node and the destination node to the destination node in the renaming operation.
2. The distributed file system stores metadata schemas.
The overall storage architecture of the distributed file system is shown in fig. 3, and is divided into the following parts, wherein clients of the distributed file system upwards provide POSIX, HDFS and Object interfaces for developers, so that the services of the developers on file storage and Object storage are satisfied. The method is characterized in that the method comprises the steps of dividing the file into a metadata storage engine and a data storage engine at the bottom layer of a client of a distributed file system, and connecting the data engine with file storage or object storage at the bottom layer according to the requirement of the distributed file system. In fig. 3, ns (nasspace) refers to access metadata stored in a central node for different namespaces, shard refers to metadata shards stored by persistent storage nodes.
When a client prepares to read and write a file, a metadata request must be initiated to a metadata server first, metadata read-write permission, location information and the like are acquired, and then a data request is initiated to a data server to read and write data.
While the metadata storage engine used in the distributed file system takes the approach of a central node, the distributed file system decouples metadata into content metadata and access metadata, as in fig. 4. The access metadata comprises a parent directory inode number of the directory, a directory name of the directory, an inode of the directory itself and read-write permission of the directory itself. The content metadata includes its dentry list of subdirectories (i.e., list of directory entries), the timestamp of the directory, and the number of links to the directory. All metadata operations must go through path resolution to resolve the nodes that the interface needs to operate.
In the embodiment of the invention, the central node stores and accesses the metadata part, so that path analysis of files and catalogues is accelerated, the path analysis does not need to pass through RPC (remote procedure call) for a plurality of times, and the operation performance of the distributed file system is improved. The central node also takes on the task of renaming operation ring detection, and uses a lightweight locking method to complete ring detection of renaming operation.
The distributed file system metadata storage engine also deploys other nodes to persistently store all metadata, including access metadata and content metadata, so when the distributed file system executes renaming operation, the access metadata of the central node and related metadata in the persistent storage need to be modified at the same time, and the atomicity of POSIX needs to be provided in the upward direction of the operation, so that the atomicity of the metadata operation needs to be ensured in a transactional mode.
3. Read-write lock based on HashMap storage memory.
The renaming ring detection method of the distributed file system in the embodiment of the invention is realized based on the read-write lock in the memory, and as shown in fig. 5, the memory lock is based on the mapping between the hash map (Hash map) storage directory inode and the address of the read-write lock on the directory. When renaming operations are performed in a distributed file system, in order to ensure thread security, a read-write lock in HashMap is used to perform the process of ring detection. The inode number (metadata inode number) is a unique identifier in the distributed file system that is used to identify a file.
The read-write lock of the embodiment of the invention C++ is a multithreaded synchronization mechanism which allows multiple threads to obtain read access at the same time, but allows only one thread to monopolize write access. This lock design helps to improve concurrency performance because multiple threads can read shared data at the same time, and exclusive locks are only needed when writes are needed.
When the read-write lock is in the write-lock state, all threads attempting to lock are blocked before it is unlocked, and when the read-write lock is in the read-lock state, all threads attempting to lock in the read mode can get access, but if it is desired to lock in the write mode, the threads will be blocked.
In the embodiment of the invention, when the exclusive lock is required to be performed, std: unique_lock is used for performing exclusive lock management, and std: shared_lock is used for acquiring the shared lock.
As shown in FIG. 5, after receiving rename (/ A/B/C,/A/D/E/C) at the central node, the HashMap store A, C, D, E in memory maintains an independent read-write lock for each directory, and rename operations will also preserve the sequence of all locking nodes of the operation, with each node inode as an element in the sequence, i.e., [ A_inode, C_inode, D_inode, E_inode ], to facilitate finding the previous locking sequence in the HashMap when released. Wherein A, C performs exclusive locking, D, E performs shared locking, and the minimum common ancestor node, namely the exclusive lock on the A node, is released before the rename transaction is executed, and after the renaming operation is finished, the added locks are sequentially released according to the locking sequence saved before.
4. Correctness is detected based on the ring of the smallest common ancestor node.
Among the distributed file system central node renaming ring detection algorithms, the smallest common ancestor node can ensure the correctness of the ring detection algorithm. When two or more potentially looped renaming operations may occur, exclusive locks are added to the smallest common ancestor node, which may prevent security issues with data generated by those renaming operations that may generate an orphan loop, in a manner that the exclusive locks added to the smallest common ancestor node may prevent deadlock of multiple renaming operations that may generate an orphan loop.
As can be seen from the analysis of the reasons for generating the orphan ring, adding the shared lock on the path of the destination node can prevent a plurality of renaming operations which can generate the orphan ring from being concurrent, but when a plurality of renaming operations occur simultaneously, locking on the path of the destination node and on the source node can possibly cause deadlock, and adding the exclusive lock on the smallest public ancestor node of each renaming operation can effectively prevent deadlock from occurring.
As shown in fig. 6, it is assumed that n renaming operations that may cause orphan loops occur simultaneously (n=5 in the illustration). Based on the design of the renaming ring detection algorithm, n renaming operations will generate n minimum common ancestor nodes, and it will be demonstrated that two of the n minimum common ancestor nodes must be identical.
First, because of the n renaming operations, each generates a destination node and a source node, there are n source nodes and n destination nodes in total. And since it is known that the n renaming operations may form an orphan ring, each of the n destination nodes may have a corresponding node among the n source nodes as its ancestor node, such as G in the destination node corresponding to B in the source node in fig. 6.
N renaming operations generate n minimum public ancestor nodes, and according to the properties of the minimum public ancestor nodes, at least one source node and one destination node are respectively arranged on two sides of each minimum public ancestor node, and one destination node serving as a descendant node of each source node is arranged below each source node, so that at least one destination node exists on two sides of each minimum public ancestor node. However, according to the drawer principle, if at least one destination node exists on each of two sides of n different minimum public ancestor nodes, at least n+1 destination nodes are used as leaf nodes of the minimum public ancestor nodes, so that the original assumption is not satisfied, and two nodes are the same in n minimum public ancestor nodes corresponding to n renaming operations for generating orphan loops possibly in parallel.
Therefore, in the locking link, when n renaming operations enter the ring detection stage in parallel, the exclusive locks on n-1 minimum public ancestor nodes can be contended, and one renaming operation cannot enter the ring detection stage at the same time, so that the n renaming operations can be guaranteed not to be deadlocked, and the data correctness of the renaming operation ring detection concurrent operation is guaranteed.
5. Concurrency of the lightweight ring detection method.
The lightweight ring detection method can utilize a lightweight lock method to avoid the concurrent execution of renaming operations possibly causing orphan rings in the ring detection stage, and release the exclusive lock on the smallest common ancestor node in the renaming operation transaction execution stage, so that the concurrency of the whole distributed file system is improved.
In the embodiment of the present invention, as shown in FIG. 7, if rename (/ A/B/G,/A/C/E/J/G) and rename (/ A/C/D,/A/B/F/I/D) occur simultaneously, serial execution is only required when adding memory locks in the ring detection stage (because both renaming operations need to contend for the write lock of the minimum common ancestor A node), but because the overhead of memory locks is small, the serial execution in this stage is not a bottleneck for the performance of the entire renaming operation.
After the ring detection stage is finished, when the transaction execution layer executes renaming operation, the ring detection stage and the transaction execution layer can be executed concurrently, so that the concurrency and performance of ring detection of the renaming operation are greatly improved.
From the description of the above embodiments, it will be apparent to those skilled in the art that the above embodiments may be implemented in software, or may be implemented by means of software plus a necessary general hardware platform. With such understanding, the technical solutions of the foregoing embodiments may be embodied in a software product, where the software product may be stored in a nonvolatile storage medium (may be a CD-ROM, a U-disk, a mobile hard disk, etc.), and include several instructions for causing a computer device (may be a personal computer, a server, or a network device, etc.) to perform the methods of the embodiments of the present invention.
The foregoing is only a preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions easily contemplated by those skilled in the art within the scope of the present invention should be included in the scope of the present invention. Therefore, the protection scope of the present invention should be subject to the protection scope of the claims.

Claims (4)

1.一种基于读写锁环检测机制的分布式文件系统重命名操作方法,其特征在于,包括:1. A distributed file system renaming operation method based on a read-write lock ring detection mechanism, characterized by comprising: 分布式文件系统将接收到的所有重命名操作请求发送至中心节点;The distributed file system sends all received rename operation requests to the central node; 中心节点在接收到重命名操作请求之后,以读写锁的方式来进行环检测,包括:中心节点利用自身存储的访问元数据,通过路径解析确定重命名操作源节点和目的节点的最小公共祖先节点,并在最小公共祖先节点上创建写锁;然后,从最小公共祖先节点到目的节点的一系列节点之中,按照从上到下的顺序为除去最小公共祖先节点之外的其他节点依次创建读锁;再为源节点创建写锁,并释放最小公共祖先节点的写锁;After receiving the rename operation request, the central node performs loop detection in the form of a read-write lock, including: the central node uses the access metadata stored in itself to determine the least common ancestor node of the rename operation source node and the destination node through path resolution, and creates a write lock on the least common ancestor node; then, from the least common ancestor node to the destination node, in a series of nodes, in order from top to bottom, read locks are created for other nodes except the least common ancestor node; then, a write lock is created for the source node, and the write lock of the least common ancestor node is released; 向下层的元数据存储引擎的存储目录项的节点发送重命名事务请求,执行重命名事务过程,执行完重命名事务之后,释放之前加锁过程中添加的所有读写锁;Send a rename transaction request to the node storing the directory entry of the metadata storage engine at the lower layer, execute the rename transaction process, and after the rename transaction is completed, release all read-write locks added in the previous locking process; 所述中心节点利用自身存储的访问元数据,通过路径解析确定重命名操作源节点和目的节点的最小公共祖先节点包括:中心节点的内存中存储有分布式文件系统的访问元数据,包括目录之间的树形结构关系与读写权限;中心节点将重命名操作请求中的参数按照层级分拆成为目录名列表,然后匹配各个层级的目录名找到最小公共祖先节点;如果第一层节点的目录名不相同,则将最小公共祖先节点设为根节点;The central node determines the least common ancestor node of the source node and the destination node of the rename operation by path resolution using the access metadata stored in the central node, including: the access metadata of the distributed file system is stored in the memory of the central node, including the tree structure relationship and read and write permissions between directories; the central node splits the parameters in the rename operation request into a directory name list according to the level, and then matches the directory names of each level to find the least common ancestor node; if the directory names of the first-level nodes are different, the least common ancestor node is set as the root node; 所述从最小公共祖先节点到目的节点的一系列节点之中,按照从上到下的顺序为除去最小公共祖先节点之外的其他节点依次创建读锁包括:中心节点从最小公共祖先节点开始,在目的路径的节点序列中依次查询各个节点的inode编号与读写权限,并根据inode编号在内存中的哈希表对相应节点建立读锁,其中,inode编号为元数据索引节点编号,如果相应节点不存在,则说明相应节点已经被更改,此时释放最小公共祖先上的节点,以及释放此时添加的所有读锁,并放弃重命名操作,向上层回复重命名操作失败的信息;如果哈希表中相应节点上已经存在写锁,则在相应节点上阻塞,等待写锁的释放;其中,目的路径为从最小公共祖先节点到目的节点的路径;The step of creating read locks for other nodes except the least common ancestor node in a series of nodes from the least common ancestor node to the destination node in order from top to bottom includes: starting from the least common ancestor node, the central node sequentially queries the inode number and read/write permissions of each node in the node sequence of the destination path, and establishes a read lock for the corresponding node according to the hash table of the inode number in the memory, wherein the inode number is the metadata index node number, and if the corresponding node does not exist, it means that the corresponding node has been changed, and at this time, the node on the least common ancestor is released, and all read locks added at this time are released, and the renaming operation is abandoned, and the information that the renaming operation failed is replied to the upper layer; if a write lock already exists on the corresponding node in the hash table, the corresponding node is blocked and waits for the release of the write lock; wherein the destination path is the path from the least common ancestor node to the destination node; 所述执行重命名事务过程包括:将元数据解耦为访问元数据和内容元数据,中心节点保存访问元数据,而在存储目录项的节点中持久化存储访问元数据和内容元数据;执行重命名事务,需要原子化的修改源节点与目的节点的访问元数据和各自父目录的内容元数据,将重命名操作以事务接口执行。The process of executing a rename transaction includes: decoupling metadata into access metadata and content metadata, a central node storing access metadata, and persistently storing access metadata and content metadata in nodes storing directory entries; executing a rename transaction requires atomically modifying the access metadata of a source node and a destination node and the content metadata of their respective parent directories, and executing the rename operation with a transaction interface. 2.根据权利要求1所述的一种基于读写锁环检测机制的分布式文件系统重命名操作方法,其特征在于,所述在最小公共祖先节点上创建写锁包括:2. According to the method for renaming a distributed file system based on a read-write lock ring detection mechanism according to claim 1, wherein creating a write lock on the least common ancestor node comprises: 中心节点的内存中存储有分布式文件系统的访问元数据,中心节点在不使用网络通信的情况下访问自身的内存,按照递归的顺序从根节点向下不断访问到最小公共祖先节点,如果发现最小公共祖先节点不存在或者已经上锁,则报错并放弃重命名操作;若成功解析到最小公共祖先节点,则在内存中创建一个哈希表记录目录节点与读写锁的映射,创建新的写锁存储到哈希表中。The central node's memory stores the access metadata of the distributed file system. The central node accesses its own memory without using network communication, and continuously accesses the least common ancestor node from the root node downward in a recursive order. If the least common ancestor node is found to not exist or is locked, an error is reported and the renaming operation is abandoned. If the least common ancestor node is successfully parsed, a hash table is created in the memory to record the mapping between the directory node and the read-write lock, and a new write lock is created and stored in the hash table. 3.根据权利要求1所述的一种基于读写锁环检测机制的分布式文件系统重命名操作方法,其特征在于,所述为源节点创建写锁包括:中心节点从内存中递归解析从最小公共祖先节点到源节点的路径,在查找到源节点时,在内存中的哈希表中对该源节点创建写锁;其中,源节点的路径为从最小公共祖先节点到源节点的路径。3. According to claim 1, a distributed file system renaming operation method based on a read-write lock ring detection mechanism is characterized in that creating a write lock for the source node includes: the central node recursively resolves the path from the least common ancestor node to the source node from the memory, and when the source node is found, creates a write lock for the source node in the hash table in the memory; wherein the path of the source node is the path from the least common ancestor node to the source node. 4.根据权利要求1所述的一种基于读写锁环检测机制的分布式文件系统重命名操作方法,其特征在于,所述执行完重命名事务之后,释放之前加锁过程中添加的所有读写锁包括:4. According to the distributed file system renaming operation method based on the read-write lock ring detection mechanism of claim 1, it is characterized in that after executing the renaming transaction, releasing all read-write locks added in the previous locking process includes: 在以读写锁的方式来进行环检测的过程中,将所有创建有读锁与写锁的节点存储于一个序列中,执行重命名事务完毕后,将序列中所有节点的读锁与写锁释放。In the process of performing ring detection in the form of read-write locks, all nodes that have been created with read locks and write locks are stored in a sequence. After the rename transaction is completed, the read locks and write locks of all nodes in the sequence are released.
CN202410175368.5A 2024-02-07 2024-02-07 Distributed file system renaming operation method based on read-write lock ring detection mechanism Active CN117992409B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410175368.5A CN117992409B (en) 2024-02-07 2024-02-07 Distributed file system renaming operation method based on read-write lock ring detection mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410175368.5A CN117992409B (en) 2024-02-07 2024-02-07 Distributed file system renaming operation method based on read-write lock ring detection mechanism

Publications (2)

Publication Number Publication Date
CN117992409A CN117992409A (en) 2024-05-07
CN117992409B true CN117992409B (en) 2025-02-28

Family

ID=90890417

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410175368.5A Active CN117992409B (en) 2024-02-07 2024-02-07 Distributed file system renaming operation method based on read-write lock ring detection mechanism

Country Status (1)

Country Link
CN (1) CN117992409B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103942269A (en) * 2014-03-26 2014-07-23 北京京东尚科信息技术有限公司 Method and device for operating file system
CN107783988A (en) * 2016-08-26 2018-03-09 阿里巴巴集团控股有限公司 The locking method and equipment of a kind of directory tree

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9449008B1 (en) * 2014-03-31 2016-09-20 Amazon Technologies, Inc. Consistent object renaming in distributed systems
CN115705314A (en) * 2021-08-06 2023-02-17 腾讯科技(深圳)有限公司 File operation method, device, computer device and storage medium

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103942269A (en) * 2014-03-26 2014-07-23 北京京东尚科信息技术有限公司 Method and device for operating file system
CN107783988A (en) * 2016-08-26 2018-03-09 阿里巴巴集团控股有限公司 The locking method and equipment of a kind of directory tree

Also Published As

Publication number Publication date
CN117992409A (en) 2024-05-07

Similar Documents

Publication Publication Date Title
EP3714375B1 (en) Allocation and reassignment of unique identifiers for synchronization of content items
US8555378B2 (en) Authorization caching in a multithreaded object server
US5991771A (en) Transaction synchronization in a disconnectable computer and network
US7702640B1 (en) Stratified unbalanced trees for indexing of data items within a computer system
JP7362190B2 (en) Data indexing method in storage engine, data indexing device, computer device, and computer program
US9460144B2 (en) Lock acceleration
CN111444027B (en) Transaction processing method and device, computer equipment and storage medium
KR20070121664A (en) Systems and methods for manipulating data in a data storage system
US11514080B1 (en) Cross domain transactions
US20230098963A1 (en) Object processing method and apparatus, computer device, and storage medium
US9367579B1 (en) System and method for maintaining a file change log within a distributed file system
CN101976314A (en) Access control method and system
CN117539841B (en) Metadata management system of distributed file system and operation method thereof
Waqas et al. Transaction management techniques and practices in current cloud computing environments: A survey
US11940972B2 (en) Execution of operations on partitioned tables
WO2023124242A1 (en) Transaction execution method and apparatus, device, and storage medium
US20080195615A1 (en) Recursive lock-and-propagate operation
Lomet Simple, robust and highly concurrent B-trees with node deletion
US20250209056A1 (en) Execution of a sparql query over an rdf dataset stored in a distributed storage
CN117992409B (en) Distributed file system renaming operation method based on read-write lock ring detection mechanism
CN118427161B (en) Distributed file system and directory operation method
Zhang et al. BG3: A Cost Effective and I/O Efficient Graph Database in Bytedance
CN112286889A (en) Wide area network-oriented metadata copy synchronization method for distributed file system
Arora et al. Dynamic timestamp allocation for reducing transaction aborts
US12259891B2 (en) Hybrid database implementations

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant