WO2020135889A1 - Method for dynamic loading of disk and cloud storage system - Google Patents
Method for dynamic loading of disk and cloud storage system
- Publication number
- WO2020135889A1 (PCT/CN2019/130169)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- disk
- storage
- storage node
- node
- management
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/0614—Improving the reliability of storage systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/08—Error detection or correction by redundancy in data representation, e.g. by using checking codes
- G06F11/10—Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/08—Error detection or correction by redundancy in data representation, e.g. by using checking codes
- G06F11/10—Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
- G06F11/1004—Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's to protect a block of data words, e.g. CRC or checksum
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0629—Configuration or reconfiguration of storage systems
- G06F3/0631—Configuration or reconfiguration of storage systems by allocating resources to storage systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0638—Organizing or formatting or addressing of data
- G06F3/064—Management of blocks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0668—Interfaces specially adapted for storage systems adopting a particular infrastructure
- G06F3/067—Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0668—Interfaces specially adapted for storage systems adopting a particular infrastructure
- G06F3/0671—In-line storage system
- G06F3/0683—Plurality of storage devices
- G06F3/0689—Disk arrays, e.g. RAID, JBOD
Definitions
- the present application relates to the field of data storage technology, in particular to a method for dynamically loading a disk and a cloud storage system.
- Cloud storage provides flexible storage space for the storage of massive video data.
- the storage space of cloud storage is backed by a storage cluster that must be maintained, and the data is generally scattered across the storage cluster; that is to say, massive video data can be stored through the storage cluster.
- Cloud storage can use a copy (replica) mode or an EC (Erasure Code) mode to ensure data integrity.
- in a storage cluster, after a device fails, the data in the failed storage node needs to be recovered from the copies or the EC data; this process is called reconstruction.
- because the storage clusters used for cloud storage are large, storage node failures become frequent.
- some storage node failures are software failures, such as a service failing to start or an operating system abnormality. Although the data in the failed storage node can still be computed from copies or EC data, doing so consumes the computing power of the storage cluster and increases the cluster's burden.
- Embodiments of the present application provide a method for dynamically loading a disk and a cloud storage system, which can reduce system resource consumption caused by data reconstruction.
- the technical solution is as follows:
- a method for dynamically loading a disk is provided, which is applied to a cloud storage system.
- the cloud storage system includes a management node and multiple storage nodes.
- the multiple storage nodes access the same SAS switch.
- the method includes:
- when the management node detects that a first storage node among the plurality of storage nodes has a software failure, it sends a disk load instruction to a second storage node among the plurality of storage nodes;
- the second storage node loads the disk of the first storage node through the SAS switch.
- the management node updates the locally stored storage node information corresponding to the disk.
- the method further includes:
- when the management node receives a read request to read data from the disk, it sends the read request to the second storage node according to the updated, locally stored storage node information corresponding to the disk.
- the second storage node reads the data in the disk through the SAS switch according to the received read request.
- when the management node receives a write request to write data to the disk, it sends the write request to the second storage node according to the updated, locally stored storage node information corresponding to the disk;
- the second storage node writes data to the disk through the SAS switch according to the received write request.
- loading the disk of the first storage node through the SAS switch includes:
- the second storage node updates the index information of the disk in the first storage node to the database of the second storage node.
- the management node updating the locally stored storage node information corresponding to the disk includes:
- the management node updating the correspondence between the disk and the storage node information of the second storage node in a local database.
- before the management node updates the locally stored storage node information corresponding to the disk, the method further includes:
- the management node receiving, from the second storage node, a message indicating that the disk was loaded successfully.
- the cloud storage system includes a management node and multiple storage nodes, the multiple storage nodes access the same SAS switch, and the multiple storage nodes include a first storage node and a second storage node, where:
- the management node is configured to send a disk load instruction to the second storage node when a software failure is detected in the first storage node;
- the second storage node is configured to load the disk of the first storage node through the SAS switch after receiving the disk loading instruction.
- the management node is further used to update the locally stored storage node information corresponding to the disk.
- the management node is further configured to, when receiving a read request to read data from the disk, send the read request to the second storage node according to the updated, locally stored storage node information corresponding to the disk;
- the second storage node is also used to read the data in the disk through the SAS switch according to the received read request.
- the management node is further configured to, when receiving a write request to write data to the disk, send the write request to the second storage node according to the updated, locally stored storage node information corresponding to the disk;
- the second storage node is further configured to write data to the disk through the SAS switch according to the received write request.
- the second storage node is also used to update the index information of the disk in the first storage node to the database of the second storage node.
- the management node is further configured to update the correspondence between the disk and the storage node information of the second storage node in a local database.
- the management node is further configured to receive a message that the second storage node successfully loads the disk.
- a disk dynamic loading device is provided, including:
- one or more processors; and a storage device that stores one or more programs;
- when the one or more programs are executed by the one or more processors, the one or more processors implement the disk dynamic loading method.
- a computer-readable storage medium is provided, on which a computer program is stored; when the computer program is executed by a processor, the disk dynamic loading method is implemented.
- a computer program product containing instructions is provided which, when run on a computer, cause the computer to implement the disk dynamic loading method described in the above aspect.
- in the method for dynamically loading a disk of the present application, the storage nodes access the same SAS switch, and each storage node can access the disks of all storage nodes connected to the SAS switch. In the case of a storage node software failure, another storage node loads the disk of the failed storage node. This realizes dynamic loading of the disk, reduces the performance loss of system reconstruction, and improves the availability of object storage disks.
- FIG. 1 shows a first overall flowchart of a method for dynamically loading a magnetic disk according to an embodiment of the present application.
- FIG. 2 shows a schematic structural diagram of a storage node accessing a SAS switch according to an embodiment of the present application.
- FIG. 3 shows a second overall flowchart of a method for dynamically loading a disk according to an embodiment of the present application.
- FIG. 4 shows a first schematic flowchart of an MDS drift disk according to an embodiment of the present application.
- FIG. 5 shows a second schematic flowchart of an MDS drift disk according to an embodiment of the present application.
- FIG. 6 shows a third schematic flow chart of an MDS drift disk according to an embodiment of the present application.
- Database: refers to a collection of related, structured data stored in an organized way on a computer's storage device; a database contains various contents, including tables, views, fields, indexes, etc.
- Video positioning: in this application, refers to a time entered by the user; the system can quickly find the stored video data corresponding to this time according to the relevant information recorded in the database.
- Byte: data is stored in units of bytes (Byte); every 8 bits (bit, abbreviated as b) form a byte (Byte, abbreviated as B), which is the basic unit of information.
- Video stream: the video data to be transmitted, which can be processed as a stable, continuous stream over the network.
- the object storage system is a massive, safe, highly reliable, and easily expandable cloud storage service provided to users. Instead of organizing files into a directory hierarchy, it stores files in a flat container organization and uses unique IDs to retrieve them. As a result, object storage systems require less metadata than file systems to store and access files, which reduces the overhead of managing file metadata.
- the object storage system provides services for users through the platform-independent RESTful protocol and supports convenient storage and management of massive objects through the web.
- the object storage system can store arbitrary objects in a durable and highly available system.
- applications and users can access data in the object storage through simple APIs (Application Programming Interfaces); these are usually based on the Representational State Transfer (REST) architecture, but there are also interfaces for programming languages.
- OSD (Object-based Storage Device): in this solution, represents a storage node; it is the module that reads and writes objects in the object storage system.
- the OSD stores data on the tracks and sectors of a disk, combines several tracks and sectors to form an object, and provides external access to the data through this object.
- MDS (Metadata Server): the management node in the object storage system; it stores the index information of objects, including the name of the object, the specific location information of the object, and the last modification time of the object.
- Allocation of resources: in this solution, refers to the MDS allocating storage resources for object writing, specifically allocating an OSD and an object disk.
- File object: responsible for file access operations. After obtaining a file object, you can use it to read the data on the disk. A file object is uploaded to the cloud storage by the user at one time; the upload is completed in a single interaction using a PUT request.
- a cluster is a group of independent computers interconnected by a high-speed network; they form a group and are managed as a single system. When a client interacts with the cluster, the cluster acts as a single independent server. Cluster configurations are used to improve availability and scalability.
- Disk loading: cloud storage persists data to multiple disks, which are the media for storing data in cloud storage; each disk usually includes multiple partitions. In the Linux operating system, disk loading refers to mounting a disk of a device (usually a storage device) onto an existing directory. Specifically, to access files on a disk of a storage device, you must mount the partition where the files are located onto an existing directory, and then access the files through that directory. A disk can be read and written by cloud storage only after it has been loaded.
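- as an illustration of the disk loading described above, the following is a minimal sketch assuming a Linux host; the device and directory names are illustrative assumptions, not part of this application:

```python
import subprocess

def load_disk(partition: str, mount_point: str) -> None:
    """Mount a disk partition onto an existing directory (Linux).

    Minimal sketch of 'disk loading'; the partition and mount-point
    names used below are hypothetical examples.
    """
    # The target directory must exist before mounting.
    subprocess.run(["mkdir", "-p", mount_point], check=True)
    # After mounting, files on the partition are accessed through the
    # mount-point directory; only then can cloud storage read and
    # write the disk.
    subprocess.run(["mount", partition, mount_point], check=True)

# Example with a hypothetical device:
# load_disk("/dev/sdb1", "/data/disk0")
```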
- Disk drift The disk drifts between OSDs, which means that the read and write control of the disk is switched from one OSD to another OSD.
- EC reconstruction: the process of recovering damaged data blocks, which can be computed from the valid data blocks and check (parity) blocks in the EC data.
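- for intuition, a minimal sketch of this kind of recovery with a single XOR parity block follows; real EC schemes (e.g., Reed-Solomon) generalize the same idea, and the data values are illustrative:

```python
def reconstruct_block(surviving: list[bytes]) -> bytes:
    """Recover one lost block when blocks are protected by a single
    XOR parity block: the missing block is the XOR of all survivors."""
    out = bytes(len(surviving[0]))
    for block in surviving:
        out = bytes(a ^ b for a, b in zip(out, block))
    return out

# Parity p = d0 ^ d1 ^ d2; if d1 is lost, d0 ^ d2 ^ p recovers it.
d0, d1, d2 = b"\x01\x02", b"\x0a\x0b", b"\x10\x20"
p = bytes(a ^ b ^ c for a, b, c in zip(d0, d1, d2))
assert reconstruct_block([d0, d2, p]) == d1
```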
- SAS (Serial Attached SCSI, where SCSI stands for Small Computer System Interface) switch: a switch that uses the SAS protocol for disk discovery and simulated network communication. After a storage node is connected to a SAS switch, it can discover and use the disks in all storage nodes connected to the switch.
- FIG. 1 is a schematic flowchart of a method for dynamically loading a magnetic disk provided by this embodiment, and each step is described in detail below.
- the method for dynamically loading a magnetic disk may be applied to a cloud storage system.
- the cloud storage system includes a management node and multiple storage nodes, and the multiple storage nodes are connected to the same SAS switch.
- the cloud storage system may include multiple management nodes that form a management cluster. As shown in FIG. 2, the signaling ports of the management nodes MDS1, MDS2, MDS3, ..., MDSN in the management cluster are interconnected through an ordinary Gigabit switch, and signaling exchange is realized through this interconnection.
- the multiple storage nodes form a storage cluster; the signaling ports of the storage nodes OSD1, OSD2, OSD3, ..., OSDN in the storage cluster are interconnected through an ordinary Gigabit switch, and signaling exchange is realized through this interconnection.
- the data ports of the storage nodes OSD1, OSD2, OSD3, ..., OSDN in the storage cluster are interconnected through the SAS switch, and data exchange between them is realized through this interconnection.
- the following description takes a cloud storage system that includes one management node MDS as an example.
- the signaling exchange between the management node MDS and the ordinary Gigabit switch is a two-way exchange;
- signaling can be transmitted bidirectionally between the management node MDS and the ordinary Gigabit switch;
- the signaling exchange between a storage node OSD and the ordinary Gigabit switch is likewise a bidirectional exchange, and signaling can be transmitted bidirectionally between the storage node OSD and the ordinary Gigabit switch;
- the data exchange between a storage node OSD and the SAS switch is also a bidirectional exchange, and data can be transferred bidirectionally between the storage node OSD and the SAS switch.
- the SAS switch uses the SAS protocol for disk discovery and simulated network communication;
- when a storage node is connected to the SAS switch, it can discover and use the disks in all storage nodes connected to the switch;
- therefore, a storage node OSD can access the disks of the other storage nodes connected to the SAS switch, as sketched below.
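- a minimal sketch of such discovery on a Linux storage node follows; it assumes that disks reached through the SAS switch appear as ordinary SCSI block devices with SAS path names under /dev/disk/by-path, which is typical but deployment-dependent:

```python
import glob

def discover_sas_disks() -> list[str]:
    """Enumerate block devices visible through the SAS topology.

    Disks behind a SAS switch show up on every connected node as
    ordinary SCSI devices; their by-path names usually contain a SAS
    address. The filter below is an assumption, not a rule.
    """
    return sorted(glob.glob("/dev/disk/by-path/*sas*"))

# Example: every OSD on the switch would list the same shared disks.
# print(discover_sas_disks())
```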
- the method for dynamically loading a disk may include the following steps:
- when the management node detects that the first storage node has a software failure, it sends a disk load instruction to the second storage node.
- a storage node may have a software-level failure, such as a service failing to start or an operating system abnormality.
- this storage node is referred to as a faulty storage node.
- the faulty storage node is also referred to as a first storage node.
- the management node MDS considers the failed storage node offline; at this time, the management node MDS requests another storage node connected to the SAS switch to try to load the disks of the failed storage node.
- this other storage node is referred to as the second storage node; that is, the management node sends a disk load instruction to the second storage node to instruct it to load the disk of the first storage node.
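- a minimal sketch of this detection-and-instruction step follows; the heartbeat timeout, message fields, and send callable are assumptions for illustration, not the actual protocol of this application:

```python
import time

HEARTBEAT_TIMEOUT_S = 30  # assumed threshold, not specified here

class ManagementNodeSketch:
    def __init__(self, send):
        self.send = send            # send(node_id, message): assumed RPC hook
        self.last_heartbeat = {}    # storage node id -> last report time

    def on_heartbeat(self, node_id: str) -> None:
        self.last_heartbeat[node_id] = time.time()

    def check_storage_nodes(self, healthy_nodes: list[str]) -> None:
        """Treat a silent node as offline (software failure) and ask a
        healthy node on the same SAS switch to load its disks."""
        now = time.time()
        for node_id, seen in self.last_heartbeat.items():
            offline = now - seen > HEARTBEAT_TIMEOUT_S
            if offline and node_id not in healthy_nodes and healthy_nodes:
                self.send(healthy_nodes[0],
                          {"op": "load_disk", "failed_node": node_id})
```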
- after receiving the disk load instruction, the second storage node loads the disk of the first storage node through the SAS switch.
- after the second storage node loads the disk successfully, the data on the disk of the first storage node can be read normally by the second storage node, and of course data can also be written to that disk by the second storage node, thereby avoiding the data recovery process.
- the second storage node updates the index information of the disk in the first storage node to the database of the second storage node.
- the disk index information in the first storage node can be sent to the second storage node through the SAS switch, and the second storage node copies the disk index information of the first storage node into its local database;
- the purpose of this update is to later use the disk index information to read the data on the disk of the first storage node that suffered the software failure.
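- a minimal sketch of this index hand-off follows; the index fields and the `disk.read_index()` and `local_db.upsert()` helpers are illustrative assumptions, not a real API:

```python
def copy_disk_index(disk, local_db) -> None:
    """Second storage node: copy the drifted disk's object index into
    the local database so later requests can locate data on it."""
    # read_index() is assumed to fetch index records over the SAS path.
    for entry in disk.read_index():
        local_db.upsert(
            key=entry["object_name"],
            value={
                "disk_id": disk.disk_id,   # the drifted disk
                "offset": entry["offset"],
                "length": entry["length"],
            })
```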
- the management node MDS can dynamically adjust the disks in the failed storage node to other storage nodes for reading, writing, and loading according to the status of the storage nodes. For example, if the management node MDS does not find a storage node abnormal, the data on the disk is read and written normally; when the management node MDS finds a storage node abnormal, it requests another storage node on the same switch to load the disk of the failed storage node, and the data on the disk of the failed storage node is read and written normally through that other storage node, realizing disk drift.
- the management node MDS realizes disk drift according to the status of the storage nodes; that is, when a storage node has a software failure, the read and write permissions for its disk drift from the failed storage node to a normal storage node in the storage cluster. After the disk drifts, all read and write requests for the disk are executed through the normal storage node.
- the normal storage node uses the drifted disk like a local disk: it can access the disk in the failed storage node normally through the SAS switch, and the disk in the failed storage node can be loaded normally. In this way, the data on the disk of the failed storage node can still be read and written normally without using the copy mode or the EC mode for recovery.
- the management node updates locally stored storage node information corresponding to the disk.
- the disk is a disk in the first storage node, that is, a disk that drifts to the second storage node.
- the management node updates the correspondence between the disk and the storage node information of the second storage node in the local database.
- the storage node information is used to uniquely indicate a storage node.
- the method may further include: the management node receiving a message that the second storage node has successfully loaded the disk. That is, after determining that the second storage node has successfully loaded the disk of the first storage node, the management node updates the above disk and the storage node information of the second storage node to the local database.
- after the second storage node successfully loads the disk of the first storage node, it sends a corresponding disk-loaded-successfully message to the management node.
- after receiving the message that the disk was loaded successfully, the management node records the correspondence between the disk and the storage node information of the second storage node in its local database, so that if the first storage node fails again and the disk needs to be loaded again, there is no need to search for a new storage node to load the disk; the second storage node can be assigned directly to load it.
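- a minimal sketch of this bookkeeping step follows, using an illustrative SQLite-style upsert; the table and column names are assumptions:

```python
import sqlite3

def record_disk_location(db: sqlite3.Connection,
                         disk_id: str, node_id: str) -> None:
    """Management node: persist which storage node now serves a disk,
    so later reads/writes (and repeat failures of the original node)
    route straight to that node."""
    # Assumes disk_id is the table's primary key (required by the
    # ON CONFLICT upsert; SQLite >= 3.24).
    db.execute(
        "INSERT INTO disk_location (disk_id, node_id) VALUES (?, ?) "
        "ON CONFLICT(disk_id) DO UPDATE SET node_id = excluded.node_id",
        (disk_id, node_id))
    db.commit()
```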
- in this way, the disk in a faulty storage node with a software-level abnormality can be successfully loaded, read, and written by another storage node. Data reads and writes do not require reconstruction to restore the data, avoiding unnecessary calculations. Moreover, after a storage node becomes abnormal, reading and writing data in the entire cloud storage system suffers little performance impact.
- the management node MDS may also request the second storage node to unload a loaded disk. For example, after the failed storage node returns to normal, the management node MDS may first request the second storage node to unload the loaded disk of the previously failed storage node, and then request that storage node to load its own disk again, so that the local disk of the previously failed storage node is taken over by that node itself, thereby dispersing the disk-access pressure across the storage nodes in the system.
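- a minimal sketch of this give-back flow follows; the request helper and message shapes are assumed RPC wrappers for illustration:

```python
def give_back_disks(request, record_disk_location,
                    recovered_node: str, current_holder: str,
                    disks: list[str]) -> None:
    """After the failed node recovers: unload each drifted disk from
    its current holder, ask the recovered node to reload it, and
    update the management node's disk -> node mapping. This spreads
    disk pressure back across the cluster."""
    for disk_id in disks:
        # request(node_id, message) is an assumed RPC callable.
        request(current_holder, {"op": "unload_disk", "disk": disk_id})
        request(recovered_node, {"op": "load_disk", "disk": disk_id})
        record_disk_location(disk_id, recovered_node)
```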
- this application lets disks drift between storage nodes inside the object storage to realize dynamic loading of the disks: through the SAS switch, a storage node can continue to access the data on the disks of failed storage nodes, improving disk availability.
- the disk dynamic loading method may further include:
- when the management node receives a read request to read data from the disk, it sends the read request to the second storage node according to the updated, locally stored storage node information corresponding to the disk.
- the second storage node reads the data in the disk through the SAS switch according to the received read request.
- when the management node receives a write request to write data to the disk, it sends the write request to the second storage node according to the updated, locally stored storage node information corresponding to the disk.
- the second storage node writes data to the disk through the SAS switch according to the received write request.
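- a minimal sketch of this routing step follows, reusing the illustrative disk_location table from the earlier sketch; the forwarding callable is an assumed RPC hook:

```python
import sqlite3

def route_request(db: sqlite3.Connection, request: dict, forward) -> None:
    """Management node: look up which storage node currently serves
    the target disk (updated after the drift) and forward the read or
    write request to it."""
    row = db.execute(
        "SELECT node_id FROM disk_location WHERE disk_id = ?",
        (request["disk_id"],)).fetchone()
    if row is None:
        raise KeyError(f"unknown disk {request['disk_id']}")
    forward(row[0], request)  # forward(node_id, request): assumed hook
```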
- the disks of the failed storage node can be read and written normally after being loaded by another storage node. After the disk of the failed storage node is successfully loaded by a normal storage node, subsequent disk data reads and writes can be performed through the normal storage node that loaded the disk.
- the SAS switch allows a storage node to access the disks of the other storage nodes on the same switch just as if it were accessing a local disk.
- the cloud storage system includes a management node and multiple storage nodes, the multiple storage nodes access the same SAS switch, and the multiple storage nodes include a first storage node and a second storage node, where:
- the management node is configured to send a disk load instruction to the second storage node when a software failure is detected in the first storage node;
- the second storage node is configured to load the disk of the first storage node through the SAS switch after receiving the disk loading instruction.
- the management node is also used to update the locally stored storage node information corresponding to the disk.
- the management node is further configured to, when receiving a read request to read data from the disk, send the read request to the second storage node according to the updated, locally stored storage node information corresponding to the disk;
- the second storage node is further configured to read the data in the disk through the SAS switch according to the received read request.
- the management node is further configured to, when receiving a write request to write data to the disk, send the write request to the second storage node according to the updated, locally stored storage node information corresponding to the disk;
- the second storage node is further configured to write data to the disk through the SAS switch according to the received write request.
- the second storage node is also used to update the index information of the disk in the first storage node to the database of the second storage node.
- the management node is also used to update the correspondence between the disk and the storage node information of the second storage node in the local database.
- the management node is further configured to receive a message that the second storage node successfully loads the disk.
- the steps by which the management node MDS implements disk drift may include:
- the storage node OSD1 is abnormal.
- the storage node OSD1 has a software-level failure, such as a service startup failure or an operating system abnormality.
- the disk and the data on the disk are normal, and the disk can still be accessed.
- the management node MDS requests the storage node OSD2 to load the disk of the storage node OSD1.
- after a software failure occurs on the storage node OSD1, OSD1 cannot report its heartbeat to the management node MDS.
- the management node MDS therefore considers that the storage node OSD1 is offline; at this time, the management node MDS requests the other storage node OSD2 to try to load the disk of OSD1. After OSD2 loads the disk successfully, the disk data of OSD1 can be read normally by OSD2, and of course OSD2 can also write data to the disk, thereby avoiding the data recovery process.
- the management node MDS can dynamically adjust the disk to the other storage node OSD2 for reading, writing, and loading. For example, if the management node MDS does not find the storage node abnormal, the disk data is read and written normally; when the management node MDS finds that the storage node OSD1 is abnormal, it requests the storage node OSD2 on the same switch to load the disk of OSD1, and the disk data of OSD1 is read and written normally through OSD2, realizing disk drift.
- the management node MDS implements disk drift according to the state of the storage node.
- when OSD1 fails, the read and write permissions for its disk drift from the faulty storage node OSD1 to the normal storage node OSD2 in the storage cluster.
- the storage node OSD2 successfully loads the disk of OSD1.
- the storage node OSD2 uses the drifted disk like a local disk.
- the storage node OSD2 can normally access the disk in OSD1 through the SAS switch, and the disk in OSD1 can be normally loaded.
- the disk of the faulty storage node OSD1 can be read and written normally after being loaded by the other storage node OSD2. After the disk of OSD1 is successfully loaded by the normal storage node OSD2, subsequent reads and writes of the disk data can be performed through OSD2, which loaded the disk.
- the SAS switch allows the storage node OSD2 to access the disks of other storage nodes such as OSD1 on the same switch, just as if it were accessing a local disk.
- the disk can thus be successfully loaded, read, and written by the other storage node OSD2. Data reads and writes do not require reconstruction to restore the data, avoiding unnecessary calculations. Moreover, after the storage node OSD1 becomes abnormal, reading and writing data in the entire cloud storage system suffers little performance impact.
- the MDS requests other storage nodes to unmount the loaded disk.
- after the failed storage node returns to normal, the management node MDS may first request the other storage node OSD2 to unload the loaded disk of the failed storage node OSD1, and then request the storage node OSD1 to load the disk, so that the local disk of the storage node OSD1 is taken over for reading and writing by OSD1 itself, thereby dispersing the disk-operation pressure across the storage nodes in the system.
- the management node MDS may also request another storage node on the SAS switch to load the disks of multiple failed storage nodes; the steps by which the management node MDS implements disk drift in this case are as follows:
- the storage nodes OSD1 and OSD3 are abnormal.
- the management node MDS requests the storage node OSD2 to load the disks of the storage nodes OSD1 and OSD3.
- the storage nodes OSD1 and OSD3 cannot report the heartbeat to the management node MDS.
- the management node MDS considers the storage nodes OSD1 and OSD3 to be offline; at this time, the management node MDS requests the other storage node OSD2 to try to load the disks in OSD1 and OSD3. After OSD2 loads them successfully, the data on the disks of the failed storage nodes can be read normally through OSD2, and of course OSD2 can also write data to the disks, thereby avoiding the data recovery process.
- the management node MDS can dynamically adjust the disks to other storage nodes for read-write loading. For example, if the management node MDS does not find the storage nodes abnormal, the disk data is read and written normally; when the management node MDS finds that the storage nodes OSD1 and OSD3 are abnormal, it requests the storage node OSD2 on the same switch to load the disks of OSD1 and OSD3, and the data on the disks of OSD1 and OSD3 is read and written normally through OSD2, realizing disk drift.
- the management node MDS implements disk drift according to the state of the storage node.
- the read and write permissions for the disks automatically drift from the failed storage nodes OSD1 and OSD3 to the normal storage node OSD2 in the storage cluster.
- the storage node OSD2 successfully loads the disks in the storage nodes OSD1 and OSD3.
- the storage node OSD2 uses the drifted disk like a local disk.
- the storage node OSD2 can normally access the disks in OSD1 and OSD3 through the SAS switch, and the disks in OSD1 and OSD3 can be normally loaded.
- the management node MDS may also request multiple other storage nodes on the switch to load the disks of the failed storage nodes; the steps by which the management node MDS implements disk drift in this case are as follows:
- the storage nodes OSD1 and OSD3 are abnormal.
- the management node MDS requests the storage nodes OSD2 and OSD4 to load the disks of the storage nodes OSD1 and OSD3.
- the storage nodes OSD1 and OSD3 cannot report the heartbeat to the management node MDS.
- the management node MDS considers that the storage nodes OSD1 and OSD3 are offline; at this time, the management node MDS requests the other storage nodes OSD2 and OSD4 to try to load the disks of OSD1 and OSD3. After OSD2 and OSD4 load the disks successfully, the data on the disks of the failed storage nodes can be read normally by OSD2 and OSD4, and of course OSD2 and OSD4 can also write data to the disks, thereby avoiding the data recovery process.
- the management node MDS can dynamically adjust the disks to other storage nodes for loading, reading, and writing. For example, if the management node MDS does not find the storage nodes abnormal, the data on the disks is read and written normally; when the management node MDS finds that the storage nodes OSD1 and OSD3 are abnormal, it requests the storage nodes OSD2 and OSD4 on the same switch to load the disks of OSD1 and OSD3, and the data on those disks is read and written normally through OSD2 and OSD4. For example, the storage node OSD2 can normally read and write the data on the disk of the storage node OSD1, and the storage node OSD4 can normally read and write the data on the disk of the storage node OSD3, thereby realizing disk drift.
- the management node MDS implements disk drift according to the state of the storage node.
- the read and write permissions of the disk drift from the failed storage nodes OSD1 and OSD3 to the normal storage nodes OSD2 and OSD4 in the storage cluster.
- the storage nodes OSD2 and OSD4 successfully load the disks of the storage nodes OSD1 and OSD3.
- the storage nodes OSD2 and OSD4 can normally access the disks in OSD1 and OSD3 through the SAS switch, and the disks in OSD1 and OSD3 can be normally loaded.
- the disk dynamic loading device may be the aforementioned management node, or may be one of the aforementioned storage nodes, and may include:
- one or more processors; and a storage device that stores one or more programs;
- when the one or more programs are executed by the one or more processors, the one or more processors implement the disk dynamic loading method.
- a computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, the disk dynamic loading method is implemented.
Claims (14)
- 1. A method for dynamically loading a disk, applied to a cloud storage system, the cloud storage system comprising a management node and a plurality of storage nodes, the plurality of storage nodes accessing the same SAS switch, the method comprising: when the management node detects that a first storage node of the plurality of storage nodes has a software failure, sending a disk load instruction to a second storage node of the plurality of storage nodes; and, after receiving the disk load instruction, the second storage node loading the disk of the first storage node through the SAS switch.
- 2. The method according to claim 1, further comprising: the management node updating locally stored storage node information corresponding to the disk.
- 3. The method according to claim 2, further comprising: when the management node receives a read request to read data from the disk, sending the read request to the second storage node according to the updated, locally stored storage node information corresponding to the disk; and the second storage node reading the data on the disk through the SAS switch according to the received read request.
- 4. The method according to claim 2, further comprising: when the management node receives a write request to write data to the disk, sending the write request to the second storage node according to the updated, locally stored storage node information corresponding to the disk; and the second storage node writing data to the disk through the SAS switch according to the received write request.
- 5. The method according to claim 1, wherein loading the disk of the first storage node through the SAS switch comprises: the second storage node updating index information of the disk in the first storage node to a database of the second storage node.
- 6. The method according to claim 2, wherein the management node updating the locally stored storage node information corresponding to the disk comprises: the management node updating the correspondence between the disk and the storage node information of the second storage node in a local database.
- 7. The method according to claim 1, wherein before the management node updates the locally stored storage node information corresponding to the disk, the method further comprises: the management node receiving, from the second storage node, a message indicating that the disk was loaded successfully.
- 8. A cloud storage system, comprising a management node and a plurality of storage nodes, the plurality of storage nodes accessing the same SAS switch, and the plurality of storage nodes comprising a first storage node and a second storage node, wherein: the management node is configured to send a disk load instruction to the second storage node upon detecting that the first storage node has a software failure; and the second storage node is configured to load the disk of the first storage node through the SAS switch after receiving the disk load instruction.
- 9. The system according to claim 8, wherein the management node is further configured to update locally stored storage node information corresponding to the disk.
- 10. The system according to claim 9, wherein the management node is further configured to, upon receiving a read request to read data from the disk, send the read request to the second storage node according to the updated, locally stored storage node information corresponding to the disk; and the second storage node is further configured to read the data on the disk through the SAS switch according to the received read request.
- 11. The system according to claim 9, wherein the management node is further configured to, upon receiving a write request to write data to the disk, send the write request to the second storage node according to the updated, locally stored storage node information corresponding to the disk; and the second storage node is further configured to write data to the disk through the SAS switch according to the received write request.
- 12. The system according to claim 8, wherein the second storage node is further configured to update index information of the disk in the first storage node to a database of the second storage node.
- 13. The system according to claim 9, wherein the management node is further configured to update the correspondence between the disk and the storage node information of the second storage node in a local database.
- 14. The system according to claim 8, wherein the management node is further configured to receive, from the second storage node, a message indicating that the disk was loaded successfully.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811625675.X | 2018-12-28 | ||
CN201811625675.XA CN111381766B (en) | 2018-12-28 | 2018-12-28 | Method for dynamically loading disk and cloud storage system |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020135889A1 true WO2020135889A1 (en) | 2020-07-02 |
Family
ID=71129699
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2019/130169 WO2020135889A1 (en) | 2018-12-28 | 2019-12-30 | Method for dynamic loading of disk and cloud storage system |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN111381766B (en) |
WO (1) | WO2020135889A1 (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111880751B (en) * | 2020-09-28 | 2020-12-25 | 浙江大华技术股份有限公司 | Hard disk migration method, distributed storage cluster system and storage medium |
TWI784750B (en) * | 2021-10-15 | 2022-11-21 | 啟碁科技股份有限公司 | Data processing method of terminal device and data processing system of terminal device |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101969465A (en) * | 2010-10-13 | 2011-02-09 | 北京神州融信信息技术股份有限公司 | Cluster read-write method, apparatus and system and controller |
CN103608784A (en) * | 2013-06-26 | 2014-02-26 | 华为技术有限公司 | Method for creating network volumes, data storage method, storage device and storage system |
US20160070622A1 (en) * | 2010-09-24 | 2016-03-10 | Hitachi Data Systems Corporation | System and method for enhancing availability of a distributed object storage system during a partial database outage |
CN107046575A (en) * | 2017-04-18 | 2017-08-15 | 南京卓盛云信息科技有限公司 | A kind of cloud storage system and its high density storage method |
CN107124469A (en) * | 2017-06-07 | 2017-09-01 | 郑州云海信息技术有限公司 | A kind of clustered node communication means and system |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105657066B (en) * | 2016-03-23 | 2019-06-14 | 天津书生云科技有限公司 | Load rebalancing method and device for storage system |
CN103067485A (en) * | 2012-12-25 | 2013-04-24 | 曙光信息产业(北京)有限公司 | Disk monitoring method for cloud storage system |
CN103152397B (en) * | 2013-02-06 | 2017-05-03 | 浪潮电子信息产业股份有限公司 | Method for designing multi-protocol storage system |
CN104967577B (en) * | 2015-06-25 | 2019-09-03 | 北京百度网讯科技有限公司 | SAS switch and server |
- 2018-12-28: CN application CN201811625675.XA filed (granted as CN111381766B; status: active)
- 2019-12-30: PCT application PCT/CN2019/130169 filed (published as WO2020135889A1; status: application filing)
Also Published As
Publication number | Publication date |
---|---|
CN111381766B (en) | 2022-08-02 |
CN111381766A (en) | 2020-07-07 |
Legal Events
Date | Code | Title | Description
---|---|---|---
| 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 19904425; Country of ref document: EP; Kind code of ref document: A1
| NENP | Non-entry into the national phase | Ref country code: DE
| 122 | EP: PCT application non-entry in European phase | Ref document number: 19904425; Country of ref document: EP; Kind code of ref document: A1
| 32PN | EP: public notification in the EP bulletin as address of the addressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 03.02.2022)