CN1728665A - Expandable storage system and control method based on objects - Google Patents
- Publication number
- CN1728665A (application number CN 200510019166)
- Authority
- CN
- China
- Prior art keywords
- storage
- network
- client
- osn
- data server
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
An extensible object-based storage system and its control method, belonging to the field of computer information storage technology. The aim is to overcome the shortcomings of the system architecture and user service model of existing mass storage systems by combining the file and block approaches into a new object interface. The system comprises I metadata servers, N object storage nodes, and M clients connected through a network. Clients, object storage nodes, and metadata servers communicate three ways; the system is highly scalable, its storage capacity and aggregate data-transfer bandwidth grow together, and object-based storage gives the system adaptive capability. The invention changes the traditional approach to data management and control: the object storage node (OSN) takes over the tedious low-level data management functions of a traditional file system, while the metadata server (MS) manages metadata. On top of object storage, the system implements adaptive management functions.
Description
Technical field
The invention belongs to the technical field of computer information storage, and in particular relates to an extensible object-based storage system and its control method.
Background
An ideal storage system should have five characteristics: security, cross-platform data sharing, high performance, scalability, and intelligence. The three architectures in wide use today, direct-attached storage (DAS), network-attached storage (NAS), and the storage area network (SAN), all have shortcomings of varying degree and struggle to offer all five characteristics at once. The essence of these systems has never changed: they use the block or the file as the basic unit of transfer, and they lack the intelligence and other properties of a storage system whose interface is the object. As storage demand grows and storage applications become more complex, block-based and file-based storage interfaces constrain the development of the storage industry.
In many frontier research fields, such as seismic analysis, genetic analysis, and nuclear explosion simulation, the required storage capacity is at least 1 TB and may exceed 1 PB, and capacity demand keeps growing. At the same time, as networks spread and multimedia applications become common, bandwidth requirements rise in step; some important fields even require a storage system to deliver 1 TB/s of bandwidth. Existing storage systems still follow the external storage device model, which does not suit the needs of networked environments. The main symptoms are complex network storage protocols (storage protocol families such as SCSI, ATA-5, SATA, FC, iSCSI, iFCP, and USB; network protocol families such as IEEE 802.x, ATM, TCP/IP, UDP, RTP/RTCP, FTP, GridFTP, and SNMP), complex management, low efficiency, and inconvenience for users. An existing storage device is a dumb terminal that holds only data. Metadata is organized and managed by the host, and the storage device merely responds passively. Security is implemented by the host, so the data on the device itself is unprotected (for example, a hard disk moved from one host to another can still be read). Moreover, as noted above, the complexity of storage protocols leads to a wide variety of interfaces and a high protocol conversion overhead.
Because of the traditional access model, in which storage devices are attached to a server, capacity requirements can be met by attaching high-density, high-capacity storage devices, but storage bandwidth cannot be raised at the same time. In addition, the I/O data on a large number of storage devices must be forwarded to clients by the server. This mode of operation easily congests the peripheral channels, and the repeated store-and-forward steps during data access and transfer add system overhead and transmission delay, lowering the average data transfer rate and lengthening service wait times. The problem becomes especially severe when many clients issue requests, creating a "server bottleneck". Furthermore, because the storage devices have little intelligence, the system adapts poorly and cannot actively adjust to changes in the application environment. As a result, client access service cannot be guaranteed and the burden on administrators increases.
It is therefore necessary to build a mass storage system on a new object interface that makes full use of the properties of object storage, so that the system scales well, is intelligent, provides a high data transfer rate, and relieves server load, in order to meet increasingly demanding storage requirements.
Summary of the invention
The invention proposes an extensible object-based storage system and its control method. The aim is to overcome the shortcomings of the system architecture and user service model of existing mass storage systems by combining the file and block approaches into a new object interface, and to construct, from the standpoint of system architecture, a mass storage system that can meet the growing demand for capacity and bandwidth while reducing server load and improving I/O service performance.
The extensible object-based storage system of the invention comprises I metadata servers, N object storage nodes, and M clients connected through a network;
(1) The metadata servers manage multiple storage objects and the mapping of data onto multiple storage objects; each metadata server connects to the network through a metadata server network interface and a metadata server network channel, in that order;
(2) Each object storage node consists of an object-based storage controller connected to J sets of storage devices through J storage device channels, and is responsible for storing object data;
A. The object-based storage controller comprises an operation control module, a storage device interface module, and a network interface module, which are physically connected to one another by a PCI-X (Peripheral Component Interconnect eXtended) bus;
B. The operation control module comprises a CPU, RAM, and EPROM, physically connected to one another by the PCI-X bus, and provides computing power, a runtime environment for methods, and scheduling capability;
C. The storage device interface module consists of J storage device interfaces and connects the controller to the storage devices;
D. The network interface module comprises K network interfaces;
Each object storage node connects to the network through its own K network interfaces and K network channels;
(3) Each client connects to the network through a client network interface and a client network channel, in that order;
(4) Over the network, the metadata server authenticates and authorizes the client and returns the mapping table of the storage objects the client requested; data is transferred over the network between the client and the storage devices of the object storage nodes; and the metadata server manages the metadata of the storage devices of the object storage nodes over the network. I, J, K, M, and N above are all natural numbers.
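The three communication paths in item (4) can be summarized in a small lookup table. The following Python sketch is illustrative only: the names `Role`, `PATHS`, and `traffic_between` are hypothetical and do not appear in the patent, and the strings simply restate the traffic the text assigns to each pair of parties.

```python
from enum import Enum


class Role(Enum):
    CLIENT = "client"
    MS = "metadata server"
    OSN = "object storage node"


# Traffic carried between each pair of roles, restating item (4).
# The dictionary layout is an assumption made for illustration.
PATHS = {
    frozenset({Role.CLIENT, Role.MS}): "authentication, authorization, mapping table",
    frozenset({Role.CLIENT, Role.OSN}): "object data transfer",
    frozenset({Role.MS, Role.OSN}): "metadata management",
}


def traffic_between(a: Role, b: Role) -> str:
    """Return the kind of traffic exchanged between two different roles."""
    if a == b:
        raise ValueError("the text only describes traffic between different roles")
    return PATHS[frozenset({a, b})]


if __name__ == "__main__":
    print(traffic_between(Role.CLIENT, Role.OSN))
```

Note that bulk data never passes through the metadata server in this table, which is the property the description relies on when it argues that the metadata server is unlikely to become a bottleneck.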
The extensible object-based storage system is characterized in that, depending on performance and cost, the storage device interfaces are SCSI, FC, IDE, or SATA interfaces that implement a block-device-level protocol; the storage devices are disk drives with any of these interfaces; the network channel interfaces may be of the same or different types and are not limited to any particular kind of network; and each network interface may be attached to the same network or to different networks and accepts client requests using an object-based storage protocol.
The control method of the extensible object-based storage system of the invention comprises the following steps, in order:
(1) After the metadata server has finished initializing, the storage system receives a client request and authenticates it; a legitimate request that passes authentication proceeds to the next step, otherwise it is handled as an illegal client request;
(2) Determine the type of object the client requested and handle it accordingly:
A request on the root object goes to step (3), a write user object request goes to step (4), a read user object request goes to step (5), a request on a partition object goes to step (6), and a request on a collection object goes to step (7);
(3) Request on the root object:
A. Determine whether the root object is requesting a new registration; if not, perform the corresponding management of the object storage node (OSN),
B. If a new root object is being registered, format the OSN as an object-based storage logical unit,
C. Create the partition object,
D. The metadata server records the relevant information,
E. Request processing ends;
(4) Write user object request:
A. The metadata server returns to the client the mapping table from the user object to the OSNs, together with authorization information,
B. Using the mapping table, the client establishes network connections to the relevant OSNs and sends object-based write commands,
C. The OSN responds to the write user object command sent by the client,
D. The OSN establishes a connection to the metadata server and submits the metadata that changed as a result of processing the user object request,
E. The metadata server updates the relevant metadata and sends the OSN an acknowledgement that the metadata update succeeded,
F. Request processing ends;
(5) Read user object request:
A. The metadata server returns to the client the mapping table from the user object to the OSNs, together with authorization information,
B. Using the mapping table, the client establishes network connections to the relevant OSNs and sends object-based read commands,
C. The OSN responds to the read user object command sent by the client,
D. The OSN returns the result of reading the user object to the client,
E. Request processing ends;
(6) Request on a partition object:
A. Retrieve the partition object,
B. Return the requested information to the client,
C. Request processing ends;
(7) Request on a collection object:
A. Create or retrieve the collection object,
B. Return the requested information to the client,
C. Request processing ends.
The control method of the storage system of the invention is shown in Figure 4.
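Steps (1) and (2) amount to an authentication gate followed by a dispatch on the request type. The sketch below is a minimal illustration under stated assumptions: `RequestKind`, `STEP_FOR_KIND`, and `dispatch` are hypothetical names, and the function returns only the step number from the list above rather than performing any storage work.

```python
from enum import Enum


class RequestKind(Enum):
    ROOT = "root object"
    WRITE_USER = "write user object"
    READ_USER = "read user object"
    PARTITION = "partition object"
    COLLECTION = "collection object"


# Step numbers follow the routing rule in step (2) of the control method.
STEP_FOR_KIND = {
    RequestKind.ROOT: 3,
    RequestKind.WRITE_USER: 4,
    RequestKind.READ_USER: 5,
    RequestKind.PARTITION: 6,
    RequestKind.COLLECTION: 7,
}


def dispatch(authenticated: bool, kind: RequestKind) -> int:
    """Return the step that handles the request, rejecting unauthenticated clients."""
    if not authenticated:
        # Step (1): a request that fails authentication is treated as illegal.
        raise PermissionError("illegal client request")
    return STEP_FOR_KIND[kind]


if __name__ == "__main__":
    print(dispatch(True, RequestKind.WRITE_USER))
```

How an illegal request is actually handled is not specified in the text; raising an exception here is only one possible choice.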
The Scalable Object-based Storage System (SOSS) is a solution that provides storage services in units of storage objects. SOSS splits a traditional block-based file system into its two parts, the user-related part and the storage-related part. The user-related part offers users an object call interface through logical data structures (such as the metadata of directories and files) and is managed by the metadata server. The storage-related part maps the data of those directories and files onto the logical blocks of the underlying physical devices and is provided by the object storage nodes. In SOSS, the object is the logical unit of storage. It is uniquely identified by a global object ID and contains data, attributes, and methods. SOSS has four main components: the client (host or application server), the object storage node (Object-based Storage Node, OSN) built around a scalable object-based storage controller (Object-Based Storage Controller, OBSC), the object-based file system (Object-based File System, OFS), and the metadata server (Metadata Server, MS). Each part is designed as follows:
Client: applications running on the client see a file system with standard POSIX file system semantics. Special applications can access objects directly, bypassing the file system.
Object storage node (OSN): the object storage node is responsible for storing object data. Object-based storage can load the appropriate methods (rules or policies) according to the attributes of a stored object, which gives it intelligent processing capability, such as filtering data with an object method before transmitting it.
Metadata server (MS): manages multiple storage objects and the mapping of data onto multiple storage objects. Over the network, the MS performs management functions for the OSNs and access control for the clients; data transfer takes place between the clients and the OSNs.
Object-based file system (OFS): uses object metadata in place of inode data, and the MS is responsible for keeping object metadata consistent. OFS is based on the standard Linux ext2 file system, with journaling file system features added.
A storage object consists of data, attributes, and methods. SOSS uses object attribute pages to describe the characteristics of an object, such as its type, creation time, and size. An object's methods can be loaded or unloaded according to rules (or policies) applied to the object's attributes. There are four types of object: the root object, the partition object, the collection object, and the user object. An object storage node (OSN) has exactly one root object. A partition object contains zero or more user objects (or collection objects); its data area holds only a list of user object IDs, and its attributes include the number of user objects in the partition, the space they occupy, and so on. Collection objects are used for fast retrieval of user objects: a partition object can contain zero or more collection objects, and a user object can belong to zero or more collection objects. From the point of view of a partition object, collection objects and user objects have equal status. The vast majority of objects in an object storage node are user objects.
Every object is identified by a partition object ID and a user object ID. When both IDs are zero, the object is the root object; when the partition object ID is non-zero and the user object ID is zero, it is a partition object; when both IDs are non-zero, it is a user object or a collection object. One clear advantage of SOSS is its higher intelligence. In an SOSS storage system built from OSNs, the upper-layer, user-related part of a traditional file server is handled by the MS, while the lower-layer, storage-related part is moved down into the OSNs, and the device interface changes accordingly from a block-based or file-based interface to an object-based one. The upper layer of the file system is then responsible only for mapping logical names such as file names to object IDs, which reduces its load by about 90%, and the mapping from object IDs to disk blocks is done inside the OSN. The MS is therefore unlikely to become a bottleneck on the I/O request path. In addition, several metadata servers can work in parallel to remove any MS bottleneck, so the system is highly scalable.
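The identification rule for the four object types can be written as a short function. This is a sketch, assuming non-negative integer IDs; the function name `classify` is hypothetical. The text does not say how user objects and collection objects are told apart when both IDs are non-zero, nor what a zero partition ID with a non-zero user object ID means, so the sketch leaves both cases open.

```python
def classify(partition_id: int, user_object_id: int) -> str:
    """Classify an object from its (partition object ID, user object ID) pair."""
    if partition_id < 0 or user_object_id < 0:
        raise ValueError("IDs are assumed to be non-negative in this sketch")
    if partition_id == 0 and user_object_id == 0:
        return "root"
    if partition_id != 0 and user_object_id == 0:
        return "partition"
    if partition_id != 0 and user_object_id != 0:
        # The description does not state how these two types are distinguished.
        return "user or collection"
    # partition_id == 0 with a non-zero user object ID is not covered by the text.
    raise ValueError("combination not described in the source")


if __name__ == "__main__":
    print(classify(0, 0))
    print(classify(3, 0))
    print(classify(3, 17))
```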
The invention has the following features:
(1) The system scales well. Adding a new object storage node (OSN) only requires registering it with the metadata server (the registration process is described later). Because the metadata server (MS) manages only metadata, the added load is small, so a mass storage system can be expanded without affecting performance, and as the number of users grows the aggregate bandwidth of the system increases linearly.
(2) An object storage node can dynamically load the appropriate rules or policies as the application environment changes, enabling functions such as load balancing and hot-spot data migration.
(3) Based on the attributes of the storage objects, the objects can be organized across multiple object storage nodes so that the nodes operate with a high degree of parallelism. Besides parallel access operations, requests on different objects can also be transferred in parallel.
(4) Object metadata is managed centrally while object data is accessed in a distributed way, ensuring efficient management and high storage performance.
(5) Data is transferred directly between the object storage nodes and the users, which shortens the I/O path, reduces system latency, and raises the average data transfer rate.
In object-based storage, loading and unloading an object's operation methods according to its attributes (such as QoS requirements) makes the object storage node more intelligent. Multiple object storage nodes work together to make the storage system self-organizing and self-managing.
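Attribute-driven loading and unloading of methods might look like the following sketch. Everything here is an assumption made for illustration: the `StoredObject` class, the `apply_policy` function, the `qos` attribute values, and the choice of zlib compression as the loaded method are not taken from the patent, which does not define a concrete mechanism.

```python
import zlib


class StoredObject:
    """A storage object holding data, attributes, and loadable methods."""

    def __init__(self, data: bytes, attributes: dict):
        self.data = data
        self.attributes = dict(attributes)
        self.methods = {}  # method name -> callable applied when reading

    def load_method(self, name, fn):
        self.methods[name] = fn

    def unload_method(self, name):
        self.methods.pop(name, None)

    def read(self) -> bytes:
        # Apply every loaded method in load order before returning the data.
        result = self.data
        for fn in self.methods.values():
            result = fn(result)
        return result


def apply_policy(obj: StoredObject) -> None:
    """Hypothetical policy: compress on read when QoS signals low bandwidth."""
    if obj.attributes.get("qos") == "low_bandwidth":
        obj.load_method("compress", zlib.compress)
    else:
        obj.unload_method("compress")


if __name__ == "__main__":
    obj = StoredObject(b"a" * 1000, {"qos": "low_bandwidth"})
    apply_policy(obj)
    print(len(obj.read()) < len(obj.data))
```

The point of the sketch is that the policy inspects only the object's attributes, so the same node can treat different objects differently without any change on the client side.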
Brief description of the drawings
Figure 1 is a schematic diagram of the structure of the storage system of the invention;
Figure 2 is a schematic diagram of the structure of an object storage node built from an object-based storage controller;
Figure 3 is a schematic diagram of an embodiment of the storage system of the invention;
Figure 4 is a schematic diagram of the control method of the storage system of the invention.
Detailed description
As shown in Figure 1, the invention comprises I metadata servers 200.1 to 200.i and N scalable object storage nodes 900.1, 900.2, ..., 900.n. The metadata servers 200.1 to 200.i connect to the network 400 through metadata server network interfaces 221.1 to 221.i and, through the network, to the object storage nodes 900.1, 900.2, ..., 900.n. Each of the object storage nodes 900.1, 900.2, ..., 900.n connects to the network through its own k network interfaces 121.1 to 121.k, and the M clients 300.1 to 300.m connect to the network through client network interfaces 321.1 to 321.m. Using the metadata server network channels 220.1 to 220.i on one side and the client network channels 320.1 to 320.m on the other, the metadata servers 200.1 to 200.i authenticate and authorize the clients and return the mapping tables of the storage objects the clients requested. Data is transferred between the clients, through the client network channels 320.1 to 320.m, and the object storage nodes, through their respective k network channels 120.1 to 120.k. Using the metadata server network channels 220.1 to 220.i and the object storage nodes' respective k network channels 120.1 to 120.k, the metadata servers 200.1 to 200.i manage the metadata of the storage system.
Figure 2 shows a block diagram of the object storage node 900 of the invention. The object storage node 900 consists of an object-based storage controller 100 connected to storage devices 113 through a storage device interface module 112. The object-based storage controller 100 comprises an operation control module 140, the storage device interface module 112, and a network interface module 122, which are physically connected by a PCI-X bus 130.
The operation control module 140 comprises a CPU 142, RAM 141, and EPROM 143, and provides computing power, a runtime environment for methods, and scheduling capability.
The storage device interface module 112 consists of storage device interfaces 111.1 to 111.j and connects the controller to the storage devices. The storage devices are mainly disk drives with various interfaces, but are not limited to that form. Depending on performance and cost, the storage device interfaces are SCSI, FC, IDE, SATA, or similar interfaces that implement a block-device-level protocol.
The network interface module 122 consists of network channel interfaces 121.1 to 121.k. The network interfaces may be of the same or different types and are not limited to any particular kind of network. Each network interface may be attached to the same network or to different networks, and accepts client requests using an object-based storage protocol.
A storage system built from the object-interface storage nodes of the invention changes the traditional ways of managing and storing data. The fine-grained low-level data operations and management of a traditional file system or database system are moved into the object storage node 900, and some data processing formerly done by client applications is moved there as well. With extensible object attribute representation and flexible data organization, the object storage node 900 can be used to build storage systems for many kinds of application, such as Web service systems, database systems, and file service systems. With the operation methods of storage objects and a dynamic scheduling mechanism for those methods, it is easy to implement a storage system with self-management functions such as automatic recovery, automatic load balancing, and automatic hot-spot migration.
Figure 3 shows a specific application example of the invention. The metadata server 200.1 is an IBM xSeries 346 Type 8840 server (more metadata servers can be added as the application requires). It connects through a network interface adapter to a Gigabit Ethernet network 400 formed by a Cisco Catalyst 3750 Series gigabit switch. Four disks of model ST173404LC, 113.1, 113.2, 113.3, and 113.4, are attached to object-based storage controllers 100.1, 100.2, 100.3, and 100.4 to form object storage nodes, which together form an object storage network. Access requests from clients 300.1, 300.2, and 300.3 fall into four kinds: requests on root objects, partition objects, collection objects, and user objects. The first two kinds are completed at the metadata server 200.1. Client requests on user objects, including read user object and write user object operations, must first be authorized by the metadata server, which also supplies the mapping table of the object onto the object storage nodes. Operations on user objects involve large data flows, whereas the other requests are management operations on the object storage system and involve relatively little data. Following the system's security authentication mechanism and its control method for separating object data flows, the metadata server checks the identity and permissions of the client for every client request at run time and authorizes requests that are legitimate and valid.
The storage system processes object-based client requests as requests on root objects, partition objects, collection objects, and user objects. It should be emphasized that an object storage node (OSN) must register with the metadata server before it can receive and process object-based storage commands. Figure 4 shows the control flow of the storage system of the invention.
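The requirement that an OSN register before serving commands, together with the write flow of step (4), can be sketched as a pair of in-memory classes. This is a simplified illustration, not the patented implementation: the class and method names are hypothetical, network connections are replaced by direct method calls, authentication is omitted, and picking the first registered node is a placeholder for a placement policy the text does not describe.

```python
class MetadataServer:
    def __init__(self):
        self.registered = []   # OSN IDs, in registration order
        self.object_map = {}   # object ID -> metadata reported by the OSN

    def register(self, osn_id):
        if osn_id not in self.registered:
            self.registered.append(osn_id)

    def authorize_write(self, object_id):
        # Step (4)A: return a mapping entry for the client.
        if not self.registered:
            raise LookupError("no object storage node has registered")
        return {"object_id": object_id, "osn_id": self.registered[0]}

    def commit(self, object_id, osn_id, size):
        # Step (4)E: update metadata and acknowledge the OSN.
        self.object_map[object_id] = {"osn_id": osn_id, "size": size}
        return True


class ObjectStorageNode:
    def __init__(self, osn_id, ms):
        self.osn_id = osn_id
        self.ms = ms
        self.objects = {}
        self.registered = False

    def register(self):
        self.ms.register(self.osn_id)
        self.registered = True

    def write(self, object_id, data):
        if not self.registered:
            raise RuntimeError("OSN must register before accepting commands")
        self.objects[object_id] = data                               # step (4)C
        return self.ms.commit(object_id, self.osn_id, len(data))     # step (4)D


if __name__ == "__main__":
    ms = MetadataServer()
    osn = ObjectStorageNode("osn-1", ms)
    osn.register()
    grant = ms.authorize_write("obj-42")
    print(osn.write(grant["object_id"], b"payload"))
```

In this sketch the client's data goes straight to the node and only the size reaches the metadata server, mirroring the separation of data and metadata paths in the description.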
Claims (3)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CNB2005100191669A CN100367727C (en) | 2005-07-26 | 2005-07-26 | A scalable object-based storage system and its control method |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CNB2005100191669A CN100367727C (en) | 2005-07-26 | 2005-07-26 | A scalable object-based storage system and its control method |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN1728665A (en) | 2006-02-01 |
| CN100367727C (en) | 2008-02-06 |
Family
ID=35927689
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CNB2005100191669A Expired - Fee Related CN100367727C (en) | 2005-07-26 | 2005-07-26 | A scalable object-based storage system and its control method |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN100367727C (en) |
Cited By (21)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN101013381B (en) * | 2007-01-26 | 2010-05-19 | 华中科技大学 | Distributed lock based on object storage system |
| CN102073742A (en) * | 2011-01-31 | 2011-05-25 | 清华大学 | Mass object storage system and running method thereof |
| CN101247417B (en) * | 2008-03-07 | 2011-07-27 | 中国科学院计算技术研究所 | Double-layer metadata processing system and method |
| CN101170416B (en) * | 2006-10-26 | 2012-01-04 | 阿里巴巴集团控股有限公司 | Network data storage system and data access method |
| CN102360382A (en) * | 2011-10-13 | 2012-02-22 | 中国人民解放军国防科学技术大学 | High-speed object-based parallel storage system directory replication method |
| CN101674334B (en) * | 2009-09-30 | 2012-05-23 | 华中科技大学 | Access control method of network storage equipment |
| CN102567495A (en) * | 2011-12-22 | 2012-07-11 | 国网信息通信有限公司 | Mass information storage system and implementation method |
| CN101616174B (en) * | 2009-07-09 | 2012-07-11 | 浪潮电子信息产业股份有限公司 | Method for optimizing system performance by dynamically tracking IO processing path of storage system |
| CN101316273B (en) * | 2008-05-12 | 2012-08-22 | 华中科技大学 | Distributed safety memory system |
| CN101930361B (en) * | 2009-06-26 | 2013-10-09 | 中国电信股份有限公司 | Method and system for providing online data storage service |
| CN106713465A (en) * | 2016-12-27 | 2017-05-24 | 北京锐安科技有限公司 | Distributed storage system |
| CN107111481A (en) * | 2014-10-03 | 2017-08-29 | 新加坡科技研究局 | Distributed active hybrid storage system |
| CN107533517A (en) * | 2015-01-20 | 2018-01-02 | 乌尔特拉塔有限责任公司 | Object-based memory structure |
| CN113626525A (en) * | 2011-06-27 | 2021-11-09 | 亚马逊科技公司 | System and method for implementing scalable data storage services |
| US11231865B2 (en) | 2015-06-09 | 2022-01-25 | Ultrata, Llc | Infinite memory fabric hardware implementation with router |
| US11256438B2 (en) | 2015-06-09 | 2022-02-22 | Ultrata, Llc | Infinite memory fabric hardware implementation with memory |
| US11269514B2 (en) | 2015-12-08 | 2022-03-08 | Ultrata, Llc | Memory fabric software implementation |
| US11281382B2 (en) | 2015-12-08 | 2022-03-22 | Ultrata, Llc | Object memory interfaces across shared links |
| CN115119200A (en) * | 2022-08-29 | 2022-09-27 | 深圳慧城智联科技有限公司 | Information transfer method for 5G communication environment |
| US11573699B2 (en) | 2015-01-20 | 2023-02-07 | Ultrata, Llc | Distributed index for fault tolerant object memory fabric |
| US11579774B2 (en) | 2015-01-20 | 2023-02-14 | Ultrata, Llc | Object memory data flow triggers |
Cited By (32)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN101170416B (en) * | 2006-10-26 | 2012-01-04 | 阿里巴巴集团控股有限公司 | Network data storage system and data access method |
| US8953602B2 (en) | 2006-10-26 | 2015-02-10 | Alibaba Group Holding Limited | Network data storing system and data accessing method |
| CN101013381B (en) * | 2007-01-26 | 2010-05-19 | 华中科技大学 | Distributed lock based on object storage system |
| CN101247417B (en) * | 2008-03-07 | 2011-07-27 | 中国科学院计算技术研究所 | Double-layer metadata processing system and method |
| CN101316273B (en) * | 2008-05-12 | 2012-08-22 | 华中科技大学 | Distributed secure storage system |
| CN101930361B (en) * | 2009-06-26 | 2013-10-09 | 中国电信股份有限公司 | Method and system for providing online data storage service |
| CN101616174B (en) * | 2009-07-09 | 2012-07-11 | 浪潮电子信息产业股份有限公司 | Method for optimizing system performance by dynamically tracking IO processing path of storage system |
| CN101674334B (en) * | 2009-09-30 | 2012-05-23 | 华中科技大学 | Access control method of network storage equipment |
| CN102073742A (en) * | 2011-01-31 | 2011-05-25 | 清华大学 | Mass object storage system and running method thereof |
| US12393607B2 (en) | 2011-06-27 | 2025-08-19 | Amazon Technologies, Inc. | System and method for implementing a scalable data storage service |
| CN113626525A (en) * | 2011-06-27 | 2021-11-09 | 亚马逊科技公司 | System and method for implementing scalable data storage services |
| CN102360382A (en) * | 2011-10-13 | 2012-02-22 | 中国人民解放军国防科学技术大学 | High-speed object-based parallel storage system directory replication method |
| CN102567495A (en) * | 2011-12-22 | 2012-07-11 | 国网信息通信有限公司 | Mass information storage system and implementation method |
| CN102567495B (en) * | 2011-12-22 | 2013-08-21 | 国家电网公司 | Mass information storage system and implementation method |
| CN107111481A (en) * | 2014-10-03 | 2017-08-29 | 新加坡科技研究局 | Distributed active hybrid storage system |
| CN107533517A (en) * | 2015-01-20 | 2018-01-02 | 乌尔特拉塔有限责任公司 | Object-based memory structure |
| US11755201B2 (en) | 2015-01-20 | 2023-09-12 | Ultrata, Llc | Implementation of an object memory centric cloud |
| US11782601B2 (en) | 2015-01-20 | 2023-10-10 | Ultrata, Llc | Object memory instruction set |
| US11775171B2 (en) | 2015-01-20 | 2023-10-03 | Ultrata, Llc | Utilization of a distributed index to provide object memory fabric coherency |
| US11768602B2 (en) | 2015-01-20 | 2023-09-26 | Ultrata, Llc | Object memory data flow instruction execution |
| CN114741334A (en) * | 2015-01-20 | 2022-07-12 | 乌尔特拉塔有限责任公司 | General single level object memory address space |
| US11755202B2 (en) | 2015-01-20 | 2023-09-12 | Ultrata, Llc | Managing meta-data in an object memory fabric |
| US11573699B2 (en) | 2015-01-20 | 2023-02-07 | Ultrata, Llc | Distributed index for fault tolerant object memory fabric |
| US11579774B2 (en) | 2015-01-20 | 2023-02-14 | Ultrata, Llc | Object memory data flow triggers |
| US11231865B2 (en) | 2015-06-09 | 2022-01-25 | Ultrata, Llc | Infinite memory fabric hardware implementation with router |
| US11733904B2 (en) | 2015-06-09 | 2023-08-22 | Ultrata, Llc | Infinite memory fabric hardware implementation with router |
| US11256438B2 (en) | 2015-06-09 | 2022-02-22 | Ultrata, Llc | Infinite memory fabric hardware implementation with memory |
| US11281382B2 (en) | 2015-12-08 | 2022-03-22 | Ultrata, Llc | Object memory interfaces across shared links |
| US11269514B2 (en) | 2015-12-08 | 2022-03-08 | Ultrata, Llc | Memory fabric software implementation |
| US11899931B2 (en) | 2015-12-08 | 2024-02-13 | Ultrata, Llc | Memory fabric software implementation |
| CN106713465A (en) * | 2016-12-27 | 2017-05-24 | 北京锐安科技有限公司 | Distributed storage system |
| CN115119200A (en) * | 2022-08-29 | 2022-09-27 | 深圳慧城智联科技有限公司 | Information transfer method for 5G communication environment |
Also Published As
| Publication number | Publication date |
|---|---|
| CN100367727C (en) | 2008-02-06 |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| CN1728665A (en) | Expandable storage system and control method based on objects | |
| CN1304961C (en) | Memory virtualized management method based on metadata server | |
| CN1324450C (en) | Storage system, storage control device, and control method for storage control device | |
| Wang et al. | An efficient design and implementation of LSM-tree based key-value store on open-channel SSD | |
| US10452316B2 (en) | Switched direct attached shared storage architecture | |
| US10216418B2 (en) | Storage apparatus and method for autonomous space compaction | |
| CN1852318A (en) | Distributed multi-stage buffer storage system suitable for object network storage | |
| CN1771495A (en) | Distributed File Service Architecture System | |
| CN1871587A (en) | Bottom-up cache structure for storage servers | |
| CN1723434A (en) | Apparatus and method for scalable network attached storage system | |
| WO2013155751A1 (en) | Concurrent-OLAP-oriented database query processing method | |
| CN103873559A (en) | Database all-in-one machine capable of realizing high-speed storage | |
| CN1862476A (en) | Super large capacity virtual magnetic disk storage system | |
| CN1902578A (en) | Method and apparatus for controlling access to logical units | |
| CN104615577A (en) | Big data server | |
| CN1209714C (en) | Server system based on network storage and expandable system structure and its method | |
| CN117539398A (en) | Method, device, equipment and medium for managing volume mapping | |
| CN1220950C (en) | Controller for outer multi-channel network disc array and its protocol fitting method | |
| CN1447254A (en) | Networked mass storage device and implementation approach | |
| CN1228726C (en) | Massive memory system based on multi-channel memory equipment and its control method | |
| CN100527744C (en) | Intelligent network disc storage system and its realizing method | |
| CN1255731C (en) | Data management system in the internet storage system | |
| KR101470857B1 (en) | Network distributed file system and method using iSCSI storage system | |
| CN106201328A (en) | Method, device and the server of a kind of disk space managing memory node | |
| CN1331038C (en) | Storage controller based on object and dispatching method used thereof | |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | C06 | Publication | |
| | PB01 | Publication | |
| | C10 | Entry into substantive examination | |
| | SE01 | Entry into force of request for substantive examination | |
| | C14 | Grant of patent or utility model | |
| | GR01 | Patent grant | |
| | CF01 | Termination of patent right due to non-payment of annual fee | |
| | CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20080206 |
