
CN1267379A - Redundancy implementation on object oriented data storage device - Google Patents

Redundancy implementation on object oriented data storage device

Info

Publication number
CN1267379A
CN1267379A CN98808073A CN98808073A CN1267379A CN 1267379 A CN1267379 A CN 1267379A CN 98808073 A CN98808073 A CN 98808073A CN 98808073 A CN98808073 A CN 98808073A CN 1267379 A CN1267379 A CN 1267379A
Authority
CN
China
Prior art keywords
data
redundant
memory device
requestor
attribute
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN98808073A
Other languages
Chinese (zh)
Inventor
D·B·安德森
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SICHATER TEHC CO Ltd
Seagate Technology LLC
Original Assignee
SICHATER TEHC CO Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SICHATER TEHC CO Ltd filed Critical SICHATER TEHC CO Ltd
Publication of CN1267379A publication Critical patent/CN1267379A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/448 Execution paradigms, e.g. implementations of programming paradigms
    • G06F9/4488 Object-oriented
    • G06F9/4493 Object persistence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1446 Point-in-time backing up or restoration of persistent data
    • G06F11/1458 Management of the backup or restore process
    • G06F11/1461 Backup scheduling policy
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2211/00 Indexing scheme relating to details of data-processing equipment not covered by groups G06F3/00 - G06F13/00
    • G06F2211/10 Indexing scheme relating to G06F11/10
    • G06F2211/1002 Indexing scheme relating to G06F11/1076
    • G06F2211/1059 Parity-single bit-RAID5, i.e. RAID 5 implementations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2211/00 Indexing scheme relating to details of data-processing equipment not covered by groups G06F3/00 - G06F13/00
    • G06F2211/10 Indexing scheme relating to G06F11/10
    • G06F2211/1002 Indexing scheme relating to G06F11/1076
    • G06F2211/1066 Parity-small-writes, i.e. improved small or partial write techniques in RAID systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Detection And Correction Of Errors (AREA)
  • Techniques For Improving Reliability Of Storages (AREA)

Abstract

A data storage system (100) includes a storage medium (132) having stored thereon data configured as a plurality of objects (124-126), each object having attributes indicative of characteristics of the object (124-126). The objects (124-126) include a redundancy object (412) storing redundancy information. A control component (150) is operably coupled to the storage medium (132) and is configured to provide an interface (128) to the objects (124-126). The interface (128) exposes methods (0-N) which are invoked to access the objects (124-126).

Description

Redundancy implementation on object oriented data storage device
Technical field
The present invention relates to data storage devices. More particularly, the present invention relates to data storage devices, such as disc drives, tape drives, or optical drives, in which data is organized and accessed as objects.
Background art
Two conventional computing models are known in the computer industry. One is the mainframe computing model; the other is the clustered computing model.
In the mainframe computing model, an end user typically buys an initial system and, when more processing power is needed, replaces it with a larger system. The transitions in this cycle can cause painful discontinuities. For example, if the user outgrows the architecture of the initial system, the user may need to convert from one operating system to another, or even from one vendor's proprietary architecture to another's, when purchasing the upgraded mainframe system. Such changes are very costly in both money and employee time, so in many cases the conversion is avoided.
In addition, the mainframe model leaves computer equipment with poor residual value. When the initial system is replaced by an upgraded system, the conversion often results in a complete loss of the investment in the earlier system. Moreover, larger upgraded systems sell in lower volumes than smaller systems, so the price of an upgraded system is often higher than that of the earlier system.
In the clustered computing model, the mainframe is replaced by a cluster of smaller, standards-based servers. This offers many advantages over the mainframe. Because a cluster can start out as a single system, the threshold for entering the clustered model is lower. Such smaller systems are also usually sold in high volume, which lowers the cost of computing. Being standards-based, these systems do not depend on a proprietary architecture. This opens up multiple sources for equipment and lets the user choose the best product at each later purchase.
The clustered computing model has further advantages. Upgrade costs can be controlled more precisely by adding only the resources needed to meet current and near-term demand. The user can choose from a wide range of vendors without worrying about migrating or converting to a new architecture. Likewise, with the right architecture, a conversion to another operating system may never be needed.
The clustered computing model also has drawbacks. For example, it is difficult to build a clustered system that can share data while carrying the same workload as a single mainframe. A cluster in which every server processes transactions against the same data is typically very hard to implement.
Examples of such applications include an airline reservation system or a financial institution's complete inventory of transactions.
A second drawback of the clustered model is simply the lack of the extensive data management experience that exists in the mainframe environment. That experience has evolved into management software that is not yet available in the standards-based cluster environment.
The present invention addresses these and other problems, and offers other advantages over the prior art.
Summary of the invention
A data storage system includes a storage medium storing data configured as a plurality of objects, each object having attributes indicative of characteristics of the object. The objects include a redundancy object that stores redundancy information. A control component is operably coupled to the storage medium and is configured to provide an interface to the objects. The interface exposes methods that are invoked to access the objects.
Brief description of the drawings
Fig. 1 is a block diagram of a network attached storage system according to one aspect of the present invention.
Fig. 2 illustrates an object model according to one aspect of the present invention.
Fig. 3-1 is a block diagram of a first configuration in which a requester accesses an object on a storage device.
Fig. 3-2 is a block diagram of a second configuration in which a requester accesses an object on a storage device.
Fig. 4 is a perspective view of a disc drive according to one aspect of the present invention.
Fig. 5 is a functional block diagram illustrating a requester accessing an object.
Fig. 6 illustrates a portion of a storage medium partitioned according to one aspect of the present invention.
Figs. 7-1 and 7-2 are a flow chart of a requester accessing an object, according to one aspect of the present invention.
Fig. 8 is a flow chart of creating an object, according to one aspect of the present invention.
Fig. 9 is a flow chart of opening an object for update, according to one aspect of the present invention.
Fig. 10 is a flow chart of writing to an object, according to one aspect of the present invention.
Fig. 11 is a flow chart of opening an object as read-only, according to one aspect of the present invention.
Fig. 12 is a flow chart of reading an object, according to one aspect of the present invention.
Fig. 13 is a flow chart of closing an object, according to one aspect of the present invention.
Fig. 14 is a flow chart of deleting an object, according to one aspect of the present invention.
Fig. 15 is a flow chart of creating a partition, according to one aspect of the present invention.
Fig. 16 is a flow chart of deleting a partition, according to one aspect of the present invention.
Fig. 17 is a flow chart of exporting an object, according to one aspect of the present invention.
Fig. 18 is a flow chart of obtaining object attributes, according to one aspect of the present invention.
Fig. 19 is a flow chart of setting or modifying object attributes, according to one aspect of the present invention.
Fig. 20 is a flow chart of reading lock attributes, according to one aspect of the present invention.
Fig. 21 is a flow chart of setting lock attributes, according to one aspect of the present invention.
Fig. 22 is a flow chart of resetting lock attributes, according to one aspect of the present invention.
Fig. 23 is a flow chart of obtaining device associations, according to one aspect of the present invention.
Fig. 24 is a flow chart of setting device associations, according to one aspect of the present invention.
Fig. 25 is a block diagram of a disc drive array implemented according to one aspect of the present invention.
Fig. 26 is a block diagram of a target disc drive according to one aspect of the present invention.
Fig. 27 is a block diagram of a parity disc drive according to one aspect of the present invention.
Fig. 28 is a flow chart of creating a parity group, according to one aspect of the present invention.
Fig. 29 is a flow chart of a write operation in which parity information is updated, according to one aspect of the present invention.
Detailed description of the preferred embodiments
Fig. 1 is a block diagram of a data storage system 100 according to one aspect of the present invention. System 100 includes object oriented data storage devices 110 and 112, file server 114, requesters 116, 118 and 120, and interconnect 122. System 100 represents a network attached storage configuration, which can be built from equipment and software from many different vendors and appears to the user as a single computer system.
Object oriented storage devices 110-112 are the storage components that perform the data storage function of system 100. Storage devices 110-112 preferably include disc drives, redundant array of independent discs (RAID) subsystems, tape drives, tape libraries, optical drives, optical jukeboxes, or other shareable storage devices. Storage devices 110 and 112 also provide an input/output (I/O) channel to requesters 116, 118 and 120 for accessing devices 110 and 112.
Requesters 116, 118 and 120 are components, such as servers or clients, that share the information stored on devices 110 and 112. Requesters 116-120 are preferably configured to access the information on storage devices 110 and 112 directly.
File server 114 performs management and security functions, such as request authentication and resource location. In smaller systems, a dedicated file server is preferably not used; instead, one of requesters 116-120 takes on the responsibility for overseeing the operation of system 100 that file server 114 would otherwise perform. In addition, file server 114 can be removed from system 100 when the security and functionality it provides are not needed or desired, or when there is an overriding need for requesters 116-120 to communicate directly with storage devices 110 and 112.
In a preferred embodiment, interconnect 122 is the physical infrastructure over which all components of network attached storage system 100 communicate with one another.
In operation, when system 100 is powered up, all devices preferably identify themselves either to each other or to a common point of reference, such as file server 114 or interconnect 122. For example, in a Fibre Channel based system 100, object oriented storage devices 110 and 112 and requesters 116-120 log onto the fabric of the system. Once this is done, any component of system 100 that wishes to determine the operating configuration can use fabric services to identify all other components. From file server 114, requesters 116-120 learn of the existence of the storage devices 110 and 112 to which they can have access. Similarly, storage devices 110 and 112 learn where to go when they need to locate other devices in system 100, and the addresses needed to invoke a management service such as backup. Likewise, in a preferred embodiment, file server 114 learns of the existence of storage devices 110 and 112 from the fabric services.
Depending on the security practices of a particular system 100, any or all of requesters 116-120 may be denied access to some components of system 100. From the set of storage devices 110 and 112 available to each requester, the requester can identify the files, databases, and free space available to it.
At the same time, each component of system 100 preferably can identify to file server 114 any special considerations associated with it. For example, once a storage device has informed file server 114 of its level of service attributes, all other components of system 100 can learn those attributes from file server 114. As another example, a particular requester 116-120 may wish to be notified when additional storage is introduced after it has started up. Such an attribute can be provided, for example, when the requester logs onto file server 114. File server 114 then automatically informs that requester 116-120 whenever a new storage device is added to system 100. File server 114 typically can also convey other important characteristics to the requester, such as whether a storage device is RAID 5, mirrored, and so on.
According to one aspect of the present invention, the information stored on storage devices 110 and 112 is stored with the system illustrated in Fig. 2. Each of storage devices 110 and 112 is preferably an object oriented device that operates in a mode in which data is organized and accessed as objects 124-126 rather than as an ordered sequence of sectors. Object oriented devices 110 and 112 manage objects 124-126 with an object file system that includes a single level list of the objects for each partition on the particular device. This is also referred to as a flat file system. The objects 124-126 stored on the storage medium of each device 110 and 112 are preferably the smallest visible units of capacity allocation on a device 110 or 112 operating in object oriented device mode. An object on such a storage device includes an ordered set of sectors associated with a unique identifier. Data is referenced by the identifier and an offset into the object. Objects are allocated and placed on the storage medium by storage device 110 or 112 itself. The operating system manages its files and metadata as objects, rather than managing sectors of data as in prior architectures.
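The object model described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class and method names (`FlatObjectStore`, `create`, `write`, `read`) are hypothetical and do not come from the patent, and a growable byte buffer stands in for the ordered set of sectors behind each object.

```python
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    """One object: a byte extent addressed by offset (hypothetical model)."""
    data: bytearray = field(default_factory=bytearray)


class FlatObjectStore:
    """Single level object list for one partition; the device picks the IDs."""

    def __init__(self):
        self._objects = {}      # object ID -> StoredObject, no hierarchy
        self._next_id = 1

    def create(self):
        """Create an empty object and return the ID chosen by the device."""
        object_id = self._next_id
        self._next_id += 1
        self._objects[object_id] = StoredObject()
        return object_id

    def write(self, object_id, offset, payload):
        """Write payload at the given offset inside the object."""
        obj = self._objects[object_id]
        end = offset + len(payload)
        if end > len(obj.data):
            # The device, not the requester, allocates the extra space.
            obj.data.extend(b"\x00" * (end - len(obj.data)))
        obj.data[offset:end] = payload

    def read(self, object_id, offset, length):
        """Read length bytes starting at offset inside the object."""
        return bytes(self._objects[object_id].data[offset:offset + length])
```

The requester never sees sector addresses here: every call names an object ID and an offset, and placement is left to the store.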
Objects 124-126 are accessed through an interface 128 in which the objects expose a plurality of methods that requesters 116-120 can invoke to access and manipulate the attributes and data in objects 124-126. Thus, as shown in Fig. 2, a requester 116-120 issues a request 130. In a preferred embodiment, requesters 116-120 are computer systems, or members of a cluster or network of systems, that issue requests 130 to the storage device containing objects 124-126. A requester 116-120 can thus be either a client or a server. In either case, a request 130 issued by one of requesters 116-120 invokes one of the methods in interface 128, which in turn operates on one or more of objects 124-126, as described in detail later in this application.
Figs. 3-1 and 3-2 are block diagrams of two different configurations that can be used to access objects stored on storage devices 110-112. For simplicity, only a single requester 116 and a single object oriented storage device 110 are shown in Figs. 3-1 and 3-2. When requester 116 wishes to open an object (such as one of objects 124-126), requester 116 may be able to access storage device 110 directly, or it may need to request permission and location information from file server 114 in order to access an object on storage device 110. The extent to which file server 114 controls access to storage device 110 is a function of the security requirements of the particular implementation of system 100.
In the block diagram of Fig. 3-1, system 100 is assumed to be secure; that is, the transmission of command information and data between requester 116 and storage device 110 does not need to be protected. In such an implementation, there may still be a file server 114 with management functions, but it is not needed to oversee the interaction between requester 116 and storage device 110.
In such an implementation, requester 116 can access and create objects directly on storage device 110. Requester 116 can thus open, read, write, and close objects as if they were natively attached to requester 116. These operations are described in detail below; a brief overview is given here for clarity. To read an object on storage device 110, requester 116 preferably first reads one or more objects that reveal the logical volumes or partitions on storage device 110 and how to begin searching for the objects stored on it. Requester 116 then opens and reads an object, which may be a root directory. From this object, other objects are located based on the contents of the root directory. Requester 116 repeats the process until the desired data is located. Data is referenced by an object identifier (object ID) and an offset within the object.
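The repeated lookup in this overview can be sketched as a short loop. The sketch assumes, purely for illustration, that a directory object holds a JSON mapping from names to object IDs; the patent does not specify a directory encoding, and `find_object` and `read_object` are hypothetical names.

```python
import json


def find_object(read_object, root_id, path):
    """Follow directory objects from root_id along path; return the final ID.

    read_object is any callable that returns the bytes of an object given
    its ID. Each directory object is assumed (for this sketch only) to be
    a JSON mapping of entry name to object ID.
    """
    current = root_id
    for name in path:
        directory = json.loads(read_object(current))
        current = directory[name]   # KeyError if the entry does not exist
    return current


# Usage with an in-memory stand-in for the storage device.
objects = {
    1: b'{"home": 2}',          # root directory object
    2: b'{"notes.txt": 3}',     # directory object
    3: b"hello",                # data object
}
found = find_object(objects.__getitem__, 1, ["home", "notes.txt"])
```

Each loop iteration corresponds to one "open and read an object" step; the walk ends when the path is exhausted and the ID of the desired data object is known.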
In the second embodiment, shown in Fig. 3-2, security is required. File server 114 is therefore interposed in the I/O chain between requester 116 and storage device 110 to the degree necessary for the desired level of protection. In a preferred embodiment, requester 116 must first perform a sequence of I/O operations to request permission from file server 114. File server 114 (which may withhold storage location information from requester 116 for additional security) then grants the request by returning sufficient information to allow requester 116 to communicate directly with storage device 110. Because storage device 110 is preferably informed of the security parameters when it logs onto file server 114, storage device 110 preferably does not accept an I/O request unless it is properly constructed and includes encoded data containing valid permission from file server 114.
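One way to picture the permission check is a keyed message authentication code. This is an assumption made for illustration only: the patent says just that a request must carry encoded data with valid permission from the file server, and does not name HMAC or any other scheme. The function names and the message layout below are hypothetical.

```python
import hashlib
import hmac


def issue_permission(shared_key, requester, object_id):
    """File server side: sign a (requester, object ID) pair.

    shared_key stands in for the security parameters the device learns
    when it logs onto the file server (an assumption of this sketch).
    """
    message = f"{requester}:{object_id}".encode()
    return hmac.new(shared_key, message, hashlib.sha256).digest()


def device_accepts(shared_key, requester, object_id, permission):
    """Storage device side: accept the request only if the permission verifies."""
    expected = issue_permission(shared_key, requester, object_id)
    return hmac.compare_digest(expected, permission)


# Usage: the requester obtains a permission, then presents it to the device.
key = b"parameters-shared-at-logon"
token = issue_permission(key, "requester-116", 124)
```

With this arrangement the device can validate a request without contacting the file server again, which matches the direct requester-to-device communication described above.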
The process then proceeds in a manner similar to that described for Fig. 3-1. However, the payload associated with each command may be quite different. For example, where full security is required (as in Fig. 3-2), both the commands and the data transmitted between requester 116 and storage device 110 may be encrypted. In addition, permission information is preferably appended to the command parameters that requester 116 provides to storage device 110.
Because storage devices 110 and 112 can, in a preferred embodiment, include hard disc drives, a brief discussion of a disc drive is in order. Fig. 4 is a perspective view of a hard disc drive that can serve as storage device 110. In disc drive 110, a plurality of discs 132 are stacked about the hub of a motor assembly 134 within a housing 136. Each disc 132 has a number of concentric recording tracks, shown at 138. Each track 138 is further divided into a plurality of partitions (described in detail with reference to Fig. 6). Data can be stored on, or retrieved from, discs 132 by specifying a partition within a given track 138. An actuator arm assembly 140 is rotatably mounted, preferably in one corner of housing 136. Actuator arm assembly 140 carries a plurality of head gimbal assemblies 142, each of which carries a slider having a read/write head, or transducer (not shown), for reading information from and writing information onto discs 132.
A voice coil motor 144 precisely rotates actuator arm assembly 140 back and forth, moving the transducers on sliders 142 across the surfaces of discs 132 along the arc indicated by arrow 146. Fig. 4 also shows, in block diagram form, a disc drive controller 148, which controls certain operations of disc drive 110 in a known manner. According to the present invention, however, disc drive controller 148 is also used to implement interface 128 to the objects 124-126 stored on discs 132.
Fig. 5 is a block diagram of a portion of disc drive 110 as it fits into the system 100 shown in Fig. 1. In Fig. 5, disc drive controller 148 includes a control component 150 that implements interface 128. Objects 124-126 are stored on the storage medium formed by discs 132. Request component 152 is implemented on a requester 116-120 and formulates logical requests by invoking methods in interface 128. When a method is invoked, control component 150 carries out certain tasks to operate on the specified object in the specified manner. Control component 150 returns an event, which can include data or attributes associated with any specified object. The event is also returned according to the particular method invoked by requester 116-120.
For object oriented devices 110-112 to provide the same functionality that an operating system obtains from block oriented devices, the storage space on devices 110-112 must be manageable to a similar degree. Thus, in a preferred embodiment, an organizational layer is provided on storage devices 110-112 above the objects 124-126 stored on them. In a preferred embodiment, object oriented devices 110-112 allocate disc space into one or more mutually exclusive regions, called partitions. Partitions are described in detail with reference to Fig. 6. Within a partition, requesters 116-120 can create objects. In a preferred embodiment, the structure within a partition is a simple, flat organization. Onto this organization, any operating system can map its own structure.
Fig. 6 illustrates a portion of the storage space on a storage medium (such as discs 132). The storage space includes a number of objects, such as device control object 154, device association object 156, and a plurality of partitions labeled partition 0 (also designated 158), partition 1 (also designated 160), and partition N (also designated 162). Each partition in turn includes a number of objects, such as partition control object 164, partition object list 166, and a plurality of data objects 168 (labeled data object 0 through data object N).
A set of attributes is associated with each object. According to one aspect of the present invention, an access control attribute is provided. It is set with the Set Attribute method (described in detail below) and provides a means by which access to a particular object is controlled. By changing the version number of the access control attribute, certain requesters 116-120 can be denied, or given, access to the particular object.
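The effect of changing the version number can be pictured with a small sketch. How a request carries the version is not stated in this passage, so the comparison below is an assumption: a request is honoured only if it presents the object's current access control version. The class and method names are hypothetical.

```python
class AccessControlledObject:
    """Hypothetical object whose access control attribute carries a version."""

    def __init__(self):
        self.access_version = 1

    def set_access_version(self, version):
        # Stands in for a Set Attribute call on the access control attribute.
        self.access_version = version

    def allows(self, presented_version):
        # Assumption: a request is honoured only with the current version.
        return presented_version == self.access_version


# Usage: bumping the version revokes every permission issued for version 1.
obj = AccessControlledObject()
before = obj.allows(1)
obj.set_access_version(2)
after = obj.allows(1)
```

Under this reading, a single attribute update is enough to cut off requesters that hold stale permissions, without tracking them individually.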
The clustering attribute indicates whether the particular object should desirably be located near another object in the storage system. The cloning attribute indicates whether the particular object was created by copying another object in the storage system. A set of size attributes defines the size characteristics of the particular object. For example, the size attributes include information indicating the largest offset written in the object, the number of blocks allocated to the object, the number of blocks used to store data in the object, and the number of bytes per block in the object.
A set of time attributes indicates when the object was created, when the data in the object was last modified, and when the attributes of the object were last modified. The object preferably also includes a set of attributes that define the last time any data in the file system was modified and the last time any attribute was modified. Other attributes can also be provided to indicate other parameters, characteristics, or features of any given object.
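The clustering, cloning, size, and time attributes can be gathered into one record, as in the sketch below. The field names, the default of 512 bytes per block, and the dense-write assumption in `record_write` are illustrative choices, not details taken from the patent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ObjectAttributes:
    """Per-object attributes; all names here are hypothetical."""
    created: float                        # time the object was created
    data_modified: float                  # last time the data changed
    attributes_modified: float            # last time an attribute changed
    cluster_near: Optional[int] = None    # clustering: ID of a preferred neighbour
    cloned_from: Optional[int] = None     # cloning: ID of the copied object
    largest_offset: int = -1              # largest byte offset written so far
    blocks_allocated: int = 0
    blocks_used: int = 0
    bytes_per_block: int = 512            # assumed default for this sketch

    def record_write(self, offset, length, now):
        """Update size and time attributes after a write of length bytes."""
        if length <= 0:
            return
        self.largest_offset = max(self.largest_offset, offset + length - 1)
        # Assumes the object is dense from offset 0 to largest_offset.
        self.blocks_used = self.largest_offset // self.bytes_per_block + 1
        self.blocks_allocated = max(self.blocks_allocated, self.blocks_used)
        self.data_modified = now


# Usage: a 513-byte write at offset 0 spills into a second 512-byte block.
attrs = ObjectAttributes(created=0.0, data_modified=0.0, attributes_modified=0.0)
attrs.record_write(0, 513, now=5.0)
```

Keeping these fields on the device side reflects the point made in the text: the device, not the operating system, maintains the bookkeeping for each object.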
Each object is also associated with an object identifier, which is chosen by the particular storage device 110-112 and returned to requester 116-120 in response to the command to create the object. The identifier is preferably an unsigned integer of fixed length. In a preferred embodiment, the length of the identifier defaults to a size specified by the particular storage device 110-112 and can be set as a device attribute. Further, in a preferred embodiment, a predetermined subset of identifiers (IDs) is reserved for well known objects, special uses, and other special functions that may be desired.
Fig. 6 shows that the storage medium typically includes a number of well known objects, which always have a specific object ID. In some cases, such well known objects exist on every device or in every partition.
For example, one such well known object is device control object 154, which preferably contains attributes maintained by each device 110-112 that relate to the device itself or to all objects on the device. The attributes are maintained by the Set Attribute method, which is described in detail later in this application. In a preferred embodiment, there is one device control object 154 per device 110-112.
Table 1 illustrates a preferred set of device control object (DCO) attributes.
Type               Name               Bytes  Semantics
Security           Clock              8      Monotonic counter
                   Master key         8      Master key; controls the device key
                   Device key         8      Device key; controls the partition keys
                   Protection level   1      Defines the protection options
Partitions         Partition count    1      Number of partitions on the device
Device attributes  Object attributes  8      Defines properties associated with all objects on the device
In a preferred embodiment, the DCO attributes include a clock, which is simply a monotonic counter; a master key, which includes an encryption key or other master key that controls all other keys on the device; and a device key, which controls the partition keys and can be used to lock partitions. The attributes also include a protection level key, which identifies a predetermined protection level and the associated security policy; a partition count, which defines the number of partitions on the device; and object attributes, which define properties associated with all objects on the particular device being accessed.
To adequately manage objects that span multiple storage devices 110-112, each storage device 110-112 preferably also includes a device association object 156 that defines associations among the devices. For example, where storage devices 110 and 112 are a mirrored pair, or members of an arrayed set, device association object 156 identifies this relationship. Table 2 illustrates preferred attributes of device association object 156.
Table 2
Name                    Bytes  Semantics
Association identifier  2      Unique ID of this set
Association type        2      Kind of association
Membership list         n
Association identifier  2
Association type        2
Membership list         n
These attributes preferably include an association identifier, which is a unique identifier for each given set of associated devices. The attributes preferably also include an association type, which defines the kind of association among the devices (for example, mirrored pair, RAID 5, and so on). The attributes preferably further include a membership list, which simply identifies the devices 110-112 that are members of the association defined above.
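The three association attributes map naturally onto a small record. The sketch below is illustrative only: the type name, the use of a string for the association type, and the `partners` helper are hypothetical, and device reference numerals from the figures stand in for real device addresses.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DeviceAssociation:
    """One entry of a device association object (hypothetical layout)."""
    association_id: int          # unique ID of this set (2 bytes in Table 2)
    association_type: str        # kind of association, e.g. "mirrored pair"
    members: Tuple[int, ...]     # membership list: devices in the set


def partners(association, device):
    """Return the other members of the association that device belongs to."""
    if device not in association.members:
        raise ValueError("device is not a member of this association")
    return [member for member in association.members if member != device]


# Usage: devices 110 and 112 form a mirrored pair.
mirror = DeviceAssociation(1, "mirrored pair", (110, 112))
others = partners(mirror, 110)
```

A device that receives a write can consult such a record to find which other devices must also be updated, which is the kind of cross-device management the association object is meant to support.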
Each partition 158, 160 and 162 on storage devices 110-112 preferably also includes a partition control object 164 containing the attributes of that single partition. Object 164 preferably describes not only the partition but also any object attributes that pertain to all objects in the partition. Each device 110-112 preferably includes one partition control object 164 for each partition defined on the device. Although Fig. 6 shows the partition control object stored within each partition, this need not be the case. The partition control objects can instead be stored in the flat file system rather than within the partitions themselves.
Table 3 shows a number of attributes that are typically included in partition control object 164.
Type                  Name                  Bytes  Semantics
                      Master key            8      Encryption key
                      Current working key   8
                      Previous working key  8
Partition attributes  Object attributes     8      Defines properties associated with all objects in the partition
These attributes preferably include a master key, which defines the encryption key for the entire partition and can be used to set the current working key. The attributes preferably also include a current working key and a previous working key, which are used to encrypt and decrypt command and data messages. Partition control object 164 preferably also includes object attributes associated with all objects in the specified partition.
Fig. 6 illustrates that also each subregion preferably comprises zone object table 166, and it is when creating the division on medium, the object of being set up by control assembly 150.Zone object table 166 preferably has identical identifier to each subregion, specifies in the starting point of navigation object file system on the medium.The attribute list that table 4 explanation is related with each zone object table.
Table 4
Field Byte
Object ID ??8 Open, reading and writing, close the ID that this object is used
User data ??N Be provided with the POL attribute, obtain data value with GET ATTRIBUTE
As shown in Table 4, this object preferably contains a list of the object identifiers (object IDs) of all objects in the partition, together with the amount of user space allocated to each object. The requestor uses an object identifier to open, read, write and close the object. In addition, the user can preferably allocate user space for each object ID by setting the user data attribute in the partition object list. Following partition object list 166, each partition preferably contains a plurality of data objects 168. Each data object 168 preferably includes one or more of the attribute sets listed in Table 1, as determined by the particular implementation of the data storage system.
Object-oriented storage devices 110-112 preferably support requests from requestors 116-120 to provide or store data. In addition, storage devices 110-112 preferably assume functions that, in a conventional architecture, would be performed by other components, most likely the operating system. Space management and maintenance of the attributes associated with objects on devices 110-112 are preferably performed by devices 110-112 themselves. These functions are preferably carried out by invoking methods supported at interface 128, which is implemented by control assembly 150 in each storage device 110-112. A number of the methods that can be invoked are discussed in detail below. To aid understanding of these methods, however, Figs. 7-1 and 7-2 provide a flow diagram illustrating navigation of the object-oriented file system according to one aspect of the present invention. It is believed that discussing Figs. 7-1 and 7-2 before the individual methods will make the invention easier to understand.
Figs. 7-1 and 7-2, which extend from block 170 to block 204, illustrate how an object is located in a specified partition of one of storage devices 110-112. First, requestor 116 obtains the device attributes in device control object 154. This is indicated by block 172. Invoking the Get_DCO_Attributes method causes control assembly 150 to return the attributes stored in device control object 154. This is indicated by block 174. Based on the attributes returned from device control object 154, requestor 116 then selects a given partition. This is indicated by block 176.
Requestor 116 also invokes the Get_DAO_Attributes method, as indicated by block 173. This causes control assembly 150 to obtain the attributes from device association object 156 stored on storage device 110. Control assembly 150 then returns the device association attributes to requestor 116, as indicated by block 175. Based on the device association attributes and the device control attributes, requestor 116 selects the partition to interrogate, as indicated by block 176.
Requestor 116 then invokes the Get_PCO_Attributes method, which causes control assembly 150 to locate the attributes in partition control object 164 associated with the partition that requestor 116 wishes to interrogate. Control assembly 150 obtains and returns the partition control object attributes. This is indicated by blocks 178 and 180. If the objects in the selected partition are not the objects of interest to the requestor, the requestor can select another partition, as indicated by blocks 182 and 176.
Assuming, however, that requestor 116 has found a partition of interest, the requestor invokes the Get_POL_Attributes method for the selected partition, as indicated by block 184. This method causes control assembly 150 to obtain the attributes from partition object list 166 associated with the selected partition. These attributes are then provided to requestor 116, as indicated by block 186.
Next, requestor 116 invokes the Open_Read_Only_POL method, as indicated by block 188. As discussed in detail below, control assembly 150 obtains the data stored in partition object list 166 associated with the selected partition, but because the data is provided on a read-only basis, it cannot be modified or extended. This is indicated by block 190.
The requestor then invokes the Read_POL method, which causes control assembly 150 to obtain the object list of the selected partition and provide it to requestor 116 for review. This is indicated by block 194. After choosing the desired object in the selected partition, requestor 116 invokes the Close_POL method, which causes control assembly 150 to close the partition object list. This is indicated by block 196.
Having found the ID of the desired object or objects, the requestor then invokes an Open_***_Objectx method, where *** denotes the particular open method that the requestor invokes according to the data operation desired, and Objectx denotes the ID of the object, selected from the partition object list, that the requestor wishes to operate on or access. The symbol *** can stand, for example, for an Open_Update operation or an Open_Read_Only operation. These operations are discussed below, and this step is indicated by block 198.
The requestor then performs the desired operation on the object returned by control assembly 150. This is indicated by block 200. The various methods that can be used to operate on objects are discussed in detail below.
Finally, once the requestor has completed the desired operation on, or access to, the desired object, requestor 116 invokes the Close_Objectx method, described in detail below, which closes the object that requestor 116 was accessing.
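The navigation sequence of Figs. 7-1 and 7-2 can be sketched roughly as follows. This is only an illustrative sketch: the class MockDevice, its Python method names, the sample partition contents and the helper find_and_read are hypothetical stand-ins for the methods named in the text, not an interface defined by the patent.

```python
class MockDevice:
    """Hypothetical in-memory stand-in for an object-oriented storage device."""

    def __init__(self):
        # partition ID -> {object ID -> data}; the contents are made-up samples
        self.partitions = {1: {10: b"alpha", 11: b"beta"}, 2: {20: b"gamma"}}
        self.open_objects = set()

    def get_dco_attributes(self):
        # Device control object: here it simply lists the partitions.
        return {"partitions": sorted(self.partitions)}

    def get_pco_attributes(self, partition_id):
        # Partition control object: attributes describing one partition.
        return {"partition_id": partition_id,
                "object_count": len(self.partitions[partition_id])}

    def read_pol(self, partition_id):
        # Partition object list: the IDs of every object in the partition.
        return sorted(self.partitions[partition_id])

    def open_read_only(self, partition_id, object_id):
        self.open_objects.add((partition_id, object_id))

    def read_object(self, partition_id, object_id):
        if (partition_id, object_id) not in self.open_objects:
            raise PermissionError("object is not open")
        return self.partitions[partition_id][object_id]

    def close_object(self, partition_id, object_id):
        self.open_objects.discard((partition_id, object_id))


def find_and_read(device, wanted_id):
    """Walk device -> partition -> object list -> object, then close it."""
    for pid in device.get_dco_attributes()["partitions"]:
        device.get_pco_attributes(pid)            # inspect the partition
        if wanted_id in device.read_pol(pid):     # search the object list
            device.open_read_only(pid, wanted_id)
            try:
                return device.read_object(pid, wanted_id)
            finally:
                device.close_object(pid, wanted_id)
    return None                                   # no partition holds the object
```

The requestor never sees block addresses in this sketch; it works only with partition IDs and object IDs, which is the point of the navigation scheme.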
Figs. 8-24 are flow diagrams illustrating various exemplary methods that a requestor can invoke to carry out desired functions and operations on objects stored in an object-oriented storage device (such as device 110).
Fig. 8 is a flow diagram illustrating the Open_Create_Object method. When requestor 116 invokes this method, as indicated by block 208, control assembly 150 creates a new object ID and enters it in the partition object list associated with the partition in which the object is created. This is indicated by block 210. Control assembly 150 then creates the new object by allocating the number of blocks associated with the object, and so on, and by modifying the object attributes to indicate the time of object creation and to set the other attributes listed in Table 1 that are associated with the object. This is indicated by block 212. Control assembly 150 then returns the request status together with the ID of the newly created object. This is indicated by block 214.
Beyond simply creating an object, requestor 116 can specify a number of options. For example, in a preferred embodiment, requestor 116 can specify: whether the object is password protected; whether the object is encrypted; certain quality-of-service thresholds (for example, whether the object is backed up); lock characteristics (for example, whether the object is locked by an object lock or by another lock, such as a partition lock or device lock); the access control version; mirroring or other backup support (which causes all updates to be mirrored to another object or backed up in another specified manner); that space is to be allocated in units of a specified minimum size; and the collision characteristics to be set (for example, writing in a UNIX system).
To invoke this method, requestor 116 provides control assembly 150 with specific information, including the permission information required for system security, the partition of the device in which the object is to be created, and any of the options mentioned above. In response, in one illustrative embodiment, control assembly 150 returns the capacity available on the device, the request status and the ID of the new object.
It should be noted that a particular instance of this method can be invoked that includes all of the data associated with the object. In that case, a single method call can create the object, write to the object and close the object.
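A minimal sketch of the Open_Create_Object behaviour described above is given below, under stated assumptions: the Partition class, the attribute names ("created", "length", "in_use") and the "OK" status string are hypothetical, and block allocation is omitted.

```python
import itertools
import time


class Partition:
    """Hypothetical partition holding a partition object list."""

    def __init__(self):
        self.object_list = {}                 # object ID -> attribute dict
        self._next_id = itertools.count(1)    # simple ID generator (assumed)

    def open_create_object(self, options=None, data=None):
        # Create a new object ID and enter it in the partition object list.
        object_id = next(self._next_id)
        attrs = {"created": time.time(), "length": 0, "in_use": True}
        attrs.update(options or {})           # e.g. encryption or lock options
        self.object_list[object_id] = attrs
        if data is not None:
            # Combined variant: create, write and close in a single call.
            attrs["data"] = bytes(data)
            attrs["length"] = len(data)
            attrs["in_use"] = False
        # Return the request status together with the new object ID.
        return "OK", object_id
```

Passing data in the same call corresponds to the combined create, write and close case mentioned in the text; without data, the object is left open for later writes.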
Fig. 9 is a flow diagram illustrating the Open_Update_Objectx method. When requestor 116 invokes this method, as indicated by block 220, requestor 116 is permitted to read and write the specified object. The length of the object can also be extended. When the method is invoked, control assembly 150 sets an attribute in the specified object to indicate that the object is in use. Requestor 116 provides permission information, the ID of the partition containing the object, the identifier of the object to be accessed, the type of operation to be performed (such as update or write) and any of the options mentioned above. In response, control assembly 150 returns the request status, the length of the specified object and the remaining capacity available to requestor 116.
Fig. 10 is a flow diagram illustrating the Write_Object method. When the requestor invokes this method, as indicated by block 242, control assembly 150 writes a specified number of blocks at a specified location in the specified object.
The write method can also cause other methods to be invoked. For example, if the devices 110-112 being accessed support parity, a write operation can automatically invoke an XOR method, which performs an exclusive-OR operation on the data being written; the resulting parity data is then written to one or more previously specified parity devices.
To invoke this method, requestor 116 provides permission information, the object identifier, the partition ID, the starting block location within the object at which to write, the number of blocks to be written to the object, optional information and the data to be written. Once the method has been invoked, control assembly 150 modifies the specified object with the data provided. This is indicated by block 244. Control assembly 150 then modifies the necessary attributes of the specified object, such as the object length, the time stamps associated with the object, and so on. This is indicated by block 246. Control assembly 150 then modifies the necessary attributes of other objects where required, such as the affected entries in the partition object list. This is indicated by block 248. Control assembly 150 then returns the request status to the requestor. This is indicated by block 250.
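The parity update that a write can trigger may be illustrated as follows. This is a sketch of the standard read-modify-write parity rule (new parity = old parity XOR old data XOR new data), assumed here for illustration; the function names and the use of in-memory bytearrays in place of drives are hypothetical.

```python
def xor_bytes(a, b):
    """Bytewise exclusive-OR of two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("operands must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def write_with_parity(target, parity, offset, new_data):
    """Overwrite part of `target` and fold the change into `parity`.

    `target` and `parity` are bytearrays of equal length that stand in for
    the data object and the parity object, respectively.
    """
    end = offset + len(new_data)
    if len(target) != len(parity) or offset < 0 or end > len(target):
        raise ValueError("write does not fit the target/parity buffers")
    old_data = bytes(target[offset:end])
    delta = xor_bytes(old_data, new_data)     # bits that change in the data
    target[offset:end] = new_data
    # Flipping the same bits in the parity keeps data XOR parity consistent.
    parity[offset:end] = xor_bytes(bytes(parity[offset:end]), delta)
```

Only the old data, the new data and the old parity are needed, so the parity can be kept current without reading the other data drives in the group.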
Fig. 11 is a flow diagram illustrating the Open_Read_Only_Objectx method. When this method is invoked, control assembly 150 permits requestor 116 to access the specified object for reading only. Thus, when invoking this method, as indicated by block 230, the requestor provides permission information, the partition ID, the object ID and optional information. Control assembly 150 then sets an attribute in the specified object to indicate that the object is in use. This is indicated by block 232. Control assembly 150 then sets a read-only attribute on the object, indicating that the requestor cannot write to it. This is indicated by block 234. Finally, control assembly 150 returns the request status and the length of the specified object. This is indicated by block 236.
Fig. 12 is a flow diagram illustrating the Read_Objectx method. Requestor 116 invokes this method when it wants device 110 to return data from a specified object. The requestor provides permission information, the object ID, the partition ID, the starting location of the blocks to be read, the number of blocks to be read and any other optional information desired. In response, control assembly 150 returns the request status, the length of the data being returned and the actual data returned in response to the method. This is indicated by block 258.
Fig. 13 is a flow diagram illustrating the Close_Objectx method. When requestor 116 invokes this method, as indicated by block 264, the requestor provides permission information, the object ID and any optional information desired. In response, control assembly 150 modifies the data in the specified object, as indicated by block 266. That is, any changes to the object that result from writes to the object are written to the medium at this time if they have not been written already. Control assembly 150 also updates the attributes of object x, as indicated by block 268. For example, if the object is newly created, its attributes are updated with the creation time and other required attribute information. In addition, the attributes are modified to indicate the time at which the data in the object was last modified and, if it changed, the length of the data, and control assembly 150 sets an attribute indicating that the requestor is no longer using the object. Optionally, control assembly 150 can also update the cache-residency information associated with the object and reflected in the object attributes. This is indicated by block 270. For example, if a particular requestor 116 is configured to inform storage device 110 whether the data of a closed object should remain cached or need no longer be cached, the operating system of storage device 110 can retain cache information for applications in which objects are closed and reopened in quick succession. At the same time, storage device 110 can keep track of which components in system 100 need to be notified of a related collision event because another requestor has accessed the same object in the meantime. Finally, control assembly 150 returns the request status, as indicated by block 270.
Fig. 14 is a flow diagram illustrating the Remove_Objectx method. When this method is invoked, as indicated by block 278, control assembly 150 takes the steps necessary to delete the object from the medium. This is indicated by block 280. Control assembly 150 then modifies the partition object list associated with the partition from which the object was deleted to reflect that the specified object ID is available for reuse. This is indicated by block 282. To invoke this method, requestor 116 provides permission information, the partition ID, the object ID and any optional information desired. Finally, control assembly 150 returns the request status, as indicated by block 284.
Fig. 15 is a flow diagram illustrating the Create_Partitionx method, which a requestor can invoke, as indicated by block 290, to create a partition on storage device 110. It should be noted that although the Create_Partitionx method divides the drive into one or more regions, it need not account for all of the space on the medium. In addition, a partition can span different zones on the disk.
In one embodiment, this method is used to create partitions in a tiled arrangement, in which the partitions represent a true division of the storage space on the device. This arrangement is used to divide the space by service level, such as data arrays. Such partitions cannot be resized, but they can be deleted and re-created.
According to another aspect of the present invention, partitions are used as logical partitions to organize objects logically, rather than to manage space by service level. In this second embodiment, partitions can be resized dynamically.
To invoke this method, the requestor provides permission information, any optional information desired, a partition ID and an initial space allocation that identifies the space to be set aside for the specified partition. In response, control assembly 150 allocates space on the medium to the specified partition, as indicated by block 292. Control assembly 150 then builds the partition control object and the partition object list, as indicated by blocks 294 and 296. As noted above, the partition object list cannot be deleted; it serves as the starting point for navigating the objects in the partition. Finally, control assembly 150 returns the request status and a partition map describing the partitioning that has been carried out. This is indicated by block 298.
Fig. 16 is a flow diagram illustrating the Remove_Partitionx method. To invoke this method, requestor 116 provides permission information, optional information and the partition ID identifying the partition to be removed. This is indicated by block 304. In response, control assembly 150 deallocates the space previously associated with the partition, as indicated by block 306. Control assembly 150 then deletes all objects in the partition object list associated with the removed partition, deletes the partition object list and deletes the partition control object. This is indicated by blocks 308, 310 and 312. Finally, control assembly 150 returns the request status and a partition map showing the changed partitioning. This is indicated by block 314.
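The partition creation and removal steps above can be sketched as follows. The sketch makes several assumptions that are not in the source: space is counted in abstract blocks, the partition map is a plain dictionary of partition ID to size, and the status strings "OK" and "ERROR" are hypothetical.

```python
class PartitionedDevice:
    """Hypothetical device that allocates space to partitions by block count."""

    def __init__(self, capacity_blocks):
        self.capacity = capacity_blocks
        # partition ID -> {"blocks": size, "pco": control object, "pol": list}
        self.partitions = {}

    def free_blocks(self):
        used = sum(p["blocks"] for p in self.partitions.values())
        return self.capacity - used

    def partition_map(self):
        return {pid: p["blocks"] for pid, p in self.partitions.items()}

    def create_partition(self, partition_id, blocks):
        if partition_id in self.partitions or blocks > self.free_blocks():
            return "ERROR", self.partition_map()
        # Allocate space, then build the partition control object (pco)
        # and the partition object list (pol).
        self.partitions[partition_id] = {"blocks": blocks, "pco": {}, "pol": {}}
        return "OK", self.partition_map()

    def remove_partition(self, partition_id):
        part = self.partitions.pop(partition_id, None)   # deallocate space
        if part is None:
            return "ERROR", self.partition_map()
        part["pol"].clear()    # delete every object listed in the partition
        part["pco"].clear()    # delete the partition control object
        return "OK", self.partition_map()
```

Both calls return the request status together with the resulting partition map, mirroring blocks 298 and 314.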
According to one aspect of the present invention, data management policies are communicated to each storage device 110-112, so that the storage devices can carry out the management policies while operating independently of one another. This offers a significant advantage: not only is less human intervention required, but management control becomes more predictable and more timely.
For example, it may be desirable to back up the data on storage devices 110-112 weekly. Conventional systems are typically backed up during idle periods at the weekend, so that system availability is not interrupted during the working week. As system capacity keeps growing, however, this window of availability shrinks. It thus becomes difficult to find a period of system downtime long enough to back up data that may amount to terabytes.
According to one aspect of the present invention, object-oriented storage devices 110-112 decide how to act on an object based on the attributes assigned to it, and can notify the backup function whenever an object reaches a correct state for being backed up. Moreover, the backup of all files can be spread over a longer period, during which the remaining files continue to be updated, without affecting data integrity.
Other examples of attributes that can invoke action by object-oriented storage devices 110-112 include encryption, compression, versioning and parity redundancy. In each case, storage devices 110-112 need only be informed of the policy that applies to a specified object or group of objects. The device can then either perform the function itself or notify an agent that provides the service.
For example, storage devices 110-112 can perform compression and encryption themselves. The device therefore need only be informed that compression or encryption is required for an object. For management functions performed by an agent, the storage device must be informed not only of the management policy but also of how to identify the agent that performs the function, so that the storage device can reach the agent when the function is to be performed.
According to one aspect of the present invention, associations are established among objects so that objects having the same attributes, or objects that are related to one another, can be identified. For example, suppose that a database comprises six files or objects, none of which can be backed up until all of them have been closed, or until one designated as the object on which the others depend has been closed. File server 114 may be needed to manage this kind of relationship among objects. In addition, when the present invention is used to provide parity in an array, associations among devices are also established. By forming such groups, in which knowing the characteristics of one device or object implies that the remaining devices or objects in the group have substantially the same characteristics, management of the group becomes more efficient.
Figs. 17-24 are flow diagrams illustrating management functions that are carried out by invoking methods exposed by objects on the storage devices. Invoking these methods causes control assembly 150 and/or related control assemblies to take steps to carry out the management function associated with the invoked method.
Fig. 17 is a flow diagram illustrating the Export_Objectx method. As indicated by block 320, requestor 116 invokes this method by providing permission information, optional information, the object ID, the target device ID and the target partition ID. The export method causes storage devices 110-112 to act according to rules expressed by the attributes associated with a given object. For example, it can be used to initiate a backup or to support versioning of the object onto other devices.
When the Export_Objectx method is invoked, control assembly 150 obtains the specified object from the medium, as indicated by block 322. Control assembly 150 then invokes the Open_Create method on the target device specified by requestor 116. This is indicated by block 324. Control assembly 150 then invokes the write method on the target device, providing the data and attributes of the specified object. This is indicated by block 326. Control assembly 150 next invokes the Close method on the target device, so that the object on the target device is closed once it has been written. This is indicated by block 328. Finally, control assembly 150 returns to the requestor the request status together with the new object ID of the object written to the target device. This is indicated by block 330.
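The export sequence (obtain the object, then Open_Create, write and Close on the target) can be sketched as below. The SimpleDevice class, its dictionary-based object store and the "OK" status string are hypothetical; permission checks and partition IDs are left out to keep the sketch short.

```python
class SimpleDevice:
    """Hypothetical device exposing only the methods that export needs."""

    def __init__(self):
        # object ID -> {"data": bytes, "attrs": dict, "open": bool}
        self.objects = {}
        self._next_id = 1

    def open_create(self):
        object_id = self._next_id
        self._next_id += 1
        self.objects[object_id] = {"data": b"", "attrs": {}, "open": True}
        return object_id

    def write(self, object_id, data, attrs):
        obj = self.objects[object_id]
        if not obj["open"]:
            raise PermissionError("object must be open for writing")
        obj["data"] = bytes(data)
        obj["attrs"] = dict(attrs)

    def close(self, object_id):
        self.objects[object_id]["open"] = False

    def export_object(self, object_id, target):
        source = self.objects[object_id]          # obtain the object
        new_id = target.open_create()             # Open_Create on the target
        target.write(new_id, source["data"], source["attrs"])  # write method
        target.close(new_id)                      # Close on the target
        return "OK", new_id                       # status plus new object ID
```

Note that the source device drives the copy itself; the requestor only names the object and the target and receives the new object ID.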
Interface 128, implemented by control assembly 150, also supports methods that allow a requestor to obtain object attributes for review and to set object attributes. Figs. 18 and 19 are flow diagrams illustrating the corresponding Get_Objectx_Attributes and Set_Objectx_Attributes methods, respectively.
When the method shown in Fig. 18 is invoked, as indicated by block 336, control assembly 150 obtains the attributes of the specified object. In one illustrative embodiment, the requestor provides permission information, an object ID or a list of object IDs, and optional information. Control assembly 150 then obtains the attributes associated with the object ID or list of object IDs and returns them to the requestor together with the request status. This is indicated by block 338.
Fig. 19 illustrates the Set_Objectx_Attributes method. As indicated by block 344, the requestor invokes this method by providing permission information, the object ID and optional information to control assembly 150. Control assembly 150 then modifies the attributes of the specified object with the information provided by the requestor, and returns the request status together with the modified attributes of the specified object. This is indicated by blocks 346 and 348.
According to another aspect of the present invention, objects can be locked so that they can be accessed only after the server that locked the object releases the lock. In one illustrative embodiment, locks can be applied at the object level, the partition level or the device level. The locking mechanism provides an access scheme among servers. In a preferred embodiment, the locks are used to schedule concurrent updates and to prohibit access during maintenance. Figs. 20, 21 and 22 are flow diagrams illustrating lock methods that can be regarded as instances of the Get_Attribute and Set_Attribute methods. Additional detail is provided for these specific instances, however, so that they can be used to share data among a cluster of requestors.
Fig. 20 is a flow diagram illustrating the Read_Lock_Attributes method. Requestor 116 invokes this method by providing control assembly 150 with permission information, an object, partition or device ID, lock parameters and any optional information desired, as indicated by block 354. In response, control assembly 150 determines whether a lock is set on the specified object. Control assembly 150 then returns the request status and identifies the requestor that holds the lock. This is indicated by block 356.
Fig. 21 is a flow diagram illustrating the Set_Lock_Attributes method. The requestor invokes it by providing permission information, object, partition or device identifier information, lock parameters and optional information, as indicated by block 362. When the method is invoked, control assembly 150 examines the lock associated with the identified object. This is indicated by block 364. Control assembly 150 then attempts to perform the lock or unlock operation using the identity of the requestor. This is indicated by block 366. If the requestor requesting the operation is the owner of the lock, the operation is performed; otherwise it is not. In either case, control assembly 150 returns the request status and the ID of the server that owns the lock. This is indicated by block 368.
Fig. 22 is a flow diagram illustrating the Reset_Lock_Attribute method. This function is used to reset a lock when the server that owns the lock is no longer functioning. The method is invoked by providing permission information, object, partition or device identifier information, lock parameters and any optional information desired. This is indicated by block 374. In response, control assembly 150 locks the specified object, partition or device, as indicated by block 376, and returns the request status and the identifier of the server that owns the lock. This is indicated by block 378.
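The three lock methods can be sketched together as a small ownership table. Several points are assumptions made for illustration rather than statements from the source: a resource with no owner may be locked by any requestor, a reset hands the lock to the requestor that issued it, and the LockTable class and the "OK"/"DENIED" status strings are hypothetical.

```python
class LockTable:
    """Hypothetical lock table keyed by object, partition or device ID."""

    def __init__(self):
        self.owners = {}    # resource ID -> ID of the server holding the lock

    def read_lock(self, resource):
        # Read_Lock_Attributes: report whether a lock is set and who owns it.
        return "OK", self.owners.get(resource)

    def set_lock(self, resource, requester, lock=True):
        # Set_Lock_Attributes: only the owner may change an existing lock.
        owner = self.owners.get(resource)
        if owner is not None and owner != requester:
            return "DENIED", owner
        if lock:
            self.owners[resource] = requester   # assumed: free locks are grantable
            return "OK", requester
        self.owners.pop(resource, None)         # unlock
        return "OK", None

    def reset_lock(self, resource, requester):
        # Reset_Lock_Attribute: used when the owning server has failed.
        # Assumption: the lock is re-taken on behalf of the requester.
        self.owners[resource] = requester
        return "OK", requester
```

Every call returns the request status together with the current owner, so a refused requestor learns which server holds the lock.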
Figure 23 and 24 is process flow diagrams of explanation Get and Set_Device_Association method.Relation between these method definition or inquiry unit 110-112.Illustrative of this relation is implemented to comprise: take one of memory device 110-112 as first equipment of this group equipment or main equipment and all the other equipment are the subordinate group member of this group.First equipment of this group or main equipment are responsible for the variation of group attribute is decomposed other group member.If the attribute setting is not organized first equipment or main equipment provides from this, then other member should refuse these attribute settings.For making memory device 110-112 carry out these functions, they have the ability of self-check.This makes these equipment can check self is to determine whether a being member of large equipment group more.
Fig. 23 illustrates the Get_Device_Associations method. The method is invoked by providing permission information and optional information, as indicated by block 384. In response, control assembly 150 returns the request status and the requested associations of which the device is a member. This is indicated by block 386. Fig. 24 is a flow diagram illustrating the Set_Device_Associations method. The method is invoked by providing permission information, optional information, and a list of the members and attributes that define the association, as indicated by block 392. In response, control assembly 150 modifies device association object 156 held on the medium, as indicated by block 394. The modified device association object includes the attributes provided by the requestor, a time stamp indicating when the object was last modified, and so on. Control assembly 150 returns the request status, as indicated by block 396.
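The master and subordinate relationship described above can be sketched from the point of view of one group member. The GroupMember class, the device ID strings, the set_group_attribute helper and the "OK"/"REJECTED" status strings are hypothetical names introduced for illustration.

```python
import time


class GroupMember:
    """Hypothetical storage device that may belong to a device group."""

    def __init__(self, device_id):
        self.device_id = device_id
        self.association = None     # stand-in for the device association object
        self.group_attrs = {}

    def set_device_associations(self, master_id, members):
        # Record the group definition and when it was last modified.
        self.association = {"master": master_id,
                            "members": list(members),
                            "modified": time.time()}
        return "OK"

    def get_device_associations(self):
        return "OK", self.association

    def is_group_member(self):
        # Self-inspection: does this device belong to a larger group?
        return (self.association is not None
                and self.device_id in self.association["members"])

    def set_group_attribute(self, sender_id, name, value):
        # Members reject group attribute settings not sent by the master.
        if self.association is None or sender_id != self.association["master"]:
            return "REJECTED"
        self.group_attrs[name] = value
        return "OK"
```

In this sketch the master is simply the only sender whose group attribute settings are accepted; how the master distributes the settings is left out.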
The permission information described above allows file server 114 to gate access to storage by controlling which requestors 116-120 are given the credentials necessary to obtain a response from storage devices 110-112. File server 114 also instructs storage devices 110-112 that they must honor only I/O requests that adhere to the device security policy. The keys underlying the permission security capability are communicated to storage devices 110-112 by means of the Set_Object_Attributes method. If a given security level is set for storage devices 110-112, the storage device can be configured to check every I/O command for security compliance. As noted above, however, some applications need not employ security. Furthermore, if a particular server cluster has some devices located at another facility, it is conceivable to define a higher security level for communications with the remote devices without doing so for local communications. This allows security to be applied to remote requestors or servers while avoiding the performance loss that would result from applying the same security to local requestors or servers.
Furthermore, each storage device 110-112 preferably includes a readable, monotonically incrementing clock that is used to time-stamp secure messages and objects. In one illustrative embodiment, the clock of each device is synchronized with a system-wide time base. In another illustrative embodiment, file server 114 accounts for the discrepancies in clock values from one storage device to another.
It can thus be seen that the present invention provides object-oriented storage devices, such as disk drives, that offer significant advantages over conventional storage devices. Object-oriented storage devices significantly improve cluster architectures. For example, when data is stored in an object-oriented manner, it can be managed by the storage device itself. Objects give the storage device sufficient knowledge of the data residing on it to assume responsibility for managing its own space. Moreover, when a device has information about the layout of logical entities, sharing of data can be controlled more intelligently. For example, if data stored on a block-oriented device is shared by two systems, concurrent access to all of the metadata must be controlled. By contrast, in an object-oriented device, much of the metadata activity is opaque to the systems accessing it. The systems therefore need only concern themselves with access conflicts to user data. In addition, because the device itself performs space management, the contention and confusion that arise when two systems attempt to manage the space on the same storage device at the same time are avoided.
In addition, abstraction into objects makes heterogeneous computing easier. Object-oriented storage devices provide, at the least, an organization that an operating system can interpret.
Furthermore, there are many reasons why the use of object-oriented storage devices enhances the performance of a cluster system. For example, the metadata never needs to leave the device itself, which eliminates a certain number of I/O operations.
In addition, the device knows at all times which objects are open or closed, and can use this information to cache data more effectively. Because the device knows the layout of the object being read, prefetching is more effective. The storage device can determine sequential access patterns more effectively. The cache in the device can hold metadata once for the multiple systems that are accessing it. In addition, the device can take part in quality-of-service decisions, such as where a given piece of data is best stored. The device does this only when it is responsible for allocating storage. By contrast, almost no operating system can allocate data by zone of a disk drive. Giving the device itself this capability therefore enhances performance.
The present invention can also be implemented on disk drives arranged in an array. Because the information stored on a disk drive array is usually far more valuable than the disk drives themselves, drive arrays are often referred to as Redundant Arrays of Inexpensive Discs (RAID). Several types, or levels, of RAID systems are known. For example, first-level RAID, as mentioned above, is characterized by providing mirrored disks. In fifth-level RAID, the data stored in the array and the parity or redundant data are spread over all of the disk drives in a group; that is, fifth-level RAID interleaves data and check information across all of the disks, including the check disks. The remaining RAID levels (for example, levels 2-4) are discussed in detail in U.S. Patent No. 5,617,425, entitled "Disc Array Having Array Supporting Controllers and Interface".
Figs. 25-29 illustrate a write operation performed according to one aspect of the present invention, in which data is stored as objects on the disk drives of an array. In the embodiment shown in Fig. 25, file server 114, requestor (or host) 116 and interconnect 125 are connected to a disk drive array of storage devices (such as 110-112) configured as target drive 402 and parity drive 404. Target drive 402 holds the object, or the portion of the object, to be written, and parity drive 404 holds the parity information associated with the target object stored on target drive 402.
In Fig. 25, the drive array is implemented as a RAID 5 array, in which data and parity are interleaved across all of the drives in the group. Thus, drive 402 is the target drive and drive 404 is the parity drive only for the present write operation. In other words, target drive 402 can also hold parity information, and parity drive 404 can also hold data. For the single write operation described below, however, drive 402 is the target drive and drive 404 is the corresponding parity drive. It should also be noted that the present invention can be implemented with RAID levels other than level 5. Application of the present invention to such RAID systems will be apparent to those skilled in the art.
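The idea that each drive acts as parity drive for some stripes and as data drive for others can be illustrated with a short sketch. The rotation rule used in parity_drive is one common RAID 5 layout chosen here as an assumption; the source does not specify how parity is rotated, and both function names are hypothetical.

```python
from functools import reduce
from operator import xor


def parity_drive(stripe, n_drives):
    """Index of the drive holding parity for a stripe (assumed rotation)."""
    return (n_drives - 1 - stripe) % n_drives


def xor_blocks(blocks):
    """Bytewise XOR of equal-length blocks.

    Applied to the data blocks of a stripe it yields the parity block;
    applied to the surviving blocks plus parity it rebuilds a lost block.
    """
    return bytes(reduce(xor, column) for column in zip(*blocks))
```

For example, with three data blocks d0, d1 and d2, the parity is xor_blocks([d0, d1, d2]), and a lost d1 can be rebuilt as xor_blocks([d0, d2, parity]).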
In Fig. 25, target drive 402 and parity drive 404 are interconnected through a Fibre Channel interface or another suitable interface, such as another serial interface.
Figures 26 and 27 illustrate target drive 402 and parity drive 404, respectively. Each drive includes control component 150 and one or more disks 132. Each drive also includes read/write circuitry 406 (such as the data head described above) and exclusive-OR (XOR) circuitry 408. Target drive 402 includes disk space 410, which stores the target object to be written. Parity drive 404 includes disk space 412, which stores the corresponding parity object. The operation of drives 402 and 404 is discussed in detail below with reference to Figures 28 and 29.
Conventional disk arrays implemented with the Small Computer System Interface (SCSI) use XOR commands that cause the disk drives to perform the bit manipulations needed to implement parity protection against drive failure. These commands require the host (or requestor) to have sector-level access to the disks, so that whenever a sector is written on one disk drive, the corresponding sector on the other disk drive that holds the parity information can be updated appropriately. The object-oriented disk drive described above, however, introduces a layer of abstraction between the host and the actual storage sectors of the disk drive. Specifically, the disk drive manages disk space as objects, so that the host (or requestor) has no access to the underlying sector addressing scheme. The disk drive itself is responsible for all space management, so the requestor or host cannot correlate a portion of data written on one disk drive with a location on another disk drive. The requestor therefore does not know the address on a disk drive of the block being written, and cannot derive the corresponding parity address. This makes it very difficult, if not impossible, to use the conventional XOR functions in an object-oriented disk drive as described above.
The present invention therefore provides a method, referred to as Define_Parity_Group, that is invoked on each disk drive in the set of disk drives forming a parity group. This method accomplishes two things. First, it provides enough information that invoking the standard Write_Object method performs the same function as the sector-based XOR commands in a conventional drive array. Second, it causes an object to be created on each drive in the group to hold that drive's share of the parity data. The ID of the parity object is well known and is known to every drive, so any drive that needs to update parity information knows the correct object identifier to which it can direct its request.
Figure 28 illustrates the Define_Parity_Group method in detail. First, the requestor, or host, invokes the method on each drive in the parity group, as indicated by block 420. To invoke the method, the requestor provides the following information:
1. An ordered list of the drives that form the parity group. In a simple illustrative embodiment, the list includes the serial number and address of each drive.
2. The algorithm to be used in calculating parity. In a simple illustrative embodiment, modulo arithmetic is performed on the block address of the data being written. This operation yields both the address of the parity drive (from the ordered list in item 1 above) and the relative block address within the parity object on the parity drive (that is, the relative portion of the parity object that contains the required parity information).
3. The amount of data in a parity stripe, expressed in this example in units of blocks. If the parity data is interspersed throughout the space of each drive, this amount is the basic unit of allocation.
4. The identifier of the parity object. A drive that invokes the Write_Object method to update a parity object issues it to this object ID on the parity drive determined as set out in item 2 above. It should also be noted that multi-level parity (for example, two-level parity) can be implemented, so that each drive can have up to two parity objects. In one illustrative implementation, each drive allocates and reserves two well-known object IDs for use when the drive belongs to a disk array with two-level parity. The presence of a second parity object indicates that two-level parity is in use.
5. The parity object allocation policy. This indicates whether each drive is to allocate the parity object as a single contiguous extent of disk space or to intersperse the parity object with user data objects. Although the parity object and the data object are shown in Figures 26 and 27 as contiguous disk space, this is for illustration only. It should be noted that a parity object that is interspersed with data can still be pre-allocated.
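The five items above can be pictured as a single record handed to every drive in the group. The following Python sketch is only an illustration under stated assumptions: the names `ParityGroupDefinition`, `AllocationPolicy` and `define_parity_group`, the field types, and the dictionary standing in for each drive's stored state are all hypothetical and do not appear in the text.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Tuple


class AllocationPolicy(Enum):
    """Item 5: how a drive lays out its parity object (names are hypothetical)."""
    CONTIGUOUS = "contiguous"
    INTERSPERSED = "interspersed"


@dataclass(frozen=True)
class ParityGroupDefinition:
    # Item 1: ordered list of (serial number, address) pairs.
    drives: Tuple[Tuple[str, int], ...]
    # Item 2: label for the parity algorithm (hypothetical string label).
    algorithm: str
    # Item 3: amount of data in a parity stripe, in blocks.
    stripe_blocks: int
    # Item 4: well-known parity object ID.
    parity_object_id: int
    # Item 5: parity object allocation policy.
    policy: AllocationPolicy


def define_parity_group(
    drive_state: Dict[str, ParityGroupDefinition],
    definition: ParityGroupDefinition,
) -> None:
    """Model the requestor invoking the method on each drive in the list.

    drive_state stands in for each drive's retained copy of the definition.
    """
    for serial, _address in definition.drives:
        drive_state[serial] = definition
```

Making the record immutable (`frozen=True`) is a design choice of this sketch: every drive then holds the same definition, which is what later lets one drive reason about where another drive's parity object lives.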
In response to invocation of the Define_Parity_Group method, control component 150 in each disk drive of the parity group calculates the percentage of its space that is required for parity data, as indicated by block 422. The space required for the parity object depends on the number of disk drives in the parity group list. For example, if the list contains nine disk drives, each drive must allocate one ninth of its space to parity information. The allocated space is identified by the well-known parity object ID supplied by the requestor or host when the method is invoked, as indicated by block 424.
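The space calculation in the nine-drive example can be sketched as follows. This is a minimal illustration, assuming each drive reserves exactly 1/N of its space for a group of N drives; the function names and the choice to round the block count up are assumptions made here, not details given in the text.

```python
import math
from fractions import Fraction


def parity_space_fraction(num_drives: int) -> Fraction:
    """Fraction of a drive's space reserved for parity in a group of num_drives."""
    if num_drives < 2:
        raise ValueError("a parity group needs at least two drives")
    return Fraction(1, num_drives)


def parity_blocks(total_blocks: int, num_drives: int) -> int:
    """Blocks to reserve for the parity object, rounded up (an assumption)."""
    return math.ceil(total_blocks * parity_space_fraction(num_drives))
```

For a nine-drive group, `parity_space_fraction(9)` is one ninth, matching the example in the paragraph above.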
Each drive in the parity group, or group list, retains the information defining the parity group so that it can verify, each time the disk drive is powered up or reset, that the parity group has not been compromised. For this purpose, the information is stored in non-volatile memory, as indicated by block 426.
Once a parity group of disk drives has been created, and space has been allocated on each disk drive to hold one or more parity objects, the data stored in data objects on one or more of the drives can be updated. Figure 29 is a block diagram illustrating the updating of a data object and of the corresponding parity object in accordance with one aspect of the present invention.
To update data, the requestor 116 that is requesting the update invokes the Write_Object method described above on one of the disk drives in the parity group. In the embodiment shown in Figures 25-27, requestor 116 invokes the Write_Object method on target drive 402, as indicated by arrow 428 in Figure 26. In one illustrative embodiment, to invoke the method, requestor 116 provides an object identifier identifying the object to be updated, a partition ID, the starting location within the object of the blocks to be written, the number of blocks to be written within the object, option information, and the data to be written. Target drive 402 knows that servicing the Write_Object method must include updating the parity information associated with the object being updated. Target drive 402 knows this because the information provided and generated when it executed the Define_Parity_Group method is stored in its non-volatile memory.
To update the parity information, target drive 402 performs a series of steps. First, it reads the old data from the specified location in the target object and provides it, together with the new data to be written to that location, to XOR circuitry 408. This is indicated by block 432 of Figure 29 and by arrows 434, 436 and 438 of Figure 26.
Next, target drive 402 XORs the old data with the new data to obtain intermediate parity information, as indicated by block 440 of Figure 29. Target drive 402 provides the intermediate parity information at output 442 in Figure 26. Target drive 402 then writes the new data to the target location within target object 410, thereby updating the target object, as indicated by block 444 of Figure 29.
Target drive 402 then invokes the Write_Object method on parity drive 404, so that the parity object corresponding to target object 410 is updated. This is indicated by block 446 of Figure 29 and arrow 448 of Figure 27. Target drive 402 can calculate the target location in the parity object in a number of ways. For example, target drive 402 can calculate the location from the relative sector address of the block written in the target object. This relative address is divided by the number of drives in the parity group to obtain the relative address within the parity object on parity drive 404. The address of the parity drive is determined using the algorithm specified in the Define_Parity_Group method. Target drive 402 then constructs the Write_Object method and invokes it on the parity drive, identifying parity object 412 and using the relative address to identify the appropriate location within that object.
For example, to calculate the relative block in the parity object on drive 404 that is to be updated, target drive 402 can use the following equation:
B = INT(S/D-1)    (Equation 1)
where: B is the relative block in the parity object;
S is the relative sector address being written on target drive 402; and
D is the number of drives in the parity group.
To calculate the address of the parity drive, target drive 402 can use the following equation:
P = Mod(S/D-1)    (Equation 2)
where: P is the offset of the parity drive within the list of drives in the parity group (the list used to calculate P must include the address of target drive 402).
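Equations 1 and 2 take the integer quotient and the remainder of the same division, which can be sketched with a single `divmod`. The printed expressions do not show how `S/D-1` is grouped, so this sketch deliberately leaves the divisor as a parameter instead of fixing it; reading INT as the integer quotient and Mod as the remainder is likewise an assumption, and the function name `locate_parity` is hypothetical.

```python
from typing import Tuple


def locate_parity(relative_sector: int, divisor: int) -> Tuple[int, int]:
    """Return (B, P) for a relative sector address S.

    B is the integer quotient (Equation 1) and P is the remainder
    (Equation 2). The caller supplies the divisor derived from D,
    because the grouping of the printed expression is not explicit.
    """
    if relative_sector < 0:
        raise ValueError("relative sector address must be non-negative")
    if divisor <= 0:
        raise ValueError("divisor must be positive")
    relative_block, drive_offset = divmod(relative_sector, divisor)
    return relative_block, drive_offset
```

With S = 17 and a divisor of 4, for instance, the sketch yields relative block 4 and list offset 1.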
In response to this Write_Object method, parity drive 404 recognizes that the command is a write to its parity object and performs the parity operations. These operations include reading the old parity data, as indicated by block 450 of Figure 29 and an arrow in Figure 27. Parity drive 404 then XORs the old parity information with the intermediate parity data received from target drive 402, as indicated by block 454 of Figure 29 and arrows 456 and 458 of Figure 27. The result of the XOR operation is the updated parity information, which is written to the parity object on disk 132, as indicated by block 460 of Figure 29 and arrows 462 and 464 of Figure 27. This completes the update of the parity object.
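The two-stage update of Figure 29 can be sketched in a few lines. This is an illustration only: Python `bytearray` buffers stand in for the data object and the parity object, and the function names `target_write` and `parity_write` are hypothetical. The target drive produces intermediate parity (old data XOR new data), and the parity drive folds that into its old parity.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length buffers."""
    if len(a) != len(b):
        raise ValueError("buffers must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def target_write(data_object: bytearray, offset: int, new_data: bytes) -> bytes:
    """Target drive: read old data, XOR with new data, then write new data.

    Returns the intermediate parity to be sent to the parity drive.
    """
    end = offset + len(new_data)
    old_data = bytes(data_object[offset:end])
    intermediate = xor_bytes(old_data, new_data)
    data_object[offset:end] = new_data
    return intermediate


def parity_write(parity_object: bytearray, offset: int, intermediate: bytes) -> None:
    """Parity drive: XOR old parity with intermediate parity and store it."""
    end = offset + len(intermediate)
    old_parity = bytes(parity_object[offset:end])
    parity_object[offset:end] = xor_bytes(old_parity, intermediate)
```

Because XOR is its own inverse, folding in (old data XOR new data) removes the old data's contribution from the parity and adds the new data's, so the parity drive never needs to see the data held on the other drives.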
The present invention thus has a number of advantages over the sector-based parity schemes commonly in use. For example, parity information need no longer be interspersed with user data on the disk. User files (or objects) can therefore be stored contiguously, which improves performance. In addition, objects can be backed up or exported as the actual user data files. Because no parity data is interspersed in them, they are inherently meaningful and useful.
Likewise, when a database is loaded onto the parity group, an application can access it from the requestor as soon as the database has been loaded, without waiting for parity to be calculated. The parity information can be built as a background process without interrupting data access. This makes the data usable sooner. In a conventional system, the data cannot be accessed until all of it, including the parity data structures, has been loaded. Because the parity data is interspersed throughout the user data, the user data cannot be loaded independently and can only be loaded in conjunction with the parity calculations.
Furthermore, because each disk drive in the parity group knows the other members of the group, a disk drive can verify that the drive targeted by a parity update is the correct one. If a drive has been replaced or its address has changed, this can be detected and the operation aborted, thereby avoiding corruption of the parity data.
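One way to picture this check is a comparison between the stored group list and the identity observed at the target of a parity update. The sketch below is hypothetical throughout: the `(serial number, address)` member representation follows item 1 of the Define_Parity_Group information, but the function name and the way the observed identity is obtained are assumptions.

```python
from typing import Sequence, Tuple

Member = Tuple[str, int]  # (serial number, address), as in the ordered list


def verify_parity_target(
    group: Sequence[Member], offset: int, observed: Member
) -> bool:
    """Return True only if the drive at the given list offset matches.

    A False result models the case where a drive was replaced or its
    address changed, in which case the parity update should be aborted.
    """
    if not 0 <= offset < len(group):
        return False
    return group[offset] == observed
```

A caller would abort the parity write whenever this returns `False`, leaving the existing parity object untouched.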
No special XOR commands are needed to invoke parity protection. Once the Define_Parity_Group command has been issued, the drive knows that a simple Write_Object command is extended to include parity protection. This can be carried out transparently by the drive.
In addition, the requestor can support two-level parity without issuing additional I/O commands to the drives. Invoking two Define_Parity_Group methods on a drive enables double-failure protection. The drive knows that it must update two parity objects, one in each parity group. The drive can also check to ensure that the two parity groups have only one drive in common (otherwise the configuration may be invalid).
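The overlap check mentioned above amounts to a set intersection. The following is a minimal sketch, assuming drives are identified by serial-number strings; the function name is hypothetical.

```python
from typing import Iterable


def shares_exactly_one_drive(group_a: Iterable[str], group_b: Iterable[str]) -> bool:
    """True if the two parity groups have exactly one drive in common."""
    return len(set(group_a) & set(group_b)) == 1
```

A drive that belongs to both groups would expect itself to be the single common member; any larger overlap, or none at all, would signal a configuration that may be invalid.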
The present invention includes a data storage system that includes a first storage medium 132 having data stored thereon configured as a plurality of objects 124-126. Each object has attributes indicative of characteristics of the object. The objects include a redundancy object 412 that stores redundancy information. A first control component 150 is operably coupled to storage medium 132 and is configured to provide an interface 128 to objects 124-126. Interface 128 exposes methods (method 0 to method N) that are invoked to access objects 124-126.
In a preferred embodiment, methods 0-N include a Define_Redundancy method; when this method is invoked, control component 150 allocates storage space for redundancy object 412. In another embodiment, when the Define_Redundancy method is invoked, control component 150 stores information identifying the redundancy group to which control component 150 belongs.
According to another embodiment, when the Define_Redundancy method is invoked, the control component calculates the amount of storage space to be allocated to redundancy object 412.
According to another embodiment, invoking the Write_Object method causes control component 150 to update the specified data object with new data and to update the corresponding redundancy object 412 based on the new data.
According to another embodiment, the storage system includes a plurality of storage devices 402 and 404. First storage device 402 includes first storage medium 132 and first control component 150, and second storage device 404 includes a second storage medium 132 and a second control component 150. In this embodiment, at least some of the specified data objects and corresponding redundancy objects are stored on the first storage medium, and some are stored on the second storage medium. In one illustrative embodiment, first control component 150 invokes the Write_Object method on the second storage device to update the redundancy object.
The present invention can also be implemented as a method of maintaining redundancy in a data storage system that has a plurality of storage devices, each of which includes a storage medium 132 and a control component 150. The method includes storing data on the storage medium 132, the stored data being configured as a plurality of objects 124-126, each object having attributes indicative of characteristics of the object. The objects include a redundancy object 412 that stores redundancy information. The method further includes providing an interface 128 that exposes methods (method 0 to method N) invoked to access the objects.
In one illustrative embodiment, the method includes, at step 420, invoking the Define_Parity method on a group of the plurality of storage devices to configure the group to implement the desired redundancy scheme. In one illustrative embodiment, the method includes creating, in response to the invoking step, a redundancy object 412 on each storage device in the group. At step 422, the method further includes calculating the portion of storage space to be allocated to the redundancy object 412.
It is to be understood that, although characteristics and advantages of various embodiments of the present invention have been set forth above, together with details of their structure and function, this disclosure is illustrative only. Many changes may be made within the principles of the present invention, especially in matters of structure and arrangement, to the full extent indicated by the broad general meaning of the terms in which the appended claims are expressed. For example, particular elements may vary depending on the particular interface methods and redundancy scheme used, while maintaining substantially the same functionality, without departing from the scope and spirit of the present invention.

Claims (13)

1. A data storage system, characterized in that it comprises:
a first storage medium having stored thereon data configured as a plurality of objects, each object having attributes indicative of characteristics of the object, the objects including a redundancy object that stores redundancy information; and
a first control component operably coupled to the first storage medium and configured to provide an interface to the objects, the interface exposing methods that are invoked to access the objects.
2. A method of maintaining redundancy in a data storage system, the storage system comprising a plurality of storage devices, each storage device comprising a storage medium and a control component operably coupled to the storage medium, characterized in that the method comprises the steps of:
storing data on the storage medium, the data being configured as a plurality of objects, each object including attributes indicative of characteristics of the object, the objects including a redundancy object that stores redundancy information; and
providing, to the objects, an interface that exposes methods invoked to access the objects.
3. The method of claim 2, characterized in that the method further comprises the step of: invoking a define-redundancy method on a group of the plurality of storage devices to configure the group to implement the desired redundancy.
4. The method of claim 3, characterized in that the method further comprises the step of: in response to the invoking step, creating a redundancy object on each storage device in the group.
5. The method of claim 4, characterized in that the creating step comprises: calculating the portion of storage space to be allocated to the redundancy object.
6. The method of claim 3, characterized in that the method further comprises the step of: invoking a write method on a first device of the plurality of storage devices, causing a first control component on the first storage device to update a specified data object on a first storage medium of the first storage device.
7. The method of claim 6, characterized in that the method further comprises the step of: in response to the write method, invoking a parity write method on a second device of the plurality of storage devices to update a parity object, stored on a second storage medium on the second storage device, that is associated with the specified data object.
8. The method of claim 7, characterized in that the step of invoking the write method further comprises the steps of: reading old data from the specified data object; and writing new data to the data object.
9. The method of claim 7, characterized in that the step of invoking the redundancy write method comprises the steps of:
performing an XOR operation on the old data and the new data to obtain intermediate redundancy data; and
invoking the redundancy write method on the second storage device, providing the intermediate redundancy data and a redundancy object identifier (ID).
10. The method of claim 9, characterized in that the method further comprises the following steps, performed in response to invocation of the redundancy write method:
reading old redundancy data from the redundancy object;
performing an XOR operation on the old redundancy data and the intermediate redundancy data to obtain new redundancy data; and
writing the new redundancy data to the redundancy object.
11. The method of claim 9, characterized in that the method further comprises the step of: calculating which storage device of the group contains the redundancy object associated with the specified data object, the calculation being based on the number of storage devices in the group and on the location in the specified data object that is written with the new data.
12. The method of claim 11, characterized in that the method further comprises the step of: calculating the location in the redundancy object to be updated based on the number of storage devices in the group and on the location in the specified data object that is written with the new data.
13. A data storage system, characterized in that it comprises:
a first disk drive comprising: a first storage medium having stored thereon data configured as a plurality of objects, each object having attributes indicative of characteristics of the object, the objects including a first redundancy object that stores redundancy information; and a first control component operably coupled to the first storage medium and configured to provide an interface to the objects, the interface exposing methods that are invoked to access the objects; and
a second disk drive operably coupled to the first disk drive and comprising: a second storage medium having stored thereon data configured as a plurality of objects, each object having attributes indicative of characteristics of the object, the objects including a first data object having data corresponding to the redundancy information in the first redundancy object; and a second control component operably coupled to the second storage medium and configured to provide an interface to the objects, the interface exposing methods that are invoked to access the objects.
CN98808073A 1997-08-15 1998-08-14 Redundancy implementation on object oriented data storage device Pending CN1267379A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US5589797P 1997-08-15 1997-08-15
US60/055,897 1997-08-15

Publications (1)

Publication Number Publication Date
CN1267379A true CN1267379A (en) 2000-09-20

Family

ID=22000881

Family Applications (1)

Application Number Title Priority Date Filing Date
CN98808073A Pending CN1267379A (en) 1997-08-15 1998-08-14 Redundancy implementation on object oriented data storage device

Country Status (6)

Country Link
JP (1) JP2001516080A (en)
KR (1) KR20010022942A (en)
CN (1) CN1267379A (en)
DE (1) DE19882609T1 (en)
GB (1) GB2341466B (en)
WO (1) WO1999009479A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101872319A (en) * 2004-11-05 2010-10-27 数据机器人技术公司 Storage system condition indicator and using method thereof
CN101751390B (en) * 2008-12-08 2012-07-04 财团法人工业技术研究院 Disk allocation method for object-oriented storage device
CN111291026A (en) * 2018-12-07 2020-06-16 北京京东尚科信息技术有限公司 Data access method, system, device and computer readable medium

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6321358B1 (en) * 1997-08-28 2001-11-20 Seagate Technology Llc Object reconstruction on object oriented data storage device
US6029168A (en) 1998-01-23 2000-02-22 Tricord Systems, Inc. Decentralized file mapping in a striped network file system in a distributed computing environment
US6725392B1 (en) 1999-03-03 2004-04-20 Adaptec, Inc. Controller fault recovery system for a distributed file system
US6449731B1 (en) 1999-03-03 2002-09-10 Tricord Systems, Inc. Self-healing computer system storage
US6530036B1 (en) * 1999-08-17 2003-03-04 Tricord Systems, Inc. Self-healing computer system storage
US6742137B1 (en) 1999-08-17 2004-05-25 Adaptec, Inc. Object oriented fault tolerance
US20030140273A1 (en) * 2001-12-20 2003-07-24 Ajay Kamalvanshi Method and apparatus for fault tolerant persistency service on network device
US20060080354A1 (en) * 2004-08-27 2006-04-13 Nokia Corporation System for selecting data from a data store based on utility of the data
US7533330B2 (en) * 2005-06-27 2009-05-12 Seagate Technology Llc Redundancy for storage data structures

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH07175700A (en) * 1993-12-20 1995-07-14 Fujitsu Ltd Database management method
US5594862A (en) * 1994-07-20 1997-01-14 Emc Corporation XOR controller for a storage subsystem
JP2003517645A (en) * 1997-08-11 2003-05-27 シーゲイト テクノロジー エルエルシー Data storage device and storage method

Also Published As

Publication number Publication date
WO1999009479A1 (en) 1999-02-25
DE19882609T1 (en) 2000-08-24
GB9928817D0 (en) 2000-02-02
GB2341466B (en) 2002-10-02
JP2001516080A (en) 2001-09-25
KR20010022942A (en) 2001-03-26
GB2341466A (en) 2000-03-15

Similar Documents

Publication Publication Date Title
CN1158604C (en) Object reconstruction on object oriented data storage device
CN1281560A (en) Hybrid data storage and reconstruction system and method for data storage device
CN1153159C (en) Method and device for server-based handheld application and database management
US6298401B1 (en) Object oriented storage device having a disc drive controller providing an interface exposing methods which are invoked to access objects stored in a storage media
US7500053B1 (en) Method and system for grouping storage system components
US8607010B2 (en) Information processing system and management device for managing relocation of data based on a change in the characteristics of the data over time
CN1104684C (en) Method and equipment for restoring hard disc driver of computer system
CN1267379A (en) Redundancy implementation on object oriented data storage device
EP1984821A2 (en) Restoring a file to its proper storage tier in an information lifecycle management environment
JP2009505256A (en) Method, system, and program for maintaining an aggregate containing active files in a storage pool (maintaining an aggregate containing active files in a storage pool)
CN1591359A (en) Apparatus and method for controlling booting operation of computer system
WO2006089092A2 (en) Hierarchal data management
JP2005510794A (en) Selective data replication system and method
CN1770088A (en) Incremental backup operations in storage networks
CN1266514A (en) Object oriented data storage device
CN1404587A (en) Method for regenerating partition using virtual drive, data processor and data storage device
JP2009217327A (en) Client environment generation system, client environment generation method, client environment generation program and storage medium
CN101034363A (en) Data backup device, data backup method, and recording medium storing data backup program
US20070077022A1 (en) Data transfer method, data transfer source apparatus, data transfer destination apparatus, storage medium for recording data transfer program and storage medium for recording transferred-data recording program
CN1983270A (en) Database schema for content managed data and its setting method and sytem
CN1991833B (en) File system and file information processing method
CA2436533A1 (en) Distributed management and administration of licensing of multi function offering applications
JPS59157747A (en) Protection of software
JPH09265763A (en) Information recorder
Lovelace et al. DFSMSrmm Primer

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C53 Correction of patent for invention or patent application
CB02 Change of applicant information

Applicant after: Seagate Technology, Inc.

Applicant before: Sichater Tehc. Co., Ltd.

COR Change of bibliographic data

Free format text: CORRECT: APPLICANT; FROM: SEAGATE TECHNOLOGY, INC. TO: SEAGATE TECHNOLOGY LLC

C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication