US20040268062A1 - Cooperative lock override procedure - Google Patents
- Publication number
- US20040268062A1 (application US 10/879,476)
- Authority
- US
- United States
- Prior art keywords
- lock
- processor
- procedure
- data structure
- mode
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/52—Program synchronisation; Mutual exclusion, e.g. by means of semaphores
- G06F9/524—Deadlock detection or avoidance
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/0793—Remedial or corrective actions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/52—Program synchronisation; Mutual exclusion, e.g. by means of semaphores
- G06F9/526—Mutual exclusion algorithms
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2209/00—Indexing scheme relating to G06F9/00
- G06F2209/52—Indexing scheme relating to G06F9/52
- G06F2209/522—Manager
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2209/00—Indexing scheme relating to G06F9/00
- G06F2209/52—Indexing scheme relating to G06F9/52
- G06F2209/523—Mode
Definitions
- This invention relates generally to a method and apparatus for improving performance in systems where multiple processors contend for control of a shared resource through a lock associated with the shared resource, and more particularly to a method and apparatus for improving performance in intelligent data storage systems.
- Many high-performance storage systems are intelligent data storage systems which may be accessible by multiple host computers. These may include, in addition to one or more storage device arrays, a number of intelligent controllers for controlling the various aspects of the data transfers associated with the storage system.
- host controllers may provide the interface between the host computers and the storage system, and device controllers may be used to manage the transfer of data to and from an associated array of storage devices (e.g. disk drives). Often, the arrays may be accessed by multiple hosts and controllers.
- advanced storage systems, such as the SYMMETRIX® storage systems manufactured by EMC Corporation, generally include a global memory which is typically shared by the controllers in the system.
- the memory may be used as a staging area (or cache) for the data transfers between the storage devices and the host computers and may provide a communications path which buffers data transfer between the various controllers.
- Various communication channels such as busses, backplanes or networks, link the controllers to one another and the global memory, the host controllers to the host computers, and the disk controllers to the storage devices.
- the '539 patent Vishlitzky et al, U.S. Pat. No. 5,592,492 issued Jan. 7, 1997, (hereinafter “the '492 patent”), Yanai et al, U.S. Pat. No. 5,664,144 issued Sep. 2, 1997 (hereinafter “the '144 patent”), and Vishlitzky et al, U.S. Pat. No. 5,787,473 issued Jul. 28, 1998, (hereinafter “the '473 patent”), all of which are herein incorporated in their entirety by reference.
- the systems described therein allow the controllers to act independently to perform different processing tasks and provide for distributed management of the global memory resources by the controllers.
- each of the controllers may act independently, there may be contention for certain of the shared memory resources within the system.
- the consistency of the data contained in some portions of global memory may be maintained by requiring each controller to lock those data structures which require consistency while it is performing any operations on them which are supposed to be atomic.
- multimodal locks which permit the requestor to identify the kind of resource access desired by the requestor and the degree of resource sharing which its transaction can tolerate, can be useful in improving system performance and avoiding deadlocks, but providing a lock override which is suitable for a multimodal lock is quite difficult. If, for example, one lock mode is set to allow unusually long transactions, a timeout set to accommodate normal transactions will cut the long ones off in midstream while a timeout set to accommodate the long transactions will allow failures occurring during normal transactions to go undetected for excessively long periods. Moreover, timeouts are competitive procedures which, in certain circumstances, undesirably offset the cooperative advantages of a queued lock. Because of the complexities introduced by multifeatured locks, it is desirable to validate features and modes which create particularly significant drains on system resources, such as long timeout modes, but introducing additional validation features can itself load system resources to the point where the system efficiency suffers.
- a lock is needed which supports multiple locking modes and makes provision both for validation features to detect protocol violations and lock override procedures to manage the violations without unduly reducing system efficiency, and which also meets desirable design criteria for fairness, wait time minimization and guaranteed access.
- a lock mechanism for managing shared resources in a data processing system is provided.
- a method for providing queued locking and unlocking services for a shared resource includes a cooperative lock override procedure.
- the locking services are multimodal and the cooperative lock override procedure is selectively associated with a lock mode.
- a method for providing self-validating, queued lock services for managing a shared resource in a data processing system includes providing a cooperative lock override procedure.
- the data processing system includes a plurality of processors as lock requestors. Each processor supports atomic operations and is coupled to the shared resource through one or more first common communication channels.
- the method includes providing for each shared resource an associated main lock data structure stored in a shared memory accessible by the plurality of processors.
- the main lock data structure includes, in a single atomic structure, the resources needed to lock the shared resource by a successful lock requestor, to establish a queue of unsuccessful lock requestors, and to validate the existence of the lock. Resources are also provided to validate the identity of the successful lock requestor in connection with certain transactions.
- the method also includes providing for each shared resource, an associated auxiliary lock data structure stored in a shared memory accessible by the plurality of processors.
- the auxiliary lock data structure may be a single entry, the entry being a single atomic structure, or it may be an array which includes an entry for each processor, each entry being a single atomic structure.
- Each entry includes the resources needed to identify the successful lock requestor's place in a queue of requesters and to identify the successful lock requestor.
- Each entry may also include the resources needed to save a timestamp as a reference value.
- the method also includes providing for each processor a monitoring procedure for detecting a predetermined indication of protocol failure by any one of the plurality of processors and identifying the failing processor.
- the method also includes providing for each processor a lock services procedure including a queuing procedure for unsuccessful lock requesters, locking and unlocking procedures for locking and unlocking the shared resource by a successful lock requestor, and a cooperative lock override procedure responsive to the detection of the predetermined indication of protocol failure.
- the method also includes detecting, by one of the processors, one of these predetermined indications of protocol failure and identifying the failing processor.
- the method also includes, in a single atomic operation, examining the contents of the auxiliary lock data structure by the detecting processor to determine whether the identified failing processor is the successful lock requestor, and either, if the identified failing processor is the successful lock requestor, in a single atomic operation by the detecting processor, examining the contents of the main lock data structure and writing data to the main lock data structure to reserve the lock to the next requestor in the queue of unsuccessful lock requestors and to revalidate the lock, or, if the identified failing processor is not the successful lock requester, exiting the cooperative lock override procedure.
- one of the requesting processors may, in a single atomic operation, examine the contents of the main lock data structure to determine if another requesting processor has previously locked the shared resource and if the lock contents are valid, and if it determines that the contents are invalid or no other requesting processor has previously locked the shared resource, it may write data to the main lock data structure to reserve and validate the lock.
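- the examine-then-write step just described can be sketched as a compare-and-swap on the main lock word. This is only an illustrative sketch under stated assumptions: `AtomicCell` simulates an atomic shared-memory word with a Python mutex, and the names `Main`, `VALID_PW`, `try_reserve`, the three-field subset, and the convention that the lock is free when the holder position equals the next free position are all hypothetical, not taken from the patent text.

```python
import threading
from collections import namedtuple

# Hypothetical subset of the main lock data structure's fields.
Main = namedtuple("Main", "lock_pw current_holder next_free")
VALID_PW = 0xA5  # hypothetical value meaning "lock contents are valid"


class AtomicCell:
    """Stand-in for one atomically accessed word of shared memory."""

    def __init__(self, value):
        self._value = value
        self._guard = threading.Lock()

    def load(self):
        with self._guard:
            return self._value

    def compare_and_swap(self, expected, new):
        # The comparison and the write happen as one indivisible step.
        with self._guard:
            if self._value != expected:
                return False
            self._value = new
            return True


def try_reserve(cell):
    """Return the queue number taken, or None if the lock is validly held."""
    snap = cell.load()
    if snap.lock_pw != VALID_PW:
        # Invalid contents: reinitialize and take position 0.
        new = Main(VALID_PW, 0, 1)
    elif snap.current_holder == snap.next_free:
        # Valid but unheld: take the next position and become the holder.
        new = Main(VALID_PW, snap.current_holder, snap.next_free + 1)
    else:
        return None  # another requestor validly holds the lock
    # The write only lands if nobody changed the word since the snapshot.
    return new.current_holder if cell.compare_and_swap(snap, new) else None
```

- a caller that receives `None` would fall back to the queuing path rather than retry this fast path.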
- the lock services procedure further includes at least two lock mode procedures and a lock mode selection procedure for selecting one from the lock mode procedures by a successful lock requestor.
- the locking and unlocking procedures include one or more procedures for locking and unlocking the shared resource in the selected lock mode by a successful lock requester and the cooperative lock override procedure is selectively associated with a lock mode.
- the atomic main lock data structure further includes the resources needed to identify one of the lock modes and the auxiliary lock data structure further includes the resources needed to identify one from the lock modes.
- the detecting processor may, in the same atomic operation, verify that the identified lock mode is a lock mode associated with the cooperative lock override procedure and in writing data to the main lock data structure to reserve the lock to the next requestor in the queue of unsuccessful lock requestors the detecting processor may, in the same atomic operation, invalidate the identified lock mode.
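- a minimal sketch of the mode-aware cooperative override described above, assuming a simulated atomic cell. The field names in `Aux`, the constants `NO_HOLDER` and `VALID_PW`, and the extra check that the main structure still shows the same queue position as the auxiliary entry are assumptions made for illustration. The mode values "0" and "T" follow the default and long timeout settings given in the description.

```python
import threading
from collections import namedtuple

Main = namedtuple("Main", "holder_id lock_mode lock_pw current_holder next_free")
Aux = namedtuple("Aux", "holder_id queue_number lock_mode")  # hypothetical layout

DEFAULT_MODE, LONG_TIMEOUT = "0", "T"
VALID_PW = 0xA5   # hypothetical validation value
NO_HOLDER = 0     # hypothetical "no identified holder" marker


class AtomicCell:
    """Stand-in for one atomically accessed word of shared memory."""

    def __init__(self, value):
        self._value = value
        self._guard = threading.Lock()

    def load(self):
        with self._guard:
            return self._value

    def compare_and_swap(self, expected, new):
        with self._guard:
            if self._value != expected:
                return False
            self._value = new
            return True


def cooperative_override(main_cell, aux_cell, failing_id):
    """Hand the lock to the next queued requestor if failing_id holds it."""
    aux = aux_cell.load()  # one atomic read of the auxiliary entry
    if aux.holder_id != failing_id or aux.lock_mode != LONG_TIMEOUT:
        return False  # failing processor is not the holder: exit
    main = main_cell.load()
    if main.lock_mode != LONG_TIMEOUT or main.current_holder != aux.queue_number:
        return False  # lock has already moved on, nothing to override
    new = Main(
        holder_id=NO_HOLDER,
        lock_mode=DEFAULT_MODE,                    # invalidate the lock mode
        lock_pw=VALID_PW,                          # revalidate the lock
        current_holder=main.current_holder + 1,    # next requestor in queue
        next_free=main.next_free,
    )
    return main_cell.compare_and_swap(main, new)
```

- because the final write is a compare-and-swap, two processors that detect the same failure cannot both advance the queue.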
- the invention provides an intelligent data storage system.
- the intelligent storage system typically includes multiple processors as requestors, and these are coupled to a shared resource through one or more first common communication channels.
- the system also includes a shared memory accessible over one or more second common communications channels to all of the processors.
- Each processor supports atomic operations.
- Each processor implements a monitoring procedure for detecting a predetermined indication of protocol failure by any one of the plurality of processors and identifying the failing processor.
- a lock services procedure is also implemented in each of the processors.
- the lock services procedure includes a queuing procedure for unsuccessful lock requestors, and locking and unlocking procedures for locking and unlocking the shared resource by a successful lock requester, and a cooperative lock override procedure responsive to the detection of the predetermined indication of protocol failure.
- An atomic main lock data structure, responsive to the lock services procedures, is implemented in the shared memory and associated with the shared resource.
- the main lock data structure includes the resources needed to lock a shared resource by a successful lock requestor, to establish a place in a queue of unsuccessful lock requesters, and to validate the existence of the lock.
- An atomic auxiliary lock data structure responsive to the lock services procedures, is also implemented in the shared memory and associated with the shared resource.
- the auxiliary lock data structure includes the resources needed to identify the successful lock requestor's place in a queue of requestors and to identify the successful lock requestor.
- Each processor is operable in accordance with its monitoring procedure to detect a predetermined indication of protocol failure and identify the failing processor.
- Each processor is also operable in accordance with its lock services procedure, first to initiate its cooperative lock override procedure responsive to its detection of the predetermined indication of protocol failure, and then in a single atomic operation, to examine the contents of the auxiliary lock data structure to determine if the identified failing processor is the successful lock requestor, and either, if the identified failing processor is the successful lock requester, in a single atomic operation, to examine the contents of the main lock data structure and write data to the main lock data structure to reserve the lock to the next requestor in the queue of unsuccessful lock requestors and to revalidate the lock, or, if the identified failing processor is not the successful lock requester, to exit the cooperative lock override procedure.
- Each of the requesting processors is also operable in accordance with its lock services procedure, in a single atomic operation, to examine the contents of the main lock data structure to determine if another requesting processor has previously locked the shared resource and if the lock contents are valid, and if it determines that the contents are invalid or no other requesting processor has previously locked the shared resource, it may write data to the main lock data structure to reserve and validate the lock.
- the lock services procedure further includes at least two lock mode procedures and a lock mode selection procedure for selecting one from the lock mode procedures by a successful lock requester.
- the locking and unlocking procedures include one or more procedures for locking and unlocking the shared resource in the selected lock mode by a successful lock requestor and the cooperative lock override procedure is selectively associated with a lock mode.
- the atomic main lock data structure further includes the resources needed to identify one of the lock modes.
- the detecting processor may, in the same atomic operation, verify that the identified lock mode is the lock mode associated with the cooperative lock override procedure, and in writing data to the main lock data structure to reserve the lock to the next requestor in the queue of unsuccessful lock requestors, the detecting processor may, in the same atomic operation, invalidate the identified lock mode.
- multiple processes running on a single processor may in some aspects act as requesters, and a lock allocation process or procedure may be invoked by each of these processes, but the operation of the invention is otherwise as described above.
- FIG. 1 is a block diagram of a computer system including a shared resource and incorporating the present invention.
- FIG. 2 is a more detailed block diagram of the computer system of FIG. 1 including an intelligent mass storage system.
- FIG. 3 is a schematic diagram of the main lock data structure used to implement the invention in the system described herein.
- FIG. 4 is a schematic diagram of the auxiliary lock data structure used to implement the invention in some aspects of the system described herein.
- FIG. 5 is a flowchart illustrating steps used to enter the lock request queue, and to poll for and obtain the lock during normal operation of the system described herein.
- FIG. 6 is a flowchart illustrating steps used to perform a timeout lock override procedure associated with a selected one of the lock modes implemented in the system described herein.
- FIG. 7 is a flowchart illustrating steps used to take the lock, to select one of the lock modes, to perform a supplemental validation associated with the selected lock mode, and to initialize a cooperative lock override procedure associated with the selected lock mode implemented in the system described herein.
- FIG. 8 is a flowchart illustrating steps used to perform the cooperative lock override procedure corresponding to a selected one of the lock modes in the system described herein.
- FIG. 9 is a flowchart illustrating steps used to unlock the lock during normal operation of the system described herein.
- computer system 10 is shown to include, among other things, a plurality of processors 1 a - 1 n , running processes A-N, coupled to a shared resource 4 via one or more first common communication channels 3 a - n and to a shared memory 2 via one or more second common communication channels 7 a - n .
- processors 1 a - 1 n may request access to shared resource 4 in order to execute their processes A-N.
- the processors are actual or virtual digital processing units which include one or more CPU's and additional local memory 5 a - n .
- processor 1 a may be an intelligent device controller, an open-systems computer, a personal computer, a server, an intelligent host controller or a virtual system residing on a mainframe computer. Since each of the computer systems just mentioned typically communicates using a specific communication protocol, each of the first and second common communication channels will correspondingly be those channels specific to the computer system to which they are coupled. That is, for example, assuming processor 1 b is an open-systems type server (e.g. running the UNIX Operating System), channel 3 or 7 would typically be a SCSI type communications bus or a fibre-channel communications path. All communications over channel 3 or 7 would therefore adhere to the respective SCSI or fibre-channel communications protocols.
- Processes A-N may be, for example, procedures run by the processors, operating system processes or higher level applications.
- the processors may run other processes not involving shared resource 4 .
- the invention may also be applicable to multiple processes contending for a shared resource but running on a single processor, although this aspect is not illustrated in the drawings.
- To synchronize accesses to the shared resource 4 and provide data consistency, system 10 also provides a queued lock associated with shared resource 4 .
- the queued lock is implemented by a main lock data structure, 30 and, in some aspects, an auxiliary lock data structure, 40 , both further described below, in shared memory 2 and a lock services procedure 6 a - 6 n running on each of processors 1 a - 1 n , respectively.
- the lock data structures, 30 and 40 must be implemented in a section of memory that is accessible by all of the processors which might need access to the shared resource, although they need not be on the same media as the shared resource.
- the procedures which allocate the lock may be centralized or distributed. In the intelligent data processing systems described above, the lock services procedures are typically distributed among the various intelligent controllers.
- the main lock data structure, 30 is used for queuing, mode designation, and transfers of control. It is an atomic data structure which indicates the queue position of the current holder of the lock, the next available position in the queue of subsequent lock requests, the lock mode employed by the current successful lock requestor, and validation information which may be used to identify certain protocol failures requiring lock overrides. Resources may also be provided in the main lock data structure to validate the identity of the successful lock requestor in connection with certain transactions. In some aspects of the invention, the auxiliary lock data structure, 40 , is used for validation and may be used to identify additional protocol failures requiring lock overrides, for example, those associated with a particular lock mode.
- the auxiliary lock data structure, 40 may be a single entry, the entry being a single atomic structure, or it may be an array which includes an entry for each processor, each entry being a single atomic structure.
- Each entry includes the resources needed to identify the successful lock requestor's place in a queue of requestors and to identify the successful lock requestor.
- Each processor typically invokes its lock services procedure, for example procedure 6 b for processor 1 b , before starting a transaction on the shared resource 4 , and may obtain a lock on the shared resource 4 if it is available. Only after a successful requestor from among the processors obtains the lock will that processor perform its transaction on shared resource 4 .
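- the queue position of the current holder and the next available queue position behave much like a ticket lock. The sketch below illustrates that reading under stated assumptions: the cell holds a `(current_holder, next_free)` pair, atomicity is simulated with a Python mutex, and the method names `enter_queue`, `holds_lock` and `unlock` are hypothetical.

```python
import threading


class AtomicCell:
    """Stand-in for one atomically accessed word of shared memory."""

    def __init__(self, value):
        self._value = value
        self._guard = threading.Lock()

    def load(self):
        with self._guard:
            return self._value

    def compare_and_swap(self, expected, new):
        with self._guard:
            if self._value != expected:
                return False
            self._value = new
            return True


class QueuedLock:
    """Ticket-style queued lock over a (current_holder, next_free) pair."""

    def __init__(self):
        self._cell = AtomicCell((0, 0))

    def enter_queue(self):
        # Claim the next free queue position; retry if another requestor raced.
        while True:
            cur, nxt = self._cell.load()
            if self._cell.compare_and_swap((cur, nxt), (cur, nxt + 1)):
                return nxt

    def holds_lock(self, number):
        # Polling step: the lock is ours when the holder position is our number.
        return self._cell.load()[0] == number

    def unlock(self, number):
        # Pass the lock to the next queue position.
        while True:
            cur, nxt = self._cell.load()
            if cur != number:
                raise RuntimeError("caller does not hold the lock")
            if self._cell.compare_and_swap((cur, nxt), (cur + 1, nxt)):
                return
```

- a requestor would call `enter_queue()` once, poll `holds_lock()` until it returns true, perform its transaction on the shared resource, then call `unlock()`. Requestors are served in the order they entered the queue.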
- each of the lock services procedures 6 a - 6 n incorporates, in accordance with the present invention, a lock contention procedure, at least two lock mode procedures, procedures for locking, mode designation and unlocking operations by a successful lock requester in normal operation, algorithms for arbitrating among multiple requests for locks on the shared resource 4 from multiple unsuccessful requestors 1 a - 1 n , and a polling procedure for allowing a previously unsuccessful requestor to determine its current status, and, in some aspects, lock override procedures and supplemental lock validation procedures associated with various lock modes, all of which will be further described below.
- the shared resource 4 of computer system 10 may be almost any resource that might be used by multiple processes, such as a mass storage device, a memory, a data structure within a memory, an ATM or a communication device.
- the shared memory 2 of computer system 10 is mutually shared by or accessible to the processors 1 a - n .
- the shared memory 2 and shared resource 4 may be contained in a single logical object, in separate logical objects contained in a single physical object, such as two portions of a global memory, or they may be separate physical and logical objects, such as a memory and a disk drive.
- the invention is implemented in an intelligent data storage system which includes several individual components coupled via internal communications channels, and the shared resource 4 is one or more of a set of shared data resources, such as data records, data management records and blocks of data, in the data storage system.
- Computer system 10 includes an intelligent data storage system 14 , and may also include a plurality of host processors 12 a - 12 n connected to the intelligent data storage system 14 by host communication channels 13 a - 13 ( 2 n ).
- the storage system 14 includes a plurality of host controllers 21 a - 21 n which are, according to a preferred embodiment of the present invention, coupled alternately to buses 22 and 23 .
- Each host controller 21 a - 21 n is responsible for managing the communication between its associated attached host computers and storage system 14 .
- Storage system 14 also includes a global memory 11 coupled to both buses 22 and 23 .
- the global memory is a high speed random access semiconductor memory.
- Global memory 11 includes a large cache memory 15 which is used during the transfer of data between the host computers and the storage devices of arrays 26 a - 26 n .
- the global memory 11 also includes, as further described below, a cache manager memory 16 and a cache index directory 18 which provides an indication of the data which is stored in the cache memory 15 and provides the addresses of the data which is stored in the cache memory.
- Also coupled alternately to buses 22 and 23 are a plurality of device controllers 25 a - 25 n . Coupled to each device controller is an array of mass storage devices 26 a - 26 n which as shown here may be magnetic disk devices.
- each device controller is responsible for managing the communications between its associated array of drives and the host controllers 21 a - 21 n or global memory 11 of storage system 14 .
- a set of shared data resources in which data may be stored are implemented in data storage system 14 and accessible by a plurality of the processors in system 10 .
- Some or all of the data records, blocks of data and data management records in the global memory 11 and device arrays 26 a - 26 n may be shared data resources.
- the invention will be explained by treating a single data structure implemented in a portion of global memory 11 as the only shared resource 4 .
- the exemplary data structure is a replacement queue 20 , formed from a region of shared memory, such as cache manager memory 16 .
- Replacement queue 20 is analogous to the “least recently used” (LRU) queue used in prior art cache managers for readily identifying the least-recently-used data element in the cache. Because the cache memory has a capacity that is smaller than the main memory, it is sometimes necessary for data elements in the cache memory to be removed from or replaced in the cache memory in order to provide space for new data elements being staged into the cache memory. Typically, the cache manager will remove or replace the “least-recently-used” data element in replacement queue 20 .
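- for readers unfamiliar with LRU replacement, the following is a generic sketch of the idea, not the replacement queue 20 of the described system. The class name `ReplacementQueue` and its `touch` method are hypothetical.

```python
from collections import OrderedDict


class ReplacementQueue:
    """Generic LRU queue: oldest entries sit at the front and go first."""

    def __init__(self, capacity):
        self._capacity = capacity
        self._entries = OrderedDict()

    def touch(self, key):
        """Record a use of key; return the evicted key, or None."""
        if key in self._entries:
            # Already cached: mark it as most recently used.
            self._entries.move_to_end(key)
            return None
        self._entries[key] = True
        if len(self._entries) > self._capacity:
            # Over capacity: remove the least recently used element.
            evicted, _ = self._entries.popitem(last=False)
            return evicted
        return None
```

- in a multi-controller system, every call that reorders or evicts entries is the kind of operation that must run under the lock so that the queue stays consistent.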
- replacement queue 20 is referred to more generally as the replacement queue.
- the typical intelligent data storage system 14 includes many such shared data resources.
- the invention is equally applicable to any shared resource 4 in a system 10 which may be accessed by a plurality of the processors through a queued lock.
- other shared resources in intelligent data storage system 14 may include cache index directory 18 , other data structures in cache manager memory 16 , some or all of the data records in cache memory 15 , and some or all of the blocks of data on disk arrays 26 a - 26 n .
- Intelligent data storage systems for certain applications may require extensive locking of shared data resources, while other applications may require locking of fewer data resources.
- the main lock data structure 30 and the auxiliary lock data structure 40 are also implemented in cache manager memory.
- Various procedures may be executed by each of the host controllers 21 a - 21 n and device controllers 25 a - 25 n to access and manage the replacement queue 20 as well as other shared data resources in cache memory 15 , cache index directory 18 and cache manager memory 16 , as further described, for example, in the '539 patent, the '307 patent, the '144 patent, and the '473 patent, all of which are herein incorporated in their entirety by reference.
- Procedures 6 a - 6 ( 2 n ) are the lock services procedures of this invention.
- Procedures 27 a - 27 ( 2 n ) are the replacement queue management procedures for host controllers 21 a - 21 n and device controllers 25 a - 25 n respectively.
- the shared resource 4 is replacement queue 20 implemented in the cache manager memory 16 of global memory 11
- the processors 1 a - n are the host controllers 21 a - 21 n and device controllers 25 a - 25 n
- processes A-N are the replacement queue management procedures 27 a - 27 ( 2 n ) which manage the replacement queue 20
- the shared memory 2 is also the cache manager memory 16 .
- the storage busses 22 and 23 provide access to the shared resource 4 , so these are the first communication channels 3 a - 3 n .
- the storage busses 22 and 23 also provide access to the shared memory 2 so these are the second communication channels 7 a - 7 n .
- Local memory 5 a - 5 n will typically be implemented on both host controllers 21 a - 21 n and device controllers 25 a - 25 n.
- this example illustrates two preferred aspects of the invention, namely, that the system embodying the invention is the intelligent data storage system 14 and that the processors access the lock data structures 30 and 40 over the same channels used to access the shared resource 4 , i.e. the first and second communication channels are identical.
- the processors may be any or all of the host controllers 21 a - 21 n , device controllers 25 a - 25 n , or host computers 12 a - 12 n
- the channels 3 a - 3 n may be any or all of channels 13 a - 13 n or busses 22 or 23
- the processes A-N and associated lock services procedures 6 a - 6 n may be other processes or procedures managing other shared data resources.
- the lock data structures 30 and 40 need not reside in the same logical device or be accessed over the same channels as each other or as the shared resource 4 .
- the invention is also applicable to embodiments where the first and second communication channels are separate.
- FIG. 3 is a schematic diagram of a preferred form of the main lock data structure 30 .
- FIG. 4 is a schematic diagram of a preferred form of the auxiliary lock data structure, 40 .
- the main lock data structure, MAIN, 30 , is short enough for an atomic operation and typically has the following form:
- HOLDER_ID, LOCK_MODE, LOCK_PW, CURRENT_HOLDER, NEXT_FREE.
- the HOLDER_ID parameter, 31 may be used as an identifier of the requester which currently holds the lock. Each possible requestor in the system is assigned a unique HOLDER_ID. In some aspects of the invention, it is only updated in connection with certain lock modes, so it may not always identify the current holder of the lock. It is an optional parameter, since it is used primarily to validate the identity of a successful lock requestor
- the LOCK_MODE parameter, 33 specifies the type of lock which is currently being used by the current lock holder.
- one or more supplemental validation procedures, lock override procedures, or both may be selectively associated with each LOCK_MODE parameter. For example, some processor operations take much longer than others, and in systems which implement a preset timeout to override the lock in the event of a protocol failure, it may be desirable to establish a lock mode for these longer operations in which the normal timeout will not occur.
- a first lock mode may be associated with a normal timeout lock override procedure and a second lock mode with a different timeout procedure, or none at all.
- Additional lock modes may be associated, for example, with shared access to certain data.
- one of the lock modes (and any supplemental validation or override procedures associated with this lock mode) will be the default lock mode.
- the first lock mode is associated with a competitive, normal timeout lock override procedure and has no supplemental validation procedure
- the second lock mode does have an associated supplemental validation procedure and is also associated with two lock override procedures, one a competitive, long timeout procedure and the other a cooperative, event-based lock override procedure.
- the LOCK_MODE value for a normal timeout mode is the default setting “0” for the LOCK_MODE parameter, while “T” is the LOCK_MODE value for long timeout.
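The association between lock modes and their procedures described above can be pictured as a small lookup table. The sketch below is illustrative only: the dictionary layout, the key names and the helper `override_procedures` are assumptions made for this example, and only the mode values "0" and "T" come from the text.

```python
# Hypothetical table: only the mode values "0" and "T" come from the text.
LOCK_MODES = {
    "0": {  # first (default) lock mode: normal timeout
        "supplemental_validation": False,
        "overrides": ["normal_timeout"],  # competitive
    },
    "T": {  # second lock mode: long timeout
        "supplemental_validation": True,
        "overrides": ["long_timeout", "cooperative"],  # competitive and cooperative
    },
}
DEFAULT_LOCK_MODE = "0"


def override_procedures(lock_mode: str) -> list:
    """Return the override procedures tied to a lock mode (empty if unknown)."""
    return LOCK_MODES.get(lock_mode, {}).get("overrides", [])
```

A lock services procedure could consult such a table once per poll to decide which override checks apply to the mode it read from MAIN.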
- the LOCK_PW parameter, 35 indicates whether a valid lock is held. It has a valid value for the “no lock holder” state, and one or more valid values indicating that the lock is held. All other values are invalid.
- each shared resource, 4 is assigned its own value of LOCK_PW, 35 . This parameter may be used to identify certain protocol failures requiring lock overrides.
- the CURRENT_HOLDER parameter, 37 indicates which place in the lock request queue presently holds the lock. It indicates a place in line, not an identification, but, as will be explained below, it enables the requestor which holds that place in line to determine when it may take the lock.
- the NEXT_FREE parameter, 39 , indicates the next available place in the lock queue.
- Both CURRENT_HOLDER and NEXT_FREE are numeric parameters whose values wrap so that the allowable size of the parameter is never exceeded.
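As a rough model of the MAIN fields just described, the following sketch represents the structure as a Python dataclass. The counter width (`QUEUE_MODULUS`) and the two password encodings are assumptions chosen for illustration; the text only states that the counters wrap and that LOCK_PW has one "no lock holder" value and one or more "held" values.

```python
from dataclasses import dataclass

QUEUE_MODULUS = 256   # assumed counter width; the text only says the values wrap
PW_NO_HOLDER = 0x00   # assumed encoding of the "no lock holder" state
PW_HELD = 0xA5        # assumed encoding of a "lock held" state
VALID_PASSWORDS = {PW_NO_HOLDER, PW_HELD}


@dataclass
class Main:
    holder_id: int = 0            # HOLDER_ID, 31 (optional)
    lock_mode: str = "0"          # LOCK_MODE, 33 ("0" normal, "T" long timeout)
    lock_pw: int = PW_NO_HOLDER   # LOCK_PW, 35
    current_holder: int = 0       # CURRENT_HOLDER, 37
    next_free: int = 0            # NEXT_FREE, 39


def wrap_increment(value: int) -> int:
    """Advance a queue counter, wrapping so the field size is never exceeded."""
    return (value + 1) % QUEUE_MODULUS


def is_validly_held(main: Main) -> bool:
    """True only when LOCK_PW carries a valid 'held' value."""
    return main.lock_pw in VALID_PASSWORDS and main.lock_pw != PW_NO_HOLDER
```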
- the auxiliary lock data structure, AUX, 40 , is a single entry, short enough for an atomic operation, and typically has the following form:
- MY_ID (optional), LOCK_MODE_AUX, MY_NUMBER_AUX, TIME_STAMP_AUX (optional).
- Since the auxiliary lock data structure, 40 , is used primarily to assist in determining when a protocol failure requiring certain lock override procedures has occurred, it is typically not updated every time a new requestor takes the lock. This feature of the invention will be further described in connection with FIG. 7.
- the MY_ID parameter, 41 is an identifier uniquely associated with each processor. As will be further discussed below, the entry is typically refreshed only when that processor is the requestor which currently holds the lock, and only in connection with certain lock modes. In the array form of AUX, only one value of MY_ID(i) is valid for any given entry, since each entry is associated with and can be written by only one processor, but in the illustrated form, N different values of MY_ID are valid, one being associated with each of the N possible requestors. This parameter is optional, but may be used for validation in certain protocol failure situations, as further explained below.
- the LOCK_MODE_AUX parameter, 43 specifies the type of lock which is currently being used by the current lock holder. It has the same possible values and serves the same purpose as the LOCK_MODE parameter, 33 .
- the MY_NUMBER_AUX parameter, 45 indicates what place in the queue the processor holds.
- the entry is typically refreshed only in connection with certain lock modes, when a requestor holds the lock in that mode.
- each processor may refresh only the value in its own entry in the array.
- the TIME_STAMP_AUX parameter, 47 indicates the time at which the processor making the entry obtained the lock. It is typically used to start a timeout clock. This parameter is optional, but may be used for certain types of lock overrides, as will be further explained below.
- MAIN, 30 , and AUX, 40 , must be stored in shared memory, 2 , so that all possible requestors may access them.
- two additional numerical variables, MY_NUMBER, 51 a - n , and TIME_STAMP_L, 53 a - n are associated with each potential requestor. While these may be stored in any system resource to which the requestor has access, typically, both MY_NUMBER, 51 i , and TIME_STAMP_L, 53 i , are stored in the local memory associated with each potential requestor in order to reduce bus traffic.
- Each requester also requires sufficient local memory to store the two most recent values of MAIN and the value of an AUX entry.
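The AUX entry and the per-requestor local values can be modelled in the same way. In this sketch, `None` marks a field that is invalid or not implemented, and the names `LocalState` and `remember_main` are hypothetical helpers introduced only to show how the two most recent values of MAIN might be kept.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Aux:
    """One AUX entry; None marks a field that is invalid or not implemented."""
    my_id: Optional[int] = None             # MY_ID, 41 (optional)
    lock_mode_aux: Optional[str] = None     # LOCK_MODE_AUX, 43
    my_number_aux: Optional[int] = None     # MY_NUMBER_AUX, 45
    time_stamp_aux: Optional[float] = None  # TIME_STAMP_AUX, 47 (optional)


@dataclass
class LocalState:
    """Per-requestor values kept in local memory to reduce bus traffic."""
    my_number: Optional[int] = None         # MY_NUMBER, 51
    time_stamp_l: Optional[float] = None    # TIME_STAMP_L, 53
    main_previous: Optional[tuple] = None   # second most recent copy of MAIN
    main_latest: Optional[tuple] = None     # most recent copy of MAIN
    aux_copy: Optional[Aux] = None          # local copy of one AUX entry

    def remember_main(self, snapshot: tuple) -> None:
        """Keep only the two most recent readings of MAIN."""
        self.main_previous, self.main_latest = self.main_latest, snapshot
```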
- In FIG. 5, the steps used to enter the lock request queue, and to poll for and obtain the lock during normal operation of the system described herein are illustrated in a flowchart.
- Prior to entering the process described in FIG. 5, processor 1 a has, in the course of executing process A, identified a need to obtain a lock on a shared resource 4 , illustratively, the replacement queue, 20 .
- processor 1 a initiates its attempt to obtain the lock.
- at step 100 , processor 1 a reads MAIN, and in step 102 , determines whether the lock is validly held. If the lock is currently held by another requestor, the LOCK_PW, 35 , will have a valid value indicating that the lock is held.
- if the lock is not validly held, processor 1 a will reserve the lock in default mode and establish a lock request queue at step 106 by setting HOLDER_ID, 31 to its own value, LOCK_MODE, 33 to “0”, LOCK_PW, 35 , to a valid value, CURRENT_HOLDER, 37 , to the value presently entered in NEXT_FREE, 39 , and by incrementing NEXT_FREE, 39 .
- processor 1 a makes a good exit to process A.
- Processor 1 a may call the supplemental validation process described below either immediately upon completing step 106 , if it requires the lock in a mode other than the default mode, or at some later point in its execution of process A, if, for example, an error or branch condition creates the need for an alternate activity, like recovering the structure of the shared resource, which would require the alternate lock mode.
- processor 1 a queues for the lock in a single atomic read-modify-write operation represented in FIG. 5 by steps 100 , 102 and 104 . If upon reading MAIN in step 100 , processor 1 a determines that the lock is validly held by another requester by the method previously described in connection with step 102 , then, at step 104 , processor 1 a will reserve the next available number in the queue by incrementing the value of NEXT_FREE in MAIN. At step 108 , processor 1 a enters the queue by setting the value of MY_NUMBER, 51 a , to the value of NEXT_FREE it read in step 102 .
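The queueing steps above (100 through 108) amount to taking a ticket in one atomic read-modify-write. The sketch below models that operation under stated assumptions: a `threading.Lock` named `_bus` stands in for whatever hardware atomicity the shared memory provides, and the password encodings and counter width are illustrative values, not taken from the text.

```python
import threading
from dataclasses import dataclass

QUEUE_MODULUS = 256                # assumed counter width
PW_NO_HOLDER, PW_HELD = 0x00, 0xA5 # assumed LOCK_PW encodings


@dataclass
class Main:
    holder_id: int = 0
    lock_mode: str = "0"
    lock_pw: int = PW_NO_HOLDER
    current_holder: int = 0
    next_free: int = 0


_bus = threading.Lock()  # stand-in for the atomic read-modify-write on shared memory


def request_lock(main: Main, my_id: int):
    """Return (my_number, got_lock) for the calling requestor."""
    with _bus:
        ticket = main.next_free                                # step 100: read MAIN
        main.next_free = (main.next_free + 1) % QUEUE_MODULUS  # reserve the place
        if main.lock_pw == PW_HELD:                            # step 102: validly held
            return ticket, False                               # steps 104, 108: queue up
        # Step 106: lock is free (or LOCK_PW is invalid), so take it in default mode.
        main.holder_id = my_id
        main.lock_mode = "0"
        main.lock_pw = PW_HELD
        main.current_holder = ticket
        return ticket, True
```

The returned ticket plays the role of MY_NUMBER; a requestor that did not get the lock would keep it in local memory and start polling.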
- the processor then updates the timeout parameters at step 110 , assuming the lock mode it detected in step 100 by reading MAIN has a timeout-based lock override procedure associated with it in lock services procedure 6 . If there is no timeout-based lock override procedure associated with the lock mode, then processor 1 a may jump directly to the lock polling sequence beginning at step 118 . In the exemplary embodiment shown in FIG. 5, there is a timeout-based lock override procedure associated with each of the two possible lock modes, so at step 110 , processor 1 a updates in its local memory the override parameters associated with the lock mode it has found to be in effect.
- Each lock mode which has an associated timeout procedure may use a different source for its reference value and a different predetermined interval associated with it in lock services procedure 6 .
- the normal timeout mode may obtain its reference value from its own clock and have a timeout interval of a few seconds or less
- the long timeout mode may obtain its reference value from AUX, 40 , and have a timeout interval of many minutes.
- processor 1 a performs the update by saving the time at which step 108 occurs (as measured by its own internal clock) in TIME_STAMP_L, 53 , for use as a reference value in monitoring whether a timeout has occurred.
- processor 1 a may perform this update by taking a timestamp value from TIME_STAMP_AUX, 47 for use as a reference value in monitoring whether a timeout has occurred. If AUX is an array, Processor 1 a determines what entry in AUX to use for this purpose from the value of HOLDER_ID, 31 , which processor 1 a read in MAIN, 30 , at step 100 .
- processor 1 a may confirm that its LOCK_MODE_AUX is set to the second lock mode, and, if MY_ID is implemented, may confirm that AUX also has a value of MY_ID corresponding to the value of HOLDER_ID. If, when processor 1 a executes these validation steps, AUX is found not to be valid, processor 1 a may default to a short, fixed, timeout value. If a valid AUX entry is found, processor 1 a will save the time from TIME_STAMP_AUX to the processor's local memory, for example in TIME_STAMP_L for use in monitoring whether a timeout has occurred.
- processor 1 a will continue with the procedure by testing to see if a timeout has occurred by determining whether the predetermined interval has elapsed since the reference value for the timeout was updated. If a timeout is detected, at step 130 , processor 1 a enters the lock forcing process further described in connection with FIG. 6. If a timeout has not occurred, processor 1 a begins polling MAIN.
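The choice of timeout reference and interval per lock mode can be sketched as follows. The two interval constants are placeholder values for "a few seconds or less" and "many minutes", and the function names are hypothetical; the fallback to a short timeout when AUX fails validation follows the text.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

NORMAL_TIMEOUT_S = 2.0      # assumed value for "a few seconds or less"
LONG_TIMEOUT_S = 20 * 60.0  # assumed value for "many minutes"


@dataclass
class Aux:
    my_id: Optional[int] = None
    lock_mode_aux: Optional[str] = None
    time_stamp_aux: Optional[float] = None


def timeout_parameters(lock_mode: str, holder_id: int, aux: Aux,
                       local_now: float) -> Tuple[float, float]:
    """Return (reference_time, interval) for the lock mode read from MAIN."""
    if lock_mode == "T":
        aux_valid = (
            aux.lock_mode_aux == "T"
            and aux.time_stamp_aux is not None
            # MY_ID is optional; when present it must match HOLDER_ID.
            and (aux.my_id is None or aux.my_id == holder_id)
        )
        if aux_valid:
            return aux.time_stamp_aux, LONG_TIMEOUT_S
    # Default mode, or AUX not valid: short timeout measured on the local clock.
    return local_now, NORMAL_TIMEOUT_S


def timed_out(reference: float, interval: float, now: float) -> bool:
    """True once the predetermined interval has elapsed since the reference."""
    return now - reference >= interval
```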
- processor 1 a estimates, before every repetition of polling step 120 , the number of prior entries in the lock request queue and adaptively delays its polling period as a function of said number of prior entries in said lock request queue.
- the polling period may be estimated as the product of the number of significant processor operations expected to be performed before processor 1 a obtains the lock as a function of the number of prior entries in said lock request queue and the average duration of a significant processor operation involving the shared resource.
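The polling-period estimate described above is a simple product. A minimal sketch, assuming a counter modulus of 256 and treating the average operation duration as a caller-supplied number (neither value is given in the text):

```python
QUEUE_MODULUS = 256  # assumed counter width; the text only says the counters wrap


def entries_ahead(my_number: int, current_holder: int) -> int:
    """Number of prior entries in the lock request queue, adjusted for wrap."""
    return (my_number - current_holder) % QUEUE_MODULUS


def polling_delay(my_number: int, current_holder: int,
                  avg_op_seconds: float) -> float:
    """Estimated wait: prior entries times the average operation duration."""
    return entries_ahead(my_number, current_holder) * avg_op_seconds
```

For instance, a requestor holding place 7 while place 5 holds the lock has two entries ahead of it, so with an average operation of 0.5 s it would wait about 1 s before polling again.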
- After polling MAIN in step 120 , processor 1 a performs a sequence of sanity checks on the updated value of MAIN, 30 , which it has obtained from the polling step, 120 , and stored in its local memory. The sanity check sequence may also be entered from the lock forcing process of step 130 after a failed attempt to force the lock. If processor 1 a determines at step 122 that the LOCK_PW, 35 , is invalid, processor 1 a will jump to step 100 and attempt to obtain the lock. If the LOCK_PW, 35 , is valid and processor 1 a finds at step 124 that it has obtained the lock, i.e. that MY_NUMBER, 51 a , equals CURRENT_HOLDER, 37 , processor 1 a will enter the good exit/supplemental validation process at step 131 . If upon reading MAIN, 30 , in step 120 , processor 1 a determines at step 122 that the LOCK_PW, 35 , is valid and at step 124 that the lock is still held by another requestor by the method previously described in connection with step 102 , then, at step 126 , processor 1 a compares MY_NUMBER with CURRENT_HOLDER and NEXT_FREE to determine whether processor 1 a is still in the queue.
- If, when adjusted for the queue wrap, MY_NUMBER is not between CURRENT_HOLDER and NEXT_FREE, this indicates that the lock has been reset due to a lock override, as will be described further in connection with FIG. 6, and processor 1 a is not a member of the current queue of lock requestors. Processor 1 a then goes to step 100 and repeats steps 100 , 102 , 104 , and 108 in order to join the new lock queue.
- If step 126 confirms that processor 1 a is still part of the current lock request queue, then, as will be further discussed in connection with the lock override procedures described below, at step 128 processor 1 a will determine if CURRENT_HOLDER, 37 , LOCK_MODE, 33 , or LOCK_PW, 35 has changed. At step 129 , processor 1 a may update its timeout parameters if any of these has changed since its last reading of MAIN.
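The sanity check sequence of steps 122 through 129 can be read as a small decision function over two successive readings of MAIN. The sketch below is one possible rendering: the returned strings, the `MainSnapshot` tuple and the constants are hypothetical names and values, and the queue is assumed never to hold a full `QUEUE_MODULUS` entries at once.

```python
from typing import NamedTuple

QUEUE_MODULUS = 256                 # assumed counter width
PW_NO_HOLDER, PW_HELD = 0x00, 0xA5  # assumed LOCK_PW encodings
VALID_PASSWORDS = {PW_NO_HOLDER, PW_HELD}


class MainSnapshot(NamedTuple):
    lock_mode: str
    lock_pw: int
    current_holder: int
    next_free: int


def in_queue(my_number: int, snap: MainSnapshot) -> bool:
    """Step 126: is MY_NUMBER between CURRENT_HOLDER and NEXT_FREE, with wrap?"""
    span = (snap.next_free - snap.current_holder) % QUEUE_MODULUS
    offset = (my_number - snap.current_holder) % QUEUE_MODULUS
    return offset < span


def sanity_check(previous: MainSnapshot, latest: MainSnapshot,
                 my_number: int) -> str:
    if latest.lock_pw not in VALID_PASSWORDS:   # step 122: invalid password
        return "retry_from_step_100"
    if latest.current_holder == my_number:      # step 124: our turn
        return "take_lock"
    if not in_queue(my_number, latest):         # step 126: queue was reset
        return "rejoin_queue"
    changed = (                                 # step 128: holder, mode or password
        (previous.current_holder, previous.lock_mode, previous.lock_pw)
        != (latest.current_holder, latest.lock_mode, latest.lock_pw)
    )
    return "update_timeout" if changed else "keep_polling"  # step 129
```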
- each processor implements a monitoring procedure, M, for detecting a predetermined indication of protocol failure by any one of the plurality of processors and identifying the failing processor.
- This procedure, M, is external to the lock contention procedure, but may be used to trigger certain lock override procedures, for example, the cooperative lock override process described in connection with FIG. 8.
- in one aspect of the invention, processor 1 a will determine at step 112 whether the lockholder is operating in a lock mode associated with an override which uses this trigger, such as the cooperative lock override, and if the lockholder is, process M may be periodically polled by processor 1 a . Upon receiving an indication of protocol failure during this poll, processor 1 a will initiate a lock override process at step 114 , as further described in connection with FIG. 8. At the conclusion of the process shown at step 114 , there will typically be a new lockholder, and processor 1 a will go to step 116 to continue checking for timeouts. Alternatively, procedure M may cause a jump to a lock override process at step 114 , as further described in connection with FIG. 8.
- the procedure M is shown for convenience operating at step 113 although it will be understood that it operates periodically so long as the processors are running.
- the polls do not occur when the lockholder is operating in the default lock mode, but only in connection with a more resource-intensive lock mode such as the long timeout mode.
- polls of or jumps to and from process M may occur at any time in the course of the lock contention procedure.
- if processor 1 a determines that the present lock mode is not associated with process M at step 112 , or if no protocol failure is indicated by process M in step 113 , processor 1 a will continue checking for timeouts at step 116 . So long as processor 1 a does not obtain the lock and no lock override is initiated under one of the lock override procedures described below, processor 1 a repeats the applicable steps in the lock polling sequence, 116 through 120 , and the subsequent sanity check sequence 122 through 129 (with steps 112 , 113 , and 114 if the lock mode so requires), until it determines either that LOCK_PW, 35 , has an invalid value or that MY_NUMBER, 51 a , equals CURRENT_HOLDER, 37 , either of which causes it to take the lock and make a good exit to process A, as described in steps 106 and 131 , or it determines that a timeout or other event requiring a lock override has occurred.
- Various procedures for handling lock overrides are discussed in connection with FIGS. 6, 7 and 8 .
- FIG. 6 is a flowchart illustrating steps used to perform a lock override procedure associated with a selected one of the lock modes in the system described herein.
- This lock override procedure is a timeout procedure.
- Different timeout procedures with different reference values and timeout intervals may be associated with different lock modes.
- a normal, i.e. short, timeout interval using a first reference value is associated with the default “0” lock mode and a long timeout interval using a second reference value is associated with the other “T” lock mode
- processor 1 a tests to see if a timeout has occurred by determining whether a predetermined interval has elapsed since the reference value for the timeout was updated.
- processor 1 a enters the lock forcing process of step 130 . Going now to FIG. 6, where the process of step 130 is illustrated in more detail, if processor 1 a determines that a timeout has occurred, it sets a hardware lock and enters the lock forcing process at Y. Then, in a single atomic read-modify-write operation, represented on the flowchart by steps 132 , 134 and 136 , processor 1 a initiates its attempt to obtain the lock. At step 132 , processor 1 a will read MAIN, 30 , and at step 134 will determine whether MAIN, 30 , has changed since processor 1 a last read MAIN and stored its value in local memory.
- if MAIN, 30 , has not changed, processor 1 a will force the lock, and reset the entire lock request queue, by setting CURRENT_HOLDER, 37 , equal to the value of NEXT_FREE, 39 , it read in step 132 , incrementing NEXT_FREE, setting the LOCK_MODE, 33 , to the default mode indicator (regardless of which lock mode processor 1 a actually requires), setting HOLDER_ID, 31 , to its own identifier and setting the LOCK_PW, 35 , to a valid password. Steps 132 , 134 , and 136 must be performed as an atomic operation.
- processor 1 a will complete the lock override procedure by setting MY_NUMBER, 51 a , equal to the value of NEXT_FREE, 39 , it read in step 132 . Processor 1 a will then make a good exit to process A. As discussed in connection with step 131 in FIG. 5, should processor 1 a require the lock in some mode other than the default mode, it will, as a part of this process, proceed as described in connection with FIG. 7. Otherwise, it will simply take the lock and exit the lock contention procedure.
- if processor 1 a detects in step 134 that MAIN, 30 , has changed since the last time processor 1 a polled MAIN, it will release its hardware lock at Z and exit the forcing procedure. It will then continue with the sanity check sequence described in connection with FIG. 5, beginning with step 122 , if implemented, using the new value of MAIN which it read at step 132 , and proceeding to steps 124 and beyond.
- processor 1 a will detect in step 126 that the lock request queue has been reset, and will then repeat steps 100 , 102 , 104 , and 108 in order to join the new lock queue. If processor 1 a has not detected the timeout before the lock is forced, and so never enters the lock forcing process, then when processor 1 a reaches step 126 in its regular polling sequence, it will detect that MY_NUMBER, 51 a , is no longer in the queue and will also repeat steps 100 , 102 , 104 , and 108 in order to join the new lock queue.
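The lock forcing process of steps 132 through 136 behaves like a compare-and-swap on MAIN: the force succeeds only if MAIN is unchanged since the forcing processor last read it. In the following sketch, the `threading.Lock` named `_hardware_lock` is a stand-in for the hardware lock taken at Y and released at Z, and the constants are illustrative assumptions.

```python
import threading
from dataclasses import dataclass, replace
from typing import Optional

QUEUE_MODULUS = 256  # assumed counter width
PW_HELD = 0xA5       # assumed "lock held" password


@dataclass
class Main:
    holder_id: int = 0
    lock_mode: str = "0"
    lock_pw: int = PW_HELD
    current_holder: int = 0
    next_free: int = 0


_hardware_lock = threading.Lock()  # stand-in for the hardware lock set at Y


def force_lock(main: Main, last_seen: Main, my_id: int) -> Optional[int]:
    """Return the new MY_NUMBER if the lock was forced, else None."""
    with _hardware_lock:                       # released on exit (Z)
        if main != last_seen:                  # step 134: MAIN changed, abandon
            return None
        ticket = main.next_free                # step 132 reading of NEXT_FREE
        main.current_holder = ticket           # step 136: reset the whole queue
        main.next_free = (ticket + 1) % QUEUE_MODULUS
        main.lock_mode = "0"                   # always forced in default mode
        main.holder_id = my_id
        main.lock_pw = PW_HELD
        return ticket


# Usage: snapshot MAIN when polling, then try to force after a timeout.
shared = Main(holder_id=1, lock_mode="T", current_holder=3, next_free=6)
snapshot = replace(shared)                     # local copy of the last reading
forced_number = force_lock(shared, snapshot, my_id=9)
```

Because every earlier place in the queue now falls outside the range from CURRENT_HOLDER to NEXT_FREE, the other waiting requestors find at step 126 that they must rejoin.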
- FIG. 7 is a flowchart illustrating steps used to select a lock mode (in this case, the second lock mode) other than the default lock mode, to perform a supplemental validation associated with the selected lock mode, and to initialize a second lock override procedure associated with the selected lock mode.
- FIG. 8 will describe how the second lock override procedure is performed.
- the second lock override procedure is a cooperative lock override procedure, and, for purposes of illustration, will be associated with the second, or long timeout lock mode. Because it involves a number of steps using scarce system resources, the cooperative lock override procedure is most suitably associated with a lock mode expected to consume many more I/O cycles or system resources than the default lock mode.
- a supplemental validation procedure is selectively associated with this lock mode.
- processor 1 a has queued for the lock and determined in step 124 of FIG. 5 that its MY_NUMBER, 51 a , corresponds to CURRENT_HOLDER, 37 .
- Processor 1 a has therefore made a good exit to process A at step 131 .
- processor 1 f is next in the lock request queue.
- processor 1 a calls the supplemental validation process from process A, as discussed in connection with step 131 of FIG. 5, because it needs an alternate mode for which supplemental validation is associated, in this case the long timeout mode.
- processor 1 a updates AUX, 40 by setting LOCK_MODE_AUX, 43 , to the identifier of the lock mode it requires, in this case the identifier, “T”, for long timeout mode, and MY_NUMBER_AUX, 45 , to MY_NUMBER, 51 a , the number of its place in the queue.
- processor 1 a will update only the values in its own entry AUX(a). It should be noted that TIME_STAMP_AUX, 47 , and MY_ID, 41 , are not required parameters in connection with the second lock override procedure illustrated in FIG. 7, although either or both may optionally be used for validation in connection with this procedure. If a timeout is associated with the selected lock mode, or if TIME_STAMP_AUX, 47 , is to be used for validation, processor 1 a will also update TIME_STAMP_AUX, 47 , to the time at which step 140 occurs, and if MY_ID, 41 , is implemented in AUX, will update MY_ID to the value of its unique identifier.
- both an event-based cooperative lock override procedure and a timeout-based lock override procedure are associated with the long timeout mode.
- processor 1 a reads an internal clock, preferably the system clock, to determine the time at which step 140 occurred and puts this value in TIME_STAMP_AUX, 47 .
- processor 1 a then reads MAIN, 30 , at step 142 and determines at step 144 whether it validly holds the lock by determining whether MY_NUMBER, 51 a , is equal to CURRENT_HOLDER and the LOCK_PW, 35 , has a valid value. Since processor 1 a has just taken the lock, in the absence of a memory corruption involving MAIN, 30 , or other protocol error, this operation is expected to confirm its custody of the lock.
- Upon receiving confirmation that it still holds the lock and still as part of the atomic operation begun in step 142 , processor 1 a updates MAIN, 30 , in step 146 by setting LOCK_MODE, 33 , to the mode indicator “T”, and updating the validation parameters implemented in MAIN. Processor 1 a then exits the supplemental validation process at 151 and proceeds with process A. If any confirmation step in the sequence fails to confirm that processor 1 a holds the lock, then processor 1 a gives a “bad status” error message to process A at step 148 and exits the lock contention process to process A at step 150 , relinquishing any hardware locks as it does so. Although each confirmation requires an extra bus cycle, any failure to confirm is strong evidence of a protocol violation involving the processor holding the lock or the lock itself. Once a resource is locked into a long timeout mode (or another high I/O demand mode) in error, detecting and correcting the problem typically requires a great many bus cycles. The validation steps significantly decrease the likelihood of such errors.
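The mode selection and supplemental validation of steps 140 through 146 can be sketched as a two-phase update: first AUX, then a confirmed write to MAIN. The function name, the boolean return in place of the "bad status" message, and the choice to refresh HOLDER_ID as the "validation parameter" in MAIN are assumptions made for illustration; the excerpt does not say whether AUX is cleaned up when confirmation fails, so this sketch leaves it as written.

```python
import threading
from dataclasses import dataclass
from typing import Optional

PW_HELD = 0xA5  # assumed "lock held" password


@dataclass
class Main:
    holder_id: int = 0
    lock_mode: str = "0"
    lock_pw: int = PW_HELD
    current_holder: int = 0
    next_free: int = 0


@dataclass
class Aux:
    my_id: Optional[int] = None
    lock_mode_aux: Optional[str] = None
    my_number_aux: Optional[int] = None
    time_stamp_aux: Optional[float] = None


_bus = threading.Lock()  # stand-in for the atomic operation begun in step 142


def select_long_timeout(main: Main, aux: Aux, my_id: int,
                        my_number: int, now: float) -> bool:
    """Return True on success, False for the 'bad status' exit (step 148)."""
    # Step 140: record the required mode and queue place in AUX.
    aux.lock_mode_aux = "T"
    aux.my_number_aux = my_number
    aux.time_stamp_aux = now   # needed because a timeout is tied to mode "T"
    aux.my_id = my_id          # optional field, written here for validation
    with _bus:
        # Steps 142 and 144: confirm custody of the lock before changing MAIN.
        if main.lock_pw != PW_HELD or main.current_holder != my_number:
            return False
        main.lock_mode = "T"   # step 146
        main.holder_id = my_id # assumed validation parameter in MAIN
        return True
```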
- the second lock override procedure is initiated at M when any processor detects certain types of protocol failures while the processor which holds the lock is operating in a lock mode associated with the second lock override procedure, by way of example, the long timeout mode.
- the detecting processor need not be a current member of the lock request queue, and may, in some instances, even be the one which holds the lock.
- the second lock override procedure may be initiated when a processor receives a predetermined indication from a process M external to the lock services procedure that another processor is malfunctioning.
- the processors monitor certain of their own functions. If a processor detects certain types of malfunctions, it will put a message in a first predetermined area in global memory indicating that it is malfunctioning. All processors periodically poll this area for indications of malfunctions in the other processors. In addition, each processor periodically sends a signal, called a heartbeat, over the common bus to a predetermined area in global memory to indicate that it is in good working order, and all processors monitor the heartbeats of all other processors by polling for these heartbeats. If the polling processor fails to detect the heartbeat of another processor for a predetermined interval, the polling processor determines that the silent processor has malfunctioned.
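The two failure indications described above, self-reported malfunction messages and missed heartbeats, can be combined in a single polling routine. This is a minimal sketch of such a monitoring procedure: the function name, the use of a set and a dictionary to represent the predetermined areas of global memory, and the silence threshold are all assumptions.

```python
from typing import Dict, Set


def failed_processors(self_id: str, fault_flags: Set[str],
                      heartbeats: Dict[str, float], now: float,
                      max_silence: float) -> Set[str]:
    """Return the other processors that appear to have malfunctioned.

    fault_flags : processors that posted a malfunction message themselves.
    heartbeats  : last heartbeat time recorded for each processor.
    """
    failed = {pid for pid in fault_flags if pid != self_id}
    for pid, last_beat in heartbeats.items():
        # A processor silent for longer than the interval is treated as failed.
        if pid != self_id and now - last_beat > max_silence:
            failed.add(pid)
    return failed
```

A processor such as 1 c would call this on each poll and, for any identifier returned, go on to check whether that processor holds the lock in a mode associated with the cooperative override.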
- processor 1 c detects a malfunction in processor 1 a via process M and enters the cooperative override process shown at step 114 in FIG. 5.
- processor 1 c reads AUX, 40 , or, if AUX is an array, AUX(a) corresponding to processor 1 a
- processor 1 c determines whether processor 1 a had set its LOCK_MODE_AUX entry, 43 , to indicate a mode associated with the cooperative lock override procedure, in our example, the long timeout mode.
- the value of MY_NUMBER_AUX, 45 , in AUX, 40 indicates what place in the queue a processor held the last time it updated AUX.
- If AUX is not validated or does not include the indicator for the long timeout mode, the second lock override procedure will not be implemented, and processor 1 c will exit the sequence at 168 . If AUX indicates that processor 1 a held the lock in long timeout mode, and the requisite validation criteria are satisfied, then at step 156 , processor 1 c reads MAIN, 30 , and at step 158 attempts to validate that processor 1 a held the lock in long timeout mode. If MAIN, 30 is not validated or does not include the requisite indicators for the long timeout mode, the second lock override procedure will not be implemented, and, as before, processor 1 c will exit the sequence at 168 . If processor 1 c is not queued for the lock, at 168 it will exit the lock contention procedure, but if processor 1 c is a member of the lock request queue, from 168 it will continue the lock polling sequence at step 116 in FIG. 5.
- processor 1 c determines whether the value of NEXT_FREE, 39 , read at step 156 is equal to CURRENT_HOLDER, 37 , plus 1.
- if it is, no other requestor is waiting, and processor 1 c updates MAIN to indicate the lock is not held by setting CURRENT_HOLDER, 37 , equal to the value of NEXT_FREE, 39 , setting the LOCK_MODE, 33 , to its default value and setting the LOCK_PW, 35 , to indicate “no lock holder”.
- otherwise, processor 1 c updates MAIN, 30 , by incrementing CURRENT_HOLDER, 37 , setting LOCK_MODE, 33 to its default value and setting the LOCK_PW, 35 , to any valid value.
- processor 1 c invalidates AUX, 40 , by writing over at least MY_ID, 41 , and preferably the entire entry, and then exits the cooperative lock override procedure at step 168 , as described above.
- Processor 1 f , the lock requestor which has been moved to the head of the queue by processor 1 c , will detect on its next poll that the LOCK_PW, 35 , is valid and that MY_NUMBER, 51 f , is now equal to CURRENT_HOLDER, 37 , and will accept the lock.
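The cooperative override walked through above can be summarized in code. The sketch below is an interpretation, not the patented implementation: the exact validation criteria applied to AUX and MAIN are assumptions (the text refers only to "requisite validation criteria"), MY_ID is treated as implemented, and `_bus` stands in for atomic access to MAIN.

```python
import threading
from dataclasses import dataclass
from typing import Optional

QUEUE_MODULUS = 256                 # assumed counter width
PW_NO_HOLDER, PW_HELD = 0x00, 0xA5  # assumed LOCK_PW encodings


@dataclass
class Main:
    lock_mode: str = "0"
    lock_pw: int = PW_NO_HOLDER
    current_holder: int = 0
    next_free: int = 0


@dataclass
class Aux:
    my_id: Optional[int] = None
    lock_mode_aux: Optional[str] = None
    my_number_aux: Optional[int] = None


_bus = threading.Lock()  # stand-in for atomic access to MAIN


def cooperative_override(main: Main, aux: Aux, failed_id: int) -> bool:
    """Pass the lock on if the failed processor held it in mode "T"."""
    # Steps 152 and 154: AUX must name the failed processor in long timeout mode.
    if aux.lock_mode_aux != "T" or aux.my_id != failed_id:
        return False
    with _bus:
        # Steps 156 and 158: MAIN must agree that this entry holds the lock.
        if (main.lock_pw != PW_HELD or main.lock_mode != "T"
                or main.current_holder != aux.my_number_aux):
            return False
        if main.next_free == (main.current_holder + 1) % QUEUE_MODULUS:
            main.current_holder = main.next_free  # nobody waiting: free the lock
            main.lock_pw = PW_NO_HOLDER
        else:
            main.current_holder = (main.current_holder + 1) % QUEUE_MODULUS
            main.lock_pw = PW_HELD                # next requestor in line takes it
        main.lock_mode = "0"
    # Invalidate AUX, preferably the entire entry.
    aux.my_id = aux.lock_mode_aux = aux.my_number_aux = None
    return True
```

Unlike the timeout-based force, this path preserves the queue order: the requestor that was next in line simply finds CURRENT_HOLDER equal to its own number on its next poll.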
- In FIG. 9, the procedure for unlocking the lock in the absence of a protocol error is shown.
- processor 1 a holds the lock in long timeout mode and processor 1 f is the next requestor in the queue. Except where indicated, the steps are the same regardless of whether processor 1 a held the lock in default mode or in another mode. It will also be assumed that processor 1 a has successfully completed the portion of process A which required a lock on the shared resource 4 and still retains the lock, i.e. that no other processor has completed a lock override procedure.
- processor 1 a reads MAIN, 30 , and at step 172 determines whether MAIN is valid and whether the value of CURRENT_HOLDER, 37 , read at step 170 is equal to the value of MY_NUMBER, 51 a . If both conditions are satisfied, then at step 174 , processor 1 a determines whether the value of NEXT_FREE, 39 , read at step 170 is equal to CURRENT_HOLDER, 37 , plus 1.
- if it is, no other requestor is waiting, and processor 1 a updates MAIN to indicate the lock is not held by setting CURRENT_HOLDER, 37 , equal to the value of NEXT_FREE, 39 , setting the LOCK_MODE, 33 , to its default value and setting the LOCK_PW, 35 , to indicate “no lock holder”.
- otherwise, processor 1 a updates MAIN, 30 , by incrementing CURRENT_HOLDER, 37 , setting LOCK_MODE, 33 to its default value and setting the LOCK_PW, 35 , to any valid value.
- processor 1 a decides, at step 180 if it held the lock in a lock mode associated with a lock override procedure which requires a reference to AUX, 40 , such as the cooperative lock override procedure described in connection with FIG. 8, or a timeout-based procedure which uses TIME_STAMP_AUX, 47 , as its reference value. If it did not hold the lock in such a mode, it will exit the lock services procedure, 6 a , to resume process A.
- processor 1 a held the lock in long timeout mode, which is associated with both the cooperative lock override procedure and a timeout procedure which uses TIME_STAMP_AUX, 47 , as its reference value, so at step 182 , processor 1 a invalidates AUX by writing over at least MY_ID, and preferably the entire entry, and then exits the lock services procedure to resume process A.
- Processor 1 f , continuing with the lock contention procedure of FIG. 5, will shortly discover at step 124 that CURRENT_HOLDER, 37 , is now equal to MY_NUMBER, 51 f , and so, in normal operation, the lock will pass to the next member of the queue.
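The normal unlock path of FIG. 9 can be sketched in the same style. Here the holder itself releases the lock and, if it held the lock in the long timeout mode, invalidates AUX afterwards. Treating steps 170 through 178 as one atomic section and returning `False` when the caller does not hold the lock are assumptions; this excerpt does not describe the failure branch of step 172.

```python
import threading
from dataclasses import dataclass
from typing import Optional

QUEUE_MODULUS = 256                 # assumed counter width
PW_NO_HOLDER, PW_HELD = 0x00, 0xA5  # assumed LOCK_PW encodings


@dataclass
class Main:
    lock_mode: str = "0"
    lock_pw: int = PW_NO_HOLDER
    current_holder: int = 0
    next_free: int = 0


@dataclass
class Aux:
    my_id: Optional[int] = None
    lock_mode_aux: Optional[str] = None
    my_number_aux: Optional[int] = None
    time_stamp_aux: Optional[float] = None


_bus = threading.Lock()  # stand-in for atomic access to MAIN


def unlock(main: Main, aux: Aux, my_number: int) -> bool:
    """Release the lock held under my_number; False if we do not hold it."""
    with _bus:
        # Steps 170 and 172: MAIN must be valid and name us as the holder.
        if main.lock_pw != PW_HELD or main.current_holder != my_number:
            return False
        held_mode = main.lock_mode
        if main.next_free == (main.current_holder + 1) % QUEUE_MODULUS:  # step 174
            main.current_holder = main.next_free   # queue empty: lock not held
            main.lock_pw = PW_NO_HOLDER
        else:
            main.current_holder = (main.current_holder + 1) % QUEUE_MODULUS
            main.lock_pw = PW_HELD                 # hand over to the next place
        main.lock_mode = "0"
    if held_mode == "T":                           # step 180: mode refers to AUX
        # Step 182: invalidate the entire AUX entry.
        aux.my_id = aux.lock_mode_aux = None
        aux.my_number_aux = aux.time_stamp_aux = None
    return True
```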
Abstract
Queued lock services for managing a shared resource in a data processing system include a cooperative lock override procedure. On detecting a protocol failure by another processor, the detecting processor confirms that the failing processor is the lockholder and passes the lock to the next requestor in the queue.
Description
- This invention relates generally to a method and apparatus for improving performance in systems where multiple processors contend for control of a shared resource through a lock associated with the shared resource, and more particularly to a method and apparatus for improving performance in intelligent data storage systems.
- When a computer system resource is shared by multiple processes running on multiple processors, or even on one processor, often there must be some way of insuring that no more than one such process may access that resource at any one time. In designing complex data storage systems including multiple processors, synchronizing access to shared resources has been recognized as an issue which must be addressed in order to maintain the consistency and validity of the data. However, the sharing issue may arise in connection with almost any resource that might be used by multiple requesters.
- Many high-performance storage systems are intelligent data storage systems which may be accessible by multiple host computers. These may include, in addition to one or more storage device arrays, a number of intelligent controllers for controlling the various aspects of the data transfers associated with the storage system. In such systems, host controllers may provide the interface between the host computers and the storage system, and device controllers may be used to manage the transfer of data to and from an associated array of storage devices (e.g. disk drives). Often, the arrays may be accessed by multiple hosts and controllers. In addition, advanced storage systems, such as the SYMMETRIX® storage systems manufactured by EMC Corporation, generally include a global memory which is typically shared by the controllers in the system. The memory may be used as a staging area (or cache) for the data transfers between the storage devices and the host computers and may provide a communications path which buffers data transfer between the various controllers. Various communication channels, such as busses, backplanes or networks, link the controllers to one another and the global memory, the host controllers to the host computers, and the disk controllers to the storage devices. Such systems are described, for example, in Yanai et al, U.S. Pat. No. 5,206,939 issued Apr. 27, 1993, (hereinafter “the '939 patent”), Yanai et al, U.S. Pat. No. 5,381,539 issued Jan. 10, 1995, (hereinafter “the '539 patent”), Vishlitzky et al, U.S. Pat. No. 5,592,492 issued Jan. 7, 1997, (hereinafter “the '492 patent”), Yanai et al, U.S. Pat. No. 5,664,144 issued Sep. 2, 1997 (hereinafter “the '144 patent”), and Vishlitzky et al, U.S. Pat. No. 5,787,473 issued Jul. 28, 1998, (hereinafter “the '473 patent”), all of which are herein incorporated in their entirety by reference.
The systems described therein allow the controllers to act independently to perform different processing tasks and provide for distributed management of the global memory resources by the controllers. This high degree of parallelism permits improved efficiency in processing I/O tasks. Since each of the controllers may act independently, there may be contention for certain of the shared memory resources within the system. In these systems, the consistency of the data contained in some portions of global memory may be maintained by requiring each controller to lock those data structures which require consistency while it is performing any operations on them which are supposed to be atomic.
- Since locking inherently reduces the parallelism of the system and puts a high load on system resources, locking procedures must be designed with care to preserve system efficiency. Adding features to the lock, such as queuing, lock override procedures, or multimodality can help to avoid some pitfalls of common lock protocols, such as processor starvation, deadlocks, livelocks and convoys. However, it is also known that, while many of these lock features have individual advantages, multifeatured lock management procedures are difficult to design and implement without unduly burdening system resources or inadvertently introducing pitfalls such as additional deadlock or starvation situations. For example, multimodal locks, which permit the requestor to identify the kind of resource access desired by the requestor and the degree of resource sharing which its transaction can tolerate, can be useful in improving system performance and avoiding deadlocks, but providing a lock override which is suitable for a multimodal lock is quite difficult. If, for example, one lock mode is set to allow unusually long transactions, a timeout set to accommodate normal transactions will cut the long ones off in midstream while a timeout set to accommodate the long transactions will allow failures occurring during normal transactions to go undetected for excessively long periods. Moreover, timeouts are competitive procedures which, in certain circumstances, undesirably offset the cooperative advantages of a queued lock. Because of the complexities introduced by multifeatured locks, it is desirable to validate features and modes which create particularly significant drains on system resources, such as long timeout modes, but introducing additional validation features can itself load system resources to the point where the system efficiency suffers.
- Providing suitable procedures becomes especially difficult in complex multiprocessor systems which may contain a number of queued locks associated with different shared resources and where a requestor may have to progress through a number of lock request queues in turn in order to complete a process. In these systems, it is desirable that whatever procedure is implemented be fair, ensure that each requestor eventually obtains access to the lock whether or not all other requestors in the system are operating properly, and minimize the average waiting time for each requestor in the queue to improve system efficiency. Queued locks which implement a first-in-first-out (FIFO) protocol meet the fairness criteria because denied requests are queued in the order they are received. One such lock services procedure, often known as the “bakery” or “deli” algorithm, is described, for example, in “Resource Allocation with Immunity to Limited Process Failure”. Michael J. Fischer, Nancy A. Lynch, James E. Burns, and Alan Borodin, 20th Annual Symposium on Foundations of Computer Science, San Juan, Puerto Rico, October 1979, p 234-254; and “Distributed FIFO Allocation of Identical Resources Using Small Shared Space”, ACM Transactions on Programming Languages and Systems, January 1989, 11(1): 90-114. When all requestors in the system are operating properly, the basic “deli” algorithm also meets the other criteria, but a protocol violation such as the failure of any processor in the lock request queue can lead to total system deadlock. However, in all complex multiprocessor systems, occasional protocol violations are inevitable, and the “deli” algorithm makes no provision either for detecting these through validation procedures or otherwise, or for handling them when they occur. Moreover, the basic “deli” lock is a unimodal lock.
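- The FIFO behavior and the failure mode described above can be illustrated with a minimal ticket-style sketch in the spirit of the “deli” approach. It is not the algorithm of the cited papers; the class and method names are hypothetical, and a `threading.Lock` merely stands in for an atomic fetch-and-add.

```python
import threading


class TicketLock:
    """FIFO queued lock: take a number, then wait until that number is served."""

    def __init__(self):
        self._guard = threading.Lock()  # stand-in for an atomic fetch-and-add
        self.next_free = 0              # next ticket to hand out
        self.now_serving = 0            # ticket currently entitled to the lock

    def take_ticket(self):
        """Join the queue; requests are numbered in arrival order."""
        with self._guard:
            ticket = self.next_free
            self.next_free += 1
            return ticket

    def may_enter(self, ticket):
        """Poll: True once this ticket reaches the head of the queue."""
        return ticket == self.now_serving

    def release(self):
        """Pass the lock to the next ticket in line."""
        with self._guard:
            self.now_serving += 1


lock = TicketLock()
ticket_a, ticket_b, ticket_c = (lock.take_ticket() for _ in range(3))

lock.release()  # requestor A finishes normally, so B is now served
b_is_served = lock.may_enter(ticket_b)

# Requestor B fails while holding the lock and never calls release():
# C's ticket is never reached, however long it polls.
c_is_served = lock.may_enter(ticket_c)
```

- In this sketch nothing ever advances the served ticket past a failed holder, which is the deadlock that a validation feature and a lock override procedure are meant to address.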
- A lock is needed which supports multiple locking modes and makes provision both for validation features to detect protocol violations and lock override procedures to manage the violations without unduly reducing system efficiency, and which also meets desirable design criteria for fairness, wait time minimization and guaranteed access.
- In accordance with the present invention, a lock mechanism for managing shared resources in a data processing system is provided.
- In accordance with the present invention, a method for providing queued locking and unlocking services for a shared resource is provided. The services include a cooperative lock override procedure. In one aspect, the locking services are multimodal and the cooperative lock override procedure is selectively associated with a lock mode.
- In another aspect of the invention, a method for providing self-validating, queued lock services for managing a shared resource in a data processing system includes providing a cooperative lock override procedure. The data processing system includes a plurality of processors as lock requestors. Each processor supports atomic operations and is coupled to the shared resource through one or more first common communication channels. The method includes providing for each shared resource an associated main lock data structure stored in a shared memory accessible by the plurality of processors. The main lock data structure includes, in a single atomic structure, the resources needed to lock the shared resource by a successful lock requestor, to establish a queue of unsuccessful lock requestors, and to validate the existence of the lock. Resources are also provided to validate the identity of the successful lock requestor in connection with certain transactions. The method also includes providing for each shared resource, an associated auxiliary lock data structure stored in a shared memory accessible by the plurality of processors. The auxiliary lock data structure may be a single entry, the entry being a single atomic structure, or it may be an array which includes an entry for each processor, each entry being a single atomic structure. Each entry includes the resources needed to identify the successful lock requestor's place in a queue of requestors and to identify the successful lock requestor. Each entry may also include the resources needed to save a timestamp as a reference value. The method also includes providing for each processor a monitoring procedure for detecting a predetermined indication of protocol failure by any one of the plurality of processors and identifying the failing processor.
The method also includes providing for each processor a lock services procedure including a queuing procedure for unsuccessful lock requesters, locking and unlocking procedures for locking and unlocking the shared resource by a successful lock requestor, and a cooperative lock override procedure responsive to the detection of the predetermined indication of protocol failure. The method also includes detecting, by one of the processors, one of these predetermined indications of protocol failure and identifying the failing processor. The method also includes, in a single atomic operation, examining the contents of the auxiliary lock data structure by the detecting processor to determine whether the identified failing processor is the successful lock requestor, and either, if the identified failing processor is the successful lock requestor, in a single atomic operation by the detecting processor, examining the contents of the main lock data structure and writing data to the main lock data structure to reserve the lock to the next requestor in the queue of unsuccessful lock requestors and to revalidate the lock, or, if the identified failing processor is not the successful lock requester, exiting the cooperative lock override procedure.
- Prior to the step of examining the contents of the main lock data structure by the detecting processor, one of the requesting processors may, in a single atomic operation, examine the contents of the main lock data structure to determine if another requesting processor has previously locked the shared resource and if the lock contents are valid, and if it determines that the contents are invalid or no other requesting processor has previously locked the shared resource, it may write data to the main lock data structure to reserve and validate the lock.
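- The examine-then-reserve step can be sketched as a read followed by a compare-and-swap, which succeeds only if the structure is unchanged since it was read and so behaves like one atomic read-modify-write. Everything concrete here is an assumption made for illustration: the field subset, the password values, restarting the queue at place 0 when the contents are invalid, and the `AtomicCell` class that stands in for a hardware atomic operation on shared memory.

```python
import threading
from dataclasses import dataclass

PW_UNLOCKED = 0xA0  # hypothetical valid value: no lock holder
PW_LOCKED = 0xA1    # hypothetical valid value: lock is held


@dataclass(frozen=True)
class Main:
    lock_pw: int
    current_holder: int
    next_free: int


class AtomicCell:
    """Stand-in for a shared-memory word updated by hardware compare-and-swap."""

    def __init__(self, value):
        self._value = value
        self._guard = threading.Lock()

    def load(self):
        with self._guard:
            return self._value

    def compare_and_swap(self, expected, new):
        with self._guard:
            if self._value != expected:
                return False
            self._value = new
            return True


def try_reserve(cell):
    """Return the queue place taken if the lock was reserved, else None."""
    seen = cell.load()
    valid = seen.lock_pw in (PW_UNLOCKED, PW_LOCKED)
    if valid and seen.lock_pw == PW_LOCKED:
        return None  # validly held by another requestor: caller must queue
    # Contents invalid, or nobody holds the lock: reserve and validate it.
    place = seen.next_free if valid else 0
    new = Main(PW_LOCKED, current_holder=place, next_free=place + 1)
    return place if cell.compare_and_swap(seen, new) else None


cell = AtomicCell(Main(lock_pw=0xFF, current_holder=7, next_free=9))  # corrupt
first = try_reserve(cell)    # invalid contents are overwritten
second = try_reserve(cell)   # lock is now validly held
```

- A requestor that receives `None` would join the queue of unsuccessful requestors rather than retry immediately; that queuing step is omitted from the sketch.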
- In one aspect of the invention, the lock services procedure further includes at least two lock mode procedures and a lock mode selection procedure for selecting one from the lock mode procedures by a successful lock requestor. The locking and unlocking procedures include one or more procedures for locking and unlocking the shared resource in the selected lock mode by a successful lock requester and the cooperative lock override procedure is selectively associated with a lock mode. The atomic main lock data structure further includes the resources needed to identify one of the lock modes and the auxiliary lock data structure further includes the resources needed to identify one from the lock modes. In examining the contents of the main lock data structure, the detecting processor may, in the same atomic operation, verify that the identified lock mode is a lock mode associated with the cooperative lock override procedure and in writing data to the main lock data structure to reserve the lock to the next requestor in the queue of unsuccessful lock requestors the detecting processor may, in the same atomic operation, invalidate the identified lock mode.
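- The cooperative override summarized above might be sketched as follows. This is a hedged illustration, not the claimed procedure: the field names, the mode labels, the queue wrap size, and the reading of “invalidate the identified lock mode” as a reset to a default mode are all assumptions, and `AtomicCell` again stands in for a hardware atomic operation on shared memory.

```python
import threading
from dataclasses import dataclass, replace

MODE_DEFAULT = "0"  # assumed mode with no cooperative override
MODE_COOP = "T"     # assumed mode associated with the cooperative override
PW_LOCKED = 0xA1    # hypothetical "valid lock held" value
QUEUE_WRAP = 256    # hypothetical wrap point for queue places


@dataclass(frozen=True)
class Main:
    lock_mode: str
    lock_pw: int
    current_holder: int
    next_free: int


@dataclass(frozen=True)
class Aux:
    my_id: int
    lock_mode_aux: str
    my_number_aux: int


class AtomicCell:
    """Stand-in for a shared-memory word updated by hardware compare-and-swap."""

    def __init__(self, value):
        self._value = value
        self._guard = threading.Lock()

    def load(self):
        with self._guard:
            return self._value

    def compare_and_swap(self, expected, new):
        with self._guard:
            if self._value != expected:
                return False
            self._value = new
            return True


def cooperative_override(main_cell, aux_cell, failing_id):
    """Hand the lock to the next place in line if the failed processor holds it."""
    aux = aux_cell.load()
    if aux.my_id != failing_id:
        return False  # the failing processor is not the lock holder: exit
    seen = main_cell.load()
    if seen.lock_mode != MODE_COOP or seen.current_holder != aux.my_number_aux:
        return False  # wrong mode, or the lock has already moved on
    new = replace(seen, lock_mode=MODE_DEFAULT, lock_pw=PW_LOCKED,
                  current_holder=(seen.current_holder + 1) % QUEUE_WRAP)
    return main_cell.compare_and_swap(seen, new)


main_cell = AtomicCell(Main(MODE_COOP, PW_LOCKED, current_holder=4, next_free=7))
aux_cell = AtomicCell(Aux(my_id=2, lock_mode_aux=MODE_COOP, my_number_aux=4))
ignored = cooperative_override(main_cell, aux_cell, failing_id=5)
overridden = cooperative_override(main_cell, aux_cell, failing_id=2)
repeated = cooperative_override(main_cell, aux_cell, failing_id=2)
```

- Because the write succeeds only against the exact contents that were examined, a second detecting processor that attempts the same override finds the mode already reset and exits without disturbing the new holder.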
- In another aspect, the invention provides an intelligent data storage system. The intelligent storage system typically includes multiple processors as requestors, and these are coupled to a shared resource through one or more first common communication channels. The system also includes a shared memory accessible over one or more second common communications channels to all of the processors. Each processor supports atomic operations. Each processor implements a monitoring procedure for detecting a predetermined indication of protocol failure by any one of the plurality of processors and identifying the failing processor. A lock services procedure is also implemented in each of the processors. The lock services procedure includes a queuing procedure for unsuccessful lock requestors, and locking and unlocking procedures for locking and unlocking the shared resource by a successful lock requestor, and a cooperative lock override procedure responsive to the detection of the predetermined indication of protocol failure. An atomic main lock data structure, responsive to the lock services procedures, is implemented in the shared memory and associated with the shared resource. The main lock data structure includes the resources needed to lock a shared resource by a successful lock requestor, to establish a place in a queue of unsuccessful lock requestors, and to validate the existence of the lock. An atomic auxiliary lock data structure, responsive to the lock services procedures, is also implemented in the shared memory and associated with the shared resource. The auxiliary lock data structure includes the resources needed to identify the successful lock requestor's place in a queue of requestors and to identify the successful lock requestor. Each processor is operable in accordance with its monitoring procedure to detect a predetermined indication of protocol failure and identify the failing processor.
Each processor is also operable in accordance with its lock services procedure, first to initiate its cooperative lock override procedure responsive to its detection of the predetermined indication of protocol failure, and then in a single atomic operation, to examine the contents of the auxiliary lock data structure to determine if the identified failing processor is the successful lock requestor, and either, if the identified failing processor is the successful lock requester, in a single atomic operation, to examine the contents of the main lock data structure and write data to the main lock data structure to reserve the lock to the next requestor in the queue of unsuccessful lock requestors and to revalidate the lock, or, if the identified failing processor is not the successful lock requester, to exit the cooperative lock override procedure.
- Each of the requesting processors is also operable in accordance with its lock services procedure, in a single atomic operation, to examine the contents of the main lock data structure to determine if another requesting processor has previously locked the shared resource and if the lock contents are valid, and if it determines that the contents are invalid or no other requesting processor has previously locked the shared resource, it may write data to the main lock data structure to reserve and validate the lock.
- In one aspect of the invention, the lock services procedure further includes at least two lock mode procedures and a lock mode selection procedure for selecting one from the lock mode procedures by a successful lock requestor. The locking and unlocking procedures include one or more procedures for locking and unlocking the shared resource in the selected lock mode by a successful lock requestor and the cooperative lock override procedure is selectively associated with a lock mode. The atomic main lock data structure further includes the resources needed to identify one of the lock modes. In examining the contents of the main lock data structure, the detecting processor may, in the same atomic operation, verify that the identified lock mode is the lock mode associated with the cooperative lock override procedure and in writing data to the main lock data structure to reserve the lock to the next requestor in the queue of unsuccessful lock requestors the detecting processor may, in the same atomic operation, invalidate the identified lock mode.
- In yet another aspect of the invention, multiple processes running on a single processor may in some aspects act as requesters, and a lock allocation process or procedure may be invoked by each of these processes, but the operation of the invention is otherwise as described above.
- The above and further advantages of the present invention may be better understood by referring to the following description taken in conjunction with the accompanying drawings in which:
- FIG. 1 is a block diagram of a computer system including a shared resource and incorporating the present invention.
- FIG. 2 is a more detailed block diagram of the computer system of FIG. 1 including an intelligent mass storage system.
- FIG. 3 is a schematic diagram of the main lock data structure used to implement the invention in the system described herein.
- FIG. 4 is a schematic diagram of the auxiliary lock data structure used to implement the invention in some aspects of the system described herein.
- FIG. 5 is a flowchart illustrating steps used to enter the lock request queue, and to poll for and obtain the lock during normal operation of the system described herein.
- FIG. 6 is a flowchart illustrating steps used to perform a timeout lock override procedure associated with a selected one of the lock modes implemented in the system described herein.
- FIG. 7 is a flowchart illustrating steps used to take the lock, to select one of the lock modes, to perform a supplemental validation associated with the selected lock mode, and to initialize a cooperative lock override procedure associated with the selected lock mode implemented in the system described herein.
- FIG. 8 is a flowchart illustrating steps used to perform the cooperative lock override procedure corresponding to a selected one of the lock modes in the system described herein.
- FIG. 9 is a flowchart illustrating steps used to unlock the lock during normal operation of the system described herein.
- Referring now to FIG. 1, computer system 10 is shown to include, among other things, a plurality of processors 1 a-1 n, running processes A-N, coupled to a shared resource 4 via one or more first common communication channels 3 a-n and to a shared memory 2 via one or more second common communication channels 7 a-n. For purposes of illustration, only one first common communication channel 3 and one second common communication channel 7 are shown in FIG. 1. Any or all of processors 1 a-1 n may request access to shared resource 4 in order to execute their processes A-N. The processors are actual or virtual digital processing units which include one or more CPU's and additional local memory 5 a-n. For example, processor 1 a may be an intelligent device controller, an open-systems computer, a personal computer, a server, an intelligent host controller or a virtual system residing on a mainframe computer. Since each of the computer systems just mentioned typically communicates using a specific communication protocol, each of the first and second common communication channels will correspondingly be those channels specific to the computer system to which they are coupled. That is, for example, assuming processor 1 b is an open-systems type server (e.g. running the UNIX Operating System), channel 3 or 7 would typically be a SCSI type communications bus or a fibre-channel communications path. All communications over channel 3 or 7 would therefore adhere to the respective SCSI or fibre-channel communications protocols. Processes A-N may be, for example, procedures run by the processors, operating system processes or higher level applications. The processors may run other processes not involving shared resource 4. The invention may also be applicable to multiple processes contending for a shared resource but running on a single processor, although this aspect is not illustrated in the drawings.
- To synchronize accesses to the shared resource 4 and provide data consistency,
system 10 also provides a queued lock associated with shared resource 4. The queued lock is implemented by a main lock data structure 30 and, in some aspects, an auxiliary lock data structure 40, both further described below, in shared memory 2 and a lock services procedure 6 a-6 n running on each of processors 1 a-1 n, respectively. The lock data structures, 30 and 40, must be implemented in a section of memory that is accessible by all of the processors which might need access to the shared resource, although they need not be on the same media as the shared resource. The procedures which allocate the lock may be centralized or distributed. In the intelligent data processing systems described above, the lock services procedures are typically distributed among the various intelligent controllers.
- The main lock data structure, 30, is used for queuing, mode designation, and transfers of control. It is an atomic data structure which indicates the queue position of the current holder of the lock, the next available position in the queue of subsequent lock requests, the lock mode employed by the current successful lock requestor, and validation information which may be used to identify certain protocol failures requiring lock overrides. Resources may also be provided in the main lock data structure to validate the identity of the successful lock requestor in connection with certain transactions. In some aspects of the invention, the auxiliary lock data structure, 40, is used for validation and may be used to identify additional protocol failures requiring lock overrides, for example, those associated with a particular lock mode. The auxiliary lock data structure, 40, may be a single entry, the entry being a single atomic structure, or it may be an array which includes an entry for each processor, each entry being a single atomic structure.
Each entry includes the resources needed to identify the successful lock requestor's place in a queue of requestors and to identify the successful lock requestor. Each processor typically invokes its lock services procedure, for
example procedure 6 b for processor 1 b, before starting a transaction on the shared resource 4, and may obtain a lock on the shared resource 4 if it is available. Only after a successful requestor from among the processors obtains the lock will that processor perform its transaction on shared resource 4. If the shared resource 4 is already locked at the time the request is received, or if there are multiple simultaneous requests for access, the lock services procedure will queue the unsuccessful requests on a lock request queue 50. In relevant part, each of the lock services procedures 6 a-6 n incorporates, in accordance with the present invention, a lock contention procedure, at least two lock mode procedures, procedures for locking, mode designation and unlocking operations by a successful lock requestor in normal operation, algorithms for arbitrating among multiple requests for locks on the shared resource 4 from multiple unsuccessful requestors 1 a-1 n, and a polling procedure for allowing a previously unsuccessful requestor to determine its current status, and, in some aspects, lock override procedures and supplemental lock validation procedures associated with various lock modes, all of which will be further described below.
- The shared resource 4 of
computer system 10 may be almost any resource that might be used by multiple processes, such as a mass storage device, a memory, a data structure within a memory, an ATM or a communication device. The shared memory 2 of computer system 10 is mutually shared by or accessible to the processors 1 a-n. The shared memory 2 and shared resource 4 may be contained in a single logical object, in separate logical objects contained in a single physical object, such as two portions of a global memory, or they may be separate physical and logical objects, such as a memory and a disk drive. In one aspect, described in more detail below, the invention is implemented in an intelligent data storage system which includes several individual components coupled via internal communications channels, and the shared resource 4 is one or more of a set of shared data resources, such as data records, data management records and blocks of data, in the data storage system.
- Referring now to FIG. 2, the
computer system 10 of FIG. 1 is shown in more detail. Computer system 10 includes an intelligent data storage system 14, and may also include a plurality of host processors 12 a-12 n connected to the intelligent data storage system 14 by host communication channels 13 a-13(2 n). The storage system 14 includes a plurality of host controllers 21 a-21 n which are, according to a preferred embodiment of the present invention, coupled alternately to buses host controller 21 a-21 n is responsible for managing the communication between its associated attached host computers and storage system 14. Storage system 14 also includes a global memory 11 coupled to both buses large cache memory 15 which is used during the transfer of data between the host computers and the storage devices of arrays 26 a-26 n. The global memory 11 also includes, as further described below, a cache manager memory 16 and a cache index directory 18 which provides an indication of the data which is stored in the cache memory 15 and provides the addresses of the data which is stored in the cache memory. Also coupled alternately to buses host controllers 21 a-21 n or global memory 11 of storage system 14.
- A set of shared data resources in which data may be stored are implemented in data storage system 14 and accessible by a plurality of the processors in
system 10. Some or all of the data records, blocks of data and data management records in the global memory 11 and device arrays 26 a-26 n may be shared data resources. By way of example and in order to illustrate certain aspects of the invention, the invention will be explained by treating a single data structure implemented in a portion of global memory 11 as the only shared resource 4. The exemplary data structure is a replacement queue 20, formed from a region of shared memory, such as cache manager memory 16. Replacement queue 20 is analogous to the “least recently used” (LRU) queue used in prior art cache managers for readily identifying the least-recently-used data element in the cache. Because the cache memory has a capacity that is smaller than the main memory, it is sometimes necessary for data elements in the cache memory to be removed from or replaced in the cache memory in order to provide space for new data elements being staged into the cache memory. Typically, the cache manager will remove or replace the “least-recently-used” data element in replacement queue 20. Various techniques have been described for dynamically monitoring and adjusting cache parameters, as described, for example, in the '473 patent and the '939 patent, supra. The performance of system 14 is highly dependent on the cache management strategy selected. The strategy is implemented by procedures 27 a-27 n. Since some of these strategies allow the cache slot at the head of replacement queue 20 to contain something other than the “least-recently-used” data element, replacement queue 20 is referred to more generally as the replacement queue.
- It will be understood, however, that the typical intelligent data storage system 14 includes many such shared data resources. The invention is equally applicable to any shared resource 4 in a
system 10 which may be accessed by a plurality of the processors through a queued lock. By way of example and not by way of limitation, other shared resources in intelligent data storage system 14 may include cache index directory 18, other data structures in cache manager memory 16, some or all of the data records in cache memory 15, and some or all of the blocks of data on disk arrays 26 a-26 n. Intelligent data storage systems for certain applications, such as those supporting airline reservation systems, may require extensive locking of shared data resources, while other applications may require locking of fewer data resources.
- In the exemplary embodiment, the main
lock data structure 30 and the auxiliary lock data structure 40, further described in connection with FIG. 3 and FIG. 4, are also implemented in cache manager memory. Various procedures may be executed by each of the host controllers 21 a-21 n and device controllers 25 a-25 n to access and manage the replacement queue 20 as well as other shared data resources in cache memory 15, cache index directory 18 and cache manager memory 16, as further described, for example, in the '539 patent, the '307 patent, the '144 patent, and the '473 patent, all of which are herein incorporated in their entirety by reference. Procedures 6 a-6(2 n) are the lock services procedures of this invention. Procedures 27 a-27(2 n) are the replacement queue management procedures for host controllers 21 a-21 n and device controllers 25 a-25 n respectively. Thus, in the illustrative embodiment, the shared resource 4 is replacement queue 20 implemented in the cache manager memory 16 of global memory 11, the processors 1 a-n are the host controllers 21 a-21 n and device controllers 25 a-25 n, processes A-N are the replacement queue management procedures 27 a-27(2 n) which manage the replacement queue 20, and the shared memory 2 is also the cache manager memory 16. The storage busses 22 and 23 provide access to the shared resource 4, so these are the first communication channels 3 a-3 n. The storage busses 22 and 23 also provide access to the shared memory 2 so these are the second communication channels 7 a-7 n. Local memory 5 a-5 n will typically be implemented on both host controllers 21 a-21 n and device controllers 25 a-25 n.
- It should be noted that this example illustrates two preferred aspects of the invention, namely, that the system embodying the invention is the intelligent data storage system 14 and that the processors access the
lock data structures host controllers 21 a-21 n, device controllers 25 a-25 n, or host computers 12 a-12 n, the channels 3 a-3 n may be any or all of channels 13 a-13 n or busses 22 or 23, and the processes A-N and associated lock services procedures 6 a-6 n may be other processes or procedures managing other shared data resources. Moreover, the lock data structures
- Before proceeding further, it may be helpful to describe the data structures used in one embodiment of the invention. FIG. 3 is a schematic diagram of a preferred form of the main
lock data structure 30, and FIG. 4 is a schematic diagram of a preferred form of the auxiliary lock data structure, 40.
- MAIN, the main lock data structure, 30, is short enough for an atomic operation and typically has the following form:
- HOLDER_ID, LOCK_MODE, LOCK_PW, CURRENT_HOLDER, NEXT_FREE.
- The HOLDER_ID parameter, 31, may be used as an identifier of the requestor which currently holds the lock. Each possible requestor in the system is assigned a unique HOLDER_ID. In some aspects of the invention, it is only updated in connection with certain lock modes, so it may not always identify the current holder of the lock. It is an optional parameter, since it is used primarily to validate the identity of a successful lock requestor.
- The LOCK_MODE parameter, 33, specifies the type of lock which is currently being used by the current lock holder. In addition to the basic lock mode procedure associated with a particular LOCK_MODE parameter, one or more supplemental validation procedures, lock override procedures, or both may be selectively associated with each LOCK_MODE parameter. For example, some processor operations take much longer than others, and in systems which implement a preset timeout to override the lock in the event of a protocol failure, it may be desirable to establish a lock mode for these longer operations in which the normal timeout will not occur. Thus, a first lock mode may be associated with a normal timeout lock override procedure and a second lock mode with a different timeout procedure, or none at all. Additional lock modes may be associated, for example, with shared access to certain data. In one aspect of the invention, one of the lock modes (and any lock mode, supplemental validation or override procedures associated with this lock mode) will be the default lock mode. In order to illustrate the invention, and not to limit it, a dual-mode locking system will be described, and the only differences between the two lock modes will be the supplemental validation and lock override procedures associated with them. In the illustrative embodiment, the first lock mode is associated with a competitive, normal timeout lock override procedure and has no supplemental validation procedure, while the second lock mode does have an associated supplemental validation procedure and is also associated with two lock override procedures, one a competitive, long timeout procedure and the other a cooperative, event-based lock override procedure. However, if one byte is allocated to the LOCK_MODE parameter, up to two hundred fifty-six lock modes, with their associated lock mode, supplemental validation and lock override procedures, may be supported within the atomic data structure for MAIN.
In the illustrative embodiment the LOCK_MODE value for a normal timeout mode is the default setting “0” for the LOCK_MODE parameter, while “T” is the LOCK_MODE value for long timeout.
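- The association between a LOCK_MODE value and its supplemental validation and override procedures can be pictured as a lookup table. The sketch below follows the dual-mode example in the text, but the timeout durations, the string encoding of the mode values, and all of the names are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModePolicy:
    timeout_s: float               # competitive timeout override (duration is hypothetical)
    supplemental_validation: bool  # extra validation performed when taking the lock
    cooperative_override: bool     # event-based override by a detecting processor


# One byte of LOCK_MODE would allow up to 256 entries; only two are used here.
MODE_TABLE = {
    "0": ModePolicy(timeout_s=0.5, supplemental_validation=False,
                    cooperative_override=False),   # default, normal timeout
    "T": ModePolicy(timeout_s=30.0, supplemental_validation=True,
                    cooperative_override=True),    # long timeout
}


def timed_out(lock_mode, held_for_s):
    """True if a holder in this mode has exceeded the mode's own timeout."""
    return held_for_s > MODE_TABLE[lock_mode].timeout_s


normal_expired = timed_out("0", held_for_s=5.0)
long_expired = timed_out("T", held_for_s=5.0)
```

- Keeping a separate timeout per mode lets a long transaction run under “T” without being cut off, while failures during normal transactions under “0” are still detected quickly.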
- The LOCK_PW parameter, 35, indicates whether a valid lock is held. It has a valid value for the “no lock holder” state, and one or more valid values indicating that the lock is held. All other values are invalid. In one aspect of the invention, each shared resource, 4, is assigned its own value of LOCK_PW, 35. This parameter may be used to identify certain protocol failures requiring lock overrides.
- The CURRENT_HOLDER parameter, 37, indicates which place in the lock request queue presently holds the lock. It indicates a place in line, not an identification, but, as will be explained below, it enables the requestor which holds that place in line to determine when it may take the lock.
- The NEXT_FREE parameter, 39, indicates the next available place in the lock queue. Both CURRENT_HOLDER and NEXT_FREE are numeric parameters whose values wrap so that the allowable size of the parameter is never exceeded.
- AUX, the auxiliary lock data structure, may be a single entry, the entry being a single atomic structure, or it may be an array which includes an entry for each processor, each entry being a single atomic structure. In the embodiment shown, AUX, 40, is a single entry, short enough for an atomic operation, and typically has the following form:
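The MAIN fields described above can be pictured with a minimal Python sketch. The field names follow the text; the one-byte counter width, the concrete LOCK_PW values, and the helper names `wrap`, `pw_state` and `waiters_ahead` are assumptions made only for illustration and are not taken from the specification.

```python
from dataclasses import dataclass

QUEUE_MODULUS = 256   # assumption: queue counters are one byte wide
NO_HOLDER_PW = 0x00   # assumption: the valid "no lock holder" value
HELD_PW = 0xA5        # assumption: one valid "lock held" value


@dataclass
class Main:
    """Illustrative model of the MAIN lock entry."""
    holder_id: int = 0
    lock_mode: str = "0"          # "0" is the default mode in the text
    lock_pw: int = NO_HOLDER_PW
    current_holder: int = 0
    next_free: int = 0


def wrap(n: int) -> int:
    """Keep a queue counter inside its allowable range."""
    return n % QUEUE_MODULUS


def pw_state(main: Main) -> str:
    """Classify LOCK_PW as 'free', 'held', or 'invalid'."""
    if main.lock_pw == NO_HOLDER_PW:
        return "free"
    if main.lock_pw == HELD_PW:
        return "held"
    return "invalid"


def waiters_ahead(main: Main, my_number: int) -> int:
    """Queue places between the current holder and my place, wrap-adjusted."""
    return wrap(my_number - main.current_holder)
```

Because both counters wrap, a requestor holding place 0 can still be two places behind a holder at place 254, which is why the distance is computed modulo the counter range.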
- MY_ID (optional), LOCK_MODE_AUX, MY_NUMBER_AUX,
- TIME_STAMP_AUX (optional).
- Since the auxiliary lock data structure, 40, is used primarily to assist in determining when a protocol failure requiring certain lock override procedures has occurred, it is typically not updated every time a new requestor takes the lock. This feature of the invention will be further described in connection with FIG. 7.
- The MY_ID parameter, 41, is an identifier uniquely associated with each processor. As will be further discussed below, the entry is typically refreshed only when that processor is the requestor which currently holds the lock, and only in connection with certain lock modes. In the array form of AUX, only one value of MY_ID(i) is valid for any given entry, since each entry is associated with and can be written by only one processor, but in the illustrated form, N different values of MY_ID are valid, one being associated with each of the N possible requestors. This parameter is optional, but may be used for validation in certain protocol failure situations, as further explained below.
- The LOCK_MODE_AUX parameter, 43, specifies the type of lock which is currently being used by the current lock holder. It has the same possible values and serves the same purpose as the LOCK_MODE parameter, 33.
- The MY_NUMBER_AUX parameter, 45, indicates what place in the queue the processor holds. The entry is typically refreshed only in connection with certain lock modes, when a requestor holds the lock in that mode. In the array form of AUX, each processor may refresh only the value in its own entry in the array.
- The TIME_STAMP_AUX parameter, 47, indicates the time at which the processor making the entry obtained the lock. It is typically used to start a timeout clock. This parameter is optional, but may be used for certain types of lock overrides, as will be further explained below.
- In addition to MAIN, 30, and AUX, 40, which must be stored in shared memory, 2, so that all possible requestors may access them, two additional numerical variables, MY_NUMBER, 51 a-n, and TIME_STAMP_L, 53 a-n, are associated with each potential requestor. While these may be stored in any system resource to which the requestor has access, typically, both MY_NUMBER, 51 i, and TIME_STAMP_L, 53 i, are stored in the local memory associated with each potential requestor in order to reduce bus traffic. Each requestor also requires sufficient local memory to store the two most recent values of MAIN and the value of an AUX entry.
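As an illustration only, the AUX entry and the per-requestor local state just described might be modeled as follows. The class names `AuxEntry` and `RequestorLocal`, the tuple layout of a MAIN snapshot, and the two helper methods are hypothetical; the text specifies only which values must be kept.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# A MAIN snapshot as a plain tuple (layout is an assumption):
# (HOLDER_ID, LOCK_MODE, LOCK_PW, CURRENT_HOLDER, NEXT_FREE)
MainSnapshot = Tuple[int, str, int, int, int]


@dataclass
class AuxEntry:
    """One AUX entry; optional fields are None when not implemented."""
    lock_mode_aux: str = "0"
    my_number_aux: int = 0
    my_id: Optional[int] = None
    time_stamp_aux: Optional[float] = None


@dataclass
class RequestorLocal:
    """State a requestor keeps in its own local memory."""
    my_number: int = 0
    time_stamp_l: float = 0.0
    main_history: List[MainSnapshot] = field(default_factory=list)
    aux_copy: Optional[AuxEntry] = None

    def remember_main(self, snapshot: MainSnapshot) -> None:
        """Keep only the two most recent values of MAIN."""
        self.main_history = (self.main_history + [snapshot])[-2:]

    def main_changed(self) -> bool:
        """True when the latest reading of MAIN differs from the previous one."""
        h = self.main_history
        return len(h) == 2 and h[0] != h[1]
```

Keeping two MAIN snapshots locally lets a requestor notice a change in the lock state by comparison alone, without any extra access to shared memory.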
- Turning now to FIG. 5, the steps used to enter the lock request queue, and to poll for and obtain the lock during normal operation of the system described herein are illustrated in a flowchart. Prior to entering the process described in FIG. 5,
processor 1 a has, in the course of executing process A, identified a need to obtain a lock on a shared resource 4, illustratively, the replacement queue, 20. In a single atomic read-modify-write operation, represented on the flowchart by the steps described next, processor 1 a initiates its attempt to obtain the lock. In step 100, processor 1 a reads MAIN, and in step 102, determines whether the lock is validly held. If the lock is currently held by another requestor, the LOCK_PW, 35, will have a valid value indicating that the lock is held.
- If this condition is not true,
processor 1 a will reserve the lock in default mode and establish a lock request queue at step 106 by setting HOLDER_ID, 31, to its own value, LOCK_MODE, 33, to “0”, LOCK_PW, 35, to a valid value, CURRENT_HOLDER, 37, to the value presently entered in NEXT_FREE, 39, and by incrementing NEXT_FREE, 39. Next, at step 131, processor 1 a makes a good exit to process A. Processor 1 a may call the supplemental validation process described in connection with FIG. 7, either immediately upon completing step 106, if it requires the lock in a mode other than the default mode, or at some later point in its execution of process A, if, for example, an error or branch condition creates the need for an alternate activity, like recovering the structure of the shared resource, which would require the alternate lock mode.
- Assuming that the lock is validly held by another processor,
processor 1 a queues for the lock in a single atomic read-modify-write operation represented in FIG. 5 by the steps described next. If, upon reading MAIN at step 100, processor 1 a determines that the lock is validly held by another requestor by the method previously described in connection with step 102, then, at step 104, processor 1 a will reserve the next available number in the queue by incrementing the value of NEXT_FREE in MAIN. At step 108, processor 1 a enters the queue by setting the value of MY_NUMBER, 51 a, to the value of NEXT_FREE it read in step 102.
- The processor then updates the timeout parameters at
step 110, assuming the lock mode it detected in step 100 by reading MAIN has a timeout-based lock override procedure associated with it in lock services procedure 6. If there is no timeout-based lock override procedure associated with the lock mode, then processor 1 a may jump directly to the lock polling sequence beginning at step 118. In the exemplary embodiment shown in FIG. 5, there is a timeout-based lock override procedure associated with each of the two possible lock modes, so at step 110, processor 1 a updates in its local memory the override parameters associated with the lock mode it has found to be in effect. Each lock mode which has an associated timeout procedure may use a different source for its reference value and a different predetermined interval associated with it in lock services procedure 6. Thus, for example, the normal timeout mode may obtain its reference value from its own clock and have a timeout interval of a few seconds or less, while the long timeout mode may obtain its reference value from AUX, 40, and have a timeout interval of many minutes. So, in one aspect of the invention, processor 1 a performs the update by saving the time at which step 108 occurs (as measured by its own internal clock) in TIME_STAMP_L, 53, for use as a reference value in monitoring whether a timeout has occurred. In this approach, the timeout is established and monitored without involving scarce system resources such as the buses in any additional I/O cycles, so it is suitable for use as the lock override procedure corresponding to the default lock mode. In another aspect of the invention, processor 1 a may perform this update by taking a timestamp value from TIME_STAMP_AUX, 47, for use as a reference value in monitoring whether a timeout has occurred. If AUX is an array, processor 1 a determines what entry in AUX to use for this purpose from the value of HOLDER_ID, 31, which processor 1 a read in MAIN, 30, at step 100.
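The request sequence of steps 100 through 108 can be sketched in Python as below. This is a simplified model, not the patented implementation: a `threading.Lock` stands in for the hardware atomic read-modify-write, and the names `SharedMain` and `request_lock`, the password values, and the one-byte counter width are assumptions.

```python
import threading
from dataclasses import dataclass
from typing import Tuple

QUEUE_MODULUS = 256   # assumption: one-byte queue counters
NO_HOLDER_PW = 0x00   # assumption: valid "no lock holder" value
HELD_PW = 0xA5        # assumption: valid "lock held" value
DEFAULT_MODE = "0"


@dataclass
class Main:
    holder_id: int = 0
    lock_mode: str = DEFAULT_MODE
    lock_pw: int = NO_HOLDER_PW
    current_holder: int = 0
    next_free: int = 0


class SharedMain:
    """MAIN in shared memory; the mutex models one atomic operation."""
    def __init__(self) -> None:
        self.main = Main()
        self.atomic = threading.Lock()


def request_lock(shared: SharedMain, my_id: int) -> Tuple[bool, int]:
    """Return (got_lock, my_number) after one atomic read-modify-write."""
    with shared.atomic:
        m = shared.main                      # step 100: read MAIN
        if m.lock_pw != HELD_PW:             # step 102: not validly held
            m.holder_id = my_id              # step 106: take lock, default mode
            m.lock_mode = DEFAULT_MODE
            m.lock_pw = HELD_PW
            m.current_holder = m.next_free
            m.next_free = (m.next_free + 1) % QUEUE_MODULUS
            return True, m.current_holder
        my_number = m.next_free              # step 104: reserve a place
        m.next_free = (m.next_free + 1) % QUEUE_MODULUS
        return False, my_number              # step 108: caller keeps MY_NUMBER
```

In this sketch a first caller takes the lock at place 0 and later callers receive places 1, 2, and so on, which they would store locally as MY_NUMBER before polling.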
For validation, processor 1 a may confirm that its LOCK_MODE_AUX is set to the second lock mode, and, if MY_ID is implemented, may confirm that AUX also has a value of MY_ID corresponding to the value of HOLDER_ID. If, when processor 1 a executes these validation steps, AUX is found not to be valid, processor 1 a may default to a short, fixed, timeout value. If a valid AUX entry is found, processor 1 a will save the time from TIME_STAMP_AUX to the processor's local memory, for example in TIME_STAMP_L, for use in monitoring whether a timeout has occurred. In this aspect of the invention, several additional I/O cycles involving scarce system resources are required to validate the lock mode and establish the reference value for the timeout, so this approach is most suitable when either the timeout procedure itself or the lock mode procedure it is associated with (or both) are expected to consume many more I/O cycles or system resources than the default lock mode. In this situation, the small number of I/O cycles used may be justified by the decreased likelihood that one or both of these procedures will be initiated in error.
- Assuming that a timeout-based lock override has been determined to be associated with the operative lock mode, in
step 116, processor 1 a will continue with the procedure by testing to see if a timeout has occurred by determining whether the predetermined interval has elapsed since the reference value for the timeout was updated. If a timeout is detected, at step 130, processor 1 a enters the lock forcing process further described in connection with FIG. 6. If a timeout has not occurred, processor 1 a begins polling MAIN. In one embodiment of the invention, at step 118, processor 1 a estimates, before every repetition of polling step 120, the number of prior entries in the lock request queue and adaptively delays its polling period as a function of said number of prior entries in said lock request queue. The polling period may be estimated as the product of the number of significant processor operations expected to be performed before processor 1 a obtains the lock as a function of the number of prior entries in said lock request queue and the average duration of a significant processor operation involving the shared resource. This delay procedure is further described in U.S. Ser. No. 09/312,146 filed 14 May 1999 by Ofer et al and entitled “Adaptive Delay of Polling Frequencies in a Distributed System with a Queued Lock”, which is herein incorporated by reference in its entirety.
- After polling MAIN in
step 120, processor 1 a performs a sequence of sanity checks on the updated value of MAIN, 30, which it has obtained from the polling step, 120, and stored in its local memory. The sanity check sequence may also be entered from the lock forcing process of step 130 after a failed attempt to force the lock. If processor 1 a determines at step 122 that the LOCK_PW, 35, is invalid, processor 1 a will jump to step 100 and attempt to obtain the lock. If the LOCK_PW, 35, is valid and processor 1 a finds at step 124 that it has obtained the lock, i.e. that the value of CURRENT_HOLDER, 37, read at step 120 equals MY_NUMBER, 51 a, processor 1 a will enter the good exit/supplemental validation process at step 131. If upon reading MAIN, 30, in step 120, processor 1 a determines at step 122 that the LOCK_PW, 35, is valid and at step 124 that the lock is still held by another requestor by the method previously described in connection with step 102, then, at step 126, processor 1 a compares MY_NUMBER with CURRENT_HOLDER and NEXT_FREE to determine whether processor 1 a is still in the queue. If, when adjusted for the queue wrap, MY_NUMBER is not between CURRENT_HOLDER and NEXT_FREE, this indicates that the lock has been reset due to a lock override, as will be described further in connection with FIG. 6, and processor 1 a is not a member of the current queue of lock requestors. Processor 1 a then goes to step 100 and repeats the lock request steps. If step 126 confirms that processor 1 a is still part of the current lock request queue, then, as will be further discussed in connection with the lock override procedures described below, at step 128 processor 1 a will determine if CURRENT_HOLDER, 37, LOCK_MODE, 33, or LOCK_PW, 35, has changed. At step 129, processor 1 a may update its timeout parameters if any of these has changed since its last reading of MAIN.
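A rough sketch of one polling iteration, covering the adaptive delay of step 118 and the sanity checks of steps 122 through 126, is given below. The function names, the password constants, the counter width, and the simplification of one significant operation per prior queue entry are all assumptions for illustration. The sketch also assumes the queue never holds a full counter range of entries at once.

```python
QUEUE_MODULUS = 256   # assumption: one-byte queue counters
NO_HOLDER_PW = 0x00   # assumption: valid "no lock holder" value
HELD_PW = 0xA5        # assumption: valid "lock held" value


def wrap(n: int) -> int:
    return n % QUEUE_MODULUS


def polling_delay(my_number: int, current_holder: int,
                  avg_op_seconds: float) -> float:
    """Step 118: delay grows with the number of prior queue entries."""
    return wrap(my_number - current_holder) * avg_op_seconds


def still_queued(my_number: int, current_holder: int, next_free: int) -> bool:
    """Step 126: MY_NUMBER lies in [CURRENT_HOLDER, NEXT_FREE), wrap-adjusted."""
    return wrap(my_number - current_holder) < wrap(next_free - current_holder)


def classify_poll(lock_pw: int, current_holder: int,
                  next_free: int, my_number: int) -> str:
    """Decide what to do with the value of MAIN read at step 120."""
    if lock_pw not in (NO_HOLDER_PW, HELD_PW):
        return "retry_request"               # step 122: invalid, go to step 100
    if lock_pw == HELD_PW and current_holder == my_number:
        return "got_lock"                    # step 124: good exit
    if not still_queued(my_number, current_holder, next_free):
        return "retry_request"               # queue was reset by an override
    return "keep_polling"
```

The wrap-adjusted comparison is what allows place 1 to count as queued behind a holder at place 254, while the same place 1 is rejected once an override has moved the queue to places 10 and 11.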
- In one aspect of the invention, each processor implements a monitoring procedure, M, for detecting a predetermined indication of protocol failure by any one of the plurality of processors and identifying the failing processor. This procedure, M, is external to the lock contention procedure, but may be used to trigger certain lock override procedures, for example, the cooperative lock override process described in connection with FIG. 8. In one aspect of the invention, shown in FIG. 5,
processor 1 a will determine at step 112 whether the lockholder is operating in a lock mode associated with an override which uses this trigger, such as the cooperative lock override, and if the lockholder is, process M may be periodically polled by processor 1 a. Upon receiving an indication of protocol failure during this poll, processor 1 a will initiate a lock override process at step 114, as further described in connection with FIG. 8. At the conclusion of the process shown at step 114, there will typically be a new lockholder, and processor 1 a will go to step 116 to continue checking for timeouts. Alternatively, procedure M may cause a jump to a lock override process at step 114, as further described in connection with FIG. 8. The procedure M is shown for convenience operating at step 113, although it will be understood that it operates periodically so long as the processors are running. In a preferred embodiment, the polls do not occur when the lockholder is operating in the default lock mode, but only in connection with a more resource-intensive lock mode such as the long timeout mode. However, in an alternative embodiment, polls of or jumps to and from process M may occur at any time in the course of the lock contention procedure.
- If
processor 1 a determines at step 112 that the present lock mode is not associated with process M, or if no protocol failure is indicated by process M in step 113, processor 1 a will continue checking for timeouts at step 116. So long as processor 1 a does not obtain the lock and no lock override is initiated as described below in connection with FIG. 6 or FIG. 8, processor 1 a repeats the applicable steps in the lock polling sequence, 116 through 120, and the subsequent sanity check sequence, 122 through 129.
- FIG. 6 is a flowchart illustrating steps used to perform a lock override procedure associated with a selected one of the lock modes in the system described herein. This lock override procedure is a timeout procedure. Different timeout procedures with different reference values and timeout intervals may be associated with different lock modes. In the exemplary embodiment, for example, a normal, i.e. short, timeout interval using a first reference value is associated with the default “0” lock mode and a long timeout interval using a second reference value is associated with the other “T” lock mode. Referring now to step 116 in FIG. 5,
processor 1 a tests to see if a timeout has occurred by determining whether a predetermined interval has elapsed since the reference value for the timeout was updated. If a timeout has occurred, processor 1 a enters the lock forcing process of step 130. Going now to FIG. 6, where the process of step 130 is illustrated in more detail, if processor 1 a determines that a timeout has occurred, it enters the lock forcing process at Y upon setting a hardware lock and then, in a single atomic read-modify-write operation, represented on the flowchart by the steps described next, initiates its attempt to obtain the lock. At step 132, processor 1 a will read MAIN, 30, and at step 134 will determine whether MAIN, 30, has changed since processor 1 a last read MAIN and stored its value in local memory. If it has not, at step 136, processor 1 a will force the lock, and reset the entire lock request queue, by setting CURRENT_HOLDER, 37, equal to the value of NEXT_FREE, 39, it read in step 132, incrementing NEXT_FREE, setting the LOCK_MODE, 33, to the default mode indicator (regardless of which lock mode processor 1 a actually requires), setting HOLDER_ID, 31, to its own identifier and setting the LOCK_PW, 35, to a valid password. At step 138, processor 1 a will complete the lock override procedure by setting MY_NUMBER, 51 a, equal to the value of NEXT_FREE, 39, it read in step 132. Processor 1 a will then make a good exit to process A. As discussed in connection with step 131 in FIG. 5, should processor 1 a require the lock in some mode other than the default mode, it will, as a part of this process, proceed as described in connection with FIG. 7. Otherwise, it will simply take the lock and exit the lock contention procedure.
- If more than one processor is in the lock request queue when the first mode timeout occurs, it is possible that more than one processor will detect the event and attempt to force the lock. It would be undesirable for more than one processor to do so successfully. So, if
processor 1 a detects in step 134 that MAIN, 30, has changed since the last time processor 1 a polled MAIN, it will release its hardware lock at Z and exit the forcing procedure. It will then continue with the sanity check sequence described in connection with FIG. 5, beginning with step 122, if implemented, using the new value of MAIN which it read at step 132, and proceeding to steps 124 and beyond. Typically, in this scenario, processor 1 a will detect in step 126 that the lock request queue has been reset, and will then repeat the lock request steps. If processor 1 a has not detected the timeout before the lock is forced, and so never enters the lock forcing process, then when processor 1 a reaches step 126 in its regular polling sequence, it will detect that MY_NUMBER, 51 a, is no longer in the queue and will also repeat the lock request steps.
- FIG. 7 is a flowchart illustrating steps used to select a lock mode (in this case, the second lock mode) other than the default lock mode, to perform a supplemental validation associated with the selected lock mode, and to initialize a second lock override procedure associated with the selected lock mode. FIG. 8 will describe how the second lock override procedure is performed. The second lock override procedure is a cooperative lock override procedure, and, for purposes of illustration, will be associated with the second, or long timeout lock mode. Because it involves a number of steps using scarce system resources, the cooperative lock override procedure is most suitably associated with a lock mode expected to consume many more I/O cycles or system resources than the default lock mode. To minimize the likelihood of tying up these system resources in error, a supplemental validation procedure is selectively associated with this lock mode. For purposes of this discussion and the one that follows in connection with FIG. 9, it will be assumed that
processor 1 a has queued for the lock and determined in step 124 of FIG. 5 that its MY_NUMBER, 51 a, corresponds to CURRENT_HOLDER, 37. Processor 1 a has therefore made a good exit to process A at step 131. It will also be assumed that processor 1 f is next in the lock request queue.
- Turning now to FIG. 7, where the supplemental validation process is described in more detail,
processor 1 a calls the supplemental validation process from process A, as discussed in connection with step 131 of FIG. 5, because it needs an alternate mode with which supplemental validation is associated, in this case the long timeout mode. In step 140, processor 1 a updates AUX, 40, by setting LOCK_MODE_AUX, 43, to the identifier of the lock mode it requires, in this case the identifier, “T”, for long timeout mode, and MY_NUMBER_AUX, 45, to MY_NUMBER, 51 a, the number of its place in the queue. If AUX is an array, processor 1 a will update only the values in its own entry AUX(a). It should be noted that TIME_STAMP_AUX, 47, and MY_ID, 41, are not required parameters in connection with the second lock override procedure illustrated in FIG. 7, although either or both may optionally be used for validation in connection with this procedure. If a timeout is associated with the selected lock mode, or if TIME_STAMP_AUX, 47, is to be used for validation, processor 1 a will also update TIME_STAMP_AUX, 47, to the time at which step 140 occurs, and if MY_ID, 41, is implemented in AUX, will update MY_ID to the value of its unique identifier. It is not necessary to implement a timeout in addition to the cooperative lock override procedure described below in connection with any selected mode, but depending on the events used to trigger the cooperative lock override procedure, it may be desirable to do so. In the illustrative embodiment, as will be discussed in connection with FIG. 8, both an event-based cooperative lock override procedure and a timeout-based lock override procedure are associated with the long timeout mode. Typically, if both are implemented, processor 1 a reads an internal clock, preferably the system clock, to determine the time at which step 140 occurred and puts this value in TIME_STAMP_AUX, 47.
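The FIG. 7 sequence, step 140 above together with the confirm-and-update steps 142 through 146, might look like the following in Python. This is only a sketch: the mutex stands in for the atomic read-modify-write, and `select_long_timeout`, `SharedState`, and the password constant are hypothetical names introduced for the example.

```python
import threading
from dataclasses import dataclass
from typing import Optional

HELD_PW = 0xA5            # assumption: valid "lock held" value
DEFAULT_MODE = "0"
LONG_TIMEOUT_MODE = "T"


@dataclass
class Main:
    holder_id: int = 0
    lock_mode: str = DEFAULT_MODE
    lock_pw: int = HELD_PW
    current_holder: int = 0
    next_free: int = 1


@dataclass
class AuxEntry:
    lock_mode_aux: str = DEFAULT_MODE
    my_number_aux: int = 0
    my_id: Optional[int] = None
    time_stamp_aux: Optional[float] = None


class SharedState:
    """MAIN and AUX in shared memory; the mutex models atomic access to MAIN."""
    def __init__(self, main: Main) -> None:
        self.main = main
        self.aux = AuxEntry()
        self.atomic = threading.Lock()


def select_long_timeout(shared: SharedState, my_id: int,
                        my_number: int, now: float) -> bool:
    """Return True on success, False for the 'bad status' exit."""
    # Step 140: publish the requested mode and queue place in AUX.
    shared.aux = AuxEntry(LONG_TIMEOUT_MODE, my_number, my_id, now)
    with shared.atomic:                       # steps 142 through 146
        m = shared.main
        if m.current_holder != my_number or m.lock_pw != HELD_PW:
            return False                      # custody of the lock not confirmed
        m.lock_mode = LONG_TIMEOUT_MODE       # step 146: switch MAIN to mode "T"
        return True
```

Writing AUX before MAIN means that any processor that later sees mode "T" in MAIN can expect a matching AUX entry to be present already.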
In an atomic read-modify-write operation, processor 1 a then reads MAIN, 30, at step 142 and determines at step 144 whether it validly holds the lock by determining whether MY_NUMBER, 51 a, is equal to CURRENT_HOLDER and the LOCK_PW, 35, has a valid value. Since processor 1 a has just taken the lock, in the absence of a memory corruption involving MAIN, 30, or other protocol error, this operation is expected to confirm its custody of the lock. Upon receiving confirmation that it still holds the lock and still as part of the atomic operation begun in step 142, processor 1 a updates MAIN, 30, in step 146 by setting LOCK_MODE, 33, to the mode indicator “T”, and updating the validation parameters implemented in MAIN. Processor 1 a then exits the supplemental validation process at 151 and proceeds with process A. If any confirmation step in the sequence fails to confirm that processor 1 a holds the lock, then processor 1 a gives a “bad status” error message to process A at step 148 and exits the lock contention process to process A at step 150, relinquishing any hardware locks as it does so. Although each confirmation requires an extra bus cycle, any failure to confirm is strong evidence of a protocol violation involving the processor holding the lock or the lock itself. Once a resource is locked into a long timeout mode (or another high I/O demand mode) in error, detecting and correcting the problem typically requires a great many bus cycles. The validation steps significantly decrease the likelihood of such errors.
- The processes indicated at
steps 113 and 114 in FIG. 5 are further described in connection with FIG. 8. Suppose that, at step 113 in FIG. 5, processor 1 c detects a malfunction in processor 1 a via process M and enters the cooperative override process shown at step 114 in FIG. 5. At step 152, processor 1 c reads AUX, 40, or, if AUX is an array, AUX(a) corresponding to processor 1 a. At step 154, processor 1 c determines whether processor 1 a had set its LOCK_MODE_AUX entry, 43, to indicate a mode associated with the cooperative lock override procedure, in our example, the long timeout mode. The value of MY_NUMBER_AUX, 45, in AUX, 40, indicates what place in the queue a processor held the last time it updated AUX. However, since each processor updates its entry in AUX only when it requires a long timeout mode and corruption of the data in the interim periods is possible, it is desirable to validate AUX, 40, using either the time in TIME_STAMP_AUX, 47, or the processor identifier in MY_ID, 41, or both. If the entry is corrupt, it is unlikely that MY_ID will contain the proper identifier and if the entry is outdated, the time in TIME_STAMP_AUX(a) will so indicate. Since each AUX entry is atomic, all of the reads necessary for validation require only one bus I/O cycle. If AUX is not validated or does not include the indicator for the long timeout mode, the second lock override procedure will not be implemented, and processor 1 c will exit the sequence at 168. If AUX indicates that processor 1 a held the lock in long timeout mode, and the requisite validation criteria are satisfied, then at step 156, processor 1 c reads MAIN, 30, and at step 158 attempts to validate that processor 1 a held the lock in long timeout mode. If MAIN, 30, is not validated or does not include the requisite indicators for the long timeout mode, the second lock override procedure will not be implemented, and, as before, processor 1 c will exit the sequence at 168.
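The two validation stages of the cooperative override, AUX first (steps 152 and 154) and then MAIN (steps 156 and 158), can be sketched as two predicates. The function names are hypothetical, and the `max_age` staleness threshold is an assumption: the text says only that an outdated TIME_STAMP_AUX will indicate a stale entry, without giving a limit.

```python
from dataclasses import dataclass
from typing import Optional

HELD_PW = 0xA5            # assumption: valid "lock held" value
LONG_TIMEOUT_MODE = "T"


@dataclass
class Main:
    holder_id: int
    lock_mode: str
    lock_pw: int
    current_holder: int
    next_free: int


@dataclass
class AuxEntry:
    lock_mode_aux: str
    my_number_aux: int
    my_id: Optional[int] = None
    time_stamp_aux: Optional[float] = None


def aux_names_failed_holder(aux: AuxEntry, failed_id: int,
                            now: float, max_age: float) -> bool:
    """Steps 152 and 154: does AUX say the failed processor held the lock?"""
    if aux.lock_mode_aux != LONG_TIMEOUT_MODE:
        return False
    if aux.my_id is not None and aux.my_id != failed_id:
        return False                      # corrupt or foreign entry
    if aux.time_stamp_aux is not None and now - aux.time_stamp_aux > max_age:
        return False                      # outdated entry
    return True


def main_confirms_holder(main: Main, aux: AuxEntry, failed_id: int) -> bool:
    """Steps 156 and 158: does MAIN agree with the AUX entry?"""
    return (main.current_holder == aux.my_number_aux
            and main.lock_mode == LONG_TIMEOUT_MODE
            and main.lock_pw == HELD_PW
            and main.holder_id == failed_id)
```

Only when both predicates hold would the detecting processor go on to rewrite MAIN; a failure of either one corresponds to the exit at 168.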
If processor 1 c is not queued for the lock, at 168 it will exit the lock contention procedure, but if processor 1 c is a member of the lock request queue, from 168 it will continue the lock polling sequence at step 116 in FIG. 5.
- If at
step 158, processor 1 c does validate that processor 1 a held the lock in long timeout mode by finding that CURRENT_HOLDER, 37, has the same value as the value of MY_NUMBER_AUX, 45, which processor 1 c read in step 152, that LOCK_MODE, 33, is set to indicate the long timeout mode, and that LOCK_PW, 35, and HOLDER_ID, 31, if implemented, validate that processor 1 a holds the lock, then at step 160, processor 1 c determines whether the value of NEXT_FREE, 39, read at step 156 is equal to CURRENT_HOLDER, 37, plus 1. If it is, there is no other requestor in the queue, so at step 162, processor 1 c updates MAIN to indicate the lock is not held by setting CURRENT_HOLDER, 37, equal to the value of NEXT_FREE, 39, setting the LOCK_MODE, 33, to its default value and setting the LOCK_PW, 35, to indicate “no lock holder”. If NEXT_FREE, 39, is not equal to CURRENT_HOLDER, 37, plus 1, there are other requestors in the lock queue, so at step 164, processor 1 c updates MAIN, 30, by incrementing CURRENT_HOLDER, 37, setting LOCK_MODE, 33, to its default value and setting the LOCK_PW, 35, to any valid value. Following either update, at step 166, processor 1 c invalidates AUX, 40, by writing over at least MY_ID, 41, and preferably the entire entry, and then exits the cooperative lock override procedure at step 168, as described above. Meanwhile, the processors in the lock request queue will continue with the lock polling sequence described in connection with FIG. 5. Processor 1 f, the lock requestor which has been moved to the head of the queue by processor 1 c, will detect on its next poll that the LOCK_PW, 35, is valid and that MY_NUMBER, 51 f, is now equal to CURRENT_HOLDER, 37, and will accept the lock.
- Turning now to FIG. 9, the procedure for unlocking the lock in the absence of a protocol error is shown. As indicated above in connection with FIG. 8, it will be assumed that
processor 1 a holds the lock in long timeout mode and processor 1 f is the next requestor in the queue. Except where indicated, the steps are the same regardless of whether processor 1 a held the lock in default mode or in another mode. It will also be assumed that processor 1 a has successfully completed the portion of process A which required a lock on the shared resource 4 and still retains the lock, i.e. that no other processor has completed a lock override procedure. At step 170, processor 1 a reads MAIN, 30, and at step 172 determines whether MAIN is valid and whether the value of CURRENT_HOLDER, 37, read at step 170 is equal to the value of MY_NUMBER, 51 a. If both conditions are satisfied, then at step 174, processor 1 a determines whether the value of NEXT_FREE, 39, read at step 170 is equal to CURRENT_HOLDER, 37, plus 1. If it is, there is no other requestor in the queue, so at step 176, processor 1 a updates MAIN to indicate the lock is not held by setting CURRENT_HOLDER, 37, equal to the value of NEXT_FREE, 39, setting the LOCK_MODE, 33, to its default value and setting the LOCK_PW, 35, to indicate “no lock holder”. If NEXT_FREE, 39, is not equal to CURRENT_HOLDER, 37, plus 1, there are other requestors in the lock queue, so at step 178, processor 1 a updates MAIN, 30, by incrementing CURRENT_HOLDER, 37, setting LOCK_MODE, 33, to its default value and setting the LOCK_PW, 35, to any valid value. These steps are performed as an atomic read-modify-write operation. Following either update, or following step 172 if either of the conditions is not satisfied, processor 1 a decides, at step 180, whether it held the lock in a lock mode associated with a lock override procedure which requires a reference to AUX, 40, such as the cooperative lock override procedure described in connection with FIG. 8, or a timeout-based procedure which uses TIME_STAMP_AUX, 47, as its reference value. If it did not hold the lock in such a mode, it will exit the lock services procedure, 6 a, to resume process A.
However, in the exemplary embodiment, processor 1 a held the lock in long timeout mode, which is associated with both the cooperative lock override procedure and a timeout procedure which uses TIME_STAMP_AUX, 47, as its reference value, so at step 182, processor 1 a invalidates AUX by writing over at least MY_ID, and preferably the entire entry, and then exits the lock services procedure to resume process A. Meanwhile, processor 1 f, continuing with the lock contention procedure of FIG. 5, will shortly discover at step 124 that CURRENT_HOLDER, 37, is now equal to MY_NUMBER, 51 f, and so, in normal operation, the lock will pass to the next member of the queue.
- Having described a preferred embodiment of the present invention, it will now become apparent to those of skill in the art that other embodiments incorporating its concepts may be provided. It is felt therefore that this invention should not be limited to the disclosed embodiment but rather should be limited only by the spirit and scope of the following claims.
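The MAIN update used to release the lock (steps 174 through 178) is the same update the cooperative override applies at steps 160 through 164, so a single helper can serve both. The sketch below assumes a mutex in place of the atomic read-modify-write; `pass_lock_on`, `unlock`, `SharedMain`, the password values, and the counter width are hypothetical names and values.

```python
import threading
from dataclasses import dataclass

QUEUE_MODULUS = 256   # assumption: one-byte queue counters
NO_HOLDER_PW = 0x00   # assumption: valid "no lock holder" value
HELD_PW = 0xA5        # assumption: valid "lock held" value
DEFAULT_MODE = "0"


@dataclass
class Main:
    holder_id: int = 0
    lock_mode: str = DEFAULT_MODE
    lock_pw: int = NO_HOLDER_PW
    current_holder: int = 0
    next_free: int = 0


class SharedMain:
    def __init__(self, main: Main) -> None:
        self.main = main
        self.atomic = threading.Lock()   # models one atomic operation


def pass_lock_on(m: Main) -> None:
    """Free the lock or hand it to the next place in the queue."""
    m.lock_mode = DEFAULT_MODE
    if (m.current_holder + 1) % QUEUE_MODULUS == m.next_free:
        m.current_holder = m.next_free          # nobody waiting
        m.lock_pw = NO_HOLDER_PW
    else:
        m.current_holder = (m.current_holder + 1) % QUEUE_MODULUS
        m.lock_pw = HELD_PW                     # next requestor now holds it


def unlock(shared: SharedMain, my_number: int) -> bool:
    """Steps 170 through 178; False means the caller no longer held the lock."""
    with shared.atomic:
        m = shared.main
        if m.lock_pw != HELD_PW or m.current_holder != my_number:
            return False                        # step 172 failed
        pass_lock_on(m)
        return True
```

A caller that held the lock in a mode relying on AUX would additionally overwrite its AUX entry after this call; that part is omitted from the sketch.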
Claims (8)
1. A method for providing cooperative queued locking and unlocking services for managing a shared resource in a data processing system including a plurality of processors as lock requesters, each processor supporting atomic operations and being coupled to the shared resource through one or more first common communication channels, including the steps of:
providing for each processor a monitoring procedure for detecting a predetermined indication of protocol failure by any one of the plurality of processors and identifying the failing processor;
providing for each processor a lock services procedure including a queuing procedure for unsuccessful lock requestors, locking and unlocking procedures for locking and unlocking the shared resource by a successful lock requestor, and a lock override procedure responsive to the detection of the predetermined indication of protocol failure;
providing for the shared resource, an associated main lock data structure stored in a shared memory accessible by the plurality of processors, the main lock data structure including in a single atomic structure, the resources needed to lock the shared resource by a successful lock requester, to establish a queue of unsuccessful lock requestors, and to validate the existence of the lock;
providing for the shared resource, an associated auxiliary lock data structure stored in a shared memory accessible by the plurality of processors, the auxiliary lock data structure including the resources needed to identify the successful lock requestor's place in a queue of requestors and to identify the successful lock requestor;
detecting, by one of the processors, a predetermined indication of protocol failure and identifying the failing processor by the detecting processor;
initiating the lock override procedure by the detecting processor responsive to the predetermined indication of protocol failure; and,
in a single atomic operation by the detecting processor, examining the contents of the auxiliary lock data structure to determine if the identified failing processor is the successful lock requestor, and either, if the identified failing processor is the successful lock requestor, in a single atomic operation by the detecting processor, examining the contents of the main lock data structure and writing data to the main lock data structure to reserve the lock to the next requestor in the queue of unsuccessful lock requesters and to revalidate the lock, or, if the identified failing processor is not the successful lock requester, exiting the cooperative lock override procedure.
2. A method according to claim 1 wherein the lock services procedure further includes at least two lock mode procedures and a lock mode selection procedure for selecting one from the lock mode procedures by a successful lock requester, wherein the locking and unlocking procedures include one or more procedures for locking and unlocking the shared resource in the selected lock mode by a successful lock requestor, wherein the cooperative lock override procedure is selectively associated with a lock mode, wherein the atomic main lock data structure further includes the resources needed to identify one of the lock modes, and wherein the auxiliary lock data structure further includes the resources needed to identify one from the lock modes.
3. A method according to claim 1 wherein the method further includes, prior to examining the contents of the main lock data structure by the detecting processor, the step of in a single atomic operation by one of the requesting processors, examining the contents of the main lock data structure to determine if another requesting processor has previously locked the shared resource and if the lock contents are valid, determining that the contents are invalid or no other requesting processor has previously locked the shared resource, and writing data to the main lock data structure to reserve and validate the lock.
4. A method according to claim 2 wherein, the step of examining the contents of the main lock data structure by the detecting processor includes the step of, in the same atomic operation, verifying by examining the contents of the main lock data structure that the identified lock mode is the lock mode associated with the first lock override procedure and the step of writing data to the main lock data structure to reserve the lock to the next requestor in the queue of unsuccessful lock requestors and to revalidate the lock, includes the step of, in the same atomic operation, the step of invalidating the identified lock mode.
5. An intelligent data storage system comprising:
a shared resource;
a plurality of processors as lock requestors, each processor supporting atomic operations and being coupled to the shared resource through one or more first common communication channels, and each processor implementing a monitoring procedure for detecting a predetermined indication of protocol failure by any one of the plurality of processors and identifying the failing processor;
a shared memory accessible over one or more second common communications channels to all of the processors;
a lock services procedure implemented in each of the processors, the lock services procedure including a queuing procedure for unsuccessful lock requestors, and locking and unlocking procedures for locking and unlocking the shared resource by a successful lock requestor, and a cooperative lock override procedure responsive to the detection of the predetermined indication of protocol failure;
an atomic main lock data structure, responsive to the lock services procedures, implemented in the shared memory and associated with the shared resource, which includes the resources needed to lock a shared resource by a successful lock requester, to establish a place in a queue of unsuccessful lock requestors, and validate the existence of the lock;
an atomic auxiliary lock data structure, responsive to the lock services procedures, implemented in the shared memory and associated with the shared resource, which includes the resources needed to identify the successful lock requestor's place in a queue of requesters and to identify the successful lock requestors;
each processor other than the successful lock requester being operable in accordance with its monitoring procedure to detect a predetermined indication of protocol failure and identify the failing processor; and in accordance with its lock services procedure, first to initiate its cooperative lock override procedure responsive to its detection of the predetermined indication of protocol failure, and then in a single atomic operation, to examine the contents of the auxiliary lock data structure to determine if the identified failing processor is the successful lock requester, and either, if the identified failing processor is the successful lock requestor, in a single atomic operation, to examine the contents of the main lock data structure and write data to the main lock data structure to reserve the lock to the next requester in the queue of unsuccessful lock requestors and to revalidate the lock, or, if the identified failing processor is not the successful lock requestor, to exit the cooperative lock override procedure.
6. An intelligent data storage system according to claim 5 wherein the lock services procedure further includes at least two lock mode procedures, a lock mode selection procedure for selecting one from the lock mode procedures by a successful lock requester, wherein the locking and unlocking procedures include one or more procedures for locking and unlocking the shared resource in the selected lock mode by a successful lock requestor, wherein the cooperative lock override procedure is selectively associated with a lock mode, and wherein the atomic main lock data structure includes the resources needed to identify one of the lock modes.
7. A method according to claim 6 wherein each requesting processor is further operable in accordance with its lock services procedure to verify in examining the contents of the main lock data structure in an atomic operation that the identified lock mode is the lock mode associated with the cooperative lock override procedure and to invalidate the identified lock mode.
8. An intelligent data storage system according to claim 5 wherein each requesting processor is further operable in accordance with its lock services procedure, in a single atomic operation, to examine the contents of the main lock data structure to determine if another requesting processor has previously locked the shared resource and if the lock contents are valid, and either, if the lock contents are valid and some other requesting processor has previously locked the shared resource, to write data to the main lock data structure to establish its place in a queue of requestors for subsequent locks on the shared resource, or if the contents are invalid or no other requesting processor has previously locked the shared resource, to write data to the main lock data structure to reserve and validate the lock.
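The request, queuing, and override steps recited in the claims above can be illustrated with a minimal sketch. It is not the implementation disclosed in the patent. The ticket-number queue, the field names (`valid`, `current_holder`, `next_free`, `holder_id`, `holder_ticket`), the `VALID` marker, and the helper functions are all hypothetical choices made for illustration, and `AtomicCell` uses a Python mutex only as a stand-in for a hardware atomic read-modify-write on shared memory.

```python
import threading
from dataclasses import dataclass, replace
from typing import Optional, Tuple

VALID = 0xA5  # hypothetical marker meaning "lock contents are valid"


@dataclass(frozen=True)
class MainLock:
    valid: int = 0           # equals VALID when the contents can be trusted
    current_holder: int = 0  # ticket number that currently owns the lock
    next_free: int = 0       # next ticket to hand to a requestor


@dataclass(frozen=True)
class AuxLock:
    holder_id: Optional[int] = None      # processor that holds the lock
    holder_ticket: Optional[int] = None  # that processor's place in the queue


class AtomicCell:
    """Stand-in for a shared-memory structure updated in one atomic operation."""

    def __init__(self, value):
        self._value = value
        self._guard = threading.Lock()

    def load(self):
        with self._guard:
            return self._value

    def update(self, fn):
        """Apply fn(old) -> new as one indivisible step; return (old, new)."""
        with self._guard:
            old = self._value
            self._value = fn(old)
            return old, self._value


main = AtomicCell(MainLock())  # assumed main lock data structure
aux = AtomicCell(AuxLock())    # assumed auxiliary lock data structure


def request_lock(proc_id: int) -> Tuple[int, bool]:
    """Examine and write the main structure atomically; return (ticket, acquired)."""
    def step(m: MainLock) -> MainLock:
        if m.valid != VALID:
            # Invalid contents: reserve and validate the lock for the caller.
            return MainLock(VALID, 0, 1)
        # Valid contents: take the next ticket (a place in the queue).
        return replace(m, next_free=m.next_free + 1)

    _, new = main.update(step)
    ticket = new.next_free - 1
    acquired = new.current_holder == ticket
    if acquired:
        aux.update(lambda _: AuxLock(proc_id, ticket))
    return ticket, acquired


def poll(proc_id: int, ticket: int) -> bool:
    """A queued requestor checks whether the lock has been reserved to it."""
    if main.load().current_holder == ticket:
        aux.update(lambda _: AuxLock(proc_id, ticket))
        return True
    return False


def override(failed_id: int) -> bool:
    """Run by a detecting processor; return True if the lock was handed on."""
    a = aux.load()  # one atomic examination of the auxiliary structure
    if a.holder_id != failed_id:
        return False  # the failing processor is not the holder: exit

    def step(m: MainLock) -> MainLock:
        if m.current_holder != a.holder_ticket:
            return m  # the lock has already moved on; leave it unchanged
        # Reserve the lock to the next requestor in the queue and revalidate.
        return replace(m, valid=VALID, current_holder=m.current_holder + 1)

    old, new = main.update(step)
    return new != old
```

In this sketch, a detecting processor that calls `override` for a processor that does not hold the lock leaves both structures untouched, which mirrors the "exit" branch of the claims. When the failing processor is the holder, the next queued ticket becomes the holder and that requestor discovers it on its next `poll`.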
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/879,476 US20040268062A1 (en) | 2000-11-28 | 2004-06-29 | Cooperative lock override procedure |
US10/955,033 US7246187B1 (en) | 2000-11-28 | 2004-09-30 | Method and apparatus for controlling exclusive access to a shared resource in a data storage system |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US09/724,014 US6757769B1 (en) | 2000-11-28 | 2000-11-28 | Cooperative lock override procedure |
US10/879,476 US20040268062A1 (en) | 2000-11-28 | 2004-06-29 | Cooperative lock override procedure |
Related Parent Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US09/724,014 Continuation US6757769B1 (en) | 2000-11-28 | 2000-11-28 | Cooperative lock override procedure |
Related Child Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/955,033 Continuation-In-Part US7246187B1 (en) | 2000-11-28 | 2004-09-30 | Method and apparatus for controlling exclusive access to a shared resource in a data storage system |
Publications (1)
Publication Number | Publication Date |
---|---|
US20040268062A1 (en) | 2004-12-30 |
Family
ID=32508448
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US09/724,014 Expired - Lifetime US6757769B1 (en) | 2000-11-28 | 2000-11-28 | Cooperative lock override procedure |
US10/879,476 Abandoned US20040268062A1 (en) | 2000-11-28 | 2004-06-29 | Cooperative lock override procedure |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US09/724,014 Expired - Lifetime US6757769B1 (en) | 2000-11-28 | 2000-11-28 | Cooperative lock override procedure |
Country Status (1)
Country | Link |
---|---|
US (2) | US6757769B1 (en) |
Cited By (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20030023656A1 (en) * | 2001-07-27 | 2003-01-30 | International Business Machines Corporation | Method and system for deadlock detection and avoidance |
US20050066064A1 (en) * | 2003-09-19 | 2005-03-24 | International Business Machines Corporation | Fault tolerant mutual exclusion locks for shared memory systems |
US20080010643A1 (en) * | 2006-07-07 | 2008-01-10 | Nec Electronics Corporation | Multiprocessor system and access right setting method in the multiprocessor system |
US20090106248A1 (en) * | 2004-02-06 | 2009-04-23 | Vmware, Inc. | Optimistic locking method and system for committing transactions on a file system |
US20090138758A1 (en) * | 2005-09-09 | 2009-05-28 | International Business Machines Corporation | Method and system to execute recovery in non-homogeneous multi processor environments |
US7596621B1 (en) * | 2002-10-17 | 2009-09-29 | Astute Networks, Inc. | System and method for managing shared state using multiple programmed processors |
US20100017409A1 (en) * | 2004-02-06 | 2010-01-21 | Vmware, Inc. | Hybrid Locking Using Network and On-Disk Based Schemes |
US20100023803A1 (en) * | 2008-07-25 | 2010-01-28 | International Business Machines Corporation | Transitional replacement of operations performed by a central hub |
US20100250719A1 (en) * | 2009-03-30 | 2010-09-30 | Klug Darren R | Universal Network Adapter for Industrial Control Networks |
US20100254389A1 (en) * | 2009-04-04 | 2010-10-07 | Oracle International Corporation | Method and system for implementing a best efforts resequencer |
US20100257240A1 (en) * | 2009-04-04 | 2010-10-07 | Oracle International Corporation | Method and system for implementing sequence start and increment values for a resequencer |
US20100254388A1 (en) * | 2009-04-04 | 2010-10-07 | Oracle International Corporation | Method and system for applying expressions on message payloads for a resequencer |
US20100257404A1 (en) * | 2009-04-04 | 2010-10-07 | Oracle International Corporation | Method and system for implementing a scalable, high-performance, fault-tolerant locking mechanism in a multi-process environment |
US7814218B1 (en) | 2002-10-17 | 2010-10-12 | Astute Networks, Inc. | Multi-protocol and multi-format stateful processing |
US20110055274A1 (en) * | 2004-02-06 | 2011-03-03 | Vmware, Inc. | Providing multiple concurrent access to a file system |
US20110179082A1 (en) * | 2004-02-06 | 2011-07-21 | Vmware, Inc. | Managing concurrent file system accesses by multiple servers using locks |
US8015303B2 (en) | 2002-08-02 | 2011-09-06 | Astute Networks Inc. | High data rate stateful protocol processing |
US8151278B1 (en) | 2002-10-17 | 2012-04-03 | Astute Networks, Inc. | System and method for timer management in a stateful protocol processing system |
US8549364B2 (en) * | 2009-02-18 | 2013-10-01 | Vmware, Inc. | Failure detection and recovery of host computers in a cluster |
US8560747B1 (en) * | 2007-02-16 | 2013-10-15 | Vmware, Inc. | Associating heartbeat data with access to shared resources of a computer system |
US20160246998A1 (en) * | 2012-09-28 | 2016-08-25 | Intel Corporation | Secure access management of devices |
US10776206B1 (en) | 2004-02-06 | 2020-09-15 | Vmware, Inc. | Distributed transaction system |
US20220004442A1 (en) * | 2016-07-06 | 2022-01-06 | International Business Machines Corporation | Determining when to release a lock from a first task holding the lock to grant to a second task waiting for the lock |
Families Citing this family (158)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6725341B1 (en) * | 2000-06-28 | 2004-04-20 | Intel Corporation | Cache line pre-load and pre-own based on cache coherence speculation |
US7246187B1 (en) | 2000-11-28 | 2007-07-17 | Emc Corporation | Method and apparatus for controlling exclusive access to a shared resource in a data storage system |
US7743109B2 (en) * | 2001-08-01 | 2010-06-22 | Cisco Technology, Inc. | Reducing round trips across a wide area network for resource locking by extended locking and delayed unlocking |
US8495131B2 (en) * | 2002-10-08 | 2013-07-23 | International Business Machines Corporation | Method, system, and program for managing locks enabling access to a shared resource |
US7689708B1 (en) | 2002-10-28 | 2010-03-30 | Netapp, Inc. | Apparatus to flow control frames in a networked storage virtualization using multiple streaming protocols |
US7272611B1 (en) | 2002-10-28 | 2007-09-18 | Network Appliance, Inc. | Apparatus and method for searching a n-branch data structure using information in entries |
US7496574B2 (en) * | 2003-05-01 | 2009-02-24 | International Business Machines Corporation | Managing locks and transactions |
US7289992B2 (en) * | 2003-05-01 | 2007-10-30 | International Business Machines Corporation | Method, system, and program for lock and transaction management |
US7350117B2 (en) * | 2004-10-05 | 2008-03-25 | International Business Machines Corporation | Management of microcode lock in a shared computing resource |
US7487277B2 (en) * | 2005-10-11 | 2009-02-03 | International Business Machines Corporation | Apparatus, system, and method for overriding resource controller lock ownership |
US7606807B1 (en) * | 2006-02-14 | 2009-10-20 | Network Appliance, Inc. | Method and apparatus to utilize free cache in a storage system |
US8234331B2 (en) * | 2008-02-01 | 2012-07-31 | Honeywell International Inc. | System and method for shielding open process control client applications from bad quality initial data |
US9104662B2 (en) * | 2008-08-08 | 2015-08-11 | Oracle International Corporation | Method and system for implementing parallel transformations of records |
GB2465785B (en) * | 2008-11-28 | 2012-07-04 | Vmware Inc | Computer system and method for resolving dependencies in a computer system |
GB2466976B (en) * | 2009-01-16 | 2011-04-27 | Springsource Ltd | Controlling access to a shared resourse in a computer system |
US11275509B1 (en) | 2010-09-15 | 2022-03-15 | Pure Storage, Inc. | Intelligently sizing high latency I/O requests in a storage environment |
US8468318B2 (en) | 2010-09-15 | 2013-06-18 | Pure Storage Inc. | Scheduling of I/O writes in a storage environment |
US8589625B2 (en) | 2010-09-15 | 2013-11-19 | Pure Storage, Inc. | Scheduling of reconstructive I/O read operations in a storage environment |
US12008266B2 (en) | 2010-09-15 | 2024-06-11 | Pure Storage, Inc. | Efficient read by reconstruction |
US8732426B2 (en) | 2010-09-15 | 2014-05-20 | Pure Storage, Inc. | Scheduling of reactive I/O operations in a storage environment |
US8589655B2 (en) | 2010-09-15 | 2013-11-19 | Pure Storage, Inc. | Scheduling of I/O in an SSD environment |
US11614893B2 (en) | 2010-09-15 | 2023-03-28 | Pure Storage, Inc. | Optimizing storage device access based on latency |
US9244769B2 (en) | 2010-09-28 | 2016-01-26 | Pure Storage, Inc. | Offset protection data in a RAID array |
US8775868B2 (en) | 2010-09-28 | 2014-07-08 | Pure Storage, Inc. | Adaptive RAID for an SSD environment |
JP5725162B2 (en) * | 2011-03-31 | 2015-05-27 | 富士通株式会社 | Exclusive control method and exclusive control program |
US8589640B2 (en) | 2011-10-14 | 2013-11-19 | Pure Storage, Inc. | Method for maintaining multiple fingerprint tables in a deduplicating storage system |
US11636031B2 (en) | 2011-08-11 | 2023-04-25 | Pure Storage, Inc. | Optimized inline deduplication |
US8719540B1 (en) | 2012-03-15 | 2014-05-06 | Pure Storage, Inc. | Fractal layout of data blocks across multiple devices |
US10623386B1 (en) | 2012-09-26 | 2020-04-14 | Pure Storage, Inc. | Secret sharing data protection in a storage system |
US8745415B2 (en) | 2012-09-26 | 2014-06-03 | Pure Storage, Inc. | Multi-drive cooperation to generate an encryption key |
US11032259B1 (en) | 2012-09-26 | 2021-06-08 | Pure Storage, Inc. | Data protection in a storage system |
US9063967B2 (en) | 2013-01-10 | 2015-06-23 | Pure Storage, Inc. | Performing copies in a storage system |
US11768623B2 (en) | 2013-01-10 | 2023-09-26 | Pure Storage, Inc. | Optimizing generalized transfers between storage systems |
US10908835B1 (en) | 2013-01-10 | 2021-02-02 | Pure Storage, Inc. | Reversing deletion of a virtual machine |
US11733908B2 (en) | 2013-01-10 | 2023-08-22 | Pure Storage, Inc. | Delaying deletion of a dataset |
US10365858B2 (en) | 2013-11-06 | 2019-07-30 | Pure Storage, Inc. | Thin provisioning in a storage device |
US11128448B1 (en) | 2013-11-06 | 2021-09-21 | Pure Storage, Inc. | Quorum-aware secret sharing |
US10263770B2 (en) | 2013-11-06 | 2019-04-16 | Pure Storage, Inc. | Data protection in a storage system using external secrets |
US9459931B2 (en) * | 2014-01-06 | 2016-10-04 | International Business Machines Corporation | Administering a lock for resources in a distributed computing environment |
US9208086B1 (en) | 2014-01-09 | 2015-12-08 | Pure Storage, Inc. | Using frequency domain to prioritize storage of metadata in a cache |
US10656864B2 (en) | 2014-03-20 | 2020-05-19 | Pure Storage, Inc. | Data replication within a flash storage array |
US9779268B1 (en) | 2014-06-03 | 2017-10-03 | Pure Storage, Inc. | Utilizing a non-repeating identifier to encrypt data |
US11399063B2 (en) | 2014-06-04 | 2022-07-26 | Pure Storage, Inc. | Network authentication for a storage system |
US9218244B1 (en) | 2014-06-04 | 2015-12-22 | Pure Storage, Inc. | Rebuilding data across storage nodes |
US10496556B1 (en) | 2014-06-25 | 2019-12-03 | Pure Storage, Inc. | Dynamic data protection within a flash storage system |
US9218407B1 (en) | 2014-06-25 | 2015-12-22 | Pure Storage, Inc. | Replication and intermediate read-write state for mediums |
US9887937B2 (en) * | 2014-07-15 | 2018-02-06 | Cohesity, Inc. | Distributed fair allocation of shared resources to constituents of a cluster |
US10296469B1 (en) * | 2014-07-24 | 2019-05-21 | Pure Storage, Inc. | Access control in a flash storage system |
US9558069B2 (en) | 2014-08-07 | 2017-01-31 | Pure Storage, Inc. | Failure mapping in a storage array |
US9495255B2 (en) | 2014-08-07 | 2016-11-15 | Pure Storage, Inc. | Error recovery in a storage cluster |
US9864761B1 (en) | 2014-08-08 | 2018-01-09 | Pure Storage, Inc. | Read optimization operations in a storage system |
US12175076B2 (en) | 2014-09-08 | 2024-12-24 | Pure Storage, Inc. | Projecting capacity utilization for snapshots |
US10430079B2 (en) | 2014-09-08 | 2019-10-01 | Pure Storage, Inc. | Adjusting storage capacity in a computing system |
US10164841B2 (en) | 2014-10-02 | 2018-12-25 | Pure Storage, Inc. | Cloud assist for storage systems |
US9489132B2 (en) | 2014-10-07 | 2016-11-08 | Pure Storage, Inc. | Utilizing unmapped and unknown states in a replicated storage system |
US10430282B2 (en) | 2014-10-07 | 2019-10-01 | Pure Storage, Inc. | Optimizing replication by distinguishing user and system write activity |
US9727485B1 (en) | 2014-11-24 | 2017-08-08 | Pure Storage, Inc. | Metadata rewrite and flatten optimization |
US9773007B1 (en) | 2014-12-01 | 2017-09-26 | Pure Storage, Inc. | Performance improvements in a storage system |
US9552248B2 (en) | 2014-12-11 | 2017-01-24 | Pure Storage, Inc. | Cloud alert to replica |
US9588842B1 (en) | 2014-12-11 | 2017-03-07 | Pure Storage, Inc. | Drive rebuild |
US9864769B2 (en) | 2014-12-12 | 2018-01-09 | Pure Storage, Inc. | Storing data utilizing repeating pattern detection |
US10545987B2 (en) | 2014-12-19 | 2020-01-28 | Pure Storage, Inc. | Replication to the cloud |
US10296354B1 (en) | 2015-01-21 | 2019-05-21 | Pure Storage, Inc. | Optimized boot operations within a flash storage array |
US11947968B2 (en) | 2015-01-21 | 2024-04-02 | Pure Storage, Inc. | Efficient use of zone in a storage device |
US9710165B1 (en) | 2015-02-18 | 2017-07-18 | Pure Storage, Inc. | Identifying volume candidates for space reclamation |
US10082985B2 (en) | 2015-03-27 | 2018-09-25 | Pure Storage, Inc. | Data striping across storage nodes that are assigned to multiple logical arrays |
US10178169B2 (en) | 2015-04-09 | 2019-01-08 | Pure Storage, Inc. | Point to point based backend communication layer for storage processing |
US10140149B1 (en) | 2015-05-19 | 2018-11-27 | Pure Storage, Inc. | Transactional commits with hardware assists in remote memory |
US10310740B2 (en) | 2015-06-23 | 2019-06-04 | Pure Storage, Inc. | Aligning memory access operations to a geometry of a storage device |
US9547441B1 (en) | 2015-06-23 | 2017-01-17 | Pure Storage, Inc. | Exposing a geometry of a storage device |
US11341136B2 (en) | 2015-09-04 | 2022-05-24 | Pure Storage, Inc. | Dynamically resizable structures for approximate membership queries |
KR20170028825A (en) | 2015-09-04 | 2017-03-14 | Pure Storage, Inc. | Memory-efficient storage and searching in hash tables using compressed indexes |
US11269884B2 (en) | 2015-09-04 | 2022-03-08 | Pure Storage, Inc. | Dynamically resizable structures for approximate membership queries |
US9843453B2 (en) | 2015-10-23 | 2017-12-12 | Pure Storage, Inc. | Authorizing I/O commands with I/O tokens |
US10133503B1 (en) | 2016-05-02 | 2018-11-20 | Pure Storage, Inc. | Selecting a deduplication process based on a difference between performance metrics |
US10452297B1 (en) | 2016-05-02 | 2019-10-22 | Pure Storage, Inc. | Generating and optimizing summary index levels in a deduplication storage system |
US10203903B2 (en) | 2016-07-26 | 2019-02-12 | Pure Storage, Inc. | Geometry based, space aware shelf/writegroup evacuation |
US10613974B2 (en) | 2016-10-04 | 2020-04-07 | Pure Storage, Inc. | Peer-to-peer non-volatile random-access memory |
US10191662B2 (en) | 2016-10-04 | 2019-01-29 | Pure Storage, Inc. | Dynamic allocation of segments in a flash storage system |
US10756816B1 (en) | 2016-10-04 | 2020-08-25 | Pure Storage, Inc. | Optimized fibre channel and non-volatile memory express access |
US10162523B2 (en) | 2016-10-04 | 2018-12-25 | Pure Storage, Inc. | Migrating data between volumes using virtual copy operation |
US10481798B2 (en) | 2016-10-28 | 2019-11-19 | Pure Storage, Inc. | Efficient flash management for multiple controllers |
US10185505B1 (en) | 2016-10-28 | 2019-01-22 | Pure Storage, Inc. | Reading a portion of data to replicate a volume based on sequence numbers |
US10359942B2 (en) | 2016-10-31 | 2019-07-23 | Pure Storage, Inc. | Deduplication aware scalable content placement |
US11550481B2 (en) | 2016-12-19 | 2023-01-10 | Pure Storage, Inc. | Efficiently writing data in a zoned drive storage system |
US10452290B2 (en) | 2016-12-19 | 2019-10-22 | Pure Storage, Inc. | Block consolidation in a direct-mapped flash storage system |
US11093146B2 (en) | 2017-01-12 | 2021-08-17 | Pure Storage, Inc. | Automatic load rebalancing of a write group |
US10528488B1 (en) | 2017-03-30 | 2020-01-07 | Pure Storage, Inc. | Efficient name coding |
US12045487B2 (en) | 2017-04-21 | 2024-07-23 | Pure Storage, Inc. | Preserving data deduplication in a multi-tenant storage system |
US11403019B2 (en) | 2017-04-21 | 2022-08-02 | Pure Storage, Inc. | Deduplication-aware per-tenant encryption |
US10944671B2 (en) | 2017-04-27 | 2021-03-09 | Pure Storage, Inc. | Efficient data forwarding in a networked device |
US10402266B1 (en) | 2017-07-31 | 2019-09-03 | Pure Storage, Inc. | Redundant array of independent disks in a direct-mapped flash storage system |
US10831935B2 (en) | 2017-08-31 | 2020-11-10 | Pure Storage, Inc. | Encryption management with host-side data reduction |
US10776202B1 (en) | 2017-09-22 | 2020-09-15 | Pure Storage, Inc. | Drive, blade, or data shard decommission via RAID geometry shrinkage |
US10789211B1 (en) | 2017-10-04 | 2020-09-29 | Pure Storage, Inc. | Feature-based deduplication |
JP2019067289A (en) * | 2017-10-04 | 2019-04-25 | ルネサスエレクトロニクス株式会社 | Semiconductor device |
US10884919B2 (en) | 2017-10-31 | 2021-01-05 | Pure Storage, Inc. | Memory management in a storage system |
US10860475B1 (en) | 2017-11-17 | 2020-12-08 | Pure Storage, Inc. | Hybrid flash translation layer |
US11010233B1 (en) | 2018-01-18 | 2021-05-18 | Pure Storage, Inc | Hardware-based system monitoring |
US11144638B1 (en) | 2018-01-18 | 2021-10-12 | Pure Storage, Inc. | Method for storage system detection and alerting on potential malicious action |
US10970395B1 (en) | 2018-01-18 | 2021-04-06 | Pure Storage, Inc | Security threat monitoring for a storage system |
US10467527B1 (en) | 2018-01-31 | 2019-11-05 | Pure Storage, Inc. | Method and apparatus for artificial intelligence acceleration |
US11036596B1 (en) | 2018-02-18 | 2021-06-15 | Pure Storage, Inc. | System for delaying acknowledgements on open NAND locations until durability has been confirmed |
US11494109B1 (en) | 2018-02-22 | 2022-11-08 | Pure Storage, Inc. | Erase block trimming for heterogenous flash memory storage devices |
US11934322B1 (en) | 2018-04-05 | 2024-03-19 | Pure Storage, Inc. | Multiple encryption keys on storage drives |
US11995336B2 (en) | 2018-04-25 | 2024-05-28 | Pure Storage, Inc. | Bucket views |
US10678433B1 (en) | 2018-04-27 | 2020-06-09 | Pure Storage, Inc. | Resource-preserving system upgrade |
US11385792B2 (en) | 2018-04-27 | 2022-07-12 | Pure Storage, Inc. | High availability controller pair transitioning |
US10678436B1 (en) | 2018-05-29 | 2020-06-09 | Pure Storage, Inc. | Using a PID controller to opportunistically compress more data during garbage collection |
US11436023B2 (en) | 2018-05-31 | 2022-09-06 | Pure Storage, Inc. | Mechanism for updating host file system and flash translation layer based on underlying NAND technology |
US10776046B1 (en) | 2018-06-08 | 2020-09-15 | Pure Storage, Inc. | Optimized non-uniform memory access |
US11281577B1 (en) | 2018-06-19 | 2022-03-22 | Pure Storage, Inc. | Garbage collection tuning for low drive wear |
US11869586B2 (en) | 2018-07-11 | 2024-01-09 | Pure Storage, Inc. | Increased data protection by recovering data from partially-failed solid-state devices |
US11133076B2 (en) | 2018-09-06 | 2021-09-28 | Pure Storage, Inc. | Efficient relocation of data between storage devices of a storage system |
US11194759B2 (en) | 2018-09-06 | 2021-12-07 | Pure Storage, Inc. | Optimizing local data relocation operations of a storage device of a storage system |
US10846216B2 (en) | 2018-10-25 | 2020-11-24 | Pure Storage, Inc. | Scalable garbage collection |
US11113409B2 (en) | 2018-10-26 | 2021-09-07 | Pure Storage, Inc. | Efficient rekey in a transparent decrypting storage array |
US11194473B1 (en) | 2019-01-23 | 2021-12-07 | Pure Storage, Inc. | Programming frequently read data to low latency portions of a solid-state storage array |
US11588633B1 (en) | 2019-03-15 | 2023-02-21 | Pure Storage, Inc. | Decommissioning keys in a decryption storage system |
US11334254B2 (en) | 2019-03-29 | 2022-05-17 | Pure Storage, Inc. | Reliability based flash page sizing |
US11775189B2 (en) | 2019-04-03 | 2023-10-03 | Pure Storage, Inc. | Segment level heterogeneity |
US11397674B1 (en) | 2019-04-03 | 2022-07-26 | Pure Storage, Inc. | Optimizing garbage collection across heterogeneous flash devices |
US10990480B1 (en) | 2019-04-05 | 2021-04-27 | Pure Storage, Inc. | Performance of RAID rebuild operations by a storage group controller of a storage system |
US12087382B2 (en) | 2019-04-11 | 2024-09-10 | Pure Storage, Inc. | Adaptive threshold for bad flash memory blocks |
US11099986B2 (en) | 2019-04-12 | 2021-08-24 | Pure Storage, Inc. | Efficient transfer of memory contents |
US11487665B2 (en) | 2019-06-05 | 2022-11-01 | Pure Storage, Inc. | Tiered caching of data in a storage system |
US11281394B2 (en) | 2019-06-24 | 2022-03-22 | Pure Storage, Inc. | Replication across partitioning schemes in a distributed storage system |
US10929046B2 (en) | 2019-07-09 | 2021-02-23 | Pure Storage, Inc. | Identifying and relocating hot data to a cache determined with read velocity based on a threshold stored at a storage device |
US12135888B2 (en) | 2019-07-10 | 2024-11-05 | Pure Storage, Inc. | Intelligent grouping of data based on expected lifespan |
US11422751B2 (en) | 2019-07-18 | 2022-08-23 | Pure Storage, Inc. | Creating a virtual storage system |
US11086713B1 (en) | 2019-07-23 | 2021-08-10 | Pure Storage, Inc. | Optimized end-to-end integrity storage system |
US11963321B2 (en) | 2019-09-11 | 2024-04-16 | Pure Storage, Inc. | Low profile latching mechanism |
US11403043B2 (en) | 2019-10-15 | 2022-08-02 | Pure Storage, Inc. | Efficient data compression by grouping similar data within a data segment |
US12050689B2 (en) | 2019-11-22 | 2024-07-30 | Pure Storage, Inc. | Host anomaly-based generation of snapshots |
US11625481B2 (en) | 2019-11-22 | 2023-04-11 | Pure Storage, Inc. | Selective throttling of operations potentially related to a security threat to a storage system |
US11720692B2 (en) | 2019-11-22 | 2023-08-08 | Pure Storage, Inc. | Hardware token based management of recovery datasets for a storage system |
US11675898B2 (en) | 2019-11-22 | 2023-06-13 | Pure Storage, Inc. | Recovery dataset management for security threat monitoring |
US11500788B2 (en) | 2019-11-22 | 2022-11-15 | Pure Storage, Inc. | Logical address based authorization of operations with respect to a storage system |
US12204657B2 (en) | 2019-11-22 | 2025-01-21 | Pure Storage, Inc. | Similar block detection-based detection of a ransomware attack |
US11341236B2 (en) | 2019-11-22 | 2022-05-24 | Pure Storage, Inc. | Traffic-based detection of a security threat to a storage system |
US12067118B2 (en) | 2019-11-22 | 2024-08-20 | Pure Storage, Inc. | Detection of writing to a non-header portion of a file as an indicator of a possible ransomware attack against a storage system |
US11755751B2 (en) | 2019-11-22 | 2023-09-12 | Pure Storage, Inc. | Modify access restrictions in response to a possible attack against data stored by a storage system |
US11520907B1 (en) | 2019-11-22 | 2022-12-06 | Pure Storage, Inc. | Storage system snapshot retention based on encrypted data |
US11615185B2 (en) | 2019-11-22 | 2023-03-28 | Pure Storage, Inc. | Multi-layer security threat detection for a storage system |
US12079333B2 (en) | 2019-11-22 | 2024-09-03 | Pure Storage, Inc. | Independent security threat detection and remediation by storage systems in a synchronous replication arrangement |
US12153670B2 (en) | 2019-11-22 | 2024-11-26 | Pure Storage, Inc. | Host-driven threat detection-based protection of storage elements within a storage system |
US12248566B2 (en) | 2019-11-22 | 2025-03-11 | Pure Storage, Inc. | Snapshot deletion pattern-based determination of ransomware attack against data maintained by a storage system |
US12050683B2 (en) * | 2019-11-22 | 2024-07-30 | Pure Storage, Inc. | Selective control of a data synchronization setting of a storage system based on a possible ransomware attack against the storage system |
US12079356B2 (en) | 2019-11-22 | 2024-09-03 | Pure Storage, Inc. | Measurement interval anomaly detection-based generation of snapshots |
US12079502B2 (en) | 2019-11-22 | 2024-09-03 | Pure Storage, Inc. | Storage element attribute-based determination of a data protection policy for use within a storage system |
US11651075B2 (en) | 2019-11-22 | 2023-05-16 | Pure Storage, Inc. | Extensible attack monitoring by a storage system |
US11657155B2 (en) | 2019-11-22 | 2023-05-23 | Pure Storage, Inc | Snapshot delta metric based determination of a possible ransomware attack against data maintained by a storage system |
US11720714B2 (en) | 2019-11-22 | 2023-08-08 | Pure Storage, Inc. | Inter-I/O relationship based detection of a security threat to a storage system |
US11941116B2 (en) | 2019-11-22 | 2024-03-26 | Pure Storage, Inc. | Ransomware-based data protection parameter modification |
US11687418B2 (en) | 2019-11-22 | 2023-06-27 | Pure Storage, Inc. | Automatic generation of recovery plans specific to individual storage elements |
US11645162B2 (en) | 2019-11-22 | 2023-05-09 | Pure Storage, Inc. | Recovery point determination for data restoration in a storage system |
US20210232442A1 (en) * | 2020-01-29 | 2021-07-29 | International Business Machines Corporation | Moveable distributed synchronization objects |
EP3964959A1 (en) * | 2020-09-03 | 2022-03-09 | ARM Limited | Data processing |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5544347A (en) | 1990-09-24 | 1996-08-06 | Emc Corporation | Data storage system controlled remote data mirroring with respectively maintained data indices |
US5206939A (en) | 1990-09-24 | 1993-04-27 | Emc Corporation | System and method for disk mapping and data retrieval |
US5381539A (en) | 1992-06-04 | 1995-01-10 | Emc Corporation | System and method for dynamically controlling cache management |
US5592432A (en) | 1995-09-05 | 1997-01-07 | Emc Corp | Cache management system using time stamping for replacement queue |
US6353869B1 (en) * | 1999-05-14 | 2002-03-05 | Emc Corporation | Adaptive delay of polling frequencies in a distributed system with a queued lock |
US6609178B1 (en) * | 2000-11-28 | 2003-08-19 | Emc Corporation | Selective validation for queued multimodal locking services |
US6691194B1 (en) * | 2000-11-28 | 2004-02-10 | Emc Corporation | Selective association of lock override procedures with queued multimodal lock |
- 2000-11-28: US 09/724,014, patent US6757769B1, Expired - Lifetime
- 2004-06-29: US 10/879,476, publication US20040268062A1, Abandoned
Cited By (42)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6983461B2 (en) * | 2001-07-27 | 2006-01-03 | International Business Machines Corporation | Method and system for deadlock detection and avoidance |
US20030023656A1 (en) * | 2001-07-27 | 2003-01-30 | International Business Machines Corporation | Method and system for deadlock detection and avoidance |
US8015303B2 (en) | 2002-08-02 | 2011-09-06 | Astute Networks Inc. | High data rate stateful protocol processing |
US8151278B1 (en) | 2002-10-17 | 2012-04-03 | Astute Networks, Inc. | System and method for timer management in a stateful protocol processing system |
US7596621B1 (en) * | 2002-10-17 | 2009-09-29 | Astute Networks, Inc. | System and method for managing shared state using multiple programmed processors |
US7814218B1 (en) | 2002-10-17 | 2010-10-12 | Astute Networks, Inc. | Multi-protocol and multi-format stateful processing |
US8234646B2 (en) * | 2003-09-19 | 2012-07-31 | International Business Machines Corporation | Fault tolerant mutual exclusion locks for shared memory systems |
US20050066064A1 (en) * | 2003-09-19 | 2005-03-24 | International Business Machines Corporation | Fault tolerant mutual exclusion locks for shared memory systems |
US20080243850A1 (en) * | 2003-09-19 | 2008-10-02 | Michael Maged M | Fault tolerant mutual exclusion locks for shared memory systems |
US7493618B2 (en) * | 2003-09-19 | 2009-02-17 | International Business Machines Corporation | Fault tolerant mutual exclusion locks for shared memory systems |
US9130821B2 (en) | 2004-02-06 | 2015-09-08 | Vmware, Inc. | Hybrid locking using network and on-disk based schemes |
US20100017409A1 (en) * | 2004-02-06 | 2010-01-21 | Vmware, Inc. | Hybrid Locking Using Network and On-Disk Based Schemes |
US10776206B1 (en) | 2004-02-06 | 2020-09-15 | Vmware, Inc. | Distributed transaction system |
US9031984B2 (en) | 2004-02-06 | 2015-05-12 | Vmware, Inc. | Providing multiple concurrent access to a file system |
US8700585B2 (en) * | 2004-02-06 | 2014-04-15 | Vmware, Inc. | Optimistic locking method and system for committing transactions on a file system |
US8489636B2 (en) | 2004-02-06 | 2013-07-16 | Vmware, Inc. | Providing multiple concurrent access to a file system |
US8543781B2 (en) | 2004-02-06 | 2013-09-24 | Vmware, Inc. | Hybrid locking using network and on-disk based schemes |
US20090106248A1 (en) * | 2004-02-06 | 2009-04-23 | Vmware, Inc. | Optimistic locking method and system for committing transactions on a file system |
US20110055274A1 (en) * | 2004-02-06 | 2011-03-03 | Vmware, Inc. | Providing multiple concurrent access to a file system |
US20110179082A1 (en) * | 2004-02-06 | 2011-07-21 | Vmware, Inc. | Managing concurrent file system accesses by multiple servers using locks |
US20090138758A1 (en) * | 2005-09-09 | 2009-05-28 | International Business Machines Corporation | Method and system to execute recovery in non-homogeneous multi processor environments |
US7765429B2 (en) * | 2005-09-09 | 2010-07-27 | International Business Machines Corporation | Method and system to execute recovery in non-homogenous multi processor environments |
US20080010643A1 (en) * | 2006-07-07 | 2008-01-10 | Nec Electronics Corporation | Multiprocessor system and access right setting method in the multiprocessor system |
US8560747B1 (en) * | 2007-02-16 | 2013-10-15 | Vmware, Inc. | Associating heartbeat data with access to shared resources of a computer system |
US8010832B2 (en) * | 2008-07-25 | 2011-08-30 | International Business Machines Corporation | Transitional replacement of operations performed by a central hub |
US8713354B2 (en) * | 2008-07-25 | 2014-04-29 | International Business Machines Corporation | Transitional replacement of operations performed by a central hub |
US20100023803A1 (en) * | 2008-07-25 | 2010-01-28 | International Business Machines Corporation | Transitional replacement of operations performed by a central hub |
US8443228B2 (en) | 2008-07-25 | 2013-05-14 | International Business Machines Corporation | Transitional replacement of operations performed by a central hub |
US8549364B2 (en) * | 2009-02-18 | 2013-10-01 | Vmware, Inc. | Failure detection and recovery of host computers in a cluster |
US8046444B2 (en) * | 2009-03-30 | 2011-10-25 | Rockwell Automation Technologies, Inc. | Universal network adapter for industrial control networks |
US20100250719A1 (en) * | 2009-03-30 | 2010-09-30 | Klug Darren R | Universal Network Adapter for Industrial Control Networks |
US20100254388A1 (en) * | 2009-04-04 | 2010-10-07 | Oracle International Corporation | Method and system for applying expressions on message payloads for a resequencer |
US20100257240A1 (en) * | 2009-04-04 | 2010-10-07 | Oracle International Corporation | Method and system for implementing sequence start and increment values for a resequencer |
US8661083B2 (en) | 2009-04-04 | 2014-02-25 | Oracle International Corporation | Method and system for implementing sequence start and increment values for a resequencer |
US20100254389A1 (en) * | 2009-04-04 | 2010-10-07 | Oracle International Corporation | Method and system for implementing a best efforts resequencer |
US9124448B2 (en) | 2009-04-04 | 2015-09-01 | Oracle International Corporation | Method and system for implementing a best efforts resequencer |
US8578218B2 (en) * | 2009-04-04 | 2013-11-05 | Oracle International Corporation | Method and system for implementing a scalable, high-performance, fault-tolerant locking mechanism in a multi-process environment |
US20100257404A1 (en) * | 2009-04-04 | 2010-10-07 | Oracle International Corporation | Method and system for implementing a scalable, high-performance, fault-tolerant locking mechanism in a multi-process environment |
US20160246998A1 (en) * | 2012-09-28 | 2016-08-25 | Intel Corporation | Secure access management of devices |
US10049234B2 (en) * | 2012-09-28 | 2018-08-14 | Intel Corporation | Secure access management of devices |
US20220004442A1 (en) * | 2016-07-06 | 2022-01-06 | International Business Machines Corporation | Determining when to release a lock from a first task holding the lock to grant to a second task waiting for the lock |
US12175304B2 (en) * | 2016-07-06 | 2024-12-24 | International Business Machines Corporation | Determining when to release a lock from a first task holding the lock to grant to a second task waiting for the lock |
Also Published As
Publication number | Publication date |
---|---|
US6757769B1 (en) | 2004-06-29 |
Similar Documents
Publication | Title |
---|---|
US6757769B1 (en) | Cooperative lock override procedure |
US6691194B1 (en) | Selective association of lock override procedures with queued multimodal lock |
US6718448B1 (en) | Queued locking of a shared resource using multimodal lock types |
US7246187B1 (en) | Method and apparatus for controlling exclusive access to a shared resource in a data storage system |
US6353869B1 (en) | Adaptive delay of polling frequencies in a distributed system with a queued lock |
US6578033B1 (en) | System and method for accessing a shared computer resource using a lock featuring different spin speeds corresponding to multiple states |
US6609178B1 (en) | Selective validation for queued multimodal locking services |
JP3121584B2 (en) | Method and apparatus for controlling the number of servers in a multi-system cluster |
US5987550A (en) | Lock mechanism for shared resources in a data processing system |
US6226717B1 (en) | System and method for exclusive access to shared storage |
US6105085A (en) | Lock mechanism for shared resources having associated data structure stored in common memory include a lock portion and a reserve portion |
US6185639B1 (en) | System and method to reduce a computer system's interrupt processing overhead |
US6009275A (en) | Centralized management of resources shared by multiple processing units |
US6105098A (en) | Method for managing shared resources |
JP3871305B2 (en) | Dynamic serialization of memory access in multiprocessor systems |
US6463532B1 (en) | System and method for effectuating distributed consensus among members of a processor set in a multiprocessor computing system through the use of shared storage resources |
EP0428006A2 (en) | Multilevel locking system and method |
JPS63238634A (en) | Decentralized multiplex processing transaction processing system |
JPH07191944A (en) | System and method for prevention of deadlock in instruction to many resources by multiprocessor |
US7472237B1 (en) | Apparatus to offload and accelerate pico code processing running in a storage processor |
EP0267464B1 (en) | Method for controlling processor access to input/output devices |
US20150052529A1 (en) | Efficient task scheduling using a locking mechanism |
EP0853281A2 (en) | Raid apparatus and access control method therefor |
JPH02224052A (en) | Bus arbitration method and apparatus for provision of bus possession |
JP2001084235A (en) | Exclusive control method using lock particle size statistical information and computer-readable recording medium with program recorded therein |
Legal Events
Code | Title | Description |
---|---|---|
AS | Assignment | Owner name: EMC CORPORATION, MASSACHUSETTS. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNOR: OFER, ADI; REEL/FRAME: 015316/0283. Effective date: 20001120 |
STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |