Detailed Description
To make the objects, technical solutions and advantages of the present disclosure clearer, the present disclosure will be described in further detail with reference to the accompanying drawings, and it is apparent that the described embodiments are only a part of the embodiments of the present disclosure, rather than all embodiments. All other embodiments, which can be derived by one of ordinary skill in the art from the embodiments disclosed herein without making any creative effort, shall fall within the scope of protection of the present disclosure.
Some of the words that appear in the text are explained below:
1. The term "and/or" in the embodiments of the present disclosure describes an association relationship between associated objects and indicates that three relationships are possible. For example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship.
2. The terms "first," "second," and the like in the description and in the claims of the present disclosure and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein.
The application scenario described in the embodiment of the present disclosure is for more clearly illustrating the technical solution of the embodiment of the present disclosure, and does not form a limitation on the technical solution provided in the embodiment of the present disclosure, and as a person having ordinary skill in the art knows, with the occurrence of a new application scenario, the technical solution provided in the embodiment of the present disclosure is also applicable to similar technical problems. In the description of the present disclosure, the term "plurality" means two or more unless otherwise specified.
The prior art provides no specific detection and processing method for scenarios with high or low service pressure, which increases the misjudgment rate of slow disk detection.
To solve the above problem, the present disclosure provides a slow disk detection method, a slow disk detection device, and a computer readable and writable storage medium. The I/O timeout probability of each hard disk is counted and calculated over two time dimensions to screen out slow disks, and specific detection and processing methods are provided for scenarios with high and low service pressure, thereby reducing the misjudgment rate of slow disk detection and improving its accuracy.
Reference is first made to fig. 1, which is a schematic view of an application scenario of an embodiment of the present disclosure. A user 10 logs in to a metadata server 12 in the distributed storage cluster system through a client installed on a user device 11, where the client may be a web browser or an application client installed on a mobile user device such as a mobile phone or a tablet computer.
The user device 11 and the metadata server 12 are communicatively connected via a network, which may be a local area network, a wide area network, or the like. The user device 11 may be a portable device (e.g., a mobile phone, a tablet, a notebook, etc.) or a Personal Computer (PC), and the metadata server 12 may be any device capable of providing internet services.
One possible form of communication between the user device 11 and the metadata server 12 is as follows: the user 10 logs on to a corresponding slow disk detection platform and sends a slow disk detection instruction to the metadata server 12 through a communication network, and the data node 13 in the metadata server 12 determines whether each hard disk belonging to the data node 13 is a slow disk.
In the embodiment of the present disclosure, the data node 13 uses the second preset interval as a sliding window and slides it with the first preset interval as the sliding step length. When the slow disk detection condition is met, the data node 13 obtains the slow disk detection parameters respectively corresponding to each first preset interval and each second preset interval in the current sliding window, where the slow disk detection parameters include the service duration, the number of I/O operations, and the number of I/O timeouts of a hard disk. A first I/O timeout probability of the first preset interval is determined according to the slow disk detection parameters of the first preset interval, and a second I/O timeout probability of the second preset interval is determined according to the slow disk detection parameters of the second preset interval. When the first I/O timeout probability or the second I/O timeout probability is determined to exceed the corresponding I/O timeout threshold, the hard disk is determined to be a slow disk.
The embodiment of the present disclosure provides a slow disk detection method and, based on the same concept, further provides a slow disk detection device, an electronic device, and a computer readable and writable storage medium.
Example 1
A slow disk detection method provided by the present disclosure is described below with specific embodiments. The method is applied to a data node in a distributed storage cluster system and, as shown in fig. 2, includes:
step 201, adopting a second preset interval as a sliding window, and sliding by taking the first preset interval as a sliding step length;
specifically, the second preset interval is n times the first preset interval. The first preset interval, the second preset interval, and the multiple n between them can be set according to actual conditions or directly through a configuration file of the data node. The first preset interval is denoted $I_1$ and the second preset interval is denoted $I_2$, where $I_2 = \{I_{11}, I_{12}, I_{13}, \ldots, I_{1n}\}$. As shown in fig. 3, the first preset interval is 1 day and the second preset interval is 7 days. The second preset interval serves as the sliding window and the first preset interval as the sliding step length; as shown in fig. 4, with a sliding step length of 1 day the sliding window moves from its original position [0,7] to position [1,8]. Providing two preset intervals widens the detection dimension so that each hard disk is examined both locally and as a whole, which allows different scenarios to be handled effectively. Meanwhile, the hard disks are detected independently of one another with no mutual dependence, which ensures the uniqueness of the detection result and the low coupling of the detection scheme.
Step 202, when a slow disc detection condition is met, obtaining slow disc detection parameters respectively corresponding to each first preset interval and each second preset interval in a current sliding window, wherein the slow disc detection parameters comprise service duration, I/O times and I/O overtime times of a hard disc;
specifically, the service duration of the hard disk is a sum of response durations of the hard disk processing the I/O requests in the first preset interval or the second preset interval, the I/O timeout times of the hard disk are total times of response timeout when the hard disk processes the I/O requests in the first preset interval or the second preset interval, and the I/O times of the hard disk are total times of the hard disk processing the I/O requests in the first preset interval or the second preset interval and include the I/O timeout times.
Step 203, determining a first I/O timeout probability of the first preset interval according to the slow disc detection parameters of the first preset interval;
specifically, the first I/O timeout probability is represented as a probability that an I/O timeout occurs within a unit time in a first preset interval, and is determined according to a service duration of a hard disk, I/O times, and I/O timeout times in the first preset interval.
Step 204, determining a second I/O timeout probability of the second preset interval according to the slow disc detection parameters of the second preset interval;
specifically, the second I/O timeout probability represents the probability that an I/O timeout occurs per unit time within the second preset interval, and is determined according to the service duration, the number of I/O operations, and the number of I/O timeouts of the hard disk in the second preset interval. Because hard disks on a data node may be hot-plugged, their service durations and the numbers of I/O requests they receive can differ by orders of magnitude; evaluating every hard disk by its I/O timeout probability per unit time puts them on a common basis and improves detection accuracy.
Step 205, determining that the hard disk is a slow disk when the first I/O timeout probability or the second I/O timeout probability exceeds the corresponding I/O timeout threshold.
Specifically, if the first I/O timeout probability is determined to exceed the corresponding I/O timeout threshold, determining that the hard disk is a slow disk; otherwise, determining a second I/O timeout probability of the second preset interval, and if the second I/O timeout probability is determined to exceed the corresponding I/O timeout threshold, determining that the hard disk is a slow disk.
The slow disk detection method provided by the embodiment of the disclosure can be used for counting and calculating the I/O overtime probability of each hard disk from two time dimensions of the first preset interval and the second preset interval, and screening the slow disks by comparing the threshold values respectively corresponding to the intervals, so that the misjudgment rate of slow disk detection is reduced, and the accuracy of slow disk detection is improved.
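The sliding window of step 201 and the parameter collection of step 202 can be illustrated with a minimal Python sketch. The class and field names (`IntervalStats`, `SlidingWindow`, `service_seconds`, and so on) are hypothetical and not taken from the disclosure. The sketch also assumes that the first preset interval evaluated in step 203 is the most recent one in the window and that the parameters of the second preset interval are the sums over its first preset intervals.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class IntervalStats:
    """Slow disk detection parameters of one interval (hypothetical layout)."""
    service_seconds: float = 0.0  # sum of I/O response durations
    io_count: int = 0             # total I/O requests, timeouts included
    io_timeouts: int = 0          # I/O requests whose response timed out


class SlidingWindow:
    """Holds the most recent n first preset intervals, i.e. one second preset interval."""

    def __init__(self, n: int):
        # A bounded deque drops the oldest interval automatically when full.
        self.intervals = deque(maxlen=n)

    def slide(self, newest: IntervalStats) -> None:
        """Advance the window by one first preset interval (the sliding step)."""
        self.intervals.append(newest)

    def first_interval(self) -> IntervalStats:
        """Parameters of the most recent first preset interval (assumption)."""
        return self.intervals[-1]

    def second_interval(self) -> IntervalStats:
        """Parameters of the whole window, summed over its first preset intervals."""
        return IntervalStats(
            service_seconds=sum(s.service_seconds for s in self.intervals),
            io_count=sum(s.io_count for s in self.intervals),
            io_timeouts=sum(s.io_timeouts for s in self.intervals),
        )


# Eight daily intervals (days 0..7) pushed into a 7-day window:
# the eighth push moves the window from [0,7] to [1,8].
window = SlidingWindow(n=7)
for day in range(8):
    window.slide(IntervalStats(service_seconds=10.0, io_count=1000 + day, io_timeouts=day))
```

Keeping one record per first preset interval means a slide only adds the newest record and discards the oldest, so the window totals never require rescanning raw I/O events.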
The slow disk detection condition can be set as required: detection can be triggered periodically, triggered when a certain time point is reached, or triggered according to the data storage condition. As an alternative implementation, whether the detection condition is satisfied may be determined according to a configuration item of the configuration file. The configuration item indicates whether the slow disk detection switch is turned on, and when the switch is determined to be on according to the configuration item, the slow disk detection condition is determined to be met.
The configuration file may further include a first preset interval, a second preset interval, corresponding thresholds, and other related configuration items.
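As an illustration only, such a configuration file might be read as in the following Python sketch. The section name, the key names, and the threshold values are hypothetical placeholders, since the disclosure does not specify a file format. The 1-day interval and the multiple of 7 follow the example of fig. 3.

```python
import configparser
from dataclasses import dataclass

# Hypothetical INI-style configuration; all keys and values are illustrative.
EXAMPLE_CONFIG = """
[slow_disk_detection]
enabled = true
first_interval_seconds = 86400
interval_multiple = 7
first_timeout_threshold = 0.01
second_timeout_threshold = 0.005
"""


@dataclass(frozen=True)
class SlowDiskConfig:
    enabled: bool                 # slow disk detection switch
    first_interval_seconds: int   # length of the first preset interval
    n: int                        # multiple between the second and first intervals
    first_threshold: float        # I/O timeout threshold of the first interval
    second_threshold: float       # I/O timeout threshold of the second interval

    @property
    def second_interval_seconds(self) -> int:
        """The second preset interval is n times the first preset interval."""
        return self.n * self.first_interval_seconds


def load_config(text: str) -> SlowDiskConfig:
    """Parse the configuration text into a typed configuration object."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["slow_disk_detection"]
    return SlowDiskConfig(
        enabled=section.getboolean("enabled"),
        first_interval_seconds=section.getint("first_interval_seconds"),
        n=section.getint("interval_multiple"),
        first_threshold=section.getfloat("first_timeout_threshold"),
        second_threshold=section.getfloat("second_timeout_threshold"),
    )


def detection_condition_met(cfg: SlowDiskConfig) -> bool:
    """The detection condition is met when the detection switch is on."""
    return cfg.enabled


cfg = load_config(EXAMPLE_CONFIG)
```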
As an optional implementation manner, when the method is applied to data nodes, each data node may establish its own mapping relation table in memory. For each of the hard disks of the data node, the table holds the data from which the slow disk detection parameters are determined. Specifically, the content of the mapping relation table may include the hard disk slot number, the hard disk mark, the service duration, the number of I/O operations, and the number of I/O timeouts, and the slow disk detection parameters can be determined from the table established in memory. The hard disk slot number is the number of the slot in the server where the hard disk is located. The hard disk mark identifies a disk as normal, slow, bad, or unknown; the embodiments of the present disclosure are concerned only with normal disks and slow disks. Whether a hard disk is a slow disk is determined from its hard disk mark. The service durations of the hard disk in the first and second preset intervals are determined by reading its service duration, the numbers of I/O operations in the first and second preset intervals by reading its number of I/O operations, and the numbers of I/O timeouts in the first and second preset intervals by reading its number of I/O timeouts.
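A minimal Python sketch of such an in-memory mapping relation table follows. The names `DiskMark`, `DiskEntry`, `MappingTable`, `record_io`, and `mark_slow` are hypothetical, and the table is assumed to be keyed by hard disk slot number.

```python
from dataclasses import dataclass
from enum import Enum


class DiskMark(Enum):
    """Hard disk mark; only NORMAL and SLOW are of concern in this embodiment."""
    NORMAL = "normal"
    SLOW = "slow"
    BAD = "bad"
    UNKNOWN = "unknown"


@dataclass
class DiskEntry:
    """One row of the mapping relation table (hypothetical layout)."""
    slot: int                        # slot number in the server
    mark: DiskMark = DiskMark.NORMAL
    service_seconds: float = 0.0     # sum of I/O response durations
    io_count: int = 0                # total I/O requests, timeouts included
    io_timeouts: int = 0             # I/O requests whose response timed out


class MappingTable:
    """In-memory table of one data node, keyed by hard disk slot number."""

    def __init__(self):
        self.entries = {}  # slot number -> DiskEntry

    def record_io(self, slot: int, response_seconds: float, timed_out: bool) -> None:
        """Update the row of a hard disk as its service data flows in and out."""
        entry = self.entries.setdefault(slot, DiskEntry(slot))
        entry.service_seconds += response_seconds
        entry.io_count += 1
        if timed_out:
            entry.io_timeouts += 1

    def mark_slow(self, slot: int) -> None:
        """Mark a hard disk as a slow disk; only memory is touched."""
        self.entries[slot].mark = DiskMark.SLOW


table = MappingTable()
table.record_io(slot=3, response_seconds=0.02, timed_out=False)
table.record_io(slot=3, response_seconds=1.50, timed_out=True)
```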
As an optional implementation manner, when determining that the hard disk is a slow disk, the method further includes:
marking the hard disk as a slow disk;
isolating the hard disk and reporting the hard disk to a metadata node, and stopping writing in service data;
and transferring the data on the hard disk to other hard disks of the data nodes or hard disks of other data nodes through load balancing.
Specifically, this process belongs to the prior art; load balancing can currently be implemented by a number of methods, such as dynamic load balancing, which are not described here.
FIG. 5 is a flowchart illustrating an overview of a slow disk detection method according to an exemplary embodiment, applied to a data node in a distributed storage cluster, as shown in FIG. 5, including:
step 501, reading a configuration file by each data node in a distributed storage cluster system;
specifically, when the data node is started, the configuration file is read, and whether the slow disk detection switch is turned on or not is determined according to the configuration items in the configuration file, so that the data node can detect the hard disk in real time.
Step 502, judging whether to start slow disc detection according to configuration items in the configuration file, and if not, directly ending the slow disc detection;
step 503, according to the configuration items in the configuration file, if the slow disc detection is started, setting each preset interval, each preset interval threshold value and other related configuration items according to the configuration file;
specifically, a first preset interval and a corresponding threshold, a second preset interval and a corresponding threshold, and the like need to be set.
Step 504, each data node establishes a mapping relation table in a memory, and dynamically updates the content of the mapping relation table along with the input and output of the hard disk service data;
step 505, each data node performs periodic I/O performance statistics on the corresponding hard disk according to the mapping relation table;
specifically, statistics may be performed when the sliding window slides once, that is, according to the content of the mapping relationship table, slow disc detection parameters of the current sliding window and each first preset interval in the sliding window are determined.
It should be noted that each new slide of the sliding window updates only one first preset interval, and the slow disk determination based on the slow disk detection parameters has already been performed for the part of the second preset interval that is not updated, so only the slow disk detection parameters of the newly updated first preset interval need to be counted.
Step 506, detecting whether the hard disk is a slow disk according to the counted slow disk detection parameters in the first preset interval, if not, executing step 507, otherwise, executing step 508;
step 507, detecting whether the hard disk is a slow disk according to the counted slow disk detection parameters in the second preset interval, if not, executing step 505, otherwise, executing step 508;
specifically, in the initial stage after the data node is powered on, if the duration covered by the slow disk detection statistics has not yet reached the second preset interval, the second I/O timeout probability of the second preset interval is not calculated and only the first I/O timeout probability of the first preset interval is calculated. Because the hard disks on the data node are new hardware in the early stage after power-on, the possibility of a slow disk can essentially be ignored. Once the duration of the statistics reaches the second preset interval, the second I/O timeout probability of the second preset interval is calculated, the window slides by the length of the first preset interval, and data that no longer falls within the second preset interval is cleared.
Step 508, the data node marks the hard disk as a slow disk, and dynamically updates a mapping relation table;
specifically, the data node only needs to modify the mapping relation table in memory, which places no pressure on the system, memory, or network bandwidth of the data node and thereby improves the availability of the storage cluster.
Step 509, isolating the hard disk and reporting the hard disk to a metadata node, and stopping writing in service data;
step 510, transferring the data on the hard disk to other hard disks of the data node or hard disks of other data nodes through load balancing.
Specifically, steps 508 to 510 ensure the availability, reliability, and access efficiency of the distributed storage cluster.
As an optional implementation manner, the first I/O timeout probability and the second I/O timeout probability in this embodiment are calculated as follows:
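The power-on behaviour described for step 507 can be sketched in Python as follows: the second I/O timeout probability is reported only once the statistics span a full second preset interval. The class name `WarmupAwareWindow` is hypothetical, and the probability is computed as timeouts divided by I/O count, which assumes the service duration is the same in the numerator and denominator of the frequency ratio and therefore cancels.

```python
from collections import deque
from typing import Optional


def ratio(io_timeouts: int, io_count: int) -> float:
    """Timeouts per I/O request; 0.0 when the disk served no requests."""
    return io_timeouts / io_count if io_count else 0.0


class WarmupAwareWindow:
    """Window of n first preset intervals that stays silent until it is full."""

    def __init__(self, n: int):
        self.n = n
        # Each element is (io_count, io_timeouts) of one first preset interval;
        # intervals that leave the window are cleared automatically.
        self.intervals = deque(maxlen=n)

    def slide(self, io_count: int, io_timeouts: int) -> None:
        self.intervals.append((io_count, io_timeouts))

    def first_probability(self) -> float:
        """First I/O timeout probability of the newest first preset interval."""
        io_count, io_timeouts = self.intervals[-1]
        return ratio(io_timeouts, io_count)

    def second_probability(self) -> Optional[float]:
        """Second I/O timeout probability, or None during the power-on phase."""
        if len(self.intervals) < self.n:
            return None  # statistics do not yet cover the second preset interval
        total_io = sum(io for io, _ in self.intervals)
        total_timeouts = sum(t for _, t in self.intervals)
        return ratio(total_timeouts, total_io)


w = WarmupAwareWindow(n=3)
w.slide(100, 1)
early = w.second_probability()   # None: only one of three intervals collected
w.slide(100, 2)
w.slide(100, 3)
full = w.second_probability()    # 6 timeouts over 300 requests
```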
1) first I/O timeout probability
Calculating the average I/O timeout frequency of the first preset interval as the ratio of the number of I/O timeouts in the first preset interval to the service duration of the hard disk in the first preset interval;
calculating the average I/O frequency of the first preset interval as the ratio of the number of I/O operations in the first preset interval to the service duration of the hard disk in the first preset interval;
and calculating the first I/O timeout probability of the first preset interval as the ratio of the average I/O timeout frequency of the first preset interval to the average I/O frequency of the first preset interval.
2) Second I/O timeout probability
Calculating the average I/O timeout frequency of the second preset interval as the ratio of the number of I/O timeouts in the second preset interval to the service duration of the hard disk in the second preset interval;
calculating the average I/O frequency of the second preset interval as the ratio of the number of I/O operations in the second preset interval to the service duration of the hard disk in the second preset interval;
and calculating the second I/O timeout probability of the second preset interval as the ratio of the average I/O timeout frequency of the second preset interval to the average I/O frequency of the second preset interval.
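The same three-step calculation applies to both intervals and can be written as one Python function. The function and parameter names are hypothetical, and returning 0.0 for a disk with no service time or no I/O is an assumption of this sketch rather than something the disclosure states.

```python
def timeout_probability(service_seconds: float, io_count: int, io_timeouts: int) -> float:
    """I/O timeout probability of one preset interval.

    Follows the three ratios described above; applies equally to the first
    and the second preset interval.
    """
    if service_seconds <= 0 or io_count <= 0:
        return 0.0  # assumption: an idle disk has no measurable timeout probability
    avg_timeout_frequency = io_timeouts / service_seconds  # timeouts per unit time
    avg_io_frequency = io_count / service_seconds          # I/O requests per unit time
    return avg_timeout_frequency / avg_io_frequency


# Example: 20 timeouts among 1000 requests served in 50 seconds of service time.
p = timeout_probability(service_seconds=50.0, io_count=1000, io_timeouts=20)
```

Because both frequencies are divided by the same service duration, the result equals the number of I/O timeouts divided by the number of I/O operations whenever the service duration is non-zero. This is why hard disks with very different service durations can be compared on the same scale.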
The method above can detect slow disks in a normal service environment, but actual service scenarios also include scenarios with high service pressure and scenarios with low service pressure. In a scenario with low service pressure the traffic volume is below the normal traffic volume, so a slow disk may stay below both I/O timeout thresholds and go undetected, which affects the reliability of the distributed cluster.
As an optional implementation, the method further comprises:
if the first I/O timeout probability and the second I/O timeout probability are both smaller than the corresponding I/O timeout threshold and I/O timeout occurs in a plurality of continuous first preset intervals in the current sliding window, determining the number of the continuous first preset intervals;
and if the number of the continuous first preset intervals is larger than the number of the set intervals, determining that the hard disk is a slow disk.
Specifically, when the traffic volume of the scenario is smaller than the normal traffic volume, both the first I/O timeout probability of the first preset interval and the second I/O timeout probability of the second preset interval are smaller than the corresponding I/O timeout thresholds. However, if I/O timeouts of the hard disk start in the i-th first preset interval $I_{1i}$ of the second preset interval and continue until they end in the j-th first preset interval $I_{1j}$, and the number of first preset intervals in which I/O timeouts occur is greater than or equal to the set number of intervals, that is, $j - i + 1 \geq N$, where $1 \leq i \leq j \leq n$, n is the multiple between the second preset interval and the first preset interval, and N is the set number of intervals, whose value can be set according to the actual situation, then the hard disk is judged to be a slow disk.
As shown in FIG. 6, the threshold N is set in the configuration file of the data node and n is set to 7. I/O timeouts occur continuously from the 2nd first preset interval to the 6th first preset interval, so i = 2 and j = 6, and the hard disk is judged to be a slow disk because $j - i + 1 = 5$ is not less than N.
Given the characteristics of slow disks, the scenario of low service pressure is taken into account: a hard disk may exceed neither the threshold of the first preset interval nor the threshold of the second preset interval and yet produce I/O timeouts continuously over multiple first preset intervals. Slow disks are effectively screened out on the basis of this phenomenon and their data is migrated in time, which improves the reliability of the distributed cluster.
In a scenario with high service pressure, the high pressure can cause the hard disk to jitter and produce high-frequency I/O timeouts, so the target hard disk is easily misjudged as a slow disk.
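The low-pressure check can be sketched in Python as a search for the longest run of consecutive first preset intervals that contain at least one I/O timeout. The function names are hypothetical. The comparison uses "greater than or equal to", following the condition $j - i + 1 \geq N$, and the value 5 used for N below is purely illustrative.

```python
from typing import Sequence


def longest_timeout_run(timeouts_per_interval: Sequence[int]) -> int:
    """Length of the longest run of consecutive intervals with at least one timeout."""
    longest = current = 0
    for timeouts in timeouts_per_interval:
        current = current + 1 if timeouts > 0 else 0  # a clean interval resets the run
        longest = max(longest, current)
    return longest


def slow_by_consecutive_timeouts(timeouts_per_interval: Sequence[int], set_count: int) -> bool:
    """True when the run length j - i + 1 reaches the set number of intervals N."""
    return longest_timeout_run(timeouts_per_interval) >= set_count


# n = 7 first preset intervals; timeouts occur in the 2nd through 6th interval.
week = [0, 1, 2, 1, 1, 3, 0]
run = longest_timeout_run(week)                           # j - i + 1 = 6 - 2 + 1
flagged = slow_by_consecutive_timeouts(week, set_count=5)  # hypothetical N = 5
```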
As an optional implementation manner, obtaining the first I/O timeout count includes:
when the frequency of I/O timeouts exceeds a set frequency threshold, merging the first I/O timeouts collected within a third preset interval into one, wherein the length of the third preset interval is smaller than that of the first preset interval.
Specifically, to improve the accuracy of slow disk detection, when the data node detects that the frequency of I/O timeouts exceeds the set frequency threshold, it sets a third preset interval, merges the first I/O timeouts collected within the third preset interval into one, and records I/O timeouts occurring outside the third preset interval normally. The length of the third preset interval is smaller than that of the first preset interval.
For example, the third preset interval is set to 30 seconds (s). An I/O timeout occurs at time t and the data node records it. If further I/O timeouts occur within the interval [t, t+30], the data node does not record them, that is, only one I/O timeout is recorded in [t, t+30]; I/O timeouts that occur outside [t, t+30] are recorded normally.
In a scenario with high service pressure, I/O timeouts that occur frequently within the third preset interval are merged and counted only once, which reduces the probability of misjudging a hard disk as a slow disk because of high service pressure and improves the accuracy of slow disk detection.
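The merging rule can be sketched in Python as follows. The function names and the threshold values are hypothetical. The sketch assumes that every recorded timeout opens a new closed interval [t, t+30] and that the timeout frequency is measured as the number of timeouts divided by the length of the first preset interval in seconds.

```python
from typing import List, Sequence


def merge_timeouts(timestamps: Sequence[float], merge_window: float = 30.0) -> List[float]:
    """Keep one timeout per third preset interval [t, t + merge_window]."""
    recorded: List[float] = []
    for t in sorted(timestamps):
        # A timeout is recorded only if it falls outside the window
        # opened by the previously recorded timeout.
        if not recorded or t > recorded[-1] + merge_window:
            recorded.append(t)
    return recorded


def count_timeouts(
    timestamps: Sequence[float],
    interval_seconds: float,
    frequency_threshold: float,
    merge_window: float = 30.0,
) -> int:
    """I/O timeout count of one first preset interval, merged only under high frequency."""
    frequency = len(timestamps) / interval_seconds  # timeouts per second
    if frequency > frequency_threshold:
        return len(merge_timeouts(timestamps, merge_window))
    return len(timestamps)  # normal pressure: every timeout is counted


# Timeout timestamps in seconds within a 120-second observation span.
stamps = [0, 5, 29, 30, 31, 70, 95, 101]
merged = merge_timeouts(stamps)
```

With these inputs the timeouts at 5, 29, and 30 fall inside [0, 30] and are absorbed by the first record, while 31 lies outside it and opens the next window.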
Fig. 7 is a flowchart illustrating a slow disc detection method according to an exemplary embodiment, as shown in fig. 7, including:
step 701, acquiring a slow disc detection parameter according to a mapping relation table;
specifically, according to the service time of the hard disk, the time when the I/O occurs, and the time when the I/O overtime occurs in the mapping relationship table, slow disk detection parameters respectively corresponding to each first preset interval and each second preset interval in the current sliding window are obtained.
Step 702, determining a first I/O timeout probability of a first preset interval;
as described above, before the first I/O timeout probability of the first preset interval is determined, the frequency of I/O timeouts, that is, the number of I/O timeouts per unit time, is calculated from the slow disk detection parameters of the first preset interval. If the frequency of I/O timeouts exceeds the set frequency threshold, the first I/O timeouts collected within a third preset interval are merged into one, the first I/O timeouts occurring outside the third preset interval are recorded normally, and the first I/O timeout probability of the first preset interval is then determined; if the frequency does not exceed the set frequency threshold, the first I/O timeout probability of the first preset interval is determined directly.
Step 703, determining whether the first I/O timeout probability exceeds a corresponding I/O timeout threshold, if yes, executing step 708, otherwise, executing step 704;
step 704, determining a second I/O timeout probability of a second preset interval;
step 705, determining whether the second I/O timeout probability exceeds the corresponding I/O timeout threshold, if yes, executing step 708, otherwise executing step 706;
step 706, acquiring the number of a plurality of continuous first preset intervals in the current sliding window;
step 707, determining whether the number of the consecutive first preset intervals is greater than the set number of intervals, if so, executing step 708, otherwise, executing step 709;
step 708, determining that the hard disk is a slow disk;
step 709, determining that the hard disk is not a slow disk.
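The decision flow of steps 702 to 709 can be summarized in a short Python sketch. The function names and the threshold values in the usage lines are hypothetical. The probability is taken as timeouts divided by I/O count, and the last check uses "greater than or equal to" in line with the condition $j - i + 1 \geq N$; a strict comparison can be substituted if the set number of intervals is meant as an exclusive bound.

```python
from typing import Sequence, Tuple


def probability(io_count: int, io_timeouts: int) -> float:
    """Timeouts per I/O request; 0.0 when no requests were served."""
    return io_timeouts / io_count if io_count else 0.0


def detect(
    per_interval: Sequence[Tuple[int, int]],
    first_threshold: float,
    second_threshold: float,
    set_count: int,
) -> bool:
    """Return True if the hard disk is judged to be a slow disk.

    per_interval holds (io_count, io_timeouts) for each first preset interval
    in the current sliding window, oldest first.
    """
    # Steps 702-703: first I/O timeout probability of the newest first interval.
    latest_io, latest_timeouts = per_interval[-1]
    if probability(latest_io, latest_timeouts) > first_threshold:
        return True
    # Steps 704-705: second I/O timeout probability of the whole window.
    total_io = sum(io for io, _ in per_interval)
    total_timeouts = sum(t for _, t in per_interval)
    if probability(total_io, total_timeouts) > second_threshold:
        return True
    # Steps 706-707: consecutive first intervals that contain I/O timeouts.
    longest = current = 0
    for _, timeouts in per_interval:
        current = current + 1 if timeouts > 0 else 0
        longest = max(longest, current)
    return longest >= set_count  # steps 708 / 709


healthy = [(1000, 0)] * 7
burst = [(1000, 0)] * 6 + [(1000, 50)]
low_pressure = [(10000, 0)] + [(10000, 1)] * 5 + [(10000, 0)]
results = [detect(w, 0.01, 0.005, 5) for w in (healthy, burst, low_pressure)]
```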
Example 2
Based on the same inventive concept, the embodiment of the present disclosure further provides a slow disk detection apparatus. Since the apparatus corresponds to the method in the embodiment of the present disclosure and solves the problem on a similar principle, the implementation of the apparatus may refer to the implementation of the method, and repeated details are omitted.
As shown in fig. 8, the above apparatus includes the following modules:
a second preset interval sliding module 801, configured to slide by using the second preset interval as a sliding window and using the first preset interval as a sliding step length;
a slow disc detection parameter obtaining module 802, configured to obtain slow disc detection parameters corresponding to each of a first preset interval and a second preset interval in a current sliding window when a slow disc detection condition is met, where the slow disc detection parameters include a service duration, I/O times, and I/O timeout times of a hard disc;
a first I/O timeout probability determining module 803, configured to determine, according to the slow disc detection parameter in the first preset interval, a first I/O timeout probability in the first preset interval;
a second I/O timeout probability determining module 804, configured to determine a second I/O timeout probability of the second preset interval according to the slow disc detection parameter of the second preset interval;
a slow disk determining module 805, configured to determine that the hard disk is a slow disk when determining that the first I/O timeout probability or the second I/O timeout probability exceeds the corresponding I/O timeout threshold.
As an optional implementation manner, the second I/O timeout probability determining module is configured to determine the second I/O timeout probability of the second preset interval according to the I/O times and the I/O timeout times collected in the second preset interval, and includes:
and when the first I/O timeout probability is determined not to exceed the corresponding I/O timeout threshold, determining a second I/O timeout probability of the second preset interval according to the I/O times and the I/O timeout times acquired in the second preset interval.
As an optional implementation, the apparatus further comprises:
a first preset interval number determining module, configured to determine the number of consecutive first preset intervals if the first I/O timeout probability and the second I/O timeout probability are both smaller than corresponding I/O timeout thresholds and I/O timeout occurs in consecutive first preset intervals within the current sliding window;
the slow disk determining module is further configured to determine that the hard disk is a slow disk if the number of the consecutive first preset intervals is greater than the number of the set intervals.
As an optional implementation manner, the slow disc detection parameter obtaining module is configured to obtain the first I/O timeout number, and includes:
and when the frequency of I/O timeouts exceeds a set frequency threshold, merging the first I/O timeouts collected within a third preset interval into one, wherein the length of the third preset interval is smaller than that of the first preset interval.
As an optional implementation manner, the first I/O timeout probability determining module is configured to determine, according to the slow disc detection parameter of the first preset interval, a first I/O timeout probability of the first preset interval, and includes:
calculating the average I/O timeout frequency of the first preset interval as the ratio of the number of I/O timeouts in the first preset interval to the service duration of the hard disk in the first preset interval;
calculating the average I/O frequency of the first preset interval as the ratio of the number of I/O operations in the first preset interval to the service duration of the hard disk in the first preset interval;
and calculating the first I/O timeout probability of the first preset interval as the ratio of the average I/O timeout frequency of the first preset interval to the average I/O frequency of the first preset interval.
As an optional implementation manner, the second I/O timeout probability determining module is configured to determine, according to the slow disc detection parameter of the second preset interval, a second I/O timeout probability of the second preset interval, and includes:
calculating the average I/O timeout frequency of the second preset interval as the ratio of the number of I/O timeouts in the second preset interval to the service duration of the hard disk in the second preset interval;
calculating the average I/O frequency of the second preset interval as the ratio of the number of I/O operations in the second preset interval to the service duration of the hard disk in the second preset interval;
and calculating the second I/O timeout probability of the second preset interval as the ratio of the average I/O timeout frequency of the second preset interval to the average I/O frequency of the second preset interval.
As an optional implementation manner, the slow disc detection parameter obtaining module is configured to obtain slow disc detection parameters corresponding to each of a first preset interval and a second preset interval in a current sliding window when a slow disc detection condition is met, where the slow disc detection parameters include service duration, I/O times, and I/O timeout times of a hard disc, and the method includes:
acquiring a mapping relation table of hard disk slot numbers, hard disk marks, service time, I/O times and I/O overtime times of the hard disk established by the data nodes;
and acquiring slow disc detection parameters respectively corresponding to each first preset interval and each second preset interval in the current sliding window according to the service time, the I/O times and the I/O overtime times of the hard disc in the mapping relation table.
As an optional implementation, the slow disk determining module, upon determining that the hard disk is a slow disk, is further configured to perform:
marking the hard disk as a slow disk;
isolating the hard disk, reporting the hard disk to a metadata node, and stopping the writing of service data to the hard disk;
and migrating the data on the hard disk, through load balancing, to other hard disks of the same data node or to hard disks of other data nodes.
Example 3
Based on the same inventive concept, an embodiment of the present disclosure further provides an electronic device for slow disk detection. Because this electronic device is the electronic device that carries out the method in the embodiments of the present disclosure, and the principle by which it solves the problem is similar to that of the method, the implementation of the electronic device may refer to the implementation of the method, and repeated details are omitted.
An electronic device 90 according to this embodiment of the present disclosure is described below with reference to fig. 9. The electronic device 90 shown in fig. 9 is only an example and should not impose any limitation on the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 9, the electronic device 90 may be embodied in the form of a general purpose computing device, which may be a terminal device, for example. The components of the electronic device 90 may include, but are not limited to: at least one processor 91, at least one memory 92 storing processor-executable instructions, and a bus 93 connecting the various system components (including the memory 92 and the processor 91).
The processor executes the executable instructions to implement the following steps:
sliding a window whose length is the second preset interval, with the first preset interval as the sliding step;
obtaining, when a slow disk detection condition is met, slow disk detection parameters respectively corresponding to each first preset interval and each second preset interval in the current sliding window, wherein the slow disk detection parameters include the service duration, the number of I/O operations, and the number of I/O timeouts of a hard disk;
determining a first I/O timeout probability of the first preset interval according to the slow disk detection parameters of the first preset interval;
determining a second I/O timeout probability of the second preset interval according to the slow disk detection parameters of the second preset interval;
and determining that the hard disk is a slow disk when the first I/O timeout probability or the second I/O timeout probability is determined to exceed the corresponding I/O timeout threshold.
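As a non-limiting illustration, the steps above may be sketched in Python as follows. The class and field names, the threshold values, the choice to evaluate only the newest first preset interval, and the choice to treat "the sliding window is full" as the slow disk detection condition are all assumptions of this sketch; the present disclosure does not prescribe them.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class IntervalStats:
    """Slow disk detection parameters of one first preset interval."""
    service_seconds: float
    io_count: int
    timeout_count: int


def probability(stats_list):
    """Timeout probability over one or more intervals (ratio of average frequencies)."""
    service = sum(s.service_seconds for s in stats_list)
    ios = sum(s.io_count for s in stats_list)
    timeouts = sum(s.timeout_count for s in stats_list)
    if service <= 0 or ios <= 0:
        return 0.0
    return (timeouts / service) / (ios / service)


class SlowDiskDetector:
    def __init__(self, intervals_per_window, first_threshold, second_threshold):
        # The window spans one second preset interval and advances by one
        # first preset interval each time a new sample is appended.
        self.window = deque(maxlen=intervals_per_window)
        self.first_threshold = first_threshold
        self.second_threshold = second_threshold

    def observe(self, stats):
        """Slide the window by one step and report whether the disk looks slow."""
        self.window.append(stats)
        # Short-term check on the newest first preset interval.
        if probability([stats]) > self.first_threshold:
            return True
        # Assumed detection condition: only judge the long-term rate on a full window.
        if len(self.window) < self.window.maxlen:
            return False
        # Long-term check over the whole second preset interval.
        return probability(self.window) > self.second_threshold
```

In this sketch the second I/O timeout probability is evaluated only when the first one does not exceed its threshold, so a short burst of timeouts is caught by the first check and a sustained but milder rate is caught by the second.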
As an optional implementation, the determining, by the processor, of the second I/O timeout probability of the second preset interval according to the number of I/O operations and the number of I/O timeouts collected in the second preset interval includes:
when it is determined that the first I/O timeout probability does not exceed the corresponding I/O timeout threshold, determining the second I/O timeout probability of the second preset interval according to the number of I/O operations and the number of I/O timeouts collected in the second preset interval.
As an optional implementation, the processor is further configured to perform:
if the first I/O timeout probability and the second I/O timeout probability are both smaller than the corresponding I/O timeout thresholds, and I/O timeouts occur in a plurality of consecutive first preset intervals in the current sliding window, determining the number of the consecutive first preset intervals;
and if the number of the consecutive first preset intervals is greater than a set number of intervals, determining that the hard disk is a slow disk.
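The consecutive-interval rule above can be sketched in Python as follows. The function names are hypothetical, and taking the longest run of consecutive intervals in the window is an assumption of this sketch.

```python
def longest_timeout_run(timeout_counts):
    """Length of the longest run of consecutive intervals that saw a timeout.

    timeout_counts -- number of I/O timeouts per first preset interval,
                      in window order (oldest first)
    """
    longest = current = 0
    for count in timeout_counts:
        # Extend the run on a timeout, otherwise reset it.
        current = current + 1 if count > 0 else 0
        longest = max(longest, current)
    return longest


def slow_by_consecutive_timeouts(timeout_counts, set_interval_count):
    """True when timeouts persist for more than the set number of intervals."""
    return longest_timeout_run(timeout_counts) > set_interval_count
```

This check covers a disk whose timeout probabilities stay below both thresholds while timeouts nevertheless recur in every interval.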
As an optional implementation, the acquiring, by the processor, of the number of I/O timeouts of the first preset interval includes:
when the frequency at which I/O timeouts occur exceeds a set frequency threshold, counting the I/O timeouts collected within a third preset interval as a single I/O timeout, wherein the length of the third preset interval is smaller than that of the first preset interval.
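One possible reading of the merging step above is sketched below in Python. The function name, the use of timestamps in seconds, the fixed bucketing of time into third preset intervals, and the interpretation of the frequency threshold as a count per third preset interval are all assumptions of this sketch rather than details given by the present disclosure.

```python
from collections import Counter


def merged_timeout_count(timeout_timestamps, third_interval_seconds, frequency_threshold):
    """Count timeouts, collapsing dense bursts into a single timeout.

    timeout_timestamps     -- times (in seconds) at which I/O timeouts occurred
    third_interval_seconds -- length of the third preset interval
    frequency_threshold    -- timeouts per third preset interval above which
                              the burst is counted once (assumed meaning)
    """
    # Group the timeouts by the third preset interval in which they occurred.
    buckets = Counter(int(t // third_interval_seconds) for t in timeout_timestamps)
    total = 0
    for hits in buckets.values():
        # A burst above the threshold counts once; otherwise count each timeout.
        total += 1 if hits > frequency_threshold else hits
    return total
```

Under this reading, a momentary burst of timeouts does not inflate the timeout count of the first preset interval.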
As an optional implementation, the determining, by the processor, of the first I/O timeout probability of the first preset interval according to the slow disk detection parameters of the first preset interval includes:
calculating the average I/O timeout frequency of the first preset interval according to the ratio of the number of I/O timeouts in the first preset interval to the service duration of the hard disk in the first preset interval;
calculating the average I/O frequency of the first preset interval according to the ratio of the number of I/O operations in the first preset interval to the service duration of the hard disk in the first preset interval;
and calculating the first I/O timeout probability of the first preset interval according to the ratio of the average I/O timeout frequency of the first preset interval to the average I/O frequency of the first preset interval.
As an optional implementation, the determining, by the processor, of the second I/O timeout probability of the second preset interval according to the slow disk detection parameters of the second preset interval includes:
calculating the average I/O timeout frequency of the second preset interval according to the ratio of the number of I/O timeouts in the second preset interval to the service duration of the hard disk in the second preset interval;
calculating the average I/O frequency of the second preset interval according to the ratio of the number of I/O operations in the second preset interval to the service duration of the hard disk in the second preset interval;
and calculating the second I/O timeout probability of the second preset interval according to the ratio of the average I/O timeout frequency of the second preset interval to the average I/O frequency of the second preset interval.
As an optional implementation, the obtaining, by the processor when the slow disk detection condition is met, of the slow disk detection parameters respectively corresponding to each first preset interval and each second preset interval in the current sliding window, the slow disk detection parameters including the service duration, the number of I/O operations, and the number of I/O timeouts of the hard disk, includes:
acquiring a mapping table, established by the data node, that relates the hard disk slot number and the hard disk identifier to the service duration, the number of I/O operations, and the number of I/O timeouts of the hard disk;
and acquiring the slow disk detection parameters respectively corresponding to each first preset interval and each second preset interval in the current sliding window according to the service duration, the number of I/O operations, and the number of I/O timeouts of the hard disk in the mapping table.
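By way of example only, the mapping table and the lookup described above might be organized as in the following Python sketch. The record layout, the keying of the table by slot number, and the helper name `window_parameters` are hypothetical; the present disclosure only states which fields the table relates.

```python
from dataclasses import dataclass, field


@dataclass
class DiskRecord:
    """One row of the mapping table kept by a data node."""
    slot: int
    disk_id: str
    # One (service_seconds, io_count, timeout_count) tuple per first preset
    # interval, oldest first.
    samples: list = field(default_factory=list)


def window_parameters(record, intervals_per_window):
    """Return the per-interval samples of the current window and their totals.

    The per-interval samples serve the first preset intervals; the totals
    serve the second preset interval that the window covers.
    """
    recent = record.samples[-intervals_per_window:]
    if not recent:
        return recent, (0.0, 0, 0)
    # Sum each column: service duration, I/O operations, I/O timeouts.
    totals = tuple(sum(column) for column in zip(*recent))
    return recent, totals


# Usage: a table with one disk in slot 3 and three recorded intervals.
table = {3: DiskRecord(3, "disk-a", [(10.0, 100, 1), (10.0, 200, 0), (10.0, 150, 2)])}
recent, totals = window_parameters(table[3], 2)
```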
As an optional implementation, upon determining that the hard disk is a slow disk, the processor is further configured to perform:
marking the hard disk as a slow disk;
isolating the hard disk, reporting the hard disk to a metadata node, and stopping the writing of service data to the hard disk;
and migrating the data on the hard disk, through load balancing, to other hard disks of the same data node or to hard disks of other data nodes.
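The handling of a slow disk described above may be sketched in Python as follows. The `Disk` structure, the `report` callback standing in for the report to the metadata node, and the "least-used disk first" placement rule standing in for load balancing are assumptions of this sketch and are not taken from the present disclosure.

```python
from dataclasses import dataclass, field


@dataclass
class Disk:
    disk_id: str
    used_bytes: int = 0
    blocks: list = field(default_factory=list)  # (block_name, size_bytes)
    slow: bool = False
    writable: bool = True


def handle_slow_disk(slow_disk, healthy_disks, report):
    """Mark, isolate, report, and drain a disk judged to be slow."""
    slow_disk.slow = True            # mark the hard disk as a slow disk
    slow_disk.writable = False       # isolate it: no new service data is written
    report(slow_disk.disk_id)        # stand-in for reporting to the metadata node
    if not healthy_disks:
        return                       # nowhere to migrate the data yet
    while slow_disk.blocks:
        name, size = slow_disk.blocks.pop()
        # Assumed balancing rule: place each block on the least-used disk.
        target = min(healthy_disks, key=lambda d: d.used_bytes)
        target.blocks.append((name, size))
        target.used_bytes += size
        slow_disk.used_bytes -= size


# Usage: drain disk "a" onto disks "b" and "c".
reported = []
a = Disk("a", 30, [("b1", 10), ("b2", 20)])
b = Disk("b", 5)
c = Disk("c", 0)
handle_slow_disk(a, [b, c], reported.append)
```

The healthy disks passed in may belong to the same data node or to other data nodes; the sketch does not distinguish between the two cases.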
Bus 93 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, and a processor or local bus using any of a variety of bus architectures.
Memory 92 may include readable media in the form of volatile memory, such as Random Access Memory (RAM) 921 and/or cache memory 922, and may further include Read Only Memory (ROM) 923.
Memory 92 may also include a program/utility 925 having a set (at least one) of program modules 924, such program modules 924 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
The electronic device 90 may also communicate with one or more external devices 94 (e.g., keyboard, pointing device, etc.), with one or more devices that enable a user to interact with the electronic device 90, and/or with any devices (e.g., router, modem, etc.) that enable the electronic device 90 to communicate with one or more other computing devices. Such communication may be through an input/output (I/O) interface 95. Also, the electronic device 90 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the Internet) via a network adapter 96. As shown, the network adapter 96 communicates with the other modules of the electronic device 90 via the bus 93. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device 90, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
Example 4
In some possible embodiments, various aspects of the present disclosure may also be implemented as a program product that includes program code. When the program product runs on a terminal device, the program code causes the terminal device to execute the steps of the modules of the slow disk detection apparatus according to the various exemplary embodiments of the present disclosure described in the "exemplary method" section above of this specification. For example, the terminal device may be configured to: slide a window whose length is the second preset interval, with the first preset interval as the sliding step; obtain, when the slow disk detection condition is met, slow disk detection parameters respectively corresponding to each first preset interval and each second preset interval in the current sliding window, wherein the slow disk detection parameters include the service duration, the number of I/O operations, and the number of I/O timeouts of a hard disk; determine a first I/O timeout probability of the first preset interval according to the slow disk detection parameters of the first preset interval; determine a second I/O timeout probability of the second preset interval according to the slow disk detection parameters of the second preset interval; and determine that the hard disk is a slow disk when the first I/O timeout probability or the second I/O timeout probability is determined to exceed the corresponding I/O timeout threshold.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
Fig. 10 shows a program product 100 for slow disk detection according to an embodiment of the present disclosure, which may take the form of a portable compact disc read-only memory (CD-ROM), include program code, and run on a terminal device such as a personal computer. However, the program product of the present disclosure is not limited thereto; in this document, a readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations for the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device over any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., over the internet using an internet service provider).
It should be noted that although several modules or sub-modules of the system are mentioned in the above detailed description, such partitioning is merely exemplary and not mandatory. Indeed, the features and functions of two or more of the modules described above may be embodied in one module, in accordance with embodiments of the present disclosure. Conversely, the features and functions of one module described above may be further divided among, and embodied by, a plurality of modules.
Further, while operations of the modules of the disclosed system are depicted in the drawings in a particular order, this does not require or imply that these operations must be performed in this particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Additionally or alternatively, certain operations may be omitted, operations combined into one operation execution, and/or operations broken down into multiple operation executions.
As will be appreciated by one skilled in the art, embodiments of the present disclosure may be provided as a method, system, or computer program product. Accordingly, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present disclosure may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present disclosure is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the disclosure. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.