
CN113903389A - Slow disk detection method and device and computer readable and writable storage medium - Google Patents


Publication number
CN113903389A
Authority
CN
China
Prior art keywords
preset interval
slow
timeout
hard disk
disk
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111143369.4A
Other languages
Chinese (zh)
Inventor
夏天鹏
王志豪
周明伟
江文龙
罗心
李文俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Dahua Technology Co Ltd
Original Assignee
Zhejiang Dahua Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Dahua Technology Co Ltd
Priority to CN202111143369.4A
Publication of CN113903389A
Legal status: Pending

Classifications

    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11C STATIC STORES
    • G11C29/00 Checking stores for correct operation; Subsequent repair; Testing stores during standby or offline operation
    • G11C29/04 Detection or location of defective memory elements, e.g. cell construction details, timing of test signals
    • G11C29/08 Functional testing, e.g. testing during refresh, power-on self testing [POST] or distributed testing
    • G11C29/12 Built-in arrangements for testing, e.g. built-in self testing [BIST] or interconnection details
    • G11C29/1201 Built-in arrangements for testing, e.g. built-in self testing [BIST] or interconnection details comprising I/O circuitry
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11C STATIC STORES
    • G11C29/00 Checking stores for correct operation; Subsequent repair; Testing stores during standby or offline operation
    • G11C29/04 Detection or location of defective memory elements, e.g. cell construction details, timing of test signals
    • G11C29/08 Functional testing, e.g. testing during refresh, power-on self testing [POST] or distributed testing
    • G11C29/10 Test algorithms, e.g. memory scan [MScan] algorithms; Test patterns, e.g. checkerboard patterns
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11C STATIC STORES
    • G11C29/00 Checking stores for correct operation; Subsequent repair; Testing stores during standby or offline operation
    • G11C29/04 Detection or location of defective memory elements, e.g. cell construction details, timing of test signals
    • G11C29/08 Functional testing, e.g. testing during refresh, power-on self testing [POST] or distributed testing
    • G11C29/12 Built-in arrangements for testing, e.g. built-in self testing [BIST] or interconnection details
    • G11C29/12015 Built-in arrangements for testing, e.g. built-in self testing [BIST] or interconnection details comprising clock generation or timing circuitry
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11C STATIC STORES
    • G11C29/00 Checking stores for correct operation; Subsequent repair; Testing stores during standby or offline operation
    • G11C29/04 Detection or location of defective memory elements, e.g. cell construction details, timing of test signals
    • G11C29/50 Marginal testing, e.g. race, voltage or current testing
    • G11C29/50012 Marginal testing, e.g. race, voltage or current testing of timing

Landscapes

  • Debugging And Monitoring (AREA)

Abstract

The present disclosure relates to a slow disk detection method and device and a computer readable and writable storage medium. The method includes: using a second preset interval as a sliding window and sliding it with a first preset interval as the sliding step; when a slow disk detection condition is met, obtaining the slow disk detection parameters corresponding to each first preset interval and to the second preset interval in the current sliding window, where the slow disk detection parameters include the service duration, the I/O count, and the I/O timeout count of the hard disk; determining a first I/O timeout probability of the first preset interval according to the slow disk detection parameters of the first preset interval; determining a second I/O timeout probability of the second preset interval according to the slow disk detection parameters of the second preset interval; and determining that the hard disk is a slow disk when the first I/O timeout probability or the second I/O timeout probability exceeds the corresponding I/O timeout threshold. The present disclosure reduces the misjudgment rate of slow disk detection and improves its accuracy.

Description

Slow disk detection method and device and computer readable and writable storage medium
Technical Field
The present disclosure relates to the field of data storage technologies, and in particular to a slow disk detection method and device and a computer readable and writable storage medium.
Background
In a distributed storage cluster system, massive amounts of service data are stored on hard disks, and the quality of the hard disks directly affects the stability of the cluster. In practice, abnormal hard disk states fall into two main types: "bad disks" and "slow disks". A bad disk cannot be read or written normally at all; the symptom is obvious and easy to find. A slow disk exhibits phenomena such as bad sectors, stalls, and jitter, and causes irregular I/O (Input/Output) timeouts during I/O interaction. These symptoms are not obvious and do not cause serious harm immediately, but they gradually degrade the service performance of the distributed storage cluster. A slow disk detection scheme is therefore needed so that the distributed storage cluster system can plan for such disks and ensure the reliability of the system.
In the prior art, no dedicated detection and processing method is provided for scenarios of high and low service pressure, which increases the misjudgment rate of slow disk detection. For example, one method detects slow disks along two dimensions, the number of I/O request timeouts and the average service time; but when service pressure is high and the hard disk is busy during a preset period, I/O timeouts occur and the target hard disk is misjudged as a slow disk. How to overcome these defects of the prior art is therefore a problem to be solved urgently.
Disclosure of Invention
The present disclosure provides a slow disk detection method and device and a computer readable and writable storage medium, which screen out slow disks from two time dimensions and, in particular, provide a dedicated detection and processing method for scenarios of high and low service pressure, thereby reducing the misjudgment rate of slow disk detection and improving its accuracy.
According to a first aspect of embodiments of the present disclosure, there is provided a slow disk detection method, including:
using a second preset interval as a sliding window, and sliding the window with a first preset interval as the sliding step;
when a slow disk detection condition is met, obtaining the slow disk detection parameters corresponding to each first preset interval and to the second preset interval in the current sliding window, where the slow disk detection parameters include the service duration, the I/O count, and the I/O timeout count of a hard disk;
determining a first I/O timeout probability of the first preset interval according to the slow disk detection parameters of the first preset interval;
determining a second I/O timeout probability of the second preset interval according to the slow disk detection parameters of the second preset interval;
and determining that the hard disk is a slow disk when the first I/O timeout probability or the second I/O timeout probability exceeds the corresponding I/O timeout threshold.
In a possible implementation, determining the second I/O timeout probability of the second preset interval according to the I/O count and the I/O timeout count collected in the second preset interval includes:
when the first I/O timeout probability does not exceed the corresponding I/O timeout threshold, determining the second I/O timeout probability of the second preset interval according to the I/O count and the I/O timeout count collected in the second preset interval.
In a possible implementation, the method further includes:
if the first I/O timeout probability and the second I/O timeout probability are both below the corresponding I/O timeout thresholds, and I/O timeouts occur in several consecutive first preset intervals within the current sliding window, determining the number of those consecutive first preset intervals;
and if the number of consecutive first preset intervals is greater than a set number of intervals, determining that the hard disk is a slow disk.
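The consecutive-interval rule above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names and the list-of-counts input are assumptions, and the check is meant to run only after both timeout probabilities have been found to be below their thresholds.

```python
from typing import Sequence


def longest_timeout_run(timeouts_per_interval: Sequence[int]) -> int:
    """Length of the longest run of consecutive first preset intervals
    in which at least one I/O timeout occurred."""
    longest = current = 0
    for count in timeouts_per_interval:
        # Extend the run if this interval saw a timeout, otherwise reset it.
        current = current + 1 if count > 0 else 0
        longest = max(longest, current)
    return longest


def is_slow_by_persistence(timeouts_per_interval: Sequence[int],
                           set_interval_count: int) -> bool:
    """Hypothetical helper: flag the disk when timeouts persist across more
    consecutive first preset intervals than the configured number."""
    return longest_timeout_run(timeouts_per_interval) > set_interval_count
```

With a 7-interval window and a set number of 3, a disk that times out on four consecutive days is flagged even if each day's timeout probability is small, which is the low-service-pressure case this rule targets.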
In a possible implementation, obtaining the first I/O timeout count or the second I/O timeout count includes:
when the frequency at which I/O timeouts occur exceeds a set frequency threshold, merging the I/O timeouts collected within a third preset interval and counting them as one timeout, where the third preset interval is shorter than the first preset interval.
In a possible implementation, determining the first I/O timeout probability of the first preset interval according to the slow disk detection parameters of the first preset interval includes:
calculating the average I/O timeout frequency of the first preset interval as the ratio of the I/O timeout count of the first preset interval to the service duration of the hard disk in the first preset interval;
calculating the average I/O frequency of the first preset interval as the ratio of the I/O count of the first preset interval to the service duration of the hard disk in the first preset interval;
and calculating the first I/O timeout probability of the first preset interval as the ratio of the average I/O timeout frequency of the first preset interval to the average I/O frequency of the first preset interval.
In a possible implementation, determining the second I/O timeout probability of the second preset interval according to the slow disk detection parameters of the second preset interval includes:
calculating the average I/O timeout frequency of the second preset interval as the ratio of the I/O timeout count of the second preset interval to the service duration of the hard disk in the second preset interval;
calculating the average I/O frequency of the second preset interval as the ratio of the I/O count of the second preset interval to the service duration of the hard disk in the second preset interval;
and calculating the second I/O timeout probability of the second preset interval as the ratio of the average I/O timeout frequency of the second preset interval to the average I/O frequency of the second preset interval.
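The three ratios above can be written compactly. The symbols are assumptions introduced here for illustration: for a given preset interval, let $T$ be the I/O timeout count, $N$ the I/O count, and $S$ the service duration of the hard disk, with $S > 0$ and $N > 0$.

```latex
f_{\text{timeout}} = \frac{T}{S}, \qquad
f_{\text{io}} = \frac{N}{S}, \qquad
P = \frac{f_{\text{timeout}}}{f_{\text{io}}} = \frac{T/S}{N/S} = \frac{T}{N}
```

As a worked example with made-up numbers, $T = 30$, $N = 10000$, and $S = 3600\ \text{s}$ give $f_{\text{timeout}} \approx 0.00833\ \text{s}^{-1}$, $f_{\text{io}} \approx 2.778\ \text{s}^{-1}$, and $P = 0.003$. The same formula applies to both the first and the second preset interval; only the interval over which $T$, $N$, and $S$ are accumulated differs.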
In a possible implementation, obtaining, when the slow disk detection condition is met, the slow disk detection parameters corresponding to each first preset interval and to the second preset interval in the current sliding window, where the slow disk detection parameters include the service duration, the I/O count, and the I/O timeout count of the hard disk, includes:
obtaining the mapping table established by the data node, which records the hard disk slot number, hard disk mark, service duration, I/O count, and I/O timeout count of each hard disk;
and obtaining the slow disk detection parameters corresponding to each first preset interval and to the second preset interval in the current sliding window according to the service duration, I/O count, and I/O timeout count of the hard disk in the mapping table.
In a possible implementation, when the hard disk is determined to be a slow disk, the method further includes:
marking the hard disk as a slow disk;
isolating the hard disk, reporting it to the metadata node, and stopping the writing of service data to it;
and migrating the data on the hard disk, through load balancing, to other hard disks of the same data node or to hard disks of other data nodes.
According to a second aspect of embodiments of the present disclosure, there is provided a slow disk detection device, the device including:
a second preset interval sliding module, configured to use the second preset interval as a sliding window and slide it with the first preset interval as the sliding step;
a slow disk detection parameter obtaining module, configured to obtain, when a slow disk detection condition is met, the slow disk detection parameters corresponding to each first preset interval and to the second preset interval in the current sliding window, where the slow disk detection parameters include the service duration, the I/O count, and the I/O timeout count of a hard disk;
a first I/O timeout probability determining module, configured to determine a first I/O timeout probability of the first preset interval according to the slow disk detection parameters of the first preset interval;
a second I/O timeout probability determining module, configured to determine a second I/O timeout probability of the second preset interval according to the slow disk detection parameters of the second preset interval;
and a slow disk determining module, configured to determine that the hard disk is a slow disk when the first I/O timeout probability or the second I/O timeout probability exceeds the corresponding I/O timeout threshold.
According to a third aspect of the embodiments of the present disclosure, there is provided an electronic device including: a processor; and a memory for storing processor-executable instructions; where the processor implements the steps of the above slow disk detection method by executing the executable instructions.
According to a fourth aspect of the embodiments of the present disclosure, there is provided a computer readable and writable storage medium having stored thereon computer instructions which, when executed by a processor, implement the steps of the above slow disk detection method.
In addition, for the technical effects of any implementation of the second to fourth aspects, reference may be made to the technical effects of the corresponding implementations of the first aspect; details are not repeated here.
The technical solutions provided by the embodiments of the present disclosure bring at least the following beneficial effects:
Two preset intervals are set so that each hard disk is examined from the local level to the overall level, which broadens the detection dimensions and copes effectively with different scenarios. At the same time, hard disks are evaluated independently of one another, which ensures the uniqueness of each detection result and keeps the detection scheme loosely coupled.
Because data nodes support hot plugging, the service durations of hard disks and the numbers of I/O requests they receive can differ by orders of magnitude; measuring every hard disk by its I/O timeout probability per unit time puts all disks on a common basis and improves detection accuracy.
Drawings
To illustrate the technical solutions in the embodiments of the present disclosure more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present disclosure, and those skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a schematic diagram illustrating an application scenario in accordance with an exemplary embodiment;
FIG. 2 is a flow diagram illustrating a slow disk detection method in accordance with an exemplary embodiment;
FIG. 3 is a diagram illustrating a sliding window in a slow disk detection method in accordance with an exemplary embodiment;
FIG. 4 is a diagram illustrating the sliding window after it has slid in a slow disk detection method in accordance with an exemplary embodiment;
FIG. 5 is a general flow diagram illustrating a slow disk detection method in accordance with an exemplary embodiment;
FIG. 6 is a diagram illustrating how a slow disk detection method determines a slow disk in a scenario of low service pressure according to an exemplary embodiment;
FIG. 7 is a flow chart illustrating a slow disk detection method according to an exemplary embodiment;
FIG. 8 is a block diagram illustrating a slow disk detection device according to an exemplary embodiment;
FIG. 9 is a schematic diagram of an electronic device for a slow disk detection method in accordance with an exemplary embodiment;
FIG. 10 is a program product diagram for a slow disk detection method according to an exemplary embodiment.
Detailed Description
To make the objects, technical solutions and advantages of the present disclosure clearer, the present disclosure will be described in further detail with reference to the accompanying drawings, and it is apparent that the described embodiments are only a part of the embodiments of the present disclosure, rather than all embodiments. All other embodiments, which can be derived by one of ordinary skill in the art from the embodiments disclosed herein without making any creative effort, shall fall within the scope of protection of the present disclosure.
Some of the words that appear in the text are explained below:
1. The term "and/or" in the embodiments of the present disclosure describes an association between associated objects and indicates that three relationships are possible. For example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. The character "/" generally indicates that the objects before and after it are in an "or" relationship.
2. The terms "first," "second," and the like in the description and in the claims of the present disclosure and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein.
The application scenario described in the embodiments of the present disclosure is intended to illustrate the technical solutions of the embodiments more clearly and does not limit them. As a person of ordinary skill in the art will appreciate, the technical solutions provided in the embodiments of the present disclosure are equally applicable to similar technical problems as new application scenarios emerge. In the description of the present disclosure, the term "plurality" means two or more unless otherwise specified.
In the prior art, no dedicated detection and processing method is provided for scenarios of high and low service pressure, which increases the misjudgment rate of slow disk detection.
To solve the above problem, the present disclosure provides a slow disk detection method and device and a computer readable and writable storage medium, which compute the I/O timeout probability of each hard disk from two time dimensions to screen out slow disks and, in particular, provide a dedicated detection and processing method for scenarios of high and low service pressure, thereby reducing the misjudgment rate of slow disk detection and improving its accuracy.
Reference is first made to FIG. 1, which is a schematic diagram of an application scenario of an embodiment of the present disclosure. A user 10 logs in to a metadata server 12 of the distributed storage cluster system through a client installed on a user device 11. The client may be a web browser or an application client installed on a mobile user device such as a mobile phone or a tablet computer.
The user device 11 and the metadata server 12 are communicatively connected via a network, which may be a local area network, a wide area network, or the like. The user device 11 may be a portable device (e.g., a mobile phone, a tablet, or a notebook) or a personal computer (PC), and the metadata server 12 may be any device capable of providing internet services.
In one possible form of communication between the user device 11 and the metadata server 12, the user logs in to the corresponding slow disk detection platform and sends a slow disk detection instruction of the user 10 to the metadata server 12 through the communication network, and the data node 13 in the metadata server 12 determines whether each hard disk belonging to the data node 13 is a slow disk.
In the embodiment of the present disclosure, the data node 13 uses the second preset interval as a sliding window and slides it with the first preset interval as the sliding step; when the slow disk detection condition is met, it obtains the slow disk detection parameters corresponding to each first preset interval and to the second preset interval in the current sliding window, where the slow disk detection parameters include the service duration, the I/O count, and the I/O timeout count of a hard disk; it determines a first I/O timeout probability of the first preset interval according to the slow disk detection parameters of the first preset interval; it determines a second I/O timeout probability of the second preset interval according to the slow disk detection parameters of the second preset interval; and it determines that the hard disk is a slow disk when the first I/O timeout probability or the second I/O timeout probability exceeds the corresponding I/O timeout threshold.
The embodiments of the present disclosure provide a slow disk detection method and, based on the same concept, further provide a slow disk detection device, an electronic device, and a computer readable and writable storage medium.
Example 1
The slow disk detection method provided by the present disclosure is described below through a specific embodiment. The method is applied to a data node in a distributed storage cluster system and, as shown in FIG. 2, includes:
Step 201, using a second preset interval as a sliding window, and sliding the window with a first preset interval as the sliding step;
Specifically, the second preset interval is n times the first preset interval. The first preset interval, the second preset interval, and the multiple n between them can be set according to actual conditions, or set directly through the configuration file of the data node. The first preset interval is denoted $I_1$ and the second preset interval is denoted $I_2$, where $I_2 = \{I_{11}, I_{12}, I_{13}, \ldots, I_{1n}\}$. As shown in FIG. 3, the first preset interval is 1 day and the second preset interval is 7 days. The second preset interval serves as the sliding window and the first preset interval as the sliding step; as shown in FIG. 4, with a sliding step of 1 day, the sliding window moves from its original position [0, 7] to position [1, 8]. Setting two preset intervals lets each hard disk be examined from the local level to the overall level, which broadens the detection dimensions and copes effectively with different scenarios. At the same time, hard disks are evaluated independently of one another, which ensures the uniqueness of each detection result and keeps the detection scheme loosely coupled.
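The window movement from [0, 7] to [1, 8] can be sketched with a bounded queue. This is only an illustration under the 1-day and 7-day values given in the text; the function names and the representation of each first preset interval as a `(start_day, end_day)` tuple are assumptions.

```python
from collections import deque

# From the example in the text: first preset interval = 1 day, n = 7.
N = 7


def new_window(n):
    """Return a window holding the n most recent first preset intervals,
    each represented as a (start_day, end_day) tuple."""
    return deque(((day, day + 1) for day in range(n)), maxlen=n)


def slide(window, interval):
    """Advance the window by one sliding step; a deque created with
    maxlen=n evicts its oldest interval automatically."""
    window.append(interval)


def span(window):
    """Return the (start, end) position currently covered by the window."""
    return (window[0][0], window[-1][1])


window = new_window(N)
print(span(window))    # (0, 7)
slide(window, (7, 8))
print(span(window))    # (1, 8)
```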
Step 202, when a slow disk detection condition is met, obtaining the slow disk detection parameters corresponding to each first preset interval and to the second preset interval in the current sliding window, where the slow disk detection parameters include the service duration, the I/O count, and the I/O timeout count of a hard disk;
Specifically, the service duration of the hard disk is the sum of the response durations of the I/O requests the hard disk processes within the first or second preset interval; the I/O timeout count is the total number of I/O requests whose responses time out within the first or second preset interval; and the I/O count is the total number of I/O requests the hard disk processes within the first or second preset interval, including those that time out.
Step 203, determining a first I/O timeout probability of the first preset interval according to the slow disk detection parameters of the first preset interval;
Specifically, the first I/O timeout probability represents the probability that an I/O timeout occurs per unit time within the first preset interval, and is determined from the service duration, the I/O count, and the I/O timeout count of the hard disk in the first preset interval.
Step 204, determining a second I/O timeout probability of the second preset interval according to the slow disk detection parameters of the second preset interval;
Specifically, the second I/O timeout probability represents the probability that an I/O timeout occurs per unit time within the second preset interval, and is determined from the service duration, the I/O count, and the I/O timeout count of the hard disk in the second preset interval. Because data nodes support hot plugging, the service durations of hard disks and the numbers of I/O requests they receive can differ by orders of magnitude; measuring every hard disk by its I/O timeout probability per unit time puts all disks on a common basis and improves detection accuracy.
Step 205, determining that the hard disk is a slow disk when the first I/O timeout probability or the second I/O timeout probability exceeds the corresponding I/O timeout threshold.
Specifically, if the first I/O timeout probability exceeds the corresponding I/O timeout threshold, the hard disk is determined to be a slow disk; otherwise, the second I/O timeout probability of the second preset interval is determined, and if it exceeds the corresponding I/O timeout threshold, the hard disk is determined to be a slow disk.
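Steps 203 to 205 can be sketched as below. This is a minimal sketch under stated assumptions, not the patent's code: `IntervalStats`, `timeout_probability`, and `is_slow_disk` are hypothetical names, the threshold values are supplied by the caller, and returning 0.0 for an interval with no service time or no I/O is an assumption the text does not address.

```python
from dataclasses import dataclass


@dataclass
class IntervalStats:
    """Slow disk detection parameters accumulated over one preset interval."""
    service_seconds: float  # sum of response durations of processed I/O
    io_count: int           # total I/O requests, including timed-out ones
    timeout_count: int      # I/O requests whose response timed out


def timeout_probability(stats: IntervalStats) -> float:
    # Assumption: an idle interval is treated as having probability 0.0.
    if stats.service_seconds <= 0 or stats.io_count <= 0:
        return 0.0
    timeout_rate = stats.timeout_count / stats.service_seconds  # avg timeout frequency
    io_rate = stats.io_count / stats.service_seconds            # avg I/O frequency
    return timeout_rate / io_rate


def is_slow_disk(first: IntervalStats, second: IntervalStats,
                 first_threshold: float, second_threshold: float) -> bool:
    """Check the first preset interval first; evaluate the second preset
    interval only when the first does not exceed its threshold."""
    if timeout_probability(first) > first_threshold:
        return True
    return timeout_probability(second) > second_threshold
```

Checking the short interval first means a burst of timeouts is caught quickly, while the longer window catches a lower but sustained timeout rate that no single short interval reveals.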
The slow disk detection method provided by the embodiments of the present disclosure computes the I/O timeout probability of each hard disk from two time dimensions, the first preset interval and the second preset interval, and screens out slow disks by comparing each probability with the threshold for its interval, which reduces the misjudgment rate of slow disk detection and improves its accuracy.
The slow disk detection condition can be set as required: detection can be triggered periodically, at a given point in time, or according to data storage conditions. As an optional implementation, whether the detection condition is met can be determined from a configuration item in the configuration file. The configuration item indicates whether the slow disk detection switch is on; when the configuration item shows that the switch is on, the slow disk detection condition is determined to be met.
The configuration file may further include the first preset interval, the second preset interval, the corresponding thresholds, and other related configuration items.
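As an illustration only, a configuration of this kind might look like the JSON below. The patent does not specify a file format or key names; every key and value here is hypothetical.

```python
import json

# Hypothetical configuration; key names and values are illustrative only.
EXAMPLE_CONFIG = """
{
  "slow_disk_detection_enabled": true,
  "first_interval_seconds": 86400,
  "window_multiple": 7,
  "first_timeout_threshold": 0.01,
  "second_timeout_threshold": 0.005
}
"""


def load_config(text):
    """Parse the configuration and derive the second preset interval,
    which is n times the first preset interval."""
    cfg = json.loads(text)
    cfg["second_interval_seconds"] = (
        cfg["first_interval_seconds"] * cfg["window_multiple"]
    )
    return cfg


def detection_condition_met(cfg):
    """The slow disk detection condition is met when the switch is on."""
    return bool(cfg.get("slow_disk_detection_enabled", False))
```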
As an optional implementation, when the method is applied to data nodes, each data node may establish its own mapping table in memory. The table holds, for each of the data node's hard disks, the data used to determine the slow disk detection parameters. Specifically, the content of the mapping table may include the hard disk slot number, the hard disk mark, the service duration, the I/O count, and the I/O timeout count, and the slow disk detection parameters can be determined from this in-memory table. The hard disk slot number is the number of the slot that the hard disk occupies in the server. The hard disk mark identifies normal disks, slow disks, bad disks, and unknown disks; the embodiments of the present disclosure concern only normal disks and slow disks. Whether a hard disk is a slow disk is determined from its hard disk mark. The service durations of the hard disk in the first and second preset intervals are determined by reading its service duration; its I/O counts in the first and second preset intervals are determined by reading its I/O count; and its I/O timeout counts in the first and second preset intervals are determined by reading its I/O timeout count.
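One way to picture such an in-memory mapping table is a dictionary keyed by slot number. The sketch below is an assumption-laden illustration: the class and function names are invented here, and the per-request timeout limit of 1.0 second is a placeholder, since the text does not say how a single I/O is judged to have timed out.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict


class DiskMark(Enum):
    NORMAL = "normal"
    SLOW = "slow"
    BAD = "bad"
    UNKNOWN = "unknown"


@dataclass
class DiskRecord:
    """One row of the mapping table for a single hard disk."""
    slot: int
    mark: DiskMark = DiskMark.NORMAL
    service_seconds: float = 0.0
    io_count: int = 0
    timeout_count: int = 0

    def record_io(self, response_seconds: float, timeout_seconds: float) -> None:
        # Every processed request adds to the service duration and I/O count;
        # a request slower than the limit also counts as a timeout.
        self.service_seconds += response_seconds
        self.io_count += 1
        if response_seconds > timeout_seconds:
            self.timeout_count += 1


# The table itself: hard disk slot number -> record.
mapping_table: Dict[int, DiskRecord] = {}


def record(slot: int, response_seconds: float,
           timeout_seconds: float = 1.0) -> DiskRecord:
    """Update the row for `slot`, creating it on first use.
    The 1.0 s default limit is a hypothetical placeholder."""
    disk = mapping_table.setdefault(slot, DiskRecord(slot))
    disk.record_io(response_seconds, timeout_seconds)
    return disk
```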
As an optional implementation, when the hard disk is determined to be a slow disk, the method further includes:
marking the hard disk as a slow disk;
isolating the hard disk, reporting it to the metadata node, and stopping the writing of service data to it;
and migrating the data on the hard disk, through load balancing, to other hard disks of the same data node or to hard disks of other data nodes.
Specifically, this process is prior art; load balancing can currently be implemented by a variety of methods, such as dynamic load balancing, which are not described again here.
FIG. 5 is a flowchart illustrating an overview of a slow disk detection method according to an exemplary embodiment, applied to a data node in a distributed storage cluster, as shown in FIG. 5, including:
step 501, reading a configuration file by each data node in a distributed storage cluster system;
specifically, when the data node is started, the configuration file is read, and whether the slow disk detection switch is turned on or not is determined according to the configuration items in the configuration file, so that the data node can detect the hard disk in real time.
Step 502, judging whether to start slow disc detection according to configuration items in the configuration file, and if not, directly ending the slow disc detection;
step 503, according to the configuration items in the configuration file, if the slow disc detection is started, setting each preset interval, each preset interval threshold value and other related configuration items according to the configuration file;
specifically, a first preset interval and a corresponding threshold, a second preset interval and a corresponding threshold, and the like need to be set.
Step 504, each data node establishes a mapping relation table in a memory, and dynamically updates the content of the mapping relation table along with the input and output of the hard disk service data;
step 505, each data node performs periodic I/O performance statistics on the corresponding hard disk according to the mapping relation table;
specifically, statistics may be performed each time the sliding window slides, that is, the slow disk detection parameters of the current sliding window and of each first preset interval in the sliding window are determined according to the content of the mapping relation table.
It should be noted that, each time the sliding window slides, only one first preset interval is updated, and slow disk determination according to the slow disk detection parameters has already been performed on the part of the second preset interval that is not updated, so only the slow disk detection parameters of the updated first preset interval need to be counted.
Step 506, detecting whether the hard disk is a slow disk according to the counted slow disk detection parameters in the first preset interval, if not, executing step 507, otherwise, executing step 508;
step 507, detecting whether the hard disk is a slow disk according to the counted slow disk detection parameters in the second preset interval, if not, executing step 505, otherwise, executing step 508;
specifically, at the initial stage of powering on the data node, if the duration of the slow disk detection statistical data does not reach the second preset interval, the second I/O timeout probability of the second preset interval is not calculated, and only the first I/O timeout probability of the first preset interval is calculated. Because the hard disk on the data node is new hardware in the early stage of power-on, the slow disk condition can be basically ignored. And when the duration of the statistical data reaches a second preset interval, calculating a second I/O timeout probability of the second preset interval, sliding according to the length of the first preset interval, and clearing the data which are not in the second preset interval.
Step 508, the data node marks the hard disk as a slow disk, and dynamically updates a mapping relation table;
specifically, the data node only needs to modify the mapping relation table in memory, which puts no pressure on the system, memory or network bandwidth of the data node, so that the availability of the storage cluster is improved.
Step 509, isolating the hard disk and reporting the hard disk to a metadata node, and stopping writing in service data;
step 510, transferring the data on the hard disk to other hard disks of the data node or hard disks of other data nodes through load balancing.
Specifically, steps 508 to 510 ensure the availability, reliability and access efficiency of the distributed storage cluster.
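The sliding-window statistics of steps 505 to 507, including the start-up phase in which the second I/O timeout probability is not yet calculated, might be sketched as below. The class `SlidingWindow` and its method names are hypothetical, each bucket holds the counts of one first preset interval, and returning 0.0 for an interval with no I/O is an assumption of this sketch. The probability is taken as timeouts divided by I/Os, which equals the ratio of the two average frequencies when both are measured over the same service time.

```python
from collections import deque
from typing import Deque, Optional, Tuple


class SlidingWindow:
    """Second preset interval made up of `n_steps` first preset intervals."""

    def __init__(self, n_steps: int) -> None:
        self.n_steps = n_steps
        # Each bucket is (I/O times, I/O timeout times) of one first preset interval.
        self.buckets: Deque[Tuple[int, int]] = deque(maxlen=n_steps)

    def slide(self, io_count: int, timeout_count: int) -> None:
        """Advance by one first preset interval.

        Appending to a full deque drops the oldest bucket, i.e. data that
        no longer lies inside the second preset interval is cleared.
        """
        self.buckets.append((io_count, timeout_count))

    def first_probability(self) -> Optional[float]:
        """I/O timeout probability of the newest first preset interval."""
        if not self.buckets:
            return None
        ios, timeouts = self.buckets[-1]
        return timeouts / ios if ios else 0.0

    def second_probability(self) -> Optional[float]:
        """I/O timeout probability of the whole window.

        Returns None while the collected statistics are shorter than the
        second preset interval (start-up phase).
        """
        if len(self.buckets) < self.n_steps:
            return None
        ios = sum(b[0] for b in self.buckets)
        timeouts = sum(b[1] for b in self.buckets)
        return timeouts / ios if ios else 0.0
```

Only the newest bucket changes on each slide, which matches the note that only the updated first preset interval needs to be counted again.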
As an optional implementation manner, the first I/O timeout probability and the second I/O timeout probability in this embodiment are calculated as follows:
1) First I/O timeout probability
Calculating the average I/O timeout frequency of the first preset interval according to the ratio of the I/O timeout times of the first preset interval to the service time of the hard disk in the first preset interval;
calculating the average I/O frequency of the first preset interval according to the ratio of the I/O times of the first preset interval to the service time of the hard disk in the first preset interval;
and calculating the first I/O timeout probability of the first preset interval according to the ratio of the average I/O timeout frequency of the first preset interval to the average I/O frequency of the first preset interval.
2) Second I/O timeout probability
Calculating the average I/O timeout frequency of the second preset interval according to the ratio of the I/O timeout times of the second preset interval to the service time of the hard disk in the second preset interval;
calculating the average I/O frequency of the second preset interval according to the ratio of the I/O times of the second preset interval to the service time of the hard disk in the second preset interval;
and calculating the second I/O timeout probability of the second preset interval according to the ratio of the average I/O timeout frequency of the second preset interval to the average I/O frequency of the second preset interval.
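The three ratios above can be written compactly. The symbols below are notation introduced here for illustration and do not appear in the original text: for the k-th preset interval (k = 1 for the first, k = 2 for the second), T_k is the number of I/O timeouts, C_k the number of I/Os, and D_k the service time of the hard disk.

```latex
\begin{aligned}
\bar f^{\,\mathrm{to}}_k &= \frac{T_k}{D_k}, \qquad
\bar f^{\,\mathrm{io}}_k = \frac{C_k}{D_k}, \\
P_k &= \frac{\bar f^{\,\mathrm{to}}_k}{\bar f^{\,\mathrm{io}}_k}
     = \frac{T_k / D_k}{C_k / D_k}
     = \frac{T_k}{C_k},
\qquad k \in \{1, 2\},\; D_k > 0,\; C_k > 0 .
\end{aligned}
```

Because the same service time appears in both frequencies, it cancels, and each probability reduces to the fraction of I/Os in the interval that timed out.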
A slow disk can be detected by the above method in a normal service environment, but actual service scenarios also include scenarios with higher service pressure and scenarios with lower service pressure. In a scenario with low service pressure, the traffic volume is less than the normal traffic volume, so a slow disk may fail to reach the above thresholds, which affects the reliability of the distributed cluster.
As an optional implementation, the method further comprises:
if the first I/O timeout probability and the second I/O timeout probability are both smaller than the corresponding I/O timeout threshold and I/O timeout occurs in a plurality of continuous first preset intervals in the current sliding window, determining the number of the continuous first preset intervals;
and if the number of the continuous first preset intervals is larger than the number of the set intervals, determining that the hard disk is a slow disk.
Specifically, when the traffic volume of the scenario is smaller than the normal traffic volume, both the first I/O timeout probability of the first preset interval and the second I/O timeout probability of the second preset interval are smaller than the corresponding I/O timeout thresholds. However, if I/O timeouts start to occur in the i-th first preset interval I_{1i} of the second preset interval and continue until the j-th first preset interval I_{1j} of the second preset interval, and the number of first preset intervals in which I/O timeouts occur is greater than or equal to the set number of intervals, that is, j − i + 1 ≥ n, the hard disk is judged to be a slow disk. Here 1 ≤ i ≤ N, N is the multiple of the second preset interval relative to the first preset interval, n is the set number of intervals, and the size of n can be set according to the actual situation.
As shown in FIG. 6, in the configuration file of the data node, the present disclosure sets the threshold n to the value given by formula image BDA0003284824060000121 (not reproduced in this text) and sets N to 7. I/O timeouts occur continuously from the 2nd first preset interval to the 6th first preset interval, so i = 2 and j = 6, and the hard disk is judged to be a slow disk because j − i + 1 = 5 is not less than n.
According to the characteristics of slow disks, a scenario with low service pressure is considered, in which there may be a hard disk that reaches neither the threshold of the first preset interval nor the threshold of the second preset interval but continuously experiences I/O timeouts in a plurality of first preset intervals. Slow disks are effectively screened out according to this phenomenon and data migration is carried out in time, which improves the reliability of the distributed cluster.
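The consecutive-interval check for the low-pressure scenario can be sketched as follows. The function names are hypothetical, the input is assumed to be the I/O timeout times of each first preset interval in the current sliding window (oldest first), and the comparison uses "greater than or equal to" as in the detailed description above, whereas the optional implementation is worded as "larger than". The set number of intervals used in the example (5) is a made-up value.

```python
from typing import Sequence


def longest_timeout_run(timeouts_per_interval: Sequence[int]) -> int:
    """Length of the longest run of consecutive first preset intervals
    in which at least one I/O timeout occurred (j - i + 1 in the text)."""
    longest = current = 0
    for count in timeouts_per_interval:
        current = current + 1 if count > 0 else 0
        longest = max(longest, current)
    return longest


def slow_by_consecutive_timeouts(
    timeouts_per_interval: Sequence[int], set_intervals: int
) -> bool:
    """True when the run length reaches the set number of intervals."""
    return longest_timeout_run(timeouts_per_interval) >= set_intervals


# Window of N = 7 first preset intervals; timeouts occur in intervals 2 to 6.
example = [0, 1, 2, 1, 1, 3, 0]
run_length = longest_timeout_run(example)
flagged = slow_by_consecutive_timeouts(example, set_intervals=5)
```

A single interval without timeouts resets the run, so isolated timeouts spread across the window do not trigger this rule.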
In a scenario with high service pressure, the high pressure can cause the hard disk to jitter and produce high-frequency I/O timeouts, so that the target hard disk is easily misjudged as a slow disk.
As an optional implementation manner, the obtaining the first I/O timeout number includes:
merging the first I/O timeout times acquired within a third preset interval into one when the frequency of occurrence of I/O timeouts exceeds a set frequency threshold, wherein the length of the third preset interval is smaller than that of the first preset interval.
Specifically, in order to improve the accuracy of slow disc detection, when it is detected that the frequency of occurrence of I/O timeout exceeds a set frequency threshold, the data node needs to set a third preset interval, merge the first I/O timeout times acquired within the third preset interval into one time, normally record the I/O timeout times occurring outside the third preset interval, and the length of the third preset interval is smaller than the first preset interval.
For example, the third preset interval is set to 30 seconds (s). An I/O timeout occurs at time t and the data node records it. If further I/O timeouts occur in the interval [t, t+30], the data node does not record them again, that is, only one I/O timeout is recorded in the interval [t, t+30]; I/O timeouts that occur outside the interval [t, t+30] are recorded normally.
In a scenario with high service pressure, the I/O timeouts that occur frequently within the third preset interval are merged and counted only once, which reduces the probability of misjudging a hard disk as a slow disk because of high service pressure and improves the accuracy of slow disk detection.
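The merging rule from the 30-second example can be sketched as below. The function name `merge_timeouts` is hypothetical and the timestamps are made-up values in seconds. The sketch merges unconditionally; in the described method the caller would apply it only when the frequency of I/O timeouts exceeds the set frequency threshold.

```python
from typing import Iterable, List


def merge_timeouts(
    timestamps: Iterable[float], merge_window: float = 30.0
) -> List[float]:
    """Return the I/O timeouts that are actually recorded.

    After a timeout is recorded at time t, further timeouts inside the
    closed interval [t, t + merge_window] are not recorded again.
    """
    recorded: List[float] = []
    for t in sorted(timestamps):
        if not recorded or t > recorded[-1] + merge_window:
            recorded.append(t)
    return recorded


# Eight raw timeouts collapse to four recorded ones.
raw = [0, 5, 29, 30, 31, 70, 95, 101]
kept = merge_timeouts(raw)
```

The length of the returned list is the I/O timeout count that enters the probability calculation for the interval.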
Fig. 7 is a flowchart illustrating a slow disc detection method according to an exemplary embodiment, as shown in fig. 7, including:
step 701, acquiring a slow disc detection parameter according to a mapping relation table;
specifically, according to the service time of the hard disk, the time when the I/O occurs, and the time when the I/O overtime occurs in the mapping relationship table, slow disk detection parameters respectively corresponding to each first preset interval and each second preset interval in the current sliding window are obtained.
Step 702, determining a first I/O timeout probability of a first preset interval;
as described above, in the first preset interval, before determining the first I/O timeout probability, the frequency of occurrence of I/O timeout, that is, the number of times of occurrence of I/O timeout in unit time, needs to be calculated according to the slow disc detection parameter corresponding to the first I/O timeout probability. If the frequency of the occurrence of I/O overtime exceeds the set frequency threshold, combining the acquired first I/O overtime times in a third preset interval into one time, normally recording the occurring first I/O overtime times outside the third preset interval, and then determining the first I/O overtime probability of the first preset interval, and if the frequency of the occurrence of I/O overtime does not exceed the set frequency threshold, directly determining the first I/O overtime probability of the first preset interval.
Step 703, determining whether the first I/O timeout probability exceeds a corresponding I/O timeout threshold, if yes, executing step 708, otherwise, executing step 704;
step 704, determining a second I/O timeout probability of a second preset interval;
step 705, determining whether the second I/O timeout probability exceeds the corresponding I/O timeout threshold, if yes, executing step 708, otherwise executing step 706;
step 706, acquiring the number of consecutive first preset intervals in the current sliding window in which I/O timeout occurs;
step 707, determining whether the number of the consecutive first preset intervals is greater than the set number of intervals, if so, executing step 708, otherwise, executing step 709;
step 708, determining that the hard disk is a slow disk;
step 709, determining the hard disk is a non-slow disk.
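The decision order of steps 703 to 709 can be summarized in a short sketch. The function name, parameter names and the threshold values in the test data are hypothetical. Passing `None` for the second probability models the start-up phase in which it is not yet calculated, and step 707 is implemented here as "greater than or equal to", following the worked example with j − i + 1 ≥ n, although the step itself is worded as "greater than".

```python
from typing import Optional


def is_slow_disk(
    first_probability: float,
    second_probability: Optional[float],
    consecutive_timeout_intervals: int,
    first_threshold: float,
    second_threshold: float,
    set_intervals: int,
) -> bool:
    """Apply the three checks in the order of the flowchart."""
    # Step 703: short-interval check.
    if first_probability > first_threshold:
        return True  # step 708
    # Step 705: long-interval check, skipped while the window is not full.
    if second_probability is not None and second_probability > second_threshold:
        return True  # step 708
    # Step 707: consecutive first preset intervals with I/O timeouts.
    if consecutive_timeout_intervals >= set_intervals:
        return True  # step 708
    return False  # step 709
```

Checking the first preset interval before the second means a sharp short-term degradation is flagged without waiting for the longer window to fill.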
Example 2
Based on the same inventive concept, the embodiment of the present disclosure further provides a slow disk detection apparatus. Since the apparatus corresponds to the method in the embodiment of the present disclosure and solves the problem on a similar principle, the implementation of the apparatus may refer to the implementation of the method, and repeated details are not described again.
As shown in fig. 8, the above apparatus includes the following modules:
a second preset interval sliding module 801, configured to slide by using the second preset interval as a sliding window and using the first preset interval as a sliding step length;
a slow disc detection parameter obtaining module 802, configured to obtain slow disc detection parameters corresponding to each of a first preset interval and a second preset interval in a current sliding window when a slow disc detection condition is met, where the slow disc detection parameters include a service duration, I/O times, and I/O timeout times of a hard disc;
a first I/O timeout probability determining module 803, configured to determine, according to the slow disc detection parameter in the first preset interval, a first I/O timeout probability in the first preset interval;
a second I/O timeout probability determining module 804, configured to determine a second I/O timeout probability of the second preset interval according to the slow disc detection parameter of the second preset interval;
a slow disk determining module 805, configured to determine that the hard disk is a slow disk when determining that the first I/O timeout probability or the second I/O timeout probability exceeds the corresponding I/O timeout threshold.
As an optional implementation manner, the second I/O timeout probability determining module is configured to determine the second I/O timeout probability of the second preset interval according to the I/O times and the I/O timeout times collected in the second preset interval, and includes:
and when the first I/O timeout probability is determined not to exceed the corresponding I/O timeout threshold, determining a second I/O timeout probability of the second preset interval according to the I/O times and the I/O timeout times acquired in the second preset interval.
As an optional implementation, the apparatus further comprises:
a first preset interval number determining module, configured to determine the number of consecutive first preset intervals if the first I/O timeout probability and the second I/O timeout probability are both smaller than corresponding I/O timeout thresholds and I/O timeout occurs in consecutive first preset intervals within the current sliding window;
the slow disk determining module is further configured to determine that the hard disk is a slow disk if the number of the consecutive first preset intervals is greater than the number of the set intervals.
As an optional implementation manner, the slow disc detection parameter obtaining module is configured to obtain the first I/O timeout number, and includes:
merging the first I/O timeout times acquired within a third preset interval into one when the frequency of occurrence of I/O timeouts exceeds a set frequency threshold, wherein the length of the third preset interval is smaller than that of the first preset interval.
As an optional implementation manner, the first I/O timeout probability determining module is configured to determine, according to the slow disc detection parameter of the first preset interval, a first I/O timeout probability of the first preset interval, and includes:
calculating the average I/O timeout frequency of the first preset interval according to the ratio of the I/O timeout times of the first preset interval to the service time of the hard disk in the first preset interval;
calculating the average I/O frequency of the first preset interval according to the ratio of the I/O times of the first preset interval to the service time of the hard disk in the first preset interval;
and calculating the first I/O timeout probability of the first preset interval according to the ratio of the average I/O timeout frequency of the first preset interval to the average I/O frequency of the first preset interval.
As an optional implementation manner, the second I/O timeout probability determining module is configured to determine, according to the slow disc detection parameter of the second preset interval, a second I/O timeout probability of the second preset interval, and includes:
calculating the average I/O timeout frequency of the second preset interval according to the ratio of the I/O timeout times of the second preset interval to the service time of the hard disk in the second preset interval;
calculating the average I/O frequency of the second preset interval according to the ratio of the I/O times of the second preset interval to the service time of the hard disk in the second preset interval;
and calculating the second I/O timeout probability of the second preset interval according to the ratio of the average I/O timeout frequency of the second preset interval to the average I/O frequency of the second preset interval.
As an optional implementation manner, the slow disc detection parameter obtaining module is configured to obtain slow disc detection parameters corresponding to each of a first preset interval and a second preset interval in a current sliding window when a slow disc detection condition is met, where the slow disc detection parameters include service duration, I/O times, and I/O timeout times of a hard disc, and the method includes:
acquiring a mapping relation table of hard disk slot numbers, hard disk marks, service time, I/O times and I/O overtime times of the hard disk established by the data nodes;
and acquiring slow disc detection parameters respectively corresponding to each first preset interval and each second preset interval in the current sliding window according to the service time, the I/O times and the I/O overtime times of the hard disc in the mapping relation table.
As an optional implementation manner, the slow disk determining module, when determining that the hard disk is a slow disk, is further configured to:
marking the hard disk as a slow disk;
isolating the hard disk and reporting the hard disk to a metadata node, and stopping writing in service data;
and transferring the data on the hard disk to other hard disks of the data nodes or hard disks of other data nodes through load balancing.
Example 3
Based on the same inventive concept, the embodiment of the present disclosure further provides a slow disk detection electronic device. Since the electronic device corresponds to the method in the embodiment of the present disclosure and solves the problem on a similar principle, the implementation of the electronic device may refer to the implementation of the method, and repeated details are not described again.
An electronic device 90 according to this embodiment of the present disclosure is described below with reference to fig. 9. The electronic device 90 shown in fig. 9 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 9, the electronic device 90 may be embodied in the form of a general purpose computing device, which may be a terminal device, for example. The components of the electronic device 90 may include, but are not limited to: the at least one processor 91, the at least one memory 92 storing processor-executable instructions, and a bus 93 connecting the various system components (including the memory 92 and the processor 91).
The processor executes the executable instructions to implement the steps of:
sliding with the second preset interval as a sliding window and the first preset interval as a sliding step length;
when the slow disk detection condition is met, slow disk detection parameters respectively corresponding to each first preset interval and each second preset interval in a current sliding window are obtained, wherein the slow disk detection parameters comprise service duration, I/O times and I/O overtime times of a hard disk;
determining a first I/O timeout probability of the first preset interval according to the slow disc detection parameters of the first preset interval;
determining a second I/O timeout probability of the second preset interval according to the slow disc detection parameters of the second preset interval;
and when the first I/O timeout probability or the second I/O timeout probability is determined to exceed the corresponding I/O timeout threshold, determining that the hard disk is a slow disk.
As an optional implementation manner, the determining, by the processor, a second I/O timeout probability of the second preset interval according to the number of I/O times and the number of I/O timeout collected in the second preset interval includes:
and when the first I/O timeout probability is determined not to exceed the corresponding I/O timeout threshold, determining a second I/O timeout probability of the second preset interval according to the I/O times and the I/O timeout times acquired in the second preset interval.
As an optional implementation, the processor is further configured to perform:
if the first I/O timeout probability and the second I/O timeout probability are both smaller than the corresponding I/O timeout threshold and I/O timeout occurs in a plurality of continuous first preset intervals in the current sliding window, determining the number of the continuous first preset intervals;
and if the number of the continuous first preset intervals is larger than the number of the set intervals, determining that the hard disk is a slow disk.
As an optional implementation manner, the processor is configured to perform acquiring the first I/O timeout number, and includes:
merging the first I/O timeout times acquired within a third preset interval into one when the frequency of occurrence of I/O timeouts exceeds a set frequency threshold, wherein the length of the third preset interval is smaller than that of the first preset interval.
As an optional implementation manner, the processor is configured to execute determining a first I/O timeout probability of the first preset interval according to the slow disc detection parameter of the first preset interval, including:
calculating the average I/O timeout frequency of the first preset interval according to the ratio of the I/O timeout times of the first preset interval to the service time of the hard disk in the first preset interval;
calculating the average I/O frequency of the first preset interval according to the ratio of the I/O times of the first preset interval to the service time of the hard disk in the first preset interval;
and calculating the first I/O timeout probability of the first preset interval according to the ratio of the average I/O timeout frequency of the first preset interval to the average I/O frequency of the first preset interval.
As an optional implementation manner, the processor is configured to execute determining a second I/O timeout probability of the second preset interval according to the slow disc detection parameter of the second preset interval, including:
calculating the average I/O timeout frequency of the second preset interval according to the ratio of the I/O timeout times of the second preset interval to the service time of the hard disk in the second preset interval;
calculating the average I/O frequency of the second preset interval according to the ratio of the I/O times of the second preset interval to the service time of the hard disk in the second preset interval;
and calculating the second I/O timeout probability of the second preset interval according to the ratio of the average I/O timeout frequency of the second preset interval to the average I/O frequency of the second preset interval.
As an optional implementation manner, when the slow disk detection condition is satisfied, the processor is configured to obtain slow disk detection parameters corresponding to each of a first preset interval and a second preset interval in a current sliding window, where the slow disk detection parameters include a service duration, I/O times, and I/O timeout times of a hard disk, and the method includes:
acquiring a mapping relation table of hard disk slot numbers, hard disk marks, service time, I/O times and I/O overtime times of the hard disk established by the data nodes;
and acquiring slow disc detection parameters respectively corresponding to each first preset interval and each second preset interval in the current sliding window according to the service time, the I/O times and the I/O overtime times of the hard disc in the mapping relation table.
As an optional implementation manner, when the processor is configured to perform determining that the hard disk is a slow disk, the processor further includes:
marking the hard disk as a slow disk;
isolating the hard disk and reporting the hard disk to a metadata node, and stopping writing in service data;
and transferring the data on the hard disk to other hard disks of the data nodes or hard disks of other data nodes through load balancing.
Bus 93 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, a processor, or a local bus using any of a variety of bus architectures.
Memory 92 may include readable media in the form of volatile memory, such as Random Access Memory (RAM)921 and/or cache memory 922, and may further include Read Only Memory (ROM) 923.
Memory 92 may also include a program/utility 925 having a set (at least one) of program modules 924, such program modules 924 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
The electronic device 90 may also communicate with one or more external devices 94 (e.g., keyboard, pointing device, etc.), with one or more devices that enable a user to interact with the electronic device 90, and/or with any devices (e.g., router, modem, etc.) that enable the electronic device 90 to communicate with one or more other computing devices. Such communication may be through an input/output (I/O) interface 95. Also, the electronic device 90 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the Internet) via a network adapter 96. As shown, the network adapter 96 communicates with the other modules of the electronic device 90 via the bus 93. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device 90, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
Example 4
In some possible embodiments, various aspects of the present disclosure may also be implemented in a program product, which includes program code for causing a terminal device to execute steps of modules in a slow disc detection apparatus according to various exemplary embodiments of the present disclosure described in the above section of "exemplary method" of this specification when the program product runs on the terminal device, for example, the terminal device may be configured to slide with a first preset interval as a sliding step length by using a second preset interval as a sliding window; when the slow disk detection condition is met, slow disk detection parameters respectively corresponding to each first preset interval and each second preset interval in a current sliding window are obtained, wherein the slow disk detection parameters comprise service duration, I/O times and I/O overtime times of a hard disk; determining a first I/O timeout probability of the first preset interval according to the slow disc detection parameters of the first preset interval; determining a second I/O timeout probability of the second preset interval according to the slow disc detection parameters of the second preset interval; and when the first I/O timeout probability or the second I/O timeout probability is determined to exceed the corresponding I/O timeout threshold, determining that the hard disk is a slow disk.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
As shown in fig. 10, a program product 100 for slow disc detection according to an embodiment of the present disclosure is described, which may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be run on a terminal device, such as a personal computer. However, the program product of the present disclosure is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations for the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device over any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., over the internet using an internet service provider).
It should be noted that although several modules or sub-modules of the system are mentioned in the above detailed description, such partitioning is merely exemplary and not mandatory. Indeed, the features and functionality of two or more of the modules described above may be embodied in one module, in accordance with embodiments of the present disclosure. Conversely, the features and functions of one module described above may be further divided so as to be embodied by a plurality of modules.
Further, while operations of the modules of the disclosed system are depicted in the drawings in a particular order, this does not require or imply that these operations must be performed in this particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Additionally or alternatively, certain operations may be omitted, multiple operations may be combined into one operation for execution, and/or one operation may be broken down into multiple operations for execution.
As will be appreciated by one skilled in the art, embodiments of the present disclosure may be provided as a method, system, or computer program product. Accordingly, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present disclosure may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present disclosure is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the disclosure. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (11)

1. A slow disk detection method, applied to a data node, characterized by comprising:
sliding a sliding window, whose length is a second preset interval, with a first preset interval as the sliding step length;
when a slow disk detection condition is met, obtaining slow disk detection parameters corresponding respectively to each first preset interval and each second preset interval in the current sliding window, wherein the slow disk detection parameters comprise the service duration, the I/O count and the I/O timeout count of a hard disk;
determining a first I/O timeout probability of the first preset interval according to the slow disk detection parameters of the first preset interval;
determining a second I/O timeout probability of the second preset interval according to the slow disk detection parameters of the second preset interval;
and when it is determined that the first I/O timeout probability or the second I/O timeout probability exceeds the corresponding I/O timeout threshold, determining that the hard disk is a slow disk.
2. The method of claim 1, wherein determining the second I/O timeout probability of the second preset interval comprises:
when it is determined that the first I/O timeout probability does not exceed the corresponding I/O timeout threshold, determining the second I/O timeout probability of the second preset interval according to the I/O count and the I/O timeout count collected in the second preset interval.
3. The method of claim 1, further comprising:
if the first I/O timeout probability and the second I/O timeout probability are both smaller than the corresponding I/O timeout thresholds and I/O timeouts occur in a plurality of consecutive first preset intervals in the current sliding window, determining the number of the consecutive first preset intervals;
and if the number of the consecutive first preset intervals is larger than a set number of intervals, determining that the hard disk is a slow disk.
4. The method of claim 1, wherein obtaining the I/O timeout count of the first preset interval comprises:
when the frequency at which I/O timeouts occur exceeds a set frequency threshold, merging the I/O timeouts collected within a third preset interval and counting them as a single I/O timeout, wherein the length of the third preset interval is smaller than that of the first preset interval.
5. The method of claim 1, wherein determining the first I/O timeout probability of the first preset interval according to the slow disk detection parameters of the first preset interval comprises:
calculating the average I/O timeout frequency of the first preset interval as the ratio of the I/O timeout count of the first preset interval to the service duration of the hard disk in the first preset interval;
calculating the average I/O frequency of the first preset interval as the ratio of the I/O count of the first preset interval to the service duration of the hard disk in the first preset interval;
and calculating the first I/O timeout probability of the first preset interval as the ratio of the average I/O timeout frequency of the first preset interval to the average I/O frequency of the first preset interval.
6. The method of claim 1, wherein determining the second I/O timeout probability of the second preset interval according to the slow disk detection parameters of the second preset interval comprises:
calculating the average I/O timeout frequency of the second preset interval as the ratio of the I/O timeout count of the second preset interval to the service duration of the hard disk in the second preset interval;
calculating the average I/O frequency of the second preset interval as the ratio of the I/O count of the second preset interval to the service duration of the hard disk in the second preset interval;
and calculating the second I/O timeout probability of the second preset interval as the ratio of the average I/O timeout frequency of the second preset interval to the average I/O frequency of the second preset interval.
7. The method according to claim 1, wherein obtaining, when the slow disk detection condition is met, the slow disk detection parameters corresponding respectively to each first preset interval and each second preset interval in the current sliding window, the slow disk detection parameters comprising the service duration, the I/O count and the I/O timeout count of the hard disk, comprises:
acquiring a mapping table, established by the data node, of hard disk slot numbers, hard disk identifiers, and the service duration, I/O count and I/O timeout count of each hard disk;
and acquiring the slow disk detection parameters corresponding respectively to each first preset interval and each second preset interval in the current sliding window according to the service duration of the hard disk, the times at which I/Os occurred and the times at which I/O timeouts occurred, as recorded in the mapping table.
8. The method according to any one of claims 1 to 3, wherein when determining that the hard disk is a slow disk, the method further comprises:
marking the hard disk as a slow disk;
isolating the hard disk, reporting it to a metadata node, and stopping the writing of service data to it;
and migrating the data on the hard disk, through load balancing, to other hard disks of the data node or to hard disks of other data nodes.
9. A slow disk detection apparatus, characterized in that the apparatus comprises:
a second preset interval sliding module, configured to slide a sliding window, whose length is a second preset interval, with a first preset interval as the sliding step length;
a slow disk detection parameter acquisition module, configured to obtain, when a slow disk detection condition is met, slow disk detection parameters corresponding respectively to each first preset interval and each second preset interval in the current sliding window, wherein the slow disk detection parameters comprise the service duration, the I/O count and the I/O timeout count of a hard disk;
a first I/O timeout probability determining module, configured to determine a first I/O timeout probability of the first preset interval according to the slow disk detection parameters of the first preset interval;
a second I/O timeout probability determining module, configured to determine a second I/O timeout probability of the second preset interval according to the slow disk detection parameters of the second preset interval;
and a slow disk determining module, configured to determine that the hard disk is a slow disk when the first I/O timeout probability or the second I/O timeout probability exceeds the corresponding I/O timeout threshold.
10. An electronic device, comprising: a processor; a memory for storing processor-executable instructions; wherein the processor implements the steps of the method of any one of claims 1 to 9 by executing the executable instructions.
11. A computer readable and writable storage medium on which computer instructions are stored, characterized in that the instructions, when executed by a processor, implement the steps of the method according to any one of claims 1 to 8.
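
By way of illustration only, the decision logic recited in claims 1 to 6 might be sketched as follows. This is a minimal sketch, not the claimed implementation. The names `IntervalStats`, `merged_timeout_count`, `timeout_probability` and `is_slow_disk`, and all threshold parameters, are hypothetical. The sketch further assumes that the parameters of the second preset interval are the sums over the first preset intervals in the window, that only the newest first preset interval is tested on each slide, that an interval with no I/O is treated as healthy, and that the timeout frequency of claim 4 is measured over one first preset interval.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class IntervalStats:
    """Slow disk detection parameters for one first preset interval."""
    service_seconds: float  # service duration of the hard disk
    io_count: int           # number of I/Os
    timeout_count: int      # number of I/O timeouts


def merged_timeout_count(timestamps: Sequence[float], first_interval: float,
                         third_interval: float, freq_threshold: float) -> int:
    """Claim 4: during a timeout burst, count all timeouts that fall within
    one third preset interval as a single timeout."""
    if len(timestamps) / first_interval <= freq_threshold:
        return len(timestamps)  # no burst: every timeout counts
    merged, bucket_end = 0, float("-inf")
    for t in sorted(timestamps):
        if t >= bucket_end:  # first timeout of a new third preset interval
            merged += 1
            bucket_end = t + third_interval
    return merged


def timeout_probability(stats: Sequence[IntervalStats]) -> float:
    """Claims 5 and 6: average timeout frequency divided by average I/O frequency."""
    service = sum(s.service_seconds for s in stats)
    ios = sum(s.io_count for s in stats)
    timeouts = sum(s.timeout_count for s in stats)
    if service <= 0 or ios == 0:
        return 0.0  # assumption: an idle interval is treated as healthy
    timeout_freq = timeouts / service  # average I/O timeout frequency
    io_freq = ios / service            # average I/O frequency
    return timeout_freq / io_freq      # algebraically equal to timeouts / ios


def is_slow_disk(window: Sequence[IntervalStats], first_threshold: float,
                 second_threshold: float, max_consecutive: int) -> bool:
    """window holds the first preset intervals of the current sliding window,
    oldest first; together they span one second preset interval."""
    # Claim 1: short-term check on the newest first preset interval.
    if timeout_probability(window[-1:]) > first_threshold:
        return True
    # Claim 2: the window-wide check runs only if the short-term check passed.
    if timeout_probability(window) > second_threshold:
        return True
    # Claim 3: longest run of consecutive intervals that each saw a timeout.
    longest = run = 0
    for s in window:
        run = run + 1 if s.timeout_count > 0 else 0
        longest = max(longest, run)
    return longest > max_consecutive
```

In such a sketch, a caller could keep the window in a `collections.deque` with `maxlen` equal to the number of first preset intervals per second preset interval, append one `IntervalStats` per step, and call `is_slow_disk` after each append. The short-term threshold catches sudden bursts, the window-wide threshold catches sustained degradation, and the consecutive-interval check catches a disk that times out persistently but at a low rate.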
CN202111143369.4A 2021-09-28 2021-09-28 Slow disk detection method and device and computer readable and writable storage medium Pending CN113903389A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111143369.4A CN113903389A (en) 2021-09-28 2021-09-28 Slow disk detection method and device and computer readable and writable storage medium

Publications (1)

Publication Number Publication Date
CN113903389A (en) 2022-01-07

Family

ID=79029699

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111143369.4A Pending CN113903389A (en) 2021-09-28 2021-09-28 Slow disk detection method and device and computer readable and writable storage medium

Country Status (1)

Country Link
CN (1) CN113903389A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114979180A (en) * 2022-05-24 2022-08-30 超聚变数字技术有限公司 Data synchronization method, system and equipment
CN115934003A (en) * 2023-03-09 2023-04-07 浪潮电子信息产业股份有限公司 Slow disk identification method, device, equipment and readable storage medium in disk array
CN117573483A (en) * 2024-01-16 2024-02-20 苏州元脑智能科技有限公司 Hard disk removing method and device, storage medium and electronic equipment
CN117785074A (en) * 2024-02-28 2024-03-29 济南浪潮数据技术有限公司 A method, device, server and medium for input and output timeout processing

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103488544A (en) * 2013-09-26 2014-01-01 华为技术有限公司 Processing method and device for detecting slow disk
WO2017012392A1 (en) * 2015-07-17 2017-01-26 中兴通讯股份有限公司 Disk check method and apparatus
CN106407051A (en) * 2015-07-31 2017-02-15 华为技术有限公司 Slow disk detection method and device
CN109815037A (en) * 2017-11-22 2019-05-28 华为技术有限公司 Slow disk detection method and storage array
CN112416639A (en) * 2020-11-16 2021-02-26 新华三技术有限公司成都分公司 Slow disk detection method, device, equipment and storage medium

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114979180A (en) * 2022-05-24 2022-08-30 超聚变数字技术有限公司 Data synchronization method, system and equipment
CN114979180B (en) * 2022-05-24 2024-05-17 超聚变数字技术有限公司 Data synchronization method, system and equipment
CN115934003A (en) * 2023-03-09 2023-04-07 浪潮电子信息产业股份有限公司 Slow disk identification method, device, equipment and readable storage medium in disk array
CN117573483A (en) * 2024-01-16 2024-02-20 苏州元脑智能科技有限公司 Hard disk removing method and device, storage medium and electronic equipment
CN117573483B (en) * 2024-01-16 2024-04-02 苏州元脑智能科技有限公司 Hard disk removing method and device, storage medium and electronic equipment
CN117785074A (en) * 2024-02-28 2024-03-29 济南浪潮数据技术有限公司 A method, device, server and medium for input and output timeout processing
CN117785074B (en) * 2024-02-28 2024-07-02 济南浪潮数据技术有限公司 Method, device, server and medium for processing input/output timeout

Similar Documents

Publication Publication Date Title
CN113903389A (en) Slow disk detection method and device and computer readable and writable storage medium
US11093349B2 (en) System and method for reactive log spooling
US8655623B2 (en) Diagnostic system and method
WO2021147220A1 (en) Page access duration acquisition method, device, medium, and electronic apparatus
WO2018120720A1 (en) Method for locating test error of client program, electronic device, and storage medium
US8082275B2 (en) Service model flight recorder
CN110674025A (en) Interactive behavior monitoring method and device and computer equipment
US9529655B2 (en) Determining alert criteria in a network environment
JP2023502910A (en) Identifying the constituent events of an event storm in operations management
CN110633255B (en) Method and device for acquiring user use duration
US9201752B2 (en) System and method for correlating empirical data with user experience
US20230065492A1 (en) Method for obtaining browser running data, electronic device, and storage medium
US8676968B2 (en) Determining information about a computing system
CN110569182B (en) Crash rate calculation method and device, computer equipment and storage medium
CN108647284B (en) Method and device for recording user behavior, medium and computing equipment
CN114298533A (en) Performance index processing method, device, equipment and storage medium
US9952773B2 (en) Determining a cause for low disk space with respect to a logical disk
CN116069591A (en) Interface performance monitoring method, device, equipment and storage medium
CN108959625A (en) The acquisition methods and device of information in cloud data system
CN107766216A (en) It is a kind of to be used to obtain the method and apparatus using execution information
JP2010237836A (en) Security audit period derivation device, security audit period derivation program, and recording medium
US20210149760A1 (en) Method and apparatus to identify a problem area in an information handling system based on latencies
KR102735573B1 (en) Method for applying a BMC-based self-analysis model for edge server management in a rugged environment
CN119718866A (en) Application performance data acquisition method, device, equipment, medium and program product
CN111352992B (en) Data consistency detection method, device and server

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination