puppet/modules/base/files/monitoring/check-raid.py:
Currently this Icinga check returns OK as soon as a volume begins rebuilding. It would be better if the monitor kept returning WARNING until redundancy is restored. Operators need to know about in-progress rebuilds because we may want to treat a host specially until the rebuild completes, for example by taking it out of a service pool. The reason could be to minimize risk to data integrity, or performance, since the RAID rebuild process consumes disk I/O.
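The status policy requested above can be sketched as a small decision function. This is a minimal illustration, not the current check-raid.py logic. The helper name `md_array_status` is hypothetical. Two further assumptions apply: a degraded array with no rebuild running should stay CRITICAL, and only the `recovery` and `resync` sync actions count as "rebuilding". The exit codes follow the standard Nagios/Icinga plugin convention.

```python
from typing import Optional

# Standard Nagios/Icinga plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2

# Assumption: sync actions that mean redundancy is still being restored.
REBUILD_ACTIONS = {"recovery", "resync"}


def md_array_status(total: int, active: int, action: Optional[str]) -> int:
    """Hypothetical helper: map one md array's state to an Icinga status.

    total/active come from the "[8/8]" field in /proc/mdstat; action is the
    sync action on the progress line (e.g. "recovery"), or None if idle.
    """
    rebuilding = action in REBUILD_ACTIONS
    if active < total and not rebuilding:
        return CRITICAL  # degraded and nothing is restoring redundancy
    if active < total or rebuilding:
        return WARNING  # rebuild in progress: stay WARNING until it finishes
    return OK  # all members active and no rebuild running
```

With this policy, an array showing `[8/7]` during `recovery` reports WARNING instead of OK. A plain `check` (scrub) on a healthy `[8/8]` array still reports OK.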
It would be most helpful if the monitor also displayed the speed and estimated time remaining of in-progress rebuilds, as reported by /proc/mdstat:
```
md1 : active raid10 sdg2[6] sdf2[5] sda2[0] sdh2[7] sdd2[3] sde2[4] sdc2[2] sdb2[1]
      1132249088 blocks super 1.2 512K chunks 2 near-copies [8/8] [UUUUUUUU]
      [===========>.........]  check = 56.7% (642031616/1132249088) finish=5981.7min speed=1365K/sec
```
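The progress line in the sample above can be parsed with a regular expression, which turns it into a short status message. This is a sketch that assumes the `action = N% (done/total) finish=Xmin speed=YK/sec` layout shown in the sample. The names `parse_progress` and `describe` are hypothetical, and the message wording is only a suggestion.

```python
import re
from typing import Optional

# Assumes the progress-line layout shown in the /proc/mdstat sample.
PROGRESS_RE = re.compile(
    r"(?P<action>\w+)\s*=\s*(?P<pct>[\d.]+)%\s*"
    r"\((?P<done>\d+)/(?P<total>\d+)\)\s*"
    r"finish=(?P<finish>[\d.]+)min\s*"
    r"speed=(?P<speed>\d+)K/sec"
)


def parse_progress(line: str) -> Optional[dict]:
    """Hypothetical helper: extract sync progress fields, or None if absent."""
    m = PROGRESS_RE.search(line)
    if m is None:
        return None
    return {
        "action": m.group("action"),
        "percent": float(m.group("pct")),
        "done_blocks": int(m.group("done")),
        "total_blocks": int(m.group("total")),
        "finish_min": float(m.group("finish")),
        "speed_kib": int(m.group("speed")),
    }


def describe(array: str, progress: dict) -> str:
    """Format a one-line summary suitable for the Icinga status text."""
    # Round to whole minutes first so the minutes field never shows 60.
    hours, minutes = divmod(round(progress["finish_min"]), 60)
    return (f"{array}: {progress['action']} {progress['percent']}% done, "
            f"{hours}h{minutes:02d}m remaining at {progress['speed_kib']}K/sec")
```

For the sample line, `describe("md1", parse_progress(line))` yields `md1: check 56.7% done, 99h42m remaining at 1365K/sec`.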
Reference: rt6796
| Status | Assigned | Task |
|---|---|---|
| Open | None | T294906 Puppet Improvements |
| Duplicate | jbond | T265138 Work required to prepare for puppet 7 |
| Resolved | SLyngshede-WMF | T273673 replace all puppet crons with systemd timers |
| Open | None | T132324 Tracking and Reducing cron-spam to root@ |
| Resolved | jcrespo | T84178 investigate RAID BBU auto-learn on db hosts |
| Resolved | faidon | T84050 Refactor RAID checks (check-raid) |
| | | Restricted Task |
| Open | None | T83476 Icinga RAID check: monitor rebuild status |