Degraded RAID on cloudvirt2004-dev
Closed, Resolved, Public

Description

TASK AUTO-GENERATED by Nagios/Icinga RAID event handler

A degraded RAID (md) was detected on host cloudvirt2004-dev. An automatic snapshot of the current RAID status is attached below.

Please sync with the service owner to find the appropriate time window before actually replacing any failed hardware.

CRITICAL: State: degraded, Active: 1, Working: 1, Failed: 1, Spare: 0

$ sudo /usr/local/lib/nagios/plugins/get-raid-status-md
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10] 
md0 : active raid1 sdb2[1] sda2[0](F)
      467894272 blocks super 1.2 [2/1] [_U]
      bitmap: 1/4 pages [4KB], 65536KB chunk

unused devices: <none>
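How to read the snapshot: "[2/1]" means the array is configured for two member devices but only one is active, and "[_U]" is the per-slot map, where "_" is a missing slot and "U" is an up slot. Slot 0 is down, and "sda2[0](F)" confirms that sda2 (slot 0) is flagged as failed, which matches "Failed: 1" in the Icinga line above. The sketch below parses text in this mdstat format and flags degraded arrays. It is a minimal illustration under stated assumptions, not the implementation of get-raid-status-md. The names parse_mdstat and MdArray are hypothetical, and only the "(F)" member flag is handled (spares and other flags are treated as working).

```python
import re
from dataclasses import dataclass, field

# Assumption: header lines look like "md0 : active raid1 sdb2[1] sda2[0](F)".
ARRAY_RE = re.compile(r"^(md\d+)\s*:\s*(\S+)\s+(\S+)\s+(.*)$")
# One member device, e.g. "sda2[0](F)"; the "(F)" suffix marks a failed member.
MEMBER_RE = re.compile(r"(\S+?)\[(\d+)\](\(F\))?")
# Status counters, e.g. "[2/1] [_U]": configured/active counts, then the slot map.
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")


@dataclass
class MdArray:  # hypothetical container, for illustration only
    name: str
    state: str
    level: str
    failed: list = field(default_factory=list)
    working: list = field(default_factory=list)
    raid_disks: int = 0
    active_disks: int = 0
    slot_map: str = ""

    @property
    def degraded(self) -> bool:
        # Fewer active members than configured slots means the array is degraded.
        return self.active_disks < self.raid_disks


def parse_mdstat(text: str) -> list:
    """Hypothetical parser for mdstat-style text; returns one MdArray per mdN."""
    arrays = []
    current = None
    for line in text.splitlines():
        header = ARRAY_RE.match(line)
        if header:
            name, state, level, members = header.groups()
            current = MdArray(name=name, state=state, level=level)
            for dev, _slot, failed_flag in MEMBER_RE.findall(members):
                (current.failed if failed_flag else current.working).append(dev)
            arrays.append(current)
            continue
        status = STATUS_RE.search(line)
        if status and current is not None:
            # The counters line follows its array header, so attach it there.
            current.raid_disks = int(status.group(1))
            current.active_disks = int(status.group(2))
            current.slot_map = status.group(3)
    return arrays


# The snapshot from this ticket, pasted verbatim.
SNAPSHOT = """Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md0 : active raid1 sdb2[1] sda2[0](F)
      467894272 blocks super 1.2 [2/1] [_U]
      bitmap: 1/4 pages [4KB], 65536KB chunk

unused devices: <none>
"""

for md in parse_mdstat(SNAPSHOT):
    print(md.name, md.level, "degraded" if md.degraded else "ok", "failed:", md.failed)
    # prints: md0 raid1 degraded failed: ['sda2']
```

On a live host, the same function could be fed the contents of /proc/mdstat instead of a pasted snapshot.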

Event Timeline

Restricted Application added a subscriber: Aklapper.

@dcaro We got this automated ticket, and I saw in your ticket T374467 that the drive errors are gone.
I logged into the server's GUI just now and it's showing several memory errors on DIMM B7. This might have something to do with the issue you were having. I can put in a work order with Dell for a new DIMM, since this one is under warranty. Let me know if you want to do that.

Nice catch, yes please.

Request submitted. I'll update when it gets here.

The part should be arriving sometime today. We can schedule downtime for the server to get it swapped when ready.

Shipping has gone awry. I'll update when it's in hand.

@dcaro I have the replacement DIMM. Let me know when is a good time to replace it.

@Jhancock.wm Hey, any time that works for you is good; the node is depooled :)

It's been replaced, and I'm not seeing any alerts at the moment. Shall we close this one and reference it if another error gets generated?

dcaro claimed this task.

@Jhancock.wm sure, closing, thanks a lot!