Wednesday, August 7, 2013

/scratch problems

While trying to swap out a head that was exhibiting problems, we had a SAS card failure.

This failure caused 60 drives to disappear from the system. Because the drives dropped out ungracefully, all redundancy in the RAID 6 arrays was lost.

We were able to bring the drives back up on the old head (the one that had been removed), but because they had been missing from the system for 10 minutes, the arrays forced themselves into a full rebuild.

Right now /scratch has no parity at all, and 60 drives are trying to rebuild on only one head. The other head is up but is not picking up the paths. We have been working with the vendor, DDN, on this.

The head is currently rebuilding only 30 of the drives (to get us back to RAID 5 level redundancy) and will then continue on to full RAID 6.

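With only one head working, we are CPU bound and the rebuild is progressing at about 1% per hour, so the current rebuild will take roughly 100 hours (just over four days). We are at risk of losing data until the end of the week, and it will take another week to get back to full RAID 6.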