Saturday, January 17, 2009

Don't shout at your storage!

Remember: your storage doesn't like it when you shout at it!!!

Brendan, you guys rock!!!

Wednesday, January 14, 2009

Boot-from-SAN-root-zpool behavior

In Solaris 10 10/08, a root zpool sitting on a SAN disk is configured by default with the failmode=continue attribute.
This means that it...
"Returns EIO to any new write I/O requests
but allows reads to any of the remaining
healthy devices. Any write requests that
have yet to be committed to disk would be
blocked."
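
To make the quoted semantics concrete, here is a toy model in Python. It is only a sketch of the described behavior, not ZFS code: the ToyPool class, its methods, and the device names are all hypothetical, and it assumes that losing any device counts as a pool failure.

```python
import errno


class ToyPool:
    """Toy model of failmode=continue as quoted above; not real ZFS code."""

    def __init__(self, devices):
        # device name -> healthy flag (names are hypothetical)
        self.healthy = {name: True for name in devices}
        # block key -> (device holding the block, value)
        self.blocks = {}

    def lose_device(self, device):
        # Simulate a disk disappearing from the fabric.
        self.healthy[device] = False

    def failed(self):
        # Assumption: losing any device puts the pool into a failed state.
        return not all(self.healthy.values())

    def write(self, key, device, value):
        # failmode=continue: new writes get EIO once the pool has failed.
        if self.failed():
            raise OSError(errno.EIO, "pool failed: new writes rejected")
        self.blocks[key] = (device, value)

    def read(self, key):
        # Reads are still served from the remaining healthy devices.
        device, value = self.blocks[key]
        if not self.healthy[device]:
            raise OSError(errno.EIO, "device is gone")
        return value
```

In this model, a pool built on a single LUN has no healthy device left once that LUN is gone, so neither reads nor writes can succeed.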

By default, the LUN is offlined 20 seconds[1] after the disk disappears from the fabric, and an error message appears on the console.

When MPxIO is disabled:
scsi: WARNING: /pci@780/SUNW,qlc@0,1/fp@0,0/ssd@w5001738000160141,7 (ssd0):
Command failed to complete...Device is gone

When MPxIO is enabled:
scsi: WARNING: /scsi_vhci/ssd@g0017380000160008 (ssd1):
drive offline

The system then hangs and must be restarted.

[1] "Within the LV stack (Leadville), there is a 20-second delay to avoid any
unintentional removal of the cable. This ensures that the loss of sync
is due to a failure and not due to removing the wrong cable and then
realizing the mistake. The 20 seconds is used to correct the mistake.
If they do not put the cable back within 20 seconds, error recovery
will start."
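
The timing rule in note [1] can be sketched as a small Python function. This is only an illustration of the quoted description, not Leadville code: the function name and its arguments are hypothetical, and it assumes that reconnecting at exactly 20 seconds still counts as "within 20 seconds".

```python
OFFLINE_DELAY_SEC = 20  # delay taken from the quoted note


def error_recovery_starts(link_down_at, link_restored_at=None):
    """Return True if error recovery would start for a lost link.

    Times are in seconds. link_restored_at=None means the cable was
    never put back. Hypothetical helper, for illustration only.
    """
    if link_restored_at is None:
        return True
    # Assumption: a reconnect at exactly 20 seconds is still in time.
    return (link_restored_at - link_down_at) > OFFLINE_DELAY_SEC
```

For example, error_recovery_starts(0, 5) is False (the cable was put back in time), while error_recovery_starts(0, 30) and error_recovery_starts(0) are both True.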