Monday, October 08, 2007

Why reboot_on_path_failure is useless when you're using ZFS with SunCluster 3.2

Why won't enabling reboot_on_path_failure help on SC3.2 with ZFS?
Because the cluster node will panic with
"ZFS: I/O failure"
before scdpmd detects the problem.
In other words, ZFS does scdpmd's job much faster: it crashes the host on the first I/O to the inaccessible (disconnected) storage, while scdpmd checks the storage only every 10 minutes (!).
To make it work with ZFS, bug 6322646 must be fixed.
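
The timing argument can be illustrated with a rough sketch. This is a toy model, not Sun Cluster code: it assumes scdpmd polls at fixed 10-minute marks and that ZFS panics on the first I/O issued after the path fails. The function name and the next_io_delay_s parameter are made up for illustration.

```python
import math

POLL_INTERVAL_S = 600  # scdpmd check period quoted in the post (10 minutes)


def first_reaction(failure_t, next_io_delay_s, poll_interval_s=POLL_INTERVAL_S):
    """Return (who reacts first, at what time) for a path failure at failure_t.

    Assumption: the daemon polls at multiples of poll_interval_s, and ZFS
    panics as soon as the next I/O hits the dead path.
    """
    # First poll at or after the moment the path failed.
    next_poll = math.ceil(failure_t / poll_interval_s) * poll_interval_s
    # Moment the first I/O reaches the disconnected storage.
    zfs_panic_t = failure_t + next_io_delay_s
    if zfs_panic_t < next_poll:
        return ("zfs_panic", zfs_panic_t)
    return ("scdpmd_reboot", next_poll)


if __name__ == "__main__":
    # Path fails 30 s after a poll; the pool sees I/O one second later.
    print(first_reaction(30, 1))
```

Under these assumptions scdpmd wins only if the path fails just before a poll and the pool stays idle until then, which is unlikely on a busy pool.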

Here's a more detailed explanation.