Troubleshooting connectivity issues on a Brocade SAN

I recently had the “pleasure” of figuring out what was wrong with a Brocade-based SAN environment. Servers were losing connectivity on one of their HBAs even though all links were online, so further investigation was necessary.

Going through the error counters on each of the long-wave SFPs finally revealed that one SFP’s health was marginal, which is why the link was still online but very unreliable. The Web Tools GUI showed this particular SFP as orange instead of green. Disabling and re-enabling this SFP didn’t help, so I decided to shut it down for good. And guess what: all my troubles went away. The trunk this SFP belonged to dropped to a non-redundant but healthy state, and all servers returned to normal operation with their redundant paths restored.
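
The counter check described above gets tedious by hand once there are more than a few ports. On Fabric OS, the `porterrshow` and `sfpshow` commands are where I would look for the raw numbers. What follows is only a minimal Python sketch of the triage step, assuming you have already collected per-port counters into a dictionary by whatever means suits you. The counter names, the `find_suspect_ports` helper, the threshold and the sample values are all made up for illustration and are not output from a real switch.

```python
from typing import Dict, List

# Hypothetical counter names, loosely modelled on per-port error columns;
# adapt them to whatever your switch actually reports.
WATCHED_COUNTERS = ("crc_err", "enc_out", "link_fail", "loss_sync")


def find_suspect_ports(
    counters: Dict[int, Dict[str, int]], threshold: int = 0
) -> List[int]:
    """Return the ports where any watched error counter exceeds the threshold."""
    suspects = []
    for port, values in counters.items():
        # A missing counter is treated as zero rather than as an error.
        if any(values.get(name, 0) > threshold for name in WATCHED_COUNTERS):
            suspects.append(port)
    return sorted(suspects)


# Made-up sample data: port 2 is online but accumulating errors.
sample = {
    0: {"crc_err": 0, "enc_out": 0, "link_fail": 0, "loss_sync": 0},
    1: {"crc_err": 0, "enc_out": 0, "link_fail": 0, "loss_sync": 0},
    2: {"crc_err": 1432, "enc_out": 88123, "link_fail": 3, "loss_sync": 7},
}
print(find_suspect_ports(sample))  # prints [2]
```

A threshold of zero is deliberately strict. Counters usually accumulate since the last reset, so in practice it probably makes more sense to compare two snapshots taken some time apart and flag the ports whose values keep growing.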

To summarize: when vague connectivity issues arise, look for marginal or even faulted SFPs. If the links are redundant, shutting down the faulty one might help.
