Wednesday, December 7, 2011

sundiag Exadata

Exadata is a great platform, but it is not straight out of Utopia, so expect problems. Some of the problems that arise are related to physical disks. As usual, Oracle provides some nifty and pertinent diagnostic tools, and sundiag is one of them. Sundiag is used to diagnose disk problems in Exadata. The script is run as the root user.

Run sundiag as root with the following command:

# /opt/oracle.SupportTools/sundiag.sh

Upon completion, the above command creates a date-stamped tar.bz2 file in /tmp/sundiag_/tar.bz2

For example, I ran the following from a compute (db) node:

[root@firstnode tmp]# /opt/oracle.SupportTools/sundiag.sh

Success in AdpEventLog

Exit Code: 0x00
sundiag_2011_12_08_05_47/
sundiag_2011_12_08_05_47/messages
sundiag_2011_12_08_05_47/messages.1
sundiag_2011_12_08_05_47/messages.2
sundiag_2011_12_08_05_47/messages.3
sundiag_2011_12_08_05_47/messages.4
sundiag_2011_12_08_05_47/firstnode_dmesg_2011_12_08_05_47.out
sundiag_2011_12_08_05_47/firstnode_imageinfo-all_2011_12_08_05_47.out
sundiag_2011_12_08_05_47/firstnode_lspci_2011_12_08_05_47.out
sundiag_2011_12_08_05_47/firstnode_lspci-xxxx_2011_12_08_05_47.out
sundiag_2011_12_08_05_47/firstnode_lsscsi_2011_12_08_05_47.out
sundiag_2011_12_08_05_47/firstnode_fdisk-l_2011_12_08_05_47.out
sundiag_2011_12_08_05_47/firstnode_sel-list_2011_12_08_05_47.out
sundiag_2011_12_08_05_47/firstnode_megacli64-AdpAllInfo_2011_12_08_05_47.out
sundiag_2011_12_08_05_47/CmdTool.log
sundiag_2011_12_08_05_47/MegaSAS.log
sundiag_2011_12_08_05_47/firstnode_megacli64-PdList_short_2011_12_08_05_47.out
sundiag_2011_12_08_05_47/firstnode_megacli64-GetEvents-all_2011_12_08_05_47.out
sundiag_2011_12_08_05_47/firstnode_megacli64-FwTermLog_2011_12_08_05_47.out
sundiag_2011_12_08_05_47/firstnode_megacli64-CfgDsply_2011_12_08_05_47.out
sundiag_2011_12_08_05_47/firstnode_megacli64-BbuCmd_2011_12_08_05_47.out
sundiag_2011_12_08_05_47/firstnode_megacli64-LdPdInfo_2011_12_08_05_47.out
sundiag_2011_12_08_05_47/firstnode_megacli64-PdList_long_2011_12_08_05_47.out
sundiag_2011_12_08_05_47/firstnode_megacli64-LdInfo_2011_12_08_05_47.out
sundiag_2011_12_08_05_47/firstnode_megacli64-status_2011_12_08_05_47.out
==============================================================================
Done the report files are in bzip2 compressed /tmp/sundiag_2011_12_08_05_47.tar.bz2
==============================================================================
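
As a small convenience, the archive path can be pulled out of the script's output instead of being copied by hand. The sketch below assumes the final "Done ..." line has exactly the wording shown in the run above and that it is written to standard output; sundiag_archive_path is a hypothetical helper name, not part of sundiag.

```shell
# Read sundiag.sh output on stdin and print the archive path taken
# from the "Done the report files are in bzip2 compressed ..." line.
# Assumption: the line wording matches the sample run in this post.
sundiag_archive_path() {
    sed -n 's/^Done the report files are in bzip2 compressed \(.*\)$/\1/p'
}

# Possible usage (as root on the node being diagnosed):
#   archive=$(/opt/oracle.SupportTools/sundiag.sh 2>&1 | sundiag_archive_path)
#   ls -l "$archive"
```

If the wording of the "Done" line differs in another sundiag version, the helper prints nothing, so it is worth checking that the result is non-empty before using it.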


Oracle Support will often ask you to run sundiag on multiple nodes. It is time-consuming and cumbersome to generate the sundiag output on each node, bring the files to one node, tar and zip them again, and upload them to Oracle Support. dcli, my favorite command on Exadata, comes to the rescue: it runs sundiag on all the nodes for you, and the output files created on each node can then be brought to a single node. For example, below sundiag is run from the first node using dcli:

1. Run sundiag from the first node with dcli:

 [root@firstnode onecommand]# dcli -g all_group -l root /opt/oracle.SupportTools/sundiag.sh 2>&1


2. Verify there is output in /tmp on each node:

 [root@firstnode onecommand]# dcli -g all_group -l root --serial 'ls -l /tmp/sundiag* '

3. Copy the files to the first node, sorting them into one directory per hostname, since they will most likely have the same filename with the same date stamp:

 [root@firstnode onecommand]# for H in $(cat all_group); do mkdir -p /tmp/sundiagdir/$H ; scp -p "$H:/tmp/sundiag*.tar.bz2" /tmp/sundiagdir/$H ; done

4. Verify that the files have arrived at the first node:

 [root@firstnode onecommand]# cd /tmp/sundiagdir
 [root@firstnode ~]# ls -l

5. Now tar and zip the files (the j flag applies the bzip2 compression that the file name implies):

 [root@firstnode ~]# tar cjvf sundiag.tar.bz2 *

6. Upload this file to the Service Request through ftp.oracle.com
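
Steps 3 to 5 can also be wrapped in a few shell functions so that they are easy to repeat. This is only a sketch under stated assumptions: the function names and their arguments are made up for illustration, the group file is assumed to hold one hostname per line, and root ssh access to every host is assumed to work as it does for dcli.

```shell
# Sketch of steps 3-5: gather per-host sundiag archives and bundle them.
# Function names and arguments are illustrative, not part of sundiag or dcli.

# Print the hostnames from a group file, one per line, skipping blank lines.
list_hosts() {
    grep -v '^[[:space:]]*$' "$1"
}

# Create <dest>/<host> if needed and print its path.
make_host_dir() {
    mkdir -p "$1/$2" && printf '%s\n' "$1/$2"
}

# Copy /tmp/sundiag*.tar.bz2 from every host into its own sub-directory.
# Hosts whose copy fails are reported on stderr; the loop keeps going.
collect_sundiag() {
    local group_file=$1 dest=$2 host dir
    for host in $(list_hosts "$group_file"); do
        dir=$(make_host_dir "$dest" "$host") || return 1
        # Quoting keeps the local shell from expanding the glob;
        # scp resolves it against the remote host.
        scp -p "$host:/tmp/sundiag*.tar.bz2" "$dir/" ||
            echo "no sundiag archive copied from $host" >&2
    done
}

# Pack the per-host directories under $1 into the bzip2-compressed tar file $2.
bundle_sundiag() {
    tar -cjf "$2" -C "$1" .
}

# Possible usage from the directory that holds all_group:
#   collect_sundiag all_group /tmp/sundiagdir
#   bundle_sundiag /tmp/sundiagdir /tmp/sundiag_all.tar.bz2
```

Writing the bundle outside /tmp/sundiagdir keeps the archive from trying to include itself, and the stderr message makes it obvious when a node produced no archive.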
