ASM Troubleshooting
Yahoo!
August 2009
Kevin Moore
Technical Lead, Advanced Customer Services
       ASM L & L Topics
            1.    ASM init.ora Parameters
            2.    ASM Alert log messages
            3.    Yahoo! Alert Log & messages
            4.    ASM data Gathering
            5.    Troubleshooting Scenarios
            6.    Instance Events
            7.    Instance Tracing
            8.    ASM Rebalancing operations
            9.    ASM Extent management
            10.   Performance Considerations
            11.   ASM Templates
            12.   Background Processes
            13.   ASM Views
            14.   ASMCMD Commands
            15.   New 11g Commands
            16.   ASM MySupport Documents
Advanced Customer Services
         ASM initSID.ora
     •   ##############################################################################
     •   # Copyright (c) 1991, 2001, 2002 by Oracle Corporation
     •   ##############################################################################
     •   ###########################################
     •   # Cluster Database
     •   ###########################################
     •   cluster_database=true
     •   ###########################################
     •   # Miscellaneous
     •   ###########################################
     •   diagnostic_dest=/home/oracle
     •   instance_type=asm
     •   ###########################################
     •   # Pools
     •   ###########################################
     •   large_pool_size=12M
     •   asm_diskgroups='DATA'
     •   +ASM2.instance_number=2
     •   +ASM1.instance_number=1
         ASM Alert Log
     •   Mon Aug 24 15:14:10 2009
     •   Starting ORACLE instance (normal)
     •   LICENSE_MAX_SESSION = 0
     •   LICENSE_SESSIONS_WARNING = 0
     •   Interface type 1 eth0 192.168.1.0 configured from OCR for use as a cluster interconnect
     •   Interface type 1 eth1 67.0.0.0 configured from OCR for use as a public interface
     •   Picked latch-free SCN scheme 2
     •   Using LOG_ARCHIVE_DEST_1 parameter default value as
         /home/oracle/oracle/product/11.1.0/rdbms/dbs/arch
     •   Autotune of undo retention is turned on.
     •   LICENSE_MAX_USERS = 0
     •   SYS auditing is disabled
     •   Starting up ORACLE RDBMS Version: 11.1.0.6.0.
     •   Using parameter settings in server-side pfile /home/oracle/oracle/product/11.1.0/rdbms/dbs/init+ASM1.ora
     •   System parameters with non-default values:
     •     large_pool_size       = 12M
     •     instance_type        = "asm"
     •     cluster_database       = TRUE
     •     instance_number         =1
     •     asm_diskgroups         = "DATA"
     •     diagnostic_dest       = "/home/oracle"
     •   Cluster communication is configured to use the following interface(s) for this instance
     •     192.168.1.161
     •   cluster interconnect IPC version:Oracle UDP/IP (generic)
     •   IPC Vendor 1 proto 2
             Yahoo! ASM Alert Log
       •   Sun May 4 00:19:05 2008
       •   kjbdomatt send to node 0 * One line for each node *
       •   kjbdomatt send to node 1
       •   kjbdomatt send to node 2
       •   NOTE: F1X0 found on disk 0 fcn 0.0
       •   NOTE: cache opening disk 1 of grp 2: DISK116 label:DISK116 * One line for each disk *
       •   NOTE: cache opening disk 2 of grp 2: DISK117 label:DISK117
       •   NOTE: attached to recovery domain 2
       •   Sun May 4 00:19:14 2008
       •   NOTE: recovering COD for group 1/0x8ccb7277 (DATA)       * Metadata for tracking long running trx *
       •   SUCCESS: completed COD recovery for group 1/0x8ccb7277 (DATA)
       •   Sun May 4 00:19:14 2008
       •   NOTE: opening chunk 14 at fcn 0.0 ABA
       •   NOTE: seq=2 blk=0
       •   Sun May 4 00:19:14 2008
       •   NOTE: cache mounting group 2/0x8CDB7278 (TEMP) succeeded
       •   SUCCESS: diskgroup TEMP was mounted
       •   Sun May 4 00:19:17 2008
       •   NOTE: recovering COD for group 2/0x8cdb7278 (TEMP)
       •   SUCCESS: completed COD recovery for group 2/0x8cdb7278 (TEMP)
       •   NOTE: enlarging ACD for group 1/0x8ccb7277 (DATA)
       •   Sun May 4 00:21:10 2008
       •   SUCCESS: ACD enlarged for group 1/0x8ccb7277 (DATA)            * Metadata REDO *
       •   NOTE: enlarging ACD for group 2/0x8cdb7278 (TEMP)
       •   SUCCESS: ACD enlarged for group 2/0x8cdb7278 (TEMP)
       ASM Data gathering
     • Please gather all files from the ASM bdump and udump directories
       covering the specified time frame of the problem - be sure to include
       alert logs for ALL ASM instances.
     • For Hang/Performance issues, please gather System state dumps from
       ASM instances
     • Please use the script below for querying ASM views, and provide the
       spooled output (each instance).
          set newpage none
          set feedback off
          set heading off
          set termout off
          column grp format 99
          column disk format 99999
          column lxn format 999
          column flg format 999
          column chk format 999
          spool asm
          select group_number as grp, name, state, type, total_mb, free_mb from v$asm_diskgroup;
          select rpad('>', 10, '>'), to_char(sysdate, 'MON DD HH24:MI:SS') from dual;
          select group_kfdat, number_kfdat, aunum_kfdat, v_kfdat, fnum_kfdat, i_kfdat, xnum_kfdat, raw_kfdat from x$kfdat;
          select rpad('>', 10, '>'), to_char(sysdate, 'MON DD HH24:MI:SS') from dual;
          select grp, disk, NUMBER_KFDPARTNER, PARITY_KFDPARTNER, ACTIVE_KFDPARTNER from x$kfdpartner;
          select rpad('>', 10, '>'), to_char(sysdate, 'MON DD HH24:MI:SS') from dual;
          select group_kffxp as grp, number_kffxp as num, incarn_kffxp as incarn, PXN_KFFXP, XNUM_KFFXP, LXN_KFFXP as lxn, DISK_KFFXP as
          disk, AU_KFFXP, FLAGS_KFFXP as flg, CHK_KFFXP as chk from x$kffxp;
          select rpad('>', 10, '>'), to_char(sysdate, 'MON DD HH24:MI:SS') from dual;
          set linesize 1500
          select GROUP_NUMBER, DISK_NUMBER, INCARNATION, MOUNT_STATUS, HEADER_STATUS, MODE_STATUS, STATE, LIBRARY,
          TOTAL_MB, FREE_MB, NAME, FAILGROUP, LABEL, PATH, CREATE_DATE, MOUNT_DATE, READS, WRITES, READ_ERRS,
          WRITE_ERRS, READ_TIME, WRITE_TIME, BYTES_READ, BYTES_WRITTEN from v$asm_disk;
          spool off
          exit
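The first data-gathering step (collecting trace files that cover the problem window) could be sketched as the shell function below. This is only an illustration: `collect_asm_traces` is a hypothetical helper, the directory argument stands in for the bdump or udump location, and it assumes a `find` that supports `-mmin` and a `tar` that supports `-T` (GNU and BSD versions do).

```sh
#!/bin/sh
# Hypothetical helper (not an Oracle tool): bundle the files under a trace
# directory that were modified within the last N minutes.
# Usage: collect_asm_traces <trace_dir> <minutes> <output_tarball>
collect_asm_traces() {
    trace_dir=$1
    minutes=$2
    out=$3
    list=$(mktemp) || return 1
    # Build the file list relative to the trace directory.
    (cd "$trace_dir" && find . -type f -mmin "-$minutes") > "$list"
    # Archive exactly the listed files; -C makes the names relative to trace_dir.
    tar -cf "$out" -C "$trace_dir" -T "$list"
    rc=$?
    rm -f "$list"
    return $rc
}

# Example (paths are placeholders; run once per ASM instance):
#   collect_asm_traces /path/to/bdump 120 /tmp/asm1_bdump.tar
```

Use an absolute path for the tarball and keep it outside the trace directory so the archive does not try to include itself.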
         ASM Troubleshooting Scenarios
     •     ASM space issues
          1.   ASM level errors
               •   ORA-15041
               •   ORA-15047
          2. RDBMS level errors when storage is on ASM
          3. Inconsistencies between the perceived and the actual available space
          4. Inconsistencies between V$ASM_DISKGROUP and X$ views
          Note #351117.1 - Information to gather when diagnosing ASM space issues contains
              scripts for collecting specific ASM information
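As a quick first look at a space issue, the spooled output of the V$ASM_DISKGROUP query from the data-gathering script can be summarised per diskgroup. The function below is a sketch, not part of Note 351117.1: `report_dg_free` is a hypothetical name, and it assumes rows in the order grp, name, state, type, total_mb, free_mb with headings off.

```sh
#!/bin/sh
# Hypothetical helper: read rows of
#   grp name state type total_mb free_mb
# on stdin and print "<diskgroup> <percent free>" for each diskgroup.
report_dg_free() {
    awk 'NF == 6 && $5 ~ /^[0-9]+$/ && $6 ~ /^[0-9]+$/ && $5 > 0 {
        printf "%s %.1f\n", $2, 100 * $6 / $5
    }'
}

# Example (asm.lst is the default SQL*Plus spool name for "spool asm"):
#   report_dg_free < asm.lst
```

The figure is raw free space. In a mirrored diskgroup the usable space is lower, so treat the percentage as an upper bound.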
         ASM Troubleshooting Scenarios
     •        ASM Disk Missing
         1.     Use OS utilities to determine which disk cannot be found
                Running truss or strace against the RBAL process while selecting from v$asm_disk often reveals errors on the
                device paths that ASM probes
                SESSION #1
                strace -f -o /tmp/rbal.trc -p <OS pid of RBAL process>
                  <OR>
                truss -ef -o /tmp/rbal.out -p <OS pid of RBAL process>
                SESSION #2
                select * from v$asm_disk;
                SESSION #3
                tail -f /tmp/rbal.trc      (or /tmp/rbal.out when using truss)
                Examine the trace file for errors:
                1147090: 1871929: chdir("dev/") = 0
                1147090: 1871929: statx("rhdisk8, ", 0x0FFFFFFFFFFFAA80, 176, 010) Err#2 ENOENT
                The ENOENT error says that rhdisk8 cannot be found
         2.     ORA-15063: ASM discovered an insufficient number of disks for diskgroup "%s"
                ORA-15040: diskgroup is incomplete
                ORA-15042: ASM disk "%s" is missing
         Note #452770.1 - ASM disk not found/visible/discovered issues
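The tracing steps above can be scripted in two small pieces: picking the ASM instance's RBAL pid out of `ps -ef` output, and filtering the resulting trace for failed path lookups. Both function names are hypothetical, and the error names matched (ENOENT, EACCES, EPERM) are an assumption about which failures matter most. Other errno values may be relevant on a given platform.

```sh
#!/bin/sh
# Hypothetical helper: read "ps -ef" output on stdin and print the pid of the
# ASM instance's RBAL process (asm_rbal_*). Database RBAL processes
# (ora_rbal_*) and the grep itself are ignored.
asm_rbal_pid() {
    awk '$NF ~ /^asm_rbal_/ { print $2 }'
}

# Hypothetical helper: print the lines of a strace/truss output file that
# report a missing or inaccessible path.
find_failed_paths() {
    grep -E 'ENOENT|EACCES|EPERM' -- "$1"
}

# Example:
#   strace -f -o /tmp/rbal.trc -p "$(ps -ef | asm_rbal_pid)"
#   find_failed_paths /tmp/rbal.trc
```

Matching on the process name avoids tracing the database's own `ora_rbal_*` process by mistake when both instances run on the same node.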
         ASM Troubleshooting Scenarios
     •         ASM is Unable to Detect ASMLIB Disks/Devices
          1.     Scan the disks (on all nodes if RAC):
                 dbaasm.us.oracle.com:+ASM:oracle:11g>/etc/init.d/oracleasm scandisks
                 Scanning system for ASM disks: [ OK ]
          2.     Make sure the disks can be listed:
                 dbaasm.us.oracle.com:+ASM:oracle:11g>/etc/init.d/oracleasm listdisks
                 VOL1_10G
                 VOL2_10G
          3.     Query each disk:
                 dbaasm.us.oracle.com:+ASM:oracle:11g>/etc/init.d/oracleasm querydisk VOL1_10G
                 Disk "VOL1_10G" is a valid ASM disk on device [3, 18]
                 dbaasm.us.oracle.com:+ASM:oracle:11g>/etc/init.d/oracleasm querydisk VOL2_10G
                 Disk "VOL2_10G" is a valid ASM disk on device [3, 22]
          4.     Check that they exist at the OS level:
                 dbaasm.us.oracle.com:+ASM:oracle:11g>ls -l /dev/oracleasm/disks/VOL1_10G
                 brw-rw---- 1 oracle dba 3, 18 Aug 13 09:54 /dev/oracleasm/disks/VOL1_10G
                 dbaasm.us.oracle.com:+ASM:oracle:11g>ls -l /dev/oracleasm/disks/VOL2_10G
                 brw-rw---- 1 oracle dba 3, 22 Aug 13 09:55 /dev/oracleasm/disks/VOL2_10G
          5.     In the initialization parameter file, set the disk discovery string as follows:
                 asm_diskstring = 'ORCL:*'
                 Note: You can also set it through DBCA (during diskgroup creation) by pressing the
                 [Change Disk Discovery Path] button.
         ASM Troubleshooting Scenarios
     •         ASM is Unable to Detect ASMLIB Disks/Devices (LINUX Specific)
          6.     If the problem persists, point the discovery string at the ASMLIB device nodes instead:
                 asm_diskstring = '/dev/oracleasm/disks/*'
                 This workaround is possible in Oracle 10g Release 2 and later, which can access block devices directly.
                 Oracle uses the O_DIRECT flag, which can be used when opening block devices to bypass the OS cache.
          7.     If the problem still persists, open a new service request with Oracle Support and provide the following
                 information (from all nodes if RAC).
                 Upload these files:
                 ================================
                 /var/log/messages
                 /etc/sysconfig/oracleasm (current version)
                 alert+ASM#.log for each instance
                 ================================
                 And the output of these commands:
                 ================================
                 $> cat /etc/*release
                 $> uname -a
                 $> rpm -qa |grep oracleasm
                 $> df -ha
                 $> ls -l /dev/oracleasm/disks
                 $> powermt display dev=emcpower# (On all the partitions if using PowerPath from EMC)
                 $> /etc/init.d/oracleasm status
                 $> /usr/sbin/oracleasm-discover
                 $> /usr/sbin/oracleasm-discover 'ORCL:*'
                 SQL> show parameter asm
                 ================================
          Note #457369.1 - ASM is Unable to Detect ASMLIB Disks/Devices
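The list-then-check part of the procedure (steps 2 and 4) lends itself to a small loop. The sketch below is an illustration only: `check_asmlib_disks` is a hypothetical function, it reads labels from stdin rather than calling `oracleasm` itself, and it only checks that a node with that name exists in the directory passed as an argument.

```sh
#!/bin/sh
# Hypothetical helper: read ASMLIB disk labels on stdin (one per line) and
# report whether each has an entry under the given directory.
# Returns non-zero if any label has no entry.
check_asmlib_disks() {
    disk_dir=$1
    missing=0
    while IFS= read -r label; do
        [ -n "$label" ] || continue
        if [ -e "$disk_dir/$label" ]; then
            echo "OK      $label"
        else
            echo "MISSING $label"
            missing=1
        fi
    done
    return $missing
}

# Example:
#   /etc/init.d/oracleasm listdisks | check_asmlib_disks /dev/oracleasm/disks
```

Run it on every node of a RAC cluster. A label that is listed on one node but reported MISSING on another suggests the scandisks step has not been run there.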
          ASM Instance Events
       • Applicable Event Levels (15xxx)
           •   Level 7 - DEBUG - Trace information for ASM/OSM debugging purposes only
           •   Level 6 - NLOOPS - Trace deeply nested loops within a function
           •   Level 5 - LOOPS - Trace loops within a function
           •   Level 4 - CALLS - Trace function call entry
           •   Level 3 - NORMAL - Trace normal paths within a function
           •   Level 2 - WARN - Trace warning paths within a function
           •   Level 1 - ERROR - Trace error paths within a function
           •   Kx 0x0000010 /* Array portion flags */
               Kxx 0x0000020 /* Alias-Directory operations */
               Kxx 0x0000040 /* Block validation interface */
               Kxx 0x0000080 /* metadata cache */
               Kxx 0x0000100 /* disk operations */
               Kxx 0x0000200 /* file operations */
               Kxx 0x0000400 /* disk group operations */
               Kxx 0x0000800 /* I/O layer (to ASMLIB or KSFD) */
               Kxx 0x0001000 /* node monitor (ie CSS interface) */
               Kxx 0x0002000 /* network layer (ie RDBMS-ASM connections) */
               Kxx 0x0004000 /* PLSQL package */
               Kxx 0x0008000 /* recovery */
               Kxx 0x0010000 /* templates */
               Kxx 0x0020000 /* SQL execution (processing ASM SQL commands) */
               Kxxx 0x0040000 /* ASM DBWR */
               Kxxx 0x0080000 /* ASM LGWR */
               Kxxx 0x0100000 /* I/O handles mirroring, striping, etc. */
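The hex values listed above each occupy a single bit, which suggests they are meant to be combined into one mask. Assuming they combine by bitwise OR (the slide does not say how the mask is passed to an event), the arithmetic can be sketched as follows. `combine_flags` is a hypothetical helper.

```sh
#!/bin/sh
# Hypothetical helper: OR together the hex flag values given as arguments and
# print the resulting mask in the same 7-digit form the slide uses.
combine_flags() {
    mask=0
    for flag in "$@"; do
        mask=$(( mask | $flag ))
    done
    printf '0x%07X\n' "$mask"
}

# Example: disk operations (0x0000100) plus the I/O layer (0x0000800)
#   combine_flags 0x0000100 0x0000800
```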
         ASM Instance Tracing
     • Trace RBAL process
     •   [oracle@rac1 ~]$ ps -ef | grep rbal
     •   oracle 7745 1 0 09:24 ?           00:00:02 asm_rbal_+ASM1
     •   oracle 9255 1 0 09:27 ?           00:00:00 ora_rbal_whsed1
     •   oracle 9971 5367 0 11:31 pts/1 00:00:00 grep rbal
     •   [oracle@rac1 ~]$ strace -f -o /tmp/rbal.trc -p 7745
     •   Process 7745 attached - interrupt to quit
     •   Process 7745 detached
     •   more /tmp/rbal.trc
     •   7745 semtimedop(163842, 0xbfb973f4, 1, {2, 350000000}) = -1 EAGAIN (Resource temporarily unavailable)
     •   7745 gettimeofday({1251917133, 714243}, NULL) = 0
     •   7745 gettimeofday({1251917133, 714337}, NULL) = 0
     •   7745 gettimeofday({1251917133, 714395}, NULL) = 0
     •   7745 getrusage(RUSAGE_SELF, {ru_utime={2, 79683}, ru_stime={1, 81835}, ...}) =
     •   7745 sendmsg(13, {msg_name(16)={sa_family=AF_INET, sin_port=htons(32963), sin_addr=inet_addr("192.168.1.162")}, msg_iov(2)=[{"\4\3\2\1\327\263\200\0\0\0\0\0MRON\0\1\0\0\220\0\0\0\1"..., 68}, {"KSXP\2\0\0\0\1\0\2\0\20\0\0\0\4\0\0\0\0\0\0\0\0\0\0\0r"..., 144}], msg_controllen=0, msg_flags=0}, 0) = 212
     "buffer busy" or "rdbms ipc reply" events
         ASM Rebalancing
     •     Rebalancing is the activity of spreading data amongst disks in
           an ASM group
     •     Happens in the background but can be done manually
     •     Internally the rebalance proceeds file by file
     •     Only one RBAL process runs per node
     •     Rebalance requests on the same diskgroup are processed serially
     •     ASM decides how best to balance load across available disks
     •     Uses one of three allocation schemes for selecting disks
          1. Placement by file/extent number
          2. Random-seeded ordering of all disks in the ASM disk directory
          3. Balanced placement over all disks
       ASM Rebalancing
     •        Parallel execution based on rebalance POWER
          •    POWER settings are 0-11 (default 1)
          •    Used to throttle overhead during normal operations
          •    Rebalance moves 1 MB chunks at a time
          •    Setting POWER to 0 defers rebalancing to another time
         ASM Rebalancing
          Displaying & changing rebalance POWER setting
     •        SQL> show parameter limit
              NAME                                 TYPE          VALUE
              ------------------------------------ ----------- ------
              asm_power_limit                          integer 1
     •        Changing setting
          •       SQL> alter diskgroup dg1 rebalance power 8;
     •        Verifying Change
          •       SQL> select * from v$asm_operation;
                  GROUP_NUMBER OPERA STAT      POWER     ACTUAL      SOFAR   EST_WORK   EST_RATE
                  ------------ ----- ---- ---------- ---------- ---------- ---------- ----------
                             1 REBAL RUN           8          8          0        407          0
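To follow a rebalance over time, the SOFAR and EST_WORK columns can be turned into a rough percentage. The function below is a sketch: `rebalance_progress` is a hypothetical name, it assumes rows in the column order shown above with headings off, and it treats EST_WORK as the total number of allocation units to move, which is an estimate that ASM may revise while the operation runs.

```sh
#!/bin/sh
# Hypothetical helper: read v$asm_operation rows on stdin, in the order
#   GROUP_NUMBER OPERATION STATE POWER ACTUAL SOFAR EST_WORK EST_RATE
# and print approximate rebalance progress per diskgroup.
rebalance_progress() {
    awk '$2 == "REBAL" && $7 > 0 {
        printf "group %s: %.1f%% of %d AUs\n", $1, 100 * $6 / $7, $7
    }'
}

# Example: feed it the spooled output of "select * from v$asm_operation;"
#   rebalance_progress < asm_operation.lst
```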
         ASM AU/Extent Management
     •   Allocation Units (AU) at the disk level and Extents at the file level
     •   Default AU size is 1 MB
     •   Default extent size is 1 MB
     •   Extents are allocated in 1, 4, 16, & 64 MB chunks (11g)
     •   Extent placement is circular when disks are the same size
     •   AU size cannot be changed without recreating the diskgroup
     •   Templates can be created and added to diskgroups
            ASM Performance Considerations
     • Metadata ONLY is Cached In The ASM Instance
     • ASM Diskgroup Configuration
         • External Redundancy
         • Normal Redundancy (default)
         • High Redundancy
     • ASM Instance Configuration (large_pool_size)
         • Resolving ORA-4031
     •   ASM Allocation Unit Size (1 MB default)
     •   ASM Fine Grained Stripe Size (8 x 128 KB Stripes)
     •   MAX I/O Size
     •   Oracle Block Size
            ASM Default Template
     •   Archivelog Files - Coarse
     •   Autobackup       - Coarse
     •   Controlfile      - Fine Grained
     •   Datafile         - Coarse
     •   Flashback data - Fine Grained
     •   Online REDO - Fine Grained
     •   SPFILE           - Coarse
     •   Tempfile          - Coarse
     Coarse – 1 MB stripe size
     Fine Grained – 8 x 128 KB stripes
            ASM Templates
     • Striping Attributes – Fine, Coarse
     • Redundancy Attributes
         • Mirror – 2 way
         • High – 3 way
         • Unprotected – Not mirrored
            ASM Templates
     • Viewing Template
         • select * from V$ASM_TEMPLATE;
     • Altering Template
         • Alter diskgroup DG modify template NAME attributes (coarse/fine);
     • Adding Template
         • Alter diskgroup DG add template NAME attributes (attributes);
     • Dropping Templates
         • Alter diskgroup DG drop template NAME;
            ASM Background Processes
  •   ora_asmb_whsed1 - Database-side ASMB process; it connects to ASM, where a foreground services commands from client <procname> of the database
  •   asm_pmon_+ASM1 - Process monitor, same as database
  •   asm_vktm_+ASM1 - Process to maintain a fast timer, same as database
  •   asm_diag_+ASM1 - Diag process, same as database
  •   asm_ping_+ASM1 - Process to measure network latency, same as database
  •   asm_psp0_+ASM1 - Process that Starts other Processes, used to startup other backgrounds
  •   asm_dia0_+ASM1 - Diag slave process, same as database
  •   asm_lmon_+ASM1 - Lock monitor, Same as database
  •   asm_lmd0_+ASM1 - Lock manager daemon (global enqueue service), same as database
  •   asm_lms0_+ASM1 - Lock manager server (global cache service), same as database
  •   asm_mman_+ASM1 - Autotune SGA process, Same as Database.
  •   asm_dbw0_+ASM1 - DB writer, same as database DB writer, but deals with ASM cache
  •   asm_lgwr_+ASM1 - Log writer, similar to database, but deals with diskgroups
  •   asm_ckpt_+ASM1 - Checkpoint process, Similar to database CKPT
  •   asm_smon_+ASM1 - Recovery process, Same as database SMON, but deals with diskgroup recovery
  •   asm_rbal_+ASM1 - Background process that is used for diskgroup management
  •   asm_gmon_+ASM1 - Group monitor, used for partner and status table, and node membership
  •   asm_lck0_+ASM1 - Lock monitor slave, Same as database
         ASM Views (10g & 11G)
          View                                    Contents
V$ASM_ALIAS                  Aliases in each disk group mounted by the ASM
                             instance
V$ASM_CLIENT                 Identifies databases using disk groups managed by
                             the ASM instance.
V$ASM_DISK                   Disks discovered by the ASM instance
V$ASM_DISKGROUP              Disk groups known by the ASM instance
V$ASM_FILE                   File list for each disk group mounted by the ASM
                             instance
V$ASM_OPERATION              Long running operations executing in the ASM
                             instance
V$ASM_TEMPLATE               Templates present in each ASM mounted disk group
      ASMCMD Command Reference
  • cd - Changes the current directory to the specified directory.
  • du - Displays the total disk space occupied by ASM files in the specified
         ASM directory
  • exit - Exits ASMCMD.
  • find - Lists the paths of the specified name (with wildcards) under the
           specified directory.
  • help - Displays the syntax and description of ASMCMD commands.
  • ls - Lists the contents of an ASM directory, the attributes of the specified file, or the names and attributes
         of all disk groups.
  • lsct - Lists information about current ASM clients.
  • lsdg - Lists all disk groups and their attributes.
  • mkalias - Creates an alias for a system-generated filename.
  • mkdir - Creates ASM directory.
  • pwd - Displays the path of the current ASM directory.
  • rm - Deletes the specified ASM files or directories.
  • rmalias - Deletes the specified alias, retaining the file that the alias
              points to.
         New 11g ASM Commands
   cp - Enables you to copy files between ASM disk groups on local instances and
   remote instances.
   lsdsk - Lists ASM disk information with or without a running ASM instance. Also
   useful for system or storage administrators to obtain lists of disks that
   an ASM instance uses.
   md_backup and md_restore - These commands enable you to re-create a pre-existing ASM
   disk group with the same disk path, disk name, failure groups, attributes, templates and alias
   directory structure. You can use md_backup to back up the disk group environment and use
   md_restore to re-create the disk group before loading from a database backup.
   remap - You can remap and recover bad blocks on an ASM disk in normal or high redundancy
   that have been reported by storage management tools such as disk scrubbers. ASM reads from
   the good copy of an ASM mirror and rewrites these blocks to an alternate location on disk.
         MySupport ASM References
    Note 340417.1 - Data Gathering for Troubleshooting ASM Issues
    Note 267982.1 - Automatic Storage Management (ASM) Knowledge Browser Product Page
    Note 824354.1 - How To Trace ASMCMD on Unix
    Note 351866.1 - How To Reclaim ASM Disk Space
    Note 345180.1 - How to duplicate a controlfile when ASM is involved
    Note 553319.1 - ORA-15036 When Starting An ASM Instance