Troubleshooting Backup - Oracle iDataAgent


Backup Failures

The following section provides information on troubleshooting backups.

Increase in sbtio.log file size

Jobs can fail when the sbtio.log file in the $UDUMP directory grows too large.

To resolve this, set a size limit for the sbtio.log file using the sMAXORASBTIOLOGFILESIZE registry key. Once the file reaches the specified limit, it is pruned automatically.
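To check how large the file currently is before a job runs, a small helper along the following lines may be useful. This is only a sketch: the function name, the example path, and the threshold are assumptions chosen for illustration, and the pruning itself is still done by the software once the registry key is set.

```python
import os

def exceeds_limit(path, limit_mb):
    """Return True if the file at `path` is larger than `limit_mb` megabytes."""
    size_bytes = os.path.getsize(path)
    return size_bytes > limit_mb * 1024 * 1024
```

For example, `exceeds_limit("/path/to/udump/sbtio.log", 10)` reports whether the file has grown beyond 10 MB; substitute the actual $UDUMP location for the placeholder path.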

Command Line Backup Failures
  • Make sure that the required media resource is available, and then run the backup again.
  • For on demand backups, you can run more than one script for an instance. However, backup jobs fail if the argument file contains more than one instance.
  • For Oracle on Windows, do not leave a space after a comma in the argument file. A backup job fails if the argument file contains a space after a comma.
  • RMAN command line backup fails with the following error:

    "Unable to open lock file /opt/calypso/Base/Temp/locks/.dir_lock: Permission denied"

    This may occur if the umask parameter is set to 022 in the .profile file for the Oracle instance. As a workaround, change the umask to 000 or 002 and try the backup again.
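Two of the checks above can be automated before a command line backup is submitted. The following is a minimal sketch, not part of the product: the function names are hypothetical, the argument file is treated as plain text with no knowledge of its real layout, and the umask check assumes that the relevant difference between 022 and 002 or 000 is the group-write bit.

```python
import os
import re

def lines_with_space_after_comma(text):
    """Return 1-based line numbers where a comma is followed by a space or tab."""
    return [n for n, line in enumerate(text.splitlines(), start=1)
            if re.search(r",[ \t]", line)]

def umask_removes_group_write(mask):
    """Return True if `mask` clears the group-write permission bit (0o020)."""
    return bool(mask & 0o020)

def current_umask():
    """Read the process umask. os.umask() only reads by setting, so restore it."""
    old = os.umask(0)
    os.umask(old)
    return old
```

With these helpers, `umask_removes_group_write(0o022)` is True, while the same call with `0o002` or `0o000` is False, which matches the workaround described above.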

Third-party command line jobs may hang when you perform large backups or restores. This happens because ClOraControlAgent updates the job manager after every 100 MB of data transferred, which causes a thread failure for large backups or restores after some of the data has been transferred. The following exception appears in the clOraControlAgent.log file:

5710030 304 02/22 03:47:23 608119 OraAgentBase::NotifyCommServeJobContinue() - m_jobObject->setUnCompBytesToAdd(105119744) ...
5710030 304 02/22 03:47:24 608119 CvThread::start_func() - Unhandled exception.
5710030 405 02/22 03:47:37 608119 ClOraControlAgent::OnClientTimeout() - Got timed out while waiting for msg from client 0
 

To avoid this, set the sBYTESDIFFMBS registry key to a <value> in MB in the OracleAgent/.properties file. The job manager is then updated after every <value> MB of data transferred.
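To get a feel for how much a larger interval reduces the load on the job manager, the number of updates can be estimated with simple arithmetic. The helper below is an illustrative sketch; its name and the rounding (one update per started interval) are assumptions, not documented behavior.

```python
import math

def job_manager_updates(total_mb, interval_mb):
    """Approximate number of updates sent while transferring `total_mb` of data."""
    if interval_mb <= 0:
        raise ValueError("interval_mb must be positive")
    return math.ceil(total_mb / interval_mb)
```

Under these assumptions, a 1 TB transfer (1048576 MB) produces about 10486 updates at the default 100 MB interval, and 1024 updates if the key is set to 1024.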

Offline backups fail when using lights out script
  • Offline backups using the lights out script fail with the following error:

    RMAN error "ORA-12528: TNS:listener: all appropriate instances are blocking new connections"

    As a workaround, add a reference to the database in the listener.ora file as shown in the example below:

    SID_LIST_LISTENER =
      (SID_LIST =
        (SID_DESC =
          (SID_NAME = PLSExtProc)
          (ORACLE_HOME = C:\oracle\product\10.1.0\db_1)
          (PROGRAM = extproc)
        )
        (SID_DESC =
          (SID_NAME = rman10g)
          (ORACLE_HOME = C:\oracle\product\10.1.0\db_1)
          (SID = rman10g)
        )
      )

  • Oracle offline backup with the lights out option fails when you use the default number of retry attempts for the subclient. As a workaround, increase the retry attempts by setting the Tries number value to 5 or greater. See Configuring Lights Out Script for Offline Backups for more details.
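Before rerunning the offline backup, it can help to confirm that the database really has a static entry in listener.ora. The following sketch scans the file text for SID_NAME values. The function names are hypothetical, and the parsing is deliberately simple: it does not skip commented-out lines or validate the nesting of parentheses.

```python
import re

def registered_sids(listener_ora_text):
    """Return the SID_NAME values found in a listener.ora text, in order."""
    return re.findall(r"\(\s*SID_NAME\s*=\s*([^\s)]+)\s*\)",
                      listener_ora_text, flags=re.IGNORECASE)

def has_static_entry(listener_ora_text, sid):
    """Return True if `sid` appears as a SID_NAME (case-insensitive match)."""
    return sid.lower() in (s.lower() for s in registered_sids(listener_ora_text))
```

Reading the file with `open(path).read()` and calling `has_static_entry(text, "rman10g")` indicates whether the reference from the example above is present.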

Time Out Failures

The default time for resources to allocate streams during RMAN command line backups is 86400 seconds (24 hours). If a backup fails because this timeout is reached, you can configure the sALLOCATESTREAMSECS registry key to increase the waiting period.

Backup Failures
  • If the following line is present in the $ORACLE_HOME/sqlplus/admin/glogin.sql file, it may cause the SrvOraAgent server process on the CommServe to fail when browsing database contents or executing a backup.

    set linesize 80

    To avoid such failures, comment out that line in the file and retry the browse or backup operation.

  • Backup fails with the following error:

    Character conversion not supported

    By default, the NLS_LANG variable on the client computer is set to the American_America.US7ASCII character set. If the Oracle instance uses an NLS_LANG value other than American_America.US7ASCII, Oracle backup operations fail.

    In such cases, use the <oracle_SID>_NLS_LANG registry key to set the NLS_LANG environment variable on the client computer to the value used by the Oracle instance.
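The glogin.sql change described in the first item above can be scripted. The sketch below comments out active `set linesize` lines using the SQL*Plus `--` comment prefix. The function name is hypothetical, abbreviated forms of the command are not handled, and the original file should be backed up before the result is written back.

```python
import re

def comment_out_linesize(sql_text):
    """Prefix every active `set linesize ...` line with `-- `; keep other lines as they are."""
    pattern = re.compile(r"^([ \t]*)(set\s+linesize\b.*)$",
                         re.IGNORECASE | re.MULTILINE)
    return pattern.sub(r"\1-- \2", sql_text)
```

Lines that are already commented out do not match the pattern, so running the helper twice leaves the text unchanged after the first pass.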

Backup Fails on Red Hat Enterprise Linux 4 with Oracle version 10.1.0.5 32-bit

Issue:

The backup may fail with the following error on Red Hat Enterprise Linux 4 with Oracle version 10.1.0.5 32-bit because of a known Oracle issue with the libunwind.so.3 file:

channel ch1: starting piece 1 at Jul 12 2013 16:46:08
PID 30152, signal 6 (Aborted), address 0x75c8
[bt]: (1) /lib/tls/libpthread.so.0 [0x622890]
[bt]: (2) /lib/ld-linux.so.2 [0x3b07a2]
[bt]: (3) /lib/tls/libc.so.6(gsignal+0x55) [0x3f57a5]
[bt]: (4) /lib/tls/libc.so.6(abort+0xe9) [0x3f7209]
[bt]: (5) /soft/oracle/product/db/10.1.0.5/lib/libunwind.so.3(GetCurrentFrame32+0xdc) [0xb7ffd0ce]
[bt]: (6) /soft/oracle/product/db/10.1.0.5/lib/libunwind.so.3(_Unwind_RaiseException+0x5b) [0xb7ffc86b]
[bt]: (7) ./libstdc++.so.6(__cxa_throw+0x5d) [0xb60a126d]
[bt]: (8) ./libCvLib.so(_ZN10CvFwDaemonC1EPKcbii+0x2ee) [0xb6207c00]
[bt]: (9) ./libCvLib.so(_ZN10CvFwClient7connectEPKcS1_iiiiPFvR9CQiSocketPvES4_b+0xf6f) [0xb6211acb]
[bt]: (10) ./libCvSession.so(_ZN9CVSession16socketConnectionEPKcS1_+0x261) [0xb72cd4f1]
[bt]: (11) ./libCvSession.so(_ZN9CVSession9getSocketEPKcS1_+0x135) [0xb72cddd5]
[bt]: (12) ./libCvSession.so(_ZN9CVSession13getConnectionEPKvPKc+0x11b) [0xb72cdf1b]
[bt]: (51) oracleHWRHDEV(main+0xbb) [0x82816bf]
[bt]: (52) /lib/tls/libc.so.6(__libc_start_main+0xd3) [0x3e2de3]
[bt]: (53) oracleHWRHDEV(ldxsto+0x1d1) [0x828157d]
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of backup command on ch1 channel at 07/12/2013 16:46:33
RMAN-10038: database session for channel ch1 terminated unexpectedly
RMAN>
Recovery Manager complete.

Resolution:

Upgrade Oracle from version 10.1.x to 10.2 to avoid the backup failure on Red Hat Enterprise Linux 4.

Database block corruption

Oracle backups fail with the following error:

LISTING 2: r_20030520213618.log

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of backup command on d1 channel at 05/20/2003 21:36:26
ORA-19566: exceeded limit of 0 corrupt blocks for file /u01/app/Oracle/oradata/MRP/sales_data_01.dbf

Make sure that a maximum number of allowed database block corruptions is set for the backup. Set this value to match the number of corrupted database blocks that RMAN identified in the database file being backed up.
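To find out which datafile hit the limit and what the limit was, the ORA-19566 message can be extracted from the RMAN log. The following is a minimal sketch with a hypothetical function name. It relies only on the message wording shown in the listing above and tolerates the file name being wrapped onto a following line.

```python
import re

def parse_ora_19566(log_text):
    """Return (limit, datafile) pairs for each ORA-19566 message in `log_text`."""
    pattern = re.compile(
        r"ORA-19566: exceeded limit of (\d+) corrupt blocks for file\s+(\S+)")
    return [(int(limit), path) for limit, path in pattern.findall(log_text)]
```

For the listing above, the helper returns a single pair with a limit of 0 and the path of sales_data_01.dbf.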

Backups fail intermittently on Linux clients

On Linux clients, backups may fail if the libobk.so library fails to load.

As a workaround, perform the following steps:

  1. Log in to the Oracle client computer as root.
  2. From the system prompt, enter the following command:

    ldconfig /<Base_directory_name>

    For example: # ldconfig <software installation path>/Base

This will ensure that the libobk.so library is loaded so that backups for Oracle on Linux can run successfully.
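A quick way to confirm the result of the steps above is to ask the system library search whether it can locate the library. The sketch below uses `ctypes.util.find_library` from the Python standard library, which on Linux consults the ldconfig cache among other sources. The function name is hypothetical, and the check reflects the environment of the Python process, which is not necessarily the environment that Oracle runs in.

```python
from ctypes.util import find_library

def library_is_visible(short_name):
    """Return True if the system library search can locate lib<short_name>."""
    return find_library(short_name) is not None
```

Calling `library_is_visible("obk")` after running ldconfig gives an indication of whether libobk.so can now be found.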

Backup fails on Windows Clients

Make sure that the Oracle user is part of the administrator group. If the user is not part of the administrator group, assign permissions to the user as follows:
  1. From Windows Explorer, right-click the Calypso folder and then select Properties.
  2. Click the Security tab.
  3. Select the user and click Edit.
  4. Select the Allow check box for the Full Control permission for the user, and then click OK.
  5. From the Registry Editor, navigate to HKEY_LOCAL_MACHINE | SOFTWARE.
  6. Right-click CommVault Systems and select Permissions.
  7. Select the user and select the Allow check box for the Full Control permission.

Log backup failures
  • If the Oracle database is configured to save the archive logs in the Flash Recovery Area, and an Oracle subclient has both Protect backup recovery area and Archive Delete enabled at the same time, the backup fails.

    To resolve this, use two different subclients, one for Protect backup recovery area and the other for Archive Delete.

  • Log backup fails if you select the default USE_DB_RECOVERY_FILE_DEST entry as a log destination for the backup.

    To resolve this, make sure that the log destinations are included in the PFile (init<SID>.ora) or SPFile (spfile.ora) file. Also make sure that the correct log destination is selected for the backup.

Backup fails on Linux clients because of UNKNOWN Instance Status

Backups may fail on Linux clients if the Oracle instance status is shown as UNKNOWN in the CommCell Console.

To resolve this issue, make sure that the nproc value in the /etc/security/limits.d/90-nproc.conf file is greater than 1024.
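The nproc setting can be checked by reading the file and looking at its nproc lines. The sketch below assumes the standard limits.conf layout of four whitespace-separated fields (domain, type, item, value). The function names are hypothetical, and deciding which domain applies to the Oracle user is left to the reader.

```python
def nproc_entries(conf_text):
    """Return (domain, limit_type, value) for each nproc line, skipping comments."""
    entries = []
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()
        fields = line.split()
        if len(fields) == 4 and fields[2] == "nproc":
            entries.append((fields[0], fields[1], fields[3]))
    return entries

def value_above(value, minimum=1024):
    """Return True if a limits value is numerically greater than `minimum`.

    "unlimited", "infinity" and "-1" mean no limit and count as above any minimum.
    """
    if value in ("unlimited", "infinity", "-1"):
        return True
    return value.isdigit() and int(value) > minimum
```

An entry such as `* soft nproc 1024` fails the check because the value must be greater than 1024, not equal to it.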

Shared Memory Error

Issue:

The backup failed because the shared memory on the HP-UX PA-RISC client has not been configured per operational guidelines.

Resolution:

Add the DisableIPC_GLOBAL file in the /apps/simpana/Base directory on the client where the backup failed.

  1. Stop the Calypso software.
  2. Create an empty file called DisableIPC_GLOBAL in the /apps/simpana/Base directory. From the command line, enter the following:

    touch /apps/simpana/Base/DisableIPC_GLOBAL

  3. Restart the Calypso software.

Troubleshooting Performance Issues

If you are experiencing performance issues during backup, you can troubleshoot them by enabling logging of performance details in the log files. These performance counters contain information that helps in resolving performance-related issues during backups.

  1. Use the following registry key to display the performance details for a specific backup job.

    Registry Key: sORASBTPERFSTAT

    Location:
    • For Windows: HKEY_LOCAL_MACHINE\Software\CommVault Systems\Galaxy\Instance<xxx>\OracleAgent
    • For Unix: /etc/CommVaultRegistry/Galaxy/Instance<xxx>/OracleAgent/.properties

    Supported Values: Y or Yes to enable.

    The following performance counters will be printed in the log files:

    Total Oracle I/O Time

    Time spent per SBT thread for reading the data from disk.

    Total MA I/O Time

    Time spent during data transfer to MediaAgent i.e., data read from the network buffer and written to the disk.

  2. Perform a client backup to determine the performance statistics. To perform a backup, see Getting Started Backup - Oracle iDataAgent for step-by-step instructions.

    You can track the progress of the job from the Job Controller window of the CommCell Console.

  3. View the log files of the backup job to verify the performance counters. See View the Log Files of a Job History for step-by-step instructions.
  4. In the log file verify the above performance counters.

    If the Total Oracle I/O Time value is more than the Total MA I/O Time value then perform the following to improve performance:

    If the Total Oracle I/O Time value is less than the Total MA I/O Time value then perform the following to improve performance:

Completed with One or More Errors

Backup jobs from Oracle iDataAgent will be displayed as "Completed w/ one or more errors" in the Job History in the following cases:

Oracle Errors

If you receive an Oracle error during an Oracle backup operation, we recommend that you follow procedures published by Oracle Corporation on resolving the specific error. We also advise you to consult with your on-site Oracle database administrator, as needed.