Job Management

The Job Controller window displays all the current jobs in the CommCell. A status bar at the bottom of the Job Controller shows the total number of jobs; the number of jobs that are running, pending, waiting, queued and suspended; and the high and low watermarks. The watermarks indicate the minimum and maximum number of streams that the Job Manager can use simultaneously.
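
As a rough illustration of the counts shown in the status bar, the following sketch tallies a list of job statuses. The function name, the dictionary layout, and the sample data are assumptions made for this example; they are not part of the CommCell Console or any product API.

```python
from collections import Counter

# Statuses that the status bar is described as counting separately.
STATUS_BAR_STATES = ("Running", "Pending", "Waiting", "Queued", "Suspended")


def summarize_jobs(statuses):
    """Return the total job count plus one count per status-bar state."""
    counts = Counter(statuses)
    summary = {"Total": len(statuses)}
    for state in STATUS_BAR_STATES:
        summary[state] = counts.get(state, 0)
    return summary


# Hypothetical snapshot of job statuses, for illustration only.
snapshot = ["Running", "Running", "Waiting", "Queued", "Suspended", "Kill Pending"]
print(summarize_jobs(snapshot))
```

Jobs in other states (for example, Kill Pending) still count toward the total but have no separate counter in this sketch.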

Viewing Job Information

Information about a job is continually updated and is available in the Job Controller or Job History window. A finished job stays in the Job Controller for five minutes. Once a job has finished, more information about it is available from the Job History.

The following job information is displayed, depending on the selected Job History:

Job ID

A unique number allocated by the Job Manager that identifies the data protection, data recovery, or administration operation.

Operation

The type of data protection, data recovery, or administration operation being conducted.

Client/Client Computer

For data protection operations, the client computer to which the backup set and subclient belong. For data recovery operations, the computer from which the data originated.

Destination Client

The destination client to which the recovered data will be stored.

Agent Type

The agent that is performing the operation (e.g., Windows 2000 File System).

Instance/Partition

The instance/partition in the client computer that represents the database that was included in this operation.

Subclient

The subclient that was protected during the operation. Note that a deleted subclient will have a Unix time stamp appended to its name in cases where another subclient is currently using the same name as the deleted subclient.

Job Type

The type of operation that is being conducted on data.

Backup Type

The type of backup that was conducted: Differential, Full, Incremental or Synthetic.

Failed Folders

The number of folders that were not included in the operation.

Failed Files

The number of files that were not included in the operation.

Storage Policy

The storage policy to which the operation is being directed.

MediaAgent

The MediaAgent to which the operation is being directed.

Status

The status of the operation. For job status descriptions, see Job Status Levels.

Progress

A progress bar indicating the progress of the operation. The progress bar is not visible for certain operations (e.g., data aging) or for the initial phases of some data protection operations.

The Job Controller progress bar will not display the progress of SAP for MAXDB backup and restore jobs accurately. This is true because Calypso cannot detect data or objects transferred by SAP for MAXDB due to the way SAP for MAXDB transfers these items.

Errors

Displays any errors that have occurred during the operation, such as a hardware problem or a job that has run outside of an operation window. (See Job Errors for more information.)

Backup Set

The backup set that was protected/recovered during the operation and to which the subclient belongs.

Index

Displays New Index to indicate a new index was created during the operation. If blank, a new index was not created.

Phase

The current phase of the operation. The number of phases varies depending on the operation.

User Name

The name of the user who initiated the operation.

Priority

The priority that is assigned to the operation. (For more information, see Job Priorities and Priority Precedence.)

Start/Start Time

The date and time on the CommServe when the operation started.

End Time

The date and time on the CommServe when the operation was completed.

Elapsed

The duration of time consumed by the operation.

Libraries

The libraries that are being used by the operation.

Drives/Mount Paths

The drives/mount paths that are being used by the operation. For more information about media, see Media Operations.

Last Update Time

The last time the Job Manager received job updates for the operation.

Transferred

The amount of data that has been transferred for the operation at the present time.

Estimated Completion Time

The time that the system estimates for this job to be completed.

Size on Media

The amount of compressed data that was transferred to the media (excluding duplicated data).

The amount displayed is a compressed amount and includes valid and invalid attempts of the backup jobs.
Application data that is backed up may include sparse files. As a result, the displayed size of the data may be greater than expected.
If viewing from the storage policy copy level, the amount displayed may be less if the job is partially copied.

Size of Application

The amount of the application data that has been protected.

Application data that is backed up may include sparse files. As a result, the displayed size of the data may be greater than expected.
If the job has completed with multiple attempts, the amount displayed may be larger.

Size of Backup

The amount of compressed data that has been protected, which includes all application data and metadata.

Content Indexed

Displays whether content indexing was used for the operation.
If viewing job history data from:

Versions 6.1.0 and prior: Yes or No
Versions 7.0.0 and later: Full, Partial, or No

Note that if a job is displayed as partially content indexed, not all of the data protected in the job was content indexed successfully. Rerun content indexing on this job so that the protected data is fully content indexed.

Delay Reason

The description of the reason why the operation may be pending, waiting, or failing.

Alert

The name of the job-based alert, if configured for the job.

Job Initiation

The origin of the operation: the CommCell Console (Interactive), a schedule (Scheduled), or a third party interface (Third Party).

Maximum Number of Readers

The maximum number of readers that can be used for the operation.

Automated Content Classification Policy

Name of the Automated Content Classification Policy.

Legal Hold Name

The Name specified for the Legal Hold data.

Legal Hold Retention Time

The time frame for which the Legal Hold Data will be retained.

Number of Readers in Use

The number of readers currently in use for the operation.

Number of Objects

The total number of objects including successful, failed and skipped.

For a Unix File System iDataAgent backup job that includes hard links and for which the HLINK registry key is set to Y and the appropriate hard link updates are applied, the value in this field will also account for the number of hard links and hard link groups that were backed up.

See the Service Pack documentation for more information on hard link updates.

Restart Interval

The amount of time the Job Manager will wait before restarting a job that has gone into a pending state. This is set in the Job Management (Job Restarts) tab.

Max Restarts

The maximum number of times the job will be restarted after a phase of the job has failed. This is set in the Job Management (Job Restarts) tab.

Error Code

Error Code for job pending or job failure reason. (See Job Errors for more information.)

Retained By

The type of retention rules defined for the job, basic or extended. For more information, see Data Aging.

Description

A brief description of the running job.

The Pause and Play buttons allow you to control how the Job Controller displays real-time information from active jobs. The Pause button stops the Job Controller from displaying real-time information collected from jobs. The Play button allows the Job Controller to display real-time job updates.

To see all the columns in the Job Controller window, use the scroll bar at the bottom of the window.

Job Errors

If a job has not completed successfully, the Error Code column will display a unique code linking to available troubleshooting and knowledgebase article(s) relevant to that error from the customer support website. These articles may include special considerations for the type(s) of job(s) you are running, suggested workarounds for issues, and common causes for that particular error.

If an error code pertains to more than one issue, the customer support website will display links to all articles for which the code is relevant. Conversely, if an error code does not have any articles associated with it, the customer support website will display a message indicating that no articles exist for that code.

Error codes may also be obtained from several other windows and dialog boxes, including:

The Job History windows
The Job Summary Report
Events
Alerts

Note the following when obtaining troubleshooting articles using error codes:

The Error Code field will only contain a code if a job has not completed successfully.
In the Job History windows and Job Summary Report, you can access troubleshooting articles by clicking the linked error code.
In the Events and Alerts windows, error codes do not provide direct links to troubleshooting articles. However, you can search the customer support website for related articles by typing the appropriate error code in the search field.

Note that jobs which fail Data Integrity Validation will be moved to pending status. Review the error code and description of the pending job from the job controller to identify the reason for failure. See Data Integrity Validation - Troubleshoot for troubleshooting Data Integrity validation errors.

For step-by-step instructions on viewing information about job errors, see View Troubleshooting Article(s) Available from the Customer Support Website.

Flags

The Job Controller window also provides a Flags column, which is located on the left-hand side of the Job Controller window. The Flags column displays an icon for any running jobs that encounter one of the following scenarios:

A required media cannot be found in the library. This scenario requires user intervention for the job to complete successfully.
The job has not sent an update (such as bytes or files received) in over 60 minutes. This scenario may or may not require user intervention; for example, if the delay in receipt of updates is caused by insufficient network bandwidth, the job may complete successfully once additional network bandwidth is available. Conversely, if the delay in receipt of updates is caused by a hardware issue, the job will not complete successfully until the user has resolved the hardware issue.

The job is a high-priority job with a priority level of less than 100.

In order to activate this flag, the JobHighPriorityMarkEnable entry must be configured in the GXGlobalParam table with a value of 1. When this entry is present, all jobs with a priority of less than 100 will be given a flag in the Job Controller.

To change the default priority for which flags will be shown, the JobHighPriorityMark entry can be added and configured with the desired priority level. Note that the JobHighPriorityMarkEnable entry must still be present and configured.

If none of the above scenarios is present, the Flags column will remain empty.
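
The three flag scenarios can be summarized as a small decision function. This is only a sketch of the rules as described above; the function name, its parameters, and the flag labels are hypothetical, and the 60-minute limit and the default priority mark of 100 are taken from the text.

```python
from datetime import datetime, timedelta

STALE_UPDATE_LIMIT = timedelta(minutes=60)  # "over 60 minutes" from the text
DEFAULT_PRIORITY_MARK = 100                 # default threshold from the text


def job_flags(media_missing, last_update, now, priority,
              high_priority_enabled=False, priority_mark=DEFAULT_PRIORITY_MARK):
    """Return the list of flag reasons that apply to a running job."""
    flags = []
    if media_missing:
        flags.append("media not found")
    if now - last_update > STALE_UPDATE_LIMIT:
        flags.append("no update in over 60 minutes")
    # The high-priority flag applies only when JobHighPriorityMarkEnable is 1;
    # priority_mark stands in for the optional JobHighPriorityMark entry.
    if high_priority_enabled and priority < priority_mark:
        flags.append("high priority")
    return flags


now = datetime(2024, 1, 1, 12, 0)
print(job_flags(True, now - timedelta(minutes=90), now, priority=50,
                high_priority_enabled=True))
```

An empty list corresponds to an empty Flags column.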

Viewing Additional Job Details

To view additional details about a particular job, right click the job in the Job Controller window and select Detail.

The General tab of a Job Details dialog box provides general information about the selected job, such as the subclient, storage policy, etc.
The Progress tab of a Job Details dialog box provides more specific statistical information about the selected job's current phase.
The Streams tab of a Job Details dialog box shows the data transferred by each stream on the MediaAgent that the job is using.
The Attempts tab of a Job Details dialog box includes information on each attempt of each phase of the selected job, such as the status of each phase of the job. Each phase has a corresponding client log that can aid in troubleshooting data protection problems. Note that the Data Size/Transferred field amount includes metadata, and therefore, will be larger than the actual size of the backed up data.
The Phase Details tab of a Job Details dialog box provides information on each phase of the Information Management operations, such as Search, Legal Hold, ERM Connector, Restore to Review Set, and Tagging.
The Retention tab of a Job Details dialog box provides the retention information for the data protection job's storage policy. The associated storage policy copies will be listed with their defined retention rules. From here, you can quickly identify whether the storage policy copies are defined with basic or extended retention rules, and the date(s) until which the data will be retained for each storage policy copy.
For Oracle specific backup and restore jobs, you can also view the RMAN log for the selected job in the Job Details dialog box.

The Job Controller also provides the facility to view job information using other CommCell Console features, including:

Job Events, which can be viewed using the All Found Events window. For more information about events, see the Event Viewer.
Log Files, which can be viewed for any active job. For more information about viewing log files, see Log Files.

Controlling Jobs

You can select a job in the Job Controller and perform a control action on that job individually. You can also control multiple jobs simultaneously in two ways:

Select each of the desired jobs in the Job Controller window simultaneously, and then right-click any one of the selected jobs. You can then select the appropriate action from the menu displayed, which will be executed for each of the selected jobs.
Use the Multi-Job Control dialog box.

Either method allows you to perform actions on:

All jobs in the Job Controller.
All selected jobs in the Job Controller, provided that you have the correct security associations at the proper level for each job selected.
All data protection operations/data recovery operations/data collection operations running for a particular client or client/agent.
All data protection operations running for a particular MediaAgent.

You can perform the following actions on jobs:

Suspend	Temporarily stops a job. A suspended job is not terminated; it can be restarted at a later time. Only preemptible jobs can be suspended.
Commit	Gracefully completes the current backup job, as of that point-in-time. Applicable only for Silo backup jobs. See Commit Silo Backup for details.
Resume	Resumes a job and returns the status to Waiting, Pending, Queued, or Running depending on the availability of resources or the state of the operation windows and activity control settings.
Kill	Terminates a job.
Change Priority	Change the priority of a job or a group of jobs that are currently active. Note that the lower the priority number, the higher priority the Job Manager gives to the job when allocating resources.

Controlling the Number of Simultaneously Running Streams

The low watermark, which will only display in the status bar if defined, is used in conjunction with the high watermark. If the high watermark value is reached, and a low watermark is defined, the Job Manager will wait until the low watermark value is reached to start any new jobs. The low watermark value can be defined using the JMRunningJobsLowWaterMark registry key.

The high watermark has a default value of 10 for SRM Reports.
The high watermark has a default value of 100 for WorkStation backup jobs running to one destination. You can use the SetKeyIntoGlobalParamTbl.sql qscript with the JMReplicationJobActivityLevelHighWaterMark global parameter to change the default value. For more information, see Command Line Interface - QScripts.
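
The interaction between the two watermarks behaves like a hysteresis gate. The sketch below is a toy model of that behavior, not the Job Manager's implementation: the class and method names are hypothetical, and the behavior when no low watermark is defined (new work starts as soon as the count drops below the high watermark) is an assumption.

```python
class StreamGate:
    """Toy model of high/low watermark admission control."""

    def __init__(self, high, low=None):
        self.high = high
        self.low = low          # None means no low watermark is defined
        self.draining = False   # True after the high watermark has been reached

    def can_start(self, running):
        """Return True if new work may start while `running` items are active."""
        if running >= self.high:
            # High watermark reached: hold new work. Draining down to the low
            # watermark is only required when a low watermark is defined.
            self.draining = self.low is not None
            return False
        if self.draining:
            if running > self.low:
                return False    # still above the low watermark, keep waiting
            self.draining = False
        return True


gate = StreamGate(high=10, low=4)
for running in (9, 10, 7, 4, 7):
    print(running, gate.can_start(running))
```

In this model, once the count reaches 10, nothing new starts at 7; admission resumes only after the count has fallen to 4.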

Viewing Job Status

The following table describes the status levels that may appear in the Job Controller window for a particular job:

Completed	The job has completed successfully. Note that pop-up messages for reporting job completion can be enabled or disabled using the F12 key.
Completed With One or More Errors	The job has completed with errors.
	The following administration conditions will result in the Completed With One or More Errors status level:
	Disaster Recovery Backup: During the operation, Phase 1 failed and Phase 2 completed, or Phase 1 completed and Phase 2 failed.
	Data Aging: During the operation, one or more components failed, e.g., subclients failed to be aged or job history failed to be removed.
	Install Updates: During the operation, one or more clients failed to be updated.
	Offline Content Indexing: During the offline content indexing operation, one or more backup data failed to be content indexed.
	Information Management: During an information management operation, if the operation defined in the Automated Content Classification Policy is partially successful.
	The following iDataAgent-specific conditions will result in the Completed With One or More Errors status level:
	Exchange Compliance Archiver: During a retrieve operation, one or more files failed to be retrieved.
	Exchange Mailbox Archiver and Exchange Public Folder Archiver: During a recovery operation, one or more files failed to be recovered.
	Microsoft Windows File System: During a system state backup operation, one or more non-critical components failed to be backed up. During a file system restore operation, one or more files failed to restore or were locked. During a system state restore operation, one or more non-critical components failed to be restored.
	Microsoft Exchange Server: During a backup operation of a storage group assigned to a subclient, one or more databases failed to be backed up. During a restore operation, one or more databases failed to be restored.
	Informix: During a backup operation, one or more files failed to be backed up.
	Oracle, Oracle RAC: During a backup operation, one or more files failed to be backed up.
	SAP for Oracle, SAP for MAXDB: During a backup operation, one or more files failed to be backed up.
	SharePoint Server iDataAgent: During a backup operation, one or more elements in the subclient content failed to be backed up. During a restore operation, one or more elements in the subclient content failed to be restored.
	SharePoint Archiver: During a migration archiving operation, one or more elements in the subclient content failed to be archived. During a recovery operation, one or more elements in the subclient content failed to be recovered.
	Sybase: During a backup operation, one or more files failed to be backed up.
	UNIX File System: During a backup operation, one or more files failed to be backed up.
	Online Content Indexing Agents: During an online content indexing operation, one or more files failed to be content indexed.
Dangling Cleanup	A job phase has been terminated by the job manager, and the job manager is waiting for the completion of associated processes before killing the job phase.
Failed	The job has failed due to errors or the job has been terminated by the job manager.
Interrupt Pending	The job manager is waiting for the completion of associated processes before interrupting the job due to resource contention with jobs that have a higher priority, etc.
Kill Pending	The job has been terminated by the user using the Kill option, and the job manager is waiting for the completion of associated processes before killing the job.
Killed	The job is terminated by the user using the Kill option or by the Job Manager.*
Pending	The Job Manager has suspended the job due to phase failure and will restart it without user intervention.
Queued	The job conflicted with other currently running jobs (such as multiple data protection operations for the same subclient), and the Queue jobs if other conflicting jobs are active option was enabled from the General tab of the Job Management dialog box. The Job Manager will automatically resume the job only if the condition that caused the job to queue has cleared.
	The activity control for the job type is disabled, and the Queue jobs if activity is disabled option was enabled from the General tab of the Job Management dialog box. The Job Manager will automatically resume the job only if the condition that caused the job to queue has cleared.
	The Queue Scheduled Jobs option was enabled from the General tab of the Job Management dialog box. Scheduled Jobs can be resumed manually using the Resume option or resumed automatically by disabling the Queue Scheduled Jobs option.
	The job started within the operation window's start and end time.
	The running job conflicted with the operation window and the Allow running jobs to complete pass the operation window option was not enabled from the General tab of the Job Management dialog box. (This is only applicable for jobs that can be restarted. See Restarting Jobs for more information.)
Running	The job is active and has access to the resources it needs.
Running (Cannot be verified)	During a running operation, the Job Alive Check failed. See Job Alive Check Interval for more information.
Suspend Pending	A job is suspended by a user using the Suspend option, and the Job Manager is waiting for the completion of associated processes before stopping the job.
Suspended	A running, waiting or pending job has been manually stopped by a user using the Suspend option. The job will not complete until it is restarted using the Resume option.
	A job has been started in a suspended state using the Start Suspended or Startup in Suspended State options available from the dialog box of the job that was initiated.
	Restore jobs from Search Console can be started in the suspended state using the Start End User restores in suspended state and Start Compliance restores in suspended state options in the Browse/Recover Option Dialog box in the Control Panel.
System Kill Pending	The job has been terminated by the Job Manager*, and the Job Manager is waiting for the completion of associated processes before killing the job.
Waiting	The job is active, waiting for resources (e.g., media or drive) to become available or for internal processes to start.

*The Job Manager will terminate a job when:

The number of job retries has exceeded the value set in the Job Retry dialog box.
The total running time has exceeded the amount of time set in the Job Retry dialog box.

Conflicting jobs overlap, i.e., a new backup job is initiated for the same subclient as a job that is currently running.

The Job Manager will only terminate a conflicting job if the new backup job encompasses the earlier job and if the earlier job has not yet transferred any data to media. If these conditions are met, the earlier job is killed by the system and replaced by the newer job. "Encompasses" means that a full backup can kill incrementals, differentials and other fulls, whereas an incremental cannot kill a full. If the earlier job has already started transferring data, the normal queue rules apply to the new job.

This feature must be enabled on the CommServe with the JMKillPreviousBackupJobForSameSubclient registry key.

The free space is less than 25MB in the CommServe installation directory.
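
The rule for replacing a conflicting job can be sketched as follows. The function and its parameters are hypothetical names chosen for this example. Only the pairings stated above are modeled (a full encompasses incrementals, differentials and other fulls); treating every other pairing as not encompassing is an assumption.

```python
# Backup types that each new job type is stated to encompass.
ENCOMPASSES = {
    "Full": {"Full", "Incremental", "Differential"},
}


def earlier_job_is_replaced(new_type, earlier_type, earlier_bytes_on_media,
                            feature_enabled):
    """Return True if the earlier job would be killed and replaced by the new one."""
    if not feature_enabled:
        # JMKillPreviousBackupJobForSameSubclient is not set on the CommServe.
        return False
    if earlier_bytes_on_media > 0:
        # The earlier job has already transferred data: normal queue rules apply.
        return False
    return earlier_type in ENCOMPASSES.get(new_type, set())


print(earlier_job_is_replaced("Full", "Incremental", 0, True))
print(earlier_job_is_replaced("Incremental", "Full", 0, True))
```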

Job Status Changes

The status of a job and the preemptibility of the current phase of the job in the Job Controller determine the actions (Kill, Suspend, or Resume) that you can perform. The following table describes the status of a job after an action has been performed on it:

Original Status	Actions Available	New Status
Running	Suspend	Suspended
Running	Kill	Killed
Waiting	Suspend	Suspended
Waiting	Kill	Killed
Interrupt Pending	N/A	N/A
Pending	Suspend	Suspended
	Resume	Returns to original state, resources and other conditions permitting
	Kill	Killed
Suspend Pending	N/A	N/A
Queued	Suspend	Suspended
	Resume (scheduled jobs only)	Changes into a state of an active job, resources and other conditions permitting
	Kill	Killed
Suspended	Resume	Returns to original state or changes into a state of an active job, resources and other conditions permitting
Suspended	Kill	Killed
Kill Pending	N/A	N/A
Dangling Cleanup	N/A	N/A
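
The table above can be transcribed into a lookup structure, which is convenient when scripting checks against job states. This is a sketch only: the dictionary and function names are hypothetical, and the Resume outcomes are kept as descriptive text because the table does not name a single resulting status for them.

```python
# (original status, action) -> new status, transcribed from the table above.
TRANSITIONS = {
    ("Running", "Suspend"): "Suspended",
    ("Running", "Kill"): "Killed",
    ("Waiting", "Suspend"): "Suspended",
    ("Waiting", "Kill"): "Killed",
    ("Pending", "Suspend"): "Suspended",
    ("Pending", "Resume"): "Original state (conditions permitting)",
    ("Pending", "Kill"): "Killed",
    ("Queued", "Suspend"): "Suspended",
    ("Queued", "Resume"): "Active state (scheduled jobs only, conditions permitting)",
    ("Queued", "Kill"): "Killed",
    ("Suspended", "Resume"): "Original or active state (conditions permitting)",
    ("Suspended", "Kill"): "Killed",
}


def available_actions(status):
    """Return the actions listed for a status; an empty list means N/A."""
    return [action for (state, action) in TRANSITIONS if state == status]


def apply_action(status, action):
    """Return the new status, or None if the action is not available."""
    return TRANSITIONS.get((status, action))


print(available_actions("Pending"))
print(apply_action("Running", "Suspend"))
```

Statuses listed as N/A in the table (Interrupt Pending, Suspend Pending, Kill Pending, Dangling Cleanup) simply have no entries, so no action is offered for them.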

Job Filters

You can filter the jobs that are displayed in the Job Controller by creating a job filter from the Filter Definition dialog box. You can filter by Data Protection, Data Recovery, Data Collection (for SRM jobs), and Administration operations. The filter can also be based on an active job for a particular CommCell entity.

CommCell Administrators can utilize filters created by all users. All other users can only utilize the filters that they create. If a user account is deleted, their filters will automatically be deleted as well.
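
The filter visibility rules can be illustrated with a short sketch. The tuple layout and function names are assumptions made for this example and do not reflect how the CommCell Console stores filters.

```python
def usable_filters(filters, user, is_commcell_admin):
    """Return the filter names a user may use.

    `filters` is a list of (filter_name, owner) tuples.
    """
    if is_commcell_admin:
        return [name for name, _ in filters]          # administrators see all filters
    return [name for name, owner in filters if owner == user]


def delete_user(filters, user):
    """Return the filters that remain after a user account is deleted."""
    return [(name, owner) for name, owner in filters if owner != user]


# Hypothetical filters, for illustration only.
filters = [("Backups on client A", "alice"), ("All restores", "bob")]
print(usable_filters(filters, "alice", is_commcell_admin=False))
print(delete_user(filters, "bob"))
```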

Important Considerations for Running Jobs

If a user is not part of the View All user group, then that user will not see CommCell objects for which the user's member user group(s) does not have associations. Furthermore, users will not be able to view the Job Controller or Event Viewer details associated with the CommCell objects for which they do not have permissions. Note that a user will not be able to view these CommCell objects upon logging onto the CommCell Console after the restrictions have been set.
For the File Archiver Agents, multiple stub recoveries from magnetic media or tape are submitted to the Job Controller as one job. For such stub recoveries, only one job will display in the Job Controller.

Job Preemption Control

The phases of a job or operation fall into two main types:

Preemptible Phase

In a preemptible phase, the job can be interrupted by the Job Manager or suspended by the user and then restarted without having to start the phase over again from the beginning. Preemption is defined by the Job Manager at each phase of a job. A File System backup phase is one example of a preemptible phase; the Job Manager can interrupt this phase when resource contention occurs with a higher priority job. You can also suspend this phase in progress and resume it later.

Non-preemptible Phase

A non-preemptible phase is one that cannot be interrupted by the Job Manager or suspended by the user. It can only run to completion, be killed by administrative action, or be failed by the system. For example, the data recovery operations of database agents are non-preemptible.

Both preemptible and non-preemptible jobs can also be defined in terms of their restartability; preemptible jobs are always restartable. In addition, even jobs that are not preemptible might fail to start and be in a "waiting" state; these are restartable as well. For more specific information on this topic, see Job Restart.

Preemptible and Non-Preemptible Jobs

The following table lists the types of preemptible and non-preemptible jobs:

Preemptible and Restartable	Non-preemptible and Non-Restartable	Non-preemptible but Restartable
Data protection operations for most non-database agents.	Data recovery operations for database-like agents.	Data protection operations for database agents.
DataArchiver archive jobs during the Archive Index and Archive Content Index phases of the job.	Media export, erase media, and inventory jobs.	The system state phase of Windows File System data protection operations.
Data recovery operations for most File System-like (indexing-based) agents during the restore phase.	SAN volume data protection jobs (non-preemptible in their scan phase).	Offline and Online Content Indexing jobs.
Data recovery operations from the Search Console.	All QR jobs on Unix platforms.	Data Collection operations for SRM Agents.
Most administration jobs including Install Automatic Updates and Download Automatic Updates.		
Jobs that are run using an alternate data path cannot preempt other jobs. Similarly, such jobs can also be preempted by other jobs that do not use an alternate data path.		
Silo backup and restore operations.		

For information on Agents that support Job Restarts, see the following:

Controlling Job Preemption for the CommCell

You can specify that certain operations will preempt other operations based on their job priority, in cases where multiple jobs are competing for media and drives.

If a running job is preemptible, the Job Manager can interrupt the running job and allocate the resources to a higher-priority job. (The interrupted job enters a waiting state and resumes when the resources it needs become available.)

You can:

Allow restores and browse backup data index restores to preempt other jobs of lower priority such as backups, synthetic fulls, and auxiliary copy operations.
Allow backups (including Disaster Recovery backups) to preempt other backups of lower priority.
Allow backups (including Disaster Recovery backups) to preempt auxiliary copy jobs of lower priority.

See Set Job Preemption Control for the CommCell.
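
The three preemption options can be modeled as a small decision function. This is a simplified sketch, not the Job Manager's logic: the class names, field names, and job kind strings are hypothetical, and only the job types named in the options above are modeled. As noted under Change Priority, a lower priority number means a higher priority.

```python
from dataclasses import dataclass


@dataclass
class Job:
    kind: str                 # e.g. "restore", "index_restore", "backup", "dr_backup"
    priority: int             # lower number means higher priority
    preemptible: bool = True


@dataclass
class PreemptionSettings:
    restores_preempt_others: bool = False
    backups_preempt_backups: bool = False
    backups_preempt_aux_copy: bool = False


RESTORES = {"restore", "index_restore"}
BACKUPS = {"backup", "dr_backup"}
# Assumption: restores may preempt only the job types named in the text.
RESTORE_TARGETS = BACKUPS | {"synthetic_full", "aux_copy"}


def can_preempt(new, running, settings):
    """Return True if `new` may preempt `running` under the given settings."""
    if not running.preemptible or new.priority >= running.priority:
        return False
    if new.kind in RESTORES:
        return settings.restores_preempt_others and running.kind in RESTORE_TARGETS
    if new.kind in BACKUPS:
        if running.kind in BACKUPS:
            return settings.backups_preempt_backups
        if running.kind == "aux_copy":
            return settings.backups_preempt_aux_copy
    return False


settings = PreemptionSettings(restores_preempt_others=True)
print(can_preempt(Job("restore", 66), Job("backup", 166), settings))
```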

Configuring Preemptibility for Select Job Types

You can specify which of the following types of jobs are preemptible:

Data Protection and Data Recovery operations of indexing-based file system-like agents.
Disaster Recovery backup
Auxiliary Copy

To configure preemptibility in the CommCell for specific job types, see Specify Preemptibility of Job Types.

What Happens When a Job Is Preempted

The following table provides information on the Status of the job in the Job Controller window and the Reason for job delay displayed in the Job Details dialog box when a job is preempted. In addition, a brief explanation on what happens when a job is preempted is also provided.

Job	Status in the Job Controller	Reason for Job Delay	Additional Information
Data Protection Operation	Interrupt Pending	No Job Delay	Once interrupted, job does not hold on to resources and returns to Waiting status. The job retries for resources. (The Status of the job in the Job Controller window and messages in the Reason for job delay are discussed in What Happens When There are no Resources for a Job.)
Data Protection Operation	Waiting	No resources available
Data Recovery Operations (for File System-like agents)	Interrupt Pending	No Job Delay	Once interrupted, job does not hold on to resources and returns to Waiting status. The job retries for resources. (The Status of the job in the Job Controller window and messages in the Reason for job delay are discussed in What Happens When There are no Resources for a Job.)
Data Recovery Operations (for File System-like agents)	Waiting	No resources available
Data Recovery Operation (for Database-like agents)	Not Preemptible
Index Restore (Browse Backup Data)	Not Preemptible
Auxiliary Copy	Interrupt Pending	No Job Delay	Once interrupted, job does not hold on to resources and returns to Waiting status. The job retries for resources. (The Status of the job in the Job Controller window and messages in the Reason for job delay are discussed in What Happens When There are no Resources for a Job.)
Auxiliary Copy	Waiting	No resources available
Synthetic Full	Interrupt Pending	No Job Delay	Once interrupted, job does not hold on to resources and returns to Waiting status. The job retries for resources. (The Status of the job in the Job Controller window and messages in the Reason for job delay are discussed in What Happens When There are no Resources for a Job.)
	Waiting	No resources available

The higher priority job that is doing the preemption for resources will display the Reason for Job delay as follows:

Waiting for job[ ] to release the resources.

Important Considerations

For the Image Level and Image Level ProxyHost iDataAgents, if a backup job is suspended either by the user or the Job Controller during metadata collection, the job will automatically resume from the scan phase.

For Oracle and Oracle RAC iDataAgents, selective online full backup jobs are neither preemptible nor restartable. Similarly, Oracle log backup jobs that are submitted during selective online full backups (data phase) cannot be preempted or restarted.
When restarting Oracle offline backup jobs, note that the job is restarted from the beginning rather than from the point of failure.

Restarting Jobs

Restartable jobs can be restarted either by a user or automatically by the Job Manager. Job Restartability can be configured in the Job Management Control Panel; restartability can be turned on or off, the maximum number of restart attempts can be specified, and the time interval between each restart attempt can be configured. These settings are for the entire CommCell, so that all jobs in the CommCell of a selected type will behave according to the Job Restart settings you have specified.

Restartable and Non-Restartable Jobs

Both preemptible and non-preemptible jobs can be restartable. Preemptible jobs are always restartable after they are suspended. Non-preemptible jobs that fail to start and remain in a "waiting" state can be restartable as well. For more about jobs that fail to start, see What Happens When There are no Resources for a Job.

The following types of operations can be restarted, if so configured:

Auxiliary Copy
Data Aging
Data Protection operations of indexing-based, file system-like agents, and certain database-like agents**
Data Recovery operations of indexing-based, file system-like agents**
Disaster Recovery backup
Erase Stubs (for Exchange Mailbox Archiver only, a job-based setting is available)
Online and Offline Content Indexing jobs
Data Collection (for SRM Agents only)

The Job Restarts tab in the Job Management Control Panel lists all agents that can be configured for restartability for data protection, data collection and data recovery operations. For more information, see Specify Job Restartability for the CommCell.

For a specific job, you can override one of these settings, the maximum number of restart attempts, by specifying the Number of Retries in the Job Retry tab of the job initiation dialog box for that particular job. See How to Configure Job Restarts for more specific direction on this.

In all cases, whether the Max Restarts setting is used in the Job Management Control Panel, or the Number of Retries setting in the Job Retry tab, once the maximum number of retries has been reached, if the job has still not restarted successfully, the Job Manager will kill the job.

The job-based setting will have no effect unless restartability has been turned on in the Job Management Control Panel.
You cannot configure the interval between restart attempts for an individual job, only the number of attempted restarts.
Data Aging restartability can only be set in the Job Management Control Panel; you cannot set it in the Job Retry tab of the job initiation dialog box for that particular job.
Restarting Unix raw partition backup jobs, whether manually or by the system, is not supported. Therefore, you should run such jobs under high priority.
Data Protection/Data Collection Jobs that enter a Running (Cannot be verified) job state during a temporary network or CommServe service outage will not be restarted. These jobs do not enter a pending state; they will continue, without interruption, when the network or CommServe services become available. For more information, see Fault Tolerance.
Restarting an Oracle On Demand backup job for multiple scripts for the same instance will cause the instance, whose backup was interrupted, to be backed up again from the beginning of the script which was running. Because of this restart behavior, if the archive files for that instance were successfully backed up before the restart, they will be backed up again after the restart. As a result, Job Manager may count the data size of archive files twice for the instance that the Oracle On Demand backup job was restarted from. Therefore, the size of data reported as backed up for this job (in the Job Details and Backup Job History) will reflect the duplicate size of the archive files that were backed up twice for that instance. The scripts should be updated to prevent this behavior before resuming the job.
If a data management job for the DB2 DPF iDataAgent goes to a pending state, and if the job has completed on some of the nodes, the restart option will start the job on all the nodes unless the sBKPRESTARTFAILEDNODESTimeOut registry key is set appropriately.
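
The interaction between the CommCell-wide restart settings and the job-level Number of Retries described above can be sketched as follows. This is an illustrative model only, not product code: the class, field, and function names are hypothetical, and the behavior when restartability is turned off (no automatic restart) is an assumption based on the note that the job-based setting has no effect in that case.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CommCellRestartSettings:
    """Hypothetical model of the Job Restarts settings for one job type."""
    restartable: bool       # restartability turned on or off
    max_restarts: int       # Max Restarts (CommCell-wide)
    interval_minutes: int   # interval between restart attempts (CommCell-wide only)


def effective_max_restarts(settings: CommCellRestartSettings,
                           number_of_retries: Optional[int] = None) -> int:
    """Return the restart limit that applies to a single job."""
    if not settings.restartable:
        # The job-based setting has no effect unless restartability is on.
        return 0
    if number_of_retries is not None:
        # Number of Retries from the Job Retry tab overrides Max Restarts.
        return number_of_retries
    return settings.max_restarts


def next_action(settings: CommCellRestartSettings,
                attempts_so_far: int,
                number_of_retries: Optional[int] = None) -> str:
    """Decide what happens to a job that has not yet restarted successfully."""
    if not settings.restartable:
        return "no automatic restart"  # assumption, see lead-in
    if attempts_so_far >= effective_max_restarts(settings, number_of_retries):
        return "kill"
    # The interval cannot be overridden per job, so it always comes from settings.
    return f"restart after {settings.interval_minutes} min"
```

For example, with Max Restarts set to 10 and a job-level Number of Retries of 3, the job is killed once the third restart attempt has been used.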

Configuring Job Restarts for the CommCell

Job Restarts are configured for the entire CommCell using the Job Management Control Panel. For instructions, see Specify Job Restartability for the CommCell.

For Agents that support the capability, to override the CommCell's Max Restart setting for a particular job, you can specify the Number of Retries in the Job Retry tab of the job configuration dialog box for the following types of jobs:

Job Name	How To Configure Job Restarts	Notes
Auxiliary Copy	In the Auxiliary Copy dialog, click Advanced, then select the Job Retry tab and specify Number of Retries.	See Start an Auxiliary Copy or Schedule an Auxiliary Copy for step-by-step instructions.
Data Protection	In the Backup Options or Archive Options dialog, click Advanced, then select the Job Retry tab and specify Number of Retries.	Refer to information specific to your Agent, beginning with the Compliance Archiving, Backup Data, or Migration Archiving page.
Data Recovery	In the Restore Options or Recover Options dialog, click Advanced, then select the Job Retry tab and specify Number of Retries.	Refer to information specific to your Agent, beginning with the Retrieve Data - Exchange Compliance Archiver Agent, Restore Backup Data, or Recover Archived Data page.
Data Collection	In the Schedule Data Collection Job dialog, click Advanced, then select the Job Retry tab and specify Number of Retries.	See Data Collection and Run/Schedule a Data Collection Job for an SRM Instance, Agent or Subclient for detailed information.
Disaster Recovery Backup	In the Disaster Recovery Backup Options dialog, select the Job Retry tab and specify Number of Retries.	See Starting a Disaster Recovery Backup or Scheduling a Disaster Recovery Backup for step-by-step instructions.
Erase Stub jobs for Exchange Mailbox Archiver	In the Erase Stubs selected for deletion in Outlook dialog, select the Job Retry tab and specify Number of Retries.	See Erase Stubs for step-by-step instructions.
Offline Content Indexing	In the Content Indexing dialog box, click Advanced, then select the Job Retry tab and specify Number of Retries.	See Start or Schedule Offline Content Indexing Operations for step-by-step instructions.
Online Content Indexing	In the Backup Options dialog box, click Advanced, then select the Job Retry tab and specify Number of Retries.	See Start or Schedule Online Content Indexing Operations for step-by-step instructions.

QR Volume Creation Restartability

QR Volume Creation restartability is only supported on Windows platforms. See Create a QR Volume for more information.

Single Volume Subclient

The Quick Recovery Agent maintains a restart string during the Volume Creation (copying) phase of full and incremental copy jobs to keep track of the progress made on each volume being copied. This restart string is updated on the CommServe database every time 1 GB of data is copied per volume. If a job is resumed from a suspended or pending state, this restart string will be retrieved and used to identify the location in the volume from where to resume the copying. For example, a job was suspended with 2.8 GB of the data copied for a particular volume; since the restart string on the volume was last updated when 2 GB completed copying, the job resumed from that point.
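
The resume point in the example above follows from rounding the copied amount down to the last completed 1 GB checkpoint. A minimal sketch of that arithmetic, assuming 1 GB means 2^30 bytes (the text does not say) and using a hypothetical function name:

```python
GB = 1024 ** 3  # assumption: 1 GB = 2**30 bytes


def resume_offset(bytes_copied: int, checkpoint_bytes: int = GB) -> int:
    """Return the offset of the last checkpoint recorded for a volume.

    The restart string is updated each time another checkpoint_bytes of data
    has been copied, so a resumed job restarts from the last full multiple.
    """
    return (bytes_copied // checkpoint_bytes) * checkpoint_bytes


# A job suspended with 2.8 GB copied resumes from the 2 GB mark.
suspended_at = int(2.8 * GB)
print(resume_offset(suspended_at) // GB)
```

In this model, at most 1 GB of work per volume is repeated after a resume.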

Multi-Volume Subclient

In the QR Volume Creation phase, volumes are copied sequentially (i.e., not in parallel). This affects job restartability behavior for a multi-volume subclient. When a QR Volume Creation job is interrupted (suspended or pending), some of the volumes in the subclient may be completely copied while others may not be copied yet at all. If the job is restarted (either manually or automatically), the behavior toward each volume in the subclient will depend on the condition of the volume at the time of job interruption. Refer to the following table for the expected behavior (for each volume) when resuming an interrupted QR Volume Creation job for a multi-volume subclient.

Volume Condition at the Time of Job Interruption	Behavior when Job Restarts
volume was successfully copied	The Quick Recovery Agent copies any changes to the volume that occurred after the starting point of the original job up to the time of the restart. For example: A job was initiated at 2:00 P.M. At 2:30 P.M., you suspended the job. This job was suspended in the QR Volume Creation (copying) phase, after the volume was successfully copied. At 3:00 P.M. you restarted the job. Upon the resume, the Quick Recovery Agent copied the changes made to the volume from 2:00 to 3:00 P.M.
volume was partially copied	The Quick Recovery Agent runs the full or incremental copy, and then copies any changes to the volume that occurred after the starting point of the original job up to the time of the restart. For example: A job was initiated at 2:00 P.M. At 2:30 P.M., you suspended the job. This job was suspended in the QR Volume Creation (copying) phase, during the copying of the volume. At 3:00 P.M. you restarted the job. Upon the resume, the Quick Recovery Agent ran the initial copy job and then copied the changes made to the volume from 2:00 to 3:00 P.M.
volume was not yet copied	If it's a full copy, the Quick Recovery Agent runs a normal full copy. For example: A job was initiated at 2:00 P.M. At 2:02 P.M., you suspended the job. This job was suspended in the QR Volume Creation (copying) phase, before it copied any parts of the volume. At 3:00 P.M. you restarted the job. Upon the resume, the Quick Recovery Agent ran a full copy job, copying all the data in the volume up to 3:00 P.M.
volume was not yet copied	If it's an incremental copy, the Quick Recovery Agent copies any changes that the original incremental would have copied as well as any changes to the volume that occurred after the starting point of the original incremental copy job up to the time of the restart. For example: A job was initiated at 2:00 P.M. At 2:02 P.M., you suspended the job. This job was suspended in the QR Volume Creation (copying) phase, before it copied any parts of the volume. At 3:00 P.M. you restarted the job. Upon the resume, the Quick Recovery Agent copied the data that the original incremental copy would have copied, as well as the changes made to the volume from 2:00 to 3:00 P.M.
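
The table above can be read as a per-volume lookup. The following sketch restates it in code for illustration only. The state labels, step descriptions, and function name are hypothetical and are not identifiers used by the Quick Recovery Agent.

```python
from typing import List

CATCH_UP = "copy changes made between the original job start and the restart"


def restart_steps(volume_state: str, copy_type: str) -> List[str]:
    """Return the steps taken for one volume when an interrupted job restarts.

    volume_state: "copied", "partial", or "not_copied" (hypothetical labels)
    copy_type:    "full" or "incremental"
    """
    if copy_type not in ("full", "incremental"):
        raise ValueError(f"unknown copy type: {copy_type}")
    if volume_state == "copied":
        return [CATCH_UP]
    if volume_state == "partial":
        return [f"run the {copy_type} copy", CATCH_UP]
    if volume_state == "not_copied":
        if copy_type == "full":
            # A normal full copy already captures the data up to the restart time.
            return ["run a normal full copy"]
        return ["copy the changes the original incremental would have copied",
                CATCH_UP]
    raise ValueError(f"unknown volume state: {volume_state}")
```

Because volumes are copied sequentially, a restarted multi-volume job would apply this lookup to each volume in turn.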

Retrying Jobs

The Job Initiation dialog box provides several configuration options for retrying jobs, including:

Total Running Time - The maximum elapsed time, in hours and minutes, from the time that the job is created. When the specified maximum elapsed time is reached, as long as the job is in the "Running" state, it will continue; if the job is not in the "Running" state when the specified time is reached, Job Manager will kill the job.
Number of Retries - The number of times that Job Manager will attempt to restart the job. Once the maximum number of retry attempts has been reached, if the job has still not restarted successfully, Job Manager will kill the job. Note that this job-based setting will not be valid if restartability has been turned off in the Job Management Control Panel.
Kill Running Jobs When Total Running Time Expires - Option to kill the job when the specified Total Running Time has elapsed, even if its state is "Running". This option is available only if you have specified a Total Running Time.

Resuming Jobs

Jobs that have been in a waiting or pending state can be resumed by right-clicking on the job itself in the Job Controller and selecting Resume Job.

Other Considerations

Several additional job management capabilities are available. These capabilities are described in the following sections.

Hardware Considerations for Data Recovery Operations

When a hardware failure occurs during a restore operation, the restore job goes into a device wait state indefinitely. You need to kill the job and start it again later, when the hardware is available.

Job Alive Check Interval

The Job Alive Check Interval option within the General tab of the Job Management dialog box allows you to specify the time interval by which the Job Manager will check active jobs to determine if they are still running.

Job Update Interval

The Job Update Interval allows you to view or modify how often information must be updated for data protection and data recovery operations in the Job Details.

The Job Updates tab of the Job Management dialog box displays the:

Available Agent Types
Protection Time (in Minutes)
Recovery Time (in Minutes)

It also includes:

Update interval time for the ContinuousDataReplicator.
Update interval time for the Data Classification status, if installed.

Job Running Time

At the time of job initiation, you can determine the total amount of time a job can run before it is killed by the Job Manager. The configurable parameters for Job running time allow you to control the following:

Total Running Time

The maximum elapsed time, in hours and minutes, from the time that the job is created. When the specified maximum elapsed time is reached, as long as the job is in the "Running" state, it will continue; if the job is not in the "Running" state when the specified time is reached, Job Manager will kill the job.

Example: Total Running Time for a job is specified as 1 hour.
- If the job is still running at the 1 hour point, it will continue to run.
- If the job is still running at the 1 hour point, but 30 minutes later you suspend the job, Job Manager will kill the job.
- If the job begins running, and 15 minutes later is suspended and left in that state, 45 minutes later (when the specified Total Running Time of 1 hour has elapsed) Job Manager will kill the job.
- If the job is started in the suspended state and left in that state, 1 hour later (when the specified Total Running Time of 1 hour has elapsed) Job Manager will kill the job.

Kill Running Jobs When Total Running Time Expires

Option to kill the job when the specified Total Running Time has elapsed, even if its state is "Running". This option is available only if you have specified a Total Running Time.
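
The Total Running Time rules and the example scenarios above can be summarized as a single decision. This is a minimal sketch under the assumption that the check can be modeled as a pure function of elapsed time and job state. The function and parameter names are hypothetical.

```python
def should_kill(elapsed_minutes: int,
                total_running_minutes: int,
                state: str,
                kill_running: bool = False) -> bool:
    """Return True if Job Manager would kill the job at this point.

    elapsed_minutes is measured from the time the job was created.
    kill_running models "Kill Running Jobs When Total Running Time Expires".
    """
    if elapsed_minutes < total_running_minutes:
        return False            # Total Running Time has not yet elapsed
    if state == "Running":
        return kill_running     # a running job continues unless the option is set
    return True                 # any non-running state after expiry is killed


# Scenarios from the example, with Total Running Time = 1 hour:
print(should_kill(60, 60, "Running"))     # still running at 1 hour
print(should_kill(90, 60, "Suspended"))   # suspended 30 minutes after expiry
print(should_kill(60, 60, "Suspended"))   # suspended earlier and left suspended
```

With the kill option selected, the first scenario would also result in the job being killed.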

You can configure the Total Running Time and whether to Kill running jobs when total running time expires in the Job Retry tab of the job initiation dialog box for the following types of jobs:

For an Auxiliary Copy job, see Start an Auxiliary Copy or Schedule an Auxiliary Copy. In the Auxiliary Copy dialog, click Advanced, then select the Job Retry tab.
For a Data Aging job, see Start Data Aging or Schedule Data Aging. In the Data Aging dialog, select the Job Retry tab.
For a Data Protection operation, in the Backup Options or Archive Options dialog, click Advanced, then select the Job Retry tab. Refer to information specific to your Agent, beginning with the Archive, Backup Data, or Migration Archiving page.
For a Data Recovery Operation, in the Restore Options or Recover Options dialog, click Advanced, then select the Job Retry tab. Refer to information specific to your Agent, beginning with the Retrieve Data - Exchange Compliance Archiver Agent, Restore Backup Data, or Recover Archived Data page.
For a Data Collection Operation, in the Schedule Data Collection Job dialog, click Advanced, then select the Job Retry tab. See Data Collection and Run/Schedule a Data Collection Job for an SRM Instance, Agent or Subclient for detailed information.
For a Disaster Recovery Backup operation, see Starting a Disaster Recovery Backup or Scheduling a Disaster Recovery Backup. In the Disaster Recovery Backup Options dialog, select the Job Retry tab.
For an Erase Stubs job for Exchange Mailbox Archiver, see Erase Stubs. In the Erase Stubs selected for deletion in Outlook dialog, select the Job Retry tab.

Job Queuing

Setting jobs to be queued allows a job that would otherwise fail to remain in the Job Controller in a Queued state, i.e., waiting. Once the condition that caused the job to be queued clears, the Job Manager will automatically resume the job. Jobs can be queued if:

they conflict with other currently running jobs (such as multiple data protection operations for the same subclient).
the activity control for the job type is disabled.

You can also set scheduled jobs to be queued. If jobs are scheduled and the Queue Scheduled Jobs option is enabled, these jobs will start in the Job Controller in a Queued state at their scheduled time. These jobs can be manually resumed or, if the Queue Scheduled Jobs option is disabled, these jobs will resume automatically. Selecting this option is especially useful during times of maintenance. Rather than suspend each job manually after it has started, you can enable the Queue Scheduled Jobs option, which will start all the scheduled jobs in the Job Controller in a Queued state. Once you have completed the maintenance, you can manually resume specific scheduled jobs, or simply deselect the Queue Scheduled Jobs option to automatically resume all the scheduled jobs.

The following types of jobs can be queued:

Data Protection
Data Recovery
Data Collection
Administration Operations

You can set jobs to be queued from the General tab of the Job Management dialog box. Queuing can be enabled for the following cases:

Jobs that are conflicting with other active jobs.
Jobs that cannot run because activity control for the job type(s) is disabled.
Scheduled jobs.
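
The queuing rules above can be sketched as a small decision function. This is an illustration only: the option and function names are hypothetical, and the "Started" result for a job with nothing blocking it is an assumption, since the text only states that a blocked job would otherwise fail.

```python
from dataclasses import dataclass


@dataclass
class QueueOptions:
    """Hypothetical model of the queuing options on the General tab."""
    queue_conflicting_jobs: bool
    queue_when_activity_disabled: bool
    queue_scheduled_jobs: bool


def initial_state(opts: QueueOptions,
                  conflicts_with_active_job: bool,
                  activity_disabled: bool,
                  is_scheduled: bool) -> str:
    """Return the state in which a newly initiated job appears."""
    if conflicts_with_active_job:
        # Without queuing, a conflicting job would fail.
        return "Queued" if opts.queue_conflicting_jobs else "Failed"
    if activity_disabled:
        return "Queued" if opts.queue_when_activity_disabled else "Failed"
    if is_scheduled and opts.queue_scheduled_jobs:
        # Scheduled jobs start Queued until resumed or the option is cleared.
        return "Queued"
    return "Started"  # assumption: nothing blocks the job
```

During maintenance, for example, enabling only the scheduled-jobs option would place every scheduled job in the Queued state at its scheduled time.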

When a Non-Full Backup is Automatically Converted to a Full Backup

Under the following conditions, a non-full backup is automatically converted to a full backup:

If it is the first backup of the subclient.
If you re-associate a subclient to another storage policy.
If you promote a secondary storage policy copy that is not synchronized with a primary copy (for all the subclients of a storage policy).
If a backup job within the most recent backup cycle is pruned or disabled from a primary copy.
If a new content path is added to the subclient.
After CommCell Migration (for some agents).

Some agents have additional scenarios in which a non-full backup is also automatically converted to a full backup:

Exchange Database iDataAgents
- If an Exchange Database has been restored
- If an Exchange Database has been auto-discovered
- If the Pre-Selected backup type has been changed
Image Level and Image Level ProxyHost iDataAgents
- After a failover occurs in a clustered environment, without having CXBF bitmap persistence enabled. For more information, see Configure Persistence.
- After an in-place Volume Level restore
Oracle iDataAgent
- If an incremental backup is selected for an Oracle subclient that includes Archive Logs and/or control files only
SQL Server iDataAgent
- See Default Subclient Backup Conversion Rules and File/File Group Subclient Backup Conversion Rules for complete listings.
NetWare File System iDataAgent
- The first NetWare File System backup run after having selected the backup set option Decompress Data Before Backup is converted to a full backup for all subclients that belong to that backup set.
Workstation Backup Agent
- After an ungraceful shutdown of the source client computer.

What Happens When There are no Resources for a Job

Each job requires certain resources for its successful completion. The absence of these resources has a different impact on different types of jobs. The following table lists the resources required by each job, the status of the job in the Job Controller window when there are no resources, and corresponding examples of the Reason for job delay displayed in the Job Details dialog box. It also briefly explains what happens when a job does not have the required resources.

By default, every 1440 minutes (24 hours), the Bull Calypso Media & Library Manager service on the CommServe cleans up any media and drive reservations held by jobs that failed to release those resources when they were abruptly terminated. You can modify the frequency using the nRESOURCERELEASEINTERVALMIN registry key.

Job	Resources	Status in the Job Controller	Reason for Job Delay	Additional Information
Data Protection Operation	Streams, Active Media, Drive	Waiting	See Example 1.	Job checks for necessary resources.
		Waiting	See Example 2.	If the resources are not available, the job retries to reserve the resources whenever they are freed.
				Does not hold on to any resource until all the necessary resources are available.
Data Recovery Operations (for File System-like agents)	Drive	Pending	The media is already reserved by some other job(s).	If the resources are not available, the job retries to reserve the resources whenever they are freed.
Data Recovery Operation (for Database-like agents)	Drive	Failed	See Example 1.	Job checks for necessary resources.
		Running	See Example 2.	If the resources are not available it retries every 2 minutes to reserve the resources.
				Does not hold on to any resource until all the necessary resources are available.
Index Restore Operation (Browse Backup Data)	Drive	Failed	See Example 1.	Job checks for necessary resources.
		Running	See Example 2.	If the resources are not available it retries every 2 minutes to reserve the resources.
				Does not hold on to any resource until all the necessary resources are available.
Auxiliary Copy	Destination Drives	Pending	See Example 1.	Job checks for necessary resources.
		Waiting		Job reserves 2 drives for source and destination media.
		Waiting	See Example 2.	If the above resources are not available, it retries every 2 minutes to reserve these resources.
				Does not hold on to any resource until all the necessary resources are available.
	Source Media	Running		Once the 2 drives and destination media are obtained, the job reserves the source media.
		Pending	See Example 2.	If the job encounters resource contention while reserving the source media (Example 2), it retries every 20 minutes, up to a maximum of 144 times, to obtain the source media.
				Holds on to the 2 drives and destination media as long as it is not interrupted and as long as the source media is available.
Synthetic Full	Streams, Destination Drives, Destination Media	Waiting	See Example 1.	Job checks for necessary resources.
		Waiting		Job reserves streams, marks active media full, reserves 2 drives and destination media.
		Waiting	See Example 2.	If the resources are not available the job retries to reserve the resources whenever they are freed.
				Does not hold on to any resource until all the necessary resources are available.
	Source Media	Running		Once the 2 drives and destination media are obtained, the job reserves the source media.
		Pending	See Example 2.	If the job encounters resource contention while reserving the source media (Example 2), it retries every 20 minutes, up to a maximum of 144 times, to obtain the source media.
				Holds on to the 2 drives and destination media as long as it is not interrupted.
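
The retry figures in the table imply an upper bound on how long an Auxiliary Copy or Synthetic Full job keeps trying to obtain the source media. A short sketch of that arithmetic follows. The constant and function names are hypothetical, and the calculation assumes the retries are evenly spaced.

```python
from datetime import timedelta

SOURCE_MEDIA_RETRY_INTERVAL = timedelta(minutes=20)  # from the table above
SOURCE_MEDIA_MAX_RETRIES = 144                       # from the table above


def max_retry_window(interval: timedelta = SOURCE_MEDIA_RETRY_INTERVAL,
                     retries: int = SOURCE_MEDIA_MAX_RETRIES) -> timedelta:
    """Return the longest time a job spends retrying for the source media."""
    return interval * retries


# 144 retries at 20-minute intervals span 2880 minutes, that is, 48 hours.
print(max_retry_window())
```

For comparison, the default reservation cleanup interval of 1440 minutes is half of this window.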
