System Behavior when Replication is Interrupted
Replication is a continuous activity, and details of ongoing replication activity are shown in the Data Replication Monitor in the CommCell Console. See View Data Replication Monitor for step-by-step instructions.
From the Replication Monitor you can:
All other job-based activity, such as Recovery Point creation, is reflected in the Job Controller. See Controlling Jobs in Job Management for comprehensive information.
CDR uses phases to perform three types of operations: initial data transfer (baselining), smart synchronization, and continuous data replication. The sequence of these phases is listed below, along with details of CDR activities during each phase and the consequence of an interruption, such as a temporary loss of connectivity:
Job Phase | Associated Activity | Comments |
Baseline Scan | For Windows only, start NTFS journaling on the source to track any file operations that occur during the entire Baseline phase. Scan source path to obtain the number of files and bytes to transfer. Generate Collect File. | The Replication Pair will show a Job State of Preparing for Replication in the Data Replication Monitor. If this phase is interrupted, a Full Re-Sync will start at this phase. |
Baseline (For Windows) | Calculate checksums on the source and destination to identify files that will be sent to the destination. Data is transferred from the Replication Pair source path to the destination path using the checksums. | If this phase is interrupted, it can resume at the same point. |
SmartSync Scan | Create a non-persistent snapshot; for Windows, compare it to the change journal. Scan snapshot and generate a new Collect File for any files or directories that were added or data that was modified since the beginning of the Baseline Scan phase. |
For Windows:
For Unix:
If this phase is interrupted, a Smart Re-Sync will start at this phase. |
Processing Orphan Files | Compare the Collect File to the Destination to identify orphan files, and apply orphan file settings. | Any data that was deleted on the replication source during the Baselining phases is treated according to your settings for Orphan Files. If this phase is interrupted:
|
Checksum Calculation (On Windows only) | Calculate checksums on the source and destination to identify files that have changed since Baseline Scan. | If this phase is interrupted, it will restart from the beginning of this phase; however, if the snapshot is no longer available, it will return to the SmartSync Scan phase. |
SmartSync | Transfer all changed files to destination from the new Collect File. | If this phase is interrupted:
|
Updating Smart Sync (On Windows only) | Compare time stamps on source and destination and update. Temporary snapshot is deleted. | If this phase is interrupted, it will restart from the beginning of this phase. |
Replication | Data is continuously replicated from the source to destination. | Log Transfer & Log Replay activity is ongoing. For more information, refer to Replication Logs. The Replication Pair will show a Job State of Replicating in the Data Replication Monitor. If the Replication phase is interrupted, replication resumes on restart from the last log replayed on the destination where possible; otherwise, the Replication Pair returns to the Baseline Scan phase (a Full Re-Sync) or to the SmartSync Scan phase (a Smart Re-Sync), depending on the nature and duration of the interruption. Note that if a user manually restarts Replication by choosing Start Full Resync, the Replication Pair will return to the Baseline Scan phase. |
For the SmartSync Scan, new files and directories are copied in their
entirety, but modified files do not always need to be copied in full.
Modified files below a certain size threshold are copied again as complete
files, while files above that size are broken into blocks, and only the
changed blocks are copied to the destination computer.
Files smaller than 256 KB are copied in their entirety whether or not they match the destination. For files larger than 256 KB, only the changed blocks are transferred; the default block size for hashing is 64 KB. The default minimum file size and hashing block size can be configured in Replication Set Properties. See Create a Replication Set for step-by-step instructions. |
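The size threshold and block comparison described above can be illustrated with a minimal Python sketch. This is not CDR's implementation: the function names are hypothetical, SHA-256 is an assumed hash (the hashing algorithm CDR uses is not stated here), and a file of exactly 256 KB is treated as block-compared, which is also an assumption.

```python
import hashlib

MIN_FILE_SIZE = 256 * 1024  # default minimum file size for block-level transfer
BLOCK_SIZE = 64 * 1024      # default block size for hashing


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """Return one digest per fixed-size block (SHA-256 is an assumption)."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def blocks_to_send(source: bytes, destination: bytes,
                   min_file_size: int = MIN_FILE_SIZE,
                   block_size: int = BLOCK_SIZE) -> list[int]:
    """Return the indexes of the source blocks that would be transferred."""
    n_blocks = (len(source) + block_size - 1) // block_size
    if len(source) < min_file_size:
        # Small file: copied in its entirety, even if it matches the destination.
        return list(range(n_blocks))
    src = block_hashes(source, block_size)
    dst = block_hashes(destination, block_size)
    # Large file: send only blocks that are new or whose hash differs.
    return [i for i, h in enumerate(src) if i >= len(dst) or h != dst[i]]
```

With the default values, a 512 KB file that differs from the destination in a single byte yields one 64 KB block to send, while an unchanged 100 KB file still yields all of its blocks.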
By default, CDR handles interruptions by seamlessly restarting replication; if that is not possible, a Smart Re-Sync is started. However, some interruptions will require a Full Re-Sync. The following sections describe each phase and the restart behavior when the phase is interrupted:
Smart Re-Sync is the default behavior of CDR when activities are interrupted and cannot be seamlessly restarted at the same point again. In general, CDR endeavors to do the following in such cases, wherever possible:
For examples of common types of interruptions, and how Smart Re-Sync handles the recovery, refer to System Behavior when Replication is Interrupted.
For a detailed listing of each phase, and the specifics of the exact point at which Smart Re-Sync restarts activities, refer to Job Phases.
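The restart points given in the Job Phases table can be summarized as a small lookup. The sketch below is illustrative only: the names `Phase` and `restart_point` and the boolean inputs are hypothetical, and the Processing Orphan Files and SmartSync phases are left out because their restart behavior is not spelled out in the table.

```python
from enum import Enum


class Phase(Enum):
    BASELINE_SCAN = "Baseline Scan"
    BASELINE = "Baseline"
    SMARTSYNC_SCAN = "SmartSync Scan"
    CHECKSUM_CALCULATION = "Checksum Calculation"
    UPDATING_SMART_SYNC = "Updating Smart Sync"
    REPLICATION = "Replication"


def restart_point(phase: Phase, *, snapshot_available: bool = True,
                  log_resume_possible: bool = True,
                  full_resync_required: bool = False) -> tuple[Phase, str]:
    """Return (phase to restart in, position within that phase).

    The keyword flags are hypothetical inputs standing in for conditions
    that CDR evaluates internally.
    """
    if phase is Phase.BASELINE:
        # Baseline can resume at the point where it was interrupted.
        return (Phase.BASELINE, "same point")
    if phase is Phase.CHECKSUM_CALCULATION and not snapshot_available:
        # Without the snapshot, fall back to the SmartSync Scan phase.
        return (Phase.SMARTSYNC_SCAN, "start")
    if phase is Phase.REPLICATION:
        if log_resume_possible:
            return (Phase.REPLICATION, "last replayed log")
        # Otherwise a Full Re-Sync or a Smart Re-Sync is needed.
        fallback = Phase.BASELINE_SCAN if full_resync_required else Phase.SMARTSYNC_SCAN
        return (fallback, "start")
    # Remaining phases restart from the beginning of the same phase.
    return (phase, "start")
```

For example, `restart_point(Phase.CHECKSUM_CALCULATION, snapshot_available=False)` returns the SmartSync Scan phase, matching the table entry for a snapshot that is no longer available.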
Full Re-Sync should be necessary only in cases such as the following:
In such a case, all existing content in the destination path is considered inconsistent, and a Full Re-Sync is recommended to rebuild it from the current data in the specified source path. When you start replication from the Replication Set or Replication Pair level, you can specify Full Re-Sync, causing the Replication Pair to begin at the Baseline Scan phase.
Data Replication will be interrupted if a hard disk used for either a source or destination is put into the 'standby' state through the power scheme configuration. After such an event, it will be necessary to abort activity for all affected Replication Sets and restart them using Start Full Resync.
Changes to the following configuration items will not be effective until data replication activity has been interrupted and restarted:
The following will require data replication to be interrupted and restarted:
There are several ways in which data replication activity can be interrupted, and CDR recovers from each of them in a similar manner. The table below lists common causes of interruption and their effect on Baselining, SmartSync, and data replication, as well as how CDR recovers from them.
Interruption | Effect Of Interruption & Smart Re-Sync |
Abort a Replication Pair during Baselining phases | Baselining activities stop on the source. When the Replication Pair is restarted, Baselining activities will resume, restarting at the beginning of the phase if necessary, then SmartSync and data replication activities will begin automatically. |
Abort a Replication Pair during SmartSync phases | Logging stops on the source. When the Replication Pair is restarted, SmartSync activities will resume, restarting at the beginning of a phase if necessary, and data replication activities will begin automatically. |
Abort a Replication Pair during Replication phase | Logging stops on the source. When the Replication Pair is restarted, for NTFS or UNIX, Smart Re-Sync will continue the data replication activities automatically; for FAT file systems, Full Re-Sync will be necessary. |
Suspend a Replication Set | Baselining, SmartSync, and data replication activities stop for all Replication Pairs, but any logging activities will continue on the source. When the Replication Set is resumed:
|
Graceful or non-graceful shutdown of the source computer | The destination computer continues to replay the logs it has received. When the source computer and software are running again, Replication Pair(s) will be in the System Aborted state for some time, then Smart Re-Sync will be performed. |
Graceful or non-graceful shutdown of the destination computer | Logging continues on the source. When the destination computer and software are running again:
|
CDR software shutdown on the source | All CDR-related activities stop. When the software is restarted, CDR will start Smart Re-Sync. |
CDR software shutdown on the destination | Logging continues on the source.
|
Replication Service is stopped on the source | Baselining, SmartSync, and data replication activities stop for all Replication Pairs, but logging continues on the source, and the destination computer continues to replay the logs it had received before the service was stopped. When the Replication Service is started again:
|
Replication Service is suspended on the destination | Baselining, SmartSync, and data replication activities stop for all Replication Pairs, and log replay stops on the destination, but logging continues on the source. When the Replication Service is started again:
|
Interruption of network connectivity (source and/or destination) | Baselining, SmartSync, and data replication activities stop for all Replication Pairs, but logging continues on the source, and the destination computer continues to replay the logs it had received before the network connectivity was interrupted. When network connectivity is restored:
If the network interruption is for a significant amount of time, the following will occur:
|
Source computer runs out of log space (Windows) -- or -- Source computer tries to create new entries in a log before the old entries have been transferred to the destination (UNIX) |
Logging will stop, all logs will be deleted, and all Replication Pairs will be System Aborted.
|
For instructions on restarting replication after it has been interrupted, see Start/Suspend/Resume/Abort Data Replication Activity.
The Data Replication Monitor shows the state of each Replication Pair. These states are briefly described below:
New Pair | The Replication Pair has been created, but no activity has taken place yet. |
Preparing for Replication | CDR is scanning the source paths, preparing for initial transfer or Full Re-Sync. |
Baseline | For detailed information, see Baseline. |
Initial Sync | For detailed information, see Baseline Scan. |
SmartSync Scan | For detailed information, see SmartSync Scan. |
SmartSync | For detailed information, see SmartSync. |
Processing | For detailed information, see Processing Orphan Files. |
Replicating | Data is being continuously replicated. |
Replicating (Not verifiable) | The most recent communication between the CommServe and CDR Client indicated the job was in the Replicating state, but this cannot be verified because communication has been interrupted. |
Suspended | Replication activity has been temporarily halted, either by a user, or because communication between the source and destination has been interrupted. Logs continue to be written on the source. |
Pending | There has been a temporary interruption and CDR is attempting to reconnect and resume operations. |
Failed | Phase failed to complete, or log transfer has stopped, perhaps for connectivity issues; logs continue to be written on the source. |
Paused | CDR is trying to resume replication activity. |
Stopped | Replication activity has been halted by one of the following:
|
System Aborted | For CDR on Windows only, a Replication Pair will be in this state for 3 minutes if the source disk hosting replication logs runs out of space, after which the system will attempt to restart. |
To see more information about a particular Replication Pair, see View details of data replication activities.
You can change the state of a Replication Pair, or several at the same time. See Change the State of Replication Pair.
The following information is available in the Data Replication Monitor:
Active | A green symbol indicates recent activity for the Replication Pair; an orange symbol indicates no recent activity. An exclamation point preceding the symbol indicates that some files were not copied successfully to the destination computer during replication. For step-by-step instructions on viewing these files, see View the failed files for a Replication Pair. |
Phase | The current phase of the job; for more detailed information see Job Phases. |
General |
|
Job ID | A unique number allocated by the Job Manager for the operation. |
State | The current state of the Replication Pair; for more detailed information see Job States. |
Last Update Time | The date and time of the CommServe when the Job Manager last updated the Data Replication Monitor. |
Pair Abort Reason | For a Replication Pair that was aborted, the reason is listed. |
Last Error | The most recent error message for this Replication Pair. |
Initial Sync Information |
|
Start Time | The date and time of the CommServe when data replication activity began for the Replication Pair. |
Number of Files To Be Transferred | The files remaining to be transferred for the Replication Log file currently being replayed on the destination. |
Number of Files Already Transferred | The files transferred for the Replication Log file currently being replayed on the destination. |
Data To Be Transferred during Initial Sync On Source | The aggregate size of all files to be transferred between the source and destination for the Replication Pair. The actual data transferred may differ slightly from this number, based on whether a given file actually gets transferred in full or in part. |
Data Transferred during Initial Sync On Destination | The sum of all data already transferred between the source and destination for the Replication Pair. |
Throughput Unit | The rate of data transfer during Baseline phase, in GB/hour. |
Progress | The percentage of files transferred for the Replication Log file currently being replayed on the destination. |
Replicating State Information |
|
Last Log Played Time | The date and time of the CommServe when the most recent Replication Log was played on the destination computer. |
Replicated Data | The sum of all data transferred between the source and destination machines since the Start Time. |
Attempts | The number of attempts at replication the system has made for the Replication Pair. |
Latest Source Log | The number of the most recent Replication Log that was created on the source computer. |
Latest Destination Log | The number of the most recent Replication Log that was replayed on the destination computer. If this number is lower than the Latest Source Log number, it indicates that the destination computer has not yet replayed all of the Replication Logs that have been created on the source computer. |
Configuration |
|
Pair ID | A unique number allocated by the Job Manager that identifies the Replication Pair. |
Source Path | The path on the source computer for the Replication Pair. |
Destination Path | The path on the destination computer for the Replication Pair. |
Replication Set | The name of the Replication Set. |
Replication Type | The type of replication configured for the Replication Set. (See Data Replication Type.) |
Client | The CDR Client that is the source computer for the Replication Pair. |
Destination Host | The CDR Client that is the destination computer for the Replication Pair. |
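The relationship between the Latest Source Log and Latest Destination Log fields in the table above can be expressed as a simple backlog calculation. The sketch below is a hypothetical helper, not part of the product, and it assumes that Replication Log numbers are consecutive integers.

```python
def replay_backlog(latest_source_log: int, latest_destination_log: int) -> int:
    """Return how many Replication Logs the destination has yet to replay.

    Assumes log numbers increase by one per log; a destination number at or
    above the source number is treated as no backlog.
    """
    return max(latest_source_log - latest_destination_log, 0)
```

A non-zero result corresponds to the case described for Latest Destination Log, where the destination computer has not yet replayed all of the Replication Logs created on the source computer.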
The following information is available in the Attempts window:
Phase | The phase that the Replication Pair was in at the time of the attempted activity. |
State | Current state of the Replication Pair. |
Start Time | The date and time of the CommServe when the attempted activity began for the Replication Pair. |
End Time | The date and time of the CommServe when the attempted activity ended for the Replication Pair. |
Elapsed Time | The amount of time that elapsed while the activity was being attempted for the Replication Pair. |
Files to Transfer | Files to be transferred to the destination computer for the Replication Pair, based on the initial scan. |
Files Transferred | Files already transferred to the destination computer for the Replication Pair. |
Data Transferred | The sum of all data already transferred between the source and destination during the attempted activity. |
Data to Transfer | The aggregate size of all files to be transferred between the source and destination for the Replication Pair. The actual data transferred may differ slightly from this number, based on whether a given file actually gets transferred in full or in part. |