Seeding a Deduplicated Storage Policy

Table of Contents

Overview

Prerequisites

Configuration

Perform Seeding

Overview

Data transfers across high-latency networks, such as Wide Area Networks (WANs), can be time consuming. This is especially true for baseline backups, where most of the data is unique and must be transferred in full.

This section explains how to manually transfer a baseline backup between two sites using readily available removable disks, such as USB disks. As part of this process, a pre-seeded source-side deduplication database is created. Signatures are then looked up in this database locally instead of across the network, which speeds up signature lookups and improves the overall data transfer speed.

This process is useful when remote office sites are separated from the data center by a WAN and data must either be backed up remotely or periodically replicated to central data center sites. Once the initial baseline is established, all subsequent backup and Auxiliary Copy operations consume less network bandwidth because only the changes are transferred.
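The local signature lookup described above can be illustrated with a minimal sketch. This is not the product's implementation: the block size, the use of SHA-256 as the signature, the in-memory set standing in for the source-side deduplication database, and all function names are assumptions made only for illustration.

```python
import hashlib
from typing import Iterator, Set, Tuple

BLOCK_SIZE = 128 * 1024  # hypothetical block size, chosen only for this sketch


def signature(block: bytes) -> str:
    """Return a content signature for one block (SHA-256 is an assumption)."""
    return hashlib.sha256(block).hexdigest()


def split_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> Iterator[bytes]:
    """Yield fixed-size blocks of the input data."""
    for offset in range(0, len(data), block_size):
        yield data[offset:offset + block_size]


def seed_cache(baseline: bytes, block_size: int = BLOCK_SIZE) -> Set[str]:
    """Build a stand-in for the seeded source-side database from baseline data."""
    return {signature(block) for block in split_blocks(baseline, block_size)}


def copy_with_local_lookup(
    data: bytes, seeded: Set[str], block_size: int = BLOCK_SIZE
) -> Tuple[int, int]:
    """Count blocks that would be sent over the WAN versus skipped.

    Each signature is checked against the local cache only; no network
    round trip is needed to decide whether a block is already known.
    """
    sent = skipped = 0
    for block in split_blocks(data, block_size):
        sig = signature(block)
        if sig in seeded:
            skipped += 1      # block already exists at the data center
        else:
            seeded.add(sig)   # remember the new block for later jobs
            sent += 1
    return sent, skipped


if __name__ == "__main__":
    baseline = b"AAAA" + b"BBBB" + b"CCCC"
    cache = seed_cache(baseline, block_size=4)
    changed = baseline + b"DDDD"
    print(copy_with_local_lookup(changed, cache, block_size=4))  # (1, 3)
```

In this sketch, only the one new block counts as sent after seeding, which mirrors the statement that subsequent operations transfer only the changes.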

The diagram above represents the initial setup for the seeding to work.

How It Works

The storage policy is configured with three copies: Primary, Secondary, and Tertiary.

The seeding process works as follows:

Prerequisites

You should have the following setup configuration:

 

Components

Example

1. Configure a Local Shared Library (Library1) on the MediaAgent1 computer located at the remote office site.

In the screenshot displayed on the right, matador6_3 is MediaAgent1 at the remote office site and MA1_Lib1 is Library1.

2. Configure a Shared Library (Library2) between the MediaAgent1 and MediaAgent2 computers, on a USB drive connected at the remote office site.

In the screenshot displayed on the right, MediaAgent1 is matador6_3, MediaAgent2 is matador3_2, and the Shared Disk Library is Lib2_USB_MA1_MA2.

3. Configure a Local Library (Library3) on the MediaAgent2 computer located at the data center site. This library is used for the Auxiliary Copy operation.

Configuration

Pre-configure your setup with the following steps, which involve creating a new storage policy and its copies:

1. Create a new storage policy on the MediaAgent1 computer and specify Library1 as the library with which the Primary Copy (Copy1) is associated.
  • From the CommCell Console, navigate to Policies, right-click the Storage Policies node and click New Storage Policy.
  • Follow the prompts displayed in the Storage Policy Wizard.

    During creation, in the Do you Want to enable Deduplication for primary copy? dialog box, make sure Deduplication is selected and then click Next.

2. Create a Secondary Copy (Copy2) on the MediaAgent1 computer and specify Library2 as the library with which the Secondary Copy is associated.
  • Right-click the storage policy just created and click All Tasks | Create Copy.
  • Specify <Library2> as the library and <MediaAgent1> as the MediaAgent for the Default Destination.
  • Select the Deduplication check box.
  • Click OK.
3. Create a Tertiary Copy (Copy3) on the MediaAgent2 computer and specify Library3 as the library with which the Tertiary Copy is associated.
  • Right-click the storage policy and click All Tasks | Create Copy.
  • Specify <Library3> as the library and <MediaAgent2> as the MediaAgent for the Default Destination.
  • Select the Enable Deduplication check box.
  • Click OK.

Perform Seeding

The following steps explain how to perform the seeding process:

1. Run a full backup of all the clients associated with the storage policy.

Use the following steps to perform the backup:

  • From the CommCell Console, navigate to Client Computers | <MediaAgent1>, right-click the Subclient and click Backup.
  • Select Full as the backup type.
  • Click OK.
2. Run an Auxiliary Copy job to copy all the jobs from the Primary Copy (Copy1) to the Secondary Copy (Copy2). Use the following steps to run the Auxiliary Copy:
  1. From the CommCell Browser, navigate to Policies | Storage Policies.
  2. Right-click <Storage_Policy>, point to All Tasks and then click Run Auxiliary Copy.
  3. Click OK.

The backup data is now stored on both the local library (Copy1) and the USB drive (Copy2) at the remote office site.

3. After the data copy completes, unplug the USB drive and ship it to the data center.

Once the USB disk is available at the data center, plug it in and perform the following:

  • Navigate to Policies | Storage Policies, right-click the <secondary copy> and click Properties.
  • Click the Data Path tab.

    You should see Library2 enabled on both MediaAgent1 and MediaAgent2 computers.

  Once you unplug the USB drive at the remote office, Library2 goes offline on MediaAgent1.
4. Create the registry key UseCacheDB on the MediaAgent2 computer (data center site).

This registry key creates a source-side deduplication database that is seeded with signatures during the Auxiliary Copy process.

See Managing Registry Keys from the CommCell Console for more information.
5. Associate the Secondary Copy as the source copy for the Tertiary Copy.
  • Right-click the <tertiary copy> and click Properties.
  • Click the Copy Policy tab.
  • Under Source Copy, select the Specify Source for Auxiliary Copy check box, and select the Secondary Copy from the list.
6. Enable DASH Copy on the Tertiary Copy.
  • Right-click the <tertiary copy> (Copy3) and click Properties.
  • Click the Deduplication tab, then the Advanced tab.
  • Click Enable DASH Copy and select Network Optimized Copy.
  • Click OK.
7. Run an Auxiliary Copy job.
  • From the CommCell Console, navigate to Policies | Storage Policies.
  • Right-click your <storage policy> and click All Tasks | Run Auxiliary Copy.
    Do not run Auxiliary Copy jobs from the remote office site until, after seeding, the source-side database has been copied from the MediaAgent2 computer to the MediaAgent1 computer.
  • Click OK.

    This seeds the deduplication policy and creates a source-side deduplication database on the MediaAgent2 computer under the Job Results folder.

8. Manually copy the seeded source-side database from the Job Results folder of MediaAgent2 back to MediaAgent1.

If a Global Deduplication Policy is in use and multiple storage policies point to the same Global Deduplication Storage Policy (GDSP), you must copy the seeded database from the Job Results folder of each computer to the Job Results folder of the respective source MediaAgent(s).

Example

Copy CV_CLDB_AUX_98 from the Job Results folder of MediaAgent2 to the Job Results folder of MediaAgent1:

C:\<software>\iDataAgent\JobResults\CV_CLDB\CV_CLDB_AUX_98

9. Delete the seeded database from the Job Results folder of MediaAgent2.
10. After the seeding process, re-associate the Primary Copy as the source copy for the Tertiary Copy.
  • Right-click <tertiary copy> (Copy3) and then click Properties.
  • Click the Copy Policy tab.
  • Under Source Copy, clear the Specify Source for Auxiliary Copy check box so that Copy1 (Primary Copy) is used as the source copy during Auxiliary Copy operations.
  • Delete the Secondary Copy (Copy2), as it is no longer needed.
11. Run a full backup followed by an Auxiliary Copy job.

Only a minimal amount of data should now be transferred between MediaAgent1 and MediaAgent2. Any backup or Auxiliary Copy job started at the remote office site now verifies data signatures against the seeded source-side deduplication database. If a signature is already present in that database, the data block is already available at the data center and is not transferred.
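Steps 8 and 9 above (copying the seeded database to MediaAgent1 and then deleting it from MediaAgent2) could be scripted along the following lines. This is only a sketch under stated assumptions: it assumes the seeded database (for example, CV_CLDB_AUX_98) is a directory under a CV_CLDB subfolder of Job Results, that both Job Results folders are reachable from the machine running the script, and that no job is using the database during the copy. The function name and its parameters are hypothetical and not part of the product.

```python
import shutil
from pathlib import Path


def move_seeded_database(source_job_results: Path,
                         target_job_results: Path,
                         db_name: str) -> Path:
    """Copy a seeded database folder to the target, verify it, then delete the source.

    source_job_results: Job Results folder on MediaAgent2 (assumed reachable).
    target_job_results: Job Results folder on MediaAgent1 (assumed reachable).
    db_name: name of the seeded database folder, e.g. "CV_CLDB_AUX_98".
    """
    src = source_job_results / "CV_CLDB" / db_name
    dst = target_job_results / "CV_CLDB" / db_name

    if not src.is_dir():
        raise FileNotFoundError(f"Seeded database not found: {src}")
    if dst.exists():
        raise FileExistsError(f"Target already exists, refusing to overwrite: {dst}")

    # Step 8: copy the seeded database to the target Job Results folder.
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copytree(src, dst)

    # Basic sanity check: same relative file names and sizes on both sides.
    def listing(root: Path):
        return sorted(
            (str(p.relative_to(root)), p.stat().st_size)
            for p in root.rglob("*") if p.is_file()
        )

    if listing(src) != listing(dst):
        raise RuntimeError("Copy verification failed; source was not deleted")

    # Step 9: delete the seeded database from the source only after verification.
    shutil.rmtree(src)
    return dst
```

The source folder is removed only after the copy has been checked, so a failed or partial copy leaves the seeded database on MediaAgent2 intact.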