Deduplication - How To


General

Enable Deduplication for a Primary Copy

Enable Deduplication for a Secondary Copy

Configure Deduplication options for Storage Policy Copies

Configure Deduplication for a Subclient

Configure Block Size for Block Level Deduplication

Suspend/Resume Deduplication

Rebooting a MediaAgent Hosting the Deduplication Store

Deduplication Store

Evaluate a Disk for Hosting the Deduplication Store

Change the MediaAgent Hosting Deduplication Store

Change the Location of Deduplication Store

Configure Deduplication Store Creation

Seal the Active Deduplication Store

Deduplication Tools

Measure Disk Performance

Create a Deduplication Store Dump File

Generate Deduplication Store Statistics

Re-Index the Deduplication Store

Compact the Deduplication Store Database


Enable Deduplication for a Primary Copy

Required Capability: See Capabilities and Permitted Actions

Before You Begin

To enable deduplication for a primary storage policy copy:

  1. From the CommCell Browser, right-click the Storage Policies node, and select New Storage Policy from the shortcut menu.
  2. The Storage Policy Wizard guides you through the process of creating a storage policy. In creating the storage policy, provide the following options for deduplication. Note that the options provided are only applicable to the primary copy.
    If you select No, you cannot enable deduplication on the primary copy at a later time. However, you can enable deduplication on newly created secondary copies. See Enable Deduplication for a Secondary Copy for details.
  3. The Review Summary window is displayed. Review your selections and then click Cancel, Back (to return to a previous window to change a selection), or Finish (to exit and create the storage policy).
  4. See Configure Deduplication options for Storage Policy Copies to configure deduplication options for the storage policy copy.

Enable Deduplication for a Secondary Copy

Required Capability: See Capabilities and Permitted Actions

Before You Begin

To enable deduplication for a secondary storage policy copy:

  1. From the CommCell Browser, right-click the storage policy for which you wish to create the secondary copy, click All Tasks and then click Create New Copy. Configure the necessary options in the Copy Properties (Retention) and Copy Properties (Copy Policy) tabs.
  2. From the General tab of the Copy Properties dialog box, select the Library and MediaAgent, and then select the Enable deduplication option. The Deduplication tab is enabled.

    Deduplication can only be enabled for storage policy copies associated with a magnetic library.

  3. From the Storage Policy Properties (Store Information) tab, select the Deduplication Type, Object Level or Block Level. If required, modify the default name of the deduplication store in the Deduplication Database Store Name box. If you select block level deduplication, set the block size for deduplication in the storage policy properties. See Configure Block Size for Block Level Deduplication for step-by-step instructions.

    Note the following when selecting the deduplication type.

  4. From the Copy Properties (Deduplication - Store Information), create the deduplication store. Review Disk Specifications for Hosting the Deduplication Database for recommendations on the deduplication database location.

    Click the Add button and provide the following details in the Deduplication Store Access Path dialog box:

    The store information is displayed in the Deduplication Store Access Path area in the Copy Properties (Deduplication - Store information) dialog box.

  5. See Configure Deduplication options for Storage Policy Copies to configure deduplication options for the storage policy copy.

Configure Block Size for Block Level Deduplication

Required Capability: See Capabilities and Permitted Actions

Before You Begin

To configure the block size for block level deduplication:

  1. From the CommCell Browser, right-click the desired storage policy and select Properties from the shortcut menu.
  2. From the Storage Policy Properties (Advanced) tab, select the desired block size in the Block Level Deduplication Factor field. The minimum block size is 32 KB, and the maximum block size is 512 KB. The default block size is 128 KB. Note that the block size is applicable to all copies in the storage policy.
    For VMware data, set the Block Level Deduplication Factor to 32 KB to achieve optimal deduplication results.
  3. Click OK to save the changes.
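
The limits in step 2 can be sketched as a small validation helper. This is an illustrative Python sketch and not part of the product: the function name choose_block_size_kb and its vmware_data parameter are hypothetical. Only the 32 KB to 512 KB range, the 128 KB default, and the 32 KB recommendation for VMware data come from the text above. The sketch checks the range only; it does not know which intermediate values the Block Level Deduplication Factor field actually offers.

```python
MIN_BLOCK_KB = 32       # minimum block size stated in the procedure
MAX_BLOCK_KB = 512      # maximum block size stated in the procedure
DEFAULT_BLOCK_KB = 128  # default block size stated in the procedure
VMWARE_BLOCK_KB = 32    # recommended value for VMware data


def choose_block_size_kb(requested_kb=None, vmware_data=False):
    """Return a block size in KB that respects the documented limits.

    Hypothetical helper: in the product this value is set in the GUI.
    """
    if requested_kb is None:
        # No explicit request: use the VMware recommendation or the default.
        return VMWARE_BLOCK_KB if vmware_data else DEFAULT_BLOCK_KB
    if not MIN_BLOCK_KB <= requested_kb <= MAX_BLOCK_KB:
        raise ValueError(
            f"block size must be between {MIN_BLOCK_KB} and {MAX_BLOCK_KB} KB"
        )
    return requested_kb


if __name__ == "__main__":
    print(choose_block_size_kb())                  # documented default
    print(choose_block_size_kb(vmware_data=True))  # VMware recommendation
```

Because the block size applies to all copies in the storage policy, a check like this would run once per storage policy rather than once per copy.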

Configure Deduplication Options for Storage Policy Copies

Required Capability: See Capabilities and Permitted Actions

To configure Deduplication options for storage policy copies:

  1. From the Copy Properties dialog box, select the Copy Properties (Deduplication - Advanced) tab.
  2. In the Redundancy Factor box, select the number of instances of each deduplicated object/block to be created in storage. The default value is 1.
  3. Select the minimum size of objects to be deduplicated in the Minimum Size of Deduplicable Object field. This option is applicable only for object level deduplication. The default value is 50 KB.
  4. Select the age of a primary object/block that can be used for deduplication reference in the Do not Deduplicate against objects older than field. The default is the value of the retention set for this copy; for infinite retention, the default is set to 90 days. The value can be set to a maximum of 1825 days.
    • To obtain optimal results, we recommend that the values for Minimum Size of Deduplicable Object and Do not Deduplicate against objects older than are not set below the default values.
  5. Select the absolute free space always required in the volume in which the deduplication store is configured in the Minimum Free Space field. The default value is set to 0 MB.
  6. In the Free Space Warning field, select the free space threshold for the volume in which the deduplication store is configured. When free space falls to this value, a low disk space alert is generated, if configured. The default value is 1024 MB.
  7. Select the Enable Software Compression with Deduplication field to enable software compression for all subclients associated with this storage policy copy. This option is enabled by default. It is recommended to have data compression enabled when using deduplication.

    Note this option supersedes the compression option set in the corresponding subclients.

  8. Click OK to save the deduplication store options for the storage policy copy.
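
The defaults and recommendations in the steps above can be summarized in a short Python sketch. It is only an illustration: the class DedupCopyOptions and the functions default_reference_age and review are hypothetical names, and using None to stand for infinite retention is an assumption of the sketch. The numeric values (redundancy factor 1, 50 KB, 90 days, 1825 days, 0 MB, 1024 MB, compression enabled) are the ones stated in the procedure.

```python
from dataclasses import dataclass
from typing import List, Optional

MAX_REFERENCE_AGE_DAYS = 1825         # documented maximum
DEFAULT_MIN_OBJECT_KB = 50            # documented default (object level only)
INFINITE_RETENTION_DEFAULT_DAYS = 90  # documented default for infinite retention


def default_reference_age(retention_days: Optional[int]) -> int:
    """Default for 'Do not Deduplicate against objects older than'.

    retention_days=None means infinite retention (sketch convention).
    """
    if retention_days is None:
        return INFINITE_RETENTION_DEFAULT_DAYS
    return retention_days


@dataclass
class DedupCopyOptions:
    """Hypothetical container for the fields on the Advanced tab."""
    reference_age_days: int
    redundancy_factor: int = 1
    min_object_size_kb: int = DEFAULT_MIN_OBJECT_KB
    min_free_space_mb: int = 0
    free_space_warning_mb: int = 1024
    software_compression: bool = True


def review(options: DedupCopyOptions, retention_days: Optional[int]) -> List[str]:
    """List the settings that depart from the documented recommendations."""
    notes = []
    if options.reference_age_days > MAX_REFERENCE_AGE_DAYS:
        notes.append("reference age exceeds the 1825-day maximum")
    if options.reference_age_days < default_reference_age(retention_days):
        notes.append("reference age is below the default")
    if options.min_object_size_kb < DEFAULT_MIN_OBJECT_KB:
        notes.append("minimum object size is below the 50 KB default")
    if not options.software_compression:
        notes.append("software compression is disabled")
    return notes


if __name__ == "__main__":
    opts = DedupCopyOptions(reference_age_days=default_reference_age(None))
    print(opts)
    print(review(opts, None))
```

The sketch does not model what the product does when the copy retention itself is longer than 1825 days; the text above does not say.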

Configure Deduplication for a Subclient

Required Capability: See Capabilities and Permitted Actions

Before you Begin

To configure deduplication for a subclient:
  1. From the CommCell Browser, right-click the subclient for which you wish to enable (or disable) deduplication and then click Properties.
  2. Click the Storage Device tab and then click the Deduplication tab.
  3. Select one of the following options for signature generation.

    By default, signature generation is set On Client. Note that signature generation is performed only if the subclient is associated with a storage policy copy that is deduplication enabled.

  4. Click OK to save the changes.

Suspend/Resume Deduplication

Required Capability: See Capabilities and Permitted Actions

Before You Begin

Review Disable Deduplication

To temporarily suspend or resume deduplication for storage policy copies:

  1. From the Copy Properties dialog box, select the Copy Properties (Deduplication - Store Information) tab.
  2. Clear the Active option to temporarily suspend deduplication. Note that when a storage policy copy is deduplicated, this option is enabled by default.

    Select the option to resume deduplication.

  3. Click OK to save the changes.

Evaluate a Disk for Hosting the Deduplication Store

Required Capability: See Capabilities and Permitted Actions

This tool is used to estimate the performance of the disk where you plan to create the Deduplication Store.

Before You Begin

To evaluate a disk for hosting the deduplication store:

  1. Locate the SIDB2 tool at <software installation path>/Base folder.

    If you are not operating in turbo mode (if you can locate .fcs files in the database directory, then you are not operating in turbo mode), use the SIDB tool instead of SIDB2. All other options remain the same.

  2. The tool can be used with the following options:

    SIDB2 -simulateddb -p <SidbLocation> -in <Instance#> [-datasize] [-dratio] [-blocksize] [-tlimit] [-diskperf -tpath] [-user] [-password] [-domain]

    where:

  3. For the details on the projected average transaction time for an insert/query in the deduplication database based on the size of the application data that is backed up, use the tool with the -simulateddb and -datasize options.

    For example:

    sidb2 -simulateddb -in instance001 -p d:\dedup_store -datasize 500

    Sample output:

    The disk is capable of hosting a deduplication DB for:

    0.500 TB of Application Data Size

    0.100 TB of data on disk

    146.0 microseconds average Q&I overhead perblock

    Throughput for DDb server 3156 GB per Hour

  4. For recommendations on the maximum application data size that can be backed up using the store based on the average access time for each record, use the tool with the -simulateddb and -tlimit options.

    For example:

    sidb2 -simulateddb -in instance001 -p d:\dedup_store -tlimit 150

  5. For recommendations on disk performance, use the tool with the -simulateddb and -diskperf options.

    For example:

    sidb2 -simulateddb -in instance001 -p d:\dedup_store -datasize 100 -diskperf -tpath d:\disktest
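
The three sample invocations above differ only in their optional arguments, which a small Python helper can make explicit. This is a sketch under stated assumptions: the function build_simulateddb_args is hypothetical, it only assembles an argument list, and it uses only the options that appear in the examples above (-in, -p, -datasize, -tlimit, -diskperf, -tpath). It does not run the tool.

```python
from typing import List, Optional


def build_simulateddb_args(
    instance: str,
    sidb_location: str,
    datasize: Optional[int] = None,
    tlimit: Optional[int] = None,
    diskperf_path: Optional[str] = None,
) -> List[str]:
    """Assemble a sidb2 -simulateddb argument list (hypothetical helper)."""
    args = ["sidb2", "-simulateddb", "-in", instance, "-p", sidb_location]
    if datasize is not None:
        # Step 3: projected transaction time for a given application data size.
        args += ["-datasize", str(datasize)]
    if tlimit is not None:
        # Step 4: maximum data size for a given average access time.
        args += ["-tlimit", str(tlimit)]
    if diskperf_path is not None:
        # Step 5: disk performance test against the given test path.
        args += ["-diskperf", "-tpath", diskperf_path]
    return args


if __name__ == "__main__":
    print(" ".join(build_simulateddb_args(
        "instance001", r"d:\dedup_store", datasize=500)))
    print(" ".join(build_simulateddb_args(
        "instance001", r"d:\dedup_store", tlimit=150)))
    print(" ".join(build_simulateddb_args(
        "instance001", r"d:\dedup_store", datasize=100,
        diskperf_path=r"d:\disktest")))
```

The returned list could be passed to subprocess.run from the Base folder on the MediaAgent; that step is left out here because the tool is not available outside an installation.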


Rebooting a MediaAgent Hosting the Deduplication Store

To reboot a Windows MediaAgent:

  1. Open the Service Control Manager and stop the services on the MediaAgent computer.
  2. When the services are stopped, open the Windows Task Manager.
  3. Select the Processes tab and locate the SIDB.exe or SIDB2.exe process. If either of the processes is located, then wait until the process is complete. Depending on the size of the deduplication store, this process might take as long as 30 minutes to complete.
  4. Once the process is complete and no longer displayed on the task manager, reboot the computer.

To reboot a Unix MediaAgent:

  1. Log on to the computer as root and stop the services. At the command line prompt, type the Calypso stop command and press Enter.
  2. When the services are stopped, type ps -aef | grep sidb to view all the deduplication processes that are still running.
  3. If either the SIDB.exe or SIDB2.exe process is found running, then wait until the process is complete. Depending on the size of the deduplication store, this process might take as long as 30 minutes to complete.
  4. Repeat Step 2 to confirm that the processes are no longer running and then reboot the computer.
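
The wait in steps 2 to 4 of the Unix procedure can be automated with a short polling loop. The following Python sketch is an illustration only: the function names, the 30-second poll interval, and the use of the 30-minute figure as a timeout are assumptions, and the match is case-insensitive, which is broader than the grep sidb command shown above.

```python
import subprocess
import time
from typing import List


def sidb_lines(ps_output: str) -> List[str]:
    """Return ps lines that mention sidb, ignoring the grep command itself."""
    return [
        line
        for line in ps_output.splitlines()
        if "sidb" in line.lower() and "grep" not in line
    ]


def wait_for_sidb_exit(poll_seconds: int = 30, timeout_seconds: int = 1800) -> bool:
    """Poll 'ps -aef' until no sidb process remains.

    Returns True when it is safe to reboot, False if the timeout expires.
    Do not give this script a name containing 'sidb', or it will match itself.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        result = subprocess.run(
            ["ps", "-aef"], capture_output=True, text=True, check=True
        )
        if not sidb_lines(result.stdout):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_seconds)
```

A caller would stop the services first, then reboot only if wait_for_sidb_exit() returns True.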

Change the MediaAgent Hosting Deduplication Store

Required Capability: See Capabilities and Permitted Actions

Before You Begin

To change the MediaAgent hosting the deduplication store:

  1. From the Copy Properties dialog box, select the Copy Properties (Deduplication - Store Information) tab.
  2. Select the Deduplication Store Access Path and click the Change Host button.
  3. Perform the following in the Deduplication Store Access Path dialog box:
    The contents of the current deduplication store should be manually copied to the new location when the MediaAgent is changed.
  4. Click OK to save the deduplication store options for the storage policy copy.

Change the Location of Deduplication Store

Required Capability: See Capabilities and Permitted Actions

Before You Begin

To change the location of the deduplication store:

  1. From the Copy Properties dialog box, select the Copy Properties (Deduplication - Store Information) tab.
  2. Select the Deduplication Store Access Path and click the Properties button.
  3. Perform the following in the Deduplication Store Access Path dialog box:

    The store information is displayed in the Deduplication Store Access Path area in the Copy Properties (Deduplication - Store information) dialog box.

  4. Click OK to save the deduplication store options for the storage policy copy.

Configure Deduplication Store Creation

Required Capability: See Capabilities and Permitted Actions

Before You Begin

To configure the deduplication store creation:

  1. From the CommCell Browser, right-click the Storage Policy Copy for which you wish to configure the deduplication store creation, and then click Properties.
  2. Select the Copy Properties (Deduplication - Store Information) tab.
  3. From the Deduplication Store Creation options:

    Note that if both the options are set, a new deduplication store will be created if either one of the two conditions is satisfied.

  4. Click OK to save the changes.

Seal the Active Deduplication Store

Required Capability: See Capabilities and Permitted Actions

To seal the active deduplication store:

  1. From the CommCell Browser, right-click the Storage Policy Copy for which you wish to seal the active deduplication store, click All Tasks, and then click Seal Deduplication Store.
  2. Click Yes on the confirmation dialog.
  3. The current active deduplication store will be sealed and the deduplication of data on that store will be self-contained.

Measure Disk Performance

Before you Begin

To measure the disk performance using CvDiskPerf tool:

  1. Locate the CvDiskPerf tool at <software installation path>/Base folder.
  2. Run the following command from the command prompt:

    CvDiskPerf -READWRITE -PATH <path> -RANDOM -BLOCKSIZE <blocksize> -BLOCKCOUNT <blockcount> -FILECOUNT <filecount> -USER <username> -PASSWORD <password> -DOMAIN <domain> -OUTFILE <outputfile>

    where:

    Consider the following sample commands:

    CvDiskPerf -READWRITE -PATH c:\temp -OUTFILE c:\temp\perf.txt

    CvDiskPerf -READWRITE -RANDOM -PATH c:\temp -OUTFILE c:\temp\perf.txt

    CvDiskPerf -READWRITE -RANDOM -PATH c:\temp -BLOCKSIZE 1024 -OUTFILE c:\temp\perf.txt

    CvDiskPerf -READWRITE -RANDOM -PATH c:\temp -BLOCKSIZE 1024 -BLOCKCOUNT 5 -FILECOUNT 500 -OUTFILE c:\temp\perf.txt

    CvDiskPerf -READWRITE -RANDOM -PATH c:\temp -USER commuser -PASSWORD commpw -OUTFILE c:\temp\perf.txt

  3. The details of the disk performance are stored in the output file provided in the -OUTFILE option. The contents of a sample output file are given below:

    DiskPerf Version        : 1.0

    Path Used               : f:\

    Read-Write type         : RANDOM

    Block Size              : 512

    Block Count             : 4096

    File Count              : 500

    Total Bytes Written     : 1048576000

    Time Taken to Write(S)  : 7.113515

    Throughput Write(GB/H)  : 494.217709

    Total Bytes Read        : 1048576000

    Time Taken to Read(S)   : 7.581667

    Throughput Read(GB/H)   : 463.700792
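
The figures in the sample output can be cross-checked with a few lines of Python. This sketch is illustrative: parse_report and throughput_gb_per_hour are hypothetical helpers, not part of CvDiskPerf, and the assumption that 1 GB means 2^30 bytes is inferred from the sample (it reproduces the reported throughput to about three decimal places). Under that reading, the total bytes equal block size × block count × file count.

```python
# Subset of the sample output shown above (raw string keeps the backslash).
SAMPLE = r"""
Path Used               : f:\
Block Size              : 512
Block Count             : 4096
File Count              : 500
Total Bytes Written     : 1048576000
Time Taken to Write(S)  : 7.113515
Throughput Write(GB/H)  : 494.217709
"""


def parse_report(text: str) -> dict:
    """Split 'Key : Value' lines into a dict; other lines are skipped."""
    fields = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, value = line.split(":", 1)  # split on the first colon only
        fields[key.strip()] = value.strip()
    return fields


def throughput_gb_per_hour(total_bytes: int, seconds: float) -> float:
    """Bytes moved in 'seconds', expressed in GB per hour (1 GB = 2**30 bytes)."""
    return total_bytes / 2**30 * 3600 / seconds


if __name__ == "__main__":
    report = parse_report(SAMPLE)
    expected_bytes = (
        int(report["Block Size"])
        * int(report["Block Count"])
        * int(report["File Count"])
    )
    print(expected_bytes == int(report["Total Bytes Written"]))
    print(throughput_gb_per_hour(
        int(report["Total Bytes Written"]),
        float(report["Time Taken to Write(S)"]),
    ))
```

The small remaining difference from the reported value is consistent with the elapsed time being rounded in the output file.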


Create a Deduplication Store Dump File

Before you Begin

To create a deduplication store dump file using the SIDB tool:

  1. Locate the SIDB tool at <software installation path>/Base folder.
  2. Run the following command from the command prompt:

    SIDB -dump <primary/secondary/statistics/said> <SIDB-location> [output file name]

    where:

        primary - option to generate the dump for actual objects/blocks in the deduplication store.

        secondary - option to generate the dump for duplicate objects/blocks.

        statistics - option to generate the dump for deduplication store statistics.

        said - option to generate the dump for information on distinct archive-IDs from the secondary table.

    Consider the following sample commands:

    SIDB -dump primary D:\production\2008\CV_SIDB\2\1 D:\production\sidbdump.csv

    SIDB -dump primary D:\production\2008\CV_SIDB\2\1

    SIDB -dump secondary D:\CV_SIDB\8\21 myoutput.csv

    SIDB -dump statistics D:\mdoc\CV_SIDB\3\7

    SIDB -dump said D:\db\cbkdb\CV_SIDB\3\1212

  3. The deduplication store dump file is created in the location provided in the output file name. The output is a .csv file containing comma separated values, and can be viewed using applications like Microsoft Excel, Notepad, etc.
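
The dump syntax above takes one of four dump types and an optional output file. A minimal Python sketch of that rule follows; build_dump_args is a hypothetical helper that only assembles and validates the argument list using the options listed in the procedure, and it does not run the SIDB tool.

```python
from typing import List, Optional

# The four dump types listed in the procedure above.
DUMP_TYPES = ("primary", "secondary", "statistics", "said")


def build_dump_args(
    dump_type: str,
    sidb_location: str,
    output_file: Optional[str] = None,
) -> List[str]:
    """Assemble a SIDB -dump argument list (hypothetical helper)."""
    if dump_type not in DUMP_TYPES:
        raise ValueError(f"dump type must be one of {', '.join(DUMP_TYPES)}")
    args = ["SIDB", "-dump", dump_type, sidb_location]
    if output_file is not None:
        # The output file name is optional in the documented syntax.
        args.append(output_file)
    return args


if __name__ == "__main__":
    print(" ".join(build_dump_args(
        "secondary", r"D:\CV_SIDB\8\21", "myoutput.csv")))
    print(" ".join(build_dump_args(
        "statistics", r"D:\mdoc\CV_SIDB\3\7")))
```

Rejecting an unknown dump type before the command is issued avoids running the tool against a production store with a mistyped argument.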

Generate Deduplication Store Statistics

Before you Begin

To generate deduplication store statistics using the SIDB tool:

  1. Locate the SIDB tool at <software installation path>/Base folder.
  2. Run the following command from the command prompt:

    SIDB -stat <SIDB-location>

    where:

  3. The deduplication store statistics information is displayed.

Re-Index the Deduplication Store

Before you Begin

To re-index the deduplication store using the SIDB tool:

  1. Locate the SIDB tool at <software installation path>/Base folder.
  2. Run the following command from the command prompt:

    SIDB -reindex <SIDB-location>

    where:

    For example, SIDB -reindex D:\production\2008\CV_SIDB\3\12

  3. The tables in the deduplication store will be re-indexed.

Compact the Deduplication Database

Before you Begin

To compact the deduplication database to optimize performance and manage database growth:

  1. Stop all the services associated with the MediaAgent hosting the deduplication database.
  2. Review the database directory for any .fcs files.
  3. This compacts and re-indexes the tables in the deduplication database.