Deduplication - How To
General
Enable Deduplication for a Primary Copy
Enable Deduplication for a Secondary Copy
Configure Deduplication options for Storage Policy Copies
Configure Deduplication for a Subclient
Configure Block Size for Block Level Deduplication
Suspend/Resume Deduplication
Rebooting a MediaAgent Hosting the Deduplication Store
Deduplication Store
Evaluate a Disk for Hosting the Deduplication Store
Change the MediaAgent Hosting Deduplication Store
Change the Location of Deduplication Store
Configure Deduplication Store Creation
Seal the Active Deduplication Store
Deduplication Tools
Measure Disk Performance
Create a Deduplication Store Dump File
Generate Deduplication Store Statistics
Re-Index the Deduplication Store
Compact the Deduplication Store Database
Enable Deduplication for a Primary Copy
Required Capability: See Capabilities and Permitted Actions
Before You Begin
To enable deduplication for a primary storage policy copy:
- From the CommCell Browser, right-click the Storage Policies node,
and select New Storage Policy from the shortcut menu.
- The Storage Policy Wizard guides you through the process of creating a
storage policy. In creating the storage policy, provide the following
options for deduplication. Note that the options provided are only
applicable to the primary copy.
- Click Yes to enable deduplication.
Note that if you select No, you cannot enable deduplication on the primary copy later. However, you can enable deduplication on newly created secondary copies. See Enable Deduplication for a Secondary Copy for details.
- Select the deduplication type, Object Level or Block Level.
If you select block level deduplication, set
the block size for deduplication in the storage policy properties.
See Configure Block Size for Block Level Deduplication for step-by-step
instructions.
- Provide a name for the Deduplication Store.
- Select the MediaAgent accessing the Deduplication Store. Note that
this can also be a MediaAgent outside of the deduplication data path.
- Provide the location for the Deduplication Store. Ensure that you
provide a valid path; if the path is invalid, then the storage policy
will be created without deduplication.
Note that the deduplication database must be located in a folder and not directly under the root of a disk volume.
- The Review Summary window is displayed. Review your selections
and then click Cancel, Back (to return to a previous window to
change a selection), or Finish (to exit and create the storage
policy).
- See Configure
Deduplication options for Storage Policy Copies to configure
deduplication options for the storage policy copy.
Enable Deduplication for a Secondary Copy
Required Capability: See Capabilities and Permitted Actions
Before You Begin
- Deduplication can be enabled only when the copy is created; it cannot be enabled afterward.
- Once enabled, deduplication cannot be disabled for the copy. See Disabling Deduplication for details.
To enable deduplication for a secondary storage policy copy:
- From the CommCell Browser, right-click the storage policy for which you
wish to create the secondary copy, click All Tasks and then click
Create New Copy. Configure the necessary options in the
Copy Properties (Retention) and
Copy Properties (Copy Policy) tabs.
- From the General tab of the Copy Properties dialog box, select the Library and MediaAgent, and then select the Enable Deduplication option. The Deduplication tab becomes available.
Note that deduplication can be enabled only for storage policy copies associated with a magnetic library.
- From the
Storage Policy Properties (Store Information) tab, select the Deduplication
Type, Object Level or Block Level. If required, modify the
default name of the deduplication store in the Deduplication
Database Store Name box.
If you select block level deduplication, set
the block size for deduplication in the storage policy properties.
See Configure Block Size for Block Level Deduplication for step-by-step
instructions.
Note the following when selecting the deduplication type:
- If the primary copy is deduplicated, this copy uses, by default, the deduplication type that was set for the primary copy when the storage policy was created.
- If the primary copy is not deduplicated, you can select the deduplication type.
- From the Copy Properties (Deduplication - Store Information) tab, create the deduplication store. Review Disk Specifications for Hosting the Deduplication Database for recommendations on the deduplication database location.
Click the Add button and provide the following details in the Deduplication Store Access Path dialog box:
- Select the MediaAgent Name from the list of available MediaAgents.
- Select the Enable Deduplication Store Access Path option to bring the specified deduplication store access path online.
- In the Location box, type the path to the deduplication store or use the Browse button to select the path.
- Click OK to save the deduplication store configuration in the
Deduplication Store Access
Path dialog box.
The store information is displayed in the Deduplication Store
Access Path area in the Copy Properties (Deduplication - Store
information)
dialog box.
- See Configure
Deduplication options for Storage Policy Copies to configure
deduplication options for the storage policy copy.
Configure Block Size for Block Level Deduplication
Required Capability: See Capabilities and Permitted Actions
Before You Begin
To configure the block size for block level deduplication:
- From the CommCell Browser, right-click the desired
storage policy and select Properties from the
shortcut menu.
- From the Storage Policy Properties (Advanced) tab, select the desired block size in the Block Level Deduplication Factor field. The minimum block size is 32 KB, the maximum is 512 KB, and the default is 128 KB. Note that the block size applies to all copies in the storage policy.
For VMware data, set the Block Level Deduplication Factor to 32 KB to achieve optimal deduplication results.
- Click OK to save the changes.
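The limits above can be captured in a small validation helper, for example in a script that prepares storage policy settings. This is a hypothetical sketch, not a product API: `choose_block_size_kb` and its parameters are invented names, and the helper checks only the documented 32 to 512 KB range and the 128 KB default, plus the 32 KB recommendation for VMware data.

```python
# Hypothetical helper (not a product API): picks a Block Level Deduplication
# Factor that respects the documented limits for a storage policy.
MIN_BLOCK_KB = 32       # documented minimum block size
MAX_BLOCK_KB = 512      # documented maximum block size
DEFAULT_BLOCK_KB = 128  # documented default block size


def choose_block_size_kb(requested_kb=None, vmware_data=False):
    """Return the block size in KB to configure for a storage policy.

    With no explicit request, use 32 KB for VMware data (as recommended
    above) and the 128 KB default otherwise. Explicit requests must fall
    within the documented 32-512 KB range.
    """
    if requested_kb is None:
        return MIN_BLOCK_KB if vmware_data else DEFAULT_BLOCK_KB
    if not MIN_BLOCK_KB <= requested_kb <= MAX_BLOCK_KB:
        raise ValueError(
            f"block size {requested_kb} KB is outside "
            f"{MIN_BLOCK_KB}-{MAX_BLOCK_KB} KB"
        )
    return requested_kb


print(choose_block_size_kb())                  # 128
print(choose_block_size_kb(vmware_data=True))  # 32
```

Because the block size applies to every copy in the storage policy, a helper like this would be called once per policy, not once per copy.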
Configure Deduplication options for Storage Policy Copies
Required Capability: See Capabilities and Permitted Actions
To configure deduplication options for storage policy copies:
- From the Copy Properties dialog box, select the
Copy Properties (Deduplication - Advanced) tab.
- In the Redundancy Factor box, select the number of instances of each deduplicated object/block to be created in storage. The default value is 1.
- In the Minimum Size of Deduplicable Object field, select the minimum size of objects to be deduplicated. This option applies only to object level deduplication. The default value is 50 KB.
- In the Do not Deduplicate against objects older than field, select the maximum age of a primary object/block that can be used as a deduplication reference. The default is the retention value set for this copy; for infinite retention, the default is 90 days. The maximum value is 1825 days.
To obtain optimal results, we recommend that you do not set Minimum Size of Deduplicable Object or Do not Deduplicate against objects older than below their default values.
- In the Minimum Free Space field, select the amount of free space that must always remain available on the volume hosting the deduplication store. The default value is 0 MB.
- In the Free Space Warning field, select the free-space threshold for the volume hosting the deduplication store. When free space reaches this value, a disk space low alert is generated, if the alert is configured. The default value is 1024 MB.
- Select the Enable Software Compression with Deduplication option to enable software compression for all subclients associated with this storage policy copy. This option is enabled by default, and we recommend keeping data compression enabled when using deduplication.
Note that this option supersedes the compression option set in the corresponding subclients.
- Click OK to save the deduplication store options for the storage policy
copy.
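The defaults described above can be summarized in a short sketch. All function names here are hypothetical, and representing infinite retention as `None` is an assumption of this sketch. It also does not model what the product does when free space falls below the minimum; it only labels that state.

```python
# Hypothetical sketch of the documented defaults on the
# Copy Properties (Deduplication - Advanced) tab; names are illustrative only.
INFINITE_RETENTION_AGE_DAYS = 90   # default age when retention is infinite
MAX_AGE_DAYS = 1825                # documented maximum for the age setting


def default_max_reference_age_days(retention_days):
    """Default for 'Do not Deduplicate against objects older than'.

    retention_days=None represents infinite retention (an assumption of
    this sketch, not how the product stores the value).
    """
    if retention_days is None:
        return INFINITE_RETENTION_AGE_DAYS
    return retention_days


def validate_max_reference_age_days(days):
    """Reject values above the documented 1825-day maximum."""
    if days > MAX_AGE_DAYS:
        raise ValueError(f"{days} days exceeds the {MAX_AGE_DAYS}-day maximum")
    return days


def free_space_state(free_mb, minimum_free_mb=0, warning_mb=1024):
    """Classify free space on the volume hosting the deduplication store.

    Defaults mirror the documented Minimum Free Space (0 MB) and
    Free Space Warning (1024 MB) values.
    """
    if free_mb < minimum_free_mb:
        return "below minimum free space"
    if free_mb <= warning_mb:   # "reaching" the warning value triggers the alert
        return "disk space low alert"
    return "ok"
```

With the default Minimum Free Space of 0 MB, the first branch can never fire, which matches the idea that a zero minimum imposes no reserve.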
Configure Deduplication for a Subclient
Required Capability: See Capabilities and Permitted Actions
Before You Begin
- Do not modify the properties of a subclient when a data protection job associated with the subclient is in progress.
To configure deduplication for a subclient:
- From the CommCell Browser, right-click the subclient for which you wish
to enable (or disable) deduplication and then click Properties.
- Click the Storage Device tab and then click the
Deduplication tab.
- Select one of the following options for signature generation:
- On Client: generates deduplication signatures on the client computer.
- On MediaAgent: generates deduplication signatures on the MediaAgent computer.
- Off: disables deduplication for the subclient.
By default, signature generation is set to On Client. Note that signatures are generated only if the subclient is associated with a deduplication-enabled storage policy copy.
- Click OK to save the changes.
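The rule above (where signatures are generated, and when they are generated at all) can be expressed as a small decision function. The enum labels mirror the option names on the Deduplication tab; the class and function names are hypothetical and used only for illustration.

```python
from enum import Enum


class SignatureGeneration(Enum):
    """Subclient signature generation options (labels from the Deduplication tab)."""
    ON_CLIENT = "On Client"
    ON_MEDIAAGENT = "On MediaAgent"
    OFF = "Off"


def signature_host(setting=SignatureGeneration.ON_CLIENT, copy_dedup_enabled=True):
    """Return where signatures are generated, or None if none are generated.

    Signatures are generated only when the subclient's storage policy copy
    has deduplication enabled and the subclient setting is not Off.
    """
    if setting is SignatureGeneration.OFF or not copy_dedup_enabled:
        return None
    if setting is SignatureGeneration.ON_CLIENT:
        return "client"
    return "MediaAgent"
```

The default argument reflects the documented default of On Client.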
Suspend/Resume Deduplication
Required Capability: See Capabilities and Permitted Actions
Before You Begin
Review Disable Deduplication
To temporarily suspend or resume deduplication for storage policy copies:
- From the Copy Properties dialog box, select the
Copy Properties (Deduplication - Store Information) tab.
- Clear the Active option to temporarily suspend deduplication. This option is selected by default for a deduplicated storage policy copy. Select the option again to resume deduplication.
- Click OK to save the changes.
Evaluate a Disk for Hosting the Deduplication Store
Required Capability: See Capabilities and Permitted Actions
This tool is used to estimate the performance of the disk where you plan to
create the Deduplication Store.
Before You Begin
- Create a directory on the disk where you wish to locate the deduplication store.
Note that the simulation tool deletes any existing contents of this directory, so ensure that the directory is empty.
- A user-interface version of this tool is available from the
Maintenance Advantage web site.
To evaluate a disk for hosting the deduplication store:
- Locate the SIDB2 tool in the <software installation path>/Base folder.
If you are not operating in turbo mode, use the SIDB tool instead of SIDB2; all other options remain the same. If the database directory contains .fcs files, you are not operating in turbo mode.
- The tool can be used with the following options:
SIDB2 -simulateddb -p <SidbLocation> -in <Instance#> [-datasize] [-dratio] [-blocksize] [-tlimit] [-samplesize] [-diskperf -tpath] [-keepddb] [-user] [-password] [-domain]
where:
- -simulateddb is the keyword to simulate
the deduplication database to evaluate the disk compatibility for hosting
the deduplication store.
- -p is the location (an empty directory)
where the deduplication store will be located.
- -in is the instance of the
software using the tool.
- -datasize is the application data size, in GB. Number.
- -dratio is the expected deduplication ratio. Number (default value is 5).
- -blocksize is the deduplication data block size, in KB. Number (default value is 128).
- -tlimit is the time limit, in microseconds, for the average access time per record. Number (default value is 1000).
- -samplesize is the size of the sample. Number (default value is 10000).
- -diskperf and -tpath: -diskperf is the keyword to measure disk performance, and -tpath is the path of the disk to be tested. The two options must be used together.
- -keepddb is the option to keep the deduplication database files. The
files are removed by default.
- To project the average transaction time for an insert/query in the deduplication database, based on the size of the application data to be backed up, use the tool with the -simulateddb and -datasize options.
For example:
sidb2 -simulateddb -in instance001 -p d:\dedup_store -datasize 500
Sample output:
The disk is capable of hosting a deduplication DB for:
0.500 TB of Application Data Size
0.100 TB of data on disk
146.0 microseconds average Q&I overhead perblock
Throughput for DDb server 3156 GB per Hour
- For recommendations on the maximum application data size that the store can support, based on the average access time for each record, use the tool with the -simulateddb and -tlimit options.
For example:
sidb2 -simulateddb -in instance001 -p d:\dedup_store -tlimit 150
- For recommendations on disk performance, use the tool with the -simulateddb and -diskperf options.
For example:
sidb2 -simulateddb -in instance001 -p d:\dedup_store -datasize 100 -diskperf -tpath d:\disktest
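The figures in the -datasize sample output can be reproduced with simple arithmetic, which helps when interpreting results for other data sizes. This sketch is a reading of the sample, not the tool's documented formula: it assumes 1 TB = 1000 GB, that the throughput line treats 1 GB as 1,000,000 KB, and it takes the measured 146.0 microsecond query-and-insert time as an input. The function name is hypothetical.

```python
def projected_ddb_figures(datasize_gb, dratio=5, blocksize_kb=128, avg_qi_us=146.0):
    """Reproduce the arithmetic behind the sample -simulateddb output.

    Assumptions (not documented by the tool): 1 TB = 1000 GB, and the
    throughput line treats 1 GB as 1,000,000 KB. avg_qi_us is the
    measured average query-and-insert time per block, in microseconds.
    """
    app_tb = datasize_gb / 1000            # application data size in TB
    on_disk_tb = app_tb / dratio           # expected data on disk after dedup
    blocks_per_hour = 3600 / (avg_qi_us * 1e-6)   # one Q&I per block
    throughput_gb_per_hour = blocks_per_hour * blocksize_kb / 1_000_000
    return app_tb, on_disk_tb, throughput_gb_per_hour


app_tb, on_disk_tb, gb_per_hour = projected_ddb_figures(500)
print(f"{app_tb:.3f} TB, {on_disk_tb:.3f} TB, {gb_per_hour:.0f} GB per hour")
# 0.500 TB, 0.100 TB, 3156 GB per hour
```

The default ratio of 5 and block size of 128 KB are the documented -dratio and -blocksize defaults, which is why the sample reports 0.100 TB on disk for 0.500 TB of application data.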
Rebooting a MediaAgent Hosting the Deduplication Store
To reboot a Windows MediaAgent:
- Open the Service Control Manager and stop the services on the MediaAgent computer.
- When the services are stopped, open Windows Task Manager.
- Select the Processes tab and look for the SIDB.exe or SIDB2.exe process. If either process is running, wait until it completes. Depending on the size of the deduplication store, this can take as long as 30 minutes.
- Once the process has completed and is no longer displayed in Task Manager, reboot the computer.
To reboot a Unix MediaAgent:
- Log on to the computer as root and stop the services. At the command prompt, type the Calypso stop command and press Enter.
- When the services are stopped, type ps -aef | grep sidb to view any deduplication processes that are still running.
- If the SIDB.exe or SIDB2.exe process is still running, wait until it completes. Depending on the size of the deduplication store, this can take as long as 30 minutes.
- Run ps -aef | grep sidb again to confirm that the processes are no longer running, and then reboot the computer.
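On Unix, the "wait until SIDB has exited" check above can be scripted before issuing the reboot. This is a hypothetical sketch that assumes the services have already been stopped. It runs `ps -aef` itself instead of piping to grep, matches "sidb" case-insensitively so that both SIDB and SIDB2 are caught, and uses a 45-minute timeout that is an arbitrary choice, not a documented value.

```python
import subprocess
import time


def sidb_lines(ps_output):
    """Return lines of `ps -aef` output that mention an SIDB process.

    Matching is case-insensitive so both SIDB and SIDB2 are caught;
    lines for a grep command are skipped.
    """
    return [
        line
        for line in ps_output.splitlines()
        if "sidb" in line.lower() and "grep" not in line
    ]


def wait_for_sidb_exit(poll_seconds=30, timeout_seconds=45 * 60):
    """Poll `ps -aef` until no SIDB process remains (Unix only).

    The 45-minute timeout is an arbitrary choice for this sketch; the
    text above only says the process can take up to 30 minutes.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        result = subprocess.run(
            ["ps", "-aef"], capture_output=True, text=True, check=True
        )
        if not sidb_lines(result.stdout):
            return
        if time.monotonic() > deadline:
            raise TimeoutError("SIDB process still running; do not reboot yet")
        time.sleep(poll_seconds)
```

Raising instead of rebooting on timeout keeps the script from interrupting an SIDB process that is still writing to the deduplication store.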
Change the MediaAgent Hosting Deduplication Store
Required Capability: See Capabilities and Permitted Actions
Before You Begin
- A Data Deduplication Enabler license is required for every MediaAgent hosting the deduplication store. See License Requirements for more details.
To change the MediaAgent hosting the deduplication store:
- From the Copy Properties dialog box, select the
Copy Properties (Deduplication - Store Information) tab.
- Select the Deduplication Store Access Path and click the Change Host
button.
- Perform the following in the Deduplication Store Access Path dialog box:
- Select the MediaAgent Name from the list.
- Select the Enable Deduplication Store Access Path option to bring the specified deduplication store access path online.
- In the Location box, type the path to the deduplication store or use the Browse button to select the path.
When you change the MediaAgent, manually copy the contents of the current deduplication store to the new location.
- Click OK to save the deduplication store options for the storage policy
copy.
Change the Location of Deduplication Store
Required Capability: See Capabilities and Permitted Actions
Before You Begin
To change the location of the deduplication store:
- From the Copy Properties dialog box, select the
Copy Properties (Deduplication - Store Information) tab.
- Select the Deduplication Store
Access Path and click the Properties button.
- Perform the following in the Deduplication Store Access Path dialog box:
- Click the Change button.
- Select the Enable Deduplication Store Access Path option to bring the specified deduplication store access path online.
- In the Location box, type the path to the deduplication store or use the Browse button to select the path.
- Click OK to save the deduplication store configuration in the Deduplication Store Access Path dialog box.
- In the Deduplication Store Data dialog box:
- Select Yes, if you wish to automatically copy all the
store contents to the new location.
- Select No, if you wish to manually copy all the store
contents to the new location.
The store information is displayed in the Deduplication Store
Access Path area in the Copy Properties (Deduplication - Store
information)
dialog box.
- Click OK to save the deduplication store options for the storage policy
copy.
Configure Deduplication Store Creation
Required Capability: See Capabilities and Permitted Actions
Before You Begin
To configure the deduplication store creation:
- From the CommCell Browser, right-click the Storage Policy Copy for which
you wish to configure the deduplication store creation, and then click
Properties.
- Select the Copy Properties (Deduplication - Store Information) tab.
- From the Deduplication Store Creation options:
- Select Create New Store Every - Days and specify the number of days after which a new deduplication store must be created. The default value is 30 days.
- Select Create New Store Every - TB and specify the store size at which a new deduplication store must be created. The default value is 100 TB.
Note that if both options are set, a new deduplication store is created when either condition is met.
- Click OK to save the changes.
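The "either condition" rule for store creation can be written as a one-line decision. This is a hypothetical sketch: the function name is invented, `None` stands for an option that is not selected, and treating "after N days / TB" as greater-than-or-equal is an assumption.

```python
def needs_new_store(age_days, size_tb, every_days=30, every_tb=100):
    """Decide whether a new deduplication store should be created.

    every_days / every_tb of None means that option is not selected.
    Using >= for "after N days / TB" is an assumption of this sketch.
    """
    by_age = every_days is not None and age_days >= every_days
    by_size = every_tb is not None and size_tb >= every_tb
    return by_age or by_size   # either condition is enough


print(needs_new_store(10, 120))   # True: the 100 TB size limit is reached
print(needs_new_store(10, 5))     # False: neither limit is reached
```

The defaults match the documented 30 days and 100 TB.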
Seal the Active Deduplication Store
Required Capability: See Capabilities and Permitted Actions
To seal the active deduplication store:
- From the CommCell Browser, right-click the Storage Policy Copy for which
you wish to seal the active deduplication store, click All Tasks, and
then click Seal Deduplication Store.
- Click Yes in the confirmation dialog box.
- The currently active deduplication store is sealed, and the deduplication of data on that store becomes self-contained.
Measure Disk Performance
Before You Begin
To measure disk performance using the CvDiskPerf tool:
- Locate the CvDiskPerf tool in the <software installation path>/Base folder.
- Run the following command from the command prompt:
CvDiskPerf -READWRITE -PATH <path> -RANDOM -BLOCKSIZE <blocksize> -BLOCKCOUNT <blockcount> -FILECOUNT <filecount> -USER <username> -PASSWORD <password> -DOMAIN <domain> -OUTFILE <outputfile>
where:
- -READWRITE is the option to measure read/write
performance.
- -PATH is the deduplication store
mount path to be tested for performance.
- -RANDOM is the keyword to measure random read/write
operations (Optional). By default, sequential read/write operations are measured.
- -BLOCKSIZE is the size of the block (in bytes)
used in each read/write operation (Optional). Default value is 512 Bytes.
- -BLOCKCOUNT is the number of read and write operations to be made
in each file (Optional). For example, a value of 200 means that the block is
read and written 200 times in a file. Default value is 4096.
- -FILECOUNT is the number of files used in the
read and write operations (Optional). Default value is 1024.
- -USER, -PASSWORD, and -DOMAIN are options to provide specific user credentials for accessing the path provided in the -PATH option (Optional). By default, the application user credentials are used. If the domain name is not provided, the default domain is used.
- -OUTFILE is the location of the output file that stores the disk performance results (Optional). The default value is '.\CvDiskPerf.txt'.
Consider the following sample commands:
CvDiskPerf -READWRITE -PATH c:\temp -OUTFILE c:\temp\perf.txt
CvDiskPerf -READWRITE -RANDOM -PATH c:\temp -OUTFILE c:\temp\perf.txt
CvDiskPerf -READWRITE -RANDOM -PATH c:\temp -BLOCKSIZE 1024 -OUTFILE c:\temp\perf.txt
CvDiskPerf -READWRITE -RANDOM -PATH c:\temp -BLOCKSIZE 1024 -BLOCKCOUNT 5 -FILECOUNT 500 -OUTFILE c:\temp\perf.txt
CvDiskPerf -READWRITE -RANDOM -PATH c:\temp -USER commuser -PASSWORD commpw -OUTFILE c:\temp\perf.txt
- The details of the disk performance are stored in the output file provided
in the -OUTFILE option. The contents of a sample
output file are given below:
DiskPerf Version : 1.0
Path Used : f:\
Read-Write type : RANDOM
Block Size : 512
Block Count : 4096
File Count : 500
Total Bytes Written : 1048576000
Time Taken to Write(S) : 7.113515
Throughput Write(GB/H) : 494.217709
Total Bytes Read : 1048576000
Time Taken to Read(S) : 7.581667
Throughput Read(GB/H) : 463.700792
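The sample figures are internally consistent: the bytes moved equal block size × block count × file count, and the throughput lines match bytes per second scaled to an hour using binary gigabytes (2^30 bytes). The sketch below reproduces that arithmetic so you can sanity-check your own output files; the helper names are hypothetical, and the binary-gigabyte interpretation is inferred from the sample rather than documented.

```python
def total_bytes(block_size, block_count, file_count):
    """Bytes moved in one pass: each file gets block_count blocks of block_size bytes."""
    return block_size * block_count * file_count


def throughput_gb_per_hour(bytes_moved, seconds):
    """Convert bytes per second into GB per hour, with 1 GB = 2**30 bytes."""
    return bytes_moved / seconds * 3600 / 2**30


written = total_bytes(512, 4096, 500)
print(written)                                               # 1048576000
print(round(throughput_gb_per_hour(written, 7.113515), 2))   # 494.22
print(round(throughput_gb_per_hour(written, 7.581667), 2))   # 463.7
```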
Create a Deduplication Store Dump File
Before You Begin
- Review Deduplication Store
- Ensure that no deduplication jobs are running when executing this procedure.
To create a deduplication store dump file using the SIDB tool:
- Locate the SIDB tool in the <software installation path>/Base folder.
- Run the following command from the command prompt:
SIDB -dump <primary/secondary/statistics/said> <SIDB-location> [output file name]
where:
- -dump is the keyword to create a dump file
containing the deduplication store details based on the option provided.
primary - option to generate the dump
for actual objects/blocks in the deduplication store.
secondary - option to generate the dump
for duplicate objects/blocks.
statistics - option to generate the
dump for deduplication store statistics.
said - option to generate the dump
for
information on distinct archive-IDs from the secondary table.
- SIDB-location is the absolute location of
the deduplication store for which the dump file is created.
- output file name is the name of the output file (Optional). If no output file is provided, a default output file is created in the deduplication store location, using the naming convention "MA-NAME_Store-id_Timestamp.csv".
Consider the following sample commands:
SIDB -dump primary D:\production\2008\CV_SIDB\2\1 D:\production\sidbdump.csv
SIDB -dump primary D:\production\2008\CV_SIDB\2\1
SIDB -dump secondary D:\CV_SIDB\8\21 myoutput.csv
SIDB -dump statistics D:\mdoc\CV_SIDB\3\7
SIDB -dump said D:\db\cbkdb\CV_SIDB\3\1212
- The deduplication store dump file is created in the location provided in the output file name. The output is a comma-separated values (.csv) file that can be viewed in applications such as Microsoft Excel or Notepad.
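If you generate dumps from a script, building the argument list from the documented syntax avoids typos in the option keyword. This is a hypothetical sketch: `sidb_dump_command` and `run_sidb_dump` are invented names, and `tool` should be the full path to SIDB in the <software installation path>/Base folder (the bare "SIDB" default assumes it can be found on the PATH). As noted in Before You Begin, run it only when no deduplication jobs are running.

```python
import subprocess

DUMP_OPTIONS = ("primary", "secondary", "statistics", "said")


def sidb_dump_command(option, sidb_location, output_file=None, tool="SIDB"):
    """Build the argument list for `SIDB -dump` following the documented syntax."""
    if option not in DUMP_OPTIONS:
        raise ValueError(f"unknown dump option {option!r}; expected one of {DUMP_OPTIONS}")
    cmd = [tool, "-dump", option, str(sidb_location)]
    if output_file is not None:
        cmd.append(str(output_file))   # optional; otherwise a default .csv is created
    return cmd


def run_sidb_dump(option, sidb_location, output_file=None, tool="SIDB"):
    """Run the dump; tool is ideally the full path to SIDB in the Base folder."""
    return subprocess.run(
        sidb_dump_command(option, sidb_location, output_file, tool), check=True
    )
```

Passing the arguments as a list, not a single string, keeps Windows paths with backslashes intact without extra quoting.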
Generate Deduplication Store Statistics
Before You Begin
- Review Deduplication Store
- Ensure that no deduplication jobs are running when executing this procedure.
To generate deduplication store statistics using the SIDB tool:
- Locate the SIDB tool in the <software installation path>/Base folder.
- Run the following command from the command prompt:
SIDB -stat <SIDB-location>
where:
- -stat is the keyword to generate statistical details of the deduplication store, such as the number of objects tracked in the store, the number of objects in the secondary table, the store size, the size of data tracked in the store, and the space savings due to deduplication.
- SIDB-location is the absolute location of the deduplication store for which statistics are generated.
- The deduplication store statistics information is displayed.
Re-Index the Deduplication Store
Before You Begin
- Review Deduplication Store
- Ensure that no deduplication jobs are running when executing this procedure.
To re-index the deduplication store using the SIDB tool:
- Locate the SIDB tool in the <software installation path>/Base folder.
- Run the following command from the command prompt:
SIDB -re-index <SIDB-location>
where:
- -re-index is the keyword to re-index the
tables in the deduplication store.
- SIDB-location is the absolute location of the deduplication store to be re-indexed.
For example, SIDB -reindex D:\production\2008\CV_SIDB\3\12
- The tables in the deduplication store will be re-indexed.
Compact the Deduplication Store Database
Before You Begin
- Review Deduplication Store
- Ensure that no deduplication jobs are running when executing this procedure.
To compact the deduplication database to optimize performance and manage database growth:
- Stop all the services associated with the MediaAgent hosting the deduplication database.
- Review the database directory for any .fcs
files.
- If there are no .fcs files in the database directory, locate the SIDB2 tool in the <software installation path>/Base folder and run the following command from the command prompt:
SIDB2 -compact <SIDB-location>
where:
- -compact is the keyword to compact the deduplication
database.
- SIDB-location is the absolute location
of the deduplication database.
For example, SIDB2 -compact E:\production\2008\CV_SIDB\3\12
- If the database directory contains .fcs files, locate the SIDB tool in the <software installation path>/Base folder and run the following command from the command prompt:
SIDB -compact <SIDB-location>
where:
- -compact is the keyword to compact the deduplication
database.
- SIDB-location is the absolute location
of the deduplication database.
For example, SIDB -compact E:\production\2008\CV_SIDB\3\12
- This compacts and re-indexes the tables in the deduplication database.
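The choice between SIDB and SIDB2 above depends only on whether the database directory contains .fcs files, so it can be automated. This is a hypothetical sketch (both function names are invented); the case-insensitive extension check is a design choice, and it inspects only files directly in the directory, matching "in the database directory" above. Stop the MediaAgent services before running the resulting command.

```python
from pathlib import Path


def tool_for_files(filenames):
    """Return SIDB if any .fcs file is present (not turbo mode), else SIDB2."""
    has_fcs = any(name.lower().endswith(".fcs") for name in filenames)
    return "SIDB" if has_fcs else "SIDB2"


def compact_command(database_dir):
    """Build the documented -compact command for a deduplication database directory."""
    names = [p.name for p in Path(database_dir).iterdir() if p.is_file()]
    return [tool_for_files(names), "-compact", str(database_dir)]
```

Separating the file-name check from the directory listing keeps the decision logic easy to test without touching a real deduplication database.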