Disk Specifications for Hosting the Deduplication Store
Evaluating the Disk for Hosting the Deduplication Store
Measuring the Deduplication Disk Performance
Managing the Deduplication Store
Setting up the Minimum Free Space
Setting up an Alert for Free Space
Setting the Age of the Primary Block
Changing the MediaAgent Hosting the Store
Changing the Location of the Deduplication Store
Configuring Deduplication Store Creation
Backing Up Deduplication Database
Configure Alerts for Deduplication Store Backup
Deduplication Database Recovery
Manually Reconstructing a Store
Enabling Deduplication in Secondary Copies
Enabling Inline Copy for Deduplicated Primary Copy
Setting up Deduplication for Existing Non-Deduplicated Data
Configuring Signature Generation
Suspending/Resuming Deduplication
Rebooting a MediaAgent Hosting the Deduplication Store
Reconstruct Deduplication Database Job Summary Report
Deduplication uses a hashing algorithm to compare data. A signature generation module computes a hashed signature for each block and compares it with the existing signatures maintained in a Deduplication Store to determine whether the block is identical to one already stored. Based on the comparison, the MediaAgent performs one of the following operations:

- If the signature is new, the block is unique: the block is written to the media and its signature is added to the Deduplication Store.
- If the signature already exists, the block is a duplicate: only a reference to the existing block is recorded and the store's reference count for that block is updated.
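This comparison loop can be pictured with a minimal Python sketch (illustrative only; the names and in-memory data structures are hypothetical, not the product's implementation):

import hashlib

# Hypothetical stand-ins for the Deduplication Store and the media.
signature_store = {}   # signature -> reference count
written_blocks = []    # blocks actually written to the media

def process_block(block: bytes) -> None:
    # Signature generation (see Configuring Signature Generation below).
    signature = hashlib.sha512(block).digest()
    if signature in signature_store:
        # Duplicate block: record a reference only; no data is written.
        signature_store[signature] += 1
    else:
        # Unique block: write the data and register the new signature.
        signature_store[signature] = 1
        written_blocks.append(block)

for block in (b"alpha", b"beta", b"alpha"):
    process_block(block)
print(len(written_blocks))  # 2 -- the duplicate "alpha" block was not rewritten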
The deduplicated data is stored in specially designed container files to increase system throughput and scalability.

Deduplication is easy to use and does not require additional configuration once it is set up. The following list describes the behavior of various operations when deduplication is enabled.
Backup Operations
When deduplication is enabled, the sequence of operations is similar to a regular backup job. When a backup job is initiated, the backup module secures the data and starts the data transfer to the MediaAgent. As the data is secured, signatures are generated and compared against the Deduplication Store as described above.

Restore Operations
Data recovery operations are identical to regular restore operations and are virtually unaffected by deduplication. The Deduplication Store is not contacted for normal restore operations, except when the data is not available on the disk. All types of restore operations (including Restore by Jobs and restoring from copies) are supported.

Auxiliary Copy
Auxiliary Copy operations automatically unravel (explode) the deduplicated data if deduplication is not enabled on the copy. If the secondary copy is set up for deduplication, a separate Deduplication Store is created for the copy and the associated data is deduplicated for the secondary copy.

Data Aging Operations
Data Aging operations automatically look up the Deduplication Store before data is deleted from the disk. Data Aging deletes the source data only when all references to a given block are pruned. If older chunks in disk libraries remain on the volume even though the original data has been deleted, valid deduplication references to those chunks may still exist within the data.

Data Encryption and Data Compression
When Data Encryption and/or Data Compression are enabled, the system automatically runs the signature module after data compression and before data encryption. If the configuration contradicts this order, the system automatically performs compression, signature generation, and encryption, in that order, on the source client computer (illustrated in the sketch following this list). If a primary copy is encrypted (and not deduplicated), enabling deduplication on a secondary copy will not accomplish any viable deduplication on that copy: each backup includes unique encryption keys, which in turn cause unique signatures for each backup.

Data Multiplexing
Data Multiplexing is not supported with deduplication. A storage policy copy enabled for deduplication cannot have a direct or indirect source copy enabled for Data Multiplexing. However, an Auxiliary Copy can be configured with Data Multiplexing when the source copy is enabled for deduplication.

Spool Copy
Deduplication-enabled storage policy copies cannot be configured as Spool Copies. Existing deduplicated Spool Copies will continue to exist until the Spool Copy retention setting is removed; once removed, the deduplicated copy cannot be configured as a Spool Copy again.

Deduplication Jobs on a Migrated CommCell
After CommCell Migration, the Deduplication Store operates in read-only mode in the destination CommCell. The migrated (deduplication-enabled) storage policies in the destination CommCell can be used to restore the deduplicated data migrated from the source CommCell and to perform Auxiliary Copy operations with the migrated data as the source. Migrated storage policies in the destination CommCell cannot be used to deduplicate new backup operations.
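The ordering rule in the Data Encryption and Data Compression entry above can be demonstrated with a small Python sketch. The XOR "encryption" below is only a stand-in for a real cipher with per-backup keys; the point is that the signature is computed on the compressed block before encryption, so identical source blocks still produce identical signatures across backups:

import hashlib
import os
import zlib

def secure_block(block: bytes, backup_key: bytes):
    compressed = zlib.compress(block)                 # 1. compression
    signature = hashlib.sha512(compressed).digest()   # 2. signature generation
    # 3. encryption (toy XOR cipher standing in for a real cipher)
    encrypted = bytes(b ^ k for b, k in zip(compressed, backup_key * len(compressed)))
    return signature, encrypted

block = b"the same source block, secured in two different backups"
sig1, enc1 = secure_block(block, os.urandom(16))  # backup 1, unique key
sig2, enc2 = secure_block(block, os.urandom(16))  # backup 2, unique key
print(sig1 == sig2)  # True  -- signatures match, so the block deduplicates
print(enc1 == enc2)  # False -- ciphertexts differ because the keys differ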
The Deduplication Store (or Deduplication Database) serves as the repository for the signatures associated with all blocks that are backed up. It also maintains reference counts for the copies of the blocks that are backed up using the storage policy copy.

Deduplication Stores are maintained for each storage policy copy that has the deduplication option enabled. Multiple MediaAgents can be part of the same copy and use the same Deduplication Store, provided the libraries accessed by the MediaAgents are configured as static shared libraries and are accessible from all the MediaAgents.
The Deduplication Store is configured when creating a storage policy copy, for both primary and secondary storage policy copies. Any MediaAgent can be associated with the Deduplication Store: the hosting MediaAgent can be any one of the MediaAgents in the data paths, or a MediaAgent outside the data paths. You can also change the MediaAgent hosting the Deduplication Store.
The Deduplication Store can be located on any of the following platforms:

Windows | All platforms supported by Windows MediaAgents, except 64-bit editions on Intel Itanium (IA64) and Windows XP. Supported on NTFS.
Linux | All platforms supported by Linux MediaAgents, except PowerPC (includes IBM System p). Supported on ext3 and ext4.
Microsoft Cluster Service (MSCS) | Clusters supported by Windows MediaAgents. Supported on NTFS.
Linux Cluster | Clusters supported by Linux MediaAgents. Supported on ext3 and ext4.
Never delete the Deduplication Store manually. The Deduplication Store facilitates the deduplication of backup jobs and data aging jobs. If it is deleted, new deduplicated backup jobs cannot be performed and the existing data in the disk mount paths will never be pruned.
To ensure optimal performance for deduplication operations, the disk hosting the Deduplication Store must satisfy the following specifications. Note that these specifications are only for the disk hosting the Deduplication Store, and not for the entire mount path.
Ensure that the average read throughput of the disk is around 500 GB per hour, and the average write throughput of the disk is around 400 GB per hour.
Calculate the average read and write throughputs from multiple samples (between three and ten), for a FILECOUNT of 500.
Use the steps described in Measuring the Disk Performance to measure the disk throughput.
UNC paths are not supported for Deduplication Store access.
The following section provides information on how to evaluate the disk in which you plan to create the Deduplication Store. This will help you to determine the size of the data and store that can be hosted on the disk.
You can also use the user-interface version of this tool. See SIDB Simulator for more details and usage.
Running the Tool

Run the following file from the MediaAgent computer hosting the Deduplication Store:

C:\Program Files\Bull Calypso\Calypso\Base\SIDB2.exe

Usage

SIDB2 -simulateddb -p <SidbLocation> -in <Instance#> [-datasize] [-dratio] [-blocksize] [-tlimit] [-samplesize] [-diskperf -tpath] [-keepddb] [-stopCounter] [-user] [-password] [-domain]

Where (options in [] are optional):

-simulateddb is the keyword to simulate the deduplication database, to evaluate the disk's suitability for hosting the deduplication store.
-p is the location (an empty directory) where the deduplication store will be located.
-in is the instance of the software using the tool.
-datasize is the application data size in GB. Number.
-dratio is the expected deduplication ratio. Number (default is 5).
-blocksize is the deduplication data block size in KB. Number (default is 128).
-tlimit is the threshold time in microseconds. Number (default is 1000). The -tlimit and -datasize arguments cannot be used together.
-samplesize is the size of the sample. Number (default is 10000).
-diskperf is the keyword to measure disk performance, and -tpath is the path of the disk to test. If you use -diskperf, -tpath is mandatory.
-keepddb is the option to keep the deduplication database files. The files are removed by default.
-stopCounter is the number of additional iterations to process after reaching the threshold time, to limit spikes caused by caching (default is 50).
Example 1

For details on the projected average transaction time for an insert/query in the deduplication database, based on the size of the application data that is backed up, use the tool with the -simulateddb and -datasize options.

Command:

SIDB2 -simulateddb -in instance001 -p d:\dedup_store -datasize 500

Sample output:

The disk is capable of hosting a deduplication DB for:
0.500 TB of Application Data Size
0.100 TB of data on disk
146.0 microseconds average Q&I overhead per block
Throughput for DDB server 3156 GB per Hour
Example 2

For recommendations on the maximum application data size that can be backed up using the store, based on the average access time for each record, use the tool with the -simulateddb option. The tool runs until it reaches the default threshold time limit of 1000 microseconds.

Command:

SIDB2 -simulateddb -in instance001 -p d:\dedup_store
Example 3

For recommendations on disk performance, use the tool with the -simulateddb and -diskperf options.

Command:

SIDB2 -simulateddb -in instance001 -p d:\dedup_store -datasize 100 -diskperf -tpath d:\disktest
Use the following steps to measure the disk throughput for the disk in which you plan to create the Deduplication Store.
Running the Tool

Run the following file from the MediaAgent computer hosting the Deduplication Store:

Windows: C:\Program Files\Bull Calypso\Calypso\Base\CvDiskPerf.exe
Linux: ./CVDiskPerf
Usage

Windows:

CvDiskPerf -READWRITE -PATH <SIDB path> -RANDOM -FILECOUNT <filecount> -USER <username> -PASSWORD <password> -DOMAIN <domain> -OUTFILE <outputfile>

Linux:

./CVDiskPerf -READWRITE -PATH <path> -RANDOM -FILECOUNT <filecount> -OUTFILE <outputfile>

Where:

-READWRITE is the option to measure read/write performance.
-PATH is the deduplication store mount path to be tested for performance.
-RANDOM is the keyword to measure random read/write operations (optional). By default, sequential read/write operations are measured.
-FILECOUNT is the number of files used in the read and write operations (optional). Default is 1024.
-USER, -PASSWORD, and -DOMAIN are options to provide specific user credentials to impersonate access to the path given in the -PATH option (optional). By default, the application user credentials are used. If a domain name is not provided, the default domain is used.
-OUTFILE is the location of the output file for the disk performance results (optional). Default is '.\CvDiskPerf.txt'.
Sample Commands

Windows:

CvDiskPerf -READWRITE -PATH c:\SIDB01 -OUTFILE c:\temp\perf.txt
CvDiskPerf -READWRITE -RANDOM -PATH c:\SIDB01 -OUTFILE c:\temp\perf.txt
CvDiskPerf -READWRITE -RANDOM -PATH c:\SIDB01 -USER commuser -PASSWORD commpw -OUTFILE c:\temp\perf.txt

Linux:

./CVDiskPerf -READWRITE -RANDOM -PATH /test1 -OUTFILE /tmp/CVDISKLIB01.log
Output

The details of the disk performance are stored in the output file provided in the -OUTFILE option. The contents of a sample output file are given below:

DiskPerf Version        : 1.3
Path Used               : f:\
Read-Write type         : RANDOM
Block Size              : 128
Block Count             : 1024
File Count              : 500
Total Bytes Written     : 1048576000
Time Taken to Write(S)  : 7.113515
Throughput Write(GB/H)  : 494.217709
Total Bytes Read        : 1048576000
Time Taken to Read(S)   : 7.581667
Throughput Read(GB/H)   : 463.700792
Time Taken to Create(S) : 1.16
Throughput Create(GB/H) : 325.04
Ensure that the average read throughput of the disk is around 500 GB per hour, and the average write throughput of the disk is around 400 GB per hour.
Calculate the average read and write throughputs from multiple samples (between three and ten), for a FILECOUNT of 500.
The following table provides a sample of the disk performance calculation:
Sample | Write Throughput (GB/Hour) | Read Throughput (GB/Hour)
Sample 1 | 341.3798 | 477.6198
Sample 2 | 344.3546 | 513.2807
Sample 3 | 340.8644 | 575.6513
Sample 4 | 428.8675 | 499.7836
Sample 5 | 397.6285 | 426.5668
Sample 6 | 438.2224 | 503.0041
Sample 7 | 428.0591 | 494.4092
Sample 8 | 427.0613 | 643.4305
Sample 9 | 446.6219 | 523.7768
Sample 10 | 396.5592 | 581.3948
Average | 398.9619 | 523.8918
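For reference, a short Python sketch of this averaging, parsing the throughput lines from a set of CvDiskPerf output files in the format shown above (the file names and helper function are hypothetical):

import re
from pathlib import Path

def average_throughputs(outfiles):
    # Collect the write/read throughputs reported in each output file.
    writes, reads = [], []
    for path in outfiles:
        text = Path(path).read_text()
        w = re.search(r"Throughput Write\(GB/H\)\s*:\s*([\d.]+)", text)
        r = re.search(r"Throughput Read\(GB/H\)\s*:\s*([\d.]+)", text)
        if w and r:
            writes.append(float(w.group(1)))
            reads.append(float(r.group(1)))
    return sum(writes) / len(writes), sum(reads) / len(reads)

# Example: ten samples collected with -OUTFILE perf1.txt ... perf10.txt
avg_write, avg_read = average_throughputs([f"perf{i}.txt" for i in range(1, 11)])
print(f"Average write: {avg_write:.4f} GB/H, average read: {avg_read:.4f} GB/H")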
Set the minimum free space that must be available at all times in the volume in which the Deduplication Store is configured. By default, jobs will not continue if the free space on the volume hosting the Deduplication Store falls below 2 GB.
Use the following steps to set the minimum free space.
If the amount of free space in the volume hosting the Deduplication Store falls below the specified amount, the MediaAgent generates an event message and, if configured, the MediaAgents (Disk Space Low) alert.

Use the following steps to set the free space threshold at which the alert is generated:
You can set the number of days after which a block cannot be used for new deduplication. Setting this value will ensure that very old blocks are not allowed as the 'origin' data for newer backup jobs that are deduplicated.
Use the following steps to set the number of days after which a block cannot be used for deduplication:
By default, when a deduplication storage policy is configured, compression is automatically enabled for the storage policy copy. This setting overrides the subclient compression settings by enabling the Use Storage Policy Settings option at the subclient level. Compression is recommended for most data types. The process works by compressing the blocks and then generating a signature hash on each compressed block.

Use the following steps to enable data compression for all subclients associated with the storage policy:
By default, all associated subclients use the compression settings set on the deduplication storage policy copy. To modify or turn off the compression settings on the subclients, use the following steps:
Perform the following to change the MediaAgent hosting the deduplication store:
Make sure that no SIDB.exe or SIDB2.exe processes are running on the MediaAgent where the SIDB currently resides. Use the following steps to confirm that no processes are running:

For Windows: check Task Manager (or run tasklist from the Command Prompt) and verify that SIDB.exe and SIDB2.exe are not listed.

For Linux: run ps -aef | grep -i sidb and verify that no SIDB processes are returned.
You need to manually copy the content of the current Deduplication Store to the new MediaAgent that will host the Deduplication Store. Use the following steps to copy the content of the current Deduplication Store:
You cannot copy the deduplication database (SIDB) from a Windows location to a Linux location, or from a Linux location to a Windows location.
Use the following steps to change the MediaAgent hosting the Deduplication Store:
If the old MediaAgent still hosts deduplication stores for other storage policies and libraries, or is used for backups, use the following steps to start the services:

For Windows: start the services on the MediaAgent (for example, from the Windows Services control panel).

For Linux: run calypso start (assumed here as the counterpart of the Calypso stop command shown later in this document).
Use the following steps to change the location of the Deduplication Store in the existing MediaAgent:
By default, a new Deduplication Store is created for every 100 TB of data. Note that this is the amount of data stored on the media after deduplication.
Use the following steps to create a new Deduplication Store:
The currently active Deduplication Store can be sealed on-demand.
When a Deduplication Store is sealed, it stops accepting new signatures: a new active store is created, subsequent backups deduplicate against the new store only, and the sealed store is retained for restores until the data referencing it is aged.

The option to seal Deduplication Stores is useful in rare cases of hardware issues or disk malfunction. Creating a new store prevents new data from referencing any of the old data on the malfunctioning disks.
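Conceptually, sealing closes the active store and starts a fresh one. A minimal Python sketch (hypothetical names, not the product's code):

SEAL_THRESHOLD_TB = 100  # default: a new store per 100 TB of deduplicated data

class DedupStoreManager:
    def __init__(self):
        self.sealed_stores = []
        self.active = {"signatures": {}, "size_tb": 0.0}

    def seal(self):
        # The sealed store is kept for restores and aging, but new backups
        # deduplicate only against the new active store.
        self.sealed_stores.append(self.active)
        self.active = {"signatures": {}, "size_tb": 0.0}

    def add_data(self, size_tb: float):
        self.active["size_tb"] += size_tb
        if self.active["size_tb"] >= SEAL_THRESHOLD_TB:
            self.seal()  # automatic rollover; seal() can also be called on demand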
Use the following steps to seal the Deduplication Store:
Use the following method to back up the deduplication database so that it can be reconstructed in the unlikely event that the deduplication database goes offline. If this method is not used, the system will automatically use the automatic recovery process described in Setting Up Automatic Recovery to reconstruct the database.
This is the recommended method of protecting the deduplication database. If there are multiple deduplication databases on the MediaAgent, this method automatically backs up all the deduplication databases.
This method performs a FULL backup of the deduplication database and the backup data is sent to the appropriate backup media based on the storage policy selected for the Deduplication Database subclient.
If the deduplication database is hosted on a Linux Intel Itanium (IA64) machine, deduplication database backups using the DDB subclient are not supported. To back up such deduplication databases, use the automatic recovery process described in Setting Up Automatic Recovery.
Use the following steps to set up regular backups of the deduplication database through the CommCell Console:

Use the following steps to create the DDB subclient, assign a storage policy to the subclient, and then schedule the DDB backup.
1. The File System iDataAgent must be installed on the MediaAgent hosting the deduplication store. You can install the File System iDataAgent as a Restore Only Agent without consuming any license. To do so, make sure to select the Install Agents for Restore Only check box from the Select Platforms dialog box during File System iDataAgent installation. See Getting Started - Windows File System Deployment for a step-by-step procedure.
5. Click Schedule and then click OK.
8. Click OK. The new Deduplication Database Store subclient will be displayed in the right pane.
9. When the schedule is run, the Job Controller window will display the backup job.
Use the following steps to set up automatic recovery of a deduplication database.
12. When the system detects an offline deduplication database, the Job Controller window will display the recovery job.
Use the following steps to create the DDB subclient through the command line.
SAMPLE XML Parameter:

<?xml version="1.0"?>
<App_CreateSubClientRequest>
  <subClientProperties contentOperationType="ADD">
    <subClientEntity subclientName="DDBsubclient" clientName="Name of the MediaAgent" appName="File System"/>
    <fsSubClientProp isDDBSubclient="true"/>
    <commonProperties>
      <storageDevice>
        <dataBackupStoragePolicy storagePolicyName="Name of the Storage Policy"/>
      </storageDevice>
    </commonProperties>
  </subClientProperties>
</App_CreateSubClientRequest>
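As a usage sketch: a request like the one above is typically saved to a file and submitted with the qoperation command line utility after logging in with qlogin. The file name below is illustrative; verify the exact syntax for your software version.

qlogin
qoperation execute -af create_ddb_subclient.xml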
Additionally, you can configure an alert for deduplication store backup jobs, so that you are notified when a deduplication store backup job fails or when no deduplication backup jobs have run.

Use the following steps to configure an alert for the deduplication database backup:
When the system detects an offline deduplication database (DDB), the DDB reconstruction job can be run to recover the DDB. During the deduplication database reconstruction job, the data in the DDB is validated against the CommServe database to ensure that both the databases are synchronized for successful recovery of the DDB. In addition, it allows you to use the same DDB in the future.
The following sections explain the different methods of recovering the deduplication database.
When the system detects an offline deduplication store, a recovery job runs automatically to restore the deduplication store from the backup created using the DDB subclient.

See Backing Up Deduplication Database for more information on backing up the deduplication store.
Use the following steps to revert to the default settings if you have changed the store recovery points.
You can choose to recover from an offline deduplication store by manually reconstructing the store. If an offline deduplication store is detected, all jobs on that copy are paused until the store is manually reconstructed.
Use the following steps to configure and perform manual reconstruction:
You can choose to automatically create a new Deduplication Store in the event that the active store goes offline and a deduplication database backup is not available. When configured, if an offline store is detected, the store is automatically sealed and a new store is created.

Use the following steps to automatically create a new Deduplication Store when the store goes offline and a deduplication database backup is not available:
Variable content alignment is performed on the client system; consequently, you may experience some performance overhead, especially when it is used together with software compression. Variable content alignment is enabled on the deduplicated storage policy copy.
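Conceptually, variable content alignment resembles content-defined chunking: block boundaries are derived from the data itself (for example, with a rolling hash) rather than from fixed offsets, so an insertion near the start of a file does not shift every later block boundary. The following Python sketch is a generic illustration of that idea, not the product's algorithm:

import random

def chunk_boundaries(data: bytes, mask: int = 0x3F) -> list:
    # Toy content-defined chunker: cut wherever a rolling value matches a
    # bit mask (average chunk size of roughly 64 bytes for mask 0x3F).
    chunks, start, rolling = [], 0, 0
    for i, byte in enumerate(data):
        rolling = ((rolling << 1) ^ byte) & 0xFFFF  # depends on recent bytes only
        if (rolling & mask) == mask:
            chunks.append(data[start:i + 1])
            start = i + 1
    chunks.append(data[start:])
    return chunks

random.seed(7)
original = bytes(random.randrange(256) for _ in range(4096))
shifted = b"X" + original  # one byte inserted at the front
shared = set(chunk_boundaries(original)) & set(chunk_boundaries(shifted))
print(f"{len(shared)} of {len(chunk_boundaries(original))} chunks unchanged")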
Deduplication can be enabled for secondary copies during Storage Policy Copy creation. Once the copy is created, deduplication cannot be enabled later.
6. Click OK to accept the default schedule.
7. The Secondary Copy is displayed in the Storage Policy pane.
When the Primary Copy is deduplicated, you might want to create additional copies for offline storage. You could use the Auxiliary Copy feature for this, but an Auxiliary Copy cannot be created until the primary copy becomes available, which could delay getting the data offsite. The Inline Copy feature allows you to create additional copies of data at the time of backup. Since the Primary Copy is the source for the Inline Copy, the Inline Copy can be created along with the Primary Copy. Note, however, that the Inline Copy does not get deduplicated.
If necessary you can promote the secondary copy as the primary copy so that subsequent backups are automatically deduplicated.
To reduce the time taken to read data during restore and Auxiliary Copy operations, deduplication-enabled operations can be performed using the look-ahead reader. Use the following steps to enable the look-ahead reader by creating the DataMoverUseLookAheadLinkReader registry key on the MediaAgent where the disk library is created.

Look-ahead reader operation is not applicable for Cloud Storage Libraries.
The signature generation module generates a signature for each block using SHA-512 (Secure Hash Algorithm) together with the size of the data. This combination makes collisions, where two different blocks would produce the same signature, practically impossible.
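A minimal sketch of that scheme (the helper below is illustrative, not the product's code):

import hashlib

def block_signature(block: bytes) -> bytes:
    # SHA-512 digest of the block, qualified by the block's size: two
    # blocks are treated as identical only if digest and size both match.
    return hashlib.sha512(block).digest() + len(block).to_bytes(8, "big")

print(block_signature(b"example block").hex())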
The signature generation module can be configured to run either on the client or on the MediaAgent. Running it on the client is recommended, as the process is both memory and resource intensive. Follow the steps described below to configure signature generation:
Data Aging operations will automatically look up the deduplication store before data is deleted from the disk. Data Aging will delete the source data only when all references to a given block are aged. If older chunks in disk libraries remain on the volume even though the original data has been deleted, deduplication references to those chunks may still be valid.
If a deduplication store is offline, the data in that store will not be aged until all data in the store is eligible for aging.
Do not manually delete the Deduplication Store. The Deduplication Store facilitates the deduplication backup jobs and data aging jobs. If deleted, new deduplicated backup jobs cannot be performed and the existing data in the disk mount paths will never be aged.
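The reference counting that drives this pruning can be sketched as follows (hypothetical names, not the product's implementation):

# signature -> number of backup jobs still referencing the block
reference_counts = {"sig-a": 3, "sig-b": 1}

def age_reference(signature: str) -> bool:
    # Decrement a block's reference count when a job referencing it ages.
    # Returns True only when no references remain and the chunk can be pruned.
    reference_counts[signature] -= 1
    if reference_counts[signature] == 0:
        del reference_counts[signature]
        return True
    return False

print(age_reference("sig-b"))  # True  -- last reference removed, chunk prunable
print(age_reference("sig-a"))  # False -- other jobs still reference the chunk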
Once enabled, deduplication cannot be disabled on a storage policy copy, although a workaround to achieve the same effect is available.

Although deduplication cannot be disabled, it can be temporarily suspended. Suspend deduplication to temporarily detach the Deduplication Store and gain access to it for diagnostics and maintenance. Once you resume deduplication, signature verification and data deduplication resume.
Follow the step-by-step instructions described below to suspend/resume:
Depending on the size of the Deduplication database, this process might take as long as 30 minutes to complete.
Calypso stop
ps -aef | grep sidb
Depending on the size of the Deduplication database, this process might take as long as 30 minutes to complete.
When a MediaAgent hosting the deduplication database (DDB) is rebooted or powered off, by default the system does not halt, and the operating system shuts down regardless of any processes that are running.

However, the SIDB process has a built-in capability to receive the shutdown notification and bring down the deduplication database gracefully, provided there is enough time between the shutdown notification and the actual machine shutdown. Where stopping the DDB gracefully takes more time than the operating system allows, the DDB may still be damaged. To prevent a shutdown while the SIDB process is still running, the following method is suggested; it prevents the shutdown in most cases.
To allow the system to shut down gracefully while the SIDB process is running, perform the following on the Windows computer.

For Linux, the Calypso stop command automatically handles the graceful shutdown of the MediaAgent.

Note that when the MediaAgent attempts to reboot or shut down, the existing CVD process attempts to stop so that it does not accept any more requests. With the script below in place, if any SIDB processes are running at that point, the CVD process goes into the Stopping state and waits for the SIDB processes to exit gracefully before shutting down.

Installing Update 34948 automatically executes the AddScripttoShutdownGPO.exe script. This script allows the system to delay the reboot or shutdown until the grace period is reached.
If the StopProc.vbs script is not populated, perform the following:
From the Command Prompt, navigate to the following location:
<Installation Directory>\Base
Run the following command:
AddScripttoShutdownGPO.exe -vm InstanceXXX
Repeat step 2 to verify that the StopProc.vbs script is populated in the Shutdown Properties.
This waiting time prevents the shutdown while the SIDB process is running, allowing the process to stop gracefully without damaging the deduplication database.

To uninstall the script, perform the following:
<Installation Directory>\Base
AddScripttoShutdownGPO.exe -vm InstanceXXX -uninstall
Once you run the above command, there will be no delay when rebooting the machine.
When a Deduplication Store is offline, it is automatically reconstructed based on the Deduplication Store availability options. The Reconstruct Deduplication Database report provides information about the storage policy, the name of the Deduplication Store that was reconstructed, and the status of the restore job.
The following procedure provides the steps necessary to run a Reconstruct Deduplication Database report:
The Storage Policy report provides deduplication-related information, including deduplication properties and Deduplication Store information. The following procedure provides the steps necessary to run the Storage Policy report:
The disk usage report provides the following information:
Use the following steps to run the Disk Usage report:
Deduplication requires the following licenses, based on the License Type: