Tools - Deduplication to Disk

Table of Contents

Measure the Deduplication Disk Performance

Evaluate a Disk for Hosting the Deduplication Store

Create a Deduplication Store Dump File

Generate Deduplication Store Statistics

Upgrade the Deduplication Database

Measure the Deduplication Disk Performance

Use the following steps to measure the disk throughput for the disk in which you plan to create the Deduplication Store.

Running the Tool

Run the following file from the MediaAgent computer hosting the Deduplication Store.

Windows:

C:\Program Files\Bull Calypso\Calypso\Base\CvDiskPerf.exe

Linux:

./CVDiskPerf

Usage

Windows:

CvDiskPerf -READWRITE -PATH <SIDB path> -RANDOM -FILECOUNT <filecount> -USER <username> -PASSWORD <password> -DOMAIN <domain> -OUTFILE <outputfile>

Linux:

./CVDiskPerf -READWRITE -PATH <path> -RANDOM -FILECOUNT <filecount> -OUTFILE <outputfile>

Where:

-READWRITE is the option to measure read/write performance.

-PATH is the deduplication store mount path to be tested for performance.

-RANDOM is the keyword to measure random read/write operations (Optional). By default, sequential read/write operations are measured.

-FILECOUNT is the number of files used in the read and write operations (Optional). Default value is 1024.

-USER, -PASSWORD, and -DOMAIN are options to provide specific user credentials for impersonated access to the path provided in the -PATH option (Optional). By default, the application user credentials are used. If the domain name is not provided, the default domain is used.

-OUTFILE is the location of the output file to store the disk performance results (Optional). Default value is '.\CvDiskPerf.txt'

Sample Commands

Windows:

CvDiskPerf -READWRITE -PATH c:\SIDB01 -OUTFILE c:\temp\perf.txt

CvDiskPerf -READWRITE -RANDOM -PATH c:\SIDB01 -OUTFILE c:\temp\perf.txt

CvDiskPerf -READWRITE -RANDOM -PATH c:\SIDB01 -USER commuser -PASSWORD commpw -OUTFILE c:\temp\perf.txt

Linux:

./CVDiskPerf -READWRITE -RANDOM -PATH /test1 -OUTFILE /tmp/CVDISKLIB01.log

Output

The details of the disk performance are stored in the output file provided in the -OUTFILE option. The contents of a sample output file are given below:

DiskPerf Version        : 1.3

Path Used               : f:\

Read-Write type         : RANDOM

Block Size              : 128

Block Count             : 1024

File Count              : 500

Total Bytes Written     : 1048576000

Time Taken to Write(S)  : 7.113515

Throughput Write(GB/H)  : 494.217709

Total Bytes Read        : 1048576000

Time Taken to Read(S)   : 7.581667

Throughput Read(GB/H)   : 463.700792

Time Taken to Create(S) : 1.16

Throughput Create(GB/H) : 325.04

Ensure that the average read throughput of the disk is around 500 GB per hour, and the average write throughput of the disk is around 400 GB per hour.

Calculate the average read and write throughputs from multiple samples (between three and ten), for a FILECOUNT of 500.

The following table provides a sample of the disk performance calculation:

Disk Performance

              Throughput in GB/Hour
              Write         Read
Sample 1      341.3798      477.6198
Sample 2      344.3546      513.2807
Sample 3      340.8644      575.6513
Sample 4      428.8675      499.7836
Sample 5      397.6285      426.5668
Sample 6      438.2224      503.0041
Sample 7      428.0591      494.4092
Sample 8      427.0613      643.4305
Sample 9      446.6219      523.7768
Sample 10     396.5592      581.3948
Average       398.9619      523.8918
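
If you collect the output files from several runs, the averaging can also be scripted. The following Python sketch is a minimal example: it assumes the output files were saved as c:\temp\perf1.txt, c:\temp\perf2.txt, and so on (illustrative names), parses the throughput lines in the output format shown above, and prints the averages for comparison against the suggested 400 GB/hour write and 500 GB/hour read values.

# average_diskperf.py - a minimal sketch; file names and path are illustrative.
import glob
import re

write_vals = []
read_vals = []

# Each file is expected to use the CvDiskPerf output format shown above.
for path in glob.glob(r"c:\temp\perf*.txt"):
    with open(path) as f:
        text = f.read()
    w = re.search(r"Throughput Write\(GB/H\)\s*:\s*([\d.]+)", text)
    r = re.search(r"Throughput Read\(GB/H\)\s*:\s*([\d.]+)", text)
    if w and r:
        write_vals.append(float(w.group(1)))
        read_vals.append(float(r.group(1)))

if write_vals:
    print("Samples             : %d" % len(write_vals))
    print("Average Write(GB/H) : %.4f" % (sum(write_vals) / len(write_vals)))
    print("Average Read(GB/H)  : %.4f" % (sum(read_vals) / len(read_vals)))
else:
    print("No CvDiskPerf output files found.")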

Evaluate a Disk for Hosting the Deduplication Store

The following section provides information on how to evaluate the disk in which you plan to create the Deduplication Store. This will help you to determine the size of the data and store that can be hosted on the disk.

You can also use the user-interface version of this tool. See SIDB Simulator for more details and usage.

Running the Tool

Run the following file from the MediaAgent computer hosting the Deduplication Store.

C:\Program Files\Bull Calypso\Calypso\Base\SIDB2.exe

Usage

SIDB2 -simulateddb -p <SidbLocation> -in <Instance#> [-datasize] [-dratio] [-blocksize] [-tlimit] [-samplesize] [-diskperf -tpath] [-user] [-password] [-domain] [-keepddb] [-stopCounter]

Where:

Options in [ ] denote optional arguments.

-simulateddb is the keyword to simulate the deduplication database to evaluate the disk compatibility for hosting the deduplication store.

-p is the location (an empty directory) where the deduplication store will be hosted.

-in is the instance of the software using the tool.

-datasize is the application data size, in GB (number).

-dratio is the expected deduplication ratio (number; default value is 5).

-blocksize is the deduplication data block size, in KB (number; default is 128).

-tlimit is the threshold time limit, in microseconds (number; default value is 1000). The -tlimit and -datasize arguments cannot be used together.

-samplesize is the size of the sample (number; default value is 10000).

-diskperf is the keyword to measure disk performance, and -tpath is the path of the disk to test. If you use -diskperf, -tpath is mandatory.

-keepddb is the option to keep the deduplication database files. The files are removed by default.

-stopCounter signifies how many additional iterations to process after reaching the threshold time. This limits spikes caused by caching (default value is 50).

Example 1

To see the projected average transaction time for an insert/query in the deduplication database, based on the size of the application data that is backed up, use the tool with the -simulateddb and -datasize options.

Command

SIDB2 -simulateddb -in instance001 -p d:\dedup_store -datasize 500

Sample output

The disk is capable of hosting a deduplication DB for:

0.500 TB of Application Data Size

0.100 TB of data on disk

146.0 microseconds average Q&I overhead per block

Throughput for DDb server 3156 GB per Hour

Example 2

For recommendations on the maximum application data size that can be backed up using the store, based on the average access time for each record, use the tool with the -simulateddb option. This runs until it reaches the default threshold time limit of 1000 microseconds.

Example

SIDB2 -simulateddb -in instance001 -p d:\dedup_store

Example 3

For recommendations on disk performance, use the tool with the -simulateddb and -diskperf options.

Example

SIDB2 -simulateddb -in instance001 -p d:\dedup_store -datasize 100 -diskperf -tpath d:\disktest

Create a Deduplication Store Dump File

The following tool helps you create a .csv (comma-separated values) file to view the contents of the deduplication store. The .csv file can be viewed using applications such as Microsoft Excel or Notepad.

Pre-requisite

Check the Job Controller to make sure that no Deduplication jobs are in progress.

Running the Tool

Run the following file from the MediaAgent computer hosting the Deduplication Store.

C:\Program Files\Bull Calypso\Calypso\Base\SIDB2.exe

Usage

SIDB -dump <primary/secondary/statistics/said> <SIDB-location> [output file name]

Where:

-dump is the keyword to create a dump file containing the deduplication store details based on the option provided.

    primary - option to generate the dump for actual objects/blocks in the deduplication store.

    secondary - option to generate the dump for duplicate objects/blocks.

    statistics - option to generate the dump for deduplication store statistics.

    said - option to generate the dump for information on distinct archive-IDs from the secondary table.

SIDB-location is the absolute location of the deduplication store for which the dump file is created.

output file name is the name of the output file (Optional). If no output file is provided, then a default output file will be created in the location of the deduplication store, with the naming convention - "MA-NAME_Store-id_Timestamp.csv".

Sample Commands

SIDB -dump primary D:\production\2008\CV_SIDB\2\1 D:\production\sidbdump.csv

SIDB -dump primary D:\production\2008\CV_SIDB\2\1

SIDB -dump secondary D:\CV_SIDB\8\21 myoutput.csv

SIDB -dump statistics D:\mdoc\CV_SIDB\3\7

SIDB -dump said D:\db\cbkdb\CV_SIDB\3\1212
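
If you prefer to inspect a dump file programmatically instead of opening it in Excel or Notepad, it can be read like any other .csv file. The following Python sketch is only illustrative: the file name is taken from the first sample command above, and the column layout depends on the dump type (primary, secondary, statistics, or said).

# read_sidb_dump.py - a minimal sketch; the path is illustrative.
import csv

with open(r"D:\production\sidbdump.csv", newline="") as f:
    rows = list(csv.reader(f))

# Assuming the first row is a header describing the columns of this dump type.
print("Columns:", rows[0])
print("Records:", len(rows) - 1)

# Show the first few records for a quick look at the store contents.
for row in rows[1:6]:
    print(row)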

Generate Deduplication Store Statistics

Use this tool to display statistical information about the deduplication store.

Pre-requisite

Check the Job Controller to make sure that no Deduplication jobs are in progress.

Running the Tool

Run the following file from the MediaAgent computer hosting the Deduplication Store.

C:\Program Files\Bull Calypso\Calypso\Base\SIDB2.exe

Usage

SIDB -stat <SIDB-location>

Where:

-stat is the keyword to generate statistical details of the deduplication store, such as the number of objects tracked in the store, the number of objects in the secondary table, the size of the store, the size of data tracked in the store, and the space savings due to deduplication.

SIDB-location is the absolute location of the deduplication store for which the statistics are generated.
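
Sample Command

The store location below is only illustrative; substitute the absolute location of your deduplication store.

SIDB -stat D:\production\2008\CV_SIDB\2\1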

Upgrade the Deduplication Database

The following section provides information on how to upgrade an 8.0 Deduplication Store Database to the 9.0 format, which helps you avoid a new baseline (i.e., calculating the disk space required to create a new deduplication database store) on the storage policy.

Use this tool after upgrading the MediaAgents containing the Deduplication Database.

Pre-requisites
  • Check the Job Controller to make sure that no Deduplication jobs are in progress.
  • Upgrade the MediaAgent and CommServe computer from 8.0 to 9.0.
Running the Tool

On the MediaAgent computer hosting the Deduplication database store, open the command prompt and navigate to the <Software_Installation_Path>/Base folder.

Run the following command:

SIDB2 -in <Instance Name> -cn <Client Name> -i <Engine Id> -convert writer

Usage

SIDB2 -in <Instance Name> -cn <Client Name> -i <Engine Id> -convert writer

Where:

-in is the keyword for the instance name.

-cn is the keyword for the client name.

-i <Engine Id> is the id of the deduplication database store that you wish to upgrade.

Example

C:\Program Files\CommVault\Simpana\Base>sidb2 -in Instance001 -cn matador4 -i 2 -convert writer

2011/03/23 22:17:15 Initialize: Instance [Instance001], Client [matador4], Getting information from CS for Engine [2]

2011/03/23 22:17:19 Writer: Converting primary records...

2011/03/23 22:17:20 WriteRecords: Total records to process [181245683]

100% complete.

2011/03/24 01:43:51 WriteRecords: Finished copying [181245683] records

2011/03/24 01:43:51 Writer: Converting secondary records...

2011/03/24 01:43:52 WriteRecords: Total records to process [269886761]

100% complete.

2011/03/24 09:39:25 WriteRecords: Finished copying [269886761] records

2011/03/24 09:54:09 Writer: Going to rename [D:\Convert\CV_SIDB\2\2] -> [D:\Convert\CV_SIDB\2\Old_2]

2011/03/24 09:54:09 Writer: Going to rename [D:\Convert\CV_SIDB\2\New_2] -> [D:\Convert\CV_SIDB\2\2]

2011/03/24 09:54:09 Writer: Going to update the table version in the CS.

2011/03/24 09:54:09 Writer: Done.