While certain database iDataAgents
support multiple data streams per subclient
for data protection operations, file system iDataAgents
support only a single data stream per subclient by default; data is read from each
source volume, sent to the data receiver, and from there to the data writer which
writes the data on a single piece of media. File System Multi-Streaming employs
multiple data streams per subclient for the data protection operation, enabling
the subclient's contents to be distributed to all the streams, transmitting them
in parallel to the storage media. Hence, a subclient whose data is secured using
three data streams utilizes more of the available network resources and can complete
in as little as one third of the time that the same data would require using
a single stream.
A data stream can be thought of as a data channel that connects the client file
system to the storage media. Multiple streams are parallel channels through which
data can flow, thus improving the rate at which data can be written to the storage
media. (Compare Examples 1 and 2 in the illustration below.) The number of streams
is specified using the Number of Data Readers field in the
General tab of the Subclient
Properties.
The Allow multiple data readers within a drive or mount point option,
also in the General tab
of the Subclient Properties, provides further control of how File System Multi-Streaming
operates.
If this option is cleared, only 1 Data Reader per drive (or mount point, for Unix) is allowed.
For example, if 10 Data Readers are specified for 3 drives or mount points,
without also selecting the Allow multiple data readers within a drive or
mount point option, only 3 Data Readers at most will ever be active simultaneously,
1 per drive or mount point; however, the list of files (or blocks, for Image
Level and Image Level ProxyHost) to be backed up will be split into smaller
groupings, called Collect Files, based on the specified number of Data Readers.
Thus the effect of specifying a higher number of Data Readers, in this case,
is merely to spread the total list of files/blocks to be backed up across the
number of active Data Readers more evenly. As a Data Reader completes backing
up all the files/blocks in a given Collect File on a drive or mount point, it
will begin working on the list of files/blocks in the next Collect File, until
all specified files/blocks on the drive or mount point have been backed up.
For more detailed information about Collect files, refer to
Best Practices below.
If this option is selected (see Example 3 in the illustration below), multiple simultaneous
Data Readers are allowed on each drive (or mount point, for Unix). For Windows
and NetWare,
note that the read throughput for multiple Data Readers on the same physical
disk may be degraded; for Unix file systems, read throughput for multiple Data
Readers on different mount points on the same disk may vary depending on the
hardware.
NOTES
RAID controllers have very advanced caching schemes, and thus are suitable
for the Allow multiple data readers within a drive or mount point
option.
Other hardware, such as a SAN, may be suitable for this option as well.
NAS iDataAgents do
not show this as an option at all, as their dedicated operating systems
and controllers are able to arbitrate such multiple reads without any
potential for problems.
For spanned volumes using Windows dynamic disks (only supported for
the Windows File System iDataAgent)
where one drive letter represents multiple physical disks, you may want
to try enabling the Allow multiple data readers within a drive or mount
point option and gauging the effect on performance. There may be a
performance benefit when the Data Readers are operating on different physical
drives, but performance could be somewhat degraded when they are operating
on the same physical disk.
Testing is recommended to determine the actual performance of the hardware
involved, using the diskread.exe utility provided
in the Resource Pack. (See
Resource
Pack for more information.)
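The effect of the Allow multiple data readers within a drive or mount point option on concurrency, as described above, can be sketched in a few lines of Python. This is only an illustration of the stated rule, not part of the product: the function name and parameters are hypothetical, and the sketch assumes that the specified Number of Data Readers is the upper limit when the option is selected.

```python
def max_active_readers(data_readers: int, volumes: int,
                       allow_multiple_per_volume: bool) -> int:
    """Upper bound on simultaneously active Data Readers (illustrative only).

    volumes is the number of drives (Windows/NetWare) or mount points (Unix)
    in the subclient content.
    """
    if data_readers < 1 or volumes < 1:
        raise ValueError("data_readers and volumes must both be at least 1")
    if allow_multiple_per_volume:
        # Assumption: readers may share a drive, so the specified number
        # of Data Readers is the only limit.
        return data_readers
    # Option cleared: at most 1 Data Reader per drive or mount point.
    return min(data_readers, volumes)


# The example from the text: 10 Data Readers specified for 3 drives.
print(max_active_readers(10, 3, allow_multiple_per_volume=False))
print(max_active_readers(10, 3, allow_multiple_per_volume=True))
```

With the option cleared, the first call reflects the limit of 1 Data Reader per drive; whether the higher limit in the second call actually improves throughput depends on the hardware, as noted above.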
In conjunction with File System Multi-Streaming,
Data Multiplexing can be
utilized to cause all streams to be written to a single piece of media. Note that if Data
Multiplexing is not utilized, each stream is written to different media, so sufficient
available media must be ensured; the index is created on the last tape to complete.
Also, without Data Multiplexing, when using stand-alone drives,
Drive Pooling Stand-Alone Drives is necessary.
The following section provides the steps for using File System Multi-Streaming:
Create a new Subclient or
configure an existing one.
In the General
tab of the Subclient Properties, specify the Number of Data Readers,
and if appropriate, select the Allow multiple data readers within a drive
or mount point option.
Select a Storage
Policy with a sufficient number of streams configured. The number of
streams configured in the Storage Policy should be equal to, or greater
than, the number of Data Readers specified in your Subclient Properties.
Run a backup.
To confirm that the backup job is using multiple streams, right-click the job
in the Job Controller window, select Detail..., and in the Job Details
window select the
Streams
tab. Note that the actual number of streams shown will vary as the job progresses.
When multiple data streams are active, any action imposed on one stream
(such as suspending) will affect all streams.
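The stream requirement in the steps above (the Storage Policy must have at least as many streams as the subclient has Data Readers) can be checked before running a job. The following is a minimal sketch of that comparison only; the function name is hypothetical and it does not query the CommServe, so both values must be supplied by the caller.

```python
def check_stream_config(data_readers: int, storage_policy_streams: int) -> list[str]:
    """Return a list of problems; an empty list means the check passed."""
    problems = []
    if data_readers < 1:
        problems.append("Number of Data Readers must be at least 1.")
    if storage_policy_streams < data_readers:
        # The Storage Policy should offer at least one stream per Data Reader.
        problems.append(
            f"Storage Policy has {storage_policy_streams} stream(s) but the "
            f"subclient specifies {data_readers} Data Reader(s)."
        )
    return problems


for message in check_stream_config(data_readers=4, storage_policy_streams=2):
    print(message)
```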
For best restore performance when restoring all the contents of a multi-stream
backup (multiplexed or not), use the
Restore by Jobs option, if this
feature is supported for your
iDataAgent. See
Advanced File System iDataAgent
Options - Support for more information.
Data Multiplexing considerations:
If Data Multiplexing is used, the data recovery operation will employ a
single stream. If multiple data recovery streams are desirable, do not use
Data Multiplexing.
If Data Multiplexing is not used:
Regular restore will use only one stream.
Restore by Jobs can employ the same number of streams used for backup;
thus, for best performance in the case of a tape library, the same number
of tape drives used for backup should be available for restore.
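The Data Multiplexing considerations above reduce to a small decision rule. The sketch below restates that rule in Python for illustration; the function name and parameters are hypothetical, and the returned value is the maximum number of restore streams the text describes, not a guarantee of what a given job will use.

```python
def max_restore_streams(backup_streams: int, multiplexed: bool,
                        restore_by_jobs: bool) -> int:
    """Maximum restore streams according to the considerations above."""
    if backup_streams < 1:
        raise ValueError("backup_streams must be at least 1")
    if multiplexed:
        # With Data Multiplexing, data recovery employs a single stream.
        return 1
    if restore_by_jobs:
        # Restore by Jobs can employ as many streams as the backup used.
        return backup_streams
    # A regular restore uses only one stream.
    return 1


print(max_restore_streams(3, multiplexed=False, restore_by_jobs=True))
print(max_restore_streams(3, multiplexed=True, restore_by_jobs=True))
```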
The number of Collect Files (the lists of files/blocks to be backed up)
that are created is based on the following:
Windows and NetWare:
If "Allow multiple data readers..." is selected: <specified number of
Data Readers> x <number of physical drives> x 2
If "Allow multiple data readers..." is not selected: <number of physical
drives> x 2
Unix:
If "Allow multiple data readers..." is selected: <specified number of
Data Readers> x <number of mount points> x 2
If "Allow multiple data readers..." is not selected: <number of mount
points> x 2
Example:
<specified number of Data Readers> = 3
<number of physical drives> = 3
3 x 3 x 2 = 18 Collect Files
NOTES
In the equations and example shown, notice the final number is "2".
This is the default multiplier used to determine the number of Collect Files
that will be created for the job.
For Windows File System iDataAgent
only, if a greater number of Collect Files is desirable for load balancing
purposes, use the
For_Multiple_Reads_On_Disk_Collect_Split_Multiplication_Factor
registry key to change the default setting of "2" to a higher number. This
can be useful in a case where one of the Data Readers takes significantly
longer to complete a given Collect File, leaving the other Data Readers
idle. By causing the content to be divided among a greater number of Collect
Files, such disparity can be diminished, fully utilizing the expected number
of streams for a greater percentage of the job duration.
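The Collect File equations above can be expressed as a short Python function. This is an illustrative sketch: the function and parameter names are hypothetical, and split_factor stands in for the default multiplier of 2. Whether a changed registry value also applies when the option is not selected is not stated here, so the sketch simply applies the same factor in both cases as an assumption.

```python
DEFAULT_SPLIT_FACTOR = 2  # default multiplier described in the notes above


def collect_file_count(volumes: int, data_readers: int,
                       allow_multiple_per_volume: bool,
                       split_factor: int = DEFAULT_SPLIT_FACTOR) -> int:
    """Number of Collect Files created for a job (illustrative only).

    volumes is the number of physical drives (Windows/NetWare) or
    mount points (Unix).
    """
    if min(volumes, data_readers, split_factor) < 1:
        raise ValueError("all arguments must be at least 1")
    if allow_multiple_per_volume:
        return data_readers * volumes * split_factor
    return volumes * split_factor


# The example from the text: 3 Data Readers, 3 physical drives.
print(collect_file_count(3, 3, allow_multiple_per_volume=True))   # 3 x 3 x 2 = 18
print(collect_file_count(3, 3, allow_multiple_per_volume=False))  # 3 x 2 = 6
```

Raising split_factor in the sketch mirrors the load-balancing idea described above: the same content is divided among more, smaller Collect Files.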
Multiple Stream Backups from the CommCell Console - If any of the streams
do not have resources available, the whole job will be placed in a pending state,
and will eventually fail if the condition is not corrected. If you cannot
determine the availability of resources for all the streams, run the backup with
one stream. This can be done by specifying
1 in the Number of Data Backup Streams field
in the Subclient Properties (Storage Device) window.
Keep in mind that single-stream backups will also fail if the required media resource
is unavailable.
Windows Specific
For the purposes of performance load balancing, which maximizes the speed
of this feature, any mount point on a physical drive which points to another
physical drive is not taken into account. For example, if drive
D: has multiple mount points which each point
to other physical drives containing data, the performance-enhancing parallelism
of this feature is not utilized; drive D: and
all of those mount points will be backed up sequentially, not in parallel. To
maximize performance, exclude the mount points from the content of drive
D:, instead treating each of the mount points
as separate Subclient content.
Unix Specific
Mount points, rather than physical drives, are treated as separate entities
for the purposes of performance load balancing, and the software will deploy
a Data Reader for each mount point. Thus, if more than one mount point is pointing
to the same physical drive, multiple simultaneous reads will be performed on
that drive, even if the Allow multiple data readers within a drive or mount
point option has not been selected.
This feature requires a Feature License to be available in the CommServe® Server.
Review general license requirements included in
License Administration. Also,
View All Licenses provides step-by-step
instructions on how to view the license information.
The Command Line Interface supports the capability to perform backups using a
subclient that has been configured for multiple streams as described in
How to use File System
Multi-streaming above. For command line backups, see
Command Line Interface for an overview, and
qoperation - backup for more specific
information about commands.