Achieving Parallel Data Protection Operations for Database using Data Streams
Achieving Parallel Data Protection Operations for File System iDataAgents
Achieving Parallel Data Protection Operations for NAS Data
Considerations for Multiple Streams
A data stream can be thought of as a data channel that connects the client file system or database to the storage media. Multiple streams provide for multiple channels through which data can flow. When used, multiple streams provide the means to parallelize an operation and thus improve the rate at which data can be written to or retrieved from the storage media.
The DB2, DB2 DPF, Informix, Microsoft SQL Server, Oracle, Oracle RAC, Sybase, and SAP for Oracle and MAXDB iDataAgents support multiple streams per subclient or instance. (Note that the SAP for MAXDB iDataAgent supports multiple streams for database file backups only, not for log and control file backups.) In addition, the Automatic File System Multi-Streaming feature extends support for multiple streams per subclient to additional iDataAgents, configurable at the subclient level.
Agents that perform data protection operations using single streams can use any drive in the library provided it is not in use and not off-line. Note too that a given data stream always writes to or reads from the same media group. (This topic is discussed further in Removable Media Groups.)
The following illustration depicts how data streams are used in single and multi-stream data protection operations.
The top portion shows a data protection operation through a single data stream. When the operation begins, the subclient initiates one process to transfer the data, which travels along the data stream to the storage media.
The bottom portion of the illustration shows a hypothetical multi-stream database data protection operation. In this example, the data protection operation is configured to use three data streams. When the operation begins, the subclient launches three processes, one for each data stream, and transfers different database objects through the streams to the media. Since the data transfer occurs over three data streams, each conveying different database objects, all of the database objects involved in the data protection operation are written to three distinct media groups.
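The three-stream case can be sketched in Python as follows. This is only an illustration: the object names, the `assign_to_streams` helper, and the round-robin policy are assumptions made for the example, since this topic does not specify how objects are assigned to streams.

```python
def assign_to_streams(objects, stream_count):
    """Hypothetical helper: spread objects across streams round-robin."""
    if stream_count < 1:
        raise ValueError("stream_count must be at least 1")
    streams = [[] for _ in range(stream_count)]
    for index, obj in enumerate(objects):
        # Object 0 goes to stream 0, object 1 to stream 1, and so on, wrapping around.
        streams[index % stream_count].append(obj)
    return streams


# Illustrative object names only; they do not come from any real database.
db_objects = ["datafile_1", "datafile_2", "datafile_3", "datafile_4", "datafile_5"]
streams = assign_to_streams(db_objects, 3)
for number, contents in enumerate(streams, start=1):
    print(f"stream {number}: {contents}")
```

Each inner list stands for the set of objects carried by one stream, and therefore written to one media group.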
This section describes how the Agents for some database applications exploit multiple data streams to parallelize their data protection operations. It is intended to give you a thorough understanding of how the software addresses different operating environments, so that you can make more informed decisions when you configure the system to support the various types of Agents.
Deploying multiple data streams in a data protection operation enables the subclient to distribute the database objects to all the streams and transmit those objects in parallel to the storage media. Hence a database, or portion thereof, that is secured using three data streams takes about one third of the time that the same set of database objects would require using a single stream.
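The "about one third" figure can be checked with a back-of-the-envelope calculation. The sketch below assumes that every stream moves data at the same fixed rate and that the operation ends when the most heavily loaded stream finishes; the function name, data sizes, and rate are hypothetical values chosen for illustration.

```python
def estimated_hours(stream_loads_gb, rate_gb_per_hour):
    """Hypothetical estimate: the operation ends when the busiest stream finishes."""
    if not stream_loads_gb:
        raise ValueError("at least one stream is required")
    if rate_gb_per_hour <= 0:
        raise ValueError("rate must be positive")
    return max(stream_loads_gb) / rate_gb_per_hour


# 300 GB in total at an assumed 50 GB per hour per stream.
print(estimated_hours([300], 50))            # single stream: 6.0
print(estimated_hours([100, 100, 100], 50))  # three even streams: 2.0
print(estimated_hours([200, 50, 50], 50))    # three uneven streams: 4.0
```

The last line shows why the saving is only approximate: if the objects are not spread evenly, the busiest stream determines how long the operation takes.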
Data streams are configured attributes of each storage policy. When configuring a storage policy, you specify the maximum number of data streams that you want the subordinate copies to support. This number may be subject to limitations, depending on the type of storage device that you are using. For information relating to specific types of storage hardware (e.g., tape, disk, etc.) see Hardware-Specific Resource Issues.
The following illustration shows the relationships between storage policies, copies and data streams. Note that this is only a relational diagram. In terms of the physical layer, a data stream extends between a client computer and the storage media.
In this illustration, two storage policies complete with their own copies are shown. Each storage policy contains two copies each with the same number of streams. Copies 1 are the primary copies and carry all data for their respective storage policies. Copies 2 are secondary copies. They are used for auxiliary copy operations. Each data stream maps to a discrete set of archive files on the storage media.
It should be noted that, for clarity, the figure omits the data protection attributes such as compression mode, associated library, and retention periods. As explained in Storage Policy Copies, these attributes are established for each copy.
The maximum number of streams that can be created simultaneously must be the same for all copies within a given storage policy. The reason for this is that for some databases, the number of streams through which they are restored/recovered must equal the number of streams through which they were backed up. If different storage policy copies supported different numbers of streams, operations would fail if they tried to use one copy to restore/recover data that was backed up through a different copy with a greater number of streams.
Consequently, the maximum number of streams available to each copy of a given storage policy is limited by the smallest number of streams available to any copy within the storage policy. If the limiting factor severely hampers the efficiency of one of the copies (e.g., if a copy directed to disk media is limited by the restrictions placed on a copy directed to tape media), you may want to create separate storage policies for the different copies. For additional information, see Hardware-Specific Resource Issues.
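As a minimal sketch of this rule, the effective limit is the minimum across all copies of the storage policy. The copy names and stream counts below are hypothetical, and `effective_max_streams` is an illustrative helper, not part of the product.

```python
def effective_max_streams(copy_limits):
    """Hypothetical helper: the policy-wide limit is the smallest per-copy limit."""
    if not copy_limits:
        raise ValueError("a storage policy needs at least one copy")
    return min(copy_limits.values())


# Illustrative values: a disk copy that could handle 8 streams
# is held back by a tape copy that has only 2 drives.
limits = {"primary (disk)": 8, "secondary (tape)": 2}
print(effective_max_streams(limits))  # 2
```

In a case like this one, placing the disk and tape copies in separate storage policies would let the disk copy use its full stream count.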
You can increase or decrease the maximum number of data streams from the General tab of the Storage Policy Properties dialog box.
However, keep in mind that each stream requires the use of one media drive. Thus the maximum number of data streams can be as follows:
You can change the number of data streams if the storage policy does not have any data from data protection operations associated with it. However, it is recommended that you do not decrease the number of streams for a storage policy which contains data associated with a subclient which supports multiple streams. (For example, in the SQL Server iDataAgent, after running a backup using a storage policy with three streams, it is recommended that you do not decrease the number of streams for the storage policy.)
The number of data streams on the primary copy must equal the number of streams defined for the storage policy. For a secondary copy that combines streams, however, you can define this number separately. For more information on combining streams, see Auxiliary Copy With Combined Streams.
Silo-enabled storage policy copies can be configured with additional data streams dedicated to silo backup operations only. This is in addition to the data streams already configured for the copy. One silo stream is configured by default and this value can be modified. See Configuring Data Streams for a Silo Backup for more details.
Refer to Automatic File System Multi-Streaming for information on how streams can be used for non-database iDataAgents.
See Also:
Refer to Advanced - NAS iDataAgent Backup for information on how streams can be used to back up data on a NAS file server.
Before performing any procedures using multiple streams, review the following information:
For SAP for Oracle, multistream log backups are not supported from the CommCell Console. Such backups are supported from the Command Line Interface.
Multiple Stream Backups from the CommCell Console - If any stream does not have resources available, the whole job is placed in a pending state and will eventually fail if the condition is not corrected. If you cannot determine the availability of resources for all the streams, run the backup with one stream by specifying 1 in the Number of Data Backup Streams field in the Subclient Properties (Storage Device) window. Keep in mind that single-stream backups will also fail if the required media resource is unavailable.
Oracle third-party command line operations running on multiple streams share the same Job ID in the Job Manager. If all the streams return failure, the job is marked as failed. However, if one of the streams fails, its work is submitted to another stream for completion.
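The two job-level rules above (a multi-stream job stays pending unless every stream has resources, and a shared Oracle command line job is marked as failed only when every stream fails) can be sketched as follows. Both function names and the boolean inputs are hypothetical; this models the rules as stated here, not the software's internal logic.

```python
def start_state(resources_available):
    """Hypothetical check: every stream needs a resource before the job can run."""
    if not resources_available:
        raise ValueError("at least one stream is required")
    return "running" if all(resources_available) else "pending"


def job_failed(stream_failed):
    """Hypothetical roll-up: the shared job fails only if every stream failed."""
    if not stream_failed:
        raise ValueError("at least one stream is required")
    return all(stream_failed)


print(start_state([True, True, False]))  # pending: one stream lacks a resource
print(start_state([True]))               # running: single stream with a resource
print(job_failed([True, True]))          # True: all streams returned failure
print(job_failed([True, False]))         # False: a surviving stream can finish the work
```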