Achieving Parallel Data Protection Operations for Database using Data Streams
Achieving Parallel Data Protection Operations for File System iDataAgents
Achieving Parallel Data Protection Operations for NAS NDMP Data
Considerations for Multiple Streams
A data stream can be thought of as a data channel that connects the client file system or database to the storage media. Multiple streams provide for multiple channels through which data can flow. When used, multiple streams provide the means to parallelize an operation and thus improve the rate at which data can be written to or retrieved from the storage media.
The DB2, DB2 DPF, Informix, Microsoft SQL Server, Oracle, Oracle RAC, Sybase, and SAP iDataAgents support multiple streams per subclient, or instance. (Note that the SAP for MAXDB iDataAgent supports multiple streams for database file backups only, not for log and control file backups.) In addition, the Automatic File System Multi-Streaming feature extends multiple stream per subclient support to additional iDataAgents, configurable at the subclient level.
Agents that perform data protection operations using single streams can use any drive in the library provided it is not in use and not off-line. Note too that a given data stream always writes to or reads from the same media group. (This topic is discussed further in Removable Media Groups.)
The following illustration depicts how data streams are used in single and multi-stream data protection operations.
The top portion shows a data protection operation that uses a single data stream. When this operation begins, the subclient initiates one process to transfer the data. The data travels over the data stream to the storage media.
The bottom portion of the illustration shows a hypothetical multi-stream database data protection operation. In this example, the data protection operation is configured to use three data streams. When the operation begins, the subclient launches three processes, one for each data stream, and transfers different database objects through the streams to the media. Since the data transfer occurs over three data streams, each conveying different database objects, all of the database objects involved in the data protection operation are written to three distinct media groups.
This section describes how the Agents for some database applications use multiple data streams to parallelize their data protection operations. Understanding how the software addresses different operating environments will help you make more informed decisions when you configure the system to support the various types of Agents.
Deploying multiple data streams in a data protection operation enables the subclient to distribute the database objects across all the streams and transmit those objects in parallel to the storage media. Hence a database, or portion thereof, that is secured using three data streams takes about one third of the time that the same set of database objects would require using a single stream.
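The "about one third" estimate can be illustrated with a small sketch. It is not the product's scheduling logic: the object names, sizes, per-stream throughput, and the greedy largest-first assignment are all assumptions chosen for illustration. The point it shows is that the elapsed time is governed by the most heavily loaded stream, so the gain approaches one third only when the objects divide evenly.

```python
def distribute(objects, stream_count):
    """Assign each object to the currently least-loaded stream, largest first.

    `objects` maps a (hypothetical) object name to its size in GB.
    Returns the per-stream object lists and the per-stream totals.
    """
    streams = [[] for _ in range(stream_count)]
    loads = [0] * stream_count
    for name, size in sorted(objects.items(), key=lambda kv: kv[1], reverse=True):
        target = loads.index(min(loads))  # first stream with the smallest load
        streams[target].append(name)
        loads[target] += size
    return streams, loads


def elapsed_hours(loads, gb_per_hour_per_stream):
    """Streams run in parallel, so the busiest stream determines the duration."""
    return max(loads) / gb_per_hour_per_stream


# Hypothetical database objects (sizes in GB) and an assumed 60 GB/hour per stream.
objects = {"data01": 30, "data02": 30, "data03": 30, "data04": 15, "data05": 15}

_, one_stream = distribute(objects, 1)
layout, three_streams = distribute(objects, 3)

print("1 stream :", elapsed_hours(one_stream, 60), "hours")
print("3 streams:", elapsed_hours(three_streams, 60), "hours", layout)
```

With these assumed sizes the single-stream run takes 2.0 hours and the three-stream run 0.75 hours, a little more than one third, because the five objects cannot be split into three equal loads.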
Data streams are configured attributes of each storage policy. When configuring a storage policy, you specify the maximum number of data streams that you want the subordinate copies to support. This number may be subject to limitations, depending on the type of storage device that you are using. For information relating to specific types of storage hardware (e.g., tape, magnetic disk, etc.) see Hardware-Specific Resource Issues.
The following illustration shows the relationships between storage policies, copies and data streams. Note that this is only a relational diagram. In terms of the physical layer, a data stream extends between a client computer and the storage media.
In this illustration, two storage policies complete with their own copies are shown. Each storage policy contains two copies each with the same number of streams. Copies 1 are the primary copies and carry all data for their respective storage policies. Copies 2 are secondary copies. They are used for auxiliary copy operations. Each data stream maps to a discrete set of archive files on the storage media.
It should be noted that, for clarity, the figure omits the data protection attributes such as compression mode, associated library, and retention periods. As explained in Storage Policy Copies, these attributes are established for each copy.
The maximum number of streams that can be created simultaneously must be the same for all copies within a given storage policy. The reason for this is that for some databases, the number of streams through which they are restored/recovered must equal the number of streams through which they were backed up. If different storage policy copies supported different numbers of streams, operations would fail if they tried to use one copy to restore/recover data that was backed up through a different copy with a greater number of streams.
Consequently, the maximum number of streams available to each copy of a given storage policy is limited by the smallest number of streams available to any copy within the storage policy. If the limiting factor severely hampers the efficiency of one of the copies (e.g., if a copy directed to magnetic disk media is limited by the restrictions placed on a copy directed to tape media), you may want to create separate storage policies for the different copies. For additional information, see Hardware-Specific Resource Issues.
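The two rules above (the smallest copy limit governs the whole storage policy, and some databases must be restored through as many streams as were used for the backup) can be sketched as follows. The copy names and stream limits are hypothetical, and the functions only model the rules as stated here, not any product API.

```python
def effective_max_streams(copy_limits):
    """Return the stream limit that applies to every copy in a storage policy.

    `copy_limits` maps a (hypothetical) copy name to the number of streams
    its storage device could support on its own.
    """
    if not copy_limits:
        raise ValueError("a storage policy needs at least one copy")
    return min(copy_limits.values())


def can_restore(backup_streams, copy_streams):
    """A copy can serve the restore only if it offers enough streams."""
    return copy_streams >= backup_streams


# Assumed example: a disk copy that could run 8 streams and a tape copy limited to 2.
limits = {"Copy 1 (magnetic disk)": 8, "Copy 2 (tape)": 2}

print(effective_max_streams(limits))
print(can_restore(backup_streams=8, copy_streams=2))
```

In this assumed case the policy is held to 2 streams, which is the situation in which creating separate storage policies for the disk and tape copies may be worthwhile.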
You can increase or decrease the maximum number of data streams from the General tab of the Storage Policy Properties dialog box.
However, keep in mind that each stream requires the use of one media drive. Thus the maximum number of data streams can be as follows:
You can change the number of data streams if the storage policy does not have any data from data protection operations associated with it. However, it is recommended that you do not decrease the number of streams for a storage policy which contains data associated with a subclient which supports multiple streams. (For example, in the SQL Server iDataAgent, after running a backup using a storage policy with three streams, it is recommended that you do not decrease the number of streams for the storage policy.)
The number of data streams on the primary copy must be the same as the number of streams defined for the storage policy. However, for a secondary copy that combines streams, you can define this number. For more information on combining streams, see Auxiliary Copy With Combined Streams.
Refer to Automatic File System Multi-Streaming for information on how streams can be used for non-database iDataAgents.
See Also:
Refer to Backup - NAS NDMP - Multiple Data Stream Backups for information on how streams can be used to back up data on a NAS NDMP file server.
Before performing any procedures using multiple streams, review the following information:
For SAP for Oracle, multistream log backups are not supported from the CommCell Console. Such backups are supported from the Command Line Interface.
Multiple Stream Backups from the CommCell Console - If any of the streams do not have resources available, the whole job will be placed in a pending state, and will eventually fail if the condition is not corrected. If you cannot determine the availability of resources for all the streams, run the backup with one stream. This can be done by specifying 1 in the Number of Data Backup Streams field in the Subclient Properties (Storage Device) window. Keep in mind that single-stream backups will also fail if the required media resource is unavailable.
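The all-or-nothing behavior described above can be modeled in a few lines. This is only a sketch of the rule as stated; the function name, the state strings, and the idea of counting free drives are assumptions made for illustration, not part of the CommCell Console.

```python
def console_job_state(requested_streams, free_resources):
    """Model the rule: if any stream lacks a resource, the whole job waits."""
    if requested_streams < 1:
        raise ValueError("a backup needs at least one stream")
    return "running" if free_resources >= requested_streams else "pending"


print(console_job_state(requested_streams=3, free_resources=2))  # one stream short
print(console_job_state(requested_streams=1, free_resources=2))  # single-stream fallback
print(console_job_state(requested_streams=1, free_resources=0))  # still no resource
```

The second call corresponds to setting Number of Data Backup Streams to 1, and the third reflects the caveat that a single-stream backup also cannot proceed when its media resource is unavailable.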
Multiple Streams for Oracle Third-Party Command Line Backups - Third-party command line backups can use multiple streams. However, when resources are not available for every stream, the streams that lack resources will fail. Each backup stream has a separate job ID in the Job Manager, and the Job Controller window displays a separate job for each process. As a result, the streams that have resources will report as successful, while the streams that do not have resources will be placed in a pending state and will eventually fail if the condition is not corrected. If this problem is encountered, the backup is not complete; rerun the backup after confirming that adequate resources are available for all the streams. If you cannot determine the availability of resources for all the streams, run the backup with one stream. Keep in mind that single-stream backups will also fail if the required media resource is unavailable.
Oracle 10g - For third-party command line backups that use multiple streams, any stream for which resources are not available is placed in a pending state and eventually in a failed state. However, when another stream for which resources are available completes, its resources are then used for the stream that was in the pending/failed state, and the job completes successfully. (Note that under such circumstances, it is necessary to check the RMAN status of the job to confirm that it did complete successfully.) Thus, Oracle 10g backup streams are automatically serialized when a lack of resources makes it impossible for them to run in parallel.
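The effect of this automatic serialization on the overall run time can be sketched with a simple simulation. The stream durations and resource counts are assumed values, and the model only captures the idea that a waiting stream starts as soon as an earlier stream releases its resource; it does not reproduce RMAN or Job Manager behavior.

```python
import heapq


def finish_time(stream_durations, free_resources):
    """Return when the last stream ends if waiting streams reuse freed resources.

    `stream_durations` lists how long each stream needs (any time unit);
    `free_resources` is the number of drives or devices available at the start.
    """
    if free_resources < 1:
        raise ValueError("no stream can run without at least one resource")
    free_at = [0] * free_resources  # time at which each resource is next free
    heapq.heapify(free_at)
    end = 0
    for duration in stream_durations:
        start = heapq.heappop(free_at)  # earliest resource to become available
        heapq.heappush(free_at, start + duration)
        end = max(end, start + duration)
    return end


# Assumed example: three streams of 10 minutes each.
print(finish_time([10, 10, 10], free_resources=3))  # fully parallel
print(finish_time([10, 10, 10], free_resources=2))  # third stream waits
print(finish_time([10, 10, 10], free_resources=1))  # fully serialized
```

Under these assumptions the job still completes with fewer resources than streams, but it takes 20 or 30 minutes instead of 10.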