CommCell Scalability Guide

Table of Contents

Overview

Benefits

Planning CommCell Deployment

Deployment Requirements

Assessing Future Growth

Planning New CommServe Servers

Placement of Libraries

Deployment in Virtual Environments

CommServe

MediaAgent

Scalability Guidelines for a CommServe

Server Scalability Parameters

Disconnecting Idle GUI Connections

Scalability Guidelines for a CommCell

Increasing Streams for Concurrent Backups

Decreasing Network Agents for Non-LAN Optimized Backups

Setting up Fan-In Ratio for Connections to a CDR/WBA Destination

Setting up Job Preemption Control for the CommCell

Managing Concurrent Jobs

Managing Hardware Snapshots

Managing Content Indexing Scalability

Scalability Guidelines for CommServe/CommNet Server

Large CommCell Optimization Parameters

CommServe

MediaAgent

File System iDataAgent

CommCell® Performance Parameters

Performance Degradation Impact Analysis

Overview

Calypso® software is deployed in large enterprise environments. Various scalability criteria need to be addressed in order to ensure a successful and sustainable installation.

Certain guidelines associated with the design, deployment, and support of large CommCell® environments must be followed. Consider the following as you plan the installation and configuration of Calypso® software.

Benefits

The following list outlines the benefits gained by deploying Calypso® software across multiple CommCell groups in an Enterprise environment:

Planning CommCell Deployment

This section outlines the deployment requirements and considerations for a Workgroup, Datacenter, or Enterprise environment. Each environment has different hardware and software requirements.

Deployment Requirements

Consider these requirements as you plan how you will install and configure Calypso®.

The following table illustrates the recommended system requirements for the CommServe and MediaAgents, based on scalability requirements.

CommServe

  Workgroup: Single Quad-Core Opteron, 2.2 GHz, or equivalent; 4 GB memory; 75 GB disk space
  Datacenter: Two Quad-Core or Dual-Core Intel Xeon 5500 series processors; 16 GB memory; 100 GB disk space
  Enterprise: Dual quad-core or six-core 64-bit Intel Xeon processors @ 3.66 GHz; 32 GB memory; 200 GB disk space

  A solid-state drive (SSD) is recommended.

MediaAgent

  Workgroup: Processor equivalent to a single Quad-Core Opteron, 2.2 GHz; 2 to 4 GB memory; disk space equal to 4% of the amount of data backed up through the MediaAgent
  Datacenter: Processor equivalent to two Quad-Core or Dual-Core Intel Xeon 5500 series processors; 4 to 8 GB memory; disk space equal to 4% of the amount of data backed up through the MediaAgent
  Enterprise: Processor equivalent to dual quad-core or six-core 64-bit Intel Xeon processors @ 3.66 GHz; 16 GB or more memory; disk space equal to 4% of the amount of data backed up through the MediaAgent

Refer to System Requirements for a detailed list of required components for each CommCell module.
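
The MediaAgent disk-space requirement above is a simple percentage rule. The following Python sketch, based only on the 4% guideline and the memory ranges in the table above, estimates MediaAgent disk and memory needs for a given amount of protected data; the function name and the example figures are illustrative and not part of the product.

  # Sizing sketch based on the table above: memory ranges per CommCell class
  # and disk space equal to 4% of the data backed up through the MediaAgent.
  # The function itself is illustrative and not part of Calypso.

  def mediaagent_sizing(data_backed_up_gb, commcell_class):
      """Return (disk_gb, memory_range_gb) for a MediaAgent of the given class."""
      memory_gb = {
          "Workgroup": (2, 4),
          "Datacenter": (4, 8),
          "Enterprise": (16, None),  # 16 GB or more
      }
      if commcell_class not in memory_gb:
          raise ValueError("Unknown CommCell class: %s" % commcell_class)
      disk_gb = 0.04 * data_backed_up_gb  # 4% of data backed up through this MediaAgent
      return disk_gb, memory_gb[commcell_class]

  # Example: 50 TB of data protected through a Datacenter-class MediaAgent
  disk, memory = mediaagent_sizing(50 * 1024, "Datacenter")
  print("Disk: %.0f GB, Memory: %s GB" % (disk, memory))  # Disk: 2048 GB, Memory: (4, 8) GB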

Assessing Future Growth

Complete an accurate assessment of planned data growth while finalizing the CommCell® design configuration.

This ensures that the design aligns with the scalability requirements described in this document when calculating backup, tape utilization, and retention requirements.

Planning New CommServe Servers

Adding new CommServe® servers allows future CommCell growth and upgrades to be completed in a more controlled fashion.

  There can be only one active CommServe instance in a CommCell at any given time.

Placement of Libraries

When deploying multiple CommCells, ensure that disk and tape libraries of different types are placed in different CommCells.

Conversely, ensure that tape libraries of the same type are placed within the same CommCell group whenever possible.

For large tape libraries, use vendor software that allows configuration of smaller virtual libraries.

Tape exports from multiple CommCells can be centrally managed by creating the appropriate Vault Tracker® policies in each CommCell configuration.

Deployment in Virtual Environments

When deploying Calypso® software in virtual environments consider the following:

  1. The hardware/software vendors must support their products within the hypervisor.
  2. The hypervisor vendor must support use of the hardware/software product within the hypervisor. Refer to specific notations regarding tape device connectivity to MediaAgents resident on Virtual Machines in the sections that follow.
  3. The use of Calypso® software within a virtual machine may introduce conditions (such as contention for shared resources or other interruptions) not present during standard product certification and scalability testing. The most commonly anticipated impacts on system performance for Virtual Machines are specified here; however, these impacts are intentionally broad, and different use cases may yield better or worse performance than documented. Additional tuning may be required to address any resulting delay, retry, or timeout conditions. Any such tuning is at the recommendation of the Virtual Machine platform developer rather than Bull.
  4. The software may deliver reduced performance when used in a Virtual Machine configuration for CommServe and MediaAgent systems. It may be necessary to increase system resources to address observed performance issues.
  5. As noted in the upcoming sections of this document, using a hypervisor vendor's interface to suspend, resume, or otherwise end the operation of a production (running services) CommServe or MediaAgent is specifically not supported. Only the use of the Bull Service Control interface (GUI and/or qscripts and qcommands) is supported for control.
  6. In the event of a support escalation, Bull Customer Support will make commercially reasonable efforts to attempt to resolve the issue within the virtual environment. Bull reserves the right to qualify, limit, exclude, or discontinue support for a virtualized CommServe or MediaAgent configuration due to unforeseen incompatibilities within the hypervisor environment.

CommServe

If the CommServe server is configured on a Virtual Machine (VM), it typically operates at approximately 60% of the efficiency of a comparable physical server. In this deployment model, scalability limits are reduced compared to a physical server environment. Take the following conditions into consideration when deploying the CommServe in a virtual environment:

  1. ERSet creation for a virtualized CommServe server is limited to writes to a shared magnetic library, given the support constraints for tape devices in virtualized environments. Appropriate disk space should be allocated to the CommServe server for the ERSet.
  2. Virtualized CommCell guests should be running on VMware vSphere 4.0.x or Microsoft Hyper-V R2 with resources comparable to a physical server. At a minimum, 2 vCPUs, 4 GB of vMemory, and 60 GB of disk should be available to the guest (Workgroup-level CommServe).
  3. In order to perform a storage vMotion, the virtual machine being vMotioned must have a snapshot taken. Once the snapshot is taken, the base disk is then copied from the source LUN to the destination LUN. This operation imposes high IO usage on both those disks, decreasing capacity available for VM usage. Additionally, Virtual Machine IO is directed to the snapshot, hence for every read operation the VM must first check the snapshot file to see if the bit is there, and if it is not, then must read it from the base disk. Once the copy is complete, the VM's base disk is changed to the destination location, but the snapshot (containing IO that occurred during the vMotion) still resides on the source disk. At this time that snapshot is committed. This means those changes are synced back to the base disk, now on the destination LUN. This also requires IO overhead.
      During a storage vMotion, every IO operation sent by the virtual machine requires at least 2 IO operations on the physical disk, and additional IO is being sent to the SAN, which would decrease available IO capacity.

    During any of these phases, if the SAN cannot cope with the additional workload and still return read/write requests within the application's timeout settings, an IO-intensive application on the virtual machine that is sensitive to read/write latency beyond the OS timeout settings may time out on some of those requests. Even if requests are returned fast enough to avoid a timeout, the additional latency remains. Databases are among the types of applications this can affect. vMotion operations involving the CommServe database will create undesirable latency in CommServe operations. Under no circumstances should a vMotion operation be performed with an active CommServe database.

    Storage vMotions cause additional IO load, which can cause a heavily loaded storage device to respond more slowly than an application might expect.

Configure an Alarm using vSphere Client

As a best practice when configuring a CommServe on a VM, configure an alarm to notify the CommCell administrator of a vMotion event. The CommCell administrator can use this alarm to diagnose any CommServe performance issue that follows the vMotion event.

1.
  • From the vSphere Client, navigate to the virtual machine.
  • Click the Alarms tab and select the Definitions view.
  • Right-click anywhere in the tab and click New Alarm...
2.
  • Type a name for the alarm in the Alarm name box.
  • Specify the purpose for this alarm in the Description box.
  • Select Monitor for specific events occurring on this object, for example, VM powered On.
3.
  • Click the Triggers tab.
  • Click Add.
  • From the Event column, select the activity that will trigger the alarm.
4.
  • Click the Actions tab.
  • Click Add.
  • From the Action column, select Send a notification email.
  • Click OK.

MediaAgent

If a MediaAgent is configured on a Virtual Machine (VM), it typically operates at approximately 60% of the efficiency of a comparable physical server. In this deployment model, scalability limits are reduced compared to a physical server environment. Take the following conditions into consideration when deploying the MediaAgent in a virtual environment:

  1. The maximum number of concurrent streams supported by a single MediaAgent is effectively 60% of the specification for a physical host (see the sketch after this list). Refer to the upcoming sections of this document for additional detail regarding MediaAgent scalability associated with stream counts.
  2. Virtualized MediaAgent guest servers running in a VMware environment should be running on VMware vSphere 4.0.x with the latest version and patches applied. In a Microsoft environment guests should be running on Hyper-V R2. At a minimum 2 vCPUs, 4 GB vMemory and sufficient disk resources to hold the data (to be protected) should be available to the guest.
  3. Library creation for a virtualized MediaAgent is limited to writes to a single magnetic library or stand-alone library, given the support constraints for tape devices in virtualized environments.
    For a VMware virtualized MediaAgent, virtual disk libraries can be configured as virtual disks within the MediaAgent guests or as attached RAW LUNs. Tape libraries can also be presented to a virtual guest by using SCSI pass-through. Adaptec cards are supported for SCSI pass-through, and iSCSI is supported for tape library connections. It is recommended to bind the iSCSI initiator to a separate NIC on the ESX host, as this isolates the iSCSI calls from routine LAN traffic. For more information on SCSI pass-through and hardware compatibility, consult the VMware documentation (http://www.vmware.com/support/pubs/) and VMware support.

    For Hyper-V MediaAgents, disk can be passed to the virtualized Hyper-V MediaAgent with the pass through disk feature. Note that the disk will need to be in an offline state on the parent server to function on the guest. For more information on pass through disks in Hyper-V refer to the Microsoft Hyper-V R2 website: http://www.microsoft.com/windowsserver2008/en/us/hyperv-main.aspx.
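
As a rough illustration of the 60% guideline in item 1 above, the following sketch derates the per-MediaAgent concurrent-stream limit for a virtual machine. The 300-stream figure is the physical-host value given in the Server Scalability Parameters notes later in this document; the function is illustrative only.

  # Derate the per-MediaAgent stream limit for a virtualized MediaAgent.
  # 300 concurrent streams is the physical-host value from the Server
  # Scalability Parameters notes; 60% is the VM efficiency guideline above.

  PHYSICAL_MAX_STREAMS = 300
  VM_EFFICIENCY = 0.60

  def vm_stream_limit(physical_max_streams=PHYSICAL_MAX_STREAMS,
                      efficiency=VM_EFFICIENCY):
      # Effective concurrent-stream ceiling for a virtualized MediaAgent.
      return int(physical_max_streams * efficiency)

  print(vm_stream_limit())  # 180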

Scalability Guidelines for a CommServe

A warning is issued to administrators when the scalability limits are approached. The warning message advises you to modify current settings or to configure the entities that exceed the scalability guidelines in a different CommCell group.

The following scalability thresholds have been identified for a single CommCell group. Threshold considerations are divided as follows:

Server Scalability Parameters

Observe the following parameters for implementing a CommCell hierarchy.

  If your CommServe is running on a physical computer (not a virtual machine) with a solid-state drive (SSD), you can scale it to twice the limits stated below.

For example, the number of supported clients could reach 5,000 instead of the usual 2,500.

Workgroup

Multiplexing Factor: 2 | Concurrent Job Streams: 75 | Client Count: 50 | MediaAgent Count: 20 | Drive Count: 20 | Slot Count: No Limit

Datacenter

Multiplexing Factor: 5 | Concurrent Job Streams: 300 | Client Count: 200 | MediaAgent Count: 50 | Drive Count: 100 | Slot Count: No Limit

MS SQL Express Edition

Multiplexing Factor: 5 | Concurrent Job Streams: 10 | Client Count: 25 (the maximum client count is inclusive of MediaAgents) | MediaAgent Count: 25 (any client computer can be installed as a MediaAgent) | Drive Count: 500 | Slot Count: No Limit

Enterprise

Multiplexing Factor: 25 | Concurrent Job Streams: 1,000 | Client Count: 2,500 | MediaAgent Count: no separate restriction (any client computer can be installed as a MediaAgent) | Drive Count: 1,000 | Slot Count: No Limit

Notes

  • Multiplexing Factor: the multiplexing limit for a single CommCell. Administrator notification is set at a multiplex count of 5.
  • Concurrent Job Streams: the number of active job streams corresponding to concurrently running jobs, where Running Job Streams = No. of tape drives * (Multiplexing Factor + Magnetic Writers) (see the sketch following these notes). This is the high watermark value in the GUI and is enforced at 1,200.
  • Client Count: the number of supported clients within a single CommCell. A hard limit notification occurs at 4,500 clients within the CommCell.
  • MediaAgent Count: the number of supported MediaAgents within a single CommCell. Express CommCells have a limit of 25 MediaAgents. The maximum number of concurrent streams to a single MediaAgent is set at a value of 300.
  • Drive Count: the number of supported tape drives within a single CommCell.
  • Slot Count: the number of supported slots within a single CommCell.
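
The Running Job Streams formula and the 1,200-stream high watermark in the notes above, together with the SSD doubling rule stated earlier, can be expressed directly. In the sketch below, the function names and the example drive and writer counts are assumptions for illustration, not product settings.

  # Apply the Running Job Streams formula and the enforced high watermark
  # from the notes above. Function names and example values are illustrative.

  HIGH_WATERMARK = 1200  # enforced stream high watermark in the GUI

  def running_job_streams(tape_drives, multiplexing_factor, magnetic_writers):
      # Running Job Streams = No. of tape drives * (Multiplexing Factor + Magnetic Writers)
      streams = tape_drives * (multiplexing_factor + magnetic_writers)
      return min(streams, HIGH_WATERMARK)

  def client_limit(base_limit, physical_with_ssd=False):
      # A physical CommServe with an SSD can scale to twice the stated limits,
      # e.g. 5,000 clients instead of the usual 2,500.
      return base_limit * 2 if physical_with_ssd else base_limit

  print(running_job_streams(tape_drives=100, multiplexing_factor=5, magnetic_writers=4))  # 900
  print(client_limit(2500, physical_with_ssd=True))  # 5000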

Disconnecting Idle GUI Connections

To improve performance of a CommCell group, enable the option for the CommCell Console to become disconnected when inactive for a certain period of time. This option disconnects connections from idle GUI sessions, thereby allowing other GUIs to connect without exceeding the established parameter.

Follow the steps below to enable this option:

  1. From the CommCell Console, select Tools | Control Panel.
  2. From the Control Panel, select System.
  3. From the General tab, select the Allow GUI connections to timeout option.
  4. Enter a timeout value in the GUI timeout in minutes box.

    The recommended value is 30 minutes; the default value is 180 minutes.

      Enter a reasonable timeout value to programmatically terminate idle connections. The lower the disconnect threshold, the sooner idle GUI sessions are disconnected, allowing active GUI sessions to connect to the CommServe.
  5. Click OK.

Scalability Guidelines for a CommCell

It is strongly recommended that the soft limits described in the sections below be followed and not exceeded.

It is also recommended to review the Deduplication Architecture Guide when planning Calypso® Block Level Deduplication CommCell group deployments.

Contact Customer Support or your Account Team for the current release of that separate document.

Increasing Streams for Concurrent Backups

To increase the number of streams for concurrent backups from a large number of clients, enable the Optimize for concurrent LAN backups option. This increases the current stream count limit by 200 additional streams.

  1. From the CommCell Browser, navigate to Storage Resources | MediaAgents | <MediaAgent>.
  2. Right-click the MediaAgent that you wish to optimize for concurrent LAN backups and then click Properties.
  3. Click the Control tab.
  4. Select the Optimize for concurrent LAN backups option to enable it.
  5. Click OK.

Decreasing Network Agents for Non-LAN Optimized Backups

For better throughput, specify a lower value for the number of data pipes/processes that the client uses to transfer data over a network.

  1. From the CommCell Browser, right-click the subclient.
  2. Click Properties.
  3. Click Storage Device.
  4. Click the Data Transfer Option tab.
  5. Specify the number of Network Agents as 1.
      On non-UNIX computers, the default value is 2 and a maximum of 4 can be established if necessary. On UNIX computers the default value is 1 and a maximum of 2 can be established if necessary.
  6. Click OK.

Setting Up Fan-In Ratio for Connections to a CDR/WBA Destination

For maximum performance and robustness, the total number of Replication Pairs configured for the same source volume should be kept to a minimum. If multiple Replication Pairs for the same source volume are required, the following limits must be observed.

CommCell Class: Fan-In Ratio by Server Type

Workgroup

Win 32: 1 to 20 | Win 64: 1 to 60

Datacenter

Win 32: 21 to 50 | Win 64: 61 to 150

Enterprise

Win 32: 51 to 100 | Win 64: 151 to 500
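
The fan-in limits above can be encoded as a simple lookup so that a planned number of Replication Pairs per destination can be checked against its CommCell class and server type. The ranges below are copied from the table; the dictionary and function are illustrative only.

  # Check a planned fan-in (Replication Pairs per CDR/WBA destination)
  # against the limits in the table above. Ranges are inclusive.

  FAN_IN_LIMITS = {
      # (CommCell class, server type): (low, high)
      ("Workgroup", "Win 32"): (1, 20),
      ("Workgroup", "Win 64"): (1, 60),
      ("Datacenter", "Win 32"): (21, 50),
      ("Datacenter", "Win 64"): (61, 150),
      ("Enterprise", "Win 32"): (51, 100),
      ("Enterprise", "Win 64"): (151, 500),
  }

  def fan_in_within_limit(pairs, commcell_class, server_type):
      low, high = FAN_IN_LIMITS[(commcell_class, server_type)]
      return low <= pairs <= high

  print(fan_in_within_limit(45, "Datacenter", "Win 32"))   # True
  print(fan_in_within_limit(200, "Datacenter", "Win 64"))  # False -- exceeds 61 to 150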

Setting Up Job Preemption Control for the CommCell

In Virtual Tape Library environments, scalability may extend beyond 60,000 tapes when backup jobs preempt auxiliary copy jobs. To enable this preemption:

  1. From the CommCell Console, select Tools | Control Panel.
  2. From the Control Panel, select Job Management.
  3. From the General tab, select the Backups Preempts Auxiliary Copy option.
  4. Click OK.

Managing Concurrent Jobs

It is important to manage concurrently running jobs; this can be done by staggering schedules. Use multiple schedule policies on different client groups and adjust the timing of the schedules to optimize scalability.

The table below displays the maximum number of concurrent jobs permitted in different environments.

CommCell Class Total Permitted Job Count

Workgroup

1 to 100

Datacenter

101 to 300

Enterprise

301 to 1,000

Notes

This count includes jobs in a Waiting or Pending status.

There is no limit to the number of Storage Policies in a single CommCell group for Enterprise environments.

Stagger the start times of jobs so that they are separated by an interval of up to 20 minutes.
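
To illustrate the staggering guidance above (batches of up to 100 jobs started up to 20 minutes apart, as noted under Performance Degradation Impact Analysis later in this document, within the per-class concurrent-job ceilings), the following sketch generates staggered start times. The base start time, the schedule grouping, and the default 1,000-job ceiling chosen here (the Enterprise value from the table) are example assumptions.

  # Generate staggered schedule start times in batches, as suggested above.
  # Batch size (100 jobs) and interval (20 minutes) come from this guide;
  # the base start time and schedule grouping are hypothetical.

  from datetime import datetime, timedelta

  def stagger_schedules(job_count, base_start, batch_size=100,
                        interval_minutes=20, max_concurrent=1000):
      """Yield (batch_number, start_time, jobs_in_batch) for each staggered batch."""
      if job_count > max_concurrent:
          raise ValueError("Exceeds the permitted concurrent job count for this class")
      batch = 0
      while job_count > 0:
          jobs = min(batch_size, job_count)
          yield batch + 1, base_start + batch * timedelta(minutes=interval_minutes), jobs
          job_count -= jobs
          batch += 1

  # Example: 350 jobs for an Enterprise-class CommCell starting at 20:00
  for n, start, jobs in stagger_schedules(350, datetime(2010, 1, 1, 20, 0)):
      print("Batch %d: %s, %d jobs" % (n, start.strftime("%H:%M"), jobs))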

Managing Hardware Snapshots

There is no software limit on the number of hardware-generated snapshots created using Calypso®. However, for limits imposed by each manufacturer's controllers and software, refer to your hardware provider's documentation.

Managing Content Indexing Scalability

Content Indexing and Search provides the ability to content index and search both file server data and protected data for data discovery.

The following limits must be observed to achieve maximum performance.

Installation Type Hardware Type Object Count per Index Node

Upgrade

Legacy 50,000,000

New Installation

Virtual Machine-based 50,000,000

New Installation

x64, 16 GB RAM Servers 100,000,000

Notes

  The limit of 100,000,000 content-indexed objects per Index Node is a hard limit and cannot be exceeded.
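
As a planning aid, the sketch below estimates how many index nodes are needed for a given object count using the per-node limits in the table above. Treating the limit as a per-node capacity for node-count planning is an assumption, and the example object count is illustrative.

  # Estimate the number of index nodes from the per-node object limits above.
  # Using the limit as a per-node planning capacity is an assumption;
  # the example object count is illustrative.

  import math

  OBJECTS_PER_NODE = {
      "upgrade_legacy": 50_000_000,
      "new_vm_based": 50_000_000,
      "new_x64_16gb_ram": 100_000_000,
  }

  def index_nodes_needed(total_objects, installation_type):
      return math.ceil(total_objects / OBJECTS_PER_NODE[installation_type])

  print(index_nodes_needed(250_000_000, "new_x64_16gb_ram"))  # 3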

Scalability Guidelines for CommServe/CommNet Server

This section describes scalability thresholds for a co-located CommServe and CommNet Server, where the CommNet Server functions as the reporting server for a single CommServe.

CommCell Class Maximum Number of Supported Clients

Workgroup

1 to 50

Datacenter

51 to 200

Enterprise

201 to 2,500

Notes

  • A dedicated, stand-alone CommNet Server is recommended for multiple CommCells.

    The co-located CommNet Server cannot function as the reporting server for multiple CommCells unless the appropriate licenses are available.

  • Client counts per CommCell beyond these specifications require the configuration of separate CommNet Servers.
  • The maximum number of CommCell groups managed by a single CommNet Server is 25.
      It is recommended to install the CommNet Server on a dedicated machine in order to connect the maximum number of clients to a CommServe.

Large CommCell Optimization Parameters

CommServe

Reducing Subclient Count

Optimizing the subclient count makes daily backup operations easier for the administrator to manage. It is recommended to periodically review subclients to determine whether any redundant or unneeded subclients exist and can be removed from the CommCell configuration. Information about each subclient is tracked by the CommServe, so reducing the number of subclients greatly reduces the amount of tracking information.

Increasing Chunk Size

This parameter impacts tape backup operations. A higher chunk size gives better throughput.

A lower value is recommended when more frequent checks are needed for slower data protection operations, especially when data is moving across a WAN link (see the sketch after the steps below).

  1. From the CommCell Console, select Tools | Control Panel.
  2. From the Control Panel, select Media Management.
  3. Click the Chunk Size tab.
  4. Click the Agent for which you wish to modify the chunk size, and modify the chunk size to 8 GB in the Chunk Size column.
     
    • Granular restores will be slower with this setting; however, large restores will be faster.
    • This setting does not apply to database backups, where the Chunk Size is 16 GB by default.
  5. Click OK.
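
As a simple way to see the trade-off described above, the sketch below computes how many chunks a data protection job of a given size produces at different chunk sizes. Treating each chunk boundary as a check point is an interpretation of the guidance above, and the job size and the 2 GB value are illustrative; the 8 GB and 16 GB values are the ones mentioned in this section.

  # Illustrate the chunk-size trade-off: larger chunks mean fewer chunk
  # boundaries (better throughput to tape), smaller chunks mean more
  # frequent boundaries (more frequent checks over slow WAN links).
  # The job size and the 2 GB chunk size are illustrative values.

  import math

  def chunk_count(job_size_gb, chunk_size_gb):
      # Number of chunks written for a job of the given size.
      return math.ceil(job_size_gb / chunk_size_gb)

  for chunk_gb in (2, 8, 16):
      print("Chunk size %2d GB -> %3d chunks for a 512 GB job"
            % (chunk_gb, chunk_count(512, chunk_gb)))
  # Chunk size  2 GB -> 256 chunks for a 512 GB job
  # Chunk size  8 GB ->  64 chunks for a 512 GB job
  # Chunk size 16 GB ->  32 chunks for a 512 GB job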

Updating Interval

It is recommended to implement a formal software update management process. This process allows for planned software updates to be aggregated on a 30-day or 60-day basis.

It is recommended that a test CommServe be used to validate updates before scheduled updates are placed into a production environment. This helps improve backup stability.

It is also recommended to increase the Job Manager update interval for your agent. This can be done using the following steps:

  1. From the CommCell Console, select Tools | Control Panel.
  2. From the Control Panel, select Job Management.
  3. Click the Job Updates tab.
  4. Click the Agent for which you wish to modify the update interval, and modify the Protection (Mins) to 10.
  5. Click OK.

MediaAgent

File System iDataAgent

All File Systems

File System Multi-Streaming employs multiple data streams per subclient for data protection operations, enabling the subclient's contents to be distributed across all the streams and transmitted in parallel to the storage media. It is recommended to use Automatic File Multi-Streaming for larger subclients (1 TB or more).

This allows the file system backup to use multiple readers for increased performance; this configuration in turn reduces duplicate file scan time on client servers.

  For subclients less than 1 TB in total size, set the number of readers on the subclient to 1.

Windows File System

After upgrading to a newer release of Calypso®, delete the system state backups.

Newly installed clients do not have a separate system state subclient.

CommCell® Performance Parameters

CommCell performance is based on load, which can be measured by the impact on the following types of operations:

  1. GUI response time for all of the screens
  2. Ability to run backups/restores and other jobs at specified times
  3. Uninterrupted operation of the subsystems without services stopping or requiring a reboot at the client computer
  4. CommServe database consistency
  5. Console flooding with critical/major alerts or events
  6. Tape drive thrashing (tape to drive swaps)

Performance Degradation Impact Analysis

Within a CommCell hierarchy, the following subsystems must be examined so that a degradation in performance does not occur. Areas of concern depend on individual data protection objectives, the server, and the available storage resources. The subsystem recommendations described in this section are based on size and expected growth trends.

No Impact

Update index processes on the MediaAgent, up to the number of drives controlled through each MediaAgent.

Least Impact

  1. Growth and rapid overwriting of log files.
  2. Create index processes on the MediaAgent.

    Approximately six backup jobs per drive per night on a MediaAgent, for a maximum of 36 processes that may start concurrently.

  3. Number of cache mounts in the system.

Medium Impact

  1. Refresh updates to the GUI.
  2. Concurrent schedule submission (600).

    You may stagger up to 100 jobs at a time and 20 minutes apart.

  3. Operating too many jobs in the job controller.
  4. Resource Manager slowness due to parsing of high numbers of resource entities.
  5. Reports: the Prune Forecast Report and the Data on Media Report run slowly when the amount of data is high.

High Impact

  1. Performance degrades when large Exchange Mailbox or Lotus Notes Document jobs are run concurrently.
  2. An excessive number of simultaneous GUI connections to the CommServe also impacts performance.
  3. CommServe SQL database interaction with data aging, timeouts, deadlocks, and similar conditions impacts performance.
  4. Flooding of event messages and alerts (if configured) impacts performance.
  5. Tape drive thrashing (tape-to-drive swaps) may occur if all jobs for all Storage Policies are flooded into the Job Controller at one time. It is recommended that backups be scheduled in groups by Storage Policy so that queued jobs associated with one Storage Policy have their tapes already mounted in drives. This allows jobs to stream to "pre-mounted" tapes.