Setting Up Data Classification Enabler



Overview

Installing the Data Classification Enabler

Enable and Configure the Agents or Components


Overview

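Setting up the Data Classification Enabler includes the following tasks: installing the Data Classification Enabler, and enabling and configuring the agents or components that use it.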


Installing the Data Classification Enabler

The Data Classification Enabler can be installed on Windows and Unix computers. For more information, see Deployment - Data Classification Enabler.

Also, see System Requirements - Data Classification Enabler for the supported operating systems, applications and file servers.

After the Data Classification Enabler is installed and enabled, it performs an initial collection of all the data and then creates SQL-like meta databases. When this initialization is complete, the supported components can use Data Classification.

Database Considerations

Exchange

To administer Exchange data, the Data Classification Enabler uses two processes: enumeration and sink. The enabler uses enumeration to log in to the Exchange Server, parse each Exchange mailbox on the server, and create a map of the data in the Data Classification database. The enabler uses sink to hook into the Exchange Server in order to capture the state changes (events) of the mailbox contents and to record this information in the Data Classification transaction logs. Once these logs reach the specified maximum size, their contents are committed to the Data Classification database. As a result, the Data Classification database includes a record of the data change events and a corresponding time stamp for each event.
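The log-then-commit behavior of the sink process can be pictured with a minimal Python sketch. It is an illustration only, not the enabler's implementation: the class names, the fields, and the use of an event count as the "maximum size" are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class MailboxEvent:
    """One state change captured for a mailbox item (hypothetical fields)."""
    mailbox: str
    item_id: str
    change: str          # for example "added", "removed", or "changed"
    timestamp: datetime


@dataclass
class EventLog:
    """Hypothetical transaction log that is committed once it reaches a size cap."""
    max_events: int                          # stand-in for the specified maximum size
    pending: list = field(default_factory=list)
    database: list = field(default_factory=list)

    def record(self, mailbox: str, item_id: str, change: str) -> None:
        # Every event is stored together with the time it was captured.
        event = MailboxEvent(mailbox, item_id, change, datetime.now(timezone.utc))
        self.pending.append(event)
        if len(self.pending) >= self.max_events:
            self.commit()

    def commit(self) -> None:
        # Move all pending events into the database and empty the log.
        self.database.extend(self.pending)
        self.pending.clear()
```

In this sketch, events stay in the log until the cap is reached, so the database lags behind the mailbox by at most one log's worth of events.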

Unix

Each meta database contains information about the files in the associated volume. Thereafter, the Data Classification service constantly monitors all the files on these volumes, and it detects new volumes at a prescribed time interval. The service updates the databases and keeps track of the updates (e.g., file additions and content updates to files) made to each database; in effect, this provides a near real-time view of the data in the system.

By default, the meta database is located at the root of each mount point, and it is named .db.cv (e.g., for /home, it would be /home/.DATACLASS_1/.db.cv). Journals from the FSF driver are used to keep track of the updates to each meta database.

Windows

Each meta database contains information about the files in the associated volume. Thereafter, the Data Classification service constantly monitors all the files on these volumes, and it detects new volumes at a prescribed time interval. The service updates the databases and keeps track of the updates (e.g., file additions and content updates to files) made to each database; in effect, this provides a near real-time view of the data in the system.

By default, the meta database is located at the root of each NTFS volume or mount point, and it is named [volume]_db.db (e.g., for C:\, it would be c_db.db). For mount points, the database name is [mountpoint]_db.db (e.g., for a mount point C:\mountpoint, the file is mountpoint_db.db, and it resides in the C:\mountpoint directory). Change Journal is used to keep track of the updates to each meta database.
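The naming rule above can be expressed as a short Python sketch. The function name is hypothetical, and lowercasing the drive letter is an assumption drawn from the c_db.db example; the sketch only computes the expected path and does not touch the file system.

```python
from pathlib import PureWindowsPath


def meta_database_path(root: str) -> PureWindowsPath:
    """Return the default meta database path for a volume root or mount point."""
    path = PureWindowsPath(root)
    if path.name:
        # Mount point such as C:\mountpoint: use the directory name.
        stem = path.name
    else:
        # Volume root such as C:\ : use the drive letter (assumed lowercase).
        stem = path.drive.rstrip(":").lower()
    return path / f"{stem}_db.db"


print(meta_database_path("C:\\"))            # C:\c_db.db
print(meta_database_path("C:\\mountpoint"))  # C:\mountpoint\mountpoint_db.db
```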

Data Classification works with NTFS volumes but not with FAT volumes. New volumes that are added to your system are automatically recognized.

Space and Performance Considerations

The meta databases created by Data Classification usually consume about 5% of the total space on the hard disk. Depending on the type of data and folder layout, the metafiles may consume additional space.

For Data Classification on Unix, each Data Classification update record consumes about 256 bytes (this assumes an average short name length of 16 bytes and an average full path length of 256 bytes).
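For rough capacity planning, the two figures above can be turned into a small calculation. The following Python sketch is only an estimate under those stated averages; the function names are hypothetical, and actual consumption depends on the type of data and the folder layout.

```python
META_DB_PERCENT = 5              # "about 5%" of the total disk space
UNIX_UPDATE_RECORD_BYTES = 256   # "about 256 bytes" per Unix update record


def estimate_meta_db_bytes(disk_bytes: int) -> int:
    """Approximate space used by the meta databases on a disk of the given size."""
    return disk_bytes * META_DB_PERCENT // 100


def estimate_unix_update_bytes(update_count: int) -> int:
    """Approximate space used by a number of Unix update records."""
    return update_count * UNIX_UPDATE_RECORD_BYTES


# A 1 TB disk (10**12 bytes) and one million recorded updates:
print(estimate_meta_db_bytes(10**12))         # 50000000000 (about 50 GB)
print(estimate_unix_update_bytes(1_000_000))  # 256000000 (about 256 MB)
```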

For Data Classification on Windows, you can administer the size of the Data Classification databases by using the DC_CREATE_INDEX registry key. This key also allows you to administer other items associated with Data Classification, such as the time required for database initialization as well as backup and archiving speed for some agents.

Services

Data Classification services can be started or stopped using the Service Control Manager. See Services for an overview. Data Classification services on Windows can also be started or stopped using the Data Classification Administration Utility or the Data Classification Console for Windows.


Enable and Configure the Agents or Components

Depending on the agent or component, you can configure the Data Classification Enabler to do the following:

Exchange Agents

These agents can use the Data Classification Enabler to log and use events generated by the Exchange server in order to select the eligible data for data protection operations or Online Content Indexing. The events keep track of what data has been added to, removed from, and changed in the Exchange Server. Tracking data using events is especially useful when preparing to run incremental backups or Online Content Indexing. 

Use the Data Classification Administration Utility or the Data Classification Console for Windows to administer event logging for Exchange mailbox data and to populate the Data Classification database. For more information, see Advanced Plugins. For the step-by-step procedures, see Administer Exchange Properties for Data Classification and Start Data Classification for Exchange to Populate the Data Classification Database with Exchange Metadata.

To use the Data Classification Enabler with an eligible Exchange 2007 agent, both the enabler and the agent must be installed on a proxy computer. To administer Exchange 2007 mailbox data using the Data Classification Enabler, you must configure a proxy computer for the enabler by populating the affected fields discussed in Administer Exchange Properties for Data Classification. You must also select the name of the proxy from the Exchange Proxy list on the Client Computer Properties (Advanced) tab.

To use the Data Classification Enabler on a 2008 CCR cluster, you must configure a shared folder that will essentially serve as the third node in the cluster. To enable creation and use of this shared folder, ensure that your system is running the Distributed File System (DFS) and that it includes a DFS replicator. Be sure to add the shared folder to the replication group for all the nodes in the cluster. This setup replicates the data to all the nodes so that, if a failover occurs, every node has the data.

Selected data can be backed up or migrated, as appropriate, by the supported agents. Before you perform either data management operation, be sure to select the Use Data Classification option from the Backup Set/Archive Set Properties dialog box in the CommCell Console. For a step-by-step procedure, see Use Classic File Scan or Data Classification.

If you are using Data Classification on an Exchange database that has been restored from a protected copy, be sure to stop the Data Classification services after the restore and then restart the services. This will repopulate the Data Classification database correctly.

File Archiver for Unix Agent

For this agent, there is no fallback scan method if the Data Classification Enabler is not available.

This agent can use the Data Classification Enabler to define archiving rules based on file attributes and not just on volumes and basic attributes, such as size and modified times. For example, you can use Data Classification to define the agent's subclient content to contain all files starting with 'A' or all files modified after a specific date. You can make the associated queries for these and more complex definitions by issuing SQL-like commands from the CommCell Console against the meta databases. For an overview, see Rules and Queries.
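To give a feel for what an attribute-based rule selects, the following Python sketch runs an equivalent query against an in-memory SQLite table. It does not show the Data Classification query syntax or schema: the table, its columns, the sample rows, and the function name are all assumptions made for the illustration.

```python
import sqlite3

# Hypothetical stand-in for a meta database: one row per file.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (path TEXT, name TEXT, size INTEGER, modified TEXT)")
conn.executemany(
    "INSERT INTO files VALUES (?, ?, ?, ?)",
    [
        ("/home/u1/Alpha.txt", "Alpha.txt", 1200, "2009-03-01"),
        ("/home/u1/beta.txt", "beta.txt", 800, "2009-06-15"),
        ("/home/u2/Archive.log", "Archive.log", 50000, "2008-11-20"),
    ],
)


def select_files(name_prefix: str, modified_after: str) -> list:
    """Return paths of files whose name starts with a prefix and that changed after a date."""
    # Note: SQLite's LIKE is case-insensitive for ASCII letters by default.
    rows = conn.execute(
        "SELECT path FROM files WHERE name LIKE ? AND modified > ? ORDER BY path",
        (name_prefix + "%", modified_after),
    )
    return [row[0] for row in rows]


print(select_files("A", "2009-01-01"))  # ['/home/u1/Alpha.txt']
```

The point of the sketch is that the rule is evaluated against stored file attributes rather than by walking the volume, which is why no explicit paths need to be listed in the subclient content.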

File Archiver for Windows Agent

Local File System Instance

For this agent, there is no fallback scan method if the Data Classification Enabler is not available.

This agent can use the Data Classification Enabler to define archiving rules based on file attributes and not just on volumes and basic attributes, such as size and modified times. For example, you can use Data Classification to define the agent's subclient content to contain all files starting with 'A' or all files modified after a specific date. You can make the associated queries for these and more complex definitions by issuing SQL-like commands from the CommCell Console against the meta databases. For an overview, see Rules and Queries. To use this capability, you must first configure a Local File System Instance.

This agent can use the Data Classification Enabler to support domain users and user groups. The users whose files you want to archive can be authenticated against the Active Directory domain. For more information, see Users and User Groups. Using Data Classification for this purpose is especially useful when you are archiving data for user groups across multiple volumes. Data Classification can archive data for users in these groups using rules that you define, without requiring you to specify the exact paths to this data.

Online Content Indexing

Online Content Indexing can content-index various data that are scanned or selected by the Data Classification Enabler.

SRM Windows File System Agent

This agent can use the Data Classification Enabler to scan file system data before data collection jobs. Such scans help expedite Analysis-level data collection jobs. Scans using Data Classification for this agent are enabled by default from the CommCell Console. For more information, see Agents - SRM Windows File System: Data Classification. For a step-by-step procedure, see Enable Data Classification Enabler for SRM.

Once this setting is enabled, all Analysis-level data collection jobs that you run subsequently will use Data Classification to gather data. Data collection jobs will transition to traditional collection methods if any of the following conditions are true:

Unix File System iDataAgents

These agents can use the Data Classification Enabler to improve the scan speed of file system data before data management operations. If the enabler is not available, Classic File Scan is used to scan the data. Scans using Data Classification for these agents must be enabled from the CommCell Console. See Use Classic File Scan or Data Classification for a step-by-step procedure.

Windows File System iDataAgents

These agents can use the Data Classification Enabler to improve the scan speed of file system data before data management operations. If the enabler is not available, Change Journal or Classic File Scan is used to scan the data. Scans using Data Classification for these agents must be enabled from the CommCell Console. See Use Change Journal, Classic File Scan or Data Classification for a step-by-step procedure.
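The fallback behavior described for the file system and File Archiver agents can be summarized in a short Python sketch. The agent keys and the function name are hypothetical labels chosen for the example; the sketch does not model the CommCell Console setting that enables Data Classification scans, and it does not assume an order of preference between Change Journal and Classic File Scan.

```python
# Scan methods available to each agent when the enabler is not available,
# as described in this topic. The keys are hypothetical labels.
FALLBACK_SCAN_METHODS = {
    "unix_file_system": ("Classic File Scan",),
    "windows_file_system": ("Change Journal", "Classic File Scan"),
    "file_archiver_unix": (),      # no fallback scan method
    "file_archiver_windows": (),   # no fallback scan method
}


def candidate_scan_methods(agent: str, enabler_available: bool) -> tuple:
    """Return the scan methods an agent can use in the given situation."""
    if enabler_available:
        return ("Data Classification",)
    fallbacks = FALLBACK_SCAN_METHODS[agent]
    if not fallbacks:
        raise RuntimeError(f"{agent} has no fallback scan method")
    return fallbacks


print(candidate_scan_methods("unix_file_system", enabler_available=False))
# ('Classic File Scan',)
```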