Configuration - Content Indexing and Search



Content Indexing Engine

Offline Content Indexing

Online Content Indexing

Security

Search Console

Agent-Specific Configuration

Upgrade Considerations


Content Indexing Engine

After you install the Content Indexing Engine, you can set the following options in the CI Engine Properties dialog box in the CommCell Console to improve the performance of content indexing operations. (In a multi-node installation, set these options for the Content Indexing Engine on the Admin node.)

Maximum Number of Batch Slots - You can set this option to determine the maximum number of batch slots to be sent at a time to the Content Indexing Server for content indexing. By default, the value is set to 40. It is recommended to set this value to 40 times the number of nodes created (including the Admin node). For example, if you have 4 nodes created, including the Admin node, set the value to 160.

Maximum Number of Documents Per Batch - You can set this option to determine the maximum number of documents to be included in a batch for content indexing. By default, the value is set to 1000. It is always recommended to include 100 documents in a batch.

For step-by-step instructions on setting these options, see Configure Content Indexing Engine Options.
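
As a rough illustration of the two options above, the following Python sketch computes the recommended batch slot value from the node count and splits a list of documents into batches of the recommended size. The function names are hypothetical and are not part of the product; the sketch only mirrors the arithmetic described in the text.

```python
from typing import Iterator, List, Sequence

SLOTS_PER_NODE = 40            # multiplier recommended in the text above
RECOMMENDED_BATCH_SIZE = 100   # documents per batch recommended in the text above


def recommended_batch_slots(node_count: int) -> int:
    """Return 40 times the number of nodes (the Admin node counts as a node)."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1 (the Admin node)")
    return SLOTS_PER_NODE * node_count


def split_into_batches(documents: Sequence[str],
                       batch_size: int = RECOMMENDED_BATCH_SIZE) -> Iterator[List[str]]:
    """Yield consecutive batches that hold at most batch_size documents each."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(documents), batch_size):
        yield list(documents[start:start + batch_size])
```

For example, recommended_batch_slots(4) gives 160, which matches the four-node example above, and 250 documents are split into batches of 100, 100, and 50.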

Offline Content Indexing

After installing the Content Indexing Engine, configure the Offline Content Index before running or scheduling content indexing operations. (If you have not already installed the software, see Deployment - Content Indexing and Search for more information on how to install the Content Indexing Engine.) This includes the following tasks:

  1. Identify and enable content indexing in storage policies that will be used for content indexing data in storage.

    See Enable (or Disable) Storage Policies for Content Indexing for step-by-step instructions.

    Note that when you disable content indexing for a storage policy, a warning message prompts you to choose whether to de-configure and remove all the content indexes associated with the policy. If you select Yes, all the content indexes associated with the storage policy are pruned and content indexing is disabled. If you select No, the content indexes are retained, but content indexing is still disabled for the storage policy.

  2. Identify and enable the Clients whose data must be content indexed by the Content Indexing Engine whenever a data protection operation is run.
    • A license is consumed when you enable a Client for Offline Content Indexing. See License Requirements for more information.
    • To ensure that the protected data associated with the Subclients (in the Client) is content indexed, make sure that the required Subclients point to a Storage Policy (Copy) in which Content Indexing is enabled.

    See Enable (or Disable) Clients for Content Indexing for step-by-step instructions.

  3. Identify the file types that must be content indexed, using an inclusion or exclusion list.

    See Filter File Types that Must be Content Indexed for step-by-step instructions.

    It is strongly recommended that you filter out files that do not need to be content indexed. This limits the size of the index to only those documents that need to be content indexed.

    When filtering the files for content indexing, note that the CommCell Console enables filtering based on the file extensions, whereas the Content Indexing Engine filters the files based on the file/MIME types. Multipurpose Internet Mail Extensions (MIME) type is an Internet standard that is used to identify the type of information in a file.

    For example, if you change the file extension of a Word document to that of a JPEG image (.jpg) and provide a filter for .jpg files in the CommCell Console, the file will not be sent to the Content Indexing Engine for content indexing. However, if you do not provide a filter for .jpg files in the CommCell Console, the file is sent to the Content Indexing Engine and is content indexed as a Word file, since the MIME type identifies the file as a Word document.

    By default, the system content indexes all the file/MIME types listed in Supported Document Formats. (This list also shows the file/MIME types that can be included in or excluded from content indexing.)

    By default, the system does not content index the file types listed in Common File Types Excluded From Content Indexing.

  4. Configure the retention criteria for the content index. By default, the indexes are retained as long as the data is retained and are automatically pruned when the data aging operation prunes the associated data.

    See Configure Retention Criteria for the Content Index for step-by-step instructions.

  5. If necessary, specify the backup selection criteria for Content Indexing in the storage policy.

    See Specify the Backup Selection Criteria for Content Indexing for step-by-step instructions.

    Once again, it is recommended that you enable content indexing for data associated with long-term retention, such as monthly/yearly full backups or data with extended retention periods. This helps limit the size of the index.

  6. If necessary, select the Subclients that must be content indexed in the storage policy.

    See Add (or Remove) Subclient for Content Indexing for step-by-step instructions.

  7. If necessary, you can disable the preview of search results before restore, in the storage policy.

    See Disable Preview of Search Results for step-by-step instructions.
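
The difference described in step 3 between extension-based filtering in the CommCell Console and file/MIME type detection in the Content Indexing Engine can be sketched in Python as a two-stage check. This is only an illustration under stated assumptions: the function names and type labels are hypothetical, and detection from leading bytes is a common technique used here as a stand-in, not a description of how the engine actually identifies file types.

```python
import os
from typing import Optional, Set

# Leading bytes ("magic numbers") of a few well-known formats.
JPEG_MAGIC = b"\xff\xd8\xff"
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # legacy Office container, e.g. Word .doc
ZIP_MAGIC = b"PK\x03\x04"                          # ZIP container, e.g. Word .docx


def passes_extension_filter(filename: str, excluded_extensions: Set[str]) -> bool:
    """Stage 1 (console-style): decide purely from the file name."""
    extension = os.path.splitext(filename)[1].lower()
    return extension not in excluded_extensions


def detect_type_from_content(data: bytes) -> str:
    """Stage 2 (engine-style): decide from the bytes and ignore the name."""
    if data.startswith(JPEG_MAGIC):
        return "jpeg"
    if data.startswith(OLE2_MAGIC):
        return "ole2-office-document"
    if data.startswith(ZIP_MAGIC):
        return "zip-container"
    return "unknown"


def indexing_decision(filename: str, data: bytes,
                      excluded_extensions: Set[str]) -> Optional[str]:
    """Return the detected type, or None when the name-based filter drops the file."""
    if not passes_extension_filter(filename, excluded_extensions):
        return None
    return detect_type_from_content(data)
```

With a legacy Word document renamed to report.jpg, a filter on .jpg drops the file before its content is ever examined. Without that filter, the file is passed on and recognized from its content as an Office document, regardless of its name.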

Once the Content Indexing Engine is configured, you can run or schedule content indexing operations. See Operations - Content Indexing Engine for more information.

Other Considerations


Online Content Indexing

Once you have installed an Online Content Indexing agent, you need to configure it before running or scheduling any online content indexing operations. (If you have not already installed the agent, see Deployment - Content Indexing and Search for information on installing the Online Content Indexing agents.)

When installed, the software by default creates a content index set with a default subclient. However, you can also create user-defined subclients based on your content indexing needs. Prior to performing content indexing operations, configure the following:

If necessary, you can also configure the following:

The other configurable properties available for the Online Content Indexing Agents are User Administration and Security and Activity Control.

Once the Online Content Indexing subclient is configured, you can run or schedule content indexing operations on the subclient. See Operations - Content Indexing Engine for more information.

 

Other Considerations

General

Online Content Indexing for File System Agent

Online Content Indexing for Exchange Agent


Security

Before searches can be performed, security must be configured in the CommCell to grant users and user groups permission to search data. This includes the following tasks for each search tool:

Search Console

Perform the following security configuration tasks for the Search Console as appropriate for your implementation:

CommCell Console

Perform the following security configuration tasks for the CommCell Console as appropriate for your implementation:

Outlook Add-In

Perform the following security configuration tasks for the Outlook Add-In as appropriate for your implementation:


Search Console

After you install the Web Search Server, certain configuration tasks must be performed before you can begin searching for data from the Search Console. (If you have not already installed the software, see Deployment - Content Indexing and Search for more information on how to install the Web Search Server.) This includes the following tasks:

Once the configuration tasks have been completed, and content indexing operations have been performed, you can begin conducting searches on the data. For more information, see Data Discovery and Search.

You can control the disk space utilization and search result display for each user from the User Administration page of the Search Console. To do this, you must be a user with CommCell-wide administrative rights. For more information, see User Administration.

To change the location of the URLs for accessing the Search Console or User Administration page, see Configure the Search Server URLs.

To change the language preferences for the Search Console, see Select Language Preferences for Search Console.

Currently, the Search Console supports the following languages:

User Administration

The User Administration page for the Search Console provides options to manage the disk space utilization and search result display for each user. In order to access the User Administration page, you need to be a CommCell administrator. See Security Configuration for Search Console for information on creating a user with administrative rights.

The User Administration page also displays the Last login time and Last logged in system details for each user. Note that you can configure and view the user details only for users who have performed a search operation using the Search Console.

Whenever a user performs a search, the results are displayed as a list of items. The user then selects specific items from the list and moves them to a review set. Next, the items in the review set are restored to the job results directory for that specific user on the web server. To optimize the disk space in the job results directory and customize the display of the search results, configure the following options for each user in the User Administration page:

Disk Quota

Disk Quota specifies the maximum amount of data that each user can restore to the job results directory. Consider a situation where one user restores a large amount of data to the job results directory. This data occupies most of the disk space and restricts the search restore operations of other users. To optimize disk space usage, set a Disk Quota limit so that the user can restore data only up to the specified maximum. Once the limit is reached, the user cannot restore any more data until the already restored data is pruned. For step-by-step instructions on configuring the disk quota limit, see Configuring Disk Quota.

Based on the Disk Quota limit for each user, the Search Console can prune the items in the job results directory that exceed the Disk Quota value. Similarly, the items that are deleted from the review set can also be pruned. The User Administration page also displays the amount of disk space used by each user.
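
The quota behavior described above can be modeled with a short Python sketch: a restore is refused once it would push the user past the quota, and becomes possible again after restored data is pruned. The class and method names are hypothetical and only illustrate the rule; they do not reflect how the Search Console is implemented.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class UserRestoreArea:
    """Per-user view of the job results directory (illustrative model only)."""
    quota_bytes: int
    items: Dict[str, int] = field(default_factory=dict)  # item name -> size in bytes

    @property
    def used_bytes(self) -> int:
        """Disk space currently occupied by this user's restored items."""
        return sum(self.items.values())

    def restore(self, name: str, size_bytes: int) -> bool:
        """Restore an item only if the total stays within the quota."""
        if self.used_bytes + size_bytes > self.quota_bytes:
            return False  # quota reached: nothing is restored
        self.items[name] = size_bytes
        return True

    def prune(self, name: str) -> None:
        """Remove a restored item and free the space it occupied."""
        self.items.pop(name, None)
```

In this model, a user with a quota of 100 bytes who has restored a 60-byte item cannot restore a further 50-byte item until the first item is pruned.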

Number of Results per Page

You can specify the number of search results displayed per page for each user, using the Display n results per page option in the User Administration page. You can set a maximum of 100 results per page. For step-by-step instructions on configuring the number of results per page, see Configure Number of Results per Page.

Number of Queries per User

You can specify the number of queries that each user can save, using the Allow n queries to be saved option in the User Administration page. Once the specified number is reached, the user cannot save any more queries unless some of the existing queries are removed. For step-by-step instructions on configuring the number of queries per user, see Configure Number of Queries per User.

Number of Results per Page in Review Set

You can specify the number of search results displayed per page in a review set, using the Display n results per page in review set option in the User Administration page. You can set a maximum of 100 results per page. For step-by-step instructions on configuring the number of results per page in a review set, see Configure Number of Results per Page in Review Set.

Number of Review Sets per User

You can specify the maximum number of review sets that a user can create, using the Allow n review sets to be created option in the User Administration page. Once the specified number is reached, the user cannot create any more review sets unless some of the existing review sets are removed. For step-by-step instructions on configuring the number of review sets per user, see Configure Number of Review Sets per User.

Review Set Retention

You can specify the number of days that deleted review set items are retained in the web server cache, using the Specify number of days review sets to be retained option in the User Administration page. Once a review set has been deleted and the specified number of days has passed, only the items belonging to that review set are pruned from the web server cache. For step-by-step instructions on configuring the retention time for review sets, see Configure Review Set Retention.
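
The retention rule above can be expressed as a small Python sketch: a review set becomes eligible for pruning only after it has been deleted and the retention period has elapsed. The function name and the data layout are assumptions made for illustration, and treating a set as eligible exactly at the cutoff is a choice of this sketch.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional


def review_sets_to_prune(deleted_at: Dict[str, Optional[datetime]],
                         retention_days: int,
                         now: datetime) -> List[str]:
    """Return names of review sets deleted at least retention_days ago.

    deleted_at maps a review set name to its deletion time, or to None
    when the review set has not been deleted.
    """
    cutoff = now - timedelta(days=retention_days)
    return sorted(
        name
        for name, deleted in deleted_at.items()
        if deleted is not None and deleted <= cutoff
    )
```

Review sets that were never deleted, or that were deleted more recently than the retention period, are left in the cache.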

The Search Console also allows end users and compliance users to manually delete a review set. For more information, see Search Actions.


Agent-Specific Configuration

The following agent-specific configuration tasks are required for content indexing and search operations.

Domino Mailbox Archiver Agent

If you wish to perform content indexing operations for the Domino Server's journaling mailbox, you must configure the following administrative settings for the Domino Server's journaling mailbox:

  1. The Method must be set to Send from Mail-In Database.
  2. The option to Encrypt Incoming Mail must be set to NO.

Note the following:

In order to perform an end-user search for the Domino Mailbox Archiver agent, you need to configure the following settings:

  1. From the CommCell Console, make sure that the Collect User Identity check box is selected in the Subclient Properties (General) dialog box.
  2. The Domino mailbox user name must be in sync with the Active Directory user name.

Outlook Add-In

In order to take advantage of Search Console capabilities from the Outlook Add-In, perform the following configuration tasks:

  1. On the client where Outlook Add-In is installed, edit the UIOptions registry key to add 128 to the existing value. This will enable the Search Console toolbar button in Outlook with the default capability of performing End-User Searches.
  2. After editing the UIOptions registry key, if you want to change the default capability to be Compliance Searches instead of End-User Searches, then you will need to create the SearchPageURLOption registry key with a value of 1 on the Outlook Add-In client.
  3. Stop and restart the Outlook session for the changes to take effect.
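
The arithmetic in step 1 can be sketched in Python as follows. The sketch assumes that UIOptions is a bit mask in which 128 is a single flag; the source only says to add 128 to the existing value, so this is an interpretation. Under that assumption a bitwise OR gives the same result as adding 128 when the flag is not yet set, and avoids adding it a second time. The function names are hypothetical, and the sketch only computes the new value. It does not read or write the registry.

```python
SEARCH_CONSOLE_FLAG = 128  # value the text says to add to UIOptions


def enable_search_console(current_value: int) -> int:
    """Return the UIOptions value with 128 included (assumes a bit mask)."""
    return current_value | SEARCH_CONSOLE_FLAG


def is_search_console_enabled(value: int) -> bool:
    """Report whether the 128 flag is already part of the value."""
    return bool(value & SEARCH_CONSOLE_FLAG)
```

For an existing value of 3, the new value is 131. Applying the function to 131 again leaves it unchanged.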

The Outlook toolbar buttons to launch the Search Console will appear as follows:
Compliance Search
End-User Search


Upgrade Considerations

General Considerations for Content Indexing

To take advantage of the new features, it is recommended that you upgrade the Web Server when you upgrade the CommServe.

For tagging and delegate search, ensure that you have the latest version of the Content Indexing Engine profile.
