Content Indexing and Search - Building Block Guide

Table of Contents

Overview

Content Indexing Engine Components

Web Search Components

Terminology

System Requirements

Sizing Considerations

Single Server Setup

Hardware Requirements for a Single Server Setup

Multi-Node Setup With 2-3 Servers

Hardware Requirements on the Admin and Index Node with Web components

Hardware Requirements on the Dedicated Index Node

Multi-Node Setup With 2-9 Servers

Hardware Requirements on the Admin Node with Web Search Server and Web Search Client

Hardware Requirements on the Dedicated Index Node

Multi-Node Setup With 2-10 Servers

Hardware Requirements on the Dedicated Admin Node

Hardware Requirements on the Dedicated Index Node

Hardware Requirements on the Dedicated Web Search Server and Web Search Client Node

Backfilling the Index

Media Agent Relationship to Content Index Performance

Overview

Content Indexing and Search provides the ability to content index and search files and emails for data discovery and other purposes. This product includes the following main components:

Content Indexing Engine Components

The Content Indexing Engine can be installed on a single node or on multiple nodes. A multi-node setup includes the following:

Admin Node – This is the entry point for the data to be indexed (one per CI cloud). It receives data from the MediaAgent or Content Indexing iDataAgent and distributes it to the Index nodes for processing.

Index Node – Performs all data indexing, query processing, and results processing. A cloud can include up to 8 index nodes.

Web Search Components

Web Search Server (WSS) – Uses SQL Server 2008 and IIS 7 to provide a web service that processes requests from one or more Web Search Clients.

Web Search Client (WSC) – Provides the web interface for end users and compliance users. The Web Search Client passes user requests back to the Web Search Server for processing. Users can log in to the Web Search Client using Single Sign On.

Terminology

Term                 Description
Single Server Setup  A setup where the Content Indexing Engine, Web Search Server, and Web Search Client are installed on the same server.
MediaAgent           Agent that enables indexing data from backup targets.
Node                 The client computer on which the Content Indexing component is installed.

System Requirements

Refer to the following system requirements to identify the operating system, network, and application requirements for your setup:

Sizing Considerations

The number of index nodes required in a Content Indexing cloud depends on the type of data being content indexed. The content types can be categorized as follows:

Content Type  Description
Standard      A combination of emails and files, such as Microsoft Office documents.
Multimedia    Large files with little or no text, such as .mp3 and .cad files.
Heavy Text    Text-heavy files, such as log files and Bloomberg dumps.

These content types have a significant impact on Content Indexing performance and scalability. Contact the Products team for assistance in planning the Content Indexing and Search implementation in such environments.

For standard and multimedia content types, each Content Indexing node can index up to 80 million objects, at a rate of up to 75,000 documents per hour. In such environments, roughly the first 3 nodes contribute to indexing throughput; each node added beyond that provides additional capacity only. Note that a Content Indexing cloud can include one admin node and a maximum of 8 index nodes.
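The sizing arithmetic above can be sketched as follows. This is a minimal illustration, not a sizing tool: the function names and the example object count are hypothetical, the per-node figures are the upper limits quoted above for standard and multimedia content, and the throughput cap at 3 nodes is an assumption based on the statement that further nodes add capacity only.

```python
import math

# Upper limits quoted in this guide for standard and multimedia content.
MAX_OBJECTS_PER_NODE = 80_000_000      # objects one index node can hold
MAX_DOCS_PER_HOUR_PER_NODE = 75_000    # indexing rate of one index node
MAX_INDEX_NODES = 8                    # index nodes allowed in one cloud
THROUGHPUT_NODE_LIMIT = 3              # assumption: nodes beyond this add capacity only


def index_nodes_for_capacity(total_objects: int) -> int:
    """Return the minimum number of index nodes needed to hold total_objects."""
    if total_objects <= 0:
        raise ValueError("total_objects must be positive")
    nodes = math.ceil(total_objects / MAX_OBJECTS_PER_NODE)
    if nodes > MAX_INDEX_NODES:
        raise ValueError("exceeds the capacity of a single Content Indexing cloud")
    return nodes


def best_case_indexing_hours(total_objects: int, nodes: int) -> float:
    """Best-case hours to index total_objects if every contributing node runs at peak rate."""
    effective_nodes = min(nodes, THROUGHPUT_NODE_LIMIT)
    return total_objects / (effective_nodes * MAX_DOCS_PER_HOUR_PER_NODE)


if __name__ == "__main__":
    objects = 200_000_000  # hypothetical environment
    nodes = index_nodes_for_capacity(objects)
    hours = best_case_indexing_hours(objects, nodes)
    print(f"{nodes} index nodes, about {hours:.0f} hours at peak rate")
```

Because both per-node figures are stated as upper limits, the results are a lower bound on node count and a best case for indexing time.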

You can increase the indexing speed by:

Single Server Setup

In the single server setup, the Content Indexing Engine (admin and index node), Web Search Server, and Web Search Client are installed on the same server. This is useful in small and ad-hoc environments where the number of objects to be indexed is approximately 50 million or fewer and the search load is light.

Hardware Requirements for a Single Server Setup

2U Server

4 Processor Quad Core

24GB RAM

8 SAS Disk Drives

  • 2 x 300GB 10K (RAID 1) or higher for OS and applications. Additional space may be required for the web search recovery cache (Job Results); this can be low-cost disk, sized according to the amount of web recovery expected.
  • 6 x 300GB 10K or higher (RAID 5) for index/cache

SATA is not supported.

iSCSI based storage for the Index Node(s) is not supported.

Fibre Channel or SAS-attached volumes may be used for the index volume if performance requirements are met and the volumes are not shared with other applications or index nodes. Only dedicated SAS or higher speed drives will be supported.
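As a rough aid when checking whether a drive layout leaves enough room for the index, the usable capacity of the two RAID levels named above can be estimated as follows. This is a sketch under standard RAID arithmetic only (a two-drive mirror yields one drive of space; RAID 5 gives up one drive to parity); the function name is hypothetical, and file system overhead and hot spares are ignored.

```python
def usable_capacity_gb(drive_count: int, drive_size_gb: int, raid_level: int) -> int:
    """Return the raw usable capacity in GB for the RAID levels named in this guide."""
    if raid_level == 1:
        # A two-drive mirror stores one drive's worth of data.
        if drive_count != 2:
            raise ValueError("this sketch only models two-drive RAID 1 mirrors")
        return drive_size_gb
    if raid_level == 5:
        # RAID 5 spends one drive's worth of space on parity.
        if drive_count < 3:
            raise ValueError("RAID 5 needs at least 3 drives")
        return (drive_count - 1) * drive_size_gb
    raise ValueError("only RAID 1 and RAID 5 are modeled")


if __name__ == "__main__":
    print("OS volume:", usable_capacity_gb(2, 300, 1), "GB")
    print("Index/cache volume:", usable_capacity_gb(6, 300, 5), "GB")
```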

Multi-Node Setup With 2-3 Servers

In a multi-node setup with 2-3 servers, the Content Indexing Engine (admin and index node), Web Search Server, and Web Search Client are installed on the same server. Up to 2 additional index nodes are installed on separate servers.

This setup supports a capacity of up to 225 million objects and a light search load.

Hardware Requirements on the Admin and Index Node with Web components

2U Server

4 Processor Quad Core

24GB RAM

8 SAS Disk Drives

  • 2 x 300GB 10K (RAID 1) or higher for OS and applications. Additional space may be required for the web search recovery cache (Job Results); this can be low-cost disk, sized according to the amount of web recovery expected.
  • 6 x 300GB 10K or higher (RAID 5) for index/cache

SATA is not supported.

iSCSI based storage for the Index Node(s) is not supported.

Fibre Channel or SAS-attached volumes may be used for the index volume if performance requirements are met and the volumes are not shared with other applications or index nodes. Only dedicated SAS or higher speed drives will be supported.

Hardware Requirements on the Dedicated Index Node

4U Server

4 Processor Quad Core

16GB RAM

8 SAS Disk Drives

  • 2 x 300GB 10K (RAID 1) or higher for OS and applications
  • 6 x 450GB 10K or higher (RAID 5) for index/cache

SATA is not supported.

iSCSI based storage for the Index Node(s) is not supported.

Fibre Channel or SAS-attached volumes may be used for the index volume if performance requirements are met and the volumes are not shared with other applications or index nodes. Only dedicated SAS or higher speed drives will be supported.

 

Multi-Node Setup With 2-9 Servers

In a multi-node setup with 2-9 servers, the Content Indexing Engine (admin node), Web Search Server, and Web Search Client are installed on a single server. A maximum of 8 index nodes are installed on separate servers.

This setup supports a capacity of up to approximately 625 million objects and a light search load.

Hardware Requirements on the Admin Node with Web Search Server and Web Search Client

1U Server

2 Processor Quad Core

16GB RAM

2 SAS Disk Drives

  • 2 x 146GB 10K SAS (RAID 1) or higher for OS and applications

2 SATA Disk Drives

  • 2 x 1TB SATA 7K (RAID 1) or higher for web cache (web recoveries). Size the disk space based on web recovery requirements; this can be low-cost disk.

Hardware Requirements on the Dedicated Index Node

4U Server

4 Processor Quad Core

16GB RAM

8 SAS Disk Drives

  • 2 x 300GB 10K (RAID 1) or higher for OS and applications
  • 6 x 450GB 10K or higher (RAID 5) for index/cache

SATA is not supported.

iSCSI based storage for the Index Node(s) is not supported.

Fibre Channel or SAS-attached volumes may be used for the index volume if performance requirements are met and the volumes are not shared with other applications or index nodes. Only dedicated SAS or higher speed drives will be supported.

 

Multi-Node Setup With 2-10 Servers

In environments that require a capacity of approximately 625 million objects and also a heavy search load, you can install all the components on dedicated servers. In this setup, the admin node is a separate server, and each index node is installed on its own server. The Web Search Server and the Web Search Client are installed together on a single server.

This setup provides the best performance and is preferred whenever possible.
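The choice between the four setups described in this guide can be summarized in a small decision function. This is only a sketch of the guidance as written: the function name is hypothetical, the thresholds are the approximate capacities quoted for each setup, and routing every heavy search load to the dedicated setup is an assumption drawn from the fact that the other setups are described as handling a light search load.

```python
def recommend_setup(total_objects: int, heavy_search: bool) -> str:
    """Map object count and search load to one of the setups described in this guide."""
    if total_objects <= 0:
        raise ValueError("total_objects must be positive")
    if total_objects > 625_000_000:
        raise ValueError("beyond the approximate capacity of one Content Indexing cloud")
    if heavy_search:
        # Assumption: only the fully dedicated setup is sized for a heavy search load.
        return "Multi-node, 2-10 servers (all components on dedicated servers)"
    if total_objects <= 50_000_000:
        return "Single server"
    if total_objects <= 225_000_000:
        return "Multi-node, 2-3 servers"
    return "Multi-node, 2-9 servers"


if __name__ == "__main__":
    # Hypothetical environments.
    print(recommend_setup(40_000_000, heavy_search=False))
    print(recommend_setup(180_000_000, heavy_search=False))
    print(recommend_setup(400_000_000, heavy_search=True))
```

Since the dedicated setup is preferred whenever possible, treat the lighter recommendations as the minimum viable configuration rather than the target.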

Hardware Requirements on the Dedicated Admin Node

1U Server

2 Processor Quad Core

16GB RAM

2 SAS Disk Drives

  • 2 x 146GB 10K SAS (RAID 1) or higher for OS and applications

SATA is not supported.

Hardware Requirements on the Dedicated Index Node

4U Server

4 Processor Quad Core

16GB RAM

8 SAS Disk Drives

  • 2 x 300GB 10K (RAID 1) or higher for OS and applications
  • 6 x 450GB 10K or higher (RAID 5) for index/cache

SATA is not supported.

iSCSI based storage for the Index Node(s) is not supported.

Fibre Channel or SAS-attached volumes may be used for the index volume if performance requirements are met and the volumes are not shared with other applications or index nodes. Only dedicated SAS or higher speed drives will be supported.

Hardware Requirements on the Dedicated Web Search Server and Web Search Client Node

1U Server

2 Processor Quad Core

16GB RAM

2 SAS Disk Drives

  • 2 x 146GB 10K SAS (RAID 1) or higher for OS and applications

2 SATA Disk Drives

  • 2 x 1TB SATA 7K (RAID 1) or higher for web cache (web recoveries). Size the disk space based on web recovery requirements; this can be low-cost disk.

Backfilling the Index

By default, the indexing phase runs in parallel with the content indexing job. Because this slows down indexing, you can suspend the indexing phase while the content indexing job runs, so that the data is backfilled, and then index the backfilled data once the job is complete. Backfilling is the fastest way to get all of the pre-existing data in a storage policy indexed and searchable.

Once the backfill is complete, a reset index is needed to create an index from the raw data (FIXML) that was backfilled. After the reset index completes, subsequent Content Indexing jobs continue processing new data.

When backfilling, note that the search will be available only after the indexing is completed and the indexers are reset.

Use the following steps to backfill the index.

1. On the admin node of the Content Indexing Engine, open the command prompt and navigate to the <CIEngine_Install_Directory>\bin folder. For example:
   E:\<CIEngine_Install_Directory>
   cd bin
   To find the CIEngine_Install_Directory, type %fastsearch% at the Run prompt.
2. Type the following command to suspend the indexing:
   indexeradmin -a suspendindexing
3. Type the following command to reset the index:
   indexeradmin -a resetindex
   or
   indexeradmin resetindex -a
   where [-a] is used for a multi-node environment.
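If the steps above need to be scripted, the two commands can be assembled as shown in the following sketch. It uses only the indexeradmin arguments that appear in the steps; the function names and the example install path are hypothetical, and dropping -a on a single node is an assumption based on the note in step 3. Commands are printed rather than executed unless dry_run is set to False.

```python
import ntpath
import subprocess
from typing import List


def backfill_commands(install_dir: str, multi_node: bool = True) -> List[List[str]]:
    """Build the suspend and reset commands from the steps above, in that order."""
    # ntpath builds a Windows-style path regardless of where this script runs.
    exe = ntpath.join(install_dir, "bin", "indexeradmin")
    suspend = [exe, "-a", "suspendindexing"]
    # Assumption: -a is omitted on a single node, as the note on step 3 suggests.
    reset = [exe, "-a", "resetindex"] if multi_node else [exe, "resetindex"]
    return [suspend, reset]


def run_step(command: List[str], dry_run: bool = True) -> None:
    """Print the command; execute it only when dry_run is False."""
    print(" ".join(command))
    if not dry_run:
        subprocess.run(command, check=True)


if __name__ == "__main__":
    # Hypothetical install path; on the admin node, %fastsearch% gives the real one.
    suspend, reset = backfill_commands(r"E:\CIEngine")
    run_step(suspend)  # before the content indexing job starts
    # ... run the content indexing job and wait for it to complete ...
    run_step(reset)    # after the job completes
```

The two commands are deliberately kept as separate steps, because the reset must not run until the content indexing job has finished.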

Media Agent Relationship to Content Index Performance

MediaAgents are a key part of indexing performance. Before building a Content Indexing cloud, review the MediaAgent configuration.