Content Indexing and Search provides the ability to content index and search
files and emails for data discovery and other purposes. This product includes
the following main components:
The Content Indexing Engine can be installed on a single node or on multiple
nodes. A multi-node setup includes the following:
Admin Node – The entry point for the data to be indexed (one per CI
cloud). It receives data from the MediaAgent or Content Indexing
iDataAgent and distributes it to the Index nodes for processing.
Index Node – Performs all data indexing, query, and results processing. A
cloud can have up to 8 index nodes.
Web Search Server (WSS) – SQL 2008 and IIS 7 provide a web service for processing requests
from one or more Web Search Clients.
Web Search Client (WSC) – Provides the web interface for end users and compliance
users. The Web Search Client passes user requests back to the Web Search Server for
processing. Users can log in to the Web Search Client using Single Sign On.
The number of index nodes required in a Content Indexing cloud depends on the
type of data being content indexed. The content types can be categorized as
follows:
Content Type – Description
Standard – A combination of emails and files, such as Microsoft Office
documents.
Multimedia – Large files with little or no text, such as .mp3 and .cad
files.
Heavy Text – Text-heavy files, such as log files and Bloomberg dumps.
These content types have a significant impact on Content Indexing
performance and scalability. Contact the Products team for assistance to
plan the Content Indexing and Search implementation in such
environments.
For standard and multimedia content types, each Content Indexing node can
index up to 80 million objects, at a rate of up to 75,000 documents per hour.
Hence, in such environments, a Content Indexing cloud typically includes
approximately 3 nodes; each node beyond that is added for additional capacity
only. Note that a Content Indexing cloud can include one admin node and a
maximum of 8 index nodes.
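The per-node limits above can be turned into a rough sizing estimate. The following is a minimal sketch, not a vendor sizing tool: the function names are hypothetical, the constants are the figures quoted in this section, and it assumes standard or multimedia content with objects spread evenly across nodes.

```python
# Figures taken from this section (standard and multimedia content only).
MAX_OBJECTS_PER_NODE = 80_000_000  # upper bound of objects per index node
MAX_DOCS_PER_HOUR = 75_000         # peak indexing rate per node
MAX_INDEX_NODES = 8                # limit per Content Indexing cloud


def index_nodes_needed(total_objects: int) -> int:
    """Return the minimum number of index nodes for total_objects.

    Hypothetical helper: assumes objects are distributed evenly.
    """
    if total_objects < 0:
        raise ValueError("total_objects must be non-negative")
    # Integer ceiling division; at least one node is always required.
    nodes = max(1, (total_objects + MAX_OBJECTS_PER_NODE - 1) // MAX_OBJECTS_PER_NODE)
    if nodes > MAX_INDEX_NODES:
        raise ValueError(
            f"{total_objects} objects would need {nodes} index nodes; "
            f"a cloud supports at most {MAX_INDEX_NODES}"
        )
    return nodes


def hours_to_fill_node(objects_on_node: int) -> float:
    """Best-case hours for one node to index objects_on_node at the peak rate."""
    return objects_on_node / MAX_DOCS_PER_HOUR


if __name__ == "__main__":
    print(index_nodes_needed(225_000_000))            # 3
    print(round(hours_to_fill_node(80_000_000), 1))   # 1066.7
```

Because 80 million objects and 75,000 documents per hour are both stated as upper bounds, treat the result as a lower bound on node count and indexing time, not a guarantee.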
In the single server setup, the Content Indexing Engine (admin and index
node), Web Search Server, and Web Search Client are installed on the same
server. This is useful in small and ad-hoc environments where the number of
objects to be indexed is approximately 50 million or fewer and the search load
is light.
2 x 300GB 10K (RAID 1) or higher for OS and applications.
Additional space may be required for the web search recovery cache (Job
Results); this can be low-cost disk, sized according to the amount of
web recovery expected.
6 x 300GB 10K or higher (RAID 5) for index/cache.
SATA is not supported.
iSCSI-based storage for the Index Node(s) is not supported.
Fibre Channel or SAS-attached volumes may be used for the index volume
if performance requirements are met and the volumes are not shared with
other applications or index nodes. Only dedicated SAS or higher-speed
drives are supported.
In a multi-node setup with 2-3 servers, you can install the Content
Indexing Engine (admin and index node), Web Search Server, and Web
Search Client on the same server. Up to 2 additional index nodes are
installed on separate servers.
This setup supports up to 225 million objects and a light search load.
2 x 300GB 10K (RAID 1) or higher for OS and applications.
Additional space may be required for the web search recovery cache (Job
Results); this can be low-cost disk, sized according to the amount of
web recovery expected.
6 x 300GB 10K or higher (RAID 5) for index/cache.
SATA is not supported.
iSCSI-based storage for the Index Node(s) is not supported.
Fibre Channel or SAS-attached volumes may be used for the index
volume if performance requirements are met and the volumes are not
shared with other applications or index nodes. Only dedicated SAS or
higher-speed drives are supported.
2 x 300GB 10K (RAID 1) or higher for OS and applications.
6 x 450GB 10K or higher (RAID 5) for index/cache.
SATA is not supported.
iSCSI-based storage for the Index Node(s) is not supported.
Fibre Channel or SAS-attached volumes may be used for the index
volume if performance requirements are met and the volumes are not
shared with other applications or index nodes. Only dedicated SAS or
higher-speed drives are supported.
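To compare the disk layouts listed for each setup, it can help to work out their usable capacity. The sketch below applies the standard RAID arithmetic (a two-disk RAID 1 mirror yields one disk of capacity; RAID 5 gives up one disk to parity). The function name is hypothetical, and the figures ignore file system and controller overhead.

```python
def raid_usable_gb(level: int, disks: int, disk_gb: int) -> int:
    """Return usable capacity in GB for the RAID layouts named in this section.

    Hypothetical helper covering only a two-disk RAID 1 mirror and RAID 5.
    """
    if disk_gb <= 0:
        raise ValueError("disk_gb must be positive")
    if level == 1:
        if disks != 2:
            raise ValueError("this sketch models only a two-disk RAID 1 mirror")
        return disk_gb                 # both disks hold the same data
    if level == 5:
        if disks < 3:
            raise ValueError("RAID 5 needs at least 3 disks")
        return (disks - 1) * disk_gb   # one disk's worth of space holds parity
    raise ValueError(f"unsupported RAID level: {level}")


if __name__ == "__main__":
    print(raid_usable_gb(1, 2, 300))  # 300  (OS and applications)
    print(raid_usable_gb(5, 6, 300))  # 1500 (index/cache, 300GB disks)
    print(raid_usable_gb(5, 6, 450))  # 2250 (index/cache, 450GB disks)
```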
In a multi-node setup with 2-9 servers, the Content Indexing Engine (admin
node), Web Search Server, and Web Search Client are installed on a single
server. A maximum of 8 index nodes are installed on separate servers.
This setup supports up to approximately 625 million objects and a light
search load.
2 x 300GB 10K (RAID 1) or higher for OS and applications.
6 x 450GB 10K or higher (RAID 5) for index/cache.
SATA is not supported.
iSCSI-based storage for the Index Node(s) is not supported.
Fibre Channel or SAS-attached volumes may be used for the index
volume if performance requirements are met and the volumes are not
shared with other applications or index nodes. Only dedicated SAS or
higher-speed drives are supported.
In environments that require a capacity of approximately 625 million objects
and also a heavy search load, you can install all the components on dedicated
servers. In this setup, the admin node is a separate server, and each index
node is installed on its own server. The Web Search Server and the Web Search
Client are installed together on a single server.
This setup provides the best performance and is preferred whenever possible.
2 x 300GB 10K (RAID 1) or higher for OS and applications.
6 x 450GB 10K or higher (RAID 5) for index/cache.
SATA is not supported.
iSCSI-based storage for the Index Node(s) is not supported.
Fibre Channel or SAS-attached volumes may be used for the index
volume if performance requirements are met and the volumes are not
shared with other applications or index nodes. Only dedicated SAS or
higher-speed drives are supported.
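The setups described above can be summarized as a simple decision rule. The following sketch is illustrative only: the function name and return strings are hypothetical, the thresholds are the approximate object counts quoted in this section, and it assumes that any heavy search load calls for dedicated servers, which the text states explicitly only for the 625 million object case.

```python
# Approximate capacity limits quoted in this section.
SINGLE_SERVER_MAX = 50_000_000
SMALL_MULTI_NODE_MAX = 225_000_000
CLOUD_MAX = 625_000_000


def recommend_setup(total_objects: int, heavy_search: bool = False) -> str:
    """Map an object count and search load to one of the described setups."""
    if total_objects < 0:
        raise ValueError("total_objects must be non-negative")
    if total_objects > CLOUD_MAX:
        raise ValueError("object count exceeds the capacity of a single CI cloud")
    if heavy_search:
        # Assumption: heavy search load always warrants dedicated servers.
        return "dedicated servers"
    if total_objects <= SINGLE_SERVER_MAX:
        return "single server"
    if total_objects <= SMALL_MULTI_NODE_MAX:
        return "multi-node, 2-3 servers"
    return "multi-node, 2-9 servers"


if __name__ == "__main__":
    print(recommend_setup(40_000_000))                       # single server
    print(recommend_setup(200_000_000))                      # multi-node, 2-3 servers
    print(recommend_setup(600_000_000))                      # multi-node, 2-9 servers
    print(recommend_setup(600_000_000, heavy_search=True))   # dedicated servers
```

The limits are approximate, and heavy text content changes them significantly, so the result is a starting point for planning and not a final design.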
During a content indexing operation, the indexing phase runs in parallel with
the content indexing job by default. Because this slows down indexing, you can
choose to suspend the indexing phase and backfill the data while the content
indexing job runs. Once the job is complete, you can resume indexing of the
backfilled data. Backfilling is the fastest way to get all of the pre-existing
data in a storage policy indexed and searchable. Once the backfill is complete,
a reset index is needed so that an index of the backfilled raw data (FIXML) can
be created. Once the reset index is complete, subsequent Content Indexing jobs
continue processing new data.
When backfilling, note that search is available only after the indexing is
completed and the indexers are reset.
Use the following steps to backfill the index.
1. From the admin node of the Content Indexing Engine, open the command
prompt and navigate to the <CI_Engine_Install_directory>\bin folder.
For example:
cd /d E:\<CIEngine_Install_Directory>\bin
To find the CIEngine_Install_Directory, type %fastsearch% at the Run prompt.
2. Type the command to suspend indexing.
indexeradmin -a suspendindexing
3. Type the command to reset the index.
indexeradmin -a resetindex
or
indexeradmin resetindex -a
where -a is used for a multi-node environment.
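The backfill steps above can be captured in a small helper that builds the two command lines without running them. This is a sketch under stated assumptions: the function name and dictionary keys are hypothetical, and omitting -a for a single-node setup is inferred from the note that -a is used for multi-node environments. Only the indexeradmin subcommands shown in the steps are used.

```python
from typing import Dict, List


def backfill_commands(multi_node: bool) -> Dict[str, List[str]]:
    """Build the indexeradmin argument lists for a backfill.

    Hypothetical helper: it returns the commands only and never executes them.
    """
    # Assumption: -a is dropped for a single-node setup.
    flag = ["-a"] if multi_node else []
    return {
        # Run before starting the content indexing job.
        "before_job": ["indexeradmin", *flag, "suspendindexing"],
        # Run after the backfill is complete.
        "after_job": ["indexeradmin", *flag, "resetindex"],
    }


if __name__ == "__main__":
    for phase, argv in backfill_commands(multi_node=True).items():
        print(f"{phase}: {' '.join(argv)}")
    # before_job: indexeradmin -a suspendindexing
    # after_job: indexeradmin -a resetindex
```

Keeping the commands as argument lists makes it straightforward to review them first and then run them from the bin folder on the admin node.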