System Requirements - Content Indexing Engine


The following requirements are for the Content Indexing Engine:

Operating System

Windows 2003

Microsoft Windows Server 2003 32-bit Editions with a minimum of Service Pack 1

Windows 2008

Microsoft Windows Server 2008 32-bit Editions

Processor

Dual-Core Intel® Xeon® processor 5100 series minimum required; Quad-Core Intel® Xeon® processor 5300 series recommended

Memory

4 GB RAM minimum required; 8 GB RAM recommended

Virtual memory should be set to twice the amount of available physical memory

Hard Disk

1 GB of local disk space for software.

1 TB of local disk space for index directories.

Peripherals

DVD-ROM drive

Port Requirements

A sequential port range that spans 4000 ports on each computer that will host the Content Indexing Engine. For example, port range 13000 – 17000.

These are local ports used by the Content Indexing Services. These ports are NOT required to be opened in the firewall.

By default, the Content Indexing Engine uses a base port number of 13000 and an administration GUI port number of 16000 (base port plus 3000). Using ports below 1024 is not recommended.
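
The port arithmetic above can be sketched in a few lines of Python. This is only an illustration and not part of the product: the names port_plan and is_port_free are hypothetical, and the last port is taken as base port plus 4000 to match the example range 13000 – 17000.

```python
import socket

PORT_SPAN = 4000          # size of the sequential range described above
ADMIN_GUI_OFFSET = 3000   # administration GUI port = base port + 3000
DEFAULT_BASE_PORT = 13000


def port_plan(base_port=DEFAULT_BASE_PORT):
    """Return the first port, last port and admin GUI port for a base port."""
    if base_port < 1024:
        raise ValueError("base ports below 1024 are not recommended")
    last_port = base_port + PORT_SPAN
    if last_port > 65535:
        raise ValueError("port range would exceed 65535")
    return {
        "first": base_port,
        "last": last_port,
        "admin_gui": base_port + ADMIN_GUI_OFFSET,
    }


def is_port_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can currently be bound to the local port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
        return True


if __name__ == "__main__":
    plan = port_plan()
    print(plan)
    print("admin GUI port free:", is_port_free(plan["admin_gui"]))
```

Running is_port_free over the whole range before installation would show whether another local service already occupies part of it.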

Miscellaneous

TCP/IP Services configured on the computer.

The File System iDataAgent will be automatically installed during installation of this software, if it is not already installed. For System Requirements specific to the File System iDataAgent, refer to System Requirements - Microsoft Windows File System iDataAgent.

Microsoft Visual C++ 2008 Redistributable Package is automatically installed. Note that Visual C++ 2008 Redistributable Package can co-exist with other versions of this software.

.NET Framework 3.5 with Service Pack 1 is automatically installed. Note that .NET Framework 3.5 can co-exist with other versions of this software.

Notes on Content Indexing Engine Installations

The Content Indexing Engine is a resource-intensive application that requires dedicated servers. No other applications or software components should be installed on these servers.

Each host computer where the Content Indexing Engine will be installed should have a single physical network interface card (NIC) with a unique name and a static IP address.

Content Indexing Engines can only be installed on hosts that have a fully qualified domain name.

Before installing the Content Indexing Engine on a Windows 2008 machine, it is recommended that you turn off User Account Control (UAC).

Use the ipconfig /all command to verify the hostname and fully qualified domain name for the installation host. Also, ensure that the hostname and the fully qualified domain name are reachable from the CommCell network and resolve correctly using DNS.

Perform forward and reverse lookups on the host's fully qualified domain name. Execute the command nslookup <host_name.fully_qualified_domain_name.com>, where host_name.fully_qualified_domain_name.com is the fully qualified domain name of the installation host, and note the IP address in the reply. Then perform a reverse lookup on that IP address and make sure that the primary host name returned is the same as the name used in the forward lookup. Repeat for each installation host. In short, for each host, the output of the hostname, nslookup and ipconfig commands should be consistent.
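
The forward and reverse lookup check can also be scripted. The following is a minimal sketch using Python's standard socket module; the function name check_dns_consistency and the example host name are hypothetical, and the resolver functions are passed in as parameters so the logic can be tried without a live DNS server.

```python
import socket


def check_dns_consistency(fqdn,
                          forward=socket.gethostbyname,
                          reverse=socket.gethostbyaddr):
    """Resolve fqdn to an IP, resolve the IP back, and compare the names."""
    ip_address = forward(fqdn)                      # forward lookup
    primary_name, _aliases, _addrs = reverse(ip_address)  # reverse lookup

    def normalize(name):
        # DNS names are case-insensitive; ignore a trailing root dot.
        return name.lower().rstrip(".")

    return {
        "fqdn": fqdn,
        "ip": ip_address,
        "reverse_name": primary_name,
        "consistent": normalize(primary_name) == normalize(fqdn),
    }


if __name__ == "__main__":
    # Hypothetical host name; replace with each installation host in turn.
    print(check_dns_consistency("myhost.example.com"))
```

A result with "consistent" set to False indicates that the reverse record does not point back to the name used for the forward lookup, which is the mismatch the manual procedure is meant to catch.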

When installing on multiple nodes, the date/time on the nodes must be synchronized.

The Content Indexing Engine requires that the clock is kept in sync and not abruptly corrected forwards or backwards. Avoid manual clock adjustments and consider using professional software for keeping clocks synchronized.

Set the server Time Zone to either GMT or UTC and always uncheck the option Automatically adjust clock for daylight saving changes in the Time Zone settings.

It is recommended that anti-virus software is not run on the servers running the Content Indexing Engine. If anti-virus software must be installed as a business requirement, the Content Indexing software and data directories must be excluded from real-time or scheduled scanning.

Disable the Windows Indexing Service. The Content Indexing data directory should be located on a dedicated physical disk separate from the operating system and application installations. The optimal configuration would include a mirrored set (RAID1) for the operating system and application installations and a striped set (RAID 5 or 10) for the Content Indexing data directory.

If possible, install the Content Indexing Engine on a physical disk separate from the one Windows is running from (e.g., C:\). Do not place the paging file or system directories on this disk. The optimal configuration is to install the Content Indexing Engine on a striped disk array (RAID 0).

If a Multi Node installation is performed, the temporary directory of the user doing the installation needs to be on a partition with the administrative share enabled, and this user needs to be able to access that administrative share. The user's temporary directory is usually on the C:\ partition. To test if the administrative share is enabled and accessible for partition C:\ on the host myhost.example.com, execute the command dir \\myhost.example.com\c$ from a different host. This should produce the same output as the command dir c: run locally on myhost.example.com.
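
The same administrative share test can be sketched in Python, run from a different Windows host than the one being checked. The helper names admin_share_path and admin_share_accessible are hypothetical; the code only builds the UNC path for the share and tries to list it.

```python
import os


def admin_share_path(host, drive_letter="c"):
    """Build the UNC path of a drive's administrative share, e.g. \\\\host\\c$."""
    letter = drive_letter.rstrip(":\\").lower()
    return "\\\\{}\\{}$".format(host, letter)


def admin_share_accessible(host, drive_letter="c"):
    """Return True if the administrative share can be listed by the current user."""
    try:
        os.listdir(admin_share_path(host, drive_letter))
    except OSError:
        # Covers a disabled share, missing permissions or an unreachable host.
        return False
    return True


if __name__ == "__main__":
    host = "myhost.example.com"
    print(admin_share_path(host), "accessible:", admin_share_accessible(host))
```

As with the dir command, a successful listing only shows that the current user can reach the share; it should be run as the user who will perform the installation.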

When installing behind a firewall, make sure that all the nodes are on the same side of the firewall.

In a multi-node setup, the admin node and the index nodes must reside on the same Gigabit Ethernet (GigE) LAN subnet.

Outlook 2003 or later should be installed on the Web Search node.

Installation Types

The Content Indexing Engine provides two installation types:

  • Single Node

    Installs all required components on a single host. This installation type is primarily intended for evaluation or demonstration purposes and smaller environments. The installation requires very little user interaction, but the user should be familiar with most of the concepts explained for the Multi Node installation type before running the installer. See Install the Content Indexing Engine - Single Node Installation for step-by-step instructions.

  • Multi Node

    Provides a fully interactive GUI for selecting and deploying components across all target hosts. The installer builds an InstallProfile.xml file which contains the full system description. This file must be saved and used during the installation of all the nodes in the cluster. See Install the Content Indexing Engine - Multi-Node Installation for step-by-step instructions.

Contact Professional Services for assistance in planning and deploying the Content Indexing Engine and Search in your environment.

Notes on Space Requirements

It is a best practice to allocate 1 TB of space per node for the Content Indexing Engine. This allocation accounts for the space required to sustain the metadata within the index and for the transient need for additional space as the index partitions are built and redistributed. During the redistribution of indexed content among partitions, the staging space consumed will be 100% of the index footprint at the time of the partition reorder. On completion of the partition reordering process, the staging space is released.

This is very similar to the optimization process that any indexer (e.g., AltaVista, a deduplication appliance, etc.) uses to reorder content for index performance optimization.

For email data, estimate the index size at 33% of the total mail volume to be indexed.

For file system data, estimate the index size at between 3% and 10% of the data to be indexed, depending on the data type.

Examples:

  • A small text-only document is composed entirely of text, so its index footprint would be 100%.
  • A 3 MB Microsoft PowerPoint presentation will normally index to approximately 10-20 KB of text.
  • A document including many images may index to a handful of pages of actual text and have a 1-3% index footprint.
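
As a rough planning aid, the percentages above can be turned into a small calculation. This is a sketch under stated assumptions, not a sizing tool: the function names are hypothetical, the ratios are the 33% and 3-10% estimates from this section, and peak_space_gb assumes that staging during partition redistribution temporarily needs as much space again as the index itself, which is one reading of the "100% of the index footprint" statement.

```python
EMAIL_RATIO = 0.33       # index size as a fraction of total mail volume
FILE_RATIO_LOW = 0.03    # low end for file system data
FILE_RATIO_HIGH = 0.10   # high end for file system data


def estimate_index_gb(email_gb, file_gb):
    """Return (low, high) index size estimates in GB for the given data volumes."""
    email_part = email_gb * EMAIL_RATIO
    low = email_part + file_gb * FILE_RATIO_LOW
    high = email_part + file_gb * FILE_RATIO_HIGH
    return low, high


def peak_space_gb(index_gb):
    """Index footprint plus an equal amount of transient staging space.

    Assumption: staging temporarily equals 100% of the index footprint.
    """
    return index_gb * 2


if __name__ == "__main__":
    low, high = estimate_index_gb(email_gb=1000, file_gb=2000)
    print("index estimate (GB): %.0f to %.0f" % (low, high))
    print("peak space at high estimate (GB): %.0f" % peak_space_gb(high))
```

Comparing the peak figure with the 1 TB per node guideline gives a first indication of how many nodes a given data volume may need.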

There are a couple of ways to reduce the size of the Content Indexes.

  1. Filters can be defined that either include or exclude specific document types. Images and multimedia files are of no value in a Content Index as they have no body to index.
  2. Set proper retention settings for the Content Indexes. For example, if the vast majority of searches will be performed on data within 90 days of it entering the system, then you can set the retention settings such that the Content Indexes expire after 90 days. Data that was retained can still be content indexed again if there is a special need to search it beyond the retention date.

Hard Disk Recommendations

Keep in mind the following hard disk recommendations:

  • A minimum of three disks in a RAID configuration is recommended.
  • Serial Attached SCSI (SAS) 15K RPM hard disks are recommended for optimum performance.
  • SATA (Serial Advanced Technology Attachment) disks are recommended for Content Indexing in environments with minimum search and index concurrency.

The following table provides a summary of available disk options:

| Criteria | Description | SATA | SAS 10k | SAS 15k |
|---|---|---|---|---|
| Performance | The primary metric to predict Content Indexing performance will be the sustained I/O Operations Per Second (IOPS). Optimum IOPS is obtained by using 4 or more spindles in a RAID configuration. | Good | Better | Best |
| Reliability | The MTBF (Mean Time Between Failure) for SAS disks is usually between 1 and 1.5 million hours. For SATA disks, the figure is between 0.6 and 1 million hours. | Good | Best | Better |
| Cost | Enterprise SATA disks offer a lower price per GB than equivalent SAS disks. The total cost of ownership should consider not only the initial price, but also the expected lifespan of the disk. | Best | Better | Good |

DISCLAIMER

Minor revisions and/or service packs that are released by application and operating system vendors may, in some cases, affect the working of our software. Although we may list such revisions and/or service packs as “supported” in our System Requirements, changes to the behavior of our software resulting from an application or operating system revision/service pack may be beyond our control. However, we will make every effort to correct such disruption as quickly as possible. When in doubt, please contact your software provider to ensure support for a specific application or operating system.

Additional considerations regarding minimum requirements and End of Life policies from application and operating system vendors are also applicable.