This configuration element, which can be found in the instance-specific
config/indexing.xml
file, determines the details of content
indexing by the Content Management Server and the Template Engine.
advancedSearch
: Configures the indexing when the
advanced search is used in the Content Management Server. The element
has the same subentries as incrementalExport
.
contentPreprocessors
: This element defines the
preprocessors that are called before versions are indexed. If no
preprocessors are to be called,
<contentPreprocessors />
must be specified.
Example for an internal and an external preprocessor definition:
<contentPreprocessors type="list">
  <preprocessor>
    <processor type="internal"/>
    <mimeTypes type="list">
      <mimeType>application/vnd.ms-excel</mimeType>
      <mimeType>application/vnd.ms-powerpoint</mimeType>
      <mimeType>application/msword</mimeType>
    </mimeTypes>
  </preprocessor>
  <preprocessor>
    <processor type="external">bin/tclsh</processor>
    <processorArguments type="list">
      <argument>pdfToTextWrapper.tcl</argument>
    </processorArguments>
    <mimeTypes type="list">
      <mimeType>application/pdf</mimeType>
    </mimeTypes>
  </preprocessor>
  <preprocessor>
    <!-- Another preprocessor for other MIME types -->
  </preprocessor>
</contentPreprocessors>
Each preprocessor is responsible for at least one MIME
type. As with all lists, the contentPreprocessors
element has an obligatory attribute, type="list"
.
This element consists of subelements, each of which defines a
preprocessor. Each preprocessor
subelement has the
following subelements:
mimeTypes
defines the MIME
types of the versions to be processed by this preprocessor.
Attributes: type
with the value
list
(obligatory).
Content: For each MIME type a
mimeType
element, whose value is the respective
MIME type (for example text/html
).
processor
defines the
preprocessor for versions with one of the specified MIME
types.
Attributes: type
with one of
the following values: internal
,
external
, ignore
,
ignoreBlob
. Default: external
.
Content, if type
has the value
internal
: The blob is filtered by the Verity
filter application before it is indexed.
Content, if type
has the value
ignore
: the version is not indexed; the content
of the element is ignored.
Content, if type
has the value
ignoreBlob
: empty. All fields except the main
content are indexed. The main content is not converted
(normally, all field values are converted to plain text
before a version is indexed).
Content, if type
has the
value external
: The data to be indexed is
passed to the program specified. Further arguments can be
passed to it by means of the
processorArguments
element. For further details on the external preprocessor facility,
please refer to the Search Server documentation.
processorArguments
is optional.
This element defines the arguments to be passed to the program
defined as processor
.
Attributes: type
with the value
list
(obligatory).
Content: Each command-line argument is
specified as the content of an argument
subelement.
Note: In versions up to 6.7.0, the command-line arguments must
be provided directly as the value of the
processorArguments
element (e.g.
<processorArguments>pdfToTextWrapper.tcl</processorArguments>
).
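The remaining processor types can be illustrated with a short sketch. It assumes, for illustration only, that ZIP archives should not be indexed at all and that PNG images should be indexed without their main content; the MIME type assignments and the second argument of the external preprocessor are hypothetical and not taken from a shipped configuration.

```xml
<contentPreprocessors type="list">
  <preprocessor>
    <!-- ignore: versions with these MIME types are not indexed -->
    <processor type="ignore"/>
    <mimeTypes type="list">
      <mimeType>application/zip</mimeType>
    </mimeTypes>
  </preprocessor>
  <preprocessor>
    <!-- ignoreBlob: all fields except the main content are indexed -->
    <processor type="ignoreBlob"/>
    <mimeTypes type="list">
      <mimeType>image/png</mimeType>
    </mimeTypes>
  </preprocessor>
  <preprocessor>
    <!-- external: one argument element per command-line argument -->
    <processor type="external">bin/tclsh</processor>
    <processorArguments type="list">
      <argument>pdfToTextWrapper.tcl</argument>
      <!-- hypothetical second argument, shown only to illustrate the list form -->
      <argument>exampleArgument</argument>
    </processorArguments>
    <mimeTypes type="list">
      <mimeType>application/pdf</mimeType>
    </mimeTypes>
  </preprocessor>
</contentPreprocessors>
```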
incrementalExport
: Configures the indexing for
the incremental export. The element has the following subentries:
isActive
: Switches indexing on
(true
) or off (false
).
collectionSelection
: Defines rules that
determine the collection to be used for indexing a document.
Example:
<collectionSelection>
  <select collection="cm-contents">
    <isEqual name="state" value="edited"/>
  </select>
  <select collection="cm-contents">
    <isEqual name="state" value="released"/>
  </select>
</collectionSelection>
In each select
element, the
collection
attribute determines the collection into which a document
is indexed if all of the rules contained in the element apply. The
rules contained in a select
element are AND-related. An
OR relation can be formed by using more than one select
element in which the same collection name is specified. If the
collection
attribute is omitted, the document is not
indexed if the rules apply. The select elements are processed one by one, and the
first set of rules that applies determines the collection into which
the document is indexed. Each rule is represented by one element and
can be reversed by adding the tag attribute
negate="true"
. The following rules exist:
isEqual
: This rule applies if the
value of the file or version field specified by means of the
name
tag attribute exactly corresponds to the string
value
. Example:
<isEqual name="mimeType" value="application/x-shockwave-flash" />
isTrue
: This rule applies if
the file or version field specified by means of the
name
tag attribute has the value
true
, yes
, or 1
.
isFalse
: This rule applies if
the file or version field specified by means of the
name
tag attribute has the value
false
, no
, or 0
.
hasPrefix
: This rule applies if
the value of the file or version field specified by means of
the name
tag attribute begins with the string
value
. Example:
<hasPrefix name="mimeType" value="application/" />
hasSuffix
: This rule applies if
the value of the file or version field specified by means of
the name
tag attribute ends with the string
value
. Example:
<hasSuffix name="mimeType" value="/zip" />
matches
: This rule applies if
the value of the file or version field specified by means of
the name
tag attribute contains a string that
matches the regular expression specified as value
.
Example:
<matches name="collspec" value=".*live.*" />
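The way the rules interact can be sketched as follows. The fragment is only an illustration built from the field names and the collection name used in the examples above; the combination of rules is an assumption, not a recommended setup. It shows a select element without a collection attribute, two AND-related rules, and a negated rule.

```xml
<collectionSelection>
  <!-- No collection attribute: documents matching this rule are not indexed. -->
  <select>
    <hasPrefix name="mimeType" value="image/"/>
  </select>
  <!-- Both rules must apply (AND): released and NOT a MIME type ending in /zip. -->
  <select collection="cm-contents">
    <isEqual name="state" value="released"/>
    <hasSuffix name="mimeType" value="/zip" negate="true"/>
  </select>
  <!-- Same collection in a further select element: forms an OR with the one above. -->
  <select collection="cm-contents">
    <isEqual name="state" value="edited"/>
  </select>
</collectionSelection>
```

Because the first set of rules that applies decides the outcome, a released image would be excluded by the first select element here and never reach the second one.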
staticExport
: Configures the indexing for the
static export by the Content Management Server. The element has the
same subentries as incrementalExport
.
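As a rough sketch of how such an export section might look as a whole, the following fragment combines the two subentries. It assumes that isActive is written as an element with true or false as its text content; the source only names the subentry and its values, so check an existing config/indexing.xml for the exact notation.

```xml
<staticExport>
  <!-- Assumed notation: the switch is given as element content. -->
  <isActive>true</isActive>
  <collectionSelection>
    <select collection="cm-contents">
      <isEqual name="state" value="released"/>
    </select>
  </collectionSelection>
</staticExport>
```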
vseLocale
: Determines the locale (language-specific
settings) the Verity Search Cartridge is to use.
uni
, germanx
, and englishx
are
available by default (additional locales can be acquired).
uni
is a universal locale that uses the UTF-8 character
encoding. However, language-specific search query functions such as
stemming or typographical tolerance cannot be used with it. The value specified is
applied to all collections. If this value is changed, all collections
need to be created again.
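A minimal sketch of the setting, assuming the locale name is given as the text content of the element (the notation is an assumption; the locale name is one of the defaults listed above):

```xml
<!-- Applies to all collections; changing it requires recreating them. -->
<vseLocale>englishx</vseLocale>
```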