Request

An indexing request is encoded as an ses-indexDoc element inside an ses-request element. The following example shows such a request. It also shows that a single payload can carry several requests:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE ses-payload SYSTEM "http://www.example.com/ses.dtd">
<ses-payload payload-id="B42TE241" timestamp="20100825172100" version="2.1">
  <ses-header>
    <ses-sender sender-id="FX45RTDT" name="CM-Server"/>
    <ses-authentication login="cm-server" password=""/>
  </ses-header>
  <ses-request request-id="BR12TI5X">
    <ses-indexDoc docId="4712" collection="collection1"
         mimeType="application/ms-word" usesStreaming="YES">
      <title encoding="plain">testdoc1</title>
      <customAttribute encoding="base64">MGHX2c5=</customAttribute>
      <blob encoding="stream">d--1157180779-000000001-X</blob>
    </ses-indexDoc>
  </ses-request>
  <ses-request request-id="BR12TI5Y">
    <ses-indexDoc docId="4713" collection="collection2">
      <title encoding="plain">testdoc2</title>
      ...
      <blob>Just the blob</blob>
    </ses-indexDoc>
  </ses-request>
</ses-payload>
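
A payload like the one above can also be assembled programmatically. The following is a minimal sketch in Python using the standard xml.etree.ElementTree module. The function name build_index_payload and its parameter layout are hypothetical and not part of the Search Engine Server. The sketch does not emit the XML declaration or the DOCTYPE line, which a client would have to prepend itself.

```python
import xml.etree.ElementTree as ET


def build_index_payload(payload_id, timestamp, sender_id, sender_name,
                        login, password, requests):
    """Build an ses-payload element with one ses-request per entry.

    `requests` is a list of (request_id, doc_attributes, fields) tuples.
    `fields` is a list of (tag, encoding, text) tuples; an encoding of None
    omits the encoding attribute, so the default (plain) applies.
    """
    payload = ET.Element("ses-payload", {
        "payload-id": payload_id, "timestamp": timestamp, "version": "2.1"})
    header = ET.SubElement(payload, "ses-header")
    ET.SubElement(header, "ses-sender",
                  {"sender-id": sender_id, "name": sender_name})
    ET.SubElement(header, "ses-authentication",
                  {"login": login, "password": password})
    for request_id, doc_attributes, fields in requests:
        request = ET.SubElement(payload, "ses-request",
                                {"request-id": request_id})
        doc = ET.SubElement(request, "ses-indexDoc", doc_attributes)
        for tag, encoding, text in fields:
            attrib = {"encoding": encoding} if encoding else {}
            ET.SubElement(doc, tag, attrib).text = text
    return payload


# Rebuild the two requests from the example (the elided "..." is left out).
payload = build_index_payload(
    "B42TE241", "20100825172100", "FX45RTDT", "CM-Server", "cm-server", "",
    [
        ("BR12TI5X",
         {"docId": "4712", "collection": "collection1",
          "mimeType": "application/ms-word", "usesStreaming": "YES"},
         [("title", "plain", "testdoc1"),
          ("blob", "stream", "d--1157180779-000000001-X")]),
        ("BR12TI5Y",
         {"docId": "4713", "collection": "collection2"},
         [("title", "plain", "testdoc2"), ("blob", None, "Just the blob")]),
    ],
)
xml_text = ET.tostring(payload, encoding="unicode")
```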

The ses-indexDoc element has the following attributes:

  • docId
    The ID of the document to index. Usually this is the content ID (Content Management Server) or the object ID (Template Engine).
  • collection
    The name of the collection to which the document is to be added.
  • mimeType
    The MIME type of the document to be indexed. The Search Engine Server uses the MIME type to determine the preprocessor with which the document is to be preprocessed (see Configuring the Search Engine Server).
  • usesStreaming
    YES if at least one of the attributes to be indexed was transferred to the Search Engine Server via the streaming interface, otherwise NO.

The ses-indexDoc element can contain, as subelements, any of the attributes listed in section Content Indexing. The example above uses only title, the custom attribute customAttribute, and blob. The encoding of each attribute's content is specified with the encoding attribute of the corresponding element. encoding can have one of the following values:

  • plain
    The value of the attribute to index is not encoded. It is included directly as the value of the element. This is the default.
  • base64
    The attribute value is base64-encoded.
  • stream
    The attribute value is a streaming ticket. The ticket refers to content that the client has already transferred to the Search Engine Server using the streaming interface (see the explanation below).

If an attribute is base64-encoded or has been transferred to the Search Engine Server via the streaming interface, a preprocessor must have been configured for the MIME type of the document. This preprocessor’s task is to convert the attribute’s content to plain text and to set the value of encoding to plain.
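
The three encoding values can be illustrated with a short Python sketch. The helper names add_field and set_uses_streaming are hypothetical. The sketch assumes that a client chooses plain for text, base64 for binary data included in the request, and stream for a ticket obtained earlier, and that usesStreaming is derived from whether any subelement uses the stream encoding.

```python
import base64
import xml.etree.ElementTree as ET


def add_field(doc, tag, value, ticket=False):
    """Append one attribute element to an ses-indexDoc element.

    str value   -> encoding="plain", value included as is
    bytes value -> encoding="base64", value base64-encoded
    ticket=True -> encoding="stream", value is the streaming ticket
    """
    if ticket:
        encoding, text = "stream", value
    elif isinstance(value, bytes):
        encoding, text = "base64", base64.b64encode(value).decode("ascii")
    else:
        encoding, text = "plain", value
    field = ET.SubElement(doc, tag, {"encoding": encoding})
    field.text = text
    return field


def set_uses_streaming(doc):
    """Set usesStreaming to YES if any subelement carries a streaming ticket."""
    streamed = any(child.get("encoding") == "stream" for child in doc)
    doc.set("usesStreaming", "YES" if streamed else "NO")


doc = ET.Element("ses-indexDoc", {"docId": "4712",
                                  "collection": "collection1",
                                  "mimeType": "application/ms-word"})
add_field(doc, "title", "testdoc1")
add_field(doc, "customAttribute", b"\x00\x01binary")  # sample binary value
add_field(doc, "blob", "d--1157180779-000000001-X", ticket=True)
set_uses_streaming(doc)
```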

Streaming

A client can send the contents of attributes to the Search Engine Server in advance, i.e. before sending the indexing request. This procedure is recommended for large amounts of binary data because it is faster than base64-encoding the data and including it in the request.

A client uses the streaming interface to transfer such data to the Search Engine Server. To address the streaming interface, the client sends a POST request to the HTTP port of the Search Engine Server, specifying /stream as the URL. After the data has been transferred, the client receives a streaming ticket in the response. In the indexing request that follows, the client specifies the ticket ID in the manner described above in order to refer to the data.
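
The upload step might look as follows in Python, using only the standard urllib.request module. This is a sketch under several assumptions that the text above does not confirm: that the data is sent as the raw request body, that a Content-Type of application/octet-stream is acceptable, and that the response body consists of the ticket as plain text. The function names, host, and port are hypothetical.

```python
import urllib.request


def build_stream_request(host, port, data):
    """Build (but do not send) a POST request to the /stream URL."""
    url = f"http://{host}:{port}/stream"
    return urllib.request.Request(
        url,
        data=data,
        method="POST",
        # Assumption: the server accepts the raw bytes as the request body.
        headers={"Content-Type": "application/octet-stream"},
    )


def upload_for_ticket(host, port, data, timeout=30):
    """Send the data and return the streaming ticket.

    Assumption: the response body is the ticket as plain text.
    """
    request = build_stream_request(host, port, data)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8").strip()
```

The returned ticket would then be placed in the attribute element whose encoding is set to stream, and usesStreaming would be set to YES on the ses-indexDoc element.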

The Content Management Server transfers the contents of generic documents to the Search Engine Server via the streaming interface. The same applies to the body of publication, document, and template objects if the body is larger than 8 kilobytes. The Template Engine behaves the same way, except that it does not send templates to the Search Engine Server for indexing. The minimum amount of data to be transferred via streaming can be configured in the system configuration of the Content Manager and the Template Engine using the minStreamingDataLength entry.
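
The decision described above can be sketched as a small Python function. The function name should_stream is hypothetical, and the sketch assumes that the threshold is measured in bytes and that 8 kilobytes means 8192 bytes. Neither assumption is stated in the text.

```python
# Assumed default threshold: 8 kilobytes, taken here as 8192 bytes.
MIN_STREAMING_DATA_LENGTH = 8 * 1024


def should_stream(body, is_generic_document,
                  min_length=MIN_STREAMING_DATA_LENGTH):
    """Return True if the body should go through the streaming interface.

    Generic documents are always streamed; other bodies are streamed only
    if they are larger than the configured minimum length.
    """
    return is_generic_document or len(body) > min_length
```

A custom client could apply the same rule with its own threshold by passing a different min_length.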