Deleting Documents from the Search Engine Index

In very rare cases, the index of the Autonomy search engine may contain documents that no longer exist at the location stored in the index. If such a document appears in a search result, its link points to the wrong document or to no document at all. In the latter case, an error message is displayed, giving the impression that the website's links are broken.

Such documents can be deleted directly from the live index of the search engine. You can do this, and handle many other tasks, with the XML client, which is part of CMS Fiona. The XML client is a tool for support and debugging purposes.

Proceed as follows to find and delete such outdated index entries:

  1. Connect to the Template Engine using the Tcl client:

    instance/myInstance/bin/client localhost teTclPort login password
  2. Load the XmlClient:

    source lib/XmlClient.tcl
  3. Enter the collection to be searched:

    ::sesXmlClient::setDefaultCollection collectionName

     The default collection name for the live pages is live-docs. If you do not know the name of the collection, you can look it up in the indexing.incrementalExport.collectionSelection system configuration entry located in the file instance/myInstance/config/indexing.xml.

  4. Search for a term (here, foreign) to find the documents to be deleted from the index:

    ::sesXmlClient::sesSearch query {foreign*} -resultRecord {docId path}
  5. Check whether the files to which the hits refer exist by accessing each of them:

    obj withId docId
  6. If an error message is returned, the corresponding file does not exist, so its entry can be deleted from the index of the search engine:

    ::sesXmlClient::deleteDocFromIndex docId
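
If a search returns many hits, the check and delete steps above can be combined in a small Tcl procedure. The following is only a sketch under assumptions: the procedure name purgeStaleIndexEntries is hypothetical, the list of document IDs is assumed to be copied by hand from the search output (the format of that output is not parsed here), and obj withId is assumed to raise a Tcl error for a missing document, as described in the step above.

```tcl
# Sketch only: delete index entries whose documents can no longer be accessed.
# Assumptions (not confirmed by the product documentation):
#   - docIds is a Tcl list of IDs taken manually from the sesSearch output
#   - the check command raises a Tcl error if the document does not exist
# checkCmd and deleteCmd default to the commands used in the steps above;
# they are parameters only so the procedure can be tried out in isolation.
proc purgeStaleIndexEntries {docIds {checkCmd {obj withId}} {deleteCmd {::sesXmlClient::deleteDocFromIndex}}} {
    set deleted {}
    foreach docId $docIds {
        # catch returns a non-zero code if accessing the document fails
        if {[catch {uplevel #0 [linsert $checkCmd end $docId]} msg]} {
            puts "docId $docId: not accessible ($msg), deleting from index"
            uplevel #0 [linsert $deleteCmd end $docId]
            lappend deleted $docId
        }
    }
    # Return the IDs that were removed from the index
    return $deleted
}
```

After loading the XmlClient and setting the collection, the procedure could be called with the IDs from the search result, for example purgeStaleIndexEntries {4711 4712} (placeholder IDs). Documents that can still be accessed are left untouched.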

This procedure deletes exactly those references from the index that point to nonexistent files. To update the complete index (i.e. to rebuild it), proceed as follows:

  1. If you are not yet connected to the Template Engine, connect using the Tcl client (as described above).

  2. Mark all files as updated so that they are exported again:

    obj touchAll
  3. Trigger the export:

    app publish