Rebuilding the Search Index on CentOS

Note: This article ONLY applies to DocMoto servers running on CentOS Linux.

Overview

On CentOS, the DocMoto server uses Apache Solr, a high-performance indexing system, to index the content of files.

Although it is rare, it can occasionally be necessary to rebuild Solr's index.

This article describes the symptoms that might indicate an index has become corrupted, and how to go about rebuilding it.

Symptoms of a Corrupt Index

Solr is used only for content searching. Tag and file name searching is handled by the PostgreSQL database.

Possible symptoms of a corrupt Solr index include one or more of the following:

  • The phrase 500 Internal Server Error being displayed in the search footer.
  • Intermittent or inconsistent search results being returned.

The Search Architecture and Process

By way of background, it is helpful to understand how the DocMoto server manages file indexing.

Firstly, searches are split: tag and file name searches are performed within the PostgreSQL database, and only file content searches are passed to the Solr server for processing.

When a new file is added to DocMoto the DocMoto server places a reference to the file in a queue. As and when the DocMoto server is idle it will take a file from the queue, pass it to Solr for indexing, and finally remove it from the queue once Solr reports it has successfully processed the file.

If for any reason Solr is unable to process the file the DocMoto server will "send" the file to the back of the queue and re-try the index at a later stage.

The DocMoto server will re-try 10 times before marking the file as impossible to index.

The queue is held within a table called queue within the docmoto database in PostgreSQL.
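
The queue-and-retry behaviour described above can be modelled in a few lines of shell. This is a conceptual sketch only, not DocMoto's implementation: the real queue is the PostgreSQL table and the real indexer is Solr. The names try_index, MAX_TRIES and process_queue are hypothetical, and the sketch assumes that "10 times" means ten attempts in total.

```shell
# Conceptual model of the indexing queue (hypothetical names throughout).
MAX_TRIES=10

# Stand-in for "pass the file to Solr": fails for any name containing "bad".
try_index() {
  case "$1" in
    *bad*) return 1 ;;
    *)     return 0 ;;
  esac
}

process_queue() {
  # The positional parameters act as the queue. Each entry is rewritten as
  # "attempts:name", starting at 0 attempts.
  n=$#
  while [ "$n" -gt 0 ]; do
    set -- "$@" "0:$1"
    shift
    n=$((n - 1))
  done

  while [ "$#" -gt 0 ]; do
    entry=$1
    shift                              # take the entry from the front
    tries=${entry%%:*}
    file=${entry#*:}
    if try_index "$file"; then
      echo "indexed $file"             # success: entry leaves the queue
    else
      tries=$((tries + 1))
      if [ "$tries" -ge "$MAX_TRIES" ]; then
        echo "gave up on $file after $tries attempts"
      else
        set -- "$@" "$tries:$file"     # failure: send to the back of the queue
      fi
    fi
  done
}
```

Calling process_queue a.pdf bad.doc b.txt indexes the two good files on the first pass, while bad.doc keeps returning to the back of the queue until the attempt limit is reached.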

Testing an Index

Before rebuilding, it is possible to make a superficial check that the index is OK.

From the command line type the following, where 'term' is a word you know should be in the index.

curl 'http://localhost:8983/solr/select?q=term'

If the index is OK you should see some XML returned. If the index is corrupt you will either see nothing returned or some kind of error message.

Note: A successful result does not prove that the index is OK. Treat this only as a quick, superficial test.
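
As a rough aid to reading the response, the sketch below pulls the hit count out of the returned XML. It assumes the response contains a numFound="N" attribute on its result element, as Solr's standard XML response format does; the function name solr_num_found is hypothetical, and the exact output may differ with your Solr version.

```shell
# Print the numFound value from a Solr XML response read on stdin.
# Prints nothing if the attribute is absent (for example, an error page).
solr_num_found() {
  sed -n 's/.*numFound="\([0-9][0-9]*\)".*/\1/p' | head -n 1
}
```

For example, curl -s 'http://localhost:8983/solr/select?q=term' | solr_num_found prints the number of matching documents. Empty output suggests that Solr returned an error or nothing at all.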

Rebuilding the Index

Note: For large indexes a rebuild could take several days. In addition, because the DocMoto server only sends files for processing when it is otherwise idle, it is generally more sensible to rebuild an index over a weekend when user activity is minimal.

Rebuilding the index makes use of the dmSolr utility, which you can download from here.

  1. Make sure you have a complete backup of the DocMoto server. In particular you MUST have a backup of the folder /opt/docmoto

  2. Log into the host CentOS machine via terminal as user 'root'.

  3. Stop the DocMoto and Solr servers by typing:

    service docmoto stop
    

    and...

    service solr stop
    
  4. Log into the PostgreSQL command line tool 'psql' and clear anything left in the queue:

    sudo -u docmoto psql
    

    Once in psql, clear the queue:

    delete from queue;
    

    Now quit psql by typing:

    \q
    
  5. Rename the Solr index folder. The old index is no longer wanted and can be removed once the re-index is complete.

    Firstly move to the directory /opt/docmoto/data/search by typing:

    cd /opt/docmoto/data/search
    

    Then rename the 'index' folder, using a command such as:

    mv index index_`date +'%F_%T'`
    
  6. Restart Solr. This causes a new index folder and its contents to be created.

    service solr start
    
  7. Copy the dmSolr file to a location where the user docmoto can execute a script, typically /opt/docmoto.

  8. Rebuild the queue using dmSolr by typing:

    sudo -u docmoto ./dmSolr
    

    Note: For the question "Only documents with apostrophes in the name" answer 'n'

    For the question "Enter next version id to include (or return for all)" press the 'return' key.

    The utility now loads up the queue table with references. When complete it returns a message "All done" plus additional details.

    To view the queue table log into psql as before using sudo -u docmoto psql and run the following query:

    select * from queue;
    

    To obtain a count of the number of files to be indexed type:

    select count(*) from queue;
    

    Note: If no files are listed you will need to retrace and recheck the steps above.

  9. Restart the DocMoto server

    service docmoto start
    

The DocMoto server will now re-index the files. You can monitor progress by querying the number of files remaining in the queue table, for example:

sudo -u docmoto psql
select count(*) from queue;
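
For a long rebuild it may be convenient to poll the count rather than re-run the query by hand. The following is a minimal sketch: the function names queue_count and wait_for_queue and the POLL_SECONDS variable are hypothetical, and it assumes the queue eventually empties. If files marked as impossible to index remain in the table, the count may never reach zero.

```shell
# Seconds to wait between checks (hypothetical setting, default 60).
POLL_SECONDS=${POLL_SECONDS:-60}

# Print the number of rows currently in the queue table.
# -A (unaligned) and -t (tuples only) make psql print just the number.
queue_count() {
  sudo -u docmoto psql -At -c 'select count(*) from queue;'
}

# Poll until the queue is empty, printing a timestamped count each time.
wait_for_queue() {
  while :; do
    remaining=$(queue_count) || return 1
    if [ "$remaining" -eq 0 ]; then
      echo "queue empty"
      return 0
    fi
    echo "$(date +'%F %T') remaining: $remaining"
    sleep "$POLL_SECONDS"
  done
}
```

Run wait_for_queue as root on the DocMoto host. It stops when the count reaches zero, or immediately if the psql query fails.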

Troubleshooting

Very occasionally a corrupt file can fail to be indexed and stop the indexing process. If this happens, it may be necessary to delete the file from the queue table. If the queue stalls, the top file will be the file causing the problem. To remove the file, use its queueid value, for example:

delete from queue where queueid = x;

Where x is the queueid of the problematic file.
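
Because a mistyped delete could remove more than intended, one cautious option is to build the statement with a small helper that accepts only a whole number. This is a sketch: queue_delete_sql is a hypothetical name, and 42 in the usage note is a placeholder for the real queueid.

```shell
# Build the delete statement for one queue entry. Anything that is not a
# plain whole number is refused, so a typo cannot widen the delete.
queue_delete_sql() {
  case "$1" in
    ''|*[!0-9]*)
      echo "queueid must be a whole number" >&2
      return 1
      ;;
  esac
  printf 'delete from queue where queueid = %s;\n' "$1"
}
```

It can then be used as sql=$(queue_delete_sql 42) && sudo -u docmoto psql -c "$sql", so that psql only runs if the id passed validation.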

The dmSolr Utility

The dmSolr utility walks through a DocMoto data store and places a reference to each file into the queue table within DocMoto's PostgreSQL database.

To ensure none of the files are removed during indexing, the utility creates a link to each file. Once a file has been indexed, the Solr indexer removes the link, effectively freeing the original file.

Files to be indexed have link references created in the folder /opt/docmoto/data/content/searchkit/.

If for any reason the utility cannot create the link files, it will throw an error.

If the utility has been run and aborted for any reason, links may already have been created in the /opt/docmoto/data/content/searchkit/ folder and entries added to the queue table. Re-running the utility with files in the searchkit folder will throw an error, though the process will usually complete. In general it is best to clear anything from the /opt/docmoto/data/content/searchkit/ folder and the queue table before re-running the utility.
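
Before re-running the utility, a read-only check can show whether anything was left behind. The sketch below counts the entries under a folder. The function name count_entries is hypothetical, and the check deliberately deletes nothing.

```shell
# Count everything (files, links, subfolders) found under a directory.
# The directory itself is excluded by -mindepth 1.
count_entries() {
  find "$1" -mindepth 1 | wc -l | tr -d ' '
}
```

For example, count_entries /opt/docmoto/data/content/searchkit should print 0 on a clean system. The matching check for the queue table is the select count(*) from queue; query shown earlier.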

Queue Table

The queue table is a table within the DocMoto database that controls the passing of files for content indexing.

Whenever a file is added to the DocMoto repository, a reference is added to the queue table. When otherwise idle, the DocMoto server will take each entry from the queue table and pass the associated file to the indexing service.

Once the file has been indexed, its queue table entry is removed.

To establish whether there are any entries in the queue table, log in to the PostgreSQL database and query the table, for example:

sudo -u docmoto psql
select count(*) from queue;

Still have a question?

If you still can't find the answer to your question or need more information, please contact the DocMoto team on +44 (0)1242 225230 or email us
