Rebuilding the Search Index on CentOS
Note: This article ONLY applies to DocMoto servers running on CentOS Linux.
On CentOS the DocMoto server uses Apache Solr, a high-performance indexing system, to index a file's content.
Although it is rare, it can occasionally be necessary to rebuild Solr's index.
This article describes symptoms that might indicate an index has become corrupt and how to go about rebuilding it.
Symptoms of a Corrupt Index
Solr is used only for content searching. The PostgreSQL database controls tag and file name searching.
Possible symptoms for a corrupt Solr index include one or more of the following:
- The phrase 500 Internal Server Error being displayed in the search footer.
- Intermittent or inconsistent search results being returned.
The Search Architecture and Process
By way of background it is helpful to understand how the DocMoto server manages file indexing.
Firstly, searches are split: tag and file name searches are performed within the PostgreSQL database, and only file content searches are passed to the Solr server for processing.
When a new file is added to DocMoto the DocMoto server places a reference to the file in a queue. As and when the DocMoto server is idle it will take a file from the queue, pass it to Solr for indexing, and finally remove it from the queue once Solr reports it has successfully processed the file.
If for any reason Solr is unable to process the file the DocMoto server will "send" the file to the back of the queue and re-try the index at a later stage.
The DocMoto server will re-try 10 times before marking the file as impossible to index.
The queue is held within a table called queue within the docmoto database in PostgreSQL.
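The queue behaviour described above can be pictured with a small shell model. This is only an illustrative sketch, not DocMoto code: index_file is a hypothetical stand-in for "pass the file to Solr", and the model assumes that "re-try 10 times" means ten attempts in total.

```shell
# Toy model of the indexing queue; nothing here touches DocMoto or Solr.
MAX_TRIES=10

# Hypothetical stand-in for handing a file to Solr:
# any name containing "bad" always fails to index.
index_file() {
  case "$1" in
    *bad*) return 1 ;;
    *)     return 0 ;;
  esac
}

# process_queue FILE...
# Runs in a subshell so its variables do not leak into the caller.
process_queue() (
  # Tag each entry with its failure count, e.g. "0|report.pdf".
  for name in "$@"; do
    set -- "$@" "0|$name"
    shift
  done
  while [ "$#" -gt 0 ]; do
    entry=$1; shift                    # take the entry at the front
    tries=${entry%%|*}
    name=${entry#*|}
    if index_file "$name"; then
      echo "indexed $name"             # success: entry leaves the queue
    else
      tries=$((tries + 1))
      if [ "$tries" -ge "$MAX_TRIES" ]; then
        echo "gave up on $name after $tries tries"
      else
        set -- "$@" "$tries|$name"     # send to the back and retry later
      fi
    fi
  done
)
```

Calling process_queue a.txt bad.doc b.txt indexes the two good files straight away, while bad.doc keeps returning to the back of the queue until the attempt limit is reached.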
Testing an Index
Before rebuilding it is possible to make a superficial check that the index is OK.
From the command line type the following, where 'term' is a word you know should be in the index.
If the index is OK you should see some XML returned. If the index is corrupt you will either see nothing returned or some kind of error message.
Note: Passing this test does not guarantee that the index is OK; it is only a quick, superficial check.
Rebuilding the Index
Note: For large indexes a rebuild could take several days. In addition, as the DocMoto server only sends files for processing when it is otherwise idle, it is generally more sensible to rebuild an index over a weekend when user activity is minimal.
Make sure you have a complete backup of the DocMoto server. In particular you MUST have a backup of the folder /opt/docmoto
Log into the host CentOS machine via terminal as user 'root'.
Stop the DocMoto and Solr servers by typing:
service docmoto stop
service solr stop
Log into the PostgreSQL command line tool 'psql' and clear anything left in the queue:
sudo -u docmoto psql
Once into psql clear the queue
delete from queue;
Now quit psql by typing:
\q
Rename the Solr index folder - it's no longer wanted and can be removed once the re-index is complete
Firstly move to the directory /opt/docmoto/data/search by typing:
cd /opt/docmoto/data/search
Then rename the 'index' folder, using a command such as:
mv index index_`date +'%F%T'`
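The rename step tags the old folder with the current date and time, which keeps several old indexes apart if the procedure is repeated. With GNU date, %F expands to YYYY-MM-DD. The sketch below is one possible variation that avoids colons in the folder name; the function name backup_name is hypothetical.

```shell
# backup_name BASE
# Prints BASE followed by a timestamp, e.g. index_2024-05-01_13-45-10.
backup_name() {
  echo "${1}_$(date +'%F_%H-%M-%S')"
}

# Example use from within /opt/docmoto/data/search:
#   mv index "$(backup_name index)"
```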
Restart Solr - This will cause a new index folder and contents to be created
service solr start
Copy the dmSolr file to a location where the user docmoto can execute a script, typically /opt/docmoto.
Rebuild the queue using dmSolr by typing:
sudo -u docmoto ./dmSolr
Note: For the question "Only documents with apostrophes in the name" answer 'n'
For the question "Enter next version id to include (or return for all)" press the 'return' key.
The utility now loads up the queue table with references. When complete it returns a message "All done" plus additional details.
To view the queue table log into psql as before using sudo -u docmoto psql and run the following query:
select * from queue;
To obtain a count of the number of files to be indexed type:
select count(*) from queue;
Note: If no files are listed you will need to retrace and recheck the steps above.
Restart the DocMoto server
service docmoto start
The DocMoto server will now re-index the files. You can monitor the progress by querying the number of files remaining in the queue table, for example:
sudo -u docmoto psql
select count(*) from queue;
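For a long rebuild it can be convenient to poll the count rather than re-typing the query. The following is a minimal sketch, assuming it is run as root on the DocMoto host; the function names queue_count and watch_queue are hypothetical. The psql options -t and -A print just the bare number, and -c runs a single command.

```shell
# queue_count: print the number of rows left in the queue table.
queue_count() {
  sudo -u docmoto psql -tA -c 'select count(*) from queue;'
}

# watch_queue COUNT_CMD [INTERVAL_SECONDS]
# Prints a timestamped count every INTERVAL_SECONDS (default 60)
# and stops once the count reaches zero.
watch_queue() {
  count_cmd=$1
  interval=${2:-60}
  while :; do
    remaining=$($count_cmd) || return 1
    echo "$(date +'%F %T') remaining: $remaining"
    if [ "$remaining" -eq 0 ]; then
      break
    fi
    sleep "$interval"
  done
}

# Example use, checking every five minutes:
#   watch_queue queue_count 300
```

If the count stops falling for a long period, that may point to the stalled-queue situation where a single file blocks indexing.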
Very occasionally a corrupt file can fail to be indexed and stop the indexing process. If this happens it may be necessary to delete the file from the queue table. If the queue stalls, the top file will be the file causing the problem. To remove the file use its queueid value, for example:
delete from queue where queueid = x;
Where x is the queueid of the problematic file.
The dmSolr Utility
The dmSolr utility walks through a DocMoto data store and places a reference to each file into the queue table within DocMoto's PostgreSQL database.
To ensure none of the files are removed during indexing, the utility creates a link to each file. Once the file has been indexed, the Solr indexer removes the link, effectively freeing the original file.
Files to be indexed have link references created in the folder /opt/docmoto/data/content/searchkit/.
If for any reason the utility cannot create the link files it will throw an error.
If the utility has been run and aborted for any reason it is possible that links will have been created in the /opt/docmoto/data/content/searchkit/ folder and the queue table. Re-running the utility with files in the searchkit folder will throw an error though the process will usually complete. In general though it is best to clear anything from the /opt/docmoto/data/content/searchkit/ folder and the queue table before re-running the script.
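Before re-running the utility it may help to check whether an earlier, aborted run left links behind. The sketch below only counts entries and deletes nothing; the function name count_leftovers is hypothetical, and the -mindepth and -maxdepth options assume GNU find, as shipped with CentOS.

```shell
# count_leftovers DIR
# Prints how many entries (files or links) sit directly inside DIR.
count_leftovers() {
  find "$1" -mindepth 1 -maxdepth 1 | wc -l
}

# Example use against the folder named in this article:
#   count_leftovers /opt/docmoto/data/content/searchkit
```

A result of 0 suggests the folder is already clean; any other value means there are entries to clear, together with the queue table, before running dmSolr again.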
The queue Table
The queue table is a table within the DocMoto database that controls the passing of files for content indexing.
Whenever a file is added to the DocMoto repository a reference is added to the queue table. When otherwise idle the DocMoto server will take each entry from the queue table and pass the associated file to the indexing service.
Once indexed the queue table entry is removed.
To establish whether there are any entries in the queue table, log in to the PostgreSQL database and query the table, for example:
sudo -u docmoto psql
select count(*) from queue;