Understanding the DocMoto Backup Dataset

It goes without saying that all DocMoto systems should be backed up, but what does that actually get you? This article explains the contents of a backup set.

Files and Databases

DocMoto keeps file names, their parent folders, their metadata and other details in a PostgreSQL database. Whenever you view a DocMoto repository through the DocMoto client, the folder and file hierarchy that is displayed comes directly from this database.

The actual files themselves are not held in the database. This is for good technical reasons: it makes retrieval faster and backup much simpler. Instead, they are held in the Contents folder within /var/opt/docmoto/archive.

The files are arranged in DocMoto's own folder system, with folder names typically of the form FXX, e.g. F30. The current version of each file is held in its original form but under a different name, typically of the form DocXX.bin, e.g. Doc33.bin. Previous versions are held as deltas, also with the .bin extension.

The database holds the original name of a file, and its current location in the archive folder.

A Backup Set

So, to get a "good" backup set you need a copy of the archive folder and a dump of the database. Moreover, the database dump must exactly match the contents of the archive folder. It's no use, for example, having a database dump that is 3 weeks out of date with the archive.

To get a good backup you should ideally stop the DocMoto server (thus preventing anybody from changing any files), take a database dump, copy the archive, and restart the DocMoto server. This way you know for sure that the database dump matches the contents of the archive folder.

It will come as no surprise to learn that the backup utility shipped with DocMoto does exactly as described above.
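
For anyone scripting the same sequence themselves, the ordering can be sketched in Python 3 as below. This is only an illustration under stated assumptions: `run_backup` is a hypothetical helper, and the four commands are passed in as argument lists because the actual commands for stopping the server, dumping the database and copying the archive depend on your installation and are not defined in this article.

```python
import subprocess


def run_backup(stop_cmd, dump_cmd, copy_cmd, start_cmd, run=subprocess.check_call):
    """Run the four backup steps in order, always attempting the restart.

    Each *_cmd is an argument list for your own installation (hypothetical
    here). `run` executes one command and raises an exception on failure.
    """
    run(stop_cmd)          # 1. stop the server so nobody can change files
    try:
        run(dump_cmd)      # 2. take the database dump
        run(copy_cmd)      # 3. copy the archive folder
    finally:
        run(start_cmd)     # 4. restart the server, even if a step failed
```

The `try`/`finally` is the main design point: if the dump or the copy fails, the server is still restarted, and the exception still propagates so the failed backup is not mistaken for a good one.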

Restoring from a Backup Set

In most cases the purpose of a backup set is to restore a "downed" DocMoto installation. We deal with exactly how to do this in our article covering restoring a DocMoto server.

Working without a Database Backup

Now that you understand the architecture of a DocMoto backup set, what happens if things have gone so badly wrong that you don't have a database dump? How do you get files back?

The answer is the little XML file that accompanies each and every file in the archive folder. The XML file is your "get out of jail" card. The XML file has the same name as the corresponding .bin file, except its extension is .xml. For example Doc16.bin has a corresponding XML file of Doc16.xml.

The XML file contains all kinds of data about its corresponding .bin file (see below). Crucially, it contains the .bin file's original name (documentName) and its location (folderName) in the hierarchy.

<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<docmoto documentName="blank_pages.pages" folderName="/DocMotoConfig/MasterDocuments/">
  <tags>
    <tag name="Comment" value="Apple Pages"/>
    <tag name="Comment" value=""/>
    <tag name="Copyright" value="(c)"/>
    <tag name="DAV:/checked-in-user" value="administrator"/>
    <tag name="DAV:/creationdate" value="2013-10-22 08:15:35 +0"/>
    <tag name="DAV:/getlastmodified" value="2013-10-22 08:15:35 +0"/>
    <tag name="DOCMOTO:/documentid" value="EEB2B04E-CB25-46A6-9BE4-FE239D691B45"/>
    <tag name="FileModificationDateTime" value="2013:10:22 09:15:35+01:00"/>
    <tag name="FileSize" value="28 kB"/>
    <tag name="FileType" value="Apple Pages"/>
    <tag name="MIMEType" value="application/x-iwork-pages-sffpages"/>
    <tag name="PreviewImage" value="(Binary data 3637 bytes, use -b option to extract)"/>
    <tag name="Terms_Indexed_In_Document" value="-1"/>
    <tag name="ZipBitFlag" value="0"/>
    <tag name="ZipCRC" value="0xc3ad363b"/>
    <tag name="ZipCompressedSize" value="3637"/>
    <tag name="ZipCompression" value="None"/>
    <tag name="ZipFileName" value="QuickLook/Thumbnail.jpg"/>
    <tag name="ZipModifyDate" value="2012:03:19 16:52:27"/>
    <tag name="ZipRequiredVersion" value="20"/>
    <tag name="ZipUncompressedSize" value="3637"/>
  </tags>
</docmoto>
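
As an illustration, the two attributes can be read with a few lines of Python 3. This is a minimal sketch, not DocMoto's own tooling: `read_sidecar` is a hypothetical helper name, the sample is a trimmed copy of the listing above, and the URL-unquoting is an assumption carried over from the recovery script later in this article, which unquotes both values.

```python
import xml.etree.ElementTree as ET
from urllib.parse import unquote

# Trimmed copy of the example XML file, held as bytes as it would be on disk
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<docmoto documentName="blank_pages.pages" folderName="/DocMotoConfig/MasterDocuments/">
  <tags>
    <tag name="FileType" value="Apple Pages"/>
    <tag name="DAV:/creationdate" value="2013-10-22 08:15:35 +0"/>
  </tags>
</docmoto>"""


def read_sidecar(xml_bytes):
    """Return the original name, folder and tags recorded in one XML file."""
    root = ET.fromstring(xml_bytes)  # root is the <docmoto> element
    return {
        "documentName": unquote(root.get("documentName", "")),
        "folderName": unquote(root.get("folderName", "")),
        # Tag names can repeat (e.g. Comment), so keep a list of pairs
        "tags": [(t.get("name"), t.get("value")) for t in root.iter("tag")],
    }


info = read_sidecar(SAMPLE)
print(info["folderName"] + info["documentName"])
```

Tags are returned as a list of pairs rather than a dictionary because the same tag name can appear more than once in a file.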

To recover a file, then, is as simple as renaming the .bin file with the value of the documentName attribute from the XML file. That's it.

Remember, occasions where it's necessary to work without a database are extremely rare. But if they do occur, you will need to use command line tools like grep to find the correct XML file, and from that the correct .bin file.
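
Where grep is awkward, the same search can be sketched in Python 3. This is an illustrative sketch only: `find_bin_files` is a hypothetical helper, and it assumes the pairing described above (Doc16.xml beside Doc16.bin) and that attribute values may be URL-quoted, as the recovery script later in this article assumes.

```python
import os
import xml.etree.ElementTree as ET
from urllib.parse import unquote


def find_bin_files(archive_root, wanted_name):
    """Yield (bin_path, folderName) for each XML file naming wanted_name."""
    for root, _dirs, files in os.walk(archive_root):
        for name in files:
            if not name.endswith(".xml"):
                continue
            xml_path = os.path.join(root, name)
            try:
                doc = ET.parse(xml_path).getroot()
            except ET.ParseError:
                continue  # skip unreadable XML rather than abort the search
            if unquote(doc.get("documentName", "")) != wanted_name:
                continue
            # Same name as the XML file, but with the .bin extension
            bin_path = os.path.splitext(xml_path)[0] + ".bin"
            if os.path.exists(bin_path):
                yield bin_path, unquote(doc.get("folderName", ""))
```

For example, `list(find_bin_files("/var/opt/docmoto/archive", "blank_pages.pages"))` would list every matching .bin file together with the folder it originally lived in, which helps when the same file name exists in several folders.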

If a large-scale recovery is required, this is possible using the XML files: you simply need to work through them and rename the appropriate .bin files. The code below indicates how that can be done.

# THIS SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, 
# EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF 
# MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON INFRINGEMENT. 
# IN NO EVENT SHALL CHLSOFTWARE BE LIABLE FOR ANY CLAIM, DAMAGES OR 
# OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR 
# OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE 
# SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

import os
import shutil
import sys
import tarfile
from urllib.parse import unquote
from xml.dom import minidom
from xml.parsers.expat import ExpatError


# Change this value to the path where you want the files exported
baseNewDirectory = "/Users/fredjones/export files"

# Change this value to the path of the DM content
baseDocMotoStore = "/var/opt/docmoto/archive"

# Written for Python 3; every module used is part of the standard library

############################################ Change nothing from here ##########################################


def main():
    process()
    print('\nAll done')


def binPathFor(myXMLFile):
    # Doc16.xml describes Doc16.bin: same name, different extension
    return os.path.splitext(myXMLFile)[0] + ".bin"


def process():
    # Collect every XML file that has a matching .bin file beside it
    fileList = []
    for root, dirs, files in os.walk(baseDocMotoStore):
        for file in files:
            if file.endswith(".xml"):
                myXMLPath = os.path.join(root, file)
                if os.path.exists(binPathFor(myXMLPath)):
                    fileList.append(myXMLPath)

    processRows(fileList)


def testXML(myXMLFile):
    # Return True only if the file parses as XML
    try:
        minidom.parse(myXMLFile)
        return True
    except ExpatError:
        print("Bad XML for " + myXMLFile)
        return False


def parseXML(myXMLFile):
    myResults = dict()

    xmldoc = minidom.parse(myXMLFile)
    docmotoElement = xmldoc.getElementsByTagName('docmoto')

    # Add unquoted name and path
    myResults['documentName'] = unquote(docmotoElement[0].attributes['documentName'].value)
    myResults['folderName'] = unquote(docmotoElement[0].attributes['folderName'].value)

    # Remove front forward slash
    myResults['folderName'] = myResults['folderName'].replace('/', '', 1)

    # Add .bin name and path
    myResults['binFileName'] = binPathFor(myXMLFile)

    # Deal with any XML tags
    myTagList = xmldoc.getElementsByTagName('tag')

    myDate = ''
    myCompositeFileType = ''
    for tag in myTagList:
        if tag.attributes['name'].value == 'DAV:/creationdate':
            # Keep only the date part of "YYYY-MM-DD HH:MM:SS +0"
            myDateParts = (tag.attributes['value'].value).split(' ')
            myDate = myDateParts[0]

        if tag.attributes['name'].value == 'DOCMOTO:/composite-file-type':
            myCompositeFileType = tag.attributes['value'].value

    myResults['creationDate'] = myDate
    myResults['compositeFileType'] = myCompositeFileType

    return myResults


def processRows(in_rows):
    validRows = []
    for row in in_rows:
        # Skip only the file whose XML cannot be parsed
        if not testXML(row):
            continue

        validRows.append(parseXML(row))

    # Now have files so copy them out
    processValidRows(validRows)


def processValidRows(rows):
    count = 0
    for validRow in rows:
        createFile(validRow)
        count = count + 1

        # \r returns to the start of the line so the counter updates in place
        sys.stdout.write("Number of files rebuilt: %d   \r" % count)
        sys.stdout.flush()


def createFolderStructure(myBranch):
    # Rebuild the original folder hierarchy under the export directory
    myPath = baseNewDirectory
    for part in myBranch:
        if part:  # ignore empty parts left by leading or trailing slashes
            myPath = os.path.join(myPath, part)
    os.makedirs(myPath, exist_ok=True)
    return myPath


def createFile(validRow):
    folderPath = createFolderStructure(validRow['folderName'].split('/'))
    myTargetPath = os.path.join(folderPath, validRow['documentName'])

    if validRow['compositeFileType'] != 'Package':
        # Copy, rather than rename, so the archive itself is left untouched
        shutil.copy2(validRow['binFileName'], myTargetPath)
    else:
        # Need to untar it into a folder with the name of the file
        myTempFileName = myTargetPath + '.tmp'

        # Create temp file
        shutil.copy2(validRow['binFileName'], myTempFileName)

        # Create folder with app name
        os.makedirs(myTargetPath, exist_ok=True)

        # Unpack the tar archive into the folder
        with tarfile.open(myTempFileName) as tar:
            tar.extractall(path=myTargetPath)

        # Remove the temp file
        os.remove(myTempFileName)


if __name__ == '__main__':
    main()

