Creating a folder monitor using Python
In this article we will look at how to set up a monitored folder using the DocMoto API and Python.
The objective is to create a script that will monitor a given folder, and when a new file (or folder) is added have it automatically uploaded to DocMoto.
The script we will create only deals with "new" files and folders. It does not attempt to "synchronize" a folder and file set or handle renamed folders or files.
A complete set of project files is available here.
Included in the download is a Python script, "LwpMotoUtils.py". It wraps the WebDAV API calls in a simple library and makes use of the Python "urllib2" library.
There are several ways to set up a folder monitor. We could use AppleScript to look for additions and have it call a Python script to upload the new files and folders.
This approach is fine, except that AppleScript only notifies us of new files (or folders) added directly to the monitored folder; it doesn't notify us of additions in any sub folders.
So for this example we are using the Python Watchdog module. In particular we will be using its directory snapshot utilities to spot new files and folders being added.
Note: By default Watchdog does not appear to be installed on a Mac. The Watchdog web site provides details on how to install the module.
Tools and environment
All scripts have been developed using the excellent and free community edition of PyCharm from JetBrains.
All scripts were developed using Python 2.7 on a MacBook Pro running OS X 10.8.5.
The script (simpleMonitor.py)
In the bundle is a script called "simpleMonitor.py". This does the work, and needs to be set up to permanently monitor our folder.
    if __name__ == "__main__":
        myMonitor = CHLMonitor()
        myMonitor.watch(path='../../test', recursive=True)
        # Initial scan takes the baseline snapshot
        myMonitor.scan()
        # Rescan every 10 seconds until the script is stopped
        while True:
            myMonitor.scan()
            time.sleep(10)
The script creates an instance of a class CHLMonitor. A watch is then set on the folder we are interested in monitoring (in this case the folder is "test" in the home directory).
The "watch" method takes two parameters. The "path" (either absolute or relative to the location of simpleMonitor.py) is the folder to monitor. The "recursive" flag (True or False) defines whether to scan sub folders.
The "scan" method takes a snapshot. After an initial scan, subsequent scans are taken every 10 seconds.
This script contains the class definition for the CHLMonitor class. The "scan" method is the most important function.
    def scan(self):
        if not self._is_started:
            # First scan: take a baseline snapshot, nothing to compare yet
            self._is_started = True
            self._newSnapshot = DirectorySnapshot(
                path=self._path,
                recursive=self._is_recursive,
                walker_callback=walker_callback)
            self._oldSnapshot = self._newSnapshot
        else:
            # Later scans: compare against the previous snapshot
            self._newSnapshot = DirectorySnapshot(
                path=self._path,
                recursive=self._is_recursive,
                walker_callback=walker_callback)
            self._diffSnapshot = DirectorySnapshotDiff(self._oldSnapshot, self._newSnapshot)
            self._oldSnapshot = self._newSnapshot
            self.process_diff()
        return
The method is responsible for managing the "old" and "new" snapshots taken at successive scans. Any differences between the "old" and "new" scans are sent to the "process_diff" method for processing.
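To make the old/new snapshot idea concrete, here is a minimal sketch that uses only the standard library rather than Watchdog. The function names take_snapshot and diff_snapshots are illustrative and are not part of the downloadable script; Watchdog's DirectorySnapshot and DirectorySnapshotDiff do the equivalent work in simpleMonitor.py.

```python
import os


def take_snapshot(root):
    """Return (dirs, files): sets of the paths currently found under root."""
    dirs, files = set(), set()
    for current, subdirs, filenames in os.walk(root):
        dirs.update(os.path.join(current, d) for d in subdirs)
        files.update(os.path.join(current, f) for f in filenames)
    return dirs, files


def diff_snapshots(old, new):
    """Return the directories and files present in new but not in old."""
    old_dirs, old_files = old
    new_dirs, new_files = new
    return sorted(new_dirs - old_dirs), sorted(new_files - old_files)
```

Taking one snapshot per scan and passing the previous and current snapshots to diff_snapshots gives the same "created" lists that the script reads from the Watchdog diff object.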
    def process_diff(self):
        # Securely connect to DocMoto
        myRet = LwpMotoUtils.secureinitialize(DOCMOTO_USERNAME, DOCMOTO_PWD, DOCMOTO_SERVER_URL)
        if not myRet['returnOK']:
            self.writeLog("DocMoto connection error " + str(myRet['responseString']))
            return
        self.process_dirs()
        self.process_files()
        return
The method starts by opening a connection with DocMoto. The variables DOCMOTO_USERNAME, DOCMOTO_PWD and DOCMOTO_SERVER_URL are set earlier in the "simpleMonitor" script.
If successful, folders and files are sent for processing by the methods "process_dirs" and "process_files" respectively.
The process_dirs method
Essentially this method checks if a folder already exists, and if not then creates it.
    def process_dirs(self):
        for my_path in self._diffSnapshot.dirs_created:
            myURL = self._root
            # Remove unwanted part of path
            my_path = my_path.replace(self._path, '')
            # Check each dir en route
            my_dir_list = my_path.split("/")
            for my_dir in my_dir_list:
                if len(my_dir) > 0:
                    myURL = myURL + "/" + urllib.quote(str(my_dir))
                    myRet = LwpMotoUtils.checkExists(myURL)
                    if not myRet['exists']:
                        LwpMotoUtils.createCollection(myURL)
                        if LOG_MODE > 0:
                            self.writeLog("Dir created " + myURL)
Two loops are involved. The first loops through all the directories listed as being created in the snapshot difference object (self._diffSnapshot.dirs_created).
The second handles the presence of any sub folders. This is achieved by splitting the path into a list (my_dir_list) and looping through the list checking for the folder in DocMoto using the "checkExists" method.
Any values passed to DocMoto MUST be valid URLs (you cannot for example have a space in a folder name). The python urllib library provides an ideal function for this in the "quote" method.
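The path-to-URL step can be sketched on its own as follows. This is an illustration, not code from the bundle: the function name collection_urls and the example paths are hypothetical. It uses os.path.relpath to strip the monitored root, as an alternative to str.replace (which would also remove any later occurrence of the root string inside the path), and imports "quote" in a way that works on both Python 2 and Python 3.

```python
import os

try:
    from urllib import quote          # Python 2, as used in the article
except ImportError:
    from urllib.parse import quote    # Python 3


def collection_urls(remote_root, monitored_root, local_dir):
    """List the URL of every folder level from remote_root down to local_dir."""
    relative = os.path.relpath(local_dir, monitored_root)
    urls = []
    url = remote_root
    for part in relative.split(os.sep):
        if part and part != ".":
            # Quote each path segment separately so spaces become %20
            url = url + "/" + quote(part)
            urls.append(url)
    return urls
```

For a new local folder "My Scans/2014 Q1" under the monitored folder and a remote root of "/monitortest", this returns "/monitortest/My%20Scans" followed by "/monitortest/My%20Scans/2014%20Q1". Each URL in turn would be passed to "checkExists" and, where missing, to "createCollection".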
The process_files method
The process_files method effectively passes the file to the "sendFile" method, and takes care of any post-upload operations.
    def process_files(self):
        for my_path in self._diffSnapshot.files_created:
            myURL = self._root
            myFullFileName = my_path
            # Remove unwanted part of path to get the name only
            myFileName = my_path.replace(self._path, '')
            myURL = myURL + urllib.quote(str(myFileName))
            if self.sendFile(myURL, myFullFileName):
                if REMOVE_AFTER_UPLOAD == 1:
                    self.archiveFile(myFileName, myFullFileName)
                elif REMOVE_AFTER_UPLOAD == 2:
                    self.removeFile(myFullFileName)
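The archiveFile and removeFile methods are not reproduced in this article. As a rough sketch of what such post-upload handling might look like, the following uses only the standard library. The function name after_upload, the archive_root parameter, and the assumption that a value of 0 means "leave the file in place" are all hypothetical; only the values 1 (archive) and 2 (remove) come from the code above.

```python
import os
import shutil

# 1 and 2 mirror the REMOVE_AFTER_UPLOAD values used by process_files;
# treating 0 as "keep the file" is an assumption.
KEEP, ARCHIVE, REMOVE = 0, 1, 2


def after_upload(local_path, relative_name, mode, archive_root):
    """Archive, delete, or keep a local file once it has been uploaded."""
    if mode == ARCHIVE:
        # Recreate the file's relative location under the archive folder
        target = os.path.join(archive_root, relative_name.lstrip("/"))
        target_dir = os.path.dirname(target)
        if not os.path.isdir(target_dir):
            os.makedirs(target_dir)
        shutil.move(local_path, target)
        return target
    if mode == REMOVE:
        os.remove(local_path)
    return None
```

Moving the file out of the monitored folder after a successful upload also keeps the monitored folder small, so each scan has less to walk.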
The sendFile method
The sendFile method starts by copying the file into a buffer, then streams this across to the DocMoto server.
The method uses the LwpMotoUtils.checkExists to see if the file already exists in DocMoto. If communication is via the standard ports (3983 or 3984) DocMoto will not accept the file if it already exists. If communication is via ports 4983 or 4984 then DocMoto will "append" the new file to the existing one by creating a new version.
    def sendFile(self, myURL, fileName):
        # Read the whole file into a buffer
        try:
            f = open(fileName, 'rb')
            buff = f.read()
            f.close()
        except (IOError, OSError):
            self.writeLog("Unexpected sendFile return error for file " + fileName)
            return False
        # Quick look to see if exists
        myRet = LwpMotoUtils.checkExists(myURL)
        if myRet['exists']:
            self.writeLog("File already exists error for file (will be ignored if on port 4983 or 4984) " + fileName)
        LwpMotoUtils.createPut(myURL, buff)
        # Add any properties. Correspond to DocMoto tags.
        # The XML body is not shown here; it sets the "comments" tag
        # (comment text: "Moved up by file monitor app").
        request = """
        """
        # Confirm
        LwpMotoUtils.createPropPatch(myURL, request)
        LwpMotoUtils.createVersionControl(myURL)
        if LOG_MODE > 0:
            self.writeLog("File created " + fileName)
        return True
The sendFile method makes use of three DocMoto utility calls to send the file: "createPut", "createPropPatch" and "createVersionControl". Note that we are sending tag information using "createPropPatch". In this case we are supplying a value for the "comments" tag. We can include other tags by extending the XML sent by "createPropPatch".
NB: If using the pure WebDAV ports (4983 or 4984), sendFile will generate two versions, since the createPropPatch action creates a new version under these ports. If you want to avoid two versions you can omit the createPropPatch and createVersionControl method calls. The only downside is that you cannot then set properties such as a comment.
Clean up, archiving and logs
Just for completeness the script also has routines to clean up, make notes in log files and move all files uploaded to an archive area. These can be enabled/disabled with the use of simple flags defined at the start of the script.
Setting it up
In DocMoto, create a folder where the files are to be uploaded, for example "monitortest".
Download the zip file containing the scripts. Copy the uncompressed folder into your home directory. Create a folder (also in your home directory) called "test". This will be the folder we are monitoring.
Add values for the variables listed at the top of the simpleMonitor.py script. You will need a DocMoto server URL as well as a valid DocMoto user name and password.
Note: You should generally not use an actual user's name and password. This is because a user cannot be logged in twice, so if the script logs in whilst the user is connected the user will be logged out.
The script needs to be running to perform its monitoring. This can be simply achieved by running the script from a terminal window and leaving the window open.
The script can be stopped by pressing Ctrl+C.
However, this isn't very robust. A much better solution is to use a launchd property list (plist). Plists are a great way to ensure a process runs permanently, even after a reboot.
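As an illustration of the plist approach, the sketch below builds a launchd job definition with Python's plistlib. The function name build_launch_agent, the label, and every path in the usage note are placeholders rather than values from the bundle. Label, ProgramArguments, WorkingDirectory, RunAtLoad and KeepAlive are standard launchd keys; WorkingDirectory matters here because simpleMonitor.py uses a relative path for the monitored folder.

```python
import plistlib

# plistlib.dumps exists on Python 3.4+; Python 2.7 offers writePlistToString.
_to_plist = getattr(plistlib, "dumps", None) or plistlib.writePlistToString


def build_launch_agent(label, python_path, script_path, working_dir):
    """Return the plist XML for a launchd job that keeps the script running."""
    job = {
        "Label": label,
        "ProgramArguments": [python_path, script_path],
        "WorkingDirectory": working_dir,
        "RunAtLoad": True,   # start as soon as the job is loaded
        "KeepAlive": True,   # restart the script if it exits
    }
    return _to_plist(job)
```

A typical use would be to call build_launch_agent("com.example.docmoto.monitor", "/usr/bin/python", "/Users/me/scripts/simpleMonitor.py", "/Users/me/scripts"), write the result to a .plist file in ~/Library/LaunchAgents, and load it with "launchctl load". Adjust the label and paths to your own installation.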
With the script running drop some folders and files into the "test" folder. Within a few seconds (depending on how many you dropped in) you should see the files listed within DocMoto.
If not then you will need to check the settings. The script does write a log in a file called "activity.log" within the script folder which can be a useful source of debug information.
Limitations of file monitoring
File monitoring in this form does have its limitations.
There is no filtering of files so temporary files and working files (such as .DS_Store) get copied.
There is very little scope for tagging files and no scope for renaming files (particularly when using scanners which often give a file a name based on the date and time).
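The lack of filtering noted above could be addressed with a small check before each upload. The following is a sketch only: the function name should_upload and the pattern list are hypothetical and are not part of the downloadable script, and the patterns would need tuning for the applications that write into the monitored folder.

```python
import fnmatch
import os

# Hypothetical ignore list; adjust to the files your applications create.
IGNORE_PATTERNS = [".DS_Store", "._*", "~$*", "*.tmp", "*.part"]


def should_upload(path, patterns=IGNORE_PATTERNS):
    """Return False when the file name matches any ignore pattern."""
    name = os.path.basename(path)
    return not any(fnmatch.fnmatch(name, pattern) for pattern in patterns)
```

Calling should_upload on each path in the "files_created" list, and skipping those for which it returns False, would keep files such as .DS_Store out of DocMoto.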
The technology detailed in this article can form the basis of powerful integrations and subsystems. If you are interested in importing files, particularly from scanners, we recommend DocMoto's "Import Profile" technology. For an introduction take a look at our video An Introduction to Import Profiles.