Monitoring is an essential aspect of software engineering. One aspect of monitoring is watching files for changes and modifications. There are several libraries that can be used to monitor files and directories for changes. These include
- inotify
- guard
- watchdog
- etc
In this tutorial we will be exploring watchdog – a simple but powerful library for monitoring and watching file systems for modifications and events.
By the end of this tutorial you will explore
- How to install watchdog
- The main components of the watchdog library
- How to use watchdog to monitor files and directories
- How to use watchdog to trigger backups of folders
- How to use watchdog for monitoring network drives
- How to monitor patterns
- etc
Installing Watchdog
You can install watchdog via pip as below
pip install watchdog
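To confirm that the install worked, one option is to ask Python which version it can see. This is only a small sketch using the standard library's importlib.metadata; the helper name installed_version is made up for illustration and is not part of watchdog.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints the watchdog version if the install succeeded, otherwise None
print(installed_version("watchdog"))
```

If this prints None, pip most likely installed watchdog into a different Python environment from the one you are running.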
Components of Watchdog
In order to use watchdog there are 4 essential components that you should be familiar with.
These comprise
- The Observer(): used to observe or watch for changes in a directory or file system. It schedules watching directories and dispatches calls to event handlers.
- Events: as the name suggests, an event is anything that happens to a file or directory, such as a creation, modification, deletion, move, or closing of a file.
- Handler: this handles the events and responds to them as specified. This can be a
- LoggingEventHandler
- FileSystemEventHandler
- PatternMatchingEventHandler
- etc
- Directory: this is the path or directory we are observing or monitoring. An event that happens in a directory under observation can be emitted for a file or a directory (event.is_directory).
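To make these components concrete, here is a minimal sketch of what a handler receives. It assumes watchdog is installed; RecordingHandler is a hypothetical name, and the events are built by hand and passed to dispatch() only to stand in for what an Observer would normally deliver.

```python
from watchdog.events import (
    DirModifiedEvent,
    FileCreatedEvent,
    FileSystemEventHandler,
)


class RecordingHandler(FileSystemEventHandler):
    """Hypothetical handler that remembers every event it is given."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def on_any_event(self, event):
        # Every event carries its type, its path, and whether it is a directory
        self.seen.append((event.event_type, event.src_path, event.is_directory))


handler = RecordingHandler()
# Hand-made events stand in for what an Observer would normally dispatch
handler.dispatch(FileCreatedEvent("TESTFOLDER/report.csv"))
handler.dispatch(DirModifiedEvent("TESTFOLDER"))

for event_type, src_path, is_directory in handler.seen:
    print(event_type, src_path, is_directory)
```

In a real script the Observer calls dispatch() for you; the handler only needs to decide what to do with each event.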
Let us see the basics of using watchdog
How to use watchdog to monitor files and directories
Suppose we want to monitor a directory (TESTFOLDER) for any modifications and events that happen. We can use watchdog for that.
# monitor.py
import sys
import time
import logging
from watchdog.observers import Observer
from watchdog.events import LoggingEventHandler

if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO,
                        format='%(asctime)s -%(process)d - %(message)s',
                        datefmt='%Y-%m-%d %H:%M:%S')
    # Path to Directory we are monitoring
    path = sys.argv[1] if len(sys.argv) > 1 else '.'
    # Handler to Display the Events to our console via logging
    event_handler = LoggingEventHandler()
    observer = Observer()
    observer.schedule(event_handler, path, recursive=True)
    observer.start()
    try:
        while True:
            time.sleep(1)
    finally:
        observer.stop()
        observer.join()
In our example above we are using LoggingEventHandler to display the changes we are observing to our console.
Passing recursive=True to observer.schedule() tells watchdog to also watch the sub-directories and files inside the directory.
We can now run the file as below, pointing it at the directory we are monitoring
python3 monitor.py TESTFOLDER
When we create or make modifications within our TESTFOLDER, we will see the logs in our console.
This is quite useful, but it would be better to store the logs in a log file for future use. Let us see how to do that.
Saving Logging Data to File with Watchdog
In order to do so we can use the FileHandler from the logging module; however, there is a simpler way with just the basicConfig of the logging module.
logging.basicConfig(filename="dev.log",
                    filemode='a', level=logging.INFO,
                    format='%(asctime)s -%(process)d - %(message)s',
                    datefmt='%Y-%m-%d %H:%M:%S')
By adding the filename and filemode arguments we can see our results being piped to a log file of our choosing.
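For comparison, the FileHandler route mentioned above could look roughly like this. It is a sketch rather than the tutorial's own method: the helper name add_file_log is hypothetical, and it attaches to the root logger by default because that is where LoggingEventHandler sends its messages unless told otherwise.

```python
import logging


def add_file_log(log_path, logger=None):
    """Attach an appending FileHandler to a logger and return the handler."""
    if logger is None:
        logger = logging.getLogger()  # root logger
    logger.setLevel(logging.INFO)
    handler = logging.FileHandler(log_path, mode="a")
    handler.setFormatter(logging.Formatter(
        "%(asctime)s -%(process)d - %(message)s",
        datefmt="%Y-%m-%d %H:%M:%S"))
    logger.addHandler(handler)
    return handler
```

Calling add_file_log("dev.log") before starting the observer would then replace the basicConfig call. The advantage of a handler is that you can keep a console handler alongside it, which basicConfig with filename does not give you.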
How to Add User and Process ID to our Logs
For effective accounting and debugging it would be useful to also have the process ID and the user ID in our logs. The process ID is already covered by %(process)d in the format string.
For the user we can use getpass. With getpass.getuser() we can get the currently logged-in user for our system and add that to our log format via a LoggerAdapter or Filter, or with this simple trick of concatenating the user to our format string.
import getpass
user = getpass.getuser()
logging.basicConfig(filename="dev.log",
                    filemode='a', level=logging.INFO,
                    format='%(asctime)s -%(process)d - %(message)s' + f' {user}',
                    datefmt='%Y-%m-%d %H:%M:%S')
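The Filter route mentioned above can be sketched as follows. This is one possible shape, not the only one: UserFilter and add_user_to_handler are hypothetical names, and the %(user)s field only exists because the filter adds it to each record.

```python
import getpass
import logging


class UserFilter(logging.Filter):
    """Attach a user name to every record passing through."""

    def __init__(self, user=None):
        super().__init__()
        # Fall back to the currently logged-in user when none is given
        self.user = user or getpass.getuser()

    def filter(self, record):
        record.user = self.user
        return True  # never drop the record, only annotate it


def add_user_to_handler(handler, user=None):
    """Make a logging handler print the user after each message."""
    handler.addFilter(UserFilter(user))
    handler.setFormatter(logging.Formatter(
        "%(asctime)s -%(process)d - %(message)s %(user)s",
        datefmt="%Y-%m-%d %H:%M:%S"))
    return handler
```

Compared with concatenating the name into the format string, a filter keeps the user as a proper field on the record, so a user name containing a % character cannot break the format.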
Let us combine everything into a complete script
# monitor.py
import sys
import time
import logging
from watchdog.observers import Observer
from watchdog.events import LoggingEventHandler
import getpass

if __name__ == "__main__":
    user = getpass.getuser()
    logging.basicConfig(filename="dev.log",
                        filemode='a', level=logging.INFO,
                        format='%(asctime)s -%(process)d - %(message)s' + f' {user}',
                        datefmt='%Y-%m-%d %H:%M:%S')
    path = sys.argv[1] if len(sys.argv) > 1 else '.'
    event_handler = LoggingEventHandler()
    observer = Observer()
    observer.schedule(event_handler, path, recursive=True)
    observer.start()
    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        observer.stop()
    observer.join()
So far we have seen how to use watchdog for observing and monitoring files, but we can do more with watchdog.
Let us see how to use watchdog and Python to do backups when there are changes to our directory
How to use watchdog to trigger backups of folders
Using shutil or any backup software, you can use watchdog to monitor if there are changes to a file system or directory and then trigger automatic backups when an event is emitted.
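As a rough idea of the backup half of this, the sketch below copies a single file into a backup folder under a timestamped name. The helper name backup_file and the naming scheme are assumptions made for illustration; only shutil.copy2 and the other standard library calls are real.

```python
import os
import shutil
import time


def backup_file(source, backup_dir):
    """Copy one file into backup_dir under a timestamped name and return the new path."""
    os.makedirs(backup_dir, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    name, ext = os.path.splitext(os.path.basename(source))
    destination = os.path.join(backup_dir, f"{name}-{stamp}{ext}")
    shutil.copy2(source, destination)  # copy2 also preserves file timestamps
    return destination
```

A handler could call backup_file(event.src_path, 'path/to/BACKUP/') from its on_modified method whenever event.is_directory is false. Keep the backup folder outside the watched directory, otherwise every copy triggers a new event.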
Basic Example
Watchdog’s EventHandlers have certain methods such as on_modified, on_created, etc. that we can enhance or override with our own custom functions. We can do so either directly or by creating subclasses via inheritance
import sys
import time
import logging
from watchdog.observers import Observer
from watchdog.events import LoggingEventHandler
import getpass
import shutil
import os

def on_modified(event):
    dest_path = '/home/jcharis/Documents/JLabs/Tuts/FebTuts/BACKUP/'
    # fetch all files (path is the module-level variable set in the main block)
    for file_name in os.listdir(path):
        # construct full file path; os.path.join adds the separator if it is missing
        source = os.path.join(path, file_name)
        destination = os.path.join(dest_path, file_name)
        # copy only files
        if os.path.isfile(source):
            shutil.copy(source, destination)
            print('copied', file_name)

if __name__ == "__main__":
    user = getpass.getuser()
    logging.basicConfig(filename="dev.log",
                        filemode='a', level=logging.INFO,
                        format='%(asctime)s -%(process)d - %(message)s' + f' {user}',
                        datefmt='%Y-%m-%d %H:%M:%S')
    path = sys.argv[1] if len(sys.argv) > 1 else '.'
    event_handler = LoggingEventHandler()
    # replace the default on_modified with our backup function
    event_handler.on_modified = on_modified
    observer = Observer()
    observer.schedule(event_handler, path, recursive=True)
    observer.start()
    try:
        while True:
            time.sleep(1)
    finally:
        observer.stop()
        observer.join()
In the example above we just created a function with the same name as the handler's default on_modified method.
def on_modified(event):
    dest_path = '/home/jcharis/Documents/JLabs/Tuts/FebTuts/BACKUP/'
    # fetch all files
    for file_name in os.listdir(path):
        # construct full file path
        source = os.path.join(path, file_name)
        destination = os.path.join(dest_path, file_name)
        # copy only files
        if os.path.isfile(source):
            shutil.copy(source, destination)
            print('copied', file_name)
Then we assigned it to the on_modified attribute of our event_handler, overriding the default.
event_handler = LoggingEventHandler()
event_handler.on_modified = on_modified
Alternatively you can use the class option as below
import os
import shutil
import sys
import time
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

class MonitorFolder(FileSystemEventHandler):
    FILE_SIZE = 1000

    def on_created(self, event):
        print(event.src_path, event.event_type)

    def on_modified(self, event):
        print(event.src_path, event.event_type)
        dest_path = 'path/to/BACKUP/'
        # src_path may be a file or a directory, so work out the folder to back up
        folder = event.src_path if event.is_directory else os.path.dirname(event.src_path)
        # fetch all files
        for file_name in os.listdir(folder):
            # construct full file path
            source = os.path.join(folder, file_name)
            destination = os.path.join(dest_path, file_name)
            # copy only files
            if os.path.isfile(source):
                shutil.copy(source, destination)
                print('copied', file_name)

if __name__ == "__main__":
    src_path = sys.argv[1]
    event_handler = MonitorFolder()
    observer = Observer()
    observer.schedule(event_handler, path=src_path, recursive=True)
    print("Monitoring started")
    observer.start()
    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        observer.stop()
    observer.join()
With Watchdog we can even monitor network drives or shared folders via the PollingObserver() (replace Observer with this; it is imported from watchdog.observers.polling). This opens up some cool tasks, such as logging which files were modified by users, triggering backups, or reacting when people delete files.
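A minimal sketch of the swap could look like this, assuming watchdog is installed. ChangeCollector and watch_by_polling are hypothetical names, and the function watches for a fixed number of seconds instead of looping forever so that it is easy to try out.

```python
import time

from watchdog.events import FileSystemEventHandler
from watchdog.observers.polling import PollingObserver


class ChangeCollector(FileSystemEventHandler):
    """Hypothetical handler that records the paths of the events it sees."""

    def __init__(self):
        super().__init__()
        self.paths = []

    def on_any_event(self, event):
        self.paths.append(event.src_path)


def watch_by_polling(path, seconds, handler, interval=1.0):
    """Poll path for changes for a fixed number of seconds, then stop."""
    # timeout is how long the observer waits between directory snapshots
    observer = PollingObserver(timeout=interval)
    observer.schedule(handler, path, recursive=True)
    observer.start()
    try:
        time.sleep(seconds)
    finally:
        observer.stop()
        observer.join()
    return handler
```

Polling compares directory snapshots instead of relying on operating system notifications, which is why it can work on mounts where those notifications never arrive. The trade-off is that changes are only noticed at the next poll, and large trees cost more to scan.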
How to Use Watchdog to Monitor Patterns
With Watchdog's PatternMatchingEventHandler you can monitor files matching a particular pattern, e.g. CSV or Python files, and perform an action when a matching file is created, modified, or deleted.
# monitor.py
import sys
import time
from watchdog.observers import Observer
from watchdog.events import PatternMatchingEventHandler

class Handler(PatternMatchingEventHandler):
    def __init__(self) -> None:
        PatternMatchingEventHandler.__init__(self, patterns=['*.csv'],
                                             ignore_directories=True,
                                             case_sensitive=False)

    def on_created(self, event):
        print("A new create event was made", event.src_path)

    def on_modified(self, event):
        print("A new modified event was made", event.src_path)

    def on_deleted(self, event):
        print("A deletion event was made", event.src_path)

if __name__ == '__main__':
    # Directory or File to be monitored
    path = sys.argv[1] if len(sys.argv) > 1 else '.'
    event_handler = Handler()  # use our custom pattern handler here
    observer = Observer()
    observer.schedule(event_handler, path, recursive=True)
    observer.start()
    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        observer.stop()
    observer.join()
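To see which files a pattern actually lets through, you can feed a handler some hand-made events without starting an observer. This is only a sketch assuming watchdog is installed; CsvCollector and the file names are invented for the demonstration.

```python
from watchdog.events import FileCreatedEvent, PatternMatchingEventHandler


class CsvCollector(PatternMatchingEventHandler):
    """Hypothetical handler that keeps the paths of created CSV files."""

    def __init__(self):
        super().__init__(patterns=["*.csv"],
                         ignore_directories=True,
                         case_sensitive=False)
        self.created = []

    def on_created(self, event):
        self.created.append(event.src_path)


collector = CsvCollector()
# Synthetic events: only paths matching the pattern should reach on_created
for name in ["sales.csv", "SALES.CSV", "notes.txt"]:
    collector.dispatch(FileCreatedEvent(name))

print(collector.created)
```

With case_sensitive=False the upper-case name is accepted as well, while the text file never reaches on_created.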
Watchdog and inotify are great tools, and some web frameworks use them to restart their server when files change, e.g. Streamlit, or Flask/FastAPI when reload is set to true.
There are several applications.
I hope this was useful.
Check out the video tutorial below for more.
Thank You For Your Attention
Jesus Saves
By Jesse E.Agbe(JCharis)