A GetMail IDLE daemon script

Updates

Although the script in this article works, I'm having some problems with it after long-running sessions. The symptom seems to be that imaplib2 just stops processing IDLE session responses - it terminates and recreates them just fine, but no new mail is ever detected and thus getmail is never triggered. With 12 or so hours of usage out of the script, this seems odd as hell and probably like an imaplib2 bug.

With the amount of time sunk into this, I'm tempted to go in 1 of 2 directions: re-tool the script to simply invoke getmail's IDLE functionality and basically remove imaplib2 from the equation, or write my own functions to read IMAP and use the IDLE command.

Currently I'm going with option 3: turn imaplib2's debugging up to max and see if I can spot the bug. At the moment I can't really recommend this particular approach to anyone, since it's just not reliable enough - though it does somewhat underline the fact that Python really doesn't have a good IMAP IDLE library.

Updates 2

After another long-running session of perfect performance, I'm once again stuck with a process that claims to start idling successfully, but seems to hang - giving no exceptions or warnings of any kind and only doing so after 8+ hours of perfect functioning. It's not a NAT issue since this is far short of the 5-day default timeout.

At a best guess, the problem is that once logged in, imaplib2 leaves the session open but dumbly just listens to the socket. The socket eventually dies for some reason (my ISP re-assigning IPs, maybe?), but imaplib2's "reader" thread just blocks on polling rather than triggering the callback code. The notable thing is that in the log I can see the poll commands stop and the session timeout being detected, but no invocation of the callback.
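
One generic mitigation for a socket that dies without telling the reader is TCP keepalive, which makes the kernel probe an idle connection so that a blocked read eventually fails instead of waiting forever. The following is a minimal Python 3 sketch of that idea only: it is not something the script does and not an imaplib2 feature, `enable_keepalive` is a hypothetical helper, the timing values are arbitrary, and the `TCP_KEEP*` options are platform-specific (hence the `hasattr` guards). How to reach the underlying socket from imaplib2 isn't shown, and whether this would cure the hang described above is untested.

```python
import socket


def enable_keepalive(sock, idle=60, interval=15, probes=4):
    """Ask the kernel to probe an idle TCP connection (hypothetical helper)."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The fine-grained knobs below only exist on some platforms (e.g. Linux).
    if hasattr(socket, "TCP_KEEPIDLE"):
        # Seconds of silence before the first probe is sent.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    if hasattr(socket, "TCP_KEEPINTVL"):
        # Seconds between unanswered probes.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    if hasattr(socket, "TCP_KEEPCNT"):
        # Unanswered probes before the connection is declared dead.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
    return sock
```

With the example values, a dead peer would surface as a socket error after roughly two minutes (60 + 4 x 15 seconds) on platforms that honour all three options.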

As it stands, I have to strongly recommend against using imaplib2 for any long-running processes like IDLE - you simply can't deal with a library that's going to silently hang itself after a half-day or so without crashing or logging anything to indicate it has happened. The only detection is when self-addressed emails don't arrive, but that's a really stupid keep-alive protocol. I'll be retooling the script to try out imapclient next, but that will be a future article and a separate gist.

Why

This is a script which took way too long to come together in Python 2.7 using imaplib2 (pip install imaplib2).

The basic idea is to use the very reliable GetMail4 (apt-get install getmail4) - which is written in Python - to poll my IMAP mail accounts when new mail arrives, rather than with a 1-minute cronjob as I had been doing (which is slightly too slow for how we use email these days, may not be liked by some mail servers, and is resource intensive to boot).

The big benefit here is rapid mail delivery, but the other benefit is that it solves the problem of cron causing overlapping executions of getmail, which can lead to blank messages (though not message loss). Other ways of solving this, such as wrapping the cron job in a flock call, aren't great, since if the lockfiles don't get cleaned up it will just silently stop working.

Requirements

Writing a reliable IDLE daemon that won't leave us spending half a day wondering where our email has gone is not easy. This was an interesting Python project for me, and the result is certainly not pretty or long - I mostly spent a ton of time trying to think through as many edge cases as I could. In the end, I settled on tying the daemon itself to sendmail on my system, so that if it crashes or an upstream server goes offline I'm at least notified and have a decent chance of seeing why. The use of pid files means I can have cron re-execute it as a failsafe every 5 minutes if it does go down.
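
The pid-file guard mentioned above can be sketched roughly as follows. This is a stdlib-only Python 3 illustration and not the code from the script, which depends on psutil and may well do this differently; `pid_is_running` and `acquire_pidfile` are hypothetical names. It relies on the POSIX idiom of sending signal 0 with `os.kill` to test whether a process exists.

```python
import errno
import os


def pid_is_running(pid):
    """Return True if a process with this pid exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 checks for existence, sends nothing
    except OSError as exc:
        # EPERM means the process exists but belongs to another user.
        return exc.errno == errno.EPERM
    return True


def acquire_pidfile(path):
    """Write our pid to path unless a live process already owns it."""
    try:
        with open(path) as handle:
            old_pid = int(handle.read().strip())
    except (IOError, OSError, ValueError):
        old_pid = None  # missing or corrupt pid file: treat it as stale
    if old_pid is not None and old_pid != os.getpid() and pid_is_running(old_pid):
        return False  # another instance is alive, so this attempt should exit
    with open(path, "w") as handle:
        handle.write(str(os.getpid()))
    return True
```

A cron-started copy would call `acquire_pidfile` first and exit quietly when it returns False. The check-then-write here is not atomic, so two copies started at the same instant could both pass; a pid can also be reused by an unrelated process, which is one reason to compare process names as well.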

The Script

I started with the example I found here but ended up modifying it pretty heavily. That code isn't a great approach in my opinion, since it overwhelms the stack size pretty quickly with multiple accounts. imaplib2 is multithreaded behind the scenes (2 threads per account), so spawning an extra thread to handle each account gives you 3 per account; 6 accounts gives you 18 threads, plus the overhead of forking and running GetMail in a subprocess.

All things considered, I didn't improve things all that much, but using a single overwatch thread to reset the IDLE call on each object is simpler to handle (although IMO the code doesn't present it that way). The important thing is that it works.
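
The single-overwatch-thread idea can be sketched independently of imaplib2. In the Python 3 sketch below, `IdleSession` is a hypothetical stand-in for one account's connection and `renew` stands for "end the current IDLE and issue a new one"; none of these names come from the script. The 29-minute interval follows RFC 2177's advice that clients re-issue IDLE at least every 29 minutes.

```python
import time

IDLE_RENEW_SECONDS = 29 * 60  # RFC 2177 advises re-issuing IDLE within 29 minutes


class IdleSession(object):
    """Hypothetical stand-in for one account's IMAP connection."""

    def __init__(self, name, now):
        self.name = name
        self.renewals = 0
        self.deadline = now + IDLE_RENEW_SECONDS

    def renew(self, now):
        # A real session would end the current IDLE and start a new one here.
        self.renewals += 1
        self.deadline = now + IDLE_RENEW_SECONDS


def renew_due_sessions(sessions, now):
    """Renew every session whose deadline has passed; return their names."""
    due = [s for s in sessions if s.deadline <= now]
    for session in due:
        session.renew(now)
    return [s.name for s in due]


def supervise(sessions, stop_event, poll_seconds=5.0, clock=time.time):
    """Body of the one supervising thread shared by all accounts."""
    while not stop_event.is_set():
        renew_due_sessions(sessions, clock())
        stop_event.wait(poll_seconds)  # sleeps, but wakes early on shutdown
```

`supervise` would run in one `threading.Thread` with a `threading.Event` used for shutdown. Assuming imaplib2's two threads per account, six accounts would then need 13 threads rather than 18.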

Download

The script is quite long, so grab it from the Gist. It has a few dependencies, best installed with pip:

$ pip install imaplib2 psutil
$ ./getmail-idler.py -h
usage: getmail-idler.py [-h] [-r GETMAILRC] [--pid-file PIDFILE] [--verbose]
                        [--daemonize] [--logfile LOGFILE]

optional arguments:
  -h, --help            show this help message and exit
  -r GETMAILRC          getmail configuration file to use (can specify more
                        then once)
  --pid-file PIDFILE, -p PIDFILE
                        pidfile to use for process limiting
  --verbose, -v         set output verbosity
  --daemonize           should process daemonize?
  --logfile LOGFILE     file to redirect log output too (useful for daemon
                        mode)

It uses a comprehensive argparse interface; the most important parameter is -r. This works exactly like getmail's -r option and accepts files in the same format. It doesn't search all of the same locations, though it will search $HOME/.getmail/.
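
For illustration, the option set in the help output above can be reproduced with argparse roughly like this. It is a Python 3 reconstruction from the help text, not the Gist's code: `build_parser` and `resolve_rcfile` are hypothetical names, and the lookup order (the path as given first, then $HOME/.getmail/) is an assumption.

```python
import argparse
import os


def build_parser():
    parser = argparse.ArgumentParser(prog="getmail-idler.py")
    # action="append" lets -r be given several times, one per account.
    parser.add_argument("-r", dest="getmailrc", action="append", default=[],
                        help="getmail configuration file to use")
    parser.add_argument("--pid-file", "-p", dest="pidfile",
                        help="pidfile to use for process limiting")
    # action="count" turns -v / -vv into verbosity levels 1 / 2.
    parser.add_argument("--verbose", "-v", action="count", default=0,
                        help="set output verbosity")
    parser.add_argument("--daemonize", action="store_true",
                        help="should process daemonize?")
    parser.add_argument("--logfile", help="file to redirect log output to")
    return parser


def resolve_rcfile(name, home=None):
    """Return name if it exists, else fall back to $HOME/.getmail/name."""
    if os.path.isfile(name):
        return name
    home = home or os.path.expanduser("~")
    return os.path.join(home, ".getmail", name)
```

Calling `build_parser().parse_args(["-r", "config1.getmailrc", "-r", "config2.getmailrc", "-vv"])` yields a two-item `getmailrc` list and a `verbose` level of 2.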

Currently it only handles IMAPSSL, which you should be using anyway. It should be easy to hack it to support other connection types; I just have no incentive to at the moment.

Currently I use this with a cronjob set to every minute or every 5 minutes. Even with verbose logging (-vv) it won't produce output until it forks into a daemon. This means that if it crashes (and I've tried to make it crash reliably), cron will restart it on the next round and it'll email a traceback (hopefully).
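
The crash email mentioned above could be wired up along these lines. This is a Python 3 sketch under stated assumptions rather than the script's actual code: `build_crash_report` and `send_via_sendmail` are hypothetical names, the addresses are supplied by the caller, and the sendmail path is a guess that varies between systems.

```python
import subprocess
import traceback
from email.mime.text import MIMEText


def build_crash_report(exc, sender, recipient):
    """Format an exception and its traceback as a plain-text email."""
    body = "".join(traceback.format_exception(type(exc), exc, exc.__traceback__))
    message = MIMEText(body)
    message["Subject"] = "getmail-idler crashed: %s" % type(exc).__name__
    message["From"] = sender
    message["To"] = recipient
    return message


def send_via_sendmail(message, sendmail_path="/usr/sbin/sendmail"):
    """Pipe the message to a local sendmail binary (path is an assumption)."""
    # -t reads recipients from the headers, -oi stops a lone "." ending input.
    proc = subprocess.Popen([sendmail_path, "-t", "-oi"], stdin=subprocess.PIPE)
    proc.communicate(message.as_string().encode("utf-8"))
    return proc.returncode
```

The daemon's outermost `try`/`except Exception` would build the report, send it, and then re-raise so the process still exits and cron can restart it.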

My current crontab using this script:

* * * * * /home/will/bin/getmail-idler.py -r config1.getmailrc -r config2.getmailrc -r config3.getmailrc -r config4.getmailrc -r config5.getmailrc --pid-file /tmp/will-getmail-idler.pid --logfile .getmail-idler.log -vv --daemonize

Personal thoughts

I'm pretty pleased with how this turned out (edit: see the updates section at the top on how that's changed - I'm happy with the script, less happy with imaplib2), since it was a great exercise for me in learning some new things about Python. That said, compared to something like NodeJS, I feel that with the right library this would've been faster to write in a language with great eventing support, rather than Python's weird middle-ground of "not quite parallel" threads. But I keep coming back to the language, and the demo code I used here was Python, so it must be doing something right.

I'll probably keep refining this if I run into problems, but if it doesn't actually stop working I'll leave it alone. The biggest downside of the whole self-hosted email thing is when your listener dies and you stop getting email, and that's the problem I've really tried to solve here: IDLE push email functionality, and highly visible notifications when something is wrong.