A program called "procmail" provides many sophisticated e-mail filtering options. It is configured by the file ".procmailrc" in your home directory, and it lets you make decisions based on various attributes of each e-mail message, such as filing it, forwarding it, or discarding it.
Writing a .procmailrc
Here are the basics of a .procmailrc. You might want to skip down to the example, then look back and forth as desired.
A .procmailrc consists of a sequence of clauses, called "recipes". Each recipe contains three parts: a header line, zero or more condition lines, and exactly one action line.
As for the header lines: Most simply, each recipe begins with either the line ":0" or ":0:", where ":0:" does file locking and ":0" does not. Use ":0:" for saving to a file (not including /dev/null), and use ":0" for forwarding to another e-mail address or for discarding (to /dev/null).
The condition lines begin with '*'. They are the criteria procmail uses to decide whether to follow this recipe; there is more about condition lines below. For each incoming e-mail message, procmail checks the recipes in order from the beginning of the file to its end until one matches. A recipe's condition lines are ANDed together, and zero condition lines count as "true". That lets you put a default-action recipe at the end of the file: it always matches, but procmail only reaches it if every earlier recipe's conditions were false, so it acts as the default case. If no recipe's conditions match, normal mail delivery occurs.
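A minimal sketch of both points, assuming a hypothetical sender "boss" and hypothetical destination files under /u/username: the first recipe matches only when both of its conditions hold, and the last recipe, having no conditions at all, acts as the default.

```procmail
# Both conditions must be true (they are ANDed) for this recipe to match.
:0:
* ^From:.*boss
* ^Subject:.*urgent
/u/username/urgent

# No condition lines: always true, so this catches every message
# that no earlier recipe has already delivered.
:0:
/u/username/everything-else
```

Note that with a catch-all like this at the end, no message ever falls through to normal mail delivery.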
Then there is an "action line", which is either a file name (with no introducer, i.e. the line is simply the file name), or an '!' followed by an e-mail address to forward to, or a '|' followed by a unix command to pipe the message to. Most of the time you will use a file name here.
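A short sketch of the two less common action types. The sender "mom" and the address fred@example.com are hypothetical, and the unix print command lpr is only a stand-in for whatever command you might pipe to. Both recipes use ":0" because no file is written.

```procmail
# Forward: '!' followed by an e-mail address.
:0
* ^From:.*mom
! fred@example.com

# Pipe: '|' followed by a unix command, which reads the
# message on its standard input.
:0
* ^Subject:.*print me
| lpr
```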
The "conditions" use unix regular expressions. In short, '.' matches any single character; most characters match themselves, case-insensitively (e.g. if you want to match the string "squid", you can simply type "squid", and this would also match "sQuId"); something followed by an asterisk matches zero or more of that (so for example ".*" matches any string of any length, including the empty string); something followed by a plus sign matches one or more of that; something followed by a question mark matches zero or one of that.
Also, '^' matches the beginning of the line, so we match (for example) a Subject: header field using "^Subject: " because it has to be at the beginning of a line. Backslashes can be used for quoting, e.g. "\." matches an actual period. And square brackets indicate a list of characters such that the expression matches any one, meaning that you can match "zero or more spaces or tabs in any order and any combination" (i.e. "optional whitespace") with "[ ]*", where that string is actually leftsquarebracket, space, tab, rightsquarebracket, asterisk. That is, the space versus tab really matters here and a sufficiently "user-friendly" editor might make this difficult to type correctly.
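A few illustrative conditions using the pieces described above, assuming hypothetical destination files under /u/username and a hypothetical mailing-list domain lists.example.org:

```procmail
# ".*" allows any text between "Subject:" and "squid"; matching is
# case-insensitive, so "SQUID" and "sQuId" are caught as well.
:0:
* ^Subject:.*squid
/u/username/squid

# "u?" makes the u optional: matches both "color" and "colour".
:0:
* ^Subject:.*colou?r
/u/username/colour

# "\." is a literal period, so "@listsXexampleXorg" would not match.
:0:
* ^From:.*@lists\.example\.org
/u/username/lists
```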
Finally, as an extremely special case, the string "^TO_" at the beginning of a regular expression matches the field name of any header line that lists an addressee. So "^TO_username" will match such header lines as "To: username" and "Cc: username", as well as many more obscure variants. (Note that it will also match "To: email@example.com" and "To: usernamessssssssssss".)
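For instance, a sketch that files mail addressed to a hypothetical mailing list "cs-seminar" (via To:, Cc:, or a similar addressee header) into a hypothetical file:

```procmail
# Any addressee header whose address begins with "cs-seminar".
:0:
* ^TO_cs-seminar
/u/username/seminar
```

As with "^TO_username", nothing ends the pattern after the name, so this would also match a longer address beginning the same way, such as a hypothetical "cs-seminar-admin".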
Here is a very simple example .procmailrc with an explanation. It files messages with the new "[PMX:SPAM]" tag in a different mail file named "spam", and it throws messages with the "[PMX:VIRUS]" tag away altogether.
:0:
* ^Subject:[ ]*\[PMX:SPAM]
/u/username/spam

:0
* ^Subject:[ ]*\[PMX:VIRUS]
/dev/null
Line 1: Beginning of a new recipe. Writing ":0:", as opposed to ":0", means that we want it to do file locking. That is because we are going to deliver to the actual file "/u/username/spam", so in case two messages come in at once with two different "procmail" programs running, we have to avoid interleaved writes for this file. All writing to files from .procmailrc should involve locking.
Line 2: The condition for this first recipe. The '*' just means that this is a condition line. Next, the '^' means that we are looking for matches at the beginning of a line. At the beginning of the line, we expect the characters "Subject:". If they are not present, that header line does not satisfy the condition, and procmail goes on to check the other header lines. After the colon come zero or more tabs and/or spaces, represented by those two characters in square brackets followed by an asterisk meaning zero-or-more. Note that if you copy and paste this from this web page, it's not going to work; you have to fix it up by typing an actual tab and a space (or copy from the file procmailrc). After that comes an actual left square bracket, which needs to be escaped with a backslash. Then comes PMX:SPAM, and then a closing square bracket. The closing square bracket is not special outside a bracket expression, so it needs no backslash; a backslash would be permitted, but unnecessary backslashes are a bad habit. Finally, there is a character similar to '^', namely '$', which matches the end of a line; since it is not used here, the condition does not require the line to end after the closing bracket. Normally there will be further characters on the message's Subject: line, and they won't interfere with the match.
Line 3: The action line. There could have been more than one condition line, but since this line begins with a non-asterisk, the condition lines are over. This action line says to deliver to the file /u/username/spam. Since line 1 had the terminating colon, the file /u/username/spam will be locked properly to avoid simultaneous delivery.
Line 4: A blank line to make it look pretty. Any number of blank lines are allowed between recipes or before the first recipe or after the last recipe.
Line 5: Beginning of a new recipe. ":0", as opposed to ":0:", means that we want it not to lock the delivery file. Since the delivery file (line 7) will be /dev/null, which is the special discard file, simultaneous delivery is harmless. It's also important to use ":0" (rather than ":0:") when there is no file involved, else procmail will give you an error message saying that it can't figure out which file to lock.
Line 6: The condition line for this second recipe. Very similar to line 2; this is matching the lines which are being tagged as viruses rather than spam; this is here so that we can do something different. Or line 7 could be the same as line 3 if we wanted to save viruses in /u/username/spam too; in this case, line 5 would have to be ":0:" to specify locking.
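That variant would look like the sketch below; as in the main example, the square brackets must contain a real space and a real tab.

```procmail
# Save virus-tagged messages in the spam file too. This writes to a
# real file again, so the header line needs ":0:" for locking.
:0:
* ^Subject:[ ]*\[PMX:VIRUS]
/u/username/spam
```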
Line 7: The action line for this second recipe. /dev/null is a special "device" file in unix whose driver simply reports success and discards whatever data you write to it. So procmail uses it to discard a message.
There is no line 8: Since the file ends at this point, if no recipes have matched then normal delivery occurs, to /var/mail/user (e.g. /var/mail/username in my case), with file-locking.
Testing
If you save a message into a file, you can simply type
procmail <messagefile
and then see what procmail did with it.
If you have a new .procmailrc candidate but you want to test it before renaming it to .procmailrc, you can specify the file name on the command-line:
procmail procmailrcfile <messagefile
If your message file does not begin with a 'From ' line (note the space), as is the case for the files in a "maildir" or those produced by mh, you need the option that fakes the 'From ' line, which is "-f-" (a minus sign before and after the 'f'):
procmail -f- procmailrcfile <messagefile
Rather than simply poking around to try to determine what procmail did and why, you might want to turn on logging, as described in the next section.
Logging
Some people recommend having procmail log everything it does, so that you can figure out later what happened, why legitimate mail was discarded as spam, etc.
In any case you might want to turn on logging during testing.
To log procmail actions to a file, you have to set some variables. Variable use looks much like that in 'sh'. Assignments look like "var=value"; variable interpolation looks like "$var".
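As a small illustration of assignment and interpolation outside of logging, the spam recipe could name its destination through a variable. SPAMFILE is a hypothetical name chosen for this sketch, not a special procmail variable.

```procmail
# Assignment: no spaces around '='.
SPAMFILE=/u/username/spam

# Interpolation: "$SPAMFILE" in the action line expands to the path.
:0:
* ^Subject:[ ]*\[PMX:SPAM]
$SPAMFILE
```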
First you have to set the file name for it to log to, as the value of the variable "LOGFILE". Example:
LOGFILE=$HOME/maillog
Then you have to tell it what to log. The "LOGABSTRACT" variable, when set to "all", logs something about every message. The "VERBOSE" variable, when set to "yes", logs even more, about every condition and every action. To turn them both on:
LOGABSTRACT=all
VERBOSE=yes
Altogether my earlier example file becomes:
LOGFILE=$HOME/maillog
LOGABSTRACT=all
VERBOSE=yes

:0:
* ^Subject:[ ]*\[PMX:SPAM]
/u/username/spam

:0
* ^Subject:[ ]*\[PMX:VIRUS]
/dev/null
At CS, procmail is activated by editing your .forward file to pass messages to procmail rather than delivering them. (Then procmail might deliver the message; or it might not.)
Consider making a .forward file that looks like this, e.g. if your username is "fred":
"|$MAILBIN/bin/procmail -f- #fred"
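This causes incoming e-mail to be piped to a program: it runs "procmail" with the e-mail message on its standard input. The double quotes are necessary. (As mentioned under "testing", above, the "-f-" makes it fake a From-space line if necessary. This may not be important in the .forward, but it's a standard precaution.) The "#fred" is a shell comment, so procmail never sees it; the procmail documentation describes it as a workaround for over-optimizing versions of sendmail, keeping each user's otherwise identical .forward line distinct.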