Log into a UNIX machine on the cluster that handles your email, and run /fs/domain/bin/setup_spamassassin
This command will set up your ~/.forward, ~/.procmailrc, and ~/.spamassassin/user_prefs files to scan your mail and filter out detected spam and viruses.
This command will fail if you already have a .forward, .procmailrc or .spamassassin/user_prefs file.
To override this check and move your old files aside, run
/fs/domain/bin/setup_spamassassin -f.
Make certain that you want to completely replace your existing files before telling the script to go ahead and move your old files aside. It won't delete your files, so you can always retrieve the originals, but it will report that it's moving them into a backup folder and then replace them completely, without regard for their previous contents. We leave it up to you to know what you're doing if you run this script with -f. When in doubt, ask for help.
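Before deciding whether you need -f, it can help to see which of the three files already exist. The following is a minimal sketch, not part of the setup script; the function name existing_mail_config is made up for illustration, and it only looks at the file names listed above.

```shell
# List which of the files managed by setup_spamassassin already exist.
# Takes the home directory to inspect as its only argument.
# (existing_mail_config is a hypothetical helper, not a site-provided tool.)
existing_mail_config() {
    home_dir=$1
    for f in .forward .procmailrc .spamassassin/user_prefs; do
        if [ -e "$home_dir/$f" ]; then
            printf '%s\n' "$f"
        fi
    done
}
```

Running existing_mail_config "$HOME" prints one line per file that is already present. If it prints nothing, the plain setup_spamassassin command should not trip over existing files.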
You should only run this command on the system that receives and processes your email. If you run this command on a machine or cluster that simply forwards your email to another machine or cluster, you will break your mail forwarding, because the script will replace your .forward file with one that delivers your mail locally. Again, when in doubt, ask for help.
If you read your email on junkfood machines, you may run this command on any junkfood machine.
Once the setup_spamassassin script has successfully completed, that's really all you need to do. You can stop here if you want to -- the rest of the instructions involve customizing scanning and filtering behavior. The procmail rules that have been set up filter out spam before it gets to your inbox, so if you use a client like Outlook or Thunderbird, you should already have most of your spam filtered before you see your mail.
NOTE: By default, your spam and virus tagged messages will still be collected in your home directory. Over time, they will likely fill up your disk quota. Remember to check the spam and virus folders periodically to empty them or compress them. Spam will be deposited in ~/mail/spam, and virus tagged messages will be deposited in ~/mail/virus.
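To keep an eye on how much quota these folders use, something like the following sketch may be useful. The function name folder_sizes is hypothetical; it simply wraps du and reports folders that do not exist yet.

```shell
# Print the size in kilobytes of each mail folder given as an argument.
# Folders that do not exist yet are reported instead of causing an error.
# (folder_sizes is a hypothetical helper name.)
folder_sizes() {
    for folder in "$@"; do
        if [ -e "$folder" ]; then
            du -sk "$folder"
        else
            printf 'missing\t%s\n' "$folder"
        fi
    done
}
```

For example, folder_sizes ~/mail/spam ~/mail/virus prints one line per folder, with the size in kilobytes in the first column.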
Collect folders of mail that each contain only ham or spam. You should have a large number of messages, on the order of several hundred to a thousand, if possible. The Bayes tests require at least 300-500 of each type of message before they will try to evaluate your incoming messages.
It's important that you have verified that each folder contains purely ham or purely spam, because you will be teaching the spamassassin process to recognize patterns common to your spam or ham messages. Feeding it incorrectly marked messages will reduce its effectiveness. If you accidentally feed it spam as ham or vice versa, don't worry: it's easily reversible by feeding it the same message with the correct classification. spamassassin will automatically recognize it as a message it has seen before and unlearn the old information.
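Before training, you may want to confirm that each folder holds enough messages. The sketch below assumes the folders are mbox files, where every message starts with a line beginning "From " (the same format the --mbox option expects). The function names and the default threshold of 300 are illustrative choices, taken from the lower end of the range mentioned above.

```shell
# Count the messages in an mbox file by counting "From " separator lines.
count_messages() {
    grep -c '^From ' "$1"
}

# Succeed if the mbox file holds at least MINIMUM messages (default 300).
# Usage: enough_for_bayes FILE [MINIMUM]
enough_for_bayes() {
    minimum=${2:-300}
    [ "$(count_messages "$1")" -ge "$minimum" ]
}
```

For example, enough_for_bayes spamfile && enough_for_bayes hamfile succeeds only when both folders reach the threshold.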
For sample folders spamfile (containing spam) and hamfile (containing ham), run these commands:
cat spamfile | formail -s /usr/local/bin/spamc -d spamd.cs.umd.edu -L spam
cat hamfile | formail -s /usr/local/bin/spamc -d spamd.cs.umd.edu -L ham
As you receive new spam messages that have slipped through the filters, pipe them to spamc -d spamd.cs.umd.edu -L spam to further train your Bayes database to recognize the new messages as spam. You can modify your ~/.procmailrc to do this automatically, but be aware that it will be incorrectly learning any misclassified spam or ham.
To train with sa-learn instead, for the same sample folders spamfile (containing spam) and hamfile (containing ham), run these commands:
/usr/local/bin/sa-learn --spam --mbox --showdots spamfile
/usr/local/bin/sa-learn --ham --mbox --showdots hamfile
As you receive new spam messages that have slipped through the filters, pipe them to sa-learn --spam to further train your Bayes database to recognize the new messages as spam. You can modify your ~/.procmailrc to do this automatically, but be aware that it will be incorrectly learning any misclassified spam or ham.
Junkfood users, note: You must run sa-learn on a Linux machine. The mail processing machines are Linux machines, and if you run sa-learn on a Solaris machine, it will corrupt the database and you'll have to start all over again. This is only a problem if the mail handling machine and the machine running sa-learn differ (Solaris vs. Linux).
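If you keep several folders of each kind, the sa-learn commands above can be wrapped in a small loop. This is only a sketch: train_folders is a made-up name, and the SA_LEARN variable is an assumption added so the command can be swapped out (for example, set to echo for a dry run). The sa-learn options are the same ones shown above.

```shell
# Path to sa-learn; override SA_LEARN (e.g. SA_LEARN=echo) for a dry run.
SA_LEARN=${SA_LEARN:-/usr/local/bin/sa-learn}

# Feed one or more mbox folders to sa-learn as spam or ham.
# Usage: train_folders spam|ham MBOX...
train_folders() {
    kind=$1
    shift
    case $kind in
        spam|ham) ;;
        *) echo "usage: train_folders spam|ham mbox..." >&2; return 2 ;;
    esac
    for mbox in "$@"; do
        # Stop at the first folder that fails so the error is not missed.
        "$SA_LEARN" "--$kind" --mbox --showdots "$mbox" || return 1
    done
}
```

For example, train_folders spam spamfile followed by train_folders ham hamfile repeats the two commands above. As noted, run it on a Linux machine.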
Start by looking at the headers of the message. You should see lines that look like this:
X-Spam-Status: Yes, score=9.5 required=5.0 tests=BAYES_99,BPLECS_INVESTOR,
BPLECS_STOCK,FORGED_RCVD_HELO,NO_REAL_NAME autolearn=no version=3.1.5
X-Spam-Report:
* 0.5 NO_REAL_NAME From: does not include a real name
* 1.0 FORGED_RCVD_HELO Received: contains a forged HELO
* 1.0 BPLECS_STOCK BODY: BPLECS_STOCK
* 2.0 BPLECS_INVESTOR BODY: BPLECS_INVESTOR
* 5.0 BAYES_99 BODY: Bayesian spam probability is 99 to 100%
* [score: 1.0000]
This is a report of the rules that matched and what values they were assigned. If you can determine that a certain rule is matching lots of spam and little or no ham, you can increase the score associated with that rule. For example, if you can determine that you've received a dozen spam messages that matched the FORGED_RCVD_HELO rule, and none of your legitimate messages have matched this rule, you can increase its score.
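One way to judge whether a rule matches lots of spam and little ham is to tally the rule names from the X-Spam-Status headers of a whole folder, once for a spam folder and once for a ham folder. The following is a rough sketch under a few assumptions: the folder is an mbox file, the headers look like the example above, and count_rule_hits is a hypothetical name. It also counts any X-Spam-Status lines that happen to appear inside message bodies.

```shell
# Tally how often each rule appears in the tests= list of the
# X-Spam-Status headers in an mbox file. Prints "COUNT RULE" lines,
# most frequent first.
count_rule_hits() {
    awk '
        # Pull the rule names out of one complete (unfolded) header.
        function tally(field,    list, n, i, names) {
            gsub(/,[ \t]+/, ",", field)   # rejoin a list split across lines
            if (match(field, /tests=[^ \t]+/)) {
                list = substr(field, RSTART + 6, RLENGTH - 6)
                n = split(list, names, ",")
                for (i = 1; i <= n; i++)
                    if (names[i] != "" && names[i] != "none")
                        hits[names[i]]++
            }
        }
        # Start of a header, which may continue on following lines.
        /^X-Spam-Status:/ {
            if (in_status) tally(status)
            in_status = 1; status = $0; next
        }
        # Continuation lines begin with a space or tab.
        in_status && /^[ \t]/ { status = status $0; next }
        # Any other line ends the header.
        in_status { tally(status); in_status = 0 }
        END {
            if (in_status) tally(status)
            for (rule in hits) print hits[rule], rule
        }
    ' "$1" | sort -rn
}
```

Comparing the output of count_rule_hits ~/mail/spam with the output for a folder of legitimate mail shows which rules appear only on the spam side.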
Open ~/.spamassassin/user_prefs in your favorite editor, and look for this line:
score FORGED_RCVD_HELO 1.0
If you don't see one, go ahead and add it. Then adjust the score to whatever you want it to be. You can see from the headers above that a 5.0 is required for a message to be marked as spam, so you may want to just crank this up to 5.0 if you know it will only occur in spam messages. If you see it in legitimate messages as well, you might want to just increase it a little, to 2.0 or 3.0.
If you simply want to reduce the score required for a message to be marked spam, look for and adjust this line in your ~/.spamassassin/user_prefs file:
required_hits 5.0
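After editing, a quick way to review the overrides currently in effect is to list just the score and threshold lines. This is a small sketch; show_overrides is a hypothetical name. It also matches required_score, which is the name SpamAssassin 3.x documents for the same threshold setting (required_hits is the older name for it), in case your file uses that spelling.

```shell
# Print the score and threshold lines from a SpamAssassin preferences file,
# skipping comments and unrelated settings.
show_overrides() {
    grep -E '^[[:space:]]*(score|required_hits|required_score)[[:space:]]' "$1"
}
```

For example, show_overrides ~/.spamassassin/user_prefs prints lines such as "score FORGED_RCVD_HELO 3.0" and "required_hits 5.0" if they are present and not commented out.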
Errors in your ~/.procmailrc can and will result in lost email. I strongly recommend setting up a duplicate copy of all incoming mail while you are tinkering with your .procmailrc file, just to protect against email disappearing. We cannot recover email lost during delivery, because it hasn't been on the system long enough to be stored by the backup system.
To set up a duplicate archive of your email, create a file which will receive the duplicates:
% cd
% touch mail/archive
...now edit your ~/.forward file to add a line that indicates the location of this file:
"|/usr/local/bin/procmail -t #bplecs", /fs/junkfood/bplecs/mail/archive
This sends a copy of all of your unfiltered email to the file /fs/junkfood/bplecs/mail/archive as well as to your procmail filter rules. Just remember to either remove the archive when you're done tinkering, or pay attention to it regularly, because such files can easily grow out of control and fill up your disk or disk quota.
See also man forward.
:0:
* ^From jsmith@aol.com
$ORGMAIL
This rule will catch messages from the specified user and deliver them to your inbox before they run through spamassassin.
$ORGMAIL is defined as /var/mail/$LOGNAME, which is generally your inbox. You can easily specify an absolute path to any other file, or a path to a file relative to your home directory.
:0:
* ^From jsmith@aol.com
mail/spam
This rule will catch messages from the specified user and deliver them to your spam folder before they run through spamassassin. Doing this before the spamassassin run helps reduce load on the mail server and deliver your mail faster.
:0
! user@inter.net
...where user@inter.net is really the address to which you want to forward your email. This will forward all of your email offsite without saving a local copy. If you just want to forward a copy of your email to another address, add a c flag:
:0c
! user@inter.net
This tells procmail to only send a copy, and keep one locally to continue through the rest of the procmail rules (which, if this is the last rule, means falling off the end and delivering to your system inbox).