
Margin Note: Give your spam a Bayesian blast

CNET staff

Every day at MacFixIt, we wade through hundreds (often thousands) of individual e-mail messages. With e-mail address harvesters more active than ever, just "cutting the fat" (removing unwanted bulk e-mail) has become a task in itself. And missing an important e-mail note because of an inaccurate spam-blocking tool can make the difference between publishing a critical troubleshooting solution and missing it.

With Entourage as a primary e-mail client, deleting spam over an IMAP server is a tedious process. When mass deleting, messages sometimes reappear or do not delete at all. Worse, deleting so many messages in the background, each through its own server request, can drag down Entourage's performance while you browse other messages. You can pull e-mail messages into a FileMaker database or some other management system, but a spam filter is the only way to make hundreds of junk messages manageable directly within Entourage.

Leave it to Apple to embrace a solution that eventually makes itself useful in a Microsoft product.

The revision of Mail.app included with Mac OS X 10.2.x offers a relatively widely publicized feature: a Bayesian spam filter. Bayesian filtering was first discussed and presented in 1998, and it combines a user-trained filtering approach with a score-based evaluation system. The result is a spam barricade that has been more highly praised than virtually any other mainstream solution.

Thanks to the talented Macintosh shareware development community, Bayesian filtering is no longer limited to users of Mail.app and can be used in several other popular clients. SpamSieve, by Michael Tsai, employs Bayesian technology and works with Claris E-mailer, Eudora, Entourage, Mailsmith, and PowerMail.

The first thing you will appreciate about this product is its comprehensiveness and professional execution. More than just a crude hack that wraps an open-source offering in a Mac OS X application, SpamSieve includes some of the best documentation we have ever seen in a US $20 shareware product. Not only is there a detailed explanation of exactly how Bayesian filtering works, but the explanation of the already straightforward features is also excellent.

After you've downloaded and installed SpamSieve, you will need to start adding messages to your "Corpus," a database that stores how many times words appear in valid messages and in junk messages. You can do this through the Entourage AppleScript menu (far right in the menu bar) once SpamSieve is properly installed, or through a keyboard combination. If you are using an IMAP server, make sure to select all of your unread messages and choose "Receive entire message" from the "Message" menu before you start sorting. Otherwise, you will have to wait for each individual message to load before it can be analyzed.
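To make the idea of a corpus more concrete, here is a minimal Python sketch of a word-count table trained one message at a time. The `Corpus` class, the `tokenize` helper, and its word-splitting rule are hypothetical illustrations of the general technique, not SpamSieve's actual data format or code.

```python
import re
from collections import Counter

# Hypothetical tokenizer rule: runs of lowercase letters, digits, $, ' and -.
TOKEN_RE = re.compile(r"[a-z0-9$'-]+")

def tokenize(text):
    """Return the set of distinct lowercase words in a message."""
    return set(TOKEN_RE.findall(text.lower()))

class Corpus:
    """Toy corpus: how many good and junk messages contain each word.

    Illustrative only; this is not SpamSieve's real storage format.
    """

    def __init__(self):
        self.good = Counter()  # word -> number of good messages containing it
        self.junk = Counter()  # word -> number of junk messages containing it
        self.n_good = 0        # total good messages trained so far
        self.n_junk = 0        # total junk messages trained so far

    def train(self, text, is_junk):
        """Add one message to the corpus as either junk or good mail."""
        words = tokenize(text)
        if is_junk:
            self.junk.update(words)
            self.n_junk += 1
        else:
            self.good.update(words)
            self.n_good += 1

corpus = Corpus()
corpus.train("Cheap meds, buy now!", is_junk=True)
corpus.train("Meeting notes for Friday", is_junk=False)
print(corpus.junk["buy"], corpus.good["friday"], corpus.n_junk)  # prints: 1 1 1
```

Counting each word once per message (rather than every occurrence) keeps a single repetitive message from dominating the counts; that is a design choice of this sketch.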

You can watch your training take shape in the SpamSieve application, alongside the probability rankings it will eventually use to catch unwanted e-mail.
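How might per-word counts turn into a probability ranking? The Python sketch below assumes a simplified naive-Bayes-style scheme: each word gets a junk probability from its relative frequency in junk and good mail, and the per-word probabilities are combined into one score for the message. The function names, the neutral 0.5 for unseen words, and the 0.01/0.99 clamps are illustrative assumptions, not SpamSieve's actual algorithm.

```python
import math

def word_probability(word, good, junk, n_good, n_junk):
    """Estimate how likely a message containing `word` is to be junk.

    good/junk map word -> number of good/junk training messages containing it.
    Unseen words are neutral (0.5); results are clamped to [0.01, 0.99]
    so that no single word can settle the verdict on its own.
    """
    g = good.get(word, 0) / max(n_good, 1)  # frequency in good mail
    b = junk.get(word, 0) / max(n_junk, 1)  # frequency in junk mail
    if g + b == 0:
        return 0.5
    return min(0.99, max(0.01, b / (g + b)))

def _sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1 / (1 + math.exp(-x))
    e = math.exp(x)
    return e / (1 + e)

def junk_score(words, good, junk, n_good, n_junk):
    """Combine word probabilities: prod(p) / (prod(p) + prod(1 - p)).

    Computed in log space so long messages do not underflow to zero.
    """
    probs = [word_probability(w, good, junk, n_good, n_junk) for w in words]
    log_junk = sum(math.log(p) for p in probs)
    log_good = sum(math.log(1 - p) for p in probs)
    return _sigmoid(log_junk - log_good)

# Hypothetical counts from 10 good and 10 junk training messages.
good = {"meeting": 5, "friday": 3}
junk = {"cheap": 8, "meds": 6, "friday": 1}
print(round(junk_score({"cheap", "meds"}, good, junk, 10, 10), 4))       # 0.9999
print(round(junk_score({"meeting", "friday"}, good, junk, 10, 10), 4))   # 0.0034
```

A filter would then compare the score against a cutoff to decide whether a message is junk; as training adds more messages, the per-word estimates, and therefore the rankings, become more reliable.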

Best of all, SpamSieve adapts (with a little help from you) to evolving spam tactics, because you keep training it on an ever-changing set of words. The database, or "corpus," can also be pruned to rule out words that appear too inconsistently across good and junk messages to have any useful effect.
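One plausible pruning rule is sketched below in Python. It is an assumption about what "too inconsistently" could mean, not SpamSieve's documented behavior: drop words seen in too few messages to be trustworthy, and drop words whose junk probability sits near 0.5 because they show up about equally in good and junk mail. The `prune` function and its `min_count` and `neutral_band` parameters are hypothetical.

```python
def prune(good, junk, n_good, n_junk, min_count=2, neutral_band=0.1):
    """Return pruned copies of the good/junk word-count tables.

    Hypothetical rule: drop a word if it appears in fewer than `min_count`
    messages overall, or if its junk probability is within `neutral_band`
    of 0.5 (it shows up about equally often in good and junk mail).
    """
    kept_good, kept_junk = {}, {}
    for word in set(good) | set(junk):
        gc, jc = good.get(word, 0), junk.get(word, 0)
        if gc + jc < max(min_count, 1):
            continue  # too rare to be trustworthy
        g = gc / max(n_good, 1)  # frequency in good mail
        b = jc / max(n_junk, 1)  # frequency in junk mail
        if abs(b / (g + b) - 0.5) < neutral_band:
            continue  # too neutral to help tell good mail from junk
        if gc:
            kept_good[word] = gc
        if jc:
            kept_junk[word] = jc
    return kept_good, kept_junk

kept_good, kept_junk = prune(
    {"the": 10, "meeting": 6, "zebra": 1},  # counts from 10 good messages
    {"the": 10, "cheap": 7},                # counts from 10 junk messages
    n_good=10, n_junk=10,
)
print(kept_good, kept_junk)  # {'meeting': 6} {'cheap': 7}
```

Here "the" is dropped because it is equally common in both kinds of mail, and "zebra" because a single sighting says little; pruning such words keeps the corpus smaller without losing much discriminating power.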

After you use SpamSieve for a few days, you will notice that it becomes gradually more accurate (provided you continue to train it), until you can almost completely trust it to remove junk mail automatically, without having to approve each message individually.

Feedback on this issue? Drop us a line at late-breakers@macfixit.com.

Resources

  • SpamSieve
  • late-breakers@macfixit.com