Fighting, and Winning, the Battle Against Spam

The Battlefield

Spam is like a flood of water. The dams do what they can, but let’s face it, there is a force that penetrates our best efforts. The constant barrage of email to ‘Enlarge this’, ‘Buy now’, ‘Meet sexy singles in your area’ is eternal. Somehow, getting into your inbox makes the spammers money. I’ve yet to hear of anyone who enjoys the spam they get, so I really scratch my head over their business model, but they keep coming, so we’ll keep fighting.

What to do 

How do you fight it? First, I’ll say that the web mail providers do a pretty good job of keeping spam at bay. They should; they have many millions of dollars to throw at the problem. Google, Yahoo, and Hotmail (MSN) all have pretty good filters. I have an account with all three, and in my experience they rank, from best to worst, Google, Yahoo, MSN.

What about the little guy (me)? I’m the IT guy for a small local company, and we need email too. I’m also a fan of Linux for its power and flexibility. So I’ve put together some free tools to help me fight this war, and I keep adding to the arsenal whenever a new weapon of choice becomes available. The great thing about these tools is that they are free and quite advanced.

The power of open source is this: someone who works for passion will produce more and better work than someone who works for pay. The big software companies can pay the best programmers to do their bidding, but they are competing with people who have a passion and are willing to work in their spare time.

A Little History 

When I first took up my current position, everyone (about 30 people) was wading through about 100-500 spam messages a day. It was awful and, needless to say, inefficient. So I set up a Postfix SMTP server with a Cyrus IMAP server. I then installed SpamAssassin (SA) and used a cron script to have SA learn ham and spam every night.

Now all the associates had to do was drag spam into their Junk folder, and SA would find it and learn from it on a nightly basis.
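A nightly training job along these lines might look something like the Python sketch below. The spool path and the folder names are placeholders I made up for illustration (in particular, the `Ham` folder of known-good mail is hypothetical), so adjust them to your own layout. The only real tool it calls is `sa-learn` with its `--spam` and `--ham` options.

```python
import subprocess
from pathlib import Path

# All three values are assumptions for illustration; adjust to your own layout.
SPOOL_ROOT = Path("/var/spool/imap/user")  # one directory per mailbox user
SPAM_FOLDER = "Junk"                       # where users drag spam
HAM_FOLDER = "Ham"                         # hypothetical folder of known-good mail


def build_commands(spool_root, spam_folder=SPAM_FOLDER, ham_folder=HAM_FOLDER):
    """Return the sa-learn commands to run, one per training folder found."""
    commands = []
    if not spool_root.is_dir():
        return commands
    for user_dir in sorted(p for p in spool_root.iterdir() if p.is_dir()):
        for folder, flag in ((spam_folder, "--spam"), (ham_folder, "--ham")):
            target = user_dir / folder
            if target.is_dir():
                commands.append(["sa-learn", flag, str(target)])
    return commands


def main():
    for cmd in build_commands(SPOOL_ROOT):
        # check=False: one bad folder should not stop the rest of the run.
        subprocess.run(cmd, check=False)


if __name__ == "__main__":
    main()
```

Building the command list separately from running it makes it easy to print the commands first and see what cron would actually do before letting it loose on real mailboxes.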

Easy enough, but the spammers are desperate. Soon they started including a bunch of random words to offset the spammy words in their email. This started to fool SA, and the spam started to trickle back in. Quick, stretch over and put your finger in that hole.

The Troops Rally 

A few years back, Bill Gates predicted that the spam problem would be solved in two years. Well, that time has come and gone, and most would agree it’s still a problem. Meanwhile, a guy named Meng Weng Wong came up with the idea of publishing a domain’s authorized senders in its DNS records, so that an SMTP server can check them when it receives mail. This is called Sender Policy Framework (SPF). It’s quite effective against spammers who try to get in using what is called a Joe Job, that is, forging someone else’s address as the sender. Essentially, they can’t send you email from your own domain. This works well, but it is only effective if domains publish this info, and right now maybe 20% have done so. SPF does break one thing: forwarding. With SPF enforced, an SMTP server cannot simply auto-forward an email, because the forwarding server is not an authorized sender for the original domain, but that’s not such a big deal.
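To make the SPF check a little more concrete, here is a deliberately simplified sketch in Python. It only understands the `ip4`, `ip6`, and `all` mechanisms and takes the record as a string, whereas a real SPF implementation fetches the TXT record from DNS and also handles `a`, `mx`, `include`, and more. The function name `check_spf`, the record, and the addresses are made up for illustration.

```python
import ipaddress

# SPF qualifiers and the result each one produces when its mechanism matches.
QUALIFIERS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}


def check_spf(record, client_ip):
    """Evaluate a simplified SPF record against the connecting IP address."""
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        return "none"  # no usable SPF record published
    ip = ipaddress.ip_address(client_ip)
    for term in terms[1:]:
        qualifier = "+"  # default qualifier when none is written
        if term[0] in QUALIFIERS:
            qualifier, term = term[0], term[1:]
        name, _, value = term.partition(":")
        name = name.lower()
        if name == "all":
            return QUALIFIERS[qualifier]
        if name in ("ip4", "ip6"):
            network = ipaddress.ip_network(value, strict=False)
            if ip.version == network.version and ip in network:
                return QUALIFIERS[qualifier]
        # Other mechanisms (a, mx, include, ...) need DNS and are skipped here.
    return "neutral"


record = "v=spf1 ip4:192.0.2.0/24 -all"    # example record, not a real domain's
print(check_spf(record, "192.0.2.10"))     # pass: inside the published range
print(check_spf(record, "198.51.100.7"))   # fail: caught by "-all"
```

The second call is the Joe Job case: the mail claims to be from the domain, but the connecting IP is not on the published list, so the `-all` at the end rejects it.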

Unfortunately, Microsoft decided to repackage SPF as Sender ID and then claimed to have championed this great new idea (they originally fought it, but that is common Microsoft practice).

A Hero Arrives 

An invasion of armies can be resisted, but not an idea whose time has come. -Victor Hugo

Not really a technological breakthrough, but an idea. Spammers don’t play by the rules, so if we force incoming mail to play by the rules, the spammers will get rejected. Spammers try to send an email once, and if it gets delayed or temporarily rejected, they just drop it and move on. It would be costly and inefficient for them to keep trying to deliver an email to you. The idea is grey-listing. Simple. Keep track of the sending IP and the To: and From: addresses (the envelope recipient and sender) of each email in simple DB files. If this is the first time we’ve seen this triplet, reject the email with a temporary failure message (450).

If the sender is a true SMTP server, it will try again later (usually within 5 to 15 minutes). Grey-listing keeps rejecting the email for a set period of time (usually 5 minutes), then lets it through, and the next email from that person gets right in with no delay.
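The whole grey-listing idea fits in a few lines of Python. This is only a sketch of the logic described above, not how Postgrey is actually written: it keeps the triplets in an in-memory dict rather than DB files, and the `Greylist` class and its `clock` parameter are names I invented for the example.

```python
import time

GREYLIST_DELAY = 300  # seconds a new triplet must wait (the usual 5 minutes)


class Greylist:
    """Track (IP, sender, recipient) triplets and defer first-time senders."""

    def __init__(self, delay=GREYLIST_DELAY, clock=time.time):
        self.delay = delay
        self.clock = clock        # injectable so the logic can be tested
        self.first_seen = {}      # triplet -> timestamp of the first attempt

    def check(self, client_ip, sender, recipient):
        """Return an SMTP-style reply code: 450 to defer, 250 to accept."""
        triplet = (client_ip, sender.lower(), recipient.lower())
        now = self.clock()
        # Record the first attempt; later attempts keep the original timestamp.
        first = self.first_seen.setdefault(triplet, now)
        if now - first < self.delay:
            return 450  # temporary failure: a real SMTP server will retry
        return 250      # the triplet has waited long enough, let it in
```

A spammer that fires once and moves on only ever sees the 450. A real mail server retries after the delay, gets the 250, and from then on that triplet is accepted right away.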

This was pure gold. In the shop I work at, we cut spam by 90%. Not so much the spam in our inboxes, because SA was already doing a great job there, but our Junk folders were about 90% smaller than before grey-listing.

The drawbacks: there is an initial delay in email delivery. There are also some popular domains that don’t retry correctly, and those are let through with no delay. Some critics have stated they don’t want to create the extra bandwidth of having an SMTP server try back. Poppycock and balderdash! When the server connects, it only uses a few small packets for the handshake before the whole message gets delivered. I’m rejecting the email during the handshake, before the message is delivered, so I never have to deal with the bandwidth the spammer would use to deliver his message.

For Your Eyes Only 

Now, the new trick being pulled is image spam. The email contains an image that carries what the spammer wants you to know. Most are pharmacy or stock spams. The spammers then include filler text to throw off scanners like SA. This gets their message past the scanners, because the scanners cannot read the image.

They’ve struck at us; now we open fire on them. The solution: the FuzzyOCR plugin for SA. That’s right. By the power of OCR technology, we can scan the image in the email for words and then run them through the SA scanner.
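To give a feel for the "fuzzy" part, here is a rough Python sketch of the idea. It is not FuzzyOCR's actual code (that is a Perl plugin for SA). It assumes the OCR step has already turned the image into text, and the word list, the similarity threshold, and the points per hit are all numbers I picked for illustration.

```python
import re
from difflib import SequenceMatcher

BAD_WORDS = ["viagra", "pharmacy", "stock"]  # illustrative list only
MATCH_THRESHOLD = 0.8   # similarity needed to count as a hit (assumption)
POINTS_PER_HIT = 1.0    # score added per matched bad word (assumption)


def fuzzy_score(ocr_text):
    """Score OCR output by approximate matches against BAD_WORDS."""
    tokens = re.findall(r"[a-z0-9]+", ocr_text.lower())
    score = 0.0
    for bad in BAD_WORDS:
        # Approximate matching lets OCR misreads like "v1agra" still count.
        if any(
            SequenceMatcher(None, bad, token).ratio() >= MATCH_THRESHOLD
            for token in tokens
        ):
            score += POINTS_PER_HIT
    return score
```

Calling `fuzzy_score("Cheap V1agra from our online pharmaoy")` counts both misspelled words as hits, which is the whole point: OCR output is messy, so exact matching would miss most of it.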


In short, when an email comes in, it gets checked as follows:

  1. Postfix checks the headers for wildly out-of-range dates.
  2. SPF checks that the sending SMTP server is authorized to send mail for the domain.
  3. Postgrey does the grey-listing:
     - It rejects the email with a 450 status if it hasn’t seen this sender before.
     - It accepts the email if the sender has been seen before.
  4. SpamAssassin runs 1000s of tests on the email:
     - If it’s spam, the email is flagged as such.
     - If there is an embedded image, FuzzyOCR extracts the words and runs them against a list of bad words; any hits add to the SA score.
  5. The email is delivered to Cyrus IMAP.
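The first check in the list is the simplest one to illustrate. The sketch below shows the idea in Python rather than as Postfix configuration, and the 7-day tolerance and the `date_is_plausible` name are my own assumptions, not values from my setup.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

MAX_SKEW = timedelta(days=7)  # tolerance is an assumption; tune to taste


def date_is_plausible(date_header, now=None):
    """Return True if the Date header is within MAX_SKEW of the current time."""
    now = now or datetime.now(timezone.utc)
    try:
        sent = parsedate_to_datetime(date_header)
    except (TypeError, ValueError):
        return False  # an unparseable date is treated as suspicious
    if sent.tzinfo is None:
        # Headers without a usable zone come back naive; treat them as UTC.
        sent = sent.replace(tzinfo=timezone.utc)
    return abs(now - sent) <= MAX_SKEW
```

A message claiming to have been sent years ago, or years from now, fails this check before any of the heavier tests have to run.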

No more spam.

It looks as if this battle will continue, sorry Bill. I think the next thing the spammers will do is start using funky fonts in the image spam to thwart the OCR software. Whatever they do, we will prevail, because like in the movies, good always wins.

Did this help you out? It took me a few days to piece all this information together, and I hope it saves you some time (who knows, maybe the future me will be thankful I wrote this down). Let me know your thoughts.