MetaChat

01 June 2009

AskMecha: What's up with Gmail's spam filters lately? I've been noticing more and more spam making it through Gmail's filters and getting to my Inbox. Which dovetails into a great question I've always wondered about spam filters: [More:]

Exactly how do they work? Why would a normally working filter like Gmail's suddenly fail? Also, I noticed this happening within the last week or so but I haven't done anything silly like posting my @gmail.com addy anywhere or signing up for any lists with it.

Can someone please explain this to me?
Spam filters used to work on various factors: scanning the header, subject and (if you opted in) body of the message for certain "hot" IP addresses, funky phony e-mail address/header information, certain BUY NIGERIAN VIAGRA TO INCREASE HER CHEAP PRESCRIPTION GIRTH PLEASURE NAO!! phrases, etc. Your ISP/e-mail provider would then build blacklists from the information culled from all of this and dump anything that scored above a certain tolerance level (could be user-specified or ISP-specified or a combination thereof) into your spam folder.
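
To make the "tolerance level" idea concrete, here is a rough sketch of that older style of filter. The blacklist entry, the phrases, and every weight below are made up for illustration; no real provider's rules are implied.

```python
# Hypothetical blacklist and phrase weights, invented for this sketch.
BLACKLISTED_IPS = {"203.0.113.7"}
HOT_PHRASES = {"viagra": 2.0, "cheap prescription": 1.5, "buy now": 1.0}


def spam_score(sender_ip, subject, body=""):
    """Add up points for each suspicious trait found in the message."""
    score = 0.0
    if sender_ip in BLACKLISTED_IPS:
        score += 3.0  # a known-bad sender counts heavily
    text = (subject + " " + body).lower()
    for phrase, weight in HOT_PHRASES.items():
        if phrase in text:
            score += weight
    return score


def is_spam(sender_ip, subject, body="", threshold=3.0):
    """Messages at or above the tolerance level go to the spam folder."""
    return spam_score(sender_ip, subject, body) >= threshold
```

Lowering `threshold` catches more spam but also flags more legitimate mail, which is why it could be set per user or per ISP.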

This was 10 years ago. As spammers have come up with ways to defeat existing filters, those filters have evolved to the point where I'm not sure how gmail, for example, does it so well. I've got two gmail addresses and I've had three spams come through their filters recently, the first time in about a year. My first e-mail address is my Yahoo address which is about 15 years old now. I use it solely for a spam trap and Yahoo's filters have been overwhelmed in the last three weeks to the point where I'm checking it daily rather than, as for the past several years, once every month or so.

The thing about filters, though, is that they don't actually *prevent* spam. The e-mail still gets sent, it still consumes resources, and people still actually fall for it. This is a large reason why even though you have never used your gmail address anywhere, you will have mass spams that start with "a@gmail.com" and just work through all the permutations from there. Chain e-mails that get forwarded without the forwarder bothering to strip all the visible addresses from the e-mail used to be a prime harvest for spammers, and may still be. You may have signed up with someone with your gmail address and forgotten, or they changed their terms of use and the notification of the change may have been accidentally dumped into your spam folder. There's no guaranteed way to prevent spam, never has been and never will be. The transport systems and protocols just were never designed to be anything but extremely open.

But something's up, that's for sure. Some spammer out there has temporarily gotten the upper hand over at least gmail and Yahoo.
posted by WolfDaddy 01 June | 09:50
Most spam filters these days work using something called a naive Bayes classifier. If you've ever used an email client that makes you 'teach' it which emails are spam and which are legitimate, chances are it's a Bayesian filter.

The basic idea is that by manually classifying documents into categories of 'spam' and 'not spam', the system can build up a word list. For each word, it records the probability (based on your teaching) that that word would appear in a spam email. For instance, for the word 'viagra', there's a high chance that that's a word from a spam email. And for other words (which may be specific to that user, i.e. names of family members and friends), there is a low chance that the word is from a spam email.

The 'Bayesian' part of it is a mathematical probability method that lets you turn these relationships backwards. So, working from data about individual words, and the probability that they exist in documents in a certain category, you can turn that around, and ask the question, "Given this new document, what is the probability that this document is in a certain category?"

The spam filter applies this test to incoming emails, and if the probability is above a certain threshold, it marks the email as spam.
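
For the curious, here's a minimal sketch of the idea in Python. It is a toy, not how Gmail or any particular mail client actually does it: the class name, the method names, the add-one smoothing, and the 0.9 threshold are all assumptions made for illustration.

```python
import math
from collections import Counter


class NaiveBayesSpamFilter:
    """Toy naive Bayes filter; names and defaults are hypothetical."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    def teach(self, text, label):
        """Record one manually classified email (label is 'spam' or 'ham')."""
        self.doc_counts[label] += 1
        self.word_counts[label].update(text.lower().split())

    def spam_probability(self, text):
        """Turn per-word statistics around into P(spam | this email)."""
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label in ("spam", "ham"):
            # Prior: how common this category is, with add-one smoothing.
            log_p = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            total_words = sum(self.word_counts[label].values())
            for word in text.lower().split():
                count = self.word_counts[label][word]
                # Smoothed P(word | category); unseen words never give zero.
                log_p += math.log((count + 1) / (total_words + len(vocab) + 1))
            log_scores[label] = log_p
        # Normalise the two scores so they sum to 1 (Bayes' rule).
        top = max(log_scores.values())
        weights = {k: math.exp(v - top) for k, v in log_scores.items()}
        return weights["spam"] / (weights["spam"] + weights["ham"])

    def is_spam(self, text, threshold=0.9):
        """Mark the email as spam if the probability clears the threshold."""
        return self.spam_probability(text) >= threshold
```

You'd call `teach()` once for each email you classify by hand, then `is_spam()` on new mail. The "naive" part is that each word is treated as independent of the others, which is wrong in real language but works surprisingly well in practice.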

One of the reasons that Gmail's spam filter is (usually) so successful is that it can learn not just from your personal emails and classifications, but from the collective emails and classifications of all Gmail users. This means that it can adapt pretty quickly (they claim within minutes) to new styles and phraseology in spam emails that get sent around.

The flipside to this is that it's a somewhat fuzzy reactive system that isn't immune from gaming and random blips in word distribution across the body of emails (spam and otherwise) currently being received. Most of the time these automatic adjustments improve the effectiveness of the system, but occasionally you'll get adjustments that have the opposite effect -- it's just a law of averages sort of thing. Or you'll get new styles of spam coming out that take the system longer to adapt to than normal.

So chances are that it's nothing you've personally done that's caused this uptick in uncaught spam -- think of it more like a random blip that will smooth out and right itself after a short while.
posted by chrismear 01 June | 09:51
My account has been letting spam through too. Strangely there was one with the subject line "Safety of silicon implants"--not exactly the enticement I need to look at their pron.

Which dovetails into a great question I've always wondered about spam filters...Exactly how do they work?

Eh, that's not that great of a question. An entry-level inquiry really.
posted by mullacc 01 June | 10:22
FWIW, I haven't seen any spam get through the filters of my Yahoo email in many months. Maybe something's going on targeting the gmail.com domain?
posted by BoringPostcards 01 June | 10:26