An Analytical Look at Spam
I am writing this manifesto mainly because I’m angry, but also because I like to solve problems. Since June 20, 2001, I have been unfortunate enough to receive 402 spam messages (stat date: July 17, 2001). This is ridiculous. Hopefully, my efforts to bring an engineering perspective to the spam problem will one day result in the end of spam’s chokehold on humanity!
What is spam?
Spam is typically defined as “unsolicited e-mail”. (Spam can also be used as a verb in the sense of sending unsolicited e-mail to someone.) This is along the lines of “junk mail”, which is unsolicited postal mail. Most people argue “I didn’t ask for it, I shouldn’t get it.” But I believe that statement is wrong, and on top of that the issues (or true problems) are more complex.
Spam should not be confused with SPAM (TM) the luncheon meat. Please see Hormel’s explanation.
I recommend spelling the e-mail-type of “spam” in all lowercase letters when stylistically possible.
- “Spammer” - someone who voluntarily spams
- “Spam bucket” - e-mail account that you use publicly in order to draw attention away from a different e-mail address that you want to use with a more select private audience
- “harvesting” - seeking out and gathering e-mail addresses with the intent to spam, done automatically (by spam bot) or manually (by spammer)
- “Open relay” - a mail server that accepts e-mail from anyone and sends it anywhere
- “UCE” - Unsolicited Commercial E-mail; Spam
What is the problem?
An engineer always asks “what is the problem?”. Most people will just rush and say “kill spammers” or “we need Spam filters”. But an engineer takes a more global look by directly addressing the underlying problems. Often, more effective (and humane :-) solutions can be found this way. Unfortunately, there are multiple problems:
- User’s attention/time being wasted/abused.
- Bandwidth being wasted: clogging up connections and preventing or slowing down other data transfer (once again, the user’s time is wasted), plus money wasted
- Messages offend user (emotional damage/peace of mind)
- Messages corrupt minds of children (emotional damage)
- Waste of storage space (hard drive on either system or user’s computer - territorial encroachment? probably not a valid problem since even nonspammers’ messages take up your hard drive space, but perhaps it’s a “money wasted” issue, especially for the server maintainer if he/she has to buy more storage space to handle the spam influx)
- Worst of all, a message can contain many forms of viruses (macro/script, executable) which can damage your computer (time wasted, money wasted, emotional damage) or spam other people
So in summary, Spam is troublesome because it:
- Wastes time
- Wastes money
- May cause emotional damage
But wait, there are counter-arguments to that!
- Saves time (you don’t have to waste your time looking for bargains)
- Saves money (they offer/give you bargains)
- Conveys good stuff, makes people feel good (I suppose this is possible, but I never got a spam message with this tone)
- Educates/enlightens people (once again, I have never seen this, but it’s possible!)
I dispute such counter-arguments because spam does not save net-time nor net-money (ironic play on the word “net”), nor does it on average make people feel better or properly (factually, honestly) educate people. Unfortunately, I love to overanalyze things. From what I understand, it’s not illegal to waste someone’s time. But perhaps you are holding someone’s attention hostage. :-). Hostage = “A prisoner who is held by one party to insure that another party will meet specified terms” (WordWeb dictionary). They hold your attention hostage so that you buy their product. Now, wasting someone’s money is property damage from what I understand and can be litigated in court. However, there is a minimum amount of damages that must be present before such a case can be brought before a court (I think). Plus, regular messages consume your money too. That leads up to “social contract”:
Right to Receive Spam (Argh!)
It’s hard to ban spam outright, though, because it seems people should have the right to receive spam if they want to. Otherwise, we would be dealing with censorship and extreme curtailing of the freedom of speech.
Reasons for wanting to receive spam:
- People do buy products from spam (scary as it is, telemarketing works too because of this)
- People might want to collect threatening messages in order to seek legal action or know nature of threat to protect themselves
- Joy of receiving e-mail each day (I admit partially to this :-P )
So What’s So Evil Here?
Sounds like I’m being nice to spammers, but I’m just being fair to the issues, as well as fair to the rights of individuals. There are still lots of wrongdoings being done via spam:
Why do Spammers Spam?
While we know a woodchuck woodchucks wood to sharpen its teeth, why does a spammer spam? Let’s look at the motivating factors involved:
- Greed (lust for money - when you want more money than you need)
- Financial Obligations (instead of lust, money needed instead to pay off debts, support family, etc.)
- Propaganda / Support for some idea
- Joke / Prank
- No reason whatsoever (damn irrational spammers!)
- Tricked into spamming (you ever reply to those messages about a kid with cancer who wants an e-mail before he dies? Well, most likely you fell for a spam prank to mail bomb someone)
- Mail bomb (attack someone)
- Love (spam for love? anything’s possible when you’re in love! :-) - As for spamming to be loved, I’m afraid that won’t happen! :-P
- Sex (spam for sex? I guess it’s possible. A fair trade in some horny minds.)
- Loneliness (at least, that’s the punch for the “kid in the hospital” prank. But it’s still a valid motive if genuine)
- Desire to be famous (I suppose it’s possible to spam oneself into fame, but you’ll likely also land yourself in jail)
Motivating factors are nice (especially for explaining dumb criminal actions), but the reason dumb and smart “criminals” spam is that SPAMMING WORKS! That is, people make money off it, or can raise awareness for an issue, cause emotional damage, or worse yet can effectively clog up someone’s e-mail account.
What is the system being analyzed?
Engineers like to deal with systems, that is: put something in a theoretical box! But what is the “box” when dealing with the previously mentioned spam problems? Possibilities are:
- E-mail account on POP server
- Mail client software (Inbox) on user’s computer
- User’s attention (not sure if this can count as a system, but then again a “user” can be a system :-)
Sometimes it helps to consider similar situations and the issues that surround them. Situations that are similar to spam include:
- Junk Mail (via postal service)
- Prank Phone Calls (Is your refrigerator running? Yeah? Well go catch it!)
- Door-to-door activities (salesmen, solicitors, doorbell ringing pranksters!)
- Fan Mail (getting way too much mail - only if you are famous)
- Fame (people fighting over your attention - people possibly wasting your time)
- Advertisements (both print and online) - people fighting over your attention
- Office mailbox (stupid memos)
In chemical engineering, balances help us analyze how some quantity (such as mass or energy) moves in and out of a system, as well as where it accumulates in places. Formula-wise, a balance is:
INPUT - OUTPUT + GENERATION - CONSUMPTION = ACCUMULATION
Let’s now look at what each term means in terms of Spam, and also what the sources are:
Sources of INPUT (Enters system by crossing system boundary):
- Spammer (definition: one who sends spam) sends message to your e-mail account
Sources of OUTPUT (Leaves system by crossing system boundary):
- User voluntarily sends spam (their own spam, or spam from a company they agreed to let send spam through them)
- User involuntarily sends spam (spyware/spamware, virus)
Sources of GENERATION (Appears in system without crossing system boundary):
- Users don’t generally generate spam, but they can, see #1 for output
- User decides that a regular message is now Spam (it was already in the system!)
Sources of CONSUMPTION (Leaves system without crossing system boundary):
- User deletes message.
- User’s software (spam filter) deletes message.
- User decides message is not Spam.
Locations of ACCUMULATION (Rate of amount staying in system):
- User’s POP e-mail account (server)
- User’s hard drive
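To make the balance concrete, here is a minimal sketch in Python that plugs message counts into the formula above. The class name, field names, and the weekly numbers are all made up for illustration; only the formula itself comes from the text.

```python
from dataclasses import dataclass


@dataclass
class SpamBalance:
    """Spam message counts for one accounting period (hypothetical field names)."""
    received: int      # INPUT: spam arriving from spammers
    sent: int          # OUTPUT: spam leaving the system (voluntarily or via a virus)
    reclassified: int  # GENERATION: stored messages the user newly decides are spam
    removed: int       # CONSUMPTION: spam deleted, filtered, or judged not to be spam

    def accumulation(self) -> int:
        # INPUT - OUTPUT + GENERATION - CONSUMPTION = ACCUMULATION
        return self.received - self.sent + self.reclassified - self.removed


# Invented example week: 100 spam messages arrive, none are sent out,
# 3 old messages get relabeled as spam, and 90 are deleted.
week = SpamBalance(received=100, sent=0, reclassified=3, removed=90)
print(week.accumulation())  # 13
```

A positive result means spam is piling up in the account or on the hard drive; the solutions later in this article each try to shrink one of the four terms.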
Where do spammers get your e-mail address from?
High Risk (According to this CNET article [inactive link 1] [no longer available online as of 2012-09-30]) - results in many (5+ per cause) spam messages per day:
- USENET or online message boards
- online lotteries and sweepstakes
- AOL chatrooms (all they need is your screen name to spam you)
- companies that buy/sell e-mail address lists (but they need to get it from some other reason listed here first)
Medium Risk - results in about 5 spam messages per day:
- Your e-mail address on record if you register a domain name (WHOIS database); Note: I can attest that your regular postal mail address will get lots of postal junk mail too from this.
- spam bots (scour internet for mailto: addresses, especially front pages of dot coms)
- online phonebook or address book (thank you UConn)
Low Spam / No Spam:
- your ISP (not sure if this is true)
- your friends :-(
- your enemies :-( :-(
- product registration (not bad if you check/uncheck appropriate opt-out boxes; overall though, this can be argued to be solicited e-mail)
- website access registration / creating member profile on website (like to NYTimes)
- online shopping (generally can unsubscribe if present)
- by just having a free web-based e-mail account (Getting spam this way is a myth. Opening an account and never using it publicly will not generate spam on its own. See CNET article [inactive link 2] again [unfortunately, that article is no longer available online as of 2012-09-30])
- subscribing to an e-mail newsletter
Please tell me they don’t make my address up out of thin air!
Unfortunately, as it has been pointed out to me, if your e-mail address is not very unique, chances are you’ve been hit by spam due to a method called a “dictionary attack”. For example, spammers can send spam messages to simple first name addresses at each mailserver they know of: firstname.lastname@example.org, email@example.com, firstname.lastname@example.org... Other simple combinations of words and numbers can be generated in a quick batch method by spammers.
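To see why a simple address is exposed to this, here is a rough sketch in Python of the kind of enumeration described above, turned around so you can check whether your own address would land in such a batch. The word list, the domains (reserved example domains), and both function names are assumptions for illustration; real lists would be far larger.

```python
from itertools import product

# Hypothetical inputs: a tiny name list and reserved example domains.
COMMON_NAMES = ["john", "mary", "mike"]
DOMAINS = ["example.com", "example.org"]


def candidate_addresses(names, domains, max_suffix=2):
    """Enumerate name@domain guesses, with and without a short numeric suffix."""
    suffixes = [""] + [str(n) for n in range(1, max_suffix + 1)]
    for name, suffix, domain in product(names, suffixes, domains):
        yield f"{name}{suffix}@{domain}"


def is_guessable(address, names=COMMON_NAMES, domains=DOMAINS):
    """Return True if the address would appear in the enumerated batch."""
    return address.lower() in set(candidate_addresses(names, domains))


guesses = list(candidate_addresses(COMMON_NAMES, DOMAINS))
print(len(guesses))                              # 18 (3 names x 3 suffixes x 2 domains)
print(is_guessable("mike2@example.com"))         # True
print(is_guessable("m.k.newman77@example.com"))  # False
```

The batch grows only multiplicatively with the size of each list, which is why short, common local parts get hit even if they were never posted anywhere.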
Solutions to Spam
I’m not jumping the gun here, just listing possible solutions that I’ve heard through the grapevine:
Solutions Regarding Sources of INPUT (stop Spam from being sent to your e-mail address):
Direct Action Against Spammers
- Kill spammers (sorry, I don’t believe in the death penalty :-)
- Take computers or Internet access away from spammers (steal [evil] or legal force [good])
- Make spam illegal (I don’t think it’s fair to make all unsolicited mail illegal. I think someone, especially a cute girl, is welcome to e-mail me even if I never knew her before. Heaven forbid we make flirting illegal :-P )
- Make unsolicited messages from businesses illegal (that sounds more reasonable, but will this include non-profit organizations? Also, wouldn’t this make business-to-business e-mail offers illegal? How can we differentiate between business and personal addresses? Different first level domains in e-mail addresses! How about “dot person” (TM)? :-P My new e-mail address would be email@example.com)
- Make spoofing e-mail origin and return addresses illegal, also spoofing IP address
- Make it illegal to deliver messages to someone to whom the message isn’t addressed (how is this possible to do in the first place?) - mailing lists would be illegal; an e-mail address w/o the name of the addressee would be illegal; your name has to be listed in the message or the message is illegal; Is it too much to ask for full name OR first name OR last name OR approved nickname in the body of the message?
White Lists (Approval) / Black Lists (Banned)
- Make it so e-mail servers must all be registered through a central authority. Unregistered servers would have their DNS or IP rights terminated. Better yet, there would be an approved server list to compare Spam against. Sounds like ORBS, which got sued I think.
- Ban e-mail from open relays (MAPS Realtime Blackhole List)
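Blackhole lists of this kind are usually consulted over DNS: the mail server reverses the octets of the connecting IP address, appends the list’s zone, and looks that name up. A minimal sketch in Python of just the name-building step follows; the zone `dnsbl.example.org` and the function name are placeholders, not a real list, and the actual DNS lookup is left out.

```python
import ipaddress


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the hostname a DNS-based blocklist lookup would resolve for an IPv4 address."""
    # Validate the address, then reverse its octets (192.0.2.10 -> 10.2.0.192).
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


# 192.0.2.10 is a documentation-only address; the zone is hypothetical.
print(dnsbl_query_name("192.0.2.10", "dnsbl.example.org"))
# 10.2.0.192.dnsbl.example.org
```

By convention, a listed address resolves to something in 127.0.0.0/8 and an unlisted one does not resolve at all, so the server can decide before accepting the message.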
Tests to verify sender is human, not automated spammer
- Have people pass human-only-passable tests the first time they e-mail you (I think this is what Spamcop.net uses; doesn’t work since most people get confused by it)
Avoiding spam bots and/or e-mail address collection
When posting/putting an e-mail address online (like on a message board), post your e-mail address using one or a combination of the following techniques:
- Unlinked Image
- Text Insertion: “firstname.lastname@example.org”
- Punctuation Replacement: “newman at vgmusic dot com”
- Mirror Image: “moc.cisumgv@namwen”
- URL Context Reference: “newman@ this page’s domain name”
- Name Context Reference: “My Last Name @vgmusic.com” - This may not be the best method since it results in the generation of an “error message” if a spammer tries using the invalid truncated address “Name @vgmusic.com”. It is best to keep spam from ever arriving at the real server. Additionally, error messages can sometimes be forwarded to the postmaster of a mail server, which is a further unfortunate consequence. A solution that wastes bandwidth and other people’s time (besides the spammer’s) should be frowned upon.
- Spaces: “n e w m a n @ v g m u s i c . c o m”
- NATO phonetic alphabet: november echo whiskey mike alpha november at victor golf mike uniform sierra india charlie dot charlie oscar mike
- Unicode Obfuscation: The following e-mail address is written in browser-readable Unicode: “email@example.com”. To generate such an address easily, use the free program E_Cloaker. View the source code for this page to see how this works. You can also generate a linking mailto version with Unicode: firstname.lastname@example.org. While the link looks normal when rendered by a web browser, the underlying HTML code is not easily readable by e-mail address harvesting bots.
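Several of the text-based techniques above are simple string transformations. The following Python sketch shows four of them on a placeholder address; the function names are my own, and the last one uses HTML numeric character references, which is one common way to get the “Unicode” effect (whether E_Cloaker produces exactly this output is an assumption).

```python
import html


def punctuation_replacement(address: str) -> str:
    """user@example.com -> 'user at example dot com'"""
    return address.replace("@", " at ").replace(".", " dot ")


def mirror_image(address: str) -> str:
    """Reverse the address character by character."""
    return address[::-1]


def spaced(address: str) -> str:
    """Put a space between every character."""
    return " ".join(address)


def numeric_entities(address: str) -> str:
    """Encode every character as an HTML numeric character reference."""
    return "".join(f"&#{ord(ch)};" for ch in address)


addr = "user@example.com"  # placeholder address
print(punctuation_replacement(addr))  # user at example dot com
print(mirror_image(addr))             # moc.elpmaxe@resu
print(spaced(addr))                   # u s e r @ e x a m p l e . c o m

encoded = numeric_entities(addr)
# A browser decodes the entities back to the plain address, as html.unescape does here.
print(html.unescape(encoded) == addr)  # True
```

The first three rely on a human reader undoing the change by hand, while the entity version is undone by the browser, so a harvesting bot that decodes entities would defeat it.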
Don’t post messages online (that’s no fun!)
Only post anonymous messages online (still not fun!)
Post messages online that don’t require your e-mail address (ok, that’s ok)
Using different e-mail addresses
E-mail addresses that expire / Disposable e-mail addresses:
Change your e-mail address every few months (not an easy option for me, plus “I’m not going to run anymore!” :-)
Use a spam bucket (use webmail or another e-mail account in public, but give out your private e-mail address only to a select group of people)
Quit using e-mail (sorry, I’m not the Unabomber); this may include just no longer receiving e-mail, but someone who only sends but does not receive sounds like a spammer to me!
Solutions Regarding Sources of OUTPUT (stop Spam from being sent to others FROM YOU):
- Use virus scanner / virus shield program
- Don’t use spamware software
Solutions That Utilize OUTPUT (sending, replying to, or forwarding messages to stop Spam):
- Reporting service (Spamcop.net)
- Writing your governmental representative (how does this stop Spam??) / Or sending your Spam to them.
- Consumer protection agency
- Request to be removed from a spammer’s e-mailing list. (There are tons of problems with this, including: bogus companies that don’t give a damn if you make such a request; reply addresses can be fake; you often get put on a sucker’s list if you reply to such messages; my having to exert more effort than the original time that the message wastes defeats the whole purpose. Thankfully this technique works with real companies - in theory)
- Send out a fake bounced message (possible with a program called Bounce Spam Mail. This program is only good for messages with valid return addresses. Also, spammers can put down someone else’s return address and you might end up spamming them unknowingly!)
- Mail bomb spammers (Flood their e-mail inbox with messages). (Evil, and doesn’t work if the return address is fake or stolen - it’s also philosophically hypocritical)
Solutions Regarding ACCUMULATION (dealing with Spam when it has already arrived):
Filter Spam (after it arrives to your computer)
- Custom Filter (one you design yourself). See my example of recommended filter keywords and phrases for Outlook Express [Note: This list is outdated, and not updated lately since I’ve disregarded this method.]
- Automatic Filter (provided by other entity)
- Distributed Filter (multiple people working constantly together to have a dynamically updated filter: CloudMark)
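A custom keyword filter of the kind mentioned above boils down to counting suspicious phrases. Here is a bare-bones sketch in Python; the phrase list, the threshold, and the function names are invented for illustration and are not the Outlook Express list referred to in the text.

```python
# Hypothetical phrase list; a real one would be longer and tuned over time.
SPAM_PHRASES = ["free money", "act now", "click here"]


def spam_score(subject: str, body: str, phrases=SPAM_PHRASES) -> int:
    """Count how many listed phrases occur in the subject or body (case-insensitive)."""
    text = f"{subject}\n{body}".lower()
    return sum(1 for phrase in phrases if phrase in text)


def is_spam(subject: str, body: str, threshold: int = 2) -> bool:
    """Flag a message once it matches at least `threshold` phrases."""
    return spam_score(subject, body) >= threshold


print(is_spam("FREE MONEY inside", "Act now and click here!"))  # True
print(is_spam("Lunch on Friday?", "Click here for the menu"))   # False
```

Requiring more than one match is a crude guard against false positives, but a fixed list still goes stale as spammers change their wording, which is the weakness a distributed filter tries to address.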
Filter Spam (at mail server; censorship??)
Filter Spam at each mail router (censorship??)
Filtering service (Spamcop.net)
Have a filter that only accepts people you have pre-approved. This is a perfect solution for non-solicited e-mail, but you won’t get to hear from anyone new. :-(
Get a secretary! (human to filter e-mail for you; this option makes you lose privacy and subjects you to possible censoring)
Use artificial intelligence to filter e-mail. (not advanced enough yet)
- The DELETE key :-)
- Read your subject headers only, and based on those you can personally decide to view or delete messages (pretty good method)
Unsubscribe (on one hand it sometimes works, while on the other hand it may tell spammers your address really exists - sorta like the equivalent of a “sucker list” for postal junk mail. I think it’s a good method only with businesses and organizations you recognize and trust.)
Contact your ISP about spam message (AOL keyword “tosspam”) so they can block it on their end
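The pre-approved sender filter listed in this section is easy to sketch. The Python below checks the address in a From header against an approved set; the addresses and the function name are placeholders, and since the From header can be spoofed (as noted earlier), this is only a sketch of the idea, not a secure filter.

```python
from email.utils import parseaddr

# Hypothetical list of senders you have pre-approved.
APPROVED = {"friend@example.com", "boss@example.org"}


def is_approved(from_header: str, approved=APPROVED) -> bool:
    """Return True if the address in a From header is on the approved list."""
    # parseaddr splits 'Display Name <addr>' into (name, addr).
    _name, address = parseaddr(from_header)
    return address.lower() in approved


print(is_approved("Alice <Friend@Example.com>"))  # True
print(is_approved("stranger@example.net"))        # False
```

Anything that fails the check could be deleted or, more gently, moved to a folder you skim occasionally so that new contacts are not lost entirely.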
Solutions from Outside the SYSTEM:
- Denial of service attack to offending mail servers or spammer’s IP addresses (evil, ineffective, especially since spammers can use stolen ISP accounts)
- Act of God (keep praying!)
- Class-Action Lawsuit against spammers
- Sue spammer on your own
- Bot Bait - put up tons of fake e-mail addresses on websites in order to overload spam bots
Solutions that Change the SYSTEM:
- Create new e-mail system that is not compatible with current e-mail and have it institute Spam protection measures
- Hashcash/camram - change e-mail from a “receiver-pays” to a “sender-pays” system. This method upsets some people because it introduces inefficiency into the e-mail system in order to solve yet another inefficiency problem (spam). However, I like this idea very much since there is a clear need for “a mechanism to throttle systematic abuse of un-metered internet resources”, as explained in Hashcash’s overview. Also check out camram for an explanation of how hashcash can potentially benefit e-mail.
The Best Solution So Far: I feel hashcash/camram has the most potential to alleviate the spam problem. This revolutionary idea gives e-mail value, or more importantly, associates a cost with sending out e-mail (CPU time). Used in combination with filters, it would provide a very important measure of whether a message should be considered spam. In the meantime (until hashcash/camram gets past the concept stage), I am very hopeful about the success that distributed filters like CloudMark may be able to provide. It takes my old favorite solution to spam, “Get a secretary!”, and expands it to an even higher level.
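The “cost in CPU time” idea can be illustrated with a toy proof-of-work stamp in Python. This is a simplified sketch, not the real hashcash stamp format: the `resource:counter` layout, the function names, and the default of 12 bits are assumptions chosen so the example runs quickly. What it does share with hashcash is the principle of searching for a hash with a required number of leading zero bits.

```python
import hashlib
from itertools import count


def verify(stamp: str, resource: str, bits: int = 12) -> bool:
    """Check that the stamp names the recipient and hashes to `bits` leading zero bits."""
    if not stamp.startswith(resource + ":"):
        return False
    digest = hashlib.sha1(stamp.encode()).digest()
    # A SHA-1 digest is 160 bits; shifting keeps only the top `bits` bits.
    return int.from_bytes(digest, "big") >> (160 - bits) == 0


def mint(resource: str, bits: int = 12) -> str:
    """Try counters until a stamp verifies (about 2**bits attempts on average)."""
    for counter in count():
        stamp = f"{resource}:{counter}"
        if verify(stamp, resource, bits):
            return stamp


stamp = mint("user@example.com")          # sender pays: thousands of hashes
print(verify(stamp, "user@example.com"))  # True; receiver pays: one hash
```

The asymmetry is the point: one legitimate message costs the sender a moment of CPU time, while a bulk mailing has to pay that cost for every recipient, and each added bit roughly doubles it.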
Good luck fighting spam! And happy analyzing to all! :-)
This manifesto was originally written July 15, 2001. For a decade, the article resided at http://www.vgmusic.com/~mike/an_analytical_look_at_spam.html I have moved it to this blog location to encourage discussion using the Disqus system below. Your feedback would be appreciated!
Note: due to the age of this article, many external links no longer work. I have replaced as many as possible with Internet Archive copies or Wikipedia articles. Any other broken links have been marked as broken, but left unchanged.
As of September 2012, I have been using Gmail as my e-mail system of choice. Its spam filtering methods get rid of most of my spam. The issues discussed above are still of great concern for people who run their own e-mail servers, and spam continues to be an ongoing battle even a decade after this article was written.