Reducing Spam - The Honeypot
I would like to clarify some points on using “the honeypot” method of spam filtering.
The basic method goes like this: you place a decoy input field within your form and hide it from human users, but not from spam bots. The bots come along, fill in every field they find and submit the form. When you process the form on the server, you can then reject any submission in which the decoy field is not empty.
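To make the server-side step concrete, here is a minimal sketch in Python. It assumes the form is posted as ordinary URL-encoded data and that the decoy field is named `website`; that name and the function `is_spam_submission` are hypothetical, chosen only for illustration. The only rule it applies is the one described above: a legitimate user leaves the decoy empty, so any value in it marks the submission as spam.

```python
from urllib.parse import parse_qs

# Hypothetical name for the decoy field; pick something a bot is likely to fill in.
HONEYPOT_FIELD = "website"


def is_spam_submission(body: str) -> bool:
    """Return True if the decoy field in a URL-encoded form body was filled in."""
    # keep_blank_values=True keeps an empty decoy as [''] instead of dropping it.
    fields = parse_qs(body, keep_blank_values=True)
    # A missing decoy is treated like an empty one here; you may prefer to reject it.
    values = fields.get(HONEYPOT_FIELD, [""])
    # Any non-empty value means something (most likely a bot) wrote in the hidden field.
    return any(value != "" for value in values)


if __name__ == "__main__":
    human = "name=Alice&comment=Nice+post&website="
    bot = "name=Bot&comment=Buy+now&website=http%3A%2F%2Fspam.example"
    print(is_spam_submission(human))  # False
    print(is_spam_submission(bot))    # True
```

Checking for any value at all, rather than stripping whitespace first, is deliberate: a human never types into the decoy, so even a single space is suspicious.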
This all works well, but to make it a little more useful and user-friendly I want to clarify a few points:
- Use CSS to hide the input and its label. Do not use a hidden input (type="hidden"): it announces itself in the markup, and any spam bot programmer with half a brain can get round that.
- If you do use CSS, please tell the users who won’t see your styling, e.g. those using screen readers or text browsers, that they should not enter anything in the field. Make your label something like “Leave this blank” or “Don’t write anything here”. Don’t compromise usability, and try not to confuse your users.
- JavaScript processing is all very well, but remember that a raw HTTP request is all a bot needs to post data to your processing script. Don’t bother checking for spam in JavaScript; do it on the server.
- Combine techniques. Once you have got rid of the bots that fall for the honeypot, consider checking for known spam keywords (if you need some ideas, check your inbox). There are loads of different techniques you can use and combine.
- Moderate! Don’t rely solely on your code: read your comments once in a while and answer them. This will make you a better blogger and help build a sense of community, and you can pick up the spam that has escaped the net and, of course, learn how to stop it getting through again.
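Several of the points above can be combined into one server-side step. The Python sketch below is only an illustration under stated assumptions: the decoy field name `website`, the keyword list, the markup and the function `classify_comment` are all hypothetical. The markup hides an ordinary text input off-screen with CSS instead of using type="hidden", and labels it “Leave this blank” for anyone who does encounter it. The function then applies the honeypot first and a keyword check second, and holds keyword matches for moderation rather than deleting them, so a human still gets the final say.

```python
import re

# Hypothetical decoy field name and spam keywords; substitute your own.
HONEYPOT_FIELD = "website"
SPAM_KEYWORDS = ("viagra", "casino", "payday loan")

# Hypothetical form markup: the decoy is a normal text input moved off-screen
# with CSS rather than type="hidden". Its label tells anyone who still meets it
# (e.g. text-browser or screen-reader users) to leave it blank, and
# tabindex="-1" keeps keyboard users from tabbing into it.
HONEYPOT_MARKUP = """
<style>.hp { position: absolute; left: -9999px; }</style>
<p class="hp">
  <label for="website">Leave this blank</label>
  <input type="text" id="website" name="website" tabindex="-1" autocomplete="off">
</p>
"""


def classify_comment(fields: dict[str, str]) -> str:
    """Return "reject", "moderate" or "accept" for a submitted comment."""
    # Layer 1: the honeypot. Any text in the decoy field means a bot filled it in.
    if fields.get(HONEYPOT_FIELD, ""):
        return "reject"
    # Layer 2: keyword check on the comment body (case-insensitive, whole words).
    text = fields.get("comment", "").lower()
    for keyword in SPAM_KEYWORDS:
        if re.search(r"\b" + re.escape(keyword) + r"\b", text):
            # Hold for a human to review rather than silently deleting it.
            return "moderate"
    return "accept"


if __name__ == "__main__":
    print(classify_comment({"comment": "Great post!", "website": ""}))         # accept
    print(classify_comment({"comment": "Cheap CASINO bonus", "website": ""}))  # moderate
    print(classify_comment({"comment": "hello", "website": "http://x"}))       # reject
```

Sending keyword matches to a moderation queue instead of rejecting them outright keeps false positives recoverable, which fits the advice to read your comments yourself.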
It would be interesting to hear any more ideas you have on this issue.
August 4th, 2007 at 3:48 am
I’ve written an HTML parser (because MSXML throws exceptions for malformed code), and your first point, that it is more difficult for a bot to find fields hidden with CSS, is right on the mark. I’m using my parser to “scrape” my own site, which I developed in Notepad++ with the data stored only in HTML files. I need the info in SQL so I can build a CMS around it. It’s pretty easy to walk a DOM tree, but “applying” CSS in memory without actually rendering anything is more difficult.
But your last point is probably the best advice. It’s the least interesting from a code point of view, but probably gives the best results.
August 6th, 2007 at 5:18 pm
I appreciate your input over on geeksnotnerds. I have been following the exchange between you and Flyswat, I have found your input valuable, and I will be implementing something there shortly with your suggestions and his in mind. Thanks for the advice!
August 6th, 2007 at 5:22 pm
By the way, not sure if you are aware, but under Firefox 2.0.0.6 and IE 6 this comment box is breaking your layout because of its width. Not trying to be a dick, just pointing it out in case you missed it; it’s an easy fix, obviously.
(Under Firefox, on short posts it runs underneath your RSS feed button, and under IE 6 it forces the right column to render below it, as well as changing the width of your layout above it. It’s just messy.)