Keeping spam out of Google Forms
Okay, the answer to my question from yesterday seems to be: there is no good answer. Looking around, I find a lot of people frustrated at Google Forms' lack of even primitive captcha capability, and some who are getting 1000+ spam entries a day in their public forms.

There seem to be three practical workarounds. [Which I will hide behind the nifty new lj-spoiler tag, for those who don't care.]

First is the one suggested by laurion and Aaron: wrap the Google Form in a homebrew page that *does* contain adequate security measures. This is the most effort, but allows more or less arbitrary security -- at that point, you can implement pretty much whatever you think is appropriate. IMO it's overkill for the project at hand, but probably best for any form that is going to get really wide distribution. (And thus, is likely to get more seriously attacked.)

Second is an idea talked about in some of the Google forums: use page branching to honeypot the spambots to death. While Google Forms is quite simplistic, it *does* have basic branching capabilities: that is, depending on Answer A on page 1, you get to page 2, 3, or whatever. So you can theoretically define a single answer which is obviously correct to a human, but not to a computer, and branch to a dead end if the wrong answer is given, or even just sit and loop back to the same page. Nice in principle; however, it only works for multiple-choice answers, so a spambot that simply chooses those randomly is still going to get through unacceptably often.
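To put a rough number on "unacceptably often": a bot that picks uniformly at random among the choices clears a gate question with probability 1/k, where k is the number of choices. The sketch below is just that arithmetic in Python; the function name and the sample numbers are mine, and it assumes the bot really does guess uniformly and independently on each gate.

```python
def pass_probability(choices_per_question: int, gate_questions: int = 1) -> float:
    """Chance that a bot guessing uniformly at random clears every gate question."""
    if choices_per_question < 1 or gate_questions < 0:
        raise ValueError("need at least one choice and a non-negative gate count")
    return (1 / choices_per_question) ** gate_questions


# Illustrative combinations only; real forms will differ.
for choices, gates in [(4, 1), (4, 3), (10, 1)]:
    p = pass_probability(choices, gates)
    print(f"{choices} choices, {gates} gate(s): {p:.2%} of random guesses get through")
```

So with a single four-choice gate, about a quarter of random submissions still land; on a form seeing 1000 spam entries a day, that is on the order of 250. Chaining several gate pages helps, but it also makes the form more tedious for real people.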

So the third option, which I think is probably best for the use case I have, is a variant of that: have a question that requires *text* entry, for which the answer is trivially easy for the intended audience but unguessable for a spambot, ideally one that requires at least slight knowledge of the problem domain. Then you add a script to the spreadsheet itself (yes, as it turns out Google Docs is pretty nicely scriptable) that fires when a form is submitted, checks the entry, and deletes the row if the answer isn't correct. The spam still generates a lot of traffic for Google, but it is basically invisible to us.
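The check-and-delete logic is small enough to sketch. What follows is a Python model of it, not the script itself: the real thing would live in the spreadsheet's own scripting environment and be hooked to the form-submit event. Here the sheet is just a list of rows, and every name (`normalize`, `is_human`, `on_form_submit`, the "chess" answer) is a stand-in I made up for illustration.

```python
def normalize(answer) -> str:
    """Lowercase and collapse whitespace so ' Chess ' and 'chess' compare equal."""
    return " ".join(str(answer).casefold().split())


def is_human(row: list, answer_column: int, expected: str) -> bool:
    """True if the gate-question cell in this row holds the expected answer."""
    if answer_column >= len(row):
        return False  # the cell is missing entirely; treat as a failed answer
    return normalize(row[answer_column]) == normalize(expected)


def on_form_submit(sheet: list, new_row_index: int,
                   answer_column: int, expected: str) -> bool:
    """Keep the freshly submitted row if the answer checks out, else delete it.

    Returns True if the row was kept, False if it was removed.
    """
    if is_human(sheet[new_row_index], answer_column, expected):
        return True
    del sheet[new_row_index]
    return False


# Toy data: a header row plus two submissions, with column 2 as the gate question.
sheet = [["Timestamp", "Name", "Gate answer"]]
for submission in (["t1", "Alice", "  Chess "], ["t2", "bot", "buy pills"]):
    sheet.append(submission)
    on_form_submit(sheet, len(sheet) - 1, answer_column=2, expected="chess")

print(sheet)  # only the header and Alice's row remain
```

Normalizing before comparing is a deliberate choice: the point is to stop bots, not to punish a real person for a stray capital letter or trailing space.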

It takes only a few minutes to implement (assuming you have an appropriate question for your audience), so it's a reasonable workaround until and unless Google comes up with a real answer. (Indeed, it's probably *better* than ordinary captchas in most cases, because it's less mental effort for your real audience...)

(Deleted comment)
Don't know, but I've never heard of any sort of metering of Google Docs beyond number of users. (This isn't a Google App -- it's a straight-up use of Google Docs itself.) I suspect that it's unlikely to rise past the level of noise from Google's perspective, but it's hard to know for sure.

At this point, my attitude is "good enough to start with" -- worse comes to worst, if Google Docs turns into a PITA at some point, it's ridiculously easy to export the data and move to something realer...

I think that third option is something I've been seeing in the wild a lot lately on forum registration software. "What is the name of this game?" and similarly obvious questions.

Yup. Called Knowledge or Logic captchas. They ask for information not encoded directly into the question. The goal is to improve accessibility for the blind, since computer image recognition software has long made most of the image-based captchas useless. textCaptcha.com and revCaptcha.com are two services you can plug in to apps, but the forum versions all seem to let you create your own questions, which I think is a good thing. Identical questions across multiple sites are ripe for pre-attack computing, plus being able to add or remove questions quickly makes it easy to dispose of a question when the bots do find a way of answering it.

Looking at it another way: Knowledge Captchas *scale* nicely. If it requires even a small amount of out-of-band knowledge that requires human intervention, it is usually going to evade spambots unless the prize is so clearly worthwhile that it's worth someone's time figuring out the right answer. (Cynically -- you don't have to have perfect spam protection, just good enough that it's easier to go spam that other site over there.)

Hadn't really occurred to me until I came up against this problem, but I'll have to keep the trick in mind. Besides being mildly effective, it is *way* easier on your users if you choose the right questions, and far less annoying than traditional captchas have become...
