Spam, spam and more [bloody] spam
If I have one wish for the future, a perpetual wish, it is that the algorithms and agents that detect spam, whether aimed at mail, registration bots, servers or scripts, improve faster than the spammers can evolve. Admittedly, a lot has improved; spam is not in decline, but at least where mail filtering is concerned things are better, especially compared to, say, ten years ago. Good news. But if you have a website, private or enterprise, or plan to build one with user registration and user applications, and don't want to maintain those processes manually, there are a few things you'll have to look out for.
This post is about preventive measures against spam and fake registrations.
Fake user registrations - shit that may (or will) eventually happen
If you're unlucky, it will happen much sooner than you thought possible. In fact, it's amazing how quickly trash finds your website address. Sometimes it seems trash finds you faster than you can produce (other types of) trash yourself. Digital karma? Nature's digital way of getting back at us, the chief trash makers? I'm joking (with some degree of irony, of course), but over the past few months I've seen some stuff that really had me wondering.
Anyhow, there are of course a variety of ways to scan for web domains: probing search engines, indexes and other public sources. The code and languages behind these scanners have become as elaborate and prying as the friendlier spiders used by Google and other search engines. This is the price we pay for transparency, a cost tied to our desire, or need, for exposure. Sources of information are publicly available, and they need to be, or your project, company or website will never be found. For anyone who depends on some degree or form of exposure, that is a fate almost worse than digital death (which is an illusion anyway).
Most of the CMSs (Content Management Systems) available today are, at one time or another, haunted by bots chasing down domain names and installations: scripts, or plagues, configured to look for standard paths, i.e. directories and specific files. All CMS solutions have their standards, like /admin, /administrator, /user, /login and so forth. The bots are programmed to analyze documents, read meta-information, headers and links, and once they know which framework or solution you have installed, and hence the standard paths it uses, they hop on your admin or user registration pages.
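You can see this probing in your own access logs with a quick scan for the well-known paths. A minimal Python sketch, assuming the usual Apache/nginx combined log format and a small, hypothetical set of probe paths:

```python
import re

# Paths bots commonly probe; a few well-known CMS defaults, adjust to taste.
PROBE_PATHS = ("/wp-login.php", "/administrator", "/user/register", "/admin")

def find_probes(log_lines):
    """Return (ip, path) pairs for requests that hit known probe paths."""
    hits = []
    for line in log_lines:
        # Assumes combined log format: client IP first, request line quoted.
        m = re.match(r'(\S+) .*"(?:GET|POST) (\S+)', line)
        if m and any(m.group(2).startswith(p) for p in PROBE_PATHS):
            hits.append((m.group(1), m.group(2)))
    return hits

sample = [
    '203.0.113.9 - - [10/Oct/2023] "GET /wp-login.php HTTP/1.1" 404 162',
    '198.51.100.7 - - [10/Oct/2023] "GET /about HTTP/1.1" 200 512',
]
print(find_probes(sample))  # only the first line is a probe
```

Run that against a day's worth of logs and you'll likely be surprised how much of your "traffic" is exactly this.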
Sooner than you can say cake, and/or by your next login, you discover that your site has attracted registrants! But none you'd wish for! Over the coming weeks or months the traffic of probing bots steadily increases, to such an extent that for some bewildering moments it almost looks as if you're going statistically viral. It's only when you begin to dissect the statistics that you realize a substantial amount of this not-too-friendly traffic has absolutely nothing to do with people actually finding and reading your stuff.
Preventive solutions and actions
1. CAPTCHA: acronym for "Completely Automated Public Turing test to tell Computers and Humans Apart". It surfaced around 14 years ago, when some clever guys came up with what was then a brilliant idea: a shit-preventing measure. For a long time the principle acted as a frontier against a lot of automated rubbish, and the concept evolved, gaining an increasing number of refinements to make it more difficult to crack. The principle may still serve a purpose, but no longer on its own. So if you have or maintain web forms, for registration, comments or other purposes, and these are anonymously available, CAPTCHA is today just one of the measures.
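The idea behind the principle can be sketched in a few lines: the server issues a challenge a human can solve and keeps a signed token of the answer, so nothing needs to be stored between requests. A toy illustration in Python (a trivial arithmetic question, nowhere near a production CAPTCHA; the secret is a placeholder):

```python
import random, hmac, hashlib

SECRET = b"change-me"  # placeholder: a server-side secret for signing answers

def make_challenge():
    """Generate a trivial arithmetic question plus a signed token of its answer."""
    a, b = random.randint(1, 9), random.randint(1, 9)
    token = hmac.new(SECRET, str(a + b).encode(), hashlib.sha256).hexdigest()
    return f"What is {a} + {b}?", token

def verify(answer, token):
    """Check the submitted answer against the signed token, timing-safely."""
    expected = hmac.new(SECRET, str(answer).encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, token)

question, token = make_challenge()
# The question goes in the form; the token travels with it as a hidden field.
```

Real CAPTCHAs distort images or audio instead of asking arithmetic, precisely because text questions like this are trivial for a bot to parse, which is also why the principle no longer works alone.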
2. IP filtering: nowadays an option that comes with most of the more elaborate Content Management Systems. It may still offer some degree of protection if your audience or focus is mostly confined to one country, and visitors will therefore be using ranges of IP addresses assigned to that country. Such ranges are identifiable, but you will need a firewall to deflect the traffic properly. Individual filtering can be done in your CMS, but since IP addresses can be spoofed, and the nuisance you're trying to deflect uses a range of addresses, possibly from around the globe, you might as well forget about it. No sooner have you added an address to your filter than the rubbish is back, now using an entirely different one.
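Checking whether a visitor falls inside your country's address ranges is straightforward once you have the ranges; the hard part is keeping the list current, and remembering that addresses can be spoofed. A small Python sketch using the standard ipaddress module, with made-up example ranges (real ones come from your regional registry):

```python
import ipaddress

# Hypothetical CIDR ranges standing in for "your country's" allocations.
ALLOWED_RANGES = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]

def is_allowed(ip):
    """True if the address falls inside one of the allowed ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in ALLOWED_RANGES)

print(is_allowed("203.0.113.42"))  # True: inside the first range
print(is_allowed("192.0.2.1"))     # False: outside both ranges
```

In practice you'd push the allowed ranges into the firewall rather than check per request in application code; the CMS-level version shown here is the weaker fallback described above.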
3. Changing standard addresses: here's a solution that will at least reduce the amount of bot traffic. If you change the standard paths, those who run spamming operations or bots will have to modify their scripts to search for (your) alternative path names, and that means the scripts must try many options. Changing standard paths is good practice. More elaborate CMS solutions like WordPress, Joomla and Drupal quite likely have a module or plugin to support such changes, which avoids the need for code modifications that you would otherwise have to maintain manually.
Example: for Drupal there is a module that permits quick changes to standard paths with no impact on the core application; in other words, no modification that will conflict with platform updates. For instance, change http://mydomain.com/user/register to http://mydomain.com/folks/register . Just by making this change, the bots won't be able to find your registration form... at least not without modifications.
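If your CMS lacks such a module, a similar effect can be approximated at the web-server level. A hypothetical nginx fragment along those lines, using the example paths above (your real setup, and the hand-off to the CMS, will differ):

```nginx
# Serve the registration form only under the alternative path.
location = /folks/register {
    rewrite ^ /user/register last;  # internal rewrite; the visible URL stays /folks/register
}

# The well-known default is only reachable via the internal rewrite above;
# bots requesting it directly get a plain 404.
location = /user/register {
    internal;
    # ...usual CMS hand-off (fastcgi_pass / proxy_pass) goes here...
}
```

The point is the same as with a CMS module: the standard path stops answering, so the probing scripts come up empty.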
4. Spambot trappers: these modules, applications or products come in various shapes and forms, with different functionality, some good, some not so good. Very likely you will not solve any spam/bot issue with just one of them. The more elaborate ones are connected to a central database or register and can do look-ups to check whether a given token, address or session is associated with a blacklisted IP address. This is a great solution, but it can slow down a session, and if the service times out or is unavailable, it might disrupt a legitimate login or registration. So you need other lines of defense: modules or applications that can comprehend what is going on.
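That timeout concern is worth handling explicitly: if the central blacklist is slow or down, you usually want to fail open rather than block legitimate registrations, and let the other defenses catch what slips through. A Python sketch, where `lookup` stands in for whatever hypothetical client you use to query the remote service:

```python
def check_blacklist(ip, lookup, default=False):
    """Query a central blacklist via lookup(ip), a callable wrapping the
    actual network query (HTTP, DNSBL, whatever the service offers).
    Fail open: if the service errors out or times out, treat the address
    as clean instead of disrupting a legitimate login or registration."""
    try:
        return bool(lookup(ip))
    except Exception:
        return default

# Usage with stubbed lookups standing in for the remote service:
print(check_blacklist("203.0.113.9", lambda ip: True))    # listed -> True
print(check_blacklist("198.51.100.7", lambda ip: False))  # clean  -> False

def broken(ip):
    raise TimeoutError("blacklist service unavailable")

print(check_blacklist("198.51.100.7", broken))  # service down -> fail open, False
```

Wrapping the remote call this way keeps the look-up useful without making it a single point of failure for your registration flow.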
Most of the bots have something in common when they hit your login or registration page. One pattern is a ridiculous username combined with a fake email address that has absolutely nothing to do with the name of the account. More often than not, a username with no spaces is attempted, so enforcing the use of a space in usernames might be one way to deflect shit. I use a variety of protections; alone, none of them solves all challenges, but I've worked out a combination that does the trick, and the combo differs depending on the CMS I'm using. For Drupal, take a closer look at Spamicide and BOTCHA in particular.
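Those patterns translate into cheap heuristics you can run before any expensive check: flag usernames without spaces, and email addresses whose local part has nothing to do with the account name. A rough Python sketch (the rules are my own guesses at the patterns described above, not a recipe):

```python
import re

def bot_flags(username, email):
    """Return a list of heuristic red flags for a registration attempt."""
    flags = []
    # Pattern 1: bots rarely bother with a space in the username.
    if " " not in username:
        flags.append("no-space-username")
    # Pattern 2: email local part unrelated to the account name.
    # Compare letters only, so "j.doe" can still match "J Doe".
    local_letters = re.sub(r"[^a-z]", "", email.split("@")[0].lower())
    name_letters = re.sub(r"[^a-z]", "", username.lower())
    if (name_letters and local_letters
            and name_letters not in local_letters
            and local_letters not in name_letters):
        flags.append("email-name-mismatch")
    return flags

print(bot_flags("xrumer9382", "cheapmeds@example.com"))  # both flags raised
print(bot_flags("Jane Doe", "jane.doe@example.com"))     # no flags
```

Heuristics like these are exactly the "comprehend what is going on" layer: no single rule is decisive, but together with a trapper module and a challenge they shift the odds considerably.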
Some final words
Now, if it's not your forte to get web things working, to install and configure CMS stuff or build websites, then leave the spam-preventive measures, along with the rest, to someone who knows how to deal with them, and have it set up for you. Call it an SEP (Someone Else's Problem). If you're a customer or the owner of a website, simply make it part of your requirements, or part of the specs. Anticipate that if you plan, at one point or another, to cater to users across the internet, provide services for private or business purposes, publish newsletters and so on, there will be attempts to flood your site with shit. It's not a question of IF, it's a question of WHEN.
Thank you for reading this blob....er.. blog. I hope it brought some sense to the table.