Who's winning the billion dollar Google Dance - Google or the spammers?

© Icqurimage 2006

During the short period of history over which search engines have governed Internet traffic, there have been those manipulative minds which have devoted themselves to redirecting their results. The difference between appearing in the top three and the bottom three of a Google search display can mean the difference between commercial success and failure. As a result Search Engine Optimisation (SEO), both legitimate and illegitimate, has become a $6 billion industry (2005 figures), with profiteers on both sides of the legal divide. Whilst it is perfectly legitimate to charge $50 to submit a domain name to the search engines, there are many other lucrative activities which distort the principles of fairness upon which the search engine providers pride themselves. Some of these are prohibited by Google, other practices are even illegal, but with millions of dollars at stake and the vast wildernesses of Cyberspace in which to operate, most outlaws are willing to take their chances.
The first SEO techniques intended to boost search engine rankings appeared in the 1990s with the advent of the first search engines. The earliest of these involved adding hidden ‘meta tags’ to web pages to serve as a guide to the page’s content. To this feature Google added a ‘Page Rank’ system which graded a page based upon the number of incoming links, or recommendations, to it. The impact of each link in the Google ratings was in turn determined by the relevance of the linking page’s content and by that page’s own ranking.
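The idea that a link counts for more when it comes from a page that is itself well ranked can be sketched in a few lines. What follows is a minimal illustration based on the originally published PageRank formulation, not Google's production algorithm, which is secret. The damping factor of 0.85, the function name and the four-page example web are all assumptions made for the sake of the sketch.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Toy PageRank. `links` maps each page to the list of pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start with equal rank everywhere

    for _ in range(iterations):
        # Every page receives a small baseline share regardless of its links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its own rank equally among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: three pages recommend "shop".
web = {
    "home": ["shop", "blog"],
    "blog": ["shop"],
    "forum": ["shop"],
    "shop": ["home"],
}
ranks = pagerank(web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy web the page with the most incoming recommendations comes out on top, which is precisely the property that link sellers and link farms set out to exploit.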
This simple system did not survive the sand storms of capitalism, and soon a market emerged in buying and selling links which helped commercial sites to boost their Google rankings. As a countermeasure Google developed secret ‘off-site’ measures to determine the relevance of a link to the page it was recommending. Although most of these measures are kept highly secret to thwart optimisers, this does not deter a snake oil industry of those who claim to know the innermost secrets of Google’s Page Ranking system. As with any set of laws, whether created by the Supreme Court or by Google, there are loop-holes and an entire industry has emerged ready to exploit them. Such ‘unethical’ practices in distorting a site’s apparent popularity on the search engines are commonly known as ‘black hat’ techniques.

The ‘Black Hat’ operatives

So called ‘link spammers’ ply their trade in the form of links to sites. Some have even devised programs which automatically generate links and content on message boards, blogs and forums that recommend sites by linking through to a commercial page.
Another popular black hat technique is ‘keyword spam’, which simply means that irrelevant or superfluous keywords are crammed onto a page in the form of nonsense passages or ‘hidden’ text, deliberately selected to deceive search engines as to the true nature of the page’s content. Some adult sites list every popular adult search term or model whether they feature them or not, and others are more deceptive still. It isn't just the dark back street offices of the adult, diet and software industries which use keyword spam. Elite car manufacturer BMW famously used the term ‘gebrauchtwagen’, or used car, no fewer than forty times on its front page for new car sales in order to tempt consumers looking for used cars to its web site. Google blacklisted BMW, as it does any keyword spammer.
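One crude way of spotting this kind of stuffing is to measure how much of a page is taken up by a single term. The sketch below is purely illustrative: the function names and the 20% threshold are assumptions, and nothing here reflects how Google actually detects keyword spam.

```python
import re
from collections import Counter


def top_keyword(text):
    """Return (word, count, share_of_all_words) for the most frequent word."""
    words = re.findall(r"\w+", text.lower())
    if not words:
        return None
    word, count = Counter(words).most_common(1)[0]
    return word, count, count / len(words)


def looks_stuffed(text, max_share=0.2):
    """Flag a page when one word exceeds an (arbitrary, assumed) share of the text."""
    result = top_keyword(text)
    return result is not None and result[2] > max_share


# Hypothetical page: one term repeated forty times ahead of the real copy.
page = "gebrauchtwagen " * 40 + "new bmw cars for sale"
print(top_keyword(page)[:2])   # ('gebrauchtwagen', 40)
print(looks_stuffed(page))     # True
```

A real filter would need to be far more forgiving of ordinary repetition, but the principle of comparing a term's frequency against the rest of the page is the same.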
So-called ‘Spamdexers’ attempt to increase page rankings by tricking the complex Google equation (or algorithm) into assigning their pages greater weighting. ‘Cloakers’ use page redirect commands so that clicking on a page for one product results in another page appearing for a different or unrelated product. This is even used to divert traffic from a rival’s web site to that of a competitor. ‘Google-aters’ design web pages whose content serves no other purpose than to boost their Google search engine ranking.
Search engine spamming is defined as those ‘pages which are created intentionally to deceive a search engine into offering inappropriate, useless, or low quality search results.’ There is of course plenty of subjective ‘wiggle room’ within this definition to blacklist rivals or disliked competitors. The moral contention is that web sites should be designed primarily for the consumption of the end user rather than for the search engines. However, as any webmaster will tell you, you may ignore SEO criteria at your peril. Yes, directly marketed web sites can gain popularity, but do you really have the time to wait for your Internet investment to grow solely through personal recommendations and donated links?
The Grand Grimoire of black hat spamming includes at least sixteen categories which include page redirects, Key word ‘stuffing’, mirror sites, tiny text, doorway pages, link farms, cloaking, keyword stacking, gibberish, hidden text (for example text so small that it looks like lines or has the same colour as its background), domain spam, hidden links, micro-sites that are solely designed to capture traffic, bait & switch page swapping, typo spam, and cyber squatting (getting domain names that are similar to those of a coveted rival site).
Yahoo's Tim Mayer described the dilemma of SEO in a nutshell when he told a black hat conference, ‘If you're being entirely organic and going after 'Viagra,' it's like taking a sword to a gunfight. You just aren't going to rank.’ For ultra competitive categories such as the adult industry, gambling and travel the simple rules of the market ensure that only those who test the absolute limits of SEO acceptability will survive. However there is a definite limit (wherever that may be), and any spam site which subverts the search engine's methods of ranking pages with black hat techniques will be black-balled (if they are caught).
There is one thing which the Black Hat Brigade has achieved, and that is to unite the search engines, for the first time in their ten-year war for supremacy, in agreeing to do something about it. This something is known as the ‘nofollow’ attribute for links. This new indexing command will protect webmasters from being accused of search engine spamming, especially those who run blogs or forums. When rel=‘nofollow’ is added as an attribute inside the anchor hypertext tag (the one that says ‘go somewhere’) of any link, it flags that link as one to ignore when the search engine spider indexes the page. This will prevent the owner’s blog or forum from being black-listed through unvetted links.
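In practice the attribute sits inside the opening anchor tag alongside the href. A blog or forum could apply it to every link in a visitor's comment before publishing it. The following is a minimal sketch of that step, assuming a simple regular expression is good enough for illustration; the function name is hypothetical, and production software would normally use a proper HTML parser.

```python
import re

# Matches an opening <a ...> tag and captures its attributes.
A_TAG = re.compile(r"<a\b([^>]*)>", re.IGNORECASE)


def add_nofollow(html):
    """Add rel="nofollow" to every anchor tag that has no rel attribute."""
    def fix(match):
        attrs = match.group(1)
        if re.search(r"\brel\s*=", attrs, re.IGNORECASE):
            # Existing rel values are left alone in this sketch;
            # a stricter filter would overwrite them.
            return match.group(0)
        return "<a" + attrs + ' rel="nofollow">'
    return A_TAG.sub(fix, html)


comment = 'Great post! <a href="http://example.com/pills">cheap pills</a>'
print(add_nofollow(comment))
# Great post! <a href="http://example.com/pills" rel="nofollow">cheap pills</a>
```

The link still works for a human visitor; it is only the search engine spider that is told not to treat it as a recommendation.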
Another rarely mentioned form of spamming is the generation of unsolicited links to competitors which are placed upon those sites which use black hat techniques. These sites effectively blacklist their competition. If you think this hasn't happened to you, just look at how many unsolicited links there are when you enter the name of your web site on Google. Enter Icqurimage into a Google search and see how many adult companies have provided unsolicited links to the site in an attempt to condemn its SEO image. To extrapolate this argument to its ultimate commercial conclusion, keep an eye out for litigation regarding unsolicited links from black hat sites.
Link spam is usually associated with blogs, forums, and those sites which still operate their predecessor, the guest book. Any such open access publishing site is vulnerable to an avalanche of ‘posts’ that may contain link spam and referrals to other product sites. Sites that link to others are, as with any social recommendation, deemed to be only as reputable as their source. A good link is seen as a good vote by a search engine, and a bad link as a negative recommendation. Appropriately, web sites that use ‘black hat’ techniques are known as bad neighbourhoods in search engine parlance. The only question that remains is how Google achieves this discrimination, if indeed it actually does. Google gives linking pages a Page Rank value, determined in part by how many click-throughs are required to reach the page from the entry page, and also by how many pages link to it from other sites, and what their respective Page Rankings are.
Examples of ‘comment spam’ may be found all over the Internet, and these consist simply of spurious comments which may or may not be pertinent to the topic covered in the forum or blog. For example, a dating forum may have an entry stating that you should improve your sex life with Cialis or Viagra, which then points to a link that sells them. Comment spam frequently contains a string of keywords and often makes no sense whatsoever. Google hopes that flagging links with the Nofollow attribute will make blogs, forums and guest books less vulnerable to such spamming techniques. As with the eternal battle between the virus and our immune system, this will merely mean that spammers will be forced to pause a few days to reflect and to try something new, especially given that much blog spamming is automated. However, there will be those sites which will not adopt Nofollow, and these will continue to be targets for blog spamming engines until Google finally blacklists all heavily spammed blogs and forums, otherwise euphemistically termed ‘bad neighbourhoods.’ Those who employ automated link partnering tools should beware. More recently, owing to improvements in forum and blog software, link spam has increasingly focused upon ‘wikis’, open source encyclopedias such as Wikipedia, which decided not to use nofollow tags and to use a spam blacklist instead.
There are other defences which may be used against comment spam. Turing tests use various means to require inputs and registrations to be made by hand, including so-called captcha gateways. Server-side redirects are also used effectively, but these are more complicated, time-consuming and involve increased traffic. Redirecting such links prevents Google from factoring the link into its Page Ranking system, rendering the spam ineffective. There are also distributed approaches for dealing with link spam. As a given web site domain which is running a spam campaign will usually only deposit a single link per site, if the spammer uses different IP (computer or server) addresses, they are almost impossible to trace or to block. One solution is the free service LinkSleeve which communicates between blogs, guest books, forums and wikis on different servers and strips the posted data from the sites if a given threshold of linking activity is passed. Some software companies have even developed their own customised anti-spamming measures, creating white lists and black lists which prevent noted IP addresses from posting links, or filtering out notorious spam keywords.
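Two of these defences, the keyword black list and the threshold on linking activity, are simple enough to sketch. The example below is an assumption-laden illustration only: the class name, the threshold value and the example domains are invented, it runs inside a single program rather than across servers, and it is not the interface of LinkSleeve or of any real product.

```python
import re
from collections import Counter
from urllib.parse import urlparse

URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


class LinkSpamFilter:
    """Rejects comments by black-listed keyword or by repeated link domain."""

    def __init__(self, keyword_blacklist, domain_threshold=3):
        self.keyword_blacklist = {word.lower() for word in keyword_blacklist}
        self.domain_threshold = domain_threshold  # assumed, arbitrary limit
        self.domain_counts = Counter()            # links seen so far per domain

    def is_spam(self, comment):
        # 1. Keyword black list: any notorious word rejects the comment.
        words = set(re.findall(r"[a-z0-9]+", comment.lower()))
        if words & self.keyword_blacklist:
            return True
        # 2. Link threshold: count each linked domain once per comment and
        #    reject once a domain has been posted more often than allowed.
        domains = {urlparse(url).netloc.lower()
                   for url in URL_PATTERN.findall(comment)}
        for domain in domains:
            self.domain_counts[domain] += 1
        return any(self.domain_counts[d] > self.domain_threshold
                   for d in domains)


spam_filter = LinkSpamFilter({"viagra", "cialis"}, domain_threshold=2)
print(spam_filter.is_spam("Improve your life with Viagra"))        # True
print(spam_filter.is_spam("Nice site http://shop.example/offer"))  # False
```

Because the counter ignores the poster's IP address and looks only at the domain being promoted, it is not defeated by a spammer who posts from many different machines, which is the point of the distributed approach described above.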
The fundamental flaw which all comment spammers have been exploiting is that open comment sites such as wikis, forums and blogs allow HTML to be posted free of charge. The problem is that such pages are heavily ranked by Google's existing algorithm due to the extensive interlinking of blogs. Fifty or so blogs linking to your site would create a firestorm effect on your Google rankings. Until Google and other search engines treat such sites as chat sites and reduce their impact upon Page Rankings, the incentive will remain and the problem will persist.
One of the most interesting pieces of journalism in recent years was an experiment conducted by the Guardian journalist Bobbie Johnson, who sought during the dying embers of 2005 to see if he could trick Google with a spoof site that he had created. Over the course of a week he used a variety of black hat ‘tricks’ to push his spoof site to the top of the Google rankings. His spoof site promoted eco-friendly flip-flops, a spurious product which was advertised as producing no harmful emissions (a disclaimer was added to the site to make the nature of his experiment clear, together with a picture of a pair of flip-flops). First he overloaded the page with keywords and added invisible metadata. These efforts failed to promote the site’s Google ranking above a further 11,000 results for ‘eco-friendly flip-flops’. Mr. Johnson then turned to hard-core black-hat tactics. He created a second site which contained a large number of links to the first. Within hours his spoof site was rising through the search engine rankings, and within a few days it had reached the top, above 11,500 others. So it appears that Google is not quite as black hat resistant as we were led to believe.

A slight case of ‘Google Bombing’

Another much used and poorly understood term is ‘Google bombing’. Google bombs are pages which are engineered to appear when a specific phrase or word string is entered into a Google search. One of the first and most famous Google bombs appeared in 1999 when the Google search phrase ‘more evil than Satan’ led to the home page of Microsoft. Google denied any responsibility. As with all SEO techniques there are darker and lighter shades of grey.
Recently Wordpress, a common piece of blogging software, was reported to have used hidden advertising on its website to create a revenue stream. A company called Hot Nacho has a system whereby it pays a small sum for short articles to be written on random topics and then pays hosts such as Wordpress, which possess a high Google Page Ranking, to host thousands of such articles and to link to them. Wordpress allegedly hid these links far from the roving human eye using a ‘CSS trick’ in exchange for a flat monthly rate. This ‘google bombing’ helped to pay for an otherwise popular open source project. Such practices however were deemed to constitute a violation of Google’s ‘cloaking content’ policy and have damaged both Wordpress’ reputation and its Google ranking.
Some define a ‘Google bomb’ or ‘washer’ as an attempt to influence page ranking through the use of consistent anchor text in all links directed to that page. A so-called ‘Google bomb’ is created if a significant number of sites link to the same page using the same keyword string. For example, if a user registers many website domains and all of them link to a primary site using the link text ‘cleverer than Einstein’, then a Google search using the phrase ‘cleverer than Einstein’ will push the linked site to the top of the rankings even though the phrase does not appear on the site itself. One of the most common forms of Google bombing involves exploiting blogs, where even a small number of such link phrases can produce a short-lived Google bomb before they are removed, provided that they are not highly competitive phrases such as ‘elite model’. Google bombs are also renowned for their humour. For instance, the Google search phrase ‘Internet star baloney’ returned the name of actress Claire Danes as top ranking, although other less adult search terms could of course be employed. Google bombing was a flaw discovered by Adam Mathes in 2001, and this allowed him to deduce the nature of the Google algorithm that is used to calculate page rankings. He tested his theory by making the website of his colleague the number one result for a search query of ‘talentless hack’. A search for ‘miserable failure’ on June 1, 2005 returned George W. Bush’s biography as the top-ranked search item on most search engines.
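Why a page can rank for a phrase that appears nowhere on it becomes clearer with a toy model. The sketch below assumes a search engine that scores a page for a phrase by counting the distinct sites linking to it with that exact anchor text, plus one point if the phrase occurs on the page itself. The scoring rule, the function names and the example domains are all invented for illustration; the real weighting is known only to Google.

```python
from collections import defaultdict


def build_anchor_index(links):
    """links: iterable of (source_url, target_url, anchor_text) tuples."""
    index = defaultdict(lambda: defaultdict(set))
    for source, target, anchor in links:
        # Record which distinct sources point at each target with this text.
        index[anchor.lower()][target].add(source)
    return index


def rank_for_phrase(phrase, page_text, anchor_index):
    """Return page URLs ordered by a toy relevance score for the phrase."""
    phrase = phrase.lower()
    scores = defaultdict(int)
    # One point if the phrase actually appears on the page.
    for url, text in page_text.items():
        if phrase in text.lower():
            scores[url] += 1
    # One point per distinct site linking in with the phrase as anchor text.
    for target, sources in anchor_index.get(phrase, {}).items():
        scores[target] += len(sources)
    return sorted(scores, key=lambda url: (-scores[url], url))


# Hypothetical pages: only the first one mentions the phrase at all.
pages = {
    "http://physics.example/": "Was anyone ever cleverer than Einstein?",
    "http://target.example/": "Welcome to my home page.",
}
# Five throwaway domains all link to the target with the same anchor text.
bomb_links = [
    ("http://blog%d.example/" % i, "http://target.example/", "cleverer than Einstein")
    for i in range(5)
]
results = rank_for_phrase("cleverer than Einstein", pages, build_anchor_index(bomb_links))
print(results[0])  # http://target.example/
```

Under these assumptions five co-ordinated links are enough to outrank the one page that genuinely contains the phrase, which is the essence of the bomb.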
In response to these assaults on the integrity of its search results, Google has defended its algorithms as being merely an index of the popularity of certain topics on the Internet, denouncing Google ‘bombing’ as short-lived ‘cybergraffiti’. Google state in a press release that they do not ‘condone the practice of google bombing, or any other action that seeks to affect the integrity of our search results, but [that they are] also reluctant to alter our results by hand in order to prevent such items from showing up. Pranks like this may be distracting to some, but they don't affect the overall quality of our search service, whose objectivity, as always, remains the core of our mission’ (Google Blog). Since February 2005, Google have been changing their algorithm specifically to counter threats such as Google bombs. This allegedly has knocked out 90% of all known Google bombs. This has been largely achieved due to Google’s restructuring of its Page Ranking system to meet rising standards in result relevancy maintained by Yahoo and MSN.
As Yahoo, AltaVista and HotBot use similar algorithms to Google, they are likewise affected by Google bombs. Some search engines such as Ask Jeeves and MetaCrawler produce different search results, as they are ‘metasearch’ engines which collate data from other search engines. Google bombs are easily constructed. For instance, select a keyword or phrase such as ‘some like it spicy’, and then a URL, e.g. http://www.aadvarkconsulting.com . Put the two together within hyperlink tags and post these on web sites and forums all over the Internet, and should such a company exist they might be red-faced. (I do not personally condone such behaviour, but the point is self-evident.)
Google-bombing techniques have also been applied to spamdexing, which simply means the posting of links to sites on Internet forums together with the phrase that is being promoted. Such exponents often target forums such as wikis with less traffic to remain undetected. Another tricky technique is for a domain name to set up the Domain Name Server (DNS) entry so that all sub-domains are directed to the same server. As all sub-domains are interlinked with desirable keywords, and as Google regards each sub-domain as a distinct site, many sub-domains linking to one another effectively enhances the Google Page Rank of these sub-domains and of any other pages that they link to.
One of the big Google bombing issues was raised through the activities of marketing giant Quixtar, who began a ‘Quixtar Web Initiative’ intentionally to manipulate Google search results. In essence it was a propaganda war which was designed to depress negative evaluations of Quixtar to the lower reaches of the Google search results after negative comments had been published by consumer protection groups, in an eBook, and by other detractors of its alleged pyramid schemes. Quixtar’s early attempts backfired and saw them banished to the lower reaches of the Google search engine results. However in January of 2006 Quixtar.com bounced back to the top of the search engines as a result of the number of sites which were critical of Quixtar’s business activities, mainly on the same medium (blogs) that Quixtar had sought to subvert. Quixtar are also alleged to have used independent lines of sponsorship to achieve a Google bombing effect. What is interesting is that those condemning Quixtar’s Google bombing campaign did more to promote its search engine ranking than the Quixtar campaign itself. This reveals yet other unexpected flaws within the Google algorithm.
On a lighter note here are some amusing top-ranked Google bombs:

‘Buffone’ returns the unofficial biography for Italian Premier Silvio Berlusconi; ‘buffone’ is the Italian word for ‘clown’. ‘Miserabile fallimento’, the Italian for ‘miserable failure’, also returns the link for the official Silvio Berlusconi biography.

‘French military victories’ directs to a faked Google page suggesting that there is no such thing as a French military victory (not entirely true as Napoleonic historians will inform you).

‘gastrointestinal dysentery’ gives a search result of Kres Chophouse & Lounge of Orlando, Florida, a restaurant which fired a server for blogging about his work there.

‘Great President’ returns the biography of George W. Bush on the official White House website. Perhaps the first government sanctioned Google Bomb. However entering ‘moronic politician’ returns a link on ‘Bush’s brain’.

‘liar’ yields Tony Blair (on google.co.uk), who was accused of misleading the public over weapons of mass destruction in Iraq. ‘poodle’ returns the link to a Tony Blair biography on a UK domain search.

‘Orgy convention’ returns the Democratic political Convention, and ‘political sexual deviant’ gives Bush’s advisor’s policy on American family values, but then again this is published on the Democrats home page. Clearly Google bombing has entered the mainstream of political mudslinging.
Whatever the strategy, the evolution of predator and prey continues at a great pace on the Internet. For every problem solved two more are created, and for every spam-busting technique that is designed, two new ways of subverting search engine results are uncovered. But as Professor John Maynard Keynes discovered, it is this cat and mouse game which keeps everyone on their toes and drives innovation, and besides, if people didn't break windows there would be no work for the people who fix them.