This is Spinn3r's official weblog, where we discuss new product direction, feature releases, and all our cool news.

Spinn3r is a web service for indexing the blogosphere. We provide raw access to every blog post being published - in real time. We provide the data and you can focus on building your application / mashup.

Spinn3r handles all the difficult tasks of running a spider/crawler including spam prevention, language categorization, ping indexing, and trust ranking.

If you'd like to read more about Spinn3r you could read our Founder's blog or check out Tailrank - our memetracker.

Spinn3r is proudly hosted by ServerBeach.



Post-mortem of an Advanced Spam Attack

Tailrank (and Spinn3r) suffered a spam attack over the weekend of July 21st. The attack was advanced in both its scope and its technical execution. It consisted of a large number of doorway pages that redirected to another site, which tried to install malware on the victims' computers.

Source of Inbound Links to Doorway Pages

We're still analyzing the source of inbound links to the doorway pages in this attack. The attacker used vulnerable blog sites to link to .edu domains which hosted the actual content.

This analysis is made more difficult by the fact that the sites hosting the links often clean up the offending data before we can analyze it.

Hosting Source for Pages

All of the content was hosted on compromised websites. These pages seem to have resulted from more than one security hole. For instance the content hosted on:


... was probably due to a file upload hole where the attacker found a way to upload files.

On the other hand URLs such as:

http://smallschools.ischool.washington.edu:8000/d_www/generic-levitra.html
http://webtango.ischool.washington.edu:8002/x_www/lesbian-incest.html
http://webtango.ischool.washington.edu:8002/a_www/japanese-girls.html

... seem to have been served by HTTP daemons that the attackers were able to install on those hosts.

Content of Doorway Pages

The doorway pages on the hacked servers contain content that was designed to rank well on search engines. The pages were optimized for search by specifically targeting .edu domains with high PageRank.

The attacker is using machine-generated text that is weighted heavily towards a certain topic. For instance the URL:


... contains the words "japanese girls" in almost every sentence. The general strategy with this content is to rank high in searches for the target terms on search engines.
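One way to quantify this kind of keyword stuffing is to measure what fraction of a page's sentences contain the target phrase. The sketch below is only an illustration of that idea, not the filter we run: the function name `keywordSentenceRatio`, the naive sentence splitting, and the sample texts are all assumptions.

```javascript
// Fraction of sentences in `text` that contain `phrase` (case-insensitive).
// Sentence splitting on ., ! and ? is deliberately crude; it is an
// assumption made to keep the sketch short.
function keywordSentenceRatio(text, phrase) {
  const sentences = text
    .split(/[.!?]+/)
    .map((s) => s.trim().toLowerCase())
    .filter((s) => s.length > 0);
  if (sentences.length === 0) {
    return 0;
  }
  const needle = phrase.toLowerCase();
  const hits = sentences.filter((s) => s.includes(needle)).length;
  return hits / sentences.length;
}

// Hypothetical sample texts, invented for this example.
const stuffed =
  "Buy cheap widgets today. Cheap widgets are the best widgets. Find cheap widgets here.";
const ordinary =
  "We sell widgets. Our shop opened last year. Contact us for details.";

console.log(keywordSentenceRatio(stuffed, "cheap widgets"));
console.log(keywordSentenceRatio(ordinary, "cheap widgets"));
```

A ratio close to 1 for a phrase that also appears in the URL would be a reasonable signal to look at a page more closely. Any cutoff would have to be tuned against real data.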

Once they have managed to attract a user to the page, JavaScript code on the page redirects the user to the payload page. This redirect is heavily obfuscated, such that it is impossible to know that the page contains a redirect unless you execute the JavaScript.

Example redirect JavaScript:

if(tqfojokx969 = 'bi653')

Each of the vars being evaluated contains only a small portion of the redirect code:

var irvpoqb515='docu';
var ni271 ='ment';
var msippqydp980='.lo';
var bpz978='ti';

There are 30 more variables in random order, each of which forms part of the evaluated string.

The effect of this is that one cannot write a primitive scanner that can tell that there is a redirect in the code, which makes doorway detection difficult.
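To make the problem concrete, here is a rough sketch of a primitive pattern-based scanner and why it misses this style of redirect. Only the first four variable names and values come from the attack code quoted above. The remaining variables, the `example.invalid` target, the pattern list, and the `naiveScan` function are all invented for this illustration.

```javascript
// A stand-in for the doorway script. The last four lines are assumptions
// added so that the fragments form a complete statement.
const obfuscated = [
  "var irvpoqb515='docu';",
  "var ni271 ='ment';",
  "var msippqydp980='.lo';",
  "var bpz978='ti';",
  "var a1='ca';", // assumed
  "var a2='on';", // assumed
  "var a3='=\"http://example.invalid/\"';", // assumed placeholder target
  "eval(irvpoqb515+ni271+msippqydp980+a1+bpz978+a2+a3);", // assumed
].join("\n");

// The same redirect written plainly, for comparison.
const plain = 'document.location="http://example.invalid/";';

// A primitive scanner: look for well-known redirect idioms in the source.
const REDIRECT_PATTERNS = [
  /document\.location/,
  /window\.location/,
  /location\.href/,
  /http-equiv=["']?refresh/i,
];

function naiveScan(source) {
  return REDIRECT_PATTERNS.some((pattern) => pattern.test(source));
}

console.log(naiveScan(plain)); // true
console.log(naiveScan(obfuscated)); // false
```

The string "document.location" never appears in the obfuscated source. It only exists once the fragments are concatenated at run time, so no pattern over the raw text will match it.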

This is advantageous to the attacker as it means we cannot easily red-flag pages which contain a redirect. They have also hidden the target of the redirect, so even if we knew that the payload site was malicious we could not see that in the source of the document without executing the JavaScript.
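Executing the script against a stubbed-out `document` object is one way to recover the hidden target. The sketch below shows the idea on a benign stand-in script. Everything in it (`sample`, `findRedirectTarget`, the `example.invalid` URL) is hypothetical, and it is not how our crawler works. Note that `new Function` provides no isolation at all, so real samples should only ever be run inside a properly sandboxed process.

```javascript
// Benign stand-in for an obfuscated doorway script (all names invented).
const sample = [
  "var p1='docu';",
  "var p2='ment';",
  "var p3='.lo';",
  "var p4='cati';",
  "var p5='on';",
  "var t='=\"http://example.invalid/payload\"';",
  "eval(p1+p2+p3+p4+p5+t);",
].join("\n");

// Run `source` with `document` bound to a stub that records any value
// assigned to document.location. Returns the recorded target, or null.
function findRedirectTarget(source) {
  let target = null;
  const fakeDocument = {
    set location(value) {
      target = String(value);
    },
    get location() {
      return { href: "" };
    },
  };
  // WARNING: this runs the code in the current process with no isolation.
  // It is acceptable here only because `sample` is known to be harmless.
  const run = new Function("document", source);
  try {
    run(fakeDocument);
  } catch (err) {
    // A script that throws is treated as "no redirect observed".
  }
  return target;
}

console.log(findRedirectTarget(sample));
```

This only catches assignments to `document.location`. A fuller harness would also need to stub `window.location`, `location.href`, timers, and the other ways a page can navigate.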

Content of the Attack Page

The attack page contains a DHTML application that pretends to scan the victim's computer for malware and then offers a Windows .exe that will supposedly clean the computer of the malware that it "found."

In reality, the .exe is almost certainly itself malware. The Ajax was very well executed and looked identical to a Windows dialog box.


The code was also written to customize itself depending on which version of Windows was running and whether the browser was Internet Explorer:

is_XP_SP2 = (navigator.userAgent.indexOf("SV1") != -1) 
            || (navigator.appMinorVersion && 
            (navigator.appMinorVersion.indexOf('SP2') != -1));


if (navigator.appName.toLowerCase()=='microsoft internet explorer') {
    if (navigator.userAgent.toLowerCase().indexOf('opera') <= 0) {
        // ... (the rest of this excerpt was not captured)
    }
}
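The checks above can be restated as a small pure function, which makes it easier to see what the page is testing for. This is a sketch for analysis purposes only: `classifyClient` and the sample navigator objects are made up, and the Opera check is simplified to "the user agent does not mention Opera", on the assumption that this is what the original `<= 0` comparison was after.

```javascript
// Takes a navigator-like object and reports what the attack page's
// checks would conclude about it.
function classifyClient(nav) {
  const userAgent = nav.userAgent || "";
  const minorVersion = nav.appMinorVersion || "";
  const appName = (nav.appName || "").toLowerCase();

  // "SV1" in the user agent, or "SP2" in appMinorVersion, marks XP SP2.
  const isXpSp2 =
    userAgent.indexOf("SV1") !== -1 || minorVersion.indexOf("SP2") !== -1;

  // Opera could identify itself as Internet Explorer, hence the extra check.
  const isIE =
    appName === "microsoft internet explorer" &&
    userAgent.toLowerCase().indexOf("opera") === -1;

  return { isIE, isXpSp2 };
}

// Hypothetical navigator objects for illustration.
const ieOnXpSp2 = {
  appName: "Microsoft Internet Explorer",
  userAgent: "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)",
};
const operaAsIE = {
  appName: "Microsoft Internet Explorer",
  userAgent: "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; en) Opera 8.0",
};

console.log(classifyClient(ieOnXpSp2));
console.log(classifyClient(operaAsIE));
```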


The attacker tracked the referrer to detect which SEO spam campaigns and keywords were successful on specific doorway pages.

They can then use this data to determine which campaigns were most successful and focus their efforts on improving conversion.
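As a rough illustration of how this kind of tracking can work, the sketch below pulls the search terms out of a referrer URL and tallies them per doorway page. It is a guess at the general technique, not the attacker's code: the function names, the `q` query parameter, and the sample hits are all assumptions.

```javascript
// Extract the search terms from a referrer URL, or null if there are none.
// Assumes the search engine passes the query in a `q` parameter.
function searchTermsFromReferrer(referrer) {
  let url;
  try {
    url = new URL(referrer);
  } catch {
    return null; // empty or malformed referrer
  }
  const terms = url.searchParams.get("q");
  return terms ? terms.trim().toLowerCase() : null;
}

// Count visits per (doorway page, search terms) pair.
function tallyCampaigns(hits) {
  const counts = new Map();
  for (const { page, referrer } of hits) {
    const terms = searchTermsFromReferrer(referrer);
    if (terms === null) {
      continue;
    }
    const key = `${page} | ${terms}`;
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return counts;
}

// Hypothetical hits, invented for this example.
const hits = [
  { page: "/a.html", referrer: "http://search.example/search?q=Cheap+Widgets" },
  { page: "/a.html", referrer: "http://search.example/search?q=cheap+widgets" },
  { page: "/b.html", referrer: "" },
];

console.log(tallyCampaigns(hits));
```

The same referrer data is available to defenders: search terms that keep leading visitors to freshly created pages on unrelated hosts are a hint that a campaign is under way.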

Effect on Tailrank and Spinn3r

The majority of the attack affected Tailrank and not Spinn3r. Only about a dozen blogs linked through to the doorway pages involved in this attack. These blogs have either been blocked or have had their authors contacted and the spam removed.

Tailrank ended up promoting about 30 stories which were removed a few hours later.


All of these steps add up to make this a very advanced attack. The attack probably took one to two man-months of work to achieve.

Now it is likely that the work was performed where the cost of labor is much lower, but it is worth looking at the cost in these terms to understand just how motivated the attackers are.

This is an arms race that takes up a large portion of our time at Tailrank. While this attack did manage to get content into our index for a short time, it is worth noting that the amount that got through is small in comparison to the amount we block every hour.



"about a dozen blogs which linked through to the doorway pages involved in this attack."

Did these links appear in the body of the blog post or in follow-up comments? If these links were in blog comments, which are known to be heavily targeted for spam, why would Tailrank follow and rank such content?

Jeff Row


that is a proxy server. What you describe is all part of the web, and has been. Most systems are designed to work around spam with trusted networks or analysis. It seems that Tailrank is very 'lazy' in which links it follows and pages it indexes.

Btw, tech.tailrank.com is STILL full of spam as of this moment..

Kevin Burton

Niall. The links look like they were either on the sidebar or in the body of the post. They were weblogs which had ham but then converted to spam.

Jeff. It might be a REVERSE proxy but either way it doesn't matter. It's a totally valid HTTP URL.

"Most systems are designed to work around spam with trusted networks or analysis."

Ha... now they're not! Spam is too much of an issue on most existing systems.

Spam fixed btw... this was an artifact of the attack which we talk about here. Thanks.


Jeff Barr

Great analysis, Kevin.

The complexity of this attack certainly seems to indicate that the "value per victim" is relatively high. Someone really wants to compromise a lot of machines to make a lot of money.


Doorway pages are specially created to fool the search engines algorithm and
draw search engine visitors to a website. Doorway pages are Web pages
designed and built specifically to draw search engine visitors to your
website. They are standalone pages designed only to act as entry or door to
your websites. Usually these pages are theme based. They are also known as
portal pages, jump pages, gateway pages, and entry pages

Doorway pages are considered to be part of black hat and should not be used,
although many of seo companies use these pages for gaining more traffic.



The attacker tracked the referrer to detect which SEO spam campaigns and keywords were successful on specific doorway pages.


thanks you all

Palcom Web

Doorway pages are special pages that are made for the sole purpose of fooling the search engines crawlers. They are also known as bridge" pages or "doorway" pages, portal pages, jump pages, gateway pages, entry pages, and by other names as well. To search engines to improve their traffic. These are black hat techniques to fool the search engines algorithm, usually door WebPages are made to concentrate on a single keyword or keyword phrase. They are created to emphasize on a particular keywords and are present on the net as a door to the website. One problem with this is that these pages tend to be very generic. Since these pages are made for different keywords they are often marked as spam, since they carry duplicate content. Another problem is that these pages don’t provide the goal pages, they act as a bridge or doorway to the original website that is why they got the names as bridge" pages or "doorway" pages, portal pages, jump pages, gateway pages, entry pages etc webmasters usually propel visitors forward with a prominent "Click Here" link or with a fast meta refresh command.

msn indir

Thanks You All The Topic Beatiful

Joel Webb

Re the comment above;

oh the irony..


Great post though.

The comments to this entry are closed.