
This is Spinn3r's official weblog where we discuss new product direction, feature releases, and all our cool news.

Spinn3r is a web service for indexing the blogosphere. We provide raw access to every blog post being published - in real time. We provide the data and you can focus on building your application / mashup.

Spinn3r handles all the difficult tasks of running a spider/crawler including spam prevention, language categorization, ping indexing, and trust ranking.

If you'd like to read more about Spinn3r you could read our Founder's blog or check out Tailrank - our memetracker.

Spinn3r is proudly hosted by ServerBeach.


Massive Blog Spam Epidemic Gets More Attention

2008-04-07 12:13

We've been covering a massive blog spam epidemic caused by a nasty/evil spammer who's exploiting an XML-RPC bug in WordPress 2.2.

This issue is FINALLY getting the attention it deserves:

I had a closer look at many of the blogs concerned that had spammy content: pages promoting credit cards, pharmaceuticals and the like. I realized that if you go to the root domain, they are all legitimate blogs, not scraper blogs auto-generated with AdSense / affiliate links. That was extremely curious, and actually reminiscent of something that hit home a few months ago.

A few months ago, this blog got hacked, but in a sneaky way. Not only did the hackers insert “invisible” code into my template, so that I was getting listed in Google for all manner of sneaky (and NSFW) terms and the hacker got the affiliate cash whenever people clicked on those links, but *actually* they also inserted fake templates into my WordPress theme.

Techaddress is also covering this issue...

Oddly enough, Tailrank picks up on this spam because of our clustering algorithm. We cluster common links and terms across our blog index and promote these stories to our front page.
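To make the failure mode concrete, here is a minimal sketch of link clustering. It is not Tailrank's actual algorithm: the `cluster_links` function, the `(blog_id, links)` input format and the `min_blogs` threshold are all hypothetical. The idea is simply that a URL cited by many distinct blogs forms a cluster worth promoting, which is exactly what a spammer gets by planting the same link on a dozen hacked blogs.

```python
from collections import defaultdict


def cluster_links(posts, min_blogs=3):
    """Return (link, blog_count) pairs for links cited by at least
    `min_blogs` distinct blogs, most widely cited first.

    `posts` is an iterable of (blog_id, links) pairs (a hypothetical
    input format used only for this sketch).
    """
    blogs_per_link = defaultdict(set)  # link -> set of blogs citing it
    for blog_id, links in posts:
        for link in links:
            blogs_per_link[link].add(blog_id)  # a blog counts once per link

    clusters = [
        (link, len(blogs))
        for link, blogs in blogs_per_link.items()
        if len(blogs) >= min_blogs
    ]
    # Sort by popularity, breaking ties alphabetically for stable output.
    clusters.sort(key=lambda item: (-item[1], item[0]))
    return clusters


if __name__ == "__main__":
    # Three hacked blogs all plant the same (hypothetical) spam URL.
    posts = [
        ("blog-a", ["http://spam.example/pills"]),
        ("blog-b", ["http://spam.example/pills", "http://news.example/story"]),
        ("blog-c", ["http://spam.example/pills"]),
        ("blog-d", ["http://news.example/story"]),
    ]
    print(cluster_links(posts))  # [('http://spam.example/pills', 3)]
```

The spam URL clears the threshold while the legitimate story, cited by only two blogs, does not.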

Since we 'trust' sources based on their past behavior, when major A-list blogs like ZDNet get owned, we believe the links they publish are legitimate.
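As a rough illustration of why a hacked A-list blog is so damaging (again hypothetical, not Spinn3r's real trust ranking), suppose each distinct linking blog contributes its trust score and a link is promoted once the total crosses a threshold. The trust values, the `DEFAULT_TRUST` of 0.1 for unknown blogs and the 1.0 threshold are made up for this sketch:

```python
# Hypothetical trust scores; a real system would derive these from past behavior.
DEFAULT_TRUST = 0.1  # assumed score for blogs with no history


def link_score(linking_blogs, trust):
    """Sum the trust of each distinct blog that links to a URL."""
    return sum(trust.get(blog, DEFAULT_TRUST) for blog in set(linking_blogs))


def should_promote(linking_blogs, trust, threshold=1.0):
    """Promote a link once its combined trust reaches `threshold`."""
    return link_score(linking_blogs, trust) >= threshold


if __name__ == "__main__":
    trust = {"zdnet.example": 0.9}
    # Spam planted on one hacked A-list blog plus two unknown sites.
    print(should_promote(["zdnet.example", "spam1.example", "spam2.example"], trust))  # True
    # The same spam on three unknown sites alone stays below the threshold.
    print(should_promote(["spam1.example", "spam2.example", "spam3.example"], trust))  # False
```

The hacked high-trust blog supplies most of the score on its own, so a couple of throwaway sites are enough to push the spam link over the line.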

If we had a smaller index, this might be a bit easier to handle, but we're indexing 12M blogs within Tailrank and on Spinn3r.

Another way around this, of course, would be to blacklist every blog running WordPress 2.2 or earlier, but we're talking about millions of blogs here and we don't want to unfairly harm anyone.
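Spotting older installs is not hard in principle. Many WordPress themes of this era emit a generator meta tag such as `<meta name="generator" content="WordPress 2.2" />`. The sketch below assumes that tag is present, honest (it can be removed or faked) and written with `name` before `content`; the function names are hypothetical. Note that it flags every old install, hacked or not, which is why a blanket blacklist would be unfair.

```python
import re

# Assumes name= precedes content= in the tag; other orderings are not matched.
GENERATOR_RE = re.compile(
    r"""<meta\s+name=["']generator["']\s+content=["']WordPress\s+(\d+)\.(\d+)""",
    re.IGNORECASE,
)


def wordpress_version(html):
    """Return (major, minor) from a WordPress generator tag, or None if absent."""
    match = GENERATOR_RE.search(html)
    if match is None:
        return None
    return (int(match.group(1)), int(match.group(2)))


def runs_old_wordpress(html, max_version=(2, 2)):
    """True if the page advertises WordPress at or below `max_version`."""
    version = wordpress_version(html)
    return version is not None and version <= max_version


if __name__ == "__main__":
    print(runs_old_wordpress('<meta name="generator" content="WordPress 2.2" />'))  # True
    print(runs_old_wordpress('<meta name="generator" content="WordPress 2.5" />'))  # False
```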

To date, our approach has been to wait until Tailrank has identified the spam, and then blacklist any blogs that have been compromised.
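The reactive approach boils down to something like this sketch (hypothetical names; it assumes the spam links have already been identified, e.g. by clustering). Every blog that published a known spam link is blacklisted, and later posts from those blogs are dropped:

```python
def update_blacklist(blacklist, spam_links, posts):
    """Add every blog that published a known spam link to `blacklist`."""
    for blog_id, links in posts:
        if any(link in spam_links for link in links):
            blacklist.add(blog_id)
    return blacklist


def filter_posts(posts, blacklist):
    """Drop posts from blacklisted blogs before they reach the index."""
    return [(blog_id, links) for blog_id, links in posts if blog_id not in blacklist]


if __name__ == "__main__":
    posts = [
        ("blog-a", ["http://spam.example/pills"]),
        ("blog-b", ["http://news.example/story"]),
    ]
    blacklist = update_blacklist(set(), {"http://spam.example/pills"}, posts)
    print(filter_posts(posts, blacklist))  # [('blog-b', ['http://news.example/story'])]
```

The weakness is built in: a freshly hacked blog promoting a new spam URL passes straight through until that URL has itself been identified.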

Unfortunately, this is a war of attrition: the spammer just spends a few more days hacking another dozen or so sites.

The only positive aspect of this is that it's encouraging people to upgrade to WordPress 2.5.

We're also working on some secondary algorithms to catch this a bit sooner and we'll probably ship these in Spinn3r 2.5 which is due shortly.
