This is Spinn3r's official weblog where we discuss new product direction, feature releases, and all our cool news.

Spinn3r is a web service for indexing the blogosphere. We provide raw access to every blog post being published - in real time. We provide the data and you can focus on building your application / mashup.

Spinn3r handles all the difficult tasks of running a spider/crawler including spam prevention, language categorization, ping indexing, and trust ranking.

If you'd like to read more about Spinn3r you could read our Founder's blog or check out Tailrank - our memetracker.

Spinn3r is proudly hosted by ServerBeach.

Spinn3r 2.2 Released

Spinn3r 2.2 rolled out the door today.

We've been working on a much larger release, which is still pending, but we wanted to get some new functionality out the door for some of our more recent clients.

So what's new?

We've added the ability to register weblogs directly within Spinn3r. All that's necessary is to call a new source.register method with a link to a weblog or any URL that has an RSS feed and publishes dynamic content. Spinn3r will then do the rest. We'll fetch the HTML page, perform RSS autodiscovery, and then add the source to our source list and start crawling it in real time.
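To give a feel for what RSS autodiscovery involves, here is a minimal Python sketch. It is not Spinn3r's crawler code; it only illustrates the common convention of looking for link tags with rel="alternate" and a feed MIME type in a page's HTML. The names FeedLinkParser and discover_feeds are made up for this example, and example.com is a placeholder.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# MIME types commonly used to advertise feeds (an assumption for this sketch).
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkParser(HTMLParser):
    """Collects feed URLs advertised via <link rel="alternate"> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = {name: (value or "") for name, value in attrs}
        rel = attr.get("rel", "").lower().split()
        mime = attr.get("type", "").lower().strip()
        href = attr.get("href", "").strip()
        if "alternate" in rel and mime in FEED_TYPES and href:
            # Resolve relative feed links against the page URL.
            self.feeds.append(urljoin(self.base_url, href))


def discover_feeds(html, base_url):
    """Return the feed URLs advertised in an HTML document, in page order."""
    parser = FeedLinkParser(base_url)
    parser.feed(html)
    return parser.feeds


page = '<head><link rel="alternate" type="application/rss+xml" href="/feed.xml"></head>'
print(discover_feeds(page, "http://example.com/blog/"))
# prints ['http://example.com/feed.xml']
```

A real crawler would also have to cope with pages that advertise no feed at all, or several, which this sketch leaves to the caller.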

What's interesting is that this allows our clients to collaborate on weblog discovery. Spinn3r does a great job at discovering weblogs but there are some niche sources where we'd love to have a few more signals to help out in our spam detection.

This release also includes a number of fixes and improvements:

  • Our permalink crawler API can now filter by API tier.
  • We've added better mainstream media site detection.
  • A new post:resource_guid field is available within Spinn3r results to uniquely identify a post.
  • New publisher types have been added, including FORUM, CLASSIFIED, and REVIEW.

It sounds crazy, but we've also started a sub-project to allow Spinn3r to license spam content. We've had a few malware and anti-virus companies approach us looking for a solid stream of real-time spam posts. Unfortunately, Spinn3r wasn't set up to provide this, as 99% of our customers are only interested in ham.

This adds a new spam_probability backend variable which isn't exposed just yet. We'll allow our customers to add &spam_probability=x.x in their API call to control how much spam they want to receive.
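As a rough sketch of how a client might add this parameter once it is exposed, the Python helper below appends spam_probability to an existing API URL. Only the parameter name and the x.x format come from this post. The helper name, the example.com URL, the limit parameter, and the assumption that the value lies between 0.0 and 1.0 are placeholders for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_spam_probability(api_url, threshold):
    """Return api_url with spam_probability set to threshold (x.x format).

    Assumes the value is a probability in [0.0, 1.0]; the real accepted
    range is not documented yet.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0.0 and 1.0")
    parts = urlsplit(api_url)
    # Keep the existing query parameters, dropping any earlier setting.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "spam_probability"
    ]
    query.append(("spam_probability", f"{threshold:.1f}"))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Placeholder URL; substitute your own API call.
print(with_spam_probability("http://api.example.com/feed?limit=10", 0.3))
# prints http://api.example.com/feed?limit=10&spam_probability=0.3
```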

Believe it or not, some customers would like to boost their signal a bit and accept more spam as a tradeoff to get a bit more recall.

By default, content from a registered source will only be available to the client who registered it. This allows clients with niche requirements to index special feeds (search feeds being a good example) without hurting any of our other customers.

Spinn3r 2.5 is also right around the corner. It's taken us a bit longer than we had hoped to bring our new hardware online. You can read about our progress here on my personal blog.
