Spinn3r 2.2 Released
Spinn3r 2.2 rolled out the door today.
We've been working on a much larger release, which is still pending, but we wanted to get some new functionality out the door for our more recent clients.
So what's new?
We've added the ability to register weblogs directly within Spinn3r. All that's necessary is to call a new source.register method with a link to a weblog or any URL that has an RSS feed and publishes dynamic content. Spinn3r will then do the rest. We'll fetch the HTML page, perform RSS autodiscovery, and then add the feed to our source list and start crawling it in real time.
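To make the autodiscovery step concrete, here is a minimal sketch in Python of how RSS autodiscovery is commonly done: scan the page for `<link rel="alternate">` tags that advertise a feed. This is not Spinn3r's actual implementation; the names `FeedLinkParser` and `discover_feeds` are made up for illustration, and the page is passed in as a string rather than fetched.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# MIME types that conventionally mark a <link> tag as a feed.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkParser(HTMLParser):
    """Collects feed URLs advertised via <link rel="alternate"> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        # Attribute values can be None (e.g. <link rel>), so normalize them.
        attr = {name: (value or "") for name, value in attrs}
        rels = attr.get("rel", "").lower().split()
        mime = attr.get("type", "").lower().strip()
        href = attr.get("href", "").strip()
        if "alternate" in rels and mime in FEED_TYPES and href:
            # Feed links are often relative, so resolve against the page URL.
            self.feeds.append(urljoin(self.base_url, href))


def discover_feeds(html, base_url):
    """Return the feed URLs advertised in an HTML page (hypothetical helper)."""
    parser = FeedLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.feeds


if __name__ == "__main__":
    page = (
        '<html><head>'
        '<link rel="alternate" type="application/rss+xml" href="/feed.xml">'
        '</head><body>A weblog</body></html>'
    )
    print(discover_feeds(page, "http://example.com/blog/"))
```

A page that advertises no feed simply yields an empty list, which is the case where a crawler would have nothing to add to its source list.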
What's interesting is that this allows our clients to collaborate on weblog discovery. Spinn3r does a great job at discovering weblogs but there are some niche sources where we'd love to have a few more signals to help out in our spam detection.
This release also brings a number of fixes and improvements:
- Our permalink crawler API now adds the ability to filter by API tier.
- We've added better mainstream media site detection.
- A new post:resource_guid field is available within Spinn3r results to uniquely identify a post.
- New publisher types including FORUM, CLASSIFIED, and REVIEW.
It sounds crazy, but we've also started a sub-project to allow Spinn3r to license spam content. We've had a few malware and anti-virus companies approach us looking for a solid stream of real-time spam posts. Unfortunately, Spinn3r wasn't set up to provide this, as 99% of our customers are only interested in ham.
This adds a new spam_probability backend variable, which isn't exposed just yet. Once it is, customers will be able to add &spam_probability=x.x to their API calls to control how much spam they want to receive.
Believe it or not, some customers would like to boost their signal a bit and accept a bit more spam as a tradeoff for a bit more recall.
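As a rough illustration of what such a call might look like once the parameter is exposed, here is a small Python sketch that appends spam_probability to an API URL. Everything apart from the parameter name is an assumption: the base URL and the `limit` parameter are placeholders, the helper `build_api_url` is hypothetical, and the 0.0 to 1.0 range is a guess based on the value being a probability written as x.x.

```python
from urllib.parse import urlencode


def build_api_url(base_url, params, spam_probability=None):
    """Build a query URL, optionally adding spam_probability (hypothetical helper)."""
    query = dict(params)
    if spam_probability is not None:
        # Assumption: the value is a probability, so it must lie in [0.0, 1.0].
        if not 0.0 <= spam_probability <= 1.0:
            raise ValueError("spam_probability must be between 0.0 and 1.0")
        # Format as x.x, matching the form shown in the post.
        query["spam_probability"] = f"{spam_probability:.1f}"
    return f"{base_url}?{urlencode(query)}"


if __name__ == "__main__":
    # Placeholder endpoint and parameter, not a real Spinn3r URL.
    print(build_api_url("http://api.example.com/feed", {"limit": 10}, 0.3))
```

Leaving spam_probability unset keeps the URL unchanged, which matches the idea that most customers would keep the default behavior.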
By default, content from a registered source will only be available to the client who registered it. This allows clients with niche requirements to index special feeds (search feeds being a good example) without hurting any of our other customers.
Spinn3r 2.5 is also right around the corner. It's taken us a bit longer than we had hoped to bring our new hardware online. You can read about our progress here on my personal blog.