SlyPIs out in the wild
I get really frustrated when websites don't have APIs. I know it's kinda mean to screen scrape, and potentially prohibited by T&Cs etc, but I built this anyway. I have final exams to revise for and not enough to procrastinate with!
So, what is all this SlyPI nonsense about then?
A SlyPI is a small (YAML-based) settings file that describes how to mechanically traverse a website in order to get a variety of different bits of information. For example, my tv.com example will allow you to search their database for shows by name, then use their 'showid' to find out about that show's episodes, its air times, genre and so on.
These settings files on their own aren't particularly helpful. Lucky for you, I built a Ruby gem that interprets them and creates a class with methods pertaining to that SlyPI, so you just do:
    require 'slypi'
    s = SlyPI.new("tv.com.slypi")
    s.SearchShows(:q => "Terminator")
And you'll get a list of shows that have the word Terminator in them. Cool ne?
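If you're wondering how a settings file can turn into methods, here is a rough sketch of the idea. It is not the gem's actual code: the settings layout (`url`, `params`), the `TinySlyPI` class and the example.com addresses are all made up for illustration, and it only builds the request URL rather than fetching or parsing anything.

```ruby
require 'yaml'
require 'uri'

# Hypothetical settings layout; the real .slypi format may look quite different.
settings = YAML.load(<<~SLYPI)
  SearchShows:
    url: "http://www.example.com/search"
    params: [q]
  ShowEpisodes:
    url: "http://www.example.com/episodes"
    params: [showid]
SLYPI

# Illustrative stand-in for the gem's class, not its real implementation.
class TinySlyPI
  def initialize(settings)
    settings.each do |name, spec|
      # Define one method per entry in the settings file.
      define_singleton_method(name) do |args = {}|
        # Keep only the parameters the settings file declares for this call.
        allowed = args.select { |key, _| spec['params'].include?(key.to_s) }
        "#{spec['url']}?#{URI.encode_www_form(allowed)}"
      end
    end
  end
end

s = TinySlyPI.new(settings)
puts s.SearchShows(:q => "Terminator")
```

The point is that the method names come from the file, so describing a new kind of request means adding an entry to the settings rather than writing more Ruby.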
So if a SlyPI exists for the site you want to use, this is simplicity itself, but why bother making one for a new site? When you screen scrape you run the risk that the site layout will change and everything will be broken. With SlyPI, even if it does, you only need to edit the SlyPI settings file and order is restored. You could feasibly write an application that depended on screen scraping that auto-updated when a newer version of the SlyPI was available.
Naturally, people get frustrated at screen scraping, so please remember that you use this at your own risk, and definitely don't attempt to download an entire website through it...
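To make the "only edit the settings file" point concrete, here is a small sketch under some loud assumptions: the `title_pattern` key, the `extract_titles` helper and both HTML snippets are invented for this example, and a regex stands in for whatever matching the gem really does.

```ruby
require 'yaml'

# Hypothetical helper: the extraction pattern is read from the settings,
# so the Ruby code never mentions the site's markup.
def extract_titles(html, settings_yaml)
  pattern = Regexp.new(YAML.load(settings_yaml)['title_pattern'])
  html.scan(pattern).flatten
end

old_html = '<li class="show">Terminator</li>'
new_html = '<div class="result"><h3>Terminator</h3></div>'

# Version 1 of the (made-up) settings matches the old layout.
settings_v1 = <<~'SLYPI'
  title_pattern: '<li class="show">([^<]+)</li>'
SLYPI

# Version 2 is the only thing that changes after the site redesign.
settings_v2 = <<~'SLYPI'
  title_pattern: '<h3>([^<]+)</h3>'
SLYPI

p extract_titles(old_html, settings_v1)  # works on the old layout
p extract_titles(new_html, settings_v1)  # finds nothing once the layout changes
p extract_titles(new_html, settings_v2)  # works again with the updated settings
```

The application code is identical in all three calls; only the settings string differs, which is what would make shipping an updated settings file on its own plausible.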
From tumblr archive