August 07, 2006

In Need of Standardisation: Library Feeds (New Books)

I've just been tinkering with a way of displaying, and hopefully adding value to, "new books in the library" feeds (a development of this post that can be found in evolving form at ouseful.ning.com) and I've hit a ***huge*** problem.

Non-standardised feed descriptions.

Actually - I should rephrase that - feeds that don't identify an ISBN somewhere within the link or description tag for each item. What I had hoped was to be able to write a regular expression that would parse the link and description fields looking for things that stood a good chance of being ISBNs, and then use these to key the added value items (book cover, contents, description etc.)
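As a rough sketch of the sort of thing I mean (not the actual script): the Python below pulls out runs of 10 or 13 characters that look like ISBNs and keeps only those whose check digit works out. The function names are made up for illustration, and it assumes the ISBN appears unhyphenated, as it does in the OU catalogue link quoted in this post.

```python
import re

# Candidates: 13 digits, or 9 digits plus a final digit or X, not embedded in a
# longer alphanumeric run. Assumption: no hyphens or spaces inside the ISBN.
ISBN_CANDIDATE = re.compile(r"(?<![0-9A-Za-z])(\d{13}|\d{9}[\dXx])(?![0-9A-Za-z])")


def is_valid_isbn10(s):
    # ISBN-10 check: weights 10 down to 1, total divisible by 11 (X counts as 10).
    total = sum((10 - i) * (10 if c in "Xx" else int(c)) for i, c in enumerate(s))
    return total % 11 == 0


def is_valid_isbn13(s):
    # ISBN-13 check: weights alternate 1, 3, 1, 3...; total divisible by 10.
    total = sum(int(c) * (3 if i % 2 else 1) for i, c in enumerate(s))
    return total % 10 == 0


def find_isbns(text):
    """Return the distinct, checksum-valid ISBN candidates found in text."""
    found = []
    for match in ISBN_CANDIDATE.finditer(text):
        cand = match.group(1).upper()
        ok = is_valid_isbn10(cand) if len(cand) == 10 else is_valid_isbn13(cand)
        if ok and cand not in found:
            found.append(cand)
    return found


link = ("http://voyager.open.ac.uk/cgi-bin/Pwebrecon.cgi"
        "?DB=local&Search_Arg=0120884364&Search_Code=ISBN&SL=None&CNT=10")
print(find_isbns(link))
```

The checksum test is there to cut down on false positives: any old 10 digit number in a description field would otherwise look like an ISBN.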

Maybe more by luck than judgement, I'd picked an OU feed (maybe the only feed?) that had a link structure for each item that contained an ISBN. I scraped this to index into various other services (book cover, for example, as well as book description and book contents listings).

However, it seems that most of the other feeds use a different way of describing the links (e.g. look at the link tag here (http://voyager.open.ac.uk/cgi-bin/Pwebrecon.cgi?BBID=349200) and here (http://voyager.open.ac.uk/cgi-bin/Pwebrecon.cgi?DB=local&Search_Arg=0120884364&Search_Code=ISBN&SL=None&CNT=10)).
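To make the difference between those two link styles concrete, here's a minimal sketch that reads the ISBN out of the query string when there is one. It assumes the Search_Arg and Search_Code=ISBN parameters behave the way they appear to in the second link; isbn_from_link is just an illustrative name, not part of any catalogue API.

```python
from urllib.parse import urlparse, parse_qs


def isbn_from_link(url):
    """Return the Search_Arg value if the link is an ISBN search, else None."""
    params = parse_qs(urlparse(url).query)
    # Only trust Search_Arg when the link says it is searching by ISBN.
    if params.get("Search_Code", [""])[0].upper() == "ISBN":
        args = params.get("Search_Arg")
        return args[0] if args else None
    return None


bbid_link = "http://voyager.open.ac.uk/cgi-bin/Pwebrecon.cgi?BBID=349200"
isbn_link = ("http://voyager.open.ac.uk/cgi-bin/Pwebrecon.cgi"
             "?DB=local&Search_Arg=0120884364&Search_Code=ISBN&SL=None&CNT=10")

print(isbn_from_link(bbid_link))  # no ISBN to be had from a BBID link
print(isbn_from_link(isbn_link))
```

The BBID style link gives this nothing to work with, which is exactly the problem.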

And there is no useful description info (though some of the feeds give location info - such as classmark and whereabouts the book is in the library - in the description field).

Admittedly the BBID link is cleaner - but how repurposable is each BBID? Is there a BBID2ISBN converter (and vice versa)? And if so, is it available as a web service (which would mean another shedload of hits on the server)?

So when I came to check the script with other OU new books feeds (they are arranged by topic) I found it doesn't work...:-(

Have a quick trawl of other libraries (just Google library new books rss) and you'll find there are as many - if not more - ways of "defining" a new books feed as there are libraries that publish them.

Not being a librarian, or even working for a library, yet reading a lot of Library 2.0 blogs, I've noticed there seem to be a lot of gripes in the community about vendors locking data up and making it hard to repurpose.

On the other hand, it seems that an increasing number of people are managing to subvert the system and produce feeds - like new books feeds - using their own scripts.

Just like in the OU.

But when I started using the OU library feeds 'properly' a week or two ago, I discovered that many of them were broken/didn't validate, contained lots of nonsense elements (e.g. ones that broke links) and so on.

And looking through a handful of feeds from a handful of other sites just now, I find exactly the same pattern - feeds with item description links that all point to the same catalogue entry for different items, and feeds with broken links. (You can't have spaces and words in a link - it just won't work... it seems that many library catalogues give out junk at the end of a link - so you have to clean it! Here's a tip - try running your feeds through http://rss.scripting.com/.)
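By way of illustration, here's about the simplest cleaning step I can think of: keep everything up to the first bit of whitespace and throw the rest away. It's a sketch that assumes the junk always comes after a space at the end of the link; clean_link and the example junk text are made up, not taken from any real feed.

```python
def clean_link(raw):
    """Keep only the part of a feed link before the first whitespace."""
    # split(None, 1) splits once on any run of whitespace and ignores
    # leading whitespace, so parts[0] is the first whitespace-free token.
    parts = raw.split(None, 1)
    return parts[0] if parts else ""


# Hypothetical example of a link with trailing junk.
dirty = "http://voyager.open.ac.uk/cgi-bin/Pwebrecon.cgi?BBID=349200 New Books"
print(clean_link(dirty))
```

This won't rescue a link that has a space in the middle of the address itself, but then that link was never going to work anyway.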

PS whilst looking at the various library feeds, I also had a look at feeds from LibraryThing. It seems these books are referenced according to a LibraryThing number which potentially maps onto several ISBNs (i.e. different versions of a particular book). I haven't checked to see if there is a strict one-to-one or many-to-one mapping between ISBN and LibraryThing number. Either way, an API that did round trip ISBN/LibraryThing number mappings would be handy (similarly for ISBNs and OCLC numbers).

PPS Yes I know ISBNs are dodgy things (as the previous PS shows) but for a new book in the library, it almost definitely will have a single ISBN (not one of several possible ISBNs) associated with it, won't it? (Or won't it? I'm talking 80% of the time here...).

In which case, can we please, please, please try and find a way of exposing it in a consistent way in new books feeds - and then we can build some reusable and scalable apps that can be used to demo services generally and not have to be specially built for each particular feed at each particular site...

(Maybe the citation microformat effort will be able to help out here? Or maybe standardised use of Dublin Core? Just by the by, this list of RSS Feed namespaced extensions came through on today's news).

Posted by ajh59 at August 7, 2006 12:52 AM
Comments

What a mess, eh?

For LibraryThing--this is the founder here--check out our thingISBN API (http://www.librarything.com/thingology/2006/06/introducing-thingisbn_14.php). It works like OCLC's xISBN. Feed it an ISBN and it'll send you back a list of ISBNs in that edition. These do, in fact, correspond to all the ISBNs that fall under LibraryThing's "work" number. In fact, I could easily give you ISBN-to-LT Work or LT Work-to-ISBN, but in practice nobody should use LT work codes except as a transient intermediary. Because of the unique group-cataloging, the "work" number changes daily. I know, CRAAAAZY! :)

RE: Similar books. Check out LT's similar books. There are various arguments for why they work better than Amazon's, particularly for a library, first among which is that they aren't trying to sell you anything.

Shoot me an email some time if you want to talk about these subjects or similar ones. This is the first time I've run into this blog, and I'm pretty excited about what you're doing!

Tim
tim [at] librarything.com

Posted by: Tim at August 7, 2006 03:09 AM