May 07, 2005

Hacking Voyager URIs

In a previous post (Embedding RSS Search...), I described a screenscraper for the OU Voyager library catalogue that produces an RSS feed of the search results.

The scraper was very crude and gave a fixed number of search results, which (as someone in the Library pointed out!) was all a bit contrived. As indeed it was....

So here's a new Voyager catalogue scraper that lets you decide how many search results you want reported (3 in this case, from a keyword search on technology):

http://ouseful.open.ac.uk/OUVoyagerScraperRSSNum.php?q=technology&num=3

It's quite refreshing that the screenscraper accepts limited compound search queries, for example on multiple keywords such as technology+internet.
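As a rough sketch of how a scraper feed URL like the one above might be put together programmatically, here's a bit of Python. The function name scraper_feed_url is made up for illustration; only the q and num parameters come from the URL shown above, and I'm assuming multiple keywords are simply joined with + as in the technology+internet example.

```python
from urllib.parse import urlencode

# Base address of the scraper, taken from the example URL in the post.
SCRAPER_BASE = "http://ouseful.open.ac.uk/OUVoyagerScraperRSSNum.php"


def scraper_feed_url(keywords, num=3):
    """Build a feed URL for the given keywords (hypothetical helper).

    Keywords are joined with spaces; urlencode() then turns each space
    into '+', which gives the technology+internet style of query.
    """
    query = " ".join(keywords)
    return SCRAPER_BASE + "?" + urlencode({"q": query, "num": num})


print(scraper_feed_url(["technology"], 3))
print(scraper_feed_url(["technology", "internet"], 5))
```

The first call reproduces the example feed URL; the second shows a two keyword query.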

But that's not really the point of this post. What I wanted to log here was a few things I've found out about how to construct Voyager queries in the address line.

A bit of playing with the standard Voyager catalogue search interface can be used to generate URIs to search result forms very easily, but reverse engineering the switches can be a laborious task. Picking apart the web form to see how queries are in principle constructed is another possibility (I generally use the Firefox web developer toolbar Forms tools for this sort of forensic activity) but again, it can be time consuming.

Tinkering around with the form led to a couple of interesting constructions, such as:

- class mark search, e.g. http://voyager.open.ac.uk/cgi-bin/Pwebrecon.cgi?SC=CallNumber&SA=6.31

- ISBN search, e.g. http://voyager.open.ac.uk/cgi-bin/Pwebrecon.cgi?DB=local&Search_Arg=0849304563&Search_Code=ISBN&CNT=50

With a bit of playing, it's easy to see what some of the switches do, and how they can be added to other searches.

For example, it's easy enough to work out that adding CNT=nn limits the number of search results returned to a maximum of nn.
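To make the switches a bit more concrete, here's a minimal Python sketch that assembles a search URI from the Search_Arg, Search_Code, DB and CNT switches seen in the ISBN example. The helper name voyager_search_url is my own invention, and I've only tried to mirror the ISBN form of the query; other search codes may well need different switches.

```python
from urllib.parse import urlencode

# Catalogue CGI address, taken from the example URIs in the post.
VOYAGER_BASE = "http://voyager.open.ac.uk/cgi-bin/Pwebrecon.cgi"


def voyager_search_url(search_arg, search_code, cnt=None, db="local"):
    """Assemble a catalogue search URI (hypothetical helper).

    cnt, if given, is added as CNT=nn to cap the number of results.
    """
    params = {"DB": db, "Search_Arg": search_arg, "Search_Code": search_code}
    if cnt is not None:
        params["CNT"] = cnt
    return VOYAGER_BASE + "?" + urlencode(params)


# Rebuilds the ISBN search shown earlier, capped at 50 results.
print(voyager_search_url("0849304563", "ISBN", cnt=50))
```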

If you want to tinker with the URIs generated from the web form search, here are a couple of pointers to making a persistent URI (the ones automatically generated in response to making a query through the web form are often littered with session information):

a) Edit the URL to remove the information beginning with PID. The PID, SEQ, and HIST numbers identify the search as belonging to a specific user session. Leaving them in will get you a time-out message rather than search results.

b) Add "DB=local&CNT=25" (or CNT=10, or whatever) to the end of your URL. This tells the computer which database to search and how many hits to display.
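Steps (a) and (b) can be sketched in Python along the following lines. This is an illustration under a few assumptions: the function name make_persistent is made up, the session values in the sample URI are invented placeholders, and rather than chopping off everything from PID onwards the sketch removes the PID, SEQ and HIST switches by name (which should amount to the same thing when they sit at the end of the URI).

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Switches that tie a URI to one user session, as listed in step (a).
SESSION_KEYS = {"PID", "SEQ", "HIST"}


def make_persistent(url, db="local", cnt=25):
    """Strip session switches and append DB and CNT (hypothetical helper)."""
    parts = urlsplit(url)
    # Keep every switch except the session ones; also drop any existing
    # DB/CNT so the values appended below are not duplicated.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in SESSION_KEYS and key not in ("DB", "CNT")
    ]
    kept += [("DB", db), ("CNT", str(cnt))]
    return urlunsplit(parts._replace(query=urlencode(kept)))


# The PID, SEQ and HIST values here are invented for the example.
session_url = (
    "http://voyager.open.ac.uk/cgi-bin/Pwebrecon.cgi"
    "?Search_Arg=0849304563&Search_Code=ISBN&PID=abc123&SEQ=20050507&HIST=1"
)
print(make_persistent(session_url))
```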

Here are a couple of queries I managed to deconstruct. The v1/v2 switch changes the reporting level (setting vn=0 is the same as not including it) and SID=1 appears to do the same job as DB=local:

http://voyager.open.ac.uk/cgi-bin/Pwebrecon.cgi?v2=1&Search_Arg=0849304563&Search_Code=ISBN&CNT=10&SID=1
http://voyager.open.ac.uk/cgi-bin/Pwebrecon.cgi?v1=1&Search_Arg=0849304563&Search_Code=ISBN&CNT=5&DB=local
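When deconstructing queries like these by hand, it can help to pull the switches apart and see exactly where two URIs differ. Here's a small Python sketch of that idea; query_params and diff_params are hypothetical helper names, and the sketch only compares the URIs as text, it doesn't tell you what any switch actually does.

```python
from urllib.parse import parse_qsl, urlsplit


def query_params(url):
    """Return the switches in a URI as a dict (hypothetical helper)."""
    return dict(parse_qsl(urlsplit(url).query))


def diff_params(a, b):
    """Map each switch that differs to its (first, second) values."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


first = query_params(
    "http://voyager.open.ac.uk/cgi-bin/Pwebrecon.cgi"
    "?v2=1&Search_Arg=0849304563&Search_Code=ISBN&CNT=10&SID=1"
)
second = query_params(
    "http://voyager.open.ac.uk/cgi-bin/Pwebrecon.cgi"
    "?v1=1&Search_Arg=0849304563&Search_Code=ISBN&CNT=5&DB=local"
)

# Switches shared with identical values (Search_Arg, Search_Code) are omitted.
for name, values in diff_params(first, second).items():
    print(name, values)
```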

There comes a point in playing, though, when sometimes you just want to cheat and look up what a particular switch does without having to try all the options. And this guide to URL-Launched Searches in ILLINET Online in WebVoyage does the job admirably.

The makers of Voyager also produce guidance, I think, but I haven't been able to see it... I do have a URL though (Voyager crib sheet), so if anybody can get me a copy of the actual document, I'd be very grateful :-)

Posted by ajh59 at May 7, 2005 12:37 AM