A few more thoughts on URL pipes and URL pipelines.
If you recall, the aim is to formalise in some way a scheme or pattern for feeding the output of one web service into another via URLs. The intention? To provide a simple mechanism for wiring the web using URL pipelines...
(Just trawling some old bookmarks, I stumbled across a couple of related items which I may need to re-read: Udell on URL-line commands and a piece on APIs and social software.)
For example, imagine something like the following, which would take an RSS feed from a social bookmarking system, generate an OPML version of it and pass this on to an OPML browser:
http://example.com/displayOPML?url=http://example.com/generateOPML?url=http://del.icio.us/tag/library
Loads of precedents exist already, of course, for example this XSLT knocked up by Danny Ayers:
http://www.w3.org/2000/06/webdata/xslt?xslfile=http%3A%2F%2Fpragmatron.org%2Fxslt%2Fdelicious-to-opml.xsl&xmlfile=http%3A%2F%2Fdel.icio.us%2Frss%2Fdanja%2Freadinglist%2Btech&transform=Submit
The form of this URL command is:
http://example.com/XSLT?xslfile=...&xmlfile=
and the output is an XML (OPML) file, which could in principle then be consumed by an OPML viewer, for example:
http://example.com/OPMLviewer?url=http://example.com/XSLT?xslfile=...&xmlfile=
In order to set up URL command pipelines, you need to have services that:
1) can be accessed via a URL (http://example.com/service or http://example.com/service2),
2) take a URL as an argument (http://example.com/service?url=), and
3) provide a page that can be fed into another service via its URL argument (e.g. http://example.com/service2?url=http://example.com/service?url=)
You can then start to wire different services together in one long URL 'command line':
(http://example.com/service2?url=http://example.com/service?url=http://example.com/data)
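One practical wrinkle: the inner URL needs percent-encoding so its own ?, & and = characters don't confuse the outer service (as in the W3C XSLT example above). A minimal sketch in Python, using made-up service URLs:

```python
from urllib.parse import urlencode

def wrap(service_url, inner_url):
    """Feed inner_url to a URL-fed service via its ?url= argument,
    percent-encoding it so its own ?, & and = survive the nesting."""
    return service_url + "?" + urlencode({"url": inner_url})

step1 = wrap("http://example.com/service", "http://example.com/data")
step2 = wrap("http://example.com/service2", step1)
print(step2)
```

Note that each extra level of nesting re-encodes the previous level (every % becomes %25), which is one reason long hand-built pipelines get unwieldy.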
Managing long pipelines of URLs, and in particular working out which arguments belong to which URL in the chain, is likely to be problematic (would just saying that the URL argument is always the last one for a given 'URL command' work, I wonder?), so one workaround would be to have a URL pipeline executive service which would manage the pipeline on the user's behalf.
How so? Well, this is how I see it possibly working (all just vaporware at the mo, the product of idling time whilst dog walking...):
First off, we need some argument datatypes, for example feedURL (perhaps broken down further into OPML, RSS, Atom etc.), FOAF or calendar record, or for additional return type arguments such as JSON feed or even Javascript include (i.e. things we can use either within a pipeline, or to terminate a pipeline, as with a Javascript include that embeds a reader or tagcloud in a page, for example).
Second, we need a way to name and register RESTful services that take feedURL (FOAF, etc.) arguments, and perhaps return feedURL (FOAF, etc.) files. For example:
Name: pagelinks2OPML
Location: http://ouseful.open.ac.uk/pagelinks2opml.php
Name: OPMLbrowser
Location: http://www.optimalbrowser.com/optimal.php
We also need to record the input and output/return arguments for the service:
Input argument: &url= [HTML|feedURL|atom|rss2|OPML_dialect etc.], &xsl=[XSLURL] etc.
Output: [feedURL|atom|OPML_dialect|HTML|JSON|JS_include etc.]
So we might have:
Name: pagelinks2OPML
Location: http://ouseful.open.ac.uk/pagelinks2opml.php?
Input argument: [HTML]
Output: [OPML]
Name: OPMLbrowser
Location: http://www.optimalbrowser.com/optimal.php?
Input argument: &url= [OPML]
Output: [HTML]
Name: XSLTengine
Location: http://www.w3.org/2000/06/webdata/xslt?
Input argument: &xslfile= [XSL], &xmlfile=[XML]
Output: [XML]
We might also define specific implementations of general services, such as:
Name: delicious2OPML
Location: XSLTengine
Input argument: &xmlfile= [XML], &xslfile=http://pragmatron.org/xslt/delicious-to-opml.xsl
Output: [OPML]
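A registry along these lines could be as simple as a lookup table. The Python sketch below (using the hypothetical service entries from above) also shows how recorded input/output types make service compatibility mechanically checkable:

```python
# A toy registry of URL-fed services, keyed by name. 'input' maps each
# URL argument to the datatype it expects; 'output' is the return type.
REGISTRY = {
    "pagelinks2OPML": {
        "location": "http://ouseful.open.ac.uk/pagelinks2opml.php",
        "input": {"url": "HTML"},
        "output": "OPML",
    },
    "OPMLbrowser": {
        "location": "http://www.optimalbrowser.com/optimal.php",
        "input": {"url": "OPML"},
        "output": "HTML",
    },
    "XSLTengine": {
        "location": "http://www.w3.org/2000/06/webdata/xslt",
        "input": {"xslfile": "XSL", "xmlfile": "XML"},
        "output": "XML",
    },
}

def compatible(producer, consumer):
    """Can producer's output type feed one of consumer's input arguments?"""
    return REGISTRY[producer]["output"] in REGISTRY[consumer]["input"].values()

print(compatible("pagelinks2OPML", "OPMLbrowser"))
```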
The next step is to consider the URL pipeline executive, or processor, which might work as follows:
http://example.com/URLpipelineprocessor/service2/service1
In particular, the processor replaces service URLs with the registered name of the service (e.g. service2 replaces http://example.com/service2) and the trailing ?url= (or &url=) argument with a '/'.
So for example, we would write something like:
http://example.com/URLpipelineprocessor/OPMLbrowser/pagelinks2OPML/blogs.open.ac.uk/Maths/ajh59/006027.html
rather than:
http://www.optimalbrowser.com/optimal.php?url=http://ouseful.open.ac.uk/pagelinks2opml.php?url=./006027.html
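The rewriting step — registered names plus '/' separators expanding into the nested ?url= form — might look something like this (a sketch; the name-to-location table is assumed, and services named left to right end up outermost to innermost):

```python
from urllib.parse import quote

# Assumed name -> location table (a slice of the service registry).
SERVICES = {
    "OPMLbrowser": "http://www.optimalbrowser.com/optimal.php",
    "pagelinks2OPML": "http://ouseful.open.ac.uk/pagelinks2opml.php",
}

def expand_pipeline(path):
    """Turn /name1/name2/<data-url> into the nested ?url= form.
    Leading path segments that are registered names are services;
    everything after them is treated as the initial input URL."""
    parts = path.strip("/").split("/")
    i = 0
    while i < len(parts) and parts[i] in SERVICES:
        i += 1
    url = "/".join(parts[i:])            # the initial data URL
    for name in reversed(parts[:i]):     # wrap with the innermost service first
        url = SERVICES[name] + "?url=" + quote(url, safe="")
    return url

print(expand_pipeline(
    "/OPMLbrowser/pagelinks2OPML/blogs.open.ac.uk/Maths/ajh59/006027.html"))
```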
The pipeline processor could work in two ways - firstly, it might just construct (using redirects?) a long URL pipeline, writing itself out of the final URL. I'm not sure whether this would work correctly at all... Secondly, it could act as an intermediary, generating different intermediate URLs that can be passed to each service in the pipeline in turn, so that each service is only ever presented with a single URL.
That is, in this second approach, the role of the URLpipelineprocessor is to:
1) identify the initial input URL (e.g. blogs.open.ac.uk/Maths/ajh59/006027.html)
2) feed it in to the first service (http://ouseful.open.ac.uk/pagelinks2opml.php), and
3) relay the output via an intermediate URL to the second service (e.g. http://www.optimalbrowser.com/optimal.php?url=http://example.com/d2a36e2). The intermediate URL (http://example.com/d2a36e2) is actually a page that republishes the output of the previous step (http://ouseful.open.ac.uk/pagelinks2opml.php?url=./006027.html), or translates it from the filetype output by the first service to the filetype required as input to the second (this might be appropriate, for example, where one service produces an Atom feed and the other will only accept RSS).
In other words, the processor would have to execute something like the following process:
- URL = get input URL
- currentService = get first service
- while there are further services in the pipeline
-- outputPage = capture currentService?url=URL
-- URL = republish outputPage at a local URL
-- currentService = get next service in pipeline
- return currentService?url=URL as the final output
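In Python, that loop might be sketched as follows. Everything here is hypothetical: fetch stands in for a real HTTP GET, and the in-memory STORE stands in for whatever would actually serve the republished intermediate pages:

```python
import uuid
from urllib.parse import quote

STORE = {}  # stand-in for the intermediate republishing store

def run_pipeline(services, input_url, fetch):
    """services: list of service base URLs, applied left to right.
    fetch(url) -> page content (injected, so it could be a real HTTP GET).
    Each intermediate output is republished at a local URL, so every
    service only ever sees a single ?url= argument."""
    url = input_url
    for service in services[:-1]:
        page = fetch(service + "?url=" + quote(url, safe=""))
        key = uuid.uuid4().hex[:7]
        STORE[key] = page                    # republish the output locally
        url = "http://example.com/" + key    # the intermediate URL
    # the final service's URL is handed back for the user to display/consume
    return services[-1] + "?url=" + quote(url, safe="")

# Demo with a fake fetch that just labels what it was asked to get:
final = run_pipeline(
    ["http://ouseful.open.ac.uk/pagelinks2opml.php",
     "http://www.optimalbrowser.com/optimal.php"],
    "blogs.open.ac.uk/Maths/ajh59/006027.html",
    fetch=lambda u: "captured: " + u)
print(final)
```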
It is then up to the user how to display/consume the final output (e.g. by embedding a Javascript include produced by the final service in a page, or letting the final service in the chain return an output HTML page).
Instead of building the pipeline processor as a web service, another approach would be to embed it in a browser so that it could support keyword pipelining, for example.
By setting up a patchboard/matrix listing URL fed services, and identifying which can be fed into which by virtue of their input/output types, it would be possible to wire up and generate URL pipelines relatively easily.
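Given such a matrix — each service tagged with the type of its url argument and the type it returns (hypothetical entries below) — candidate pipelines fall out of a simple type check:

```python
from itertools import permutations

# name -> (input type of its url argument, output type); hypothetical entries
PATCHBOARD = {
    "pagelinks2OPML": ("HTML", "OPML"),
    "OPMLbrowser": ("OPML", "HTML"),
    "atom2rss": ("atom", "rss2"),
}

def valid_pipelines(length=2):
    """All orderings of `length` distinct services in which each service's
    output type matches the next service's input type."""
    return [
        list(combo)
        for combo in permutations(PATCHBOARD, length)
        if all(PATCHBOARD[a][1] == PATCHBOARD[b][0]
               for a, b in zip(combo, combo[1:]))
    ]

print(valid_pipelines())
```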
PS the XMLArmyKnife uses URL Pipelining to great effect...
Posted by ajh59 at April 11, 2006 02:36 PM

I'm a lapsed techie but I have chained together, by hand, some software which means I can move from a proprietary format into OPML and display it using Grazr.
The proprietary format is BrainStorm's working model in memory. I export a text outline, open it in OPML Editor, save it as OPML (I extended OPML editor, with help) and point Grazr at it for display in a blog or web page.
It all works well and doesn't take long to do - a few minutes - but it would be great to see the outline to OPML element as a web service.
I've looked but drawn a blank. Anyone here know of one?
Posted by: David Tebbutt at April 12, 2006 09:06 AM