Some time ago, I posted about an OpenLearn XML processor that parsed the XML representation of a sample OpenLearn unit in various ways, disaggregating the unit into component assets, such as images, audio and video items, and so on.
Looking back over the code a couple of weeks ago, I'm not sure that it ever worked properly, so I've popped a revised version of the processor up that should work correctly (fingers crossed... ;-)
The original proof of concept ran various XSL transformations over local copies of a handful of OpenLearn XML units. Since then, the OpenLearn team have started to provide direct access to the XML source of each unit on the OpenLearn website, so the XML processor can now be run against any unit hosted there.
One thing I need to think about doing is caching the output of each XML processor transaction, or storing the output of the XSLT following its first run. The OpenLearn units are pretty stable, after all, so re-processing the XML representation of each course every time the processor is run is hugely inefficient. (I was thinking of trying to store the results in the cloud using the Amazon SimpleDB service, although if Zoho offer an API to their database, maybe I should give that a whirl? The third alternative I've been considering is storing the results on the Talis Platform, though I'm not sure whether I have an account/space on there yet?)
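In the simplest case, a local file cache keyed on the unit URL would probably do the job. Here's a minimal Python sketch of the idea (process_unit() is just a stand-in for whatever fetch-and-transform step the processor actually performs, not the real code):

```python
import hashlib
import os
import urllib.request

CACHE_DIR = "cache"

def process_unit(xml_url):
    """Stand-in for the real fetch-and-XSLT step."""
    return urllib.request.urlopen(xml_url).read()

def cached_process(xml_url):
    """Return the processed output for a unit, reusing a stored copy if one exists."""
    os.makedirs(CACHE_DIR, exist_ok=True)
    # Key the cache entry on a hash of the unit URL
    key = hashlib.md5(xml_url.encode("utf-8")).hexdigest()
    path = os.path.join(CACHE_DIR, key + ".xml")
    if os.path.exists(path):
        with open(path, "rb") as f:
            return f.read()
    result = process_unit(xml_url)
    with open(path, "wb") as f:
        f.write(result)
    return result
```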
So what does the OpenLearn XML processor do? Here's a quick recap - for each unit run through the processor, it will generate an OPML file containing collections of links that point to: the image files that appear in the unit content; the audio files referred to in the unit; the video files contained within the unit.
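In outline, the transformation just means pulling asset URLs out of the unit XML and writing them into OPML outline nodes. Something like this Python sketch captures the idea, though the <Image> and <MediaContent> element names below are stand-ins rather than the actual OpenLearn schema:

```python
import xml.etree.ElementTree as ET

def unit_to_opml(unit_xml, unit_title):
    """Disaggregate a unit's XML into an OPML bundle of asset links.
    The element/attribute names here are illustrative placeholders."""
    root = ET.fromstring(unit_xml)
    opml = ET.Element("opml", version="2.0")
    ET.SubElement(ET.SubElement(opml, "head"), "title").text = unit_title
    body = ET.SubElement(opml, "body")
    for label, xpath in [("Images", ".//Image"),
                         ("Audio", ".//MediaContent[@type='audio']"),
                         ("Video", ".//MediaContent[@type='video']")]:
        group = ET.SubElement(body, "outline", text=label)
        for el in root.findall(xpath):
            url = el.get("src", "")
            ET.SubElement(group, "outline", text=url, type="link", url=url)
    return ET.tostring(opml, encoding="unicode")
```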
For example, here's what happens when you run the L120_1 materials through the OpenLearn XML processor:
(This doesn't actually work for me a lot of the time in my everyday-use Flock/Firefox browser, though it does work fine for me elsewhere - some cookie problem to do with being logged in to OpenLearn or not, I think. If you aren't an OpenLearn user/have never logged in to OpenLearn, then you should be able to see the images and play the audio. (I'm not sure the video is working yet, though.) The issue has something or other to do with the huge login kludge that OpenLearn uses for guest access to assets via the RSS feeds, but apparently I'm the only person who sees this problem, so it's not a proper bug...)
You can find the OpenLearn XML processor at: http://ouseful.open.open.ac.uk/openlearnplayground/openlearnProcessor2.php
As well as the combined OPML output, the processor provides separate RSS feeds that contain links to any images found in the unit and any external links (URLs) referenced in the unit material. The outgoing links can also be bundled into a Google linked Custom Search Engine annotations file, along with a link to the unit itself on OpenLearn, which means that it is possible to create a search engine on-the-fly that will search over only the unit and the sites it links to (see OpenLearn Dynamic Custom Search Engine for the initial announcement of this live CSE app).
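Generating the annotations file amounts to wrapping each outgoing link in an <Annotation> element tagged with the CSE's label. A rough Python sketch (the label value is engine-specific, so the one used here is a placeholder):

```python
import xml.etree.ElementTree as ET

def links_to_cse_annotations(urls, label="_cse_placeholderid"):
    """Bundle outgoing links into a Google CSE annotations file."""
    annotations = ET.Element("Annotations")
    for url in urls:
        # The trailing /* widens the annotation to cover the whole site
        ann = ET.SubElement(annotations, "Annotation",
                            about=url.rstrip("/") + "/*", score="1")
        ET.SubElement(ann, "Label", name=label)
    return ET.tostring(annotations, encoding="unicode")
```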
As and when I get round to it, I'll also do a separate RSS audio feed with a link to each audio file added as an enclosure, which means that services like the Grazr feed reading widget can play the file using an automatically embedded audio player. (Ideally, the OPML asset bundle for a unit should point to separate RSS feeds for each asset type.)
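The enclosure trick is straightforward enough - each audio link just gets written out as an RSS 2.0 item carrying an <enclosure> element. A quick sketch (the length attribute should really be the file size in bytes, and audio/mpeg is an assumed MIME type):

```python
import xml.etree.ElementTree as ET

def audio_rss(feed_title, audio_urls):
    """Wrap audio file links in an RSS 2.0 feed with enclosures, so
    feed widgets like Grazr can offer an embedded player for each item."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = feed_title
    for url in audio_urls:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = url.rsplit("/", 1)[-1]
        # length should be the file size in bytes; "0" is a placeholder
        ET.SubElement(item, "enclosure", url=url, length="0",
                      type="audio/mpeg")
    return ET.tostring(rss, encoding="unicode")
```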
Also on my to-do list is a slideshow tool to display the images from a unit in a slideshow widget via the processed image RSS feed :-)
One of the reasons I'm exploring the idea of unbundling media-typed OpenLearn resources into constituent feeds is that I'm trying to get a feeling for how easy it would - or wouldn't - be to indulge in a little roundtripping: unbundling the unit into its constituent parts and then reassembling them to hopefully form some semblance of the original course, or at least a coherent, well-resourced course on the same topic.
(Another take on this exploration is provided by the MIT opencourseware unit unbundling and representation experiment I tried way back when: Disaggregating an MIT OpenCourseware Course into Separate RSS Feeds)
Although it pains me to say so, it may be that I need to have a look at some learning design templates to see whether or not they might provide some way of physically representing the structure of a course and the way the individual assets are organised within it. The XML representation used by Compendium might also be relevant here? As a first step, though, it may make more sense to try to tease out the structure of a course from structural outline elements (such as section headings etc.) and represent them using a hierarchical schema, such as the Freemind mindmap format.
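By way of illustration, mapping a flat list of (level, heading) pairs onto nested Freemind <node> elements only takes a few lines. A sketch (extracting the headings from the unit XML is left as a separate step, and heading levels are assumed to start at 1):

```python
import xml.etree.ElementTree as ET

def headings_to_freemind(unit_title, headings):
    """Render (level, heading) pairs, e.g. [(1, "Intro"), (2, "Aims")],
    as a nested Freemind mindmap (.mm) document."""
    mm = ET.Element("map", version="0.9.0")
    root = ET.SubElement(mm, "node", TEXT=unit_title)
    stack = [(0, root)]  # (level, node) ancestors of the current position
    for level, text in headings:
        while stack[-1][0] >= level:
            stack.pop()  # climb back up to this heading's parent
        node = ET.SubElement(stack[-1][1], "node", TEXT=text)
        stack.append((level, node))
    return ET.tostring(mm, encoding="unicode")
```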
(I've actually experimented with that format a couple of times before, though in a slightly different context. For example, see the T180_8 wikipedia/freemind mashup)
Alternatively, I wonder if it's possible to use a clustering algorithm to try to naively tease out any content clusters within the course, again maybe representing the result in a hierarchical Freemind representation (as demoed in Hierarchical Course Clusters from Course Profiles App for the course codes declared by students using our Course Profiles Facebook application).
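As a naive first pass, TF-IDF vectors over the section texts plus an off-the-shelf hierarchical clustering routine might be enough to surface rough topic groupings. A sketch using scikit-learn and scipy (nothing here is OpenLearn-specific, and the cluster count is plucked out of the air):

```python
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.feature_extraction.text import TfidfVectorizer

def cluster_sections(section_texts, n_clusters=4):
    """Naively group course sections by lexical similarity."""
    vectors = TfidfVectorizer(stop_words="english").fit_transform(section_texts)
    # Ward linkage builds a dendrogram that could itself be walked
    # to emit a hierarchical Freemind representation
    tree = linkage(vectors.toarray(), method="ward")
    return fcluster(tree, t=n_clusters, criterion="maxclust")
```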
Ho hum - too many things I could do, no time to do any of them... :-(
Blogged with Flock
Tags: openlearn, open content, oer, xml, opml
Posted by ajh59 at February 12, 2008 05:11 PM

I just thought I ought to point out that some of the images/audio/video etc. that are embedded in OpenLearn units have their own rights attributions and have not been released under the full Creative Commons licence as the rest of the content has. Where this happens, the rights agreed by the owner of the asset have been displayed in the textual content beside the asset.
As a result, it is risky to separate the assets from the text for re-use, in case you (or someone else consuming one of the feeds) misses the rights and breaches the item's terms.
That's why we've not produced this kind of OPML bundle of assets for the units ourselves. It would be possible to include the rights in DC or CC tags within such a feed, but since there's no asset-level metadata for each asset to pick up rights info from, you're reduced to scanning the original XML, and that's messy and unreliable.
Oh and BTW we've found a fault with our guest access for RSS feeds, so maybe your problems will go away after our next release.
Posted by: Jenny Gray at February 13, 2008 01:17 PM

"It would be possible to include the rights in DC or CC tags within such a feed, but since there's no asset-level metadata for each asset to pick up rights info from, you're reduced to scanning the original XML, and that's messy and unreliable."
I argued years ago for some sort of provenance attribute in e.g. image tags in the OU schema, and a license attribute would make sense too.
Given that a sizeable chunk of OpenLearn funding has been spent on rights clearance, it seems a bit odd that you're not making appropriate use of metadata, and that there isn't a clear way of processing the XML feed automagically so that rights can be recognised and addressed, and images swapped out/replaced by a rights notice and a link back to OpenLearn, if necessary...
...or something like that ;-)
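Something like a dc:rights element on each feed item, say - a sketch only, with an illustrative rights statement:

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC)

def rights_tagged_item(channel, title, url, rights):
    """Add a feed item that carries its own rights statement
    in a dc:rights element (the rights text is illustrative)."""
    item = ET.SubElement(channel, "item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "link").text = url
    ET.SubElement(item, "{%s}rights" % DC).text = rights
    return item
```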
"Oh and BTW we've found a fault with our guest access for RSS feeds, so maybe your problems will go away after our next release."
If I wasn't so busy, I'd find the email where you said there weren't any problems with that horrible guest authentication hack I think you're using, and that it must be something wrong at my end...!
When's OpenID coming onstream, by the way? ;-) And how about OAuth?! ;-)
Posted by: Tony Hirst at February 13, 2008 01:41 PM