A jumble of thoughts as I ease into the day, starting with a phrase that occurred to me walking the dog last night...
I'd just picked up an email from Stephen relating to wikimindmap, which I'd rehashed to display an OpenLearn Unit's navigation in mindmap format.
Stephen commented that a tension arises when displaying a link collection scraped from a page in a simple hierarchy, because not all links are equal: "Two links look the same to a wiki, but to a concept map one is a major link while the other is minor, subsidiary."
I agree with this, but still think there's mileage in starting somewhere and then seeing how we can improve matters through structure, or cunning...
The semantic web may or may not be just around the corner.
What struck me whilst walking the dog was where the semantics could be applied. I've just read David Weinberger's Everything is Miscellaneous, where he writes at length about sorting on the way out rather than indexing on the way in.
You can listen to an interview with David Weinberger on IT Conversations (how come IT Conversations don't have sharing/embed code??), see the book tour presentation at Google TechTalks, or see David Weinberger in conversation with Bradley Horowitz via the YUI Theatre.
"Sorting on the way out, rather than indexing on the way in..."
It struck me that if we get regular expressions in search queries (as in Google Code Search), then it becomes possible to write quite complex filters that, over time, will give some approximation to semantic search, whilst being incrementally useful along the way.
So, for example, with a regular expression for a postcode or zipcode applied as a search limit, I could narrow down on "where is" type queries (in the US and UK at least!).
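To make that concrete, here's a minimal sketch in Python of what such a filter might do, applied to result snippets - the postcode patterns are deliberately simplified, and the filter hook itself is imagined, not something any search engine actually exposes:

```python
import re

# Simplified patterns - real UK postcodes are messier than this
UK_POSTCODE = re.compile(r"\b[A-Z]{1,2}\d[A-Z\d]?\s*\d[A-Z]{2}\b")
US_ZIPCODE = re.compile(r"\b\d{5}(?:-\d{4})?\b")

def contains_location(text):
    """Keep only results whose text mentions a postcode or zip code."""
    return bool(UK_POSTCODE.search(text) or US_ZIPCODE.search(text))

results = [
    "The Open University, Walton Hall, Milton Keynes MK7 6AA",
    "A page about postcodes in general, with no actual postcode on it",
    "Googleplex, 1600 Amphitheatre Parkway, Mountain View, CA 94043",
]
for r in filter(contains_location, results):
    print(r)  # only the first and third results survive the filter
```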
For some time now, many of the search engines have supported question asking/fact returning queries (for example, population of England or capital of france; see more examples on the In Search of Google playlist).
It seems to me that by providing regular expressions, as well as search limits that employ proximity search (that is, looking for terms that appear near each other), expert users will increasingly be able to issue "semantic queries", albeit phrased in arcane language and specifying the form or structure of the desired answer.
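As a rough illustration, a proximity limit can itself be approximated with a regular expression - though this sketch only matches the terms in one order, which a real NEAR operator wouldn't require:

```python
import re

def near(term1, term2, max_words=5):
    """Build a regex matching term1 followed by term2 within max_words words."""
    gap = r"(?:\W+\w+){0,%d}\W+" % max_words
    return re.compile(re.escape(term1) + gap + re.escape(term2), re.IGNORECASE)

pattern = near("population", "England")
print(bool(pattern.search("the population of England in 2007")))                      # True
print(bool(pattern.search("population of the whole of the south east of England")))   # False - too many words apart
```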
Like looking in the library for books with a blue spine... we put the shape of the answer we want into the query...
Now I know only too well that even today's advanced search tools are not that widely used (one reason we put the TU120 Beyond Google information literacy course together), but using services like Yahoo Pipes it would be possible to produce a set of custom built 'semantic' filters that can be invoked using simple, hopefully easy to remember, search limits. (Try typing just the word movies into Google, for example: the first hit will be a list of movies showing near you, once you have provided a 'home' postcode...)
This is where semantic search optimisation comes in - optimising the text on the page so that it becomes amenable to "semantic" search filters, and so that searchers can write query filters that pick up on the desired shape of the results page - like the constraint "it contains a zip code".
These are all baby steps I'm talking about - incremental innovations that may or may not already be part of search engine voodoo magic (they probably are) - but things that can be picked up on by individuals and incorporated into their searches (if they know how... hmmm... ;-)
Formally structured (standardised) URLs are also relevant in this context, not only for indexing and navigation via the address line, but also for supporting the definition of advanced queries limited by URL structure (this is already partly possible using things like the filetype:, site: and inurl: search limits).
Read/Write Web picked up on this today: Standard URLs - Proposal for a Web with Less Search, suggesting that "if there was a standard way to turn things into URLs, then finding information would be a lot easier".
They propose widespread adoption of standardised URL structures, but I'm not sure how plausible this is unless publishing system developers get on board (I could imagine a "blog standard URL structure", for example), or unless "URL protocols" emerge along the lines of OpenURL.
I could see a role for an intermediary URL rewriting service, however, that publishes a standard URL format, with an element specifying a particular 'target' website, and that rewrites URLs appropriately:
Users could then write URL 'queries' along the lines of : http://easypeasyurl.net/amazon/books/j-k-rowling/harry-potter-and-the-goblet-of-fire
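By way of illustration, here's a toy sketch of how such a rewriter might work - easypeasyurl.net is made up, of course, and the target mappings and path scheme are my own guesses at what a sensible convention might look like:

```python
from urllib.parse import quote_plus

# Map the 'target' element of a standard URL to a site-specific search URL
TARGETS = {
    "amazon": "http://www.amazon.com/s?k={query}",
    "wikipedia": "http://en.wikipedia.org/wiki/Special:Search?search={query}",
}

def rewrite(standard_url):
    """Rewrite easypeasyurl.net/<target>/<path parts> to a target-site URL."""
    path = standard_url.split("easypeasyurl.net/", 1)[1]
    target, *parts = path.split("/")
    query = quote_plus(" ".join(p.replace("-", " ") for p in parts))
    return TARGETS[target].format(query=query)

print(rewrite("http://easypeasyurl.net/amazon/books/j-k-rowling/harry-potter-and-the-goblet-of-fire"))
```

The appeal of an intermediary like this is that the standard URL format only needs agreeing once; keeping up with each target site's real URL scheme becomes the rewriting service's problem.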
In the library world, libezproxy provides a good example of a widely used URL rewriting service.
(And don't forget the OUseful Redirects page, or my suggestion for standardised OpenLearn URLs ;-)
PS Just by the by, whilst trying to track down a link I knew I'd visited today, it struck me how useful a branching forward listing would be in my browser; in one tab, I had gone forward from a delicious links page to a site (call it 'A' ;-), back to the links page, then forward to another delicious links page and on down another path.
When I wanted to revisit site 'A', I couldn't reach it via the forward and back browser buttons (though I could go back to the branch point and see from the clicked-through links which one it was). What would be really neat would be for the forward and back browser buttons to implement at least a tree navigator for the whole of the session in that tab... or maybe even a fully blown graph navigator (one that somehow depicts how you moved from one tree branch to another in a single step, for example...).
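For what it's worth, here's a toy sketch of the idea - tab history kept as a tree rather than a linear stack, so old forward branches survive; the class and method names are pure invention, not any real browser API:

```python
class HistoryNode:
    def __init__(self, url, parent=None):
        self.url = url
        self.parent = parent
        self.children = []   # every 'forward' branch taken from this page

class TabHistory:
    def __init__(self, url):
        self.current = HistoryNode(url)

    def visit(self, url):
        """Following a link adds a new branch; old branches are kept."""
        node = HistoryNode(url, parent=self.current)
        self.current.children.append(node)
        self.current = node

    def back(self):
        if self.current.parent:
            self.current = self.current.parent

    def forwards(self):
        """A branching 'forward' list: all paths previously taken from here."""
        return [child.url for child in self.current.children]

# Replaying the session described above:
tab = TabHistory("delicious-links-page")
tab.visit("site-A")
tab.back()
tab.visit("another-delicious-links-page")
tab.back()
print(tab.forwards())   # ['site-A', 'another-delicious-links-page'] - site A is still reachable
```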
I need to sketch a mockup, don't I? ;-)
Posted by ajh59 at June 29, 2007 11:43 AM