TMCnet News

The "semantic web" will make finding answers online as simple as asking the right question
[May 30, 2008]

(New Scientist via Acquire Media NewsEdge) By the time the web began to be widely used, around a decade ago, its inventor was already working on a more ambitious plan. Tim Berners-Lee was imagining the ultimate "mash-up": a web in which any sort of data - from train timetables to scientific papers - could be seamlessly combined. The days of trawling through results from search engines would be over. Instead, browsers would navigate in search of answers, not web pages.

Although many still doubt this so-called "semantic web" will take off, the first concrete steps have recently been taken. Over the past year, several large data sources, most notably Wikipedia, have been converted into formats that make them easier to combine. Software that integrates these data sources is also being developed. The results are not yet user-friendly, but the semantic web, so long in gestation, may finally be coming into being.

"This is a really important moment," says Tom Heath, a semantic web specialist at Talis, a software firm in Birmingham, UK. "We're taking the theory and ideas of the last 10 years and making the vision a reality."


The fruits of recent work on building the semantic web are most visible at DBpedia, a semantic version of Wikipedia. The regular, web-page-based version of the online encyclopedia works fine when a single page contains all the information you need, but less well when you want answers to broad or complex questions. To find all battles that took place in the German region of Saxony, for example, a user might search for the terms "battle" and "Saxony". That returns nearly 1000 results, most of which are not directly relevant.

To solve this problem, the DBpedia team at the Free University of Berlin and the University of Leipzig in Germany have developed software that analyses the content of Wikipedia and reorganises it into a vast list of statements about "things", such as people and places. The entry for Saxony, for example, becomes a set of links that connect the name with other entries in the database, such as local landmarks and notable residents.
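To make the idea of "statements about things" concrete, here is a minimal Python sketch in which each statement is a three-part tuple linking one thing to another. The entity and relation names (`Saxony`, `hasLandmark` and so on) are illustrative assumptions, not DBpedia's actual identifiers or data.

```python
# Each statement is a (subject, relation, object) triple.
# All names below are invented for illustration; they are not DBpedia data.
STATEMENTS = [
    ("Saxony", "isA", "Region"),
    ("Saxony", "locatedIn", "Germany"),
    ("Saxony", "hasLandmark", "ExampleCastle"),
    ("Saxony", "hasNotableResident", "ExamplePerson"),
    ("ExampleCastle", "isA", "Landmark"),
]


def links_from(subject, statements=STATEMENTS):
    """Return every (relation, object) pair that starts at `subject`."""
    return [(rel, obj) for subj, rel, obj in statements if subj == subject]


if __name__ == "__main__":
    for relation, target in links_from("Saxony"):
        print(f"Saxony --{relation}--> {target}")
```

Because every statement has the same shape, following a link from one entry to another is just a matter of looking up the object of one statement as the subject of the next.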

With the data in this format, users will ultimately be able to ask questions rather than perform searches for suitably chosen phrases. When asked to identify conflicts that took place in Saxony, DBpedia identified four battles without turning up reams of irrelevant pages. Queries can be extremely complex: one user asked for all soccer players who wear the number 11 shirt, play for a club with a stadium that seats over 40,000 people and were born in a country whose population exceeds 10 million. DBpedia duly supplied a list of 10 players.
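The difference between asking a question and searching for phrases can be sketched in a few lines of Python. The toy matcher below is an assumption-laden illustration, not the query interface DBpedia uses: the facts are invented, and terms beginning with `?` are treated as variables that must take the same value in every pattern.

```python
# Toy facts, invented for illustration only (not DBpedia data).
FACTS = {
    ("BattleA", "type", "Battle"),
    ("BattleA", "place", "Saxony"),
    ("BattleB", "type", "Battle"),
    ("BattleB", "place", "LowerSaxony"),
    ("TreatyC", "type", "Treaty"),
    ("TreatyC", "place", "Saxony"),
}


def is_var(term):
    """Terms that start with '?' are variables; everything else is a constant."""
    return isinstance(term, str) and term.startswith("?")


def query(patterns, facts=FACTS):
    """Return all variable bindings that satisfy every pattern at once."""
    solutions = [{}]
    for pattern in patterns:
        extended = []
        for binding in solutions:
            for fact in facts:
                candidate = dict(binding)
                matched = True
                for term, value in zip(pattern, fact):
                    if is_var(term):
                        # Bind the variable, or check it against its earlier value.
                        if candidate.setdefault(term, value) != value:
                            matched = False
                            break
                    elif term != value:
                        matched = False
                        break
                if matched:
                    extended.append(candidate)
        solutions = extended
    return solutions


if __name__ == "__main__":
    # "Which things are battles AND took place in Saxony?"
    print(query([("?x", "type", "Battle"), ("?x", "place", "Saxony")]))
```

A keyword search for "battle" and "Saxony" would match all three invented entries here; the structured question returns only the one entry for which both statements hold. Longer questions, like the soccer example, simply add more patterns that share variables.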

The system is still very much in the test phase - the two answers mentioned above included a battle that took place in Lower Saxony and a player who now wears a number 3 shirt. But DBpedia provides a hint of what the semantic web would look like if all web pages were tagged in a way that allowed computers to understand what kind of information they contain.

This can be done according to a model known as the Resource Description Framework (RDF), created by the World Wide Web Consortium (W3C), the body which develops new web guidelines and technologies under the direction of Berners-Lee. With RDF, any type of data on websites can be assigned appropriate descriptive tags.
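RDF expresses data as subject-predicate-object statements whose parts are identified by web addresses, and N-Triples is one of the plain-text formats the W3C defines for writing such statements down. The Python sketch below formats one statement per line in that style. The `example.org` addresses are placeholders, the function name is hypothetical, and the escaping covers only quotes and backslashes, so treat it as an illustration rather than a complete serialiser.

```python
def to_ntriples(subject, predicate, obj, literal=False):
    """Format one statement as a single line of N-Triples-style text.

    `obj` is treated as a web address unless `literal` is True, in which
    case it is written as a quoted string. Only quotes and backslashes are
    escaped here; a full serialiser would handle more cases.
    """
    if literal:
        escaped = obj.replace("\\", "\\\\").replace('"', '\\"')
        obj_text = f'"{escaped}"'
    else:
        obj_text = f"<{obj}>"
    return f"<{subject}> <{predicate}> {obj_text} ."


if __name__ == "__main__":
    # Placeholder addresses, not real vocabulary terms.
    saxony = "http://example.org/place/Saxony"
    print(to_ntriples(saxony, "http://example.org/vocab/locatedIn",
                      "http://example.org/place/Germany"))
    print(to_ntriples(saxony, "http://example.org/vocab/name",
                      "Saxony", literal=True))
```

Using full addresses rather than bare words is what lets statements published by different sites refer unambiguously to the same thing.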

Once this is done, data from different sources can be combined in such a way that more interesting things start to happen. Two members of the DBpedia team, Christian Becker and Christian Bizer of the Free University of Berlin, have developed Mobile DBpedia, a cellphone application that takes a user's GPS position and displays Wikipedia articles on places in the vicinity, as well as showing them on a map. It also draws in information from any source that has made its data available in an appropriate format. This includes articles from Revyu, a semantic website that lets users post reviews of, for example, restaurants and places of interest, along with photos from Flickr that have been tagged with location coordinates.
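The core of a location-based mash-up of this kind can be sketched as a distance filter over items drawn from several sources. The Python below is a sketch under stated assumptions, not the Mobile DBpedia implementation: the items, their coordinates and the function names are invented, and distance is computed with the standard haversine formula on a spherical Earth.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius; a spherical approximation

# Invented items standing in for data pulled from different sources.
ITEMS = [
    {"title": "Article A", "source": "encyclopedia", "lat": 52.5163, "lon": 13.3777},
    {"title": "Review B", "source": "reviews", "lat": 52.5186, "lon": 13.3762},
    {"title": "Photo C", "source": "photos", "lat": 51.0504, "lon": 13.7373},
]


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearby(user_lat, user_lon, items, radius_km):
    """Return the items within `radius_km` of the user, nearest first."""
    hits = []
    for item in items:
        distance = haversine_km(user_lat, user_lon, item["lat"], item["lon"])
        if distance <= radius_km:
            hits.append((distance, item))
    hits.sort(key=lambda pair: pair[0])
    return [item for _, item in hits]


if __name__ == "__main__":
    for item in nearby(52.5170, 13.3780, ITEMS, radius_km=1.0):
        print(item["source"], "-", item["title"])
```

The filter does not care which source an item came from. That only works because every source exposes its coordinates in the same way, which is the point of publishing data in a shared format.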

Last October the BBC started making data on its television and radio shows available in a semantic format. "People outside the BBC can now do interesting things with our content," says Tom Scott, a team leader at BBC Audio and Music Interactive. The online synopsis of each show is now tagged so that a suitably configured mash-up tool can identify what the broadcast was about and who it featured. Once the BBC makes its music data available, something Scott and his colleagues are working on, it will be possible to develop an application that lets the user know whenever the BBC airs a show matching their musical tastes.

Some sites have managed to achieve similar aims, though only by painstakingly collating disparate sources of information. EveryBlock, which covers San Francisco, Chicago and New York, is an example: it delivers news, blog posts and images linked to the user's location. But this involves developers at the site taking each database as it comes and working out how to integrate it into their system. "A more semantic web would make it easier for us to compile disparate sets of data," says Adrian Holovaty, the site's founder.

A semantic web would also allow users to develop their own mash-ups, rather than rely on the skills of Holovaty and other programmers. Once users are able to ask browsers about things - people, places, events - and the relationships between them, they will be able to, for example, easily identify properties for sale that are close to highly rated schools and hospitals, instead of just browsing real-estate listings.

So is Berners-Lee's vision about to be realised? Despite some progress, it's not yet clear that it will. For a start, at the moment there is no easy way to search the semantic web. The interface used to ask questions of DBpedia would frighten less web-savvy users. There is other software that can be used to search a broader range of online semantic information, but these tools are not suited for the average internet user either.

As the amount of semantic data grows, better search tools will probably spring up. But some web experts point to another problem with the W3C's plans. They say that semantic web advocates have spent too much time focusing on the technical aspects of their schemes. These can require web developers and content creators to invest considerable time learning new programming languages before the data can be made available in a suitable format. As a result, years of proselytising by experts have failed to convince content creators to buy into the semantic web ideal. It's notable, for example, that DBpedia was created by computer scientists, not by the community of editors that keeps Wikipedia running.

"I was a big proponent of the semantic web a few years back," says Timo Hannay, who runs the web publishing team at Nature Publishing Group in London. Now Hannay and others say the languages developed by the W3C to describe data semantically are just too complex for many website owners.

Web developers and users may instead turn to simpler semantic systems, such as the metadata tags already used to describe shared bookmarks on sites like del.icio.us. While these tags do help computers find the right data, they can limit the potential for mash-ups. If a blogger only uses the tag "San Francisco" to describe reviews of restaurants in that city, for example, that will not help users wanting details of particular kinds of eatery in specific neighbourhoods. "We're seeing the semantic web emerge," says Hannay. "It's just messier than we'd hoped."

So while the semantic web seems to be coming to life, no one can say for sure what it will grow into. In an article in Scientific American in 2001, Berners-Lee imagined a world in which software would be able to take over important but uninteresting tasks, such as selecting a trusted doctor based at a convenient location. That would require many more parts of the web, from online calendars to doctors' surgeries, to go semantic. Right now, it's still a futuristic ideal. Advocates of the idea may have kick-started the creation of the semantic web, but it's still not certain that others will follow their lead.

Copyright © 2008 Reed Business Information - UK. All Rights Reserved.