Title Retrieval and the Semantic Web, incorporating A Theory of Retrieval Using Structured Vocabularies
Abstract A primary motivation for the development of the Semantic Web has been the need for effective information retrieval systems, which may be realised through vocabulary control and the use of structured metadata. The technological framework of the Web (URI, HTTP, XML) and of the Semantic Web (RDF, OWL, SPARQL) provides a platform on which distributed data and metadata applications may be built, but it does not in itself provide any direct support for information retrieval applications per se. What is required are widely applicable Semantic Web languages that extend this basic layer and provide generic support for retrieval applications, together with good-practice guidelines and design patterns for developing such applications. The ultimate purpose of this report is to develop a formal theory of retrieval using controlled vocabularies that have a simple and intuitive structure, in order to provide the theoretical foundations needed to develop Semantic Web languages and design patterns for distributed retrieval applications. The main body of this report is devoted to articulating such a theory. The theory is expressed in mathematical notation, with the intention that this level of formality will bridge the gap between informal requirements specifications and the implementation of effective retrieval applications in computer systems. Specifically, the theory describes how a structured vocabulary may be used to construct an index over a collection of objects, and how it may then be used to express queries that are evaluated against that index to obtain a set of results. The theory is then extended to consider ways in which both the precision and the recall of retrieval strategies may be improved, through expansion and ranking techniques and through “coordination”. The problem of translating between controlled vocabularies is also considered. 
The theory attempts to formalise, unify and extend the traditional wisdom of the library sciences regarding the use of thesauri, classification schemes, subject heading systems, taxonomies and other types of structured vocabulary, so that proven techniques and methodologies may be transferred to a Semantic Web context. The recently chartered W3C Semantic Web Deployment Working Group has been charged with advancing the Simple Knowledge Organisation System (SKOS) to W3C Recommendation status. SKOS is a Semantic Web language specifically intended to support information retrieval applications that use controlled vocabularies with a relatively simple structure. A formal requirements specification is the first planned deliverable in the standardisation of SKOS. An immediate goal of this report is to provide a level of abstraction at which use cases involving information retrieval systems that operate with structured vocabularies can be compared, so that the requirements these systems place on Semantic Web languages such as SKOS may be clearly determined. This report also suggests ways in which the theory may be mapped to concrete language constructs and representation patterns in Semantic Web languages. In so doing, it is hoped that the development of SKOS and similar languages may be grounded with sufficient rigour to ensure their wide applicability and consistent use.
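The abstract describes indexing, query evaluation, expansion and coordination only in prose. As a rough illustration, and not the formal theory developed in the report, the following Python sketch assumes a toy vocabulary with a single hypothetical "narrower" relation. It builds an index from concepts to the objects described by them and evaluates a coordinated query, read here as a conjunction (AND) of concepts. It can optionally expand each query concept to all of its transitively narrower concepts, which typically raises recall. All names (`NARROWER`, `build_index`, `expand`, `evaluate`) and all data are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy vocabulary: each concept maps to its directly narrower
# concepts (in the spirit of a thesaurus "narrower term" link; not SKOS syntax).
NARROWER = {
    "animals": {"mammals", "birds"},
    "mammals": {"dogs", "cats"},
    "birds": set(),
    "dogs": set(),
    "cats": set(),
}

def build_index(annotations):
    """Invert object -> concepts annotations into a concept -> objects index."""
    index = defaultdict(set)
    for obj, concepts in annotations.items():
        for concept in concepts:
            index[concept].add(obj)
    return index

def expand(concept, narrower):
    """Return the concept plus every concept reachable via narrower links."""
    seen, stack = set(), [concept]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(narrower.get(c, ()))
    return seen

def evaluate(query, index, narrower=None):
    """Evaluate a coordinated query: a set of concepts combined with AND.

    Each query concept matches the objects indexed with it (or, when a
    narrower map is supplied, with any of its narrower concepts); the
    per-concept matches are then intersected.
    """
    result = None
    for concept in query:
        terms = expand(concept, narrower) if narrower else {concept}
        matches = set().union(*(index.get(t, set()) for t in terms))
        result = matches if result is None else result & matches
    return result or set()

# Hypothetical collection of three objects described by vocabulary concepts.
annotations = {
    "doc1": {"dogs"},
    "doc2": {"cats", "pets"},
    "doc3": {"birds", "pets"},
}
index = build_index(annotations)

print(sorted(evaluate({"animals"}, index)))                    # -> []
print(sorted(evaluate({"animals"}, index, NARROWER)))          # -> ['doc1', 'doc2', 'doc3']
print(sorted(evaluate({"mammals", "pets"}, index, NARROWER)))  # -> ['doc2']
```

Without expansion, the query for "animals" finds nothing, because no object is indexed with that concept directly. With expansion, it recalls all three objects. Coordinating "mammals" with "pets" then narrows the result to a single object, which illustrates how combining concepts can improve precision.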
Keywords Formal Methods, Semantic Web, Information Retrieval, KOS, SKOS
Language English (EN)
Thesis MSc, Oxford Brookes University, 2006. http://purl.org/net/retrieval report.pdf
