
Challenges in finding relevant datasets

More and more public data is becoming available through various channels, e.g., open data portals and other websites. However, an overview of the available open data is lacking. Published data descriptions are often imprecise or incomplete, so browsing a large number of datasets to identify the relevant ones is time-consuming. In addition, data portals and websites normally provide only keyword-based search, and it is often difficult to know what to search for, e.g., which keywords will find the datasets a user needs.

Semantic search addresses these challenges: it attempts to improve search accuracy by understanding the searcher’s intent and the contextual meaning of terms in the searchable data space.

An ontology provides an explicit formal specification of the concepts and relations in a domain, which facilitates common understanding and knowledge sharing. It defines a common vocabulary and the relationships among the concepts.

The idea of ontology-based semantic search for open data can be described as follows. The published datasets are tagged with concepts from a well-defined ontology. Such tags can be defined using metadata. The semantic search is then based on the semantic similarity between the search terms and the concepts with which the datasets are annotated. Preliminary experimental results indicate that semantic search based on ontologies is a promising approach to increasing search quality and efficiency.
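The idea can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy ontology, the dataset names and their tags are hypothetical, the search term is assumed to have already been mapped to an ontology concept, and the similarity measure is a Wu-Palmer-style score based on the depth of the deepest shared ancestor, chosen here only as one common option. It is not necessarily the measure used in the experiments mentioned above.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical toy ontology: each concept maps to its parent (None for the root).
PARENT: Dict[str, Optional[str]] = {
    "thing": None,
    "transport": "thing",
    "road_traffic": "transport",
    "public_transport": "transport",
    "bus": "public_transport",
    "environment": "thing",
    "air_quality": "environment",
}

# Hypothetical datasets, each tagged (via metadata) with ontology concepts.
DATASETS: Dict[str, Set[str]] = {
    "bus_timetables": {"bus"},
    "traffic_counts": {"road_traffic"},
    "no2_measurements": {"air_quality"},
}


def ancestors(concept: str) -> List[str]:
    """Return the path from a concept up to the root, both inclusive."""
    path: List[str] = []
    node: Optional[str] = concept
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def depth(concept: str) -> int:
    """Depth of a concept in the hierarchy; the root has depth 1."""
    return len(ancestors(concept))


def similarity(a: str, b: str) -> float:
    """Wu-Palmer-style similarity: 2 * depth(lcs) / (depth(a) + depth(b)),
    where lcs is the deepest concept that is an ancestor of both a and b."""
    ancestors_of_b = set(ancestors(b))
    # ancestors(a) is ordered from a towards the root, so the first hit is the deepest.
    lcs = next(c for c in ancestors(a) if c in ancestors_of_b)
    return 2 * depth(lcs) / (depth(a) + depth(b))


def search(query_concept: str, top_k: int = 3) -> List[Tuple[str, float]]:
    """Rank datasets by the best similarity between the query concept and their tags."""
    scored = [
        (name, max(similarity(query_concept, tag) for tag in tags))
        for name, tags in DATASETS.items()
    ]
    # Highest score first; ties are broken alphabetically by dataset name.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_k]


if __name__ == "__main__":
    for name, score in search("public_transport"):
        print(f"{name}: {score:.3f}")
```

In this sketch, a query for `public_transport` ranks `bus_timetables` first even though no dataset carries that exact tag, which is the kind of match a purely keyword-based search would miss.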