Title An Evaluation of Metadata Tools and Methods to Assess the Impact and Value of Provenance Management in the PaNData community
Abstract The aggressive update programme and continuous introduction of new generations of detectors and cameras across the Photon and Neutron (PaN) community have resulted in a rapidly rising awareness of data management problems across the data continuum of the large facilities. Today, facilities (synchrotrons and, increasingly, neutron facilities) are witnessing the arrival of big data experiments, that is, experiments that collect tens of TBs of data within a few days of beam time. This not only puts strain on the frontline “conventional” data services, e.g. data acquisition, storage and archiving, but also motivates the facilities to improve their data management and support services across the entire data continuum, from proposals, experiments and data analysis to (data and paper) publications. The goal is to ensure that the description of the data, in the form of metadata, is better captured in the first instance, and that it can subsequently be better exploited, not only during and immediately after data collection but also over time, when the data are analysed by the people who conducted the experiment and by others who need to exploit the data in other contexts, for example for validation, secondary analysis, or meta-analysis. To do that, it is of interest to the facilities, the scientists, and the science communities in general to keep track of the data, i.e. experiment, analysed, and resultant data, across the data continuum. That is the problem of provenance management for scientific data in the facility science domain. The aim of this deliverable is to assess the impact and value of provenance management in the PaN facilities in the PaNData-ODI project. It analyses the responses gathered from 11 operational large facilities in Europe regarding metadata capture, storage, usage, and standardisation across the data continuum.
Our survey has shown that there is still a long way to go in standardising metadata and metadata management across the data continuum, both within one facility and across different facilities. However, there are emerging efforts that aim to bridge the gap, including PaNKOS, an ontology for facility science, and ongoing work that leverages the standardised metadata publishing and access protocol OAI-PMH to share metadata about experiment data. In particular, we show how PaNKOS can be used to encapsulate contextual metadata for experiment data and how it can be used to tag proposals to enhance the understanding of the science conducted in a facility.
Keywords PanData, Data Infrastructure, Data provenance, tools for data provenance, data continuum, ontology for data provenance, research lifecycle, Linked Open Data, data publication, data sharing, Metadata
Language English (EN)
Type Details URI(s) Local file(s) Year
Report PaN-data Open Data Infrastructure (PaNData-ODI) Deliverable Series - D6.4. 2014.
