Title Data Management for Photon and Neutron Sources
Abstract Photon and neutron sources, such as the UK's Diamond Light Source and ISIS Spallation Neutron Source, are large-scale facilities providing high-resolution data for crystallography and other materials analysis techniques. Traditionally, the raw data generated at such facilities has been managed by the instrument and user scientists themselves. However, the current generation of such facilities can undertake a large number of experiments and generates greatly increased volumes of data. As a consequence, the traditional approach has become unsustainable, and a more automated approach to data management has had to be developed. In this talk, I shall outline the data management infrastructure developed within STFC to manage raw data. This infrastructure takes an integrated approach to aggregating, storing and cataloguing data generated at ISIS and Diamond. In particular, I shall describe ICAT, a suite of tools which catalogues data as it is generated by beam lines and provides the user community with access to that raw data, allowing users to search and retrieve their data within the facilities themselves or within their home institution. This access is provided through a service application programming interface, so that a variety of search and analysis tools can be interfaced to search and access the data, and also to register and catalogue derived data. The management of raw data is part of a wider scientific process, starting from proposals for research through to the publication of results. We shall further discuss how ICAT and similar tools can be extended to support this wider process by allowing data to be federated across a number of different data sources, and by linking the raw data to analysed and published data so that the provenance of data can be tracked; this is being considered in the project Integrated Infrastructure in Structural Sciences (I2S2). Such linking allows data to be formally cited and reused, and results to be validated.
We relate this work to the publication process being developed by the International Union of Crystallography, tracing the relationship between raw data generated from beam lines and the CIF files lodged during the publication process. This integrated data infrastructure is being taken forward by the European Photon and Neutron Data Infrastructure initiative (PaNData), a consortium of European photon and neutron sources serving an expanding user community of tens of thousands of scientists across Europe. The experiments at these facilities are of increasing complexity; they are increasingly carried out by international research groups, and many of them will be performed in more than one laboratory. The resulting data needs to be accessible over the Internet and to remain online until the results are published, and in many cases much longer, to allow re-processing and the preservation of knowledge. PaNData is developing common data formats and data and software catalogues within the framework of a common data policy.
Organisation ESC, STFC, ESC-SA
Keywords large-scale facilities, Chemistry, pandata, information management, data management, metadata
Language English (EN)
Presentation Presented at XXII Congress and General Assembly of the International Union of Crystallography (IUCr 2011), Madrid, Spain, 22-30 Aug 2011. IUCr-2011-final.ppt 2011