HySpeed Computing explores the concepts and ideas behind community data sharing.
Data is the foundation on which scientific research is built. Data is the basis for testing scientific hypotheses, developing new analysis techniques, deducing substantive correlations, identifying systematic trends, and generating research products. But what happens to this critical data once the report has been written and the paper has been published? What is the data legacy?
More often than not, data has value beyond its initial use. For instance, data can be used by other researchers to corroborate results, integrate findings into larger, more comprehensive data sets, investigate new hypotheses without replicating data collection efforts, form the basis for new research directions, and serve as example data for student research projects. While such data is sometimes made available to the community with these benefits in mind, it is all too often relegated to a dusty storage cabinet or forgotten computer hard drive.
A growing movement toward data sharing aims to address this shortfall. For example, as of January 2011, the U.S. National Science Foundation (NSF) requires research projects funded by the agency to include a Data Management Plan. The intent is that “Investigators are expected to share with other researchers, at no more than incremental cost and within a reasonable time, the primary data, samples, physical collections and other supporting materials created or gathered in the course of work under NSF grants.” In another example, the Australian government is similarly encouraging data sharing through its AusGOAL program, which “provides support and guidance to government and related sectors to facilitate open access to publicly funded information.”
Open data sharing, however, is not a straightforward objective to achieve. There are many questions that need to be considered. For example: Who is responsible for maintaining the data archive – the researcher, the funding agency, the government? How long should data be retained and made available – five years, ten years, indefinitely? What are the costs associated with data storage – hardware, software, support personnel? Who should have access to the data – academics, government, industry? What are the licensing agreements associated with the data – public domain, research only, commercial? Additionally, some data can’t be made openly available due to concerns about privacy (e.g., medical research), confidentiality (e.g., intellectual property), or security (e.g., national defense).
Despite these challenges, many people are actively contributing data to the community, thereby extending the utility of their research and expanding their own influence. This trend is expected to continue. Data is an integral part of our intellectual growth and community knowledge. Are you sharing your data?