Community Data – Not all data is equal

HySpeed Computing explores the concepts and ideas behind community data sharing.

A common theme heard throughout the scientific community is the need for more open and more effective data sharing mechanisms. However, not all data is created equal, nor should there be a single methodology or pathway for distributing it.

So what are the differences in data? Data can be categorized according to its origin and intended use, and each category carries its own considerations for sharing. Below are examples of four main categories of data.

Application Data. This category includes data that is routinely used to meet the needs of one or more applications. For example, satellite imagery from the suite of Landsat sensors provides a multidisciplinary resource for a broad array of earth observing applications, e.g., forestry, agriculture, coastal, and urban monitoring projects. Such data is typically housed in large repositories that offer users access to the data, but with little additional information beyond descriptors of the data characteristics and application domains.

Development Data. This is data utilized to develop new algorithms and analysis techniques. For example, data collected from a new instrument, such as the next generation AVIRIS sensor, is provided to the science community to test sensor performance and explore new analysis capabilities. These types of data are usually offered in smaller data repositories, sometimes with more restricted access, but typically include additional supporting documentation beyond just the data characteristics, such as sensor design information, science discussions and research results.

Validation Data. This category refers to data that underpins an existing research finding and is offered as a resource for others to validate that finding. For example, satellite data documenting the declining ice coverage in the Arctic is made available for multiple research groups to independently assess and validate conclusions on global change. As with the development category, such data is typically offered in smaller data repositories, which in addition to the data characteristics contain summaries of the research methods used and the results already obtained with the data.

Private Data. This type of data contains personal or confidential information that could cause harm or damage if released. For example, imagery of military facilities can contain details that are inappropriate to release publicly. In some cases such data can be openly distributed, either because it is no longer deemed sensitive or because safeguards are in place to restrict distribution or conceal particular elements, but in most cases such data is appropriately kept confidential.

Note that data categories are not exclusive of one another. A given dataset can easily fall into more than one category. The important point is to recognize the particular characteristics of the data and share it appropriately and as openly as possible.
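For readers who manage dataset catalogs in code, the overlap between categories can be sketched with a flag-style enumeration. This is only an illustrative sketch: the names `DataCategory`, `may_share_openly` and `new_sensor_scenes` are hypothetical, and the sharing rule shown is a deliberately simple assumption rather than a recommended policy.

```python
from enum import Flag, auto


class DataCategory(Flag):
    """The four categories described above; members can be combined with |."""
    APPLICATION = auto()
    DEVELOPMENT = auto()
    VALIDATION = auto()
    PRIVATE = auto()


def may_share_openly(categories: DataCategory) -> bool:
    """Assumed rule for illustration: open sharing unless tagged PRIVATE."""
    return DataCategory.PRIVATE not in categories


# A hypothetical dataset that serves both development and validation.
new_sensor_scenes = DataCategory.DEVELOPMENT | DataCategory.VALIDATION
```

Because a single dataset can carry several tags at once, a check such as `may_share_openly(new_sensor_scenes)` looks at the combination rather than forcing the dataset into one bucket.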

HySpeed Computing will continue to explore different aspects of community data sharing in future posts, and will soon also be releasing its own data access portal.
