Open-Access Scientific Data – A new option from the Nature Publishing Group

In May 2014 the Nature Publishing Group will be launching a new online publication – Scientific Data – which will focus on publishing citable descriptions of open-access data.

There are many benefits to open-access data sharing, including enhanced collaboration, greater research visibility, and accelerated scientific discovery. However, the logistics of providing efficient data storage and dissemination, and of ensuring proper citation of data usage, can be challenging when undertaken individually. Fortunately, a growing number of government-sponsored and privately funded data centers now provide these services to the community.

As one of the newest offerings in this domain, Scientific Data is approaching open access through the publication of Data Descriptors: “peer-reviewed, scientific publications that provide detailed descriptions of experimental and observational datasets.” Data Descriptors are “designed to be complementary to traditional research publications” and can include descriptions of data used in new journal publications, data from previously published research, and standalone data that has its own intrinsic scientific value.


Scientific Data’s six key principles (source: nature.com)

Because Scientific Data is open access, there are no fees for users to access the Data Descriptors. However, to support and sustain this open access, authors must pay an article processing charge for each Descriptor that is published. Authors have the option of publishing their data under one of three different Creative Commons licenses: Attribution 3.0 Unported (CC BY 3.0), Attribution-NonCommercial 3.0 Unported (CC BY-NC 3.0), or Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0). Each license requires users to properly cite the source of the data, but they differ in the restrictions placed on how the data can be used and re-shared.

Note that under this model Scientific Data publishes only the Data Descriptors; authors must still deposit the data itself in approved, publicly available data repositories. This helps ensure data is made readily available to the community without restriction. Approved repositories within the environmental and geosciences currently include the National Climatic Data Center, the NERC Data Centres, and PANGAEA. However, authors can also propose additional data repositories for inclusion in this list.

Scientific Data is now accepting submissions, and is offering early-adopting authors a discounted article processing charge.

For more info on Scientific Data: http://www.nature.com/scientificdata/

Remote Sensing Archives – An overview of online resources for lidar data

Previous posts on data access have focused on general resources for obtaining remote sensing imagery – Getting your hands on all that imagery – and specific resources related to imaging spectrometry – A review of online resources for hyperspectral imagery. To add to this compendium of data resources, the following includes an overview of online archives for lidar data.


Lidar image of Mount St. Helens (courtesy USGS)

Lidar (light detection and ranging), also commonly written as LiDAR or LIDAR, is an “active” remote sensing technology: laser pulses illuminate a surface, and the travel time of the reflected returns is used to measure the range (distance) to that surface. When combined with positional information and other data recorded by the airborne system, lidar produces a three-dimensional representation of the surface and the objects on it. Lidar technology can be utilized for terrestrial applications, e.g., topography, vegetation canopy height and infrastructure surveys, as well as aquatic applications, e.g., bathymetry and coastal geomorphology.
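To make the ranging principle concrete, here is a minimal sketch (illustrative only, not any particular sensor’s processing chain) of converting a pulse’s round-trip travel time into a range:

```python
# Illustrative lidar ranging calculation; the travel time below is a
# made-up example value, not data from a real sensor.

C = 299_792_458.0  # speed of light (m/s)

def pulse_range(round_trip_time_s: float) -> float:
    """Range to the reflecting surface from round-trip pulse travel time."""
    # The pulse travels out and back, so halve the total path length.
    return C * round_trip_time_s / 2.0

# A return arriving ~6.67 microseconds after emission implies a surface
# roughly 1 km away.
print(f"{pulse_range(6.67e-6):.1f} m")  # ~999.8 m
```

In practice the airborne system combines many such ranges with GPS position and platform attitude data to produce the georeferenced point cloud.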

Below is an overview of archives that contain lidar data products and resources:

  • CLICK (Center for LIDAR Information Coordination and Knowledge) provides links to different publicly available USGS lidar resources, including EAARL, the National Elevation Dataset and EarthExplorer. The CLICK website also hosts a searchable database of lidar publications and an extensive list of links to relevant websites for companies and academic institutions using lidar data in their work.
  • EAARL (Experimental Advanced Airborne Research Lidar) is an airborne sensor system that has the capacity to seamlessly measure both submerged bathymetric surfaces and adjacent terrestrial topography. By selecting the “Data” tab on the EAARL website, and then following links to specific surveys, users can view acquisition areas using Google Maps and access data as ASCII xyz files, GeoTIFFs and LAS files (a standardized lidar data exchange format).
  • NED (National Elevation Dataset) is the USGS seamless elevation data product for the United States and its territorial islands. NED is compiled using the best available data for any given area, the highest-resolution and most accurate of which is derived from lidar data and digital photogrammetry. NED data are available through the National Map Viewer in a variety of formats, including ArcGRID, GeoTIFF, BIL and GridFloat. However, to access the actual lidar data, and not just the resulting integrated products, users need to visit EarthExplorer.
  • EarthExplorer is a consolidated data discovery portal for the USGS data archives, which includes airborne and satellite imagery, as well as various derived image products. EarthExplorer allows users to search by geographic area, date range, feature class and data type, and in most cases instantly download selected data. To access lidar data, which are provided as LAS files, simply select the lidar checkbox under the Data Sets tab as part of your search criteria. (A brief sketch of reading a downloaded LAS file follows this list.)
  • JALBTCX (Joint Airborne Lidar Bathymetry Technical Center of Expertise) performs data collection and processing for the U.S. Army Corps of Engineers, the U.S. Naval Meteorology and Oceanography Command and NOAA. The JALBTCX website includes a list of relevant lidar publications, a description of the National Coastal Mapping Program, and a link to data access via NOAA’s Digital Coast.
  • Digital Coast is a service provided by NOAA’s Coastal Services Center that integrates coastal data accessibility with software tools, technology training and success stories. Of the 950 data layers currently listed in the Digital Coast archive, lidar data represents nearly half of the available products. Lidar data can be found using the Data Access Viewer, with “elevation” selected as the data type, or by following the “Data” tab on the Digital Coast home page and entering “lidar” as the search criteria. The data are available in a variety of formats, including ASCII xyz, LAS, LAZ, GeoTIFF and ASCII Grid, among others.
  • NCALM (National Center for Airborne Laser Mapping) is a multi-university partnership funded by the U.S. National Science Foundation, whose mission is to provide community access to high quality lidar data and support scientific research in airborne laser mapping. Data is accessible through the OpenTopography portal, either in KML format for display in Google Earth, as pre-processed DEM products, or as lidar point clouds in ASCII and LAS formats.
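Since several of the archives above deliver point clouds as LAS files, here is a minimal sketch of inspecting a downloaded file with the open-source laspy library (the choice of library and the filename are our assumptions; any LAS-capable tool will work just as well):

```python
# A minimal sketch of inspecting a LAS point cloud, assuming laspy 2.x
# is installed (pip install laspy); the filename is hypothetical.
import laspy

las = laspy.read("survey_tile.las")

# x, y, z are the scaled, georeferenced coordinates as array views.
print("point count:", len(las.x))
print("elevation range: %.2f to %.2f m" % (las.z.min(), las.z.max()))
```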

Lidar can be useful on its own, e.g., for topography and bathymetry, and can also be merged with other remote sensing data, such as multispectral and hyperspectral imagery, to provide valuable three-dimensional information as input for further analysis. For example, lidar-derived bathymetry can be used to improve hyperspectral models of submerged environments in the coastal zone. There has also been more widespread use of full-waveform lidar, which provides increased capacity to discriminate surface characteristics and ground features, as well as increased use of lidar return intensity, which can be used to generate a grayscale image of the surface.
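As one hedged illustration of the intensity idea just mentioned, the sketch below averages per-point return intensities onto a regular grid to form a grayscale raster; the input arrays and cell size are assumptions, and real workflows typically rely on dedicated lidar tools:

```python
# Sketch: rasterize lidar return intensity into a grayscale image.
# x, y, intensity are assumed 1-D NumPy arrays of equal length.
import numpy as np

def intensity_image(x, y, intensity, cell_size=1.0):
    """Average point intensities into a regular grid."""
    cols = ((x - x.min()) / cell_size).astype(int)
    rows = ((y.max() - y) / cell_size).astype(int)  # row 0 = north edge
    image = np.zeros((rows.max() + 1, cols.max() + 1))
    counts = np.zeros_like(image)
    np.add.at(image, (rows, cols), intensity)  # sum intensities per cell
    np.add.at(counts, (rows, cols), 1)         # count points per cell
    return np.divide(image, counts, out=np.zeros_like(image),
                     where=counts > 0)
```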

What is readily apparent is that as the technology continues to improve, data acquisition becomes more cost effective, and data availability increases, lidar will play an important role in more and more remote sensing investigations.

HyPhoon is coming!


HySpeed Computing is proud to announce the coming launch of HyPhoon:

  • A gateway for the access and exchange of datasets, applications and knowledge.
  • A pathway for you to expand your impact and extend your community outreach.
  • A framework for the development and deployment of scientific applications.
  • A resource for obtaining and sharing geospatial datasets.
  • A mechanism for improved technology transfer.
  • A marketplace for scientific computing.

The initial HyPhoon release, coming in mid-2013, will focus on providing the community with free and open access to remote sensing datasets. This data will be available for the community to use in research projects, class assignments, algorithm development, application testing and validation, and in some cases also commercial applications. In other words, in the spirit of encouraging innovation, these datasets are offered as a community resource and open to your creativity. We look forward to seeing what you accomplish.

We’ll be announcing the official HyPhoon release here, so stay tuned to be the first to access the data as soon as it becomes available!

Our objective when developing these datasets has been to focus on quality rather than any predefined set of content requirements. Thus, dataset contents are variable. Many of the datasets include a combination of imagery, validation data, and example output. Some datasets include imagery of the same area acquired using different sensors, different resolutions, or different dates. And other datasets simply include unique image examples.

The datasets originate largely from the community itself, with some data also coming from public-domain repositories and commercial image providers. We are also interested in hearing your thoughts on new datasets that would benefit the community. Contact us with your ideas, and if our review team approves the project, we will work with you to add your data to the gateway.

Beyond datasets, HyPhoon will also soon include a marketplace where community members can access advanced algorithms and sell user-created applications. Are you a scientist with an innovative new algorithm? Are you a developer who can help transform research code into user applications? Are you working in the application domain and have ideas for algorithms that would benefit your work? Are you looking to reach a larger audience and expand your impact on the community? If so, we encourage you to get involved in our community.

For more on HySpeed Computing: www.hyspeedcomputing.com

The National Strategy for Earth Observation – Data management and societal benefits


Office of Science and Technology Policy

Earlier this month the U.S. National Science and Technology Council released its report on the National Strategy for Civil Earth Observations. This is the first step towards building a national roadmap for the more efficient utilization and management of U.S. Earth observing resources.

Current U.S. capabilities in Earth observation, as summarized in the report, are distributed across more than 100 different programs, including those at both Federal agencies and various non-Federal organizations (e.g., state and local governments, academic institutions, and commercial companies). This extends far beyond just the well-known satellite programs operated by NASA and NOAA, encompassing a variety of other satellite and airborne missions being conducted around the country, as well as a host of other land- and water-based observing systems. From a national perspective this represents not just a complex array of programs and organizations to manage, but also an increasingly voluminous collection of data products and information to store and make available for use.

With the objective of improving the overall management and utilization of the various Earth observing resources, the National Strategy outlines two primary organizational elements. The first element addresses a “policy framework” for prioritizing investments in observing systems that support specified “societal benefit areas,” and the second element speaks to the need for improved methods and policies for data management and information dissemination.

The National Strategy also lays the foundation for ultimately developing a National Plan for Civil Earth Observations, with initial publication targeted for fiscal year 2014 and new versions to follow every three years thereafter. As indicated by its title, the National Plan will provide the practical details and fundamental information needed to implement the various Earth observing objectives. Additionally, by periodically revisiting and reassessing technologic capabilities and societal needs, the “approach of routine assessment, improved data management, and coordinated planning is designed to enable stable, continuous, and coordinated Earth-observation capabilities for the benefit of society.”

The overall motivation behind the National Strategy and National Plan is the recognized societal importance of Earth observation. Specifically, “Earth observations provide the indispensable foundation for meeting the Federal Government’s long-term sustainability objectives and advancing U.S. social, environmental, and economic well-being.” With that in mind, the National Strategy specifies twelve key “societal benefit areas” (agriculture and forestry, biodiversity, climate, disasters, ecosystems, energy and mineral resources, human health, ocean and coastal resources and ecosystems, space weather, transportation, water resources, and weather), along with the crosscutting area of reference measurements. Also deemed relevant are the various technology developments that span all of these areas, such as advances in sensor systems, data processing, algorithm development, data discovery tools, and information portals.

The National Strategy additionally presents a comprehensive outline for a unified data management framework, which sets the fundamental “expectations and requirements for Federal agencies involved in the collection, processing, stewardship, and dissemination of Earth-observation data.” The framework addresses needs across the entire data life cycle, beginning with the planning stages of data collection, progressing through data organization and formatting standards, and extending to data accessibility and long-term data stewardship. Also included are the need to provide full and open data access to all interested users and the need to optimize interoperability, thereby facilitating more efficient exchange of data and information products across the entire community.

With this National Strategy, the U.S. is defining a unified vision for integrating existing resources and directing future investments in Earth observation. We are looking forward to reading the upcoming National Plan, which is targeted for release later this year.

To access a copy of the National Strategy report, visit the Office of Science and Technology Policy: http://www.whitehouse.gov/administration/eop/ostp

Remote Sensing Data Access – A review of online resources for hyperspectral imagery

In our previous post – Remote Sensing Data Archives – we explored some of the many general online data discovery tools for obtaining remote sensing imagery. We now sharpen our focus to the field of hyperspectral remote sensing, aka imaging spectrometry, and delve into resources for accessing this particularly versatile type of imagery.

Hyperspectral imaging emerged on the remote sensing scene in the 1980s, originating at the Jet Propulsion Laboratory (JPL) with the development and deployment of the Airborne Imaging Spectrometer (AIS), followed soon thereafter by the Airborne Visible Infrared Imaging Spectrometer (AVIRIS). Since then hyperspectral imaging has evolved into a robust remote sensing discipline, with satellite and airborne sensors contributing to numerous applications in Earth observation, and other similarly sophisticated sensors being used for missions to the moon and Mars.

The premise behind hyperspectral imaging is that these sensors measure numerous, relatively narrow, contiguous portions of the electromagnetic spectrum, thereby providing detailed spectral information on how electromagnetic energy is reflected (or emitted) from a surface. For perspective, this can equate to measuring the visible portion of the spectrum using 50 or more narrow bands, as opposed to the three broad bands (i.e., red, green and blue) we typically see with cameras and our eyes. Because objects (plants, soil, water, buildings, roads, etc.) reflect light differently as a function of their composition and structure, this enhanced spectral resolution offers more information with which to identify and map features on the Earth’s surface.
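To hint at how that extra spectral detail gets used in practice, here is a minimal sketch (the cube layout, wavelength list, and 670/860 nm band choices are assumptions for illustration) that selects the bands nearest to red and near-infrared wavelengths and computes a simple vegetation index:

```python
# Sketch: a simple spectral index from a hyperspectral cube, assumed to
# be a NumPy array of shape (rows, cols, bands) with a matching list of
# band-center wavelengths in nanometers.
import numpy as np

def band_nearest(wavelengths_nm, target_nm):
    """Index of the band whose center wavelength is closest to target_nm."""
    return int(np.argmin(np.abs(np.asarray(wavelengths_nm) - target_nm)))

def ndvi(cube, wavelengths_nm):
    """Normalized difference vegetation index from nearest red/NIR bands."""
    red = cube[:, :, band_nearest(wavelengths_nm, 670.0)].astype(float)
    nir = cube[:, :, band_nearest(wavelengths_nm, 860.0)].astype(float)
    return (nir - red) / (nir + red + 1e-9)  # epsilon avoids divide-by-zero
```

With 200+ narrow bands, far more discriminating measures than this two-band index become possible, which is precisely the appeal of the sensors listed below.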

For those interested in hyperspectral remote sensing, and curious to see what can be achieved using this type of data, let’s look at some of the archives that are available:

  • Hyperion – The Hyperion sensor (220 bands; 400-2500nm; 30m resolution) is located on NASA’s EO-1 satellite, and although deployed in 2000 as part of a one-year demonstration mission, the satellite and its onboard sensors have shown remarkable stamina, continuing to collect data today. Archive data from Hyperion are available through both EarthExplorer and GloVis, and new data can be requested through an online Data Acquisition Request (DAR).
  • HICO – The Hyperspectral Imager for Coastal Ocean sensor (128 bands; 350-1080nm; 90m resolution) was installed on the International Space Station (ISS) in 2009 and is uniquely configured for the acquisition of ‘dark’ targets, specifically coastal aquatic areas. The sensor was initially developed and sponsored by the Office of Naval Research, with continuing support now provided through NASA’s ISS Program. Archive data from HICO, as well as requests for new data, are available through the HICO website hosted by Oregon State University; however, interested users must first submit a short proposal to become part of the HICO user community.
  • CHRIS – The Compact High Resolution Imaging Spectrometer (18-62 bands; 410-1020nm; 17-34m resolution) is the main payload on ESA’s Proba-1 satellite, which was launched in 2001. As with the EO-1 satellite, Proba-1 was only intended to serve as a short-lived technology demonstrator, but has managed to continue collecting valuable science data for more than a decade. Data from CHRIS are available to registered users, obtained via submittal and acceptance of a project proposal, through ESA’s Third Party Missions portfolio on Earthnet Online.
  • AVIRIS – The Airborne Visible Infrared Imaging Spectrometer (224 bands; 400-2500nm; 4-20m resolution) has been supporting hyperspectral projects for more than two decades, and can be credited as a true pioneer in the field. AVIRIS is most commonly flown onboard a Twin Otter turboprop or ER-2 jet, but has also been configured to operate from several other airborne platforms. Images from 2006-2011 are available through the AVIRIS Flight Data Locator, with plans to soon expand this archive to include additional imagery from 1992-2005 (currently available by request from JPL).
  • NEON – The National Ecological Observatory Network is a continental-scale network of 60 observation sites located across the United States, where a standardized set of field and airborne data are being collected to support ecological research. Remote sensing data are being acquired via the Airborne Observation Platform, which includes a high-resolution digital camera, waveform lidar, and an imaging spectrometer. The NEON project is adopting an open data policy, but data acquisition and distribution tools are currently still in development. Thus, initial “prototype” data, which include a sampling of hyperspectral imagery, are being made available through the NEON Prototype Data Sharing (PDS) system.
  • TERN – The Terrestrial Ecosystem Research Network is an Australian equivalent of NEON, providing a distributed network of observation facilities, datasets, map products and analysis tools to support Australian ecosystem science. Within this larger project is the AusCover facility, which leads the remote sensing image acquisition and field validation efforts for TERN. Current hyperspectral datasets available through AusCover include both airborne data and a comprehensive collection of Hyperion imagery. Data are accessible through the TERN Data Discovery Portal and the AusCover Visualization Portal.

These aren’t the only hyperspectral instruments in operation. There are new instruments, such as the Next Generation AVIRIS (AVIRIS-NG), Hyperspectral Thermal Emission Spectrometer (HyTES) and Portable Remote Imaging Spectrometer (PRISM), which all recently conducted their first science missions in 2012. There are a growing number of hyperspectral programs and instruments operated by government agencies and universities, such as the NASA Ames Research Center and the Carnegie Airborne Observatory (CAO). There are various airborne sensors operated or produced by commercial organizations, such as the Galileo Group, SpecTIR, HyVista and ITRES. And there are also a number of new satellite-based sensors on the horizon, including HyspIRI (NASA), EnMAP (Germany), PRISMA (Italy) and HISUI (Japan).

It’s an exciting field, with substantial growth in both sensor technology and analysis methods continuing to emerge. As the data becomes more and more available, so too does the potential for more researchers to get involved and new applications to be developed.

Remote Sensing Data Archives – Getting your hands on all that imagery

There are now vast collections of remote sensing imagery, much of it readily available for you to download, but it’s not always obvious where and how to access these archives. Below we explore some of the many publicly available resources where users can search for and download remote sensing data for their own projects.

As you would expect, government agencies support some of the largest remote sensing data resources, most notably NASA in the U.S. and the ESA in Europe. These agencies provide robust web-clients that can be easily used to discover and download extensive collections of Earth observing data:

  • For NASA, the centralized go-to data repository can be found on the Earthdata website, which itself provides an integrated portal for accessing a wealth of information related to NASA’s Earth Observing System Data and Information System (EOSDIS). Within the Earthdata website you will find links to Reverb, the “Next Generation Earth Science Discovery Tool”, which allows users to search and explore more than 3200 different datasets distributed throughout NASA’s 12 EOSDIS Data Centers.
  • For the ESA, which represents an international consortium of more than 20 European Member States, Earth observing data is primarily hosted through Earthnet Online. This website offers users access to data from the full collection of different ESA Earth Observing Missions, Third Party Missions, ESA Campaigns, and GMES Space Component data.

For other entry points to U.S. data archives, you can also visit the USGS Global Visualization Viewer (GloVis) or USGS EarthExplorer to access data from particular sets of sensors. Alternatively, one can directly visit the various EOSDIS Data Centers, which each provide their own unique data discovery tools, such as the NSIDC Data Search tool at the National Snow & Ice Data Center (NSIDC) Distributed Active Archive Center (DAAC) and the Mercury tool at the Oak Ridge National Laboratory (ORNL) DAAC for Biogeochemical Dynamics. For projects with time-dependency constraints, such as natural disaster monitoring, there is also the option to download near real-time data from certain sensors using the Land Atmosphere Near Real-Time Capability for EOS (LANCE) tool. And for data from NOAA’s archives, the Office of Satellite and Product Operations (OSPO) provides links to a number of different data discovery tools, including NOAA’s Comprehensive Large Array-Data Stewardship System (CLASS). In the end, there is usually more than one way to reach the same data; it’s really a question of which tools you find easiest to use and which are most relevant to your intended application.

Searchable archives are also similarly available amongst various space agencies in other countries. For example, the Japan Aerospace Exploration Agency (JAXA) hosts the Earth Observation Research Center (EORC) Data Distribution Service (DDC), and the Indian Space Research Organisation (ISRO) offers Bhuvan, the geoportal for the National Remote Sensing Centre (NRSC) Open EO Data Archive (NOEDA).

And there are also commercial archives, such as the DigitalGlobe ImageFinder, which include high resolution satellite and aerial imagery from around the globe. While images from these archives do have a price tag, given the high spatial resolution and global coverage, such imagery can be an excellent resource for many different applications.

The above compilation is but a subset of what is ultimately available for users to access. The full extent of imagery that can be obtained, particularly when considering the many secondary data resources available from individual entities and researchers, is truly astounding. Additionally, as more and more Earth observing satellites are launched, and as airborne imagery becomes more cost efficient and easier to collect, the scope and number of both government and commercial archives will continue to expand.

What will remain a challenge is for these archives to maintain robust data discovery tools that can be used to access the growing volume of data, that can adapt to new sensors and new image formats, and that can integrate data across different archives. As evident above, great progress has been made in this domain, and developers continue to explore and implement new tools for managing this valuable global resource.

So get out there and put these data discovery tools to work for your project.

Accessing The Oceans – See how Marinexplore is connecting users with a world of data

Are you working on an oceanographic or marine related project where you need to identify and access the many data resources available for your study area? Marinexplore is now making this process easier than ever.

As with many fields of research, the realm of ocean science includes a staggering volume of data that has already been collected, and continues to be collected, by different organizations and government entities around the world. While there is a general movement throughout science towards improved data availability, greater standardization of data formats, and increased adoption of data interoperability standards, efficiently searching and accessing all of this data can still be a cumbersome task.

To address this challenge, Marinexplore has created a centralized resource for the ocean science community to quickly access multiple data sources from a single framework. Using an interface built on top of Google Maps, users can easily search from amongst the many available data collections, select relevant data for a particular project, and download the resulting dataset in a single file. Not only does this cut down on search time, it also simplifies a number of data integration and preprocessing steps. Users can also save, store and share created datasets, as well as collaborate with other users.

So how does it work? Marinexplore currently has access to more than 1.2 billion in situ measurements. This predominantly includes publicly available data acquired from ocean instruments, such as buoys, drifters, fixed platforms, and ships, as well as products generated from satellite sensors. Data access is a free service, but users must first register with Marinexplore to set up an account. Users can then create up to three datasets per day, where the size is initially limited to no more than 5 million measurements per dataset, with options to significantly expand this limit to 25 million measurements per dataset (and beyond) by referring other users.

Marinexplore reportedly also has plans to expand functionality of its system, such as providing an API (Application Programming Interface) for developing specialized applications, functionality for data streaming, and the ability to run oceanographic models. But these features have yet to be added. For now Marinexplore is focused on establishing a user community and delivering data to interested users.

So go check out the data, and see what’s available for you to use on your next project.

For more information on Marinexplore: http://marinexplore.com/

Data Management and You – A broader look at research data requirements

This is Part 2 of a discussion series on data management requirements for government funded research.

As discussed in the previous installment of this series, data management has become an integral requirement of government funded research projects. Not only should data be made available to the public because the research was supported with taxpayer funding, but data sharing also helps expand the impact and influence of your own research.

Part 1 of this series focused on the data management requirements of the National Science Foundation (NSF). In Part 2 below we look at the National Aeronautics and Space Administration (NASA), the Australian Research Council (ARC), and Research Councils UK (RCUK).

As with the NSF proposal process, NASA requires a data-sharing plan to be incorporated as part of any proposal response. Specifically, as described in the NASA Guidebook for Proposers, the “Proposer shall provide a data-sharing plan and shall provide evidence (if any) of any past data-sharing practices.” Unlike NSF, which requires a separate two-page plan, the NASA data-sharing plan must be incorporated within the main body of the proposal as part of the Scientific-Technical-Management section. Additionally, as something important to keep in mind, NASA also specifies that “all data taken through research programs sponsored by NASA are considered public”, that “NASA no longer recognizes a ‘proprietary’ period for exclusive use of any new scientific data”, and that “all data collected through any of its funded programs are to be placed in the public domain at the earliest possible time following their validation and calibration.” This means no more holding data in reserve until a researcher has completed their work and published their results. Instead, NASA is taking a strong stand on making its data publicly available as soon as possible.

Looking now to the United Kingdom, RCUK explicitly defines data sharing as a core aspect of its overall mission and responsibility as a government organization. As part of its Common Principles on Data Policy, RCUK states that “publicly funded research data are a public good, produced in the public interest, which should be made openly available with as few restrictions as possible in a timely and responsible manner.” To achieve this objective, the individual Research Councils that comprise RCUK each incorporate their own specific research requirements that conform to this policy. For example, the Natural Environment Research Council (NERC) specifies in its Grants and Fellowships Handbook that each proposal must include a one-page Outline Data Management Plan. If funded, researchers then work with the NERC Environmental Data Centres to devise a final Data Management Plan. And at the conclusion of the project, researchers coordinate with the Data Centres to transfer their data and make it available for others to use.

The Australian Research Council also encourages data sharing as an important component of funded research projects. While the ARC does not specify the need for data management plans in its proposals, the policies listed in the ARC Funding Rules explicitly encourage “depositing data and any publications arising from a research project in an appropriate subject and/or institutional repository.” Additionally, as part of the final reporting requirements for most ARC awards, the researcher must specify “how data arising from the project have been made publicly accessible where appropriate.” It is also common amongst the various funding opportunities to include a discussion in the required Project Description of strategies to communicate research outcomes. While not explicitly stated, data sharing can certainly play an important role in meeting such needs to disseminate and promote research achievements.

Government agencies clearly recognize the importance of data, and are making it a priority in their research and proposal requirements. So don’t forget to include data management as part of your next proposal planning process.

Data Management and You – A look at NSF requirements for data organization and sharing

This is Part 1 of a discussion series on data management requirements for government funded research.

Data is powerful. From data comes information, and from information comes knowledge. Data is also a critical component in quantitative analysis and for proving or disproving scientific hypotheses. But what happens to data after it has served its initial purpose? And what are your obligations, and potential benefits, with respect to openly sharing data with other researchers?

Data management and data sharing are viewed with growing importance in today’s research environment, particularly in the eyes of government funding agencies. Not only is data management a requirement for most proposals using public funding, but effective data sharing can also work in your favor in the proposal review process. Consider the difference between two accomplished scientists, both conducting excellent research and publishing results in top journals, where only one of the scientists has made their data openly available, with thousands of other researchers already accessing the data for further research. Clearly, the scientist who has shared data has created substantial additional impact on the community and facilitated a greater return on investment beyond the initially funded research. Such accomplishments can and should be included in your proposals.

As one example, let’s examine the data management requirements for proposals submitted to the U.S. National Science Foundation. What is immediately obvious when preparing an NSF proposal is the need to incorporate a two-page Data Management Plan as an addendum to your project description. Requirements for the Data Management Plan are outlined in the “Proposal and Award Policies and Procedures Guide” (2013) within both the “Grant Proposal Guide” and the “Award & Administration Guide.” Note that in some cases there are also specific data management requirements for particular NSF Directorates and Divisions, which need to be adhered to when submitting proposals for those programs.

To quote from these guidelines: “Investigators are expected to share with other researchers, at no more than incremental cost and within a reasonable time, the primary data, samples, physical collections and other supporting materials created or gathered in the course of work under NSF grants. Grantees are expected to encourage and facilitate such sharing.” Accordingly, the proposal will need to describe the “types of data… to be produced in the course of the project”, “the standards to be used for data and metadata format”, “policies for access and sharing”, “policies and provisions for re-use, re-distribution, and the production of derivatives”, and “plans for archiving data… and for preservation of access.” Proposals cannot be submitted without such a plan.

As another important consideration, if “any PI or co-PI identified on the project has received NSF funding (including any current funding) in the past five years”, the proposal must include a description of past awards, including a synopsis of data produced from these awards. Specifically, in addition to a basic summary of past projects, this description should include “evidence of research products and their availability, including, but not limited to: data, publications, samples, physical collections, software, and models, as described in any Data Management Plan.”

Along these same lines, NSF also recently adjusted the requirements for the Biographical Sketch to specify “Products” rather than just “Publications.” Thus, in addition to previous items in this category, such as publications and patents, “Products” now also includes data.

The overall implication is that NSF is interested in seeing both past success in impacting the community through data sharing and specific plans on how this will be accomplished in future research. Be sure to keep this in mind when writing your next proposal. And remember… data is powerful.

For more information on NSF proposal guidelines: http://www.nsf.gov/bfa/dias/policy/

HySpeed Computing – Reviewing our progress and looking ahead

Join HySpeed Computing as we highlight our accomplishments from the past year and look ahead to what is sure to be a productive 2013.

The past year has been an eventful period in the life of HySpeed Computing. This was the year we introduced ourselves to the world, launching our website (www.hyspeedcomputing.com) and engaging the community through social media platforms (i.e., using the usual suspects – Facebook, LinkedIn and Google+). If you’re reading this, you’ve found our blog, and we thank you for your interest. We’ve covered a variety of topics to date, from community data sharing and building an innovation community to Earth remote sensing and high performance computing. As our journey continues we will keep sharing our insights and also welcome you to participate in the conversation.

August of 2012 marked the completion of work on our grant from the National Science Foundation (NSF). The project, funded through the NSF SBIR/STTR and ERC Collaboration Opportunity, was a partnership between HySpeed Computing and the Bernard M. Gordon Center for Subsurface Sensing and Imaging Systems at Northeastern University. Through this work we were able to successfully utilize GPU computing to accelerate a remote sensing tool for the analysis of submerged marine environments. Our accelerated version of the algorithm was 45x faster than the original, thus approaching the capacity for real-time processing of this complex algorithm.

HySpeed Computing president, Dr. James Goodman, also attended a number of professional conferences and meetings during 2012. This included showcasing our achievements in geospatial applications and community data sharing at the International Coral Reef Symposium in Cairns, Australia and the NASA HyspIRI Science Workshop in Washington, D.C., and presenting our accomplishments in remote sensing algorithm acceleration at the GPU Technology Conference in Pasadena, CA and the VISualize Conference in Washington, D.C. Along the way we met, and learned from, a wonderfully diverse group of other scientists and professionals. We are encouraged by the direction and dedication we see in the community and honored to be a contributor to this progress.

So what are we looking forward to in 2013? You heard it here first – we are proud to soon be launching HyPhoon, a gateway for accessing and sharing both datasets and applications. The initial HyPhoon release will focus on providing the community with free and open access to remote sensing datasets. We already have data from the University of Queensland, Rochester Institute of Technology, University of Puerto Rico at Mayaguez, NASA, and the Galileo Group, with additional commitments from others. This data will be available for the community to use in research projects, class assignments, algorithm development, application testing and validation, and in some cases also commercial applications. In other words, in the spirit of encouraging innovation, these datasets are offered as a community resource and open to your creativity. We look forward to seeing what you accomplish.

Connect with us through our website or via social media to become pre-registered to be the first to access the data as soon as it becomes available!

Beyond datasets, HyPhoon will also soon include a marketplace where community members can access advanced algorithms and sell user-created applications. Are you a scientist with an innovative new algorithm? Are you a developer who can help transform research code into user applications? Are you working in the application domain and have ideas for algorithms that would benefit your work? Are you looking to reach a larger audience and expand your impact on the community? If so, we encourage you to get involved in our community.

HySpeed Computing is all about accelerating innovation and technology transfer.