Algorithm Validation – Using datasets and challenges to assess performance

Validation is an important component of algorithm development. Validation is the process by which developers confirm that a given algorithm meets acceptable levels of accuracy and performance. Achieving effective validation requires a dataset with known input and output parameters, whereby algorithm outputs can be directly compared against the already established output values.
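As a rough illustration of that comparison step, the sketch below scores algorithm outputs against known reference values using root-mean-square error and checks the result against a tolerance. The function names, the choice of RMSE as the metric, and the tolerance value are all assumptions made for this example; the appropriate metric and acceptance threshold depend on the application.

```python
import math


def rmse(predicted, reference):
    """Root-mean-square error between algorithm outputs and known values."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        raise ValueError("at least one reference value is required")
    squared_error = sum((p - r) ** 2 for p, r in zip(predicted, reference))
    return math.sqrt(squared_error / len(reference))


def passes_validation(predicted, reference, tolerance):
    """Return True when the RMSE is within the (hypothetical) tolerance."""
    return rmse(predicted, reference) <= tolerance


# Illustrative values only: algorithm outputs vs. established reference values.
reference_values = [1.0, 3.0, 5.0]
algorithm_outputs = [2.0, 4.0, 6.0]
print(rmse(algorithm_outputs, reference_values))
print(passes_validation(algorithm_outputs, reference_values, tolerance=1.5))
```

In practice the reference values would come from one of the validation datasets described below, and a single summary number would usually be supplemented with other measures of performance.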

What types of datasets are commonly used in the remote sensing validation process? There are a few different options that can be considered. One is to generate your own dataset, collecting relevant field data in conjunction with an image acquisition campaign. Another approach is to build a synthetic dataset with computer modeling techniques or carefully controlled laboratory methods. And yet a third option is to employ an independent dataset with its own well-defined data parameters. While each of these options is available for assessing algorithm performance, effective validation data is typically difficult to find or create.

With that in mind, a topic closely related to validation, and an example of the independent dataset option, is the algorithm challenge. The objective of an algorithm challenge is to use a common set of data and/or specifications as the basis for solving a particular problem. Participants in the challenge work independently or in teams to develop the best solution to that problem, and the top contributors usually receive at least recognition for their accomplishment and in some situations are also awarded a payment or prize. As an example, the Netflix Prize was a challenge focused on developing an improved algorithm for predicting user ratings of films. On a far grander scale, the X Prize Foundation is using the challenge format to address large societal issues, such as in healthcare, genomics and the environment.

In remote sensing, an excellent example of an algorithm challenge is the Target Detection Blind Test run by the Digital Imaging and Remote Sensing (DIRS) laboratory at the Rochester Institute of Technology. In this challenge, participants are first provided with a dataset for algorithm development and testing, which includes high-resolution hyperspectral imagery, a spectral library of targets to be identified, and the exact locations of those targets in the imagery. A second dataset is then provided, which includes just the imagery and spectral library. The target locations are not specified. Instead, participants must employ their algorithm to generate estimates of the target locations. Results are uploaded to the DIRS website and the estimates are evaluated for accuracy in terms of both correctly identifying target locations and minimizing the number of false identifications. Submitted results are then ranked according to overall effectiveness.
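To make the two evaluation criteria concrete, here is a minimal sketch of how estimated target locations could be scored against known locations. It is not the DIRS scoring method, which is not described here. It assumes that locations are (row, column) pixel coordinates and that a detection counts only on an exact pixel match; the function name and the sample coordinates are hypothetical.

```python
def score_detections(estimated, truth):
    """Count correct detections, false identifications, and missed targets.

    Assumption for this sketch: locations are (row, col) pixel tuples and
    must match exactly to count as a correct detection.
    """
    estimated = set(estimated)
    truth = set(truth)
    hits = estimated & truth          # targets correctly located
    false_alarms = estimated - truth  # reported locations with no target
    missed = truth - estimated        # targets the algorithm did not report
    detection_rate = len(hits) / len(truth) if truth else 0.0
    return {
        "hits": len(hits),
        "false_alarms": len(false_alarms),
        "missed": len(missed),
        "detection_rate": detection_rate,
    }


# Hypothetical coordinates, for illustration only.
true_locations = [(10, 12), (40, 7), (55, 90)]
estimated_locations = [(10, 12), (55, 90), (3, 3)]
print(score_detections(estimated_locations, true_locations))
```

A ranking scheme would then need to combine these counts into a single measure of overall effectiveness, for example by rewarding a high detection rate while penalizing false identifications.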

Challenges also bring advantages such as encouraging innovation through competition and mobilizing the community to solve complex problems. Thus, the next time you develop a set of validation data, consider the benefits of transforming that data into a challenge. The impact these datasets can have on the wider community, and on the level of innovation, can be immense.

