Numerous data repositories are available online; however, few evaluation criteria have been suggested for these important resources. We suggest that when considering data repositories, two areas need to be evaluated: 1) the repository and 2) the data. Below is a list with a brief description of desirable attributes for both of these areas, along with the rating criteria used on this site.
* The starred items are considered critical to creating an online data repository.
Three rating systems were used to evaluate each site, depending on the type of data being evaluated:
- Present/not present: a mark indicated whether the feature was present or not present.
- 0-5 stars, depending on the number of subfeatures within a category. If 5 subfeatures were possible, a star was assigned for each feature. If 9 were possible, a star was assigned for each 2 features.
- LIST: the results of evaluating the site were simply listed.
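The star scale described above can be sketched in a few lines of Python. This is only an illustration, not the procedure actually used for the evaluations: the function names are hypothetical, the rule for category sizes other than 5 and 9 is a guess that generalizes the two stated cases, and rounding a partial group up to a full star is an assumption, since the text does not say how an odd count was handled.

```python
import math

MAX_STARS = 5  # the scale described above runs from 0 to 5 stars


def features_per_star(possible: int) -> int:
    """Subfeatures needed per star: 1 when 5 are possible, 2 when 9 are.

    Assumption: other category sizes follow the same pattern.
    """
    if possible < 1:
        raise ValueError("possible must be at least 1")
    return math.ceil(possible / MAX_STARS)


def star_rating(found: int, possible: int) -> int:
    """Convert a count of subfeatures found into a 0-5 star rating."""
    if not 0 <= found <= possible:
        raise ValueError("found must be between 0 and possible")
    # Assumption: a partial group of subfeatures rounds up to a full star,
    # so that finding all 9 of 9 subfeatures can still earn 5 stars.
    return min(MAX_STARS, math.ceil(found / features_per_star(possible)))
```

Under these assumptions, a category with 5 possible subfeatures and 3 found would receive 3 stars, and a category with 9 possible and 4 found would receive 2.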
Repository Criteria
- *Single Entry Point: Despite the variety of data available from a source, providing a single entry point to the data helps orient the user to the repository and improves the data's findability. Multiple entry points can leave users wondering if they missed something and reduce their ability to create an effective search strategy. A good single entry point should also provide users with assistance navigating the repository.
- *Overview: An overview of the repository allows the user to understand quickly what is available and what is not. A good overview should provide the user with an understanding of the (1) number, (2) size, (3) type, (4) source, and (5) temporal range of available datasets as well as the range of topics covered.
- Browsable: Subdividing the data into meaningful categories, which users can quickly browse, facilitates both comprehension of large data sets and navigation.
- *Searchable: Because data repositories are often large and include a wide variety of data, search becomes an important function. Search should be easily accessible from the home page and should cover the data only (as opposed to the web site and the data).
- *Data Retrieval Formats (LIST): Providing a method for the user to download data tables after locating them, as opposed to cutting and pasting, is a critical feature. Excel formats are an excellent choice for data sets with fewer than 65,000 rows, although an alternative format should be made available for non-Excel users. Larger data sets can be handled by subdividing the data, providing alternative methods of download, or allowing the user to manipulate the data online to reduce the amount of data downloaded. Any download method that requires installation of non-standard software is less desirable, but may be required for larger data sets. Providing a combination of methods in these situations is likely the best solution to meet multiple users' needs.
- Online Data Interaction: Providing a mechanism for interacting with the data online, if implemented in a usable fashion, is an excellent extra feature, particularly for larger data sets. It should not replace a method of data retrieval, since copying and pasting HTML tables often loses formatting and adds extra data. Generally, tabular presentations are easier to read and handle and are a better representation for relational data. XML, however, is a better representation for hierarchical data. XML is also more manageable when loaded in memory.
- Visual Interfaces: When appropriate, providing visual interfaces for locating or understanding the data will improve the user's experience.
- Additional Information (LIST): Information about and links to popular searches, usage reports, or related discussion groups can pique a user's interest and take advantage of the data's use by multiple people.
- Feedback Mechanisms: Methods of providing feedback to the repository such as contact information and an error reporting function are also beneficial.
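As an illustration of the "subdividing the data" option mentioned under Data Retrieval Formats, the sketch below splits one large table into several downloadable parts, each small enough to stay under a row limit, and writes each part as CSV (one possible alternative format for non-Excel users). The function names and the `max_rows` parameter are hypothetical, and counting the header row toward the limit is an assumption.

```python
import csv
import io
from typing import Iterable, Iterator, List

Row = List[str]

ROW_LIMIT = 65_000  # threshold quoted in the text; used here as a default


def split_rows(header: Row, rows: Iterable[Row],
               max_rows: int = ROW_LIMIT) -> Iterator[List[Row]]:
    """Yield tables of at most `max_rows` rows, each repeating the header.

    Assumption: the header row counts toward the limit.
    """
    if max_rows < 2:
        raise ValueError("max_rows must leave room for a header and one row")
    chunk: List[Row] = []
    for row in rows:
        chunk.append(row)
        if len(chunk) == max_rows - 1:  # header takes the remaining slot
            yield [header] + chunk
            chunk = []
    if chunk:  # final, possibly shorter, part
        yield [header] + chunk


def to_csv_text(table: List[Row]) -> str:
    """Serialize one part as CSV text."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(table)
    return buffer.getvalue()
```

Each yielded part could then be offered as its own download, so that no single file exceeds what a spreadsheet program can open.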
Data Criteria
In addition to the attributes of the repositories, certain information about the data allows a user to assess the data's authority and quality. The following information is recommended for this purpose:
- *Author information
- *Data quality guidelines
- *Method of indicating uncertainty/incompleteness
- Last update date
- Version information
- Data collection data
- Additional data details
- Detailed descriptions
- Reports based on the data
Each site was evaluated for 10-15 minutes. If an item was not found in that time, it was marked as missing. This does not guarantee that the feature did not exist on the site, but it does imply that the feature was not easily findable. As of May 8, 2006, evaluations were conducted by only one individual based on the criteria listed above. Further work needs to be done to ensure the methods are consistent and replicable.