NIST Software Assurance Metrics and Tool Evaluation (SAMATE) Project

Michael Kass

Computer Scientist
National Institute of Standards and Technology (NIST)
Information Technology Laboratory

<michael.kass@nist.gov>

Introduction

Software metrics are a means of assuring that an application has certain attributes, such as adequate security. Software assurance techniques include scanning source code, byte code, or binaries; penetration testing; and "sandbox" testing. Source code, byte code, and binary scanners in particular are important in evaluating a software product because they may detect accidental or intentional vulnerabilities such as "back doors". Additionally, software metrics are essential to help determine what effect, if any, a change in the software development process has on software quality.

The U.S. Department of Homeland Security is concerned about the effectiveness of software assurance (SA) tools. When an SA tool reports software vulnerabilities, or the lack thereof, for a software product, to what degree can the user be confident that the report is accurate? Does the tool faithfully implement SA checking techniques? What is the rate of false positives and the rate of false negatives for those techniques? Standard testing methodologies with reference material may help measure a tool's effectiveness.
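The two rates asked about above can be made concrete with a small sketch. It assumes a reference suite in which every test case is known either to contain a flaw or to be clean, and it records whether the tool under test flagged each case. The `RefCase` structure, the case names, and the results are hypothetical illustrations, not a SAMATE format or measured data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RefCase:
    """One reference program (hypothetical structure, not a SAMATE format)."""
    name: str
    has_flaw: bool   # ground truth recorded with the reference material
    flagged: bool    # whether the tool under test reported a flaw


def error_rates(cases):
    """Return (false_positive_rate, false_negative_rate) for a list of cases."""
    fp = sum(1 for c in cases if c.flagged and not c.has_flaw)
    tn = sum(1 for c in cases if not c.flagged and not c.has_flaw)
    fn = sum(1 for c in cases if not c.flagged and c.has_flaw)
    tp = sum(1 for c in cases if c.flagged and c.has_flaw)
    # Guard against a suite with no clean cases or no flawed cases.
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    return fpr, fnr


# Made-up results for four illustrative cases.
cases = [
    RefCase("overflow_bad", has_flaw=True, flagged=True),
    RefCase("overflow_good", has_flaw=False, flagged=True),
    RefCase("format_bad", has_flaw=True, flagged=False),
    RefCase("format_good", has_flaw=False, flagged=False),
]
fpr, fnr = error_rates(cases)
print(f"false positive rate: {fpr:.2f}, false negative rate: {fnr:.2f}")
```

Defining the rates per test case keeps the count of true negatives well defined; scoring individual reported lines instead would require a different denominator.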

A more fundamental question is, given that some technique has been used to examine a software product, how much confidence should the user have in the product? In other words, how well do techniques actually measure security, correctness, robustness, etc.? What assurance level can be assigned? How much do different techniques overlap in detecting the same problems (or lack thereof)?
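One simple way to look at the overlap question is to compare the sets of known flaws that two techniques each detect in the same reference application. The sketch below assumes each flaw has a stable identifier; the identifiers, the technique names, and the use of the Jaccard index as the overlap measure are illustrative choices, not part of any SAMATE specification.

```python
def overlap_report(found_a, found_b):
    """Compare the flaw identifiers found by two techniques."""
    a, b = set(found_a), set(found_b)
    both = a & b
    union = a | b
    # Jaccard index: shared findings divided by all findings.
    jaccard = len(both) / len(union) if union else 0.0
    return {
        "both": both,
        "only_a": a - b,
        "only_b": b - a,
        "jaccard": jaccard,
    }


# Hypothetical findings from a source code scanner and a penetration test.
scanner = {"F1", "F2", "F3"}
pentest = {"F3", "F4"}
report = overlap_report(scanner, pentest)
print(sorted(report["both"]), sorted(report["only_a"]), sorted(report["only_b"]))
print(f"overlap (Jaccard): {report['jaccard']:.2f}")
```

A low overlap suggests that the techniques are complementary; a high overlap suggests that running both adds little additional assurance.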

NIST is tasked by DHS to help define these needs. The Software Assurance Metrics and Tool Evaluation (SAMATE) program^[1] is designed to develop metrics for the effectiveness of SA techniques and tools and to identify deficiencies in software assurance methods and tools.

This paper is targeted at the community of researchers, developers and users of software defect detection tools. In particular, this paper addresses the technical areas where NIST can provide SA tool evaluation support.

Background

The performance, effectiveness, and scope of SA tools vary significantly. For instance, although a tool may be generally classed as a "source code scanner", the scanning methodology it employs and the depth and rigor with which it identifies software flaws and potential vulnerabilities may be quite different from those of another tool in the same class. In addition, different vendors make different trade-offs in performance, precision, and completeness based on the needs of different industrial segments.

Today tool vendors develop and test internally against their own test material and tool metrics. The ability to find code flaws and vulnerabilities is an important metric in measuring a tool's effectiveness. However, there is no common benchmark. A set of common, publicly available programs would allow vendors to independently check their own progress. Small tool producers and researchers would especially benefit from having a collection maintained and checked. Users would be more confident in the capabilities of different tools and be able to choose the tool(s) that are most appropriate for their situation.

Finding code flaws may not be the only metric. How useful is a tool that generates a large number of false positives? Or conversely, how effective is a tool that generates few false positives, but misses many problems (many false negatives)? These concerns must be factored in, too.
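The trade-off described above can be illustrated with precision (the share of a tool's reports that are real flaws) and recall (the share of real flaws that the tool reports). The sketch below uses made-up counts for two hypothetical tools. Combining the two values into an F1 score is one common convention, shown here only as an example; this paper does not prescribe it.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from raw counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts against a suite containing 10 known flaws.
tools = {
    "noisy_tool": {"tp": 9, "fp": 27, "fn": 1},  # finds most flaws, many false alarms
    "quiet_tool": {"tp": 3, "fp": 0, "fn": 7},   # no false alarms, misses most flaws
}
for name, counts in tools.items():
    p, r, f1 = precision_recall_f1(**counts)
    print(f"{name}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Neither hypothetical tool is clearly better: the first burdens the user with false alarms, while the second leaves most flaws undetected.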

Publicly available work is already being done in the area of SA tool evaluations and metrics ([TSAT] and [ASATMS]). NIST is examining these and other existing bodies of work as possible sources of contribution to the SAMATE specifications, metrics and test material.

NIST's Information Technology Laboratory has developed or helped develop specifications and test suites for numerous technologies, including XML, PHIGS, SQL, Smart Card, and computer forensic tools. The highly successful NIST Computer Forensic Tools Testing (CFTT) project^[2] is a model of tool testing that can be applied to the evaluation of the effectiveness of SA tools. The CFTT framework defines a taxonomy of forensic tool functions, functional specifications of expected tool behavior, and metrics for determining the effectiveness of test procedures.

Vendors are properly concerned about reports on their products. NIST, a part of the U.S. Department of Commerce, is a "neutral" party in the development of testing specifications and test suites. Because NIST's mandate is to support U.S. commerce and business, our role in testing is to help companies improve the quality of the products they bring to market. NIST is not "Consumer Reports", and does not endorse one company's product over another.

The SAMATE roadmap

By serving as a neutral party maintaining an open repository for SA tool testing methods and material, NIST will provide a common resource for SA tool vendors, researchers and users to measure the suitability and effectiveness of a particular tool or technique.

The SAMATE project will provide an open, free, publicly reviewed resource to SA tool vendors, researchers and users that will include:

A taxonomy of classes of SA tool functions
- NIST will help find or develop a taxonomy based on the state of the art in software assurance tools and techniques
Workshops for SA tool developers and researchers and users to prioritize particular SA tool functions
- Priority may be based upon commonality, criticality, cost efficiency or other factors
- This list will determine areas of focus and specification development
Specifications of SA tool functions
- Based upon focus group results, detailed specifications of functions for particular classes of SA verification will be developed
Detailed test methodologies
- How and which reference applications to use
- Well-defined counting and computing procedures
- Associated scripts and auxiliary functions
Workshops to define and study metrics for the effectiveness of SA functions
- Follow-on workshops to critique methodologies and formalize metrics for SA tools based upon experience
- Incorporate ongoing research and thinking
A set of reference applications with known vulnerabilities
Published papers in support of the SAMATE metrics
- The methodology used to define the functional specifications, test suites, test reports and definition of SA tool metrics will be published by NIST for peer review by the community

Immediate Goals

By introducing the SAMATE project at this workshop, NIST's goals are to:

Solicit participation in upcoming NIST workshops to prioritize functional areas of testing
Identify existing taxonomies, surveys, metrics, etc.
Solicit contributions of example program suites
Discuss issues of importance to vendor, user, and researcher participation

Bibliography

[TSAT] Testing Static Analysis Tools Using Exploitable Buffer Overflows from Open Source Code, Misha Zitser, Richard Lippmann, Tim Leek, Proceedings of the 12th ACM SIGSOFT International Symposium on Foundations of Software Engineering, Newport Beach, CA, USA, October 31 - November 6, 2004, ACM, pp. 97-106, ISBN 1-58113-855-5, http://portal.acm.org/citation.cfm?doid=1029911

[ASATMS] DRAFT Application Security Assessment Tool Market Survey Version 1.0, U.S. Defense Information Systems Agency (DISA), Washington D.C., 2002, https://iase.disa.mil/appsec/index.html

^[1] http://samate.nist.gov , web page for the NIST SAMATE Project

^[2] http://www.cftt.nist.gov , web page for the NIST Computer Forensic Tool Testing Project