CMSC 631 Notes for: A static analyzer for finding dynamic programming errors William R. Bush, Jonathan D. Pincus and David J. Sielaff Prepared by Matthew Snover (snover@cs) 11/21/02 This paper describes a static analyzer, PREfix, for finding dynamic programming errors. Previoius Static Checkers: Lint Select (also uses path simulation) Annotation checkers: LCLint Aspect Extended Static Checker monadic second-order logic checker Previous Dynamic Checkers: Purify The primary motivation behind this work is to make a purify-like tool that can be used statically, rather than dynamically. Usability was the main design concern. PREfix assumes that most errors occur between functions, often due to multiple programmers working on different assumptions. PREfix essentially tries to do a dynamic analysis of a program from a static viewpoint to avoid the problems that arise when using test cases. The goals of the work are as follows: 1) Program must be easy to use and not require annotations of code base. 2) Errors reported must be meaningful and useful to programmar. 3) Only Achievable Program Paths should be analyzed To reduce extraneous error reporting. 4) Must be usable on large industrial code bases. How Does it Work? 1) The source code is first parsed. (this can take around 20% of the runtime on large projects). 2) Models for each function are made. Every possible path (up to a user defined maximum) are executed and memory is modelled for each path. - Contraints on the parameters of function are set so that if violated an error condition would occur. - Possible return values, and side effects to externals, are calculated. Paths leading to the same result are potentially merged. - Guards are requirements on parameters to the functions, with one correponding to each return value. Guards calculated so that the proper model of the function is used when the proper arguements to functions are used. Recursive functions are only recursed two times (though this can be varied by user parameter). Whenever a function calls another function the proper model for that function's behavior is found and used. 3) Possible errors are stored in a database along with the conditions required to trigger the error. The path of execution causing the error and a 'meaningful' message describing the bug are listed. 4) The user can then use a web-interface to browse and filter the bugs. - Common errors that programmers often do not care about (such as out-of-memory errors) can be filtered out. 5) In addition many basic functions of the langauge, such as malloc, free, fopen, fclose, have been hand modelled for greater accuracy, (and because the semantics of those methods are not otherwise known to PREfix). - Such hand modelling might be imployed by the user as a form of annotation, allowing for even better error detection on project specific requirements, though this is by no means necessary. The type of defects that are detected and found by PREfix are as follows: 1) Memory errors - use of uninitialized memory - dereferencing null pointers - memory leaks - attempts to access freed memory 2) Resource misuse use of resources in improper states such as: - failure to close file handles - use of unopened file handles The are several types of errors that PREfix does not attempt to detect, such as: 1) Assuring null terminated strings are always null terminated 2) Array bounds checking 3) String bounds checking 4) Memory size copying. (You can copy memory into a size that is too small) Results: PREfix was tested on three programs (Apache, Mozilla, and a modified GDI demo), perhaps too small of a number to give accurate findings. Only Mozilla is of substantial size, and thus is perhaps the only case worth investigating. Most of the results presented focus on the results when running on Apache. PREfix took 11 hours to run on Mozilla (only 15 minutes on Apache), and reported 1159 warnings. It was estimated that 25% of these errors were incorrect. The vast majority of warnings were of "Deferenceing a NULL pointer" and "Using uninitalized memory". To help justify the runtime, the experiments were performed on a 266 MGHZ pentium, and the compile time for mozilla was approximatly 2 hours. The majority of errors detected occured in very infrequently executed portions of the code, such as when an error condition is detected, or the programs needs to exit abrubtly. Such conditions are hard to generate with test cases (as would be needed if using purify), as they happen only under rare conditions. An exact breakdown as to how many of these reported errors programmers found 'useful' was not available. There was also no direct comparison to purify (or any other error detection program), which PREfix identifies as it primary competition (despite its being dynamic). Limitations: 1) Slow. PREfix is typically run overnight, and thus is not suited as interactive tool. 2) Incomplete. PREfix cannot find all bugs. 3) Since it is static and does not use annotations - there is no way to limit its calculations to reasonable inputs of the program. (this is not mentioned in paper but seems an unescapable limitation. - dynamically loaded libraries and those for which source code is unavailable are difficult to model, resulting in conservative anaylsis which will miss many errors. 4) There does not appear to be any handling of concurrency in the modelling or path execution. PREfix seems to assume that the code is not multithreaded. (This is my own opinion and not mentioned in paper) 5) Very slow and very incomplete is worst case, though these concerns apparently do not occur in practice. Because the number of paths in a function can be very, very large checking them all could be intractable, or if only a limited number are checked then a great number of the paths, and thus many possible errors, would be ignored. An attempt to make PREfix more interactive, by Pincus, has resulted in a new program, PREfast, which is meant to be a very fast, but less accurate, very of PREfix. Such a tool can be used by the programmer to aid in coding, much as the code and recompile method often used. PREfast also reportedly has more verbose error messages, which users will are more likely to believe. It seems to me that PREfix, as it stands, could be used interactively by storing the models for functions which the user has not changed and which are not affected by the modified function. Only those functions whose models have changed would need to be reanalyzed. The first run of PREfix would still take a long time, but could be done overnight. For example if PREfix reports a bug which is caused by an error in procedure foo(), the programmer could then fix the error, and PREfix could reanalyze foo(), and any functions that call foo(). If the models for the calling functions change then PREfix would need to recursivly reanalyze the functions that call them. Except for major or fundamental changes in the code, only a small portion of PREfix would need to be rerun. No attempts of this nature were reported in the paper and it would interesting to see if such a method would work in practice. The main limitations of PREfix lie in the modelling system, and path expansion. The number of paths could grow expontentially with the size of the code, though the authors report that this is not generally the case. The number of paths simulated is limited by a user defined variable (default 50). In practice this seems a reasonable method to balance execution time and accuracy. There is a sharp point of diminishing returns (in terms of number of errors found vs execution time) as the path limit increases. The modelling system is very imcomplete though. The authors acknowledge the severe limitations of the models, but claim such restrictions in practice detect a large number of errors and allow for a reasonable execution time. For example if a function returns then the modelling system can only represent 5 possible outcomes: 1) The function returns 0. 2) The function returns 1. 3) The function returns -1. 4) The function returns a value other than 0,1,-1. 5) The function returns an unknown value. The only other return types that can be stored in a model are void return, longjumps and exiting. The appendix of the paper contains a full description of the modelling language and what it can represent. Similar restrictions litter the rest of the modelling language. Since the modelling language is fairly small it can be run effeciently. The cost of this is the accuracy of the analysis. It would be interesting to see what portion of false errors are a result of this, and what errors could not be detected by PREfix (but perhaps could be detected by dynamic programs such as purify). Conclusions: Despite its limitations (or perhaps because of them), PREfix seems to be an extremely useful tool. If it were complete and more accurate then it would be so slow as to be unusable. It is ideal for finding rare bugs and errors occuring when interfacing code written by multiple users. A sped up version, such as PREfast, would be an invaluable resource. The primary benefit of PREfix is its user interface though. The ability to view the execution path of code required to generate an error is essential to the usability of such a tool. The filtering tools , though not well described in the paper, and customizable parameters allow a user to ignore the noisy error messages to focus on the real errors detected. It is very encouraging to see an approach so structured to applied use in real situations. A more complete evaluation, as well as a comparison to similar tools, is necessary to fully guage its usefullness.