PhD Proposal: Debugging via Code Sequence Covers

Talk
Ethar Elsaka
Time: 07.11.2014 10:00 to 11:30
Location: AVW 4172

As software grows more complex, its development consumes ever more time and resources. The growing number of software configurations, input parameters, usage scenarios, supported platforms, external dependencies, and versions drives up the cost of maintaining software and repairing unforeseen faults. To repair a fault, developers spend considerable time identifying the scenarios that lead to it and tracing it to its root cause. My goal in this proposal is to improve the software development process in general, and the debugging process in particular, by devising techniques for automated software debugging that help developers spend less time debugging and find the root causes of faults faster. Automated debugging can also improve software quality and reduce development and maintenance costs. I propose a code sequence-based approach for recommending faulty code paths to developers. My approach leverages advances in software testing, particularly automatic test case generation and replay, to generate a large number of test cases and replay them to extract their execution traces, or sequence covers.
Afterwards, commonalities among the sequence covers of failing test cases are extracted and presented to developers as subsequences that may be causing the fault. My hypothesis is that code sequences shared by a number of test cases that fail for the same reason resemble the faulty execution path, and hence that the search space for the faulty execution path can be narrowed by using a large number of test cases.
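To make the idea of extracting commonalities concrete, here is a minimal sketch in Python. It is not the algorithm proposed in this work: it simply folds a textbook pairwise longest-common-subsequence (LCS) computation over all failing traces, which is a heuristic and need not return the longest subsequence common to every trace. The function names (lcs, common_subsequence) and the event names in the traces are hypothetical, chosen only for illustration.

```python
from functools import reduce
from typing import List, Sequence


def lcs(a: Sequence[str], b: Sequence[str]) -> List[str]:
    """Return one longest common subsequence of a and b (dynamic programming)."""
    rows, cols = len(a), len(b)
    # table[i][j] holds the LCS length of the suffixes a[i:] and b[j:].
    table = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows - 1, -1, -1):
        for j in range(cols - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])

    # Walk the table from the start to recover one LCS.
    out: List[str] = []
    i = j = 0
    while i < rows and j < cols:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return out


def common_subsequence(traces: Sequence[Sequence[str]]) -> List[str]:
    """Fold pairwise LCS over all traces (a heuristic, not an exact multi-sequence LCS)."""
    if not traces:
        return []
    return list(reduce(lcs, traces))


# Hypothetical execution traces of three test cases that fail for the same reason.
failing_traces = [
    ["open", "parse", "validate", "save", "close"],
    ["open", "edit", "parse", "save", "undo", "close"],
    ["init", "open", "save", "retry", "close"],
]

print(common_subsequence(failing_traces))  # ['open', 'save', 'close']
```

Each additional failing trace can only shorten or preserve the shared subsequence, which illustrates why a larger number of test cases narrows the candidate faulty path.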
In my preliminary work, I propose an efficient algorithm for finding common subsequences among a set of code sequence covers. I devise optimization techniques to generate shorter and more logical sequence covers, and to select, from the set of all possible common subsequences, those most likely to contain the root cause. I implement a debugging tool that lets developers use the proposed approach, and integrate it with an existing Integrated Development Environment. The tool is also integrated with the environment's program editors so that developers can see its suggestions alongside the corresponding source code. Finally, I perform a user study showing that my tool saves developers 79% of their debugging time and that developers need to inspect only a small number of lines to find the root cause of a fault. Furthermore, my experimental evaluation shows that the proposed algorithm optimizations improve both the algorithm's running time and the length of the output subsequences.
The remaining work includes further improving the output subsequences by combining the proposed approach with other software engineering techniques, such as incorporating dependency graphs, finding the top-k common subsequences, applying a hybrid approach to generating common subsequences, and extracting variable state information. Such extensions will lead to more useful sequences that explain the faulty code path more directly. Beyond its contributions to software engineering, the proposed approaches, particularly those for finding common subsequences, can be extended to other fields and applications, such as the alignment of DNA and protein sequences, word alignment for machine translation, and finding optimal matchings in optimization problems.
Examining Committee:
Committee Chair: Dr. Atif Memon
Dept's Representative: Dr. Ashok Agrawala
Committee Member: Dr. Mihai Pop