PhD Defense: Quantifying Flakiness and Minimizing its Effects on Software Testing

Talk

Zebao Gao

Time:

12.04.2017 10:00 to 12:00

Location:

AVW 4172

URL:

https://talks.cs.umd.edu/talks/1939

In software testing, test inputs are passed into a system under test (SUT); the SUT is executed; and a test oracle checks the outputs against expected values. In some cases, the same test case executed multiple times on the same SUT code passes in some runs and fails in others. This is the test flakiness problem, and such test cases are called flaky tests.

The test flakiness problem makes test results and testing techniques unreliable. Flaky tests may be mistakenly labeled as failed, which increases not only the number of reported bugs testers need to check but also the chance of missing real faults. The problem is gaining attention in modern software testing practice, where test execution involves complex interactions, and it raises several new challenges: What metrics should be used to measure the flakiness of a test case? What factors cause or impact flakiness? And how can the effects of flakiness be reduced or minimized?

This research develops a systematic approach to quantitatively analyze and minimize the effects of flakiness. It makes three major contributions. First, a novel entropy-based metric is introduced to quantify the flakiness of different layers of test outputs (such as code coverage, invariants, and GUI state). Second, the impact of a common set of factors on test results in system interactive testing is examined. Last, a new flake filter is introduced to minimize the impact of flakiness by filtering out flaky tests (and test assertions) while retaining bug-revealing ones.

Two empirical studies on five open source applications evaluate the new entropy measure, study the causes of flakiness, and evaluate the usefulness of the flake filter. In particular, the first study empirically analyzes the impact of factors including the system platform, Java version, application initial state, and tool harness configurations. The results show a large impact on SUTs when these factors were uncontrolled, with as many as 184 lines of code coverage differing between runs of the same test cases, and up to 96% false positives with respect to fault detection. The second study evaluates the effectiveness of the flake filter on the SUTs' real faults. The results show that 3.83% of flaky assertions can impact 88.59% of test cases, and that it is possible to automatically obtain a flake filter that, in some cases, completely eliminates flakiness without compromising fault-detection ability.
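
To give a rough sense of how an entropy-based flakiness measure can work, the following is a minimal sketch and not the metric defined in the thesis. It assumes flakiness is measured as the Shannon entropy of the outputs observed when one test case is run repeatedly on the same code; the function name `outcome_entropy` and the sample data are hypothetical.

```python
import math
from collections import Counter


def outcome_entropy(outcomes):
    """Shannon entropy (in bits) of the outputs seen across repeated runs.

    Assumption for illustration only: each run of one test case yields one
    hashable observation (for example "pass"/"fail", or a frozenset of
    covered lines). 0.0 means every run agreed; larger values mean the
    runs disagreed more.
    """
    total = len(outcomes)
    if total == 0:
        raise ValueError("at least one run is required")
    counts = Counter(outcomes)
    # Sum p * log2(1/p) over each distinct observation.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# Hypothetical data: ten runs of a stable test and ten runs of a flaky one.
stable_runs = ["pass"] * 10
flaky_runs = ["pass", "fail"] * 5

print(outcome_entropy(stable_runs))
print(outcome_entropy(flaky_runs))
```

Under this assumption, the same function could be applied to any output layer named in the abstract by changing what each run records, for instance a frozenset of covered line numbers instead of a pass/fail verdict.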

Examining Committee:

Chair: Dr. Atif Memon
Dean's rep: Dr. Gang Qu
Members: Dr. Ashok Agrawala, Dr. James Purtilo, Dr. Alan Sussman
