With the abundance of testing techniques being developed, a critical question---yet one that research has not adequately answered so far---is how to evaluate the effectiveness of these techniques. The answer is far from straightforward because different techniques may excel in different situations. For example, test suites generated by the same technique may differ in characteristics that affect their ability to detect faults. Different kinds of faults, in turn, may be more easily detected by some techniques than by others, so evaluating the effectiveness of testing techniques against different samples of faults may give different results. This issue is further complicated by the fact that it is not even clear what characteristics should distinguish different kinds of faults. This research proposes to explore three interrelated questions: How do characteristics of test suites affect their ability to detect faults? How do characteristics of faults affect their detectability by test suites? And how should faults be characterized? The research will investigate two insights that are expected to enable progress toward answering these questions. The first insight is that, in empirical studies of testing techniques, testing situations---in particular, test suites and faults---should be characterized, and results should be presented in terms of those characterizations, to enable researchers or practitioners in different situations to better understand how the study results would translate to their own situations. The second insight is that the problem of empirically studying fault detection in testing can be broken down into the sub-problems of studying coverage of faulty program elements, state failures introduced by faults, and failures introduced by state failures. In the proposed research, both insights will be evaluated empirically.
In addition, this work proposes to develop a practical application of the first insight: a framework for adaptive, search-based regression testing, which is expected to enhance existing regression-testing techniques by enabling testers to learn from previous testing iterations how to choose and improve the technique used in the next iteration. To take advantage of existing resources and to limit the proposed work to a feasible scope, the work will focus on testing in the domain of graphical user interfaces (GUIs), which represent an increasing proportion of modern software yet only a small proportion of the subjects in software-testing studies. Although the empirical studies in this work will focus on GUI testing, the insights that motivate the studies may apply to a wide range of testing domains.