1) What are the characteristics of a good training set for the approach of using neural networks in test case reduction? Explain the effects that the training set has on the performance of this approach.

Answer: A good training set for this approach is one that is sampled uniformly over the input space, guaranteeing that each range of inputs has a representative sample in the training set. Such a training set causes the resulting neural network to mimic the actual program behaviour, leading to a reduction in the test suite while still adequately testing all possible paths in the program. On the other hand, an inadequate training set will produce a neural network that does not contain the rules for the input ranges that were not exercised by the training data. This can cause the user to remove important test cases from the test suite (namely the ones that exercise those input ranges), which ultimately leaves significant paths in the program untested.

---------------------------

2) Explain how the rule-extraction phase allows us to actually reduce the number of test cases.

Answer: After the rule-extraction phase, we get an idea of how the input space is partitioned by the rules in the program. We can therefore collapse the many test cases whose values fall in one partition of the input space into a single representative test case for that partition, because all such values follow the same program path, as inferred from the extracted rule.

---------------------------

3) "The pruning phase might cause the neural network representation to produce test suites that might avoid testing important parts of the program." Do you agree with the previous statement? Justify your answer.

Answer: This depends heavily on the quality of the training set (its coverage of the input space) and on the threshold set for the penalty function in the pruning phase.
If these criteria are met, then the pruning phase is guaranteed to remove only the edges that represent rules that are not actually in the real program: a low weight on such an edge indicates that it is not exercised by the training set, in other words that the specific input on one end of the edge does not affect the output associated with the corresponding hidden node. On the other hand, if the training set is not well distributed over the input space, the low weight of an edge might result from insufficient training data rather than from the edge not representing an actual rule in the program.
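To make the pruning step concrete, here is a minimal sketch of magnitude-based edge pruning. The weight values and the `THRESHOLD` constant are illustrative assumptions, not taken from the approach being discussed; the point is that an edge whose weight stays below the threshold is treated as "not exercised" and removed, which is only safe if the training set covered the corresponding input range.

```python
import numpy as np

# Toy weight matrix from 3 inputs to 2 hidden nodes (illustrative values).
# A near-zero weight suggests the input does not influence that hidden node
# on the training data -- but only if the training set covered the input space.
weights = np.array([
    [0.92, 0.03],  # input 0
    [0.01, 0.88],  # input 1
    [0.75, 0.02],  # input 2
])

THRESHOLD = 0.1  # penalty-function threshold chosen by the tester (assumed)

def prune(w, threshold):
    """Zero out edges whose magnitude falls below the threshold."""
    pruned = w.copy()
    pruned[np.abs(pruned) < threshold] = 0.0
    return pruned

pruned = prune(weights, THRESHOLD)
# Count how many edges survive pruning.
print(int(np.count_nonzero(pruned)))  # 3 edges remain
```

If the training data never exercised input 1, its low weight on the first hidden node could reflect missing data rather than a genuinely absent rule, which is exactly the risk the answer above describes.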