Randomized Testing

Property-Based Testing

One approach to soft­ware test­ing is unit test­ing, where we write indi­vid­ual input-out­put pairs which we expect to hold for our code. This approach is some­times suf­fi­cient and can be good at lever­ag­ing pro­gram­mer knowl­edge of cor­ner cases. How­ever, often times unit tests fail to cap­ture a large enough per­cent­age of pro­gram behav­iour and can miss cor­ner cases not antic­i­pated by the pro­gram­mer.

Another approach is totally ran­dom­ized test­ing, where inputs are ran­domly gen­er­ated and fed to a pro­gram in the hopes of find­ing inputs which cause the pro­gram to crash. This can often cover a far greater per­cent­age of pro­gram behav­iour and no longer relies on pro­gram­mer knowl­edge to gen­er­ate test cases. How­ever, it is not able to iden­tify soft­ware vul­ner­a­bil­i­ties that do not cause the pro­gram to crash (unless you include assert(prop­erty) for all such prop­er­ties at all rel­e­vant points in the code).

Prop­erty-based test­ing falls some­where in the mid­dle of these two. Prop­erty-based test­ing allows pro­gram­mers to write gen­eral prop­er­ties that are expected of soft­ware, and then the tester auto­mat­i­cally gen­er­ates ran­dom tests which test that prop­erty. This approach strikes a nice bal­ance between automa­tion and pro­gram­mer intu­ition.

QuickCheck for Simple Properties

Quick­Check is a prop­erty-based test­ing tool for Haskell cre­ated by Koen Claessen and John Hughes roughly 20 years ago.

In order to intro­duce Quick­Check, let’s first dis­cuss some prop­er­ties of lists.

reverse [x] = [x]
reverse (xs++ys) = reverse ys ++ reverse xs
reverse (reverse xs) = xs

Ok, let’s try to write these in Haskell Quick­Check syn­tax so that we can ver­ify that these prop­er­ties hold.

prop_revrev :: [Int] -> Bool
prop_revrev xs = reverse (reverse xs) == xs

Ok, let’s test this using the Quick­Check func­tion quickcheck.

quick­Check :: (Testable prop) => prop -> IO ()

The type sig­na­ture says that quickcheck takes a Testable prop­erty and returns an IO monad. For now, we can just assume the Testable type­class defines func­tions which allow us to ran­domly sam­ple the inputs of our prop­erty, in this case [Int].

> quick­Check prop_revrev
+++ OK, passed 100 tests.

Huh, cool, ok. quickcheck ran­domly tested the dou­ble reverse prop­erty for lists on 100 dif­fer­ent inputs, and found that the prop­erty held for all ran­domly gen­er­ated inputs.

Let’s try another one.

prop_revapp :: [Int] -> [Int] -> Bool
prop_revapp xs ys = reverse (xs++ys) == reverse xs ++ reverse ys

And let’s use quickcheck again.

> quick­Check prop_revapp
*** Failed! Fal­si­fi­able (after 4 tests and 7 shrinks):

Huh? Do you see the mis­take?

Our prop­erty is slightly wrong. It should be

prop_revapp' :: [Int] -> [Int] -> Bool
prop_revapp' xs ys = reverse (xs++ys) == reverse ys ++ reverse xs

And now we try again.

> quick­Check prop_revapp'
+++ OK, passed 100 tests.

QuickCheck for Conditional Properties

Let’s sup­pose we have the fol­low­ing inser­tion func­tion for lists.

insert :: a -> [a] -> [a]
insert x []                 = [x]
insert x (y:ys) | x < y     = x : y : ys
                | oth­er­wise = y : insert x ys

insert places an ele­ment into a list where the ele­ment to its left is either the begin­ning of the list or less than it, and the ele­ment to its right is either the end of the list or greater than it.

> insert 4 [1,3,5,7]

Ok, let’s try to ver­ify that call­ing insert results in an ordered list. In order to do this, let’s define a pred­i­cated that checks if a list is ordered.

is­Ordered ::         (Ord a) => [a] -> Bool
is­Ordered (x1:x2:xs) = x1 <= x2 && is­Ordered (x2:xs)
is­Ordered _          = True

Ok, now let’s define our prop­erty.

prop_insert_ordered :: Int -> [Int] -> Bool
prop_insert_ordered x xs = is­Ordered (insert x xs)

Let’s see what hap­pens when we check this.

> quick­Check prop_insert_ordered
*** Failed! Fal­si­fi­able (after 4 tests and 3 shrinks):

What’s the prob­lem this time?

Ah, I see, we only expect that the result­ing list will be ordered if the orig­i­nal one is too.

prop_insert_ordered' :: Int -> [Int] -> Bool
prop_insert_ordered' x xs = (not $ is­Ordered xs) || is­Ordered (insert x xs)

And now let’s try that.

> quick­Check prop_insert_ordered'
+++ OK, passed 100 tests.

Great, it worked! ... or did it? Do we see any prob­lems here?

How many of our ran­domly gen­er­ated lists are actu­ally ordered? Well, let’s see.

prop_insert_ordered'' :: Int -> [Int] -> Prop­erty
prop_insert_ordered'' x xs =
    clas­sify (is­Ordered xs) "ord" $
    (not $ is­Ordered xs) || is­Ordered (insert x xs)

clas­sify will tell us how many inputs were actu­ally ordered (report­ing them as “ord”). Notice we had to change the return type from Bool to Prop­erty. This is due to the return type of clas­sify. We will dis­cuss in a moment what a Prop­erty is, but for now, think of it syn­ony­mously with Bool.

> quick­Check prop_insert_ordered''
+++ OK, passed 100 tests (9% ord).

Hmmm, of 100 tests, only 9 were actu­ally ordered. That’s not great. We could just run quick­Check more times. Can we do some­thing slightly bet­ter? Con­di­tional prop­er­ties to the res­cue!

(==>) :: Testable prop => Bool -> prop -> Prop­erty

This func­tion acts just like the impli­ca­tion arrow, except it guar­an­tees that every test case reported sat­is­fies the con­di­tion to the left of the arrow. The type Prop­erty, encodes this notion.

prop_insert_ordered''' :: Int -> [Int] -> Prop­erty
prop_insert_ordered''' x xs =
    clas­sify (is­Ordered xs) "ord" $
    is­Ordered xs ==> is­Ordered (insert x xs)

Ok, surely, this should now ver­ify our prop­erty!

> quick­Check prop_insert_ordered'''
*** Gave up! Passed only 70 tests (100% ord).

Oh no! It only gen­er­ated ordered inputs, but it couldn’t find enough ordered lists! This makes sense. Every zero or sin­gle ele­ment list is ordered. With two ele­ments, only 50% of lists are sorted. The chances keep going down as we go higher. Let’s view this in action.

prop_insert_ordered'''' :: Int -> [Int] -> Prop­erty
prop_insert_ordered'''' x xs =
    col­lect (length xs) $
    is­Ordered xs ==> is­Ordered (insert x xs)

col­lect is sim­i­lar to clas­sify, and will report the num­ber and per­cent­age of tests of dif­fer­ent length that quickcheck uses.

> quick­Check prop_insert_ordered''''
*** Gave up! Passed only 77 tests:
40% 0
39% 1
17% 2
 3% 3
 1% 4

Well, that’s not a very good dis­tri­bu­tion. 40% of the tests were just on the empty list...

Looks like we’re going to have to do some­thing a bit more involved.

User-Defined Generators

A Haskell term that gen­er­ates a ran­dom value of type a has the type Gen.

new­type Gen a = Mk­Gen { un­Gen :: Std­Gen -> Int -> a }

We can think of Haskell terms of type Gen tak­ing a ran­dom num­ber gen­er­a­tor Std­Gen, and a seed Int, and return­ing a ran­dom term of type a.

Quick­Check now defines the Arbi­trary type­class, which are used to sig­nify which types can be ran­domly gen­er­ated.

class Arbi­trary a where
  arbi­trary :: Gen a

The Arbi­trary type­class has one func­tion arbi­trary which is a gen­er­a­tor of a ran­dom value of type a as described above.

We can con­struct new instance of the type­class to define more com­plex gen­er­a­tors.

instance (Arbi­trary a, Arbi­trary b) => Arbi­trary (a,b) where
  arbi­trary = (,) <$> arbi­trary <*> arbi­trary

Quick­Check has built in a num­ber of use­ful gen­er­a­tor com­bi­na­tors which can help with defin­ing new Arbi­trary instances.

choose :: (Sys­tem.Ran­dom.Ran­dom a) => (a, a) -> Gen a
ele­ments :: [a] -> Gen a
oneof :: [Gen a] -> Gen a
fre­quency :: [(Int, Gen a)] -> Gen a

choose takes an inter­val and returns a ran­dom ele­ment from that inter­val. ele­ments takes a list of ele­ments and returns a ran­dom ele­ment from that list. oneof takes a list of gen­er­a­tors and ran­domly selects one ele­ment from one of the gen­er­a­tors. fre­quency gen­er­al­izes oneof such that each gen­er­a­tor in the list of gen­er­a­tors is given an explicit weight.

sam­ple takes a gen­er­a­tor and gen­er­ates exam­ple val­ues from it.

> sam­ple $ choose (0,3)
> sam­ple $ ele­ments [10, 20..100]
> sam­ple $ oneof [ele­ments [10, 20..100], choose (0,3)]
> sam­ple $ fre­quency [(1, ele­ments [10, 20..100]), (9, choose (0,3))]

Ok, let’s com­bine all of this to define our gen­er­a­tor for lists that we wanted!

gen­List ::  (Arbi­trary a) => Gen [a]
gen­List = do
  x  <- arbi­trary
  xs <- gen­List
  return (x:xs)

Does this work?

Well, sort of, but this only gen­er­ates infi­nite lists. Let’s define one that uses one of our com­bi­na­tors from above to gen­er­ate ran­dom lists of inte­gers of at most length m.

gen­List' ::  Int -> Gen [Int]
gen­List' m = do
  xs  <- gen­List
  n   <- ele­ments [0..m]
  return $ take n xs

Let’s gen­er­ate some ran­dom lists!

> sam­ple $ gen­List' 10

Cool! Recall, we wanted a gen­er­a­tor for only sorted lists, so we have one more step.

gen­Ord­List :: Int -> Gen [Int]
gen­Ord­List m = sort <$> gen­List' m

So now we get the fol­low­ing.

> sam­ple $ gen­Ord­List 10

We can now use the forall com­bi­na­tor to actu­ally test our orig­i­nal prop­erty.

for­All :: (Show a, Testable prop)
       => Gen a -> (a -> prop) -> Prop­erty

forall takes a gen­er­a­tor of a type a and a func­tion which pro­duces a Testable type, and returns a Prop­erty.

prop_insert :: Int -> Prop­erty
prop_insert x = for­All gen­Ord­List $
  \xs -> clas­sify (is­Ordered xs) $
         is­Ordered (insert x xs)

Now finally, let’s test it.

> quick­Check prop_insert
+++ OK, passed 100 tests (100% ord).


A QuickCheck Case Study: Primality Testing

Con­sider the fol­low­ing two func­tions.

fac­tors :: Int -> [Int]
fac­tors n = [i | i <- [1..n], n `mod` i == 0]

is­Prime :: Int -> Bool
is­Prime n = fac­tors n == [1,n]

The first func­tion finds all of the fac­tors of a num­ber n in ascend­ing order. The sec­ond deter­mines if a num­ber is prime by check­ing if its only fac­tors are 1 and itself.

Let’s con­sider an alter­na­tive solu­tion, say the fol­low­ing.

is­Prime' :: Int -> Bool
is­Prime' n = length (fac­tors n) == 2

Let’s try to check if our new imple­men­ta­tion is still cor­rect (rel­a­tive to the orig­i­nal imple­men­ta­tion).

prop_is­Prime1 :: Int -> Prop­erty
prop_is­Prime1 n =
   clas­sify (n < 0) "neg" $
   is­Prime n == is­Prime' n

Let’s test it out.

> quick­Check prop_is­Prime1
+++ OK, passed 100 tests (50% neg).

That did ok, but half of the test cases were neg­a­tive, which seems a bit silly. Let’s fix that.

prop_is­Prime2 :: Int -> Prop­erty
prop_is­Prime2 n = (n >= 0) ==> is­Prime n == is­Prime' n

Keep in mind when using ==> that you should be very care­ful to make sure your pred­i­cate to the left of the arrow is not overly pro­hib­i­tive. Con­sider the fol­low­ing.

bad­Prime :: Int -> Bool
bad­Prime _ = True

prop_is­Prime3 :: Int -> Prop­erty
prop_is­Prime3 n = (is­Prime n) ==> is­Prime n == bad­Prime n

This prop­erty only checks if the two func­tions are equal on inputs that are prime, so the prop­erty will be sat­is­fied by the silly bad prime finder.

> quick­Check prop_is­Prime3
+++ OK, passed 100 tests.

A Second QuickCheck Case Study: The Coloring Problem

Recall the graph col­or­ing prob­lem from HW1.

data Balkan = Alba­nia | Bul­garia | Bosnia­And­Herze­gov­ina
            | Kosovo | Mace­do­nia | Mon­tene­gro
  deriv­ing (Show, Eq)

data Color = Blue | Red | Green | Yel­low
  deriv­ing (Show, Eq)

balkans :: [Balkan]
balkans = [Alba­nia, Bul­garia, Bosnia­And­Herze­gov­ina, Kosovo, Mace­do­nia, Mon­tene­gro]

col­ors :: [Color]
col­ors = [Blue, Red, Green, Yel­low]

The is­Good­Col­or­ing func­tion takes a list of adja­cent Balkans and a col­or­ing of those Balkans, and returns true if the col­or­ing was good and false oth­er­wise. Here are two poten­tial solu­tions to that prob­lem.

is­Good­Col­or­ing1 :: [(Balkan, Balkan)] -> [(Balkan,Color)] -> Bool
is­Good­Col­or­ing1 adj col­or­ing
  = null [ (c1,c2) | (c1,c2) <- adj, lookup c1 col­or­ing == lookup c2 col­or­ing &&
                                     lookup c1 col­or­ing /= Noth­ing]

is­Good­Col­or­ing2 :: [(Balkan, Balkan)] -> [(Balkan,Color)] -> Bool
is­Good­Col­or­ing2 adj col­or­ing = null $ do
  (c1, c2) <- adj
  if ((lookup c1 col­or­ing)==(lookup c2 col­or­ing))&& (lookup c1 col­or­ing /= Noth­ing)
    return (c1, c2)

Let’s try to test if they are equiv­a­lent. We will start by defin­ing instances of Arbi­trary for both Balkan and Color.

instance Arbi­trary Balkan where
  arbi­trary = ele­ments balkans

instance Arbi­trary Color where
  arbi­trary = ele­ments col­ors

Let’s cre­ate a prop­erty that we would like to test.

test­Col­or­ing1 :: [(Balkan, Balkan)] -> [(Balkan, Color)] -> Bool
test­Col­or­ing1 adj col =
  is­Good­Col­or­ing1 adj col == is­Good­Col­or­ing2 adj col

And now let’s run it.

> quick­Check test­Col­or­ing1
+++ OK, passed 100 tests.

Is this suf­fi­cient? Maybe not... these col­or­ings are not guar­an­teed to be com­plete, i.e. have a col­or­ing for each Balkan. Let’s fix that.

same­Elems :: Eq a => [a] -> [a] -> Bool
same­Elems xs ys =  length xs == length ys && all (`elem` ys) xs

com­plete :: [(Balkan, Color)] -> Bool
com­plete col­or­ing = balkans `same­Elems` col_balks
    col_balks = map fst col­or­ing

test­Col­or­ing­Com­plete :: [(Balkan, Balkan)] -> [(Balkan, Color)] -> Prop­erty
test­Col­or­ing­Com­plete adj col =
  com­plete col ==>
  is­Good­Col­or­ing_teach adj col == is­Good­Col­or­ing_stud adj col

This is good, but will fail because it is unlikely to ran­domly gen­er­ate a com­plete col­or­ing. Let’s define a gen­er­a­tor for com­plete col­or­ings.

gen­Rand­Com­plete­Col­or­ing :: Gen [(Balkan, Color)]
gen­Rand­Com­plete­Col­or­ing = do
  c1 <- arbi­trary
  c2 <- arbi­trary
  c3 <- arbi­trary
  c4 <- arbi­trary
  c5 <- arbi­trary
  c6 <- arbi­trary
  return [(Alba­nia, c1), (Bul­garia, c2), (Bosnia­And­Herze­gov­ina, c3), (Kosovo, c4),
          (Mace­do­nia, c5), (Mon­tene­gro, c6)]

Finally, let’s make sure there are no rep­e­ti­tions in the adja­cency list, nor are there any self adja­cen­cies.

gen­Rand­Adja­cen­cies :: Gen [(Balkan, Balkan)]
gen­Rand­Adja­cen­cies = do
  adj <- arbi­trary
  return $ fil­ter (\(x,y) -> x /= y)
         $ nub­By (\(x1,y1) (x2,y2) -> (x1,y1) == (x2,y2) ||
                                      (x1,y1) == (y2, x2)) adj

So now we finally have the fol­low­ing.

test­Col­or­ing :: Prop­erty
test­Col­or­ing =
  for­All gen­Rand­Com­plete­Col­or­ing $ \col­or­ing ->
  for­All gen­Rand­Adja­cen­cies $ \adj ->
  clas­sify com­plete "comp" $
  is­Good­Col­or­ing1 adj col­or­ing == is­Good­Col­or­ing2 adj col­or­ing