Typeclasses

Let’s calculate 1 + 1 with integers and with floats. First, in Java.

Sys­tem.out.println(1 + 1);     // 2
Sys­tem.out.println(1.0 + 1.0); // 2.0

Now, in OCaml.

print_int (1 + 1)        (* 2 *)
print_float (1.0 +. 1.0) (* 2. *)

Finally, in Haskell.

print (1 + 1)     -- 2
print (1.0 + 1.0) -- 2.0

Before getting into the specifics, what are the similarities and differences between how we do this in Java and OCaml? Is Haskell more like Java or OCaml?

Ad-hoc Polymorphism

Notice that in Java the + operator is defined for both integers and floats. This is operator overloading or ad-hoc polymorphism, where + adopts different definitions depending on the types of the arguments it’s applied to.

In contrast, OCaml does not support this ad-hoc polymorphism. The + operator has one type and one type only, int -> int -> int. You cannot use it with floats, you have to use a different function, the +. operator.

Haskell is a different beast. The situation looks more similar to Java, where + can be used for both integers and floats. Indeed, Haskell does support ad-hoc polymorphism, but it has a different mechanism than the usual operator overloading present in Java. This is evident from its type signature.You can get the type of + by asking GHCi with :t (+).

(+) :: Num a => a -> a -> a

Here, Num is a typeclass, a concept akin to Java’s interfaces. You can declare any type an instance of the Num typeclass, so long as you implement certain functions, including a definition for +.

Creating an Instance

Let’s take another look at the example at the start. In OCaml we had to use different print functions depending on the argument. This is no surprise. The way you print an integer is different than how you would a float or string, so this requires different functions with different names in a language without ad-hoc polymorphism.You could define a generic print function with type ‘a -> unit, but perhaps not every value has a sensible printable representation. What would you do in those cases?

However in Haskell we could use print for all of them. Why? The type signature tells all.

print :: Show a => a -> IO ()

Any type that is an instance of the Show typeclass can be printed. Any type that is an instance of Show has to implement a show function that outputs a string representation of that value. Here is a good demonstration of this.

data Word = Hello | World

instance Show Word where
  show Hello = "Hola"
  show World = "Mundo"

We’re declaring a new type Word and telling Haskell that the string representation of each value is the Spanish translation. So, if you print Hello you’ll get out Hola.

So what?

This may seem like a convenience of little consequence, but it’s a big deal. Consider the following function in OCaml.

let rec more_than y xs =
  match xs with
  | [] -> []
  | x :: xt ->
      if x > y then x :: (more_than y xt)
      else more_than y xt

more_than 10 [5; 18; 80; -1; 7; 13]
(* = [18; 80; 13] *)

Suppose I want to apply this function to a bunch of foods and get back only the one’s I like better than some particular food.

type food = Apple | Banana | Can­taloupe | Dumpling |
            Egg­plant | Fudge | Grape­fruit

more_than Dumpling [Grape­fruit; Fudge; Can­taloupe; Egg­plant]
(* = [Egg­plant; Fudge; Grape­fruit] *)

Sorry, OCaml will enforce its own derived notion of order. It will assume you want sort based on the order given in the data definition. Not what I want.

Let’s try something similar in Haskell. First, the equivalent piece of code.I’ve written this program to mirror the OCaml version, but you could write it much more compactly using list comprehensions.

more­Than :: (Ord a) => a -> [a] -> [a]
more­Than y [] = []
more­Than y (x : xt) =
  if x > y then x : (more­Than y xt)
  else more­Than y xt

Notice that moreThan doesn’t care about the types of its input except for the fact that it defines some notion of order by being an instance of Ord. This allows a level of abstraction and flexibility unavailable to the OCaml programmer.An OCaml’er would solve this dilemma by forcing the user to provide moreThan with a comparator function. Imagine a more complicated function where you may have different types with possibly many different constraints. This solution doesn’t scale. Now with Haskell I can define the notion of order on food values. I declare foods an instance of Ord with my preferences instead of the lexicographic order.

import Data.List (elem­Index)

data Food = Apple | Banana | Can­taloupe | Dumpling |
            Egg­plant | Fudge | Grape­fruit
            deriv­ing (Eq, Show)

instance Ord Food where
  com­pare x y = com­pare kx ky
    where prefs     = [Grape­fruit, Can­taloupe, Apple,
                       Banana, Dumpling, Fudge, Egg­plant]
          (Just kx) = elem­Index x prefs
          (Just ky) = elem­Index y prefs

Haskell now knows the order I like my food, so I can find only food I like better than apples.

more­Than Apple [Grape­fruit, Fudge, Can­taloupe, Egg­plant]
-- = [Fudge, Egg­plant]

Haskell knows how to print our food items even though we didn’t give it an implementation of show. What gives? Well, I snuck in deriving (Eq, Show) that can automatically generate reasonable instances for some typeclasses, including Show.Any instance of Ord must also be an instance of Eq, so that’s why that’s there. You can also derive Ord and it will use the same ordering as OCaml.

The Expression Problem

Suppose we have a little languageThis particular language is sometimes referred to as Hutton’s razor. for integer addition.

data Expr = Val Int
          | Add Expr Expr

We can easily define some operations over this data.

show :: Expr -> String
show (Val x) = show x
show (Add l r) = (show l) ++ " + " ++ (show r)

eval :: Expr -> Int
eval (Val x) = x
eval (Add l r) = (eval l) + (eval r)

Here there are several data variants including Val and Add. There could also be many different operations, here we have show and eval. The expression problem is to find a way to make both the data and operations extensible. When we say extensible, we want three main criteria.

Modular. The extension should live outside of what it’s extending in a separate module. We should not have to modify the existing code to create an extension.
Separate compilation. If a user has a compiled version of our original module, we should be able to send them a separately compiled extension module and have everything work correctly.
Type safety. The extension should not subvert the static type system.

In a functional language, we can easily get operation extensibility. We simply introduce a new module that imports the data definition from above and implements a new function over the data.

swap :: Expr -> Expr
swap (Val x) = Val x
swap (Add l r) = Add r l

However, if we wanted to add a new data variant we’re stuck. Now we’d have to modify all the existing modules with operations in them to support this new variant. No good.

What about OOP? We can define the language once more, but this time in Java.

pub­lic abstract class Expr {
  pub­lic abstract String to­String();
  pub­lic abstract Inte­ger eval();
}

pub­lic class Val extends Expr {
  pub­lic Inte­ger data;

  pub­lic String to­String() {
    return data.to­String();
  }

  pub­lic Inte­ger eval() {
    return data;
  }
}

pub­lic class Add extends Expr {
  pub­lic Expr left, right;

  pub­lic String to­String() {
    return left.to­String() + " + " + right.to­String();
  }

  pub­lic Inte­ger eval() {
    return left.eval() + right.eval();
  }
}

In an OO language, we can easily get data extensibility. We simply introduce a new module that implements a new class that extends Expr and implements all the required methods.

pub­lic class Neg extends Expr {
  pub­lic Expr data;

  pub­lic String to­String() {
    return "-(" + data.to­String() + ")";
  }

  pub­lic Inte­ger eval() {
    return -data.eval();
  }
}

However, we cannot get operation extensibility. This would necessitate modifying the abstract class and in turn all the classes extending it. No good.

Extensibility with Typeclasses

We will use typeclasses to get extensibility. Consider this modification to the data definition.

{-# LAN­GUAGE Datatype­Con­texts #-}
data Val = Val Int
data Add l r = Add l r

class Expr x
instance Expr Val
instance (Expr l, Expr r) => Expr (Add l r)

There’s some strange stuff going on here. Instead of having a single unified data type with several variants, we have multiple data types, one for each variant, and a single unified typeclass. As expected, each of the types is an instance of the typeclass. Notice the second instance is saying that Add l r is an instance of Expr so long as l and r are too.

Now we’re going to see operation extensibility.

class Expr x => Eval x where
  eval :: x -> Int

instance Eval Val where
  eval (Val x) = x

instance (Eval l, Eval r) => Eval (Add l r) where
  eval (Add l r) = (eval l) + (eval r)

Here too our function is going to be encoded as a typeclass. For a type to be an instance of Eval it must already be an Expr. This makes sense. This typeclass will have a concrete function that must be implemented. Every data type should should then become an instance.

Now, what does a data extension look like?

data Expr x => Neg x = Neg x
instance Expr x => Expr (Neg x)

instance Eval x => Eval (Neg x) where
  eval (Neg x) = -(eval x)

The first two lines define a new data type like we did before, but for Neg. As with our previous constructors, we make it an instance of Expr. Now in order to implement any of the functions over our data, we need only make it an instance of the corresponding typeclass. Here we’ve made it an instance of Eval and hence we can call the eval function on any Neg type.

While slightly exotic, this certainly does solve the expression problem.

Additional Reading

Typeclasses were originally introduced by Wadler and Blott in their paper “How to make ad-hoc polymorphism less ad hoc.”
Niki Vazou has a lecture note on typeclasses that some of this was based upon.
LYaH covers the subject in both chapter 3 and 8.
Ralf Lämmel has two video lectures, one on the expression problem itself, and the other on a solution in Haskell.
Oleksandr Manzyuk has a nice blog post on solutions to the expression problem in both Java and Haskell.