TypeclassesA Tutorial on Typeclasses in Coq

Suppose we need string converters for lots of data structures. We can build a small library of primitives
       showBool : bool -> string
       showNat : nat -> string
and combinators for structured types:
     showList : {A : Type} 
                (A -> string) -> (list A) -> string
     showPair : {A B : Type} 
                (A -> string) -> (B -> string) -> A * B -> string
Then we can build string converters for more complex structured types by assembling them from these pieces:
     showListOfPairsOfNats = showList (showPair showNat showNat)

This works, but it's clunky.
  • Requires making up names for all these string converters, when it seems they could be generated automatically.
  • Moreover, even the definitions of converters like showListOfPairsOfNats seem pretty mechanical, given their types.
Solution: Typeclasses.
  • Based on Haskell
  • Coq's typeclasses are a thin layer around existing features
  • Important to have a clear understanding what's going on under the hood
  • N.b., this is not "classes" as in object-oriented programming!

Classes and Instances

To automate converting various kinds of data into strings, we begin by defining a typeclass called Show.

Class Show A : Type :=
  {
    show : Astring
  }.

The Show typeclass can be thought of as "classifying" types whose values can be converted to strings -- that is, types A such that we can define a function show of type A string.
We can declare that bool is such a type by giving an Instance declaration that witnesses this function:

Instance showBool : Show bool :=
  {
    show := fun b:boolif b then "true" else "false"
  }.

Other types can similarly be equipped with Show instances -- including, of course, new types that we define.

Inductive primary := Red | Green | Blue.

Instance showPrimary : Show primary :=
  {
    show :=
      fun c:primary
        match c with
        | Red ⇒ "Red"
        | Green ⇒ "Green"
        | Blue ⇒ "Blue"
        end
  }.

Instance showNat : Show nat :=
  {
    show := string_of_nat
  }.

Next, we can define functions that use the overloaded function show like this:

Definition showOne {A : Type} `{Show A} (a : A) : string :=
  "The value is " ++ show a.

Compute (showOne true).
Compute (showOne 42).
The parameter `{Show A} is a class constraint, which states that the function showOne is expected to be applied only to types A that belong to the Show class.
Concretely, this constraint should be thought of as an extra parameter to showOne supplying evidence that A is an instance of Show -- i.e., it is essentially just a show function for A, which is implicitly invoked by the expression show a.

More interestingly, a single function can come with multiple class constraints:

Definition showTwo {A B : Type}
           `{Show A} `{Show B} (a : A) (b : B) : string :=
  "First is " ++ show a ++ " and second is " ++ show b.

Compute (showTwo true 42).
Compute (showTwo Red Green).
In the body of showTwo, the type of the argument to each instance of show determines which of the implicitly supplied show functions (for A or B) gets invoked.

Here is another basic example of typeclasses: a class Eq describing types with a (boolean) test for equality.

Class Eq A :=
  {
    eqb: AAbool;
  }.

Notation "x =? y" := (eqb x y) (at level 70).

And here are some basic instances:

Instance eqBool : Eq bool :=
  {
    eqb := fun (b c : bool) ⇒
       match b, c with
         | true, truetrue
         | true, falsefalse
         | false, truefalse
         | false, falsetrue
       end
  }.

Instance eqNat : Eq nat :=
  {
    eqb := Nat.eqb
  }.

Parameterized Instances: New Typeclasses from Old

What about a Show instance for pairs?

Instance showPair {A B : Type} `{Show A} `{Show B} : Show (A × B) :=
  {
    show p :=
      let (a,b) := p in
        "(" ++ show a ++ "," ++ show b ++ ")"
  }.

Compute (show (true,42)).

Similarly, here is an Eq instance for pairs...

Instance eqPair {A B : Type} `{Eq A} `{Eq B} : Eq (A × B) :=
  {
    eqb p1 p2 :=
      let (p1a,p1b) := p1 in
      let (p2a,p2b) := p2 in
      andb (p1a =? p2a) (p1b =? p2b)
  }.

...and here is Show for lists:

Fixpoint showListAux {A : Type} (s : Astring) (l : list A) : string :=
  match l with
    | nil ⇒ ""
    | cons h nils h
    | cons h tappend (append (s h) ", ") (showListAux s t)
  end.

Instance showList {A : Type} `{Show A} : Show (list A) :=
  {
    show l := append "[" (append (showListAux show l) "]")
  }.

Class Hierarchies

We often want to organize typeclasses into hierarchies. For example, we might want a typeclass Ord for "ordered types" that support both equality and a less-or-equal comparison operator.
We parameterize the definition of Ord on an Eq class constraint:

Class Ord A `{Eq A} : Type :=
  {
    le : AAbool
  }.

When we define instances of Ord, we just have to implement the le operation.

Instance natOrd : Ord nat :=
  {
    le := Nat.leb
  }.

Functions expecting to be instantiated with an instance of Ord now have two class constraints, one witnessing that they have an associated eqb operation, and one for le.

Definition max {A: Type} `{Eq A} `{Ord A} (x y : A) : A :=
  if le x y then y else x.

How It Works

Typeclasses in Coq are a powerful tool, but the expressiveness of the Coq logic makes it hard to implement sanity checks like Haskell's "overlapping instances" detector.
As a result, using Coq's typeclasses effectively -- and figuring out what is wrong when things don't work -- requires a clear understanding of the underlying mechanisms.

Implicit Generalization

The first thing to understand is exactly what the "backtick" notation means in declarations of classes, instances, and functions using typeclasses. This is actually a quite generic mechanism, called implicit generalization, that was added to Coq to support typeclasses but that can also be used to good effect elsewhere.
The basic idea is that unbound variables mentioned in bindings marked with ` are automatically bound in front of the binding where they occur.

To enable this behavior for a particular variable, say A, we first declare A to be implicitly generalizable:

Generalizable Variables A.

Now, for example, we can shorten the declaration of the showOne function by omitting the binding for A at the front.

Definition showOne1 `{Show A} (a : A) : string :=
  "The value is " ++ show a.
Coq will notice that the occurrence of A inside the `{...} is unbound and automatically insert the binding that we wrote explicitly before.

Print showOne1.
(* ==>
    showOne1 = 
      fun (A : Type) (H : Show A) (a : A) => "The value is " ++ show a
           : forall A : Type, Show A -> A -> string

    Arguments A, H are implicit and maximally inserted
*)

In fact, even the `{Show A} form hides one bit of implicit generalization: the bound name of the Show constraint itself. You will sometimes see class constraints written more explicitly, like this...

Definition showOne2 `{_ : Show A} (a : A) : string :=
  "The value is " ++ show a.
... or even like this:

Definition showOne3 `{H : Show A} (a : A) : string :=
  "The value is " ++ show a.

If we ask Coq to print the arguments that are normally implicit, we see that all these definitions are exactly the same internally.

Set Printing Implicit.
Print showOne.
Print showOne1.
Print showOne2.
Print showOne3.
(* ==> 
    showOne = 
        fun (A : Type) (H : Show A) (a : A) => 
          "The value is " ++ @show A H a
      : forall A : Type, Show A -> A -> string
*)

Why is this useful?
Because it works recursively.

Definition max1 `{Ord A} (x y : A) :=
  if le x y then y else x.

Set Printing Implicit.
Print max1.
(* ==>
     max1 = 
       fun (A : Type) (H : Eq A) (H0 : @Ord A H) (x y : A) =>
         if @le A H H0 x y then y else x

   : forall (A : Type) (H : Eq A), 
       @Ord A H -> A -> A -> A    
*)


Check Ord.
(* ==> Ord : forall A : Type, Eq A -> Type *)

Records are Products

Although records are not part of its core, Coq does provide some simple syntactic sugar for defining and using records.

Record types must be declared before they are used. For example:

Record Point :=
  Build_Point
    {
      px : nat;
      py : nat
    }.
Internally, this declaration is desugared into a single-field inductive type, roughly like this:
    Inductive Point : Set :=
      | Build_Point : natnatPoint.

Elements of this type can be built, if we like, by applying the Build_Point constructor directly.

Check (Build_Point 2 4).
Or we can use more familar record syntax, which allows us to name the fields and write them in any order:

Check {| px := 2; py := 4 |}.
Check {| py := 4; px := 2 |}.

We can also access fields of a record using conventional "dot notation" (with slightly clunky concrete syntax):

Definition r : Point := {| px := 2; py := 4 |}.

Compute (r.(px) + r.(py)).

Record declarations can also be parameterized:

Record LabeledPoint (A : Type) :=
  Build_LabeledPoint
    {
      lx : nat;
      ly : nat;
      label : A
    }.

Check {| lx:=2; ly:=4; label:="hello" |}.
(* ==> 
     {| lx := 2; ly := 4; label := "hello" |}
        : LabeledPoint string
*)


Typeclasses are Records

Typeclasses and instances, in turn, are basically just syntactic sugar for record types and values (together with a bit of magic for using proof search to fill in appropriate instances during typechecking, as described below).
Internally, a typeclass declaration is elaborated into a parameterized Record declaration:

Set Printing All.
Print Show.
(* ==> 
    Record Show (A : Type) : Type := 
      Build_Show
        { show : A -> string } 
*)

Analogously, Instance declarations become record values:

Print showNat.
(* ==>
    showNat = {| show := string_of_nat |}
       : Show nat
*)


Similarly, overloaded functions like show are really just record projections, which in turn are just functions that select a particular argument of a one-constructor Inductive type.

Set Printing All.
Print show.
(* ==>
    show = 
      fun (A : Type) (Show0 : Show A) =>
        match Show0 with
          | Build_Show _ show => show
        end
   : forall (A : Type), Show A -> A -> string 

   Arguments A, Show are implicit and maximally inserted  *)

Inferring Instances

The "special sauce" of typeclasses is the way appropriate instances are automatically inferred (and/or constructed!) during typechecking.
For example, if we write show 42, what we actually get is @show nat showNat 42:

Definition eg42 := show 42.

Set Printing Implicit.
Print eg42.
How does this happen?

First, since the arguments to show are marked implicit, what we typed is automatically expanded to @show _ _ 42. The first _ should obviously be replaced by nat. But what about the second?
By ordinary type inference, Coq knows that, to make the whole expression well typed, the second argument to @show must be a value of type Show nat. It attempts to find or construct such a value using a variant of the eauto proof search procedure that refers to a "hint database" called typeclass_instances.

We can see what's happening during the instance inference process if we issue the Set Typeclasses Debug command.

Set Typeclasses Debug.
Check (show 42).
(* ==>
     Debug: 1: looking for (Show nat) without backtracking
     Debug: 1.1: exact showNat on (Show nat), 0 subgoal(s)
*)


Check (show (true,42)).
(* ==>
     Debug: 1: looking for (Show (bool * nat)) without backtracking
     Debug: 1.1: simple apply @showPair on (Show (bool * nat)), 2 subgoal(s)
     Debug: 1.1.3 : (Show bool)
     Debug: 1.1.3: looking for (Show bool) without backtracking
     Debug: 1.1.3.1: exact showBool on (Show bool), 0 subgoal(s)
     Debug: 1.1.3 : (Show nat)
     Debug: 1.1.3: looking for (Show nat) without backtracking
     Debug: 1.1.3.1: exact showNat on (Show nat), 0 subgoal(s)      *)


In summary, here are the steps again:
    show 42
       ===> { Implicit arguments }
    @show _ _ 42
       ===> { Typing }
    @show (?A : Type) (?Show0 : Show ?A) 42
       ===> { Unification }
    @show nat (?Show0 : Show nat) 42
       ===> { Proof search for Show Nat returns showNat }
    @show nat showNat 42

Typeclasses and Proofs

Since programs and proofs in Coq are fundamentally made from the same stuff, the mechanisms of typeclasses extend smoothly to situations where classes contain not only data and functions but also proofs.
This is a big topic -- too big for a basic tutorial -- but let's take a quick look at a few things.

Propositional Typeclass Members

The Eq typeclass defines a single overloaded function that tests for equality between two elements of some type. We can extend this to a subclass that also comes with a proof that the given equality tester is correct, in the sense that, whenever it returns true, the two values are actually equal in the propositional sense (and vice versa).

Class EqDec (A : Type) {H : Eq A} :=
  {
    eqb_eq : x y, x =? y = truex = y
  }.

To build an instance of EqDec, we must now supply an appropriate proof.

Instance eqdecNat : EqDec nat :=
  {
    eqb_eq := Nat.eqb_eq
  }.

If we do not happen to have an appropriate proof already in the environment, we can simply omit it. Coq will enter proof mode and ask the user to use tactics to construct inhabitants for the fields.

Instance eqdecBool' : EqDec bool.
Proof.
  constructor.
  intros x y. destruct x; destruct y; simpl; unfold iff; auto.
Defined.

Given a typeclass with propositional members, we can use these members in proving things involving this typeclass.

Lemma eqb_fact `{EqDec A} : (x y z : A),
  x =? y = truey =? z = truex = z.
Proof.
  intros x y z Exy Eyz.
  rewrite eqb_eq in Exy.
  rewrite eqb_eq in Eyz.
  subst. reflexivity. Qed.


Some Useful Typeclasses

Dec

The ssreflect library defines what it means for a proposition P to be decidable like this...

Require Import ssreflect ssrbool.

Print decidable.
(* ==>
     decidable = fun P : Prop => {P} + {~ P}
*)

... where {P} + {~ P} is an "informative disjunction" of P and ¬P.

It is easy to wrap this in a typeclass of "decidable propositions":

Class Dec (P : Prop) : Type :=
  {
    dec : decidable P
  }.

We can now create instances encoding the information that propositions of various forms are decidable. For example, the proposition x = y is decidable (for any specific x and y), assuming that x and y belong to a type with an EqDec instance.

Instance EqDec__Dec {A} `{H : EqDec A} (x y : A) : Dec (x = y).
Proof.
  constructor.
  unfold decidable.
  destruct (x =? y) eqn:E.
  - left. rewrite <- eqb_eq. assumption.
  - right. intros C. rewrite <- eqb_eq in C. rewrite E in C. inversion C.
Defined.

Similarly, we can lift decidability through logical operators like conjunction:

Instance Dec_conj {P Q} {H : Dec P} {I : Dec Q} : Dec (PQ).
Proof.
  constructor. unfold decidable.
  destruct H as [D]; destruct D;
    destruct I as [D]; destruct D; auto;
      right; intro; destruct H; contradiction.
Defined.

One reason for doing all this is that it makes it easy to move back and forth between the boolean and propositional worlds, whenever we know we are dealing with decidable propositions.
In particular, we can define a notation P? that converts a decidable proposition P into a boolean expression.

Notation "P '?'" :=
  (match (@dec P _) with
   | left _true
   | right _false
   end)
  (at level 100).

Now we don't need to remember that, for example, the test for equality on numbers is called eqb, because instead of this...

Definition silly_fun1 (x y z : nat) :=
  if andb (x =? y) (andb (y =? z) (x =? z))
  then "all equal"
  else "not all equal".
... we can just write this:

Definition silly_fun2 (x y z : nat) :=
  if (x = yy = zx = z)?
  then "all equal"
  else "not all equal".

Monad

In Haskell, one place typeclasses are used very heavily is with the Monad typeclass, especially in conjunction with Haskell's "do notation" for monadic actions.
Monads are an extremely powerful tool for organizing and streamlining code in a wide range of situations where computations can be thought of as yielding a result along with some kind of "effect." Examples of possible effects include
  • input / output
  • state mutation
  • failure
  • nondeterminism
  • randomness
  • etc.

There are lots of different monad packages for Coq. We like Gregory Malecha's (from his ExtLib library).

Require Export ExtLib.Structures.Monads.
Export MonadNotation.
Open Scope monad_scope.

The main definition provided by this library is the following typeclass:
    Class Monad (M : TypeType) : Type :=
       {
         ret : {T : Type}, TM T ;
         bind : {T U : Type}, M T → (TM U) → M U
       }.
That is, a type family M is an instance of the Monad class if we can define functions ret and bind of the appropriate types.

For example, we can define a monad instance for option like this:

Instance optionMonad : Monad option :=
  {
    ret T x :=
      Some x ;
    bind T U m f :=
      match m with
        NoneNone
      | Some xf x
      end
  }.

The other nice thing we get from the Monad library is lightweight notation for bind: Instead of
      bind m1 (fun xm2),
we can write
      x <- m1 ;; m2.
Or, if the result from m1 is not needed in m2, then instead of
      bind m1 (fun _m2),
we can write
      m1 ;; m2.

This allows us to write functions involving "option plumbing" very compactly.
For example, suppose we have a function that looks up the nth element of a list, returning None if the list contains less than n elements.

Fixpoint nth_opt {A : Type} (n : nat) (l : list A) : option A :=
  match l with
    [] ⇒ None
  | h::tif n =? 0 then Some h else nth_opt (n-1) t
  end.
We can write a function that sums the first three elements of a list (returning None if the list is too short) like this:

Definition sum3 (l : list nat) : option nat :=
  x0 <- nth_opt 0 l ;;
  x1 <- nth_opt 1 l ;;
  x2 <- nth_opt 2 l ;;
  ret (x0 + x1 + x2).

One final convenience for programming with monads is a collection of operators for "lifting" functions on ordinary values to functions on monadic computations. ExtLib defines three -- one for unary functions, one for binary, and one for ternary. The definitions (slightly simplified) look like this:

Definition liftM
            {m : TypeType}
            {M : Monad m}
            {T U : Type} (f : TU)
          : m Tm U :=
  fun xbind x (fun xret (f x)).

Definition liftM2
            {m : TypeType}
            {M : Monad m}
            {T U V : Type} (f : TUV)
          : m Tm Um V :=
    fun x ybind x (fun xliftM (f x) y).

For example, suppose we have two option nats and we would like to calculate their sum (unless one of them is None, in which case we want None). Instead of this...

Definition sum3opt (n1 n2 : option nat) :=
  x1 <- n1 ;;
  x2 <- n2 ;;
  ret (x1 + x2).
...we can use liftM2 to write it more compactly:

Definition sum3opt' (n1 n2 : option nat) :=
  liftM2 plus n1 n2.

Controlling Instantiation

"Defaulting"

The type of the overloaded eqb operator...

Check @eqb.
(* ==>
     @eqb : forall A : Type, Eq A -> A -> A -> bool    *)

... says that it works for any Eq type. Naturally, we can use it in a definition like this...

Definition foo x := if x =? x then "Of course" else "Impossible".
... and then we can apply foo to arguments of any Eq type.
Right?


Fail Check (foo true).
(* ==>
     The command has indeed failed with message:
     The term "true" has type "bool" while it is expected 
       to have type "bool -> bool". *)

Here's what happened:
  • When we defined foo, the type of x was not specified, so Coq filled in a unification variable (an "evar") ?A.
  • When typechecking the expression eqb x, the typeclass instance mechanism was asked to search for a type-correct instance of Eq, i.e., an expression of type Eq ?A.
  • This search immediately succeeded because the first thing it tried worked; this happened to be the constant eqBoolBool : Eq (boolbool). In the process, ?A got instantiated to boolbool.
  • The type calculated for foo was therefore (boolbool)->(boolbool)->bool.
The lesson is that it matters a great deal exactly what problems are posed to the instance search engine.

Manipulating the Hint Database

One of the ways in which Coq's typeclasses differ most from Haskell's is the lack, in Coq, of an automatic check for "overlapping instances."
That is, it is completely legal to define a given type to be an instance of a given class in two different ways.

Inductive baz := Baz : natbaz.

Instance baz1 : Show baz :=
  {
    show b :=
      match b with
        Baz n ⇒ "Baz: " ++ show n
      end
  }.

Instance baz2 : Show baz :=
  {
    show b :=
      match b with
        Baz n ⇒ "[" ++ show n ++ " is a Baz]"
      end
  }.

Compute (show (Baz 42)).
(* ==> 
     = "42 is a Baz"
     : string   *)

When this happens, it is unpredictable which instance will be found first by the instance search process; here it just happened to be the second. The reason Coq doesn't do the overlapping instances check is because its type system is much more complex than Haskell's -- so much so that it is very challenging in general to decide whether two given instances overlap.
The reason this is unfortunate is that, in more complex situations, it may not be obvious when there are overlapping instances.

One way to deal with overlapping instances is to "curate" the hint database by explicitly adding and removing specific instances.
To remove things, use Remove Hints:

Remove Hints baz1 baz2 : typeclass_instances.

To add them back (or to add arbitrary constants that have the right type to be intances -- i.e., their type ends with an applied typeclass -- but that were not created by Instance declarations), use Existing Instance:

Existing Instance baz1.
Compute (show (Baz 42)).
(* ==> 
     = "Baz: 42"
     : string    *)


Remove Hints baz1 : typeclass_instances.

Another way of controlling which instances are chosen by proof search is to assign priorities to overlapping instances:

Instance baz3 : Show baz | 2 :=
  {
    show b :=
      match b with
        Baz n ⇒ "Use me first! " ++ show n
      end
  }.

Instance baz4 : Show baz | 3 :=
  {
    show b :=
      match b with
        Baz n ⇒ "Use me second! " ++ show n
      end
  }.

Compute (show (Baz 42)).
(* ==> 
     = "Use me first!  42"
     : string  *)

0 is the highest priority.
If the priority is not specified, it defaults to the number of binders of the instance. (This means that more specific -- less polymorphic -- instances will be chosen over less specific ones.)

Debugging

Instantiation Failures

One downside of using typeclasses, especially many typeclasses at the same time, is that error messages can become puzzling.
Here are some relatively easy ones.

Inductive bar :=
  Bar : natbar.

Fail Definition eqBar :=
  (Bar 42) =? (Bar 43).

(* ===> 
    The command has indeed failed with message:
    Unable to satisfy the following constraints:
       ?Eq : "Eq bar"  *)


Fail Definition ordBarList :=
  le [Bar 42] [Bar 43].

(* ===>
    The command has indeed failed with message:
    Unable to satisfy the following constraints:
      ?H : "Eq (list bar)"
      ?Ord : "Ord (list bar)" *)

In these cases, it's pretty clear what the problem is. To fix it, we just have to define a new instance. But in more complex situations it can be trickier.

A few simple tricks can be very helpful:
  • Do Set Printing Implicit and then use Check and Print to investigate the types of the things in the expression where the error is being reported.
  • Add some @ annotations and explicitly fill in some of the arguments that should be getting instantiated automatically, to check your understanding of what they should be getting instantiated with.
  • Turn on tracing of instance search with Set Typeclasses Debug.
The Set Typeclasses Debug command has a variant that causes it to print even more information: Set Typeclasses Debug Verbosity 2. Writing just Set Typeclasses Debug is equivalent to Set Typeclasses Debug Verbosity 1.

Another potential source of confusion with error messages comes up if you forget a `. For example:

Fail Definition max {A: Type} {Ord A} (x y : A) : A :=
  if le x y then y else x.
(* ===>
     The command has indeed failed with message:
     Unable to satisfy the following constraints:
     UNDEFINED EVARS:
      ?X354==A Type (type of Ord) {?T}
      ?X357==A0 Ord A x y Eq A (parameter H of @le) {?H}
      ?X358==A0 Ord A x y Ord A (parameter Ord of @le) {?Ord}  *)

The UNDEFINED EVARS here is because the binders that are automatically inserted by implicit generalization are missing.

Nontermination

An even more annoying way that typeclass instantiation can go wrong is by simply diverging. Here is a small example of how this can happen.

Declare a typeclass involving two types parameters A and B -- say, a silly typeclass that can be inhabited by arbitrary functions from A to B:

Class MyMap (A B : Type) : Type :=
  {
    mymap : AB
  }.

Declare instances for getting from bool to nat...

Instance MyMap1 : MyMap bool nat :=
  {
    mymap b := if b then 0 else 42
  }.
... and from nat to string:

Instance MyMap2 : MyMap nat string :=
  {
    mymap := fun n : nat
      if le n 20 then "Pretty small" else "Pretty big"
  }.

Definition e1 := mymap true.
Compute e1.

Definition e2 := mymap 42.
Compute e2.

Notice that these two instances don't automatically get us from bool to string:

Fail Definition e3 : string := mymap false.

We can try to fix this by defining a generic instance that combines an instance from A to B and an instance from B to C:

Instance MyMap_trans {A B C : Type} `{MyMap A B} `{MyMap B C} : MyMap A C :=
  { mymap a := mymap (mymap a) }.
This does get us from bool to string automatically:

Definition e3 : string := mymap false.
Compute e3.

However, although this example seemed to work, we are actually in a state of great peril: If we happen to ask for an instance that doesn't exist, the search procedure will diverge.

(* 
Definition e4 : list nat := mymap false.
*)


Alternative Structuring Mechanisms

Typeclasses are just one of several mechanisms that can be used in Coq for structuring large developments. Others include:
  • canonical structures
  • bare dependent records
  • modules and functors
(* See the chapter for pointers to further information and 
   comparisons.*)

Advice from Experts

(See the full chapter for lots...)