Notes on the VISITOR PATTERN CMSC 433, Spring 2006 Michael Hicks ---------------------------------------------------------------------- We'll explain visitors using the following running example, which implements arithmetic on integers. interface Node { } class Number implements Node { int num; Number(int n) { num = n; } } class Plus implements Node { Node lhs; Node rhs; Plus(Node l, Node r) { lhs = l; rhs = r; } } ---------------------------------------------------------------------- EXAMPLE 1. Here's the standard object-oriented way of implementing "analyses" on an object hierarchy. Here we'll add "compute" to compute the expression. interface Node { int compute(); } class Number implements Node { private int num; Number(int n) { num = n; } int compute() { return num; } } class Plus implements Node { private Node lhs; private Node rhs; Plus(Node l, Node r) { lhs = l; rhs = r; } int compute() { return lhs.compute() + rhs.compute(); } } ---------------------------------------------------------------------- EXAMPLE 2. But if we add many analyses, then the Number and Plus classes will get bloated. Every one will have to change each time we add an analysis. Here's a simple way we could "move the analysis off to the side." It depends on the data stored in Number and Plus as being accessible to outside classes, as in the initial definitions at the top, as opposed to the one for the OO way above. interface NotAVisitor { void visit(Node); } class SumNotAVisitor implements NotAVisitor { int sum = 0; void visit(Node n) { if (n instanceof Number) { sum += ((Number)n).num; } else if (n instanceof Plus) { Plus n1 = (Plus)n; visit(n1.lhs); visit(n1.rhs); } else { throw new RuntimeException("Unexpected case!"); } } } Then we could do Node n = new Plus(new Number(1), new Number(2)); SumNotAVisitor v = new SumNotAVisitor(); v.visit(n); System.out.println("result = "+v.sum); This works, but instanceof is not very pretty. And there's the nasty "else" case which we might forget to implement, say when we add a new kind of Node, like a Minus expression. How can we use dynamic dispatch instead? ---------------------------------------------------------------------- EXAMPLE 3. At first glance, we might think we could do this as follows: interface Visitor { void visit(Number); void visit(Plus); } class SumNotAVisitor2 implements Visitor { int sum = 0; void visit(Number n) { sum += n.num; } void visit(Plus n) { visit(n.lhs); visit(n.rhs); } } That is, we have different visit methods for each kind of Node object. Then, when we call visit(n) where n is a Node, the appropriate visit() method will get called at run time. E.g., just as above: Node n = new Plus(new Number(1), new Number(2)); SumNotAVisitor2 v = new SumNotAVisitor2(); v.visit(n); System.out.println("result = "+v.sum); Our thought is that the first call to visit will be the visit(Plus) method of v, since n is, at run time, a Plus object. However, instead, you will find that the code above does not even compile! We get the errors SumNotAVisitor2.java:7: cannot resolve symbol symbol : method visit (Node) location: class SumNotAVisitor2 visit(n.lhs); ^ SumNotAVisitor2.java:8: cannot resolve symbol symbol : method visit (Node) location: class SumNotAVisitor2 visit(n.rhs); ^ 2 errors The reason is that we cannot know at compile-time what kind of Node lhs or rhs will be. We only know at runtime. But the only thing that is allowed to vary at run time is the RECEIVER of the class. That is, it is certainly reasonable for me to do in EXAMPLE 1 Node n = ... n.compute(); Which method to call will be determined by n's runtime class (e.g., either Plus or Number). But such "dynamic dispatch" is not allowed on method arguments, only the receiver. Thus, even though in Visitor we have two methods with the same name (visit), they are really two completely different methods. That is, we have OVERLOADED the name, but in reality we can think of them as two unrelated methods (e.g., as if it were visitPlus(Plus n) and visitNumber(Number n) instead of visit(Plus n) and visit(Number n) as overloaded now). ---------------------------------------------------------------------- EXAMPLE 4. To fix this problem, we need to do "double dispatch." We want to dispatch both on the receiver (the visitor), and the argument (the node). We can approximate this with an "accept" method as follows: interface Visitor { void visit(Number n); void visit(Plus n); } interface Node { void accept(Visitor v); } class Number implements Node { int num; Number(int n) { num = n; } void accept(Visitor v) { v.visit(this); // The type of this is Number, not Node } } class Plus implements Node { Node lhs; Node rhs; Plus(Node l, Node r) { lhs = l; rhs = r; } void accept(Visitor v) { v.visit(this); // The type of this is Plus } } class SumVisitor implements Visitor { int sum = 0; void visit(Number n) { sum += n.num; } void visit(Plus n) { n.lhs.accept(this); n.rhs.accept(this); // The type of this is SumVisitor, which implements Visitor } } Notice how the visit() method now uses "double dispatch". It first calls the "accept" method on lhs, which is a Node. This allows lhs's type to vary at run time. Next, the accept method inside of either Number or Plus calls v.visit(this), where v is a Visitor, which can safely vary at run time. Moreover, the type of "this" is now either Number for Number.accept() or Plus for Plus.accept(), so which visit() method to call is now clear. So we have dispatched both on the dyanmic type of b (in the call n.accept(v)) and on the dynamic type of v (v.visit(this) where this will be n). Here's a good place to check out the UML diagram for the Visitor pattern. See for example http://128.42.6.89/JavaResources/DesignPatterns/Visitor.jpg. Note that in an OO language with "multi-methods", we wouldn't need double dispatch. That is, EXAMPLE 3 would work properly, allowing dispatch to occur on both the receiver and the arguments. For more on this see for example http://www.onlamp.com/pub/a/python/2003/05/29/multimethods.html http://en.wikipedia.org/wiki/Multimethod Exercise: say we added a new Node called Minus with the obvious meaning. How would we have to change the above visitor so that things still work? ---------------------------------------------------------------------- EXAMPLE 5. In this formulation, the accept() method is trivial, while the traversal and the operations are both defined by the Visitor's visit() methods. We might choose to put the traversal code in the object structure instead, as part of "acceptAndTraverse" methods. For example: interface Visitor { void visit(Number n); void visit(Plus n); } interface Node { // will do the traversal here void accept(Visitor v); } class Number implements Node { int num; Number(int n) { num = n; } void accept(Visitor v) { v.visit(this); } } class Plus implements Node { Node lhs; Node rhs; Plus(Node l, Node r) { lhs = l; rhs = r; } void accept(Visitor v) { v.visit(this); // Now, doing the traversal here lhs.accept(v); rhs.accept(v); } } class SumVisitor implements Visitor { int sum = 0; void visit(Number n) { sum += n.num; } void visit(Plus n) { // Do nothing } } But this is unsatisfactory for two reasons. First, it fixes the traversal for all visitors---e.g., we can't define a different visitor that traverses on the rhs first. Second, it makes managing state within the visitor more difficult, since visit() methods do not know from whence they are called. For example, consider your sample solution to when implementing SumVisitor to support Minus nodes---how would you do it using this technique? Thus, in general, this method is not used. ---------------------------------------------------------------------- EXAMPLE 6. In EXAMPLE 4, the visitor does both the traversal and the processing in the visit() methods. For example, the visit(Plus) method chooses to dispatch first to the lhs and then to the rhs, thus defining the traversal. The actual work is done by the visit(Number) node, which adds all the numbers together. (We might actually perform work within Plus as well, for different visitors.) To make things more generic, we might split these methods up into process() and visit() methods where the latter perform the computation, and the former perform the traversal. For example: interface ProcessVisitor { void visit(Number n); void visit(Plus n); // with process methods for doing the actual work void process(Number n); void process(Plus n); } (Node, Plus, Number as in EXAMPLE 4.) abstract class LtoRPostTraverseVisitor implements ProcessVisitor { void visit(Number n) { process(n); } void visit(Plus n) { n.lhs.accept(this); n.rhs.accept(this); process(n); } abstract void process(Number n); abstract void process(Plus n); } // Here we implement things more generically, with a stack class SumProcessVisitor extends LtoRPostTraverseVisitor { Stack sum; void process(Number n) { sum.push(n.num); } void process(Plus n) { int lsum = sum.pop(); int rsum = sum.pop(); sum.push(lsum+rsum); } } Exercise: Write an abstract in order traversal on which you could build a visitor for printing out arithmetic expressions. ---------------------------------------------------------------------- EXAMPLE 7. We could implement the above idea, but adhere to the standard Visitor interface by using composition, rather than inheritance, by using a "payload" visitor as follows: class LtoRPostTraverseVisitor2 implements Visitor { Visitor payload; LtoRPostTraverseVisitor2(Visitor v) { payload = v; } void visit(Number n) { payload.visit(n); } void visit(Plus n) { n.lhs.accept(this); n.rhs.accept(this); payload.visit(n); } } Exercise: How would you write the equivalent of SumProcessVisitor using this kind of visitor? ---------------------------------------------------------------------- DISCUSSION. - How do visitors differ from Iterators? Could you use an iterator to implement your traversal? If you could, would you need a visitor? - Is there a way to use inheritance in Example 6 WITHOUT the use of a process() method; i.e., to adhere to the standard visitor interface? If so, what is the downside?