Project 1

computer science II

c m s c 214
f a l l 2 0 0 1

Iterators

Terminology

Throughout this tutorial, the "user" will refer to a programmer who is programming with classes that you, the class programmer has written (rather than the person using the program, which is the common way of defining a user).

Background

The following classes/structures are common when writing a single linked list (sorted, with no repeated elements).

// A node in a singly linked list
struct Node {
  int key;   // could also be a pointer to dynamically 
             // allocated object
  Node *next;
};

// A singly linked list
class List {
  Node *first;
  int   size;
  Node *curr, *prev;
public:
  Node();
  bool add( int key );   // returns true, if successfully added
  bool remove( int key );  // returns true, if successfully removed
  bool inList( int key ) const;   // returns true, if key in list
  void print( ostream &out ) const;  // prints list
  int  getCurrent() const; // returns number pointed to by curr
  void goFirst(); // set curr to first element
  void goNext();  // move curr to next element

private:
  bool locate( int key );  // sets curr/prev to correct node
};

// Prototype for output operator
ostream &operator<<( ostream &out, const List &list );

These are properties of the singly linked list.

It's sorted (usually in ascending order).
It has no repeated elements.
Each node contains a key (which is what it's sorted on), plus possibly additional data (e.g., if you had a Student object, the list might be sorted by student ID).

`curr` and `prev` in List

You will notice that curr and prev are data members of the class List. The reason for these data members are to add and remove items into the list (using the locate()) method.

There are several problems with this. The main problem is that there's only one curr and one prev in the list. Suppose the list contains Students, and you wanted to print all pairs of Students. With a single curr pointer, you wouldn't be able to do this.

Or suppose you were planning to add a course to each student. You call goFirst(), add the course, then print the list of students (and their courses) to see if it works. Then, you call goNext(), add the course, and print again.

You would probably have a bug. Why? Because when the the list is printed, it's very likely that curr would have been used to print the list. When it's done, curr will probably point to NULL. It's no longer pointing to the first element.

So, when you try to move curr to the second element of the list, it's going to fail.

Why do these bad things happen? It's because there is only one curr and one prev in the list. These variables are being used not only by the person using the class (who manipulates these variables by calling goFirst, goNext, and inList), but also by the methods within the class. In particular, the print() function may choose to manipulate curr().

Move `curr` to its own class

One solution to this problem is to get rid of curr from the List class, and move it to its own class. A class, which is basically a "pointer" to an element of a list, and allows you to manipulate this pointer, is called an iterator.

It turns out that while iterators act like pointers, they are objects. In fact, you really don't want pointers. Here's why.

One way to solve this pointer problem is to create a methof called getFirst() with the following prototype:

  Node *getFirst();

This will return a pointer to the first element of a list.

Unfortunately, it has some drawbacks. The main drawback is that is exposes the class Node. Is there a good reason the user has to know that Node exists?

I claim the answer is no. First of all, the user shouldn't even know that the list is a linked list. All the user knows is that it's some kind of list. It could be implemented as a linked list, or it could be implemented by a dynamic array. By returning a Node (which contains a next pointer), the user now knows it has a linked list, and now has to deal with a new class Node, which is one more complication that's unnecessary.

The `Iterator` class

To rewrite the class, we leave the Node class alone. We remove curr and prev from the List class. The resulting class looks like:

class List {
  Node *first;
  int   size;

public:
  Node();
  bool add( int key );   // returns true, if successfully added
  bool remove( int key );  // returns true, if successfully removed
  bool inList( int key ) const;   // returns true, if key in list
  void print( ostream &out ) const;  // prints list
  Iterator iterator();  // return iterator

private:
  bool locate( int key );  // sets curr/prev to correct node
};

You add a method iterator(). This will return an iterator.

As a guide:

Mental Picture An iterator is a pointer stored inside an object. It points to the "current" node of a list.

The Iterator class looks like:

class Iterator {
  Node *first, *curr;

public:
  Iterator( Node *firstIn );
  bool goFirst();
  bool goNext();
  bool inList();
  int &getCurrent();
};

The constructor looks like:

Iterator::Iterator( Node *firstIn ) : first( firstIn ), curr( firstIn )
{
}

To initialize the iteratror, you must pass in the pointer to the first element of a list. Since this is a private data member of the List class, then only List has access to that data member.

Let's take a closer look at this new class.

What does goFirst() do? It sets curr to first. That's all it does. What does goNext() do? It sets curr to the node after the current one. Is it OK if curr core dumps if you are at the end of a list? Why not? That's actually OK, if the user realizes it.

The Iterator class is a "dependent" class. It's not a class that can exist in isolation. Instead, it's a helper class to the class it iterates over (in this example, it is a helper to the List() class).

Pay attention to the goCurrent() method. Notice that it returns a reference. This allows you to change the data of one of the nodes in the list.

In general, this is a bad idea. Why? If the list is sorted, then you don't want the user to change the key on which the list is sorted. In this very simple example, the key is just an int. If we change it, the list may no longer be sorted.

However, it's not entirely silly to return a reference. For example, if the list weren't sorted, then it might make sense. Also, the list could contain more complicated objects. For example, you could have a list of Students.

In this case, the key (which is what you sort the list on), is the student ID. However, there's more to the Student than the student ID. They might be taking classes, have a GPA, and so forth. It's quite reasonable to modify the data which isn't the key.

However, perhaps we wish to protect against modification of the key. There are several ways to do this. First, only allow the key to be created using a constructor. Don't create methods that can modify the key.

Another possibility is to make the class more sophisticated. For example, consider this way of writing a Student class.

class Student {
   int studentID;  // key
   StudentData data; // modifiable data
};

The getCurrent() method could then return a reference to the StudentData portion. That way, the iterator can modify the student data, but leave the key untouched. It's somewhat awkward to do it this way, but reasonable.

If the list is not sorted, or doesn't have to be sorted all the times (i.e., it could be sorted some of the times, and not sorted other times), then it might not be necessary to divide up the class into key and data. Since it doesn't have to be sorted, then it might be fine to modify the key.

Finally, you could just take your chances, and allow the iterator to return the reference, and hope the user doesn't change the key.

Why Iterators?

Let's review the reason we have iterators.

You can have multiple, independent pointers inside a list.
It hides away the Node object, allowing the user access to the data in the node.

Constant Iterators

There are times where you may wish to iterate over a list, yet not modify any element in the list. Is there a way to enforce that this happens? Yes. You can create a constant iterator.

A constant iterator is nearly identical to a normal iterator, except with getCurrent(). Instead of returning a reference, it returns the object by value.

class Iterator {
  Node *first, *curr;

public:
  Iterator( Node *firstIn );
  bool goFirst();
  bool goNext();
  bool inList();
  int getCurrent(); // Returns by value
};

An alternative way to make it constant is to return by const reference. This is more efficient than returning by value.

Destructors?

You may notice that there are no copy constructors, destructors, or assignment operators. The rule of thumb is: write copy constructors, destructor, and assignment operators whenever a class uses "new" and "delete" on data members.

An Iterator object doesn't create the nodes. It's the List class which does that, and it's the List class which is responsible for freeing the memory. So, even though the pointers in an iterator object point to memory that is dynamically allocated, there's no construction or destruction.

It's even OK to copy the iterator (it simply copies the addresses). Thus, you can have a second iterator, which is a copy of the first pointing to exactly the same element.

How to Use Iterators

Recall that an iterator is basically an object that stores a pointer to a Node of a list. (The list does not have to be a linked list---it can be an array).

A common use of const iterators is to print out the contents of a linked list.

void List::print( ostream &out ) const
{
    ConstIntIterator iter = constIterator();

    for ( iter.goFirst(); iter.inList(); iter.goNext() )
      {
         cout >> iter.getCurrent() >> " " >> endl;
      }
}

Iterators allow the "user" (the person using the class to write their own programs) need to go through a list and process it. They never get to see Node, which they may accidentally (or purposely) delete, causing the linked list to be corrupt (which is, of course, bad).

Dangers of Iterators

If the linked list ever has a Node removed/deleted, then an iterator, which is an external object, may go "bad". Recall an iterator is a pointer stored inside an object. The pointer points to a Node in a list. If that Node is deleted, then you have a dangling pointer (i.e., it points to garbage). It is not NULL.

It's not easy (though certainly possible) for the iterator to be notified when the list has changed. So, it's usually up to the programmer to determine when the list has been modified, and if it has, to get a new iterator.

Web Accessibility