CMM: Generic programming tutorial: Generic sum, with parametrized iteration

Generic programming abstracts from the concrete data representation / data structure. It should not matter whether the elements to be summed are part of an array, a list or come from elsewhere. We use iterators (generalizations of pointers) to achieve this abstraction.
Author: Guntram Berti

Why only arrays?

As we have seen, we can pass arrays of arbitrary value type to our sum1. But what, if our data live in a std::list<int>? Or even in a file? Or we want to compute column sums of matrices, thus accessing only every nth element?

For computing a sum, we basically just need to access each element once. This is certainly possible for matrix columns and lists. It's just that the concrete details of element access and moving to the next element are different.

To gain a clearer view of what is going on, and how we could generalize sum1, we reformulate the code a bit. What we certainly cannot do with a list, is the random access to elements like in a[i] for an array. But we don't need that for a sum!

template<typename T>
T sum1b(T* a, T* a_end)
{
  T res = 0;
  for(; a != a_end; ++a)
    res += *a;
  return res;
}

Instead of a pointer to the start of the array (the first element) and the number of elements we now provide pointers to the start and after the last element. (Why not to the last element? I'll discuss this elsewhere in more detail, for now, observe that we can thus also specify empty ranges, and in some cases, it might not be possible / cheap to access the last element). Most importantly, the code does not use the random access with operator[] any more, but instead the dereference operator operator* (like *a). Thus, the call of sum1 changes a bit:

float a[10];
// ...
float sa = sum1b(a, a+10);

What did we gain with this change? We now have reduced the necessary operations for the argument a of type T*:

pointer increment: ++a
pointer comparison: a != a_end
pointer dereference: *a

This still looks like being rather specialized to pointers, but in reality, these are quite simple operations, which in principle can be implemented for lists as well (and equally for matrix columns).

Generalizations of pointers: Iterators

Until now, we simply changed our implementation of sum, to make it more generic. At this point however, we have to add something to the intended argument types, in order to provide the necessary functionality. Or, more precisely, to translate existing functionality into the language of the generic function sum: Our basic assumption is that the necessary functionality is in principle already present in the argument type in reality, we'll see we need a closer look on sum1. So, let's take as example a simple list definition:

struct mylist {
  int      data;
  mylist  *next;
};

For this class, we now want to implement a helper type that behaves like a pointer with respect to the functionality listed above (increment, comparison, dereference), i.e. supports the operations ++p, *p and p != q. Such a type permits iteration over list elements and is called an Iterator type. As iterators are ubiquitous, they have been formalized in the C++ standard library. In our example below, we implement only a subset of the iterator functionality.

struct mylist_iter {
  // ...
  mylist  *curr;
  mylist_iter& operator++() { 
    if(curr) 
      curr = curr->next;
    return *this; 
  }
  // precondition: curr != 0 
  int&         operator* () { return curr->data; } 
};

bool operator!=(mylist_iter const& p, mylist_iter const& q)
{ return p.curr != q.curr; }

Well equipped, we can now extend our definition of sum to accept not only pointer types, but arbitrary iterators providing the expected interface:

template<typename Iterator>
?? sum2(Iterator  a, Iterator a_end)
{
  ?? res = 0;
  for(; a != a_end; ++a)
    res += *a;
  return res;
}

But wait. We called for some trouble here. What should be the return type of sum2? Before, we could deduce this type T directly from the argument type T*. From where do we take it now?