CMM: Generic programming tutorial: Generic sum with initialization parameter, how to control the value type?

The initialization parameter comes with some pitfalls, but permits more control.
Author: Guntram Berti

Little pitfalls

Let's look at what we can do with sum3 ...

Implicit type of literal arguments for `init`

int   a[] = { 0,   1,   2,   3 };
float b[] = { 0.5, 1.5, 2.5, 3.5 };

int   sa = sum3(a, a+4, 0); // Yeah!
float sb = sum3(b, b+4, 0); // Ouch!

The result sb is not quite what we expected ... what happened? Basically the same as in the example with strings: the compiler deduces the type of the literal 0 as int, not as float (and why should it?). So sum3 accumulates in an int, and the fractional part of every partial sum is truncated. And if you are unlucky, you don't see a warning anywhere (gcc does not warn with only -Wall, you have to use -Wconversion). So, again, we need to be more explicit:

float b[] = { 0.5, 1.5, 2.5, 3.5 };

float sb  = sum3(b, b+4, float(0)); // Like this!
float sb1 = sum3(b, b+4, 0.0);     //  Or like that?

This is ugly and a fruitful source of bugs. (Btw, is 0.0 of type float or double? It is a double; the float literal would be 0.0f. float(0) is unambiguous and can be generalized to any type.) We would like to shield ourselves and our users from these pitfalls ... we'll come back to that in a minute.
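
To make the effect of the deduced type concrete, here is a minimal sketch. The definition of sum3 is not repeated in this section, so the one below is an assumed reconstruction (the type of init is both compute and result type); check_literal_pitfall is a hypothetical helper that exists only for this illustration.

```cpp
#include <cassert>
#include <type_traits>

// Assumed definition of sum3: the type T of init is both the
// compute type and the result type.
template<typename Iterator, typename T>
T sum3(Iterator a, Iterator a_end, T init)
{
  T res = init;
  for(; a != a_end; ++a)
    res += *a;
  return res;
}

// Hypothetical helper, for illustration only.
inline void check_literal_pitfall()
{
  float b[] = { 0.5f, 1.5f, 2.5f, 3.5f };

  // Literal 0: T is deduced as int, every partial sum is truncated.
  static_assert(std::is_same<decltype(sum3(b, b+4, 0)), int>::value,
                "0 selects int");
  assert(sum3(b, b+4, 0) == 6);            // the exact sum would be 8

  // float(0): T is float, as intended.
  static_assert(std::is_same<decltype(sum3(b, b+4, float(0))), float>::value,
                "float(0) selects float");
  assert(sum3(b, b+4, float(0)) == 8.0f);

  // Literal 0.0: T is double, not float.
  static_assert(std::is_same<decltype(sum3(b, b+4, 0.0)), double>::value,
                "0.0 selects double");
  assert(sum3(b, b+4, 0.0) == 8.0);
}
```

Under this assumed definition, the third call returns the right value, but only because the double result is silently converted back when assigned to a float.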

Controlling the algorithm via `init`

On the other hand, the explicit specification of the result type permits algorithmic improvements by choosing an extended type for computation. Let's look at the following:

vector<unsigned char> dice_rolls(BigN); 

// surprise
unsigned sum_dice_rolls = sum3(dice_rolls.begin(), dice_rolls.end(), (unsigned char)(0));

Looking at the result, we are unpleasantly surprised. A value of type unsigned char can hold the result of a single dice roll, but if BigN > 42, the sum can already be too large (43 rolls of 6 add up to 258, more than the 255 an 8-bit unsigned char can hold): the value type of the sequence is too narrow for summation! In this case, the call

unsigned sum_dice_rolls = sum3(dice_rolls.begin(), dice_rolls.end(), 0);

would have been better (by accident). To make this intent more explicit, we can write (assuming we know unsigned int is large enough):

unsigned sum_dice_rolls = sum3(dice_rolls.begin(), dice_rolls.end(), unsigned(0));
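
The wrap-around can be reproduced with a small worst-case input. This is a sketch only: sum3 is again the assumed reconstruction (init type = compute type), check_dice_overflow is a hypothetical helper, and the numbers rely on unsigned char having 8 bits.

```cpp
#include <cassert>
#include <climits>
#include <vector>

// Assumed definition of sum3, as elsewhere in this tutorial.
template<typename Iterator, typename T>
T sum3(Iterator a, Iterator a_end, T init)
{
  T res = init;
  for(; a != a_end; ++a)
    res += *a;
  return res;
}

// Hypothetical helper, for illustration only.
inline void check_dice_overflow()
{
  static_assert(CHAR_BIT == 8, "sketch assumes an 8-bit unsigned char");

  // 43 rolls that all show a 6: the true sum is 258.
  std::vector<unsigned char> dice_rolls(43, static_cast<unsigned char>(6));

  // Accumulating in unsigned char wraps around modulo 256.
  unsigned narrow = sum3(dice_rolls.begin(), dice_rolls.end(),
                         (unsigned char)(0));
  assert(narrow == 2);     // 258 - 256

  // Accumulating in unsigned keeps the full sum.
  unsigned wide = sum3(dice_rolls.begin(), dice_rolls.end(), unsigned(0));
  assert(wide == 258);
}
```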

But even if the result fits the nominal value type of the sequence, it can be advisable or even necessary to employ a wider type for computation. This is true in particular for floating point types, where the general recommendation is to always use double precision for computations.

std::vector<float> a(1000000, 1.0);
// ...
double a_sum = sum3(a.begin(), a.end(), double(0.0));
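
Why the wider compute type matters can be shown with a deliberately small example. It is a sketch under stated assumptions: sum3 is the assumed reconstruction used throughout, check_float_accumulation is a hypothetical helper, and float is taken to be IEEE 754 single precision with the default round-to-nearest mode. To keep the input short, the running sum starts at 2^24, the point beyond which float can no longer represent every integer; a long sequence summed from 0 reaches the same regime eventually.

```cpp
#include <cassert>
#include <vector>

// Assumed definition of sum3, as elsewhere in this tutorial.
template<typename Iterator, typename T>
T sum3(Iterator a, Iterator a_end, T init)
{
  T res = init;
  for(; a != a_end; ++a)
    res += *a;
  return res;
}

// Hypothetical helper, for illustration only.
inline void check_float_accumulation()
{
  std::vector<float> a(10, 1.0f);

  // Compute type float: 16777216 + 1 is not representable and
  // rounds back to 16777216, so the sum never moves.
  float narrow = sum3(a.begin(), a.end(), 16777216.0f);
  assert(narrow == 16777216.0f);

  // Compute type double: all ten additions are exact.
  double wide = sum3(a.begin(), a.end(), 16777216.0);
  assert(wide == 16777226.0);
}
```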

Improving the design

The lesson to take home is that explicitly choosing the compute and result type via the init parameter is an important way of controlling the computation, but is extremely error-prone as well. Thus, we should think about eliminating the source of errors while keeping control over the compute and result type.

When using sum3, errors come in from two directions:

1. The implicit specification of the type of init by a literal like 0, possibly leading to an inappropriate result type
2. The value type of the input sequence is too narrow for computation and result

Errors of the first kind are indeed due to the generic implementation. Errors of the second kind may be triggered by an error of the first kind, but can occur as well when explicitly specifying the type of init, and thus also in traditional, non-generic code. We'll come back to this type of error later.

In order to eliminate error source no. 1, we can enforce the explicit specification of the type of init, by changing the interface of sum3 such that the type has to be provided:

template<class T>
struct value {
  typedef T type;
  type val;

  value(type const& t) : val(t) {}
  // make T t = value<T>(v) possible
  operator type() const { return val; }
};

template<typename Iterator, typename T>
T
sum3b(Iterator  a, Iterator a_end, value<T> init)
{
  T res = init;
  for(; a != a_end; ++a)
    res += *a;
  return res;
}

We now call sum3b like this:

float a[] = { 0.0, 0.5, 1.0, 1.5 };
float sa  = sum3b(a, a+4, value<float>(0));

This is, admittedly, somewhat clumsy. What about the normal cases, where compute type and iterator value type can coincide?
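
What the clumsier interface buys can be seen in a short sketch. It repeats value and sum3b from above so that it compiles on its own; check_sum3b is a hypothetical helper for illustration. The point is that a bare literal is now rejected at compile time, because T cannot be deduced from an int argument when the parameter type is value<T>.

```cpp
#include <cassert>
#include <type_traits>

template<class T>
struct value {
  typedef T type;
  type val;

  value(type const& t) : val(t) {}
  // const-qualified so that const value objects convert as well
  operator type() const { return val; }
};

template<typename Iterator, typename T>
T sum3b(Iterator a, Iterator a_end, value<T> init)
{
  T res = init;
  for(; a != a_end; ++a)
    res += *a;
  return res;
}

// Hypothetical helper, for illustration only.
inline void check_sum3b()
{
  float a[] = { 0.0f, 0.5f, 1.0f, 1.5f };

  // sum3b(a, a+4, 0);  // does not compile: T cannot be deduced from int

  // The compute and result type is exactly the one spelled out.
  static_assert(std::is_same<decltype(sum3b(a, a+4, value<float>(0))),
                             float>::value, "float requested");
  assert(sum3b(a, a+4, value<float>(0)) == 3.0f);

  // Choosing a wider compute type remains possible.
  static_assert(std::is_same<decltype(sum3b(a, a+4, value<double>(0))),
                             double>::value, "double requested");
  assert(sum3b(a, a+4, value<double>(0)) == 3.0);
}
```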
