CMM: Generic programming tutorial: Applying generic reduction

There are a number of expected and not so expected applications of reduce.
Author: Guntram Berti

Even more fun with reduce

Problems worthy of attack prove their worth by hitting back.

Piet Hein

Back once more to our strings. In the example above, I would have preferred to output "This is a sentence" instead of "Thisisasentence". Basically, all that is needed is to insert a space between two words ... but the predefined operator+(std::string const&, std::string const&) doesn't do this. But hey, we can now define and pass our own operators! (Another exercise).

We can use reduce for task which one does not directly think of, like counting the occurrence of elements with certain properties, like std::count_if does. As we have discussed before, the type of init (result type) might be different from the value type of the sequence. For the operator argument this means its both arguments do not need to be of the same type. More precisely, as its first argument it should accept a value of result type (or something result type is convertible to), as second argument a value of value type (or value type should be convertible to it), and return a value convertible to result type. Thus, we can easily implement the functionality of std::count_if with reduce:

template<typename Pred, typename T>
struct count {
  Pred p;
  count(Pred p) : p(p) {}
  int operator()(int cnt, T t) const { return cnt + (p(t) ? 1 : 0); }
};

struct is_positive { bool operator()(float x) const { return x > 0.0; } };

int cnt_positive = reduce(a, a+N, value<int>(0), count<is_positive,float>());

In practice, it quickly becomes cumbersome to define these operators and predicates separately from their use, in particular if they are used only once. Libraries like boost.bind make it somewhat more convenient. A general solution is offered in C++11 by lambdas:

int cnt_positive = reduce(a, a+N, value<int>(0), 
	                  [](int cnt, float x) { return cnt + (x > 0 ? 1 : 0); })

Which is much more concise. Thus, it should be no problem to tackle cases like the following:

struct employee {
  string  name;
  float   salary;
};

vector<employee> staff;
// ... fill staff ...
float total_salaries = reduce(staff.begin(), staff.end(), ???);

Again, this will be easier with C++11 ...

More fun with less

In spite of all these nice examples, it is necessary to think about appropriate limits of the application range of reduce. For instance, the example with string has the drawback, that for each addition memory has to be allocated for the result, which grows a bit. It would be much more efficient to allocate the necessary memory once in advance. But in order to support this in the generic code, we would need to invest quite a bit of work testing for the necessity of pre-allocation, branching to a different version of reduce, and maybe even change the interface again.

I think the linearly growing memory requirement of the result is a crucial structural difference between the addition of numbers (where the result is of the same size as each input element) and the "addition" of strings (where the size of the result is the sum of the sizes of the input elements). Therefore, it seems adequate to introduce a separate generic algorithm maybe called concat, which explicitly takes the memory growth into account.
Exercise: How would you implement concat generically?

In the original implementation of sum, which was limited to addition, we could use arguments of type string only because of the accidental coincidence that the language designers of C++ chose to use operator+ for concatenation — other languages use different operators. Some experts like Alexander Stepanov maintain that using operator+ for concatenation is a design mistake, because concatenation is not commutative, contrary to centuries-old mathematical conventions for the operation +. For our implementation of sum or reduce, the operator is not required to be commutative. But it is easily conceivable to imagine implementations changing the order of operands, for instance for better parallelization. Of course, such implementations must specify this requirement or even better use a compile time branching on this property, but contra-intuitive non-commutative behavior remains error prone.

Our implementation of reduce is also not optimal if the result can be obtained without visiting the entire sequence, for instance by reducing a sequence of booleans with logical_and or logical_or. Strictly speaking, reduce does not fully conform to our strict definition of a generic procedure.
Exercise: Can one repair this deficiency?

Beyond C++

What we have called reduce, is known by a plethora of names. The C++ standard library calls it accumulate (and this function has an additional init argument and the associated problem described before).

In functional programming, general routines like reduce are conventially called higher-order functions (because they take other functions as input or return them as output). The functional programming equivalent of reduce is known as fold or more precisely, foldl, because operations are grouped from the left: a+b+c = (a+b)+c. The counterpart foldr would be obtained if the sequence is traversed in reverse order (assuming commutativity), or when using a recursive implementation (as customary in functional programming):

template<typename Iterator, typename T, typename Op>
T reduce_rec(Iterator begin, Iterator end, value<T> init, Op op)
{ return (begin == end? init : op(*begin, reduce_rec(begin+1, end, init, op))); }

In this case, init would be used at the very end only.