Even more fun with reduce
Back once more to our strings.
In the example above,
I would have preferred to output
"This is a sentence"
instead of
"Thisisasentence"
.
Basically, all that is needed is to insert a space between two words ...
but the predefined
operator+(std::string const&, std::string const&)
doesn't do this.
But hey, we can now define and pass our own operators!
(Another exercise).
We can use reduce
for task which one does not directly think of,
like counting the occurrence of elements with certain properties, like std::count_if
does.
As we have discussed before,
the type of init
(result type) might be different from the value type of the sequence.
For the operator argument this means its both arguments do not need to be of the same type.
More precisely, as its first argument it should accept a value of result type (or something result type is convertible to),
as second argument a value of value type (or value type should be convertible to it),
and return a value convertible to result type.
Thus, we can easily implement the functionality of std::count_if
with reduce:
template<typename Pred, typename T>
struct count {
Pred p;
count(Pred p) : p(p) {}
int operator()(int cnt, T t) const { return cnt + (p(t) ? 1 : 0); }
};
struct is_positive { bool operator()(float x) const { return x > 0.0; } };
int cnt_positive = reduce(a, a+N, value<int>(0), count<is_positive,float>());
In practice, it quickly becomes cumbersome to define these operators and predicates separately from their use,
in particular if they are used only once.
Libraries like boost.bind
make it somewhat more convenient.
A general solution is offered in C++11 by lambdas:
int cnt_positive = reduce(a, a+N, value<int>(0),
[](int cnt, float x) { return cnt + (x > 0 ? 1 : 0); })
Which is much more concise. Thus, it should be no problem to tackle cases like the following:
struct employee {
string name;
float salary;
};
vector<employee> staff;
// ... fill staff ...
float total_salaries = reduce(staff.begin(), staff.end(), ???);
Again, this will be easier with C++11 ...
More fun with less
In spite of all these nice examples,
it is necessary to think about appropriate limits of the application range
of reduce
.
For instance, the example with string
has the drawback,
that for each addition memory has to be allocated for the result,
which grows a bit.
It would be much more efficient to allocate the necessary memory once in advance.
But in order to support this in the generic code,
we would need to invest quite a bit of work testing for the necessity of pre-allocation,
branching to a different version of reduce
, and maybe even change the interface again.
I think the linearly growing memory requirement of the result
is a crucial structural difference
between the addition of numbers (where the result is of the same size as each input element)
and the "addition" of strings (where the size of the result is the sum of the sizes of the input elements).
Therefore, it seems adequate to introduce a separate generic algorithm maybe called concat
,
which explicitly takes the memory growth into account.
Exercise: How would you implement concat
generically?
In the original implementation of sum
, which was limited to addition,
we could use arguments of type string
only because of the accidental
coincidence that the language designers of C++ chose to use operator+
for concatenation
— other languages use different operators.
Some experts like Alexander Stepanov maintain that using operator+
for concatenation
is a design mistake, because concatenation is not commutative,
contrary to centuries-old mathematical conventions for the operation +.
For our implementation of sum
or reduce
,
the operator is not required to be commutative.
But it is easily conceivable to imagine implementations changing the order of operands,
for instance for better parallelization.
Of course, such implementations must specify this requirement
or even better use a compile time branching on this property,
but contra-intuitive non-commutative behavior remains error prone.
Our implementation of reduce
is also not optimal
if the result can be obtained without visiting the entire sequence,
for instance by reducing a sequence of booleans with logical_and
or logical_or
.
Strictly speaking, reduce
does not fully conform to our
strict definition
of a generic procedure.
Exercise: Can one repair this deficiency?
Beyond C++
What we have called reduce
,
is known by a plethora of names.
The C++ standard library calls it accumulate
(and this function has an additional init argument and the associated
problem described before).
In functional programming, general routines like reduce
are
conventially called higher-order functions
(because they take other functions as input or return them as output).
The functional programming equivalent of reduce
is known as
fold
or more precisely,
foldl,
because operations are grouped from the left: a+b+c = (a+b)+c
.
The counterpart foldr would be obtained if the sequence is traversed in reverse order (assuming commutativity),
or when using a recursive implementation (as customary in functional programming):
template<typename Iterator, typename T, typename Op>
T reduce_rec(Iterator begin, Iterator end, value<T> init, Op op)
{ return (begin == end? init : op(*begin, reduce_rec(begin+1, end, init, op))); }
In this case, init
would be used at the very end only.