Relearning C++ after C++11

C++ is an old but evolving programming language that can be used for almost anything and can be found in many places. In fact, C++'s inventor Bjarne Stroustrup described it as the invisible foundation of everything. Sometimes, it might be deep inside a library of another language because C++ can be used for performance-critical paths. It can run in small embedded systems or it can power video games. Your browser might be using it. C++ is almost everywhere!

Why C++ matters

C++ has been around for a long time but has changed significantly, particularly since 2011. A new standard referred to as C++11 was introduced then, marking the beginning of a new era of frequent updates. If you haven't used C++ since before C++11, you have a lot to catch up on. So where do you start?

The language is compiled and targeted at a specific architecture such as a PC, a mainframe, an embedded device, bespoke hardware, or anything else you can think of. If you need your code to run on different types of machines, you need to recompile it for each one. This has pros and cons. Different configurations give you more maintenance work, but compiling to a specific architecture gets you "down to the metal", which is where much of the speed advantage comes from.

Whatever platform you target, you will need a compiler, plus an editor or integrated development environment (IDE) to write the code. ISOCpp gives a list of resources including C++ compilers. The GNU Compiler Collection (GCC), Clang and Visual Studio all have free versions. You can even use Matt Godbolt's Compiler Explorer to try code on various compilers in your browser. A compiler may support several versions of C++, so you have to state the version you need in the compiler flags, for example -std=c++23 for g++ or /std:c++latest for Visual Studio. The ISOCpp website has an FAQ section which gives an overview of some recent changes, including C++11 and C++14, as well as big picture questions. There are also several books focused on the newer versions of C++.

Quick look at C++11 using Vector

If you’ve been left behind, the plethora of resources might be overwhelming. However, we can focus on a small example to understand some basics. Stopping to try things out is often the best way to learn. So let’s start with something approachable!

A useful (and easy) starting point is the humble vector, which lives in the vector header in the namespace std, short for standard. CppReference provides an overview telling us the vector is a sequence container that encapsulates dynamic size arrays. A vector therefore contains a sequence of contiguous elements, and we can resize a vector as needed. The vector itself is a class template, so it needs a type, for example std::vector<int>. We can add an item to the end of a vector using push_back. C++11 introduced a new method called emplace_back, which takes the arguments needed to construct a new item. For an int, the code looks identical:

    std::vector<int> numbers;
    numbers.push_back(1);
    numbers.emplace_back(1);

If we had something more complicated than an int, we might get performance benefits from the emplace version, because the emplace version can avoid copying the item by constructing it in place.
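To make the difference visible, here is a minimal sketch. The Tracked type and the two helper functions are hypothetical names invented for illustration; the type simply counts how often it is copied or moved. The sketch assumes C++17 or later for the inline static members.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical type, not from the article: it counts its own copies and moves.
struct Tracked
{
    static inline int copies = 0;
    static inline int moves = 0;

    std::string name;
    int value;

    Tracked(std::string n, int v) : name(std::move(n)), value(v) {}
    Tracked(const Tracked& other) : name(other.name), value(other.value) { ++copies; }
    Tracked(Tracked&& other) noexcept
        : name(std::move(other.name)), value(other.value) { ++moves; }
};

// Returns the number of copies plus moves needed to add one element with push_back.
int extra_constructions_with_push_back()
{
    Tracked::copies = 0;
    Tracked::moves = 0;
    std::vector<Tracked> items;
    items.reserve(1);                   // rule out reallocation as a source of moves
    items.push_back(Tracked("one", 1)); // builds a temporary, then moves it into the vector
    return Tracked::copies + Tracked::moves;
}

// Returns the number of copies plus moves needed to add one element with emplace_back.
int extra_constructions_with_emplace_back()
{
    Tracked::copies = 0;
    Tracked::moves = 0;
    std::vector<Tracked> items;
    items.reserve(1);
    items.emplace_back("one", 1);       // forwards the arguments and constructs in place
    return Tracked::copies + Tracked::moves;
}
```

With this type, the push_back version needs one extra move, while the emplace_back version needs no copies or moves at all. For a cheap type such as int the difference disappears.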

C++11 introduced r-value references and move semantics to avoid unnecessary copying. Potential performance improvements were one of the motivations for C++11, which subsequent versions have built on. To explain what r-value references are, let’s consider the push_back method from the previous example. It has two overloads, one taking a const reference, const T& value, and one taking an r-value reference, T&& value. The second version can move the element into the vector, which can avoid copying temporary objects. Likewise, emplace_back takes its arguments as Args&&..., which are forwarding references, so each argument can be passed on to the new element's constructor and moved rather than copied where possible. Move semantics is a big topic, and we have only scratched the surface. Thomas Becker wrote an excellent article back in 2013 that walks through the details if you want to learn more.
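As a rough sketch of how the compiler picks between the two push_back overloads, the hypothetical function which_overload below has the same pair of parameter shapes. It is an illustration only, not part of the standard library.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical overload pair mirroring push_back(const T&) and push_back(T&&).
std::string which_overload(const std::string&) { return "const T&"; }
std::string which_overload(std::string&&) { return "T&&"; }

// The same choice applies to a real vector.
std::vector<std::string> collect_words()
{
    std::vector<std::string> words;
    std::string first = "copied";
    std::string second = "moved";

    words.push_back(first);                    // named object: const T& overload, copies
    words.push_back(std::move(second));        // std::move casts to an r-value: T&& overload
    words.push_back(std::string("temporary")); // temporary: T&& overload

    // first is unchanged here; second is left in a valid but unspecified state.
    return words;
}
```

A named variable selects the const reference overload, while a temporary or the result of std::move selects the r-value reference overload. Note that std::move does not move anything by itself; it only makes the r-value overload eligible.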

Let's now make a vector and put a couple of items in it, then display the contents using std::cout, from the iostream header. We use the stream insertion operator<< to display the elements. We can write a for loop over the size of the vector, and use operator [] to access each element:

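#include <cstddef>
#include <iostream>
#include <vector>

void warm_up()
{
    std::vector<int> numbers;
    numbers.push_back(1);
    numbers.emplace_back(1);
    // size() returns an unsigned type, so use std::size_t for the index
    for (std::size_t i = 0; i < numbers.size(); ++i)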
    {
        std::cout << numbers[i] << ' ';
    }
    std::cout << '\n';
}

int main()
{
    warm_up();
}

The code displays two 1s. This code is available on the compiler explorer.

Class template argument deduction

Let's do something slightly more interesting, and learn a bit more modern C++. We can build up the first few triangle numbers and we might spot a pattern. The triangle numbers are 1, 3, 6, 10, ... formed by summing 1, 1+2, 1+2+3, 1+2+3+4, ... . If we racked up that many snooker balls, we could make a triangle, hence the name. For instance, the 15 reds in a snooker rack form a triangle with five rows.

To add another row, we would add six more snooker balls. A further row would add seven, and so on.

In order to get the numbers 1, 2, 3, etc. we could form a vector filled with 1s, then sum these. Rather than using another loop, we can directly create a vector with, say, 18 ones. We state how many we want followed by the value:

    std::vector numbers(18, 1);

Notice we don't need to say <int> any more. Since C++17, class template argument deduction (CTAD) has been possible. The compiler can deduce that we mean int, since we asked for the value 1, which is an int. If we need to display the vector, we can use a range-based for loop, introduced in C++11. Instead of using a traditional for loop over the vector's indices, we state a type, or use the keyword auto, which C++11 repurposed to tell the compiler to deduce the type, followed by a colon and then the container:

    for (auto i : numbers)
    {
        std::cout << i << ' ';
    }
    std::cout << '\n';   

CTAD and the range-based for loop are two of the convenience features added in C++11 and later.
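To check what the compiler deduces, here is a small sketch assuming a C++17 compiler. The function name sum_of_ones and the second vector, reals, are made up for illustration.

```cpp
#include <type_traits>
#include <vector>

// Hypothetical helper: builds the vector of ones and adds them up.
int sum_of_ones()
{
    std::vector numbers(18, 1); // CTAD deduces std::vector<int>
    static_assert(std::is_same_v<decltype(numbers), std::vector<int>>);

    std::vector reals(3, 0.5);  // a double value gives std::vector<double>
    static_assert(std::is_same_v<decltype(reals), std::vector<double>>);

    int total = 0;
    for (auto n : numbers)      // auto is deduced as int here
    {
        total += n;
    }
    return total;
}
```

The static_assert lines fail to compile if the deduced type is not the one we expect, so they act as documentation that the compiler checks. Summing the 18 ones gives 18.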

Ranges

Armed with a vector of ones, we can include the numeric header, and fill a new vector with the partial sums, 1, 1+1, 1+1+1, etc. giving us 1, 2, 3, etc. We need to state the type of the new vector because we will start with an empty vector and the compiler can't deduce its type without any values to use. The partial_sum function needs the beginning and end of the numbers, and we use a back_inserter, from the iterator header, so the destination vector grows as needed:

    #include <iterator>
    #include <numeric>
…
    std::vector numbers(18, 1);
    std::vector<int> sums;
    std::partial_sum(numbers.begin(), numbers.end(),
        std::back_inserter(sums));

This gives us the numbers 1 to 18 inclusive. We are part way to our triangle numbers, but C++ now lets us be more succinct. C++11 introduced the iota function, also in the numeric header, which fills a container with increasing values for us:

    std::vector<int> sums(18);
    std::iota(sums.begin(), sums.end(), 1);

In fact, C++23 introduced a ranges version, which finds begin and end for us:

    std::ranges::iota(sums, 1);

C++23 support varies between compilers, so you might have to wait until yours offers the ranges version. Many of the algorithms in the numeric and algorithm headers have two versions: one taking a pair of iterators, first and last, and a ranges version taking the container itself. Ranges overloads are gradually being added to standard C++. Ranges offer far more than avoiding the need to specify two iterators. We can filter and transform inputs, chain these operations together, and use views to avoid copying data. Views are evaluated lazily, so the contents of a view are only computed when needed. Ivan Čukić's Functional Programming in C++ gives further details on this (and much more).

We need to do one last thing to form the triangle numbers. If we find the partial sums of the vector

    std::partial_sum(sums.begin(), sums.end(), sums.begin());

we have the triangle numbers we wanted, 1, 3, 6, 10, 15, ... 171.
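One way to sanity check the result is to compare it with the well-known closed form for the nth triangle number, n(n+1)/2. The sketch below wraps the steps above in a hypothetical function, triangle_numbers, and adds a second hypothetical function, matches_closed_form, for the comparison.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical wrapper around the iota and partial_sum steps.
std::vector<int> triangle_numbers(int count)
{
    assert(count >= 0);
    std::vector<int> sums(count);
    std::iota(sums.begin(), sums.end(), 1);                   // 1, 2, 3, ...
    std::partial_sum(sums.begin(), sums.end(), sums.begin()); // 1, 3, 6, ...
    return sums;
}

// Checks every element against n * (n + 1) / 2, counting n from 1.
bool matches_closed_form(const std::vector<int>& triangles)
{
    for (std::size_t k = 0; k < triangles.size(); ++k)
    {
        const int n = static_cast<int>(k) + 1;
        if (triangles[k] != n * (n + 1) / 2)
        {
            return false;
        }
    }
    return true;
}
```

For 18 numbers the last value is 18 * 19 / 2 = 171, which agrees with the partial sums.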

We noted there are ranges versions of some algorithms. Let's use one. The first two triangle numbers 1 then 3 are odd, then we get two even numbers, 6 then 10. Does this pattern continue? If we transform our vector, flagging odd numbers with a dot, '.', and even numbers with a star '*', we will find out. We can declare a new vector to hold the transformation. We will use a single character for each number, so we need a vector of char:

    std::vector<char> odd_or_even;

We can write a short function taking an int and returning the appropriate character:

char flag_odd_or_even(int i)
{
    return i % 2 ? '.' : '*';
}

If i % 2 is non-zero, we have an odd number, so we return '.'; otherwise, we return '*'. We can use our function in the transform function from the algorithm header. The original version took a pair of input iterators, first and last, an output iterator and a unary function: a function taking one input, like our flag_odd_or_even function. C++20 introduced a ranges version, which takes an input source, rather than a pair of iterators, along with the output iterator and unary function. This means we can write

    std::vector<char> odd_or_even;
    std::ranges::transform(sums,
        std::back_inserter(odd_or_even),
        flag_odd_or_even);

to transform the sums we generated earlier. If we look at the output we see

. . * * . . * * . . * * . . * * . .

It appears that we do get two odd numbers then two even numbers over and over. Stack Exchange's math site explains why this happens.

Lambdas

Let's make one final improvement to our code using another new C++ feature. If we look at the transformation code, we have to look elsewhere to see what the unary function does.

C++11 introduced anonymous functions or lambda expressions. They look like named functions, with parameters in parentheses and the body in curly braces. However, they do not have a name, do not need an explicit return type, and start with a capture list denoted by []:

[](int i) { return i % 2 ? '.' : '*'; }

If we compare this with the named function

char flag_odd_or_even(int i){ return i % 2 ? '.' : '*'; }

we can see the similarity. We can specify variables in the capture list, giving us a closure. Closures are beyond the scope of this article, but are very powerful and common in functional programming.
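Without going into detail, a minimal sketch of a capture looks like this. The function count_above and its threshold parameter are hypothetical, chosen only to show a local variable being used inside a lambda.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: counts the values greater than threshold.
int count_above(const std::vector<int>& values, int threshold)
{
    // [threshold] copies the local variable into the lambda, making it a closure.
    auto above = [threshold](int v) { return v > threshold; };
    return static_cast<int>(std::count_if(values.begin(), values.end(), above));
}
```

With an empty capture list, [], the lambda body could not refer to threshold at all. For the first five triangle numbers and a threshold of 5, the count is 3.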

If we assign the lambda to a variable

auto lambda = [](int i) { return i % 2 ? '.' : '*'; };

we can call it as we would a named function:

lambda(7);

This feature allows us to rewrite the transform call with a lambda:

    std::ranges::transform(sums,
        std::back_inserter(odd_or_even),
        [](int i) { return i % 2 ? '.' : '*'; });

We can then see what the transforming function does without having to look elsewhere.

Summary

Pulling everything together, we have the following code:

#include <algorithm>
#include <iostream>
#include <iterator>
#include <numeric>
#include <vector>

int main()
{
    std::vector<int> sums(18);
    std::iota(sums.begin(), sums.end(), 1);
    std::partial_sum(sums.begin(), sums.end(), sums.begin());

    std::vector<char> odd_or_even;
    std::ranges::transform(sums,
        std::back_inserter(odd_or_even),
        [](int i) { return i % 2 ? '.' : '*'; });

    for (auto c : odd_or_even)
    {
        std::cout << c << ' ';
    }
    std::cout << '\n';
}

We've used lambdas and range-based for loops, both introduced in C++11, along with CTAD from C++17 and ranges from C++20. We also touched on move semantics, a powerful feature that can help improve performance in C++. Vectors are a popular container that can store elements of almost any type, which makes them a good place to start experimenting. Keep up the good work!
