Sequence Containers, Iterators, Ranges and Views in C++


Bjarne Stroustrup: “The C++ standard library containers were designed to meet two criteria:

  • to provide the maximum freedom in the design of an individual container,

  • while at the same time allowing containers to present a common interface to users.

This allows optimal efficiency in the implementation of containers and enables users to write code that is independent of the particular container used.”


Example

Lets assume we are about to write an output routine for a list of objects. C++ allows us to implement this in a quite generic way.

void output(auto container)
{
    for(auto i : container)
    {
        std::cout << i << "\n";
    }
}

This code actually works, the code

output(std::vector<int>{1,2,3});

does exactly what one would expect.

This is so because the function output(), or, to be more precise the for loop in it relies on the parameter container to have SequenceContainer semantics. SequenceContainer semantics is a generic way to access and iterate through the elements of a container.

In particular, the argument container does has to have methods begin() and end() which return so-called iterators. This becomes more obvious when re-writing output() in pre-C++11 code:

template<class ContainerType>
void output(ContainerType container)
{
    for(typename ContainerType::const_iterator it = container.begin(); it != container.end(); ++it)
    {
        typename ContainerType::value_type i = *it;
        std::cout << i << "\n";
    }
}

Ok, this is a bit less intuitive, right? What are iterators?

A good answer for someone familiar with plain C could be: An iterator is generalization for something that behaves like a raw pointer.

That is, an iterator is an object which can, among other things, be

  • compared, e.g. if(it1 == it2) do_something();

  • incremented, e.g. ++it1;

  • and de-referenced, e.g. object = *it;

The plain C (pointer-based) equivalent of the code above (assuming the elements in container are ints) reads as

void output(Container_value_type[] container, int size)
{
    for(Container_value_type const * i = container; i < container + size; ++i)
    {
        printf("%d\n",*i);
    }
}

Overview Standard Sequence Containers

The C++ standard library comes with a number of sequence containers for different purposes:

container

description

std::vector

1d resizeable array

std::array

1d array with compile-time constant size

std::list

doubly-linked list

std::forward_list

singly-linked list

std::set

set of elements, elements have to come with a less<> (= operator < ()) operation which induce a strict-total order

Furthermore, there are sequence containers that are defined as adaptors to other sequence containers:

container

description

std::deque

double-ended (FIFO) queue

std::stack

stack, allows pushing and popping top (=back) element

These different containers have different complexities for different operations - an overview is found here

In particular, inserting an element to a std::vector to any other position than the last has complexity O(n), whereas inserting an element in a std::list has complexity O(1) (independent of list size), whereas accessing the i-th element of a std::vector is O(1), whereas it is O(n) for a std::list.

With C++20 std::span was introduced as a “non-owning” view to a contiguous chunk of memory. This is quite helpful to simplify the code. For instance, the output method may be re-stated as

template<class T>
void output_span(std::span<T> test)
{
    for(auto i : test)
    {
        std::cout << i << "\n";
    }
}

This method may be called with arguments of type

  • std::vector<>,

  • plain old data arrays, e.g. int array[4]

  • and std::arrays

or any contiguous sub-range of it.

The examples presented in this section are found here.

The C++20 std::ranges library

The std::ranges library simplifies the usage of the C++ standard library by providing a new, simplified, safer interface:

	std::vector<int> myVec{ -3, 5, 0, 7, -4 };
#if __cplusplus < 202002L
	std::sort(myVec.begin(), myVec.end());     
#else    
	std::ranges::sort(myVec);                  
#endif    

Furthermore, the std::view library provides methods to conveniently define and manipulate sequences of object:

void output_range(std::ranges::input_range auto&& test, std::ostream& out = std::cout)
{
	for (auto i : test)
	{
		out << i << "\n";
	}
}
...
	auto test_view = std::ranges::iota_view{ 1,4 };  // = {1,2,3}

	output_range(test_view);

	// only even numbers:
	output_range(test_view | std::views::filter([](int i) {return i % 2 == 0; }));

	// squared numbers:
	output_range(test_view | std::views::transform([](int i) { return i * i; }));

A Custom Sequence Container View developed @ ITP: residual_stack_view

A question that one often encounters when implementing numerical algorithms is how to represent the jacobian $$ \mathrm{Jac},(f) =\left[\frac{\partial f_i}{\partial x_j}\right]_{i,j} $$ of a system of (differential) equations efficiently w.r.t. memory and computational time.

This question has recently popped up both in SMILE as well as EPyC.

The obvious - seeming choice of an Eigen::SparseMatrix turns out to be not optimal, as it implies using std::allocator which causes memory fragmentation, which is just slow. To make this efficient, one would have to overload the memory allocator, which make the implementation way less straightforward.

The solution that has been implemented in SMILE many years ago was to use a pair of raw pointers (int * and double *) which had to be traversed in very specific way to retrieve a jacobian matrix:

for (int i = 0; i < n_resid; i++)
{
    auto res_type = *iwstack++;
    auto ndep = *iwstack++;

    out << "Residual [" << i << "]: type " 
                    << res_type << ",  value " << *wstack << " \n";			
    for (int j = 0; j < ndep; ++j)
    {
        out
        << "  d Residual[" << i << "] / d " << model.find_addr(iwstack[j]) << " = " << wstack[1 + j] << "\n";
    }
    iwstack += ndep;  wstack += 1 + ndep;
}

This approach had practical advantages (speed, memory efficiency, FORTRAN interoperability), but is obviously hard to read and error prone (how to consistently check stack sizes?).

In a recent PR, a sequence container view to the very same underlying data model has been proposed.

This container view allows for using modern C++ syntax:

for (auto const& residual : SMILE::numalg::residual_stack_view(model, iwstack, wstack, n_resid))
{
    out << "Residual [" << residual.get_index() << "]: type " 
                    << residual.res_type() << ",  value " << residual.value() << " \n";			
    for (auto partial_derivative : residual.get_gradient())
    {
        out << "  d Residual / d " << model.find_addr(partial_derivative.independent_index()) << " = " << partial_derivative.value() << "\n";
    }
}

This view ensures data integrity, allows for interoperability with standard library algorithms and allows for more readable, abstract code without sacrificing efficiency.

Furthermore, it may be used as a common representation of a dense and a sparse jacobian.

References:

  • https://www.heise.de/blog/Die-Ranges-Bibliothek-in-C-20-Designentscheidungen-9346271.html

  • https://www.srcmake.com/home/cpp_data_structures