Sequence Containers, Iterators, Ranges and Views in C++
Bjarne Stroustrup: “The C++ standard library containers were designed to meet two criteria:
to provide the maximum freedom in the design of an individual container,
while at the same time allowing containers to present a common interface to users.
This allows optimal efficiency in the implementation of containers and enables users to write code that is independent of the particular container used.”
Example
Lets assume we are about to write an output routine for a list of objects. C++ allows us to implement this in a quite generic way.
void output(auto container)
{
for(auto i : container)
{
std::cout << i << "\n";
}
}
This code actually works, the code
output(std::vector<int>{1,2,3});
does exactly what one would expect.
This is so because the function output(), or, to be more precise the for loop in it relies on the parameter container to have SequenceContainer semantics. SequenceContainer semantics is a generic way to access and iterate through the elements of a container.
In particular, the argument container does has to have methods begin() and end() which return so-called iterators. This becomes more obvious when re-writing output() in pre-C++11 code:
template<class ContainerType>
void output(ContainerType container)
{
for(typename ContainerType::const_iterator it = container.begin(); it != container.end(); ++it)
{
typename ContainerType::value_type i = *it;
std::cout << i << "\n";
}
}
Ok, this is a bit less intuitive, right? What are iterators?
A good answer for someone familiar with plain C could be: An iterator is generalization for something that behaves like a raw pointer.
That is, an iterator is an object which can, among other things, be
compared, e.g. if(it1 == it2) do_something();
incremented, e.g. ++it1;
and de-referenced, e.g. object = *it;
The plain C (pointer-based) equivalent of the code above (assuming the elements in container are ints) reads as
void output(Container_value_type[] container, int size)
{
for(Container_value_type const * i = container; i < container + size; ++i)
{
printf("%d\n",*i);
}
}
Overview Standard Sequence Containers
The C++ standard library comes with a number of sequence containers for different purposes:
container |
description |
---|---|
std::vector |
1d resizeable array |
std::array |
1d array with compile-time constant size |
std::list |
doubly-linked list |
std::forward_list |
singly-linked list |
std::set |
set of elements, elements have to come with a less<> (= operator < ()) operation which induce a strict-total order |
Furthermore, there are sequence containers that are defined as adaptors to other sequence containers:
container |
description |
---|---|
std::deque |
double-ended (FIFO) queue |
std::stack |
stack, allows pushing and popping top (=back) element |
These different containers have different complexities for different operations - an overview is found here
In particular, inserting an element to a std::vector to any other position than the last has complexity O(n), whereas inserting an element in a std::list has complexity O(1) (independent of list size), whereas accessing the i-th element of a std::vector is O(1), whereas it is O(n) for a std::list.
With C++20 std::span was introduced as a “non-owning” view to a contiguous chunk of memory. This is quite helpful to simplify the code. For instance, the output method may be re-stated as
template<class T>
void output_span(std::span<T> test)
{
for(auto i : test)
{
std::cout << i << "\n";
}
}
This method may be called with arguments of type
std::vector<>,
plain old data arrays, e.g. int array[4]
and std::arrays
or any contiguous sub-range of it.
The examples presented in this section are found here.
The C++20 std::ranges library
The std::ranges library simplifies the usage of the C++ standard library by providing a new, simplified, safer interface:
std::vector<int> myVec{ -3, 5, 0, 7, -4 };
#if __cplusplus < 202002L
std::sort(myVec.begin(), myVec.end());
#else
std::ranges::sort(myVec);
#endif
Furthermore, the std::view library provides methods to conveniently define and manipulate sequences of object:
void output_range(std::ranges::input_range auto&& test, std::ostream& out = std::cout)
{
for (auto i : test)
{
out << i << "\n";
}
}
...
auto test_view = std::ranges::iota_view{ 1,4 }; // = {1,2,3}
output_range(test_view);
// only even numbers:
output_range(test_view | std::views::filter([](int i) {return i % 2 == 0; }));
// squared numbers:
output_range(test_view | std::views::transform([](int i) { return i * i; }));
A Custom Sequence Container View developed @ ITP: residual_stack_view
A question that one often encounters when implementing numerical algorithms is how to represent the jacobian $$ \mathrm{Jac},(f) =\left[\frac{\partial f_i}{\partial x_j}\right]_{i,j} $$ of a system of (differential) equations efficiently w.r.t. memory and computational time.
This question has recently popped up both in SMILE as well as EPyC.
The obvious - seeming choice of an Eigen::SparseMatrix
The solution that has been implemented in SMILE many years ago was to use a pair of raw pointers (int * and double *) which had to be traversed in very specific way to retrieve a jacobian matrix:
for (int i = 0; i < n_resid; i++)
{
auto res_type = *iwstack++;
auto ndep = *iwstack++;
out << "Residual [" << i << "]: type "
<< res_type << ", value " << *wstack << " \n";
for (int j = 0; j < ndep; ++j)
{
out
<< " d Residual[" << i << "] / d " << model.find_addr(iwstack[j]) << " = " << wstack[1 + j] << "\n";
}
iwstack += ndep; wstack += 1 + ndep;
}
This approach had practical advantages (speed, memory efficiency, FORTRAN interoperability), but is obviously hard to read and error prone (how to consistently check stack sizes?).
In a recent PR, a sequence container view to the very same underlying data model has been proposed.
This container view allows for using modern C++ syntax:
for (auto const& residual : SMILE::numalg::residual_stack_view(model, iwstack, wstack, n_resid))
{
out << "Residual [" << residual.get_index() << "]: type "
<< residual.res_type() << ", value " << residual.value() << " \n";
for (auto partial_derivative : residual.get_gradient())
{
out << " d Residual / d " << model.find_addr(partial_derivative.independent_index()) << " = " << partial_derivative.value() << "\n";
}
}
This view ensures data integrity, allows for interoperability with standard library algorithms and allows for more readable, abstract code without sacrificing efficiency.
Furthermore, it may be used as a common representation of a dense and a sparse jacobian.
References:
https://www.heise.de/blog/Die-Ranges-Bibliothek-in-C-20-Designentscheidungen-9346271.html
https://www.srcmake.com/home/cpp_data_structures