# Sequence Containers, Iterators, Ranges and Views in C++ --- _Bjarne Stroustrup: “The C++ standard library containers were designed to meet two criteria:_ * _to provide the maximum freedom in the design of an individual container,_ * _while at the same time allowing containers to present a common interface to users._ _This allows optimal efficiency in the implementation of containers and enables users to write code that is independent of the particular container used.”_ --- ### Example Lets assume we are about to write an output routine for a list of objects. C++ allows us to implement this in a quite generic way. ```c++ void output(auto container) { for(auto i : container) { std::cout << i << "\n"; } } ``` This code actually works, the code ```c++ output(std::vector{1,2,3}); ``` does exactly what one would expect. This is so because the function *output()*, or, to be more precise the for loop in it relies on the parameter *container* to have *SequenceContainer semantics*. *SequenceContainer semantics* is a generic way to access and iterate through the elements of a container. In particular, the argument *container* does has to have methods *begin()* and *end()* which return so-called *iterators*. This becomes more obvious when re-writing *output()* in pre-C++11 code: ```c++ template void output(ContainerType container) { for(typename ContainerType::const_iterator it = container.begin(); it != container.end(); ++it) { typename ContainerType::value_type i = *it; std::cout << i << "\n"; } } ``` Ok, this is a bit less intuitive, right? What are iterators? A good answer for someone familiar with plain C could be: An iterator is generalization for something that behaves like a raw pointer. That is, an *iterator* is an object which can, among other things, be * _compared_, e.g. if(it1 == it2) do_something(); * _incremented_, e.g. ++it1; * and _de-referenced_, e.g. object = *it; The plain C (pointer-based) equivalent of the code above (assuming the elements in container are *int*s) reads as ```c++ void output(Container_value_type[] container, int size) { for(Container_value_type const * i = container; i < container + size; ++i) { printf("%d\n",*i); } } ``` ## Overview Standard Sequence Containers The C++ standard library comes with a number of sequence containers for different purposes: | container | description | |-----------|-------------| | std::vector | 1d resizeable array | | std::array | 1d array with compile-time constant size | | std::list | doubly-linked list | | std::forward_list | singly-linked list | | std::set | set of elements, elements have to come with a *less<>* (= *operator < ()*) operation which induce a strict-total order Furthermore, there are sequence containers that are defined as adaptors to other sequence containers: | container | description | |-----------|-------------| | std::deque | double-ended (FIFO) queue | | std::stack | stack, allows pushing and popping top (=back) element | These different containers have different complexities for different operations - an overview is found [here](https://alyssaq.github.io/stl-complexities) In particular, inserting an element to a *std::vector* to any other position than the last has complexity *O(n)*, whereas inserting an element in a *std::list* has complexity *O(1)* (independent of list size), whereas accessing the *i*-th element of a *std::vector* is *O(1)*, whereas it is *O(n)* for a *std::list*. With C++20 *std::span* was introduced as a "non-owning" view to a contiguous chunk of memory. This is quite helpful to simplify the code. For instance, the output method may be re-stated as ```c++ template void output_span(std::span test) { for(auto i : test) { std::cout << i << "\n"; } } ``` This method may be called with arguments of type * *std::vector<>*, * plain old data arrays, e.g. *int array[4]* * and *std::arrays* or any contiguous sub-range of it. The examples presented in this section are found [here](https://dev.azure.com/LindeEngineering/ITP_Dev/_git/ITP_Software_Quality?version=GBfeature/sequence_containers&%3Bpath=/Examples/SequenceContainers/demo_sequence_containers.cpp&path=/Examples/SequenceContainers/demo_sequence_containers.cpp). ## The C++20 *std::ranges* library The *std::ranges* library simplifies the usage of the C++ standard library by providing a new, simplified, safer interface: ```c++ std::vector myVec{ -3, 5, 0, 7, -4 }; #if __cplusplus < 202002L std::sort(myVec.begin(), myVec.end()); #else std::ranges::sort(myVec); #endif ``` Furthermore, the *std::view* library provides methods to conveniently define and manipulate sequences of object: ```c++ void output_range(std::ranges::input_range auto&& test, std::ostream& out = std::cout) { for (auto i : test) { out << i << "\n"; } } ... auto test_view = std::ranges::iota_view{ 1,4 }; // = {1,2,3} output_range(test_view); // only even numbers: output_range(test_view | std::views::filter([](int i) {return i % 2 == 0; })); // squared numbers: output_range(test_view | std::views::transform([](int i) { return i * i; })); ``` ## A Custom Sequence Container View developed @ ITP: *residual_stack_view* A question that one often encounters when implementing numerical algorithms is how to represent the jacobian $$ \mathrm{Jac}\,(f) =\left[\frac{\partial f_i}{\partial x_j}\right]_{i,j} $$ of a system of (differential) equations efficiently w.r.t. memory and computational time. This question has recently popped up both in SMILE as well as EPyC. The obvious - seeming choice of an *Eigen::SparseMatrix* turns out to be not optimal, as it implies using *std::allocator* which causes memory fragmentation, which is just slow. To make this efficient, one would have to overload the memory allocator, which make the implementation way less straightforward. The solution that has been implemented in SMILE many years ago was to use a pair of raw pointers (_int *_ and _double *_) which had to be traversed in very specific way to retrieve a jacobian matrix: ```c++ for (int i = 0; i < n_resid; i++) { auto res_type = *iwstack++; auto ndep = *iwstack++; out << "Residual [" << i << "]: type " << res_type << ", value " << *wstack << " \n"; for (int j = 0; j < ndep; ++j) { out << " d Residual[" << i << "] / d " << model.find_addr(iwstack[j]) << " = " << wstack[1 + j] << "\n"; } iwstack += ndep; wstack += 1 + ndep; } ``` This approach had practical advantages (speed, memory efficiency, FORTRAN interoperability), but is obviously hard to read and error prone (how to consistently check stack sizes?). In a [recent PR](https://dev.azure.com/LindeEngineering/ITP-Smile/_git/M-SMILE/pullrequest/5268), a sequence container view to the very same underlying data model has been proposed. This container view allows for using modern C++ syntax: ```c++ for (auto const& residual : SMILE::numalg::residual_stack_view(model, iwstack, wstack, n_resid)) { out << "Residual [" << residual.get_index() << "]: type " << residual.res_type() << ", value " << residual.value() << " \n"; for (auto partial_derivative : residual.get_gradient()) { out << " d Residual / d " << model.find_addr(partial_derivative.independent_index()) << " = " << partial_derivative.value() << "\n"; } } ``` This view ensures data integrity, allows for interoperability with standard library algorithms and allows for more readable, abstract code without sacrificing efficiency. Furthermore, it may be used as a common representation of a dense and a sparse jacobian. ## References: * https://www.heise.de/blog/Die-Ranges-Bibliothek-in-C-20-Designentscheidungen-9346271.html * https://www.srcmake.com/home/cpp_data_structures