# Balancing Performance and Flexibility using Modern C++(20) Metaprogramming

A key takeaway from Fernando's performance analysis was that typical loops such as

```c++
for (size_t i = 0; i < n; ++i) {
    y[i] = f(x[i]);
}
```

are orders of magnitude faster if:

* *x* and *y* are *std::array*s rather than *std::vector*s or *std::span*s and
* *n* is a compile time constant

The reason is that the compiler (or, well, at least *clang* and *gcc*) is in many cases able to automatically vectorize loops of the above type (using optimization flags such as /O2 or -O3) and to exploit the predictability of the memory access pattern.

However, from a software engineering point of view we want the code to be flexible with respect to *n*. Modern C++20 features allow us to get both a generic code base and fast machine code.

# The *dimension* object

A key software component to achieve that is the recently introduced *dimension* class. The *dimension* class represents a non-negative integer value either at compile time or at run time.
The code reads as:

```c++
#include <cassert>
#include <cstddef>
#include <span>  // std::dynamic_extent

/** \brief dimension type to represent static and dynamic dimensions in a uniform way */

/// For Dim != std::dynamic_extent this represents a compile-time constant
template <size_t Dim>
struct dimension {
    static constexpr auto is_dynamic = false;
    static constexpr auto value = Dim;  ///<- compile time value

    constexpr dimension() = default;

    // Constructor for syntactic compatibility with the std::dynamic_extent
    // specialization below
    constexpr dimension([[maybe_unused]] size_t d) { assert(d == Dim); }
};

/// explicit specialization to represent run time value
template <>
struct dimension<std::dynamic_extent> {
    static constexpr auto is_dynamic = true;
    size_t value;  ///<- run time value

    constexpr dimension(size_t d) : value(d) { }
};
```

The tricky part is the explicit (full) specialization of the template for the so-called *sentinel value* (= a special value which indicates a special state) *std::dynamic_extent*:

* If the non-type template parameter *Dim != std::dynamic_extent*, the primary template is used and *dimension::value* is defined as the compile time constant *Dim*.
* If *Dim == std::dynamic_extent*, the specialization is used and *dimension::value* is stored at run time.

Furthermore, one can detect whether *dimension::value* is a compile time constant by the (always) compile time constant boolean attribute *dimension::is_dynamic*.

Hence, the function *calc_y* below can be used both for a compile time constant dimension object *n* and for a run time value of *n*, providing the speedup in the first case while still being usable in the latter case.

```c++
void calc_y(auto const n, MyVector const& x, MyVector& y) {
    for (size_t i = 0; i < n.value; ++i) {
        y[i] = f(x[i]);
    }
}
```

This trick was key to enable the 'bakable' sympy models introduced with [PR Generate 'bakable' model source code](https://dev.azure.com/LindeEngineering/ITP_Dev/_git/sympy_modeling/pullrequest/11778).
## Type safety is achieved by type traits

You may ask why the interface of the function above reads as *void calc_y(auto const n, MyVector const& x, MyVector& y)* and not *void calc_y(dimension const n, MyVector const& x, MyVector& y)*.

Well, the reason is that *dimension* is not a usual C++ type but a class template. For instance, *dimension<2>* and *dimension<3>* are actually different types! The template argument 2 or 3 is referred to as a *Non Type Template Parameter* (= *NTTP*).

However, the parameter can be constrained to be a *dimension* by a C++ technique called *type traits*. *Type traits* are metaprogramming tools for compile-time introspection of types. A simple type trait to detect if an object is a *dimension* reads as

```c++
#include <type_traits>

// Default case: type is not a dimension...
template <typename T>
struct is_dimension : std::false_type { };

// ...unless there is a number N such that it equals dimension<N>
template <size_t N>
struct is_dimension<dimension<N>> : std::true_type { };

template <typename T>
inline constexpr bool is_dimension_v = is_dimension<T>::value;

// tests:
static_assert(is_dimension_v<dimension<3>>);
//< partial specialization is_dimension<dimension<N>> is used, which is derived from std::true_type
//  and hence makes the static constexpr bool value = true available
static_assert(is_dimension_v<dimension<std::dynamic_extent>>);
static_assert(not is_dimension_v<size_t>);
//< generic implementation of is_dimension is used, which is derived from std::false_type
//  and hence makes the attribute static constexpr bool value = false available
```

This allows for refining the interface of *calc_y()* to only accept dimension objects:

```c++
// remove_cvref_t strips the const of "auto const n" before the trait is applied
void calc_y(auto const n, MyVector const& x, MyVector& y)
    requires is_dimension_v<std::remove_cvref_t<decltype(n)>>;
```

The requires clause is a C++20 constraint on the function template, ensuring that only dimension types are accepted. Besides being more expressive and providing better error messages, this re-implementation also allows for implementing alternative overloads of the function, e.g.
```c++
void calc_y(size_t const n, MyVector const& x, MyVector& y);
```

## Mathematical operations on *dimension* objects

One would expect *dimension* objects to behave similarly to integer values, and particularly to come with elementary mathematical operations such as addition and multiplication. This can be done by operator overloading. However, we have to be careful to achieve interoperability of compile time dimensions and run time dimensions. This can be achieved as follows:

```c++
template <size_t N, size_t M>
constexpr auto operator+(dimension<N> a, dimension<M> b) {
    if constexpr (not(a.is_dynamic or b.is_dynamic)) {
        // both static: the sum becomes part of the return type
        return dimension<N + M>{};
    } else {
        // at least one dynamic: the sum is computed at run time
        return dimension<std::dynamic_extent>{a.value + b.value};
    }
}
```

* if both *a* and *b* are compile time constants, the addition can be done at compile time,
* otherwise it has to be done at run time.

Declaring the function *constexpr* is essential: it *allows* the function to be evaluated at compile time, but does not *enforce* it. Compile time evaluation would be enforced by declaring the function *consteval*; however, this would make run time evaluation impossible.

## The dimension class enables conditional use of *std::array* for performance

We mentioned in the introduction that using *std::array* enables vectorization and hence should be used whenever possible. The key requirement for that is that its size has to be known at compile time. The *dimension* object now helps to make the code flexible with regard to that. This example illustrates that:

```c++
/** \brief allocates std::array or std::vector depending on dimension being static or dynamic */
template <typename T>
auto make_array(auto const d)
    requires is_dimension_v<std::remove_cvref_t<decltype(d)>>
{
    if constexpr (d.is_dynamic) {
        return std::vector<T>(d.value);
    } else {
        std::array<T, d.value> array{};  // value-initialized, like the vector above
        return array;
    }
}

auto z = make_array<double>(ncomp);
```

Hence, *z* is a *std::array* if *ncomp* is known at compile time, i.e. *not ncomp.is_dynamic*, and a *std::vector* otherwise.
## Application in EPyC core

The methods illustrated above are one key to the performance improvements in the EPyC core achieved over the last months. Please note that these performance improvements are rather significant: in our standard EPyC core test case *"EPYC_VFRAC"* the speedup factor was about 50 when using clang on an x64 machine with AVX2 support.