A simple indexing class of variadic std::size_t template parameters is often used to provide a structured method to select multiple elements from a C++11 tuple. In this post I present an alternative to this interface, commonly used when building an object of this type, wherein a finite series of indices is now generated according to a numeric range and signed stride; akin to Fortran array section syntax. Let’s look at the common solution first.

A tuple may be created either using the std::tuple constructor, or std::make_tuple function template. Both expect the same variadic series of arguments. In addition, the constructor requires the type of each argument explicitly. Hence, the more concise std::make_tuple, is used throughout the examples. In the example below, the last element of t1 will be an int.

auto t1 = std::make_tuple('z',false,42,"cat",7);
auto t2 = std::tuple<char,bool,int,const char *,long>('z',false,42,"ktn",7);

Selection of a single tuple element is achieved using the standard tuple function template: std::get. The sole template parameter of std::get, a std::size_t, represents an index; and must be a constant expression. Using the t1 tuple from the code above, std::get<4>(t1) evaluates to an int; with a value of 7.

As a variadic function template, std::make_tuple can of course be used to create a new tuple from the elements of another. In the code below, t3, having type tuple<char,int,int>, is formed from copies of the 1st, 3rd, and 5th elements from the earlier tuple, t1.

auto t3 = std::make_tuple(std::get<0>(t1),std::get<2>(t1),std::get<4>(t1));

In the code above, we hard coded the selection using three index values: 0, 2, and 4. How could we instead write a function template, select, that accepts, at least, a tuple argument, and returns another tuple formed from an arbitrary set of its elements? The conventional solution introduces the following simple variadic class template:

template <std::size_t ...Is>
struct indices {};

An object of type indices might then be created with, for example: indices<>; indices<0,2,4>; or indices<1,1,2,3,5,8>. Our select function may then be defined, as shown in the code below.

template <typename ...Ts, std::size_t ...Is>
select(std::tuple<Ts...> t, indices<Is...>) ->
  decltype(std::make_tuple( std::get<Is>(t)... )) {
  return   std::make_tuple( std::get<Is>(t)... );

Calling select with the earlier tuple variable t1 and a indices<0,2,4> variable, again results in a tuple with type tuple<char,int,int>.

There are a few C++11 things to notice in this code. The Is template parameter pack is not expanded “directly” in the function body, due to the placement of the ellipsis. So, while indices<Is...>, say, would expand to indices<0,2,4>, in the select function above, the actual expansion becomes std::get<0>(t), std::get<2>(t), std::get<4>(t); three arguments for the variadic make_tuple function. Such ellipses will expand all parameter packs in the pattern to their left. Expanding multiple parameter packs of differing lengths, with a single ellipsis, will cause a compilation error.

Also worth noting: the use of C++11′s trailing return type, and decltype specifier is, here, optional; the select function can be typed just as effectively by the more ornate code below. Often, however, the former typing technique is preferable as it has a simple, mechanical, syntax-based application; and so is generally applicable. With short functions the concise form can also provide a readable symmetry. With luck, perhaps by C++17, or maybe even C++14, we can do away with it altogether; after all, the auto keyword can bind to untyped lambda expressions.

template <typename ...Ts, std::size_t ...Is>
  typename std::decay<
    typename std::tuple_element<Is,std::tuple<Ts...>>::type
select(std::tuple<Ts...> t, indices<Is...>) {
  return std::make_tuple( std::get<Is>(t)... );

Finally, as only the template arguments of its type are used, the second function parameter is not bound to a name.

All well and good, but how would I modify every element of an arbitrary-length tuple? A common solution, seen here, here, and here, is to use make_indices, a variadic class template with a typedef member, type, which equates to an instantiation of the indices class template. The std::size_t template parameters of the indices type are instantiated as a zero-based finite arithmetic progression, with length equal to the number of template arguments given to make_indices. For example, make_indices<short,int>::type, would evaluate to indices<0,1>. The code below demonstrates a simple application of make_indices within a function template, id, which “does nothing”; well, it returns a tuple comprised of the same elements as the input. With additional parameters, make_indices can easily be used to create tuple versions of map and zipWith.

template <typename ...Ts>
std::tuple<Ts...> id(std::tuple<Ts...> t) {
  return select(t,typename make_indices<Ts...>::type());

Although useful, the make_indices class template has a number of weaknesses:

  • The template parameters of indices are fixed as std::size_t only;
  • The first index created is always 0;
  • The common difference between each index is fixed to 1;
  • The common difference is a positive value only;
  • make_indices can only be applied to type template parameter lists, and not non-type template parameters.

The first point can be addressed by a variadic, generic, index container:

template <typename T, T...>
struct indicesT {};

With tuples, the argument given to the relevant index function, std::get, will commonly have type std::size_t. The following type alias template allows the more straightforward std::size_t specialisation of indicesT to be used; e.g. indices<0,1,2> instead of indicesT<std::size_t,0,1,2>; also, the earlier definition of select can remain unchanged.

template <std::size_t ...Is>
using indices = indicesT<std::size_t, Is...>;

The traditional make_indices class template outlined earlier only allows us to specify the extent of the generated, zero-based, indices. Using the mk_index_range alias template, the same may be achieved using an integral range. For example, like make_indices<int,bool,char>::type, the type expression mk_index_range<0,2> will evaluate to indices<0,1,2>; or [0,2] in interval notation.

The mk_index_range alias template can also use a non-zero based start value; for example mk_index_range<8,10> will evaluate to indices<8,9,10>. A third, optional, template parameter of mk_index_range allows the specification of a stride. So, mk_index_range<1,9,2> will evaluate to indices<1,3,5,7,9>; and mk_index_range<9,1,-2>.

The indices produced by mk_index_range will always have type std::size_t; that is, the same type as the template parameter of the <tuple> function std::get. To produce signed indices, say of type int, a second alias template is provided: mk_index_rangei. Behind the scenes, something like mk_index_rangei<3,-3,-1>, which evaluates to indicesT<3,2,1,0,-1,-2,-3>, is defined as MkIndices<int,int,3,-3,-1>, using a more general alias template, MkIndices.

A fair question is, how many indices are produced by an arbitrary index range? For example, what is produced by mk_index_range<1,9,2> as opposed to mk_index_range<1,10,2>. How about mk_index_range<1,0,1>? The answer comes from the language of the new century: Fortran. With the loop-control of the DO construct defined by the grammar production below,

do-variable = expr1, expr2 [,expr3]

then the number of iterations is defined by the following equation:

niters = max((expr2 − expr1 + expr3)/expr3, 0)

The C++ code for this is a constexpr function which can be used within the template arguments of the mk_index_range implementation.

We’re now in a position to have some fun. The following basic function, tuple_tail is used within the definition of the tuple overload of the insertion operator <<. The tuple_tail function returns a tuple comprised of all elements of the input tuple argument, minus the first element:

template <typename T, typename ...Ts>
tuple_tail(tuple<T,Ts...> t) {
  return select(t, mk_index_range<1,sizeof...(Ts)>());

The following function, tuple_reverse, unsurprisingly returns a tuple constructed from all the elements of the input tuple argument, in reverse order:

template <typename ...Ts>
tuple_reverse(tuple<Ts...> t) {
  return select(t, mk_index_range<sizeof...(Ts)-1,0,-1>());

The mk_index_range function template is now used throughout an updated version of the compile-time FFT code described in the previous post. The map, zipWith and iota functions there now all use mk_index_range; they’re also similar to each other; and are interesting enough. The following function, condenseN, is though more compelling: returning a tuple comprised of every nth element of the input tuple. It is both integral to the FFT implementation; and uses a stride, or common difference, that isn’t 1. Incidentally, the actual instantiation is condenseN<2>, and alas this will crash the current version of Clang; Clang 3.2 snapshot.

template <size_t N, typename ...Ts>
condenseN(tuple<Ts...> t) ->
  decltype(select(t,mk_index_range<0,sizeof...(Ts)-1,N>())) {
  return   select(t,mk_index_range<0,sizeof...(Ts)-1,N>());

My personal favourite is also used by the compile-time FFT. The std::tuple_cat is a variadic function template which should catenate all tuples provided as arguments. The implementation below uses a helper function, tuple_cat_helper, which expands two parameter packs, Is and Js, within a single statement:

template <typename ...Ts>
tuple_cat() { return tuple<>(); }

template <typename ...Ts>
tuple_cat(tuple<Ts...> t) { return t; }

template  <typename Tup1, typename Tup2, std::size_t ...Is, std::size_t ...Js>
auto tuple_cat_helper(Tup1 t1, Tup2 t2, indices<Is...>, indices<Js...>) ->
  decltype(make_tuple(get<Is>(t1)...,get<Js>(t2)...)) {
  return   make_tuple(get<Is>(t1)...,get<Js>(t2)...);

template <typename ...Ts, typename ...Us, typename ...Tups>
tuple_cat(tuple<Ts...> t1, tuple<Us...> t2, Tups ...ts) ->
  return   tuple_cat(

Note that the C++11 standard definition of std::tuple is not defined as constexpr. The non-standard tuple implementation provided with the FFT code is contained within the ctup namespace.

The code for mk_index_range is here and the constexpr-friendly tuple implementation is here.

A compile-time FFT in C++11

September 2, 2012

Generalised constant expressions are a core language feature introduced to C++ by the recent C++11 standard. A call to a function or method qualified with the new constexpr keyword can now be evaluated at compile-time. Of course there are restrictions: arguments to such a call must also be amenable to compile-time evaluation. The primary restriction is that the body of a constexpr function must comprise of a single return statement. While this initially sounds draconian, the use of recursion, and the ternary operator, can permit essentially any calculation. The language of C++11 generalised constant expressions is that of a functional language.

The code below presents a brief example. Note the call to sqr_int in the second template argument to the C++11 std::array object. Also of interest: the call to sqr_int will prosaically perform floating-point calculations at compile-time. Incidentally, among other restrictions, lambda expressions may not, alas, form part of a C++11 generalised constant expression.

#include <array>
constexpr unsigned sqr_int(double d) { return d*d; }
std::array<bool,sqr_int(4.5)> garr;

Having been impressed by the ctrace and metatrace compile-time ray-tracers, I thought I’d give the fast fourier transform (FFT) a whirl. Seeking an FFT implemented in a functional language, I came across the elegant Single Assignment C (SAC) version shown here. After failing to compile the assembled pieces with the SAC compiler, I converted it to Fortran 9x and verified the results against another Fortran implementation from here.

The next thing I needed was a container data type to stand in for the rank-one, first-class arrays employed by the Fortran. Thoughts of using std::array soon evaporated as I realised that no constexpr element access methods exist; neither operator[] nor std::array::front() comply. Similar issues exist with std::tuple. Furthermore, the FFT only requires a homogeneous container. An implementation using the heterogeneous std::tuple would be inclined to add additional type-checking code. In response, I created the following recursive array container type:

template <typename T, size_t N>
struct recarr {
  constexpr recarr(T x, recarr<T,N-1> a) : m_x(x), m_a(a) {}

  constexpr T             head()   { return m_x; }
  constexpr recarr<T,N-1> tail()   { return m_a; }

  T m_x;
  recarr<T,N-1> m_a;

template <typename T>
struct recarr<T,0> {};

The common restriction on recursive data types: members having the same type as the enclosing class must be pointers, is not applicable here; the m_a member is always a different type; i.e. recarr is not the same type as recarr. The zero-length specialisation of recarr has no members.

The list type used routinely in functional languages such as Haskell is a clear inspiration in the design of std::recarr. A cons function for std::recarr, necessary for the construction of higher order functions such as map and zipWith, may be defined as shown below. Rather than the raw recarr constructor, the stand-alone recarr_cons function can infer its template arguments from the runtime arguments.

template <typename T, size_t N>
recarr<T,N+1> recarr_cons(T x, recarr<T,N> xs) {
  return recarr<T,N+1>(x,xs);

An FFT is an operation on complex numbers. The standard std::complex type, however, delivers another spanner to the works; it too lacks the constexpr methods needed for a compile-time FFT; including the constructors, and arithmetic operators. So, a new complex number class was developed, along with the necessary C++11 generalised constant expression support. Complex number formulae and definitions were drawn from here and here.

GCC and Clang both have excellent support for C++11. Wouldn’t it be nice to find out which is the faster compiler for the evaluation of a constexpr FFT? Some more challenges materialise here: Clang rejects all of the cmath library functions, as being non-constexpr. Functions required by the FFT include sin, sinh, cos, cosh, sqrt, log, exp, atan, atan2 and pow. Initially I though that Clang was being a little cranky here, but this is in fact the correct behaviour; n.b. a constexpr function may not be overloaded with a non-constexpr function; and vice-versa. Standard library functions which are to be constexpr must consequently be specified as such by the relevant C++ standard; the functions in cmath are not. After all, functions comprised of a single return statement may not be the most performant! The N3228 and N3231 documents from the C++ Standards Committee's November 2010 mailings elaborate on this theme.

So, constexpr versions of the mathematical functions from cmath necessary for the FFT were developed; with reference to the formulae and guidelines of the Algebra Help e-Book.

I am in fact a big fan of the std::tuple class. In the end I couldn't resist a second, comparative implementation of a constexpr FFT using C++11 tuples; and so the magic of variadic templates. Need I mention that std::tuple is also missing many of the constexpr constructors and support functions necessary for the job? The github code linked below includes a simple recursive version of the std::tuple type, with constexpr constructors; a version of get; a version of make_tuple; and, as the FFT uses concat, a version of tuple_cat. For comparison with recarr_cons above, here is the simple, concise tuple_cons; in constrast to a more verbose definition for tuple_cat provided in the full project.

template <typename T, typename ...Ts>
tuple<T,Ts...> tuple_cons(T x, tuple<Ts...> xs) {
  return tuple_cat(make_tuple(x),xs);

The code is stored in a github repository here.

Diagonal Selection

February 6, 2011

A good example of the expressivity of the forall statement from Fortran 95 and HPF, is the selection of the elements on the diagonal of an array. We begin such a thing with the declarations of a rank 2 array a, and a vector da to hold the diagonal:

integer :: i, a(4,2) = reshape([(i,i=1,size(a))],shape(a))
integer :: da(minval(shape(a)))

Note that regardless of the rank of the original array, the length of the sought diagonal will always be equal to the mininum element of the array’s shape vector. In our case the shape vector of a is [ 4, 2 ], and so the diagonal has a length of 2. Of course this is due to our definition of the diagonal of an array. Informally, our notion of diagonal corresponds to the set of array elements, indexed by a set of equal values; and ordered by those values. So, for a, this will be the elements indexed by a(1,1) and a(2,2): [ 1, 6 ]. So then, let’s look at the forall statement to achieve this:

forall (i=1:size(da)) da(i) = a(i,i)

How would this look as an array expression? As discussed in my first blog post, such array expressions hold the potential to handle input of arbitrary dimensionality; unlike the forall statement; though certainly more verbose.

Diagonal Selection

If an array of rank n is flattened, say by the Fortran pack intrinsic, it becomes clear that the number of elements separating each adjacent pair of diagonal members, is constant. Conveniently, the set of recursive equations which can express this constant may also be encoded using the scan_product function from the previous blog post.

The array a has shape [ 4, 2 ]. The product scan of [ 4, 2 ] is [ 4, 8 ]. We’re looking for a logical mask vector to select the diagonal elements. ( True and False will also be denoted as 1 or  0 as required. ) Some examples: the earlier [ 4, 2 ] shape array will require a mask of [ 1, 0, 0, 0, 0, 1, 0, 0 ]; while an array of shape [ 3, 4 ] requires a mask of [ 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0 ]. The pattern to notice is a repeating string of False values, delimited by True values, then succeeded by False values. The repeating string of False values has a length of four in the first example, and a length of three in the second. How many False values in general? With an input array a, the following code will assign the answer to the zps (zeros_per_step) integer variable:

zps = sum(eoshift(scan_product(shape(a)),-1))

This value be used to size a logical vector (zs) of False values as shown. A True value should be cons’d to the start of this vector, and the result repeated mi times, where mi is one less than the length of the final diagonal. The implicit do loop can handle such concatenative actions concisely, as shown in the code below:

zs  = spread(.false.,1,zps)
mi  = minval(shape(a))-1
tzs = [(.true.,zs,j=1,mi)]

The final, padding, string of False values, the vector zpd below, is longest in arrays having a relatively large final dimension; for example, shapes such as [ 2, 10 ], or [ 4, 5, 100 ].

nz  = product(shape(a)) - (mi * (zps+1)) - 1
zpd = spread(.false.,1,nz)

To construct the final 1D boolean selection mask, msk, the tzs vector has both a single True value appended; and also the zpd vector:

msk  = [(tzs,.true.,zpd,i=1,1)]

To apply this mask vector, we need to flatten the input array a using the pack intrinsic. The same pack intrinsic can then be used to apply the mask we have constructed above:


If the operation is to be encapsulated in a function, we are once again forced to choose a single, static rank for the result. The code below is then arbitrarily for an array of rank 3; again, an interface statement, could help construct a verbose generic representation. I’ve also used the associate construct from Fortran 2003, which operates like a non-recursive let expression. Hopefully it can aid readability; at the least one can easily print out the contents of each binding.

function diag3(a) result(r)
  integer, intent(in) :: a(:,:,:)
  integer :: i,j,r(minval(shape(a)))

  associate (sa  => shape(a)         )
  associate (sc  => scanl_product(sa))
  associate (eo  => eoshift(sc,-1)   )
  associate (zps => sum(eo)          )
  associate (ps  => product(sa)      )
  associate (mi  => minval(sa)-1     )
  associate (nz  => ps - (mi * (zps+1))-1)
  associate (zpd => spread(.false.,1,nz))
  associate (zs  => spread(.false.,1,zps))
  associate (tzs =>    [(.true.,zs,j=1,mi)])
  associate (msk => [(tzs,.true.,zpd,i=1,1)])
    r = pack(pack(a,.true.),msk)
  end associate
  end associate
  end associate
  end associate
  end associate
  end associate
  end associate
  end associate
  end associate
  end associate
  end associate
end function


September 6, 2010

Much can be achieved using the array functions of Fortran alone. Imperative (do/for) loops, for example, can often be avoided. The following code performs a one-dimensional convolution using array sections:

a(2:n-1) = a(1:n-2) + a(3:n)

There are also a few array “reduction” functions; such as all, any, count, maxval, minval, product, and sum. A higher-order reduction function could implement all of these functions; alas (in Fortran) it doesn’t a priori exist. For example, the functionality of product could be achieved using syntax such as:

result = reduce( (*), 1, [1,2,3] )

where result would be assigned a value of 6.

A common and useful partner to the reduce function is the scan function. The scan function returns not one final value, but all intermediate values too. If it existed in Fortran, a use case might look like this:

result = scan( (*), 1, [1,2,3] )

Assuming the multiplicative unit (1) is not included in the output, this would produce a result of [ 1, 2, 6 ]. I’d now like to implement this particular (multiplicative) scan using only existing Fortran array intrinsic functions; i.e. no loops. The earlier input of [ 1, 2, 3 ] will be used as a working example. (We will return to this scan function in the following post on diagonal selection.)

Creating the Scan Expression

The intrinsic function spread will, given an array of rank n, produce a new array with rank n+1. The new array is constructed by replicating the input a number of times specified by spread‘s third (and final) argument. The second argument to spread specifies which dimension (>=1) to replicate over.  We can visualise this choice with a pencil representing our 1D input. If we wish to replicate the pencil to make a square, we can begin either with the pencil positioned horizontally; or, vertically. Similarly, if we instead had a book and wished to create a cuboid, we begin with three choices.

A simple way to visualise the output of spread, is simply to consider the resulting arrangement of array values in memory. Often, Fortran array data is arranged in a contiguous linear arrangement, and so the output of:

spread( [1,2,3], 1, 3 )

is stored in memory as [ 1, 1, 1, 2, 2, 2, 3, 3, 3 ]. The output of:

spread( [1,2,3], 2, 3 )

however, is stored as [ 1, 2, 3, 1, 2, 3, 1, 2, 3 ]. Both are 2D arrays with a shape of [ 3, 3]; and the difference is clear. Here we take the second choice, and replicate the input vector (that is, a 1D array) across the second dimension.

We now use the eoshift intrinsic function to shift the individual rows of this 3×3 array to the right, filling the holes left behind with the value provided by eoshift‘s third (and final) argument, 1; aka. the multiplicative unit. The second argument is a vector indicating the degree of shifting required for each row. As the default shift direction is to the left, we are looking for a vector, [ -2, -1, 0 ]. It may be constructed using the following implied-do expression:


So, obtaining the length of the input using the size intrinsic, we have built a new 1D array containing the arithmetic progression from minus one less than the length, up to zero.

Returning to eoshift, the following expression:

eoshift( spread( [1,2,3] ,2, size([1,2,3]) ), &
         [(i,i=-(size([1,2,3])-1),0)],        &

evaluates to a 2D array with shape [ 3, 3 ]; its contents in memory are: [ 1, 1, 1, 1, 1, 2, 1, 2, 3 ].

Fittingly, the product reduction intrinsic can at last be employed. As well as operating on a vector, product can be applied to all vectors aligned with the axis specified by its second  (and optional) argument.  In this case that is axis #1, and so final result can be obtained by applying the product intrinsic to the result of the eoshift expression from above:

product( eoshift( spread( [1,2,3] ,2, size([1,2,3]) ), &
                  [(i,i=-(size([1,2,3])-1),0)],        &
                  1),                                  &

Wrapping up as a Function

Rather than crafting such an expression inline around each 1D array expression we wish the scanned product of, we can at least create a function; scanl_product. The execution-part of this function is basically the product expression above, with a 1D array variable abstraction v replacing the literal [ 1, 2, 3 ]:

function scanl_product(v) result(vout)
  integer, intent(in) :: v(:)
  integer :: i, vout(size(v))

  vout = product( eoshift( spread( v, 2, size(v) ), &
                           [(i,i=-(size(v)-1),0)],  &
                           1),                      &
end function
coz = seventeen1(z)

Fragrant Fortran

February 23, 2010

Fortran is a programming language like C, with more capital letters, and less pointers. Since around 1990, the Fortran standard has also supported arrays as first class language elements. Such array features, provided by Fortran 90, Fortran 95, and Fortran 2003, are reminiscent of Kenneth Iverson’s seminal APL (A Programming Language), and J; though less developed.

Fortran 9x includes many useful and essential procedures for manipulating arrays. For example, the function shape takes one array argument (of any dimensionality) and return a one-dimensional array (often referred to as a vector) which lists the extent/size of each dimension. The length of this vector is therefore equal to the dimensionality of the argument to the shape function.

Tens of such array functions are specified by the Fortran standard, and included in essentially every Fortran compiler. Along with the more numerous scalar functions, these are collectively referred to as the Fortran intrinsics.

Fortran array intrinsics are often defined for arrays of arbitrary dimensionality (rank). However, the end user is made to work hard to replicate such genericity. A user-defined function accepting or returning arrays must specify a fixed dimensionality. Consequently, Fortran’s expression language is massively more concise than its procedure language. Hence,

m = sum(a) / size(a)

will assign the mean (excepting zero-sized arrays, and other desiderata) of input array a, to m, regardless of the rank of the a. A function to perform a similar action on a one-dimensional array could look like this:

function mean1(a) result(m)
  real, intent(in) :: a(:)
  real :: m

  m = sum(a) / size(a)
end function

which could be called with:

m = mean1(a)

We would need another for an array of rank 2:

function mean2(a) result(m)
  real, intent(in) :: a(:,:)     ! Extra colon !
  real :: m

  m = sum(a) / size(a)
end function

The two versions of mean differ only by the rank in array a‘s declaration. Also, apart from such “boilerplate” declarations, the core of our function is lexically the same as the earlier expression; “m = sum(a) / size(a)”. At least Fortran arrays have a maximum rank of seven, so we would “only” need seven versions of this mean function. Tragically however, if a function takes n array arguments, we will need to write 7^n functions.

One ray of light comes in the form of Fortran’s interface statement. The interface statement allows a user to access multiple procedures via a single name; similar to C++ overloading. For example, the following interface statement allows a call to mean to be instantiated appropriately, determined by the types of its arguments.

interface mean
  module procedure mean1, mean2
end interface mean

It’s worth mentioning that Professor Martin Erwig’s Parametric Fortran project provides one solution to this convoluted aspect of Fortran.

Finally, note also that alongside array expressions, 1990 also eschewed ye olde brutal and capitalist “FORTRAN” in favour of the effervescent stateswoman we know and love as “Fortran” today.


Get every new post delivered to your Inbox.