I am writing an algorithm to speed up some specific bitmap comparisons.
The signature of this function is:
    template <size_t vec_size, typename uint_t>
    auto foo(uint_t a, std::array<uint_t, vec_size> &&bitmap)
    {
        static_assert(std::is_integral<uint_t>(), "Must pass a numerical bitmap");
        static_assert(std::is_unsigned<uint_t>(), "The bitmap must be unsigned, can't deal with 1's or 2's complement.");

        // emplace bitmap here.
        // I have to move, copy or elide a copy at once in the function body.
        // bitmap is a plain rvalue reference here (not a forwarding reference),
        // and std::forward needs an explicit template argument, so use std::move.
        auto localbitmap = std::move(bitmap);
        ...
        // compute answer with bitmap.
        ...
        return value;
    }
If I call foo(6, std::array<uint8_t, 4>{2, 7, 8, 12}) like this, I would hope the compiler (gcc-10 or clang-10) knows how to elide the copy and does so with some form of reliability.
However, when passing in an lvalue, it'll pass in a reference, which is totally fine. When doing something like foo(6, bar()), the result of bar() might be moved or might undergo return-value optimization.
That still leaves one move when bitmap is an rvalue or rvalue reference. Can I somehow get rid of that move when bitmap is a prvalue? Is this just a matter of forcing inlining if I really want RVO to happen?
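To make the "one move" concrete, here is a minimal sketch that counts copy and move constructions instead of guessing. A std::array of integers has no observable move, so the sketch uses a made-up wrapper. Every name in it (Counts, counts, TrackedBitmap, sum_via_rvalue_ref, sum_via_value, make_bitmap) is hypothetical and only for illustration: make_bitmap stands in for bar(), and the sum stands in for the real computation. It compares the rvalue-reference parameter from foo with a by-value parameter, which is one possible alternative and not necessarily the right design for the real function.

```cpp
#include <array>
#include <cstdint>
#include <utility>

// Hypothetical counters, for illustration only.
struct Counts {
    int copies = 0;
    int moves = 0;
};

inline Counts &counts() {
    static Counts c;
    return c;
}

// Hypothetical stand-in for the bitmap: wraps a std::array and records
// every copy or move construction of the wrapper.
struct TrackedBitmap {
    std::array<std::uint8_t, 4> bits;

    explicit TrackedBitmap(std::array<std::uint8_t, 4> b) : bits(b) {}
    TrackedBitmap(const TrackedBitmap &other) : bits(other.bits) { ++counts().copies; }
    TrackedBitmap(TrackedBitmap &&other) noexcept : bits(other.bits) { ++counts().moves; }
};

// Same shape as foo in the question: rvalue-reference parameter, then a
// local object move-constructed from it.
inline int sum_via_rvalue_ref(TrackedBitmap &&bitmap) {
    TrackedBitmap local = std::move(bitmap);  // one move construction
    int sum = 0;
    for (std::uint8_t b : local.bits) sum += b;
    return sum;
}

// Alternative shape: by-value parameter. A prvalue argument initializes
// the parameter directly, so the parameter itself serves as the local object.
inline int sum_via_value(TrackedBitmap bitmap) {
    int sum = 0;
    for (std::uint8_t b : bitmap.bits) sum += b;
    return sum;
}

// Hypothetical factory standing in for bar() from the question.
inline TrackedBitmap make_bitmap() {
    return TrackedBitmap(std::array<std::uint8_t, 4>{2, 7, 8, 12});
}
```

Calling sum_via_rvalue_ref(make_bitmap()) records one move and no copies, because the temporary is bound to the reference and then moved into the local. Under C++17's guaranteed copy elision, sum_via_value(make_bitmap()) records neither, since the prvalue initializes the parameter in place. Whether that difference is measurable for a small array of integers is something only the generated assembly and benchmarks can tell.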
I am ready to stare at the generated assembly and do the benchmarks for quite some time, since this is gonna be a performance-critical piece of code.