This draft is outdated. You can find the most recent version of the proposal at http://klammler.eu/data/computer-science/iso-c++/p0205/.

Allow Seeding Random Number Engines With std::random_device

Document number: D0205R0
Date: 2016-01-31
Project: Programming Language C++
Audience: Library Working Group, SG6 (Numerics)
Reply-to: Moritz Klammler <moritz\x2Eklammler\x40gmail\x2Ecom> (OpenPGP: 2732 DA32 C8D0 EEEC A081 BE9D CF6C 5166 F393 A9C0)

Table of Contents

  1. 1. Introduction
  2. 2. Motivation and Scope
    1. 2.1 Background
    2. 2.2 The Problem With the Current Standard Library
    3. 2.3 The Proposed Solution
  3. 3. Impact On the Standard
  4. 4. Design Decisions
    1. 4.1 Alternative Solutions
      1. 4.1.1 Provide a Generic Adapter Type Instead of Modifying std::random_device
      2. 4.1.2 Integer Types
    2. 4.2 Shortcomings
    3. 4.3 Impact on Standard Library Implementations
  5. 5. Technical Specifications
  6. Acknowledgments
  7. References

1. Introduction

The purpose of this proposal is to make properly seeding a random number engine in a non-deterministic way easy and fast.

In order to achieve this, the following changes are proposed:

The changes are non-breaking and can be implemented as a pure library solution with current C++14 features.

2. Motivation and Scope

2.1 Background

C++11 introduced a powerful random number generation library (§ 26.5 [rand]) under the <random> header. It provides random number engines (§ 26.5.1.4 [rand.req.eng]) that can be used either directly as a source of uniformly distributed pseudo random integers of unsigned type or together with random number distributions (§ 26.5.1.6 [rand.req.dist]) to produce pseudo random numbers according to a variety of distributions.

If an engine has an internal state of N bits, it can produce at most 2N different sequences and the longest sequence it can produce will have a period of at most 2N values until it necessarily starts repeating itself. Applications that wish to produce high-quality pseudo random numbers will therefore choose an engine such as the Mersenne twister with a large internal state. Choosing an engine with a large internal state helps ensuring that the period of the generated sequence will be large. However, in order to ensure that the number of different sequences the engine can produce is also exponential in the size of its state, it has to be made sure that the initial state is evenly distributed across all possible states. This is done by seeding the random number engine. If each of the 2N states should be chosen with equal probability as the initial state, seeding requires 2N bits of entropy.

The standard library provides the type std::random_device (§ 26.5.6 [rand.device]), a uniform random number generator (§ 26.5.1.3 [rand.req.urng]) that is supposed (but, unfortunately, not required) to produce a non-deterministic sequence of uniformly distributed integers of type unsigned int. The natural choice for an application that wishes to generate pseudo random numbers and does not need (and often doesn't want) reproducibility is to use it for seeding a random engine that is then used to produce pseudo random numbers.

2.2 The Problem With the Current Standard Library

Unfortunately, the current standard library does not provide any convenient way to use a std::random_device to properly (in the sense that each initial state is equally likely) seed a random engine.

The naïve approach that most people seem to use is the following.

template <typename EngineT>
  // requires(RandomNumberEngine(EngineT))
void
seed_non_deterministically_1st(EngineT& engine)
{
  std::random_device rnddev {};
  engine.seed(rnddev());
}

This code is severely flawed. If EngineT is std::mt19937, it has a state size of 19 968 bits. However, if an unsigned int is 32 bits (as is the common case on many platforms today), then of the up to 219 968 states, at most 232 (that is one 2−19 936-th) can possibly be chosen!

To illustrate that this is in fact a real problem, O'Neill [1] has found out that a std::mt19937 engine seeded like above can never produce the numbers 7 or 13 as its first value. Quite a surprise that can have bad consequences on real-world programs. A program that incorrectly assumes that an impossible number will eventually be produced as the engine's first (or n-th) output is broken in a very subtle way. After all, it would take extensive and practically unrealistic unit testing to do an exhaustive search over the possible seeds and even detect the bug.

In addition to seeding an engine with an integer, the standard library also provides a way to seed it with a so-called seed sequence (§ 26.5.1.2 [rand.req.seedseq]). A seed sequence may be constructed in a deterministic way from zero or more integers that provide some initial entropy. The numbers are then possibly scrambled and stored in some internal state of the seed sequence which can be externalized using the param member function. Such a memento may then be used to re-create a seed sequence of the same type in the same state. A seed sequence provides a member function generate that takes a pair of random access iterators and assigns a uniformly distributed unsigned 32 bit integer to each element in the range denoted by the iterator pair. The standard library provides a single implementation of a seed sequence in std::seed_seq (§ 26.5.7.1 [rand.util.seedseq]). This type can be used to seed a random engine more thoroughly.

template <typename EngineT, std::size_t StateSize = EngineT::state_size>
  // requires(RandomNumberEngine(EngineT))
void
seed_non_deterministically_2nd(EngineT& engine)
{
  using engine_type = typename EngineT::result_type;
  using device_type = std::random_device::result_type;
  using seedseq_type = std::seed_seq::result_type;
  constexpr auto bytes_needed = StateSize * sizeof(engine_type);
  constexpr auto numbers_needed = (sizeof(device_type) < sizeof(seedseq_type))
    ? (bytes_needed / sizeof(device_type))
    : (bytes_needed / sizeof(seedseq_type));
  std::array<device_type, numbers_needed> numbers {};
  std::random_device rnddev {};
  std::generate(std::begin(numbers), std::end(numbers), std::ref(rnddev));
  std::seed_seq seedseq(std::cbegin(numbers), std::cend(numbers));
  engine.seed(seedseq);
}

This code has a number of problems.

2.3 The Proposed Solution

With this proposal adopted, properly seeding a random engine could be as simple as this.

template <typename EngineT>
  // requires(RandomNumberEngine(EngineT))
void
seed_non_deterministically_3rd(EngineT& engine)
{
  std::random_device rnddev {};
  engine.seed(rnddev);  // note: passing rnddev itself as opposed to the
                        // result of invoking its call operator
}

This would be made possible by adding a generate member function to std::random_device directly and relaxing the type requirements on the argument to the seed member function to only require that generate member function.

There would be no unnecessary copies, no unneeded tempering, no dynamic memory allocation, no introduced bias and little chance to get anything wrong on the user's side.

3. Impact On the Standard

This proposal could be implemented with very little changes, none of which would be breaking anything. No core language changes are needed.

The requirements on the type parameter for a random engine's seed member function (and the corresponding constructor) should be relaxed such that it only requires the generate member function from seed sequence. Bolas [2] has argued that even today, a conforming implementation isn't allowed to call anything but generate and perhaps size due to existing complexity requirements. Unfortunately, it may still enforce their presence via type checks.

To allow also other types than std::random_device to be used, a new concept seed generator that only specifies the generate member function should be introduced and the seed sequence concept be defined as a refinement of it.

The specification of std::random_device should be changed such that it meets the requirements of seed generator. The only thing that is needed here is the addition of the generate member function.

This proposal has no dependencies on any other proposal. The author is not aware of any other proposal that depends on or conflicts with this proposal. In particular, it is orthogonal to N3547 [5]. (However, the sample implementation of std::randomize in that paper could benefit from the feature suggested in this proposal.)

4. Design Decisions

4.1 Alternative Solutions

The proposed solution falls into two parts:

The author is not aware of any substantially different alternative solution regarding the relaxation of the type requirements. This is confirmed by the observation that O'Neill [1] has independently suggested the same change back in April 2015.

4.1.1 Provide a Generic Adapter Type Instead of Modifying std::random_device

Instead of adding a member function to std::random_device, a new adapter type could be provided that would turn any uniform random number generator into a seed generator. This would have the advantage that one could use, say, a std::mt19937 engine to seed another std::ranlux24 engine. (However useful that may be.) On the other hand, it would require more typing for the common case of seeding any random engine with a std::random_device as the temporary adapter object would have to be created. Such an adapter could also not take advantage of the size of the range being known ahead of time.

Anyway, here is a suggestion for this alternative.

template <typename UrngT>
  // requires(UniformRandomNumberGenerator(UrngT))
class seed_gen
{

private:

  UrngT urng_;

public:

  template <typename... ArgTs>
  seed_gen(ArgTs&&... args) : urng_ {std::forward<ArgTs>(args)...}
  {
  }

  template <typename FwdIterT>
    // requires(ForwardIterator(FwdIterT)
    //          && Assignable(FwdIterT::value_type, std::uint32_t))
  void
  generate(const FwdIterT first, const FwdIterT last)
  {
    std::uniform_int_distribution<std::uint32_t> dist {};
    std::generate(first, last, [this, &dist](){ return dist(urng_); });
  }

};

It would then be used like this.

std::seed_gen<std::random_device> generator {};
std::mt19937 engine {generator};
std::cout << engine() << '\n';

4.1.2 Integer Types

Another minor design variation would be to require std::random_device::generate to not assign 32 bit values to each element in the range but rather set all bits of the dereferenced iterator to independently chosen random values. This would be a more natural (and, in many contexts, more useful) behavior but it wouldn't be compatible with the existing specification of seed sequence. Therefore, it seems better to stay with the 32 bit requirement.

Conversely, std::random_device's result_type could be changed to std::uint32_t and its operator() be required to produce values in the range [0, 232). This would integrate better with the other facilities but could potentially break existing (brittle) code (on exotic platforms) in subtle ways. It could also affect performance adversely on platforms where std::random_device::operator() is implemented via a special hardware instruction and the result of that instruction is not a 32 bit value.

4.2 Shortcomings

A known minor shortcoming of the proposed solution is that the seed function (and corresponding constructor) would have to take their argument by non-const reference, so a temporary cannot bind to it which means that it is not possible to write the following code.

auto engine = std::mt19937 {std::random_device {}};  // won't compile

Instead, one has to write the slightly more verbose

std::random_device device {};
auto engine = std::mt19937 {device};

which leaves the std::random_device in scope longer than necessary. Since it can be quite large and might hold an open file handle, this is potentially undesirable. To avoid this, a lambda could be used.

auto engine = [](){
  std::random_device device {};  // only lives as long as the lambda ...
  return std::mt19937 {device};  // ... and the copy is hopefully elided
}();

4.3 Impact on Standard Library Implementations

Little to no code should be required to change in the implementations of the random engines. A quick glance over the code of libstdc++ [3] (GCC) and libc++ [4] (Clang) has shown that there are no changes required for libstdc++ and all that is needed for libc++ is removing or weakening the concept check.

The to-be-added generate member function of std::random_device could be implemented in a compliant but naïve way like this.

template <typename RandIterT>
  // requires(RandomAccessIterator(RandIterT)
  //          && Unsigned(RandIterT::value_type)
  //          && (std::numeric_limits<RandIterT::value_type>::digits >= 32))
void
random_device::generate(const RandIterT first, const RandIterT last)
{
  std::uniform_int_distribution<std::uint32_t> dist {};
  std::generate(first, last, [this, &dist](){ return dist(*this); });
}

Implementers will probably be interested to provide optimizations for the case that RandIterT is a pointer or can be converted to a pointer (such as std::vector<unsigned>::iterator_type) so that instead of calling (*this)() for each element, they can fill in all bytes at once if reading from a file like /dev/urandom. Care has to be taken to handle the case where sizeof(RandIterT::value_type) > sizeof(std::uint32_t) correctly. This enhanced std::random_device could be useful in other contexts, too.

It is worth mentioning that merely adding a member function does not break binary compatibility of existing code that uses std::random_device.

5. Technical Specifications

All proposed changes to the standard mentioned in this section are relative to N4140.

Add a new section before the existing § 26.5.1.2 [rand.req.seeseq] to define seed generator.

Seed generator requirements [rand.req.seedgen]

A seed generator is an object that produces a requested number of unsigned integer values i with 0 ≤ i < 232. The generated numbers may be obtained from a non-deterministic source of entropy.

A class S satisfies the requirements of a seed generator if the expressions shown in the below table are valid and have the indicated semantics and if S also satisfies all other requirements of this section. In that table and throughout this section:

  1. T is the type named by S's associated result_type;
  2. q is a value of S;
  3. rb and re are mutable random access iterators with an unsigned integer value_type of at least 32 bits.

Expression

Return type

Pre/post-condition

Complexity

S::result_type

T

T is an unsigned integer type of at least 32 bits.

compile-time

q.generate(rb, re)

void

Does nothing if rb == re. Otherwise, fills the supplied sequence [rbre) with 32-bit quantities in a possibly non-deterministic way.

Ο(re - rb)

In the existing section § 26.5.1.2 [rand.req.seedseq], change the beginning of the second paragraph as follows.

A class S satisfies the requirements of a seed sequence if it satisfies the requirements of a seed generator and in addition, the expressions […]

In the current table 115, remove the rows that are already covered by the seed generator requirements.

In section 26.5.6 [rand.device], add the following sentence at the end of the first paragraph.

It also meets the requirements of a seed generator.

Add the following declaration to the class overview of std::random_device right after the operator() under the generating functions section.

template <typename RandIterT> void generate(RandIterT first, RandIterT last);

After paragraph 7 of the same section, add the following specification.

template <typename RandIterT> void generate(RandIterT first, RandIterT last);

Requires: RandIterT must meet the requirements of random access iterator and its value_type must be an unsigned integer of at least 32 bits.

Effects: Assigns non-deterministic 32 bit random values uniformly distributed over the interval [0, 232) to the elements in the sequence [firstlast). This function behaves as if it were implemented by the following code.

uniform_int_distribution<uint32_t> dist {};
generate(first, last, [this, &dist](){ return dist(*this); });

Throws: A value of an implementation-defined type derived from exception if a random number could not be obtained.

Acknowledgments

Melissa O'Neill has written a remarkable blog post [1] that discusses the problems with seeding random engines in-depth. Although not the initial motivation for the author of this proposal, that blog post provided valuable theoretical and experimental support and the fact that its author had independently suggested the same addition to the standard library was very affirming.

Baum mit Augen's shared experience of struggling to properly seed a std::mt19937 from a std::random_device and the following discussion with Deduplicator (out of which came an earlier version of the code for seed_non_deterministically_2nd) were part of the motivation for writing this proposal [6].

Nicol Bolas and Zhihao Yuan have provided valuable feedback on the std-proposals@isocpp.org mailing list.

References

  1. Melissa O'Neill, C++ Seeding Surprises. 2015-04-16, http://www.pcg-random.org/posts/cpp-seeding-surprises.html
  2. Nicol Bolas, via std-proposals@isocpp.org, 2016-01-03, https://groups.google.com/a/isocpp.org/d/msg/std-proposals/sF-P4VE2Z3Q/u24T-g-hEgAJ
  3. The libc++ C++ Standard Library, http://libcxx.llvm.org/
  4. The GNU Standard C++ Library Version 3, https://gcc.gnu.org/libstdc++/
  5. Walter Brown, Three <random> related Proposals. 2013-03-12, http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3547.pdf
  6. Seed std::mt19937 from std::random_device in: Code Review, 2015-10-30, http://codereview.stackexchange.com/q/109260