Fastest C# Random Number Generator: XorShift+

I constantly find myself needing pseudo-random numbers for machine learning projects. In most cases, the built-in .NET Random class is perfect for this type of thing. For high-performance or type-specific application… not so much. It’s bad enough that Random is limited to a handful of types. It’s even worse from a performance standpoint. For those who need to generate pseudo-random data hundreds of millions of times per second, Random simply won’t cut it.

When implemented, XorShift+ is one of the fastest C# random number generator out there. It even passes the BigCrush statistics tests, meaning it generates a superb distribution of pseudo-random numbers.

How is it so fast?

XorShift+ is heavily optimized by using simple bit-wise shifting and arithmetic. This means that the compiled code consists of very few CPU instructions, all of which involve native manipulation of numbers. Modern CPUs are very intelligent when it comes to executing these instructions, so we are able to generate random numbers amazingly fast – even in C#.

 

The Algorithm

The XorShift+ algorithm is encapsulated in a class to maintain state, which is randomly seeded upon instantiation. The state variables are x and y, and they both denote the previous state. This previous state is used to generate the next state. Here’s how it would be implemented in C++:

class XorShiftRandom {
  public:
    explicit XorShiftRandom(uint64_t seed) {
      x = seed << 1; y = seed >> 1;
    };

    uint64_t Next() {
      uint64_t x_, y_, _;
      x_ = y;
      x ^= x << 23; y_ = x ^ y ^ (x >> 17) ^ (y >> 26);

      _ = y_ + y;
      x = x_;
      y = y_;

      return _;
    };

  private:
    uint64_t x;
    uint64_t y;
}

So, what exactly is happening here? Simply put, we store a temporary state x_ and y_, perform bit-wise shifting and exclusive-or operations (hence the name XorShift), store the new state, and return the sum of the old and new y states. It is very simple, and very few instructions are needed.

 

C# Implementation

Now for the real meat and potatoes: Implementing XorShift+ in C# and comparing it’s performance to the Random class.

Let’s start with the class itself, since we need a state. We’re storing our state as a 64-bit integer, since that allows us to generate twice as many bits with roughly the same performance as an int. After all, we’re only performing simple bit-wise operations.

public class XorShiftRandom
{

    #region Data Members

    private ulong x_;
    private ulong y_;

    #endregion

    #region Constructor

    public XorShiftRandom()
    {
      x_ = ( ulong ) Guid.NewGuid().GetHashCode();
      y_ = ( ulong ) Guid.NewGuid().GetHashCode();
    }

    #endregion

}

As you can see, we are randomly generating two Guid values and using their hash codes as seeds. While this is not very good performance wise, I have found it to produce very good seeds. If you are constructing instances of XorShiftRandom so frequently that this causes performance issues, you should rethink your design.

 

Generating random integers and other primitives

Anyways, we now have our state encapsulated in a class. Next, lets generate a random 32-bit integer.


public int NextInt32()
{
  int _;
  ulong temp_x, temp_y;

  temp_x = y_;
  x_ ^= x_ << 23; temp_y = x_ ^ y_ ^ ( x_ >> 17 ) ^ ( y_ >> 26 );

  _ = ( int ) ( temp_y + y_ );

  x_ = temp_x;
  y_ = temp_y;

  return _;
}

We didn’t have to change much from the C++ implementation. In fact, the only thing that has changed here is that we are using different variable names for the temporary state. Better yet, this code will be nearly identical for most of the primitive types we are generating. The only major difference is the return type. Because of this, we aren’t going to go through every primitive, but our complete code is at the end of this post.

 

Generating random floating point numbers

With a majority of the primitive types handled, let’s make a generator for the double type. Most random number generators will produce a value between 0 and 1 for floating point types, so we can’t just cram a bunch of random bytes in and call it a day. We need to be a bit smarter than that.

private const double DOUBLE_UNIT = 1.0 / ( int.MaxValue + 1.0 );
public int NextDouble()
{
  double _;
  ulong temp_x, temp_y, temp_z;

  temp_x = y_;
  x_ ^= x_ << 23; temp_y = x_ ^ y_ ^ ( x_ >> 17 ) ^ ( y_ >> 26 );

  temp_z = temp_y + y_;
  _ = DOUBLE_UNIT * ( 0x7FFFFFFF & temp_z );

  x_ = temp_x;
  y_ = temp_y;

  return _;
}

Since we need to generate a value between 0 and 1, it only makes sense that we generate an arbitrary number and multiply it by the smallest possible unit.. That is, we try to calculate the smallest fraction a double can hold and multiply the number by that. We’re also masking the generated number by 0x7FFFFFFF because we want the result to be positive, not negative. Credit to EpicMorgArchive on Github for this technique. They applied it to an older, slower version of XorShift that does not pass BigCrunch.

 

Generating random Decimal numbers

C# has given rise to a more precise data type that aids in financial computing. The Decimal type stores 128 bits of data and is not prone to rounding errors in the same way as floating point numbers are. Rather, it stores numbers in a way that performs arithmetic like it would a whole number. This is much slower than floating point operations are, as most modern processors accelerate floating point arithmetic with built-in registers, opcodes, and optimizations. Decimal arithmetic is currently performed entirely in-software, and as such is much slower. However, it only makes sense to support this type in case we ever need random values of this particular type.

public decimal NextDecimal()
{
  decimal _;
  int l, m, h;
  ulong temp_x, temp_y, temp_z;

  temp_x = y_;
  x_ ^= x_ << 23; temp_y = x_ ^ y_ ^ ( x_ >> 17 ) ^ ( y_ >> 26 );

  temp_z = temp_y + y_;

  h = ( int ) ( temp_z & 0x1FFFFFFF );
  m = ( int ) ( temp_z >> 16 );
  l = ( int ) ( temp_z >> 32 );

  _ = new decimal( l, m, h, false, 28 );

  x_ = temp_x;
  y_ = temp_y;

  return _;
}

Since we are generating 64 bits of random data, we have two options. Either we can have two rounds of value generation, or we can use one value and shift it to appear random. I chose the latter. This isn’t the most robust solution, and likely will not pass BigCrush tests. However, in terms of performance, it seems to work very well.

Take a look at the three integers we are generating: hi, mid, and lo. These three 32-bit values make up the 96-bit decimal. First, we mask hi by 0x1FFFFFFF to maintain a number between 0 and 1. Next, we take bits 16-48 and use that as the mid, and then take the highest 32 bits and use that as the lo. We feed all of those into the Decimal constructor, signifying the signed-ness as false and the scale as 28 so that all possible numbers are between 0 and 1.

 

Generating individual booleans (bits) and bytes

We could easily use the same code as most primitive types do to generate individual boolean or byte values. However, if we are generating 64 bits at a time, why not just cache the remaining bits instead of throwing them away? This is another optimization technique based on EpicMorgArchive’s implementation, although I have combined the two types to share the same buffer.

// Buffer for optimized bit generation.
private ulong buffer_;
private ulong bufferMask_;

public bool NextBoolean()
{
  bool _;
  if ( bufferMask_ > 0 )
  {
    _ = ( buffer_ & bufferMask_ ) == 0;
    bufferMask_ >>= 1;
    return _;
  }

  ulong temp_x, temp_y;
  temp_x = y_;
  x_ ^= x_ << 23; temp_y = x_ ^ y_ ^ ( x_ >> 17 ) ^ ( y_ >> 26 );

  buffer_ = temp_y + y_;
  x_ = temp_x;
  y_ = temp_y;

  bufferMask_ = 0x8000000000000000;
  return ( buffer_ & 0xF000000000000000 ) == 0;
}

public byte NextByte()
{
  if ( bufferMask_ >= 8 )
  {
    byte _ = ( byte ) buffer_;
    buffer_ >>= 8;
    bufferMask_ >>= 8;
    return _;
  }

  ulong temp_x, temp_y;
  temp_x = y_;
  x_ ^= x_ << 23; temp_y = x_ ^ y_ ^ ( x_ >> 17 ) ^ ( y_ >> 26 );

  buffer_ = temp_y + y_;
  x_ = temp_x;
  y_ = temp_y;

  bufferMask_ = 0x8000000000000;
  return ( byte ) ( buffer_ >>= 8 );
}

These two methods essentially do the same thing; the only difference being the size of bits they consume from the buffer. They both check to see if the buffer has enough bits left to satisfy the request. If so, they consume the needed amount of bits and return the generated value. If the buffer does not contain enough bits, a new buffer is generated. This allows us to use the same buffer for 64 boolean values or 8 byte values.

 

Generating an array of bytes

When generating an array of bytes, we will most likely need to loop our value generator code. Because of this, we can use some additional optimization techniques such as unrolling and variable localization.

public unsafe void NextBytes( byte[] buffer )
{
  // Localize state for stack execution
  ulong x = x_, y = y_, temp_x, temp_y, z;

  fixed ( byte* pBuffer = buffer )
  {
    ulong* pIndex = ( ulong* ) pBuffer;
    ulong* pEnd = ( ulong* ) ( pBuffer + buffer.Length );

    // Fill array in 8-byte chunks
    while ( pIndex <= pEnd - 1 )
    {
      temp_x = y;
      x ^= x << 23; temp_y = x ^ y ^ ( x >> 17 ) ^ ( y >> 26 );

      *( pIndex++ ) = temp_y + y;

      x = temp_x;
      y = temp_y;
    }

    // Fill remaining bytes individually to prevent overflow
    if ( pIndex < pEnd )
    {
      temp_x = y;
      x ^= x << 23; temp_y = x ^ y ^ ( x >> 17 ) ^ ( y >> 26 );
      z = temp_y + y;

      byte* pByte = ( byte* ) pIndex;
      while ( pByte < pEnd ) *( pByte++ ) = ( byte ) ( z >>= 8 );
    }
  }

  // Store modified state in fields.
  x_ = x;
  y_ = y;
}

There are a few things to pay careful attention to here:

First and foremost, we are marking our method as unsafe and using pointers. This is to optimize copying of values into the provided byte array. Visual Studio may require you to open your project properties and check “Allow Unsafe Code”, as this is a type safety violation, albeit one that we are using extreme care with.

Next, you’ll notice that we are attempting to copy 64-bit chunks of data at a time. Since we are using unsafe code, there is potential for a buffer overflow. Hence, we have to check the remaining size of the buffer before adding, and if there isn’t enough space, we need to add the remaining bytes one-at-a-time. This causes a performance hit, but it is still very fast when compared to the Random class.

Lastly, you’ll notice we take the class state variables and make them local variables. This is an optimization technique called “variable localization”. Since the variables are now on the stack, we can access them much faster. It doesn’t make sense to do this in the other methods, as they don’t loop, but in this case, it provides a performance boost. Of course, we need to remember to save the modified state values after we are done filling the buffer.

 

Benchmark Results

I used BenchmarkDotNet to run the benchmark tests, which is a fantastic tool. It performs JIT warm-up and calculates overhead to provide accurate results. All benchmarks were compiled in Release mode with optimizations on.

Here’s how XorShift holds up against Random:

Method .NET Random XorShift+ % Change
Next (32-bit int) 8.040ns 1.470ns -546.94%
NextDouble 9.216ns 2.010ns -458.51%
NextBytes (1) 11.364ns 6.332ns -179.47%
NextBytes (8) 67.263ns 7.610ns -883.88%
NextBytes (16) 128.708ns 7.029ns -1831.10%
NextBytes (32) 247.082ns 11.075ns -2230.99%
NextBytes (64) 486.822ns 17.115ns -2844.42%
NextBytes (128) 982.328ns 28.841ns -3406.01%

As you can see, XorShift+ is faster at everything. One surprising observation is that NextBytes grows linearly for both generators, but XorShift+ grows at an incredibly lower rate, making it more ideal for generating large sequences of bytes.

Of course, we implemented extra methods in our XorShift class that we can’t compare against Random. Let’s see how those perform:

 

Method Time per op
NextBoolean 1.328ns
NextByte 1.361ns
NextBytes (96) 23.343ns
NextInt16 1.692ns
NextUInt16 1.662ns
NextInt32 1.575ns
NextUInt32 1.573ns
NextInt64 1.463ns
NextUInt64 1.632ns
NextDouble 2.152ns
NextDecimal 3.000ns

 

And there you have it. XorShift+ is one of the – if not the – fastst C# random number generator algorithms in existence. Clocking in at 1-3 nanoseconds per operation, we can effectively produce 6-8 gigabytes of pseudo-random data per second if we so choose. With that in mind, the bottleneck shifts from CPU clock speed to memory speed, which is impressive to say the least.

 

The Final Code

As promised, here is the final code. I hope you have both enjoyed and learned something from this article. If you have any questions, concerns, or suggestions for new posts, please do reach out in the comments section.


/*===============================[ XorShiftPlus ]==============================
  ==-------------[ (c) 2018 R. Wildenhaus - Licensed under MIT ]-------------==
  ============================================================================= */

using System;

namespace Haus.Math
{

  /// <summary>
  ///   Generates pseudorandom primitive types with a 64-bit implementation
  ///   of the XorShift algorithm.
  /// </summary>
  public class XorShiftRandom
  {

    #region Data Members

    // Constants
    private const double DOUBLE_UNIT = 1.0 / ( int.MaxValue + 1.0 );

    // State Fields
    private ulong x_;
    private ulong y_;

    // Buffer for optimized bit generation.
    private ulong buffer_;
    private ulong bufferMask_;

    #endregion

    #region Constructor

    /// <summary>
    ///   Constructs a new  generator using two
    ///   random Guid hash codes as a seed.
    /// </summary>
    public XorShiftRandom()
    {
      x_ = ( ulong ) Guid.NewGuid().GetHashCode();
      y_ = ( ulong ) Guid.NewGuid().GetHashCode();
    }

    /// <summary>
    ///   Constructs a new  generator
    ///   with the supplied seed.
    /// </summary>
    /// <param name="seed">
    ///   The seed value.
    /// </param>
    public XorShiftRandom(ulong seed)
    {
      x_ = seed << 3; x_ = seed >> 3;
    }

    #endregion

    #region Public Methods

    /// <summary>
    ///   Generates a pseudorandom boolean.
    /// </summary>
    /// <returns>
    ///   A pseudorandom boolean.
    /// </returns>
    public bool NextBoolean()
    {
      bool _;
      if ( bufferMask_ > 0 )
      {
        _ = ( buffer_ & bufferMask_ ) == 0;
        bufferMask_ >>= 1;
        return _;
      }

      ulong temp_x, temp_y;
      temp_x = y_;
      x_ ^= x_ << 23; temp_y = x_ ^ y_ ^ ( x_ >> 17 ) ^ ( y_ >> 26 );

      buffer_ = temp_y + y_;
      x_ = temp_x;
      y_ = temp_y;

      bufferMask_ = 0x8000000000000000;
      return ( buffer_ & 0xF000000000000000 ) == 0;
    }

    /// <summary>
    ///   Generates a pseudorandom byte.
    /// </summary>
    /// <returns>
    ///   A pseudorandom byte.
    /// </returns>
    public byte NextByte()
    {
      if ( bufferMask_ >= 8 )
      {
        byte _ = ( byte ) buffer_;
        buffer_ >>= 8;
        bufferMask_ >>= 8;
        return _;
      }

      ulong temp_x, temp_y;
      temp_x = y_;
      x_ ^= x_ << 23; temp_y = x_ ^ y_ ^ ( x_ >> 17 ) ^ ( y_ >> 26 );

      buffer_ = temp_y + y_;
      x_ = temp_x;
      y_ = temp_y;

      bufferMask_ = 0x8000000000000;
      return ( byte ) ( buffer_ >>= 8 );
    }

    /// <summary>
    ///   Generates a pseudorandom 16-bit signed integer.
    /// </summary>
    /// <returns>
    ///   A pseudorandom 16-bit signed integer.
    /// </returns>
    public short NextInt16()
    {
      short _;
      ulong temp_x, temp_y;

      temp_x = y_;
      x_ ^= x_ << 23; temp_y = x_ ^ y_ ^ ( x_ >> 17 ) ^ ( y_ >> 26 );

      _ = ( short ) ( temp_y + y_ );

      x_ = temp_x;
      y_ = temp_y;

      return _;
    }

    /// <summary>
    ///   Generates a pseudorandom 16-bit unsigned integer.
    /// </summary>
    /// <returns>
    ///   A pseudorandom 16-bit unsigned integer.
    /// </returns>
    public ushort NextUInt16()
    {
      ushort _;
      ulong temp_x, temp_y;

      temp_x = y_;
      x_ ^= x_ << 23; temp_y = x_ ^ y_ ^ ( x_ >> 17 ) ^ ( y_ >> 26 );

      _ = ( ushort ) ( temp_y + y_ );

      x_ = temp_x;
      y_ = temp_y;

      return _;
    }

    /// <summary>
    ///   Generates a pseudorandom 32-bit signed integer.
    /// </summary>
    /// <returns>
    ///   A pseudorandom 32-bit signed integer.
    /// </returns>
    public int NextInt32()
    {
      int _;
      ulong temp_x, temp_y;

      temp_x = y_;
      x_ ^= x_ << 23; temp_y = x_ ^ y_ ^ ( x_ >> 17 ) ^ ( y_ >> 26 );

      _ = ( int ) ( temp_y + y_ );

      x_ = temp_x;
      y_ = temp_y;

      return _;
    }

    /// <summary>
    ///   Generates a pseudorandom 32-bit unsigned integer.
    /// </summary>
    /// <returns>
    ///   A pseudorandom 32-bit unsigned integer.
    /// </returns>
    public uint NextUInt32()
    {
      uint _;
      ulong temp_x, temp_y;

      temp_x = y_;
      x_ ^= x_ << 23; temp_y = x_ ^ y_ ^ ( x_ >> 17 ) ^ ( y_ >> 26 );

      _ = ( uint ) ( temp_y + y_ );

      x_ = temp_x;
      y_ = temp_y;

      return _;
    }

    /// <summary>
    ///   Generates a pseudorandom 64-bit signed integer.
    /// </summary>
    /// <returns>
    ///   A pseudorandom 64-bit signed integer.
    /// </returns>
    public long NextInt64()
    {
      long _;
      ulong temp_x, temp_y;

      temp_x = y_;
      x_ ^= x_ << 23; temp_y = x_ ^ y_ ^ ( x_ >> 17 ) ^ ( y_ >> 26 );

      _ = ( long ) ( temp_y + y_ );

      x_ = temp_x;
      y_ = temp_y;

      return _;
    }

    /// <summary>
    ///   Generates a pseudorandom 64-bit unsigned integer.
    /// </summary>
    /// <returns>
    ///   A pseudorandom 64-bit unsigned integer.
    /// </returns>
    public ulong NextUInt64()
    {
      ulong _;
      ulong temp_x, temp_y;

      temp_x = y_;
      x_ ^= x_ << 23; temp_y = x_ ^ y_ ^ ( x_ >> 17 ) ^ ( y_ >> 26 );

      _ = ( ulong ) ( temp_y + y_ );

      x_ = temp_x;
      y_ = temp_y;

      return _;
    }

    /// <summary>
    ///   Generates a pseudorandom double between
    ///   0 and 1 non-inclusive.
    /// </summary>
    /// <returns>
    ///   A pseudorandom double.
    /// </returns>
    public double NextDouble()
    {
      double _;
      ulong temp_x, temp_y, temp_z;

      temp_x = y_;
      x_ ^= x_ << 23; temp_y = x_ ^ y_ ^ ( x_ >> 17 ) ^ ( y_ >> 26 );

      temp_z = temp_y + y_;
      _ = DOUBLE_UNIT * ( 0x7FFFFFFF & temp_z );

      x_ = temp_x;
      y_ = temp_y;

      return _;
    }

    /// <summary>
    ///   Generates a pseudorandom decimal between
    ///   0 and 1 non-inclusive.
    /// </summary>
    /// <returns>
    ///   A pseudorandom decimal.
    /// </returns>
    public decimal NextDecimal()
    {
      decimal _;
      int l, m, h;
      ulong temp_x, temp_y, temp_z;

      temp_x = y_;
      x_ ^= x_ << 23; temp_y = x_ ^ y_ ^ ( x_ >> 17 ) ^ ( y_ >> 26 );

      temp_z = temp_y + y_;

      h = ( int ) ( temp_z & 0x1FFFFFFF );
      m = ( int ) ( temp_z >> 16 );
      l = ( int ) ( temp_z >> 32 );

      _ = new decimal( l, m, h, false, 28 );

      x_ = temp_x;
      y_ = temp_y;

      return _;
    }

    /// <summary>
    ///   Fills the supplied buffer with pseudorandom bytes.
    /// </summary>
    /// <param name="buffer">
    ///   The buffer to fill.
    /// </param>
    public unsafe void NextBytes( byte[] buffer )
    {
      // Localize state for stack execution
      ulong x = x_, y = y_, temp_x, temp_y, z;

      fixed ( byte* pBuffer = buffer )
      {
        ulong* pIndex = ( ulong* ) pBuffer;
        ulong* pEnd = ( ulong* ) ( pBuffer + buffer.Length );

        // Fill array in 8-byte chunks
        while ( pIndex <= pEnd - 1 )
        {
          temp_x = y;
          x ^= x << 23; temp_y = x ^ y ^ ( x >> 17 ) ^ ( y >> 26 );

          *( pIndex++ ) = temp_y + y;

          x = temp_x;
          y = temp_y;
        }

        // Fill remaining bytes individually to prevent overflow
        if ( pIndex < pEnd )
        {
          temp_x = y;
          x ^= x << 23; temp_y = x ^ y ^ ( x >> 17 ) ^ ( y >> 26 );
          z = temp_y + y;

          byte* pByte = ( byte* ) pIndex;
          while ( pByte < pEnd ) *( pByte++ ) = ( byte ) ( z >>= 8 );
        }
      }

      // Store modified state in fields.
      x_ = x;
      y_ = y;
    }

    #endregion

  }

}

Leave a Reply