
Why not just read 64 bits off /dev/urandom and be done with it? All this additional complexity doesn't actually buy any "extra" randomness over this approach, and I'm skeptical that it improves speed either.


The problem is, there are around 2^62 double-precision numbers between 0 and 1, but they're not uniformly spaced: there are many, many more between 0 and 0.5 (nearly 2^62) than between 0.5 and 1 (around 2^52), for instance.

So, if you want a uniform variate, but you want every number in the range to be possible to generate, it's tricky. Each individual small number needs to be much less likely than each individual large number.

If you just draw uniformly from that set of 2^62 values, you'll almost certainly get a very small number.
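
You can see the uneven spacing directly by printing the gap to the next representable double at a few points. A minimal sketch (the sample points are arbitrary):

  #include <stdio.h>
  #include <math.h>   /* nextafter */

  int main(void) {
    /* Gap between x and the next representable double above it.
       For normal doubles the gap roughly doubles each time x doubles,
       so small numbers are packed far more densely than large ones. */
    double points[] = { 0.001, 0.25, 0.5, 0.9999 };
    for (int i = 0; i < 4; i++) {
      double x = points[i];
      printf("gap at %g = %g\n", x, nextafter(x, 2.0) - x);
    }
    return 0;
  }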


Seems to me that the simplest solution would be to repeatedly divide the range into two halves, then randomly select either one until it converges on a single value. In C this might look something like this:

  #include <stdbool.h>   /* for 'true' */

  /* Bisect [low, high]: replace one endpoint with the midpoint, choosing
     which one by a random bit, until the midpoint stops moving. */
  double random_real(double low, double high, int (*random_bit)(void)) {
    if (high < low)      /* normalize argument order */
      return random_real(high, low, random_bit);
    double halfway, previous = low;
    while (true) {
      halfway = low + (high - low) / 2;
      if (halfway == previous)   /* interval has collapsed to a point */
        break;
      if (random_bit() & 1)
        low = halfway;
      else
        high = halfway;
      previous = halfway;
    }
    return halfway;
  }
That should theoretically produce a uniformly distributed value. (Although perhaps I've missed some finer point?)


So you have two doubles, halfway and previous, and a loop that depends on if (halfway == previous) to terminate, where halfway is the result of a floating-point calculation. Are you sure that's going to work? I suspect it will frequently fail to terminate when you expect it to.

Secondly, why does this generate a uniform random number? It’s not clear to me at all. It seems it would suffer the exact problem GP’s talking about here, that certain ranges of numbers would have a much higher probability than others on a weighted basis.


> Secondly, why does this generate a uniform random number?

Each interval of equal size occurs with equal likelihood at each step.

Consider that you want to generate a random number between 0 and 1024 (excl.). The midpoint would be 512, so you choose randomly whether the lower interval [0, 512) or the upper interval [512, 1024) is selected. In each step, the size of the remaining range is independent of the concrete choices made so far, i.e. after k steps it is exactly 2^(-k) * (high - low), and in each step both halves have equal probability. Thus, if the algorithm terminates, the selected number was in fact uniformly sampled.

I would also presume it must terminate. Assume that the two endpoints are one ulp apart. The midpoint then rounds to one of the two; there is no randomness involved (barring FPU rounding-mode settings, but they don't count). In the next step, the algorithm either terminates or sets the endpoints equal, which also fixes the midpoint. Thus the procedure always returns the desired result.
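
If you want to convince yourself empirically, here is a rough sketch that buckets samples from [0, 1) into ten equal-width bins and prints the counts (the rand()-based bit source and the bin count are arbitrary choices):

  #include <stdio.h>
  #include <stdlib.h>
  #include <time.h>

  /* The bisection routine from above, minus the argument-swap check. */
  double random_real(double low, double high, int (*random_bit)(void)) {
    double halfway, previous = low;
    while (1) {
      halfway = low + (high - low) / 2;
      if (halfway == previous)
        break;
      if (random_bit() & 1)
        low = halfway;
      else
        high = halfway;
      previous = halfway;
    }
    return halfway;
  }

  static int coin(void) { return rand() & 1; }

  int main(void) {
    srand(time(NULL));
    long bins[10] = {0};
    for (long i = 0; i < 1000000; i++) {
      double r = random_real(0.0, 1.0, coin);
      int idx = (int)(r * 10);
      if (idx > 9) idx = 9;   /* the bisection can land exactly on 1.0 */
      bins[idx]++;
    }
    for (int b = 0; b < 10; b++)
      printf("[%.1f, %.1f): %ld\n", b / 10.0, (b + 1) / 10.0, bins[b]);
  }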


The issues that the GP is grappling with are largely due to the fact that they are trying to "construct" real numbers from a stream of bits; that is always going to lead to bias issues. With this particular algorithm (assuming a truly random bit source), on the other hand, the resulting number should be essentially uniform. It works because we are partitioning the search space itself in such a way that every number is as likely as any other, and the fact that the algorithm terminates rather predictably is consistent with that. After one million invocations, for example, the average number of iterations was something like 57 (with the minimum being 55 and the maximum outlier 74), which is to say you could pick any number whatsoever and expect to see it no more than roughly once per 2^57 invocations.
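
The iteration count is easy to measure directly; a self-contained sketch over [0, 1) (the range and the rand()-based bit source are arbitrary choices here, so the exact figures will differ from the ones above):

  #include <stdio.h>
  #include <stdlib.h>
  #include <time.h>

  /* Inline the bisection loop and count how many halving steps
     each call needs before the midpoint stops moving. */
  int main(void) {
    srand(time(NULL));
    long total = 0;
    int min_steps = 1 << 30, max_steps = 0;
    for (int i = 0; i < 1000000; i++) {
      double low = 0.0, high = 1.0, halfway, previous = low;
      int steps = 0;
      while (1) {
        halfway = low + (high - low) / 2;
        if (halfway == previous)
          break;
        if (rand() & 1)
          low = halfway;
        else
          high = halfway;
        previous = halfway;
        steps++;
      }
      total += steps;
      if (steps < min_steps) min_steps = steps;
      if (steps > max_steps) max_steps = steps;
    }
    printf("avg %.1f steps, min %d, max %d\n",
           total / 1000000.0, min_steps, max_steps);
  }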


I was curious about this. On the one hand, comparing doubles with == is rarely a good idea but, on the other hand, your explanation seems valid.

After some testing I discovered a problem, though not with the comparison. The problem is with calculating the halfway value; there are pairs of doubles whose difference cannot be represented as a double:

  #include <stdio.h>
  #include <stdlib.h>
  #include <time.h>
  #include <float.h>

  double random_real(double low, double high, int (*random_bit)(void)) {
    if (high < low)
      return random_real(high, low, random_bit);
    double halfway, previous = low;
    while (1) {
      halfway = low + (high - low) / 2;   /* high - low can overflow to +inf here */
      if (halfway == previous)
        break;
      if (random_bit() & 1)
        low = halfway;
      else
        high = halfway;
      previous = halfway;
    }
    return halfway;
  }


  int main(int argc, char *argv[]) {
    srand(time(NULL));
    for (int i = 0; i < 1000000; i++) {
      double r = random_real(-DBL_MAX, DBL_MAX, rand);   /* full double range triggers that overflow */
      printf("%f\n", r);
    }
  }


Actually, the problem is that the algorithm ends up computing DBL_MAX - (-DBL_MAX), i.e. effectively DBL_MAX + DBL_MAX, which of course exceeds the largest finite double and overflows. That isn't a very realistic use case either, but in any case you could just clamp the inputs like so:

  double random_real(double low, double high, int (*random_bit)(void)) {
    if (high < low)
      return random_real(high, low, random_bit);
    const double max = DBL_MAX / 2;   /* clamp so that high - low stays representable */
    if (high > max)
      high = max;
    const double min = -max;
    if (low < min)
      low = min;
    double halfway, previous = low;
    while (1) {
      halfway = low + (high - low) / 2;
      if (halfway == previous)
        break;
      if (random_bit() & 1)
        low = halfway;
      else
        high = halfway;
      previous = halfway;
    }
    return halfway;
  }


Fixed it!

  #include <stdio.h>
  #include <stdlib.h>
  #include <time.h>
  #include <float.h>

  double random_real(double low, double high, int (*random_bit)(void)) {
    if (high < low)
      return random_real(high, low, random_bit);
    double halfway, previous = low;
    while (1) {
      halfway = low + (high/2 - low/2);   /* halve each endpoint first so the difference can't overflow */
      if (halfway == previous)
        break;
      if (random_bit() & 1)
        low = halfway;
      else
        high = halfway;
      previous = halfway;
    }
    return halfway;
  }


  int main(int argc, char *argv[]) {
    srand(time(NULL));
    for (int i = 0; i < 1000000; i++) {
      double r = random_real(-DBL_MAX, DBL_MAX, rand);
      printf("%f\n", r);
    }
  }


Yep, that works too. You likely won't ever need a random number that large, of course, but if you did that would be the way to do it.
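
The difference is easy to see in isolation; a tiny check using the endpoints from the example above:

  #include <stdio.h>
  #include <float.h>

  int main(void) {
    double low = -DBL_MAX, high = DBL_MAX;
    /* high - low overflows to +inf, which drags the midpoint to +inf... */
    printf("naive midpoint: %f\n", low + (high - low) / 2);
    /* ...while halving each endpoint first stays finite and gives 0. */
    printf("safe midpoint:  %f\n", low + (high / 2 - low / 2));
  }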


That wouldn't produce uniformly distributed random floating-point numbers.



