On generating random numbers

Part0: Properties of random numbers

Random numbers generally satisfy the following three properties.

\(1\) (Markov property) When a number is generated, it has nothing to do with the numbers before or after it.

\(2\) (Unpredictability) Given part of a sample and the random algorithm, we cannot deduce the rest of the sample.

\(3\) (Irreproducibility) A random sample cannot be reproduced.

We should also mention the concept of statistical pseudo-randomness.

Statistical pseudo-randomness means that, in a given sample of a random bit stream, the number of 1s is roughly equal to the number of 0s, and likewise the four pairs "10", "01", "00" and "11" occur in roughly equal amounts. Similar criteria are called statistical randomness. Numbers that meet such requirements look random to a human "at a glance". (From a Baidu entry)

In fact, this is the standard by which pseudo-random numbers are judged in computing.
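As a rough illustration of this criterion, the sketch below counts single bits and non-overlapping two-bit pairs in a bit stream. The names `BitStats` and `bit_stats` are hypothetical and only serve this example; how the bit stream is produced is left to the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: tallies of single bits and of two-bit pairs.
struct BitStats {
    long long ones = 0, zeros = 0;
    long long pairs[4] = {0, 0, 0, 0}; // indexed by the pair value: 00, 01, 10, 11
};

// Counts 0s and 1s, and non-overlapping pairs (bits[0]bits[1], bits[2]bits[3], ...).
BitStats bit_stats(const std::vector<int>& bits) {
    BitStats s;
    for (std::size_t i = 0; i < bits.size(); ++i) {
        assert(bits[i] == 0 || bits[i] == 1);
        if (bits[i]) ++s.ones; else ++s.zeros;
    }
    for (std::size_t i = 0; i + 1 < bits.size(); i += 2)
        ++s.pairs[bits[i] * 2 + bits[i + 1]];
    return s;
}
```

A stream passes this rough check when `ones` and `zeros` are close to each other and the four pair counts are close to one another.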

Part1: Pseudo-random numbers

Pseudo-random numbers are the random numbers we commonly use in C++. Why are they called "pseudo" random? Anyone with even a slightly detailed understanding of the C++ rand function should be able to see why.

Since a computer cannot generate random numbers by itself, it can only "fake" randomness by generating a sequence of numbers with a long cycle.

The C++ rand function really just computes, from the seed you provide, a sequence of numbers with a long cycle. As long as the same seed is provided twice, the random numbers that rand returns are the same.
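The point about seeds can be checked directly. This is a minimal sketch; the helper name `first_values` is hypothetical.

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

// Seeds rand() with `seed` and returns the first n values it produces.
std::vector<int> first_values(unsigned seed, int n) {
    assert(n >= 0);
    std::srand(seed);
    std::vector<int> out;
    for (int i = 0; i < n; ++i)
        out.push_back(std::rand());
    return out;
}
```

Calling `first_values(42, 10)` twice returns two identical vectors, because the sequence depends only on the seed.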

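So how long is this cycle, exactly?

A commonly cited implementation of rand() is shown below (the original text describes it as the Linux implementation; the same code appears as the portable example in the C standard):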

static unsigned long next=1;

// RAND_MAX assumed to be 32767 

int rand()
{
    next=next*1103515245+12345;
    return ((unsigned)(next/65536)%32768);
}

void srand(unsigned seed)
{
    next=seed;
}

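From this code we can see that rand() returns only \(32768\ (2^{15})\) distinct values. Strictly speaking, the cycle is set by the internal state next rather than by the output range (the output bits used here repeat after \(2^{31}\) calls), but for heavy use that is still short.

Thus, as far as today's computers are concerned, true randomness cannot be achieved without external help.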

Part2: Judging the quality of random numbers

Before discussing random number algorithms, we should first discuss how to judge the quality of random numbers. After all, only once the criteria are clear can we talk about how to generate high-quality random numbers.

Here we use the statistical pseudo-randomness described earlier: in a bit stream, 0s and 1s, and likewise the pairs "00", "01", "10" and "11", should each occur in roughly equal amounts.

Combining this with the properties of pseudo-random numbers, we get the following three criteria for judging the quality of random numbers:

\(1\) Degree of randomness: whether the algorithm is complex enough that there is no obvious relationship between the random numbers.

\(2\) Distribution: whether large numbers of random numbers pile up in regions that are too high or too low, i.e. whether the distribution is sufficiently even.

\(3\) Cycle length: whether the sequence quickly starts to repeat when it is called a large number of times.

With these criteria, we can better learn to generate high-quality random numbers.

Part3: Generating high-quality random numbers based on C++ rand

Let's first talk about random number functions built on C++ rand.

1. Oscillating back and forth

This kind of random number is mainly for algorithms such as simulated annealing, which use random numbers to adjust the answer. Since we are adjusting the answer, we would like the adjustments to swing back and forth, one positive and one negative. So we post-process the output by hand: the numbers generated by the original rand function are made to alternate between positive and negative.

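#include <cstdlib>

static int f=3000;
static double del=0.999; // f and del make the amplitude shrink steadily
static int con=-1;
static int g=1; // controls the alternation of sign

int rand1()
{
    f*=del;
    g*=con;
    // Assumes RAND_MAX is 32767; with a larger RAND_MAX this product can overflow int.
    int tmp=f*g*rand();
    return tmp;
}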

This generator borrows the idea of annealing. Of course, you can also control it directly with the temperature of the existing algorithm.
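A variant of the same idea is sketched below, under the assumption that the caller only needs values within plus or minus the current amplitude. Scaling by `RAND_MAX` is my assumption rather than the author's method, and the name `Oscillator` is hypothetical; the benefit is that the result cannot overflow whatever `RAND_MAX` is.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical variant of rand1: alternating sign, shrinking amplitude,
// with the magnitude scaled into [0, amplitude] so it cannot overflow.
struct Oscillator {
    double amplitude; // current maximum magnitude
    double decay;     // multiplied into amplitude on every call
    int sign;         // flips between +1 and -1

    Oscillator(double a, double d) : amplitude(a), decay(d), sign(1) {
        assert(a >= 0 && d > 0 && d <= 1);
    }

    int next() {
        amplitude *= decay;
        sign = -sign;
        double unit = std::rand() / (RAND_MAX + 1.0); // in [0, 1)
        return sign * static_cast<int>(amplitude * unit);
    }
};
```

With `Oscillator o(3000, 0.999);`, successive calls to `o.next()` alternate between non-positive and non-negative values whose magnitude never exceeds the shrinking amplitude.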

2. Averaging

This is mainly used for the treap (a balanced tree). While keeping each individual number random, it makes the numbers more evenly distributed overall.

int p; // the desired position of the distribution

int rand2()
{
    int tmp=(p+rand())/2; // averaging with the desired position pulls the result closer to the desired distribution
    return tmp;
}

3. Avoiding repeats across multiple runs

Of course, if someone really needs numbers very close to true random, meaning that repeated runs of the program should not produce the same results, some external interference is needed.

The first source is the clock function. The running time of a program changes slightly on every run, and keeps changing during the calls. We can make good use of this variation: after each call, reset the random seed.

There is also a source we tend to overlook: the error of the computer itself. As we all know, computers lose precision in floating-point operations, so we can use this to help clock adjust the seed (after all, the chance that two runs take the same time is not small, since clock is only accurate to \(\text{ms}\)).

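#include <cstdlib>
#include <ctime>

int count;

int rand3()
{
    ++count;
    unsigned t=clock()+1; // use the current time; unsigned so that overflow wraps around

    for(int i=1;i<12121307;++i) // slow the function down
        t+=rand();

    t+=clock();  // after slowing down, the time varies more
    t*=-1234;
    srand(t*count+rand()); // reset the random seed
    return rand();
}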

After many experiments, I found that the first three numbers from this function have a relatively high probability of repeating (7 to 9%), so it is recommended to start using it from the fourth.

The code above does not use precision loss for randomization, because the loss is so small that it makes almost no difference to the result and is of no use.
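Since C++11, the standard library also offers `std::random_device` as a source of external entropy. The sketch below is a different technique from the clock-based function above, not the author's method: it seeds `rand()` once from `std::random_device` mixed with a high-resolution clock. The names `external_seed` and `rand3_seeded` are hypothetical, and note that `std::random_device` is allowed to be deterministic on platforms without an entropy source, which is why the clock is mixed in.

```cpp
#include <cassert>
#include <chrono>
#include <cstdlib>
#include <random>

// Builds a seed from two external sources: std::random_device and a clock.
unsigned external_seed() {
    std::random_device rd;
    unsigned long long ticks = static_cast<unsigned long long>(
        std::chrono::high_resolution_clock::now().time_since_epoch().count());
    return rd() ^ static_cast<unsigned>(ticks) ^ static_cast<unsigned>(ticks >> 32);
}

// Hypothetical replacement for rand3: seed once, then call rand() as usual.
int rand3_seeded() {
    static bool seeded = false;
    if (!seeded) {
        std::srand(external_seed());
        seeded = true;
    }
    int r = std::rand();
    assert(r >= 0 && r <= RAND_MAX);
    return r;
}
```

Seeding once avoids both the busy loop and the repeated reseeding, at the cost of relying on the platform's entropy source.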

Analysis of pros and cons

First, the degree of randomness. Having seen the rand() code earlier, you might think there is an obvious correlation between the numbers. In practice, though, this correlation is basically negligible. So in terms of randomness, rand() just about passes.

As for how even the distribution is, looking at the code may not tell you much.

So let's run a test:

#include <cstdio>
#include <cstdlib>

int data[10007];

int main()
{
    for(int i=1;i<=1000000;++i)
    {
        int tmp=rand()%100000; // generate a random number below 100000
        ++data[tmp/10]; // count occurrences per bucket of width 10
    }

    for(int i=1;i<=1000;++i)
        printf("%d\n",data[i]);
}

Final result:

KHzmsP.png

As we can see, this distribution is very uneven.
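One thing worth checking in a test like this is the range of rand(): if RAND_MAX is 32767, as the earlier listing assumes, then rand()%100000 can never exceed 32767. Whether that affects the figure above depends on the platform, so the following is only a sketch of how one might widen the range before bucketing. The names `rand30` and `bucket_counts` are hypothetical.

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

// Combines two 15-bit draws into one 30-bit value, so the range no longer
// depends on RAND_MAX being larger than 32767.
int rand30() {
    int hi = std::rand() & 0x7fff;
    int lo = std::rand() & 0x7fff;
    return (hi << 15) | lo;
}

// Draws n values in [0, 100000) and counts them in 10000 buckets of width 10.
std::vector<int> bucket_counts(int n) {
    assert(n >= 0);
    std::vector<int> buckets(10000, 0);
    for (int i = 0; i < n; ++i)
        ++buckets[rand30() % 100000 / 10];
    return buckets;
}
```

Printing the result of `bucket_counts(1000000)` gives a histogram comparable to the test above, with every bucket up to 9999 reachable.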

Cycle length ...

This is the main flaw of rand(): the \(32768\) figure above is really not enough. Algorithms that call rand() a great many times (such as annealing) will basically drive rand() into repeating its cycle.

So is there an algorithm that offers both quality and a long cycle?

Part4: Mersenne Twister (MT19937)

The Mersenne Twister is currently the commonly used algorithm for producing high-quality pseudo-random numbers. In the C++11 standard, an implementation of MT19937 was also added to the random library.
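For reference, here is a minimal sketch of using the standard library's generator, `std::mt19937` from `<random>`. The helper name `nth_mt19937` is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Returns the n-th output (1-based) of std::mt19937 seeded with `seed`.
std::uint32_t nth_mt19937(std::uint32_t seed, int n) {
    assert(n >= 1);
    std::mt19937 gen(seed);
    std::uint32_t value = 0;
    for (int i = 0; i < n; ++i)
        value = static_cast<std::uint32_t>(gen());
    return value;
}
```

The C++ standard fixes the output of this generator: a default-constructed `std::mt19937` uses the seed 5489, and its 10000th output is 4123659995, so `nth_mt19937(5489u, 10000)` returns that value on any conforming implementation.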

In fact, this algorithm was developed by Makoto Matsumoto and Takuji Nishimura in 1997 and has little to do with Mersenne himself. It gets its name because its cycle is as long as \(2^{19937}-1\), which is a Mersenne prime. Moreover, the algorithm generates uniformly distributed random numbers over this long cycle.

So, what is the principle behind MT19937?

It is essentially a twisting algorithm that transforms a \(19937\)-bit binary sequence. As we know, a binary sequence of length \(n\) has at most \(2^n\) arrangements. However, a poorly chosen operation can make the cycle shorter than \(2^n\). How to make the sequence run through as many of these arrangements as possible is the essence of the twisting algorithm.

If the feedback function plus \(1\) is an irreducible (more precisely, primitive) polynomial, the cycle reaches its maximum length, namely \(2^n-1\).

Let's simulate a 4-bit register:

Here the feedback function we use is \(y=x^4+x^2+x+1\) (this is not an irreducible polynomial; it is only used because it is easy to follow).

In this formula, \(x^4\) and \(x^2\) mean that each time we take the fourth and second bits of the binary sequence, counting from the back, and XOR them; \(x\) means that we then XOR that result with the last bit. The final result is placed at the beginning of the sequence, the whole sequence shifts by one, and the last bit is discarded.

KqB6ts.png

1. The initial array is \(\{1,0,0,0\}\).

KqBykj.png

2. Take out its fourth and second bits and XOR them.

KqB2pq.png

3. XOR the result just obtained with the last bit.

KqBR10.png

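4. Put the final result in the first position and shift the sequence back. The last bit is discarded.

That is one operation. The algorithm simply repeats these steps over and over, changing the sequence pseudo-randomly.

If we instead use the feedback function \(y=x^4+x+1\), which is irreducible, the cycle length is \(2^4-1=15\). The results are as follows: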

\[
\begin{array}{|c|c|c|c|}
\hline
a_3&a_2&a_1&a_0\\
\hline
1&0&0&0\\
1&1&0&0\\
1&1&1&0\\
1&1&1&1\\
0&1&1&1\\
1&0&1&1\\
0&1&0&1\\
1&0&1&0\\
1&1&0&1\\
0&1&1&0\\
0&0&1&1\\
1&0&0&1\\
0&1&0&0\\
0&0&1&0\\
0&0&0&1\\
\hline
1&0&0&0\\
\hline
\end{array}
\]

As you can see, the results contain every integer in \(1,2,\dots,2^4-1\) with no repetition inside the cycle, and they also have good randomness.
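The table can be reproduced with a few lines of code. Reading the rows, the new bit is \(a_3\) XOR \(a_0\); it enters at \(a_3\) while the old \(a_0\) is dropped. The sketch below encodes a row as the 4-bit number \(a_3a_2a_1a_0\); the function names are hypothetical.

```cpp
#include <cassert>
#include <vector>

// One step of the 4-bit register from the table: the new bit is a3 XOR a0,
// it enters at a3, and the old a0 is discarded.
unsigned lfsr_step(unsigned state) {
    assert(state < 16);
    unsigned a3 = (state >> 3) & 1u;
    unsigned a0 = state & 1u;
    return ((a3 ^ a0) << 3) | (state >> 1);
}

// Returns every state visited from `start` until the register returns to `start`.
std::vector<unsigned> lfsr_cycle(unsigned start) {
    std::vector<unsigned> states;
    unsigned s = start;
    do {
        states.push_back(s);
        s = lfsr_step(s);
    } while (s != start);
    return states;
}
```

Starting from `8` (binary 1000), `lfsr_cycle` visits 15 states before returning to the start, matching the table. The all-zero state maps to itself, which is why the maximum cycle is \(2^4-1\) rather than \(2^4\).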

Part5: MT19937 pseudocode and C++ implementation

Initializing the random seed:

\(index\leftarrow 0\)
\(MT\leftarrow\ new\ array\ with\ size\ 624//624\times 32-31=19937\)
\(//\text{Above are global variables}\)
\(\text{MT19937_SRAND}(seed):\)
\(index\leftarrow 0\)
\(MT[0]\leftarrow seed\)
\(\mathbf{for}\ i\leftarrow 1\ \mathbf{to}\ 623:\)
\(\quad t\leftarrow 1812433253\cdot(MT[i-1]\oplus(MT[i-1]\gg 30))+i//\oplus\text{ is the xor operation}, \gg\text{ is the right-shift operation}\)
\(\quad MT[i]\leftarrow t\& \text{0xffffffff}//\text{get the last 32 bits}\)
\(//\&\text{ is the bit-and operation}, \text{0x means that the number next is a hex number}\)

Mersenne twist:

\(\text{MT19937_GENERATE}():\)
\(\mathbf{for}\ i\leftarrow\ 0\ \mathbf{to}\ 623:\)
\(\quad y\leftarrow (MT[i]\&\text{0x80000000})+(MT[(i+1)\bmod 624]\&\text{0x7fffffff})\)
\(\quad MT[i]\leftarrow MT[(i+397)\bmod 624]\oplus(y\gg 1)\)
\(\quad\mathbf{if}\ y\&1=1:\)
\(\quad\quad MT[i]\leftarrow MT[i]\oplus 2567483615\)

Generating a random number:

\(\text{MT19937_RAND}():\)
\(\mathbf{if}\ index=0:\)
\(\quad \text{MT19937_GENERATE}()\)
\(y\leftarrow MT[index]\)
\(y\leftarrow y\oplus (y\gg 11)\)
\(y\leftarrow y\oplus ((y\ll 7)\&2636928640)\)
\(y\leftarrow y\oplus ((y\ll 15)\& 4022730752)\)
\(y\leftarrow y\oplus (y\gg 18)\)
\(index\leftarrow (index+1)\bmod 624\)
\(\mathbf{return}\ y\)

C++ implementation:

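#include <cstdint>
using std::uint32_t;

// Names carry an mt_ prefix so they do not clash with rand/srand from <cstdlib>.
uint32_t mt_index;
uint32_t MT[624];

// Set the random seed
inline void mt_srand(uint32_t seed)
{
    mt_index=0;
    MT[0]=seed;
    
    for(uint32_t i=1;i<=623;++i)
        MT[i]=1812433253u*(MT[i-1]^(MT[i-1]>>30))+i; // uint32_t arithmetic keeps only the low 32 bits
} 

// Mersenne twist
inline void mt_generate()
{
    for(int i=0;i<=623;++i)
    {
        uint32_t y=(MT[i]&0x80000000u)+(MT[(i+1)%624]&0x7fffffffu);
        MT[i]=MT[(i+397)%624]^(y>>1);
        
        if(y&1)
            MT[i]^=2567483615u;
    }
}

// Generate a random number
inline uint32_t mt_rand()
{
    if(mt_index==0)
        mt_generate();
    
    uint32_t y=MT[mt_index];
    y=y^(y>>11); // the pseudocode shifts by 11 here
    y=y^((y<<7)&2636928640u);
    y=y^((y<<15)&4022730752u);
    y=y^(y>>18);
    mt_index=(mt_index+1)%624;
    return y;
}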
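One way to gain confidence in a hand-written version is to compare it with `std::mt19937` from `<random>`. The sketch below is a compact, self-contained transcription of the pseudocode, wrapped in a struct so it does not depend on any global names. The names `TinyMT19937` and `matches_std` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Compact transcription of the MT19937 pseudocode (struct name is hypothetical).
struct TinyMT19937 {
    std::uint32_t mt[624];
    int index;

    explicit TinyMT19937(std::uint32_t seed) : index(0) {
        mt[0] = seed;
        for (std::uint32_t i = 1; i < 624; ++i)
            mt[i] = 1812433253u * (mt[i - 1] ^ (mt[i - 1] >> 30)) + i;
    }

    // Regenerates all 624 words of state (the "twist").
    void generate() {
        for (int i = 0; i < 624; ++i) {
            std::uint32_t y = (mt[i] & 0x80000000u) + (mt[(i + 1) % 624] & 0x7fffffffu);
            mt[i] = mt[(i + 397) % 624] ^ (y >> 1);
            if (y & 1u) mt[i] ^= 2567483615u;
        }
    }

    // Tempers the next state word and returns it.
    std::uint32_t next() {
        if (index == 0) generate();
        std::uint32_t y = mt[index];
        y ^= y >> 11;
        y ^= (y << 7) & 2636928640u;
        y ^= (y << 15) & 4022730752u;
        y ^= y >> 18;
        index = (index + 1) % 624;
        return y;
    }
};

// Returns true if the first n outputs agree with std::mt19937 for the same seed.
bool matches_std(std::uint32_t seed, int n) {
    assert(n >= 0);
    TinyMT19937 mine(seed);
    std::mt19937 ref(seed);
    for (int i = 0; i < n; ++i)
        if (mine.next() != static_cast<std::uint32_t>(ref())) return false;
    return true;
}
```

Checking more than 624 outputs, for example `matches_std(5489u, 2000)`, exercises the twist step more than once.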

End of article.

Origin www.cnblogs.com/Anverking/p/oi-rand.html