Using C to verify that discrete events with non-uniform probabilities conform to the normal distribution curve when the sample size is large enough (by generating an image in PPM format)

The reason I wrote this article is that I watched "[Official Bilingual] But what is the central limit theorem?", published by the famous mathematics popularization channel 3Blue1Brown. It mentions that no matter whether the probabilities of the different outcomes of a discrete event are uniform or not, the sum will still conform to the normal distribution curve once the number of events is large enough. I just wanted to try it myself and see whether that is the case, because I think the central limit theorem and the normal distribution are an amazing part of probability theory.

This article uses dice rolls as the discrete event and looks at the distribution of the sum of the points. The program is first implemented with a uniform distribution, and then the probabilities are adjusted to be non-uniform. The complete source code is at the end, to avoid errors caused by missing header files and the like.

The probability of the sum of points under a uniform distribution

First, create an array to store the possible values of a die, as follows:

int a[] = {1, 2, 3, 4, 5, 6};

To generate the image, the following function writePPMImage is used:

void writePPMImage(int* data, int width, int height, const char *filename, int maxIterations)
{
    FILE *fp = fopen(filename, "wb");

    // write ppm header
    fprintf(fp, "P6\n");
    fprintf(fp, "%d %d\n", width, height);
    fprintf(fp, "255\n");

    for (int i = 0; i < width*height; ++i) {
        float mapped = pow(std::min(static_cast<float>(maxIterations), static_cast<float>(data[i])) / 256.f, .5f);
        unsigned char result = static_cast<unsigned char>(255.f * mapped);
        for (int j = 0; j < 3; ++j)
            fputc(result, fp);
    }
    fclose(fp);
    printf("Wrote image file %s\n", filename);
}

The parameters of this function are:

  1. `data` is an array in which each element holds the color information of one pixel of the bitmap (arranged row by row, i.e. in scanline order). In other words, each possible sum of points corresponds to one element (one pixel column).
  2. `width` and `height` are the dimensions of the output bitmap.
  3. `filename` is the name of the generated bitmap file.
  4. `maxIterations` is the maximum color value, that is, the value corresponding to white; in the code it is set to 256. Since we only need black and white, this could be made simpler: write 1 directly, and then use only the two integer values 0 and 1 to represent black and white.
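Point 4 above suggests a black-and-white-only variant. As a minimal sketch (the name writePPMImageBW is mine, not from the original code), it could look like this:

```cpp
#include <cstdio>

// Hypothetical simplified writer: each data[i] is 0 (black) or 1 (white),
// so no tone mapping is needed at all.
void writePPMImageBW(const int* data, int width, int height, const char* filename)
{
    FILE* fp = fopen(filename, "wb");
    if (!fp) return;
    // same P6 header as the full version
    fprintf(fp, "P6\n%d %d\n255\n", width, height);
    for (int i = 0; i < width * height; ++i) {
        unsigned char v = data[i] ? 255 : 0;  // 1 -> white, 0 -> black
        for (int j = 0; j < 3; ++j)           // same value for R, G and B
            fputc(v, fp);
    }
    fclose(fp);
}
```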

The main function is written below; see the comments for an explanation of each step:

int main() {
    // Set the image size to 1450x1000
    int width = 1450;
    int height = 1000;

    // One element will be picked at random from this array as the dice roll
    int a[] = {1, 2, 3, 4, 5, 6};

    // Array holding the count of each possible sum. Value-initialize it
    // with () so every element starts at 0; plain new[] leaves the
    // elements uninitialized, and garbage starting values would break
    // the counting below.
    int* sumArr = new int[width]();
    // Array holding the pixel color information of the output image
    int* output = new int[width*height]();

    // Sample size is 30x1000 = 30000, i.e. 30000 sums are taken
    for (int i = 0; i < height*30; i++) {
        // Get one random roll. % 6 gives a value in 0..5, which exactly
        // indexes the elements of array a
        int temp = a[random() % 6];
        // The loop below accumulates 100 more rolls, i.e. each sample
        // is the sum of that many dice
        for (int j = 0; j < 100; j++) {
            temp = temp + a[random() % 6];
        }
        // Increment the count for this sum
        sumArr[temp] = sumArr[temp] + 1;
    }

    // The bars of the histogram grow from the bottom of the image,
    // hence this conversion
    for (int i = 0; i < width; i++) {
        for (int j = height-1; j >= height - sumArr[i]; j--) {
            output[j*width + i] = 256;
        }
    }
    // Write the image
    writePPMImage(output, width, height, "output.ppm", 256);

    delete[] sumArr;
    delete[] output;
    return 0;
}

The image generated from the 30,000 samples is as follows:

(image: histogram of the 30,000 sums)

It is very similar to the normal distribution curve, but it is too sharp. To make the shape more obvious, let's "stretch and flatten" it. The way to do this is to modify the second big loop as follows:

for (int i = 0; i < width; i++) {
    // sumArr[i]/2 compresses the image vertically
    for (int j = height-1; j >= height - sumArr[i]/2; j--) {
        // stretch the image horizontally
        // note: i*10+k must stay below width, i.e. width should be at
        // least 10x the largest occurring sum, otherwise a bar wraps
        // into the next row (and can even write past the buffer)
        for (int k = 0; k < 10; k++) {
            output[j*width + i*10 + k] = 256;
        }
    }
}

That is, each sample is now represented by 2x10 pixels, and the images below are all displayed with this scaling. The image now looks like this:

(image: the stretched and flattened histogram)

Now it looks very much like the curve of the standard normal distribution. If you want to actually plot the standard normal distribution, you would additionally compute the sample mean and variance; it only takes a few more steps.

The probability of the sum of points under a non-uniform distribution

Next, let's try a graph with non-uniform probabilities. At first this stumped me; I did not know how to give each value a different probability. But I quickly realized that this setup is just drawing small balls (elements) from a box (array), so it is enough to change the number and values of the array's elements. The sample-space array then becomes:

int a[] = {1, 1, 1, 1, 1, 2, 3, 4, 5, 6};

There are five 1s, which means the probability of rolling a 1 is 0.5, while every other value has probability 0.1.
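As a quick sanity check of those probabilities (a standalone sketch, not part of the article's program; estimateProbabilityOfOne is an illustrative name), one can draw from the array repeatedly and count how often 1 comes up:

```cpp
#include <cstdlib>

// Sketch: estimate P(1) empirically by sampling the array with rand().
// With the array {1,1,1,1,1,2,3,4,5,6} the result should hover near 0.5.
double estimateProbabilityOfOne(const int* a, int count, int trials)
{
    int hits = 0;
    for (int t = 0; t < trials; ++t)
        if (a[rand() % count] == 1) // uniform index 0..count-1
            ++hits;
    return (double)hits / trials;
}
```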

The source code also needs some changes: not only has the number of elements changed, but the range of the random values must change as well. To cover various test cases and make the code more general, it becomes:

int main() {
    int width = 1700;
    int height = 1000;
    int a[] = {1, 1, 1, 1, 1, 2, 3, 4, 5, 6};
    // count is the size of the sample space, so it does not have to be
    // updated by hand in every place below
    int count = sizeof(a) / sizeof(int);

    // value-initialized with () so all counts start at 0
    int* sumArr = new int[width]();
    int* output = new int[width*height]();

    for (int i = 0; i < height*30; i++) {
        int temp = a[random() % count];
        for (int j = 0; j < 100; j++) {
            temp = temp + a[random() % count];
        }
        sumArr[temp] = sumArr[temp] + 1;
    }

    for (int i = 0; i < width; i++) {
        // sumArr[i]/2 compresses the image vertically
        for (int j = height-1; j >= height - sumArr[i]/2; j--) {
            // stretch the image horizontally
            for (int k = 0; k < 10; k++) {
                output[j*width + i*10 + k] = 256;
            }
        }
    }

    writePPMImage(output, width, height, "output.ppm", 256);

    delete[] sumArr;
    delete[] output;
    return 0;
}

The image generated at this time is as follows:

(image: histogram for the non-uniform distribution)

As you can see, it still conforms to the normal distribution curve; the image does not change shape just because 1 has a higher probability.

How about something a little more extreme? What if the probability of 1 is as high as 99%?

Unfortunately, making the probability of 1 as high as 99% would require a sample-space array of 500 elements, which caused some resource allocation errors. So let's just try a probability of 95% for 1. The array is then as follows (it is listed in full here for the convenience of readers; you can copy it and try it yourself):

int a[] = {
    1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,    // 40 per line
    1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,    // 40 per line
    1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,    // 40 per line
    2,3,4,5,6,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,                          // 15 more 1s
};
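As an aside, the 99% case that ran into resource errors above could be approximated without any huge array by drawing against an explicit threshold. This is a sketch of an alternative technique, not the article's method (weightedDie and percentOne are illustrative names):

```cpp
#include <cstdlib>

// Sketch: return 1 with probability percentOne/100, otherwise one of
// 2..6 uniformly. This avoids building a 500-element sample-space array.
int weightedDie(int percentOne)
{
    if (rand() % 100 < percentOne)  // e.g. percentOne=99 -> 1 comes up 99% of the time
        return 1;
    return 2 + rand() % 5;          // 2,3,4,5,6 equally likely otherwise
}
```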

With this 95% array, the image looks like this:

(image: histogram for the 95% distribution)

As you can see, the samples at the minimum possible sum (every roll being 1) are the most numerous, but the right-hand side is still half of a normal distribution. So what if we increase the number of accumulations, say from 100 to 1000 (with the number of samples reduced to 10,000)? The image then looks like this:

(image: histogram with 1000 accumulations)

Because there are so many possible sums, the image here is 17000x1000 px and a bit hard to read, so I cropped part of it:

cropped image

As you can see, in the end it still conforms to the normal distribution curve, which is exactly the central limit theorem.
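That flattening of the asymmetry can also be checked numerically. Here is a sketch (an illustrative helper, not from the article) that estimates the skewness of the sum for a given number of accumulations; it should shrink toward 0 as the count grows, which is the central limit theorem at work:

```cpp
#include <cstdlib>
#include <cmath>

// Sketch: draw `samples` sums of n values from the weighted array a,
// then estimate the skewness m3 / m2^1.5 of those sums.
double sumSkewness(const int* a, int count, int n, int samples)
{
    double* sums = new double[samples];
    double mean = 0.0;
    for (int s = 0; s < samples; ++s) {
        double t = 0.0;
        for (int j = 0; j < n; ++j)
            t += a[rand() % count];
        sums[s] = t;
        mean += t;
    }
    mean /= samples;

    double m2 = 0.0, m3 = 0.0; // second and third central moments
    for (int s = 0; s < samples; ++s) {
        double d = sums[s] - mean;
        m2 += d * d;
        m3 += d * d * d;
    }
    m2 /= samples;
    m3 /= samples;
    delete[] sums;
    return m3 / std::pow(m2, 1.5);
}
```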

Full code

#include <stdlib.h>
#include <stdio.h>
#include <math.h>
#include <algorithm>

using namespace std;

void writePPMImage(int* data, int width, int height, const char *filename, int maxIterations)
{
    FILE *fp = fopen(filename, "wb");

    // write ppm header
    fprintf(fp, "P6\n");
    fprintf(fp, "%d %d\n", width, height);
    fprintf(fp, "255\n");

    for (int i = 0; i < width*height; ++i) {
        float mapped = pow(std::min(static_cast<float>(maxIterations), static_cast<float>(data[i])) / 256.f, .5f);
        unsigned char result = static_cast<unsigned char>(255.f * mapped);
        for (int j = 0; j < 3; ++j)
            fputc(result, fp);
    }
    fclose(fp);
    printf("Wrote image file %s\n", filename);
}

int main() {
    // Output image size. The hump moves right as the number of
    // accumulations grows, so widen the image when increasing it.
    int width = 17000;
    int height = 1000;
    int a[] = {
        1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,    // 40 per line
        1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,    // 40 per line
        1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,    // 40 per line
        2,3,4,5,6,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,                          // 15 more 1s
    };
    // count is the size of the sample space, so it does not have to be
    // updated by hand in every place below
    int count = sizeof(a) / sizeof(int);

    // Array holding the count of each possible sum. Value-initialize it
    // with () so every element starts at 0; plain new[] leaves the
    // elements uninitialized, which would break the counting below.
    int* sumArr = new int[width]();
    // Array holding the pixel color information of the output image
    int* output = new int[width*height]();

    // Sample size is 10x1000 = 10000, i.e. 10000 sums are taken
    for (int i = 0; i < height*10; i++) {
        // Get one random roll. % count gives a value in 0..count-1,
        // which exactly indexes the elements of array a
        int temp = a[random() % count];
        // The loop below accumulates 1000 more rolls per sample
        for (int j = 0; j < 1000; j++) {
            temp = temp + a[random() % count];
        }
        sumArr[temp] = sumArr[temp] + 1;
    }

    // The bars of the histogram grow from the bottom of the image,
    // hence this conversion
    for (int i = 0; i < width; i++) {
        // sumArr[i]/2 compresses the image vertically
        for (int j = height-1; j >= height - sumArr[i]/2; j--) {
            // stretch the image horizontally
            for (int k = 0; k < 10; k++) {
                output[j*width + i*10 + k] = 256;
            }
        }
    }

    // Write the image
    writePPMImage(output, width, height, "output.ppm", 256);

    delete[] sumArr;
    delete[] output;
    return 0;
}

It's very interesting; I hope it can help someone in need~

Origin blog.csdn.net/qq_33919450/article/details/130706860