Cumulative sums and multi-dimensional statistical distributions: a little Python exercise (part 1)

Preface: last night a friend suddenly asked me a question. I could not solve it right away and only found a solution after thinking about it for a long time, but I suspect there is a better way to solve it.

Cumulative sums and multi-dimensional statistical distributions

I made up this name on the spot from the task requirements. First, let's describe the task.
Task objective:
given an array, draw from it n times with replacement, add up the n values drawn, and list every possible sum together with the number of times it occurs and its probability.
A small example: draw twice from 0, 1, 2 and list every possible result of the two draws. Drawing a diagram shows that the sums after two draws range from 0 to 4.
[Figure: all results of drawing twice from 0, 1, 2]
Then count the number of occurrences of each sum.
[Figure: number of occurrences of each sum for two draws]
That is our task: when we draw from an array n times, what do the possible sums of the n draws and their counts look like?
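The two-draw example can be reproduced by brute force. The sketch below (the function name brute_force_sums is hypothetical, not from the original) uses itertools.product to enumerate all m^n ordered draw sequences and counts their sums. Because it enumerates every sequence, it is only a reference check for small n, not the method developed in this post.

```python
import itertools
from collections import Counter


def brute_force_sums(values, n):
    """Enumerate every ordered sequence of n draws (with replacement) and count the sums."""
    counts = Counter(sum(draw) for draw in itertools.product(values, repeat=n))
    return dict(sorted(counts.items()))  # sort by sum for readable output


print(brute_force_sums([0, 1, 2], 2))
# {0: 1, 1: 2, 2: 3, 3: 2, 4: 1}
```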
Task analysis
I don't know whether this problem has a ready-made mathematical formula, but from a purely programming point of view there are two questions to think about:
1. The obvious tool is a for loop, but drawing n times means n nested for loops, which is impractical. How can the for loops be replaced?
2. If we first save the values of all n draws and add them up afterwards, we have to store a high-dimensional array. A tree could be used, but building it is relatively expensive. So finding a simple way to store the data is also a problem.
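To make the first problem concrete, here is a sketch of the direct approach for a fixed n = 3: one for loop per draw. It works, but the number of nested loops is hard-coded, so the same code cannot handle a variable n.

```python
from collections import Counter

values = [0, 1, 2]
counts = Counter()
# One nested loop per draw: n = 3 draws means three loops.
# Drawing n times would need n loops, which cannot be written for a variable n.
for a in values:
    for b in values:
        for c in values:
            counts[a + b + c] += 1

print(sorted(counts.items()))
# [(0, 1), (1, 3), (2, 6), (3, 7), (4, 6), (5, 3), (6, 1)]
```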
Here I briefly describe my approach to this task.
First question: to replace the nested for loops, we can use numpy matrix operations, or use recursion to carry out the repeated operation. Let's write out all possible draw results as an array. Taking the three numbers (0, 1, 2) as an example, the results of drawing three times are as follows:
[Figure: all results of drawing three times from (0, 1, 2)]
Looking at all the draw results together, a clear pattern emerges. The first row contains every possible value in order: a block of 0s, then a block of 1s, then a block of 2s, each value repeated $3^{n-1}$ times, so that the row's length is the length of the original array to the n-th power ($3^{n}$). That is the pattern of the first row; what about the rows below it?
Looking at the red and green blocks, all results of the previous round of draws become the rows below the first row: as the figure shows, the complete result of the earlier draws appears again under each of 0, 1 and 2 in the first row. So we can use recursion, feeding the result of the previous draw into the next one, to obtain the complete set of draw results.
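The row pattern described above can be sketched with np.repeat and np.tile. The function name all_draws is hypothetical; it returns an n × m^n array whose columns are all ordered outcomes. The first row repeats each value m^(n-1) times, and the table for n - 1 draws is copied once under each value, built recursively. Note that it stores exactly the full table the second question warns about, so it is only practical for small n.

```python
import numpy as np


def all_draws(values, n):
    """Return an (n, m**n) array whose columns are every ordered outcome of n draws."""
    values = np.asarray(values)
    if n == 1:
        return values.reshape(1, -1)
    prev = all_draws(values, n - 1)                # table for n-1 draws, shape (n-1, m**(n-1))
    m = values.size
    first_row = np.repeat(values, prev.shape[1])   # each value repeated m**(n-1) times
    rest = np.tile(prev, (1, m))                   # previous table copied under each value
    return np.vstack([first_row, rest])


table = all_draws([0, 1, 2], 3)
print(table.shape)  # (3, 27)
```

Summing each column, for example `table.sum(axis=0)`, then gives the sum of every outcome.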
For the second question: if we first save all of the results, the structure we need grows as $n \times 3^{n}$ (in general $n \times m^{n}$ for an array of m numbers), which is very unreasonable. To avoid this, recall that our goal is only the distribution of the sums, so we compute the running sum directly after each draw and use the vector of sums as the object passed to the next iteration:
[Figure: the running sums are updated after each draw]
So each step only needs to add 0, 1 and 2 to every sum from the previous draw. The intermediate steps don't need to build a multi-dimensional array or keep track of which numbers were drawn; we only set the number of recursions, obtain the final vector of sums, and then count how often each sum occurs. Task solved! Now for the implementation. Most of the program uses numpy, but some parts may be clumsy because I'm not yet fluent with numpy; I'll improve them in the future.
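The running-sum idea can be sketched in a few lines with numpy broadcasting. This is an iterative variant (one loop over the draws, whatever n is) rather than the recursion used below, and the function name running_sums is hypothetical. It drops the n-row table, but the vector of sums still has m^n entries, and its order differs from the recursive version; only the counts matter here.

```python
import numpy as np


def running_sums(values, n):
    """Vector of all m**n sums after n draws, built one draw at a time."""
    values = np.asarray(values)
    sums = values.copy()                     # sums after the first draw
    for _ in range(n - 1):
        # every previous sum plus every possible next value:
        # shape (len(sums), m), flattened to length len(sums) * m
        sums = (sums[:, None] + values[None, :]).ravel()
    return sums


s = running_sums([0, 1, 2], 3)
print(len(s))          # 27
print(np.bincount(s))  # [1 3 6 7 6 3 1]
```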


Implementation:
Here we draw from [0, 1, 2, 3, 4, 5] four times and show the distribution of the sums:

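```python
import numpy as np


# Recursively compute the sums of n draws and count how often each sum occurs
def sumStatic(choiceList, k, n, values=None):
    """
    Description: for an array of m numbers, draw n times with replacement and add up the
                 values drawn; count every possible sum and how many times it occurs.
    Inputs:
        choiceList: a 1-D np.array of non-negative integers. On the first call it is the
                    array we draw from; on later calls it holds the sums so far.
        k: recursion counter, starts at 0 and increases by 1 on every recursive call.
        n: number of draws, set by the user.
        values: the array we draw from; filled in automatically on the first call
                (the original read it from the global d).
    Returns:
        The result of distribution(), which prints the count for each sum.
    """
    if values is None:
        values = choiceList                   # first call: remember the array we draw from
    if k == n - 1:                            # choiceList now holds the sums of all n draws
        return distribution(np.bincount(choiceList))   # checking first also makes n = 1 work
    store = []                                # sums after one more draw
    for v in values:
        store.append(choiceList + v)          # previous sums plus this value (a new array, no deepcopy needed)
    z = np.hstack(store)                      # one row vector of length len(choiceList) * m
    return sumStatic(z, k + 1, n, values)     # recurse until n draws have been made


# Print how many times each sum occurs
def distribution(z):
    number = np.nonzero(z)[0]                 # the sums that occur at least once
    total = np.sum(z)                         # total number of outcomes, m**n
    for s in number:
        print(f"[{s}] =====> {z[s]} times")   # count for this sum
        # To print probabilities instead, use:
        # print(f"[{s}] =====> {round(z[s] / total, 4)}")


n = int(input("[Enter the number of draws]> "))
d = np.array([0, 1, 2, 3, 4, 5])
sumStatic(d, 0, n)
```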
```text
[Enter the number of draws]> 4
[0] =====> 1 times
[1] =====> 4 times
[2] =====> 10 times
[3] =====> 20 times
[4] =====> 35 times
[5] =====> 56 times
[6] =====> 80 times
[7] =====> 104 times
[8] =====> 125 times
[9] =====> 140 times
[10] =====> 146 times
[11] =====> 140 times
[12] =====> 125 times
[13] =====> 104 times
[14] =====> 80 times
[15] =====> 56 times
[16] =====> 35 times
[17] =====> 20 times
[18] =====> 10 times
[19] =====> 4 times
[20] =====> 1 times
```
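The task also asks for probabilities. A minimal sketch, using the counts printed above as literal data: dividing each count by the total number of ordered outcomes, 6^4 = 1296, gives the probability of each sum. The counts are also symmetric around 10, the mean of four draws from 0 to 5.

```python
import numpy as np

# Counts printed above for 4 draws from [0, 1, 2, 3, 4, 5] (index = sum).
counts = np.array([1, 4, 10, 20, 35, 56, 80, 104, 125, 140, 146,
                   140, 125, 104, 80, 56, 35, 20, 10, 4, 1])
total = counts.sum()          # equals 6**4 = 1296 ordered outcomes
probs = counts / total        # probability of each sum
for s in np.nonzero(counts)[0]:
    print(f"[{s}] =====> {round(probs[s], 4)}")
```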

Some readers may also want a plot. This is simple: we only need the sums and their counts z, so we can rewrite distribution as follows:

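```python
import matplotlib.pyplot as plt

def distribution(z):
    x = np.nonzero(z)[0]               # the sums that actually occur
    plt.bar(x, z[x], color="#ffcfdc")  # bar height = number of occurrences of that sum
    plt.show()
```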

Calling sumStatic the same way now produces a bar chart:
[Figure: bar chart of the count of each sum for four draws from [0, 1, 2, 3, 4, 5]]


Conclusion
That completes the statistics of the cumulative-sum distribution, but a problem remains: the number of outcomes grows as the array length to the n-th power, so for 400 draws the theoretical dimension is the array length to the 400th power, and the computer basically cannot handle it. If I get the chance later, I'll look for a fast way to reduce the dimensionality, or for a formula relating neighboring bars in the chart; with either, the statistics could be computed quickly.
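One possible way around the m^n growth, swapped in here and not the author's method, is discrete convolution: the distribution of a sum of independent draws is the convolution of the single-draw distributions, so the counts after n draws are the single-draw histogram convolved with itself n times. The sketch below (hypothetical name sum_distribution, assumes non-negative integer values and n >= 1, like np.bincount in the original) keeps only one entry per possible sum.

```python
import numpy as np


def sum_distribution(values, n, as_probability=False):
    """Counts (or probabilities) of each possible sum of n draws; index = sum."""
    # Histogram of a single draw: one_draw[s] = how many elements equal s.
    one_draw = np.bincount(np.asarray(values))
    if as_probability:
        # Floats avoid int64 overflow: the total 6**n already exceeds it at n = 25.
        one_draw = one_draw / one_draw.sum()
    result = one_draw
    for _ in range(n - 1):
        result = np.convolve(result, one_draw)  # add one more independent draw
    return result


print(sum_distribution(range(6), 4))      # the same counts as the table above
p = sum_distribution(range(6), 400, as_probability=True)
print(len(p))                             # 2001 possible sums, 0 to 2000
```

For 400 draws from 0 to 5 this stores 2001 numbers instead of 6^400 sums.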
Thanks for reading.

Origin blog.csdn.net/qq_35149632/article/details/104814820