Randomly shuffling an array and reservoir sampling

1. Randomly shuffling an array (shuffling algorithm)

  Correctness criterion for a shuffling algorithm: it must be able to produce n! possible results, each equally likely; otherwise it is wrong. This is easy to see: an array of length n has n! permutations, so there are n! possible shuffled results in total. A correct algorithm must reflect this fact.
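To make the criterion concrete, the sketch below enumerates every possible sequence of random choices for a tiny array. At step i the code below picks an index from [i, n-1], so there are n × (n-1) × … × 1 = n! equally likely choice sequences; if each one yields a different permutation, every permutation has probability 1/n!. For contrast, `naive_paths` is a hypothetical flawed variant (not from the original) that swaps with any index in [0, n-1] at every step, giving n^n equally likely paths. For n = 3 that is 27 paths, which cannot be spread evenly over 3! = 6 permutations. All helper names here are illustrative.

```python
from collections import Counter
from itertools import product

def apply_swaps(n, choices):
    """Replay a shuffle of [0..n-1] where step i swaps position i with choices[i]."""
    arr = list(range(n))
    for i, j in enumerate(choices):
        arr[i], arr[j] = arr[j], arr[i]
    return tuple(arr)

def fisher_yates_paths(n):
    """Count resulting permutations over all choice sequences where step i picks j in [i, n-1]."""
    ranges = [range(i, n) for i in range(n)]
    return Counter(apply_swaps(n, c) for c in product(*ranges))

def naive_paths(n):
    """Hypothetical flawed variant: step i picks j anywhere in [0, n-1]."""
    ranges = [range(n)] * n
    return Counter(apply_swaps(n, c) for c in product(*ranges))

fy = fisher_yates_paths(3)
print(len(fy), set(fy.values()))     # 6 {1}: each permutation reached exactly once
nv = naive_paths(3)
print(len(nv), sorted(nv.values()))  # 6 [4, 4, 4, 5, 5, 5]: 27 paths, uneven
```

The uneven 4/5 split is why "choose from the remaining positions only" matters: it is what makes the number of paths equal n!.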

Code:

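import random

def shuffleArr(arr):
    l = len(arr)
    for i in range(l):
        # pick a random index from the not-yet-fixed part arr[i..l-1] (randint is inclusive)
        rand = random.randint(i, l-1)
        # move the chosen element into position i
        arr[i], arr[rand] = arr[rand], arr[i]
    print(arr)

arr = [1,2,3,4,5]
shuffleArr(arr)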
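The same algorithm is often written walking backwards from the last index, where position i receives a uniform pick from the first i + 1 elements. The sketch below uses that form; returning a new list and accepting an optional `random.Random` instance (so runs can be reproduced with a seed) are choices made here, not in the original, and `shuffled` is a hypothetical name. For everyday use, Python's standard library already provides `random.shuffle`, which shuffles a list in place.

```python
import random

def shuffled(arr, rng=None):
    """Return a shuffled copy of arr (backward Fisher-Yates); arr itself is not modified."""
    if rng is None:
        rng = random.Random()
    out = list(arr)
    # Walk from the last index down; position i gets a uniform pick from out[0..i].
    for i in range(len(out) - 1, 0, -1):
        j = rng.randint(0, i)  # inclusive bounds: j in [0, i]
        out[i], out[j] = out[j], out[i]
    return out

data = [1, 2, 3, 4, 5]
print(shuffled(data, random.Random(42)))  # the same seed gives the same order on every run
print(data)                               # [1, 2, 3, 4, 5]: the input is untouched
```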

  

2. Reservoir sampling algorithm

Scenario: from a massive data stream of unknown length, draw random samples so that every element has the same probability of being chosen.

Algorithm steps:

  Suppose the data sequence has size n and we need k samples.

  First create an array (the reservoir) that holds k elements, and put the first k elements of the sequence into it.

  Then, starting from the (k+1)-th element, the j-th element (\(j > k\)) is selected with probability \(\frac{k}{j}\); if it is selected, it replaces one element of the array, chosen so that every element in the array is equally likely to be replaced. After all elements have been traversed, the elements remaining in the array are the required sample.
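The steps above can be sketched as a function that accepts any iterable (a generator, a file, …), so the total length never needs to be known in advance. This is a minimal sketch: the name `reservoir_sample` and the optional `rng` parameter are hypothetical, and a stream shorter than k simply yields all of its items.

```python
import random
from itertools import islice

def reservoir_sample(stream, k, rng=None):
    """Draw k items uniformly without replacement from an iterable of unknown length."""
    if rng is None:
        rng = random.Random()
    it = iter(stream)
    reservoir = list(islice(it, k))  # step 1: keep the first k items
    # step 2: the j-th item (1-based, j > k) is selected with probability k/j
    for j, item in enumerate(it, start=k + 1):
        r = rng.randint(1, j)        # uniform over 1..j (inclusive)
        if r <= k:                   # happens with probability k/j
            reservoir[r - 1] = item  # given r <= k, r is uniform over the k slots
    return reservoir

print(reservoir_sample(range(100_000), 5, random.Random(1)))
```

The stream is read once and only k items are held in memory, which is what makes the method suitable for the unknown-length scenario.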

Proof:

For the i-th element (\(i \le k\)): up to and including step k, its probability of being selected is 1. At step k+1, the probability that it is replaced = probability that element k+1 is selected \(\times\) probability that element i is the one replaced, namely \(\frac{k}{k+1} \times \frac{1}{k}\). So its probability of being kept is \(1 - \frac{k}{k+1} \times \frac{1}{k} = \frac{k}{k+1}\). Likewise, the probability that it is not replaced by element k+2 is \(1 - \frac{k}{k+2} \times \frac{1}{k} = \frac{k+1}{k+2}\), and so on. When the algorithm reaches step n, the probability of being retained = probability of being selected \(\times\) probability of never being replaced, namely:

\(1 \times \frac{k}{k + 1} \times \frac{k + 1}{k + 2} \times \frac{k + 2}{k + 3} \times … \times \frac{n - 1}{n} = \frac{k}{n}\)
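As a concrete instance (the values k = 2 and n = 5 are chosen here only for illustration), one of the first two elements survives steps 3, 4 and 5 with probability

\(1 \times \frac{2}{3} \times \frac{3}{4} \times \frac{4}{5} = \frac{2}{5} = \frac{k}{n}\)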

For the j-th element (\(j > k\)): at step j, its probability of being selected is \(\frac{k}{j}\). The probability that it is not replaced by element j+1 is \(1 - \frac{k}{j+1} \times \frac{1}{k} = \frac{j}{j+1}\). When the algorithm reaches step n, the probability of being retained = probability of being selected \(\times\) probability of never being replaced, namely: \(\frac{k}{j} \times \frac{j}{j+1} \times \frac{j+1}{j+2} \times \frac{j+2}{j+3} \times … \times \frac{n-1}{n} = \frac{k}{n}\). Therefore every element is retained with probability \(\frac{k}{n}\).
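Both cases of the proof can be checked with exact rational arithmetic. The sketch below (the function name `retention_probability` is hypothetical) multiplies the selection probability by the "not replaced" factor \(1 - \frac{k}{m} \times \frac{1}{k}\) for every later step m, using `fractions.Fraction` so no rounding is involved.

```python
from fractions import Fraction

def retention_probability(pos, k, n):
    """Exact probability that the pos-th element (1-based) is in the reservoir after n steps."""
    # Probability of entering the reservoir: 1 for the first k elements, k/pos afterwards.
    p = Fraction(1) if pos <= k else Fraction(k, pos)
    # Each later element m is selected with probability k/m and then hits this slot with 1/k.
    for m in range(max(pos, k) + 1, n + 1):
        p *= 1 - Fraction(k, m) * Fraction(1, k)
    return p

k, n = 3, 10
probs = [retention_probability(pos, k, n) for pos in range(1, n + 1)]
print(probs)  # every entry equals Fraction(3, 10)
```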

Code:

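import random

def ReservoirSamplingTest(k):
    N = 10
    pool = [i for i in range(N)]  # simulated data stream: 0..N-1
    result = pool[:k]             # reservoir starts with the first k elements
    for i in range(k,N):
        # pool[i] is the (i+1)-th element; randint is inclusive, so r is uniform over 0..i
        r = random.randint(0,i)
        # true with probability k/(i+1); r is then a uniformly random reservoir slot
        if r < k:
            result[r] = pool[i]
    return result

res = ReservoirSamplingTest(4)
print(res)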
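Because `random.randint(a, b)` includes both endpoints, the upper bound in the replacement step must be `i`: that gives i + 1 equally likely values and hence the required probability k/(i+1) for `pool[i]`, the (i+1)-th element. The simulation below is a sketch (the helper `selection_frequencies` and its `offset` parameter are hypothetical) that runs the same loop many times and reports how often each element ends up in the reservoir, once with `randint(0, i)` and once with `randint(0, i + 1)` for comparison.

```python
import random

def selection_frequencies(k, n, offset, trials=20000, seed=0):
    """Fraction of trials in which each of 0..n-1 ends up in the reservoir.

    offset=0 draws r with randint(0, i); offset=1 draws it with randint(0, i + 1).
    """
    rng = random.Random(seed)
    counts = [0] * n
    for _ in range(trials):
        result = list(range(k))             # reservoir starts with the first k items
        for i in range(k, n):
            r = rng.randint(0, i + offset)  # both endpoints are inclusive
            if r < k:
                result[r] = i
        for x in result:
            counts[x] += 1
    return [c / trials for c in counts]

for offset in (0, 1):
    freqs = selection_frequencies(4, 10, offset)
    print(offset, [round(f, 2) for f in freqs])
```

With offset 0 every frequency should come out close to 4/10 = 0.4. With offset 1 the first four items drift toward 5/11 (about 0.45) and the rest toward 4/11 (about 0.36), so the sample is no longer uniform.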

  



Origin www.cnblogs.com/nxf-rabbit75/p/11545315.html