Verifying Benford's Law with a Program

I. Definition

Benford's Law, also known as the first-digit law, describes how often each leading digit occurs in collections of numbers taken from real life. Numbers whose first digit is 1 make up about 30 percent of the total, nearly three times the expected value of 1/9. More generally, the larger the digit, the lower the probability that it appears as the leading digit. The law can be used to check many kinds of data for fraud. [1]

II. Mathematics

Benford's Law states that in a base-b number system, the probability that a number begins with the digit n is

P(n) = log_b(n + 1) - log_b(n) = log_b(1 + 1/n)

Benford's Law applies not only to the single leading digit; it can also be used for leading groups of more than one digit.
Probability of each leading digit in base 10 (in percent, to one decimal place):

d p
1 30.1%
2 17.6%
3 12.5%
4 9.7%
5 7.9%
6 6.7%
7 5.8%
8 5.1%
9 4.6%
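
The table can be reproduced from the formula P(n) = log_b(1 + 1/n). The following is a minimal sketch; the function name benfordProbability is chosen here for illustration and does not come from the original program. Passing a multi-digit value such as 10 gives the probability of a leading group of digits.

```php
<?php
// Probability that a number written in base $base begins with the digits $n.
// benfordProbability is an illustrative name, not part of the original article.
function benfordProbability(int $n, int $base = 10): float
{
    return log(1 + 1 / $n, $base);
}

// Leading single digits 1..9 in base 10, in percent to one decimal place.
for ($d = 1; $d <= 9; $d++) {
    echo $d . ' ' . number_format(benfordProbability($d) * 100, 1) . "%\n";
}

// Leading two-digit group "10" (numbers such as 10, 104, 1023, ...).
echo '10 ' . number_format(benfordProbability(10) * 100, 1) . "%\n";
```

The nine single-digit probabilities sum to 1, because the logarithms telescope to log_b(b).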

III. Proof

Strictly speaking, Benford's Law has no generally accepted proof so far.

Most data sets satisfy it, but some do not, for example uniformly distributed data. The reasoning below is a heuristic argument.

1. For many kinds of data, growth is proportional to the current stock (as with bank deposits: the more you deposit, the more interest you earn), which gives the formula:

ΔN / (N * Δt) = const (constant)

where ΔN is the increment, Δt is the unit of time, and N is the stock.

2. Such growth is exponential, i.e. over equal time intervals the stock is multiplied by the same factor:

N=N0*e^(ct)

where t is the time required for the stock to grow from N0 to N, and c is a constant.

It follows that the time needed to grow from N1 to N2 is:

t = c'lg(N2/N1)

where lg is the base-10 logarithm and c' = ln(10) / c is again a constant.

3. Computing the time tn during which the leading digit is n, that is, the time needed to grow from n to n + 1:

t1 = c'lg(2)

t2 = c'lg(3/2)

...

tn = c'lg((n + 1) / n)

The total time for the leading digit to run from 1 through 9 is:

t = t1 + t2 + ... + t9 = c'lg(10) = c'

P1 = t1 / t = c'lg(2) / c' = lg(2) = lg((1 + 1) / 1) ≈ 30.1%

Pn = tn / t = lg((n + 1) / n)
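
The argument above can be checked numerically: sample an exponentially growing quantity at evenly spaced times and count how often each leading digit is seen. This is only a sketch; the function name growthLeadingDigitShares and the values chosen for N0, c, and the number of steps are assumptions made for illustration.

```php
<?php
// Share of evenly spaced sample times at which N = N0 * e^(c*t) has each leading digit.
// The function name and all parameter values are illustrative assumptions.
function growthLeadingDigitShares(float $n0, float $c, int $steps): array
{
    $counts = array_fill(1, 9, 0);
    for ($t = 0; $t < $steps; $t++) {
        $n = $n0 * exp($c * $t);
        // Scale N into [1, 10) so that its integer part is the leading digit.
        $scaled = $n / pow(10, floor(log10($n)));
        // Clamp to 1..9 to absorb floating-point rounding at powers of 10.
        $digit = min(9, max(1, (int) floor($scaled)));
        $counts[$digit]++;
    }
    $shares = array();
    foreach ($counts as $digit => $count) {
        $shares[$digit] = $count / $steps;
    }
    return $shares;
}

// c is chosen so that 1000 steps cover one factor of 10; 20000 steps cover 20 factors of 10.
$shares = growthLeadingDigitShares(1.0, log(10) / 1000, 20000);
foreach ($shares as $digit => $share) {
    printf("%d: observed %.3f, lg((n + 1) / n) = %.3f\n", $digit, $share, log10(($digit + 1) / $digit));
}
```

Covering a whole number of factors of 10 keeps the comparison fair; stopping partway through a decade would overweight the smaller digits.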

This brings to mind a saying of our ancestors, that things are hardest at the start; perhaps this is what it means. Again, the law has no generally accepted proof so far; it is simply that a great deal of data conforms to Benford's Law.

IV. Verification

Verifying Benford's Law places certain requirements on the numbers: they must be irregular, unordered data, such as national populations or GDP figures.

The verification below uses the Fibonacci sequence and random numbers.

1. Verification with the Fibonacci sequence

PHP:

<?php
// Count the leading digits of the first $size Fibonacci numbers.
$size = 1000;
$arr = array(1, 2);
for ($i = 2; $i < $size; $i++) {
    // Terms beyond PHP_INT_MAX become floats; only the leading digit matters here.
    $arr[] = $arr[$i - 1] + $arr[$i - 2];
}
// $sum[d] counts the terms whose leading digit is d (index 0 stays unused).
$sum = array_fill(0, 10, 0);
foreach ($arr as $value) {
    // Large floats print as e.g. "1.2E+19", so the first character is still the leading digit.
    $index = (int) substr((string) $value, 0, 1);
    $sum[$index]++;
}
print_r($sum);
for ($n = 1; $n < count($sum); $n++) {
    echo "leading digit {$n}, ratio " . round($sum[$n] / $size, 2) . "\n";
}
?>

Output:

Array
(
    [0] => 0
    [1] => 300
    [2] => 177
    [3] => 125
    [4] => 96
    [5] => 80
    [6] => 67
    [7] => 57
    [8] => 53
    [9] => 45
)
leading digit 1, ratio 0.3
leading digit 2, ratio 0.18
leading digit 3, ratio 0.13
leading digit 4, ratio 0.1
leading digit 5, ratio 0.08
leading digit 6, ratio 0.07
leading digit 7, ratio 0.06
leading digit 8, ratio 0.05
leading digit 9, ratio 0.05
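
The program above lets large Fibonacci numbers overflow into floating point, which keeps the leading digit but not the exact value. As a cross-check, the sketch below does the same count with exact arithmetic on decimal strings. Both helper names, addDecimalStrings and fibonacciLeadingDigitCounts, are hypothetical and introduced only for this example; it starts from 1, 2 as the program above does.

```php
<?php
// Add two non-negative decimal integers given as digit strings.
function addDecimalStrings(string $x, string $y): string
{
    $result = '';
    $carry = 0;
    $i = strlen($x) - 1;
    $j = strlen($y) - 1;
    while ($i >= 0 || $j >= 0 || $carry > 0) {
        $digitSum = $carry;
        if ($i >= 0) {
            $digitSum += (int) $x[$i];
            $i--;
        }
        if ($j >= 0) {
            $digitSum += (int) $y[$j];
            $j--;
        }
        $result = ($digitSum % 10) . $result;
        $carry = intdiv($digitSum, 10);
    }
    return $result;
}

// Count the leading digits of the first $size terms of 1, 2, 3, 5, 8, ...
function fibonacciLeadingDigitCounts(int $size): array
{
    $counts = array_fill(1, 9, 0);
    $prev = '1';
    $curr = '2';
    for ($i = 0; $i < $size; $i++) {
        // The first character of the digit string is the leading digit.
        $counts[(int) $prev[0]]++;
        $next = addDecimalStrings($prev, $curr);
        $prev = $curr;
        $curr = $next;
    }
    return $counts;
}

print_r(fibonacciLeadingDigitCounts(1000));
```

Working on strings avoids any dependence on how PHP formats large floats, at the cost of slower additions.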

 

2. Verification with random numbers. Note that the numbers generated are pseudo-random, and that a random growth rate is applied at each step. Also note that a value that grows too long could overflow, so a loop is added to keep the value below 10 to the 15th power.

PHP:

<?php
$size = 1000;
$grow = 80000; // divisor that sets the spread of the growth rate
// Start from a nonzero pseudo-random value so that the growth never gets stuck at 0.
$a = mt_rand(1, 32767);
$sum = array_fill(0, 10, 0);
for ($i = 0; $i < $size; $i++) {
    // Simulate a natural growth rate; 80000 can be changed.
    // Assumption: the random range is 0..32767, which 16384 centers, so $k lies in about [0.8, 1.2].
    $k = (mt_rand(0, 32767) - 16384) / $grow + 1;
    $a = $a + $k * $a;
    while ($a >= 1e15) {
        // Lowering the magnitude has no effect on the leading digit.
        $a /= 10;
    }
    // The first character of the printed value is the leading digit.
    $index = (int) substr((string) $a, 0, 1);
    $sum[$index]++;
}
print_r($sum);
for ($n = 1; $n < count($sum); $n++) {
    echo "leading digit {$n}, ratio " . round($sum[$n] / $size, 2) . "\n";
}
?>

Output:

Array
(
    [0] => 0
    [1] => 303
    [2] => 176
    [3] => 121
    [4] => 111
    [5] => 89
    [6] => 65
    [7] => 54
    [8] => 36
    [9] => 45
)
leading digit 1, ratio 0.3
leading digit 2, ratio 0.18
leading digit 3, ratio 0.12
leading digit 4, ratio 0.11
leading digit 5, ratio 0.09
leading digit 6, ratio 0.07
leading digit 7, ratio 0.05
leading digit 8, ratio 0.04
leading digit 9, ratio 0.05
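
One way to put a number on how close these counts are to Benford's Law is Pearson's chi-square statistic. This is a technique added here for illustration, not something the original programs compute, and benfordChiSquare is a hypothetical helper name. The counts are copied from the two outputs above.

```php
<?php
// Pearson chi-square statistic of observed leading-digit counts against Benford's Law.
// $observed maps the digits 1..9 to counts; the function name is illustrative.
function benfordChiSquare(array $observed): float
{
    $total = array_sum($observed);
    $chi2 = 0.0;
    for ($d = 1; $d <= 9; $d++) {
        $expected = $total * log10(1 + 1 / $d);
        $chi2 += pow($observed[$d] - $expected, 2) / $expected;
    }
    return $chi2;
}

// Counts copied from the two outputs in this article (keys run from 1 to 9).
$fibonacci = array(1 => 300, 177, 125, 96, 80, 67, 57, 53, 45);
$random    = array(1 => 303, 176, 121, 111, 89, 65, 54, 36, 45);

printf("Fibonacci: %.2f\n", benfordChiSquare($fibonacci));
printf("Random:    %.2f\n", benfordChiSquare($random));
```

With nine digit classes there are 8 degrees of freedom, for which the 5% critical value of the chi-square distribution is about 15.51. Treat the comparison as a rough indication only, since consecutive terms of either sequence are not independent samples.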

 

V. Conclusion

For both the Fibonacci sequence and the random numbers, the results come out fairly close to Benford's Law, which is why in most cases Benford's Law can be used to check for falsified data.

 

References:

[1] Benford's Law


Origin www.cnblogs.com/lyc94620/p/12079472.html