Counting how many times each word appears in a text (NVIDIA 2019 written test) - Perl

1, The original problem

 

 

2, Perl script

use strict;
use warnings;

print "================ Method 1=====================\n";
open(IN, '<', 'Anna-karenina.txt') or die "Cannot open Anna-karenina.txt: $!";
my %counts;
while (<IN>) {
        chomp;
        my $line = $_;
        $line =~ s/[\,;:.?!'"(){}\[\]]/ /g;   # replace periods, commas, brackets, quotes, etc. with spaces
        #print("$line\n");
        my @words = split(' ', $line);         # split on runs of whitespace (leading blanks are ignored)
        foreach my $word (@words) {
                $counts{lc($word)}++;          # count each word, in lowercase, in the hash
        }
}

foreach my $word (sort keys %counts) {
        print "$word,$counts{$word}\n";        # print each word and the number of times it appears
}
close(IN);


print "================ Method 2=====================\n";
open(IN, '<', 'Anna-karenina.txt') or die "Cannot open Anna-karenina.txt: $!";
my %words;
while (my $line = <IN>)
{
        # map { $words{lc $_}++ } $line =~ /(\w+)/g;   # equivalent to the foreach loop below

        #print($line =~ /(\w+)/g);
        foreach ($line =~ /(\w+)/g) {          # loop over every word matched in the line
                #print("$_\n");
                $words{lc($_)}++;
        }
}
close(IN);
for (sort keys(%words))
{
    print "$_: $words{$_}\n";
}
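To try both methods without the Anna-karenina.txt file, here is a minimal sketch that wraps each method in a subroutine and runs it on the test text from section 3. The subroutine names count_words_method1 and count_words_method2 are hypothetical (not part of the original script), and the sketch assumes the whole text is passed in as one string.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring Method 1: turn punctuation into spaces,
# split on whitespace, and count each word in lowercase.
sub count_words_method1 {
    my ($text) = @_;
    my %counts;
    for my $line (split /\n/, $text) {
        (my $clean = $line) =~ s/[\,;:.?!'"(){}\[\]]/ /g;
        $counts{lc($_)}++ for split ' ', $clean;
    }
    return \%counts;
}

# Hypothetical helper mirroring Method 2: count every \w+ run in the text.
sub count_words_method2 {
    my ($text) = @_;
    my %counts;
    $counts{lc($_)}++ for $text =~ /(\w+)/g;
    return \%counts;
}

# The test text from section 3, embedded as a heredoc.
my $sample = <<'END';
All happy families resemble one another; every unhappy family is unhappy in its own way.
All was confusion in the house of Oblonskys. happy? happy: [happy] {happy} "happy" 'happy'
END

my $c1 = count_words_method1($sample);
my $c2 = count_words_method2($sample);

# Same "word,count" format as Method 1.
print "$_,$c1->{$_}\n" for sort keys %$c1;
```

On this sample both helpers produce the same 20 words, with happy counted 7 times.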

 

3. Results

1) Test text

All happy families resemble one another; every unhappy family is unhappy in its own way.
All was confusion in the house of Oblonskys. happy? happy: [happy] {happy} "happy" 'happy'

2) Output

================ Method 1=====================
all,2
another,1
confusion,1
every,1
families,1
family,1
happy,7
house,1
in,2
is,1
its,1
oblonskys,1
of,1
one,1
own,1
resemble,1
the,1
unhappy,2
was,1
way,1
================ Method 2=====================
all: 2
another: 1
confusion: 1
every: 1
families: 1
family: 1
happy: 7
house: 1
in: 2
is: 1
its: 1
oblonskys: 1
of: 1
one: 1
own: 1
resemble: 1
the: 1
unhappy: 2
was: 1
way: 1

4, Key points

1) To replace any of several characters at once, list them inside square brackets (a character class):

  $line =~ s/[\,;:.?!'"(){}\[\]]/ /g;   # replace periods, commas, brackets, quotes, etc. with spaces
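A small sketch of the substitution on one line of the test text. A character class matches any one of the listed characters, so a single s///g handles every punctuation mark; the square brackets themselves are escaped as \[ and \]. The variable names are illustrative only.

```perl
use strict;
use warnings;

# Second half of the test text's second line.
my $line = q{happy? happy: [happy] {happy} "happy" 'happy'};

# Copy first, then replace every listed punctuation character with a space.
(my $cleaned = $line) =~ s/[\,;:.?!'"(){}\[\]]/ /g;

print "$cleaned\n";
```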

2) Convert each word to lowercase with lc, then count it in a hash:

  $counts{lc($word)}++;   # count each word, in lowercase, in the hash
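A minimal sketch of the counting idiom, using a made-up word list. A key that is not yet in the hash starts out undefined, and ++ treats undef as 0 without a warning, so no initialisation is needed; lc folds "All", "all" and "ALL" into one key.

```perl
use strict;
use warnings;

my %counts;
for my $word (qw(All all ALL happy Happy)) {
    $counts{lc($word)}++;   # undef + 1 == 1 on first sight
}

print "$_,$counts{$_}\n" for sort keys %counts;
```

This prints all,3 and happy,2.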

3) %counts refers to the whole hash, keys %counts returns its keys, and sort puts them in order:

  sort keys %counts
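keys returns the keys in an unspecified order, and sort with no block compares them as strings, which is why both outputs above are alphabetical. The sketch below shows that, plus a variant that is not in the original script: ordering by count (most frequent first) with ties broken alphabetically. The counts are taken from the output above.

```perl
use strict;
use warnings;

my %counts = (happy => 7, all => 2, in => 2, the => 1);

# Default string order, as in both methods.
my @alphabetical = sort keys %counts;

# Variant: descending by count, then alphabetical for ties.
my @by_frequency = sort { $counts{$b} <=> $counts{$a} || $a cmp $b } keys %counts;

print "@alphabetical\n";
print "@by_frequency\n";
```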

4) Method 2 uses $line =~ /(\w+)/g, which in list context returns every word matched in the line, so the line is turned into a list of words directly
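A short sketch of m//g in list context on a sentence from the test text. Each captured \w+ run becomes one list element, and trailing punctuation is simply never matched. Because \w covers only letters, digits and underscore, a word with an apostrophe such as "don't" is split into two pieces; the example string is illustrative.

```perl
use strict;
use warnings;

my $line = "All was confusion in the house of Oblonskys.";

# List context: the match returns all captured words at once.
my @words = $line =~ /(\w+)/g;
print join('|', @words), "\n";

# The apostrophe is not a \w character, so this yields ("don", "t").
my @parts = "don't" =~ /(\w+)/g;
print join('|', @parts), "\n";
```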

 


Source: www.cnblogs.com/wt-seu/p/12368915.html