1. Original problem
2. Perl script
print "================ Method 1=====================\n";
open(IN, '<', 'Anna-karenina.txt') or die "Cannot open Anna-karenina.txt: $!";
while (<IN>) {
    chomp;
    $line = $_;
    $line =~ s/[\,;:.?!'"(){}\[\]]/ /g;   # replace periods, commas, brackets, etc. with spaces
    # print("$line\n");
    @words = split(' ', $line);           # split on whitespace; ' ' also skips leading whitespace
    foreach $word (@words) {
        $counts{lc($word)}++;             # store each word's count in the hash
    }
}
foreach $word (sort keys %counts) {
    print "$word,$counts{$word}\n";       # print each word and how often it appears
}
close IN;

print "================ Method 2=====================\n";
open(IN, '<', 'Anna-karenina.txt') or die "Cannot open Anna-karenina.txt: $!";
while (my $line = <IN>) {
    # map { $words{lc($_)}++ } $line =~ /(\w+)/g;   # equivalent to the loop below
    # print($line =~ /(\w+)/g);
    foreach ($line =~ /(\w+)/g) {         # iterate over every word matched in the line
        # print("$_\n");
        $words{lc($_)}++;
    }
}
for (sort keys(%words)) {
    print "$_: $words{$_}\n";
}
close IN;
3. Results
1) test text
All happy families resemble one another; every unhappy family is unhappy in its own way. All was confusion in the house of Oblonskys. happy? happy: [happy] {happy} "happy" 'happy'
2) Output
================ Method 1=====================
all,2
another,1
confusion,1
every,1
families,1
family,1
happy,7
house,1
in,2
is,1
its,1
oblonskys,1
of,1
one,1
own,1
resemble,1
the,1
unhappy,2
was,1
way,1
================ Method 2=====================
all: 2
another: 1
confusion: 1
every: 1
families: 1
family: 1
happy: 7
house: 1
in: 2
is: 1
its: 1
oblonskys: 1
of: 1
one: 1
own: 1
resemble: 1
the: 1
unhappy: 2
was: 1
way: 1
4. Key points
1) To replace any one of several characters, list them in a character class (square brackets):
$line =~ s/[\,;:.?!'"(){}\[\]]/ /g; # replace periods, commas, brackets, etc. with spaces
2) Convert each word to lowercase with lc and count it in a hash:
$counts{lc($word)}++; # store each word's count in the hash
3) %counts refers to the whole hash, keys %counts returns its keys, and sort puts them in order:
sort keys %counts
4) Method 2 uses $line =~ /(\w+)/g, which in list context returns the list of all words in the line directly, so no separate substitution and split step is needed.
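
The points above can be combined into one short, self-contained sketch. It is only an illustration, not the script from section 2: the helper name count_words and the sample string are assumptions made here, and it counts words in a string rather than reading Anna-karenina.txt.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): count the words
# in a string case-insensitively and return a reference to the counts hash.
sub count_words {
    my ($text) = @_;
    my %counts;
    # In list context, /(\w+)/g returns every captured word at once,
    # so punctuation never has to be replaced explicitly.
    $counts{ lc $_ }++ for $text =~ /(\w+)/g;
    return \%counts;
}

my $counts = count_words(q{happy? happy: [happy] "Happy" unhappy});

# keys returns the hash keys in no particular order; sort orders them.
for my $word (sort keys %$counts) {
    print "$word: $counts->{$word}\n";
}
```

Using a lexical hash inside a subroutine keeps the counts from one text separate from the next, which the two global hashes %counts and %words in the script do by having different names.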