7-1 Word Frequency Statistics (30 points)

Word frequency statistics

Reference:
The code below is based on a reference solution with some additions and modifications; the general idea is unchanged.
The problem itself is not especially difficult, but many STL structures help solve it, which makes the code very simple.

Problem

7-1 Word Frequency Statistics (30 points)

Write a program that, for a passage of English text, counts the number of distinct words and finds the words whose frequency is in the top 10%.
A "word" is a consecutive string of no more than 80 word characters, but for a word longer than 15 characters only the first 15 word characters are kept. The legal "word characters" are uppercase and lowercase letters, digits, and underscores; all other characters are treated as word separators.
Input format:
The input gives a non-empty passage of text that ends with the symbol #. The input is guaranteed to contain at least 10 different words.
Output format:
Output the number of distinct words in the text on the first line. Note that words are not case-sensitive; for example, "PAT" and "pat" are considered the same word.
Then, in order of decreasing frequency, output the words in the top 10% by frequency in the format frequency:word. In case of a tie, output in increasing lexicographic order.
Sample input:
This is a test.

The word "this" is the word with the highest frequency.

Longlonglonglongword should be cut off, so is considered as the same as longlonglonglonee.  But this_8 is different than this, and this, and this...#
this line should be ignored.
Sample output:
23
5:this
4:is
(Note: although the word "the" also appears 4 times, only the top 10% of the words are output (i.e., the first 2 of the 23 words), and in alphabetical order "the" ranks 3rd, so it is not output.)

Approach

The problem asks for word frequencies, so we record the number of times each word appears.
Because '#' marks the end of the input, we have to read character by character to detect it. How do the characters become a word? Simply append each character read to the end of a string.
Then we can build a table that stores each word together with its number of occurrences.
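
The steps above can be sketched on an in-memory string instead of standard input. This is only an illustration: the function name countWords is hypothetical and not part of the original solution, and it assumes the default "C" locale, where std::isalnum accepts only ASCII letters and digits.

```cpp
#include <cctype>
#include <map>
#include <string>

// Hypothetical helper (not in the original solution): counts the words in
// `text`, ignoring case, and stops at the first '#'.
std::map<std::string, int> countWords(const std::string& text) {
    std::map<std::string, int> freq;
    std::string word;
    for (char c : text) {
        const unsigned char uc = static_cast<unsigned char>(c);
        if (std::isalnum(uc) || c == '_') {
            // Keep only the first 15 characters of a long word.
            if (word.size() < 15) {
                word += static_cast<char>(std::tolower(uc));
            }
        } else {
            // Any other character is a separator and ends the current word.
            if (!word.empty()) {
                ++freq[word];
                word.clear();
            }
            if (c == '#') {
                break;  // everything after '#' is ignored
            }
        }
    }
    // Assumption: if the text has no '#', the last pending word still counts.
    if (!word.empty()) {
        ++freq[word];
    }
    return freq;
}
```

For example, countWords("PAT pat #x") would map "pat" to 2 and never see "x".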

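Structures and functions used

The begin() and end() functions

Almost every container class (vector, map) provides these functions. begin() locates the first element (usually index 0) and end() locates the position just past the last element. Note that for random-access containers such as vector, the returned iterators can be added to and subtracted from directly; map iterators only support stepping with ++ and --.
They are commonly used in initialization or as function arguments: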

vector<pair<string ,int>> v(ma.begin(), ma.end());    //define a vector of pair<string, int> and fill it with every element of ma, from the first to the last
sort(v.begin(), v.end(), cmp);    //sort v from its first to its last element using the custom comparison function cmp (custom comparison functions for sort are covered later)
//example from the C++ Reference
int myints[] = {32,71,12,45,26,80,53,33};
  std::vector<int> myvector (myints, myints+8);               // 32 71 12 45 26 80 53 33 (the first eight numbers are copied into the vector)

  std::sort (myvector.begin(), myvector.begin()+4);           //(12 32 45 71)26 80 53 33 (only the first four numbers are sorted)
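
As a small sketch of the two uses above, the helpers below wrap the same calls in functions. The names sortPrefix and toVector are made up for this illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: sorts only the range [begin, begin + k) and leaves
// the rest of the vector untouched.
std::vector<int> sortPrefix(std::vector<int> v, std::size_t k) {
    k = std::min(k, v.size());  // never step past end()
    std::sort(v.begin(), v.begin() + static_cast<std::ptrdiff_t>(k));
    return v;
}

// Hypothetical helper: copies every key-value pair of a map into a vector
// by passing the iterator range [begin, end) to the vector constructor.
std::vector<std::pair<std::string, int>> toVector(
        const std::map<std::string, int>& m) {
    return std::vector<std::pair<std::string, int>>(m.begin(), m.end());
}
```

Because a map keeps its keys ordered, the vector returned by toVector starts out sorted by word, not by count.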

pair

pair is a template type defined by the C++ standard library that stores two values, possibly of different types, at the same time. You could achieve the same thing with a struct.

pair<string, int> a;    //definition
a.first = "Hello World";    //operate on the first value
a.second = 3;             //operate on the second value (an int, not a string)
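
A brief sketch of how pair behaves when compared may help here. The alias Entry and the helper names are assumptions for this example only. The built-in operator< on pairs compares first and only then second, which is why sorting by word count needs a custom comparison function.

```cpp
#include <string>
#include <utility>

using Entry = std::pair<std::string, int>;  // alias used only in this sketch

// Hypothetical helper: builds a (word, count) pair field by field.
Entry makeEntry(const std::string& word, int count) {
    Entry e;
    e.first = word;
    e.second = count;
    return e;
}

// The built-in comparison orders pairs by `first`, then by `second`.
bool defaultLess(const Entry& a, const Entry& b) {
    return a < b;
}
```

With this default ordering, ("is", 4) comes before ("this", 5) because "is" < "this", regardless of the counts.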

map

Let's look at the definition in the C++ reference:

/*
Maps are associative containers that store elements formed by a combination of a key value and a mapped value, following a specific order.
In a map, the key values are generally used to sort and uniquely identify the elements, while the mapped values store the content associated to this key. 
The types of key and mapped value may differ, and are grouped together in member type value_type, which is a pair type combining both:
*/
 
typedef pair<const Key, T> value_type;

In other words, a map is a container that stores a set of key-value pairs, i.e., two values of possibly different types.
They can be understood as a "key" and the "value of that key" (yes, just like pair).
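
As a minimal sketch of how a map serves as a frequency table, the hypothetical helpers below count a list of words and then read the keys back. The names tally and keysInOrder are not from the original post.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: counts how often each word occurs.
std::map<std::string, int> tally(const std::vector<std::string>& words) {
    std::map<std::string, int> freq;
    for (const std::string& w : words) {
        ++freq[w];  // operator[] inserts the key with value 0 if it is missing
    }
    return freq;
}

// Hypothetical helper: returns the keys in the order the map stores them,
// which is ascending key order.
std::vector<std::string> keysInOrder(const std::map<std::string, int>& freq) {
    std::vector<std::string> keys;
    for (const auto& kv : freq) {
        keys.push_back(kv.first);  // kv is a pair<const string, int>
    }
    return keys;
}
```

Each key appears only once, so the size of the map is the number of distinct words.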

vector (container)

vector can be viewed as an enhanced array for storing multiple values of the same type. Here we introduce sort(), the sorting function that is often used with it.
The following example can be found in the C++ reference:

  int myints[] = {32,71,12,45,26,80,53,33};
  std::vector<int> myvector (myints, myints+8);               //initialized to 32 71 12 45 26 80 53 33 (the first eight elements of myints are copied into the vector)

  // using default comparison (operator <):
  std::sort (myvector.begin(), myvector.begin()+4);           //(12 32 45 71)26 80 53 33

  // using function as comp
  std::sort (myvector.begin()+4, myvector.end(), myfunction); // 12 32 45 71(26 33 53 80)

  // using object as comp
  std::sort (myvector.begin(), myvector.end(), myobject);     //(12 26 32 33 45 53 71 80)
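
The excerpt above uses myfunction and myobject without showing them. The sketch below assumes they are an ordinary "less than" function and a function object, and wraps each call in a helper (sortedWithFunction and sortedWithObject are names invented here) so the result can be inspected.

```cpp
#include <algorithm>
#include <vector>

// Assumed definition: a plain function used as the comparison.
bool myfunction(int i, int j) { return i < j; }

// Assumed definition: a function object used as the comparison.
struct myclass {
    bool operator()(int i, int j) const { return i < j; }
} myobject;

// Hypothetical helper: sorts a copy of v with the function as comparison.
std::vector<int> sortedWithFunction(std::vector<int> v) {
    std::sort(v.begin(), v.end(), myfunction);
    return v;
}

// Hypothetical helper: sorts a copy of v with the object as comparison.
std::vector<int> sortedWithObject(std::vector<int> v) {
    std::sort(v.begin(), v.end(), myobject);
    return v;
}
```

Both helpers give the same ascending result, since both comparisons return true exactly when the first argument is smaller.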

As the example shows, sort() accepts a custom ordering; if none is given, the default is non-descending order.
So how do we write our own function to change the order?
Reference: "[C++] Using vector, from the simplest case to sorting structs with the sort algorithm and a custom comparison function comp"
First, a custom comparison function returns bool. Here is an example:

    bool comp(int a,int b){
        return a>b;
    }

    sort(v.begin(), v.end(), comp);

When sort compares two elements, it passes them to the comp function. The default comparison returns true when a < b, so the result runs from small to large. Our comp returns true when a > b, so the sorted result runs from large to small instead. One way to understand it: after sorting, a is the element in front and b is the element behind, and the custom function defines the relationship that must hold between a and b.
Now let us look at the comparison function used in this solution:

bool cmp(pair<string, int> a, pair <string, int> b) {
    bool result = false;
    if (a.second == b.second&&a.first < b.first) {
        result = true;
    }
    else if (a.second > b.second) {
        result = true;
    }
    return result;
}

sort(v.begin(), v.end(), cmp);

The call above sorts the vector from its first element to its last.
Because the element type of the vector is pair, the comparison function puts the pair with the larger second (number of occurrences) in front; when the second values are equal, the pair with the smaller first (the word) goes in front (ties are output in increasing lexicographic order).
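
The same ordering can be written a little more directly. The sketch below is an alternative formulation, not the author's code; the names Entry, byCountThenWord, and ranked are hypothetical.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

using Entry = std::pair<std::string, int>;  // (word, count), for this sketch

// Higher count first; on equal counts, the lexicographically smaller word first.
bool byCountThenWord(const Entry& a, const Entry& b) {
    if (a.second != b.second) {
        return a.second > b.second;
    }
    return a.first < b.first;
}

// Hypothetical helper: returns a sorted copy of the entries.
std::vector<Entry> ranked(std::vector<Entry> entries) {
    std::sort(entries.begin(), entries.end(), byCountThenWord);
    return entries;
}
```

Applied to the three most frequent words of the sample, ("the", 4), ("this", 5), and ("is", 4), this puts "this" first, then "is", then "the". Taking the arguments by const reference avoids copying the strings on every comparison.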

Code

#include <iostream>
#include <string>
#include <vector>
#include <map>
#include <algorithm>
#include <cstdio>    //for getchar

using namespace std;

bool cmp(pair<string, int> a, pair <string, int> b);

int main() {
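    char ch;
    string s;   //string that holds the word currently being read
    map<string, int> ma;    //map that records word frequency: the word (string) has appeared int times
    do {
        ch = getchar();
        //the character read is a legal word character (letter, digit, or underscore)
        if ((ch >= 'a'&&ch <= 'z') || (ch >= 'A'&&ch <= 'Z') || (ch >= '0'&&ch <= '9') || ch == '_') {
            if (s.size() <= 14) {   //append once more when the length is 14; stop appending once the length reaches 15
                if (ch >= 'A'&&ch <= 'Z') {     //convert uppercase to lowercase
                    ch += 32;
                }
                s += ch;    //append the character ch to s; string overloads +=, so this adds it at the end
            }
        }
        else {      //any other character means the current word has ended, so its count goes up by 1
            if (s.size() > 0) {
                ma[s]++;
            }
            s.clear();      //clear the string so the next word can be read
        }
        if (ch == '#') {    //leave the loop when '#' is read
            break;
        }
    } while (ch != '#');
    vector<pair<string ,int>> v(ma.begin(), ma.end());        //a vector of pairs (think of vector as an enhanced array)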
    sort(v.begin(), v.end(), cmp);
    cout << v.size() << endl;
    int cnt = (int)(ma.size()*0.1);
    for (int i = 0; i < cnt; i++) {
        cout << v[i].second << ":" << v[i].first << endl;
    }
    return 0;
}

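//works on pair data; each pair holds a string value (the word) and an int value (its count)
//returns true when a should come before b: higher count first, then smaller word on ties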
bool cmp(pair<string, int> a, pair <string, int> b) {
    bool result = false;
    if (a.second == b.second&&a.first < b.first) {
        result = true;
    }
    else if (a.second > b.second) {
        result = true;
    }
    return result;
}
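
The output step can also be isolated into small functions, which makes the "top 10%" rule easier to check. In this sketch topCount and topLines are hypothetical names, and it assumes that rounding down is the intended behavior (23 words give 2), expressed with integer division instead of a floating-point multiplication.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: how many words fall in the top 10%, rounded down.
std::size_t topCount(std::size_t distinctWords) {
    return distinctWords / 10;  // integer division, no floating point involved
}

// Hypothetical helper: builds the "frequency:word" lines for the top 10%
// of a list that is already sorted by the comparison function.
std::vector<std::string> topLines(
        const std::vector<std::pair<std::string, int>>& sorted) {
    std::vector<std::string> lines;
    const std::size_t cnt = topCount(sorted.size());
    for (std::size_t i = 0; i < cnt; ++i) {
        lines.push_back(std::to_string(sorted[i].second) + ":" + sorted[i].first);
    }
    return lines;
}
```

For the sample with 23 distinct words, topCount(23) is 2, so only the first two entries of the sorted list are turned into output lines.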

Origin www.cnblogs.com/luoyang0515/p/10991972.html