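Description
Build an inverted index for the given documents.
The data is guaranteed to contain no punctuation.
Examples
Given a list of documents, each with an id and content (the Document class is provided).
Return an inverted index (a hash map whose key is a word and whose value is the list of ids of the documents containing that word).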
Example 1:
Input:
[
  {
    "id": 1,
    "content": "This is the content of document 1 it is very short"
  },
  {
    "id": 2,
    "content": "This is the content of document 2 it is very long bilabial bilabial heheh hahaha …"
  },
]
Output:
{
  "This": [1, 2],
  "is": [1, 2],
  …
}
Example 2:
Input:
[
  {
    "id": 1,
    "content": "you are young"
  },
  {
    "id": 2,
    "content": "you are handsome"
  },
]
Output:
{
  "are": [1, 2],
  …
}
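Approach
Split each document's content into whitespace-separated words. For every word, look it up in the map (inserting it if absent) and append the document id to that word's vector. Finally, remove the duplicate ids from each vector. Each document is processed in full before the next one, so a word that occurs several times in one document produces a run of adjacent equal ids (assuming document ids are distinct), and std::unique followed by erase is enough to drop them without sorting.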
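The last step depends on std::unique collapsing only runs of equal adjacent elements. A minimal sketch of that step in isolation; the helper name dedupAdjacent is hypothetical and is not part of the solution itself.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: collapses runs of equal adjacent ids.
// It does not sort, so duplicates that are not adjacent would survive.
std::vector<int> dedupAdjacent(std::vector<int> ids) {
    // unique() shifts the kept elements to the front and returns the new
    // logical end; erase() then drops the leftover tail.
    ids.erase(std::unique(ids.begin(), ids.end()), ids.end());
    return ids;
}
```

A word appearing twice in document 2 yields ids such as 2, 2, which collapse to a single 2. An id sequence like 1, 2, 1 would be left unchanged, which is why the adjacency argument matters.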
Code
#include <algorithm>
#include <map>
#include <sstream>
#include <string>
#include <vector>

using namespace std;

/**
 * Definition of Document:
 * class Document {
 * public:
 *     int id;
 *     string content;
 * }
 */
class Solution {
public:
    /**
     * @param docs a list of documents
     * @return an inverted index
     */
    map<string, vector<int>> invertedIndex(vector<Document>& docs) {
        map<string, vector<int>> m;
        for (size_t i = 0; i < docs.size(); i++) {
            int m_id = docs[i].id;
            // A fresh stream per document; operator>> splits on whitespace.
            istringstream ss(docs[i].content);
            for (string str; ss >> str; ) {
                // operator[] creates an empty vector the first time a word is seen.
                m[str].push_back(m_id);
            }
        }
        // Repeated ids from the same document are adjacent, so unique() removes them.
        for (map<string, vector<int>>::iterator it = m.begin(); it != m.end(); it++) {
            it->second.erase(unique(it->second.begin(), it->second.end()), it->second.end());
        }
        return m;
    }
};
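As a possible variation (not the approach used above), the second pass can be avoided by deduplicating at insertion time: skip the push when the last stored id already equals the current document id. The sketch below is self-contained under some assumptions: the Document struct is a stand-in that mirrors the commented definition, and the function name invertedIndexOnePass is hypothetical.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Stand-in for the judge-provided class, mirroring the commented definition.
struct Document {
    int id;
    std::string content;
};

// Hypothetical variation: deduplicate while inserting instead of afterwards.
std::map<std::string, std::vector<int>> invertedIndexOnePass(
        const std::vector<Document>& docs) {
    std::map<std::string, std::vector<int>> index;
    for (const Document& doc : docs) {
        std::istringstream words(doc.content);
        for (std::string word; words >> word; ) {
            std::vector<int>& ids = index[word];
            // Ids arrive grouped by document, so checking the last entry is enough
            // (this assumes document ids are distinct).
            if (ids.empty() || ids.back() != doc.id) {
                ids.push_back(doc.id);
            }
        }
    }
    return index;
}
```

Called on the two documents of Example 2, this maps "you" and "are" to both ids and "young" and "handsome" to one id each. The trade-off is one extra comparison per word in exchange for not revisiting every vector at the end.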