PAT 1063 Set Similarity (25 points): Timeouts When Using set and map

While solving 1063 Set Similarity (25 points), my solution based on unordered_set and unordered_map ran into a timeout.

Given two sets of integers, the similarity of the sets is defined to be Nc/Nt ×100%, where Nc is the number of distinct common numbers shared by the two sets, and Nt is the total number of distinct numbers in the two sets. Your job is to calculate the similarity of any given pair of sets.

Input Specification:

Each input file contains one test case. Each case first gives a positive integer N (≤50) which is the total number of sets. Then N lines follow, each gives a set with a positive M (≤10⁴) and followed by M integers in the range [0, 10⁹]. After the input of sets, a positive integer K (≤2000) is given, followed by K lines of queries. Each query gives a pair of set numbers (the sets are numbered from 1 to N). All the numbers in a line are separated by a space.

Output Specification:

For each query, print in one line the similarity of the sets, in the percentage form accurate up to 1 decimal place.

Sample Input:

3
3 99 87 101
4 87 101 5 87
7 99 101 18 5 135 18 99
2
1 2
1 3

Sample Output:

50.0%
33.3%

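Meaning of the problem: Nc is the number of distinct values shared by the two sets. The input "sets" are not sets in the strict sense, because a line may repeat a value (for example, 87 appears twice in set 2 of the sample), so a common value could be counted more than once; Nc counts each shared value only once. Nt is the number of distinct values across both sets: merge the two sets and count how many different elements the result contains. The similarity is Nc/Nt.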

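My initial idea was to store each set directly in a set container. To compute a similarity, traverse the first set and record every element in a map; then traverse the second set and count the elements that were already recorded, which gives the number of common elements. The sum of the two set sizes minus that count is the number of distinct elements overall, and dividing the two gives the similarity.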

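#include <bits/stdc++.h>
using namespace std;
int main() {
    int n, k, a, b;
    scanf("%d", &n);
    // v[i] holds the distinct elements of set i (sets are numbered from 1)
    vector<unordered_set<int> > v(n + 1);
    for (int i = 1; i <= n; i++) {
        scanf("%d", &k);
        for (int j = 0; j < k; j++) {
            scanf("%d", &a);
            v[i].insert(a);  // repeated values are stored only once
        }
    }
    scanf("%d", &k);
    while (k--) {
        scanf("%d%d", &a, &b);
        // a brand-new hash map is built for every query
        unordered_map<int, bool> vis;
        for (auto it : v[a]) vis[it] = true;
        int cnt = 0;  // number of common elements (Nc)
        for (auto it : v[b])
            if (vis[it] == true) cnt++;  // note: operator[] inserts `it` when absent
        // Nt = |A| + |B| - Nc
        printf("%.1f%%\n", cnt * 1.0 / (v[a].size() + v[b].size() - cnt) * 100);
    }
    return 0;
}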

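However, the last test case timed out and the submission scored only 21 points. After looking at other solutions, I improved the code: instead of building a map, look up each element of one set directly in the other set to count the common elements.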

Only the query part of the code changes:

while (k--) {
    scanf("%d%d", &a, &b);
    int nc = 0, nt = v[b].size();  // Nt starts at |B|
    for (auto it : v[a]) {
        if (v[b].find(it) == v[b].end()) nt++;  // not found in B: total + 1
        else nc++;                              // found in B: one more common element
    }
    printf("%.1f%%\n", nc * 1.0 / nt * 100);
}

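Analysis: I had assumed the first approach was simple and fast. In fact, the unordered_map version traverses both sets on every query and builds a new hash table each time; every insertion into unordered_map costs hashing plus, typically, a node allocation, and vis[it] in the second loop also inserts each element of B that is not in A. The second approach traverses only one set and looks its elements up in the other. Since unordered_set is implemented as a hash table, find() runs in expected constant time, so each query costs O(|A|) expected time and all K queries together cost O(K·M), where M is the size of a set (at most 10⁴). Both versions are linear in the set sizes; the gain comes from dropping the per-query table construction, which is a much larger constant factor.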

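Summary: for similar lookup problems, calling find() directly on the existing unordered_set is the right choice, and the same holds for unordered_map. If ordering is needed, pick a container according to the specific situation.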

Final code:

#include <bits/stdc++.h>
using namespace std;
int main() {
    int n, k, a, b;
    scanf("%d", &n);
    // v[i] holds the distinct elements of set i (sets are numbered from 1)
    vector<unordered_set<int> > v(n + 1);
    for (int i = 1; i <= n; i++) {
        scanf("%d", &k);
        for (int j = 0; j < k; j++) {
            scanf("%d", &a);
            v[i].insert(a);  // repeated values are stored only once
        }
    }
    scanf("%d", &k);
    while (k--) {
        scanf("%d%d", &a, &b);
        int nc = 0, nt = v[b].size();  // Nt starts at |B|
        for (auto it : v[a]) {
            if (v[b].find(it) == v[b].end()) nt++;  // only in A: one more distinct element
            else nc++;                              // in both: one more common element
        }
        printf("%.1f%%\n", nc * 1.0 / nt * 100);
    }
    return 0;
}

Jin_zc
