LeetCode 839. Similar String Groups: Solution

Problem

Two strings X and Y are similar if we can swap two letters (in different positions) of X, so that it equals Y.

For example, “tars” and “rats” are similar (swapping at positions 0 and 2), and “rats” and “arts” are similar, but “star” is not similar to “tars”, “rats”, or “arts”.
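The similarity test described above can be sketched in Go as follows. This is a minimal sketch, not part of the problem statement: the helper name isSimilar is hypothetical, and it assumes that two identical strings do not count as similar (a swap must actually change the string).

```go
package main

// isSimilar reports whether y can be obtained from x by swapping two
// letters at different positions. isSimilar is a hypothetical helper name.
// Assumption: identical strings are reported as not similar.
func isSimilar(x, y string) bool {
	if len(x) != len(y) {
		return false
	}
	diff := make([]int, 0, 2) // positions where x and y disagree
	for k := 0; k < len(x); k++ {
		if x[k] != y[k] {
			diff = append(diff, k)
			if len(diff) > 2 {
				return false // more than two mismatches: one swap cannot fix it
			}
		}
	}
	// Exactly two mismatches, and the letters at those positions are crossed.
	return len(diff) == 2 &&
		x[diff[0]] == y[diff[1]] && x[diff[1]] == y[diff[0]]
}
```

With this sketch, isSimilar("tars", "rats") and isSimilar("rats", "arts") hold, while "star" differs from "tars" in four positions and "tars" differs from "arts" in three, so neither pair is similar.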

Together, these form two connected groups by similarity: {“tars”, “rats”, “arts”} and {“star”}. Notice that “tars” and “arts” are in the same group even though they are not similar. Formally, each group is such that a word is in the group if and only if it is similar to at least one other word in the group.

We are given a list A of strings. Every string in A is an anagram of every other string in A. How many groups are there?

Example 1:

Input: ["tars","rats","arts","star"]
Output: 2

Note:

  1. A.length <= 2000
  2. A[i].length <= 1000
  3. A.length * A[i].length <= 20000
  4. All words in A consist of lowercase letters only.
  5. All words in A have the same length and are anagrams of each other.
  6. The judging time limit has been increased for this question.

Approach and Solution

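By the problem statement, a group is a set of strings such that for any string X in the set there is a string Y in the set that X can be turned into by swapping the positions of two letters. So every string X in a group has some similar Y in it (a string that is similar to no other string forms a group by itself). We can model this as a graph: represent each string as a node in V and connect every pair of similar strings with an edge in E, which gives a graph G. Any two strings in the same group are then joined by a path, so the subgraph for a group is connected. Conversely, every connected component corresponds to a group that satisfies the problem's requirement. The original question, "how many distinct groups of strings are there", therefore becomes "how many connected components does the graph of all strings have". The implementation is as follows: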

  1. Build the graph
    Use a double loop over $0 \sim A.length$ to enumerate every pair of strings (nodes) $(A[i], A[j])$, then an inner loop over $0 \sim \min(A[i].length, A[j].length)$ to find the positions where $A[i][k] \neq A[j][k]$, recording each such $k$ in the array $diffIndex$.
    When $diffIndex.length = 2$, with $diffIndex = [k_0, k_1]$, and $A[i][k_0] = A[j][k_1]$ and $A[i][k_1] = A[j][k_0]$, the strings (nodes) A[i] and A[j] are similar, so we add an edge from A[i] to A[j]. The time complexity is $O(A.length^2 \cdot A[i].length)$.
  2. Traverse the graph with DFS
    Starting from an unvisited node of the graph, run a depth-first search that visits and marks every node reachable from it; the marked set is one connected component. Then repeat from another unvisited node to obtain the next component, and so on. Finally, count the number of connected components.
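As an alternative to the DFS in step 2, the components can also be counted with a union-find (disjoint-set) structure. This is a swapped-in technique, not the method used in this post, shown only as a minimal sketch. The names countGroups, find and linked are hypothetical, and the sketch assumes that duplicate strings in the input belong to the same group.

```go
package main

// find returns the representative of x's set, halving the path as it goes.
func find(parent []int, x int) int {
	for parent[x] != x {
		parent[x] = parent[parent[x]]
		x = parent[x]
	}
	return x
}

// linked reports whether x and y should be merged directly: they are
// identical (assumption: duplicates share a group) or differ by one swap.
func linked(x, y string) bool {
	if len(x) != len(y) {
		return false
	}
	first, second, count := -1, -1, 0
	for k := 0; k < len(x); k++ {
		if x[k] != y[k] {
			count++
			switch count {
			case 1:
				first = k
			case 2:
				second = k
			default:
				return false // three or more mismatches
			}
		}
	}
	if count == 0 {
		return true
	}
	return count == 2 && x[first] == y[second] && x[second] == y[first]
}

// countGroups merges every linked pair and returns the number of sets left.
func countGroups(words []string) int {
	parent := make([]int, len(words))
	for i := range parent {
		parent[i] = i
	}
	groups := len(words)
	for i := 0; i < len(words); i++ {
		for j := i + 1; j < len(words); j++ {
			if !linked(words[i], words[j]) {
				continue
			}
			ri, rj := find(parent, i), find(parent, j)
			if ri != rj {
				parent[ri] = rj // merge the two components
				groups--
			}
		}
	}
	return groups
}
```

For the example input ["tars","rats","arts","star"], two merges happen (tars with rats, rats with arts), so countGroups returns 2. The pairwise comparison cost is the same as in step 1; only the bookkeeping for components changes.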

Implementation

I implemented the algorithm in Go:


func numSimilarGroups(A []string) int {
	// Store the graph in Go's built-in map: graph[str1] lists the strings adjacent to str1.
	// visited records whether each node has already been visited.
	var graph = make(map[string][]string)
	var visited = make(map[string]bool)
	// Preprocessing: drop duplicate strings so that building the graph
	// does not waste time and space on them.
	B := make([]string, 0)
	for _, str1 := range A {
		flag := false
		for _, str2 := range B {
			if str2 == str1 {
				flag = true
				break
			}
		}
		if !flag {
			B = append(B, str1)
		}
	}
	// Build the graph with a triple loop.
	for i, str1 := range B {
		for j, str2 := range B {
			// i == j skips self-loops (the str1 == str2 check is redundant after deduplication).
			if i == j || str1 == str2 {
				continue
			}
			if length := len(str1); length == len(str2) {
				// Collect the positions where the two strings differ.
				diffIndex := make([]int, 0)
				for k := 0; k < length; k++ {
					if str1[k] != str2[k] {
						diffIndex = append(diffIndex, k)
					}
				}
				// Similar only if exactly two positions differ and their letters are crossed.
				if len(diffIndex) != 2 || !(str1[diffIndex[0]] == str2[diffIndex[1]] &&
					str1[diffIndex[1]] == str2[diffIndex[0]]) {
					continue
				}
				// Store the edge str1 -> str2.
				graph[str1] = append(graph[str1], str2)
			}
		}
	}
	// num holds the number of connected components.
	num := 0
	for _, str := range B {
		if !visited[str] {
			num++
			visited[str] = true
			// Depth-first traversal of the graph.
			dfs(str, visited, graph)
		}
	}
	return num
}

func dfs(str string, visited map[string]bool, graph map[string][]string) {
	for _, strNext := range graph[str] {
		// Visit nodes that have not been visited yet.
		if !visited[strNext] {
			visited[strNext] = true
			dfs(strNext, visited, graph)
		}
	}
}
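The preprocessing loop in the code above compares each string against every string already kept, which is quadratic in the number of strings. As a possible variation, not what the code above does, the same filtering could be done with a map used as a set. The helper name dedupe is hypothetical.

```go
package main

// dedupe returns the distinct strings of words in first-seen order.
// dedupe is a hypothetical helper; it uses a map as a set so that each
// string is checked with one lookup instead of a scan over the kept slice.
func dedupe(words []string) []string {
	seen := make(map[string]bool, len(words))
	out := make([]string, 0, len(words))
	for _, w := range words {
		if !seen[w] {
			seen[w] = true
			out = append(out, w)
		}
	}
	return out
}
```

In numSimilarGroups, B := dedupe(A) would then stand in for the nested loop that fills B; the rest of the function would stay unchanged.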

Problems Encountered

When I declared graph and visited as global variables, the last test case failed:

var graph = make(map[string][]string)
var visited = make(map[string]bool)
func numSimilarGroups(A []string) int {
	...
	num := 0
	for _, str := range B {
		if !visited[str] {
			...
			dfs(str)
		}
	}
	return num
}

func dfs(str string) {
	...
}

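In the Frequently Asked Questions, some users report that the result of a submission differs from the result of Run Code on a custom testcase with the same data. The explanation given there is: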

First, please check if you are using any global or static variables. They are Evil, period. If you must declare one, reset them in the first line of your called method or in the default constructor. Why? Because the judger executes all test cases using the same program instance, global/static variables affect the program state from one test case to another. See this Discuss thread for more details.

Are you using C or C++? If the answer is yes, chances are your code has bugs in it which cause one of the earlier test cases to trigger an undefined behavior. See this Discuss thread for an example of undefined behavior. These bugs could be hard to debug, so good luck. Or just give up on C/C++ entirely and code in a more predictable language, like Java. Just kidding.

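My situation does not exactly match the cases described above, so I am not sure of the specific cause. If anyone knows, please let me know in the comments.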
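One possible explanation, offered as an assumption and not verified against the judge: in Go, a package-level map is initialized once per program run, so if the judge calls numSimilarGroups several times in the same process (as the FAQ quoted above describes), entries left in visited by an earlier test case would still be there for the next one. The sketch below isolates that effect; countNew and countNewReset are hypothetical names, not part of the original code.

```go
package main

// visited is package-level on purpose, to mimic the failing variant.
var visited = make(map[string]bool)

// countNew counts the strings not yet marked in the shared map and marks them.
// Because visited outlives the call, a second call sees the earlier marks.
func countNew(words []string) int {
	n := 0
	for _, w := range words {
		if !visited[w] {
			visited[w] = true
			n++
		}
	}
	return n
}

// countNewReset clears the shared state first, as the FAQ recommends
// for globals that must be kept.
func countNewReset(words []string) int {
	visited = make(map[string]bool)
	return countNew(words)
}
```

Calling countNew twice with the same input gives a smaller count the second time, because every string is already marked. If this is what happened, resetting graph and visited at the top of numSimilarGroups, or keeping them local as in the working version, would avoid it.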


Reposted from blog.csdn.net/liuyh73/article/details/82909815