940. Distinct Subsequences II

Problem link

Problem description

Given a string S, count the number of distinct, non-empty subsequences of S.

Since the result may be large, return the answer modulo 10^9 + 7.

Example 1:

Input: "abc"
Output: 7
Explanation: The 7 distinct subsequences are "a", "b", "c", "ab", "ac", "bc", and "abc".
Example 2:

Input: "aba"
Output: 6
Explanation: The 6 distinct subsequences are "a", "b", "ab", "ba", "aa" and "aba".
Example 3:

Input: "aaa"
Output: 3
Explanation: The 3 distinct subsequences are "a", "aa" and "aaa".

Note:

S contains only lowercase letters.
1 <= S.length <= 2000

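This problem can be compared with the subset family of problems (Subset, Subset II), although the two have differences as well as similarities.
Difference: in a subset problem, the elements of a subset do not have to keep the order they had in the original set, so the input can be sorted before computing. Here a subsequence must keep the original relative order of the characters, so sorting is not allowed, and even if it were, backtracking after sorting would still exceed the time limit.
Similarity: both can form new results by appending the current element to the results obtained so far. Note: I solved Subset with backtracking, which takes O(2^n) time, but Subset can also be solved without backtracking, by adding each new element to the earlier results (with the help of a set).

Both methods below are, in essence, ways of adding the new element to the earlier results.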
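To make "adding the new element to the earlier results" concrete, here is a minimal brute-force sketch that keeps every distinct subsequence in a set. The class name BruteForceSubsequences and the method count are hypothetical, introduced only for illustration. The set can grow exponentially, so this is only useful for cross-checking the two methods on short strings, not as a solution.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper for cross-checking; only practical for short strings.
class BruteForceSubsequences {
    static int count(String s) {
        Set<String> seen = new HashSet<>();
        seen.add(""); // start from the empty subsequence
        for (char c : s.toCharArray()) {
            // Append the current character to every subsequence found so far.
            Set<String> extended = new HashSet<>();
            for (String prev : seen) {
                extended.add(prev + c);
            }
            seen.addAll(extended); // the set discards duplicates
        }
        return seen.size() - 1; // exclude the empty subsequence
    }
}
```

For the three examples in the problem statement, count returns 7, 6 and 3.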

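Method 1

Let dp[i] be the number of distinct subsequences that end at S[i], counting only those not already counted at an earlier occurrence of the same character. Every dp[i] starts at 1, and the recurrence is:

i ranges over [0, len) and j over [0, i)
dp[i] += dp[j]       s[i] != s[j]
dp[i] += 0            s[i] == s[j] // avoid double counting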

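Take the string abb as an example. Initially every dp[i] is 1.
When i = 0, dp[0] = 1 (the initial value), which stands for the single string a.
When i = 1, dp[1] += dp[0], so dp[1] = 2, which stands for the strings ab and b.
When i = 2, dp[2] += dp[0], so dp[2] = 2, which stands for the strings bb and abb. (Note: without the check this would be dp[0] + dp[1] plus the initial 1, standing for ab, abb, bb and b, that is, every earlier string with the current character appended, plus the single character itself. But dp[1] already stands for ab and b, so 2 of those 4 are duplicates. That is why we do not add dp[j] when s[j] equals the current character.)

This method is essentially the same as Method 2 below.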
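The equivalence of the two methods can be sketched as a short derivation. The symbols T_i and E_c(i) are notation introduced here for illustration and do not appear in the code, and the modulo is ignored for readability.

```latex
% T_i    : number of distinct subsequences found before index i
% E_c(i) : number of those that end in character c
\begin{align*}
T_i &= \sum_{j<i} dp[j], \qquad
E_c(i) = \sum_{\substack{j<i \\ s[j]=c}} dp[j], \qquad
T_i = \sum_{c} E_c(i) \\
dp[i] &= 1 + \sum_{\substack{j<i \\ s[j]\neq s[i]}} dp[j]
       = 1 + T_i - E_{s[i]}(i) \\
E_{s[i]}(i+1) &= E_{s[i]}(i) + dp[i] = T_i + 1
\end{align*}
```

The last line is the update endWithChar[c] = sum(endWithChar) + 1 used by Method 2, so Method 1 computes the same per-character totals, only split across the positions where each character occurs.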

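import java.util.Arrays;

class Solution {
    public int distinctSubseqII(String S) {
        if (S == null || S.length() == 0) {
            return 0;
        }
        int mod = 1_000_000_007;
        // dp[i]: new distinct subsequences ending at S[i]; 1 for the start value
        int[] dp = new int[S.length()];
        Arrays.fill(dp, 1);
        for (int i = 0; i < S.length(); i++) {
            for (int j = 0; j < i; j++) {
                // skip equal characters to avoid counting the same string twice
                if (S.charAt(i) != S.charAt(j)) {
                    dp[i] = (dp[i] + dp[j]) % mod;
                }
            }
        }
        // the answer is the sum over all end positions
        int ans = 0;
        for (int i = 0; i < S.length(); i++) {
            ans = (ans + dp[i]) % mod;
        }
        return ans;
    }
}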

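Method 2

The problem states that only lowercase letters appear, so we create an array endWithChar[26], where endWithChar[0] is the number of distinct subsequences ending with the letter a, endWithChar[1] is the number ending with the letter b, and so on.

We scan the string from the start. Suppose the number of distinct subsequences obtained so far is N = endWithChar[0] + endWithChar[1] + ... + endWithChar[25], and the next character is c. Appending c to each of the N existing subsequences gives N subsequences ending with c, and adding 1 (for the single-character string c) gives the current number of subsequences ending with c. This value replaces the old endWithChar[c] rather than being added to it, because every subsequence that previously ended with c is already among these N + 1 strings. An example follows.

Let S = abb. Initially every element of endWithChar is 0. The scan proceeds as follows:
i = 0, S[i] = a, so endWithChar[a] = sum(endWithChar) + 1 = 1, standing for the string a
i = 1, S[i] = b, so endWithChar[b] = sum(endWithChar) + 1 = 2, standing for the two strings ab, b
i = 2, S[i] = b, so endWithChar[b] = sum(endWithChar) + 1 = 1 + 2 + 1 = 4, standing for the strings ab, abb, bb, b
So the final result is sum(endWithChar) = 1 + 4 = 5, standing for the strings a, ab, abb, bb, b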

class Solution {
    public int distinctSubseqII(String S) {
        if (S == null || S.length() == 0) {
            return 0;
        }
        int mod = 1_000_000_007;
        // endWithChar[k]: distinct subsequences ending with the letter 'a' + k
        int[] endWithChar = new int[26];
        int length = S.length();
        for (int i = 0; i < length; i++) {
            // overwrite, not add: the old strings ending here are all included
            endWithChar[S.charAt(i) - 'a'] = (getSum(endWithChar, mod) + 1) % mod;
        }
        return getSum(endWithChar, mod);
    }

    // sum of all 26 counters, reduced modulo mod
    private int getSum(int[] array, int mod) {
        int temp = 0;
        for (int j = 0; j < 26; j++) {
            temp = (temp + array[j]) % mod;
        }
        return temp;
    }
}
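As a possible variation on Method 2, the sum over all 26 counters can be maintained incrementally instead of being recomputed for every character. The sketch below is not part of the original solution; the class name DistinctSubseqRunningTotal and the variable total are assumptions made for illustration. It uses long values so that the intermediate subtraction and addition cannot overflow.

```java
// Hypothetical variant of Method 2 that keeps a running total.
class DistinctSubseqRunningTotal {
    static int count(String s) {
        final int MOD = 1_000_000_007;
        long[] endWithChar = new long[26];
        long total = 0; // sum of endWithChar, always kept in [0, MOD)
        for (int i = 0; i < s.length(); i++) {
            int c = s.charAt(i) - 'a';
            long updated = (total + 1) % MOD; // new count of strings ending with c
            // Swap the old counter for the new one inside the running total;
            // adding MOD keeps the value non-negative before the reduction.
            total = (total - endWithChar[c] + updated + MOD) % MOD;
            endWithChar[c] = updated;
        }
        return (int) total;
    }
}
```

For S = abb this goes through total = 1, 3, 5, matching the walkthrough above. The result is the same as Method 2; only the per-character work drops from a 26-element scan to a constant number of operations.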