Data structure refresher: Day 6

We make two passes over the string. In the first pass, we use a hash map to count the occurrences of each character. In the second pass, we return the index of the first character whose count is 1. If the second pass finishes without finding such a character, we return -1.

class Solution {
public:
    int firstUniqChar(string s) {
        unordered_map<int, int> frequency;
        for (char ch: s) {
            ++frequency[ch];
        }
        for (int i = 0; i < s.size(); ++i) {
            if (frequency[s[i]] == 1) {
                return i;
            }
        }
        return -1;
    }
};

Complexity analysis

Time complexity: O(n), where n is the length of string s. We need to do two passes.

Space complexity: O(∣Σ∣), where Σ is the character set. In this question, s only contains lowercase letters, so ∣Σ∣≤26. We need O(∣Σ∣) space to store the hash map.
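Because ∣Σ∣≤26 here, the hash map can be swapped for a fixed-size array. The following is a minimal sketch of that variant, not part of the original solution; the name firstUniqCharArray is hypothetical, and it assumes s contains only the lowercase letters 'a' to 'z'.

```cpp
#include <array>
#include <string>

// Hypothetical variant of method 1: a 26-slot array replaces the hash map.
// Assumes s contains only lowercase letters 'a'..'z', as the problem states.
int firstUniqCharArray(const std::string& s) {
    std::array<int, 26> frequency{};  // zero-initialized counts
    // First pass: count every character.
    for (char ch : s) {
        ++frequency[ch - 'a'];
    }
    // Second pass: return the first index whose character occurs exactly once.
    for (int i = 0; i < static_cast<int>(s.size()); ++i) {
        if (frequency[s[i] - 'a'] == 1) {
            return i;
        }
    }
    return -1;  // every character repeats (or s is empty)
}
```

For example, firstUniqCharArray("leetcode") returns 0 and firstUniqCharArray("aabb") returns -1.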

2. Use hash table to store index

Ideas and Algorithms

We can modify method 1 so that the second traversal runs over the hash map instead of the string.

Specifically, for each key-value pair in the hash map, the key is a character and the value is the index of its only occurrence (if the character appears exactly once) or −1 (if it appears more than once). When we traverse the string, let the current character be c. If c is not in the hash map, we add c and its index as a key-value pair; otherwise, we set the value stored for c to −1.

After the first traversal, we only need to traverse all the values in the hash map again to find the smallest value that is not -1, which is the index of the first non-repeating character. If all values in the hash map are −1, we return −1.

class Solution {
public:
    int firstUniqChar(string s) {
        unordered_map<int, int> position;
        int n = s.size();
        for (int i = 0; i < n; ++i) {
            if (position.count(s[i])) {
                position[s[i]] = -1;
            }
            else {
                position[s[i]] = i;
            }
        }
        int first = n;
        for (auto [_, pos]: position) {
            if (pos != -1 && pos < first) {
                first = pos;
            }
        }
        if (first == n) {
            first = -1;
        }
        return first;
    }
};

Complexity analysis

Time complexity: O(n), where n is the length of string s. The first traversal, over the string, takes O(n), and the second traversal, over the hash map, takes O(∣Σ∣). The hash map holds one entry per distinct character of s, and the number of distinct characters cannot exceed the length of s, so the second traversal is bounded by O(n) and does not change the overall bound.

Space complexity: O(∣Σ∣), where Σ is the character set. In this question, s only contains lowercase letters, so ∣Σ∣≤26. We need O(∣Σ∣) space to store the hash map.
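To see what the hash map in this method holds after the first traversal, here is a small sketch that only builds the table. The helper name buildFirstPositions is hypothetical, and std::map is used instead of unordered_map purely so the contents come out in a stable order when inspected; that is an assumption made for illustration, not something the solution needs.

```cpp
#include <map>
#include <string>

// Hypothetical helper: builds the table described in method 2.
// Value = index of the only occurrence, or -1 if the character repeats.
// std::map (ordered) is chosen only to make the contents easy to inspect.
std::map<char, int> buildFirstPositions(const std::string& s) {
    std::map<char, int> position;
    for (int i = 0; i < static_cast<int>(s.size()); ++i) {
        auto it = position.find(s[i]);
        if (it == position.end()) {
            position[s[i]] = i;  // first time this character is seen
        } else {
            it->second = -1;     // seen before: mark as repeated
        }
    }
    return position;
}
```

For s = "loveleetcode" the table maps v to 2, t to 7, c to 8 and d to 10, while l, o and e are all −1. The smallest value that is not −1 is 2, which is the answer.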

3. Queue

Ideas and Algorithms

We can also use the queue to find the first non-repeating character. The queue has a "first in, first out" nature, so it is very suitable for finding the first element that meets a certain condition.

Specifically, we use the same hash map as method 2, plus a queue that stores each character together with the index of its first occurrence, in order of appearance. When we traverse the string, let the current character be c. If c is not in the hash map, we push c and its index onto the back of the queue as a pair. Otherwise, we set the value for c in the hash map to −1 and then check the head of the queue: as long as the character at the head is marked −1 in the hash map, we pop it, and we stop once the head element really does appear only once or the queue is empty.

After the traversal is completed, if the queue is empty, it means that there are no unique characters, and −1 is returned. Otherwise, the element at the head of the queue is the tuple of the first unique character and its index.

Tips

When maintaining the queue, we use the "delayed deletion" technique. Even if some characters in the queue appear more than once, they do not affect the answer as long as they are not at the head of the queue, so we do not need to delete them right away. Such a character only has to be removed once every element before it has been popped and it has become the head of the queue.

class Solution {
public:
    int firstUniqChar(string s) {
        unordered_map<char, int> position;
        queue<pair<char, int>> q;
        int n = s.size();
        for (int i = 0; i < n; ++i) {
            if (!position.count(s[i])) {
                position[s[i]] = i;
                q.emplace(s[i], i);
            }
            else {
                position[s[i]] = -1;
                while (!q.empty() && position[q.front().first] == -1) {
                    q.pop();
                }
            }
        }
        return q.empty() ? -1 : q.front().second;
    }
};

Complexity analysis

Time complexity: O(n), where n is the length of string s. Traversing the string takes O(n), and during the traversal we also maintain a queue. Since each character is pushed onto and popped from the queue at most once, the total time spent maintaining the queue is O(∣Σ∣). The number of distinct characters in s cannot exceed the length of s, so this term is bounded by O(n) and does not change the overall bound.

Space complexity: O(∣Σ∣), where Σ is the character set. In this question, s only contains lowercase letters, so ∣Σ∣≤26. We need O(∣Σ∣) space to store the hash map and queue.
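The effect of delayed deletion can be observed by also reporting how many entries are left in the queue when the scan ends. The sketch below follows the queue method described above; the QueueTrace struct and the name firstUniqCharTrace are hypothetical and exist only for this illustration.

```cpp
#include <cstddef>
#include <queue>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical result type: the answer plus the number of entries still
// sitting in the queue when the scan finishes.
struct QueueTrace {
    int index;
    std::size_t leftover;
};

QueueTrace firstUniqCharTrace(const std::string& s) {
    std::unordered_map<char, int> position;
    std::queue<std::pair<char, int>> q;
    for (int i = 0; i < static_cast<int>(s.size()); ++i) {
        if (!position.count(s[i])) {
            position[s[i]] = i;
            q.emplace(s[i], i);
        } else {
            position[s[i]] = -1;
            // Delayed deletion: only repeated entries at the head are popped.
            while (!q.empty() && position[q.front().first] == -1) {
                q.pop();
            }
        }
    }
    return {q.empty() ? -1 : q.front().second, q.size()};
}
```

For s = "abcb" the result is index 0 with 3 entries left over: the repeated character b is still in the queue, but it sits behind a, so it never reaches the head and never needs to be removed. For s = "aab" the repeated a is at the head, so it is popped, and the result is index 2 with 1 entry left.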

2. Ransom note

383. Ransom Note - LeetCode https://leetcode.cn/problems/ransom-note/

1. Character statistics

The question asks whether the string ransomNote can be built from the characters of the string magazine, where each character in magazine can be used only once. This is possible exactly when, for every English letter ('a'-'z'), its count in magazine is greater than or equal to its count in ransomNote.
●If the length of magazine is less than the length of ransomNote, magazine cannot form ransomNote, so we return false directly.
●First count the occurrences cnt[a] of each English letter a in magazine, then traverse ransomNote and count the occurrences of each letter. If some letter c occurs more times in ransomNote than cnt[c], its count in magazine, we return false immediately.

class Solution {
public:
    bool canConstruct(string ransomNote, string magazine) {
        if (ransomNote.size() > magazine.size()) {
            return false;
        }
        vector<int> cnt(26);
        for (auto & c : magazine) {
            cnt[c - 'a']++;
        }
        for (auto & c : ransomNote) {
            cnt[c - 'a']--;
            if (cnt[c - 'a'] < 0) {
                return false;
            }
        }
        return true;
    }
};

Complexity analysis

●Time complexity: O(m+n), where m is the length of the string ransomNote and n is the length of the string magazine. We only need to traverse each of the two strings once.
●Space complexity: O(|S|), where S is the character set. In this question, S is all lowercase English letters, so |S| = 26.
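The condition behind this method (every letter must occur in magazine at least as often as in ransomNote) can also be checked literally by counting both strings and comparing. This is a sketch for illustration only; the names countLetters and canConstructByCounts are hypothetical, and it assumes both strings contain only lowercase English letters.

```cpp
#include <array>
#include <string>

// Hypothetical helper: counts each lowercase letter of s.
std::array<int, 26> countLetters(const std::string& s) {
    std::array<int, 26> cnt{};  // zero-initialized
    for (char c : s) {
        ++cnt[c - 'a'];
    }
    return cnt;
}

// Literal form of the condition: for every letter, magazine must supply
// at least as many copies as ransomNote needs.
bool canConstructByCounts(const std::string& ransomNote,
                          const std::string& magazine) {
    const auto need = countLetters(ransomNote);
    const auto have = countLetters(magazine);
    for (int i = 0; i < 26; ++i) {
        if (have[i] < need[i]) {
            return false;
        }
    }
    return true;
}
```

For example, canConstructByCounts("aa", "ab") is false because magazine has only one a, while canConstructByCounts("aa", "aab") is true. Compared with the decrementing version, this form uses two arrays and always scans both strings in full, so it is a little less economical but mirrors the stated condition directly.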

3. Valid anagram

242. Valid Anagram - LeetCode https://leetcode.cn/problems/valid-anagram/?plan=data-structures&plan_progress=ggfacv7

1. Sorting

t is an anagram of s exactly when the two strings are equal after sorting. Therefore, we can sort s and t separately and check whether the sorted strings are equal. Furthermore, if s and t have different lengths, t cannot be an anagram of s.

class Solution {
public:
    bool isAnagram(string s, string t) {
        if (s.length() != t.length()) {
            return false;
        }
        sort(s.begin(), s.end());
        sort(t.begin(), t.end());
        return s == t;
    }
};

Complexity analysis

●Time complexity: O(n log n), where n is the length of s. Sorting takes O(n log n) and comparing the two strings takes O(n), so the overall time complexity is O(n log n + n) = O(n log n).
●Space complexity: O(log n). Sorting requires O(log n) space. Note that in some languages (such as Java and JavaScript) strings are immutable, so additional O(n) space is needed to copy the string. We ignore this in the analysis because:
. It depends on language details;
. It depends on how the function is designed; for example, the function parameter type could be changed to char[].

2. Hash table

From another perspective, t is an anagram of s exactly when every character occurs the same number of times in both strings. Since the strings contain only the 26 lowercase letters, we can maintain a frequency array table of length 26. We first traverse s and record the frequency of each character, then traverse t and subtract the corresponding frequencies in table. If some table[i] < 0 appears, t contains an extra character that s does not have, and we return false.

class Solution {
public:
    bool isAnagram(string s, string t) {
        if (s.length() != t.length()) {
            return false;
        }
        vector<int> table(26, 0);
        for (auto& ch: s) {
            table[ch - 'a']++;
        }
        for (auto& ch: t) {
            table[ch - 'a']--;
            if (table[ch - 'a'] < 0) {
                return false;
            }
        }
        return true;
    }
};

For the advanced question: Unicode was created to overcome the limitations of traditional character encodings, and it assigns a unique code point to the characters of every language. In Unicode, one character may correspond to multiple bytes. So that the computer knows how many bytes represent a character, the transmission-oriented encodings UTF-8 and UTF-16 were introduced and gradually became widely used. Readers who want the details can consult the relevant references; they are not covered here.

Back to this question, the core point of the advanced problem is that "characters are discrete and unknown", so we can use a hash table to maintain the frequency of the corresponding characters. At the same time, readers need to pay attention to the problem that one Unicode character may correspond to multiple bytes. Different languages have different ways of reading and processing strings.
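A minimal sketch of the hash table idea for the advanced question follows. It assumes the inputs have already been decoded into UTF-32 (std::u32string), so that each element is one code point; the decoding step itself is not shown, and the name isAnagramUnicode is hypothetical.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical sketch for the advanced question. Inputs are assumed to be
// already decoded into UTF-32, so each char32_t is one code point.
bool isAnagramUnicode(const std::u32string& s, const std::u32string& t) {
    if (s.size() != t.size()) {
        return false;
    }
    std::unordered_map<char32_t, int> table;
    for (char32_t ch : s) {
        ++table[ch];
    }
    for (char32_t ch : t) {
        if (--table[ch] < 0) {
            return false;  // t has more copies of ch than s
        }
    }
    return true;
}
```

For example, isAnagramUnicode(U"\u4f60\u597d", U"\u597d\u4f60") is true. Note that this compares code points; a character that a reader perceives as one unit may be made of several code points (for instance a letter plus a combining mark), which this sketch does not try to handle.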

Complexity analysis

Time complexity: O(n), where n is the length of s.
Space complexity: O(S), where S is the character set size, here S=26.

class Solution {
public:
    bool isAnagram(string s, string t) {
        int freq[26] {};
        for (char ch : s) ++freq[ch - 'a'];
        for (char ch : t) --freq[ch - 'a'];
        return all_of(begin(freq), end(freq), [](int x) { return x == 0; });
    }
};