[C++] Valid Anagram, Intersection of Two Arrays, Happy Number, Two Sum - Code Thoughts

  • A hash table (also known as a hash map) is a data structure that provides fast insertion and lookup. On average, insertion and lookup take O(1) time no matter how many items the table holds. Because lookups are so fast, hash tables are used in many programs, such as spell checkers.

  • Hash tables also have shortcomings. They are built on arrays, and an array is expensive to grow after it has been created, so performance degrades noticeably as the table fills up. A hash table is a data structure that is accessed directly by key: the key is converted into an array index, and the element is then read directly through that index.

  • Hash tables are generally used to quickly determine whether an element appears in a set. For example, suppose you want to check whether a name belongs to a student at a given school. Enumerating every student takes O(n) time, but a hash table answers the query in O(1) on average. We only need to store the names of the school's students in the hash table up front; a query then finds out directly, through the index, whether the student is at the school. Mapping student names onto the hash table is the job of the hash function.
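
  • As a rough illustration of the school lookup described above, the sketch below contrasts a linear scan with a hash-based lookup. The function names inSchoolByScan and inSchoolByHash are made up for this example and do not come from the original article.

```cpp
#include <algorithm>
#include <string>
#include <unordered_set>
#include <vector>

// Linear scan: compares the name against every entry, O(n) per query.
bool inSchoolByScan(const std::vector<std::string>& roster, const std::string& name) {
    return std::find(roster.begin(), roster.end(), name) != roster.end();
}

// Hash lookup: O(1) per query on average once the set has been built.
bool inSchoolByHash(const std::unordered_set<std::string>& roster, const std::string& name) {
    return roster.count(name) > 0;
}
```

  • Building the unordered_set costs O(n) once; the payoff comes when many queries are made against the same roster.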

  • The hash function maps a student's name directly to an index in the hash table, and querying that index then tells us quickly whether the student is at the school. The hash function is shown in the figure below. The name is converted into a number by hashCode; in general, hashCode turns data of other formats into numeric values through a specific encoding, which maps the student's name to an index in the hash table.

    • [Figure: the hash function maps student names to indices in the hash table]
  • What if the value returned by hashCode is larger than the size of the hash table, tableSize? To make sure every mapped index falls inside the table, we take the value modulo tableSize, which guarantees that every student's name maps to a slot in the table.
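
  • The two steps above (hashCode, then modulo) can be sketched as follows. The polynomial hash with multiplier 31 is only an assumption chosen for illustration; the article does not say which encoding its hashCode uses, and indexFor is a hypothetical helper name.

```cpp
#include <cstddef>
#include <string>

// Hypothetical hashCode: a simple polynomial hash over the bytes of the name.
std::size_t hashCode(const std::string& name) {
    std::size_t h = 0;
    for (unsigned char c : name) {
        h = h * 31 + c;  // unsigned arithmetic wraps around on overflow, which is harmless here
    }
    return h;
}

// Map a name to a slot of a table with tableSize slots (tableSize must be > 0).
std::size_t indexFor(const std::string& name, std::size_t tableSize) {
    return hashCode(name) % tableSize;
}
```

  • Whatever value hashCode produces, indexFor(name, tableSize) always lies in the range 0 to tableSize - 1.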

  • What if there are more students than slots in the hash table? Even if the hash function distributes names evenly, several names will inevitably map to the same index. As shown in the figure, Xiao Li and Xiao Wang are both mapped to index 1. This is called a hash collision.

    • [Figure: Xiao Li and Xiao Wang both map to index 1, a hash collision]
  • There are two common ways to resolve hash collisions: separate chaining (the "zipper method") and linear probing.

    • Separate chaining: Xiao Li and Xiao Wang collide at index 1, so the colliding elements are stored in a linked list attached to that slot. We can then find both Xiao Li and Xiao Wang through the index. (Here the number of data items is dataSize and the size of the hash table is tableSize.)

    • [Figure: separate chaining, with the colliding names stored in a linked list at index 1]

    • The key to separate chaining is choosing an appropriate table size, so that memory is not wasted on empty array slots and lookups are not slowed down by overly long linked lists.

    • Linear probing: with linear probing, tableSize must be greater than dataSize, because collisions are resolved using empty slots in the table. For example, if Xiao Li already occupies the colliding slot, we move forward to the next empty slot and store Xiao Wang there. If tableSize were not larger than dataSize, the table would have no free slots for colliding data. As the figure shows:

    • [Figure: linear probing, with Xiao Wang stored in the next empty slot after Xiao Li]
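
  • To make the two collision strategies concrete, here is a minimal sketch of each. ChainedSet and ProbingSet are illustrative names, not types from the article or the standard library; both assume tableSize > 0 and rely on std::hash for the hash function, and neither supports deletion (linear probing would need tombstones for that).

```cpp
#include <cstddef>
#include <functional>
#include <list>
#include <string>
#include <vector>

// Separate chaining: each slot holds a linked list of the names that hash to it.
class ChainedSet {
public:
    explicit ChainedSet(std::size_t tableSize) : buckets_(tableSize) {}

    void insert(const std::string& name) {
        std::list<std::string>& chain = buckets_[slot(name)];
        for (const std::string& existing : chain) {
            if (existing == name) return;  // already stored
        }
        chain.push_back(name);
    }

    bool contains(const std::string& name) const {
        for (const std::string& existing : buckets_[slot(name)]) {
            if (existing == name) return true;
        }
        return false;
    }

private:
    std::size_t slot(const std::string& name) const {
        return std::hash<std::string>{}(name) % buckets_.size();
    }
    std::vector<std::list<std::string>> buckets_;
};

// Linear probing: on a collision, walk forward to the next empty slot.
class ProbingSet {
public:
    explicit ProbingSet(std::size_t tableSize)
        : slots_(tableSize), used_(tableSize, false) {}

    // Returns false when the table is full and the name could not be stored.
    bool insert(const std::string& name) {
        const std::size_t start = slot(name);
        for (std::size_t step = 0; step < slots_.size(); ++step) {
            const std::size_t i = (start + step) % slots_.size();
            if (!used_[i]) {
                slots_[i] = name;
                used_[i] = true;
                return true;
            }
            if (slots_[i] == name) return true;  // already stored
        }
        return false;
    }

    bool contains(const std::string& name) const {
        const std::size_t start = slot(name);
        for (std::size_t step = 0; step < slots_.size(); ++step) {
            const std::size_t i = (start + step) % slots_.size();
            if (!used_[i]) return false;  // an empty slot ends the probe sequence
            if (slots_[i] == name) return true;
        }
        return false;
    }

private:
    std::size_t slot(const std::string& name) const {
        return std::hash<std::string>{}(name) % slots_.size();
    }
    std::vector<std::string> slots_;
    std::vector<bool> used_;
};
```

  • With a ChainedSet of size 1 every name lands in the same chain, yet lookups still succeed; a ProbingSet of size 2 accepts two names and rejects a third, which is why linear probing needs spare slots.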

  • There are three common hash structures: array, set, and map. In C++, set and map each come in the three variants below. Their underlying implementations, advantages, and disadvantages are shown in the following tables:

    • Set                 Underlying implementation  Ordered?  Duplicates allowed?  Values modifiable?  Lookup    Insert/delete
      std::set            red-black tree             yes       no                   no                  O(log n)  O(log n)
      std::multiset       red-black tree             yes       yes                  no                  O(log n)  O(log n)
      std::unordered_set  hash table                 no        no                   no                  O(1)      O(1)
    • std::unordered_set is implemented with a hash table, while std::set and std::multiset are implemented with a red-black tree. A red-black tree is a balanced binary search tree, so the keys are kept in order, but a key cannot be modified in place: changing it would corrupt the tree, so an element can only be erased and re-inserted. The O(1) figures for the hash table are average-case costs.

    • Map                 Underlying implementation  Keys ordered?  Duplicate keys allowed?  Keys modifiable?  Lookup    Insert/delete
      std::map            red-black tree             yes            no                       no                O(log n)  O(log n)
      std::multimap       red-black tree             yes            yes                      no                O(log n)  O(log n)
      std::unordered_map  hash table                 no             no                       no                O(1)      O(1)
    • std::unordered_map is implemented with a hash table, while std::map and std::multimap are implemented with a red-black tree. In the same way, the keys of std::map and std::multimap are ordered.

  • When we use a set to solve a hashing problem, unordered_set is the first choice because its lookup, insertion, and deletion are the most efficient. If the set must be ordered, use set. If it must be ordered and also hold duplicate values, use multiset.
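
  • The choice among the three set types can be seen in a small sketch. The helper names sortedUnique, sortedWithDuplicates, and containsValue are invented for this example; only the standard containers themselves come from the text above.

```cpp
#include <set>
#include <unordered_set>
#include <vector>

// std::set: ascending order, duplicates removed.
std::vector<int> sortedUnique(const std::vector<int>& values) {
    std::set<int> s(values.begin(), values.end());
    return std::vector<int>(s.begin(), s.end());
}

// std::multiset: ascending order, duplicates kept.
std::vector<int> sortedWithDuplicates(const std::vector<int>& values) {
    std::multiset<int> s(values.begin(), values.end());
    return std::vector<int>(s.begin(), s.end());
}

// std::unordered_set: membership test only, no ordering guarantee.
bool containsValue(const std::vector<int>& values, int target) {
    std::unordered_set<int> s(values.begin(), values.end());
    return s.count(target) > 0;
}
```

  • For the input 3, 1, 3, 2, sortedUnique yields 1, 2, 3 and sortedWithDuplicates yields 1, 2, 3, 3.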

  • Now let's look at map. A map is a key-value data structure. In std::map there are restrictions on the keys but none on the values, because the keys are stored in a red-black tree. The same principles apply in other languages, for example HashMap and TreeMap in Java, so the knowledge transfers directly.

  • Although std::set and std::multiset are implemented with a red-black tree rather than a hash table, and use the tree for indexing and storage, we still use them the hashing way, that is, through keys and values. Solving mapping problems with these data structures is therefore still called hashing. The same is true for map.

  • unordered_set became part of the standard library in C++11, while hash_set never did, so unordered_set is the recommended choice. Think of unordered_set as officially certified, whereas hash_set and hash_map are wheels that the community built on its own before the C++11 standard.

  • Whenever we need to quickly determine whether an element appears in a set, hashing should come to mind. Hashing trades space for time, though, because it needs an additional array, set, or map to store the data that makes fast lookup possible.

Topic: Valid Anagram

  • Given two strings s and t, write a function to determine whether t is an anagram of s. Note: s and t are anagrams of each other if every character appears the same number of times in both.

  • The problem can be solved simply by exploiting the fact that lowercase letters have consecutive ASCII codes:

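    • #include <string>
      #include <vector>
      using namespace std;

      class Solution {
      public:
          bool isAnagram(string s, string t) {
              int s_len = s.size();
              int t_len = t.size();
              // Strings of different lengths cannot be anagrams.
              if (s_len != t_len) {
                  return false;
              }
              // One counter per lowercase letter, zero-initialized.
              vector<int> s_vec(26), t_vec(26);
              for (int i = 0; i < s_len; i++) {
                  s_vec[s[i] - 'a']++;
                  t_vec[t[i] - 'a']++;
              }
              // The strings are anagrams only if every letter count matches.
              for (int i = 0; i < 26; i++) {
                  if (s_vec[i] != t_vec[i]) {
                      return false;
                  }
              }
              return true;
          }
      };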
      
  • An array is really a simple hash table. The strings in this problem contain only lowercase letters, so we can define an array that records how many times each character occurs in string s. How large must the array be? Size 26, initialized to 0, because the ASCII codes of a to z are 26 consecutive values. Each character has to be mapped to an array index, that is, to a hash table index: character a maps to index 0 and character z maps to index 25.

  • The solution above keeps two counting arrays and compares them; the same idea also works with a single array. While traversing string s, simply add 1 to the element at index s[i] - 'a'. There is no need to remember the ASCII code of a; a relative offset is enough. This counts the occurrences of each character in s. To check the characters of t, traverse t in the same way and subtract 1 from the element at the index each character maps to. Finally, check the array: if any element is nonzero, one string has characters the other lacks, so return false. If every element is zero, s and t are anagrams, so return true.

  • The time complexity is O(n). The auxiliary array has a constant size, so the space complexity is O(1).
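
  • The single-array variant described above (add 1 for s, subtract 1 for t, then check for zeros) might look like the sketch below. The function name isAnagramOneArray is made up here, and the code assumes both strings contain only the lowercase letters a to z, as the problem states.

```cpp
#include <string>

// Assumes s and t contain only lowercase letters 'a'..'z'.
bool isAnagramOneArray(const std::string& s, const std::string& t) {
    if (s.size() != t.size()) return false;
    int record[26] = {0};                 // record[i] counts the letter 'a' + i
    for (char c : s) record[c - 'a']++;   // count the characters of s
    for (char c : t) record[c - 'a']--;   // cancel them with the characters of t
    for (int count : record) {
        if (count != 0) return false;     // some letter occurs more often in one string
    }
    return true;
}
```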

  • For the follow-up question (inputs that contain Unicode characters): Unicode was created to overcome the limitations of traditional character encodings, and it assigns a unique code point to the characters of every language. In Unicode, one character may occupy several bytes; the transmission-oriented encodings UTF-8 and UTF-16 were introduced so that the computer can tell how many bytes make up a character. The crux of the follow-up is that "the characters are discrete and unknown", so we can use a hash table to maintain the frequency of each character. Readers also need to watch out for one Unicode character spanning multiple bytes; different languages read and process strings differently.
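
  • One possible way to handle the follow-up in C++ is sketched below. It sidesteps the multi-byte issue by assuming the input has already been decoded into UTF-32 code points (std::u32string); decoding UTF-8 is left out, and characters composed of several code points (combining marks, for example) are not treated specially. The function name isAnagramUnicode is hypothetical.

```cpp
#include <string>
#include <unordered_map>

// Assumes the input is already decoded: each char32_t is one Unicode code point.
bool isAnagramUnicode(const std::u32string& s, const std::u32string& t) {
    if (s.size() != t.size()) return false;
    std::unordered_map<char32_t, int> freq;  // code point -> remaining count
    for (char32_t c : s) ++freq[c];
    for (char32_t c : t) {
        auto it = freq.find(c);
        if (it == freq.end() || it->second == 0) return false;  // t has an extra c
        --it->second;
    }
    return true;  // equal lengths and no surplus character, so all counts are zero
}
```

  • The hash table replaces the fixed array of 26 counters because the set of possible characters is no longer small and known in advance.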

Topic: Intersection of two arrays

  • Given two arrays nums1 and nums2, return their intersection. Each element in the result must be unique, and the result may be in any order.
Answer
  • This problem is mainly an exercise in using one hash data structure, unordered_set, which solves many similar problems. Note that the problem explicitly states that each element in the result must be unique, so the result is deduplicated, and that the order of the result does not matter. Keep in mind that arrays can serve as hash tables only when the problem bounds the values. This problem places no bound on the values, so an array cannot be used as the hash table. Moreover, if the values are few, widely scattered, and span a large range, an array would waste a huge amount of space.

  • In that case another structure is used: set. C++ provides three set data structures: std::set, std::multiset, and std::unordered_set.

  • std::set and std::multiset are implemented with a red-black tree, while std::unordered_set is implemented with a hash table. unordered_set offers the most efficient reads and writes, and here the data need not be sorted and must not contain duplicates, so unordered_set is the choice.

  • #include <unordered_set>
    #include <vector>
    using namespace std;

    class Solution {
    public:
        vector<int> intersection(vector<int>& nums1, vector<int>& nums2) {
            unordered_set<int> res; // stores the result; the set removes duplicates
            unordered_set<int> nums1_set(nums1.begin(), nums1.end());
            for (int item : nums2) {
                // This element of nums2 also appears in nums1_set.
                if (nums1_set.find(item) != nums1_set.end()) {
                    res.insert(item);
                }
            }
            return vector<int>(res.begin(), res.end());
        }
    };
    
  • Time complexity: O(m+n), where m and n are the lengths of nums1 and nums2. Building nums1_set takes O(m) time, and traversing nums2 with an O(1) average lookup per element takes O(n) time, so the total is O(m+n). Space complexity: O(m) for nums1_set plus at most O(min(m, n)) for the result set, which stays within O(m+n).
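
  • For contrast, here is what the array-as-hash-table approach could look like if the values were bounded. The bound used below, every value in the range 0 to 1000, is purely an assumption for this sketch; the problem as described above gives no such bound, which is exactly why the unordered_set solution is preferred. The name intersectionBounded is made up.

```cpp
#include <vector>

// Assumption for this sketch only: every value lies in [0, 1000].
std::vector<int> intersectionBounded(const std::vector<int>& nums1,
                                     const std::vector<int>& nums2) {
    bool seen[1001] = {false};   // seen[v] is true when v occurs in nums1
    for (int v : nums1) seen[v] = true;

    std::vector<int> result;
    for (int v : nums2) {
        if (seen[v]) {
            result.push_back(v);
            seen[v] = false;     // clear the flag so each value is reported once
        }
    }
    return result;
}
```

  • The array always occupies 1001 slots regardless of how many values are actually present, which illustrates the space waste mentioned above when values are few and scattered.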

Topic: Happy Numbers

  • Write an algorithm to determine whether a number n is a happy number. A happy number is defined as follows: starting from a positive integer, replace the number with the sum of the squares of its digits, and repeat the process until the number becomes 1 or it loops endlessly without ever reaching 1. If the process ends at 1, the number is a happy number. Return true if n is a happy number and false otherwise.
Answer
  • Method: use the "fast and slow pointers" idea to find the cycle. The fast pointer advances two steps at a time and the slow pointer advances one step. When the two are equal, a cycle has been found; then check whether the cycle is the one at 1. If so, the number is a happy number; otherwise it is not. Note: the author does not recommend using a set to record every intermediate result for cycle detection, because the set may grow too large to store; recursion is not recommended either, since deep recursion can overflow the call stack. Do not rely on the fact that the input in this problem happens to be an int.

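  • class Solution {
    public:
        // Sum of the squares of the decimal digits of n.
        int sum_n(int n) {
            int sum = 0;
            while (n > 0) {
                int temp = n % 10; // take the last digit
                sum += temp * temp;
                n = n / 10;
            }
            return sum;
        }
        bool isHappy(int n) {
            int slow = n, fast = n;
            do {
                slow = sum_n(slow); // slow pointer: one step
                fast = sum_n(fast); // fast pointer: two steps
                fast = sum_n(fast);
            } while (slow != fast);
            // The pointers met inside a cycle; n is happy only if that cycle is at 1.
            return slow == 1;
        }
    };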
    
  • The problem says the process may loop endlessly, which means some sum will appear again during the computation; this observation is the key to solving the problem. Whenever we need to quickly determine whether an element appears in a set, hashing should come to mind. So this solution uses hashing to check whether a sum has been seen before: if it has, return false; otherwise keep going until the sum is 1. An unordered_set can record which sums have appeared.

  • #include <unordered_set>
    using namespace std;

    class Solution {
    public:
        // Sum of the squares of the decimal digits of n.
        int sum_n(int n) {
            int sum = 0;
            while (n > 0) {
                int temp = n % 10; // take the last digit
                sum += temp * temp;
                n = n / 10;
            }
            return sum;
        }
        bool isHappy(int n) {
            unordered_set<int> temp_set;
            while (true) {
                int sum = sum_n(n);
                if (sum == 1) {
                    return true;
                }
                // If this sum has appeared before, we are stuck in an endless loop: return false at once.
                if (temp_set.find(sum) != temp_set.end()) {
                    return false;
                } else {
                    temp_set.insert(sum);
                }
                n = sum;
            }
        }
    };
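
  • To see the repetition that both solutions rely on, the sketch below records the sequence of values visited. The helpers digitSquareSum and happyTrace are illustrative names, not part of the original solutions.

```cpp
#include <unordered_set>
#include <vector>

// Sum of the squares of the decimal digits of n.
int digitSquareSum(int n) {
    int sum = 0;
    while (n > 0) {
        int digit = n % 10;
        sum += digit * digit;
        n /= 10;
    }
    return sum;
}

// Values visited starting from n, stopping at 1 or at the first repeated value.
std::vector<int> happyTrace(int n) {
    std::vector<int> trace{n};
    std::unordered_set<int> seen{n};
    while (n != 1) {
        n = digitSquareSum(n);
        trace.push_back(n);
        if (!seen.insert(n).second) break;  // n was seen before: we are in a cycle
    }
    return trace;
}
```

  • For 19 the trace is 19, 82, 68, 100, 1, so 19 is a happy number. For 2 the trace is 2, 4, 16, 37, 58, 89, 145, 42, 20, 4: the value 4 repeats, so 2 is not.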
    

Topic: Sum of two numbers

  • Given an integer array nums and an integer target, find the two integers in the array whose sum equals target and return their array indices. You may assume that each input has exactly one answer, and the same element may not be used twice in the answer. The answer may be returned in any order.
Answer
  • With a hash table, the time to look up target - x drops from O(N) to O(1). We create a hash table, and for each x we first check whether target - x is already in the table and only then insert x, which guarantees that x is never matched with itself.

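  • #include <unordered_map>
    #include <vector>
    using namespace std;

    class Solution {
    public:
        vector<int> twoSum(vector<int>& nums, int target) {
            unordered_map<int, int> temp_map; // element value -> index, for the elements visited so far
            for (int i = 0; i < static_cast<int>(nums.size()); i++) {
                // Look for the value that pairs with nums[i] among the visited elements.
                auto it = temp_map.find(target - nums[i]);
                if (it != temp_map.end()) {
                    return {it->second, i};
                }
                temp_map[nums[i]] = i;
            }
            return {};
        }
    };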
    
  • First, a reminder of when to use hashing: whenever we need to find out whether an element has appeared before, or whether an element is in a set, hashing should come to mind immediately. For this problem we need a collection that stores the elements already traversed; while traversing the array, we ask that collection whether a given element has been visited, that is, whether it appears in the collection.

  • In this problem we need to know not only whether an element has been visited but also its index, so a key-value structure is required: the key stores the element and the value stores the index. That makes map the right fit. Let's look at the limitations of using an array or a set for hashing here.

    • An array has a fixed size, and when there are few elements but the values are very large, it wastes memory.

    • A set holds only a single key per element. For Two Sum we must not only determine whether y exists but also record the index of y, because we have to return the indices of x and y. So set cannot be used either.

  • That calls for another data structure: map. A map is a key-value store, so the key can hold the element's value and the value can hold the index at which it occurs. The map's job is to store the elements we have visited: while traversing the array, we need to record which elements we have already seen and their indices, so that we can find the one that pairs with the current element (that is, the two add up to target).

  • For this problem we take an element, determine whether its match has appeared before, and if so return that element's index. To test whether an element has appeared, the element must be the key, so the array elements serve as keys. Each key has a corresponding value, and the value stores the index.

  • The map therefore stores {key: array element, value: index of that element}.

  • While traversing the array, we only need to query the map for a value that pairs with the current element. If there is one, we have found the matching pair. If not, we put the current element into the map, because the map stores the elements we have visited.

  • Time complexity: O(N), where N is the number of elements in the array. For each element x, target - x can be looked up in O(1) time on average. Space complexity: O(N), where N is the number of elements in the array, mainly the overhead of the hash table.
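
  • For comparison with the hash-table solution, a brute-force baseline that checks every pair is sketched below. It needs no extra memory but takes O(N^2) time, because finding the partner of each x costs an O(N) scan. The name twoSumBruteForce is made up for this example.

```cpp
#include <cstddef>
#include <vector>

// Checks every pair (i, j) with i < j; returns an empty vector if no pair sums to target.
std::vector<int> twoSumBruteForce(const std::vector<int>& nums, int target) {
    for (std::size_t i = 0; i < nums.size(); ++i) {
        for (std::size_t j = i + 1; j < nums.size(); ++j) {
            if (nums[i] + nums[j] == target) {
                return {static_cast<int>(i), static_cast<int>(j)};
            }
        }
    }
    return {};
}
```

  • With nums = 2, 7, 11, 15 and target = 9, both this baseline and the hash-table version return the indices 0 and 1.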


Origin blog.csdn.net/weixin_43424450/article/details/132549591