Alibaba Second-Round Interview: 17 Questions about Data Structures and Algorithms

Foreword

The spring hiring season ("golden March, silver April") is fiercely competitive. While recently sorting through Java interview questions, I came across this Alibaba interviewer's manual. It differs from the usual collections of core Java knowledge points: it analyzes each question from the interviewer's perspective. Judging from the table of contents alone, it is well worth reading, so I have extracted some of the questions interviewers ask most often to share with everyone.

1. Distributed

2. Middleware

3. Big data and high concurrency

4. Database

5. Design Patterns and Practice

6. Data Structures and Algorithms

6. Data Structures and Algorithms

1. Tree

A tree is a data structure consisting of a finite set of n (n ≥ 1) nodes organized in a hierarchical relationship.

It is called a "tree" because it looks like an upside-down tree: the root is at the top and the leaves are at the bottom. A tree has the following characteristics: (01) each node has zero or more child nodes; (02) the node without a parent is called the root; (03) every non-root node has exactly one parent; (04) apart from the root, the nodes can be partitioned into multiple disjoint subtrees.
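To make the terminology concrete, here is a minimal Java sketch of a general tree node. The class name `TreeNode` and its methods are hypothetical, chosen for illustration: each node keeps a list of zero or more children and a single `parent` reference, which is `null` only for the root. `size()` adds up the disjoint subtrees below a node.

```java
import java.util.ArrayList;
import java.util.List;

public class TreeNode<T> {
    final T value;
    TreeNode<T> parent;                                    // null only for the root
    final List<TreeNode<T>> children = new ArrayList<>();  // zero or more children

    TreeNode(T value) { this.value = value; }

    // Attaching a child gives it exactly one parent
    TreeNode<T> addChild(T childValue) {
        TreeNode<T> child = new TreeNode<>(childValue);
        child.parent = this;
        children.add(child);
        return child;
    }

    // This node plus the nodes of each disjoint subtree below it
    int size() {
        int total = 1;
        for (TreeNode<T> c : children) total += c.size();
        return total;
    }

    // Height counted in levels: a single node has height 1
    int height() {
        int tallest = 0;
        for (TreeNode<T> c : children) tallest = Math.max(tallest, c.height());
        return 1 + tallest;
    }

    public static void main(String[] args) {
        TreeNode<String> root = new TreeNode<>("A");
        TreeNode<String> b = root.addChild("B");
        root.addChild("C");
        b.addChild("D");
        b.addChild("E");
        System.out.println(root.size());          // 5
        System.out.println(root.height());        // 3
        System.out.println(root.parent == null);  // true: A is the root
    }
}
```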

2. Binary tree

Definition of a binary tree

A binary tree is a tree structure in which every node has at most two subtrees. It has five basic forms: an empty tree; a single root whose left and right subtrees are both empty; a root with only a left subtree; a root with only a right subtree; and a root with both a left and a right subtree.

Properties of Binary Trees

A binary tree has the following properties. Property 1: level $i$ of a binary tree has at most $2^{i-1}$ nodes ($i \ge 1$). Property 2: a binary tree of depth $k$ has at most $2^k - 1$ nodes ($k \ge 1$). Property 3: a binary tree with $n$ nodes has height at least $\log_2(n+1)$. Property 4: in any binary tree, if the number of terminal (leaf) nodes is $n_0$ and the number of nodes of degree 2 is $n_2$, then $n_0 = n_2 + 1$.

2.1 Property 1:

Level $i$ of a binary tree has at most $2^{i-1}$ nodes ($i \ge 1$).

**Proof:** We use mathematical induction. (01) When $i = 1$, the bound is $2^{i-1} = 2^0 = 1$. Level 1 contains only the root node, so the proposition holds. (02) Assume that for some $i \ge 1$, level $i$ has at most $2^{i-1}$ nodes; we show that level $i+1$ then has at most $2^i$ nodes. Since each node of a binary tree has at most two children, the number of nodes on level $i+1$ is at most twice the number on level $i$, that is, at most $2 \times 2^{i-1} = 2^i$. Therefore the inductive step holds, and the proposition is proved.

2.2 Property 2:

A binary tree of depth $k$ has at most $2^k - 1$ nodes ($k \ge 1$).

**Proof:** Among binary trees of the same depth, a tree has the most nodes when every level holds its maximum number of nodes. By Property 1, a binary tree of depth $k$ therefore has at most $2^0 + 2^1 + \cdots + 2^{k-1} = 2^k - 1$ nodes, which proves the proposition.

2.3 Property 3:

A binary tree with $n$ nodes has height at least $\log_2(n+1)$.

**Proof:** By Property 2, a binary tree of height $h$ has at most $2^h - 1$ nodes, so $n \le 2^h - 1$. Rearranging gives $2^h \ge n + 1$, that is, $h \ge \log_2(n+1)$; since $h$ is an integer, $h \ge \lceil \log_2(n+1) \rceil$. This proves the proposition.

2.4 Property 4:

In any binary tree, if the number of terminal (leaf) nodes is $n_0$ and the number of nodes of degree 2 is $n_2$, then $n_0 = n_2 + 1$.

**Proof:** Every node in a binary tree has degree at most 2, so the total number of nodes $n$ is the sum of the numbers of degree-0 nodes ($n_0$), degree-1 nodes ($n_1$) and degree-2 nodes ($n_2$):

(Equation 1) $n = n_0 + n_1 + n_2$

On the other hand, a degree-0 node has no children, a degree-1 node has one child and a degree-2 node has two children, so the binary tree contains $n_1 + 2n_2$ child nodes in total. Every node except the root is the child of exactly one node, so:

(Equation 2) $n = n_1 + 2n_2 + 1$

Subtracting Equation 2 from Equation 1 gives $n_0 = n_2 + 1$, which proves the proposition.
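The four properties can be sanity-checked on a concrete tree. The sketch below, using a hypothetical `BinaryTreeProperties` class with a minimal `Node` type, counts nodes per level and per degree for one small six-node example. It illustrates the properties on a single case rather than proving them.

```java
import java.util.ArrayList;
import java.util.List;

public class BinaryTreeProperties {
    // Minimal binary tree node, introduced only for this check
    static final class Node {
        final Node left, right;
        Node(Node left, Node right) { this.left = left; this.right = right; }
    }

    // counts[d] = number of nodes with degree d (d = 0, 1, 2)
    static void countDegrees(Node node, int[] counts) {
        if (node == null) return;
        int degree = (node.left != null ? 1 : 0) + (node.right != null ? 1 : 0);
        counts[degree]++;
        countDegrees(node.left, counts);
        countDegrees(node.right, counts);
    }

    // perLevel.get(i - 1) = number of nodes on level i (the root is level 1)
    static void countLevels(Node node, int level, List<Integer> perLevel) {
        if (node == null) return;
        if (perLevel.size() < level) perLevel.add(0);
        perLevel.set(level - 1, perLevel.get(level - 1) + 1);
        countLevels(node.left, level + 1, perLevel);
        countLevels(node.right, level + 1, perLevel);
    }

    public static void main(String[] args) {
        // Shape:   a has children b and c; b has children d and e; c has only a right child f
        Node d = new Node(null, null), e = new Node(null, null), f = new Node(null, null);
        Node root = new Node(new Node(d, e), new Node(null, f));

        int[] counts = new int[3];
        countDegrees(root, counts);
        List<Integer> perLevel = new ArrayList<>();
        countLevels(root, 1, perLevel);
        int n = counts[0] + counts[1] + counts[2];
        int depth = perLevel.size();

        for (int i = 1; i <= depth; i++) {  // Property 1: level i holds at most 2^(i-1) nodes
            System.out.println("level " + i + ": " + perLevel.get(i - 1) + " <= " + (1 << (i - 1)));
        }
        System.out.println(n + " <= " + ((1 << depth) - 1));            // Property 2: 6 <= 7
        System.out.println(depth >= Math.log(n + 1) / Math.log(2));    // Property 3: true
        System.out.println("n0=" + counts[0] + ", n2=" + counts[2]);   // Property 4: n0=3, n2=2
    }
}
```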

3. BST (binary search tree)

Definition: binary search tree (BST). Let x be a node in a binary search tree whose key is $\mathrm{key}[x]$. If y is a node in the left subtree of x, then $\mathrm{key}[y] \le \mathrm{key}[x]$; if y is a node in the right subtree of x, then $\mathrm{key}[y] \ge \mathrm{key}[x]$.

In a binary search tree: (01) if the left subtree of any node is not empty, every node in it has a value less than that node's value; (02) if the right subtree of any node is not empty, every node in it has a value greater than that node's value; (03) the left and right subtrees of any node are themselves binary search trees; (04) no two nodes have equal keys.
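A minimal Java sketch of search and insertion under rules (01) to (04), assuming integer keys; `BinarySearchTree` is a hypothetical class name. Each step compares the key with the current node and moves left or right, and because of rule (04) a duplicate insert is simply rejected. An in-order traversal visits the keys in ascending order, which is a quick way to check the ordering rules.

```java
import java.util.ArrayList;
import java.util.List;

public class BinarySearchTree {
    private static final class Node {
        final int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    private Node root;

    // Returns false for a duplicate key, matching rule (04)
    public boolean insert(int key) {
        if (root == null) { root = new Node(key); return true; }
        Node cur = root;
        while (true) {
            if (key == cur.key) return false;
            if (key < cur.key) {
                if (cur.left == null) { cur.left = new Node(key); return true; }
                cur = cur.left;
            } else {
                if (cur.right == null) { cur.right = new Node(key); return true; }
                cur = cur.right;
            }
        }
    }

    // Follows a single root-to-leaf path: smaller keys go left, larger keys go right
    public boolean contains(int key) {
        Node cur = root;
        while (cur != null) {
            if (key == cur.key) return true;
            cur = key < cur.key ? cur.left : cur.right;
        }
        return false;
    }

    // In-order traversal (left, node, right) yields keys in ascending order
    public List<Integer> inOrder() {
        List<Integer> out = new ArrayList<>();
        inOrder(root, out);
        return out;
    }

    private void inOrder(Node node, List<Integer> out) {
        if (node == null) return;
        inOrder(node.left, out);
        out.add(node.key);
        inOrder(node.right, out);
    }

    public static void main(String[] args) {
        BinarySearchTree bst = new BinarySearchTree();
        for (int k : new int[] {8, 3, 10, 1, 6, 14, 4}) bst.insert(k);
        System.out.println(bst.inOrder());    // [1, 3, 4, 6, 8, 10, 14]
        System.out.println(bst.contains(6));  // true
        System.out.println(bst.insert(6));    // false (duplicate)
    }
}
```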

4. AVL tree

An AVL tree is a height-balanced binary search tree. Since it is a binary search tree, an AVL tree must first satisfy:

  • If its left subtree is not empty, the value of all nodes on the left subtree is less than the value of its root node;
  • If its right subtree is not empty, the value of all nodes on the right subtree is greater than the value of its root node;
  • Its left and right subtrees are themselves binary search trees.

Properties of an AVL tree: for every node, the absolute difference between the heights of its left and right subtrees is at most 1, and every left and right subtree in the tree is itself an AVL tree. Each node has a balance factor (bf), defined as the height of its right subtree minus the height of its left subtree; in an AVL tree, every node's balance factor is -1, 0 or 1.

After a node is inserted or deleted, if the AVL condition is violated, rotations are performed to restructure the tree and restore the condition.

A rotation involves at least three levels of nodes, so after an update we must backtrack at least one level to find a node whose balance factor is invalid and rotate there. Rotation is needed in the following cases:

  • 1. When the balance factor of the current node's parent is 2, the parent's right subtree is taller than its left subtree. If the current node's balance factor is 1, its right subtree is also taller than its left, forming a "\" shape, and a single left rotation is needed; if the current node's balance factor is -1, its right subtree is shorter than its left, forming a ">" shape, and a right-left double rotation is needed.
  • 2. When the balance factor of the current node's parent is -2, the parent's left subtree is taller than its right subtree. If the current node's balance factor is -1, its left subtree is also taller than its right, forming a "/" shape, and a single right rotation is needed; if the current node's balance factor is 1, its right subtree is taller than its left, forming a "<" shape, and a left-right double rotation is needed.
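The rotation cases above map directly onto code. Below is a minimal Java sketch of AVL insertion, assuming integer keys and ignoring duplicates; the class and method names are hypothetical. It uses the same balance-factor convention as the text (right height minus left height) and checks balance on the way back up the recursion, which plays the role of the backtracking step described above. Deletion rebalances with the same rotations but is omitted.

```java
public class AvlTree {
    private static final class Node {
        final int key;
        int height = 1;  // a single node has height 1; an empty subtree has height 0
        Node left, right;
        Node(int key) { this.key = key; }
    }

    private Node root;

    private static int height(Node n) { return n == null ? 0 : n.height; }

    // Balance factor as defined in the text: right height minus left height
    private static int bf(Node n) { return height(n.right) - height(n.left); }

    private static void update(Node n) { n.height = 1 + Math.max(height(n.left), height(n.right)); }

    // Left rotation fixes the "\" shape: the right child becomes the subtree root
    private static Node rotateLeft(Node x) {
        Node y = x.right;
        x.right = y.left;
        y.left = x;
        update(x);
        update(y);
        return y;
    }

    // Right rotation fixes the "/" shape: the left child becomes the subtree root
    private static Node rotateRight(Node x) {
        Node y = x.left;
        x.left = y.right;
        y.right = x;
        update(x);
        update(y);
        return y;
    }

    private static Node rebalance(Node n) {
        update(n);
        if (bf(n) == 2) {                                         // right side too tall
            if (bf(n.right) < 0) n.right = rotateRight(n.right);  // ">" shape: right-left double rotation
            return rotateLeft(n);
        }
        if (bf(n) == -2) {                                        // left side too tall
            if (bf(n.left) > 0) n.left = rotateLeft(n.left);      // "<" shape: left-right double rotation
            return rotateRight(n);
        }
        return n;
    }

    private static Node insert(Node n, int key) {
        if (n == null) return new Node(key);
        if (key < n.key) n.left = insert(n.left, key);
        else if (key > n.key) n.right = insert(n.right, key);
        else return n;                 // duplicate: nothing to do
        return rebalance(n);           // checked on the way back up toward the root
    }

    public void insert(int key) { root = insert(root, key); }

    public int height() { return height(root); }

    public int rootKey() { return root.key; }

    public static void main(String[] args) {
        AvlTree tree = new AvlTree();
        for (int k = 1; k <= 7; k++) tree.insert(k);  // sorted input would turn a plain BST into a chain
        System.out.println(tree.height());   // 3
        System.out.println(tree.rootKey());  // 4
    }
}
```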

5. Red-black tree

A red-black tree is a self-balancing binary search tree that satisfies the following conditions:

  • 1. Every node is either red or black.
  • 2. The root node is black.
  • 3. Every leaf is a black empty (NIL) node.
  • 4. Both children of every red node are black. (No path from a leaf to the root can contain two consecutive red nodes.)
  • 5. Every path from a node to any of its descendant leaves contains the same number of black nodes.

Together, these properties ensure that the longest path from the root to a leaf is no more than twice as long as the shortest path.
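Properties 4 and 5 are what bound the path lengths: every path has the same number of black nodes, and red nodes can never be adjacent, so the longest path can at most insert one red node after each black node, doubling the length of an all-black shortest path. A hypothetical Java checker for properties 2, 4 and 5 might look like this, with `null` children standing in for the black NIL leaves of property 3. It does not verify BST ordering or perform any rebalancing.

```java
public class RedBlackCheck {
    static final boolean RED = true, BLACK = false;

    // Hypothetical node type for this sketch; null children represent black NIL leaves
    static final class Node {
        final int key;
        final boolean color;
        final Node left, right;
        Node(int key, boolean color, Node left, Node right) {
            this.key = key; this.color = color; this.left = left; this.right = right;
        }
    }

    static boolean isRed(Node n) { return n != null && n.color == RED; }

    // Black height of the subtree, counting the NIL leaf as one black node,
    // or -1 if property 4 (no red-red) or property 5 (equal black counts) is violated
    static int blackHeight(Node n) {
        if (n == null) return 1;  // property 3: NIL leaves are black
        if (n.color == RED && (isRed(n.left) || isRed(n.right))) return -1;
        int lh = blackHeight(n.left), rh = blackHeight(n.right);
        if (lh == -1 || rh == -1 || lh != rh) return -1;
        return lh + (n.color == BLACK ? 1 : 0);
    }

    // Property 2 (black root) plus properties 4 and 5
    static boolean isValid(Node root) {
        return !isRed(root) && blackHeight(root) != -1;
    }

    public static void main(String[] args) {
        // Valid: black 10 with red children 5 and 15
        Node ok = new Node(10, BLACK, new Node(5, RED, null, null), new Node(15, RED, null, null));
        // Invalid: red node 5 has a red child 3
        Node bad = new Node(10, BLACK,
                new Node(5, RED, new Node(3, RED, null, null), null),
                new Node(15, BLACK, null, null));
        System.out.println(isValid(ok));   // true
        System.out.println(isValid(bad));  // false
    }
}
```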

A red-black tree restores balance through recoloring, left rotations and right rotations; any imbalance is resolved within three rotations.

First of all, a red-black tree does not satisfy the AVL balance condition, under which the heights of the left and right subtrees of every node differ by at most 1. Instead, it colors each node red or black and accepts a looser balance in exchange for fewer rotations when nodes are inserted or deleted: any imbalance is resolved within three rotations. The AVL tree, by contrast, is strictly balanced, so depending on the case it may need more rotations than a red-black tree when nodes are added or removed. This is why insertion in a red-black tree is generally more efficient.

6. B-tree

"B-tree" is simply the B tree; do not read it as "B minus tree"!

  1. In theory, a binary search tree needs the fewest comparisons per lookup. In practice, however, we have to consider disk I/O.
  2. A database index is stored on disk. When the data volume is large, the index may be several gigabytes or more.
  3. When we query through an index, can we load the entire index file into memory? Obviously not. We can only load disk pages one at a time, where each disk page corresponds to a node of the index tree.

When searching with a binary search tree, in the worst case the number of disk I/Os equals the height of the index tree.

Therefore, to reduce the number of disk I/Os, we need to turn the original "tall and thin" tree into a "short and fat" one. This is one of the defining characteristics of the B-tree. A B-tree is a multi-way balanced search tree in which each node has at most m children; m is called the order of the B-tree, and its value depends on the disk page size. A B-tree of order m has the following characteristics:

1. The root node has at least two children (unless the root is also a leaf).

2. Each internal node contains k-1 elements and k children, where $\lceil m/2 \rceil \le k \le m$.

3. Each leaf node contains k-1 elements, where $\lceil m/2 \rceil \le k \le m$.

4. All leaf nodes are on the same level.

5. The elements in each node are sorted in ascending order, and a node's k-1 elements divide the key range into the k ranges covered by its children.

In this B-tree, suppose the key we want to find is 6. The query proceeds as follows:

  • Compared with the cost of disk I/O, the time spent comparing keys in memory is almost negligible. So as long as the tree is short enough and the number of I/Os is small enough, search performance improves.
  • By contrast, having more elements inside a node costs only a few extra in-memory comparisons, as long as the node does not exceed the size of a disk page. This is one of the advantages of B-trees.
  • Insert and delete operations in B-trees...
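The figure for this search is not reproduced here, but the idea can be sketched in Java. The tree below is a hand-built example of order 3 (it may not match the original figure), and `BTreeSearch`, `Node` and the `nodesRead` counter are hypothetical names. Each node visited stands for one disk page read; within a node the sketch uses `Arrays.binarySearch`, which models the cheap in-memory comparisons described above. A match can stop the search at an internal node, as in any B-tree.

```java
import java.util.Arrays;

public class BTreeSearch {
    // Hypothetical node: sorted keys; children.length == keys.length + 1, or 0 for a leaf
    static final class Node {
        final int[] keys;
        final Node[] children;
        Node(int[] keys, Node... children) { this.keys = keys; this.children = children; }
        boolean isLeaf() { return children.length == 0; }
    }

    // Nodes read by the most recent search; each read models one disk I/O
    static int nodesRead;

    static boolean search(Node root, int key) {
        nodesRead = 0;
        Node node = root;
        while (node != null) {
            nodesRead++;                                  // "load" this disk page
            int i = Arrays.binarySearch(node.keys, key);  // in-memory search within the page
            if (i >= 0) return true;                      // found in this node, internal or leaf
            if (node.isLeaf()) return false;
            node = node.children[-i - 1];                 // insertion point selects the child range
        }
        return false;
    }

    public static void main(String[] args) {
        Node root = new Node(new int[] {9},
                new Node(new int[] {2, 6},
                        new Node(new int[] {1}), new Node(new int[] {3, 5}), new Node(new int[] {8})),
                new Node(new int[] {12},
                        new Node(new int[] {11}), new Node(new int[] {13, 15})));
        boolean found = search(root, 6);
        System.out.println(found + " after " + nodesRead + " node reads");  // true after 2 node reads
        found = search(root, 5);
        System.out.println(found + " after " + nodesRead + " node reads");  // true after 3 node reads
    }
}
```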

7. B+ tree

The B+ tree is a variant of the B-tree that offers better query performance than the B-tree.

A B+ tree of order m has the following characteristics:

1. An internal node with k subtrees contains k elements (a B-tree node contains k-1). These elements store no data and serve only as an index; all data is stored in the leaf nodes.

2. The leaf nodes together contain all elements, along with pointers to the records containing those elements, and the leaves are linked in ascending key order.

3. Every element of an internal node also appears in its child nodes, where it is the largest (or smallest) element of that child.

Because the elements of a parent node also appear in its children, the leaf nodes together hold the complete set of elements, and each leaf has a pointer to the next leaf, forming an ordered linked list.

The B+ tree has one more feature that lies outside indexing but is crucial: satellite data.

Satellite data is the data record that an index element points to, such as a row in a database table. In a B-tree, both internal nodes and leaf nodes carry satellite data.

Satellite data in B-trees

In a B+ tree, only the leaf nodes carry satellite data; the internal nodes are purely index nodes and have no associated satellite data.

It is worth adding that in a database's clustered index, the leaf nodes contain the satellite data directly, whereas in a non-clustered index, the leaf nodes hold pointers to the satellite data.

Finding element 3 in a B+ tree proceeds as follows:

First disk I/O:

Second disk I/O:

Third disk I/O:

Unlike the B-tree, the B+ tree carries no satellite data in its internal nodes, so a disk page of the same size can hold more node elements. For the same amount of data, a B+ tree is therefore "shorter and fatter" than a B-tree and needs fewer I/Os per query.

Second, a B+ tree query must always reach a leaf node, whereas a B-tree query stops as soon as it finds a matching element, whether that element is in an internal node or a leaf node.

Therefore, B-tree search performance is not stable: in the best case the key is found at the root, and in the worst case the search goes all the way to a leaf. Every B+ tree search, by contrast, descends from the root to a leaf, so its cost is stable.

Advantages of B+ tree:

1. A single node stores more elements, resulting in fewer IOs for queries.

2. Every query must reach a leaf node, so query performance is stable.

3. All leaf nodes form an ordered linked list, which is convenient for range query.
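Advantage 3 is what makes range queries cheap. The following Java sketch, built by hand with a hypothetical `BPlusRangeQuery` class, follows the "largest element" convention from characteristic 3: it descends once from the root to the first leaf that can contain the lower bound, then walks the leaf linked list until it passes the upper bound. Insertion and node splitting are deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;

public class BPlusRangeQuery {
    // Hypothetical node: each internal key is the largest key of the matching child;
    // leaves hold the data keys and are chained through `next`.
    static final class Node {
        final int[] keys;
        final Node[] children;  // null for leaves
        Node next;              // leaf-level linked list pointer
        Node(int[] keys, Node[] children) { this.keys = keys; this.children = children; }
    }

    static Node newLeaf(int... keys) { return new Node(keys, null); }

    // Collects all keys in [lo, hi]: one root-to-leaf descent, then a scan along the leaf links
    static List<Integer> range(Node root, int lo, int hi) {
        List<Integer> out = new ArrayList<>();
        Node node = root;
        while (node.children != null) {
            int i = 0;
            while (i < node.keys.length && node.keys[i] < lo) i++;  // first child whose max >= lo
            if (i == node.keys.length) return out;                   // every key is below lo
            node = node.children[i];
        }
        for (Node leaf = node; leaf != null; leaf = leaf.next) {
            for (int k : leaf.keys) {
                if (k > hi) return out;  // keys ascend along the list, so stop early
                if (k >= lo) out.add(k);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Node l1 = newLeaf(1, 3), l2 = newLeaf(5, 8), l3 = newLeaf(9, 10), l4 = newLeaf(11, 15);
        l1.next = l2; l2.next = l3; l3.next = l4;  // ordered linked list of leaves
        Node a = new Node(new int[] {3, 8}, new Node[] {l1, l2});
        Node b = new Node(new int[] {10, 15}, new Node[] {l3, l4});
        Node root = new Node(new int[] {8, 15}, new Node[] {a, b});
        System.out.println(range(root, 4, 10));  // [5, 8, 9, 10]
    }
}
```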

8. Dictionary tree

Also known as a word-search tree or trie, the dictionary tree is a tree structure and a variant of the hash tree. Its typical applications are counting, sorting and storing large numbers of strings, so search engines often use it for text word-frequency statistics. Its advantage is that it exploits common prefixes of strings to reduce query time, minimizing unnecessary string comparisons, so its query efficiency can be higher than that of a hash table.

Properties

  • The root node contains no characters, and every node except the root node contains only one character;
  • The characters on the path from the root to a node, concatenated, form the string corresponding to that node.
  • All children of each node contain different characters.

Implementation

Steps to search for a key:

  • Start a search from the root node;
  • Get the first letter of the keyword to be searched, and select the corresponding subtree according to the letter to continue the search;
  • In that subtree, take the second letter of the key and again select the corresponding subtree to continue the search;
  • Repeat this process;
  • When all letters of the key have been consumed at some node, read the information attached to that node to complete the search.
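The search steps above can be sketched as a small Java trie. This is a minimal illustration, not a production implementation; the class name `Trie` and the `isWord` flag (standing in for "the information attached to the node") are assumptions. Children are kept in a `HashMap` keyed by character, and a lookup touches one node per letter, independent of how many words are stored.

```java
import java.util.HashMap;
import java.util.Map;

public class Trie {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();  // one child per distinct next letter
        boolean isWord;                                         // attached information: a word ends here
    }

    private final Node root = new Node();  // the root itself holds no character

    public void insert(String word) {
        Node node = root;
        for (char c : word.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.isWord = true;
    }

    // Takes the key one letter at a time and descends into the matching subtree
    public boolean contains(String word) {
        Node node = root;
        for (char c : word.toCharArray()) {
            node = node.children.get(c);
            if (node == null) return false;  // no subtree for this letter: key absent
        }
        return node.isWord;                  // all letters consumed: read the node's flag
    }

    public static void main(String[] args) {
        Trie trie = new Trie();
        for (String w : new String[] {"tea", "ten", "to", "inn"}) trie.insert(w);
        System.out.println(trie.contains("ten"));  // true
        System.out.println(trie.contains("te"));   // false: "te" is only a prefix
    }
}
```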

Applications

(1) Fast retrieval of strings

Given a list of N familiar words and an article written in lowercase English, output every new word (one not in the familiar list) in the order in which it first appears.

  • Method 1: compare each word in the article with every word in the familiar list. With n words in the article, this takes $O(n \cdot N)$ word comparisons.
  • Method 2: use a hash table.
  • Method 3: build a trie from the familiar word list and look each article word up in it. Building the trie takes $O(N)$ time. The cost of each lookup depends only on the depth of the trie, not on the number of words in the familiar list; the depth is bounded by word length, and since the longest word has no more than 30 characters, each lookup is $O(1)$. The trie can also save space compared with the other methods, because common prefixes are shared instead of storing many repeated characters.

(2) Sorting of strings

Given N distinct English names, each consisting of a single word, output them in lexicographic order from smallest to largest.

Build a trie from the names, storing each node's children in an array. Because every node's children are then naturally ordered alphabetically, a preorder traversal of the trie outputs the names in sorted order.
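A minimal Java sketch of this approach follows, assuming the names contain only lowercase letters `a` to `z` (other input would need to be normalized first); `TrieSort` is a hypothetical name. Because each node stores its children in a 26-slot array, visiting the slots in index order visits letters alphabetically, and emitting a word before descending further puts shorter names such as `al` ahead of `alice`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TrieSort {
    // Children indexed by letter, so iterating the array is alphabetical.
    // Assumes every word uses only lowercase 'a'..'z'.
    private static final class Node {
        final Node[] next = new Node[26];
        boolean isWord;
    }

    static List<String> sort(List<String> words) {
        Node root = new Node();
        for (String w : words) {
            Node node = root;
            for (char c : w.toCharArray()) {
                int i = c - 'a';
                if (node.next[i] == null) node.next[i] = new Node();
                node = node.next[i];
            }
            node.isWord = true;
        }
        List<String> out = new ArrayList<>();
        preorder(root, new StringBuilder(), out);
        return out;
    }

    // Preorder: emit the word ending at this node, then visit children from 'a' to 'z'
    private static void preorder(Node node, StringBuilder prefix, List<String> out) {
        if (node.isWord) out.add(prefix.toString());
        for (int i = 0; i < 26; i++) {
            if (node.next[i] != null) {
                prefix.append((char) ('a' + i));
                preorder(node.next[i], prefix, out);
                prefix.deleteCharAt(prefix.length() - 1);  // backtrack
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(sort(Arrays.asList("tom", "alice", "bob", "al")));  // [al, alice, bob, tom]
    }
}
```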

(3) Longest common prefix

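Build a trie from all the strings. The longest common prefix is the path that descends from the root while each node has exactly one child and no string ends there.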
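One way to sketch this in Java, with a hypothetical `TrieLcp` class: after inserting every string, walk down from the root while the current node has exactly one child and no string ends at it; the characters collected on the way form the longest common prefix. If any input string is empty, the walk stops at the root and the result is the empty string.

```java
import java.util.HashMap;
import java.util.Map;

public class TrieLcp {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean isWord;  // some input string ends at this node
    }

    static String longestCommonPrefix(String... words) {
        if (words.length == 0) return "";
        Node root = new Node();
        for (String w : words) {
            Node node = root;
            for (char c : w.toCharArray()) node = node.children.computeIfAbsent(c, k -> new Node());
            node.isWord = true;
        }
        StringBuilder prefix = new StringBuilder();
        Node node = root;
        // Still shared by all strings: exactly one way forward and no string has ended
        while (node.children.size() == 1 && !node.isWord) {
            Map.Entry<Character, Node> only = node.children.entrySet().iterator().next();
            prefix.append(only.getKey());
            node = only.getValue();
        }
        return prefix.toString();
    }

    public static void main(String[] args) {
        System.out.println(longestCommonPrefix("flower", "flow", "flight"));  // fl
    }
}
```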

9. Skip list

10. HashMap

Java defines the interface java.util.Map for the map data structure. It has four commonly used implementation classes: HashMap, Hashtable, LinkedHashMap and TreeMap. The class inheritance relationships are shown in the following figure:
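The figure is not reproduced here, but the practical differences between these four classes show up in a few lines of Java. The sketch uses only standard `java.util` classes; the class and method names `MapImplementations` and `keyOrder` are hypothetical. It shows that LinkedHashMap keeps insertion order, TreeMap keeps keys sorted, HashMap makes no ordering guarantee, and Hashtable, unlike HashMap, rejects `null` keys.

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

public class MapImplementations {
    // Puts the same three keys into the given map and reports the resulting key order
    static String keyOrder(Map<String, Integer> map) {
        map.put("banana", 2);
        map.put("apple", 1);
        map.put("cherry", 3);
        return String.join(",", map.keySet());
    }

    public static void main(String[] args) {
        System.out.println(keyOrder(new LinkedHashMap<>()));  // banana,apple,cherry (insertion order)
        System.out.println(keyOrder(new TreeMap<>()));        // apple,banana,cherry (sorted by key)
        System.out.println(keyOrder(new HashMap<>()));        // order follows hash buckets; not guaranteed

        Map<String, Integer> hashMap = new HashMap<>();
        hashMap.put(null, 0);                  // HashMap permits one null key
        System.out.println(hashMap.get(null)); // 0

        try {
            new Hashtable<String, Integer>().put(null, 0);  // Hashtable rejects null keys
        } catch (NullPointerException e) {
            System.out.println("Hashtable: null key rejected");
        }
    }
}
```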

11. ConcurrentHashMap

CAS lock-free algorithm

How it works

  • CAS stands for Compare And Swap.
  • The java.util.concurrent package uses CAS to implement optimistic locking, as an alternative to synchronized (pessimistic) locks.
  • CAS has three operands: the memory value V, the expected old value A, and the new value B. It sets V to B if and only if V equals A; otherwise it does nothing.
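The classic use of CAS is an optimistic retry loop. The sketch below uses the standard `AtomicInteger.compareAndSet(expect, update)` method from `java.util.concurrent.atomic`; the `CasCounter` class is hypothetical. In practice `AtomicInteger.incrementAndGet()` already does this, so the explicit loop is only there to make V, A and B visible.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CasCounter {
    private final AtomicInteger value = new AtomicInteger();  // holds the memory value V

    // Optimistic increment: read A, compute B = A + 1, retry if another thread changed V meanwhile
    public int increment() {
        while (true) {
            int expected = value.get();  // A: the value we expect V to still hold
            int next = expected + 1;     // B: the value we want to write
            if (value.compareAndSet(expected, next)) {
                return next;             // V equaled A, so V is now B
            }
            // V no longer equals A: another thread won the race, so loop and try again
        }
    }

    public int get() { return value.get(); }

    public static void main(String[] args) throws InterruptedException {
        CasCounter counter = new CasCounter();
        Runnable task = () -> { for (int i = 0; i < 10_000; i++) counter.increment(); };
        Thread t1 = new Thread(task), t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println(counter.get());  // 20000: no update is lost, and no lock is taken

        AtomicInteger v = new AtomicInteger(5);
        System.out.println(v.compareAndSet(4, 9));  // false: V (5) != A (4), nothing changes
        System.out.println(v.compareAndSet(5, 9));  // true: V becomes 9
    }
}
```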

12. ConcurrentLinkedQueue

Delayed update of the tail node

Delayed removal of the head node

13. Top-k problem

14. The idea of resource pools

15. JVM memory management algorithm

16. Container virtualization technology, Docker's idea

17. Continuous integration, continuous release, Jenkins

So as not to affect the reading experience, the complete 2022 Alibaba Interviewer Handbook has been packaged separately. I hope these interview questions help everyone in this year's "golden March, silver April" hiring season!


Origin blog.csdn.net/SQY0809/article/details/123575759