Red-black tree, B(+) tree, skip list, AVL and other data structures: application scenarios and analysis

I went through some material online.

Mainly this Zhihu question: https://www.zhihu.com/question/30527705

AVL tree: one of the earliest balanced binary search trees. It has relatively few applications compared with the other data structures here. Windows uses AVL trees to manage a process's address space.

Red-black tree: a balanced binary tree, widely used in the C++ STL; in the common implementations both map and set are red-black trees. The familiar STL map container is built on an RB-tree underneath (this of course excludes unordered_map, which is a hash table).

B-tree/B+ tree: used to index data organized in disk files and for database indexes.
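The advantage on disk comes from tree height: each level visited during a lookup can cost one disk read, so wide nodes keep the tree shallow. A minimal sketch, assuming idealized, completely full nodes and a hypothetical fan-out of 1000 (real fan-out depends on page size and key size):

```python
def levels_needed(n_keys: int, fanout: int) -> int:
    """Smallest height h such that fanout ** h >= n_keys.

    Simplified model: every node is full, so a tree of height h
    can address fanout ** h entries at its bottom level.
    """
    height = 1
    capacity = fanout
    while capacity < n_keys:
        capacity *= fanout
        height += 1
    return height


# One billion keys: a binary tree versus a wide multi-way tree.
print(levels_needed(10**9, 2), levels_needed(10**9, 1000))  # 30 3
```

Integer arithmetic is used instead of `math.log` to avoid floating-point rounding at exact powers.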

Trie (dictionary tree): used to count and sort large numbers of strings.
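Both uses fall out of the trie's shape: a counter at the node where a string ends handles counting, and a depth-first walk that visits children in character order emits strings in sorted order. A minimal sketch; the class and method names are hypothetical:

```python
from typing import Dict, Iterator, Tuple


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.count = 0  # how many inserted strings end exactly at this node


class Trie:
    def __init__(self) -> None:
        self.root = TrieNode()

    def insert(self, word: str) -> None:
        node = self.root
        for ch in word:  # shared prefixes reuse existing nodes
            node = node.children.setdefault(ch, TrieNode())
        node.count += 1

    def items(self) -> Iterator[Tuple[str, int]]:
        """Yield (word, count) pairs in lexicographic order (depth-first, children sorted)."""
        def walk(node: TrieNode, prefix: str) -> Iterator[Tuple[str, int]]:
            if node.count:
                yield prefix, node.count
            for ch in sorted(node.children):
                yield from walk(node.children[ch], prefix + ch)
        yield from walk(self.root, "")


trie = Trie()
for w in ["banana", "apple", "app", "banana", "apple", "apple"]:
    trie.insert(w)
print(list(trie.items()))  # [('app', 1), ('apple', 3), ('banana', 2)]
```

A word is yielded before its extensions ("app" before "apple"), which matches lexicographic order because a prefix sorts before any longer string that starts with it.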

------

AVL is a highly balanced binary tree. The usual consequence is that the cost of maintaining such strict balance outweighs the efficiency gained from it, so it has few practical applications; more often people use the red-black tree, which pursues local rather than strict global balance. Of course, if a workload has infrequent insertions and deletions and places special demands on lookups, AVL is still better than a red-black tree.

Red-black trees have many applications. Besides the STL mentioned above, there are: the epoll implementation in the Linux kernel, which uses a red-black tree to manage event blocks; nginx, which uses a red-black tree to manage timers; Java's TreeMap implementation; and the well-known Linux process scheduler, the Completely Fair Scheduler (CFS), which uses a red-black tree to manage process control blocks.

B and B+ trees are mainly used for indexing in file systems and databases, for example MySQL's B-tree indexes.

A typical application of the trie is prefix matching. A very common scenario: as we type, a search engine suggests completions. IP routing is also prefix matching and uses tries to some extent.

------

Skip list: Redis uses a skip list, rather than a red-black tree, to store and manage the elements of a sorted set (zset). The top-level keyspace itself is a hash table, and each value type has its own data structures. Note that this is the skiplist, not the ziplist. The ziplist is a very memory-efficient list encoding in Redis (at the cost of slightly lower performance), so when a hash has very few elements (for example, only a few dozen), storing it this way costs very little performance while saving a lot of memory, which matters because Redis is an in-memory database.

With that settled, the question becomes: on the server side, when both concurrency and performance matter, how do you choose an appropriate data structure (here, a skip list versus a red-black tree)? On raw performance alone, the two differ little, but a concurrent environment is a different story.
When data is updated, a skip list changes fewer nodes and therefore needs fewer locks, so the cost of threads contending for locks is relatively small. A red-black tree has a rebalancing process that can involve many nodes, so lock contention is relatively expensive and it performs worse than the skip list. Put another way: when a red-black tree inserts or deletes, it may need rebalancing operations that propagate toward the root and touch other parts of the tree, whereas skip list operations are clearly more local. Locks need to be held on fewer nodes, so performance is better in such cases.

In addition, here are the Redis author's own reasons for using skip lists:

Here is what the developer said about why he chose the skip list:

There are a few reasons:
1) They are not very memory intensive. It's up to you basically.
Changing parameters about the probability of a node to have a given number of levels will make them less memory intensive
than btrees.
Note: one disadvantage of skip lists is memory consumption (each node can appear on several levels, with one pointer per level), but the author says the parameters can be tuned to reduce memory use to a level comparable with balanced tree structures.
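The "probability" parameter works like this: a node starts at level 1 and is promoted one more level with probability p, so it carries on average 1/(1-p) forward pointers (ignoring the level cap). With p = 1/2 that is 2 pointers per node; with p = 1/4 it is about 1.33. A minimal sketch to check this empirically; the function names and the cap of 32 are illustrative assumptions:

```python
import random


def random_level(p: float, max_level: int, rng: random.Random) -> int:
    """Start at level 1 and keep promoting the node with probability p (capped at max_level)."""
    level = 1
    while level < max_level and rng.random() < p:
        level += 1
    return level


def average_pointers(p: float, samples: int = 100_000, max_level: int = 32, seed: int = 42) -> float:
    """Empirical mean number of forward pointers per node (one pointer per level)."""
    rng = random.Random(seed)
    total = sum(random_level(p, max_level, rng) for _ in range(samples))
    return total / samples


for p in (0.5, 0.25):
    print(p, round(average_pointers(p), 3), "expected about", round(1 / (1 - p), 3))
```

The trade-off of a smaller p is that upper levels thin out faster, so a search may take more horizontal steps per level.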
2) A sorted set is often target of many ZRANGE or ZREVRANGE operations, that is, traversing the skip list as a linked list.
With this operation the cache locality of skip lists is at least as good as with other kind of balanced trees.
Note: Redis has range operations, and they are easy to implement by walking the skip list's bottom level, which is a doubly linked list. In addition, its cache locality is no worse than that of a balanced tree.
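Point 2 says range queries traverse the skip list as a linked list. A minimal single-threaded sketch, assuming (score, member) pairs with unique members and illustrative parameters P = 0.25 and MAX_LEVEL = 16 (all names are hypothetical, not Redis's API): a search descends from the top level to the start point, then the bottom level is walked forward, and a backward pointer allows reverse traversal in the spirit of ZREVRANGE.

```python
import random
from typing import List, Optional, Tuple


class Node:
    def __init__(self, score: float, member: str, level: int) -> None:
        self.score = score
        self.member = member
        self.forward: List[Optional["Node"]] = [None] * level  # next node on each level
        self.backward: Optional["Node"] = None  # previous node on the bottom level


class SkipList:
    MAX_LEVEL = 16  # illustrative cap on node height
    P = 0.25        # illustrative promotion probability

    def __init__(self, seed: int = 0) -> None:
        self.head = Node(float("-inf"), "", self.MAX_LEVEL)  # sentinel, holds no data
        self.tail: Optional[Node] = None
        self.level = 1
        self.rng = random.Random(seed)

    def _random_level(self) -> int:
        level = 1
        while level < self.MAX_LEVEL and self.rng.random() < self.P:
            level += 1
        return level

    def insert(self, score: float, member: str) -> None:
        key = (score, member)
        update: List[Node] = [self.head] * self.MAX_LEVEL  # last node before key on each level
        node = self.head
        for i in range(self.level - 1, -1, -1):
            while node.forward[i] is not None and (node.forward[i].score, node.forward[i].member) < key:
                node = node.forward[i]
            update[i] = node
        level = self._random_level()
        self.level = max(self.level, level)  # levels above the old top start from the head
        new = Node(score, member, level)
        for i in range(level):
            new.forward[i] = update[i].forward[i]
            update[i].forward[i] = new
        new.backward = None if update[0] is self.head else update[0]
        if new.forward[0] is not None:
            new.forward[0].backward = new
        else:
            self.tail = new

    def range_by_score(self, lo: float, hi: float) -> List[Tuple[str, float]]:
        """Descend to the first node with score >= lo, then walk the bottom level."""
        node = self.head
        for i in range(self.level - 1, -1, -1):
            while node.forward[i] is not None and node.forward[i].score < lo:
                node = node.forward[i]
        node = node.forward[0]
        out = []
        while node is not None and node.score <= hi:
            out.append((node.member, node.score))
            node = node.forward[0]
        return out

    def reversed_members(self) -> List[str]:
        """Walk backward from the tail, like a reverse range scan."""
        out = []
        node = self.tail
        while node is not None:
            out.append(node.member)
            node = node.backward
        return out


sl = SkipList()
for member, score in [("carol", 3.0), ("alice", 1.0), ("dave", 4.0), ("bob", 2.0)]:
    sl.insert(score, member)
print(sl.range_by_score(2, 3))   # [('bob', 2.0), ('carol', 3.0)]
print(sl.reversed_members())     # ['dave', 'carol', 'bob', 'alice']
```

Only the descent costs O(log N) on average; after that, each further element in the range costs a single pointer hop along level 0.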
3) They are simpler to implement, debug, and so forth. For instance thanks to the skip list simplicity I received a patch 
(already in Redis master) with augmented skip lists implementing ZRANK in O(log(N)). It required little changes to the code.
Note: simple to implement; the ZRANK operation can be done in O(log(N)).
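The O(log N) ZRANK in point 3 comes from augmenting each forward pointer with a "span", the number of bottom-level nodes it jumps over; Redis's skip list level entries carry such a span field. Summing the spans along the normal top-down search path yields the rank. The sketch below is a simplified Python re-creation of that idea with hypothetical names (`RankedSkipList`, `rank`), not Redis's code; deletion is omitted.

```python
import random
from typing import List, Optional, Tuple


class Node:
    def __init__(self, key: Tuple[float, str], level: int) -> None:
        self.key = key
        self.forward: List[Optional["Node"]] = [None] * level
        self.span: List[int] = [0] * level  # bottom-level nodes crossed by each forward link


class RankedSkipList:
    MAX_LEVEL = 16
    P = 0.25

    def __init__(self, seed: int = 0) -> None:
        self.head = Node((float("-inf"), ""), self.MAX_LEVEL)
        self.level = 1
        self.length = 0
        self.rng = random.Random(seed)

    def _random_level(self) -> int:
        level = 1
        while level < self.MAX_LEVEL and self.rng.random() < self.P:
            level += 1
        return level

    def insert(self, score: float, member: str) -> None:
        key = (score, member)
        update: List[Node] = [self.head] * self.MAX_LEVEL
        rank = [0] * self.MAX_LEVEL  # 1-based position of update[i]; the head counts as 0
        x = self.head
        for i in range(self.level - 1, -1, -1):
            rank[i] = 0 if i == self.level - 1 else rank[i + 1]
            while x.forward[i] is not None and x.forward[i].key < key:
                rank[i] += x.span[i]
                x = x.forward[i]
            update[i] = x
        lvl = self._random_level()
        if lvl > self.level:
            for i in range(self.level, lvl):
                rank[i] = 0
                update[i] = self.head
                self.head.span[i] = self.length  # a new top link initially spans the whole list
            self.level = lvl
        new = Node(key, lvl)
        for i in range(lvl):
            new.forward[i] = update[i].forward[i]
            update[i].forward[i] = new
            new.span[i] = update[i].span[i] - (rank[0] - rank[i])
            update[i].span[i] = rank[0] - rank[i] + 1
        for i in range(lvl, self.level):
            update[i].span[i] += 1  # higher links now jump over one more node
        self.length += 1

    def rank(self, score: float, member: str) -> int:
        """1-based rank of (score, member), or 0 if absent."""
        key = (score, member)
        r = 0
        x = self.head
        for i in range(self.level - 1, -1, -1):
            while x.forward[i] is not None and x.forward[i].key <= key:
                r += x.span[i]
                x = x.forward[i]
            if x is not self.head and x.key == key:
                return r
        return 0


zs = RankedSkipList()
for member, score in [("carol", 3.0), ("alice", 1.0), ("bob", 2.0)]:
    zs.insert(score, member)
print(zs.rank(2.0, "bob"))  # 2
```

The change to insertion is small (maintaining `rank[]` and the spans), which fits the author's remark that the patch "required little changes to the code".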
About the Append Only durability & speed, I don't think it is a good idea to optimize Redis at cost of more code
and more complexity for a use case that IMHO should be rare for the Redis target (fsync() at every command).
Almost no one is using this feature even with ACID SQL databases, as the performance hit is big anyway.
About threads: our experience shows that Redis is mostly I/O bound. I'm using threads to serve things from Virtual Memory.
The long term solution to exploit all the cores, assuming your link is so fast that you can saturate a single core,
is running multiple instances of Redis (no locks, almost fully scalable linearly with number of cores),
and using the "Redis Cluster" solution that I plan to develop in the future.

The text above uses some English abbreviations; here is a short list:

imho, imo (in my humble opinion, in my opinion): common in forums; means "in my opinion".

idk (I don't know): I don't know.

rofl (rolling on the floor laughing): laughing until you fall to the floor.

roflmao (rolling on the floor laughing my ass off): an intensified version of rofl, meaning something is extremely funny.

sth (something): something.

nth (nothing): nothing.

plz (please): Please. "Please" ends with a z sound, so it is abbreviated as plz based on pronunciation.

thx (thanks): Thank you. Based on pronunciation, the "ks" sound at the end of "thanks" is written as the letter x.

Comparing red-black trees and B(+) trees from an engineering perspective:

Several existing answers analyze this from an algorithmic perspective. I will try to distinguish the application scenarios of red-black trees and B+ trees from an engineering point of view. A red-black tree node stores only one key-value pair, so the tree can be implemented intrusively, like an embedded linked list: the data structure itself does not manage memory, which makes it lighter, more flexible to use, and more memory-efficient. For example, one node can be in several trees or linked lists at the same time, which is common in the kernel.

As for the B+ tree, since each node stores multiple key-value pairs, node memory is generally managed by the data structure itself; it is a container in the true sense. Compared with an intrusively implemented red-black tree, its advantages are that it is simple and self-contained to use, memory is easier to manage in a lock-free way, and the CPU cache hit rate is higher because one node holds multiple key-value pairs. So high-concurrency indexes implemented in user space generally choose the B+ tree.

As for B-tree versus B+ tree: the internal nodes of a B-tree also store values, unlike those of a B+ tree. For the same fan-out, B-tree nodes are therefore larger, and their CPU cache hit rate is relatively worse than a B+ tree's.
In addition, the scanning feature of the B+ tree (leaf nodes connected by a linked list) is hard to support in a lock-free setting (I haven't seen a solution yet), so none of the lock-free B+ trees I have seen so far link their leaf nodes.
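The scanning feature relies on leaves being chained together, with internal nodes holding only separator keys. The sketch below is a simplified, single-threaded, read-only B+ tree built by bulk loading from sorted pairs (an assumption that avoids node splits entirely); it is not lock-free and not how any particular database implements it. The names and the order of 4 are illustrative.

```python
from bisect import bisect_left, bisect_right
from typing import Any, List, Optional, Tuple, Union


class Leaf:
    def __init__(self, keys: List[int], values: List[Any]) -> None:
        self.keys = keys
        self.values = values
        self.next: Optional["Leaf"] = None  # sibling link that makes range scans cheap


class Internal:
    def __init__(self, keys: List[int], children: List[Union["Internal", Leaf]]) -> None:
        self.keys = keys  # separator keys only; internal nodes hold no values
        self.children = children


def _min_key(node: Union[Internal, Leaf]) -> int:
    while isinstance(node, Internal):
        node = node.children[0]
    return node.keys[0]


def bulk_load(pairs: List[Tuple[int, Any]], order: int = 4) -> Union[Internal, Leaf]:
    """Build a read-only tree from non-empty, key-sorted (key, value) pairs."""
    leaves = [Leaf([k for k, _ in pairs[i:i + order]], [v for _, v in pairs[i:i + order]])
              for i in range(0, len(pairs), order)]
    for left, right in zip(leaves, leaves[1:]):
        left.next = right
    level: List[Union[Internal, Leaf]] = list(leaves)
    while len(level) > 1:
        # separator j is the smallest key reachable through child j + 1
        level = [Internal([_min_key(c) for c in level[i + 1:i + order]], level[i:i + order])
                 for i in range(0, len(level), order)]
    return level[0]


def find_leaf(root: Union[Internal, Leaf], key: int) -> Leaf:
    node = root
    while isinstance(node, Internal):
        node = node.children[bisect_right(node.keys, key)]
    return node


def lookup(root: Union[Internal, Leaf], key: int) -> Optional[Any]:
    leaf = find_leaf(root, key)
    i = bisect_left(leaf.keys, key)
    return leaf.values[i] if i < len(leaf.keys) and leaf.keys[i] == key else None


def range_scan(root: Union[Internal, Leaf], lo: int, hi: int) -> List[Tuple[int, Any]]:
    """Descend once to the leaf for lo, then follow sibling links instead of re-descending."""
    out = []
    leaf: Optional[Leaf] = find_leaf(root, lo)
    while leaf is not None:
        start = bisect_left(leaf.keys, lo)
        for k, v in zip(leaf.keys[start:], leaf.values[start:]):
            if k > hi:
                return out
            out.append((k, v))
        leaf = leaf.next
    return out


root = bulk_load([(k, f"row{k}") for k in range(0, 40, 2)], order=4)
print(range_scan(root, 7, 15))  # [(8, 'row8'), (10, 'row10'), (12, 'row12'), (14, 'row14')]
```

The `next` pointers are exactly what is hard to keep consistent without locks once concurrent splits are allowed.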

Analyzing the application scenarios of each data structure from its own characteristics:

Red-black trees and AVL trees are mainly used for searching.

AVL tree: a balanced binary tree whose balance is usually tracked with a balance factor and restored by rotations. The heights of the left and right subtrees of every node differ by at most 1, so compared with the red-black tree it is strictly balanced, and the balance condition is very strict. Whenever an insertion or deletion violates this condition, balance must be restored by rotation. Since this rebalancing adds cost to updates, we can infer that AVL trees suit workloads with relatively few insertions and deletions but many searches.

Red-black tree: a balanced binary tree that achieves approximate balance by constraining the colors of the nodes on every simple path from the root to a leaf, which guarantees that no path is more than twice as long as any other.
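The color constraint can be made concrete with a checker. Two rules do the work: a red node has no red child, and every root-to-leaf path contains the same number of black nodes. With b black nodes on every path and no two reds in a row, no path can exceed 2b nodes while none is shorter than b, which is the factor-of-two bound. A minimal sketch with hypothetical names, assuming `None` children stand for black NIL leaves; it checks only the color rules, not key ordering:

```python
from typing import Optional

RED, BLACK = "red", "black"


class RBNode:
    def __init__(self, key: int, color: str,
                 left: Optional["RBNode"] = None, right: Optional["RBNode"] = None) -> None:
        self.key = key
        self.color = color
        self.left = left
        self.right = right


def black_height(node: Optional[RBNode]) -> int:
    """Black-height of the subtree; raises ValueError if a color rule is broken."""
    if node is None:
        return 1  # NIL leaves count as black
    if node.color == RED:
        for child in (node.left, node.right):
            if child is not None and child.color == RED:
                raise ValueError(f"red node {node.key} has a red child")
    left, right = black_height(node.left), black_height(node.right)
    if left != right:
        raise ValueError(f"unequal black heights below {node.key}")
    return left + (1 if node.color == BLACK else 0)


def is_red_black(root: Optional[RBNode]) -> bool:
    if root is not None and root.color != BLACK:
        return False  # the root must be black
    try:
        black_height(root)
        return True
    except ValueError:
        return False


valid = RBNode(2, BLACK, RBNode(1, RED), RBNode(3, RED))
red_red = RBNode(1, BLACK, None, RBNode(2, RED, None, RBNode(3, RED)))
print(is_red_black(valid), is_red_black(red_red))  # True False
```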
Therefore, compared with the AVL tree, which requires strict balance, it rotates less often to keep balance. For searching, we use a red-black tree instead of an AVL tree when there are many insertions and deletions. (Some scenarios now use skip lists instead of red-black trees; search for "Why does Redis use skip lists instead of red-black trees?")

B-tree, B+ tree: they share the same characteristics. Both are multi-way search trees, generally used in database systems. Why? Because their many branches per node keep the number of levels small. Disk IO is very time-consuming, and large amounts of data are stored on disk, so we need to reduce the number of disk IOs and avoid frequent disk seeks. The B+ tree is a variant of the B-tree: a node with n subtrees contains n keys, the keys in internal nodes store no data and serve only for indexing, and all data is stored in the leaf nodes. It was designed with file systems in mind.

Trie: also known as a word lookup tree, a tree-like structure often used to process strings. Different strings that share a prefix store that prefix only once. Compared with storing each string directly, this certainly saves space, but it still consumes a lot of memory (yes, memory) when storing many strings.
Related structures include the prefix tree, the suffix tree, the radix tree (Patricia tree, compact prefix tree), the crit-bit tree (which addresses the memory consumption problem), and the double-array trie. To briefly add the applications as I understand them:
Prefix tree: fast string retrieval, string sorting, longest common prefix, and prefix autocompletion with suffix display.
Suffix tree: counting the occurrences of string s1 in s2, locating s1 in s2, finding the longest common substring of s1 and s2, and finding the longest palindrome.
Radix tree: used in the Linux kernel and nginx.
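Two of the prefix-tree applications listed above, autocompletion and longest common prefix, reduce to short walks over the trie. A minimal sketch; `build_trie`, `complete`, and `longest_common_prefix` are hypothetical helper names, and the result limit is an illustrative parameter:

```python
from typing import Dict, List


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.is_word = False


def build_trie(words: List[str]) -> TrieNode:
    root = TrieNode()
    for w in words:
        node = root
        for ch in w:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True
    return root


def complete(root: TrieNode, prefix: str, limit: int = 10) -> List[str]:
    """Follow the prefix, then list up to `limit` stored words below it in sorted order."""
    node = root
    for ch in prefix:
        node = node.children.get(ch)
        if node is None:
            return []  # no stored word starts with this prefix
    results: List[str] = []

    def walk(n: TrieNode, suffix: str) -> None:
        if len(results) >= limit:
            return
        if n.is_word:
            results.append(prefix + suffix)
        for ch in sorted(n.children):
            walk(n.children[ch], suffix + ch)

    walk(node, "")
    return results


def longest_common_prefix(root: TrieNode) -> str:
    """Descend while there is exactly one branch and no word ends at the current node."""
    out = []
    node = root
    while len(node.children) == 1 and not node.is_word:
        ch, node = next(iter(node.children.items()))
        out.append(ch)
    return "".join(out)


root = build_trie(["flower", "flow", "flight"])
print(longest_common_prefix(root))  # fl
print(complete(root, "flo"))        # ['flow', 'flower']
```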

For an introduction to red-black trees, see these two articles, "The clearest explanation of red-black trees in history" (part 1 and part 2):

http://mt.sohu.com/20161014/n470317653.shtml

http://mt.sohu.com/20161018/n470610910.shtml

When the structure of the search tree changes, the red-black properties may be violated, and the tree must be adjusted so that it satisfies them again.
Adjustments fall into two categories:
One is color adjustment, that is, changing the color of a node;
The other is structural adjustment, which changes the structural relationships within the search tree. Structural adjustment consists of two basic operations: Rotate Left and Rotate Right.

Remember, no matter how many cases there are, there are only two specific adjustment operations: 1. Change the color of some nodes, 2. Rotate some nodes.
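Both kinds of adjustment appear together in insertion. The sketch below follows the standard textbook (CLRS-style) insertion fixup, using `None` for NIL leaves; deletion, which has more cases, is omitted, and the class and method names are hypothetical. Case 1 only recolors; cases 2 and 3 recolor and rotate.

```python
RED, BLACK = True, False


class Node:
    def __init__(self, key):
        self.key = key
        self.color = RED  # new nodes start red so no path's black count changes
        self.left = self.right = self.parent = None


class RBTree:
    def __init__(self):
        self.root = None

    def _replace_child(self, old, new):
        """Make old's parent (or the root pointer) refer to new."""
        if old.parent is None:
            self.root = new
        elif old is old.parent.left:
            old.parent.left = new
        else:
            old.parent.right = new
        new.parent = old.parent

    def _rotate_left(self, x):
        y = x.right  # y moves up; x becomes y's left child
        x.right = y.left
        if y.left is not None:
            y.left.parent = x
        self._replace_child(x, y)
        y.left = x
        x.parent = y

    def _rotate_right(self, x):
        y = x.left  # mirror image of _rotate_left
        x.left = y.right
        if y.right is not None:
            y.right.parent = x
        self._replace_child(x, y)
        y.right = x
        x.parent = y

    def insert(self, key):
        node, parent, cur = Node(key), None, self.root
        while cur is not None:  # ordinary BST descent
            parent = cur
            cur = cur.left if key < cur.key else cur.right
        node.parent = parent
        if parent is None:
            self.root = node
        elif key < parent.key:
            parent.left = node
        else:
            parent.right = node
        self._fix_insert(node)

    def _fix_insert(self, z):
        while z.parent is not None and z.parent.color == RED:
            parent = z.parent
            grand = parent.parent  # exists: a red parent is never the (black) root
            left_side = parent is grand.left
            uncle = grand.right if left_side else grand.left
            if uncle is not None and uncle.color == RED:
                # Case 1: recolor only, then continue checking from the grandparent
                parent.color = uncle.color = BLACK
                grand.color = RED
                z = grand
                continue
            if left_side and z is parent.right:
                z = parent  # Case 2: rotate to line z up with its parent
                self._rotate_left(z)
                parent = z.parent
            elif not left_side and z is parent.left:
                z = parent
                self._rotate_right(z)
                parent = z.parent
            # Case 3: recolor, then rotate the grandparent
            parent.color = BLACK
            grand.color = RED
            if left_side:
                self._rotate_right(grand)
            else:
                self._rotate_left(grand)
        self.root.color = BLACK


tree = RBTree()
for k in range(1, 11):  # sorted input would turn a plain BST into a linked list
    tree.insert(k)
print(tree.root.key, tree.root.color == BLACK)
```

No matter which case fires, the only operations used are recoloring and left/right rotation.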


Origin http://43.154.161.224:23101/article/api/json?id=324628837&siteId=291194637