Skip List (jump table): an efficient indexing technique

For a query task, if you cannot set aside a contiguous block of memory, keep the data sorted in it, and run binary search over it, the usual answer is some kind of tree, and all sorts of more advanced, impressive data structures get brought in to make queries fast.

Here I do not intend to show off either their brilliant side or their clumsy side. But for dynamic data, the common index structures all involve a complicated programming process: the programs are hard to debug and the algorithms are hard to understand. The skip list, by contrast, handles queries in a very simple way while still keeping O(log N) query complexity.

For searching dynamic data, balanced binary trees such as red-black trees simulate the binary search process and achieve an O(log N) query cost. A linked list, on the other hand, is very easy to program, but its query cost is a painful O(N). So a simple idea appeared: modify the linked list so that queries take less time, paying for it with extra node storage. This is the idea behind the skip list.

The skip list was invented by William Pugh. His paper "Skip lists: a probabilistic alternative to balanced trees", published in Communications of the ACM, June 1990, 33(6):668-676, explains the skip list data structure and its insert and delete operations in detail.

William Pugh is a professor at the University of Maryland, College Park.

In MIT's Introduction to Algorithms open course, Erik Demaine gives a very creative and easy-to-understand example: the subway system.

Suppose a city's subway has express trains and local trains. An express train gets from one major station to another quickly and skips the stops in between, while a local train, like an ordinary linked list, stops at every station (I ride trains of the 7374 kind quite often). So you first take the express to the station nearest your destination and then transfer to the local train to reach it. That is how a search in a skip list works.

Because in a computer we do not have to worry about scheduling the trains, if the express line is not fast enough you can build a super express line on top of it, then an invincible super express line, then an Invincible Hot Wheels express line, a rocket express line, and so on.

As long as we manage these pointers well in C++ (which is exactly the problem with my code), we get an optimally designed set of lines.

The figure below illustrates a complete skip list:

[Figure: a complete skip list]

Suppose we have just arrived in Shenyang. We are at point e and want to go to station No. 21. Being rich and handsome, you certainly do not care about trifles like the fare, so you choose the subway and ride the top line to station No. 6 (the line is so expensive that Shenyang built only one station on it).

We know that station No. 21 lies beyond station No. 6, so at station No. 6 you switch to the second line and ride in one go to station No. 25 (in this situation the Baidu Maps app only reminds you after you have already ridden past). So we go back to station No. 6 and take the next line down... (is that not money down the drain?)

On that line (of which there is no sign in Shenyang yet) we pass station No. 9 and station No. 17 and arrive at station No. 25 (by now you surely feel like making personal attacks on the developers of Baidu Maps). After some agonizing, we go back to station No. 17.

This time Baidu Maps reminds us that below station No. 17 there is only the bus. If we rode the bus all the way to station No. 25 without reaching the destination, it would only mean that the map was out of date... But after two bus stops we successfully arrive at station No. 21.

Since entering Shenyang we have dropped nine coins into fare boxes just to reach station No. 21. It seems that taking the bus directly would have cost only 8 coins, leaving one coin to buy fruit on the street... However, the skip list gives the best result only in a statistical sense; if the network in this example covered the whole of China, the outcome would be completely different.

Everything above is about searching, but searching is the simplest operation on a data structure; insertion and deletion are the operations that deserve most of our attention. The essence of the skip list is that a random process with probability 50% decides whether a node is promoted one level up. From theoretical analysis we get a surprising conclusion: an expected query cost of O(log N).
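The coin-toss promotion rule can be sketched as follows. This is only a minimal illustration, not the article's code: `randomLevel` and `maxLevel` are hypothetical names, and `std::mt19937` is assumed here as the random source.

```cpp
#include <cassert>
#include <random>

// Hypothetical helper: how many levels above the bottom list a new node
// is promoted. A fair coin is tossed until it comes up tails (false);
// maxLevel caps the result.
int randomLevel(std::mt19937 &rng, int maxLevel) {
    assert(maxLevel >= 0);
    std::bernoulli_distribution coin(0.5);  // 50% chance of one more level
    int level = 0;
    while (level < maxLevel && coin(rng)) {
        ++level;
    }
    return level;
}
```

A result of 0 means the node stays only in the bottom list; each further level is half as likely as the one before it.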

The cost analysis:

The process is easy to understand, but can this data structure really give us an expected query cost of O(log N)?

For a list of length N, since each node is promoted with probability 0.5, the expected number of nodes on the first level above the base list is N/2, on the level above that N/4, and so on, so the expected height of the whole structure is log N.
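The halving of the level sizes can be checked empirically. The following is a small simulation sketch; `levelCounts` and its parameters are names chosen here for illustration and do not appear in the article's code.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Simulates promoting n nodes with probability 0.5 per level.
// counts[i] is the number of nodes present on level i (level 0 = full list).
std::vector<int> levelCounts(int n, unsigned seed) {
    assert(n >= 0);
    std::mt19937 rng(seed);
    std::bernoulli_distribution coin(0.5);
    std::vector<int> counts(1, 0);
    for (int node = 0; node < n; ++node) {
        std::size_t level = 0;
        ++counts[0];                     // every node is on the bottom level
        while (coin(rng)) {              // heads: promote one more level
            ++level;
            if (level == counts.size()) {
                counts.push_back(0);     // first node to reach this level
            }
            ++counts[level];
        }
    }
    return counts;
}
```

For large n the counts should come out close to n, n/2, n/4, and so on, with roughly log2(n) non-empty levels; the exact numbers depend on the seed.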

As with the various trees, the query cost of a skip list is closely tied to its height. Number the chains from the top, so that L(1) is the length of the sparsest chain and the last chain holds all N nodes. With two chains, of lengths L(1) and L(2), the maximum cost of a query is:

COST = L(1) + L(2)/L(1)

With three chains it is:

COST = L(1) + L(2)/L(1) + L(3)/L(2)

Longer stacks of chains are analyzed in the same way. Clearly, the cost reaches its minimum when all of these terms are equal.

For a skip list with K chains, the minimum query cost is therefore K * N^(1/K).

The cost above is the worst-case cost of a single query, so K * N^(1/K) is the minimum of that worst-case cost. The cost over a whole sequence of operations can instead be averaged by amortized analysis, and the average cost of a query is also K * N^(1/K). Substituting K = log N into this expression gives the final cost.

 

Thus, for the skip list, the query time is O(log N), more precisely about 2 log N. There are about log N index levels, and the expected total number of nodes is about N + N/2 + N/4 + ... < 2N, so the added index holds fewer than N extra nodes.
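The step from "all terms equal" to K * N^(1/K), and from there to 2 log N, can be spelled out with the AM-GM inequality. The symbols n_1, ..., n_K are notation introduced only for this sketch: n_1 is the length of the sparsest chain and n_K = N is the length of the full list. Equality holds exactly when every term equals N^(1/K).

```latex
\begin{align*}
\mathrm{COST} &= n_1 + \frac{n_2}{n_1} + \frac{n_3}{n_2} + \dots + \frac{n_K}{n_{K-1}},
  \qquad n_K = N \\
n_1 \cdot \frac{n_2}{n_1} \cdot \frac{n_3}{n_2} \cdots \frac{n_K}{n_{K-1}} &= N \\
\mathrm{COST} &\ge K \left( n_1 \cdot \frac{n_2}{n_1} \cdots \frac{n_K}{n_{K-1}} \right)^{1/K}
  = K \, N^{1/K} \\
K = \log_2 N \;&\Rightarrow\; K \, N^{1/K} = \log_2 N \cdot N^{1/\log_2 N} = 2 \log_2 N
\end{align*}
```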

Query process:

To query a value, we start the search from the top chain; in the code this is the BEGIN node.

1. Move forward along the current chain L until reaching a node whose value is greater than or equal to the query value, or until reaching NULL.

2. If that node holds the query value, follow it down to the lowest level and return the node found there; otherwise go to step 3.

3. Step back to the previous node and drop to the chain one level below. If that chain exists, continue from step 1: the value, if present, must lie between the two nodes just examined, and that range is what gets passed down a level. If no lower chain exists, return NULL.
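The three steps above can be sketched compactly. Note the assumptions: this sketch uses a hypothetical `SkipNode` layout with one object per key and an array of forward pointers, rather than the four-pointer `Node` class used later in the article, so "going down a level" is just a smaller index into `next`.

```cpp
#include <cassert>
#include <vector>

// Hypothetical node layout (not the article's Node class): one object per
// key, with next[i] pointing to the successor on level i (0 = bottom).
struct SkipNode {
    int key;
    std::vector<SkipNode*> next;
    SkipNode(int k, int levels) : key(k), next(levels, nullptr) {}
};

// head is a sentinel placed before every key; its own key is never read.
const SkipNode* search(const SkipNode* head, int value) {
    assert(head != nullptr && !head->next.empty());
    const SkipNode* p = head;
    for (int lvl = static_cast<int>(head->next.size()) - 1; lvl >= 0; --lvl) {
        // Step 1: advance while the next key on this level is still smaller.
        while (p->next[lvl] != nullptr && p->next[lvl]->key < value) {
            p = p->next[lvl];
        }
        // Step 2: the next node holds the value, so the search is done.
        if (p->next[lvl] != nullptr && p->next[lvl]->key == value) {
            return p->next[lvl];
        }
        // Step 3: stay on p and continue one level down.
    }
    return nullptr;  // ran out of levels without finding the value
}
```

Because a node here is a single object shared by all of its levels, the "follow it down to the lowest level" part of step 2 disappears.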

Insertion process:

A skip list is a structure for maintaining a sorted table. Each new node is first inserted into the lowest level; then a coin is tossed, giving 0 or 1 with 50% probability each. A 0 means do nothing; a 1 means promote the node one level up. After each promotion the coin is tossed again, and this continues until a 0 comes up.

To insert a value, the process is as follows:

1. First search the existing skip list. If the value is already present, abandon the insertion; if not, create a new node and return it. (The hard part of the code is making the search and the insertion fit together seamlessly. However I arrange it there is some problem, and there is still a long way to go in reducing code duplication.)

2. Insert the node, then run the random process: promote the node one level with probability 0.5, and repeat until a promotion fails.

3. If a promotion creates a new top chain, update BEGIN.
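The insertion steps can be sketched as below. This is a hedged illustration under assumptions of my own, not the article's implementation: `SkipList`, `SkipNode`, `contains`, and `keys` are hypothetical names, nodes carry an array of forward pointers instead of being copied per level, and the fixed seed is only there to make runs repeatable.

```cpp
#include <cassert>
#include <random>
#include <vector>

struct SkipNode {                       // hypothetical layout, one object per key
    int key;
    std::vector<SkipNode*> next;        // next[i] = successor on level i
    SkipNode(int k, int levels) : key(k), next(levels, nullptr) {}
};

class SkipList {
public:
    SkipList() : head(0, 1), rng(12345) {}
    SkipList(const SkipList&) = delete;
    SkipList& operator=(const SkipList&) = delete;
    ~SkipList() {
        for (SkipNode* p = head.next[0]; p != nullptr;) {
            SkipNode* n = p->next[0];
            delete p;
            p = n;
        }
    }

    bool insert(int value) {
        // Step 1: search, remembering the last node visited on each level.
        std::vector<SkipNode*> update(head.next.size(), &head);
        SkipNode* p = &head;
        for (int lvl = static_cast<int>(head.next.size()) - 1; lvl >= 0; --lvl) {
            while (p->next[lvl] != nullptr && p->next[lvl]->key < value) p = p->next[lvl];
            update[lvl] = p;
        }
        if (p->next[0] != nullptr && p->next[0]->key == value) return false;  // duplicate

        // Step 2: toss a fair coin; each head adds one level to the new node.
        std::bernoulli_distribution coin(0.5);
        int height = 1;
        while (coin(rng)) ++height;

        // Step 3: new top chains start at the head (the "update BEGIN" case).
        while (static_cast<int>(head.next.size()) < height) {
            head.next.push_back(nullptr);
            update.push_back(&head);
        }
        SkipNode* node = new SkipNode(value, height);
        for (int lvl = 0; lvl < height; ++lvl) {   // splice in on every level
            node->next[lvl] = update[lvl]->next[lvl];
            update[lvl]->next[lvl] = node;
        }
        return true;
    }

    bool contains(int value) const {
        assert(!head.next.empty());
        const SkipNode* p = &head;
        for (int lvl = static_cast<int>(head.next.size()) - 1; lvl >= 0; --lvl) {
            while (p->next[lvl] != nullptr && p->next[lvl]->key < value) p = p->next[lvl];
        }
        return p->next[0] != nullptr && p->next[0]->key == value;
    }

    std::vector<int> keys() const {      // bottom level, in sorted order
        std::vector<int> out;
        for (const SkipNode* p = head.next[0]; p != nullptr; p = p->next[0]) out.push_back(p->key);
        return out;
    }

private:
    SkipNode head;                       // sentinel; its key is never compared
    std::mt19937 rng;
};
```

Recording the `update` pointers during the search is one way to make the search and the insertion fit together without a second traversal.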

Deletion process:

Given a value, the goal is to delete the node that holds it.

1. Find the node using the search process; the search returns it.

2. Follow the node upward to its highest level.

3. Delete the node on each level in turn, from the top down; if removing it empties a level, update BEGIN.
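A minimal sketch of deletion, again under assumptions of my own rather than the article's code: `SkipNode` is a hypothetical layout with an array of forward pointers, nodes are assumed to be heap-allocated, and the search and the per-level unlinking are merged into one top-down pass.

```cpp
#include <cassert>
#include <vector>

struct SkipNode {                       // hypothetical layout, one object per key
    int key;
    std::vector<SkipNode*> next;        // next[i] = successor on level i
    SkipNode(int k, int levels) : key(k), next(levels, nullptr) {}
};

// Removes value from the list that starts at the sentinel head.
// Returns true if a node was removed. Nodes are assumed to come from new.
bool erase(SkipNode* head, int value) {
    assert(head != nullptr && !head->next.empty());
    SkipNode* p = head;
    SkipNode* victim = nullptr;
    for (int lvl = static_cast<int>(head->next.size()) - 1; lvl >= 0; --lvl) {
        while (p->next[lvl] != nullptr && p->next[lvl]->key < value) {
            p = p->next[lvl];
        }
        if (p->next[lvl] != nullptr && p->next[lvl]->key == value) {
            victim = p->next[lvl];
            p->next[lvl] = victim->next[lvl];   // unlink on this level
        }
    }
    // Drop top chains that became empty (the "update BEGIN" step).
    while (head->next.size() > 1 && head->next.back() == nullptr) {
        head->next.pop_back();
    }
    const bool found = (victim != nullptr);
    delete victim;                      // deleting nullptr is a no-op
    return found;
}
```

Only the bottom chain is kept when the list becomes empty, so the head always has at least one level.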

 

As can be seen, apart from the random promotion during insertion, the operations are the same as those of an ordinary linked list, so we can write a simple skip list implementation based on the analysis above.

```cpp
#include <climits>
#include <cstdlib>
#include <iostream>
using namespace std;

// Key stored in every level's header node. It is reserved: it can be
// neither inserted nor deleted.
const int MINELEM = INT_MIN;

class Node{
public:
    friend void print();
    Node* getForward(){return forward;}
    Node* getBackward(){return backward;}
    Node* getUplevel(){return uplevel;}
    Node* getDownlevel(){return downlevel;}

    void setForward(Node *obj){forward = obj;}
    void setBackward(Node *obj){backward = obj;}
    void setUplevel(Node *obj){uplevel = obj;}
    void setDownlevel(Node *obj){downlevel = obj;}

    void setLevel(int a){level = a;}
    int getLevel()const{return level;}
    int getElem()const{return elem;}
    Node():forward(NULL),backward(NULL),uplevel(NULL),downlevel(NULL),elem(0),level(0){}
    Node(int a):forward(NULL),backward(NULL),uplevel(NULL),downlevel(NULL),elem(a),level(0){}
    Node(const Node &obj):forward(obj.forward),backward(obj.backward),uplevel(obj.uplevel),
        downlevel(obj.downlevel),elem(obj.elem),level(obj.level){}
    Node(Node *forward_ptr,Node *backward_ptr,Node *uplevel_ptr,Node *downlevel_ptr,int value)
        :forward(forward_ptr),backward(backward_ptr),uplevel(uplevel_ptr),
        downlevel(downlevel_ptr),elem(value),level(0){}
    Node& operator=(const Node &obj){
        forward = obj.forward;
        backward = obj.backward;
        uplevel = obj.uplevel;
        downlevel = obj.downlevel;
        elem = obj.getElem();
        level = obj.getLevel();   // the original copied getElem() here by mistake
        return *this;
    }
private:
    Node *forward,*backward,*uplevel,*downlevel;
    int elem;
    int level;
};

static Node *START = new Node(NULL,NULL,NULL,NULL,MINELEM); // header of the bottom level
static Node *BEGIN = START;                                 // header of the top level

// Copy obj one level up and link the copy into that level; returns the copy.
Node* Levelup(Node &obj){
    Node *p = &obj;
    Node *new_obj = new Node(obj);
    obj.setUplevel(new_obj);
    new_obj->setDownlevel(&obj);
    while((p->getElem())!= (START->getElem())){
        p = p->getBackward();
    }// p is now the header of obj's level
    if(p->getUplevel() == NULL)
    {
        // no level above yet: create a new top level with its own header
        Node *q = new Node(*START);
        q->setDownlevel(p);
        p->setUplevel(q);
        q->setUplevel(NULL);
        BEGIN = q;
        new_obj->setBackward(q);
        q->setForward(new_obj);
        new_obj->setUplevel(NULL);
        new_obj->setForward(NULL);
        return new_obj;
    }
    else{
        // walk back to the nearest node that has a copy on the level above
        p = obj.getBackward();
        new_obj->setUplevel(NULL);
        while(p->getUplevel() == NULL)
            p = p->getBackward();
        p = p->getUplevel();
        Node *q = p->getForward();
        new_obj->setBackward(p);
        p->setForward(new_obj);
        new_obj->setForward(q);
        if(q != NULL){
            q->setBackward(new_obj);
        }
        return new_obj;
    }
}

// Returns the bottom-level node holding value, or NULL if it is absent.
Node* Search(int value){
    if(value == MINELEM)
        return NULL;                     // reserved header key
    Node *p = BEGIN;
    while(p != NULL){
        // advance while the next key on this level does not exceed value
        while(p->getForward()!=NULL && p->getForward()->getElem()<=value)
            p = p->getForward();
        if(p->getElem() == value){
            while(p->getDownlevel()!=NULL)
                p = p->getDownlevel();
            return p;
        }
        p = p->getDownlevel();           // search the next level down
    }
    return NULL;
}

// Returns 1 if value was inserted, 0 if it was rejected.
int Insert(int value){
    if(value == MINELEM || Search(value) != NULL)
        return 0;                        // reserved key or duplicate
    Node *p = START,*q = BEGIN;
    Node *obj = new Node(value);
    while(q!=NULL){
        p = q;
        while(p->getForward()!=NULL){
            if(obj->getElem()>p->getElem())
            {
                p = p->getForward();
            }
            else break;
        }
        if(p->getBackward()==NULL)
            break;
        if(p->getBackward()->getDownlevel() != NULL)
            q = p->getBackward()->getDownlevel();  // continue on the next level down
        else break;
    }// p is now on the bottom level, at or next to the insert position
    if(p->getElem() == START->getElem())
        q = p->getForward();
    else {
        q = p;
        p = p->getBackward();
    }
    if(q!=NULL){
        if(q->getElem()>value){
            // insert between p and q
            q->setBackward(obj);
            p->setForward(obj);
            obj->setBackward(p);
            obj->setForward(q);
        }
        else {
            // q is the last node: append after it
            q->setForward(obj);
            obj->setBackward(q);
        }
    }
    else {
        // empty list: obj becomes the first node
        p->setForward(obj);
        obj->setBackward(p);
    }
    // coin toss: promote one level for every even random number
    int r = rand();
    Node *temp = obj;
    while(r%2 == 0){
        temp = Levelup(*temp);
        r = rand();
    }
    return 1;
}

// Returns 0 if value was deleted, -1 if it was not found.
int Delete(int value){
    Node *temp = Search(value);
    if(temp == NULL)return -1;
    Node *p = temp;
    while(p->getUplevel()!=NULL)p = p->getUplevel();   // climb to the top copy
    while(p != NULL){
        Node *q = p->getDownlevel();   // copy on the level below, handled next
        if(p->getBackward()->getElem() == START->getElem() && p->getForward()==NULL){
            // p is the only node on its level
            if(BEGIN != START){
                // upper level: remove the whole level, header included
                p->getBackward()->getDownlevel()->setUplevel(NULL);
                BEGIN = p->getBackward()->getDownlevel();
                delete p->getBackward();
                delete p;
                p = q;
            }
            else{
                // bottom level: keep the START header
                START->setForward(NULL);
                delete p;
                p = NULL;
            }
        }
        else {
            // unlink p from its neighbours on this level
            if(p->getForward() != NULL)
                p->getForward()->setBackward(p->getBackward());
            p->getBackward()->setForward(p->getForward());
            delete p;
            p = q;
        }
    }
    return 0;
}

void print(){
    cout<<"PRINT!"<<endl;
    Node *p = START;
    while(p!=NULL){                  // bottom level: every element
        cout<<p->getElem()<<"\t";
        p = p->forward;
    }
    cout<<endl;
    p = START->uplevel;
    Node *q = p;
    while(p!=NULL){                  // one line per upper level, bottom-up
        if(q != NULL){
            cout<<q->getElem()<<"\t";
            q = q->getForward();
        }
        Node *t = START->getForward();
        while(q != NULL){
            // pad with tabs so q lines up under its bottom-level copy
            while(t->getElem()!=q->getElem()){
                cout<<"\t";
                t = t->getForward();
            }
            cout<<q->getElem();
            q = q->getForward();
        }
        p = p->uplevel;
        q = p;
        cout<<endl;
    }
}

int main(){
    int i;
    cin>>i;
    while(i--){
        print();
        Insert(i);
    }
    char c;
    cout<<"i--Insert\td--Delete"<<endl;
    print();
    while(cin>>c>>i){
        switch(c){
        case 'd':
            cout<<Delete(i)<<endl;
            print();
            break;
        case 'i':
            cout<<Insert(i)<<endl;
            print();
            break;
        default:
            break;
        }
        cout<<"i--Insert\td--Delete"<<endl;
    }
    return EXIT_SUCCESS;
}
```

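This is one implementation of mine. It has several problems, but I do not plan to fix them in the near future; my lab supervisor would certainly not tolerate me goofing off like this.
1. The design of the skip list's data structure is itself flawed. I wrote the program in a rush while watching the MIT Introduction to Algorithms lecture videos, and only when analyzing it today did I realize that there is no need to copy node objects at all; copying pointers is enough.

2. The head node of the skip list is initialized with a "negative infinity" value so that it is always the smallest element, but I forgot to choose a "positive infinity" value as the permanent largest element. As with the NIL node of a red-black tree, such a sentinel has many benefits and would greatly simplify the existing logic.

3. C++'s pseudo-random numbers are a real pain, and, more to the point, I was too lazy to prepare a large batch of random numbers as input, so the results of my skip list always look poor; and if the skip list is too large, it does not print nicely either.

Still, the whole program exists to verify the skip list's operations, not to show off how awesome skip lists are, so let those perfectionist ideas stay buried for now.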
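The first two notes suggest a different node design. The following is only a sketch of what that might look like, under assumptions of my own: `SNode`, `linkSentinels`, and `find` are hypothetical names, each key is stored once with an array of pointers, and `INT_MIN` / `INT_MAX` are reserved as the two sentinel keys.

```cpp
#include <cassert>
#include <climits>
#include <vector>

// Hypothetical layout: one object per key, pointers instead of copied nodes.
struct SNode {
    int key;
    std::vector<SNode*> next;           // next[i] = successor on level i
    SNode(int k, int levels) : key(k), next(levels, nullptr) {}
};

// Links head (key INT_MIN) directly to tail (key INT_MAX) on every level,
// which is the empty list.
void linkSentinels(SNode& head, SNode& tail) {
    assert(head.next.size() == tail.next.size());
    for (int i = 0; i < static_cast<int>(head.next.size()); ++i) {
        head.next[i] = &tail;
    }
}

// With a tail sentinel, every chain ends in a key that is never smaller
// than value, so the inner loop needs no NULL test.
const SNode* find(const SNode& head, int value) {
    if (value == INT_MIN || value == INT_MAX) {
        return nullptr;                 // sentinel keys are reserved
    }
    const SNode* p = &head;
    for (int lvl = static_cast<int>(head.next.size()) - 1; lvl >= 0; --lvl) {
        while (p->next[lvl]->key < value) {
            p = p->next[lvl];
        }
    }
    const SNode* hit = p->next[0];
    return hit->key == value ? hit : nullptr;
}
```

The trade-off is that two key values become unusable, in exchange for loops that never have to check for the end of a chain.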

Program output:

8
PRINT!
-2147483648
PRINT!
-2147483648     7
PRINT!
-2147483648     6       7
PRINT!
-2147483648     5       6       7
-2147483648     5
-2147483648     5
PRINT!
-2147483648     4       5       6       7
-2147483648     4       5
-2147483648     4       5
-2147483648     4
-2147483648     4
-2147483648     4
PRINT!
-2147483648     3       4       5       6       7
-2147483648             4       5
-2147483648             4       5
-2147483648             4
-2147483648             4
-2147483648             4
PRINT!
-2147483648     2       3       4       5       6       7
-2147483648                     4       5
-2147483648                     4       5
-2147483648                     4
-2147483648                     4
-2147483648                     4
PRINT!
-2147483648     1       2       3       4       5       6       7
-2147483648                             4       5
-2147483648                             4       5
-2147483648                             4
-2147483648                             4
-2147483648                             4
i--Insert       d--Delete
PRINT!
-2147483648     0       1       2       3       4       5       6       7
-2147483648                                     4       5
-2147483648                                     4       5
-2147483648                                     4
-2147483648                                     4
-2147483648                                     4
d 4
0
PRINT!
-2147483648     0       1       2       3       5       6       7
-2147483648                                     5
-2147483648                                     5
i--Insert       d--Delete
i 4
1
PRINT!
-2147483648     0       1       2       3       4       5       6       7
-2147483648                                             5
-2147483648                                             5
i--Insert       d--Delete

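Even without reading the code you can guess that my insertion procedure reads one number and then keeps inserting values, counting down to 0. Insertion in that order clearly cannot show any randomness.

Drawbacks of the skip list:

For a skip list, deletion is definitely a major weak point, and there seems to be no way to fix it. Insertion can still be randomized (through the random promotion process), but deletion cannot be done at random. After a certain number of nodes have been deleted, the skip list loses its original randomness, and running the random process again to re-promote every node leads to only one result: the cost is too high, so let us just use a linked list!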

Reposted from: https://www.cnblogs.com/wangbiaoneu/archive/2013/04/27/SkipList.html


Origin blog.csdn.net/weixin_33827731/article/details/93550864