Binary search symbol table (based on a sorted array)

Quote of the day: A truly happy man is one who can enjoy his creation. Those who are like sponges, who only take and do not give, will only lose their happiness. - "38 Letters from Rockefeller to His Son"

1. Basic idea


The ordered symbol table is backed by a pair of parallel arrays, one holding the keys and one holding the values. put() keeps the keys array sorted, so get() and the other operations can use array indices (found by binary search) to run efficiently.
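
The idea can be sketched in a few lines before looking at the full class. This is only an illustration under stated assumptions: the class name ParallelArraysSketch and the fixed capacity of 8 are made up for the example, and the standard library call Arrays.binarySearch stands in for the hand-written rank() used later in this article.

```java
import java.util.Arrays;

// Hypothetical sketch: keys[0..n) is kept sorted and values[i] belongs to keys[i].
class ParallelArraysSketch {
    private final String[] keys = new String[8];     // assumed fixed capacity, no resizing
    private final Integer[] values = new Integer[8];
    private int n = 0;

    void put(String key, Integer value) {
        int i = Arrays.binarySearch(keys, 0, n, key);
        if (i >= 0) {                 // key already present: overwrite its value
            values[i] = value;
            return;
        }
        i = -(i + 1);                 // decode the insertion point from the negative result
        // Shift the larger entries one slot to the right in both arrays.
        System.arraycopy(keys, i, keys, i + 1, n - i);
        System.arraycopy(values, i, values, i + 1, n - i);
        keys[i] = key;
        values[i] = value;
        n++;
    }

    Integer get(String key) {
        int i = Arrays.binarySearch(keys, 0, n, key);
        return i >= 0 ? values[i] : null;
    }

    String[] sortedKeys() {
        return Arrays.copyOf(keys, n);
    }

    public static void main(String[] args) {
        ParallelArraysSketch st = new ParallelArraysSketch();
        st.put("S", 0);
        st.put("E", 1);
        st.put("A", 2);
        st.put("E", 3);               // updates the existing key E
        System.out.println(Arrays.toString(st.sortedKeys())); // [A, E, S]
        System.out.println(st.get("E"));                       // 3
        System.out.println(st.get("X"));                       // null
    }
}
```

Because the keys stay sorted after every put(), a lookup never has to scan the array; it only needs the index returned by the binary search.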

2. API for ordered symbol table

method                                description
void put(Key key, Value value)        Store a key-value pair in the table
Value get(Key key)                    Get the value associated with key
boolean contains(Key key)             Whether key exists in the table
boolean isEmpty()                     Whether the table is empty
int size()                            Number of key-value pairs in the table
Key min()                             The smallest key
Key max()                             The largest key
Key floor(Key key)                    The largest key less than or equal to key
Key ceiling(Key key)                  The smallest key greater than or equal to key
int rank(Key key)                     Number of keys smaller than key
Key select(int k)                     The key of rank k
void deleteMin()                      Delete the smallest key
void deleteMax()                      Delete the largest key
int size(Key lo, Key hi)              Number of keys in [lo..hi]
Iterable<Key> keys(Key lo, Key hi)    All keys in [lo..hi], in sorted order
Iterable<Key> keys()                  All keys in the table, in sorted order
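
To make the meaning of the ordered operations concrete, here is a small sketch that uses java.util.TreeMap as a stand-in. TreeMap is a red-black tree, not the array-based table built in this article, and its method names differ (floorKey, ceilingKey, firstKey, lastKey). The helpers rank() and size(lo, hi) below are written for this example only, on top of headMap() and subMap().

```java
import java.util.TreeMap;

// Hypothetical demo class: shows what the ordered operations return, using TreeMap.
class OrderedApiSketch {
    // Number of keys strictly smaller than key.
    static int rank(TreeMap<String, Integer> st, String key) {
        return st.headMap(key).size();
    }

    // Number of keys in the closed interval [lo..hi].
    static int size(TreeMap<String, Integer> st, String lo, String hi) {
        if (lo.compareTo(hi) > 0) return 0;          // subMap would reject lo > hi
        return st.subMap(lo, true, hi, true).size();
    }

    static TreeMap<String, Integer> sample() {
        TreeMap<String, Integer> st = new TreeMap<>();
        String[] letters = {"S", "E", "A", "R", "C", "H"};
        for (int i = 0; i < letters.length; i++) st.put(letters[i], i);
        return st;                                   // sorted keys: A C E H R S
    }

    public static void main(String[] args) {
        TreeMap<String, Integer> st = sample();
        System.out.println(st.firstKey());           // min      -> A
        System.out.println(st.lastKey());            // max      -> S
        System.out.println(st.floorKey("G"));        // floor    -> E
        System.out.println(st.ceilingKey("G"));      // ceiling  -> H
        System.out.println(rank(st, "G"));           // rank     -> 3 (A, C, E)
        System.out.println(size(st, "C", "R"));      // size     -> 4 (C, E, H, R)
    }
}
```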

3. Code implementation

package symboltable;

import edu.princeton.cs.algs4.Queue;

public class BinarySearchST<Key extends Comparable<Key>, Value> {

    private Key[] keys;              // two parallel arrays hold the keys and the values
    private Value[] values;
    private int N;

    @SuppressWarnings("unchecked")
    public BinarySearchST(int capacity) {
        keys = (Key[]) new Comparable[capacity];
        values = (Value[]) new Object[capacity];
    }

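    // Note: the arrays are never resized, so inserting into a full table throws
    // ArrayIndexOutOfBoundsException.
    public void put(Key key, Value value) {
        int i = rank(key);
        if(i < N && keys[i].compareTo(key) == 0) {
            values[i] = value;                  // key already present: update its value and stop
            return;
        }
        for(int j = N; j > i; j--) {            // shift every larger entry one slot to the right
            keys[j] = keys[j-1];
            values[j] = values[j-1];
        }
        keys[i] = key;
        values[i] = value;
        N++;
    }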

    public Value get(Key key) {
        if(isEmpty())
            return null;
        int i = rank(key);                      // number of keys smaller than key
        if(i < N && keys[i].compareTo(key) == 0)
            return values[i];
        else
            return null;
    }

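    public Key delete(Key key) {
        int i = rank(key);
        if(i < N && keys[i].compareTo(key) == 0) {    // key found: shift the following entries one slot left
            Key deleted = keys[i];                    // remember the key before it is overwritten
            for(int j = i; j < N - 1; j++) {
                keys[j] = keys[j + 1];
                values[j] = values[j + 1];
            }
            N--;
            keys[N] = null;                           // clear the vacated slot
            values[N] = null;
            return deleted;
        }
        return null;
    }

    public boolean contains(Key key) {
        int i = rank(key);
        return i < N && keys[i].compareTo(key) == 0;  // i == N means key is larger than every key
    }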

    public boolean isEmpty() {
        return N == 0;
    }

    public int size() {
        return N;
    }

    public Key min() {
        return keys[0];
    }

    public Key max() {
        return keys[N-1];
    }

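    public Key floor(Key key) {
        int i = rank(key);
        if(i < N && keys[i].compareTo(key) == 0)
            return keys[i];                 // key itself is in the table
        if(i == 0)
            return null;                    // every key in the table is larger than key
        return keys[i - 1];                 // largest key strictly smaller than key
    }

    public Key ceiling(Key key) {
        int i = rank(key);
        if(i == N)
            return null;                    // every key in the table is smaller than key
        return keys[i];
    }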

    public int rank(Key key) {
        int lo = 0, hi = N - 1;
        while(lo <= hi) {
            int mid = lo + (hi - lo) / 2;
            int cmp = key.compareTo(keys[mid]);
            if(cmp < 0)
                hi = mid - 1;
            else if(cmp > 0)
                lo = mid + 1;
            else
                return mid;        // key found: rank() returns its position, which is also the number of keys smaller than it
        }
        return lo;                 // key not found: lo is the number of keys smaller than it
    }

    public Key select(int k) {
        return keys[k];
    }

    public void deleteMin() {
        delete(min());
    }

    public void deleteMax() {
        delete(max());
    }

    public int size(Key lo, Key hi) {
        if(hi.compareTo(lo) < 0)
            return 0;
        else if(contains(hi))
            return rank(hi) - rank(lo) + 1;
        else {
            return rank(hi) - rank(lo);
        }
    }

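    public Iterable<Key> keys(Key lo, Key hi){
        Queue<Key> queue = new Queue<Key>();
        int end = rank(hi);
        for(int i = rank(lo); i < end; i++) {            // enqueue the keys in [lo, hi), hi excluded
            queue.enqueue(keys[i]);
        }
        if(end < N && keys[end].compareTo(hi) == 0)      // hi itself is in the table
            queue.enqueue(keys[end]);
        return queue;
    }

    public Iterable<Key> keys(){
        if(isEmpty())
            return new Queue<Key>();                     // min() and max() are undefined on an empty table
        return keys(min(), max());
    }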

    public static void main(String[] args) {
        BinarySearchST<Integer, String> binarySearchST = new BinarySearchST<>(10);

        for(int i = 0; i < 5; i++) {
            binarySearchST.put(i, "Timber" + i);
        }
        System.out.println("size = " + binarySearchST.size());

        for(int k : binarySearchST.keys()) {
            System.out.println("key:" + k + ", value:" + binarySearchST.get(k));
        }

        binarySearchST.delete(3);
        System.out.println("After deletion:");
        for(int k : binarySearchST.keys()) {
            System.out.println("key: " + k + ", value: " + binarySearchST.get(k));
        }

        System.out.println("Largest key less than or equal to 3: " + binarySearchST.floor(3));

        System.out.println("Smallest key greater than or equal to 3: " + binarySearchST.ceiling(3));
    }
}

4. rank() method analysis


At the heart of this implementation is the rank() method, which returns the number of keys in the table that are smaller than a given key. It first compares the key with the middle key: if they are equal it returns the middle index, if the key is smaller it continues in the left half, and if the key is larger it continues in the right half. If the search runs out of keys, it returns lo.

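public int rank(Key key){
    int lo = 0, hi = N - 1;
    while(lo <= hi){
        int mid = lo + (hi - lo) / 2;      // midpoint, computed without overflow
        int cmp = key.compareTo(keys[mid]);
        if(cmp < 0){
            hi = mid - 1;                  // continue in the left half
        }else if(cmp > 0){
            lo = mid + 1;                  // continue in the right half
        }else{
            return mid;                    // found: mid keys are smaller than key
        }
    }
    return lo;                             // not found: lo keys are smaller than key
}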


Properties of the non-recursive version of binary search:

  • If the key exists in the table, rank() returns the position of the key, which is also the number of keys in the table that are smaller than it;
  • If the key does not exist in the table, rank() still returns the number of keys in the table that are smaller than it: when the loop ends, lo is exactly that number.

The trace of rank() running a binary search on a sorted array is shown in the figure below. (Image via Algorithms, 4th Edition)
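
A trace of this kind can also be printed directly. The sketch below runs the same loop as rank() on a plain String array and prints lo, mid and hi at every step. The class name RankTrace and the ten sample keys are chosen for this example only.

```java
// Hypothetical tracing version of rank(): same loop, plus one printed line per step.
class RankTrace {
    static int rank(String[] keys, String key) {
        int lo = 0, hi = keys.length - 1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;
            System.out.println("  lo=" + lo + " mid=" + mid + " hi=" + hi
                    + " keys[mid]=" + keys[mid]);
            int cmp = key.compareTo(keys[mid]);
            if (cmp < 0) hi = mid - 1;
            else if (cmp > 0) lo = mid + 1;
            else return mid;           // successful search: position of the key
        }
        return lo;                     // unsuccessful search: number of smaller keys
    }

    public static void main(String[] args) {
        String[] keys = {"A", "C", "E", "H", "L", "M", "P", "R", "S", "X"};
        System.out.println("rank(P):");
        System.out.println("  -> " + rank(keys, "P"));   // -> 6, P is at index 6
        System.out.println("rank(Q):");
        System.out.println("  -> " + rank(keys, "Q"));   // -> 7, Q is absent, 7 keys are smaller
    }
}
```

The search for P visits indices 4, 7, 5 and 6 and stops on a match. The search for Q visits the same indices, then the loop ends with lo = 7 and hi = 6, so 7 is returned.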

5. Performance Analysis


A binary search in a sorted array of N keys needs at most lgN + 1 comparisons, whether or not the search succeeds. Inserting a new key into a sorted array of N keys takes ~2N array accesses in the worst case, because every larger key has to be moved one slot to the right. Inserting N keys into an initially empty symbol table therefore takes ~N^2 array accesses in the worst case.
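
The insertion cost can be checked by counting array accesses. The sketch below is a simplified model, not the class from this article: it stores int keys only, assumes the input keys are distinct, and counts one read and one write for each shifted key plus one write for the new key. Accesses made by the binary search itself are not counted.

```java
// Hypothetical cost model for inserting into a sorted array.
class InsertCostSketch {
    // Number of keys in sorted[0..n) that are smaller than key.
    static int rank(int[] sorted, int n, int key) {
        int lo = 0, hi = n - 1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;
            if (key < sorted[mid]) hi = mid - 1;
            else if (key > sorted[mid]) lo = mid + 1;
            else return mid;
        }
        return lo;
    }

    // Inserts the distinct input keys one by one into an initially empty sorted
    // array and returns the number of key-array reads and writes used for
    // shifting and storing.
    static long insertAll(int[] input) {
        int[] keys = new int[input.length];
        int n = 0;
        long accesses = 0;
        for (int key : input) {
            int i = rank(keys, n, key);
            for (int j = n; j > i; j--) {
                keys[j] = keys[j - 1];   // one read + one write per shifted key
                accesses += 2;
            }
            keys[i] = key;               // one write for the new key
            accesses += 1;
            n++;
        }
        return accesses;
    }

    public static void main(String[] args) {
        int n = 1000;
        int[] ascending = new int[n];
        int[] descending = new int[n];
        for (int i = 0; i < n; i++) {
            ascending[i] = i;
            descending[i] = n - i;
        }
        // Worst case: every new key is the smallest so far, so all keys are shifted.
        System.out.println("descending input: " + insertAll(descending)); // 1000000 = N^2
        // Best case: every new key is the largest so far, so nothing is shifted.
        System.out.println("ascending input:  " + insertAll(ascending));  // 1000 = N
    }
}
```

With descending input the k-th insertion shifts k - 1 keys, which gives 2(k - 1) + 1 accesses and a total of exactly N^2 under this model.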

The cost of each method is as follows:

method         order of growth of running time
put()          N
get()          log N
delete()       N
contains()     log N
size()         1
min()          1
max()          1
floor()        log N
ceiling()      log N
rank()         log N
select()       1
deleteMin()    N
deleteMax()    1

6. Comparison of sequential search and binary search


In general, binary search is much faster than sequential search. Even so, binary search on a sorted array is not suitable for many applications. For example, it cannot process the Leipzig Corpora database, because lookups and inserts are interleaved and the symbol table grows too large: each insert of a new key may have to shift a large part of the array.


The following table summarizes the performance of sequential search and binary search as the order of growth of the running time (for binary search the counts are array accesses; all other counts are comparisons):

algorithm            worst case (after N inserts)    average case (after N inserts)    efficient ordered operations?
                     search      insert              search hit    insert
sequential search    N           N                   N/2           N                   no
binary search        lgN         2N                  lgN           N                   yes
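
The search columns can be illustrated by counting comparisons. The sketch below is an assumption-laden model rather than a benchmark: it uses int keys 0..N-1 with N = 1024, scans an array from the front for sequential search, and counts one three-way comparison per loop iteration for binary search.

```java
// Hypothetical comparison counter for sequential search versus binary search.
class SearchCompareSketch {
    // Comparisons made by a front-to-back scan.
    static int sequentialCompares(int[] keys, int key) {
        int compares = 0;
        for (int k : keys) {
            compares++;
            if (k == key) break;
        }
        return compares;
    }

    // Comparisons made by binary search on a sorted array.
    static int binaryCompares(int[] sorted, int key) {
        int lo = 0, hi = sorted.length - 1, compares = 0;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;
            compares++;
            if (key < sorted[mid]) hi = mid - 1;
            else if (key > sorted[mid]) lo = mid + 1;
            else break;
        }
        return compares;
    }

    public static void main(String[] args) {
        int n = 1024;
        int[] keys = new int[n];
        for (int i = 0; i < n; i++) keys[i] = i;

        // Search miss for a key larger than every key in the table.
        System.out.println("sequential miss: " + sequentialCompares(keys, n)); // 1024 = N
        System.out.println("binary miss:     " + binaryCompares(keys, n));     // 11 = lgN + 1

        // Average cost of a search hit over all keys.
        long seqTotal = 0, binTotal = 0;
        for (int key : keys) {
            seqTotal += sequentialCompares(keys, key);
            binTotal += binaryCompares(keys, key);
        }
        System.out.println("sequential hit average: " + (double) seqTotal / n); // 512.5, about N/2
        System.out.println("binary hit average:     " + (double) binTotal / n);
    }
}
```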

7. Closing remarks

If you find any mistakes or have suggestions, corrections and feedback are welcome.

Origin http://43.154.161.224:23101/article/api/json?id=324953472&siteId=291194637