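Data Structures (Map): SparseArray Source Code Analysis

1. Overview

SparseArray implements a <key, value> mapping with two parallel arrays: an int[] of keys kept in ascending order and an Object[] holding each value at the same index as its key.

There are three similar data structures:

SparseBooleanArray -> value is boolean

SparseIntArray -> value is int

SparseLongArray -> value is long

They differ from SparseArray only in the value type: a SparseArray value can be any object type, whereas these three store unboxed primitive values.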

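2. Source code

Questions:

What advantages does SparseArray have over HashMap?
Which scenarios is SparseArray suited to?
Every lookup is a binary search; how efficient are lookups?

Analysis:

Constructing a SparseArray;
Adding an element with put(int, Object);
Deleting the element for a given key;
Getting the element for a given key;
Getting the number of elements;

2.1 Constructing a SparseArray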

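// Sentinel stored in place of a value that has been deleted.
private static final Object DELETED = new Object();
// Whether any slot is currently marked as DELETED.
private boolean mGarbage = false;

// Keys, kept in ascending order.
private int[] mKeys;
// Values; mValues[i] belongs to mKeys[i].
private Object[] mValues;
// Number of slots in use (DELETED slots are counted until gc() runs).
private int mSize;


public SparseArray() {
    // Default to an initial capacity of 10.
    this(10);
}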

public SparseArray(int initialCapacity) {
    if (initialCapacity == 0) {
        mKeys = EmptyArray.INT;
        mValues = EmptyArray.OBJECT;
    } else {
        mValues = ArrayUtils.newUnpaddedObjectArray(initialCapacity);
        mKeys = new int[mValues.length];
    }
    mSize = 0;
}

2.2 Adding an element: `put(int, Object)`

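public void put(int key, E value) {
    // 1. Binary-search for the key. A result >= 0 is the index of an existing key;
    //    a negative result is the bitwise complement of the insertion point.
    int i = ContainerHelpers.binarySearch(mKeys, mSize, key);

    // The key already exists: overwrite its value in place.
    if (i >= 0) {
        mValues[i] = value;
    } else {
        // Recover the insertion point.
        i = ~i;

        // The slot at the insertion point holds a deleted entry: reuse it without shifting.
        if (i < mSize && mValues[i] == DELETED) {
            mKeys[i] = key;
            mValues[i] = value;
            return;
        }

        // The arrays are full but contain deleted entries: compact before growing.
        if (mGarbage && mSize >= mKeys.length) {
            gc();

            // Search again because indices may have changed.
            i = ~ContainerHelpers.binarySearch(mKeys, mSize, key);
        }

        // Insert at index i, shifting later elements right and growing the arrays if needed.
        mKeys = GrowingArrayUtils.insert(mKeys, mSize, i, key);
        mValues = GrowingArrayUtils.insert(mValues, mSize, i, value);
        mSize++;
    }
}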
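
The three branches of put can be hard to tell apart from the listing alone. The following is a minimal sketch, not the Android implementation: PutSketch and its fields are hypothetical names, java.util.Arrays.binarySearch stands in for ContainerHelpers.binarySearch (its negative result -(insertionPoint) - 1 is the same value as ~insertionPoint), and the capacity is fixed so the gc-and-grow step is left out. Each call reports which branch handled it.

```java
import java.util.Arrays;

final class PutSketch {
    static final Object DELETED = new Object();

    // Assumption: fixed capacity of 8; the real class grows its arrays instead.
    final int[] keys = new int[8];
    final Object[] values = new Object[8];
    int size = 0;

    // Returns the branch taken: "overwrite", "reuse" or "insert".
    String put(int key, Object value) {
        int i = Arrays.binarySearch(keys, 0, size, key);
        if (i >= 0) {
            values[i] = value;            // key already present
            return "overwrite";
        }
        i = ~i;                           // insertion point
        if (i < size && values[i] == DELETED) {
            keys[i] = key;                // tombstone sits exactly at the insertion point
            values[i] = value;
            return "reuse";
        }
        if (size == keys.length) {
            throw new IllegalStateException("sketch capacity exceeded");
        }
        System.arraycopy(keys, i, keys, i + 1, size - i);
        System.arraycopy(values, i, values, i + 1, size - i);
        keys[i] = key;
        values[i] = value;
        size++;
        return "insert";
    }

    // Marks the slot for the key as deleted without shifting anything.
    void delete(int key) {
        int i = Arrays.binarySearch(keys, 0, size, key);
        if (i >= 0) {
            values[i] = DELETED;
        }
    }

    public static void main(String[] args) {
        PutSketch s = new PutSketch();
        System.out.println(s.put(10, "a"));
        System.out.println(s.put(30, "c"));
        System.out.println(s.put(20, "b"));
        System.out.println(s.put(20, "B"));
        s.delete(20);
        System.out.println(s.put(15, "x"));
    }
}
```

In this run the last call takes the "reuse" branch because the tombstone left by delete(20) sits exactly where key 15 belongs. A key such as 25 would have its insertion point one slot later and would go through the ordinary insert path.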

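ContainerHelpers.binarySearch(int[] array, int size, int value)

// Binary-searches the first `size` elements of the sorted array for `value`.
static int binarySearch(int[] array, int size, int value) {
    int lo = 0;
    int hi = size - 1;

    while (lo <= hi) {
        // Unsigned shift avoids overflow when lo + hi exceeds Integer.MAX_VALUE.
        final int mid = (lo + hi) >>> 1;
        final int midVal = array[mid];

        if (midVal < value) {
            lo = mid + 1;
        } else if (midVal > value) {
            hi = mid - 1;
        } else {
            // Return the index of the matching element.
            return mid;  // value found
        }
    }
    // No match: return the bitwise complement of the insertion point, which is always negative.
    return ~lo;  // value not present
}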
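
The return convention is easiest to see on concrete inputs. The sketch below copies the loop above into a hypothetical class BinarySearchSketch so that it compiles on its own; describe is an illustrative helper that is not part of the Android source.

```java
final class BinarySearchSketch {
    // Same loop as the listing above.
    static int binarySearch(int[] array, int size, int value) {
        int lo = 0;
        int hi = size - 1;
        while (lo <= hi) {
            final int mid = (lo + hi) >>> 1;
            final int midVal = array[mid];
            if (midVal < value) {
                lo = mid + 1;
            } else if (midVal > value) {
                hi = mid - 1;
            } else {
                return mid;
            }
        }
        return ~lo;
    }

    // Hypothetical helper: turns a result into words.
    static String describe(int result) {
        return result >= 0
                ? "found at index " + result
                : "absent, insertion point " + (~result);
    }

    public static void main(String[] args) {
        int[] keys = {2, 4, 6, 0, 0}; // only the first 3 slots are in use
        System.out.println(describe(binarySearch(keys, 3, 4)));
        System.out.println(describe(binarySearch(keys, 3, 5)));
        System.out.println(describe(binarySearch(keys, 3, 1)));
    }
}
```

Because ~lo equals -lo - 1, even an insertion point of 0 is reported as -1, so a negative result can never be confused with a valid index.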

GrowingArrayUtils.insert(int[] array, int currentSize, int index, int element)

public static int[] insert(int[] array, int currentSize, int index, int element) {
    assert currentSize <= array.length;

    // The array still has room for one more element.
    if (currentSize + 1 <= array.length) {
        // Shift the elements from index (inclusive) onwards one position to the right.
        System.arraycopy(array, index, array, index + 1, currentSize - index);
        array[index] = element;
        return array;
    }

    // 1. The array is full, so allocate a larger one.
    // growSize(int currentSize) => currentSize <= 4 ? 8 : currentSize * 2;
    int[] newArray = ArrayUtils.newUnpaddedIntArray(growSize(currentSize));
    // 2. Copy the first `index` elements (indices 0 to index - 1) into the new array.
    System.arraycopy(array, 0, newArray, 0, index);
    // 3. Store the new element at position index.
    newArray[index] = element;
    // 4. Copy the old elements from index to the end into the new array, starting at index + 1.
    System.arraycopy(array, index, newArray, index + 1, array.length - index);
    return newArray;
}
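
The two paths through insert (shift in place versus allocate and copy) can be exercised with a small sketch. InsertSketch is a hypothetical class, and plain new int[] is used here in place of ArrayUtils.newUnpaddedIntArray, which is an Android-internal helper; the growth rule is the one quoted in the comment above.

```java
import java.util.Arrays;

final class InsertSketch {
    // Growth policy quoted above: small arrays jump to 8, larger ones double.
    static int growSize(int currentSize) {
        return currentSize <= 4 ? 8 : currentSize * 2;
    }

    // Assumption: `new int[]` stands in for ArrayUtils.newUnpaddedIntArray.
    static int[] insert(int[] array, int currentSize, int index, int element) {
        if (currentSize + 1 <= array.length) {
            // Room left: shift the tail right by one and reuse the same array.
            System.arraycopy(array, index, array, index + 1, currentSize - index);
            array[index] = element;
            return array;
        }
        // Full: copy head, place the element, then copy the tail one slot later.
        int[] newArray = new int[growSize(currentSize)];
        System.arraycopy(array, 0, newArray, 0, index);
        newArray[index] = element;
        System.arraycopy(array, index, newArray, index + 1, array.length - index);
        return newArray;
    }

    public static void main(String[] args) {
        int[] roomy = {1, 3, 0};          // two slots used, one free
        int[] same = insert(roomy, 2, 1, 2);
        System.out.println(Arrays.toString(same) + " reused=" + (same == roomy));

        int[] full = {1, 3};              // both slots used
        int[] grown = insert(full, 2, 1, 2);
        System.out.println(Arrays.toString(grown) + " reused=" + (grown == full));
    }
}
```

The first call returns the same array object, while the second returns a new array of length 8 and leaves the old one untouched. This is why put assigns the result back to mKeys and mValues.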

2.3 Deleting the element for a given key

public void delete(int key) {
    // Binary-search for the key.
    int i = ContainerHelpers.binarySearch(mKeys, mSize, key);

    if (i >= 0) {
        // Replace the value with the DELETED sentinel instead of removing the slot now.
        if (mValues[i] != DELETED) {
            mValues[i] = DELETED;
            // Record that at least one slot is waiting to be compacted by gc().
            mGarbage = true;
        }
    }
}

public void remove(int key) {
    delete(key);
}

// Removes the value for the given key and returns it, or returns null if the key is absent.
public E removeReturnOld(int key) {
    int i = ContainerHelpers.binarySearch(mKeys, mSize, key);

    if (i >= 0) {
        if (mValues[i] != DELETED) {
            final E old = (E) mValues[i];
            mValues[i] = DELETED;
            mGarbage = true;
            return old;
        }
    }
    return null;
}

// Marks the value at the given index as deleted.
public void removeAt(int index) {
    if (mValues[index] != DELETED) {
        mValues[index] = DELETED;
        mGarbage = true;
    }
}

// Removes the elements at indices [index, index + size), clamped to mSize.
public void removeAtRange(int index, int size) {
    final int end = Math.min(mSize, index + size);
    for (int i = index; i < end; i++) {
        removeAt(i);
    }
}
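
What the delete family has in common is that nothing is shifted at deletion time. The sketch below illustrates this on pre-filled arrays; LazyDeleteSketch and its members are hypothetical, and java.util.Arrays.binarySearch again stands in for ContainerHelpers.binarySearch.

```java
import java.util.Arrays;

final class LazyDeleteSketch {
    static final Object DELETED = new Object();

    final int[] keys = {10, 20, 30};
    final Object[] values = {"a", "b", "c"};
    final int size = 3;        // slots in use, tombstones included
    boolean garbage = false;   // plays the role of mGarbage

    void delete(int key) {
        int i = Arrays.binarySearch(keys, 0, size, key);
        if (i >= 0 && values[i] != DELETED) {
            values[i] = DELETED;   // mark only; keys and size stay as they are
            garbage = true;
        }
    }

    // Lookup that treats a tombstone like a missing key.
    Object get(int key, Object fallback) {
        int i = Arrays.binarySearch(keys, 0, size, key);
        return (i < 0 || values[i] == DELETED) ? fallback : values[i];
    }

    public static void main(String[] args) {
        LazyDeleteSketch s = new LazyDeleteSketch();
        s.delete(20);
        System.out.println(s.get(20, "missing"));
        System.out.println(Arrays.toString(s.keys) + " garbage=" + s.garbage);
    }
}
```

After delete(20) the key 20 is still present in the key array, which keeps the array sorted for later binary searches; only the value slot changes.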

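2.4 Getting the element for a given key

public E get(int key, E valueIfKeyNotFound) {
    // Binary-search for the key.
    int i = ContainerHelpers.binarySearch(mKeys, mSize, key);

    // A missing key and a key whose value is marked DELETED are treated the same way.
    if (i < 0 || mValues[i] == DELETED) {
        return valueIfKeyNotFound;
    } else {
        // Found: return the value stored at the same index.
        return (E) mValues[i];
    }
}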

2.5 Getting the number of elements

public int size() {
    if (mGarbage) {
        // Some slots are marked as deleted: compact first so that mSize counts only live entries.
        gc();
    }

    return mSize;
}

gc()

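private void gc() {
    int n = mSize;
    int o = 0;  // next write position for a live entry
    int[] keys = mKeys;
    Object[] values = mValues;

    // Walk the value array and move every entry that is not DELETED towards the front.
    for (int i = 0; i < n; i++) {
        Object val = values[i];

        if (val != DELETED) {
            if (i != o) {
                keys[o] = keys[i];
                values[o] = val;
                // Clear the old slot so it does not keep the value reachable.
                values[i] = null;
            }

            o++;
        }
    }

    mGarbage = false;
    mSize = o;
}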
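
The compaction loop can be tried on a small array. The sketch below lifts the loop into a hypothetical static method compact that returns the new size; CompactSketch is not part of the Android source.

```java
import java.util.Arrays;

final class CompactSketch {
    static final Object DELETED = new Object();

    // Same loop as gc() above; returns the number of live entries.
    static int compact(int[] keys, Object[] values, int size) {
        int o = 0;
        for (int i = 0; i < size; i++) {
            Object val = values[i];
            if (val != DELETED) {
                if (i != o) {
                    keys[o] = keys[i];
                    values[o] = val;
                    values[i] = null;
                }
                o++;
            }
        }
        return o;
    }

    public static void main(String[] args) {
        int[] keys = {1, 2, 3, 4, 5};
        Object[] values = {"a", DELETED, "c", DELETED, "e"};
        int live = compact(keys, values, 5);
        System.out.println(live + " live keys: " + Arrays.toString(Arrays.copyOf(keys, live)));
    }
}
```

Three entries survive here, with keys 1, 3 and 5 packed into the first three slots. Slots at or beyond the new size may still hold stale keys; that is harmless because every search is limited to the first mSize elements.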

3. Example

Insert the following elements in order:

Insertion order	1st	2nd	3rd	4th	5th
Key (int)	5	4	3	2	4
Value (String)	"5"	"4"	"3"	"2"	"44"

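The arrays after each insertion are shown below.
Columns are array indices and rows are the insertions; each row shows the <key, value> pairs after that insertion. Each of the first four keys is smaller than every key already stored, so it lands at index 0 and shifts the existing entries one position to the right. The final key 4 already exists, so its value at index 2 is overwritten in place.

Operation	0	1	2	3	4	5	6
Insert 5	<5, "5">	<0, null>	<0, null>	<0, null>	<0, null>	<0, null>	<0, null>
Insert 4	<4, "4">	<5, "5">	<0, null>	<0, null>	<0, null>	<0, null>	<0, null>
Insert 3	<3, "3">	<4, "4">	<5, "5">	<0, null>	<0, null>	<0, null>	<0, null>
Insert 2	<2, "2">	<3, "3">	<4, "4">	<5, "5">	<0, null>	<0, null>	<0, null>
Insert 4	<2, "2">	<3, "3">	<4, "44">	<5, "5">	<0, null>	<0, null>	<0, null>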
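
The table can be reproduced with a short sketch. PutTraceSketch is a hypothetical class with a fixed capacity of 7, chosen only to match the seven columns above; it keeps just the overwrite and shift-insert paths of put and returns the index where each element lands.

```java
import java.util.Arrays;

final class PutTraceSketch {
    // Assumption: 7 slots, matching the columns of the table.
    static final int CAPACITY = 7;

    final int[] keys = new int[CAPACITY];
    final String[] values = new String[CAPACITY];
    int size = 0;

    // Stores the pair and returns the index it ends up at.
    int put(int key, String value) {
        int i = Arrays.binarySearch(keys, 0, size, key);
        if (i >= 0) {
            values[i] = value;            // existing key: overwrite
            return i;
        }
        i = ~i;                           // new key: shift the tail and insert
        System.arraycopy(keys, i, keys, i + 1, size - i);
        System.arraycopy(values, i, values, i + 1, size - i);
        keys[i] = key;
        values[i] = value;
        size++;
        return i;
    }

    // Formats all slots, used or not, as <key, value> pairs.
    String row() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < CAPACITY; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append('<').append(keys[i]).append(", ").append(values[i]).append('>');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        PutTraceSketch s = new PutTraceSketch();
        int[] ks = {5, 4, 3, 2, 4};
        String[] vs = {"5", "4", "3", "2", "44"};
        for (int n = 0; n < ks.length; n++) {
            int at = s.put(ks[n], vs[n]);
            System.out.println("put " + ks[n] + " -> index " + at + ": " + s.row());
        }
    }
}
```

The landing indices for the five calls are 0, 0, 0, 0 and 2, and the element count stays at 4 after the last call because that call only overwrites.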

4. Summary

Consider the questions raised earlier:

What advantages does SparseArray have over HashMap?
Which scenarios is SparseArray suited to?
Every lookup is a binary search; how efficient are lookups?

4.1 Advantages

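SparseArray keys are of type int, which avoids boxing a primitive key into an Integer on every operation;
With small data sets (fewer than about a thousand entries), lookups are fast. An insertion may still have to shift or copy array elements, much like inserting at the front of an ArrayList;
SparseArray optimizes deletion;
When an element is deleted, it is not removed from the value array immediately; its slot is marked as DELETED instead. If a value is later stored for the same key, or a new key lands on that same slot, the slot is reused. Otherwise the marked slots are compacted by gc() at a suitable time, for example when size() is called or when put finds the arrays full.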

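4.2 Disadvantages

An insertion may have to shift or copy the arrays, so adding elements costs O(n) in the worst case;
With very large data sets, both the array copies and gc() become expensive;
Every lookup is a binary search, which takes O(log n) steps; with very large data sets this is slower than the expected constant-time lookup of a HashMap.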
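
To put rough numbers on these costs, the sketch below counts how many loop iterations the binary search needs for a key larger than every stored key, and how many elements an insertion has to shift. CostSketch and both method names are hypothetical, and the stored keys are assumed to be 0 to n - 1 so that no array has to be allocated.

```java
final class CostSketch {
    // Iterations of the binary search loop when searching n sorted keys
    // for a value above all of them (every comparison moves lo to the right).
    static int probes(int n) {
        int lo = 0;
        int hi = n - 1;
        int count = 0;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            count++;
            lo = mid + 1;
        }
        return count;
    }

    // Elements that System.arraycopy must move when inserting at `index`
    // into an array with `size` slots in use.
    static int shifted(int size, int index) {
        return size - index;
    }

    public static void main(String[] args) {
        System.out.println("probes for 1,000 keys: " + probes(1_000));
        System.out.println("probes for 1,000,000 keys: " + probes(1_000_000));
        System.out.println("shifts for front insert into 1,000 keys: " + shifted(1_000, 0));
    }
}
```

A lookup among 1,000 keys needs 10 iterations and one among 1,000,000 keys needs 20, so lookup cost grows slowly. Inserting at the front moves every stored element, so with large collections the shifting and copying tend to dominate.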