Introduction

If an interview calls for this knowledge and you don't have it, the experience is frustrating and can cost you the offer. Whether you cram for a few days or study in spare moments, time spent on data structures is worthwhile. So what data structures does Python have? Lists, dictionaries, sets, and... stacks? Does Python even have a stack? This series of articles fills in the pieces one at a time.

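Chapter 1: ADT (abstract data type), defining data and its operations

What is an ADT? An abstract data type defines a collection of data together with the operations allowed on it, independent of how it is implemented. Anyone who has studied data structures should know the term.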

How to choose a data structure for an ADT:

Does the data structure meet the storage requirements specified by the ADT's domain?
Does the data structure provide the data access and manipulation operations needed to fully implement the ADT?
Can the operations be executed efficiently? Judge this with complexity analysis.

The following code is a simple example. To implement a simple Bag class, first define its operations, then implement them with the class's magic (special) methods:

class Bag:
    """
    Operations of the Bag ADT:
    constructor
    size
    contains
    add
    remove
    iter
    """
    def __init__(self):
        self._items = list()

    def __len__(self):
        return len(self._items)

    def __contains__(self, item):
        return item in self._items

    def add(self, item):
        self._items.append(item)

    def remove(self, item):
        assert item in self._items, 'item must be in the bag'
        return self._items.remove(item)

    def __iter__(self):
        return _BagIterator(self._items) 


class _BagIterator:
    """ Note that the iterator class is implemented here""" 
    def __init__(self, seq): 
        self._bag_items = seq 
        self._cur_item = 0 

    def __iter__(self): 
        return self 

    def __next__(self): 
        if self._cur_item < len(self._bag_items): 
            item = self._bag_items[self._cur_item] 
            self._cur_item += 1 
            return item 
        else: 
            raise StopIteration 


b = Bag() 
b.add(1) 
b.add(2) 
for i in b:  # for calls __iter__ to get an iterator, then calls __next__ repeatedly
    print(i) 


""" 
# The for statement is equivalent to 
i = b.__iter__() 
while True: 
    try:
        item = i.__next__()
        print(item)
    except StopIteration:
        break
"""
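The hand-written iterator class above shows the protocol in full, but the same behaviour can usually be had by delegating to the underlying list's own iterator. A minimal sketch, assuming the same list-backed storage; the class name SimpleBag is hypothetical and not part of the original code:

```python
class SimpleBag:
    """List-backed bag whose __iter__ delegates to the list's own iterator."""

    def __init__(self):
        self._items = []

    def add(self, item):
        self._items.append(item)

    def __iter__(self):
        # iter() on a list returns a ready-made iterator that implements
        # __next__, so no separate iterator class is needed
        return iter(self._items)


bag = SimpleBag()
bag.add(1)
bag.add(2)

it = iter(bag)      # same as bag.__iter__()
first = next(it)    # same as it.__next__()
rest = list(it)     # consumes whatever the iterator has left
print(first, rest)
```

The built-ins iter() and next() are the usual way to call __iter__ and __next__ rather than invoking the dunder methods directly.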

Chapter 2: array and list

array: fixed length, limited operations, but memory-efficient. I don't think I have ever used one in my career, but I tried it in Python 3.5 and the standard library does have an array module, available directly via import array. (Its array.array type is typed and compact, though it can still grow.)
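As a quick look at the standard-library array module mentioned above, here is a small sketch. The type code 'i' (signed int) and the variable names are choices made for this illustration:

```python
import array
import sys

# 'i' is the type code for signed int; every element must be an int
numbers = array.array('i', [1, 2, 3])
numbers.append(4)              # array.array can still grow

try:
    numbers.append('five')     # wrong element type is rejected
except TypeError:
    rejected = True
else:
    rejected = False

big_array = array.array('i', range(1000))
big_list = list(range(1000))

print(numbers.tolist(), rejected)
# The array stores raw C ints; the list stores references to int objects.
# The exact byte counts depend on the interpreter and platform.
print(sys.getsizeof(big_array), sys.getsizeof(big_list))
```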

list: over-allocates memory in advance and supports a rich set of operations, at the cost of extra memory. I experimented with sys.getsizeof to confirm this. As I understand it, list is very similar to vector in the C++ STL, and it is the most frequently used data structure.
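The sys.getsizeof experiment can be reproduced along these lines. This is a sketch that assumes CPython, where the reported size of a list changes only when its underlying buffer is reallocated; other interpreters may behave differently:

```python
import sys

items = []
sizes = []
for n in range(64):
    items.append(n)
    # Size in bytes of the list object itself (not of the elements it refers to)
    sizes.append(sys.getsizeof(items))

# Each distinct size corresponds to one growth step of the underlying buffer
distinct_sizes = sorted(set(sizes))
print(len(sizes), "appends,", len(distinct_sizes), "distinct sizes")
```

Far fewer distinct sizes than appends means most appends reuse space that was already reserved.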

list.append: if the memory allocated so far is not enough, a new, larger block is allocated and the existing data is copied over. That single call degrades to O(n), although append is O(1) amortized.
list.insert: moves every element after the insertion point, O(n)
list.pop: the cost depends on the position. pop() removes the last element in O(1); pop(0) removes the first element and shifts the rest, O(n).
list[]: a slice operation allocates a new list and copies the selected elements into it, O(k) for a slice of length k
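The operations listed above can be seen in a few lines. This sketch only shows what each call does; the comments describe the element movement that drives the cost:

```python
data = [10, 20, 30, 40, 50]

last = data.pop()       # removes the last element; nothing has to move
first = data.pop(0)     # removes the first element; the rest shift left by one
data.insert(1, 25)      # elements from index 1 onward shift right by one

part = data[1:3]        # slicing builds a new list with the selected elements
part[0] = -1            # changing the slice does not touch the original list

print(last, first, data, part)
```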

To implement an Array ADT:

import ctypes

class Array:
    def __init__(self, size):
        assert size > 0, 'array size must be > 0'
        self._size = size
        PyArrayType = ctypes.py_object * size
        self._elements = PyArrayType()
        self.clear(None)

    def __len__(self):
        return self._size

    def __getitem__(self, index):
        assert index >= 0 and index < len(self), 'out of range'
        return self._elements[index]

    def __setitem__(self, index, value):
        assert index >= 0 and index < len(self), 'out of range'
        self._elements[index] = value

    def clear(self, value):
        """ Set each element to value """
        for i in range(len(self)):
            self._elements[i] = value

    def __iter__(self):
        return _ArrayIterator(self._elements)


class _ArrayIterator:
    def __init__(self, items):
        self._items = items
        self._idx = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self._idx < len(self._items):
            val = self._items[self._idx]
            self._idx += 1
            return val
        else:
            raise StopIteration

2.1 Two-Dimensional Arrays

class Array2D:
    """ Methods to implement
    Array2D(nrows, ncols):    constructor
    numRows()
    numCols()
    clear(value)
    getitem(i, j)
    setitem(i, j, val)
    """
    def __init__(self, numrows, numcols):
        self._the_rows = Array(numrows)     # an array of arrays
        for i in range(numrows):
            self._the_rows[i] = Array(numcols)

    @property
    def numRows(self):
        return len(self._the_rows)

    @property
    def NumCols(self):
        return len(self._the_rows[0])

    def clear(self, value):
        for row in self._the_rows:
            row.clear(value)

    def __getitem__(self, ndx_tuple):    # ndx_tuple: (x, y)
        assert len(ndx_tuple) == 2
        row, col = ndx_tuple[0], ndx_tuple[1]
        assert (row >= 0 and row < self.numRows and
                col >= 0 and col < self.NumCols)

        the_1d_array = self._the_rows[row]
        return the_1d_array[col]

    def __setitem__(self, ndx_tuple, value):
        assert len(ndx_tuple) == 2
        row, col = ndx_tuple[0], ndx_tuple[1]
        assert (row >= 0 and row < self.numRows and
                col >= 0 and col < self.NumCols)
        the_1d_array = self._the_rows[row]
        the_1d_array[col] = value

2.2 The Matrix ADT: m rows, n columns. In practice it is best to process matrices with pandas; implementing one yourself is more painful.

class Matrix:
    """ In practice, prefer pandas' DataFrame
    Matrix(rows, ncols): constructor
    numCols()
    getitem(row, col)
    setitem(row, col, val)
    scaleBy(scalar): multiply every element by scalar
    transpose(): return the transpose
    add(rhsMatrix):    size must be the same
    subtract(rhsMatrix)
    multiply(rhsMatrix)
    """
    def __init__(self, numRows, numCols):
        self._theGrid = Array2D(numRows, numCols)
        self._theGrid.clear(0)

    @property
    def numRows(self):
        return self._theGrid.numRows

    @property
    def numCols(self):
        # Array2D spells its column property NumCols
        return self._theGrid.NumCols

    def __getitem__(self, ndxTuple):
        return self._theGrid[ndxTuple[0], ndxTuple[1]]

    def __setitem__(self, ndxTuple, scalar):
        self._theGrid[ndxTuple[0], ndxTuple[1]] = scalar

    def scaleBy(self, scalar):
        for r in range(self.numRows):
            for c in range(self.numCols):
                self[r, c] *= scalar

    def __add__(self, rhsMatrix):
        assert (rhsMatrix.numRows == self.numRows and
                rhsMatrix.numCols == self.numCols)
        newMatrix = Matrix(self.numRows, self.numCols)
        for r in range(self.numRows):
            for c in range(self.numCols):
                newMatrix[r, c] = self[r, c] + rhsMatrix[r, c]
        return newMatrix
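The docstring also lists transpose and multiply, which the listing above does not implement. A minimal sketch of both on plain nested lists rather than on the Matrix class; the function names transpose and multiply are chosen for this illustration:

```python
def transpose(matrix):
    """Return a new matrix whose rows are the columns of `matrix`."""
    return [list(column) for column in zip(*matrix)]


def multiply(left, right):
    """Return the matrix product left x right."""
    assert len(left[0]) == len(right), 'inner dimensions must match'
    right_columns = transpose(right)
    # Entry (r, c) is the dot product of row r of left and column c of right
    return [[sum(a * b for a, b in zip(row, column)) for column in right_columns]
            for row in left]


a = [[1, 2, 3],
     [4, 5, 6]]     # 2 x 3
b = [[1, 0],
     [0, 1],
     [1, 1]]        # 3 x 2

a_t = transpose(a)
product = multiply(a, b)
print(a_t)
print(product)
```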

Chapter 3: Sets and Maps

Besides list, the most commonly used structures are probably Python's built-in set and dict.

3.1 The Set ADT

A set is a container that stores a collection of unique values from a given comparable domain, in no particular order.

class Set:
    """ Implement the Set ADT with a list
    Set()
    length()
    contains(element)
    add(element)
    remove(element)
    equals(setB)
    isSubsetOf(setB)
    union(setB)
    intersect(setB)
    difference(setB)
    iterator()
    """
    def __init__(self):
        self._theElements = list()

    def __len__(self):
        return len(self._theElements)

    def __contains__(self, element):
        return element in self._theElements

    def add(self, element):
        if element not in self:
            self._theElements.append(element)

    def remove(self, element):
        assert element in self, 'The element must be in the set'
        self._theElements.remove(element)

    def __eq__(self, setB):
        if len(self) != len(setB):
            return False
        else:
            return self.isSubsetOf(setB)

    def isSubsetOf(self, setB):
        for element in self:
            if element not in setB:
                return False
        return True

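    def union(self, setB):
        newSet = Set()
        newSet._theElements.extend(self._theElements)
        for element in setB:
            if element not in self:
                newSet._theElements.append(element)
        return newSet

    def __iter__(self):
        # Needed by isSubsetOf and union, which loop over a Set
        return iter(self._theElements)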
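For comparison, Python's built-in set already covers the operations named in the Set docstring. A short sketch mapping each ADT operation to the built-in equivalent, with arbitrary example values:

```python
a = {1, 2, 3}
b = {3, 4}

a.add(2)                    # add(element): a duplicate leaves the set unchanged
size = len(a)               # length()
has_two = 2 in a            # contains(element)
is_subset = {3} <= a        # isSubsetOf(setB)
union = a | b               # union(setB)
common = a & b              # intersect(setB)
only_a = a - b              # difference(setB)

print(size, has_two, is_subset)
print(union, common, only_a)
```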

3.2 Maps or Dict: key-value pairs. Python's built-in dict is implemented internally with a hash table.

class Map:
    """ Map ADT, list implementation
    Map()
    length()
    contains(key)
    add(key, value)
    remove(key)
    valueOf(key)
    iterator()
    """
    def __init__(self):
        self._entryList = list()

    def __len__(self):
        return len(self._entryList)

    def __contains__(self, key):
        ndx = self._findPosition(key)
        return ndx is not None

    def add(self, key, value):
        ndx = self._findPosition(key)
        if ndx is not None:
            self._entryList[ndx].value = value
            return False
        else:
            entry = _MapEntry(key, value)
            self._entryList.append(entry)
            return True

    def valueOf(self, key):
        ndx = self._findPosition(key)
        assert ndx is not None, 'Invalid map key'
        return self._entryList[ndx].value

    def remove(self, key):
        ndx = self._findPosition(key)
        assert ndx is not None, 'Invalid map key'
        self._entryList.pop(ndx)

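    def __iter__(self):
        # _MapIterator is not shown in this article; yielding the keys
        # (as dict does) is an assumption that keeps the class runnable
        for entry in self._entryList:
            yield entry.key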

    def _findPosition(self, key):
        for i in range(len(self)):
            if self._entryList[i].key == key:
                return i
        return None


class _MapEntry:    # or use collections.namedtuple('_MapEntry', 'key,value')
    def __init__(self, key, value):
        self.key = key
        self.value = value
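The list-based Map above finds a key by scanning, so every lookup is O(n). The built-in dict offers the same operations backed by a hash table. A short sketch of the correspondence, with made-up keys and values:

```python
ages = {}                       # Map()
ages['alice'] = 30              # add(key, value)
ages['alice'] = 31              # adding an existing key replaces its value
ages['bob'] = 25

count = len(ages)               # length()
has_alice = 'alice' in ages     # contains(key)
alice_age = ages['alice']       # valueOf(key)
del ages['bob']                 # remove(key)
keys = list(ages)               # iterating a dict yields its keys

print(count, has_alice, alice_age, keys)
```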

3.3 The MultiArray ADT: a multi-dimensional array is generally simulated with a one-dimensional array, and each element is located by computing its subscript (offset).

class MultiArray:
    """ row-major or column-major ordering; this is row-major ordering
    MultiArray(d1, d2, ...dn)
    dims():   the number of dimensions
    length(dim): the length of the given array dimension
    clear(value)
    getitem(i1, i2, ... in), index(i1,i2,i3) = i1*(d2*d3) + i2*d3 + i3
    setitem(i1, i2, ... in)
    Computing the index: index(i1,i2,...in) = i1*f1 + i2*f2 + ... + i(n-1)*f(n-1) + in*1
    """
    def __init__(self, *dimensions):
        # Implementation of the MultiArray ADT using a 1-D array
        # (standing in for an array of arrays of arrays...)
        assert len(dimensions) > 1, 'The array must have 2 or more dimensions'
        self._dims = dimensions
        # Compute the total number of elements in the array
        size = 1
        for d in dimensions:
            assert d > 0, 'Dimensions must be > 0'
            size *= d
        # Create the 1-D array to store the elements
        self._elements = Array(size)
        # Create a 1-D array to store the equation factors
        self._factors = Array(len(dimensions))
        self._computeFactors()

    @property
    def numDims(self):
        return len(self._dims)

    def length(self, dim):
        assert 1 <= dim <= len(self._dims), 'Dimension component out of range'
        return self._dims[dim-1]

    def clear(self, value):
        self._elements.clear(value)

    def __getitem__(self, ndxTuple):
        assert len(ndxTuple) == self.numDims, 'Invalid # of array subscripts'
        index = self._computeIndex(ndxTuple)
        assert index is not None, 'Array subscript out of range'
        return self._elements[index]

    def __setitem__(self, ndxTuple, value):
        assert len(ndxTuple) == self.numDims, 'Invalid # of array subscripts'
        index = self._computeIndex(ndxTuple)
        assert index is not None, 'Array subscript out of range'
        self._elements[index] = value

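    def _computeIndex(self, ndxTuple):
        # using the equation: i1*f1 + i2*f2 + ... + in*fn
        offset = 0
        for j in range(len(ndxTuple)):
            if ndxTuple[j] < 0 or ndxTuple[j] >= self._dims[j]:
                return None
            else:
                offset += ndxTuple[j] * self._factors[j]
        return offset

    def _computeFactors(self):
        # Not shown in the original listing; reconstructed from the row-major
        # equation: fn = 1 and fj = f(j+1) * d(j+1)
        last = len(self._dims) - 1
        self._factors[last] = 1
        for j in range(last - 1, -1, -1):
            self._factors[j] = self._factors[j + 1] * self._dims[j + 1]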
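The row-major index equation from the docstring can be checked by hand on a small case. A standalone sketch using plain lists; the function names compute_factors and compute_index are chosen for this illustration and are not part of the class above:

```python
def compute_factors(dims):
    """factors[j] is the product of all dimension sizes after position j."""
    factors = [1] * len(dims)
    for j in range(len(dims) - 2, -1, -1):
        factors[j] = factors[j + 1] * dims[j + 1]
    return factors


def compute_index(dims, indices):
    """Flatten an n-dimensional index into a row-major offset."""
    factors = compute_factors(dims)
    return sum(i * f for i, f in zip(indices, factors))


dims = (2, 3, 4)
factors = compute_factors(dims)             # [3*4, 4, 1] = [12, 4, 1]
offset = compute_index(dims, (1, 2, 3))     # 1*12 + 2*4 + 3*1 = 23
print(factors, offset)
```

With dimensions (2, 3, 4) there are 24 elements, and the last index (1, 2, 3) maps to offset 23, the final slot of the 1-D array.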

Chapter 4: Algorithm Analysis

Big O notation is generally used to describe how an algorithm's running time grows with the input size: 1 < log(n) < n < n log(n) < n^2 < n^3 < a^n. Knowing the average time complexity of common data structure operations helps you pick the more efficient structure. Sometimes time has to be traded off against space, and some operations occasionally degrade. The list append operation is an example: if the list runs out of space, new space is allocated and that call degrades to O(n), so amortized analysis is sometimes needed to describe the cost per operation.
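The amortized argument for append can be made concrete by counting element copies. The sketch below simulates a dynamic array that doubles its capacity when full. Doubling is a simplifying assumption for the illustration (CPython's list grows by a smaller factor), and count_copies is a hypothetical helper:

```python
def count_copies(num_appends):
    """Simulate appends to a dynamic array that doubles when full.

    Returns the total number of element copies caused by resizing.
    """
    capacity = 1
    size = 0
    copies = 0
    for _ in range(num_appends):
        if size == capacity:
            copies += size      # a resize copies every existing element: O(n)
            capacity *= 2
        size += 1
    return copies


for n in (10, 1000, 100000):
    total = count_copies(n)
    print(n, total, total / n)  # copies per append stays below 2
```

A single append may copy the whole array, but the total number of copies over n appends stays below 2n, so the average cost per append is O(1).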

Use python to implement basic data structures [01/4]