Introduction
If you need this knowledge and don't have it, the experience is frustrating and can cost you an interview. Whether you spend a few days cramming or study in spare moments, time spent on data structures is worthwhile. So what data structures does Python have? Lists, dictionaries, sets, and... stacks? Does Python have a stack? This series of articles fills in the pieces one by one.
Chapter 1: Abstract Data Types (ADT), defining data and its operations
What is an ADT: an abstract data type. It specifies a set of values and the operations on them without fixing how they are implemented. Anyone who has studied data structures should know the term.
How to choose a data structure to implement an ADT:
- Does the data structure meet the storage requirements specified by the ADT's domain?
- Does the data structure provide the data access and manipulation capabilities needed to fully implement the ADT?
- Does it execute efficiently? Judge this by complexity analysis.
The following code is a simple example. To implement a Bag class, first define its operations, then implement them with the class's magic methods:
    class Bag:
        """
        Bag ADT operations:
            constructor
            size
            contains
            add
            remove
            iter
        """

        def __init__(self):
            self._items = list()

        def __len__(self):
            return len(self._items)

        def __contains__(self, item):
            return item in self._items

        def add(self, item):
            self._items.append(item)

        def remove(self, item):
            assert item in self._items, 'item must be in the bag'
            return self._items.remove(item)

        def __iter__(self):
            return _BagIterator(self._items)


    class _BagIterator:
        """ Note that the iterator class is implemented here """

        def __init__(self, seq):
            self._bag_items = seq
            self._cur_item = 0

        def __iter__(self):
            return self

        def __next__(self):
            if self._cur_item < len(self._bag_items):
                item = self._bag_items[self._cur_item]
                self._cur_item += 1
                return item
            else:
                raise StopIteration


    b = Bag()
    b.add(1)
    b.add(2)
    for i in b:  # for calls __iter__ once, then __next__ to fetch each item
        print(i)

    # The for statement is equivalent to:
    #
    # i = b.__iter__()
    # while True:
    #     try:
    #         item = i.__next__()
    #         print(item)
    #     except StopIteration:
    #         break
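The hand-written iterator class above can also be expressed with a generator. The following is a minimal sketch, not part of the original code: GeneratorBag is a hypothetical name, and only the operations needed to show iteration are included.

```python
class GeneratorBag:
    """Hypothetical variant of Bag whose __iter__ is a generator function."""

    def __init__(self):
        self._items = []

    def add(self, item):
        self._items.append(item)

    def __len__(self):
        return len(self._items)

    def __iter__(self):
        # Calling a generator function returns an object that already
        # implements __iter__ and __next__, so no helper class is needed.
        for item in self._items:
            yield item


bag = GeneratorBag()
bag.add(1)
bag.add(2)
collected = list(bag)  # list() drives the same protocol as a for loop
print(collected)
```

The explicit _BagIterator version is still worth reading once, because it shows exactly what the for statement calls.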
Chapter 2: array and list
array: fixed length and limited operations, but memory-efficient. I don't think I have ever used one in my career, but I checked in Python 3.5 and the standard library does have an array module (import array). Its array type stores values of a single type and, unlike the classic fixed-length array, can still grow.
list: allocates memory in advance and supports rich operations, at the cost of more memory. I experimented with sys.getsizeof to confirm this. My understanding is that it is very similar to vector in the C++ STL. It is the most frequently used data structure.
- list.append: if the pre-allocated memory is used up, a new, larger block is allocated and the existing data is copied over, so that particular append degrades to O(n); amortized over many appends it is still O(1).
- list.insert: shifts every element after the insertion point, O(n).
- list.pop: the complexity depends on the position. pop() removes the last element in O(1); pop(0) removes the first element and shifts all the others, O(n).
- list[]: a slice allocates a new list and copies the selected elements into it, O(k) for a slice of length k.
An implementation of the Array ADT:
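The pre-allocation described above can be observed with sys.getsizeof, the same tool mentioned earlier. This is a rough sketch that assumes CPython; the exact byte counts and growth points vary between versions, so none are claimed here. Note that getsizeof on a list counts the list's own storage (the references), not the objects it refers to.

```python
import sys
from array import array

# Track how the reported size of a list changes as items are appended.
numbers = []
sizes = []
for i in range(64):
    numbers.append(i)
    sizes.append(sys.getsizeof(numbers))

# The size changes only at a few points: most appends reuse spare capacity.
growth_points = [i for i in range(1, len(sizes)) if sizes[i] != sizes[i - 1]]
print("appends that triggered a reallocation:", growth_points)

# array stores raw machine values of one type instead of object references.
compact = array("i", range(1000))
as_list = list(range(1000))
print("array bytes:", sys.getsizeof(compact))
print("list bytes (references only):", sys.getsizeof(as_list))
```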
    import ctypes


    class Array:
        def __init__(self, size):
            assert size > 0, 'array size must be > 0'
            self._size = size
            # ctypes.py_object * size builds a fixed-length array type
            PyArrayType = ctypes.py_object * size
            self._elements = PyArrayType()
            self.clear(None)

        def __len__(self):
            return self._size

        def __getitem__(self, index):
            assert index >= 0 and index < len(self), 'out of range'
            return self._elements[index]

        def __setitem__(self, index, value):
            assert index >= 0 and index < len(self), 'out of range'
            self._elements[index] = value

        def clear(self, value):
            """ Set each element to value """
            for i in range(len(self)):
                self._elements[i] = value

        def __iter__(self):
            return _ArrayIterator(self._elements)


    class _ArrayIterator:
        def __init__(self, items):
            self._items = items
            self._idx = 0

        def __iter__(self):
            return self

        def __next__(self):
            if self._idx < len(self._items):
                val = self._items[self._idx]
                self._idx += 1
                return val
            else:
                raise StopIteration
2.1 Two-Dimensional Arrays
    class Array2D:
        """
        Methods to implement:
            Array2D(nrows, ncols): constructor
            numRows()
            NumCols()
            clear(value)
            getitem(i, j)
            setitem(i, j, val)
        """

        def __init__(self, numrows, numcols):
            self._the_rows = Array(numrows)  # an array of arrays
            for i in range(numrows):
                self._the_rows[i] = Array(numcols)

        @property
        def numRows(self):
            return len(self._the_rows)

        @property
        def NumCols(self):
            return len(self._the_rows[0])

        def clear(self, value):
            for row in self._the_rows:
                row.clear(value)

        def __getitem__(self, ndx_tuple):  # ndx_tuple: (row, col)
            assert len(ndx_tuple) == 2
            row, col = ndx_tuple[0], ndx_tuple[1]
            assert (row >= 0 and row < self.numRows and
                    col >= 0 and col < self.NumCols)
            the_1d_array = self._the_rows[row]
            return the_1d_array[col]

        def __setitem__(self, ndx_tuple, value):
            assert len(ndx_tuple) == 2
            row, col = ndx_tuple[0], ndx_tuple[1]
            assert (row >= 0 and row < self.numRows and
                    col >= 0 and col < self.NumCols)
            the_1d_array = self._the_rows[row]
            the_1d_array[col] = value
2.2 The Matrix ADT: m rows, n columns. In practice it is best to use pandas for matrices; implementing one yourself is painful.
    class Matrix:
        """
        Better to use pandas' DataFrame in practice.

        Matrix(rows, ncols): constructor
        numCols()
        getitem(row, col)
        setitem(row, col, val)
        scaleBy(scalar): multiply every element by scalar
        transpose(): return the transpose
        add(rhsMatrix): size must be the same
        subtract(rhsMatrix)
        multiply(rhsMatrix)
        """

        def __init__(self, numRows, numCols):
            self._theGrid = Array2D(numRows, numCols)
            self._theGrid.clear(0)

        @property
        def numRows(self):
            return self._theGrid.numRows

        @property
        def numCols(self):
            # Array2D exposes its column count under the name NumCols.
            return self._theGrid.NumCols

        def __getitem__(self, ndxTuple):
            return self._theGrid[ndxTuple[0], ndxTuple[1]]

        def __setitem__(self, ndxTuple, scalar):
            self._theGrid[ndxTuple[0], ndxTuple[1]] = scalar

        def scaleBy(self, scalar):
            for r in range(self.numRows):
                for c in range(self.numCols):
                    self[r, c] *= scalar

        def __add__(self, rhsMatrix):
            assert (rhsMatrix.numRows == self.numRows and
                    rhsMatrix.numCols == self.numCols)
            newMatrix = Matrix(self.numRows, self.numCols)
            for r in range(self.numRows):
                for c in range(self.numCols):
                    newMatrix[r, c] = self[r, c] + rhsMatrix[r, c]
            return newMatrix
Chapter 3: Sets and Maps
Besides list, the most commonly used built-in types are probably set and dict.
3.1 The Set ADT
A set is a container that stores a collection of unique values from a given comparable domain, in no particular order.
    class Set:
        """
        Set ADT implemented with a list:
            Set()
            length()
            contains(element)
            add(element)
            remove(element)
            equals(element)
            isSubsetOf(setB)
            union(setB)
            intersect(setB)
            difference(setB)
            iterator()
        """

        def __init__(self):
            self._theElements = list()

        def __len__(self):
            return len(self._theElements)

        def __contains__(self, element):
            return element in self._theElements

        def __iter__(self):
            # Needed by the "for element in self" loops below.
            return iter(self._theElements)

        def add(self, element):
            if element not in self:
                self._theElements.append(element)

        def remove(self, element):
            assert element in self, 'The element must be in the set'
            self._theElements.remove(element)

        def __eq__(self, setB):
            if len(self) != len(setB):
                return False
            else:
                return self.isSubsetOf(setB)

        def isSubsetOf(self, setB):
            for element in self:
                if element not in setB:
                    return False
            return True

        def union(self, setB):
            newSet = Set()
            newSet._theElements.extend(self._theElements)
            for element in setB:
                if element not in self:
                    newSet._theElements.append(element)
            return newSet
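For comparison, the built-in set already covers the operations listed in the Set ADT docstring. The short sketch below maps each ADT operation to its built-in counterpart; the variable names are illustrative only.

```python
a = {1, 2, 3}
b = {3, 4}

size = len(a)              # length()
has_two = 2 in a           # contains(element)
same = a == {3, 2, 1}      # equals: order does not matter
subset = {1, 2} <= a       # isSubsetOf(setB)
union = a | b              # union(setB)
common = a & b             # intersect(setB)
only_a = a - b             # difference(setB)

print(size, has_two, same, subset)
print(union, common, only_a)
```

Unlike the list-based version above, whose membership test scans the whole list, the built-in set is hash-based, so membership tests take constant time on average.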
3.2 Maps or Dict: key-value pairs. Python's built-in dict is implemented with a hash table.
    class Map:
        """
        Map ADT, list implementation:
            Map()
            length()
            contains(key)
            add(key, value)
            remove(key)
            valueOf(key)
            iterator()
        """

        def __init__(self):
            self._entryList = list()

        def __len__(self):
            return len(self._entryList)

        def __contains__(self, key):
            ndx = self._findPosition(key)
            return ndx is not None

        def add(self, key, value):
            """ Return True if a new key was added, False if a value was replaced """
            ndx = self._findPosition(key)
            if ndx is not None:
                self._entryList[ndx].value = value
                return False
            else:
                entry = _MapEntry(key, value)
                self._entryList.append(entry)
                return True

        def valueOf(self, key):
            ndx = self._findPosition(key)
            assert ndx is not None, 'Invalid map key'
            return self._entryList[ndx].value

        def remove(self, key):
            ndx = self._findPosition(key)
            assert ndx is not None, 'Invalid map key'
            self._entryList.pop(ndx)

        def __iter__(self):
            return _MapIterator(self._entryList)

        def _findPosition(self, key):
            # Linear search, O(n)
            for i in range(len(self)):
                if self._entryList[i].key == key:
                    return i
            return None


    class _MapEntry:  # or use collections.namedtuple('_MapEntry', 'key,value')
        def __init__(self, key, value):
            self.key = key
            self.value = value


    class _MapIterator:
        # Assumption: yields the keys, as iterating a dict does.
        def __init__(self, entries):
            self._entries = entries
            self._idx = 0

        def __iter__(self):
            return self

        def __next__(self):
            if self._idx < len(self._entries):
                key = self._entries[self._idx].key
                self._idx += 1
                return key
            else:
                raise StopIteration
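The same Map ADT operations map directly onto the built-in dict. A brief sketch with made-up keys and values:

```python
ages = {}
ages["alice"] = 30            # add(key, value): new key
ages["alice"] = 31            # add(key, value): existing key, value replaced
ages["bob"] = 25

count = len(ages)             # length()
has_alice = "alice" in ages   # contains(key)
alice_age = ages["alice"]     # valueOf(key)
del ages["bob"]               # remove(key)
keys = list(ages)             # iterating a dict yields its keys

print(count, has_alice, alice_age, keys)
```

The list-based Map above needs a linear search for every lookup, while dict finds a key by hashing it, which is why keys must be hashable.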
3.3 The MultiArray ADT: a multi-dimensional array is generally simulated with a one-dimensional array, and an element is located by computing its offset from the subscripts.
    class MultiArray:
        """
        Row-major or column-major ordering; this one is row-major.

        MultiArray(d1, d2, ...dn)
        dims(): the number of dimensions
        length(dim): the length of the given array dimension
        clear(value)
        getitem(i1, i2, ... in), index(i1,i2,i3) = i1*(d2*d3) + i2*d3 + i3
        setitem(i1, i2, ... in)

        Index computation:
        index(i1,i2,...in) = i1*f1 + i2*f2 + ... + i(n-1)*f(n-1) + in*1
        """

        def __init__(self, *dimensions):
            # Implementation of the MultiArray ADT using a 1-D array
            # (conceptually an array of arrays of arrays ...).
            # Array is the class defined in Chapter 2.
            assert len(dimensions) > 1, 'The array must have 2 or more dimensions'
            self._dims = dimensions
            # Compute the total number of elements in the array
            size = 1
            for d in dimensions:
                assert d > 0, 'Dimensions must be > 0'
                size *= d
            # Create the 1-D array to store the elements
            self._elements = Array(size)
            # Create a 1-D array to store the equation factors
            self._factors = Array(len(dimensions))
            self._computeFactors()

        @property
        def numDims(self):
            return len(self._dims)

        def length(self, dim):
            # dim is 1-based: 1 .. numDims
            assert dim >= 1 and dim <= len(self._dims), 'Dimension component out of range'
            return self._dims[dim - 1]

        def clear(self, value):
            self._elements.clear(value)

        def __getitem__(self, ndxTuple):
            assert len(ndxTuple) == self.numDims, 'Invalid # of array subscripts'
            index = self._computeIndex(ndxTuple)
            assert index is not None, 'Array subscript out of range'
            return self._elements[index]

        def __setitem__(self, ndxTuple, value):
            assert len(ndxTuple) == self.numDims, 'Invalid # of array subscripts'
            index = self._computeIndex(ndxTuple)
            assert index is not None, 'Array subscript out of range'
            self._elements[index] = value

        def _computeIndex(self, ndxTuple):
            # using the equation: i1*f1 + i2*f2 + ... + in*fn
            offset = 0
            for j in range(len(ndxTuple)):
                if ndxTuple[j] < 0 or ndxTuple[j] >= self._dims[j]:
                    return None
                else:
                    offset += ndxTuple[j] * self._factors[j]
            return offset

        def _computeFactors(self):
            # Row-major factors from the docstring formula:
            # fn = 1 and fj = d(j+1) * f(j+1)
            n = len(self._dims)
            self._factors[n - 1] = 1
            for j in range(n - 2, -1, -1):
                self._factors[j] = self._factors[j + 1] * self._dims[j + 1]
Chapter 4: Algorithm Analysis
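The row-major index formula in the docstring can be checked with a small worked example. The sketch below is standalone and uses plain lists rather than the Array class; row_major_factors and row_major_index are hypothetical helper names introduced only for illustration.

```python
def row_major_factors(dims):
    """Return f1..fn where fn = 1 and fj = d(j+1) * f(j+1)."""
    factors = [1] * len(dims)
    for j in range(len(dims) - 2, -1, -1):
        factors[j] = factors[j + 1] * dims[j + 1]
    return factors


def row_major_index(dims, indices):
    """Map an n-dimensional subscript to an offset in the 1-D storage."""
    if len(indices) != len(dims):
        raise ValueError("wrong number of subscripts")
    offset = 0
    for i, d, f in zip(indices, dims, row_major_factors(dims)):
        if not 0 <= i < d:
            raise IndexError("subscript out of range")
        offset += i * f
    return offset


dims = (2, 3, 4)
print(row_major_factors(dims))           # [12, 4, 1]
print(row_major_index(dims, (1, 2, 3)))  # 1*12 + 2*4 + 3*1 = 23
```

For dimensions (2, 3, 4) the array holds 24 elements, and the last subscript (1, 2, 3) lands on offset 23, the final slot of the one-dimensional storage.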
Big O notation is generally used to describe how an algorithm's running time grows with the input size: 1 < log(n) < n < nlog(n) < n^2 < n^3 < a^n (for a > 1). Knowing the average time complexity of common data structure operations helps you choose more efficient data structures. Sometimes time has to be weighed against space, and some operations can degrade. The append operation of list is one example: if the list has no spare space, new space is allocated and that one append degrades to O(n). Cases like this call for amortized analysis.
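The amortized argument for append can be illustrated by counting element copies in a simulated dynamic array. This sketch assumes the capacity doubles whenever the array is full, which is a simplification: CPython's list grows by a smaller factor, but the reasoning is the same. count_copies is a hypothetical helper, not a library function.

```python
def count_copies(n_appends):
    """Simulate a doubling dynamic array and count element copies."""
    capacity = 1
    size = 0
    copies = 0
    for _ in range(n_appends):
        if size == capacity:
            copies += size   # a resize moves every existing element
            capacity *= 2
        size += 1
    return copies


for n in (10, 1000, 100000):
    total = count_copies(n)
    # Copies per append stay below 2 no matter how large n gets.
    print(n, total, total / n)
```

A single append that triggers a resize costs O(n), but the total number of copies over n appends stays below 2n, so the cost per append averages out to O(1).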