Data structures summary

  • Data structure categories:
  1. Logical structure: set structure, linear structure, tree structure, graph structure
  2. Physical structure: sequential storage structure, linked storage structure
  • Algorithm:
  1. An algorithm describes the steps for solving a specific problem. In a computer it is expressed as a finite sequence of instructions, and each instruction represents one or more operations.
  2. An algorithm has five basic properties: input, output, finiteness, definiteness, and feasibility.
  3. The complexity of a loop equals the complexity of the loop body multiplied by the number of times the loop runs.
  • Deriving the big-O order:
  1. Replace all additive constants in the running time with the constant 1.
  2. In the revised running-time function, keep only the highest-order term.
  3. If the highest-order term exists and is not 1, remove the constant that multiplies it.
  • Common time complexities, ordered from smallest to largest: O(1) < O(log n) < O(n) < O(n log n) < O(n^2) < O(n^3) < O(2^n) < O(n!) < O(n^n)

Time complexity: the running time an algorithm requires.

Space complexity: the storage space an algorithm requires.

Linear list (array)

  • Linear list (List): a finite sequence of zero or more data elements.
  1. The sequential storage structure of a linear list stores the data elements in a run of contiguous memory locations. Describing a sequential storage structure requires three attributes:
  2. The starting position of the storage space: the array data, whose storage location is the location of the storage space.
  3. The maximum capacity of the linear list: the array length MaxSize.
  4. The current length of the linear list: length.
  5. With sequential storage, storing or reading the element at a given position takes O(1) time, while insertion or deletion takes O(n). It suits applications where the number of elements changes little and most operations are reads and writes.
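
The three attributes and the O(1) / O(n) costs above can be sketched as follows. This is a minimal illustration, not a reference implementation; the class name SeqList and its method names are made up for this example, and positions are 1-based to match the notes.

```python
class SeqList:
    """Fixed-capacity sequential list; positions are 1-based."""

    def __init__(self, max_size):
        self.data = [None] * max_size  # contiguous storage space
        self.max_size = max_size       # maximum capacity (MaxSize)
        self.length = 0                # current length

    def get(self, i):
        # O(1): the slot is found directly from the position
        if not 1 <= i <= self.length:
            raise IndexError("position out of range")
        return self.data[i - 1]

    def insert(self, i, value):
        # O(n): elements from position i onward shift one slot right
        if self.length == self.max_size:
            raise OverflowError("list is full")
        if not 1 <= i <= self.length + 1:
            raise IndexError("position out of range")
        for k in range(self.length, i - 1, -1):
            self.data[k] = self.data[k - 1]
        self.data[i - 1] = value
        self.length += 1

    def delete(self, i):
        # O(n): elements after position i shift one slot left
        if not 1 <= i <= self.length:
            raise IndexError("position out of range")
        value = self.data[i - 1]
        for k in range(i, self.length):
            self.data[k - 1] = self.data[k]
        self.length -= 1
        self.data[self.length] = None
        return value
```

Inserting at position 1 is the worst case, since every existing element has to move.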

Singly linked list

  1. Insertion and deletion algorithms consist of two parts: the first traverses the list to find the i-th element; the second inserts or deletes the element.
  2. For a single insertion or deletion, the linked list has little advantage over the sequential storage structure, since both cost O(n). However, when several elements are inserted at position i, say 10 of them, sequential storage has to move n-i elements for every insertion, each time O(n); the linked list only needs to find the pointer to position i once, which is O(n), and every following insertion just adjusts pointers in O(1). So the more frequent the insertions and deletions, the more obvious the efficiency advantage of the singly linked list.
  • Static linked list: a linked list described with an array is called a static linked list. Each array element consists of two fields: data, and a next field that stores the array index of the successor element.
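
A small sketch of the "locate once, then insert in O(1)" argument above. The names Node, LinkedList, and insert_many are hypothetical, chosen only for this example; the list uses a header node that holds no data.

```python
class Node:
    def __init__(self, data, next=None):
        self.data = data
        self.next = next


class LinkedList:
    def __init__(self):
        self.head = Node(None)  # header node, holds no data

    def _locate(self, i):
        """Return the node just before position i (1-based); O(n) walk."""
        prev = self.head
        for _ in range(i - 1):
            if prev.next is None:
                raise IndexError("position out of range")
            prev = prev.next
        return prev

    def insert_many(self, i, values):
        """Insert values starting at position i: one O(n) walk, then O(1) each."""
        prev = self._locate(i)
        for v in values:
            prev.next = Node(v, prev.next)
            prev = prev.next

    def delete(self, i):
        """Unlink and return the element at position i."""
        prev = self._locate(i)
        if prev.next is None:
            raise IndexError("position out of range")
        node = prev.next
        prev.next = node.next
        return node.data

    def to_list(self):
        out, cur = [], self.head.next
        while cur is not None:
            out.append(cur.data)
            cur = cur.next
        return out
```

With a sequential list the same batch insertion would shift the tail of the array once per inserted value.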

 

Stacks and queues

 

  • A stack is a linear list in which insertions and deletions are allowed only at the tail.
  1. The end where insertion and deletion are allowed is called the top of the stack, the other end is called the bottom, and a stack with no data elements is called an empty stack. (Last in, first out)

  • Recursion: in a high-level language, a function calling itself is essentially no different from calling another function. A function that calls itself directly, or indirectly through a series of call statements, is called a recursive function.
  • How recursion is implemented with a stack: in the forward phase, for each level of recursion the function's local variables, parameters, and return address are pushed onto the stack. In the return phase, the local variables, parameters, and return address at the top of the stack are popped and used to execute the rest of the code in the calling level, that is, to restore the state of the caller.
  • Evaluating the four arithmetic operations (addition, subtraction, multiplication, division) also uses a stack: a left parenthesis is pushed onto the stack, and a right parenthesis pops back to the matching left parenthesis, so that the numbers in between are evaluated.
  • Postfix notation (RPN): scan the numbers and symbols of the expression from left to right. Push each number onto the stack; when an operator is encountered, pop the top two numbers, apply the operator, and push the result back, continuing until the final result is obtained.
  • Conclusion: for a computer to handle our usual standard (infix) expressions, the two most important steps are:
  • Convert the infix expression to postfix notation (the stack is used to push and pop operators).
  • Evaluate the postfix expression (the stack is used to push and pop numbers).
  • Difference between iteration and recursion: iteration uses a loop structure, recursion uses a selection structure. Recursion can make a program's structure clearer, more concise, and easier to understand, but a large number of recursive calls creates copies of the function's state and costs considerable time and memory. Iteration needs neither the repeated function calls nor the extra memory.
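
The two steps for infix expressions (convert to postfix, then evaluate) can be sketched like this. It is a simplified illustration that assumes the input is already split into tokens, contains only + - * / and parentheses, and is well formed; the function names to_postfix and eval_postfix are made up for this example.

```python
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}


def to_postfix(tokens):
    """Infix tokens -> postfix tokens, using a stack of operators."""
    output, ops = [], []
    for tok in tokens:
        if tok in PRECEDENCE:
            # pop operators of higher or equal precedence first
            while ops and ops[-1] != "(" and PRECEDENCE[ops[-1]] >= PRECEDENCE[tok]:
                output.append(ops.pop())
            ops.append(tok)
        elif tok == "(":
            ops.append(tok)
        elif tok == ")":
            # pop back to the matching left parenthesis
            while ops[-1] != "(":
                output.append(ops.pop())
            ops.pop()
        else:
            output.append(tok)  # a number goes straight to the output
    while ops:
        output.append(ops.pop())
    return output


def eval_postfix(tokens):
    """Evaluate postfix tokens, using a stack of numbers."""
    stack = []
    for tok in tokens:
        if tok in PRECEDENCE:
            right = stack.pop()
            left = stack.pop()
            if tok == "+":
                stack.append(left + right)
            elif tok == "-":
                stack.append(left - right)
            elif tok == "*":
                stack.append(left * right)
            else:
                stack.append(left / right)
        else:
            stack.append(float(tok))
    return stack.pop()


infix = "9 + ( 3 - 1 ) * 3 + 10 / 2".split()
postfix = to_postfix(infix)   # 9 3 1 - 3 * + 10 2 / +
result = eval_postfix(postfix)  # 20.0
```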

Queue

  • A queue is a linear list that allows insertion only at one end and deletion only at the other end.
  • Circular queue versus linked queue: in time, the basic operations of both are constant, O(1). However, a circular queue allocates its space in advance and does not release it during use, while a linked queue pays some overhead allocating and releasing a node each time; if enqueue and dequeue operations are frequent, the circular queue is slightly better. In space, a circular queue must have a fixed length, so it has the problems of a limited number of elements and wasted storage; a linked queue does not have this problem, and although each node needs a pointer field, that space overhead is acceptable. So in terms of space the linked queue is more flexible.
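
A minimal circular queue sketch, to show the pre-allocated fixed space and the O(1) operations. One common convention, assumed here, is to keep one slot unused so that a full queue can be told apart from an empty one; the class and method names are hypothetical.

```python
class CircularQueue:
    def __init__(self, capacity):
        # one extra slot stays empty to distinguish "full" from "empty"
        self.data = [None] * (capacity + 1)
        self.front = 0  # index of the first element
        self.rear = 0   # index one past the last element

    def __len__(self):
        return (self.rear - self.front) % len(self.data)

    def enqueue(self, value):
        if (self.rear + 1) % len(self.data) == self.front:
            raise OverflowError("queue is full")
        self.data[self.rear] = value
        self.rear = (self.rear + 1) % len(self.data)

    def dequeue(self):
        if self.front == self.rear:
            raise IndexError("queue is empty")
        value = self.data[self.front]
        self.front = (self.front + 1) % len(self.data)
        return value
```

The modulo arithmetic is what lets rear wrap around to the slots freed at the front of the array.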

 

String

  • String: a finite sequence of zero or more characters, also called a character string.
  • Linear list operations focus on single elements, such as finding, inserting, or deleting one element, whereas string operations are mostly about substrings: finding the position of a substring, getting the substring at a specified position, or replacing a substring.

 

  • String matching: the KMP pattern matching algorithm

 https://www.cnblogs.com/zhangtianq/p/5839909.html
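
For reference, a compact sketch of KMP matching. It uses the 0-based "failure table" formulation, which may differ in indexing from the next array in the linked article; build_failure and kmp_search are names chosen for this example.

```python
def build_failure(pattern):
    """fail[i] = length of the longest proper prefix of pattern[:i+1]
    that is also a suffix of it."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]  # fall back to a shorter matching prefix
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_search(text, pattern):
    """Return the index of the first match of pattern in text, or -1."""
    if not pattern:
        return 0
    fail = build_failure(pattern)
    k = 0  # number of pattern characters matched so far
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = fail[k - 1]  # the text index i never moves backwards
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            return i - k + 1
    return -1
```

The key property is that the position in the text never moves backwards; on a mismatch only the pattern position is adjusted.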

Tree: a structure represented using multiple singly linked lists

  • A tree is a finite set of n (n >= 0) nodes. When n = 0 it is called an empty tree. In any non-empty tree there is exactly one specific node called the root; when n > 1, the remaining nodes can be divided into m (m > 0) disjoint finite sets T1, T2, ..., Tm, each of which is itself a tree and is called a subtree of the root.

 

Binary tree:

  • Five basic forms: the empty binary tree; a root node only; a root with only a left subtree; a root with only a right subtree; a root with both left and right subtrees. A linear list can be understood as a special form of tree, the skewed tree, in which every node has only a left child or every node has only a right child.
  • Complete binary tree: sequential (array) storage is generally used only for complete binary trees, because in other trees the empty nodes waste array space.
  • Binary linked list: the linked storage structure of a binary tree, in which each node has one data field and two pointer fields.
  • DLR, preorder traversal (root first, then left to right: the root of a tree always comes before its left subtree, and the left subtree always comes before the right subtree)
  • LDR, inorder traversal (root in the middle, left to right: the left subtree always comes before the root, and the root always comes before the right subtree)
  • LRD, postorder traversal (root last, left to right: the left subtree always comes before the right subtree, and the right subtree always comes before the root)
  • Given the preorder and inorder sequences, or the inorder and postorder sequences, a binary tree can be determined uniquely; given only the preorder and postorder sequences, it cannot.
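
The three traversals, and rebuilding a tree from its preorder and inorder sequences, can be sketched as below. The names TreeNode and build are hypothetical, and build assumes all node values are distinct.

```python
class TreeNode:
    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right


def preorder(node):   # DLR: root, left, right
    if node is None:
        return []
    return [node.data] + preorder(node.left) + preorder(node.right)


def inorder(node):    # LDR: left, root, right
    if node is None:
        return []
    return inorder(node.left) + [node.data] + inorder(node.right)


def postorder(node):  # LRD: left, right, root
    if node is None:
        return []
    return postorder(node.left) + postorder(node.right) + [node.data]


def build(pre, ino):
    """Rebuild a tree from preorder and inorder lists (distinct values)."""
    if not pre:
        return None
    root = pre[0]
    k = ino.index(root)  # everything left of the root in inorder is the left subtree
    return TreeNode(root,
                    build(pre[1:k + 1], ino[:k]),
                    build(pre[k + 1:], ino[k + 1:]))


tree = TreeNode("A",
                TreeNode("B", TreeNode("D"), TreeNode("E")),
                TreeNode("C", None, TreeNode("F")))
```

For this sample tree the preorder is A B D E C F, the inorder is D B E A C F, and the postorder is D E B F C A.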

  • Threaded binary tree: similar to a doubly linked list. The empty left child pointer of a node points to its predecessor, and the empty right child pointer points to its successor. Two additional boolean flags indicate whether each pointer refers to a child or to a thread.
    If you frequently need to traverse a binary tree, or need the predecessor and successor of a node in some traversal order, the threaded binary linked list storage structure is a good choice.

    Huffman tree: the most basic compression coding method is Huffman coding. In general, let the character set to be encoded be {d1, d2, ..., dn} and the number of occurrences or frequency of each character in the message be {w1, w2, ..., wn}. Construct a Huffman tree with d1, d2, ..., dn as the leaf nodes and w1, w2, ..., wn as the corresponding leaf weights. By convention the left branch of the Huffman tree represents 0 and the right branch represents 1; the sequence of 0s and 1s on the path from the root to a leaf is the code for that leaf's character. This is Huffman coding.
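
A short sketch of the construction just described, using Python's heapq as the priority queue. The function name huffman_codes and the sample weights are assumptions for illustration; characters are assumed to be strings, and ties between equal weights are broken by insertion order, so other valid code tables exist.

```python
import heapq
from itertools import count


def huffman_codes(weights):
    """weights: dict of character -> weight. Returns dict of character -> bit string."""
    if not weights:
        return {}
    tiebreak = count()  # keeps heap entries comparable when weights are equal
    heap = [(w, next(tiebreak), ch) for ch, w in weights.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {heap[0][2]: "0"}
    while len(heap) > 1:
        # repeatedly merge the two lightest subtrees
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next(tiebreak), (left, right)))

    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):      # internal node: (left, right)
            walk(node[0], prefix + "0")  # left branch is 0
            walk(node[1], prefix + "1")  # right branch is 1
        else:
            codes[node] = prefix         # leaf: a character

    walk(heap[0][2], "")
    return codes


weights = {"A": 27, "B": 8, "C": 15, "D": 15, "E": 30, "F": 5}
codes = huffman_codes(weights)
```

Characters with larger weights end up closer to the root and therefore get shorter codes.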

Graph

Search (Searching)

  • Searching: given a value, determine the data element (or record) in a lookup table whose key equals that value.
  • Logically, the data structure a search is based on is a set, and the records in the set have no essential relationship to one another. For storage, the set can be organized as a table, a tree, or a similar structure.
  • A static lookup table can be organized as a linear list and searched with the sequential search algorithm; if it is sorted by primary key, binary search and similar techniques can be applied for efficient lookup.
  • For dynamic lookup, consider binary sort tree techniques. A hash table structure can also be used.
  • Sequential search: starting from the first or the last element, traverse and compare each element in turn. A for loop has to check on every iteration whether the index is out of range; instead, a sentinel holding the key can be placed at one end, so that a while loop only needs to check whether the values are equal.
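
The sentinel idea can be sketched as follows. The layout is an assumption for this example: a[1..n] holds the records, a[0] is reserved for the sentinel, and a return value of 0 means "not found".

```python
def sequential_search(a, key):
    """Search a[1..n] for key, scanning from the end; a[0] is the sentinel slot."""
    a[0] = key          # sentinel guarantees the loop stops
    i = len(a) - 1
    while a[i] != key:  # no index-range check needed
        i -= 1
    return i            # 0 means the key was only found at the sentinel


table = [None, 62, 88, 58, 47]
```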

  • Binary search: the precondition is that the records in the linear list are ordered by key and that the list uses sequential storage. Idea: in the ordered list, take the middle record as the comparison target. If its key equals the given value, the search succeeds; if the given value is less than the middle record's key, continue searching in the left half; if it is greater, continue in the right half.
  1. Disadvantage: for tables with frequent insertions and deletions, maintaining sorted order costs a considerable amount of work, so it is not recommended.
  • Interpolation search: a search method that compares the search key with the keys of the largest and smallest records in the lookup table. The probe position is not the midpoint but is computed by interpolation.
  1. Disadvantage: not suitable for unevenly distributed data.
  • Fibonacci search: splits the range according to the golden ratio.
  • Binary search uses addition and division (mid = (low + high) / 2); interpolation search uses more complex four-operation arithmetic (mid = low + (high - low) * (key - a[low]) / (a[high] - a[low])); Fibonacci search uses only addition and subtraction (mid = low + F[k-1] - 1). When searching huge amounts of data, such subtle differences may affect the final search efficiency.
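
The two mid formulas above can be compared side by side in a sketch like this. It assumes a sorted list of integers with 0-based indexing and returns -1 when the key is absent; the function names are chosen for this example.

```python
def binary_search(a, key):
    low, high = 0, len(a) - 1
    while low <= high:
        mid = (low + high) // 2  # always probe the midpoint
        if key < a[mid]:
            high = mid - 1
        elif key > a[mid]:
            low = mid + 1
        else:
            return mid
    return -1


def interpolation_search(a, key):
    low, high = 0, len(a) - 1
    while low <= high and a[low] <= key <= a[high]:
        if a[high] == a[low]:
            return low  # all remaining keys are equal, and equal to key
        # probe where the key "should" be if keys are evenly spread
        mid = low + (high - low) * (key - a[low]) // (a[high] - a[low])
        if key < a[mid]:
            high = mid - 1
        elif key > a[mid]:
            low = mid + 1
        else:
            return mid
    return -1


data = [1, 16, 24, 35, 47, 59, 62, 73, 88, 99]
```

On this sample, interpolation search finds 16 on its first probe, while binary search needs several halvings to reach it.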

If the data set being searched is an ordered linear list with sequential storage, binary, interpolation, or Fibonacci search can be used; but because the order has to be maintained, insertion and deletion take a lot of time.

  • Binary sort tree: keeps the keys in sorted order; searching and inserting are easy, deleting is harder.
  • Balanced binary tree: a fairly ideal dynamic lookup structure, with O(log N) time for search, insertion, and deletion. It suits collections that have no inherent order and need both frequent lookups and frequent insertions and deletions.
  • B-tree: designed for exchanging data between memory and disk.
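
A minimal binary sort tree sketch covering the two easy operations, search and insert. Deletion and rebalancing are deliberately left out, and the names BSTNode, bst_insert, bst_search, and bst_inorder are hypothetical.

```python
class BSTNode:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def bst_insert(root, key):
    """Insert key and return the (possibly new) root; duplicates are ignored."""
    if root is None:
        return BSTNode(key)
    node = root
    while True:
        if key < node.key:
            if node.left is None:
                node.left = BSTNode(key)
                break
            node = node.left
        elif key > node.key:
            if node.right is None:
                node.right = BSTNode(key)
                break
            node = node.right
        else:
            break  # key already present
    return root


def bst_search(root, key):
    node = root
    while node is not None and node.key != key:
        node = node.left if key < node.key else node.right
    return node is not None


def bst_inorder(root):
    """Inorder traversal of a binary sort tree yields the keys in sorted order."""
    if root is None:
        return []
    return bst_inorder(root.left) + [root.key] + bst_inorder(root.right)


root = None
for k in [62, 88, 58, 47, 35, 73, 51, 99, 37, 93]:
    root = bst_insert(root, k)
```

Without balancing, inserting already sorted keys degrades this tree into a skewed tree with O(n) lookups, which is the case a balanced binary tree avoids.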

Hash table

  • In sequential table lookup, a[i] is compared with the key using "=="; when they are equal the search succeeds and i is returned. In an ordered table, a[i] and the key are compared with "<" or ">" to halve the range until the index i is found. Either way the final goal is to find i, which is a relative subscript; the address is then computed with the sequential storage formula Loc(ai) = Loc(a1) + (i-1) * c, that is, the storage location of the first element plus i-1 cells of size c, giving the final memory address.
  • Hash table definition: hashing establishes a definite correspondence f between a record's storage location and its key, so that each key corresponds to a storage location f(key). The correspondence f is called the hash function. Records stored with this technique occupy a contiguous block of storage, and this block is called a hash table.
  • Hashing is both a storage method and a search method.
  • Hashing is best suited to finding the record whose key equals a given value.
  • Principles for constructing a hash function: simple to compute; hash addresses uniformly distributed. Factors to consider when choosing one:
  1. The time required to compute the hash address
  2. The length of the key
  3. The size of the hash table
  4. The distribution of the keys
  5. The frequency of record lookups
  • Direct addressing: f(key) = a * key + b (a, b are constants)
  • Digit analysis: for example, take the last four digits of a phone number as the hash address, because the first three digits correspond to the carrier and the middle four to the home region, so they are very likely to repeat.
  • Mid-square method: square the key and take the middle few digits as the address. Suitable when the key distribution is unknown and the key has few digits.
  • Folding method: split the digits of the key from left to right into parts of equal length, sum the parts, and take the last few digits of the sum as the hash address. Suitable for keys with many digits.
  • Random number method: use a random function of the key as the hash address, f(key) = random(key). Suitable for keys of unequal length.
  • Division-remainder method: the most commonly used construction. The formula is f(key) = key mod p (p <= m), where m is the hash table length; p is usually a prime less than or equal to the table length, or a composite number with no prime factors below 20.
  • Handling hash collisions:
  1. Open addressing: when a collision occurs, look for the next empty hash address; as long as the hash table is large enough, an empty address can always be found to store the record. Formula: fi(key) = (f(key) + di) MOD m (di = 1, 2, 3, ..., m-1). This is also called linear probing.
  2. Quadratic probing and random probing: instead of increasing di linearly by 1, use squares or a random function. Quadratic probing: fi(key) = (f(key) + di) MOD m (di = 1^2, -1^2, 2^2, -2^2, ..., q^2, -q^2, q <= m/2). Random probing: fi(key) = (f(key) + di) MOD m (di is a random number sequence).
  3. Rehashing: fi(key) = RHi(key) (i = 1, 2, ..., k), where the RHi are different hash functions, such as the division-remainder, folding, and mid-square methods described above; whenever a hash address collides, switch to another hash function. Disadvantage: computation time increases accordingly.
  4. Chaining (chain address method): all records whose keys are synonyms are stored in one singly linked list, and a collision simply adds a node. If there are many collisions, lookups lose performance traversing the list.
  5. Common overflow area: storage is divided into a base table and an overflow table; if a record is not found in the base table, search the overflow table. When collisions are few, performance is still very high.
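
As an illustration of the division-remainder method combined with linear probing, here is a small sketch. It assumes non-negative integer keys, p equal to the table length m, and no deletion support; the class name LinearProbingTable is made up for this example.

```python
class LinearProbingTable:
    def __init__(self, m=13):
        self.m = m                 # table length (a prime here)
        self.slots = [None] * m

    def _hash(self, key):
        return key % self.m        # division-remainder: f(key) = key mod p, p = m

    def insert(self, key):
        """Store key and return the slot index it ended up in."""
        addr = self._hash(key)
        for d in range(self.m):    # fi(key) = (f(key) + di) MOD m
            idx = (addr + d) % self.m
            if self.slots[idx] is None or self.slots[idx] == key:
                self.slots[idx] = key
                return idx
        raise OverflowError("hash table is full")

    def search(self, key):
        """Return the slot index of key, or -1 if it is not in the table."""
        addr = self._hash(key)
        for d in range(self.m):
            idx = (addr + d) % self.m
            if self.slots[idx] is None:
                return -1          # an empty slot ends the probe sequence
            if self.slots[idx] == key:
                return idx
        return -1


table = LinearProbingTable(13)
placed = [table.insert(k) for k in [12, 25, 38, 15]]
```

Keys 12, 25, and 38 all hash to address 12, so the second and third wrap around to slots 0 and 1; key 15, which hashes to 2 with no collision of its own, lands right next to that cluster, which is the accumulation effect that chaining avoids.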

 

  • Hash table lookup performance: without collisions, hashing is the most efficient of all the search methods, because its time complexity is O(1). In practice collisions do occur, so the average search length depends on the following factors:
  1. Whether the hash function is uniform
  2. The collision handling method. Chaining, for example, does not cause accumulation, so it has better average lookup performance.
  3. The load factor of the hash table: a = number of records in the table / hash table length. The larger it is, the more likely collisions are. The hash table is usually given more space than the record set needs; some space is wasted, but lookup efficiency improves greatly in return.

Sorting

  • Internal and external sorting: in internal sorting, all records to be sorted stay in memory throughout the sorting process. External sorting is used when there are too many records to hold in memory at once, so data has to be exchanged between memory and external storage several times.
  1. Internal sorts: insertion sort, exchange sort, selection sort, merge sort.
  2. Simple algorithms: bubble sort, simple selection sort, straight insertion sort
  3. Improved algorithms: Shell sort, heap sort, merge sort, quick sort
  • Bubble sort (Bubble Sort): compare the keys of adjacent records pairwise and swap them if they are in reverse order, until no reversed pair remains.
  • Simple selection sort (Simple Selection Sort): using n-i key comparisons, select the record with the smallest key from n-i+1 records and swap it with the i-th record (1 <= i <= n). (Select the minimum, then swap.)
  • Straight insertion sort (Straight Insertion Sort): insert a record into an already sorted list, giving a new sorted list with one more record.
  • Shell sort (Shell Sort): records a fixed "increment" apart form a subsequence. Applying straight insertion sort within each subsequence makes the result basically ordered rather than only locally ordered.
  • Heap sort (Heap Sort): a heap is a complete binary tree. If every node's value is greater than or equal to the values of its left and right children, it is called a max-heap; if less than or equal, a min-heap.
  • Merge sort (Merging Sort): like an inverted tree. Assume the initial sequence contains n records; it can be seen as n ordered subsequences of length 1. Merge them pairwise to get [n / 2] ([x] denotes the smallest integer not less than x) ordered subsequences of length 2 or 1, then merge pairwise again, and so on. Because merging compares elements pairwise with if (SR[i] < SR[j]) and never jumps, it is a stable sort. It uses more memory (the sequence is copied before merging), but it is an efficient and stable algorithm.
  • Quick sort (Quick Sort): one pass partitions the records to be sorted into two independent parts, such that every key in one part is smaller than every key in the other; the two parts are then sorted separately in the same way until the whole sequence is ordered.
  1. Shell sort is an upgrade of straight insertion sort, both belonging to the insertion class; heap sort is an upgrade of simple selection sort, both belonging to the selection class; quick sort is an upgrade of the slowest one, bubble sort, both belonging to the exchange class.
  2. If the array is very small, quick sort is actually worse than straight insertion sort (straight insertion sort has the best performance among the simple sorts), because quick sort uses recursion.
  3. Rough speed ranking: Shell sort, quick sort >> straight insertion sort > simple selection sort > bubble sort
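
To make the "simple algorithm and its upgrade" relationship concrete, here is a sketch of straight insertion sort, Shell sort, and quick sort. The choices are assumptions for this example: Shell sort halves the increment each round (other increment sequences exist), and quick sort uses the first element as the pivot. All three sort the list in place and return it.

```python
def insertion_sort(a):
    for i in range(1, len(a)):
        cur = a[i]
        j = i - 1
        while j >= 0 and a[j] > cur:  # shift larger records right
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = cur
    return a


def shell_sort(a):
    gap = len(a) // 2
    while gap > 0:
        # insertion sort on records that are `gap` apart
        for i in range(gap, len(a)):
            cur = a[i]
            j = i - gap
            while j >= 0 and a[j] > cur:
                a[j + gap] = a[j]
                j -= gap
            a[j + gap] = cur
        gap //= 2                     # the last round (gap 1) is a plain insertion sort
    return a


def quick_sort(a, low=0, high=None):
    if high is None:
        high = len(a) - 1
    if low < high:
        pivot = a[low]
        i, j = low, high
        while i < j:
            while i < j and a[j] >= pivot:
                j -= 1
            a[i] = a[j]               # move a smaller record to the left part
            while i < j and a[i] <= pivot:
                i += 1
            a[j] = a[i]               # move a larger record to the right part
        a[i] = pivot                  # pivot lands in its final position
        quick_sort(a, low, i - 1)
        quick_sort(a, i + 1, high)
    return a
```

Shell sort differs from straight insertion sort only in the gap, which lets a record move a long distance in one step.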


Origin www.cnblogs.com/wwhhgg/p/12566268.html