Data Structure

         1. Introduction

         The study of data structures is mainly concerned with the objects that programs operate on in non-numerical programming, and with the relationships and operations among those objects.

        Terms

        Data: the general term, in computer science, for all symbols that can be input into a computer and processed by a computer program.

        Data is the symbolic representation of objective things; for example, the content stored in a struct is data.

        Data element (Data Element): the basic unit of data. It is usually considered and processed as a whole in a program.

        Data item (Data Item): the smallest unit of data that has independent meaning. Data items make up a data element.

        A data element can contain one or several data items. For example, in a database table each row is a data element and each column value in that row is a data item. Likewise, the record for one book is a data element, and each entry of its bibliographic information (book title, author name, etc.) is a data item. The array int a[10] has 10 elements; when the elements are structs, each struct as a whole is a data element.
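The distinction between data elements and data items can be sketched in C++ as follows. This is only an illustration: the type Book, its fields, and the helper countByAuthor are assumed names, not part of the original notes.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One Book is a data element; each field is a data item.
// The type and field names are illustrative.
struct Book {
    std::string title;   // data item
    std::string author;  // data item
    int year;            // data item
};

// A collection of data elements of the same nature (a data object).
using Library = std::vector<Book>;

// Count the data elements whose "author" data item matches.
int countByAuthor(const Library& lib, const std::string& author) {
    int count = 0;
    for (const Book& b : lib) {
        if (b.author == author) ++count;
    }
    return count;
}
```

The function processes each Book as a whole (a data element) but compares only one of its data items.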

        Data object (Data Object): a collection of data elements of the same nature; it is a subset of data.

For example: the integer data object N = {0, ±1, ±2, ±3, ...} and the alphabetic character data object C = {'A', 'B', 'C', ..., 'Z'}.

       Data structure (Data Structure):

                 A collection of data elements that have one or more specific relationships among them. There are usually four basic types of structure:

                         Set: the data elements in the structure have no relationship other than "belonging to the same set".

                         Linear structure: the data elements are in a one-to-one relationship. Every element except the first has exactly one predecessor, and every element except the last has exactly one successor.

                        Tree structure: the data elements are in a one-to-many relationship, that is, an element has only one predecessor but can have multiple successors.

                        Graph or network structure: the data elements are in a many-to-many relationship, and the logical relationship between elements can be arbitrary.

                 In short, a data structure is a collection of data elements together with the relationships defined on them.

     Three elements of data structure

(1) Logical structure: a description of the relationships between data elements; it is abstract.

A logical structure is written as a pair B = (K, R), where K is a set of nodes and R is a set of relations on K. For example, <k, k'> means that k is the predecessor of k' and k' is the successor of k; k and k' are adjacent nodes. The logical structure is independent of the storage structure, while the storage structure depends on the logical structure.

      (1.1) Linear structure: there is exactly one start node and one terminal node, and every node has at most one direct predecessor and one direct successor (one-to-one). Typical examples: linear list, stack, queue, array, string.

      (1.2) Non-linear structure: a node can have more than one direct predecessor or direct successor (not one-to-one). Typical examples: trees, graphs, sets.

 (2) Storage structure: the representation of a data structure inside the computer is called its storage structure; it is concrete.

                             The storage structure is the image of both the elements themselves and the logical relationships between them; in essence it is a matter of memory allocation.

     Note: the smallest unit of information stored in a computer is the bit; 8 bits make up a byte, and a fixed number of bytes (two in this text; the size depends on the machine) makes up a word. A byte, a word, or a longer run of bits can be called a bit string, and the bit string that represents a data element is called an element or a node. When a data element consists of several data items, the part of the bit string that corresponds to each data item is called a data field.

        (2.1) Sequential storage structure: logically adjacent nodes are stored in physically adjacent storage units, so the logical relationship between nodes is reflected by the adjacency of the storage units. It is usually described with an array or an array of structs.

       (2.2) Linked storage structure: logically adjacent nodes are not required to be physically adjacent; the logical relationship between nodes is represented by an additional reference (pointer) field. The reference field of a node usually holds the storage location of the next node. It is usually described with a linked list.

       (2.3) Index storage: uses an additional index table alongside the stored nodes. An index table consists of several index entries, and an index entry generally has the form (key, address), where the key is a data item that uniquely identifies a node.

      (2.4) Hash storage: the storage address of a node is computed directly from the key of the node by a hash function.

  (3) Data operations: the operations defined on the data, such as insert, delete, update, and search.
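The difference between the sequential and linked storage structures in (2.1) and (2.2) can be sketched as follows. The names Node, sumSequential, and sumLinked are assumptions made for this illustration; both functions traverse the same logical sequence, but they find the successor of an element in different ways.

```cpp
#include <cassert>
#include <cstddef>

// Linked storage: each node stores its value plus a pointer to its successor.
struct Node {
    int value;
    Node* next;
};

// Sequential storage: the successor of a[i] is a[i + 1], implied by adjacency.
int sumSequential(const int* a, std::size_t n) {
    int sum = 0;
    for (std::size_t i = 0; i < n; ++i) sum += a[i];
    return sum;
}

// Linked storage: the successor is found by following the pointer field.
int sumLinked(const Node* head) {
    int sum = 0;
    for (const Node* p = head; p != nullptr; p = p->next) sum += p->value;
    return sum;
}
```

The logical structure (a linear sequence) is the same in both cases; only the storage structure changes.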

       Data type (Data Type):

A data type is a set of values together with a set of operations defined on that value set. For example, an integer variable in C takes its values from the integers in a certain interval, and the operations defined on it are arithmetic operations such as addition, subtraction, multiplication, division, and modulo. According to the characteristics of their values, data types in high-level programming languages can be divided into two categories:

            Atomic (non-structured) types: the value of an atomic type cannot be decomposed. For example: C's basic types (integer, real, character, and enumeration types), pointer types, and the void type.

           Structured types: the value of a structured type is composed of several components according to a certain structure, so it can be decomposed, and its components can be atomic or structured. For example, the value of an array is composed of several components, and each component can be an integer, an array, and so on. In a sense, a data structure can be regarded as "a group of values with the same structure", and a structured type can be regarded as a data structure together with a set of operations defined on it.

            Abstract data type (Abstract Data Type, ADT): a mathematical model together with a set of operations defined on that model. The definition of an abstract data type depends only on its logical characteristics and is independent of its internal representation and implementation in the computer. No matter how its internal structure changes, its external use is unaffected as long as its mathematical characteristics stay the same.

A software module that contains an abstract data type should usually have three parts: definition, representation, and implementation. The definition of an abstract data type consists of a value range and a set of operations defined on that range. According to the characteristics of their values, abstract data types can be divided into three kinds:

           Atomic data type: the value of a variable of an atomic type cannot be decomposed. Abstract data types of this kind are rare, because the built-in data types are usually sufficient, but sometimes a new atomic data type has to be defined, such as an integer with 100 digits.

            Structured types

                     Fixed-aggregate data type: the value of a variable of this type is composed of a fixed number of components arranged in a certain structure. For example, a complex number is composed of two real numbers in a fixed order.

                     Variable-aggregate data type: unlike the fixed-aggregate type, the number of components that make up its value is not fixed. For example, an abstract data type "ordered integer sequence" can be defined in which the length of the sequence is variable. The fixed-aggregate and variable-aggregate types are collectively referred to as structured types.

            Polymorphic data type: a data type whose component type is not fixed. Whatever characteristics the elements have, the relationships between the elements are the same and the basic operations are the same. From the point of view of the abstract data type, such types share the same mathematical abstraction, so they are called polymorphic data types.
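The "ordered integer sequence" mentioned above can serve as a small sketch of an abstract data type. The class name OrderedIntSeq and its operations are assumptions for illustration; the point is that callers use only the operations, while the representation (here a std::vector) stays hidden and could be replaced.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Abstract data type "ordered integer sequence" (a variable-aggregate type).
// Users see only the operations; the std::vector representation is hidden.
class OrderedIntSeq {
public:
    // Insert a value while keeping the sequence in ascending order.
    void insert(int value) {
        auto pos = std::upper_bound(data_.begin(), data_.end(), value);
        data_.insert(pos, value);
    }
    // Number of components currently in the sequence.
    std::size_t length() const { return data_.size(); }
    // The i-th smallest value, 0-based; i must be less than length().
    int at(std::size_t i) const {
        assert(i < data_.size());
        return data_[i];
    }
private:
    std::vector<int> data_;  // internal representation
};
```

Because at() and length() do not expose how the values are stored, swapping the vector for a linked list would not change any calling code.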

     2. Algorithm

       1. Concept

          Algorithm + data structure = program: data structures and algorithms are the two major elements of a program; they complement each other and neither can be left out. Algorithms are the soul of programs.

          Algorithm: a method or process for solving a problem

          Program: a concrete implementation of an algorithm in a programming language

          Algorithms and programs both express the logical steps for solving a problem, but an algorithm is independent of any specific computer or programming language, while a program is tied to both. A program is an algorithm written in a particular language, but an algorithm is not necessarily a program.

        2. Characteristics of the algorithm

           Finiteness : The algorithm must end after executing a finite number of steps, and each step must be completed in finite time.

           Determinism : The meaning of each step in the algorithm must be deterministic and cannot be ambiguous.

           Inputs : An algorithm can have zero or more inputs. 

           Outputs : An algorithm has one or more outputs.

           Feasibility : An algorithm must be feasible, that is, each operation in the algorithm can be realized by a known set of basic operations.

        The representation of a data structure (its storage structure) is described with a type definition (typedef). By convention the data element type is named ElemType.

       3. Time complexity

Algorithm execution time: the execution time of an algorithm is roughly the sum of the execution times of all its statements. The execution time of one statement is the number of times it is executed multiplied by the time required for one execution.

Statement frequency: the number of times a statement is executed in an algorithm. The sum of all statement frequencies in the algorithm is written T(n), and the frequency of the basic operation is f(n). T(n) = O(f(n)) means that as the problem size n grows, the execution time of the algorithm grows at the same rate as f(n). This is called the asymptotic time complexity of the algorithm, or time complexity for short, so finding the time complexity amounts to counting statement frequencies.

In short, time complexity measures how many times the statements are executed.

Time complexity depends on the problem size and on the initial state of the data.

Keep only the fastest-growing term of T(n) and drop its coefficient (write it as 1). If f(n) does not depend on n, that is, the frequency is a constant, write O(1).

Worst-case and average time complexity: the time complexity in the worst case is called the worst-case time complexity. Unless stated otherwise, the time complexity under discussion is the worst-case time complexity. The reason is that the worst-case time complexity is an upper bound on the running time of the algorithm over all input instances, which guarantees that the running time on any input will not exceed it.
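The dependence on the initial state of the data, and the meaning of "worst case", can be seen in a sequential search that counts its own comparisons. This is a sketch; the function linearSearch and its comparisons parameter are assumed names.

```cpp
#include <cassert>
#include <cstddef>

// Sequential search that also reports how many comparisons it made.
// Returns the index of key in a[0..n-1], or -1 if key is absent.
int linearSearch(const int* a, std::size_t n, int key, std::size_t& comparisons) {
    comparisons = 0;
    for (std::size_t i = 0; i < n; ++i) {
        ++comparisons;
        if (a[i] == key) return static_cast<int>(i);
    }
    return -1;
}
```

When the key is the first element only one comparison is made; when it is the last element or absent, n comparisons are made, which is the worst case and gives O(n).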

Rules

Addition rule (statements executed one after another): T(n) = T1(n) + T2(n) = O(f(n)) + O(g(n)) = O(max(f(n), g(n)))

Multiplication rule (nested statements): T(n) = T1(n) * T2(n) = O(f(n)) * O(g(n)) = O(f(n) * g(n))

In general, the frequency of the statement in the innermost loop determines the time complexity.

Common time complexities

In increasing order of growth: O(1) < O(log2 n) < O(n) < O(n log2 n) < O(n^2) < O(n^3) < O(2^n) < O(n!) < O(n^n)

How to calculate time complexity

(1) A fixed sequence of statements with no loop, such as:
x=5;
if(x<-4)
{x=x+4;}
else
{x=x+3;}
Only one branch of the if statement runs, and the number of statements executed does not depend on n, so T(n) is a constant and the time complexity is O(1).
(2) One more statement after the branch, such as:
x=5;
if(x<-4)
{x=x+4;}
else
{x=x+3;}
x=x+56;
Again only one branch runs, followed by one more assignment. The count grows by one but is still a constant, so the time complexity is still O(1).
(3) A single for loop:
for(i=0;i<n;i++)
{x=x+1;}
i runs from 0 to n-1, so the statement x=x+1; is executed n times in total: T(n)=n, and the time complexity is O(n).
(4) Two nested for loops:
for(i=0;i<n;i++)
{ for(j=1;j<=n;j++) {x=x+j;} }

For nested loops, first fix the outer loop variable at its initial value i=0 and look at the inner loop. The inner statement is executed for j from 1 to n, n times in total; that is only the work for i=0. i itself runs from 0 to n-1, n times in total, and each of those iterations executes the inner loop n times. The statement x=x+j; is therefore executed n*n times: T(n)=n^2, and the time complexity is O(n^2).

The number of times the basic statement is executed gives the time complexity. Note:
(1) Identify the correct basic statement.
(2) Check the initial value and the termination value of the for loop.
for(i=0;i<n;i++) The value of i goes from 0 to n-1, a total of n iterations.
for(i=0;i<=n;i++) The value of i goes from 0 to n, a total of n+1 iterations.
(3) For nested for loops, count from the innermost loop outward.
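The iteration counts in the notes above can be checked by letting each loop count itself. This is a small sketch with assumed function names; each function returns how many times the innermost statement ran.

```cpp
#include <cassert>

// Number of iterations of: for (i = 0; i < n; i++)
long countExclusive(long n) {
    long count = 0;
    for (long i = 0; i < n; i++) ++count;
    return count;
}

// Number of iterations of: for (i = 0; i <= n; i++)
long countInclusive(long n) {
    long count = 0;
    for (long i = 0; i <= n; i++) ++count;
    return count;
}

// Number of executions of the innermost statement of two nested loops.
long countNested(long n) {
    long count = 0;
    for (long i = 0; i < n; i++) {
        for (long j = 1; j <= n; j++) ++count;
    }
    return count;
}
```

For n = 10 the three functions return 10, 11, and 100, matching n, n+1, and n*n.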

Note: with random access, visiting the i-th node takes O(1). Operations that do not need to move elements are O(1), and operations that need to move elements are O(n).

         4. Space complexity 

Space complexity measures the storage space an algorithm requires, in the same way that time complexity measures its running time. It is written S(n)=O(f(n)), where n is the problem size and O denotes the order of magnitude.

The storage occupied by the input data depends only on the problem itself and has nothing to do with the algorithm, so only the auxiliary space needed by the algorithm has to be analyzed. If the auxiliary space is constant relative to the size of the input data, the algorithm is said to work in place, and its auxiliary space is O(1).
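As an illustration of auxiliary space, the two functions below both reverse a sequence. The names reverseInPlace and reversedCopy are assumptions for this sketch; the first works in place with O(1) auxiliary space, the second needs O(n).

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Reverses in place: two index variables plus the temporary inside swap,
// so the auxiliary space is O(1).
void reverseInPlace(std::vector<int>& a) {
    if (a.empty()) return;
    std::size_t i = 0, j = a.size() - 1;
    while (i < j) {
        std::swap(a[i], a[j]);
        ++i;
        --j;
    }
}

// Builds a second array of the same length: O(n) auxiliary space.
std::vector<int> reversedCopy(const std::vector<int>& a) {
    std::vector<int> out(a.rbegin(), a.rend());
    return out;
}
```

Both run in O(n) time; they differ only in how much extra storage they need beyond the input.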

          3. Linear list

     1. Definition

A linear list (Linear_List), or list for short, is a finite sequence of n (n≥0) data elements of the same type, where n is the length of the list; when n=0 the linear list is an empty list. If the linear list is named L, it is generally written as:     L=(a1, a2, ..., ai, ai+1, ..., an)

The linear list is the simplest, most basic, and most commonly used linear structure (a linear structure is one in which the data elements are in a linear relationship). It can be stored sequentially or as a chain, and its main basic operations are insertion, deletion, and search. All data elements in a linear list have the same type.

Length of a linear list: the number of data elements in the list

Empty list: a linear list whose length is zero

Every element except the first has one and only one immediate predecessor.

Every element except the last has one and only one immediate successor.

    2. Features

Homogeneity: a linear list is made up of data elements of the same kind; every a_i must belong to the same data object.

Finiteness: a linear list consists of a finite number of data elements, and the length of the list is the number of data elements in it.

Order: adjacent data elements in a linear list are related as an ordered pair <a_i, a_i+1>.

  • The number of elements in the list is finite.
  • The elements have a logical order, and each element has its own position in the sequence.
  • Every entry in the list is a data element, and each one is treated as a single element.
  • All elements have the same data type, which means each element occupies the same amount of storage space.

There is a unique first element and a unique last element; every element except the first has a predecessor, every element except the last has a successor, and every element has a position number.

Note: a linear list is a logical structure that describes a one-to-one adjacency relationship between elements. Sequential list and linked list are storage structures. These are concepts at different levels, so do not confuse them.

 3. Sequential storage of the linear list

        Sequential storage of a linear list (Sequential Mapping; the result is called a sequential list for short) uses a group of storage units with contiguous addresses to store the elements of the list one after another, so that logically adjacent data elements are also stored in physically adjacent storage units.

         Because a one-dimensional array in C occupies a contiguous region of memory, the core of sequential storage is the array.

         A linear list with a sequential storage structure is usually called a sequential list.

        Suppose the linear list has n elements, each element occupies k units (bytes), and the address of the first element is loc(a_1). Then the address loc(a_i) of the i-th element is:

                 loc(a_i) = loc(a_1) + (i-1)×k

where loc(a_1) is called the base address. Distinguish between the serial number of an element and its array subscript: the serial number of a_1 is 1, but its array subscript is 0.
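The address formula can be checked against the addresses the compiler actually uses for array elements. This is a sketch that assumes a flat address space, as on common platforms, where converting a pointer to std::uintptr_t gives its byte address; the function names locOf and formulaMatches are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Address of the element with serial number i (1-based), computed as
// loc(a_1) + (i - 1) * k, where k is the size of one element in bytes.
std::uintptr_t locOf(std::uintptr_t base, std::size_t i, std::size_t k) {
    return base + (i - 1) * k;
}

// Checks the formula against the address of a[i - 1], the array slot
// that holds the element with serial number i.
bool formulaMatches(const int* a, std::size_t i) {
    std::uintptr_t base = reinterpret_cast<std::uintptr_t>(&a[0]);
    std::uintptr_t actual = reinterpret_cast<std::uintptr_t>(&a[i - 1]);
    return locOf(base, i, sizeof(int)) == actual;
}
```

Because the address is obtained with one multiplication and one addition, access to any element takes the same constant time, which is what random access means.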

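         Characteristics of the sequential list: random access, so reading the i-th element takes O(1) time; insertion and deletion have to move a large number of elements, which takes O(n) time.

          Structure of the sequential list

               Declaration of the struct type:

                          #define MAXSIZE 100         // maximum capacity (value assumed for illustration)
                          typedef int ElemType;       // element type (int assumed for illustration)

                          typedef struct List
                          {
                                       ElemType data[MAXSIZE]; // array that stores the data elements
                                       int length;             // current length of the linear list
                          } SqList, *list;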

Note: operations on the sequential list

int isEmpty(SqList &L);                   // test whether the list is empty
int getElem(SqList L, int i);             // return the value at position i
int listInsert(SqList &L, int i, int e);  // insert e at position i
int listDelete(SqList &L, int i);         // delete the element at position i
void printList(SqList &L);                // print the list
int listLength(SqList &L);                // return the length of the list
void initList(SqList &L);                 // initialize the list
int locateElem(SqList &L, int x);         // return the position of the value x
int destroylist(SqList &L);               // destroy the list
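One possible implementation of listInsert and listDelete from the list above is sketched here. The capacity MAXSIZE = 100, the element type int, the 1-based position i, and the convention of returning 1 for success and 0 for failure are all assumptions; the original notes give only the prototypes.

```cpp
#include <cassert>

const int MAXSIZE = 100;   // assumed capacity
typedef int ElemType;      // assumed element type

struct SqList {
    ElemType data[MAXSIZE];
    int length;
};

void initList(SqList& L) { L.length = 0; }

// Insert e at position i (1-based). Returns 1 on success, 0 on failure.
int listInsert(SqList& L, int i, ElemType e) {
    if (i < 1 || i > L.length + 1 || L.length >= MAXSIZE) return 0;
    for (int j = L.length; j >= i; --j) {
        L.data[j] = L.data[j - 1];   // shift a_i..a_n one slot to the right
    }
    L.data[i - 1] = e;
    ++L.length;
    return 1;
}

// Delete the element at position i (1-based). Returns 1 on success, 0 on failure.
int listDelete(SqList& L, int i) {
    if (i < 1 || i > L.length) return 0;
    for (int j = i; j < L.length; ++j) {
        L.data[j - 1] = L.data[j];   // shift a_{i+1}..a_n one slot to the left
    }
    --L.length;
    return 1;
}
```

Both operations move up to n elements in the worst case (inserting or deleting at position 1), which is where the O(n) cost of insertion and deletion in a sequential list comes from.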

4. Dynamic allocation of the sequential list

       A statically allocated sequential list has a fixed amount of memory: allocate too little and it overflows easily, allocate too much and memory is wasted. Dynamic allocation is therefore usually the better choice. A dynamically allocated list is still a sequential list, not a linked list.

     The malloc() function is used to allocate memory dynamically.

 Note: statically allocated memory is released by the system, not by the programmer; it is released only after the main function has finished running.

        Dynamically allocated memory must be released manually by calling free(ptr), where ptr is the pointer returned by the allocation function.

     The realloc() function is used to reallocate the memory space, and its prototype is: 
        void* realloc (void* ptr, size_t size);

[Parameter description] ptr points to the memory block to be reallocated, and size is the new size of the block in bytes.

realloc() changes the size of the memory block pointed to by ptr to size bytes. The new size can be larger or smaller than the original, or the same. When a block allocated by malloc() or calloc() is no longer large enough, realloc() can be used to resize it.

If ptr is NULL, the effect is the same as malloc(): size bytes of memory are allocated.

If size is 0, some implementations release the memory pointed to by ptr and return a null pointer, similar to calling free(). The C standard does not guarantee this behavior, so call free() directly when the intent is to release the block.

A few points to note:

  • ptr must be NULL or a pointer previously returned by malloc(), calloc(), or realloc() that has not yet been freed. An uninitialized pointer (int *i;) or the address of an ordinary array (int a[2];) is not allowed and causes undefined behavior, typically a runtime error.
  • After a successful call, the old value of ptr must no longer be used and must not be passed to free(); work only with the pointer returned by realloc().
  • If the block is enlarged, the existing data is preserved; it is copied to a new address when the block has to move (the new address may equal the old one, but the old pointer still must not be used). If the block is shrunk, the data is preserved up to the new length and the rest is cut off.


【Return value】On success, realloc() returns the address of the resized block, which may or may not be the same as ptr; on failure it returns NULL.

Note: if the allocation fails, the memory pointed to by ptr is not released, its contents are unchanged, and it can still be used normally.
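The rules above suggest a common pattern for growing a dynamically allocated sequential list: store the result of realloc() in a temporary pointer and overwrite the old pointer only on success. The sketch below assumes a doubling growth policy and uses illustrative names (DynList, initDyn, appendDyn, destroyDyn) that do not appear in the original notes.

```cpp
#include <cassert>
#include <cstdlib>

// Dynamically allocated sequential list (names are illustrative).
struct DynList {
    int* data;      // heap block obtained from malloc/realloc
    int length;     // number of elements in use
    int capacity;   // number of elements the block can hold
};

bool initDyn(DynList& L, int capacity) {
    assert(capacity > 0);
    L.data = static_cast<int*>(std::malloc(capacity * sizeof(int)));
    L.length = 0;
    L.capacity = (L.data != nullptr) ? capacity : 0;
    return L.data != nullptr;
}

// Append e, doubling the block with realloc when it is full.
bool appendDyn(DynList& L, int e) {
    if (L.length == L.capacity) {
        int newCap = (L.capacity > 0) ? L.capacity * 2 : 1;
        // Keep the result in a temporary: if realloc fails it returns NULL
        // and the old block is still valid and still owned by L.data.
        int* p = static_cast<int*>(std::realloc(L.data, newCap * sizeof(int)));
        if (p == nullptr) return false;
        L.data = p;
        L.capacity = newCap;
    }
    L.data[L.length++] = e;
    return true;
}

void destroyDyn(DynList& L) {
    std::free(L.data);
    L.data = nullptr;
    L.length = L.capacity = 0;
}
```

Writing L.data = realloc(L.data, ...) directly would lose the only pointer to the old block when the call fails, which leaks the memory.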

Supplement: usage of references

       A reference is an alias of a variable (the target). Operating on the reference is exactly the same as operating on the variable directly, because both names refer to the same storage unit.

 Declaring a reference: type identifier & reference name = target variable name;

[Example 1]: int a; int &ra=a; // defines the reference ra, an alias of the variable a

  Explanation:

  (1) The & here is not the address-of operator; it marks the name as a reference.

  (2) The type identifier is the type of the target variable.

  (3) A reference must be initialized when it is declared.

  (4) Once the reference is declared, the target variable has two names, its original name and the reference name, and the reference name cannot later be made an alias of another variable.

   ra=1; is equivalent to a=1;

  (5) Declaring a reference does not define a new variable; it only states that the reference name is an alias of the target variable. The reference is not an object of its own, so taking the address of the reference yields the address of the target variable: &ra is equal to &a.

  (6) An array of references cannot be declared, because a reference is not an object and cannot be an array element. A reference to a whole array is allowed, for example int (&rb)[2] = b; for an array int b[2].
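Points (3) to (5) can be checked with a few lines of C++. The function name aliasDemo is an assumption for this sketch; it returns true when the reference behaves as an alias of the variable.

```cpp
#include <cassert>

// Returns true if the reference and the variable name the same storage unit.
bool aliasDemo() {
    int a = 0;
    int& ra = a;                        // ra must be initialized when declared
    ra = 1;                             // same effect as a = 1
    bool sameValue = (a == 1);          // the write through ra changed a
    bool sameAddress = (&ra == &a);     // &ra yields the address of a
    return sameValue && sameAddress;
}
```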

Reference application

  1. Reference as a parameter

  An important use of references is as function parameters. In C, function parameters are passed by value. When a large block of data had to be passed, pointers were often used, because that avoids pushing the whole block onto the stack and improves efficiency. C++ adds an equally efficient option (and in some special cases a necessary one): references.

  (1) Passing a reference to a function has the same effect as passing a pointer. The formal parameter of the called function becomes an alias of the actual parameter (variable or object) in the calling function, so any operation on the formal parameter inside the called function is an operation on the corresponding target object in the calling function.

  (2) When a parameter is passed by reference, no copy of the actual parameter is created in memory; the function operates on the actual parameter directly. When a parameter is passed as an ordinary variable, storage has to be allocated for the formal parameter at the call, and the formal parameter is a copy of the actual parameter; if an object is passed, the copy constructor is also called. Therefore, when the data being passed is large, passing by reference is better than passing by value in both time and space.

  (3) A pointer parameter can achieve the same effect as a reference, but storage must still be allocated for the pointer parameter in the called function, and the "*pointer variable name" form has to be used repeatedly, which is error-prone and makes the program harder to read. In addition, at the call site the address of the variable has to be passed as the actual parameter. References are easier to use and clearer.

  To use references for efficiency while also protecting the data passed to the function from being changed inside it, use constant references.
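The three ways of passing parameters can be compared on a swap function. This is a sketch with assumed function names; the last function shows a constant reference parameter.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Pass by value: the function works on copies, the caller's variables are unchanged.
void swapByValue(int x, int y) { int t = x; x = y; y = t; }

// Pass by pointer: the caller passes addresses and the function dereferences them.
void swapByPointer(int* x, int* y) { int t = *x; *x = *y; *y = t; }

// Pass by reference: x and y are aliases of the caller's variables.
void swapByReference(int& x, int& y) { int t = x; x = y; y = t; }

// Constant reference: no copy of the vector is made and the function cannot modify it.
long sum(const std::vector<int>& v) {
    long total = 0;
    for (std::size_t i = 0; i < v.size(); ++i) total += v[i];
    return total;
}
```

At the call site, swapByPointer(&a, &b) needs explicit addresses, while swapByReference(a, b) looks like an ordinary call.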

  2. Constant references

  Declaring a constant reference: const type identifier & reference name = target variable name;

  A reference declared this way cannot be used to modify the value of the target variable, so the target is treated as const through the reference, which makes the reference safe.

  Suppose a function takes a non-const reference parameter of type string &. Passing it the return value of a function foo( ) that returns a string, or the literal "hello world", is illegal. Both arguments produce a temporary object, and in C++ a temporary object cannot be bound to a non-const reference. Declaring the parameter as const string & makes both calls legal.

  Reference parameters should be declared const whenever possible.
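The foo( ) and "hello world" case can be sketched as follows. The bodies of foo and bar are assumptions made for illustration, and the rejected calls are shown only as comments because they do not compile in standard C++.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper that returns a temporary std::string.
std::string foo() { return "hello"; }

// Takes a constant reference, so it accepts named variables, temporaries
// returned by functions, and string literals converted to std::string.
std::size_t bar(const std::string& s) { return s.size(); }

// With a non-const reference parameter the same calls are rejected:
//   void baz(std::string& s);
//   baz(foo());            // error: a temporary cannot bind to std::string&
//   baz("hello world");    // error for the same reason
```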

  3. Reference as a return value

  To return a function value by reference, the function definition must have the following form:

type identifier & function name (formal parameter list and type description)
{function body}

  Description:

  (1) To return a function value by reference, add & before the function name when defining the function.

  (2) The biggest advantage of returning a function value by reference is that no copy of the returned value is created in memory.

  References used as return values must follow these rules:

  (1) Do not return a reference to a local variable. See Item 31 of Effective C++ [1]. Local variables are destroyed when the function returns, so the returned reference refers to nothing and the program enters an undefined state.

  (2) Do not return a reference to memory allocated by new inside the function. See Item 31 of Effective C++ [1]. The object is not destroyed automatically as a local variable would be, but other problems arise. For example, if the returned reference is used only as a temporary and is never bound to a named variable, the memory it refers to (allocated by new) can never be released, which causes a memory leak.

  (3) A reference to a class member may be returned, but it should preferably be const. See Item 30 of Effective C++ [1]. When an attribute of an object is tied to a business rule, assigning to it often depends on other attributes or on the state of the object, so the assignment should be encapsulated inside that rule. If other code can obtain a non-const reference (or pointer) to the attribute, a plain assignment through it would break the integrity of the business rule.
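The rules for returning references can be illustrated with a small class. Counter and its at() member are assumed names for this sketch. The returned reference refers to a member of the object, which stays alive as long as the object does; the unsafe case from rule (1) is shown only as a comment.

```cpp
#include <cassert>

// Illustrative class: at() returns a reference to a member element.
class Counter {
public:
    Counter() { for (int i = 0; i < 3; ++i) slots_[i] = 0; }
    // Non-const access: the call can appear on the left of an assignment.
    int& at(int i) { assert(i >= 0 && i < 3); return slots_[i]; }
    // Read-only access through a constant reference.
    const int& at(int i) const { assert(i >= 0 && i < 3); return slots_[i]; }
private:
    int slots_[3];
};

// WRONG (shown as a comment only): returns a reference to a local variable,
// which is destroyed when the function returns.
//   int& bad() { int local = 42; return local; }
```

Through a const Counter only the read-only overload is available, which is the safer default that rule (3) recommends.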

  4. References and polymorphism

  Besides pointers, references are another means of producing polymorphic behavior: a reference to a base class can refer to an instance of a derived class.

  [Example 7]:

class A {...};
class B: public A{...};
B b;
A &Ref = b; // initialize the base-class reference with a derived-class object

  Ref is a base-class reference bound to a derived-class object, so it can only be used to access the members that the derived object inherits from the base class. If class A declares a virtual function and class B overrides it, calls made through Ref are dispatched polymorphically.
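A complete version of this idea is sketched below. The member functions name() and kind() and the helper describe() are assumptions added so that the behavior can be observed: name() is virtual and overridden, while kind() is non-virtual and only hidden in the derived class.

```cpp
#include <cassert>
#include <string>

class A {
public:
    virtual ~A() {}
    virtual std::string name() const { return "A"; }   // virtual: dispatched at run time
    std::string kind() const { return "base"; }         // non-virtual: chosen by static type
};

class B : public A {
public:
    std::string name() const override { return "B"; }
    std::string kind() const { return "derived"; }      // hides A::kind, does not override it
};

// Calls go through a base-class reference.
std::string describe(const A& ref) { return ref.name() + "/" + ref.kind(); }
```

Through a reference of type A bound to a B object, name() resolves to B's override while kind() still resolves to A's version.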

  Summary of references

  (1) Simply giving a variable an alias is of little use in itself. References are mainly used for passing function parameters, where they solve the time and space cost of passing large blocks of data or large objects.

  (2) Passing function parameters by reference guarantees that no copy is made during the call, which makes passing more efficient, and using const keeps passing by reference safe.

    5. Linked list

  1. Head insertion: the order of the nodes in the linked list is the reverse of the insertion order
  2. Tail insertion: the order of the nodes in the linked list is the same as the insertion order

        (1) Features

          Logical order and physical order are not necessarily the same. Physical memory does not have to be contiguous; the logical relationship between elements is expressed by pointers.

        (2) Node

          To represent the logical relationship between elements correctly, each data element of the linear list is stored together with the address (or position) of its successor. The storage image made up of these two parts is called a node (Node).

        (3) Singly linked list

            Single list
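The node layout and the two insertion methods described in this section can be sketched as follows. The use of a head (dummy) node and all function names are assumptions for this illustration.

```cpp
#include <cassert>
#include <vector>

struct Node {
    int data;     // value of the data element
    Node* next;   // address of the successor node
};

// Head insertion: each new node goes right after the head node,
// so the list ends up in the reverse of the insertion order.
Node* buildByHeadInsert(const std::vector<int>& values) {
    Node* head = new Node{0, nullptr};   // head (dummy) node, holds no data
    for (int v : values) {
        Node* n = new Node{v, head->next};
        head->next = n;
    }
    return head;
}

// Tail insertion: each new node goes after the current last node,
// so the list keeps the insertion order.
Node* buildByTailInsert(const std::vector<int>& values) {
    Node* head = new Node{0, nullptr};
    Node* tail = head;
    for (int v : values) {
        tail->next = new Node{v, nullptr};
        tail = tail->next;
    }
    return head;
}

// Copy the stored values into a vector (skipping the head node).
std::vector<int> toVector(const Node* head) {
    std::vector<int> out;
    for (const Node* p = head->next; p != nullptr; p = p->next) out.push_back(p->data);
    return out;
}

// Release every node, including the head node.
void destroy(Node* head) {
    while (head != nullptr) {
        Node* next = head->next;
        delete head;
        head = next;
    }
}
```

Inserting 1, 2, 3 by head insertion yields the list 3, 2, 1, while tail insertion yields 1, 2, 3.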

Origin blog.csdn.net/Wang_kang1/article/details/82966533