【王道考研】Wangdao Data Structures and Algorithms: Detailed Notes (Complete)

Contents

Chapter 1 Introduction to Data Structures

1.1 Basic Concepts of Data Structures

1.2 The Three Elements of a Data Structure

1.2.1. The Logical Structure of Data

1.2.2. The Storage Structure of Data (Physical Structure)

1.2.3. Operations on Data

1.2.4. Data Types and Abstract Data Types

1.3 Basic Concepts of Algorithms

1.4 Time Complexity of Algorithms

1.5 Space Complexity of Algorithms

Chapter 2 Linear Lists

2.1 Definition and Basic Operations of Linear Lists

2.1.1 Definition of Linear Lists

2.1.2 Basic Operations of Linear Lists

2.2 Sequence Lists

2.2.1 The Concept of the Sequence List

2.2.2. Implementation of the Sequence List

2.2.3 Basic Operations of the Sequence List

2.3 Linked Representation of Linear Lists

2.3.1. Basic Concepts of the Singly Linked List

2.3.2. Implementation of the Singly Linked List

2.3.3. Insertion in a Singly Linked List

2.3.4. Deletion in a Singly Linked List

2.3.5. Search in a Singly Linked List

2.3.6. Building a Singly Linked List

2.3.7. Doubly Linked Lists

2.3.8. Circular Linked Lists

2.3.9. Static Linked Lists

2.3.10. Comparison of Sequence Lists and Linked Lists

Chapter 3 Stacks and Queues

3.1. Stacks

3.1.1. Basic Concepts of Stacks

3.1.2. Basic Operations of Stacks

3.1.3. Sequential Storage Implementation of Stacks

3.1.4. Linked Storage of Stacks

3.2. Queues

3.2.1. Basic Concepts of Queues

3.2.2. Basic Operations of Queues

3.2.3. Sequential Storage Implementation of Queues

3.2.4. Linked Storage Implementation of Queues

3.2.5. Double-Ended Queues

3.3. Applications of Stacks and Queues

3.3.1 Stacks in Bracket Matching

3.3.2. Stacks in Expression Evaluation

3.3.3. Stacks in Recursion

3.3.4. Applications of Queues

3.4. Compressed Storage of Special Matrices

3.4.1 Array Storage

3.4.2 Compressed Storage of Symmetric Matrices

3.4.3 Compressed Storage of Triangular Matrices

3.4.4 Compressed Storage of Tridiagonal Matrices

3.4.5 Compressed Storage of Sparse Matrices

Chapter 4 Strings

4.1. Basic Concepts of Strings

4.2. Basic Operations of Strings

4.3. Storage Implementation of Strings

4.3.1 Static Array Implementation

4.3.2 Implementation of Basic Operations

4.4. Naive Pattern Matching of Strings

4.5. The KMP Algorithm

Chapter 5 Trees

5.1. The Concept of Trees

5.1.1. Basic Definition of Trees

5.1.2. Frequently Tested Properties of Trees

5.2. Binary Trees

5.2.1. Definition of the Binary Tree

5.2.2. Special Binary Trees

5.2.3. Properties of Binary Trees

5.2.4. Storage Implementation of Binary Trees

5.3. Binary Tree Traversal and Threaded Binary Trees

5.3.1. Preorder, Inorder, and Postorder Traversal of Binary Trees

5.3.2. Level-Order Traversal of Binary Trees

5.3.3. Constructing a Binary Tree from Traversal Sequences

5.3.4. The Concept of the Threaded Binary Tree

5.3.5. Threading a Binary Tree

5.3.6. Finding Predecessors/Successors in a Threaded Binary Tree

5.4. Trees and Forests

5.4.1. Storage Structures of Trees

5.4.2. Traversal of Trees and Forests

5.5. Applications

5.5.1. Binary Search Trees

5.5.2. Balanced Binary Trees

5.5.3. Huffman Trees

Chapter 6 Graphs

6.1. Basic Concepts of Graphs

6.2. Graph Storage

6.2.1. Adjacency Matrix

6.2.2. Adjacency List

6.2.3. Orthogonal Lists and Adjacency Multilists

6.2.4. Basic Operations on Graphs

6.3. Graph Traversal

6.3.1. Breadth-First Traversal

6.3.2. Depth-First Traversal

6.4. Applications of Graphs

6.4.1. Minimum Spanning Trees

6.4.2. Single-Source Shortest Paths in Unweighted Graphs: the BFS Algorithm

6.4.3. Single-Source Shortest Paths: Dijkstra's Algorithm

6.4.4. All-Pairs Shortest Paths: the Floyd Algorithm

6.4.5. Describing Expressions with Directed Acyclic Graphs

6.4.6. Topological Sorting

6.4.7. Critical Paths

Chapter 7 Searching

7.1 The Concept of Searching

7.2 Sequential Search

7.3 Binary Search

7.4 Block Search

7.5 Red-Black Trees

7.5.1 Why Were Red-Black Trees Invented?

7.5.2 Definition of Red-Black Trees

7.5.3 Insertion in Red-Black Trees

7.6 B-Trees and B+ Trees

7.6.1 B-Trees

7.6.2 Basic Operations on B-Trees

7.6.3 B+ Trees

7.6.4 Comparison of B-Trees and B+ Trees

7.7 Hash Search and Its Performance Analysis

7.7.1 Basic Concepts of Hash Tables

7.7.2 Hash Search and Performance Analysis

Chapter 8 Sorting

8.1. Basic Concepts of Sorting

8.2. Insertion Sort

8.2.1. Direct Insertion Sort

8.2.2. Binary Insertion Sort

8.2.3. Shell Sort

8.3. Exchange Sorts

8.3.1. Bubble Sort

8.3.2. Quicksort

8.4. Selection Sorts

8.4.1. Simple Selection Sort

8.4.2. Heapsort

8.5. Merge Sort

8.6. Radix Sort

8.7. Summary of Internal Sorting Algorithms

8.7.1. Comparison of Internal Sorting Algorithms

8.7.2. Applications of Internal Sorting Algorithms

8.8. External Sorting

8.8.1. Basic Concepts and Methods of External Sorting

8.8.2. Loser Trees

8.8.3. Replacement Selection Sort (Generating Initial Runs)

8.8.4. Optimal Merge Trees


Chapter 1 Introduction to Data Structures

1.1 Basic Concepts of Data Structures

  1. Data: data is the carrier of information — the set of symbols that can be input into a computer and processed by computer programs. Data is the raw material that programs work on.
  2. Data element: the basic unit of data, usually considered and processed as a whole. A data element may consist of several data items.
  3. Data item: the smallest indivisible unit that makes up a data element.
  4. Data object: a collection of data elements with the same properties; a subset of the data.
  5. Data structure: a collection of data elements that stand in one or more specific relationships to each other.

A few points to understand, by example:

  1. The many kinds of tables in a school: data
  2. A single grade sheet: a data object
  3. Each row of the grade sheet, with name, course, class, and score: a data element
  4. Each cell in a row, such as the name, is a data item

1.2 The Three Elements of a Data Structure

1.2.1. The Logical Structure of Data

The logical structure refers to the logical relationships among data elements, i.e., it describes the data in terms of its logical relations.

Logical structures include:

  1. Set structure: the data elements have no relationship other than "belonging to the same set" (for example: a flock of sheep).
  2. Linear structure: the data elements stand in one-to-one relationships; every element except the first has a unique predecessor, and every element except the last has a unique successor (for example: taking a queue number).
  3. Tree structure: the data elements stand in one-to-many relationships (for example: a mind map).
  4. Graph structure: the data elements stand in many-to-many relationships (for example: a road network).

1.2.2. The Storage Structure of Data (Physical Structure)

How do we represent the logical relationships among data elements inside a computer?
The storage structure is the representation (also called the image) of a data structure in the computer, also known as the physical structure.

Storage structures include:

  1. Sequential storage: store logically adjacent elements in physically adjacent storage units; the relationships between elements are reflected by the adjacency of the storage units.
  2. Linked storage: logically adjacent elements need not be physically adjacent; the logical relationships between elements are represented by pointers that indicate the elements' storage addresses.
  3. Indexed storage: in addition to storing the element information, an index table is built; each entry in the index table is called an index item, generally of the form (keyword, address).
  4. Hash storage: compute an element's storage address directly from its keyword; also called hash storage.

There are a few things to understand:

  1. With sequential storage, the data elements must occupy physically contiguous storage; with any non-sequential storage, the elements may be physically scattered.
  2. The storage structure of the data affects how convenient it is to allocate storage space.
  3. The storage structure of the data affects the speed of operations on the data.

1.2.3. Operations on Data

  • Operations on data include both their definition and their implementation.
  • The definition of an operation is stated against the logical structure and specifies what the operation does.
  • The implementation of an operation is carried out against the storage structure and specifies the concrete steps of the operation.

For a given logical structure, basic operations are defined according to actual needs.
For example, for a linear structure:

Basic operations:
1. Find the i-th data element
2. Insert a new data element at position i
3. Delete the data element at position i
......

1.2.4. Data Types and Abstract Data Types

A data type is the collective name for a set of values and a set of operations defined on that set. For example, for the int type, we can add, subtract, multiply, and divide its values.

  1. Atomic type: a data type whose values cannot be subdivided, such as bool and int.
  2. Structure type: a data type whose values can be decomposed into several components, for example a struct.

An Abstract Data Type (ADT) is an abstract data organization together with the operations on it. An ADT uses mathematical language to define the logical structure of the data and its operations, independent of any concrete implementation.

A few points to understand when discussing a data structure:

  1. Define logical structure (relationships between data elements)
  2. Define data operations (what kind of operations should be performed on this logical structure according to actual needs)
  3. Determine a certain storage structure, implement the data structure, and implement some basic operations on the data structure

1.3 Basic Concepts of Algorithms

Program = data structure + algorithm
Data structure: how to describe real-world problems with data and store them in the computer.
Algorithm: how to process that data efficiently to solve the actual problem.

An algorithm is a description of the steps for solving a particular problem. It is a finite sequence of instructions, each of which denotes one or more operations.
Properties of an algorithm:

  1. Finiteness: An algorithm must always end after executing a finite number of steps, and each step can be completed in finite time.
  2. Determinism: Each instruction in the algorithm must have a definite meaning, and only the same output can be obtained for the same input.
  3. Feasibility: The operations described in the algorithm can be realized by executing the basic operations that have been realized for a limited number of times.
  4. Inputs: An algorithm has zero or more inputs, which are taken from a specific set of objects.
  5. Outputs: An algorithm has one or more outputs, which are quantities that have some specific relationship to the inputs.

We can draw an analogy with the function y = f(x): x is the input, y is the output, and the function itself is the algorithm.

A good algorithm achieves:

  1. Correctness: The algorithm should be able to solve the problem correctly.
  2. Readability: Algorithms should be well readable to help people understand them.
  3. Robustness: when given invalid input, the algorithm can react or handle it appropriately instead of producing inexplicable output.
  4. Efficiency and low storage requirements: taking little time means low time complexity; using little memory means low space complexity.

1.4 Time Complexity of Algorithms

  1. Sequentially executed code only affects the constant factor and can be ignored.
  2. It suffices to pick one basic operation in a loop and analyze how its execution count relates to n.
  3. With nested loops, only the execution count of the innermost loop matters.
  • Estimate in advance the relation between the algorithm's time cost T(n) and the problem size n (T stands for "time").

O(1)<O(\log_{2}n)<O(n)<O(n\log_{2}n)<O(n^{2})<O(n^{3})<O(2^{n})<O(n!)<O(n^{n})

1.5 Space Complexity of Algorithms

  • Refers to the storage the algorithm consumes (i.e., the auxiliary space beyond what the algorithm itself requires).
  • The space complexity S(n) of an algorithm is defined as the storage the algorithm consumes, as a function of the problem size n.
    It is written S(n)=O(g(n))


Chapter 2 Linear Lists

2.1 Definition and Basic Operations of Linear Lists

2.1.1 Definition of Linear Lists

  • A linear list is a finite sequence of n (n ≥ 0) data elements of the same data type.
    (n is the length of the list; when n = 0 the list is empty. If the list is named L, it is generally written as)

\large L=(a_{1},a_{2},\dots ,a_{i},a_{i+1},\dots ,a_{n})

  • Characteristics:
    1. There is a unique first element.
    2. There is a unique last element.
    3. Every element except the first has exactly one direct predecessor.
    4. Every element except the last has exactly one direct successor.
  • Several concepts:
    1. a_{i} is the "i-th" element of the list; i is its position (bit order) in the list.
    2. a_{1} is the head element; a_{n} is the tail element.
    3. Except for the first element, every element has exactly one direct predecessor; except for the last element, every element has exactly one direct successor.
  • Storage structures:
    1. Sequential storage structure: sequence list
    2. Linked storage structure: linked list

2.1.2 Basic Operations of Linear Lists

  1. InitList(&L): initialization. Construct an empty linear list L and allocate memory space.
  2. DestroyList(&L): destroy. Destroy the linear list and release the memory space occupied by L.
  3. ListInsert(&L,i,e): insert. Insert the specified element e at position i in list L.
  4. ListDelete(&L,i,&e): delete. Delete the element at position i in list L and return the deleted value via e.
  5. LocateElem(L,e): search by value. Find an element in list L with the given key value.
  6. GetElem(L,i): search by position. Get the value of the element at position i in list L.
  7. Length(L): length. Return the length of list L, i.e., the number of data elements in L.
  8. PrintList(L): output. Print all element values of list L in order.
  9. Empty(L): emptiness test. Return true if L is an empty list, otherwise return false.

When should a parameter be passed by reference ("&")? When the modification to the parameter needs to be "brought back" to the caller. See the following examples:

  • The first is the pass-by-value call:
#include<stdio.h>
void test(int x)  //形参是实参的临时拷贝
{
	x = 1024;
	printf("test函数内部 x=%d\n",x);
}
int main()
{
	int x = 1;
	printf("调用test前 x=%d\n",x);
	test(x);                       //这里的x改变了并没有传回来
	printf("调用test后 x=%d\n",x);

	return 0;
}

//输出为:
//调用test前 x=1
//test函数内部 x=1024
//调用test后 x=1
//请按任意键继续. . .

  • Next, the call by reference (the C++ reference parameter used throughout these notes):
#include<stdio.h>
void test(int &x)  //把x的地址传到函数
{
	x = 1024;
	printf("test函数内部 x=%d\n",x);
}
int main()
{
	int x = 1;
	printf("调用test前 x=%d\n",x);
	test(x);                       //这里的x通过函数传回来值改变了
	printf("调用test后 x=%d\n",x);

	return 0;
}


//输出为:
//调用test前 x=1
//test函数内部 x=1024
//调用test后 x=1024
//请按任意键继续. . .

2.2 Sequence Lists

Having covered the logical structure and basic operations of the linear list, we now turn to its physical structure, starting with the sequence list.

2.2.1 The Concept of the Sequence List

  • Sequence list: implements a linear list with sequential storage. Logically adjacent elements are stored in physically adjacent storage units, and the relationships between elements are reflected by the adjacency of those storage units.
  • Characteristics of the sequence list:
    1. Random access: the i-th element can be found in O(1) time.
    2. High storage density: each node stores only a data element.
    3. Expanding capacity is inconvenient (even with dynamic allocation, extending the length is costly, because the data must be copied to a new area).
    4. Insertion and deletion are inconvenient and require moving large numbers of elements: O(n).

2.2.2. Implementation of the Sequence List

  • Static allocation of the sequence list
    Once the table length is determined at initialization, it cannot be changed (the storage space is static).
//顺序表的实现--静态分配

#include<stdio.h>
#define MaxSize 10          //定义表的最大长度 
typedef struct{
	int data[MaxSize];      //用静态的"数组"存放数据元素
	int length;             //顺序表的当前长度  
}SqList;                    //顺序表的类型定义(静态分配方式) 
void InitList(SqList &L){
	 for(int i=0;i<MaxSize;i++){
	 	L.data[i]=0;        //将所有数据元素设置为默认初始值
		 }
	 L.length=0;
}
int main(){
	SqList L;               //声明一个顺序表
	InitList(L);            //初始化一个顺序表
	for(int i=0;i<MaxSize;i++){                //顺序表的打印
		printf("data[%d]=%d\n",i,L.data[i]);
	}
	return 0; 
}
  • Dynamic allocation of the sequence list
//顺序表的实现——动态分配
#include<stdio.h>
#include<stdlib.h>  //malloc、free函数的头文件 
#define InitSize 10 //默认的初始值

typedef struct{
	int  *data;    //指示动态分配数组的指针
	int MaxSize;   //顺序表的最大容量
	int length;    //顺序表的当前长度 
}SeqList; 

void InitList(SeqList &L){                 //初始化
	//用malloc 函数申请一片连续的存储空间
	L.data =(int*)malloc(InitSize*sizeof(int)) ;
	L.length=0;
	L.MaxSize=InitSize;
} 

void IncreaseSize(SeqList &L,int len){  //增加动态数组的长度
	int *p=L.data;
	L.data=(int*)malloc((L.MaxSize+len)*sizeof(int));
	for(int i=0;i<L.length;i++){
		L.data[i]=p[i];      //将数据复制到新区域 
	}
	L.MaxSize=L.MaxSize+len; //顺序表最大长度增加len
	free(p);                 //释放原来的内存空间 
	
} 
int main(){
	SeqList L;        //声明一个顺序表
	InitList(L);      //初始化顺序表
	IncreaseSize(L,5);//增加顺序表的长度
	return 0; 
}

2.2.3 Basic Operations of the Sequence List

  • Insertion into the sequence list
    ListInsert(&L,i,e): insertion. Inserts the specified element e at position i in list L.
    Average time complexity = O(n)
#define MaxSize 10    //定义最大长度
typedef struct{
	int data[MaxSize];  //用静态的数组存放数据
	int length;         //顺序表的当前长度
}SqList;                //顺序表的类型定义  

bool ListInsert(SqList &L, int i, int e){ 
    if(i<1||i>L.length+1)    //判断i的范围是否有效
        return false;
    if(L.length>=MaxSize) //当前存储空间已满,不能插入  
        return false;

    for(int j=L.length; j>=i; j--){    //将第i个元素及其之后的元素后移
        L.data[j]=L.data[j-1];
    }
    L.data[i-1]=e;  //在位置i处放入e
    L.length++;      //长度加1
    return true;
}

int main(){ 
	SqList L;   //声明一个顺序表
	InitList(L);//初始化顺序表
	//...此处省略一些代码;插入几个元素

	ListInsert(L,3,3);   //在顺序表L的第3个位置插入3

	return 0;
}
  • Deletion from the sequence list
    ListDelete(&L,i,&e): deletion. Deletes the element at position i in list L and returns the deleted value via e.
    Average time complexity = O(n)
#define MaxSize 10

typedef struct {
	int data[MaxSize];
	int length;
} SqList;

// 删除顺序表i位置的数据并存入e
bool ListDelete(SqList &L, int i, int &e) {
	if (i < 1 || i > L.length) // 判断i的范围是否有效
		return false;
	e = L.data[i-1]; // 将被删除的元素赋值给e 
	for (int j = i; j < L.length; j++) //将第i个位置后的元素前移 
		L.data[j-1] = L.data[j];
	L.length--;
	return true; 
}

int main() {
	SqList L;
	InitList(L);
	int e = -1;
	if (ListDelete(L, 3, e))
		printf("已删除第3个元素,删除元素值为%d\n", e);
	else
		printf("位序i不合法,删除失败\n"); 
	return 0; 
} 
  • Search in the sequence list

  • Search by position
    GetElem(L,i): gets the value of the element at position i in list L.
    Average time complexity: O(1)
// 静态分配的按位查找
#define MaxSize 10

typedef struct {
	ElemType data[MaxSize]; 
	int length;
}SqList;

ElemType GetElem(SqList L, int i) {
	return L.data[i-1];
}
// 动态分配的按位查找
#define InitSize 10

typedef struct {
	ElemType *data;
	int MaxSize;
	int length;
}SeqList;

ElemType GetElem(SeqList L, int i) {
	return L.data[i-1];
}

  • LocateElem(L,e): search by value. Finds an element in list L with the given key value.
    Average time complexity = O(n)
#define InitSize 10          //顺序表的初始容量
typedef struct{
    ElemType *data;          //指示动态分配数组的指针
    int length;              //顺序表的当前长度
}SqList;   

//在顺序表L中查找第一个元素值等于e的元素,并返回其位序
int LocateElem(SqList L, ElemType e){
    for(int i=0; i<L.length; i++)
        if(L.data[i] == e)  
            return i+1;     //数组下标为i的元素值等于e,返回其位序i+1
    return 0;               //退出循环,说明查找失败
}
//调用LocateElem(L,9)
//调用LocateElem(L,9)

2.3 Linked Representation of Linear Lists

Having covered the physical storage of the sequence list, we now study the singly linked list.

2.3.1. The basic concept of singly linked list

  • Singly linked list: implements a linear structure with linked storage . A node stores a data element, and the relationship between each node is represented by a pointer.
  • Features:
    Advantages: It does not require a large continuous space, and it is convenient to change the capacity.
    Disadvantages: Random access is not possible, and it takes a certain amount of space to store pointers.
  • Two implementation approaches:
    With a head node: writing code is more convenient. The head node stores no data; the node after the head node stores the first actual data element.
    Without a head node: more troublesome. The first data node needs different code logic from subsequent nodes, and empty vs. non-empty lists need different handling.
typedef struct LNode
{                      //定义单链表结点类型
    ElemType data;     //数据域
    struct LNode *next;//指针域
}LNode, *LinkList;
  • To emphasize that something is a singly linked list — use LinkList
  • To emphasize that something is a node — use LNode*

2.3.2. Implementation of singly linked list

  • Without a head node
typedef struct LNode{
    ElemType data;
    struct LNode *next;
}LNode, *LinkList;

//初始化一个空的单链表
bool InitList(LinkList &L){
    L = NULL; //空表,暂时还没有任何结点
    return true;
}

void test(){
    LinkList L;  //声明一个指向单链表的头指针
    //初始化一个空表
    InitList(L);
    ...
}

//判断单链表是否为空
bool Empty(LinkList L){
    return (L==NULL);
}
  • With a head node
typedef struct LNode
{
    ElemType data;
    struct LNode *next;
}LNode, *LinkList;

//初始化一个单链表(带头结点)
bool InitList(LinkList &L)
{  
    L = (LNode*) malloc(sizeof(LNode));  //头指针指向的结点——分配一个头结点(不存储数据)
    if (L == NULL)          //内存不足,分配失败
        return false;
    L -> next = NULL;       //头结点之后暂时还没有结点
    return true;
}

void test()
{
    LinkList L;  //声明一个指向单链表的指针: 头指针
    //初始化一个空表
    InitList(L);
    //...
}

//判断单链表是否为空(带头结点)
bool Empty(LinkList L)
{
    if (L->next == NULL)
        return true;
    else
        return false;
}

Comparison of with vs. without a head node:

  • Without a head node: writing code is troublesome! The first data node needs different code logic from subsequent nodes, and empty vs. non-empty lists also need different handling; the node pointed to by the head pointer stores actual data.
  • With a head node: the head node pointed to by the head pointer stores no actual data; the node after the head node stores the first actual data element.

2.3.3. Insertion of singly linked list

  • Insert by position (with head node)
    ListInsert(&L,i,e): insertion. Inserts the specified element e at position i in list L.
    Find the (i-1)-th node (the predecessor) and insert the new node after it; the head node can be regarded as the 0th node, so this also works when i=1.
    Average time complexity: O(n)
typedef struct LNode
{
    ElemType data;
    struct LNode *next;
}LNode, *LinkList;

//在第i个位置插入元素e(带头结点)
bool ListInsert(LinkList &L, int i, ElemType e)
{  
    //判断i的合法性, i是位序号(从1开始)
    if(i<1)
        return false;
    
    LNode *p;       //指针p指向当前扫描到的结点 
    int j=0;        //当前p指向的是第几个结点
    p = L;          //L指向头结点,头结点是第0个结点(不存数据)

    //循环找到第i-1个结点
    while(p!=NULL && j<i-1){     //如果i>length, p最后会等于NULL
        p = p->next;             //p指向下一个结点
        j++;
    }

    if (p==NULL)                 //i值不合法(超过表长+1),p已走到表尾之后
        return false;
    
    //在第i-1个结点后插入新结点
    LNode *s = (LNode *)malloc(sizeof(LNode)); //申请一个结点
    s->data = e;
    s->next = p->next;
    p->next = s;                 //将结点s连到p后,后两步千万不能颠倒qwq

    return true;
}
  • Insert by position (without head node)
    ListInsert(&L,i,e): insertion. Inserts the specified element e at position i in list L.
    Since there is no head node, there is no "0th" node, so i=1 needs special handling: inserting (or deleting) the first element requires changing the head pointer L.
typedef struct LNode
{
    ElemType data;
    struct LNode *next;
}LNode, *LinkList;

bool ListInsert(LinkList &L, int i, ElemType e)
{
    if(i<1)
        return false;
    
    //插入到第1个位置时的操作有所不同!
    if(i==1){
        LNode *s = (LNode *)malloc(sizeof(LNode));
        s->data =e;
        s->next =L;
        L=s;          //头指针指向新结点
        return true;
    }

    //i>1的情况与带头结点一样!唯一区别是j的初始值为1
    LNode *p;       //指针p指向当前扫描到的结点 
    int j=1;        //当前p指向的是第几个结点
    p = L;          //p指向第1个结点(注意:此链表不带头结点)

    //循环找到第i-1个结点
    while(p!=NULL && j<i-1){     //如果i>length, p最后会等于NULL
        p = p->next;             //p指向下一个结点
        j++;
    }

    if (p==NULL)                 //i值不合法
        return false;
    
    //在第i-1个结点后插入新结点
    LNode *s = (LNode *)malloc(sizeof(LNode)); //申请一个结点
    s->data = e;
    s->next = p->next;
    p->next = s;          
    return true;

}

  • Insert after a given node: InsertNextNode(LNode *p, ElemType e)
    Given a node p, insert element e after it. A singly linked list can only be traversed forward along its next pointers, so from a given node p we can reach the nodes after p, but not the nodes before it.
typedef struct LNode
{
    ElemType data;
    struct LNode *next;
}LNode, *LinkList;

bool InsertNextNode(LNode *p, ElemType e)
{
    if(p==NULL){
        return false;
    }

    LNode *s = (LNode *)malloc(sizeof(LNode));
    //某些情况下分配失败,比如内存不足
    if(s==NULL)
        return false;
    s->data = e;          //用结点s保存数据元素e 
    s->next = p->next;
    p->next = s;          //将结点s连到p之后

    return true;
}                         //平均时间复杂度 = O(1)


//有了后插操作,那么在第i个位置上插入指定元素e的代码可以改成:
bool ListInsert(LinkList &L, int i, ElemType e)
{  
    if(i<1)
        return false;
    
    LNode *p;       //指针p指向当前扫描到的结点 
    int j=0;        //当前p指向的是第几个结点
    p = L;          //L指向头结点,头结点是第0个结点(不存数据)

    //循环找到第i-1个结点
    while(p!=NULL && j<i-1){     //如果i>length, p最后会等于NULL
        p = p->next;             //p指向下一个结点
        j++;
    }

    return InsertNextNode(p, e);
}

  • Pre-insertion before a given node
    Suppose the node to insert is s and it should go in front of p. We can still insert s after *p and then exchange p->data with s->data; the logical relationship is then satisfied, and the time complexity is O(1).
//前插操作:在p结点之前插入元素e
bool InsertPriorNode(LNode *p, ElemType e){
    if(p==NULL)
        return false;
    
    LNode *s = (LNode *)malloc(sizeof(LNode));
    if(s==NULL) //内存分配失败
        return false;

    //重点来了!
    s->next = p->next;
    p->next = s;       //新结点s连到p之后
    s->data = p->data; //将p中元素复制到s
    p->data = e;       //p中元素覆盖为e

    return true;
} 

2.3.4. Deletion of singly linked list

  • Delete by position
    ListDelete(&L,i,&e): deletion. Deletes the element at position i in list L and returns the deleted value via e; the head node is regarded as the "0th" node.
    Idea: find the (i-1)-th node, point its next to the (i+1)-th node, and free the i-th node.
typedef struct LNode{
    ElemType data;
    struct LNode *next;
}LNode, *LinkList;

bool ListDelete(LinkList &L, int i, ElemType &e){
    if(i<1) return false;

    LNode *p;       //指针p指向当前扫描到的结点 
    int j=0;        //当前p指向的是第几个结点
    p = L;          //L指向头结点,头结点是第0个结点(不存数据)

    //循环找到第i-1个结点
    while(p!=NULL && j<i-1){     //如果i>length, p最后会等于NULL
        p = p->next;             //p指向下一个结点
        j++;
    }

    if(p==NULL) 
        return false;
    if(p->next == NULL) //第i-1个结点之后已无其他结点
        return false;

    LNode *q = p->next;         //令q指向被删除的结点
    e = q->data;                //用e返回被删除元素的值
    p->next = q->next;          //将*q结点从链中“断开”
    free(q);                    //释放结点的存储空间

    return true;
}


  •  Delete the specified node
bool DeleteNode(LNode *p){
    if(p==NULL)
        return false;

    LNode *q = p->next;      //令q指向*p的后继结点
    if(q==NULL)              //p是最后一个结点,无后继,此法失效
        return false;        //(只能从表头开始找p的前驱,需O(n)时间)
    p->data = q->data;       //让p和后继结点交换数据域
    p->next = q->next;       //将*q结点从链中“断开”
    free(q);
    return true;
} //时间复杂度 = O(1)

2.3.5. Singly linked list search

  • Search by position: GetElem(L, i)
    Gets the value of the element at position i in list L.
    Average time complexity: O(n)
LNode * GetElem(LinkList L, int i){
    if(i<0) return NULL;
    
    LNode *p;               //指针p指向当前扫描到的结点
    int j=0;                //当前p指向的是第几个结点
    p = L;                  //L指向头结点,头结点是第0个结点(不存数据)
    while(p!=NULL && j<i){  //循环找到第i个结点
        p = p->next;
        j++;
    }

    return p;               //返回第i个结点的指针(i不合法时返回NULL)
}

  • Search by value: LocateElem(L, e)
    Finds an element with the given key value in list L.
    Average time complexity: O(n)
LNode * LocateElem(LinkList L, ElemType e){
    LNode *p = L->next;    //p指向第一个结点
    //从第一个结点开始查找数据域为e的结点
    while(p!=NULL && p->data != e){
        p = p->next;
    }
    return p;           //找到后返回该结点指针,否则返回NULL
}

  • Finding the length of a singly linked list
    Length(LinkList L): counts the data nodes (excluding the head node); each node must be visited in turn starting from the first.
    Time complexity: O(n)
int Length(LinkList L){
    int len=0;       //统计表长
    LNode *p = L;
    while(p->next != NULL){
        p = p->next;
        len++;
    }
    return len;
}

2.3.6. Establishment of singly linked list

  1. Step 1: initialize a singly linked list
  2. Step 2: take one data element at a time and insert it at the tail/head of the list
  • Building a singly linked list by tail insertion
    Average time complexity: O(n)
    Idea: each new node is inserted at the end of the current list, so a tail pointer r is maintained that always points to the current tail node. Benefit: the order of nodes in the resulting list matches the input order.
// 使用尾插法建立单链表L
LinkList List_TailInsert(LinkList &L){   
    int x;			//设ElemType为整型int  
    L = (LinkList)malloc(sizeof(LNode));     //建立头结点(初始化空表)     
    LNode *s, *r = L;                        //r为表尾指针    
    scanf("%d", &x);                         //输入要插入的结点的值   
    while(x!=9999){                          //输入9999表示结束     
        s = (LNode *)malloc(sizeof(LNode));    
        s->data = x;           
        r->next = s;           
        r = s;                               //r指针指向新的表尾结点     
        scanf("%d", &x);       
    }    
    r->next = NULL;                          //尾结点指针置空      
    return L;
}

  • Building a singly linked list by head insertion
    Average time complexity: O(n). Each new node is inserted right after the head node, so the resulting list is in the reverse of the input order.
LinkList List_HeadInsert(LinkList &L){       //逆向建立单链表
    LNode *s;
    int x;
    L = (LinkList)malloc(sizeof(LNode));     //建立头结点
    L->next = NULL;                          //初始为空链表,这步不能少!

    scanf("%d", &x);                         //输入要插入的结点的值
    while(x!=9999){                          //输入9999表结束
        s = (LNode *)malloc(sizeof(LNode));  //创建新结点
        s->data = x;
        s->next = L->next;
        L->next = s;                         //将新结点插入表中,L为头指针
        scanf("%d", &x);   
    }
    return L;
   
}

  • Reversing a linked list
    Idea: the reversed list starts empty; nodes are "removed" from the original list one by one and inserted at the head of the reversed list (i.e., head-inserted into it), each becoming the new first node, until the original list is empty.
LNode *Inverse(LNode *L)
{
	LNode *p, *q;
	p = L->next;     //p指针指向第一个结点
	L->next = NULL;  //头结点指向NULL

	while (p != NULL){
		q = p;
		p = p->next;
		q->next = L->next;  
		L->next = q;
	}
	return L;
}

2.3.7. Double linked list

  • Description of node types in doubly linked list
typedef struct DNode{            //定义双链表结点类型
    ElemType data;               //数据域
    struct DNode *prior, *next;  //前驱和后继指针
}DNode, *DLinklist;

  • Initialization of double-linked list (leading node)
typedef struct DNode{            //定义双链表结点类型
    ElemType data;               //数据域
    struct DNode *prior, *next;  //前驱和后继指针
}DNode, *DLinklist;

//初始化双链表
bool InitDLinkList(DLinklist &L){
    L = (DNode *)malloc(sizeof(DNode));      //分配一个头结点
    if(L==NULL)                              //内存不足,分配失败
        return false;
    
    L->prior = NULL;   //头结点的prior指针永远指向NULL
    L->next = NULL;    //头结点之后暂时还没有结点
    return true;
}

void testDLinkList(){
    //初始化双链表
    DLinklist L;         // 定义指向头结点的指针L
    InitDLinkList(L);    //申请一片空间用于存放头结点,指针L指向这个头结点
    //...
}

//判断双链表是否为空
bool Empty(DLinklist L){
    if(L->next == NULL)    //判断头结点的next指针是否为空
        return true;
    else
        return false;
}

  • Insertion operation of double linked list
    InsertNextDNode(p, s): Insert s node after p node
bool InsertNextDNode(DNode *p, DNode *s){ //将结点 *s 插入到结点 *p之后
    if(p==NULL || s==NULL) //非法参数
        return false;
    
    s->next = p->next;
    if (p->next != NULL)   //p不是最后一个结点=p有后继结点  
        p->next->prior = s;
    s->prior = p;
    p->next = s;
    
    return true;
}
  • Deletion in a doubly linked list
    Delete the successor of node p:
//删除p结点的后继结点
bool DeletNextDNode(DNode *p){
    if(p==NULL) return false;
    DNode *q =p->next;            //找到p的后继结点q
    if(q==NULL) return false;     //p没有后继结点;
    p->next = q->next;
    if(q->next != NULL)           //q结点不是最后一个结点
        q->next->prior=p;
    free(q);

    return true;
}

//销毁一个双链表
bool DestroyList(DLinklist &L){
    //循环释放各个数据结点
    while(L->next != NULL){
        DeletNextDNode(L);  //删除头结点的后继结点
    }
    free(L); //释放头结点
    L=NULL;  //头指针指向NULL
    return true;
}

  • Traversal operation of double linked list
    forward traversal
while(p!=NULL){
    //对结点p做相应处理,eg打印
    p = p->prior;
}

backward traversal

while(p!=NULL){
    //对结点p做相应处理,eg打印
    p = p->next;
}

Note: a doubly linked list does not support random access; search by position and search by value can only be done by traversal, with time complexity O(n).

2.3.8. Circular linked list

  • Circular singly linked list
    The next pointer of the last node is not NULL; it points back to the head node.
typedef struct LNode{            
    ElemType data;               
    struct LNode *next;  
}LNode, *LinkList;

//初始化一个循环单链表
bool InitList(LinkList &L){
    L = (LNode *)malloc(sizeof(LNode)); //分配一个头结点
    if(L==NULL)             //内存不足,分配失败
        return false;
    L->next = L;            //头结点next指针指向头结点
    return true;
}

//判断循环单链表是否为空(终止条件为p或p->next是否等于头指针)
bool Empty(LinkList L){
    if(L->next == L)
        return true;    //为空
    else
        return false;
}

//判断结点p是否为循环单链表的表尾结点
bool isTail(LinkList L, LNode *p){
    if(p->next == L)
        return true;
    else
        return false;
}

Comparison of singly linked lists and circular singly linked lists:

  • Singly linked list: from a given node, only its successors can be reached. Most list operations happen at the head or the tail; with only a head pointer, finding the tail from the head costs O(n), so operating on the tail takes O(n) time.
  • Circular singly linked list: from any node, every other node can be reached. With a tail pointer, finding the head from the tail costs O(1), so operating on either the head or the tail takes only O(1) time.
  • Advantage of the circular singly linked list: any other node in the list can be found starting from any node.

  • Circular doubly linked list
    The head node's prior points to the tail node, and the tail node's next points to the head node.
typedef struct DNode{          
    ElemType data;               
    struct DNode *prior, *next;  
}DNode, *DLinklist;

//初始化空的循环双链表
bool InitDLinkList(DLinklist &L){
    L = (DNode *) malloc(sizeof(DNode));    //分配一个头结点
    if(L==NULL)            //内存不足,分配失败
        return false;  
    L->prior = L;          //头结点的prior指向头结点
    L->next = L;           //头结点的next指向头结点
    return true;
}

void testDLinkList(){
    //初始化循环双链表
    DLinklist L;
    InitDLinkList(L);
    //...
}

//判断循环双链表是否为空
bool Empty(DLinklist L){
    if(L->next == L)
        return true;
    else
        return false;
}

//判断结点p是否为循环双链表的表尾结点
bool isTail(DLinklist L, DNode *p){
    if(p->next == L)
        return true;
    else
        return false;
}

  • Insertion of circular linked list
bool InsertNextDNode(DNode *p, DNode *s){ 
    s->next = p->next;
    p->next->prior = s;
    s->prior = p;
    p->next = s;
    return true;
}
  • Circular linked list deletion
//删除p的后继结点q
p->next = q->next;
q->next->prior = p;
free(q);
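The circular doubly linked list operations above can be exercised end to end. Below is a minimal self-contained sketch; the helpers DeleteNextDNode and Length are illustrative additions, not part of the original notes:

```cpp
#include <stdlib.h>
#include <assert.h>

typedef int ElemType;

typedef struct DNode{
    ElemType data;
    struct DNode *prior, *next;
}DNode, *DLinklist;

// 初始化空的循环双链表
bool InitDLinkList(DLinklist &L){
    L = (DNode *)malloc(sizeof(DNode));     //分配一个头结点
    if(L == NULL) return false;
    L->prior = L;
    L->next = L;
    return true;
}

// 在结点p之后插入结点s
bool InsertNextDNode(DNode *p, DNode *s){
    s->next = p->next;
    p->next->prior = s;
    s->prior = p;
    p->next = s;
    return true;
}

// 删除p的后继结点(示意用的辅助函数)
bool DeleteNextDNode(DNode *p){
    DNode *q = p->next;
    if(q == p) return false;    //空表,无结点可删
    p->next = q->next;
    q->next->prior = p;
    free(q);
    return true;
}

// 统计表长,不含头结点(示意用的辅助函数)
int Length(DLinklist L){
    int n = 0;
    for(DNode *p = L->next; p != L; p = p->next) n++;
    return n;
}
```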

2.3.9. Static linked list

  • Singly linked list: each node is scattered in every corner of the memory, and each node has a pointer to the next node (the address of the next node in memory);

  • Static linked list: describes the linked storage structure of a linear list in array form: a whole piece of continuous memory space is allocated and the nodes are placed in it centrally; each node contains a data element and the array subscript (cursor) of the next node

  • Static linked list expressed in code
#define MaxSize 10        //静态链表的最大长度

struct Node{              //静态链表结构类型的定义
    ElemType data;        //存储数据元素
    int next;             //下一个元素的数组下标(游标)
};

//用数组定义多个连续存放的结点
void testSLinkList(){
    struct Node a[MaxSize];  //数组a作为静态链表, 每一个数组元素的类型都是struct Node
    //...
}

or it could be:

#define MaxSize 10        //静态链表的最大长度

typedef struct{           //静态链表结构类型的定义
    ElemType data;        //存储数据元素
    int next;             //下一个元素的数组下标
}SLinkList[MaxSize];

void testSLinkList(){
    SLinkList a;
}

Equivalent to:

#define MaxSize 10        //静态链表的最大长度

struct Node{              //静态链表结构类型的定义
    ElemType data;        //存储数据元素
    int next;             //下一个元素的数组下标(游标)
};

typedef struct Node SLinkList[MaxSize]; //重命名struct Node,用SLinkList定义“一个长度为MaxSize的Node型数组”
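A static linked list defined this way is traversed by following cursors instead of pointers. Below is a minimal self-contained sketch, assuming a[0] acts as the head node and -1 marks the end of the list (both conventions are illustrative, not fixed by the notes):

```cpp
#include <assert.h>

#define MaxSize 10
typedef int ElemType;

struct Node{              //静态链表结点
    ElemType data;        //存储数据元素
    int next;             //下一个元素的数组下标,-1表示表尾(示意约定)
};

typedef struct Node SLinkList[MaxSize];

// a[0]充当头结点,沿游标统计表长
int SLength(SLinkList a){
    int n = 0;
    for(int i = a[0].next; i != -1; i = a[i].next) n++;
    return n;
}

// 按逻辑顺序把表中元素拷入out数组,返回元素个数
int SToArray(SLinkList a, ElemType out[]){
    int n = 0;
    for(int i = a[0].next; i != -1; i = a[i].next)
        out[n++] = a[i].data;
    return n;
}
```

Note that the logical order (followed via cursors) is independent of the physical order of the array slots.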


2.3.10. Comparison between sequenced list and linked list

【Logical structure】

  • Sequence lists and linked lists are both linear lists and are linear structures

【Storage structure】

  • Sequence table: Sequential storage

    • Advantages: support random access, high storage density
    • Disadvantages: It is inconvenient to allocate a large continuous space, and it is inconvenient to change the capacity
  • Linked list: linked storage

    • Advantages: Discrete small spaces are easy to allocate and change capacity is convenient
    • Disadvantages: no random access, low storage density

[Basic Operation - Create]

  • Sequence table: A large continuous space needs to be pre-allocated. If the allocated space is too small, it will be inconvenient to expand the capacity later; if the allocated space is too large, memory resources will be wasted;
  • Static allocation: static array, the capacity cannot be changed
  • Dynamic allocation: dynamic array, the capacity can be changed, but a large number of elements need to be moved, and the time cost is high (malloc(), free())
  • Linked list: only need to allocate a head node or only declare a head pointer

【Basic Operation - Destroy】

  • Static array - the system automatically reclaims space
  • Dynamic allocation: dynamic array - requires manual free()

[Basic Operation - Add/Delete]

  • Sequence table: Inserting/deleting elements requires moving subsequent elements backward/forward; time complexity = O(n), and the time overhead mainly comes from moving elements;

  • Linked list: Inserting/deleting elements only needs to modify the pointer; time complexity = O(n), and the time overhead mainly comes from finding the target element

[Basic Operation - Check]

sequence table

  • Bitwise lookup: O(1)
  • Search by value: O(n), if the elements in the table are ordered, they can be found in O(log2n) time

linked list

  • Bitwise lookup: O(n)
  • Lookup by value: O(n)

Comparison of sequential, chained, static, and dynamic storage methods
Inherent characteristics of sequential storage:

  • The logical order is consistent with the physical order, essentially using arrays to store each element of the linear table ( that is, random access ); the storage density is high, and the storage space utilization rate is high.

Inherent features of chained storage:

  • The relationship between elements is represented by the "pointer" information of the nodes where these elements are located ( insertion and deletion do not need to move nodes ).

Inherent characteristics of static storage:

  • Do not consider the allocation of additional memory during the running of the program .

Inherent characteristics of dynamic storage:

  • Can dynamically allocate memory; effectively use memory resources to make the program scalable.

Chapter 3 Stacks and Queues

3.1. Stack

3.1.1. The basic concept of the stack

  • A restricted linear list that only allows insertion or deletion at one end (the top of the stack).
  • Last In First Out (LIFO).

  • Push order: a1 → a2 → a3 → a4 → a5
  • Pop order: a5 → a4 → a3 → a2 → a1 

3.1.2. Basic operation of the stack

  1. InitStack(&S): Initialize the stack. Construct an empty stack S and allocate memory space.
  2. DestroyStack(&S): Destroy the stack. Destroy stack S and release the memory space it occupies.
  3. Push(&S, x): push into the stack. If the stack S is not full, add x to make it the new top element of the stack.
  4. Pop(&S, &x): Pop out the stack. If the stack S is not empty, pop (delete) the top element of the stack and return with x.
  5. GetTop(S, &x): Read the top element of the stack. If the stack S is not empty, use x to return the top element of the stack.
  6. StackEmpty(S): empty. Determine whether a stack S is empty, if S is empty, return true, otherwise return false.

3.1.3. Sequential storage implementation of stack

[Definition of sequential stack]

#define MaxSize 10              //定义栈中元素的最大个数

typedef struct{
    ElemType data[MaxSize];     //静态数组存放栈中元素
    int top;                    //栈顶指针
}SqStack;

void testStack(){
    SqStack S;                 //声明一个顺序栈(分配空间)
                               //连续的存储空间大小为 MaxSize*sizeof(ElemType)
}

[ Initialization of the sequential stack ]

#define MaxSize 10
typedef struct{   
	ElemType data[MaxSize];    
    int top;
}SqStack;

// 初始化栈
void InitStack(SqStack &S){ 
    S.top = -1;                   //初始化栈顶指针
}

// 判断栈是否为空
bool StackEmpty(SqStack S){    
    if(S.top == -1)        
        return true;    
    else        
        return false;
}

[Push and pop the sequential stack]

// 新元素进栈
bool Push(SqStack &S, ElemType x){    // 判断栈是否已满    
    if(S.top == MaxSize - 1)        
        return false;    
    S.data[++S.top] = x;    
    return true;
}

// 出栈
bool Pop(SqStack &S, ElemType &x){    // 判断栈是否为空    
    if(S.top == -1)        
        return false;    
    x = S.data[S.top--];    
    return true;
}

[Read the top element of the stack] 

// 读栈顶元素
bool GetTop(SqStack S, ElemType &x){        
    if(S.top == -1)                
        return false;        
    x = S.data[S.top];        
    return true; 
}
  • Push operation: When the stack is not full, the top pointer of the stack is incremented by 1 first, and then the value is sent to the top element of the stack.S.data[++S.top] = x
  • Pop operation: When the stack is not empty, first take the value of the top element of the stack, and then decrement the top pointer of the stack by 1. x = S.data[S.top--]
  • Stack empty condition: S.top==-1
  • Stack full condition: S.top==MaxSize-1
  • Stack length: S.top+1
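The push/pop rules summarized above can be checked with a compact, self-contained version of the sequential stack (StackLength is an illustrative helper):

```cpp
#include <assert.h>

#define MaxSize 10
typedef int ElemType;

typedef struct{
    ElemType data[MaxSize];     //静态数组存放栈中元素
    int top;                    //栈顶指针
}SqStack;

void InitStack(SqStack &S){ S.top = -1; }

bool Push(SqStack &S, ElemType x){
    if(S.top == MaxSize - 1) return false;  //栈满
    S.data[++S.top] = x;                    //指针先加1,再入栈
    return true;
}

bool Pop(SqStack &S, ElemType &x){
    if(S.top == -1) return false;           //栈空
    x = S.data[S.top--];                    //先取值,指针再减1
    return true;
}

// 栈长 = S.top + 1(示意用的辅助函数)
int StackLength(SqStack S){ return S.top + 1; }
```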

[Shared stack (two stacks share the same space)]

  • Shared stack - special sequential stack
  • The bottom of the stack is designed at both ends of the shared space, and the top of the stack moves closer to the middle
#define MaxSize 10         //定义栈中元素的最大个数

typedef struct{
    ElemType data[MaxSize];       //静态数组存放栈中元素
    int top0;                     //0号栈栈顶指针
    int top1;                     //1号栈栈顶指针
}ShStack;

//初始化栈
void InitSqStack(ShStack &S){
    S.top0 = -1;        //初始化栈顶指针
    S.top1 = MaxSize;   
}
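The definition above only initializes the two stack-top pointers. The push operations can be sketched as follows; Push0/Push1 are illustrative names, and the key point is that both stacks share the single full condition top0 + 1 == top1:

```cpp
#include <assert.h>

#define MaxSize 10
typedef int ElemType;

typedef struct{
    ElemType data[MaxSize];       //静态数组存放栈中元素
    int top0;                     //0号栈栈顶指针
    int top1;                     //1号栈栈顶指针
}ShStack;

void InitSqStack(ShStack &S){
    S.top0 = -1;                  //0号栈栈底在下标0一端
    S.top1 = MaxSize;             //1号栈栈底在下标MaxSize-1一端
}

// 栈满条件:两个栈顶指针相邻
bool Push0(ShStack &S, ElemType x){
    if(S.top0 + 1 == S.top1) return false;
    S.data[++S.top0] = x;         //0号栈向高地址方向增长
    return true;
}

bool Push1(ShStack &S, ElemType x){
    if(S.top0 + 1 == S.top1) return false;
    S.data[--S.top1] = x;         //1号栈向低地址方向增长
    return true;
}
```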

3.1.4. Stack chain storage

【Definition of chain stack】

  • Definition: A stack that uses chain storage is called a chain stack.
  • Advantages: The advantage of the chain stack is that it is convenient for multiple stacks to share storage space and improve their efficiency, and there is no stack overflow.
  • Features: Pushing and popping can only be done at the top end of the stack (the head of the chain is the top of the stack)

The head of the linked list is used as the top of the stack, which means:

  • 1. When implementing the data "pushing" operation, the data needs to be inserted from the head of the linked list;
  • 2. When implementing the data "popping" operation, it is necessary to delete the head node at the head of the linked list;

Therefore, a chain stack is essentially a linked list that inserts and deletes only by head insertion.
The chain storage structure of the stack can be described as:

【Definition of chain stack】

typedef struct Linknode{        
    ElemType data;        //数据域    
    Linknode *next;       //指针域
}Linknode,*LiStack;

void testStack(){   
    LiStack L;            //声明一个链栈
}

[Initialization of chain stack]

typedef struct Linknode{       
    ElemType data;      
    Linknode *next;
}Linknode,*LiStack;

// 初始化栈
bool InitStack(LiStack &L){    
    L = (Linknode *)malloc(sizeof(Linknode));   
    if(L == NULL)             
        return false;   
    L->next = NULL;    
    return true;
}

// 判断栈是否为空
bool isEmpty(LiStack &L){    
    if(L->next == NULL)      
        return true;   
    else           
        return false;
}

【Push in and out of stack】

// 新元素入栈
bool pushStack(LiStack &L,ElemType x){  
    Linknode *s = (Linknode *)malloc(sizeof(Linknode));  
    if(s == NULL)         
        return false;   
    s->data = x;     
    // 头插法      
    s->next = L->next;  
    L->next = s;     
    return true;
}

// 出栈
bool popStack(LiStack &L, ElemType &x){     
    // 栈空不能出栈  
    if(L->next == NULL)     
        return false;    
    Linknode *s = L->next;  
    x = s->data;       
    L->next = s->next;
    free(s);       
    return true;
}
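Putting the chain-stack operations together, a quick self-test confirms the LIFO behavior (head insertion means the most recently pushed element pops first):

```cpp
#include <stdlib.h>
#include <assert.h>

typedef int ElemType;

typedef struct Linknode{
    ElemType data;        //数据域
    struct Linknode *next;//指针域
}Linknode, *LiStack;

bool InitStack(LiStack &L){
    L = (Linknode *)malloc(sizeof(Linknode));  //带头结点
    if(L == NULL) return false;
    L->next = NULL;
    return true;
}

// 头插法入栈:链头即栈顶
bool pushStack(LiStack &L, ElemType x){
    Linknode *s = (Linknode *)malloc(sizeof(Linknode));
    if(s == NULL) return false;
    s->data = x;
    s->next = L->next;
    L->next = s;
    return true;
}

// 出栈:删除链头结点
bool popStack(LiStack &L, ElemType &x){
    if(L->next == NULL) return false;   //栈空
    Linknode *s = L->next;
    x = s->data;
    L->next = s->next;
    free(s);
    return true;
}
```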

3.2. Queue

3.2.1. Basic concept of queue

  • A restricted linear table that only allows insertion at one end of the table (tail) and deletion at the other end (head) of the table.
  • Features: First In First Out (FIFO) - the elements that enter the queue first leave it first.

3.2.2. Basic operation of the queue

  1. InitQueue(&Q): Initialize the queue and construct an empty queue Q.
  2. QueueEmpty(Q): Judge whether the queue is empty; return true if queue Q is empty, otherwise return false.
  3. EnQueue(&Q, x): Enqueue; if queue Q is not full, add x as the new tail element.
  4. DeQueue(&Q, &x): Dequeue; if queue Q is not empty, delete the head element and return it in x.
  5. GetHead(Q, &x): Read the head element; if queue Q is not empty, return the head element in x.
  6. DestroyQueue(&Q): Destroy the queue and release the memory space occupied by queue Q.

3.2.3. Sequential storage implementation of queue

  • Head pointer: pointing to the element at the head of the queue
  • Tail pointer: points to the next position of the element at the end of the queue

[Definition of sequential queue]

#define MaxSize 10     //定义队列中元素的最大个数

typedef struct{     
    ElemType data[MaxSize];   //用静态数组存放队列元素     
    int front, rear;          //队头指针和队尾指针
}SqQueue;

void testQueue(){     
    SqQueue Q;                //声明一个队列
}

 [ Initialization of sequential queue]

#define MaxSize 10

typedef struct{   
    ElemType data[MaxSize];  
    int front, rear;
}SqQueue;

// 初始化队列
void InitQueue(SqQueue &Q){    
    // 初始化时,队头、队尾指针指向0   
    // 队尾指针指向的是即将插入数据的数组下标  
    // 队头指针指向的是队头元素的数组下标
    Q.rear = Q.front = 0;
}

// 判断队列是否为空
bool QueueEmpty(SqQueue Q){     
    if(Q.rear == Q.front)            
        return true;   
    else          
        return false;
}

[Enqueue and dequeue (circular queue)]

// 新元素入队
bool EnQueue(SqQueue &Q, ElemType x){       
    // 如果队列已满直接返回
    if((Q.rear+1)%MaxSize == Q.front) 	//牺牲一个单元区分队空和队满   
        return false;    
    Q.data[Q.rear] = x;   
    Q.rear = (Q.rear+1)%MaxSize; 
    return true;
}

// 出队
bool DeQueue(SqQueue &Q, ElemType &x){    
    // 如果队列为空直接返回    
    if(Q.rear == Q.front)  
        return false;     
    x = Q.data[Q.front];  
    Q.front = (Q.front+1)%MaxSize;
    return true;
}

 【Get the queue head element】

// 获取队头元素并存入x
bool GetHead(SqQueue &Q, ElemType &x){
    if(Q.rear == Q.front)      
        return false;
    x = Q.data[Q.front];  
    return true;
}
  • The circular queue cannot use Q.rear == Q.front as the condition for empty judgment, because this condition is also met when the queue is full, which will conflict with the empty judgment!

Solution 1: Sacrifice a unit to distinguish between empty and full queues, that is, (Q.rear+1)%MaxSize == Q.front is used as the condition for judging whether the queue is full. (Mainstream method)
Solution 2: Set the size variable to record the queue length.

#define MaxSize 10 

typedef struct{   
    ElemType data[MaxSize]; 
    int front, rear;    
    int size;
}SqQueue;

// 初始化队列
void InitQueue(SqQueue &Q){ 
    Q.rear = Q.front = 0;   
    Q.size = 0;
}

// 判断队列是否为空
bool QueueEmpty(SqQueue Q){     
    if(Q.size == 0)      
        return true;   
    else       
        return false;
}

// 新元素入队
bool EnQueue(SqQueue &Q, ElemType x){ 
    if(Q.size == MaxSize)    
        return false;
    Q.size++; 
    Q.data[Q.rear] = x; 
    Q.rear = (Q.rear+1)%MaxSize;  
    return true;
}

// 出队
bool DeQueue(SqQueue &Q, ElemType &x){   
    if(Q.size == 0)        
        return false;
    Q.size--;
    x = Q.data[Q.front]; 
    Q.front = (Q.front+1)%MaxSize; 
    return true;
}

 Solution 3 : Set the tag variable to record the latest operations of the queue. ( tag=0: the most recent operation is a delete operation; tag=1 : the most recent operation is an insertion operation)

#define MaxSize 10   

typedef struct{    
    ElemType data[MaxSize]; 
    int front, rear;        
    int tag;
}SqQueue;

// 初始化队列
void InitQueue(SqQueue &Q){    
    Q.rear = Q.front = 0;   
    Q.tag = 0;
}

// 判断队列是否为空,只有tag==0即出队的时候才可能为空
bool QueueEmpty(SqQueue Q){  
    if(Q.front == Q.rear && Q.tag == 0)    
        return true;   
    else       
        return false;
}

// 新元素入队
bool EnQueue(SqQueue &Q, ElemType x){
    if(Q.rear == Q.front && Q.tag == 1)     
        return false;     
    Q.data[Q.rear] = x; 
    Q.rear = (Q.rear+1)%MaxSize;  
    Q.tag = 1;  
    return true;
}

// 出队
bool DeQueue(SqQueue &Q, ElemType &x){
    if(Q.rear == Q.front && Q.tag == 0)  
        return false;   
    x = Q.data[Q.front];
    Q.front = (Q.front+1)%MaxSize; 
    Q.tag = 0;     
    return true;
}
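The sacrifice-one-unit scheme (Solution 1) can be verified with a small self-test that forces the rear pointer to wrap around; MaxSize is shrunk to 4 here purely for illustration, so at most 3 elements fit:

```cpp
#include <assert.h>

#define MaxSize 4        //牺牲一个单元,实际最多存3个元素(示意取值)
typedef int ElemType;

typedef struct{
    ElemType data[MaxSize];
    int front, rear;
}SqQueue;

void InitQueue(SqQueue &Q){ Q.rear = Q.front = 0; }

bool EnQueue(SqQueue &Q, ElemType x){
    if((Q.rear + 1) % MaxSize == Q.front) return false;  //队满
    Q.data[Q.rear] = x;
    Q.rear = (Q.rear + 1) % MaxSize;                     //队尾指针循环后移
    return true;
}

bool DeQueue(SqQueue &Q, ElemType &x){
    if(Q.rear == Q.front) return false;                  //队空
    x = Q.data[Q.front];
    Q.front = (Q.front + 1) % MaxSize;                   //队头指针循环后移
    return true;
}
```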

3.2.4. Linked storage implementation of queue

[Definition of chain queue]

// 链式队列结点
typedef struct LinkNode{  
    ElemType data;    
    struct LinkNode *next;
}LinkNode;

// 链式队列
typedef struct{       
    // 头指针和尾指针  
    LinkNode *front, *rear;
}LinkQueue;

[Initialization of the chain queue (with head node)]

typedef struct LinkNode{    
    ElemType data;     
    struct LinkNode *next;
}LinkNode;

typedef struct{    
    LinkNode *front, *rear;
}LinkQueue;

// 初始化队列
void InitQueue(LinkQueue &Q){   
    // 初始化时,front、rear都指向头结点 
    Q.front = Q.rear = (LinkNode *)malloc(sizeof(LinkNode));  
    Q.front -> next = NULL;
}

// 判断队列是否为空
bool IsEmpty(LinkQueue Q){ 
    if(Q.front == Q.rear)     
        return true;      
    else         
        return false;
}

[Enqueue and dequeue (with head node)]

// 新元素入队
void EnQueue(LinkQueue &Q, ElemType x){ 
    LinkNode *s = (LinkNode *)malloc(sizeof(LinkNode)); 
    s->data = x;  
    s->next = NULL; 
    Q.rear->next = s;  
    Q.rear = s;
}

// 队头元素出队
bool DeQueue(LinkQueue &Q, ElemType &x){   
    if(Q.front == Q.rear)         
        return false;    
    LinkNode *p = Q.front->next; 
    x = p->data;   
    Q.front->next = p->next; 
    // 如果p是最后一个结点,则修改队尾指针指向头结点  
    if(Q.rear == p)          
        Q.rear = Q.front;   
    free(p);     
    return true;
}

[Chain queue operations without a head node]

typedef struct LinkNode{   
    ElemType data;  
    struct LinkNode *next;
}LinkNode;

typedef struct{   
    LinkNode *front, *rear;
}LinkQueue;

// 初始化队列
void InitQueue(LinkQueue &Q){ 
    // 不带头结点的链队列初始化,头指针和尾指针都指向NULL
    Q.front = NULL;   
    Q.rear = NULL;
}

// 判断队列是否为空
bool IsEmpty(LinkQueue Q){ 
    if(Q.front == NULL)   
        return true;      
    else             
        return false;
}

// 新元素入队
void EnQueue(LinkQueue &Q, ElemType x){ 
    LinkNode *s = (LinkNode *)malloc(sizeof(LinkNode));  
    s->data = x;   
    s->next = NULL; 
    // 第一个元素入队时需要特别处理   
    if(Q.front == NULL){
        Q.front = s;    
        Q.rear = s; 
    }else{
        Q.rear->next = s;
        Q.rear = s;
    }
}

//队头元素出队
bool DeQueue(LinkQueue &Q, ElemType &x){
    if(Q.front == NULL)
        return false;
    LinkNode *s = Q.front;
    x = s->data;
    if(Q.front == Q.rear){
        Q.front = Q.rear = NULL;
    }else{
        Q.front = Q.front->next;
    }
    free(s);
    return true;
}

3.2.5. Double-ended queue

double-ended queue definition 

  • A double-ended queue is a linear list that allows insertion at both ends and deletion at both ends.
  • If only the insertion and deletion operations at one end are used, it is equivalent to a stack.
  • Input-restricted double-ended queue: a linear list that allows insertion at one end and deletion at both ends.
  • Output-limited double-ended queue: a linear list that allows insertion at both ends and deletion at one end.

Double-ended queue test point: judging the legality of an output sequence

  • Example: the input sequence of data elements is 1, 2, 3, 4; judge the legality of each of the 4! = 24 possible output sequences.
    Input-restricted double-ended queue: only 4213 and 4231 are illegal.
    Output-restricted double-ended queue: only 4132 and 4231 are illegal.

3.3. Application of stack and queue

3.3.1 Application of stack in bracket matching

  • Use a stack to implement parenthesis matching:
    1. The last opening parenthesis is matched first (stack property - LIFO).
    2. When a left parenthesis is encountered, it is pushed onto the stack.
    3. When a closing parenthesis is encountered, a left parenthesis is "consumed" (popped).
  • Match failure cases:
    1. If a right bracket is scanned while the stack is empty, that right bracket is unmatched.
    2. After scanning the whole string, if the stack is not empty, some left bracket is unmatched.
    3. The popped left bracket and the scanned right bracket are of different types.
#define MaxSize 10 
typedef struct{    
    char data[MaxSize];   
    int top;
}SqStack;

void InitStack(SqStack &S);
bool StackEmpty(SqStack &S);
bool Push(SqStack &S, char x);
bool Pop(SqStack &S, char &x);

// 判断长度为length的字符串str中的括号是否匹配
bool bracketCheck(char str[], int length){ 
    SqStack S;      
    InitStack(S); 
    // 遍历str    
    for(int i=0; i<length; i++){   
        // 扫描到左括号,入栈     
        if(str[i] == '(' || str[i] == '[' || str[i] == '{'){    
            Push(S, str[i]);        
        }else{              
            // 扫描到右括号且栈空直接返回   
            if(StackEmpty(S))      
                return false;       
            char topElem;          
            // 用topElem接收栈顶元素   
            Pop(S, topElem);          
            // 括号不匹配           
            if(str[i] == ')' && topElem != '(' ) 
                return false;           
            if(str[i] == ']' && topElem != '[' )  
                return false;   
            if(str[i] == '}' && topElem != '{' )   
                return false;
        }
    }
    // 扫描完毕若栈空则说明字符串str中括号匹配    
    return StackEmpty(S);
}
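The bracketCheck above relies on externally declared stack routines. For reference, here is a self-contained variant with the stack inlined (CharStack is an illustrative name):

```cpp
#include <assert.h>

#define MaxSize 100

typedef struct{
    char data[MaxSize];
    int top;
}CharStack;

// 判断长度为length的字符串str中的括号是否匹配
bool bracketCheck(const char str[], int length){
    CharStack S;
    S.top = -1;
    for(int i = 0; i < length; i++){
        char c = str[i];
        if(c == '(' || c == '[' || c == '{'){
            S.data[++S.top] = c;             //左括号入栈
        }else if(c == ')' || c == ']' || c == '}'){
            if(S.top == -1) return false;    //右括号多余
            char topElem = S.data[S.top--];  //出栈栈顶左括号
            if(c == ')' && topElem != '(') return false;
            if(c == ']' && topElem != '[') return false;
            if(c == '}' && topElem != '{') return false;
        }
    }
    return S.top == -1;                      //栈空才说明全部匹配
}
```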

3.3.2. Application of stack in expression evaluation 

  • Infix expression: the ordinary way of writing arithmetic or logical formulas, with each operator placed between its operands. Infix expressions are inconvenient for computers to evaluate directly, so an infix expression is usually converted to a prefix or postfix expression before evaluation.
  • Prefix expression (Polish expression): The operator of a prefix expression precedes the two operands.
  • Postfix Expression (Reverse Polish Expression): The operator of a postfix expression comes after the two operands.

Convert infix expression to postfix expression - manual method
Step 1: Determine the operation order of each operator in the infix expression.
Step 2: Take the next operator and combine it with its operands into a new operand, in the form [left operand] [right operand] [operator].
Step 3: If there are still operators left unprocessed, go back to step 2.

"Left priority" principle: whenever the operator on the left can be computed first, compute it first (this guarantees the conversion result is unique).

中缀:A + B - C * D / E + F
       ①   ④   ②   ③   ⑤     
后缀:A B + C D * E / - F +

Calculation of postfix expressions - manual method:
Scan from left to right; whenever an operator is encountered, the two nearest operands in front of it perform the corresponding operation and are combined into one operand.

Calculation of postfix expressions - by computer:
Use a stack to evaluate the postfix expression (the stack holds operands whose operation order cannot be determined yet).
Step 1: Scan the next element from left to right until all elements are processed.
Step 2: If an operand is scanned, push it onto the stack and return to step 1; otherwise, go to step 3.
Step 3: If an operator is scanned, pop two elements from the top of the stack, perform the corresponding operation, push the result back onto the stack, and return to step 1.
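The stack-based evaluation steps can be sketched directly. The following postfix evaluator is a minimal illustration that assumes single-digit operands and no input validation:

```cpp
#include <ctype.h>
#include <assert.h>

#define MaxSize 50

// 用操作数栈计算后缀表达式的值(操作数限定为一位数字,示意用)
int EvalPostfix(const char expr[]){
    int stack[MaxSize];
    int top = -1;
    for(int i = 0; expr[i] != '\0'; i++){
        char c = expr[i];
        if(isdigit((unsigned char)c)){
            stack[++top] = c - '0';          //操作数入栈
        }else{
            int b = stack[top--];            //先弹出的是右操作数
            int a = stack[top--];            //后弹出的是左操作数
            int r = 0;
            switch(c){
                case '+': r = a + b; break;
                case '-': r = a - b; break;
                case '*': r = a * b; break;
                case '/': r = a / b; break;
            }
            stack[++top] = r;                //运算结果压回栈顶
        }
    }
    return stack[top];                       //栈中最后剩下的就是表达式的值
}
```

Note the pop order: the first pop yields the right operand, which matters for - and /.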

Infix expression to postfix expression (by computer):
Initialize a stack to hold operators whose operation order cannot be determined yet, then process each element from left to right until the end. Three cases are possible:
1. Operand: append it to the postfix expression directly.
2. Delimiter: push "(" onto the stack directly; on ")", pop operators from the stack and append them to the postfix expression one by one until "(" is popped. Note: "(" itself is never appended to the postfix expression.
3. Operator: pop every operator on the stack whose priority is higher than or equal to the current operator and append it to the postfix expression, stopping when "(" is met or the stack becomes empty; then push the current operator onto the stack.

#define MaxSize 40 
typedef struct{     
    char data[MaxSize];   
    int top;
}SqStack;

typedef struct{  
    char data[MaxSize];  
    int front,rear;
}SqQueue;

void InitStack(SqStack &S);
bool StackEmpty(SqStack S);
bool Push(SqStack &S, char x);
bool Pop(SqStack &S, char &x);
void InitQueue(SqQueue &Q);
bool EnQueue(SqQueue &Q, char x);
bool DeQueue(SqQueue &Q, char &x);
bool QueueEmpty(SqQueue Q);

// 判断元素ch是否入栈
int JudgeEnStack(SqStack &S, char ch){
    char tp = S.data[S.top];   
    // 如果ch是a~z则返回-1    
    if(ch >= 'a' && ch <= 'z')   
        return -1;    
    // 如果ch是+、-、*、/且栈顶元素优先级大于等于ch则返回0  
    else if(ch == '+' && (tp == '+' || tp == '-' || tp == '*' || tp == '/'))   
        return 0;     
    else if(ch == '-' && (tp == '+' || tp == '-' || tp == '*' || tp == '/'))   
        return 0;  
    else if(ch == '*' && (tp == '*' || tp == '/'))  
        return 0;    
    else if(ch == '/' && (tp == '*' || tp == '/'))     
        return 0;    
    // 如果ch是右括号则返回2   
    else if(ch == ')')      
        return 2;     
    // 其他情况ch入栈,返回1   
    else return 1;
}

// 中缀表达式转后缀表达式
int main(int argc, char const *argv[]) {  
    SqStack S;     
    SqQueue Q;	 
    InitStack(S); 
    InitQueue(Q);  
    char ch;	  
    printf("请输入表达式,以“#”结束:");  
    scanf("%c", &ch);   
    while (ch != '#'){  
        // 当栈为空时     
        if(StackEmpty(S)){ 
            // 如果输入的是数即a~z,直接入队 
            if(ch >= 'a' && ch <= 'z')               
                EnQueue(Q, ch);      	
            // 如果输入的是运算符,直接入栈    
            else                      
                Push(S, ch);       
        }else{                
            // 当栈非空时,判断ch是否需要入栈 
            int n = JudgeEnStack(S, ch);     
            // 当输入是数字时直接入队      	
            if(n == -1){        	    
                EnQueue(Q, ch);        
            }else if(n == 0){       
                // 当输入是运算符且运算符优先级不高于栈顶元素时    
                while (1){         
                    // 取栈顶元素入队    
                    char tp;        
                    Pop(S, tp);      
                    EnQueue(Q, tp);         
                    // 再次判断是否需要入栈     
                    n = JudgeEnStack(S, ch);
                    // 当栈头优先级低于输入运算符或者栈头为‘)’时,入栈并跳出循环  
                    if(n != 0){           
                        Push(S, ch);           
                        break;              
                    }                   
                }            
            }else if(n == 2){  
                // 当出现‘)’时 将()中间的运算符全部出栈入队   
                while(1){                
                    char tp;                
                    Pop(S, tp);             
                    if(tp == '(')          
                        break;        
                    else            
                        EnQueue(Q, tp);    
                }             
            }else{        
                // 当运算符优先级高于栈顶元素或出现‘(’时直接入栈     
                Push(S, ch);         
            }          
        }         
        scanf("%c", &ch);   
    }     
    // 将最后栈中剩余的运算符出栈入队 
    while (!StackEmpty(S)){	  
        char tp;            
        Pop(S, tp);      
        EnQueue(Q, tp);  
    }      
    // 输出队中元素 
    while (!QueueEmpty(Q)){    
        char tp;
        DeQueue(Q, tp);
        printf("%c ", tp);
    }    
    return 0;
}

Use two stacks to evaluate an infix expression directly:
     1. Initialize two stacks: an operand stack and an operator stack;
     2. If an operand is scanned, push it onto the operand stack;
     3. If an operator or delimiter is scanned, push or pop the operator stack following the same logic as "infix to postfix" conversion. Whenever an operator is popped, also pop the top two elements of the operand stack, perform the corresponding operation, and push the result back onto the operand stack.

3.3.3. Application of stack in recursion

Features of function calls: the last called function is executed first (LIFO)

When the function is called, a stack storage is required:

  • call return address
  • Arguments
  • local variable

When calling recursively, the function call stack is called "recursive working stack":

  • Every time a level of recursion is entered, the information required for the recursive call is pushed onto the top of the stack;
  • Every time a level of recursion is exited, the corresponding information is popped from the top of the stack;

Disadvantage: too many levels of recursion may cause stack overflow. Recursion suits problems that can be converted into subproblems with the same structure but a smaller scale.
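As an illustration of the recursive working stack, the factorial below is computed once recursively (the system stack saves each call's parameter) and once with an explicit stack that simulates it; FactIter is an illustrative name, not from the original notes:

```cpp
#include <assert.h>

// 递归求阶乘:每一层调用的参数n都保存在系统的函数调用栈中
long Fact(int n){
    if(n <= 1) return 1;               //递归出口
    return n * Fact(n - 1);            //规模更小的同类子问题
}

// 用显式栈模拟递归工作栈,消除递归
long FactIter(int n){
    int stack[64];
    int top = -1;
    while(n > 1) stack[++top] = n--;   //“入栈”:逐层保存参数
    long r = 1;
    while(top >= 0) r *= stack[top--]; //“出栈”:逐层回退并计算
    return r;
}
```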

3.3.4. Application of Queue

  1. Queue Application: Hierarchical Traversal of Trees
  2. Queue Application: Breadth-First Traversal of Graphs
  3. Queue application: when multiple processes in an operating system compete for limited system resources, the first-come-first-served (FCFS) algorithm is a common scheduling strategy.

3.4. Compressed storage of special matrices 

3.4.1  Storage of arrays

Storage of one-dimensional arrays: all array elements have the same size and are stored contiguously in memory. Let the starting address be LOC; then the storage address of element a[i] = LOC + i * sizeof(ElemType) (0≤i<10).

Storage of two-dimensional arrays: for a two-dimensional array b[M][N] with M rows and N columns and starting address LOC:
  1. With row-major storage, the address of b[i][j] = LOC + (i*N + j) * sizeof(ElemType).
  2. With column-major storage, the address of b[i][j] = LOC + (j*M + i) * sizeof(ElemType).
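The row-major formula can be checked against a real C two-dimensional array, since C itself stores arrays row-major; M, N, and RowMajorAddr below are illustrative:

```cpp
#include <stdint.h>
#include <stddef.h>
#include <assert.h>

#define M 3
#define N 4

// 行优先存储下 b[i][j] 的地址 = LOC + (i*N + j) * sizeof(ElemType)
uintptr_t RowMajorAddr(uintptr_t loc, int i, int j, size_t elemSize){
    return loc + (uintptr_t)(i * N + j) * elemSize;
}
```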

 

3.4.2 Compressed storage of symmetric matrices

Compressed storage of symmetric matrices: if every element of an n-order square matrix satisfies a_{i,j}=a_{j,i}, the matrix is a symmetric matrix. For a symmetric matrix, only the main diagonal + lower triangular area needs to be stored. Storing these elements in a one-dimensional array B in row-major order, with a_{i,j} stored in B[k], the array B has \frac{n(n+1)}{2} elements in total. For k (0-indexed):

k=\left\{\begin{matrix}\frac{i(i-1)}{2}+j-1, & i\geqslant j \\ \frac{j(j-1)}{2}+i-1, & i< j \end{matrix}\right.
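The mapping from a_{i,j} to the array index k can be written as a small function; SymIndex is an illustrative name, and indices follow the 1-based matrix / 0-based array convention used above:

```cpp
#include <assert.h>

// 对称矩阵压缩存储:行优先存放主对角线+下三角区
// a_{i,j} (1<=i,j<=n) 存入B[k],返回k(0起始)
int SymIndex(int i, int j){
    if(i < j){ int t = i; i = j; j = t; }   //a_{i,j}=a_{j,i},统一映射到下三角
    return i*(i-1)/2 + j - 1;               //前i-1行共i(i-1)/2个元素
}
```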

3.4.3  Compressed storage of triangular matrices

  1. Lower triangular matrix: Except for the main diagonal and the lower triangular area, the rest of the elements are the same.

  2. Upper triangular matrix: Except for the main diagonal and the upper triangular area, the rest of the elements are the same.

  3. Compressed storage strategy (for a lower triangular matrix): store the main diagonal + lower triangle in a one-dimensional array in row-major order, and store the constant in the last position. With a_{i,j} stored in B[k], the array B has \frac{n(n+1)}{2}+1 elements in total. For k (0-indexed):

k=\left\{\begin{matrix} \frac{i(i-1)}{2}+j-1, & i\geqslant j \\ \frac{n(n+1)}{2}, & i<j \end{matrix}\right.

 

3.4.4 Compressed storage of tridiagonal matrices

A tridiagonal matrix, also called a band matrix: when |i-j|>1, a_{i,j}=0 (1\leqslant i,j\leqslant n). For a tridiagonal matrix, store only the band part in row-major order, i.e., a_{i,j} is stored in B[k] with k = 2i + j - 3. Conversely, given the array index k, the row index is i=\left \lfloor (k+1)/3 \right \rfloor + 1 and j = k - 2i + 3.
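Likewise, the band-matrix mapping and its inverse can be sketched as functions (TriDiagIndex and TriDiagRow are illustrative names):

```cpp
#include <assert.h>

// 三对角矩阵压缩存储:按行优先只存带状部分
// a_{i,j} (|i-j|<=1, 1<=i,j<=n) 存入B[k],返回k(0起始)
int TriDiagIndex(int i, int j){
    return 2*i + j - 3;            //前i-1行共2+3(i-2)个元素,行内偏移j-i+1
}

// 由下标k反求行号i,即 i = ⌊(k+1)/3⌋ + 1
int TriDiagRow(int k){
    return (k + 1) / 3 + 1;
}
```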

3.4.5 Compressed storage of sparse matrices

In a sparse matrix, the number of non-zero elements is far smaller than the total number of matrix elements. Compressed storage strategies:

  • Sequential storage: triples <row, column, value>

  • Linked storage: the cross (orthogonal) linked list method 

Chapter 4 Strings

4.1. Basic concepts of strings

  • A string (String) is a finite sequence of zero or more characters, generally written S='a1a2......an' (n>=0).
  • Here S is the string name and the character sequence in quotes is the string's value; each ai may be a letter, digit, or other character; the number n of characters in the string is called the string's length. A string with n = 0 is called the empty string.

Example:

S='HelloWorld!'
T='iPhone 11 Pro Max?'
  • Substring: a subsequence formed by any number of consecutive characters of a string.
  • Main string: a string that contains the substring.
  • Position of a character in the main string: the ordinal number of the character in the string. Position of a substring in the main string: the position of the substring's first character in the main string.
  • A string is a special linear list; its data elements are in a linear relationship.
  • The data objects of a string are restricted to character sets (e.g., Chinese characters, English letters, digits, punctuation marks).
  • The basic operations of a string, such as insertion, deletion, update, and search, usually take substrings as the operation object.

4.2. Basic operations on strings

Assume strings T = '', S = 'iPhone 11 Pro Max?', W = 'Pro'

  1. StrAssign(&T, chars): assignment; set string T to the value chars.
  2. StrCopy(&T, S): copy; copy string S to obtain string T.
  3. StrEmpty(S): emptiness test; return TRUE if S is the empty string, otherwise FALSE.
  4. StrLength(S): string length; return the number of elements in string S.
  5. ClearString(&S): clear; make S the empty string.
  6. DestroyString(&S): destroy; destroy string S (reclaim its storage space).
  7. Concat(&T, S1, S2): concatenation; use T to return the new string formed by joining S1 and S2.
  8. SubString(&Sub, S, pos, len): take a substring; use Sub to return the substring of S of length len starting at character pos.
  9. Index(S, T): locate; if the main string S contains a substring equal to T, return the position of its first occurrence in S, otherwise return 0.
  10. StrCompare(S, T): comparison, following dictionary order; if S > T, the return value > 0; if S = T, the return value = 0 (the two strings must be exactly equal); if S < T, the return value < 0.

4.3. String storage implementation

4.3.1  Static array implementation

Static array implementation (fixed-length sequential storage) 

#define MAXLEN 255   //预定义最大串长为255

typedef struct{
    char ch[MAXLEN];   //静态数组实现(定长顺序存储)
                       //每个分量存储一个字符
                       //每个char字符占1B
    int length;        //串的实际长度
}SString;

 Dynamic array implementation (heap allocated storage)

typedef struct{
    char *ch;          //按串长分配存储区,ch指向串的基地址
    int length;        //串的实际长度
}HString;
HString S;
S.ch = (char*)malloc(MAXLEN *sizeof(char));
S.length = 0;

4.3.2  Implementation of basic operations

#define MAXLEN 255

typedef struct{
    char ch[MAXLEN];   
    int length;       
}SString;

// 1. 求子串
bool SubString(SString &Sub, SString S, int pos, int len){
    //子串范围越界
    if (pos+len-1 > S.length)
        return false;
    
    for (int i=pos; i<pos+len; i++)
        Sub.ch[i-pos+1] = S.ch[i];
    
    Sub.length = len;

    return true;
}

// 2. 比较两个串的大小
int StrCompare(SString S, SString T){
    for (int i=1; i<=S.length && i<=T.length; i++){
        if(S.ch[i] != T.ch[i])
            return S.ch[i] - T.ch[i];
    }
    //扫描过的所有字符都相同,则长度长的串更大
    return S.length - T.length;
}

// 3. 定位操作
int Index(SString S, SString T){
    int i=1;
    int n = StrLength(S);    //主串长度
    int m = StrLength(T);    //模式串长度
    SString Sub;             //用于暂存子串

    while(i<=n-m+1){
        SubString(Sub,S,i,m);
        if(StrCompare(Sub,T)!=0)
            ++i;
        else 
            return i;    // 返回子串在主串中的位置
    }
    return 0;            //S中不存在与T相等的子串
}

4.4. Naive pattern matching of strings

  • String pattern matching: Find the same substring as the pattern string in the main string, and return its position in the main string.

Naive pattern matching algorithm (simple pattern matching algorithm) idea:

  • Extract the substrings of the same length as the pattern string in the main string, and compare them with the pattern string one by one. When the substring does not match a corresponding character in the pattern string, immediately abandon the current substring and search for the next substring.
  • If the length of the pattern string is m and the length of the main string is n, at most (n-m+1)*m comparisons are needed before the match succeeds or fails. Worst-case time complexity: O(nm)
  • Worst case: the first m-1 characters of each substring match the pattern string, only the mth character does not match.
  • Better case: the first character of each substring does not match the pattern string

String simple pattern matching algorithm code implementation:

// 在主串S中找到与模式串T相同的子串并返回其位序,否则返回0
int Index(SString S, SString T){   
    int k=1;    
    int i=k, j=1;  
    while(i<=S.length && j<=T.length){    
        if(S.ch[i] == T.ch[j]){     
            ++i; ++j; 
        }else{        
            k++; i=k; j=1; 
        }   
    }   
    if(j>T.length) 
        return k;   
    else       
        return 0;
}

Or, without the auxiliary variable k:

int Index(SString S, SString T){
    int i=1;                //扫描主串S
    int j=1;                //扫描模式串T
    while(i<=S.length && j<=T.length){
        if(S.ch[i] == T.ch[j]){
            ++i;
            ++j;             //继续比较后继字符
        }
        else{
            i = i-j+2;
            j=1;             //指针后退重新开始匹配
        }
    }
    if(j>T.length)
        return i-T.length;
    else
        return 0;
}

Time complexity: Let the length of the pattern string be m, and the length of the main string be n

  • The best time complexity for a successful match:O(m)
  • Best time complexity for matching failures:O(n)
  • Worst time complexity:O(mn)

4.5. KMP algorithm

algorithm thinking

  • The drawback of the naive pattern matching algorithm: when substrings partially match the pattern string, the main string's scan pointer i repeatedly backtracks, increasing the time cost. Worst-case time complexity: O(nm).
  • KMP algorithm: when a mismatch occurs, the main string pointer i does not backtrack; instead the pattern string pointer is reset to j = next[j]. Average time complexity: O(m+n).

Find the next array of the pattern string

  • String prefix: a substring that contains the first character but not the last character.
  • String suffix: a substring that contains the last character but not the first character.
  • When a mismatch occurs at the j-th character of the pattern, let S be the string formed by its first j-1 characters; then next[j] = (length of the longest equal prefix and suffix of S) + 1. In particular, next[1] = 0 and next[2] = 1.

KMP algorithm code implementation:

// 获取模式串T的next[]数组
void getNext(SString T, int next[]){ 
    int i=1, j=0;  
    next[1]=0;  
    while(i<T.length){   
        if(j==0 || T.ch[i]==T.ch[j]){ 
            ++i; ++j;      
            next[i]=j;  
        }else      
            j=next[j]; 
    }
}

// KMP算法,求主串S中模式串T的位序,没有则返回0
int Index_KMP(SString S, SString T){   
    int i=1, j=1;  
    int next[T.length+1]; 
    getNext(T, next);  
    while(i<=S.length && j<=T.length){  
        if(j==0 || S.ch[i]==T.ch[j]){   
            ++i; ++j;   
        }else   
            j=next[j];   
    }    
    if(j>T.length)   
        return i-T.length;  
    else
        return 0;
}

int main() {
	SString S={" ababcabcd", 9};	//ch[0]闲置,字符存于ch[1..9]
	SString T={" bcd", 3};
	printf("%d ", Index_KMP(S, T));	//输出7
}

Further optimization of the KMP algorithm: improving the next array into nextval:

void getNextval(SString T, int nextval[]){
    int i=1,j=0;
    nextval[1]=0;
    while(i<T.length){
        if(j==0 || T.ch[i]==T.ch[j]){
            ++i; ++j;
            if(T.ch[i]!=T.ch[j])
                nextval[i]=j;
            else
                nextval[i]=nextval[j];
        }else
            j=nextval[j];
    }
}

Chapter 5 Trees and Binary Trees

5.1. The concept of a tree

5.1.1. Basic definition of a tree

Tree: a finite set of n (n>=0) nodes, which is a logical structure. When n=0, it is an empty tree, and a non-empty tree satisfies:

  • There is one and only one specific node called the root.
  • When n>1, the remaining nodes can be divided into m (m >0) finite sets that are mutually disjoint T1,T2,...,Tm, each of which is itself a tree, and is called a subtree of the root node.

A tree is a recursive data structure

Non-empty tree features:

  • has one and only one root node
  • Nodes without successors are called "leaf nodes" (or terminal nodes)
  • Nodes with successors are called "branch nodes" (or non-terminal nodes)
  • Except for the root node, any node has one and only one predecessor
  • Each node can have 0 or more successors

basic terms

  • Ancestor nodes: all nodes on the path from the root down to a node (excluding the node itself).
  • Descendant nodes: all nodes in the subtrees rooted at a node's children.
  • Parent node: the node directly above and connected to a node.
  • Child node: a node directly below and connected to a node.
  • Sibling nodes: nodes that share the same parent.
  • Cousin nodes: nodes on the same level (with different parents).

Attributes:

  • Level (depth) of a node -- counted from the top down
  • Height of a node -- counted from the bottom up
  • Height (depth) of the tree -- the total number of levels
  • Degree of a node -- the number of its children (branches)
  • Degree of the tree -- the maximum degree over all of its nodes

Ordered and Unordered Trees

  • Ordered tree - Logically, the subtrees of each node are ordered from left to right and cannot be interchanged
  • Unordered tree - Logically, the subtrees of each node have no left-to-right order and can be interchanged

A forest is a collection of m (m>=0) mutually disjoint trees.

5.1.2. Common properties of trees

  • Common test point 1: the number of nodes in a tree = total degree + 1
  • Common test point 2: the difference between a tree of degree m and an m-ary tree: the degree of a tree is the maximum degree over its nodes; an m-ary tree is a tree in which each node has at most m children
  • Common test point 3: level i of a tree of degree m (or an m-ary tree) has at most m^{i-1} nodes
  • Common test point 4: an m-ary tree of height h has at most \frac{m^{h}-1}{m-1} nodes
  • Common test point 5: an m-ary tree of height h has at least h nodes; a tree of height h and degree m has at least h+m-1 nodes
  • Common test point 6: the minimum height of an m-ary tree with n nodes is \left \lceil \log_{m}(n(m-1)+1) \right \rceil

5.2. Binary tree

5.2.1. Definition of binary tree

A binary tree is a finite set of n (n>=0) nodes:

  • Or an empty binary tree, ie n = 0.
  • Or it consists of a root node and two disjoint left and right subtrees called the root. The left subtree and the right subtree are respectively a binary tree.

Features:

  • Each node has at most two subtrees
  • The left and right subtrees cannot be reversed (a binary tree is an ordered tree)
  • A binary tree can be an empty set, and the root can have an empty left subtree and an empty right subtree

5.2.2. Special binary trees

Full Binary Tree: a binary tree of depth k with 2^{k}-1 nodes is called a full binary tree. Features:

  • The number of nodes on each level reaches the maximum
  • The leaves are all on the lowest level
  • Numbering the nodes from 1 in level order: the left child of node i is 2i, the right child is 2i+1, and the parent of node i is \left \lfloor i/2 \right \rfloor

Complete binary tree :

A binary tree with n nodes of depth k is called a complete binary tree if and only if each node corresponds to the nodes numbered 1~n in the full binary tree of depth k. Features:

  • Only the last two layers may have leaf nodes
  • There is at most one node with degree 1
  • Numbering the nodes from 1 in level order: the left child of node i is 2i, the right child is 2i+1, and the parent of node i is \left \lfloor i/2 \right \rfloor
  • Nodes with i\leqslant \left \lfloor n/2 \right \rfloor are branch nodes; nodes with i>\left \lfloor n/2 \right \rfloor are leaf nodes

Binary sort tree: A binary tree is either an empty binary tree, or a binary tree with the following properties:

  • The keywords of all nodes on the left subtree are smaller than the keywords of the root node;
  • The keywords of all nodes on the right subtree are greater than the keywords of the root node;
  • The left subtree and the right subtree are each a binary sorted tree.

Balanced binary tree : The difference between the depths of the left subtree and the right subtree of any node on the tree does not exceed 1.

5.2.3. Properties of Binary Trees

Common test point 1: Let the numbers of nodes with degree 0, 1, and 2 in a non-empty binary tree be n_{0}, n_{1}, and n_{2} respectively; then n_{0}=n_{2}+1 (there is one more leaf node than nodes with two branches)

Common test point 2: Level i of a binary tree has at most 2^{i-1} nodes (i>=1); level i of an m-ary tree has at most m^{i-1} nodes (i>=1)

Common test point 3: A binary tree of height h has at most 2^{h}-1 nodes (a full binary tree); an m-ary tree of height h has at most \frac{m^{h}-1}{m-1} nodes

Common test point 4: The height of a complete binary tree with n (n>0) nodes is h=\left \lceil \log_{2}(n+1) \right \rceil or \left \lfloor \log_{2}n \right \rfloor +1

Common test point 5: For a complete binary tree, the numbers of nodes n_{0}, n_{1}, n_{2} with degree 0, 1, and 2 can be deduced from the total number of nodes n.
Derivation:
        since n_{0} = n_{2}+1, the sum n_{0}+n_{2} = 2n_{2}+1 is odd;
        and n = n_{0}+n_{1}+n_{2};
        therefore, if a complete binary tree has an even number of nodes n, then n_{1}=1, n_{0}=\frac{n}{2}, n_{2}=\frac{n}{2}-1;
        if it has an odd number of nodes n, then n_{1}=0, n_{0}=\frac{n+1}{2}, n_{2}=\frac{n+1}{2}-1

5.2.4. Binary tree storage implementation

Sequential storage of the binary tree:
In the sequential storage of the binary tree, the node numbers of the binary tree must be corresponding to the complete binary tree;

Several important basic operations for frequent examinations:

  • i's left child: 2i
  • i's right child: 2i+1
  • i's parent node: \left \lfloor i/2 \right \rfloor
  • the level i is on: \left \lceil \log_{2}(i+1) \right \rceil or \left \lfloor \log_{2}i \right \rfloor +1

If the complete binary tree has n nodes, then

  • Does i have a left child?   2i\leqslant n ?
  • Does i have a right child?  2i+1\leqslant n ?
  • Is i a leaf/branch node?   i>\left \lfloor n/2 \right \rfloor ?

#define MaxSize 100

struct TreeNode{
   ElemType value; //结点中的数据元素
   bool isEmpty;   //结点是否为空
};

int main(){
   struct TreeNode t[MaxSize];
   for (int i=0; i<MaxSize; i++){
      t[i].isEmpty = true;
   }
}

Chained storage

//二叉树的结点

struct ElemType{
   int value;
};

typedef struct BiTNode{
   ElemType data;          //数据域
   struct BiTNode *lchild, *rchild; //左、右孩子指针
}BiTNode, *BiTree;

//定义一棵空树
BiTree root = NULL;

//插入根节点
root = (BiTree) malloc (sizeof(BiTNode));
root -> data.value = 1;
root -> lchild = NULL;
root -> rchild = NULL;

//插入新结点
BiTNode *p = (BiTree) malloc (sizeof(BiTNode));
p -> data.value = 2;
p -> lchild = NULL;
p -> rchild = NULL;
root -> lchild = p; //作为根节点的左孩子

  • In the sequential storage of the binary tree, it is necessary to correspond the node numbers of the binary tree with the complete binary tree
  • Worst case: a single tree with height h and only h nodes (all nodes have only right children), also requires at least 2^h-1 storage units
  • Conclusion: The sequential storage structure of binary tree is only suitable for storing complete binary tree

5.3. Binary tree traversal and thread binary tree

5.3.1. Preorder, inorder and postorder traversal of a binary tree

  • Traversal: Visit all nodes in a certain order.
  • Hierarchical traversal: ordering rules determined based on the hierarchical properties of the tree

 The recursive characteristics of the binary tree:
[1] either an empty binary tree
[2] or a binary tree composed of "root node + left subtree + right subtree"

[Pre-order, in-order and post-order traversal of a binary tree]

  • Preorder traversal: root, left, right (NLR)
typedef struct BiTNode{
   ElemType data;          
   struct BiTNode *lchild, *rchild; 
}BiTNode, *BiTree;

void PreOrder(BiTree T){
   if(T!=NULL){
      visit(T);                 //访问根结点
      PreOrder(T->lchild);      //递归遍历左子树
      PreOrder(T->rchild);      //递归遍历右子树
   }
}

  • Inorder traversal: left, root, right (LNR)
typedef struct BiTNode{
   ElemType data;          
   struct BiTNode *lchild, *rchild; 
}BiTNode, *BiTree;

void InOrder(BiTree T){
   if(T!=NULL){
      InOrder(T->lchild);       //递归遍历左子树
      visit(T);                 //访问根结点
      InOrder(T->rchild);       //递归遍历右子树
   }
}

  • Postorder traversal: left, right, root (LRN)
typedef struct BiTNode{
   ElemType data;          
   struct BiTNode *lchild, *rchild; 
}BiTNode, *BiTree;

void PostOrder(BiTree T){
   if(T!=NULL){
      PostOrder(T->lchild);       //递归遍历左子树    
      PostOrder(T->rchild);       //递归遍历右子树
      visit(T);                 //访问根结点
   }
}

5.3.2. Level order traversal of binary tree

Algorithm idea:

  • 1. Initialize an auxiliary queue
  • 2. The root node joins the queue
  • 3. If the queue is not empty, the head node of the queue will go out of the queue, visit the node, and insert the child into the tail of the queue (if any)
  • 4. Repeat 3 until the queue is empty
//二叉树的结点(链式存储)
typedef struct BiTNode{
   ElemType data;          
   struct BiTNode *lchild, *rchild; 
}BiTNode, *BiTree;

//链式队列结点
typedef struct LinkNode{
   BiTNode * data;
   struct LinkNode *next;
}LinkNode;

typedef struct{
   LinkNode *front, *rear;  
}LinkQueue;

//层序遍历
void LevelOrder(BiTree T){
   LinkQueue Q;
   InitQueue (Q);          //初始化辅助队列
   BiTree p;
   EnQueue(Q,T);           //将根节点入队
   while(!isEmpty(Q)){     //队列不空则循环
      DeQueue(Q,p);        //队头结点出队
      visit(p);            //访问出队结点
      if(p->lchild != NULL)
         EnQueue(Q,p->lchild);   //左孩子入队
      if(p->rchild != NULL)
         EnQueue(Q,p->rchild);   //右孩子入队
   }
}

5.3.3. Constructing a binary tree from a traversal sequence

  • A given preorder traversal sequence may correspond to many different binary tree shapes, and the same is true of a postorder, inorder, or level-order sequence alone. That is, a single one of the pre/in/post/level-order traversal sequences cannot uniquely determine a binary tree.

Construct a binary tree from the traversal sequence of the binary tree:
1. Preorder + inorder traversal sequence
2. Postorder + inorder traversal sequence
3. Layer order + inorder traversal sequence

  • Preorder + inorder: the first element of the preorder sequence (root, left subtree, right subtree) identifies the root; the root's position in the inorder sequence then splits the remaining nodes into those of the left subtree and those of the right subtree.
  • Postorder + inorder: the last element of the postorder sequence (left subtree, right subtree, root) identifies the root; the root's position in the inorder sequence splits the remaining nodes into the left and right subtrees.
  • Level order + inorder: the first element of the level-order sequence identifies the root; the root's position in the inorder sequence splits the remaining nodes into the left and right subtrees.

5.3.4. The concept of threaded binary tree

The concept and function of thread binary tree

  • A binary tree with n nodes has n+1 null pointer fields, which can be used to record predecessor and successor information. Pointers to the predecessor and successor are called "threads", and the resulting binary tree is called a threaded binary tree.
  • A binary tree whose nodes have been given threads is a threaded binary tree, and the process of traversing the binary tree in some order (preorder, inorder, postorder, level order, etc.) while turning null pointers into threads is called threading the binary tree.
  • Compared with an ordinary binary tree node, a threaded binary tree node adds left and right tag fields: tag == 0 means the pointer points to a child; tag == 1 means the pointer is a "thread".
//线索二叉树结点
typedef struct ThreadNode{
   ElemType data;
   struct ThreadNode *lchild, *rchild;
   int ltag, rtag;                // 左、右线索标志
}ThreadNode, *ThreadTree;


in-order threaded storage

pre-order threaded storage

 post-order threaded storage

5.3.5. Threading of Binary Trees

 In-order threading:

typedef struct ThreadNode{
   int data;
   struct ThreadNode *lchild, *rchild;
   int ltag, rtag;                // 左、右线索标志
}ThreadNode, *ThreadTree;

//全局变量pre, 指向当前访问的结点的前驱
ThreadNode *pre=NULL;

void InThread(ThreadTree T){
    if(T!=NULL){
        InThread(T->lchild);    //中序遍历左子树
        visit(T);               //访问根节点
        InThread(T->rchild);    //中序遍历右子树
    }
}

void visit(ThreadNode *q){
   if(q->lchild == NULL){               //左子树为空,建立前驱线索   
      q->lchild = pre;
      q->ltag = 1;
   }

   if(pre!=NULL && pre->rchild == NULL){ 
      pre->rchild = q;           //建立前驱结点的后继线索
      pre->rtag = 1;
   }
   pre = q;
}

//中序线索化二叉树T
void CreateInThread(ThreadTree T){
   pre = NULL;                //pre初始为NULL
   if(T!=NULL){               //非空二叉树才能进行线索化
      InThread(T);            //中序线索化二叉树
      if(pre->rchild == NULL)
         pre->rtag=1;         //处理遍历的最后一个结点
   }
}


Preorder threading:

typedef struct ThreadNode{
   int data;
   struct ThreadNode *lchild, *rchild;
   int ltag, rtag;                // 左、右线索标志
}ThreadNode, *ThreadTree;

//全局变量pre, 指向当前访问的结点的前驱
ThreadNode *pre=NULL;

//先序遍历二叉树,一边遍历一边线索化
void PreThread(ThreadTree T){
   if(T!=NULL){
      visit(T);
      if(T->ltag == 0)         //lchild不是前驱线索
         PreThread(T->lchild);
      PreThread(T->rchild);
   }
}

void visit(ThreadNode *q){
   if(q->lchild == NULL){               //左子树为空,建立前驱线索   
      q->lchild = pre;
      q->ltag = 1;
   }

   if(pre!=NULL && pre->rchild == NULL){ 
      pre->rchild = q;           //建立前驱结点的后继线索
      pre->rtag = 1;
   }
   pre = q;
}

//先序线索化二叉树T
void CreatePreThread(ThreadTree T){
   pre = NULL;                //pre初始为NULL
   if(T!=NULL){               //非空二叉树才能进行线索化
      PreThread(T);            //先序线索化二叉树
      if(pre->rchild == NULL)
         pre->rtag=1;         //处理遍历的最后一个结点
   }
}

Postorder threading:

typedef struct ThreadNode{
   int data;
   struct ThreadNode *lchild, *rchild;
   int ltag, rtag;                // 左、右线索标志
}ThreadNode, *ThreadTree;

//全局变量pre, 指向当前访问的结点的前驱
ThreadNode *pre=NULL;

//后序遍历二叉树,一边遍历一边线索化
void PostThread(ThreadTree T){
   if(T!=NULL){
      PostThread(T->lchild);
      PostThread(T->rchild);
      visit(T);                  //访问根节点
   }
}

void visit(ThreadNode *q){
   if(q->lchild == NULL){               //左子树为空,建立前驱线索   
      q->lchild = pre;
      q->ltag = 1;
   }

   if(pre!=NULL && pre->rchild == NULL){ 
      pre->rchild = q;           //建立前驱结点的后继线索
      pre->rtag = 1;
   }
   pre = q;
}

//后序线索化二叉树T
void CreatePostThread(ThreadTree T){
   pre = NULL;                //pre初始为NULL
   if(T!=NULL){               //非空二叉树才能进行线索化
      PostThread(T);            //后序线索化二叉树
      if(pre->rchild == NULL)
         pre->rtag=1;         //处理遍历的最后一个结点
   }
}

5.3.6. Find the predecessor/successor in the thread binary tree

The inorder thread binary tree finds the inorder successor next of the specified node * p:

  1. if p->rtag==1, then next = p->rchild;
  2. If p->rtag==0, then next is the leftmost bottom node in the right subtree of p.
// 找到以p为根的子树中,第一个被中序遍历的结点
ThreadNode *FirstNode(ThreadNode *p){
    // 循环找到最左下结点(不一定是叶结点)
    while(p->ltag==0)
        p=p->lchild;
    return p;
}

// 在中序线索二叉树中找到结点p的后继结点
ThreadNode *NextNode(ThreadNode *p){
    // 右子树中最左下的结点
    if(p->rtag==0)
        return FirstNode(p->rchild);
    else
        return p->rchild;
}

// 对中序线索二叉树进行中序遍历(非递归方法实现)
void InOrder(ThreadNode *T){
    for(ThreadNode *p=FirstNode(T); p!=NULL; p=NextNode(p)){
        visit(p);
    }
}

The inorder thread binary tree finds the inorder predecessor pre of the specified node * p:

  1. if p->ltag==1, then pre = p->lchild;
  2. If p->ltag==0, then pre is the rightmost bottom node in the left subtree of p.
// 找到以p为根的子树中,最后一个被中序遍历的结点
ThreadNode *LastNode(ThreadNode *p){
    // 循环找到最右下结点(不一定是叶结点)
    while(p->rtag==0)
        p=p->rchild;
    return p;
}

// 在中序线索二叉树中找到结点p的前驱结点
ThreadNode *PreNode(ThreadNode *p){
    // 左子树中最右下的结点
    if(p->ltag==0)
        return LastNode(p->lchild);
    else
        return p->lchild;
}

// 对中序线索二叉树进行逆向中序遍历(非递归方法实现)
void RevOrder(ThreadNode *T){
    for(ThreadNode *p=LastNode(T); p!=NULL; p=PreNode(p))
        visit(p);
}

The preorder thread binary tree finds the preorder successor next of the specified node * p:

  1. if p->rtag==1, then next = p->rchild;
  2. If p->rtag==0:
    1. If p has a left child, then the left child will be followed by the first order;
    2. If p does not have a left child, then the right child is the successor.
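The two rules above translate directly into code (a sketch: `PreNext` is my own name for this helper, and it assumes the `ThreadNode` layout defined earlier):

```c
#include <stddef.h>

typedef struct ThreadNode {
    int data;
    struct ThreadNode *lchild, *rchild;
    int ltag, rtag;                 // 1 means the pointer is a thread
} ThreadNode;

// Preorder successor of p in a preorder-threaded binary tree.
ThreadNode *PreNext(ThreadNode *p) {
    if (p->rtag == 1)
        return p->rchild;           // rchild is already the successor thread
    if (p->ltag == 0)
        return p->lchild;           // p has a real left child: visit it next
    return p->rchild;               // no left child: the right child is next
}
```

Note that, unlike the inorder case, no loop is needed: the successor is always one pointer away.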

The preorder thread binary tree finds the preorder predecessor pre of the specified node * p:

  1. Premise: Use the triple linked list instead, and you can find the parent node of node * p.
  2. If the parent node of p can be found, and p is the left child: the parent node of p is its predecessor;
  3. If the parent node of p can be found, and p is the right child, and its left sibling is empty: the parent node of p is its predecessor;
  4. If the parent node of p can be found, and p is the right child, and its left sibling is not empty: the predecessor of p is the last node traversed in the left sibling subtree;
  5. If p is the root node, then p has no prior predecessors.

The postorder thread binary tree finds the postorder predecessor pre of the specified node * p:

  1. If p->ltag==1, then pre = p->lchild;
  2. If p->ltag==0:
    1. If p has a right child, the postorder predecessor is the right child;
    2. If p has no right child, the postorder predecessor is the left child.

The post-order thread binary tree finds the post-order successor next of the specified node * p:

  1. Premise: Use the triple linked list instead, and you can find the parent node of node * p.
  2. If the parent node of p can be found, and p is the right child: the parent node of p is its successor;
  3. If the parent node of p can be found, and p is the left child, its right sibling is empty: the parent node of p is its successor;
  4. If the parent node of p can be found, and p is the left child, and its right sibling is not empty: the successor of p is the first node traversed in the right sibling subtree;
  5. If p is the root node, then p has no postorder successor.

5.4. Trees and forests

5.4.1. Tree storage structure 

Parent representation (sequential storage): each node stores a "pointer" (an array index) to its parent.

//数据域:存放结点本身信息。
//双亲域:指示本结点的双亲结点在数组中的位置。
#define MAX_TREE_SIZE 100  //树中最多结点数

typedef struct{      //树的结点定义
   ElemType data; 
   int parent;      //双亲位置域
}PTNode;

typedef struct{                   //树的类型定义
   PTNode nodes[MAX_TREE_SIZE];   //双亲表示
   int n;                         //结点数
}PTree;

Insert: a new data element can be stored in any free slot; there is no need to follow the logical order (the node count n must be updated).

Delete (a leaf node), two options:
① set its parent field to -1 (this leaves an empty slot that slows later scans);
② move the last element in the array into its slot (the node count n must be updated).

Query:
① Advantage - finding the parent of a given node is very convenient;
② Disadvantage - finding the children of a given node requires scanning the array from the start, and empty slots make the scan slower.

Child representation (sequential + chained storage):
store the nodes sequentially, and each node keeps the head pointer of a linked list of its children.

struct CTNode{
   int child;    //孩子结点在数组中的位置
   struct CTNode *next;    // 下一个孩子
};

typedef struct{
   ElemType data;
   struct CTNode *firstChild;    // 第一个孩子
}CTBox;

typedef struct{
   CTBox nodes[MAX_TREE_SIZE];
   int n, r;   // 结点数和根的位置
}CTree;

Child-sibling representation (chained storage)
A tree can be converted into a binary tree via the child-sibling representation.

//孩子兄弟表示法结点
typedef struct CSNode{
    ElemType data;
    struct CSNode *firstchild, *nextsibling;	//第一个孩子和右兄弟结点
}CSNode, *CSTree;

5.4.2. Tree and forest traversals

Root-first traversal of the tree:
If the tree is not empty, visit the root node first, then perform a root-first traversal of each subtree in turn. The root-first traversal sequence of a tree is the same as the preorder sequence of the tree's corresponding binary tree.

void PreOrder(TreeNode *R){
   if(R!=NULL){
      visit(R);    //访问根节点
      while(R还有下一个子树T)
         PreOrder(T);      //先根遍历下一个子树
   }
}

Back-root traversal of the tree:
If the tree is not empty, first perform a back-root traversal of each subtree in turn, and finally visit the root node (a depth-first traversal).
The back-root traversal sequence of a tree is the same as the inorder sequence of the tree's corresponding binary tree.

void PostOrder(TreeNode *R){
   if(R!=NULL){
      while(R还有下一个子树T)
         PostOrder(T);      //后根遍历下一个子树
      visit(R);    //访问根节点
   }
}

Layer order traversal (queue implementation)

  1. If the tree is not empty, the root node enters the queue;
  2. If the queue is not empty, the element at the head of the queue is dequeued and accessed, and the children of the element are enqueued sequentially;
  3. Repeat the above operations until the queue is empty;

forest traversal

  • Preorder traversal: equivalent to performing a root-first traversal of each tree in turn; equivalently, convert the forest to its corresponding binary tree first and traverse that binary tree in preorder;
  • Inorder traversal: equivalent to performing a back-root traversal of each tree in turn; equivalently, convert the forest to its corresponding binary tree first and traverse that binary tree in inorder;

5.5. Application

5.5.1. Binary sorted tree

A binary sort tree, also known as a binary search tree (BST, Binary Search Tree), is either an empty binary tree or a binary tree with the following properties:

  1. The keywords of all nodes on the left subtree are smaller than the keywords of the root node;
  2. The keywords of all nodes on the right subtree are greater than the keywords of the root node;
  3. The left subtree and the right subtree are each a binary sorted tree;
  4. Left subtree node value < root node value < right subtree node value;
  5. An in-order traversal can be performed to obtain an increasing ordered sequence.

[Search for binary sorting tree]

  1. If the tree is not empty, the target value is compared with the value of the root node;
  2. If they are equal, the search is successful;
  3. If it is smaller than the root node, search on the left subtree;
  4. Otherwise search on the right subtree;
  5. If the search is successful, return the node pointer; if the search fails, return NULL.
typedef struct BSTNode{
   int key;
   struct BSTNode *lchild, *rchild;
}BSTNode, *BSTree;

//在二叉排序树中查找值为key的结点(非递归)
//最坏空间复杂度:O(1)
BSTNode *BST_Search(BSTree T, int key){
   while(T!=NULL && key!=T->key){        //若树空或等于根结点值,则结束循环
      if(key<T->key)       //值小于根结点值,在左子树上查找
         T = T->lchild;
      else                  //值大于根结点值,在右子树上查找
         T = T->rchild;
   }
   return T;
}

//在二叉排序树中查找值为key的结点(递归)
//最坏空间复杂度:O(h)
BSTNode *BSTSearch(BSTree T, int key){
   if(T == NULL)
      return NULL;
   if(key == T->key)
      return T;
   else if(key < T->key)
      return BSTSearch(T->lchild, key);
   else 
      return BSTSearch(T->rchild, key);
}

[Insertion operation of binary sort tree]

  1. If the original binary sort tree is empty, insert the node directly; otherwise:
  2. If the key k is less than the value of the root node, it is inserted into the left subtree;
  3. If the key k is greater than the value of the root node, it is inserted into the right subtree.
//在二叉排序树中插入关键字为k的新结点(递归)
//最坏空间复杂度:O(h)
int BST_Insert(BSTree &T, int k){
   if(T==NULL){           //原树为空,新插入的结点为根结点
      T = (BSTree)malloc(sizeof(BSTNode));
      T->key = k;
      T->lchild = T->rchild = NULL;
      return 1;                       //插入成功
   }
   else if(k == T->key)               //树中存在相同关键字的结点,插入失败
      return 0;
   else if(k < T->key)                 
      return BST_Insert(T->lchild,k);
   else 
      return BST_Insert(T->rchild,k);
}

[Construction of binary sorting tree]

//按照str[]中的关键字序列建立二叉排序树
void Create_BST(BSTree &T, int str[], int n){
   T = NULL;                     //初始时T为空树
   int i=0;
   while(i<n){
      BST_Insert(T,str[i]);     //依次将每个关键字插入到二叉排序树中
      i++;
   }
}

[Deletion of binary sorting tree]

First search to find the target node:

  1. If the deleted node z is a leaf node, it will be deleted directly without destroying the nature of the binary sorting tree;
  2. If the node z has only one left subtree or right subtree, let the subtree of z become the subtree of z's parent node, replacing the position of z;
  3. If node z has both left and right subtrees, let z's direct successor (or direct predecessor) replace z, then delete that direct successor (or predecessor) from the binary sort tree; this reduces the problem to the first or second case.

Lookup length: the number of key comparisons performed during a search; it reflects the time complexity of the search operation.

5.5.2. Balanced Binary Tree

A Balanced Binary Tree (AVL tree) is a binary tree in which, for every node, the heights of the left and right subtrees differ by at most 1.
Balance factor of a node = left subtree height - right subtree height

//平衡二叉树结点
typedef struct AVLNode{
   int key;         //数据域
   int balance;     //平衡因子
   struct AVLNode *lchild, *rchild; 
}AVLNode, *AVLTree;

Insertion into a balanced binary tree

  • The object of each adjustment is the "minimum unbalanced subtree"
  • In the insertion operation, as long as the minimum unbalanced subtree is adjusted to balance, other ancestor nodes will restore balance.

Adjusting the minimum unbalanced subtree (LL): a new node was inserted into the left subtree (L) of the left child (L) of node A, so A's balance factor increases from 1 to 2 and the subtree rooted at A becomes unbalanced; one right rotation is required. Rotate A's left child B up to the right to replace A as the root, rotate A down to the right to become the root of B's right subtree, and attach B's original right subtree as A's left subtree.

image-20210925175200174

Adjusting the minimum unbalanced subtree (RR): a new node was inserted into the right subtree (R) of the right child (R) of node A, so A's balance factor decreases from -1 to -2 and the subtree rooted at A becomes unbalanced; one left rotation is required. Rotate A's right child B up to the left to replace A as the root, rotate A down to the left to become the root of B's left subtree, and attach B's original left subtree as A's right subtree.

image-20210925175501850
Adjusting the minimum unbalanced subtree (LR): a new node was inserted into the right subtree (R) of A's left child (L), so A's balance factor increases from 1 to 2 and the subtree rooted at A becomes unbalanced; two rotations are required, first left, then right. First rotate C, the root of the right subtree of A's left child B, up to the left into B's position, then rotate C up to the right into A's position.

image-20210925175800442

image-20210925175842473


Adjusting the minimum unbalanced subtree (RL): a new node was inserted into the left subtree (L) of A's right child (R), so A's balance factor decreases from -1 to -2 and the subtree rooted at A becomes unbalanced; two rotations are required, first right, then left. First rotate C, the root of the left subtree of A's right child B, up to the right into B's position, then rotate C up to the left into A's position.

image-20210925180242786

image-20210925180312469

Search efficiency analysis: if the tree height is h, a search compares at most h keys in the worst case, so the time complexity of the search operation is O(h).
Since the heights of the left and right subtrees of every node in a balanced binary tree differ by at most 1, let n_h denote the minimum number of nodes in a balanced tree of height h; then:
n_0 = 0, n_1 = 1, n_2 = 2, and n_h = n_{h-1} + n_{h-2} + 1

5.5.3. Huffman tree

1. Huffman tree definition

  1. The weight of the node: a value with some practical meaning (such as: indicating the importance of the node, etc.)
  2. The weighted path length of a node: the product of the path length (the number of edges passed) from the root of the tree to the node and the weight of the node.
  3. The weighted path length of the tree: the sum of the weighted path lengths of all leaf nodes in the tree (WPL, Weighted Path Length).
  4. Definition of Huffman tree: In a binary tree containing n weighted leaf nodes, the binary tree with the smallest weighted path length (WPL) is also called the optimal binary tree.

2. Construction of Huffman tree (emphasis)

Given n nodes with weights w1, w2, ..., wn, the algorithm for constructing a Huffman tree is as follows:

  1. The n nodes are respectively regarded as n binary trees containing only one node to form a forest F.
  2. Construct a new node; select the two trees with the smallest root weights from F as the new node's left and right subtrees, and set the new node's weight to the sum of the weights of the roots of its left and right subtrees.
  3. Delete the two trees just selected from F, and add the newly obtained tree to F
  4. Repeat steps 2 and 3 until only one tree remains in F.

Notes on constructing Huffman tree:

  • Every initial node eventually becomes a leaf node, and nodes with smaller weights end up farther from the root
  • The total number of nodes in the Huffman tree is 2n - 1
  • There is no node with degree 1 in Huffman tree.
  • Huffman tree is not unique, but WPL must be the same and optimal

image-20210926174150209

3. Huffman coding (emphasis)

  • Fixed-length encoding: each character is represented by binary bits of equal length
  • Variable length encoding: Allows different characters to be represented by binary bits of unequal length
  • If no encoding is a prefix of another encoding, such an encoding is called a prefix encoding

Huffman codes are obtained from a Huffman tree: each character in the character set is a leaf node, the frequency of each character is the node's weight, and the Huffman tree is built as described above. Because characters appear only at leaves, the resulting code is automatically a prefix code.


Chapter 6 Graphs

6.1. Basic concepts of graphs

(1) A graph G consists of a vertex set V and an edge set E, denoted G = (V, E), where V(G) is the finite, non-empty set of vertices of graph G and E(G) is the set of relationships (edges) between vertices. If V = {v1, v2, ..., vn}, then |V| denotes the number of vertices of G, also called the order of G; E = {(u, v) | u ∈ V, v ∈ V}, and |E| denotes the number of edges of G.

(2) Undirected graph: if E is a finite set of undirected edges (referred to simply as edges), then graph G is an undirected graph. An edge is an unordered pair of vertices, denoted (v, w) or (w, v); since (v, w) = (w, v), where v and w are vertices, we say that vertices v and w are adjacent to each other, and that the edge (v, w) is incident to (or associated with) the vertices v and w.

(3) Directed graph: if E is a finite set of directed edges (also called arcs), then graph G is a directed graph. An arc is an ordered pair of vertices, denoted <v, w>, where v and w are vertices; v is called the arc tail and w the arc head, and <v, w> is the arc from vertex v to vertex w, also expressed as "v is adjacent to w" or "w is adjacent from v". Note that <v, w> ≠ <w, v>.

(4) Simple graph: ① There is no repeated edge; ② There is no edge from a vertex to itself.

(5) Multigraph: if more than one edge is allowed between two vertices of graph G, and a vertex is allowed to have an edge to itself (a self-loop), then G is a multigraph.

(6) Degree of vertex 
       ① For undirected graphs: the degree of vertex v refers to the number of edges attached to the vertex, denoted as TD(v).
       ② For directed graphs: the in-degree is the number of directed edges ending at vertex v, denoted ID(v); the out-degree is the number of directed edges starting at vertex v, denoted OD(v). The degree of vertex v is the sum of its in-degree and out-degree: TD(v) = ID(v) + OD(v).

(7) Other concepts

  • Path: a path from vertex v_p to vertex v_q is a sequence of vertices starting at v_p and ending at v_q in which every consecutive pair is joined by an edge.
  • Circuit: A path whose first vertex is the same as the last vertex is called a circuit.
  • Simple path: In the path sequence, a path whose vertices do not appear repeatedly is called a simple path.
  • Simple circuit: Except for the first vertex and the last vertex, the rest of the vertices do not repeat the circuit is called a simple circuit.
  • Path Length: The number of edges on the path.
  • Distance: if a shortest path from vertex u to vertex v exists, its length is called the distance from u to v. If there is no path from u to v, the distance is recorded as infinity.
  • Connected: In an undirected graph, if there is a path from vertex v to vertex w, then v and w are said to be connected.
  • Strongly connected: In a directed graph, if there are paths from vertex v to vertex w and from vertex w to vertex v, then the two vertices are said to be strongly connected.
  • Connected graph: if any two vertices in graph G are connected, then G is called a connected graph; otherwise it is a disconnected graph.
  • Strongly connected graph: If any pair of vertices in the graph is strongly connected, the graph is called strongly connected graph.
  • Connected Components: A maximally connected subgraph of an undirected graph is called a connected component.
  • Strongly Connected Component: a maximal strongly connected subgraph of a directed graph is called a strongly connected component of the directed graph.
  • Minimal Connected Subgraph: The spanning tree of a connected graph is a minimal connected subgraph that contains all the vertices in the graph.
  • Spanning tree of a connected graph: A minimal connected subgraph that contains all the vertices in the graph. If the number of vertices in the graph is n, its spanning tree contains n-1 edges. For a spanning tree, if one of its edges is cut off, it will become a disconnected graph, and if an edge is added, it will form a cycle.
  • Edge weight: In a graph, each edge can be marked with a value with a certain meaning, which is called the edge weight.
  • Weighted graph/network: A graph with weights on its edges is called a weighted graph, also known as a network.
  • Weighted path length: When the graph is a weighted graph, the sum of the weights of all edges on a path is called the weighted path length of the path.

6.2. Graph storage

6.2.1. Adjacency matrix

  • The adjacency matrix (Adjacency Matrix) representation stores the graph in two arrays: a one-dimensional array holds the vertex information, and a two-dimensional array (the adjacency matrix) holds the information about edges or arcs.

  • Degree: the degree of the i-th vertex = the number of non-zero elements in row i (or column i);
  • Out-degree: the number of non-zero elements in row i;
  • In-degree: the number of non-zero elements in column i.

Adjacency matrix method code implementation

#define MaxVertexNum 100	//maximum number of vertices
typedef char VertexType;	//data type of a vertex
typedef int EdgeType;	//data type of edge weights in a weighted graph
typedef struct{
	VertexType Vex[MaxVertexNum];	//vertex table
	EdgeType Edge[MaxVertexNum][MaxVertexNum];	//adjacency matrix (edge table)
	int vexnum, arcnum;	//current number of vertices and arcs
}MGraph;

Performance analysis:
(1) Space complexity: O(|V|^2), which depends only on the number of vertices, not on the actual number of edges.
(2) Suitable for storing dense graphs.
(3) The adjacency matrix of an undirected graph is symmetric, so it can be stored compressed (store only the upper or lower triangle).

★★★ A property of the adjacency matrix: let A be the adjacency matrix of graph G (entries 0 or 1); then the entry A^n[i][j] equals the number of paths of length n from vertex i to vertex j.

6.2.2. Adjacency list

Code

#define MAXVEX 100	//maximum number of vertices in the graph
typedef char VertexType;	//vertex type, defined by the user
typedef int EdgeType;	//edge weight type, defined by the user
/*edge-list node*/
typedef struct EdgeNode{
	int adjvex;	//index of the vertex this arc points to
	EdgeType weight;	//weight; not needed for unweighted graphs
	struct EdgeNode *next;	//pointer to the next adjacent vertex
}EdgeNode;

/*vertex-table node*/
typedef struct VertexNode{
	VertexType data;	//vertex field, stores vertex information
	EdgeNode *firstedge;	//head pointer of the edge list
}VertexNode, AdjList[MAXVEX];

/*adjacency list*/
typedef struct{
	AdjList adjList;
	int numVertexes, numEdges;	//current number of vertices and edges
}GraphAdjList;

The characteristics of adjacency list:

  • For undirected graphs, the storage space is O(|V| +2|E|); for directed graphs, the storage space is O(|V| +|E|);
  • More suitable for sparse graphs;
  • If G is an undirected graph, the degree of the vertex is the length of the edge table of the vertex. If G is a directed graph, the out-degree of the vertex is the length of the edge table of the vertex, and the calculation of the in-degree requires traversing the entire adjacency list;
  • The adjacency list is not unique, and the order of the nodes in the edge list may vary depending on the algorithm and input.

Adjacency list vs. adjacency matrix:

                          adjacency matrix                      adjacency list
space complexity          O(|V|^2)                              undirected O(|V|+2|E|); directed O(|V|+|E|)
applicable scene          dense graphs                          sparse graphs
representation            unique                                not unique
degree / in- / out-degree scan the corresponding row or column  computing the in-degree of a directed graph is inconvenient (scan the whole list); the rest are easy
finding adjacent edges    scan the corresponding row or column  finding the incoming edges of a directed graph is inconvenient; the rest are easy

6.2.3. Cross-linked list, adjacent multi-list

【Cross-linked list】

  • The cross-linked list (orthogonal list) is a linked storage structure for directed graphs: each arc node records its tail and head vertices and carries two pointers, one to the next arc with the same tail and one to the next arc with the same head.

【Adjacency multilist】

The adjacency multilist is a linked storage structure for undirected graphs; each edge is represented by a single node shared by the edge lists of its two endpoints:

#define MAX_VERTEX_NUM 20	//maximum number of vertices

struct EBox{				//edge node
	int i,j; 				//positions (array indices) of the two endpoints of this edge
	EBox *ilink,*jlink; 	//next edges attached to vertex i and vertex j, respectively
	InfoType info; 			//edge weight
};
struct VexBox{
	VertexType data;
	EBox *firstedge; 		//first edge attached to this vertex
};
struct AMLGraph{
	VexBox adjmulist[MAX_VERTEX_NUM];
	int vexnum,edgenum; 	//current number of vertices and edges of the undirected graph
};

 Comparison of four storage methods:

6.2.4. Basic operations on graphs

Adjacent(G, x, y): determine whether edge <x, y> or (x, y) exists in graph G.
Neighbors(G, x): list the edges adjacent to vertex x in graph G.
InsertVertex(G, x): insert vertex x into graph G.
DeleteVertex(G, x): delete vertex x from graph G.
AddEdge(G, x, y): if undirected edge (x, y) or directed edge <x, y> does not exist, add it to graph G.
RemoveEdge(G, x, y): if undirected edge (x, y) or directed edge <x, y> exists, remove it from graph G.
FirstNeighbor(G, x): return the vertex number of the first adjacent vertex of x in graph G; return -1 if x has no adjacent vertices or does not exist in the graph.
NextNeighbor(G, x, y): assuming vertex y is an adjacent vertex of x in graph G, return the vertex number of the next adjacent vertex of x after y; return -1 if y is the last adjacent vertex of x.
Get_edge_value(G, x, y): get the weight of edge (x, y) or <x, y> in graph G.
Set_edge_value(G, x, y, v): set the weight of edge (x, y) or <x, y> in graph G to v.

6.3. Graph Traversal

6.3.1. Breadth-first traversal

1. Breadth-First-Search (BFS) main points:
(1) Find all vertices adjacent to a vertex;
(2) Mark which vertices have been visited;
(3) Need an auxiliary queue.

2. Operations used in breadth-first traversal:
FirstNeighbor(G, x): find the first neighbor point of vertex x in graph G, and return the vertex number if there is one; if x has no neighbor point or x does not exist in the graph, then returns -1.
NextNeighbor(G, x, y): Assuming that vertex y in graph G is an adjacent point of vertex x, return the vertex number of the next adjacent point of vertex x other than y, if y is the last adjacent point of x point, returns -1.

3. Breadth-first traversal code implementation: 

bool visited[MAXVEX];	//visited-flag array

/*breadth-first traversal using the adjacency matrix*/
void BFSTraverse(MGraph G){
	int i, j;
	Queue Q;
	for(i = 0; i<G.numVertexes; i++){
		visited[i] = FALSE;
	}
	InitQueue(&Q);	//initialize the auxiliary queue
	for(i=0; i<G.numVertexes; i++){
		//process the vertex only if it has not been visited
		if(!visited[i]){
			visited[i] = TRUE;	//mark as visited
			visit(i);	//visit the vertex
			EnQueue(&Q, i);	//enqueue this vertex
			//while the queue is not empty
			while(!QueueEmpty(Q)){
				DeQueue(&Q, &i);	//dequeue vertex i
				//FirstNeighbor(G,v): first adjacent vertex of v, or -1 if none.
				//NextNeighbor(G,v,w): next adjacent vertex of v after w, or -1 if none.
				for(j=FirstNeighbor(G, i); j>=0; j=NextNeighbor(G, i, j)){
					//check all adjacent vertices of i
					if(!visited[j]){
						visit(j);	//visit vertex j
						visited[j] = TRUE;	//mark as visited
						EnQueue(&Q, j);	//enqueue vertex j
					}
				}
			}
		}
	}
}

4. Complexity analysis: 
(1) Space complexity: in the worst case, the auxiliary queue holds O(|V|) vertices.
(2) For a graph stored as an adjacency matrix: visiting the |V| vertices takes O(|V|) time, finding the adjacent vertices of each vertex takes O(|V|) time, and there are |V| vertices, so the time complexity is O(|V|^2).
(3) For a graph stored as an adjacency list: visiting the |V| vertices takes O(|V|) time, and scanning all the edge lists takes O(|E|) time in total, so the time complexity is O(|V|+|E|).

5. Breadth-first spanning tree: The breadth-first spanning tree is determined by the breadth-first traversal process. Since the representation of the adjacency list is not unique, the breadth-first spanning tree based on the adjacency list is also not unique. 

6.3.2. Depth-first traversal

1. Depth-first traversal, also known as Depth-First Search, abbreviated DFS.

Code:

bool visited[MAX_VERTEX_NUM];	//visited-flag array
/*depth-first traversal of graph G starting from vertex v*/
void DFS(Graph G, int v){
	int w;
	visit(v);	//visit the vertex
	visited[v] = TRUE;	//mark as visited
	//FirstNeighbor(G,v): first adjacent vertex of v, or -1 if none.
	//NextNeighbor(G,v,w): next adjacent vertex of v after w, or -1 if none.
	for(w = FirstNeighbor(G, v); w>=0; w=NextNeighbor(G, v, w)){
		if(!visited[w]){	//w is an unvisited adjacent vertex of v
			DFS(G, w);
		}
	}
}
/*depth-first traversal of the whole graph*/
void DFSTraverse(Graph G){
	int v; 
	for(v=0; v<G.vexnum; ++v){
		visited[v] = FALSE;	//initialize the visited flags
	}
	for(v=0; v<G.vexnum; ++v){	//start traversing from v=0
		if(!visited[v]){
			DFS(G, v);
		}
	}
}

2. Complexity analysis: 
(1) The space complexity comes mainly from the recursion stack: O(|V|) in the worst case, O(1) in the best case.
(2) For a graph stored as an adjacency matrix: visiting the |V| vertices takes O(|V|) time, finding the adjacent vertices of each vertex takes O(|V|) time, and there are |V| vertices, so the time complexity is O(|V|^2).
(3) For a graph stored as an adjacency list: visiting the |V| vertices takes O(|V|) time, and scanning all the edge lists takes O(|E|) time in total, so the time complexity is O(|V|+|E|).

3. Depth-first traversal sequence:

  • Depth-first traversal sequence starting from 2: 2, 1, 5, 6, 3, 4, 7, 8
  • Depth-first traversal sequence starting from 3: 3, 4, 7, 6, 2, 1, 5, 8
  • Depth-first traversal sequence starting from 1: 1, 2, 6, 3, 4, 7, 8, 5

4. Depth-first spanning tree:

  • The adjacency matrix representation of the same graph is unique, so the depth-first traversal sequence is unique, and the depth-first spanning tree is also unique;
  • The adjacency list representation of the same graph is not unique, so the depth-first traversal sequence is not unique, and the depth-first spanning tree is not unique.

5. Depth-first spanning forest: 

6.4. Applications of graphs

6.4.1. Minimum spanning tree

1. Spanning tree: The spanning tree of a connected graph is a minimal connected subgraph that contains all the vertices in the graph. If the number of vertices in the graph is n, its spanning tree contains n-1 edges. For a spanning tree, if one of its edges is cut off, it will become a disconnected graph, and if an edge is added, it will form a cycle.

2. Minimum spanning tree (minimum-cost spanning tree): for a weighted connected undirected graph G = (V, E), different spanning trees may have different weights (the sum of the weights of all edges in the tree). Let R be the set of all spanning trees of G; if T is the spanning tree with the smallest total edge weight in R, then T is called a minimum spanning tree (Minimum Spanning Tree, MST) of G.

  • There may be multiple minimum spanning trees, but the sum of edge weights is always unique and minimal.
  • The number of edges of the minimum spanning tree = the number of vertices - 1. If one edge is cut off, there will be no connection, and if an edge is added, a circuit will appear.
  • If a connected graph is itself a tree, then its minimum spanning tree is itself.
  • Only connected graphs have spanning trees, and disconnected graphs only have spanning forests.

3. Two ways to find the minimum spanning tree

  • Prim algorithm
  • Kruskal algorithm 

Prim's algorithm: build the spanning tree outward from some vertex; at each step add the new vertex that can be attached at least cost, until all vertices are included. Time complexity: O(|V|^2); suitable for dense graphs.

Kruskal's algorithm: repeatedly select the remaining edge with the smallest weight whose two endpoints are not yet connected (edges between already-connected vertices are skipped), until all vertices are connected. Time complexity: O(|E| log|E|); suitable for sparse graphs.

                   Prim's algorithm      Kruskal's algorithm
time complexity    O(|V|^2)              O(|E| log|E|)
applicable scene   dense graphs          sparse graphs

6.4.2. Single-source shortest path problem for unweighted graphs——BFS algorithm

1. To solve the single-source shortest path problem on an unweighted graph with BFS, three auxiliary arrays are needed:

  • The d[] array records the length of the shortest path from vertex u to each other vertex.
  • The path[] array records the predecessor of each vertex on its shortest path.
  • The visited[] array records whether each vertex has been visited.

2. Code implementation:

#define MAX_LENGTH 2147483647			//largest distance in the graph, represents +infinity

// shortest paths from vertex u to all other vertices
void BFS_MIN_Distance(Graph G,int u){
    for(i=0; i<G.vexnum; i++){
        visited[i]=FALSE;				//initialize the visited flags
        d[i]=MAX_LENGTH;				//initialize the path lengths
        path[i]=-1;						//initialize the predecessor records
    }
    InitQueue(Q);						//initialize the auxiliary queue
    d[u]=0;
    visited[u]=TRUE;
    EnQueue(Q,u);
    while(!isEmpty(Q)){					//main BFS loop
        DeQueue(Q,u);					//dequeue the head vertex into u
        for(w=FirstNeighbor(G,u);w>=0;w=NextNeighbor(G,u,w)){
            if(!visited[w]){
                d[w]=d[u]+1;
                path[w]=u;
                visited[w]=TRUE;
                EnQueue(Q,w);			//enqueue vertex w
            }
        }
    }
}

6.4.3. Single-source shortest path problem - Dijkstra's algorithm

  1. Limitations of the BFS algorithm: The BFS algorithm for finding the single-source shortest path is only applicable to unweighted graphs, or graphs with the same weight of all edges.
  2. Dijkstra's algorithm can handle the single-source shortest path problem of weighted graphs well, but it is not suitable for weighted graphs with negative weights.
  3. To find the shortest path using Dijkstra's algorithm, three arrays are required:
  • final[]The array is used to mark whether each vertex has found the shortest path.
  • dist[]The array is used to record the shortest path length from each vertex to the source vertex.
  • path[]The array is used to record the predecessors of each vertex on the shortest path.

Code

#define MAX_LENGTH 2147483647

// shortest paths from vertex u to all other vertices (Dijkstra)
void Dijkstra(Graph G,int u){
    for(int i=0; i<G.vexnum; i++){		//initialize the arrays
        final[i]=FALSE;
        dist[i]=G.edge[u][i];
        if(G.edge[u][i]==MAX_LENGTH || G.edge[u][i] == 0)
            path[i]=-1;
        else
            path[i]=u;
    }
    final[u]=TRUE;

  	for(int i=0; i<G.vexnum-1; i++){
        int MIN=MAX_LENGTH;
        int v=-1;
		// among the vertices whose shortest path is not yet fixed, find the one with minimum dist
        for(int j=0; j<G.vexnum; j++){
	        if(final[j]!=TRUE && dist[j]<MIN){
 	            MIN = dist[j];
                v = j;
            }
        }
        if(v==-1) break;				//the remaining vertices are unreachable
        final[v]=TRUE;
        // check whether going through v shortens the path to any unfixed vertex
        for(int j=0; j<G.vexnum; j++){
	        if(final[j]!=TRUE && dist[v]!=MAX_LENGTH && dist[j]>dist[v]+G.edge[v][j]){
            	dist[j] = dist[v]+G.edge[v][j];
                path[j] = v;
            }
        }
	}
}

6.4.4. The shortest path problem between vertices - Floyd's algorithm

  1. Floyd's algorithm: find the shortest path between each pair of vertices, and use dynamic programming ideas to divide the solution of the problem into multiple stages.

  2. The Floyd algorithm can be used for negative weighted graphs, but it cannot solve graphs with "negative weight loops" (edges with negative weights form loops), and such graphs may not have the shortest path.

  3. Floyd's algorithm uses two matrices:

    1. dist[][]: the current shortest path length between each pair of vertices.
    2. path[][]: an intermediate (transfer) vertex on the path between two vertices.
  4. Code:

int dist[MaxVertexNum][MaxVertexNum];
int path[MaxVertexNum][MaxVertexNum];

void Floyd(MGraph G){
	int i,j,k;
    // initialization
	for(i=0;i<G.vexnum;i++){
		for(j=0;j<G.vexnum;j++){
			dist[i][j]=G.Edge[i][j];		
			path[i][j]=-1;
		}
	}
    // core of the algorithm
	for(k=0;k<G.vexnum;k++){
		for(i=0;i<G.vexnum;i++){
			for(j=0;j<G.vexnum;j++){
	   	    	if(dist[i][j]>dist[i][k]+dist[k][j]){	//a shorter path via k was found
	   		    	dist[i][j]=dist[i][k]+dist[k][j];
	   		    	path[i][j]=k;
                }
			}
        }
    }
}

4. Shortest path algorithm comparison:

                              BFS algorithm              Dijkstra's algorithm       Floyd's algorithm
unweighted graph              yes                        yes                        yes
weighted graph                no                         yes                        yes
negative edge weights         no                         no                         yes
negative-weight cycles        no                         no                         no
time complexity               O(|V|^2) or O(|V|+|E|)     O(|V|^2)                   O(|V|^3)
usually used for              single-source shortest     single-source shortest     shortest paths between all
                              paths, unweighted graph    paths, weighted graph      vertex pairs, weighted graph

6.4.5. Directed Acyclic Graph Description Expression

1. Directed acyclic graph : If there is no cycle in a directed graph, it is called a directed acyclic graph, or DAG graph (Directed Acyclic Graph) for short.

DAG description expression: ((a+b)*(b*(c+d))+(c+d)*e)*((c+d)*e)

2. The problem-solving steps of the expression describing the directed acyclic graph:

  • Step 1: Arrange the operands in a row without repetition
  • Step 2: Mark the effective order of each operator (it doesn't matter if the order is a bit different)
  • Step 3: Add operators in order, pay attention to "layering"
  • Step 4: Check whether the operators on the same layer can be combined from bottom to top

6.4.6. Topological sort

1. AOV network (Activity On Vertex network): a project represented by a DAG (directed acyclic graph) in which vertices represent activities, and the directed edge <Vi, Vj> indicates that activity Vi must be performed before activity Vj.

2. Topological sorting : In graph theory, a sequence composed of vertices of a directed acyclic graph is called a topological sorting of the graph if and only if the following conditions are met:

  • Each vertex appears exactly once;
  • If vertex A is in front of vertex B in the sequence, then there is no path from vertex B to vertex A in the graph.
  • Or defined as: Topological sorting is a sorting of the vertices of the directed acyclic graph, which makes if there is a path from vertex A to vertex B, then vertex B appears behind vertex A in the sorting. Each AOV network has one or more topological sorting sequences.
     

3. Implementation of topological sorting:

  • Select a vertex with no predecessor (in-degree 0) from the AOV network and output
  • Remove this vertex and all edges originating from it from the network.
  • Repeat 1 and 2 until the current AOV network is empty or there are no predecessor-free vertices in the current network.

4. Code to implement topological sorting (adjacency list implementation):

#define MaxVertexNum 100			//maximum number of vertices in the graph

typedef struct ArcNode{				//edge-list node
    int adjvex;						//position of the vertex this arc points to
    struct ArcNode *nextarc;		//pointer to the next arc
}ArcNode;

typedef struct VNode{				//vertex-table node
    VertexType data;				//vertex information
    ArcNode *firstarc;				//pointer to the first arc attached to this vertex
}VNode,AdjList[MaxVertexNum];

typedef struct{
    AdjList vertices;				//adjacency list
    int vexnum,arcnum;				//numbers of vertices and arcs
}Graph;								//Graph is a graph stored as an adjacency list

// topological sort of graph G
bool TopologicalSort(Graph G){
    int i,v,count=0;				//count records how many vertices have been output
    ArcNode *p;
    InitStack(S);					//initialize a stack holding vertices with in-degree 0
    for(i=0;i<G.vexnum;i++){
        if(indegree[i]==0)
            Push(S,i);				//push all vertices with in-degree 0
    }
    while(!IsEmpty(S)){				//while the stack is not empty
        Pop(S,i);					//pop the top vertex
        print[count++]=i;			//output vertex i
        for(p=G.vertices[i].firstarc;p;p=p->nextarc){
            //decrement the in-degree of every vertex i points to; push those that reach 0
            v=p->adjvex;
            if(!(--indegree[v]))
                Push(S,v);			//in-degree has become 0: push
        }
    }
    if(count<G.vexnum)
        return false;				//sort failed (the graph contains a cycle)
    else
        return true;				//sort succeeded
}

6.4.7. Critical path

1. AOE network: in a weighted directed graph, let vertices represent events, directed edges represent activities, and edge weights represent the cost of completing the activities (such as the time required); such a network, which uses edges to represent activities, is called an AOE network (Activity On Edge network).

2. The AOE network has the following two properties:

  1. Only after the event represented by a vertex occurs, the activities represented by the directed edges starting from the vertex can start;
  2. The event represented by a vertex can only occur when the activities represented by all the directed edges entering a vertex have ended. Also, some activities can be done in parallel.

3. In the AOE network, there is only one vertex with an in-degree of 0, called the start vertex (source point) , which represents the beginning of the entire project; there is only one vertex with an out-degree of 0, called the end vertex (Sink), which signifies the end of the entire project.

  • There may be multiple directed paths from the source point to the sink point. Among all paths, the path with the largest path length is called the critical path, and the activities on the critical path are called critical activities.
  • The shortest time to complete the entire project is the length of the critical path. If the key activities cannot be completed on time, the completion time of the entire project will be extended.

4. Finding the earliest and latest occurrence times of events and activities:
1. Earliest occurrence time ve() of every event: following a topological order, compute ve(k) for each vertex in turn, starting from ve(source) = 0 and taking the maximum over incoming arcs.
2. Latest occurrence time vl() of every event: following a reverse topological order, compute vl(k) for each vertex in turn, working backwards from the sink (for the sink, vl = ve) and taking the minimum over outgoing arcs.
3. Earliest start time e() of every activity: if arc <vk, vj> represents activity ai, then e(i) = ve(k).
4. Latest start time l() of every activity: if arc <vk, vj> represents activity ai, then l(i) = vl(j) - Weight(vk, vj).
5. Slack (time margin) d() of every activity: d(i) = l(i) - e(i), the latest start time minus the earliest start time; activities with d(i) = 0 are the critical activities.

5. Characteristics of key activities and critical paths:
If the time consumption of key activities increases, the duration of the entire project will increase.
Reducing the time of key activities can shorten the duration of the entire project.
Critical activities may become non-critical when shortened to a certain extent.
There may be multiple critical paths, and only increasing the speed of critical activities on one critical path cannot shorten the duration of the entire project. Only by speeding up the critical activities included in all critical paths can the purpose of shortening the duration be achieved.


Chapter 7 Search

7.1 Search concepts

  • Search: the process of finding the data elements that satisfy given conditions in a data set.
  • Lookup table (search structure): the data collection used for searching; it consists of data elements (or records) of the same type.
  • Keyword (key): the value of a data item in a data element that uniquely identifies that element. A keyword-based search should return a unique result.
  • Common operations on lookup tables:
    1. Find eligible data elements
    2. Insert, delete a data element


  • Search length: In the search operation, the number of times that keywords need to be compared is called the search length.
  • Average search length (ASL, Average Search Length):  the average value of the number of keyword comparisons during all search processes.

7.2 Sequential search

  • Sequential search, also known as "linear search", is usually applied to linear lists.

  • Idea: scan from head to tail (or the reverse), comparing each element's keyword with key.

Code: 

typedef struct{				//查找表的数据结构(顺序表)
    ElemType *elem;			//动态数组基址
    int TableLen;			//表的长度
}SSTable;

//顺序查找
int Search_Seq(SSTable ST,ElemType key){
    int i;
    for(i=0;i<ST.TableLen && ST.elem[i]!=key;++i);
    // 查找成功返回数组下标,否则返回-1
    	return i==ST.TableLen ? -1 : i;
}

Sentinel mode code implementation:

typedef struct{				//查找表的数据结构(顺序表)
    ElemType *elem;			//动态数组基址
    int TableLen;			//表的长度
}SSTable;

//顺序查找
int Search_Seq(SSTable ST,ElemType key){
    ST.elem[0]=key;
    int i;
    for(i=ST.TableLen;ST.elem[i]!=key;--i);	//从后往前找,哨兵保证循环必然终止
    // 查找成功返回数组下标,否则返回0
    return i;
}

 Analyzing ASL with a search decision tree

  • The search length of a successful node = the layer it is on

  • The search length of a failed node = the layer its parent node is on

  • By default, all success and failure outcomes are assumed equally probable

7.3 Binary search 

【Binary search concept】

  • Binary search, also called "half-interval search", is applicable only to ordered sequential lists

Binary search code implementation: 

typedef struct{
    ElemType *elem;
    int TableLen;
}SSTable;

// 折半查找
int Binary_Search(SSTable L,ElemType key){
    int low=0,high=L.TableLen-1,mid;
    while(low<=high){
        mid=(low+high)/2;
        if(L.elem[mid]==key)
            return mid;
        else if(L.elem[mid]>key)
            high=mid-1;					//从前半部分继续查找
        else
            low=mid+1;					//从后半部分继续查找
    }
    return -1;
}
  • Construction of the binary-search decision tree: mid=\lfloor (low+high)/2 \rfloor. If there is an odd number of elements between the current low and high, then after mid is removed the left and right parts have equally many elements; if there is an even number of elements, then after mid is removed the left half has one element fewer than the right half.
  • In the decision tree of binary search, if mid=\lfloor (low+high)/2 \rfloor, then for any node: the number of nodes in its right subtree minus the number in its left subtree = 0 or 1.
  • The decision tree of binary search must be a balanced binary tree; only the bottom layer may be incomplete. Therefore, when the number of elements is n, the tree height is h=\lceil \log_{2}(n+1) \rceil (failure nodes not counted).
  • Decision tree node keywords: left < middle < right, satisfying the definition of a binary search tree. Failure nodes: n+1 of them (equal to the number of empty link fields of the successful nodes).
  • Search efficiency of binary search: time complexity = O(\log_{2}n).

7.4 Block search

The setting for block search: elements are unordered within each block, but the blocks are ordered among themselves.


 Index table and sequence table code

// 索引表
typedef struct{
    ElemType maxValue;
    int low,high;
}Index;

// 顺序表存储实际元素
ElemType List[100];
  • The block containing the target keyword can be located in the index table by either sequential search or binary search.
  • If binary search is used and the target keyword is absent from the index table, the search eventually stops with low > high; the target keyword must then be searched for in the block pointed to by low.
  • Lookup efficiency analysis (ASL): suppose a lookup table of length n is evenly divided into b blocks of s elements each. Then ASL = L_I + L_S, where L_I is the average search length in the index table and L_S is the average search length within a block.
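Putting the index table and the sequence table together, a block search might look like the following (a minimal sketch; the function BlockSearch, its parameters, and the test values are our own illustration, not from the notes):

```c
#include <assert.h>

typedef int ElemType;

typedef struct {
    ElemType maxValue;   // largest keyword in the block
    int low, high;       // index range of the block inside List[]
} Index;

// Binary-search the index table idx[0..b-1] for the first block whose
// maxValue >= key, then sequentially search that block in List[].
// Returns the position in List[] on success, -1 on failure.
int BlockSearch(Index idx[], int b, ElemType List[], ElemType key) {
    int low = 0, high = b - 1, mid;
    while (low <= high) {
        mid = (low + high) / 2;
        if (idx[mid].maxValue >= key)
            high = mid - 1;
        else
            low = mid + 1;
    }
    if (low >= b)                       // key larger than every block's max
        return -1;
    for (int i = idx[low].low; i <= idx[low].high; i++)
        if (List[i] == key)
            return i;
    return -1;                          // absent from the block it belongs to
}
```

Note how the failed binary search on the index table ends with low pointing at the block to scan, exactly as described above.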

7.5 Red-black trees

7.5.1 Why was the red-black tree invented?

  • Balanced binary tree (AVL): insertion/deletion can easily destroy the "balance" property, requiring frequent adjustment of the tree's shape. For example, if an insertion causes imbalance, you must first compute balance factors and find the minimum unbalanced subtree (a large time overhead), then perform an LL/RR/LR/RL adjustment;
  • Red-black tree RBT: Insertion/deletion will not destroy the "red-black" feature in many cases, and there is no need to frequently adjust the shape of the tree. Even if adjustment is required, it can generally be completed in constant time;
  • Balanced binary tree: suitable for scenarios that are mainly searched and rarely inserted/deleted;
    red-black tree: suitable for scenarios with frequent insertion and deletion, and is more practical.

7.5.2 Definition of red-black tree

  1. Must be a binary sorting tree and each node is either red or black;
  2. The root node is black;
  3. Leaf nodes (external nodes, NULL nodes, failure nodes) are all black;
  4. There are no two adjacent red nodes (that is, the parent node and child node of the red node are both black);
  5. For each node, the simple path from the node to any leaf node contains the same number of black nodes.
  • Black height bh of a node -- the total number of black nodes on the path from a node (excluding this node) to any empty leaf node.
  • Property 1: The longest path from the root node to the leaf node is not greater than twice the shortest path.
  • Property 2: The height of a red-black tree with n internal nodes h\leqslant 2\log_{2}(n+1).

7.5.3 Red-black tree insertion

  • Search first, determine the insertion position (the principle is the same as the binary sorting tree), and insert a new node
  • If the new node is the root, color it black
  • A non-root new node is colored red
    • If the red-black tree definition still holds after inserting the new node, the insertion ends
    • If the definition no longer holds, adjust until it holds again 
    • Black uncle: rotate + recolor
      • LL case: right single rotation, the parent replaces the grandparent + recolor
      • RR case: left single rotation, the parent replaces the grandparent + recolor
      • LR case: left-then-right double rotation, the new node replaces the grandparent + recolor
      • RL case: right-then-left double rotation, the new node replaces the grandparent + recolor
    • Red uncle: recolor + continue upward
      • Color the parent and uncle black and the grandparent red; then treat the grandparent as the new node and re-check 

7.6 B-Trees and B+Trees

7.6.1 B-tree 

B-tree, also known as a multi-way balanced search tree, the maximum number of children of all nodes in the B-tree is called the order of the B-tree, usually denoted by m. An m-order B-tree is either an empty tree or an m-ary tree that satisfies the following characteristics:

  • Each node in the tree has at most m subtrees, that is, at most m-1 keywords.
  • If the root node is not a terminal node, there are at least two subtrees.
  • All non-leaf nodes except the root have at least ⌈m/2⌉ subtrees, i.e. at least ⌈m/2⌉ - 1 keywords. (To guarantee search efficiency, each node should not have too few keywords.)
  • All leaf nodes appear on the same level, and have no information (can be regarded as external nodes or search failure nodes similar to the half search decision tree, in fact these nodes do not exist, pointing to these nodes pointer is null).

The core characteristics of the m-order B-tree:

  • The number of subtrees of the root node ∈ [2, m]; the number of keywords ∈ [1, m−1].
  • The number of subtrees of every other node ∈ [⌈m/2⌉, m]; the number of keywords ∈ [⌈m/2⌉−1, m−1].
  • For any node, all its subtrees have the same height.
  • Keyword value: Subtree 0 < Keyword 1 < Subtree 1 < Keyword 2 < Subtree 2 <... (similar to binary search tree left<middle<right)

The height of a B-tree: for an m-order B-tree with n keywords, what are the minimum and maximum heights?

  • \log_{m}(n+1) \leq h \leq \log_{\lceil m/2 \rceil}\frac{n+1}{2} + 1
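As a worked check of this bound (our own numbers, not from the notes): for a 5-order B-tree with n = 24 keywords,

```latex
\log_{5}(24+1) = 2 \;\leq\; h \;\leq\; \log_{\lceil 5/2 \rceil}\frac{24+1}{2} + 1 = \log_{3}12.5 + 1 \approx 3.3
```

so, since heights are integers, the height can only be 2 or 3.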

7.6.2 Basic operation of B-tree

B-tree lookup :

  • The search operation of B-tree is similar to that of binary search tree. B-tree search includes two basic operations: ① find nodes in B-tree; ② find keywords in nodes. B-trees are often stored on disk, so the previous lookup operation is performed on disk, and the latter lookup operation is performed in memory. After a node is found in the B-tree, it is first searched in the ordered table, if found, the search is successful, otherwise it is searched in the pointed subtree according to the corresponding pointer information. If a leaf node is found (the corresponding pointer is a null pointer), it means that there is no corresponding keyword in the tree, and the search fails.

B-tree insertion: the process of inserting the keyword key into the B-tree: 

  • Positioning: use the B-tree search algorithm to find the terminal node (the lowest non-failure node) where the keyword should be inserted. (The insertion position is always such a terminal node!)
  • Insertion: in an m-order B-tree, the number of keywords of each non-root node must stay in the interval [⌈m/2⌉−1, m−1]. If the node still has fewer than m keywords after inserting key, the insertion is direct; otherwise the node must be split.
  • Splitting: split the node's keywords from the middle position ⌈m/2⌉: the keywords in the left half stay in the original node, the keywords in the right half go into a new node, and the middle keyword (position ⌈m/2⌉) is inserted into the parent of the original node. If the parent's keyword count now also exceeds the upper limit, the parent splits in turn; this can propagate up to the root, increasing the height of the B-tree.

B-tree deletion:

  1. Deletion of non-terminal nodes: Use direct predecessors or direct successors to replace the deleted keywords, and convert to delete terminal nodes.
  2. The deletion of terminal nodes can be divided into three situations
  • Delete directly: if the number of keywords in the node containing the keyword to be deleted is ≥ ⌈m/2⌉, the keyword can be deleted directly.
  • A sibling can lend: if the node's keyword count = ⌈m/2⌉−1 and an adjacent left (or right) sibling has ≥ ⌈m/2⌉ keywords, rebalance by rotating keywords among the node, the sibling, and their parent (parent-child transposition).
  • No sibling can lend: if the node's keyword count = ⌈m/2⌉−1 and the adjacent left (or right) sibling also has only ⌈m/2⌉−1 keywords, merge the node, the sibling, and the separating keyword from their parent.
     

7.6.3 B+ tree

An m-order B+ tree must meet the following conditions: 

  • Each branch node has at most m subtrees (child nodes).
  • A non-leaf root node has at least two subtrees; every other branch node has at least ⌈m/2⌉ subtrees.
  • The number of subtrees of nodes is the same as the number of keywords.
  • All leaf nodes contain all keywords and pointers to corresponding records. In the leaf nodes, the keywords are arranged in order of size, and adjacent leaf nodes are linked to each other in order of size. (Indicating that B+ tree supports sequential search)
  • Each branch node contains only the maximum keyword of each of its subtrees and pointers to its child nodes.

7.6.4 Comparison of B-tree and B+-tree 

(1) m-order B-tree: n keywords in a node correspond to n+1 subtrees;
         m-order B+ tree: n keywords in a node correspond to n subtrees.
(2) m-order B-tree: the root node has n \in [1, m-1] keywords; other nodes have n \in [\lceil m/2 \rceil - 1, m-1];
         m-order B+ tree: the root node has n \in [1, m] keywords; other nodes have n \in [\lceil m/2 \rceil, m].
(3) m-order B-tree: the keywords contained in the nodes are not repeated;
         m-order B+ tree: the leaf nodes contain all keywords; keywords that appear in non-leaf nodes also appear in the leaves.
(4) m-order B-tree: nodes contain the storage addresses of the records corresponding to their keywords;
         m-order B+ tree: only leaf nodes carry record information; non-leaf nodes serve purely as an index. Each index entry in a non-leaf node holds only the maximum keyword of the corresponding subtree and a pointer to that subtree, not the storage address of any record.

7.7 Hash lookup and its performance analysis

7.7.1 Basic concept of hash table

  • Hash function: a function that maps the keyword in the lookup table to the storage address corresponding to that keyword, denoted Hash(key) = Addr.
  • The hash function may map two or more different keywords to the same address; this is called a collision, and the different keywords that collide are called synonyms.
  • Hash table: A data structure that is directly accessed based on keywords. The hash table establishes a direct mapping relationship between keywords and storage addresses.

How to construct a hash function

  1. Direct addressing: take a linear function of the keyword as the hash address, H(key) = key or H(key) = a×key + b. Computation is simple and causes no collisions; the drawback is that sparse keys leave many vacant slots and waste storage space.
  2. Division (remainder) method: if the hash table has length m, take a prime p not greater than but closest to m, and use H(key) = key % p to convert the keyword into a hash address. Choosing p prime makes it more likely that keywords map to the address space with equal probability.
  3. Digit analysis: suppose the keyword is an r-ary number; the r digit values do not necessarily occur with equal frequency in every position. Some positions are evenly distributed and some are not; select several evenly distributed positions to form the hash address.
  4. Mid-square method: take the middle digits of the square of the keyword as the hash address; how many digits depends on the situation. The result depends on every digit of the keyword, so hash addresses are distributed fairly evenly. Suitable when the keyword values are not evenly distributed or have fewer digits than the hash address requires.
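As a tiny illustration of the division method (our own numbers; the prime p = 13 and the function name Hash are assumptions for illustration):

```c
#include <assert.h>

#define P 13   // a prime not greater than the (assumed) table length m = 13

// Division (remainder) method from the notes: H(key) = key % p
int Hash(int key) {
    return key % P;
}
```

Here 14 and 27 both map to address 1, i.e. they collide and are synonyms.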

7.7.2 Hash search and performance analysis

The hash lookup execution steps are as follows

  1. Compute the initial address: Addr = Hash(key)
  2. Check whether the lookup table has a record at address Addr. If there is no record, return search failure; if there is a record, compare it with key: if equal, return search success, otherwise go to step 3.
  3. Compute the "next hash address" with the chosen conflict-handling method, set Addr to that address, and go back to step 2.
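These steps can be sketched with linear probing as the conflict-handling method (a minimal sketch; the table size M, the EMPTY marker, and the function name HashSearch are assumptions for illustration):

```c
#include <assert.h>

#define M 13      // assumed table length
#define EMPTY -1  // assumed marker for an empty slot

// Hash lookup with linear probing: on a collision the
// "next hash address" is (addr + 1) % M.
int HashSearch(int table[], int key) {
    int addr = key % M;                      // step 1: Addr = Hash(key)
    for (int i = 0; i < M; i++) {
        int probe = (addr + i) % M;
        if (table[probe] == EMPTY)           // step 2: no record -> failure
            return -1;
        if (table[probe] == key)             // step 2: equal -> success
            return probe;
    }                                        // step 3: otherwise keep probing
    return -1;                               // table full and key absent
}
```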

Average search length (ASL): for a successful hash search, the average number of key comparisons needed to find the entries present in the table; for a failed search, the average number of comparisons made before concluding the entry is absent (i.e. until an empty slot / insertion position is reached).

Chapter 8 Sorting

8.1. Basic concepts of sorting

  1. Sorting: The process of rearranging the elements in the table so that the elements in the table meet the order of keywords.
  2. Input: n records R_{1},R_{2},...,R_{n}, with corresponding keywords k_{1},k_{2},...,k_{n}.     
  3. Output: a rearrangement R_{1}^{'},R_{2}^{'},...,R_{n}^{'} of the input sequence such that k_{1}^{'}\leq k_{2}^{'}\leq ...\leq k_{n}^{'} .
  4. Algorithm stability: if the list to be sorted contains two elements R_{i} and R_{j} with equal keywords (key_{i} = key_{j}), and R_{i} precedes R_{j} before sorting, then a sorting algorithm is stable if R_{i} still precedes R_{j} after sorting; otherwise the algorithm is unstable.
  5. Evaluation indicators for sorting algorithms: time complexity, space complexity, and stability.
  6. Classification of sorting algorithms:
    Internal sorting: elements are in memory during sorting - focus on how to make time and space complexity lower.
    External sorting: during sorting, all elements cannot be held in memory at the same time; they must be moved between internal and external memory as required - the focus is on lowering time and space complexity and on reducing the number of disk reads/writes.

8.2. Insertion sort 

8.2.1. Direct insertion sort

  • Algorithm idea: Each time a record to be sorted is inserted into the previously sorted subsequence according to its key size until all records are inserted.

Code implementation (without sentinels): 

// 对A[]数组中共n个元素进行插入排序
void InsertSort(int A[],int n){
    int i,j,temp;
    for(i=1; i<n; i++){
        if(A[i]<A[i-1]){    	//如果A[i]关键字小于前驱
            temp=A[i];  
            for(j=i-1; j>=0 && A[j]>temp; --j)
                A[j+1]=A[j];    //所有大于temp的元素都向后挪
            A[j+1]=temp;
        }
    }
}

Code implementation (with sentry):

// 对A[]数组中共n个元素进行插入排序
void InsertSort(int A[], int n){
    int i,j;
    for(i=2; i<=n; i++){
        if(A[i]<A[i-1]){
            A[0]=A[i];     	//复制为哨兵,A[0]不放元素
            for(j=i-1; A[0]<A[j]; --j)
                A[j+1]=A[j];
            A[j+1]=A[0];
        }
    }
}
  • Algorithm efficiency analysis:
    • Time complexity: best case O(n), worst case O( n^{2}), average case O( n^{2}).
    • Space complexity: O(1).
    • Algorithm stability: stable.
    • Applicability: Linear tables suitable for sequential storage and chain storage.

Insertion sort code implementation for linked list:

//对链表L进行插入排序
void InsertSort(LinkList &L){
    LNode *p=L->next, *pre;
    LNode *r=p->next;
    p->next=NULL;
    p=r;
    while(p!=NULL){
        r=p->next;
        pre=L;
        while(pre->next!=NULL && pre->next->data<p->data)
            pre=pre->next;
        p->next=pre->next;
        pre->next=p;
        p=r;
    }
}

8.2.2. Binary insertion sort

  • Algorithm idea: Each time a record to be sorted is sorted according to its key size, use the binary search to find the position that should be inserted in the previous subsequence and insert it until all records are inserted.
  • Note: the binary search does not stop until low>high. When the element pointed to by mid is equal to the current element, continue with low=mid+1 to ensure "stability". Finally, insert the current element at the position pointed to by low (i.e. high+1).

Code:

//对A[]数组中共n个元素进行折半插入排序
void InsertSort(int A[], int n){ 
    int i,j,low,high,mid;
    for(i=2; i<=n; i++){
        A[0]=A[i];    		     	 //将A[i]暂存到A[0]
            low=1; high=i-1;
        while(low<=high){            //折半查找
            mid=(low+high)/2;
            if(A[mid]>A[0])
                high=mid-1;
            else
                low=mid+1;
        }
        for(j=i-1; j>high+1; --j)
            A[j+1]=A[j];
        A[high+1]=A[0];
    }
}
  • Compared with direct insertion sort, the number of comparison keys is reduced, but the number of moving elements is unchanged. The time complexity is still O(n²).

8.2.3. Hill sort 

  • Algorithm idea: first pursue the partial order of the elements in the table, and then gradually approach the global order to reduce the time complexity of the insertion sort algorithm.
  • Concrete implementation: first split the table to be sorted into several "special" subtables L[i, i+d, i+2d, ..., i+kd] and perform direct insertion sort on each subtable; then decrease the increment d and repeat the process until d=1.

Hill sort code implementation: 

// 对A[]数组共n个元素进行希尔排序
void ShellSort(ElemType A[], int n){
    int d,i,j;
    for(d=n/2; d>=1; d=d/2){  	//步长d递减
        for(i=d+1; i<=n; ++i){
            if(A[i]<A[i-d]){
                A[0]=A[i];		//A[0]做暂存单元,不是哨兵
                for(j=i-d; j>0 && A[0]<A[j]; j-=d)
                    A[j+d]=A[j];
                A[j+d]=A[0];
            }
		}
    }
}
  • Algorithm efficiency analysis:
    • Time complexity: depends on the chosen increment sequence. O(n^{2}) in the worst case; about O(n^{1.3}) when n is within a certain range.
    • Space complexity: O(1)
    • Algorithm stability: unstable.

8.3. Swap Sort 

8.3.1. Bubble sort

  • Algorithm idea: Compare the values ​​of adjacent elements two by two from back to front (or from front to back), and if they are in reverse order (ie A [ i − 1 ] > A [ i ]), exchange them until the sequence comparison is complete. By repeating this up to n-1 times of bubbling, all elements can be sorted. To ensure stability, elements with the same key are not exchanged.

Bubble sort code implementation:

// 交换a和b的值
void swap(int &a, int &b){
    int temp=a;
    a=b;
    b=temp;
}

// 对A[]数组共n个元素进行冒泡排序
void BubbleSort(int A[], int n){
    for(int i=0; i<n-1; i++){
        bool flag = false; 			//标识本趟冒泡是否发生交换
        for(int j=n-1; j>i; j--){
            if(A[j-1]>A[j]){
                swap(A[j-1],A[j]);
                flag=true;
            }
        }
        if(flag==false)
            return;       //若本趟遍历没有发生交换,说明已经有序
    }
}
  • Algorithm efficiency analysis:
    • Time complexity: best case O(n), worst case O( n^{2}), average case O( n^{2}).
    • Space complexity: O(1).
    • Stability: Stable.
    • Applicability: Bubble sorting can be used in sequential lists and linked lists.

8.3.2. Quick Sort 

  • Algorithm idea: choose one element of the list to be sorted L[1...n] as the pivot (usually the first element) and partition the list into two independent parts L[1...k−1] and L[k+1...n], such that all elements of L[1...k−1] are smaller than the pivot and all elements of L[k+1...n] are greater than or equal to it; the pivot is then placed in its final position L[k]. Repeat this process on each part until every part has at most one element.
  • Quicksort has the best average performance of all internal sorting algorithms.
  • Each pass of quicksort places its pivot element in its final position. (This can be used to determine how many passes have been performed.)
  • Quicksort can be viewed as organizing the n elements into a binary tree: the pivot of each pass is the root of a subtree, and the number of recursion layers equals the number of layers of this tree.

Quick sort code implementation:

// 用第一个元素将数组A[]划分为两个部分
int Partition(int A[], int low, int high){
    int pivot = A[low];
    while(low<high){
        while(low<high && A[high]>=pivot)
            --high;
        A[low] = A[high];
        while(low<high && A[low]<=pivot) 
            ++low;
        A[high] = A[low];
    }
    A[low] = pivot;
    return low;
} 

// 对A[]数组的low到high进行快速排序
void QuickSort(int A[], int low, int high){
    if(low<high){
        int pivotpos = Partition(A, low, high);  //划分
        QuickSort(A, low, pivotpos - 1);
        QuickSort(A, pivotpos + 1, high);
    }
}
  • Algorithm efficiency analysis:
    • Time complexity: time complexity of quicksort = O(n × number of recursion layers). Best case O(nlog_{2}n), worst case O(n^{2}), average case O(nlog_{2}n).
    • Space complexity: space complexity of quicksort = O(number of recursion layers). Best case O(log_{2}n), worst case O(n), average case O(log_{2}n).

8.4. Selection sort 

  • Selection sorting idea: In each pass, select the element with the smallest (or largest) keyword among the elements to be sorted and add it to the ordered subsequence.

8.4.1. Simple selection sort 

  • Algorithm idea: In each pass, select the element with the smallest key among the elements to be sorted and exchange the position with the first element among the elements to be sorted.

Simple selection sort code implementation:

// 交换a和b的值
void swap(int &a, int &b){
    int temp = a;
    a = b;
    b = temp;
}

// 对A[]数组共n个元素进行选择排序
void SelectSort(int A[], int n){
    for(int i=0; i<n-1; i++){          	//一共进行n-1趟,i指向待排序序列中第一个元素
        int min = i;
        for(int j=i+1; j<n; j++){		//在A[i...n-1]中选择最小的元素
            if(A[j]<A[min])
                min = j;
        }
        if(min!=i)                     
            swap(A[i], A[min]);
    }
}
  • Algorithm efficiency analysis:
    • Time complexity: whether the initial sequence is ordered, reversed, or random, n-1 passes are performed, requiring (n−1)+(n−2)+...+1 = n(n−1)/2 keyword comparisons in total, so the time complexity is always O(n^{2}).
    • Space complexity: O(1).
    • Stability: unstable.
    • Applicability: linear lists with either sequential or linked storage.
       

Simple selection sort on a linked list:

void selectSort(LinkList &L){
    LNode *h=L,*p,*q,*r,*s;
    L=NULL;
    while(h!=NULL){
        p=s=h; q=r=NULL;
        while(p!=NULL){
            if(p->data>s->data){
                s=p; r=q;
            }
            q=p; p=p->next;
        }
        if(s==h)
            h=h->next;
        else
            r->next=s->next;
        s->next=L; L=s;
    }
}

8.4.2. Heap sort

  • Algorithm idea: first build the n elements stored in L[1...n] into an initial (max-root) heap; by the heap property, the top element is the maximum. Swap the top element with the bottom element of the heap, so the maximum of the sequence to be sorted is now in its final position. The remaining elements no longer satisfy the max-heap property, so sift the new top element down to restore it; repeat until only one element remains in the heap.
  • In a sequentially stored complete binary tree:
    • non-terminal nodes are numbered i ≤ ⌊n/2⌋
    • the children of i are 2i and 2i+1
    • the parent of i is ⌊i/2⌋

Heap sort code implementation:

// 对初始序列建立大根堆
void BuildMaxHeap(int A[], int len){
    for(int i=len/2; i>0; i--) 		//从后往前调整所有非终端结点
        HeadAdjust(A, i, len);
}

// 将以k为根的子树调整为大根堆
void HeadAdjust(int A[], int k, int len){
    A[0] = A[k];
    for(int i=2*k; i<=len; i*=2){	//沿k较大的子结点向下调整
        if(i<len && A[i]<A[i+1])	
            i++;
        if(A[0] >= A[i])
            break;
        else{
            A[k] = A[i];			//将A[i]调整至双亲结点上
            k=i;					//修改k值,以便继续向下筛选
        }
    }
    A[k] = A[0];
}

// 交换a和b的值
void swap(int &a, int &b){
    int temp = a;
    a = b;
    b = temp;
}

// 对长为len的数组A[]进行堆排序
void HeapSort(int A[], int len){
    BuildMaxHeap(A, len);         	//初始建立大根堆
    for(int i=len; i>1; i--){      	//n-1趟的交换和建堆过程
        swap(A[i], A[1]);
        HeadAdjust(A,1,i-1);
    }
}
  • Algorithm efficiency analysis:
    • Time complexity: O(n\log_{2}n). Building the heap takes O(n); the following n-1 sift-down adjustments each take O(\log_{2}n).
    • Space complexity: O(1).
    • Stability: unstable.
       
  • Heap insertion: for a max- (or min-) root heap, place the element to be inserted at the end of the table, then compare it with its parent; if the new element is larger (or smaller), swap the two. The new element keeps "rising" in this way until it can rise no further.

  • Heap deletion: replace the deleted element with the bottom element of the heap, then let that element keep "sinking" until it can sink no further.
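The heap insertion described above can be sketched as follows (a minimal sketch; the function name HeapInsert, the 1-based array convention matching the heapsort code, and the test values are our own assumptions):

```c
#include <assert.h>

// Insert x into a max-heap stored in A[1..*len]: place the new element at
// the end, then swap it with its parent while it is larger ("rising").
void HeapInsert(int A[], int *len, int x) {
    A[++(*len)] = x;                     // put the new element at the end
    int i = *len;
    while (i > 1 && A[i] > A[i / 2]) {   // rise while larger than the parent
        int t = A[i]; A[i] = A[i / 2]; A[i / 2] = t;
        i /= 2;
    }
}
```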

8.5. Merge sort

  • Merge: combine two or more already ordered sequences into one new ordered list. In a k-way merge, selecting each element requires k-1 keyword comparisons.
  • Algorithm idea: keep splitting the array in two until each group has a single element, then merge the sorted groups pairwise, repeating the pairwise merging until the whole array is sorted.

  

Code implementation:

// 辅助数组B
int *B=(int *)malloc(n*sizeof(int));

// A[low,...,mid],A[mid+1,...,high]各自有序,将这两个部分归并
void Merge(int A[], int low, int mid, int high){
    int i,j,k;
    for(k=low; k<=high; k++)
        B[k]=A[k];
    for(i=low, j=mid+1, k=i; i<=mid && j<= high; k++){
        if(B[i]<=B[j])
            A[k]=B[i++];
        else
            A[k]=B[j++];
    }
    while(i<=mid)
        A[k++]=B[i++];
    while(j<=high) 
        A[k++]=B[j++];
}

// 递归操作
void MergeSort(int A[], int low, int high){
    if(low<high){
        int mid = (low+high)/2;
        MergeSort(A, low, mid);
        MergeSort(A, mid+1, high);
        Merge(A,low,mid,high);     //归并
    }
}
  • Algorithm efficiency analysis:
  • Time complexity: O(n\log_{2}n); there are \log_{2}n merge passes and each pass costs O(n).
  • Space complexity: O(n).
  • Stability: stable.

8.6. Radix sort

  • Algorithm idea: split each keyword into d digits and perform d passes of "distribution" and "collection" in increasing digit significance (e.g. ones, tens, hundreds). If the digit currently being processed can take r distinct values, r queues are needed.
  • Distribution: scan the elements in order and insert each into the queue corresponding to the current digit of its keyword. One distribution pass takes O(n).
  • Collection: dequeue and link the nodes of each queue in turn. One collection pass takes O(r).

  • Problems that radix sorting is good at handling:
    1. The keywords of data elements can be conveniently split into d groups, and d is small.
    2. The value range of each group of keywords is not large, that is, r is small.
    3. The number n of data elements is relatively large.
  • Algorithm efficiency analysis:
    1. Time complexity: d rounds of distribution and collection are performed in total; one distribution round takes O(n) and one collection round takes O(r). The total time complexity is O(d(n+r)), independent of the initial state of the sequence.
    2. Space complexity: O(r), where r is the number of auxiliary queues.
    3. Stability: stable.
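Since the notes give no code for radix sort, here is a hedged LSD sketch for non-negative decimal integers (array-based "queues"; the names RadixSort and MAXN, and the capacity limit, are our own assumptions):

```c
#include <assert.h>
#include <string.h>

#define MAXN 100   // assumed maximum number of elements

// LSD radix sort: d passes of "distribute" into r = 10 queues by the
// current decimal digit, then "collect" in queue order 0..9.
void RadixSort(int A[], int n, int d) {    // d = number of decimal digits
    int bucket[10][MAXN], cnt[10];
    int radix = 1;
    for (int pass = 0; pass < d; pass++, radix *= 10) {
        memset(cnt, 0, sizeof(cnt));
        for (int i = 0; i < n; i++) {      // distribution pass: O(n)
            int q = (A[i] / radix) % 10;
            bucket[q][cnt[q]++] = A[i];
        }
        int k = 0;
        for (int q = 0; q < 10; q++)       // collection pass: O(r)
            for (int j = 0; j < cnt[q]; j++)
                A[k++] = bucket[q][j];
    }
}
```

Each pass is stable (elements enter and leave each queue in order), which is what makes the whole sort stable.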

8.7. Summary of Internal Sorting Algorithms

8.7.1. Comparison of internal sorting algorithms

Algorithm           Best time      Worst time     Average time     Space        Stability
Direct insertion    O(n)           O(n^{2})       O(n^{2})         O(1)         stable
Bubble sort         O(n)           O(n^{2})       O(n^{2})         O(1)         stable
Simple selection    O(n^{2})       O(n^{2})       O(n^{2})         O(1)         unstable
Hill sort           -              O(n^{2})       about O(n^{1.3}) O(1)         unstable
Quick sort          O(nlog_{2}n)   O(n^{2})       O(nlog_{2}n)     O(log_{2}n)  unstable
Heap sort           O(nlog_{2}n)   O(nlog_{2}n)   O(nlog_{2}n)     O(1)         unstable
2-way merge sort    O(nlog_{2}n)   O(nlog_{2}n)   O(nlog_{2}n)     O(n)         stable
Radix sort          O(d(n+r))      O(d(n+r))      O(d(n+r))        O(r)         stable

8.7.2. Application of internal sorting algorithm

  • Factors to consider when choosing a sorting method:
    1. The number n of elements to sort.
    2. The size of the information volume of the element itself.
    3. The structure of keywords and their distribution.
    4. Stability requirements.
    5. The conditions of the language tool, the storage structure and the size of the auxiliary space, etc.
  • Choice of sorting algorithm:
    1. If n is small, direct insertion sorting or simple selection sorting can be used. Since the number of record movements required by direct insertion sorting is more than that of simple selection sorting, it is better to use simple selection sorting when the record itself has a large amount of information.
    2. If the initial state of the file is basically ordered by keywords, it is appropriate to choose direct insertion sorting or bubble sorting.
    3. O(nlog_{2}n)If n is large, a sorting method with a time complexity of : quick sort, heap sort, or merge sort should be used . Quick sorting is considered to be the best method among comparison-based internal sorting methods at present. When the keywords to be sorted are randomly distributed, the average time of quick sorting is the shortest. Heapsort requires less auxiliary space than quicksort, and does not exhibit the worst-case scenarios that quicksort can, both of which are unstable. If the sorting is required to be stable and the time complexity is O(nlog_{2}n), then merge sorting can be selected. However, the sorting algorithm introduced in this chapter for pairwise merging from a single record is not worth advocating, and it can usually be used in combination with direct insertion sorting. First use direct insertion sorting to obtain longer ordered sub-files, and then merge them in pairs. Insertion sort is stable, so the improved merge sort is still stable.
    4. In comparison-based sorting, each comparison of two keywords has only two possible outcomes, so the comparison process can be described by a binary tree. From this it can be proved that when the n keywords of a file are randomly distributed, any sorting algorithm that relies on "comparisons" requires at least O(n\log_{2}n) time.
    5. If n is large and the keywords are short and can be decomposed into components (digits), radix sort is a good choice.
    6. When the record itself has a large amount of information, in order to avoid spending a lot of time moving records, a linked list can be used as a storage structure.
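The hybrid scheme described in point 3 (direct insertion sort to build longer ordered sub-files, then pairwise merging) can be sketched as follows. This is a minimal illustration, not the textbook's code: `RUN` is an assumed cutoff for the insertion-sort pass, and the function names are invented for this example.

```c
#include <string.h>

#define RUN 4  /* assumed run length for the insertion-sort pass */

/* Sort a[lo..hi] in place by direct insertion. */
static void insertion_sort(int a[], int lo, int hi) {
    for (int i = lo + 1; i <= hi; i++) {
        int key = a[i], j = i - 1;
        while (j >= lo && a[j] > key) { a[j + 1] = a[j]; j--; }
        a[j + 1] = key;
    }
}

/* Stable merge of a[lo..mid] and a[mid+1..hi] via buffer tmp. */
static void merge(int a[], int tmp[], int lo, int mid, int hi) {
    memcpy(tmp + lo, a + lo, (size_t)(hi - lo + 1) * sizeof(int));
    int i = lo, j = mid + 1, k = lo;
    while (i <= mid && j <= hi)
        a[k++] = (tmp[i] <= tmp[j]) ? tmp[i++] : tmp[j++];  /* <= keeps equal keys in order */
    while (i <= mid) a[k++] = tmp[i++];
    while (j <= hi)  a[k++] = tmp[j++];
}

/* Insertion sort builds ordered runs of length RUN, then runs are merged pairwise. */
void hybrid_merge_sort(int a[], int tmp[], int n) {
    for (int lo = 0; lo < n; lo += RUN) {
        int hi = (lo + RUN - 1 < n - 1) ? lo + RUN - 1 : n - 1;
        insertion_sort(a, lo, hi);
    }
    for (int width = RUN; width < n; width *= 2)
        for (int lo = 0; lo + width < n; lo += 2 * width) {
            int hi = (lo + 2 * width - 1 < n - 1) ? lo + 2 * width - 1 : n - 1;
            merge(a, tmp, lo, lo + width - 1, hi);
        }
}
```

Because both passes are stable, the combined sort remains stable, while the insertion-sort pass removes the cheap early merge levels.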

8.8. External sorting

8.8.1. Basic concepts and methods of external sorting

  1. External sorting: when sorting a large file, the file contains too many records to copy entirely into memory for sorting. The records to be sorted are therefore kept on external storage and brought into memory part by part to be sorted; during the process, data is exchanged between memory and external storage many times.
  2. Steps of external sorting:
    ① According to the size of the memory buffer, divide the file on external storage into r sub-files, read them into memory in turn, sort each with an internal sorting method, and write the resulting ordered sub-files (initial merge segments) back to external storage.
    ② Perform S passes of k-way merging on these merge segments, so that the merge segments (ordered sub-files) grow from small to large until the entire file is ordered, where S = \left \lceil \log_{k}r \right \rceil (this requires k input buffers and 1 output buffer).
  3. How to perform k-way merge:
    ① Read the blocks of k merged segments into k input buffers.
    ② Use the idea of "merge sort" to repeatedly select the smallest record among the k merge segments and temporarily store it in the output buffer.
    ③ When the output buffer is full, write out the external memory.
  4. External sorting time overhead = time for reading and writing external storage + time required for internal sorting + time required for internal merging
  5. Optimization:
    ① Increase the number of merge ways k; however, correspondingly more input buffers are needed, and each selection of a minimum element from the k merge segments then costs (k-1) comparisons.
    ② Reduce the initial number of merged segments r.
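The plain k-way merge described in the steps above (before any loser-tree optimization) can be sketched with arrays standing in for the merge segments and buffers. This is an illustrative sketch: `kway_merge`, the 16-way limit, and the in-memory arrays are assumptions of this example.

```c
/* Merge k sorted runs into out[]. runs[i] points to run i, len[i] is its
 * length. Each selection scans the k current heads, i.e. the (k-1)
 * comparisons per output element that the text attributes to plain
 * k-way merging. Returns the number of records written. */
int kway_merge(const int *runs[], const int len[], int k, int out[]) {
    int pos[16] = {0};                /* read position in each run (k <= 16 assumed) */
    int written = 0;
    for (;;) {
        int min_i = -1;
        for (int i = 0; i < k; i++)   /* k-1 comparisons per selected element */
            if (pos[i] < len[i] &&
                (min_i == -1 || runs[i][pos[i]] < runs[min_i][pos[min_i]]))
                min_i = i;
        if (min_i == -1) break;       /* every run is exhausted */
        out[written++] = runs[min_i][pos[min_i]++];
    }
    return written;
}
```

In a real external sort, `runs[i]` would be the input buffer of merge segment i, refilled from disk whenever it empties, and `out` would be the output buffer, flushed to disk whenever it fills.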

8.8.2. Loser tree 

  • The problem solved by the loser tree: multi-way balanced merging reduces the number of merge passes, but selecting a minimum element from k merge segments costs (k-1) keyword comparisons; building a loser tree reduces the number of keyword comparisons per selection to \left \lceil \log_{2}k \right \rceil.
  • The loser tree can be regarded as a complete binary tree (plus one extra head node). The k leaf nodes correspond to the elements currently being compared from the k merge segments; each non-leaf node remembers the "loser" of the match between its two subtrees and lets the winner move up to the next match, until the overall winner is recorded in the head node.
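A minimal loser-tree sketch in the classic style: leaves b[0..K-1] hold the current keys, internal nodes ls[1..K-1] store the losers, and ls[0] is the head node holding the overall winner. K = 4, the use of plain int keys, and all names are assumptions of this illustration.

```c
#define K 4                          /* number of merge segments (assumed) */
#define INF 2147483647               /* "segment exhausted" key */
#define MINKEY (-2147483647 - 1)     /* sentinel smaller than any real key */

static int b[K + 1];  /* b[0..K-1]: current key of each segment; b[K]: sentinel */
static int ls[K];     /* ls[1..K-1]: loser of each match; ls[0]: overall winner */

/* Replay the matches on the path from leaf s to the root after b[s]
 * changed: only ceil(log2 K) comparisons instead of K-1. */
static void adjust(int s) {
    for (int t = (s + K) / 2; t > 0; t /= 2) {
        if (b[ls[t]] < b[s]) {                    /* the stored player beats s */
            int tmp = s; s = ls[t]; ls[t] = tmp;  /* s carries the winner up */
        }
    }
    ls[0] = s;
}

/* Build the tree: every match is first "won" by the sentinel, then each
 * real leaf is played in from right to left. */
static void create_loser_tree(void) {
    b[K] = MINKEY;
    for (int i = 0; i < K; i++) ls[i] = K;
    for (int i = K - 1; i >= 0; i--) adjust(i);
}

/* Merge K sorted runs into out[]; returns the number of records written. */
int loser_tree_merge(const int *runs[], const int len[], int out[]) {
    int pos[K];
    for (int i = 0; i < K; i++) {
        pos[i] = 0;
        b[i] = (len[i] > 0) ? runs[i][0] : INF;
    }
    create_loser_tree();
    int written = 0;
    while (b[ls[0]] != INF) {        /* stop once every segment is exhausted */
        int w = ls[0];
        out[written++] = b[w];
        pos[w]++;
        b[w] = (pos[w] < len[w]) ? runs[w][pos[w]] : INF;
        adjust(w);                   /* replay only one leaf-to-root path */
    }
    return written;
}
```

After each output, only the path from the changed leaf to the root is replayed, which is exactly where the \left \lceil \log_{2}k \right \rceil bound comes from.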

8.8.3. Permutation-selection sort (generate initial merge segment)

  • Replacement-selection sort produces longer initial merge segments (on average about twice the work-area capacity), thereby reducing the number of initial merge segments.
  • Assume that the initial file to be sorted is FI, the initial output file of the merged segment is FO, the memory work area is WA, the initial state of FO and WA is empty, and WA can accommodate w records. The steps of the replacement-selection algorithm are as follows:
    1. Import w records from FI to workspace WA.
    2. Select the record in which the keyword takes the minimum value from WA, and record it as MINIMAX record.
    3. Output MINIMAX records to FO.
    4. If FI is not empty, input the next record from FI to WA.
    5. Select the record with the smallest key from all the records in WA whose key is larger than that of MINIMAX, and use it as a new MINIMAX record.
    6. Repeat steps 3~5 until no new MINIMAX record can be selected in WA; this completes one initial merge segment, so output an end-of-segment mark to FO.
    7. Repeat steps 2~6 until WA is empty. All initial merge segments have then been obtained.
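The steps above can be sketched with plain arrays standing in for the files FI and FO; a record whose key is smaller than the current MINIMAX is "frozen" until the next segment begins. `W`, the names, and the linear scan of the work area are assumptions of this example (in practice a loser tree is used to select MINIMAX efficiently).

```c
#define W 3   /* work-area capacity, kept tiny for illustration */

/* Replacement-selection over arrays standing in for the files:
 * fi[0..n-1] is the input "file", fo[] receives the records in run order,
 * run_end[i] records where run i ends in fo[]. Returns the number of runs. */
int replacement_selection(const int fi[], int n, int fo[], int run_end[]) {
    int wa[W];                 /* work area WA */
    int frozen[W] = {0};       /* 1: key < current MINIMAX, wait for next run */
    int in = 0, out = 0, runs = 0, filled = 0;
    while (in < n && filled < W)       /* step 1: fill WA from FI */
        wa[filled++] = fi[in++];
    while (filled > 0) {
        int m = -1;                    /* steps 2/5: smallest unfrozen key */
        for (int i = 0; i < filled; i++)
            if (!frozen[i] && (m == -1 || wa[i] < wa[m])) m = i;
        if (m == -1) {                 /* step 6: run finished, thaw WA */
            run_end[runs++] = out;
            for (int i = 0; i < filled; i++) frozen[i] = 0;
            continue;
        }
        int minimax = wa[m];
        fo[out++] = minimax;           /* step 3: output MINIMAX */
        if (in < n) {                  /* step 4: refill from FI */
            wa[m] = fi[in++];
            frozen[m] = (wa[m] < minimax);  /* too small for the current run */
        } else {                       /* FI empty: shrink the work area */
            wa[m] = wa[filled - 1];
            frozen[m] = frozen[filled - 1];
            filled--;
        }
    }
    run_end[runs++] = out;             /* step 7: close the last run */
    return runs;
}
```

For example, FI = {17, 21, 5, 44, 10, 12, 56, 32, 29} with W = 3 yields two runs, {5, 17, 21, 44, 56} and {10, 12, 29, 32}, whereas internally sorting W records at a time would yield three.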

8.8.4. Optimal merge trees

  • Optimal merging tree: Adjust the order of multiple initial merging segments for multi-way merging, thereby reducing the number of I/Os during multi-way merging.
  • Theoretical basis: each initial merge segment corresponds to a leaf node, with the segment's number of blocks as the leaf's weight. The WPL of the merge tree = the sum of the weighted path lengths of all leaf nodes. The number of disk I/Os during merging = 2 × the WPL of the merge tree (each block is read once and written once per merge pass it takes part in).
  • Note: The best merging tree for k-ary merging must be a strict k-ary tree, that is, there are only nodes with degree k and degree 0 in the tree.
  • Constructing a k-ary Huffman tree: each time, select the k trees whose root weights are smallest and merge them, taking the sum of the k root weights as the weight of the new root.
  • Supplementary virtual segment:
    ① Let u = (number of initial merge segments - 1) % (k-1). If u = 0, a strict k-ary tree can be formed exactly, and no virtual segments need to be added.
    ② If u ≠ 0, then (k-1) - u virtual segments must be added; a virtual segment has length 0 (weight 0) and therefore ends up at the deepest level of the tree.
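Under these rules, the WPL of the optimal merge tree (and hence the I/O count, 2 × WPL) can be computed with a k-ary Huffman construction. This sketch uses simple selection instead of a heap; `merge_tree_wpl` is an invented name, and the caller is assumed to leave room in w[] for up to k-2 virtual segments.

```c
/* WPL of the optimal k-way merge tree over segment lengths w[0..n-1]
 * (in blocks); virtual weight-0 segments are appended as needed.
 * The caller must leave room for up to k-2 extra entries in w[].
 * Disk I/O count during merging = 2 * WPL. */
int merge_tree_wpl(int w[], int n, int k) {
    int u = (n - 1) % (k - 1);
    if (u != 0)                         /* pad up to a strict k-ary tree */
        for (int i = 0; i < (k - 1) - u; i++) w[n++] = 0;
    int wpl = 0;
    while (n > 1) {
        int sum = 0;
        for (int j = 0; j < k; j++) {   /* move the k smallest to the front */
            int m = j;
            for (int i = j + 1; i < n; i++)
                if (w[i] < w[m]) m = i;
            int t = w[j]; w[j] = w[m]; w[m] = t;
            sum += w[j];
        }
        wpl += sum;      /* total internal-node weight equals the WPL */
        w[0] = sum;      /* replace the k smallest by their merged weight */
        for (int i = 1; i + k - 1 < n; i++) w[i] = w[i + k - 1];
        n -= k - 1;
    }
    return wpl;
}
```

For example, with nine segments of lengths {9, 30, 12, 18, 3, 17, 2, 6, 24} and k = 3, no virtual segment is needed ((9-1) % 2 = 0) and the WPL works out to 223, i.e. 446 disk I/Os.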

Origin blog.csdn.net/weixin_43313333/article/details/129588429