C language data structure [handwritten version] Chapter 1 Introduction

Chapter 1 Introduction


1.1. Introduction

Since the advent of the world's first computer in 1946, computer science and technology have developed rapidly. At the same time, computer applications have expanded from scientific calculation to every field of human society. The objects a computer processes are no longer just simple numbers; they now include non-numeric data such as characters, tables, graphics, images, and sounds. To develop software with a reasonable structure and good performance, and to write a "good" program, you need more than mastery of a suitable high-level language or software development tool: you must also be able to analyze the characteristics of the objects to be processed and the relationships that exist between them. This is the background against which "Data Structures" developed into an independent course.

In the early days of computing, people used computers mainly for numerical calculation. Solving a specific problem by computer generally involves the following steps: first, abstract an appropriate mathematical model from the problem; then design or select an algorithm that solves the model; finally, write a program and debug and test it until the final answer is obtained. Since the operands involved at that time were simple integer, real, or Boolean data, programmers concentrated on programming technique and paid little attention to data structures. With the expansion of computer applications and the development of software and hardware, non-numerical computing problems have become more and more important. According to statistics, processing non-numerical problems takes up more than 90% of computer running time. The data structures involved in such problems are more complex, and the relationships between data elements generally cannot be described by mathematical equations. The key to solving such problems is therefore no longer mathematical analysis and calculation methods, but the design of an appropriate data structure.

The famous computer scientist Professor N. Wirth once proposed: algorithm + data structure = program. Here, data structure refers to the logical structure and storage structure of the data, while the algorithm is the description of the operations on the data. It can be seen that the essence of programming is to choose a good data structure and design a good algorithm for the actual problem, and that a good algorithm depends to a large extent on the data structure that describes the problem. A "good" program requires a good algorithm, and a good algorithm must be based on a study of the characteristics of the data and the relationships among the data. These are exactly what the course "Data Structures" studies.

What exactly is a data structure? Let’s first illustrate the concept of data structure through an example.

[Example 1.1] Library Information Retrieval System
When we look up a book by its title, by its author or publisher, or look up the author and publisher from its book number, automatic computer retrieval is possible as long as suitable data structures are established and the corresponding programs are written according to some algorithm. To handle this retrieval problem by computer, we first create a basic book information table. Each row holds the information for one book, generally including the registration number, book title, author, classification number, publisher, and publication date; the registration number is unique, as shown in Table 1.1.
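Table 1.1 Basic book information table (columns: registration number, book title, author, classification number, publisher, publication date)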
Index tables can be built over the data elements (rows) of this table by accession number, book title, author, and so on. The files made up of these tables are the mathematical model for book catalog retrieval, and the computer's main operation is to query the bibliographic records by a given criterion (such as book title or author). Similar problems include directory inquiry systems, warehouse management systems, account processing, and so on. The objects handled in such problems stand in the simplest, linear relationship, and the corresponding mathematical models are called linear data structures.

[Example 1.2] Graph coloring problem. The graph coloring problem derives from the map coloring problem: color a map with m colors so that every region receives one color and adjacent regions receive different colors. If each region is shrunk to a vertex and every two adjacent regions are joined by an edge, the map can be abstracted into a planar graph, its region adjacency graph, as shown in Figure 1.1.
Figure 1.1 A map and its region adjacency graph

1.1.1. What is a data structure?

In the 1850s, British scholars proposed the four-color conjecture: any map can be colored with four colors. More than 100 years passed before American scholars proved it with the help of a computer; the result is the famous four-color theorem. For example, in Figure 1.1, numbers represent colors and letters represent regions, and the figure shows how the different regions are colored.

For another example, the blood relationships of a family, the game tree problem (a person playing chess against a machine), and a computer's file system are all tree structures, while the transportation network between cities, the scheduling of activities in project management, and traffic light management at multi-way intersections are graph structures. Trees and graphs are both non-linear data structures.

It can be seen that the mathematical models describing such non-numerical problems are no longer mathematical equations, but data structures such as tables, trees, and graphs. Simply put, data structures is the course that studies the objects a computer operates on in non-numerical programming problems, together with the relationships and operations among them. Specifically, a data structure comprises the logical structure among data elements, their storage structure, and the abstract operations on the data: a set of data organized according to some logical relationship, stored in the computer's memory using some storage representation, and equipped with a defined set of operations.

Data structure is one of the core courses in computer software and computer application majors. Various data structures are used in many computer system software and application software. Therefore, it is difficult to cope with many complex topics by mastering only a few computer languages. If you want to use computers effectively, you must also learn relevant knowledge of data structures.

1.2. Basic concepts and common terms

This section will describe and define some basic concepts and common terms related to data structures to facilitate the study of subsequent chapters.

Data is a collection of numbers, characters, and symbols that describe objective things and can be input into a computer and processed by the computer. For example, an algebraic equation solver uses integers and real numbers, whereas a text editor uses strings. With the development of computers and the expansion of computer application fields, the meaning of data has also expanded. For example, graphics, images, sounds, etc. that can be processed by today's computers also belong to the category of data.

Data element is the basic unit of data. For example, in the previous example, a card in the card table (a row in the table), a node in the tree, a vertex in the graph, etc. are all data elements. Sometimes a data element can be composed of several data items (also called fields, domains, attributes). Data items are the smallest identification units with independent meanings, such as accession numbers, book titles, authors, etc. in book card information.

A data object is a collection of data elements with the same properties and is a subset of data. For example, the data object of uppercase letters is the set {'A', 'B', ..., 'Z'}.

1.2.1. Contents of the data structure

A data structure is a collection of structured data elements. Structure refers to the relationship between data elements, that is, the organizational form of data. The data elements in the structure are called nodes. Although there is no standard definition of data structure, it generally includes the following three aspects:

(1) Logical structure of data

The logical (or abstract) relationship between data elements, also known as the logical structure of data.

The logical structure of data describes data in terms of logical relationships. It has nothing to do with how the data elements are stored and is independent of the computer. The logical structure of data can therefore be viewed as a mathematical model abstracted from a specific problem. For example, the logical relationship between the data elements of Table 1.1 in Section 1.1 is one of adjacency. For any node in the table, the node adjacent to it and in front of it is called its direct predecessor, and there is at most one; the node adjacent to it and behind it is called its direct successor, and there is also at most one. Only the first node in the table has no direct predecessor, and it is called the start node; only the last node has no direct successor, and it is called the terminal node. For example, the direct predecessor and direct successor of the node containing "operating system" are the nodes containing "data structure" and "database principle" respectively. The relationships between these nodes constitute the logical structure of the book catalog table. The logical structure of data falls into two categories: linear structures and non-linear structures.

The characteristics of a linear structure are that the data elements (nodes) stand in a one-to-one relationship, that the structure has exactly one start node and one terminal node, and that every other node has exactly one direct predecessor and one direct successor. Table 1.1 shows a typical linear structure. Chapters 2 and 3 of this book both introduce linear structures.

The characteristic of a non-linear structure is that the data elements stand in a one-to-many or many-to-many relationship, that is, a node may have several direct predecessors and several direct successors. Non-linear structures include tree structures, graph structures, mesh structures, and so on. Chapters 5 to 7 of this book all introduce non-linear structures.

(2) Data storage structure (physical structure)

The way data elements and their relationships are stored in a computer is called the storage structure (physical structure) of data. For example, if the elements in a vector are stored in "order" according to their logical relationships, it is called a "sequential storage structure"; if the elements in the vector are connected and stored in memory through "pointers", it is called a "chain storage structure."

The storage structure of data is the storage representation (image) of data in the computer, also known as the physical structure of data. It includes the representation of data elements and relationships and is computer language dependent. The storage structure of data can be implemented using the following four basic storage methods:

① Sequential storage method:

The sequential storage method is to store logically adjacent nodes in continuous storage units that are also physically adjacent. The resulting storage structure is called a sequential storage structure. It is usually described with the help of arrays in programming languages. This method is mainly applied to linear data structures, but non-linear data structures can also be stored sequentially through some linearization method.

② Linked storage method:

The linked storage method uses a set of storage units that are not necessarily continuous to store logically adjacent elements. The logical relationship between elements is represented by additional pointer fields. The resulting storage structure is called a chained storage structure. It is usually described with the help of pointers in programming languages.
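The contrast between the sequential and linked storage methods can be sketched in C. This is only an illustration under stated assumptions: the names `SeqList`, `Node`, `seq_append`, `linked_push_front` and the fixed capacity are hypothetical and do not come from the original text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SEQ_CAPACITY 8   /* assumed fixed capacity for this sketch */

/* Sequential storage: logically adjacent elements sit in adjacent array cells. */
typedef struct {
    int data[SEQ_CAPACITY];
    size_t length;
} SeqList;

/* Linked storage: each node holds a pointer field to its direct successor. */
typedef struct Node {
    int value;
    struct Node *next;
} Node;

/* Append to the sequential list; returns 0 on success, -1 if the array is full. */
int seq_append(SeqList *list, int value) {
    assert(list != NULL);
    if (list->length == SEQ_CAPACITY) return -1;
    list->data[list->length++] = value;
    return 0;
}

/* The i-th element is reached by direct indexing. */
int seq_get(const SeqList *list, size_t i) {
    assert(list != NULL && i < list->length);
    return list->data[i];
}

/* Insert a new node in front of *head; returns 0 on success, -1 if allocation fails. */
int linked_push_front(Node **head, int value) {
    assert(head != NULL);
    Node *node = malloc(sizeof *node);
    if (node == NULL) return -1;
    node->value = value;
    node->next = *head;
    *head = node;
    return 0;
}

/* The i-th element is reached by following i pointers from the head. */
int linked_get(const Node *head, size_t i) {
    while (i-- > 0) {
        assert(head != NULL);
        head = head->next;
    }
    assert(head != NULL);
    return head->value;
}

/* Release every node of the linked list. */
void linked_free(Node *head) {
    while (head != NULL) {
        Node *next = head->next;
        free(head);
        head = next;
    }
}
```

Both structures represent the same logical sequence; only the way the "next element" relationship is stored differs (array position versus pointer field).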

③ Index storage method:

The index storage method usually stores element information and also creates an additional index table. The general form of index entries in the table is: (keyword, address). A keyword is a data item or a combination of multiple data items that uniquely identifies an element.

④ Hash storage method

The basic idea of the hash storage method is to directly calculate the storage address of the element based on its keyword.
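As a rough sketch of the hash storage idea, the following C code computes a slot index from an integer key. The remainder function, the table size, and the use of linear probing for collisions are all assumptions made for illustration; the text above does not specify a hash function or a collision strategy.

```c
#include <assert.h>
#include <stdbool.h>

#define TABLE_SIZE 11      /* assumed table size (a small prime) */
#define EMPTY_SLOT (-1)    /* assumed marker; keys must be non-negative */

typedef struct {
    int keys[TABLE_SIZE];
} HashTable;

/* Mark every slot as empty. */
void hash_init(HashTable *t) {
    for (int i = 0; i < TABLE_SIZE; i++) t->keys[i] = EMPTY_SLOT;
}

/* The storage address is calculated directly from the key. */
int hash_address(int key) {
    assert(key >= 0);
    return key % TABLE_SIZE;
}

/* Insert with linear probing; returns the slot used, or -1 if the table is full. */
int hash_insert(HashTable *t, int key) {
    int start = hash_address(key);
    for (int step = 0; step < TABLE_SIZE; step++) {
        int slot = (start + step) % TABLE_SIZE;
        if (t->keys[slot] == EMPTY_SLOT || t->keys[slot] == key) {
            t->keys[slot] = key;
            return slot;
        }
    }
    return -1;
}

/* Search along the same probe sequence that insertion used. */
bool hash_contains(const HashTable *t, int key) {
    int start = hash_address(key);
    for (int step = 0; step < TABLE_SIZE; step++) {
        int slot = (start + step) % TABLE_SIZE;
        if (t->keys[slot] == EMPTY_SLOT) return false;
        if (t->keys[slot] == key) return true;
    }
    return false;
}
```

With this table size, keys 15 and 26 both hash to address 4, so the second one inserted is placed in the next free slot.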

No matter how the data structure is defined, the three aspects of data logical structure, storage structure and operation should be regarded as a whole. Therefore, storage structure is an indispensable aspect of data structure.

The same logical structure can produce different storage structures using different storage methods. Which storage structure is chosen to represent the corresponding logical structure depends on the specific application system requirements, and the main considerations are the convenience of operation and the time and space requirements of the algorithm.

(3) Data operations, that is, operations (behaviors) applied to data elements

Data operations are defined on the logical structure of the data. Each logical structure has a set of operations. The most commonly used operations are: retrieval, insertion, deletion, update, sorting, etc. Data operations are an inseparable aspect of the data structure. After the logical structure and storage structure of the data are given, completely different data structures may result depending on the defined set of operations and their operation properties.

If the insertion and deletion operations on a linear list are restricted to one end of the list, the linear list is called a stack;

If the insertion operation of a linear list is restricted to one end of the list, and the deletion operation is restricted to the other end of the list, the linear list is called a queue.
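The two restrictions just described can be sketched over the same array-based linear list. The type and function names below (`Stack`, `Queue`, `stack_push`, `queue_enqueue`, and so on) and the fixed capacity are hypothetical, chosen only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define CAPACITY 16   /* assumed fixed capacity for this sketch */

/* Stack: insertion and deletion both happen at the same end (the top). */
typedef struct {
    int items[CAPACITY];
    int top;            /* number of items currently stored */
} Stack;

void stack_init(Stack *s) { s->top = 0; }

bool stack_push(Stack *s, int x) {
    if (s->top == CAPACITY) return false;   /* full */
    s->items[s->top++] = x;
    return true;
}

int stack_pop(Stack *s) {
    assert(s->top > 0);                     /* must not be empty */
    return s->items[--s->top];
}

/* Queue: insertion at the rear, deletion at the front (circular array). */
typedef struct {
    int items[CAPACITY];
    int front;          /* index of the oldest item */
    int count;          /* number of items currently stored */
} Queue;

void queue_init(Queue *q) { q->front = 0; q->count = 0; }

bool queue_enqueue(Queue *q, int x) {
    if (q->count == CAPACITY) return false; /* full */
    q->items[(q->front + q->count) % CAPACITY] = x;
    q->count++;
    return true;
}

int queue_dequeue(Queue *q) {
    assert(q->count > 0);                   /* must not be empty */
    int x = q->items[q->front];
    q->front = (q->front + 1) % CAPACITY;
    q->count--;
    return x;
}
```

Pushing 1, 2, 3 onto the stack and popping returns 3 first, whereas enqueuing 1, 2, 3 and dequeuing returns 1 first: the same logical and storage structure yields different data structures once the operations differ.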

Data type is a concept closely related to data structure. A data type is the collective name for a set of values together with a set of operations defined on that set. In a program written in a high-level programming language, every variable, constant, or expression has a data type. The data type specifies the range of values a variable or expression can take during program execution and the operations allowed on those values. For example, the integer type in C specifies the value range of an integer (which depends on the machine and the compiler) and defines the operations that can be applied to integers: addition, subtraction, multiplication, division, and modulo.

In high-level programming languages, data types fall into two categories according to the nature of their values. One is atomic types (or non-structural types), whose values cannot be decomposed, such as the basic types in C (integer, real, character, and enumeration) and simple types such as pointer and void. The other is structural types, whose values are composed of several components according to some structure, where the components may themselves be non-structural or structural, such as arrays and structures in C. In general, data types can be regarded as data structures that have already been implemented in a programming language.

Abstract Data Type (ADT) is a concept proposed in the 1970s. It is an organization of abstract data together with the operations related to it. An ADT can be viewed as a mathematical model with a set of operations defined on it. For example, a set together with its union, intersection, and difference operations can be defined as an abstract data type.

An abstract data type can be viewed as a model that describes a problem and is independent of the specific implementation. Its characteristic is that data definition and data operations are encapsulated together, so that user programs can only access the data through certain operations defined in ADT, thus achieving information hiding. This abstract data type is similar to a class in C++.

As an example, consider the description of a "circle" data type. To represent a circle, we generally need the position of its center and the size of its radius. If we only care about the area of a circle, then the radius is the only data this abstract data type needs. Suppose we want to design a Circle abstract data type that includes operations to calculate the area (area) and the circumference (circumference). The Circle abstract data type is described as follows:
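The original ADT description is not reproduced here. As a rough approximation in C, which has no class construct and therefore cannot enforce the encapsulation an ADT implies, the Circle type might be sketched as a struct holding the radius plus functions for the two operations. The names `Circle`, `circle_create`, `circle_area`, `circle_circumference`, and the constant `CIRCLE_PI` are assumptions for illustration.

```c
#include <assert.h>

#define CIRCLE_PI 3.14159265358979323846   /* assumed constant; M_PI is not standard C */

/* Data part of the type: only the radius is needed for area and circumference. */
typedef struct {
    double radius;
} Circle;

/* Construct a circle with a non-negative radius. */
Circle circle_create(double radius) {
    assert(radius >= 0.0);
    Circle c = { radius };
    return c;
}

/* Operation: area = pi * r^2. */
double circle_area(const Circle *c) {
    return CIRCLE_PI * c->radius * c->radius;
}

/* Operation: circumference = 2 * pi * r. */
double circle_circumference(const Circle *c) {
    return 2.0 * CIRCLE_PI * c->radius;
}
```

Callers that use only these three functions never depend on how the radius is stored, which is the information-hiding idea behind an ADT, although C relies on convention rather than the language to keep it that way.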

Since this book describes algorithms in C, and C does not provide a "class" data type, abstract data types cannot be expressed directly, so we will not use the ADT form to describe data structures. Just remember that an ADT is in effect the logical structure of the data we define together with the abstract operations defined on that logical structure.

1.3. Description and analysis of algorithms

As stated earlier, the purpose of studying data structures is to design better programs. Programming is inseparable from operating on data, and the procedure for carrying out these operations (the problem-solving method) is usually called an algorithm. For example, suppose a computer is to calculate the area of the triangle formed by three known points a(x1, y1), b(x2, y2), c(x3, y3). First we find the formulas needed for the area (that is, abstract the mathematical model), and then we carry out the calculation step by step. To calculate the area we must first find the side lengths. The length of the side between a and b is

$$|ab| = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$$

and the other two sides are found in the same way. With p denoting half the sum of the three side lengths, the area of the triangle is

$$S = \sqrt{p\,(p - |ab|)(p - |bc|)(p - |ca|)}, \qquad p = \frac{|ab| + |bc| + |ca|}{2}$$

Once these formulas (the model) are available, we lay out the process for solving the problem (the method or steps of the solution); this is called an algorithm. The algorithm for this problem is described as follows:
(1) Input the three coordinate points a, b and c of the triangle
(2) Calculate the length of the three sides and half of the sum of the side lengths.
(3) Calculate the area of ​​the triangle.
(4) Output the side length and area of ​​the triangle.
Then write the corresponding program code according to the description of the algorithm, and debug and run it on the computer until the correct result is obtained.

1.3.1. Algorithm description

As the example above shows, an algorithm is a description of the steps for solving a problem, that is, the methods and steps taken to solve it. In plain terms, an algorithm is a method of solving problems. Strictly speaking, an algorithm is a finite sequence of instructions, each of which represents one or more operations.

In addition, the algorithm must also meet the following five criteria:
(1) Input: the variables used in the algorithm must be given initial values before the algorithm starts. An algorithm may take zero or more inputs.
(2) Output: an algorithm produces at least one output.
(3) Finiteness: each instruction is executed a finite number of times and each step completes in finite time, so the algorithm must terminate after a finite number of steps.
(4) Determinism: the meaning of each instruction must be clear and unambiguous.
(5) Feasibility: every operation described in the algorithm can be carried out by a finite number of basic operations.

Obviously a program is an algorithm if it does not fall into an infinite loop for any input. The meaning of an algorithm is very similar to a program, but there is a difference between the two: a program must rely on a computer programming language, while an algorithm can be described in natural language, computer programming language, mathematical language or conventional symbolic language.

For example, the algorithm above for the area of a triangle is described in natural language. Two languages are currently most used for describing algorithms: Pascal-like and C-like, the latter resembling C without being identical to it. A C-like language borrows the syntactic structure of C and supplements it with natural-language descriptions, so that algorithms written in it are well structured without being tied to the details of a specific programming language. C-like languages therefore make algorithms easy to read and write.

To make the algorithms easy to verify on a computer and to improve readers' practical programming skills, this book mostly uses C to describe algorithms. Each algorithm in the book is essentially a C function, although a very few use some features of C++; for example, the C++ line comment "//" appears in algorithm descriptions. Some C compilers may therefore fail to compile them, but they can be compiled and debugged in the Visual C environment. Take care when using them: to debug and run an algorithm, the relevant type definitions, variable declarations, or functions must be added. For example, an algorithm to compute n! (n factorial) is in fact a C function:
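The original listing is not available, so the following is only a minimal sketch of what such a factorial function might look like. The name `factorial`, the 64-bit return type, and the limit of 20 on n are assumptions made here to avoid overflow.

```c
#include <assert.h>

/* Returns n! for 0 <= n <= 20; 21! would overflow a 64-bit unsigned integer. */
unsigned long long factorial(int n) {
    assert(n >= 0 && n <= 20);
    unsigned long long result = 1;
    for (int i = 2; i <= n; i++) {
        result *= (unsigned long long)i;
    }
    return result;
}
```

For example, `factorial(5)` returns 120, and `factorial(0)` returns 1 because the loop body never runs.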

1.3.2. Representation of algorithm

Common representation methods of algorithms:
• Natural language: directly described in English, Chinese or other languages.
• Computer language: described in a certain language format, such as C language.
• Pseudocode: described by words and symbols between natural language and computer language.
• Flowchart: A logic diagram describing an algorithm, divided into traditional flowchart and NS flowchart.

  1. Traditional flowchart: drawn with standard flowchart symbols; it can express the three basic structures (sequence, selection, and loop).

  2. N-S flowchart: also expresses the three basic structures (sequence, selection, and loop).

1.3.3. Algorithm analysis

There may be many different algorithms to solve a problem, and the quality of the algorithm directly affects the execution efficiency of the program, and the operating efficiency of different algorithms varies greatly.

[Example 1.3] The problem of buying a hundred chickens with a hundred coins.
At the end of the 5th century AD, the ancient Chinese mathematician Zhang Qiujian posed this question in his "Suan Jing": "A rooster costs five coins, a hen costs three coins, and three chicks cost one coin. If a hundred coins buy a hundred chickens, how many roosters, hens, and chicks are there?"
Analysis: let the number of roosters be a, the number of hens be b, and the number of chicks be c. According to the problem, the following equations are obtained:

$$a + b + c = 100$$
$$5a + 3b + \frac{c}{3} = 100$$

Algorithm: this mathematical model is difficult to solve with the usual analytical method, but it is easy to solve with the exhaustive method. The implementation is as follows:
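The original listing is not reproduced here; a minimal sketch of the exhaustive method, assuming each of a, b, and c is tried over 0..100, might look like this. The `Purchase` type, the function name, and the iteration counter are hypothetical additions so that the number of loop-body executions can be checked.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int roosters, hens, chicks;
} Purchase;

/* Tries every (a, b, c) with each value in 0..100.
   Stores up to max solutions in out, returns how many were found,
   and reports how often the innermost loop body ran. */
int hundred_chickens_naive(Purchase *out, int max, long *iterations) {
    assert(out != NULL && iterations != NULL);
    int found = 0;
    *iterations = 0;
    for (int a = 0; a <= 100; a++) {
        for (int b = 0; b <= 100; b++) {
            for (int c = 0; c <= 100; c++) {
                (*iterations)++;
                /* 5a + 3b + c/3 = 100, multiplied by 3 to stay in integers */
                if (a + b + c == 100 && 15 * a + 9 * b + c == 300) {
                    if (found < max) out[found] = (Purchase){a, b, c};
                    found++;
                }
            }
        }
    }
    return found;
}
```

Multiplying the price equation by 3 avoids integer division, which would otherwise accept chick counts that are not multiples of three.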
The above algorithm is a triple loop. Its execution time is dominated by the number of times the body of the innermost loop runs. Each of the three loops runs 101 times, so the loop body is executed 101 × 101 × 101 = 1,030,301 times (more than one million). That is far more work than such a simple problem warrants, so this is not a good algorithm.
In fact, the algorithm can be improved considerably. A rooster costs 5 coins, so 100 coins buy at most 20 roosters; similarly, they buy at most 33 hens. The chicks are bought with whatever remains after the roosters and hens, so their number does not need a loop of its own. The algorithm can therefore be changed to:
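A sketch of the improved version follows, under the assumption that the chick count is derived from the total of 100 birds as c = 100 - a - b; the original listing may have derived it differently. The `Purchase` type, the function name, and the iteration counter are again hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int roosters, hens, chicks;
} Purchase;

/* Tries a in 0..20 and b in 0..33; c follows from a + b + c = 100.
   Stores up to max solutions in out, returns how many were found,
   and reports how often the inner loop body ran. */
int hundred_chickens_improved(Purchase *out, int max, long *iterations) {
    assert(out != NULL && iterations != NULL);
    int found = 0;
    *iterations = 0;
    for (int a = 0; a <= 20; a++) {
        for (int b = 0; b <= 33; b++) {
            (*iterations)++;
            int c = 100 - a - b;   /* assumed: chicks make up the rest of the 100 birds */
            /* price equation multiplied by 3 to stay in integers */
            if (15 * a + 9 * b + c == 300) {
                if (found < max) out[found] = (Purchase){a, b, c};
                found++;
            }
        }
    }
    return found;
}
```

The search space shrinks from 101 × 101 × 101 candidates to 21 × 34, while the set of solutions found stays the same.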
This algorithm has only two loops, and the inner loop body is executed only 21 × 34 = 714 times, compared with more than one million times for the previous algorithm. Designing a good algorithm is therefore crucial to the execution efficiency of a program.

So, how to evaluate the pros and cons of these algorithms and then choose a good algorithm? Obviously, the "correctness" of the algorithm is the first thing to consider. The so-called correctness of an algorithm means that for all legal input data, the algorithm can obtain correct results after a limited time of execution. In addition, the following points should be mainly considered:
(1) The time it takes to execute the algorithm, that is, the time complexity.
(2) The storage space consumed by executing the algorithm, mainly the auxiliary space, that is, the space complexity.
(3) The algorithm should be easy to understand, easy to program, easy to debug, etc., that is, readable and operable.

Among the above points, the main one is time complexity. The time spent by an algorithm should be the sum of the execution time of each statement in the algorithm, and the execution time of each statement is the product of the number of executions of the statement (also called frequency) and the time required to execute the statement once. However, the time it takes for different computer systems to perform a basic operation varies widely and cannot be measured by a unified quantity. Generally speaking, the number of times the basic operations in the algorithm are repeated is a function f(n) of the problem size n, and the time measurement of the algorithm is recorded as: T(n) =O(f(n)).


[Example 1.4] Find the product C = A × B of two n-order matrices.

Analysis: the algorithm is the usual triple loop, with its statements numbered as follows: (1) the outer loop over i, (2) the middle loop over j, (3) the assignment that sets C[i][j] to 0, (4) the inner loop over k, and (5) the accumulation of A[i][k] × B[k][j] into C[i][j]. The loop control of statement (1) is tested until its condition fails, so its frequency is n+1, although its loop body executes only n times. Statement (2) lies inside the loop of statement (1), so it is entered n times, and each time its own control is tested n+1 times; its frequency is therefore n(n+1). In the same way, the frequencies of statements (3), (4), and (5) are $n^2$, $n^2(n+1)$, and $n^3$ respectively. Therefore, the sum of the frequencies of all statements in this algorithm is:

$$T(n) = (n+1) + n(n+1) + n^2 + n^2(n+1) + n^3 = 2n^3 + 3n^2 + 2n + 1$$
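The frequency count can be checked with an instrumented sketch. The original listing is not available, so the code below assumes the conventional triple loop, with comments numbering the five statements that the frequencies imply; the counter `freq`, the bound `N_MAX`, and the function name are hypothetical.

```c
#include <assert.h>

#define N_MAX 8   /* assumed upper bound on the matrix order for this sketch */

/* Computes c = a x b for n-order matrices and returns the total statement
   frequency. Each loop condition is written as "++freq, test" so that every
   evaluation of the test is counted, including the final failing one. */
long matrix_product(int n, int a[N_MAX][N_MAX], int b[N_MAX][N_MAX],
                    int c[N_MAX][N_MAX]) {
    assert(n >= 1 && n <= N_MAX);
    long freq = 0;
    int i, j, k;
    for (i = 0; ++freq, i < n; i++) {                 /* (1) n+1 times      */
        for (j = 0; ++freq, j < n; j++) {             /* (2) n(n+1) times   */
            c[i][j] = 0;                              /* (3) n^2 times      */
            freq++;
            for (k = 0; ++freq, k < n; k++) {         /* (4) n^2(n+1) times */
                c[i][j] += a[i][k] * b[k][j];         /* (5) n^3 times      */
                freq++;
            }
        }
    }
    return freq;
}
```

For n = 2 the function returns 33, which matches 2·8 + 3·4 + 2·2 + 1, and for n = 3 it returns 88.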

The elapsed time T(n) is a function of the matrix order n. In general, the amount of input an algorithm needs in order to solve a problem is called the size of the problem and is represented by a positive integer n. For example, the size of the matrix product problem above is the order n of the matrices. The time complexity T(n) of an algorithm is the time the algorithm consumes, as a function of the problem size n. As the problem size n tends to infinity, the order of magnitude of T(n) is called the asymptotic time complexity of the algorithm.

For example, for the time complexity T(n) of the matrix product algorithm, when n is large enough the ratio of T(n) to $n^3$ tends to a non-zero constant (here 2). T(n) and $n^3$ are then said to be of the same order, or of the same order of magnitude, written $T(n) = O(n^3)$. We say that $T(n) = O(n^3)$ is the asymptotic time complexity of the matrix product algorithm.

If the number of times the basic operations of an algorithm are repeated is regarded as a function f(n) of the problem size n, the asymptotic time complexity of the algorithm is written T(n) = O(f(n)). It means that as the problem size n increases, the execution time of the algorithm grows at the same rate as f(n), where f(n) is generally the frequency of the most frequently executed statement in the algorithm. When analyzing algorithms, time complexity and asymptotic time complexity are often not distinguished, and the asymptotic time complexity T(n) = O(f(n)) is simply called the time complexity. For example, the time complexity of the matrix product algorithm is $T(n) = O(n^3)$, where $f(n) = n^3$ is the frequency of statement (5) in the algorithm.

[Example 1.5] Find the algorithm time complexity of the following program segment.
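The original program segment is not available. One loop nest whose statement x = x + 1 executes (n-1)(n-2)/2 times, and which is therefore consistent with the analysis of this example, is sketched below; the loop bounds and the function name `count_increments` are assumptions, and the original segment may have been written differently.

```c
#include <assert.h>

/* Returns how many times x = x + 1 executes for a given n (n >= 1). */
long count_increments(int n) {
    assert(n >= 1);
    long x = 0;
    for (int i = 2; i <= n; i++) {
        for (int j = 2; j <= i - 1; j++) {
            x = x + 1;   /* executed 0 + 1 + ... + (n-2) times in total */
        }
    }
    return x;
}
```

For n = 5 the inner statement runs 0 + 1 + 2 + 3 = 6 times, which equals (5-1)(5-2)/2.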
Analysis: since the time complexity of an algorithm considers only the growth rate with respect to the problem size n, when the exact number of basic operations (or the statement frequency) is hard to calculate, it is enough to find its growth rate, or order, with respect to n. Here the number of executions of the statement x=x+1 is (n-1)(n-2)/2, whose fastest-growing term is of order $n^2$, so the time complexity of the program segment is $O(n^2)$.

If the execution time of an algorithm is a constant independent of the problem size n, then even if that constant is large, the time complexity of the algorithm is of constant order, written T(n) = O(1). For example, in a program segment whose running depends only on constants x and y, the number of operations can be counted in advance and the total is itself a constant. Any constant amount of work is written as 1, that is, O(1), so the time complexity of such a program segment is O(1).

The time complexities met most often are, in increasing order of magnitude: constant order $O(1)$, logarithmic order $O(\log_2 n)$, linear order $O(n)$, linear-logarithmic order $O(n\log_2 n)$, square order $O(n^2)$, cubic order $O(n^3)$, ..., k-th power order $O(n^k)$, exponential order $O(2^n)$, and factorial order $O(n!)$.

Similar to the time complexity, the space complexity S(n) of an algorithm is defined as the storage space consumed by the algorithm. It is a measure of the amount of storage space temporarily occupied by an algorithm during operation and is a function of the problem size n. The storage space occupied by an algorithm on the computer memory includes three aspects: the storage space occupied by the storage algorithm itself, the storage space occupied by the input and output data of the algorithm, and the storage space temporarily occupied by the algorithm during operation.

1.3.4. Common time complexity

The time cost of the program does not grow as the size of the data grows (that is, it stays the same). This is constant time complexity, written O(1).
The time cost of the program grows steadily in proportion to the size of the data (linear growth, like the linear function y = kx). This is linear time complexity, written O(N).
The time cost of the program grows with the square of the size of the data. This is quadratic (square-order) time complexity, written $O(N^2)$.

Note: in a growth chart of these complexities, the O(log n) and O(1) curves nearly overlap, with O(log n) slightly above O(1).
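The three growth patterns above can be illustrated with small C functions. These are hypothetical examples written for this purpose (the names `first_element`, `sum_elements`, and `count_equal_pairs` do not come from the original), not the listings that accompanied the text.

```c
#include <assert.h>
#include <stddef.h>

/* O(1): a single array access, no matter how large n is. */
int first_element(const int *data, size_t n) {
    assert(data != NULL && n > 0);
    return data[0];
}

/* O(N): every element is visited exactly once. */
long sum_elements(const int *data, size_t n) {
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        total += data[i];
    }
    return total;
}

/* O(N^2): every pair (i, j) with i < j is compared, n(n-1)/2 comparisons. */
size_t count_equal_pairs(const int *data, size_t n) {
    size_t pairs = 0;
    for (size_t i = 0; i < n; i++) {
        for (size_t j = i + 1; j < n; j++) {
            if (data[i] == data[j]) pairs++;
        }
    }
    return pairs;
}
```

Doubling n leaves the work of the first function unchanged, doubles the work of the second, and roughly quadruples the work of the third.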

1.3.5. Common space complexity

The space cost of the program does not grow as the size of the data and the running time grow (that is, it stays the same). This is constant space complexity, written O(1).
The space cost of the program grows steadily in proportion to the size of the data and the running time (linear growth). This is linear space complexity, written O(N).
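As a hedged illustration of the two cases, the sketch below reverses an array in two ways: in place with one temporary variable (constant auxiliary space), and into a newly allocated copy (auxiliary space proportional to n). The function names are hypothetical and the example is not taken from the original.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* O(1) auxiliary space: swaps elements in place using one temporary. */
void reverse_in_place(int *data, size_t n) {
    assert(data != NULL || n == 0);
    for (size_t i = 0; i < n / 2; i++) {
        int tmp = data[i];
        data[i] = data[n - 1 - i];
        data[n - 1 - i] = tmp;
    }
}

/* O(N) auxiliary space: allocates a second array of n elements.
   Returns NULL if n is 0 or allocation fails; the caller frees the result. */
int *reversed_copy(const int *data, size_t n) {
    if (n == 0) return NULL;
    assert(data != NULL);
    int *copy = malloc(n * sizeof *copy);
    if (copy == NULL) return NULL;
    for (size_t i = 0; i < n; i++) {
        copy[i] = data[n - 1 - i];
    }
    return copy;
}
```

Both functions take O(N) time; they differ only in how much extra storage they occupy while running, which is what space complexity measures.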

Summary

The famous Swiss computer scientist Professor Wirth once proposed: algorithm + data structure = program. This points out both the status of data structures and algorithms in computer science and the close relationship between them. In solving practical problems by computer, data structures and algorithms are two complementary and indispensable aspects: the data structure is the object the algorithm processes and the basis on which the algorithm is designed. On the one hand, the data of a specific problem can often be represented in the computer by several different data structures; on the other hand, the computation for a practical problem often has several available algorithms. Choosing the data structure and the algorithm therefore becomes the most important issue in implementing an application.

Algorithm analysis is a focus and difficulty of this chapter. The quality of the algorithm directly affects the operating efficiency of the program, and the operating efficiency of the program directly affects the use of the actual application system and its life cycle. Therefore, it is necessary to deeply understand and master the ideas and methods of algorithm analysis, as well as concepts such as time complexity measurement.

The main concepts in this chapter include: data, data elements, data structure, logical structure, storage structure; algorithm, algorithm design, algorithm analysis, time complexity, etc.

This article is a C language data structure [handwritten version]. There are some changes in the article and it is not original.


Origin blog.csdn.net/qq_43460743/article/details/130002078