[Repost] Data Structure and Algorithm Analysis

A Motivating Problem

Suppose we face the following problem: given a set of N numbers, determine the k-th largest one. We call this the selection problem. How should we write this program? Most intuitively, there are at least two ideas:

1. Read the N numbers into an array, sort the array in descending order with some simple algorithm such as bubble sort, and then return the element at position k.

2. Slightly better: read the first k elements into an array and sort them in descending order. Then read the remaining elements one by one. When a new element is read, ignore it if it is smaller than the k-th element in the array; otherwise, insert it into its correct position in the array, squeezing one element out. When the algorithm terminates, the element at position k is returned as the answer.

Both algorithms are simple to code, but suppose we run a simulation with a random file of 10 million elements and k = 5,000,000. We will find that although both algorithms eventually give the correct answer, neither finishes in a reasonable amount of time. Therefore, neither can be considered a good algorithm, because from a practical point of view they cannot process the input in a reasonable amount of time.
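For concreteness, algorithm 1 above can be sketched in Java as follows. The class and method names are my own, and `Arrays.sort` stands in for the bubble sort mentioned above purely for brevity; the original post gives no code for this step.

```java
import java.util.Arrays;
import java.util.Collections;

public class Selection {
    // Algorithm 1: sort the whole array in descending order; the k-th
    // largest element then sits at index k - 1. Sorting dominates the cost.
    public static int kthLargest(Integer[] a, int k) {
        Integer[] copy = a.clone();
        Arrays.sort(copy, Collections.reverseOrder());
        return copy[k - 1];
    }

    public static void main(String[] args) {
        // the 2nd largest of {3, 1, 4, 1, 5} is 4
        System.out.println(kthLargest(new Integer[]{3, 1, 4, 1, 5}, 2));
    }
}
```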

 

Data Structure and Algorithm Analysis

In many problems, a very important point is that writing a working program is not enough. If the program is run on a huge dataset, its running time becomes an important issue. In a later article we will see how to estimate the running time of a program on large inputs and, in particular, how to compare the running times of two programs. We will also see ways to drastically improve program speed and to identify bottlenecks, which lets us find the pieces of code that deserve focused optimization effort.

So first, let us understand what data structures and algorithm analysis are (note that all of the following examples are written in Java).

 

Data Structures

A data structure is a way for a computer to store and organize data; it refers to a collection of data elements that have one or more specific relationships with each other. A well-chosen data structure often leads to higher computational or storage efficiency (which is why we study data structures), and data structures are often associated with efficient retrieval algorithms and indexing techniques.

Common data structures include arrays, stacks, queues, linked lists, trees, hash tables, and so on. These data structures will be the focus of the data-structure articles in this series.

 

Analysis of Algorithms

An algorithm is a clearly specified set of simple instructions to be followed in order to solve a problem. For a given problem, once an algorithm is given and (in some way) determined to be correct, the important next step is to determine how many resources, such as time or space, the algorithm requires. For example:

1. An algorithm that takes as long as a year to solve a problem is hardly useful.

2. An algorithm that requires several GB of main memory for a problem cannot be used on most current machines.

 

Mathematical Background

Both data structures and algorithm analysis rely heavily on mathematics. The following is a brief summary of the mathematical background needed:

1. Exponents

(1) X^A · X^B = X^(A+B)

(2) X^A / X^B = X^(A−B)

(3) (X^A)^B = X^(AB)

(4) X^N + X^N = 2X^N ≠ X^(2N)

(5) 2^N + 2^N = 2^(N+1)
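A quick numeric spot-check of identities (1), (4), and (5), written by me as a sketch; the values of x, a, b, and n are arbitrary small choices for the demonstration:

```java
public class ExponentLaws {
    public static void main(String[] args) {
        double x = 2.0;
        int a = 3, b = 4, n = 10;
        // (1) X^A * X^B = X^(A+B)
        System.out.println(Math.pow(x, a) * Math.pow(x, b) == Math.pow(x, a + b));
        // (4) X^N + X^N = 2 * X^N (not X^(2N))
        System.out.println(Math.pow(x, n) + Math.pow(x, n) == 2 * Math.pow(x, n));
        // (5) 2^N + 2^N = 2^(N+1), exact in long arithmetic via shifts
        System.out.println((1L << n) + (1L << n) == (1L << (n + 1)));
    }
}
```

All three lines print `true`.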

2. Logarithms

(1) X^A = B if and only if log_X B = A

(2) log_A B = log_C B / log_C A

(3) log(AB) = log A + log B, for A > 0 and B > 0
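Rules (2) and (3) can be checked directly with `Math.log`, which is the natural logarithm (so C = e in the change-of-base rule). The class and method names here are my own:

```java
public class LogRules {
    // Change of base (rule (2)): log_A(B) = log_C(B) / log_C(A).
    public static double logBase(double a, double b) {
        return Math.log(b) / Math.log(a);
    }

    public static void main(String[] args) {
        System.out.println(logBase(2, 1024));   // ≈ 10.0, since 2^10 = 1024
        // rule (3): log(A*B) = log A + log B, so the difference is ≈ 0
        System.out.println(Math.log(6.0) - (Math.log(2.0) + Math.log(3.0)));
    }
}
```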

3. Series

(1) ∑(i=0 to N) 2^i = 2^(N+1) − 1

(2) ∑(i=0 to N) A^i = (A^(N+1) − 1) / (A − 1); if 0 < A < 1, then ∑(i=0 to ∞) A^i ≤ 1 / (1 − A)
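Identity (1) can be verified by brute force for small N; this sketch (class and method names are mine) sums the powers of two directly and compares against the closed form:

```java
public class GeometricSeries {
    // Direct evaluation of the left-hand side of rule (1):
    // the sum of 2^i for i = 0..n.
    public static long sumPowersOfTwo(int n) {
        long sum = 0;
        for (int i = 0; i <= n; i++)
            sum += 1L << i;          // add 2^i
        return sum;
    }

    public static void main(String[] args) {
        int n = 20;
        // should equal 2^(n+1) - 1
        System.out.println(sumPowersOfTwo(n) == (1L << (n + 1)) - 1);  // true
    }
}
```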

4. Modulo operation

If A and B leave the same remainder when divided by N (equivalently, N divides A − B), then A and B are said to be congruent modulo N, written A ≡ B (mod N). If A ≡ B (mod N), then:

(1) A + C ≡ B + C (mod N)

(2) AD ≡ BD (mod N)
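A small worked example of these two congruence rules, with values I chose for illustration (23 ≡ 2 (mod 7), since both leave remainder 2):

```java
public class Congruence {
    public static void main(String[] args) {
        int n = 7, a = 23, b = 2, c = 5, d = 3;
        // 23 and 2 leave the same remainder when divided by 7
        System.out.println(a % n == b % n);              // true
        // rule (1): A + C ≡ B + C (mod N)
        System.out.println((a + c) % n == (b + c) % n);  // true
        // rule (2): AD ≡ BD (mod N)
        System.out.println((a * d) % n == (b * d) % n);  // true
    }
}
```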

 

Time Complexity

In computer science, the time complexity of an algorithm is a function that quantitatively describes the algorithm's running time, typically as a function of the size of the input. Time complexity is commonly expressed in big-O notation, which excludes the function's low-order terms and leading coefficient. Used this way, time complexity is said to be asymptotic: it describes behavior as the input size grows toward infinity.

So first, look at a simple example. Here is a simple program fragment that computes Σi³:

 1 public static void main(String[] args)
 2 {
 3     System.out.println(sum(5));
 4 }
 5
 6 public static int sum(int n)
 7 {
 8     int partialSum;
 9
10     partialSum = 0;
11     for (int i = 1; i <= n; i++)
12         partialSum += i * i * i;
13
14     return partialSum;
15 }

Analyzing this program fragment is simple:

1. Declarations take no time units.

2. Lines 10 and 14 each take one time unit.

3. Line 12 takes four time units each time it executes (two multiplications, one addition, and one assignment), and executing it N times takes 4N time units in total.

4. Line 11 has hidden costs in initializing i, testing i ≤ n, and incrementing i: the initialization costs 1 time unit, all the tests cost N + 1 time units, and all the increments cost N time units, for a total of 2N + 2 time units.

Ignoring the cost of calling the method and returning, the total is 6N + 4 time units. By the definition above, the low-order term and leading coefficient are dropped, so we say the time complexity of this method is O(N). Along the way, we can derive some general rules:
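The unit accounting above can be instrumented directly. This sketch (my own, not from the original post) tallies the time units the analysis assigns to each line and compares the total with 6N + 4:

```java
public class OpCount {
    // Tally the time units assigned by the analysis above.
    public static long countUnits(int n) {
        long units = 0;
        units += 1;                  // line 10: one assignment
        units += 1;                  // line 11: initializing i
        for (int i = 1; i <= n; i++) {
            units += 1;              // line 11: the test i <= n that passed
            units += 4;              // line 12: two mults, one add, one assignment
            units += 1;              // line 11: incrementing i
        }
        units += 1;                  // line 11: the final test that fails
        units += 1;                  // line 14: the return
        return units;
    }

    public static void main(String[] args) {
        System.out.println(countUnits(1000) == 6 * 1000L + 4);  // true
    }
}
```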

Rule 1 - for loop

The running time of a for loop is at most the running time of the statements inside it (including the tests) multiplied by the number of iterations. So a for loop that iterates N times, doing constant work per iteration, has time complexity O(N).

Rule 2 - Nested for loops

Analyze these loops from the inside out. The total running time of a statement inside a group of nested loops is the running time of the statement multiplied by the product of the sizes of all the loops in the group. So, for the following code:

public static int multiSum(int n)
{
    int k = 0;
    for (int i = 0; i < n; i++)
    {
        for (int j = 0; j < n; j++)
        {
            k++;
        }
    }

    return k;
}

its time complexity is O(N²).

Rule 3 - Sequential Statements

The running times of consecutive statements simply add. For example, in the following code:

public static int sum(int n)
{
    int k = 0;

    for (int i = 0; i < n; i++)
        k++;
    for (int i = 0; i < n; i++)
    {
        for (int j = 0; j < n; j++)
        {
            k++;
        }
    }

    return k;
}

the first for loop is O(N) and the nested for loops are O(N²); taken together, the time complexity of the sum method is O(N²).

Common time complexities relate to one another by the following rule of thumb:

O(1) < O(log₂N) < O(N) < O(N·log₂N) < O(N²) < O(N³) < O(N!)

Which data structures and algorithms each complexity corresponds to will be covered later. From the rule of thumb above: the first four are relatively efficient, the middle two are unsatisfactory, and the last is very poor (once n is at all large, the algorithm becomes unusable).
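To make the gaps concrete, here is a small sketch of mine evaluating a few of these functions at n = 1,000,000 (printed values are approximate):

```java
public class GrowthRates {
    public static void main(String[] args) {
        int n = 1_000_000;
        double log2n = Math.log(n) / Math.log(2);  // ≈ 19.93
        double nLogN = n * log2n;                  // ≈ 2.0e7
        double nSquared = (double) n * n;          // 1.0e12
        System.out.printf("log2 n  = %.2f%n", log2n);
        System.out.printf("n log n = %.3e%n", nLogN);
        System.out.printf("n^2     = %.3e%n", nSquared);
    }
}
```

Even at this modest input size, an O(N²) algorithm does roughly 50,000 times more work than an O(N·log₂N) one.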

 

 
