A Beginner's Introduction to the Fundamentals of Data Structures and Algorithms (1)

Table of Contents

Introduction

First try

Algorithm proposal

Algorithm concept

Five characteristics of the algorithm

Second attempt

Algorithm efficiency measurement

Execution time reflects algorithm efficiency

Is the time value alone absolutely credible?

Time complexity and "Big O Notation"

How to understand the "big O notation"

Worst time complexity

Several basic calculation rules of time complexity

Space complexity

Analysis of Algorithms

Common time complexity

The relationship between common time complexity

Python built-in type performance analysis

timeit module

Operation test of list

Time complexity of list built-in operations

Time complexity of dict built-in operations

Data structure

Concept

The difference between algorithm and data structure

Abstract Data Type


Introduction

Let's look at a problem first:

If a+b+c=1000 and a^2+b^2=c^2 (where a, b, and c are natural numbers), how do we find all possible combinations of a, b, and c?

First try

import time

start_time = time.time()

# Note: three nested loops
for a in range(0, 1001):
    for b in range(0, 1001):
        for c in range(0, 1001):
            if a**2 + b**2 == c**2 and a+b+c == 1000:
                print("a, b, c: %d, %d, %d" % (a, b, c))

end_time = time.time()
print("elapsed: %f" % (end_time - start_time))
print("complete!")

Output:

a, b, c: 0, 500, 500
a, b, c: 200, 375, 425
a, b, c: 375, 200, 425
a, b, c: 500, 0, 500
elapsed: 214.583347
complete!

Note the running time: 214.583347 seconds

Algorithm proposal

Algorithm concept

An algorithm is the essence of how a computer processes information: a computer program is essentially an algorithm that tells the computer the exact steps to perform a specified task. Typically, when an algorithm processes information, it reads data from an input device or from a storage address, and writes the result to an output device or to a storage address for later retrieval.

An algorithm is a self-contained method and idea for solving a problem.

For an algorithm, the implementation language is not important; what matters is the idea.

An algorithm can be described and implemented in different languages (C, C++, Python, and so on). Here we use Python to describe the implementation.

Five characteristics of the algorithm

  1. Input: An algorithm has zero or more inputs
  2. Output: An algorithm has at least one output
  3. Finiteness: An algorithm terminates automatically after a finite number of steps rather than looping forever, and each step completes within an acceptable time
  4. Definiteness: Every step of an algorithm has a precise meaning, with no ambiguity
  5. Feasibility: Every step of an algorithm is feasible, that is, each step can be completed by executing a finite number of operations

Second attempt

import time

start_time = time.time()

# Note: two nested loops
for a in range(0, 1001):
    for b in range(0, 1001-a):
        c = 1000 - a - b
        if a**2 + b**2 == c**2:
            print("a, b, c: %d, %d, %d" % (a, b, c))

end_time = time.time()
print("elapsed: %f" % (end_time - start_time))
print("complete!")

Output:

a, b, c: 0, 500, 500
a, b, c: 200, 375, 425
a, b, c: 375, 200, 425
a, b, c: 500, 0, 500
elapsed: 0.182897
complete!

Note the running time: 0.182897 seconds

Algorithm efficiency measurement

Execution time reflects algorithm efficiency

We have given two solutions to the same problem. We measured the execution time of both implementations and found a huge difference (214.583347 seconds versus 0.182897 seconds). From this we can conclude that the execution time of a program reflects the efficiency of its algorithm, that is, how good or bad the algorithm is.

Is the time value alone absolutely credible?

Suppose we ran the second attempt on a very old, low-performance computer. What would happen? The running time could well end up not much better than the 214.583347 seconds that the first attempt took on our computer.

Relying on running time alone to compare algorithms is therefore not necessarily objective or accurate!

A program cannot run apart from its computing environment (hardware and operating system). These external factors affect how fast the program runs and show up in its execution time. So how can we judge an algorithm objectively?

Time complexity and "Big O Notation"

We assume that the computer takes a fixed unit of time to execute each basic operation of the algorithm, so the number of basic operations represents the number of time units spent. The exact length of a time unit differs from machine to machine, but the number of basic operations the algorithm performs (that is, the number of time units it takes) is the same in order of magnitude. We can therefore ignore the machine environment and reflect the time efficiency of the algorithm objectively.
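As a rough illustration of counting basic operations instead of seconds, the sketch below tallies how many times the innermost condition is evaluated by each of the two attempts. The function names `count_checks_first` and `count_checks_second` and the parameter `n` (the target sum, 1000 in the problem above) are introduced here for illustration only, and treating one condition check as one basic operation is a simplifying assumption.

```python
def count_checks_first(n):
    """Count condition checks made by the triple-loop approach for target sum n."""
    checks = 0
    for a in range(0, n + 1):
        for b in range(0, n + 1):
            for c in range(0, n + 1):
                checks += 1  # one evaluation of the "if" condition
    return checks


def count_checks_second(n):
    """Count condition checks made by the double-loop approach for target sum n."""
    checks = 0
    for a in range(0, n + 1):
        for b in range(0, n + 1 - a):
            checks += 1  # c is computed directly, so there is no third loop
    return checks


# A small n keeps the triple loop fast; the counts do not depend on the machine.
print(count_checks_first(50))   # 132651
print(count_checks_second(50))  # 1326
```

These counts come out the same on any computer, which is what makes them a fairer basis for comparison than wall-clock time.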

To express the time efficiency of an algorithm, we can use "big O notation".

"Big O notation": For a monotonic integer function f, if there exist an integer function g and a real constant c > 0 such that f(n) <= c*g(n) for all sufficiently large n, then g is said to be an asymptotic upper bound of f (ignoring constants), written f(n) = O(g(n)). In other words, as n tends to infinity, the growth rate of f is bounded by that of g: f grows no faster than g, up to a constant factor.

Time complexity: Suppose there is a function g such that the time algorithm A takes to process a problem of size n is T(n) = O(g(n)). Then O(g(n)) is called the asymptotic time complexity of algorithm A, or time complexity for short, denoted T(n).
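To make the big O definition concrete, here is a small numeric check, offered as a sketch rather than a proof. The functions `f` and `g` and the constants `c = 4` and `n0 = 3` are chosen here purely for illustration: since 4n^2 - (3n^2 + 2n + 1) = n^2 - 2n - 1, which is non-negative for every n >= 3, the inequality f(n) <= c*g(n) holds from n0 onwards.

```python
def f(n):
    return 3 * n**2 + 2 * n + 1


def g(n):
    return n**2


c = 4   # assumed constant factor
n0 = 3  # assumed threshold for "sufficiently large n"

# Check the inequality over a finite range (illustration only, not a proof).
holds = all(f(n) <= c * g(n) for n in range(n0, 10001))
print(holds)             # True
print(f(2) <= c * g(2))  # False: the bound may fail below n0
```

So f(n) = O(n^2) with these choices, even though the inequality fails for n = 2.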

How to understand the "big O notation"

Analyzing an algorithm in great detail is good, but its practical value is limited. For the time and space properties of an algorithm, what matters most is the order of magnitude and the trend; these are the main parts of an efficiency analysis. The constant factors in the function that counts an algorithm's basic operations can be ignored. For example, 3n^2 and 100n^2 can be considered to belong to the same order of magnitude. If these two functions are the costs of two algorithms on instances of the same size, the algorithms are considered "similar" in efficiency: both are of order n^2.

Worst time complexity

When analyzing an algorithm, there are several possible things to consider:

  • The minimum number of basic operations the algorithm needs to complete its work, that is, the optimal time complexity
  • The maximum number of basic operations the algorithm needs to complete its work, that is, the worst time complexity
  • The average number of basic operations the algorithm needs to complete its work, that is, the average time complexity

The optimal time complexity is of little value: it reflects only the most optimistic, ideal situation and so provides little useful information.

The worst time complexity provides a guarantee: the algorithm is certain to complete its work within this many basic operations.

The average time complexity is an overall evaluation of the algorithm, so it reflects the nature of the algorithm comprehensively. On the other hand, it gives no guarantee: not every run completes within this number of basic operations. The average case is also hard to compute, because the instances the algorithm is applied to may not be uniformly distributed.

Therefore, we mainly focus on the worst case of the algorithm, that is, the worst time complexity.
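The three cases can be seen in a simple linear search, which is not part of the original examples and is used here only as an illustration. The helper `linear_search` is a hypothetical function that also returns how many comparisons it made, and the average assumes every element is equally likely to be the target.

```python
def linear_search(items, target):
    """Return (index, comparisons); index is -1 if target is absent."""
    comparisons = 0
    for index, value in enumerate(items):
        comparisons += 1
        if value == target:
            return index, comparisons
    return -1, comparisons


data = list(range(1, 11))  # [1, 2, ..., 10]

best = linear_search(data, 1)[1]    # target is the first element
worst = linear_search(data, 99)[1]  # target is absent: every element is checked
average = sum(linear_search(data, t)[1] for t in data) / len(data)

print(best, worst, average)  # 1 10 5.5
```

Only the worst-case figure is a bound that every call is guaranteed to stay within.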

Several basic calculation rules of time complexity

  1. A basic operation, that is, one with only a constant term, is considered to have time complexity O(1)
  2. Sequential structure: time complexities are added
  3. Loop structure: time complexities are multiplied
  4. Branch structure: take the maximum time complexity among the branches
  5. When judging the efficiency of an algorithm, it is usually enough to consider only the highest-order term of the operation count; lower-order terms and constant factors can be ignored
  6. Unless otherwise specified, the time complexity of the algorithms we analyze is the worst time complexity
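The rules above can be traced through a short annotated function. `rule_demo` is a made-up example written for this purpose, and the per-line costs in the comments treat arithmetic and assignment as O(1) basic operations.

```python
def rule_demo(values):
    n = len(values)               # rule 1: basic operation, O(1)

    total = 0                     # O(1)
    for v in values:              # rule 3: n iterations * O(1) body = O(n)
        total += v

    pair_count = 0                # O(1)
    for i in range(n):            # rule 3: n iterations ...
        for j in range(n):        # ... times n iterations = O(n^2)
            pair_count += 1

    if total % 2 == 0:            # rule 4: take the costlier branch
        largest = max(values, default=None)  # this branch is O(n)
    else:
        largest = None                       # this branch is O(1)

    # rule 2: the steps run in sequence, so their costs are added:
    # O(1) + O(n) + O(n^2) + O(n); rule 5 keeps the highest-order term: O(n^2)
    return total, pair_count, largest


print(rule_demo([1, 2, 3]))  # (6, 9, 3)
```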

Space complexity

As with time complexity, the space complexity S(n) of an algorithm is defined as the storage space the algorithm consumes, which is also a function of the problem size n.

Asymptotic space complexity is often referred to simply as space complexity.

Space complexity is a measure of the amount of storage space an algorithm temporarily occupies while it runs.

The time complexity and space complexity of an algorithm are together called the complexity of the algorithm.
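As a sketch of how two algorithms with the same result can differ in space complexity, the two hypothetical functions below both sum the squares of 0..n-1. The first stores every square in a list (O(n) extra space), the second keeps only a running total (O(1) extra space). The helper `peak_memory` uses the standard `tracemalloc` module; the byte counts it reports depend on the Python version and platform, so no specific numbers are shown.

```python
import tracemalloc


def sum_of_squares_list(n):
    """O(n) extra space: stores every square before summing."""
    squares = [i * i for i in range(n)]
    return sum(squares)


def sum_of_squares_running(n):
    """O(1) extra space: keeps only a running total."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def peak_memory(func, n):
    """Return the peak number of bytes allocated while func(n) runs."""
    tracemalloc.start()
    func(n)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak


print("list version peak bytes:   ", peak_memory(sum_of_squares_list, 100000))
print("running version peak bytes:", peak_memory(sum_of_squares_running, 100000))
```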

Analysis of Algorithms

  1. The core part of the algorithm for the first attempt
for a in range(0, 1001):
    for b in range(0, 1001):
        for c in range(0, 1001):
            if a**2 + b**2 == c**2 and a+b+c == 1000:
                print("a, b, c: %d, %d, %d" % (a, b, c))

Time complexity:

T(n) = O(n * n * n) = O(n^3)

  2. The core part of the algorithm for the second attempt
for a in range(0, 1001):
    for b in range(0, 1001-a):
        c = 1000 - a - b
        if a**2 + b**2 == c**2:
            print("a, b, c: %d, %d, %d" % (a, b, c))

Time complexity:

T(n) = O(n * n * (1 + 1)) = O(n * n) = O(n^2)

It can be seen that the time complexity of our second attempt is much better than that of the first.

Common time complexity

Example operation count    Order       Informal term
12                         O(1)        Constant order
2n + 3                     O(n)        Linear order
3n^2 + 2n + 1              O(n^2)      Square order
5log2(n) + 20              O(logn)     Logarithmic order
2n + 3nlog2(n) + 19        O(nlogn)    nlogn order
6n^3 + 2n^2 + 3n + 4       O(n^3)      Cubic order
2^n                        O(2^n)      Exponential order

Note that log2(n) (the base-2 logarithm) is often abbreviated as logn.

The relationship between common time complexity

Time cost, from smallest to largest:

O(1) < O(logn) < O(n) < O(nlogn) < O(n^2) < O(n^3) < O(2^n) < O(n!) < O(n^n)
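To get a feel for this ordering, the sketch below evaluates each function at a single value of n. The helper `growth_values` and the choice n = 16 are illustrative assumptions. The ordering describes behavior for sufficiently large n, so for small inputs some inequalities can be reversed (for example, at n = 2, n^3 = 8 is larger than 2^n = 4).

```python
import math


def growth_values(n):
    """Values of the common complexity functions at n, in the order listed above."""
    return [
        ("1", 1),
        ("logn", math.log2(n)),
        ("n", n),
        ("nlogn", n * math.log2(n)),
        ("n^2", n**2),
        ("n^3", n**3),
        ("2^n", 2**n),
        ("n!", math.factorial(n)),
        ("n^n", n**n),
    ]


for name, value in growth_values(16):
    print("%6s: %s" % (name, value))
```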

Exercise: simplify each of the following time complexities (refer to the calculation rules for time complexity above)
O(5)
O(2n + 1)
O(n²+ n + 1)
O(3n³+1)

 

Python built-in type performance analysis

timeit module

The timeit module can be used to test the execution speed of a small piece of Python code.

class timeit.Timer(stmt='pass', setup='pass', timer=<timer function>)

Timer is a class that measures the execution speed of small pieces of code.

The stmt parameter is the code statement to be tested;

The setup parameter is the setup code needed to run the statement;

The timer parameter is a timer function, which is platform-dependent.

timeit.Timer.timeit(number=1000000)

A method of Timer objects that measures the execution speed of the statement. The number parameter is how many times the statement is executed during the measurement; the default is 1000000. The method returns the total time taken to execute the statement number times, as a float number of seconds.
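A minimal usage sketch follows. The function `build_squares` and the value of `runs` are made up for this example. `Timer` accepts a callable as well as a string for `stmt`, and since `timeit()` reports the time for all `number` executions together, the per-call time is obtained by dividing. The printed timings vary by machine, so none are shown.

```python
from timeit import Timer


def build_squares():
    return [i * i for i in range(100)]


runs = 10000
timer = Timer(build_squares)               # stmt given as a callable
total_seconds = timer.timeit(number=runs)  # time for all runs together
per_call = total_seconds / runs

print("total: %f seconds" % total_seconds)
print("per call: %e seconds" % per_call)
```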

Operation test of list

def t1():
   l = []
   for i in range(1000):
      l = l + [i]
def t2():
   l = []
   for i in range(1000):
      l.append(i)
def t3():
   l = [i for i in range(1000)]
def t4():
   l = list(range(1000))

from timeit import Timer

timer1 = Timer("t1()", "from __main__ import t1")
print("concat ",timer1.timeit(number=1000), "seconds")
timer2 = Timer("t2()", "from __main__ import t2")
print("append ",timer2.timeit(number=1000), "seconds")
timer3 = Timer("t3()", "from __main__ import t3")
print("comprehension ",timer3.timeit(number=1000), "seconds")
timer4 = Timer("t4()", "from __main__ import t4")
print("list range ",timer4.timeit(number=1000), "seconds")

# ('concat ', 1.7890608310699463, 'seconds')
# ('append ', 0.13796091079711914, 'seconds')
# ('comprehension ', 0.05671119689941406, 'seconds')
# ('list range ', 0.014147043228149414, 'seconds')

pop operation test

# In Python 3, range() returns a range object without pop(), so build a list first
x = list(range(2000000))
pop_zero = Timer("x.pop(0)","from __main__ import x")
print("pop_zero ",pop_zero.timeit(number=1000), "seconds")
x = list(range(2000000))
pop_end = Timer("x.pop()","from __main__ import x")
print("pop_end ",pop_end.timeit(number=1000), "seconds")

# ('pop_zero ', 1.9101738929748535, 'seconds')
# ('pop_end ', 0.00023603439331054688, 'seconds')

pop operation test: the results show that popping the last element is far more efficient than popping the first element.

Can you try the list's append(value) and insert(0, value) yourself, that is, inserting at the back versus inserting at the front?
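One possible way to run that comparison is sketched below. The function names, the list size of 20000, and the repeat count are arbitrary choices made here, and the measured times depend on the machine, so none are shown.

```python
from timeit import Timer


def append_back(n=20000):
    items = []
    for i in range(n):
        items.append(i)      # add at the end of the list
    return items


def insert_front(n=20000):
    items = []
    for i in range(n):
        items.insert(0, i)   # add at the front; existing elements must shift
    return items


append_time = Timer(append_back).timeit(number=3)
insert_time = Timer(insert_front).timeit(number=3)

print("append    ", append_time, "seconds")
print("insert(0) ", insert_time, "seconds")
```

Inserting at the front should come out markedly slower, for the same reason that pop(0) is slower than pop(): every existing element has to move.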

Time complexity of list built-in operations

Time complexity of dict built-in operations

Data structure

How would we use Python types to store the information of the students in a class? What if we want to quickly look up a student's information by name?

In fact, as soon as we think about this question we are already using data structures. Both a list and a dictionary can store the student information of a class. With a list, however, finding one student's information means traversing the list, which has time complexity O(n). With a dictionary, the student name serves as the key and the student information as the value, so a lookup retrieves the information quickly without traversal, with time complexity O(1) on average.
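The two storage choices might look like this in code. The student records and the helper names `find_in_list` and `find_in_dict` are invented sample data and functions for illustration only.

```python
# Hypothetical sample data: (name, age, class) for each student
students_list = [
    ("Alice", 18, "Class 1"),
    ("Bob", 19, "Class 1"),
    ("Carol", 18, "Class 2"),
]


def find_in_list(students, name):
    """Traverse the list until the name matches: O(n)."""
    for student in students:
        if student[0] == name:
            return student
    return None


# The same records keyed by student name
students_dict = {student[0]: student for student in students_list}


def find_in_dict(students, name):
    """Look the name up directly by key, with no traversal."""
    return students.get(name)


print(find_in_list(students_list, "Carol"))  # ('Carol', 18, 'Class 2')
print(find_in_dict(students_dict, "Carol"))  # ('Carol', 18, 'Class 2')
```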

To solve a problem, we need to store the data and then design an algorithm that processes it according to how it is stored. Different ways of storing the data therefore call for different algorithms. We want the algorithm to solve the problem as efficiently as possible, so we have to think about how to store the data. This is what data structures are about.

For the question above, we can choose either a list or a dictionary in Python to store the student information. Lists and dictionaries are two data structures that Python has built in and packaged for us.

Concept

Data is an abstract concept. Classifying it gives the basic types of a programming language, such as int, float, and char. Data elements are not independent of one another: there are specific relationships between them, and these relationships are the structure. A data structure is the relationship between the data elements in a data object.

Python provides many ready-made data structure types. Those that the system defines itself, so that we do not need to define them, are called Python's built-in data structures, such as lists, tuples, and dictionaries. Other ways of organizing data are not defined directly by Python, and we have to define them ourselves. These are called Python's extended data structures, such as stacks and queues.
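As an example of a data structure we define ourselves, here is a minimal sketch of a stack built on top of the built-in list. The class name `Stack` and its method names are one common convention, chosen here for illustration rather than taken from the original article.

```python
class Stack:
    """A last-in, first-out stack built on top of the built-in list."""

    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)  # add to the top

    def pop(self):
        return self._items.pop()  # remove from the top; IndexError if empty

    def peek(self):
        return self._items[-1]    # look at the top without removing it

    def is_empty(self):
        return not self._items

    def size(self):
        return len(self._items)


stack = Stack()
for value in (1, 2, 3):
    stack.push(value)

print(stack.pop(), stack.pop(), stack.size())  # 3 2 1
```

Using the end of the list as the top of the stack keeps both push and pop cheap, in line with the append and pop() timings measured earlier.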

The difference between algorithm and data structure

A data structure only describes, statically, the relationships between data elements.

An efficient program requires algorithms that are designed and chosen on the basis of the data structure.

Program = data structure + algorithm

Summary: An algorithm is designed to solve a practical problem, and a data structure is the carrier of the problem that the algorithm has to deal with.

Abstract Data Type

An abstract data type (ADT) is a mathematical model together with a set of operations defined on that model. In other words, the data type and the operations on it are bundled together and encapsulated. The purpose of introducing abstract data types is to separate the representation of a data type and the implementation of its operations from the places in a program that use the type and its operations, making the two independent of each other.

There are five most commonly used data operations:

  • insert
  • delete
  • modify
  • find
  • sort
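The five operations can be sketched as an abstract data type. The class `Roster` below is a hypothetical example: callers use only its methods, while the dict that holds the data is an internal representation detail chosen here for convenience.

```python
class Roster:
    """Hypothetical ADT of student scores; callers never touch the internal dict."""

    def __init__(self):
        self._scores = {}  # representation detail, hidden from callers

    def insert(self, name, score):
        self._scores[name] = score

    def delete(self, name):
        self._scores.pop(name, None)

    def modify(self, name, score):
        if name not in self._scores:
            raise KeyError(name)
        self._scores[name] = score

    def find(self, name):
        return self._scores.get(name)

    def sort(self):
        """Return (name, score) pairs ordered by score, highest first."""
        return sorted(self._scores.items(), key=lambda pair: pair[1], reverse=True)


roster = Roster()
roster.insert("Alice", 90)
roster.insert("Bob", 75)
roster.insert("Carol", 82)
roster.modify("Bob", 95)
roster.delete("Carol")

print(roster.find("Bob"))  # 95
print(roster.sort())       # [('Bob', 95), ('Alice', 90)]
```

Because the calling code depends only on the five methods, the internal dict could be replaced by another representation without changing any caller, which is the separation an ADT is meant to provide.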

 

 


Origin blog.csdn.net/weixin_45293202/article/details/114707875