1. Introducing the concepts

Data Structures and Algorithms (Python)

Why?

    Let us take an example that may not be entirely apt:

    If running a finished program is like fighting on a battlefield, then we programmers are the generals in command, and the code we write is our soldiers and weapons.

    So what are data structures and algorithms? The answer: the art of war!

    We can fight on the battlefield without ever studying the art of war, but then we might win or we might lose, and even a victory may come at a heavy price. Writing programs is the same. Without studying data structures and algorithms, we will sometimes face a problem with no idea where to start. Most of the time we may solve the problem but pay no attention to the running efficiency and cost of the program, so performance is poor. Sometimes we solve the problem for the moment with tools developed by others, but when we hit a performance bottleneck we do not know how to optimize in a targeted way.

    If we study the art of war regularly, we can fight with confidence and sometimes win against larger forces. Similarly, if we study data structures and algorithms regularly, we can write programs with ease, see through to the heart of a problem, and solve the problems we run into.

    Data structures and algorithms are therefore an essential basic skill for a program developer, and nobody becomes a peerless master of them overnight. Rome was not built in a day; we need to keep learning and accumulating on our own initiative.

    Over these 3 days of study, we hope you will come to understand the concepts and master the commonly used data structures and algorithms.

Introduction

Consider a problem first:

If a + b + c = 1000 and a^2 + b^2 = c^2 (where a, b, and c are natural numbers), how do we find all possible combinations of a, b, and c?

1.1 The first attempt

import time

start_time = time.time()

# Note: this is a triple loop
for a in range(0, 1001):
    for b in range(0, 1001):
        for c in range(0, 1001):
            if a**2 + b**2 == c**2 and a+b+c == 1000:
                print("a, b, c: %d, %d, %d" % (a, b, c))

end_time = time.time()
print("elapsed: %f" % (end_time - start_time))
print("complete!")

Output:

a, b, c: 0, 500, 500
a, b, c: 200, 375, 425
a, b, c: 375, 200, 425
a, b, c: 500, 0, 500
elapsed: 214.583347
complete!

1.2 Introducing algorithms

The concept of an algorithm

    Algorithms are the essence of how computers process information, because a computer program is essentially an algorithm that tells the computer the exact steps to take to perform a specified task. Typically, when an algorithm processes information, it reads data from an input device or a storage address, and writes the result to an output device or a storage address for later use.

    An algorithm is a method or idea for solving a problem, and it exists independently of any particular implementation.

    For an algorithm, the language used to implement it does not matter; what matters is the idea.

    An algorithm can be described and implemented in different languages (a C version, a C++ version, a Python version, and so on). Here we implement them in Python.

Five characteristics of an algorithm

  1. Input: an algorithm has zero or more inputs
  2. Output: an algorithm has at least one output
  3. Finiteness: an algorithm terminates automatically after a finite number of steps rather than looping forever, and each step completes within an acceptable time
  4. Definiteness: every step of an algorithm has a definite meaning, with no ambiguity
  5. Feasibility: every step of an algorithm is feasible, meaning each step can be completed by a finite number of executions
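As a small illustration of the five characteristics, here is a sketch of Euclid's algorithm for the greatest common divisor. It is not part of the original text, and the function name `gcd_euclid` is chosen here purely for illustration.

```python
def gcd_euclid(a, b):
    """Greatest common divisor of two non-negative integers."""
    # Input: the algorithm takes two inputs, a and b.
    while b != 0:
        # Definiteness: each step is a single unambiguous arithmetic operation.
        # Feasibility: the remainder can always be computed in finite time.
        # Finiteness: b strictly decreases and never goes below 0,
        # so the loop must stop.
        a, b = b, a % b
    # Output: the algorithm produces exactly one result.
    return a


print(gcd_euclid(1000, 425))  # prints 25
```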

1.3 The second attempt

import time

start_time = time.time()

# Note: this is a double loop; c is computed from a and b
for a in range(0, 1001):
    for b in range(0, 1001):
        c = 1000 - a - b
        if a**2 + b**2 == c**2:
            print("a, b, c: %d, %d, %d" % (a, b, c))

end_time = time.time()
print("elapsed: %f" % (end_time - start_time))
print("complete!")

Output:

a, b, c: 0, 500, 500
a, b, c: 200, 375, 425
a, b, c: 375, 200, 425
a, b, c: 500, 0, 500
elapsed: 0.182897
complete!

Note the running time: 0.182897 seconds.

1.4 Measuring algorithm efficiency

Execution time reflects algorithm efficiency

    For the same problem we gave two algorithms. When we implemented them and timed each program, we found a huge difference between the two execution times (214.583347 seconds versus 0.182897 seconds). From this we can conclude that the execution time of a program can reflect the efficiency of the algorithm, that is, how good the algorithm is.

Is the absolute time alone trustworthy?

    Suppose we ran the second attempt on an ancient computer with very poor performance. What would happen? Its running time might well be no better than the 214.583347 seconds the first attempt took on our computer.

    Judging algorithms purely by running time is not necessarily objective or accurate!

    A running program cannot be separated from its computing environment (including hardware and operating system), and these factors affect how fast the program runs and show up in its execution time. So how can we judge the quality of an algorithm objectively?

Time complexity and "big O notation"

    We assume that each basic operation the computer performs for an algorithm takes a fixed unit of time, so the number of basic operations tells us how many units of time the algorithm takes. The exact length of that unit differs from machine to machine, but the number of basic operations an algorithm performs (that is, how many units of time it takes) stays the same in order of magnitude. We can therefore ignore the machine environment and describe the time efficiency of the algorithm objectively.
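To make "counting basic operations" concrete, the sketch below counts how often the innermost test runs in the two attempts, with the bound 1000 replaced by a parameter n. The function names are hypothetical, and the numbers are operation counts, not timings.

```python
def count_triple_loop(n):
    """Innermost tests executed by the first attempt for bound n."""
    count = 0
    for a in range(n + 1):
        for b in range(n + 1):
            for c in range(n + 1):
                count += 1  # one evaluation of the if-condition
    return count


def count_double_loop(n):
    """Innermost tests executed by the second attempt for bound n."""
    count = 0
    for a in range(n + 1):
        for b in range(n + 1):
            count += 1  # one evaluation of the if-condition
    return count


for n in (10, 20, 40):
    print(n, count_triple_loop(n), count_double_loop(n))
# 10 1331 121
# 20 9261 441
# 40 68921 1681
```

The counts are the same on every machine: doubling n multiplies the first count by roughly 8 and the second by roughly 4, whatever the hardware.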

    For the time efficiency of an algorithm, we can use "big O notation".

    "Big O notation": for a monotonic integer function f, if there exist an integer function g and a real constant c > 0 such that f(n) <= c * g(n) for all sufficiently large n, then g is said to be an asymptotic function of f (ignoring constants), written f(n) = O(g(n)). In other words, in the limit as n tends to infinity, the growth rate of f is bounded by that of g, so g characterizes how f grows.

    Time complexity: suppose there is a function g such that the time algorithm A takes to process a problem instance of size n is T(n) = O(g(n)). Then O(g(n)) is called the asymptotic time complexity of algorithm A, or time complexity for short, written T(n).
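The definition can be checked numerically on a small example. The sketch below takes f(n) = 3n^2 + 2n + 1 and g(n) = n^2, with c = 4 and a starting point n0 = 3 chosen here for illustration. A finite check like this is not a proof, but here the bound does hold for every n >= 3, because 2n + 1 <= n^2 from that point on.

```python
def f(n):
    # Example operation-count function
    return 3 * n**2 + 2 * n + 1


def g(n):
    # Candidate bounding function
    return n**2


def bounded_by(f, g, c, n0, n_max):
    """Check that f(n) <= c * g(n) for every n in [n0, n_max]."""
    return all(f(n) <= c * g(n) for n in range(n0, n_max + 1))


print(bounded_by(f, g, c=4, n0=3, n_max=10000))  # True
print(f(2) <= 4 * g(2))                          # False: 17 > 16
```

The second line shows why the definition only asks for "sufficiently large n": the bound may fail for a few small values.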

How to understand "big O notation"

    Very detailed analysis of an algorithm is nice to have but of limited practical value. What matters most about an algorithm's time and space behaviour is its order of magnitude and its trend; these are the main part of analysing an algorithm's efficiency. The constant factors in the function that counts an algorithm's basic operations can be ignored. For example, 3n^2 and 100n^2 can be considered to belong to the same order of magnitude: if two algorithms cost these two functions on instances of the same size, we regard their efficiency as "about the same", both of order n^2.

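Worst-case time complexity

    When analysing an algorithm, there are several cases to consider:

  • The minimum number of basic operations the algorithm needs to finish its work: the best-case time complexity
  • The maximum number of basic operations the algorithm needs to finish its work: the worst-case time complexity
  • The average number of basic operations the algorithm needs to finish its work: the average-case time complexity

    The best-case time complexity is of little value: it reflects only the most optimistic, ideal situation and provides no useful information.

    The worst-case time complexity provides a guarantee: the algorithm is certain to finish its work within that many basic operations.

    The average-case time complexity is an overall evaluation of the algorithm, so it reflects the algorithm's behaviour comprehensively. On the other hand, it offers no guarantee, since not every computation finishes within that many basic operations. The average case can also be hard to compute, because the instances the algorithm is applied to may not be uniformly distributed.

    We therefore focus mainly on the worst case, that is, the worst-case time complexity.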
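The three cases can be seen in a linear search. This is an illustrative sketch that is not in the original text; `linear_search_steps` is a hypothetical helper that returns how many comparisons the search makes.

```python
def linear_search_steps(items, target):
    """Return the number of comparisons a linear search performs."""
    steps = 0
    for item in items:
        steps += 1
        if item == target:
            break
    return steps


data = list(range(1, 101))  # the values 1..100

best = linear_search_steps(data, 1)      # target is the first element
worst = linear_search_steps(data, 100)   # target is the last element
missing = linear_search_steps(data, -1)  # target is absent
# Average over all present targets, assuming each is equally likely
average = sum(linear_search_steps(data, t) for t in data) / len(data)

print(best, worst, missing, average)  # 1 100 100 50.5
```

Only the worst case (100 comparisons for 100 elements) is a bound that every search is guaranteed to stay within. The average here also depends on the assumption that each target is equally likely.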

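Basic rules for calculating time complexity

  1. Basic operations, that is, those with only a constant term, have time complexity O(1)
  2. Sequential structure: time complexities are added
  3. Loop structure: time complexities are multiplied
  4. Branch structure: take the maximum time complexity
  5. When judging the efficiency of an algorithm, usually only the highest-order term of the operation count matters; lower-order terms and constant factors can be ignored
  6. Unless stated otherwise, the time complexity we analyse is the worst-case time complexity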
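The rules above can be traced on a small made-up function. The sketch below counts its own basic operations in a counter named `ops`; the function and its structure are invented here only to show the rules at work.

```python
def count_operations(n):
    """Count the basic operations of a small example program."""
    ops = 0

    # Rule 1: a single basic operation costs O(1).
    ops += 1

    # Rule 3: a loop multiplies: n iterations of an O(1) body give O(n).
    for i in range(n):
        ops += 1

    # Rule 3 again: nested loops give n * n = O(n^2).
    for i in range(n):
        for j in range(n):
            ops += 1

    # Rule 4: for a branch, take the more expensive side, here O(n).
    if n % 2 == 0:
        ops += 1              # O(1) branch
    else:
        for i in range(n):    # O(n) branch
            ops += 1

    # Rule 2: sequential parts add up: 1 + n + n^2 + (1 or n).
    # Rule 5: only the highest-order term matters, so T(n) = O(n^2).
    return ops


print(count_operations(10))   # 112   = 1 + 10 + 100 + 1
print(count_operations(100))  # 10102 = 1 + 100 + 10000 + 1
```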

1.5 Algorithm analysis

1. Core of the first attempt

for a in range(0, 1001):
    for b in range(0, 1001):
        for c in range(0, 1001):
            if a**2 + b**2 == c**2 and a+b+c == 1000:
                print("a, b, c: %d, %d, %d" % (a, b, c))

Time complexity:

T(n) = O(n*n*n) = O(n^3)

2. Core of the second attempt

for a in range(0, 1001):
    for b in range(0, 1001-a):
        c = 1000 - a - b
        if a**2 + b**2 == c**2:
            print("a, b, c: %d, %d, %d" % (a, b, c))

Time complexity:

T(n) = O(n*n*(1+1)) = O(n*n) = O(n^2)

So the second algorithm has a much better time complexity than the first.

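1.6 Common time complexities

Example operation-count function    Order       Informal term
12                                  O(1)        constant order
2n+3                                O(n)        linear order
3n^2+2n+1                           O(n^2)      quadratic order
5log2(n)+20                         O(logn)     logarithmic order
2n+3nlog2(n)+19                     O(nlogn)    nlogn order
6n^3+2n^2+3n+4                      O(n^3)      cubic order
2^n                                 O(2^n)      exponential order

Note: log2(n) (the logarithm to base 2) is often abbreviated as logn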

Relationship between common time complexities

Time consumed, from smallest to largest:

O(1) < O(logn) < O(n) < O(nlogn) < O(n^2) < O(n^3) < O(2^n) < O(n!) < O(n^n)
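The ordering can be checked by evaluating each function for a few values of n. The helper `growth_row` below is a hypothetical name used only for this sketch.

```python
import math


def growth_row(n):
    """Values of the common complexity functions at n, in the order listed."""
    return [
        1,                  # O(1)
        math.log2(n),       # O(logn)
        n,                  # O(n)
        n * math.log2(n),   # O(nlogn)
        n**2,               # O(n^2)
        n**3,               # O(n^3)
        2**n,               # O(2^n)
        math.factorial(n),  # O(n!)
        n**n,               # O(n^n)
    ]


for n in (10, 20, 30):
    row = growth_row(n)
    # Each row is already in increasing order for these values of n
    print(n, row == sorted(row), ["%.3g" % v for v in row])
```

The ordering is asymptotic, so it need not hold for small n. At n = 4, for example, 2^n = 16 is still smaller than n^3 = 64.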

Exercise: time complexity practice (use the rules above to determine the order of each)
O(5)
O(2n + 1)
O(n^2 + n + 1)
O(3n^3 + 1)

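1.7 Performance analysis of Python built-in types

The timeit module

The timeit module can be used to measure the execution speed of a small piece of Python code.

class timeit.Timer(stmt='pass', setup='pass', timer=<timer function>)

Timer is the class for measuring the execution speed of small code snippets.

The stmt parameter is the code statement to be timed;

the setup parameter is the setup code needed before running the statement;

the timer parameter is a timer function (platform-dependent in Python 2; time.perf_counter by default since Python 3.3).

timeit.Timer.timeit(number=1000000)

The method of the Timer class that times the statement. The number parameter is how many times the statement is executed, 1000000 by default. The method returns the total time taken by all number executions (not the average per execution), as a float number of seconds.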
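A minimal usage sketch: the function `build_squares` is a made-up workload, and it is passed to Timer as a callable, which the timeit module accepts in place of a statement string. Since timeit() returns the total for all runs, the per-call time is obtained by dividing by number.

```python
from timeit import Timer


def build_squares():
    """Hypothetical workload: build a list of 100 squares."""
    return [i * i for i in range(100)]


number = 1000
timer = Timer(build_squares)          # stmt may be a zero-argument callable
total = timer.timeit(number=number)   # total seconds for all 1000 runs

print("total: %f s, per call: %e s" % (total, total / number))
```

The printed values depend on the machine, so no expected output is shown.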

Testing list operations

from timeit import Timer


def test1():
    # Concatenation: builds a new list on every iteration
    l = []
    for i in range(1000):
        l = l + [i]


def test2():
    # append: adds to the end of the same list in place
    l = []
    for i in range(1000):
        l.append(i)


def test3():
    # List comprehension
    l = [i for i in range(1000)]


def test4():
    # list() constructor applied to a range
    l = list(range(1000))


# Each Timer runs its function 1000 times and reports the total time
t1 = Timer("test1()", "from __main__ import test1")
print("concat ", t1.timeit(number=1000), "seconds")
t2 = Timer("test2()", "from __main__ import test2")
print("append ", t2.timeit(number=1000), "seconds")
t3 = Timer("test3()", "from __main__ import test3")
print("comprehension ", t3.timeit(number=1000), "seconds")
t4 = Timer("test4()", "from __main__ import test4")
print("list range ", t4.timeit(number=1000), "seconds")

# ('concat ', 1.7890608310699463, 'seconds')
# ('append ', 0.13796091079711914, 'seconds')
# ('comprehension ', 0.05671119689941406, 'seconds')
# ('list range ', 0.014147043228149414, 'seconds')

Testing the pop operation

# In Python 3, range() does not return a list, so convert it before calling pop()
x = list(range(2000000))
pop_zero = Timer("x.pop(0)", "from __main__ import x")
print("pop_zero ", pop_zero.timeit(number=1000), "seconds")
x = list(range(2000000))
pop_end = Timer("x.pop()", "from __main__ import x")
print("pop_end ", pop_end.timeit(number=1000), "seconds")

# ('pop_zero ', 1.9101738929748535, 'seconds')
# ('pop_end ', 0.00023603439331054688, 'seconds')

 

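The pop test: the results show that popping the last element is far more efficient than popping the first element.

You can try the same comparison yourself with the list's append(value) and insert(0, value), that is, inserting at the end versus inserting at the front.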
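One possible way to run that experiment is sketched below. The function names and the sizes (5000 elements, 10 repetitions) are arbitrary choices made here. The timings depend on the machine, so none are shown.

```python
from timeit import Timer


def fill_with_append(n=5000):
    """Build a list by inserting every element at the end."""
    items = []
    for i in range(n):
        items.append(i)
    return items


def fill_with_insert_front(n=5000):
    """Build a list by inserting every element at the front."""
    items = []
    for i in range(n):
        items.insert(0, i)  # shifts all existing elements one slot right
    return items


t_append = Timer(fill_with_append).timeit(number=10)
t_insert = Timer(fill_with_insert_front).timeit(number=10)

print("append      ", t_append, "seconds")
print("insert front", t_insert, "seconds")
```

On CPython, insert(0, value) has to move every existing element, so the gap between the two timings is expected to widen as n grows.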

Time complexity of built-in list operations

[Figure missing from the source]


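1.8 Data structures

How would we use Python types to store the information of a class of students? What if we want to look up a student's information quickly by name?

    In fact, as soon as we think about this question we are already using data structures. Both a list and a dictionary can store the information of a class of students. To find one student's information in a list, however, we have to traverse the list, which has time complexity O(n). If we use a dictionary, with the student's name as the key and the student's information as the value, a lookup needs no traversal and retrieves the information quickly, with time complexity O(1).

    To solve a problem we need to store the data, and then design an algorithm to process it according to how it is stored. Different ways of storing the data therefore call for different algorithms. We want the algorithm to solve the problem as efficiently as possible, so we have to consider exactly how the data should be stored. That is what data structures are about.

    In the problem above we can choose a Python list or dictionary to store the student information. Lists and dictionaries are two data structures that Python provides built in, already packaged for us.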
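The two storage choices can be sketched as follows. The student records and helper names are invented sample data for illustration.

```python
# Hypothetical sample records
students_list = [
    {"name": "Alice", "age": 18, "hometown": "Beijing"},
    {"name": "Bob", "age": 19, "hometown": "Shanghai"},
    {"name": "Carol", "age": 18, "hometown": "Guangzhou"},
]


def find_in_list(students, name):
    """O(n): walk the list until a record with this name is found."""
    for student in students:
        if student["name"] == name:
            return student
    return None


# The same records keyed by name
students_dict = {student["name"]: student for student in students_list}


def find_in_dict(students, name):
    """O(1) on average: a single hash lookup by key."""
    return students.get(name)


print(find_in_list(students_list, "Carol")["hometown"])  # Guangzhou
print(find_in_dict(students_dict, "Carol")["hometown"])  # Guangzhou
```

Both functions return the same record; only the amount of work needed to reach it differs.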

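Concept

    Data is an abstract concept; classifying it gives the basic types of a programming language, such as int, float, and char. Data elements are not independent of one another; specific relationships exist between them, and these relationships are the structure. A data structure is the set of relationships among the data elements of a data object.

    Python provides many ready-made data structure types. Those that the system defines for us, so that we do not need to define them ourselves, are called Python's built-in data structures, such as lists, tuples, and dictionaries. Other ways of organizing data are not directly defined in Python, and we have to define and implement them ourselves. These are called Python's extended data structures, such as stacks and queues.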
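As an example of an extended data structure, here is a minimal sketch of a stack built on top of the built-in list. The class and method names are a common convention and are an assumption here, since the text does not define them.

```python
class Stack:
    """A last-in, first-out container built on a Python list."""

    def __init__(self):
        self._items = []  # the end of the list is the top of the stack

    def push(self, item):
        """Add an item to the top of the stack."""
        self._items.append(item)

    def pop(self):
        """Remove and return the top item (IndexError if empty)."""
        return self._items.pop()

    def peek(self):
        """Return the top item without removing it."""
        return self._items[-1]

    def is_empty(self):
        return not self._items

    def size(self):
        return len(self._items)


stack = Stack()
for value in (1, 2, 3):
    stack.push(value)
print(stack.pop(), stack.peek(), stack.size())  # 3 2 2
```

The top of the stack is placed at the end of the list because append() and pop() at the end are the cheap list operations measured in section 1.7.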

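The difference between algorithms and data structures

    A data structure only describes, statically, the relationships between data elements.

    An efficient program requires designing and choosing algorithms on top of its data structures.

    Program = data structures + algorithms

    Summary: algorithms are designed to solve real problems, and data structures are the carriers of the problem that algorithms work on.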

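Abstract Data Type

    An abstract data type (ADT) is a mathematical model together with a set of operations defined on that model. In other words, a data type and the operations on it are bundled together and encapsulated. The purpose of introducing abstract data types is to separate the representation of a data type and the implementation of its operations from the places in the program that use them, so that the two are independent of each other.

    The five most commonly used data operations are:

  • Insert
  • Delete
  • Update
  • Find
  • Sort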
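The sketch below shows one way an ADT might expose these five operations while hiding its representation. `ScoreTable` and its method names are hypothetical. Callers only see the operations, so the internal dictionary could be swapped for another structure without changing code that uses the class.

```python
class ScoreTable:
    """A tiny ADT mapping student names to scores."""

    def __init__(self):
        self._scores = {}  # hidden representation: name -> score

    def insert(self, name, score):
        """Insert a new entry."""
        self._scores[name] = score

    def delete(self, name):
        """Delete an entry (KeyError if the name is absent)."""
        del self._scores[name]

    def update(self, name, score):
        """Update an existing entry (KeyError if the name is absent)."""
        if name not in self._scores:
            raise KeyError(name)
        self._scores[name] = score

    def find(self, name):
        """Find a score by name; returns None if the name is absent."""
        return self._scores.get(name)

    def sorted_by_score(self):
        """Return (name, score) pairs sorted from highest to lowest score."""
        return sorted(self._scores.items(), key=lambda pair: pair[1], reverse=True)


table = ScoreTable()
table.insert("Alice", 90)
table.insert("Bob", 75)
table.insert("Carol", 82)
table.update("Bob", 95)
table.delete("Carol")
print(table.find("Bob"))        # 95
print(table.sorted_by_score())  # [('Bob', 95), ('Alice', 90)]
```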

Origin www.cnblogs.com/livelychen/p/11655276.html