Summary: functional style avoids explicit for-loop structures, so during debugging you only need to look at the core function. Moreover, when the task is complex and the data set is large, functional programming combined with multiprocessing is more efficient.
1. For-loop style
Suppose we have a list `data` and want to square every element. The two common implementations are a for loop and a list comprehension:
# 1. for loop
res1 = []
for n in data:
    res1.append(n ** 2)
# 2. list comprehension
res2 = [n ** 2 for n in data]
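Both forms produce the same list; here is a minimal self-contained check (the sample values are illustrative, not the benchmark data used later):

```python
data = [3, 1, 4, 1, 5]

# 1. explicit for loop: build the result one append at a time
res1 = []
for n in data:
    res1.append(n ** 2)

# 2. list comprehension: the same logic as one expression
res2 = [n ** 2 for n in data]

print(res1)          # [9, 1, 16, 1, 25]
print(res1 == res2)  # True
```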
2. Functional style
Both approaches above are built around an explicit for loop, which can be unfriendly to debug. The goal of the functional style is to minimize for/while loop structure, so that debugging only needs to look at the core function. The pattern is map(function, data). In code:
# define the core function first
def square(n):
    return n ** 2
# 3. functional style
res3 = map(square, data)
# map returns an iterator, so convert it with list()
print(list(res3)[:10])
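One detail worth knowing: in Python 3, map returns a lazy iterator, which is why the list() conversion above is needed before printing. A small sketch of that laziness:

```python
def square(n):
    return n ** 2

data = [1, 2, 3, 4]
res = map(square, data)  # nothing is computed yet; res is a lazy iterator

first = next(res)        # computes only the first element
rest = list(res)         # consumes the remaining elements

print(first)  # 1
print(rest)   # [4, 9, 16]
```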
Here is the complete code for all three approaches, with a timing comparison:
import random
import time
from multiprocessing import Pool

def square(n):
    return n ** 2

if __name__ == "__main__":
    random.seed(1)
    data = [random.randint(0, 1000) for i in range(1000000)]
    # square every element of data
    # 1. for loop
    tis1 = time.time()
    res1 = []
    for n in data:
        res1.append(n ** 2)
    print(res1[:10])
    tie1 = time.time()
    print("1time:", (tie1 - tis1) * 1000)
    # 2. list comprehension
    tis2 = time.time()
    res2 = [n ** 2 for n in data]
    print(res2[:10])
    tie2 = time.time()
    print("2time:", (tie2 - tis2) * 1000)
    # 3. functional style
    tis3 = time.time()
    res3 = map(square, data)
    print(list(res3)[:10])
    tie3 = time.time()
    print("3time:", (tie3 - tis3) * 1000)
Output: the list comprehension is fastest here. In principle, the more complex the task and the larger the data, the better the functional style should fare; try it yourself if you are interested.
[18769, 338724, 751689, 674041, 611524, 4096, 68121, 14400, 257049, 606841]
1time: 389.9209499359131
[18769, 338724, 751689, 674041, 611524, 4096, 68121, 14400, 257049, 606841]
2time: 313.1868839263916
[18769, 338724, 751689, 674041, 611524, 4096, 68121, 14400, 257049, 606841]
3time: 364.02082443237305
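A single timed run like the ones above is noisy; the standard-library timeit module repeats each candidate and reports the total, which gives steadier numbers. A sketch of the same comparison (smaller data so it runs quickly; not the original benchmark):

```python
import timeit

data = list(range(100_000))

def square(n):
    return n ** 2

def use_loop():
    res = []
    for n in data:
        res.append(n ** 2)
    return res

def use_listcomp():
    return [n ** 2 for n in data]

def use_map():
    return list(map(square, data))

# each candidate runs 5 times; timeit returns the total time in seconds
for name, fn in [("loop", use_loop), ("listcomp", use_listcomp), ("map", use_map)]:
    print(name, timeit.timeit(fn, number=5))
```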
3. Multiprocessing
A caveat first: multiprocessing does not suit every workload. For the simple task above, even with plenty of data, multiprocessing turns out far slower than a single process. My own measurements:
[18769, 338724, 751689, 674041, 611524, 4096, 68121, 14400, 257049, 606841]
1time: 3.988504409790039
[18769, 338724, 751689, 674041, 611524, 4096, 68121, 14400, 257049, 606841]
2time: 3.0193328857421875
[18769, 338724, 751689, 674041, 611524, 4096, 68121, 14400, 257049, 606841]
3time: 2.995014190673828
# multiprocessing is far slower than a single process here
[18769, 338724, 751689, 674041, 611524, 4096, 68121, 14400, 257049, 606841]
4time: 755.9380531311035
[18769, 338724, 751689, 674041, 611524, 4096, 68121, 14400, 257049, 606841]
5time: 225.39448738098145
This is because processes have communication overhead between them; when the task is trivial, the communication time far exceeds the actual computation time, so multiprocessing loses out. We therefore modified the task, using time.sleep() to stand in for a complex function.
The multiprocessing versions use the two common idioms, pool.apply_async( ) and pool.map( ). The code:
import random
import time
from multiprocessing import Pool

def square(n):
    time.sleep(0.2)
    return n ** 2

if __name__ == "__main__":
    random.seed(1)
    data = [random.randint(0, 1000) for i in range(50)]
    # square every element of data
    # 1. for loop
    tis1 = time.time()
    res1 = []
    for n in data:
        res1.append(n ** 2)
        time.sleep(0.2)
    print(res1[:10])
    tie1 = time.time()
    print("1time:", (tie1 - tis1) * 1000)
    # 2. list comprehension
    tis2 = time.time()
    res2 = [n ** 2 for n in data]
    time.sleep(0.2 * len(data))
    print(res2[:10])
    tie2 = time.time()
    print("2time:", (tie2 - tis2) * 1000)
    # 3. functional style
    tis3 = time.time()
    res3 = map(square, data)
    print(list(res3)[:10])
    tie3 = time.time()
    print("3time:", (tie3 - tis3) * 1000)
    # 4. multiprocessing with apply_async
    tis4 = time.time()
    p = Pool(4)
    res_l = []
    for n in data:
        res = p.apply_async(square, (n,))
        res_l.append(res)
    p.close()
    p.join()
    print([res_l[i].get() for i in range(10)])
    tie4 = time.time()
    print("4time:", (tie4 - tis4) * 1000)
    # 5. pool.map
    tis5 = time.time()
    p_m = Pool(4)
    res5 = p_m.map(square, data)
    print(res5[:10])
    tie5 = time.time()
    print("5time:", (tie5 - tis5) * 1000)
Output: pool.map is the most efficient approach here.
[18769, 338724, 751689, 674041, 611524, 4096, 68121, 14400, 257049, 606841]
1time: 10190.139770507812
[18769, 338724, 751689, 674041, 611524, 4096, 68121, 14400, 257049, 606841]
2time: 10013.084888458252
[18769, 338724, 751689, 674041, 611524, 4096, 68121, 14400, 257049, 606841]
3time: 10185.515880584717
[18769, 338724, 751689, 674041, 611524, 4096, 68121, 14400, 257049, 606841]
4time: 3127.182722091675
[18769, 338724, 751689, 674041, 611524, 4096, 68121, 14400, 257049, 606841]
5time: 3075.0463008880615