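Encoding
Early Python 3 releases ran slightly slower than Python 2, but left a lot of room for optimization;
Python 3 uses UTF-8 as the default source encoding, so identifiers are no longer limited to ASCII characters;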
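Syntax
Python 2 accepts both <> and != for "not equal"; Python 3 accepts only !=;
Python 3 treats as and with, as well as True, False and None, as keywords;
/ between integers is true division in Python 3; floor division requires //;
Python 3 removed the print statement and added the print() function;
Python 3 removed raw_input(); input() now does what raw_input() did and returns a string;
super() can now be called without arguments;
The ordering operators (<, <=, >, >=) changed behavior: comparing values of unrelated types no longer falls back to an implicit ordering and raises TypeError instead;
Octal literals must use the 0o prefix;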
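Several of the syntax points above can be checked directly in a Python 3 interpreter. The following is a minimal sketch; the class names Base and Child are made up for illustration.

```python
# Python 3 only: each part illustrates one point from the list above.

# / is true division, // is floor division.
true_div = 7 / 2
floor_div = 7 // 2

# Octal literals need the 0o prefix.
octal_value = 0o17

# Ordering comparisons between unrelated types raise TypeError.
try:
    1 < "1"
    mixed_comparison = "allowed"
except TypeError:
    mixed_comparison = "TypeError"


class Base:
    def greet(self):
        return "hello"


class Child(Base):
    def greet(self):
        # Zero-argument super() replaces super(Child, self).
        return super().greet() + " from Child"


print(true_div, floor_div, octal_value, mixed_comparison)
print(Child().greet())
```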
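Data types
In Python 2, str is an 8-bit byte string and unicode is a separate type; in Python 3, str is always Unicode text and is the only text string type;
Python 3 removed the separate long type; the only integer type is int, which behaves like the old long. It also added the bytes type, which corresponds to the 8-bit strings of 2.x; str and bytes are converted into each other with .encode() and .decode();
On the object-oriented side, every class is a new-style class that implicitly inherits from the base class object;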
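A short sketch of the str/bytes round trip and the unified int type described above. The sample text is arbitrary, and UTF-8 is chosen here as an assumption; any codec that can represent the text would work.

```python
text = "héllo"
data = text.encode("utf-8")        # str -> bytes
round_trip = data.decode("utf-8")  # bytes -> str

# int has arbitrary precision, so no separate long type is needed.
big = 2 ** 100

# The accented character takes two bytes in UTF-8, so the lengths differ.
print(type(text).__name__, len(text))
print(type(data).__name__, len(data))
print(round_trip == text, type(big).__name__)
```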
Exceptions
All exceptions inherit from BaseException, and StandardError was removed
Python 2 catches an exception with
try:
    ...
except Exception, e:
    ...

Python 3 catches an exception with
try:
    ...
except Exception as e:
    ...
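As a runnable illustration of the Python 3 form, here is a small sketch. The helper parse_int is hypothetical and exists only to trigger an exception.

```python
def parse_int(value):
    """Return int(value), or None if value is not a valid integer."""
    try:
        return int(value)
    except ValueError as e:
        # "as e" binds the exception instance for use inside this block.
        print("could not parse:", e)
        return None


print(parse_int("42"))
print(parse_int("abc"))

# Ordinary errors derive from Exception, which derives from BaseException.
print(issubclass(ValueError, Exception), issubclass(Exception, BaseException))
```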

Other

xrange() was renamed to range(); to get a list from range() you must convert it explicitly with list();

The file type was removed; open files with open();
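Both points can be tried out as follows. This is only a sketch: the temporary directory and the file name demo.txt are assumptions made so that the example does not touch any real file.

```python
import os
import tempfile

numbers = range(5)        # lazy sequence, like Python 2's xrange()
as_list = list(numbers)   # explicit conversion to a list

# open() is the way to get a file object; there is no file() constructor.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("hello")
with open(path, encoding="utf-8") as f:
    content = f.read()

print(as_list, content)
```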

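Higher-order functions map and reduce

MapReduce is a distributed programming model; Python offers the similarly named map() as a built-in and reduce() in the functools module (in Python 3, reduce() is no longer a built-in);

map(fn, lsd): applies the given function to each element of the sequence in turn and returns the results as a new iterator;

fn: the function;
lsd: the sequence;
reduce(fn, lsd): applies a function cumulatively to a sequence. The function must accept two arguments; reduce() combines the result so far with the next element of the sequence;

fn: a function;
lsd: a list (any iterable works);
The result is equivalent to f(f(f(a, b), c), d);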
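A small sketch of the two functions working together: map() converts digit strings to integers, and reduce() folds them into one number. The input list is made up for the example.

```python
from functools import reduce

digits = ["1", "3", "5", "7"]

# map() returns an iterator in Python 3, so wrap it in list() to see the values.
as_ints = list(map(int, digits))

# reduce() computes f(f(f(1, 3), 5), 7) with f(acc, d) = acc * 10 + d.
number = reduce(lambda acc, d: acc * 10 + d, as_ints)

print(as_ints, number)
```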

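Higher-order function filter

filter(fn, lsd): filters a sequence. It applies the given function to each element in turn and keeps the element only when the function returns a true value; the original sequence is not modified. In Python 3 the result is an iterator;

fn: the function to apply;
lsd: the sequence to filter;
For example, it can drop the odd numbers and keep the even ones;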
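The odd/even case mentioned above might look like this; is_even is a hypothetical helper name.

```python
def is_even(n):
    """Return True for elements that should be kept."""
    return n % 2 == 0


numbers = [1, 2, 3, 4, 5, 6]

# filter() is lazy in Python 3; list() materializes the kept elements.
evens = list(filter(is_even, numbers))

print(evens)    # the filtered copy
print(numbers)  # the original list is unchanged
```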

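Higher-order function sorted

This function sorts its input. Common sorting algorithms include bubble sort, selection sort, quicksort, insertion sort and counting sort;

Basic sorting:

The default order is ascending

list1 = [4, -7, 2, 6, -3]  # sample data, not from the original article
list2 = sorted(list1)
print(list2)

Sort by absolute value; key takes a function that implements a custom ordering rule
list3 = sorted(list1, key=abs)
print(list3)

Sort in descending order
list4 = sorted(list1, reverse=True)
print(list4)

Strings can be sorted in the same way
list6 = ['a123', 'b234', 'dsawsa', 'dasdw', 'fdeds']
list7 = sorted(list6)
print(list7)

Sorting by string length is also supported; a user-defined function can be passed in place of len
list8 = ['a123', 'b234', 'dsawsa', 'dasdw', 'fdeds']
list9 = sorted(list8, key=len)
print(list9)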
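To illustrate passing a user-defined function as key, here is a sketch with two made-up ordering rules: by last character, and by length with alphabetical order as a tie-breaker. The function name by_last_char is hypothetical.

```python
def by_last_char(s):
    """Hypothetical key function: order strings by their final character."""
    return s[-1]


words = ['a123', 'b234', 'dsawsa', 'dasdw', 'fdeds']

by_last = sorted(words, key=by_last_char)

# A tuple key sorts by length first, then alphabetically among equal lengths.
by_len_then_alpha = sorted(words, key=lambda s: (len(s), s))

print(by_last)
print(by_len_then_alpha)
```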

Unit testing
A unit test checks that a function, a class, or a module behaves correctly;
Unit test
Test file
Called automatically when the test finishes
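The original code for this section did not survive, so the following is only a sketch of what such a test could look like, assuming the standard unittest module. The function add and the class AddTest are hypothetical.

```python
import unittest


def add(a, b):
    """Hypothetical function under test."""
    return a + b


class AddTest(unittest.TestCase):
    def setUp(self):
        # Called automatically before each test method.
        self.calls = []

    def tearDown(self):
        # Called automatically after each test method finishes.
        self.calls.clear()

    def test_add_integers(self):
        self.assertEqual(add(2, 3), 5)

    def test_add_strings(self):
        self.assertEqual(add("a", "b"), "ab")


# Build and run the suite explicitly so the example also works outside a test runner.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(AddTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())
```

In a separate test file, the last three lines are usually replaced by unittest.main() or by running python -m unittest.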
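Regular expressions
The re module was added in Python 1.5 and gives Python full regular-expression support;
re.match() function:
def match(pattern, string, flags=0):
    """Try to apply the pattern at the start of the string, returning
    a match object, or None if no match was found."""
    return _compile(pattern, flags).match(string)

Tries to match a pattern at the start of the string; if the pattern does not match at the start, it returns None
pattern: the regular expression to match;
string: the string to match against;
flags: flags that control how the regular expression is matched,
re.I: ignore case;
re.L: locale-aware matching;
re.M: multi-line matching; affects ^ and $
re.S: make . match any character, including a newline;
re.U: interpret characters according to the Unicode character set; affects \w \W \b \B
re.X: verbose mode, which lets the pattern be written in a more flexible layout;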
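A brief sketch of how re.match() behaves with and without a flag; the sample string is made up.

```python
import re

m1 = re.match(r"python", "Python is fun")        # case differs, so no match
m2 = re.match(r"python", "Python is fun", re.I)  # re.I ignores case
m3 = re.match(r"fun", "Python is fun")           # "fun" is not at the start

print(m1)
print(m2.group(), m2.span())
print(m3)
```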
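re.search() function:
def search(pattern, string, flags=0):
    """Scan through string looking for a match to the pattern, returning
    a match object, or None if no match was found."""
    return _compile(pattern, flags).search(string)

The signature is the same as that of re.match() apart from the function name;

pattern: the regular expression to match;

string: the string to match against;

flags: flags that control how the regular expression is matched;

What it does: scans the whole string and returns the first successful match;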
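The difference from re.match() shows up when the match is not at the start of the string. A minimal sketch with a made-up input:

```python
import re

text = "order 66 shipped, order 67 pending"

first = re.search(r"\d+", text)    # scans forward and stops at the first match
at_start = re.match(r"\d+", text)  # only tries position 0

print(first.group(), first.span())
print(at_start)
```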

re.findall():

def findall(pattern, string, flags=0):
    """Return a list of all non-overlapping matches in the string.

    If one or more capturing groups are present in the pattern, return
    a list of groups; this will be a list of tuples if the pattern
    has more than one group.

    Empty matches are included in the result."""
    return _compile(pattern, flags).findall(string)

pattern: the regular expression to match;
string: the string to match against;
flags: flags that control how the regular expression is matched,
re.I: ignore case;
re.L: locale-aware matching;
re.M: multi-line matching; affects ^ and $
re.S: make . match any character, including a newline;
re.U: interpret characters according to the Unicode character set; affects \w \W \b \B
re.X: verbose mode, which lets the pattern be written in a more flexible layout;
Scans the whole string and returns a list of the results
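As the docstring notes, the shape of the result depends on how many capturing groups the pattern has. A small sketch with made-up data:

```python
import re

text = "a1 b22 c333"

no_group = re.findall(r"[a-z]\d+", text)        # whole matches
one_group = re.findall(r"[a-z](\d+)", text)     # only the group's text
two_groups = re.findall(r"([a-z])(\d+)", text)  # one tuple per match

print(no_group)
print(one_group)
print(two_groups)
```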
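About regular expressions
Matching single characters and digits:
.: matches any character except a newline;
[0123456789]: matches any single digit;
[like]: matches any single one of the characters l, i, k, e
[a-z]: matches any single character from a to z, that is, any lowercase letter; [A-Z] works the same way for uppercase letters;
[0-9a-zA-Z]: any digit or upper- or lowercase letter;
[0-9a-zA-Z_]: any digit, upper- or lowercase letter, or underscore;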

A few practice exercises:

QQ: usually a number of 6 to 10 digits
mail *******@163.com
Phone 010-85832376
user 6 to 12 characters,
IP
url
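One possible take on the first and third exercises is sketched below. The exact rules are assumptions: a QQ number is treated as 6 to 10 digits with no leading zero, and a phone number as a 3-digit area code, a hyphen, and 8 digits. The helper names is_qq and is_phone are hypothetical.

```python
import re

# Assumed rule: 6 to 10 digits, first digit not zero.
QQ_RE = re.compile(r"[1-9]\d{5,9}")
# Assumed rule: 3-digit area code, hyphen, 8-digit number.
PHONE_RE = re.compile(r"\d{3}-\d{8}")


def is_qq(s):
    # fullmatch() requires the whole string to match, not just a prefix.
    return QQ_RE.fullmatch(s) is not None


def is_phone(s):
    return PHONE_RE.fullmatch(s) is not None


print(is_qq("123456"), is_qq("12345"))
print(is_phone("010-85832376"), is_phone("010-8583237"))
```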

More on the re module
re.split(): splits a string
import re

str1 = "today is a good day"

print(str1.split(" "))
print(re.split(r" +", str1))  # split on runs of one or more spaces

re.finditer() function:
def finditer(pattern, string, flags=0):
    """Return an iterator over all non-overlapping matches in the
    string. For each match, the iterator returns a match object.

    Empty matches are included in the result."""
    return _compile(pattern, flags).finditer(string)

pattern: the regular expression to match;
string: the string to match against;
flags: flags that control how the regular expression is matched,
re.I: ignore case;
re.L: locale-aware matching;
re.M: multi-line matching; affects ^ and $
re.S: make . match any character, including a newline;
re.U: interpret characters according to the Unicode character set; affects \w \W \b \B
re.X: verbose mode, which lets the pattern be written in a more flexible layout;
This function is similar to findall(): it scans the whole string, but returns an iterator of match objects;
str3 = "today is a good day today is a nice day today is a great day"
itera = re.finditer(r"(today)", str3)
while True:
    try:
        l = next(itera)
        print(l)
    except StopIteration as e:
        break
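The same iteration is usually written with a for loop or a comprehension, which handles StopIteration implicitly. A sketch that collects the position of each match:

```python
import re

str3 = "today is a good day today is a nice day today is a great day"

# Each item produced by finditer() is a match object; span() gives (start, end).
spans = [m.span() for m in re.finditer(r"today", str3)]

print(spans)
```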

Replacing and modifying strings
Two functions are involved
re.sub()
def sub(pattern, repl, string, count=0, flags=0):
    """Return the string obtained by replacing the leftmost
    non-overlapping occurrences of the pattern in string by the
    replacement repl. repl can be either a string or a callable;
    if a string, backslash escapes in it are processed. If it is
    a callable, it's passed the match object and must return
    a replacement string to be used."""
    return _compile(pattern, flags).sub(repl, string, count)

and re.subn()
def subn(pattern, repl, string, count=0, flags=0):
    """Return a 2-tuple containing (new_string, number).
    new_string is the string obtained by replacing the leftmost
    non-overlapping occurrences of the pattern in the source
    string by the replacement repl. number is the number of
    substitutions that were made. repl can be either a string or a
    callable; if a string, backslash escapes in it are processed.
    If it is a callable, it's passed the match object and must
    return a replacement string to be used."""
    return _compile(pattern, flags).subn(repl, string, count)

Parameters and their meaning:
pattern: the regular expression
repl: the replacement text;
string: the string to process
count: the maximum number of replacements
flags: the same flags as described earlier;
What they do: find the parts of the target string that match the regular expression and replace them with the given string. The number of replacements can be limited; by default (count=0) every match is replaced;
The two functions differ in their return type;
# returns the string with the replacements applied
print(re.sub(r"(today)", "tomorrow", str3, count=2))
print(type(re.sub(r"(today)", "tomorrow", str3, count=2)))

# returns a tuple: the new string first, then the number of replacements made
print(re.subn(r"(today)", "tomorrow", str3, count=2))
print(type(re.subn(r"(today)", "tomorrow", str3, count=2)))
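The docstrings mention that repl may also be a callable. A minimal sketch of that form follows; the function double and the sample text are made up for illustration.

```python
import re


def double(match):
    # A repl callable receives the match object and returns the replacement text.
    return str(int(match.group()) * 2)


text = "3 apples and 10 pears"

doubled = re.sub(r"\d+", double, text)

# subn() also reports how many substitutions were made; count=1 limits it to one.
result, count = re.subn(r"\d+", double, text, count=1)

print(doubled)
print(result, count)
```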

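Groups
Concept: besides simply testing whether a string matches, a regular expression can also extract substrings. Parentheses () mark a group; whatever they enclose is one group;
group(0) is the entire matched text
print("------------------------")
str4 = "010-52347654"
m = re.match(r"(?P<first>\d{3})-(?P<last>\d{8})", str4)
print(m)

Access a group by its index; `group(0)` is the entire matched text

print(m.group(0))
print(m.group(1))
print(m.group(2))

Show all matched groups

print(m.groups())

Groups can also be given names and accessed by name
print(m.group("first"))
print(m.group("last"))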
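Named groups can also be retrieved all at once with groupdict(). A self-contained sketch, where the group names area and number are chosen only for this example:

```python
import re

m = re.match(r"(?P<area>\d{3})-(?P<number>\d{8})", "010-52347654")

whole = m.group(0)      # the entire match
by_index = m.groups()   # all groups as a tuple
by_name = m.groupdict() # named groups as a dict

print(whole)
print(by_index)
print(by_name)
```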
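Compiling
When a regular expression is used, the re module does two things
1. It first compiles the regular expression; if the expression itself is invalid, an error is raised
2. It then uses the compiled pattern object to do the matching
Compiling here usually means turning a frequently used regular expression into a pattern object so that it is convenient to reuse
Compilation is done with re.compile(pattern, flags)
pat = r"1(([3578]\d)|(47))\d{8}$"

re_telephone = re.compile(pat)
re_telephone.match("13700000000")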
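Both steps can be observed in a short sketch: a compiled pattern is reused for several inputs, and an invalid pattern is rejected at compile time. The malformed pattern below is made up to trigger the error.

```python
import re

re_telephone = re.compile(r"1(([3578]\d)|(47))\d{8}$")

# The compiled object is reused without recompiling the pattern.
valid = re_telephone.match("13700000000") is not None
too_short = re_telephone.match("1370000000") is not None

# An invalid pattern fails at compile time, before any matching happens.
try:
    re.compile(r"(\d{3}")  # unbalanced parenthesis
    invalid_rejected = False
except re.error:
    invalid_rejected = True

print(valid, too_short, invalid_rejected)
```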

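Author: bug–maker
Source: CSDN
Original: https://blog.csdn.net/qq_36294875/article/details/80790100
Copyright notice: this is an original article by the blogger; please include a link to the original post when reposting.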

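[^0-9]: matches any non-digit character;
\d: matches any digit, like [0-9]
\D: matches any non-digit character, like [^0-9];
\w: matches digits, letters and underscore, like [0-9a-zA-Z_];
\W: matches anything other than digits, letters and underscore, like [^0-9a-zA-Z_];
\s: matches any whitespace character (space, newline, form feed, tab, carriage return), like [ \f\n\r\t]
\S: matches any non-whitespace character, like [^ \f\n\r\t];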
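Boundary anchors
^: outside [], matches at the start of a line; used to test whether the text begins with a given character;
$: matches at the end of a line;
\A: matches the start of the string. Unlike ^, \A matches only the very beginning of the whole string; even in re.M mode it does not match the start of other lines;
\Z: matches the end of the string. Unlike $, \Z matches only the very end of the whole string; even in re.M mode it does not match the end of other lines;
\b: matches a word boundary, that is, the position between a word character and a non-word character such as a space;
\B: matches a position that is not a word boundary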
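The difference between the line anchors and the string anchors only shows up in re.M mode. A sketch with a made-up two-line string:

```python
import re

text = "one apple\none pear"

caret_hits = re.findall(r"^one", text, re.M)    # start of every line
a_hits = re.findall(r"\Aone", text, re.M)       # start of the string only
dollar_hits = re.findall(r"\w+$", text, re.M)   # last word of every line
z_hits = re.findall(r"\w+\Z", text, re.M)       # last word of the string only

# \b keeps "one" from matching inside "stone".
word_hits = re.findall(r"\bone\b", "one stone one")

print(caret_hits, a_hits)
print(dollar_hits, z_hits)
print(word_hits)
```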
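Matching multiple characters
(xyz): in what follows x, y and z stand for ordinary characters, not regular-expression metacharacters; (xyz) matches the xyz inside the parentheses as a single unit;
x?: matches zero or one x;
x*: matches zero or more x; .* matches any number of characters other than newlines;
x+: matches at least one x;
x{n}: matches exactly n x, where n is a non-negative integer;
x{n,}: matches at least n x
x{n,m}: matches at least n and at most m x
((t|T)oday): matches today starting with either a lowercase or an uppercase t
Matching is greedy by default;
*? +? ??: non-greedy (minimal) matching; a ? placed after a quantifier makes it match as little as possible;
print(re.findall(r"/\*.*?\*/", r"/* part1 */ /* part2 */"))
(?:x): like (xyz), but does not create a group;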
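Greedy versus non-greedy matching, and capturing versus non-capturing groups, can be compared side by side. The inputs in this sketch are made up.

```python
import re

source = "/* part1 */ /* part2 */"

greedy = re.findall(r"/\*.*\*/", source)  # .* runs to the last */
lazy = re.findall(r"/\*.*?\*/", source)   # .*? stops at the first */

# With a capturing group, findall() returns the group's text;
# with (?:...), it returns the whole match.
capturing = re.findall(r"(ab)+c", "ababc")
non_capturing = re.findall(r"(?:ab)+c", "ababc")

print(greedy)
print(lazy)
print(capturing, non_capturing)
```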
