day17 regular expressions and log

Today's content

re: regular expressions, used to extract content from a string
logging: logs

Review of yesterday

Serialization
1. json (important, commonly used)
  - dumps / loads: for network transmission
  - dump / load: for file storage
2. pickle (Python-specific)
  - dumps / loads: for network transmission
  - dump / load: for file storage
- json produces a string; pickle produces bytes
- Some special data types are converted during serialization
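The str-versus-bytes difference and the type conversion noted above can be seen in a small sketch. The sample dictionary is made up for illustration.

```python
import json
import pickle

# Hypothetical sample data; the tuple is there to show a type conversion
data = {'name': 'alex', 'scores': (90, 85)}

as_json = json.dumps(data)      # json.dumps returns a str
as_pickle = pickle.dumps(data)  # pickle.dumps returns bytes

print(type(as_json))     # <class 'str'>
print(type(as_pickle))   # <class 'bytes'>

# JSON has no tuple type, so the tuple comes back as a list
from_json = json.loads(as_json)
# pickle restores the original Python types
from_pickle = pickle.loads(as_pickle)

print(from_json)     # {'name': 'alex', 'scores': [90, 85]}
print(from_pickle)   # {'name': 'alex', 'scores': (90, 85)}
```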
hashlib
- Used for one-way encryption (hashing) and checksums
- md5, sha1, sha256, sha512
- String -> bytes -> ciphertext
- Irreversible
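```
import hashlib

# The salt is passed in when the md5 object is created
md5obj = hashlib.md5('salt to add'.encode('utf-8'))
# update() feeds in the content to be hashed
md5obj.update('content to encrypt'.encode('utf-8'))
print(md5obj.hexdigest())
```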
- With salt:
  - Fixed salt
  - Dynamic salt
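A minimal sketch of the two salting styles, assuming the dynamic salt is derived from the username (one common choice, not the only one). The function names and the salt value are hypothetical.

```python
import hashlib

FIXED_SALT = 'a-fixed-salt'  # hypothetical value, for illustration only


def md5_fixed_salt(password):
    """Every user shares the same salt."""
    md5obj = hashlib.md5(FIXED_SALT.encode('utf-8'))
    md5obj.update(password.encode('utf-8'))
    return md5obj.hexdigest()


def md5_dynamic_salt(username, password):
    """The salt differs per user; here it is simply the username."""
    md5obj = hashlib.md5(username.encode('utf-8'))
    md5obj.update(password.encode('utf-8'))
    return md5obj.hexdigest()


same_password = '123456'
fixed_a = md5_fixed_salt(same_password)
fixed_b = md5_fixed_salt(same_password)
dynamic_a = md5_dynamic_salt('alex', same_password)
dynamic_b = md5_dynamic_salt('wusir', same_password)

print(fixed_a == fixed_b)      # True: same password, same digest
print(dynamic_a == dynamic_b)  # False: same password, different digests
```

With a fixed salt, two users who pick the same password end up with the same digest; with a dynamic salt they do not.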
collections
1. Counter
2. Ordered dictionary: OrderedDict
3. Default dictionary: defaultdict
4. Double-ended queue (usable as a queue or a stack): deque
5. Named tuple: namedtuple

Software development specification

----blog_demo\
    |----bin\
    |    |----start.py
    |----conf\
    |    |----setting.py
    |----core\
    |    |----src.py
    |----db\
    |    |----userinfo
    |----file_tree.py
    |----lib\
    |    |----commom.py
    |----log\

Today's detailed content

`re`: regular expressions

The re module provides regular expressions, which are used to pull the content we want out of a string:

import re
a = 'alex,meet,eva_j'
print(re.findall('e', a))    # findall takes two arguments: the pattern to look for, then the string to search

Output: ['e', 'e', 'e', 'e']

Metacharacters

Regular expressions can of course do much more than look for one literal character. Their real power lies in a rich set of matching rules, built by combining metacharacters. These rules are largely the same across programming languages:

Metacharacter	What it matches
\w	A letter (including Chinese characters), digit, or underscore
\W	Any character that is not a letter (including Chinese characters), digit, or underscore
\s	Any whitespace character
\S	Any non-whitespace character
\d	Any digit
\D	Any non-digit character
\A or ^	The start of the string
\Z or $	The end of the string
.	Any character except a newline; with the re.DOTALL flag it matches newlines as well
[0-9a-zA-Z]	Any one character in the given ranges of digits and letters; no spaces or commas are needed between the ranges
*	Zero or more of the preceding character; greedy (matches as much as possible)
+	One or more of the preceding character; greedy
?	Zero or one of the preceding character; placed after * or +, it switches them to non-greedy matching
{n}	Exactly n of the preceding character
{n,m}	Between n and m of the preceding character; greedy
a|b	a or b; a is tried first
()	Matches the expression inside the parentheses and marks it as a group
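The difference between greedy and non-greedy quantifiers is easiest to see side by side. The following is a small sketch; the sample strings are made up for illustration.

```python
import re

html = '<b>bold</b> and <i>italic</i>'  # hypothetical sample text

# Greedy: .+ grabs as much as possible, so one match spans the whole string
greedy = re.findall(r'<.+>', html)
# Non-greedy: .+? stops at the first '>' that lets the pattern succeed
lazy = re.findall(r'<.+?>', html)
# A bare ? makes the preceding character optional
optional = re.findall(r'colou?r', 'color colour')

print(greedy)    # ['<b>bold</b> and <i>italic</i>']
print(lazy)      # ['<b>', '</b>', '<i>', '</i>']
print(optional)  # ['color', 'colour']
```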

The metacharacters are used like this:

s = 'alex1, 哈啰，123,meet!@'
print(re.findall(r'\w', s))    # letters (including Chinese characters), digits, and underscores
print(re.findall(r'\W', s))    # everything else
Output:
['a', 'l', 'e', 'x', '1', '哈', '啰', '1', '2', '3', 'm', 'e', 'e', 't']
[',', ' ', '，', ',', '!', '@']

s1 = 'alex w \n, dal\t,sdjfl'
print(re.findall(r'\s', s1))    # whitespace characters
print(re.findall(r'\S', s1))    # non-whitespace characters
Output:
[' ', ' ', '\n', ' ', '\t']
['a', 'l', 'e', 'x', 'w', ',', 'd', 'a', 'l', ',', 's', 'd', 'j', 'f', 'l']

s2 = 'a1b2c3'
print(re.findall(r'\d', s2))    # all digits
print(re.findall(r'\D', s2))    # all non-digits
Output:
['1', '2', '3']
['a', 'b', 'c']

s3 = 'alex'
print(re.findall(r'\Aa', s3))    # match at the start of the string
print(re.findall('^a', s3))
print(re.findall(r'x\Z', s3))    # match at the end of the string
print(re.findall('x$', s3))
Output:
['a']
['a']
['x']
['x']

s4 = 'abc a12\n3!@\t#中文_'
print(re.findall('.', s4))    # any character except a newline
print(re.findall('\n', s4))
print(re.findall('.', s4, re.DOTALL))    # any character, newlines included
Output:
['a', 'b', 'c', ' ', 'a', '1', '2', '3', '!', '@', '\t', '#', '中', '文', '_']
['\n']
['a', 'b', 'c', ' ', 'a', '1', '2', '\n', '3', '!', '@', '\t', '#', '中', '文', '_']

s5 = 'al ex()123!@'
print(re.findall('[0-9]', s5))    # digits in the range
print(re.findall('[0-9a-zA-Z]', s5))    # digits and letters in the ranges
Output:
['1', '2', '3']
['a', 'l', 'e', 'x', '1', '2', '3']

s6 = 'a.l e!x123'
print(re.findall('[.]', s6))    # the literal character "."
print(re.findall('.', s6))    # any character
print(re.findall('[^.]', s6))    # every character except "."
Output:
['.']
['a', '.', 'l', ' ', 'e', '!', 'x', '1', '2', '3']
['a', 'l', ' ', 'e', '!', 'x', '1', '2', '3']

s7 = 'alex-meet-leeek'
print(re.findall('e*', s7))    # zero or more e, greedy
print(re.findall('e+', s7))    # one or more e, greedy
print(re.findall('e?', s7))    # zero or one e (takes the e whenever one is present)
print(re.findall('e{2}', s7))   # exactly two e
print(re.findall('e{1,2}', s7))    # one to two e, greedy; no spaces are allowed inside the braces
Output:
['', '', 'e', '', '', '', 'ee', '', '', '', 'eee', '', '']
['e', 'ee', 'eee']
['', '', 'e', '', '', '', 'e', 'e', '', '', '', 'e', 'e', 'e', '', '']
['ee', 'ee']
['e', 'ee', 'ee', 'e']

s8 = 'alex-meet-leeek'
print(re.findall('a|e', s8))   # a or e; a is tried first
Output: ['a', 'e', 'e', 'e', 'e', 'e', 'e']

s9 = 'alex-meeet-leeek'
print(re.findall('e(e)e', s9))    # grouping: findall returns only what is inside the parentheses
print(re.search('e(e)e', s9).group())    # search(...).group() returns the whole match
print(re.findall('m(ee)e', s9))
print(re.findall('m(?:ee)e', s9))    # non-capturing group (?:...): findall returns the whole match
Output:
['e', 'e']
eee
['ee']
['meee']
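How findall treats groups depends on how many capturing groups the pattern has. A short sketch on a made-up string:

```python
import re

s = 'alex:18, meet:20'  # hypothetical sample

no_group = re.findall(r'\w+:\d+', s)          # no group: whole matches
one_group = re.findall(r'(\w+):\d+', s)       # one group: only the group
two_groups = re.findall(r'(\w+):(\d+)', s)    # several groups: a tuple per match
non_capturing = re.findall(r'(?:\w+):(?:\d+)', s)  # (?:...) does not capture

print(no_group)       # ['alex:18', 'meet:20']
print(one_group)      # ['alex', 'meet']
print(two_groups)     # [('alex', '18'), ('meet', '20')]
print(non_capturing)  # ['alex:18', 'meet:20']
```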

Next, a few exercises give a deeper understanding of how regular expressions are used.

Given the string 'alex_sb ale123_sb wu12sir_sb wusir_sb ritian_sb 的 alex wusir ', find every item that ends with _sb. The output must include the _sb suffix.

import re
s = 'alex_sb ale123_sb wu12sir_sb wusir_sb ritian_sb 的 alex wusir '
print(re.findall(r'\w(?:.+?)_sb', s))    # the second ? makes the + non-greedy; the first belongs to (?:...)
print(re.findall(r'\w[a-z0-9]+_sb', s))
print(re.findall(r'(\w.+?_sb) ', s))

All three print:
['alex_sb', 'ale123_sb', 'wu12sir_sb', 'wusir_sb', 'ritian_sb']

Given the string "1-2*(60+(-40.35/5)-(-4*3))":

Find all integers in the string
Find all numbers in the string (including decimals)
Find all numbers (positive, negative, and decimal)

import re
s = "1-2*(60+(-40.35/5)-(-4*3))"
print(re.findall(r'\d+', s))
print(re.findall(r'\d+\.?\d*', s))
print(re.findall(r'-?\d+\.?\d*', s))

Output:
['1', '2', '60', '40', '35', '5', '4', '3']
['1', '2', '60', '40.35', '5', '4', '3']
['1', '-2', '60', '-40.35', '5', '-4', '3']
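In the last result, '-2' comes from "1-2", where the minus sign is really a subtraction. One possible refinement, sketched here under the assumption that a '-' is a sign only when it does not follow a digit or a closing parenthesis, uses a negative lookbehind. The name signed_number is hypothetical.

```python
import re

s = "1-2*(60+(-40.35/5)-(-4*3))"

# (?<![\d)]) is a negative lookbehind: the '-' is accepted as a sign
# only if the character before it is neither a digit nor ')'
signed_number = re.compile(r'(?:(?<![\d)])-)?\d+(?:\.\d+)?')

numbers = signed_number.findall(s)
print(numbers)
# ['1', '2', '60', '-40.35', '5', '-4', '3']
```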

Find the email address in the string "http://blog.csdn.net/[email protected]/article/details/51656638".

import re
s = "http://blog.csdn.net/[email protected]/article/details/51656638"
print(re.findall(r'\w+@\w+\.\w+', s))

Output: ['[email protected]']

Find the following in the string below:

All times of the form 1995-04-27
Times of the form 1980-04-27:1980-04-27

import re
s = '''
The time is 1995-04-27,2005-04-27
1999-04-27 founder of Oldboy Education
Oldboy teacher alex 1980-04-27:1980-04-27
2018-12-08
'''
print(re.findall(r'\d{4}-\d{2}-\d{2}', s))
print(re.findall(r'(\S+:\S+)', s))

Output:
['1995-04-27', '2005-04-27', '1999-04-27', '1980-04-27', '1980-04-27', '2018-12-08']
['1980-04-27:1980-04-27']

Match the floating-point numbers in the string "1231,11,2,1,-1,12.34545,abc,ssed".

import re
s = "1231,11,2,1,-1,12.34545,abc,ssed"
print(re.findall(r'\d+\.\d+', s))

Output:
['12.34545']

Match the possible QQ numbers (5 to 11 digits) in the string "1231231,324233,123,1123,2435,1234,2546,23451324,3546354,13241234".

import re
s = "1231231,324233,123,1123,2435,1234,2546,23451324,3546354,13241234"
print(re.findall(r'\d{5,11}', s))

Output:
['1231231', '324233', '23451324', '3546354', '13241234']
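The pattern \d{5,11} works on this input, but it would also accept the first 11 digits of a longer run. A stricter sketch adds word boundaries and, as an assumption not stated in the exercise, requires a non-zero first digit. The sample string is made up.

```python
import re

# Hypothetical sample: includes a 12-digit run and a number starting with 0
s = "1231231,324233,123,123456789012,01234"

loose = re.findall(r'\d{5,11}', s)
# \b anchors both ends at a word boundary, so a partial run cannot match
strict = re.findall(r'\b[1-9]\d{4,10}\b', s)

print(loose)   # ['1231231', '324233', '12345678901', '01234']
print(strict)  # ['1231231', '324233']
```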

Given the long block of data below, extract the URL in the href attribute of each a tag (for example http://www.cnblogs.com/guobaoyuan/articles/7087629.html).

import re
msg = '''
<div id="cnblogs_post_body" class="blogpost-body"><h3><span style="font-family: 楷体;">python基础篇</span></h3>
<p><span style="font-family: 楷体;">&nbsp; &nbsp;<strong><a href="http://www.cnblogs.com/guobaoyuan/p/6847032.html" target="_blank">python 基础知识</a></strong></span></p>
<p><span style="font-family: 楷体;"><strong>&nbsp; &nbsp;<a href="http://www.cnblogs.com/guobaoyuan/p/6627631.html" target="_blank">python 初始python</a></strong></span></p>
<p><span style="font-family: 楷体;"><strong>&nbsp; &nbsp;<strong><a href="http://www.cnblogs.com/guobaoyuan/articles/7087609.html" target="_blank">python 字符编码</a></strong></strong></span></p>
<p><span style="font-family: 楷体;"><strong><strong>&nbsp; &nbsp;<a href="http://www.cnblogs.com/guobaoyuan/articles/6752157.html" target="_blank">python 类型及变量</a></strong></strong></span></p>
<p><span style="font-family: 楷体;"><strong>&nbsp; &nbsp;<a href="http://www.cnblogs.com/guobaoyuan/p/6847663.html" target="_blank">python 字符串详解</a></strong></span></p>
<p><span style="font-family: 楷体;">&nbsp; &nbsp;<strong><a href="http://www.cnblogs.com/guobaoyuan/p/6850347.html" target="_blank">python 列表详解</a></strong></span></p>
<p><span style="font-family: 楷体;"><strong>&nbsp; &nbsp;<a href="http://www.cnblogs.com/guobaoyuan/p/6850496.html" target="_blank">python 数字元祖</a></strong></span></p>
<p><span style="font-family: 楷体;">&nbsp; &nbsp;<strong><a href="http://www.cnblogs.com/guobaoyuan/p/6851820.html" target="_blank">python 字典详解</a></strong></span></p>
<p><span style="font-family: 楷体;"><strong>&nbsp; &nbsp;<strong> <a href="http://www.cnblogs.com/guobaoyuan/p/6852131.html" target="_blank">python 集合详解</a></strong></strong></span></p>
<p><span style="font-family: 楷体;"><strong>&nbsp; &nbsp;<a href="http://www.cnblogs.com/guobaoyuan/articles/7087614.html" target="_blank">python 数据类型</a>&nbsp;</strong></span></p>
<p><span style="font-family: 楷体;"><strong>&nbsp; &nbsp;<a href="http://www.cnblogs.com/guobaoyuan/p/6752169.html" target="_blank">python文件操作</a></strong></span></p>
<p><span style="font-family: 楷体;"><strong>&nbsp; &nbsp;<a href="http://www.cnblogs.com/guobaoyuan/p/8149209.html" target="_blank">python 闭包</a></strong></span></p>
<p><span style="font-family: 楷体;"><strong>&nbsp; &nbsp;<a href="http://www.cnblogs.com/guobaoyuan/articles/6705714.html" target="_blank">python 函数详解</a></strong></span></p>
<p><span style="font-family: 楷体;"><strong>&nbsp; &nbsp;<a href="http://www.cnblogs.com/guobaoyuan/articles/7087616.html" target="_blank">python 函数、装饰器、内置函数</a></strong></span></p>
<p><span style="font-family: 楷体;"><strong>&nbsp; &nbsp;<a href="http://www.cnblogs.com/guobaoyuan/articles/7087629.html" target="_blank">python 迭代器 生成器</a>&nbsp;&nbsp;</strong></span></p>
<p><span style="font-family: 楷体;"><strong>&nbsp; &nbsp;<a href="http://www.cnblogs.com/guobaoyuan/articles/6757215.html" target="_blank">python匿名函数、内置函数</a></strong></span></p>
</div>
'''
print(re.findall('href="(.+?)"', msg))

Output:
['http://www.cnblogs.com/guobaoyuan/p/6847032.html', 'http://www.cnblogs.com/guobaoyuan/p/6627631.html', 'http://www.cnblogs.com/guobaoyuan/articles/7087609.html', 'http://www.cnblogs.com/guobaoyuan/articles/6752157.html', 'http://www.cnblogs.com/guobaoyuan/p/6847663.html', 'http://www.cnblogs.com/guobaoyuan/p/6850347.html', 'http://www.cnblogs.com/guobaoyuan/p/6850496.html', 'http://www.cnblogs.com/guobaoyuan/p/6851820.html', 'http://www.cnblogs.com/guobaoyuan/p/6852131.html', 'http://www.cnblogs.com/guobaoyuan/articles/7087614.html', 'http://www.cnblogs.com/guobaoyuan/p/6752169.html', 'http://www.cnblogs.com/guobaoyuan/p/8149209.html', 'http://www.cnblogs.com/guobaoyuan/articles/6705714.html', 'http://www.cnblogs.com/guobaoyuan/articles/7087616.html', 'http://www.cnblogs.com/guobaoyuan/articles/7087629.html', 'http://www.cnblogs.com/guobaoyuan/articles/6757215.html']
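The regular expression works for this input because every link is written as href="..." with double quotes. As a point of comparison only (this swaps the regex for a different technique), here is a sketch using the standard library's html.parser. The class name HrefCollector and the shortened sample are hypothetical; the two URLs are taken from the data above.

```python
import re
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collects the href attribute of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            for name, value in attrs:
                if name == 'href' and value is not None:
                    self.hrefs.append(value)


# Shortened, made-up sample: single quotes and a data-href attribute
sample = '''
<p><a href="http://www.cnblogs.com/guobaoyuan/p/6847032.html" target="_blank">basics</a></p>
<p><a target="_blank" href='http://www.cnblogs.com/guobaoyuan/articles/7087629.html'>iterators</a></p>
<p><img src="x.png" data-href="not-a-link"></p>
'''

collector = HrefCollector()
collector.feed(sample)
by_regex = re.findall('href="(.+?)"', sample)

print(collector.hrefs)  # both URLs, nothing else
print(by_regex)         # misses the single-quoted link, picks up data-href
```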

Common regular expression methods

Besides findall, the commonly used methods include search and match:

import re
s = 'alexmeet'
print(re.findall('e', s))
print(re.search('e', s))    # searches from any position and stops at the first match; returns a match object, or None if nothing is found
print(re.search('e', s).group())    # group() returns the matched text from the match object
print(re.match('e', s))    # matches only at the start of the string; returns a match object, or None if there is no match there
print(re.match('a', s).group())

Output (Python 3.7 and later print the match object as <re.Match object; ...>):
['e', 'e', 'e']
<_sre.SRE_Match object; span=(2, 3), match='e'>
e
None
a

Comparing search and match:

search looks for the pattern at any position in the string and stops at the first match
match only tries to match at the start of the string and returns None if that fails
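The comparison can be sketched as follows. re.fullmatch, which requires the whole string to match, is included for contrast, and first_match is a hypothetical helper that guards against the None case before calling group().

```python
import re

s = 'alexmeet'

found_anywhere = re.search('meet', s)   # 'meet' starts at index 4
found_at_start = re.match('meet', s)    # the string does not start with 'meet'
whole_string = re.fullmatch('alex', s)  # 'alex' is only part of the string

print(found_anywhere.span())   # (4, 8)
print(found_at_start)          # None
print(whole_string)            # None


def first_match(pattern, text, default=None):
    """Return the first matched text, or default when nothing matches."""
    m = re.search(pattern, text)
    return m.group() if m else default


print(first_match('e+', s))    # e
print(first_match('z', s))     # None
```

Calling .group() directly on the result of search or match raises AttributeError when the result is None, which is why the helper checks first.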

Besides the methods above, re has some less common but useful ones:

import re
s1 = 'alex wusir,日天，太白;女神;肖锋：吴超'
print(re.split('[ :;,：；，]', s1))    # split: split on any of the listed separators
Output:
['alex', 'wusir', '日天', '太白', '女神', '肖锋', '吴超']

s2 = 'barry is the best lecturer, barry is just an ordinary teacher, please do not treat barry as an idol.'
print(re.sub('barry', 'meet', s2))    # sub: replace
Output:
meet is the best lecturer, meet is just an ordinary teacher, please do not treat meet as an idol.
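re.sub also accepts a count and a function as the replacement, and re.subn reports how many substitutions were made. A brief sketch on a made-up sentence (add_ten is a hypothetical helper):

```python
import re

text = 'barry scored 59, meet scored 98, barry scored 60'  # hypothetical sample

# count limits how many replacements are made
first_only = re.sub('barry', 'alex', text, count=1)


def add_ten(match):
    """Replacement function: receives each match object, returns the new text."""
    return str(int(match.group()) + 10)


bumped = re.sub(r'\d+', add_ten, text)

# subn returns a (new_string, number_of_substitutions) tuple
replaced, how_many = re.subn('barry', 'alex', text)

print(first_only)  # alex scored 59, meet scored 98, barry scored 60
print(bumped)      # barry scored 69, meet scored 108, barry scored 70
print(how_many)    # 2
```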

obj = re.compile(r'\d{2}')    # compile: define a pattern once so it can be reused
print(obj.findall('alex12345'))
Output:
['12', '34']

it = re.finditer('e', 'meet,alex')    # finditer: returns an iterator of match objects
print(it)
print(next(it))
print(next(it).group())
Output:
<callable_iterator object at 0x0000021E54F2AC50>
<_sre.SRE_Match object; span=(1, 2), match='e'>
e

s = '<h1>hello</h1>'
ret = re.search(r'<(?P<h>\w+)>(?P<h1>\w+)<(?P<h2>/\w+)>', s)    # (?P<name>...) gives a group a name
print(ret.group('h'))
print(ret.group('h1'))
print(ret.group('h2'))
Output:
h1
hello
/h1
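Named groups can also be read back all at once with groupdict(), and referred to later in the same pattern with (?P=name). A sketch, with group names tag and body chosen for illustration:

```python
import re

# (?P=tag) must repeat exactly what the group named "tag" matched
pattern = re.compile(r'<(?P<tag>\w+)>(?P<body>.*?)</(?P=tag)>')

ok = pattern.search('<h1>hello</h1>')
mismatched = pattern.search('<h1>hello</h2>')  # closing tag differs

print(ok.groupdict())   # {'tag': 'h1', 'body': 'hello'}
print(mismatched)       # None
```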

Interview questions:

What are greedy matching and non-greedy matching?
What is the difference between search and match?

`logging`: logs

The logging module records logs. Its main uses are:

Recording the running state of a program (time, file name, line number of an error, error message)
Recording user preferences (by analyzing what users do)
Banking (account transaction records)

Logs are divided into five levels:

No.	Level name	Meaning	Numeric value
1	debug	debugging	10
2	info	information	20
3	warning	warning	30
4	error	error	40
5	critical	critical (severe) error	50

By default, logging only reports events at level 30 (warning) or higher.
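The effect of the level threshold can be checked with a small sketch. ListHandler is a hypothetical handler written for this example; it stores messages in a list instead of printing them, and the logger name level_demo is made up.

```python
import logging


class ListHandler(logging.Handler):
    """Hypothetical handler that keeps (level, message) pairs in a list."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append((record.levelno, record.getMessage()))


demo = logging.getLogger('level_demo')
demo.propagate = False           # keep this demo away from the root logger
demo.setLevel(logging.WARNING)   # 30, the level the root logger starts with
collector = ListHandler()
demo.addHandler(collector)

demo.debug('debug message')        # 10: dropped
demo.info('info message')          # 20: dropped
demo.warning('warning message')    # 30: kept
demo.error('error message')        # 40: kept
demo.critical('critical message')  # 50: kept

print(collector.messages)
# [(30, 'warning message'), (40, 'error message'), (50, 'critical message')]
```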

Basic log

The basic version of the log is built into Python and can be called directly:

import logging

logging.basicConfig(
    level=10,
    format="%(asctime)s %(name)s %(filename)s %(lineno)s %(message)s",
    datefmt="%Y-%m-%d %H:%M:%S",
    filename='test.log',
    filemode='a', )
logging.debug('这是调试')       # "this is debug"
logging.info('这是信息')        # "this is info"
logging.warning('这是警告')     # "this is a warning"
logging.error('这是错误')       # "this is an error"
logging.critical('这是危险')    # "this is critical"

Nothing is printed to the console; every message is written to test.log. Because the file is written with the system default encoding (gbk in this environment), the Chinese text looks garbled when the file is opened as UTF-8. You can read the file with a different encoding, but basicConfig itself offers no way to specify the encoding here (an encoding argument was only added in Python 3.9).


With logging in place, we can report problems to the user through the log instead of letting the program crash:

import logging

logging.basicConfig(
    level=10,
    format="%(asctime)s %(name)s %(filename)s %(lineno)s %(message)s",
    datefmt="%Y-%m-%d %H:%M:%S",)
num = input('Please enter a number: ')
try:
    num = int(num)
    print(num)
except ValueError:
    logging.warning('The string cannot be converted to a number!')

Output:
Please enter a number: a
2019-09-28 22:30:56 root exercise.py 165 The string cannot be converted to a number!

The basic log is very convenient to call, but used as shown above it has two drawbacks:

The file is written with the default encoding (gbk here), which cannot be changed before Python 3.9
Output goes either to the screen or to a file, not to both at once

Because of these two shortcomings, the advanced version of the log is used more often.
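On newer Python versions, basicConfig can work around both drawbacks through its handlers argument. The sketch below assumes Python 3.8 or later (for force=True) and writes to a temporary directory chosen only for illustration.

```python
import logging
import os
import tempfile

# Hypothetical location for the log file
log_path = os.path.join(tempfile.mkdtemp(), 'test.log')

# The file handler carries its own encoding; the stream handler prints to the screen
file_handler = logging.FileHandler(log_path, mode='a', encoding='utf-8')
screen_handler = logging.StreamHandler()

logging.basicConfig(
    level=logging.DEBUG,
    format="%(asctime)s %(name)s %(filename)s %(lineno)s %(message)s",
    datefmt="%Y-%m-%d %H:%M:%S",
    handlers=[file_handler, screen_handler],  # cannot be combined with filename=
    force=True,  # Python 3.8+: replace handlers set up by an earlier basicConfig
)
logging.warning('这是警告')

# Read the file back as UTF-8 to confirm the Chinese text survived
with open(log_path, encoding='utf-8') as f:
    content = f.read()
print('这是警告' in content)  # True
```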

Advanced log

The advanced version of the log is one we assemble ourselves from the building blocks that logging provides. The basic usage is:

import logging
logger = logging.getLogger()    # create a logger (an empty frame to attach handlers to)
fh = logging.FileHandler('test.log', mode='a', encoding='utf-8')    # file handler: writes log records to a file
ch = logging.StreamHandler()    # stream handler: prints log records to the screen
f_str = logging.Formatter('%(asctime)s %(name)s %(filename)s %(lineno)s %(message)s')    # the format of each log record
logger.setLevel(10)    # set the logging level (10 = debug)
fh.setFormatter(f_str)    # apply the format to the file handler
ch.setFormatter(f_str)    # apply the format to the console handler
logger.addHandler(fh)    # attach the file handler to the logger
logger.addHandler(ch)    # attach the console handler to the logger

logger.debug("这是调试")       # "this is debug"
logger.info("这是信息")        # "this is info"
logger.warning("这是警告")     # "this is a warning"
logger.error("这是错误")       # "this is an error"
logger.critical("这是危险")    # "this is critical"

The log messages now appear both on the screen and in the file, and the Chinese text displays correctly without any encoding conversion.
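Because each handler has its own level, the same logger can send everything to the file and only errors to the screen. The following is a sketch under assumptions: build_logger, the logger name blog_demo, and the temporary log path are all hypothetical.

```python
import logging
import os
import tempfile


def build_logger(name, log_path):
    """Hypothetical helper: the file receives every record, the screen only errors."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    if logger.handlers:  # already configured: do not attach handlers twice
        return logger

    formatter = logging.Formatter('%(asctime)s %(name)s %(levelname)s %(message)s')

    fh = logging.FileHandler(log_path, mode='a', encoding='utf-8')
    fh.setLevel(logging.DEBUG)   # file: debug and above
    fh.setFormatter(formatter)

    ch = logging.StreamHandler()
    ch.setLevel(logging.ERROR)   # screen: error and above
    ch.setFormatter(formatter)

    logger.addHandler(fh)
    logger.addHandler(ch)
    return logger


log_path = os.path.join(tempfile.mkdtemp(), 'blog.log')
logger = build_logger('blog_demo', log_path)
logger.debug('这是调试')   # written to the file only
logger.error('这是错误')   # written to the file and shown on the screen

with open(log_path, encoding='utf-8') as f:
    lines = f.read().splitlines()
print(len(lines))  # 2
```

The handlers check matters because calling getLogger with the same name returns the same object; without the check, every call to the helper would add another pair of handlers and duplicate each message.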
