Chapter 6: Modules

6.1 Module definitions

  1. Module
    • A .py file written by a programmer that provides a specific piece of functionality
  2. Package
    • A folder that holds multiple .py files
    • Importing a package does not import the modules inside it by default
    • Importing a package executes the contents of its __init__.py file
  3. Differences between Python 2 and Python 3
    • Python 2: the folder must contain an __init__.py file
    • Python 3: the __init__.py file is not required
    • Recommendation: always add this file, whether the code targets Python 2 or Python 3

6.2 Module categories (libraries)

6.2.1 Built-in modules

  • Functionality provided by Python itself
  • Can be used directly after importing the module

6.2.1.1 random

  • Random number module
  1. randint: get a random integer
import random    # import the module
v = random.randint(1, 10)    # a random integer N with 1 <= N <= 10 (both ends included)

# Example: generate a random verification code
import random
def get_random_code(length=6):
    data = []
    for i in range(length):
        v = random.randint(65,90)    # 65-90 are the code points of 'A'-'Z'
        data.append(chr(v))
    return ''.join(data)

code = get_random_code()
print(code)
  2. uniform: generate a random float
  3. choice: pick one item at random
    • Application: verification codes, lottery draws
  4. sample: pick several distinct items at random
    • Application: drawing more than one prize winner
  5. shuffle: shuffle a list in place
    • Application: shuffling algorithms
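
The notes give code only for randint. A minimal sketch of the other four functions, assuming a hypothetical list of names and a fixed seed (both chosen only for illustration):

```python
import random

random.seed(42)  # fixed seed, an assumption of this sketch, so repeated runs behave the same

names = ['alice', 'bob', 'carol', 'dave', 'eve']  # hypothetical data

price = random.uniform(1, 5)        # random float between 1 and 5
winner = random.choice(names)       # pick one item
winners = random.sample(names, 3)   # pick 3 distinct items
deck = list(range(10))
random.shuffle(deck)                # shuffles the list in place and returns None

print(price, winner, winners, deck)
```

Note that shuffle changes the list it is given, while choice and sample leave the original sequence untouched.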

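6.2.1.2 hashlib

  • Digest (hash) algorithm module
    • Storing and verifying passwords as digests
    • Checking file consistency
  1. md5
# Compute the MD5 digest of a given string (a one-way hash, not reversible encryption)
import hashlib     # import the module
def get_md5(data):          # MD5 digest function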
    obj = hashlib.md5()
    obj.update(data.encode('utf-8'))
    result = obj.hexdigest()
    return result
val = get_md5('123')
print(val)

# Salting: mix a fixed secret string into the digest
import hashlib
def get_md5(data):
    obj = hashlib.md5("sidrsdxff123ad".encode('utf-8'))            # the salt
    obj.update(data.encode('utf-8'))
    result = obj.hexdigest()
    return result
val = get_md5('123')
print(val)
# Application: user registration + user login
import hashlib
USER_LIST = []
def get_md5(data):                   # MD5 digest function
    obj = hashlib.md5("12:;idrsicxwersdfsaersdfs123ad".encode('utf-8'))         # the salt
    obj.update(data.encode('utf-8'))
    result = obj.hexdigest()
    return result

def register():                      # user registration function
    print('**************User registration**************')
    while True:
        user = input('Enter a username (N to stop): ')
        if user == 'N':
            return
        pwd = input('Enter a password: ')
        temp = {'username':user,'password':get_md5(pwd)}
        USER_LIST.append(temp)

def login():                          # user login function
    print('**************User login**************')
    user = input('Enter your username: ')
    pwd = input('Enter your password: ')
    for item in USER_LIST:
        if item['username'] == user and item['password'] == get_md5(pwd):
            return True

register()
result = login()
if result:
    print('Login succeeded')
else:
    print('Login failed')
  2. sha

    import hashlib
    sha = hashlib.sha1('salt'.encode())    # SHA-1 digest object, seeded with a salt
    sha.update(b'str')
    print(sha.hexdigest())
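
The notes list file consistency checking as a use of digests but give no example. A minimal sketch, assuming a hypothetical helper named get_file_digest and two temporary files created only for the demonstration:

```python
import hashlib
import os
import tempfile

def get_file_digest(path, algorithm='sha256', chunk_size=8192):
    """Return the hex digest of a file, reading it in chunks (hypothetical helper)."""
    obj = hashlib.new(algorithm)
    with open(path, mode='rb') as f:
        # iter() with a sentinel keeps calling f.read until it returns b''
        for chunk in iter(lambda: f.read(chunk_size), b''):
            obj.update(chunk)
    return obj.hexdigest()

# Demo: two temporary files holding the same bytes produce the same digest.
with tempfile.TemporaryDirectory() as folder:
    first = os.path.join(folder, 'a.bin')
    second = os.path.join(folder, 'b.bin')
    for p in (first, second):
        with open(p, mode='wb') as f:
            f.write(b'hello' * 1000)
    same = get_file_digest(first) == get_file_digest(second)

print(same)
```

Reading in chunks keeps memory use constant, so the same function works for large files such as videos.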

6.2.1.3 getpass

  • Only works when run in a terminal
  1. getpass.getpass: the password is not echoed while it is typed
import getpass        # import the module
pwd = getpass.getpass('Enter your password: ')
if pwd == '123':
    print('Password correct')

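6.2.1.4 time [commonly used]

  • Time module
  1. time.time: timestamp (the number of seconds elapsed since 1970-01-01 00:00:00 UTC)

    # Timestamps often appear in URLs, e.g. the trailing _=1555559189206 (milliseconds) in:
    # https://login.wx.qq.com/cgi-bin/mmwebwx-bin/login?loginicon=true&uuid=4ZwIFHM6iw==&tip=1&r=-781028520&_=1555559189206
  2. time.sleep: pause for the given number of seconds

  3. time.timezone: offset of the local (non-DST) time zone from UTC, in seconds

  • Examples

    # Measure how long a function takes to run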
    import time
    def wrapper(func):
        def inner():
            start_time = time.time()
            v = func()
            end_time = time.time()
            print(end_time-start_time)
            return v
        return inner
    @wrapper
    def func1():
        time.sleep(2)
        print(123)   
    func1()

6.2.1.5 datetime

  • Date and time module
  1. datetime.now(): the current local time

  2. datetime.utcnow(): the current UTC time

    from datetime import datetime,timezone,timedelta
    # Get the time as a datetime object
    # Current local time
    v1 = datetime.now()
    
    # Current time in the UTC+7 time zone
    tz = timezone(timedelta(hours=7))
    v2 = datetime.now(tz)
    
    # Current UTC time (utcnow() is deprecated since Python 3.12; datetime.now(timezone.utc) is the timezone-aware alternative)
    v3 = datetime.utcnow()
    print(v3)
  • Conversion

    import time
    from datetime import datetime,timedelta
    # 1. Converting between datetime and str
    # datetime -> str: strftime
    v1 = datetime.now()
    val = v1.strftime("%Y-%m-%d %H:%M:%S")
    # str -> datetime: strptime
    v1 = datetime.strptime('2011-11-11','%Y-%m-%d')
    
    # 2. Adding to and subtracting from a datetime
    v1 = datetime.strptime('2011-11-11','%Y-%m-%d')
    v2 = v1 - timedelta(days=140)
    # Convert back to a string
    date = v2.strftime('%Y-%m-%d')
    
    # 3. Converting between timestamps and datetime
    # timestamp -> datetime: fromtimestamp
    ctime = time.time()
    v1 = datetime.fromtimestamp(ctime)
    # datetime -> timestamp: timestamp
    v1 = datetime.now()
    val = v1.timestamp()

6.2.1.6 sys

  • Data related to the Python interpreter
  1. sys.getrefcount: get the reference count of an object

  2. sys.getrecursionlimit: the maximum recursion depth Python allows by default

  3. sys.stdout.write: write to standard output

    • Supplementary: \n: newline; \t: tab; \r: return to the beginning of the current line

      import time
      for i in range(1,101):
          msg = "%s%%\r" %i
          print(msg,end='')
          time.sleep(0.05)
      
    • Example: a progress bar while reading a file

    import os
    # 1. Get the file size (in bytes)
    file_size = os.stat('20190409_192149.mp4').st_size
    
    # 2. Read the file piece by piece
    read_size = 0
    with open('20190409_192149.mp4',mode='rb') as f1,open('a.mp4',mode='wb') as f2:
        while read_size < file_size:
            chunk = f1.read(1024) # read at most 1024 bytes each time
            f2.write(chunk)
            read_size += len(chunk)
            val = int(read_size / file_size * 100)
            print('%s%%\r' %val ,end='')
    
  4. sys.argv: get the arguments passed in when the user runs the script

    • Example: the user passes the path of a directory to delete when running the script, and the script deletes that directory
    r"""
    The user passes the path of a directory to delete when running the script; the script deletes it.
    C:\Python36\python36.exe D:/code/s21day14/7.模块传参.py D:/test
    C:\Python36\python36.exe D:/code/s21day14/7.模块传参.py
    """
    
    import sys
    # Get the arguments passed in when the user runs the script.
    # C:\Python36\python36.exe D:/code/s21day14/7.模块传参.py D:/test
    # sys.argv = ['D:/code/s21day14/7.模块传参.py', 'D:/test']
    path = sys.argv[1]
    
    # Delete the directory
    import shutil
    shutil.rmtree(path)
    
  5. sys.exit(0): terminate the program; 0 means normal termination

  6. sys.path: when importing a module, Python searches the directories listed in sys.path

    • Add a directory: sys.path.append('directory')
    import sys
    sys.path.append('D:\\goodboy')
    
  7. sys.modules: stores every module the current program has loaded; can be used to reflect on the contents of the current file
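
Several of the sys items above come without code. A minimal sketch that exercises getrefcount, getrecursionlimit, stdout.write, and sys.modules; the json module is used only as a convenient stand-in for "some imported module":

```python
import sys
import json

data = []
refs = sys.getrefcount(data)         # includes the temporary reference created by the call itself
limit = sys.getrecursionlimit()      # current maximum recursion depth
print(refs, limit)

# Unlike print, write adds no newline automatically.
sys.stdout.write('hello\n')

# sys.modules maps module names to the module objects already loaded.
loaded = 'json' in sys.modules
same_object = sys.modules['json'] is json

# Reflection: look up a member of a module by its name as a string.
dumps = getattr(sys.modules['json'], 'dumps')
text = dumps([1, 2])
print(loaded, same_object, text)
```

The same getattr lookup works on sys.modules[__name__] to reach functions defined in the file that is currently running.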

6.2.1.7 os [commonly used]

  • Data related to the operating system
  1. os.path.exists(path): returns True if the path exists, otherwise False

  2. os.stat('file path').st_size / os.path.getsize: get the size of a file

  3. os.path.abspath(): get the absolute path of a file

    import os
    v1 = os.path.abspath(__file__)  # absolute path of the script being run
    print(v1)
    
  4. os.path.dirname(): get the path of the parent directory

    import os
    v = r"D:\code\s21day14\20190409_192149.mp4"
    print(os.path.dirname(v))
    
    • Supplementary: escaping backslashes

      v1 = r"D:\code\s21day14\n1.mp4"  # (recommended) the r prefix makes a raw string, so \n is not treated as an escape
      v2 = "D:\\code\\s21day14\\n1.mp4"
      
  5. os.path.join: join path components

    import os
    path = r"D:\code\s21day14" # user/index/inx/fasd/
    v = 'n.txt'
    result = os.path.join(path,v)
    print(result)
    
  6. os.listdir: list all files and folders in a directory [top level only]

    import os
    result = os.listdir(r'D:\code\s21day14')
    for path in result:
        print(path)
    
  7. os.walk: list all files in a directory at every level

    import os
    result = os.walk(r'D:\code\s21day14')
    for a,b,c in result:
        # a: the directory being visited; b: the folders in it; c: the files in it
        for item in c:
            path = os.path.join(a,item)
            print(path)
    
  8. os.mkdir: create a directory; can only create a single level (rarely used)

  9. os.makedirs: create a directory together with any missing parent directories (recommended)

    # Write content to a given file, creating its folder first if needed
    import os
    file_path = r'db\xx\xo\xxxxx.txt'
    file_folder = os.path.dirname(file_path)
    if not os.path.exists(file_folder):
        os.makedirs(file_folder)
    with open(file_path,mode='w',encoding='utf-8') as f:
        f.write('asdf')
    
  10. os.rename: rename

    # Rename db to sb
    import os
    os.rename('db','sb')
    
  11. os.path.isdir: check whether a path is a folder

  12. os.path.isfile: check whether a path is a file
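
The examples above use Windows paths that may not exist on the reader's machine. A portable sketch of the same calls, assuming a temporary directory and hypothetical file names so nothing permanent is created:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:      # temporary folder, removed automatically
    folder = os.path.join(root, 'db', 'logs')
    os.makedirs(folder)                          # creates 'db' and 'db/logs' in one call
    file_path = os.path.join(folder, 'a.txt')
    with open(file_path, mode='w', encoding='utf-8') as f:
        f.write('asdf')

    checks = {
        'exists': os.path.exists(file_path),
        'is_file': os.path.isfile(file_path),
        'is_dir': os.path.isdir(folder),
        'size': os.path.getsize(file_path),      # size in bytes
    }
    new_path = os.path.join(folder, 'b.txt')
    os.rename(file_path, new_path)
    names = os.listdir(folder)                   # top level of the folder only

print(checks)
print(names)
```

Building every path with os.path.join avoids hard-coding a separator, so the sketch runs unchanged on Windows, macOS, and Linux.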

6.2.1.8 shutil

  • Uses: deleting, renaming, compressing, decompressing, etc.
  1. shutil.rmtree(path): remove a directory

    # Delete the directory
    import shutil
    shutil.rmtree(path)
  2. shutil.move: move or rename

    # Rename
    import shutil
    shutil.move('test','ttt')
  3. shutil.make_archive: compress files

    # Compress a folder
    import shutil
    shutil.make_archive('zzh','zip',r'D:\code\s21day16\lizhong')
  4. shutil.unpack_archive: extract an archive

    # Extract the archive
    import shutil
    shutil.unpack_archive('zzh.zip',extract_dir=r'D:\code\xxxxxx\xxxx',format='zip')
  • Examples

    import os
    import shutil
    from datetime import datetime
    ctime = datetime.now().strftime('%Y-%m-%d-%H-%M-%S')
    # 1. Compress the lizhongwei folder into a zip archive
    # 2. Put the archive in the code directory (which does not exist by default)
    # 3. Extract the archive into the D:\x1 directory.
    if not os.path.exists('code'):
        os.makedirs('code')
    shutil.make_archive(os.path.join('code',ctime),'zip',r'D:\code\s21day16\lizhongwei')
    
    file_path = os.path.join('code',ctime) + '.zip'
    shutil.unpack_archive(file_path,r'D:\x1','zip')
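
The example above depends on folders under D:\. A self-contained variant of the same compress, extract, and delete steps, assuming a temporary directory and hypothetical names (src, backup, out):

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Prepare a small folder to compress.
    src = os.path.join(root, 'src')
    os.makedirs(src)
    with open(os.path.join(src, 'a.txt'), mode='w', encoding='utf-8') as f:
        f.write('hello')

    # make_archive appends the extension itself and returns the archive path.
    archive = shutil.make_archive(os.path.join(root, 'backup'), 'zip', src)
    archive_name = os.path.basename(archive)

    # Extract into a new folder, then delete the original folder.
    out = os.path.join(root, 'out')
    shutil.unpack_archive(archive, extract_dir=out, format='zip')
    extracted = os.listdir(out)
    shutil.rmtree(src)
    src_exists = os.path.exists(src)

print(archive_name, extracted, src_exists)
```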
    

6.2.1.9 json

  • JSON is a specially formatted string that looks like Python lists / dicts / strings / numbers, possibly nested
  • Serialization: converting a Python value into a JSON-formatted string
  • Deserialization: converting a JSON-formatted string into a Python data type
  • JSON format requirements (JSON is essentially a string):
    • It contains only int / float / str / list / dict
    • The outermost layer must be a list / dict
    • Inside JSON, strings must use double quotes
    • Dictionary keys can only be str
    • Several values dumped one after another cannot be loaded back in a single call
  1. json.dumps(): serialization

    • json only supports serializing dict / list / tuple / str / int / float / True / False / None
    • If a dict or list contains Chinese and you want it kept readable after serialization, pass ensure_ascii=False
    import json
    v = {'k1':'alex','k2':'李杰'}
    val = json.dumps(v,ensure_ascii = False)  # ensure_ascii=False keeps the Chinese characters as-is
    #{"k1": "alex", "k2": "李杰"}
    
  2. json.loads(): deserialization

    import json
    # Serialization: convert a Python value into a JSON-formatted string.
    v = [12,3,4,{'k1':'v1'},True,'asdf']
    v1 = json.dumps(v)
    print(v1)
    
    # Deserialization: convert a JSON-formatted string into a Python data type
    v2 = '["alex",123]'
    print(type(v2))
    v3 = json.loads(v2)
    print(v3,type(v3))
    
  3. json.dump: serialize a value and write it to an open file

    import json
    v = {'k1':'alex','k2':'李杰'}
    f = open('x.txt',mode='w',encoding='utf-8')
    json.dump(v,f)        # writes directly to the file and returns None
    f.close()
    
  4. json.load: read an open file and deserialize its contents

    import json
    f = open('x.txt',mode='r',encoding='utf-8')
    data = json.load(f)
    f.close()
    print(data,type(data))
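
The format requirements listed at the start of this section are easy to see in a round trip. A minimal sketch with made-up values; storing one JSON document per line is shown as one common workaround for the "no repeated loads" limitation, not as the only option:

```python
import json

v = {1: 'a', 'k2': (1, 2), 'k3': None, 'k4': True}
text = json.dumps(v)
print(text)            # {"1": "a", "k2": [1, 2], "k3": null, "k4": true}
back = json.loads(text)
print(back)            # the int key came back as str, the tuple as a list

# Single-quoted strings are not valid JSON.
try:
    json.loads("['alex']")
    single_quotes_ok = True
except json.JSONDecodeError:
    single_quotes_ok = False

# Several values in one text: dump one JSON document per line, load line by line.
lines = '\n'.join(json.dumps(item) for item in [{'id': 1}, {'id': 2}])
items = [json.loads(line) for line in lines.splitlines()]
print(single_quotes_ok, items)
```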
    

6.2.1.10 pickle

  • The difference between pickle and json
    • json
      • Advantages: understood by all programming languages
      • Disadvantages: can only serialize basic data types such as list / dict
    • pickle
      • Advantages: almost everything in Python can be serialized (socket objects are an exception); supports several consecutive dumps and loads
      • Disadvantages: the serialized content is only understood by Python
  1. pickle.dumps: serialization

    • The serialized result is bytes and is not human-readable
  2. pickle.loads: deserialization

    import pickle
    # Serialization
    v = {1,2,3,4}
    val = pickle.dumps(v)
    print(val)
    
    # Deserialization
    data = pickle.loads(val)
    print(data,type(data))
  3. pickle.dump: write to a file (note: mode='wb')

  4. pickle.load: read from a file (note: mode='rb')

    import pickle
    # Write to a file
    v = {1,2,3,4}
    f = open('x.txt',mode='wb')
    pickle.dump(v,f)
    f.close()
    
    # Read from the file
    f = open('x.txt',mode='rb')
    data = pickle.load(f)
    f.close()
    print(data)
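
The comparison above says pickle supports several consecutive dumps and loads. A minimal sketch of that behavior, using an in-memory io.BytesIO buffer as a stand-in for a file opened in binary mode:

```python
import io
import pickle

buffer = io.BytesIO()               # in-memory stand-in for open('x.txt', mode='wb')
pickle.dump({1, 2, 3}, buffer)
pickle.dump(['a', 'b'], buffer)
pickle.dump({'k': 'v'}, buffer)

buffer.seek(0)                      # rewind, as if the file were reopened with mode='rb'
loaded = []
while True:
    try:
        loaded.append(pickle.load(buffer))   # each call reads exactly one pickled object
    except EOFError:                # raised when there is nothing left to read
        break

print(loaded)
```

Only unpickle data from sources you trust, since loading a pickle can execute arbitrary code.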

6.2.1.11 copy

  • Copy module
  1. copy.copy: shallow copy

  2. copy.deepcopy: deep copy

    import copy 
    v1 = [1,2,3]      
    v2 = copy.copy(v1)             # shallow copy
    v3 = copy.deepcopy(v1)         # deep copy
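
With a flat list such as [1,2,3] the two kinds of copy behave the same. The difference only shows with nested mutable values, as in this small sketch:

```python
import copy

v1 = [1, 2, [3, 4]]
v2 = copy.copy(v1)        # new outer list, but the inner list is shared
v3 = copy.deepcopy(v1)    # new outer list and a new inner list

v1[2].append(5)           # mutate the nested list through v1

print(v2)                 # [1, 2, [3, 4, 5]]  the shallow copy sees the change
print(v3)                 # [1, 2, [3, 4]]     the deep copy does not
print(v1[2] is v2[2], v1[2] is v3[2])   # True False
```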
    

6.2.1.12 importlib

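  1. importlib.import_module: import a module given its name as a string

    # Example 1:
    import importlib
    # Import a module by its name as a string.
    redis = importlib.import_module('utils.redis')
    
    # Look up a member of the object (module) by its name as a string.
    getattr(redis,'func')()
    
    # Example 2: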
    import importlib
    middleware_classes = [
        'utils.redis.Redis',
        # 'utils.mysql.MySQL',
        'utils.mongo.Mongo'
    ]
    for path in middleware_classes:
        module_path,class_name = path.rsplit('.',maxsplit=1)
        module_object = importlib.import_module(module_path)# from utils import redis
        cls = getattr(module_object,class_name)
        obj = cls()
        obj.connect()
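
The examples above rely on a project-specific utils package. The same pattern can be tried with standard-library classes; the dotted paths below are chosen only for illustration:

```python
import importlib

# Dotted paths in 'module.ClassName' form; standard-library classes stand in for utils.redis.Redis etc.
class_paths = [
    'collections.OrderedDict',
    'collections.Counter',
    'decimal.Decimal',
]

objects = []
for path in class_paths:
    module_path, class_name = path.rsplit('.', maxsplit=1)
    module_object = importlib.import_module(module_path)   # e.g. import collections
    cls = getattr(module_object, class_name)                # look the class up by name
    objects.append(cls())                                   # instantiate with no arguments

type_names = [type(obj).__name__ for obj in objects]
print(type_names)
```

Keeping the paths in a list is what makes this useful for configuration: commenting out one line disables one component without touching the loop.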
    

6.2.1.13 logging

  • Log module: logging

    • For users: transaction records, such as bank statements
    • For programmers:
      • Gathering statistics
      • Troubleshooting and debugging
      • Recording errors so the code can be improved
  • The essentials of log handling: Logger / FileHandler / Formatter

  • Two ways to configure:

    • basicConfig

      • Advantages: easy to use
      • Disadvantages: with only a filename argument, it cannot set the file encoding (before Python 3.9) and cannot output to both a file and the screen
    • logger objects

      • Advantages: can set the encoding, and can output to both a file and the screen
      • Disadvantages: more complex
      • Example:
      import logging
      # Create a logger object
      logger = logging.getLogger()
      # Create a file handler
      fh = logging.FileHandler('log.log')
      # Create a screen (stream) handler
      sh = logging.StreamHandler()
      # Attach the file handler to the logger
      logger.addHandler(fh)
      # Attach the screen handler to the logger
      logger.addHandler(sh)
      # Create a format
      formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
      # Set the format on the file handler
      fh.setFormatter(formatter)
      # Set the format on the screen handler
      sh.setFormatter(formatter)
      # Log through the logger object
      logger.warning('message')
      
  • Log levels

    CRITICAL = 50       # critical failure (crash)
    FATAL = CRITICAL
    ERROR = 40          # error
    WARNING = 30
    WARN = WARNING
    INFO = 20
    DEBUG = 10
    NOTSET = 0
  • Recommended way to handle logs

    import logging
    
    file_handler = logging.FileHandler(filename='x1.log', mode='a', encoding='utf-8',)
    logging.basicConfig(
        format='%(asctime)s - %(name)s - %(levelname)s -%(module)s:  %(message)s',
        datefmt='%Y-%m-%d %H:%M:%S %p',
        handlers=[file_handler,],
        level=logging.ERROR
    )
    
    logging.error('你好')
  • Recommended way to handle logs + log rotation

    import time
    import logging
    from logging import handlers
    # file_handler = logging.FileHandler(filename='x1.log', mode='a', encoding='utf-8',)
    file_handler = handlers.TimedRotatingFileHandler(filename='x3.log', when='s', interval=5, encoding='utf-8')
    logging.basicConfig(
        format='%(asctime)s - %(name)s - %(levelname)s -%(module)s:  %(message)s',
        datefmt='%Y-%m-%d %H:%M:%S %p',
        handlers=[file_handler,],
        level=logging.ERROR
    )
    
    for i in range(1,100000):
        time.sleep(1)
        logging.error(str(i))

    Notes:

    # When logging, pass exc_info=True to keep the exception's stack trace.
    import logging
    import requests
    
    logging.basicConfig(
        filename='wf.log',
        format='%(asctime)s - %(name)s - %(levelname)s -%(module)s:  %(message)s',
        datefmt='%Y-%m-%d %H:%M:%S %p',
        level=logging.ERROR
    )
    
    try:
        requests.get('http://www.xxx.com')
    except Exception as e:
        msg = str(e) # calls e.__str__()
        logging.error(msg,exc_info=True)

6.2.1.14 collections

  1. OrderedDict: ordered dictionary

    from collections import OrderedDict
    odic = OrderedDict([('a', 1), ('b', 2), ('c', 3)])
    print(odic)
    for k in odic:
        print(k,odic[k])
  2. defaultdict: dictionary with default values

  3. deque: double-ended queue

  4. namedtuple: tuple with named fields

    # Create a class that has no methods and whose attribute values cannot be modified
    from collections import namedtuple       # named tuple
    Course = namedtuple('Course',['name','price','teacher'])
    python = Course('python',19800,'alex')
    print(python)
    print(python.name)
    print(python.price)
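
defaultdict and deque are listed above without code. A minimal sketch with made-up data:

```python
from collections import defaultdict, deque

# defaultdict: a missing key is created on first access using the factory (here, list).
groups = defaultdict(list)
for word in ['apple', 'avocado', 'banana', 'blueberry', 'cherry']:
    groups[word[0]].append(word)    # no need to check whether the key exists
print(dict(groups))

# deque: efficient appends and pops at both ends.
queue = deque([1, 2, 3])
queue.append(4)          # add on the right
queue.appendleft(0)      # add on the left
last = queue.pop()       # remove from the right
first = queue.popleft()  # remove from the left
print(queue, first, last)
```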

6.2.1.15 re

Regular expressions
  1. Definition
  • Definition: a regular expression is a rule for matching strings
  • The re module is only a tool for working with regular expressions; the rules themselves are independent of it
  • Why do we need regular expressions?
    • Matching strings
      • A person's phone number
      • A person's ID card number
      • A machine's IP address
    • Form validation
      • Checking that user input is valid
      • Bank card numbers
    • Web crawlers
      • Extracting links and other important data from a web page's source code
  2. Basic rules
  • First rule: an ordinary character matches that same character in the string
  • Second rule: a character set [char1char2] matches one character; any character that appears in the set counts as a match
    • Ranges can also be used inside a character set
    • Every range follows ASCII code order and must be written from small to large
    • Common: [0-9] [a-z] [A-Z]
  3. Metacharacters
  • \d: any digit
    • \ is the escape character; escaping d lets it match any digit from 0 to 9
  • \w: uppercase and lowercase letters, digits, and underscore
  • \s: whitespace: spaces, newlines, tabs
  • \t: tab
  • \n: newline
  • \D: any non-digit
  • \W: any character except letters, digits, and underscore
  • \S: any non-whitespace character
  • .: any character except newline
  • [] character set: matches any one character listed inside the brackets
  • [^] negated character set: matches any one character not listed inside the brackets
  • ^: the start of the string
  • $: the end of the string
  • |: or
    • Note: if two alternatives overlap, always put the longer one first and the shorter one after it
  • (): grouping; treats part of the pattern as one group, which can limit the scope of |
  • Special cases:
    • [\d], [0-9], \d: no difference; each matches one digit
    • [\d\D], [\W\w], [\S\s]: match any character at all
  4. Quantifiers
  • {n}: exactly n times
  • {n,}: at least n times
  • {n,m}: at least n times and at most m times
  • ?: 0 or 1 time; for something optional that appears at most once, such as a decimal point
  • +: 1 or more times
  • *: 0 or more times; for something optional that may appear many times, such as the digits after a decimal point
  • Exercises involving zero occurrences:
    • Match any number written with two decimal places
    • Match an integer or a decimal
  5. Greedy matching
  • Matching is greedy by default: a quantifier always matches as much as possible within what it allows
  • Non-greedy matching: lazy matching
    • Always matches as little as possible while still satisfying the pattern
    • Format: metacharacter + quantifier + ? + x
      • Matches the metacharacter within the quantifier's range and stops at the first x
      • Example: .*?x matches any content and stops as soon as the first x is encountered
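  6. Escaping
  • The escape character \ used in regular expressions is also the escape character in Python strings
  • Regex escapes and string escapes are unrelated to each other, so they can easily conflict
  • To avoid this conflict, take the pattern that works in a regex testing tool as the reference
  • Then simply add the r prefix to both the pattern and the string to be matched
Email rules
The part before @ must not be empty and may contain only letters (upper or lower case), digits, underscores (_), hyphens (-), and dots (.)
The part between @ and the last dot (.) must not be empty and may contain only letters (upper or lower case), digits, dots (.), and hyphens (-), and two dots may not be adjacent
The part after the last dot (.) must not be empty and may contain only letters (upper or lower case) and digits, with a length of at least 2 and at most 6 characters

Regular expression for email validation:
^[a-zA-Z0-9_.-]+@[a-zA-Z0-9-]+(\.[a-zA-Z0-9-]+)*\.[a-zA-Z0-9]{2,6}$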
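
The rules for metacharacters, quantifiers, greedy matching, and raw strings can be tried out with re.findall. A minimal sketch; the sample strings are made up, and the two number patterns are only one possible answer to the exercises listed under quantifiers:

```python
import re

text = 'tel: 13800138000, price: 12.50, qty: 3'

digits = re.findall(r'\d+', text)               # one or more digits
two_places = re.findall(r'\d+\.\d{2}', text)    # a number with exactly two decimal places
numbers = re.findall(r'\d+(?:\.\d+)?', text)    # an integer or a decimal (the fraction is optional)
print(digits, two_places, numbers)

html = '<a>1</a><a>2</a>'
greedy = re.findall(r'<.*>', html)    # .* takes as much as it can
lazy = re.findall(r'<.*?>', html)     # .*? stops at the first >
print(greedy)   # ['<a>1</a><a>2</a>']
print(lazy)     # ['<a>', '</a>', '<a>', '</a>']

# Raw strings: without r, Python turns \n into one newline character before re sees it.
print(len('\n'), len(r'\n'))   # 1 2
```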
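
A minimal sketch that applies the email pattern above to a few made-up addresses; the helper name is_valid_email and the sample list are assumptions for illustration:

```python
import re

# The pattern from the text, compiled once as a raw string.
EMAIL = re.compile(r'^[a-zA-Z0-9_.-]+@[a-zA-Z0-9-]+(\.[a-zA-Z0-9-]+)*\.[a-zA-Z0-9]{2,6}$')

def is_valid_email(address):
    """Return True if the address satisfies the pattern (hypothetical helper)."""
    return EMAIL.match(address) is not None

samples = [
    'alex@example.com',        # simple valid address
    'a.b-c@mail.example.org',  # dots and hyphens are allowed
    'no-at-sign.com',          # missing @
    'x@y..com',                # two adjacent dots
    'x@y.c',                   # last part shorter than 2 characters
]
results = {s: is_valid_email(s) for s in samples}
for address, ok in results.items():
    print(address, ok)
```

This pattern encodes only the three rules stated in the notes; real-world email validation usually needs more than a single regular expression.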
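The re module
  1. re.findall: finds every item in the string that matches the rule and returns them as a list; returns an empty list if nothing matches

    import re
    ret = re.findall(r'\d+','alex83')
    print(ret)
    # findall finds every item in the string that matches the rule
    # and returns them as a list
    # if nothing matches, it returns an empty list
    
  2. re.search: if there is a match, returns a match object whose value is read with group(); if there is no match, returns None, on which group() cannot be called

    import re
    ret = re.search(r'\d+','alex83')
    print(ret)                 # a match object if the pattern matches, otherwise None
    if ret:
        print(ret.group())     # a match object implements group(), so the value can be read
                               # None has no group() method, so calling it would raise an error
    # Scans the string from start to end and takes the first item that matches
    # If there is a match, returns an object; read the value with group()
    # If there is no match, returns None; group() cannot be used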
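  3. re.match: match is equivalent to search with ^ added at the start of the pattern

    import re
    ret1 = re.match(r'\d+','83alex')
    ret2 = re.search(r'^\d+','83alex')
    print(ret1.group() == ret2.group())   # True: both match '83' at the start
    print(re.match(r'\d+','alex83'))      # None: the string does not start with a digit
    # Matches from the beginning of the string, checking from the first character
    # If it matches, returns an object; read the value with group()
    # If it does not match, returns None
  4. re.finditer: returns an iterator, so when a query has many results the matches are produced one at a time, which saves memory

    import re
    ret = re.finditer(r'\d','safhl02urhefy023908'*2000)  # ret is an iterator
    for i in ret:    # each item produced is a match object
        print(i.group())  # read the value with group()
  5. re.compile: compile a pattern once to save overhead when the same regular expression is used many times

    import re
    ret = re.compile(r'\d+')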
    r1 = ret.search('alex83')
    r2 = ret.findall('wusir74')
    r3 = ret.finditer('taibai40')
    for i in r3:
        print(i.group())
  6. re.split: split a string using a regular expression

    import re
    ret = re.split(r'\d(\d)','alex83wusir74taibai')  # content captured by a group is kept in the result by default
    print(ret)
    
  7. re.sub / re.subn: replace using a regular expression

    import re
    ret = re.sub(r'\d','D','alex83wusir74taibai',count=1)
    print(ret)      # 'alexD3wusir74taibai'
    
    ret = re.subn(r'\d','D','alex83wusir74taibai')
    print(ret)      # ('alexDDwusirDDtaibai', 4)
    
  • Grouping and the re module
  1. Getting group values

    import re
    s1 = '<h1>wahaha</h1>'
    ret = re.search(r'<(\w+)>(.*?)</\w+>',s1)
    print(ret)
    print(ret.group(0))   # group defaults to 0: the whole match
    print(ret.group(1))   # content of the first group
    print(ret.group(2))   # content of the second group
    
    
  2. Named groups: (?P<name>regex)

    import re
    s1 = '<h1>wahaha</h1>'
    ret = re.search(r'<(?P<tag>\w+)>(?P<cont>.*?)</\w+>',s1)
    print(ret)
    print(ret.group('tag'))   # content of the tag group
    print(ret.group('cont'))   # content of the cont group
  3. Group references: (?P=name) requires the text at this position to be exactly the same as what the named group matched earlier

    import re
    # Method 1:
    s = '<h1>wahaha</h1>'
    ret = re.search(r'<(?P<tag>\w+)>.*?</(?P=tag)>',s)
    print(ret.group('tag'))      # 'h1'
    
    # Method 2:
    s = '<h1>wahaha</h1>'
    ret = re.search(r'<(\w+)>.*?</\1>',s)
    print(ret.group(1))          # 'h1'
  4. Groups and findall: by default findall gives priority to the content of groups; to cancel group priority use (?:regex)

    import re
    ret = re.findall(r'\d(\d)','aa1alex83')
    # when the pattern contains a group, findall returns the content of the group
    print(ret)
    
    # Cancel group priority:
    ret = re.findall(r'\d+(?:\.\d+)?','1.234+2')
    print(ret)
  5. Sometimes the content we want is mixed with similar content we do not want. In that case, match the unwanted content first without a group, put the wanted content in a group, and then remove the empty results

    import re
    ret=re.findall(r"\d+\.\d+|(\d+)","1-2*(60+(-40.35/5)-(-4*3))")
    print(ret)       # ['1', '2', '60', '', '5', '4', '3']
    ret.remove('')
    print(ret)       # ['1', '2', '60', '5', '4', '3']
    
  • Web crawler examples

    # Method 1:
    import re
    import json
    import requests
    
    def parser_page(par,content):
        res = par.finditer(content)
        for i in res:
            yield {'id': i.group('id'),
                   'title': i.group('title'),
                   'score': i.group('score'),
                   'com_num': i.group('comment_num')}
    
    def get_page(url):
        ret = requests.get(url)
        return  ret.text
    
    
    pattern = r'<div class="item">.*?<em class="">(?P<id>\d+)</em>.*?<span class="title">(?P<title>.*?)</span>.*?' \
                  '<span class="rating_num".*?>(?P<score>.*?)</span>.*?<span>(?P<comment_num>.*?)人评价</span>'
    par = re.compile(pattern,flags=re.S)
    num = 0
    with open('movie_info',mode = 'w',encoding='utf-8') as f:
        for i in range(10):
            content = get_page('https://movie.douban.com/top250?start=%s&filter=' % num)
            g = parser_page(par,content)
            for dic in g:
                f.write('%s\n'%json.dumps(dic,ensure_ascii=False))
            num += 25
    # Method 2: advanced (writes through a generator)
    import re
    import json
    import requests
    
    def parser_page(par,content):
        res = par.finditer(content)
        for i in res:
            yield {'id': i.group('id'),
                   'title': i.group('title'),
                   'score': i.group('score'),
                   'com_num': i.group('comment_num')}
    
    def get_page(url):
        ret = requests.get(url)
        return  ret.text
    
    def write_file(file_name):
        with open(file_name,mode = 'w',encoding='utf-8') as f:
            while True:
                dic = yield
                f.write('%s\n' % json.dumps(dic, ensure_ascii=False))
    
    pattern = r'<div class="item">.*?<em class="">(?P<id>\d+)</em>.*?<span class="title">(?P<title>.*?)</span>.*?' \
                  '<span class="rating_num".*?>(?P<score>.*?)</span>.*?<span>(?P<comment_num>.*?)人评价</span>'
    par = re.compile(pattern,flags=re.S)
    num = 0
    f = write_file('move2')
    next(f)
    for i in range(10):
        content = get_page('https://movie.douban.com/top250?start=%s&filter=' % num)
        g = parser_page(par,content)
        for dic in g:
            f.send(dic)
        num += 25
    f.close()
    

6.2.2 Third-party modules

6.2.2.1 Basics

  • Must be downloaded and installed before they can be used

  • Installation:

    • The pip package management tool
    # Add the directory that contains pip.exe to the PATH environment variable.
    pip install <name of the module to install>  # e.g. pip install xlrd
    • Installing from source
    # Download the source package (an archive) -> extract it -> open a cmd window and enter this directory: cd C:\Python37\Lib\site-packages
    # Run: python3 setup.py build
    # Run: python3 setup.py install
  • Installation path: C:\Python37\Lib\site-packages

6.2.2.2 Popular third-party modules

  1. requests
  2. xlrd

6.2.3 Custom modules

  • Write your own xx.py

    def f1():
        print('f1')
    
    def f2():
        print('f2')
    
  • Call it in yy.py

    # Use the functions defined in the custom module
    import xx
    xx.f1()
    xx.f2()
  • Run

    python yy.py 

6.3 Importing and calling modules

Note: do not give your own files or folders the same name as a module you want to import; otherwise Python finds the one in the current directory first

6.3.1 Absolute imports

1. Importing and calling a module

  • Importing the file XXX.py
  • Method 1
# Import the module; everything in it is loaded into memory.
import XXX

# Call a function in the module
XXX.func()
  • Method 2
# Import func and show from XXX.py
from XXX import func,show

# Import everything from XXX.py
from XXX import *

# Call a function in the module
func()
  • Method 3
# If names clash, give an alias at import time
# Import func from XXX.py under the alias f
from XXX import func as f

# Call the function in the module
f()
  • Summary

    • Import: import module; call: module.function()
    • Import: from module import function; call: function()
    • Import: from module import function as alias; call: alias()
    • Key points:
      • as: gives an alias
      • *: stands for everything
  • Supplement

    • Importing a module several times does not reload it

      import jd # first import: everything in jd is loaded once.
      import jd # already loaded, so it is not loaded again.
      print(456)
    • Forcing a reload

      import importlib
      import jd
      importlib.reload(jd)
      print(456)
2. Importing and calling a .py file inside a folder
  • The file XXX.py is placed in the folder YYY
  • Method 1
# Import the module
import YYY.XXX

# Call a function in the module (the full dotted name is required after "import YYY.XXX")
YYY.XXX.func()
  • Method 2
# Import the module
from YYY import XXX

# Call a function in the module
XXX.func()
  • Method 3
# Import the function
from YYY.XXX import func

# Call the function
func()
3. Summary
  • When the module is in the same directory as the script being run and many of its functions are needed, recommended:
    • Import: import module; call: module.function()
  • In other cases, recommended:
    • Import: from package import module; call: module.function()
    • Import: from package.module import function; call: function()

6.3.2 Relative imports (not recommended)

from . import xxx
from .. import xxx

Origin www.cnblogs.com/hanfe1/p/11570548.html