"Python Basics" I/O programming, regular expressions

1. I/O programming

I/O refers to Input/Output;

Input Stream: data flows into memory from the outside (disk, network);

Output Stream: data flows from memory to the outside;

Synchronous I/O: the CPU waits for the I/O to complete, and the program suspends subsequent execution until it does;

Asynchronous I/O: the CPU does not wait for the I/O to complete but does other work first, and the I/O result is handled later through a callback or polling;
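A minimal sketch of the asynchronous idea, assuming Python's asyncio event loop as one concrete way to get this behavior. The `fake_io` coroutine is hypothetical and uses `asyncio.sleep` to stand in for a real disk or network wait, so both "I/O operations" are in flight at the same time instead of one blocking the other.

```python
import asyncio

# Hypothetical stand-in for an I/O operation: asyncio.sleep simulates waiting on disk/network.
async def fake_io(name, delay, log):
    log.append(f"start {name}")
    await asyncio.sleep(delay)  # control returns to the event loop while "waiting"
    log.append(f"end {name}")
    return name

async def main():
    log = []
    # Both operations start before either finishes; the shorter wait completes first.
    results = await asyncio.gather(fake_io("A", 0.2, log), fake_io("B", 0.1, log))
    return results, log

results, log = asyncio.run(main())
print(results)  # ['A', 'B'] (gather keeps argument order)
print(log)      # ['start A', 'start B', 'end B', 'end A']
```

With synchronous I/O the log would instead read start A, end A, start B, end B, because each wait would block the program.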

File reading and writing

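Reading and writing files on disk is a function provided by the operating system; modern operating systems do not allow ordinary programs to operate the disk directly, so a program asks the OS to open a file object (a file descriptor) and then reads data from it or writes data to it through the interface the OS provides;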

File stream methods

Method       Description
open()       Opens a file and returns a file object; the arguments are the file name and a mode string, plus the optional encoding (text encoding) and errors (how encoding errors are handled)
read()       Reads the rest of the file at once and returns it as a str (text mode) or bytes (binary mode)
read(size)   Reads at most size characters (text mode) or bytes (binary mode) per call
readline()   Reads one line at a time
readlines()  Reads all remaining lines at once and returns them as a list
write()      Writes data into an in-memory buffer; the data is only guaranteed to reach the file once flush() or close() is called
close()      Writes out everything left in the buffer, then closes the file
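The methods in the table above can be tried end to end with a short sketch. It assumes a throwaway directory from `tempfile` (the file name `demo.txt` is arbitrary) so no real files are touched, and it calls `close()` explicitly to mirror the table.

```python
import os
import tempfile

# Work in a throwaway directory so the example does not touch real files.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'demo.txt')

    f = open(path, 'w', encoding='utf-8')
    f.write('line one\nline two\nline three\n')  # goes into the buffer first
    f.close()                                     # flushes the buffer to disk

    f = open(path, 'r', encoding='utf-8')
    first5 = f.read(5)            # at most 5 characters in text mode
    rest_of_line = f.readline()   # continues from the current position up to '\n'
    remaining = f.readlines()     # every remaining line as a list
    f.close()

print(repr(first5), repr(rest_of_line), remaining)
# 'line ' 'one\n' ['line two\n', 'line three\n']
```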

File object modes

Character  Meaning
r          Open for reading (default)
w          Open for writing, truncating the file first
x          Exclusive creation; fails if the file already exists
a          Open for writing, appending to the end of the file if it already exists
b          Binary mode
t          Text mode (default)
+          Open for updating (reading and writing)
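A sketch of how several of these modes behave, again assuming a temporary directory and an arbitrary file name `modes.txt`. It shows `x` refusing to overwrite, `a` appending, `w` truncating, and `b` returning bytes instead of str.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'modes.txt')

    with open(path, 'x') as f:   # 'x': create a new file
        f.write('first')

    try:
        with open(path, 'x'):    # a second exclusive create must fail
            pass
        x_failed = False
    except FileExistsError:
        x_failed = True

    with open(path, 'a') as f:   # 'a': append to the end
        f.write(' second')

    with open(path) as f:        # defaults are 'r' and 't'
        after_append = f.read()  # 'first second'

    with open(path, 'w') as f:   # 'w': truncate, then write
        f.write('new')

    with open(path, 'rb') as f:  # 'b': binary mode returns bytes
        raw = f.read()           # b'new'

print(x_failed, after_append, raw)  # True first second b'new'
```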

Reading a file

with open('/Users/aurelius/test.txt', 'r') as f:
    print(f.read())

The with statement guarantees that the opened file is eventually closed, even if an error occurs while it is being used; the same effect can be achieved with a try ... finally statement that calls close() in the finally block;
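For comparison, here is a sketch of the try ... finally form of the read example. The helper name `read_all` is hypothetical, and a temporary file stands in for `/Users/aurelius/test.txt`.

```python
import os
import tempfile

# Equivalent of: with open(path, 'r') as f: return f.read()
def read_all(path):
    f = open(path, 'r', encoding='utf-8')
    try:
        return f.read()
    finally:
        f.close()  # runs whether read() succeeded or raised

with tempfile.TemporaryDirectory() as tmp:
    p = os.path.join(tmp, 'test.txt')
    with open(p, 'w', encoding='utf-8') as f:
        f.write('hello, world.')
    content = read_all(p)

print(content)  # hello, world.
```

The with version is shorter and cannot forget the close() call, which is why it is the usual choice.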

Writing a file

with open('/Users/aurelius/test.txt', 'w') as f:
    f.write('hello, world.')

StringIO and BytesIO

StringIO

StringIO reads and writes str in memory, with the same interface as reading and writing a file;

from io import StringIO
# Write a str into an in-memory buffer
f = StringIO()
f.write('hello')
# Get the str that has been written so far
print(f.getvalue())

# Read from a buffer initialized with a str
f = StringIO('hello, 中国')
print(f.read())
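Because StringIO shares the file interface, code written for files also works on it. A sketch, assuming a hypothetical helper `count_nonempty_lines` that only iterates over lines, plus a reminder that StringIO, like a file, keeps a current position.

```python
from io import StringIO

# Hypothetical helper written only against the file interface (iteration over lines).
def count_nonempty_lines(f):
    return sum(1 for line in f if line.strip())

n = count_nonempty_lines(StringIO('first\n\nsecond\nthird\n'))

# StringIO keeps a current position, just like a file.
f = StringIO()
f.write('hello')
before_seek = f.read()  # '' because the position is at the end after writing
f.seek(0)               # rewind to the start
after_seek = f.read()   # 'hello'

print(n, repr(before_seek), after_seek)  # 3 '' hello
```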

BytesIO

BytesIO reads and writes bytes in memory;

from io import BytesIO
# Write bytes into an in-memory buffer
f = BytesIO()
f.write('中文'.encode('utf-8'))
print(f.getvalue())

# Read from a buffer initialized with bytes
f = BytesIO(b'\xe4\xb8\xad\xe6\x96\x87')
print(f.read().decode('utf-8'))

Manipulating files and directories

Python's built-in os module can directly call the interface functions provided by the operating system to operate on files and directories;

>>> import os
>>> os.name  # 'nt' on Windows, 'posix' on Linux/Unix/macOS
'nt'

Environment variables

os.environ  # all environment variables (an os._Environ mapping)
os.environ.get('key', 'default')  # one environment variable; default is optional (None if omitted)
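A short sketch of the lookup behavior. The variable names `DEMO_GREETING` and `DEMO_MISSING` are hypothetical; setting `os.environ` only affects the current process and its children.

```python
import os

os.environ['DEMO_GREETING'] = 'hello'  # hypothetical name, set for this process only
os.environ.pop('DEMO_MISSING', None)   # make sure this hypothetical name is unset

greeting = os.environ.get('DEMO_GREETING', 'default')  # 'hello'
fallback = os.environ.get('DEMO_MISSING', 'default')   # 'default'
missing = os.environ.get('DEMO_MISSING')               # None
via_getenv = os.getenv('DEMO_GREETING')                # 'hello': os.getenv is a shortcut

print(greeting, fallback, missing, via_getenv)
```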

Common file and directory functions

Function                                 Effect
os.path.abspath('.')                     Returns the absolute path of the current directory
os.path.join(r'd:\a', 'b')               Joins path 2 (b) onto path 1 (d:\a); if path 2 is an absolute path, path 2 is returned as-is
os.mkdir(r'd:\test')                     Creates a directory
os.rmdir(r'd:\test')                     Deletes a directory (it must be empty)
os.path.split(r'd:\test\file.txt')       Splits into the directory part and the last component: ('d:\test', 'file.txt')
os.path.splitext(r'd:\test\file.txt')    Splits off the file extension: ('d:\test\file', '.txt')
os.rename('test.txt', 'text.py')         Renames a file
os.remove('test.py')                     Deletes a file
os.listdir('.')                          Lists the entries in the given directory
os.path.isdir(r'd:\test')                Checks whether the path is an existing directory
os.path.isfile(r'd:\test\test.txt')      Checks whether the path is an existing file

The shutil module supplements os; for example, its copyfile() function copies a file;
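The os functions above and shutil.copyfile() can be exercised together in a portable sketch. It works inside a temporary directory (the names `test`, `file.txt`, `copy.txt` and `copy.py` are arbitrary) instead of the Windows paths used in the table.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    d = os.path.join(tmp, 'test')      # join with the OS path separator
    os.mkdir(d)                        # create a directory
    src = os.path.join(d, 'file.txt')
    with open(src, 'w') as f:
        f.write('data')

    head, tail = os.path.split(src)    # (directory, 'file.txt')
    root, ext = os.path.splitext(src)  # (.../file, '.txt')

    dst = os.path.join(d, 'copy.txt')
    shutil.copyfile(src, dst)          # shutil adds copying on top of os
    os.rename(dst, os.path.join(d, 'copy.py'))
    names = sorted(os.listdir(d))      # ['copy.py', 'file.txt']
    src_is_file = os.path.isfile(src)  # True

    os.remove(os.path.join(d, 'copy.py'))
    os.remove(src)
    os.rmdir(d)                        # only works once the directory is empty
    still_there = os.path.isdir(d)     # False

print(tail, ext, names, src_is_file, still_there)
```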

Serialization

The process of turning a variable in memory into a form that can be stored or transmitted is called serialization (pickling in Python), and reading serialized data back into memory as an object is called deserialization (unpickling);

Pickle

  • dumps/dump
>>> import pickle
>>> d = dict(name='中国人', age=18, score=99)
# pickle.dumps serializes an object into bytes
>>> pickle.dumps(d)
b'\x80\x04\x95*\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x04name\x94\x8c\t\xe4\xb8\xad\xe5\x9b\xbd\xe4\xba\xba\x94\x8c\x03age\x94K\x12\x8c\x05score\x94Kcu.'
# pickle.dump serializes an object and writes it directly to a file-like object
>>> with open('dump.txt', 'wb') as w:
...     pickle.dump(d, w)
  • loads/load
>>> with open('dump.txt', 'rb') as r:
...     d = pickle.load(r)
...
>>> d
{'name': '中国人', 'age': 18, 'score': 99}

The object obtained by unpickling is a new object, independent of the original variable, but its value is the same;

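pickle's format is specific to Python, so other languages cannot read it; pickled data may also fail to load under a different Python version (for example, a newer protocol read by an older interpreter), so it is not suited to long-term storage or data exchange;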
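The "new object, same value" point can be checked directly. This sketch round-trips the dict from the example above through `pickle.dumps`/`pickle.loads`, and through `dump`/`load` with a BytesIO buffer standing in for a real binary file.

```python
import pickle
from io import BytesIO

d = dict(name='中国人', age=18, score=99)

data = pickle.dumps(d)   # bytes
d2 = pickle.loads(data)  # a new dict rebuilt from those bytes
print(d2 == d, d2 is d)  # True False

buf = BytesIO()          # any binary file-like object works with dump/load
pickle.dump(d, buf)
buf.seek(0)              # rewind before reading back
d3 = pickle.load(buf)
print(d3)                # {'name': '中国人', 'age': 18, 'score': 99}
```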

JSON

JSON is a standard serialization format, suited to exchanging data between different programming languages; its standard encoding is UTF-8;

  • JSON-to-Python type mapping
JSON type           Python type
{} (object)         dict
[] (array)          list
string              str
number (int/real)   int/float
true/false          True/False
null                None
>>> import json
>>> d = dict(name='Aurelius', age=18, score=99)
>>> json_str = json.dumps(d)
>>> json_str
'{"name": "Aurelius", "age": 18, "score": 99}'
>>> json.loads(json_str)
{'name': 'Aurelius', 'age': 18, 'score': 99}
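The file-based variants `json.dump`/`json.load` work the same way on any text file-like object. This sketch uses a StringIO buffer in place of a real file and shows the type mapping in action; note that a Python tuple becomes a JSON array and comes back as a list.

```python
import json
from io import StringIO

data = {'ok': True, 'missing': None, 'pair': (1, 2.5)}

buf = StringIO()
json.dump(data, buf)   # write JSON text to a file-like object
text = buf.getvalue()  # '{"ok": true, "missing": null, "pair": [1, 2.5]}'

buf.seek(0)
back = json.load(buf)  # {'ok': True, 'missing': None, 'pair': [1, 2.5]}
print(text)
print(back)
```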

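The ensure_ascii parameter of dumps/dump decides whether non-ASCII characters in the returned str are escaped as \uXXXX sequences (the default, True) or kept as-is (ensure_ascii=False);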
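A quick comparison of the two settings, using a dict with Chinese text like the pickle example:

```python
import json

d = dict(name='中国人', age=18)

escaped = json.dumps(d)                     # non-ASCII escaped (default)
readable = json.dumps(d, ensure_ascii=False)

print(escaped)   # {"name": "\u4e2d\u56fd\u4eba", "age": 18}
print(readable)  # {"name": "中国人", "age": 18}
```

Both strings decode to the same dict with json.loads; the escaped form is plain ASCII, which is safer for channels that do not handle UTF-8.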

Advanced JSON

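Objects of custom classes cannot be serialized directly; you need to pass a function through the default parameter of dumps/dump that converts such an object into a dict;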

json.dumps(o, default=object2dict)

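Instances of a class usually have a __dict__ attribute that stores their instance variables (except for classes that define __slots__), so you can simply call: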

json.dumps(o, default=lambda o: o.__dict__)

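Likewise, to deserialize into a custom type, pass a function through the object_hook parameter of loads/load that converts each decoded dict into an instance of that type;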

json.loads(json_str, object_hook=dict2object)
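A sketch putting the pieces together. The `Student` class is hypothetical; `object2dict` and `dict2object` are the function names used above, filled in here with one possible implementation.

```python
import json

class Student:
    # Hypothetical example class
    def __init__(self, name, age, score):
        self.name = name
        self.age = age
        self.score = score

def object2dict(obj):
    # Called by json.dumps for any object it cannot serialize natively
    return {'name': obj.name, 'age': obj.age, 'score': obj.score}

def dict2object(d):
    # Called by json.loads for every decoded JSON object (dict)
    return Student(d['name'], d['age'], d['score'])

s = Student('Aurelius', 18, 99)
json_str = json.dumps(s, default=object2dict)
via_dict = json.dumps(s, default=lambda o: o.__dict__)  # same result via __dict__
print(json_str)  # {"name": "Aurelius", "age": 18, "score": 99}

s2 = json.loads(json_str, object_hook=dict2object)
print(type(s2).__name__, s2.name, s2.score)  # Student Aurelius 99
```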

2. Regular expressions

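A regular expression uses a descriptive language to define a rule for strings; a string that conforms to the rule is said to match it;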

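Pattern  Meaning                                               Example
\d       Matches one digit                                     '00\d' matches '007'
\w       Matches one letter, digit, or underscore              '\w\w\d' matches 'py3'
.        Matches any single character (except newline)         'py.' matches 'pyc', 'py!'
*        Zero or more of the preceding element
+        One or more of the preceding element
?        Zero or one of the preceding element
{n}      Exactly n of the preceding element                    '\d{3}' matches '010'
{n,m}    n to m of the preceding element                       '\d{3,8}' matches '1234567'
\        Escape character                                      '\d{3}\-\d{3,8}' matches '010-12345'
\s       Matches one whitespace character (space, tab, etc.)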
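The rows above can be checked mechanically. This sketch uses `re.fullmatch`, which requires the whole string to match, on pattern/string pairs mostly taken from the table, plus a few edge cases for `*`, `+` and `?`.

```python
import re

cases = [
    (r'00\d', '007'),
    (r'\w\w\d', 'py3'),
    (r'py.', 'py!'),
    (r'\d*', ''),                      # * allows zero repetitions
    (r'\d+', ''),                      # + needs at least one
    (r'\d?', '12'),                    # ? allows at most one
    (r'\d{3}', '010'),
    (r'\d{3,8}', '1234567'),
    (r'\d{3}\s+\d{3,8}', '010   12345'),
]

# re.fullmatch returns a Match object only if the entire string matches
results = {(p, s): re.fullmatch(p, s) is not None for p, s in cases}
for (p, s), ok in results.items():
    print(f'{p!r:22} {s!r:15} {ok}')
```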

Advanced

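Pattern  Meaning                          Example
[]       A set or range of characters     '[0-9a-zA-Z_]' matches any one digit, letter, or underscore
A|B      Matches A or B
^        Start of the line                '^\d' means the string must start with a digit
$        End of the line                  '\d$' means the string must end with a digit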
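A sketch combining these constructs. The identifier-like pattern is an illustrative assumption (letter or underscore first, then letters, digits, or underscores), not a complete definition of Python identifiers.

```python
import re

identifier = r'^[a-zA-Z_][0-9a-zA-Z_]*$'  # letter/underscore, then letters/digits/underscores

checks = {
    'identifier_ok': re.match(identifier, 'py_3') is not None,       # True
    'identifier_bad': re.match(identifier, '3py') is not None,       # False
    'alternation': re.match(r'^(P|p)ython$', 'python') is not None,  # True
    'starts_digit': re.search(r'^\d', '3abc') is not None,           # True
    'ends_digit': re.search(r'\d$', 'abc') is not None,              # False
}
print(checks)
```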

The re module

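Python string literals use \ for escapes, and regular expressions also use \ for escapes; writing the pattern with the r prefix (a raw string) turns off Python's own string escaping, so the backslashes reach the regex engine unchanged;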
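A small sketch of why the r prefix helps: without it, every regex backslash has to be doubled.

```python
import re

s1 = 'ABC\\-001'  # ordinary literal: \\ is Python's escape for one backslash
s2 = r'ABC\-001'  # raw literal: the backslash is kept as written

print(s1 == s2, len(s2))            # True 8
print(len('\t'), len(r'\t'))        # 1 2 ('\t' is a tab, r'\t' is backslash + t)
print(bool(re.match(s2, 'ABC-001')))  # True: \- matches a literal '-'
```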

match

>>> import re
>>> re.match(r'^\d{3}\-\d{3,8}$', '010-12345')
<re.Match object; span=(0, 9), match='010-12345'>
>>> re.match(r'^\d{3}\-\d{3,8}$', '010 12345')
>>>

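If the match succeeds, match() returns a Match object; otherwise it returns None. match() only tries to match at the beginning of the string;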
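Since a failed match returns None, the result can be tested directly in an if statement. A minimal sketch with a hypothetical `check` helper and the phone-number pattern from the example above:

```python
import re

def check(s):
    # A Match object is truthy; None is falsy
    if re.match(r'^\d{3}\-\d{3,8}$', s):
        return 'ok'
    else:
        return 'failed'

print(check('010-12345'))  # ok
print(check('010 12345'))  # failed
```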

split

>>> re.split(r'\s+', 'a b   c')
['a', 'b', 'c']
>>> re.split(r'[\s\,\;]+', 'a,b;; c  d')
['a', 'b', 'c', 'd']

re.split() splits a string wherever the pattern matches and returns a list of the pieces;
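The advantage over str.split() shows up with repeated separators, as this comparison sketch shows:

```python
import re

text = 'a b   c'
plain = text.split(' ')          # one split per single space
regex = re.split(r'\s+', text)   # any run of whitespace is one separator

print(plain)  # ['a', 'b', '', '', 'c']
print(regex)  # ['a', 'b', 'c']
```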

group

>>> m = re.match(r'^(\d{3})-(\d{3,8})$', '010-12345')
>>> m
<re.Match object; span=(0, 9), match='010-12345'>
>>> m.group(2)
'12345'
>>> m.group(1)
'010'
>>> m.group(0)
'010-12345'

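Parentheses () define groups that extract substrings; group(0) is the entire matched string, and group(n) is the substring captured by the n-th group;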
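Another sketch of extracting groups, assuming a hypothetical task of splitting an `HH:MM:SS` time string. `groups()` returns all captured groups at once as a tuple.

```python
import re

m = re.match(r'^(\d{2}):(\d{2}):(\d{2})$', '19:05:30')
parts = m.groups()  # ('19', '05', '30')
whole = m.group(0)  # '19:05:30'

print(parts, whole)
```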

Greedy matching

Greedy matching matches as many characters as possible:

>>> re.match(r'^(\d+)(0*)$', '102300').groups()
('102300', '')
>>> re.match(r'^(\d+)(0+)$', '102300').groups()
('10230', '0')

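Regex matching is greedy by default; for non-greedy (lazy) matching, which matches as few characters as possible, add ? after the quantifier, e.g. \d+?: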

>>> re.match(r'^(\d+?)(0*)$', '102300').groups()
('1023', '00')
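The difference matters most with `.`, which can run past the text you wanted. A sketch on a made-up HTML-like string:

```python
import re

text = '<b>bold</b> and <i>italic</i>'

greedy = re.findall(r'<.+>', text)  # .+ runs to the LAST '>'
lazy = re.findall(r'<.+?>', text)   # .+? stops at the FIRST '>'

print(greedy)  # ['<b>bold</b> and <i>italic</i>']
print(lazy)    # ['<b>', '</b>', '<i>', '</i>']
```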

Compilation

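Steps the re module performs when matching:

  1. Compile the regular expression; if it is invalid, an error is raised;
  2. Use the compiled regular expression to match the string;
  • Precompiling: if the same regular expression is used many times, compile it once with re.compile() and reuse the resulting object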
>>> import re
>>> re_telephone = re.compile(r'^(\d{3})-(\d{3,8})$')
>>> re_telephone.match('010-12345').groups()
('010', '12345')
>>> re_telephone.match('010-8086').groups()
('010', '8086')
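A sketch showing both steps: an invalid pattern fails at compile time with `re.error`, and a compiled pattern is reused across several inputs (the phone numbers are made up).

```python
import re

# Step 1 can fail: an unbalanced parenthesis raises re.error when compiling
try:
    re.compile(r'^(\d{3}-(\d{3,8})$')
    compile_error = None
except re.error as e:
    compile_error = e

# Step 2 reuses one compiled Pattern object for many strings
re_telephone = re.compile(r'^(\d{3})-(\d{3,8})$')
parsed = []
for number in ['010-12345', '010-8086', '010 12345']:
    m = re_telephone.match(number)
    parsed.append(m.groups() if m else None)

print(compile_error)
print(parsed)  # [('010', '12345'), ('010', '8086'), None]
```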

Matching simple email addresses

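import re

def is_valid_email(addr):
    # Local part: letters and dots; domain: gmail.com or microsoft.com only
    return re.match(r'(^[a-zA-Z\.]+)\@(gmail|microsoft)\.com$', addr) is not None

print(is_valid_email('someone@gmail.com'))  # True
print(is_valid_email('bob#example.com'))    # False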

Matching an email address with a display name and extracting the name

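import re

def name_of_email(addr):
    # Extract the part before @ (domain limited to voyager/example and org/com)
    m = re.match(r'^([a-zA-Z\d\s\<\>]+)\@(voyager|example)\.(org|com)$', addr)
    if not m:
        return None
    # Take the name inside <>; if there is none, use the whole prefix
    m = re.match(r'^\<([a-zA-Z\s]+)\>[\s]+[a-zA-Z\d]+|([a-zA-Z\d]+)$', m.group(1))
    if not m:
        return None  # prefix has an unexpected shape, e.g. '<Tom' without '>'
    return m.group(1) or m.group(2)

print(name_of_email('<Tom Paris> tom@voyager.org'))  # Tom Paris
print(name_of_email('tom@voyager.org'))              # tom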


Origin blog.csdn.net/ChaoMing_H/article/details/129432991