1. I/O process
I/O refers to Input/Output;
Input Stream: data flows into memory from the outside (disk, network);
Output Stream: data flows from memory to the outside;
Synchronous I/O: the CPU waits for the I/O to complete, and the program suspends subsequent execution;
Asynchronous I/O: the CPU does not wait for the I/O to complete but does other work first, and handles the I/O result later through callbacks or polling;
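As a rough illustration of the asynchronous case, the sketch below uses the standard asyncio module, with asyncio.sleep standing in for a slow I/O operation (an assumption made for the demo, not real disk or network access). The helper name fake_io is hypothetical. Both waits overlap, so the total time should be close to one delay rather than two.

```python
import asyncio
import time


async def fake_io(name, delay):
    # asyncio.sleep is a stand-in for a slow I/O operation (illustration only)
    await asyncio.sleep(delay)
    return name


async def main():
    start = time.perf_counter()
    # Both simulated I/O operations are in flight at the same time
    results = await asyncio.gather(fake_io('a', 0.2), fake_io('b', 0.2))
    elapsed = time.perf_counter() - start
    return results, elapsed


results, elapsed = asyncio.run(main())
print(results, round(elapsed, 1))
```

With synchronous I/O, the same two operations would run one after the other and take roughly the sum of the two delays.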
File reading and writing
The ability to read and write files on disk is provided by the operating system; modern operating systems do not allow ordinary programs to operate on the disk directly;
File stream methods
Method | Description |
---|---|
open() | Opens a file object in the specified mode; the parameters are the file name and the mode identifier, with optional parameters encoding (the text encoding) and errors (how encoding errors are handled) |
read() | Reads the entire contents of the file at once and returns a str object (bytes in binary mode) |
read(size) | Reads at most size characters at a time (size bytes in binary mode) |
readline() | Reads one line at a time |
readlines() | Reads everything at once and returns a list of lines |
write() | Writes the content to a memory buffer; the content is actually written out when the buffer is flushed or close() is called |
close() | Closes the file, writing out everything in the memory buffer before closing |
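The reading methods in the table can be tried on a small scratch file. This is a minimal sketch: the file name sample.txt and the temporary directory are assumptions made for the demo.

```python
import os
import tempfile

# Create a small temporary file to read back (the name sample.txt is arbitrary)
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'sample.txt')
with open(path, 'w', encoding='utf-8') as f:
    f.write('line 1\nline 2\nline 3\n')

with open(path, 'r', encoding='utf-8') as f:
    first_chars = f.read(4)      # at most 4 characters
    rest_of_line = f.readline()  # the remainder of the current line
    remaining = f.readlines()    # all remaining lines as a list

print(repr(first_chars), repr(rest_of_line), remaining)
```

Each call continues from where the previous one stopped, so the three results together cover the whole file.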
File object modes
Character | Meaning |
---|---|
r | Read (default) |
w | Write, truncating the file first |
x | Exclusive creation; fails if the file already exists |
a | Write, appending to the end of the file if it already exists |
b | Binary mode |
t | Text mode (default) |
+ | Update (read and write) |
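The effect of several modes can be seen on one scratch file. This is a sketch under the assumption that a temporary directory is available; the file name modes.txt is hypothetical.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'modes.txt')

with open(path, 'x') as f:      # exclusive creation: the file must not exist yet
    f.write('first\n')

try:
    open(path, 'x')             # a second attempt fails because the file now exists
except FileExistsError:
    exists_error = True
else:
    exists_error = False

with open(path, 'a') as f:      # append keeps the existing content
    f.write('second\n')

with open(path, 'rb') as f:     # binary mode returns bytes instead of str
    raw = f.read()

with open(path, 'w') as f:      # 'w' truncates the file before writing
    f.write('only\n')

with open(path) as f:           # the default mode is 'rt'
    final = f.read()

print(exists_error, type(raw).__name__, repr(final))
```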
Reading a file
with open('/Users/aurelius/test.txt', 'r') as f:
    print(f.read())
The with statement guarantees that a file opened with open is eventually closed with close; the same effect can be achieved with a try ... finally statement that calls close in the finally block;
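The try ... finally form mentioned above might look like the following sketch. It writes a scratch file first so that it runs anywhere; the temporary path is an assumption made for the demo.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'test.txt')
with open(path, 'w') as f:
    f.write('hello, world.')

f = open(path, 'r')
try:
    content = f.read()
finally:
    # Runs whether or not read() raised, so the file is always closed
    f.close()

print(content, f.closed)
```

The with form is shorter and harder to get wrong, which is why it is usually preferred.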
Writing a file
with open('/Users/aurelius/test.txt', 'w') as f:
    f.write('hello, world.')
StringIO and BytesIO
StringIO reads and writes str in memory, with an interface consistent with reading and writing files;
from io import StringIO
# Write to an in-memory stream
f = StringIO()
f.write('hello')
# Get the str written so far
f.getvalue()
# Read from a stream initialized with a str
f = StringIO('hello, 中国')
f.read()
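Because the interface matches that of a file, code written for text files can usually be exercised with a StringIO instead. A small sketch, where count_words is a hypothetical helper defined only for this example:

```python
from io import StringIO


def count_words(stream):
    # Works with any text stream that can be iterated line by line
    return sum(len(line.split()) for line in stream)


fake_file = StringIO('hello world\nhello, 中国\n')
total = count_words(fake_file)
print(total)
```

The same function would accept an object returned by open() in text mode, since both support iteration over lines.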
BytesIO reads and writes bytes in memory;
from io import BytesIO
# Write to an in-memory stream
f = BytesIO()
f.write('中文'.encode('utf-8'))
print(f.getvalue())
# Read from a stream initialized with bytes
f = BytesIO(b'\xe4\xb8\xad\xe6\x96\x87')
print(f.read().decode('utf-8'))
Manipulating files and directories
Python's built-in os module can directly call the interface functions provided by the operating system to operate on files and directories;
>>> import os
>>> os.name
'nt'
Environment variables
os.environ # all environment variables (an os._Environ mapping)
os.environ.get('key', 'default') # the specified environment variable; default is optional
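A short sketch of how the default behaves. DEMO_KEY is a hypothetical variable name used only here; it is set and removed within the example so that the result does not depend on the machine.

```python
import os

# DEMO_KEY is a made-up variable name for this sketch
os.environ['DEMO_KEY'] = 'demo value'
present = os.environ.get('DEMO_KEY', 'default')   # the variable exists

del os.environ['DEMO_KEY']
missing = os.environ.get('DEMO_KEY', 'default')   # falls back to the default
missing_none = os.environ.get('DEMO_KEY')         # no default given: None

print(present, missing, missing_none)
```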
Manipulating files and directories
Function | Description |
---|---|
os.path.abspath('.') | The absolute path of the current directory |
os.path.join(r'd:\a', 'b') | Joins path 2 (b) onto path 1 (d:\a); if path 2 is an absolute path, path 2 is returned directly |
os.mkdir(r'd:\test') | Creates a directory |
os.rmdir(r'd:\test') | Deletes a directory (it must be empty) |
os.path.split(r'd:\test\file.txt') | Splits the path into the directory part and the file name |
os.path.splitext(r'd:\test\file.txt') | Splits off the file extension |
os.rename('test.txt', 'test.py') | Renames a file |
os.remove('test.py') | Deletes a file |
os.listdir('.') | Lists the entries in the specified directory |
os.path.isdir(r'd:\test') | Determines whether the path is a directory |
os.path.isfile(r'd:\test\test.txt') | Determines whether the path is a file |
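The functions in the table can be combined into a small round trip. To keep the sketch portable it works inside a temporary directory rather than d:\test, which is an assumption made for the demo.

```python
import os
import tempfile

base = tempfile.mkdtemp()                 # scratch directory for the demo
work = os.path.join(base, 'test')
os.mkdir(work)                            # create a directory

src = os.path.join(work, 'test.txt')
with open(src, 'w') as f:
    f.write('demo')

head, tail = os.path.split(src)           # (directory, file name)
root, ext = os.path.splitext(src)         # (path without extension, extension)

dst = root + '.py'
os.rename(src, dst)                       # rename the file
listing = os.listdir(work)
is_file = os.path.isfile(dst)
is_dir = os.path.isdir(work)

os.remove(dst)                            # delete the file
os.rmdir(work)                            # delete the now-empty directory
os.rmdir(base)

print(tail, ext, listing, is_file, is_dir)
```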
shutil
The shutil module supplements the functions of os; its copyfile() provides file copying;
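A minimal sketch of copyfile(), again using a temporary directory and made-up file names so that it can run anywhere:

```python
import os
import shutil
import tempfile

base = tempfile.mkdtemp()
src = os.path.join(base, 'source.txt')
dst = os.path.join(base, 'copy.txt')
with open(src, 'w') as f:
    f.write('hello, world.')

shutil.copyfile(src, dst)   # copies the file contents to dst
with open(dst) as f:
    copied = f.read()

shutil.rmtree(base)         # remove the scratch directory and everything in it
print(copied)
```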
Serialization
Converting variables in memory into a form that can be stored or transmitted is called serialization (pickling), and reading serialized content back into memory is called deserialization (unpickling);
Pickle
- dumps/dump
>>> import pickle
>>> d = dict(name='中国人', age=18, score=99)
# pickle.dumps serializes an object into bytes
>>> pickle.dumps(d)
b'\x80\x04\x95*\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x04name\x94\x8c\t\xe4\xb8\xad\xe5\x9b\xbd\xe4\xba\xba\x94\x8c\x03age\x94K\x12\x8c\x05score\x94Kcu.'
# pickle.dump serializes the object and writes it directly to a file-like object
>>> with open('dump.txt', 'wb') as w:
...     pickle.dump(d, w)
- loads/load
>>> with open('dump.txt', 'rb') as r:
...     d = pickle.load(r)
...
>>> d
{'name': '中国人', 'age': 18, 'score': 99}
The variable obtained by pickle deserialization is a separate object from the original variable; only their values are the same;
pickle serialization can only be used with Python, and data pickled by different Python versions may be incompatible with each other;
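That the deserialized value is equal to, but independent of, the original can be checked directly. A small sketch using an in-memory round trip:

```python
import pickle

original = dict(name='Aurelius', age=18, score=99)
restored = pickle.loads(pickle.dumps(original))

same_value = restored == original      # contents match
same_object = restored is original     # but it is a different object

restored['age'] = 19                   # changing the copy leaves the original alone
print(same_value, same_object, original['age'])
```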
JSON
A standard serialization format, suitable for transfer between different programming languages; the standard encoding is UTF-8;
- JSON type mapping
JSON type | Python type |
---|---|
{} | dict |
[] | list |
string | str |
number (int/float) | int/float |
true/false | True/False |
null | None |
>>> import json
>>> d = dict(name='Aurelius', age=18, score=99)
>>> json_str = json.dumps(d)
>>> json_str
'{"name": "Aurelius", "age": 18, "score": 99}'
>>> json.loads(json_str)
{'name': 'Aurelius', 'age': 18, 'score': 99}
The ensure_ascii parameter of dumps/dump determines whether non-ASCII characters in the returned str are escaped into ASCII sequences;
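The difference is easiest to see with a value that contains non-ASCII characters. A brief sketch:

```python
import json

d = dict(name='中国人', age=18)
escaped = json.dumps(d)                       # ensure_ascii=True is the default
readable = json.dumps(d, ensure_ascii=False)  # keep non-ASCII characters as they are

print(escaped)
print(readable)
```

Both strings decode back to the same dict with json.loads; only the textual form differs.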
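Advanced JSON
Objects of custom classes cannot be serialized directly; you need to pass a function as the default parameter of dumps/dump that converts the object into a dict;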
json.dumps(o, default=object2dict)
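An instance of a class usually has a __dict__ attribute that stores the instance's variables (unless the class defines __slots__), so the call can be written directly like this;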
json.dumps(o, default=lambda o: o.__dict__)
When deserializing into a custom type, loads/load likewise needs a function passed as object_hook that converts a dict into an object of the custom type;
json.loads(json_str, object_hook=dict2object)
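Putting the two hooks together, a complete round trip might look like the sketch below. The Student class is hypothetical, and object2dict and dict2object are one possible way to write the functions named in the calls above.

```python
import json


class Student:
    # Hypothetical class used only for this sketch
    def __init__(self, name, age, score):
        self.name = name
        self.age = age
        self.score = score


def object2dict(obj):
    # Called by json.dumps for objects it cannot serialize itself
    if isinstance(obj, Student):
        return {'name': obj.name, 'age': obj.age, 'score': obj.score}
    raise TypeError(f'{type(obj).__name__} is not JSON serializable')


def dict2object(d):
    # Called by json.loads for every decoded JSON object
    return Student(d['name'], d['age'], d['score'])


s = Student('Aurelius', 18, 99)
json_str = json.dumps(s, default=object2dict)
restored = json.loads(json_str, object_hook=dict2object)

print(json_str)
print(restored.name, restored.age, restored.score)
```

Note that object_hook is applied to every JSON object in the input, including nested ones, so a real dict2object would usually check the keys before building an instance.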
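2. Regular expressions
A regular expression uses a descriptive language to define a rule for strings, and that rule is then used to match strings;
Descriptor | Meaning | Example |
---|---|---|
\d | Matches a digit | '00\d' matches '007' |
\w | A letter or digit (also matches an underscore) | '\w\w\d' matches 'py3' |
. | Any character (except a newline by default) | 'py.' matches 'pyc', 'py!' |
* | Any number of the preceding character (including 0) | |
+ | At least 1 of the preceding character | |
? | 0 or 1 of the preceding character | |
{n} | n of the preceding character | '\d{3}' matches '010' |
{n,m} | n to m of the preceding character | '\d{3,8}' matches '1234567' |
\ | Escape character | '\d{3}-\d{3,8}' matches '010-12345' |
\s | A space or other whitespace character | |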
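The examples in the table can be checked with re.fullmatch, which succeeds only when the whole string matches the pattern. The extra cases for *, ? and \s are illustrative additions, not taken from the table.

```python
import re

checks = [
    (r'00\d', '007'),
    (r'\w\w\d', 'py3'),
    (r'py.', 'py!'),
    (r'\d{3}', '010'),
    (r'\d{3,8}', '1234567'),
    (r'\d{3}-\d{3,8}', '010-12345'),
    (r'\d*', ''),          # * allows zero repetitions
    (r'\d?x', 'x'),        # ? allows the digit to be absent
    (r'a\s+b', 'a \tb'),   # \s matches spaces and tabs
]
results = [bool(re.fullmatch(pattern, text)) for pattern, text in checks]

# + requires at least one repetition, so the empty string does not match
plus_needs_one = re.fullmatch(r'\d+', '') is None

print(results, plus_needs_one)
```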
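Advanced
Descriptor | Meaning | Example |
---|---|---|
[] | A range of characters | '[0-9a-zA-Z_]' matches any single digit, letter, or underscore |
A|B | Matches A or B | |
^ | Start of the line | '^\d' means the string starts with a digit |
$ | End of the line | '\d$' means the string ends with a digit |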
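A short sketch of these descriptors in use. The sample strings are made up for illustration.

```python
import re

# A character class repeated with +: digits, letters, and underscores only
char_class = bool(re.fullmatch(r'[0-9a-zA-Z_]+', 'py_3'))

# Alternation: either an upper-case or a lower-case first letter
either = [bool(re.fullmatch(r'(P|p)ython', s)) for s in ('Python', 'python', 'jython')]

# Anchors: ^ ties the pattern to the start, $ to the end
starts_with_digit = bool(re.search(r'^\d', '3 apples'))
ends_with_digit = bool(re.search(r'\d$', 'apples'))

print(char_class, either, starts_with_digit, ends_with_digit)
```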
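re module
Python strings themselves use \ for escaping, and regular expressions also use \ for escaping; writing a regular expression with the r prefix avoids having to account for Python's own string escaping;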
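The two layers of escaping can be compared directly. In this sketch the strings are arbitrary examples:

```python
import re

plain = 'ABC\\-001'      # without the prefix, each backslash must be doubled
raw = r'ABC\-001'        # with the prefix, it is written once
same = plain == raw

newline = '\n'           # one character: a line feed
pattern = r'\n'          # two characters: a backslash and 'n', left for re to interpret
lengths = (len(newline), len(pattern))
matches_newline = re.fullmatch(pattern, newline) is not None

print(same, lengths, matches_newline)
```

The raw string hands the backslash to the re module untouched, and re then interprets \n as a line feed itself.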
match
>>> import re
>>> re.match(r'^\d{3}\-\d{3,8}$', '010-12345')
<re.Match object; span=(0, 9), match='010-12345'>
>>> re.match(r'^\d{3}\-\d{3,8}$', '010 12345')
>>>
When the match succeeds, a Match object is returned; otherwise None is returned;
split
>>> re.split(r'\s+', 'a b c')
['a', 'b', 'c']
>>> re.split(r'[\s\,\;]+', 'a,b;; c d')
['a', 'b', 'c', 'd']
Splits the string by the pattern and returns a list of the pieces;
group
>>> m = re.match(r'^(\d{3})-(\d{3,8})$', '010-12345')
>>> m
<re.Match object; span=(0, 9), match='010-12345'>
>>> m.group(2)
'12345'
>>> m.group(1)
'010'
>>> m.group(0)
'010-12345'
Substrings are extracted as groups with (); group(0) is the entire matched string, and group(n) is the n-th group;
Greedy matching
Matches as many characters as possible
>>> re.match(r'^(\d+)(0*)$', '102300').groups()
('102300', '')
>>> re.match(r'^(\d+)(0+)$', '102300').groups()
('10230', '0')
Regular expression matching is greedy by default; for non-greedy matching (matching as few characters as possible), add ? after \d+;
>>> re.match(r'^(\d+?)(0*)$', '102300').groups()
('1023', '00')
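Compilation
Steps the re module performs:
- Compile the regular expression, raising an error if it is invalid;
- Match the string with the compiled regular expression;
- Precompilation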
>>> import re
>>> re_telephone = re.compile(r'^(\d{3})-(\d{3,8})$')
>>> re_telephone.match('010-12345').groups()
('010', '12345')
>>> re_telephone.match('010-8086').groups()
('010', '8086')
Matching simple email addresses
def is_valid_email(addr):
    # Accept only letters and dots before @gmail.com or @microsoft.com
    if re.match(r'(^[a-zA-Z\.]+)\@(gmail|microsoft)\.com$', addr):
        return True
    else:
        return False
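Matching email addresses with a display name and extracting the name
def name_of_email(addr):
    # Extract the part before the @
    m = re.match(r'^([a-zA-Z\d\s\<\>]+)\@(voyager|example)\.(org|com)$', addr)
    if not m:
        return None
    # Extract the name inside <> from the prefix; if there is none, use the whole prefix
    m = re.match(r'^\<([a-zA-Z\s]+)\>[\s]+[a-zA-Z\d]+|([a-zA-Z\d]+)$', m.group(1))
    if not m:
        return None
    return m.group(1) if m.group(1) else m.group(2)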