Pillow
PIL (Python Imaging Library) has long been the de facto standard image processing library for Python. It is very powerful, yet its API is simple and easy to use. PIL itself only supports Python up to 2.7. Pillow, a compatible fork of PIL, supports the latest Python 3.x and adds many new features, so we can install and use Pillow directly.
Manipulate images
The most common image scaling operation requires only three or four lines of code:
from PIL import Image

# Open a jpg image file, note the current path:
im = Image.open('test.jpg')
# Get image dimensions:
w, h = im.size
print('Original image size: %sx%s' % (w, h))
# Zoom to 50%:
im.thumbnail((w//2, h//2))
print('Resize image to: %sx%s' % (w//2, h//2))
# Save the scaled image in jpeg format:
im.save('thumbnail.jpg', 'jpeg')

Other operations such as cropping, rotation, filters, text output, and color palettes are all available.
For example, the blur effect is also just a few lines of code:
from PIL import Image, ImageFilter

# Open a jpg image file, note the current path:
im = Image.open('test.jpg')
# Apply the blur filter:
im2 = im.filter(ImageFilter.BLUR)
im2.save('blur.jpg', 'jpeg')
PIL's ImageDraw provides a series of drawing methods that allow us to draw directly. For example, to generate a letter verification code picture:
from PIL import Image, ImageDraw, ImageFont, ImageFilter
import random

# random letters:
def rndChar():
    return chr(random.randint(65, 90))

# random color 1:
def rndColor():
    return (random.randint(64, 255), random.randint(64, 255), random.randint(64, 255))

# random color 2:
def rndColor2():
    return (random.randint(32, 127), random.randint(32, 127), random.randint(32, 127))

# 240 x 60:
width = 60 * 4
height = 60
image = Image.new('RGB', (width, height), (255, 255, 255))
# Create the Font object:
font = ImageFont.truetype('arial.ttf', 36)
# Create the Draw object:
draw = ImageDraw.Draw(image)
# Fill each pixel:
for x in range(width):
    for y in range(height):
        draw.point((x, y), fill=rndColor())
# Output text:
for t in range(4):
    draw.text((60 * t + 10, 10), rndChar(), font=font, fill=rndColor2())
# blurry:
image = image.filter(ImageFilter.BLUR)
image.save('code.jpg', 'jpeg')
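The rndChar() helper above works because the code points 65 through 90 are exactly the uppercase ASCII letters 'A' to 'Z'. The following standalone sketch shows that mapping without needing Pillow. The function name random_code and the optional seeded generator are illustrative assumptions, not part of the original example.

```python
import random
import string

def random_code(length=4, rng=None):
    """Return `length` random uppercase letters, like calling rndChar() repeatedly."""
    rng = rng or random
    # randint(65, 90) includes both ends, so it covers ord('A') through ord('Z').
    return ''.join(chr(rng.randint(65, 90)) for _ in range(length))

print(ord('A'), ord('Z'))                 # 65 90
print(random_code(rng=random.Random(0)))  # four uppercase letters
```

Passing a seeded random.Random makes the output repeatable, which is handy when testing code that draws the letters onto an image.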
requests
Python has a built-in urllib module for accessing network resources. However, it is cumbersome to use and lacks many useful advanced features.
A better solution is to use requests. It is a Python third-party library that is especially handy for dealing with URL resources.
To access a page via GET, it only takes a few lines of code:
>>> import requests
>>> r = requests.get('https://www.douban.com/') # Douban homepage
>>> r.status_code
200
>>> r.text
'<!DOCTYPE HTML>\n<html>\n<head>\n<meta name="description" content="Provide book, movie, music album recommendations, reviews and...'
For URLs with parameters, pass in a dict as the params parameter:
>>> r = requests.get('https://www.douban.com/search', params={'q': 'python', 'cat': '1001'})
>>> r.url # The actual requested URL
'https://www.douban.com/search?q=python&cat=1001'
requests automatically detects the encoding, which can be viewed using the encoding attribute:
>>> r.encoding
'utf-8'
Whether the response is text or binary content, we can get it as a bytes object with the content attribute:
>>> r.content
b'<!DOCTYPE html>\n<html>\n<head>\n<meta http-equiv="Content-Type" content="text/html; charset=utf-8">\n...'
Another convenience of requests is that for certain types of responses, such as JSON, you can get the parsed result directly:
>>> r = requests.get('https://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20weather.forecast%20where%20woeid%20%3D%202151330&format=json')
>>> r.json()
{'query': {'count': 1, 'created': '2017-11-17T07:14:12Z', ...
When we need to pass in HTTP headers, we pass in a dict as the headers parameter:
>>> r = requests.get('https://www.douban.com/', headers={'User-Agent': 'Mozilla/5.0 (iPhone; CPU iPhone OS 11_0 like Mac OS X) AppleWebKit'})
>>> r.text
'<!DOCTYPE html>\n<html>\n<head>\n<meta charset="UTF-8">\n <title>豆瓣(手机版)</title>...'
To send a POST request, just change the get() method to post(), and then pass in the data parameter as the data of the POST request:
>>> r = requests.post('https://accounts.douban.com/login', data={'form_email': '[email protected]', 'form_password': '123456'})
requests uses the application/x-www-form-urlencoded encoding for POST data by default. If you want to pass JSON data, you can directly pass in the json parameter:
params = {'key': 'value'}
r = requests.post(url, json=params) # Internal automatic serialization to JSON
Similarly, uploading files requires a more complex encoding format, but requests simplifies it to the files parameter:
>>> upload_files = {'file': open('report.xls', 'rb')}
>>> r = requests.post(url, files=upload_files)
When reading the file, be sure to open it in 'rb' (binary) mode, so that the length of the bytes read equals the length of the file.
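The reason for 'rb' can be seen with a small standard-library experiment: in text mode Python decodes the file into characters, so the length no longer matches the size on disk. This is only a sketch, and the temporary file and its contents are made up for illustration.

```python
import os
import tempfile

text = '世界 hello'  # 8 characters, 12 bytes in UTF-8

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'sample.txt')
    with open(path, 'w', encoding='utf-8') as f:
        f.write(text)
    file_size = os.path.getsize(path)      # size on disk, in bytes
    with open(path, 'rb') as f:
        n_bytes = len(f.read())            # binary mode: raw bytes
    with open(path, 'r', encoding='utf-8') as f:
        n_chars = len(f.read())            # text mode: decoded characters

print(file_size, n_bytes, n_chars)  # 12 12 8
```

Only the binary read matches the file size, which is the length an upload needs to report.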
By replacing the post() method with put(), delete(), etc., you can request the resource using PUT or DELETE.
Besides making it easy to get the content of a response, requests also makes it simple to get other information about the HTTP response. For example, to get the response headers:
>>> r.headers
{'Content-Type': 'text/html; charset=utf-8', 'Transfer-Encoding': 'chunked', 'Content-Encoding': 'gzip', ...}
>>> r.headers['Content-Type']
'text/html; charset=utf-8'
requests handles cookies specially, so we can easily get a specific cookie without parsing the cookie header ourselves:
>>> r.cookies['ts']
'example_cookie_12345'
To pass cookies in the request, just prepare a dict and pass it in as the cookies parameter:
>>> cs = {'token': '12345', 'status': 'working'}
>>> r = requests.get(url, cookies=cs)
Finally, to specify a timeout, pass in the timeout parameter in seconds:
>>> r = requests.get(url, timeout=2.5) # timeout after 2.5 seconds
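To make the params, data and json arguments used in this section less magical, here is a standard-library sketch of roughly how such dicts are serialized. It does not call requests, and the exact bytes requests sends may differ in detail (for example in JSON whitespace). The variable names and the address user@example.com are made up for illustration.

```python
import json
from urllib.parse import urlencode

# Query string, comparable to requests.get(..., params={...}):
query = urlencode({'q': 'python', 'cat': '1001'})
url = 'https://www.douban.com/search?' + query
print(url)  # https://www.douban.com/search?q=python&cat=1001

# Form body (application/x-www-form-urlencoded), comparable to data={...}:
form_body = urlencode({'form_email': 'user@example.com', 'form_password': '123456'})
print(form_body)  # form_email=user%40example.com&form_password=123456

# JSON body, comparable to json={...}:
json_body = json.dumps({'key': 'value'})
print(json_body)  # {"key": "value"}
```

Both the query string and the form body use the same key=value&key=value encoding. The difference is only where they travel: in the URL or in the request body.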
chardet
String encoding has always been a troublesome problem, especially when dealing with non-standard third-party web pages. Although Python provides the Unicode str type and the bytes type, which can be converted to each other with the encode() and decode() methods, it is not easy to decode() bytes without knowing their encoding.
For bytes of unknown encoding, to convert them to str you first need to "guess" the encoding. This is where the third-party library chardet comes in handy: it detects the encoding and is simple to use.
When we get some bytes, we can detect their encoding. To detect encoding with chardet, only one line of code is required:
>>> import chardet
>>> chardet.detect(b'Hello, world!')
{'encoding': 'ascii', 'confidence': 1.0, 'language': ''}
The detected encoding is ascii. Notice that there is also a confidence field, indicating that the confidence of the detection is 1.0 (i.e. 100%).
Let's try to detect GBK-encoded Chinese:
>>> data = '离离原上草,一岁一枯荣'.encode('gbk')
>>> chardet.detect(data)
{'encoding': 'GB2312', 'confidence': 0.7407407407407407, 'language': 'Chinese'}
The detected encoding is GB2312. GBK is a superset of GB2312, so the guess is consistent with the input.
Detect UTF-8 encoding:
>>> data = '离离原上草,一岁一枯荣'.encode('utf-8')
>>> chardet.detect(data)
{'encoding': 'utf-8', 'confidence': 0.99, 'language': ''}
Detect Japanese text:
>>> data = '最新の主要ニュース'.encode('euc-jp')
>>> chardet.detect(data)
{'encoding': 'EUC-JP', 'confidence': 0.99, 'language': 'Japanese'}
Detecting the encoding with chardet is simple. Once you have the encoding, convert the bytes to str for convenient further processing.
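That last step, turning a detection result into a str, might look like the sketch below. It does not import chardet. Instead it takes a dict shaped like the detect() results shown above. The function name decode_with_guess, the 0.5 confidence threshold and the UTF-8 fallback are hypothetical choices for illustration, not part of chardet.

```python
def decode_with_guess(data, guess, min_confidence=0.5, fallback='utf-8'):
    """Decode `data` using a chardet-style result dict (hypothetical helper).

    Falls back to `fallback` when no encoding was detected or the
    confidence is below `min_confidence`.
    """
    encoding = guess.get('encoding')
    confidence = guess.get('confidence') or 0.0
    if not encoding or confidence < min_confidence:
        encoding = fallback
    # errors='replace' keeps going on undecodable bytes instead of raising.
    return data.decode(encoding, errors='replace')

data = '离离原上草,一岁一枯荣'.encode('gbk')
guess = {'encoding': 'GB2312', 'confidence': 0.7407407407407407, 'language': 'Chinese'}
print(decode_with_guess(data, guess))  # 离离原上草,一岁一枯荣
```

In real code the guess dict would come from chardet.detect(data). A threshold is a judgment call, since a low-confidence guess can still be the right one.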
psutil
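To get system information in Python, use the third-party psutil module. As the name suggests, psutil = process and system utilities. It not only implements system monitoring in a line or two of code, but is also cross-platform, supporting Linux/UNIX/OSX/Windows and more, which makes it an indispensable module for system administrators and operations engineers.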
Get CPU information
Let's first get the CPU information:
>>> import psutil
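>>> psutil.cpu_count() # number of logical CPUs
4
>>> psutil.cpu_count(logical=False) # number of physical CPU cores
2
# 2 means dual-core with hyper-threading; 4 would mean quad-core without hyper-threading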
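For comparison, the standard library can report the logical CPU count on its own. As far as we know it has no direct equivalent of logical=False, which is one reason to reach for psutil. A minimal sketch:

```python
import os

# Logical CPU count from the standard library; None if it cannot be determined.
logical = os.cpu_count()
print('logical CPUs:', logical)
```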
Get the CPU user/system/idle times:
>>> psutil.cpu_times()
scputimes(user=10963.31, nice=0.0, system=5138.67, idle=356102.45)
Next, reproduce the CPU usage display of the top command, refreshing once per second, 10 times in total:
>>> for x in range(10):
...     psutil.cpu_percent(interval=1, percpu=True)
...
[14.0, 4.0, 4.0, 4.0]
[12.0, 3.0, 4.0, 3.0]
[8.0, 4.0, 3.0, 4.0]
[12.0, 3.0, 3.0, 3.0]
[18.8, 5.1, 5.9, 5.0]
[10.9, 5.0, 4.0, 3.0]
[12.0, 5.0, 4.0, 5.0]
[15.0, 5.0, 4.0, 4.0]
[19.0, 5.0, 5.0, 4.0]
[9.0, 3.0, 2.0, 3.0]
Get memory information
Use psutil to get physical memory and swap memory information, respectively:
>>> psutil.virtual_memory()
svmem(total=8589934592, available=2866520064, percent=66.6, used=7201386496, free=216178688, active=3342192640, inactive=2650341376, wired=1208852480)
>>> psutil.swap_memory()
sswap(total=1073741824, used=150732800, free=923009024, percent=14.0, sin=10705981440, sout=40353792)
The returned values are integers in bytes. As you can see, the total memory is 8589934592 = 8 GB, of which 7201386496 = 6.7 GB is used, and the usage is 66.6%.
The swap size is 1073741824 = 1 GB.
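The arithmetic behind these figures can be checked with a few lines of plain Python, using the numbers from the output above. Note that 6.7 GB out of 8 GB is more than 66.6%. psutil's documentation describes percent as (total - available) / total * 100, and the sample values agree with that formula. The helper name to_gb is an illustrative assumption.

```python
def to_gb(n_bytes):
    """Convert a byte count to GB, where 1 GB means 2**30 bytes as in the text."""
    return n_bytes / 2**30

total = 8589934592
available = 2866520064
used = 7201386496
swap_total = 1073741824

print(round(to_gb(total), 1))       # 8.0
print(round(to_gb(used), 1))        # 6.7
print(round(to_gb(swap_total), 1))  # 1.0

# Percentage based on available memory rather than on the used field.
percent = round((total - available) / total * 100, 1)
print(percent)                      # 66.6
```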
Get disk information
psutil can be used to get disk partition, disk usage, and disk IO information:
>>> psutil.disk_partitions() # disk partition information
[sdiskpart(device='/dev/disk1', mountpoint='/', fstype='hfs', opts='rw,local,rootfs,dovolfs,journaled,multilabel')]
>>> psutil.disk_usage('/') # disk usage
sdiskusage(total=998982549504, used=390880133120, free=607840272384, percent=39.1)
>>> psutil.disk_io_counters() # disk IO
sdiskio(read_count=988513, write_count=274457, read_bytes=14856830464, write_bytes=17509420032, read_time=2228966, write_time=1618405)
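As you can see, the disk '/' has a total capacity of 998982549504 = 930 GB, of which 39.1% is used. The file system is HFS. In opts, rw means the disk is readable and writable, and journaled means journaling is supported.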
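For the usage figures alone, the standard library offers shutil.disk_usage, which returns total, used and free in bytes. The sketch below computes a percentage from it and re-checks the 930 GB conversion. The percentage here is a plain used / total ratio, which is an assumption for illustration and may differ slightly from what psutil reports on some systems.

```python
import shutil

usage = shutil.disk_usage('.')  # named tuple: total, used, free (bytes)
percent = round(usage.used / usage.total * 100, 1) if usage.total else 0.0
print('total: %.1f GB' % (usage.total / 2**30))
print('used:  %.1f%%' % percent)

# The conversion quoted in the text, using the sample total above:
sample_total_gb = round(998982549504 / 2**30)
print(sample_total_gb)  # 930
```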
Get network information
psutil can get network interface and network connection information:
>>> psutil.net_io_counters() # bytes/packets sent and received
snetio(bytes_sent=3885744870, bytes_recv=10357676702, packets_sent=10613069, packets_recv=10423357, errin=0, errout=0, dropin=0, dropout=0)
>>> psutil.net_if_addrs() # network interface information
{
  'lo0': [snic(family=<AddressFamily.AF_INET: 2>, address='127.0.0.1', netmask='255.0.0.0'), ...],
  'en1': [snic(family=<AddressFamily.AF_INET: 2>, address='10.0.1.80', netmask='255.255.255.0'), ...],
  'en0': [...],
  'en2': [...],
  'bridge0': [...]
}
>>> psutil.net_if_stats() # network interface status
{
  'lo0': snicstats(isup=True, duplex=<NicDuplex.NIC_DUPLEX_UNKNOWN: 0>, speed=0, mtu=16384),
  'en0': snicstats(isup=True, duplex=<NicDuplex.NIC_DUPLEX_UNKNOWN: 0>, speed=0, mtu=1500),
  'en1': snicstats(...),
  'en2': snicstats(...),
  'bridge0': snicstats(...)
}
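To get information about the current network connections, use net_connections():
>>> psutil.net_connections()
Traceback (most recent call last):
  ...
PermissionError: [Errno 1] Operation not permitted

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  ...
psutil.AccessDenied: psutil.AccessDenied (pid=3847)
You may get an AccessDenied error. The reason is that psutil also has to go through system interfaces to get its information, and getting network connection information requires root privileges. In this case, exit the Python interactive environment and restart it with sudo: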
$ sudo python3
Password: ******
Python 3.6.3 ... on darwin
Type "help", ... for more information.
>>> import psutil
>>> psutil.net_connections()
[
    sconn(fd=83, family=<AddressFamily.AF_INET6: 30>, type=1, laddr=addr(ip='::127.0.0.1', port=62911), raddr=addr(ip='::127.0.0.1', port=3306), status='ESTABLISHED', pid=3725),
    sconn(fd=84, family=<AddressFamily.AF_INET6: 30>, type=1, laddr=addr(ip='::127.0.0.1', port=62905), raddr=addr(ip='::127.0.0.1', port=3306), status='ESTABLISHED', pid=3725),
    sconn(fd=93, family=<AddressFamily.AF_INET6: 30>, type=1, laddr=addr(ip='::', port=8080), raddr=(), status='LISTEN', pid=3725),
    sconn(fd=103, family=<AddressFamily.AF_INET6: 30>, type=1, laddr=addr(ip='::127.0.0.1', port=62918), raddr=addr(ip='::127.0.0.1', port=3306), status='ESTABLISHED', pid=3725),
    sconn(fd=105, family=<AddressFamily.AF_INET6: 30>, type=1, ..., pid=3725),
    sconn(fd=106, family=<AddressFamily.AF_INET6: 30>, type=1, ..., pid=3725),
    sconn(fd=107, family=<AddressFamily.AF_INET6: 30>, type=1, ..., pid=3725),
    ...
    sconn(fd=27, family=<AddressFamily.AF_INET: 2>, type=2, ..., pid=1)
]
Get process information
Detailed information about all processes can be obtained through psutil:
>>> psutil.pids() # all process IDs
[3865, 3864, 3863, 3856, 3855, 3853, 3776, ..., 45, 44, 1, 0]
>>> p = psutil.Process(3776) # get the process with ID 3776, which is actually the current Python interactive environment
>>> p.name() # process name
'python3.6'
>>> p.exe() # path of the process executable
'/Users/michael/anaconda3/bin/python3.6'
>>> p.cwd() # process working directory
'/Users/michael'
>>> p.cmdline() # command line that started the process
['python3']
>>> p.ppid() # parent process ID
3765
>>> p.parent() # parent process
<psutil.Process(pid=3765, name='bash') at 4503144040>
>>> p.children() # list of child processes
[]
>>> p.status() # process status
'running'
>>> p.username() # process username
'michael'
>>> p.create_time() # process creation time
1511052731.120333
>>> p.terminal() # process terminal
'/dev/ttys002'
>>> p.cpu_times() # CPU time used by the process
pcputimes(user=0.081150144, system=0.053269812, children_user=0.0, children_system=0.0)
>>> p.memory_info() # memory used by the process
pmem(rss=8310784, vms=2481725440, pfaults=3207, pageins=18)
>>> p.open_files() # files opened by the process
[]
>>> p.connections() # network connections of the process
[]
>>> p.num_threads() # number of threads in the process
1
>>> p.threads() # information on all threads
[pthread(id=1, user_time=0.090318, system_time=0.062736)]
>>> p.environ() # process environment variables
{'SHELL': '/bin/bash', 'PATH': '/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:...', 'PWD': '/Users/michael', 'LANG': 'zh_CN.UTF-8', ...}
>>> p.terminate() # terminate the process
Terminated: 15 <-- the process terminated itself
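As with network connections, getting information about a process owned by root requires root privileges, so the Python interactive environment or the .py file must be started with sudo.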
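The create_time() value shown above, 1511052731.120333, is a Unix timestamp: seconds since 1970-01-01 UTC. A minimal standard-library sketch for turning it into a readable date follows. UTC is chosen here only to make the result independent of the local time zone.

```python
from datetime import datetime, timezone

created = 1511052731.120333  # sample value from p.create_time() above
dt = datetime.fromtimestamp(created, tz=timezone.utc)
formatted = dt.strftime('%Y-%m-%d %H:%M:%S')
print(formatted, 'UTC')  # 2017-11-19 00:52:11 UTC
```

Omitting the tz argument gives the same instant in local time instead.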
psutil also provides a test() function that mimics the output of the ps command:
$ sudo python3
Password: ******
Python 3.6.3 ... on darwin
Type "help", ... for more information.
>>> import psutil
>>> psutil.test()
USER         PID %MEM      VSZ     RSS TTY START  TIME  COMMAND
root           0 24.0 74270628 2016380 ?   Nov18  40:51 kernel_task
root           1  0.1  2494140    9484 ?   Nov18  01:39 launchd
root          44  0.4  2519872   36404 ?   Nov18  02:02 UserEventAgent
root          45    ?  2474032    1516 ?   Nov18  00:14 syslogd
root          47  0.1  2504768    8912 ?   Nov18  00:03 kextd
root          48  0.1  2505544    4720 ?   Nov18  00:19 fseventsd
_appleeven    52  0.1  2499748    5024 ?   Nov18  00:00 appleeventsd
root          53  0.1  2500592    6132 ?   Nov18  00:02 configd
...