datetime
datetime is Python's standard library for handling dates and times.
Get current date and time
from datetime import datetime
now = datetime.now()  # get the current datetime
print(now)  # 2023-09-13 10:28:48.621343
print(type(now))  # <class 'datetime.datetime'>
- Note
datetime is a module, and the module also contains a class named datetime. With from datetime import datetime, it is the datetime class that is imported.
- If you only write import datetime, you must refer to the class by its full name, datetime.datetime.
Get the specified date and time
dt = datetime(2023, 9, 13, 12, 20)  # create a datetime from a specified date and time
print(dt)
Convert datetime to timestamp
- In computers, time is actually represented as a number. We call the moment 1970-01-01 00:00:00 in the UTC+00:00 time zone the epoch time, recorded as 0 (the timestamp of a time before 1970 is a negative number). The current time is the number of seconds relative to the epoch time, called the timestamp.
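dt = datetime(2023, 9, 13, 12, 20)  # create a datetime from a specified date and time
print(dt.timestamp())  # convert the datetime to a timestamp
# 1694578800.0 (when the local time zone is UTC+8:00; the value depends on the local time zone)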
- Note that a Python timestamp is a float; the integer part represents seconds.
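The epoch idea above can be sketched in a few lines. The tzinfo=timezone.utc argument is an addition that does not appear in the original example; it is assumed here so that the results do not depend on the machine's local time zone.

```python
from datetime import datetime, timezone

# The epoch itself, written as a time-zone-aware datetime in UTC
epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
print(epoch.timestamp())   # 0.0

# One day before the epoch: the timestamp is negative
before = datetime(1969, 12, 31, tzinfo=timezone.utc)
print(before.timestamp())  # -86400.0

# Fractions of a second end up in the fractional part of the float
half = datetime(1970, 1, 1, 0, 0, 1, 500000, tzinfo=timezone.utc)
print(half.timestamp())    # 1.5
```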
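A timestamp can also be converted back to a datetime, either in local time or directly in the UTC standard time zone:
t = 1429417200.0
print(datetime.fromtimestamp(t))  # local time
# 2015-04-19 12:20:00 (when the local time zone is UTC+8:00)
print(datetime.utcfromtimestamp(t))  # UTC time (utcfromtimestamp is deprecated since Python 3.12)
# 2015-04-19 04:20:00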
Convert str to datetime
- This is done with datetime.strptime(), which requires a format string describing the date and time:
cday = datetime.strptime('2015-6-1 18:19:59', '%Y-%m-%d %H:%M:%S')
print(cday)
#2015-06-01 18:19:59
Convert datetime to str
now = datetime.now()
print(now.strftime('%a, %b %d %H:%M'))
#Mon, May 05 16:28
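As a small sketch of how strptime() and strftime() mirror each other, the snippet below parses the string from the example above and formats it back with the same format string. The variable names fmt, parsed and text are illustrative only.

```python
from datetime import datetime

fmt = '%Y-%m-%d %H:%M:%S'

# str -> datetime
parsed = datetime.strptime('2015-6-1 18:19:59', fmt)

# datetime -> str, using the same format string
text = parsed.strftime(fmt)
print(text)  # 2015-06-01 18:19:59

# Parsing the formatted text again gives back an equal datetime
print(datetime.strptime(text, fmt) == parsed)  # True

# isoformat() is a ready-made alternative when ISO 8601 output is enough
print(parsed.isoformat())  # 2015-06-01T18:19:59
```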
datetime addition and subtraction
- Adding to or subtracting from a date and time means moving a datetime forward or backward to get a new datetime. You can use the + and - operators directly, but you need to import the timedelta class:
from datetime import datetime, timedelta
now = datetime.now()
print(now)  # 2023-09-13 10:38:44.709003
print(now + timedelta(hours=10))  # 2023-09-13 20:38:44.709003
print(now - timedelta(days=1))  # 2023-09-12 10:38:44.709003
print(now + timedelta(days=2, hours=12))  # 2023-09-15 22:38:44.709003
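The reverse direction also works: subtracting one datetime from another gives a timedelta. The sketch below uses a fixed start time (an assumption, chosen so the output is reproducible) instead of datetime.now().

```python
from datetime import datetime, timedelta

start = datetime(2023, 9, 13, 10, 30)
end = start + timedelta(days=2, hours=12)
print(end)  # 2023-09-15 22:30:00

# datetime - datetime -> timedelta
delta = end - start
print(delta)                  # 2 days, 12:00:00
print(delta.total_seconds())  # 216000.0
```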
Convert local time to UTC time
from datetime import datetime, timedelta, timezone
tz_utc_8 = timezone(timedelta(hours=8))  # create the time zone UTC+8:00
now = datetime.now()
print(now)
dt = now.replace(tzinfo=tz_utc_8)  # force the time zone to UTC+8:00
print(dt)
dt = datetime(2015, 9, 13, 10, 40, 13, 610986, tzinfo=timezone(timedelta(0, 28800)))
print(dt)
time zone conversion
from datetime import datetime, timedelta, timezone
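# Get the current UTC time and force its time zone to UTC+0:00
# (utcnow() is deprecated since Python 3.12; datetime.now(timezone.utc) returns the same aware value):
utc_dt = datetime.utcnow().replace(tzinfo=timezone.utc)
print(utc_dt)
# astimezone() converts the time zone to Beijing time:
bj_dt = utc_dt.astimezone(timezone(timedelta(hours=8)))
print(bj_dt)
# astimezone() converts the time zone to Tokyo time:
tokyo_dt = utc_dt.astimezone(timezone(timedelta(hours=9)))
print(tokyo_dt)
# astimezone() converts bj_dt to Tokyo time:
tokyo_dt2 = bj_dt.astimezone(timezone(timedelta(hours=9)))
print(tokyo_dt2)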
summary
- A datetime needs time zone information to identify a specific moment; otherwise it can only be treated as local time.
- If you want to store a datetime, the best way is to convert it to a timestamp and store that, because the value of a timestamp is completely independent of the time zone.
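The storage advice can be sketched as follows. The names tz_beijing and tz_tokyo are illustrative, and the fixed offsets UTC+8:00 and UTC+9:00 are the same ones used in the time zone conversion example above.

```python
from datetime import datetime, timedelta, timezone

tz_beijing = timezone(timedelta(hours=8))
tz_tokyo = timezone(timedelta(hours=9))

# An aware datetime in Beijing time, converted to a timestamp for storage
bj_dt = datetime(2015, 4, 19, 12, 20, tzinfo=tz_beijing)
ts = bj_dt.timestamp()
print(ts)  # 1429417200.0

# Later the stored number can be turned back into any time zone
restored_tokyo = datetime.fromtimestamp(ts, tz=tz_tokyo)
print(restored_tokyo)           # 2015-04-19 13:20:00+09:00
print(restored_tokyo == bj_dt)  # True: both denote the same moment
```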
base64
Base64 is an encoding method that represents arbitrary binary data using 64 printable text characters. It is commonly used in URLs, cookies, and web pages to transmit small amounts of binary data.
- The principle of Base64 is very simple. First, prepare an array containing 64 characters:
['A', 'B', 'C', ... 'a', 'b', 'c', ... '0', '1', ... '+', '/']
- Then the binary data is processed in groups of 3 bytes, 3x8=24 bits in total, which are divided into 4 groups of exactly 6 bits each.
- This gives 4 numbers to use as indexes; looking them up in the table gives the corresponding 4 characters, which form the encoded string.
- Base64 therefore encodes 3 bytes of binary data into 4 bytes of text data, increasing the length by 33%. The advantage is that the encoded text data can be displayed directly in the body of emails, web pages, etc.
- What if the length of the binary data is not a multiple of 3, leaving 1 or 2 bytes at the end? Base64 pads the end with \x00 bytes before encoding, then appends 1 or 2 = signs to the end of the encoding to indicate how many bytes were padded. They are removed automatically when decoding.
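The steps above can be sketched by hand and checked against the standard library. The function name b64encode_sketch and the constant ALPHABET are hypothetical names introduced for illustration; in practice base64.b64encode() is what you would call.

```python
import base64

# The 64-character lookup table described above
ALPHABET = ('ABCDEFGHIJKLMNOPQRSTUVWXYZ'
            'abcdefghijklmnopqrstuvwxyz'
            '0123456789+/')

def b64encode_sketch(data: bytes) -> bytes:
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        missing = 3 - len(chunk)          # 0, 1 or 2 bytes short of a full group
        chunk += b'\x00' * missing        # pad the last group with \x00 bytes
        n = int.from_bytes(chunk, 'big')  # the 24 bits as one integer
        # Split into four 6-bit indexes and look each one up in the table
        chars = [ALPHABET[(n >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        if missing:
            # Replace the characters that came only from padding with '='
            chars[-missing:] = '=' * missing
        out.extend(chars)
    return ''.join(out).encode('ascii')

for sample in (b'a', b'ab', b'abc', b'binary\x00string'):
    print(sample, b64encode_sketch(sample), base64.b64encode(sample))
```

For b'a', b'ab' and b'abc' both columns should read b'YQ==', b'YWI=' and b'YWJj', showing two, one and zero padding signs.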
- Python's built-in base64 module can directly encode and decode Base64:
import base64
# b'str' represents bytes
a = base64.b64encode(b'binary\x00string')
print(a)
# b'YmluYXJ5AHN0cmluZw=='
b = base64.b64decode(b'YmluYXJ5AHN0cmluZw==')
print(b)
# b'binary\x00string'
Since the characters + and / may appear in standard Base64 output, the result cannot be used directly as a parameter in a URL. For this reason there is another "url safe" Base64 encoding, which simply replaces the characters + and / with - and _ respectively:
c = base64.b64encode(b'i\xb7\x1d\xfb\xef\xff')
print(c)  # b'abcd++//'
d = base64.urlsafe_b64encode(b'i\xb7\x1d\xfb\xef\xff')
print(d)  # b'abcd--__'
e = base64.urlsafe_b64decode('abcd--__')
print(e)  # b'i\xb7\x1d\xfb\xef\xff'
hashlib
Python's hashlib provides common digest algorithms, such as MD5, SHA1, etc.
What is a digest algorithm?
- A digest algorithm is also known as a hash algorithm. Using a digest function f(), it computes a fixed-length digest (usually represented as a hexadecimal string) from data of any length, with the purpose of discovering whether the original data has been tampered with.
- The reason a digest algorithm can indicate whether the data has been tampered with:
- The digest function is a one-way function: computing f(data) is easy, but deriving data from the digest is very difficult. Moreover, changing even a single bit of the original data results in a completely different digest.
Application scenarios
- Suppose you wrote an article whose content is the string 'how to use python hashlib - by Michael', and published its digest '2d73d4f15c0db7f5ecb321b6a65e5d6d' alongside it. If someone tampers with your article and publishes it as 'how to use python hashlib - by Bob', you can immediately point out that Bob tampered with it, because the digest calculated from 'how to use python hashlib - by Bob' is different from the digest of the original article.
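The scenario above can be sketched as a simple comparison of digests. The helper name md5_hex is hypothetical, and MD5 is used only because the surrounding text uses it.

```python
import hashlib

def md5_hex(text: str) -> str:
    """Return the MD5 digest of a string as a hexadecimal string."""
    return hashlib.md5(text.encode('utf-8')).hexdigest()

original = 'how to use python hashlib - by Michael'
tampered = 'how to use python hashlib - by Bob'

# The digest the author publishes together with the article
published_digest = md5_hex(original)

print(md5_hex(original) == published_digest)  # True: article is unchanged
print(md5_hex(tampered) == published_digest)  # False: article was modified
print(len(published_digest))                  # 32
```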
MD5 is the most common digest algorithm. It is very fast and produces a fixed-size result of 128 bits (16 bytes), usually represented as a 32-character hexadecimal string. For example:
import hashlib
md5 = hashlib.md5()
md5.update('how to use md5 in python hashlib?'.encode('utf-8'))
print(md5.hexdigest())
#d26a53750bc40b38b65a520292f69306
If the amount of data is large, you can call update() several times on successive chunks; the final result is the same:
import hashlib
md5 = hashlib.md5()
md5.update('how to use md5 in '.encode('utf-8'))
md5.update('python hashlib?'.encode('utf-8'))
print(md5.hexdigest())
#d26a53750bc40b38b65a520292f69306
Another common digest algorithm is SHA1. Calling SHA1 is exactly like calling MD5. The result of SHA1 is 160 bits (20 bytes), usually represented as a 40-character hexadecimal string.
import hashlib
sha1 = hashlib.sha1()
sha1.update('how to use sha1 in '.encode('utf-8'))
sha1.update('python hashlib?'.encode('utf-8'))
print(sha1.hexdigest())
#2c76b57293ce30acef38d98f6046927161b46a44
SHA256 and SHA512 are more secure than SHA1, though the more secure an algorithm is, the slower it runs and the longer its digest is.
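To make the digest lengths concrete, the sketch below runs the same input through four algorithms using hashlib.new(), which selects an algorithm by name. The dictionary hex_lengths is an illustrative name.

```python
import hashlib

data = 'how to use sha1 in python hashlib?'.encode('utf-8')

hex_lengths = {}
for name in ('md5', 'sha1', 'sha256', 'sha512'):
    digest = hashlib.new(name, data).hexdigest()
    hex_lengths[name] = len(digest)
    # Each hexadecimal character encodes 4 bits
    print(name, len(digest) * 4, 'bits,', len(digest), 'hex characters')
```

The loop should report 128, 160, 256 and 512 bits respectively.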
hmac
password_md5
Through a hash algorithm, we can verify whether a piece of data is valid by comparing its hash value. For example, to determine whether a user's password is correct, we compare the md5(password) stored in the database with the MD5 computed from the password the user entered. If they match, the password entered by the user is correct.
To prevent hackers from using a rainbow table to infer the original password from the hash value, the hash must not be computed from the original input alone. A salt needs to be added so that the same input produces different hashes, which greatly increases the difficulty of cracking.
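A minimal sketch of salting, following the md5(message + salt) scheme this section describes. The function names hash_password, register and check are hypothetical, and os.urandom(16) as the salt source is an assumption. MD5 is used only to mirror the text; hashlib also offers functions designed for passwords, such as pbkdf2_hmac and scrypt.

```python
import hashlib
import os

def hash_password(password: str, salt: bytes) -> str:
    # md5(password + salt): the same password hashes differently per salt
    return hashlib.md5(password.encode('utf-8') + salt).hexdigest()

def register(password: str):
    salt = os.urandom(16)  # a random salt, stored next to the hash
    return salt, hash_password(password, salt)

def check(password: str, salt: bytes, stored_hash: str) -> bool:
    return hash_password(password, salt) == stored_hash

salt_a, hash_a = register('123456')
salt_b, hash_b = register('123456')

print(hash_a == hash_b)                  # False unless the two random salts collide
print(check('123456', salt_a, hash_a))   # True
print(check('password', salt_a, hash_a)) # False
```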
- If the salt is randomly generated by ourselves, we usually compute MD5 as md5(message + salt). But if we think of the salt as a "password", a salted hash means: when computing the hash of a message, different hashes are produced depending on the password, and to verify the hash value the correct password must also be provided.
- This is exactly the Hmac algorithm: Keyed-Hashing for Message Authentication. It uses a standard algorithm to mix the key into the computation of the hash.
- Unlike our custom salting algorithm, Hmac works with any hash algorithm, whether MD5 or SHA-1. Replacing our own salting algorithm with Hmac makes the program more standardized and more secure.
The hmac module that comes with Python implements the standard Hmac algorithm. Let's take a look at how to use hmac to implement hashing with keys.
import hmac
# original data
message = b'Hello, world!'
# secret key
key = b'secret'
h = hmac.new(key, message, digestmod='MD5')
# if the message is long, h.update(msg) can be called multiple times
print(h.hexdigest())
# 'fa4ee7d173f2d97ee79022d1a7355bcf'
- Note that the key and message passed in must both be of type bytes; a str must first be encoded to bytes.
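As a sketch of verification with hmac, the snippet below computes the same digest in one shot and in chunks, then compares digests with hmac.compare_digest(), which the module provides for comparing digests. The split of the message into two chunks and the key b'other' are illustrative assumptions.

```python
import hmac

key = b'secret'
message = b'Hello, world!'

# Digest computed in one call
one_shot = hmac.new(key, message, digestmod='MD5').hexdigest()

# The same digest computed by feeding the message in chunks
chunked = hmac.new(key, digestmod='MD5')
chunked.update(b'Hello, ')
chunked.update(b'world!')

print(hmac.compare_digest(one_shot, chunked.hexdigest()))  # True

# A different key gives a different digest for the same message
wrong_key = hmac.new(b'other', message, digestmod='MD5').hexdigest()
print(hmac.compare_digest(one_shot, wrong_key))  # False
```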
screaming
For details, see [Python] From Getting Started to Top - Application Scenarios of Network Request Modules urllib and requests (12)
XML
There are two ways to process XML: DOM and SAX.
- DOM reads the entire XML into memory and parses it into a tree, so it uses a lot of memory and parses slowly. The advantage is that you can traverse the nodes of the tree freely.
- SAX works in streaming mode, parsing while reading, so it uses little memory and parses quickly. The disadvantage is that we have to handle the events ourselves.
- Under normal circumstances, SAX is given priority because DOM takes up too much memory.
Using SAX to parse XML in Python is very simple. Usually the events we care about are start_element, end_element and char_data. Prepare these three functions, and then you can parse the XML.
For example, when the SAX parser reads a node:
<a href="/">python</a>
3 events will be generated:
- start_element event, when reading <a href="/">;
- char_data event, when reading python;
- end_element event, when reading </a>.
from xml.parsers.expat import ParserCreate

class DefaultSaxHandler(object):
    def start_element(self, name, attrs):
        print('sax:start_element: %s, attrs: %s' % (name, str(attrs)))

    def end_element(self, name):
        print('sax:end_element: %s' % name)

    def char_data(self, text):
        print('sax:char_data: %s' % text)

xml = r'''<?xml version="1.0"?>
<ol>
    <li><a href="/python">Python</a></li>
    <li><a href="/ruby">Ruby</a></li>
</ol>
'''

handler = DefaultSaxHandler()
parser = ParserCreate()
# start_element event
parser.StartElementHandler = handler.start_element
# end_element event
parser.EndElementHandler = handler.end_element
# char_data event
parser.CharacterDataHandler = handler.char_data
# parse
parser.Parse(xml)
Output (the char_data lines with no visible text are the newlines and indentation between the tags):
sax:start_element: ol, attrs: {}
sax:char_data:
sax:char_data:
sax:start_element: li, attrs: {}
sax:start_element: a, attrs: {'href': '/python'}
sax:char_data: Python
sax:end_element: a
sax:end_element: li
sax:char_data:
sax:char_data:
sax:start_element: li, attrs: {}
sax:start_element: a, attrs: {'href': '/ruby'}
sax:char_data: Ruby
sax:end_element: a
sax:end_element: li
sax:char_data:
sax:end_element: ol
- Note that when reading a large string, CharacterDataHandler may be called multiple times, so you need to save the pieces yourself and merge them inside EndElementHandler.
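The merging advice can be sketched as a handler that buffers text pieces and joins them when the element ends. The class name LinkCollector and its attributes are hypothetical; the XML is the same sample used above.

```python
from xml.parsers.expat import ParserCreate

class LinkCollector:
    """Collects (href, text) pairs for every <a> element."""

    def __init__(self):
        self.links = []
        self._href = None   # href of the <a> currently open, if any
        self._parts = []    # text pieces seen inside that <a>

    def start_element(self, name, attrs):
        if name == 'a':
            self._href = attrs.get('href')
            self._parts = []

    def char_data(self, text):
        # May be called several times for one text node, so only buffer here
        if self._href is not None:
            self._parts.append(text)

    def end_element(self, name):
        if name == 'a' and self._href is not None:
            # Merge the buffered pieces once the element is complete
            self.links.append((self._href, ''.join(self._parts)))
            self._href = None

xml = '''<?xml version="1.0"?>
<ol>
    <li><a href="/python">Python</a></li>
    <li><a href="/ruby">Ruby</a></li>
</ol>
'''

handler = LinkCollector()
parser = ParserCreate()
parser.StartElementHandler = handler.start_element
parser.EndElementHandler = handler.end_element
parser.CharacterDataHandler = handler.char_data
parser.Parse(xml, True)  # True marks this as the final piece of input

print(handler.links)  # [('/python', 'Python'), ('/ruby', 'Ruby')]
```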
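Besides parsing XML, how do you generate XML?
- In 99% of cases the XML that needs to be generated has a very simple structure, so the simplest and most effective way to generate XML is to concatenate strings:
from xml.sax.saxutils import escape

def build_xml():
    L = []
    L.append(r'<?xml version="1.0"?>')
    L.append(r'<root>')
    # escape() stands in for the unspecified encode() helper: it escapes &, < and >
    L.append(escape('some & data'))
    L.append(r'</root>')
    return ''.join(L)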
HTMLParser
If we want to write a search engine, the first step is to use a crawler to crawl the page of the target website. The second step is to parse the HTML page to see whether the content is news, pictures or videos.
- Assuming that the first step has been completed, how should the second step parse the HTML?
HTML resembles XML, but its syntax is not as strict as XML's, so you cannot use a standard DOM or SAX parser to parse HTML.
Python provides HTMLParser, a very convenient way to parse HTML with just a few lines of code:
from html.parser import HTMLParser
from html.entities import name2codepoint

class MyHTMLParser(HTMLParser):

    def handle_starttag(self, tag, attrs):
        print('<%s>' % tag)

    def handle_endtag(self, tag):
        print('</%s>' % tag)

    def handle_startendtag(self, tag, attrs):
        print('<%s/>' % tag)

    def handle_data(self, data):
        print(data)

    def handle_comment(self, data):
        print('<!--', data, '-->')

    # handle_entityref and handle_charref are only called when the parser
    # is created with convert_charrefs=False (the default is True)
    def handle_entityref(self, name):
        print('&%s;' % name)

    def handle_charref(self, name):
        print('&#%s;' % name)

parser = MyHTMLParser()
parser.feed('''<html>
<head></head>
<body>
<!-- test html parser -->
    <p>Some <a href=\"#\">html</a> HTML&nbsp;tutorial...<br>END</p>
</body></html>''')
- The feed() method can be called multiple times; that is, the entire HTML string does not have to be passed in at once, but can be fed in part by part.
- There are two kinds of special characters: one is written with an English name, such as &nbsp;, and the other with a number, such as &#1234;. Both kinds can be parsed by the parser.
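Feeding in parts can be sketched as below, with the input deliberately split in the middle of a tag. The class name LinkParser and the sample HTML are illustrative assumptions; the parser buffers incomplete input until more data arrives or close() is called.

```python
from html.parser import HTMLParser

class LinkParser(HTMLParser):
    """Collects the href of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            # attrs is a list of (name, value) pairs
            for key, value in attrs:
                if key == 'href':
                    self.hrefs.append(value)

parser = LinkParser()
# The first chunk ends in the middle of the <a ...> tag
parser.feed('<p>Some <a hr')
parser.feed('ef="/python">html</a> and <a href="/ruby">more</a></p>')
parser.close()

print(parser.hrefs)  # ['/python', '/ruby']
```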
random
The Python random module is mainly used to generate random numbers. It implements pseudo-random number generators for various distributions.
Common methods
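random()        generate a random float in [0.0, 1.0)
seed(seed)      initialize the generator with the given random seed
randint(a, b)   generate a random integer in [a, b]
uniform(a, b)   generate a random float in [a, b]
choice(seq)     randomly select one element from the sequence seq
shuffle(seq)    shuffle the elements of seq in place (returns None)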
random.random()
import random
print(random.random())
#0.4784904215869241
random.seed(seed)
- Initializes the generator with the given random seed.
- The computer uses a deterministic algorithm to calculate a sequence of random numbers. Random numbers generated by computers are not truly random, but they have statistical properties similar to random numbers, such as uniformity and independence.
- The computer generates the random number sequence from the seed. If the seed is the same, the same sequence is generated every time; if the seeds are different, the sequences generated are different.
import random
random.seed(10)
a = random.randint(0, 100)
print(a)
a = random.randint(0, 100)
print(a)
a = random.randint(0, 100)
print(a)
# 73
# 4
# 54
random.seed(10)
a = random.randint(0, 100)
print(a)
a = random.randint(0, 100)
print(a)
a = random.randint(0, 100)
print(a)
# 73
# 4
# 54
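- Result
The first random.seed(10) sets the seed to 10:
  the 1st random number generated is 73
  the 2nd random number generated is 4
  the 3rd random number generated is 54
The second random.seed(10) sets the seed to 10 again:
  the 1st random number generated is 73
  the 2nd random number generated is 4
  the 3rd random number generated is 54
As you can see, when the seed is the same, the sequence of random numbers generated is the same.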
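The same behaviour can be sketched with separate generator objects, which avoids reseeding the module-wide generator. The helper name first_three is hypothetical; random.Random(seed) creates an independent generator.

```python
import random

def first_three(seed):
    # An independent generator with its own seed
    rng = random.Random(seed)
    return [rng.randint(0, 100) for _ in range(3)]

# Same seed, same sequence
print(first_three(10) == first_three(10))  # True

# A different seed is expected to give a different sequence
print(first_three(10), first_three(11))
```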
random.randint(a, b)
- Generates a random integer in [a, b]. For example:
a = random.randint(0, 2)
print(a)
a = random.randint(0, 2)
print(a)
a = random.randint(0, 2)
print(a)
# example output:
# 1
# 2
# 0
random.uniform(a, b)
- Generates a random float in [a, b]. For example:
import random
print(random.uniform(0, 2))  # 0.20000054219225438
print(random.uniform(0, 2))  # 1.4472780206791538
print(random.uniform(0, 2))  # 0.5927807855738692
random.choice(seq)
- Randomly selects one element from the sequence seq. For example:
import random
seq = [1, 2, 3, 4]
print(random.choice(seq))  # 3
print(random.choice(seq))  # 1
random.shuffle(seq)
- Randomly rearranges the elements of the sequence seq in place. The function returns None, so print seq itself to see the shuffled order:
import random
seq = [1, 2, 3, 4]
random.shuffle(seq)
print(seq)  # for example [1, 3, 2, 4]
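A short sketch of the in-place behaviour, together with random.sample() as one way to get a shuffled copy while leaving the original list untouched. The variable names are illustrative.

```python
import random

seq = [1, 2, 3, 4]
result = random.shuffle(seq)   # shuffles seq itself
print(result)                  # None
print(sorted(seq))             # [1, 2, 3, 4]: same elements, new order in seq

# random.sample() returns a new list and does not modify its argument
original = [1, 2, 3, 4]
shuffled_copy = random.sample(original, k=len(original))
print(original)                # [1, 2, 3, 4]
print(shuffled_copy)           # some permutation of the four elements
```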
summary
- Using HTMLParser, you can parse the text, images, etc. in a web page.