Encryption logic commonly used in crawlers (Python implementation)

Table of contents

1. MD5

2. URL Encode and Base64

1. URLEncode

2. Base64

3. Symmetric encryption

1. AES encryption

2. DES encryption

4. Asymmetric encryption

1. RSA


1. MD5

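from hashlib import md5
# MD5 is a hash (digest) algorithm, not encryption, so there is no decryption logic.

# Hash algorithms are irreversible.

salt = b"suibianjiashenmesalt"

# Create the hash object; the salt is fed in first
obj = md5(salt)

# Prepare the plaintext
message = 'DK_COOl'
obj.update(message.encode('utf-8'))    # the string must be encoded to bytes

# Get the hex digest
ct_x = obj.hexdigest()
print(ct_x)         # 02350223f1dfe2ed625329c51c9cd26f, salt -> 8878d4fd97a85c434cc8ffeb70b658b9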

Note: Passing a salt when creating the hash object makes the digest much harder to crack by looking it up in a database of precomputed hashes.
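
To make the effect of the salt concrete, here is a minimal sketch. The salt and message values are hypothetical, chosen only for illustration. It shows that passing the salt to the constructor is the same as hashing salt + message, and that the salted digest differs from the unsalted one.

```python
from hashlib import md5

salt = b"example-salt"              # hypothetical salt value
message = "hello".encode("utf-8")   # hypothetical plaintext

# Salt passed to the constructor, message added with update()
h1 = md5(salt)
h1.update(message)

# Equivalent: hash the concatenation in a single call
h2 = md5(salt + message)

same_as_concat = h1.hexdigest() == h2.hexdigest()
same_as_unsalted = h1.hexdigest() == md5(message).hexdigest()
print(same_as_concat)    # True
print(same_as_unsalted)  # False: the salt changes the digest
```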

MD5 can also be used to verify the integrity of a file.
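
A possible way to do such a file check is sketched below. The helper name file_md5 and the chunk size are assumptions for illustration, not part of the original article. The file is read in chunks so that large downloads do not have to fit in memory.

```python
import hashlib
import os
import tempfile

def file_md5(path, chunk_size=8192):
    """Return the hex MD5 digest of a file, reading it chunk by chunk."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Demo on a throwaway temporary file containing b"abc"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abc")
    demo_path = tmp.name

checksum = file_md5(demo_path)
os.remove(demo_path)
print(checksum)  # 900150983cd24fb0d6963f7d28e17f72 (the well-known MD5 of b"abc")
```

Comparing this value with the checksum published by the file's provider tells you whether the file was altered or corrupted in transit.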

Extension: sha256

Whether it is sha1, sha256, or md5, they are all digest algorithms that compute a hash value; they differ only in how the hash is computed and how long the digest is. All of them are hashes, not encryption, and because hashing is irreversible there is no decryption logic.
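
As a small illustration, all three algorithms are available through the same hashlib interface; only the digest length changes. The sample input is arbitrary.

```python
import hashlib

data = b"abc"  # arbitrary sample input

sizes = {}
for name in ("md5", "sha1", "sha256"):
    h = hashlib.new(name, data)
    sizes[name] = h.digest_size * 8   # digest length in bits
    print(name, sizes[name], "bits", h.hexdigest())

sha256_hex = hashlib.sha256(data).hexdigest()
```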


2. URL Encode and Base64

1. URLEncode

When we visit a website, we often see a URL like this:

https://sp1.baidu.com/5bU_dTmfKgQFm2e88IuM_a/union.gif?q=execjs%2E%5Fexceptions%2EProcessExitedWithNonZeroStatus%3A+%281%2C+%27%27%2C+%27%5Bstdin%5D%3A1%5Cn%28function%28program%2C+execJS%29+%7B+execJS%28program%29+%7D%29%28function%28%29+%7B+function%28t%29+%7B%5Cn&rsv_ct=2&rsv_cst=1


In the browser's address bar such a URL is shown as readable text, including Chinese characters. But once it is copied, or viewed in a packet capture tool, it is full of % sequences. So what are these % sequences? Is this encryption too? It is not. When we visit a URL, the browser automatically performs a urlencode operation on the URL we request. This encoding rule is called percent-encoding, a set of encoding rules designed specifically for URLs (uniform resource locators).
The rule itself is very simple: the parameter part of the URL is converted to bytes, each byte is written as a two-digit hexadecimal number, and a % is placed in front of it.

It looks complicated, but Python can do it in one step:
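
The rule can be sketched by hand as follows. The helper name percent_encode is hypothetical, and unlike a real encoder it escapes every byte, whereas urllib leaves unreserved ASCII characters such as letters and digits untouched.

```python
from urllib.parse import quote

def percent_encode(text):
    """Write every UTF-8 byte of text as %XX (two uppercase hex digits)."""
    return "".join(f"%{byte:02X}" for byte in text.encode("utf-8"))

manual = percent_encode("我饿了")
print(manual)                     # %E6%88%91%E9%A5%BF%E4%BA%86
print(manual == quote("我饿了"))   # True: same result as the standard library
```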

from urllib.parse import urlencode, unquote

# Encoding URL parameters
base_url = 'https://www.baidu.com/s?'

param_dic = {
    "wd": "我饿了"
}

# wd=%E6%88%91%E9%A5%BF%E4%BA%86
result = urlencode(param_dic)
print(result)
url = base_url + result
print(url)

# Decoding
url_1 = 'https://www.baidu.com/s?wd=%E6%88%91%E9%A5%BF%E4%BA%86'
print(unquote(url_1))   # shows the special characters and Chinese text in the URL
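
When only a single value needs encoding, or a captured URL has to be taken apart again, other functions in urllib.parse may be handy. The sketch below uses an arbitrary sample string and a hypothetical extra parameter rsv_ct. Note that quote_plus (which urlencode uses by default) writes a space as +, which is why the Baidu URL shown earlier contains + signs.

```python
from urllib.parse import quote, quote_plus, urlsplit, parse_qs

encoded = quote("a b&c")            # space becomes %20
encoded_plus = quote_plus("a b&c")  # space becomes +
print(encoded)       # a%20b%26c
print(encoded_plus)  # a+b%26c

# Split a URL and decode its query string into a dict of lists
url = "https://www.baidu.com/s?wd=%E6%88%91%E9%A5%BF%E4%BA%86&rsv_ct=2"
params = parse_qs(urlsplit(url).query)
print(params)        # {'wd': ['我饿了'], 'rsv_ct': ['2']}
```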

2. Base64

Base64 is easy to understand. Encrypted content is usually bytes, and ciphertext exists to be transmitted (nobody encrypts data that is never sent anywhere). However, transmitting raw bytes over the HTTP protocol is troublesome, while a string is much easier to handle. This is where Base64 comes in: 26 uppercase letters + 26 lowercase letters + 10 digits + 2 special symbols (+ and /) form a 64-character alphabet, and the bytes are re-expressed with it, much like writing a number in base 64. That is Base64.

import base64

bs = "我要吃饭，我饿fadksljfkljaskl呵啊哒。吃了么呵啊哒了".encode('utf-8')

# Encoding
# base64 works on bytes
print(bs)
# Encode the bytes following the base64 rules, then turn the result into a string
#           base64 bytes   # base64 string
s = base64.b64encode(bs).decode("utf-8")
print(s)


# Decoding
s = '5oiR6KaB5ZCD6aWt77yM5oiR6aW/ZmFka3NsamZrbGphc2ts5ZG15ZWK5ZOS44CC5ZCD5LqG5LmI5ZG15ZWK5ZOS5LqG'
bs = base64.b64decode(s)

source_s = bs.decode('utf-8')
print(source_s) # 我要吃饭，我饿fadksljfkljaskl呵啊哒。吃了么呵啊哒了
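
To show where the 64-character alphabet comes in, here is a hand-written sketch of the encoding step: every 3 bytes (24 bits) are split into four 6-bit indexes into the alphabet, and = marks missing bytes at the end. The function name b64_encode_manual is hypothetical; in practice base64.b64encode should be used.

```python
import base64

ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)

def b64_encode_manual(data):
    """Base64-encode bytes by hand: 3 bytes -> 24 bits -> four 6-bit indexes."""
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        missing = 3 - len(chunk)
        # Pack the chunk (zero-padded to 3 bytes) into one 24-bit integer
        n = int.from_bytes(chunk + b"\x00" * missing, "big")
        # Split it into four 6-bit groups, most significant first
        chars = [ALPHABET[(n >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # Groups that came only from the zero padding are written as '='
        if missing:
            chars[-missing:] = "=" * missing
        out.extend(chars)
    return "".join(out)

print(b64_encode_manual(b"Man"))  # TWFu
print(b64_encode_manual(b"Ma"))   # TWE=
print(b64_encode_manual("我饿了".encode("utf-8")) ==
      base64.b64encode("我饿了".encode("utf-8")).decode())  # True
```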

3. Symmetric encryption

Symmetric encryption means that the same secret key is used for both encryption and decryption. It is like mailing you a box with a lock on it: I gave you a key in advance and kept one myself, so I can lock the box before posting it and you can open it with the same key.


Condition: encryption and decryption use the same secret key, so both parties must hold the key.


Common symmetric encryption algorithms: AES, DES, 3DES. We discuss AES and DES here.

1. AES encryption

import base64

from Crypto.Cipher import AES

s = '这是我要加密的明文'
"""
key -> 16, 24, 32
It must be 16, 24 or 32 bytes long (respectively for *AES-128*,
        *AES-192* or *AES-256*).
"""
key = b'dkdkcooldkdkcool'

aes = AES.new(key, mode=AES.MODE_CBC, IV=b'0102030405060708')

# Without padding: ValueError: Data must be padded to 16 byte boundary in CBC mode
# So the plaintext has to be padded
# A common, general padding scheme: (number of missing bytes) * chr(number of missing bytes)
bs = s.encode('utf-8')

que = 16 - len(bs) % 16     # number of missing bytes
bs += (que * chr(que)).encode('utf-8')

# Encrypt
result = aes.encrypt(bs)    # the content to encrypt must be bytes
# Optionally encode the ciphertext as base64
# jL5CgtiUFlRJ1Oi/IGXutF9WLfAeRynlUOexzETGRT8=
b64 = base64.b64encode(result).decode()
print(b64)

# An AES object that has been used to encrypt cannot be reused to decrypt; create a new one
miwen = "jL5CgtiUFlRJ1Oi/IGXutF9WLfAeRynlUOexzETGRT8="
aes1 = AES.new(key, mode=AES.MODE_CBC, IV=b'0102030405060708')

# Undo the base64 encoding, then decrypt
miwen = base64.b64decode(miwen)
result = aes1.decrypt(miwen)
# The last byte holds the padding length; strip that many bytes before decoding
print(result[:-result[-1]].decode('utf-8'))
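
The padding rule used in the AES example above (N missing bytes, each with the value N) is commonly known as PKCS#7 padding. A standalone sketch of padding and unpadding, using only the standard library, might look like this; the function names pkcs7_pad and pkcs7_unpad are hypothetical.

```python
def pkcs7_pad(data, block_size=16):
    """Append N bytes of value N so that len(data) is a multiple of block_size."""
    pad_len = block_size - len(data) % block_size   # 1..block_size, never 0
    return data + bytes([pad_len]) * pad_len

def pkcs7_unpad(padded, block_size=16):
    """Remove PKCS#7 padding after checking that it is well formed."""
    if not padded or len(padded) % block_size:
        raise ValueError("input is not a whole number of blocks")
    pad_len = padded[-1]
    if not 1 <= pad_len <= block_size or padded[-pad_len:] != bytes([pad_len]) * pad_len:
        raise ValueError("invalid padding")
    return padded[:-pad_len]

plain = "这是我要加密的明文".encode("utf-8")    # 27 bytes
padded = pkcs7_pad(plain)
print(len(plain), len(padded), padded[-5:])    # 27 32 b'\x05\x05\x05\x05\x05'
print(pkcs7_unpad(padded).decode("utf-8"))     # 这是我要加密的明文
```

For DES the same functions would be called with block_size=8. The same library that provides Crypto.Cipher also ships pad and unpad in Crypto.Util.Padding, which can be used instead of hand-written helpers.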

2. DES encryption

The implementation is very similar to AES encryption; here the key, the IV and the block size are all 8 bytes.

from Crypto.Cipher import DES

s = "我爱热巴"
key = b'dkdkcool'

des = DES.new(key, mode=DES.MODE_CBC, IV=b'01020304')

# Encrypt
bs = s.encode("utf-8")
que = 8 - len(bs) % 8  # number of missing bytes
bs += (que * chr(que)).encode('utf-8')
result = des.encrypt(bs)
print(result)

# Decrypt
miwen = b'\xc2[\xa5/u,\t \x95\xe0{Z\x8e\xc4?\xb7'
des1 = DES.new(key, mode=DES.MODE_CBC, IV=b'01020304')
result = des1.decrypt(miwen)
# Strip the padding (the last byte holds its length) before decoding
print(result[:-result[-1]].decode('utf-8'))

4. Asymmetric encryption

Asymmetric encryption: encryption and decryption use different keys. Two keys are needed: a public key and a private key. The public key is sent to the client. The sender encrypts the data with the public key and sends it to the receiver, which decrypts it with the private key. Since the private key is stored only on the receiving end, the data cannot be decrypted even if it is intercepted.
Common asymmetric algorithms include RSA and DSA (DSA is used for signatures only); we introduce just one here. RSA is also the most common encryption scheme.
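
The public/private relationship can be illustrated with textbook RSA on tiny numbers. This is a toy sketch only: the primes, exponent and message below are classroom values chosen for illustration, there is no padding, and it is not secure in any way. Real code should use a library, as in the next section.

```python
# Toy key generation with tiny primes (illustration only, not secure)
p, q = 61, 53
n = p * q                   # modulus, shared by both keys
phi = (p - 1) * (q - 1)     # 3120
e = 17                      # public exponent, coprime with phi
d = pow(e, -1, phi)         # private exponent, modular inverse (Python 3.8+)

m = 65                      # the "message" as an integer smaller than n
c = pow(m, e, n)            # encrypt with the public key (e, n)
recovered = pow(c, d, n)    # decrypt with the private key (d, n)
print(c, recovered)         # 2790 65
```

Anyone who knows (e, n) can produce c, but recovering m from c requires d, which never leaves the receiving end.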

1. RSA

# ***************************************************************
# 1. Generate the private and public keys
import base64

from Crypto.PublicKey import RSA  # key management

rsa_key = RSA.generate(2048)

private_key = rsa_key.exportKey()
public_key = rsa_key.publickey().exportKey()

print(public_key)
with open("rsa_public_pem.txt", mode="wb") as f:
    f.write(public_key)
with open("rsa_private_pem.txt", mode="wb") as f:
    f.write(private_key)


# ***************************************************************
# 2. Encrypt

from Crypto.Cipher import PKCS1_v1_5  # encryption scheme
from Crypto.PublicKey import RSA
import base64

# 2.1 Prepare the plaintext
message = '今天晚上没吃饭'

# 2.2 Read the public key
with open('rsa_public_pem.txt', mode='r', encoding='utf-8') as f:
    # 2.3 Convert the public key string into an rsa_key object
    rsa_key = RSA.import_key(f.read())
# 2.4 Create the cipher object
rsa = PKCS1_v1_5.new(rsa_key)
# 2.5 Encrypt
miwen = rsa.encrypt(message.encode('utf-8'))
# 2.6 base64-encode the ciphertext
miwen = base64.b64encode(miwen).decode('utf-8')
print(miwen)

# ***************************************************************
# 3. Decrypt
from Crypto.Cipher import PKCS1_v1_5
import base64
from Crypto.PublicKey import RSA

# 3.1 Prepare the ciphertext (it must have been produced with the matching public key)
ctx = 'UqkvnZf8Gd5F1dGxi/9+Nq7lBe1OKk1Kpbn0so0UIZivY3zFqH/UOEjau0/to4gOhtOZ0SNJ0CiKD3kIHqlNE07bY/eT15oqNj8qwMLZfGuUYcqnSDCqUi4qad1sZUlg9qrXHT2Ypr2VhZM2RT+6Fb4mUWb1M7RlTLfJUGkId1ixP7xZFeY7qf10eElrckW5dxX5EV6BZ2xRFxKizJV0DrgsPH44Ixn1cipokqFJGVBR2PnwY0Dwoy+Fcr/SjQe0tIxmRKVr2cU7eMjrsZFGBAYHEWujqfwNhWBgeoOmC9nJJS+GaIYKuCECXoQV1nRd9o/2JM2DvxzQi0zlVCYbBQ=='

# 3.2 Read the private key
with open('rsa_private_pem.txt', mode='r', encoding='utf-8') as f:
    # 3.3 Build the key object
    rsa_key = RSA.import_key(f.read())
# 3.4 Create the decryption object
rsa = PKCS1_v1_5.new(rsa_key)
# 3.5 Undo the base64 encoding and decrypt (None is returned if decryption fails)
mingwen_bytes = rsa.decrypt(base64.b64decode(ctx), None)
# 3.6 Decode as utf-8
mingwen = mingwen_bytes.decode('utf-8')
print(mingwen)

Origin blog.csdn.net/m0_57126939/article/details/128137563