1. hashlib: file consistency checks
Why check file consistency?
To make sure you get the correct version of a file, not one injected with viruses or trojans. For example, software downloaded from the Internet is sometimes repackaged with adware or malware. If we do not verify that the file matches what the original publisher released, it can cause real damage.
The principle
A consistency check verifies that two copies of a file are identical. We cannot simply compare them side by side like short text files, because the files are often very large. The standard approach is to run the file through a hash algorithm to produce a fixed-size value, then compare that value with the one provided by the publisher: if the values match, the files are considered identical.
MD5 and SHA1 are the most widely used hash algorithms.
Example:
First create two files by hand, file1 and file2, each containing the text 123.
Calculate the MD5 digest of file1:

```python
import hashlib

md5obj = hashlib.md5()
with open('file1', 'rb') as f:
    content = f.read()
md5obj.update(content)
print(md5obj.hexdigest())
```
Output after execution:
To calculate the digest of file2, do we copy the code above again? Too clumsy. And what if there are many files?
Define a method:
```python
import hashlib

def check_md5(filename):
    md5obj = hashlib.md5()
    with open(filename, 'rb') as f:
        content = f.read()
    md5obj.update(content)
    return md5obj.hexdigest()

ret1 = check_md5('file1')
ret2 = check_md5('file2')
print(ret1)
print(ret2)
```
Execution output:
In this way, you can know whether the two files are consistent.
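In practice the check from the introduction reduces to comparing the digest you compute against the one the publisher lists. A minimal sketch: the file name `example.bin` and the "published" value are invented for illustration (the value is simply the MD5 of the bytes `123`):

```python
import hashlib

published = '202cb962ac59075b964b07152d234b70'  # MD5 of '123', as if copied from the download page

with open('example.bin', 'wb') as f:            # create a stand-in "downloaded" file
    f.write(b'123')

md5obj = hashlib.md5()
with open('example.bin', 'rb') as f:
    md5obj.update(f.read())

print(md5obj.hexdigest() == published)          # True means the file matches the published checksum
```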
However, the method above has a flaw: it reads the whole file into memory at once, so how can memory cope when the file reaches the GB level?
So what do we do? Take a look at the following small example:
```python
import hashlib

md5obj = hashlib.md5()
md5obj.update(b'john')  # b'...' is the bytes type; it cannot contain non-ASCII characters
print(md5obj.hexdigest())
```
Feeding the string in pieces:

```python
import hashlib

md5obj = hashlib.md5()  # create an MD5 object
md5obj.update(b'john')  # feed the string in pieces
md5obj.update(b'alen')
print(md5obj.hexdigest())  # same digest as hashing b'johnalen' in one call
```
Execution output:
Conclusion:
Hashing a string in one go and hashing it in several pieces produce the same digest.
So for a large file we can read it in segments and update the MD5 piece by piece.
Download the movie 海上钢琴师 (The Legend of 1900), file size 1.58 GB.
The film tells the life of a legendary piano genius; Douban score 9.2.
Calculate the md5 value of a movie
```python
import hashlib

def check(filename):
    md5obj = hashlib.md5()
    with open(filename, 'rb') as f:
        while True:
            content = f.read(1048576)  # read 1048576 bytes (1 MB) at a time
            if content:
                md5obj.update(content)
            else:
                break  # stop the loop when the content is empty
    return md5obj.hexdigest()

ret1 = check('E:\迅雷下载\[迅雷下载www.2tu.cc]海上钢琴师.BD1280高清中英双字.rmvb')
print(ret1)
```
It took about 9 seconds to run. Output:
30c7f078203d761d3f13bec6f8fd3088
To summarize (serialization recap):
Serialization converts a data type into a string (bytes), because only bytes can travel over the network or be stored in files.
json
Common to all languages, but serializes only a limited set of types: dict, list, str, number, tuple.
When dump writes to the same file multiple times, load cannot read the data back.
pickle
Python-only, but can serialize most data types.
When you load, the class corresponding to the pickled data must exist in memory.
dumps serializes to a string
loads deserializes from a string
dump serializes directly to a file
load deserializes directly from a file
shelve
f = shelve.open() opens the file like a persistent dictionary
json and pickle must be mastered.
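The pickle points above can be seen in a short sketch: unlike json, repeated dump calls into one file can be read back with repeated load calls (the file name `data.pkl` is made up for illustration):

```python
import pickle

# dump two objects into the same file, one after another
with open('data.pkl', 'wb') as f:
    pickle.dump({'a': 1}, f)
    pickle.dump([1, 2, 3], f)

# load them back in the same order
with open('data.pkl', 'rb') as f:
    first = pickle.load(f)   # {'a': 1}
    second = pickle.load(f)  # [1, 2, 3]
print(first, second)
```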
2. The configparser module
This module handles configuration files in a format similar to Windows .ini files: a file can contain one or more sections, and each section can hold multiple options (key = value).
Creating a file
Let's look at a configuration format common to a lot of software:
A [section] header is called a section (node), and the key = value pairs inside it are called options (items).
What if you want to generate such a document with Python?
```python
import configparser

config = configparser.ConfigParser()            # create a ConfigParser object
config["DEFAULT"] = {'ServerAliveInterval': '45',   # default options
                     'Compression': 'yes',
                     'CompressionLevel': '9',
                     'ForwardX11': 'yes'}
config['bitbucket.org'] = {'User': 'hg'}        # add a section bitbucket.org
config['topsecret.server.com'] = {'Host Port': '50022',
                                  'ForwardX11': 'no'}
with open('example.ini', 'w') as configfile:    # write the config file example.ini
    config.write(configfile)
```
Run the program and view the contents of example.ini:
```ini
[DEFAULT]
serveraliveinterval = 45
forwardx11 = yes
compression = yes
compressionlevel = 9

[bitbucket.org]
user = hg

[topsecret.server.com]
forwardx11 = no
host port = 50022
```
As you can see, the option names in each section have all become lowercase.
This is because configparser applies lower() to option names when storing them.
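ConfigParser lowercases option names through a hook called optionxform; overriding it with str keeps the original case. A minimal sketch (the section and option names here are illustrative):

```python
import configparser

config = configparser.ConfigParser()
config.optionxform = str                       # keep option names exactly as written
config['topsecret.server.com'] = {'HostPort': '50022'}
print(config.options('topsecret.server.com'))  # ['HostPort'] -- not lowercased
```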
Reading a config file
```python
import configparser

config = configparser.ConfigParser()
config.read('example.ini')
### the lines above are the fixed boilerplate ###
print(config.sections())  # list all sections; DEFAULT is not shown; returns a list
```
Output:
In the snippets below, I will not repeat the boilerplate part.
```python
print('bitbucket.org' in config)  # check whether a section exists in the file
```
Output: True
```python
print(config['bitbucket.org']['user'])  # get the value of an option under a section
```
Output: hg
```python
print(config['bitbucket.org'])  # returns an iterable section object
```
Output: &lt;Section: bitbucket.org&gt;
```python
# loop over an iterable section object
for key in config['bitbucket.org']:  # note: keys from DEFAULT are also yielded
    print(key)
```
Output:
user
serveraliveinterval
forwardx11
compression
compressionlevel
```python
print(config.items('bitbucket.org'))  # all key-value pairs under 'bitbucket.org'
```
Output:
[('serveraliveinterval', '45'), ('forwardx11', 'yes'), ('compression', 'yes'), ('compressionlevel', '9'), ('user', 'hg')]
```python
print(config.get('bitbucket.org', 'compression'))  # get() returns the value for a key in a section
```
Output: yes
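get() also accepts a fallback keyword for options that may be missing, and there are typed variants such as getint(). A small sketch, rebuilt in memory so it stands alone (the option names mirror the example.ini above):

```python
import configparser

config = configparser.ConfigParser()
config['bitbucket.org'] = {'user': 'hg', 'compressionlevel': '9'}

print(config.get('bitbucket.org', 'user'))                    # hg
print(config.get('bitbucket.org', 'timeout', fallback='30'))  # 30 -- option missing, fallback used
print(config.getint('bitbucket.org', 'compressionlevel'))     # 9 as an int, not a string
```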
Add, delete, and modify operations
Adding a section
```python
config.add_section('yuan')  # add a section (add_section returns None, so there is no point printing it)
```
Note: this is not written to disk immediately! You must run the following code:
```python
config.write(open('example.ini', 'w'))  # write back to the file
```
open('example.ini', 'w') truncates the file; config.write then writes the new contents.
View the file contents again:
```ini
[DEFAULT]
serveraliveinterval = 45
forwardx11 = yes
compression = yes
compressionlevel = 9

[bitbucket.org]
user = hg

[topsecret.server.com]
forwardx11 = no
host port = 50022

[yuan]
```
Deleting a section
```python
config.remove_section('bitbucket.org')
config.write(open('example.ini', 'w'))
```
Modifying a section
```python
config.set('yuan', 'k2', '222')         # add option k2 = 222 to section yuan
config.write(open('example.ini', 'w'))  # write back to the file
```
Summary:
section: you can operate on the parser object directly to get all section information.
option: once you have found a section, you can view all of its options.
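The two summary points combine into one loop: sections() for the nodes, options() for the items of each. Rebuilt in memory here so the sketch stands alone:

```python
import configparser

config = configparser.ConfigParser()
config['DEFAULT'] = {'compression': 'yes'}
config['bitbucket.org'] = {'user': 'hg'}

for section in config.sections():           # DEFAULT is not listed as a section
    for option in config.options(section):  # options() also includes the DEFAULT keys
        print(section, option, config.get(section, option))
```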
3. The logging module
To keep data safe, every add, modify, and delete operation should be logged,
for example system logs, administrator operation logs, purchase records...
Logs give us a lot of convenience during internal operations.
Logs give users more information.
They carry the information you need when debugging your own program.
They help programmers troubleshoot problems.
The logging module will not add log content for you automatically;
whatever you want printed, you write yourself.
```python
import logging

logging.debug('debug message')
logging.info('info message')
logging.warning('warning message')
logging.error('error message')
logging.critical('critical message')
```
Output (by default only WARNING and above are shown):
Set the level to INFO to show only INFO and above.
Can you show messages of exactly one level? No!
You can only print messages at or above a given level.
Adding a timestamp
```python
import logging

logging.basicConfig(level=logging.DEBUG,
                    format='%(asctime)s %(filename)s[line:%(lineno)d] '
                           '%(levelname)s %(message)s')
logging.debug('debug message')        # debug: debugging details, lowest level
logging.info('info message')          # info: normal information
logging.warning('warning message')    # warning: warnings
logging.error('error message')        # error: errors
logging.critical('critical message')  # critical: severe errors
```
Output:
Setting the time format

```python
import logging

logging.basicConfig(level=logging.DEBUG,
                    format='%(asctime)s %(filename)s[line:%(lineno)d] %(levelname)s %(message)s',
                    datefmt='%a, %d %b %y %H:%M:%S')  # datefmt must be passed inside basicConfig
logging.debug('debug message')        # debug: debugging details, lowest level
logging.info('info message')          # info: normal information
logging.warning('warning message')    # warning: warnings
logging.error('error message')        # error: errors
logging.critical('critical message')  # critical: severe errors
```
Output:
The parameters of logging.basicConfig() change the logging module's default behavior:
filename: create a FileHandler with the given file name, so the log is stored in that file.
filemode: the mode used to open the file when filename is given; default "a", can also be "w".
format: the log record format used by the handler.
datefmt: the date/time format.
level: the log level of the root logger (explained in more detail later).
stream: create a StreamHandler with the given stream. It can be sys.stderr, sys.stdout, or a file object (f = open('test.log', 'w')); the default is sys.stderr. If both filename and stream are given, stream is ignored.
Placeholders that can be used in format:
%(name)s  name of the logger
%(levelno)s  log level as a number
%(levelname)s  log level as text
%(pathname)s  full path of the module calling the log function (may be absent)
%(filename)s  file name of the module calling the log function
%(module)s  name of the calling module
%(funcName)s  name of the calling function
%(lineno)d  line number of the logging call
%(created)f  current time as a UNIX floating-point timestamp
%(relativeCreated)d  milliseconds since the Logger was created
%(asctime)s  current time as a string; the default format is "2003-07-08 16:49:45,896" (milliseconds after the comma)
%(thread)d  thread ID (may be absent)
%(threadName)s  thread name (may be absent)
%(process)d  process ID (may be absent)
%(message)s  the message supplied by the user
Writing to a file
```python
import logging

logging.basicConfig(level=logging.DEBUG,
                    format='%(asctime)s %(filename)s[line:%(lineno)d] %(levelname)s %(message)s',
                    datefmt='%a, %d %b %y %H:%M:%S',
                    filename='userinfo.log')
logging.debug('debug message')        # debug: debugging details, lowest level
logging.info('info message')          # info: normal information
logging.warning('warning message')    # warning: warnings
logging.error('error message')        # error: errors
logging.critical('critical message')  # critical: severe errors
```
Run the program and check the file contents.
In some cases the file content appears garbled (a wrong encoding).
This approach has two limitations:
the encoding cannot be set;
it cannot write to a file and to the screen at the same time.
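Both limitations go away once you configure a Logger with explicit handlers instead of basicConfig: FileHandler accepts an encoding argument, and adding a StreamHandler as well sends every record to the console too. A minimal sketch (the logger name and file name are illustrative):

```python
import logging

logger = logging.getLogger('userinfo')
logger.setLevel(logging.DEBUG)

fmt = logging.Formatter('%(asctime)s %(levelname)s %(message)s')

fh = logging.FileHandler('userinfo.log', encoding='utf-8')  # the encoding can be set here
fh.setFormatter(fmt)
sh = logging.StreamHandler()                                # and the screen gets a copy
sh.setFormatter(fmt)

logger.addHandler(fh)
logger.addHandler(sh)

logger.warning('警告信息')  # written to userinfo.log as UTF-8 and echoed to the console
```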