1. hashlib: file consistency checks
Why check file consistency?
To make sure you get the correct version of a file, not one injected with viruses or trojans. For example, software downloaded from the Internet is sometimes repackaged with adware or malware. If we do not verify that the file matches what the original publisher released, it can cause real damage.
The principle
A consistency check verifies that two copies of a file are identical. We cannot simply compare them side by side like short text files, because the files are often very large. The standard approach is to run the file through a hash algorithm to produce a fixed-size value, then compare that value with the one provided by the publisher: if the values match, the files are considered identical.
MD5 and SHA1 are the most widely used hash algorithms.
Example:
First create two files by hand, file1 and file2, each containing the text 123.
Calculate the MD5 digest of file1:

```python
import hashlib

md5obj = hashlib.md5()
with open('file1', 'rb') as f:
    content = f.read()
md5obj.update(content)
print(md5obj.hexdigest())
```
Output after execution:
To calculate the digest of file2, do we copy the code above again? Too clumsy. And what if there are many files?
Define a method:
```python
import hashlib

def check_md5(filename):
    md5obj = hashlib.md5()
    with open(filename, 'rb') as f:
        content = f.read()
    md5obj.update(content)
    return md5obj.hexdigest()

ret1 = check_md5('file1')
ret2 = check_md5('file2')
print(ret1)
print(ret2)
```
Execution output:
In this way, you can know whether the two files are consistent.
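In practice the check from the introduction reduces to comparing the digest you compute against the one the publisher lists. A minimal sketch: the file name `example.bin` and the "published" value are invented for illustration (the value is simply the MD5 of the bytes `123`):

```python
import hashlib

published = '202cb962ac59075b964b07152d234b70'  # MD5 of '123', as if copied from the download page

with open('example.bin', 'wb') as f:            # create a stand-in "downloaded" file
    f.write(b'123')

md5obj = hashlib.md5()
with open('example.bin', 'rb') as f:
    md5obj.update(f.read())

print(md5obj.hexdigest() == published)          # True means the file matches the published checksum
```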
However, the method above has a flaw: it reads the whole file into memory at once, so how can memory cope when the file reaches the GB level?
So what do we do? Take a look at the following small example:
```python
import hashlib

md5obj = hashlib.md5()
md5obj.update(b'john')  # b'...' is the bytes type; it cannot contain non-ASCII characters
print(md5obj.hexdigest())
```
Feeding the string in pieces:

```python
import hashlib

md5obj = hashlib.md5()  # create an MD5 object
md5obj.update(b'john')  # feed the string in pieces
md5obj.update(b'alen')
print(md5obj.hexdigest())  # same digest as hashing b'johnalen' in one call
```
Execution output:
Conclusion:
Hashing a string in one go and hashing it in several pieces produce the same digest.
So for a large file we can read it in segments and update the MD5 piece by piece.
Download the movie 海上钢琴师 (The Legend of 1900), file size 1.58 GB.
The film tells the life of a legendary piano genius; Douban score 9.2.
Calculate the md5 value of a movie
```python
import hashlib

def check(filename):
    md5obj = hashlib.md5()
    with open(filename, 'rb') as f:
        while True:
            content = f.read(1048576)  # read 1048576 bytes (1 MB) at a time
            if content:
                md5obj.update(content)
            else:
                break  # stop the loop when the content is empty
    return md5obj.hexdigest()

ret1 = check('E:\迅雷下载\[迅雷下载www.2tu.cc]海上钢琴师.BD1280高清中英双字.rmvb')
print(ret1)
```
It took about 9 seconds to run. Output:
30c7f078203d761d3f13bec6f8fd3088
To summarize (serialization recap):
Serialization converts a data type into a string (bytes), because only bytes can travel over the network or be stored in files.
json
Common to all languages, but serializes only a limited set of types: dict, list, str, number, tuple.
When dump writes to the same file multiple times, load cannot read the data back.
pickle
Python-only, but can serialize most data types.
When you load, the class corresponding to the pickled data must exist in memory.
dumps serializes to a string
loads deserializes from a string
dump serializes directly to a file
load deserializes directly from a file
shelve
f = shelve.open() opens the file like a persistent dictionary
json and pickle must be mastered.
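The pickle points above can be seen in a short sketch: unlike json, repeated dump calls into one file can be read back with repeated load calls (the file name `data.pkl` is made up for illustration):

```python
import pickle

# dump two objects into the same file, one after another
with open('data.pkl', 'wb') as f:
    pickle.dump({'a': 1}, f)
    pickle.dump([1, 2, 3], f)

# load them back in the same order
with open('data.pkl', 'rb') as f:
    first = pickle.load(f)   # {'a': 1}
    second = pickle.load(f)  # [1, 2, 3]
print(first, second)
```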
2. The configparser module
This module handles configuration files in a format similar to Windows .ini files: a file can contain one or more sections, and each section can hold multiple options (key = value).
Creating a file
Let's look at a configuration format common to a lot of software:
A [section] header is called a section (node), and the key = value pairs inside it are called options (items).
What if you want to generate such a document with Python?
```python
import configparser

config = configparser.ConfigParser()            # create a ConfigParser object
config["DEFAULT"] = {'ServerAliveInterval': '45',   # default options
                     'Compression': 'yes',
                     'CompressionLevel': '9',
                     'ForwardX11': 'yes'}
config['bitbucket.org'] = {'User': 'hg'}        # add a section bitbucket.org
config['topsecret.server.com'] = {'Host Port': '50022',
                                  'ForwardX11': 'no'}
with open('example.ini', 'w') as configfile:    # write the config file example.ini
    config.write(configfile)
```
Run the program and view the contents of example.ini:
```ini
[DEFAULT]
serveraliveinterval = 45
forwardx11 = yes
compression = yes
compressionlevel = 9

[bitbucket.org]
user = hg

[topsecret.server.com]
forwardx11 = no
host port = 50022
```
As you can see, the option names in each section have all become lowercase.
This is because configparser applies lower() to option names when storing them.
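ConfigParser lowercases option names through a hook called optionxform; overriding it with str keeps the original case. A minimal sketch (the section and option names here are illustrative):

```python
import configparser

config = configparser.ConfigParser()
config.optionxform = str                       # keep option names exactly as written
config['topsecret.server.com'] = {'HostPort': '50022'}
print(config.options('topsecret.server.com'))  # ['HostPort'] -- not lowercased
```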
Reading a config file
```python
import configparser

config = configparser.ConfigParser()
config.read('example.ini')
### the lines above are the fixed boilerplate ###
print(config.sections())  # list all sections; DEFAULT is not shown; returns a list
```
Output:
In the snippets below, I will not repeat the boilerplate part.
```python
print('bitbucket.org' in config)  # check whether a section exists in the file
```
Output: True
```python
print(config['bitbucket.org']['user'])  # get the value of an option under a section
```
Output: hg
```python
print(config['bitbucket.org'])  # returns an iterable section object
```
Output: &lt;Section: bitbucket.org&gt;
```python
# loop over an iterable section object
for key in config['bitbucket.org']:  # note: keys from DEFAULT are also yielded
    print(key)
```
Output:
user
serveraliveinterval
forwardx11
compression
compressionlevel
```python
print(config.items('bitbucket.org'))  # all key-value pairs under 'bitbucket.org'
```
Output:
[('serveraliveinterval', '45'), ('forwardx11', 'yes'), ('compression', 'yes'), ('compressionlevel', '9'), ('user', 'hg')]
```python
print(config.get('bitbucket.org', 'compression'))  # get() returns the value for a key in a section
```
Output: yes
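get() also accepts a fallback keyword for options that may be missing, and there are typed variants such as getint(). A small sketch, rebuilt in memory so it stands alone (the option names mirror the example.ini above):

```python
import configparser

config = configparser.ConfigParser()
config['bitbucket.org'] = {'user': 'hg', 'compressionlevel': '9'}

print(config.get('bitbucket.org', 'user'))                    # hg
print(config.get('bitbucket.org', 'timeout', fallback='30'))  # 30 -- option missing, fallback used
print(config.getint('bitbucket.org', 'compressionlevel'))     # 9 as an int, not a string
```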
Add, delete, and modify operations
Adding a section
```python
config.add_section('yuan')  # add a section (add_section returns None, so there is no point printing it)
```
Note: this is not written to disk immediately! You must run the following code:
```python
config.write(open('example.ini', 'w'))  # write back to the file
```
open('example.ini', 'w') truncates the file; config.write then writes the new contents.
View the file contents again:
```ini
[DEFAULT]
serveraliveinterval = 45
forwardx11 = yes
compression = yes
compressionlevel = 9

[bitbucket.org]
user = hg

[topsecret.server.com]
forwardx11 = no
host port = 50022

[yuan]
```
Deleting a section
```python
config.remove_section('bitbucket.org')
config.write(open('example.ini', 'w'))
```
Modifying a section
```python
config.set('yuan', 'k2', '222')         # add option k2 = 222 to section yuan
config.write(open('example.ini', 'w'))  # write back to the file
```
Summary:
section: you can operate on the parser object directly to get all section information.
option: once you have found a section, you can view all of its options.
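The two summary points combine into one loop: sections() for the nodes, options() for the items of each. Rebuilt in memory here so the sketch stands alone:

```python
import configparser

config = configparser.ConfigParser()
config['DEFAULT'] = {'compression': 'yes'}
config['bitbucket.org'] = {'user': 'hg'}

for section in config.sections():           # DEFAULT is not listed as a section
    for option in config.options(section):  # options() also includes the DEFAULT keys
        print(section, option, config.get(section, option))
```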
3. The logging module
To keep data safe, every add, modify, and delete operation should be logged,
for example system logs, administrator operation logs, purchase records...
Logs give us a lot of convenience during internal operations.
Logs give users more information.
They carry the information you need when debugging your own program.
They help programmers troubleshoot problems.
The logging module will not add log content for you automatically;
whatever you want printed, you write yourself.
```python
import logging

logging.debug('debug message')
logging.info('info message')
logging.warning('warning message')
logging.error('error message')
logging.critical('critical message')
```
Output (by default only WARNING and above are shown):
Set the level to INFO to show only INFO and above.
Can you show messages of exactly one level? No!
You can only print messages at or above a given level.
Adding a timestamp
```python
import logging

logging.basicConfig(level=logging.DEBUG,
                    format='%(asctime)s %(filename)s[line:%(lineno)d] '
                           '%(levelname)s %(message)s')
logging.debug('debug message')        # debug: debugging details, lowest level
logging.info('info message')          # info: normal information
logging.warning('warning message')    # warning: warnings
logging.error('error message')        # error: errors
logging.critical('critical message')  # critical: severe errors
```
Output:
Setting the time format

```python
import logging

logging.basicConfig(level=logging.DEBUG,
                    format='%(asctime)s %(filename)s[line:%(lineno)d] %(levelname)s %(message)s',
                    datefmt='%a, %d %b %y %H:%M:%S')  # datefmt must be passed inside basicConfig
logging.debug('debug message')        # debug: debugging details, lowest level
logging.info('info message')          # info: normal information
logging.warning('warning message')    # warning: warnings
logging.error('error message')        # error: errors
logging.critical('critical message')  # critical: severe errors
```
Output:
The parameters of logging.basicConfig() change the logging module's default behavior:
filename: create a FileHandler with the given file name, so the log is stored in that file.
filemode: the mode used to open the file when filename is given; default "a", can also be "w".
format: the log record format used by the handler.
datefmt: the date/time format.
level: the log level of the root logger (explained in more detail later).
stream: create a StreamHandler with the given stream. It can be sys.stderr, sys.stdout, or a file object (f = open('test.log', 'w')); the default is sys.stderr. If both filename and stream are given, stream is ignored.
Placeholders that can be used in format:
%(name)s  name of the logger
%(levelno)s  log level as a number
%(levelname)s  log level as text
%(pathname)s  full path of the module calling the log function (may be absent)
%(filename)s  file name of the module calling the log function
%(module)s  name of the calling module
%(funcName)s  name of the calling function
%(lineno)d  line number of the logging call
%(created)f  current time as a UNIX floating-point timestamp
%(relativeCreated)d  milliseconds since the Logger was created
%(asctime)s  current time as a string; the default format is "2003-07-08 16:49:45,896" (milliseconds after the comma)
%(thread)d  thread ID (may be absent)
%(threadName)s  thread name (may be absent)
%(process)d  process ID (may be absent)
%(message)s  the message supplied by the user
Writing to a file
```python
import logging

logging.basicConfig(level=logging.DEBUG,
                    format='%(asctime)s %(filename)s[line:%(lineno)d] %(levelname)s %(message)s',
                    datefmt='%a, %d %b %y %H:%M:%S',
                    filename='userinfo.log')
logging.debug('debug message')        # debug: debugging details, lowest level
logging.info('info message')          # info: normal information
logging.warning('warning message')    # warning: warnings
logging.error('error message')        # error: errors
logging.critical('critical message')  # critical: severe errors
```
Run the program and check the file contents.
In some cases the file content appears garbled (a wrong encoding).
This approach has two limitations:
the encoding cannot be set;
it cannot write to a file and to the screen at the same time.
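Both limitations go away once you configure a Logger with explicit handlers instead of basicConfig: FileHandler accepts an encoding argument, and adding a StreamHandler as well sends every record to the console too. A minimal sketch (the logger name and file name are illustrative):

```python
import logging

logger = logging.getLogger('userinfo')
logger.setLevel(logging.DEBUG)

fmt = logging.Formatter('%(asctime)s %(levelname)s %(message)s')

fh = logging.FileHandler('userinfo.log', encoding='utf-8')  # the encoding can be set here
fh.setFormatter(fmt)
sh = logging.StreamHandler()                                # and the screen gets a copy
sh.setFormatter(fmt)

logger.addHandler(fh)
logger.addHandler(sh)

logger.warning('警告信息')  # written to userinfo.log as UTF-8 and echoed to the console
```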