Parsing email in a few lines of Python

Foreword

How do you parse email in Python? The format of an email message is complex, and most of that complexity comes from the MIME standard. This article focuses on a working implementation; the underlying principles are left for you to study on your own.

I. Installation

Mail parsing is done with Flanker, an open-source library from Mailgun. The library covers both email address parsing and MIME message parsing.

Enter the following command:

pip install flanker

II. Code implementation

1. Mail header

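from html.parser import HTMLParser

from flanker import mime


# Assumption: the article never shows HTMLFilter, so this is a minimal
# stand-in that keeps the text of an HTML document and drops the tags.
class HTMLFilter(HTMLParser):
    text = ''

    def handle_data(self, data):
        self.text += data


def emlAnalyse(path):
    with open(path, 'rb') as fhdl:
        raw_email = fhdl.read()
    # Parse the raw bytes into a MIME message object
    eml = mime.from_string(raw_email)
    subject = eml.subject
    eml_header_from = eml.headers.get('From')
    eml_header_to = eml.headers.get('To')
    eml_header_cc = eml.headers.get('Cc')
    eml_time = eml.headers.get('Date')
    # Save the attachments (attachEml1 is defined under "Mail attachments")
    eml_attachs = attachEml1(eml)
    # Extract the body (contentEml is defined under "Email body")
    eml_body = contentEml(eml)
    # Strip HTML tags in case the body is HTML
    f = HTMLFilter()
    f.feed(eml_body)
    print(f.text)


def main():
    # Path to the .eml file to parse
    path = 'mail_name.eml'
    emlAnalyse(path)


if __name__ == "__main__":
    main()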

eml.headers contains header information such as the sender, recipients, Cc, and date.
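The header values returned above are plain strings. As a rough illustration of how they can be turned into structured values, here is a minimal sketch that uses Python's standard-library email package instead of Flanker, so it runs without installing anything. The sample message and the summarize_headers name are hypothetical and exist only for this example.

```python
from email import message_from_string
from email.utils import getaddresses, parsedate_to_datetime

# Hypothetical sample message, used only for illustration
RAW = (
    "From: Alice <alice@example.com>\n"
    "To: Bob <bob@example.com>, carol@example.com\n"
    "Cc: Dave <dave@example.com>\n"
    "Date: Mon, 13 Dec 2021 10:30:00 +0800\n"
    "Subject: Hello\n"
    "\n"
    "Body text\n"
)


def summarize_headers(raw):
    """Return the common headers of a raw message as structured values."""
    msg = message_from_string(raw)
    return {
        "subject": msg.get("Subject"),
        # getaddresses splits a header into (display name, address) pairs
        "from": getaddresses(msg.get_all("From", [])),
        "to": getaddresses(msg.get_all("To", [])),
        "cc": getaddresses(msg.get_all("Cc", [])),
        # parsedate_to_datetime converts the date header into a datetime
        "date": parsedate_to_datetime(msg.get("Date")),
    }


info = summarize_headers(RAW)
print(info["to"])
print(info["date"].isoformat())
```

Splitting the To and Cc headers into pairs is useful because a single header can carry several recipients separated by commas.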

2. Email body

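# Email body
def contentEml(eml):
    # Single-part message: the body is the message payload itself
    if eml.content_type.is_singlepart():
        eml_body = eml.body
    else:
        eml_body = ''
        for part in eml.parts:
            # Nested multipart container: recurse into it
            if part.content_type.is_multipart():
                # Keep the earlier result if the nested container has no text part
                eml_body = contentEml(part) or eml_body
            else:
                # The last text/* part wins (typically text/html after text/plain)
                if part.content_type.main == 'text':
                    eml_body = part.body
    return eml_body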

The function calls itself recursively, so the body is found even when it sits inside nested multipart containers.
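To make the recursion easier to follow, here is a minimal sketch of the same idea written against Python's standard-library email package rather than Flanker. It walks the MIME tree, prefers text/plain, and falls back to text/html. The extract_body name and the sample message are hypothetical, and this is one possible selection policy, not the only reasonable one.

```python
from email.message import EmailMessage


def extract_body(part):
    """Recursively search a MIME tree for the body.

    Returns a (subtype, text) tuple, preferring text/plain over text/html,
    or None when no text part is found.
    """
    if not part.is_multipart():
        # A leaf counts as body text only if it is text/* and has no file name
        if part.get_content_maintype() == "text" and part.get_filename() is None:
            return part.get_content_subtype(), part.get_content()
        return None
    fallback = None
    for child in part.iter_parts():
        found = extract_body(child)
        if found is None:
            continue
        if found[0] == "plain":
            return found
        if fallback is None:
            fallback = found
    return fallback


# Hypothetical sample: multipart/mixed wrapping multipart/alternative plus an attachment
msg = EmailMessage()
msg["Subject"] = "Report"
msg.set_content("Plain body")
msg.add_alternative("<p>HTML body</p>", subtype="html")
msg.add_attachment(b"\x00\x01", maintype="application",
                   subtype="octet-stream", filename="data.bin")

subtype, body = extract_body(msg)
print(subtype, body.strip())
```

Returning the subtype together with the text lets the caller decide whether HTML tags still need to be stripped.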

3. Mail attachments

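def attachEml1(eml):
    saved = []  # names of the files written
    for part in eml.parts:
        name = part.detected_file_name
        # Skip nested multipart containers and parts without a file name (usually the body)
        if part.content_type.is_multipart() or not name:
            continue
        body = part.body
        # Text attachments may come back as str, so encode before writing in binary mode
        data = body.encode('utf-8') if isinstance(body, str) else body
        with open(name, 'wb') as annex:
            annex.write(data)
        saved.append(name)
    return saved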

Each part that is not a multipart container (checked with content_type.is_multipart()) is treated as an attachment candidate and saved under its detected file name.
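For comparison, here is a minimal sketch of attachment saving with Python's standard-library email package instead of Flanker. It writes into a caller-supplied directory rather than the current one, and it reduces each file name to its base name before use. The save_attachments name and the sample message are hypothetical.

```python
import os
import tempfile
from email.message import EmailMessage


def save_attachments(msg, out_dir):
    """Write every named attachment of msg into out_dir and return the saved paths."""
    saved = []
    for part in msg.iter_attachments():
        name = part.get_filename()
        if not name:
            continue
        # basename() drops any directory components from the supplied file name
        path = os.path.join(out_dir, os.path.basename(name))
        with open(path, "wb") as fh:
            # decode=True undoes the base64 or quoted-printable transfer encoding
            fh.write(part.get_payload(decode=True))
        saved.append(path)
    return saved


# Hypothetical sample message with one binary attachment
msg = EmailMessage()
msg.set_content("See attached.")
msg.add_attachment(b"hello", maintype="application",
                   subtype="octet-stream", filename="note.bin")

out_dir = tempfile.mkdtemp()
paths = save_attachments(msg, out_dir)
print(paths)
```

File names in an email are chosen by the sender, so treating them as untrusted input is a sensible default whichever library does the parsing.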

Summary

That covers the basics of email parsing. If you have questions, feel free to get in touch!

Origin blog.csdn.net/kobepaul123/article/details/121962260