Sesame HTTP: TXT text storage

Saving data to a TXT file is very simple, and TXT files are compatible with almost every platform. The drawback is that plain text is hard to search. TXT storage is therefore a good fit when search and data-structure requirements are modest and convenience comes first. This section shows how to save data to a TXT file with Python.

1. Objectives of this section

In this section, we save the "Hot Topics" section of Zhihu's "Discovery" page, storing each question and its answer as text.

2. Basic example

First, use requests to fetch the source code of the web page, then parse it with the pyquery library, and finally save the extracted question, answerer, and answer to a text file. The code is as follows:

import requests
from pyquery import PyQuery as pq

url = 'https://www.zhihu.com/explore'
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36'
}
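# Fetch the page source (exception handling omitted for brevity)
html = requests.get(url, headers=headers).text
doc = pq(html)
# Iterate over each hot-topic entry on the page
items = doc('.explore-tab .feed-item').items()
for item in items:
    question = item.find('h2').text()
    author = item.find('.author-link-line').text()
    # Re-parse the answer's inner HTML so that only its plain text is kept
    answer = pq(item.find('.content').html()).text()
    # Append one record per entry, followed by a separator line
    file = open('explore.txt', 'a', encoding='utf-8')
    file.write('\n'.join([question, author, answer]))
    file.write('\n' + '=' * 50 + '\n')
    file.close()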

This example is mainly meant to show how the file is saved, so exception handling for requests is omitted. The code first uses requests to fetch Zhihu's "Discovery" page and extracts the question, answerer, and full answer text of each hot topic. It then calls Python's built-in open() function to open a text file and obtain a file object, which is assigned to file. Next, it calls the file object's write() method to write the extracted content to the file, and finally calls close() to close it. The scraped content is thus written to the text file.

Run the program and an explore.txt file is generated locally; its content is shown in the figure.

In this way, the content of popular questions and answers is saved in text form.

The first parameter of open() is the name of the target file, and the second parameter, a, means that content is written by appending. We also set the file's encoding to utf-8. Finally, after writing is complete, call close() to close the file object.
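The open, write, close sequence can be tried without any network access. The following is a minimal sketch, assuming a hypothetical sample record in place of scraped data and a temporary directory in place of the working directory; the try/finally wrapper is an addition that makes sure close() runs even if write() fails.

```python
import os
import tempfile

# Hypothetical sample record standing in for scraped data (not real Zhihu content)
question, author, answer = 'Sample question?', 'sample_author', 'Sample answer text.'

# Assumption: write into a temporary directory so no real file is touched
path = os.path.join(tempfile.mkdtemp(), 'explore.txt')

# 'a' appends to the file and creates it if it does not exist yet
file = open(path, 'a', encoding='utf-8')
try:
    file.write('\n'.join([question, author, answer]))
    file.write('\n' + '=' * 50 + '\n')
finally:
    # Runs whether or not write() raised an exception
    file.close()

# Read the file back to check what was written
file = open(path, 'r', encoding='utf-8')
content = file.read()
file.close()
print(content)
```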

3. Open modes

In the example above, the second parameter of open() is set to a, so each write does not clear the existing file; new content is added at the end of the file instead. This is one file-opening mode. There are several others, which are briefly introduced here.

  • r: Open the file as read-only. The file pointer will be placed at the beginning of the file. This is the default mode.
  • rb: Open a file as binary read-only. The file pointer will be placed at the beginning of the file.
  • r+: Open a file for reading and writing. The file pointer will be placed at the beginning of the file.
  • rb+: Open a file for binary read-write. The file pointer will be placed at the beginning of the file.
  • w: Open a file for writing. If the file already exists, its existing content is erased. If the file does not exist, create a new file.
  • wb: Open a file for binary writing. If the file already exists, its existing content is erased. If the file does not exist, create a new file.
  • w+: Open a file for reading and writing. If the file already exists, its existing content is erased. If the file does not exist, create a new file.
  • wb+: Open a file for binary reading and writing. If the file already exists, its existing content is erased. If the file does not exist, create a new file.
  • a: Open a file for appending. If the file already exists, the file pointer will be placed at the end of the file. That is, the new content will be written after the existing content. If the file does not exist, create a new file to write to.
  • ab: Open a file in binary append mode. If the file already exists, the file pointer will be placed at the end of the file. That is, the new content will be written after the existing content. If the file does not exist, create a new file to write to.
  • a+: Open a file for reading and appending. If the file already exists, the file pointer will be placed at the end of the file, and new content will be written after the existing content. If the file does not exist, create a new file for reading and writing.
  • ab+: Open a file for binary reading and appending. If the file already exists, the file pointer will be placed at the end of the file. If the file does not exist, create a new file for reading and writing.
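The difference between these modes is easiest to see by running them against the same file. The following is a rough sketch covering w, a, a+, and r; the file name modes.txt, the temporary directory, and the helper functions write_text() and read_text() are hypothetical names introduced only for this illustration.

```python
import os
import tempfile

# Assumption: work inside a temporary directory so no real file is touched
folder = tempfile.mkdtemp()
path = os.path.join(folder, 'modes.txt')


def write_text(mode, text):
    """Hypothetical helper: open path in the given mode, write text, close."""
    f = open(path, mode, encoding='utf-8')
    f.write(text)
    f.close()


def read_text():
    """Hypothetical helper: return the whole file content using mode r."""
    f = open(path, 'r', encoding='utf-8')
    text = f.read()
    f.close()
    return text


write_text('w', 'first\n')    # w creates the file
write_text('a', 'second\n')   # a keeps 'first' and adds to the end
after_append = read_text()    # 'first\nsecond\n'

write_text('w', 'third\n')    # w erases the existing content first
after_write = read_text()     # 'third\n'

# a+ allows reading too, but the pointer starts at the end of the file,
# so move it back to the beginning before reading
f = open(path, 'a+', encoding='utf-8')
f.write('fourth\n')
f.seek(0)
after_a_plus = f.read()       # 'third\nfourth\n'
f.close()

# r does not create files: opening a missing file raises FileNotFoundError
missing_raises = False
try:
    open(os.path.join(folder, 'missing.txt'), 'r', encoding='utf-8')
except FileNotFoundError:
    missing_raises = True

print(repr(after_append), repr(after_write), repr(after_a_plus), missing_raises)
```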

4. Simplified writing

There is also a shorthand for writing to a file: the with as syntax. When the with block ends, the file is closed automatically, so there is no need to call close(). The saving step can be abbreviated as follows:

with open('explore.txt', 'a', encoding='utf-8') as file:
    file.write('\n'.join([question, author, answer]))
    file.write('\n' + '=' * 50 + '\n')
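The automatic closing can be checked through the file object's closed attribute. The following is a small self-contained sketch, assuming a hypothetical sample record and a temporary directory instead of real scraped data.

```python
import os
import tempfile

# Hypothetical sample record standing in for scraped data
question, author, answer = 'Sample question?', 'sample_author', 'Sample answer text.'

# Assumption: write into a temporary directory so no real file is touched
path = os.path.join(tempfile.mkdtemp(), 'explore.txt')

with open(path, 'a', encoding='utf-8') as file:
    file.write('\n'.join([question, author, answer]))
    file.write('\n' + '=' * 50 + '\n')
    closed_inside = file.closed   # still open inside the block

closed_outside = file.closed      # closed as soon as the block ends

print(closed_inside, closed_outside)  # False True
```

The file is also closed if an exception is raised inside the block, which is why with is generally preferred over calling close() by hand.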

If you want to clear the existing text when saving, change the second parameter to w. The code is as follows:

with open('explore.txt', 'w', encoding='utf-8') as file:
    file.write('\n'.join([question, author, answer]))
    file.write('\n' + '=' * 50 + '\n')
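One caveat when several records are written in a loop: opening the file with w on every iteration erases the previous record each time, so only the last one survives. The following is a sketch of the problem and one possible fix, which is to open the file once outside the loop; the records list and the temporary path are hypothetical sample data, not real Zhihu content.

```python
import os
import tempfile

# Hypothetical sample records standing in for scraped (question, author, answer) data
records = [
    ('Question 1', 'author_1', 'Answer 1'),
    ('Question 2', 'author_2', 'Answer 2'),
]

# Assumption: write into a temporary directory so no real file is touched
path = os.path.join(tempfile.mkdtemp(), 'explore.txt')

# Problem: w inside the loop clears the file on every iteration
for question, author, answer in records:
    with open(path, 'w', encoding='utf-8') as file:
        file.write('\n'.join([question, author, answer]))
        file.write('\n' + '=' * 50 + '\n')

with open(path, 'r', encoding='utf-8') as file:
    overwritten = file.read()   # contains only the last record

# Fix: open once with w, then write every record inside the same block
with open(path, 'w', encoding='utf-8') as file:
    for question, author, answer in records:
        file.write('\n'.join([question, author, answer]))
        file.write('\n' + '=' * 50 + '\n')

with open(path, 'r', encoding='utf-8') as file:
    complete = file.read()      # contains both records

print(overwritten.count('=' * 50), complete.count('=' * 50))  # 1 2
```

Opening the file once also avoids reopening it for every record, and each run of the program starts from an empty file rather than appending to the previous run's output.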

The above is how to save results as a TXT file with Python. This method is simple, easy to use, and efficient, and it is the most basic way of saving data.
