Can't copy? python to help you solve

I believe that everyone will often encounter this situation ( cannot be copied ):

This is a direct "extortion", and poor college students say they can't afford it~

All of the above are cases where the web page cannot copy the text. None of this is a problem for Python though. Today I will take you to solve it with Python .

Core technology: use pdfkit library to save html web pages as pdf

1. Install the pdfkit library

pip install pdfkit

Install pdfkit through the command, in addition, you need to install the exe file ( wkhtmltopdf ) download link:

https://wkhtmltopdf.org/downloads.html

 Select the corresponding version to download and install ( remember your own installation directory )

2. Actual combat operation

Here we are now testing one of the articles on the Baidu Wenku platform (the article is set to prohibit copying )

 For example, when the author wants to copy, there will be a prohibition of reprinting ( no copying ), and then start to save the pdf of this webpage.

import pdfkit
import time

if __name__ == '__main__':
    url = "https://wenku.baidu.com/view/e1dd3a2f0066f5335a812103?aggId=e1dd3a2f0066f5335a812103"
    config = pdfkit.configuration(wkhtmltopdf=r'D:\wkhtmltopdf\bin\wkhtmltopdf.exe')
    pdfkit.from_url(url, r"D:\素材库\%s.pdf"
                    % time.strftime('%Y-%m-%d-%H-%M-%S', time.localtime(time.time())), configuration=config)

This saves the content as a pdf and can be copied directly. 

Guess you like

Origin blog.csdn.net/yyfloveqcw/article/details/124317632