Batch-downloading files with a Python crawler: quickly converting Excel hyperlinks into URLs

Background: a university teacher I am on good terms with asked whether I could download the PDF files behind 1000 hyperlinks in an Excel sheet. Clicking through and downloading them one by one is possible, but far too labor-intensive and time-consuming. I remembered my earlier experience with web crawlers, walked the teacher through the feasibility, and then put it into practice.
  
Unexpectedly, I ran into trouble right at the start: when the sheet was read in Python, each hyperlink cell came back as its Chinese display text rather than its URL. So the first step is to extract the URL behind each hyperlink, and the second is to crawl the PDF at each URL with Python. This article covers the first step of the batch download, converting the hyperlinks in Excel into their URLs. The next article shares the code for batch-downloading the PDF files.
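
As a side note, the URLs can in principle also be read from Python directly. An .xlsx file is a zip archive in which a cell's display text and its hyperlink target are stored in different places, which would explain why reading the cell values yields only the text. The following is a minimal sketch using only the standard library, not the method this article goes on to use. It assumes the links sit in the first worksheet, stored at the usual path xl/worksheets/sheet1.xml; the function name read_hyperlinks and the file name links.xlsx are made up for illustration.

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the Office Open XML (.xlsx) format
NS_MAIN = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
NS_REL = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
NS_PKG = "http://schemas.openxmlformats.org/package/2006/relationships"


def read_hyperlinks(xlsx_file, sheet_path="xl/worksheets/sheet1.xml"):
    """Return {cell_ref: url} for one worksheet inside an .xlsx file.

    sheet_path is an assumption: the first sheet is usually stored there,
    but a given workbook may use a different part name.
    """
    folder, name = posixpath.split(sheet_path)
    rels_path = f"{folder}/_rels/{name}.rels"

    with zipfile.ZipFile(xlsx_file) as zf:
        sheet = ET.fromstring(zf.read(sheet_path))
        # A sheet without external links has no .rels part at all
        if rels_path not in zf.namelist():
            return {}
        rels = ET.fromstring(zf.read(rels_path))

    # Map relationship id -> target URL
    targets = {
        rel.get("Id"): rel.get("Target")
        for rel in rels.iter(f"{{{NS_PKG}}}Relationship")
    }

    links = {}
    for link in sheet.iter(f"{{{NS_MAIN}}}hyperlink"):
        rid = link.get(f"{{{NS_REL}}}id")
        # Links that point inside the workbook carry no relationship id
        if rid in targets:
            links[link.get("ref")] = targets[rid]
    return links
```

Calling read_hyperlinks("links.xlsx") on a real workbook would return a dictionary keyed by cell reference, such as A2. A hyperlink applied to a range is keyed by the range reference (for example A2:A5), and links that point inside the workbook are skipped.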


  

1. The desired effect

  
Let's first look at the desired effect. The first column is the original hyperlink, and the second column is the corresponding URL we want to get.
  

  
  

2. Three ways to convert hyperlinks into corresponding URLs

  
Many ways of converting hyperlinks are described online. I will share the three I tried: the first two failed, and the last one worked.
  
  

1 Method 1: Converting a single hyperlink by mouse click

  
The first method is to select the cell whose hyperlink you want to convert, double-click it with the left mouse button, and press Enter; the cell content is then supposed to turn into the URL. This method only suits a small number of hyperlinks, and it did not work when I tried it.

  
  

2 Method 2: AutoFormat

  
The second method is to go to File > More > Options > Proofing > AutoCorrect Options > AutoFormat As You Type, tick "Internet and network paths with hyperlinks", and click OK. I tried this as well, and it also failed.
  
  

3 Method 3: Custom VBA function conversion

  
The third method is to write a custom VBA function that does the conversion.
  

[1] Enable the [Developer] tab.

  
In short: open [Excel Options] from the [File] tab, tick [Developer] under [Customize Ribbon], and click [OK]. A [Developer] tab then appears in the ribbon.
  
Step1: Left-click the [File] tab in the menu bar, then left-click [More], and then left-click [Options].
  

  

Step2: Left-click [Customize Ribbon] in [Excel Options], tick the checkbox in front of [Developer], and then left-click the [OK] button.
  

Step3: Check that a [Developer] tab now appears in the menu bar.
  

  

[2] Define a custom VBA function, GetAdrs.

  
In short: open the Visual Basic editor from the [Developer] tab, insert a new module, paste the code below into it, and close the editor.
  
Step1: Left-click the [Developer] tab, and then left-click [Visual Basic] in the [Code] group.
  
  
Step2: Right-click in the [Project Explorer] window, hover over [Insert], and left-click [Module] in the submenu to insert [Module1]. Copy and paste the following code into the code window of [Module1], then close the Visual Basic editor.

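Function GetAdrs(Rng As Range) As String
  ' Recalculate whenever the sheet recalculates, so edited links stay current
  Application.Volatile True
  ' Return an empty string for cells that carry no hyperlink
  If Rng.Hyperlinks.Count = 0 Then Exit Function
  With Rng.Hyperlinks(1)
    ' Use the external address; fall back to the in-workbook target
    GetAdrs = IIf(.Address = "", .SubAddress, .Address)
  End With
End Function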


  

[3] Get the URL with the function GetAdrs.

  
First, left-click cell [B2], type the custom function [=GetAdrs(A2)], and press Enter to calculate it. Then move the mouse pointer to the lower-right corner of cell [B2]; when the pointer turns into a [+] sign, hold down the left mouse button and drag down to fill the formula into the rows below.
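
Before handing column B to a downloader, it may be worth checking that each value really is a web URL: GetAdrs falls back to an in-workbook target when a link has no external address, and cells without a hyperlink yield no usable URL. A minimal sketch of such a check, assuming the column has already been read into a Python list; the helper names is_web_url and split_urls are hypothetical.

```python
from urllib.parse import urlsplit


def is_web_url(value):
    """Return True if value looks like an absolute http(s) URL."""
    if not isinstance(value, str):
        return False
    try:
        parts = urlsplit(value.strip())
    except ValueError:
        # urlsplit rejects some malformed inputs, e.g. a broken IPv6 host
        return False
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def split_urls(values):
    """Split values into (valid, rejected) lists, preserving order."""
    valid, rejected = [], []
    for value in values:
        (valid if is_web_url(value) else rejected).append(value)
    return valid, rejected
```

Keeping the rejected values rather than silently dropping them makes it easier to go back to the sheet and see which rows need attention.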
  
That completes the quick conversion of Excel hyperlinks into URLs; interested readers can try the steps for themselves.
  
  

Origin blog.csdn.net/qq_32532663/article/details/132395576