Python notes (thirteen): urllib module

(1)      URL address

URL components:

scheme: network protocol or download scheme
net_loc: server location (may also contain user information)
path: slash (/)-separated path to the file or to a CGI application
params: optional parameters
query: a sequence of key-value pairs separated by ampersands (&)
fragment: an anchor pointing to a specific section within the document

net_loc components (user:password@host:port):

user: user name or login name
password: the user's password
host: name or address of the machine running the web server (required)
port: port number (if not the default, 80)
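To see how an address maps onto these components, the urlparse function covered in section (4) can be used. A minimal sketch follows; the user:pass@www.example.com address is a made-up placeholder, not one from these notes.

from urllib.parse import urlparse

parts = urlparse('http://user:pass@www.example.com:8080/path/page;type=a?x=1&y=2#section')
print(parts)
# ParseResult(scheme='http', netloc='user:pass@www.example.com:8080', path='/path/page',
#             params='type=a', query='x=1&y=2', fragment='section')

# The pieces of net_loc are also available individually
print(parts.username, parts.password, parts.hostname, parts.port)
# user pass www.example.com 8080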

 

(2)      urllib

    Here we mainly describe urllib.request and urllib.parse.

(3)      urllib.request

urllib.request functions:

urlopen(url, data=None): opens a URL and returns a file-like object, much as open() returns a file object opened in binary read-only mode.
    url: a URL string or a Request object
    data: optional data to send to the server (supplying data turns the request into a POST)
urlretrieve(url, filename=None): downloads the file at url.
    filename: file name and path (if no path is given, the file is saved in the current working directory)

Methods of the object returned by urlopen:

read(): read all of the data
readline(): read one line of data
readlines(): read all lines and return them as a list
fileno(): return the file descriptor
close(): close the URL connection (this and the four methods above behave like the identically named methods of a file object)
info(): return the MIME (Multipurpose Internet Mail Extensions) headers; these headers tell the browser what type of file is being returned and which applications can open it
geturl(): return the real URL (e.g. after a redirect, the URL of the resource that was finally opened)
getcode(): return the HTTP status code


import urllib.request

url = 'https://tieba.baidu.com/p/5475267611'
# Open the url (just like opening a local file with open in binary read-only mode)
# and read all of the data with read()
html = urllib.request.urlopen(url).read()
print(type(html))

url_file = 'https://imgsa.baidu.com/forum/w%3D580/sign=99114e38abec08fa260013af69ef3d4d/e549b13533fa828bc80c7764f61f4134960a5a85.jpg'
# Download the file at the url and save it
urllib.request.urlretrieve(url_file, r'C:\Temp\1.jpg')

# Return the MIME headers
html_info = urllib.request.urlopen(url).info()
print(html_info)
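The table above also mentions the data parameter of urlopen and the geturl()/getcode() methods, which the example above does not exercise. A minimal sketch is shown below; http://httpbin.org/post is only a stand-in test address, not part of the original notes.

import urllib.parse
import urllib.request

# Encode the form data as bytes; passing data makes urlopen send a POST request
data = urllib.parse.urlencode({'keyword': 'python'}).encode('utf-8')
response = urllib.request.urlopen('http://httpbin.org/post', data=data)
print(response.getcode())   # HTTP status code, e.g. 200
print(response.geturl())    # the URL that was finally opened (after any redirects)
print(response.readline())  # first line of the response body
response.close()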

(4)      urllib.parse

urllib.parse functions:

urlparse(urlstr): parses a URL into a tuple (scheme='', netloc='', path='', params='', query='', fragment='')
urlunparse(urltup): the opposite of urlparse; joins the URL components (a tuple) back into a complete URL
urljoin(base, url): joins the root of base with url to form a complete URL
    base: the function automatically keeps net_loc and everything before it


import urllib.parse

url = 'https://www.cnblogs.com/cate/python/'
newurl = '/cate/ruby/'
# Parse the url into a tuple (scheme='', netloc='', path='', params='', query='', fragment='')
urlpar = urllib.parse.urlparse(url)
print('urlparse example:', urlpar)
# The exact opposite of urlparse: join the tuple back into a complete url
urlunp = urllib.parse.urlunparse(urlpar)
print('urlunparse example:', urlunp)
# Join newurl onto the netloc (and everything before it) of url
url_ruby = urllib.parse.urljoin(url, newurl)
print('urljoin example:', url_ruby)
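A note on urljoin: how much of base is kept depends on the second argument. If it starts with a slash (an absolute path), only the scheme and net_loc of base are kept; a relative path instead replaces the last segment of base's path. The sketch below illustrates both cases with the same example addresses.

from urllib.parse import urljoin

# Absolute path: only scheme and net_loc of base are kept
print(urljoin('https://www.cnblogs.com/cate/python/', '/cate/ruby/'))
# https://www.cnblogs.com/cate/ruby/

# Relative path: the last segment of base's path is replaced
print(urljoin('https://www.cnblogs.com/cate/python', 'ruby/'))
# https://www.cnblogs.com/cate/ruby/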
