from using urllib import parse module

I. INTRODUCTION

Url defines a standard interface to achieve a variety of extraction url

Parse action module: parsing the url, combined, encoding, decoding

Second, the code

Method One: urlparse

Identification and segmentation of realization url

from the urllib Import the parse 

url = ' https://www.cnblogs.com/angelyan/ ' 
"" " 
url: be resolved url 
scheme = '': If the url is not parsed protocol, a default protocol may be provided, if there is url protocol, set this parameter is invalid 
allow_fragments = True: whether to ignore the anchor, said they did not ignore the default is True, False to omit 
"" " 
the Result = parse.urlparse (url = url, scheme = ' HTTP ' , allow_fragments = True) 

Print (Result)
 Print (result.scheme)
 "" " 
(= scheme 'HTTPS', netloc = 'www.cnblogs.com', path = '/ angelyan /', the params = '', Query = '', the fragment = ' ') 
scheme: indicates the protocol 
netloc:Domain name 
path: the path 
params: parameters
query: query, url generally get request
fragment: anchor for direct registration page 
pull-down position of the surface, jump to a specific location on the page of 
"" "

Method Two: urlunparse

Url structure can be achieved

= url_parmas ( ' HTTPS ' , ' www.cnblogs.com ' , ' / angelyan / ' , '' , ' name = Maple ' , ' log ' )
 # Components: is an iterator object must length. 6 
Result = the parse .urlunparse (url_parmas)
 Print (the Result) 

"" " 
https://www.cnblogs.com/angelyan/?name=maple#log 
" ""

Method three: urljoin

Transmitting a connection base, The base may be connected to any one link incomplete splicing is a full link

base_url = 'https://www.cnblogs.com'
sub_url = '/angelyan/?name=maple#log'

full_url = parse.urljoin(base_url,sub_url)

print(full_url)
"""
https://www.cnblogs.com/angelyan/?name=maple#log
"""

Method four: urlencode

After the sequence of parameters into a dictionary coding url string request parameters used to construct get and post requests

parmas = {
     ' name ' : ' Maple ' ,
     ' Age ' : 18 is 
} 
parmas_str = parse.urlencode (parmas)
 Print (parmas_str)
 "" " 
name = Maple & Age = 18 is 
" "" 
parmas_str = ' name = Maple & Age = 18 is ' 
# url parameter encoding format deserialize dictionaries 
parmas = parse.parse_qs (parmas_str)
 Print (parmas) 

"" " 
{ 'name': [ 'Maple'], 'Age': [ '18 is']} 
" " "

Method five: quote

You can convert Chinese encoding format for the url

= Word ' China Dream ' 
URL = ' http://www.baidu.com/s?wd= ' + parse.quote (Word)
 Print (parse.quote (Word))
 Print (URL) 

"" " 
% E4% the AD E5%%% B8 9B the BD% A2%%% E6 A6 
http://www.baidu.com/s?wd=%E4%B8%AD%E5%9B%BD%E6%A2%A6 
"" " 
# unquote: the URL encoded can be decoded 
URL = ' http://www.baidu.com/s?wd=%E4%B8%AD%E5%9B%BD%E6%A2%A6 ' 
Print (parse.unquote (url))
 "" " 
http://www.baidu.com/s?wd= China dream 
" ""

 

Guess you like

Origin www.cnblogs.com/angelyan/p/11265614.html