Python study notes (17): the urllib.parse module

urllib.parse defines a standard interface for splitting a URL into its components.
Use the parse module to parse, combine, encode, and decode URLs.
It must be imported before use:

from urllib import parse

urlparse() identifies and splits a URL into its components
url = 'https://book.qidian.com/info/1004608738?wd=123&page=20#Catalog'
"""
url: the URL to parse (passed as the first argument)
scheme='': the default protocol to use when the URL has none; if the URL already has a protocol, this argument has no effect
allow_fragments=True: whether to recognize the anchor (fragment). The default True splits it off into fragment; with False it is left as part of the preceding component and fragment is empty
"""
result = parse.urlparse(url, scheme='http', allow_fragments=True)

print(result)
print(result.scheme)
"""
ParseResult(scheme='https', netloc='book.qidian.com', path='/info/1004608738', params='', query='wd=123&page=20', fragment='Catalog')
https
scheme: the protocol
netloc: the domain name
path: the path
params: parameters for the last path segment
query: the query string, typically used in GET requests
fragment: the anchor, used to jump directly to a specific position on the page, for example a position further down that would otherwise be reached by scrolling
"""
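
The effect of the scheme and allow_fragments arguments described above is easier to see side by side. The following is a minimal sketch, not part of the original notes; the variable names are only illustrative, and the scheme-less URL is written with a leading // so that the host is still recognized as netloc.

```python
from urllib import parse

# The URL has its own scheme, so the scheme argument is ignored.
with_scheme = parse.urlparse('https://book.qidian.com/info/1004608738#Catalog', scheme='http')
print(with_scheme.scheme)     # https

# The URL has no scheme (note the leading //), so the default is used.
without_scheme = parse.urlparse('//book.qidian.com/info/1004608738#Catalog', scheme='http')
print(without_scheme.scheme)  # http
print(without_scheme.netloc)  # book.qidian.com

# allow_fragments=False: '#Catalog' is not split off and stays in the path.
no_fragment = parse.urlparse('https://book.qidian.com/info/1004608738#Catalog', allow_fragments=False)
print(no_fragment.path)       # /info/1004608738#Catalog
print(repr(no_fragment.fragment))  # ''
```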

urlunparse() builds a URL from its components

url_params = ('https', 'book.qidian.com', '/info/1004608738', '', 'wd=123&page=20', 'Catalog')
# components: an iterable that must contain exactly 6 items
result = parse.urlunparse(url_params)
print(result)

"""
https://book.qidian.com/info/1004608738?wd=123&page=20#Catalog
"""
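
Because the result of urlparse() is a named tuple with exactly 6 fields, it can be passed straight back to urlunparse(). The sketch below illustrates that round trip and one common use, changing a single component; the page=21 query is a made-up value for illustration.

```python
from urllib import parse

url = 'https://book.qidian.com/info/1004608738?wd=123&page=20#Catalog'
parts = parse.urlparse(url)

# Round trip: parsing and rebuilding gives back the same URL here.
rebuilt = parse.urlunparse(parts)
print(rebuilt == url)  # True

# _replace() (a named tuple method) returns a copy with one field changed.
next_page = parse.urlunparse(parts._replace(query='wd=123&page=21'))
print(next_page)
# https://book.qidian.com/info/1004608738?wd=123&page=21#Catalog
```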

urljoin() takes a base link and uses it to complete a partial link into a full link

base_url = 'https://book.qidian.com/info/1004608738?wd=123&page=20#Catalog'
sub_url = '/info/100861102'

full_url = parse.urljoin(base_url,sub_url)

print(full_url)
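
How the partial link is completed depends on its form. The following sketch compares three cases against the same base link; the paths 'detail' and '/detail' and the example.com address are hypothetical and only serve to show the rules.

```python
from urllib import parse

base_url = 'https://book.qidian.com/info/1004608738?wd=123&page=20#Catalog'

# Leading slash: the whole path of the base link is replaced.
from_root = parse.urljoin(base_url, '/detail')
print(from_root)   # https://book.qidian.com/detail

# No leading slash: only the last path segment of the base link is replaced.
relative = parse.urljoin(base_url, 'detail')
print(relative)    # https://book.qidian.com/info/detail

# A complete link: the base link is ignored.
absolute = parse.urljoin(base_url, 'https://example.com/other')
print(absolute)    # https://example.com/other
```

In the first two cases the query and anchor of the base link are dropped, since the partial link supplies its own path.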

urlencode() serializes a dictionary into a URL-encoded parameter string of the form k1=v1&k2=v2 (used to build the parameters of GET and POST requests)

params = {
    'wd':'123',
    'page':20
}
params_str = parse.urlencode(params)

print(params_str)

"""
wd=123&page=20
"""
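
urlencode() also escapes characters that would otherwise break the k1=v1&k2=v2 format. A short sketch with made-up values follows: a space becomes '+', a literal '&' inside a value is percent-encoded, and doseq=True expands a list into repeated keys.

```python
from urllib import parse

# Special characters inside a value are escaped.
escaped = parse.urlencode({'wd': 'a b&c', 'page': 20})
print(escaped)   # wd=a+b%26c&page=20

# doseq=True turns a list value into one key=value pair per item.
repeated = parse.urlencode({'tag': ['x', 'y']}, doseq=True)
print(repeated)  # tag=x&tag=y
```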

parse_qs() deserializes a URL-encoded parameter string into a dictionary
params_str = 'page=20&wd=123'
params = parse.parse_qs(params_str)
print(params)

"""
{'page': ['20'], 'wd': ['123']}
"""
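
Each value in the parse_qs() result is a list because a key may appear more than once in a query string. The sketch below shows this, combines parse_qs() with urlparse() to read a parameter from a full URL, and shows parse_qsl(), which returns a list of pairs instead; the 'tag=x&tag=y' string is a made-up example.

```python
from urllib import parse

url = 'https://book.qidian.com/info/1004608738?wd=123&page=20#Catalog'
query = parse.urlparse(url).query        # 'wd=123&page=20'

as_dict = parse.parse_qs(query)
print(as_dict)                           # {'wd': ['123'], 'page': ['20']}
page = int(as_dict['page'][0])           # values are strings inside lists
print(page)                              # 20

as_pairs = parse.parse_qsl(query)
print(as_pairs)                          # [('wd', '123'), ('page', '20')]

# A repeated key collects all of its values in one list.
tags = parse.parse_qs('tag=x&tag=y')
print(tags)                              # {'tag': ['x', 'y']}
```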

 

quote() converts Chinese text into URL-encoded format

word = '中国梦'  # "China Dream"
url = 'http://www.baidu.com/s?wd=' + parse.quote(word)
print(parse.quote(word))
print(url)

"""
%E4%B8%AD%E5%9B%BD%E6%A2%A6
http://www.baidu.com/s?wd=%E4%B8%AD%E5%9B%BD%E6%A2%A6
"""
unquote() decodes a URL-encoded string
url = 'http://www.baidu.com/s?wd=%E4%B8%AD%E5%9B%BD%E6%A2%A6'
print(parse.unquote(url))
"""
http://www.baidu.com/s?wd=中国梦
"""
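
quote() is not limited to Chinese text: it escapes any character that is not safe in a URL, and by default it leaves '/' alone. The following sketch, with made-up input strings, shows the safe argument and the related quote_plus() and unquote_plus() functions, which use '+' for spaces as in form data.

```python
from urllib import parse

# '/' is kept by default; the space becomes %20.
default_safe = parse.quote('/a b/')
print(default_safe)   # /a%20b/

# safe='' escapes '/' as well.
nothing_safe = parse.quote('/a b/', safe='')
print(nothing_safe)   # %2Fa%20b%2F

# quote_plus() encodes a space as '+' and escapes '/'.
plus_encoded = parse.quote_plus('a b/c')
print(plus_encoded)   # a+b%2Fc

# unquote_plus() reverses quote_plus().
plus_decoded = parse.unquote_plus(plus_encoded)
print(plus_decoded)   # a b/c
```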

 


Origin www.cnblogs.com/wuzm/p/11655035.html