I. INTRODUCTION
urllib defines a standard interface for working with URLs.
Its parse module handles the common URL operations: parsing, combining, encoding, and decoding.
II. THE CODE
Method One: urlparse
urlparse identifies the components of a URL and splits the URL into them.
from urllib import parse

url = 'https://www.cnblogs.com/angelyan/'
"""
url: the URL to parse
scheme='': default protocol, used only when the URL itself has no protocol; if the URL already has one, this parameter has no effect
allow_fragments=True: whether to split off the anchor (fragment). With the default True it is parsed into the fragment field; with False it is not recognized and stays part of the preceding component
"""
result = parse.urlparse(url, scheme='http', allow_fragments=True)
print(result)
print(result.scheme)
"""
ParseResult(scheme='https', netloc='www.cnblogs.com', path='/angelyan/', params='', query='', fragment='')
https

scheme: the protocol
netloc: the domain name
path: the path
params: the parameters
query: the query string, typically used by GET requests
fragment: the anchor, used to jump straight to a specific position on the page
"""
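The effect of the scheme and allow_fragments parameters is easier to see on URLs that actually exercise them. The following is a minimal sketch; the scheme-less URL and the "#log" anchor are illustrative inputs chosen for this example.

```python
from urllib.parse import urlparse

# No scheme in the URL: the `scheme` argument is used as the default.
with_default = urlparse('//www.cnblogs.com/angelyan/', scheme='http')
print(with_default.scheme)   # http

# The URL already has a scheme: the `scheme` argument is ignored.
with_own = urlparse('https://www.cnblogs.com/angelyan/', scheme='http')
print(with_own.scheme)       # https

# allow_fragments=False: '#log' is not split off and stays in the path.
no_frag = urlparse('https://www.cnblogs.com/angelyan/#log',
                   allow_fragments=False)
print(no_frag.path)          # /angelyan/#log
print(repr(no_frag.fragment))  # ''
```

Note that the default scheme only applies when the URL starts with "//" or has no network location at all; it never overrides a scheme that is already present.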
Method Two: urlunparse
urlunparse builds a URL from its components.
url_params = ('https', 'www.cnblogs.com', '/angelyan/', '', 'name=maple', 'log')
# components: must be an iterable of exactly 6 items
result = parse.urlunparse(url_params)
print(result)
"""
https://www.cnblogs.com/angelyan/?name=maple#log
"""
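Since urlparse returns a six-field named tuple, its result can be passed straight back to urlunparse. A small sketch of that round trip follows; the replacement query value "name=angel" is made up for illustration.

```python
from urllib.parse import urlparse, urlunparse

original = 'https://www.cnblogs.com/angelyan/?name=maple#log'
parts = urlparse(original)

# Round trip: parsing and then unparsing gives back the same URL.
rebuilt = urlunparse(parts)
print(rebuilt == original)   # True

# ParseResult is a named tuple, so _replace() returns a modified copy.
changed = urlunparse(parts._replace(query='name=angel', fragment=''))
print(changed)               # https://www.cnblogs.com/angelyan/?name=angel
```

This is a convenient way to change one part of a URL without string slicing.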
Method Three: urljoin
Given a base URL, urljoin combines it with any incomplete link to produce a full URL.
base_url = 'https://www.cnblogs.com'
sub_url = '/angelyan/?name=maple#log'
full_url = parse.urljoin(base_url, sub_url)
print(full_url)
"""
https://www.cnblogs.com/angelyan/?name=maple#log
"""
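How urljoin resolves the second argument depends on its form. The sketch below shows three common cases; the page names "1.html" and "2.html" and the "/other/" path are hypothetical and only serve the example.

```python
from urllib.parse import urljoin

base = 'https://www.cnblogs.com/angelyan/p/1.html'

# A relative name replaces the last segment of the base path.
sibling = urljoin(base, '2.html')
print(sibling)    # https://www.cnblogs.com/angelyan/p/2.html

# A link starting with '/' replaces the whole path of the base.
rooted = urljoin(base, '/other/')
print(rooted)     # https://www.cnblogs.com/other/

# A link that is already a complete URL is returned unchanged.
absolute = urljoin(base, 'https://www.baidu.com/s')
print(absolute)   # https://www.baidu.com/s
```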
Method Four: urlencode
urlencode serializes a dictionary of parameters into a URL-encoded string, which is used to build the request parameters for GET and POST requests.
params = {'name': 'maple', 'age': 18}
params_str = parse.urlencode(params)
print(params_str)
"""
name=maple&age=18
"""

params_str = 'name=maple&age=18'
# parse_qs deserializes a URL-encoded parameter string back into a dictionary
params = parse.parse_qs(params_str)
print(params)
"""
{'name': ['maple'], 'age': ['18']}
"""
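parse_qs returns every value as a list of strings because a key may appear more than once in a query string. The sketch below shows how repeated keys are handled in both directions; the "tag" parameter is an assumption added for illustration.

```python
from urllib.parse import urlencode, parse_qs, parse_qsl

# doseq=True expands a list value into one key=value pair per item.
encoded = urlencode({'name': 'maple', 'tag': ['a', 'b']}, doseq=True)
print(encoded)   # name=maple&tag=a&tag=b

# parse_qs collects repeated keys into one list per key.
as_dict = parse_qs(encoded)
print(as_dict)   # {'name': ['maple'], 'tag': ['a', 'b']}

# parse_qsl keeps the pairs in order as a list of tuples.
as_pairs = parse_qsl('name=maple&age=18')
print(as_pairs)  # [('name', 'maple'), ('age', '18')]

# Numbers come back as strings, so convert them explicitly.
age = int(dict(as_pairs)['age'])
print(age)       # 18
```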
Method Five: quote
quote converts Chinese (or any other non-ASCII) text into URL-encoded form.
word = '中国梦'
url = 'http://www.baidu.com/s?wd=' + parse.quote(word)
print(parse.quote(word))
print(url)
"""
%E4%B8%AD%E5%9B%BD%E6%A2%A6
http://www.baidu.com/s?wd=%E4%B8%AD%E5%9B%BD%E6%A2%A6
"""

# unquote: decodes a URL-encoded string
url = 'http://www.baidu.com/s?wd=%E4%B8%AD%E5%9B%BD%E6%A2%A6'
print(parse.unquote(url))
"""
http://www.baidu.com/s?wd=中国梦
"""
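By default quote leaves "/" unescaped and encodes a space as "%20", which is not always what a query value needs. The following sketch compares it with the safe parameter and with quote_plus; the sample strings are made up for this example.

```python
from urllib.parse import quote, quote_plus, unquote_plus

# Default: '/' is treated as safe, a space becomes %20.
path_style = quote('/angelyan/my page')
print(path_style)    # /angelyan/my%20page

# safe='' escapes '/' as well, useful for a value inside a query.
fully_escaped = quote('/angelyan/my page', safe='')
print(fully_escaped) # %2Fangelyan%2Fmy%20page

# quote_plus encodes a space as '+' and escapes '&'.
form_style = quote_plus('a b&c')
print(form_style)    # a+b%26c

# unquote_plus reverses quote_plus.
decoded = unquote_plus(form_style)
print(decoded)       # a b&c
```

urlencode applies quote_plus to each key and value by default, so it is rarely necessary to call quote by hand when building a whole query string.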