urllib.parse: this module defines a standard interface for splitting a URL into its components and extracting them.
The parse module is used for parsing, combining, encoding, and decoding URLs.
It must be imported before use:
from urllib import parse
urlparse() identifies a URL and splits it into its components

url = 'https://book.qidian.com/info/1004608738?wd=123&page=20#Catalog'
"""
url: the URL to be parsed
scheme='': the default protocol to use when the URL has no protocol of its own;
           if the URL already has one, this parameter has no effect
allow_fragments=True: whether to recognize the anchor (fragment); the default True
           splits it into its own field, False leaves it attached to the
           preceding component and sets fragment to ''
"""
result = parse.urlparse(url=url, scheme='http', allow_fragments=True)
print(result)
print(result.scheme)
"""
ParseResult(scheme='https', netloc='book.qidian.com', path='/info/1004608738', params='', query='wd=123&page=20', fragment='Catalog')
https

scheme: the protocol
netloc: the domain name
path: the path
params: the parameters
query: the query string, typically the parameters of a GET request
fragment: the anchor, used to jump directly to a specific position on the page
"""
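The effect of the scheme and allow_fragments parameters is easier to see on inputs where they actually matter. The following is a minimal sketch, not part of the original example; the scheme-less URL and the variable names no_scheme and kept are illustrative assumptions.

```python
from urllib import parse

# A URL without a protocol: the default passed via `scheme` is used.
no_scheme = parse.urlparse('//book.qidian.com/info/1004608738', scheme='http')
print(no_scheme.scheme)   # http
print(no_scheme.netloc)   # book.qidian.com

# With allow_fragments=False the '#Catalog' part is not split off;
# it stays attached to the query and `fragment` is empty.
kept = parse.urlparse(
    'https://book.qidian.com/info/1004608738?wd=123&page=20#Catalog',
    allow_fragments=False,
)
print(kept.query)           # wd=123&page=20#Catalog
print(repr(kept.fragment))  # ''
```

When the URL already carries a protocol, as in the second call, the scheme argument would be ignored.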
urlunparse() builds a URL from its components

url_params = ('https', 'book.qidian.com', '/info/1004608738', '', 'wd=123&page=20', 'Catalog')
# components: an iterable that must contain exactly 6 items
result = parse.urlunparse(url_params)
print(result)
"""
https://book.qidian.com/info/1004608738?wd=123&page=20#Catalog
"""
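Because the result of urlparse() is a named tuple with exactly six fields, it can be passed straight back to urlunparse(). A short sketch of one possible use, changing a single component and rebuilding the URL, is shown here; the new query value and the names parts, next_page, and rebuilt are made up for illustration.

```python
from urllib import parse

parts = parse.urlparse('https://book.qidian.com/info/1004608738?wd=123&page=20#Catalog')

# ParseResult is a named tuple, so _replace() returns a copy with one field changed.
next_page = parts._replace(query='wd=123&page=21')

rebuilt = parse.urlunparse(next_page)
print(rebuilt)  # https://book.qidian.com/info/1004608738?wd=123&page=21#Catalog
```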
urljoin() takes a base link and uses it to complete a partial link into a full link

base_url = 'https://book.qidian.com/info/1004608738?wd=123&page=20#Catalog'
sub_url = '/info/100861102'
full_url = parse.urljoin(base_url, sub_url)
print(full_url)
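How urljoin() completes a link depends on the form of the second argument. The sketch below runs a few common forms against the same base link; the relative links other than '/info/100861102' are illustrative assumptions, not taken from the original.

```python
from urllib import parse

base_url = 'https://book.qidian.com/info/1004608738?wd=123&page=20#Catalog'

# Absolute path: replaces the whole path of the base link.
from_root = parse.urljoin(base_url, '/info/100861102')
print(from_root)      # https://book.qidian.com/info/100861102

# Relative path: replaces only the last path segment of the base link.
from_sibling = parse.urljoin(base_url, '100861102')
print(from_sibling)   # https://book.qidian.com/info/100861102

# Query only: keeps the base path, replaces the query.
from_query = parse.urljoin(base_url, '?page=21')
print(from_query)     # https://book.qidian.com/info/1004608738?page=21

# Already a complete link: the base link is ignored.
from_full = parse.urljoin(base_url, 'https://www.baidu.com/s')
print(from_full)      # https://www.baidu.com/s
```

In every case the query and anchor of the base link are dropped unless the second argument leaves them untouched and supplies nothing of its own.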
urlencode() serializes a dictionary into a URL-encoded parameter string of the form k1=v1&k2=v2 (used to build the parameters of GET and POST requests)

params = {
    'wd': '123',
    'page': 20
}
params_str = parse.urlencode(params)
print(params_str)
"""
wd=123&page=20
(on Python 3.7+ the keys keep their insertion order; older versions may print page=20&wd=123)
"""

parse_qs() deserializes a URL-encoded parameter string back into a dictionary

params_str = 'page=20&wd=123'
params = parse.parse_qs(params_str)
print(params)
"""
{'page': ['20'], 'wd': ['123']}
"""
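urlencode() and parse_qs() are roughly inverse operations, with two details worth noting: parse_qs() always returns lists of strings, and urlencode() needs doseq=True to expand a list value into repeated keys. The following sketch illustrates both under assumed inputs; the 'tag' key and the names flat and multi are hypothetical.

```python
from urllib import parse

params = {'wd': '123', 'page': 20}
url = 'https://book.qidian.com/info/1004608738?' + parse.urlencode(params)
print(url)  # https://book.qidian.com/info/1004608738?wd=123&page=20

# Round trip: pull the query back out and decode it.
decoded = parse.parse_qs(parse.urlparse(url).query)
print(decoded)  # {'wd': ['123'], 'page': ['20']}

# Every value comes back as a list of strings; keep the first item of each
# when a key is known to appear only once.
flat = {key: values[0] for key, values in decoded.items()}
print(flat)  # {'wd': '123', 'page': '20'}

# A list value becomes repeated keys only when doseq=True is passed.
multi = parse.urlencode({'tag': ['a', 'b']}, doseq=True)
print(multi)  # tag=a&tag=b
```

Note that the integer 20 does not survive the round trip as an integer; it comes back as the string '20'.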
quote() converts Chinese characters into URL-encoded (percent-encoded) form

word = '中国梦'  # "China Dream"
url = 'http://www.baidu.com/s?wd=' + parse.quote(word)
print(parse.quote(word))
print(url)
"""
%E4%B8%AD%E5%9B%BD%E6%A2%A6
http://www.baidu.com/s?wd=%E4%B8%AD%E5%9B%BD%E6%A2%A6
"""

unquote() decodes a URL-encoded string

url = 'http://www.baidu.com/s?wd=%E4%B8%AD%E5%9B%BD%E6%A2%A6'
print(parse.unquote(url))
"""
http://www.baidu.com/s?wd=中国梦
"""
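quote() leaves '/' unescaped by default and encodes a space as %20, which may or may not be what a query parameter needs. The sketch below compares quote(), its safe parameter, and the related quote_plus() and unquote_plus() functions; the sample string 'a b/c' is an assumption chosen only to show the differences.

```python
from urllib import parse

text = 'a b/c'

# Default: '/' is treated as safe and kept, the space becomes %20.
default_quoted = parse.quote(text)
print(default_quoted)   # a%20b/c

# safe='' escapes '/' as well, useful for a single path segment or value.
strict_quoted = parse.quote(text, safe='')
print(strict_quoted)    # a%20b%2Fc

# quote_plus() follows form encoding: space becomes '+', '/' is escaped.
plus_quoted = parse.quote_plus(text)
print(plus_quoted)      # a+b%2Fc

# unquote_plus() reverses quote_plus(); plain unquote() would leave the '+'.
print(parse.unquote_plus(plus_quoted))  # a b/c

# Round trip for the Chinese example.
word = '中国梦'
assert parse.unquote(parse.quote(word)) == word
```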