How to extract the domain name from a URL with Python

This article uses practical examples to show how Python's urlparse() function parses a URL and extracts its domain name. It also describes the components that urlparse() splits a URL into and how to access them.

Extract domain names from URLs with urlparse()

The urlparse() function is part of Python's urllib.parse module. It is useful when you need to split a URL into its components and use them for different purposes. Let's look at this example:

from urllib.parse import urlparse
component = urlparse('http://www.google.com/doodles/mothers-day-2021-april-07')
print(component)

In this code snippet, we first import the urlparse function from the urllib.parse module. Then, we pass a URL to urlparse(). The return value is a ParseResult object, a named tuple with six elements:

  • scheme – the protocol used to fetch the resource, for example http or https.
  • netloc – the network location ("net" for network, "loc" for location), that is, the host part of the URL, optionally with a port or login details.
  • path – the hierarchical path to the resource on the server.
  • params – parameters for the last path segment, written after a semicolon.
  • query – the query string that follows the path after a "?", typically key=value pairs passed to the resource.
  • fragment – the part after "#", which identifies a section within the resource.
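To make the six fields concrete, here is a minimal sketch using a made-up URL (www.example.com and its path, parameters, query and fragment are assumptions chosen only so that no field is empty). Because the result is a named tuple, each field can be read by name or by position.

```python
from urllib.parse import urlparse

# Hypothetical URL, chosen so that every component is non-empty.
url = 'https://www.example.com/docs/page;version=2?lang=en&sort=asc#install'
parts = urlparse(url)

print(parts.scheme)    # https
print(parts.netloc)    # www.example.com
print(parts.path)      # /docs/page
print(parts.params)    # version=2
print(parts.query)     # lang=en&sort=asc
print(parts.fragment)  # install

# Fields are also available by index, in the order listed above.
print(parts[1] == parts.netloc)  # True
```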

When we print the component object from the example, it displays the value of each field. For the Google URL, the output is:

ParseResult(scheme='http', netloc='www.google.com', path='/doodles/mothers-day-2021-april-07', params='', query='', fragment='')

You can see from the output that all URL components are separated out and stored in the object as separate elements. We can get the value of any component by using its name like this:

from urllib.parse import urlparse
domain_name = urlparse('http://www.google.com/doodles/mothers-day-2021-april-07').netloc
print(domain_name)

The netloc component gives us the domain name of the URL. The output is:

www.google.com
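Note that netloc is the whole network location, so it can contain more than the host name. The sketch below uses a hypothetical URL with login details and a port to compare netloc with the hostname and port attributes that the parse result also provides. None of these values come from the example above.

```python
from urllib.parse import urlparse

# Hypothetical URL with credentials and a port, for illustration only.
parsed = urlparse('https://user:secret@WWW.Example.com:8443/index.html')

print(parsed.netloc)    # user:secret@WWW.Example.com:8443
print(parsed.hostname)  # www.example.com (credentials and port removed, lowercased)
print(parsed.port)      # 8443
```

If only the host is needed, hostname is usually the safer attribute. Neither attribute strips a subdomain such as "www", because urlparse() has no knowledge of which suffixes are registrable domains.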

This way, we can parse a URL and use its individual components in our programs.
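One caveat when reusing this approach: urlparse() only recognizes a network location when it is introduced by "//", so a string such as www.google.com/doodles is parsed entirely as a path and netloc is empty. The helper below, extract_hostname, is a hypothetical name and a minimal sketch. It assumes that any input without a scheme is a web address that starts with the host, which will not hold for every kind of URL.

```python
from urllib.parse import urlparse

def extract_hostname(url):
    """Return the lowercased host of url, or None if no host is found."""
    # Assumption: a string without "://" or a leading "//" starts with the host,
    # so we prepend "//" to make urlparse() treat it as a network location.
    if '://' not in url and not url.startswith('//'):
        url = '//' + url
    return urlparse(url).hostname

# Without a scheme, urlparse() alone finds no network location.
print(repr(urlparse('www.google.com/doodles').netloc))   # ''

print(extract_hostname('www.google.com/doodles'))         # www.google.com
print(extract_hostname('http://www.google.com/doodles'))  # www.google.com
```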
