Introduction
A URL is made up of several components, including the scheme (protocol), hostname, path, and query string. The task is to extract the protocol and hostname from a given URL using regular expressions.
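To make the components concrete, here is a minimal sketch that splits a URL with the standard library's urllib.parse.urlsplit. The URL is a made-up example, and this is only a cross-check of what each component looks like, not the regex method used in the rest of this post.

```python
from urllib.parse import urlsplit

# Hypothetical example URL, chosen only for illustration
parts = urlsplit("https://www.example.com:8080/docs/index.html?lang=en")

print("Scheme:  ", parts.scheme)    # the protocol, e.g. https
print("Hostname:", parts.hostname)  # host name without the port
print("Port:    ", parts.port)      # integer port, or None if absent
print("Path:    ", parts.path)
print("Query:   ", parts.query)
```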
Program
Approach 1
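import re

# Read the URL to analyse
ip_url = input("Enter the url: ")

# The protocol is the run of word characters just before "://"
protocol = re.findall(r'(\w+)://', ip_url)
print("Protocol: ", protocol)

# The hostname is what follows "://www." (word characters, hyphens, dots)
hostname = re.findall(r'://www\.([\w\-\.]+)', ip_url)
print("Hostname: ", hostname)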
Approach 2
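import re

# Read the URL to analyse
ip_url = input("Enter the url: ")

# The protocol is the run of word characters just before "://"
protocol = re.findall(r'(\w+)://', ip_url)
print("Protocol: ", protocol)

# Hostname with an optional ":port" suffix; findall returns one tuple per match
hostname = re.findall(r'://([\w\-\.]+)(:(\d+))?', ip_url)
print("Hostname: ", hostname)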
Explanation
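In the first approach, the regular expression that extracts the protocol is '(\w+)://'. The hostname pattern looks for '://www.' and then captures the run of word characters, hyphens, and dots that follows with '([\w\-\.]+)', so it only matches URLs whose hostname starts with 'www.'. The metacharacter '\w' matches a word character: a letter, a digit, or an underscore.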
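The first approach can be sketched as a small reusable function, which makes it easy to try the patterns on several URLs without typing them in. The function name extract_protocol_and_host and the sample URLs are assumptions made for this illustration.

```python
import re

def extract_protocol_and_host(url):
    """Return (protocols, hostnames) as the lists produced by re.findall."""
    protocols = re.findall(r'(\w+)://', url)
    # Only hostnames that begin with "www." are matched by this pattern
    hostnames = re.findall(r'://www\.([\w\-\.]+)', url)
    return protocols, hostnames

# Hypothetical sample URLs
print(extract_protocol_and_host("https://www.example.com/index.html"))
print(extract_protocol_and_host("http://example.org/"))
```

The second call returns an empty hostname list because the URL has no "www." prefix, which is the limitation the second approach removes.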
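In the second approach, we extract the protocol and hostname from URLs that may also include a port number. The metacharacter '\d' matches a digit, and '?' makes the preceding group optional, so the pattern matches whether or not a port is present. Because the hostname pattern contains three groups, re.findall returns one tuple per match holding the hostname, the port with its leading colon, and the port digits alone.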
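As a rough sketch of how the optional port group behaves, the snippet below reuses the second approach's pattern but swaps re.findall for re.search so that the hostname and port come back as plain values. The names HOST_PORT and host_and_port and the sample URL are hypothetical.

```python
import re

# Same pattern as the second approach: hostname, then an optional ":port"
HOST_PORT = r'://([\w\-\.]+)(:(\d+))?'

def host_and_port(url):
    """Return (hostname, port) with port as an int or None; None if no match."""
    match = re.search(HOST_PORT, url)
    if match is None:
        return None
    host = match.group(1)
    port = match.group(3)  # None when the optional group did not take part
    return host, (int(port) if port else None)

# findall yields a tuple of all three groups for each match
print(re.findall(HOST_PORT, "http://localhost:8080/api"))
print(host_and_port("http://localhost:8080/api"))
print(host_and_port("https://www.example.com/"))
```

With re.findall, a group that did not take part in the match shows up as an empty string, while re.search reports it as None, which is why the helper checks the port before converting it.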