Program for parsing and processing the given URL using regex

Introduction

A URL typically consists of several components: a scheme (protocol), a hostname (optionally followed by a port number), a path and a query string. The task is to extract the protocol and the hostname from a given URL.
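For comparison, Python's standard library can already split a URL into these components without a regex. The following is a minimal sketch using urllib.parse.urlsplit. The sample URL is made up for illustration, and this is a swapped-in standard-library technique rather than the regex approach used below.

```python
from urllib.parse import urlsplit

# Made-up sample URL, used only to show each component
parts = urlsplit("https://www.example.com:8080/docs/index.html?lang=en")

print("Scheme:  ", parts.scheme)    # https
print("Hostname:", parts.hostname)  # www.example.com
print("Port:    ", parts.port)      # 8080 (returned as an int)
print("Path:    ", parts.path)      # /docs/index.html
print("Query:   ", parts.query)     # lang=en
```

This is handy for checking what the regex-based programs below should return for the same input.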

Program


Approach 1

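import re

# Read the URL from the user, e.g. https://www.example.com/index.html
ip_url = input("Enter the url: ")

# (\w+):// captures the scheme (protocol) that precedes "://"
protocol = re.findall(r'(\w+)://', ip_url)
print("Protocol: ", protocol)

# [\w\-\.]+ captures the hostname characters that follow "://"
hostname = re.findall(r'://([\w\-\.]+)', ip_url)
print("Hostname: ", hostname)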

Output

[Image: output of Approach 1]

Approach 2

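import re

# Read the URL from the user, e.g. http://localhost:8080/index.html
ip_url = input("Enter the url: ")

# (\w+):// captures the scheme (protocol) that precedes "://"
protocol = re.findall(r'(\w+)://', ip_url)
print("Protocol: ", protocol)

# Capture the hostname and, if present, the ":port" suffix and the port digits
hostname = re.findall(r'://([\w\-\.]+)(:(\d+))?', ip_url)
print("Hostname: ", hostname)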

Output

[Image: output of Approach 2]

Explanation

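In the first approach, the regular expression for extracting the protocol is ‘(\w+)://’ and the one for extracting the hostname is ‘://([\w\-\.]+)’. The metacharacter ‘\w’ matches any word character (a letter, digit or underscore), and the parentheses mark the part of the match that re.findall() returns. The character class ‘[\w\-\.]’ additionally allows hyphens and dots, so the hostname match stops at the first character outside that set, such as ‘/’ or ‘:’.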
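To see what these two patterns return, here is a small sketch that runs them on a fixed, made-up sample URL instead of reading it with input():

```python
import re

# Fixed sample URL (illustrative only) so the result is reproducible
url = "https://www.example.com/blog/post?id=42"

# (\w+):// captures the word characters just before "://", i.e. the scheme
protocol = re.findall(r'(\w+)://', url)

# [\w\-\.]+ matches letters, digits, underscores, hyphens and dots,
# so the match stops at the "/" that begins the path
hostname = re.findall(r'://([\w\-\.]+)', url)

print("Protocol: ", protocol)  # ['https']
print("Hostname: ", hostname)  # ['www.example.com']
```

Because each pattern has a single capturing group, re.findall() returns a list of plain strings.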

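In the second approach, the protocol and hostname are extracted from URLs that may also include a port number. The metacharacter ‘\d’ matches a digit, and ‘?’ makes the preceding group ‘(:(\d+))’ optional, so URLs without a port still match. Because this pattern contains three groups, re.findall() returns one tuple per match holding the hostname, the ‘:port’ text and the port digits.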
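The sketch below shows the tuples that the Approach 2 pattern produces, with and without a port, on made-up sample URLs. It then shows a variant (not part of the original program) that uses named groups and a non-capturing group ‘(?:...)’ with re.search(), so the host and port can be read by name.

```python
import re

url = "http://localhost:8080/api/items"  # illustrative sample URL

# Approach 2 pattern: three groups, so findall returns 3-tuples
with_port = re.findall(r'://([\w\-\.]+)(:(\d+))?', url)
print(with_port)     # [('localhost', ':8080', '8080')]

# Without a port, the optional groups come back as empty strings
without_port = re.findall(r'://([\w\-\.]+)(:(\d+))?', "http://example.com/")
print(without_port)  # [('example.com', '', '')]

# Variant: named groups plus a non-capturing group for the colon
m = re.search(r'://(?P<host>[\w\-\.]+)(?::(?P<port>\d+))?', url)
print(m.group('host'), m.group('port'))  # localhost 8080
```

With the named-group variant, a URL without a port gives None for the port group instead of an empty string.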

Author

  • Barry Allen

    A Full Stack Developer with 10+ years of experience in different domains, including SAP, Blockchain, AI and Web Development.

