Program to remove duplicate lines from a text file using Python

Introduction

The task is to remove duplicate lines from a text file, keeping the first occurrence of each line in its original order. This is useful for cleaning up large files that contain many repeated lines.

Program

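# Open the input file for reading and the output file for writing
with open("demo.txt", "r") as openFile, open("updatedFile", "w") as writeFile:
    # Store the lines that have already been written
    tmp = set()
    for txtLine in openFile:
        # Write the line only the first time it is seen
        if txtLine not in tmp:
            writeFile.write(txtLine)
            # Remember the line so that later copies are skipped
            tmp.add(txtLine)
# Both files are closed automatically when the with block ends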

Output

[Image: output of the program]

Explanation

Approach:

  • Open the input file in read mode and the output file in write mode.
  • Create an empty set to hold the lines that have already been seen.
  • Iterate over the lines of the input file. If a line is not in the set, write it to the output file and add it to the set.
  • If a line is already in the set, it is a duplicate: skip it and move on to the next line.
  • Once every line has been processed, make sure both files are closed so that the output is written to disk.
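
The steps above can also be wrapped in a reusable function. The following is only a minimal sketch, not the program shown earlier: the function name remove_duplicate_lines and its two parameters are hypothetical, and it makes one extra assumption, namely that lines are compared without their trailing newline. Without that, a last line that does not end in a newline would be treated as different from an identical earlier line.

```python
import os
import tempfile


def remove_duplicate_lines(src_path, dst_path):
    """Copy src_path to dst_path, keeping only the first copy of each line.

    Returns the number of duplicate lines that were skipped.
    (Hypothetical helper, written for illustration only.)
    """
    seen = set()
    skipped = 0
    with open(src_path, "r") as src, open(dst_path, "w") as dst:
        for line in src:
            # Assumption: compare lines without their trailing newline,
            # so a final line that lacks "\n" still matches earlier copies.
            key = line.rstrip("\n")
            if key in seen:
                skipped += 1
                continue
            seen.add(key)
            dst.write(key + "\n")
    return skipped


if __name__ == "__main__":
    # Small self-contained demo that uses temporary files
    with tempfile.TemporaryDirectory() as tmp_dir:
        src_file = os.path.join(tmp_dir, "demo.txt")
        dst_file = os.path.join(tmp_dir, "updatedFile")
        with open(src_file, "w") as f:
            f.write("apple\nbanana\napple\ncherry\nbanana")
        print(remove_duplicate_lines(src_file, dst_file))
        with open(dst_file) as f:
            print(f.read(), end="")
```

The set keeps every distinct line in memory, so memory use grows with the number of unique lines rather than with the total size of the file. Order is preserved because each line is written the moment it is first seen.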
