How to replace HTML codes in an HTML file using Python?


I’m trying to replace all HTML codes in my HTML file in a for loop (not sure if this is the easiest approach) without changing the formatting of the original file. When I run the code below, the codes are not replaced. Does anyone know what could be wrong?

import re
tex = open('ALICE.per-txt.txt', 'r')

for i in tex:
  if i == '&otilde;':
    ...
  elif i == '&ccedil;':
    ...

with open('Alice1.replaced.txt', "w") as f:
  ...


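You can use html.unescape from the standard library. It converts HTML character references, both named and numeric, to the corresponding Unicode characters.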

>>> import html
>>> html.unescape('&otilde;')
'õ'
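To show the effect on more than a single code, here is a minimal sketch that applies html.unescape to a whole string. The sample text is made up for illustration and is not taken from ALICE.per-txt.txt. It mixes named and numeric references with ordinary markup.

```python
import html

# Hypothetical sample text (not from the original file): named references,
# one numeric reference (&#231;), a line break, indentation and a tag.
sample = "cora&ccedil;&atilde;o &amp; can&#231;&otilde;es\n  <p>unchanged</p>"

# Only the character references are converted; tags and whitespace stay as they are.
decoded = html.unescape(sample)
print(decoded)
```

Because everything apart from the references is left alone, the layout of the text should not change.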

With your code:

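import html

# Read the whole file into one string
with open('ALICE.per-txt.txt', 'r') as f:
    html_text = f.read()

# Convert character references such as &otilde; to the characters they stand for
html_text = html.unescape(html_text)

# Write the converted text back to the same file
with open('ALICE.per-txt.txt', 'w') as f:
    f.write(html_text)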

Please note that I opened the files with a with statement. This takes care of closing the file after the with block – something you forgot to do when reading the file.
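
If the original file should stay untouched, the same idea can write to a second file instead, as the Alice1.replaced.txt in the question suggests. This is only a sketch: the helper name unescape_file and the UTF-8 encoding are assumptions, not something stated in the question.

```python
import html


def unescape_file(src_path, dst_path, encoding="utf-8"):
    """Decode HTML character references in src_path and write the result to dst_path.

    unescape_file is a hypothetical helper; the UTF-8 default is an assumption
    about how the file is encoded.
    """
    # newline="" turns off newline translation, so line endings are kept as they are.
    with open(src_path, "r", encoding=encoding, newline="") as src:
        text = src.read()

    with open(dst_path, "w", encoding=encoding, newline="") as dst:
        dst.write(html.unescape(text))
```

It could be called as unescape_file('ALICE.per-txt.txt', 'Alice1.replaced.txt'). Both files are opened in with blocks, so each is closed as soon as its block ends.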

Answered By – Matthias

Answer Checked By – Marie Seifert (AngularFixing Admin)

