Getting specific text from HTML using BeautifulSoup

Issue

I have this .html code:

<div id="content">
            <ul id="tree">
                <li xmlns="" class="level top failed open">
                    <span><em class="time">
                            <div class="time">1.89 s</div>
                        </em>I need to get this text</span>

I need to get only the text that sits directly inside the span, outside all of the nested tags (the text is: I need to get this text).

I was trying to use this piece of code:

path = document.find('li', class_='level top').find_all("em")[-1].next_sibling
if not path:
    path = document.find('li', class_='level top failed open').find_all("em")[-1].next_sibling
return path

But I get an error: AttributeError: 'NoneType' object has no attribute 'find_all'.

Does anybody know how to access this text?

Thank you!

Solution

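The error comes from the first find() call: a class_ value with spaces is compared against the whole class attribute, so 'level top' does not match class="level top failed open" and find() returns None. To get the text, select the span and use .contents, which returns a list of its direct children. The text you want is the last item, so index it with [-1].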

html = '''
<div id="content">
 <ul id="tree">
  <li class="level top failed open" xmlns="">
   <span>
    <em class="time">
     <div class="time">
      1.89 s
     </div>
    </em>
    I need to get this text
   </span>
  </li>
 </ul>
</div>

'''

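from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'html.parser')
# print(soup.prettify())  # uncomment to inspect the parsed tree

# .contents lists the direct children of the span; the trailing text node is last.
txt = soup.select_one('#tree > li > span').contents[-1]
print(txt)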

Output:

  I need to get this text
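
The text node above keeps its surrounding whitespace, and indexing with [-1] assumes the text is always the last child. A minimal sketch of a slightly more defensive variant follows, using the markup from the question. The helper name own_text is hypothetical (it is not part of BeautifulSoup); it collects only the text nodes that are direct children of a tag and strips them. The sketch also shows why the original find() call returned None.

```python
from bs4 import BeautifulSoup, NavigableString

# Markup copied from the question, with the closing tags added.
html = '''
<div id="content">
  <ul id="tree">
    <li xmlns="" class="level top failed open">
      <span><em class="time">
          <div class="time">1.89 s</div>
        </em>I need to get this text</span>
    </li>
  </ul>
</div>
'''

soup = BeautifulSoup(html, 'html.parser')

# A multi-word class_ string is compared against the full class attribute,
# so 'level top' does not match 'level top failed open' and find() gives None.
missing = soup.find('li', class_='level top')

# A CSS selector matches on individual classes, whatever else is present.
li = soup.select_one('li.level.top')


def own_text(tag):
    """Hypothetical helper: join the text nodes that are direct children of tag."""
    parts = (child.strip() for child in tag.children
             if isinstance(child, NavigableString))
    return ' '.join(part for part in parts if part)


text = own_text(li.span)
print(missing)  # None
print(text)     # I need to get this text
```

Because own_text skips nested tags entirely, the "1.89 s" inside the em element is never included, and the result does not depend on where the text sits among the span's children.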

Answered By – F.Hoque

Answer Checked By – Pedro (AngularFixing Volunteer)
