Parsing a list of <li> elements organized in two columns within a row(cell) with BeautifulSoup & Python

Ovidiu Diaconu :
<div id="b_detalii_caracteristici" class="margin-boxes">
  <h2 class="titlu-box special-caracteristici">Caracteristici</h2>
  <div class="row">
    <div class="col-lg-6 col-md-6 col-sm-6">
      <ul class="lista-tabelara">
        <li>Nr. camere:<span>2</span></li>
        <li>Suprafaţă utilă:<span>44 mp</span></li>
        <li>Suprafaţă construită:<span>44 mp</span></li>
        <li>Compartimentare:<span>decomandat</span></li>
        <li>Confort:<span>lux</span></li>
        <li>Etaj:<span>Etaj 1 / 8</span></li>
        <li>Nr. bucătării:<span>1</span></li>
        <li>Nr. băi:<span>1</span></li>
      </ul>
    </div>
    <div class="col-lg-6 col-md-6 col-sm-6">
      <ul class="lista-tabelara mobile-list">
        <li>An construcţie:<span>2019</span></li>
        <li>Structură rezistenţă:<span>beton</span></li>
        <li>Tip imobil:<span>bloc de apartamente</span></li>
        <li>Regim înălţime:<span>P+8E</span></li>
        <li>Nr. balcoane:<span>1</span></li>
      </ul>
    </div>
  </div>
</div>

Given the structure above, I need to find a way to parse it and store each of the li values in a separate variable, i.e.:

if string == "Nr. camere:":
    var1 = "2"
elif string == "Suprafata utila:":
    var2 = "44 mp"

and so on...

I have tried:

property_detail.find_all('div', id="b_detalii_caracteristici")[0].find_all('ul', class_='lista-tabelara')[0].find_all("li")[0]

This gives me results that I would still need to parse in a for loop, but I'm stuck here. Thanks for the support.

Ahmed Soliman :

There is a very useful attribute for this called contents, which returns a list containing a tag's children:

from bs4 import BeautifulSoup

html = '''<div id='b_detalii_caracteristici'>
    <ul class="lista-tabelara">
        <li>
            "Nr. camere:"
            <span>2</span>
        </li>
        <li>
            "Suprafata utila:"
            <span>44mp</span>
        </li>
    </ul>
</div>'''
soup = BeautifulSoup(html, 'html.parser')
# Select every <li> inside the characteristics list
lis = soup.select('#b_detalii_caracteristici ul.lista-tabelara li')
for li in lis:
    li_content = li.contents             # direct children: text node, <span>, text node
    li_text = li_content[0].strip()      # the label text before the <span>
    span_text = li_content[1].text       # the value inside the <span>
    print('li_content ==> ', li_content)
    print('li_text ==> ', li_text)
    print('span_text ==>', span_text)

Output:

li_content ==>  ['\n            "Nr. camere:"\n            ', <span>2</span>, '\n        ']
li_text ==>  "Nr. camere:"
span_text ==> 2
li_content ==>  ['\n            "Suprafata utila:"\n            ', <span>44mp</span>, '\n        ']
li_text ==>  "Suprafata utila:"
span_text ==> 44mp
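
If the goal is to look values up by label rather than print them, one possible variation is to collect the pairs into a dictionary instead of separate variables. The sketch below is only an illustration under a few assumptions: it uses a shortened copy of the markup from the question, the function name parse_characteristics is hypothetical, and it assumes every li holds a text label followed by a single span. Because the selector matches any ul that has the lista-tabelara class, it covers both columns.

```python
from bs4 import BeautifulSoup

# Shortened copy of the markup from the question (both columns kept).
html = """
<div id="b_detalii_caracteristici" class="margin-boxes">
  <div class="row">
    <div class="col-lg-6 col-md-6 col-sm-6">
      <ul class="lista-tabelara">
        <li>Nr. camere:<span>2</span></li>
        <li>Confort:<span>lux</span></li>
      </ul>
    </div>
    <div class="col-lg-6 col-md-6 col-sm-6">
      <ul class="lista-tabelara mobile-list">
        <li>Nr. balcoane:<span>1</span></li>
      </ul>
    </div>
  </div>
</div>
"""


def parse_characteristics(markup):
    """Hypothetical helper: map each <li> label to the text of its <span>."""
    soup = BeautifulSoup(markup, "html.parser")
    details = {}
    for li in soup.select("#b_detalii_caracteristici ul.lista-tabelara li"):
        span = li.find("span")
        first = li.contents[0] if li.contents else None
        # Skip items that do not follow the assumed "label<span>value</span>" shape.
        if span is None or not isinstance(first, str):
            continue
        label = first.strip().rstrip(":")   # drop whitespace and the trailing colon
        details[label] = span.get_text(strip=True)
    return details


details = parse_characteristics(html)
print(details)
```

A single value can then be read with details["Nr. camere"]. All values stay strings, so anything like "44 mp" would still need its own conversion if a number is required.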
