Exclude unwanted HTML from Simple HTML DOM - PHP

Faisal Shani :

I am using the PHP Simple HTML DOM Parser to get the title, description, and images from a website. The problem is that the parsed output includes HTML I don't want, and I don't know how to exclude those tags. Details are below.

Here is a sample of the HTML structure being parsed:

<div id="product_description">
<p> Some text</p>
<ul>
<li>value 1</li>
<li>value 2</li>
<li>value 3</li>
</ul>

<!-- the div I don't want -->
<div id="comments">
<h1> Some Text </h1>
</div>

</div>

I am using the PHP script below to parse it:

foreach($html->find('div#product_description') as $description)
{
    echo $description->outertext ;
    echo "<br>";
}

The code above outputs everything inside the div with id "product_description". I want to exclude the div with id "comments". I tried converting the output to a string and using substr to exclude the last characters, but that is not working and I don't know why. Any idea how I can do this? Any approach that lets me exclude the div from the parsed HTML will work. Thanks.

Nima :

You can remove the elements you don't want by setting their outertext property to an empty string:

$src =<<<src
<div id="product_description">
    <p> Some text</p>
    <ul>
        <li>value 1</li>
        <li>value 2</li>
        <li>value 3</li>
    </ul>

    <!-- the div I don't want -->
    <div id="comments">
        <h1> Some Text </h1>
    </div>

</div>
src;

// str_get_html() is provided by Simple HTML DOM (simple_html_dom.php must be included first)
$html = str_get_html($src);

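foreach ($html->find('#product_description') as $description)
{
    // With index 0, find() returns the first match, or null if there is none
    $comments = $description->find('#comments', 0);
    if ($comments !== null) {
        $comments->outertext = ''; // blank out the unwanted div
    }
    print $description->outertext;
}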
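
Since any approach that excludes the div is acceptable, here is a sketch of the same idea using PHP's bundled DOM extension (DOMDocument and DOMXPath) instead of Simple HTML DOM. This is a swapped-in technique, not the library used above. The helper name removeById() and its parameters are made up for this example, and it assumes the ids passed in are trusted strings without quotes.

```php
<?php
// Alternative sketch using PHP's built-in DOM extension, not Simple HTML DOM.
// removeById() is a hypothetical helper name chosen for this example.
function removeById(string $markup, string $containerId, string $unwantedId): string
{
    // Collect libxml parse warnings internally instead of printing them.
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($markup);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);

    // Assumption: the ids are trusted and contain no quote characters.
    $container = $xpath->query("//*[@id='$containerId']")->item(0);
    if ($container === null) {
        return ''; // container not found, nothing to output
    }

    // Copy the matches into an array first so that removing nodes
    // does not disturb the node list while we loop over it.
    $unwanted = iterator_to_array($xpath->query(".//*[@id='$unwantedId']", $container));
    foreach ($unwanted as $node) {
        $node->parentNode->removeChild($node);
    }

    // Serialize only the container element (its outer HTML).
    return $doc->saveHTML($container);
}

$src = <<<HTML
<div id="product_description">
    <p> Some text</p>
    <ul>
        <li>value 1</li>
        <li>value 2</li>
        <li>value 3</li>
    </ul>
    <div id="comments">
        <h1> Some Text </h1>
    </div>
</div>
HTML;

echo removeById($src, 'product_description', 'comments');
```

Unlike blanking outertext, this removes the node from the tree itself, so later queries on the same document will no longer find the comments div.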
