Solving the XPath property error even though the XPath looks good

mvsr:

I tried to extract the version info from the webpage, but I am getting an error even though the XPath looks good on the HTML page.

The code I tried:

use DOMDocument;
use DOMXPath;
function getVersionFromDownloads(string $url): string
{
    // support only windows
    $content = $this->fetch($url);
    $curl = curl_init($url);

    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($curl, CURLOPT_FRESH_CONNECT, true);
    $content = curl_exec($curl);
    curl_close($curl);

    $dom = new DOMDocument();
    @$dom->loadHTML($content);

    $xpath = new DOMXPath($dom);

    $result = $xpath->query("//a[contains(text(),'paint.net')]");

    $header = $result->item(0)->textContent;
    echo $header;

}
getVersionFromDownloads('https://www.dotpdn.com/downloads/pdn.html');

The desired result is 4.2.10.

When I checked the HTML page, the XPath looks good and it selects the correct element, but when I try to extract the text content I get an error.

The statement that raises the error: $header = $result->item(0)->textContent;
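Some context on where that error comes from: DOMNodeList::item() returns null when the index is out of range, so if the query matches nothing in the HTML that PHP actually received, item(0) is null and reading ->textContent on it raises the "property of non-object" error (PHP 8 words it as "Attempt to read property on null"). A minimal sketch of the failure mode and a length guard, using a made-up HTML string in place of the real page:

```php
<?php
// Made-up markup with no matching link; it stands in for a fetched page
// that does not contain what the XPath expects.
$html = '<html><body><p>No download links here</p></body></html>';

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$result = $xpath->query("//a[contains(text(),'paint.net')]");

// item(0) is null for an empty list, so check the length before dereferencing
if ($result === false || $result->length === 0) {
    $header = null; // nothing matched
} else {
    $header = $result->item(0)->textContent;
}
var_dump($header); // NULL
```

If the guard fires on the live page, the useful next step is to dump the fetched content and compare it with what the browser shows.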

mickmackusa:

While testing my solutions, I was getting a lot of DOM errors using $dom->load(). You can see all of the invalid markup with an online HTML validator such as https://www.freeformatter.com/html-validator.html. This program was barking about many minor deprecations and then a few notable items like:

Malformed byte sequence: “a9”.

and

Malformed byte sequence: “ae”.
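In ISO-8859-1 / Windows-1252, byte 0xA9 is "©" and 0xAE is "®", so one plausible (unverified) reading of those two messages is that the page contains single-byte copyright and trademark signs while the validator decodes it as UTF-8. A sketch under that assumption, using a made-up sample string and the mbstring extension to detect and re-encode such bytes:

```php
<?php
// Assumption: stray 0xA9 / 0xAE bytes are Latin-1 characters in text read as UTF-8.
// The sample string is made up; requires the mbstring extension.
$raw = "Copyright \xA9 Example Co. Trademark\xAE";

// A lone 0xA9 or 0xAE byte is not a valid UTF-8 sequence
$isUtf8 = mb_check_encoding($raw, 'UTF-8');

// Re-encode from Latin-1 so each byte becomes its two-byte UTF-8 form
$fixed = $isUtf8 ? $raw : mb_convert_encoding($raw, 'UTF-8', 'ISO-8859-1');

var_dump($isUtf8);                            // bool(false)
var_dump(mb_check_encoding($fixed, 'UTF-8')); // bool(true)
echo $fixed, PHP_EOL;                         // Copyright © Example Co. Trademark®
```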

When I tried to script my own PHP code with $dom->load()...

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->load('https://www.dotpdn.com/downloads/pdn.html');
$xpath = new DOMXPath($dom);
return libxml_get_last_error();

I printed the return value to screen with var_export() to see:

LibXMLError::__set_state(array( 'level' => 3, 'code' => 77, 'column' => 8, 'message' => 'Premature end of data in tag html line 1 ', 'file' => 'https://www.dotpdn.com/downloads/pdn.html', 'line' => 153, ))
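That "Premature end of data" message fits the fact that DOMDocument::load() (like loadXML()) runs libxml's strict XML parser, which gives up on markup that is valid HTML but not well-formed XML, whereas loadHTML() uses the forgiving HTML parser. A small sketch comparing the two on a made-up fragment (an unquoted attribute and a void <br>), and using libxml_get_errors() to list every error rather than only the last one:

```php
<?php
// Made-up fragment: legal HTML, but not well-formed XML.
$html = '<html><body><a href=pdn.html>paint.net 4.2.10</a><br></body></html>';

libxml_use_internal_errors(true); // keep parser messages out of the output

$asXml = new DOMDocument();
$xmlOk = $asXml->loadXML($html);  // strict XML parser, the same one load() uses
$xmlErrors = libxml_get_errors(); // all collected errors, not just the last
libxml_clear_errors();

$asHtml = new DOMDocument();
$htmlOk = $asHtml->loadHTML($html); // tolerant HTML parser
libxml_clear_errors();

var_dump($xmlOk);  // bool(false)
var_dump($htmlOk); // bool(true)
foreach ($xmlErrors as $error) {
    printf("line %d: %s", $error->line, $error->message);
}
```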


RECOMMENDED

So instead of using load(), I decided to try file_get_contents($url) to get the source code and feed it to the DOM parser.

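function getVersionFromDownloads($url)
{
    libxml_use_internal_errors(true); // keep warnings about the page's invalid markup out of the output
    $dom = new DOMDocument();
    $dom->loadHTML(file_get_contents($url));
    $xpath = new DOMXPath($dom);
    $text = $xpath->query("//a[contains(text(),'paint.net')]")->item(0)->textContent;
    // strip the "paint.net" prefix and the whitespace after it, leaving only the version
    return preg_replace('/paint\.net\s+/', '', $text);
}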
var_export(getVersionFromDownloads('https://www.dotpdn.com/downloads/pdn.html'));

Output:

'4.2.10'
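To try the same extraction without hitting the network, here is a sketch that runs both XPath forms and the preg_replace() step against an inline HTML string. The markup is a made-up stand-in for the downloads page; the only thing assumed about the real page is that the link text is "paint.net", some whitespace, then the version.

```php
<?php
// Made-up stand-in: "paint.net", a newline plus indentation, then the version.
$html = "<html><body><a href=\"#\">paint.net\n    4.2.10</a></body></html>";

libxml_use_internal_errors(true); // silence warnings from imperfect markup
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

// Element textContent and the first text node's nodeValue give the same string here
$viaElement = $xpath->query("//a[contains(text(),'paint.net')]")->item(0)->textContent;
$viaTextNode = $xpath->query("//a[contains(text(),'paint.net')]/text()")->item(0)->nodeValue;

// Remove "paint.net" plus the whole whitespace run that follows it
$version = preg_replace('/paint\.net\s+/', '', $viaElement);

var_export($viaElement === $viaTextNode); // true
echo PHP_EOL;
var_export($version);                     // '4.2.10'
```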
  • To remove the single quotes, use echo instead of var_export(). I only used it to demonstrate that there is no leading or trailing whitespace.

  • preg_replace() runs before returning to strip the leading paint.net together with however many consecutive whitespace characters follow it, so only the version number is left.

  • For the record, this extraction technique will work the same:

     $xpath->query("//a[contains(text(),'paint.net')]/text()")->item(0)->nodeValue;
    
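  • In your signature:

     function getVersionFromDownloads(string $url): string
    

    the : string return type declares that the function returns a string, but your function only echoes the value and never returns it, so PHP throws a TypeError when the function ends. Return the string instead, and echo the result at the call site if you want it printed.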
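A sketch of a version that honours the : string return type. Splitting the download from the parsing into a hypothetical parseVersion() helper is an illustrative choice (not part of the answer above) so the parsing can be checked without network access; it also throws instead of dereferencing a null item(0).

```php
<?php
// Hypothetical helper: parses already-downloaded HTML and returns the version.
function parseVersion(string $html): string
{
    libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();

    $nodes = (new DOMXPath($dom))->query("//a[contains(text(),'paint.net')]");
    if ($nodes === false || $nodes->length === 0) {
        // Fail loudly instead of reading ->textContent on null
        throw new RuntimeException('No "paint.net" link found in the page');
    }
    return preg_replace('/paint\.net\s+/', '', $nodes->item(0)->textContent);
}

function getVersionFromDownloads(string $url): string
{
    $html = file_get_contents($url);
    if ($html === false) {
        throw new RuntimeException("Could not download $url");
    }
    return parseVersion($html); // return the value; the caller decides whether to echo it
}

echo parseVersion('<html><body><a href="#">paint.net 4.2.10</a></body></html>'), PHP_EOL;
```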
