Jsoup doesn't work properly with encoded link containing non-letter characters

Zastrix Arundell :

I'm creating a Discord bot for an online game, and one of the bot's features uses a web crawler to get item info.

My problem is that when I use a UTF-8 encoded URL, Jsoup doesn't work for some reason.

I tried iterating through all of the elements with the same class name, but that doesn't work either. It looks like the class is completely absent from the returned document.

String url = "http://coryn.club/item.php?name=";

StringBuilder stringBuilder = new StringBuilder();
stringBuilder.append(arguments.get(0));

for (int i = 1; i < arguments.size(); i++)
    stringBuilder.append(" ").append(arguments.get(i));

url = url + URLEncoder.encode(stringBuilder.toString(), "UTF-8");
System.out.println(url);
Document document = Jsoup.connect(url).get();
Element table = document.getElementsByClass("table table-striped").first();
System.out.println(table == null ? "Table is null" : "Table is not null"); // prints "Table is null" only for the %27 link

For instance, the URL http://coryn.club/item.php?name=dark+general works, but the URL http://coryn.club/item.php?name=dark+general%27s does not. The only difference is the %27 near the end.

I get a null value for the element with class "table table-striped".

Note that I use the same code for both URLs, but only the first one works.

Also note that if you open the page in a browser, it works, and you can still see the HTML data with inspect element.

Pshemo :

It looks like if you use raw (unencoded) query data like

String url = "http://coryn.club/item.php?name=dark general's";

you will get correct results.

This suggests that Jsoup encodes those parameters on its own, which means that if you pass data in the form dark+general%27s, it will be encoded again, causing the final URL to contain dark%2Bgeneral%2527s.

As a result, after decoding, the server will see the value of name as dark+general%27s, NOT as dark general's, and will fail to find an item matching it. That is why there is no result table in the returned HTML.

So don't encode your data; let Jsoup do it for you.
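The double-encoding effect can be reproduced with the JDK alone. The sketch below does not call Jsoup and is not its actual implementation; it only assumes that an already-encoded value gets encoded a second time somewhere along the way. The class and method names (DoubleEncodingDemo, encode, decode) are made up for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical demo class: shows what happens when a query value is encoded twice.
public class DoubleEncodingDemo {

    // Form-encodes a value as UTF-8, as the question's code does.
    public static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always supported, so this should never happen.
            throw new IllegalStateException(e);
        }
    }

    // Decodes a form-encoded value once, as a server would.
    public static String decode(String value) {
        try {
            return URLDecoder.decode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String raw = "dark general's";

        String once = encode(raw);    // what the question's code sends
        String twice = encode(once);  // what results if it is encoded again

        System.out.println(once);          // dark+general%27s
        System.out.println(twice);         // dark%2Bgeneral%2527s
        System.out.println(decode(twice)); // dark+general%27s (what the server would search for)
        System.out.println(decode(once));  // dark general's (what it should search for)
    }
}
```

Decoding the double-encoded value once gives back the encoded text rather than the raw item name, which is consistent with the server finding no match.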


BTW: you can also change your code into a more (IMO) readable version:

Document document = Jsoup
        .connect("http://coryn.club/item.php")
        .data("name", stringBuilder.toString()) //query parameters - don't encode manually
        .get();

Notice that stringBuilder.toString() is not encoded by us; it contains raw data like dark general's.


BTW 2: if arguments is a collection of CharSequence elements such as String (for instance, a List<String>), then since Java 8, instead of

StringBuilder stringBuilder = new StringBuilder();
stringBuilder.append(arguments.get(0));

for (int i = 1; i < arguments.size(); i++)
    stringBuilder.append(" ").append(arguments.get(i));

you can use

String joined = String.join(" ", arguments);

or

String joined = arguments.stream().collect(Collectors.joining(" "));
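To check that these alternatives produce the same string as the manual loop, here is a small self-contained sketch. The class name JoinDemo and the sample arguments are assumptions for illustration. The loop is written slightly differently from the original so that it also handles an empty list, where arguments.get(0) would throw.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical demo class comparing three ways to join words with spaces.
public class JoinDemo {

    // Manual loop, equivalent to the StringBuilder code for non-empty lists.
    public static String joinWithBuilder(List<String> arguments) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < arguments.size(); i++) {
            if (i > 0) {
                sb.append(' '); // separator goes between elements only
            }
            sb.append(arguments.get(i));
        }
        return sb.toString();
    }

    // String.join accepts any Iterable of CharSequence (Java 8+).
    public static String joinWithStringJoin(List<String> arguments) {
        return String.join(" ", arguments);
    }

    // Stream-based variant using Collectors.joining (Java 8+).
    public static String joinWithCollector(List<String> arguments) {
        return arguments.stream().collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        List<String> arguments = Arrays.asList("dark", "general's");

        System.out.println(joinWithBuilder(arguments));    // dark general's
        System.out.println(joinWithStringJoin(arguments)); // dark general's
        System.out.println(joinWithCollector(arguments));  // dark general's
    }
}
```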

More info: Java equivalent of PHP's implode(',' , array_filter( array () ))
