I'm parsing an HTML file with jsoup and get the output below. Annotators have split the text into segments and marked each boundary with |||. I need to retrieve each segment.
File input = new File("C:\\Test\\aaa.html");
Document doc = Jsoup.parse(input, "UTF-8", "");
Element body = doc.body();
String body2 = body.toString();
String[] test = body2.split("|||");
for (String s : test) {
    System.out.print(s + "111111111");
}
output: 11111111 111111111<111111111b111111111r111111111>111111111|111111111|111111111|111111111<111111111s111111111t111111111r111111111o111111111n111111111g111111111>111111111 111111111B111111111u111111111s111111111i111111111n111111111e111111111s111111111s111111111 111111111T111111111r111111111a111111111n111111111s111111111f111111111e111111111r111111111s111111111 111111111:111111111 111111111<111111111/111111111s111111111t111111111r111111111o111111111n111111111g111111111>111111111 111111111A111111111s111111111 111111111w111111111e111111111 111111111c111111111o111111111n111111111t111111111i111111111n111111111u111111111e111111111 111111111t111111111o111111111 111111111d111111111e111111111v111111111e111111111l111111111o111111111p111111111 111111111o111111111u111111111r111111111 111111111b111111111u111111111s111111111i111111111n111111111e111111111s111111111s111111111,111111111 111111111w111111111e111111111 111111111m111111111i111111111g111111111h111111111t111111111 111111111s111111111e111111111l111111111l111111111 111111111o111111111r111111111 111111111b111111111u111111111y111111111 111111111a111111111d111111111d111111111i111111111t111111111i111111111o111111111n111111111a111111111l111111111
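I am just guessing, but I think you are looking for something like the following. String.split treats its argument as a regular expression, in which | means "or", so "|||" matches the empty string and splits your text between every character, as your output shows. Escaping each pipe makes it match literally: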
String s = "cheese|||bread";
String[] splits = s.split("\\|\\|\\|");
for (String split : splits) {
    System.out.println(split);
}
Output:
cheese
bread
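To see why the original split("|||") misbehaves, here is a small standalone comparison. The class and method names are made up for illustration. It also shows Pattern.quote, a standard JDK alternative to escaping each pipe by hand:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class, for illustration only.
class PipeSplitDemo {
    // Unescaped: "|||" is a regex made of empty alternatives, so it matches
    // the empty string at every position and splits between every character.
    static String[] splitUnescaped(String s) {
        return s.split("|||");
    }

    // Pattern.quote wraps the delimiter in \Q...\E, so every '|' is literal.
    static String[] splitQuoted(String s) {
        return s.split(Pattern.quote("|||"));
    }

    public static void main(String[] args) {
        String s = "cheese|||bread";
        System.out.println(Arrays.toString(splitUnescaped(s)));
        System.out.println(Arrays.toString(splitQuoted(s)));
    }
}
```

On Java 8 or later this should print [c, h, e, e, s, e, |, |, |, b, r, e, a, d] followed by [cheese, bread]; older versions also keep an empty leading element in the first array.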
Implemented in your code:
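import java.io.File;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

File input = new File("C:\\Test\\aaa.html");
Document doc = Jsoup.parse(input, "UTF-8", ""); // throws IOException
Element body = doc.body();
String body2 = body.toString();
// Escape each '|' so split() matches the literal "|||" marker.
String[] test = body2.split("\\|\\|\\|");
for (String s : test) {
    System.out.print(s + "111111111");
}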
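Segments cut out of body.toString() keep surrounding whitespace and markup, and a piece can end up blank (for example, text before the first marker or after the last one). A sketch of a cleanup step, assuming you only want trimmed, non-empty segments. SegmentExtractor and segments are hypothetical names, and only the JDK is used:

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper class, for illustration only.
class SegmentExtractor {
    // Compile the literal "|||" delimiter once; Pattern.quote escapes the pipes.
    private static final Pattern DELIMITER = Pattern.compile(Pattern.quote("|||"));

    // Split on the marker, trim each segment, and drop segments that are empty after trimming.
    static List<String> segments(String text) {
        return DELIMITER.splitAsStream(text)
                .map(String::trim)
                .filter(seg -> !seg.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(segments(" intro |||  cheese |||bread||| "));
    }
}
```

This should print [intro, cheese, bread]. If you do not need the HTML tags at all, passing body.text() instead of body.toString() may be simpler, though it is worth checking how elements such as <br> next to the markers come out.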