Java cannot read a line from a file

ayyoob imani :

I'm reading a file with the following piece of code:

Scanner in = new Scanner(new File(fileName));
while (in.hasNextLine()) {
    String[] line = in.nextLine().trim().split("[ \t]");
    // ... process the fields ...
}

When I open the file in vim, some lines begin with the following special character:

[image: the special character as displayed in vim]

But the Java code can't read these lines: when it reaches one of them, it behaves as if it has reached the end of the file, and hasNextLine() returns false!

EDIT: here is the hex dump of the problematic line:

0000000: e280 9c20 302e 3230 3133 3220 302e 3231  ... 0.20132 0.21
0000010: 3431 392d 302e 3034 0a                   419-0.04.
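Rather than inspecting lines by eye in vim or a hex dump, a short program can report which lines fail strict UTF-8 decoding. This is a minimal sketch, not part of the original thread; the class name FindBadLines and the method badLines are hypothetical. It assumes lines end with a '\n' byte, and splitting on that byte before decoding is safe because 0x0A never occurs inside a multi-byte UTF-8 sequence.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class FindBadLines {
    // Returns the 1-based numbers of the lines that are not valid UTF-8.
    public static List<Integer> badLines(byte[] data) {
        // A fresh decoder uses CodingErrorAction.REPORT, so malformed input throws.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
        List<Integer> bad = new ArrayList<>();
        int start = 0;
        int lineNo = 1;
        while (start < data.length) {
            // Find the end of the current line (the '\n' byte or end of data).
            int end = start;
            while (end < data.length && data[end] != '\n') {
                end++;
            }
            try {
                // decode(ByteBuffer) resets the decoder and decodes the whole slice.
                decoder.decode(ByteBuffer.wrap(data, start, end - start));
            } catch (CharacterCodingException e) {
                bad.add(lineNo);
            }
            start = end + 1; // skip past the '\n'
            lineNo++;
        }
        return bad;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        for (int n : badLines(data)) {
            System.out.println("Line " + n + " is not valid UTF-8");
        }
    }
}
```

Run it as java FindBadLines yourfile; an empty output means every line decodes cleanly as UTF-8.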

that other guy :

@VGR got it right.

tl;dr: Use Scanner in = new Scanner(new File(fileName), "ISO-8859-1");

What appears to be happening is this:

  • Your file is not valid UTF-8 because of that lone 0x9C byte.
  • The Scanner is reading the file as UTF-8, since that is the system default.
  • The underlying decoder throws a MalformedInputException.
  • The Scanner catches and hides it (a well-meaning but misguided design decision).
  • From then on, it reports that there are no more lines.
  • You won't know anything has gone wrong unless you explicitly ask the Scanner, via in.ioException().
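The hiding described above becomes obvious when the same malformed bytes are read with a Scanner and with a BufferedReader from java.nio.file.Files, whose readers report malformed input by throwing. A minimal sketch, assuming Java 7+; the class name HiddenErrorDemo is made up for illustration, and it writes the same bytes as the printf examples to a temporary file:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Scanner;

public class HiddenErrorDemo {
    public static void main(String[] args) throws IOException {
        // Same bytes as the printf example: "Hello\nWorld " + lone 0x9C + "\n"
        byte[] bytes = {'H', 'e', 'l', 'l', 'o', '\n', 'W', 'o', 'r', 'l', 'd', ' ', (byte) 0x9C, '\n'};
        Path file = Files.createTempFile("scanner-demo", ".txt");
        Files.write(file, bytes);
        try {
            // Scanner: the decoding error is caught internally and looks like end of input.
            try (Scanner in = new Scanner(file.toFile(), "UTF-8")) {
                int count = 0;
                while (in.hasNextLine()) {
                    in.nextLine();
                    count++;
                }
                System.out.println("Scanner lines read: " + count);
                System.out.println("Scanner ioException(): " + in.ioException());
            }
            // Files.newBufferedReader: the same malformed byte surfaces as an exception.
            try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
                while (reader.readLine() != null) {
                    // just consume the lines
                }
                System.out.println("BufferedReader: no error");
            } catch (IOException e) {
                System.out.println("BufferedReader threw: " + e);
            }
        } finally {
            Files.delete(file);
        }
    }
}
```

The Scanner only reveals the MalformedInputException because ioException() is called explicitly; the BufferedReader throws it straight away instead of pretending the file ended.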

Here's an MCVE (minimal, complete, verifiable example):

import java.io.*;
import java.util.*;

class Test {
  public static void main(String[] args) throws Exception {
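    // args[0]: file to read, args[1]: charset name (e.g. UTF-8 or ISO-8859-1)
    Scanner in = new Scanner(new File(args[0]), args[1]);
    while (in.hasNextLine()) {
      String line = in.nextLine();
      System.out.println("Line: " + line);
    }
    // Scanner swallows read errors; ioException() returns the last one (or null)
    System.out.println("Exception if any: " + in.ioException());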
  }
}

Here's an example of a normal invocation:

$ printf 'Hello\nWorld\n' > myfile && java Test myfile UTF-8
Line: Hello
Line: World
Exception if any: null

Here's what you're seeing (except that your code doesn't retrieve and print the hidden exception). Notice in particular that no lines are shown at all, not even the first one:

$ printf 'Hello\nWorld \234\n' > myfile && java Test myfile UTF-8
Exception if any: java.nio.charset.MalformedInputException: Input length = 1

And here it is when decoded as ISO-8859-1, an encoding in which every byte sequence is valid (0x9C simply decodes to U+009C, an invisible C1 control character, which is why it doesn't show up in the terminal):

$ printf 'Hello\nWorld \234\n' > myfile && java Test myfile ISO-8859-1
Line: Hello
Line: World
Exception if any: null
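The claim that every byte sequence is valid in ISO-8859-1 can be checked directly: Java's ISO-8859-1 charset maps each byte 0xNN to the character U+00NN, so decoding never fails and encoding back is lossless. A minimal sketch (the class name Latin1RoundTrip is illustrative) that decodes all 256 byte values strictly and re-encodes them:

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Latin1RoundTrip {
    public static void main(String[] args) throws CharacterCodingException {
        // Every possible byte value 0x00..0xFF.
        byte[] all = new byte[256];
        for (int i = 0; i < 256; i++) {
            all[i] = (byte) i;
        }
        // A fresh decoder reports malformed input, so an invalid byte would throw here.
        CharBuffer chars = StandardCharsets.ISO_8859_1.newDecoder().decode(ByteBuffer.wrap(all));
        // Each byte value maps to the char with the same numeric value.
        for (int i = 0; i < 256; i++) {
            if (chars.get(i) != i) {
                throw new AssertionError("unexpected mapping at " + i);
            }
        }
        // Encoding back yields the original bytes, so nothing is lost.
        ByteBuffer back = StandardCharsets.ISO_8859_1.newEncoder().encode(chars);
        byte[] again = new byte[back.remaining()];
        back.get(again);
        System.out.println("round trip ok: " + Arrays.equals(all, again));
        System.out.println("0x9C decodes to " + String.format("U+%04X", (int) chars.get(0x9C)));
    }
}
```

The last line of output shows which character the troublesome 0x9C byte turns into.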

If you're only interested in ASCII data and don't have any UTF-8 strings, you can simply ask the Scanner to use ISO-8859-1 by passing it as the second parameter to the Scanner constructor:

Scanner in = new Scanner(new File(fileName), "ISO-8859-1");
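Decoding as ISO-8859-1 turns any genuine UTF-8 text into mojibake (a “, bytes E2 80 9C, becomes three separate characters). If the file does contain UTF-8 strings, a different approach, not part of the original answer, is to keep UTF-8 and tell the decoder to substitute U+FFFD for malformed bytes instead of failing. A minimal sketch; LenientRead and readLenient are hypothetical names, and the field split copies the question's code:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class LenientRead {
    // Reads all lines as UTF-8, replacing malformed bytes with U+FFFD instead of failing.
    // Closes the given stream when done.
    public static List<String> readLenient(InputStream in) throws IOException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, decoder))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        for (String line : readLenient(Files.newInputStream(Paths.get(args[0])))) {
            // Same split as in the question.
            String[] fields = line.trim().split("[ \t]");
            System.out.println(fields.length + " fields: " + line);
        }
    }
}
```

The trade-off: real UTF-8 characters survive intact, but the original value of each malformed byte is lost, whereas ISO-8859-1 preserves every byte.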
