Java is said to ignore extra whitespace. Why does c=a++ + ++b not compile without the spaces?

John Allison:

In every book on Java I've read, the compiler is said to treat all whitespace the same way and simply ignore any extra, so it's best practice to use it liberally to improve readability. Every expression I've written seemed to confirm this: it didn't matter whether there were spaces or not, or how many (or maybe I just didn't pay attention).

Recently I decided to experiment a little with operator precedence and associativity to test the precedence table in action and tried to compile

int a = 2;
int b = 3;    
int c = a+++b;
int d = a+++++b;

While the former statement compiled perfectly, the latter failed. Running it in NetBeans produced this exception, which wraps the underlying compile error:

Exception in thread "main" java.lang.RuntimeException: Uncompilable source code - unexpected type. Required: variable. Found: value.

However, when I added spaces: int d = a++ + ++b, it compiled. Why is this the case? Java is said to ignore extra whitespace anyway. (I have Java 8 and Netbeans IDE 8.2, if this matters.)

I guess this might have something to do with how expressions are parsed, but I'm not sure. I tried looking up several questions on parsing, whitespace, and operators on SO and on Google but couldn't find a definitive answer.

Update: to address the comments saying that it's the 'extra' that matters, not all whitespace: since int c = a++ + b; and int c=a+++b; both compile, one could say, by analogy, that in int d = a ++ + ++b; the whitespace is 'extra' as well.

Daniel Pryden:

Java Language Specification section 3.2, "Lexical Translations", says:

A raw Unicode character stream is translated into a sequence of tokens, using the following three lexical translation steps, which are applied in turn:

  1. A translation of Unicode escapes [...]

  2. A translation [...] into a stream of input characters and line terminators [...].

  3. A translation of the stream of input characters and line terminators resulting from step 2 into a sequence of input elements (§3.5) which, after white space (§3.6) and comments (§3.7) are discarded, comprise the tokens (§3.5) that are the terminal symbols of the syntactic grammar (§2.3).

The longest possible translation is used at each step, even if the result does not ultimately make a correct program while another lexical translation would.

So white space characters are discarded, but only after the "sequence of input elements" has been determined. Until then, white space still decides where one token ends and the next begins. Section 3.5, "Input Elements and Tokens", says:

White space (§3.6) and comments (§3.7) can serve to separate tokens that, if adjacent, might be tokenized in another manner. For example, the ASCII characters - and = in the input can form the operator token -= (§3.12) only if there is no intervening white space or comment.
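To make the "longest possible translation" rule concrete, here is a minimal sketch of a greedy tokenizer. It is not how javac is implemented: the class name MaximalMunch, the tokenize method, and the tiny operator table are all made up for illustration, and only identifiers and a handful of operators are handled.

```java
import java.util.ArrayList;
import java.util.List;

public class MaximalMunch {
    // Hypothetical, tiny subset of Java's operators. Longer operators come
    // first, so the first entry that matches is also the longest match.
    private static final String[] OPERATORS = {"++", "+=", "+", "=", ";"};

    // Splits src into tokens, always taking the longest token available at
    // the current position. White space is skipped, but it still ends the
    // token that precedes it.
    static List<String> tokenize(String src) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char ch = src.charAt(i);
            if (Character.isWhitespace(ch)) {
                i++;
                continue;
            }
            if (Character.isJavaIdentifierStart(ch)) {
                int start = i;
                while (i < src.length() && Character.isJavaIdentifierPart(src.charAt(i))) {
                    i++;
                }
                tokens.add(src.substring(start, i));
                continue;
            }
            String matched = null;
            for (String op : OPERATORS) {
                if (src.startsWith(op, i)) {
                    matched = op;
                    break;
                }
            }
            if (matched == null) {
                throw new IllegalArgumentException("Unexpected character: " + ch);
            }
            tokens.add(matched);
            i += matched.length();
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("a+++b"));      // [a, ++, +, b]
        System.out.println(tokenize("a+++++b"));    // [a, ++, ++, +, b]
        System.out.println(tokenize("a++ + ++b"));  // [a, ++, +, ++, b]
    }
}
```

Without spaces, a+++++b becomes a ++ ++ + b, which the parser reads as ((a++)++) + b. The result of a++ is a value, not a variable, so the second ++ has nothing to increment, which matches the "Required: variable. Found: value." message. With spaces, the tokens are a ++ + ++ b, which parses as (a++) + (++b).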
