JavaCC Beginner Tutorial - 1. Matching Braces

1. Configure the JavaCC environment variable

Add the JavaCC bin directory to the system Path variable, for example D:\java源码包\javacc\javacc-5.0\bin

Test the javacc command

Open a cmd window and type javacc to check that the command is found.

The contents of Simple1.jj are as follows:


options {
  LOOKAHEAD = 1;
  CHOICE_AMBIGUITY_CHECK = 2;
  OTHER_AMBIGUITY_CHECK = 1;
  STATIC = true;
  DEBUG_PARSER = false;
  DEBUG_LOOKAHEAD = false;
  DEBUG_TOKEN_MANAGER = false;
  ERROR_REPORTING = true;
  JAVA_UNICODE_ESCAPE = false;
  UNICODE_INPUT = false;
  IGNORE_CASE = false;
  USER_TOKEN_MANAGER = false;
  USER_CHAR_STREAM = false;
  BUILD_PARSER = true;
  BUILD_TOKEN_MANAGER = true;
  SANITY_CHECK = true;
  FORCE_LA_CHECK = false;
}

PARSER_BEGIN(Simple1)

/** Simple brace matcher. */
public class Simple1 {

  /** Main entry point. */
  public static void main(String args[]) throws ParseException {
    Simple1 parser = new Simple1(System.in);
    parser.Input();
  }

}

PARSER_END(Simple1)

/** Root production. */
void Input() :
{}
{
  MatchedBraces() ("\n"|"\r")* <EOF>
}

/** Brace matching production. */
void MatchedBraces() :
{}
{
  "{" [ MatchedBraces() ] "}"
}

Test steps

1. Run the javacc command to generate a set of Java files that implement the parser and the lexical analyzer (token manager)

javacc Simple1.jj

2. Compile the Java files

javac *.java

3. Run the generated parser

java Simple1

Test cases

% java Simple1
{{}}<return>
<control-d>
%

% java Simple1
{x<return>
Lexical error at line 1, column 2.  Encountered: "x"
TokenMgrError: Lexical error at line 1, column 2.  Encountered: "x" (120), after : ""
        at Simple1TokenManager.getNextToken(Simple1TokenManager.java:146)
        at Simple1.getToken(Simple1.java:140)
        at Simple1.MatchedBraces(Simple1.java:51)
        at Simple1.Input(Simple1.java:10)
        at Simple1.main(Simple1.java:6)
%

% java Simple1
{}}<return>
ParseException: Encountered "}" at line 1, column 3.
Was expecting one of:
    <EOF> 
    "\n" ...
    "\r" ...

        at Simple1.generateParseException(Simple1.java:184)
        at Simple1.jj_consume_token(Simple1.java:126)
        at Simple1.Input(Simple1.java:32)
        at Simple1.main(Simple1.java:6)
%

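What it does

This JavaCC grammar recognizes matching left and right braces, followed by zero or more line terminators and then the end of the file.

Examples of legal strings in this grammar are:
"{}", "{{{{{}}}}}"

Examples of illegal strings are:
"{{{{", "{}{}", "{}}", "{{}{}}", and so on

Square brackets [...]
in a JavaCC input file indicate that the ... is optional.

[...] can also be written as (...)?; the two forms are equivalent.
Other constructs that may appear in expansions are:
e1 | e2 | e3 | ... : a choice of e1, e2, e3, etc.
( e )+ : one or more occurrences of e
( e )* : zero or more occurrences of e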
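To make the grammar concrete, here is a rough hand-written Java equivalent of the two productions in Simple1.jj. It is only a sketch for illustration: the class name BraceMatcher and its helper methods are made up here, and the code JavaCC actually generates looks quite different (it works on tokens and throws ParseException instead of returning a boolean).

```java
/** Hand-written sketch of the Simple1 grammar (hypothetical class, not JavaCC output). */
class BraceMatcher {
  private final String input;
  private int pos = 0;

  private BraceMatcher(String input) {
    this.input = input;
  }

  /** Input ::= MatchedBraces ("\n" | "\r")* EOF */
  static boolean matches(String input) {
    BraceMatcher m = new BraceMatcher(input);
    if (!m.matchedBraces()) {
      return false;
    }
    // ("\n" | "\r")* : zero or more line terminators
    while (m.peek() == '\n' || m.peek() == '\r') {
      m.pos++;
    }
    // EOF : nothing may follow
    return m.pos == input.length();
  }

  /** MatchedBraces ::= "{" [ MatchedBraces ] "}" */
  private boolean matchedBraces() {
    if (!accept('{')) {
      return false;
    }
    // [ ... ] is optional: one character of lookahead decides whether to recurse
    if (peek() == '{' && !matchedBraces()) {
      return false;
    }
    return accept('}');
  }

  /** Returns the next character, or '\0' at end of input. */
  private char peek() {
    return pos < input.length() ? input.charAt(pos) : '\0';
  }

  /** Consumes the next character if it equals c. */
  private boolean accept(char c) {
    if (pos < input.length() && input.charAt(pos) == c) {
      pos++;
      return true;
    }
    return false;
  }

  public static void main(String[] args) {
    String[] samples = {"{}", "{{{{{}}}}}", "{{{{", "{}{}", "{}}", "{{}{}}"};
    for (String s : samples) {
      System.out.println(s + " -> " + matches(s));
    }
  }
}
```

Running main prints true for the two legal strings listed above and false for the four illegal ones.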

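Example 2 - Simple2.jj

Simple2.jj is a minor modification of Simple1.jj that allows white space
characters to be interspersed among the braces. Input such as:

"{{}\n}\n\n"

is therefore now legal.

The other difference between this file and Simple1.jj is that this
file contains a lexical specification: the region that starts with
"SKIP". This region contains 4 regular expressions: space, tab,
newline, and return. It says that matches of these regular
expressions are to be ignored (and not considered for parsing). So
whenever any of these 4 characters is encountered, it is simply
thrown away.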
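The effect of the SKIP region can be pictured with a small plain-Java sketch. This is not how the generated token manager is implemented; SkipDemo and tokenize are hypothetical names used only to show which characters reach the parser and which are discarded.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch of SKIP behaviour (hypothetical class, not JavaCC output). */
class SkipDemo {

  /** Returns the tokens a parser would see; the four SKIP characters are dropped. */
  static List<String> tokenize(String input) {
    List<String> tokens = new ArrayList<>();
    for (char c : input.toCharArray()) {
      switch (c) {
        case ' ':
        case '\t':
        case '\n':
        case '\r':
          // SKIP: the character is matched and then thrown away
          break;
        case '{':
        case '}':
          tokens.add(String.valueOf(c));
          break;
        default:
          // Anything else is a lexical error, as with "{x" in the Simple1 test cases
          throw new IllegalArgumentException("Lexical error: '" + c + "'");
      }
    }
    return tokens;
  }

  public static void main(String[] args) {
    // The whitespace in the input never reaches the parser
    System.out.println(tokenize("{{} \n}\n\n")); // prints [{, {, }, }]
  }
}
```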

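In addition to SKIP, JavaCC has three other lexical specification
regions. These are:

TOKEN: used to specify lexical tokens (see the next example).
SPECIAL_TOKEN: used to specify lexical tokens that are to be
ignored during parsing. In this sense, SPECIAL_TOKEN is the
same as SKIP. However, these tokens can be recovered within
parser actions and handled appropriately.
MORE: specifies a partial token. A complete token is made up
of a sequence of MOREs followed by a TOKEN
or SPECIAL_TOKEN.

You can build Simple2 and invoke the generated parser with keyboard input
as standard input. The two command sequences below do this with parser debugging and then with token manager debugging turned on.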

javacc -debug_parser Simple2.jj
javac Simple2*.java
java Simple2

javacc -debug_token_manager Simple2.jj
javac Simple2*.java
java Simple2

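Note that token manager debugging (debug_token_manager) produces a lot of diagnostic
information; it is typically used to look at debug traces
a single token at a time.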

The contents of Simple2.jj are as follows:

/* Copyright (c) 2006, Sun Microsystems, Inc.
 * All rights reserved.
 * 
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions are met:
 * 
 *     * Redistributions of source code must retain the above copyright notice,
 *       this list of conditions and the following disclaimer.
 *     * Redistributions in binary form must reproduce the above copyright
 *       notice, this list of conditions and the following disclaimer in the
 *       documentation and/or other materials provided with the distribution.
 *     * Neither the name of the Sun Microsystems, Inc. nor the names of its
 *       contributors may be used to endorse or promote products derived from
 *       this software without specific prior written permission.
 * 
 * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
 * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
 * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
 * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
 * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
 * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
 * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
 * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
 * THE POSSIBILITY OF SUCH DAMAGE.
 */


PARSER_BEGIN(Simple2)

/** Simple brace matcher. */
public class Simple2 {

  /** Main entry point. */
  public static void main(String args[]) throws ParseException {
    Simple2 parser = new Simple2(System.in);
    parser.Input();
  }

}

PARSER_END(Simple2)

SKIP :
{
  " "
| "\t"
| "\n"
| "\r"
}

/** Root production. */
void Input() :
{}
{
  MatchedBraces() <EOF>
}

/** Brace matching production. */
void MatchedBraces() :
{}
{
  "{" [ MatchedBraces() ] "}"
}

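Example 3 - Simple3.jj

Simple3.jj is the third and final version of our matching brace detector. This example illustrates the use of the TOKEN region for specifying lexical tokens. Here "{" and "}" are defined as tokens and given the names LBRACE and RBRACE respectively. These labels can then be used within angle brackets (as in the example) to refer to the tokens. Such token specifications are typically used for complex tokens such as identifiers and literals. Tokens that are simple strings are usually left as is (as in the previous examples).

This example also illustrates the use of actions in grammar productions. The actions inserted here count the levels of nesting of the matched braces. Note the use of the declaration region to declare the variables "count" and "nested_count". Also note how the non-terminal "MatchedBraces" returns its value as a function return value.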
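The way the count travels back up through the recursion can be sketched in plain Java. This is an illustration under assumptions, not the code JavaCC generates: NestingCounter and its helpers are hypothetical names, and errors are reported with IllegalArgumentException rather than ParseException.

```java
/** Sketch of the counting action in Simple3 (hypothetical class, not JavaCC output). */
class NestingCounter {
  private final String input;
  private int pos = 0;

  private NestingCounter(String input) {
    this.input = input;
  }

  /** Input ::= MatchedBraces EOF, returning the nesting depth. */
  static int count(String input) {
    NestingCounter c = new NestingCounter(input);
    int count = c.matchedBraces();
    c.skip();
    if (c.pos != input.length()) {
      throw new IllegalArgumentException("Expected end of input at position " + c.pos);
    }
    return count;
  }

  /** MatchedBraces ::= LBRACE [ MatchedBraces ] RBRACE, returning ++nestedCount. */
  private int matchedBraces() {
    int nestedCount = 0; // plays the role of nested_count in the declaration region
    expect('{');
    skip();
    if (pos < input.length() && input.charAt(pos) == '{') {
      nestedCount = matchedBraces(); // value returned by the nested production
    }
    expect('}');
    return ++nestedCount;
  }

  /** Skips the four SKIP characters: space, tab, newline, return. */
  private void skip() {
    while (pos < input.length() && " \t\n\r".indexOf(input.charAt(pos)) >= 0) {
      pos++;
    }
  }

  /** Consumes c after skipping white space, or fails. */
  private void expect(char c) {
    skip();
    if (pos >= input.length() || input.charAt(pos) != c) {
      throw new IllegalArgumentException("Expected '" + c + "' at position " + pos);
    }
    pos++;
  }

  public static void main(String[] args) {
    System.out.println("The levels of nesting is " + count("{{}}")); // prints ... is 2
  }
}
```

Each call adds one to whatever the nested call returned, so the innermost pair of braces yields 1 and every enclosing pair adds another level.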

/* Copyright (c) 2006, Sun Microsystems, Inc.
 * All rights reserved.
 * 
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions are met:
 * 
 *     * Redistributions of source code must retain the above copyright notice,
 *       this list of conditions and the following disclaimer.
 *     * Redistributions in binary form must reproduce the above copyright
 *       notice, this list of conditions and the following disclaimer in the
 *       documentation and/or other materials provided with the distribution.
 *     * Neither the name of the Sun Microsystems, Inc. nor the names of its
 *       contributors may be used to endorse or promote products derived from
 *       this software without specific prior written permission.
 * 
 * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
 * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
 * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
 * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
 * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
 * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
 * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
 * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
 * THE POSSIBILITY OF SUCH DAMAGE.
 */


PARSER_BEGIN(Simple3)

/** Simple brace matcher. */
public class Simple3 {

  /** Main entry point. */
  public static void main(String args[]) throws ParseException {
    Simple3 parser = new Simple3(System.in);
    parser.Input();
  }

}

PARSER_END(Simple3)

SKIP :
{
  " "
| "\t"
| "\n"
| "\r"
}

TOKEN :
{
  <LBRACE: "{">
| <RBRACE: "}">
}

/** Root production. */
void Input() :
{ int count; }
{
  count=MatchedBraces() <EOF>
  { System.out.println("The levels of nesting is " + count); }
}

/** Brace counting production. */
int MatchedBraces() :
{ int nested_count=0; }
{
  <LBRACE> [ nested_count=MatchedBraces() ] <RBRACE>
  { return ++nested_count; }
}

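Example 4 - IdList.jj

This example illustrates an important attribute of the SKIP specification. The main point to note is that the regular expressions in the SKIP specification are only ignored between tokens, not within tokens. This grammar accepts any sequence of identifiers with white space in between.

A legal input for this grammar is:

"abc xyz123 A B C \t\n aaa"

This is because any number of the SKIP regular expressions are allowed between consecutive <Id>s. However, the following is not a legal input:

"xyz 123"

This is because the space character after "xyz" is in the SKIP category and therefore causes one token to end and another to begin. This requires "123" to be a separate token, which does not match the grammar.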
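The same behaviour can be reproduced with a few lines of plain Java. The sketch below uses java.util.regex as a stand-in for the generated token manager, so it is an approximation only; IdListDemo and ids are hypothetical names, and the pattern is a hand translation of the Id token used in IdList.jj.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch of IdList tokenization (hypothetical class, not JavaCC output). */
class IdListDemo {
  // Hand translation of: < Id: ["a"-"z","A"-"Z"] ( ["a"-"z","A"-"Z","0"-"9"] )* >
  private static final Pattern ID = Pattern.compile("[a-zA-Z][a-zA-Z0-9]*");

  /** Returns the identifiers in input, or throws if the input is not ( <Id> )+ <EOF>. */
  static List<String> ids(String input) {
    List<String> ids = new ArrayList<>();
    Matcher m = ID.matcher(input);
    int pos = 0;
    while (pos < input.length()) {
      char c = input.charAt(pos);
      if (c == ' ' || c == '\t' || c == '\n' || c == '\r') {
        pos++; // SKIP applies only here, between tokens
        continue;
      }
      m.region(pos, input.length());
      if (!m.lookingAt()) {
        throw new IllegalArgumentException("No token starts at position " + pos);
      }
      ids.add(m.group()); // longest match; white space inside it is impossible
      pos = m.end();
    }
    if (ids.isEmpty()) {
      throw new IllegalArgumentException("Expected at least one Id");
    }
    return ids;
  }

  public static void main(String[] args) {
    System.out.println(ids("abc xyz123 A B C \t\n aaa"));
    try {
      ids("xyz 123");
    } catch (IllegalArgumentException e) {
      // "123" cannot start an Id, so the input is rejected
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```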

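If spaces were acceptable within <Id>s, all one would have to do is replace the definition of Id with:

TOKEN :
{
  < Id: ["a"-"z","A"-"Z"] ( (" ")* ["a"-"z","A"-"Z","0"-"9"] )* >
}

Note that having a space character within a TOKEN specification does not mean that the space character cannot be used in the SKIP specification. It only means that any space character appearing in a context where it can be placed within an identifier will take part in the match for <Id>, whereas all other space characters will be ignored. The details of the matching algorithm are described in the JavaCC documentation on the web pages.

As a corollary, anything within which characters such as white space must not appear has to be defined as a token. In the example above, if <Id> were defined as a grammar production rather than as a lexical token, as shown below, then "xyz 123" would (wrongly) be recognized as a legitimate <Id>.

void Id() :
{}
{
  <["a"-"z","A"-"Z"]> ( <["a"-"z","A"-"Z","0"-"9"]> )*
}

Note that in the above definition of the non-terminal Id, it is made up of a sequence of single character tokens (note the location of the <...>s), and hence white space is allowed between these characters.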

/* Copyright (c) 2006, Sun Microsystems, Inc.
 * All rights reserved.
 * 
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions are met:
 * 
 *     * Redistributions of source code must retain the above copyright notice,
 *       this list of conditions and the following disclaimer.
 *     * Redistributions in binary form must reproduce the above copyright
 *       notice, this list of conditions and the following disclaimer in the
 *       documentation and/or other materials provided with the distribution.
 *     * Neither the name of the Sun Microsystems, Inc. nor the names of its
 *       contributors may be used to endorse or promote products derived from
 *       this software without specific prior written permission.
 * 
 * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
 * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
 * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
 * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
 * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
 * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
 * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
 * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
 * THE POSSIBILITY OF SUCH DAMAGE.
 */

PARSER_BEGIN(IdList)


/** ID lister. */
public class IdList {

  /** Main entry point. */
  public static void main(String args[]) throws ParseException {
    IdList parser = new IdList(System.in);
    parser.Input();
  }

}

PARSER_END(IdList)

SKIP :
{
  " "
| "\t"
| "\n"
| "\r"
}

TOKEN :
{
  < Id: ["a"-"z","A"-"Z"] ( ["a"-"z","A"-"Z","0"-"9"] )* >
}

/** Top level production. */
void Input() :
{}
{
  ( <Id> )+ <EOF>
}

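Example 5 - NL_Xlator.jj

This example goes into the details of writing regular expressions in JavaCC grammar files. It also illustrates a slightly more complex set of actions that translate the expressions described by the grammar into English.

The new concept in this example is the use of more complex regular expressions. The regular expression:

< ID: ["a"-"z","A"-"Z","_"] ( ["a"-"z","A"-"Z","_","0"-"9"] )* >

creates a new regular expression whose name is ID. It can be referred to anywhere else in the grammar simply as <ID>. What follows in square brackets is a set of allowable characters, in this case any lower or upper case letter or the underscore. This is followed by 0 or more occurrences of any lower or upper case
letter, digit, or underscore.

Other constructs that may appear in regular expressions are:

( ... )+ : one or more occurrences of ...
( ... )? : an optional occurrence of ... (note that in the case of
lexical tokens, (...)? and [...] are not equivalent)
( r1 | r2 | ... ) : any one of r1, r2, ...

A construct of the form [...] is a pattern that is matched by the characters specified in ... . These characters can be individual characters or character ranges. A "~" before this construct is a pattern that matches any character not specified in ... . Therefore:
["a"-"z"] matches all lower case letters
~[] matches any character
~["\n","\r"] matches any character except the new line characters

When a regular expression is used in an expansion, it takes a value of type "Token". This class is generated into the generated parser directory as "Token.java". In this example (see the Factor production), we define a variable of type "Token" and assign the value of the regular expression to it.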

/* Copyright (c) 2006, Sun Microsystems, Inc.
 * All rights reserved.
 * 
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions are met:
 * 
 *     * Redistributions of source code must retain the above copyright notice,
 *       this list of conditions and the following disclaimer.
 *     * Redistributions in binary form must reproduce the above copyright
 *       notice, this list of conditions and the following disclaimer in the
 *       documentation and/or other materials provided with the distribution.
 *     * Neither the name of the Sun Microsystems, Inc. nor the names of its
 *       contributors may be used to endorse or promote products derived from
 *       this software without specific prior written permission.
 * 
 * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
 * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
 * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
 * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
 * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
 * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
 * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
 * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
 * THE POSSIBILITY OF SUCH DAMAGE.
 */

PARSER_BEGIN(NL_Xlator)

/** New line translator. */
public class NL_Xlator {

  /** Main entry point. */
  public static void main(String args[]) throws ParseException {
    NL_Xlator parser = new NL_Xlator(System.in);
    parser.ExpressionList();
  }

}

PARSER_END(NL_Xlator)

SKIP :
{
  " "
| "\t"
| "\n"
| "\r"
}

TOKEN :
{
  < ID: ["a"-"z","A"-"Z","_"] ( ["a"-"z","A"-"Z","_","0"-"9"] )* >
|
  < NUM: ( ["0"-"9"] )+ >
}

/** Top level production. */
void ExpressionList() :
{
	String s;
}
{
	{
	  System.out.println("Please type in an expression followed by a \";\" or ^D to quit:");
	  System.out.println("");
	}
  ( s=Expression() ";"
	{
	  System.out.println(s);
	  System.out.println("");
	  System.out.println("Please type in another expression followed by a \";\" or ^D to quit:");
	  System.out.println("");
	}
  )*
  <EOF>
}

/** An Expression. */
String Expression() :
{
	java.util.Vector termimage = new java.util.Vector();
	String s;
}
{
  s=Term()
	{
	  termimage.addElement(s);
	}
  ( "+" s=Term()
	{
	  termimage.addElement(s);
	}
  )*
	{
	  if (termimage.size() == 1) {
	    return (String)termimage.elementAt(0);
          } else {
            s = "the sum of " + (String)termimage.elementAt(0);
	    for (int i = 1; i < termimage.size()-1; i++) {
	      s += ", " + (String)termimage.elementAt(i);
	    }
	    if (termimage.size() > 2) {
	      s += ",";
	    }
	    s += " and " + (String)termimage.elementAt(termimage.size()-1);
            return s;
          }
	}
}

/** A Term. */
String Term() :
{
	java.util.Vector factorimage = new java.util.Vector();
	String s;
}
{
  s=Factor()
	{
	  factorimage.addElement(s);
	}
  ( "*" s=Factor()
	{
	  factorimage.addElement(s);
	}
  )*
	{
	  if (factorimage.size() == 1) {
	    return (String)factorimage.elementAt(0);
          } else {
            s = "the product of " + (String)factorimage.elementAt(0);
	    for (int i = 1; i < factorimage.size()-1; i++) {
	      s += ", " + (String)factorimage.elementAt(i);
	    }
	    if (factorimage.size() > 2) {
	      s += ",";
	    }
	    s += " and " + (String)factorimage.elementAt(factorimage.size()-1);
            return s;
          }
	}
}

/** A Factor. */
String Factor() :
{
	Token t;
	String s;
}
{
  t=<ID>
	{
	  return t.image;
	}
|
  t=<NUM>
	{
	  return t.image;
	}
|
  "(" s=Expression() ")"
	{
	  return s;
	}
}
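The string-building logic in the Expression and Term actions is the same apart from the word "sum" or "product". As an illustration only, it can be pulled out into one plain-Java helper; EnglishJoin and describe are hypothetical names that do not appear in NL_Xlator.jj.

```java
import java.util.Arrays;
import java.util.List;

/** Sketch of the English-building action in NL_Xlator (hypothetical helper). */
class EnglishJoin {

  /** Joins parts as "the KIND of a, b, and c"; a single part is returned unchanged. */
  static String describe(String kind, List<String> parts) {
    if (parts.size() == 1) {
      return parts.get(0);
    }
    StringBuilder sb = new StringBuilder("the " + kind + " of " + parts.get(0));
    // Middle elements are separated by commas
    for (int i = 1; i < parts.size() - 1; i++) {
      sb.append(", ").append(parts.get(i));
    }
    // With three or more parts, a comma precedes the final "and"
    if (parts.size() > 2) {
      sb.append(",");
    }
    sb.append(" and ").append(parts.get(parts.size() - 1));
    return sb.toString();
  }

  public static void main(String[] args) {
    // Corresponds to the input "a + b*c;"
    String product = describe("product", Arrays.asList("b", "c"));
    System.out.println(describe("sum", Arrays.asList("a", product)));
    // prints: the sum of a and the product of b and c
  }
}
```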
