Using pegjs to parse Java code
What is pegjs
pegjs is a parser generator for JavaScript based on PEG (parsing expression grammar). A PEG grammar looks a lot like a set of regular expressions, but it can never be ambiguous. Its choice operator (/) tries the alternatives in order and takes the first one that matches, so every input has at most one parse.
Official website: https://pegjs.org/
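The lack of ambiguity comes from ordered choice. For `A / B`, a PEG tries `A` first and commits to it if it succeeds. It tries `B` only when `A` fails, and it does not go back to `B` if something later in the sequence fails. A regular expression engine, by contrast, backtracks into the other alternative. The sketch below imitates this in plain JavaScript. `literal`, `choice`, and `sequence` are hypothetical helpers written for illustration, not part of the pegjs API.

```javascript
// Minimal PEG-style combinators (hypothetical helpers, not the pegjs API).
// A parser takes (input, pos) and returns { value, pos } on success or null on failure.

// Match an exact string at the current position.
function literal(text) {
  return (input, pos) =>
    input.startsWith(text, pos) ? { value: text, pos: pos + text.length } : null;
}

// Ordered choice (PEG `/`): try alternatives left to right, commit to the first success.
function choice(...alternatives) {
  return (input, pos) => {
    for (const alt of alternatives) {
      const result = alt(input, pos);
      if (result !== null) return result;
    }
    return null;
  };
}

// Sequence (PEG juxtaposition): every part must match, in order.
// A failure does not re-enter an earlier choice to try its other alternatives.
function sequence(...parts) {
  return (input, pos) => {
    const values = [];
    for (const part of parts) {
      const result = part(input, pos);
      if (result === null) return null;
      values.push(result.value);
      pos = result.pos;
    }
    return { value: values, pos };
  };
}

// PEG rule: ('a' / 'ab') 'c'
const rule = sequence(choice(literal("a"), literal("ab")), literal("c"));

console.log(rule("ac", 0));             // { value: [ 'a', 'c' ], pos: 2 }
console.log(rule("abc", 0));            // null: the choice committed to 'a', then 'c' meets 'b'
console.log(/^(a|ab)c$/.test("abc"));   // true: the regex backtracks into the 'ab' branch
```

Writing the longer alternative first, `('ab' / 'a') 'c'`, makes the rule accept `abc`.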
What pegjs is used for
pegjs is a good alternative when a regular expression cannot do the job, or when one becomes hard to write and maintain. Parsing SQL statements is one example. pegjs also makes it easy to write custom rules when building a DSL.
A simple pegjs example
1. This example parses a piece of Java code. The code below is the input. Its labels are Chinese descriptions: 名字 means "name", and 性别 0-男 1-女 means "gender: 0 = male, 1 = female".
class Test {
@tag(1)
@label("名字")
String name;
@tag(2)
@label("性别 0-男 1-女")
Int sex;
}
2. Define the root node. It holds the type, the version, and an array of class blocks. The * suffix means the IdlBlock rule may match zero or more times.
CodeBlock = blocks:IdlBlock* {
return {
type: 'javaSchema',
version: '1.0.0',
blocks
}
}
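In pegjs, `blocks:IdlBlock*` applies `IdlBlock` repeatedly until it fails. It then binds `blocks` to the array of results, which is empty if there are no classes. The start rule must also consume the whole input; otherwise pegjs reports a syntax error. The following plain JavaScript sketch mirrors `CodeBlock`. `parseIdlBlock` is a hypothetical stand-in that only recognizes empty classes, so the repetition and the root object can be seen in isolation.

```javascript
// Sketch of `CodeBlock = blocks:IdlBlock* { ... }` in plain JavaScript.
// parseIdlBlock is a hypothetical stand-in that only recognizes empty classes.
function parseIdlBlock(input, pos) {
  const re = /\s*class\s+([a-zA-Z_]+)\s*\{\s*\}\s*/y; // sticky: match only at pos
  re.lastIndex = pos;
  const m = re.exec(input);
  return m ? { value: { className: m[1], children: [] }, pos: re.lastIndex } : null;
}

// PEG `*`: apply the parser until it fails; always succeeds, possibly with [].
function zeroOrMore(parser) {
  return (input, pos) => {
    const values = [];
    for (let r = parser(input, pos); r !== null; r = parser(input, pos)) {
      values.push(r.value);
      pos = r.pos;
    }
    return { value: values, pos };
  };
}

// The action of CodeBlock wraps the array of blocks in the root node.
function parseCodeBlock(input) {
  const { value: blocks, pos } = zeroOrMore(parseIdlBlock)(input, 0);
  // Like pegjs, reject input that the start rule did not fully consume.
  if (pos !== input.length) throw new Error(`Unexpected input at offset ${pos}`);
  return { type: "javaSchema", version: "1.0.0", blocks };
}

console.log(JSON.stringify(parseCodeBlock("class A {} class B {}")));
```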
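3. Use _ to match optional whitespace and match the class keyword. Then match the class name with the Identifier rule and bind it to className. After more whitespace, _ '{' children:Children* '}' parses the child nodes between the two curly braces, and the action returns className together with children. The _ rule matches zero or more spaces, tabs, and line breaks. Identifier matches one or more letters or underscores.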
IdlBlock =
_ 'class'
_ className:Identifier
_ '{' children:Children* '}'
_ {
return {
className,
children
}
}
_ "whitespace" = [ \t\r\n]*
Identifier = $([a-zA-Z_])+
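The `$` operator matters in `Identifier`. A repetition such as `[a-zA-Z_]+` on its own produces an array with one entry per matched character, while `$(...)` returns the matched text as a single string. The `_` rule always succeeds, because `[ \t\r\n]*` may match nothing, which is why it can be placed freely between tokens. A plain JavaScript sketch of these behaviors follows (hand-written functions, not pegjs output):

```javascript
// Sketch of what `$` changes in `Identifier = $([a-zA-Z_])+` (hand-written, not pegjs output).

// `[a-zA-Z_]+` without `$`: a repetition yields an array with one entry per match.
function identifierChars(input, pos) {
  const chars = [];
  while (pos < input.length && /[a-zA-Z_]/.test(input[pos])) {
    chars.push(input[pos]);
    pos++;
  }
  return chars.length > 0 ? { value: chars, pos } : null; // `+` needs at least one
}

// `$(...)`: same match, but the value is the matched slice of the input text.
function identifier(input, pos) {
  const r = identifierChars(input, pos);
  return r ? { value: input.slice(pos, r.pos), pos: r.pos } : null;
}

// `_ "whitespace" = [ \t\r\n]*`: always succeeds, skipping zero or more blanks.
function skipWhitespace(input, pos) {
  while (pos < input.length && " \t\r\n".includes(input[pos])) pos++;
  return pos;
}

const src = "class Test {";
const pos = skipWhitespace(src, "class".length);
console.log(identifierChars(src, pos).value); // [ 'T', 'e', 's', 't' ]
console.log(identifier(src, pos).value);      // Test
```

Because the character class excludes digits, a field such as `name2` would not parse. A Java-like rule would be `Identifier = $([a-zA-Z_] [a-zA-Z0-9_]*)`; this is a suggested change, not part of the original grammar.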
4. Parse the variables declared in the child nodes. Each variable must be followed by a semicolon.
Children =
_ variable:Variable';'
_ {
return {
variable
}
}
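Every `Children` match must end with a semicolon, and that requirement is what stops the `Children*` repetition inside `IdlBlock`. At the closing brace no variable followed by `;` can be found, so the repetition ends and `'}'` is matched. The JavaScript sketch below mimics this with a deliberately simplified variable parser. `parseVariable` is hypothetical and handles only the type and the name, with no annotations.

```javascript
// Sketch of `Children = _ variable:Variable ';' _ { return { variable } }`
// and of `'{' children:Children* '}'` in IdlBlock (hand-written, simplified).

// Hypothetical simplified Variable: only "<type> <name>", no annotations.
function parseVariable(input, pos) {
  const re = /\s*(String|Int)\s+([a-zA-Z_]+)\s*/y;
  re.lastIndex = pos;
  const m = re.exec(input);
  return m ? { value: { type: m[1], name: m[2] }, pos: re.lastIndex } : null;
}

function parseChildren(input, pos) {
  const v = parseVariable(input, pos);
  if (v === null || input[v.pos] !== ";") return null; // the ';' is mandatory
  let end = v.pos + 1;
  while (end < input.length && /\s/.test(input[end])) end++; // trailing `_`
  return { value: { variable: v.value }, pos: end };
}

// '{' Children* '}': the loop stops at the first position where Children fails,
// which for well-formed input is the closing brace.
function parseBody(input) {
  if (input[0] !== "{") throw new Error("Expected '{'");
  let pos = 1;
  const children = [];
  for (let r = parseChildren(input, pos); r !== null; r = parseChildren(input, pos)) {
    children.push(r.value);
    pos = r.pos;
  }
  if (input[pos] !== "}") throw new Error(`Expected '}' at offset ${pos}`);
  return children;
}

console.log(parseBody("{ String name; Int sex; }"));
```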
5. Define the rule for a variable: an optional tag, an optional label, an optional type, and an optional name
Variable =
_ tag:Tag?
_ label:Label?
_ type:DataType?
_ name:Identifier?
_ {
return {
tag,
label,
type,
name
}
}
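Each part of `Variable` carries `?`, so it may be missing. In pegjs an optional expression that does not match still succeeds and yields `null`, and that `null` is what the label receives. The sketch below reproduces this with a hypothetical `optional` helper and simplified regex-based sub-parsers:

```javascript
// Sketch of how `?` behaves in
// Variable = _ tag:Tag? _ label:Label? _ type:DataType? _ name:Identifier? _
// Sub-parsers are simplified regexes; `token` and `optional` are hypothetical helpers.

function token(re) {
  const sticky = new RegExp(re.source, "y"); // match only at the given position
  return (input, pos) => {
    sticky.lastIndex = pos;
    const m = sticky.exec(input);
    // Use the first capture group if there is one, else the whole match.
    return m ? { value: m[1] ?? m[0], pos: sticky.lastIndex } : null;
  };
}

// PEG `?`: succeed either way; produce null when the inner expression fails.
function optional(parser) {
  return (input, pos) => parser(input, pos) ?? { value: null, pos };
}

const ws = token(/[ \t\r\n]*/);
const tag = token(/@tag\(([0-9]*)\)/);
const label = token(/@label\("([^\r\n\t")]*)"\)/);
const dataType = token(/String|Int/);
const identifier = token(/[a-zA-Z_]+/);

function parseVariable(input) {
  let pos = 0;
  const out = {};
  const parts = [["tag", tag], ["label", label], ["type", dataType], ["name", identifier]];
  for (const [key, parser] of parts) {
    pos = ws(input, pos).pos; // the `_` before each part
    const r = optional(parser)(input, pos);
    out[key] = r.value;
    pos = r.pos;
  }
  return out;
}

console.log(parseVariable("@tag(1) String name")); // label is null
console.log(parseVariable(""));                    // every field is null
```

Because every part is optional, `Variable` can match an empty string. A stray `;` inside a class body would therefore produce a child whose four fields are all `null`.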
6. Parse @tag(1) and return its argument
Tag = '@tag('tagColumn:TagColumn')' {
return tagColumn
}
TagColumn = $([0-9])*
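Because `TagColumn` uses `$`, the tag value is the matched text. `@tag(1)` yields the string `"1"`, which is what appears in the parse result, not the number 1. `[0-9]*` also accepts an empty argument, `@tag()`. A plain JavaScript sketch of the same matching logic follows; `parseTag` is hypothetical and not generated by pegjs.

```javascript
// Hand-written equivalent of
//   Tag = '@tag(' tagColumn:TagColumn ')' { return tagColumn }
//   TagColumn = $([0-9])*
// Returns the argument text, or null when no tag starts at `pos`.
function parseTag(input, pos = 0) {
  const re = /@tag\(([0-9]*)\)/y; // sticky: the match must begin exactly at pos
  re.lastIndex = pos;
  const m = re.exec(input);
  return m ? m[1] : null;
}

console.log(typeof parseTag("@tag(1)")); // string
console.log(parseTag("@tag()") === "");  // true: [0-9]* also matches zero digits
console.log(parseTag("@tag(x)"));        // null
```

If a numeric tag is preferred, the action could return `parseInt(tagColumn, 10)`, with `TagColumn = $([0-9]+)` to reject an empty argument. Both are suggested changes, not part of the original grammar.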
7. Parse @label("名字") and return the description text inside the quotes
Label = '@label("'labelColumn:LabelColumn'")' {
return labelColumn
}
LabelColumn = $([^\r\n\t\"\)])*
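`LabelColumn` excludes `"`, `)`, tabs, and line breaks. A label can therefore contain spaces, digits, hyphens, and non-ASCII text such as 名字, but a closing parenthesis or a quote inside the description makes the rule fail. The plain JavaScript sketch below follows the same character class; `parseLabel` is hypothetical.

```javascript
// Hand-written equivalent of
//   Label = '@label("' labelColumn:LabelColumn '")' { return labelColumn }
//   LabelColumn = $([^\r\n\t\"\)])*
// Returns the description text, or null if no label starts at `pos`.
function parseLabel(input, pos = 0) {
  const re = /@label\("([^\r\n\t")]*)"\)/y; // sticky: the match must begin at pos
  re.lastIndex = pos;
  const m = re.exec(input);
  return m ? m[1] : null;
}

console.log(parseLabel('@label("名字")'));           // 名字
console.log(parseLabel('@label("性别 0-男 1-女")'));  // 性别 0-男 1-女
console.log(parseLabel('@label("a (b) c")'));        // null: ')' cannot appear in the text
```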
8. Parse the variable type. Only the two types used in the sample code are defined here; more types can be added as further alternatives.
DataType = 'String' / 'Int'
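When extending `DataType`, ordered choice matters. With `'Int' / 'Integer'`, the input `Integer count;` would match `Int`, and the remaining `eger` would be taken as the name. The parse would then fail at `count`. Listing longer keywords first avoids this, and a negative lookahead can also help, for example `('Integer' / 'Int' / 'String') ![a-zA-Z_]`. Both are suggestions, not part of the original grammar. The sketch below shows the effect in plain JavaScript; `parseDataType` and `parseDataTypeWhole` are hypothetical.

```javascript
// Ordered choice over type keywords, as in `DataType = 'String' / 'Int'`.
// Returns { value, pos } for the first keyword that matches at `pos`, else null.
function parseDataType(keywords, input, pos = 0) {
  for (const kw of keywords) {
    if (input.startsWith(kw, pos)) return { value: kw, pos: pos + kw.length };
  }
  return null;
}

// Variant with a negative lookahead, like `(...) ![a-zA-Z_]`:
// the chosen keyword must not be followed by another identifier character.
// As in a PEG, a failed lookahead fails the whole rule; other keywords are not retried.
function parseDataTypeWhole(keywords, input, pos = 0) {
  const r = parseDataType(keywords, input, pos);
  return r && !/[a-zA-Z_]/.test(input[r.pos] ?? "") ? r : null;
}

console.log(parseDataType(["Int", "Integer"], "Integer count")); // { value: 'Int', pos: 3 }
console.log(parseDataType(["Integer", "Int"], "Integer count")); // { value: 'Integer', pos: 7 }
console.log(parseDataTypeWhole(["String", "Int"], "Intx y"));    // null
```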
Complete example
CodeBlock = blocks:IdlBlock* {
return {
type: 'javaSchema',
version: '1.0.0',
blocks
}
}
IdlBlock =
_ 'class'
_ className:Identifier
_ '{' children:Children* '}'
_ {
return {
className,
children
}
}
_ "whitespace" = [ \t\r\n]*
Identifier = $([a-zA-Z_])+
Children =
_ variable:Variable';'
_ {
return {
variable
}
}
Variable =
_ tag:Tag?
_ label:Label?
_ type:DataType?
_ name:Identifier?
_ {
return {
tag,
label,
type,
name
}
}
Tag = '@tag('tagColumn:TagColumn')' {
return tagColumn
}
TagColumn = $([0-9])*
Label = '@label("'labelColumn:LabelColumn'")' {
return labelColumn
}
LabelColumn = $([^\r\n\t\"\)])*
DataType = 'String' / 'Int'
Parse result
{
"type": "javaSchema",
"version": "1.0.0",
"blocks": [
{
"className": "Test",
"children": [
{
"variable": {
"tag": "1",
"label": "名字",
"type": "String",
"name": "name"
}
},
{
"variable": {
"tag": "2",
"label": "性别 0-男 1-女",
"type": "Int",
"name": "sex"
}
}
]
}
]
}
The rules above can be tested directly in the online version of pegjs on the official website. pegjs is also available as an npm package, and its JavaScript API is simple; see the official pegjs documentation for details.
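With the npm package (pegjs 0.10), a parser is generated from the grammar text with `require("pegjs").generate(grammarText)` and run with `parser.parse(source)`. The parse result can also be checked without installing anything. Below is a hand-written recursive-descent parser in JavaScript that follows the complete grammar rule by rule. It is an illustrative sketch, not the code pegjs generates, and its error reporting is simplified to a single message.

```javascript
// Hand-written recursive-descent equivalent of the complete grammar.
// Each function mirrors one rule; `pos` is the shared cursor into `src`.
function parseJavaSchema(src) {
  let pos = 0;

  // Match a regex exactly at pos; on success advance pos and return the match.
  const match = (re) => {
    const sticky = new RegExp(re.source, "y");
    sticky.lastIndex = pos;
    const m = sticky.exec(src);
    if (m) pos = sticky.lastIndex;
    return m;
  };
  const fail = (what) => { throw new Error(`Expected ${what} at offset ${pos}`); };

  const _ = () => match(/[ \t\r\n]*/);                                   // _ "whitespace"
  const identifier = () => match(/[a-zA-Z_]+/)?.[0] ?? null;              // Identifier
  const tag = () => match(/@tag\(([0-9]*)\)/)?.[1] ?? null;               // Tag?
  const label = () => match(/@label\("([^\r\n\t")]*)"\)/)?.[1] ?? null;   // Label?
  const dataType = () => match(/String|Int/)?.[0] ?? null;                // DataType?

  // Variable = _ Tag? _ Label? _ DataType? _ Identifier? _
  const variable = () => {
    _(); const t = tag();
    _(); const l = label();
    _(); const type = dataType();
    _(); const name = identifier();
    _();
    return { tag: t, label: l, type, name };
  };

  // Children = _ Variable ';' _ ; on failure rewind so that Children* stops cleanly.
  const children = () => {
    const start = pos;
    _();
    const v = variable();
    if (!match(/;/)) { pos = start; return null; }
    _();
    return { variable: v };
  };

  // IdlBlock = _ 'class' _ Identifier _ '{' Children* '}' _
  const idlBlock = () => {
    const start = pos;
    _();
    if (!match(/class/)) { pos = start; return null; }
    _();
    const className = identifier() ?? fail("a class name");
    _();
    if (!match(/\{/)) fail("'{'");
    const kids = [];
    for (let c = children(); c !== null; c = children()) kids.push(c);
    if (!match(/\}/)) fail("'}'");
    _();
    return { className, children: kids };
  };

  // CodeBlock = IdlBlock* ; the start rule must consume the whole input.
  const blocks = [];
  for (let b = idlBlock(); b !== null; b = idlBlock()) blocks.push(b);
  if (pos !== src.length) fail("end of input");
  return { type: "javaSchema", version: "1.0.0", blocks };
}

const source = `class Test {
@tag(1)
@label("名字")
String name;
@tag(2)
@label("性别 0-男 1-女")
Int sex;
}`;

console.log(JSON.stringify(parseJavaSchema(source), null, 2));
```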