The complete code for this project is in the C2j-Compiler repository.
Foreword
The files related to the symbol table are in the symboltable package.
In the previous posts we built the LALR(1) parsing table from the finite state automaton and its reduction information, which formally completed the syntax analysis of C. The next step is semantic analysis. As mentioned in the second post, the main task of semantic analysis is to build a symbol table that records variables and their types, and to detect statements that do not conform to the language's semantics.
Describing variables
In C, a variable declaration is described by two kinds of information:
Specifier
A specifier corresponds to one of C's type keywords, or to a keyword such as static or extern. (Keywords like extern are not used in this compiler's implementation, because extern also involves compiling and linking multiple source files.)
Declarator (modifier)
A declarator is the asterisk that marks a pointer, the brackets that mark an array, or the parentheses that mark a function. This part can get complicated, because declarators can be combined. For a combination, several Declarator objects are created and linked in order.
Both can be implemented as classes, and the logic of both is fairly simple:
Declarator class
- declareType: indicates whether the current Declarator is a pointer, an array, or a function
- numberOfElements, elements: if the current type is an array, these hold the number of elements and the elements of the array
import java.util.HashMap;

public class Declarator {
    // Kinds of declarator
    public static int POINTER = 0;
    public static int ARRAY = 1;
    public static int FUNCTION = 2;

    private int declareType;
    // Only meaningful when declareType is ARRAY
    private int numberOfElements = 0;
    HashMap<Integer, Object> elements = null;

    public Declarator(int type) {
        this.declareType = type;
    }
    ...
}
Specifier class
Specifier has a few more properties, although the compiler may end up supporting only four types: int, char, void, and struct.
- basicType: the type of the current variable
- storageClass: the storage class of the variable (FIXED, AUTO). Information about typedef is also kept here: when a typedef is encountered, storageClass is set to TYPEDEF
- constantValue and vStruct: two special properties for enum types and structures. They are special because both need special handling later. When an enum is encountered, the corresponding Specifier is marked CONSTANT and constantValue holds its value
public class Specifier {
    /**
     * Variable types
     */
    public static int NONE = -1;
    public static int INT = 0;
    public static int CHAR = 1;
    public static int VOID = 2;
    public static int STRUCTURE = 3;
    public static int LABEL = 4;

    /**
     * Storage classes
     */
    public static int FIXED = 0;
    public static int REGISTER = 1;
    public static int AUTO = 2;
    public static int TYPEDEF = 3;
    public static int CONSTANT = 4;

    /**
     * Output classes
     */
    public static int NO_OCLASS = 0;
    public static int PUBLIC = 1;
    public static int PRIVATE = 2;
    public static int EXTERN = 3;
    public static int COMMON = 4;

    private int basicType;
    private int storageClass;
    private int outputClass = NO_OCLASS;
    private boolean isLong = false;
    private boolean isSigned = false;
    private boolean isStatic = false;
    private boolean isExternal = false;
    // Used for enum constants and structures respectively
    private int constantValue = 0;
    private StructDefine vStruct = null;
}
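As a rough illustration of how several specifier keywords end up in a single Specifier, here is a minimal sketch. SpecifierSketch and fromKeywords are hypothetical names that do not appear in the project, the class keeps only a few of the fields listed above, and treating a lone "long" as long int is my own assumption about how the keywords combine.

```java
import java.util.List;

/** Hypothetical, trimmed-down stand-in for the Specifier class above. */
final class SpecifierSketch {
    // Same numeric values as the constants in the listing above.
    static final int NONE = -1, INT = 0, CHAR = 1, VOID = 2;  // basic types
    static final int FIXED = 0, AUTO = 2, TYPEDEF = 3;         // storage classes

    int basicType = NONE;
    int storageClass = FIXED;
    boolean isLong = false;
    boolean isStatic = false;

    /** Folds a list of specifier keywords into one object (assumed helper, not project code). */
    static SpecifierSketch fromKeywords(List<String> keywords) {
        SpecifierSketch sp = new SpecifierSketch();
        for (String kw : keywords) {
            switch (kw) {
                case "int":
                    sp.basicType = INT;
                    break;
                case "char":
                    sp.basicType = CHAR;
                    break;
                case "void":
                    sp.basicType = VOID;
                    break;
                case "long":
                    sp.isLong = true;
                    // Assumption: "long" on its own means "long int".
                    if (sp.basicType == NONE) {
                        sp.basicType = INT;
                    }
                    break;
                case "static":
                    sp.isStatic = true;
                    break;
                case "auto":
                    sp.storageClass = AUTO;
                    break;
                case "typedef":
                    // typedef is recorded in storageClass, as described above.
                    sp.storageClass = TYPEDEF;
                    break;
                default:
                    throw new IllegalArgumentException("unsupported keyword: " + kw);
            }
        }
        return sp;
    }
}
```

With this sketch, the keywords "static", "long", "int" produce one object with basicType INT and both isLong and isStatic set, which is the kind of record the symbol table needs for a declaration.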
Symbol table
The two classes above describe a variable's type, but on their own they still cannot fully express a symbol, so we need to wrap them in something more expressive.
Programming often comes down to designing a particular data structure for a particular need. A symbol table is, by nature, just a data structure for describing variables.
A data structure used as a symbol table has to meet a few basic requirements:
- Speed: the symbol table is inserted into and queried frequently, so lookup and insertion must be fast enough
- Flexibility: a variable definition can be complex, for example several specifiers combined with a pointer (long int, long double *), so the design must be flexible enough
Because I have been learning compilers by following Chen's course, the design of the symbol table also follows the teacher's design.
To meet the two requirements above, we implement the table as a chained hash table.
The picture here is one I found online; the structure is actually not that complicated.
All variables are stored in the hash table. Variables with the same name hash to the same slot, although they belong to different scopes. Scopes are distinguished by the upper part of the figure, which links together the variables that belong to the same scope.
symboltable.Symbol
This class describes one symbol in the symbol table.
If you download the source files from GitHub, you will find many more fields than shown here. They are only needed later for code generation and can be ignored for now.
The main properties are:
- level: the nesting level of the variable
- duplicate: whether the variable duplicates another variable of the same name
- args: if the symbol is a function name, args points to the list of the function's parameters
- next: points to the next symbol at the same level
public class Symbol {
    String name;
    String rname;
    int level;
    boolean duplicate;
    Symbol args;
    Symbol next;
}
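To make the roles of args and next more concrete, here is a small sketch of how a function symbol could hang its parameters off args and chain them through next. SymbolSketch, addArg and argCount are hypothetical names introduced only for this illustration, and giving the parameters level 1 is an assumption rather than something the project states.

```java
/** Hypothetical, reduced copy of Symbol with only the fields used here. */
final class SymbolSketch {
    final String name;
    final int level;
    SymbolSketch args;  // first parameter, if this symbol names a function
    SymbolSketch next;  // next symbol in the same list

    SymbolSketch(String name, int level) {
        this.name = name;
        this.level = level;
    }

    /** Appends a parameter to the end of this function symbol's args list. */
    void addArg(SymbolSketch param) {
        if (args == null) {
            args = param;
            return;
        }
        SymbolSketch cur = args;
        while (cur.next != null) {
            cur = cur.next;
        }
        cur.next = param;
    }

    /** Counts the parameters by walking the args list through next. */
    int argCount() {
        int n = 0;
        for (SymbolSketch cur = args; cur != null; cur = cur.next) {
            n++;
        }
        return n;
    }
}

final class SymbolSketchDemo {
    public static void main(String[] unused) {
        // Models the declaration: int add(int a, int b);
        SymbolSketch add = new SymbolSketch("add", 0);
        add.addArg(new SymbolSketch("a", 1)); // assumption: parameters live one level deeper
        add.addArg(new SymbolSketch("b", 1));
        System.out.println(add.name + " takes " + add.argCount() + " parameters");
    }
}
```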
At this point Symbol, together with Specifier and Declarator, has enough expressive power to describe a symbol. What remains is to link the three classes together, so we first add a TypeLink class.
TypeLink
A TypeLink represents either a Specifier or a Declarator. Implementing this with inheritance might look a little better.
public class TypeLink {
    public boolean isDeclarator;
    /**
     * typedef int
     */
    public boolean isTypeDef;
    /**
     * Specifier or Declarator
     */
    public Object typeObject;
    private TypeLink next = null;

    public TypeLink(boolean isDeclarator, boolean typeDef, Object typeObj) {
        this.isDeclarator = isDeclarator;
        this.isTypeDef = typeDef;
        this.typeObject = typeObj;
    }

    public Object getTypeObject() {
        return typeObject;
    }

    public TypeLink toNext() {
        return next;
    }

    public void setNextLink(TypeLink obj) {
        this.next = obj;
    }
}
Symbol therefore needs two more attributes:
typeLinkBegin and typeLinkEnd describe a variable's whole list of specifiers and declarators, i.e. the specifiers and declarators described earlier, linked in order
public class Symbol {
    String name;
    String rname;
    int level;
    boolean implicit;
    boolean duplicate;
    Symbol args;
    Symbol next;
    TypeLink typeLinkBegin;
    TypeLink typeLinkEnd;
}
Example
With all of this in place, a declaration such as
long int (*e)[10];
can be represented like this:
Symbol | Declarator | Declarator | Specifier |
---|---|---|---|
name: e | declareType = POINTER | declareType = ARRAY, numberOfElements = 10 | basicType = INT, isLong = true |
-> | -> | -> | -> |
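The chain in the table can be sketched in code as follows. This is only an illustration under simplifying assumptions: TypeChainSketch, Decl, Spec, Link, buildExample and describe are hypothetical names, the classes are cut down to the fields the example needs, and the basic type is kept as a string for readability instead of the integer constants used by Specifier.

```java
/** Hypothetical sketch of the TypeLink chain for: long int (*e)[10]; */
final class TypeChainSketch {
    static final int POINTER = 0, ARRAY = 1, FUNCTION = 2;

    /** Reduced Declarator: the kind plus the array length. */
    static final class Decl {
        final int declareType;
        final int numberOfElements;

        Decl(int declareType, int numberOfElements) {
            this.declareType = declareType;
            this.numberOfElements = numberOfElements;
        }
    }

    /** Reduced Specifier: only what "long int" needs. */
    static final class Spec {
        final String basicType;
        final boolean isLong;

        Spec(String basicType, boolean isLong) {
            this.basicType = basicType;
            this.isLong = isLong;
        }
    }

    /** Same shape as TypeLink: one node holding either a Decl or a Spec. */
    static final class Link {
        final boolean isDeclarator;
        final Object typeObject;
        Link next;

        Link(boolean isDeclarator, Object typeObject) {
            this.isDeclarator = isDeclarator;
            this.typeObject = typeObject;
        }
    }

    /** Builds POINTER -> ARRAY[10] -> long int and returns the head of the chain. */
    static Link buildExample() {
        Link pointer = new Link(true, new Decl(POINTER, 0));
        Link array = new Link(true, new Decl(ARRAY, 10));
        Link longInt = new Link(false, new Spec("int", true));
        pointer.next = array;
        array.next = longInt;
        return pointer;
    }

    /** Reads the chain front to back as an English type description. */
    static String describe(Link head) {
        StringBuilder sb = new StringBuilder();
        for (Link cur = head; cur != null; cur = cur.next) {
            if (cur.isDeclarator) {
                Decl d = (Decl) cur.typeObject;
                if (d.declareType == POINTER) {
                    sb.append("pointer to ");
                } else if (d.declareType == ARRAY) {
                    sb.append("array[").append(d.numberOfElements).append("] of ");
                } else {
                    sb.append("function returning ");
                }
            } else {
                Spec s = (Spec) cur.typeObject;
                if (s.isLong) {
                    sb.append("long ");
                }
                sb.append(s.basicType);
            }
        }
        return sb.toString();
    }

    public static void main(String[] unused) {
        // Prints: e: pointer to array[10] of long int
        System.out.println("e: " + describe(buildExample()));
    }
}
```

Reading the chain from typeLinkBegin to typeLinkEnd gives the declarators first and the specifier last, which matches the left-to-right order of the table.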
Symbol definition for structures
So far we have not talked about StructDefine. This file describes a structure. A structure is complex enough to need special handling, but it is essentially a collection of variables, so it can still be described with the approach above.
- tag: the name of the structure
- level: the nesting level of the structure
- fields: the Symbol list for the variables in the structure
public class StructDefine {
    private String tag;
    private int level;
    private Symbol fields;

    public StructDefine(String tag, int level, Symbol fields) {
        this.tag = tag;
        this.level = level;
        this.fields = fields;
    }
}
Example
Here is an example of a structure definition:
struct dejavudwh {
    int array1[5];
    struct dejavudwh *pointer1;
} one;
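A possible way to record this structure is sketched below. StructSketch, Field, Struct, buildExample and findField are hypothetical names, the typeNote string merely stands in for a real TypeLink chain, and registering the tag before the fields (so that the self-referential pointer can find it) is my assumption, not necessarily what the project does.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of describing "struct dejavudwh" with a field list. */
final class StructSketch {
    /** Reduced Symbol: a name, a readable type note, and the next field. */
    static final class Field {
        final String name;
        final String typeNote; // stands in for the TypeLink chain
        Field next;

        Field(String name, String typeNote) {
            this.name = name;
            this.typeNote = typeNote;
        }
    }

    /** Same three properties as StructDefine: tag, level, fields. */
    static final class Struct {
        final String tag;
        final int level;
        Field fields;

        Struct(String tag, int level) {
            this.tag = tag;
            this.level = level;
        }
    }

    /** Records the example structure in the given struct table and returns it. */
    static Struct buildExample(Map<String, Struct> structTable) {
        // Assumption: the tag is registered first, so pointer1 can refer back to it.
        Struct s = new Struct("dejavudwh", 0);
        structTable.put(s.tag, s);

        Field array1 = new Field("array1", "array[5] of int");
        Field pointer1 = new Field("pointer1",
                "pointer to struct " + structTable.get("dejavudwh").tag);
        array1.next = pointer1; // fields are chained in declaration order
        s.fields = array1;
        return s;
    }

    /** Walks the field list and returns the field with the given name, or null. */
    static Field findField(Struct s, String name) {
        for (Field f = s.fields; f != null; f = f.next) {
            if (f.name.equals(name)) {
                return f;
            }
        }
        return null;
    }

    public static void main(String[] unused) {
        Map<String, Struct> structTable = new HashMap<>();
        Struct s = buildExample(structTable);
        System.out.println(findField(s, "pointer1").typeNote);
    }
}
```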
Summary
So in the end, just
private HashMap<String, ArrayList<Symbol>> symbolTable = new HashMap<>();
private HashMap<String, StructDefine> structTable = new HashMap<>();
are enough to describe a symbol table.
In symbolTable, the key is the variable name and the value is an ArrayList of the variables that share that name. Because each Symbol also has a next pointer to the other Symbols at the same level, this structure is equivalent to the hash table described at the beginning.
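The following sketch shows how insertion and lookup could work on such a map. It is not the project's implementation: SymbolTableSketch, Sym, add, lookup and leaveScope are hypothetical names, and two rules are my own assumptions, namely that a symbol counts as a duplicate only when the same name already exists at the same level, and that lookup returns the symbol with the deepest level.

```java
import java.util.ArrayList;
import java.util.HashMap;

/** Hypothetical sketch of a symbol table keyed by variable name. */
final class SymbolTableSketch {
    /** Reduced Symbol: name, nesting level, and the duplicate flag. */
    static final class Sym {
        final String name;
        final int level;
        boolean duplicate;

        Sym(String name, int level) {
            this.name = name;
            this.level = level;
        }
    }

    private final HashMap<String, ArrayList<Sym>> symbolTable = new HashMap<>();

    /** Inserts a symbol; flags it as duplicate if the name already exists at that level. */
    Sym add(String name, int level) {
        Sym sym = new Sym(name, level);
        ArrayList<Sym> sameName = symbolTable.computeIfAbsent(name, k -> new ArrayList<>());
        for (Sym other : sameName) {
            if (other.level == level) {
                sym.duplicate = true;
            }
        }
        sameName.add(sym);
        return sym;
    }

    /** Returns the visible symbol: the deepest level among symbols with this name, or null. */
    Sym lookup(String name) {
        ArrayList<Sym> sameName = symbolTable.get(name);
        if (sameName == null) {
            return null;
        }
        Sym best = null;
        for (Sym s : sameName) {
            if (best == null || s.level >= best.level) {
                best = s;
            }
        }
        return best;
    }

    /** Drops every symbol declared at the given level, e.g. when a block ends. */
    void leaveScope(int level) {
        for (ArrayList<Sym> sameName : symbolTable.values()) {
            sameName.removeIf(s -> s.level == level);
        }
    }
}
```

Both insertion and lookup cost one hash access plus a walk over the symbols that share a name, which is usually a very short list, so the speed requirement from earlier is met without a custom hash table.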
This section described the data structure of the symbol table. The two key points are:
Describing variables
Specifiers and declarators are defined to describe a variable.
Linking variables
Symbol's linked list chains the variables together.
Also my github blog: https://dejavudwh.cn/