Writing a Compiler from Scratch (VII): Semantic Analysis and the Symbol Table Data Structure

The complete code for this project is in C2j-Compiler

Foreword

The files related to the symbol table are in the symboltable package

In the previous articles we used the finite state automaton and its reduce information to build the LALR parsing table, which completed the parsing of C. The next step is semantic analysis. As mentioned in the second article, its main task is to build the symbol table, which records variables and their types, and to detect statements that do not conform to the language's semantics

Describing variables

A variable declaration in C is described by two kinds of information

  • Specifier

    A specifier is the type keyword of a variable in C, or a keyword such as static or extern (keywords like extern are not used in this compiler's implementation, because extern can also involve compiling and linking multiple source files)
  • Declarator

    A declarator is the asterisk that marks a pointer or the brackets that mark an array. Declarators are the more complex part because they can be combined. For a combination of declarators, you create multiple Declarator objects and link them in order

Both can be implemented as classes, and both classes have relatively simple logic:

Declarator class

  • declareType: indicates whether the current Declarator is a pointer, an array, or a function
  • numberOfElements, elements: if the current type is an array, these hold the number of elements and the elements of the array
public class Declarator {
    public static int POINTER = 0;
    public static int ARRAY = 1;
    public static int FUNCTION = 2;

    private int declareType;
    private int numberOfElements = 0;

    HashMap<Integer, Object> elements = null;

    public Declarator(int type) {
        this.declareType = type;
    }
    ...
}

Specifier class

Specifier has a few more properties, but the finished compiler will probably support only four types: int, char, void, and struct

  • basicType: indicates the type of the current variable

  • storageClass: indicates how the variable is stored (FIXED, AUTO). typedef information is also kept here: if a typedef is encountered, storageClass is set to TYPEDEF

  • constantValue and vStruct: two special properties used for enum types and structures. They are special because both need extra handling later. When an enum is encountered, the corresponding Specifier has basicType CONSTANT, and its value is stored in constantValue

public class Specifier {
    /**
     * Variable types
     */
    public static int NONE = -1;
    public static int INT = 0;
    public static int CHAR = 1;
    public static int VOID = 2;
    public static int STRUCTURE = 3;
    public static int LABEL = 4;

    /**
     * storage
     */
    public static int FIXED = 0;
    public static int REGISTER = 1;
    public static int AUTO = 2;
    public static int TYPEDEF = 3;
    public static int CONSTANT = 4;

    public static int NO_OCLASS = 0;
    public static int PUBLIC = 1;
    public static int PRIVATE = 2;
    public static int EXTERN = 3;
    public static int COMMON = 4;

    private int basicType;
    private int storageClass;
    private int outputClass = NO_OCLASS;
    private boolean isLong = false;
    private boolean isSigned = false;
    private boolean isStatic = false;
    private boolean isExternal = false;
    private int constantValue = 0;
    private StructDefine vStruct = null;
}
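
To make the role of these fields more concrete, here is a minimal sketch of how the keywords of one declaration might be folded into a single specifier. SpecifierSketch and fromKeywords are hypothetical names used only for illustration, and the class is a cut-down stand-in, not the project's Specifier. Treating a bare long as long int follows the C rule; the rest is an assumption about how such a helper could look.

```java
// Hypothetical, simplified stand-in for the article's Specifier class.
class SpecifierSketch {
    static final int NONE = -1, INT = 0, CHAR = 1, VOID = 2;  // basic types
    static final int FIXED = 0, TYPEDEF = 3;                  // storage classes

    int basicType = NONE;
    int storageClass = FIXED;   // 0, the same default an unset int field has
    boolean isLong = false;
    boolean isStatic = false;

    // Fold the keywords of one declaration into a single specifier.
    static SpecifierSketch fromKeywords(String... keywords) {
        SpecifierSketch spec = new SpecifierSketch();
        for (String kw : keywords) {
            switch (kw) {
                case "int":     spec.basicType = INT;  break;
                case "char":    spec.basicType = CHAR; break;
                case "void":    spec.basicType = VOID; break;
                case "long":
                    spec.isLong = true;
                    // In C, a bare "long" means "long int".
                    if (spec.basicType == NONE) spec.basicType = INT;
                    break;
                case "static":  spec.isStatic = true; break;
                case "typedef": spec.storageClass = TYPEDEF; break;
                default:
                    throw new IllegalArgumentException("unsupported keyword: " + kw);
            }
        }
        return spec;
    }

    public static void main(String[] args) {
        SpecifierSketch spec = fromKeywords("static", "long", "int");
        System.out.println(spec.basicType + " " + spec.isLong + " " + spec.isStatic);
    }
}
```

Keeping everything in one object means the parser can update the same specifier as each keyword is reduced, whatever order the keywords appear in.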

Symbol table

The two classes above describe a variable, but on their own they still cannot fully express a symbol, so we need to wrap them to make them more expressive

In programming, a data structure is often designed around a specific need. A symbol table is by nature just a data structure used to describe variables

A data structure used as a symbol table has to meet a few basic requirements:

  1. Speed
    the symbol table is inserted into and searched frequently, so lookup and insertion must be fast enough
  2. Flexibility
    variable definitions can be complex, for example several specifiers plus a pointer (long int, long double *), so the design must be flexible enough

Because I have been following Mr. Chen's course while learning about compilers, the symbol table design also follows the teacher's design

To meet both requirements, we implement the table as a chained hash table

The picture here, which I found online, shows the idea; it is actually not that complicated

All variables are stored in the hash table. Variables with the same name hash to the same slot, although they belong to different scopes. Scopes are distinguished by the upper part of the figure, which links together the variables that belong to the same scope
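
The chaining idea can be sketched in a few lines. This is a minimal illustration under my own assumptions, not the project's code: ScopedTableSketch, Entry, enterScope, leaveScope, declare and lookup are all hypothetical names. Each name owns one chain whose head is the innermost declaration, so a lookup never has to search past the first element.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a chained, scope-aware table; not the project's code.
class ScopedTableSketch {
    static final class Entry {
        final String name;
        final int level;   // nesting depth of the declaring scope
        Entry(String name, int level) { this.name = name; this.level = level; }
    }

    // One chain per name; the head of each chain is the innermost declaration.
    private final Map<String, Deque<Entry>> buckets = new HashMap<>();
    private int currentLevel = 0;

    void enterScope() { currentLevel++; }

    // Drop every entry that was declared in the scope being left.
    void leaveScope() {
        for (Deque<Entry> chain : buckets.values()) {
            while (!chain.isEmpty() && chain.peekFirst().level == currentLevel) {
                chain.removeFirst();
            }
        }
        currentLevel--;
    }

    void declare(String name) {
        buckets.computeIfAbsent(name, k -> new ArrayDeque<>())
               .addFirst(new Entry(name, currentLevel));
    }

    // Returns the innermost visible declaration, or null if there is none.
    Entry lookup(String name) {
        Deque<Entry> chain = buckets.get(name);
        return (chain == null || chain.isEmpty()) ? null : chain.peekFirst();
    }

    public static void main(String[] args) {
        ScopedTableSketch table = new ScopedTableSketch();
        table.declare("x");        // global x, level 0
        table.enterScope();
        table.declare("x");        // local x shadows the global one
        System.out.println(table.lookup("x").level);
        table.leaveScope();
        System.out.println(table.lookup("x").level);
    }
}
```

This sketch scans every chain when a scope is left. The design described above instead links the symbols of one scope together, so that leaving a scope only has to follow that link.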

symboltable.Symbol

This class describes a single symbol in the table

If you download the source files from GitHub, you will find many more members in this class. They are only needed later for code generation and can be ignored for now

The main properties are:

  • level: the nesting level of the variable
  • duplicate: whether the variable duplicates an existing name
  • args: if the symbol is a function name, args points to the list of the function's input parameters
  • next: points to the next variable symbol at the same level
public class Symbol {
    String name;
    String rname;
    int level; 
    boolean duplicate; 
    Symbol args; 
    Symbol next;  
}

A Symbol together with Specifier and Declarator now has enough expressive power to describe a symbol, so we need to link these three classes. The first step is to add a TypeLink class

A TypeLink represents either a Specifier or a Declarator. Implementing this with inheritance might look a little better

public class TypeLink {
    public boolean isDeclarator;
    /**
     * typedef int
     */
    public boolean isTypeDef;
    /**
     * Specifier or Declarator
     */
    public Object typeObject;

    private TypeLink next = null;

    public TypeLink(boolean isDeclarator, boolean typeDef, Object typeObj) {
        this.isDeclarator = isDeclarator;
        this.isTypeDef = typeDef;
        this.typeObject = typeObj;
    }

    public Object getTypeObject() {
        return typeObject;
    }

    public TypeLink toNext() {
        return next;
    }

    public void setNextLink(TypeLink obj) {
        this.next = obj;
    }

}

Symbol therefore needs two more attributes

typeLinkBegin and typeLinkEnd describe the whole list of specifiers and declarators, that is, the specifiers and declarators described earlier, linked in sequence

public class Symbol {
    String name;
    String rname;
    int level;  
    boolean implicit;  
    boolean duplicate; 
    Symbol args;  
    Symbol next;

    TypeLink typeLinkBegin;
    TypeLink typeLinkEnd;
}

Example

With this in place, a declaration such as

long int (*e)[10];

can be represented like this

Symbol       Declarator                 Declarator               Specifier
name: e  ->  declareType = POINTER  ->  declareType = ARRAY  ->  basicType = INT, isLong = true
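
The chain above can be built and walked with a small amount of code. The following is a sketch under simplifying assumptions: TypeChainSketch and its Link class are hypothetical stand-ins that merge TypeLink, Declarator and Specifier into one node type, and the specifier is stored as plain text instead of a Specifier object.

```java
// Hypothetical, simplified stand-ins for TypeLink, Declarator and Specifier.
class TypeChainSketch {
    static final int POINTER = 0, ARRAY = 1;

    static final class Link {
        final boolean isDeclarator;
        final int declareType;        // meaningful only for declarators
        final int numberOfElements;   // meaningful only for ARRAY
        final String specifierText;   // meaningful only for the specifier
        Link next;

        Link(boolean isDeclarator, int declareType, int numberOfElements, String specifierText) {
            this.isDeclarator = isDeclarator;
            this.declareType = declareType;
            this.numberOfElements = numberOfElements;
            this.specifierText = specifierText;
        }
    }

    static Link pointer()              { return new Link(true, POINTER, 0, null); }
    static Link array(int n)           { return new Link(true, ARRAY, n, null); }
    static Link specifier(String text) { return new Link(false, -1, 0, text); }

    // Link the nodes in order and return the head (the role of typeLinkBegin).
    static Link chain(Link... links) {
        for (int i = 0; i + 1 < links.length; i++) {
            links[i].next = links[i + 1];
        }
        return links[0];
    }

    // Walk the chain from head to tail and read the type out in English.
    static String describe(String name, Link begin) {
        StringBuilder sb = new StringBuilder(name).append(" is");
        for (Link link = begin; link != null; link = link.next) {
            if (!link.isDeclarator) {
                sb.append(' ').append(link.specifierText);
            } else if (link.declareType == POINTER) {
                sb.append(" a pointer to");
            } else {
                sb.append(" an array of ").append(link.numberOfElements);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // long int (*e)[10];
        Link begin = chain(pointer(), array(10), specifier("long int"));
        System.out.println(describe("e", begin));
        // prints "e is a pointer to an array of 10 long int"
    }
}
```

Reading the chain from head to tail gives the type from the outside in, which is why the declarators come first and the specifier sits at the end.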

Symbols for structure definitions

We have not yet discussed StructDefine. This file describes a structure. Because structures are complex, they need special handling, but a structure is by nature still a collection of variables, so it can still be described with the approach above

  • tag: the name of the structure
  • level: the nesting level of the structure
  • fields: the Symbol list of the variables inside the structure
public class StructDefine {
    private String tag;
    private int level;
    private Symbol fields;

    public StructDefine(String tag, int level, Symbol fields) {
        this.tag = tag;
        this.level = level;
        this.fields = fields;
    }
}

Example

Consider an example structure definition

struct dejavudwh {
    int array1[5];
    struct dejavudwh *pointer1;
} one;
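
As a rough illustration of how this definition could be stored, the sketch below registers the structure under its tag and chains its two members through next. StructSketch, Field, StructDef, define and fieldNames are hypothetical names, and the member types are kept as plain text where the real design would use a TypeLink chain.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; Field and StructDef are simplified stand-ins
// for the article's Symbol and StructDefine classes.
class StructSketch {
    static final class Field {
        final String name;
        final String type;   // plain text here instead of a TypeLink chain
        Field next;          // next member of the same structure
        Field(String name, String type) { this.name = name; this.type = type; }
    }

    static final class StructDef {
        final String tag;
        final int level;
        final Field fields;  // head of the member list
        StructDef(String tag, int level, Field fields) {
            this.tag = tag;
            this.level = level;
            this.fields = fields;
        }
    }

    static final Map<String, StructDef> structTable = new HashMap<>();

    // Chain the members in declaration order and register the structure by tag.
    static StructDef define(String tag, int level, Field... fields) {
        for (int i = 0; i + 1 < fields.length; i++) {
            fields[i].next = fields[i + 1];
        }
        StructDef def = new StructDef(tag, level, fields.length == 0 ? null : fields[0]);
        structTable.put(tag, def);
        return def;
    }

    // Look a structure up by tag and list its member names, or return null.
    static String fieldNames(String tag) {
        StructDef def = structTable.get(tag);
        if (def == null) return null;
        StringBuilder sb = new StringBuilder();
        for (Field f = def.fields; f != null; f = f.next) {
            if (sb.length() > 0) sb.append(", ");
            sb.append(f.name);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        define("dejavudwh", 0,
               new Field("array1", "int[5]"),
               new Field("pointer1", "struct dejavudwh *"));
        System.out.println(fieldNames("dejavudwh"));
    }
}
```

Because the structure is found by its tag, the member pointer1 can refer to struct dejavudwh by name without the definition having to contain itself.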

Summary

So in the end, just these two maps

private HashMap<String, ArrayList<Symbol>> symbolTable = new HashMap<>();
private HashMap<String, StructDefine> structTable = new HashMap<>();

are enough to describe a symbol table

In symbolTable the key is the variable name, and the value is an ArrayList that stores the variables with that name. Because each Symbol also has a next pointer to the other Symbols at the same level, this structure is equivalent to the hash table described at the beginning

This section described the data structure of the symbol table. The two key points are

  1. Describing variables

    Specifiers and declarators are defined to describe a variable

  2. Linking variables

    Symbols are chained into a list to connect the variables
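
One way to use a map of this shape is sketched below: every declaration is appended to the list for its name, and a second declaration of the same name at the same level is flagged through the duplicate field. SymbolTableSketch and add are hypothetical names, the Symbol class is cut down to three fields, and whether the project detects duplicates exactly this way is an assumption.

```java
import java.util.ArrayList;
import java.util.HashMap;

// Hypothetical sketch of inserting into a name -> symbols map.
class SymbolTableSketch {
    // Cut-down stand-in for the article's Symbol class.
    static final class Symbol {
        final String name;
        final int level;
        boolean duplicate;
        Symbol(String name, int level) { this.name = name; this.level = level; }
    }

    private final HashMap<String, ArrayList<Symbol>> symbolTable = new HashMap<>();

    // Add a symbol; flag it if the same name already exists at the same level.
    Symbol add(String name, int level) {
        Symbol symbol = new Symbol(name, level);
        ArrayList<Symbol> sameName = symbolTable.computeIfAbsent(name, k -> new ArrayList<>());
        for (Symbol existing : sameName) {
            if (existing.level == level) {
                symbol.duplicate = true;
                break;
            }
        }
        sameName.add(symbol);
        return symbol;
    }

    public static void main(String[] args) {
        SymbolTableSketch table = new SymbolTableSketch();
        System.out.println(table.add("a", 0).duplicate);  // first global a
        System.out.println(table.add("a", 1).duplicate);  // local a shadows it
        System.out.println(table.add("a", 1).duplicate);  // same name, same level
    }
}
```

Shadowing in an inner scope is legal, so only a match on both name and level counts as a redefinition here.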

See also my GitHub blog: https://dejavudwh.cn/

Origin www.cnblogs.com/secoding/p/11373929.html