Implementing a regular expression matcher in C

Reading notes on "Beautiful Code". The source
code comes from Chapter 9 of the book "The Practice of Programming"
and handles the following constructs.

Character   Meaning
.           Matches any single character
^           Matches the beginning of the input string
$           Matches the end of the input string
*           Matches zero or more occurrences of the previous character

    /* forward declarations: match and matchhere call functions defined later */
    int matchhere(char *regexp, char *text);
    int matchstar(int c, char *regexp, char *text);

    /* match: search for regexp anywhere in text */
    int match(char *regexp, char *text)
    {
        if (regexp[0] == '^')
            return matchhere(regexp+1, text);
        do { /* must look even if string is empty */
            if (matchhere(regexp, text))
                return 1;
        } while (*text++ != '\0');
        return 0;
    }

    /* matchhere: search for regexp at beginning of text */
    int matchhere(char *regexp, char *text)
    {
        if (regexp[0] == '\0')
            return 1;
        if (regexp[1] == '*')
            return matchstar(regexp[0], regexp+2, text);
        if (regexp[0] == '$' && regexp[1] == '\0')
            return *text == '\0';
        if (*text != '\0' && (regexp[0] == '.' || regexp[0] == *text))
            return matchhere(regexp+1, text+1);
        return 0;
    }

    /* matchstar: search for c*regexp at beginning of text */
    int matchstar(int c, char *regexp, char *text)
    {
        do { /* a * matches zero or more instances */
            if (matchhere(regexp, text))
                return 1;
        } while (*text != '\0' && (*text++ == c || c == '.'));
        return 0;
    }

int match(char *regexp, char *text)

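Call this function to perform a match.
The parameter regexp is the regular expression, and the parameter text is the string to search. It returns 1 if regexp matches anywhere in text and 0 otherwise. A leading ^ anchors the match to the start of text; otherwise every starting position is tried in turn, including the empty string at the end.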
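
As a rough illustration of calling match, here is a minimal sketch that counts how many strings in an array match a pattern. The helper count_matching is hypothetical (it is not in the book), and I have added const to the parameters so that string literals can be passed without warnings; the matching logic itself is the same as above.

```c
#include <stddef.h>

static int matchhere(const char *regexp, const char *text);

/* matchstar: search for c*regexp at beginning of text */
static int matchstar(int c, const char *regexp, const char *text)
{
    do { /* a * matches zero or more instances */
        if (matchhere(regexp, text))
            return 1;
    } while (*text != '\0' && (*text++ == c || c == '.'));
    return 0;
}

/* matchhere: search for regexp at beginning of text */
static int matchhere(const char *regexp, const char *text)
{
    if (regexp[0] == '\0')
        return 1;
    if (regexp[1] == '*')
        return matchstar(regexp[0], regexp + 2, text);
    if (regexp[0] == '$' && regexp[1] == '\0')
        return *text == '\0';
    if (*text != '\0' && (regexp[0] == '.' || regexp[0] == *text))
        return matchhere(regexp + 1, text + 1);
    return 0;
}

/* match: search for regexp anywhere in text */
int match(const char *regexp, const char *text)
{
    if (regexp[0] == '^')
        return matchhere(regexp + 1, text);
    do { /* must look even if string is empty */
        if (matchhere(regexp, text))
            return 1;
    } while (*text++ != '\0');
    return 0;
}

/* count_matching: hypothetical helper, counts the strings in lines[]
 * that regexp matches */
size_t count_matching(const char *regexp, const char *const lines[], size_t n)
{
    size_t i, count = 0;

    for (i = 0; i < n; i++)
        if (match(regexp, lines[i]))
            count++;
    return count;
}
```

With the strings hello, help, shell and world, the pattern ^hel should count only the first two, while the unanchored hel should also count shell.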

int matchhere(char *regexp, char *text)

This function checks whether the regular expression matches at the very start of text. An empty regular expression always matches. If the second character of the regular expression is *, it calls matchstar (below) to decide whether the closure matches. A $ at the end of the regular expression matches only if text is exhausted. Otherwise it compares the first character of the regular expression with the first character of text: if they match (or the pattern character is .), it advances both by one character and recurses; if they do not, it returns 0.
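
To watch that recursion without a debugger, one option is an instrumented copy that prints every call. This is only a sketch: the names trace_here, trace_star, trace_anchored and trace_calls, the depth parameter, and the call counter are all my own additions for illustration, and it traces a single attempt at the start of text rather than the full search that match performs.

```c
#include <stdio.h>

static int calls; /* hypothetical counter: matchhere calls in the last trace */

static int trace_here(const char *regexp, const char *text, int depth);

/* trace_star: same loop as matchstar, passing the nesting depth along */
static int trace_star(int c, const char *regexp, const char *text, int depth)
{
    do { /* try the rest of the pattern after 0, 1, 2, ... copies of c */
        if (trace_here(regexp, text, depth + 1))
            return 1;
    } while (*text != '\0' && (*text++ == c || c == '.'));
    return 0;
}

/* trace_here: same tests as matchhere, printing each call before deciding */
static int trace_here(const char *regexp, const char *text, int depth)
{
    calls++;
    /* indent by two spaces per level of nesting */
    printf("%*smatchhere(\"%s\", \"%s\")\n", depth * 2, "", regexp, text);
    if (regexp[0] == '\0')
        return 1;
    if (regexp[1] == '*')
        return trace_star(regexp[0], regexp + 2, text, depth);
    if (regexp[0] == '$' && regexp[1] == '\0')
        return *text == '\0';
    if (*text != '\0' && (regexp[0] == '.' || regexp[0] == *text))
        return trace_here(regexp + 1, text + 1, depth + 1);
    return 0;
}

/* trace_anchored: trace one match attempt at the start of text only */
int trace_anchored(const char *regexp, const char *text)
{
    calls = 0;
    return trace_here(regexp, text, 0);
}

/* trace_calls: number of matchhere calls made by the last trace */
int trace_calls(void)
{
    return calls;
}
```

For example, trace_anchored("ab*c", "abbc") succeeds after six matchhere calls: the c that follows b* is tried against bbc, bc and c before it finally matches.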

int matchstar(int c, char *regexp, char *text)

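Tries to match the rest of the regular expression against text after zero, one, two, and so on occurrences of the character c, and returns 1 at the first success. If text runs out, or a character other than c is reached, without a success, it returns 0. A do-while loop is used because * can also match zero occurrences, so the rest of the regular expression must be tried once before any c is consumed.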

I tried a few examples myself and found that this algorithm stops at the first match it finds and does not look for a longer one. For example, if the regular expression is aaa.* and the text to be matched is aaaabaaa, matching ends after the first three characters, because .* is already satisfied by zero characters.
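
Since match only returns 0 or 1, this behaviour is not visible from the outside. The sketch below is a variant I wrote to make it measurable: the functions return a pointer to the end of the match instead of a flag, and match_length reports how many characters were consumed. The names endhere, endstar and match_length and the greedy flag are assumptions of mine, not part of the original code. With greedy set to 0 the star loop runs in the same order as matchstar; with greedy set to 1 it first consumes as many characters as possible and then backs off, which is shown only for contrast.

```c
#include <stddef.h>

static const char *endhere(const char *regexp, const char *text, int greedy);

/* endstar: like matchstar, but returns the end of the match or NULL */
static const char *endstar(int c, const char *regexp, const char *text,
                           int greedy)
{
    const char *t = text;
    const char *end;

    if (!greedy) {
        /* same order as the original: zero occurrences first */
        do {
            if ((end = endhere(regexp, t, greedy)) != NULL)
                return end;
        } while (*t != '\0' && (*t++ == c || c == '.'));
        return NULL;
    }
    /* greedy: take as many c as possible, then give them back one by one */
    while (*t != '\0' && (*t == c || c == '.'))
        t++;
    for (;;) {
        if ((end = endhere(regexp, t, greedy)) != NULL)
            return end;
        if (t == text)
            return NULL;
        t--;
    }
}

/* endhere: like matchhere, but returns the end of the match or NULL */
static const char *endhere(const char *regexp, const char *text, int greedy)
{
    if (regexp[0] == '\0')
        return text;
    if (regexp[1] == '*')
        return endstar(regexp[0], regexp + 2, text, greedy);
    if (regexp[0] == '$' && regexp[1] == '\0')
        return *text == '\0' ? text : NULL;
    if (*text != '\0' && (regexp[0] == '.' || regexp[0] == *text))
        return endhere(regexp + 1, text + 1, greedy);
    return NULL;
}

/* match_length: number of characters consumed by the first match found,
 * or -1 if there is no match */
int match_length(const char *regexp, const char *text, int greedy)
{
    const char *end;

    if (regexp[0] == '^') {
        end = endhere(regexp + 1, text, greedy);
        return end != NULL ? (int)(end - text) : -1;
    }
    do { /* must look even if string is empty */
        if ((end = endhere(regexp, text, greedy)) != NULL)
            return (int)(end - text);
    } while (*text++ != '\0');
    return -1;
}
```

For aaa.* against aaaabaaa, match_length gives 3 with greedy set to 0, which agrees with what I observed, and 8 with greedy set to 1. Both orders agree on whether a match exists; they differ only in how much text the star takes.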

This is a remarkably compact yet powerful piece of code. I recommend setting breakpoints and stepping through it in a debugger to watch the process and deepen your understanding. (To be honest, that is because I can read this code but could not have written it myself, so I can't offer much more analysis...)

Origin blog.csdn.net/ZhaoBuDaoFangXia/article/details/53187721