C language regex

POSIX provides C library functions for regular expressions; see regex(3). We have already learned to use many C library functions, so the reader should be able to read the man page independently. The earlier parts of this chapter covered regular expressions in grep, sed and awk so that the reader can apply that knowledge by analogy. As an exercise, summarize from regex(3) how regular expressions are used in C and write some simple programs, for example one that checks whether an IP address or email address entered by the user is correctly formatted.

The commonly used regular expression functions in C are regcomp(), regexec(), regfree() and regerror(). Using them generally takes three steps:

Compile the regular expression: regcomp()

Match the regular expression against the text: regexec()

Release the compiled regular expression: regfree()

Below is a detailed explanation of these functions.

regcomp compiles the regular expression given in pattern into an internal data format that makes matching more efficient. regexec then uses this compiled data to match against the target text string. regcomp returns 0 on success and a nonzero error code on failure.

int regcomp(regex_t *compiled, const char *pattern, int cflags);
/*
regex_t is the data structure that stores the compiled regular expression. Its member re_nsub holds the number of subexpressions in the pattern; a subexpression is a part of the pattern wrapped in parentheses.
*/
    compiled          // pointer to the regex_t that receives the compiled expression.
    pattern           // pointer to the regular expression string we wrote.
    cflags            // zero, or the bitwise OR (|) of any of the following four values:
    REG_EXTENDED      // use the more powerful extended regular expression syntax (the default is basic syntax).
    REG_ICASE         // ignore case when matching letters.
    REG_NOSUB         // do not store match positions; only report whether the match succeeded. If this flag is set, regexec ignores its nmatch and pmatch parameters.
    REG_NEWLINE       // treat newlines as line separators: '$' also matches just before a newline and '^' just after one, and '.' and non-matching lists such as [^a] no longer match a newline.

Once the regular expression has been compiled, regexec can match it against the target text string. If REG_NEWLINE was not specified in cflags when compiling, newlines get no special treatment: the entire text is processed as a single string.

regexec returns 0 on a successful match and REG_NOMATCH if there is no match.

regmatch_t is a structure type defined in regex.h:

typedef struct {
    regoff_t rm_so;
    regoff_t rm_eo;
} regmatch_t;

The member rm_so stores the offset in the target string where the matched text begins, and rm_eo stores the offset of the first character after the match. We usually define an array of these structures, because a regular expression often contains subexpressions as well. Element 0 of the array stores the position of the match for the whole regular expression, and the following elements store the positions of the subexpression matches in order. An element whose subexpression did not take part in the match has rm_so set to -1.

int regexec(const regex_t *compiled, const char *string, size_t nmatch, regmatch_t matchptr[], int eflags);
    compiled    // the regular expression already compiled with regcomp.
    string      // the target text string.
    nmatch      // the length of the regmatch_t array.
    matchptr    // array of regmatch_t structures that receives the positions of the matched text.
    eflags      // zero, or the bitwise OR of the following two values:
    REG_NOTBOL  // the start of string is not the beginning of a line, so '^' does not match there.
    REG_NOTEOL  // the end of string is not the end of a line, so '$' does not match there.

When we have finished with a compiled regular expression, or want to compile a different one into the same structure, we call regfree to release the contents of the regex_t that compiled points to. Keep in mind that before recompiling into a regex_t, it must first be released with regfree.

void regfree(regex_t *compiled);

When regcomp or regexec fails, call regerror to obtain a string containing the error message.

size_t regerror(int errcode, const regex_t *compiled, char *buffer, size_t length);

errcode     // the error code returned by regcomp or regexec.
compiled    // the regex_t that was passed to the regcomp or regexec call that failed; this value can be NULL.
buffer      // points to the memory that receives the error message string.
length      // the size of buffer. If the error message is longer than this, regerror truncates it, but still returns the size needed for the complete message, including the terminating null byte. We can therefore use the following call to find out how large a buffer the message needs.

For example: size_t length = regerror(errcode, compiled, NULL, 0);

Test Case:

#include <sys/types.h>
#include <regex.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    if (argc != 3) {
        fprintf(stderr, "Usage: %s RegexString Text\n", argv[0]);
        return 1;
    }
    const char *pregexstr = argv[1];
    const char *ptext = argv[2];
    regex_t oregex;
    char szerrmsg[1024] = {0};

    /* Step 1: compile. If this fails, oregex holds nothing to release,
     * so regfree() must not be called on it. */
    int nerrcode = regcomp(&oregex, pregexstr, REG_EXTENDED | REG_NOSUB);
    if (nerrcode != 0) {
        regerror(nerrcode, &oregex, szerrmsg, sizeof(szerrmsg));
        printf("ErrMsg: %s\n", szerrmsg);
        return 1;
    }

    /* Step 2: match. REG_NOSUB was set, so nmatch and pmatch are ignored. */
    nerrcode = regexec(&oregex, ptext, 0, NULL, 0);
    if (nerrcode == 0) {
        printf("%s matches %s\n", ptext, pregexstr);
    } else {
        /* Covers REG_NOMATCH too; regerror() null-terminates the buffer. */
        regerror(nerrcode, &oregex, szerrmsg, sizeof(szerrmsg));
        printf("ErrMsg: %s\n", szerrmsg);
    }

    /* Step 3: release the compiled expression. */
    regfree(&oregex);
    return nerrcode == 0 ? 0 : 1;
}

Matching a URL:

./a.out "http:\/\/www\..*\.com" "http://www.taobao.com"

Matching an email address:

./a.out "^[a-zA-Z0-9]+@[a-zA-Z0-9]+\.[a-zA-Z0-9]+" "[email protected]"
./a.out "\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*" "[email protected]"

NOTE: \w matches a letter, a digit, or an underscore. It is a GNU extension rather than part of the POSIX extended regular expression syntax.

Besides the POSIX functions above, another commonly used regular expression library is PCRE, whose full name is Perl Compatible Regular Expressions. As the name suggests, PCRE is a regular expression library compatible with the regular expressions of Perl. PCRE is a free, open source library implemented in C. Its official home page is http://www.pcre.org/, where interested readers can learn more. The PCRE library can be downloaded from http://sourceforge.net/projects/pcre/files/

PCRE++ is a C++ wrapper around the PCRE library that provides a more convenient, easier to use C++ interface. Its official home page is http://www.daemon.de/PCRE, where interested readers can learn more. The PCRE++ library can be downloaded from http://www.daemon.de/PcreDownload

Boost.Regex is also commonly used in C++.

Origin www.cnblogs.com/wanghao-boke/p/11488587.html