[Regular Expressions] Day 01

Regular expressions
I. Overview
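    Uses:
    - validation (for example, checking user input such as email addresses)
    - web crawlers (extracting data from fetched pages)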

    Concept:
    a regular expression is a string, written in a special syntax, that describes a text pattern.

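    Function libraries
    PCRE (Perl Compatible Regular Expressions, the preg_* functions)
    1. Syntax compatible with Perl regular expressions (the same style is used in Java and C).
    2. Fast and efficient.
    POSIX (the ereg* functions)
    1. Lower efficiency.
    2. A security risk.
    3. Does not run on Windows.
    4. Deprecated in PHP 5.3.0 and removed in PHP 7.0.0.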

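II. Using regular expressions
    preg_match_all(string $pattern, string $subject, array &$matches)
    Purpose: performs a global regular expression match (finds every match, not just the first).
    Parameters:
    $pattern   the regular expression
    $subject   the string to search
    $matches   receives the matched results (passed by reference)
    Return value: the number of full matches found (false if an error occurs).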
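Below is a minimal sketch of the call described above, written in PHP because preg_match_all is a PHP function. The sample string is invented for illustration. With the default flag, $matches[0] lists every full match in order.

```php
<?php
// Invented sample text containing three separate runs of digits.
$subject = 'Order 12 shipped on day 3, invoice 456';

// '/\d+/' means "one or more digits"; preg_match_all collects every match, not just the first.
$count = preg_match_all('/\d+/', $subject, $matches);

echo $count, PHP_EOL;                    // 3
echo implode(',', $matches[0]), PHP_EOL; // 12,3,456
```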

    Syntax:
    - Delimiters
    - Atoms
    - Metacharacters
    - Pattern modifiers

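    - Delimiters
      A regular expression must be enclosed in delimiters. Any character except letters, digits, backslash and whitespace can serve as a delimiter; the most commonly used one is the forward slash "/". Delimiters appear in pairs: normally the same character opens and closes the pattern, while bracket-style delimiters such as {} close with the matching bracket.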
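This sketch shows why the choice of delimiter matters. It runs the same pattern twice against an invented URL. With "/" as the delimiter, every "/" inside the pattern has to be escaped; with "#" it does not.

```php
<?php
// Invented sample URL.
$url = 'http://www.example.com/index.php';

// "/" as the delimiter: the literal "/" inside the pattern must be escaped as "\/".
$a = preg_match_all('/\/\w+\.php/', $url, $m1);

// "#" as the delimiter: the same pattern needs no escaping of "/".
$b = preg_match_all('#/\w+\.php#', $url, $m2);

echo $a, ' ', $m1[0][0], PHP_EOL; // 1 /index.php
echo $b, ' ', $m2[0][0], PHP_EOL; // 1 /index.php
```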
    - Atoms
      An atom is the smallest unit of a regular expression. A regular expression must contain at least one atom to be meaningful.
      a. Printable characters: digits, letters, underscores and all other printable characters are atoms.
      b. Non-printing characters, for example:
         \n  newline
      c. Characters that must be escaped: characters with a special meaning, such as metacharacters, must be preceded by a backslash to be matched literally, e.g. \. or \*.
      d. Atoms with a special meaning
         \d  any digit
         \D  any non-digit

         \w  any digit, letter or underscore
         \W  any character other than a digit, letter or underscore

         \s  any whitespace character
         \S  any non-whitespace character

      e. Custom atom tables (character classes)
         [] specifies a set of atoms, for example:
             [a-z]  all lowercase letters
             [A-Z]  all uppercase letters
             [0-9]  all digits

         Several ranges can be given together with no separator between them, e.g. [a-z0-9].
         Inside a custom atom table, a leading ^ means "not", e.g. [^0-9] matches any non-digit.
      f. . (dot) matches any single character except a newline.
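This sketch exercises the atom types listed above against one invented two-line string. It covers the special atoms, a custom atom table, negation with ^, an escaped dot, and the bare dot.

```php
<?php
// Invented sample text: two lines with digits, an underscore and punctuation.
$s = "Room 42, floor_3.\nDone?";

preg_match_all('/\d/', $s, $m);        // \d: any single digit
$digits = $m[0];                       // ['4', '2', '3']

preg_match_all('/\w+/', $s, $m);       // \w: digit, letter or underscore (runs of them)
$words = $m[0];                        // ['Room', '42', 'floor_3', 'Done']

$spaces = preg_match_all('/\s/', $s);  // \s: two spaces and one newline -> 3

preg_match_all('/[a-z]+/', $s, $m);    // custom atom table: runs of lowercase letters
$lower = $m[0];                        // ['oom', 'floor', 'one']

preg_match_all('/[^\w\s]/', $s, $m);   // ^ inside []: neither \w nor \s
$punct = $m[0];                        // [',', '.', '?']

$dots = preg_match_all('/\./', $s);    // escaped "." matches only a literal dot -> 1
$any  = preg_match_all('/./', $s);     // bare "." matches any character except "\n" -> 22

echo implode(' ', $words), PHP_EOL;    // Room 42 floor_3 Done
```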

    - Metacharacters
      Metacharacters are used to modify atoms.
      *  the modified atom can appear zero or more times.
      +  the modified atom can appear one or more times.
      ?  the modified atom can appear zero times or once.

      {m}    the modified atom must appear exactly m times.
      {n,m}  the modified atom can appear at least n and at most m times
             (n <= count <= m).
      {n,}   the modified atom must appear at least n times, with no upper limit
             (count >= n).

      |  alternation (or): matches either the expression on its left or the one on its right.

      ^ and \A  the modified atom must appear at the start of the string.
      $ and \Z  the modified atom must appear at the end of the string.

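      ()
       - sub-patterns (the matched part is captured as a group)
       - change the range a metacharacter applies to
       - back-references (\1, \2, ... refer back to what a group captured)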


      \b and \B  \b matches a word boundary, \B matches a position that is not a word boundary.
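This sketch tries each metacharacter listed above on small invented inputs. The quantifier patterns are anchored with ^...$, so preg_match_all returns 1 only when the whole string fits and 0 otherwise.

```php
<?php
// Quantifiers (anchored, so the whole string must match)
$star    = preg_match_all('/^ab*c$/', 'ac');       // * : zero "b" allowed        -> 1
$plus    = preg_match_all('/^ab+c$/', 'ac');       // + : at least one "b" needed -> 0
$opt     = preg_match_all('/^colou?r$/', 'color'); // ? : "u" zero times or once  -> 1
$exact   = preg_match_all('/^\d{3}$/', '2024');    // {3}  : exactly 3 digits     -> 0
$range   = preg_match_all('/^\d{2,4}$/', '2024');  // {2,4}: 2 to 4 digits        -> 1
$atLeast = preg_match_all('/^\d{2,}$/', '7');      // {2,} : 2 or more digits     -> 0

// | : either alternative
$pets = preg_match_all('/cat|dog/', 'cat, dog, bird'); // -> 2

// ^ / $ : start / end of the string; \A behaves like ^ here
preg_match_all('/^\d+/', '123abc456', $m);
$start = $m[0];                                    // ['123']
preg_match_all('/\d+$/', '123abc456', $m);
$end = $m[0];                                      // ['456']
$startA = preg_match_all('/\A\d/', '1a2b3c');      // only the leading "1" -> 1

// () : sub-patterns are captured into $m[1], $m[2], ...
preg_match_all('/(\d{4})-(\d{2})/', '2019-10 and 2020-03', $m);
$years  = $m[1];                                   // ['2019', '2020']
$months = $m[2];                                   // ['10', '03']

// () also changes the range a quantifier applies to: (ab)+ repeats "ab", ab+ repeats only "b"
preg_match_all('/(ab)+/', 'abab abb', $m);
$grouped = $m[0];                                  // ['abab', 'ab']
preg_match_all('/ab+/', 'abab abb', $m);
$plain = $m[0];                                    // ['ab', 'ab', 'abb']

// \1 : back-reference to what group 1 matched (here: a doubled character)
preg_match_all('/(\w)\1/', 'book keeper', $m);
$doubles = $m[0];                                  // ['oo', 'ee']

// \b / \B : word boundary / non-boundary
$whole  = preg_match_all('/\bcat\b/', 'cat concat cats cat.'); // -> 2
$inside = preg_match_all('/\Bcat/', 'cat concat cats cat.');   // "concat" only -> 1
```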

    - Pattern modifiers
      Pattern modifiers change how the whole regular expression behaves. They are written after the closing delimiter, e.g. /abc/i.

      i: makes the regular expression case-insensitive.
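This small sketch shows the i modifier written after the closing delimiter. The sample text is invented.

```php
<?php
$text = 'PHP php Php';

// Without a modifier the match is case-sensitive: only the lowercase "php" is found.
$sensitive = preg_match_all('/php/', $text);    // 1

// The i modifier goes after the closing delimiter and makes the match case-insensitive.
$insensitive = preg_match_all('/php/i', $text); // 3

// Modifiers are placed the same way with any delimiter.
$hashDelim = preg_match_all('#php#i', $text);   // 3

echo "$sensitive $insensitive $hashDelim", PHP_EOL; // 1 3 3
```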
      

[Figure: how email validation with a regular expression works]
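The original picture of the email pattern is not available. The sketch below therefore uses a deliberately simplified pattern of my own, which is an assumption and not RFC 5322 compliant. It combines the pieces above: ^ and $ anchors, a custom atom table, quantifiers and a group. The D modifier stops PHP's $ from also accepting a trailing newline.

```php
<?php
// Simplified email pattern (an assumption for illustration, not a complete email grammar):
// local part of word characters, "." or "-", then "@", domain labels, and a suffix of 2+ letters.
$pattern = '/^[\w.-]+@[a-zA-Z0-9-]+(\.[a-zA-Z0-9-]+)*\.[a-zA-Z]{2,}$/D';

$inputs = ['user@example.com', 'first.last@mail.example.cn', 'no-at-sign.com', 'a@b'];
$valid = [];
foreach ($inputs as $email) {
    // Anchored pattern: the count is 1 when the whole string matches, otherwise 0.
    $valid[$email] = preg_match_all($pattern, $email) === 1;
}

foreach ($valid as $email => $ok) {
    echo $email, ': ', $ok ? 'valid' : 'invalid', PHP_EOL;
}
```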

[Figure: how registration-form validation with regular expressions works]
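The registration picture is also missing, so the rules below are hypothetical. They are a username and password policy chosen only for illustration, and the helper names checkUsername and checkPassword are invented as well. Each rule is a single anchored pattern.

```php
<?php
// Hypothetical rules (assumptions, not from the original):
//   username: starts with a letter, then letters/digits/underscore, 6-16 characters in total
//   password: 6-18 non-whitespace characters
function checkUsername(string $name): bool
{
    return preg_match_all('/^[a-zA-Z]\w{5,15}$/D', $name) === 1;
}

function checkPassword(string $pwd): bool
{
    return preg_match_all('/^\S{6,18}$/D', $pwd) === 1;
}

var_dump(checkUsername('alice_01')); // bool(true)
var_dump(checkUsername('1alice'));   // bool(false): starts with a digit
var_dump(checkPassword('abc 123'));  // bool(false): contains a space
var_dump(checkPassword('s3cret!!')); // bool(true)
```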

 

 

 


Source: www.cnblogs.com/tommymarc/p/11627343.html