[Muxi_k] - Getting Started with PHP Regular Expressions

Regular Expressions

PHP regular expression study notes

What is a regular expression

  1. A regular expression (here, a Perl-Compatible Regular Expression, PCRE) is a pattern used for matching

  2. A regular expression is itself a string

  3. A regular expression has to be passed to a matching function to be of any use

  • PHP has shipped two regular expression libraries with similar functionality and only a slight difference in efficiency

    • Functions provided by the PCRE library, named with the preg_ prefix
    • Functions provided by POSIX extended regular expressions, named with the ereg_ prefix (deprecated in PHP 5.3 and removed in PHP 7.0)

Note

  • PCRE takes its syntax from the Perl language
  • PCRE syntax supports more features and is more powerful than POSIX

Regular expression syntax

<?php
$reg = "/a\d/i";
  • Delimiters: several characters can serve as delimiters; the most common choice is //
  • Atoms: the smallest matching unit, e.g. /a/ (placed inside the delimiters)
  • Metacharacters: cannot be used alone; they modify atoms, extending or restricting what the atoms match, e.g. /\d/ (placed inside the delimiters)
  • Pattern modifiers: adjust how the whole pattern is applied, e.g. /a/i (placed outside the delimiters, after the closing one)
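As a rough illustration of the parts listed above, the sketch below passes the pattern /a\d/i to preg_match(). The subject string and the variable names are invented for the example; the second call uses # as an alternative delimiter, which PCRE also accepts.

```php
<?php
// Hypothetical subject string, chosen only for illustration.
$subject = "Room A7 is free";

// "/" delimiters, atom "a", generic type "\d", modifier "i" after the closing delimiter.
$reg = '/a\d/i';
$found = preg_match($reg, $subject, $m);   // 1 on a match, 0 otherwise

// The same pattern written with "#" as the delimiter.
$found2 = preg_match('#a\d#i', $subject, $m2);

var_dump($found, $m[0], $m2[0]);
```

Because of the i modifier, the lowercase a in the pattern also matches the uppercase A in the subject.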

Regular expression atoms

An atom is the smallest unit of a regular expression; an expression contains at least one atom

Every printable or non-printing character that is not defined as a metacharacter is an atom. Atoms fall into five categories.

  1. Ordinary characters as atoms: a-z A-Z 0-9 and others
  2. Special characters used as atoms after escaping:
    • Any punctuation mark can be an atom, but symbols with a special meaning must be escaped first, such as \" \' \* \+ \? \. and so on
  3. Non-printing characters as atoms, such as \f \n \r \t \cx
  4. Generic character types as atoms, such as \d \D \w \W \s \S
  5. Custom atom tables ([]) as atoms, such as '/[apj]sp/' and '/[^apj]sp/'
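The two patterns from category 5 can be tried with preg_grep(), which keeps the array elements that match a pattern. This is only a sketch; the file names are invented for the example.

```php
<?php
// File names invented for the example.
$names = ['index.asp', 'index.jsp', 'index.psp', 'index.xsp'];

// [apj] accepts exactly one of a, p or j in front of "sp".
$inSet = preg_grep('/[apj]sp/', $names);

// [^apj] accepts any single character except a, p and j in front of "sp".
$notInSet = preg_grep('/[^apj]sp/', $names);

print_r(array_values($inSet));     // index.asp, index.jsp, index.psp
print_r(array_values($notInSet));  // index.xsp
```

preg_grep() preserves the original array keys, which is why array_values() is applied before printing.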

Ordinary characters as atoms

  • Printable characters: a-z A-Z 0-9 !@#$%^&*()...
  • Non-printing characters: \n...
$str = "this is a ^ test ";

$reg = "/\^/";

if (preg_match($reg, $str, $arr)) {
    echo "Regular expression {$reg} matched \"{$str}\"!\n";
    print_r($arr);
} else {
    echo "No match";
}
Regular expression /\^/ matched "this is a ^ test "!
Array
(
    [0] => ^
)

'\': the escape character

Special characters used as atoms after escaping

  • It can turn a symbol that has a special meaning into a plain atom, e.g. /\^/
  • It can turn an ordinary character into an atom with a special meaning, e.g. /\t/ (the tab character)
  • a-z A-Z 0-9: these characters have no special meaning on their own, so they never need a backslash to be matched literally; putting one in front of them may instead give them a special meaning

Apart from a-z A-Z 0-9, it is safest to escape any other symbol that should be matched literally, because most special symbols carry a special meaning
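When the text to match comes from a variable, escaping by hand is error-prone. One option is PHP's preg_quote(), which adds the backslashes for you. The sketch below assumes an invented literal string and subject.

```php
<?php
// Hypothetical literal text that contains several regex metacharacters.
$literal = 'price: $5.00 (approx.)';

// preg_quote() puts a backslash in front of every character with a special meaning;
// the second argument names the delimiter so that it is escaped as well.
$escaped = preg_quote($literal, '/');

$subject = 'Listed price: $5.00 (approx.) per unit';
$matched = preg_match('/' . $escaped . '/', $subject, $m);

echo $escaped, "\n";
var_dump($matched, $m[0]);
```

Without the escaping, the $ and the parentheses would be read as metacharacters and the pattern would not match this subject.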


Using generic character types

\d matches any single digit
\D matches any single character that is not a digit
# /\d/
$str = "this is a 123 test 11";
// match every digit
$reg = "/\d/";
------------------------------------------
Regular expression /\d/ matched "this is a 123 test 11"!
Array
(
    [0] => Array
        (
            [0] => 1
            [1] => 2
            ...
        )

)
# /\D/
$str = "this is a 123 test 11";

// match every non-digit
$reg = "/\D/";
------------------------------------------
Regular expression /\D/ matched "this is a 123 test 11"!
Array
(
    [0] => Array
        (
            [0] => t
            [1] => h
            [2] => i
            [3] => s
            [4] =>
            [5] => i
            ...
        )

)
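The output above looks like the result of preg_match_all(), although the call itself is not shown. A minimal sketch under that assumption, using the same subject string:

```php
<?php
$str = "this is a 123 test 11";

// preg_match_all() returns the number of full matches and stores them in $m[0].
$digits    = preg_match_all('/\d/', $str, $m1);
$nonDigits = preg_match_all('/\D/', $str, $m2);

echo implode('', $m1[0]), "\n";                          // every digit, joined
echo $digits + $nonDigits, ' == ', strlen($str), "\n";   // the two sets cover the string
```

Since \D is the exact complement of \d, every character of the subject is counted by exactly one of the two patterns.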

\w matches any word character: a-z A-Z 0-9 _
\W matches any non-word character, i.e. everything except a-z A-Z 0-9 _
# /\w/
$str = "!@$@%%^%$@____";
// match every word character
$reg = "/\w/";

------------------------------------
Regular expression /\w/ matched "!@$@%%^%$@____"!
Array
(
    [0] => Array
        (
            [0] => _
            [1] => _
            ...
        )
)
# /\W/
Regular expression /\W/ matched "!@$@%%^%$@____"!
Array
(
    [0] => Array
        (
            [0] => !
            [1] => @
            [2] => $
            [3] => @
            ...
        )

)
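A typical use of \W is stripping everything that is not a word character. The sketch below uses preg_replace() with an invented input string.

```php
<?php
// Hypothetical raw input.
$raw = 'user-name_01!';

// Remove every non-word character; letters, digits and "_" survive.
$clean = preg_replace('/\W/', '', $raw);

echo $clean, "\n";
```

The underscore is kept because it counts as a word character, while the hyphen and the exclamation mark are removed.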

\s matches any whitespace character
\S matches any character that is not whitespace
# /\s/
$str = "qw
we   rt";
----------
Array
(
    [0] => Array
        (
            [0] =>
            [1] =>
            [2] =>
            [3] =>
        )

)
# /\S/
Regular expression /\S/ matched "qw
we   rt"!
Array
(
    [0] => Array
        (
            [0] => q
            [1] => w
            [2] => w
            [3] => e
            [4] => r
            [5] => t
        )

)

Custom atom tables

Use [] to specify a set of characters; the atom matches any one character from the set

# /[469]/
Regular expression /[469]/ matched "1243456768909"!
Array
(
    [0] => Array
        (
            [0] => 4
            [1] => 4
            [2] => 6
            [3] => 6
            [4] => 9
            [5] => 9
        )

)

A range can be specified with '-'

Regular expression /[1-5]/ matched "1243456768909"!
Array
(
    [0] => Array
        (
            [0] => 1
            [1] => 2
            [2] => 4
            [3] => 3
            [4] => 4
            [5] => 5
        )

)

Negation with ^

Matches every character that is not in the set

Regular expression /[^1-4]/ matched "ass2423"!
Array
(
    [0] => Array
        (
            [0] => a
            [1] => s
            [2] => s
        )
)
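A set and its negation split the subject between them, and one pair of brackets can hold several ranges. A small sketch with the same subject string; the third pattern is an invented illustration.

```php
<?php
$str = "ass2423";

preg_match_all('/[1-4]/', $str, $inside);    // characters inside the range
preg_match_all('/[^1-4]/', $str, $outside);  // every other character

// Several ranges can share one pair of brackets.
preg_match_all('/[a-c2-3]/', $str, $mixed);

echo implode('', $inside[0]), "\n";
echo implode('', $outside[0]), "\n";
echo implode('', $mixed[0]), "\n";
```

Note that ^ only negates when it is the first character inside the brackets.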

The dot (.)

Matches any single character (by default, any character except a newline)


Note

Non-printing characters commonly used in regular expressions

Atom    Meaning
\cx     Matches the control character specified by x; for example, \cM matches Control-M, i.e. a carriage return. x must be one of a-z or A-Z
\f      Matches a form feed, equivalent to \x0c or \cL
\n      Matches a newline, equivalent to \x0a or \cJ
\r      Matches a carriage return, equivalent to \x0d or \cM
\t      Matches a tab, equivalent to \x09 or \cI
\v      Matches a vertical tab, equivalent to \x0b or \cK (in PCRE, \v matches any vertical whitespace character, which includes the vertical tab)
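A short sketch of the escapes in the table, using an invented line of text. The patterns are written in single quotes so that PHP passes the backslashes through and PCRE interprets them.

```php
<?php
// Double quotes here, so PHP itself turns \t, \r and \n into real control characters.
$line = "name\tvalue\r\n";

$tab    = preg_match('/\t/', $line);     // tab written as an escape
$tabHex = preg_match('/\x09/', $line);   // the same character as a hex code
$crlf   = preg_match('/\r\n/', $line);   // carriage return followed by newline
$ff     = preg_match('/\f/', $line);     // there is no form feed in $line

var_dump($tab, $tabHex, $crlf, $ff);
```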

Regular expression metacharacters

Metacharacters cannot be used on their own in an expression; they modify atoms

  • '*': the preceding atom may occur zero, one, or more times
# /go*gle/
Regular expression /go*gle/ matched "this gggggoogle is a test "!
Array
(
    [0] => Array
        (
            [0] => google
        )
)
  • '+': matches the preceding atom one or more times (the atom must occur at least once)
Regular expression /g+oogle/ matched "this gggggoogle is a test "!
Array
(
    [0] => Array
        (
            [0] => gggggoogle
        )
)
  • '?': matches the preceding atom zero or one time (the atom is optional and cannot repeat)
# Regular expression /go?gle/ matched "this gogle is a test "!
Array
(
    [0] => Array
        (
            [0] => gogle
        )
)
  • {n}: the preceding atom occurs exactly n times
# Regular expression /go{2}gle/ matched "this google is a test "!
Array
(
    [0] => Array
        (
            [0] => google
        )
)
  • {n,}: the preceding atom occurs at least n times
# Regular expression /go{2,}gle/ matched "this google is a test "!
Array
(
    [0] => Array
        (
            [0] => google
        )
)
  • {n,m}: the preceding atom occurs at least n times and at most m times
# Regular expression /go{1,3}gle/ matched "this google is a test "!
Array
(
    [0] => Array
        (
            [0] => google
        )
)
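To compare the quantifiers side by side, the sketch below runs each one against the same word list. The word list is invented, and the ^ and $ anchors are an addition so that each word has to match as a whole.

```php
<?php
// Invented test words with zero to four "o" characters.
$words = ['ggle', 'gogle', 'google', 'gooogle', 'goooogle'];

$patterns = [
    '/^go*gle$/',      // zero or more "o"
    '/^go+gle$/',      // one or more "o"
    '/^go?gle$/',      // zero or one "o"
    '/^go{2}gle$/',    // exactly two "o"
    '/^go{2,}gle$/',   // two or more "o"
    '/^go{1,3}gle$/',  // between one and three "o"
];

$result = [];
foreach ($patterns as $p) {
    // preg_grep() keeps the words that match; array_values() renumbers the keys.
    $result[$p] = array_values(preg_grep($p, $words));
}

print_r($result);
```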
  • ()
    • Changes precedence
    • Turns several small atoms into one large atom
    • Subpatterns: the whole expression is one large pattern, and each pair of parentheses is an independent subpattern
    • Backreferences

Regular expression /orac(le|my)/ matched "this oracmysql is a test "!
Array
(
    [0] => Array
        (
            [0] => oracmy
        )

    [1] => Array
        (
            [0] => my
        )
)
Regular expression /orac(mysql)*/ matched "this oracmysql is a mysql "!
Array
(
    [0] => Array
        (
            [0] => oracmysql
        )

    [1] => Array
        (
            [0] => mysql
        )

)
Regular expression /(http|ftp):\/\/www(.*)?\.(com|net)/ matched "this http://www.baidu.com oracmysql ftp://www.exp.net is a mysql "!
Array
(
    [0] => Array
        (
            [0] => http://www.baidu.com oracmysql ftp://www.exp.net
        )

    [1] => Array
        (
            [0] => http
        )

    [2] => Array
        (
            [0] => .baidu.com oracmysql ftp://www.exp
        )

    [3] => Array
        (
            [0] => net
        )

)
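The single long match above is a consequence of greedy matching: (.*) runs as far as it can and only backs off to the last ".com" or ".net". A sketch of the difference a lazy quantifier makes, using the same subject string:

```php
<?php
$str = "this http://www.baidu.com oracmysql ftp://www.exp.net is a mysql ";

// Greedy: (.*) takes as much as possible, so one match spans both URLs.
preg_match_all('/(http|ftp):\/\/www(.*)\.(com|net)/', $str, $greedy);

// Lazy: (.*?) stops at the first position where the rest of the pattern fits.
preg_match_all('/(http|ftp):\/\/www(.*?)\.(com|net)/', $str, $lazy);

print_r($greedy[0]);
print_r($lazy[0]);
```

With the lazy form, each URL is reported as its own match.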
Regular expression /\d{4}(-|\/)\d{2}\1\d{2}/ matched "this 2020-03-06 2020/03/06 "!
Array
(
    [0] => Array
        (
            [0] => 2020-03-06
            [1] => 2020/03/06
        )

    [1] => Array
        (
            [0] => -
            [1] => /
        )

)
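The backreference \1 stands for whatever the first pair of parentheses actually captured, not for the subpattern itself. The sketch below adds an invented third date with mixed separators to show that it is rejected.

```php
<?php
// The last date mixes "-" and "/" and is included only to show a non-match.
$str = "2020-03-06 2020/03/06 2020-03/06";

// \1 must repeat the separator captured by group 1.
preg_match_all('/\d{4}(-|\/)\d{2}\1\d{2}/', $str, $m);

print_r($m[0]);   // the two dates with consistent separators
print_r($m[1]);   // the separator captured for each match
```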

Metacharacter table

Metacharacter   Meaning
*               Matches the preceding atom zero, one, or more times
+               Matches the preceding atom one or more times
?               Matches the preceding atom zero or one time
|               Alternation: matches one of two or more branches
{n}             The preceding atom occurs exactly n times
{n,}            The preceding atom occurs at least n times
{n,m}           The preceding atom occurs at least n and at most m times
^ or \A         Matches the start of the input string (in multiline mode, ^ also matches at the start of each line, i.e. immediately after a newline)
$ or \Z         Matches the end of the input string (in multiline mode, $ also matches at the end of each line, i.e. immediately before a newline)
\b              Matches a word boundary
\B              Matches any position that is not a word boundary
()              Groups atoms into one unit (a subpattern), so that several atoms are treated as a single large atom
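The anchors and boundary assertions in the table match positions rather than characters. A small sketch with an invented subject string:

```php
<?php
// Invented subject: "cat" appears as a word, inside a word, and at the start of a word.
$str = "cat concat cats";

preg_match_all('/cat/', $str, $any);        // every occurrence
preg_match_all('/\bcat\b/', $str, $whole);  // "cat" as a whole word only
preg_match_all('/\Bcat/', $str, $inner);    // "cat" that does not start a word

$atStart = preg_match('/^cat/', $str);      // subject starts with "cat"
$atEnd   = preg_match('/cat$/', $str);      // subject ends with "cat"

var_dump(count($any[0]), count($whole[0]), count($inner[0]), $atStart, $atEnd);
```

Only the first word satisfies \bcat\b, and only the "cat" inside "concat" satisfies \Bcat.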

Pattern modifiers (single characters)

  1. Pattern modifiers are written outside the delimiters (to the right of the closing one)

Example:

"/go*gle/i"
  2. Each pattern modifier is a single character with a single function

  3. Effect:

    • Pattern modifiers adjust how a regular expression is interpreted and extend what it can do
  • i: makes matching case-insensitive (the default is case-sensitive)
Regular expression /test/i matched "this is a Test"!
Array
(
    [0] => Array
        (
            [0] => Test
        )
)
  • m: treats the subject as multiple lines (by default a subject with several lines is treated as a single line)
Regular expression /^is/m matched "this
is a Test"!
Array
(
    [0] => Array
        (
            [0] => is
        )
)
  • s: lets the dot (.) match a newline (by default . does not match a newline)
Regular expression /Te.*st/s matched "this is a Te
st"!
Array
(
    [0] => Array
        (
            [0] => Te
st
        )
)
  • x: whitespace inside the pattern is ignored
Regular expression /web server/ix matched "this is a WebServer"!
Array
(
    [0] => Array
        (
            [0] => WebServer
        )

)
  • U: the quantifiers (*, +) are greedy by default; U switches greedy matching off (rarely needed, since the lazy form (.*?) is the usual choice)
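The U modifier has no example above, so here is a minimal sketch comparing it with a lazy quantifier. The HTML fragment is invented for the example.

```php
<?php
// Invented fragment with two <b> elements.
$html = "<b>one</b> and <b>two</b>";

preg_match_all('/<b>.*<\/b>/', $html, $greedy);     // greedy by default
preg_match_all('/<b>.*<\/b>/U', $html, $ungreedy);  // U makes the quantifier lazy
preg_match_all('/<b>.*?<\/b>/', $html, $lazy);      // lazy quantifier, no modifier

print_r($greedy[0]);
print_r($ungreedy[0]);
print_r($lazy[0]);
```

The last two patterns give the same result here. The lazy quantifier is usually easier to read because its effect is limited to one place in the pattern.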

Writing regular expressions

  1. Regular expressions are a language of their own; learn them with an open mind
  2. List the requirements first

Writing a regular expression that matches URLs

<?php

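$str = "
    This is the site http://www.example.com
    This is the site http://www.xxx.net/index.php
    This is the site http://www.example.cn/php
    This is the site http://www.demo.org/login.php?user=aaa
    This is the site https://www.test.top
    This is the site https://news.baidu.top
    This is the site ftp://news.baidu.top
";


// The dots between the host name parts are escaped so that they match a literal "."
$reg = '/(https?|ftps?):\/\/(.*?)\.(.*?)\.(com|net|org|cn|top)([\w\.\/\=\?\&]*)?/';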

preg_match_all($reg, $str, $arr);
print_r($arr);
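By default preg_match_all() groups its results by subpattern. With the PREG_SET_ORDER flag the results are grouped by match instead, which is often easier to loop over. The sketch below uses a simplified variant of the URL pattern (an assumption, not the pattern above) and an invented subject string.

```php
<?php
// Invented subject string with two URLs.
$str = "see http://www.example.com and https://news.baidu.top/index.php?id=1 here";

// Simplified variant: scheme, host name, top-level domain, optional path and query.
$reg = '/(https?|ftps?):\/\/([\w.-]+)\.(com|net|org|cn|top)([\w.\/=?&]*)/';

// PREG_SET_ORDER: $sets[0] holds everything for the first match, $sets[1] for the second.
preg_match_all($reg, $str, $sets, PREG_SET_ORDER);

foreach ($sets as $s) {
    echo $s[1], ' | ', $s[2], ' | ', $s[3], ' | ', $s[4], "\n";
}
```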

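Writing a regular expression that matches email addresses


<?php

$str = "
    This is the mailbox [email protected]
    This is the mailbox [email protected]
    This is the mailbox [email protected]
    This is the mailbox [email protected]
";


// [-.] is a character class: a hyphen or a dot may separate the parts of the domain
$reg = "/\w+([-+]\w+)*@\w+([-.]\w+)*\.\w+/i";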

preg_match_all($reg, $str, $arr);
print_r($arr);
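The addresses in the listing above are not readable, so the sketch below runs the same kind of pattern on invented addresses under the reserved example.com and example.org domains. It assumes the domain part is meant to use the character class [-.].

```php
<?php
// Invented addresses, used only for illustration.
$str = "contact alice@example.com or bob+news@mail.example.org today";

// Local part: word characters, optionally joined by "-" or "+".
// Domain: word characters joined by "-" or ".", ending in a dot and a final label.
$reg = '/\w+([-+]\w+)*@\w+([-.]\w+)*\.\w+/i';

preg_match_all($reg, $str, $arr);
print_r($arr[0]);
```

This pattern is a teaching example and accepts only a subset of valid addresses; for instance, a dot in the local part is not covered.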

Using regular expressions

Splitting, matching, searching, replacing

  1. String handling functions (fast, but limited in what they can do)
  2. Regular expression functions (powerful, but less efficient)

Note: if a string handling function can do the job, do not use a regular expression

String searching:

  • strstr()
  • strpos()
  • substr()

Regular expression matching

  • preg_match()
  • preg_match_all()
  • preg_grep()

String splitting and joining

  • explode()
  • implode()
  • join()

Regular expression splitting

  • preg_split()

String replacement

  • str_replace()

Regular expression replacement

  • preg_replace()
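To make the comparison concrete, the sketch below pairs a few of the string functions with their regular expression counterparts. All input values are invented for the example.

```php
<?php
// Splitting: a fixed separator versus a separator described by a pattern.
$simple = explode(',', 'a,b,c');
$parts  = preg_split('/\s*[,;]\s*/', "a,b;c , d");

// Replacing: literal text versus captured groups that are rearranged.
$replaced = str_replace('-', '/', '2020-03-06');
$swapped  = preg_replace('/(\d{4})-(\d{2})-(\d{2})/', '$3/$2/$1', '2020-03-06');

// Filtering an array: keep only the elements made up entirely of digits.
$numbers = array_values(preg_grep('/^\d+$/', ['10', 'x1', '42']));

print_r($simple);
print_r($parts);
echo $replaced, "\n", $swapped, "\n";
print_r($numbers);
```

Where the separator or the search text is fixed, explode() and str_replace() are enough; the preg_ functions earn their cost only when the input varies in shape.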

Origin blog.csdn.net/Muxi_k/article/details/104751341