Regular Expressions
PHP regular expression study notes
What is a regular expression
- A regular expression (in PHP, "Regular Expressions (Perl-Compatible)") is a pattern used for matching
- A regular expression is itself a string
- Regular expressions must be used with the corresponding functions
- PHP has had two sets of regular expression functions, functionally similar, with only a slight difference in efficiency
  - Functions provided by the PCRE library, prefixed with preg_
  - Functions provided by the POSIX extended library, prefixed with ereg (deprecated in PHP 5.3 and removed in PHP 7.0)
Note
- PCRE takes its syntax from the Perl language
- PCRE syntax supports more features and is more powerful than POSIX
Regular expression syntax
<?php
$reg = "/a\d/i";
- Delimiters: a variety of characters can be used; the most common is //
- Atom: the smallest matching unit, e.g. /a/ (placed inside the delimiters)
- Metacharacters: cannot be used alone; they modify atoms, extending or restricting what the atoms match, e.g. /\d/ (placed inside the delimiters)
- Pattern modifiers: adjust how the pattern is interpreted, e.g. /a/i (placed outside the delimiters)
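As a minimal sketch of how these parts fit together, the pattern /a\d/i can be tried with preg_match(), which returns 1 on a match and 0 otherwise. The sample strings and variable names are illustrative assumptions, not part of the original notes.

```php
<?php
// Pattern parts: delimiters "/", atom "a", generic character type "\d", modifier "i".
$reg = '/a\d/i';

// preg_match() returns 1 if the pattern matches, 0 if it does not.
$hitLower = preg_match($reg, 'room a1'); // "a1" matches
$hitUpper = preg_match($reg, 'ROOM A7'); // "A7" matches because of the i modifier
$miss     = preg_match($reg, 'room ab'); // no digit follows "a"

// Other delimiters work too, for example "#".
$hitHash = preg_match('#a\d#i', 'room a1');

var_dump($hitLower, $hitUpper, $miss, $hitHash);
```

Single-quoted strings are used here so that PHP does not interpret any backslash sequence before the pattern reaches the regex engine.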
Regular expression atoms
An atom is the smallest unit of a regular expression; an expression contains at least one atom.
Atoms are all the printable and non-printing characters that are not designated as metacharacters. They fall into five categories. (Every character can act as an atom.)
- Ordinary characters as atoms: a-z A-Z 0-9 and others
- Escaped metacharacters and special characters as atoms: any punctuation that has a special meaning must be escaped before it can be used as an atom, such as \" \' \* \+ \? \. etc.
- Some non-printing characters as atoms, such as \f \n \r \t \cx
- "Generic character types" as atoms, such as \d \D \w \W \s \S
- Custom atom tables ([]) as atoms, such as '/[apj]sp/' '/[^apj]sp/'
Ordinary characters as atoms
- Printable characters: a-z A-Z 0-9 !@#$%^&*()...
- Non-printing characters: \n...
$str = "this is a ^ test ";
$reg = "/\^/";
if (preg_match($reg, $str, $arr)) {
echo "Regular expression {$reg} matched {$str} successfully!\n";
print_r($arr);
} else {
echo "Match failed";
}
Regular expression /\^/ matched this is a ^ test successfully!
Array
(
[0] => ^
)
'\': the escape character
Escaped metacharacters and special characters as atoms
- It can turn a symbol with a special meaning into a plain literal atom, e.g. /\^/
- It can turn a plain character into an atom with a special meaning, e.g. /\t/ (tab)
The characters a-z A-Z 0-9 have no special meaning on their own; some of them gain one when preceded by the escape character.
For symbols other than a-z A-Z 0-9, it is best to escape them when you want the literal character, because most special symbols have a special meaning.
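The escaping rule can be sketched as follows. PHP's preg_quote() escapes every character that is special in a pattern, which avoids escaping by hand; the sample strings and variable names are illustrative assumptions.

```php
<?php
// A literal "^" must be escaped, otherwise it acts as the start-of-string anchor.
$str = 'this is a ^ test';
preg_match('/\^/', $str, $m); // $m[0] is "^"

// preg_quote() escapes all special characters; the second argument
// also escapes the delimiter used by the pattern ("/" here).
$literal = 'a.b*c/d';
$quoted  = preg_quote($literal, '/');

// The quoted text now matches only itself.
$found    = preg_match('/' . $quoted . '/', 'xx a.b*c/d yy');
$notFound = preg_match('/' . $quoted . '/', 'xx aXbbc/d yy');

echo $quoted, "\n";
var_dump($found, $notFound);
```

Without preg_quote(), the "." and "*" in $literal would be interpreted as metacharacters, and the second subject would match as well.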
Using "generic character types"
\d | Matches any single digit |
\D | Matches any single non-digit character |
# /\d/
$str = "this is a 123 test 11";
// match all digits
$reg = "/\d/";
------------------------------------------
Regular expression /\d/ matched this is a 123 test 11 successfully!
Array
(
[0] => Array
(
[0] => 1
[1] => 2
...
)
)
# /\D/
$str = "this is a 123 test 11";
// match all non-digit characters
$reg = "/\D/";
------------------------------------------
Regular expression /\D/ matched this is a 123 test 11 successfully!
Array
(
[0] => Array
(
[0] => t
[3] => s
[4] =>
[5] => i
...
)
)
\w | Matches any word character: a-z A-Z 0-9 _ |
\W | Matches any non-word character, i.e. anything except a-z A-Z 0-9 _ |
# /\w/
$str = "!@$@%%^%$@____";
// match all word characters
$reg = "/\w/";
------------------------------------
Regular expression /\w/ matched !@$@%%^%$@____ successfully!
Array
(
[0] => Array
(
[0] => _
[1] => _
...
)
)
# /\W/
Regular expression /\W/ matched !@$@%%^%$@____ successfully!
Array
(
[0] => Array
(
[0] => !
[1] => @
[2] => $
[3] => @
...
)
)
\s | Matches any whitespace character |
\S | Matches any non-whitespace character |
# /\s/
$str = "qw
we rt";
----------
Array
(
[0] => Array
(
[0] =>
[1] =>
[2] =>
[3] =>
)
)
# /\S/
Regular expression /\S/ matched qw
we rt successfully!
Array
(
[0] => Array
(
[0] => q
[1] => w
[2] => w
[3] => e
[4] => r
[5] => t
)
)
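The generic character types can be compared side by side with preg_match_all(), which returns the number of matches and fills the third argument with them. This is a small sketch using the sample string from the notes; the variable names are assumptions.

```php
<?php
$str = "this is a 123 test 11";

// $m[0] receives every full match of the pattern.
$digits = preg_match_all('/\d/', $str, $m);

// The match array is optional when only the count is needed.
$words  = preg_match_all('/\w/', $str); // letters, digits and underscore
$spaces = preg_match_all('/\s/', $str); // whitespace
$others = preg_match_all('/\S/', $str); // everything that is not whitespace

print_r($m[0]);
echo "digits=$digits words=$words spaces=$spaces non-spaces=$others\n";
```

Because the string contains only letters, digits and spaces, the \w count and the \S count are the same here.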
Custom atom tables
Use [] to specify a custom set of atoms
# [469]
Regular expression /[469]/ matched 1243456768909 successfully!
Array
(
[0] => Array
(
[0] => 4
[1] => 4
[2] => 6
[3] => 6
[4] => 9
[5] => 9
)
)
A range can be specified with '-'
Regular expression /[1-5]/ matched 1243456768909 successfully!
Array
(
[0] => Array
(
[0] => 1
[1] => 2
[2] => 4
[3] => 3
[4] => 4
[5] => 5
)
)
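Negation with ^
Matches every character that is not in the table (here, everything except the digits 1-4)
Regular expression /[^1-4]/ matched ass2423 successfully!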
Array
(
[0] => Array
(
[0] => a
[1] => s
[2] => s
)
)
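The three forms of atom table shown above (a set, a range, and a negated table) can be reproduced with a short sketch. The subjects come from the notes, except 'index.jsp', which is an illustrative assumption for the /[apj]sp/ patterns listed earlier.

```php
<?php
$str = '1243456768909';

preg_match_all('/[469]/', $str, $set);           // only 4, 6 or 9
preg_match_all('/[1-5]/', $str, $range);         // any digit from 1 to 5
preg_match_all('/[^1-4]/', 'ass2423', $negated); // anything except 1 to 4

// One of a, p or j followed by "sp", and its negation.
$inTable    = preg_match('/[apj]sp/', 'index.jsp');
$notInTable = preg_match('/[^apj]sp/', 'index.jsp');

echo implode('', $set[0]), "\n";
echo implode('', $range[0]), "\n";
echo implode('', $negated[0]), "\n";
var_dump($inTable, $notInTable);
```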
. (dot)
Matches any single character (by default, any character except a newline)
Note
Non-printing characters commonly used in regular expressions
Atom | Meaning |
---|---|
\cx | Matches the control character specified by x; for example, \cM matches Control-M (a carriage return). x must be one of a-z or A-Z |
\f | Matches a form feed, equivalent to \x0c or \cL |
\n | Matches a newline, equivalent to \x0a or \cJ |
\r | Matches a carriage return, equivalent to \x0d or \cM |
\t | Matches a tab, equivalent to \x09 or \cI |
\v | Matches a vertical tab, equivalent to \x0b or \cK (in PCRE, \v matches any vertical whitespace character) |
… | … |
Regular expression metacharacters
Metacharacters cannot be used alone in a regular expression; they modify atoms
- '*' The preceding atom may occur zero, one, or more times
# /go*gle/
Regular expression /go*gle/ matched this gggggoogle is a test successfully!
Array
(
[0] => Array
(
[0] => google
)
)
- '+' Matches the preceding atom one or more times (the atom must occur at least once)
Regular expression /g+oogle/ matched this gggggoogle is a test successfully!
Array
(
[0] => Array
(
[0] => gggggoogle
)
)
- '?' Matches the preceding atom zero or one time (the atom cannot occur more than once)
# Regular expression /go?gle/ matched this gogle is a test successfully!
Array
(
[0] => Array
(
[0] => gogle
)
)
{n}
The preceding atom occurs exactly n times
# Regular expression /go{2}gle/ matched this google is a test successfully!
Array
(
[0] => Array
(
[0] => google
)
)
{n,}
The preceding atom occurs at least n times
# Regular expression /go{2,}gle/ matched this google is a test successfully!
Array
(
[0] => Array
(
[0] => google
)
)
{n,m}
The preceding atom occurs at least n times and at most m times
# Regular expression /go{1,3}gle/ matched this google is a test successfully!
Array
(
[0] => Array
(
[0] => google
)
)
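To compare the quantifiers directly, the sketch below runs each one against the same list of candidate strings. It anchors the patterns with ^ and $ so that the whole string must match, and uses preg_grep() to keep the array elements that match. The candidate strings and variable names are illustrative assumptions.

```php
<?php
$subjects = ['ggle', 'gogle', 'google', 'gooogle'];

$patterns = [
    '/^go*gle$/',     // zero or more "o"
    '/^go+gle$/',     // one or more "o"
    '/^go?gle$/',     // zero or one "o"
    '/^go{2}gle$/',   // exactly two "o"
    '/^go{2,}gle$/',  // at least two "o"
    '/^go{1,2}gle$/', // between one and two "o"
];

$result = [];
foreach ($patterns as $p) {
    // preg_grep() preserves keys, so array_values() reindexes the matches.
    $result[$p] = array_values(preg_grep($p, $subjects));
}

print_r($result);
```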
()
- Changes precedence
- Turns several small atoms into one large atom
- Subpatterns: the whole expression is the main pattern, and each pair of parentheses is an independent subpattern
- Backreferences
Regular expression /orac(le|my)/ matched this oracmysql is a test successfully!
Array
(
[0] => Array
(
[0] => oracmy
)
[1] => Array
(
[0] => my
)
)
Regular expression /orac(mysql)*/ matched this oracmysql is a mysql successfully!
Array
(
[0] => Array
(
[0] => oracmysql
)
[1] => Array
(
[0] => mysql
)
)
Regular expression /(http|ftp):\/\/\www(.*)?\.(com|net)/ matched this http://www.baidu.com oracmysql ftp://www.exp.net is a mysql successfully!
Array
(
[0] => Array
(
[0] => http://www.baidu.com oracmysql ftp://www.exp.net
)
[1] => Array
(
[0] => http
)
[2] => Array
(
[0] => .baidu.com oracmysql ftp://www.exp
)
[3] => Array
(
[0] => net
)
)
Regular expression /\d{4}(-|\/)\d{2}\1\d{2}/ matched this 2020-03-06 2020/03/06 successfully!
Array
(
[0] => Array
(
[0] => 2020-03-06
[1] => 2020/03/06
)
[1] => Array
(
[0] => -
[1] => /
)
)
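The subpattern and backreference behaviour can be sketched as follows. The date pattern is the one from the notes; the third date with mixed separators is an added assumption to show what \1 rejects.

```php
<?php
// (-|\/) captures the separator; \1 requires the same separator again.
$str = 'this 2020-03-06 2020/03/06 2020-03/06';
preg_match_all('/\d{4}(-|\/)\d{2}\1\d{2}/', $str, $m);
// $m[0] holds the full matches, $m[1] the captured separators.

// With preg_match(), index 0 is the full match and index 1 the first subpattern.
preg_match('/orac(le|my)/', 'this oracmysql is a test', $g);

print_r($m);
print_r($g);
```

The mixed form 2020-03/06 is not matched because the second separator differs from the one captured by the first subpattern.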
Metacharacter table
Metacharacter | Meaning |
---|---|
* | Matches the preceding atom zero, one, or more times |
+ | Matches the preceding atom one or more times |
? | Matches the preceding atom zero or one time |
\| | Alternation (the vertical bar): matches one of two or more branches |
{n} | The preceding atom occurs exactly n times |
{n,} | The preceding atom occurs at least n times |
{n,m} | The preceding atom occurs at least n times and at most m times |
^ or \A | Matches the start of the subject string (in multi-line mode, ^ also matches at the start of each line, i.e. right after a newline) |
$ or \Z | Matches the end of the subject string (in multi-line mode, $ also matches at the end of each line, i.e. right before a newline) |
\b | Matches a word boundary |
\B | Matches any position that is not a word boundary |
() | Matches the enclosed group as a single atom, i.e. a pattern unit; it can be thought of as one large atom made up of several single atoms |
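The position metacharacters (^, $, \b, \B) and alternation do not consume characters the way atoms do, which a short sketch may make clearer. The sample string and variable names are illustrative assumptions.

```php
<?php
$str = 'cat concat category cat';

// \b requires a word boundary on both sides: only the standalone "cat" words.
$wholeWord = preg_match_all('/\bcat\b/', $str);

// \B requires that there is no boundary before "cat": only the end of "concat".
$inside = preg_match_all('/\Bcat\b/', $str);

// ^ and $ anchor the match to the start and end of the subject.
$atStart = preg_match('/^cat/', $str);
$atEnd   = preg_match('/cat$/', $str);
$noStart = preg_match('/^concat/', $str);

// Alternation matches either branch.
preg_match_all('/con|gory/', $str, $alt);

var_dump($wholeWord, $inside, $atStart, $atEnd, $noStart);
print_r($alt[0]);
```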
Pattern modifiers (single characters)
- Pattern modifiers are written outside the delimiters (on the right)
Example:
"/go*gle/i"
- Each pattern modifier is a single character with a single function
- Effect:
  - Pattern modifiers adjust how the regular expression is interpreted and extend what it can do
i
: Makes the regular expression case-insensitive (the default is case-sensitive)
Regular expression /test/i matched this is a Test successfully!
Array
(
[0] => Array
(
[0] => Test
)
)
m
Treats the subject as multiple lines, so ^ and $ match at the start and end of each line (by default, a multi-line subject is treated as a single line)
Regular expression /^is/m matched this
is a Test successfully!
Array
(
[0] => Array
(
[0] => is
)
)
s
Lets . match a newline (by default, . does not match a newline)
Regular expression /Te.*st/s matched this is a Te
st successfully!
Array
(
[0] => Array
(
[0] => Te
st
)
)
x
Ignores whitespace in the pattern
Regular expression /web server/ix matched this is a WebServer successfully!
Array
(
[0] => Array
(
[0] => WebServer
)
)
U
Quantifiers such as * and + are greedy by default; U turns greedy mode off (it is not often used; the usual approach is a lazy quantifier such as .*?)
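The effect of each modifier is easiest to see by running the same pattern with and without it. The sketch below does that for m, s, x and U; the HTML-like sample string and the variable names are illustrative assumptions.

```php
<?php
$html = '<b>one</b><b>two</b>';

preg_match('/<b>.*<\/b>/', $html, $greedy);    // greedy: runs to the last </b>
preg_match('/<b>.*<\/b>/U', $html, $ungreedy); // U: stops at the first </b>
preg_match('/<b>.*?<\/b>/', $html, $lazy);     // lazy quantifier, same result as U

$multi  = "this\nis a Test";
$plain  = preg_match('/^is/', $multi);  // ^ only matches at the very start
$mMulti = preg_match('/^is/m', $multi); // m: ^ also matches after the newline

$noDotAll = preg_match('/Te.*st/', "Te\nst");  // . stops at the newline
$dotAll   = preg_match('/Te.*st/s', "Te\nst"); // s: . matches the newline

$extended = preg_match('/web server/ix', 'this is a WebServer'); // x ignores the space

echo $greedy[0], "\n", $ungreedy[0], "\n", $lazy[0], "\n";
var_dump($plain, $mMulti, $noDotAll, $dotAll, $extended);
```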
Writing regular expressions
- A regular expression is a language of its own; learning it takes an open mind
- List the requirements first
Write a regular expression to match URLs
<?php
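$str = "
This is the http://www.example.com website
This is the http://www.xxx.net/index.php website
This is the http://www.example.cn/php website
This is the http://www.demo.org/login.php?user=aaa website
This is the https://www.test.top website
This is the https://news.baidu.top website
This is the ftp://news.baidu.top website
";
// \. matches a literal dot; an unescaped . would match any character
$reg = "/(https?|ftps?):\/\/(.*?)\.(.*?)\.(com|net|org|cn|top)([\w\.\/\=\?\&]*)?/";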
preg_match_all($reg, $str, $arr);
print_r($arr);
Write a regular expression to match email addresses
<?php
$str = "
This is the [email protected] mailbox
This is the [email protected] mailbox
This is the [email protected] mailbox
This is the [email protected] mailbox
";
// [-.] allows a hyphen or a dot between the parts of the domain name
$reg = "/\w+([-+]\w+)*@\w+([-.]\w+)*\.\w+/i";
preg_match_all($reg, $str, $arr);
print_r($arr);
Using regular expressions
Splitting, matching, searching, replacing
- String handling functions (fast, but there are things they cannot do)
- Regular expression functions (powerful, but less efficient)
Note: if a string handling function can do the job, do not use a regular expression
String search:
- strstr()
- strpos()
- substr()
Regular expression matching
- preg_match()
- preg_match_all()
- preg_grep()
String splitting and joining
- explode()
- implode()
- join()
Regular expression splitting
- preg_split()
String replacement
- str_replace()
Regular expression replacement
- preg_replace()
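The rule of thumb above (prefer a string function when it is enough) can be sketched by pairing each string function with its regular expression counterpart. The sample data and variable names are illustrative assumptions.

```php
<?php
// Fixed separator: explode() is enough.
$simple = explode(',', 'a,b,c');

// Variable separators ("," or ";" with optional whitespace): preg_split().
$langs = preg_split('/\s*[,;]\s*/', 'php, perl;python ,  ruby');

// Fixed text: str_replace(). Pattern-based text: preg_replace().
$fixed  = str_replace('cat', 'dog', 'cat and cat');
$masked = preg_replace('/\d/', '#', 'tel 555 1234');

// Fixed substring position: strpos(). Filtering an array by pattern: preg_grep().
$pos    = strpos('hello world', 'world');
$startP = array_values(preg_grep('/^p/', $langs));

print_r($simple);
print_r($langs);
echo $fixed, "\n", $masked, "\n", $pos, "\n";
print_r($startP);
```

preg_grep() preserves the keys of the input array, so array_values() is used here to reindex the result.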