The role of the question mark in regular expressions

Reference https://stackoverflow.com/questions/28646475/warning-preg-match-compilation-failed-unrecognized-character-after-or

Refer to https://www.regular-expressions.info/atomic.html

The second link above is an excellent reference on atomic groups.

For some strange reason, the programmer escaped the > character, which never needs to be escaped.

Only these characters need to be escaped to obtain a literal character (outside a character class):

( ) ^ $ [ \ | . * + ?

{ # only in these cases: {n} {m,n} {m,}
# where m and n are integers
+ the pattern delimiter
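
When the text to be matched literally comes from a variable, escaping by hand is error-prone. A minimal sketch, assuming the literal text and subject strings below (they are made up for illustration), that lets PHP's preg_quote() do the escaping, including the pattern delimiter:

```php
<?php
// Hypothetical literal text that contains several metacharacters.
$literal = 'price (USD): 3.50+tax?';

// preg_quote() escapes regex metacharacters; passing '/' as the second
// argument also escapes the pattern delimiter.
$pattern = '/' . preg_quote($literal, '/') . '/';

$matchesLiteral = preg_match($pattern, 'Total price (USD): 3.50+tax? yes');
$matchesOther   = preg_match($pattern, 'price USD: 3x50');

var_dump($matchesLiteral); // int(1): the literal text is found
var_dump($matchesOther);   // int(0): the dot and plus are no longer wildcards
```

preg_quote() escapes more characters than the strictly necessary list above, which is harmless because an escaped non-alphanumeric character is always taken literally.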

Most of the time an escaped character that doesn’t need to be escaped (or that does not have a special meaning like \b \w \d …) is simply ignored by the regex engine. But it’s not the case here, because (?> is a fixed sequence to open an atomic group, and the sequence (? is not allowed except for these cases:


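An atomic group is a group that, when the regex engine exits from it, automatically throws away all backtracking positions remembered by any tokens inside the group.
/a(?>(bc|b))c/
matches abcc but not abc.
Without the atomic group, /a(bc|b)c/ matches both abc and abcc.
Why can the atomic version not match abc? Walk through the atomic pattern above: a matches a, then (?>(bc|b)) matches bc. After abc there is nothing left for the final c to match, and because the group is atomic the engine gives up at once.
A non-atomic group, by contrast, records a branch point at (bc|b): as soon as the match after bc fails,
the engine returns to that branch point and takes the second path, b.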
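
The behaviour described above can be checked directly. A small sketch using the two patterns from the text; the variable names are only illustrative:

```php
<?php
$atomic = '/a(?>(bc|b))c/'; // atomic group: no backtracking into (bc|b)
$plain  = '/a(bc|b)c/';     // ordinary group: the engine may retry with b

$results = [];
foreach (['abc', 'abcc'] as $subject) {
    $results[$subject] = [
        'atomic' => preg_match($atomic, $subject),
        'plain'  => preg_match($plain, $subject),
    ];
}

// Expected: abc  => atomic 0, plain 1
//           abcc => atomic 1, plain 1
var_export($results);
```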

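Now consider another example.

\b(?>integer|insert|in)\b and \b(?>in|integer|insert)\b
Can they match insert?

The answer: the first one can, the second one cannot. In the second pattern the alternative in matches first, the following \b fails between n and s, and the atomic group has already discarded the remaining alternatives.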
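
The effect of the alternative order can be verified with a short sketch (variable names are illustrative):

```php
<?php
$longestFirst  = '/\b(?>integer|insert|in)\b/';
$shortestFirst = '/\b(?>in|integer|insert)\b/';

// "insert": only the pattern that tries the longer words first succeeds.
$first  = preg_match($longestFirst, 'insert', $m1);
$second = preg_match($shortestFirst, 'insert', $m2);

// "in" on its own is matched by both orderings.
$firstIn  = preg_match($longestFirst, 'in');
$secondIn = preg_match($shortestFirst, 'in');

var_dump($first, $second, $firstIn, $secondIn); // int(1) int(0) int(1) int(1)
```

Inside an atomic group, it is therefore safer to list longer alternatives before their own prefixes.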
Besides the atomic group (?>…) described above, the sequence (? is allowed in cases such as:
  • an inline modifier: (?i) (?-i) (inline modifiers switch options such as case-insensitivity on or off)
  • a non-capturing group with inline modifiers: (?i:…) (?-i:…)
  • a lookaround: (?=…) (?!…) (?<=…) (?<!…)
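
The compilation failure from the referenced question can be reproduced with a tiny sketch. The pattern /(?\>abc)/ is a made-up stand-in for the original one; the @ operator only silences the warning so that the return value can be inspected:

```php
<?php
// The escaped ">" breaks the fixed "(?>" sequence, so the pattern
// does not compile and preg_match() returns false.
$broken = @preg_match('/(?\>abc)/', 'abc');

// Without the backslash the atomic group compiles and matches.
$fixed = preg_match('/(?>abc)/', 'abc');

var_dump($broken); // bool(false)
var_dump($fixed);  // int(1)
```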
# The following example illustrates the use of inline modifiers
$pattern = '/(?i)caseless(?-i)cased(?i)caseless/';
# equivalent to
$pattern2 = '/(?i)caseless(?-i:cased)caseless/';


$str = "caselessCaseDcaselesS";
$c = preg_match_all('/(?i)caseless(?-i:Cased)caseless/', $str, $matches);
var_dump($c, $matches); # $c is 0: (?-i:Cased) is case-sensitive, and the subject contains CaseD

The examples above are only for understanding; in practice, the constructs listed in the table below, plus a few pattern modifiers, are the ones that are really common.

An inline modifier that applies to the whole pattern can be replaced by a trailing flag such as /i, /m or /x.
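
As a rough illustration (the subject strings are made up), the sketch below compares a leading (?i) with the trailing /i flag, and shows the case that a trailing flag cannot express on its own: switching the option off for part of the pattern.

```php
<?php
// Whole-pattern case-insensitivity: inline modifier vs. trailing flag.
$inline   = preg_match('/(?i)caseless/', 'CaseLess');
$trailing = preg_match('/caseless/i', 'CaseLess');

// Scoped modifier: only "Cased" must match with exact case.
$scopedOk  = preg_match('/(?i)caseless(?-i:Cased)/', 'CASELESSCased');
$scopedBad = preg_match('/(?i)caseless(?-i:Cased)/', 'CASELESSCASED');

var_dump($inline, $trailing, $scopedOk, $scopedBad); // int(1) int(1) int(1) int(0)
```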

/(?i)caseless(?-i:Cased)caseless/

inline

Table 4. Common grouping syntax

| Category | Syntax | Description |
|---|---|---|
| Capturing | (exp) | Match exp and capture the text into an automatically numbered group |
| | (?<name>exp) | Match exp and capture the text into the group named name; can also be written as (?'name'exp) |
| | (?:exp) | Match exp without capturing the matched text and without assigning a group number |
| Zero-width assertion | (?=exp) | Matches the position before exp |
| | (?<=exp) | Matches the position after exp |
| | (?!exp) | Matches a position that is not followed by exp |
| | (?<!exp) | Matches a position that is not preceded by exp |
| Comment | (?#comment) | Has no effect on how the regular expression is processed; provides a comment for human readers |
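
The capturing rows of the table can be tried out with a short sketch. The date string and the group name day are assumptions chosen purely for illustration:

```php
<?php
$subject = '2024-05-17';

// (\d{4})       numbered capturing group -> $m[1]
// (?:\d{2})     non-capturing group, gets no number
// (?#...)       comment, ignored by the engine
// (?<day>\d{2}) named group -> $m['day'] (PHP also exposes it as $m[2])
$found = preg_match('/(\d{4})-(?:\d{2})(?#month is not captured)-(?<day>\d{2})/', $subject, $m);

var_dump($found);    // int(1)
var_dump($m[1]);     // string(4) "2024"
var_dump($m['day']); // string(2) "17"
```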
# Verifying the patterns

# (?=HT) is the position that is followed by HT
$str = "demoHTdmzns";
preg_match_all('/.*m.*?(?=HT)/', $str, $matches);
var_dump($matches); # captures demo

# (?<=HT) is the position that is preceded by HT

$str = "demoHTdmzns";
preg_match_all('/(?<=HT).*n/', $str, $matches);
var_dump($matches); # matches dmzn


# (?!HT) is a position that is not followed by HT
$str = "demoHTdmzns";
preg_match_all('/.*?m(?!HT)/', $str, $matches);
var_dump($matches);
# The result is:
array(1) {
  [0]=>
  array(2) {
    [0]=>
    string(3) "dem"
    [1]=>
    string(5) "oHTdm"
  }
}

If the string is instead:

$str = "demHTdmzns";
preg_match_all('/.*?m(?!HT)/', $str, $matches);
var_dump($matches); # matches demHTdm
# The result is:
array(1) {
  [0]=>
  array(1) {
    [0]=>
    string(7) "demHTdm"
  }
}

# (?<!HT) is a position that is not preceded by HT

$str = "deHTdmzns";
preg_match_all('/(?<!HT)m.*/', $str, $matches);
var_dump($matches); # matches mzns

If the pattern is instead:
$str = "deHTdmzns";
preg_match_all('/(?<!HT)dm.*/', $str, $matches);
var_dump($matches); # no match: the only dm is preceded by HT
