Regular expressions
I. Overview
Uses: validating input
Uses: web crawlers.
The concept:
a string written in a specific syntax that describes a pattern of text.
Function families
PCRE
1, syntax compatible with Perl regular expressions (similar syntax is used in Java and C).
2, fast and efficient.
POSIX
1, less efficient.
2, a security risk.
3, cannot run on Windows.
II. Applying regular expressions
preg_match_all(string $pattern, string $subject, array &$matches)
Function: matches a regular expression against a string
parameters:
pattern: the regular expression
subject: the string to match against
matches: receives the results of the match
return: the number of matches found.
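A minimal sketch of the call described above. The sample string is made up for illustration; only preg_match_all itself comes from the notes.

```php
<?php
// Sample data chosen for illustration only.
$subject = 'Order 66 shipped on day 7 of 2024';

// \d+ matches one or more consecutive digits.
$count = preg_match_all('/\d+/', $subject, $matches);

// $count is the number of full matches; $matches[0] lists them.
echo $count, "\n";                    // 3
echo implode(',', $matches[0]), "\n"; // 66,7,2024
```

The third argument is filled in by the function, so it does not need to be initialised beforehand.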
Syntax:
- Delimiters
- Atoms
- Metacharacters
- Pattern modifiers
- Delimiters
A regular expression must be enclosed in delimiters. Any character other than a letter, digit, backslash, or whitespace can be used as a delimiter. The most commonly used delimiter in practice is the forward slash "/". Delimiters appear in pairs.
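As a rough illustration of why the choice of delimiter matters, the sketch below matches the same made-up URL twice, once with "/" and once with "#" as the delimiter. The URL and variable names are assumptions for the example.

```php
<?php
$url = 'https://example.com/docs/regex';

// With "/" as the delimiter, every literal slash must be escaped.
$withSlash = preg_match('/^https:\/\/example\.com\/docs\//', $url);

// With "#" as the delimiter, slashes need no escaping.
$withHash = preg_match('#^https://example\.com/docs/#', $url);

var_dump($withSlash === 1, $withHash === 1);
```

Both patterns match the same text; the second is simply easier to read when the subject contains many slashes.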
- Atoms
An atom is the smallest unit of a regular expression. To be meaningful, a regular expression must contain at least one atom.
a, digits, letters, underscores, and all other printable characters are atoms.
b, non-printing characters.
\n newline
c, characters that must be escaped: characters with a special meaning, for example metacharacters.
d, atoms with a special meaning
\d any digit
\D any non-digit
\w any digit, letter, or underscore
\W any character that is not a digit, letter, or underscore
\s any whitespace character
\S any non-whitespace character
e, custom atom tables
[] lists the atoms to match, optionally as ranges, for example: a-z matches all lowercase letters
A-Z matches all uppercase letters
0-9 matches all digits
Several ranges may be given at once, with no separator between them.
For example: [a-z0-9]
A ^ at the start of a custom atom table negates it.
f, . matches any single atom (by default, any character except a newline).
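The atoms listed above can be tried out with a short sketch like the following. The helper all_matches and the sample text are hypothetical and exist only for this illustration.

```php
<?php
// Hypothetical helper: returns every full match of $pattern in $subject.
function all_matches(string $pattern, string $subject): array
{
    preg_match_all($pattern, $subject, $m);
    return $m[0];
}

$text = 'Room_7 costs 42 USD';

$digits   = all_matches('/\d/', $text);           // single digits
$words    = all_matches('/\w+/', $text);          // runs of letters, digits, underscores
$spaces   = all_matches('/\s/', $text);           // whitespace characters
$lower    = all_matches('/[a-z]+/', $text);       // runs of lowercase letters
$notAlnum = all_matches('/[^a-zA-Z0-9]/', $text); // negated custom atom table
$dot      = all_matches('/c.s/', $text);          // "." stands for any one character

echo implode(',', $digits), "\n"; // 7,4,2
echo implode(',', $words), "\n";  // Room_7,costs,42,USD
echo implode(',', $lower), "\n";  // oom,costs
echo implode(',', $dot), "\n";    // cos
```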
- Metacharacters
Metacharacters modify the atoms they are applied to.
* the modified atom may appear zero or more times.
+ the modified atom may appear one or more times.
? the modified atom may appear zero times or once.
{m} the modified atom must appear exactly m times.
{n,m} the modified atom must appear at least n and at most m times.
n <= number of occurrences <= m
{n,} the modified atom must appear at least n times, with no upper limit.
n <= number of occurrences
| alternation (or)
^ and \A match the start of the string
$ and \Z match the end of the string
()
- subpatterns
- change the scope of a match (grouping)
- back-references
\b and \B: \b matches a word boundary, \B matches a non-word boundary
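The metacharacters above can be checked with a small sketch such as this one. All sample strings and variable names are assumptions chosen for illustration.

```php
<?php
// Quantifiers: each one modifies the atom directly before it.
$star  = preg_match('/^ab*c$/', 'ac');       // b* allows zero b's
$plus  = preg_match('/^ab+c$/', 'ac');       // b+ needs at least one b
$opt   = preg_match('/^colou?r$/', 'color'); // u? allows zero or one u
$exact = preg_match('/^\d{4}$/', '2024');    // exactly four digits
$range = preg_match('/^\d{2,3}$/', '2024');  // two or three digits only
$min   = preg_match('/^\d{2,}$/', '2024');   // two or more digits

// Parentheses limit the scope of "|" to the group.
$alt = preg_match('/^(cat|dog)s$/', 'dogs');

// The back-reference \1 repeats whatever group 1 captured.
$back = preg_match('/\b(\w+) \1\b/', 'it is is fine', $m);

// \b only matches at a word boundary, so "cat" inside "concat" is skipped.
$boundary = preg_match_all('/\bcat\b/', 'cat concat cat', $hits);

echo "$star $plus $opt $exact $range $min $alt $back $boundary\n"; // 1 0 1 1 0 1 1 1 2
echo $m[1], "\n"; // is
```

preg_match returns 1 when the pattern matches and 0 when it does not, which is why the failed cases print 0.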
- Pattern modifiers
Pattern modifiers change how the regular expression behaves. They are written outside the delimiters, after the closing one.
i: makes the regular expression case-insensitive.
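A short sketch of the i modifier, using a made-up subject string:

```php
<?php
$subject = 'PHP, php, Php';

// Without a modifier the match is case-sensitive.
$sensitive = preg_match_all('/php/', $subject);

// The "i" after the closing delimiter makes it case-insensitive.
$insensitive = preg_match_all('/php/i', $subject);

echo $sensitive, ' ', $insensitive, "\n"; // 1 3
```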
Figure: how email validation with a regular expression works
Figure: how registration-form validation with a regular expression works
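As one possible illustration of email validation, the sketch below combines delimiters, atoms, metacharacters, and anchors from these notes. The helper name looks_like_email and the pattern are assumptions; the pattern is deliberately simplified and does not accept every valid address.

```php
<?php
// Hypothetical helper; the pattern is a simplified assumption,
// not a complete definition of a valid email address.
function looks_like_email(string $value): bool
{
    // Local part, "@", then a domain containing at least one dot.
    return preg_match('/^[\w.+-]+@[a-zA-Z0-9-]+(\.[a-zA-Z0-9-]+)+$/', $value) === 1;
}

var_dump(looks_like_email('user@example.com')); // bool(true)
var_dump(looks_like_email('user@example'));     // bool(false)
```

The ^ and $ anchors matter here: without them the pattern would accept any string that merely contains something shaped like an address.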