Regular expressions are a notation for describing sets of character strings. When a particular string is in the set described by a regular expression, we often say that the regular expression matches the string.
The simplest regular expression is a single literal character. Except for the metacharacters like *+?()|, characters match themselves. To match a metacharacter, escape it with a backslash: \+ matches a literal plus character.
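As a small illustration of literal characters and escaping, the sketch below uses Python's built-in `re` module as a stand-in. That module is not RE2; the assumption here is only that the two agree on this basic behavior.

```python
import re

# Assumption: Python's re agrees with RE2 for plain literals and escapes.

# An unescaped + is a repetition operator: one or more "a".
plus_as_operator = re.fullmatch(r"a+", "aaa")

# An escaped \+ is a literal plus sign following "a".
plus_as_literal = re.fullmatch(r"a\+", "a+")

# With the escape in place, repeated "a" no longer matches.
no_match = re.fullmatch(r"a\+", "aaa")
```

The same backslash rule applies to the other metacharacters, for example `\(` or `\|`.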
Two regular expressions can be alternated or concatenated to form a new regular expression: if e1 matches s and e2 matches t, then e1|e2 matches s or t, and e1e2 matches st.
The metacharacters *, +, and ? are repetition operators: e1* matches a sequence of zero or more (possibly different) strings, each of which matches e1; e1+ matches one or more; e1? matches zero or one.
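The three ways of combining expressions can be tried out directly. This is a minimal sketch using Python's `re` module as a stand-in for RE2 (an assumption; the helper name `matches` is made up for the example). Parentheses are used only to group the sub-expression being repeated.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """Return True if the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None

# Alternation: a|b matches "a" or "b", but not both in sequence.
alternation = [matches(r"a|b", s) for s in ("a", "b", "ab")]

# Concatenation: ab matches "a" followed by "b".
concatenation = [matches(r"ab", s) for s in ("ab", "a", "ba")]

# Repetition: each repeat may match a different string ("a" or "b").
star = [matches(r"(a|b)*", s) for s in ("", "abba", "abc")]
plus = [matches(r"(a|b)+", s) for s in ("", "a", "ba")]
optional = [matches(r"(a|b)?", s) for s in ("", "b", "ab")]
```

Note that `(a|b)*` accepts "abba": the repeated sub-expression is matched afresh on every iteration, which is what "possibly different" refers to.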
The operator precedence, from weakest to strongest binding, is first alternation, then concatenation, and finally the repetition operators. Explicit parentheses can be used to force different meanings, just as in arithmetic expressions. Some examples: ab|cd is equivalent to (ab)|(cd); ab* is equivalent to a(b*).
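The precedence rules can be checked by comparing which strings two patterns accept. The sketch below assumes Python's `re` module behaves like RE2 for these operators; `same_language` and the sample list are hypothetical helpers, and agreement on a handful of samples is evidence rather than proof of equivalence.

```python
import re

def same_language(p1: str, p2: str, samples: list) -> bool:
    """Check that two patterns accept exactly the same sample strings."""
    return all(
        (re.fullmatch(p1, s) is None) == (re.fullmatch(p2, s) is None)
        for s in samples
    )

samples = ["", "a", "ab", "abb", "abab", "cd", "abcd", "acd", "abd"]

# Concatenation binds more tightly than alternation.
alt_default = same_language(r"ab|cd", r"(ab)|(cd)", samples)
alt_forced = same_language(r"ab|cd", r"a(b|c)d", samples)

# Repetition binds more tightly than concatenation.
star_default = same_language(r"ab*", r"a(b*)", samples)
star_forced = same_language(r"ab*", r"(ab)*", samples)
```

The `*_forced` comparisons come out different because the parentheses there override the default grouping.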
The syntax described so far is most of the traditional Unix egrep regular expression syntax. This subset suffices to describe all regular languages: loosely speaking, a regular language is a set of strings that can be matched in a single pass through the text using only a fixed amount of memory. Newer regular expression facilities (notably Perl and those that have copied it) have added many new operators and escape sequences, which make the regular expressions more concise, and sometimes more cryptic, but usually not more powerful.
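To make "a single pass with a fixed amount of memory" concrete, here is a hand-built finite automaton for the pattern `ab*|cd`. It is only an illustrative sketch: the transition table and the name `dfa_fullmatch` are invented for this example and say nothing about how RE2 is implemented internally. Python's `re` is used purely as a cross-check.

```python
import re

# Transition table for ab*|cd: (current state, input character) -> next state.
TRANSITIONS = {
    (0, "a"): 1,  # start, saw "a"
    (1, "b"): 1,  # any number of "b" after "a"
    (0, "c"): 2,  # start, saw "c"
    (2, "d"): 3,  # "c" followed by "d"
}
ACCEPTING = {1, 3}

def dfa_fullmatch(text: str) -> bool:
    """Scan text once, keeping only the current state number."""
    state = 0
    for ch in text:
        state = TRANSITIONS.get((state, ch))
        if state is None:  # no transition: the text cannot match
            return False
    return state in ACCEPTING

samples = ["", "a", "ab", "abbb", "cd", "c", "abcd", "ba"]
agrees_with_re = all(
    dfa_fullmatch(s) == (re.fullmatch(r"ab*|cd", s) is not None)
    for s in samples
)
```

The only memory the loop carries from one character to the next is the integer `state`, regardless of how long the input is.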
This page lists the regular expression syntax accepted by RE2. Note that this syntax is a subset of that accepted by PCRE, roughly speaking, and with various caveats.
It also lists some syntax accepted by PCRE, PERL, and VIM.
kinds of single-character expressions |
examples |
any character, possibly including newline (s=true) |
. |
character class |
[xyz] |
negated character class |
[^xyz] |
Perl character class
|
\d |
negated Perl character class |
\D |
ASCII character class
|
[[:alpha:]] |
negated ASCII character class |
[[:^alpha:]] |
Unicode character class (one-letter name) |
\pN |
Unicode character class |
\p{Greek} |
negated Unicode character class (one-letter name) |
\PN |
negated Unicode character class |
\P{Greek} |
|
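A few of the single-character forms above, sketched with Python's `re` module as a stand-in for RE2 (an assumption). The `re.ASCII` flag is passed so that `\d` is limited to ASCII digits in this sketch. The `[[:alpha:]]` and `\p{Greek}` forms are left out because Python's `re` does not accept them.

```python
import re

# . skips the newline unless the s flag is set.
dot_default = re.findall(r".", "a\nb")
dot_with_s = re.findall(r"(?s).", "a\nb")

# Character class and negated character class.
in_class = re.findall(r"[xyz]", "xayb")
not_in_class = re.findall(r"[^xyz]", "xayb")

# Perl character class and its negation.
digits = re.findall(r"\d", "a1b2", flags=re.ASCII)
non_digits = re.findall(r"\D", "a1b2", flags=re.ASCII)
```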
Composites |
xy |
x followed by y
|
x|y |
x or y (prefer x ) |
|
Repetitions |
x* |
zero or more x , prefer more |
x+ |
one or more x , prefer more |
x? |
zero or one x , prefer one |
x{n,m} |
n or n +1 or … or m x , prefer more |
x{n,} |
n or more x , prefer more |
x{n} |
exactly n x
|
x*? |
zero or more x , prefer fewer |
x+? |
one or more x , prefer fewer |
x?? |
zero or one x , prefer zero |
x{n,m}? |
n or n +1 or … or m x , prefer fewer |
x{n,}? |
n or more x , prefer fewer |
x{n}? |
exactly n x
|
x{} |
(≡ x* ) (NOT SUPPORTED) VIM
|
x{-} |
(≡ x*? ) (NOT SUPPORTED) VIM
|
x{-n} |
(≡ x{n}? ) (NOT SUPPORTED) VIM
|
x= |
(≡ x? ) (NOT SUPPORTED) VIM
|
Implementation restriction: The counting forms x{n,m}, x{n,}, and x{n} reject forms that create a minimum or maximum repetition count above 1000. Unlimited repetitions are not subject to this restriction.
|
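The "prefer more", "prefer fewer", and "prefer x" notes in the tables above describe which match is chosen when several are possible. The sketch below shows the effect using Python's `re` module, assumed here to make the same choices as RE2 for these forms. The 1000-count restriction is specific to RE2 and is not demonstrated.

```python
import re

html = "<b>bold</b>"
greedy = re.match(r"<.*>", html).group()   # x*  : prefer more
lazy = re.match(r"<.*?>", html).group()    # x*? : prefer fewer

counted_greedy = re.match(r"a{2,4}", "aaaaa").group()
counted_lazy = re.match(r"a{2,4}?", "aaaaa").group()
exact = re.match(r"a{3}", "aaaaa").group()

optional_greedy = re.match(r"ab?", "ab").group()   # x?  : prefer one
optional_lazy = re.match(r"ab??", "ab").group()    # x?? : prefer zero

# Alternation prefers the left branch when both could match.
first_alternative = re.match(r"a|ab", "ab").group()
```

The preferences only change which match is reported; they do not change whether a match exists.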
Possessive repetitions |
x*+ |
zero or more x , possessive (NOT SUPPORTED)
|
x++ |
one or more x , possessive (NOT SUPPORTED)
|
x?+ |
zero or one x , possessive (NOT SUPPORTED)
|
x{n,m}+ |
n or … or m x , possessive (NOT SUPPORTED)
|
x{n,}+ |
n or more x , possessive (NOT SUPPORTED)
|
x{n}+ |
exactly n x , possessive (NOT SUPPORTED)
|
|
Grouping |
(re) |
numbered capturing group (submatch) |
(?P<name>re) |
named & numbered capturing group (submatch) |
(?<name>re) |
named & numbered capturing group (submatch) (NOT SUPPORTED)
|
(?'name're) |
named & numbered capturing group (submatch) (NOT SUPPORTED)
|
(?:re) |
non-capturing group |
(?flags) |
set flags within current group; non-capturing |
(?flags:re) |
set flags during re; non-capturing |
(?#text) |
comment (NOT SUPPORTED)
|
(?|x|y|z) |
branch numbering reset (NOT SUPPORTED)
|
(?>re) |
possessive match of re (NOT SUPPORTED)
|
re@> |
possessive match of re (NOT SUPPORTED) VIM
|
%(re) |
non-capturing group (NOT SUPPORTED) VIM
|
|
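The supported grouping forms can be combined in one pattern. This is a small sketch with Python's `re` module, which is assumed to treat numbered groups, `(?P<name>re)` groups, and `(?:re)` groups the same way as RE2; the date pattern and the group name `month` are made up for the example.

```python
import re

m = re.fullmatch(r"(\d{4})-(?P<month>\d{2})-(?:\d{2})", "2024-05-17")

numbered = m.group(1)         # first capturing group
named = m.group("month")      # named group, looked up by name
also_numbered = m.group(2)    # the named group is numbered as well
group_count = m.re.groups     # the (?:...) group is not counted
```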
Flags |
i |
case-insensitive (default false) |
m |
multi-line mode: ^ and $ match begin/end line in addition to begin/end text (default false) |
s |
let . match \n (default false) |
U |
ungreedy: swap meaning of x* and x*? , x+ and x+? , etc (default false) |
Flag syntax is xyz (set) or -xyz (clear) or xy-z (set xy, clear z).
|
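The flag forms can be sketched with Python's `re` module, under the assumption that it matches RE2 for the i, m, and s flags. Two differences are worth flagging: Python has no U (ungreedy) flag, and recent Python versions only accept the bare `(?flags)` form at the very start of a pattern, so the examples keep it there.

```python
import re

# (?flags) at the start of the pattern.
case_insensitive = re.fullmatch(r"(?i)hello", "HeLLo") is not None
dot_newline = re.fullmatch(r"(?s)a.b", "a\nb") is not None
dot_default = re.fullmatch(r"a.b", "a\nb") is not None
multi_line = re.findall(r"(?m)^\w+$", "one\ntwo")

# (?flags:re) applies the flag only while matching re.
scoped = [re.fullmatch(r"(?i:h)ello", s) is not None
          for s in ("Hello", "HELLO")]

# A leading minus clears a flag: i is set for "a" and cleared for "b".
set_then_clear = [re.fullmatch(r"(?i:a(?-i:b))", s) is not None
                  for s in ("Ab", "AB")]
```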
Empty strings |
^ |
at beginning of text or line (m =true) |
$ |
at end of text (like \z not \Z ) or line (m =true) |
\A |
at beginning of text |
\b |
at ASCII word boundary (\w on one side and \W , \A , or \z on the other) |
\B |
not at ASCII word boundary |
\g |
at beginning of subtext being searched (NOT SUPPORTED) PCRE
|
\G |
at end of last match (NOT SUPPORTED) PERL
|
\Z |
at end of text, or before newline at end of text (NOT SUPPORTED)
|
\z |
at end of text |
(?=re) |
before text matching re (NOT SUPPORTED)
|
(?!re) |
before text not matching re (NOT SUPPORTED)
|
(?<=re) |
after text matching re (NOT SUPPORTED)
|
(?<!re) |
after text not matching re (NOT SUPPORTED)
|
re& |
before text matching re (NOT SUPPORTED) VIM
|
re@= |
before text matching re (NOT SUPPORTED) VIM
|
re@! |
before text not matching re (NOT SUPPORTED) VIM
|
re@<= |
after text matching re (NOT SUPPORTED) VIM
|
re@<! |
after text not matching re (NOT SUPPORTED) VIM
|
\zs |
sets start of match (= \K) (NOT SUPPORTED) VIM
|
\ze |
sets end of match (NOT SUPPORTED) VIM
|
\%^ |
beginning of file (NOT SUPPORTED) VIM
|
\%$ |
end of file (NOT SUPPORTED) VIM
|
\%V |
on screen (NOT SUPPORTED) VIM
|
\%# |
cursor position (NOT SUPPORTED) VIM
|
\%'m |
mark m position (NOT SUPPORTED) VIM
|
\%23l |
in line 23 (NOT SUPPORTED) VIM
|
\%23c |
in column 23 (NOT SUPPORTED) VIM
|
\%23v |
in virtual column 23 (NOT SUPPORTED) VIM
|
|
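A short sketch of the supported empty-string assertions, again using Python's `re` module as an assumed stand-in. `re.ASCII` is passed so that `\b` uses ASCII word characters, as the table describes. `$` and `\z` are deliberately left out, because Python's handling of end of text differs from the behavior listed above.

```python
import re

text = "cat concat cat."

# \b requires a word boundary on that side; \B requires its absence.
whole_words = re.findall(r"\bcat\b", text, flags=re.ASCII)
word_endings = re.findall(r"\Bcat\b", text, flags=re.ASCII)

lines = "one\ntwo"

# ^ matches only at the beginning of text unless m is set.
start_default = re.findall(r"^\w+", lines)
start_multiline = re.findall(r"(?m)^\w+", lines)

# \A always means beginning of text, even with m set.
start_of_text = re.findall(r"(?m)\A\w+", lines)
```

`word_endings` picks up only the "cat" inside "concat", since that is the one occurrence not preceded by a word boundary.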
Escape sequences |
\a |
bell (≡ \007 ) |
\f |
form feed (≡ \014 ) |
\t |
horizontal tab (≡ \011 ) |
\n |
newline (≡ \012 ) |
\r |
carriage return (≡ \015 ) |
\v |
vertical tab character (≡ \013 ) |
\* |
literal * , for any punctuation character *
|
\123 |
octal character code (up to three digits) |
\x7F |
hex character code (exactly two digits) |
\x{10FFFF} |
hex character code |
\C |
match a single byte even in UTF-8 mode |
\Q...\E |
literal text ... even if ... has punctuation |
\1 |
backreference (NOT SUPPORTED)
|
\b |
backspace (NOT SUPPORTED) (use \010 ) |
\cK |
control char ^K (NOT SUPPORTED) (use \001 etc) |
\e |
escape (NOT SUPPORTED) (use \033 ) |
\g1 |
backreference (NOT SUPPORTED)
|
\g{1} |
backreference (NOT SUPPORTED)
|
\g{+1} |
backreference (NOT SUPPORTED)
|
\g{-1} |
backreference (NOT SUPPORTED)
|
\g{name} |
named backreference (NOT SUPPORTED)
|
\g<name> |
subroutine call (NOT SUPPORTED)
|
\g'name' |
subroutine call (NOT SUPPORTED)
|
\k<name> |
named backreference (NOT SUPPORTED)
|
\k'name' |
named backreference (NOT SUPPORTED)
|
\lX |
lowercase X (NOT SUPPORTED)
|
\ux |
uppercase x (NOT SUPPORTED)
|
\L...\E |
lowercase text ... (NOT SUPPORTED)
|
\K |
reset beginning of $0 (NOT SUPPORTED)
|
\N{name} |
named Unicode character (NOT SUPPORTED)
|
\R |
line break (NOT SUPPORTED)
|
\U...\E |
upper case text ... (NOT SUPPORTED)
|
\X |
extended Unicode sequence (NOT SUPPORTED)
|
\%d123 |
decimal character 123 (NOT SUPPORTED) VIM
|
\%xFF |
hex character FF (NOT SUPPORTED) VIM
|
\%o123 |
octal character 123 (NOT SUPPORTED) VIM
|
\%u1234 |
Unicode character 0x1234 (NOT SUPPORTED) VIM
|
\%U12345678 |
Unicode character 0x12345678 (NOT SUPPORTED) VIM
|
|
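Several of the supported escape sequences are different spellings of the same character. The sketch below checks a few of them with Python's `re` module, assumed to agree with RE2 on these particular escapes. Python has no `\Q...\E`, so `re.escape` is used as a substitute to get the same "treat this text literally" effect; `\x{10FFFF}` and `\C` are not shown because Python's `re` does not accept them.

```python
import re

# Three spellings of horizontal tab: named, octal, and two-digit hex.
tab_forms = [re.fullmatch(p, "\t") is not None
             for p in (r"\t", r"\011", r"\x09")]

# Octal 123 is decimal 83, the letter "S".
octal_s = re.fullmatch(r"\123", "S") is not None

# Hex 7F is the DEL character; \a is the bell character (octal 007).
hex_del = re.fullmatch(r"\x7F", "\x7f") is not None
bell = re.fullmatch(r"\a", "\x07") is not None

# A backslash before punctuation makes it literal.
literal_star = re.fullmatch(r"a\*b", "a*b") is not None

# Substitute for \Q...\E: escape every special character in a string.
quoted = re.fullmatch(re.escape("a.b*c"), "a.b*c") is not None
quoted_is_literal = re.fullmatch(re.escape("a.b*c"), "axbc") is None
```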
Character class elements |
x |
single character |
A-Z |
character range (inclusive) |
\d |
Perl character class |
[:foo:] |
ASCII character class foo
|
\p{Foo} |
Unicode character class Foo
|
\pF |
Unicode character class F (one-letter name) |
|
Named character classes as character class elements |
[\d] |
digits (≡ \d ) |
[^\d] |
not digits (≡ \D ) |
[\D] |
not digits (≡ \D ) |
[^\D] |
not not digits (≡ \d ) |
[[:name:]] |
named ASCII class inside character class (≡ [:name:] ) |
[^[:name:]] |
named ASCII class inside negated character class (≡ [:^name:] ) |
[\p{Name}] |
named Unicode property inside character class (≡ \p{Name} ) |
[^\p{Name}] |
named Unicode property inside negated character class (≡ \P{Name} ) |
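The equivalences in the last table can be spot-checked by comparing what each pair of patterns finds in a sample string. This sketch uses Python's `re` module with `re.ASCII` as an assumed stand-in for RE2; the helper `same` and the sample string are invented for the example. The `[:name:]` and `\p{Name}` rows are skipped because Python's `re` does not accept them.

```python
import re

sample = "a1_B2-z9 Z"

def same(p1: str, p2: str) -> bool:
    """True if both patterns find the same characters in sample."""
    return (re.findall(p1, sample, flags=re.ASCII)
            == re.findall(p2, sample, flags=re.ASCII))

digit_in_class = same(r"[\d]", r"\d")
negated_class = same(r"[^\d]", r"\D")
negated_escape = same(r"[\D]", r"\D")
double_negation = same(r"[^\D]", r"\d")

# Elements can be mixed: a range, a Perl class, and a single character.
mixed = re.findall(r"[A-Z\d_]", sample, flags=re.ASCII)
```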