Metacharacters or Regular Expression Patterns

The characters + ? * ^ $ ( ) { } [ ] | . and \ are control characters (metacharacters); every other character simply matches itself. If you want a control character to match literally, put a backslash before it.
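
To make the escaping rule concrete, here is a minimal Python sketch using the standard re module. The sample strings and variable names are made up for illustration.

```python
import re

price = "Total: $5.00 (approx.)"

# Unescaped, "." matches any character, so the pattern also matches "5x00".
dot_unescaped = re.search(r"5.00", "5x00") is not None
print(dot_unescaped)    # True

# Escaped with a backslash, "\." matches only a literal dot.
dot_escaped = re.search(r"5\.00", "5x00") is not None
print(dot_escaped)      # False

# "$" must be escaped too, otherwise it means "end of string".
dollar_escaped = re.search(r"\$5\.00", price) is not None
print(dollar_escaped)   # True

# re.escape() adds the backslashes for you, which helps with user input.
literal = re.escape("(approx.)")
print(re.search(literal, price) is not None)   # True
```

re.escape() is usually the safer choice when the text to match comes from outside your program, since you do not have to remember every control character yourself.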

You can also check the following:

  • Regular Expression
  • Table of patterns/metacharacters and their descriptions:

    Pattern       Description
    ^             Matches at the beginning of the string, e.g. "^.ug" matches jug, bug or pug when it appears at the start of the string.
    $             Matches at the end of the string, e.g. ".ug$" matches pug, bug or jug when it appears at the end.
    .             Matches any character except the newline character.
    [...]         Matches any one character listed in the brackets, e.g. "[ch]at" matches cat and hat but not mat.
    [^...]        Matches any one character not listed in the brackets, e.g. "[^c]at" matches mat and hat but not cat.
    ()            Contains a subexpression.
    re*           Matches 0 or more occurrences of the preceding expression, e.g. "c.*" matches any string that begins with c, such as cat or cow.
    re+           Matches 1 or more occurrences of the preceding expression.
    re?           Matches 0 or 1 occurrence of the preceding expression.
    re{n}         Matches exactly n occurrences of the preceding expression.
    re{n,}        Matches n or more occurrences of the preceding expression.
    re{n,m}       Matches at least n and at most m occurrences of the preceding expression.
    a | b         Matches either a or b.
    (re)          Groups regular expressions and remembers the matched text.
    (?imx)        Temporarily toggles ON the i, m or x options within the expression. If in parentheses, only that area is affected.
    (?-imx)       Temporarily toggles OFF the i, m or x options within the expression. If in parentheses, only that area is affected.
    (?: re)       Groups regular expressions without remembering the matched text.
    (?imx: re)    Temporarily toggles ON the i, m or x options within the parentheses.
    (?-imx: re)   Temporarily toggles OFF the i, m or x options within the parentheses.
    (?#...)       Comment.
    (?= re)       Lookahead: requires that re matches at this position, without consuming any characters.
    (?! re)       Negative lookahead: requires that re does not match at this position, without consuming any characters.
    (?> re)       Matches an independent pattern without backtracking.
    \w            Matches a word character.
    \W            Matches a non-word character.
    \s            Matches whitespace, such as space, \t, \n, \r and \f.
    \S            Matches non-whitespace.
    \d            Matches digits. Equivalent to [0-9].
    \D            Matches non-digits.
    \A            Matches the beginning of the string.
    \Z            Matches the end of the string. If a newline exists at the end, it matches just before the newline.
    \z            Matches the end of the string.
    \G            Matches the point where the last match finished.
    \b            Matches word boundaries when outside brackets. Matches backspace when inside brackets.
    \B            Matches non-word boundaries.
    \n, \t, \r    Match newlines, tabs, carriage returns, etc.
    \1...\9       Matches the nth grouped subexpression.
    \10           Matches the nth grouped subexpression if it has already been matched. Otherwise refers to the octal representation of a character code.

    Note: this table follows Perl-style syntax. In Python's re module, \Z matches only at the very end of the string, and \G is not supported.
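
A few of the patterns from the table can be tried out with Python's re module. The following is only a small sketch: the word list, sample sentences and variable names are invented for this example, and it covers only a handful of the entries.

```python
import re

words = ["jug", "bug", "pug", "mug shot", "cat", "hat", "mat"]

# ^ anchors at the start, $ at the end, . matches any single character.
ug_words = [w for w in words if re.search(r"^.ug$", w)]
print(ug_words)        # ['jug', 'bug', 'pug']

# [ch]at matches cat and hat; [^c]at matches hat and mat but not cat.
ch_at = [w for w in words if re.search(r"^[ch]at$", w)]
not_c_at = [w for w in words if re.search(r"^[^c]at$", w)]
print(ch_at)           # ['cat', 'hat']
print(not_c_at)        # ['hat', 'mat']

# Quantifiers: \d{3} is exactly three digits, \d{2,4} is two to four.
three_digits = re.findall(r"\b\d{3}\b", "7 42 123 4567")
two_to_four = re.findall(r"\b\d{2,4}\b", "7 42 123 4567")
print(three_digits)    # ['123']
print(two_to_four)     # ['42', '123', '4567']

# a|b alternation inside a non-capturing group (?: ...).
animals = re.findall(r"\b(?:cat|cow)s?\b", "cats and a cow")
print(animals)         # ['cats', 'cow']

# (re) remembers its text; \1 refers back to it (here: a doubled word).
doubled = re.search(r"\b(\w+) \1\b", "this is is a test").group(1)
print(doubled)         # is

# (?= re) and (?! re) check what follows without consuming it.
kilos = re.findall(r"\d+(?= kg)", "10 kg, 20 lb, 30 kg")
not_kilos = re.findall(r"\b\d+\b(?! kg)", "10 kg, 20 lb, 30 kg")
print(kilos)           # ['10', '30']
print(not_kilos)       # ['20']
```

Note that re.findall() returns the whole match when the pattern has no capturing group, which is why the non-capturing (?: ...) form is used for the alternation.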

    Points to remember:

    1. A dot (.) represents any character except the newline character.
    2. A plus sign (+) means the previous expression may repeat one or more times.
    3. An asterisk (*) means the previous expression may repeat zero or more times.
    4. Both + and * are greedy metacharacters and will try to match as much text as possible.
    5. The ‘r’ prefix marks the pattern as a raw string, which avoids conflicts between the way Python interprets escape sequences and the way the ‘re’ module does.
    6. Parentheses indicate the start and end of a group.
    7. A question mark (?) cannot match anything by itself. After an expression it makes that expression optional (zero or one occurrence), and after + or * it turns off their greedy behavior so they match as little text as possible. That is why it is called a control character.
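
The points about greediness, the question mark and raw strings can be seen in a short Python sketch. The sample text and variable names are assumptions chosen only for illustration.

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy: .+ grabs as much as it can, from the first < to the last >.
greedy = re.findall(r"<.+>", html)
print(greedy)      # ['<b>bold</b> and <i>italic</i>']

# Non-greedy: adding ? makes .+ stop at the first > it can reach.
lazy = re.findall(r"<.+?>", html)
print(lazy)        # ['<b>', '</b>', '<i>', '</i>']

# After an ordinary expression, ? makes that expression optional.
optional = re.findall(r"colou?r", "color colour")
print(optional)    # ['color', 'colour']

# Raw strings: without the r prefix, Python turns "\b" into a backspace
# character before the re module ever sees the pattern.
no_raw = re.findall("\bcat\b", "cat concat")
raw = re.findall(r"\bcat\b", "cat concat")
print(no_raw)      # []
print(raw)         # ['cat']
```

Writing every pattern as a raw string is a simple habit that avoids this kind of silent mismatch.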
