Regular Expression

 

Regular Expression in Python

 

What is a Regular Expression?
A regular expression is a powerful tool in the text-processing toolbox. Regular expressions define text patterns. You can use them to check whether a given input matches a pattern, and to find occurrences of a pattern inside a string.
Regular expressions are also called regex patterns or regexes. In Python they are available through the re module.
The re module has functions to search for patterns, to match them, and to replace them.
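Regular expressions use the characters * and ?, which look like the familiar wildcard characters but behave differently: in a regular expression they are quantifiers applied to the preceding item (* means zero or more repetitions, ? means zero or one).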
The differences between regular expressions and simple wildcards are as follows:
1. A regular expression can match multiple times, anywhere in a string.
2. Compared to simple wildcards, regular expressions are more complicated and much richer.
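
To make the contrast concrete, here is a small sketch that puts a shell-style wildcard next to a regular expression. It assumes Python's standard fnmatch module as the stand-in for "simple wildcards"; the sample strings are made up for illustration.

```python
import fnmatch
import re

text = "ITVOYAAAGERS"

# Shell-style wildcard: the pattern has to describe the whole string,
# so we need * on both sides to say "AAA somewhere inside".
wildcard_hit = fnmatch.fnmatchcase(text, "*AAA*")

# Regular expression: search() looks for the pattern anywhere in the string.
regex_hit = re.search("AAA", text) is not None

# A regex can also report every non-overlapping match, not just yes or no.
all_hits = re.findall("A.", "AB AC AD")

print(wildcard_hit, regex_hit, all_hits)
```

The wildcard only answers whether the whole string fits, while findall() returns each place where the pattern matched.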

In a regular expression, an ordinary string always matches itself.
Example:
If the pattern is 'AAA', it matches that same text wherever it occurs, for example inside 'ITVOYAAAGERS'. This means we can find one string inside another. The wildcard character '.' (dot) matches any single character except the newline character. For example, 'A.A' matches 'AAA', 'ABA' and 'A.A' (all of these can be matched with 'A.A').
The problem arises when we want to match the dot itself, i.e. only the literal string 'A.A'.
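
The two behaviours described here can be tried out with a short sketch. The list of candidate strings is only an illustration; fullmatch() is used so that each candidate has to match the pattern from start to end.

```python
import re

# A literal pattern matches itself wherever it occurs in the string.
m = re.search("AAA", "ITVOYAAAGERS")
literal_span = m.span()  # (start, end) index of the match

# The dot matches any single character except a newline.
candidates = ["AAA", "ABA", "A.A", "A\nA", "AA"]
dot_matches = [s for s in candidates if re.fullmatch("A.A", s)]

print(literal_span)
print(dot_matches)
```

The candidate with a newline in the middle and the two-character candidate are the ones left out of dot_matches.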
Regular expressions let you escape special characters by putting a backslash (\) in front of them. For example, to match only 'A.A' we use the pattern 'A\.A'. The backslash removes the special meaning of the dot, so only the exact string with a dot, 'A.A', is matched.
Python itself also uses the backslash for escape sequences in string literals, such as \n for the newline character and \t for the tab character.
So when we want the backslash to reach the pattern unchanged, we make the string a raw string, in which backslashes are taken literally. To make a string raw, add 'r' before the opening quote of the pattern. For example, the literal text A.A can be matched with the pattern r"A\.A".
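
A minimal sketch of escaping and raw strings, using the same three sample strings as above. re.escape() is a standard re function that is not mentioned in this post; it is shown only as a convenient way to add the backslashes automatically.

```python
import re

candidates = ["AAA", "ABA", "A.A"]

# Unescaped dot: matches any character, so all three candidates match.
loose = [s for s in candidates if re.fullmatch(r"A.A", s)]

# Escaped dot inside a raw string: matches only a literal '.'.
strict = [s for s in candidates if re.fullmatch(r"A\.A", s)]

# Without the r prefix, the backslash itself has to be doubled.
same_pattern = (r"A\.A" == "A\\.A")

# re.escape() adds the backslashes for you.
escaped = re.escape("A.A")

print(loose, strict, same_pattern, escaped)
```

Writing patterns as raw strings keeps them readable, because each backslash appears exactly once.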

 

Types of Regular Expressions

There are two types of regular expressions.

Basic Regular Expressions

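In basic regular expressions, characters such as ?, +, {, }, ( and ) have no special meaning on their own and simply match themselves.
In basic regular expressions a backslash (\) is required in front of ( ) and { } to use them as operators, e.g. \( and \}.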

 

Extended regular expression

In extended regular expressions, characters such as ?, +, {, ( and | are metacharacters. Utilities such as egrep use this type.
In extended regular expressions a backslash is not required for ( ) and { }.
The following table shows Solaris utilities and the type of regular expression each one uses.

Utility     Regular Expression Type
vi          Basic Regular Expression
grep        Basic Regular Expression
sed         Basic Regular Expression
ed          Basic Regular Expression
dbx         Basic Regular Expression
dbxtool     Basic Regular Expression
more        Basic Regular Expression
csplit      Basic Regular Expression
expr        Basic Regular Expression
lex         Basic Regular Expression
pg          Basic Regular Expression
nl          Basic Regular Expression
rdist       Basic Regular Expression
awk         Extended Regular Expression
nawk        Extended Regular Expression
egrep       Extended Regular Expression
EMACS       EMACS Regular Expression
PERL        PERL Regular Expression

 

re module

This module provides support for regular expression patterns and metacharacters in Python.
Whenever an error is encountered, for example when a pattern is not valid, the re.error exception is raised.
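
As a rough illustration of the exception, the sketch below tries to compile a pattern and catches re.error. The helper name is_valid_pattern is made up for this example and is not part of the re module.

```python
import re

def is_valid_pattern(pattern):
    """Hypothetical helper: True if the pattern compiles, False on re.error."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

ok = is_valid_pattern(r"A\.A")   # a well-formed pattern
bad = is_valid_pattern("A(")     # the opening parenthesis is never closed

print(ok, bad)
```

Catching re.error like this is useful when the pattern comes from user input rather than from your own code.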

The re module has many functions to support pattern matching. These functions are listed below:
1. match()
2. search()
3. findall()
4. split()
5. sub()
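
As a quick preview, here is one call to each of the five functions on the same string. The sample text and patterns are chosen only for illustration.

```python
import re

text = "one1two22three333"

# match(): succeeds only if the pattern matches at the start of the string.
at_start = re.match(r"[a-z]+", text).group()

# search(): finds the first match anywhere in the string.
first_digits = re.search(r"\d+", text).group()

# findall(): returns all non-overlapping matches as a list of strings.
all_digits = re.findall(r"\d+", text)

# split(): splits the string wherever the pattern matches.
words = re.split(r"\d+", text)

# sub(): replaces every match with the replacement string.
dashed = re.sub(r"\d+", "-", text)

print(at_start, first_digits, all_digits, words, dashed)
```

Note that split() returns an empty string as the last item here, because the text ends with a match.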

Note: The above functions are explained in later posts.
