Regular Expressions in Python
What is a Regular Expression?
Regular expressions are a powerful tool in the text processing toolbox. A regular expression defines a text pattern. You can use it to verify whether a given input matches the pattern, and to find occurrences of the pattern inside a string.
Regular expressions are also called regex patterns or regexes. In Python they are available through the re module.
The re module provides functions to search for patterns, and to match and replace them.
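Like simple wildcard patterns, regular expressions use the * and ? characters, but with a different meaning: in a regular expression they are quantifiers that apply to the preceding item (* means zero or more repetitions, ? means zero or one).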
Regular expressions differ from simple wildcards as follows:
1. A regular expression can match multiple times, anywhere in a string.
2. Compared to simple wildcards, regular expressions are more complex and far more expressive.
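The contrast with simple wildcards can be sketched in Python. This is only an illustration: the standard fnmatch module stands in for "simple wildcards" here, and the sample strings and variable names are made up for the example.

```python
import fnmatch
import re

# Shell-style wildcards: "?" is any one character, "*" is any run of
# characters, and the pattern has to cover the whole string.
wild_one = fnmatch.fnmatchcase("ABA", "A?A")
wild_whole = fnmatch.fnmatchcase("ITVOYABAGERS", "A?A")

# Regular expressions: "?" and "*" are quantifiers on the preceding item.
optional_b = [bool(re.fullmatch(r"AB?A", s)) for s in ("AA", "ABA", "ABBA")]
repeated_b = [bool(re.fullmatch(r"AB*A", s)) for s in ("AA", "ABA", "ABBA")]

# A regular expression can match anywhere in the string, and more than once.
hits = re.findall(r"AB?A", "xxAAyyABAzz")

print(wild_one, wild_whole)   # True False
print(optional_b)             # [True, True, False]
print(repeated_b)             # [True, True, True]
print(hits)                   # ['AA', 'ABA']
```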
In a regular expression, an ordinary (literal) string always matches itself.
Example:
The pattern 'AAA' matches itself wherever it occurs in a string, for example in 'ITVOYAAAGERS'. This means we can find one string inside another. The wildcard character '.' (dot) matches any single character in a string except the newline character. For example, 'A.A' matches 'AAA', 'ABA', and 'A.A'.
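The example above can be tried out with the re module. The snippet below is a small sketch; the list of candidate strings is chosen only for illustration.

```python
import re

# A literal pattern matches itself wherever it occurs in the string.
literal = re.search("AAA", "ITVOYAAAGERS")
print(literal.span())  # (5, 8): start and end index of the match

# The dot matches any single character except a newline.
candidates = ["AAA", "ABA", "A.A", "A\nA", "AA"]
dot_results = {s: bool(re.fullmatch("A.A", s)) for s in candidates}

for text, matched in dot_results.items():
    print(repr(text), matched)
```

Only 'A\nA' (the middle character is a newline) and 'AA' (no middle character at all) fail to match.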
The problem arises when we want to match the dot itself, i.e. only the literal string 'A.A'.
Regular expressions let you escape special characters by adding a backslash (\) in front of them. For example, to match only 'A.A', we can use the pattern 'A\.A'. The backslash removes the special meaning of the dot, so only the exact string 'A.A' is matched.
Python string literals also use the backslash for escape sequences, such as \n for the newline character and \t for the tab character.
So when a pattern contains a backslash, we should write it as a raw string, so that Python passes every character, including the backslash, to the regular expression engine unchanged. To make a string raw, add the prefix r before the opening quote. For example, the literal A.A can be matched with the pattern r"A\.A".
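A short sketch of escaping and raw strings follows. The sample list is illustrative, and re.escape() is shown only as one convenient way to build an escaped pattern, not as something the text above requires.

```python
import re

unescaped = re.compile("A.A")    # the dot is a wildcard
escaped = re.compile(r"A\.A")    # raw string: the backslash reaches the regex engine

samples = ["AAA", "ABA", "A.A"]
unescaped_hits = [s for s in samples if unescaped.fullmatch(s)]
escaped_hits = [s for s in samples if escaped.fullmatch(s)]

print(unescaped_hits)  # ['AAA', 'ABA', 'A.A']
print(escaped_hits)    # ['A.A']

# Without the r prefix, Python itself interprets escapes such as \n:
# "\n" is one newline character, r"\n" is a backslash followed by "n".
print(len("\n"), len(r"\n"))  # 1 2

# re.escape() adds the backslashes needed to match a string literally.
auto_escaped = re.escape("A.A")
print(auto_escaped == r"A\.A")  # True
```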
Types of Regular Expressions
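There are two types of regular expressions.
Basic Regular Expressions
In basic regular expressions, the metacharacters are written with a backslash: \?, \+, \{, \}, \(, \), etc.
In basic regular expressions, a backslash '\' is required before ( ) and { } to give them their special meaning, e.g. \( and \}.
Extended Regular Expressions
In extended regular expressions, the metacharacters ?, +, {, (, |, etc. are written without a backslash. This syntax is used by tools such as egrep.
In extended regular expressions, a backslash is not required before ( ) and { }.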
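Python's re module follows the same convention as the extended type described here: ( ), { }, ?, + and | are special without a backslash, and a backslash makes them literal. The patterns below are a minimal sketch made up for illustration.

```python
import re

pattern = r"(ab){2}|cd+"

# Unescaped parentheses group, braces repeat, + repeats, | separates alternatives.
grouped = re.fullmatch(pattern, "abab")
alternative = re.fullmatch(pattern, "cddd")

# With a backslash, the same characters are matched literally.
literal = re.fullmatch(r"\(ab\)\{2\}", "(ab){2}")

print(grouped.group(1))     # ab
print(alternative.group())  # cddd
print(literal.group())      # (ab){2}
```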
The following table lists the type of regular expression used by each Solaris utility.
Utility | Regular Expression Type |
---|---|
vi | Basic Regular Expression |
grep | Basic Regular Expression |
sed | Basic Regular Expression |
ed | Basic Regular Expression |
dbx | Basic Regular Expression |
dbxtool | Basic Regular Expression |
more | Basic Regular Expression |
csplit | Basic Regular Expression |
expr | Basic Regular Expression |
lex | Basic Regular Expression |
pg | Basic Regular Expression |
nl | Basic Regular Expression |
rdist | Basic Regular Expression |
awk | Extended Regular Expression |
nawk | Extended Regular Expression |
egrep | Extended Regular Expression |
EMACS | EMACS Regular Expression |
PERL | PERL Regular Expression |
re module
This module lets you use regular expression patterns and metacharacters in Python.
If a pattern is invalid, or another error occurs while compiling or using it, the re.error exception is raised.
The re module has many functions to support pattern matching. These functions are listed below:
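The following sketch shows how re.error can be caught. The helper name is_valid_pattern is hypothetical and exists only for this example.

```python
import re

def is_valid_pattern(pattern):
    """Return True if the pattern compiles, False if re.error is raised."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True

print(is_valid_pattern(r"A\.A"))  # True
print(is_valid_pattern("A("))     # False: the parenthesis is never closed
```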
1. match()
2. search()
3. findall()
4. split()
5. sub()
Note: The above functions are explained in later posts.
You can also check out the following related posts on regular expressions in Python:
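As a brief preview of the five functions listed above, here is a minimal sketch. The sample text and the patterns are made up for illustration.

```python
import re

text = "ITVOYAGERS 2024, Python 3"

at_start = re.match(r"[A-Z]+", text)   # matches only at the start of the string
anywhere = re.search(r"\d+", text)     # first match anywhere in the string
numbers = re.findall(r"\d+", text)     # all non-overlapping matches, as a list
pieces = re.split(r",\s*", text)       # split the string around each match
masked = re.sub(r"\d", "#", text)      # replace every match

print(at_start.group())  # ITVOYAGERS
print(anywhere.group())  # 2024
print(numbers)           # ['2024', '3']
print(pieces)            # ['ITVOYAGERS 2024', 'Python 3']
print(masked)            # ITVOYAGERS ####, Python #
```

Note that re.match(r"\d+", text) would return None here, because the string does not start with a digit.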
- Flags for regular expressions(Modifiers)
- Functions in ‘re’ module(Part II)-findall(), split(), sub()
- Functions in ‘re’ module(Part I)- Match vs Search
- Metacharacters in Regular Expression