Python Regular Expression

Regular expressions can be thought of as a mini-language for specifying text patterns.

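re.compile(): create a regex object from a pattern
re.search(): find the first occurrence of a pattern anywhere in a string
re.match(): check whether the start of the string conforms to the pattern (use re.fullmatch() to test the entire string)
re.findall(): find every occurrence of the pattern in the string and return all the matches, not just the first
match.group(): get the matched string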
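As a rough sketch of how these calls differ in practice (the sample text and variable names below are made up for illustration):

```python
import re

text = "cat bat cat"            # hypothetical sample string
pattern = re.compile(r"cat")    # compile once, reuse many times

first = pattern.search(text)        # first occurrence anywhere in the string
at_start = pattern.match("a cat")   # match() only tries the start, so no match
every = pattern.findall(text)       # every non-overlapping occurrence, as a list

print(first.group())   # the matched text
print(at_start)        # None, because "a cat" does not begin with "cat"
print(every)
```

search() and match() return a match object (or None), while findall() returns plain strings.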

Searching with Regex

match = re.search(pattern, string)

Pattern types (Character Classes)

\w: any word character [a-zA-Z0-9_] (letter, digit or underscore)
\d: any numeric digit [0-9]
\s: any whitespace character (space, newline, tab)
\D: any character that is NOT a numeric digit
\W: any character that is NOT a letter, digit or underscore
\S: any character that is NOT a space, tab or newline
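A small sketch of these classes at work, using a made-up sample string (the variable names are only for illustration):

```python
import re

sample = "Room 42, Floor_7\tEast"   # hypothetical text with a tab in it

digits = re.findall(r"\d", sample)        # single digits
words = re.findall(r"\w+", sample)        # runs of word characters
spaces = re.findall(r"\s", sample)        # whitespace characters
non_digits = re.findall(r"\D+", sample)   # runs of non-digit characters

print(digits)
print(words)
print(spaces)
print(non_digits)
```

Note that the underscore counts as a word character, so "Floor_7" comes back as a single word.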

Repetition Group

+ : 1 or more
* : 0 or more
? : 0 or 1
{k} : exactly k occurrences
{m,n} : m to n occurrences, inclusive
. : matches any character except the newline (\n)
^ : start of the string
$ : end of the string
\ : escape character
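The symbols above can be tried out one at a time. This is only a sketch with invented sample strings:

```python
import re

plus = re.findall(r"go+", "g go goo")               # + needs at least one 'o'
star = re.findall(r"go*", "g go goo")               # * also accepts zero 'o'
optional = re.findall(r"colou?r", "color colour")   # ? makes the 'u' optional
ranged = re.findall(r"\d{2,3}", "1 22 333 4444")    # 2 to 3 digits at a time
start = re.findall(r"^\w+", "first second")         # ^ pins the match to the start
end = re.findall(r"\w+$", "first second")           # $ pins the match to the end
literal_dot = re.findall(r"3\.14", "3.14 3x14")     # \. is a literal dot

for result in (plus, star, optional, ranged, start, end, literal_dot):
    print(result)
```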


# The re module contains all the regular expression functions
>>> import re
>>> example = "Welcome to the world of Python"
>>> pattern = r'Python'
>>> match = re.search(pattern, example)
>>> print(match)
<_sre.SRE_Match object; span=(24, 30), match='Python'>
>>> if match:
...     print("found", match.group())
... else:
...     print("No match found")
...
found Python

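NOTE: the r prefix marks a raw string. Regex patterns use backslashes heavily (\w, \d), so they are usually written as raw strings (r'\d') to stop Python from interpreting the backslashes itself.

The most popular example is finding a phone number :-)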
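To see why the raw string matters, here is a minimal sketch. It uses \b (a word boundary in regex syntax), which this article does not otherwise cover, because in a normal Python string the same two characters mean a backspace:

```python
import re

# In a normal string "\b" is one backspace character; in a raw string it
# stays as backslash + b, which the regex engine reads as a word boundary.
normal_len = len("\b")
raw_len = len(r"\b")

plain_hits = re.findall("\bcat\b", "cat concat")   # looks for backspace characters
raw_hits = re.findall(r"\bcat\b", "cat concat")    # looks for the whole word 'cat'

# Doubling every backslash is the non-raw way to write the same pattern
same_pattern = "\\d+" == r"\d+"

print(normal_len, raw_len)
print(plain_hits, raw_hits)
print(same_pattern)
```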

>>> import re
>>> message = "my number is 510-123-4567"
# Here we create a regex object, which defines the pattern we are looking for
>>> myregex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
# Then we try to find the pattern in the string
>>> match = myregex.search(message)
# This gives us the actual matched text
>>> print(match.group())
510-123-4567

If the string contains multiple phone numbers, use findall

>>> import re
>>> message = "my number is 510-123-4567 and my office number is 510-987-1234"
>>> myregex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
# Find every occurrence of the pattern in the string and return them as a list
>>> print(myregex.findall(message))
['510-123-4567', '510-987-1234']

Let's use groups to separate the area code from the rest of the phone number. Parentheses have a special meaning here: they mark where a group starts and where it ends.

>>> import re
>>> myregex = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')
>>> match = myregex.search("My number is 510-123-4567")
>>> match
<_sre.SRE_Match object; span=(13, 25), match='510-123-4567'>
# This returns the full matching string
>>> match.group()
'510-123-4567'
# Returns only the first matching group (the area code)
>>> match.group(1)
'510'
# Second matching group (the rest of the phone number)
>>> match.group(2)
'123-4567'
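Groups also change what findall returns. As a sketch (the variable names are illustrative, and \d{3} is just a shorter way of writing \d\d\d): when the pattern contains groups, findall returns one tuple of groups per match, and groups() on a match object returns all groups at once.

```python
import re

phone = re.compile(r"(\d{3})-(\d{3}-\d{4})")
message = "my number is 510-123-4567 and my office number is 510-987-1234"

pairs = phone.findall(message)     # one (area code, rest) tuple per match
first = phone.search(message)
parts = first.groups()             # all groups of the first match as a tuple

print(pairs)
print(parts)
```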

To match literal parentheses in a string, we need to escape them with a backslash: \( and \)

>>> myregex = re.compile(r'\(\d\d\d\)-(\d\d\d-\d\d\d\d)')
>>> match = myregex.search("My number is (510)-123-4567")
>>> match.group()
'(510)-123-4567'

The pipe character (|) matches one of several possible alternatives

>>> lang = re.compile(r'Pyt(hon|con|mon)')
>>> match = lang.search("Python is a wonderful language")
>>> match.group()
'Python'
>>> match = lang.search("Pytcon is a wonderful language")
>>> match.group()
'Pytcon'
>>> match = lang.search("Pytmon is a wonderful language")
>>> match.group()
'Pytmon'

If the regular expression cannot find the pattern, search returns None. To verify that:

>>> match = lang.search("Pytut is a wonderful language")
>>> match is None
True

? : zero or one time

>>> import re
# Here "ho" is optional: it may occur zero times or one time
>>> myexpr = re.compile(r'Pyt(ho)?n')
>>> match = myexpr.search("Python a wonderful language")
>>> match.group()
'Python'
>>> match = myexpr.search("Pytn a wonderful language")
>>> match.group()
'Pytn'

So if we search a string in which "ho" appears twice, the match fails

>>> match = myexpr.search("Pythohon a wonderful language")
>>> match.group()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'
>>> match is None
True
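One way to avoid the AttributeError above is to check for None before calling group(). A minimal sketch, where matched_text is a hypothetical helper name and not part of the re module:

```python
import re

myexpr = re.compile(r"Pyt(ho)?n")

def matched_text(text):
    """Return the matched text, or None when the pattern is not found."""
    match = myexpr.search(text)
    if match is None:
        return None
    return match.group()

found = matched_text("Python a wonderful language")
missing = matched_text("Pythohon a wonderful language")

print(found)
print(missing)
```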

In the same way, we can make the area code optional in our earlier phone number example

>>> myphone = re.compile(r'(\d\d\d-)?\d\d\d-\d\d\d\d')
>>> match = myphone.search("My phone number is 123-4567")
>>> match.group()
'123-4567'
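The same pattern can be checked against a number that does include the area code. In this sketch (the variable names are illustrative), note that an optional group that did not take part in the match comes back as None:

```python
import re

myphone = re.compile(r"(\d\d\d-)?\d\d\d-\d\d\d\d")

with_area = myphone.search("My phone number is 510-123-4567")
without_area = myphone.search("My phone number is 123-4567")

print(with_area.group(), with_area.group(1))
print(without_area.group(), without_area.group(1))
```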

"*" : zero or more times

>>> import re
>>> myexpr = re.compile(r'Pyth(on)*')
>>> match = myexpr.search("Welcome to the world of Pythononon")
>>> match.group()
'Pythononon'

"+" : must appear one or more times

>>> myexpr = re.compile(r'Pyth(on)+')
>>> match = myexpr.search("Welcome to the world of Pyth")
>>> match.group()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'
>>> match = myexpr.search("Welcome to the world of Python")
>>> match.group()
'Python'
>>> match = myexpr.search("Welcome to the world of Pythonononon")
>>> match.group()
'Pythonononon'

Now if we want to match a specific number of times

>>> myregex = re.compile(r'(Re){3}')
>>> match = myregex.search("My matching string is ReReRe")
>>> match.group()
'ReReRe'
# Range of repetitions
>>> myregex = re.compile(r'(Re){3,5}')
>>> match = myregex.search("My matching string is ReReReRe")
>>> match.group()
'ReReReRe'

Regular expressions in Python are greedy by default, i.e. they try to match the longest possible string

# Instead of stopping at the minimum (the first 3 digits), it matches the first 5
>>> mydigit = re.compile(r'(\d){3,5}')
>>> match = mydigit.search('123456789')
>>> match.group()
'12345'

To do a non-greedy match, put a question mark after the curly braces; the regex then matches the shortest possible string.

>>> mydigit = re.compile(r'(\d){3,5}?')
>>> match = mydigit.search('123456789')
>>> match.group()
'123'
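Putting the two variants side by side makes the difference easier to see. This sketch drops the parentheses around \d, which are not needed when we only want the whole match, and adds findall to show how the rest of the string gets consumed:

```python
import re

digits = "123456789"

greedy_first = re.search(r"\d{3,5}", digits).group()   # takes as many as allowed
lazy_first = re.search(r"\d{3,5}?", digits).group()    # stops at the minimum

greedy_all = re.findall(r"\d{3,5}", digits)   # 5 digits, then the remaining 4
lazy_all = re.findall(r"\d{3,5}?", digits)    # 3 digits at a time

print(greedy_first, lazy_first)
print(greedy_all)
print(lazy_all)
```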

Let's take a look at a few more examples that involve character classes

\w: any word character [a-zA-Z0-9_] (letter, digit or underscore)
\d: any numeric digit [0-9]
\s: any whitespace character (space, newline, tab)

Let's say I need to match this address

>>> import re
>>> address = "123 fremont street"
>>> match = re.compile(r'\d+\s\w+\s\w+')
>>> match.findall(address)
['123 fremont street']

We can create our own character class

# Let's create our own character class which matches all lower case vowels
>>> myregex = re.compile(r'[aeiou]')
# To match upper case vowels as well, use r'[aeiouAEIOU]'
>>> mypat = "Welcome to the world of Python"
>>> myregex.findall(mypat)
['e', 'o', 'e', 'o', 'e', 'o', 'o', 'o']

Now if we want to match two vowels in a row

>>> myregex = re.compile(r'[aeiouAEIOU]{2}')
>>> mypat = "Welcome to the world of Python ae"
>>> myregex.findall(mypat)
['ae']

Negative character class (a leading ^ inside the brackets means: match every character except the ones listed, here the vowels)

>>> myregex = re.compile(r'[^aeiouAEIOU]')
>>> mypat = "Welcome to the world of Python ae"
>>> myregex.findall(mypat)
['W', 'l', 'c', 'm', ' ', 't', ' ', 't', 'h', ' ', 'w', 'r', 'l', 'd', ' ', 'f', ' ', 'P', 'y', 't', 'h', 'n', ' ']
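The negated class above also returns the spaces, since a space is not a vowel either. If only the consonants are wanted, one option is to add \s to the excluded set. A sketch, with illustrative variable names:

```python
import re

text = "Welcome to the world of Python"

# Exclude vowels AND whitespace, leaving only the consonants
consonants = re.findall(r"[^aeiouAEIOU\s]", text)
joined = "".join(consonants)

print(consonants)
print(joined)
```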

Let's take a look at the dot (. matches any character except the newline (\n))

>>> myregex = re.compile(r'.x')
>>> mypat = "Linux Unix Minix"
>>> myregex.findall(mypat)
['ux', 'ix', 'ix']

The dot is mostly used together with *

* : 0 or more

Now if we change our regex to combine both

>>> myregex = re.compile(r'.*x')
>>> mypat = "Linux Unix Minix"
>>> myregex.findall(mypat)
['Linux Unix Minix']


.* : always performs a greedy match (of anything except a newline)
.*? : adding ? makes it non-greedy

Let's look at this with the help of an example

>>> mystr = '"Welcome to the world of Python" great language to learn"'
>>> mypat = re.compile(r'"(.*?)"')
# Because it is non-greedy, it stops at the first closing " it encounters
>>> mypat.findall(mystr)
['Welcome to the world of Python']

But in case of greedy match

>>> mypat = re.compile(r'"(.*)"')
# It runs on to the last " in the string
>>> mypat.findall(mystr)
['Welcome to the world of Python" great language to learn']
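For quoted text specifically, a negated character class is a common alternative to the non-greedy form: instead of "anything, but as little as possible", it says "anything that is not a quote". This is a sketch of that swapped-in technique next to the two patterns used above:

```python
import re

mystr = '"Welcome to the world of Python" great language to learn"'

lazy = re.findall(r'"(.*?)"', mystr)        # non-greedy: stops at the first closing quote
greedy = re.findall(r'"(.*)"', mystr)       # greedy: runs to the last quote
negated = re.findall(r'"([^"]*)"', mystr)   # never crosses a quote at all

print(lazy)
print(greedy)
print(negated)
```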

Now, as we mentioned above, .* matches everything except a newline

>>> myexpr = "Welcome to the \n world of \n Python"
>>> print(myexpr)
Welcome to the
 world of
 Python
>>> mypat = re.compile(r'(.*)')
>>> mypat.search(myexpr)
<_sre.SRE_Match object; span=(0, 15), match='Welcome to the '>

If we want .* to match across newlines as well, pass re.DOTALL as the second argument (the dot then matches newline characters too)

>>> mypat = re.compile(r'.*', re.DOTALL)
>>> mypat.search(myexpr)
<_sre.SRE_Match object; span=(0, 34), match='Welcome to the \n world of \n Python'>
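The effect of re.DOTALL can be compared directly against the default behaviour. A short sketch using the same string (the variable names are illustrative):

```python
import re

myexpr = "Welcome to the \n world of \n Python"

default = re.search(r".*", myexpr).group()             # stops at the first newline
dotall = re.search(r".*", myexpr, re.DOTALL).group()   # the dot now matches \n too
per_line = re.findall(r".+", myexpr)                   # without DOTALL: one match per line

print(repr(default))
print(repr(dotall))
print(per_line)
```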

The second argument is really useful, especially if we want to perform a case-insensitive search (re.I)

>>> import re
>>> mystr = "Why Linux Is Such An Awesome Platform"
>>> mypat = re.compile(r'[aeiou]', re.I)
>>> mypat.findall(mystr)
['i', 'u', 'I', 'u', 'A', 'A', 'e', 'o', 'e', 'a', 'o']
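Flags can also be combined with the | operator when more than one is needed. A minimal sketch with made-up sample strings; re.I is simply the short name for re.IGNORECASE:

```python
import re

# re.I and re.IGNORECASE are the same flag
same_flag = re.I is re.IGNORECASE

any_case = re.findall(r"linux", "Linux LINUX linux", re.I)

# Case-insensitive AND dot-matches-newline at the same time
both = re.search(r"linux.*unix", "LINUX\nUNIX", re.I | re.DOTALL)

print(same_flag)
print(any_case)
print(repr(both.group()))
```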
