Python Regular Expression Exercises

Let’s check out some exercises that will help you understand Regular Expressions better.

Exercise 6-a

From the list, keep only the entries where a letter or a number comes right after the > sign.

import re

str = """>Venues >Marketing >medalists >Controversies >Paralympics >snowboarding
>[1] >Netherlands >[2] >Norway >[10] >[11] >References >edit >[12]
>Norway >Germany >Canada >Netherlands >Japan >Italy >Belarus >China >Slovakia
<$#%#$% <#$#$$ <**&&^^
>Slovenia >Belgium >Spain >Kazakhstan >[15]
>1964 >1968 >1972 >1992 >1996 >2000"""

# Type your answer here: replace the empty pattern with your regex.
data = re.findall('', str)
print(data)

Hint 1
You can use the findall function from the re module:

e.g.: re.findall()
Hint 2
\w+ can be a useful regular expression in this case: it matches one or more letters, digits or underscores.
Solution
data = re.findall(r'>\w+', str)
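
As a quick check of how this pattern behaves, here is a minimal sketch on a shortened sample. The variable name sample and its contents are made up for illustration and are not the full exercise data.

```python
import re

# Hypothetical shortened sample, not the full exercise string.
sample = ">Venues >[1] >Norway <$#%#$% >1964"

# '>' matches the literal sign; \w+ matches one or more letters, digits or underscores.
matches = re.findall(r'>\w+', sample)
print(matches)  # ['>Venues', '>Norway', '>1964']
```

Entries such as >[1] are skipped because the character right after > is a bracket, which \w does not match.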

Exercise 6-b

Write a regex so that the full email addresses are extracted.

e.g.: mike@protonmail.com

import re

str = 'The advancements in biomarine studies franky@google.com with the investments necessary and Davos sinatra123@yahoo.com Then The New Yorker article on wind farms…'

# Type your answer here: replace the empty pattern with your regex.
regex = ''
emails = re.findall(regex, str)
print(emails)  # expected: ['franky@google.com', 'sinatra123@yahoo.com']

Hint 1

One way to approach this problem is:

1- include every non-space character before the “@” sign

2- add the “@” sign

3- include every non-space character after the “@” sign.

This example shows how versatile regex is: because the pattern does not depend on any particular domain, it catches the emails regardless of their suffixes (.co.uk, .gov.fr, .co.jp etc.)

Hint 2

Regular Expression for everything except space is:

\S : Non-space characters

Hint 3

By combining + with \S you can match one or more non-space characters in a row
e.g.: \S+
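
To see the difference the + makes, here is a small sketch on a made-up string (sample is an illustrative name, not part of the exercise):

```python
import re

sample = "ab c@d"  # made-up text for illustration

single = re.findall(r'\S', sample)   # one non-space character per match
runs = re.findall(r'\S+', sample)    # one or more, so whole runs are matched
print(single)  # ['a', 'b', 'c', '@', 'd']
print(runs)    # ['ab', 'c@d']
```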

Solution

regex = r'\S+@\S+'

Note: The part inside the quotes is pure regex. But you might be wondering what the r in front is doing. r'text here' is a handy feature in Python that helps you avoid conflicts such as backslash misinterpretation while typing regex patterns, directories etc.

The name “raw string” can help you remember and understand what r does.

It’s a good habit for regex patterns: if you type your string without the r, Python treats backslashes as escape characters (for example, '\b' becomes a backspace instead of a word boundary).
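
The effect of the r prefix can be checked directly. The following is a minimal sketch; the sample text and variable names are made up for illustration.

```python
import re

# In a normal string, '\n' is a single newline character;
# in a raw string, r'\n' is two characters: a backslash and 'n'.
print(len('\n'))   # 1
print(len(r'\n'))  # 2

# Without r, the backslash has to be doubled to reach the regex engine intact.
same_pattern = '\\S+@\\S+' == r'\S+@\S+'
print(same_pattern)  # True

# '\b' is a backspace character, while r'\b' is the regex word boundary.
text = "a cat"  # made-up sample text
plain = re.findall('\bcat', text)
raw = re.findall(r'\bcat', text)
print(plain)  # []
print(raw)    # ['cat']
```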

Exercise 6-c

This time write a regex to get only the part of the email before the “@” sign and include the “@” sign.

e.g.: only the mike@ part of mike@protonmail.com

import re

str = 'The advancements in biomarine studies franky@google.com, with the investments necessary and Davos sinatra123@yahoo.com Then The New Yorker article on wind farms…'

# Type your answer here: replace the empty pattern with your regex.
regex = ''
emails = re.findall(regex, str)
print(emails)  # expected: ['franky@', 'sinatra123@']

Hint 1

One way to approach this problem is:

1- include every non-space character before the “@” sign

2- add the “@” sign

Hint 2

Regular Expression for everything except space is:

\S : Non-space characters

Hint 3

By combining + with \S you can match one or more non-space characters in a row
e.g.: \S+

Solution

regex = r'\S+@'
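
One thing worth noticing in this exercise is the comma right after the first address. The sketch below uses a shortened, made-up variant of the text (sample, prefixes and full are illustrative names) to compare the two patterns from Exercises 6-b and 6-c.

```python
import re

# Shortened, made-up variant of the exercise text (note the comma after the first address).
sample = "studies franky@google.com, and Davos sinatra123@yahoo.com Then"

# Every non-space character up to and including the "@" sign.
prefixes = re.findall(r'\S+@', sample)
print(prefixes)  # ['franky@', 'sinatra123@']

# The full-address pattern keeps the trailing comma,
# because a comma is also a non-space character.
full = re.findall(r'\S+@\S+', sample)
print(full)  # ['franky@google.com,', 'sinatra123@yahoo.com']
```

So \S is a convenient shortcut, but when addresses sit next to punctuation a stricter character class may be needed for the part after the “@” sign.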


Umut Sagir

Finance & Data Science Professional,
Founder of HolyPython