Python Regular Expression Exercises

Let’s check out some exercises that will help you understand Regular Expressions better.

Exercise 6-a

From the list, keep only the lines in which a number or a letter comes right after the > sign.


You can use the findall function from Python's re module:
i.e.: re.findall()
\w+ is a useful regular expression in this case: it matches one or more word characters (letters, digits or the underscore).
data = re.findall(r'>\w+', str)
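
To see the pattern in action, here is a minimal sketch. The list named lines is made-up sample data (the exercise's real list may look different), and the variable names are only illustrative.

```python
import re

# Hypothetical sample data; the exercise's actual list may differ.
lines = [">Venus", "> Mars", ">4ever", "Earth", ">!oops"]

# re.match only matches at the start of a string, so this keeps the lines
# where ">" is immediately followed by a letter, digit or underscore.
kept = [line for line in lines if re.match(r">\w", line)]
print(kept)  # ['>Venus', '>4ever']

# re.findall works on a single string and returns every match it finds.
matches = re.findall(r">\w+", " ".join(lines))
print(matches)  # ['>Venus', '>4ever']
```

Note that re.findall takes one string, so a list has to be joined first or filtered line by line as in the first approach.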

Exercise 6-b

Write a regex that extracts full email addresses.
e.g.: mike@protonmail.com


One way to approach this problem is:

1- match everything that’s non-space before the “@” sign

2- match the “@” sign itself

3- match everything that’s non-space after the “@” sign.

This example shows the versatility of regex: with this format, you will catch email addresses regardless of their suffix (.co.uk, .gov.fr, .co.jp etc.)

The regular expression for any character except whitespace is:

\S : Non-space characters

By combining \S with + you can match one or more non-space characters in a row
i.e.: \S+

regex = r'\S+@\S+'
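
As a quick check, the pattern can be tried on a short piece of text. The sentence below is invented for illustration only.

```python
import re

# Hypothetical sample text for illustration only.
text = "Contact mike@protonmail.com or anna@example.co.uk for details"

regex = r'\S+@\S+'
emails = re.findall(regex, text)
print(emails)  # ['mike@protonmail.com', 'anna@example.co.uk']
```

Because \S matches any non-space character, punctuation that touches an address (such as a trailing comma) would be captured as well, so real-world text may need a stricter pattern.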

Note: The part inside the quotes is purely regex. But you might be wondering what the r in front is doing. r'text here' marks a raw string, a handy Python feature that helps you avoid problems such as backslashes being misinterpreted when you type regex patterns or Windows file paths.

The r stands for raw, which makes its function easy to remember.

Using it for regex patterns is good practice: if you type your string without the r, backslashes will be treated as the start of escape sequences.
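
The difference is easy to see in a small experiment. This is just a sketch with made-up strings:

```python
import re

# "\n" is one newline character in a normal string,
# but two characters (backslash and n) in a raw string.
normal = "a\nb"
raw = r"a\nb"
print(len(normal))  # 3
print(len(raw))     # 4

# In a normal string "\b" becomes a backspace character, so the pattern
# looks for literal backspaces. In a raw string the regex engine receives
# \b and treats it as a word boundary.
without_r = re.findall("\bcat\b", "cat concat")
with_r = re.findall(r"\bcat\b", "cat concat")
print(without_r)  # []
print(with_r)     # ['cat']
```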

Regex Cheat Sheet

[0-9]    :     0 to 9

[a-z]     :     a to z

[A-Z]    :     A to Z

abc         :     The exact sequence abc

123         :     The exact sequence 123

.               :     Any character (except a newline, by default)
[^a]       :     Not a
[^a-f]   :     Not a to f

* Zero or more repetitions
+ One or more repetitions
?  Zero or one repetition (optional)
{m} m Repetitions
{m,} m or more Repetitions
{m,n} m to n Repetitions

\w :  Word class (alphanumeric and underscore)
\d :  Digits
\s :   Space (whitespace)
\W : Non-word class
\D :  Non-digit
\S :   Non-space 

|      :  Or operator
()    :  Capturing group
(()) :  Capturing subgroup
\      :  Escape a special character
^     :  Starts with
$     :  Ends with
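
A few entries from the cheat sheet can be tried out as follows. This is a minimal sketch with made-up sample strings, and the variable names are only illustrative.

```python
import re

# "?" makes the preceding item optional (zero or one occurrence).
optional = re.findall(r"colou?r", "color colour")
print(optional)  # ['color', 'colour']

# {m,n} bounds the number of repetitions; \d matches a digit.
bounded = re.findall(r"\b\d{2,3}\b", "7 42 123 4567")
print(bounded)  # ['42', '123']

# ^ anchors the pattern to the start of the string, $ to the end.
starts = bool(re.search(r"^abc", "abcdef"))
ends = bool(re.search(r"def$", "abcdef"))
print(starts, ends)  # True True

# | chooses between alternatives; () captures the part that matched.
animals = re.findall(r"(cat|dog)s", "cats dogs birds")
print(animals)  # ['cat', 'dog']
```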