Python – Regular Expressions Practical Guide

Regular expressions are commonly used in Linux command-line tools such as sed, awk, and grep. Most programming languages support them, either built in or through an external library.

The main problem with them is that they are difficult to read, but they are well worth the effort to learn. Using a regular expression can save you a lot of time.

Let's start with a simple example:

Validating Input string

We declare a regular expression that matches an email address using very simple rules:

  • at least one letter – [a-z]+
  • followed by @
  • followed by at least one letter
  • followed by period
  • followed by at least one letter
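The rules above can be sketched in Python as follows. This is a minimal sketch: the names EMAIL_PATTERN and is_valid_email are illustrative, and re.fullmatch is used on the assumption that the whole input has to match, not just a part of it.

```python
import re

# One or more letters, @, one or more letters, a literal period, one or more letters.
EMAIL_PATTERN = r"[a-z]+@[a-z]+\.[a-z]+"

def is_valid_email(text):
    """Return True if the whole string matches the simple email pattern."""
    return re.fullmatch(EMAIL_PATTERN, text) is not None

print(is_valid_email("john@example.com"))   # True
print(is_valid_email("john@example"))       # False (no period and extension)
print(is_valid_email("john1@example.com"))  # False (digits are not allowed)
```

Note the backslash before the period: an unescaped . matches any character, so it has to be escaped to mean a literal period.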

This is a very simple example: it doesn’t accept digits, doesn’t check for a known extension (com/net/org), and has other pitfalls. But the point is that if we want to add those rules, we only need to change the regular expression.

Some match rules:
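A few of the most common rules can be sketched like this. The selection and the sample strings are chosen here for illustration and are far from a complete list.

```python
import re

# .      any single character except a newline
print(re.fullmatch(r"c.t", "cat") is not None)          # True
# [a-z]  any one character in the range a to z
print(re.fullmatch(r"[a-z]", "q") is not None)          # True
# +      one or more repetitions of the previous item
print(re.fullmatch(r"[a-z]+", "hello") is not None)     # True
# *      zero or more repetitions
print(re.fullmatch(r"ab*", "a") is not None)            # True
# ?      zero or one repetition
print(re.fullmatch(r"colou?r", "color") is not None)    # True
# ^ and $ anchor the pattern to the start and the end of the string
print(re.search(r"^hello", "hello world") is not None)  # True
# |      alternation, usually inside a group
print(re.fullmatch(r"(com|net)", "net") is not None)    # True
# \.     a backslash escapes a special character
print(re.fullmatch(r"a\.b", "a.b") is not None)         # True
print(re.fullmatch(r"a\.b", "axb") is not None)         # False
```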

For example, if we want to support digits in the first part of the email expression, we replace the first [a-z]+ with [a-z0-9]+.
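A short sketch of the digit-friendly version, with the name EMAIL_WITH_DIGITS and the sample addresses made up for illustration:

```python
import re

# Digits are now allowed before the @ sign (the first part only).
EMAIL_WITH_DIGITS = r"[a-z0-9]+@[a-z]+\.[a-z]+"

print(bool(re.fullmatch(EMAIL_WITH_DIGITS, "john123@example.com")))  # True
print(bool(re.fullmatch(EMAIL_WITH_DIGITS, "john@example1.com")))    # False
```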

Or, if we want to allow only .com or .net addresses, we replace the final [a-z]+ with (com|net).
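Restricting the extension might look like this. It is a sketch only, and EMAIL_COM_OR_NET is an illustrative name:

```python
import re

# The group (com|net) accepts exactly one of the two alternatives.
EMAIL_COM_OR_NET = r"[a-z]+@[a-z]+\.(com|net)"

print(bool(re.fullmatch(EMAIL_COM_OR_NET, "john@example.com")))  # True
print(bool(re.fullmatch(EMAIL_COM_OR_NET, "john@example.net")))  # True
print(bool(re.fullmatch(EMAIL_COM_OR_NET, "john@example.org")))  # False
```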

Some examples:

Email:
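One possible, more permissive email pattern is sketched below. It is an assumption made for illustration, it is not a complete validator, and it does not follow the full email address standard.

```python
import re

# Letters, digits and a few punctuation characters, then @, a domain, and an
# extension of at least two letters. Illustrative only.
EMAIL = r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"

print(bool(re.fullmatch(EMAIL, "first.last+tag@mail.example.org")))  # True
print(bool(re.fullmatch(EMAIL, "no-at-sign.example.com")))           # False
```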

URL:
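A simplified URL pattern, assuming we only care about http and https addresses with an optional port and path. Real URLs allow much more than this sketch covers.

```python
import re

# Scheme, host, optional :port, optional path or query without whitespace.
URL = r"https?://[A-Za-z0-9.-]+(:\d+)?(/\S*)?"

print(bool(re.fullmatch(URL, "https://example.com/path?q=1")))  # True
print(bool(re.fullmatch(URL, "http://localhost:8000")))         # True
print(bool(re.fullmatch(URL, "ftp://example.com")))             # False
```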

Phone number:
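Phone number formats differ from country to country, so the sketch below simply assumes the form 555-123-4567 (three digits, three digits, four digits, separated by hyphens):

```python
import re

# {n} repeats the previous item exactly n times.
PHONE = r"[0-9]{3}-[0-9]{3}-[0-9]{4}"

print(bool(re.fullmatch(PHONE, "555-123-4567")))  # True
print(bool(re.fullmatch(PHONE, "5551234567")))    # False
```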

Search and Match

Search a text for a regular expression. For example, suppose we want to find a sentence starting with ‘hello’ or ‘bye’ and ending with ‘day’ or ‘month’.
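A minimal sketch of such a search. The sample sentences and the name PATTERN are made up for illustration, and the output of each print is shown in the comments.

```python
import re

# ^ and $ anchor the alternatives to the start and the end of the sentence.
PATTERN = r"^(hello|bye).*(day|month)$"

sentences = [
    "hello and have a nice day",
    "bye, see you next month",
    "hello world",
]

for sentence in sentences:
    m = re.search(PATTERN, sentence)
    if m:
        print("match:", m.group(0), "| groups:", m.groups())
    else:
        print("no match:", sentence)

# match: hello and have a nice day | groups: ('hello', 'day')
# match: bye, see you next month | groups: ('bye', 'month')
# no match: hello world
```

re.search scans the whole string for the first place where the pattern matches, while re.match only tries to match at the beginning of the string.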

Regular expression substitution

Sometimes you need to find and replace one substring with another. Using regular expressions, you can also search for a pattern. For example, suppose we want to find all the numbers in the text and replace each of them with *:
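A short sketch using re.sub, with a made-up sample sentence:

```python
import re

text = "I have 3 apples, 15 oranges and 120 grapes"

# Every run of digits is replaced by a single *.
result = re.sub(r"[0-9]+", "*", text)
print(result)  # I have * apples, * oranges and * grapes
```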

You can also use subn, which returns a tuple of the new string and the number of substitutions made:
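The same replacement with re.subn, again on a made-up sentence:

```python
import re

text = "I have 3 apples, 15 oranges and 120 grapes"

# subn returns (new_string, number_of_substitutions).
result, count = re.subn(r"[0-9]+", "*", text)
print((result, count))  # ('I have * apples, * oranges and * grapes', 3)
```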

You can also supply a function as the second parameter. The function is invoked for each match and returns the replacement string, letting you decide what to do:
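As a sketch, the hypothetical function double below receives each match object and returns the matched number multiplied by two:

```python
import re

text = "I have 3 apples, 15 oranges and 120 grapes"

def double(match):
    """Receive a match object and return the replacement string."""
    return str(int(match.group(0)) * 2)

result = re.sub(r"[0-9]+", double, text)
print(result)  # I have 6 apples, 30 oranges and 240 grapes
```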

Splitting a string

With the str class, you can split a string into substrings using only a single, fixed separator.

Using regular expressions, you can split on a pattern and on multiple separators. For example:
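A minimal sketch that splits on any run of digits. The sample text is made up for illustration.

```python
import re

text = "one1two22three333four"

# The separator is a pattern: one or more digits.
parts = re.split(r"[0-9]+", text)
print(parts)  # ['one', 'two', 'three', 'four']
```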

Multiple separators:
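A sketch with a comma, a semicolon, and a space as separators, again on made-up text:

```python
import re

text = "apple,banana;cherry orange"

# A character set lists every separator we accept.
parts = re.split(r"[,; ]", text)
print(parts)  # ['apple', 'banana', 'cherry', 'orange']

# Adding + treats a run of separators as a single one.
messy = re.split(r"[,; ]+", "a, b;; c")
print(messy)  # ['a', 'b', 'c']
```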

Shortcuts

There are some shortcuts for common patterns like numbers, words, etc.

For example, if we want to find one or more digits, we can use the pattern [0-9]+. We can do the same with ‘\d+’ as a shortcut:
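A small comparison of the two forms on a made-up sentence:

```python
import re

text = "Order 66 shipped 3 items on day 12"

long_form = re.findall(r"[0-9]+", text)
short_form = re.findall(r"\d+", text)

print(long_form)   # ['66', '3', '12']
print(short_form)  # ['66', '3', '12']
```

For str patterns in Python 3, \d also matches decimal digits from other scripts unless the re.ASCII flag is passed, so the two forms are only identical for ASCII text.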

Other shortcuts:

  • \D – not a digit
  • \w – word character (letter, digit, or underscore)
  • \W – not a word character
  • \s – whitespace
  • \S – not whitespace
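Each of these shortcuts can be tried on the same text. The sample string below is made up for illustration.

```python
import re

text = "Room 42, floor_3!"

non_digits = re.findall(r"\D+", text)
words = re.findall(r"\w+", text)
non_words = re.findall(r"\W+", text)
spaces = re.findall(r"\s+", text)
non_spaces = re.findall(r"\S+", text)

print(non_digits)  # ['Room ', ', floor_', '!']
print(words)       # ['Room', '42', 'floor_3']
print(non_words)   # [' ', ', ', '!']
print(spaces)      # [' ', ' ']
print(non_spaces)  # ['Room', '42,', 'floor_3!']
```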

Find All
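re.findall returns every non-overlapping match as a list of strings. A minimal sketch on a made-up text:

```python
import re

text = "cat bat rat mat"

# Without groups, findall returns the full matches.
matches = re.findall(r"[a-z]at", text)
print(matches)  # ['cat', 'bat', 'rat', 'mat']

# With one capturing group, findall returns only the group contents.
first_letters = re.findall(r"([a-z])at", text)
print(first_letters)  # ['c', 'b', 'r', 'm']
```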

Using an iterator:
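re.finditer yields match objects one at a time, which gives access to the position of each match as well as its text. A sketch on a made-up sentence:

```python
import re

text = "10 apples and 250 oranges"

# Collect (span, matched text) for every run of digits.
results = []
for m in re.finditer(r"\d+", text):
    results.append((m.span(), m.group(0)))
    print(m.span(), m.group(0))

# (0, 2) 10
# (14, 17) 250
```

Printing the match object itself, print(m), shows the span and the matched text together in a single line.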

Compiling a regular expression

If you use a regular expression in a loop, for example while reading lines from a file, it is better to compile it once for performance:
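A minimal sketch, in which a list of strings stands in for the lines of a file and the name NUMBER is illustrative:

```python
import re

# Compile once, outside the loop.
NUMBER = re.compile(r"\d+")

lines = ["error 404 at line 12", "no numbers here", "code 7"]

found_per_line = []
for line in lines:
    # The compiled object has the same methods as the re module functions.
    found = NUMBER.findall(line)
    found_per_line.append(found)
    print(found)

# ['404', '12']
# []
# ['7']
```

The re module also keeps a cache of recently used patterns, so the gain may be modest in practice. Compiling still avoids the cache lookup on every call and keeps the pattern defined in one place.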

3 thoughts on “Python – Regular Expressions Practical Guide”

  1. Not off to a great start – first example should be x="[a-z]+@[a-z]+\.[a-z]+"

    1. You are right, thanks
      fixed

  2. using iterator print(m) is missing info on print, also needs m.span() and m,m.span(),m.group(0) to get “span=” and “match=”
