Learn about Pattern Matching using Regular Expressions in Perl

Using Single-Character Constructs
In the last article, we learned how to use simple (literal) patterns. A pattern like /Ahmed/ matches only strings that contain the name “Ahmed”. Its modified version, /Ahmed/i, matches “ahmed” in any letter case (Ahmed, ahmed, AHMED, and so on). The pattern /^172/ matches any string starting with 172, while /255$/ matches strings ending with 255. All of that works fine for literal patterns, but what if we need to match a single digit (any digit), a single uppercase letter, or a single lowercase letter? How can we do that?

This section is going to give you the answer.

Matching Digits
The following forms will match any single digit:

[0123456789]
[0-9]
\d

Examples

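  • The following pattern will match any of the numbers 10 through 19:
/1[0-9]/
  • The following pattern will match any of the numbers 200 through 299:
/2\d\d/

Neither pattern is anchored, so each one also matches inside a longer number (/1[0-9]/ matches the 15 in 2015). To match only a whole string between 10 and 19, use /^1[0-9]$/.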
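The short Perl sketch below (the sample strings are made up for illustration) checks that the three single-digit forms agree on plain ASCII input, and shows how anchoring changes the result of /1[0-9]/.

```perl
use strict;
use warnings;

# Sample strings, made up for illustration.
my @samples = ('7', 'x', '15', '250', '2015');

# Apply the three single-digit forms to each sample: 1 = match, 0 = no match.
my %hits;
for my $s (@samples) {
    $hits{$s} = join ' ', map { $s =~ $_ ? 1 : 0 } (qr/[0123456789]/, qr/[0-9]/, qr/\d/);
    print "$s: $hits{$s}\n";
}

# /1[0-9]/ is not anchored: it finds the "15" inside "2015".
my $teen_unanchored = '2015' =~ /1[0-9]/ ? 1 : 0;

# With ^ and $, the whole string must be a number from 10 to 19.
my $teen_anchored = '2015' =~ /^1[0-9]$/ ? 1 : 0;

# /2\d\d/ matches a 2 followed by any two digits.
my $two_hundreds = '250' =~ /2\d\d/ ? 1 : 0;

print "unanchored: $teen_unanchored, anchored: $teen_anchored, 2xx: $two_hundreds\n";
```

On character strings that contain non-ASCII digits, \d can also match digits from other scripts (for example, Arabic-Indic digits); the /a modifier, as in /\d/a, restricts it to 0 through 9.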

Matching Non-Numeric Characters
The opposite case is matching any single character that is not a digit. Either of the following forms will do the job:

\D
[^0-9]
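As a small sketch (the phone number is made up), the script below splits a string into single characters and keeps the non-digits, once with \D and once with [^0-9], to show that both forms select the same characters.

```perl
use strict;
use warnings;

# A made-up phone number containing separators.
my $phone = '(02) 555-0199';

# Split into single characters and keep only the non-digits,
# once with \D and once with the negated class [^0-9].
my @chars   = split //, $phone;
my @with_D  = grep { /\D/ } @chars;
my @with_nc = grep { /[^0-9]/ } @chars;

print join('', @with_D), "\n";    # prints "() -"
print "@with_D" eq "@with_nc" ? "same\n" : "different\n";
```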

Matching Alphabetic Characters
The following will match any single lowercase letter:

[a-z]

The following will match any single uppercase letter:

[A-Z]

The following form will match any English letter, uppercase or lowercase:

[A-Za-z]
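The sketch below classifies a few sample characters with the three letter classes. Keep in mind that these ranges cover only the 26 ASCII letters, so an accented letter such as é matches none of them.

```perl
use strict;
use warnings;

# Classify a few sample characters (chosen for illustration).
my %result;
for my $ch ('q', 'Q', '7', '_') {
    my @kinds;
    push @kinds, 'lowercase' if $ch =~ /[a-z]/;
    push @kinds, 'uppercase' if $ch =~ /[A-Z]/;
    push @kinds, 'letter'    if $ch =~ /[A-Za-z]/;
    $result{$ch} = @kinds ? join(',', @kinds) : 'none';
    print "$ch: $result{$ch}\n";
}
```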

Matching Non-Alphabetic Characters
To match any single non-alphabetic character, use the negated form of [A-Za-z]:

[^A-Za-z]

Note
Don’t confuse the ^ anchor that matches the start of the string with the ^ used inside square brackets to negate a character class.
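To make the difference concrete, this small sketch (sample strings made up) uses ^ as an anchor, ^ as a negation, and both together in /^[^A-Za-z]/.

```perl
use strict;
use warnings;

my $text = '42 apples';

# ^ outside brackets is an anchor: "the string starts with a digit".
my $starts_with_digit = $text =~ /^[0-9]/ ? 1 : 0;

# ^ right after [ negates the class: "some character that is not a letter".
my $has_non_letter = $text =~ /[^A-Za-z]/ ? 1 : 0;

# Both meanings together: "the first character is not a letter".
my $starts_with_non_letter = $text =~ /^[^A-Za-z]/ ? 1 : 0;
my $apples_first           = 'apples 42' =~ /^[^A-Za-z]/ ? 1 : 0;

# A string made only of letters contains no non-letter at all.
my $letters_only = 'apples' =~ /[^A-Za-z]/ ? 1 : 0;

print "$starts_with_digit $has_non_letter $starts_with_non_letter $apples_first $letters_only\n";
```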

Matching Alphanumeric Characters
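To match any single lowercase letter, uppercase letter, or digit, use:

[A-Za-z0-9]

A closely related shortcut is \w, which matches a “word” character. Note that \w also matches the underscore _, so it is equivalent to [A-Za-z0-9_] rather than [A-Za-z0-9] (and on Unicode strings it also matches letters and digits from other scripts).
To match non-alphanumeric characters, use the negated form [^A-Za-z0-9]. Its shortcut counterpart, \W, matches any non-word character, that is, [^A-Za-z0-9_].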
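The sketch below compares [A-Za-z0-9] with \w on a few sample characters. On ASCII input the underscore is the one character where the two forms disagree, and \W follows the same rule in reverse.

```perl
use strict;
use warnings;

# Compare [A-Za-z0-9] with \w on a few sample characters.
my %agree;
for my $ch ('a', 'Z', '5', '_', '-') {
    my $alnum = $ch =~ /[A-Za-z0-9]/ ? 'yes' : 'no';
    my $word  = $ch =~ /\w/          ? 'yes' : 'no';
    $agree{$ch} = $alnum eq $word ? 1 : 0;
    printf "%s  [A-Za-z0-9]: %-3s  \\w: %s\n", $ch, $alnum, $word;
}

# \W is the negation of \w, so it matches "-" but not "_".
my $dash_is_nonword       = '-' =~ /\W/ ? 1 : 0;
my $underscore_is_nonword = '_' =~ /\W/ ? 1 : 0;
```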

Matching Space Characters
To match a single whitespace character, such as a space, tab, or newline, use \s. As you may have guessed, \S is the opposite: it matches any single non-whitespace character.
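A minimal sketch of \s and \S: the first part tests individual characters, and the second shows a common idiom, treating a line as blank when it contains no non-whitespace character at all.

```perl
use strict;
use warnings;

# Sample characters: space, tab, newline, a letter, and a digit.
my @chars = (' ', "\t", "\n", 'x', '9');

# \s matches the first three; \S matches the other two.
my @space_flags    = map { /\s/ ? 1 : 0 } @chars;
my @nonspace_flags = map { /\S/ ? 1 : 0 } @chars;
print "\\s: @space_flags\n";
print "\\S: @nonspace_flags\n";

# A line is blank if it contains no non-whitespace character.
my $line     = "  \t  \n";
my $is_blank = $line =~ /\S/ ? 0 : 1;
print "blank: $is_blank\n";
```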

Matching Characters from Custom Lists
Sometimes you need to match any single character from a list of your own choosing. In this case, define a custom character class by listing the characters between square brackets [ ].

Examples

  • The following will match any string starting with 172 or 192:
/^1[79]2/
  • The following will match any vowel:
[AEIOUaeiou]
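The two examples above can be tried together in a short sketch; the addresses and the word are made up for illustration.

```perl
use strict;
use warnings;

# Made-up IP addresses used only for illustration.
my @addresses = ('172.16.0.1', '192.168.1.10', '10.0.0.1', '182.1.1.1');

# ^1[79]2 : starts with 1, then either 7 or 9, then 2.
my @matches = grep { /^1[79]2/ } @addresses;
print "@matches\n";    # 172.16.0.1 192.168.1.10

# Count the vowels in a word by testing each character against a custom class.
# grep in scalar context returns the number of matching elements.
my $word   = 'Education';
my $vowels = grep { /[AEIOUaeiou]/ } split //, $word;
print "$vowels vowels\n";
```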

Matching ANY Single Character
The dot “.” is the universal single-character pattern: it matches any single character except the newline character “\n” (unless the /s modifier is used).

Example
The pattern /.ork/ will match substrings such as the following:

Work, work, fork, York, pork, 7ork
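This sketch (sample strings made up) shows that /.ork/ needs some character other than a newline immediately before “ork”, and that the /s modifier lifts the newline restriction.

```perl
use strict;
use warnings;

# Sample strings, made up for illustration.
my @candidates = ('Work', 'fork', '7ork', 'cork!', 'ork', "\nork");

# "ork" alone has nothing before it, and "." does not match "\n" by default.
my @matched = grep { /.ork/ } @candidates;
print "@matched\n";    # Work fork 7ork cork!

# With the /s modifier, the dot matches a newline as well.
my $with_s = "\nork" =~ /.ork/s ? 1 : 0;
print "with /s: $with_s\n";
```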

Pattern Multipliers
Some characters and constructs have special meanings in regular expressions: rather than matching a character themselves, they specify how many times the character (or pattern) just before them may repeat. They are called pattern multipliers (Perl's own documentation calls them quantifiers).

The following table lists the available multipliers and their uses. In each example, the multiplier applies only to the 3 just before it, not to 13.

Multiplier  Meaning                                                     Example     Matches
?           Zero or one occurrence of the preceding item                /13?/       1, 13
*           Zero or more occurrences of the preceding item              /13*/       1, 13, 133, 1333, …
+           One or more occurrences of the preceding item               /13+/       13, 133, 1333, 13333, …
{n}         Exactly n occurrences of the preceding item                 /13{4}/     13333
{n,}        n or more occurrences of the preceding item                 /13{4,}/    13333, 133333, 1333333, …
{n,m}       At least n and at most m occurrences of the preceding item  /13{2,5}/   133, 1333, 13333, 133333

The Matches column lists what each pattern matches as a whole string. Because the patterns are not anchored, they also match inside longer strings: /13?/ matches any string that contains a 1, and /13{4}/ matches 133333 because it contains 13333. Add ^ and $ (as in /^13{4}$/) to require the whole string to match.
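To check the table, the sketch below applies each multiplier, anchored with ^ and $, to the same set of sample strings, then shows one unanchored case for comparison.

```perl
use strict;
use warnings;

# Each example pattern from the table, anchored so the whole string must match.
my %pattern = (
    '?'     => qr/^13?$/,
    '*'     => qr/^13*$/,
    '+'     => qr/^13+$/,
    '{4}'   => qr/^13{4}$/,
    '{4,}'  => qr/^13{4,}$/,
    '{2,5}' => qr/^13{2,5}$/,
);

my @inputs = ('1', '13', '133', '13333', '1333333');

# For each multiplier, collect the inputs its anchored pattern accepts.
my %matches;
for my $m (sort keys %pattern) {
    $matches{$m} = join ' ', grep { $_ =~ $pattern{$m} } @inputs;
    printf "%-6s %s\n", $m, $matches{$m};
}

# Without anchors, /13{4}/ also matches inside a longer run of 3s.
my $unanchored = '133333' =~ /13{4}/ ? 1 : 0;
```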

Matching Word Boundaries
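To require a word boundary at a certain point in your pattern, use the \b anchor. A word boundary is a position that has a word character (\w) on one side and either a non-word character (\W) or the start or end of the string on the other. Conversely, \B matches any position that is not a word boundary.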

Examples

  • The following will match John, but not Johnson:
/John\b/
  • The following will match go, but not Congo:
/\bgo/
  • The following will match Zeid, but neither AbuZeid nor Zeidane:
/\bZeid\b/
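The word-boundary examples above can be verified with a short sketch; the sample strings are made up, and the last case shows \B matching where \b does not.

```perl
use strict;
use warnings;

# Each case pairs a pattern with a sample string.
my @cases = (
    [ qr/John\b/,   'Hello John!' ],   # match: "!" follows "John"
    [ qr/John\b/,   'Johnson'     ],   # no match: "s" follows "John"
    [ qr/\bgo/,     'go home'     ],   # match: "go" starts the string
    [ qr/\bgo/,     'Congo'       ],   # no match: "n" precedes "go"
    [ qr/\bZeid\b/, 'Mr Zeid'     ],   # match
    [ qr/\bZeid\b/, 'AbuZeid'     ],   # no match
    [ qr/\bZeid\b/, 'Zeidane'     ],   # no match
    [ qr/\Bgo/,     'Congo'       ],   # match: "go" is not at a boundary
);

my @results = map { $_->[1] =~ $_->[0] ? 1 : 0 } @cases;
print "@results\n";    # 1 0 1 0 1 0 0 1
```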

Summary
In this article, we continued with pattern matching. We learned how to use character classes to match a single character: a digit, a lowercase or uppercase letter, or whitespace. We also learned how to create custom character classes using square brackets, and how to use the dot “.” to match any single character (except the newline \n).

We also discussed pattern multipliers, which match a certain number of occurrences of a pattern. Finally, we covered the word boundary anchor \b, which matches a position at a word boundary.

In the next article, we will talk about memorizing matched patterns, another part of the regular expressions topic. See you.
