
Tuesday, August 04, 2015

Regular Expressions in Linux


What Are Regular Expressions?

A regular expression is a pattern template that you define and that a Linux utility uses to filter text. A Linux utility (such as the sed editor or the awk program) matches the regular expression pattern against data as that data flows into the utility. If the data matches the pattern, it's accepted for processing.

If the data doesn't match the pattern, it's rejected. The regular expression pattern makes use of wildcard characters to represent one or more characters in the data stream.

Types of regular expressions:
There are two popular regular expression engines:

The POSIX Basic Regular Expression (BRE) engine
The POSIX Extended Regular Expression (ERE) engine

A.     Defining BRE Patterns:
The most basic BRE pattern is matching text characters in a data stream.
Example 1: Plain text
  
$ echo "This is a test" | sed -n '/test/p'
This is a test

$ echo "This is a test" | sed -n '/trial/p'
$

$ echo "This is a test" | awk '/test/{print $0}'
This is a test

$ echo "This is a test" | awk '/trial/{print $0}'
$
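
Two properties of plain-text patterns are easy to miss: matching is case sensitive, and the pattern may match anywhere in the line, even inside a longer word. The following is a minimal sketch of both; the variable names and the second sample sentence are only illustrative.

```shell
# Matching is case sensitive: "this" does not match "This".
lower=$(echo "This is a test" | sed -n '/this/p')
upper=$(echo "This is a test" | sed -n '/This/p')

# The pattern can match inside a longer word ("book" inside "books").
inside=$(echo "The books are expensive" | sed -n '/book/p')

echo "lower=[$lower]"
echo "upper=[$upper]"
echo "inside=[$inside]"
```

Only the second and third commands should produce a line; the first one prints nothing between the brackets.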


Example 2: Special characters
  
Regular expressions assign special meaning to the following characters (some of them, such as + ? | ( ), are special only in ERE patterns):

. * [ ] ^ $ { } \ + ? | ( )

For example, if you want to search for a dollar sign in your text, just precede it with a backslash character:

$ cat data2
The cost is $4.00

$ sed -n '/\$/p' data2
The cost is $4.00
$
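
The same escaping rule applies to the other special characters. As a small sketch with made-up sample data, compare an unescaped dot, which matches any single character, with an escaped dot, which matches only a literal period:

```shell
# Unescaped, the dot matches any single character, so "3x14" also passes.
loose=$(printf '3.14\n3x14\n' | sed -n '/3.14/p')

# Escaped with a backslash, the dot matches only a literal period.
strict=$(printf '3.14\n3x14\n' | sed -n '/3\.14/p')

echo "loose:"
echo "$loose"
echo "strict:"
echo "$strict"
```

The loose pattern lets both lines through, while the strict pattern keeps only 3.14.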


Example 3: Looking for the ending
  
The dollar sign ($) special character defines the end anchor. Placed at the end of a pattern, it requires the match to occur at the end of the line.


$ echo "This is a good book" | sed -n '/book$/p'
This is a good book

$ echo "This book is good" | sed -n '/book$/p'
$
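
The end anchor has a counterpart, the caret (^), which anchors a pattern to the start of the line; the range example in this post uses both. Here is a rough sketch of the three combinations, run against a few sample lines chosen just for illustration:

```shell
lines='This is a good book
This book is good
book'

# "book" must be the last thing on the line.
at_end=$(printf '%s\n' "$lines" | sed -n '/book$/p')

# "book" must be the first thing on the line.
at_start=$(printf '%s\n' "$lines" | sed -n '/^book/p')

# Both anchors together: the line must be exactly "book".
whole=$(printf '%s\n' "$lines" | sed -n '/^book$/p')

printf 'at_end:\n%s\n' "$at_end"
printf 'at_start:\n%s\n' "$at_start"
printf 'whole:\n%s\n' "$whole"
```

The end-anchored pattern keeps the first and third lines; the other two patterns keep only the line that consists of the single word.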


Example 4: Using ranges
  
You can use a range of characters within a character class by using the dash symbol. For example, [0-9] matches any single digit.
A five-digit zip code pattern can therefore be written as five digit ranges between the start (^) and end ($) anchors:

$ sed -n '/^[0-9][0-9][0-9][0-9][0-9]$/p' data8
60633
46201
45902
$
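
The contents of data8 are not shown above, so here is a self-contained sketch of the same pattern that feeds in its own sample values (these values are an assumption, not the actual file):

```shell
# Sample input: three valid five-digit codes, one six-digit and one
# four-digit value that the anchored pattern should reject.
zips=$(printf '60633\n46201\n223001\n4353\n22203\n' |
  sed -n '/^[0-9][0-9][0-9][0-9][0-9]$/p')

echo "$zips"
```

Because the pattern is anchored at both ends, 223001 and 4353 are filtered out even though they contain digits.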


B.     Extended Regular Expressions:

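The POSIX ERE patterns include a few additional symbols that are used by some Linux applications and utilities. The awk program recognizes the ERE patterns. The sed editor uses BRE patterns by default and accepts ERE patterns only when invoked with the -E option (supported by GNU sed).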
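
To see the difference between the two engines, the sketch below runs the same pattern through sed in its default BRE mode, through sed with the -E option, and through awk. It assumes a sed that supports -E, as GNU sed does.

```shell
# BRE (sed default): "+" is an ordinary character, so "beet" does not match...
bre=$(echo "beet" | sed -n '/be+t/p')
# ...but a literal "be+t" does.
bre_literal=$(echo "be+t" | sed -n '/be+t/p')

# ERE (sed -E, assumed available): "+" means "one or more of the preceding".
ere=$(echo "beet" | sed -E -n '/be+t/p')

# awk uses ERE patterns without any extra option.
awk_ere=$(echo "beet" | awk '/be+t/{print $0}')

echo "bre=[$bre] bre_literal=[$bre_literal] ere=[$ere] awk_ere=[$awk_ere]"
```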


Example 1: The question mark

The question mark indicates that the preceding character can appear zero or one time, but that's all. It doesn't match repeating occurrences of the character:

$ echo "bt" | awk '/be?t/{print $0}'
bt

$ echo "bet" | awk '/be?t/{print $0}'
bet

$ echo "beet" | awk '/be?t/{print $0}'
$

$ echo "beeet" | awk '/be?t/{print $0}'
$
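
The question mark is handy for optional letters in spelling variants, and it can also follow a character class. A short sketch with illustrative word lists:

```shell
# Optional "u": matches both spellings, but not a doubled "u".
spellings=$(printf 'color\ncolour\ncolouur\n' | awk '/^colou?r$/{print $0}')

# Applied to a character class: zero or one of "a" or "e" between b and t.
words=$(printf 'bt\nbat\nbet\nbaet\n' | awk '/^b[ae]?t$/{print $0}')

printf 'spellings:\n%s\n' "$spellings"
printf 'words:\n%s\n' "$words"
```

The anchors are there so that each word has to match as a whole; colouur and baet are rejected because they contain two optional characters.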


Example 2: The plus sign

The plus sign indicates that the preceding character can appear one or more times, but must be present at least once. The pattern doesn't match if the character is not present:

$ echo "beeet" | awk '/be+t/{print $0}'
beeet

$ echo "beet" | awk '/be+t/{print $0}'
beet

$ echo "bet" | awk '/be+t/{print $0}'
bet

$ echo "bt" | awk '/be+t/{print $0}'
$
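
Like the question mark, the plus sign can follow a character class, in which case any mix of the listed characters may repeat. A brief sketch on a made-up word list:

```shell
# One or more characters from the set {a, e} between "b" and "t".
matches=$(printf 'bt\nbat\nbeat\nbeet\nbot\n' | awk '/b[ae]+t/{print $0}')

echo "$matches"
```

Here bt fails because nothing sits between the two letters, and bot fails because "o" is not in the class.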


Example 3: The pipe symbol

The pipe symbol allows you to specify two or more patterns that the regular expression engine uses in a logical OR formula when examining the data stream. If any of the patterns match the data stream text, the text passes. If none of the patterns match, the data stream text fails.

The format for using the pipe symbol is:

expr1|expr2|...

Here's an example of this:

$ echo "The cat is asleep" | awk '/cat|dog/{print $0}'
The cat is asleep

$ echo "The dog is asleep" | awk '/cat|dog/{print $0}'
The dog is asleep

$ echo "The sheep is asleep" | awk '/cat|dog/{print $0}'
$
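
Each alternative is a complete pattern of its own, so it can carry its own anchors, and any space typed around the pipe becomes part of the pattern. The sketch below illustrates both points; the log lines are hypothetical sample data.

```shell
log='error: disk full
warning: low memory
info: started'

# Each alternative carries its own start anchor.
problems=$(printf '%s\n' "$log" | awk '/^error|^warning/{print $0}')

# Spaces around the pipe are literal: this looks for "cat " or " dog",
# and "The cat" contains neither.
spaced=$(echo "The cat" | awk '/cat | dog/{print $0}')

printf 'problems:\n%s\n' "$problems"
echo "spaced=[$spaced]"
```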


Example 4: Grouping expressions

Parentheses group part of a regular expression pattern. When you group a pattern, the group is treated like a standard character. You can apply a special character to the group just as you would to a regular character.

For example:

$ echo "Sat" | awk '/Sat(urday)?/{print $0}'
Sat

$ echo "Saturday" | awk '/Sat(urday)?/{print $0}'
Saturday

$
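
Grouping also combines with the pipe symbol to build sets of alternatives inside a larger pattern. As a sketch on an illustrative word list:

```shell
# First letter is c or b, then "a", then last letter is b or t.
matches=$(printf 'cat\ncab\nbat\nbab\ntab\ntac\n' |
  awk '/^(c|b)a(b|t)$/{print $0}')

echo "$matches"
```

The words tab and tac are rejected because their first letter is in neither alternative of the first group.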


