
Expression Matching


Original Message
Name: arunsista
Date: September 1, 2005 at 02:03:10 Pacific
Subject: Expression Matching
OS: Unix
CPU/Ram: P4, 1 GB RAM
Comment:


Hi All,

I have a question on how regular expressions are evaluated.

I have the following regular expression that I use in sed:

echo hello world abac hello world abac| sed 's/^\([a-zA-Z0-9 <:_=\."][a-zA-Z0-9 <:_=\."]*\)abac\(.*\)/$\1arun\2/g'

I could not quite figure out why the output of this program was:

$hello world abac hello world arun

Why wasn't the first instance of abac picked for replacement? Why did the second occurrence get matched first? If I remove the second occurrence of abac, I get the following output instead:

$hello world arun hello world

Can anyone explain the above behaviour?

Also, assuming I need to replace both occurrences while still requiring a particular sequence of characters before abac, as in the case above, how would I replace both occurrences of abac?





Response Number 1
Name: Jim Boothe
Date: September 5, 2005 at 20:14:55 Pacific
Reply:

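sed uses POSIX regular expressions: the match found is the leftmost-longest one, and each subexpression, working from left to right, also takes the longest string it can while the rest of the expression still matches. Your first group accepts letters and spaces, so it swallows "hello world abac hello world " and leaves only the last abac for the literal abac in the pattern. That explains the behaviour. When you remove the second abac, the first one is the only place where the group can stop.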
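A quick way to see this is to keep the same shape of command but print what each group captured instead of doing the substitution. This is only a sketch, not part of the original reply; it assumes a POSIX sed and trims the character class to letters, digits and space, which makes no difference for this input.

```shell
# Same shape as the original command (^ anchor, group 1, abac, trailing group),
# but the replacement shows group 1 and group 2 in brackets.
# Character class trimmed to [a-zA-Z0-9 ] for readability (assumption).
line='hello world abac hello world abac'
both=$(echo "$line" | sed 's/^\([a-zA-Z0-9 ][a-zA-Z0-9 ]*\)abac\(.*\)/[\1][\2]/')
echo "$both"    # [hello world abac hello world ][]

# With the second abac removed, group 1 can only stop before the first one.
line='hello world abac hello world'
one=$(echo "$line" | sed 's/^\([a-zA-Z0-9 ][a-zA-Z0-9 ]*\)abac\(.*\)/[\1][\2]/')
echo "$one"     # [hello world ][ hello world]
```

In the first case the trailing group is empty because group 1 has already taken everything up to the last abac.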

It's a little late, but tomorrow I will post a solution showing how to isolate and change both of those expressions on the same line.



Response Number 2
Name: Jim Boothe
Date: September 7, 2005 at 11:13:41 Pacific
Reply:

When you want to change a pattern and the line may contain two of these patterns one after the other, that is not a problem as long as the two back-to-back patterns are separately distinguishable. You would just include the /g flag (as you did) to make sed replace every match on the line instead of only the first one.
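For illustration, here is one way to make the matches separately distinguishable. It is a swapped-in variant, not the approach taken in this reply, and it assumes it is enough to check the single character immediately before abac. It drops the ^ anchor and the trailing \(.*\), so nothing swallows the rest of the line and /g can find each abac on its own. Note that the original ^-anchored pattern also required every character from the start of the line up to abac to be in the class; this sketch no longer checks that.

```shell
# Hypothetical variant: check only the one character before abac, and do not
# capture the rest of the line, so /g can replace every occurrence.
line='hello world abac hello world abac'
result=$(echo "$line" | sed 's/\([a-zA-Z0-9 ]\)abac/\1arun/g')
echo "$result"    # hello world arun hello world arun
```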

But in your case, the two back-to-back patterns taken together also happen to qualify as one even longer pattern, and sed finds that one longer match instead of two separate ones, because of the regexp rule of always matching the longest qualifying string. Since that single match runs to the end of the line, /g has nothing left to work on.

One way to handle that is to force sed to see both patterns by making it look for an expression such as <pattern><pattern>. But if a line does not contain <pattern><pattern>, the line would not get modified, so you also need your single-pattern command. The single-pattern command has to come last, so that the double-pattern command gets the first go at the line.
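To see why the order matters, here is a sketch that runs the same two commands as mysed.sh, but deliberately with the single-pattern command first, on the first test line from myfile. The greedy first group of the single-pattern command reaches past the first abac, so only the last abac is changed, and the double-pattern command then has just one abac left to look at.

```shell
# Wrong order on purpose: single-pattern command first, double-pattern second.
line='a b c abac a b c abac xyz'
wrong_order=$(echo "$line" | sed \
  -e 's/\([a-z ][a-z ]*\)abac\(.*\)/$\1arun\2/g' \
  -e 's/\([a-z ][a-z ]*\)abac\(.*\)\([a-z ][a-z ]*\)abac\(.*\)/A:\1arun\2B:\3arun\4/g')
echo "$wrong_order"    # $a b c abac a b c arun xyz
```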

But since the pattern allows most characters, the first pattern in the double-pattern expression grabs all that it can, qualifying the longest possible string. That leaves the shortest possible qualifying string for the second pattern, which in this case is just the single character preceding the second abac, since we insist on one or more characters at that point.
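The split can be checked by printing the four groups of the double-pattern expression in brackets instead of substituting. A sketch on the first test line from myfile, assuming a POSIX sed:

```shell
# Print each captured group of the double-pattern expression in brackets.
line='a b c abac a b c abac xyz'
groups=$(echo "$line" | sed 's/\([a-z ][a-z ]*\)abac\(.*\)\([a-z ][a-z ]*\)abac\(.*\)/[\1][\2][\3][\4]/')
echo "$groups"    # [a b c ][ a b c][ ][ xyz]
```

Group 3 is only the single space in front of the second abac, while group 2 has taken everything else between the two abacs.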

So the following code works, although you may not like which characters end up captured at the end of the first pattern and which at the start of the second. If so, you can make your patterns a bit more restrictive.

Since I had to code back-to-back patterns, to keep the code shorter, I used a much simpler expression, and changed the test data to all lowercase accordingly.

mysed.sh:
#!/bin/sh
# Try the two-abac pattern first, then fall back to the one-abac pattern.
sed \
-e 's/\([a-z ][a-z ]*\)abac\(.*\)\([a-z ][a-z ]*\)abac\(.*\)/A:\1arun\2B:\3arun\4/g' \
-e 's/\([a-z ][a-z ]*\)abac\(.*\)/$\1arun\2/g' \
myfile

myfile:
a b c abac a b c abac xyz
a b c abac xyz

./mysed.sh
A:a b c arun a b cB: arun xyz
$a b c arun xyz

