Getting Regular Expressions Right (Unix Power Tools, 3rd Edition)
32.16. Getting Regular Expressions Right
Writing regular expressions involves
more than learning the mechanics. You not only have to learn how to
describe patterns, but you also have to recognize the context in
which they appear. You have to be able to think through the level of
detail that is necessary in a regular expression, based on the
context in which the pattern will be applied.
The same thing that makes writing regular expressions difficult is
what makes writing them interesting: the variety of occurrences or
contexts in which a pattern appears. This complexity is inherent in
language itself, just as you can't always understand
an expression (Section 32.1) by looking up each word in the dictionary.
The process of writing a regular expression involves three steps:
1. Knowing what you want to match and how it might appear in the text.
2. Writing a pattern to describe what you want to match.
3. Testing the pattern to see what it matches.
This is virtually the same process that a programmer
follows to develop a program. Step 1 might be considered the
specification, which should reflect an understanding of the problem
to be solved as well as how to solve it. Step 2 is analogous to the
actual coding of the program, and step 3 involves running the program
and testing it against the specification. Steps 2 and 3 form a loop
that is repeated until the program works satisfactorily.
Testing your description of what you want to match ensures that the
description works as expected. It usually uncovers a few surprises.
Carefully examining the results of a test, comparing the output
against the input, will greatly improve your understanding of regular
expressions. You might consider evaluating the results of a
pattern-matching operation as follows:
Hits
    The lines that I wanted to match.
Misses
    The lines that I didn't want to match.
Misses that should be hits
    The lines that I didn't match but wanted to match.
Hits that should be misses
    The lines that I matched but didn't want to match.
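The four categories above can be sketched as a small test harness. This is only an illustration in Python, not something from the original text: the function name `classify`, the sample lines, and the True/False "wanted" labels are all hypothetical, and Python's `re` module stands in for whatever tool you are actually testing with.

```python
import re

def classify(pattern, labeled_lines):
    """Sort lines into the four outcome groups.

    labeled_lines is a list of (line, wanted) pairs, where wanted is True
    when the line is one we intend the pattern to match.
    """
    regex = re.compile(pattern)
    groups = {
        "hits": [],
        "misses": [],
        "misses that should be hits": [],
        "hits that should be misses": [],
    }
    for line, wanted in labeled_lines:
        matched = regex.search(line) is not None
        if matched and wanted:
            key = "hits"
        elif not matched and not wanted:
            key = "misses"
        elif wanted:
            key = "misses that should be hits"
        else:
            key = "hits that should be misses"
        groups[key].append(line)
    return groups

# Hypothetical sample input: each line is labeled by hand with whether
# we want it matched.
sample = [
    ("what is this?", True),
    ("What is that?", True),
    ("a pile of wood", False),
    ("somewhat odd", False),
]

result = classify("what", sample)
for name, lines in result.items():
    print(name, lines)
```

With the plain pattern `what`, one line lands in each group, which makes it easy to see which end of the pattern needs work: the capitalized line is a miss that should be a hit, and "somewhat odd" is a hit that should be a miss.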
Trying to perfect your description of a pattern is something that you
work at from opposite ends: you try to eliminate the
"hits that should be misses" by
limiting the possible matches, and you try to capture the
"misses that should be hits" by
expanding the possible matches.
The difficulty is especially apparent when you must describe patterns
using fixed strings. Each character you remove from the fixed-string
pattern increases the number of possible matches. For instance, while
searching for the string "what", you determine that
you'd like to match "What" as well.
The only fixed-string pattern that will match both "What"
and "what" is "hat", the longest
string common to both. It is obvious, though, that searching for
"hat" will produce unwanted matches. Each character
you add to a fixed-string pattern decreases the number of possible
matches. The string "them" is going to produce fewer
matches than the string "the".
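The effect of shortening or lengthening a fixed string can be checked with a few lines of Python. This is a minimal sketch, assuming a made-up word list and a hypothetical helper `fixed_matches` that does a plain substring test, roughly what a fixed-string search does on each line.

```python
def fixed_matches(fixed, candidates):
    """Return the candidates that contain the fixed string as a substring."""
    return [c for c in candidates if fixed in c]

# Hypothetical sample words, chosen to show the unwanted matches.
words = ["what", "What", "hat", "that", "chat", "whatever"]

with_what = fixed_matches("what", words)  # misses "What"
with_hat = fixed_matches("hat", words)    # catches "What", plus much more

more_words = ["the", "them", "then", "other", "theme"]
with_the = fixed_matches("the", more_words)
with_them = fixed_matches("them", more_words)

print(with_what)
print(with_hat)
print(len(with_the), len(with_them))
```

Dropping the leading "w" picks up "What" but also every other word in the list, while adding the "m" to "the" cuts the matches down to the words that contain "them".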
Using metacharacters in patterns provides greater flexibility in
extending or narrowing the range of matches. Metacharacters, used in
combination with literals or other metacharacters, can be used to
expand the range of matches while still eliminating the matches that
you do not want.
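As an illustration of this, a bracket expression can accept either case of the first letter without giving up the letter itself. The sketch below uses Python's `re` module and a made-up word list. The `\b` word-boundary anchor is Python's spelling, and other tools may write word boundaries differently, so treat the second pattern as an assumption about the tool rather than portable syntax.

```python
import re

# Hypothetical sample words, including the unwanted matches for "hat".
words = ["what", "What", "hat", "that", "chat", "whatever"]

def regex_matches(pattern, candidates):
    """Return the candidates in which the pattern is found anywhere."""
    regex = re.compile(pattern)
    return [c for c in candidates if regex.search(c)]

# [Ww] matches either case of the first letter, so "hat", "that", and
# "chat" are no longer matched.
either_case = regex_matches(r"[Ww]hat", words)

# Adding word boundaries (Python syntax) also excludes "whatever".
whole_word = regex_matches(r"\b[Ww]hat\b", words)

print(either_case)
print(whole_word)
```

The first pattern expands the match to cover "What" while still excluding the words that merely contain "hat"; the second narrows it further to the whole word.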
-- DD
Copyright © 2003 O'Reilly & Associates. All rights reserved.