Sam's Blog
Did you mean +, not *, in that regexp?
Date: Wednesday, 28 April 2010, 13:36.
Categories: perl, ironman, regexp, craft, basic, tutorial.
Part of the series: Better Regexps.
Continuing from my previous article "Anchoring Regexps", another common regexp mistake I see is the use of * where the author really meant +.
So today I cover + and *: what's the difference and why does it matter?
Both + and * are repeat operators: they say that the immediately preceding atomic pattern (such as a single character or a parenthesised group) should be repeated a certain number of times.
In the case of + it means "one or more times", and in the case of * it means "zero or more times".
For example:
/a*/ # zero or more repeats of 'a'
/a+/ # one or more repeats of 'a'
/(123)*/ # zero or more repeats of '123'
/(123)+/ # one or more repeats of '123'
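A quick way to see the difference is to run each of those patterns against a few strings. The following Perl sketch uses a small hypothetical helper, matches(), that just reports whether a pattern matched. The starred patterns /a*/ and /(123)*/ report a match on every string, even "bbb", because zero repeats is always an option; the + versions only succeed when at least one repeat is actually present.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if $string matches the compiled regexp $re, 0 otherwise.
sub matches {
    my ($re, $string) = @_;
    return $string =~ $re ? 1 : 0;
}

# Try each repeat operator against strings with and without the repeated atom.
for my $string ('bbb', 'aaa', '123123') {
    printf "%-7s a*:%d a+:%d (123)*:%d (123)+:%d\n",
        $string,
        matches(qr/a*/,     $string),
        matches(qr/a+/,     $string),
        matches(qr/(123)*/, $string),
        matches(qr/(123)+/, $string);
}
```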
You'll particularly often see * paired with the . "match any character" token, as .*, to give a wild-card "match anything" behaviour, often to mark a capture target:
/^Version: (.*)$/
And it's here that we see the first problems.
Is that really what they meant to do?
* means zero or more, which means it also matches zero repeats: it will happily match a line with a blank version string, such as "Version: " with nothing after the space.
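To make that concrete, here is a minimal sketch using the same pattern; capture_version() is a hypothetical wrapper that returns $1 on a match and undef otherwise. A line consisting of "Version: " and nothing else still matches, and the captured version is the empty string.

```perl
use strict;
use warnings;

# Hypothetical wrapper: returns the captured version, or undef if the line doesn't match.
sub capture_version {
    my ($line) = @_;
    return $line =~ /^Version: (.*)$/ ? $1 : undef;
}

for my $line ('Version: 1.2', 'Version: ') {
    my $v = capture_version($line);
    print defined $v ? "'$line' matched, version is '$v'\n" : "'$line' did not match\n";
}
```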
Now, it's perfectly possible that this is the behaviour you want: you might have an explicit check for $1 eq '' to complain that there's a malformed version line.
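If an empty version is something you genuinely want to detect, one way to write that explicit check is sketched below. parse_version() and its error messages are hypothetical; the point is that .* is used deliberately and the empty case is then rejected by hand, so the caller gets a more specific error than "didn't match".

```perl
use strict;
use warnings;

# Hypothetical parser: .* is used on purpose, and the empty case is checked explicitly.
sub parse_version {
    my ($line) = @_;
    die "Not a version line: '$line'\n"
        unless $line =~ /^Version: (.*)$/;
    my $version = $1;
    die "Empty version in line: '$line'\n" if $version eq '';
    return $version;
}

for my $line ('Version: 2.0', 'Version: ') {
    my $version = eval { parse_version($line) };
    print defined $version ? "ok: $version\n" : "error: $@";
}
```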
Most probably, though, the code assumes that $1 actually contains some information, and the empty-string case is an unwanted side-effect of using * when the author really meant +.
Somewhere further down, the code will be proceeding on the assumption that it's working with a string with a real value, and it will get an unwelcome surprise:
use File::Path ();    # provides File::Path::remove_tree
die "Malformed version line: '$line'"
    unless $line =~ /^Version: (.*)$/;
my $version = $1;
# ... more code ...
# Clean up the old version.
File::Path::remove_tree( "$projectdir/$version" );
Oh dear: with a blank $version, the path "$projectdir/$version" interpolates to just "$projectdir/", so our remove_tree() call just nuked the entire project directory, taking every version with it.
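The sketch below builds the path the same way as the snippet above, but only reports it instead of deleting anything; $projectdir's value and the version_path() helper are made up for illustration. With .* the line "Version: " yields "/srv/myproject/", which is the project directory itself, while with .+ the line is rejected before any path is built.

```perl
use strict;
use warnings;

# Hypothetical example path; this sketch never deletes anything.
my $projectdir = '/srv/myproject';

# Returns the directory the original snippet would pass to remove_tree,
# or undef if the pattern rejects the line.
sub version_path {
    my ($line, $re) = @_;
    return unless $line =~ $re;
    return "$projectdir/$1";
}

my %patterns = (
    '.*' => qr/^Version: (.*)$/,
    '.+' => qr/^Version: (.+)$/,
);

for my $label (sort keys %patterns) {
    my $path = version_path('Version: ', $patterns{$label});
    print "$label: ", (defined $path ? "would remove '$path'" : 'line rejected'), "\n";
}
```

One caveat: .+ still accepts a version made only of whitespace, such as "Version:" followed by two spaces. If that matters for your data, a stricter pattern such as \S+ is one option.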
So, next time you're writing a regular expression and your fingers stretch to put * after a pattern, ask yourself: "Do I want to match nothing, or do I really only want to match something?"
Nine times out of ten, you just want the something, and you should be typing + instead of *.
This blog entry is part of the series: Better Regexps.
- Anchoring Regexps
- Did you mean +, not *, in that regexp?
- Readable Regexps: Why you should use /x