Sam's Blog

Did you mean +, not *, in that regexp?

Date: Wednesday, 28 April 2010, 13:36.

Categories: perl, ironman, regexp, craft, basic, tutorial.

Part of the series: Better Regexps.

Continuing from my previous article, "Anchoring Regexps", another common regexp mistake I see is the use of * where the author really meant +.

So today I cover + and *: what's the difference and why does it matter?

Both + and * are repeat operators: they say that the immediately preceding atomic pattern should be repeated a certain number of times.

In the case of + it means "one or more times", and in the case of * it means "zero or more times".

For example:

Code:
/a*/  #  zero or more repeats of 'a'
/a+/  #  one or more repeats of 'a'

/(123)*/  #  zero or more repeats of '123'
/(123)+/  #  one or more repeats of '123'
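To see the difference in practice, here is a minimal sketch that runs both operators against a few sample strings. The helper name star_and_plus and the inputs are made up for illustration, and the patterns are anchored with ^ and $ so that the whole string has to match:

```perl
use strict;
use warnings;

# Hypothetical helper: report whether each anchored pattern matches.
sub star_and_plus {
    my ($input) = @_;
    my $star = $input =~ /^a*$/ ? 1 : 0;    # zero or more 'a'
    my $plus = $input =~ /^a+$/ ? 1 : 0;    # one or more 'a'
    return ( $star, $plus );
}

# Sample inputs chosen only for illustration.
for my $input ( '', 'a', 'aaa', 'b' ) {
    my ( $star, $plus ) = star_and_plus($input);
    printf "%-6s a*: %d  a+: %d\n", "'$input'", $star, $plus;
}
```

Only the empty string separates the two: a* accepts it and a+ rejects it. Both accept 'a' and 'aaa', and both reject 'b'.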

You'll often see * paired with the . "match any character" token (any character except a newline, by default), as .* to give wildcard "match anything" behaviour, often to mark a capture target:

/^Version: (.*)$/

And it's here that we see the first problems.

Is that really what they meant to do?

* means zero or more, so it also matches zero repeats: the pattern will match a line that has a blank version string, such as "Version: " with nothing after it.

Now it's perfectly possible that this is the behaviour you want: you might have an explicit check for $1 eq '' to complain that there's a malformed version line.

Most probably though, the code assumes that $1 actually contains some information and the empty-string case is an unwanted side-effect of using * when they really meant +.
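The empty-string case is easy to check for yourself. The following is a small sketch, where capture_version is a hypothetical helper and 'Version: ' is an assumed input with the label present but no value:

```perl
use strict;
use warnings;

# Hypothetical helper: return the captured text, or undef if no match.
sub capture_version {
    my ( $line, $regexp ) = @_;
    return $line =~ $regexp ? $1 : undef;
}

my $blank = 'Version: ';    # assumed input: label present, value missing

my $with_star = capture_version( $blank, qr/^Version: (.*)$/ );
my $with_plus = capture_version( $blank, qr/^Version: (.+)$/ );

if ( defined $with_star ) {
    print "(.*) matched and captured '$with_star'\n";
}
else {
    print "(.*) did not match\n";
}

if ( defined $with_plus ) {
    print "(.+) matched and captured '$with_plus'\n";
}
else {
    print "(.+) did not match\n";
}
```

With (.*) the match succeeds and the capture is an empty string. With (.+) the match fails outright, so a "did it match?" check catches the blank line.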

Somewhere further down the code they'll be proceeding on the assumption that they're working with a string with a real value, and they'll get an unwelcome surprise:

die "Malformed version line: '$line'"
    unless $line =~ /^Version: (.*)$/;
my $version = $1;

#  ... more code ...

#  Clean-up old version.
File::Path::remove_tree( "$projectdir/$version" );

Oh dear: with a blank $version, "$projectdir/$version" is just "$projectdir/", so we've nuked the entire project directory, taking every version with it.

So, next time you're writing a regular expression and your fingers stretch to put * after a pattern, ask yourself "Do I want to match nothing or do I really only want to match something?"

Nine times out of ten, you just want the something, and you should be typing + instead of *.
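Applied to the version-line example, the change might look something like the sketch below. The helper parse_version_line and the /tmp/example-project path are assumptions for illustration, and the script only prints what it would remove rather than calling remove_tree:

```perl
use strict;
use warnings;

# Hypothetical helper: return a non-empty version string, or die.
sub parse_version_line {
    my ($line) = @_;
    die "Malformed version line: '$line'"
        unless $line =~ /^Version: (.+)$/;    # + instead of *
    return $1;
}

my $projectdir = '/tmp/example-project';    # assumed path, illustration only

for my $line ( 'Version: 1.2.3', 'Version: ' ) {
    # eval traps the die so the loop can report each line in turn.
    my $version = eval { parse_version_line($line) };
    if ( defined $version ) {
        print "Would remove: $projectdir/$version\n";
    }
    else {
        print "Rejected: $@";
    }
}
```

The blank line now dies at the check instead of slipping through to the clean-up step. Bear in mind that + only guarantees at least one character: a line with a stray extra space after "Version: " would still capture that single space, so a stricter pattern may be worth considering if your version strings have a known format.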

This blog entry is part of the series: Better Regexps.

  1. Anchoring Regexps
  2. Did you mean +, not *, in that regexp?
  3. Readable Regexps: Why you should use /x

© 2009-2013 Sam Graham, unless otherwise noted. All rights reserved.