Sam's Blog
Advanced Benchmark Analysis II: Probing strengths and weaknesses
Date: Tuesday, 9 March 2010, 13:39.
Categories: perl, ironman, benchmarking, analysis, trim, regexp, optimization, advanced, tutorial.
Part of the series: Explaining Benchmark Analysis.
In my previous blog entry, "Advanced Benchmark Analysis I: Yet more white-space trimming", I left you with the thought that our benchmarks changed with changing input.
This article shows you how to analyze those changes and how to draw conclusions from them.
We've reached the stage where we're convinced that all the implementations we've covered so far really are "doing the same thing", or at least we're convinced that they're "doing sufficiently the same thing for our purposes", and we can now return to the performance discrepency that Aaron noticed.
I mentioned before that he'd changed the input, and this was one of the concluding points in both my previous blog entries, so lets investigate further.
First of all, we're going to put our venerable one-liner through the wrangler again.
perl -MBenchmark -MString::Strip=StripLTSpace -e '
my %tests = (
SSS => ( " " x 10 ) . "some text" . ( " " x 10 ),
LSL => ( " " x 400 ) . "some text" . ( " " x 400 ),
SLS => ( " " x 10 ) . ( "some text" x 400 ) . ( " " x 10 ),
LLL => ( " " x 400 ) . ( "some text" x 400 ) . ( " " x 400 ),
LSS => ( " " x 400 ) . "some text" . ( " " x 10 ),
SSL => ( " " x 10 ) . "some text" . ( " " x 400 ),
NSN => "some text",
NLN => ( "some text" x 400 ),
);
foreach my $test ( qw/SSS LSL SLS LLL LSS SSL NSN NLN/ )
{
my $s = $tests{ $test };
my $t;
my %h = (
CTRL => sub { $t = $s; $t; },
TWO => sub { $t = $s; $t =~ s/^\s+//; $t =~ s/\s+$//; $t; },
CAPTURE => sub { $t = $s; $t =~ s/^\s*(.*?)\s*$/$1/s; $t; },
G_OR => sub { $t = $s; $t =~ s/(^\s+|\s+$)//g; $t; },
NG_OR => sub { $t = $s; $t =~ s/(?:^\s+|\s+$)//g; $t; },
UG_OR => sub { $t = $s; $t =~ s/^\s+|\s+$//g; $t; },
STRIP => sub { $t = $s; StripLTSpace( $t ); $t; },
SUBSTR => sub { $t = $s;
( $t =~ /^\s+/ ) && ( substr( $t, 0, $+[ 0 ] ) = "" );
( $t =~ /\s+$/ ) && ( substr( $t, $-[ 0 ] ) = "" );
$t; },
);
print "\nBenchmark for: $test\n";
Benchmark::cmpthese( -5, \%h );
}'
This time we've wrapped it in a foreach loop running across several different formats of sample data, each with a cryptic acronym for a name.
The acronym isn't really that cryptic, S stands for Short, L stands for Long, N stands for None, and each letter represents a portion of the sample string in turn.
So SSS means short-short-short, which means a short section of white-space, followed by a short section of text, followed by another short section of whitespace.
LSL, on the other hand, means a long-short-long, which is a large chunk of whitespace, a short bit of text and another large chunk of whitespace.
NSN and NLN mean that there's no white-space to trim, and it's just a short or long block of text.
For the sake of brevity, and our sanity, we've not included all possible permutations, but there's no reason you couldn't add them if you wanted.
We've also dropped the bit that outputs the result of each code snippet, we're fairly confident by now that our code snippets are working.
Still with me? Now we run this, gleeful cackle, one-liner once more.
If you find your eyes glazing over, just skip the wall of white and read my analysis below.
Benchmark for: SSS Rate G_OR SUBSTR CAPTURE UG_OR NG_OR TWO STRIP CTRL G_OR 210779/s -- -4% -17% -20% -22% -67% -83% -96% SUBSTR 219299/s 4% -- -14% -17% -19% -66% -82% -96% CAPTURE 254485/s 21% 16% -- -3% -6% -60% -79% -95% UG_OR 263472/s 25% 20% 4% -- -3% -59% -79% -95% NG_OR 270489/s 28% 23% 6% 3% -- -58% -78% -95% TWO 636480/s 202% 190% 150% 142% 135% -- -49% -88% STRIP 1239868/s 488% 465% 387% 371% 358% 95% -- -76% CTRL 5236399/s 2384% 2288% 1958% 1887% 1836% 723% 322% -- Benchmark for: LSL Rate G_OR SUBSTR CAPTURE NG_OR UG_OR TWO STRIP CTRL G_OR 151246/s -- -9% -10% -22% -23% -55% -58% -90% SUBSTR 165814/s 10% -- -1% -15% -16% -51% -54% -89% CAPTURE 167326/s 11% 1% -- -14% -15% -50% -54% -89% NG_OR 194349/s 28% 17% 16% -- -2% -42% -46% -88% UG_OR 197605/s 31% 19% 18% 2% -- -41% -46% -87% TWO 337176/s 123% 103% 102% 73% 71% -- -7% -79% STRIP 362669/s 140% 119% 117% 87% 84% 8% -- -77% CTRL 1571820/s 939% 848% 839% 709% 695% 366% 333% -- Benchmark for: SLS Rate G_OR UG_OR NG_OR CAPTURE SUBSTR TWO STRIP CTRL G_OR 1021/s -- -4% -4% -70% -88% -90% -99% -100% UG_OR 1062/s 4% -- -1% -69% -88% -89% -99% -100% NG_OR 1069/s 5% 1% -- -68% -87% -89% -99% -100% CAPTURE 3380/s 231% 218% 216% -- -60% -66% -98% -100% SUBSTR 8525/s 735% 703% 698% 152% -- -15% -94% -100% TWO 10003/s 879% 842% 836% 196% 17% -- -93% -100% STRIP 142274/s 13829% 13297% 13212% 4110% 1569% 1322% -- -93% CTRL 2055605/s 201148% 193459% 192231% 60720% 24013% 20450% 1345% -- Benchmark for: LLL Rate G_OR UG_OR NG_OR CAPTURE SUBSTR TWO STRIP CTRL G_OR 1010/s -- -5% -6% -70% -88% -90% -99% -100% UG_OR 1062/s 5% -- -1% -69% -87% -89% -99% -100% NG_OR 1071/s 6% 1% -- -69% -87% -89% -99% -100% CAPTURE 3405/s 237% 221% 218% -- -59% -65% -97% -100% SUBSTR 8267/s 718% 678% 672% 143% -- -16% -93% -100% TWO 9867/s 877% 829% 821% 190% 19% -- -92% -99% STRIP 121492/s 11924% 11340% 11246% 3468% 1370% 1131% -- -93% CTRL 1758824/s 173969% 165514% 164155% 51550% 21175% 17725% 1348% -- Benchmark for: LSS Rate G_OR SUBSTR CAPTURE NG_OR UG_OR TWO STRIP CTRL G_OR 181016/s -- -8% -12% -24% -24% -60% -69% -92% SUBSTR 195724/s 8% -- -5% -17% -18% -57% -66% -92% CAPTURE 205592/s 14% 5% -- -13% -14% -55% -65% -91% NG_OR 236992/s 31% 21% 15% -- -0% -48% -59% -90% UG_OR 237885/s 31% 22% 16% 0% -- -48% -59% -90% TWO 454489/s 151% 132% 121% 92% 91% -- -22% -81% STRIP 579231/s 220% 196% 182% 144% 143% 27% -- -76% CTRL 2387847/s 1219% 1120% 1061% 908% 904% 425% 312% -- Benchmark for: SSL Rate G_OR SUBSTR CAPTURE UG_OR NG_OR TWO STRIP CTRL G_OR 186333/s -- -3% -12% -22% -23% -59% -67% -93% SUBSTR 192045/s 3% -- -10% -20% -20% -58% -66% -93% CAPTURE 212790/s 14% 11% -- -11% -12% -53% -62% -92% UG_OR 239237/s 28% 25% 12% -- -1% -48% -58% -91% NG_OR 241381/s 30% 26% 13% 1% -- -47% -57% -91% TWO 457507/s 146% 138% 115% 91% 90% -- -19% -82% STRIP 566512/s 204% 195% 166% 137% 135% 24% -- -78% CTRL 2586352/s 1288% 1247% 1115% 981% 971% 465% 357% -- Benchmark for: NSN Rate CAPTURE G_OR UG_OR NG_OR SUBSTR TWO STRIP CTRL CAPTURE 263012/s -- -17% -19% -20% -74% -74% -79% -94% G_OR 315185/s 20% -- -3% -4% -69% -69% -75% -92% UG_OR 325865/s 24% 3% -- -1% -68% -68% -74% -92% NG_OR 328327/s 25% 4% 1% -- -67% -67% -74% -92% SUBSTR 1006402/s 283% 219% 209% 207% -- -0% -20% -75% TWO 1006740/s 283% 219% 209% 207% 0% -- -20% -75% STRIP 1264629/s 381% 301% 288% 285% 26% 26% -- -69% CTRL 4063153/s 1445% 1189% 1147% 1138% 304% 304% 221% -- Benchmark for: NLN Rate G_OR UG_OR NG_OR CAPTURE SUBSTR TWO STRIP CTRL G_OR 1043/s -- -1% -3% -69% -89% -90% -99% -100% UG_OR 1050/s 1% -- -2% -69% -89% -90% -99% -100% NG_OR 1075/s 3% 2% -- -68% -89% -90% -99% -100% CAPTURE 3343/s 221% 218% 211% -- -66% -68% -98% -100% SUBSTR 9796/s 839% 833% 811% 193% -- -5% -93% -99% TWO 10346/s 892% 885% 863% 210% 6% -- -92% -99% STRIP 137321/s 13069% 12977% 12676% 4008% 1302% 1227% -- -90% CTRL 1356354/s 129975% 129065% 126095% 40476% 13745% 13009% 888% --
Well, isn't that a lot to take in at once?
Let's break it down into stages.
Firstly, we scan through the results looking at the influence of the control function, we see that for the most part it's around an order of magnitude faster than the fastest result, so we can largely dismiss it from our conclusions for those results.
In three cases, SSS, LSL and NSN, it's around 20% to 25% of the cost of the fastest result, in the first of those the next-fastest is half the speed, so it isn't going to distort the relative results all that much.
The LSL and NSN cases, however, have results that are all much closer and the proportional impact of the constant overhead is thus much higher. So we need to keep an eye on that while we're making observations: it may or may not be relevent, but we won't know until we make our observations!
So, lets make some observations.
We'll start by looking at the relative performance of the fastest performer, the String::Strip-driven STRIP benchmark.
Since we're interested in just the relative performance, we can trim our data down to just the relative percentages for the STRIP line, we can ignore the STRIP and CTRL columns, so I've left those out, which makes room for me to stick the name of the test on the front for clarity.
Note that this isn't a simple cut-n-paste, the order of the columns changes for each of the tables, so I've had to reorder the values in each row too.
Rate G_OR SUBSTR CAPTURE UG_OR NG_OR TWO SSS STRIP 1239868/s 488% 465% 387% 371% 358% 95% LSL STRIP 362669/s 140% 119% 117% 84% 87% 8% SLS STRIP 142274/s 13829% 1569% 4110% 13297% 13212% 1322% LLL STRIP 121492/s 11924% 1370% 3468% 11340% 11246% 1131% LSS STRIP 579231/s 220% 196% 182% 143% 144% 27% SSL STRIP 566512/s 204% 195% 166% 137% 135% 24% NSN STRIP 1264629/s 301% 26% 381% 288% 285% 26% NLN STRIP 137321/s 13069% 1302% 4008% 12977% 12676% 1227%
A quick glance at the Rate column shows us that the longer the input the slower the rate, this should be no surprise. Of note should be that increasing the length of the untrimmable section hurts performance more than increasing the amount of white-space to trim, this also isn't too surprising - we have to copy the untrimmed bit around, whereas we throw the white-space away.
When we look at how this technique compares to others though, we get some surprises.
If we look at the next-fastest method, over on the far-right in the TWO column, we see some wildly varying figures.
Looking at the first two lines, SSS and LSL, we can see that when trimming short amounts of white-space from short strings STRIP runs 95% faster than TWO. But when the same text is surrounded by large amounts of white-space the performance gap narrows to a tiny 8%.
Given that 8% is such a small gap, we need to look back at our control for LSL to see if that has an impact. Sure enough LSL was one of the ones we were to keep an eye on, with around a 20% overhead impact on the results of STRIP. This makes that 8% somewhat unreliable as a hard figure, but whatever the true figure is, we know that the performance of the two are vastly closer than for SSS.
We also know, if we've run these benchmarks more than once, that we're getting around 10% random variation in results because we're only running the benchmarks for 3 CPU seconds. A 10% error range on each gives a 20% error range on comparing them, which means that our 8% unreliable figure is swallowed by the Random Noise Monster, leaving us uncertain even which is the faster of the two.
We don't actually need to know which is faster though, because the important bit of information to take away from this is that STRIP's performance degrades faster with additional white-space than TWO's does, almost certainly to the point where TWO becomes faster than STRIP, even if we can't tell whether we've reached that point in this particular benchmark.
To sanity-check that conclusion, we should look at the SLS and LLL results. Sure enough, there we see the gap narrowing too, although the gap is so large to begin with that the narrowing is fairly insignificant.
We can chalk that up as a solid observation. Note that we don't say that it's an important observation, we'll come back to that point later.
OK, we've got an idea of how the relative performance changes when fiddling with the amount of white-space, what happens when we change the length of the untrimmable section instead?
The lines we want to compare this time are SSS vs SLS and LSL vs LLL.
Rate G_OR SUBSTR CAPTURE UG_OR NG_OR TWO SSS STRIP 1239868/s 488% 465% 387% 371% 358% 95% SLS STRIP 142274/s 13829% 1569% 4110% 13297% 13212% 1322% LSL STRIP 362669/s 140% 119% 117% 84% 87% 8% LLL STRIP 121492/s 11924% 1370% 3468% 11340% 11246% 1131%
SSS vs SLS shows us a change from 95% to 1322%. Youch! And we thought a 95% to 8% slide was bad... This shows us that the performance advantage of STRIP over TWO widens massively when the untrimmable section of the text increases in size. The difference is sufficiently large that we don't really need to look at our error rates, it isn't significantly going to change the story.
As before though, we sanity check our observation by seeing if there's also the same result when comparing LSL vs LLL. Here it's even more of a white-wash, we're moving from that dodgy 8% figure, way up to 1131%.
So, we have our second observation: STRIP degrades much less rapidly than TWO as the untrimmable body of the string increases in size and, since STRIP starts faster anyway, this only consolidates its advantage in this situation.
This very rapid decline does raise one issue though. It means that as the length of the untrimmable string decreases the performance gap very rapidly narrows. Now we can't get much shorter than "some text", but at that startling rate of change, it's conceivable that the 95% gap for SSS could be closed entirely on strings that are even shorter.
We'd have to investigate further to determine if that was the case, so that's our third observation: if we were interested in very short strings, we'd want to investigate this edge-case further.
A quick scan of STRIP against the other techniques shows that the same trends hold true against all the regexp methods. The fact that we're seeing the same relationship regardless of the regexp being used could indicate that something in the way String::Strip is handling large amounts of white-space isn't scaling as well as however the regexp engine is doing it, so we mark that up as another observation that could bear further investigation.
Next up, we take a glance at the LSS and SSL figures, this is a short bit of text with a long white-space on one side but not the other.
Rate G_OR SUBSTR CAPTURE UG_OR NG_OR TWO LSS STRIP 579231/s 220% 196% 182% 143% 144% 27% SSL STRIP 566512/s 204% 195% 166% 137% 135% 24%
Comparing these two lets us see if there's some asymmetric behaviour going on, it's really just a check to see if there's anything strange happening, and we can happily see that the results match within the range of our error: trimming off the front is no different to trimming off the back for all methods, as far as the accuracy of our benchmarks can tell.
Last up for analysing STRIP is to take a look at the NSN and NLN cases, these are the "do nothing" cases, we're trying to trim a string that has nothing to trim.
Rate G_OR SUBSTR CAPTURE UG_OR NG_OR TWO NSN STRIP 1264629/s 301% 26% 381% 288% 285% 26% NLN STRIP 137321/s 13069% 1302% 4008% 12977% 12676% 1227%
We can see here that even though we're leaving the string untouched, we're still taking a performance hit when the string is long, this is probably because we're still copying the string around. Unfortunate, but it doesn't appear to change the relative performance of STRIP significantly, since most of the other methods take the hit worse.
OK, we've analyzed the relative performance of STRIP against the others, now we just need to do the same thing for each other technique!
Don't worry, I'll skip over explaining those in detail and just point out the observations directly:
In the short "do nothing" case NSN: TWO, SUBSTR and STRIP all show very similar performance.
CAPTURE appears to scale as badly with increasing the length of the untrimmable text as TWO does.
SUBSTR scales slightly better than CAPTURE and TWO with the length of the untrimmable text, but still worse than STRIP.
All the OR variants (G_OR, UG_OR and NG_OR) show absolutely dire performance scaling with the size of the untrimmable text. The performance drop-off of these is nothing short of punishing if you'd made the mistake of using them on large text.
And we already observed that:
STRIP degrades faster than TWO as you increase the amount of white-space to trim. However, STRIP starts with such an advantage that the break-even point appears to be somewhere in the region of 800-characters of white-space to trim.
STRIP degrades slower than TWO as you increase the length of the untrimmable text. However, performance on short untrimmable strings starts with only a small advantage to STRIP, possibly to the point where TWO is faster on extremely short strings. Our benchmarks were insufficient to confirm this either way.
These two observations were also replicated to a lesser extent with the other regexp-based solutions, although the break-even points would be dramatically different.
Our final step is to examine these observations, assess their importance to us and draw our conclusions from them.
First off, we can dismiss the relatively poor performance of STRIP with large amounts of white-space: unless we're in that minority of applications dealing with stripping 800+ characters of white-space from two-word strings, it's never going to effect us.
We can also ignore the fact that TWO may outperform STRIP for very short untrimmable strings, unless we expect to be near-exclusively working on strings shorter than "some text": any hypothetical gains would be swamped if our input was a mix of very short and longer strings, even if any advantage existed in the first place.
If we expect our input to be dominated by short text that has nothing to trim, it makes very little difference to us which we choose from TWO, SUBSTR or STRIP.
We should avoid all the OR variants like the plague, even more so if we're expecting to use them on text that's, oooh... to pluck an example out of the air, about the length of a blog entry. (A normal length one that is, not the behemoth that you're currently reading.)
If we're working on pretty long blocks of body text, in the region of 800 words/5000 characters or more, with small amounts of white-space to trim, then SUBSTR looks to become faster than TWO but will never be faster than STRIP.
So, our conclusions become:
If we can, use String::Strip, it offers the best general-purpose performance across all realistic scenarios.
If we can't use String::Strip (maybe we can't afford the module dependency, or we can't install it on the target machine), our next-best choice is between TWO and SUBSTR based on whether the body text is 800 words/5000 characters or greater. If this was important to us we could tailor our benchmarks to find out where this balance-point occurs.
Or in code-snippets:
# Fastest general-purpose.
use String::Strip qw/StripLTSpace/;
StripLTSpace( $string );
# Fastest for short strings under 800 words/5000 characters (roughly).
# Short emails, blog entries, comments, most grammar parsing tasks.
$string =~ s/^\s+//;
$string =~ s/\s+$//;
# Fastest for medium-long strings over 800 words/5000 characters (roughly).
# Long emails, book chapters, wordy articles on white-space trimming.
( $string =~ /^\s+/ ) && ( substr( $t, 0, $+[ 0 ] ) = '' );
( $string =~ /\s+$/ ) && ( substr( $t, $-[ 0 ] ) = '' );
This concludes our investigation into white-space trimming, and more-importantly into advanced benchmark analysis, I hope you've enjoyed reading and learned a useful trick or two.
Before we go, let's summarize the steps we took:
First we mocked up a simple benchmark using Benchmark.pm, featuring our proposed solutions.
We chose our sample input and checked our results to ensure that edge-cases were dealt with correctly by all solutions.
We introduced a control group to establish our benchmark overhead, and learned how to quickly assess whether it was distorting our results or not.
We expanded our sample inputs and used the changes in performance to probe into the strengths and weaknesses of each solution.
We drew conclusions from those strengths and weaknesses to determine usage scenarios that matched those strengths and weaknesses.
Obviously, white-space trimming is unlikely to be a true performance black-hole in your application, but these general-purpose techniques can be applied to other performance-sinks.
How do you set about finding those performance-sinks?
That's the topic of my next series: profiling.
This blog entry is part of the series: Explaining Benchmark Analysis.
- Some simple "white-space trim" benchmarks
- Advanced Benchmark Analysis I: Yet more white-space trimming
- Advanced Benchmark Analysis II: Probing strengths and weaknesses