Sam's Blog
The Science of Naming
Date: Wednesday, 31 March 2010, 12:09.
Categories: perl, ironman, design, technique, basic, template-benchmark.
I've been working on Template::Benchmark some more this week and I've hit a brick wall, one entirely of my own stubborn making.
One of the major roadblocks to releasing v1.00 is that I'm unhappy with the name of one of the major conceptual elements of the module: I want to come up with a better name and, it being a major element, I don't want to change it after I've moved to a stable release.
Naming things well is something I'm extremely fussy about and one I've thought about often, so the factors that make a good name seemed like a worthy topic for this week's blog.
But why should you care?
If software were art and developers were artists, naming things would be part of what is called technique. Like an understanding of colour for a painter, it makes you better at your craft, no matter what you're striving to achieve with that craft.
Developers name things constantly: they name functions and methods, they name classes, they name variables, they name modules, they name entire projects.
Understanding the rules that underpin naming, what the trade-offs are, what makes one name good and another bad, whether you're breaking the rules or not, whether you should break the rules because some other purpose is more important: knowledge of these factors lets you make an informed choice when you name things. Making your choices informed rather than unconscious makes you a better programmer.
So what, in my opinion, makes for a good name?
The qualities I strive for in a name are that it should be...
Accurate
Specific
Distinct
Consistent
Concise
Meaningful
Memorable
Let's go over those in some more detail.
Accurate
The name should accurately encompass the entire purpose of the thing being named.
Accuracy in naming allows you to follow the Principle of Least Surprise: if it does what it says on the tin, then people only have themselves to blame if they use it and it does something they didn't want to happen.
Being overly precise about implementation details rather than concepts can be a bane to future development, however.
If I have a document object that allows me to add a footnote, then naming the method add_footnote() would be accurate: it states that it adds something and that the something is a footnote.
If it were later changed to accept a second argument saying whether to add a footnote or a margin-note, the name would no longer be accurate; in fact it would quite possibly be misleading.
If you knew that development might head in that direction, you could have anticipated this by naming it add_annotation(), which would have been accurate at all stages of implementation.
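A minimal sketch of that evolution, assuming a hypothetical My::Document class (the class, its storage layout and the optional second argument are all invented for illustration):

```perl
use strict;
use warnings;

package My::Document;    # hypothetical class, for illustration only

sub new {
    my ( $class ) = @_;
    return bless { annotations => [] }, $class;
}

# Named for the concept rather than the first implementation, so the
# name stays accurate whether the caller adds a footnote (the default)
# or a margin-note.
sub add_annotation {
    my ( $self, $text, $kind ) = @_;
    $kind = 'footnote' unless defined $kind;
    die "Unknown annotation kind '$kind'\n"
        unless $kind eq 'footnote' or $kind eq 'margin-note';
    push @{ $self->{annotations} }, { kind => $kind, text => $text };
    return scalar @{ $self->{annotations} };
}

sub annotations { return @{ $_[0]{annotations} } }

package main;

my $doc = My::Document->new();
$doc->add_annotation( 'See chapter 2.' );
$doc->add_annotation( 'Check this date.', 'margin-note' );
print "$_->{kind}: $_->{text}\n" for $doc->annotations();
```

Had the method started life as add_footnote(), the second call would read as adding a footnote while actually adding a margin-note.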
Specific
The name should be specific to the thing being named, and no other part of your code-base.
This is the counter to the accuracy example above: if our document object had an add_annotation() method to add footnotes but also had a different method, called add_margin_note(), that allowed the addition of margin-notes, then add_annotation() is no longer sufficiently specific to the concept of adding footnotes.
Annotation overlaps in meaning with both footnote and margin-note, so it could be mistaken for a method that adds both or either.
Distinct
The name should be distinct from other names, so that it won't be confused with other parts of your code base.
This isn't just a matter of being specific and ensuring there's no overlap of meaning, it's a matter of ensuring there's no likelihood of accidentally substituting meaning.
A good counter-example is the pair of Authentication and Authorization modules in the Apache web server. The names are certainly both accurate and precise in their meaning, but which does which?
Even if you or I don't personally mistake them, it should be apparent that there's scope for confusion.
Naming one module mod_authn, which is a valid abbreviation of either, sure doesn't help, even if the other is named more clearly as mod_authz.
Being distinct is also the converse of consistency of behaviour, detailed below, in that groups of code behaviour that are different to each other should be distinguishable by their distinct naming.
For example, a group of methods that add information to a class could have consistent naming by sharing a common format of add_*(), whereas a group of methods that overwrite the information would have distinct, but consistent among themselves, naming that followed the format of set_*().
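As a rough sketch of that convention, assuming a hypothetical My::Record class with invented tags and authors fields:

```perl
use strict;
use warnings;

package My::Record;    # hypothetical class, for illustration only

sub new {
    my ( $class ) = @_;
    return bless { tags => [], authors => [] }, $class;
}

# add_*() methods append to what is already there.
sub add_tags    { my $self = shift; push @{ $self->{tags} },    @_; return $self }
sub add_authors { my $self = shift; push @{ $self->{authors} }, @_; return $self }

# set_*() methods overwrite what is already there.
sub set_tags    { my $self = shift; $self->{tags}    = [ @_ ]; return $self }
sub set_authors { my $self = shift; $self->{authors} = [ @_ ]; return $self }

sub tags { return @{ $_[0]{tags} } }

package main;

my $record = My::Record->new();
$record->add_tags( 'perl' );
$record->add_tags( 'design' );       # appends: perl, design
$record->set_tags( 'technique' );    # overwrites: only technique remains
print join( ', ', $record->tags() ), "\n";
```

The prefix alone tells the caller whether existing data survives the call, without needing to read either implementation.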
Consistent
The name should be consistent with your naming style or conventions within your code-base.
Whether you CamelCase, under_score or ALLCAPS, you should do it consistently.
If for some reason, say locals vs globals, you mix them, you should follow those rules consistently throughout a code-base.
If you use set/get prefixes on accessors, or if you don't, you should do it consistently.
Order of adjectives and nouns should be consistent too: if at one end of your module you're using $y_max, then in another part of your module you shouldn't be using $max_x, even if the code is completely unrelated.
Similar groups of behaviour should be similarly named: if you're in a loop and you have $min_y, $max_y, $best_y and $current_y, you might well want to rename your use of $x to $current_x even if you don't have the min/max/best versions to distinguish it from. Consistency with $current_y could make its meaning clearer.
Consistency of naming for common behaviour helps people remember a name by reducing it to a game of "fill in the blanks" from a name that they do remember:
If you know DBI.pm has a function selectrow_arrayref(), you can make a good guess at the name of the function that returns a hash rather than an array, or the one that returns all rows rather than a single row.
If you know that the Template::Sandbox plugin for Template::Benchmark is Template::Benchmark::Engines::TemplateSandbox, you can make a stab at what the HTML::Template one might be.
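The DBI guess can be spelled out as a tiny sketch. It only builds the candidate names as strings and never touches a database; the variable names are invented for illustration. All four names it produces are real DBI database-handle methods.

```perl
use strict;
use warnings;

# Fill in the blanks of the pattern select<scope>_<shape>,
# starting from the one name we remember: selectrow_arrayref.
my @scopes = qw( row all );             # one row, or every row
my @shapes = qw( arrayref hashref );    # how the result is returned

my @guesses;
for my $scope ( @scopes ) {
    for my $shape ( @shapes ) {
        push @guesses, "select${scope}_${shape}";
    }
}

print "$_()\n" for @guesses;
```

Remembering one name plus the two axes it varies along is enough to reconstruct the rest.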
Concise
The more concise, without losing quality, a thing is, the clearer its other qualities tend to be.
Concise doesn't mean being short, however; it means being sufficient but no more.
A variable named $upmc is short, a variable named $the_count_of_the_unread_private_messages_for_the_user_in_this_scope is sufficient but not short, whereas $unread_message_count is quite probably as short as it can be while still being sufficient: it's concise.
Meaningful
A good name doesn't just indicate what part of your code-base it deals with, it should also be educational.
Knowing and thinking of the name should subtly, or even overtly, guide you into thinking about correct use of that code.
This can range from implying what arguments a function takes and what it returns, to indicating its larger place within a program's structure.
BEGIN would be a good example of a name that guides you into understanding its purpose. It could very well have been named something like POST_MODULE_COMPILE_INIT, which would possibly be accurate, but doesn't guide you into the idea of "this is where I put stuff that I want to run at the very beginning".
Unless you happened to know what "post-module compile init" was, you wouldn't know that then was a good time to run things.
Being called BEGIN, however, tells you what it's for, rather than what it does or, worse, when it happens.
It's meaningful.
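A small sketch of what the name promises: a BEGIN block runs as soon as perl has compiled it, so it fires before any ordinary statement in the file, even one written above it. The @order array is just an illustrative name.

```perl
use strict;
use warnings;

our @order;    # package variable, visible to the BEGIN block below

# An ordinary statement: it runs at run time, after compilation.
push @order, 'runtime';

# Written later in the file, but it runs first, during compilation.
BEGIN { push @order, 'BEGIN' }

print join( ' then ', @order ), "\n";    # prints "BEGIN then runtime"
```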
I mentioned the DBI.pm selectrow_arrayref() earlier. This is also a good example of a meaningful name because it conveys more meaning than just a description of what it does: knowing that the name is concise allows you to infer that each part of that name is important, and that the mention of _arrayref implies the existence of another function, using _hashref.
Similarly the existence of row in the name implies that there's probably a function that selects something else or that, if there isn't, the developers thought there could conceivably be one at some point.
So the name of that one function is educating the developer into an unconscious awareness of at least two other parts of the code-base, every time they use it.
Memorable
A good name needs to be memorable, or at the very least deducible.
It's not sufficient to be able to look at a name and think, "oh yes, that code does X", it must also be apparent from the flip side: "I want to do X, what's the logical name of the code that does it?"
Compare two functions, get_superclass() and get_parent_class(): both are accurate and meaningful if you're looking at the names.
On the other hand, if you're a Perl developer trying to remember "What's the function that tells me the superclass?", you're going to try get_superclass() first, because superclass is the terminology Perl developers are more familiar with.
If you're consistent in your naming, you'll often find that you'll get memorable "for free", because it's easier to remember a single rule that applies to twenty situations than it is to remember twenty situations each with different quirks.
These are the main factors that I juggle in my head when I'm trying to give a decent name to something. Sometimes they're cooperative, and sometimes they're in open conflict with each other.
For that Template::Benchmark feature, I'm struggling for something that's both accurate and memorable.
Maybe inspiration will strike soon.