Sam's Blog

The Science of Naming

Date: Wednesday, 31 March 2010, 12:09.

Categories: perl, ironman, design, technique, basic, template-benchmark.

I've been working on Template::Benchmark some more this week and I've hit a brick wall, one entirely of my own stubborn making.

One of the major roadblocks to releasing v1.00 is that I'm unhappy with the name of one of the major conceptual elements of the module: I want to come up with a better name and, it being a major element, I don't want to change it after I've moved to a stable release.

Naming things well is something I'm extremely fussy about and one I've thought about often, so the factors that make a good name seemed like a worthy topic for this week's blog.

But why should you care?

If software were art, and developers were artists, naming things would be part of what is called technique. Like an understanding of colour for a painter, it's something that makes you better at your craft, no matter what you're striving to achieve with that craft.

Developers name things constantly: they name functions and methods, they name classes, they name variables, they name modules, they name entire projects.

Understanding the rules that underpin naming, what the trade-offs are, what makes one name good and another bad, whether you're breaking the rules or not, whether you should break the rules because some other purpose is more important: knowledge of these factors lets you make an informed choice when you name things. Making your choices informed rather than unconscious makes you a better programmer.

So what, in my opinion, makes for a good name?

The qualities I strive for in a name are that it should be...

  • Accurate

  • Specific

  • Distinct

  • Consistent

  • Concise

  • Meaningful

  • Memorable

Let's go over those in some more detail.

Accurate

The name should accurately encompass the entire purpose of the thing being named.

Accuracy in naming allows you to follow the Principle of Least Surprise: if it does what it says on the tin, then people only have themselves to blame if they use it and it does something they didn't want to happen.

Being overly precise about implementation details, rather than concepts, can be a bane to future development, however.

If I have a document object that allows me to add a footnote, then naming the method add_footnote() would be accurate: it states that it adds something and that the something is a footnote.

If it were later changed to take a second argument that let you say whether it was a footnote or a margin-note, the name would no longer be accurate; in fact, it would quite possibly be misleading.

If you knew that development might head in that direction, you could have anticipated this by naming it add_annotation(), which would have been accurate at every stage of implementation.
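As a minimal sketch of that idea, assuming a hypothetical Document class (the package name, the internal annotations list and the default type are all my own inventions for illustration), the method might look something like this:

```perl
use strict;
use warnings;

package Document;    # hypothetical class, for illustration only

sub new {
    my ($class) = @_;
    my $self = { annotations => [] };
    return bless $self, $class;
}

# Named for the concept (annotation) rather than the first implementation
# (footnote), so the name stays accurate once margin-notes are supported.
sub add_annotation {
    my ( $self, $text, $type ) = @_;
    $type = 'footnote' unless defined $type;    # assumed default
    die "Unknown annotation type '$type'\n"
        unless $type eq 'footnote' || $type eq 'margin-note';
    push @{ $self->{annotations} }, { type => $type, text => $text };
    return scalar @{ $self->{annotations} };    # number of annotations so far
}

package main;

my $doc = Document->new();
$doc->add_annotation('See chapter 2.');                        # a footnote
$doc->add_annotation( 'Check this figure.', 'margin-note' );   # a margin-note
printf "%d annotations\n", scalar @{ $doc->{annotations} };    # prints "2 annotations"
```

Callers written against the first version keep working, and the name never has to change.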

Specific

The name should be specific to the thing being named, and no other part of your code-base.

This is the counter to the above example in accuracy: if our document object had an add_annotation() method to add footnotes but also had a different method that allowed the addition of margin-notes, called add_margin_note(), then add_annotation() is no longer sufficiently specific to the concept of adding footnotes. Annotation has an overlap of meaning with both footnote and margin-note, so it could be mistaken as a method to add both or either.

Distinct

The name should be distinct from other names, so that it won't be confused with other parts of your code-base.

This isn't just a matter of being specific and ensuring there's no overlap of meaning; it's a matter of ensuring there's little likelihood of accidentally substituting one name for another.

A good counter-example is the Apache web server's Authentication and Authorization modules. The names are certainly both accurate and precise in their meaning, but which does which?

Even if you or I don't personally mistake them, it should be apparent that there's scope for confusion.

Naming one module mod_authn, which is a valid abbreviation of either, sure doesn't help, even if the other is named more clearly as mod_authz.

Being distinct is also the converse of consistency of behaviour, detailed below, in that groups of code behaviour that are different to each other should be distinguishable by their distinct naming.

For example, a group of methods that add information to a class could have consistent naming by having a common format of add_*(), whereas a group of methods that overwrite the information would have distinct, but consistent among themselves, naming that followed the format of set_*().
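A rough sketch of how that might look, using a hypothetical Playlist class (the class, its fields and its method names are assumptions made up for this example):

```perl
use strict;
use warnings;

package Playlist;    # hypothetical class, for illustration only

sub new {
    my ($class) = @_;
    return bless { tracks => [], tags => [] }, $class;
}

# add_*(): append to whatever is already stored.
sub add_tracks {
    my ( $self, @tracks ) = @_;
    push @{ $self->{tracks} }, @tracks;
    return $self;
}

sub add_tags {
    my ( $self, @tags ) = @_;
    push @{ $self->{tags} }, @tags;
    return $self;
}

# set_*(): overwrite whatever is already stored.
sub set_tracks {
    my ( $self, @tracks ) = @_;
    $self->{tracks} = [@tracks];
    return $self;
}

sub set_tags {
    my ( $self, @tags ) = @_;
    $self->{tags} = [@tags];
    return $self;
}

package main;

my $playlist = Playlist->new();
$playlist->add_tracks('Intro')->add_tracks('Verse');    # tracks: Intro, Verse
$playlist->set_tracks('Outro');                         # tracks: Outro
print join( ', ', @{ $playlist->{tracks} } ), "\n";    # prints "Outro"
```

The prefix alone tells you whether existing data survives the call.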

Consistent

The name should be consistent with your naming style or conventions within your code-base.

Whether you CamelCase, under_score or ALLCAPS, you should do it consistently. If for some reason, say locals vs globals, you mix them, you should follow those rules consistently throughout a code-base.

If you use set/get prefixes on accessors, or if you don't, you should do it consistently.

Order of adjectives and nouns should be consistent too: if at one end of your module you're using $y_max, then in another part of your module you shouldn't be using $max_x, even if the code is completely unrelated.

Similar groups of behaviour should be similarly named. If you're in a loop and you have $min_y, $max_y, $best_y and $current_y, you might well want to rename your use of $x to $current_x even if you don't have the min/max/best versions to distinguish it from: consistency with $current_y could make its meaning clearer.
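A small sketch of such a loop follows. The data and the meaning of "best" (assumed here to be the smallest y found at an even x) are invented purely to give the variables something to do:

```perl
use strict;
use warnings;

my @points = ( [ 1, 4 ], [ 2, 9 ], [ 3, 2 ], [ 4, 7 ] );    # made-up [x, y] pairs

my ( $min_y, $max_y, $best_y );
for my $point (@points) {
    # $current_x rather than $x, to match $current_y.
    my ( $current_x, $current_y ) = @{$point};

    $min_y = $current_y if !defined $min_y || $current_y < $min_y;
    $max_y = $current_y if !defined $max_y || $current_y > $max_y;

    # Assumed rule: "best" is the smallest y seen at an even x.
    $best_y = $current_y
        if $current_x % 2 == 0
        && ( !defined $best_y || $current_y < $best_y );
}

print "min=$min_y max=$max_y best=$best_y\n";    # prints "min=2 max=9 best=7"
```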

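Consistency of naming for common behaviour helps people remember a name by reducing it to a game of "fill in the blanks" from a name that they do remember. DBI.pm is a good example: once you remember selectrow_arrayref(), names like selectrow_hashref(), selectall_arrayref() and selectall_hashref() are easy to deduce.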

Concise

The more concise a thing is, without losing quality, the clearer its other qualities tend to be.

Concise doesn't mean short, however: it means sufficient but no more.

A variable named $upmc is short, a variable named $the_count_of_the_unread_private_messages_for_the_user_in_this_scope is sufficient but not short, whereas $unread_message_count is quite probably as short as it can be while still being sufficient: it's concise.

Meaningful

A good name doesn't just indicate what part of your code-base it deals with; it should also be educational.

Knowing and thinking of the name should subtly, or even overtly, guide you into thinking about correct use of that code.

This can range from implying what arguments a function takes and what it returns, to indicating its larger place within a program's structure.

BEGIN would be a good example of a name that guides you into understanding its purpose. It could very well have been named something like POST_MODULE_COMPILE_INIT, which would possibly be accurate, but which doesn't guide you into the idea of "this is where I put stuff that I want to run at the very beginning". Unless you happened to know what "post-module compile init" was, you wouldn't know that it was a good time to run things.

Being called BEGIN, however, tells you what it's for, rather than what it does or, worse, when it happens.

It's meaningful.
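To illustrate, here is a tiny sketch (the @order array is just a made-up way of recording what ran when). A BEGIN block runs as soon as it has been compiled, so it gets in ahead of ordinary code that appears earlier in the file:

```perl
use strict;
use warnings;

our @order;

# Ordinary code: runs at run time, after the whole file has been compiled.
push @order, 'main code';

# Runs as soon as this block is compiled, before the push above executes.
BEGIN { push @order, 'BEGIN block' }

print join( ', ', @order ), "\n";    # prints "BEGIN block, main code"
```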

DBI.pm's selectrow_arrayref() is also a good example of a meaningful name, because it conveys more than just a description of what it does: knowing that the name is concise allows you to infer that each part of the name is important, and that the mention of _arrayref implies the existence of another function using _hashref.

Similarly the existence of row in the name implies that there's probably a function that selects something else or that, if there isn't, the developers thought there could conceivably be one at some point.

So the name of that one function is educating the developer into an unconscious awareness of at least two other parts of the code-base, every time they use it.

Memorable

A good name needs to be memorable, or at the very least deducible.

It's not sufficient to be able to look at a name and think, "oh yes, that code does X", it must also be apparent from the flip side: "I want to do X, what's the logical name of the code that does it?"

Compare two functions, get_superclass() and get_parent_class(): both are accurate and meaningful if you're looking at the names.

On the other hand, if you're a Perl developer trying to remember "What's the function that tells me the superclass?", you're going to try get_superclass() first, because superclass is the terminology Perl developers are more familiar with.

If you're consistent in your naming, you'll often find that you'll get memorable "for free", because it's easier to remember a single rule that applies to twenty situations than it is to remember twenty situations each with different quirks.
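One way to picture "a single rule" is to generate the names from that rule. This is only a sketch under my own assumptions: the Profile class and its fields are hypothetical, and installing accessors through the symbol table is just one of several ways to do it in Perl:

```perl
use strict;
use warnings;

package Profile;    # hypothetical class, for illustration only

sub new {
    my ($class) = @_;
    return bless {}, $class;
}

# One rule, get_<field>() and set_<field>(), applied to every field.
for my $field (qw( name email homepage )) {
    no strict 'refs';    # needed to install subs by name
    *{ __PACKAGE__ . "::get_$field" } = sub { return $_[0]{$field} };
    *{ __PACKAGE__ . "::set_$field" } = sub {
        $_[0]{$field} = $_[1];
        return $_[0];    # return the object so calls can be chained
    };
}

package main;

my $profile = Profile->new();
$profile->set_name('Sam')->set_email('sam@example.com');
print $profile->get_name(), "\n";    # prints "Sam"
```

Anyone who remembers get_name() can guess get_email() and set_homepage() without looking them up.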

These are the main factors that I juggle in my head when I'm trying to give a decent name to something. Sometimes they're cooperative, and sometimes they're in open conflict with each other.

For that Template::Benchmark feature, I'm struggling for something that's both accurate and memorable.

Maybe inspiration will strike soon.

© 2009-2013 Sam Graham, unless otherwise noted. All rights reserved.