Zerø Wind Jamie Wong

Too DRY - The Grep Test

I’ll occasionally see code that attempts to use clever mechanisms to reduce code duplication or be terser by using metaprogramming or string concatenation or any number of other mechanisms to DRY up code.

But sometimes it impedes maintainability in ways that are not immediately obvious. When working in any language, but particularly dynamic languages such as Python, JavaScript, or Ruby, grep or one of its variants (ack, git grep, etc.) is an essential tool for figuring out where and how a class, function, module, or variable is used. Here I propose a simple test, that if failed, is sufficient to reject a piece of code under review. (You are performing code reviews, right?)

The Grep Test: If any code declares or makes use of a function, class, module, or variable that cannot be located by grepping for its full identifying token, it fails the Grep Test.

By “full identifying token”, I mean that if I search for inverseTransform, I expect to find every declaration or invocation of any method named inverseTransform. I would not expect to be able to find every use of the inverseTransform method of Node by grepping for Node.inverseTransform, since that isn’t a single token.

EDIT: Added a section on Compromises in response to some of the discussion on HN.

Counterexamples

Dynamic Declaration

Here are a few examples of code that would fail the Grep Test because of dynamically declared functions.

JavaScript: dynamic function declarations

var Ray = function(position, direction) {
    this.position = position;
    this.direction = direction;
};

["position", "direction"].forEach(function(property) {
    var getterName = "get" + property.charAt(0).toUpperCase() + property.substr(1);
    Ray.prototype[getterName] = function() {
    return this[property];
    };
});

var ray = new Ray([0, 0], [1, 1]);
console.log(ray.getPosition());
console.log(ray.getDirection());

Fails because grepping for getPosition won’t find me the function definition.

Ruby: method_missing

class Ray
  attr_accessor :position, :direction

  def initialize(position, direction)
    @position = position
    @direction = direction
  end

  def method_missing(m, *args, &block)
    if m.to_s =~ /^negative_(\w+)$/
      case $1
        when "position"
          [  -@position[0],  -@position[1] ]
        when "direction"
          [ -@direction[0], -@direction[1] ]
      end
    else
      super
    end
  end
end

ray = Ray.new([1, 2], [3, 4])
puts ray.position.inspect
puts ray.direction.inspect
puts ray.negative_position.inspect
puts ray.negative_direction.inspect

Fails the test because grepping for negative_position won’t find me the function definition.

Python: __getattr__

import re

class Ray(object):
    def __init__(self, position, direction):
        self.position = position
        self.direction = direction

    def __getattr__(self, name):
        pat = re.compile('^(\w+)_is_zero$')
        if pat.match(name):
            prop = pat.findall(name)[0]
            if prop == 'position':
                return self.position[0] == 0 and self.position[1] == 0
            elif prop == 'direction':
                return self.direction[0] == 0 and self.direction[1] == 0

        raise AttributeError

ray = Ray([0, 0], [1, 1])
print ray.position
print ray.direction
print ray.position_is_zero
print ray.direction_is_zero

Fails the test because grepping for position_is_zero won’t find me the function definition.

Dynamic Invocation

Here’s an example that would fail the Grep Test due to dynamic invocation:

Let’s say we have a form:

<form class="login">
    Username: <input name="username" />
    Password: <input name="password" />
    <input type="submit" />
</form>

And we want to update a model based on the values of these fields when the form is submitted:

var User = function() {};

User.prototype.getUsername = function() {
  return this._username;
};

User.prototype.setUsername = function(username) {
  this._username = username;
};

User.prototype.getPassword = function() {
  return this._password;
}

User.prototype.setPassword = function(password) {
  this._password = password;
};

$("form.login").submit(function(ev) {
  ev.preventDefault();
  var fields = $(this).serializeArray();
  var user = new User();
  $(fields).each(function(i, field) {
    var capName = field.name.charAt(0).toUpperCase() + field.name.substr(1);
    var value = field.value;
    user["set" + capName](value);
    console.log(user["get" + capName]());
  });
  console.log(user);
});

Fails the test because grepping for getUsername won’t find the invocation in the submit callback.

Compromises

So should we throw out dynamic code generation wholesale? No. We just have to be careful to preserve the whole identifiers our fellow developers are likely to be looking for.

My preferred solution to all of the above counterexamples is to just avoid dynamic declaration and invocation completely, but there is certainly a scale at which this becomes a pain and metaprogramming becomes a pragmatic option.

Avoiding String Manipulation

One option is to provide slightly more information while still dynamically generating functions.

[
  {attr: "position", fn: "getPosition"},
  {attr: "direction", fn: "getDirection"}
].forEach(function(property) {
  Ray.prototype[property.fn] = function() {
    return this[property.attr];
  };
});

Now both position and getPosition remain greppable while still avoiding duplicating any common functionality you may have for these functions.

Using Comments

Another similar option for retaining dynamic methods while maintaining greppability is to simple add the tokens in the comments.

// Generates the following methods for Ray:
//  - getPosition
//  - getDirection
["position", "direction"].forEach(function(property) {
  var getterName = "get" + property.charAt(0).toUpperCase() + property.substr(1);
  Ray.prototype[getterName] = function() {
    return this[property];
  };
});

While I frankly still find this code more difficult to follow than the first compromise or avoiding dynamic declaration altogether, I’d still appreciate these comments when navigating a codebase.

Why is this Important?

When working with dynamic code, it’s an incredible boon to productivity to be able to quickly locate the definition of functions so you can build a complete mental context about what’s going on. Two common tools for doing this are grep and ctags, both of which require the entire identifier be written out at the declaration site. (For the vim users reading this, you should really know about using ctags in vim.) Dynamic declaration ruins this.

When modifying existing code, especially without good test coverage, especially when refactoring code without good test coverage, it’s essential to be able to find all the usages of a given class or function. Here, grep is my tool of choice. Dynamic invocation ruins this. If you have code breaking the Grep Test, refactoring becomes a nightmare. I’ve deleted code I thought was unused far too many times because of this.

As a broad generalization, I would say dynamic declaration is occasionally worth the tradeoff, but dynamic invocation is almost never worthwhile.

A Comment on Frameworks

I’m sure by this point, a few of you have thought “Hey, but Rails fails the Grep Test!“. This is absolutely true, most notable due to the dynamic find_by_* finders, and the dynamic *_path URL path generators.

I personally remember tripping over trying to find these in the documentation when I was learning Rails, so I’m not a huge fan of these. That said, there is an argument for the beautiful sugar this provides. If you’re going to do stuff like this, use it sparingly and make sure it’s a well known convention used everywhere.

If you liked reading this, you should subscribe by email, follow me on Twitter, take a look at other blog posts by me, or if you'd like to chat in a non-recruiting capacity, DM me on Twitter.


Zerø Wind Jamie Wong
Previously Something Out of Nothing May 5, 2013
Up next Rubber Ducky Logs September 29, 2013