Tuesday, May 7, 2013

Refactoring? Or Rewriting?

I was recently asked in an interview what it meant to refactor code and what the prerequisite was to refactoring. I have to admit, I was a bit thrown off being asked what the "prerequisite" is, since it seems to me to be reflexive software development nowadays. To me, refactoring is an interesting concept in software development that a lot of other engineering-type professions don't get to leverage as freely as we do. But, I think it has become (incorrectly) synonymous with rewriting code. (Or maybe refactoring is instead used as a guise to rewrite.)

Refactoring is very simple and is akin to rewording the language in your code to express the same idea, or original intent, in a different statement. Rewriting your code is changing the intended behavior of the code—the opposite of refactoring. I have spent a lot of time refactoring some code on my current project lately, and I have run across a the following code a lot.

public List<Foo> GetFoos(SomethingElse[] somethingElses)
{
    var retval = new List<Foo>();

    foreach(var else in somethingElses)
    {
        retval.Add(new Foo() { Bar = else.Bar });
    }

    return retval;
}

So, this is a pretty trivial sample of code, but it is really easy to see what is going on and what the intent of the code is. Basically, it is mapping a collection of one type of object to a collection of another type. We create a list and then iterate through the original collection, appending a new instance of Foo, and then return it. Using LINQ, we can actually simplify this just a tad and even type less code to get the same effect. (Refactor it.)

using System.Linq;

public List<Foo> GetFoos(SomethingElse[] somethingElses)
{
    return somethingElses
        .Select(x => new Foo() { Bar = x.Bar })
        .ToList();
}

Again, this is a trivial example, but it shows how we can refactor our code, keep in tact the original desired behavior, but use different syntactic sugars to clean it up. (I prefer using LINQ, personally, over foreach, especially in nested situations.) But, besides this being a trivial situation, what confidence do we have that we did not silently introduce a bug into our system? What assurance do we have? Well, before refactoring our code, we should put barriers in place to help us reason about our code. In the simplest sense, we should have a unit test, in place, before we make our changes to assert that we did not introduce a bug. Of course, if we're good TDDers we would have this unit test in place already, and have a nice regression test to fall back on. If not, it would behoove us to get one in place, quickly.

[Test]
public void given_an_array_of_something_elses_it_should_return_a_list_of_foos()
{
    var somethingElses = Enumerable.Range(0, 5)
        .Select(i => new SomethingElse() { Bar = i })
        .ToArray()

    // Let's assume GetFoos is defined as a static method on Baz
    var result = Baz.GetFoos(somethingElses);

    for (int i = somethingElses.Length - 1; i >= 0; --i)
    {
        Assert.That(result[i].Bar, Is.EqualTo(somethingElses.ElementAt(i).Bar));
    }
}

Well, that's refactoring, but what about rewriting? If we look at our simple method, what happens when we pass in a null array of SomethingElse? At this point, our method doesn't care and will attempt to iterate over it anyway. This, of course, results in a null reference exception and we have to track down how this happened. But, let's say we decide to change the behavior of this method. We, instead, will throw an exception if the array is null because we the precondition has not been met. Since we are changing the method's behavior, we are rewriting our code. One hint that this is a rewrite is the fact that we need to introduce a new unit test.

[Test]
public void given_a_null_array_it_should_throw_an_argument_null_exception()
{
    var exception = Assert.Throws<ArgumentNullException>(() => Baz.GetFoos(null));
    Assert.That(exception.ParamName, Is.EqualTo("somethingElses"));
}

I have used NUnit-style syntax in this post, but it should be fairly clear what is being tested.

Now, why is this important? Well, as we're writing our code, we tend to learn a lot about it. We see new ways to implement things that allow us to, later, extend the behavior of something without changing the original meaning. It also allows us to nicely execute test driven development: "Red –> Green –> Refactor."

No comments:

Post a Comment