Refactoring to Go Faster Later

Posted 2020-02-27

I maintain this tool guet, and I've been doing a lot of refactoring as of late. Refactoring is a practice in software development where engineers go back and change the implementation of code to increase clarity why still maintaining the original functionality. In an effort to explain why refactoring is valuble not just for the engineers, but for the product team as a whole, I wanted to walk through an example of a recent refactor and its outcomes.

The Problem

When updating code we manage changes using a tool called git. Saving the current state of a codebase using git is called a commit, and one normally looks like this.

commit bff1ad6cb29db56c624c82f8968058e82b292b99
Author: chiptopher <chrisboyerdev@gmail.com>
Date:   Thu Feb 27 07:28:49 2020 -0500

    Initial commit

The author (line 2 of the commit) gets credit for the commit. However, on sites like Github you can include the the following lines to give more people credit:

    Co-authored-by: First Committer <first@test.com>
    Co-authored-by: Second Committer <second@test.com>

There's been a bug in guet since almost its inception back in 2018. What guet it supposed to do is make pair programming contribution tracking easier. You can add people, set them as the committer, and then they should be tracked in the commit. guet should also rotate who the git author is between the set committers. The workflow would look something like this:

guet add f "First Committer" first@test.com
guet add s "Second Committer" second@test.com
guet set f s
git commit -m "Initial commit"
git commit -m "Second commit"
git log

What a guet user would expect is on the first commit, "First Committer" would be the author. Then on the second commit the author would be "Second Committer." However, when looking at the actual commit logs one can see that the first committer is me ("chiptopher"). This is because I have myself globally set on my computer as the git author.

commit 2dbc76b72dac7cbfc672328960ba26f638e9b083 (HEAD -> master)
Author: Second Committer <second@test.com>
Date:   Thu Feb 27 07:28:55 2020 -0500

    B

    Co-authored-by: Second Committer <second@test.com>
    Co-authored-by: First Committer <first@test.com>

commit bff1ad6cb29db56c624c82f8968058e82b292b99
Author: chiptopher <chrisboyerdev@gmail.com>
Date:   Thu Feb 27 07:28:49 2020 -0500

    Initial commit

    Co-authored-by: First Committer <first@test.com>
    Co-authored-by: Second Committer <second@test.com>

This error was originally logged in November of 2018, but it wasn't until recenlty that I found the reason when poking around in the git module of the code. There was a function in the code that interacted with git configure_git_author that would set the name and email of the current git author. The code that ran when you use the guet set command never calls configure_git_author. However, the code than runs after a commit did call configure_git_author.

Now, for sure the simplest solution to fixing this bug would have been to call the configure_git_author method in the guet set command code, but there was a refactor that I wanted to play which I hoped would aleviate this kind of problem in the future. Enter the observer pattern.

Observer Pattern

I'm not going to get too into the weeds of what an observer pattern is. The formal definition given for it by the Gang of Four was

Define a one to many dependency between objects so that when one object changes state, all its dependents are notified and update automatically.

And this is roughly the diagram for how I implemented it in guet.

Essentially, when you have certain kind of state changes in your system that can can cause a ripple effect, an observer pattern can help define how to listen for those state changes.

Implementation

In my case, the state change was the currently set committers. Every time the currently set committers changes, there are three updates I have to coordinate.

Update the guet configuration for who the current committers are
Update the guet configuration for who the current author is
Update the git configuration for who the current author is

Each change has a a function associated with it: configure_git_author, set_committers, and set_committer_as_author. In order to make sure each of those things happens every time the current committers change, I have to remember to call each function.

With the observer pattern, I implement a contract that says when the committers are changed all off of the observers will be notified. The observed object in my implementation was a Context with a set_committers method. The Git module becomes an observer of the Context. Now, whenever Context is used to update the committers, every module that is observing it will handle the new committers.

How This Helps

Were I to put the configure_git_author call in the guet set command code that would have solved the problem. Including testing, it might have taken twenty minutes. My observer solution took around two hours to complete. This might seem like an expensive price to pay for not gaining much, but results in code that's easier to work with.

Since a process exists that codifies this relationship it will be easier to augment this behavior with future features, and reduces the number of places where consquences of a change in committers exists to one. Every place we have duplicated logic is another place we have to make a change for one feature. To get the git author to update when committers change, we would have to do it once in guet set and once in the post-commit processor. And keep in mind that we have to write a test for each. That solution wouldn't scale in the long run.

The new implementation is clearner, easier to test, and easier to work with. Refactoring is a neccessary part of codebase maintenance, and goes a long way towards speeding up later development.