Cleaning Up Your Code

If you've been programming for any period of time, you've probably seen bad code. The kind that you're scared of touching for fear of breaking the program, the kind that employs some trickery that is beyond any documentation found online, the kind that slows you down.

So what makes clean code, and how can we tackle the disease of bad code?

Luckily, you can follow the same principles to write good code and to refactor bad code. Here are some attributes I've noticed in good code:

Good Names

Programming is not a math equation. The vast majority of the programs you write will not be simple. They will have to interact with the rest of the program, forming complex relationships along the way, and therefore have to be read by other programmers. Someone with no idea of your contribution should be able to quickly understand what your code does.

Let's look at an example, which code snippet is easier to grasp?

Bad

vector<pair<int, int>> getThem() {
  vector<pair<int, int>> v;

  for (pair<int, int> x : theList) {
    if (x.second == 4) {
      v.push(x);
    }
  }

  return v;
}

Good

vector<pair<int, int>> findFlaggedCells() {
  vector<pair<int, int>> flaggedCells;

  const int FLAGGED = 4;
  for (pair<int, int> cell : gameBoard) {
    if (cell.second == FLAGGED) {
      flaggedCells.push_back(cell);
    }
  }

  return flaggedCells;
}

Better

vector<Cell> findFlaggedCells() {
  vector<Cell> flaggedCells;

  const int FLAGGED = 4;
  for (Cell cell : gameBoard) {
    if (cell.status == FLAGGED) {
      flaggedCells.push_back(cell);
    }
  }

  return flaggedCells;
}

If you'd like to start using good names throughout your codebase, here are some general rules to follow:

  • Expose meaning through names: Similar to the above example, the name of a variable should provide some meaning towards code functionality.

  • Use pronounceable names: Software Development involves talking through your code with other developers. It will help a lot if you can actually talk through your code.

  • Use searchable names: There will be many times you want to track a variable or function through your codebase. In this case, the variable should be unique enough that it can be searched for with few duplicates.

Abstracting Reused Code

If you've found pieces of code in your codebase that are heavily duplicated or rely on the same logic, they are probably great contenders for abstraction.

Abstracting code into a separate function or class can help reduce code complexity as well. If your algorithm related to that block of code changes, you only have to change it in one place versus changing it in multiple places across your codebase.

What does a good abstraction look like? Well let's take our example from earlier. The good_names.cpp file is really easy to read! However, we would have to rewrite that code block if we ever wanted to find a cell based on a different FLAG_CODE. So let's move that code into an isolated function.

vector<Cell> findCells(const vector<Cell>& gameBoard, const int FLAG_CODE) {
  vector<Cell> cells;

  for (Cell cell : gameBoard) {
    if (cell.status == FLAG_CODE) {
      cells.push_back(cell);
    }
  }

  return cells;
}

Before you decide to start splitting your code apart, here are some things to keep in mind:

  • Keep your functions small: A small function is easy to read, but also very easily reused. Your function should be able to be used for a wide variety of use cases, not just the one case it used to be in.

  • Limit the functionality: If your code does more than one exact thing, it will be difficult to use it across your codebase. Continue splitting your code until your functions have only one purpose each.Then you can implement a function to call these smaller functions in any way you wish!

  • Ensure no side effects: A function should be thought of as a walled off piece of code that you control. It should *purely *accept input and return output. A pure function is a function that does not affect any code it does not have direct access to. A simple example would be an impure add function that adds two variables outside of it's scope. This is very hard to keep track of and should be avoided in most situations.

Helpful Comments

There will come a time when you have to write some code that is so convoluted that it just can't be fixed with good naming or any number of levels of abstraction. Or you just have some code that requires some “inside” knowledge to understand.

In this case you have two options: you can either write that bit of code and continue on your merry way, or you can leave a comment for yourself and other programmers. Leaving a comment requires very little effort on your part because you've already written the code!

int hash(string input) {
  int output = 5381;

  // Less collisions than simply adding ASCII characters of value
  // simple method: hash("tab") = hash("bat")
  // this method: hash("tab") != hash("bat")
  for (char c : input) {
    output += (output << 5) + c;
  }

  return output;
}

Some questions to consider when writing your comments:

  • What does this code do?

  • Where in the codebase is this code called?

  • Why is this the best way to accomplish the problem?

Easy-to-read Formatting

An age-old debate: tabs or spaces? Braces on the same line or new line?

I'm not going to suggest any specific formatting, but I do want to expose you to some guidelines on deciding formatting for your project. First and foremost, once you've found a method of formatting to use, stick to it.

There's two steps to getting most of the way to a perfect style guide for your project.

  1. Pick a popular style guide for your language. Examples are included below.

  2. Make slight variations to it for your specific use cases. If a certain point in your chosen style guide doesn't make sense for your use case, document it. (Ideally in a CONTRIBUTING.md file!)

Using Intuitive Data Structures

Programming is hard enough as it is, don't make it harder for yourself by choosing poor data structures.

Here are some examples I've seen in real code. I'm defining a “good” data structure as one that most easily maps to the real-world representation of the data and can be picked up quickly by a colleague.

Can you think of a better data structure than the one initially presented? (Try to find one before you read my explanation!)

Using a list as a way to store a Bingo board

This is the “worst” of all the scenarios I'll cover. While a list may seem easier to traverse with a simple for-loop, it doesn't represent the real-world board very well. (If you have a sharp eye, you may have noticed my previous examples with gameBoard used this data structure.)

How would you go “down 2 spaces” if the board were stored as a list? Instead of increasing the row index by two, you would increase your index by 2 * NUM_COLUMNS. You can't do that unless you know how big the board is, which would mean passing your board dimensions to every function that accesses it.

A Bingo board should be stored in a more intuitive two-dimensional vector (a matrix).

Using a string to store an arithmetic equation

A string was likely used because the input from the user was in the form of a string. However, this approach lends itself very easily to out-of-bounds errors or incorrect results.

Evaluating the equation would involve keeping an answer variable that is either NULL or the result of the last operation. Then you iterate through the string, and if you find an operator either execute the operator on the previous and next values or the answer variable and next value if the answer is not NULL. You would also have to check to see if you've ever written to the answer variable because NULL = 0 and having an equation such as 2 — 2 would return the answer variable to the NULL state. The string method also can't be extended to support precedence through parenthesis.

A better method would be to store your equation as an Binary Expression Tree. Evaluating the function would simply be an in-order traversal of this tree. While building the initial tree would require a bit of work, it will make the rest of the program trivial.

Using an array to store pictures posted by a user

This is the most subtle error out of the three scenarios. If I wanted to find all the photos a user has posted, why shouldn't it be stored in a list? It seems the most user-friendly method.

Well to see the error in this situation, we have to account for the situation we want to find a single photo. This now becomes an O(n) operation, testing every photo against some unique id.

Instead, we may want to store the photos in a map, with the unique id as the key and the photo data as the value. This is okay because the id should not change throughout the lifecycle of the photo. Now when we want to find a specific photo, we've reduced our complexity to O(1), while not affecting the use case for listing out all of the users photos.

Handling Errors Well

A great program is nothing if it can't handle edge cases or non-optimistic cases. Always make sure to test your code with some bad cases, such as:

  • NULL input

  • Vectors, Lists, or Maps that are empty

  • Negative input when you expect it to be positive

Let's make our findCells function from earlier more robust by handling these edge cases. (Note that the check I added isn't necessary, and is used only to prove a point.)

vector<pair<int, int>> findCells(const vector<pair<int, int>>& gameBoard, const int FLAG_CODE) {
  if (gameBoard.empty()) {
    return vector<pair<int, int>> {};
  }

  vector<pair<int, int>> cells;

  for (pair<int, int> cell : gameBoard) {
    if (cell.second == FLAG_CODE) {
      cells.push_back(cell);
    }
  }

  return cells;
}

There are also some changes we could make given some assumptions.

  • **What if FLAG_CODE should always be positive? **You can either make the FLAG_CODE argument of type unsigned or you can add a check for that.

  • **What if FLAG_CODE is out of the range of the expected FLAG_CODE range? **You can store an enum for the FLAG_CODE and then let C++ handle the type checking for you!

The main point of this section is to make sure your code either doesn't break (not likely), or breaks in a predictable way. How can we test for that? Read the next section!

Testing Your Code

Before I start explaining the specifics of testing in certain languages, I want to start by explaining the concept of Test Driven Development. You may have heard the term before but thought that your project wasn't big enough to adopt this method of developing.

Test Driven Development (TDD) is a method of developing in which you write tests before you write any code. Why would you want to do this?

TDD actually enforces writing tests for your code! Many times you'll write a program, run some examples and call it done. However, you've likely missed several bugs in your code by testing only the optimal cases. If you did manage to find some bugs and fix them, it is more work to make sure your fix did not break any previous cases. For this reason, tests serve as a method of documentation for all the cases you've checked so far and as a way to force you to think of ways to break your code.

So what makes a good test? Follow the F.I.R.S.T method!

  • Fast: If a test is slow, your approach to the task is probably not optimal. If the test is slow for a sample size of one, imagine how that would scale to a system of a million users!

  • Independent: A test should not depend on the outcome of another test. In the case that tests were dependent, then your tests would fall in a domino effect, hiding bugs in a different test until you fix the first bug. (Not ideal for when you're working in teams asynchronously.)

  • Repeatable: A test should not leave anything to chance. Every time you run a test, assuming no changes to the code have been made, you should expect an identical outcome. If the test cannot be repeated, you will eventually fall into a “It works on my machine!” attitude which is not beneficial to your team or your users.

  • Self-Validating: A test should return a boolean, true or false. This allows us to automate the job of testing our code, checking if all the tests return true or not. If you have to check some printed output to see if it matches the expected output, you could instead do a diff on the expected string and the returned string. (This changes our test from a string return type to a bool return type!)

  • Timely: This is the self-fulfilling prophecy in F.I.R.S.T. If you write your tests in a timely manner, your testing will go smoothly. Do not put off writing tests in favor of extending the functionality of your program.

Some housekeeping notes, but keep your tests separate from your actual program logic, ideally in a /tests directory.

Testing can be very domain-specific in its scope. My experience is primarily in Javascript and C++, which I feel represent two very different sides of programming. (Hopefully your language of choice falls somewhere on that spectrum!)

Javascript

Before you start thinking that tests have to be this fancy concept of your code that can only take a significant portion of your development time, remember that a test can be as simple as checking the output of a function to the expected output!

function add(a, b) {
  return a + b;
}

let sum = add(3, 4);
if (sum !== 7) {
  console.log('add(3, 4) failed. Expected: 7, Actual: ' + sum);
}

There are more robust ways to write tests in Javascript, mostly due to the vibrant developer community. Most commonly used is the test runner. A test runner will automatically run all your tests and present them in a very clean format. It is also much more extensible in the use case that you have databases of objects and want to mock some test data for your tests.

The test runner that I'm going to recommend is Jest. Jest is used by Facebook to test all their JavaScript code including their React applications. To me, Jest has a lot of reputation from being backed by Facebook and having a team dedicated to its active development.

To learn how to start using Jest, I highly recommend Fun Fun Function:

C++

As in Javascript, sometimes the easiest way to test your code in C++ is with some assert statements.

double add(double a, double b) {
  return a + b;
}

int main() {
  assert(add(3, 4) == 7);

  return 0;
}

When you want to get into some more complex testing, such as checking for memory leaks, you may want to start using static analyzers. First, use cppcheck to see if you even have a memory leak in the first place. cppcheck can also offer advice for better performance or cleaner code style.

If you do end up having a memory leak or segmentation fault, one option is to compile your program with the -g flag for debugging support. Then you can run your program with GDB (GNU DeBugger). GDB allows you to set breakpoints in your code, step through functions, inspect values at each step, and locate segmentation faults.

Fun fact: You can also use GDB as a disassembler for executables!

Further Resources

If you want to take your code to the next level, I highly recommend:

Special thanks to Daniel Stinson-Diess for helping me proofread the article and helping me with the C++ Testing portion!