
Self documenting conditional statements

This is an opinion article on improving the readability of conditional statements through the use of meaningful variable names.

Self documenting code

In order to discuss what I mean by "Self documenting conditional statements", we first need to know what self-documenting code means. Self-documenting code is simply a formal name for the practice of giving your variables, functions, and classes meaningful names. Advocates of self-documenting code generally take this one step further and assert that code comments should be kept to an absolute minimum: comments should be reserved for blocks of code whose intent can't be conveyed through identifier names alone, and they should describe that intent (the "why?") instead of summarizing "what" the code is doing.

The primary argument for self-documenting code is the generalization that comments are likely to become out of sync with the code they are commenting on. In other words, as code changes, someone will eventually forget to update the comment to reflect the changes in the code.
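As a rough illustration of that drift, here is a hypothetical sketch. The function names and the 14-day and 30-day figures are invented for this example. In the first function the comment still describes an old rule after the code was changed; in the second, the name carries the meaning, so there is nothing left to fall out of sync.

```python
def is_trial_expired(days_since_signup):
    # Trials last 14 days.  <- stale: the limit below was later raised to 30
    return days_since_signup > 30


def is_trial_expired_self_documenting(days_since_signup):
    # Hypothetical rewrite: the identifier replaces the comment.
    trial_length_in_days = 30
    return days_since_signup > trial_length_in_days
```

Both functions behave identically; only the first one misleads the reader.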

Example

Here's a simple example to demonstrate the principle of self-documenting code.

Non self-documenting:

import json

import requests


def current_city():
    '''
    Returns the current city of the machine running the script as a string. The
    city is determined from the IP address.
    '''
    # Use the freegeoip geolocation API to get current information based on
    # the IP address of the machine running the program
    r = requests.get("https://freegeoip.net/json/")

    # Convert the geolocation json response data
    return json.loads(r.content)['city']

Self-documenting:

def get_city_by_current_ip():
    geolocation_api_url = "https://freegeoip.net/json/"
    r = requests.get(geolocation_api_url)
    geo_data = json.loads(r.content)
    return geo_data['city']

Comments that summarized what the code did were removed. New identifiers were introduced as needed in order to convey the information the comments once presented. Naturally, self-documenting code has the tendency to introduce more identifiers, along with longer identifier names.

Self documenting conditional statements

Now that we know what self-documenting code is all about, let's talk about conditional statements.

A conditional statement is a language construct that uses a Boolean expression to make a decision. For example,

if x<5:
    # do something
else:
    # don't do anything

While loops use conditions as well:

while response != 'q':
    # do something

Expressions such as response != 'q' and x<5 are known as relational expressions. Relational expressions evaluate to either Boolean True or Boolean False. When you introduce logical AND or logical OR into the expression, the relational expression becomes a compound relational expression. For example,

if x > 5 and x%2 == 0:
    # do something
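Because a relational expression is just a value, it can be assigned to a variable and inspected like any other value. A minimal sketch, with an arbitrarily chosen x and illustrative variable names:

```python
x = 8  # arbitrary value chosen for illustration

is_greater_than_five = x > 5          # relational expression
is_even = x % 2 == 0                  # relational expression
compound = x > 5 and x % 2 == 0       # compound relational expression

print(is_greater_than_five, is_even, compound)
```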

It's fairly common for compound relational expressions to be used as conditions. However, over the years I've observed a tendency for these relational expressions within conditions to not convey the intent of the expression very well.

A simple example

Consider a common scenario in my life where I decide if I want to go outside. If it's cold outside and there's a high chance of rain, you'll have to drag me, kicking and screaming, to get me out of the house. I'm fine with just a high chance of rain, or I'm fine with it just being cold, but I can't handle them both! Here's what that might look like expressed as an if-statement:

if temp < 75 and precip_chance > .5:
    # I'm staying inside
else:
    # I'll go out!

While this example is fairly easy to follow, I believe there is an unnecessary mixing of responsibilities occurring within the conditional statement. The relational expression temp < 75 represents the idea of "cold", and precip_chance > .5 represents the idea of "high chance of rain". When software developers familiarize themselves with new source code, they mentally emulate how the machine will journey through the program under different conditions. Considering these conditions is much easier when the developer can focus on the ideas represented by the expressions rather than the logic of the expressions. In other words, it would be much easier to reason through a program thinking "If it's cold but not likely to rain, then..." as opposed to "if temp < 75 but precip_chance <= .5, then..."

Consequently, I propose writing conditional statements resembling this example in the form:

is_cold = temp < 75
likely_to_rain = precip_chance > .5

if is_cold and likely_to_rain:
    # I'm staying inside
else:
    # I'll go out!

Now the intended path of the if statement is extremely obvious. The logic for determining is_cold or likely_to_rain is now separate from the organization of the if statement, allowing us to focus on the core ideas as opposed to the raw numbers.
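For a version you can actually run, here is one possible sketch. The function name, the constant names, and the assumption that temp is in degrees Fahrenheit are mine; the thresholds come from the snippet above.

```python
COLD_THRESHOLD_F = 75     # assumption: temp is measured in degrees Fahrenheit
HIGH_RAIN_CHANCE = 0.5    # precip_chance is a probability between 0 and 1


def should_stay_inside(temp, precip_chance):
    """Hypothetical helper wrapping the decision described above."""
    is_cold = temp < COLD_THRESHOLD_F
    likely_to_rain = precip_chance > HIGH_RAIN_CHANCE
    return is_cold and likely_to_rain


if should_stay_inside(temp=60, precip_chance=0.8):
    print("I'm staying inside")
else:
    print("I'll go out!")
```

Naming the thresholds as well is optional; it goes one step beyond what I'm proposing, but it follows the same principle.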

A real world example

Django, one of the most widely used Python web development frameworks, has a template tag called include which lets you include one template in another. It can be called with a string literal

{% include "foo/bar.html" %}

or with a string variable

{% include template_name_variable %}

Django's source for the include tag has to determine whether the argument it received is a string literal or a variable. Here's how it accomplishes this:

if path[0] in ('"', "'") and path[-1] == path[0]:
    # do stuff knowing the argument is a string literal
else:
    # do stuff knowing the argument is a variable

Source

To summarize, if the first character of the argument passed to include is equal to a single quote or a double quote, and if the first and the last character of the argument match (to ensure there isn't a single quote/double quote mismatch), assume the argument provided is a string literal.

In my opinion, the logic for how we determine if the argument is a string literal should not be within the condition itself. The developers reading the source should easily be able to tell the path the program will take if the argument is a string literal. How that is determined is irrelevant when we simply intend to trace the path of the program.

One solution might be to leave a comment:

# The argument is a string literal if the first character is a single/double
# quote, and the first and last characters of the argument match
if path[0] in ('"', "'") and path[-1] == path[0]:
    # do stuff knowing the argument is a string literal
else:
    # do stuff knowing the argument is a variable

However, to avoid the issue of excess comments simply explaining "what" the code does while still conveying the intent of the condition, we can easily adjust the code to be self-documenting:

arg_is_string_literal = path[0] in ('"', "'") and path[-1] == path[0]

if arg_is_string_literal:
    # do stuff knowing the argument is a string literal
else:
    # do stuff knowing the argument is a variable

The basic idea of "If this, do this" is much easier to follow when excess logic is extracted from the conditional statement.
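If the check is needed in more than one place, the same idea extends to a small function. The sketch below is a hypothetical helper, not part of Django, and its name is my own. It also makes one assumption the original condition does not: arguments shorter than two characters are never treated as string literals, which avoids indexing into an empty string.

```python
def is_string_literal(path):
    """Hypothetical helper (not Django's API): is path a quoted literal?"""
    # Assumption: a literal needs at least an opening and a closing quote.
    if len(path) < 2:
        return False
    starts_with_quote = path[0] in ('"', "'")
    quotes_match = path[-1] == path[0]
    return starts_with_quote and quotes_match


for arg in ('"foo/bar.html"', "template_name_variable", "\"foo/bar.html'"):
    if is_string_literal(arg):
        print(arg, "is a string literal")
    else:
        print(arg, "is a variable")
```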

Don't get crazy!

Like any programming strategy or pattern, you can take this idea too far! I'm not advocating that all conditional statements should be converted into variable expressions. Conditions involving multiple ranges of numbers, for example, are easy enough to follow. I'm simply advocating that when it's possible to easily extract logic from a conditional statement, you can often improve the readability and organization of your code by storing relational expressions in well-named variables.
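As an example of a condition I would leave alone, consider this sketch. The function name, the labels, and the 32-degree boundary are made up for illustration. Each range check already reads naturally, so pulling it into a variable would add names without adding clarity.

```python
def describe_temperature(temp):
    # Hypothetical example: the ranges speak for themselves.
    if temp < 32:
        return "freezing"
    elif 32 <= temp < 75:
        return "cold"
    else:
        return "comfortable"
```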

Tagged as opinion, style

Date published - September 18, 2014