Coupling and cohesion: Guiding principles for clear code

Introduction

Developers often encounter code that is hard to read and maintain without being able to pinpoint why. The code is poorly organized, the purpose of a module is unclear, and a change in one place causes unintended changes elsewhere.

Many structural code problems can be understood in terms of relationships between code units. There can be relationships where they shouldn’t exist (high coupling), or modules can be poorly divided so that unrelated functions are grouped together (low cohesion). High coupling leads to unintended change cascades, and low cohesion leads to incomprehensible code.

It would therefore seem that programmers should follow the common advice to “aim for low coupling and high cohesion”. But how are these concepts defined, and how are they applied in practice? Is it always a good idea to minimize coupling and maximize cohesion?

Framework: network of software elements

To define coupling and cohesion, we model the system abstractly as a network of atomic elements (such as functions) and the relationships between them, with the elements grouped into modules.

Consider an example system of six elements (e1 to e6) grouped into two modules (m1 and m2). Intuitively, e1 might be the entry point of module m1, delegating functionality to e2 and e3 through intramodule relationships. Module m2 consists of two unrelated functionalities. The first, e4, invokes e1 through an intermodule (coupling) relationship. The second consists of elements e5 and e6, which are connected by an intramodule relationship. Element e5 is also coupled to e3 in module m1.
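The example network can also be written down as plain data. The following TypeScript sketch is one possible encoding; the type and helper names (ElementId, Relationship, SoftwareModule, moduleOf, isIntermodule) are hypothetical and chosen only for illustration, and the direction of the relationship between e5 and e6 is an assumption.

```typescript
// Hypothetical types for illustration; they do not come from any library.
type ElementId = string;

interface Relationship {
  from: ElementId;
  to: ElementId;
}

interface SoftwareModule {
  name: string;
  elements: ElementId[];
}

const modules: SoftwareModule[] = [
  { name: "m1", elements: ["e1", "e2", "e3"] },
  { name: "m2", elements: ["e4", "e5", "e6"] },
];

const relationships: Relationship[] = [
  { from: "e1", to: "e2" }, // intramodule (m1)
  { from: "e1", to: "e3" }, // intramodule (m1)
  { from: "e5", to: "e6" }, // intramodule (m2); direction is an assumption
  { from: "e4", to: "e1" }, // intermodule (m2 to m1)
  { from: "e5", to: "e3" }, // intermodule (m2 to m1)
];

// Look up the module that contains a given element.
const moduleOf = (element: ElementId): string | undefined => {
  for (const mod of modules) {
    if (mod.elements.indexOf(element) !== -1) {
      return mod.name;
    }
  }
  return undefined;
};

// A relationship is intermodule when its endpoints live in different modules.
const isIntermodule = (r: Relationship): boolean =>
  moduleOf(r.from) !== moduleOf(r.to);

const intermodule = relationships.filter(isIntermodule);
console.log(intermodule.length);
```

Keeping elements, modules and relationships as separate lists makes it easy to regroup elements into different modules and see which relationships cross module boundaries.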

Coupling and cohesion

We define two distinct types of coupling, “bad” and “good”, and define cohesion using “good” coupling.

Coupling of the entire system consists of the intermodule relationships. In our example, these are the relationships from e4 to e1 and e5 to e3. This is the kind of coupling we often want to minimize.

Intramodule coupling of an individual module consists of the relationships between the elements of the module. For module m1, these are the relationships from e1 to e2 and e1 to e3. This is often a desirable type of coupling.

Cohesion of a module is defined using intramodule coupling. A maximally cohesive module has all possible intramodule relationships. To increase cohesion, we need to increase intramodule coupling. Module m1 is not maximally cohesive, because it does not have the relationship between e2 and e3. Module m2 is less cohesive than m1, because it has only one intramodule relationship.
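As a rough illustration, these definitions can be turned into numbers for the example system. The sketch below assumes a simple ratio for cohesion (existing intramodule relationships divided by all possible pairs of elements), which is a simplification made for this example and not the measure used in the referenced papers. All names in the code are hypothetical.

```typescript
// Which module each element belongs to in the example system.
const moduleOfElement: Record<string, string> = {
  e1: "m1", e2: "m1", e3: "m1",
  e4: "m2", e5: "m2", e6: "m2",
};

// Relationships as pairs; the e5/e6 pair is the single intramodule relationship of m2.
const edges: Array<[string, string]> = [
  ["e1", "e2"], ["e1", "e3"], ["e5", "e6"],
  ["e4", "e1"], ["e5", "e3"],
];

// Coupling of the entire system: number of intermodule relationships.
const systemCoupling = (): number =>
  edges.filter(([a, b]) => moduleOfElement[a] !== moduleOfElement[b]).length;

// Intramodule coupling: relationships with both endpoints inside the module.
const intramoduleCoupling = (moduleName: string): number =>
  edges.filter(
    ([a, b]) =>
      moduleOfElement[a] === moduleName && moduleOfElement[b] === moduleName
  ).length;

// Assumed cohesion measure: fraction of possible element pairs that are related.
const cohesion = (moduleName: string): number => {
  const size = Object.keys(moduleOfElement).filter(
    (e) => moduleOfElement[e] === moduleName
  ).length;
  const possiblePairs = (size * (size - 1)) / 2;
  // A module with fewer than two elements is treated as fully cohesive (assumption).
  return possiblePairs === 0 ? 1 : intramoduleCoupling(moduleName) / possiblePairs;
};

console.log(systemCoupling()); // 2 intermodule relationships
console.log(cohesion("m1"));   // 2 of 3 possible pairs
console.log(cohesion("m2"));   // 1 of 3 possible pairs
```

Under this assumed measure, m1 scores higher than m2, which matches the qualitative comparison above.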

Next, we illustrate how to use these concepts in practice.

Example: low cohesion

Codebases sometimes have a global utils module that is a drop box for miscellaneous code that doesn’t fit elsewhere. The purpose of such a module becomes difficult to understand as the codebase grows. Consider the following TypeScript example, which has routines for handling custom timestamp formats and phone numbers:

/* utils.ts */

const parseTimestamp = (timestamp: string): Date => {};
const formatTimestamp = (datetime: Date): string => {};

const isValidPhoneNumber = (phoneNumber: string): boolean => {};
const extractPhoneCountryCode = (phoneNumber: string): number => {};

We can visualize the module using functions as elements. As relationships, we use abstract themes (date handling, phone numbers) instead of code dependencies, because we don’t necessarily expect utility functions to call each other.

We can see that the module has low cohesion, because date handling functions don’t have any relationship to phone number functions.

More cohesive design

We can improve cohesion by splitting utils into dateUtils and phoneNumberUtils modules:

/* dateUtils.ts */

const parseTimestamp = (timestamp: string): Date => {};
const formatTimestamp = (datetime: Date): string => {};

/* phoneNumberUtils.ts */

const isValidPhoneNumber = (phoneNumber: string): boolean => {};
const extractCountryCode = (phoneNumber: string): number => {};

In the network representation, we can see that both modules are fully cohesive. However, this does have the drawback of increasing the total number of modules.
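The effect of the split can be sketched with the same theme-based relationships. In the code below, two functions count as related when they share a theme; the theme labels, the themeOf table and the themeCohesion helper are hypothetical, and the ratio used as a cohesion score is an assumption made for illustration.

```typescript
type Theme = "date" | "phone";

// Hypothetical theme labels, assigned by hand to each utility function.
const themeOf: Record<string, Theme> = {
  parseTimestamp: "date",
  formatTimestamp: "date",
  isValidPhoneNumber: "phone",
  extractPhoneCountryCode: "phone",
  extractCountryCode: "phone",
};

// Fraction of function pairs in a module that share a theme.
const themeCohesion = (functionNames: string[]): number => {
  let related = 0;
  let possible = 0;
  functionNames.forEach((a, i) => {
    functionNames.slice(i + 1).forEach((b) => {
      possible += 1;
      if (themeOf[a] === themeOf[b]) {
        related += 1;
      }
    });
  });
  // A module with fewer than two functions is treated as fully cohesive (assumption).
  return possible === 0 ? 1 : related / possible;
};

const utilsFunctions = [
  "parseTimestamp",
  "formatTimestamp",
  "isValidPhoneNumber",
  "extractPhoneCountryCode",
];
const dateUtilsFunctions = ["parseTimestamp", "formatTimestamp"];
const phoneNumberUtilsFunctions = ["isValidPhoneNumber", "extractCountryCode"];

console.log(themeCohesion(utilsFunctions));            // 2 related pairs out of 6
console.log(themeCohesion(dateUtilsFunctions));        // 1 related pair out of 1
console.log(themeCohesion(phoneNumberUtilsFunctions)); // 1 related pair out of 1
```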

Example: unnecessary coupling

Code sometimes has coupling that can be removed. Consider a system that handles data from Internet of Things (IoT) sensors and tracks logged-in users. We want to format timestamps from IoT sensors with millisecond precision, and the last login time of users with whole-second precision. We might implement common date handling in a dateUtils module, and separate IoT and user concerns into cohesive iotService and userService modules.

/* dateUtils.ts */

/** Format timestamp using either millisecond or second precision. (Body omitted for brevity.) */
export const formatTimestamp = (timestamp: Date, toMilli: boolean): string => {};

/* iotService.ts */

import { formatTimestamp } from './dateUtils';

const formatSensorTimestamp = (sensorTimestamp: Date): string => {
    // Millisecond precision
    return formatTimestamp(sensorTimestamp, true);
};

/* userService.ts */

import { formatTimestamp } from './dateUtils';

const formatUserLastLogin = (lastLogin: Date): string => {
    // Whole-second precision
    return formatTimestamp(lastLogin, false);
};

We model functions as elements and function calls as relationships.

This design is highly coupled. First, the service modules are directly coupled to dateUtils. Second, the dependencies from iotService and userService to dateUtils establish common coupling, because changes to the dateUtils module can affect iotService and userService. The modules iotService and userService are indirectly coupled to each other, because the author of iotService may need to modify dateUtils, which in turn can affect userService.
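The indirect coupling can be made visible by listing which modules share a dependency. The sketch below records the module-level dependencies of this example as data; the dependencies table and the dependentsOf and sharedDependencies helpers are hypothetical names introduced for illustration.

```typescript
// Module-level dependencies of the example design.
const dependencies: Record<string, string[]> = {
  iotService: ["dateUtils"],
  userService: ["dateUtils"],
  dateUtils: [],
};

// Modules that import the given module and may be affected when it changes.
const dependentsOf = (changed: string): string[] =>
  Object.keys(dependencies).filter(
    (name) => (dependencies[name] ?? []).indexOf(changed) !== -1
  );

// Dependencies that two modules have in common; a non-empty result means
// the two modules are indirectly coupled through shared code.
const sharedDependencies = (a: string, b: string): string[] =>
  (dependencies[a] ?? []).filter(
    (dep) => (dependencies[b] ?? []).indexOf(dep) !== -1
  );

console.log(dependentsOf("dateUtils"));
console.log(sharedDependencies("iotService", "userService"));
```

A change to dateUtils made on behalf of iotService shows up in the dependents list for userService as well, which is the change cascade described above.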

Eliminating coupling

The use of shared modules is often motivated by the Don’t Repeat Yourself (DRY) principle. However, sharing code increases coupling. There are two alternatives:

  1. Eliminate the dateUtils module and embed timestamp formatting independently into both iotService and userService. A small amount of code duplication (violation of the DRY principle) is often tolerable. However, if the amount of code duplication is large, increased coupling needs to be accepted.

  2. Use non-modifiable public libraries instead of modifiable custom libraries when possible. Common coupling occurs only when shared code or state can be modified by service module authors.
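Both alternatives can be combined in a small sketch: each service formats its own timestamps and relies only on the standard Date.prototype.toISOString method, which service authors cannot modify. The ISO 8601 output format is an assumption made for this example, since the article does not specify the actual timestamp format.

```typescript
/* iotService.ts */

// Millisecond precision: toISOString() always includes three millisecond
// digits, for example "2024-01-02T03:04:05.678Z".
const formatSensorTimestamp = (sensorTimestamp: Date): string =>
  sensorTimestamp.toISOString();

/* userService.ts */

// Whole-second precision: strip the ".sss" millisecond part.
// This truncates the milliseconds; it does not round to the nearest second.
const formatUserLastLogin = (lastLogin: Date): string =>
  lastLogin.toISOString().replace(/\.\d{3}Z$/, "Z");
```

The two services no longer import a common module, so a change to the sensor format cannot affect the login format. The cost is that any future formatting logic the two services have in common would need to be written twice.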

Conclusions

Coupling and cohesion help to objectively identify code quality problems. Although coupling and cohesion can be formally defined and quantified, in day-to-day software development they are best used as qualitative guiding principles. Coupling and cohesion must be balanced together with other design concerns, such as code duplication and the total number of modules.

References

Allen, E.B. & Khoshgoftaar, T.M. (1999). Measuring Coupling and Cohesion: An Information-Theory Approach. In Proceedings of the 6th International Software Metrics Symposium, Nov. 1999, Boca Raton, FL, USA.

Briand, L., Morasca, S. & Basili, V. (1996). Property-Based Software Engineering Measurement. IEEE Transactions on Software Engineering, 22(1).

 



This article is written by Senior Software Architect Kristian Ovaska.