# Composing Four-Part Harmony with Integer Programming

Music composition is often considered to be an entirely creative undertaking. However, there are a surprising number of rules and constraints, especially within Western classical music. Most rules are in some way based in the physics of sound; others are traditions and musical idioms that we have grown accustomed to over hundreds of years and have learned to expect.$\newcommand{\ca}[1]{\textbf{#1}} \newcommand{\cb}[2]{\textbf{#1}^{\textrm{#2}}} \newcommand{\cc}[3]{\textbf{#1}^{\textrm{#2}}_{\textrm{#3}}} \newcommand{\x}{$\times$}$

Many of these musical rules appear in their purest form in four-part harmony. This makes four-part harmony some of the most difficult music to write and therefore an excellent place to apply optimization techniques. By translating these musical rules into integer constraints, it is possible to build an integer programming model that writes four-part harmony on its own, or in combination with some initial melodic or harmonic constraints.

## Music Theory Background

Building chords from pitches and ensuring the smooth progression of pitches and a feasible progression of chords from one step in the music to the next is key in writing successful four-part harmony. In this section are some of the ideas that composers must keep in mind when writing four-part harmony.

### Building Chords

The major scale contains seven pitches built off the tonic pitch. On each of these pitches, one can build a chord. The chords build on notes 1 through 7 are labeled with the roman numerals $\ca{I}$ through $\cb{vii}{$\circ$}$. Capital roman numerals denote major chords, while lowercase roman numerals denote minor chords. Unlike the three-pitch triads, a $\cb{V}{7}$ chord contains four pitches.

Each of these chords comes in a couple permutations, called inversions. The bass, or lowest pitch of a chord, determines the inversion type. A chord that has its root pitch in the bass is in root position. A chord with its third in the bass is in first inversion ($^6$ or $^7_5$). A chord with its fifth in the bass is in second inversion ($^6_4$ or $^4_3$). And a seventh chord with the seventh in its bass is in third inversion ($^2$).

Because four-part harmony requires playing four pitches at the same time and almost all chords in the major scale contains only three distinct pitches, one of the pitches is often doubled.

### Voice Leading

Voice leading is the movement of notes from one chord to another within each of the four parts. While it is important that the notes of these voices form a valid chord together, the voices must also sound decent on their own and cannot interfere with each other tonally.

Generally, the four voices — the soprano, alto, tenor, and bass — must not cross or overlap, otherwise they are less distinguishable as well as harder to sing, assuming the music is written for vocalists. Each individual voice must follow their own rules. For instance, when playing the seventh scale degree, a voice must resolve upward to the tonic. Failing to do this will result in an unsatisfactory progression of sounds. Parallel fourths, fifths, and octaves are also prohibited. Fourths, fifths, and octaves are intervals, or distances between notes. Intervals of these kind are particularly noticeable when played by two voices at once, which weakens the harmonic structure of the piece and causes voices to lose their independence.

Finally, ideal voice leading moves as smoothly as possible, with the smallest allowed intervals between the pitches of each voice. While this is not a strict requirement, smooth voice leading is a highly desirable quality of four-part harmony.

### Harmonic Progressions

Although we may not notice it, almost all Western music follows the same harmonic formula. In general, harmonic progression follow this pattern:

Chords built off the third scale degree, or $\ca{iii}$ chords, tend to progress to $\ca{ii}$ or $\ca{IV}$ chords. These chords tend to move on to either a $\ca{V}$, $\cb{V}{7}$, or $\cb{vii}{$\circ$}$ chord. These are called dominant chords, and all tend to lead back to the tonic chord, $\ca{I}$. From the tonic, it is possible to progress to nearly any other chord. This is, of course, a massive but useful over-simplification of harmonic progression. The model of harmonic progressions used in the LP is more complete and handles inversions on a case-by-case basis.

## Notation

The decision variables for pitches, chords, and intervals are defined in terms of a series of sets of pitches, times, and chord classes, along with a series of mappings between chords and pitches.

### Voices, Pitches, Chords, and Time

In four-part harmony, four pitches are played at the same time, each belonging to a different voice. Let $V$ be the set denoting the four voices, the soprano, alto, tenor and bass:

\begin{equation} V = \{\text{S}, \text{A}, \text{T}, \text{B}\} \end{equation}

Each voice can play a subset of pitches belonging to a five-octave range. Let $P$ be the set of semitone pitches spanning this range, 12 semitones for each octave. In order to stay key-independent, let 0 represent the tonic.

\begin{equation}
P = \{0, 1, 2, \ldots, 59\}

\end{equation}

At each point in four-part harmony, the pitches played together form a chord. A chord class is the chord belonging to a set of pitches, disregarding the chord’s inversion. Let $C$ be the set of chord classes.

\begin{equation} C = \{\ca{I}, \ca{ii}, \ca{iii}, \ca{IV}, \ca{V}, \ca{vi}, \cb{vii}{$\circ$}, \cb{V}{7}\} \end{equation}

Finally, each set of four pitches and its corresponding chord are played at a certain time step. Let $T$ be the set of time steps:

\begin{equation} T = \{1, 2, 3,\ldots, t_{\max}\} \end{equation}

### Pitch and Chord Classes

If $p \in P$ is a pitch, let the pitch class of $p$ be:

\begin{equation} \pi(p) = \{p’ \in P : p \equiv p’ ~ (\mathrm{mod~} 12)\} \end{equation}

For example, if $p = 0$, $\pi(p)$ is the pitch class tonic pitch of the key. In the key of C, $\pi(p)$ is the set of all C keys on the piano. If $Q \subseteq P$ is a subset of pitches that are played simultaneously by the four voices, then the chord class of these pitches is:

\begin{equation} \chi(Q) = \bigcup_{p \in Q} \pi(p) \end{equation}

For example, $\chi(\{0, 4, 7\})$ are all pitches of a $\ca{I}$ chord and $\chi(\{7, 11, 14, 17\})$ are all possible pitches of a $\cb{V}{7}$. This means that $\chi(\ca{I}) = \chi(\{0, 4, 7\})$ and $\chi(\cb{V}{7}) = \chi(\{7, 11, 14, 17\})$.

### Inversions and Doubling

In this model, each chord is defined uniquely by its chord class and inversion. If $c \in C$ is a chord class, let its set of feasible inversions $\lambda( c ) \subseteq P$ be defined by the columns of the table below.

$\lambda$ | $\ca{I}$ | $\ca{ii}$ | $\ca{iii}$ | $\ca{IV}$ | $\ca{V}$ | $\ca{vi}$ | $\cb{vii}{$\circ$}$ | $\cb{V}{7}$ |
---|---|---|---|---|---|---|---|---|

1 |
× | × | × | × | × | × | × | |

3 |
× | × | × | × | × | × | × | × |

5 |
× | × | × | × | ||||

7 |
× |

The first row fixes the root of the chord as the bass (root position), the second row fixes the third (first inversion), the third row fixes the fifth (second inversion), and the fourth row fixes the seventh (third inversion). For example, $\lambda(\ca{ii}) = \chi(\{2, 5\})$, where $\pi(2)$ is the root of $\ca{ii}$ and $\pi(5)$ is the fifth.

$\delta$ | $i = 0$ | $i = 1$ | $i = 2$ | $i = 3$ |
---|---|---|---|---|

$c = \ca{I}, \ca{IV}, \ca{V}$ | 1 3 5 | 1 5 | 5 | – |

$c = \ca{ii}, \ca{iii}, \ca{vi}$ | 1 3 | 1 | – | – |

$c = \cb{vii}{$\circ$}$ | – | 3 5 | – | – |

$c = \cb{V}{7}$ | – | – | – | – |

Many chords being played contain three pitch classes, so two parts of the four-part harmony will have to play a pitch in the same pitch class. This is the doubled pitch. If $(c, i) \in C \times \lambda( C )$ is a chord defined by its chord class, $c$, and inversion, $i$, then the set of feasible doubled pitches $\delta(c, i) \subseteq P$ for this chord is defined by the table above.

For example, if $(c, i) = \cb{IV}{6}$ (a first inversion $\ca{IV}$ chord), then $c = \ca{IV}$, $i = 1$, and so $\delta(\cb{IV}{6}) = \delta(\ca{IV}, 1) = \chi(\{5, 9\})$, where 5 and 9 are the first and fifth of a $\ca{IV}$ chord.

### Harmonic Progressions

Given a chord $(c, i) \in C \times \lambda( C )$, let the set of valid progressions to other chords, $\phi(c, i) \subseteq C \times \lambda( C )$, be defined by the harmonic progression table (PDF), where the row is $(c, i)$ and the columns the set of valid harmonic progressions corresponding to $(c, i)$.

For example, $\phi(\cb{vi}{6}) = \{\ca{ii}, \cb{ii}{6}, \ca{IV}, \cb{IV}{6}, \ca{vi}\}$. This means that if the chord at time $t$ is a $\cb{vi}{6}$, the chord at time $(t+1)$ must be an element of $\phi(\cb{vi}{6})$.

## Feasibility Problem

In the feasibility problem variation of this model, the following constraints are used to find a valid four-part harmony. While a harmony generated in this way is not always the most aesthetically pleasing, it is technically correct and satisfies all part-writing rules outlined in the first section.

### Decision Variables

Let $x_{p, v}^t$ represent a pitch being played by voice $v$ with pitch $p$, on time step $t$ (all pitches are given the same length in this model).

$$x_{p, v}^t \in \{0, 1\} ~~~~~~~~~~ \forall p \in P, v \in V, t \in T $$

Let $y_{c, i, d}^t$ represent a chord of class $c$, inversion $i$ and doubled pitch $d$ being played at time $t$:

$$y_{c, i, d}^t \in \{0, 1\} ~~~~~~~~~~ \begin{array}{@{}l@{}} \forall c \in C, i \in \lambda( c ) \\ \forall d \in \delta(c, i), t \in T \end{array}$$

Finally, let $z_v^t$ be the semitone distance between the pitches of voice $v$ at times $t$ and $(t+1)$:

$$z_v^t \in \mathbb{Z} ~~~~~~~~~~ \forall v \in V, t \in (T \setminus \{t_{\max}\})$$

Together, these variables drive both the feasibility problem portion of this model, as well as the harmony optimizing portion.

### Pitch Constraints

Pitch constraints related pitches to other pitches. First, every voice must play exactly one pitch per time step.

$$\sum\limits_{p \in P} x_{p, v}^t = 1 ~~~~~~~~~~ \forall v \in V, t \in T$$

Voices cannot jump around the staves; each voice must remain within its fixed pitch range.

$$x_{p, v}^t \le \begin{cases} 1 & p \in \text{Range}(v) \\ 0 & p \not\in \text{Range}(v) \end{cases} ~~~~~~~~~~ \forall p \in P, v \in V, t \in T$$

The voices must also remain in the correct order from highest to lowest, so that the soprano voice plays a pitch higher than the alto, the alto higher than the tenor, and the tenor higher than the bass.

$$\sum\limits_{p \in P} px_{p, v}^t \ge 1 + \sum\limits_{p \in P} px_{p, v-1}^t ~~~~~~~~~~ \forall v \in \{\text{S}, \text{A}, \text{T}\}, t \in T$$

Furthermore, whenever a voice uses the seventh scale degree (the eleventh semitone pitch class), it must resolve upward to the tonic.

$$x_{p, v}^t \le x_{p+1, v}^{t+1} ~~~~~~~~~~ \forall p \in \pi(11), v \in V, t \in T$$

Finally, all parallel octaves, $\pi(0)$; fourths, $\pi(5)$; and fifths, $\pi(7)$, must be eliminated, except for parallel unisons, which are allowed. Together, these form the set of intervals $\chi(\{0, 5, 7\}) \setminus \{0\}$.

$$(x_{p+i, v}^t + x_{p, v’}^t) + (x_{p’+i, v}^{t+1} + x_{p’, v’}^{t+1}) \le 3 ~~~~ \begin{array}{@{}l@{}} \forall p, p’ \in P \\ \forall v, v’ \in V, v > v’ \\ \forall t \in (T \setminus \{t_{\max}\}) \\ \forall i \in (\chi(\{0, 5, 7\}) \setminus \{0\}) \end{array}$$

It is acceptable to have an interval within $\chi(\{0, 5, 7\}) \setminus \{0\}$ between pitches within one voice or another voice or different intervals within the set in two voices. However, this constraint prevents two voices from sharing the same interval from the interval set — the definition of a parallel interval.

### Harmony Constraints

Harmony constraints relate chords to other chords. Similar to pitches, only one chord can be played at a given time step.

$$\sum\limits_{\substack{ c \in C, i \in \lambda( c ) \\ d \in \delta(c, i) }} y_{c, i, d}^t = 1 ~~~~~~~~~~ \forall t \in T$$

Most importantly, a chord at time step $t$ must progress to a feasible chord at time step $(t+1)$.

$$y_{c, i, d}^t \le \sum\limits_{\substack{ (c’, i’) \in \phi(c, i) \\ d’ \in \delta(c’, i’) }} y_{c’, i’, d’}^{t+1} ~~~~~~~~~~ \begin{array}{@{}l@{}} \forall c \in C, i \in \lambda( c ) \\ \forall d \in \delta(c, i) \\ \forall t \in (T \setminus \{t_{\max}\}) \end{array}$$

This forces chords to follow a sequence described in the harmonic progression table (PDF).

### Pitch-Harmony Constraints

Pitch-harmony constraints relate pitches and harmonies, effectively glueing together the two parts of the model. If a certain chord is selected to be played at a certain time step, the pitches of all voices during that time step must belong to the chord’s chord class.

$$\sum\limits_{\substack{ i \in \lambda( c ) \\ d \in \delta(c, i) }} y_{c, i, d}^t \le \sum\limits_{\substack{ p’ \in \pi(p) \\ v \in V }} x_{p’, v}^t ~~~~~~~~~~ \begin{array}{@{}l@{}} \forall c \in C, p \in \chi( c )\\ \forall t \in T \end{array}$$

It is also important that the pitches being played correctly represent the chord’s inversion; that is, the bass corresponds to the correct pitch of the bass for the inversion.

$$y_{c, i, d}^t \le \sum\limits_{p \in \pi(d)} x_{p, \text{B}}^t ~~~~~~~~~~ \begin{array}{@{}l@{}} \forall c \in C, i \in \lambda( c ) \\ \forall d \in \delta(c, i), t \in T \end{array}$$

Finally, a chord may only be selected to be played at time $t$ if its doubled pitch matches the doubled pitch of the pitches being played at time $t$.

$$2 y_{c, i, d}^t \le \sum\limits_{\substack{ p \in \pi(d) \\ v \in V }} x_{p, v}^t ~~~~~~~~~~ \begin{array}{@{}l@{}} \forall c \in C, i \in \lambda( c ) \\ \forall d \in \delta(c, i), t \in T \end{array}$$

Together, these constraints ensure that any chord selected to be played at a given time must represent the pitches being played at that time.

### Interval Constraints

Measuring the intervals between two pitches of a voice becomes important when trying to find an optimal four-part harmony in addition to forcing some amount of global smoothness. An interval is the absolute value between two pitches at consecutive time steps. This can be easily linearized:

$$z_v^t \ge \sum\limits_{p \in P} p\left( x_{p, v}^t - x_{p, v}^{t+1} \right) ~~~~~~~~~~ \forall v \in V, t \in (T \setminus \{t_{\max}\})$$ $$z_v^t \ge \sum\limits_{p \in P} p\left( x_{p, v}^{t+1} - x_{p, v}^t \right) ~~~~~~~~~~ \forall v \in V, t \in (T \setminus \{t_{\max}\})$$

Finally, it is often helpful to limit intervals to some maximum number of semitones, $z_{\max}$.

$$z_v^t \le z_{\max} ~~~~~~~~~~ \forall v \in V, t \in (T \setminus \{t_{\max}\})$$

The examples below set $z_{\max} = 6$, although other values may produce better results depending on the input constraints.

## Optimizing Harmony

Determining which four-part harmony sounds better is, of course, a question of aesthetics. However, one universally desirable feature of four-part harmony is smooth voice leading. This is achieved by reducing the motion within each of the parts.

$$\text{minimize} ~~~~~ \frac{1}{4(t_{\max}-1)} \sum\limits_{v \in V, t \in T} z_v^t$$

The constant in front of the summation normalizes the value of the objective function with respect to the length of the four-part harmony, $t_{\max}$, so that the value of the objective function equals the average interval between two consecutive pitches of all parts.

## Examples

The two examples show possible starting conditions for the IP: a complete melody, and a partial harmony. It is also possible to combine a partial part with a partial harmonic progression. Of course, when giving the IP a starting condition, it is the responsibility of the composer to check whether the starting condition is, itself, feasible.

### Harmonizing a Melody

In this example, the gray notes of the soprano line are fixed to the melody “Twinkle Twinkle,” while the remaining parts and chords are left unconstrained. Note that doubled eighth notes were converted to quarter notes within the model to match the desired harmonic pace. The IP model was left responsible for filling in the alto, tenor, and bass parts as well as determining which chords should be played.

As is the case with highly constrained initial conditions, the example can be solved to optimality in seconds with an objective function value of 1.5357. This means the average interval of two consecutive pitches — fixed melody included — is less than a major second. It is interesting to note that, except for the odd $\cc{IV}{6}{4}$ at the beginning, the optimal harmonization is fairly similar to the expected one.

### Voicing a Harmonic Progression

The last example had a rather boring harmonization (though this is to be expected when given a simple melody) Here, a more interesting partial harmonic progression (gray) was chosen as a starting condition of the form:

$$\ast ~ \to ~ \ast ~ \to ~ \ca{vi} ~ \to ~ \ast ~ \to ~ \ast ~ \to ~ \cc{I}{6}{4} ~ \to ~ \ast ~ \to ~ \ca{I}$$

Any chord is allowed to take the place of each $\ast$ as long as it satisfies the remaining harmonic progression and produces feasible voice leading. No constraint was placed on the first chord (typically, human-composed music starts swith some inversion of $\ca{I}$), so the optimal harmonization starts on a $\cc{V}{7}{5}$. This example was solved to optimality within seconds with an objective function value of 1.7143.

## Conclusion

The rules of four-part harmony can be broken down into a series of constraints: pitch constraints that relate pitches to each other, harmony constraints that relate chords to themselves and consecutive chords, and pitch-harmony constraints that build chords up from their component pitches.

Using these constraints to determine feasible four-part harmonies while maximizing smooth voice leading by reducing intervals between successive pitches, it is possible to generate a four-part harmony that can reasonably be considered “optimal.”

The code used to generate the examples is available at: