# Hamming distance

The Hamming distance is named after the late American mathematician Richard Hamming. Given two sequences of equal length, the Hamming distance is the minimum number of symbols that must be substituted to transform one sequence into the other. As such, the Hamming distance is a measure of dissimilarity, or more formally, an edit distance between two sequences.

Given two zero-indexed sequences $a$ and $b$ of length $3$:

```
a = [1, 0, 1]
b = [1, 0, 0]
```

We can see intuitively that to transform $b$ into $a$ we must substitute $b_{2}$ with the symbol $1$. Likewise, to transform $a$ into $b$ we must substitute $a_{2}$ with the symbol $0$. It follows that the Hamming distance $hamming(a, b)$ is $1$, as this is the minimum number of substitutions required to transform $a$ into $b$ or $b$ into $a$.

It is trivial to implement a Hamming distance function in code:

```
def hamming(a, b):
    assert len(a) == len(b), "Undefined for iterables of different length."
    return sum(i != j for i, j in zip(a, b))
```

```
hamming(a, b)
```

Of course, attempting to calculate the Hamming distance of $a$ and a new sequence $c$ of length $2$ results in an `AssertionError`:

```
c = [0, 1]
```

```
hamming(a, c)
```

Sequences need not be numeric:

```
x = 'Hamming'
y = 'Hammond'
hamming(x, y)
```
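For binary data there is a well-known shortcut: the Hamming distance between two bit patterns is the population count (number of set bits) of their bitwise XOR. A minimal sketch, using a hypothetical helper name `hamming_int` not defined above:

```
def hamming_int(a, b):
    # bits that differ survive the XOR; count them
    return bin(a ^ b).count('1')

hamming_int(0b101, 0b100)  # the sequences a and b from above, as bits -> 1
```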

The Hamming distance can also be useful for defining neighbourhood bounds for metaheuristic local search. Consider the following trivial 0-1 investment problem:

$$
\begin{align*}
\min \quad & z = \sum_{i=1}^{5} c_{i}y_{i} \\
\text{s.t.} \quad & \sum_{i=1}^{5} y_{i} > 2 \\
& y_{i} \in \{0, 1\}
\end{align*}
$$

Where $c_{i}$ is the cost of making investment $i$.

A metaheuristic (albeit poor) approach to solving the problem might be to begin with the solution vector $y = [1, 1, 1, 1, 1]$, and to randomly explore solutions within Hamming distance $h$ of the incumbent solution until some stopping condition is met. For example, let $h = 2$ and the stopping condition be 2 iterations without improvement. In this case we randomly explore the neighbourhood bounded by 2 substitutions of our incumbent solution. Our local search might proceed as follows:
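The bounded neighbourhood itself is easy to enumerate: every vector within Hamming distance $h$ of the incumbent is obtained by flipping between $1$ and $h$ positions. A sketch, using a hypothetical helper name `neighbourhood` not used elsewhere in this post:

```
from itertools import combinations

def neighbourhood(y, h):
    # yield every vector obtained by flipping 1..h positions of y
    for k in range(1, h + 1):
        for idxs in combinations(range(len(y)), k):
            neighbour = y[:]
            for i in idxs:
                neighbour[i] = 1 - neighbour[i]
            yield neighbour

len(list(neighbourhood([1, 1, 1, 1, 1], 2)))  # C(5,1) + C(5,2) = 15
```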

Iteration 0: $y = [1, 1, 1, 1, 1], z = 15$

Iteration 1: $y = [0, 1, 1, 1, 1], z = 14$ (new incumbent)

Iteration 1: $y = [1, 1, 1, 0, 1], z = 11$ (new incumbent)

Iteration 1: $y = [1, 0, 1, 1, 1], z = 13$

Iteration 2: $y = [1, 1, 0, 0, 0], z = 3$ (infeasible)

*Solution: $y = [1, 1, 1, 0, 1], z = 11$*

If you guessed the cost vector $c$, you would know this isn't the best heuristic!
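With only $2^{5} = 32$ candidate vectors, the problem is small enough to brute-force, which confirms the local search stopped short of the optimum:

```
from itertools import product

c = [1, 2, 3, 4, 5]

# enumerate all binary vectors, keep the cheapest feasible one
best = min(
    (y for y in product([0, 1], repeat=5) if sum(y) > 2),
    key=lambda y: sum(ci * yi for ci, yi in zip(c, y)),
)
best, sum(ci * yi for ci, yi in zip(c, best))  # (1, 1, 1, 0, 0), z = 6
```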

Now for the code:

```
import numpy as np

def local_search(c, y, h):
    # randomly choose between 1 and h decision variables to flip
    subs = np.random.choice(len(y), np.random.randint(1, h + 1), replace=False)
    for i in subs:
        y[i] = int(not y[i])  # flippity flip
    return y, np.dot(c, y)

c = [1, 2, 3, 4, 5]  # cost vector
y = [1, 1, 1, 1, 1]  # solution vector
z = np.dot(c, y)  # objective function value
i = 0  # iteration count
print('Iteration %d: y = %s, z = %d' % (i, y, z))
while i < 2:  # stopping condition
    i += 1
    y_hat, z_hat = local_search(c, y[:], 2)
    if sum(y_hat) < 3:  # check feasibility
        print('Iteration %d: y = %s, z = %d, infeasible' % (i, y_hat, z_hat))
    elif z_hat < z:  # we found a new incumbent
        print('Iteration %d: y = %s, z = %d, new incumbent' % (i, y_hat, z_hat))
        y = y_hat
        z = z_hat
        i = 0
    else:
        print('Iteration %d: y = %s, z = %d' % (i, y_hat, z_hat))
print('\nSolution: y = %s, z = %d' % (y, z))

That's all folks!