Hamming distance

Hamming distance

The Hamming distance is named after the late American mathematician Richard Hamming. Given two sequences of equal length, the Hamming distance measures the minimum number of symbols that must be substituted to transform one string into the other. As such, the Hamming distance is a measure of similarity, or more formally, a measure of the edit distance between two sequences.

Given two zero indexed sequences $a$ and $b$ of length $3$:

In [1]:
a = [1, 0, 1]
b = [1, 0, 0]

We can see intuitively see that to transform $b$ into $a$ we must substitute $b_{2}$ with the symbol $1$. Likewise, to transform $a$ into $b$ we must substitute $a_{2}$ with the symbol $0$. It follows, that the Hamming distance $hamming(a, b)$ is $1$, as this is the minimum number of substitutions to transform $a$ into $b$ or $b$ into $a$.

It is trivial to implement a Hamming distance function in code:

In [2]:
def hamming(a, b):
    assert len(a) == len(b), "Undefined for iterables of different length."
    return sum([i != j for i, j in zip(a, b)])
In [3]:
hamming(a, b) 
Out[3]:
1

Of course attempting to calculate the Hamming distance of $a$ and a new sequence $c$ having length $2$ results in an exception:

In [4]:
c = [0, 1]
In [5]:
hamming(a, c)
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-5-40a6cb906c2e> in <module>()
----> 1 hamming(a, c)

<ipython-input-2-bde40aad7275> in hamming(a, b)
      1 def hamming(a, b):
----> 2     assert len(a) == len(b), "Undefined for iterables of different length."
      3     return sum([i != j for i, j in zip(a, b)])

AssertionError: Undefined for iterables of different length.

Sequences need not be numeric:

In [6]:
x = 'Hamming'
y = 'Hammond'
hamming(x, y)
Out[6]:
2

The Hamming distance can also be useful for defining neighbourhood bounds for metaheurstic local search. Consider the following trivial 0-1 investment problem:

$ \begin{align*} min: &\quad z = \sum_{i=1}^{5} c_{i}y_{i} \\ st: &\quad\\ &\quad\sum_{i=1}^{5} y_{i} > 2 \\ & \quad y \in\{0,1\} \end{align*} $

Where $c_{i}$ is the cost of making investment $y_{i}$.

A metaheuristic (albeit poor) approach to solving the problem might be to begin the solution vector $y = [1, 1, 1, 1, 1]$, and to randomly explore solutions within Hamming distance $h$ of the incumbent solution until some stopping condition is met. For example, let $h = 2$ and the stopping condition be 2 iterations. In this case we randomly explore the neighbourhood bounded by 2 substitutions of our incumbent solution. Our local search might proceed as follows:


Iteration 0: $y = [1, 1, 1, 1, 1], z = 15$

Iteration 1: $y = [0, 1, 1, 1, 1], z = 14$ (new incumbent)

Iteration 1: $y = [1, 1, 1, 0, 1], z = 11$ (new incumbent)

Iteration 1: $y = [1, 0, 1, 1, 1], z = 13$

Iteration 2: $y = [1, 1, 0, 0, 0], z = 3$ (infeasible)


Solution: $y = [1, 1, 1, 0, 1], z = 11$


If you can guess the cost vector $c$ you'd know this isn't the best heuristic!

Now for the code:

In [7]:
import numpy as np

def local_search(c, y, h):
    #randomly choose between 1 and d decision variables to flip
    subs = np.random.choice(len(y), np.random.randint(1, h+1), False) 
    for i in subs:
        y[i] =  int(not y[i]) #flippity flip
    return y, np.dot(c, y)

c = [1, 2, 3, 4, 5] #cost vector
y = [1, 1, 1, 1, 1] #solution vector
z = np.dot(c, y)    #objective function value

i = 0 #iteration count
print('Iteration %d: y = %s, z = %d' % (i, y, z))
while i < 2: #stopping condition
    i += 1
    y_hat, z_hat = local_search(c, y[:], 2)
    if sum(y_hat) < 3: #check feasibility
        print('Iteration %d: y = %s, z = %d, infeasible' % (i, y_hat, z_hat))
    elif z_hat < z: #we found a new incumbent
        print('Iteration %d: y = %s, z = %d, new incumbent' % (i, y_hat, z_hat))
        y = y_hat
        z = z_hat
        i = 0
    else:
        print('Iteration %d: y = %s, z = %d' % (i, y_hat, z_hat))

print('\nSolution: y = %s, z = %d' % (y, z))
Iteration 0: y = [1, 1, 1, 1, 1], z = 15
Iteration 1: y = [1, 0, 1, 0, 1], z = 9, new incumbent
Iteration 1: y = [1, 0, 1, 1, 1], z = 13
Iteration 2: y = [1, 0, 0, 0, 1], z = 6, infeasible

Solution: y = [1, 0, 1, 0, 1], z = 9

That's all folks!