# Clustering: Computing the Pairwise Distance Matrix

## Learn how to compute an MPDist-based pairwise distance matrix for clustering.

This short code tutorial shows how to compute an MPDist-based pairwise distance matrix. The result can be passed to any clustering algorithm that accepts a precomputed distance matrix.

In [1]:
from matrixprofile.algorithms.hierarchical_clustering import pairwise_dist
import numpy as np

In [2]:
%pdoc pairwise_dist

Class docstring:
Utility function to compute all pairwise distances between the timeseries
using MPDist.

Note
----
scipy.spatial.distance.pdist cannot be used because they
do not allow for jagged arrays, however their code was used as a reference
in creating this function.
https://github.com/scipy/scipy/blob/master/scipy/spatial/distance.py#L2039

Parameters
----------
X : array_like
An array_like object containing time series to compute distances for.
window_size : int
The window size to use in computing the MPDist.
threshold : float
The threshold used to compute MPDist.
n_jobs : int
Number of CPU cores to use during computation.

Returns
-------
Y : np.ndarray
Returns a condensed distance matrix Y.  For
each :math:`i` and :math:`j` (where :math:`i<j<m`),where m is the
number of original observations. The metric dist(u=X[i], v=X[j])
is computed and stored in entry ij.
Call docstring:
Call self as a function.

`pairwise_dist` returns a condensed distance matrix: a flat array holding one MPDist value for each unordered pair of time series, so n series yield n(n-1)/2 entries. The example below computes it for a handful of randomly generated time series.

In [3]:
# generate 5 random time series

data = []
size = 100

for _ in range(5):
    data.append(np.random.uniform(size=size))

In [4]:
window_size = 8
n_jobs = 4

distance_matrix = pairwise_dist(data, window_size=window_size, n_jobs=n_jobs)

In [5]:
distance_matrix

Out[5]:
array([1.2334854 , 1.13236744, 1.124416  , 1.17065294, 1.14144607,
       1.2107359 , 1.08488366, 1.09598017, 0.98853814, 0.98214056])
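
The five series above produce ten values, one per pair. As a rough sketch of how that flat array is laid out, the snippet below maps a pair `(i, j)` to its position in the condensed vector and checks the mapping against SciPy's `squareform`. The helper `condensed_index` is a hypothetical name introduced here for illustration, and the array values are placeholders rather than MPDist output; the index formula is the one SciPy documents for its condensed distance matrices.

```python
import numpy as np
from scipy.spatial.distance import squareform


def condensed_index(n, i, j):
    """Position of the pair (i, j), with i < j < n, in a condensed distance vector.

    Hypothetical helper for illustration; not part of matrixprofile.
    """
    return n * i + j - ((i + 2) * (i + 1)) // 2


n = 5
# Placeholder values, one per unordered pair; not real MPDist distances.
condensed = np.arange(1.0, n * (n - 1) // 2 + 1)
square = squareform(condensed)

# Every upper-triangle entry of the square matrix appears exactly once,
# row by row, in the condensed vector.
for i in range(n):
    for j in range(i + 1, n):
        assert condensed[condensed_index(n, i, j)] == square[i, j]
```

Storing only the upper triangle avoids duplicating each distance and the zero diagonal, which is why many SciPy routines accept this form directly.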

### Converting to Square Form

Some clustering algorithms require a square distance matrix. SciPy's `squareform` converts the condensed vector into the equivalent symmetric matrix, with zeros on the diagonal.

In [6]:
from scipy.spatial.distance import squareform

In [7]:
square_distance_matrix = squareform(distance_matrix)

In [8]:
square_distance_matrix

Out[8]:
array([[0.        , 1.2334854 , 1.13236744, 1.124416  , 1.17065294],
       [1.2334854 , 0.        , 1.14144607, 1.2107359 , 1.08488366],
       [1.13236744, 1.14144607, 0.        , 1.09598017, 0.98853814],
       [1.124416  , 1.2107359 , 1.09598017, 0.        , 0.98214056],
       [1.17065294, 1.08488366, 0.98853814, 0.98214056, 0.        ]])
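
To illustrate one way the matrix might be used, here is a minimal sketch that feeds the distances into SciPy's hierarchical clustering. It assumes the square matrix from `Out[8]`, converts it back to condensed form with `squareform` (which works in both directions), and then calls `linkage` and `fcluster`. Average linkage and a cut at two clusters are arbitrary choices made for this example, and since the series are random noise the resulting labels carry no real meaning.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

# Square distance matrix copied from Out[8].
square = np.array([
    [0.0, 1.2334854, 1.13236744, 1.124416, 1.17065294],
    [1.2334854, 0.0, 1.14144607, 1.2107359, 1.08488366],
    [1.13236744, 1.14144607, 0.0, 1.09598017, 0.98853814],
    [1.124416, 1.2107359, 1.09598017, 0.0, 0.98214056],
    [1.17065294, 1.08488366, 0.98853814, 0.98214056, 0.0],
])

# squareform also converts square -> condensed, the form linkage expects.
condensed = squareform(square)

# Build the merge tree; "average" linkage is an assumption for this sketch.
Z = linkage(condensed, method="average")

# Cut the tree into at most two flat clusters (also an arbitrary choice).
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)
```

Passing the condensed form matters here: `linkage` treats a 2-D input as raw observations rather than distances, so a square distance matrix should be converted first.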