# HDBScan Clustering with MPDist

## Learn how to use the MPDist metric with the HDBScan clustering algorithm.

HDBScan is a newer clustering algorithm that combines concepts from hierarchical clustering and DBScan. For more details about the algorithm, see the following paper: https://arxiv.org/abs/1911.02282

This notebook illustrates how to use MPDist with HDBScan. The HDBScan implementation used here is documented at https://hdbscan.readthedocs.io/en/latest/

This example is intentionally simple. Uniformly random and incrementing time series are generated to illustrate the implementation.

In [1]:

```
import hdbscan
from matrixprofile.algorithms.hierarchical_clustering import pairwise_dist
from scipy.spatial.distance import squareform
import numpy as np
```

In [2]:

```
np.random.seed(9999)
```

In [3]:

```
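data = []
size = 100

# Five identical copies of one uniformly random series.
random_ts = np.random.uniform(size=size)
for _ in range(5):
    data.append(np.copy(random_ts))

# Three identical incrementing series (0, 1, ..., 99).
data.append(np.arange(100))
data.append(np.arange(100))
data.append(np.arange(100))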

In [4]:

```
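window_size = 8  # subsequence length used when comparing series
n_jobs = 4  # number of parallel jobs
# MPDist between every pair of series in data
distance_matrix = pairwise_dist(data, window_size=window_size, n_jobs=n_jobs)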
```

In [5]:

```
square_distance_matrix = squareform(distance_matrix)
```
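The conversion in the cell above can be made concrete with a small sketch. It assumes, as the call to `squareform` suggests, that `pairwise_dist` returns the distances in condensed form: a flat vector holding the upper triangle of the distance matrix row by row. The helper `condensed_to_square` and the toy distances are hypothetical and written only for illustration; the notebook itself relies on `scipy.spatial.distance.squareform`.

```python
def condensed_to_square(condensed, n):
    """Expand a condensed distance vector into an n x n symmetric matrix.

    The condensed vector is assumed to list the upper triangle row by row:
    d(0,1), d(0,2), ..., d(0,n-1), d(1,2), ..., d(n-2,n-1).
    """
    expected = n * (n - 1) // 2
    if len(condensed) != expected:
        raise ValueError(
            f"expected {expected} distances for n={n}, got {len(condensed)}"
        )

    # Start from all zeros so the diagonal (distance to self) stays 0.
    square = [[0.0] * n for _ in range(n)]
    k = 0
    for i in range(n):
        for j in range(i + 1, n):
            # Mirror each distance across the diagonal.
            square[i][j] = condensed[k]
            square[j][i] = condensed[k]
            k += 1
    return square


# Made-up distances for 4 series (6 pairs); not output from this notebook.
toy_condensed = [0.1, 0.9, 0.8, 0.95, 0.85, 0.2]
toy_square = condensed_to_square(toy_condensed, 4)

# With the 8 series built earlier there would be 8 * 7 / 2 = 28 distances.
pairs_for_eight_series = 8 * 7 // 2
```

HDBScan with `metric='precomputed'` is given the square form, which is why the condensed vector is expanded before clustering.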

In [6]:

```
clusterer = hdbscan.HDBSCAN(metric='precomputed', min_cluster_size=2)
clusterer.fit(square_distance_matrix)
clusterer.labels_
```

Out[6]:
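`clusterer.labels_` holds one integer label per input series, in the same order as `data`; the hdbscan library uses `-1` for points it treats as noise. As a minimal sketch of how such labels could be turned into groups of series indices, the hypothetical helper `group_by_label` below works on any sequence of labels. The example labels are made up for illustration and are not the output of this notebook.

```python
from collections import defaultdict


def group_by_label(labels):
    """Map each cluster label to the list of series indices carrying it."""
    groups = defaultdict(list)
    for index, label in enumerate(labels):
        groups[int(label)].append(index)
    return dict(groups)


# Made-up labels for six series; -1 marks a series treated as noise.
example_labels = [1, 1, -1, 0, 0, 1]
example_groups = group_by_label(example_labels)

# Report each cluster, listing noise (label -1) first.
for label in sorted(example_groups):
    name = "noise" if label == -1 else f"cluster {label}"
    print(name, example_groups[label])
```

In the notebook, the same helper could be called as `group_by_label(clusterer.labels_)` to see which series were clustered together.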
