HDBScan Clustering with MPDist

Learn how to use the MPDist metric with the HDBScan clustering algorithm.

HDBScan is a newer clustering algorithm that combines concepts from hierarchical clustering and DBScan. For more details about the algorithm, see the following paper: https://arxiv.org/abs/1911.02282

This notebook illustrates how to use MPDist with HDBScan. The HDBScan implementation used here is documented at https://hdbscan.readthedocs.io/en/latest/

This example is deliberately simple. Two kinds of time series are generated to illustrate the implementation: a uniform random series, copied five times, and a linearly increasing series, repeated three times.

In [1]:
import hdbscan
from matrixprofile.algorithms.hierarchical_clustering import pairwise_dist
from scipy.spatial.distance import squareform

import numpy as np
In [2]:
np.random.seed(9999)
In [3]:
data = []
size = 100

# One uniform random series, copied five times (first group)
random_ts = np.random.uniform(size=size)

for _ in range(5):
    data.append(np.copy(random_ts))

# One linearly increasing series, repeated three times (second group)
for _ in range(3):
    data.append(np.arange(size))
In [4]:
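window_size = 8  # window size passed to the MPDist computation
n_jobs = 4       # number of CPU cores to use

# Compute the MPDist between every pair of time series
distance_matrix = pairwise_dist(data, window_size=window_size, n_jobs=n_jobs)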
In [5]:
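# The pairwise distances are in condensed (vector) form; convert them to the
# square matrix that metric='precomputed' expects
square_distance_matrix = squareform(distance_matrix)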
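
To make the squareform step concrete, here is a minimal sketch on a hand-made condensed vector for four items. The distance values are invented purely for illustration and have nothing to do with MPDist; the point is only to show how the condensed layout maps onto a symmetric square matrix with a zero diagonal.

```python
import numpy as np
from scipy.spatial.distance import squareform

# Hypothetical condensed distances for 4 items, listed row by row over the
# upper triangle: d(0,1), d(0,2), d(0,3), d(1,2), d(1,3), d(2,3)
condensed = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

# Expand to a 4 x 4 symmetric matrix with zeros on the diagonal
square = squareform(condensed)

print(square.shape)
print(square)
```

For n items the condensed vector holds n * (n - 1) / 2 entries, so the eight series in this notebook yield 28 pairwise distances.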
In [6]:
clusterer = hdbscan.HDBSCAN(metric='precomputed', min_cluster_size=2)
clusterer.fit(square_distance_matrix)
clusterer.labels_
Out[6]:
array([0, 0, 0, 0, 0, 1, 1, 1])

As expected, the five copies of the random series are clustered together (label 0), and the three linearly increasing series form a second cluster (label 1).
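
When there are more series, it can be easier to read the result as a mapping from cluster label to series indices. The following is a small sketch that assumes the labels shown above and uses only NumPy; the `clusters` name is introduced here for illustration. Note that hdbscan assigns the label -1 to points it treats as noise, although none appear in this example.

```python
import numpy as np

# Labels copied from the output above; hdbscan would use -1 for noise points
labels = np.array([0, 0, 0, 0, 0, 1, 1, 1])

# Map each cluster label to the indices of the time series assigned to it
clusters = {
    int(label): np.flatnonzero(labels == label).tolist()
    for label in np.unique(labels)
}

print(clusters)  # {0: [0, 1, 2, 3, 4], 1: [5, 6, 7]}
```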
