HDBSCAN Clustering with MPDist
Learn how to use the MPDist metric with the HDBSCAN clustering algorithm.
HDBSCAN is a newer clustering algorithm that combines concepts from hierarchical clustering and DBSCAN. For more details about the algorithm, see the following paper: https://arxiv.org/abs/1911.02282
This notebook illustrates how to use MPDist with HDBSCAN. The specific HDBSCAN implementation used here is documented at https://hdbscan.readthedocs.io/en/latest/
This example is deliberately simple. A random time series (uniform noise, rather than a true random walk) and an incrementing time series are generated to illustrate the implementation.
In [1]:
import hdbscan
from matrixprofile.algorithms.hierarchical_clustering import pairwise_dist
from scipy.spatial.distance import squareform
import numpy as np
In [2]:
np.random.seed(9999)
In [3]:
data = []
size = 100

# Five identical copies of a uniform random series
random_ts = np.random.uniform(size=size)
for _ in range(5):
    data.append(np.copy(random_ts))

# Three identical incrementing series
for _ in range(3):
    data.append(np.arange(size))
In [4]:
window_size = 8
n_jobs = 4
distance_matrix = pairwise_dist(data, window_size=window_size, n_jobs=n_jobs)
In [5]:
square_distance_matrix = squareform(distance_matrix)
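The conversion in this cell can be illustrated on a tiny input. The sketch below assumes that pairwise_dist returns a condensed distance vector (the upper triangle of the pairwise distance matrix flattened row by row), which is what the squareform call suggests. The four-item distances are hypothetical values chosen for illustration and are not MPDist results.

```python
import numpy as np
from scipy.spatial.distance import squareform

# Hypothetical condensed distances for 4 items, in the order
# (0,1), (0,2), (0,3), (1,2), (1,3), (2,3).
condensed = np.array([0.0, 2.0, 2.0, 2.0, 2.0, 0.0])

# Expand the condensed vector into a full square matrix.
square = squareform(condensed)
n = square.shape[0]

# A condensed vector for n items holds n * (n - 1) / 2 entries.
assert condensed.size == n * (n - 1) // 2

# The square form is symmetric with a zero diagonal, which is the
# layout a precomputed-distance clusterer expects.
assert np.allclose(square, square.T)
assert np.allclose(np.diag(square), 0.0)
```

For the 8 time series in this notebook, the same relationship would give a condensed vector of 28 entries and an 8 x 8 square matrix.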
In [6]:
clusterer = hdbscan.HDBSCAN(metric='precomputed', min_cluster_size=2)
clusterer.fit(square_distance_matrix)
clusterer.labels_
Out[6]:
Here we see that the first 5 time series are clustered together and the last 3 are clustered together, as expected.
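To make the grouping explicit, the labels can be turned into a mapping from cluster label to the indices of the time series in that cluster. This is a minimal sketch: group_by_label is a hypothetical helper, and example_labels is an assumed label array consistent with the description above, not the notebook's recorded output. HDBSCAN marks noise points with the label -1, so those are read out separately.

```python
from collections import defaultdict

# Assumed labels for illustration only; the actual cluster ids
# produced by the notebook may differ.
example_labels = [0, 0, 0, 0, 0, 1, 1, 1]


def group_by_label(labels):
    """Map each cluster label to the list of item indices carrying it."""
    groups = defaultdict(list)
    for index, label in enumerate(labels):
        groups[int(label)].append(index)
    return dict(groups)


groups = group_by_label(example_labels)

# Points labelled -1 were not assigned to any cluster.
noise = groups.get(-1, [])
```

In the notebook itself, the same helper could be called as group_by_label(clusterer.labels_).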