HDBSCAN is a relatively recent clustering algorithm that combines ideas from hierarchical clustering and DBSCAN. See the following paper for more details about the algorithm: https://arxiv.org/abs/1911.02282
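HDBSCAN's hierarchical stage starts from a density-adjusted "mutual reachability" distance, max(core(a), core(b), d(a, b)), where core(x) is the distance from x to its min_samples-th nearest neighbour, and then builds a single-linkage (minimum spanning tree) hierarchy on it. The sketch below shows only that first stage with NumPy and SciPy. The helper names mutual_reachability and mst_hierarchy are hypothetical, the sketch counts a point as its own first neighbour (implementations differ on this detail), and cutting the tree with fcluster is a stand-in for the condensed-tree, stability-based cluster selection that the hdbscan library actually performs.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def mutual_reachability(dist, min_samples=2):
    """Mutual reachability distances from a square, symmetric distance matrix.

    Assumption of this sketch: core[i] is the min_samples-th smallest entry in
    row i, so the point itself (distance 0) counts as its first neighbour.
    """
    dist = np.asarray(dist, dtype=float)
    core = np.sort(dist, axis=1)[:, min_samples - 1]
    # max(core_a, core_b, d(a, b)) for every pair (a, b).
    mreach = np.maximum(dist, np.maximum.outer(core, core))
    np.fill_diagonal(mreach, 0.0)
    return mreach


def mst_hierarchy(dist, min_samples=2):
    """Single-linkage hierarchy built on mutual reachability distances."""
    mreach = mutual_reachability(dist, min_samples)
    # linkage() expects a condensed distance vector, not a square matrix.
    return linkage(squareform(mreach, checks=False), method="single")


# Toy 1-D data: two well-separated groups of three points.
points = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.3])
dist = np.abs(points[:, None] - points[None, :])

Z = mst_hierarchy(dist, min_samples=2)
# Stand-in for HDBSCAN's cluster extraction: cut into two flat clusters.
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)
```

The last merge in Z joins the two groups at height 4.8 (the gap between 0.2 and 5.0), which is why a two-cluster cut separates them.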
This notebook illustrates how to use MPDist with HDBSCAN. It uses the hdbscan Python package, documented here: https://hdbscan.readthedocs.io/en/latest/
The example is deliberately simple: it generates five identical copies of a uniformly random time series and three identical linearly increasing time series, so the correct clustering is known in advance.
import hdbscan
from matrixprofile.algorithms.hierarchical_clustering import pairwise_dist
from scipy.spatial.distance import squareform
import numpy as np
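data = []
size = 100

# Five identical copies of one uniformly random series (expected cluster A).
random_ts = np.random.uniform(size=size)
for _ in range(5):
    data.append(np.copy(random_ts))

# Three identical linearly increasing series (expected cluster B).
for _ in range(3):
    data.append(np.arange(size))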
# MPDist between every pair of series, using subsequences of length 8.
window_size = 8
n_jobs = 4
distance_matrix = pairwise_dist(data, window_size=window_size, n_jobs=n_jobs)
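pairwise_dist computes an MPDist value for every pair of series and, judging by the squareform call that follows, returns it in SciPy's condensed (pdist-style) layout. To make the measure concrete, here is a brute-force NumPy sketch of MPDist: z-normalize every length-w window, join each series' windows against the other's, pool the two join profiles, and take a small order statistic. It is not the matrixprofile implementation (which relies on much faster matrix profile algorithms). The function names, the eps guard for constant windows, the default threshold of 0.05, and the choice of the k-th smallest value with k = ceil(threshold * (len(A) + len(B))) are assumptions of this sketch; the library's exact indexing may differ. sliding_window_view requires NumPy 1.20 or later.

```python
import numpy as np


def znorm_windows(ts, w, eps=1e-8):
    """All length-w sliding windows of ts, each z-normalized."""
    ts = np.asarray(ts, dtype=float)
    windows = np.lib.stride_tricks.sliding_window_view(ts, w)
    mu = windows.mean(axis=1, keepdims=True)
    sigma = windows.std(axis=1, keepdims=True)
    # eps avoids division by zero for constant windows (assumption).
    return (windows - mu) / (sigma + eps)


def mpdist_sketch(ts_a, ts_b, w, threshold=0.05):
    """Brute-force MPDist between two series; O(len(A) * len(B) * w) memory."""
    a = znorm_windows(ts_a, w)
    b = znorm_windows(ts_b, w)
    # d[i, j]: z-normalized Euclidean distance, window i of A vs window j of B.
    d = np.sqrt(((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2))
    p_ab = d.min(axis=1)  # join profile A -> B
    p_ba = d.min(axis=0)  # join profile B -> A
    p_abba = np.sort(np.concatenate([p_ab, p_ba]))
    # Assumed rule: k-th smallest value, k = ceil(threshold * (len(A) + len(B))).
    k = int(np.ceil(threshold * (len(ts_a) + len(ts_b))))
    idx = min(max(k - 1, 0), len(p_abba) - 1)
    return p_abba[idx]


def pairwise_mpdist_sketch(series, w, threshold=0.05):
    """Condensed vector of MPDist values for every pair i < j (pdist order)."""
    n = len(series)
    return np.array([
        mpdist_sketch(series[i], series[j], w, threshold)
        for i in range(n)
        for j in range(i + 1, n)
    ])
```

Identical series get an MPDist of 0, and because every z-normalized window of a linear ramp is the same shape, two ramps also score 0 against each other, which is what makes the two groups in this example easy to separate.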
square_distance_matrix = squareform(distance_matrix)
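SciPy stores the distances between n items as a condensed vector of length n(n-1)/2, ordered (0,1), (0,2), ..., (n-2,n-1); for the eight series here that is 28 entries. squareform expands such a vector into the symmetric n-by-n matrix with a zero diagonal that metric='precomputed' expects, and converts a square matrix back. The values below are made up purely to show the layout.

```python
import numpy as np
from scipy.spatial.distance import squareform

# Illustrative condensed vector for 4 items, in pdist order:
# (0,1), (0,2), (0,3), (1,2), (1,3), (2,3)
condensed = np.array([0.0, 2.0, 3.0, 2.5, 3.5, 0.0])

square = squareform(condensed)  # 4 x 4, symmetric, zero diagonal
back = squareform(square)       # square matrix -> condensed vector again
print(square)
```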
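# metric='precomputed': the input is already a distance matrix.
# min_cluster_size=2: groups of as few as two series may form a cluster.
clusterer = hdbscan.HDBSCAN(metric='precomputed', min_cluster_size=2)
clusterer.fit(square_distance_matrix)
clusterer.labels_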
array([0, 0, 0, 0, 0, 1, 1, 1])
The first five series (the copies of the random series) are assigned to cluster 0 and the last three (the linear ramps) to cluster 1, as expected.
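Cluster IDs returned by HDBSCAN are arbitrary (and -1 marks noise), so a programmatic check that the result is "as expected" should compare groupings rather than raw IDs. A small helper for that, hypothetical and not part of hdbscan or matrixprofile, could look like this:

```python
import numpy as np


def same_partition(labels_a, labels_b):
    """True if two labelings group items identically, ignoring the IDs used.

    Simplification: noise (-1 in hdbscan) is treated as an ordinary label,
    so all noise points count as one group here.
    """
    a = np.asarray(labels_a)
    b = np.asarray(labels_b)
    if a.shape != b.shape:
        return False
    # Items i and j share a label in a exactly when they share one in b.
    return bool(np.array_equal(a[:, None] == a[None, :], b[:, None] == b[None, :]))


# Ground truth for this example: five random copies, then three ramps.
expected = np.array([0] * 5 + [1] * 3)
```

With the labels shown above, same_partition(clusterer.labels_, expected) returns True, and it would still do so if HDBSCAN had numbered the two clusters the other way round.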