Mueen's Algorithm for Similarity Search (MASS) is the fastest similarity search algorithm for time series subsequences under Euclidean distance and correlation coefficient.
dist_profile(
  data,
  query,
  ...,
  window_size = NULL,
  method = "v3",
  index = 1,
  k = NULL,
  weight = NULL,
  paa = 1
)
data: a matrix or a vector.
query: a matrix or a vector. See details.
...: Precomputed values from the first iteration. If not supplied, these values will be computed.
window_size: an int or NULL. Sliding window size. See details.
method: Method that will be used to calculate the distance profile. See details.
index: an int. Index of the query window. See details.
k: an int or NULL. Default is NULL. Defines the batch size for MASS V3. Prefer a power of 2. If NULL, it will be set automatically.
weight: a vector of numeric or NULL, with length equal to window_size. This is a MASS extension to weight the query.
paa: a numeric. Default is 1. Factor of PAA reduction (2 == half of size). This is a MASS extension.
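To illustrate what a PAA reduction factor means, the following base R sketch averages consecutive blocks of points, so a factor of 2 halves the series length. The helper `paa_reduce` is hypothetical and is not the package's internal implementation; it also assumes the series length is a multiple of the factor.

```r
# Hypothetical helper: piecewise aggregate approximation (PAA) in base R.
# Assumes length(x) is a multiple of `factor`.
paa_reduce <- function(x, factor = 1) {
  stopifnot(length(x) %% factor == 0)
  # Each column of the matrix holds one block of `factor` consecutive points.
  colMeans(matrix(x, nrow = factor))
}

x <- c(1, 3, 5, 7, 9, 11)
reduced <- paa_reduce(x, factor = 2)  # three block means instead of six points
```

With a factor of 1 the series is returned unchanged, which matches the documented default.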
Returns the distance_profile for the given query, the last_product for the STOMP algorithm, and the parameters for the recursive call. See details.
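As a reference for what a distance profile contains, here is a minimal brute-force sketch in base R: the z-normalized Euclidean distance between the query and every subsequence of the data. The functions `znorm` and `naive_dist_profile` are hypothetical and far slower than MASS. Note that the examples on this page take `sqrt()` of the returned `distance_profile`, which suggests the package returns squared distances, whereas this sketch returns plain distances.

```r
# Hypothetical brute-force reference, not the package implementation.
znorm <- function(x) {
  # Uses the population standard deviation (divide by n); this is an assumption.
  s <- sqrt(mean((x - mean(x))^2))
  (x - mean(x)) / s
}

naive_dist_profile <- function(data, query) {
  w <- length(query)
  n <- length(data) - w + 1  # number of subsequences of length w
  q <- znorm(query)
  vapply(seq_len(n), function(i) {
    sqrt(sum((znorm(data[i:(i + w - 1)]) - q)^2))
  }, numeric(1))
}

set.seed(1)
data <- cumsum(rnorm(100))                   # toy random walk
dp <- naive_dist_profile(data, data[11:30])  # query cut from position 11
best <- which.min(dp)                        # position of the closest match
```

Because the query is taken from the data itself, the closest match is the position it was cut from, with a distance of zero.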
This function can be used in several ways:
Case 1: You have a small query and the data. In this case you only have to provide the first two parameters, data and query. Internally, the window_size is taken from the query length.
Case 2: You have one or two data vectors and want to compute the join or self-similarity. In this case you need to use the recursive solution. The parameters are data, query, window_size and index.
The first iteration does not need the index unless you are starting somewhere else. The query will be the source of a query_window, starting at index, with length window_size.
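The relationship between query, index, and window_size in Case 2 can be sketched in base R as follows. This only illustrates the description above, assuming R's usual 1-based indexing; `query_window` and `n_windows` are local variables chosen for the example, not values returned by the package.

```r
query <- seq(10, 200, by = 10)  # toy series of length 20
window_size <- 5
index <- 3

# Window of `window_size` points starting at `index`.
query_window <- query[index:(index + window_size - 1)]

# Number of valid starting positions when iterating `index` over `query`.
n_windows <- length(query) - window_size + 1
```

A full self-join would iterate index from 1 to `n_windows`, which is why the recursive form takes the index explicitly.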
The method defines which MASS variant will be used. Currently supported values are: v2, v3, weighted.
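The MASS family, as described in the reference below, builds on computing the dot product of the query with every subsequence of the data through the FFT. The following base R sketch shows that core step only, for plain numeric vectors. The name `sliding_dot_product` is hypothetical, and the normalization, batching, and weighting that distinguish the individual methods are not shown.

```r
# Hypothetical helper: dot product of `query` with every window of `data`.
sliding_dot_product <- function(data, query) {
  n <- length(data)
  w <- length(query)
  # Reverse the query and zero-pad it to the data length, so the circular
  # convolution holds each window's dot product at positions w..n.
  q_rev <- c(rev(query), rep(0, n - w))
  # R's inverse FFT is unnormalized, hence the division by n.
  prod <- Re(fft(fft(data) * fft(q_rev), inverse = TRUE)) / n
  prod[w:n]
}

data <- c(1, 2, 3, 4, 5, 6)
query <- c(1, 0, -1)
fast <- sliding_dot_product(data, query)

# Direct computation for comparison.
slow <- vapply(1:4, function(i) sum(data[i:(i + 2)] * query), numeric(1))
same <- isTRUE(all.equal(fast, slow))
```

The FFT route costs O(n log n) regardless of the window length, while the direct loop costs O(n * w).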
Abdullah Mueen, Yan Zhu, Michael Yeh, Kaveh Kamgar, Krishnamurthy Viswanathan, Chetan Kumar Gupta and Eamonn Keogh (2015), The Fastest Similarity Search Algorithm for Time Series Subsequences under Euclidean Distance
Website: https://www.cs.unm.edu/~mueen/FastestSimilaritySearch.html
w <- mp_toy_data$sub_len
ref_data <- mp_toy_data$data[, 1]
# minimum example, data and query
nn <- dist_profile(ref_data, ref_data[1:w])
distance_profile <- sqrt(nn$distance_profile)
# data and indexed query
nn <- dist_profile(ref_data, ref_data, window_size = w, index = 10)
distance_profile <- sqrt(nn$distance_profile)
# recursive
nn <- NULL
for (i in seq_len(10)) {
  nn <- dist_profile(ref_data, ref_data, nn, window_size = w, index = i)
}
# weighted
weight <- c(rep(1, w / 3), rep(0.5, w / 3), rep(0.8, w / 3)) # just an example
nn <- dist_profile(ref_data, ref_data,
  window_size = w, index = 1, method = "weighted",
  weight = weight
)
distance_profile <- sqrt(nn$distance_profile)