Computes the Matrix Profile and Profile Index for Univariate Time Series.

mpx(
  data,
  window_size,
  query = NULL,
  exclusion_zone = 0.5,
  idxs = TRUE,
  distance = c("euclidean", "pearson"),
  n_workers = 1,
  progress = TRUE
)

Arguments

data

Required. Any one-dimensional series of numbers (matrix, vector, ts, etc.). See Details.
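The Details section states that data (and query) are coerced with as.numeric(). The base-R snippet below makes no matrixprofiler calls. It only shows what that coercion does to two common inputs, which explains why a multi-column matrix is flattened into one long series rather than treated as several series.

```r
# Two series stored as matrix columns: as.numeric() reads the matrix
# column by column, so the result is one long series, not two.
m <- cbind(a = c(1, 2, 3), b = c(10, 20, 30))
flat <- as.numeric(m)
print(flat)  # 1 2 3 10 20 30

# A ts object keeps only its values; time attributes are dropped.
y <- ts(c(5, 6, 7), start = 2000)
print(as.numeric(y))  # 5 6 7
```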

window_size

Required. An integer defining the rolling window size.

query

Optional. Another one-dimensional series of numbers, used to compute an AB-join similarity. Default is NULL. See Details.

exclusion_zone

A numeric. The size of the area around each rolling window that is ignored to avoid trivial matches, given as a fraction of window_size. Default is 0.5, i.e., half of window_size.
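To illustrate what "ignored to avoid trivial matches" means in practice, the sketch below lists the subsequence start positions that would be skipped for one subsequence. The helper trivial_match_range() is hypothetical (not part of matrixprofiler), and it assumes the zone half-width is round(window_size * exclusion_zone); the package's own rounding rule may differ.

```r
# Hypothetical helper (not part of matrixprofiler): which subsequence start
# positions are skipped as trivial matches for the subsequence starting at i.
# Assumes the zone half-width is round(window_size * exclusion_zone).
trivial_match_range <- function(i, n_subs, window_size, exclusion_zone = 0.5) {
  half_width <- round(window_size * exclusion_zone)
  max(1, i - half_width):min(n_subs, i + half_width)
}

# A series of length 200 with window_size = 30 has 200 - 30 + 1 = 171 subsequences.
excluded <- trivial_match_range(i = 50, n_subs = 171, window_size = 30)
range(excluded)  # 35 65

# Near the start of the series the zone is clipped at position 1.
range(trivial_match_range(i = 1, n_subs = 171, window_size = 30))  # 1 16
```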

idxs

A logical. Specifies whether the computation returns the Profile Index. Defaults to TRUE.

distance

A string. Currently accepts "euclidean" and "pearson". Defaults to "euclidean".
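For z-normalized subsequences of length w, the two options carry the same information through a standard identity: the z-normalized Euclidean distance d and the Pearson correlation r satisfy d = sqrt(2 * w * (1 - r)) when the population standard deviation is used. The base-R check below verifies this numerically. This page does not state whether mpx reports "pearson" results as correlations or as converted distances, so treat the identity as background rather than a description of the returned values.

```r
# Z-normalize with the population standard deviation (divide by w, not w - 1).
znorm <- function(x) {
  mu <- mean(x)
  sigma <- sqrt(mean((x - mu)^2))
  (x - mu) / sigma
}

set.seed(1)
w <- 30
a <- runif(w)
b <- runif(w)

d <- sqrt(sum((znorm(a) - znorm(b))^2))  # z-normalized Euclidean distance
r <- cor(a, b)                           # Pearson correlation

# The two measures are interchangeable: d = sqrt(2 * w * (1 - r)).
all.equal(d, sqrt(2 * w * (1 - r)))  # TRUE
# And back again: r = 1 - d^2 / (2 * w).
all.equal(r, 1 - d^2 / (2 * w))      # TRUE
```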

n_workers

An integer. The number of threads used for computing. Defaults to 1.

progress

A logical. If TRUE (the default), shows a progress bar, which is useful for long computations. See Details.

Value

Returns a list with the Matrix Profile, Profile Index (if idxs is TRUE), and some information about the settings used to build it.
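To make the Matrix Profile and Profile Index concrete, here is a brute-force base-R sketch of what they contain, using the z-normalized Euclidean distance. naive_mp() is a hypothetical reference helper, not the package's MPX algorithm, and its return names (mp, profile_index) are illustrative rather than the field names of the list mpx() returns. The exclusion-zone rounding is an assumption. The query branch mirrors the AB-join described in Details: one call gives the 'A' side, and swapping data and query gives the 'B' side.

```r
# Naive reference matrix profile: z-normalized Euclidean distance, O(n^2 * w).
# NOT the MPX algorithm: it recomputes every distance from scratch and is
# only meant for checking results on small inputs.
znorm <- function(x) {
  mu <- mean(x)
  sigma <- sqrt(mean((x - mu)^2))  # population sd; constant windows give NaN
  (x - mu) / sigma
}

naive_mp <- function(data, window_size, query = NULL, exclusion_zone = 0.5) {
  data <- as.numeric(data)
  self_join <- is.null(query)
  query <- if (self_join) data else as.numeric(query)
  n_a <- length(data) - window_size + 1
  n_b <- length(query) - window_size + 1
  # Assumed rounding of the exclusion zone; only applied to self-joins.
  half_width <- round(window_size * exclusion_zone)

  mp <- rep(Inf, n_a)
  profile_index <- rep(NA_integer_, n_a)
  for (i in seq_len(n_a)) {
    a <- znorm(data[i:(i + window_size - 1)])
    for (j in seq_len(n_b)) {
      if (self_join && abs(i - j) <= half_width) next  # skip trivial matches
      b <- znorm(query[j:(j + window_size - 1)])
      d <- sqrt(sum((a - b)^2))
      if (d < mp[i]) {
        mp[i] <- d               # distance to the nearest neighbor so far
        profile_index[i] <- j    # position of that neighbor
      }
    }
  }
  list(mp = mp, profile_index = profile_index)
}

set.seed(42)
x <- runif(60)
self <- naive_mp(x, window_size = 10)           # self-join
q <- runif(40)
ab <- naive_mp(x, window_size = 10, query = q)  # 'A' side: data windows vs query
ba <- naive_mp(q, window_size = 10, query = x)  # 'B' side: arguments swapped
```

Both sides share the same closest pair, so min(ab$mp) equals min(ba$mp), while the two profiles have different lengths whenever data and query differ in size.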

Details

This algorithm was developed separately from the main Matrix Profile branch, which relies on the Fast Fourier Transform (FFT) in at least one part of the process. This algorithm does not use FFT and is several times faster. It also relies on Ogita's work to compute the mean and standard deviation (part of the process) with better precision.

About progress: using it as feedback for long computations is strongly recommended. It does add some (negligible) overhead, but knowing that your computer is still computing is worth much more than the few seconds you may lose in the final benchmark.

About n_workers: on Windows, this package uses TBB for multithreading; on Linux and macOS, it uses TinyThread++. This may or may not raise issues in the future, so be aware of possibly slower processing due to different mutex implementations, or even unexpected crashes. The Windows version is usually more reliable.

The data and query parameters are internally converted to a single vector using as.numeric(), so bear in mind that a multidimensional matrix may not work as you expect, although most one-dimensional data types will work normally. If query is provided, it receives the same preprocessing as data; in addition, exclusion_zone is ignored and set to 0. data and query do not need to have the same size, and they can be interchanged when both are provided; the difference will be in the returned object. An AB-join returns the Matrix Profiles 'A' and 'B', i.e., the distances between rolling windows from query to data and from data to query.
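The Details mention Ogita's work for computing the mean and standard deviation more precisely. The package's exact routine is not shown on this page; the sketch below is the Sum2 algorithm of Ogita, Rump and Oishi (cascaded TwoSum with the rounding errors added back at the end), applied here only to a mean. The helper names two_sum(), sum2() and mean2() are hypothetical.

```r
# Error-free transformation (TwoSum): returns s = fl(a + b) and the exact
# rounding error e, so that a + b == s + e holds exactly.
two_sum <- function(a, b) {
  s <- a + b
  bb <- s - a
  e <- (a - (s - bb)) + (b - bb)
  c(s, e)
}

# Sum2: cascade TwoSum over the input and add the accumulated errors back
# at the end, giving roughly twice the working precision.
sum2 <- function(x) {
  s <- x[1]
  err <- 0
  for (v in x[-1]) {
    pair <- two_sum(s, v)
    s <- pair[1]
    err <- err + pair[2]
  }
  s + err
}

mean2 <- function(x) sum2(x) / length(x)

x <- c(1e16, 1, -1e16, 1)
Reduce(`+`, x)  # 1: plain left-to-right summation loses the first 1
sum2(x)         # 2
mean2(x)        # 0.5
```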

See also

Other matrix profile computations: scrimp(), stamp(), stomp()

Examples

# \donttest{
mp <- mpx(runif(200), window_size = 30)
# }