tods.detection_algorithm Module

tods.detection_algorithm.AutoRegODetect

class tods.detection_algorithm.AutoRegODetect.AutoRegODetectorPrimitive(*args, **kwds)

Bases: tods.detection_algorithm.UODBasePrimitive.UnsupervisedOutlierDetectorBase

Autoregressive models use linear regression to predict each sample from its preceding values; the sample’s deviation from the predicted value is then used as its outlier score. This detector handles multivariate time series by combining the per-dimension scores with one of several combination methods. See AutoRegOD for univariate data.

See :cite:`aggarwal2015outlier,zhao2020using` for details.

decision_scores_

The outlier scores of the training data. The higher, the more abnormal. Outliers tend to have higher scores. This value is available once the detector is fitted.

Type

numpy array of shape (n_samples,)

labels_

The binary labels of the training data. 0 stands for inliers and 1 for outliers/anomalies. It is generated by applying threshold_ on decision_scores_.

Type

int, either 0 or 1

Parameters
  • window_size (int) – The moving window size.

  • step_size (int, optional (default=1)) – The displacement for moving window.

  • contamination (float in (0., 0.5), optional (default=0.1)) – The amount of contamination of the data set, i.e. the proportion of outliers in the data set. Used when fitting to define the threshold on the decision function.

  • method (str, optional (default=’average’)) – Combination method: {‘average’, ‘maximization’, ‘median’}. Pass in weights to use the weighted version.

  • weights (numpy array of shape (1, n_dimensions), optional (default=[1,1,…,1])) – Score weights, one per dimension.
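The scoring and combination steps described above can be sketched in plain NumPy. This is an illustrative sketch only, not the TODS implementation: the helper names ar_residual_scores and combine_scores are hypothetical, the per-dimension score is assumed to be the absolute residual of a least-squares fit on the previous window_size values, and weights are assumed to scale each dimension's score before combination.

```python
import numpy as np

def ar_residual_scores(series, window_size):
    """Absolute residuals of a least-squares autoregressive fit on one dimension."""
    n = len(series)
    # Each row holds window_size consecutive values; the target is the value that follows.
    lagged = np.array([series[i:i + window_size] for i in range(n - window_size)])
    design = np.column_stack([lagged, np.ones(len(lagged))])  # add an intercept column
    target = series[window_size:]
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return np.abs(target - design @ coef)

def combine_scores(scores, method="average", weights=None):
    """Combine per-dimension scores of shape (n_valid, n_dimensions) into one score per row."""
    if weights is not None:
        # Assumption: weights rescale each dimension before combining.
        scores = scores * np.asarray(weights, dtype=float).reshape(1, -1)
    if method == "average":
        return scores.mean(axis=1)
    if method == "maximization":
        return scores.max(axis=1)
    if method == "median":
        return np.median(scores, axis=1)
    raise ValueError(f"unknown combination method: {method!r}")

# Two-dimensional toy series with one injected spike in the first dimension.
rng = np.random.default_rng(0)
t = np.arange(200)
data = np.column_stack([np.sin(0.1 * t), np.cos(0.1 * t)])
data = data + 0.01 * rng.standard_normal(data.shape)
data[150, 0] += 5.0

window_size = 5
per_dim = np.column_stack(
    [ar_residual_scores(data[:, d], window_size) for d in range(data.shape[1])]
)
combined = combine_scores(per_dim, method="maximization")
# Score i belongs to sample i + window_size of the original series.
top = int(np.argmax(combined)) + window_size
```

The highest combined score falls on the spike or on one of the few samples whose prediction window contains it, since a corrupted value affects both its own residual and the next window_size predictions.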

fit(*, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[None]

Fit the model on the training data, a container DataFrame of time series previously supplied through set_training_data.

Parameters
  • timeout – The maximum time, in seconds, that this primitive should spend fitting during this method call.

  • iterations – The number of internal iterations the primitive should perform.

Returns

A CallResult with None value.

get_params() → tods.detection_algorithm.AutoRegODetect.Params

Return the parameters of this primitive. Takes no arguments.

Returns

An instance of Params.

produce(*, inputs: d3m.container.pandas.DataFrame, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[d3m.container.pandas.DataFrame]

Run outlier detection on the testing data.

Parameters
  • inputs – Container DataFrame of shape [num_inputs, …] holding the time series on which to run outlier detection.

  • timeout – The maximum time, in seconds, that this primitive should take to produce outputs during this method call.

  • iterations – The number of internal iterations the primitive should perform.

Returns

A container DataFrame of shape [num_inputs, …] wrapped inside a CallResult, in which 1 marks outliers and 0 marks normal points.

produce_score(*, inputs: d3m.container.pandas.DataFrame, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[d3m.container.pandas.DataFrame]

Compute outlier scores for the testing data.

Parameters

inputs – Container DataFrame holding the time series to score.

Returns

A container DataFrame holding the outlier scores of the input DataFrame.

set_params(*, params: tods.detection_algorithm.AutoRegODetect.Params) → None

Set the parameters used for outlier detection.

Parameters

params – An instance of Params.

Returns

None

set_training_data(*, inputs: d3m.container.pandas.DataFrame) → None

Set the training data for outlier detection.

Parameters

inputs – Container DataFrame holding the training time series.

Returns

None

tods.detection_algorithm.DeepLog

class tods.detection_algorithm.DeepLog.DeepLogPrimitive(*args, **kwds)

Bases: tods.detection_algorithm.UODBasePrimitive.UnsupervisedOutlierDetectorBase

A primitive that uses DeepLog for outlier detection

clf_.decision_scores_

The outlier scores of the training data. The higher, the more abnormal. Outliers tend to have higher scores. This value is available once the detector is fitted.

Type

numpy array of shape (n_samples,)

clf_.threshold_

Samples whose decision_scores_ exceed threshold_ are outliers; samples whose decision_scores_ fall below threshold_ are inliers.

Type

float within (0, 1)

clf_.labels_

The binary labels of the training data. 0 stands for inliers and 1 for outliers/anomalies. It is generated by applying threshold_ on decision_scores_.

Type

int, either 0 or 1

left_inds_

One of the mappings from decision_score to data. For point outlier detection, left_inds_ exactly equals the index of each data point. For collective outlier detection, left_inds_ equals the start index of each subsequence.

Type

ndarray

right_inds_

One of the mappings from decision_score to data. For point outlier detection, right_inds_ exactly equals the index of each data point plus 1. For collective outlier detection, right_inds_ equals the ending index of each subsequence.

Type

ndarray

fit(*, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[None]

Fit the model on the training data, a container DataFrame of time series previously supplied through set_training_data.

Parameters
  • timeout – The maximum time, in seconds, that this primitive should spend fitting during this method call.

  • iterations – The number of internal iterations the primitive should perform.

Returns

A CallResult with None value.

get_params() → tods.detection_algorithm.DeepLog.Params

Return the parameters of this primitive. Takes no arguments.

Returns

An instance of Params.

produce(*, inputs: d3m.container.pandas.DataFrame, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[d3m.container.pandas.DataFrame]

Run outlier detection on the testing data.

Parameters
  • inputs – Container DataFrame of shape [num_inputs, …] holding the time series on which to run outlier detection.

  • timeout – The maximum time, in seconds, that this primitive should take to produce outputs during this method call.

  • iterations – The number of internal iterations the primitive should perform.

Returns

A container DataFrame of shape [num_inputs, …] wrapped inside a CallResult, in which 1 marks outliers and 0 marks normal points.

set_params(*, params: tods.detection_algorithm.DeepLog.Params) → None

Set the parameters used for outlier detection.

Parameters

params – An instance of Params.

Returns

None

set_training_data(*, inputs: d3m.container.pandas.DataFrame) → None

Set the training data for outlier detection.

Parameters

inputs – Container DataFrame holding the training time series.

Returns

None

tods.detection_algorithm.KDiscordODetect

class tods.detection_algorithm.KDiscordODetect.KDiscordODetectorPrimitive(*args, **kwds)

Bases: tods.detection_algorithm.UODBasePrimitive.UnsupervisedOutlierDetectorBase

KDiscord first splits a multivariate time series into subsequences (matrices) and then applies the kNN outlier detector from PyOD to them. For an observation, its distance to its kth nearest neighbor can be viewed as the outlying score, which can in turn be viewed as a measure of density. See :cite:`ramaswamy2000efficient,angiulli2002fast` for details.

See :cite:`aggarwal2015outlier,zhao2020using` for details.
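The two steps described above (cutting the series into subsequences, then scoring each one by its distance to its nearest neighbors) can be sketched as follows. This is a simplified NumPy illustration rather than the TODS or PyOD code: the helper names are hypothetical, each subsequence is assumed to be flattened into one row, and a brute-force Euclidean distance matrix stands in for the tree-based search. The method argument mirrors the ‘largest’, ‘mean’ and ‘median’ options of this primitive.

```python
import numpy as np

def sliding_subsequences(X, window_size, step_size=1):
    """Flatten each (window_size, n_dimensions) subsequence of X into one row."""
    starts = range(0, len(X) - window_size + 1, step_size)
    return np.array([X[s:s + window_size].ravel() for s in starts])

def knn_scores(rows, n_neighbors=5, method="largest"):
    """Score each row by its distances to its n_neighbors nearest other rows."""
    diff = rows[:, None, :] - rows[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a row is not its own neighbor
    nearest = np.sort(dist, axis=1)[:, :n_neighbors]
    if method == "largest":
        return nearest[:, -1]  # distance to the kth neighbor
    if method == "mean":
        return nearest.mean(axis=1)
    if method == "median":
        return np.median(nearest, axis=1)
    raise ValueError(f"unknown method: {method!r}")

# Periodic two-dimensional series with a single spike at index 60.
t = np.arange(120)
X = np.column_stack([np.sin(0.3 * t), np.cos(0.3 * t)])
X[60, 0] += 5.0

window_size = 5
rows = sliding_subsequences(X, window_size)
scores = knn_scores(rows, n_neighbors=3, method="largest")
top_start = int(np.argmax(scores))  # start index of the most anomalous subsequence
```

Every subsequence that contains the spike is far from all others, so the highest score belongs to a window starting between index 56 and index 60.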

decision_scores_

The outlier scores of the training data. The higher, the more abnormal. Outliers tend to have higher scores. This value is available once the detector is fitted.

Type

numpy array of shape (n_samples,)

threshold_

The threshold is derived from contamination: it separates the n_samples * contamination most abnormal samples in decision_scores_ from the rest, and it is used to generate binary outlier labels.

Type

float

labels_

The binary labels of the training data. 0 stands for inliers and 1 for outliers/anomalies. It is generated by applying threshold_ on decision_scores_.

Type

int, either 0 or 1
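The relationship between contamination, threshold_ and labels_ can be illustrated with a short sketch. It assumes a percentile-based cut-off, which is a common way to turn a contamination rate into a threshold; the function name threshold_and_labels is hypothetical and the exact rule used by this primitive is not confirmed here.

```python
import numpy as np

def threshold_and_labels(decision_scores, contamination=0.1):
    """Derive a score threshold from the contamination rate and label the samples."""
    # Assumption: the threshold is the (1 - contamination) percentile of the scores.
    threshold = np.percentile(decision_scores, 100 * (1 - contamination))
    labels = (decision_scores > threshold).astype(int)  # 1 = outlier, 0 = inlier
    return threshold, labels

scores = np.array([0.10, 0.20, 0.15, 0.30, 0.25, 0.12, 0.18, 0.22, 0.28, 9.0])
threshold, labels = threshold_and_labels(scores, contamination=0.1)
```

With ten samples and a contamination of 0.1, only the single most abnormal sample ends up above the threshold.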

Parameters
  • window_size (int) – The moving window size.

  • step_size (int, optional (default=1)) – The displacement for moving window.

  • contamination (float in (0., 0.5), optional (default=0.1)) – The amount of contamination of the data set, i.e. the proportion of outliers in the data set. Used when fitting to define the threshold on the decision function.

  • n_neighbors (int, optional (default = 5)) – Number of neighbors to use by default for k neighbors queries.

  • method (str, optional (default=’largest’)) –

    {‘largest’, ‘mean’, ‘median’}

    • ’largest’: use the distance to the kth neighbor as the outlier score

    • ’mean’: use the average of the distances to all k neighbors as the outlier score

    • ’median’: use the median of the distances to the k neighbors as the outlier score

  • radius (float, optional (default = 1.0)) – Range of parameter space to use by default for radius_neighbors queries.

  • algorithm ({'auto', 'ball_tree', 'kd_tree', 'brute'}, optional) –

    Algorithm used to compute the nearest neighbors:

    • ’ball_tree’ will use BallTree

    • ’kd_tree’ will use KDTree

    • ’brute’ will use a brute-force search.

    • ’auto’ will attempt to decide the most appropriate algorithm based on the values passed to fit() method.

    Note: fitting on sparse input will override the setting of this parameter, using brute force.

    Deprecated since version 0.7.4: algorithm is deprecated in PyOD 0.7.4 and will no longer be configurable in 0.7.6. BallTree has to be used for consistency.

  • leaf_size (int, optional (default = 30)) – Leaf size passed to BallTree. This can affect the speed of the construction and query, as well as the memory required to store the tree. The optimal value depends on the nature of the problem.

  • metric (string or callable, default 'minkowski') –

    metric to use for distance computation. Any metric from scikit-learn or scipy.spatial.distance can be used.

    If metric is a callable function, it is called on each pair of instances (rows) and the resulting value recorded. The callable should take two arrays as input and return one value indicating the distance between them. This works for Scipy’s metrics, but is less efficient than passing the metric name as a string.

    Distance matrices are not supported.

    Valid values for metric are:

    • from scikit-learn: [‘cityblock’, ‘cosine’, ‘euclidean’, ‘l1’, ‘l2’, ‘manhattan’]

    • from scipy.spatial.distance: [‘braycurtis’, ‘canberra’, ‘chebyshev’, ‘correlation’, ‘dice’, ‘hamming’, ‘jaccard’, ‘kulsinski’, ‘mahalanobis’, ‘matching’, ‘minkowski’, ‘rogerstanimoto’, ‘russellrao’, ‘seuclidean’, ‘sokalmichener’, ‘sokalsneath’, ‘sqeuclidean’, ‘yule’]

    See the documentation for scipy.spatial.distance for details on these metrics.

  • p (integer, optional (default = 2)) – Parameter for the Minkowski metric from sklearn.metrics.pairwise.pairwise_distances. When p = 1, this is equivalent to using manhattan_distance (l1), and euclidean_distance (l2) for p = 2. For arbitrary p, minkowski_distance (l_p) is used. See http://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.pairwise_distances

  • metric_params (dict, optional (default = None)) – Additional keyword arguments for the metric function.

fit(*, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[None]

Fit the model on the training data, a container DataFrame of time series previously supplied through set_training_data.

Parameters
  • timeout – The maximum time, in seconds, that this primitive should spend fitting during this method call.

  • iterations – The number of internal iterations the primitive should perform.

Returns

A CallResult with None value.

get_params() → tods.detection_algorithm.KDiscordODetect.Params

Return the parameters of this primitive. Takes no arguments.

Returns

An instance of Params.

produce(*, inputs: d3m.container.pandas.DataFrame, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[d3m.container.pandas.DataFrame]

Run outlier detection on the testing data.

Parameters
  • inputs – Container DataFrame of shape [num_inputs, …] holding the time series on which to run outlier detection.

  • timeout – The maximum time, in seconds, that this primitive should take to produce outputs during this method call.

  • iterations – The number of internal iterations the primitive should perform.

Returns

A container DataFrame of shape [num_inputs, …] wrapped inside a CallResult, in which 1 marks outliers and 0 marks normal points.

produce_score(*, inputs: d3m.container.pandas.DataFrame, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[d3m.container.pandas.DataFrame]

Compute outlier scores for the testing data.

Parameters

inputs – Container DataFrame holding the time series to score.

Returns

A container DataFrame holding the outlier scores of the input DataFrame.

set_params(*, params: tods.detection_algorithm.KDiscordODetect.Params) → None

Set the parameters used for outlier detection.

Parameters

params – An instance of Params.

Returns

None

set_training_data(*, inputs: d3m.container.pandas.DataFrame) → None

Set the training data for outlier detection.

Parameters

inputs – Container DataFrame holding the training time series.

Returns

None

tods.detection_algorithm.LSTMODetect

class tods.detection_algorithm.LSTMODetect.LSTMODetectorPrimitive(*args, **kwds)

Bases: tods.detection_algorithm.UODBasePrimitive.UnsupervisedOutlierDetectorBase

A base class for primitives which have to be fitted before they can start producing (useful) outputs from inputs, but they are fitted only on input data.

decision_scores_

The outlier scores of the training data. The higher, the more abnormal. Outliers tend to have higher scores. This value is available once the detector is fitted.

Type

numpy array of shape (n_samples,)

labels_

The binary labels of the training data. 0 stands for inliers and 1 for outliers/anomalies. It is generated by applying threshold_ on decision_scores_.

Type

int, either 0 or 1

Parameters
  • window_size (int) – The moving window size.

  • step_size (int, optional (default=1)) – The displacement for moving window.

  • contamination (float in (0., 0.5), optional (default=0.1)) – The amount of contamination of the data set, i.e. the proportion of outliers in the data set. Used when fitting to define the threshold on the decision function.

fit(*, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[None]

Fit the model on the training data, a container DataFrame of time series previously supplied through set_training_data.

Parameters
  • timeout – The maximum time, in seconds, that this primitive should spend fitting during this method call.

  • iterations – The number of internal iterations the primitive should perform.

Returns

A CallResult with None value.

get_params() → tods.detection_algorithm.LSTMODetect.Params

Return the parameters of this primitive. Takes no arguments.

Returns

An instance of Params.

produce(*, inputs: d3m.container.pandas.DataFrame, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[d3m.container.pandas.DataFrame]

Run outlier detection on the testing data.

Parameters
  • inputs – Container DataFrame of shape [num_inputs, …] holding the time series on which to run outlier detection.

  • timeout – The maximum time, in seconds, that this primitive should take to produce outputs during this method call.

  • iterations – The number of internal iterations the primitive should perform.

Returns

A container DataFrame of shape [num_inputs, …] wrapped inside a CallResult, in which 1 marks outliers and 0 marks normal points.

set_params(*, params: tods.detection_algorithm.LSTMODetect.Params) → None

Set the parameters used for outlier detection.

Parameters

params – An instance of Params.

Returns

None

set_training_data(*, inputs: d3m.container.pandas.DataFrame) → None

Set the training data for outlier detection.

Parameters

inputs – Container DataFrame holding the training time series.

Returns

None

tods.detection_algorithm.MatrixProfile

class tods.detection_algorithm.MatrixProfile.MP(window_size, step_size, contamination)

Bases: tods.detection_algorithm.core.CollectiveBase.CollectiveBaseDetector

This is the class for matrix profile function

decision_function(X)

Parameters

data – dataframe column

Returns

numpy array

fit(X)

Fit detector. y is ignored in unsupervised methods.

Parameters
  • X (numpy array of shape (n_samples, n_features)) – The input samples.

  • y (Ignored) – Not used, present for API consistency by convention.

Returns

self – Fitted estimator.

Return type

object

class tods.detection_algorithm.MatrixProfile.MatrixProfilePrimitive(*args, **kwds)

Bases: tods.detection_algorithm.UODBasePrimitive.UnsupervisedOutlierDetectorBase

A primitive that computes the matrix profile of a DataFrame using the Stumpy package. Stumpy documentation: https://stumpy.readthedocs.io/en/latest/index.html

Parameters
  • T_A (ndarray) – The time series or sequence for which to compute the matrix profile.

  • m (int) – Window size.

  • T_B (ndarray) – The time series or sequence that contains your query subsequences of interest. Default is None, which corresponds to a self-join.

  • ignore_trivial (bool) – Set to True if this is a self-join. Otherwise, for an AB-join, set this to False. Default is True.

Returns

out (ndarray) – The first column consists of the matrix profile, the second column consists of the matrix profile indices, the third column consists of the left matrix profile indices, and the fourth column consists of the right matrix profile indices.
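To make the matrix profile itself concrete, here is a brute-force self-join sketch in NumPy. It is not the Stumpy algorithm, which is far more efficient, and the function name naive_matrix_profile is hypothetical. It assumes z-normalized Euclidean distance between subsequences and an exclusion zone of ceil(m / 4) positions around each subsequence to skip trivial matches; both choices are stated here as assumptions of the sketch. Subsequences with a large profile value have no close match elsewhere in the series, which is what makes them outlier candidates.

```python
import numpy as np

def naive_matrix_profile(T, m):
    """Brute-force self-join: distance from each subsequence to its nearest non-trivial match."""
    T = np.asarray(T, dtype=float)
    n_sub = len(T) - m + 1
    subs = np.array([T[i:i + m] for i in range(n_sub)])
    # z-normalize every subsequence so that only its shape matters.
    mu = subs.mean(axis=1, keepdims=True)
    sigma = subs.std(axis=1, keepdims=True)
    sigma[sigma == 0] = 1.0  # guard against constant subsequences
    z = (subs - mu) / sigma
    diff = z[:, None, :] - z[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Exclusion zone (assumed ceil(m / 4)): ignore matches that overlap the query too closely.
    excl = int(np.ceil(m / 4))
    for i in range(n_sub):
        dist[i, max(0, i - excl):min(n_sub, i + excl + 1)] = np.inf
    return dist.min(axis=1), dist.argmin(axis=1)

# Sine wave with one stretch replaced by random values.
rng = np.random.default_rng(0)
T = np.sin(0.2 * np.arange(300))
T[150:170] = rng.uniform(-1.0, 1.0, size=20)

m = 20
profile, indices = naive_matrix_profile(T, m)
discord = int(np.argmax(profile))  # start of the subsequence with the worst best match
```

The two returned arrays correspond to the first two columns of the output described above: the matrix profile and the matrix profile indices.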

clf_.decision_scores_

The outlier scores of the training data. The higher, the more abnormal. Outliers tend to have higher scores. This value is available once the detector is fitted.

Type

numpy array of shape (n_samples,)

clf_.threshold_

Samples whose decision_scores_ exceed threshold_ are outliers; samples whose decision_scores_ fall below threshold_ are inliers.

Type

float within (0, 1)

clf_.labels_

The binary labels of the training data. 0 stands for inliers and 1 for outliers/anomalies. It is generated by applying threshold_ on decision_scores_.

Type

int, either 0 or 1

left_inds_

One of the mappings from decision_score to data. For point outlier detection, left_inds_ exactly equals the index of each data point. For collective outlier detection, left_inds_ equals the start index of each subsequence.

Type

ndarray

right_inds_

One of the mappings from decision_score to data. For point outlier detection, right_inds_ exactly equals the index of each data point plus 1. For collective outlier detection, right_inds_ equals the ending index of each subsequence.

Type

ndarray

Parameters

contamination (float in (0., 0.5), optional (default=0.1)) – The amount of contamination of the data set, i.e. the proportion of outliers in the data set. Used when fitting to define the threshold on the decision function.

fit(*, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[None]

Fit the model on the training data, a container DataFrame of time series previously supplied through set_training_data.

Parameters
  • timeout – The maximum time, in seconds, that this primitive should spend fitting during this method call.

  • iterations – The number of internal iterations the primitive should perform.

Returns

A CallResult with None value.

get_params() → tods.detection_algorithm.MatrixProfile.Params

Return the parameters of this primitive. Takes no arguments.

Returns

An instance of Params.

produce(*, inputs: d3m.container.pandas.DataFrame, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[d3m.container.pandas.DataFrame]

Run outlier detection on the testing data.

Parameters
  • inputs – Container DataFrame of shape [num_inputs, …] holding the time series on which to run outlier detection.

  • timeout – The maximum time, in seconds, that this primitive should take to produce outputs during this method call.

  • iterations – The number of internal iterations the primitive should perform.

Returns

A container DataFrame of shape [num_inputs, …] wrapped inside a CallResult, in which 1 marks outliers and 0 marks normal points.

set_params(*, params: tods.detection_algorithm.MatrixProfile.Params) → None

Set the parameters used for outlier detection.

Parameters

params – An instance of Params.

Returns

None

set_training_data(*, inputs: d3m.container.pandas.DataFrame) → None

Set the training data for outlier detection.

Parameters

inputs – Container DataFrame holding the training time series.

Returns

None

tods.detection_algorithm.PCAODetect

class tods.detection_algorithm.PCAODetect.PCAODetectorPrimitive(*args, **kwds)

Bases: tods.detection_algorithm.UODBasePrimitive.UnsupervisedOutlierDetectorBase

PCA-based outlier detection for both univariate and multivariate time series data. The time series data is first transformed to a tabular format. For univariate data, the table has shape [valid_length, window_size]. For multivariate data with d sequences, the table has shape [valid_length, window_size].
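The transformation of a univariate series into tabular form can be sketched with NumPy's sliding_window_view (available in NumPy 1.20 and later). This is an illustration rather than the TODS code, and it assumes that valid_length equals n_samples - window_size + 1 when step_size is 1, which the text above does not state explicitly.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

series = np.arange(10, dtype=float)  # univariate series with 10 samples
window_size = 4
step_size = 1

# Each row is one window; consecutive rows are shifted by step_size samples.
table = sliding_window_view(series, window_size)[::step_size]
valid_length = table.shape[0]
```

Each row of table can then be treated as one observation with window_size features by a tabular detector such as PCA.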

decision_scores_

The outlier scores of the training data. The higher, the more abnormal. Outliers tend to have higher scores. This value is available once the detector is fitted.

Type

numpy array of shape (n_samples,)

threshold_

The threshold is derived from contamination: it separates the n_samples * contamination most abnormal samples in decision_scores_ from the rest, and it is used to generate binary outlier labels.

Type

float

labels_

The binary labels of the training data. 0 stands for inliers and 1 for outliers/anomalies. It is generated by applying threshold_ on decision_scores_.

Type

int, either 0 or 1

Parameters
  • window_size (int) – The moving window size.

  • step_size (int, optional (default=1)) – The displacement for moving window.

  • contamination (float in (0., 0.5), optional (default=0.1)) – The amount of contamination of the data set, i.e. the proportion of outliers in the data set. Used when fitting to define the threshold on the decision function.

  • n_components (int, float, None or string) –

    Number of components to keep. It should be smaller than the window_size. If n_components is not set, all components are kept:

    n_components == min(n_samples, n_features)

    If n_components == ‘mle’ and svd_solver == ‘full’, Minka’s MLE is used to guess the dimension.

    If 0 < n_components < 1 and svd_solver == ‘full’, the number of components is selected such that the amount of variance that needs to be explained is greater than the percentage specified by n_components.

    n_components cannot be equal to n_features for svd_solver == ‘arpack’.

  • n_selected_components (int, optional (default=None)) – Number of selected principal components for calculating the outlier scores. It is not necessarily equal to the total number of the principal components. If not set, use all principal components.

  • whiten (bool, optional (default False)) –

    When True (False by default) the components_ vectors are multiplied by the square root of n_samples and then divided by the singular values to ensure uncorrelated outputs with unit component-wise variances.

    Whitening will remove some information from the transformed signal (the relative variance scales of the components) but can sometimes improve the predictive accuracy of the downstream estimators by making their data respect some hard-wired assumptions.

  • svd_solver (string {'auto', 'full', 'arpack', 'randomized'}) –

    auto :

    the solver is selected by a default policy based on X.shape and n_components: if the input data is larger than 500x500 and the number of components to extract is lower than 80% of the smallest dimension of the data, then the more efficient ‘randomized’ method is enabled. Otherwise the exact full SVD is computed and optionally truncated afterwards.

    full :

    run exact full SVD calling the standard LAPACK solver via scipy.linalg.svd and select the components by postprocessing

    arpack :

    run SVD truncated to n_components calling ARPACK solver via scipy.sparse.linalg.svds. It requires strictly 0 < n_components < X.shape[1]

    randomized :

    run randomized SVD by the method of Halko et al.

  • tol (float >= 0, optional (default .0)) – Tolerance for singular values computed by svd_solver == ‘arpack’.

  • iterated_power (int >= 0, or 'auto', (default 'auto')) – Number of iterations for the power method computed by svd_solver == ‘randomized’.

  • random_state (int, RandomState instance or None, optional (default None)) – If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random. Used when svd_solver == ‘arpack’ or ‘randomized’.

  • weighted (bool, optional (default=True)) – If True, the eigenvalues are used in score computation. Eigenvectors with small eigenvalues carry more importance in the outlier score calculation.

  • standardization (bool, optional (default=True)) – If True, perform standardization first to convert data to zero mean and unit variance. See http://scikit-learn.org/stable/auto_examples/preprocessing/plot_scaling_importance.html

fit(*, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[None]

Fit the model on the training data, a container DataFrame of time series previously supplied through set_training_data.

Parameters
  • timeout – The maximum time, in seconds, that this primitive should spend fitting during this method call.

  • iterations – The number of internal iterations the primitive should perform.

Returns

A CallResult with None value.

get_params() → tods.detection_algorithm.PCAODetect.Params

Return the parameters of this primitive. Takes no arguments.

Returns

An instance of Params.

produce(*, inputs: d3m.container.pandas.DataFrame, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[d3m.container.pandas.DataFrame]

Run outlier detection on the testing data.

Parameters
  • inputs – Container DataFrame of shape [num_inputs, …] holding the time series on which to run outlier detection.

  • timeout – The maximum time, in seconds, that this primitive should take to produce outputs during this method call.

  • iterations – The number of internal iterations the primitive should perform.

Returns

A container DataFrame of shape [num_inputs, …] wrapped inside a CallResult, in which 1 marks outliers and 0 marks normal points.

produce_score(*, inputs: d3m.container.pandas.DataFrame, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[d3m.container.pandas.DataFrame]

Compute outlier scores for the testing data.

Parameters

inputs – Container DataFrame holding the time series to score.

Returns

A container DataFrame holding the outlier scores of the input DataFrame.

set_params(*, params: tods.detection_algorithm.PCAODetect.Params) → None

Set the parameters used for outlier detection.

Parameters

params – An instance of Params.

Returns

None

set_training_data(*, inputs: d3m.container.pandas.DataFrame) → None

Set the training data for outlier detection.

Parameters

inputs – Container DataFrame holding the training time series.

Returns

None

tods.detection_algorithm.PyodABOD

class tods.detection_algorithm.PyodABOD.ABODPrimitive(*args, **kwds)

Bases: tods.detection_algorithm.UODBasePrimitive.UnsupervisedOutlierDetectorBase

ABOD class for Angle-based Outlier Detection. For an observation, the variance of its weighted cosine scores to all neighbors can be viewed as the outlying score. See :cite:`kriegel2008angle` for details.

Two versions of ABOD are supported:

  • Fast ABOD: use k nearest neighbors to approximate.

  • Original ABOD: consider all training points with high time complexity at O(n^3).
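The angle-based score and the difference between the two versions can be sketched as follows. This is a simplified illustration, not the PyOD code: the helper names are hypothetical, the weighted cosine of a pair (a, b) seen from point p is taken as the dot product of (a - p) and (b - p) divided by the product of their squared norms, and the variance is negated so that higher scores mean more abnormal, matching the convention of decision_scores_.

```python
import numpy as np
from itertools import combinations

def angle_variance(p, others):
    """Variance of the distance-weighted cosines between p and every pair of other points."""
    values = []
    for a, b in combinations(others, 2):
        pa, pb = a - p, b - p
        na, nb = pa @ pa, pb @ pb  # squared norms
        if na == 0 or nb == 0:
            continue  # skip duplicates of p
        values.append((pa @ pb) / (na * nb))
    return np.var(values)

def abod_scores(X, n_neighbors=None):
    """Negated angle variance per point; n_neighbors=None uses all points (original ABOD)."""
    scores = np.empty(len(X))
    for i, p in enumerate(X):
        others = np.delete(X, i, axis=0)
        if n_neighbors is not None:
            # Fast variant: restrict the pairs to the k nearest neighbors of p.
            d = np.linalg.norm(others - p, axis=1)
            others = others[np.argsort(d)[:n_neighbors]]
        scores[i] = -angle_variance(p, others)
    return scores

rng = np.random.default_rng(0)
X = np.vstack([rng.standard_normal((30, 2)), [[10.0, 10.0]]])  # last row is far away

full_scores = abod_scores(X)                  # original ABOD, all pairs
fast_scores = abod_scores(X, n_neighbors=10)  # fast ABOD
```

A point far from the cluster sees all other points within a narrow range of angles and at large distances, so its angle variance is close to zero and its negated score is the highest.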

decision_scores_

The outlier scores of the training data. The higher, the more abnormal. Outliers tend to have higher scores. This value is available once the detector is fitted.

Type

numpy array of shape (n_samples,)

threshold_

The threshold is derived from contamination: it separates the n_samples * contamination most abnormal samples in decision_scores_ from the rest, and it is used to generate binary outlier labels.

Type

float

labels_

The binary labels of the training data. 0 stands for inliers and 1 for outliers/anomalies. It is generated by applying threshold_ on decision_scores_.

Type

int, either 0 or 1

Parameters
  • contamination (float in (0., 0.5), optional (default=0.1)) – The amount of contamination of the data set, i.e. the proportion of outliers in the data set. Used when fitting to define the threshold on the decision function.

  • n_neighbors (int, optional (default=10)) – Number of neighbors to use by default for k neighbors queries.

  • method (str, optional (default=’fast’)) –

    Valid values for method are:

    • ’fast’: fast ABOD. Only consider n_neighbors of training points

    • ’default’: original ABOD with all training points, which could be slow

fit(*, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[None]

Fit the model on the training data, a container DataFrame of time series previously supplied through set_training_data.

Parameters
  • timeout – The maximum time, in seconds, that this primitive should spend fitting during this method call.

  • iterations – The number of internal iterations the primitive should perform.

Returns

A CallResult with None value.

get_params() → tods.detection_algorithm.PyodABOD.Params

Return the parameters of this primitive. Takes no arguments.

Returns

An instance of Params.

produce(*, inputs: d3m.container.pandas.DataFrame, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[d3m.container.pandas.DataFrame]

Run outlier detection on the testing data.

Parameters
  • inputs – Container DataFrame of shape [num_inputs, …] holding the time series on which to run outlier detection.

  • timeout – The maximum time, in seconds, that this primitive should take to produce outputs during this method call.

  • iterations – The number of internal iterations the primitive should perform.

Returns

A container DataFrame of shape [num_inputs, …] wrapped inside a CallResult, in which 1 marks outliers and 0 marks normal points.

produce_score(*, inputs: d3m.container.pandas.DataFrame, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[d3m.container.pandas.DataFrame]

Compute outlier scores for the testing data.

Parameters

inputs – Container DataFrame holding the time series to score.

Returns

A container DataFrame holding the outlier scores of the input DataFrame.

set_params(*, params: tods.detection_algorithm.PyodABOD.Params) → None

Set the parameters used for outlier detection.

Parameters

params – An instance of Params.

Returns

None

set_training_data(*, inputs: d3m.container.pandas.DataFrame) → None

Set the training data for outlier detection.

Parameters

inputs – Container DataFrame holding the training time series.

Returns

None

tods.detection_algorithm.PyodAE

class tods.detection_algorithm.PyodAE.AutoEncoderPrimitive(*args, **kwds)

Bases: tods.detection_algorithm.UODBasePrimitive.UnsupervisedOutlierDetectorBase

Auto Encoder (AE) is a type of neural network for learning useful data representations in an unsupervised manner. Similar to PCA, AE can be used to detect outlying objects in the data by calculating the reconstruction errors. See :cite:`aggarwal2015outlier` Chapter 3 for details.
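The idea of scoring by reconstruction error can be illustrated without a neural network. The sketch below deliberately swaps the Keras autoencoder for a linear stand-in (a truncated SVD, i.e. PCA), so it shows the scoring principle only and not this primitive's model; the function name reconstruction_scores and the toy data are assumptions of the example.

```python
import numpy as np

def reconstruction_scores(X, encoding_dim):
    """Encode to encoding_dim dimensions, decode, and score by reconstruction error."""
    mean = X.mean(axis=0)
    centered = X - mean
    # Linear stand-in for the encoder/decoder pair: top right-singular vectors.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    weights = vt[:encoding_dim]            # shape (encoding_dim, n_features)
    codes = centered @ weights.T           # "encoding layer" activations
    reconstructed = codes @ weights + mean
    return np.linalg.norm(X - reconstructed, axis=1)

rng = np.random.default_rng(0)
x = 3.0 * rng.standard_normal(100)
# Inliers lie close to a line in 3-D; the last row lies well off that line.
inliers = np.column_stack([x, 2 * x, -x]) + 0.05 * rng.standard_normal((100, 3))
X = np.vstack([inliers, [[0.0, 5.0, 5.0]]])

encoding_dim = 1
scores = reconstruction_scores(X, encoding_dim)
compression_rate = X.shape[1] / encoding_dim  # original features per encoding neuron
```

Points that the low-dimensional code cannot reproduce well receive a large reconstruction error, which is why the off-line point gets the highest score here.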

encoding_dim_

The number of neurons in the encoding layer.

Type

int

compression_rate_

The ratio between the number of original features and the number of neurons in the encoding layer.

Type

float

model_

The underlying AutoEncoder in Keras.

Type

Keras Object

history_

The AutoEncoder training history.

Type

Keras Object

decision_scores_

The outlier scores of the training data. The higher, the more abnormal. Outliers tend to have higher scores. This value is available once the detector is fitted.

Type

numpy array of shape (n_samples,)

threshold_

The threshold is based on contamination. It is set so that the n_samples * contamination most abnormal samples in decision_scores_ fall above it. The threshold is used to generate binary outlier labels.

Type

float

labels_

The binary labels of the training data. 0 stands for inliers and 1 for outliers/anomalies. It is generated by applying threshold_ on decision_scores_.

Type

int, either 0 or 1
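The relationship between contamination, threshold_ and labels_ described above can be sketched in plain Python. This is a minimal illustration, not the primitive's own code: it assumes the threshold is taken as the 100 * (1 - contamination) percentile of the training scores, and the helper names percentile and threshold_and_labels are hypothetical.

```python
def percentile(values, q):
    """Percentile with linear interpolation, q in [0, 100] (hypothetical helper)."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def threshold_and_labels(decision_scores, contamination=0.1):
    """Derive a threshold and binary labels from outlier scores.

    Assumption: the threshold is the 100 * (1 - contamination) percentile,
    and a sample is labeled 1 (outlier) when its score exceeds the threshold.
    """
    threshold = percentile(decision_scores, 100.0 * (1.0 - contamination))
    labels = [1 if score > threshold else 0 for score in decision_scores]
    return threshold, labels


# Ten training scores; the last one is clearly the most abnormal.
scores = [0.1, 0.2, 0.15, 0.3, 0.25, 0.2, 0.1, 0.35, 0.3, 5.0]
threshold, labels = threshold_and_labels(scores, contamination=0.1)
```

With contamination=0.1 and ten samples, only the single highest-scoring sample ends up labeled 1.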

Parameters
  • hidden_neurons (list, optional (default=[4,2,4])) – The number of neurons per hidden layer.

  • hidden_activation (str, optional (default='relu')) – Activation function to use for hidden layers. All hidden layers are forced to use the same type of activation. See https://keras.io/activations/

  • output_activation (str, optional (default='sigmoid')) – Activation function to use for output layer. See https://keras.io/activations/

  • loss (str or obj, optional (default=keras.losses.mean_squared_error)) – String (name of objective function) or objective function. See https://keras.io/losses/

  • optimizer (str, optional (default='adam')) – String (name of optimizer) or optimizer instance. See https://keras.io/optimizers/

  • epochs (int, optional (default=100)) – Number of epochs to train the model.

  • batch_size (int, optional (default=32)) – Number of samples per gradient update.

  • dropout_rate (float in (0., 1), optional (default=0.2)) – The dropout to be used across all layers.

  • l2_regularizer (float in (0., 1), optional (default=0.1)) – The regularization strength of activity_regularizer applied on each layer. By default, l2 regularizer is used. See https://keras.io/regularizers/

  • validation_size (float in (0., 1), optional (default=0.1)) – The percentage of data to be used for validation.

  • preprocessing (bool, optional (default=True)) – If True, apply standardization on the data.

  • verbose (int, optional (default=1)) –

    Verbosity mode.
    • 0 = silent

    • 1 = progress bar

    • 2 = one line per epoch

    For verbosity >= 1, model summary may be printed.

  • random_state (int, RandomState instance or None, optional (default=None)) – If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.

  • contamination (float in (0., 0.5), optional (default=0.1)) – The amount of contamination of the data set, i.e. the proportion of outliers in the data set. When fitting this is used to define the threshold on the decision function.

fit(*, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[None]

Fit the model on the training data (a container DataFrame of time series).

Returns

None

Parameters
  • timeout – A maximum time this primitive should be fitting during this method call, in seconds.

  • iterations – How many of internal iterations should the primitive do.

Returns

Return type

A CallResult with None value.

get_params() → tods.detection_algorithm.PyodAE.Params

Return the primitive's parameters. Takes no arguments.

Returns

class Params

Returns

Return type

An instance of parameters.

produce(*, inputs: d3m.container.pandas.DataFrame, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[d3m.container.pandas.DataFrame]

Process the testing data. The inputs argument is a container DataFrame holding the time series on which to run outlier detection.

Returns

Container DataFrame in which 1 marks outliers and 0 marks normal samples.

Parameters
  • inputs – The inputs of shape [num_inputs, …].

  • timeout – A maximum time this primitive should take to produce outputs during this method call, in seconds.

  • iterations – How many of internal iterations should the primitive do.

Returns

Return type

The outputs of shape [num_inputs, …] wrapped inside CallResult.

set_params(*, params: tods.detection_algorithm.PyodAE.Params) → None

Set parameters for outlier detection from an instance of class Params.

Returns

None

Parameters

params – An instance of parameters.

set_training_data(*, inputs: d3m.container.pandas.DataFrame) → None

Set training data for outlier detection. The inputs argument is a container DataFrame.

Returns

None

Parameters

inputs – The inputs.

tods.detection_algorithm.PyodCBLOF

class tods.detection_algorithm.PyodCBLOF.CBLOFPrimitive(*args, **kwds)

Bases: tods.detection_algorithm.UODBasePrimitive.UnsupervisedOutlierDetectorBase

The CBLOF operator calculates the outlier score based on the cluster-based local outlier factor. CBLOF takes as input the data set and the cluster model that was generated by a clustering algorithm. It classifies the clusters into small clusters and large clusters using the parameters alpha and beta. The anomaly score is then calculated based on the size of the cluster the point belongs to as well as the distance to the nearest large cluster. The original publication weights the outlier factor by the sizes of the clusters. Since this might lead to unexpected behavior (outliers close to small clusters are not found), weighting is disabled by default, and outlier scores are computed solely from the distance to the closest large cluster center. By default, k-means is used as the clustering algorithm instead of the Squeezer algorithm mentioned in the original paper, for multiple reasons. See :cite:`he2003discovering` for details.

decision_scores_

The outlier scores of the training data. The higher, the more abnormal. Outliers tend to have higher scores. This value is available once the detector is fitted.

Type

numpy array of shape (n_samples,)

threshold_

The threshold is based on contamination. It is set so that the n_samples * contamination most abnormal samples in decision_scores_ fall above it. The threshold is used to generate binary outlier labels.

Type

float

labels_

The binary labels of the training data. 0 stands for inliers and 1 for outliers/anomalies. It is generated by applying threshold_ on decision_scores_.

Type

int, either 0 or 1

Parameters
  • n_clusters (int, optional (default=8)) – The number of clusters to form as well as the number of centroids to generate.

  • contamination (float in (0., 0.5), optional (default=0.1)) – The amount of contamination of the data set, i.e. the proportion of outliers in the data set. Used when fitting to define the threshold on the decision function.

  • clustering_estimator (Estimator, optional (default=None)) – The base clustering algorithm for performing data clustering. A valid clustering algorithm should be passed in. The estimator should have standard sklearn APIs, fit() and predict(). The estimator should have attributes labels_ and cluster_centers_. If cluster_centers_ is not in the attributes once the model is fit, it is calculated as the mean of the samples in a cluster. If not set, CBLOF uses KMeans for scalability. See https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html

  • alpha (float in (0.5, 1), optional (default=0.9)) – Coefficient for deciding small and large clusters. The ratio of the number of samples in large clusters to the number of samples in small clusters.

  • beta (int or float in (1,), optional (default=5)) – Coefficient for deciding small and large clusters. For a list of clusters sorted by size |C1|, |C2|, …, |Cn|, beta = |Ck|/|Ck-1|

  • use_weights (bool, optional (default=False)) – If set to True, the sizes of clusters are used as weights in outlier score calculation.

  • check_estimator (bool, optional (default=False)) –

    If set to True, check whether the base estimator is consistent with the sklearn standard.

    Warning: check_estimator may throw errors with scikit-learn 0.20 and above.
    

  • random_state (int, RandomState or None, optional (default=None)) – If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.
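How alpha and beta separate large clusters from small ones can be sketched as follows. This is an illustration under stated assumptions, not the primitive's code: cluster sizes are sorted in decreasing order, and the boundary is placed at the first position where either the large clusters cover a fraction alpha of all samples or the size ratio between two consecutive clusters reaches beta. How the wrapped implementation combines the two conditions is an assumption here, and split_clusters is a hypothetical name.

```python
def split_clusters(sizes, alpha=0.9, beta=5):
    """Split cluster sizes into (large, small) lists (hypothetical helper).

    Assumption: the boundary is the first position b where either
      * the b largest clusters hold at least alpha of all samples, or
      * the size ratio |C_b| / |C_(b+1)| is at least beta.
    """
    ordered = sorted(sizes, reverse=True)
    total = sum(ordered)
    covered = 0
    for b in range(1, len(ordered)):
        covered += ordered[b - 1]
        enough_coverage = covered >= alpha * total
        sharp_drop = ordered[b - 1] / ordered[b] >= beta
        if enough_coverage or sharp_drop:
            return ordered[:b], ordered[b:]
    # No boundary found: treat every cluster as large.
    return ordered, []


# Five clusters holding 100 samples in total.
large, small = split_clusters([5, 50, 3, 40, 2], alpha=0.9, beta=5)
```

In this example the two biggest clusters already hold 90 of the 100 samples, so they are treated as large and the remaining three as small.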

fit(*, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[None]

Fit the model on the training data (a container DataFrame of time series).

Returns

None

Parameters
  • timeout – A maximum time this primitive should be fitting during this method call, in seconds.

  • iterations – How many of internal iterations should the primitive do.

Returns

Return type

A CallResult with None value.

get_params() → tods.detection_algorithm.PyodCBLOF.Params

Return the primitive's parameters. Takes no arguments.

Returns

class Params

Returns

Return type

An instance of parameters.

produce(*, inputs: d3m.container.pandas.DataFrame, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[d3m.container.pandas.DataFrame]

Process the testing data. The inputs argument is a container DataFrame holding the time series on which to run outlier detection.

Returns

Container DataFrame in which 1 marks outliers and 0 marks normal samples.

Parameters
  • inputs – The inputs of shape [num_inputs, …].

  • timeout – A maximum time this primitive should take to produce outputs during this method call, in seconds.

  • iterations – How many of internal iterations should the primitive do.

Returns

Return type

The outputs of shape [num_inputs, …] wrapped inside CallResult.

set_params(*, params: tods.detection_algorithm.PyodCBLOF.Params) → None

Set parameters for outlier detection from an instance of class Params.

Returns

None

Parameters

params – An instance of parameters.

set_training_data(*, inputs: d3m.container.pandas.DataFrame) → None

Set training data for outlier detection. The inputs argument is a container DataFrame.

Returns

None

Parameters

inputs – The inputs.

tods.detection_algorithm.PyodCOF

class tods.detection_algorithm.PyodCOF.COFPrimitive(*args, **kwds)

Bases: tods.detection_algorithm.UODBasePrimitive.UnsupervisedOutlierDetectorBase

Connectivity-Based Outlier Factor (COF). COF uses the ratio of the average chaining distance of a data point to the mean of the average chaining distances of its k nearest neighbors as the outlier score for observations. See :cite:`tang2002enhancing` for details.

decision_scores_

The outlier scores of the training data. The higher, the more abnormal. Outliers tend to have higher scores. This value is available once the detector is fitted.

Type

numpy array of shape (n_samples,)

threshold_

The threshold is based on contamination. It is set so that the n_samples * contamination most abnormal samples in decision_scores_ fall above it. The threshold is used to generate binary outlier labels.

Type

float

labels_

The binary labels of the training data. 0 stands for inliers and 1 for outliers/anomalies. It is generated by applying threshold_ on decision_scores_.

Type

int, either 0 or 1

n_neighbors_

Number of neighbors to use by default for k neighbors queries.

Type

int

Parameters
  • contamination (float in (0., 0.5), optional (default=0.1)) – The amount of contamination of the data set, i.e. the proportion of outliers in the data set. Used when fitting to define the threshold on the decision function.

  • n_neighbors (int, optional (default=20)) – Number of neighbors to use by default for k neighbors queries. Note that n_neighbors should be less than the number of samples. If n_neighbors is larger than the number of samples provided, all samples will be used.

fit(*, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[None]

Fit the model on the training data (a container DataFrame of time series).

Returns

None

Parameters
  • timeout – A maximum time this primitive should be fitting during this method call, in seconds.

  • iterations – How many of internal iterations should the primitive do.

Returns

Return type

A CallResult with None value.

get_params() → tods.detection_algorithm.PyodCOF.Params

Return the primitive's parameters. Takes no arguments.

Returns

class Params

Returns

Return type

An instance of parameters.

produce(*, inputs: d3m.container.pandas.DataFrame, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[d3m.container.pandas.DataFrame]

Process the testing data. The inputs argument is a container DataFrame holding the time series on which to run outlier detection.

Returns

Container DataFrame in which 1 marks outliers and 0 marks normal samples.

Parameters
  • inputs – The inputs of shape [num_inputs, …].

  • timeout – A maximum time this primitive should take to produce outputs during this method call, in seconds.

  • iterations – How many of internal iterations should the primitive do.

Returns

Return type

The outputs of shape [num_inputs, …] wrapped inside CallResult.

produce_score(*, inputs: d3m.container.pandas.DataFrame, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[d3m.container.pandas.DataFrame]

Process the testing data. The inputs argument is a container DataFrame holding the time series on which to run outlier detection.

Returns

Container DataFrame holding the outlier scores of the input DataFrame.

set_params(*, params: tods.detection_algorithm.PyodCOF.Params) → None

Set parameters for outlier detection from an instance of class Params.

Returns

None

Parameters

params – An instance of parameters.

set_training_data(*, inputs: d3m.container.pandas.DataFrame) → None

Set training data for outlier detection. The inputs argument is a container DataFrame.

Returns

None

Parameters

inputs – The inputs.

tods.detection_algorithm.PyodHBOS

class tods.detection_algorithm.PyodHBOS.HBOSPrimitive(*args, **kwds)

Bases: tods.detection_algorithm.UODBasePrimitive.UnsupervisedOutlierDetectorBase

Histogram-based Outlier Detection (HBOS). HBOS is an efficient unsupervised method. It assumes feature independence and calculates the degree of outlyingness by building histograms. See :cite:`goldstein2012histogram` for details.
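The histogram idea can be sketched in a few lines of plain Python. This is a simplified illustration, not the wrapped implementation: it assumes equal-width bins, uses the relative frequency of a bin as its density, and sums -log2(density + alpha) over features so that samples in sparse bins score higher. The function name hbos_scores is hypothetical.

```python
import math


def hbos_scores(rows, n_bins=10, alpha=0.1):
    """Score each row by summing per-feature histogram rarity (simplified sketch)."""
    n_features = len(rows[0])
    scores = [0.0] * len(rows)
    for j in range(n_features):
        column = [row[j] for row in rows]
        low, high = min(column), max(column)
        # Fall back to width 1.0 when the feature is constant.
        width = (high - low) / n_bins or 1.0
        counts = [0] * n_bins
        bin_index = []
        for value in column:
            # The maximum value would land in bin n_bins; clamp it into the last bin.
            b = min(int((value - low) / width), n_bins - 1)
            bin_index.append(b)
            counts[b] += 1
        for i, b in enumerate(bin_index):
            density = counts[b] / len(column)
            # alpha keeps the logarithm finite for very sparse bins.
            scores[i] -= math.log2(density + alpha)
    return scores


# Twenty samples inside the unit square plus one far-away point.
rows = [[i * 0.05, 1.0 - i * 0.05] for i in range(20)] + [[10.0, 10.0]]
scores = hbos_scores(rows)
```

Because features are treated independently, the score is just a sum over one-dimensional histograms, which is what makes the method cheap.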

bin_edges_

The edges of the bins.

Type

numpy array of shape (n_bins + 1, n_features)

hist_

The density of each histogram.

Type

numpy array of shape (n_bins, n_features)

decision_scores_

The outlier scores of the training data. The higher, the more abnormal. Outliers tend to have higher scores. This value is available once the detector is fitted.

Type

numpy array of shape (n_samples,)

threshold_

The threshold is based on contamination. It is set so that the n_samples * contamination most abnormal samples in decision_scores_ fall above it. The threshold is used to generate binary outlier labels.

Type

float

labels_

The binary labels of the training data. 0 stands for inliers and 1 for outliers/anomalies. It is generated by applying threshold_ on decision_scores_.

Type

int, either 0 or 1

Parameters
  • n_bins (int, optional (default=10)) – The number of bins.

  • alpha (float in (0, 1), optional (default=0.1)) – The regularizer for preventing overflow.

  • tol (float in (0, 1), optional (default=0.1)) – The parameter that decides the flexibility when dealing with samples falling outside the bins.

  • contamination (float in (0., 0.5), optional (default=0.1)) – The amount of contamination of the data set, i.e. the proportion of outliers in the data set. Used when fitting to define the threshold on the decision function.

fit(*, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[None]

Fit the model on the training data (a container DataFrame of time series).

Returns

None

Parameters
  • timeout – A maximum time this primitive should be fitting during this method call, in seconds.

  • iterations – How many of internal iterations should the primitive do.

Returns

Return type

A CallResult with None value.

get_params() → tods.detection_algorithm.PyodHBOS.Params

Return the primitive's parameters. Takes no arguments.

Returns

class Params

Returns

Return type

An instance of parameters.

produce(*, inputs: d3m.container.pandas.DataFrame, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[d3m.container.pandas.DataFrame]

Process the testing data. The inputs argument is a container DataFrame holding the time series on which to run outlier detection.

Returns

Container DataFrame in which 1 marks outliers and 0 marks normal samples.

Parameters
  • inputs – The inputs of shape [num_inputs, …].

  • timeout – A maximum time this primitive should take to produce outputs during this method call, in seconds.

  • iterations – How many of internal iterations should the primitive do.

Returns

Return type

The outputs of shape [num_inputs, …] wrapped inside CallResult.

produce_score(*, inputs: d3m.container.pandas.DataFrame, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[d3m.container.pandas.DataFrame]

Process the testing data. The inputs argument is a container DataFrame holding the time series on which to run outlier detection.

Returns

Container DataFrame holding the outlier scores of the input DataFrame.

set_params(*, params: tods.detection_algorithm.PyodHBOS.Params) → None

Set parameters for outlier detection from an instance of class Params.

Returns

None

Parameters

params – An instance of parameters.

set_training_data(*, inputs: d3m.container.pandas.DataFrame) → None

Set training data for outlier detection. The inputs argument is a container DataFrame.

Returns

None

Parameters

inputs – The inputs.

tods.detection_algorithm.PyodIsolationForest

class tods.detection_algorithm.PyodIsolationForest.IsolationForestPrimitive(*args, **kwds)

Bases: tods.detection_algorithm.UODBasePrimitive.UnsupervisedOutlierDetectorBase

Wrapper of Pyod Isolation Forest with more functionalities. The IsolationForest ‘isolates’ observations by randomly selecting a feature and then randomly selecting a split value between the maximum and minimum values of the selected feature. See :cite:`liu2008isolation,liu2012isolation` for details. Since recursive partitioning can be represented by a tree structure, the number of splittings required to isolate a sample is equivalent to the path length from the root node to the terminating node. This path length, averaged over a forest of such random trees, is a measure of normality and our decision function. Random partitioning produces noticeably shorter paths for anomalies. Hence, when a forest of random trees collectively produce shorter path lengths for particular samples, they are highly likely to be anomalies.

decision_scores_

The outlier scores of the training data. The higher, the more abnormal. Outliers tend to have higher scores. This value is available once the detector is fitted.

Type

numpy array of shape (n_samples,)

threshold_

The threshold is based on contamination. It is set so that the n_samples * contamination most abnormal samples in decision_scores_ fall above it. The threshold is used to generate binary outlier labels.

Type

float

labels_

The binary labels of the training data. 0 stands for inliers and 1 for outliers/anomalies. It is generated by applying threshold_ on decision_scores_.

Type

int, either 0 or 1

Parameters
  • n_estimators (int, optional (default=100)) – The number of base estimators in the ensemble.

  • max_samples (int or float, optional (default="auto")) –

    The number of samples to draw from X to train each base estimator.
    • If int, then draw max_samples samples.

    • If float, then draw max_samples * X.shape[0] samples.

    • If “auto”, then max_samples=min(256, n_samples).

    If max_samples is larger than the number of samples provided, all samples will be used for all trees (no sampling).
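The rules for max_samples listed above can be written out as a small helper. This is a sketch of the documented behavior only; resolve_max_samples is a hypothetical name, and truncating the float case with int() is an assumption.

```python
def resolve_max_samples(max_samples, n_samples):
    """Return how many samples each base estimator draws (hypothetical helper)."""
    if max_samples == "auto":
        drawn = min(256, n_samples)
    elif isinstance(max_samples, int):
        drawn = max_samples
    elif isinstance(max_samples, float):
        # Assumption: the fractional result is truncated to an integer.
        drawn = int(max_samples * n_samples)
    else:
        raise ValueError("max_samples must be an int, a float or 'auto'")
    # Asking for more samples than exist means every sample is used (no sampling).
    return min(drawn, n_samples)


auto_large = resolve_max_samples("auto", 1000)
auto_small = resolve_max_samples("auto", 100)
half = resolve_max_samples(0.5, 1000)
capped = resolve_max_samples(5000, 1000)
```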

  • contamination (float in (0., 0.5), optional (default=0.1)) – The amount of contamination of the data set, i.e. the proportion of outliers in the data set. Used when fitting to define the threshold on the decision function.

  • max_features (int or float, optional (default=1.0)) –

    The number of features to draw from X to train each base estimator.
    • If int, then draw max_features features.

    • If float, then draw max_features * X.shape[1] features.

  • bootstrap (bool, optional (default=False)) – If True, individual trees are fit on random subsets of the training data sampled with replacement. If False, sampling without replacement is performed.

  • behaviour (str, default 'old') – Behaviour of the decision_function which can be either ‘old’ or ‘new’. Passing behaviour='new' makes the decision_function change to match other anomaly detection algorithm API which will be the default behaviour in the future. As explained in details in the offset_ attribute documentation, the decision_function becomes dependent on the contamination parameter, in such a way that 0 becomes its natural threshold to detect outliers.

  • random_state (int, RandomState instance or None, optional (default=None)) – If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.

  • verbose (int, optional (default=0)) – Controls the verbosity of the tree building process.

fit(*, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[None]

Fit the model on the training data (a container DataFrame of time series).

Returns

None

Parameters
  • timeout – A maximum time this primitive should be fitting during this method call, in seconds.

  • iterations – How many of internal iterations should the primitive do.

Returns

Return type

A CallResult with None value.

get_params() → tods.detection_algorithm.PyodIsolationForest.Params

Return the primitive's parameters. Takes no arguments.

Returns

class Params

Returns

Return type

An instance of parameters.

produce(*, inputs: d3m.container.pandas.DataFrame, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[d3m.container.pandas.DataFrame]

Process the testing data. The inputs argument is a container DataFrame holding the time series on which to run outlier detection.

Returns

Container DataFrame in which 1 marks outliers and 0 marks normal samples.

Parameters
  • inputs – The inputs of shape [num_inputs, …].

  • timeout – A maximum time this primitive should take to produce outputs during this method call, in seconds.

  • iterations – How many of internal iterations should the primitive do.

Returns

Return type

The outputs of shape [num_inputs, …] wrapped inside CallResult.

set_params(*, params: tods.detection_algorithm.PyodIsolationForest.Params) → None

Set parameters for outlier detection from an instance of class Params.

Returns

None

Parameters

params – An instance of parameters.

set_training_data(*, inputs: d3m.container.pandas.DataFrame) → None

Set training data for outlier detection. The inputs argument is a container DataFrame.

Returns

None

Parameters

inputs – The inputs.

tods.detection_algorithm.PyodKNN

class tods.detection_algorithm.PyodKNN.KNNPrimitive(*args, **kwds)

Bases: tods.detection_algorithm.UODBasePrimitive.UnsupervisedOutlierDetectorBase

kNN class for outlier detection. For an observation, its distance to its kth nearest neighbor can be viewed as its outlier score, which is in effect a measure of local density. See :cite:`ramaswamy2000efficient,angiulli2002fast` for details. Three kNN detectors are supported:
  • largest: use the distance to the kth neighbor as the outlier score

  • mean: use the average distance to all k neighbors as the outlier score

  • median: use the median of the distances to the k neighbors as the outlier score
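The three scoring variants differ only in how the distances to the k nearest neighbors are aggregated. The following brute-force sketch in plain Python illustrates this; it assumes Euclidean distance and is not the wrapped implementation, and knn_scores is a hypothetical name.

```python
import math
import statistics


def knn_scores(points, k=2, method="largest"):
    """Brute-force kNN outlier scores with Euclidean distance (sketch)."""
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, nearest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        nearest = dists[:k]
        if method == "largest":
            scores.append(nearest[-1])
        elif method == "mean":
            scores.append(sum(nearest) / len(nearest))
        elif method == "median":
            scores.append(statistics.median(nearest))
        else:
            raise ValueError("method must be 'largest', 'mean' or 'median'")
    return scores


# Four corners of a unit square plus one distant point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
largest = knn_scores(points, k=2, method="largest")
mean = knn_scores(points, k=2, method="mean")
median = knn_scores(points, k=2, method="median")
```

All three variants rank the distant point as the most abnormal here; they diverge mainly when a sample has a few close neighbors and many far ones.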

decision_scores_

The outlier scores of the training data. The higher, the more abnormal. Outliers tend to have higher scores. This value is available once the detector is fitted.

Type

numpy array of shape (n_samples,)

threshold_

The threshold is based on contamination. It is set so that the n_samples * contamination most abnormal samples in decision_scores_ fall above it. The threshold is used to generate binary outlier labels.

Type

float

labels_

The binary labels of the training data. 0 stands for inliers and 1 for outliers/anomalies. It is generated by applying threshold_ on decision_scores_.

Type

int, either 0 or 1

Parameters
  • contamination (float in (0., 0.5), optional (default=0.1)) – The amount of contamination of the data set, i.e. the proportion of outliers in the data set. Used when fitting to define the threshold on the decision function.

  • n_neighbors (int, optional (default = 5)) – Number of neighbors to use by default for k neighbors queries.

  • method (str, optional (default='largest')) –

    One of {‘largest’, ‘mean’, ‘median’}.
    • ‘largest’: use the distance to the kth neighbor as the outlier score

    • ‘mean’: use the average distance to all k neighbors as the outlier score

    • ‘median’: use the median of the distances to the k neighbors as the outlier score

  • radius (float, optional (default = 1.0)) – Range of parameter space to use by default for radius_neighbors queries.

  • algorithm ({'auto', 'ball_tree', 'kd_tree', 'brute'}, optional) –

    Algorithm used to compute the nearest neighbors:
    • ‘ball_tree’ will use BallTree

    • ‘kd_tree’ will use KDTree

    • ‘brute’ will use a brute-force search.

    • ‘auto’ will attempt to decide the most appropriate algorithm based on the values passed to fit() method.

    Note: fitting on sparse input will override the setting of this parameter, using brute force.

    Deprecated: algorithm is deprecated in PyOD 0.7.4 and will not be possible in 0.7.6. It has to use BallTree for consistency.

  • leaf_size (int, optional (default = 30)) – Leaf size passed to BallTree. This can affect the speed of the construction and query, as well as the memory required to store the tree. The optimal value depends on the nature of the problem.

  • metric (string or callable, default 'minkowski') –

    Metric to use for distance computation. Any metric from scikit-learn or scipy.spatial.distance can be used. If metric is a callable function, it is called on each pair of instances (rows) and the resulting value recorded. The callable should take two arrays as input and return one value indicating the distance between them. This works for Scipy’s metrics, but is less efficient than passing the metric name as a string. Distance matrices are not supported. Valid values for metric are:
    • from scikit-learn: [‘cityblock’, ‘cosine’, ‘euclidean’, ‘l1’, ‘l2’, ‘manhattan’]

    • from scipy.spatial.distance: [‘braycurtis’, ‘canberra’, ‘chebyshev’, ‘correlation’, ‘dice’, ‘hamming’, ‘jaccard’, ‘kulsinski’, ‘mahalanobis’, ‘matching’, ‘minkowski’, ‘rogerstanimoto’, ‘russellrao’, ‘seuclidean’, ‘sokalmichener’, ‘sokalsneath’, ‘sqeuclidean’, ‘yule’]

    See the documentation for scipy.spatial.distance for details on these metrics.

  • p (integer, optional (default = 2)) – Parameter for the Minkowski metric from sklearn.metrics.pairwise.pairwise_distances. When p = 1, this is equivalent to using manhattan_distance (l1), and euclidean_distance (l2) for p = 2. For arbitrary p, minkowski_distance (l_p) is used. See http://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.pairwise_distances

  • metric_params (dict, optional (default = None)) – Additional keyword arguments for the metric function.

  • n_jobs (int, optional (default = 1)) – The number of parallel jobs to run for neighbors search. If -1, then the number of jobs is set to the number of CPU cores. Affects only kneighbors and kneighbors_graph methods.
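The role of p in the Minkowski metric can be checked with a short plain-Python function. It is an illustration of the formula only; the name minkowski is chosen here and is not part of the primitive's API.

```python
def minkowski(a, b, p=2):
    """Minkowski distance of order p between two equal-length sequences."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


manhattan = minkowski((0, 0), (3, 4), p=1)  # sum of absolute differences
euclidean = minkowski((0, 0), (3, 4), p=2)  # straight-line distance
```

For the points (0, 0) and (3, 4), p = 1 adds the coordinate differences (3 + 4), while p = 2 gives the length of the hypotenuse.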

fit(*, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[None]

Fit the model on the training data (a container DataFrame of time series).

Returns

None

Parameters
  • timeout – A maximum time this primitive should be fitting during this method call, in seconds.

  • iterations – How many of internal iterations should the primitive do.

Returns

Return type

A CallResult with None value.

get_params() → tods.detection_algorithm.PyodKNN.Params

Return the primitive's parameters. Takes no arguments.

Returns

class Params

Returns

Return type

An instance of parameters.

produce(*, inputs: d3m.container.pandas.DataFrame, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[d3m.container.pandas.DataFrame]

Process the testing data. The inputs argument is a container DataFrame holding the time series on which to run outlier detection.

Returns

Container DataFrame in which 1 marks outliers and 0 marks normal samples.

Parameters
  • inputs – The inputs of shape [num_inputs, …].

  • timeout – A maximum time this primitive should take to produce outputs during this method call, in seconds.

  • iterations – How many of internal iterations should the primitive do.

Returns

Return type

The outputs of shape [num_inputs, …] wrapped inside CallResult.

set_params(*, params: tods.detection_algorithm.PyodKNN.Params) → None

Set parameters for outlier detection from an instance of class Params.

Returns

None

Parameters

params – An instance of parameters.

set_training_data(*, inputs: d3m.container.pandas.DataFrame) → None

Set training data for outlier detection. The inputs argument is a container DataFrame.

Returns

None

Parameters

inputs – The inputs.

tods.detection_algorithm.PyodLODA

class tods.detection_algorithm.PyodLODA.LODAPrimitive(*args, **kwds)

Bases: tods.detection_algorithm.UODBasePrimitive.UnsupervisedOutlierDetectorBase

Wrapper of PyOD LODA. LODA: Lightweight On-line Detector of Anomalies. See :cite:`pevny2016loda` for more information.

decision_scores_

The outlier scores of the training data. The higher, the more abnormal. Outliers tend to have higher scores. This value is available once the detector is fitted.

Type

numpy array of shape (n_samples,)

threshold_

The threshold is based on contamination. It is set so that the n_samples * contamination most abnormal samples in decision_scores_ fall above it. The threshold is used to generate binary outlier labels.

Type

float

labels_

The binary labels of the training data. 0 stands for inliers and 1 for outliers/anomalies. It is generated by applying threshold_ on decision_scores_.

Type

int, either 0 or 1

Parameters
  • contamination (float in (0., 0.5), optional (default=0.1)) – The amount of contamination of the data set, i.e. the proportion of outliers in the data set. Used when fitting to define the threshold on the decision function.

  • n_bins (int, optional (default = 10)) – The number of bins for the histogram.

  • n_random_cuts (int, optional (default = 100)) – The number of random cuts.

fit(*, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[None]

Fit the model on the training data (a container DataFrame of time series).

Returns

None

Parameters
  • timeout – A maximum time this primitive should be fitting during this method call, in seconds.

  • iterations – How many of internal iterations should the primitive do.

Returns

Return type

A CallResult with None value.

get_params() → tods.detection_algorithm.PyodLODA.Params

Return the primitive's parameters. Takes no arguments.

Returns

class Params

Returns

Return type

An instance of parameters.

produce(*, inputs: d3m.container.pandas.DataFrame, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[d3m.container.pandas.DataFrame]

Process the testing data. The inputs argument is a container DataFrame holding the time series on which to run outlier detection.

Returns

Container DataFrame in which 1 marks outliers and 0 marks normal samples.

Parameters
  • inputs – The inputs of shape [num_inputs, …].

  • timeout – A maximum time this primitive should take to produce outputs during this method call, in seconds.

  • iterations – How many of internal iterations should the primitive do.

Returns

Return type

The outputs of shape [num_inputs, …] wrapped inside CallResult.

set_params(*, params: tods.detection_algorithm.PyodLODA.Params) → None

Set parameters for outlier detection from an instance of class Params.

Returns

None

Parameters

params – An instance of parameters.

set_training_data(*, inputs: d3m.container.pandas.DataFrame) → None

Set training data for outlier detection. The inputs argument is a container DataFrame.

Returns

None

Parameters

inputs – The inputs.

tods.detection_algorithm.PyodLOF

class tods.detection_algorithm.PyodLOF.LOFPrimitive(*args, **kwds)

Bases: tods.detection_algorithm.UODBasePrimitive.UnsupervisedOutlierDetectorBase

Wrapper of the PyOD LOF class with additional functionality. Unsupervised outlier detection using the Local Outlier Factor (LOF). The anomaly score of each sample is called its Local Outlier Factor. It measures the local deviation of the density of a given sample with respect to its neighbors. It is local in that the anomaly score depends on how isolated the object is with respect to the surrounding neighborhood. More precisely, locality is given by the k nearest neighbors, whose distance is used to estimate the local density. By comparing the local density of a sample to the local densities of its neighbors, one can identify samples that have a substantially lower density than their neighbors. These are considered outliers. See :cite:`breunig2000lof` for details.

decision_scores_

The outlier scores of the training data. The higher, the more abnormal. Outliers tend to have higher scores. This value is available once the detector is fitted.

Type

numpy array of shape (n_samples,)

threshold_

The threshold is based on contamination. It is the score cutoff that marks the n_samples * contamination most abnormal samples in decision_scores_ as outliers, and it is used to generate the binary outlier labels.

Type

float

labels_

The binary labels of the training data. 0 stands for inliers and 1 for outliers/anomalies. It is generated by applying threshold_ on decision_scores_.

Type

int, either 0 or 1
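The relationship between contamination, threshold_, and labels_ can be sketched in a few lines of NumPy. This is a minimal illustration that assumes a percentile-based cutoff; the helper name threshold_and_labels is hypothetical and not part of TODS.

```python
import numpy as np

def threshold_and_labels(decision_scores, contamination=0.1):
    """Derive a score cutoff and binary labels from outlier scores."""
    decision_scores = np.asarray(decision_scores, dtype=float)
    # Cutoff chosen so that roughly a `contamination` fraction lies above it.
    threshold = np.percentile(decision_scores, 100.0 * (1.0 - contamination))
    # 1 marks outliers (score above the cutoff), 0 marks inliers.
    labels = (decision_scores > threshold).astype(int)
    return threshold, labels

scores = np.array([0.1, 0.2, 0.15, 0.3, 0.25, 0.12, 0.18, 0.22, 0.05, 9.0])
threshold, labels = threshold_and_labels(scores, contamination=0.1)
print(threshold, labels)
```

With ten samples and contamination=0.1, only the single most abnormal score ends up above the cutoff.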

Parameters
  • n_neighbors (int, optional (default=20)) – Number of neighbors to use by default for kneighbors queries. If n_neighbors is larger than the number of samples provided, all samples will be used.

  • algorithm ({'auto', 'ball_tree', 'kd_tree', 'brute'}, optional) –

    Algorithm used to compute the nearest neighbors:

    • ‘ball_tree’ will use BallTree

    • ‘kd_tree’ will use KDTree

    • ‘brute’ will use a brute-force search

    • ‘auto’ will attempt to decide the most appropriate algorithm based on the values passed to the fit() method.

    Note: fitting on sparse input will override the setting of this parameter, using brute force.

  • leaf_size (int, optional (default=30)) – Leaf size passed to BallTree or KDTree. This can affect the speed of the construction and query, as well as the memory required to store the tree. The optimal value depends on the nature of the problem.

  • metric (string or callable, default 'minkowski') –

    Metric used for the distance computation. Any metric from scikit-learn or scipy.spatial.distance can be used. If ‘precomputed’, the training input X is expected to be a distance matrix. If metric is a callable function, it is called on each pair of instances (rows) and the resulting value recorded. The callable should take two arrays as input and return one value indicating the distance between them. This works for Scipy’s metrics, but is less efficient than passing the metric name as a string. Valid values for metric are:

    • from scikit-learn: [‘cityblock’, ‘cosine’, ‘euclidean’, ‘l1’, ‘l2’, ‘manhattan’]

    • from scipy.spatial.distance: [‘braycurtis’, ‘canberra’, ‘chebyshev’, ‘correlation’, ‘dice’, ‘hamming’, ‘jaccard’, ‘kulsinski’, ‘mahalanobis’, ‘matching’, ‘minkowski’, ‘rogerstanimoto’, ‘russellrao’, ‘seuclidean’, ‘sokalmichener’, ‘sokalsneath’, ‘sqeuclidean’, ‘yule’]

    See the documentation for scipy.spatial.distance for details on these metrics: http://docs.scipy.org/doc/scipy/reference/spatial.distance.html

  • p (integer, optional (default = 2)) – Parameter for the Minkowski metric from sklearn.metrics.pairwise.pairwise_distances. When p = 1, this is equivalent to using manhattan_distance (l1), and euclidean_distance (l2) for p = 2. For arbitrary p, minkowski_distance (l_p) is used. See http://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.pairwise_distances

  • metric_params (dict, optional (default = None)) – Additional keyword arguments for the metric function.

  • contamination (float in (0., 0.5), optional (default=0.1)) – The amount of contamination of the data set, i.e. the proportion of outliers in the data set. When fitting this is used to define the threshold on the decision function.

  • n_jobs (int, optional (default = 1)) – The number of parallel jobs to run for neighbors search. If -1, then the number of jobs is set to the number of CPU cores. Affects only kneighbors and kneighbors_graph methods.
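The way n_neighbors shapes the Local Outlier Factor can be seen in a small brute-force sketch. It assumes Euclidean distance and computes everything with dense NumPy arrays, so it is only suitable for tiny inputs; the function name lof_scores is hypothetical and this is not the wrapped PyOD code.

```python
import numpy as np

def lof_scores(X, n_neighbors=20):
    """Brute-force Local Outlier Factor; values near 1 indicate inliers."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # If n_neighbors exceeds the available samples, use all other samples.
    k = min(n_neighbors, n - 1)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor
    nn_idx = np.argsort(dist, axis=1)[:, :k]
    nn_dist = np.take_along_axis(dist, nn_idx, axis=1)
    k_dist = nn_dist[:, -1]  # distance to the k-th nearest neighbor
    # Reachability distance of each point from each of its neighbors.
    reach = np.maximum(nn_dist, k_dist[nn_idx])
    # Local reachability density: inverse of the mean reachability distance.
    lrd = 1.0 / (reach.mean(axis=1) + 1e-12)
    # LOF: mean density of the neighbors relative to the point's own density.
    return lrd[nn_idx].mean(axis=1) / lrd

X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [10.0, 10.0]])
scores = lof_scores(X, n_neighbors=2)
print(scores)
```

The four corner points have the same density as their neighbors and score close to 1, while the isolated point has a much lower density than its neighbors and scores far above 1.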

fit(*, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[None]

Fit the model with the training data, a container DataFrame holding the time series to fit.

Returns

None

Parameters
  • timeout – A maximum time this primitive should be fitting during this method call, in seconds.

  • iterations – How many internal iterations the primitive should do.

Returns

Return type

A CallResult with None value.

get_params() → tods.detection_algorithm.PyodLOF.Params

Return parameters. This method takes no arguments.

Returns

class Params

Returns

Return type

An instance of parameters.

produce(*, inputs: d3m.container.pandas.DataFrame, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[d3m.container.pandas.DataFrame]

Process the testing data. The inputs argument is a container DataFrame holding the time series on which to run outlier detection.

Returns

Container DataFrame in which 1 marks outliers and 0 marks normal samples.

Parameters
  • inputs – The inputs of shape [num_inputs, …].

  • timeout – A maximum time this primitive should take to produce outputs during this method call, in seconds.

  • iterations – How many internal iterations the primitive should do.

Returns

Return type

The outputs of shape [num_inputs, …] wrapped inside CallResult.

set_params(*, params: tods.detection_algorithm.PyodLOF.Params) → None

Set parameters for outlier detection from an instance of class Params.

Returns

None

Parameters

params – An instance of parameters.

set_training_data(*, inputs: d3m.container.pandas.DataFrame) → None

Set training data for outlier detection. The inputs argument is a container DataFrame.

Returns

None

Parameters

inputs – The inputs.

tods.detection_algorithm.PyodMoGaal

class tods.detection_algorithm.PyodMoGaal.Mo_GaalPrimitive(*args, **kwds)

Bases: tods.detection_algorithm.UODBasePrimitive.UnsupervisedOutlierDetectorBase

Multi-Objective Generative Adversarial Active Learning (MO-GAAL). MO-GAAL directly generates informative potential outliers to assist the classifier in describing a boundary that can separate outliers from normal data effectively. Moreover, to prevent the generator from falling into the mode collapse problem, the network structure is expanded from a single generator (SO-GAAL) to multiple generators with different objectives (MO-GAAL) to generate a reasonable reference distribution for the whole dataset. See :cite:`liu2019generative` for details.

decision_scores_

The outlier scores of the training data. The higher, the more abnormal. Outliers tend to have higher scores. This value is available once the detector is fitted.

Type

numpy array of shape (n_samples,)

threshold_

The threshold is based on contamination. It is the score cutoff that marks the n_samples * contamination most abnormal samples in decision_scores_ as outliers, and it is used to generate the binary outlier labels.

Type

float

labels_

The binary labels of the training data. 0 stands for inliers and 1 for outliers/anomalies. It is generated by applying threshold_ on decision_scores_.

Type

int, either 0 or 1

Parameters
  • contamination (float in (0., 0.5), optional (default=0.1)) – The amount of contamination of the data set, i.e. the proportion of outliers in the data set. Used when fitting to define the threshold on the decision function.

  • k (int, optional (default=10)) – The number of sub generators.

  • stop_epochs (int, optional (default=20)) – The number of epochs of training.

  • lr_d (float, optional (default=0.01)) – The learning rate of the discriminator.

  • lr_g (float, optional (default=0.0001)) – The learning rate of the generator.

  • decay (float, optional (default=1e-6)) – The decay parameter for SGD.

  • momentum (float, optional (default=0.9)) – The momentum parameter for SGD.
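To make the roles of the learning rate, decay, and momentum concrete, the following sketch applies one common form of SGD with momentum and time-based learning-rate decay to a toy quadratic. Whether the underlying optimizer uses exactly this update rule is an assumption; the function sgd_momentum_step and the toy objective are illustrative only and do not train a GAN.

```python
def sgd_momentum_step(w, v, grad, lr, momentum, decay, iteration):
    """One SGD update with momentum and time-based learning-rate decay."""
    # The effective learning rate shrinks slowly as iterations accumulate.
    lr_t = lr / (1.0 + decay * iteration)
    # The velocity blends the previous step with the new gradient step.
    v_new = momentum * v - lr_t * grad
    return w + v_new, v_new

# Minimize f(w) = w**2 using the discriminator defaults listed above.
w, v = 5.0, 0.0
for t in range(200):
    grad = 2.0 * w  # gradient of f(w) = w**2
    w, v = sgd_momentum_step(w, v, grad, lr=0.01, momentum=0.9, decay=1e-6, iteration=t)
print(w)
```

With decay as small as 1e-6 the learning rate is almost constant over a few hundred iterations, so momentum dominates the convergence behavior in this toy run.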

fit(*, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[None]

Fit the model with the training data, a container DataFrame holding the time series to fit.

Returns

None

Parameters
  • timeout – A maximum time this primitive should be fitting during this method call, in seconds.

  • iterations – How many internal iterations the primitive should do.

Returns

Return type

A CallResult with None value.

get_params() → tods.detection_algorithm.PyodMoGaal.Params

Return parameters. This method takes no arguments.

Returns

class Params

Returns

Return type

An instance of parameters.

produce(*, inputs: d3m.container.pandas.DataFrame, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[d3m.container.pandas.DataFrame]

Process the testing data. The inputs argument is a container DataFrame holding the time series on which to run outlier detection.

Returns

Container DataFrame in which 1 marks outliers and 0 marks normal samples.

Parameters
  • inputs – The inputs of shape [num_inputs, …].

  • timeout – A maximum time this primitive should take to produce outputs during this method call, in seconds.

  • iterations – How many internal iterations the primitive should do.

Returns

Return type

The outputs of shape [num_inputs, …] wrapped inside CallResult.

set_params(*, params: tods.detection_algorithm.PyodMoGaal.Params) → None

Set parameters for outlier detection from an instance of class Params.

Returns

None

Parameters

params – An instance of parameters.

set_training_data(*, inputs: d3m.container.pandas.DataFrame) → None

Set training data for outlier detection. The inputs argument is a container DataFrame.

Returns

None

Parameters

inputs – The inputs.

tods.detection_algorithm.PyodOCSVM

class tods.detection_algorithm.PyodOCSVM.OCSVMPrimitive(*args, **kwds)

Bases: tods.detection_algorithm.UODBasePrimitive.UnsupervisedOutlierDetectorBase

Wrapper of the scikit-learn one-class SVM class with additional functionality. Unsupervised outlier detection that estimates the support of a high-dimensional distribution. The implementation is based on libsvm. See http://scikit-learn.org/stable/modules/svm.html#svm-outlier-detection and :cite:`scholkopf2001estimating`.

decision_scores_

The outlier scores of the training data. The higher, the more abnormal. Outliers tend to have higher scores. This value is available once the detector is fitted.

Type

numpy array of shape (n_samples,)

threshold_

The threshold is based on contamination. It is the score cutoff that marks the n_samples * contamination most abnormal samples in decision_scores_ as outliers, and it is used to generate the binary outlier labels.

Type

float

labels_

The binary labels of the training data. 0 stands for inliers and 1 for outliers/anomalies. It is generated by applying threshold_ on decision_scores_.

Type

int, either 0 or 1

Parameters
  • kernel (string, optional (default=‘rbf’)) – Specifies the kernel type to be used in the algorithm. It must be one of ‘linear’, ‘poly’, ‘rbf’, ‘sigmoid’, ‘precomputed’ or a callable. If none is given, ‘rbf’ will be used. If a callable is given it is used to precompute the kernel matrix.

  • nu (float, optional) – An upper bound on the fraction of training errors and a lower bound of the fraction of support vectors. Should be in the interval (0, 1]. By default 0.5 will be taken.

  • degree (int, optional (default=3)) – Degree of the polynomial kernel function (‘poly’). Ignored by all other kernels.

  • gamma (float, optional (default=‘auto’)) – Kernel coefficient for ‘rbf’, ‘poly’ and ‘sigmoid’. If gamma is ‘auto’ then 1/n_features will be used instead.

  • coef0 (float, optional (default=0.0)) – Independent term in kernel function. It is only significant in ‘poly’ and ‘sigmoid’.

  • tol (float, optional) – Tolerance for stopping criterion.

  • shrinking (bool, optional) – Whether to use the shrinking heuristic.

  • cache_size (float, optional) – Specify the size of the kernel cache (in MB).

  • verbose (bool, default: False) – Enable verbose output. Note that this setting takes advantage of a per-process runtime setting in libsvm that, if enabled, may not work properly in a multithreaded context.

  • max_iter (int, optional (default=-1)) – Hard limit on iterations within solver, or -1 for no limit.
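The interplay of kernel, gamma, coef0, and degree can be illustrated by evaluating two of the kernels directly. The sketch below follows the usual definitions of the RBF and polynomial kernels; the helper names resolve_gamma, rbf_kernel, and poly_kernel are hypothetical and are not part of TODS or scikit-learn.

```python
import numpy as np

def resolve_gamma(gamma, n_features):
    """Map gamma='auto' to 1 / n_features, otherwise use the given value."""
    return 1.0 / n_features if gamma == "auto" else float(gamma)

def rbf_kernel(x, y, gamma):
    """RBF kernel: exp(-gamma * ||x - y||^2)."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.exp(-gamma * np.dot(diff, diff)))

def poly_kernel(x, y, gamma, coef0=0.0, degree=3):
    """Polynomial kernel: (gamma * <x, y> + coef0) ** degree."""
    return float((gamma * np.dot(x, y) + coef0) ** degree)

gamma = resolve_gamma("auto", n_features=2)  # 1 / 2
k_same = rbf_kernel([0.0, 0.0], [0.0, 0.0], gamma)
k_far = rbf_kernel([0.0, 0.0], [1.0, 1.0], gamma)
k_poly = poly_kernel([1.0, 2.0], [3.0, 4.0], gamma, coef0=1.0, degree=2)
print(k_same, k_far, k_poly)
```

The RBF value decays from 1 toward 0 as two samples move apart, and a larger gamma makes that decay faster; coef0 and degree only affect the ‘poly’ (and, for coef0, ‘sigmoid’) kernels.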

fit(*, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[None]

Fit the model with the training data, a container DataFrame holding the time series to fit.

Returns

None

Parameters
  • timeout – A maximum time this primitive should be fitting during this method call, in seconds.

  • iterations – How many internal iterations the primitive should do.

Returns

Return type

A CallResult with None value.

get_params() → tods.detection_algorithm.PyodOCSVM.Params

Return parameters. This method takes no arguments.

Returns

class Params

Returns

Return type

An instance of parameters.

produce(*, inputs: d3m.container.pandas.DataFrame, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[d3m.container.pandas.DataFrame]

Process the testing data. The inputs argument is a container DataFrame holding the time series on which to run outlier detection.

Returns

Container DataFrame in which 1 marks outliers and 0 marks normal samples.

Parameters
  • inputs – The inputs of shape [num_inputs, …].

  • timeout – A maximum time this primitive should take to produce outputs during this method call, in seconds.

  • iterations – How many internal iterations the primitive should do.

Returns

Return type

The outputs of shape [num_inputs, …] wrapped inside CallResult.

set_params(*, params: tods.detection_algorithm.PyodOCSVM.Params) → None

Set parameters for outlier detection from an instance of class Params.

Returns

None

Parameters

params – An instance of parameters.

set_training_data(*, inputs: d3m.container.pandas.DataFrame) → None

Set training data for outlier detection. The inputs argument is a container DataFrame.

Returns

None

Parameters

inputs – The inputs.

tods.detection_algorithm.PyodSOD

class tods.detection_algorithm.PyodSOD.SODPrimitive(*args, **kwds)

Bases: tods.detection_algorithm.UODBasePrimitive.UnsupervisedOutlierDetectorBase

The subspace outlier detection (SOD) schema aims to detect outliers in varying subspaces of a high-dimensional feature space. For each data object, SOD explores the axis-parallel subspace spanned by the data object’s neighbors and determines how much the object deviates from the neighbors in this subspace. See :cite:`kriegel2009outlier` for details.

decision_scores_

The outlier scores of the training data. The higher, the more abnormal. Outliers tend to have higher scores. This value is available once the detector is fitted.

Type

numpy array of shape (n_samples,)

threshold_

The threshold is based on contamination. It is the score cutoff that marks the n_samples * contamination most abnormal samples in decision_scores_ as outliers, and it is used to generate the binary outlier labels.

Type

float

labels_

The binary labels of the training data. 0 stands for inliers and 1 for outliers/anomalies. It is generated by applying threshold_ on decision_scores_.

Type

int, either 0 or 1

Parameters
  • n_neighbors (int, optional (default=20)) – Number of neighbors to use by default for k neighbors queries.

  • ref_set (int, optional (default=10)) – Specifies the number of shared nearest neighbors used to create the reference set. Note that ref_set must be smaller than n_neighbors.

  • alpha (float in (0., 1.), optional (default=0.8)) – Specifies the lower limit for selecting a subspace. The default of 0.8 is the value suggested in the original paper.

  • contamination (float in (0., 0.5), optional (default=0.1)) – The amount of contamination of the data set, i.e. the proportion of outliers in the data set. Used when fitting to define the threshold on the decision function.
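How alpha selects a subspace can be shown with a simplified scoring step. The sketch assumes the reference set of a point has already been chosen (the shared-nearest-neighbor search controlled by n_neighbors and ref_set is omitted), and the function name sod_score is hypothetical. Dimensions in which the reference set varies less than alpha times the average per-dimension variance are treated as relevant, and the score is the deviation from the reference mean inside that subspace.

```python
import numpy as np

def sod_score(point, reference_set, alpha=0.8):
    """Deviation of `point` from its reference set in the low-variance subspace."""
    ref = np.asarray(reference_set, dtype=float)
    point = np.asarray(point, dtype=float)
    mean = ref.mean(axis=0)
    var_per_dim = ref.var(axis=0)
    # A dimension is relevant if the reference set is unusually tight along it.
    expected = alpha * var_per_dim.mean()
    relevant = var_per_dim < expected
    if not relevant.any():
        return 0.0  # no informative subspace for this reference set
    squared_dev = np.square(point - mean)[relevant].sum()
    return float(np.sqrt(squared_dev / relevant.sum()))

# The reference set spreads widely along dimension 0 and is tight along dimension 1.
reference = [[0.0, 1.0], [10.0, 1.1], [20.0, 0.9], [30.0, 1.0]]
score_off_subspace = sod_score([15.0, 5.0], reference)   # deviates in the tight dimension
score_in_subspace = sod_score([100.0, 1.0], reference)   # deviates only in the loose dimension
print(score_off_subspace, score_in_subspace)
```

A point that is far away only along the widely spread dimension receives a score near zero, because that dimension is excluded from the selected subspace.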

fit(*, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[None]

Fit the model with the training data, a container DataFrame holding the time series to fit.

Returns

None

Parameters
  • timeout – A maximum time this primitive should be fitting during this method call, in seconds.

  • iterations – How many internal iterations the primitive should do.

Returns

Return type

A CallResult with None value.

get_params() → tods.detection_algorithm.PyodSOD.Params

Return parameters. This method takes no arguments.

Returns

class Params

Returns

Return type

An instance of parameters.

produce(*, inputs: d3m.container.pandas.DataFrame, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[d3m.container.pandas.DataFrame]

Process the testing data. The inputs argument is a container DataFrame holding the time series on which to run outlier detection.

Returns

Container DataFrame in which 1 marks outliers and 0 marks normal samples.

Parameters
  • inputs – The inputs of shape [num_inputs, …].

  • timeout – A maximum time this primitive should take to produce outputs during this method call, in seconds.

  • iterations – How many internal iterations the primitive should do.

Returns

Return type

The outputs of shape [num_inputs, …] wrapped inside CallResult.

set_params(*, params: tods.detection_algorithm.PyodSOD.Params) → None

Set parameters for outlier detection from an instance of class Params.

Returns

None

Parameters

params – An instance of parameters.

set_training_data(*, inputs: d3m.container.pandas.DataFrame) → None

Set training data for outlier detection. The inputs argument is a container DataFrame.

Returns

None

Parameters

inputs – The inputs.

tods.detection_algorithm.PyodSoGaal

class tods.detection_algorithm.PyodSoGaal.So_GaalPrimitive(*args, **kwds)

Bases: tods.detection_algorithm.UODBasePrimitive.UnsupervisedOutlierDetectorBase

Single-Objective Generative Adversarial Active Learning (SO-GAAL). SO-GAAL directly generates informative potential outliers to assist the classifier in describing a boundary that can separate outliers from normal data effectively. Moreover, to prevent the generator from falling into the mode collapse problem, the network structure of SO-GAAL is expanded from a single generator (SO-GAAL) to multiple generators with different objectives (MO-GAAL) to generate a reasonable reference distribution for the whole dataset. See :cite:`liu2019generative` for details.

decision_scores_

The outlier scores of the training data. The higher, the more abnormal. Outliers tend to have higher scores. This value is available once the detector is fitted.

Type

numpy array of shape (n_samples,)

threshold_

The threshold is based on contamination. It is the score cutoff that marks the n_samples * contamination most abnormal samples in decision_scores_ as outliers, and it is used to generate the binary outlier labels.

Type

float

labels_

The binary labels of the training data. 0 stands for inliers and 1 for outliers/anomalies. It is generated by applying threshold_ on decision_scores_.

Type

int, either 0 or 1

Parameters
  • contamination (float in (0., 0.5), optional (default=0.1)) – The amount of contamination of the data set, i.e. the proportion of outliers in the data set. Used when fitting to define the threshold on the decision function.

  • stop_epochs (int, optional (default=20)) – The number of epochs of training.

  • lr_d (float, optional (default=0.01)) – The learning rate of the discriminator.

  • lr_g (float, optional (default=0.0001)) – The learning rate of the generator.

  • decay (float, optional (default=1e-6)) – The decay parameter for SGD.

  • momentum (float, optional (default=0.9)) – The momentum parameter for SGD.

fit(*, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[None]

Fit the model with the training data, a container DataFrame holding the time series to fit.

Returns

None

Parameters
  • timeout – A maximum time this primitive should be fitting during this method call, in seconds.

  • iterations – How many internal iterations the primitive should do.

Returns

Return type

A CallResult with None value.

get_params() → tods.detection_algorithm.PyodSoGaal.Params

Return parameters. This method takes no arguments.

Returns

class Params

Returns

Return type

An instance of parameters.

produce(*, inputs: d3m.container.pandas.DataFrame, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[d3m.container.pandas.DataFrame]

Process the testing data. The inputs argument is a container DataFrame holding the time series on which to run outlier detection.

Returns

Container DataFrame in which 1 marks outliers and 0 marks normal samples.

Parameters
  • inputs – The inputs of shape [num_inputs, …].

  • timeout – A maximum time this primitive should take to produce outputs during this method call, in seconds.

  • iterations – How many internal iterations the primitive should do.

Returns

Return type

The outputs of shape [num_inputs, …] wrapped inside CallResult.

set_params(*, params: tods.detection_algorithm.PyodSoGaal.Params) → None

Set parameters for outlier detection from an instance of class Params.

Returns

None

Parameters

params – An instance of parameters.

set_training_data(*, inputs: d3m.container.pandas.DataFrame) → None

Set training data for outlier detection. The inputs argument is a container DataFrame.

Returns

None

Parameters

inputs – The inputs.

tods.detection_algorithm.PyodVAE

class tods.detection_algorithm.PyodVAE.VariationalAutoEncoderPrimitive(*args, **kwds)

Bases: tods.detection_algorithm.UODBasePrimitive.UnsupervisedOutlierDetectorBase

An autoencoder (AE) is a type of neural network that learns useful data representations in an unsupervised manner. Similar to PCA, an AE can be used to detect outlying objects in the data by calculating the reconstruction errors. See :cite:`aggarwal2015outlier` Chapter 3 for details.
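The idea of scoring samples by reconstruction error can be sketched without a neural network. The example below swaps the trained autoencoder for a linear bottleneck obtained from an SVD (essentially PCA), which is the analogy the description draws; the function name reconstruction_scores and the encoding_dim argument are assumptions for illustration, not the Keras model used by the primitive.

```python
import numpy as np

def reconstruction_scores(X, encoding_dim=1):
    """Outlier score = distance between a sample and its low-rank reconstruction."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    Xc = X - mu
    # Linear stand-in for encoder/decoder: the top principal directions.
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    components = vt[:encoding_dim]
    # Encode (project onto the bottleneck), then decode back to input space.
    X_hat = Xc @ components.T @ components + mu
    return np.linalg.norm(X - X_hat, axis=1)

# Ten points on the line y = x plus one point that breaks the pattern.
line = np.column_stack([np.arange(10.0), np.arange(10.0)])
X = np.vstack([line, [[5.0, 0.0]]])
scores = reconstruction_scores(X, encoding_dim=1)
print(int(np.argmax(scores)))  # index of the worst-reconstructed sample
```

Samples that follow the dominant structure are reconstructed well and receive low scores, while the sample that does not fit the learned representation has the largest error.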

encoding_dim_

The number of neurons in the encoding layer.

Type

int

compression_rate_

The ratio between the number of original features and the number of neurons in the encoding layer.

Type

float

model_

The underlying AutoEncoder in Keras.

Type

Keras Object

history_

The AutoEncoder training history.

Type

Keras Object

decision_scores_

The outlier scores of the training data. The higher, the more abnormal. Outliers tend to have higher scores. This value is available once the detector is fitted.

Type

numpy array of shape (n_samples,)

threshold_

The threshold is based on contamination. It is the score cutoff that marks the n_samples * contamination most abnormal samples in decision_scores_ as outliers, and it is used to generate the binary outlier labels.

Type

float

labels_

The binary labels of the training data. 0 stands for inliers and 1 for outliers/anomalies. It is generated by applying threshold_ on decision_scores_.

Type

int, either 0 or 1

Parameters
  • hidden_neurons (list, optional (default=[4, 2, 4])) – The number of neurons in each hidden layer.

  • hidden_activation (str, optional (default=‘relu’)) – Activation function to use for hidden layers. All hidden layers are forced to use the same type of activation. See https://keras.io/activations/

  • output_activation (str, optional (default=‘sigmoid’)) – Activation function to use for output layer. See https://keras.io/activations/

  • loss (str or obj, optional (default=keras.losses.mean_squared_error)) – String (name of objective function) or objective function. See https://keras.io/losses/

  • optimizer (str, optional (default=‘adam’)) – String (name of optimizer) or optimizer instance. See https://keras.io/optimizers/

  • epochs (int, optional (default=100)) – Number of epochs to train the model.

  • batch_size (int, optional (default=32)) – Number of samples per gradient update.

  • dropout_rate (float in (0., 1), optional (default=0.2)) – The dropout to be used across all layers.

  • l2_regularizer (float in (0., 1), optional (default=0.1)) – The regularization strength of activity_regularizer applied on each layer. By default, l2 regularizer is used. See https://keras.io/regularizers/

  • validation_size (float in (0., 1), optional (default=0.1)) – The percentage of data to be used for validation.

  • preprocessing (bool, optional (default=True)) – If True, apply standardization on the data.

  • verbose (int, optional (default=1)) – Verbosity mode: 0 = silent, 1 = progress bar, 2 = one line per epoch. For verbosity >= 1, model summary may be printed.

  • random_state (int, RandomState instance or None, optional (default=None)) – If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.

  • contamination (float in (0., 0.5), optional (default=0.1)) – The amount of contamination of the data set, i.e. the proportion of outliers in the data set. When fitting this is used to define the threshold on the decision function.

fit(*, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[None]

Fit the model with the training data, a container DataFrame holding the time series to fit.

Returns

None

Parameters
  • timeout – A maximum time this primitive should be fitting during this method call, in seconds.

  • iterations – How many internal iterations the primitive should do.

Returns

Return type

A CallResult with None value.

get_params() → tods.detection_algorithm.PyodVAE.Params

Return parameters. This method takes no arguments.

Returns

class Params

Returns

Return type

An instance of parameters.

produce(*, inputs: d3m.container.pandas.DataFrame, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[d3m.container.pandas.DataFrame]

Process the testing data. The inputs argument is a container DataFrame holding the time series on which to run outlier detection.

Returns

Container DataFrame in which 1 marks outliers and 0 marks normal samples.

Parameters
  • inputs – The inputs of shape [num_inputs, …].

  • timeout – A maximum time this primitive should take to produce outputs during this method call, in seconds.

  • iterations – How many internal iterations the primitive should do.

Returns

Return type

The outputs of shape [num_inputs, …] wrapped inside CallResult.

set_params(*, params: tods.detection_algorithm.PyodVAE.Params) → None

Set parameters for outlier detection from an instance of class Params.

Returns

None

Parameters

params – An instance of parameters.

set_training_data(*, inputs: d3m.container.pandas.DataFrame) → None

Set training data for outlier detection. The inputs argument is a container DataFrame.

Returns

None

Parameters

inputs – The inputs.

tods.detection_algorithm.Telemanom

tods.detection_algorithm.UODBasePrimitive

class tods.detection_algorithm.UODBasePrimitive.UnsupervisedOutlierDetectorBase(*args, **kwds)

Bases: tods.common.TODSBasePrimitives.TODSUnsupervisedLearnerPrimitiveBase

A base class for primitives which have to be fitted before they can start producing (useful) outputs from inputs, but they are fitted only on input data.

clf_.decision_scores_

The outlier scores of the training data. The higher, the more abnormal. Outliers tend to have higher scores. This value is available once the detector is fitted.

Type

numpy array of shape (n_samples,)

clf_.threshold_

The score cutoff. Samples whose decision_scores_ are greater than threshold_ are outliers, and samples whose decision_scores_ are less than threshold_ are inliers.

Type

float within (0, 1)

clf_.labels_

The binary labels of the training data. 0 stands for inliers and 1 for outliers/anomalies. It is generated by applying threshold_ on decision_scores_.

Type

int, either 0 or 1

left_inds_

One of the mappings from decision_score to data. For point outlier detection, left_inds_ exactly equals the index of each data point. For collective outlier detection, left_inds_ equals the start index of each subsequence.

Type

ndarray

right_inds_

One of the mappings from decision_score to data. For point outlier detection, right_inds_ exactly equals the index of each data point plus 1. For collective outlier detection, right_inds_ equals the ending index of each subsequence.

Type

ndarray
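The two index mappings described above can be pictured with a short sketch. It assumes that subsequences are fixed-length sliding windows, and the names window_indices, window_size, and step_size are chosen for illustration rather than taken from this class. Each score then refers to the half-open range of rows from the start index up to, but not including, the end index.

```python
import numpy as np

def window_indices(n_samples, window_size=1, step_size=1):
    """Start and end row indices covered by each outlier score."""
    # Start index of every window that fits entirely inside the series.
    left = np.arange(0, n_samples - window_size + 1, step_size)
    # End index (exclusive) of the same windows.
    right = left + window_size
    return left, right

# Point outlier detection: one score per row, end index is start index plus 1.
point_left, point_right = window_indices(5)

# Collective outlier detection: one score per subsequence of 4 rows, moved by 2.
coll_left, coll_right = window_indices(10, window_size=4, step_size=2)
print(point_left, point_right, coll_left, coll_right)
```

Keeping both arrays makes it possible to map a score back to the rows it covers regardless of whether the detector works on single points or on subsequences.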

Parameters

contamination (float in (0., 0.5), optional (default=0.1)) – The amount of contamination of the data set, i.e. the proportion of outliers in the data set. When fitting this is used to define the threshold on the decision function.

get_params() → tods.detection_algorithm.UODBasePrimitive.Params_ODBase

Return parameters. This method takes no arguments.

Returns

class Params_ODBase

Returns

Return type

An instance of parameters.

set_params(*, params: tods.detection_algorithm.UODBasePrimitive.Params_ODBase) → None

Set parameters for outlier detection from an instance of class Params_ODBase.

Returns

None

Parameters

params – An instance of parameters.

abstract set_training_data(*, inputs: Inputs) → None

Set training data for outlier detection. The inputs argument is a container DataFrame.

Returns

None

Parameters

inputs – The inputs.