tods.detection_algorithm Module
tods.detection_algorithm.AutoRegODetect
-
class tods.detection_algorithm.AutoRegODetect.AutoRegODetectorPrimitive(*args, **kwds)
Bases: tods.detection_algorithm.UODBasePrimitive.UnsupervisedOutlierDetectorBase
Autoregressive models use linear regression to predict each sample from the samples that precede it; a sample's deviation from its predicted value is then used as its outlier score. This primitive is for multivariate time series, which it handles by combining the per-dimension scores with one of several combination methods. See AutoRegOD for univariate data.
See :cite:`aggarwal2015outlier,zhao2020using` for details.
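The scoring idea can be illustrated on a single series. The following is a minimal sketch, not the primitive's implementation: it assumes an order-1 model (each value is regressed on the one before it) fitted by ordinary least squares, and the function name ar1_outlier_scores is hypothetical.

```python
def ar1_outlier_scores(series):
    """Score each point by its absolute residual under a least-squares AR(1) fit.

    Fits x[t] ~ a * x[t-1] + b and returns |x[t] - (a * x[t-1] + b)| for
    t >= 1. The first point has no predecessor and gets a score of 0.0.
    """
    prev, curr = series[:-1], series[1:]
    n = len(prev)
    if n == 0:
        return [0.0] * len(series)
    mean_p = sum(prev) / n
    mean_c = sum(curr) / n
    var_p = sum((p - mean_p) ** 2 for p in prev)
    cov = sum((p - mean_p) * (c - mean_c) for p, c in zip(prev, curr))
    a = cov / var_p if var_p > 0 else 0.0  # slope of the regression
    b = mean_c - a * mean_p                # intercept of the regression
    return [0.0] + [abs(c - (a * p + b)) for p, c in zip(prev, curr)]


series = [1.0, 2.0, 3.0, 4.0, 5.0, 20.0, 7.0, 8.0, 9.0, 10.0]
scores = ar1_outlier_scores(series)
most_abnormal = max(range(len(scores)), key=scores.__getitem__)
```

A longer regression window would use several preceding values as regressors instead of one; the residual-as-score step stays the same.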
-
decision_scores_
The outlier scores of the training data. The higher, the more abnormal. Outliers tend to have higher scores. This value is available once the detector is fitted.
- Type
numpy array of shape (n_samples,)
-
labels_
The binary labels of the training data. 0 stands for inliers and 1 for outliers/anomalies. It is generated by applying threshold_ on decision_scores_.
- Type
int, either 0 or 1
- Parameters
window_size (int) – The moving window size.
step_size (int, optional (default=1)) – The displacement of the moving window.
contamination (float in (0., 0.5), optional (default=0.1)) – The amount of contamination of the data set, i.e. the proportion of outliers in the data set. It is used when fitting to define the threshold on the decision function.
method (str, optional (default='average')) – Combination method: {'average', 'maximization', 'median'}. Pass in weights of detector for weighted version.
weights (numpy array of shape (1, n_dimensions)) – Score weight by dimensions (default=[1,1,…,1]).
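As an illustration of the method and weights parameters, the sketch below combines per-dimension scores into one score per sample. It is an assumption about how such a combination can work, not the primitive's own code; in particular, applying the weights by multiplying each dimension's scores is a choice made for this sketch, and combine_scores is a hypothetical name.

```python
from statistics import median


def combine_scores(per_dim_scores, method="average", weights=None):
    """Combine per-dimension outlier scores into one score per sample.

    per_dim_scores[d][i] is the score of sample i in dimension d.
    """
    n_dims = len(per_dim_scores)
    if weights is None:
        weights = [1.0] * n_dims  # mirrors the documented default [1, 1, ..., 1]
    weighted = [[w * s for s in dim] for w, dim in zip(weights, per_dim_scores)]
    per_sample = list(zip(*weighted))  # transpose: one tuple of scores per sample
    if method == "average":
        total = sum(weights)
        return [sum(row) / total for row in per_sample]
    if method == "maximization":
        return [max(row) for row in per_sample]
    if method == "median":
        return [median(row) for row in per_sample]
    raise ValueError(f"unknown combination method: {method!r}")


# Three dimensions, three samples; the middle sample scores high everywhere.
dim_scores = [[0.1, 0.9, 0.2], [0.3, 0.5, 0.4], [0.2, 1.0, 0.0]]
averaged = combine_scores(dim_scores, method="average")
maximized = combine_scores(dim_scores, method="maximization")
```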
-
fit
(*, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[None]
Fit the model with the training data, a container DataFrame holding the time series to fit.
- Parameters
timeout – The maximum time, in seconds, that this primitive should spend fitting during this method call.
iterations – How many internal iterations the primitive should do.
- Returns
A CallResult with None value.
-
get_params
() → tods.detection_algorithm.AutoRegODetect.Params
Return the parameters.
- Returns
An instance of the Params class.
-
produce
(*, inputs: d3m.container.pandas.DataFrame, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[d3m.container.pandas.DataFrame]
Process the testing data.
- Parameters
inputs – Container DataFrame of shape [num_inputs, …]. The time series data on which to run outlier detection.
timeout – The maximum time, in seconds, that this primitive should take to produce outputs during this method call.
iterations – How many internal iterations the primitive should do.
- Returns
Container DataFrame of shape [num_inputs, …] wrapped inside CallResult. 1 marks outliers, 0 marks normal points.
-
produce_score
(*, inputs: d3m.container.pandas.DataFrame, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[d3m.container.pandas.DataFrame]
Process the testing data.
- Parameters
inputs – Container DataFrame. The time series data on which to run outlier detection.
- Returns
Container DataFrame holding the outlier score of the input DataFrame.
-
set_params
(*, params: tods.detection_algorithm.AutoRegODetect.Params) → None
Set the parameters for outlier detection.
- Parameters
params – An instance of the Params class.
- Returns
None
-
set_training_data
(*, inputs: d3m.container.pandas.DataFrame) → None
Set the training data for outlier detection.
- Parameters
inputs – Container DataFrame. The inputs.
- Returns
None
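The methods above are normally called in a fixed order. The helper below is a sketch of that order for a primitive that has already been constructed; how to construct one (for example, its hyperparameters) is not covered in this section, so the helper takes the instance as an argument. The name fit_and_detect is hypothetical; the keyword-only calls and the value attribute of CallResult follow the signatures listed above.

```python
def fit_and_detect(primitive, train_df, test_df):
    """Drive an already-constructed primitive through the documented calls."""
    # 1. Hand over the training data, then fit on it.
    primitive.set_training_data(inputs=train_df)
    primitive.fit()
    # 2. produce returns a CallResult; the DataFrame sits on its `value`.
    labels = primitive.produce(inputs=test_df).value
    # 3. produce_score returns the raw outlier scores in the same wrapper.
    scores = primitive.produce_score(inputs=test_df).value
    return labels, scores
```

Primitives in this module that do not list produce_score would need the third step removed.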
-
tods.detection_algorithm.DeepLog
-
class tods.detection_algorithm.DeepLog.DeepLogPrimitive(*args, **kwds)
Bases: tods.detection_algorithm.UODBasePrimitive.UnsupervisedOutlierDetectorBase
A primitive that uses DeepLog for outlier detection.
-
clf_.decision_scores_
The outlier scores of the training data. The higher, the more abnormal. Outliers tend to have higher scores. This value is available once the detector is fitted.
- Type
numpy array of shape (n_samples,)
-
clf_.threshold_
The decision boundary. A sample is an outlier if its decision_scores_ value is greater than threshold_, and an inlier if it is less than threshold_.
- Type
float within (0, 1)
-
clf_.labels_
The binary labels of the training data. 0 stands for inliers and 1 for outliers/anomalies. It is generated by applying threshold_ on decision_scores_.
- Type
int, either 0 or 1
-
left_inds_
One of the mappings from decision_score to data. For point outlier detection, left_inds_ exactly equals the index of each data point. For collective outlier detection, left_inds_ equals the start index of each subsequence.
- Type
ndarray
-
right_inds_
One of the mappings from decision_score to data. For point outlier detection, right_inds_ exactly equals the index of each data point plus 1. For collective outlier detection, right_inds_ equals the ending index of each subsequence.
- Type
ndarray
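The two index mappings described above (start index and ending index per score) can be pictured with a short sketch. It assumes that the ending index is exclusive, which is consistent with the point case where it equals the index plus 1, and that subsequences are taken with a fixed window and step. The function name window_index_arrays is hypothetical.

```python
def window_index_arrays(n_samples, window_size=1, step_size=1):
    """Return (left_inds, right_inds) for windows slid over n_samples points.

    With window_size=1 this is the point-outlier case: left is the point's
    index and right is that index plus 1. Otherwise left is the start of
    each subsequence and right is its (exclusive) end.
    """
    left = list(range(0, n_samples - window_size + 1, step_size))
    right = [start + window_size for start in left]
    return left, right


point_left, point_right = window_index_arrays(5)
coll_left, coll_right = window_index_arrays(10, window_size=4, step_size=3)
```

Score i then refers to the data slice from left[i] up to, but not including, right[i].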
-
fit
(*, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[None]
Fit the model with the training data, a container DataFrame holding the time series to fit.
- Parameters
timeout – The maximum time, in seconds, that this primitive should spend fitting during this method call.
iterations – How many internal iterations the primitive should do.
- Returns
A CallResult with None value.
-
get_params
() → tods.detection_algorithm.DeepLog.Params
Return the parameters.
- Returns
An instance of the Params class.
-
produce
(*, inputs: d3m.container.pandas.DataFrame, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[d3m.container.pandas.DataFrame]
Process the testing data.
- Parameters
inputs – Container DataFrame of shape [num_inputs, …]. The time series data on which to run outlier detection.
timeout – The maximum time, in seconds, that this primitive should take to produce outputs during this method call.
iterations – How many internal iterations the primitive should do.
- Returns
Container DataFrame of shape [num_inputs, …] wrapped inside CallResult. 1 marks outliers, 0 marks normal points.
-
set_params
(*, params: tods.detection_algorithm.DeepLog.Params) → None
Set the parameters for outlier detection.
- Parameters
params – An instance of the Params class.
- Returns
None
-
set_training_data
(*, inputs: d3m.container.pandas.DataFrame) → None
Set the training data for outlier detection.
- Parameters
inputs – Container DataFrame. The inputs.
- Returns
None
-
tods.detection_algorithm.KDiscordODetect
-
class tods.detection_algorithm.KDiscordODetect.KDiscordODetectorPrimitive(*args, **kwds)
Bases: tods.detection_algorithm.UODBasePrimitive.UnsupervisedOutlierDetectorBase
KDiscord first splits the multivariate time series into subsequences (matrices), and then applies kNN outlier detection based on PyOD. For an observation, its distance to its kth nearest neighbor can be viewed as the outlying score, which can in turn be viewed as a way to measure density. See :cite:`ramaswamy2000efficient,angiulli2002fast` for details.
See :cite:`aggarwal2015outlier,zhao2020using` for details.
-
decision_scores_
The outlier scores of the training data. The higher, the more abnormal. Outliers tend to have higher scores. This value is available once the detector is fitted.
- Type
numpy array of shape (n_samples,)
-
threshold_
The threshold is based on contamination. It is the n_samples * contamination most abnormal samples in decision_scores_. The threshold is calculated for generating binary outlier labels.
- Type
float
-
labels_
The binary labels of the training data. 0 stands for inliers and 1 for outliers/anomalies. It is generated by applying threshold_ on decision_scores_.
- Type
int, either 0 or 1
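The relationship between contamination, threshold_ and labels_ can be sketched as follows. This is a simplified rank-based rule written for illustration; the library's exact rule (for example, how it interpolates between scores) may differ, and the function name threshold_and_labels is hypothetical.

```python
def threshold_and_labels(decision_scores, contamination=0.1):
    """Pick a cut-off so roughly n_samples * contamination scores exceed it."""
    if not decision_scores:
        raise ValueError("decision_scores must not be empty")
    n_outliers = int(len(decision_scores) * contamination)
    ranked = sorted(decision_scores, reverse=True)
    # The threshold is the largest score that is not among the top n_outliers.
    threshold = ranked[min(n_outliers, len(ranked) - 1)]
    labels = [1 if score > threshold else 0 for score in decision_scores]
    return threshold, labels


example_scores = [0.1, 0.2, 0.15, 0.9, 0.12, 0.3, 0.11, 0.13, 0.14, 0.16]
threshold, labels = threshold_and_labels(example_scores, contamination=0.1)
```

With ten scores and a contamination of 0.1, exactly one sample (the one scoring 0.9) ends up labelled 1.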
- Parameters
window_size (int) – The moving window size.
step_size (int, optional (default=1)) – The displacement of the moving window.
contamination (float in (0., 0.5), optional (default=0.1)) – The amount of contamination of the data set, i.e. the proportion of outliers in the data set. Used when fitting to define the threshold on the decision function.
n_neighbors (int, optional (default=5)) – Number of neighbors to use by default for k neighbors queries.
method (str, optional (default='largest')) – {'largest', 'mean', 'median'}
'largest': use the distance to the kth neighbor as the outlier score
'mean': use the average of the distances to all k neighbors as the outlier score
'median': use the median of the distances to the k neighbors as the outlier score
radius (float, optional (default=1.0)) – Range of parameter space to use by default for radius_neighbors queries.
algorithm ({'auto', 'ball_tree', 'kd_tree', 'brute'}, optional) – Algorithm used to compute the nearest neighbors:
'ball_tree' will use BallTree
'kd_tree' will use KDTree
'brute' will use a brute-force search.
'auto' will attempt to decide the most appropriate algorithm based on the values passed to the fit() method.
Note: fitting on sparse input will override the setting of this parameter, using brute force.
Deprecated since version 0.7.4: algorithm is deprecated in PyOD 0.7.4 and will not be possible in 0.7.6. It has to use BallTree for consistency.
leaf_size (int, optional (default=30)) – Leaf size passed to BallTree. This can affect the speed of the construction and query, as well as the memory required to store the tree. The optimal value depends on the nature of the problem.
metric (string or callable, default 'minkowski') – Metric to use for distance computation. Any metric from scikit-learn or scipy.spatial.distance can be used.
If metric is a callable function, it is called on each pair of instances (rows) and the resulting value recorded. The callable should take two arrays as input and return one value indicating the distance between them. This works for Scipy's metrics, but is less efficient than passing the metric name as a string.
Distance matrices are not supported.
Valid values for metric are:
from scikit-learn: ['cityblock', 'cosine', 'euclidean', 'l1', 'l2', 'manhattan']
from scipy.spatial.distance: ['braycurtis', 'canberra', 'chebyshev', 'correlation', 'dice', 'hamming', 'jaccard', 'kulsinski', 'mahalanobis', 'matching', 'minkowski', 'rogerstanimoto', 'russellrao', 'seuclidean', 'sokalmichener', 'sokalsneath', 'sqeuclidean', 'yule']
See the documentation for scipy.spatial.distance for details on these metrics.
p (integer, optional (default=2)) – Parameter for the Minkowski metric from sklearn.metrics.pairwise.pairwise_distances. When p = 1, this is equivalent to using manhattan_distance (l1), and euclidean_distance (l2) for p = 2. For arbitrary p, minkowski_distance (l_p) is used. See http://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.pairwise_distances
metric_params (dict, optional (default=None)) – Additional keyword arguments for the metric function.
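To make the n_neighbors and method parameters concrete, here is a brute-force sketch of kNN-based scoring on plain points. It assumes Euclidean distance (the Minkowski metric with p = 2) and ignores the tree-based search options; knn_outlier_scores is a hypothetical name, not part of the primitive.

```python
import math
from statistics import mean, median


def knn_outlier_scores(points, n_neighbors=5, method="largest"):
    """Score each point from the distances to its n_neighbors nearest points."""
    reducers = {"largest": max, "mean": mean, "median": median}
    reduce_fn = reducers[method]
    scores = []
    for i, p in enumerate(points):
        # Sorted distances from p to every other point (brute force).
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(reduce_fn(dists[:n_neighbors]))
    return scores


# Five points in a tight cluster and one far away.
cloud = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10)]
knn_scores = knn_outlier_scores(cloud, n_neighbors=2, method="largest")
farthest = max(range(len(cloud)), key=knn_scores.__getitem__)
```

In the primitive each "point" would be a flattened subsequence of the time series rather than a 2-D coordinate.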
-
fit
(*, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[None]
Fit the model with the training data, a container DataFrame holding the time series to fit.
- Parameters
timeout – The maximum time, in seconds, that this primitive should spend fitting during this method call.
iterations – How many internal iterations the primitive should do.
- Returns
A CallResult with None value.
-
get_params
() → tods.detection_algorithm.KDiscordODetect.Params
Return the parameters.
- Returns
An instance of the Params class.
-
produce
(*, inputs: d3m.container.pandas.DataFrame, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[d3m.container.pandas.DataFrame]
Process the testing data.
- Parameters
inputs – Container DataFrame of shape [num_inputs, …]. The time series data on which to run outlier detection.
timeout – The maximum time, in seconds, that this primitive should take to produce outputs during this method call.
iterations – How many internal iterations the primitive should do.
- Returns
Container DataFrame of shape [num_inputs, …] wrapped inside CallResult. 1 marks outliers, 0 marks normal points.
-
produce_score
(*, inputs: d3m.container.pandas.DataFrame, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[d3m.container.pandas.DataFrame]
Process the testing data.
- Parameters
inputs – Container DataFrame. The time series data on which to run outlier detection.
- Returns
Container DataFrame holding the outlier score of the input DataFrame.
-
set_params
(*, params: tods.detection_algorithm.KDiscordODetect.Params) → None
Set the parameters for outlier detection.
- Parameters
params – An instance of the Params class.
- Returns
None
-
set_training_data
(*, inputs: d3m.container.pandas.DataFrame) → None
Set the training data for outlier detection.
- Parameters
inputs – Container DataFrame. The inputs.
- Returns
None
-
tods.detection_algorithm.LSTMODetect
-
class tods.detection_algorithm.LSTMODetect.LSTMODetectorPrimitive(*args, **kwds)
Bases: tods.detection_algorithm.UODBasePrimitive.UnsupervisedOutlierDetectorBase
A base class for primitives which have to be fitted before they can start producing (useful) outputs from inputs, but they are fitted only on input data.
-
decision_scores_
The outlier scores of the training data. The higher, the more abnormal. Outliers tend to have higher scores. This value is available once the detector is fitted.
- Type
numpy array of shape (n_samples,)
-
labels_
The binary labels of the training data. 0 stands for inliers and 1 for outliers/anomalies. It is generated by applying threshold_ on decision_scores_.
- Type
int, either 0 or 1
- Parameters
window_size (int) – The moving window size.
step_size (int, optional (default=1)) – The displacement of the moving window.
contamination (float in (0., 0.5), optional (default=0.1)) – The amount of contamination of the data set, i.e. the proportion of outliers in the data set. It is used when fitting to define the threshold on the decision function.
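The window_size and step_size parameters describe a moving window over the series. A minimal sketch of that windowing, under the assumption that only full-length windows are kept, is shown below; sliding_windows is a hypothetical helper name.

```python
def sliding_windows(series, window_size, step_size=1):
    """Split a sequence into subsequences of length window_size.

    Consecutive windows start step_size positions apart; a trailing
    partial window is dropped.
    """
    last_start = len(series) - window_size
    return [series[start:start + window_size]
            for start in range(0, last_start + 1, step_size)]


dense = sliding_windows([1, 2, 3, 4, 5, 6], window_size=3)
strided = sliding_windows([1, 2, 3, 4, 5, 6], window_size=3, step_size=2)
```

A step_size of 1 gives maximally overlapping windows; larger steps trade resolution for fewer windows to score.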
-
fit
(*, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[None]
Fit the model with the training data, a container DataFrame holding the time series to fit.
- Parameters
timeout – The maximum time, in seconds, that this primitive should spend fitting during this method call.
iterations – How many internal iterations the primitive should do.
- Returns
A CallResult with None value.
-
get_params
() → tods.detection_algorithm.LSTMODetect.Params
Return the parameters.
- Returns
An instance of the Params class.
-
produce
(*, inputs: d3m.container.pandas.DataFrame, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[d3m.container.pandas.DataFrame]
Process the testing data.
- Parameters
inputs – Container DataFrame of shape [num_inputs, …]. The time series data on which to run outlier detection.
timeout – The maximum time, in seconds, that this primitive should take to produce outputs during this method call.
iterations – How many internal iterations the primitive should do.
- Returns
Container DataFrame of shape [num_inputs, …] wrapped inside CallResult. 1 marks outliers, 0 marks normal points.
-
set_params
(*, params: tods.detection_algorithm.LSTMODetect.Params) → None
Set the parameters for outlier detection.
- Parameters
params – An instance of the Params class.
- Returns
None
-
set_training_data
(*, inputs: d3m.container.pandas.DataFrame) → None
Set the training data for outlier detection.
- Parameters
inputs – Container DataFrame. The inputs.
- Returns
None
-
tods.detection_algorithm.MatrixProfile
-
class tods.detection_algorithm.MatrixProfile.MP(window_size, step_size, contamination)
Bases: tods.detection_algorithm.core.CollectiveBase.CollectiveBaseDetector
This is the class for the matrix profile function.
-
decision_function
(X)
- Parameters
data – dataframe column
- Returns
nparray
-
fit
(X)
Fit detector. y is ignored in unsupervised methods.
- Parameters
X (numpy array of shape (n_samples, n_features)) – The input samples.
y (Ignored) – Not used, present for API consistency by convention.
- Returns
self – Fitted estimator.
- Return type
object
-
-
class tods.detection_algorithm.MatrixProfile.MatrixProfilePrimitive(*args, **kwds)
Bases: tods.detection_algorithm.UODBasePrimitive.UnsupervisedOutlierDetectorBase
A primitive that computes the matrix profile of a DataFrame using the Stumpy package. Stumpy documentation: https://stumpy.readthedocs.io/en/latest/index.html
- Parameters
- T_A : ndarray
The time series or sequence for which to compute the matrix profile.
- m : int
Window size.
- T_B : ndarray
The time series or sequence that contains your query subsequences of interest. Default is None, which corresponds to a self-join.
- ignore_trivial : bool
Set to True if this is a self-join. Otherwise, for AB-join, set this to False. Default is True.
- out : ndarray
The first column consists of the matrix profile, the second column consists of the matrix profile indices, the third column consists of the left matrix profile indices, and the fourth column consists of the right matrix profile indices.
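For readers unfamiliar with the matrix profile, the brute-force sketch below shows what the first two output columns mean for a self-join. It is not how Stumpy computes them (Stumpy uses much faster algorithms), and it omits the left and right index columns. The exclusion zone of ceil(m / 4) around each window is an assumption of this sketch, and the names znorm and naive_matrix_profile are hypothetical.

```python
import math


def znorm(window):
    """Z-normalize a window; a constant window maps to all zeros."""
    mu = sum(window) / len(window)
    sd = math.sqrt(sum((x - mu) ** 2 for x in window) / len(window))
    return [(x - mu) / sd if sd > 0 else 0.0 for x in window]


def naive_matrix_profile(series, m):
    """Self-join matrix profile computed by brute force.

    profile[i] is the z-normalized Euclidean distance from the window
    starting at i to its nearest non-trivial match; indices[i] is where
    that match starts.
    """
    windows = [znorm(series[i:i + m]) for i in range(len(series) - m + 1)]
    exclusion = math.ceil(m / 4)  # skip near-identical overlapping windows
    profile, indices = [], []
    for i, w in enumerate(windows):
        best_dist, best_j = math.inf, -1
        for j, v in enumerate(windows):
            if abs(i - j) <= exclusion:
                continue
            dist = math.dist(w, v)
            if dist < best_dist:
                best_dist, best_j = dist, j
        profile.append(best_dist)
        indices.append(best_j)
    return profile, indices


# A repeating pattern with a single corrupted value at position 13.
signal = [0.0, 1.0, 0.0, -1.0] * 6
signal[13] = 5.0
profile, indices = naive_matrix_profile(signal, m=4)
discord = max(range(len(profile)), key=profile.__getitem__)
```

Windows that repeat elsewhere in the series get a profile value near zero, while windows covering the corrupted value have no close match and stand out as discords.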
-
clf_.decision_scores_
The outlier scores of the training data. The higher, the more abnormal. Outliers tend to have higher scores. This value is available once the detector is fitted.
- Type
numpy array of shape (n_samples,)
-
clf_.threshold_
The decision boundary. A sample is an outlier if its decision_scores_ value is greater than threshold_, and an inlier if it is less than threshold_.
- Type
float within (0, 1)
-
clf_.labels_
The binary labels of the training data. 0 stands for inliers and 1 for outliers/anomalies. It is generated by applying threshold_ on decision_scores_.
- Type
int, either 0 or 1
-
left_inds_
One of the mappings from decision_score to data. For point outlier detection, left_inds_ exactly equals the index of each data point. For collective outlier detection, left_inds_ equals the start index of each subsequence.
- Type
ndarray
-
right_inds_
One of the mappings from decision_score to data. For point outlier detection, right_inds_ exactly equals the index of each data point plus 1. For collective outlier detection, right_inds_ equals the ending index of each subsequence.
- Type
ndarray
- Parameters
contamination (float in (0., 0.5), optional (default=0.1)) – The amount of contamination of the data set, i.e. the proportion of outliers in the data set. It is used when fitting to define the threshold on the decision function.
-
fit
(*, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[None]
Fit the model with the training data, a container DataFrame holding the time series to fit.
- Parameters
timeout – The maximum time, in seconds, that this primitive should spend fitting during this method call.
iterations – How many internal iterations the primitive should do.
- Returns
A CallResult with None value.
-
get_params
() → tods.detection_algorithm.MatrixProfile.Params
Return the parameters.
- Returns
An instance of the Params class.
-
produce
(*, inputs: d3m.container.pandas.DataFrame, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[d3m.container.pandas.DataFrame]
Process the testing data.
- Parameters
inputs – Container DataFrame of shape [num_inputs, …]. The time series data on which to run outlier detection.
timeout – The maximum time, in seconds, that this primitive should take to produce outputs during this method call.
iterations – How many internal iterations the primitive should do.
- Returns
Container DataFrame of shape [num_inputs, …] wrapped inside CallResult. 1 marks outliers, 0 marks normal points.
-
set_params
(*, params: tods.detection_algorithm.MatrixProfile.Params) → None
Set the parameters for outlier detection.
- Parameters
params – An instance of the Params class.
- Returns
None
-
set_training_data
(*, inputs: d3m.container.pandas.DataFrame) → None
Set the training data for outlier detection.
- Parameters
inputs – Container DataFrame. The inputs.
- Returns
None
tods.detection_algorithm.PCAODetect
-
class tods.detection_algorithm.PCAODetect.PCAODetectorPrimitive(*args, **kwds)
Bases: tods.detection_algorithm.UODBasePrimitive.UnsupervisedOutlierDetectorBase
PCA-based outlier detection for both univariate and multivariate time series data. The time series data is first transformed to tabular format. For univariate data, it will be in the shape of [valid_length, window_size]. For multivariate data with d sequences, it will be in the shape of [valid_length, window_size].
-
decision_scores_
The outlier scores of the training data. The higher, the more abnormal. Outliers tend to have higher scores. This value is available once the detector is fitted.
- Type
numpy array of shape (n_samples,)
-
threshold_
The threshold is based on contamination. It is the n_samples * contamination most abnormal samples in decision_scores_. The threshold is calculated for generating binary outlier labels.
- Type
float
-
labels_
The binary labels of the training data. 0 stands for inliers and 1 for outliers/anomalies. It is generated by applying threshold_ on decision_scores_.
- Type
int, either 0 or 1
- Parameters
window_size (int) – The moving window size.
step_size (int, optional (default=1)) – The displacement of the moving window.
contamination (float in (0., 0.5), optional (default=0.1)) – The amount of contamination of the data set, i.e. the proportion of outliers in the data set. Used when fitting to define the threshold on the decision function.
n_components (int, float, None or string) – Number of components to keep. It should be smaller than the window_size. If n_components is not set, all components are kept: n_components == min(n_samples, n_features).
If n_components == 'mle' and svd_solver == 'full', Minka's MLE is used to guess the dimension.
If 0 < n_components < 1 and svd_solver == 'full', select the number of components such that the amount of variance that needs to be explained is greater than the percentage specified by n_components.
n_components cannot be equal to n_features for svd_solver == 'arpack'.
n_selected_components (int, optional (default=None)) – Number of selected principal components for calculating the outlier scores. It is not necessarily equal to the total number of the principal components. If not set, use all principal components.
whiten (bool, optional (default=False)) – When True (False by default), the components_ vectors are multiplied by the square root of n_samples and then divided by the singular values to ensure uncorrelated outputs with unit component-wise variances.
Whitening will remove some information from the transformed signal (the relative variance scales of the components) but can sometimes improve the predictive accuracy of the downstream estimators by making their data respect some hard-wired assumptions.
svd_solver (string {'auto', 'full', 'arpack', 'randomized'}) –
- auto :
the solver is selected by a default policy based on X.shape and n_components: if the input data is larger than 500x500 and the number of components to extract is lower than 80% of the smallest dimension of the data, then the more efficient 'randomized' method is enabled. Otherwise the exact full SVD is computed and optionally truncated afterwards.
- full :
run exact full SVD calling the standard LAPACK solver via scipy.linalg.svd and select the components by postprocessing
- arpack :
run SVD truncated to n_components calling ARPACK solver via scipy.sparse.linalg.svds. It requires strictly 0 < n_components < X.shape[1]
- randomized :
run randomized SVD by the method of Halko et al.
tol (float >= 0, optional (default=.0)) – Tolerance for singular values computed by svd_solver == 'arpack'.
iterated_power (int >= 0, or 'auto', (default='auto')) – Number of iterations for the power method computed by svd_solver == 'randomized'.
random_state (int, RandomState instance or None, optional (default=None)) – If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random. Used when svd_solver == 'arpack' or 'randomized'.
weighted (bool, optional (default=True)) – If True, the eigenvalues are used in score computation. The eigenvectors with small eigenvalues come with more importance in outlier score calculation.
standardization (bool, optional (default=True)) – If True, perform standardization first to convert data to zero mean and unit variance. See http://scikit-learn.org/stable/auto_examples/preprocessing/plot_scaling_importance.html
-
fit
(*, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[None]
Fit the model with the training data, a container DataFrame holding the time series to fit.
- Parameters
timeout – The maximum time, in seconds, that this primitive should spend fitting during this method call.
iterations – How many internal iterations the primitive should do.
- Returns
A CallResult with None value.
-
get_params
() → tods.detection_algorithm.PCAODetect.Params
Return the parameters.
- Returns
An instance of the Params class.
-
produce
(*, inputs: d3m.container.pandas.DataFrame, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[d3m.container.pandas.DataFrame]
Process the testing data.
- Parameters
inputs – Container DataFrame of shape [num_inputs, …]. The time series data on which to run outlier detection.
timeout – The maximum time, in seconds, that this primitive should take to produce outputs during this method call.
iterations – How many internal iterations the primitive should do.
- Returns
Container DataFrame of shape [num_inputs, …] wrapped inside CallResult. 1 marks outliers, 0 marks normal points.
-
produce_score
(*, inputs: d3m.container.pandas.DataFrame, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[d3m.container.pandas.DataFrame]
Process the testing data.
- Parameters
inputs – Container DataFrame. The time series data on which to run outlier detection.
- Returns
Container DataFrame holding the outlier score of the input DataFrame.
-
set_params
(*, params: tods.detection_algorithm.PCAODetect.Params) → None
Set the parameters for outlier detection.
- Parameters
params – An instance of the Params class.
- Returns
None
-
set_training_data
(*, inputs: d3m.container.pandas.DataFrame) → None
Set the training data for outlier detection.
- Parameters
inputs – Container DataFrame. The inputs.
- Returns
None
-
tods.detection_algorithm.PyodABOD
-
class tods.detection_algorithm.PyodABOD.ABODPrimitive(*args, **kwds)
Bases: tods.detection_algorithm.UODBasePrimitive.UnsupervisedOutlierDetectorBase
ABOD class for Angle-based Outlier Detection. For an observation, the variance of its weighted cosine scores to all neighbors can be viewed as the outlying score. See :cite:`kriegel2008angle` for details.
Two versions of ABOD are supported:
Fast ABOD: uses the k nearest neighbors to approximate.
Original ABOD: considers all training points, with a high time complexity of O(n^3).
-
decision_scores_
The outlier scores of the training data. The higher, the more abnormal. Outliers tend to have higher scores. This value is available once the detector is fitted.
- Type
numpy array of shape (n_samples,)
-
threshold_
The threshold is based on contamination. It is the n_samples * contamination most abnormal samples in decision_scores_. The threshold is calculated for generating binary outlier labels.
- Type
float
-
labels_
The binary labels of the training data. 0 stands for inliers and 1 for outliers/anomalies. It is generated by applying threshold_ on decision_scores_.
- Type
int, either 0 or 1
- Parameters
contamination (float in (0., 0.5), optional (default=0.1)) – The amount of contamination of the data set, i.e. the proportion of outliers in the data set. Used when fitting to define the threshold on the decision function.
n_neighbors (int, optional (default=10)) – Number of neighbors to use by default for k neighbors queries.
method (str, optional (default='fast')) – Valid values for method are:
'fast': fast ABOD. Only consider n_neighbors of training points
'default': original ABOD with all training points, which could be slow
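The "variance of weighted cosine scores" can be sketched for the original (all-pairs) variant as follows. This is an illustrative reading of the description above, not the primitive's code: the weighting divides each dot product by both squared vector lengths, and the variance is negated so that higher scores mean more abnormal, matching the convention used for decision_scores_. Both choices are assumptions of this sketch, and abod_scores is a hypothetical name. It needs at least three distinct points.

```python
from itertools import combinations
from statistics import pvariance


def abod_scores(points):
    """All-pairs angle-based outlier scores; higher means more abnormal."""
    scores = []
    for i, p in enumerate(points):
        others = [q for j, q in enumerate(points) if j != i]
        factors = []
        for a, b in combinations(others, 2):
            pa = [x - y for x, y in zip(a, p)]  # vector from p to a
            pb = [x - y for x, y in zip(b, p)]  # vector from p to b
            dot = sum(x * y for x, y in zip(pa, pb))
            norm_sq = sum(x * x for x in pa) * sum(y * y for y in pb)
            if norm_sq > 0:  # skip duplicates of p, whose angle is undefined
                factors.append(dot / norm_sq)
        # An outlier sees the rest of the data within a narrow range of
        # angles, so its variance is small; negate so higher = more abnormal.
        scores.append(-pvariance(factors))
    return scores


cloud = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10)]
angle_scores = abod_scores(cloud)
most_outlying = max(range(len(cloud)), key=angle_scores.__getitem__)
```

The triple loop over points and pairs is what gives the original variant its O(n^3) cost; the fast variant restricts the pairs to the n_neighbors nearest points.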
-
fit
(*, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[None]
Fit the model with the training data, a container DataFrame holding the time series to fit.
- Parameters
timeout – The maximum time, in seconds, that this primitive should spend fitting during this method call.
iterations – How many internal iterations the primitive should do.
- Returns
A CallResult with None value.
-
get_params
() → tods.detection_algorithm.PyodABOD.Params
Return the parameters.
- Returns
An instance of the Params class.
-
produce
(*, inputs: d3m.container.pandas.DataFrame, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[d3m.container.pandas.DataFrame]
Process the testing data.
- Parameters
inputs – Container DataFrame of shape [num_inputs, …]. The time series data on which to run outlier detection.
timeout – The maximum time, in seconds, that this primitive should take to produce outputs during this method call.
iterations – How many internal iterations the primitive should do.
- Returns
Container DataFrame of shape [num_inputs, …] wrapped inside CallResult. 1 marks outliers, 0 marks normal points.
-
produce_score
(*, inputs: d3m.container.pandas.DataFrame, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[d3m.container.pandas.DataFrame]
Process the testing data.
- Parameters
inputs – Container DataFrame. The time series data on which to run outlier detection.
- Returns
Container DataFrame holding the outlier score of the input DataFrame.
-
set_params
(*, params: tods.detection_algorithm.PyodABOD.Params) → None
Set the parameters for outlier detection.
- Parameters
params – An instance of the Params class.
- Returns
None
-
set_training_data
(*, inputs: d3m.container.pandas.DataFrame) → None
Set the training data for outlier detection.
- Parameters
inputs – Container DataFrame. The inputs.
- Returns
None
tods.detection_algorithm.PyodAE
-
class tods.detection_algorithm.PyodAE.AutoEncoderPrimitive(*args, **kwds)
Bases: tods.detection_algorithm.UODBasePrimitive.UnsupervisedOutlierDetectorBase
Auto Encoder (AE) is a type of neural network for learning useful data representations in an unsupervised manner. Similar to PCA, AE can be used to detect outlying objects in the data by calculating the reconstruction errors. See :cite:`aggarwal2015outlier` Chapter 3 for details.
-
encoding_dim_
The number of neurons in the encoding layer.
- Type
int
-
compression_rate_
The ratio between the original feature and the number of neurons in the encoding layer.
- Type
float
-
model_
The underlying AutoEncoder in Keras.
- Type
Keras Object
-
history_
The AutoEncoder training history.
- Type
Keras Object
-
decision_scores_
The outlier scores of the training data. The higher, the more abnormal. Outliers tend to have higher scores. This value is available once the detector is fitted.
- Type
numpy array of shape (n_samples,)
-
threshold_
The threshold is based on contamination. It is the score cutoff that marks the n_samples * contamination most abnormal samples in decision_scores_, and it is used to generate binary outlier labels.
- Type
float
-
labels_
The binary labels of the training data: 0 stands for inliers and 1 for outliers/anomalies. They are generated by applying threshold_ to decision_scores_.
- Type
int, either 0 or 1
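The relationship between contamination, threshold_ and labels_ described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the primitive's own code: the helper name threshold_and_labels is hypothetical, and the percentile rule is an assumption about how the cutoff is derived.

```python
import numpy as np

def threshold_and_labels(decision_scores, contamination=0.1):
    """Derive a score cutoff and binary labels from a contamination rate."""
    scores = np.asarray(decision_scores, dtype=float)
    # Assumption: the cutoff is the (1 - contamination) quantile of the scores.
    threshold = np.percentile(scores, 100.0 * (1.0 - contamination))
    # Scores strictly above the cutoff are labelled 1 (outlier), the rest 0.
    labels = (scores > threshold).astype(int)
    return threshold, labels

scores = np.array([0.1, 0.2, 0.15, 0.3, 0.25, 0.12, 0.18, 0.22, 0.28, 5.0])
threshold, labels = threshold_and_labels(scores, contamination=0.1)
```

With ten scores and contamination=0.1, only the single largest score ends up above the cutoff.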
- Parameters
hidden_neurons (list, optional (default=[4,2,4])) – The number of neurons in each hidden layer.
hidden_activation (str, optional (default='relu')) – Activation function to use for hidden layers. All hidden layers are forced to use the same type of activation. See https://keras.io/activations/
output_activation (str, optional (default='sigmoid')) – Activation function to use for the output layer. See https://keras.io/activations/
loss (str or obj, optional (default=keras.losses.mean_squared_error)) – String (name of objective function) or objective function. See https://keras.io/losses/
optimizer (str, optional (default='adam')) – String (name of optimizer) or optimizer instance. See https://keras.io/optimizers/
epochs (int, optional (default=100)) – Number of epochs to train the model.
batch_size (int, optional (default=32)) – Number of samples per gradient update.
dropout_rate (float in (0., 1), optional (default=0.2)) – The dropout rate to be used across all layers.
l2_regularizer (float in (0., 1), optional (default=0.1)) – The regularization strength of the activity_regularizer applied on each layer. By default, an l2 regularizer is used. See https://keras.io/regularizers/
validation_size (float in (0., 1), optional (default=0.1)) – The fraction of data to be used for validation.
preprocessing (bool, optional (default=True)) – If True, apply standardization to the data.
verbose (int, optional (default=1)) – Verbosity mode: 0 = silent, 1 = progress bar, 2 = one line per epoch. For verbosity >= 1, the model summary may be printed.
random_state (int, RandomState instance or None, optional (default=None)) – If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.
contamination (float in (0., 0.5), optional (default=0.1)) – The amount of contamination of the data set, i.e. the proportion of outliers in the data set. When fitting, this is used to define the threshold on the decision function.
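The idea of scoring samples by reconstruction error can be illustrated without Keras. The sketch below swaps the trained network for a linear encoder and decoder built from the top principal directions (the PCA analogy mentioned in the class description), so it is a stand-in for the technique rather than the primitive's model. The function name reconstruction_scores and its encoding_dim argument are hypothetical.

```python
import numpy as np

def reconstruction_scores(X, encoding_dim=1):
    """Score each row by its reconstruction error under a linear bottleneck."""
    X = np.asarray(X, dtype=float)
    # Standardize the features, mirroring preprocessing=True.
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0
    Z = (X - mean) / std
    # Linear "encoder/decoder": project onto the top principal directions.
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    components = vt[:encoding_dim]
    encoded = Z @ components.T
    decoded = encoded @ components
    # A larger reconstruction error means a more abnormal sample.
    return np.linalg.norm(Z - decoded, axis=1)

rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([t, 2 * t + rng.normal(scale=0.05, size=200)])
X = np.vstack([X, [[2.0, -4.0]]])  # breaks the y = 2x relationship
scores = reconstruction_scores(X, encoding_dim=1)
```

The appended point violates the correlation that the one-dimensional bottleneck captures, so it reconstructs poorly and receives the highest score.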
-
fit
(*, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[None]
Fit the model with the training data.
- Parameters
timeout – The maximum time, in seconds, that this primitive should spend fitting during this method call.
iterations – How many internal iterations the primitive should do.
- Returns
A CallResult with a None value.
-
get_params
() → tods.detection_algorithm.PyodAE.Params
Return the parameters.
- Returns
An instance of the Params class.
-
produce
(*, inputs: d3m.container.pandas.DataFrame, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[d3m.container.pandas.DataFrame]
Process the testing data.
- Parameters
inputs – Container DataFrame of shape [num_inputs, …]. Time series data on which to run outlier detection.
timeout – The maximum time, in seconds, that this primitive should take to produce outputs during this method call.
iterations – How many internal iterations the primitive should do.
- Returns
Container DataFrame of shape [num_inputs, …] wrapped inside CallResult, where 1 marks outliers and 0 marks normal points.
-
set_params
(*, params: tods.detection_algorithm.PyodAE.Params) → None
Set the parameters for outlier detection.
- Parameters
params – An instance of the Params class.
- Returns
None
-
set_training_data
(*, inputs: d3m.container.pandas.DataFrame) → None
Set the training data for outlier detection.
- Parameters
inputs – Container DataFrame holding the training data.
- Returns
None
-
tods.detection_algorithm.PyodCBLOF¶
-
class
tods.detection_algorithm.PyodCBLOF.
CBLOFPrimitive
(*args, **kwds) Bases:
tods.detection_algorithm.UODBasePrimitive.UnsupervisedOutlierDetectorBase
The CBLOF operator calculates the outlier score based on the cluster-based local outlier factor. CBLOF takes as input the data set and the cluster model generated by a clustering algorithm. It classifies the clusters into small clusters and large clusters using the parameters alpha and beta. The anomaly score is then calculated based on the size of the cluster the point belongs to as well as the distance to the nearest large cluster. The original publication weights the outlier factor by the sizes of the clusters. Since this might lead to unexpected behavior (outliers close to small clusters are not found), weighting is disabled by default, and outlier scores are computed solely from the distance to the closest large cluster center. By default, kMeans is used as the clustering algorithm instead of the Squeezer algorithm mentioned in the original paper. See :cite:`he2003discovering` for details.
-
decision_scores_
The outlier scores of the training data. The higher, the more abnormal. Outliers tend to have higher scores. This value is available once the detector is fitted.
- Type
numpy array of shape (n_samples,)
-
threshold_
The threshold is based on contamination. It is the score cutoff that marks the n_samples * contamination most abnormal samples in decision_scores_, and it is used to generate binary outlier labels.
- Type
float
-
labels_
The binary labels of the training data: 0 stands for inliers and 1 for outliers/anomalies. They are generated by applying threshold_ to decision_scores_.
- Type
int, either 0 or 1
- Parameters
n_clusters (int, optional (default=8)) – The number of clusters to form as well as the number of centroids to generate.
contamination (float in (0., 0.5), optional (default=0.1)) – The amount of contamination of the data set, i.e. the proportion of outliers in the data set. Used when fitting to define the threshold on the decision function.
clustering_estimator (Estimator, optional (default=None)) – The base clustering algorithm for performing data clustering. A valid clustering algorithm should be passed in. The estimator should have the standard sklearn APIs fit() and predict(), and the attributes labels_ and cluster_centers_. If cluster_centers_ is not among the attributes once the model is fit, it is calculated as the mean of the samples in a cluster. If not set, CBLOF uses KMeans for scalability. See https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html
alpha (float in (0.5, 1), optional (default=0.9)) – Coefficient for deciding small and large clusters. The ratio of the number of samples in large clusters to the number of samples in small clusters.
beta (int or float in (1,), optional (default=5)) – Coefficient for deciding small and large clusters. For a list of clusters sorted by size |C1|, |C2|, …, |Cn|, beta = |Ck|/|Ck-1|.
use_weights (bool, optional (default=False)) – If set to True, the sizes of clusters are used as weights in the outlier score calculation.
check_estimator (bool, optional (default=False)) – If set to True, check whether the base estimator is consistent with the sklearn standard. Warning: check_estimator may throw errors with scikit-learn 0.20 and above.
random_state (int, RandomState or None, optional (default=None)) – If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.
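How alpha and beta separate large from small clusters, and how the unweighted score follows, can be sketched as below. This is one reading of the rule under stated assumptions: clusters are sorted by size, and the boundary is the first position where either the leading clusters cover at least alpha of the samples or the size drops by a factor of at least beta. The function cblof_scores is hypothetical and takes precomputed cluster labels and centers instead of running a clustering estimator.

```python
import numpy as np

def cblof_scores(X, labels, centers, alpha=0.9, beta=5):
    """Unweighted CBLOF-style score: distance to the closest large cluster center."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    centers = np.asarray(centers, dtype=float)
    n_samples = X.shape[0]
    sizes = np.bincount(labels, minlength=len(centers))
    order = np.argsort(-sizes)              # cluster ids, largest first
    sorted_sizes = sizes[order]

    # Assumption: the first index where either rule holds splits large from small.
    boundary = len(order)                   # default: every cluster counts as large
    for i in range(1, len(order)):
        covers_alpha = sorted_sizes[:i].sum() >= alpha * n_samples
        drops_by_beta = sorted_sizes[i - 1] >= beta * sorted_sizes[i]
        if covers_alpha or drops_by_beta:
            boundary = i
            break
    large = order[:boundary]

    # Distance from every sample to each large cluster center; keep the smallest.
    dists = np.linalg.norm(X[:, None, :] - centers[large][None, :, :], axis=2)
    return dists.min(axis=1), large

rng = np.random.default_rng(0)
a = rng.normal(loc=0.0, scale=0.3, size=(10, 2))
b = rng.normal(loc=5.0, scale=0.3, size=(10, 2))
c = np.array([[20.0, 20.0]])                # a one-point cluster far away
X = np.vstack([a, b, c])
labels = np.array([0] * 10 + [1] * 10 + [2])
centers = np.array([X[labels == k].mean(axis=0) for k in range(3)])
scores, large = cblof_scores(X, labels, centers)
```

Here the two ten-point clusters together cover more than 90% of the samples, so they are treated as large and the isolated point is scored by its distance to the nearer of them.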
-
fit
(*, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[None]
Fit the model with the training data.
- Parameters
timeout – The maximum time, in seconds, that this primitive should spend fitting during this method call.
iterations – How many internal iterations the primitive should do.
- Returns
A CallResult with a None value.
-
get_params
() → tods.detection_algorithm.PyodCBLOF.Params
Return the parameters.
- Returns
An instance of the Params class.
-
produce
(*, inputs: d3m.container.pandas.DataFrame, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[d3m.container.pandas.DataFrame]
Process the testing data.
- Parameters
inputs – Container DataFrame of shape [num_inputs, …]. Time series data on which to run outlier detection.
timeout – The maximum time, in seconds, that this primitive should take to produce outputs during this method call.
iterations – How many internal iterations the primitive should do.
- Returns
Container DataFrame of shape [num_inputs, …] wrapped inside CallResult, where 1 marks outliers and 0 marks normal points.
-
set_params
(*, params: tods.detection_algorithm.PyodCBLOF.Params) → None
Set the parameters for outlier detection.
- Parameters
params – An instance of the Params class.
- Returns
None
-
set_training_data
(*, inputs: d3m.container.pandas.DataFrame) → None
Set the training data for outlier detection.
- Parameters
inputs – Container DataFrame holding the training data.
- Returns
None
-
tods.detection_algorithm.PyodCOF¶
-
class
tods.detection_algorithm.PyodCOF.
COFPrimitive
(*args, **kwds) Bases:
tods.detection_algorithm.UODBasePrimitive.UnsupervisedOutlierDetectorBase
Connectivity-Based Outlier Factor (COF). COF uses the ratio of the average chaining distance of a data point to the mean of the average chaining distances of its k nearest neighbors as the outlier score for observations. See :cite:`tang2002enhancing` for details.
-
decision_scores_
The outlier scores of the training data. The higher, the more abnormal. Outliers tend to have higher scores. This value is available once the detector is fitted.
- Type
numpy array of shape (n_samples,)
-
threshold_
The threshold is based on contamination. It is the score cutoff that marks the n_samples * contamination most abnormal samples in decision_scores_, and it is used to generate binary outlier labels.
- Type
float
-
labels_
The binary labels of the training data: 0 stands for inliers and 1 for outliers/anomalies. They are generated by applying threshold_ to decision_scores_.
- Type
int, either 0 or 1
-
n_neighbors_
Number of neighbors to use by default for k neighbors queries.
- Type
int
- Parameters
contamination (float in (0., 0.5), optional (default=0.1)) – The amount of contamination of the data set, i.e. the proportion of outliers in the data set. Used when fitting to define the threshold on the decision function.
n_neighbors (int, optional (default=20)) – Number of neighbors to use by default for k neighbors queries. Note that n_neighbors should be less than the number of samples. If n_neighbors is larger than the number of samples provided, all samples will be used.
-
fit
(*, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[None]
Fit the model with the training data.
- Parameters
timeout – The maximum time, in seconds, that this primitive should spend fitting during this method call.
iterations – How many internal iterations the primitive should do.
- Returns
A CallResult with a None value.
-
get_params
() → tods.detection_algorithm.PyodCOF.Params
Return the parameters.
- Returns
An instance of the Params class.
-
produce
(*, inputs: d3m.container.pandas.DataFrame, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[d3m.container.pandas.DataFrame]
Process the testing data.
- Parameters
inputs – Container DataFrame of shape [num_inputs, …]. Time series data on which to run outlier detection.
timeout – The maximum time, in seconds, that this primitive should take to produce outputs during this method call.
iterations – How many internal iterations the primitive should do.
- Returns
Container DataFrame of shape [num_inputs, …] wrapped inside CallResult, where 1 marks outliers and 0 marks normal points.
-
produce_score
(*, inputs: d3m.container.pandas.DataFrame, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[d3m.container.pandas.DataFrame]
Compute outlier scores for the testing data.
- Parameters
inputs – Container DataFrame. Time series data on which to run outlier detection.
- Returns
Container DataFrame holding the outlier scores of the input DataFrame.
-
set_params
(*, params: tods.detection_algorithm.PyodCOF.Params) → None
Set the parameters for outlier detection.
- Parameters
params – An instance of the Params class.
- Returns
None
-
set_training_data
(*, inputs: d3m.container.pandas.DataFrame) → None
Set the training data for outlier detection.
- Parameters
inputs – Container DataFrame holding the training data.
- Returns
None
-
tods.detection_algorithm.PyodHBOS¶
-
class
tods.detection_algorithm.PyodHBOS.
HBOSPrimitive
(*args, **kwds) Bases:
tods.detection_algorithm.UODBasePrimitive.UnsupervisedOutlierDetectorBase
Histogram-based Outlier Detection (HBOS). HBOS is an efficient unsupervised method. It assumes that the features are independent and calculates the degree of outlyingness by building histograms. See :cite:`goldstein2012histogram` for details.
-
bin_edges_
The edges of the bins.
- Type
numpy array of shape (n_bins + 1, n_features)
-
hist_
The density of each histogram.
- Type
numpy array of shape (n_bins, n_features)
-
decision_scores_
The outlier scores of the training data. The higher, the more abnormal. Outliers tend to have higher scores. This value is available once the detector is fitted.
- Type
numpy array of shape (n_samples,)
-
threshold_
The threshold is based on contamination. It is the score cutoff that marks the n_samples * contamination most abnormal samples in decision_scores_, and it is used to generate binary outlier labels.
- Type
float
-
labels_
The binary labels of the training data: 0 stands for inliers and 1 for outliers/anomalies. They are generated by applying threshold_ to decision_scores_.
- Type
int, either 0 or 1
- Parameters
n_bins (int, optional (default=10)) – The number of bins.
alpha (float in (0, 1), optional (default=0.1)) – The regularizer for preventing overflow.
tol (float in (0, 1), optional (default=0.1)) – The parameter that decides the flexibility when dealing with samples falling outside the bins.
contamination (float in (0., 0.5), optional (default=0.1)) – The amount of contamination of the data set, i.e. the proportion of outliers in the data set. Used when fitting to define the threshold on the decision function.
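A per-feature histogram score of the kind described above can be sketched with NumPy alone. The sketch assumes the score is the sum over features of the negative log density, with alpha added to avoid taking the log of zero. It scores only the data the histograms were built on, so the tol handling of out-of-range samples is left out. The function name hbos_scores is hypothetical.

```python
import numpy as np

def hbos_scores(X, n_bins=10, alpha=0.1):
    """Sum of per-feature negative log histogram densities (features treated as independent)."""
    X = np.asarray(X, dtype=float)
    n_samples, n_features = X.shape
    scores = np.zeros(n_samples)
    for j in range(n_features):
        hist, edges = np.histogram(X[:, j], bins=n_bins, density=True)
        # Interior edges map each value to a bin index in [0, n_bins - 1].
        idx = np.clip(np.digitize(X[:, j], edges[1:-1]), 0, n_bins - 1)
        # Low density in a feature contributes a high score.
        scores += -np.log(hist[idx] + alpha)
    return scores

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(200, 2)), [[10.0, 10.0]]])
scores = hbos_scores(X)
```

The appended point sits alone in the last bin of both features, so no other sample can receive a higher score.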
-
fit
(*, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[None]
Fit the model with the training data.
- Parameters
timeout – The maximum time, in seconds, that this primitive should spend fitting during this method call.
iterations – How many internal iterations the primitive should do.
- Returns
A CallResult with a None value.
-
get_params
() → tods.detection_algorithm.PyodHBOS.Params
Return the parameters.
- Returns
An instance of the Params class.
-
produce
(*, inputs: d3m.container.pandas.DataFrame, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[d3m.container.pandas.DataFrame]
Process the testing data.
- Parameters
inputs – Container DataFrame of shape [num_inputs, …]. Time series data on which to run outlier detection.
timeout – The maximum time, in seconds, that this primitive should take to produce outputs during this method call.
iterations – How many internal iterations the primitive should do.
- Returns
Container DataFrame of shape [num_inputs, …] wrapped inside CallResult, where 1 marks outliers and 0 marks normal points.
-
produce_score
(*, inputs: d3m.container.pandas.DataFrame, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[d3m.container.pandas.DataFrame]
Compute outlier scores for the testing data.
- Parameters
inputs – Container DataFrame. Time series data on which to run outlier detection.
- Returns
Container DataFrame holding the outlier scores of the input DataFrame.
-
set_params
(*, params: tods.detection_algorithm.PyodHBOS.Params) → None
Set the parameters for outlier detection.
- Parameters
params – An instance of the Params class.
- Returns
None
-
set_training_data
(*, inputs: d3m.container.pandas.DataFrame) → None
Set the training data for outlier detection.
- Parameters
inputs – Container DataFrame holding the training data.
- Returns
None
-
tods.detection_algorithm.PyodIsolationForest¶
-
class
tods.detection_algorithm.PyodIsolationForest.
IsolationForestPrimitive
(*args, **kwds) Bases:
tods.detection_algorithm.UODBasePrimitive.UnsupervisedOutlierDetectorBase
Wrapper of the PyOD Isolation Forest with more functionality. The IsolationForest 'isolates' observations by randomly selecting a feature and then randomly selecting a split value between the maximum and minimum values of the selected feature. See :cite:`liu2008isolation,liu2012isolation` for details. Since recursive partitioning can be represented by a tree structure, the number of splits required to isolate a sample is equivalent to the path length from the root node to the terminating node. This path length, averaged over a forest of such random trees, is a measure of normality and serves as the decision function. Random partitioning produces noticeably shorter paths for anomalies. Hence, when a forest of random trees collectively produces shorter path lengths for particular samples, those samples are highly likely to be anomalies.
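The path-length argument above can be checked with a small simulation. The sketch below is a simplification and not the wrapped implementation: it skips sub-sampling and path-length normalization, and it follows only the branch that contains the point of interest instead of building whole trees. The function names are hypothetical.

```python
import numpy as np

def isolation_depth(X, index, rng, max_depth=50):
    """Count the random splits needed to separate X[index] from all other rows."""
    subset = X
    target = X[index]
    depth = 0
    while len(subset) > 1 and depth < max_depth:
        feature = rng.integers(subset.shape[1])
        lo, hi = subset[:, feature].min(), subset[:, feature].max()
        depth += 1
        if lo == hi:
            continue  # this feature cannot split the subset; the attempt is spent
        split = rng.uniform(lo, hi)
        # Keep only the side of the split that contains the target point.
        if target[feature] < split:
            subset = subset[subset[:, feature] < split]
        else:
            subset = subset[subset[:, feature] >= split]
    return depth

def mean_isolation_depth(X, index, n_trees=100, seed=0):
    """Average the isolation depth over many independent random partitionings."""
    rng = np.random.default_rng(seed)
    return float(np.mean([isolation_depth(X, index, rng) for _ in range(n_trees)]))

rng = np.random.default_rng(42)
inliers = rng.normal(size=(200, 2))
X = np.vstack([inliers, [[0.0, 0.0]], [[8.0, 8.0]]])
central_depth = mean_isolation_depth(X, index=200)   # point in the dense center
outlier_depth = mean_isolation_depth(X, index=201)   # point far from the cloud
```

The far-away point is usually isolated within a few splits, while the central point needs many more, which is the contrast the decision function relies on.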
-
decision_scores_
The outlier scores of the training data. The higher, the more abnormal. Outliers tend to have higher scores. This value is available once the detector is fitted.
- Type
numpy array of shape (n_samples,)
-
threshold_
The threshold is based on contamination. It is the score cutoff that marks the n_samples * contamination most abnormal samples in decision_scores_, and it is used to generate binary outlier labels.
- Type
float
-
labels_
The binary labels of the training data: 0 stands for inliers and 1 for outliers/anomalies. They are generated by applying threshold_ to decision_scores_.
- Type
int, either 0 or 1
- Parameters
n_estimators (int, optional (default=100)) – The number of base estimators in the ensemble.
max_samples (int or float, optional (default="auto")) – The number of samples to draw from X to train each base estimator.
If int, then draw max_samples samples.
If float, then draw max_samples * X.shape[0] samples.
If "auto", then max_samples=min(256, n_samples).
If max_samples is larger than the number of samples provided, all samples will be used for all trees (no sampling).
contamination (float in (0., 0.5), optional (default=0.1)) – The amount of contamination of the data set, i.e. the proportion of outliers in the data set. Used when fitting to define the threshold on the decision function.
max_features (int or float, optional (default=1.0)) – The number of features to draw from X to train each base estimator.
If int, then draw max_features features.
If float, then draw max_features * X.shape[1] features.
bootstrap (bool, optional (default=False)) – If True, individual trees are fit on random subsets of the training data sampled with replacement. If False, sampling without replacement is performed.
behaviour (str, default 'old') – Behaviour of the decision_function, which can be either 'old' or 'new'. Passing behaviour='new' makes the decision_function change to match the API of other anomaly detection algorithms, which will be the default behaviour in the future. As explained in detail in the offset_ attribute documentation, the decision_function becomes dependent on the contamination parameter, in such a way that 0 becomes its natural threshold to detect outliers.
random_state (int, RandomState instance or None, optional (default=None)) – If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.
verbose (int, optional (default=0)) – Controls the verbosity of the tree building process.
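The max_samples rules listed in the parameters above reduce to a small resolution function. The helper below is hypothetical and only restates those rules in code; it is not taken from the library.

```python
def resolve_max_samples(max_samples, n_samples):
    """Turn a max_samples setting into a concrete per-tree sample count."""
    if max_samples == "auto":
        resolved = min(256, n_samples)
    elif isinstance(max_samples, int):
        resolved = max_samples
    else:
        # A float is read as a fraction of the available samples.
        resolved = int(max_samples * n_samples)
    # Asking for more samples than exist means every sample is used.
    return min(resolved, n_samples)

auto_large = resolve_max_samples("auto", 1000)
auto_small = resolve_max_samples("auto", 100)
half = resolve_max_samples(0.5, 1000)
capped = resolve_max_samples(5000, 1000)
```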
-
fit
(*, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[None]
Fit the model with the training data.
- Parameters
timeout – The maximum time, in seconds, that this primitive should spend fitting during this method call.
iterations – How many internal iterations the primitive should do.
- Returns
A CallResult with a None value.
-
get_params
() → tods.detection_algorithm.PyodIsolationForest.Params
Return the parameters.
- Returns
An instance of the Params class.
-
produce
(*, inputs: d3m.container.pandas.DataFrame, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[d3m.container.pandas.DataFrame]
Process the testing data.
- Parameters
inputs – Container DataFrame of shape [num_inputs, …]. Time series data on which to run outlier detection.
timeout – The maximum time, in seconds, that this primitive should take to produce outputs during this method call.
iterations – How many internal iterations the primitive should do.
- Returns
Container DataFrame of shape [num_inputs, …] wrapped inside CallResult, where 1 marks outliers and 0 marks normal points.
-
set_params
(*, params: tods.detection_algorithm.PyodIsolationForest.Params) → None
Set the parameters for outlier detection.
- Parameters
params – An instance of the Params class.
- Returns
None
-
set_training_data
(*, inputs: d3m.container.pandas.DataFrame) → None
Set the training data for outlier detection.
- Parameters
inputs – Container DataFrame holding the training data.
- Returns
None
-
tods.detection_algorithm.PyodKNN¶
-
class
tods.detection_algorithm.PyodKNN.
KNNPrimitive
(*args, **kwds) Bases:
tods.detection_algorithm.UODBasePrimitive.UnsupervisedOutlierDetectorBase
kNN class for outlier detection. For an observation, its distance to its kth nearest neighbor can be viewed as the outlying score, which is a way to measure density. See :cite:`ramaswamy2000efficient,angiulli2002fast` for details. Three kNN detectors are supported:
largest: use the distance to the kth neighbor as the outlier score
mean: use the average distance to all k neighbors as the outlier score
median: use the median of the distances to the k neighbors as the outlier score
-
decision_scores_
The outlier scores of the training data. The higher, the more abnormal. Outliers tend to have higher scores. This value is available once the detector is fitted.
- Type
numpy array of shape (n_samples,)
-
threshold_
The threshold is based on contamination. It is the score cutoff that marks the n_samples * contamination most abnormal samples in decision_scores_, and it is used to generate binary outlier labels.
- Type
float
-
labels_
The binary labels of the training data: 0 stands for inliers and 1 for outliers/anomalies. They are generated by applying threshold_ to decision_scores_.
- Type
int, either 0 or 1
- Parameters
contamination (float in (0., 0.5), optional (default=0.1)) – The amount of contamination of the data set, i.e. the proportion of outliers in the data set. Used when fitting to define the threshold on the decision function.
n_neighbors (int, optional (default = 5)) – Number of neighbors to use by default for k neighbors queries.
method (str, optional (default='largest')) – One of {'largest', 'mean', 'median'}.
'largest': use the distance to the kth neighbor as the outlier score.
'mean': use the average distance to all k neighbors as the outlier score.
'median': use the median of the distances to the k neighbors as the outlier score.
radius (float, optional (default = 1.0)) – Range of parameter space to use by default for radius_neighbors queries.
algorithm ({'auto', 'ball_tree', 'kd_tree', 'brute'}, optional) – Algorithm used to compute the nearest neighbors.
'ball_tree' will use BallTree.
'kd_tree' will use KDTree.
'brute' will use a brute-force search.
'auto' will attempt to decide the most appropriate algorithm based on the values passed to the fit() method.
Note: fitting on sparse input will override the setting of this parameter, using brute force.
Deprecated: algorithm is deprecated in PyOD 0.7.4 and will not be possible in 0.7.6. It has to use BallTree for consistency.
leaf_size (int, optional (default = 30)) – Leaf size passed to BallTree. This can affect the speed of the construction and query, as well as the memory required to store the tree. The optimal value depends on the nature of the problem.
metric (string or callable, default 'minkowski') – Metric to use for distance computation. Any metric from scikit-learn or scipy.spatial.distance can be used. If metric is a callable function, it is called on each pair of instances (rows) and the resulting value recorded. The callable should take two arrays as input and return one value indicating the distance between them. This works for Scipy's metrics, but is less efficient than passing the metric name as a string. Distance matrices are not supported. Valid values for metric are:
from scikit-learn: ['cityblock', 'cosine', 'euclidean', 'l1', 'l2', 'manhattan']
from scipy.spatial.distance: ['braycurtis', 'canberra', 'chebyshev', 'correlation', 'dice', 'hamming', 'jaccard', 'kulsinski', 'mahalanobis', 'matching', 'minkowski', 'rogerstanimoto', 'russellrao', 'seuclidean', 'sokalmichener', 'sokalsneath', 'sqeuclidean', 'yule']
See the documentation for scipy.spatial.distance for details on these metrics.
p (integer, optional (default = 2)) – Parameter for the Minkowski metric from sklearn.metrics.pairwise.pairwise_distances. When p = 1, this is equivalent to using manhattan_distance (l1), and euclidean_distance (l2) for p = 2. For arbitrary p, minkowski_distance (l_p) is used. See http://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.pairwise_distances
metric_params (dict, optional (default = None)) – Additional keyword arguments for the metric function.
n_jobs (int, optional (default = 1)) – The number of parallel jobs to run for neighbors search. If -1, then the number of jobs is set to the number of CPU cores. Affects only kneighbors and kneighbors_graph methods.
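The three method options differ only in how the k nearest-neighbor distances are reduced to one score. A brute-force NumPy sketch, assuming Euclidean distance (Minkowski with p = 2) and a hypothetical function name knn_outlier_scores, makes the difference concrete:

```python
import numpy as np

def knn_outlier_scores(X, n_neighbors=5, method="largest"):
    """Reduce each sample's k nearest-neighbor distances to one outlier score."""
    X = np.asarray(X, dtype=float)
    # Brute-force pairwise Euclidean distances.
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)      # a sample is not its own neighbor
    nearest = np.sort(dists, axis=1)[:, :n_neighbors]
    if method == "largest":
        return nearest[:, -1]            # distance to the kth neighbor
    if method == "mean":
        return nearest.mean(axis=1)
    if method == "median":
        return np.median(nearest, axis=1)
    raise ValueError(f"unknown method: {method!r}")

X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [20.0]])
largest = knn_outlier_scores(X, n_neighbors=2, method="largest")
mean = knn_outlier_scores(X, n_neighbors=2, method="mean")
```

For the point at 20, the two nearest neighbors are 16 and 17 away, so 'largest' gives 17 and 'mean' gives 16.5.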
-
fit
(*, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[None]
Fit the model with the training data.
- Parameters
timeout – The maximum time, in seconds, that this primitive should spend fitting during this method call.
iterations – How many internal iterations the primitive should do.
- Returns
A CallResult with a None value.
-
get_params
() → tods.detection_algorithm.PyodKNN.Params
Return the parameters.
- Returns
An instance of the Params class.
-
produce
(*, inputs: d3m.container.pandas.DataFrame, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[d3m.container.pandas.DataFrame]
Process the testing data.
- Parameters
inputs – Container DataFrame of shape [num_inputs, …]. Time series data on which to run outlier detection.
timeout – The maximum time, in seconds, that this primitive should take to produce outputs during this method call.
iterations – How many internal iterations the primitive should do.
- Returns
Container DataFrame of shape [num_inputs, …] wrapped inside CallResult, where 1 marks outliers and 0 marks normal points.
-
set_params
(*, params: tods.detection_algorithm.PyodKNN.Params) → None
Set the parameters for outlier detection.
- Parameters
params – An instance of the Params class.
- Returns
None
-
set_training_data
(*, inputs: d3m.container.pandas.DataFrame) → None
Set the training data for outlier detection.
- Parameters
inputs – Container DataFrame holding the training data.
- Returns
None
-
tods.detection_algorithm.PyodLODA¶
-
class
tods.detection_algorithm.PyodLODA.
LODAPrimitive
(*args, **kwds) Bases:
tods.detection_algorithm.UODBasePrimitive.UnsupervisedOutlierDetectorBase
Wrapper of PyOD LODA. Loda: Lightweight on-line detector of anomalies. See :cite:`pevny2016loda` for more information.
-
decision_scores_
The outlier scores of the training data. The higher, the more abnormal. Outliers tend to have higher scores. This value is available once the detector is fitted.
- Type
numpy array of shape (n_samples,)
-
threshold_
The threshold is based on contamination. It is the score cutoff that marks the n_samples * contamination most abnormal samples in decision_scores_, and it is used to generate binary outlier labels.
- Type
float
-
labels_
The binary labels of the training data: 0 stands for inliers and 1 for outliers/anomalies. They are generated by applying threshold_ to decision_scores_.
- Type
int, either 0 or 1
- Parameters
contamination (float in (0., 0.5), optional (default=0.1)) – The amount of contamination of the data set, i.e. the proportion of outliers in the data set. Used when fitting to define the threshold on the decision function.
n_bins (int, optional (default = 10)) – The number of bins for the histogram.
n_random_cuts (int, optional (default = 100)) – The number of random cuts.
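The roles of n_bins and n_random_cuts can be illustrated with a compact sketch of the Loda idea: project the data onto random sparse directions, build a one-dimensional histogram per projection, and average the negative log densities. The choice of roughly sqrt(n_features) nonzero weights per cut, the seed argument, and the function name loda_scores are assumptions made for illustration.

```python
import numpy as np

def loda_scores(X, n_bins=10, n_random_cuts=100, seed=0):
    """Average negative log histogram density over random sparse projections."""
    X = np.asarray(X, dtype=float)
    n_samples, n_features = X.shape
    rng = np.random.default_rng(seed)
    # Assumption: each cut uses about sqrt(n_features) nonzero weights.
    n_nonzero = max(1, int(round(np.sqrt(n_features))))
    scores = np.zeros(n_samples)
    for _ in range(n_random_cuts):
        w = np.zeros(n_features)
        active = rng.choice(n_features, size=n_nonzero, replace=False)
        w[active] = rng.normal(size=n_nonzero)
        projected = X @ w
        hist, edges = np.histogram(projected, bins=n_bins, density=True)
        idx = np.clip(np.digitize(projected, edges[1:-1]), 0, n_bins - 1)
        # The small constant only guards against log(0).
        scores += -np.log(hist[idx] + 1e-12)
    return scores / n_random_cuts

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(size=(200, 4)), [[10.0, 10.0, 10.0, 10.0]]])
scores = loda_scores(X)
```

A sample that lands in sparsely populated bins across many projections accumulates a high average score.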
-
fit
(*, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[None]
Fit the model with the training data.
- Parameters
timeout – The maximum time, in seconds, that this primitive should spend fitting during this method call.
iterations – How many internal iterations the primitive should do.
- Returns
A CallResult with a None value.
-
get_params
() → tods.detection_algorithm.PyodLODA.Params
Return the parameters.
- Returns
An instance of the Params class.
-
produce
(*, inputs: d3m.container.pandas.DataFrame, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[d3m.container.pandas.DataFrame]
Process the testing data.
- Parameters
inputs – Container DataFrame of shape [num_inputs, …]. Time series data on which to run outlier detection.
timeout – The maximum time, in seconds, that this primitive should take to produce outputs during this method call.
iterations – How many internal iterations the primitive should do.
- Returns
Container DataFrame of shape [num_inputs, …] wrapped inside CallResult, where 1 marks outliers and 0 marks normal points.
-
set_params
(*, params: tods.detection_algorithm.PyodLODA.Params) → None
Set the parameters for outlier detection.
- Parameters
params – An instance of the Params class.
- Returns
None
-
set_training_data
(*, inputs: d3m.container.pandas.DataFrame) → None
Set the training data for outlier detection.
- Parameters
inputs – Container DataFrame holding the training data.
- Returns
None
-
tods.detection_algorithm.PyodLOF¶
-
class
tods.detection_algorithm.PyodLOF.
LOFPrimitive
(*args, **kwds) Bases:
tods.detection_algorithm.UODBasePrimitive.UnsupervisedOutlierDetectorBase
Wrapper of Pyod LOF Class with more functionalities. Unsupervised Outlier Detection using Local Outlier Factor (LOF). The anomaly score of each sample is called Local Outlier Factor. It measures the local deviation of density of a given sample with respect to its neighbors. It is local in that the anomaly score depends on how isolated the object is with respect to the surrounding neighborhood. More precisely, locality is given by k-nearest neighbors, whose distance is used to estimate the local density. By comparing the local density of a sample to the local densities of its neighbors, one can identify samples that have a substantially lower density than their neighbors. These are considered outliers. See :cite:`breunig2000lof` for details.
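The density comparison described above can be written out directly. The following brute-force sketch assumes Euclidean distance and uses a hypothetical function name lof_scores. It follows the usual definitions of k-distance, reachability distance and local reachability density, and is meant as an illustration, not as the wrapped implementation.

```python
import numpy as np

def lof_scores(X, n_neighbors=20):
    """Local Outlier Factor of every sample with respect to the same data set."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    k = min(n_neighbors, n - 1)          # cannot use more neighbors than exist
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)      # exclude each sample from its own neighborhood
    neighbors = np.argsort(dists, axis=1)[:, :k]
    neighbor_dists = np.take_along_axis(dists, neighbors, axis=1)
    k_distance = neighbor_dists[:, -1]
    # reach_dist(a, b) = max(k_distance(b), d(a, b)) for each neighbor b of a.
    reach = np.maximum(neighbor_dists, k_distance[neighbors])
    # Local reachability density: inverse of the mean reachability distance.
    lrd = 1.0 / (reach.mean(axis=1) + 1e-10)
    # LOF: mean density of the neighbors divided by the sample's own density.
    return lrd[neighbors].mean(axis=1) / lrd

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(50, 2)), [[10.0, 10.0]]])
scores = lof_scores(X, n_neighbors=5)
```

Samples inside the cloud have scores close to 1, while the distant point has a much lower density than its neighbors and therefore a much larger factor.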
-
decision_scores_
The outlier scores of the training data. The higher, the more abnormal. Outliers tend to have higher scores. This value is available once the detector is fitted.
- Type
numpy array of shape (n_samples,)
-
threshold_
The threshold is based on contamination. It is the cutoff score that separates the n_samples * contamination most abnormal samples in decision_scores_ from the rest. The threshold is calculated for generating binary outlier labels.
- Type
float
-
labels_
The binary labels of the training data. 0 stands for inliers and 1 for outliers/anomalies. It is generated by applying threshold_ on decision_scores_.
- Type
int, either 0 or 1
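The relationship between contamination, threshold_ and labels_ can be sketched in plain Python. This is a minimal illustration, assuming the threshold is the (1 - contamination) quantile of the training scores with linear interpolation and that a score strictly above the threshold is labelled 1. The helper names `threshold_from_contamination` and `label_scores` are hypothetical and not part of TODS.

```python
def threshold_from_contamination(scores, contamination=0.1):
    """Return the (1 - contamination) quantile of the scores (linear interpolation)."""
    if not 0.0 < contamination < 0.5:
        raise ValueError("contamination must lie in (0, 0.5)")
    ordered = sorted(scores)
    # Fractional position of the quantile inside the sorted scores.
    position = (1.0 - contamination) * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


def label_scores(scores, threshold):
    """Assumed rule: 1 (outlier) when the score exceeds the threshold, else 0."""
    return [1 if score > threshold else 0 for score in scores]


# Ten made-up training scores; the last one is clearly abnormal.
scores = [0.1, 0.2, 0.15, 0.3, 0.25, 0.2, 0.1, 0.35, 0.3, 5.0]
threshold = threshold_from_contamination(scores, contamination=0.1)
labels = label_scores(scores, threshold)
```

With contamination=0.1 and ten samples, roughly one sample ends up above the cutoff, which is how the contamination value controls the share of points flagged as outliers.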
- Parameters
n_neighbors (int, optional (default=20)) – Number of neighbors to use by default for kneighbors queries. If n_neighbors is larger than the number of samples provided, all samples will be used.
algorithm ({'auto', 'ball_tree', 'kd_tree', 'brute'}, optional) – Algorithm used to compute the nearest neighbors: ‘ball_tree’ will use BallTree; ‘kd_tree’ will use KDTree; ‘brute’ will use a brute-force search; ‘auto’ will attempt to decide the most appropriate algorithm based on the values passed to the fit() method. Note: fitting on sparse input will override the setting of this parameter, using brute force.
leaf_size (int, optional (default=30)) – Leaf size passed to BallTree or KDTree. This can affect the speed of the construction and query, as well as the memory required to store the tree. The optimal value depends on the nature of the problem.
metric (string or callable, default 'minkowski') – Metric used for the distance computation. Any metric from scikit-learn or scipy.spatial.distance can be used. If ‘precomputed’, the training input X is expected to be a distance matrix. If metric is a callable function, it is called on each pair of instances (rows) and the resulting value recorded. The callable should take two arrays as input and return one value indicating the distance between them. This works for Scipy’s metrics, but is less efficient than passing the metric name as a string. Valid values for metric are:
from scikit-learn: [‘cityblock’, ‘cosine’, ‘euclidean’, ‘l1’, ‘l2’, ‘manhattan’]
from scipy.spatial.distance: [‘braycurtis’, ‘canberra’, ‘chebyshev’, ‘correlation’, ‘dice’, ‘hamming’, ‘jaccard’, ‘kulsinski’, ‘mahalanobis’, ‘matching’, ‘minkowski’, ‘rogerstanimoto’, ‘russellrao’, ‘seuclidean’, ‘sokalmichener’, ‘sokalsneath’, ‘sqeuclidean’, ‘yule’]
See the documentation for scipy.spatial.distance for details on these metrics: http://docs.scipy.org/doc/scipy/reference/spatial.distance.html
p (integer, optional (default=2)) – Parameter for the Minkowski metric from sklearn.metrics.pairwise.pairwise_distances. When p = 1, this is equivalent to using manhattan_distance (l1), and euclidean_distance (l2) for p = 2. For arbitrary p, minkowski_distance (l_p) is used. See http://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.pairwise_distances
metric_params (dict, optional (default=None)) – Additional keyword arguments for the metric function.
contamination (float in (0., 0.5), optional (default=0.1)) – The amount of contamination of the data set, i.e. the proportion of outliers in the data set. When fitting this is used to define the threshold on the decision function.
n_jobs (int, optional (default=1)) – The number of parallel jobs to run for neighbors search. If -1, then the number of jobs is set to the number of CPU cores. Affects only kneighbors and kneighbors_graph methods.
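To make the roles of n_neighbors and p concrete, the Local Outlier Factor computation described above can be sketched with the standard library alone. This is an illustrative brute-force version, not the wrapped implementation: `minkowski` and `lof_scores` are hypothetical names, and the small constant added to the denominator is an assumption made here to avoid division by zero for duplicate points.

```python
def minkowski(a, b, p=2):
    """Minkowski distance; p=1 is Manhattan, p=2 is Euclidean."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def lof_scores(points, n_neighbors=2, p=2):
    """Brute-force Local Outlier Factor for a small list of points."""
    n = len(points)
    k = min(n_neighbors, n - 1)  # cannot use more neighbors than other samples

    # k nearest neighbors of each point as (distance, index) pairs.
    neighbors = []
    for i in range(n):
        candidates = sorted(
            (minkowski(points[i], points[j], p), j) for j in range(n) if j != i
        )
        neighbors.append(candidates[:k])

    # Distance from each point to its k-th nearest neighbor.
    k_distance = [nbrs[-1][0] for nbrs in neighbors]

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_distance[j], dist) for dist, j in neighbors[i]]
        lrd.append(1.0 / (sum(reach) / k + 1e-10))

    # LOF: average density of the neighbors relative to the point's own density.
    return [
        sum(lrd[j] for _, j in neighbors[i]) / (k * lrd[i]) for i in range(n)
    ]


# Four points on a unit square plus one isolated point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
scores = lof_scores(points, n_neighbors=2)
```

Points inside the square obtain a score close to 1 because their density matches that of their neighbors, while the isolated point has a much lower density than its neighbors and therefore a much larger score.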
-
fit
(*, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[None] Fit the model with the training data previously supplied through set_training_data.
- Returns
None
- Parameters
timeout – The maximum time, in seconds, this primitive should spend fitting during this method call.
iterations – How many internal iterations the primitive should perform.
- Return type
A CallResult with None value.
-
get_params
() → tods.detection_algorithm.PyodLOF.Params Return parameters.
- Returns
An instance of parameters.
- Return type
class Params
-
produce
(*, inputs: d3m.container.pandas.DataFrame, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[d3m.container.pandas.DataFrame] Process the testing data.
- Returns
Container DataFrame in which 1 marks outliers and 0 marks normal points.
- Parameters
inputs – Container DataFrame of shape [num_inputs, …] holding the time series data on which to run outlier detection.
timeout – The maximum time, in seconds, this primitive should take to produce outputs during this method call.
iterations – How many internal iterations the primitive should perform.
- Return type
The outputs of shape [num_inputs, …] wrapped inside CallResult.
-
set_params
(*, params: tods.detection_algorithm.PyodLOF.Params) → None Set parameters for outlier detection. :param params: class Params
- Returns
None
- Parameters
params – An instance of parameters.
-
set_training_data
(*, inputs: d3m.container.pandas.DataFrame) → None Set training data for outlier detection. :param inputs: Container DataFrame
- Returns
None
- Parameters
inputs – The inputs.
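The methods documented above are meant to be called in a fixed order: set_training_data, then fit, then produce, each with keyword-only arguments. The sketch below mirrors that call sequence with a toy class. `ToyDetector` is a hypothetical stand-in written for this illustration, not a TODS primitive: it uses plain lists instead of d3m container DataFrames, returns a plain list instead of a CallResult, and scores points by their distance from the training mean.

```python
class ToyDetector:
    """Hypothetical stand-in that only mirrors the documented method names."""

    def __init__(self, contamination=0.1):
        self.contamination = contamination
        self._training = None
        self._center = None
        self.threshold_ = None

    def set_training_data(self, *, inputs):
        # Store the data; nothing is learned until fit() is called.
        self._training = list(inputs)

    def fit(self, *, timeout=None, iterations=None):
        if self._training is None:
            raise ValueError("call set_training_data() before fit()")
        self._center = sum(self._training) / len(self._training)
        scores = sorted(abs(x - self._center) for x in self._training)
        # Cutoff near the (1 - contamination) quantile of the training scores.
        cut = int((1.0 - self.contamination) * (len(scores) - 1))
        self.threshold_ = scores[cut]

    def produce(self, *, inputs, timeout=None, iterations=None):
        if self.threshold_ is None:
            raise ValueError("call fit() before produce()")
        return [1 if abs(x - self._center) > self.threshold_ else 0 for x in inputs]


detector = ToyDetector(contamination=0.1)
detector.set_training_data(inputs=[1.0, 1.1, 0.9, 1.0, 1.2, 0.8, 1.0, 1.1, 0.9, 1.0])
detector.fit()
predictions = detector.produce(inputs=[1.0, 9.0])
```

Separating set_training_data from fit keeps the fit signature free of data arguments, which matches the signatures shown above where fit accepts only timeout and iterations.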
-
tods.detection_algorithm.PyodMoGaal¶
-
class
tods.detection_algorithm.PyodMoGaal.
Mo_GaalPrimitive
(*args, **kwds) Bases:
tods.detection_algorithm.UODBasePrimitive.UnsupervisedOutlierDetectorBase
Multi-Objective Generative Adversarial Active Learning. MO_GAAL directly generates informative potential outliers to assist the classifier in describing a boundary that can separate outliers from normal data effectively. Moreover, to prevent the generator from falling into the mode collapsing problem, the network structure of SO-GAAL is expanded from a single generator (SO-GAAL) to multiple generators with different objectives (MO-GAAL) to generate a reasonable reference distribution for the whole dataset. Read more in the :cite:`liu2019generative`.
-
decision_scores_
The outlier scores of the training data. The higher, the more abnormal. Outliers tend to have higher scores. This value is available once the detector is fitted.
- Type
numpy array of shape (n_samples,)
-
threshold_
The threshold is based on contamination. It is the cutoff score that separates the n_samples * contamination most abnormal samples in decision_scores_ from the rest. The threshold is calculated for generating binary outlier labels.
- Type
float
-
labels_
The binary labels of the training data. 0 stands for inliers and 1 for outliers/anomalies. It is generated by applying threshold_ on decision_scores_.
- Type
int, either 0 or 1
- Parameters
contamination (float in (0., 0.5), optional (default=0.1)) – The amount of contamination of the data set, i.e. the proportion of outliers in the data set. Used when fitting to define the threshold on the decision function.
k (int, optional (default=10)) – The number of sub generators.
stop_epochs (int, optional (default=20)) – The number of epochs of training.
lr_d (float, optional (default=0.01)) – The learning rate of the discriminator.
lr_g (float, optional (default=0.0001)) – The learning rate of the generator.
decay (float, optional (default=1e-6)) – The decay parameter for SGD.
momentum (float, optional (default=0.9)) – The momentum parameter for SGD.
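How a learning rate, decay and momentum interact in one SGD step can be illustrated on a scalar weight. The sketch assumes the update rule of the legacy Keras SGD optimizer (time-based learning-rate decay and classical momentum); that the wrapped model uses exactly this rule is an assumption, and `sgd_momentum_step` is a hypothetical helper.

```python
def sgd_momentum_step(weight, velocity, grad, step, lr=0.01, decay=1e-6, momentum=0.9):
    """One SGD update with time-based decay and classical momentum (assumed rule)."""
    # The effective learning rate shrinks slowly as the step counter grows.
    lr_t = lr / (1.0 + decay * step)
    # The velocity accumulates past gradients, damped by the momentum factor.
    velocity = momentum * velocity - lr_t * grad
    return weight + velocity, velocity


# Minimise f(w) = w**2, whose gradient is 2 * w, starting from w = 5.
w, v = 5.0, 0.0
for step in range(200):
    w, v = sgd_momentum_step(w, v, grad=2.0 * w, step=step)
```

With decay=1e-6 the learning rate barely changes over a few hundred steps, so in short training runs the momentum term has by far the larger effect on convergence.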
-
fit
(*, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[None] Fit the model with the training data previously supplied through set_training_data.
- Returns
None
- Parameters
timeout – The maximum time, in seconds, this primitive should spend fitting during this method call.
iterations – How many internal iterations the primitive should perform.
- Return type
A CallResult with None value.
-
get_params
() → tods.detection_algorithm.PyodMoGaal.Params Return parameters.
- Returns
An instance of parameters.
- Return type
class Params
-
produce
(*, inputs: d3m.container.pandas.DataFrame, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[d3m.container.pandas.DataFrame] Process the testing data.
- Returns
Container DataFrame in which 1 marks outliers and 0 marks normal points.
- Parameters
inputs – Container DataFrame of shape [num_inputs, …] holding the time series data on which to run outlier detection.
timeout – The maximum time, in seconds, this primitive should take to produce outputs during this method call.
iterations – How many internal iterations the primitive should perform.
- Return type
The outputs of shape [num_inputs, …] wrapped inside CallResult.
-
set_params
(*, params: tods.detection_algorithm.PyodMoGaal.Params) → None Set parameters for outlier detection. :param params: class Params
- Returns
None
- Parameters
params – An instance of parameters.
-
set_training_data
(*, inputs: d3m.container.pandas.DataFrame) → None Set training data for outlier detection. :param inputs: Container DataFrame
- Returns
None
- Parameters
inputs – The inputs.
-
tods.detection_algorithm.PyodOCSVM¶
-
class
tods.detection_algorithm.PyodOCSVM.
OCSVMPrimitive
(*args, **kwds) Bases:
tods.detection_algorithm.UODBasePrimitive.UnsupervisedOutlierDetectorBase
Wrapper of scikit-learn one-class SVM Class with more functionalities. Unsupervised Outlier Detection. Estimate the support of a high-dimensional distribution. The implementation is based on libsvm. See http://scikit-learn.org/stable/modules/svm.html#svm-outlier-detection and :cite:`scholkopf2001estimating`.
-
decision_scores_
The outlier scores of the training data. The higher, the more abnormal. Outliers tend to have higher scores. This value is available once the detector is fitted.
- Type
numpy array of shape (n_samples,)
-
threshold_
The threshold is based on contamination. It is the cutoff score that separates the n_samples * contamination most abnormal samples in decision_scores_ from the rest. The threshold is calculated for generating binary outlier labels.
- Type
float
-
labels_
The binary labels of the training data. 0 stands for inliers and 1 for outliers/anomalies. It is generated by applying threshold_ on decision_scores_.
- Type
int, either 0 or 1
- Parameters
kernel (string, optional (default=‘rbf’)) – Specifies the kernel type to be used in the algorithm. It must be one of ‘linear’, ‘poly’, ‘rbf’, ‘sigmoid’, ‘precomputed’ or a callable. If none is given, ‘rbf’ will be used. If a callable is given it is used to precompute the kernel matrix.
nu (float, optional) – An upper bound on the fraction of training errors and a lower bound of the fraction of support vectors. Should be in the interval (0, 1]. By default 0.5 will be taken.
degree (int, optional (default=3)) – Degree of the polynomial kernel function (‘poly’). Ignored by all other kernels.
gamma (float, optional (default=‘auto’)) – Kernel coefficient for ‘rbf’, ‘poly’ and ‘sigmoid’. If gamma is ‘auto’ then 1/n_features will be used instead.
coef0 (float, optional (default=0.0)) – Independent term in kernel function. It is only significant in ‘poly’ and ‘sigmoid’.
tol (float, optional) – Tolerance for stopping criterion.
shrinking (bool, optional) – Whether to use the shrinking heuristic.
cache_size (float, optional) – Specify the size of the kernel cache (in MB).
verbose (bool, default: False) – Enable verbose output. Note that this setting takes advantage of a per-process runtime setting in libsvm that, if enabled, may not work properly in a multithreaded context.
max_iter (int, optional (default=-1)) – Hard limit on iterations within solver, or -1 for no limit.
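The way gamma, degree and coef0 enter the kernels listed above can be shown on two small vectors. This is a sketch using the usual textbook kernel formulas, with gamma set to 1/n_features as the ‘auto’ setting describes; the function names are hypothetical and no SVM is trained here.

```python
import math


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def poly_kernel(x, y, gamma, degree=3, coef0=0.0):
    """Polynomial kernel: (gamma * <x, y> + coef0) ** degree."""
    return (gamma * dot(x, y) + coef0) ** degree


def rbf_kernel(x, y, gamma):
    """RBF kernel: exp(-gamma * ||x - y||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def sigmoid_kernel(x, y, gamma, coef0=0.0):
    """Sigmoid kernel: tanh(gamma * <x, y> + coef0)."""
    return math.tanh(gamma * dot(x, y) + coef0)


x = [1.0, 2.0]
y = [2.0, 0.0]
gamma = 1.0 / len(x)  # the 'auto' setting: 1 / n_features

poly_value = poly_kernel(x, y, gamma)        # uses degree and coef0
rbf_value = rbf_kernel(x, y, gamma)          # ignores degree and coef0
sigmoid_value = sigmoid_kernel(x, y, gamma)  # uses coef0 only
```

The RBF kernel depends only on the distance between the two vectors, which is why degree and coef0 have no effect on it.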
-
fit
(*, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[None] Fit the model with the training data previously supplied through set_training_data.
- Returns
None
- Parameters
timeout – The maximum time, in seconds, this primitive should spend fitting during this method call.
iterations – How many internal iterations the primitive should perform.
- Return type
A CallResult with None value.
-
get_params
() → tods.detection_algorithm.PyodOCSVM.Params Return parameters.
- Returns
An instance of parameters.
- Return type
class Params
-
produce
(*, inputs: d3m.container.pandas.DataFrame, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[d3m.container.pandas.DataFrame] Process the testing data.
- Returns
Container DataFrame in which 1 marks outliers and 0 marks normal points.
- Parameters
inputs – Container DataFrame of shape [num_inputs, …] holding the time series data on which to run outlier detection.
timeout – The maximum time, in seconds, this primitive should take to produce outputs during this method call.
iterations – How many internal iterations the primitive should perform.
- Return type
The outputs of shape [num_inputs, …] wrapped inside CallResult.
-
set_params
(*, params: tods.detection_algorithm.PyodOCSVM.Params) → None Set parameters for outlier detection. :param params: class Params
- Returns
None
- Parameters
params – An instance of parameters.
-
set_training_data
(*, inputs: d3m.container.pandas.DataFrame) → None Set training data for outlier detection. :param inputs: Container DataFrame
- Returns
None
- Parameters
inputs – The inputs.
-
tods.detection_algorithm.PyodSOD¶
-
class
tods.detection_algorithm.PyodSOD.
SODPrimitive
(*args, **kwds) Bases:
tods.detection_algorithm.UODBasePrimitive.UnsupervisedOutlierDetectorBase
Subspace outlier detection (SOD) aims to detect outliers in varying subspaces of a high-dimensional feature space. For each data object, SOD explores the axis-parallel subspace spanned by the data object’s neighbors and determines how much the object deviates from the neighbors in this subspace. See :cite:`kriegel2009outlier` for details.
-
decision_scores_
The outlier scores of the training data. The higher, the more abnormal. Outliers tend to have higher scores. This value is available once the detector is fitted.
- Type
numpy array of shape (n_samples,)
-
threshold_
The threshold is based on contamination. It is the cutoff score that separates the n_samples * contamination most abnormal samples in decision_scores_ from the rest. The threshold is calculated for generating binary outlier labels.
- Type
float
-
labels_
The binary labels of the training data. 0 stands for inliers and 1 for outliers/anomalies. It is generated by applying threshold_ on decision_scores_.
- Type
int, either 0 or 1
- Parameters
n_neighbors (int, optional (default=20)) – Number of neighbors to use by default for k neighbors queries.
ref_set (int, optional (default=10)) – Specifies the number of shared nearest neighbors to create the reference set. Note that ref_set must be smaller than n_neighbors.
alpha (float in (0., 1.), optional (default=0.8)) – Specifies the lower limit for selecting subspace. 0.8 is set as default as suggested in the original paper.
contamination (float in (0., 0.5), optional (default=0.1)) – The amount of contamination of the data set, i.e. the proportion of outliers in the data set. Used when fitting to define the threshold on the decision function.
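The role of alpha in selecting the subspace can be sketched for a single point whose reference set is already known. This is a simplified illustration: the reference set is passed in directly instead of being built from shared nearest neighbors, the normalisation of the final score is one common choice and may differ in the wrapped implementation, and `sod_score` is a hypothetical name.

```python
import math


def sod_score(point, reference, alpha=0.8):
    """Deviation of `point` from its reference set in the low-variance subspace."""
    n_ref = len(reference)
    n_dim = len(point)

    # Per-dimension mean and variance of the reference set.
    means = [sum(r[d] for r in reference) / n_ref for d in range(n_dim)]
    variances = [
        sum((r[d] - means[d]) ** 2 for r in reference) / n_ref for d in range(n_dim)
    ]

    # A dimension is relevant when its variance is below alpha times the
    # average per-dimension variance of the reference set.
    expected = alpha * sum(variances) / n_dim
    relevant = [d for d in range(n_dim) if variances[d] < expected]
    if not relevant:
        return 0.0

    # Distance to the reference mean, measured only in the relevant dimensions.
    squared = sum((point[d] - means[d]) ** 2 for d in relevant)
    return math.sqrt(squared / len(relevant))


# The reference points agree closely in dimension 0 and vary widely in dimension 1.
reference = [(1.0, 0.0), (1.0, 10.0), (1.1, -10.0), (0.9, 5.0)]
outlier_score = sod_score((3.0, 1.0), reference)   # deviates in dimension 0
inlier_score = sod_score((1.0, 50.0), reference)   # deviates only in dimension 1
```

Only dimension 0 is selected here, so a large value in the noisy dimension 1 does not raise the score, while a modest shift in dimension 0 does.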
-
fit
(*, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[None] Fit the model with the training data previously supplied through set_training_data.
- Returns
None
- Parameters
timeout – The maximum time, in seconds, this primitive should spend fitting during this method call.
iterations – How many internal iterations the primitive should perform.
- Return type
A CallResult with None value.
-
get_params
() → tods.detection_algorithm.PyodSOD.Params Return parameters.
- Returns
An instance of parameters.
- Return type
class Params
-
produce
(*, inputs: d3m.container.pandas.DataFrame, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[d3m.container.pandas.DataFrame] Process the testing data.
- Returns
Container DataFrame in which 1 marks outliers and 0 marks normal points.
- Parameters
inputs – Container DataFrame of shape [num_inputs, …] holding the time series data on which to run outlier detection.
timeout – The maximum time, in seconds, this primitive should take to produce outputs during this method call.
iterations – How many internal iterations the primitive should perform.
- Return type
The outputs of shape [num_inputs, …] wrapped inside CallResult.
-
set_params
(*, params: tods.detection_algorithm.PyodSOD.Params) → None Set parameters for outlier detection. :param params: class Params
- Returns
None
- Parameters
params – An instance of parameters.
-
set_training_data
(*, inputs: d3m.container.pandas.DataFrame) → None Set training data for outlier detection. :param inputs: Container DataFrame
- Returns
None
- Parameters
inputs – The inputs.
-
tods.detection_algorithm.PyodSoGaal¶
-
class
tods.detection_algorithm.PyodSoGaal.
So_GaalPrimitive
(*args, **kwds) Bases:
tods.detection_algorithm.UODBasePrimitive.UnsupervisedOutlierDetectorBase
Single-Objective Generative Adversarial Active Learning. SO-GAAL directly generates informative potential outliers to assist the classifier in describing a boundary that can separate outliers from normal data effectively. Moreover, to prevent the generator from falling into the mode collapsing problem, the network structure of SO-GAAL is expanded from a single generator (SO-GAAL) to multiple generators with different objectives (MO-GAAL) to generate a reasonable reference distribution for the whole dataset. Read more in the :cite:`liu2019generative`.
-
decision_scores_
The outlier scores of the training data. The higher, the more abnormal. Outliers tend to have higher scores. This value is available once the detector is fitted.
- Type
numpy array of shape (n_samples,)
-
threshold_
The threshold is based on contamination. It is the cutoff score that separates the n_samples * contamination most abnormal samples in decision_scores_ from the rest. The threshold is calculated for generating binary outlier labels.
- Type
float
-
labels_
The binary labels of the training data. 0 stands for inliers and 1 for outliers/anomalies. It is generated by applying threshold_ on decision_scores_.
- Type
int, either 0 or 1
- Parameters
contamination (float in (0., 0.5), optional (default=0.1)) – The amount of contamination of the data set, i.e. the proportion of outliers in the data set. Used when fitting to define the threshold on the decision function.
stop_epochs (int, optional (default=20)) – The number of epochs of training.
lr_d (float, optional (default=0.01)) – The learning rate of the discriminator.
lr_g (float, optional (default=0.0001)) – The learning rate of the generator.
decay (float, optional (default=1e-6)) – The decay parameter for SGD.
momentum (float, optional (default=0.9)) – The momentum parameter for SGD.
-
fit
(*, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[None] Fit the model with the training data previously supplied through set_training_data.
- Returns
None
- Parameters
timeout – The maximum time, in seconds, this primitive should spend fitting during this method call.
iterations – How many internal iterations the primitive should perform.
- Return type
A CallResult with None value.
-
get_params
() → tods.detection_algorithm.PyodSoGaal.Params Return parameters.
- Returns
An instance of parameters.
- Return type
class Params
-
produce
(*, inputs: d3m.container.pandas.DataFrame, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[d3m.container.pandas.DataFrame] Process the testing data.
- Returns
Container DataFrame in which 1 marks outliers and 0 marks normal points.
- Parameters
inputs – Container DataFrame of shape [num_inputs, …] holding the time series data on which to run outlier detection.
timeout – The maximum time, in seconds, this primitive should take to produce outputs during this method call.
iterations – How many internal iterations the primitive should perform.
- Return type
The outputs of shape [num_inputs, …] wrapped inside CallResult.
-
set_params
(*, params: tods.detection_algorithm.PyodSoGaal.Params) → None Set parameters for outlier detection. :param params: class Params
- Returns
None
- Parameters
params – An instance of parameters.
-
set_training_data
(*, inputs: d3m.container.pandas.DataFrame) → None Set training data for outlier detection. :param inputs: Container DataFrame
- Returns
None
- Parameters
inputs – The inputs.
-
tods.detection_algorithm.PyodVAE¶
-
class
tods.detection_algorithm.PyodVAE.
VariationalAutoEncoderPrimitive
(*args, **kwds) Bases:
tods.detection_algorithm.UODBasePrimitive.UnsupervisedOutlierDetectorBase
An autoencoder (AE) is a type of neural network that learns useful data representations in an unsupervised manner. Similar to PCA, an AE can be used to detect outlying objects in the data by calculating the reconstruction errors. See :cite:`aggarwal2015outlier` Chapter 3 for details.
-
encoding_dim_
The number of neurons in the encoding layer.
- Type
int
-
compression_rate_
The ratio between the number of original features and the number of neurons in the encoding layer.
- Type
float
-
model_
The underlying AutoEncoder in Keras.
- Type
Keras Object
-
history_
The AutoEncoder training history.
- Type
Keras Object
-
decision_scores_
The outlier scores of the training data. The higher, the more abnormal. Outliers tend to have higher scores. This value is available once the detector is fitted.
- Type
numpy array of shape (n_samples,)
-
threshold_
The threshold is based on contamination. It is the cutoff score that separates the n_samples * contamination most abnormal samples in decision_scores_ from the rest. The threshold is calculated for generating binary outlier labels.
- Type
float
-
labels_
The binary labels of the training data. 0 stands for inliers and 1 for outliers/anomalies. It is generated by applying threshold_ on decision_scores_.
- Type
int, either 0 or 1
- Parameters
hidden_neurons (list, optional (default=[4, 2, 4])) – The number of neurons per hidden layer.
hidden_activation (str, optional (default=‘relu’)) – Activation function to use for hidden layers. All hidden layers are forced to use the same type of activation. See https://keras.io/activations/
output_activation (str, optional (default=‘sigmoid’)) – Activation function to use for output layer. See https://keras.io/activations/
loss (str or obj, optional (default=keras.losses.mean_squared_error)) – String (name of objective function) or objective function. See https://keras.io/losses/
optimizer (str, optional (default=‘adam’)) – String (name of optimizer) or optimizer instance. See https://keras.io/optimizers/
epochs (int, optional (default=100)) – Number of epochs to train the model.
batch_size (int, optional (default=32)) – Number of samples per gradient update.
dropout_rate (float in (0., 1), optional (default=0.2)) – The dropout to be used across all layers.
l2_regularizer (float in (0., 1), optional (default=0.1)) – The regularization strength of activity_regularizer applied on each layer. By default, l2 regularizer is used. See https://keras.io/regularizers/
validation_size (float in (0., 1), optional (default=0.1)) – The percentage of data to be used for validation.
preprocessing (bool, optional (default=True)) – If True, apply standardization on the data.
verbose (int, optional (default=1)) – Verbosity mode: 0 = silent, 1 = progress bar, 2 = one line per epoch. For verbosity >= 1, model summary may be printed.
random_state (int, RandomState instance or None, optional (default=None)) – If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.
contamination (float in (0., 0.5), optional (default=0.1)) – The amount of contamination of the data set, i.e. the proportion of outliers in the data set. When fitting this is used to define the threshold on the decision function.
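Scoring by reconstruction error, as the class description mentions, can be sketched without a neural network by treating the reconstructions as given. In the sketch below the reconstructed rows are made-up values standing in for a trained model's output, the Euclidean distance is an assumed choice of error measure, and `reconstruction_scores` and `compression_rate` are hypothetical helpers.

```python
import math


def reconstruction_scores(originals, reconstructions):
    """Euclidean distance between each sample and its reconstruction."""
    return [
        math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_hat)))
        for x, x_hat in zip(originals, reconstructions)
    ]


def compression_rate(n_features, encoding_dim):
    """Ratio between the number of input features and the encoding width."""
    return n_features / encoding_dim


originals = [(1.0, 2.0, 3.0), (1.1, 2.1, 2.9), (9.0, -4.0, 7.0)]
# Hypothetical model output: typical rows are reproduced well, the unusual row is not.
reconstructions = [(1.0, 2.0, 3.0), (1.0, 2.0, 3.0), (1.2, 2.2, 3.1)]

scores = reconstruction_scores(originals, reconstructions)
rate = compression_rate(n_features=8, encoding_dim=2)
```

A model trained mostly on normal data tends to reproduce normal rows closely, so rows it reconstructs poorly receive the highest scores.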
-
fit
(*, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[None] Fit the model with the training data previously supplied through set_training_data.
- Returns
None
- Parameters
timeout – The maximum time, in seconds, this primitive should spend fitting during this method call.
iterations – How many internal iterations the primitive should perform.
- Return type
A CallResult with None value.
-
get_params
() → tods.detection_algorithm.PyodVAE.Params Return parameters.
- Returns
An instance of parameters.
- Return type
class Params
-
produce
(*, inputs: d3m.container.pandas.DataFrame, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[d3m.container.pandas.DataFrame] Process the testing data.
- Returns
Container DataFrame in which 1 marks outliers and 0 marks normal points.
- Parameters
inputs – Container DataFrame of shape [num_inputs, …] holding the time series data on which to run outlier detection.
timeout – The maximum time, in seconds, this primitive should take to produce outputs during this method call.
iterations – How many internal iterations the primitive should perform.
- Return type
The outputs of shape [num_inputs, …] wrapped inside CallResult.
-
set_params
(*, params: tods.detection_algorithm.PyodVAE.Params) → None Set parameters for outlier detection. :param params: class Params
- Returns
None
- Parameters
params – An instance of parameters.
-
set_training_data
(*, inputs: d3m.container.pandas.DataFrame) → None Set training data for outlier detection. :param inputs: Container DataFrame
- Returns
None
- Parameters
inputs – The inputs.
-
tods.detection_algorithm.Telemanom¶
tods.detection_algorithm.UODBasePrimitive¶
-
class
tods.detection_algorithm.UODBasePrimitive.
UnsupervisedOutlierDetectorBase
(*args, **kwds) Bases:
tods.common.TODSBasePrimitives.TODSUnsupervisedLearnerPrimitiveBase
A base class for primitives which have to be fitted before they can start producing (useful) outputs from inputs, but they are fitted only on input data.
-
clf_.decision_scores_
The outlier scores of the training data. The higher, the more abnormal. Outliers tend to have higher scores. This value is available once the detector is fitted.
- Type
numpy array of shape (n_samples,)
-
clf_.threshold_
The cutoff on decision_scores_: samples whose score is above threshold_ are treated as outliers, and samples whose score is below it as inliers.
- Type
float within (0, 1)
-
clf_.labels_
The binary labels of the training data. 0 stands for inliers and 1 for outliers/anomalies. It is generated by applying threshold_ on decision_scores_.
- Type
int, either 0 or 1
-
left_inds_
One of the mappings from decision_score to data. For point outlier detection, left_inds_ exactly equals the index of each data point. For collective outlier detection, left_inds_ equals the start index of each subsequence.
- Type
ndarray
-
right_inds_
One of the mappings from decision_score to data. For point outlier detection, right_inds_ exactly equals the index of each data point plus 1. For collective outlier detection, right_inds_ equals the ending index of each subsequence.
- Type
ndarray
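The start-index and end-index mappings described above can be illustrated with a sliding window. The sketch assumes half-open ranges, so the end index is one past the last covered point, which is consistent with "index plus 1" for point detection. Spreading a window score to its points by taking the maximum is an assumption made for this example, and `window_bounds` and `scores_to_points` are hypothetical helpers.

```python
def window_bounds(n_samples, window_size=1, step_size=1):
    """Start and (exclusive) end index of every window over n_samples points."""
    left = list(range(0, n_samples - window_size + 1, step_size))
    right = [start + window_size for start in left]
    return left, right


def scores_to_points(window_scores, left, right, n_samples):
    """Give each point the largest score among the windows that cover it."""
    point_scores = [0.0] * n_samples
    for score, start, end in zip(window_scores, left, right):
        for index in range(start, end):
            point_scores[index] = max(point_scores[index], score)
    return point_scores


# Point detection: every window holds exactly one sample.
point_left, point_right = window_bounds(5)

# Collective detection: windows of three samples, moved two samples at a time.
seq_left, seq_right = window_bounds(6, window_size=3, step_size=2)
per_point = scores_to_points([0.2, 0.9], seq_left, seq_right, n_samples=6)
```

In the collective case the last sample is not covered by any window, so it keeps the default score of 0.0; how uncovered points are handled in practice depends on the detector.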
- Parameters
contamination (float in (0., 0.5), optional (default=0.1)) – The amount of contamination of the data set, i.e. the proportion of outliers in the data set. When fitting this is used to define the threshold on the decision function.
-
get_params
() → tods.detection_algorithm.UODBasePrimitive.Params_ODBase Return parameters.
- Returns
An instance of parameters.
- Return type
class Params_ODBase
-
set_params
(*, params: tods.detection_algorithm.UODBasePrimitive.Params_ODBase) → None Set parameters for outlier detection. :param params: class Params_ODBase
- Returns
None
- Parameters
params – An instance of parameters.
-
abstract
set_training_data
(*, inputs: Inputs) → None Set training data for outlier detection. :param inputs: Container DataFrame
- Returns
None
- Parameters
inputs – The inputs.
-