tods.timeseries_processing module
tods.timeseries_processing.HoltSmoothing
-
class tods.timeseries_processing.HoltSmoothing.HoltSmoothingPrimitive(*args, **kwds)
Bases: d3m.primitive_interfaces.unsupervised_learning.UnsupervisedLearnerPrimitiveBase
Normalize samples individually to unit norm.
Each sample (i.e. each row of the data matrix) with at least one non-zero component is rescaled independently of other samples so that its norm (l1, l2, or max) equals one.
This transformer works with both dense numpy arrays and scipy.sparse matrices (use CSR format if you want to avoid the burden of a copy / conversion).
Scaling inputs to unit norms is a common operation in text classification and clustering. For instance, the dot product of two l2-normalized TF-IDF vectors is the cosine similarity of the vectors and is the base similarity metric for the Vector Space Model commonly used by the Information Retrieval community.
Read more in the User Guide.
- Parameters
norm ('l1', 'l2', or 'max', optional, 'l2' by default) – The norm to use to normalize each non-zero sample.
copy (boolean, optional, default True) – Set to False to perform in-place row normalization and avoid a copy (if the input is already a numpy array or a scipy.sparse CSR matrix).
Examples
>>> from sklearn.preprocessing import Normalizer
>>> X = [[4, 1, 2, 2],
...      [1, 3, 9, 3],
...      [5, 7, 5, 1]]
>>> transformer = Normalizer().fit(X)  # fit does nothing.
>>> transformer
Normalizer()
>>> transformer.transform(X)
array([[0.8, 0.2, 0.4, 0.4],
       [0.1, 0.3, 0.9, 0.3],
       [0.5, 0.7, 0.5, 0.1]])
Notes
This estimator is stateless (besides constructor parameters); the fit method does nothing, but it is useful when the estimator is used in a pipeline.
For a comparison of the different scalers, transformers, and normalizers, see examples/preprocessing/plot_all_scaling.py.
See also
normalize
Equivalent function without the estimator API.
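As a rough illustration of the row-wise rescaling described above, the following minimal sketch reimplements it in plain Python. The function name normalize_rows is hypothetical and is not part of sklearn or TODS; it only mirrors the documented behaviour for the 'l1', 'l2', and 'max' norms.

```python
import math


def normalize_rows(rows, norm="l2"):
    """Rescale each row to unit norm; all-zero rows are returned unchanged."""
    result = []
    for row in rows:
        if norm == "l1":
            scale = sum(abs(v) for v in row)
        elif norm == "l2":
            scale = math.sqrt(sum(v * v for v in row))
        elif norm == "max":
            scale = max(abs(v) for v in row)
        else:
            raise ValueError(f"unsupported norm: {norm!r}")
        # A zero norm means the row has no non-zero component, so skip it.
        result.append([v / scale for v in row] if scale else list(row))
    return result


X = [[4, 1, 2, 2], [1, 3, 9, 3], [5, 7, 5, 1]]
normalized = normalize_rows(X)  # default 'l2' norm
```

With the default 'l2' norm, the first row [4, 1, 2, 2] has norm 5 and becomes [0.8, 0.2, 0.4, 0.4], which agrees with the Normalizer example above.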
-
fit(*, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[None]
Fits primitive using inputs and outputs (if any) using currently set training data.
The returned value should be a CallResult object with value set to None.
If fit has already been called in the past on different training data, this method fits it again from scratch using currently set training data.
On the other hand, caller can call fit multiple times on the same training data to continue fitting.
If fit fully fits using provided training data, there is no point in making further calls to this method with same training data, and in fact further calls can be noops, or a primitive can decide to fully refit from scratch.
In the case fitting can continue with same training data (even if it is maybe not reasonable, because the internal metric the primitive is using suggests that fitting will degrade), if fit is called again (without setting training data), the primitive has to continue fitting.
Caller can provide timeout information to guide the length of the fitting process. Ideally, a primitive should adapt its fitting process to try to do the best fitting possible inside the time allocated. If this is not possible and the primitive reaches the timeout before fitting, it should raise a TimeoutError exception to signal that fitting was unsuccessful in the given time. The state of the primitive after the exception should be as if the method call had never happened, and the primitive should continue to operate normally. The purpose of timeout is to give a primitive the opportunity to cleanly manage its state instead of interrupting execution from outside. Maintaining stable internal state should have precedence over respecting the timeout (caller can terminate the misbehaving primitive from outside anyway). If a longer timeout would produce different fitting, then CallResult’s has_finished should be set to False.
Some primitives have internal fitting iterations (for example, epochs). For those, caller can provide how many of the primitive’s internal iterations the primitive should do before returning. Primitives should make iterations as small as reasonable. If iterations is None, then there is no limit on how many iterations the primitive should do, and the primitive should choose the best number of iterations on its own (potentially controlled through hyper-parameters). If iterations is a number, a primitive has to do that number of iterations (even if not reasonable), if possible. timeout should still be respected, and potentially fewer iterations can be done because of that. Primitives with internal iterations should make CallResult contain correct values.
For primitives which do not have internal iterations, any value of iterations means that they should fit fully, respecting only timeout.
- Parameters
timeout – A maximum time this primitive should be fitting during this method call, in seconds.
iterations – How many internal iterations the primitive should do.
- Returns
A CallResult with None value.
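The fit contract above can be sketched with a toy class. Everything here is a stand-in written for illustration: the CallResult dataclass only imitates the fields mentioned in the text (value, has_finished) plus an assumed iterations_done counter, and ToyIterativePrimitive is hypothetical, not a TODS or d3m class.

```python
import time
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class CallResult:
    """Stand-in for d3m's CallResult (illustration only)."""
    value: Any
    has_finished: bool = True
    iterations_done: Optional[int] = None


class ToyIterativePrimitive:
    """Hypothetical primitive whose fitting needs `total_steps` small iterations."""

    def __init__(self, total_steps: int = 5) -> None:
        self.total_steps = total_steps
        self.steps_done = 0

    def fit(self, *, timeout: Optional[float] = None,
            iterations: Optional[int] = None) -> CallResult:
        start = time.monotonic()
        budget = self.total_steps - self.steps_done  # work still left
        if iterations is not None:
            budget = min(budget, iterations)
        done_now = 0
        for _ in range(budget):
            if timeout is not None and time.monotonic() - start > timeout:
                if done_now == 0:
                    # Nothing was changed, so the state is as if fit was never called.
                    raise TimeoutError("no iteration completed within timeout")
                break
            self.steps_done += 1  # one small unit of fitting work
            done_now += 1
        finished = self.steps_done >= self.total_steps
        return CallResult(None, has_finished=finished, iterations_done=done_now)


primitive = ToyIterativePrimitive(total_steps=5)
first = primitive.fit(iterations=2)  # partial fit, has_finished is False
second = primitive.fit()             # continues on the same training data
third = primitive.fit()              # already fully fitted, so a no-op
```

Calling fit again without new training data continues from where the previous call stopped, and a call on a fully fitted toy primitive does nothing, which is one of the two behaviours the text allows.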
-
get_params() → tods.timeseries_processing.HoltSmoothing.Params
Returns parameters of this primitive.
Parameters are all parameters of the primitive which can potentially change during the lifetime of the primitive. Parameters which cannot change are passed through the constructor.
Parameters should include all data which is necessary to create a new instance of this primitive behaving exactly the same as this instance, when the new instance is created by passing the same parameters to the class constructor and calling set_params.
No other arguments to the method are allowed (except for private arguments).
- Returns
An instance of parameters.
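The round trip between get_params and set_params can be sketched as follows. ToySmoother and its Params dataclass are hypothetical and only illustrate the split described above: alpha is fixed at construction time, while level is learned state that must travel through Params.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Params:
    """Hypothetical container for the state that changes during fitting."""
    level: Optional[float]


class ToySmoother:
    def __init__(self, *, alpha: float) -> None:
        self.alpha = alpha  # constructor argument: never changes afterwards
        self.level = None   # learned state: exposed through Params

    def fit_values(self, values) -> None:
        # Simple exponential smoothing of the running level.
        for v in values:
            if self.level is None:
                self.level = v
            else:
                self.level = self.alpha * v + (1 - self.alpha) * self.level

    def get_params(self) -> Params:
        return Params(level=self.level)

    def set_params(self, *, params: Params) -> None:
        self.level = params.level


original = ToySmoother(alpha=0.5)
original.fit_values([1.0, 3.0])

# Same constructor arguments plus set_params should reproduce the instance.
clone = ToySmoother(alpha=0.5)
clone.set_params(params=original.get_params())
```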
-
produce(*, inputs: d3m.container.pandas.DataFrame, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[d3m.container.pandas.DataFrame]
Produce primitive’s best choice of the output for each of the inputs.
The output value should be wrapped inside a CallResult object before returning.
In many cases producing an output is a quick operation in comparison with fit, but not all cases are like that. For example, a primitive can start a potentially long optimization process to compute outputs. timeout and iterations can serve as a way for a caller to guide the length of this process.
Ideally, a primitive should adapt its call to try to produce the best outputs possible inside the time allocated. If this is not possible and the primitive reaches the timeout before producing outputs, it should raise a TimeoutError exception to signal that the call was unsuccessful in the given time. The state of the primitive after the exception should be as if the method call had never happened, and the primitive should continue to operate normally. The purpose of timeout is to give a primitive the opportunity to cleanly manage its state instead of interrupting execution from outside. Maintaining stable internal state should have precedence over respecting the timeout (caller can terminate the misbehaving primitive from outside anyway). If a longer timeout would produce different outputs, then CallResult’s has_finished should be set to False.
Some primitives have internal iterations (for example, optimization iterations). For those, caller can provide how many of the primitive’s internal iterations the primitive should do before returning outputs. Primitives should make iterations as small as reasonable. If iterations is None, then there is no limit on how many iterations the primitive should do, and the primitive should choose the best number of iterations on its own (potentially controlled through hyper-parameters). If iterations is a number, a primitive has to do that number of iterations, if possible. timeout should still be respected, and potentially fewer iterations can be done because of that. Primitives with internal iterations should make CallResult contain correct values.
For primitives which do not have internal iterations, any value of iterations means that they should run fully, respecting only timeout.
If primitive should have been fitted before calling this method, but it has not been, primitive should raise a PrimitiveNotFittedError exception.
- Parameters
inputs – The inputs of shape [num_inputs, …].
timeout – A maximum time this primitive should take to produce outputs during this method call, in seconds.
iterations – How many internal iterations the primitive should do.
- Returns
The outputs of shape [num_inputs, …] wrapped inside CallResult.
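The requirement that produce fails on an unfitted primitive can be sketched like this. Both classes are stand-ins written for illustration: PrimitiveNotFittedError here is a local exception that only borrows the name used in the text, and ToyMeanCenterer is hypothetical.

```python
class PrimitiveNotFittedError(Exception):
    """Stand-in for the exception named in the text (illustration only)."""


class ToyMeanCenterer:
    """Hypothetical primitive that subtracts the training mean from its inputs."""

    def __init__(self) -> None:
        self._training = None
        self._mean = None

    def set_training_data(self, *, inputs) -> None:
        self._training = list(inputs)
        self._mean = None  # new training data invalidates the previous fit

    def fit(self) -> None:
        self._mean = sum(self._training) / len(self._training)

    def produce(self, *, inputs):
        if self._mean is None:
            raise PrimitiveNotFittedError("call set_training_data and fit first")
        return [value - self._mean for value in inputs]


primitive = ToyMeanCenterer()
try:
    primitive.produce(inputs=[1.0])
    raised = False
except PrimitiveNotFittedError:
    raised = True

# The usual call order: set_training_data, then fit, then produce.
primitive.set_training_data(inputs=[1.0, 2.0, 3.0])
primitive.fit()
centered = primitive.produce(inputs=[1.0, 2.0, 3.0])
```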
-
set_params(*, params: tods.timeseries_processing.HoltSmoothing.Params) → None
Sets parameters of this primitive.
Parameters are all parameters of the primitive which can potentially change during the lifetime of the primitive. Parameters which cannot change are passed through the constructor.
No other arguments to the method are allowed (except for private arguments).
- Parameters
params – An instance of parameters.
-
set_training_data(*, inputs: d3m.container.pandas.DataFrame) → None
Sets training data of this primitive.
- Parameters
inputs – The inputs.
tods.timeseries_processing.HoltWintersExponentialSmoothing
-
class tods.timeseries_processing.HoltWintersExponentialSmoothing.HoltWintersExponentialSmoothingPrimitive(*args, **kwds)
Bases: d3m.primitive_interfaces.unsupervised_learning.UnsupervisedLearnerPrimitiveBase
Normalize samples individually to unit norm.
Each sample (i.e. each row of the data matrix) with at least one non-zero component is rescaled independently of other samples so that its norm (l1, l2, or max) equals one.
This transformer works with both dense numpy arrays and scipy.sparse matrices (use CSR format if you want to avoid the burden of a copy / conversion).
Scaling inputs to unit norms is a common operation in text classification and clustering. For instance, the dot product of two l2-normalized TF-IDF vectors is the cosine similarity of the vectors and is the base similarity metric for the Vector Space Model commonly used by the Information Retrieval community.
Read more in the User Guide.
- Parameters
norm ('l1', 'l2', or 'max', optional, 'l2' by default) – The norm to use to normalize each non-zero sample.
copy (boolean, optional, default True) – Set to False to perform in-place row normalization and avoid a copy (if the input is already a numpy array or a scipy.sparse CSR matrix).
Examples
>>> from sklearn.preprocessing import Normalizer
>>> X = [[4, 1, 2, 2],
...      [1, 3, 9, 3],
...      [5, 7, 5, 1]]
>>> transformer = Normalizer().fit(X)  # fit does nothing.
>>> transformer
Normalizer()
>>> transformer.transform(X)
array([[0.8, 0.2, 0.4, 0.4],
       [0.1, 0.3, 0.9, 0.3],
       [0.5, 0.7, 0.5, 0.1]])
Notes
This estimator is stateless (besides constructor parameters); the fit method does nothing, but it is useful when the estimator is used in a pipeline.
For a comparison of the different scalers, transformers, and normalizers, see examples/preprocessing/plot_all_scaling.py.
See also
normalize
Equivalent function without the estimator API.
-
fit(*, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[None]
Fits primitive using inputs and outputs (if any) using currently set training data.
The returned value should be a CallResult object with value set to None.
If fit has already been called in the past on different training data, this method fits it again from scratch using currently set training data.
On the other hand, caller can call fit multiple times on the same training data to continue fitting.
If fit fully fits using provided training data, there is no point in making further calls to this method with same training data, and in fact further calls can be noops, or a primitive can decide to fully refit from scratch.
In the case fitting can continue with same training data (even if it is maybe not reasonable, because the internal metric the primitive is using suggests that fitting will degrade), if fit is called again (without setting training data), the primitive has to continue fitting.
Caller can provide timeout information to guide the length of the fitting process. Ideally, a primitive should adapt its fitting process to try to do the best fitting possible inside the time allocated. If this is not possible and the primitive reaches the timeout before fitting, it should raise a TimeoutError exception to signal that fitting was unsuccessful in the given time. The state of the primitive after the exception should be as if the method call had never happened, and the primitive should continue to operate normally. The purpose of timeout is to give a primitive the opportunity to cleanly manage its state instead of interrupting execution from outside. Maintaining stable internal state should have precedence over respecting the timeout (caller can terminate the misbehaving primitive from outside anyway). If a longer timeout would produce different fitting, then CallResult’s has_finished should be set to False.
Some primitives have internal fitting iterations (for example, epochs). For those, caller can provide how many of the primitive’s internal iterations the primitive should do before returning. Primitives should make iterations as small as reasonable. If iterations is None, then there is no limit on how many iterations the primitive should do, and the primitive should choose the best number of iterations on its own (potentially controlled through hyper-parameters). If iterations is a number, a primitive has to do that number of iterations (even if not reasonable), if possible. timeout should still be respected, and potentially fewer iterations can be done because of that. Primitives with internal iterations should make CallResult contain correct values.
For primitives which do not have internal iterations, any value of iterations means that they should fit fully, respecting only timeout.
- Parameters
timeout – A maximum time this primitive should be fitting during this method call, in seconds.
iterations – How many internal iterations the primitive should do.
- Returns
A CallResult with None value.
-
get_params() → tods.timeseries_processing.HoltWintersExponentialSmoothing.Params
Returns parameters of this primitive.
Parameters are all parameters of the primitive which can potentially change during the lifetime of the primitive. Parameters which cannot change are passed through the constructor.
Parameters should include all data which is necessary to create a new instance of this primitive behaving exactly the same as this instance, when the new instance is created by passing the same parameters to the class constructor and calling set_params.
No other arguments to the method are allowed (except for private arguments).
- Returns
An instance of parameters.
-
produce(*, inputs: d3m.container.pandas.DataFrame, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[d3m.container.pandas.DataFrame]
Produce primitive’s best choice of the output for each of the inputs.
The output value should be wrapped inside a CallResult object before returning.
In many cases producing an output is a quick operation in comparison with fit, but not all cases are like that. For example, a primitive can start a potentially long optimization process to compute outputs. timeout and iterations can serve as a way for a caller to guide the length of this process.
Ideally, a primitive should adapt its call to try to produce the best outputs possible inside the time allocated. If this is not possible and the primitive reaches the timeout before producing outputs, it should raise a TimeoutError exception to signal that the call was unsuccessful in the given time. The state of the primitive after the exception should be as if the method call had never happened, and the primitive should continue to operate normally. The purpose of timeout is to give a primitive the opportunity to cleanly manage its state instead of interrupting execution from outside. Maintaining stable internal state should have precedence over respecting the timeout (caller can terminate the misbehaving primitive from outside anyway). If a longer timeout would produce different outputs, then CallResult’s has_finished should be set to False.
Some primitives have internal iterations (for example, optimization iterations). For those, caller can provide how many of the primitive’s internal iterations the primitive should do before returning outputs. Primitives should make iterations as small as reasonable. If iterations is None, then there is no limit on how many iterations the primitive should do, and the primitive should choose the best number of iterations on its own (potentially controlled through hyper-parameters). If iterations is a number, a primitive has to do that number of iterations, if possible. timeout should still be respected, and potentially fewer iterations can be done because of that. Primitives with internal iterations should make CallResult contain correct values.
For primitives which do not have internal iterations, any value of iterations means that they should run fully, respecting only timeout.
If primitive should have been fitted before calling this method, but it has not been, primitive should raise a PrimitiveNotFittedError exception.
- Parameters
inputs – The inputs of shape [num_inputs, …].
timeout – A maximum time this primitive should take to produce outputs during this method call, in seconds.
iterations – How many internal iterations the primitive should do.
- Returns
The outputs of shape [num_inputs, …] wrapped inside CallResult.
-
set_params(*, params: tods.timeseries_processing.HoltWintersExponentialSmoothing.Params) → None
Sets parameters of this primitive.
Parameters are all parameters of the primitive which can potentially change during the lifetime of the primitive. Parameters which cannot change are passed through the constructor.
No other arguments to the method are allowed (except for private arguments).
- Parameters
params – An instance of parameters.
-
set_training_data(*, inputs: d3m.container.pandas.DataFrame) → None
Sets training data of this primitive.
- Parameters
inputs – The inputs.
tods.timeseries_processing.MovingAverageTransform
tods.timeseries_processing.SKAxiswiseScaler
-
class tods.timeseries_processing.SKAxiswiseScaler.SKAxiswiseScalerPrimitive(*args, **kwds)
Bases: d3m.primitive_interfaces.transformer.TransformerPrimitiveBase
Standardize a dataset along any axis: center to the mean and scale component-wise to unit variance. See the sklearn documentation for more details.
-
metadata
Primitive’s metadata. Available as a class attribute.
-
logger
Primitive’s logger. Available as a class attribute.
-
hyperparams
Hyperparams passed to the constructor.
-
random_seed
Random seed passed to the constructor.
-
docker_containers
A dict mapping Docker image keys from primitive’s metadata to (named) tuples containing container’s address under which the container is accessible by the primitive, and a dict mapping exposed ports to ports on that address.
-
volumes
A dict mapping volume keys from primitive’s metadata to file and directory paths where downloaded and extracted files are available to the primitive.
-
temporary_directory
An absolute path to a temporary directory a primitive can use to store any files for the duration of the current pipeline run phase. Directory is automatically cleaned up after the current pipeline run phase finishes.
- Parameters
axis (int, 0 by default) – Axis along which the means and standard deviations are computed. If 0, standardize each feature independently; otherwise (if 1), standardize each sample.
with_mean (boolean, True by default) – If True, center the data before scaling.
with_std (boolean, True by default) – If True, scale the data to unit variance (or equivalently, unit standard deviation).
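To make the effect of axis concrete, here is a minimal plain-Python sketch of axis-wise standardization. The helper names standardize and scale_axiswise are hypothetical, and the sketch assumes the population standard deviation (dividing by n); it is not the primitive's actual implementation.

```python
import math


def standardize(values, with_mean=True, with_std=True):
    """Center and/or scale one vector; constant vectors are only centered."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    out = [v - mean for v in values] if with_mean else list(values)
    if with_std and std > 0:
        out = [v / std for v in out]
    return out


def scale_axiswise(rows, axis=0, with_mean=True, with_std=True):
    """axis=0 standardizes each column (feature); axis=1 each row (sample)."""
    if axis == 0:
        columns = [standardize(list(col), with_mean, with_std) for col in zip(*rows)]
        return [list(row) for row in zip(*columns)]  # transpose back to rows
    return [standardize(list(row), with_mean, with_std) for row in rows]


data = [[1.0, 10.0], [3.0, 30.0]]
by_feature = scale_axiswise(data, axis=0)  # each column gets mean 0, variance 1
by_sample = scale_axiswise(data, axis=1)   # each row gets mean 0, variance 1
```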
-
produce(*, inputs: d3m.container.pandas.DataFrame, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[d3m.container.pandas.DataFrame]
Process the testing data.
- Parameters
inputs – Container DataFrame with the time series data to scale, of shape [num_inputs, …].
timeout – A maximum time this primitive should take to produce outputs during this method call, in seconds.
iterations – How many internal iterations the primitive should do.
- Returns
Container DataFrame after scaling: the outputs of shape [num_inputs, …] wrapped inside CallResult.
-
tods.timeseries_processing.SKPowerTransformer
-
class tods.timeseries_processing.SKPowerTransformer.SKPowerTransformerPrimitive(*args, **kwds)
Bases: d3m.primitive_interfaces.unsupervised_learning.UnsupervisedLearnerPrimitiveBase
PowerTransformer primitive using sklearn to make data more Gaussian-like. See sklearn documentation for more details.
-
lambda_
The parameters of the power transformation for the selected features.
- Type
numpy array of float, shape (n_features,)
- Parameters
method (str, 'yeo-johnson' or 'box-cox') – Power transform algorithm to use.
standardize (bool) – Set to True to apply zero-mean, unit-variance normalization to the transformed output.
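For reference, the Yeo-Johnson transform selected by method='yeo-johnson' can be written out for a single value as below. This is a sketch of the standard formula with a given lambda; the function name yeo_johnson is hypothetical, and estimating lambda_ from training data (which the fitted primitive does) is not shown.

```python
import math


def yeo_johnson(x: float, lmbda: float) -> float:
    """Apply the Yeo-Johnson power transform to one value for a fixed lambda."""
    if x >= 0:
        if abs(lmbda) < 1e-12:
            return math.log1p(x)  # limit of the power form as lambda -> 0
        return ((x + 1.0) ** lmbda - 1.0) / lmbda
    if abs(lmbda - 2.0) < 1e-12:
        return -math.log1p(-x)  # limit of the power form as lambda -> 2
    return -(((1.0 - x) ** (2.0 - lmbda)) - 1.0) / (2.0 - lmbda)


# lambda = 1 leaves the data unchanged; other values bend the distribution.
identity = [yeo_johnson(v, 1.0) for v in (-2.0, 0.0, 3.0)]
squared = yeo_johnson(3.0, 2.0)  # ((3 + 1)^2 - 1) / 2
```

Unlike Box-Cox, which requires strictly positive inputs, this transform is defined for negative values as well, which is why it has two branches.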
-
fit(*, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[None]
Fit model with training data. The training data is the Container DataFrame of time series set beforehand with set_training_data.
- Parameters
timeout – A maximum time this primitive should be fitting during this method call, in seconds.
iterations – How many internal iterations the primitive should do.
- Returns
A CallResult with None value.
-
get_params() → tods.timeseries_processing.SKPowerTransformer.Params
Return parameters.
- Returns
An instance of the Params class.
-
produce(*, inputs: d3m.container.pandas.DataFrame, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[d3m.container.pandas.DataFrame]
Process the testing data.
- Parameters
inputs – Container DataFrame with the time series data to power-transform, of shape [num_inputs, …].
timeout – A maximum time this primitive should take to produce outputs during this method call, in seconds.
iterations – How many internal iterations the primitive should do.
- Returns
Container DataFrame after the power transformation: the outputs of shape [num_inputs, …] wrapped inside CallResult.
-
set_params(*, params: tods.timeseries_processing.SKPowerTransformer.Params) → None
Set parameters for PowerTransformer.
- Parameters
params – An instance of the Params class.
- Returns
None
-
set_training_data(*, inputs: d3m.container.pandas.DataFrame) → None
Set training data for PowerTransformer.
- Parameters
inputs – Container DataFrame with the training inputs.
- Returns
None
-
tods.timeseries_processing.SKQuantileTransformer
-
class tods.timeseries_processing.SKQuantileTransformer.SKQuantileTransformerPrimitive(*args, **kwds)
Bases: d3m.primitive_interfaces.unsupervised_learning.UnsupervisedLearnerPrimitiveBase
Transform features using quantiles information.
This method transforms the features to follow a uniform or a normal distribution. Therefore, for a given feature, this transformation tends to spread out the most frequent values. It also reduces the impact of (marginal) outliers: this is therefore a robust preprocessing scheme.
The transformation is applied on each feature independently. First an estimate of the cumulative distribution function of a feature is used to map the original values to a uniform distribution. The obtained values are then mapped to the desired output distribution using the associated quantile function. Features values of new/unseen data that fall below or above the fitted range will be mapped to the bounds of the output distribution. Note that this transform is non-linear. It may distort linear correlations between variables measured at the same scale but renders variables measured at different scales more directly comparable.
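The mapping to a uniform distribution described above can be sketched for a single feature in plain Python. The helper names fit_quantiles and to_uniform are hypothetical, and the interpolation is simplified compared with sklearn (for example, ties are not treated specially); it is meant only to show the idea of storing reference quantiles and interpolating between them.

```python
import math


def fit_quantiles(values, n_quantiles=1000):
    """Return (references, quantiles) for one feature of the training data."""
    data = sorted(values)
    # More quantiles than samples gives no better estimate, so cap the count.
    n_quantiles = max(2, min(n_quantiles, len(data)))
    references = [i / (n_quantiles - 1) for i in range(n_quantiles)]
    quantiles = []
    for q in references:
        pos = q * (len(data) - 1)
        lo = int(math.floor(pos))
        hi = min(lo + 1, len(data) - 1)
        quantiles.append(data[lo] + (pos - lo) * (data[hi] - data[lo]))
    return references, quantiles


def to_uniform(x, references, quantiles):
    """Map x to [0, 1]; values outside the fitted range go to the bounds."""
    if x <= quantiles[0]:
        return references[0]
    if x >= quantiles[-1]:
        return references[-1]
    for i in range(1, len(quantiles)):
        if x <= quantiles[i]:
            span = quantiles[i] - quantiles[i - 1]
            frac = (x - quantiles[i - 1]) / span if span else 0.0
            return references[i - 1] + frac * (references[i] - references[i - 1])
    return references[-1]


references, quantiles = fit_quantiles([0.0, 10.0, 20.0, 30.0, 40.0])
middle = to_uniform(20.0, references, quantiles)
below_range = to_uniform(-100.0, references, quantiles)
```

Mapping the uniform values through the quantile function of a normal distribution would give the 'normal' output distribution; that second step is omitted here.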
Read more in the User Guide.
New in version 0.19.
- Parameters
n_quantiles (int, optional, default=1000 or n_samples) – Number of quantiles to be computed. It corresponds to the number of landmarks used to discretize the cumulative distribution function. If n_quantiles is larger than the number of samples, n_quantiles is set to the number of samples, as a larger number of quantiles does not give a better approximation of the cumulative distribution function estimator.
output_distribution (str, optional, default='uniform') – Marginal distribution for the transformed data. The choices are 'uniform' (default) or 'normal'.
ignore_implicit_zeros (bool, optional, default=False) – Only applies to sparse matrices. If True, the sparse entries of the matrix are discarded to compute the quantile statistics. If False, these entries are treated as zeros.
subsample (int, optional, default=1e5) – Maximum number of samples used to estimate the quantiles for computational efficiency. Note that the subsampling procedure may differ for value-identical sparse and dense matrices.
random_state (int, RandomState instance or None, optional, default=None) – If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random. Note that this is used by subsampling and smoothing noise.
copy (boolean, optional, default=True) – Set to False to perform in-place transformation and avoid a copy (if the input is already a numpy array).
-
n_quantiles_
The actual number of quantiles used to discretize the cumulative distribution function.
- Type
integer
-
quantiles_
The values corresponding to the quantiles of reference.
- Type
ndarray, shape (n_quantiles, n_features)
-
references_
Quantiles of references.
- Type
ndarray, shape (n_quantiles,)
Examples
>>> import numpy as np
>>> from sklearn.preprocessing import QuantileTransformer
>>> rng = np.random.RandomState(0)
>>> X = np.sort(rng.normal(loc=0.5, scale=0.25, size=(25, 1)), axis=0)
>>> qt = QuantileTransformer(n_quantiles=10, random_state=0)
>>> qt.fit_transform(X)
array([...])
See also
quantile_transform
Equivalent function without the estimator API.
PowerTransformer
Perform mapping to a normal distribution using a power transform.
StandardScaler
Perform standardization that is faster, but less robust to outliers.
RobustScaler
Perform robust standardization that removes the influence of outliers but does not put outliers and inliers on the same scale.
Notes
NaNs are treated as missing values: disregarded in fit, and maintained in transform.
For a comparison of the different scalers, transformers, and normalizers, see examples/preprocessing/plot_all_scaling.py.
-
fit(*, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[None]
Fit model with training data. The training data is the Container DataFrame of time series set beforehand with set_training_data.
- Parameters
timeout – A maximum time this primitive should be fitting during this method call, in seconds.
iterations – How many internal iterations the primitive should do.
- Returns
A CallResult with None value.
-
get_params() → tods.timeseries_processing.SKQuantileTransformer.Params
Return parameters.
- Returns
An instance of the Params class.
-
produce(*, inputs: d3m.container.pandas.DataFrame, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[d3m.container.pandas.DataFrame]
Process the testing data.
- Parameters
inputs – Container DataFrame with the time series data to quantile-transform, of shape [num_inputs, …].
timeout – A maximum time this primitive should take to produce outputs during this method call, in seconds.
iterations – How many internal iterations the primitive should do.
- Returns
Container DataFrame after the quantile transformation: the outputs of shape [num_inputs, …] wrapped inside CallResult.
-
set_params(*, params: tods.timeseries_processing.SKQuantileTransformer.Params) → None
Set parameters for QuantileTransformer.
- Parameters
params – An instance of the Params class.
- Returns
None
-
set_training_data(*, inputs: d3m.container.pandas.DataFrame) → None
Set training data for QuantileTransformer.
- Parameters
inputs – Container DataFrame with the training inputs.
- Returns
None
tods.timeseries_processing.SKStandardScaler
-
class tods.timeseries_processing.SKStandardScaler.SKStandardScalerPrimitive(*args, **kwds)
Bases: d3m.primitive_interfaces.unsupervised_learning.UnsupervisedLearnerPrimitiveBase
Standardize features by removing the mean and scaling to unit variance. See sklearn documentation for more details.
-
scale_
Per-feature relative scaling of the data. This is calculated using np.sqrt(var_). Equal to None when with_std=False.
- Type
ndarray or None, shape (n_features,)
-
mean_
The mean value for each feature in the training set. Equal to None when with_mean=False.
- Type
ndarray or None, shape (n_features,)
-
var_
The variance for each feature in the training set. Used to compute scale_. Equal to None when with_std=False.
- Type
ndarray or None, shape (n_features,)
-
n_samples_seen_
The number of samples processed by the estimator for each feature. If there are no missing samples, n_samples_seen_ will be an integer; otherwise it will be an array. Will be reset on new calls to fit, but increments across partial_fit calls.
- Type
int or array, shape (n_features,)
- Parameters
with_mean (bool) – If True, center the data before scaling. This does not work (and will raise an exception) when attempted on sparse matrices, because centering them entails building a dense matrix which in common use cases is likely to be too large to fit in memory.
with_std (bool) – If True, scale the data to unit variance (or equivalently, unit standard deviation).
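The relationship between the fitted attributes above can be sketched in plain Python: statistics are learned once from the training rows and then applied to new data. The function names are hypothetical, and treating a zero variance as a scale of 1.0 is an assumption made here to avoid division by zero.

```python
import math


def fit_standard_scaler(rows):
    """Compute per-feature mean_, var_, and scale_ from the training rows."""
    n = len(rows)
    columns = list(zip(*rows))
    mean_ = [sum(col) / n for col in columns]
    var_ = [sum((v - m) ** 2 for v in col) / n for col, m in zip(columns, mean_)]
    # scale_ is sqrt(var_); constant features keep a scale of 1.0 (assumption).
    scale_ = [math.sqrt(v) if v > 0 else 1.0 for v in var_]
    return mean_, var_, scale_


def transform(rows, mean_, scale_):
    """Center and scale new rows with statistics learned during fitting."""
    return [[(v - m) / s for v, m, s in zip(row, mean_, scale_)] for row in rows]


train = [[0.0, 5.0], [2.0, 5.0], [4.0, 5.0]]
mean_, var_, scale_ = fit_standard_scaler(train)
scaled_new = transform([[2.0, 7.0]], mean_, scale_)
```

Because the statistics come only from the training rows, the second feature of the new row is shifted by the training mean of 5 but not rescaled.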
-
fit(*, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[None]
Fit model with training data. The training data is the Container DataFrame of time series set beforehand with set_training_data.
- Parameters
timeout – A maximum time this primitive should be fitting during this method call, in seconds.
iterations – How many internal iterations the primitive should do.
- Returns
A CallResult with None value.
-
get_params() → tods.timeseries_processing.SKStandardScaler.Params
Return parameters.
- Returns
An instance of the Params class.
-
produce(*, inputs: d3m.container.pandas.DataFrame, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[d3m.container.pandas.DataFrame]
Process the testing data.
- Parameters
inputs – Container DataFrame with the time series data to standardize, of shape [num_inputs, …].
timeout – A maximum time this primitive should take to produce outputs during this method call, in seconds.
iterations – How many internal iterations the primitive should do.
- Returns
Container DataFrame after standardization: the outputs of shape [num_inputs, …] wrapped inside CallResult.
-
set_params(*, params: tods.timeseries_processing.SKStandardScaler.Params) → None
Set parameters for Standardizer.
- Parameters
params – An instance of the Params class.
- Returns
None
-
set_training_data(*, inputs: d3m.container.pandas.DataFrame) → None
Set training data for Standardizer.
- Parameters
inputs – Container DataFrame with the training inputs.
- Returns
None
-
tods.timeseries_processing.SimpleExponentialSmoothing
-
class tods.timeseries_processing.SimpleExponentialSmoothing.SimpleExponentialSmoothingPrimitive(*args, **kwds)
Bases: d3m.primitive_interfaces.unsupervised_learning.UnsupervisedLearnerPrimitiveBase
Normalize samples individually to unit norm.
Each sample (i.e. each row of the data matrix) with at least one non zero component is rescaled independently of other samples so that its norm (l1 or l2) equals one.
This transformer is able to work both with dense numpy arrays and scipy.sparse matrix (use CSR format if you want to avoid the burden of a copy / conversion).
Scaling inputs to unit norms is a common operation in text classification or clustering. For instance, the dot product of two l2-normalized TF-IDF vectors is the cosine similarity of the vectors and is the base similarity metric for the Vector Space Model commonly used by the Information Retrieval community.
Read more in the User Guide.
- Parameters
norm ('l1', 'l2', or 'max', optional, 'l2' by default) – The norm to use to normalize each non-zero sample.
copy (boolean, optional, default True) – Set to False to perform in-place row normalization and avoid a copy (if the input is already a numpy array or a scipy.sparse CSR matrix).
Examples
>>> from sklearn.preprocessing import Normalizer
>>> X = [[4, 1, 2, 2],
...      [1, 3, 9, 3],
...      [5, 7, 5, 1]]
>>> transformer = Normalizer().fit(X)  # fit does nothing.
>>> transformer
Normalizer()
>>> transformer.transform(X)
array([[0.8, 0.2, 0.4, 0.4],
       [0.1, 0.3, 0.9, 0.3],
       [0.5, 0.7, 0.5, 0.1]])
Notes
This estimator is stateless (besides constructor parameters); the fit method does nothing but is useful when used in a pipeline.
For a comparison of the different scalers, transformers, and normalizers, see examples/preprocessing/plot_all_scaling.py.
See also
normalize
Equivalent function without the estimator API.
-
fit
(*, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[None] Fits primitive using inputs and outputs (if any) using currently set training data.
The returned value should be a CallResult object with value set to None.
If fit has already been called in the past on different training data, this method fits it again from scratch using currently set training data.
On the other hand, caller can call fit multiple times on the same training data to continue fitting.
If fit fully fits using provided training data, there is no point in making further calls to this method with the same training data, and in fact further calls can be no-ops, or a primitive can decide to fully refit from scratch.
In the case fitting can continue with the same training data (even if it is maybe not reasonable, because the internal metric the primitive is using suggests that fitting will degrade), if fit is called again (without setting training data), the primitive has to continue fitting.
Caller can provide timeout information to guide the length of the fitting process. Ideally, a primitive should adapt its fitting process to try to do the best fitting possible inside the time allocated. If this is not possible and the primitive reaches the timeout before fitting, it should raise a TimeoutError exception to signal that fitting was unsuccessful in the given time. The state of the primitive after the exception should be as if the method call had never happened, and the primitive should continue to operate normally. The purpose of timeout is to give opportunity to a primitive to cleanly manage its state instead of interrupting execution from outside. Maintaining stable internal state should have precedence over respecting the timeout (caller can terminate the misbehaving primitive from outside anyway). If a longer timeout would produce different fitting, then CallResult’s has_finished should be set to False.
Some primitives have internal fitting iterations (for example, epochs). For those, caller can provide how many of primitive’s internal iterations a primitive should do before returning. Primitives should make iterations as small as reasonable. If iterations is None, then there is no limit on how many iterations the primitive should do and primitive should choose the best number of iterations on its own (potentially controlled through hyper-parameters). If iterations is a number, a primitive has to do that number of iterations (even if not reasonable), if possible. timeout should still be respected and potentially fewer iterations can be done because of that. Primitives with internal iterations should make CallResult contain correct values.
For primitives which do not have internal iterations, any value of iterations means that they should fit fully, respecting only timeout.
- Parameters
timeout – The maximum time, in seconds, this primitive should spend fitting during this method call.
iterations – How many internal iterations the primitive should do.
- Returns
A CallResult with None value.
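The interplay of `timeout`, `iterations`, and `has_finished` described for `fit` can be sketched with a toy iterative learner. This is an illustration only: `ToyCallResult` and `ToyIterativeLearner` are hypothetical stand-ins written for this example, not the d3m classes, and the "model" is just a running estimate of the mean.

```python
import time
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToyCallResult:
    """Hypothetical stand-in for d3m's CallResult; not the real class."""
    value: None
    has_finished: bool
    iterations_done: Optional[int]


class ToyIterativeLearner:
    """Toy learner that moves an estimate halfway to the data mean per iteration."""

    def __init__(self, max_iterations: int = 100) -> None:
        self.max_iterations = max_iterations
        self.data: List[float] = []
        self.estimate = 0.0
        self.iterations_total = 0

    def set_training_data(self, *, inputs: List[float]) -> None:
        # New training data resets the state, so the next fit starts from scratch.
        self.data = list(inputs)
        self.estimate = 0.0
        self.iterations_total = 0

    def fit(self, *, timeout: Optional[float] = None,
            iterations: Optional[int] = None) -> ToyCallResult:
        if not self.data:
            raise ValueError("training data has not been set")
        start = time.monotonic()
        target = sum(self.data) / len(self.data)
        # iterations=None: the learner picks its own budget (what is left of max_iterations).
        budget = iterations if iterations is not None else self.max_iterations - self.iterations_total
        done = 0
        while done < budget:
            if timeout is not None and time.monotonic() - start > timeout:
                break  # stop cleanly; the state stays consistent
            self.estimate += 0.5 * (target - self.estimate)
            done += 1
        self.iterations_total += done
        # has_finished is False while more fitting could still change the result.
        finished = self.iterations_total >= self.max_iterations
        return ToyCallResult(value=None, has_finished=finished, iterations_done=done)


learner = ToyIterativeLearner(max_iterations=10)
learner.set_training_data(inputs=[2.0, 4.0])
first = learner.fit(iterations=3)  # partial fit: three iterations only
second = learner.fit()             # same training data: continues fitting
```

Calling `fit` a second time without resetting the training data continues from the current state, which is the "continue fitting" behavior the text describes.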
-
get_params
() → tods.timeseries_processing.SimpleExponentialSmoothing.Params Returns parameters of this primitive.
Parameters are all parameters of the primitive which can potentially change during a life-time of a primitive. Parameters which cannot are passed through constructor.
Parameters should include all data which is necessary to create a new instance of this primitive behaving exactly the same as this instance, when the new instance is created by passing the same parameters to the class constructor and calling set_params.
No other arguments to the method are allowed (except for private arguments).
- Returns
An instance of parameters.
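The distinction between constructor arguments and parameters returned by `get_params` can be sketched as a round trip between two instances. The classes `ToyParams` and `ToySmoother` are hypothetical and exist only for this illustration; they are not the TODS `Params` class or primitive.

```python
from typing import NamedTuple


class ToyParams(NamedTuple):
    """Hypothetical parameter container: state that changes during fitting."""
    level: float


class ToySmoother:
    def __init__(self, *, alpha: float) -> None:
        self.alpha = alpha  # fixed for the instance lifetime, passed through the constructor
        self.level = 0.0    # changes while fitting, so it belongs in the parameters

    def get_params(self) -> ToyParams:
        return ToyParams(level=self.level)

    def set_params(self, *, params: ToyParams) -> None:
        self.level = params.level


original = ToySmoother(alpha=0.3)
original.level = 12.5  # pretend fitting produced this state

# Same constructor arguments plus set_params yields an equivalent instance.
clone = ToySmoother(alpha=0.3)
clone.set_params(params=original.get_params())
```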
-
produce
(*, inputs: d3m.container.pandas.DataFrame, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[d3m.container.pandas.DataFrame] Produce primitive’s best choice of the output for each of the inputs.
The output value should be wrapped inside a CallResult object before returning.
In many cases producing an output is a quick operation in comparison with fit, but not all cases are like that. For example, a primitive can start a potentially long optimization process to compute outputs. timeout and iterations can serve as a way for a caller to guide the length of this process.
Ideally, a primitive should adapt its call to try to produce the best outputs possible inside the time allocated. If this is not possible and the primitive reaches the timeout before producing outputs, it should raise a TimeoutError exception to signal that the call was unsuccessful in the given time. The state of the primitive after the exception should be as if the method call had never happened, and the primitive should continue to operate normally. The purpose of timeout is to give opportunity to a primitive to cleanly manage its state instead of interrupting execution from outside. Maintaining stable internal state should have precedence over respecting the timeout (caller can terminate the misbehaving primitive from outside anyway). If a longer timeout would produce different outputs, then CallResult’s has_finished should be set to False.
Some primitives have internal iterations (for example, optimization iterations). For those, caller can provide how many of primitive’s internal iterations a primitive should do before returning outputs. Primitives should make iterations as small as reasonable. If iterations is None, then there is no limit on how many iterations the primitive should do and primitive should choose the best number of iterations on its own (potentially controlled through hyper-parameters). If iterations is a number, a primitive has to do that number of iterations, if possible. timeout should still be respected and potentially fewer iterations can be done because of that. Primitives with internal iterations should make CallResult contain correct values.
For primitives which do not have internal iterations, any value of iterations means that they should run fully, respecting only timeout.
If primitive should have been fitted before calling this method, but it has not been, primitive should raise a PrimitiveNotFittedError exception.
- Parameters
inputs – The inputs of shape [num_inputs, …].
timeout – The maximum time, in seconds, this primitive should take to produce outputs during this method call.
iterations – How many internal iterations the primitive should do.
- Returns
The outputs of shape [num_inputs, …] wrapped inside CallResult.
-
set_params
(*, params: tods.timeseries_processing.SimpleExponentialSmoothing.Params) → None Sets parameters of this primitive.
Parameters are all parameters of the primitive which can potentially change during a life-time of a primitive. Parameters which cannot are passed through constructor.
No other arguments to the method are allowed (except for private arguments).
- Parameters
params – An instance of parameters.
-
set_training_data
(*, inputs: d3m.container.pandas.DataFrame) → None Sets training data of this primitive.
- Parameters
inputs – The inputs.
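For readers unfamiliar with the technique this primitive is named after, the textbook simple exponential smoothing recursion is s_0 = x_0 and s_t = alpha * x_t + (1 - alpha) * s_(t-1). The sketch below implements that recursion in plain pandas. It is not the TODS implementation: the function name `simple_exponential_smoothing` and the choice of initial level are assumptions made for illustration.

```python
import pandas as pd


def simple_exponential_smoothing(series: pd.Series, alpha: float) -> pd.Series:
    """Textbook recursion: s_0 = x_0, s_t = alpha * x_t + (1 - alpha) * s_(t-1)."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = []
    level = None
    for x in series:
        # The first observation initializes the level (an assumption of this sketch).
        level = x if level is None else alpha * x + (1.0 - alpha) * level
        smoothed.append(level)
    return pd.Series(smoothed, index=series.index)


values = pd.Series([10.0, 12.0, 11.0, 15.0])
smoothed = simple_exponential_smoothing(values, alpha=0.5)

# pandas computes the same recursion when adjust=False.
reference = values.ewm(alpha=0.5, adjust=False).mean()
```

A larger `alpha` tracks recent observations more closely, while a smaller one yields a smoother, slower-reacting series.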
tods.timeseries_processing.TimeSeriesSeasonalityTrendDecomposition¶
-
class
tods.timeseries_processing.TimeSeriesSeasonalityTrendDecomposition.
TimeSeriesSeasonalityTrendDecompositionPrimitive
(*args, **kwds) Bases:
d3m.primitive_interfaces.transformer.TransformerPrimitiveBase
A primitive to decompose time series into trend, seasonality, and residual components. Decomposition is based on the period (frequency) passed as a hyperparameter. The columns to decompose are also passed as a hyperparameter; the default is all value columns.
-
metadata
Primitive’s metadata. Available as a class attribute.
-
logger
Primitive’s logger. Available as a class attribute.
-
hyperparams
Hyperparams passed to the constructor.
-
random_seed
Random seed passed to the constructor.
-
docker_containers
A dict mapping Docker image keys from primitive’s metadata to (named) tuples containing container’s address under which the container is accessible by the primitive, and a dict mapping exposed ports to ports on that address.
-
volumes
A dict mapping volume keys from primitive’s metadata to file and directory paths where downloaded and extracted files are available to the primitive.
-
temporary_directory
An absolute path to a temporary directory a primitive can use to store any files for the duration of the current pipeline run phase. Directory is automatically cleaned up after the current pipeline run phase finishes.
-
produce
(*, inputs: d3m.container.pandas.DataFrame, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[d3m.container.pandas.DataFrame]
- Parameters
inputs – Container DataFrame of shape [num_inputs, …].
timeout – The maximum time, in seconds, this primitive should take to produce outputs during this method call.
iterations – How many internal iterations the primitive should do.
- Returns
Container DataFrame containing the decomposed time series, of shape [num_inputs, …], wrapped inside CallResult.
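A decomposition into trend, seasonality, and residual driven by a period can be sketched with a classical additive model in pandas. This is not the primitive's implementation: the function `decompose_additive` is hypothetical, the additive model is an assumption, and the centered rolling mean is a simplification (for an even period, a classical decomposition would use a 2×period moving average instead).

```python
import numpy as np
import pandas as pd


def decompose_additive(series: pd.Series, period: int) -> pd.DataFrame:
    """Split a series so that trend + seasonal + residual equals the input."""
    # Trend: centered moving average over one period; min_periods=1 avoids NaN at the edges.
    trend = series.rolling(window=period, center=True, min_periods=1).mean()
    detrended = series - trend
    # Seasonality: average detrended value at each position within the period.
    position = np.arange(len(series)) % period
    seasonal = detrended.groupby(position).transform("mean")
    # Residual: whatever trend and seasonality do not explain.
    residual = series - trend - seasonal
    return pd.DataFrame({"trend": trend, "seasonal": seasonal, "residual": residual})


series = pd.Series([1.0, 2.0, 3.0, 1.0, 2.0, 3.0, 1.0, 2.0, 3.0])
components = decompose_additive(series, period=3)
```

Returning the three components as columns of one DataFrame loosely mirrors the "Container DataFrame containing the decomposed time series" that `produce` is documented to return.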
-