tods.timeseries_processing module¶

tods.timeseries_processing.HoltSmoothing¶

class tods.timeseries_processing.HoltSmoothing.HoltSmoothingPrimitive(*args, **kwds)

Bases: d3m.primitive_interfaces.unsupervised_learning.UnsupervisedLearnerPrimitiveBase

Normalize samples individually to unit norm.

Each sample (i.e. each row of the data matrix) with at least one non zero component is rescaled independently of other samples so that its norm (l1 or l2) equals one.

This transformer is able to work both with dense numpy arrays and scipy.sparse matrix (use CSR format if you want to avoid the burden of a copy / conversion).

Scaling inputs to unit norms is a common operation for text classification or clustering for instance. For instance the dot product of two l2-normalized TF-IDF vectors is the cosine similarity of the vectors and is the base similarity metric for the Vector Space Model commonly used by the Information Retrieval community.

Read more in the User Guide.

Parameters

norm ('l1', 'l2', or 'max', optional (``’l2’`` by default)) – The norm to use to normalize each non zero sample.
copy (boolean, optional, default True) – set to False to perform inplace row normalization and avoid a copy (if the input is already a numpy array or a scipy.sparse CSR matrix).

Examples

>>> from sklearn.preprocessing import Normalizer
>>> X = [[4, 1, 2, 2],
...      [1, 3, 9, 3],
...      [5, 7, 5, 1]]
>>> transformer = Normalizer().fit(X)  # fit does nothing.
>>> transformer
Normalizer()
>>> transformer.transform(X)
array([[0.8, 0.2, 0.4, 0.4],
       [0.1, 0.3, 0.9, 0.3],
       [0.5, 0.7, 0.5, 0.1]])

Notes

This estimator is stateless (besides constructor parameters), the fit method does nothing but is useful when used in a pipeline.

For a comparison of the different scalers, transformers, and normalizers, see examples/preprocessing/plot_all_scaling.py.

See also

normalize: Equivalent function without the estimator API.

fit(*, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[None]

Fits primitive using inputs and outputs (if any) using currently set training data.

The returned value should be a CallResult object with value set to None.

If fit has already been called in the past on different training data, this method fits it again from scratch using currently set training data.

On the other hand, caller can call fit multiple times on the same training data to continue fitting.

If fit fully fits using provided training data, there is no point in making further calls to this method with same training data, and in fact further calls can be noops, or a primitive can decide to fully refit from scratch.

In the case fitting can continue with same training data (even if it is maybe not reasonable, because the internal metric primitive is using looks like fitting will be degrading), if fit is called again (without setting training data), the primitive has to continue fitting.

Caller can provide timeout information to guide the length of the fitting process. Ideally, a primitive should adapt its fitting process to try to do the best fitting possible inside the time allocated. If this is not possible and the primitive reaches the timeout before fitting, it should raise a TimeoutError exception to signal that fitting was unsuccessful in the given time. The state of the primitive after the exception should be as the method call has never happened and primitive should continue to operate normally. The purpose of timeout is to give opportunity to a primitive to cleanly manage its state instead of interrupting execution from outside. Maintaining stable internal state should have precedence over respecting the timeout (caller can terminate the misbehaving primitive from outside anyway). If a longer timeout would produce different fitting, then CallResult’s has_finished should be set to False.

Some primitives have internal fitting iterations (for example, epochs). For those, caller can provide how many of primitive’s internal iterations should a primitive do before returning. Primitives should make iterations as small as reasonable. If iterations is None, then there is no limit on how many iterations the primitive should do and primitive should choose the best amount of iterations on its own (potentially controlled through hyper-parameters). If iterations is a number, a primitive has to do those number of iterations (even if not reasonable), if possible. timeout should still be respected and potentially less iterations can be done because of that. Primitives with internal iterations should make CallResult contain correct values.

For primitives which do not have internal iterations, any value of iterations means that they should fit fully, respecting only timeout.

Parameters

timeout – A maximum time this primitive should be fitting during this method call, in seconds.
iterations – How many of internal iterations should the primitive do.

Returns

Return type

A CallResult with None value.

get_params() → tods.timeseries_processing.HoltSmoothing.Params

Returns parameters of this primitive.

Parameters are all parameters of the primitive which can potentially change during a life-time of a primitive. Parameters which cannot are passed through constructor.

Parameters should include all data which is necessary to create a new instance of this primitive behaving exactly the same as this instance, when the new instance is created by passing the same parameters to the class constructor and calling set_params.

No other arguments to the method are allowed (except for private arguments).

Returns
Return type: An instance of parameters.

produce(*, inputs: d3m.container.pandas.DataFrame, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[d3m.container.pandas.DataFrame]

Produce primitive’s best choice of the output for each of the inputs.

The output value should be wrapped inside CallResult object before returning.

In many cases producing an output is a quick operation in comparison with fit, but not all cases are like that. For example, a primitive can start a potentially long optimization process to compute outputs. timeout and iterations can serve as a way for a caller to guide the length of this process.

Ideally, a primitive should adapt its call to try to produce the best outputs possible inside the time allocated. If this is not possible and the primitive reaches the timeout before producing outputs, it should raise a TimeoutError exception to signal that the call was unsuccessful in the given time. The state of the primitive after the exception should be as the method call has never happened and primitive should continue to operate normally. The purpose of timeout is to give opportunity to a primitive to cleanly manage its state instead of interrupting execution from outside. Maintaining stable internal state should have precedence over respecting the timeout (caller can terminate the misbehaving primitive from outside anyway). If a longer timeout would produce different outputs, then CallResult’s has_finished should be set to False.

Some primitives have internal iterations (for example, optimization iterations). For those, caller can provide how many of primitive’s internal iterations should a primitive do before returning outputs. Primitives should make iterations as small as reasonable. If iterations is None, then there is no limit on how many iterations the primitive should do and primitive should choose the best amount of iterations on its own (potentially controlled through hyper-parameters). If iterations is a number, a primitive has to do those number of iterations, if possible. timeout should still be respected and potentially less iterations can be done because of that. Primitives with internal iterations should make CallResult contain correct values.

For primitives which do not have internal iterations, any value of iterations means that they should run fully, respecting only timeout.

If primitive should have been fitted before calling this method, but it has not been, primitive should raise a PrimitiveNotFittedError exception.

Parameters

inputs – The inputs of shape [num_inputs, …].
timeout – A maximum time this primitive should take to produce outputs during this method call, in seconds.
iterations – How many of internal iterations should the primitive do.

Returns

Return type

The outputs of shape [num_inputs, …] wrapped inside CallResult.

set_params(*, params: tods.timeseries_processing.HoltSmoothing.Params) → None

Sets parameters of this primitive.

Parameters are all parameters of the primitive which can potentially change during a life-time of a primitive. Parameters which cannot are passed through constructor.

No other arguments to the method are allowed (except for private arguments).

Parameters: params – An instance of parameters.

set_training_data(*, inputs: d3m.container.pandas.DataFrame) → None

Sets training data of this primitive.

Parameters: inputs – The inputs.

tods.timeseries_processing.HoltWintersExponentialSmoothing¶

class tods.timeseries_processing.HoltWintersExponentialSmoothing.HoltWintersExponentialSmoothingPrimitive(*args, **kwds)

Bases: d3m.primitive_interfaces.unsupervised_learning.UnsupervisedLearnerPrimitiveBase

Normalize samples individually to unit norm.

Each sample (i.e. each row of the data matrix) with at least one non zero component is rescaled independently of other samples so that its norm (l1 or l2) equals one.

This transformer is able to work both with dense numpy arrays and scipy.sparse matrix (use CSR format if you want to avoid the burden of a copy / conversion).

Scaling inputs to unit norms is a common operation for text classification or clustering for instance. For instance the dot product of two l2-normalized TF-IDF vectors is the cosine similarity of the vectors and is the base similarity metric for the Vector Space Model commonly used by the Information Retrieval community.

Read more in the User Guide.

Parameters

norm ('l1', 'l2', or 'max', optional (``’l2’`` by default)) – The norm to use to normalize each non zero sample.
copy (boolean, optional, default True) – set to False to perform inplace row normalization and avoid a copy (if the input is already a numpy array or a scipy.sparse CSR matrix).

Examples

>>> from sklearn.preprocessing import Normalizer
>>> X = [[4, 1, 2, 2],
...      [1, 3, 9, 3],
...      [5, 7, 5, 1]]
>>> transformer = Normalizer().fit(X)  # fit does nothing.
>>> transformer
Normalizer()
>>> transformer.transform(X)
array([[0.8, 0.2, 0.4, 0.4],
       [0.1, 0.3, 0.9, 0.3],
       [0.5, 0.7, 0.5, 0.1]])

Notes

This estimator is stateless (besides constructor parameters), the fit method does nothing but is useful when used in a pipeline.

For a comparison of the different scalers, transformers, and normalizers, see examples/preprocessing/plot_all_scaling.py.

See also

normalize: Equivalent function without the estimator API.