tods.data_processing module
tods.data_processing.CategoricalToBinary
tods.data_processing.ColumnFilter
-
class tods.data_processing.ColumnFilter.ColumnFilterPrimitive(*args, **kwds)
A primitive that filters out columns of the wrong shape in a DataFrame (specifically, columns generated by some feature analysis primitives).
-
metadata
Primitive’s metadata. Available as a class attribute.
-
logger
Primitive’s logger. Available as a class attribute.
-
hyperparams
Hyperparams passed to the constructor.
-
random_seed
Random seed passed to the constructor.
-
docker_containers
A dict mapping Docker image keys from the primitive’s metadata to (named) tuples containing the address at which the container is accessible to the primitive, and a dict mapping exposed ports to ports on that address.
-
volumes
A dict mapping volume keys from the primitive’s metadata to file and directory paths where downloaded and extracted files are available to the primitive.
-
temporary_directory
An absolute path to a temporary directory that a primitive can use to store any files for the duration of the current pipeline run phase. The directory is automatically cleaned up after the current pipeline run phase finishes.
-
produce(*, inputs: d3m.container.pandas.DataFrame, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[d3m.container.pandas.DataFrame]
Process the testing data.
- Parameters
inputs – Container DataFrame: the inputs of shape [num_inputs, …].
timeout – A maximum time, in seconds, that this primitive should take to produce outputs during this method call.
iterations – How many internal iterations the primitive should do.
- Returns
Container DataFrame with the wrongly shaped columns filtered out: the outputs of shape [num_inputs, …] wrapped inside CallResult.
- Return type
d3m.primitive_interfaces.base.CallResult[d3m.container.pandas.DataFrame]
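The description above does not define "wrong shape" precisely. As a minimal, stdlib-only sketch, assuming (hypothetically) that a column has the wrong shape when its length differs from the table's row count, for example because a windowed feature produced fewer values, the filtering step could look like the following. The helper name and the dict-of-lists table representation are illustrative assumptions, not part of the TODS or d3m API.

```python
from typing import Dict, List, Sequence


def filter_wrong_shape_columns(
    columns: Dict[str, Sequence[object]], n_rows: int
) -> Dict[str, List[object]]:
    """Hypothetical helper: keep only columns whose length equals n_rows.

    This mimics the idea of dropping feature-analysis columns that do not
    line up with the original rows; it is not the TODS implementation.
    """
    kept: Dict[str, List[object]] = {}
    for name, values in columns.items():
        if len(values) == n_rows:
            kept[name] = list(values)
        # Columns of any other length are silently dropped.
    return kept


table = {
    "value": [1.0, 2.0, 3.0, 4.0],
    "rolling_mean_3": [2.0, 3.0],  # shorter: a window of 3 over 4 rows yields 2 values
    "timestamp": [0, 1, 2, 3],
}
print(sorted(filter_wrong_shape_columns(table, n_rows=4)))
# ['timestamp', 'value']
```

A real primitive would also need to keep the column metadata in sync with the retained columns; the sketch ignores metadata entirely.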
-
tods.data_processing.ContinuityValidation
-
class tods.data_processing.ContinuityValidation.ContinuityValidationPrimitive(*args, **kwds)
Checks whether the series data is consistent in its time interval and, if it is not, processes it to make it consistent.
-
metadata
Primitive’s metadata. Available as a class attribute.
-
logger
Primitive’s logger. Available as a class attribute.
-
hyperparams
Hyperparams passed to the constructor.
-
random_seed
Random seed passed to the constructor.
-
docker_containers
A dict mapping Docker image keys from the primitive’s metadata to (named) tuples containing the address at which the container is accessible to the primitive, and a dict mapping exposed ports to ports on that address.
-
volumes
A dict mapping volume keys from the primitive’s metadata to file and directory paths where downloaded and extracted files are available to the primitive.
-
temporary_directory
An absolute path to a temporary directory that a primitive can use to store any files for the duration of the current pipeline run phase. The directory is automatically cleaned up after the current pipeline run phase finishes.
- Parameters
continuity_option (enumeration) – Choose ablation or imputation.
ablation: delete some rows and increase the timestamp interval to keep the timestamps consistent.
imputation: linearly impute the absent timestamps to keep the timestamps consistent.
interval (float) – Only used in imputation; gives the timestamp interval. ‘interval’ should be an integral multiple of the ‘timestamp’ spacing, or the ‘timestamp’ spacing should be an integral multiple of ‘interval’.
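The imputation option can be illustrated with a small sketch. It assumes, hypothetically, that the series is a list of (timestamp, value) pairs sorted by numeric timestamp, and that gaps are filled on the grid defined by interval using linear interpolation between the neighbouring observed points. The ablation option is not shown, and the helper name is illustrative rather than TODS's own code.

```python
from typing import List, Tuple

Row = Tuple[float, float]  # (timestamp, value)


def impute_missing_timestamps(rows: List[Row], interval: float) -> List[Row]:
    """Hypothetical sketch of the 'imputation' option.

    Inserts rows at every `interval` step between consecutive observations
    and fills their values by linear interpolation.
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    if not rows:
        return []
    result: List[Row] = []
    for (t0, v0), (t1, v1) in zip(rows, rows[1:]):
        result.append((t0, v0))
        k = 1
        # Grid points are computed as t0 + k * interval (not by repeated
        # addition) to limit accumulated floating-point error.
        while t0 + k * interval < t1:
            t = t0 + k * interval
            # Linear interpolation between (t0, v0) and (t1, v1).
            result.append((t, v0 + (v1 - v0) * (t - t0) / (t1 - t0)))
            k += 1
    result.append(rows[-1])  # the last observation has no right neighbour
    return result


series = [(0, 0.0), (1, 10.0), (4, 40.0)]  # timestamps 2 and 3 are missing
print(impute_missing_timestamps(series, interval=1))
# [(0, 0.0), (1, 10.0), (2, 20.0), (3, 30.0), (4, 40.0)]
```

The sketch only inserts points; it does not handle observations that fall off the grid, which is one reason the interval constraint above matters.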
-
produce(*, inputs: d3m.container.pandas.DataFrame, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[d3m.container.pandas.DataFrame]
- Parameters
inputs – Container DataFrame: the inputs of shape [num_inputs, …].
timeout – A maximum time, in seconds, that this primitive should take to produce outputs during this method call. Defaults to None.
iterations – How many internal iterations the primitive should do. Defaults to None.
- Returns
Container DataFrame with consistent timestamps: the outputs of shape [num_inputs, …] wrapped inside CallResult.
- Return type
d3m.primitive_interfaces.base.CallResult[d3m.container.pandas.DataFrame]
-
tods.data_processing.DatasetToDataframe
-
class tods.data_processing.DatasetToDataframe.DatasetToDataFramePrimitive(*args, **kwds)
A primitive which extracts a DataFrame out of a Dataset.
-
metadata
Primitive’s metadata. Available as a class attribute.
-
logger
Primitive’s logger. Available as a class attribute.
-
hyperparams
Hyperparams passed to the constructor.
-
random_seed
Random seed passed to the constructor.
-
docker_containers
A dict mapping Docker image keys from the primitive’s metadata to (named) tuples containing the address at which the container is accessible to the primitive, and a dict mapping exposed ports to ports on that address.
-
volumes
A dict mapping volume keys from the primitive’s metadata to file and directory paths where downloaded and extracted files are available to the primitive.
-
temporary_directory
An absolute path to a temporary directory that a primitive can use to store any files for the duration of the current pipeline run phase. The directory is automatically cleaned up after the current pipeline run phase finishes.
-
produce(*, inputs: d3m.container.dataset.Dataset, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[d3m.container.pandas.DataFrame]
Produce the primitive’s best choice of the output for each of the inputs.
The output value should be wrapped inside a CallResult object before returning.
In many cases producing an output is a quick operation in comparison with fit, but not all cases are like that. For example, a primitive can start a potentially long optimization process to compute outputs. timeout and iterations can serve as a way for a caller to guide the length of this process.
Ideally, a primitive should adapt its call to try to produce the best outputs possible inside the time allocated. If this is not possible and the primitive reaches the timeout before producing outputs, it should raise a TimeoutError exception to signal that the call was unsuccessful in the given time. After the exception, the state of the primitive should be as if the method call had never happened, and the primitive should continue to operate normally. The purpose of timeout is to give a primitive the opportunity to cleanly manage its state instead of having its execution interrupted from outside. Maintaining a stable internal state should take precedence over respecting the timeout (the caller can terminate a misbehaving primitive from outside anyway). If a longer timeout would produce different outputs, then CallResult’s has_finished should be set to False.
Some primitives have internal iterations (for example, optimization iterations). For those, the caller can specify how many internal iterations the primitive should do before returning outputs. Primitives should make iterations as small as reasonable. If iterations is None, there is no limit on how many iterations the primitive should do, and the primitive should choose the best number of iterations on its own (potentially controlled through hyper-parameters). If iterations is a number, the primitive has to do that number of iterations, if possible. timeout should still be respected, and fewer iterations may be done because of it. Primitives with internal iterations should make CallResult contain correct values.
For primitives which do not have internal iterations, any value of iterations means that they should run fully, respecting only timeout.
If the primitive should have been fitted before calling this method but has not been, it should raise a PrimitiveNotFittedError exception.
- Parameters
inputs – The inputs of shape [num_inputs, …].
timeout – A maximum time, in seconds, that this primitive should take to produce outputs during this method call.
iterations – How many internal iterations the primitive should do.
- Returns
The outputs of shape [num_inputs, …] wrapped inside CallResult.
- Return type
d3m.primitive_interfaces.base.CallResult[d3m.container.pandas.DataFrame]
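The timeout and iterations contract described above can be made concrete with a small, self-contained sketch. SketchCallResult is a hypothetical stand-in that only mirrors the value, has_finished and iterations_done fields of d3m's CallResult. The Newton square-root refinement is an arbitrary example of a primitive with internal iterations, not what DatasetToDataFramePrimitive does; as a simple conversion, that primitive presumably falls under the "no internal iterations" case and just runs fully.

```python
import time
from dataclasses import dataclass
from typing import Generic, List, Optional, TypeVar

T = TypeVar("T")


@dataclass
class SketchCallResult(Generic[T]):
    """Hypothetical stand-in mirroring the main fields of d3m's CallResult."""
    value: T
    has_finished: bool = True
    iterations_done: Optional[int] = None


def produce_sqrt(
    inputs: List[float],
    *,
    timeout: Optional[float] = None,
    iterations: Optional[int] = None,
    max_iterations: int = 50,  # hypothetical hyper-parameter used when iterations is None
) -> SketchCallResult[List[float]]:
    """Refine square-root estimates of non-negative inputs with Newton steps."""
    start = time.monotonic()
    estimates = [max(x, 1.0) for x in inputs]  # initial guesses, never zero
    limit = max_iterations if iterations is None else iterations
    done = 0
    converged = False
    while done < limit:
        if timeout is not None and time.monotonic() - start > timeout:
            if done == 0:
                # No outputs yet: signal failure. No state was mutated.
                raise TimeoutError("no outputs produced within timeout")
            break  # return the best outputs found so far
        new = [0.5 * (e + x / e) for e, x in zip(estimates, inputs)]
        done += 1
        converged = all(abs(n - e) < 1e-12 for n, e in zip(new, estimates))
        estimates = new
        if converged and iterations is None:
            break  # with no explicit limit, the primitive decides when to stop
    # has_finished is False when more time or iterations could change the outputs.
    return SketchCallResult(estimates, has_finished=converged, iterations_done=done)


partial = produce_sqrt([4.0], iterations=1)
print(partial.value, partial.has_finished)  # [2.5] False
```

Calling produce_sqrt([4.0, 9.0]) with no limits instead runs until the estimates stop changing and reports has_finished=True. Because the function keeps no state between calls, raising TimeoutError trivially leaves the "primitive" as if the call never happened.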
-
tods.data_processing.DuplicationValidation
-
class tods.data_processing.DuplicationValidation.DuplicationValidationPrimitive(*args, **kwds)
Checks whether the series data contains duplicate rows for a single timestamp and, if duplicates exist, processes them.
-
metadata
Primitive’s metadata. Available as a class attribute.
-
logger
Primitive’s logger. Available as a class attribute.
-
hyperparams
Hyperparams passed to the constructor.
-
random_seed
Random seed passed to the constructor.
-
docker_containers
A dict mapping Docker image keys from the primitive’s metadata to (named) tuples containing the address at which the container is accessible to the primitive, and a dict mapping exposed ports to ports on that address.
-
volumes
A dict mapping volume keys from the primitive’s metadata to file and directory paths where downloaded and extracted files are available to the primitive.
-
temporary_directory
An absolute path to a temporary directory that a primitive can use to store any files for the duration of the current pipeline run phase. The directory is automatically cleaned up after the current pipeline run phase finishes.
- Parameters
keep_option (enumeration) – When dropping duplicate rows, choose whether to keep the first one or to calculate the average.
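The two keep_option behaviours can be sketched in plain Python. The option strings "first" and "average", the (timestamp, value) row representation and the helper name are all assumptions for illustration; the actual enumeration values of the hyperparameter may differ.

```python
from typing import Dict, List, Tuple

Row = Tuple[int, float]  # (timestamp, value)


def drop_duplicate_timestamps(rows: List[Row], keep_option: str = "first") -> List[Row]:
    """Hypothetical sketch: collapse rows that share a timestamp.

    keep_option="first" keeps the first value seen for each timestamp;
    keep_option="average" replaces the duplicates with their mean.
    """
    groups: Dict[int, List[float]] = {}
    for ts, value in rows:
        groups.setdefault(ts, []).append(value)  # dicts preserve insertion order
    if keep_option == "first":
        return [(ts, values[0]) for ts, values in groups.items()]
    if keep_option == "average":
        return [(ts, sum(values) / len(values)) for ts, values in groups.items()]
    raise ValueError(f"unknown keep_option: {keep_option!r}")


rows = [(1, 10.0), (1, 20.0), (2, 5.0)]
print(drop_duplicate_timestamps(rows, "first"))    # [(1, 10.0), (2, 5.0)]
print(drop_duplicate_timestamps(rows, "average"))  # [(1, 15.0), (2, 5.0)]
```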
-
produce(*, inputs: d3m.container.pandas.DataFrame, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[d3m.container.pandas.DataFrame]
- Parameters
inputs – Container DataFrame: the inputs of shape [num_inputs, …].
timeout – A maximum time, in seconds, that this primitive should take to produce outputs during this method call. Defaults to None.
iterations – How many internal iterations the primitive should do. Defaults to None.
- Returns
Container DataFrame after dropping the duplicate rows: the outputs of shape [num_inputs, …] wrapped inside CallResult.
- Return type
d3m.primitive_interfaces.base.CallResult[d3m.container.pandas.DataFrame]
-
tods.data_processing.TimeIntervalTransform
-
class tods.data_processing.TimeIntervalTransform.TimeIntervalTransformPrimitive(*args, **kwds)
A primitive that configures the time interval of the DataFrame: it resamples the timestamps based on the time_interval hyperparameter.
-
metadata
Primitive’s metadata. Available as a class attribute.
-
logger
Primitive’s logger. Available as a class attribute.
-
hyperparams
Hyperparams passed to the constructor.
-
random_seed
Random seed passed to the constructor.
-
docker_containers
A dict mapping Docker image keys from the primitive’s metadata to (named) tuples containing the address at which the container is accessible to the primitive, and a dict mapping exposed ports to ports on that address.
-
volumes
A dict mapping volume keys from the primitive’s metadata to file and directory paths where downloaded and extracted files are available to the primitive.
-
temporary_directory
An absolute path to a temporary directory that a primitive can use to store any files for the duration of the current pipeline run phase. The directory is automatically cleaned up after the current pipeline run phase finishes.
-
produce(*, inputs: d3m.container.pandas.DataFrame, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[d3m.container.pandas.DataFrame]
Produce the primitive’s best choice of the output for each of the inputs.
The output value should be wrapped inside a CallResult object before returning.
In many cases producing an output is a quick operation in comparison with fit, but not all cases are like that. For example, a primitive can start a potentially long optimization process to compute outputs. timeout and iterations can serve as a way for a caller to guide the length of this process.
Ideally, a primitive should adapt its call to try to produce the best outputs possible inside the time allocated. If this is not possible and the primitive reaches the timeout before producing outputs, it should raise a TimeoutError exception to signal that the call was unsuccessful in the given time. After the exception, the state of the primitive should be as if the method call had never happened, and the primitive should continue to operate normally. The purpose of timeout is to give a primitive the opportunity to cleanly manage its state instead of having its execution interrupted from outside. Maintaining a stable internal state should take precedence over respecting the timeout (the caller can terminate a misbehaving primitive from outside anyway). If a longer timeout would produce different outputs, then CallResult’s has_finished should be set to False.
Some primitives have internal iterations (for example, optimization iterations). For those, the caller can specify how many internal iterations the primitive should do before returning outputs. Primitives should make iterations as small as reasonable. If iterations is None, there is no limit on how many iterations the primitive should do, and the primitive should choose the best number of iterations on its own (potentially controlled through hyper-parameters). If iterations is a number, the primitive has to do that number of iterations, if possible. timeout should still be respected, and fewer iterations may be done because of it. Primitives with internal iterations should make CallResult contain correct values.
For primitives which do not have internal iterations, any value of iterations means that they should run fully, respecting only timeout.
If the primitive should have been fitted before calling this method but has not been, it should raise a PrimitiveNotFittedError exception.
- Parameters
inputs – The inputs of shape [num_inputs, …].
timeout – A maximum time, in seconds, that this primitive should take to produce outputs during this method call.
iterations – How many internal iterations the primitive should do.
- Returns
The outputs of shape [num_inputs, …] wrapped inside CallResult.
- Return type
d3m.primitive_interfaces.base.CallResult[d3m.container.pandas.DataFrame]
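Resampling to a fixed time_interval can be sketched as bucketing each timestamp into the interval that contains it and aggregating the values in each bucket. The sketch assumes integer timestamps in seconds and mean aggregation; both choices, and the helper name, are illustrative assumptions rather than a description of how TimeIntervalTransformPrimitive resamples (for pandas-backed data, DataFrame.resample is the usual tool).

```python
from typing import Dict, List, Tuple


def resample_mean(rows: List[Tuple[int, float]], time_interval: int) -> List[Tuple[int, float]]:
    """Hypothetical sketch: average the values within each time_interval bucket."""
    if time_interval <= 0:
        raise ValueError("time_interval must be positive")
    buckets: Dict[int, List[float]] = {}
    for ts, value in rows:
        start = ts - ts % time_interval  # floor to the bucket's start time
        buckets.setdefault(start, []).append(value)
    # Emit buckets in chronological order; empty buckets are simply absent.
    return [(start, sum(vals) / len(vals)) for start, vals in sorted(buckets.items())]


readings = [(0, 1.0), (30, 3.0), (60, 5.0), (119, 7.0), (185, 2.0)]
print(resample_mean(readings, time_interval=60))
# [(0, 2.0), (60, 6.0), (180, 2.0)]
```

Here the bucket starting at 120 has no readings, so it is omitted; a real implementation might instead emit a missing value for it, which is what pandas does for empty bins.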
-
tods.data_processing.TimeStampValidation
-
class tods.data_processing.TimeStampValidation.TimeStampValidationPrimitive(*args, **kwds)
A primitive that checks whether a time series is sorted by timestamp and, if it is not, returns the sorted time series.
-
metadata
Primitive’s metadata. Available as a class attribute.
-
logger
Primitive’s logger. Available as a class attribute.
-
hyperparams
Hyperparams passed to the constructor.
-
random_seed
Random seed passed to the constructor.
-
docker_containers
A dict mapping Docker image keys from the primitive’s metadata to (named) tuples containing the address at which the container is accessible to the primitive, and a dict mapping exposed ports to ports on that address.
-
volumes
A dict mapping volume keys from the primitive’s metadata to file and directory paths where downloaded and extracted files are available to the primitive.
-
temporary_directory
An absolute path to a temporary directory that a primitive can use to store any files for the duration of the current pipeline run phase. The directory is automatically cleaned up after the current pipeline run phase finishes.
-
produce(*, inputs: d3m.container.pandas.DataFrame, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[d3m.container.pandas.DataFrame]
- Parameters
inputs – Container DataFrame: the inputs of shape [num_inputs, …].
timeout – A maximum time, in seconds, that this primitive should take to produce outputs during this method call. Defaults to None.
iterations – How many internal iterations the primitive should do. Defaults to None.
- Returns
Container DataFrame sorted by timestamp: the outputs of shape [num_inputs, …] wrapped inside CallResult.
- Return type
d3m.primitive_interfaces.base.CallResult[d3m.container.pandas.DataFrame]
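The check-then-sort behaviour described for this primitive can be sketched as follows. The (timestamp, value) row representation and the helper name are illustrative assumptions, not the TODS implementation.

```python
from typing import List, Sequence, Tuple

Row = Tuple[float, float]  # (timestamp, value)


def ensure_sorted_by_timestamp(rows: Sequence[Row]) -> List[Row]:
    """Hypothetical sketch: return the rows ordered by timestamp.

    The O(n) check avoids an O(n log n) sort when the series is already ordered.
    """
    if all(a[0] <= b[0] for a, b in zip(rows, rows[1:])):
        return list(rows)
    # sorted() is stable, so rows with equal timestamps keep their input order.
    return sorted(rows, key=lambda row: row[0])


print(ensure_sorted_by_timestamp([(3, 0.3), (1, 0.1), (2, 0.2)]))
# [(1, 0.1), (2, 0.2), (3, 0.3)]
```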
-