tods.data_processing module

tods.data_processing.CategoricalToBinary

tods.data_processing.ColumnFilter

class tods.data_processing.ColumnFilter.ColumnFilterPrimitive(*args, **kwds)

A primitive that filters out DataFrame columns of the wrong shape (specifically, columns generated by some feature analysis primitives).

metadata

Primitive’s metadata. Available as a class attribute.

logger

Primitive’s logger. Available as a class attribute.

hyperparams

Hyperparams passed to the constructor.

random_seed

Random seed passed to the constructor.

docker_containers

A dict mapping Docker image keys from primitive’s metadata to (named) tuples containing container’s address under which the container is accessible by the primitive, and a dict mapping exposed ports to ports on that address.

volumes

A dict mapping volume keys from primitive’s metadata to file and directory paths where downloaded and extracted files are available to the primitive.

temporary_directory

An absolute path to a temporary directory a primitive can use to store any files for the duration of the current pipeline run phase. Directory is automatically cleaned up after the current pipeline run phase finishes.

produce(*, inputs: d3m.container.pandas.DataFrame, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[d3m.container.pandas.DataFrame]

Process the testing data.

Parameters
  • inputs – Container DataFrame of shape [num_inputs, …].

  • timeout – A maximum time this primitive should take to produce outputs during this method call, in seconds.

  • iterations – How many internal iterations the primitive should do.

Returns

Container DataFrame after column filtering, wrapped inside CallResult.

Return type

d3m.primitive_interfaces.base.CallResult[d3m.container.pandas.DataFrame]

tods.data_processing.ContinuityValidation

class tods.data_processing.ContinuityValidation.ContinuityValidationPrimitive(*args, **kwds)

Check whether the series data is consistent in its time interval, and provide processing if it is not.

metadata

Primitive’s metadata. Available as a class attribute.

logger

Primitive’s logger. Available as a class attribute.

hyperparams

Hyperparams passed to the constructor.

random_seed

Random seed passed to the constructor.

docker_containers

A dict mapping Docker image keys from primitive’s metadata to (named) tuples containing container’s address under which the container is accessible by the primitive, and a dict mapping exposed ports to ports on that address.

volumes

A dict mapping volume keys from primitive’s metadata to file and directory paths where downloaded and extracted files are available to the primitive.

temporary_directory

An absolute path to a temporary directory a primitive can use to store any files for the duration of the current pipeline run phase. Directory is automatically cleaned up after the current pipeline run phase finishes.

Parameters
  • continuity_option (enumeration) –

    Choose ablation or imputation.

    ablation: delete some rows and increase the timestamp interval to keep the timestamps consistent.

    imputation: linearly impute the absent timestamps to keep the timestamps consistent.

  • interval (float) – Used only for imputation; gives the timestamp interval. ‘interval’ should be an integral multiple of ‘timestamp’, or ‘timestamp’ should be an integral multiple of ‘interval’.
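The difference between the two options can be illustrated with plain pandas. This is a minimal sketch, not the primitive's implementation: the function names, the `timestamp` column name, and the explicit `coarse_interval` argument are assumptions made for illustration (how the primitive picks the enlarged interval for ablation is not described here), and the input is assumed to have unique, numeric timestamps.

```python
import pandas as pd


def impute_timestamps(df, interval, column="timestamp"):
    """Insert missing timestamps on a regular grid and fill values linearly."""
    indexed = df.sort_values(column).set_index(column)
    start, stop = indexed.index[0], indexed.index[-1]
    steps = int(round((stop - start) / interval))
    grid = pd.Index([start + i * interval for i in range(steps + 1)], name=column)
    # Add the grid points, interpolate against the numeric index, keep the grid.
    combined = indexed.reindex(indexed.index.union(grid))
    filled = combined.interpolate(method="index")
    return filled.loc[grid].rename_axis(column).reset_index()


def ablate_timestamps(df, coarse_interval, column="timestamp"):
    """Keep only the rows that fall on a coarser regular grid (hypothetical rule)."""
    ordered = df.sort_values(column)
    offsets = (ordered[column] - ordered[column].iloc[0]) / coarse_interval
    on_grid = (offsets - offsets.round()).abs() < 1e-9
    return ordered[on_grid].reset_index(drop=True)


# Timestamps 3.0 and 5.0 are missing from an otherwise unit-spaced series.
df = pd.DataFrame({"timestamp": [0.0, 1.0, 2.0, 4.0, 6.0],
                   "value": [0.0, 10.0, 20.0, 40.0, 60.0]})
imputed = impute_timestamps(df, interval=1.0)        # fills in 3.0 and 5.0
ablated = ablate_timestamps(df, coarse_interval=2.0)  # drops the row at 1.0
```

Imputation keeps every original row and adds interpolated ones, while ablation discards rows until the remaining timestamps are evenly spaced.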

produce(*, inputs: d3m.container.pandas.DataFrame, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[d3m.container.pandas.DataFrame]

Parameters
  • inputs – Container DataFrame of shape [num_inputs, …].

  • timeout – A maximum time this primitive should take to produce outputs during this method call, in seconds. Defaults to None.

  • iterations – How many internal iterations the primitive should do. Defaults to None.

Returns

Container DataFrame with consistent timestamps, wrapped inside CallResult.

Return type

d3m.primitive_interfaces.base.CallResult[d3m.container.pandas.DataFrame]

tods.data_processing.DatasetToDataframe

class tods.data_processing.DatasetToDataframe.DatasetToDataFramePrimitive(*args, **kwds)

A primitive which extracts a DataFrame out of a Dataset.

metadata

Primitive’s metadata. Available as a class attribute.

logger

Primitive’s logger. Available as a class attribute.

hyperparams

Hyperparams passed to the constructor.

random_seed

Random seed passed to the constructor.

docker_containers

A dict mapping Docker image keys from primitive’s metadata to (named) tuples containing container’s address under which the container is accessible by the primitive, and a dict mapping exposed ports to ports on that address.

volumes

A dict mapping volume keys from primitive’s metadata to file and directory paths where downloaded and extracted files are available to the primitive.

temporary_directory

An absolute path to a temporary directory a primitive can use to store any files for the duration of the current pipeline run phase. Directory is automatically cleaned up after the current pipeline run phase finishes.

produce(*, inputs: d3m.container.dataset.Dataset, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[d3m.container.pandas.DataFrame]

Produce primitive’s best choice of the output for each of the inputs.

The output value should be wrapped inside CallResult object before returning.

In many cases producing an output is a quick operation in comparison with fit, but not all cases are like that. For example, a primitive can start a potentially long optimization process to compute outputs. timeout and iterations can serve as a way for a caller to guide the length of this process.

Ideally, a primitive should adapt its call to try to produce the best outputs possible inside the time allocated. If this is not possible and the primitive reaches the timeout before producing outputs, it should raise a TimeoutError exception to signal that the call was unsuccessful in the given time. The state of the primitive after the exception should be as if the method call had never happened, and the primitive should continue to operate normally. The purpose of timeout is to give a primitive the opportunity to cleanly manage its state instead of having its execution interrupted from outside. Maintaining stable internal state should have precedence over respecting the timeout (the caller can terminate a misbehaving primitive from outside anyway). If a longer timeout would produce different outputs, then CallResult’s has_finished should be set to False.

Some primitives have internal iterations (for example, optimization iterations). For those, the caller can specify how many internal iterations the primitive should do before returning outputs. Primitives should make iterations as small as reasonable. If iterations is None, then there is no limit on how many iterations the primitive should do, and the primitive should choose the best number of iterations on its own (potentially controlled through hyper-parameters). If iterations is a number, the primitive has to do that number of iterations, if possible. timeout should still be respected, and potentially fewer iterations can be done because of that. Primitives with internal iterations should make CallResult contain correct values.

For primitives which do not have internal iterations, any value of iterations means that they should run fully, respecting only timeout.

If the primitive should have been fitted before this method is called but has not been, it should raise a PrimitiveNotFittedError exception.
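The timeout and iterations contract described above can be sketched in plain Python. This is only an illustration under stated assumptions: `SketchResult` is a hypothetical stand-in that mirrors the value, has_finished, and iterations_done fields of CallResult, and `produce_sketch` runs Newton's method for a square root in place of a real primitive's internal iterations.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class SketchResult:
    """Hypothetical stand-in for CallResult."""
    value: float
    has_finished: bool
    iterations_done: int


def produce_sketch(x: float, *, timeout: Optional[float] = None,
                   iterations: Optional[int] = None,
                   tolerance: float = 1e-12) -> SketchResult:
    """Approximate sqrt(x) for x > 0 while honoring timeout and iterations."""
    deadline = None if timeout is None else time.monotonic() + timeout
    estimate = max(x, 1.0)
    done = 0
    converged = False
    while iterations is None or done < iterations:
        if deadline is not None and time.monotonic() >= deadline:
            if done == 0:
                # No output could be produced in time: signal failure.
                raise TimeoutError("timeout reached before producing outputs")
            break  # Return the best estimate so far, flagged as unfinished.
        new_estimate = 0.5 * (estimate + x / estimate)
        done += 1
        converged = abs(new_estimate - estimate) <= tolerance
        estimate = new_estimate
        # With no iteration limit, the sketch decides on its own when to stop.
        if converged and iterations is None:
            break
    return SketchResult(value=estimate, has_finished=converged, iterations_done=done)


full = produce_sketch(16.0)                   # runs until convergence
partial = produce_sketch(16.0, iterations=2)  # stops after exactly two steps
```

When the iteration budget runs out before convergence, the result carries has_finished set to False so the caller knows that more work could change the output.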

Parameters
  • inputs – The inputs of shape [num_inputs, …].

  • timeout – A maximum time this primitive should take to produce outputs during this method call, in seconds.

  • iterations – How many internal iterations the primitive should do.

Returns

The outputs of shape [num_inputs, …] wrapped inside CallResult.

Return type

d3m.primitive_interfaces.base.CallResult[d3m.container.pandas.DataFrame]

tods.data_processing.DuplicationValidation

class tods.data_processing.DuplicationValidation.DuplicationValidationPrimitive(*args, **kwds)

Check whether the series data contains duplicate rows for a single timestamp, and provide processing if duplication exists.

metadata

Primitive’s metadata. Available as a class attribute.

logger

Primitive’s logger. Available as a class attribute.

hyperparams

Hyperparams passed to the constructor.

random_seed

Random seed passed to the constructor.

docker_containers

A dict mapping Docker image keys from primitive’s metadata to (named) tuples containing container’s address under which the container is accessible by the primitive, and a dict mapping exposed ports to ports on that address.

volumes

A dict mapping volume keys from primitive’s metadata to file and directory paths where downloaded and extracted files are available to the primitive.

temporary_directory

An absolute path to a temporary directory a primitive can use to store any files for the duration of the current pipeline run phase. Directory is automatically cleaned up after the current pipeline run phase finishes.

Parameters

keep_option (enumeration) – When dropping duplicate rows, choose whether to keep the first one or to calculate the average.
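The two strategies can be sketched with plain pandas. This is an illustration only, not the primitive's code: the function name, the `timestamp` column name, and the option strings "first" and "average" are assumptions, and the averaging branch assumes all other columns are numeric.

```python
import pandas as pd


def resolve_duplicates(df, keep_option="first", column="timestamp"):
    """Collapse rows that share a timestamp into a single row."""
    if keep_option == "first":
        # Keep the first row seen for each timestamp and drop the rest.
        return df.drop_duplicates(subset=column, keep="first").reset_index(drop=True)
    if keep_option == "average":
        # Replace each group of duplicates with the mean of its values.
        return df.groupby(column, as_index=False, sort=False).mean()
    raise ValueError(f"unknown keep_option: {keep_option!r}")


df = pd.DataFrame({"timestamp": [1, 1, 2], "value": [10.0, 20.0, 30.0]})
first = resolve_duplicates(df, "first")
average = resolve_duplicates(df, "average")
```

Keeping the first row preserves an observed value, while averaging produces a value that may not appear in the original data.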

produce(*, inputs: d3m.container.pandas.DataFrame, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[d3m.container.pandas.DataFrame]

Parameters
  • inputs – Container DataFrame of shape [num_inputs, …].

  • timeout – A maximum time this primitive should take to produce outputs during this method call, in seconds. Defaults to None.

  • iterations – How many internal iterations the primitive should do. Defaults to None.

Returns

Container DataFrame after dropping the duplicates, wrapped inside CallResult.

Return type

d3m.primitive_interfaces.base.CallResult[d3m.container.pandas.DataFrame]

tods.data_processing.TimeIntervalTransform

class tods.data_processing.TimeIntervalTransform.TimeIntervalTransformPrimitive(*args, **kwds)

A primitive which configures the time interval of the DataFrame. It resamples the timestamps based on the time_interval passed as a hyperparameter.
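Resampling to a fixed interval can be sketched with pandas. This is a rough illustration under assumptions rather than the primitive's implementation: the function name and the `timestamp` column name are hypothetical, the interval is assumed to be a pandas offset string such as "5min", and averaging within each bin is an assumed aggregation.

```python
import pandas as pd


def resample_mean(df, time_interval, column="timestamp"):
    """Group rows into fixed-width time bins and average each bin."""
    stamps = pd.to_datetime(df[column])
    indexed = df.drop(columns=[column]).set_index(stamps)
    return indexed.resample(time_interval).mean().reset_index()


df = pd.DataFrame({
    "timestamp": ["2021-01-01 00:00", "2021-01-01 00:01",
                  "2021-01-01 00:05", "2021-01-01 00:06"],
    "value": [1.0, 3.0, 5.0, 7.0],
})
# Two five-minute bins: [00:00, 00:05) and [00:05, 00:10).
resampled = resample_mean(df, "5min")
```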

metadata

Primitive’s metadata. Available as a class attribute.

logger

Primitive’s logger. Available as a class attribute.

hyperparams

Hyperparams passed to the constructor.

random_seed

Random seed passed to the constructor.

docker_containers

A dict mapping Docker image keys from primitive’s metadata to (named) tuples containing container’s address under which the container is accessible by the primitive, and a dict mapping exposed ports to ports on that address.

volumes

A dict mapping volume keys from primitive’s metadata to file and directory paths where downloaded and extracted files are available to the primitive.

temporary_directory

An absolute path to a temporary directory a primitive can use to store any files for the duration of the current pipeline run phase. Directory is automatically cleaned up after the current pipeline run phase finishes.

produce(*, inputs: d3m.container.pandas.DataFrame, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[d3m.container.pandas.DataFrame]

Produce primitive’s best choice of the output for each of the inputs.

The output value should be wrapped inside CallResult object before returning.

In many cases producing an output is a quick operation in comparison with fit, but not all cases are like that. For example, a primitive can start a potentially long optimization process to compute outputs. timeout and iterations can serve as a way for a caller to guide the length of this process.

Ideally, a primitive should adapt its call to try to produce the best outputs possible inside the time allocated. If this is not possible and the primitive reaches the timeout before producing outputs, it should raise a TimeoutError exception to signal that the call was unsuccessful in the given time. The state of the primitive after the exception should be as if the method call had never happened, and the primitive should continue to operate normally. The purpose of timeout is to give a primitive the opportunity to cleanly manage its state instead of having its execution interrupted from outside. Maintaining stable internal state should have precedence over respecting the timeout (the caller can terminate a misbehaving primitive from outside anyway). If a longer timeout would produce different outputs, then CallResult’s has_finished should be set to False.

Some primitives have internal iterations (for example, optimization iterations). For those, the caller can specify how many internal iterations the primitive should do before returning outputs. Primitives should make iterations as small as reasonable. If iterations is None, then there is no limit on how many iterations the primitive should do, and the primitive should choose the best number of iterations on its own (potentially controlled through hyper-parameters). If iterations is a number, the primitive has to do that number of iterations, if possible. timeout should still be respected, and potentially fewer iterations can be done because of that. Primitives with internal iterations should make CallResult contain correct values.

For primitives which do not have internal iterations, any value of iterations means that they should run fully, respecting only timeout.

If the primitive should have been fitted before this method is called but has not been, it should raise a PrimitiveNotFittedError exception.

Parameters
  • inputs – The inputs of shape [num_inputs, …].

  • timeout – A maximum time this primitive should take to produce outputs during this method call, in seconds.

  • iterations – How many internal iterations the primitive should do.

Returns

The outputs of shape [num_inputs, …] wrapped inside CallResult.

Return type

d3m.primitive_interfaces.base.CallResult[d3m.container.pandas.DataFrame]

tods.data_processing.TimeStampValidation

class tods.data_processing.TimeStampValidation.TimeStampValidationPrimitive(*args, **kwds)

A primitive that checks whether the time series is sorted by timestamp; if it is not, the primitive returns the sorted time series.
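The check-then-sort behavior can be sketched with pandas. This is a minimal illustration, not the primitive's code; the function name and the `timestamp` column name are assumptions.

```python
import pandas as pd


def ensure_sorted(df, column="timestamp"):
    """Return df unchanged if already ordered by timestamp, else a sorted copy."""
    if df[column].is_monotonic_increasing:
        return df
    # A stable sort keeps rows with equal timestamps in their original order.
    return df.sort_values(column, kind="stable").reset_index(drop=True)


unsorted_df = pd.DataFrame({"timestamp": [3, 1, 2], "value": ["c", "a", "b"]})
sorted_df = ensure_sorted(unsorted_df)
already_sorted = ensure_sorted(sorted_df)
```

Checking monotonicity first avoids copying and re-sorting data that is already in order.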

metadata

Primitive’s metadata. Available as a class attribute.

logger

Primitive’s logger. Available as a class attribute.

hyperparams

Hyperparams passed to the constructor.

random_seed

Random seed passed to the constructor.

docker_containers

A dict mapping Docker image keys from primitive’s metadata to (named) tuples containing container’s address under which the container is accessible by the primitive, and a dict mapping exposed ports to ports on that address.

volumes

A dict mapping volume keys from primitive’s metadata to file and directory paths where downloaded and extracted files are available to the primitive.

temporary_directory

An absolute path to a temporary directory a primitive can use to store any files for the duration of the current pipeline run phase. Directory is automatically cleaned up after the current pipeline run phase finishes.

produce(*, inputs: d3m.container.pandas.DataFrame, timeout: float = None, iterations: int = None) → d3m.primitive_interfaces.base.CallResult[d3m.container.pandas.DataFrame]

Parameters
  • inputs – Container DataFrame of shape [num_inputs, …].

  • timeout – A maximum time this primitive should take to produce outputs during this method call, in seconds. Defaults to None.

  • iterations – How many internal iterations the primitive should do. Defaults to None.

Returns

Container DataFrame sorted by timestamp, wrapped inside CallResult.

Return type

d3m.primitive_interfaces.base.CallResult[d3m.container.pandas.DataFrame]