dfdataset ¶
Module for dfdataset
Classes¶
DfDataset ¶
DfDataset(
id_key,
subset_name,
data_location,
df_filename=ExperimentFilenames.SUBSET_NAME,
shuffle=False,
data_shuffler=None,
dataframe_filters=None,
feature_combiners=None,
extra_key_list=None,
)
Bases: Dataset, ABC
Dataset for dataframes
Initializes an instance of DfDataset with the given arguments
Parameters:
- id_key (str) – Column name of the id column in your dataframe
- subset_name (str) – Name of the dataset
- data_location (Union[dict, LocationConfig]) – Location of the data used in the data set
- df_filename (str, default: SUBSET_NAME) – Specify the file name of the dataframe
- shuffle (bool, default: False) – Flag to shuffle the data or not
- dataframe_filters (Optional[List[DataframeFilter]], default: None) – Optional list of dataframe filters to filter the data
Source code in niceml/data/datasets/dfdataset.py
Functions¶
__getitem__ ¶
The getitem function returns the data item at the given index.
Parameters:
- index – Index of the item
Returns:
- An item of input data and target data
__len__ ¶
The len function returns the number of steps in the dataset.
Returns:
- The number of items
extract_data ¶
The extract_data function takes a list of indexes and an input dictionary.
It extracts the data from self.data using the key provided by the input dictionary and returns it as a numpy array. If the type is categorical, the data is converted to a one-hot encoding.
Parameters:
- self – Bind the method to an object
- cur_indexes (List[int]) – Select the rows of data that are needed for the current batch
- cur_input (dict) – Get the key and type of the input
Returns:
- The data of the current key
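The behavior described for extract_data can be illustrated with a minimal, standalone sketch. It is not the niceml implementation: the function name extract_data_sketch and the dictionary keys "key", "type" and "value_count" are assumptions made for this example only.

```python
from typing import List

import numpy as np
import pandas as pd


def extract_data_sketch(
    data: pd.DataFrame, cur_indexes: List[int], cur_input: dict
) -> np.ndarray:
    """Hypothetical stand-in for extract_data; not the niceml code."""
    # Select the rows of the current batch by position, then the column
    # named by the (assumed) "key" entry of the input dictionary.
    values = data.iloc[cur_indexes][cur_input["key"]].to_numpy()
    if cur_input["type"] == "categorical":
        # One-hot encode: row i of the identity matrix encodes class i.
        # "value_count" (number of classes) is an assumed dictionary entry.
        identity = np.eye(cur_input["value_count"], dtype=np.float32)
        return identity[values.astype(int)]
    # Non-categorical values are returned unchanged.
    return values


frame = pd.DataFrame({"size": [0.5, 1.5, 2.5], "label": [0, 2, 1]})
one_hot = extract_data_sketch(
    frame, [0, 1], {"key": "label", "type": "categorical", "value_count": 3}
)
scalars = extract_data_sketch(frame, [2], {"key": "size", "type": "scalar"})
```

Indexing the identity matrix keeps the encoding to a single vectorized step; the real class may encode categorical inputs differently.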
get_all_data ¶
get_all_data_info ¶
The get_all_data_info function returns a list of RegDataInfo objects for all data in self.data.
Returns:
- List[RegDataInfo] – A list of RegDataInfo objects
get_data ¶
get_data_by_key ¶
Returns all rows of the data, whose 'id_key' matches the 'data_key'.
Parameters:
- data_key – Identify the data that is being requested
Returns:
- A dataframe of the rows where the self.id_key column matches the data_key parameter
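The row selection described for get_data_by_key corresponds to a boolean-mask filter in pandas. The sketch below is a standalone illustration under that assumption; the function name get_rows_by_key and the column names are hypothetical and not part of niceml.

```python
import pandas as pd


def get_rows_by_key(data: pd.DataFrame, id_key: str, data_key: str) -> pd.DataFrame:
    """Hypothetical helper: return all rows whose id column equals data_key."""
    # The comparison yields a boolean Series; indexing with it keeps matching rows.
    return data[data[id_key] == data_key]


frame = pd.DataFrame({"sample_id": ["a", "b", "a"], "value": [1, 2, 3]})
rows = get_rows_by_key(frame, "sample_id", "a")
no_rows = get_rows_by_key(frame, "sample_id", "z")
```

A key that matches nothing yields an empty dataframe rather than an error, which is one reason callers may want to check the length of the result.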
get_data_from_idx_list ¶
Returns the data for a given index_list.
get_item_count ¶
get_items_per_epoch ¶
get_set_name ¶
The get_set_name function returns the name of the set.
Returns:
- str – The name of the data set
initialize ¶
The initialize function should read all the necessary files from disk and store them as attributes on this class instance.
This function is called when the data set is created.
It takes a RegDataDescription object, which contains information about the inputs and targets of your dataset, and an ExperimentContext object, which contains information about where to find your data on disk. The ExperimentContext is not used in this class.
Parameters:
- data_description (RegDataDescription) – The data description of the dataset
- exp_context (ExperimentContext) – The experiment context
iter_with_info ¶
The iter_with_info function is a generator that yields the next batch of data, along with some additional information about the batch. The additional information is useful for various diagnostic purposes. The function returns an object of type DataIterator, which has two fields:
- 'batch' contains the next batch of data.
- 'info' contains additional information about that batch.
Returns:
- A DataIterator object
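The pairing of a batch with its info can be sketched with a plain generator. This is an illustration only: BatchWithInfo and iter_with_info_sketch are hypothetical names, and the real DataIterator type may be structured differently.

```python
from typing import Any, Iterator, List, NamedTuple


class BatchWithInfo(NamedTuple):
    """Hypothetical container with the two fields described above."""

    batch: Any
    info: List[str]


def iter_with_info_sketch(
    items: List[float], ids: List[str], batch_size: int
) -> Iterator[BatchWithInfo]:
    """Yield consecutive batches together with the ids of their items."""
    for start in range(0, len(items), batch_size):
        stop = start + batch_size
        # The same slice is applied to data and ids so they stay aligned.
        yield BatchWithInfo(batch=items[start:stop], info=ids[start:stop])


batches = list(iter_with_info_sketch([0.1, 0.2, 0.3], ["a", "b", "c"], batch_size=2))
```

Keeping the info next to the batch lets diagnostic code trace a prediction back to the rows that produced it.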
on_epoch_end ¶
Execute logic to be performed at the end of an epoch (e.g. shuffling the data)
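As a rough sketch of the shuffling case mentioned above, an epoch-end hook can reorder the list of row indexes when a shuffle flag is set. The IndexOrder class and its seed parameter are assumptions for this example and do not reflect how DfDataset or its data_shuffler actually work.

```python
import random
from typing import List


class IndexOrder:
    """Hypothetical helper holding the order in which rows are visited."""

    def __init__(self, item_count: int, shuffle: bool = False, seed: int = 0):
        self.indexes: List[int] = list(range(item_count))
        self.shuffle = shuffle
        # A private random generator keeps the shuffling reproducible.
        self._rng = random.Random(seed)

    def on_epoch_end(self) -> None:
        # Reorder the indexes in place only when shuffling is enabled.
        if self.shuffle:
            self._rng.shuffle(self.indexes)


shuffled = IndexOrder(item_count=5, shuffle=True)
shuffled.on_epoch_end()

fixed = IndexOrder(item_count=5, shuffle=False)
fixed.on_epoch_end()
```

Shuffling indexes rather than the dataframe itself leaves the underlying data untouched between epochs.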
RegDataInfo dataclass ¶
Bases: DataInfo
DataInfo for regression data
Functions¶
__getattr__ ¶
The getattr function is called when an attribute lookup has not found the attribute in the usual places (i.e. it is not an instance attribute nor is it found in the class tree of self). In this case it is the value of the key (item) in the self.data dictionary.
Parameters:
- item – Access the value of a key in self.data
Returns:
- Any – The value of the key (item) in the self.data dictionary
get_identifier ¶
The get_identifier function returns the dataid of this object.
Returns:
- str – The dataid
get_info_dict ¶
The get_info_dict function returns a dictionary containing the dataid and all the data in self.data.
Returns:
- dict – A dictionary containing the dataid and all the other key-value pairs in self.data
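The three functions documented for RegDataInfo can be sketched together in a small standalone dataclass. The class name InfoSketch is hypothetical, the field names dataid and data are inferred from the descriptions above, and the error handling is an assumption; the niceml class may differ in all of these.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class InfoSketch:
    """Hypothetical stand-in for RegDataInfo; not the niceml code."""

    dataid: str
    data: Dict[str, Any] = field(default_factory=dict)

    def __getattr__(self, item: str) -> Any:
        # Only reached when normal attribute lookup has failed.
        if item == "data":
            # Guard against infinite recursion if "data" is not set yet.
            raise AttributeError(item)
        try:
            return self.data[item]
        except KeyError as err:
            # Unknown keys behave like ordinary missing attributes.
            raise AttributeError(item) from err

    def get_identifier(self) -> str:
        return self.dataid

    def get_info_dict(self) -> dict:
        # Start with the id, then add every key-value pair from self.data.
        info = {"dataid": self.dataid}
        info.update(self.data)
        return info


info = InfoSketch(dataid="sample_1", data={"width": 3.0})
width = info.width  # resolved through __getattr__
```

Raising AttributeError for unknown keys keeps built-ins such as hasattr and getattr with a default working as expected.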