dfanalyzer

Module for DataframeAnalyzer and DfMetric

Classes

DataframeAnalyzer

DataframeAnalyzer(metrics, parq_file_prefix='')

Bases: ResultAnalyzer

Result analyzer for dataframes

Initialize a result analyzer for dataframes

Source code in niceml/mlcomponents/resultanalyzers/dataframes/dfanalyzer.py
def __init__(
    self,
    metrics: List[DfMetric],
    parq_file_prefix: str = "",
):
    """Initialize a result analyzer for dataframes"""

    super().__init__()
    self.parq_file_prefix = parq_file_prefix
    self.df_metrics: List[DfMetric] = metrics
Functions
__call__
__call__(dataset, exp_context, subset_name)

Calculate the values of the metrics in self.df_metrics, log them, and save them to a YAML file (also logged to MLflow).

Parameters:

  • dataset

    Dataset of the experiment. Not used in this function

  • exp_context (ExperimentContext) –

    Current ExperimentContext to read and write files

  • subset_name (str) –

    Name of the subset

Source code in niceml/mlcomponents/resultanalyzers/dataframes/dfanalyzer.py
def __call__(self, dataset, exp_context: ExperimentContext, subset_name: str):
    """
    Calculate the values of the metrics in `self.df_metrics`, log them,
    and save them to a YAML file (also logged to MLflow).

    Args:
        dataset: Dataset of the experiment. Not used in this function
        exp_context: Current `ExperimentContext` to read and write files
        subset_name: Name of the subset

    """
    input_file: str = join(
        ExperimentFilenames.PREDICTION_FOLDER,
        f"{self.parq_file_prefix}{subset_name}.parq",
    )
    data_frame = exp_context.read_parquet(input_file)

    output_file = join(
        ExperimentFilenames.ANALYSIS_FOLDER,
        ExperimentFilenames.ANALYSIS_FILE.format(subset_name=subset_name),
    )

    out_dict = {}
    for met in self.df_metrics:
        out_dict.update(met(data_frame, exp_context, subset_name))

    log_str = f"{basename(output_file)}\n" f"========================\n"

    log_str += get_logstr_from_dict(out_dict)
    logging.getLogger(__name__).info(log_str)

    mlflow.log_dict(out_dict, output_file)
    exp_context.write_yaml(out_dict, output_file)
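
The flow above can be sketched without niceml installed. This is a minimal illustration under stated assumptions, not the library's implementation: the folder and file-name values are placeholders, because the actual values of ExperimentFilenames.PREDICTION_FOLDER, ANALYSIS_FOLDER and ANALYSIS_FILE are not shown on this page, and the two metric callables are hypothetical. It shows how the input path is derived from parq_file_prefix and subset_name, and how the per-metric dicts are merged with dict.update.

```python
from os.path import basename, join

# Placeholder values (assumptions): the real constants are defined in
# ExperimentFilenames and are not shown on this page.
PREDICTION_FOLDER = "predictions"
ANALYSIS_FOLDER = "analysis"
ANALYSIS_FILE = "results_{subset_name}.yaml"


def build_paths(parq_file_prefix: str, subset_name: str) -> tuple:
    """Mirror the input/output path construction done in __call__."""
    input_file = join(PREDICTION_FOLDER, f"{parq_file_prefix}{subset_name}.parq")
    output_file = join(ANALYSIS_FOLDER, ANALYSIS_FILE.format(subset_name=subset_name))
    return input_file, output_file


def merge_metric_results(metrics, data_frame, exp_context, subset_name) -> dict:
    """Call every metric and merge the returned dicts in list order."""
    out_dict = {}
    for met in metrics:
        out_dict.update(met(data_frame, exp_context, subset_name))
    return out_dict


# Hypothetical metrics; any callable with this signature fits the loop.
def row_count(data, exp_context, subset_name):
    return {"rows": len(data)}


def constant_score(data, exp_context, subset_name):
    # Reuses the "rows" key on purpose to show the overwrite behaviour.
    return {"rows": -1, "score": 0.5}


input_file, output_file = build_paths("pred_", "test")
merged = merge_metric_results([row_count, constant_score], [1, 2, 3], None, "test")
print(basename(input_file), basename(output_file), merged)
```

Because the results are merged with dict.update, a later metric overwrites an earlier one when both return the same key, so metrics that are combined in one analyzer should use distinct keys.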
initialize
initialize(data_description)

Initializes the analyzer and every metric in self.df_metrics. This function is called once before the first evaluation call. It can be used to initialize any variables that are needed for evaluation. The data_description parameter contains information about the data set, such as the number of classes and the feature names.

Parameters:

  • data_description (DataDescription) –

    DataDescription used to initialize instances of this class and the metrics

Source code in niceml/mlcomponents/resultanalyzers/dataframes/dfanalyzer.py
def initialize(self, data_description: DataDescription):
    """
    Initializes the analyzer and every metric in `self.df_metrics`.
    This function is called once before the first evaluation call.
    It can be used to initialize any variables that are needed
    for evaluation. The data_description parameter contains information about the
    data set, such as the number of classes and the feature names.

    Args:
        data_description: `DataDescription` used to initialize instances of
                            this class and the metrics
    """
    super().initialize(data_description)
    for cur_metric in self.df_metrics:
        cur_metric.initialize(data_description)
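
The propagation performed by initialize can be sketched with stand-in classes. Everything here is hypothetical: MetricSketch and AnalyzerSketch are not niceml classes, the dict stands in for a DataDescription, and storing the description on the analyzer is an assumption about what super().initialize does.

```python
class MetricSketch:
    """Stand-in for a DfMetric; only initialize is modeled."""

    def __init__(self, name: str):
        self.name = name
        self.data_description = None

    def initialize(self, data_description):
        self.data_description = data_description


class AnalyzerSketch:
    """Stand-in for DataframeAnalyzer; only initialize is modeled."""

    def __init__(self, metrics):
        self.df_metrics = list(metrics)
        self.data_description = None

    def initialize(self, data_description):
        # Assumption: this line stands in for super().initialize(...).
        self.data_description = data_description
        # Hand the same description object to every metric.
        for cur_metric in self.df_metrics:
            cur_metric.initialize(data_description)


# Hypothetical placeholder for a DataDescription instance.
description = {"classes": ["cat", "dog"]}
metrics = [MetricSketch("a"), MetricSketch("b")]
analyzer = AnalyzerSketch(metrics)
analyzer.initialize(description)

# Every metric now holds the same description object as the analyzer.
shared = all(m.data_description is description for m in metrics)
print(shared)
```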

DfMetric

Bases: ABC

Abstract base class for a metric that is calculated on a dataframe

Functions
__call__ abstractmethod
__call__(data, exp_context, dataset_name)

Calculates the metric for the given data and returns a dict with the results

Source code in niceml/mlcomponents/resultanalyzers/dataframes/dfanalyzer.py
@abstractmethod
def __call__(
    self, data: pd.DataFrame, exp_context: ExperimentContext, dataset_name: str
) -> dict:
    """Calculates the metric for the given data and returns a dict with the results"""
initialize
initialize(data_description)

Initializes the metric with a data_description

Source code in niceml/mlcomponents/resultanalyzers/dataframes/dfanalyzer.py
def initialize(self, data_description: DataDescription):
    """Initializes the metric with a data_description"""
    self.data_description = data_description
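
A concrete metric only has to implement __call__ and return a dict. The sketch below is an illustration under assumptions, not part of niceml: DfMetricSketch is a stand-in that mirrors the interface shown above so the example runs without the library, and AccuracyMetric, its column names, and the "accuracy" key are hypothetical. A plain dict of lists stands in for the pd.DataFrame to keep the example dependency-free; indexing a DataFrame by column name and iterating over the result works the same way.

```python
from abc import ABC, abstractmethod


class DfMetricSketch(ABC):
    """Stand-in mirroring the DfMetric interface shown above."""

    @abstractmethod
    def __call__(self, data, exp_context, dataset_name: str) -> dict:
        """Calculate the metric and return a dict with the results."""

    def initialize(self, data_description):
        self.data_description = data_description


class AccuracyMetric(DfMetricSketch):
    """Hypothetical metric: share of rows where prediction equals target."""

    def __init__(self, target_col: str, prediction_col: str):
        self.target_col = target_col
        self.prediction_col = prediction_col

    def __call__(self, data, exp_context, dataset_name: str) -> dict:
        targets = list(data[self.target_col])
        predictions = list(data[self.prediction_col])
        if not targets:
            # Avoid a division by zero on an empty subset.
            return {"accuracy": 0.0}
        hits = sum(t == p for t, p in zip(targets, predictions))
        return {"accuracy": hits / len(targets)}


# Dict of lists standing in for a dataframe with two columns.
table = {"label": [1, 0, 1, 1], "prediction": [1, 0, 0, 1]}
metric = AccuracyMetric("label", "prediction")
result = metric(table, exp_context=None, dataset_name="test")
print(result)
```

Since __call__ is abstract, the base class itself cannot be instantiated; only subclasses that implement it can be passed to an analyzer.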

Functions