Overview of Dagster Jobs¶
Job: job_train¶
Job for training an experiment
```mermaid
graph LR
    experiment --> train;
    acquire_locks --> train;
    train --> prediction;
    prediction --> analysis;
    analysis --> release_locks;
    analysis --> exptests;
```
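The dependency chain above can be sketched with plain Python functions (all names and return values are hypothetical stand-ins for the real Dagster ops, shown only to illustrate the data flow):

```python
# Minimal sketch of the job_train data flow; every function is a made-up
# stand-in for the corresponding Dagster op.

def experiment():
    return {"exp_id": "a1b2"}            # experiment parameters

def acquire_locks():
    return {"lock": True}                # lock handle

def train(exp, locks):
    return {"model": "trained", **exp}   # trained model + experiment context

def prediction(model):
    return {"preds": [0.1, 0.9], **model}

def analysis(preds):
    return {"report": "ok", **preds}

def release_locks(report):
    return report

def exptests(report):
    return report

# Wire the ops together in dependency order, as in the graph above.
exp = experiment()
locks = acquire_locks()
model = train(exp, locks)
preds = prediction(model)
report = analysis(preds)
release_locks(report)
result = exptests(report)
```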
Op: acquire_locks¶
Op for acquiring locks
| ConfigKey | Description |
|---|---|
| filelock_dict | Abstract base class for file locks. |
Op: experiment¶
This op creates the experiment parameters
| ConfigKey | Description |
|---|---|
| exp_folder_pattern | Folder pattern of the experiment. Can use `$RUN_ID` and `$SHORT_ID` to make the name unique. |
| exp_out_location | Folder to store the experiments. |
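The `$RUN_ID` and `$SHORT_ID` placeholders in `exp_folder_pattern` follow the `$NAME` convention of Python's `string.Template`; a small sketch of how such a pattern could be expanded (the pattern and ids below are made up):

```python
from string import Template

# Hypothetical folder pattern using the documented placeholders.
pattern = Template("exp_${SHORT_ID}_${RUN_ID}")

# Substitute illustrative ids to get a unique folder name.
folder = pattern.substitute(RUN_ID="20240101-120000", SHORT_ID="a1b2")
```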
Op: train¶
Dagster op that trains the model
| ConfigKey | Description |
|---|---|
| data_description | Describes the data, e.g. the input image size or the target classes used. |
| data_train | Dataset to load, transform, and shuffle the data before training. |
| data_validation | Dataset to load, transform, and shuffle the data before training. |
| exp_initializer | Creates the first folder and files for an experiment. |
| learner | Wrapper that performs the training. |
| model | ABC for model factories. Used to create the model before training. |
| remove_key_list | These keys are removed recursively from any config before it is saved. |
| train_params | TrainParams select the number of steps and epochs for training. |
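A run config for the train op might be shaped like the following; every value is illustrative, since the actual schema depends on the concrete implementations behind each key:

```python
# Illustrative run-config fragment for the train op; all values are made up.
train_config = {
    "ops": {
        "train": {
            "config": {
                "data_description": {"image_size": [256, 256], "classes": ["0", "1"]},
                "data_train": {"location": "data/train"},
                "data_validation": {"location": "data/val"},
                "exp_initializer": {},
                "learner": {"optimizer": "adam"},
                "model": {"name": "small_cnn"},
                "remove_key_list": ["password", "token"],
                "train_params": {"epochs": 10, "steps_per_epoch": 100},
            }
        }
    }
}
```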
Op: prediction¶
Dagster op that runs predictions with the stored model on the given datasets
| ConfigKey | Description |
|---|---|
| datasets | Dataset to load, transform, and shuffle the data before training. |
| model_loader | Callable that loads models. |
| prediction_function | Abstract class for prediction functions. |
| prediction_handler | Abstract PredictionHandler class to implement your own prediction handler. |
| prediction_steps | If None, the whole datasets are processed; otherwise only prediction_steps are evaluated. |
| remove_key_list | These keys are removed recursively from any config before it is saved. |
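The documented `prediction_steps` behaviour (`None` processes everything, otherwise only the first N steps are evaluated) can be sketched with `itertools.islice`; the helper name is made up:

```python
from itertools import islice

def take_steps(dataset, prediction_steps=None):
    """Return the whole dataset when prediction_steps is None,
    else only the first prediction_steps items (illustrative helper)."""
    if prediction_steps is None:
        return list(dataset)
    return list(islice(dataset, prediction_steps))

all_items = take_steps(range(10))        # whole dataset
few_items = take_steps(range(10), 3)     # only the first 3 steps
```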
Op: analysis¶
This Dagster op analyzes the predictions produced by the model
| ConfigKey | Description |
|---|---|
| remove_key_list | These keys are removed recursively from any config before it is saved. |
| result_analyzer | After prediction, all data can be analyzed with a specific implementation of the ResultAnalyzer. |
Op: release_locks¶
Op for releasing locks
| ConfigKey | Description |
|---|---|
Op: exptests¶
Op to run the experiment tests
| ConfigKey | Description |
|---|---|
| remove_key_list | These keys are removed recursively from any config before it is saved. |
| tests | Class to execute a list of ExperimentTests. |
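Several of the ops above share the `remove_key_list` setting; its described behaviour (recursively dropping the listed keys from the config before it is saved) can be sketched as:

```python
def remove_keys(config, remove_key_list):
    """Recursively drop the listed keys from nested dicts/lists
    (a sketch of the documented remove_key_list behaviour)."""
    if isinstance(config, dict):
        return {k: remove_keys(v, remove_key_list)
                for k, v in config.items() if k not in remove_key_list}
    if isinstance(config, list):
        return [remove_keys(v, remove_key_list) for v in config]
    return config

# "token" is removed at every nesting level before saving.
cleaned = remove_keys({"model": {"token": "x", "name": "cnn"}, "token": "y"},
                      ["token"])
```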
Job: job_eval¶
Job for evaluating an experiment
```mermaid
graph LR
    localize_experiment --> eval_copy_exp;
    eval_copy_exp --> prediction;
    acquire_locks --> prediction;
    prediction --> analysis;
    analysis --> release_locks;
    analysis --> exptests;
```
Op: acquire_locks¶
Op for acquiring locks
| ConfigKey | Description |
|---|---|
| filelock_dict | Abstract base class for file locks. |
Op: localize_experiment¶
This op localizes the experiment and returns the experiment context
| ConfigKey | Description |
|---|---|
| existing_experiment | Defines the experiment id: an alphanumeric string of length 4. |
| exp_folder_pattern | Unused; only required for easier configuration. |
| exp_out_location | Folder to store the experiments. |
Op: eval_copy_exp¶
Copies an experiment from one location to another.
| ConfigKey | Description |
|---|---|
| description | Description of the experiment. Replaces the training description. |
| exp_folder_pattern | Folder pattern of the experiment. Can use `$RUN_ID` and `$SHORT_ID` to make the name unique. |
| exp_out_location | Folder to store the experiments. |
Op: prediction¶
Dagster op that runs predictions with the stored model on the given datasets
| ConfigKey | Description |
|---|---|
| datasets | Dataset to load, transform, and shuffle the data before training. |
| model_loader | Callable that loads models. |
| prediction_function | Abstract class for prediction functions. |
| prediction_handler | Abstract PredictionHandler class to implement your own prediction handler. |
| prediction_steps | If None, the whole datasets are processed; otherwise only prediction_steps are evaluated. |
| remove_key_list | These keys are removed recursively from any config before it is saved. |
Op: analysis¶
This Dagster op analyzes the predictions produced by the model
| ConfigKey | Description |
|---|---|
| remove_key_list | These keys are removed recursively from any config before it is saved. |
| result_analyzer | After prediction, all data can be analyzed with a specific implementation of the ResultAnalyzer. |
Op: release_locks¶
Op for releasing locks
| ConfigKey | Description |
|---|---|
Op: exptests¶
Op to run the experiment tests
| ConfigKey | Description |
|---|---|
| remove_key_list | These keys are removed recursively from any config before it is saved. |
| tests | Class to execute a list of ExperimentTests. |
Job: job_copy_exp¶
Copy an experiment from one location to another
Op: copy_exp¶
Copies an experiment from one location to another.
| ConfigKey | Description |
|---|---|
| experiment_id | Alphanumeric 4-char string to identify the experiment. |
| input_loc_name | Name of the input location resource. |
| output_loc_name | Name of the output location resource. |
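A run config for this op might look like the following; the resource names and experiment id are made up, since the available location resources depend on the deployment:

```python
# Illustrative run config for job_copy_exp; resource names and ids are made up.
copy_exp_config = {
    "ops": {
        "copy_exp": {
            "config": {
                "experiment_id": "a1b2",                 # alphanumeric 4-char id
                "input_loc_name": "local_experiments",   # hypothetical input resource
                "output_loc_name": "archive_experiments",  # hypothetical output resource
            }
        }
    }
}
```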
Job: job_data_generation¶
Job for data generation
```mermaid
graph LR
    data_generation --> split_data;
    split_data --> crop_numbers;
    crop_numbers --> image_to_tabular_data;
    image_to_tabular_data --> df_normalization;
```
Op: data_generation¶
Generates random test image dataset based on a given data_generator
| ConfigKey | Description |
|---|---|
| data_generator | Generator of images with numbers for an object detection test dataset. |
Op: split_data¶
Splits the data in input_location into subsets (set_infos)
| ConfigKey | Description |
|---|---|
| clear_folder | Flag if the output folder should be cleared before the split. |
| max_split | Maximum number of name splits (e.g. 1). |
| name_delimiter | Character used to separate name parts. |
| output_location | Folder to save the split images. |
| recursive | Flag if the input folder should be searched recursively. |
| set_infos | Split information describing how to split the data. |
| sub_dir | Subdirectory to save the split images. |
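The `name_delimiter`/`max_split` pair mirrors the arguments of Python's `str.split`; a small sketch with a made-up filename:

```python
# str.split's maxsplit argument corresponds to the documented max_split setting.
name_delimiter = "_"
max_split = 1

# With max_split=1 only the first delimiter is split on, so the rest of the
# name stays intact.
parts = "cat_0001_train.png".split(name_delimiter, max_split)
```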
Op: crop_numbers¶
Crops the numbers from the input images and stores them separately
| ConfigKey | Description |
|---|---|
| clear_folder | Flag if the output folder should be cleared before the split. |
| name_delimiter | Delimiter used within the filenames. |
| output_location | Folder name where the images are stored. |
| recursive | Flag if the input folder should be searched recursively. |
| sub_dir | Subdirectory to save the split images. |
Op: image_to_tabular_data¶
The image_to_tabular_data function takes a location of images and converts them to tabular data.

Args:
- context (OpExecutionContext): passes in the configuration of the operation
- input_location (dict): specifies the location of the input data

Returns: the output_location where the parquet files with the table values are stored. The files remain divided into test, train, and validation.
| ConfigKey | Description |
|---|---|
| clear_folder | Flag if the output folder should be cleared before the split. |
| name_delimiter | Delimiter used within the filenames. |
| output_location | Folder name where the images are stored. |
| recursive | Flag if the input folder should be searched recursively. |
| sub_dir | Subdirectory to save the split images. |
| target_image_size | Image size to which the images should be scaled. |
| use_dirs_as_subsets | Flag if the subdirectories should be used as subset names. |
Op: df_normalization¶
The df_normalization function takes a dataframe and normalizes the features specified in scalar_feature_keys, categorical_feature_keys, and binary_feature_keys. Each of these parameters can be either a list of feature keys or a function that returns such a list. The function writes a parquet file with all specified columns normalized, as well as a yaml file describing how each feature was normalized. input_parq_location is where the input parquet files are located, while output_parq_location is where the new dataframes and the norm-info yaml are saved.

Args:
- context (OpExecutionContext): provides the op_config
- input_location (dict): specifies the location of the input data

Returns: the output_parq_location, i.e. the location of the normalized parquet files and the norm info.
| ConfigKey | Description |
|---|---|
| binary_feature_keys | Column names to be normalized as binary values (list or function). |
| categorical_feature_keys | Column names to be normalized as categorical values (list or function). |
| output_norm_feature_info_file_name | File name for the file containing the normalization information of the features. |
| output_parq_location | Target location for the normalized parquet files. |
| recursive | Flag if the input folder should be searched recursively. |
| scalar_feature_keys | Column names to be normalized as scalar values (list or function). |
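A minimal sketch of scalar feature normalization plus the norm-info record the op is said to write; the function name and norm-info fields are made up, and a real implementation would operate on parquet files/DataFrames rather than plain lists:

```python
def normalize_scalar(values):
    """Normalize a scalar column to zero mean divided by its value range,
    and record how it was normalized (illustrative, not the op's actual scheme)."""
    mean = sum(values) / len(values)
    spread = (max(values) - min(values)) or 1.0  # avoid division by zero
    normalized = [(v - mean) / spread for v in values]
    # This record corresponds to the norm-info yaml the op is described as writing.
    norm_info = {"type": "scalar", "mean": mean, "spread": spread}
    return normalized, norm_info

normalized, norm_info = normalize_scalar([0.0, 5.0, 10.0])
```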