Overview of DagsterJobs

Job: job_train

Job for training an experiment

graph LR
  experiment --> train;
  acquire_locks --> train;
  train --> prediction;
  train --> prediction;
  prediction --> analysis;
  prediction --> analysis;
  prediction --> analysis;
  analysis --> release_locks;
  analysis --> exptests;
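
The graph above fixes the order in which the ops of job_train can run. As a rough illustration only (this is not how Dagster itself schedules ops), the same edges can be fed to the standard library's topological sorter. The names `EDGES` and `execution_order` are made up for this sketch.

```python
from graphlib import TopologicalSorter

# Edges of job_train as listed in the graph above (upstream, downstream).
# Repeated edges in the diagram are collapsed here.
EDGES = [
    ("experiment", "train"),
    ("acquire_locks", "train"),
    ("train", "prediction"),
    ("prediction", "analysis"),
    ("analysis", "release_locks"),
    ("analysis", "exptests"),
]


def execution_order(edges):
    """Return one valid op order in which every op follows its upstream ops."""
    sorter = TopologicalSorter()
    for upstream, downstream in edges:
        # add(node, *predecessors): downstream depends on upstream
        sorter.add(downstream, upstream)
    return list(sorter.static_order())


order = execution_order(EDGES)
print(order)
```

Any order returned this way runs experiment and acquire_locks before train, and release_locks and exptests only after analysis.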

Op: acquire_locks

op for acquiring locks

ConfigKey Description
filelock_dict Abstract base class for file locks.

Op: experiment

This Op creates the experiment params

ConfigKey Description
exp_folder_pattern Folder pattern of the experiment. Can use $RUN_ID and $SHORT_ID to make the name unique
exp_out_location Folder to store the experiments
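
A minimal sketch of how the $RUN_ID and $SHORT_ID placeholders in exp_folder_pattern could be filled in. It assumes the placeholders behave like `string.Template` variables; the function name `resolve_folder_name` and the example values are hypothetical and not taken from the library.

```python
from string import Template


def resolve_folder_name(pattern: str, run_id: str, short_id: str) -> str:
    """Fill $RUN_ID and $SHORT_ID into a folder pattern (illustrative only)."""
    # safe_substitute leaves unknown placeholders untouched instead of raising.
    return Template(pattern).safe_substitute(RUN_ID=run_id, SHORT_ID=short_id)


name = resolve_folder_name("exp_$SHORT_ID-$RUN_ID", run_id="1f3a9c", short_id="ab12")
print(name)  # exp_ab12-1f3a9c
```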

Op: train

DagsterOp that trains the model

ConfigKey Description
data_description This class is used to describe the data. E.g. how big the input image size is, or what target classes are used.
data_train Dataset to load, transform, shuffle the data before training
data_validation Dataset to load, transform, shuffle the data before training
exp_initializer This class creates the first folder and files for an experiment
learner Wrapper to do the training
model ABC for model factories. Used to create the model before training
remove_key_list These keys are removed from any config recursively before it is saved.
train_params TrainParams are used to select the amount of steps and epochs for training
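
The remove_key_list option describes a recursive filter over a config. The following is a sketch of that idea under the assumption that a config is a nesting of dicts and lists; `remove_keys` is a hypothetical helper, not the library's implementation.

```python
from typing import Any, Iterable


def remove_keys(config: Any, remove_key_list: Iterable[str]) -> Any:
    """Return a copy of config with the listed keys dropped at every nesting level."""
    blocked = set(remove_key_list)
    if isinstance(config, dict):
        return {
            key: remove_keys(value, blocked)
            for key, value in config.items()
            if key not in blocked
        }
    if isinstance(config, list):
        return [remove_keys(item, blocked) for item in config]
    return config  # scalars are returned unchanged


cfg = {
    "model": {"name": "m", "password": "x"},
    "runs": [{"id": 1, "password": "y"}],
    "password": "z",
}
cleaned = remove_keys(cfg, ["password"])
print(cleaned)  # {'model': {'name': 'm'}, 'runs': [{'id': 1}]}
```

Returning a copy rather than mutating in place keeps the running config intact while only the saved version is filtered.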

Op: prediction

Dagster op that runs predictions with the stored model on the given datasets

ConfigKey Description
datasets Dataset to load, transform, shuffle the data before training
model_loader Callable that loads models
prediction_function Abstract class for prediction functions
prediction_handler Abstract PredictionHandler class to implement your own prediction handler
prediction_steps If None the whole datasets are processed. Otherwise only prediction_steps are evaluated.
remove_key_list These keys are removed from any config recursively before it is saved.
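
The prediction_steps semantics (None means the whole dataset, otherwise only that many steps) map directly onto `itertools.islice`. This is a sketch assuming a dataset can be iterated batch by batch; `limit_steps` is a hypothetical name.

```python
from itertools import islice
from typing import Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def limit_steps(batches: Iterable[T], prediction_steps: Optional[int]) -> Iterator[T]:
    """Yield every batch if prediction_steps is None, else only the first N."""
    # islice with a stop of None iterates until the input is exhausted.
    return islice(batches, prediction_steps)


all_batches = list(limit_steps(range(5), None))
first_two = list(limit_steps(range(5), 2))
print(all_batches, first_two)  # [0, 1, 2, 3, 4] [0, 1]
```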

Op: analysis

This Dagster op analyzes the predictions previously produced by the model

ConfigKey Description
remove_key_list These keys are removed from any config recursively before it is saved.
result_analyzer After the prediction is done all data can be analyzed with a specific implementation of the ResultAnalyzer

Op: release_locks

op for releasing locks

ConfigKey Description

Op: exptests

op to run the experiment tests

ConfigKey Description
remove_key_list These keys are removed from any config recursively before it is saved.
tests Class to execute a list of ExperimentTests

Job: job_eval

Job for evaluating an experiment

graph LR
  localize_experiment --> eval_copy_exp;
  eval_copy_exp --> prediction;
  acquire_locks --> prediction;
  prediction --> analysis;
  prediction --> analysis;
  prediction --> analysis;
  analysis --> release_locks;
  analysis --> exptests;

Op: acquire_locks

op for acquiring locks

ConfigKey Description
filelock_dict Abstract base class for file locks.

Op: localize_experiment

This op localizes the experiment and returns the experiment context

ConfigKey Description
existing_experiment Used to define the experiment id. This is an alphanumeric string of length 4
exp_folder_pattern Unused. Only required to simplify configuration
exp_out_location Folder to store the experiments

Op: eval_copy_exp

Copy an experiment from one location to another.

ConfigKey Description
description Description of the experiment. Replaces the training description
exp_folder_pattern Folder pattern of the experiment. Can use $RUN_ID and $SHORT_ID to make the name unique
exp_out_location Folder to store the experiments

Op: prediction

Dagster op that runs predictions with the stored model on the given datasets

ConfigKey Description
datasets Dataset to load, transform, shuffle the data before training
model_loader Callable that loads models
prediction_function Abstract class for prediction functions
prediction_handler Abstract PredictionHandler class to implement your own prediction handler
prediction_steps If None the whole datasets are processed. Otherwise only prediction_steps are evaluated.
remove_key_list These keys are removed from any config recursively before it is saved.

Op: analysis

This Dagster op analyzes the predictions previously produced by the model

ConfigKey Description
remove_key_list These keys are removed from any config recursively before it is saved.
result_analyzer After the prediction is done all data can be analyzed with a specific implementation of the ResultAnalyzer

Op: release_locks

op for releasing locks

ConfigKey Description

Op: exptests

op to run the experiment tests

ConfigKey Description
remove_key_list These keys are removed from any config recursively before it is saved.
tests Class to execute a list of ExperimentTests

Job: job_copy_exp

Copy an experiment from one location to another

Op: copy_exp

Copy an experiment from one location to another.

ConfigKey Description
experiment_id Alphanumeric 4-char string to identify the experiment
input_loc_name Name of the input location resource
output_loc_name Name of the output location resource
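
The experiment id is described as an alphanumeric 4-character string. A check for that shape could look like the sketch below. Whether the library accepts both upper and lower case is an assumption here, and `is_valid_experiment_id` is a hypothetical helper.

```python
import re

# Assumption: exactly four ASCII letters or digits, either case.
_EXPERIMENT_ID = re.compile(r"[A-Za-z0-9]{4}")


def is_valid_experiment_id(value: str) -> bool:
    """Return True if value is exactly four alphanumeric ASCII characters."""
    return _EXPERIMENT_ID.fullmatch(value) is not None


print(is_valid_experiment_id("ab12"))  # True
print(is_valid_experiment_id("ab-1"))  # False
```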

Job: job_data_generation

Job for data generation

graph LR
  data_generation --> split_data;
  split_data --> crop_numbers;
  crop_numbers --> image_to_tabular_data;
  image_to_tabular_data --> df_normalization;

Op: data_generation

Generates a random test image dataset based on a given data_generator

ConfigKey Description
data_generator Generator of images with numbers for an object detection test dataset

Op: split_data

Splits the data in input_location into subsets (set_infos)

ConfigKey Description
clear_folder Flag if the output folder should be cleared before the split.
max_split Maximum split of the name (e.g. 1)
name_delimiter Character to separate names.
output_location Folder to save the split images
recursive Flag if the input folder should be searched recursively.
set_infos Information on how to split the data
sub_dir Subdirectory to save the split images
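
One plausible reading of name_delimiter and max_split is that they are passed to a string split on the file name. The sketch below shows that reading only. How split_data actually uses the resulting parts is not stated here, and `split_name` and the example file name are hypothetical.

```python
def split_name(file_name: str, name_delimiter: str = "_", max_split: int = 1) -> list[str]:
    """Split a file name at the delimiter, performing at most max_split splits."""
    return file_name.split(name_delimiter, max_split)


parts = split_name("cat_001_left.png")
print(parts)  # ['cat', '001_left.png']
```

With max_split set to 1, only the first delimiter is used, so the rest of the name stays in one piece.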

Op: crop_numbers

Crops the numbers from the input images and stores them separately

ConfigKey Description
clear_folder Flag if the output folder should be cleared before the split
name_delimiter Delimiter used within the filenames
output_location Foldername where the images are stored
recursive Flag if the input folder should be searched recursively
sub_dir Subdirectory to save the split images

Op: image_to_tabular_data

The image_to_tabular_data function takes in a location of images and converts them to tabular data.

Args:
  context: OpExecutionContext: Pass in the configuration of the operation
  input_location: dict: Specify the location of the input data

Returns: The output_location where the parquet files with the table values are stored. The files are still divided into test, train and validation.

ConfigKey Description
clear_folder Flag if the output folder should be cleared before the split
name_delimiter Delimiter used within the filenames
output_location Foldername where the images are stored
recursive Flag if the input folder should be searched recursively
sub_dir Subdirectory to save the split images
target_image_size Image size to which the images should be scaled
use_dirs_as_subsets Flag if the subdirectories should be used as subset names

Op: df_normalization

The df_normalization function takes a dataframe and normalizes the features listed in scalar_feature_keys, categorical_feature_keys and binary_feature_keys. Each of these parameters can be either a list of feature keys or a function that returns such a list. The op writes normalized parquet files in which all columns named in the feature keys are normalized, along with an output YAML file describing how each feature was normalized. The input parquet files are read from input_parq_location; the normalized dataframes and the norm info YAML are saved to output_parq_location.

Args:
  context: OpExecutionContext: Get the op_config
  input_location: dict: Specify the location of the input data

Returns: The output_parq_location, which is the location of the normalized parquet files and norm info

ConfigKey Description
binary_feature_keys Column names to be normalized with binary values (list or function)
categorical_feature_keys Column names to be normalized with categorical values (list or function)
output_norm_feature_info_file_name File name for the file containing the normalization information of the features
output_parq_location Target location for the normalized parq files
recursive Flag if the input folder should be searched recursively
scalar_feature_keys Column names to be normalized with scalar values (list or function)
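
The feature key options accept either a list or a function returning a list. The sketch below illustrates that convention together with one possible scalar normalization. Standardization to zero mean and unit standard deviation is an assumption, since the method used by df_normalization is not stated here. Rows are plain dicts instead of a dataframe, and `resolve_keys` and `normalize_scalars` are hypothetical names.

```python
from statistics import mean, pstdev
from typing import Callable, Union

FeatureKeys = Union[list[str], Callable[[], list[str]]]


def resolve_keys(keys: FeatureKeys) -> list[str]:
    """Accept either a list of column names or a function returning one."""
    return list(keys()) if callable(keys) else list(keys)


def normalize_scalars(rows: list[dict], scalar_feature_keys: FeatureKeys):
    """Standardize the given columns; return new rows plus per-column norm info."""
    norm_info = {}
    out = [dict(row) for row in rows]  # copy so the input rows stay untouched
    for key in resolve_keys(scalar_feature_keys):
        values = [row[key] for row in rows]
        mu, sigma = mean(values), pstdev(values)
        norm_info[key] = {"mean": mu, "std": sigma}
        for row in out:
            # Constant columns (std == 0) are mapped to 0.0 to avoid division by zero.
            row[key] = (row[key] - mu) / sigma if sigma else 0.0
    return out, norm_info


rows = [{"x": 1.0, "label": "a"}, {"x": 3.0, "label": "b"}]
normalized, info = normalize_scalars(rows, lambda: ["x"])
print(normalized)  # [{'x': -1.0, 'label': 'a'}, {'x': 1.0, 'label': 'b'}]
print(info)        # {'x': {'mean': 2.0, 'std': 1.0}}
```

The returned `info` dict plays the role of the norm info file: it records the statistics needed to apply the same normalization to new data.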