Overview of DagsterJobs
Job: job_train
Job for training an experiment
graph LR
experiment --> train;
acquire_locks --> train;
train --> prediction;
train --> prediction;
prediction --> analysis;
prediction --> analysis;
prediction --> analysis;
analysis --> release_locks;
analysis --> exptests;
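In the graph, each op points to the ops that depend on it. An op can start once all of its upstream ops have finished. The sketch below derives one valid execution order for job_train using Python's standard `graphlib` module (Python 3.9+). It is only an illustration: Dagster schedules the ops itself. Repeated edges in the graph do not change the order, so the sketch lists each dependency once. The names `EDGES` and `execution_order` are hypothetical.

```python
from graphlib import TopologicalSorter

# Dependencies of job_train, taken from the graph above (each listed once).
EDGES = [
    ("experiment", "train"),
    ("acquire_locks", "train"),
    ("train", "prediction"),
    ("prediction", "analysis"),
    ("analysis", "release_locks"),
    ("analysis", "exptests"),
]


def execution_order(edges):
    """Return one valid order in which the ops can run."""
    predecessors = {}
    for upstream, downstream in edges:
        # TopologicalSorter expects a mapping: node -> set of predecessor nodes
        predecessors.setdefault(downstream, set()).add(upstream)
        predecessors.setdefault(upstream, set())
    return list(TopologicalSorter(predecessors).static_order())


if __name__ == "__main__":
    print(execution_order(EDGES))
```

Some ops have no path between them, such as experiment and acquire_locks, or release_locks and exptests. Those ops may run in either order or in parallel.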
Op: acquire_locks
Op for acquiring locks.
ConfigKey | Description |
---|---|
filelock_dict | Dictionary of file locks (implementations of the abstract base class for file locks). |
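This overview does not show the project's own lock classes. The code below is a minimal sketch of acquiring and releasing a dictionary of file locks. It defines a hypothetical `FileLock` base class and a hypothetical `ExclusiveCreateLock` implementation, which atomically creates a lock file with `os.O_CREAT | os.O_EXCL`. The class names and the all-or-nothing acquisition are assumptions, not the project's API.

```python
import os
from abc import ABC, abstractmethod
from typing import Dict


class FileLock(ABC):
    """Hypothetical abstract file lock (stand-in for the project's base class)."""

    @abstractmethod
    def acquire(self) -> bool:
        """Try to take the lock; return False if it is already held."""

    @abstractmethod
    def release(self) -> None:
        """Give the lock back."""


class ExclusiveCreateLock(FileLock):
    """Lock held while a lock file exists; creating it with O_EXCL is atomic."""

    def __init__(self, path: str):
        self.path = path

    def acquire(self) -> bool:
        try:
            fd = os.open(self.path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            return False  # another process holds the lock
        os.close(fd)
        return True

    def release(self) -> None:
        os.remove(self.path)


def acquire_locks(filelock_dict: Dict[str, FileLock]) -> None:
    """Take every lock or none: on failure, release the locks already taken."""
    taken = []
    for name, lock in filelock_dict.items():
        if not lock.acquire():
            for held in reversed(taken):
                held.release()
            raise RuntimeError(f"lock {name!r} is already held")
        taken.append(lock)


def release_locks(filelock_dict: Dict[str, FileLock]) -> None:
    """Release every lock in the dictionary."""
    for lock in filelock_dict.values():
        lock.release()
```

In job_train, release_locks runs after analysis, so the locks stay held from training through analysis.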
Op: experiment
This op creates the experiment parameters.
ConfigKey | Description |
---|---|
exp_folder_pattern | Folder name pattern for the experiment. May contain $RUN_ID and $SHORT_ID to make the name unique. |
exp_out_location | Folder in which the experiments are stored. |
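The exact substitution code is not documented here. Assuming the placeholders behave like Python's `string.Template` variables, the sketch below shows how exp_folder_pattern and exp_out_location could combine into an experiment folder path. The `experiment_folder` function and its `run_id` and `short_id` arguments are hypothetical stand-ins for the values available at run time.

```python
import os
from string import Template


def experiment_folder(exp_out_location: str, exp_folder_pattern: str,
                      run_id: str, short_id: str) -> str:
    """Expand $RUN_ID and $SHORT_ID in the pattern and join it to the output folder."""
    # safe_substitute leaves unknown $placeholders untouched instead of raising
    folder_name = Template(exp_folder_pattern).safe_substitute(
        RUN_ID=run_id, SHORT_ID=short_id
    )
    return os.path.join(exp_out_location, folder_name)
```

With `string.Template`, a placeholder directly followed by letters, digits, or underscores must be written as `${SHORT_ID}`. This page does not say whether the project has the same requirement.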
Op: train
Dagster op that trains the model.
ConfigKey | Description |
---|---|
data_description | Describes the data, e.g. the input image size or the target classes used. |
data_train | Dataset that loads, transforms, and shuffles the training data. |
data_validation | Dataset that loads, transforms, and shuffles the validation data. |
exp_initializer | Creates the initial folders and files for an experiment. |
learner | Wrapper that performs the training. |
model | Abstract base class for model factories; used to create the model before training. |
remove_key_list | Keys that are removed recursively from any config before it is saved. |
train_params | Selects the number of steps and epochs for training. |
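Several ops use remove_key_list. The sketch below shows recursive key removal, assuming the config is a nested structure of dicts and lists. The function name `remove_keys` is hypothetical, not the project's implementation.

```python
from typing import Any, Iterable


def remove_keys(config: Any, remove_key_list: Iterable[str]) -> Any:
    """Return a copy of config without the listed keys, at any nesting depth."""
    keys = set(remove_key_list)
    if isinstance(config, dict):
        return {k: remove_keys(v, keys) for k, v in config.items() if k not in keys}
    if isinstance(config, list):
        return [remove_keys(item, keys) for item in config]
    return config  # scalars are kept unchanged
```

The function builds a new structure instead of deleting keys in place. This leaves the config used by the running op intact, and only the saved copy loses the keys.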
Op: prediction
Dagster op that runs predictions with the stored model on the given datasets.
ConfigKey | Description |
---|---|
datasets | Datasets that load, transform, and shuffle the data used for prediction. |
model_loader | Callable that loads the model. |
prediction_function | Abstract class for prediction functions. |
prediction_handler | Abstract PredictionHandler class; implement it to provide your own prediction handler. |
prediction_steps | If None, the whole datasets are processed; otherwise only prediction_steps steps are evaluated. |
remove_key_list | Keys that are removed recursively from any config before it is saved. |
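The prediction_steps behavior can be sketched as follows, assuming one step corresponds to one batch produced by the dataset. The names `run_prediction` and `predict` are hypothetical, not the op's actual interface.

```python
from itertools import islice
from typing import Callable, Iterable, List, Optional


def run_prediction(batches: Iterable, predict: Callable,
                   prediction_steps: Optional[int] = None) -> List:
    """Apply predict to each batch; stop after prediction_steps batches if set."""
    if prediction_steps is not None:
        # islice stops early without materializing the rest of the dataset
        batches = islice(batches, prediction_steps)
    return [predict(batch) for batch in batches]
```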
Op: analysis
This Dagster op analyzes the predictions previously produced by the model.
ConfigKey | Description |
---|---|
remove_key_list | Keys that are removed recursively from any config before it is saved. |
result_analyzer | Implementation of ResultAnalyzer used to analyze all data once the prediction is done. |
Op: release_locks
Op for releasing locks.
ConfigKey | Description |
---|---|
Op: exptests
Op to run the experiment tests.
ConfigKey | Description |
---|---|
remove_key_list | Keys that are removed recursively from any config before it is saved. |
tests | Class that executes a list of ExperimentTests. |
Job: job_eval
Job for evaluating an experiment.
graph LR
localize_experiment --> eval_copy_exp;
eval_copy_exp --> prediction;
acquire_locks --> prediction;
prediction --> analysis;
prediction --> analysis;
prediction --> analysis;
analysis --> release_locks;
analysis --> exptests;
Op: acquire_locks
Op for acquiring locks.
ConfigKey | Description |
---|---|
filelock_dict | Dictionary of file locks (implementations of the abstract base class for file locks). |
Op: localize_experiment
This op localizes the experiment and returns the experiment context.
ConfigKey | Description |
---|---|
existing_experiment | ID of the existing experiment: an alphanumeric string of length 4. |
exp_folder_pattern | Unused; only present to simplify the configuration. |
exp_out_location | Folder in which the experiments are stored. |
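Both existing_experiment and experiment_id in job_copy_exp are described as 4-character alphanumeric strings. The sketch below is a small format check for these IDs. It assumes "alphanumeric" means ASCII letters and digits only. The pattern and helper names are hypothetical.

```python
import re

# Assumption: only ASCII letters and digits are allowed.
EXPERIMENT_ID_PATTERN = re.compile(r"[A-Za-z0-9]{4}")


def is_valid_experiment_id(experiment_id: str) -> bool:
    """Return True if experiment_id is exactly four alphanumeric characters."""
    return EXPERIMENT_ID_PATTERN.fullmatch(experiment_id) is not None
```

Checking the ID before searching exp_out_location gives a clearer error than a failed lookup.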
Op: eval_copy_exp
Copies an experiment from one location to another.
ConfigKey | Description |
---|---|
description | Description of the experiment; replaces the training description. |
exp_folder_pattern | Folder name pattern for the experiment. May contain $RUN_ID and $SHORT_ID to make the name unique. |
exp_out_location | Folder in which the experiments are stored. |
Op: prediction
Dagster op that runs predictions with the stored model on the given datasets.
ConfigKey | Description |
---|---|
datasets | Datasets that load, transform, and shuffle the data used for prediction. |
model_loader | Callable that loads the model. |
prediction_function | Abstract class for prediction functions. |
prediction_handler | Abstract PredictionHandler class; implement it to provide your own prediction handler. |
prediction_steps | If None, the whole datasets are processed; otherwise only prediction_steps steps are evaluated. |
remove_key_list | Keys that are removed recursively from any config before it is saved. |
Op: analysis
This Dagster op analyzes the predictions previously produced by the model.
ConfigKey | Description |
---|---|
remove_key_list | Keys that are removed recursively from any config before it is saved. |
result_analyzer | Implementation of ResultAnalyzer used to analyze all data once the prediction is done. |
Op: release_locks
Op for releasing locks.
ConfigKey | Description |
---|---|
Op: exptests
Op to run the experiment tests.
ConfigKey | Description |
---|---|
remove_key_list | Keys that are removed recursively from any config before it is saved. |
tests | Class that executes a list of ExperimentTests. |
Job: job_copy_exp
Copy an experiment from one location to another
Op: copy_exp
Copies an experiment from one location to another.
ConfigKey | Description |
---|---|
experiment_id | Alphanumeric 4-character string that identifies the experiment. |
input_loc_name | Name of the input location resource. |
output_loc_name | Name of the output location resource. |
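This page does not document how copy_exp resolves the location resources or finds the experiment folder. The sketch below makes two assumptions. First, the location names map to folders; a plain dict stands in for the Dagster resources. Second, the experiment folder is the only folder whose name contains experiment_id. The `locations` argument is hypothetical.

```python
import shutil
from pathlib import Path
from typing import Dict


def copy_exp(experiment_id: str, input_loc_name: str, output_loc_name: str,
             locations: Dict[str, Path]) -> Path:
    """Copy the folder whose name contains experiment_id to the output location."""
    src_root = locations[input_loc_name]
    dst_root = locations[output_loc_name]
    matches = [p for p in src_root.iterdir() if p.is_dir() and experiment_id in p.name]
    if len(matches) != 1:
        raise ValueError(f"expected one folder for {experiment_id!r}, found {len(matches)}")
    target = dst_root / matches[0].name
    shutil.copytree(matches[0], target)  # fails if the target already exists
    return target
```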
Job: job_data_generation
Job for data generation
graph LR
data_generation --> split_data;
split_data --> crop_numbers;
crop_numbers --> image_to_tabular_data;
image_to_tabular_data --> df_normalization;
Op: data_generation
Generates a random test image dataset using the given data_generator.
ConfigKey | Description |
---|---|
data_generator | Generator of images containing numbers, used to create an object detection test dataset. |
Op: split_data
Splits the data in input_location into the subsets defined by set_infos.
ConfigKey | Description |
---|---|
clear_folder | Flag whether the output folder is cleared before the split. |
max_split | Maximum number of splits applied to the name at name_delimiter (e.g. 1). |
name_delimiter | Character used to separate the parts of a file name. |
output_location | Folder in which the split images are saved. |
recursive | Flag whether the input folder is searched recursively. |
set_infos | Information on how to split the data into subsets. |
sub_dir | Subdirectory in which the split images are saved. |
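The table does not spell out the splitting rule, so the sketch below follows one plausible reading under several assumptions. Files whose names share a prefix belong together. The prefix is what remains after removing the last max_split parts at name_delimiter (via `str.rsplit`). set_infos is modeled as a hypothetical mapping from subset name to fraction. The groups are shuffled with a fixed seed and assigned whole to subsets, so related files never land in different subsets. The function name `split_by_name` is hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List


def split_by_name(file_names: List[str], set_infos: Dict[str, float],
                  name_delimiter: str = "_", max_split: int = 1,
                  seed: int = 0) -> Dict[str, List[str]]:
    """Assign groups of related files to subsets according to set_infos fractions."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in sorted(file_names):
        stem = name.rsplit(".", 1)[0]  # drop the file extension
        prefix = stem.rsplit(name_delimiter, max_split)[0]
        groups[prefix].append(name)

    prefixes = sorted(groups)
    random.Random(seed).shuffle(prefixes)  # reproducible shuffle

    total = sum(set_infos.values())
    subsets: Dict[str, List[str]] = {}
    start = 0
    names = list(set_infos)
    for i, subset in enumerate(names):
        if i == len(names) - 1:
            end = len(prefixes)  # last subset takes the remainder
        else:
            share = round(len(prefixes) * set_infos[subset] / total)
            end = min(len(prefixes), start + share)
        subsets[subset] = [f for p in prefixes[start:end] for f in groups[p]]
        start = end
    return subsets
```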
Op: crop_numbers
Crops the numbers from the input images and stores them separately.
ConfigKey | Description |
---|---|
clear_folder | Flag whether the output folder is cleared before the op runs. |
name_delimiter | Delimiter used within the file names. |
output_location | Folder in which the images are stored. |
recursive | Flag whether the input folder is searched recursively. |
sub_dir | Subdirectory in which the images are saved. |
Op: image_to_tabular_data
The image_to_tabular_data function takes a location of images and converts the images to tabular data.
Args:
context (OpExecutionContext): Passes in the configuration of the op.
input_location (dict): Location of the input data.
Returns: The output_location where the parquet files with the table values are stored. The files are still divided into test, train, and validation subsets.
ConfigKey | Description |
---|---|
clear_folder | Flag whether the output folder is cleared before the conversion. |
name_delimiter | Delimiter used within the file names. |
output_location | Folder in which the parquet files are stored. |
recursive | Flag whether the input folder is searched recursively. |
sub_dir | Subdirectory in which the output files are saved. |
target_image_size | Image size to which the images are scaled. |
use_dirs_as_subsets | Flag whether the subdirectory names are used as subset names. |
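The conversion itself is not detailed above. The sketch below shows one common approach: scale each image to target_image_size and flatten it into one table row with a column per pixel, plus label and subset columns. It uses nearest-neighbour scaling and assumes the size is given as (height, width). Images are plain nested lists to keep the example dependency-free. The function and column names are hypothetical.

```python
from typing import Dict, List, Tuple


def resize_nearest(image: List[List[float]], size: Tuple[int, int]) -> List[List[float]]:
    """Nearest-neighbour resize of a 2D grayscale image to (height, width)."""
    height, width = size
    src_h, src_w = len(image), len(image[0])
    return [[image[r * src_h // height][c * src_w // width] for c in range(width)]
            for r in range(height)]


def image_to_row(image: List[List[float]], target_image_size: Tuple[int, int],
                 label: str, subset: str) -> Dict[str, object]:
    """Flatten a resized image into one table row with one column per pixel."""
    row: Dict[str, object] = {"label": label, "subset": subset}
    for r, line in enumerate(resize_nearest(image, target_image_size)):
        for c, value in enumerate(line):
            row[f"px_{r}_{c}"] = value
    return row
```

A list of such rows could be turned into a pandas DataFrame with `pandas.DataFrame(rows)` and written with `DataFrame.to_parquet`, which requires pyarrow or fastparquet.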
Op: df_normalization
The df_normalization function takes a dataframe and normalizes the features specified in scalar_feature_keys, categorical_feature_keys, and binary_feature_keys.
Each of these parameters can be either a list of feature keys or a function that returns the feature keys as a list. The function writes normalized parquet files in which all columns specified in the feature keys are normalized. It also writes an output YAML file describing how each feature was normalized. The input_parq_location is where the input parquet files are located, while output_parq_location is where the normalized dataframes and the normalization info YAML file are saved.
Args:
context (OpExecutionContext): Provides the op_config.
input_location (dict): Location of the input data.
Returns: The output_parq_location, i.e. the location of the normalized parquet files and the normalization info file.
ConfigKey | Description |
---|---|
binary_feature_keys | Column names to be normalized as binary values (list or function). |
categorical_feature_keys | Column names to be normalized as categorical values (list or function). |
output_norm_feature_info_file_name | Name of the file containing the normalization information of the features. |
output_parq_location | Target location for the normalized parquet files. |
recursive | Flag whether the input folder is searched recursively. |
scalar_feature_keys | Column names to be normalized as scalar values (list or function). |
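The normalization methods are not specified above, so the sketch below makes assumptions. It uses z-score scaling for scalar features, integer indices for categorical features, and 0/1 encoding for binary features. It works on a column-oriented table (a dict of lists) instead of a real dataframe. It shows the documented option of passing feature keys either as a list or as a function that returns one. It also collects per-feature information of the kind that would go into the file named by output_norm_feature_info_file_name. The function names are hypothetical.

```python
import statistics
from typing import Callable, Dict, List, Sequence, Tuple, Union

FeatureKeys = Union[Sequence[str], Callable[[], Sequence[str]]]


def resolve_keys(keys: FeatureKeys) -> List[str]:
    """Accept either a list of column names or a function returning one."""
    return list(keys() if callable(keys) else keys)


def normalize_columns(table: Dict[str, list],
                      scalar_feature_keys: FeatureKeys = (),
                      categorical_feature_keys: FeatureKeys = (),
                      binary_feature_keys: FeatureKeys = ()
                      ) -> Tuple[Dict[str, list], Dict[str, dict]]:
    """Return the normalized table and the per-feature normalization info."""
    out = dict(table)  # columns that are not listed stay unchanged
    info: Dict[str, dict] = {}

    for key in resolve_keys(scalar_feature_keys):
        mean = statistics.fmean(table[key])
        std = statistics.pstdev(table[key]) or 1.0  # constant column: avoid division by zero
        out[key] = [(v - mean) / std for v in table[key]]
        info[key] = {"type": "scalar", "mean": mean, "std": std}

    for key in resolve_keys(categorical_feature_keys):
        categories = sorted(set(table[key]))
        index = {c: i for i, c in enumerate(categories)}
        out[key] = [index[v] for v in table[key]]
        info[key] = {"type": "categorical", "categories": categories}

    for key in resolve_keys(binary_feature_keys):
        values = sorted(set(table[key]))
        if len(values) > 2:
            raise ValueError(f"column {key!r} has more than two distinct values")
        out[key] = [values.index(v) for v in table[key]]
        info[key] = {"type": "binary", "values": values}

    return out, info
```

With PyYAML installed, `yaml.safe_dump(info, f)` could write the collected information to the normalization info file.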