Frequently Asked Questions
How can I add a run configuration to PyCharm?
If you want to use your own pipeline configuration or are tired of typing the make commands into the terminal, PyCharm's run configurations can help.
In PyCharm, open the run configuration editor. It can usually be found in the upper-right corner of the PyCharm window; otherwise, consult the official PyCharm documentation to find it.
Add a new configuration using the Python template. Switch the target from 'script path' to 'module name', and set it to 'dagster'. The 'Parameters' field holds everything you would type in the terminal after the module name. For a training run, you can adjust the following parameters to your needs and add them to the configuration:
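The exact parameters depend on your project: the job name and config path below are placeholders, so replace them with your own. This sketch assumes dagster's `job execute` subcommand with `-j` selecting the job and `-c` pointing at a run-config YAML; check `dagster job execute --help` for the options available in your dagster version.

```text
job execute -j <your training job> -c configs/<path to your training yaml>
```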
If needed, you can make a '.env' file available for your training run: add your .env file and check 'Enable EnvFile' in the 'EnvFile' tab of the run configuration.
Save everything, and you are ready to run and debug your experiment.
Tip
You can also add a run configuration for the dashboard. Just set the module name to streamlit and the parameters to run niceml/dashboard/dashboard.py configs/dashboard/<path to your dashboard yaml>.
What is the setup of the training pipeline?
The training process consists of three steps: training, prediction, and analysis. Each step serves a specific purpose in the training pipeline:
- Training: During this step, the model learns from the training data to improve its performance. The model parameters are updated iteratively based on the calculated loss and the optimization algorithm.
- Prediction: After training, the trained model is used to make predictions on unseen data. This step allows you to evaluate the model's performance on new images and assess its ability to identify objects accurately.
- Analysis: Once the prediction step is complete, niceML performs an analysis of the trained model. This may include computing additional metrics, providing insights into the model's behavior and performance, and checking whether the training process was successful.
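As a mental model, the three steps can be sketched as plain Python functions chained together. This is only an illustrative sketch; the function names and the toy "model" below are made up here and are not niceML's actual API:

```python
# Illustrative sketch of the three-step pipeline: train -> predict -> analyze.
# All names here are placeholders, not niceML's real API.

def train(data):
    """'Fit' a toy model: memorize the label for each input value."""
    return {x: y for x, y in data}

def predict(model, inputs):
    """Apply the trained model to unseen inputs (None if unknown)."""
    return [model.get(x) for x in inputs]

def analyze(predictions, targets):
    """Compute a simple accuracy metric to judge the run."""
    correct = sum(p == t for p, t in zip(predictions, targets))
    return correct / len(targets)

model = train([(1, "one"), (2, "two")])
preds = predict(model, [1, 2, 3])
accuracy = analyze(preds, ["one", "two", "three"])
print(accuracy)  # 2 of 3 predictions are correct
```

In niceML, each step additionally persists its outputs, so the prediction and analysis steps can run on the artifacts of an earlier training.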
What information does niceML show when a training is run?
During the training process, niceML provides real-time updates on the progress:
- First, niceML gives an overview of the layers of the model being trained. This allows you to inspect the architecture and understand the composition of the model.
- A progress bar indicates how many images have already been processed during training.
- The loss and other metrics are calculated and displayed during training, allowing you to monitor the model's performance.
Can I use PyTorch with niceML?
Not yet, but we plan to add PyTorch support in future versions of niceML.
What does the test data generated by niceML look like?
The niceml generate command creates sample images with numbers randomly placed on them. The numbers represent the objects or regions the model should learn to identify.
The following five types of files are generated:
- Test Images: The test images are created from thumbnail images provided by niceML, which are augmented to varying degrees. The numbers are randomly chosen, colored, and placed on them. These images serve as the foundation for training and evaluating your model.
- Mask Images: For each test image, a corresponding mask image is generated. It has the same dimensions as the reference image, with the numbers rendered in black and the remaining regions in white. These masks help the model locate the regions it should learn to identify.
- Label Information: For each test image, a JSON file with label information is generated. It contains the location coordinates of the numbers on the test image and serves as ground-truth data for the model to learn from and to evaluate its performance against.
- Number Images: Additionally, niceML generates cropped versions of the test images, focusing only on the regions where numbers are present. These number images provide isolated representations of the individual objects or regions the model should be able to identify.
- Tabular: niceML converts the number images into a dataframe. Each row represents one number image; the number shown in the image can be read from the 'label' column. The images are converted from RGB to grayscale and scaled to a fixed size (10x10 pixels by default); if the original image is not square, black borders are added to fill the missing space. Each remaining column holds the grayscale value of one pixel of the scaled image. This dataframe can be useful for testing simple classification tasks on tabular data.
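The tabular conversion can be sketched in plain Python. This is an illustrative reimplementation under stated assumptions (centered black padding, nearest-neighbor downscaling); niceML's actual resampling details may differ, but the px_<row>_<col> column naming matches the table further below:

```python
def to_tabular_row(pixels, size=10):
    """Convert a 2D grid of grayscale values (rows x cols) into one
    tabular row: pad to square with black (0), downscale to size x size
    by nearest neighbor, and emit px_<row>_<col> columns."""
    height, width = len(pixels), len(pixels[0])
    side = max(height, width)
    # Pad with black borders so the image becomes square.
    square = [[0] * side for _ in range(side)]
    top, left = (side - height) // 2, (side - width) // 2
    for r in range(height):
        for c in range(width):
            square[top + r][left + c] = pixels[r][c]
    # Nearest-neighbor downscale to the target size.
    return {
        f"px_{r}_{c}": square[r * side // size][c * side // size]
        for r in range(size)
        for c in range(size)
    }

# A 4x6 all-white (255) image: padding adds black rows at top and bottom.
row = to_tabular_row([[255] * 6 for _ in range(4)])
print(len(row))  # 100 pixel columns
```

For real images you would first convert RGB to grayscale (e.g. with Pillow's `Image.convert("L")`) before building the pixel grid.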
Test image and its mask image:
Generated label data

```json
{
    "filename": "test-data.json",
    "img_size": {
        "width": 1024,
        "height": 1024
    },
    "labels": [
        {
            "class_name": "3",
            "class_index": null,
            "color": null,
            "active": null,
            "bounding_box": {
                "x_pos": 605,
                "y_pos": 258,
                "width": 53,
                "height": 52
            },
            "score": null,
            "rotation": 133
        },
        {
            "class_name": "0",
            "class_index": null,
            "color": null,
            "active": null,
            "bounding_box": {
                "x_pos": 918,
                "y_pos": 141,
                "width": 41,
                "height": 49
            },
            "score": null,
            "rotation": 328
        },
        {
            "class_name": "1",
            "class_index": null,
            "color": null,
            "active": null,
            "bounding_box": {
                "x_pos": 122,
                "y_pos": 696,
                "width": 43,
                "height": 50
            },
            "score": null,
            "rotation": 198
        }
    ]
}
```
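Such a label file can be read with Python's standard json module; a minimal sketch (abridged to one label, converting the bounding box to corner coordinates):

```python
import json

# One label entry with the structure shown above, abridged for brevity.
label_json = """
{
  "filename": "test-data.json",
  "img_size": {"width": 1024, "height": 1024},
  "labels": [
    {"class_name": "3", "class_index": null, "color": null, "active": null,
     "bounding_box": {"x_pos": 605, "y_pos": 258, "width": 53, "height": 52},
     "score": null, "rotation": 133}
  ]
}
"""

data = json.loads(label_json)
for label in data["labels"]:
    box = label["bounding_box"]
    # Convert x/y/width/height to (left, top, right, bottom) corners.
    corners = (box["x_pos"], box["y_pos"],
               box["x_pos"] + box["width"], box["y_pos"] + box["height"])
    print(label["class_name"], corners)  # 3 (605, 258, 658, 310)
```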
Number image example:
Tabular data
identifier | label | px_0_0 | px_0_1 | px_0_2 | px_0_3 | px_0_4 | px_0_5 | px_0_6 | px_0_7 | px_0_8 | px_0_9 | px_1_0 | px_1_1 | px_1_2 | px_1_3 | px_1_4 | px_1_5 | px_1_6 | px_1_7 | px_1_8 | px_1_9 | px_2_0 | px_2_1 | px_2_2 | px_2_3 | px_2_4 | px_2_5 | px_2_6 | px_2_7 | px_2_8 | px_2_9 | px_3_0 | px_3_1 | px_3_2 | px_3_3 | px_3_4 | px_3_5 | px_3_6 | px_3_7 | px_3_8 | px_3_9 | px_4_0 | px_4_1 | px_4_2 | px_4_3 | px_4_4 | px_4_5 | px_4_6 | px_4_7 | px_4_8 | px_4_9 | px_5_0 | px_5_1 | px_5_2 | px_5_3 | px_5_4 | px_5_5 | px_5_6 | px_5_7 | px_5_8 | px_5_9 | px_6_0 | px_6_1 | px_6_2 | px_6_3 | px_6_4 | px_6_5 | px_6_6 | px_6_7 | px_6_8 | px_6_9 | px_7_0 | px_7_1 | px_7_2 | px_7_3 | px_7_4 | px_7_5 | px_7_6 | px_7_7 | px_7_8 | px_7_9 | px_8_0 | px_8_1 | px_8_2 | px_8_3 | px_8_4 | px_8_5 | px_8_6 | px_8_7 | px_8_8 | px_8_9 | px_9_0 | px_9_1 | px_9_2 | px_9_3 | px_9_4 | px_9_5 | px_9_6 | px_9_7 | px_9_8 | px_9_9 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
4af0f654_000_0.png | 0 | 31 | 31 | 72 | 98 | 95 | 86 | 80 | 85 | 86 | 82 | 81 | 76 | 47 | 29 | 23 | 20 | 19 | 159 | 108 | 77 | 78 | 87 | 79 | 63 | 45 | 49 | 41 | 31 | 27 | 21 | 18 | 19 | 25 | 63 | 97 | 108 | 24 | 35 | 84 | 97 | 90 | 83 | 85 | 87 | 84 | 81 | 78 | 52 | 209 | 209 | 209 | 209 | 207 | 209 | 209 | 209 | 197 | 112 | 191 | 55 | 37 | 36 | 28 | 24 | 22 | 20 | 19 | 20 | 27 | 73 | 97 | 107 | 32 | 61 | 90 | 95 | 88 | 85 | 86 | 79 | 85 | 122 | 169 | 209 | 209 | 209 | 209 | 209 | 209 | 209 | 209 | 209 | 209 | 209 | 209 | 209 | 70 | 23 | 19 | 19 |
74b46882_000_1.png | 1 | 48 | 41 | 37 | 51 | 51 | 54 | 60 | 57 | 54 | 58 | 57 | 57 | 68 | 79 | 86 | 79 | 64 | 47 | 81 | 125 | 109 | 78 | 74 | 81 | 106 | 88 | 97 | 93 | 108 | 116 | 89 | 56 | 51 | 48 | 56 | 60 | 61 | 60 | 58 | 60 | 64 | 65 | 67 | 69 | 60 | 62 | 51 | 56 | 77 | 88 | 124 | 96 | 138 | 76 | 138 | 132 | 138 | 78 | 88 | 77 | 67 | 75 | 73 | 74 | 76 | 73 | 68 | 61 | 53 | 99 | 74 | 138 | 64 | 131 | 112 | 138 | 138 | 136 | 138 | 124 | 138 | 138 | 138 | 138 | 138 | 138 | 138 | 83 | 78 | 66 | 74 | 78 | 76 | 78 | 79 | 78 | 68 | 138 | 100 | 138 |
d2cfe383_001_0.png | 0 | 32 | 32 | 33 | 34 | 33 | 35 | 36 | 32 | 38 | 33 | 35 | 35 | 34 | 35 | 33 | 37 | 36 | 35 | 35 | 36 | 36 | 36 | 37 | 37 | 37 | 37 | 38 | 38 | 38 | 38 | 38 | 38 | 38 | 37 | 37 | 37 | 37 | 37 | 37 | 37 | 32 | 32 | 32 | 32 | 32 | 33 | 33 | 35 | 34 | 35 | 35 | 34 | 34 | 32 | 36 | 35 | 35 | 34 | 34 | 34 | 34 | 34 | 33 | 60 | 33 | 57 | 38 | 31 | 41 | 33 | 33 | 33 | 33 | 33 | 33 | 33 | 32 | 32 | 32 | 32 | 32 | 32 | 32 | 33 | 32 | 34 | 35 | 34 | 35 | 35 | 37 | 36 | 35 | 35 | 35 | 34 | 35 | 35 | 35 | 35 |
76d53953_000_0.png | 0 | 13 | 12 | 27 | 29 | 18 | 2 | 37 | 82 | 70 | 17 | 10 | 7 | 11 | 29 | 37 | 59 | 75 | 70 | 71 | 48 | 18 | 19 | 74 | 64 | 57 | 81 | 93 | 74 | 72 | 15 | 20 | 27 | 40 | 30 | 0 | 0 | 58 | 106 | 50 | 22 | 18 | 6 | 5 | 44 | 37 | 55 | 170 | 96 | 66 | 136 | 15 | 99 | 64 | 68 | 64 | 98 | 71 | 73 | 14 | 26 | 29 | 47 | 46 | 11 | 1 | 15 | 81 | 83 | 41 | 36 | 70 | 55 | 208 | 208 | 208 | 208 | 208 | 208 | 208 | 208 | 208 | 208 | 116 | 77 | 76 | 80 | 66 | 15 | 27 | 34 | 37 | 57 | 27 | 0 | 3 | 32 | 93 | 95 | 90 | 208 |
d365655f_000_3.png | 3 | 113 | 115 | 117 | 114 | 100 | 75 | 58 | 80 | 98 | 100 | 86 | 80 | 82 | 92 | 93 | 85 | 80 | 88 | 91 | 81 | 57 | 40 | 102 | 87 | 59 | 37 | 136 | 45 | 42 | 44 | 46 | 49 | 53 | 54 | 52 | 115 | 115 | 118 | 115 | 101 | 85 | 69 | 70 | 94 | 101 | 105 | 107 | 92 | 88 | 98 | 96 | 90 | 91 | 94 | 81 | 91 | 100 | 195 | 195 | 195 | 195 | 195 | 195 | 90 | 42 | 44 | 46 | 49 | 50 | 50 | 111 | 107 | 110 | 108 | 96 | 82 | 68 | 67 | 84 | 93 | 110 | 120 | 108 | 86 | 89 | 104 | 103 | 89 | 172 | 195 | 195 | 195 | 195 | 195 | 195 | 195 | 195 | 195 | 195 | 106 |
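Conversely, the px_<row>_<col> columns of one row can be folded back into a 10x10 grayscale grid; a minimal sketch with synthetic pixel values:

```python
# Rebuild the 10x10 image grid from one tabular row.
# The pixel values here are made up; real rows come from the dataframe above.
row = {f"px_{r}_{c}": (r * 10 + c) % 256 for r in range(10) for c in range(10)}

image = [[row[f"px_{r}_{c}"] for c in range(10)] for r in range(10)]
print(image[0][:3])  # [0, 1, 2]
```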
Data split and cropping
After generating the test data using niceML, the resulting folder structure will have the following subfolders:
- number_data: This folder contains the generated images, their corresponding labels, and masks.
- number_data_split: In this folder, the generated data is split into three subfolders: train, test, and validation. Each subfolder contains the images, labels, and masks of the respective dataset split. The image files are stored together with their corresponding label and mask files.
- numbers_cropped_split: This folder contains cropped versions of the generated images, focusing only on the regions where the numbers are present. Each cropped number image is named after the test image it originated from and is placed in the same split folder as the original image (train, test, or validation). This allows convenient access to the isolated number images for further analysis or processing.
- numbers_tabular_data: This folder contains the tabular data of the cropped images. Each split folder contains a separate dataframe with only the information of the images of that split.