Frequently Asked Questions

How can I add a run configuration to PyCharm?

If you want to use your own pipeline configuration or are tired of typing the make commands into the terminal, you can use PyCharm run configurations.

In PyCharm, open the run configuration editor. It is usually found in the upper right corner of the window. If you cannot find it, consult the official PyCharm documentation.

Add a new configuration using the Python template. Switch the target from 'script path' to 'module name' and set it to 'dagster'. The 'Parameters' field holds what you would type in the terminal after the module name. For a training run, adjust the following parameters to your needs and add them to the configuration:

job execute -m niceml.dagster.jobs.repository -j job_train -c configs/jobs/<path to your job yaml>

If needed, you can make the '.env' file available to your training run. Add your .env file and check 'Enable EnvFile' in the 'EnvFile' tab of the run configuration.

Save everything, and you are ready to run and debug your experiment.
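To check that the run configuration matches what you would type in the terminal, it can help to spell out the full command. The following is a minimal sketch, not part of niceML: the helper name `build_dagster_command` is hypothetical, and the job yaml path is a placeholder that you must replace with your own file.

```python
from __future__ import annotations

import sys


def build_dagster_command(job_yaml: str, job_name: str = "job_train") -> list[str]:
    """Build the argument list that the PyCharm run configuration represents.

    The module name is 'dagster'; everything after it is the 'Parameters' field.
    """
    return [
        sys.executable, "-m", "dagster",          # module name: dagster
        "job", "execute",                          # start of the parameters
        "-m", "niceml.dagster.jobs.repository",
        "-j", job_name,
        "-c", job_yaml,
    ]


if __name__ == "__main__":
    # Placeholder path: replace it with your own job yaml under configs/jobs/.
    command = build_dagster_command("configs/jobs/<path to your job yaml>")
    print(" ".join(command))
```

The resulting list could be passed to `subprocess.run` to start the job outside of PyCharm.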

Tip

You can also add a run configuration for the dashboard. Just set the module name to streamlit and the parameters to run niceml/dashboard/dashboard.py configs/dashboard/<path to your dashboard yaml>

What is the setup of the training pipeline?

The training pipeline consists of three steps: training, prediction, and analysis. Each step serves a specific purpose:

  1. Training: During this step, the model learns from the training data to improve its performance. The model parameters are updated iteratively based on the calculated loss and optimization algorithm.

  2. Prediction: After training, the trained model is used to make predictions on unseen data. This step allows you to evaluate the model's performance on new images and assess its ability to identify objects accurately.

  3. Analysis: Once the prediction step is complete, niceML performs an analysis of the trained model. This may include computing additional metrics, providing insights into the model's behavior and performance, and checking whether the training process was successful.

What information does niceML show when a training is run?

During the training process, niceML provides real-time updates on the progress:

  • First, niceML gives an overview of the layers of the model being trained. This allows you to inspect the architecture and understand the composition of the model.
  • A progress bar indicates the number of images that have already been processed during training.
  • The loss and other metrics are calculated and displayed during training. This allows you to monitor the performance of the model.

Can I use PyTorch with niceML?

Currently, no. But we want to implement it in future versions of niceML.

What does the test data generated by niceML look like?

The niceml generate command allows you to create sample images with numbers randomly placed on them. The numbers represent the objects or regions that the model should be able to identify.

The following five types of files are generated:

  • Test Images: The test images are created based on thumbnail images provided by niceML, which are augmented to varying degrees. The numbers are randomly chosen, colored, and placed on them. These images serve as the foundation for training and evaluating your model.

  • Mask Images: For each test image, a corresponding mask image is generated. The mask image has the same dimensions as the test image, with the numbers represented in black and the remaining regions in white. These masks help the model locate the regions it should learn to identify.

  • Label Information: For each test image, label information in the form of a JSON file is generated. This file contains the location coordinates of the numbers on each test image. The label information serves as ground-truth data for the model to learn and evaluate its performance.

  • Number Images: Additionally, niceML generates cropped versions of the test images, focusing only on the regions where the numbers are present. These number images provide isolated representations of the individual objects or regions that the model should be able to identify.

  • Tabular: niceML converts the number images into a dataframe format. Each row in the dataframe represents the information of one number image. The number shown in the image can be read from the 'label' column. The images are converted from RGB to grayscale and scaled to a fixed size (10x10 pixels by default). If the original image is not square, black borders are added to fill the missing space. The dataframe stores the grayscale value of each pixel in the 10x10 image, with each column corresponding to one pixel. This dataframe can be useful for testing simple classification tasks with tabular data.
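The black-border padding mentioned for the tabular data can be illustrated with a small sketch. This is not niceML's implementation: the function name `pad_to_square` is hypothetical, the image is represented as a nested list of grayscale values, and centering the original image inside the square is an assumption.

```python
from __future__ import annotations


def pad_to_square(pixels: list[list[int]], fill: int = 0) -> list[list[int]]:
    """Pad a grayscale image (list of rows) with `fill` until it is square.

    Assumption: the original image is centered; 0 stands for black.
    """
    height = len(pixels)
    width = len(pixels[0]) if height else 0
    side = max(height, width)
    top = (side - height) // 2    # rows of padding above the image
    left = (side - width) // 2    # columns of padding left of the image
    square = [[fill] * side for _ in range(side)]
    for y, row in enumerate(pixels):
        for x, value in enumerate(row):
            square[top + y][left + x] = value
    return square


if __name__ == "__main__":
    wide_image = [[5, 5, 5, 5], [5, 5, 5, 5]]  # 2 rows, 4 columns
    for row in pad_to_square(wide_image):
        print(row)
```

After padding, the square image would still need to be scaled to the fixed size before its pixels are written to the dataframe.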

Test image and its mask image:

[Images: augmented test images and their mask images]

Generated label data
augmented.json
{
  "filename": "test-data.json",
  "img_size": {
    "width": 1024,
    "height": 1024
  },
  "labels": [
    {
      "class_name": "3",
      "class_index": null,
      "color": null,
      "active": null,
      "bounding_box": {
        "x_pos": 605,
        "y_pos": 258,
        "width": 53,
        "height": 52
      },
      "score": null,
      "rotation": 133
    },
    {
      "class_name": "0",
      "class_index": null,
      "color": null,
      "active": null,
      "bounding_box": {
        "x_pos": 918,
        "y_pos": 141,
        "width": 41,
        "height": 49
      },
      "score": null,
      "rotation": 328
    },
    {
      "class_name": "1",
      "class_index": null,
      "color": null,
      "active": null,
      "bounding_box": {
        "x_pos": 122,
        "y_pos": 696,
        "width": 43,
        "height": 50
      },
      "score": null,
      "rotation": 198
    }
  ]
}
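The label file can be read with the standard json module. The sketch below is only an illustration: the function name `read_bounding_boxes` is hypothetical, and the embedded JSON is a shortened copy of the example above with a single label. Whether x_pos and y_pos refer to a corner or the center of the box is not stated here, so the values are returned unchanged.

```python
from __future__ import annotations

import json

# Shortened copy of the generated label file shown above (one label only).
EXAMPLE_LABEL_JSON = """
{
  "filename": "test-data.json",
  "img_size": {"width": 1024, "height": 1024},
  "labels": [
    {
      "class_name": "3",
      "bounding_box": {"x_pos": 605, "y_pos": 258, "width": 53, "height": 52},
      "rotation": 133
    }
  ]
}
"""


def read_bounding_boxes(label_json: str) -> list[tuple[str, int, int, int, int]]:
    """Return (class_name, x_pos, y_pos, width, height) for every label."""
    data = json.loads(label_json)
    boxes = []
    for label in data["labels"]:
        box = label["bounding_box"]
        boxes.append(
            (label["class_name"], box["x_pos"], box["y_pos"], box["width"], box["height"])
        )
    return boxes


if __name__ == "__main__":
    for class_name, x_pos, y_pos, width, height in read_bounding_boxes(EXAMPLE_LABEL_JSON):
        print(class_name, x_pos, y_pos, width, height)
```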

Number image example:

[Images: example number images]

Tabular data
identifier label px_0_0 px_0_1 px_0_2 px_0_3 px_0_4 px_0_5 px_0_6 px_0_7 px_0_8 px_0_9 px_1_0 px_1_1 px_1_2 px_1_3 px_1_4 px_1_5 px_1_6 px_1_7 px_1_8 px_1_9 px_2_0 px_2_1 px_2_2 px_2_3 px_2_4 px_2_5 px_2_6 px_2_7 px_2_8 px_2_9 px_3_0 px_3_1 px_3_2 px_3_3 px_3_4 px_3_5 px_3_6 px_3_7 px_3_8 px_3_9 px_4_0 px_4_1 px_4_2 px_4_3 px_4_4 px_4_5 px_4_6 px_4_7 px_4_8 px_4_9 px_5_0 px_5_1 px_5_2 px_5_3 px_5_4 px_5_5 px_5_6 px_5_7 px_5_8 px_5_9 px_6_0 px_6_1 px_6_2 px_6_3 px_6_4 px_6_5 px_6_6 px_6_7 px_6_8 px_6_9 px_7_0 px_7_1 px_7_2 px_7_3 px_7_4 px_7_5 px_7_6 px_7_7 px_7_8 px_7_9 px_8_0 px_8_1 px_8_2 px_8_3 px_8_4 px_8_5 px_8_6 px_8_7 px_8_8 px_8_9 px_9_0 px_9_1 px_9_2 px_9_3 px_9_4 px_9_5 px_9_6 px_9_7 px_9_8 px_9_9
4af0f654_000_0.png 0 31 31 72 98 95 86 80 85 86 82 81 76 47 29 23 20 19 159 108 77 78 87 79 63 45 49 41 31 27 21 18 19 25 63 97 108 24 35 84 97 90 83 85 87 84 81 78 52 209 209 209 209 207 209 209 209 197 112 191 55 37 36 28 24 22 20 19 20 27 73 97 107 32 61 90 95 88 85 86 79 85 122 169 209 209 209 209 209 209 209 209 209 209 209 209 209 70 23 19 19
74b46882_000_1.png 1 48 41 37 51 51 54 60 57 54 58 57 57 68 79 86 79 64 47 81 125 109 78 74 81 106 88 97 93 108 116 89 56 51 48 56 60 61 60 58 60 64 65 67 69 60 62 51 56 77 88 124 96 138 76 138 132 138 78 88 77 67 75 73 74 76 73 68 61 53 99 74 138 64 131 112 138 138 136 138 124 138 138 138 138 138 138 138 83 78 66 74 78 76 78 79 78 68 138 100 138
d2cfe383_001_0.png 0 32 32 33 34 33 35 36 32 38 33 35 35 34 35 33 37 36 35 35 36 36 36 37 37 37 37 38 38 38 38 38 38 38 37 37 37 37 37 37 37 32 32 32 32 32 33 33 35 34 35 35 34 34 32 36 35 35 34 34 34 34 34 33 60 33 57 38 31 41 33 33 33 33 33 33 33 32 32 32 32 32 32 32 33 32 34 35 34 35 35 37 36 35 35 35 34 35 35 35 35
76d53953_000_0.png 0 13 12 27 29 18 2 37 82 70 17 10 7 11 29 37 59 75 70 71 48 18 19 74 64 57 81 93 74 72 15 20 27 40 30 0 0 58 106 50 22 18 6 5 44 37 55 170 96 66 136 15 99 64 68 64 98 71 73 14 26 29 47 46 11 1 15 81 83 41 36 70 55 208 208 208 208 208 208 208 208 208 208 116 77 76 80 66 15 27 34 37 57 27 0 3 32 93 95 90 208
d365655f_000_3.png 3 113 115 117 114 100 75 58 80 98 100 86 80 82 92 93 85 80 88 91 81 57 40 102 87 59 37 136 45 42 44 46 49 53 54 52 115 115 118 115 101 85 69 70 94 101 105 107 92 88 98 96 90 91 94 81 91 100 195 195 195 195 195 195 90 42 44 46 49 50 50 111 107 110 108 96 82 68 67 84 93 110 120 108 86 89 104 103 89 172 195 195 195 195 195 195 195 195 195 195 106
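Each row of the tabular data holds 100 pixel values that can be arranged back into a 10x10 grid. The following sketch uses only the standard library; the function names are hypothetical, and reading the first index in px_<a>_<b> as the row and the second as the column is an assumption that the data above does not confirm.

```python
from __future__ import annotations


def pixel_columns(size: int = 10) -> list[str]:
    """Column names in the order they appear in the table header."""
    return [f"px_{a}_{b}" for a in range(size) for b in range(size)]


def row_to_grid(values: list[int], size: int = 10) -> list[list[int]]:
    """Split a flat list of pixel values into `size` lists of `size` values.

    Assumption: the first index of px_<a>_<b> selects the inner list.
    """
    if len(values) != size * size:
        raise ValueError(f"expected {size * size} values, got {len(values)}")
    return [values[i * size:(i + 1) * size] for i in range(size)]


if __name__ == "__main__":
    columns = pixel_columns()
    print(columns[:3], "...", columns[-1])
    grid = row_to_grid(list(range(100)))  # dummy values instead of real pixels
    print(grid[0])
```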

Data split and cropping

After generating the test data using niceML, the resulting folder structure will have the following subfolders:

  • number_data: This folder contains the generated images, their corresponding labels, and masks.
  • number_data_split: In this folder, the generated data is split into three subfolders: train, test, and validation. Each subfolder contains the images, labels, and masks corresponding to the respective dataset split. The image files are sorted together with their corresponding label and mask files.
  • numbers_cropped_split: This folder contains cropped versions of the generated images, focusing only on the regions where the numbers are present. Each cropped number image is named after the test image it originated from and is placed in the same split folder as the original image (train, test, or validation). This allows for convenient access to the isolated number images for further analysis or processing.
  • numbers_tabular_data: This folder contains the tabular data of the cropped images. Each split folder contains a separate dataframe with only the information of images of that split.
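To check the generated folder structure, you can count the files in each split. This is a minimal sketch under stated assumptions: the function name `count_files_per_split` is hypothetical, the split subfolders are assumed to be named train, test, and validation as described above, and the output location of the generated data depends on your configuration.

```python
from __future__ import annotations

from pathlib import Path

# Split names as described in the folder overview above.
SPLITS = ("train", "test", "validation")


def count_files_per_split(split_root: Path) -> dict[str, int]:
    """Count all files below each split subfolder of `split_root`.

    A missing split folder is reported with a count of 0.
    """
    counts = {}
    for split in SPLITS:
        folder = split_root / split
        if folder.is_dir():
            counts[split] = sum(1 for path in folder.rglob("*") if path.is_file())
        else:
            counts[split] = 0
    return counts


if __name__ == "__main__":
    # Assumption: the generated data lives in the current working directory.
    print(count_files_per_split(Path("number_data_split")))
```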