Frequently Asked Questions
How can I add a run configuration to PyCharm?
If you want to use your own pipeline configuration or are tired of typing the make commands into the terminal, PyCharm's run configurations can help.
In PyCharm, open the run configuration editor. It can usually be found in the upper-right corner of the PyCharm window; otherwise, consult the official PyCharm documentation to find it.
Add a new configuration using the Python template. Switch the target from 'script path' to 'module name', and set it to 'dagster'. The 'Parameters' field holds everything you would type in the terminal after the module name. For a training run, you can adjust the following parameters to your needs and add them to the configuration:
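The exact parameters depend on your project: the job name and config path below are placeholders, so replace them with your own. This sketch assumes dagster's `job execute` subcommand with `-j` selecting the job and `-c` pointing at a run-config YAML; check `dagster job execute --help` for the options available in your dagster version.

```text
job execute -j <your training job> -c configs/<path to your training yaml>
```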
If needed, you can make a '.env' file available for your training run: add your .env file and check 'Enable EnvFile' in the 'EnvFile' tab of the run configuration.
Save everything, and you are ready to run and debug your experiment.
Tip
You can also add a run configuration for the dashboard. Just set the module name to streamlit and the parameters to run niceml/dashboard/dashboard.py configs/dashboard/<path to your dashboard yaml>.
What is the setup of the training pipeline?
The training process consists of three steps: training, prediction, and analysis. Each step serves a specific purpose in the training pipeline:
- Training: During this step, the model learns from the training data to improve its performance. The model parameters are updated iteratively based on the calculated loss and the optimization algorithm.
- Prediction: After training, the trained model is used to make predictions on unseen data. This step allows you to evaluate the model's performance on new images and assess its ability to identify objects accurately.
- Analysis: Once the prediction step is complete, niceML performs an analysis of the trained model. This may include computing additional metrics, providing insights into the model's behavior and performance, and checking whether the training process was successful.
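As a mental model, the three steps can be sketched as plain Python functions chained together. This is only an illustrative sketch; the function names and the toy "model" below are made up here and are not niceML's actual API:

```python
# Illustrative sketch of the three-step pipeline: train -> predict -> analyze.
# All names here are placeholders, not niceML's real API.

def train(data):
    """'Fit' a toy model: memorize the label for each input value."""
    return {x: y for x, y in data}

def predict(model, inputs):
    """Apply the trained model to unseen inputs (None if unknown)."""
    return [model.get(x) for x in inputs]

def analyze(predictions, targets):
    """Compute a simple accuracy metric to judge the run."""
    correct = sum(p == t for p, t in zip(predictions, targets))
    return correct / len(targets)

model = train([(1, "one"), (2, "two")])
preds = predict(model, [1, 2, 3])
accuracy = analyze(preds, ["one", "two", "three"])
print(accuracy)  # 2 of 3 predictions are correct
```

In niceML, each step additionally persists its outputs, so the prediction and analysis steps can run on the artifacts of an earlier training.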
What information does niceML show when a training is run?
During the training process, niceML provides real-time updates on the progress:
- First, niceML gives an overview of the layers of the model being trained. This allows you to inspect the architecture and understand the composition of the model.
- A progress bar indicates how many images have already been processed during training.
- The loss and other metrics are calculated and displayed during training, allowing you to monitor the model's performance.
Can I use PyTorch with niceML?
Not yet, but we plan to add PyTorch support in future versions of niceML.
What does the test data generated by niceML look like?
The niceml generate command creates sample images with numbers randomly placed on them. The numbers represent the objects or regions the model should learn to identify.
The following five types of files are generated:
- Test Images: The test images are created from thumbnail images provided by niceML, which are augmented to varying degrees. The numbers are randomly chosen, colored, and placed on them. These images serve as the foundation for training and evaluating your model.
- Mask Images: For each test image, a corresponding mask image is generated. It has the same dimensions as the reference image, with the numbers rendered in black and the remaining regions in white. These masks help the model locate the regions it should learn to identify.
- Label Information: For each test image, a JSON file with label information is generated. It contains the location coordinates of the numbers on the test image and serves as ground-truth data for the model to learn from and to evaluate its performance against.
- Number Images: Additionally, niceML generates cropped versions of the test images, focusing only on the regions where numbers are present. These number images provide isolated representations of the individual objects or regions the model should be able to identify.
- Tabular: niceML converts the number images into a dataframe. Each row represents one number image; the number shown in the image can be read from the 'label' column. The images are converted from RGB to grayscale and scaled to a fixed size (10x10 pixels by default); if the original image is not square, black borders are added to fill the missing space. Each remaining column holds the grayscale value of one pixel of the scaled image. This dataframe can be useful for testing simple classification tasks on tabular data.
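The tabular conversion can be sketched in plain Python. This is an illustrative reimplementation under stated assumptions (centered black padding, nearest-neighbor downscaling); niceML's actual resampling details may differ, but the px_<row>_<col> column naming matches the table further below:

```python
def to_tabular_row(pixels, size=10):
    """Convert a 2D grid of grayscale values (rows x cols) into one
    tabular row: pad to square with black (0), downscale to size x size
    by nearest neighbor, and emit px_<row>_<col> columns."""
    height, width = len(pixels), len(pixels[0])
    side = max(height, width)
    # Pad with black borders so the image becomes square.
    square = [[0] * side for _ in range(side)]
    top, left = (side - height) // 2, (side - width) // 2
    for r in range(height):
        for c in range(width):
            square[top + r][left + c] = pixels[r][c]
    # Nearest-neighbor downscale to the target size.
    return {
        f"px_{r}_{c}": square[r * side // size][c * side // size]
        for r in range(size)
        for c in range(size)
    }

# A 4x6 all-white (255) image: padding adds black rows at top and bottom.
row = to_tabular_row([[255] * 6 for _ in range(4)])
print(len(row))  # 100 pixel columns
```

For real images you would first convert RGB to grayscale (e.g. with Pillow's `Image.convert("L")`) before building the pixel grid.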
Test image and its mask image:
Generated label data

```json
{
    "filename": "test-data.json",
    "img_size": {
        "width": 1024,
        "height": 1024
    },
    "labels": [
        {
            "class_name": "3",
            "class_index": null,
            "color": null,
            "active": null,
            "bounding_box": {
                "x_pos": 605,
                "y_pos": 258,
                "width": 53,
                "height": 52
            },
            "score": null,
            "rotation": 133
        },
        {
            "class_name": "0",
            "class_index": null,
            "color": null,
            "active": null,
            "bounding_box": {
                "x_pos": 918,
                "y_pos": 141,
                "width": 41,
                "height": 49
            },
            "score": null,
            "rotation": 328
        },
        {
            "class_name": "1",
            "class_index": null,
            "color": null,
            "active": null,
            "bounding_box": {
                "x_pos": 122,
                "y_pos": 696,
                "width": 43,
                "height": 50
            },
            "score": null,
            "rotation": 198
        }
    ]
}
```
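Such a label file can be read with Python's standard json module; a minimal sketch (abridged to one label, converting the bounding box to corner coordinates):

```python
import json

# One label entry with the structure shown above, abridged for brevity.
label_json = """
{
  "filename": "test-data.json",
  "img_size": {"width": 1024, "height": 1024},
  "labels": [
    {"class_name": "3", "class_index": null, "color": null, "active": null,
     "bounding_box": {"x_pos": 605, "y_pos": 258, "width": 53, "height": 52},
     "score": null, "rotation": 133}
  ]
}
"""

data = json.loads(label_json)
for label in data["labels"]:
    box = label["bounding_box"]
    # Convert x/y/width/height to (left, top, right, bottom) corners.
    corners = (box["x_pos"], box["y_pos"],
               box["x_pos"] + box["width"], box["y_pos"] + box["height"])
    print(label["class_name"], corners)  # 3 (605, 258, 658, 310)
```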
Number image example:
Tabular data
identifier | label | px_0_0 | px_0_1 | px_0_2 | px_0_3 | px_0_4 | px_0_5 | px_0_6 | px_0_7 | px_0_8 | px_0_9 | px_1_0 | px_1_1 | px_1_2 | px_1_3 | px_1_4 | px_1_5 | px_1_6 | px_1_7 | px_1_8 | px_1_9 | px_2_0 | px_2_1 | px_2_2 | px_2_3 | px_2_4 | px_2_5 | px_2_6 | px_2_7 | px_2_8 | px_2_9 | px_3_0 | px_3_1 | px_3_2 | px_3_3 | px_3_4 | px_3_5 | px_3_6 | px_3_7 | px_3_8 | px_3_9 | px_4_0 | px_4_1 | px_4_2 | px_4_3 | px_4_4 | px_4_5 | px_4_6 | px_4_7 | px_4_8 | px_4_9 | px_5_0 | px_5_1 | px_5_2 | px_5_3 | px_5_4 | px_5_5 | px_5_6 | px_5_7 | px_5_8 | px_5_9 | px_6_0 | px_6_1 | px_6_2 | px_6_3 | px_6_4 | px_6_5 | px_6_6 | px_6_7 | px_6_8 | px_6_9 | px_7_0 | px_7_1 | px_7_2 | px_7_3 | px_7_4 | px_7_5 | px_7_6 | px_7_7 | px_7_8 | px_7_9 | px_8_0 | px_8_1 | px_8_2 | px_8_3 | px_8_4 | px_8_5 | px_8_6 | px_8_7 | px_8_8 | px_8_9 | px_9_0 | px_9_1 | px_9_2 | px_9_3 | px_9_4 | px_9_5 | px_9_6 | px_9_7 | px_9_8 | px_9_9 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
4af0f654_000_0.png | 0 | 31 | 31 | 72 | 98 | 95 | 86 | 80 | 85 | 86 | 82 | 81 | 76 | 47 | 29 | 23 | 20 | 19 | 159 | 108 | 77 | 78 | 87 | 79 | 63 | 45 | 49 | 41 | 31 | 27 | 21 | 18 | 19 | 25 | 63 | 97 | 108 | 24 | 35 | 84 | 97 | 90 | 83 | 85 | 87 | 84 | 81 | 78 | 52 | 209 | 209 | 209 | 209 | 207 | 209 | 209 | 209 | 197 | 112 | 191 | 55 | 37 | 36 | 28 | 24 | 22 | 20 | 19 | 20 | 27 | 73 | 97 | 107 | 32 | 61 | 90 | 95 | 88 | 85 | 86 | 79 | 85 | 122 | 169 | 209 | 209 | 209 | 209 | 209 | 209 | 209 | 209 | 209 | 209 | 209 | 209 | 209 | 70 | 23 | 19 | 19 |
74b46882_000_1.png | 1 | 48 | 41 | 37 | 51 | 51 | 54 | 60 | 57 | 54 | 58 | 57 | 57 | 68 | 79 | 86 | 79 | 64 | 47 | 81 | 125 | 109 | 78 | 74 | 81 | 106 | 88 | 97 | 93 | 108 | 116 | 89 | 56 | 51 | 48 | 56 | 60 | 61 | 60 | 58 | 60 | 64 | 65 | 67 | 69 | 60 | 62 | 51 | 56 | 77 | 88 | 124 | 96 | 138 | 76 | 138 | 132 | 138 | 78 | 88 | 77 | 67 | 75 | 73 | 74 | 76 | 73 | 68 | 61 | 53 | 99 | 74 | 138 | 64 | 131 | 112 | 138 | 138 | 136 | 138 | 124 | 138 | 138 | 138 | 138 | 138 | 138 | 138 | 83 | 78 | 66 | 74 | 78 | 76 | 78 | 79 | 78 | 68 | 138 | 100 | 138 |
d2cfe383_001_0.png | 0 | 32 | 32 | 33 | 34 | 33 | 35 | 36 | 32 | 38 | 33 | 35 | 35 | 34 | 35 | 33 | 37 | 36 | 35 | 35 | 36 | 36 | 36 | 37 | 37 | 37 | 37 | 38 | 38 | 38 | 38 | 38 | 38 | 38 | 37 | 37 | 37 | 37 | 37 | 37 | 37 | 32 | 32 | 32 | 32 | 32 | 33 | 33 | 35 | 34 | 35 | 35 | 34 | 34 | 32 | 36 | 35 | 35 | 34 | 34 | 34 | 34 | 34 | 33 | 60 | 33 | 57 | 38 | 31 | 41 | 33 | 33 | 33 | 33 | 33 | 33 | 33 | 32 | 32 | 32 | 32 | 32 | 32 | 32 | 33 | 32 | 34 | 35 | 34 | 35 | 35 | 37 | 36 | 35 | 35 | 35 | 34 | 35 | 35 | 35 | 35 |
76d53953_000_0.png | 0 | 13 | 12 | 27 | 29 | 18 | 2 | 37 | 82 | 70 | 17 | 10 | 7 | 11 | 29 | 37 | 59 | 75 | 70 | 71 | 48 | 18 | 19 | 74 | 64 | 57 | 81 | 93 | 74 | 72 | 15 | 20 | 27 | 40 | 30 | 0 | 0 | 58 | 106 | 50 | 22 | 18 | 6 | 5 | 44 | 37 | 55 | 170 | 96 | 66 | 136 | 15 | 99 | 64 | 68 | 64 | 98 | 71 | 73 | 14 | 26 | 29 | 47 | 46 | 11 | 1 | 15 | 81 | 83 | 41 | 36 | 70 | 55 | 208 | 208 | 208 | 208 | 208 | 208 | 208 | 208 | 208 | 208 | 116 | 77 | 76 | 80 | 66 | 15 | 27 | 34 | 37 | 57 | 27 | 0 | 3 | 32 | 93 | 95 | 90 | 208 |
d365655f_000_3.png | 3 | 113 | 115 | 117 | 114 | 100 | 75 | 58 | 80 | 98 | 100 | 86 | 80 | 82 | 92 | 93 | 85 | 80 | 88 | 91 | 81 | 57 | 40 | 102 | 87 | 59 | 37 | 136 | 45 | 42 | 44 | 46 | 49 | 53 | 54 | 52 | 115 | 115 | 118 | 115 | 101 | 85 | 69 | 70 | 94 | 101 | 105 | 107 | 92 | 88 | 98 | 96 | 90 | 91 | 94 | 81 | 91 | 100 | 195 | 195 | 195 | 195 | 195 | 195 | 90 | 42 | 44 | 46 | 49 | 50 | 50 | 111 | 107 | 110 | 108 | 96 | 82 | 68 | 67 | 84 | 93 | 110 | 120 | 108 | 86 | 89 | 104 | 103 | 89 | 172 | 195 | 195 | 195 | 195 | 195 | 195 | 195 | 195 | 195 | 195 | 106 |
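Conversely, the px_<row>_<col> columns of one row can be folded back into a 10x10 grayscale grid; a minimal sketch with synthetic pixel values:

```python
# Rebuild the 10x10 image grid from one tabular row.
# The pixel values here are made up; real rows come from the dataframe above.
row = {f"px_{r}_{c}": (r * 10 + c) % 256 for r in range(10) for c in range(10)}

image = [[row[f"px_{r}_{c}"] for c in range(10)] for r in range(10)]
print(image[0][:3])  # [0, 1, 2]
```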
Data split and cropping
After generating the test data using niceML, the resulting folder structure will have the following subfolders:
- number_data: This folder contains the generated images, their corresponding labels, and masks.
- number_data_split: In this folder, the generated data is split into three subfolders: train, test, and validation. Each subfolder contains the images, labels, and masks of the respective dataset split. The image files are stored together with their corresponding label and mask files.
- numbers_cropped_split: This folder contains cropped versions of the generated images, focusing only on the regions where the numbers are present. Each cropped number image is named after the test image it originated from and is placed in the same split folder as the original image (train, test, or validation). This allows convenient access to the isolated number images for further analysis or processing.
- numbers_tabular_data: This folder contains the tabular data of the cropped images. Each split folder contains a separate dataframe with only the information of the images of that split.