filechecksumprocessor ¶
Module for abstract implementation of FileChecksumProcessor
Classes¶
FileChecksumProcessor ¶
FileChecksumProcessor(
input_location,
output_location,
lockfile_location,
lock_file_name="lock.yaml",
debug=False,
process_count=8,
batch_size=16,
)
Bases: ABC
FileChecksumProcessor that can be used as part of a pipeline to process files based on their checksums.
Args:
- input_location: Input location of the processor
- output_location: Output location of the processor
- lockfile_location: Location of the checksum lockfile
- debug: Flag to activate debug mode
- process_count: Number of processes for parallel execution
- batch_size: Size of a batch
Source code in niceml/filechecksumprocessors/filechecksumprocessor.py
Functions¶
find_changed_files ¶
Filters out input and output files that do not need to be reprocessed
generate_batches abstractmethod ¶
Generates batches of input and output files and returns them as a list
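As a rough illustration of batching, the sketch below splits a list of input files into chunks of at most `batch_size` entries. The function name `chunk_files` and the batch layout (a dict with an `"inputs"` key) are assumptions made for this example; the real signature and batch structure are defined by the concrete subclass that implements `generate_batches`.

```python
from typing import Dict, List


def chunk_files(
    input_files: List[str], batch_size: int = 16
) -> List[Dict[str, List[str]]]:
    """Split input files into consecutive batches of at most batch_size files.

    Hypothetical helper, not part of niceml.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [
        {"inputs": input_files[start : start + batch_size]}
        for start in range(0, len(input_files), batch_size)
    ]


# Five files with a batch size of two yield three batches (2 + 2 + 1).
batches = chunk_files([f"img_{i}.png" for i in range(5)], batch_size=2)
```

The last batch may be smaller than `batch_size` when the number of files is not an exact multiple.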
list_files abstractmethod ¶
load_checksums ¶
Loads checksums from lockfile
process abstractmethod ¶
Processes a batch of files. Returns a dict of input and output files with their updated checksums, e.g. {"inputs": {"filename": "checksum"}, "outputs": {"filename": "checksum"}}
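To make the return shape concrete, here is a minimal sketch of a processing step that produces such a dict. Everything in it is an assumption for illustration: the function name `process_batch`, the in-memory batch of `{filename: bytes}`, the `.out` naming of outputs, and the use of MD5 as the checksum. A real subclass would read from `input_location` and write to `output_location`.

```python
import hashlib
from typing import Dict


def md5_of_bytes(data: bytes) -> str:
    """Return the hex MD5 digest of data (checksum algorithm is an assumption)."""
    return hashlib.md5(data).hexdigest()


def process_batch(batch: Dict[str, bytes]) -> Dict[str, Dict[str, str]]:
    """Hypothetical process step: upper-case each input and record checksums."""
    result: Dict[str, Dict[str, str]] = {"inputs": {}, "outputs": {}}
    for name, content in batch.items():
        output_name = f"{name}.out"  # illustrative naming scheme
        output_content = content.upper()
        result["inputs"][name] = md5_of_bytes(content)
        result["outputs"][output_name] = md5_of_bytes(output_content)
    return result


checksums = process_batch({"a.txt": b"hello"})
```

The result contains one entry under `"inputs"` for `a.txt` and one under `"outputs"` for `a.txt.out`, each mapped to its checksum.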
remove_not_required_outputs ¶
Removes output files that are not required anymore
run_process ¶
Processes files
Functions¶
check_files_changed ¶
Checks if files in a location have changed
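The idea behind a checksum-based change check can be sketched as follows. The signature of `check_files_changed` is not documented here, so the function `files_changed` below and its arguments are hypothetical: it compares freshly computed checksums against the ones stored in the lockfile and flags every file that is new or differs.

```python
from typing import Dict


def files_changed(current: Dict[str, str], stored: Dict[str, str]) -> Dict[str, bool]:
    """Map each current file to True if its checksum is new or differs.

    Hypothetical helper, not the niceml implementation.
    """
    return {
        name: stored.get(name) != checksum for name, checksum in current.items()
    }


stored = {"a.txt": "111", "b.txt": "222"}
current = {"a.txt": "111", "b.txt": "999", "c.txt": "333"}
changed = files_changed(current, stored)
# a.txt is unchanged, b.txt has a different checksum, c.txt is new.
```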
remove_deleted_checksums ¶
Removes checksums of files that no longer exist. Takes a list of input files, a list of output files, and a dictionary containing the checksums of all files, and returns an updated dictionary that keeps only the entries whose keys correspond to an input or output file name.
Parameters:
- input_file_list (List[str]) – The input files
- output_file_list (List[str]) – The output files
- checksum_dict (Dict[str, Dict[str, str]]) – Dictionary with the checksums of the input and output files
Returns:
- Dict[str, Dict[str, str]] – A dictionary of dictionaries with the updated checksums of the input and output files
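The behaviour described above can be sketched as a small standalone function. This is not the niceml implementation: the name `prune_checksums` is hypothetical, and the layout of `checksum_dict` (sections such as `"inputs"` and `"outputs"`, each mapping file names to checksums) is assumed from the return format of `process`.

```python
from typing import Dict, List


def prune_checksums(
    input_file_list: List[str],
    output_file_list: List[str],
    checksum_dict: Dict[str, Dict[str, str]],
) -> Dict[str, Dict[str, str]]:
    """Keep only checksum entries whose file name is still an input or output.

    Hypothetical sketch; returns a new dict and leaves checksum_dict untouched.
    """
    keep = set(input_file_list) | set(output_file_list)
    return {
        section: {name: checksum for name, checksum in entries.items() if name in keep}
        for section, entries in checksum_dict.items()
    }


old_checksums = {
    "inputs": {"a.txt": "1", "gone.txt": "2"},
    "outputs": {"a.out": "3", "old.out": "4"},
}
pruned = prune_checksums(["a.txt"], ["a.out"], old_checksums)
# Entries for gone.txt and old.out are dropped because the files no longer exist.
```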