This page contains answers to frequently asked questions about GaNDLF.
### Where do I start?

The usage guide provides a good starting point for you to understand the application of GaNDLF. If you have any questions, please feel free to post a support request, and we will do our best to address it ASAP.
### Why am I getting the error `importlib.metadata.PackageNotFoundError: GANDLF`?

This means that GaNDLF was not installed correctly. Please ensure you have followed the installation guide properly, and verify the installation by running `gandlf verify-install` after activating the correct virtual environment. If you are still having issues, please feel free to post a support request, and we will do our best to address it ASAP.
### Which parts of GaNDLF are customizable?

Virtually all of them! For more details, please see the usage guide and our extensive samples. All available options are documented in the `config_all_options.yaml` file.
### Can I run GaNDLF on a high-performance computing (HPC) cluster?

Yes, GaNDLF has been run successfully on an SGE cluster and on another managed with Kubernetes. Please post a question with more details, such as the type of scheduler, and we will do our best to address it.
### Can I track the per-epoch training performance?

Yes, look for the `logs_*.csv` files in the output directory. They are arranged in accordance with the cross-validation configuration, with separate files for each data cohort (training/validation/testing), and they contain the values for all requested performance metrics, which are defined per problem type.
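As a sketch, these logs can be inspected with a few lines of Python. Note that the column names used below (`epoch_no`, `loss`, `dice`) are assumptions for illustration; the real columns depend on the metrics requested in your configuration:

```python
import csv
import io

# A miniature stand-in for one of GaNDLF's logs_*.csv files; the exact
# column names here are assumptions and depend on your configuration.
sample_log = """epoch_no,loss,dice
0,0.91,0.42
1,0.74,0.55
2,0.60,0.63
"""

# Parse the per-epoch rows and find the epoch with the best dice score.
rows = list(csv.DictReader(io.StringIO(sample_log)))
best = max(rows, key=lambda row: float(row["dice"]))
print(f"best epoch: {best['epoch_no']}, dice: {best['dice']}")
```

In practice you would open the real file from the output directory instead of the inline string.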
### Why is so much RAM being consumed during training?

If you have `data_preprocessing` enabled, GaNDLF loads all of the resized images into memory as tensors. Depending on your dataset (resolution, size, number of modalities), this can lead to high RAM usage. To avoid this, enable the memory saver mode by setting the `memory_save_mode` flag in the configuration; the resized images will then be written to disk instead.
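As a minimal sketch, enabling the flag might look like this in the configuration (`memory_save_mode` is the flag described above; the `resize` values are illustrative placeholders):

```yaml
data_preprocessing:
  resize: [128, 128, 128]  # illustrative preprocessing step
memory_save_mode: True     # write resized images to disk instead of keeping them in RAM
```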
### How can I resume training from a previous checkpoint?

GaNDLF allows you to resume training from a previous checkpoint in two ways:

1. If the `--resume` CLI parameter is passed to `gandlf run`, only the model weights and state dictionary are preserved; the parameters and data are taken from the new options on the CLI. This is helpful when you have updated the training data or some compatible options in the parameters.
2. If both `--resume` and `--reset` are skipped in `gandlf run`, the model weights, state dictionary, and all previously saved information (parameters, training/validation/testing data) are used to resume training.

### How can I update GaNDLF?

Run `pip install --upgrade gandlf`
to get the latest version of GaNDLF, or, if you are interested in the nightly builds, run `pip install --upgrade --pre gandlf`. Alternatively, if you installed from source, run `git pull` from the base `GaNDLF` directory to get the latest master of GaNDLF, and follow this up with `pip install -e .` after activating the appropriate virtual environment to ensure the updates get picked up.

### How can I federate my model using OpenFL?

Please see https://mlcommons.github.io/GaNDLF/usage/#federating-your-model-using-openfl.
### How can I federate my model evaluation using MedPerf?

Please see https://mlcommons.github.io/GaNDLF/usage/#federating-your-model-evaluation-using-medperf.
### I was using GaNDLF version `0.0.19` or earlier, and I am facing issues after updating to `0.0.20` or later. What should I do?

Please read the migration guide to understand the changes that have been made to GaNDLF. If you have any questions, please feel free to post a support request.
### Why am I getting a version mismatch error for my configuration?

This is a safety feature that ensures a tight integration between the configuration used to define a model and the code version used to perform the training. Ensure that all requirements are satisfied, then check the `version` key in the configuration and make sure it matches the output of `gandlf run --version`.
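For illustration, the `version` key pins a supported range of code versions. This sketch is based on common GaNDLF sample configurations, and the version numbers below are placeholders; check your own `gandlf run --version` output:

```yaml
version:
  {
    minimum: 0.0.20,
    maximum: 0.0.20
  }
```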
### Why are the results of my `global_*` classification metrics unexpected?

The classification metrics are based on TorchMetrics [ref], and this is an issue that is documented on their side [ref]. Please use either the `per_class_weighted` or the `per_class_average` metrics for final evaluation.
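Independent of that TorchMetrics issue, per-class metrics are also generally more informative on imbalanced data. A generic sketch (plain Python, not GaNDLF code) of how a globally pooled accuracy and a per-class average can disagree:

```python
# Ground truth and predictions for an imbalanced two-class problem:
# class 0 has 8 samples, class 1 has 2, and the model always predicts 0.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0] * 10

# A "global" accuracy pools every sample together.
global_acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# A per-class average (macro) scores each class separately, then averages.
def recall_for(cls):
    indices = [i for i, t in enumerate(y_true) if t == cls]
    return sum(y_pred[i] == cls for i in indices) / len(indices)

per_class_average = (recall_for(0) + recall_for(1)) / 2

print(global_acc)         # 0.8: looks strong
print(per_class_average)  # 0.5: exposes that class 1 is never predicted
```

The pooled score hides the minority class entirely, which is why per-class variants are preferred for final evaluation.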
### What if I have another question?

Please post a support request.