Gym === Intro ----- :code:`perturb-gym` is a special part of :code:`perturb-lib` specifically designed for easily configurable training of perturbation models. It is also a convenient tool for logging results and training details, as well as for aggregating results in a readable and processable format. Training perturbation models using :code:`perturb-gym` ------------------------------------------------------ At the beginning, we enter poetry environment: .. code-block:: bash poetry shell Training configurations are defined in the form of :code:`.yaml` files. Predefined ones are given in the :code:`perturb_gym/configs/collection` directory. Configuration files allow specifying models to be trained, model arguments (hyperparameters), train/validation/test splits, as well as environment parameters such as random seed. Each configuration file can be potentially used to specify a grid of training configurations. An example :code:`.yaml` file structure is given below: .. code-block:: yaml environment_configs: - seed: 17 data_configs: - training_contexts: "Replogle22_K562" val_and_test_perturbations_selected_from: "Replogle22_K562" model_configs: - model_id: LPM model_args: optimizer_name: "AdamW" learning_rate: 0.002 learning_rate_decay: 0.996 num_layers: [1,2,3] hidden_dim: 256 dropout: 0.0 batch_size: 5000 embedding_dim: 32 lightning_trainer_pars: max_epochs: 1 save_model_after_training: True Training will be performed as specified in the corresponding configuration file. In this example, 3 different training configurations are defined, where in each one different number of layers will be used. Training is executed by passing the id of the configuration file to the training script. For example, if the id of the configuration file is `example2`, the corresponding command would be: .. code-block:: bash python -m perturb_gym.training train_from_config_file example2 Multiple training configurations that are defined within the training configuration file can also be executed in parallel in case SLURM environment is available. This can be done as follows: .. code-block:: bash python -m perturb_gym.training train_from_config_file example2 use_slurm=True If for user the existing collection of configuration files is not sufficient, the user can train perturbation models based on newly defined configuration file. This can be done as follows .. code-block:: bash python -m perturb_gym.training train_from_config_file "/path/to/yaml/config/file" Results are by default generated in the cache folder but user can also specify a custom folder as follows: .. code-block:: bash python -m perturb_gym.training train_from_config_file example1 results_dir="/path/to/results/dir" Evaluating trained models and processing results ------------------------------------------------ Trained models can be (re-)evaluated by running: .. code-block:: bash python -m perturb_gym.evaluation evaluate_all_trained_models "/path/to/dir/with/models" This command will evaluate all models found in the given directory and all subdirectories. The following command will process all the results and store them in a convenient :code:`pandas.DataFrame` format. .. code-block:: bash python -m perturb_gym.evaluation process_results "/path/to/dir/with/models"