Using ArcaNN

Iterations, Steps and Phases of the Iterative Procedure

At this stage, ArcaNN is installed on your machine and you have made the necessary changes to adapt it to your setup (see HPC Configuration). As in the GitHub repository, the location where you installed ArcaNN now contains an arcann_training/ folder with several files, as well as the arcann_training/ scripts, a tools/ directory, and an examples/ directory.

To start the procedure, create an empty directory anywhere you like that will be your iterative training working directory. We will refer to this directory by the variable name $WORK_DIR.
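For example (the path below is purely illustrative; any empty directory works):

mkdir -p ~/arcann_work        # illustrative location
cd ~/arcann_work
export WORK_DIR=$PWD          # referred to as $WORK_DIR throughout this guide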

We will describe the prerequisites, then the initialization, training, exploration, and labeling steps, and finally the optional test. At the end of each step description, we include an example.

As described in more detail below, training the NNP proceeds in iterations consisting of three steps: exploration, labeling, and training. Each step is broken down into elementary tasks, which we call phases. Each iteration will have three folders: XXX-exploration, XXX-labeling, and XXX-training (e.g., XXX is 003 for the 3rd iteration). Each step is executed in its corresponding folder by running, in order, the relevant phases with the following command:

python -m arcann_training STEP_NAME PHASE_NAME 

where STEP_NAME refers to the current step (initialization, exploration, labeling, training, or test) and PHASE_NAME is the specific task that needs to be performed within that step. This will become clearer with examples in the sections below, where each step is explained. The following tables provide a brief description of the phases in each step, in the correct order. Since initialization has only a single start phase, which is self-explanatory, it is detailed in the example below.
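For instance, since initialization consists of that single phase, it reduces to one command, run from your working directory:

cd $WORK_DIR
python -m arcann_training initialization start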

Exploration

| Phase | Description |
|---|---|
| prepare | Prepares the folders for running the exploration MDs of all systems (automatically generating the input files required for each simulation). |
| launch | Submits the MD simulations to the specified partition of the cluster, usually with a SLURM array. |
| check | Verifies whether the exploration simulations have completed successfully. If any simulations ended abruptly, it indicates which ones, allowing the user to skip or force them (see Exploration). |
| deviate | Reads the model deviation (maximum deviation between atomic forces predicted by the committee of NNs) along the trajectories of each system and identifies configurations that are candidates (deviations within specified boundaries; see Exploration). |
| extract | Extracts a user-defined number of candidate configurations per system, saving them to a SYSNAME/candidates_SYSNAME.xyz file for labeling and addition to the NNP training set. |
| clean | Removes files that are no longer required (optional). |
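In practice, an exploration step therefore amounts to running the phases in order from its folder (003 is used here as an example iteration number):

cd $WORK_DIR/003-exploration
python -m arcann_training exploration prepare
python -m arcann_training exploration launch
# wait for the submitted MD jobs to finish
python -m arcann_training exploration check
python -m arcann_training exploration deviate
python -m arcann_training exploration extract
python -m arcann_training exploration clean    # optional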

Labeling

| Phase | Description |
|---|---|
| prepare | Prepares the folders and files needed to run electronic structure calculations on the identified candidates of each system, obtaining the energies and forces required to train the NNP. |
| launch | Submits the calculations with one SLURM array per system. |
| check | Verifies that the calculations have completed successfully. If any calculations ended abruptly, it writes their indices to a text file in the corresponding SYSNAME/ folder. The user must decide whether to skip each failed calculation or resubmit it manually before proceeding. |
| extract | Extracts the necessary information from the CP2K outputs and builds DeePMD-kit "systems"/datasets for each system (stored in the $WORK_DIR/data/ folder). |
| clean | Removes files that are no longer required and compresses the calculation outputs into an archive (optional). |
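A labeling step follows the same pattern; failed calculations must be dealt with (skipped or resubmitted) before extract:

cd $WORK_DIR/003-labeling
python -m arcann_training labeling prepare
python -m arcann_training labeling launch
# wait for the SLURM arrays to finish
python -m arcann_training labeling check
python -m arcann_training labeling extract
python -m arcann_training labeling clean    # optional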

Training

| Phase | Description |
|---|---|
| prepare | Prepares the folders and files for training the user-defined number of independent NNPs to be used in the next iteration. |
| launch | Submits the training calculations using the dp train code from DeePMD-kit. |
| check | Verifies whether the training has completed successfully. If any training runs ended abruptly, they must be resubmitted manually to ensure the training finishes correctly. |
| freeze | Freezes the NN parameters into a binary file (.pb extension for the TensorFlow back-end) usable with LAMMPS and Python. This is done with the dp freeze code from DeePMD-kit. |
| check_freeze | Verifies that the calculations completed successfully. If any calculations ended abruptly, they must be resubmitted manually to ensure freezing completes correctly. |
| compress | Compresses the NNP by modifying the .pb file to enhance performance with minimal loss of accuracy. Uses the dp compress code from DeePMD-kit (optional). |
| check_compress | Verifies that the calculations completed successfully. If any calculations ended abruptly, they must be resubmitted manually to ensure compression completes correctly. |
| increment | Changes the iteration number in control and creates new exploration, labeling, and training folders for the next iteration. |
| clean | Removes files that are no longer required (optional). |
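A full training step thus runs as follows (compress and its check are optional; increment is what moves the procedure to the next iteration):

cd $WORK_DIR/003-training
python -m arcann_training training prepare
python -m arcann_training training launch
# wait for the training jobs to finish
python -m arcann_training training check
python -m arcann_training training freeze
python -m arcann_training training check_freeze
python -m arcann_training training compress          # optional
python -m arcann_training training check_compress    # optional, only after compress
python -m arcann_training training increment         # creates the 004-* folders
python -m arcann_training training clean             # optional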

Test

| Phase | Description |
|---|---|
| prepare | Prepares the folders and files for testing the performance of the current iteration's NNP on each dataset included in the training set. |
| launch | Submits the testing calculations using the dp test code from DeePMD-kit. If you need the "detail files" generated by dp test, request them directly in the job_test_deepmd file. |
| check | Verifies whether the calculations have completed successfully. |
| clean | Removes files that are no longer required (optional). If "detail files" were not requested, the XXX-test/ folder will be removed, as all the step information is consolidated in the control/test_XXX.json file. Otherwise, the "detail files" are compressed into .npy format and stored in XXX-test/. |
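The optional test step follows the same pattern:

cd $WORK_DIR/003-test
python -m arcann_training test prepare
python -m arcann_training test launch
# wait for the test jobs to finish
python -m arcann_training test check
python -m arcann_training test clean    # optional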

Parameters

Parameters must be defined for most phases of each step (e.g., length of MD simulations, temperature, number of CPU tasks for labeling calculations, etc.). This is done via input files in JSON format. Executing a phase without an input file will use all the default values (see the exploration.json file in examples/inputs for all exploration phases) and write them to a default_json.json file, which serves as a reminder of what the default values are. After a phase executes successfully, a used_input.json file is created, indicating which parameters ArcaNN used for that phase. It is appended with additional parameters after each subsequent phase (e.g., after exploration prepare, a used_input.json is created, and it is appended after exploration deviate with parameters specific to the deviate phase). To override the default values for a phase, simply create an input.json file with the parameters you want to change. For example, to override the number of picoseconds for the exploration prepare phase, add an input.json file like this:

{
    "exp_time_ps": 100
}

Then run (or rerun) exploration prepare: you will see in the used_input.json file that the requested parameters have been read, and that dependent parameters (e.g., the walltime) have been adjusted accordingly.

The parameters indicated in an input.json file will always override the default or auto-calculated ones, and for some of them, the values will persist. For instance, if you provided an override for max_candidates in iteration 003, it will be maintained in iteration 004 without requiring another input.json.
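For instance, a minimal input.json carrying such a persistent override could look like this (the value 50 is purely illustrative):

{
    "max_candidates": 50
}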

We will now describe each step of the concurrent learning procedure in detail.