Using ArcaNN
Iterations, Steps and Phases of the Iterative Procedure
At this stage, ArcaNN is installed on your machine, and you have made the necessary changes to adapt it to your setup (see HPC Configuration). As in the GitHub repository, the location where you installed ArcaNN now contains several files, the `arcann_training/` scripts, a `tools/` directory, and an `examples/` directory.
To start the procedure, create an empty directory anywhere you like; it will be your iterative training working directory. We will refer to this directory by the variable name `$WORK_DIR`.
We will describe the prerequisites, then the initialization, training, exploration, and labeling steps, and finally the optional test. At the end of each step description, we include an example.
As described in more detail below, training the NNP proceeds in iterations consisting of three steps: exploration, labeling, and training.
Each step is broken down into elementary tasks, which we call phases.
Each iteration has three folders: `XXX-exploration`, `XXX-labeling`, and `XXX-training` (e.g., `XXX` is `003` for the 3rd iteration).
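For illustration, here is a sketch of what `$WORK_DIR` might look like at the 3rd iteration; the `control/` and `data/` folders mentioned below are created and maintained by ArcaNN, and the exact contents depend on your systems:

```
$WORK_DIR/
├── control/            # iteration state and per-step summaries (e.g., test_XXX.json)
├── data/               # DeePMD-kit datasets built during the labeling steps
├── 003-exploration/
├── 003-labeling/
└── 003-training/
```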
Each step is executed in its corresponding folder by running, in order, the relevant phases with the following command:
```
python -m arcann_training STEP_NAME PHASE_NAME
```
where `STEP_NAME` refers to the current step (`initialization`, `exploration`, `labeling`, `training`, or `test`) and `PHASE_NAME` is the specific task that needs to be performed within that step.
This will become clearer with examples in the sections below, where each step is explained.
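For example, running the first phase of the exploration step from the corresponding folder would look like:

```bash
python -m arcann_training exploration prepare
```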
The following tables provide a brief description of the phases in each step, in the correct order.
Since `initialization` has only a single `start` phase, which is self-explanatory, it is detailed in the example below.
Exploration
| Phase | Description |
|---|---|
| `prepare` | Prepares the folders for running the exploration MDs of all systems (automatically generating the input files required for each simulation). |
| `launch` | Submits the MD simulations to the specified partition of the cluster, usually with a SLURM array. |
| `check` | Verifies whether the exploration simulations completed successfully. If any simulations ended abruptly, it indicates which ones, allowing the user to skip or force them (see Exploration). |
| `deviate` | Reads the model deviation (maximum deviation between atomic forces predicted by the committee of NNs) along the trajectories of each system and identifies candidate configurations (deviations within specified boundaries; see Exploration). |
| `extract` | Extracts a user-defined number of candidate configurations per system, saving them to a `SYSNAME/candidates_SYSNAME.xyz` file for labeling and addition to the NNP training set. |
| `clean` | Removes files that are no longer required (optional). |
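Putting this together, a typical exploration step is driven from the iteration's exploration folder roughly as follows (a sketch assuming iteration `003`; `check` is run after the submitted jobs have finished):

```bash
cd $WORK_DIR/003-exploration
python -m arcann_training exploration prepare
python -m arcann_training exploration launch    # submits the MDs (SLURM array)
python -m arcann_training exploration check     # run once the jobs have finished
python -m arcann_training exploration deviate
python -m arcann_training exploration extract
python -m arcann_training exploration clean     # optional
```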
Labeling
| Phase | Description |
|---|---|
| `prepare` | Prepares the folders and files to run electronic structure calculations on the identified candidates of each system, obtaining the energies and forces required to train the NNP. |
| `launch` | Submits the calculations with one SLURM array per system. |
| `check` | Verifies that the calculations completed successfully. If any calculations finished abruptly, it writes their indices to a text file in the corresponding `SYSNAME/` folder. The user must decide whether to skip or manually resubmit each failed calculation before proceeding. |
| `extract` | Extracts the necessary information from the CP2K outputs and builds the DeePMD-kit "systems"/datasets for each system (stored in the `$WORK_DIR/data/` folder). |
| `clean` | Removes files that are no longer required and compresses the calculation outputs into an archive (optional). |
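The labeling step follows the same pattern; a sketch for iteration `003`:

```bash
cd $WORK_DIR/003-labeling
python -m arcann_training labeling prepare
python -m arcann_training labeling launch   # one SLURM array per system
python -m arcann_training labeling check    # run once the jobs have finished
python -m arcann_training labeling extract
python -m arcann_training labeling clean    # optional
```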
Training
| Phase | Description |
|---|---|
| `prepare` | Prepares the folders and files for training the user-defined number of independent NNPs to be used in the next iteration. |
| `launch` | Submits the training calculations using the `dp train` code from DeePMD-kit. |
| `check` | Verifies whether the trainings completed successfully. Any training that ended abruptly must be resubmitted manually to ensure it finishes correctly. |
| `freeze` | Freezes the NN parameters into a binary file (`.pb` extension for the TensorFlow back-end) usable with LAMMPS and Python. This is done with the `dp freeze` code from DeePMD-kit. |
| `check_freeze` | Verifies that the calculations completed successfully. Any calculation that finished abruptly must be resubmitted manually to ensure the freezing completes correctly. |
| `compress` | Compresses the NNP by modifying the `.pb` file to enhance performance with minimal loss of accuracy. Uses the `dp compress` code from DeePMD-kit (optional). |
| `check_compress` | Verifies that the calculations completed successfully. Any calculation that finished abruptly must be resubmitted manually to ensure the compression completes correctly. |
| `increment` | Updates the iteration number in `control/` and creates the new `exploration`, `labeling`, and `training` folders for the next iteration. |
| `clean` | Removes files that are no longer required (optional). |
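A sketch of the full training step, again for iteration `003` (the `compress`/`check_compress` pair is only needed if you compress the NNPs):

```bash
cd $WORK_DIR/003-training
python -m arcann_training training prepare
python -m arcann_training training launch          # dp train jobs
python -m arcann_training training check           # run once the jobs have finished
python -m arcann_training training freeze          # dp freeze
python -m arcann_training training check_freeze
python -m arcann_training training compress        # dp compress (optional)
python -m arcann_training training check_compress  # only if compress was run
python -m arcann_training training increment       # sets up iteration 004
python -m arcann_training training clean           # optional
```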
Test
| Phase | Description |
|---|---|
| `prepare` | Prepares the folders and files for testing the performance of the current iteration's NNP on each dataset included in the training set. |
| `launch` | Submits the testing calculations using the `dp test` code from DeePMD-kit. If you need the "detail files" generated by `dp test`, include this directly in the `job_test_deepmd` file. |
| `check` | Verifies whether the calculations completed successfully. |
| `clean` | Removes files that are no longer required (optional). If "detail files" were not requested, the `XXX-test/` folder is removed, as all the step information is consolidated in the `control/test_XXX.json` file. Otherwise, the "detail files" are compressed into `.npy` format and stored in `XXX-test/`. |
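If you run the optional test step, the sequence is analogous (sketch for iteration `003`):

```bash
cd $WORK_DIR/003-test
python -m arcann_training test prepare
python -m arcann_training test launch   # dp test jobs
python -m arcann_training test check    # run once the jobs have finished
python -m arcann_training test clean    # optional
```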
Parameters
Parameters will need to be defined for most phases of each step (e.g., length of MD simulations, temperature, number of CPU tasks for labeling calculations, etc.).
This is done via input files in the JSON format.
Executing a phase without an input file will use all the default values (see the `exploration.json` file in `examples/inputs` for all `exploration` phases) and write them to a `default_json.json` file. This file serves as a reminder of what the default values are.
After successfully executing a phase, a `used_input.json` file will be created, indicating which parameters ArcaNN used for that phase. It will be appended with additional parameters after each subsequent phase (e.g., after `exploration prepare`, a `used_input.json` is created, and it is appended after `exploration deviate` with parameters specific to the `deviate` phase).
If you want to override the default values for a phase, simply create an `input.json` file with the parameters you want to change. For example, to override the number of picoseconds for the `exploration prepare` phase, add an `input.json` file like this:
```json
{
    "exp_time_ps": 100
}
```
Then run (or rerun) `exploration prepare`: you will see in the `used_input.json` that the requested parameters have been read and that dependent parameters (e.g., walltime) have been adjusted accordingly.
The parameters indicated in an `input.json` file will always override the default or auto-calculated ones, and for some of them the values persist across iterations. For instance, if you provided an override for `max_candidates` in iteration `003`, it will be maintained in iteration `004` without requiring another `input.json`.
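As a hypothetical illustration (the value here is arbitrary), such a persistent override is simply another entry in the `input.json` of the iteration where you first set it:

```json
{
    "max_candidates": 250
}
```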
We will now describe each step of the concurrent learning procedure in detail.