SN2
Here we introduce the basic usage of the ArcaNN software, illustrated by an SN2 reaction. All the files are available in the GitHub repository; after ArcaNN installation, you will find them at examples/sn2_ch3cl_br/ inside your local arcann_training directory.
The iterative training and dataset generation for the SN2 reaction comprised two successive trainings: a first non-reactive training performed on reactant and product structures, followed by a reactive training where transition structures were generated.
The file setup for the non-reactive SN2 ArcaNN training is illustrated below. Then, the ArcaNN inputs for each step of the first iteration and the corresponding control JSON files are detailed.
User files
We will start by creating a user_files/ directory (see Iterative procedure prerequisites), where we will include the necessary files for each step of the procedure. You also need to create a data/ directory where the initial labeled datasets will be stored. For the reactive training, you will store the datasets of the non-reactive training in the corresponding data/ directory, together with the initial datasets.
For the non-reactive training, 6 systems were defined: 3 systems to explore the reactant basin (ch3cl_br_close_300K, ch3cl_br_free_300K, ch3cl_br_smd_300K) and 3 systems to explore the product basin (ch3br_cl_close_300K, ch3br_cl_free_300K, ch3br_cl_smd_300K).
In the user_files/ folder you will find the following files for each one of the systems (for clarity, we only indicate the files of the ch3cl_br_close_300K system here). Note also that hpc1 and hpc2 are the machine keywords indicated in the machine.json file; see HPC Configuration.
JSON FILES
- machine.json: file containing the cluster parameters.
- dp_train_2.1.json: input for the DeePMD trainings.
JOB FILES
- job_lammps-deepmd_explore_gpu_hpc1.sh and job-array_lammps-deepmd_explore_gpu_hpc1.sh: job scripts for exploration.
- job_CP2K_label_cpu_hpc1.sh and job-array_CP2K_label_hpc1.sh: job scripts for labeling.
- job_deepmd_compress_gpu_hpc1.sh, job_deepmd_freeze_gpu_hpc1.sh and job_deepmd_train_gpu_hpc1.sh: job scripts for training.
CP2K FILES
- 1_ch3cl_br_close_300K_labeling_XXXXX_hpc1.inp, 2_ch3cl_br_close_300K_labeling_XXXXX_hpc1.inp, 1_ch3cl_br_close_300K_labeling_XXXXX_hpc2.inp and 2_ch3cl_br_close_300K_labeling_XXXXX_hpc2.inp: inputs for CP2K labeling. There are 2 input files per subsystem; see details in Labeling.
LAMMPS FILES
- ch3cl_br_close_300K.lmp: starting configuration for the first exploration, in the LAMMPS format.
- ch3cl_br_close_300K.in: input for the LAMMPS explorations.
- plumed_ch3cl_br_close_300K.dat: PLUMED input files for the explorations.
Additional PLUMED files can be used, and must be named plumed_KEYWORD_SYSTEM.dat. Here, we used an additional PLUMED file to store the collective variables and another to define the key atoms: plumed_colvars_ch3cl_br_close_300K.dat and plumed_atomdef_ch3cl_br_close_300K.dat.
The atom order is defined in the properties.txt file. It ensures that the order of the atoms in the SYSTEM.lmp files matches the order indicated in the "type_map" keyword of the DeePMD-kit dp_train_2.1.json training file, and that the generated structures present the correct atom numbering, to avoid conflicts.
Initialization
After the initialization step, a default_input.json file is generated, containing the names of the LMP systems found in user_files/ and the default number of NNPs for training defined in ArcaNN:
{
"systems_auto": ["ch3br_cl_close_300K", "ch3br_cl_free_300K", "ch3br_cl_smd_300K", "ch3cl_br_close_300K", "ch3cl_br_free_300K", "ch3cl_br_smd_300K"],
"nnp_count": 3
}
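The initialization simply matches the .lmp files present in user_files/ to build this list. A minimal sketch of the idea (hypothetical helper, not ArcaNN's actual code):

```python
import glob
import json
import os

def build_default_input(user_files_dir, nnp_count=3):
    """Collect system names from the .lmp files in user_files/ and
    return a dict shaped like the default_input.json shown above."""
    lmp_files = sorted(glob.glob(os.path.join(user_files_dir, "*.lmp")))
    systems = [os.path.splitext(os.path.basename(f))[0] for f in lmp_files]
    return {"systems_auto": systems, "nnp_count": nnp_count}

if __name__ == "__main__":
    print(json.dumps(build_default_input("user_files"), indent=4))
```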
Training
You can now move to the 000-training directory, corresponding to the training of the first generation of NNPs. After running the prepare phase, a default_input.json file is created. To modify some of the default parameters, an input.json file must be created in the same directory, where only the parameters to be updated need to be indicated, as follows:
{
"user_machine_keyword_train": "v100_myproject1",
"job_walltime_train_h": 12.0
}
Then, the input is updated and stored in the directory as used_input.json:
{
"user_machine_keyword_train": "v100_myproject1",
"user_machine_keyword_freeze": "v100_myproject1",
"user_machine_keyword_compress": "v100_myproject1",
"job_email": "",
"use_initial_datasets": true,
"use_extra_datasets": false,
"deepmd_model_version": 2.1,
"job_walltime_train_h": 12.0,
"mean_s_per_step": 0.108,
"start_lr": 0.001,
"stop_lr": 1e-06,
"decay_rate": 0.9172759353897796,
"decay_steps": 5000,
"decay_steps_fixed": false,
"numb_steps": 400000,
"numb_test": 0
}
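The learning-rate parameters are linked, which is why "decay_rate" shows a non-round value: it is derived from the start and stop learning rates and the number of decay periods. A quick check in plain Python (assumed arithmetic, not an ArcaNN function):

```python
# Exponential learning-rate schedule: start_lr decays to stop_lr over
# numb_steps, in periods of decay_steps:
#   stop_lr = start_lr * decay_rate ** (numb_steps / decay_steps)
# hence
#   decay_rate = (stop_lr / start_lr) ** (decay_steps / numb_steps)
start_lr, stop_lr = 0.001, 1e-06
numb_steps, decay_steps = 400000, 5000

decay_rate = (stop_lr / start_lr) ** (decay_steps / numb_steps)
print(decay_rate)  # ~0.91728, matching the value in used_input.json
```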
The corresponding control file in your local $WORKDIR/control/ is updated after the execution of each phase. Once the 000-training step is finished, you will find the following training_000.json file:
{
"user_machine_keyword_train": "v100_myproject1",
"user_machine_keyword_freeze": "v100_myproject1",
"user_machine_keyword_compress": "v100_myproject1",
"job_email": "",
"use_initial_datasets": true,
"use_extra_datasets": false,
"deepmd_model_version": 2.1,
"job_walltime_train_h": 12.0,
"mean_s_per_step": 0.039030916666666665,
"start_lr": 0.001,
"stop_lr": 1e-06,
"decay_rate": 0.9172759353897796,
"decay_steps": 5000,
"decay_steps_fixed": false,
"numb_steps": 400000,
"numb_test": 0,
"training_datasets": ["init_ch3br_cl_xxxxx_1001_4001_60", "init_ch3cl_br_xxxxx_1001_4001_60"],
"trained_count": 1000,
"initial_count": 1000,
"added_auto_count": 0,
"added_adhoc_count": 0,
"added_auto_iter_count": 0,
"added_adhoc_iter_count": 0,
"extra_count": 0,
"is_prepared": true,
"is_launched": true,
"is_checked": true,
"is_freeze_launched": true,
"is_frozen": true,
"is_compress_launched": true,
"is_compressed": true,
"is_incremented": true,
"min_nbor_dist": 0.9898124626241066,
"max_nbor_size": [30, 45, 1, 1, 17],
"median_s_per_step": 0.038560000000000004,
"stdeviation_s_per_step": 0.0011691332942493009
}
When a phase is executed successfully, the corresponding "is_prepared", "is_launched", "is_checked", etc. keywords are set to true. Additional performance data, such as the mean ("mean_s_per_step"), median ("median_s_per_step") and standard deviation ("stdeviation_s_per_step") of the time per training step, are reported in this file.
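These measured timings can be used to tune "job_walltime_train_h" for the next iterations. For instance, with the values reported in training_000.json (simple arithmetic shown for illustration; the safety margin is an assumed choice, not an ArcaNN parameter):

```python
# Estimate the wall time a training run actually needs from the
# measured mean time per step (values from training_000.json).
mean_s_per_step = 0.039030916666666665
numb_steps = 400000
margin = 1.2  # 20 % safety margin (illustrative choice)

estimated_h = mean_s_per_step * numb_steps * margin / 3600
print(f"{estimated_h:.1f} h")  # 5.2 h, well under the 12 h requested
```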
Exploration
After the first training phase, you now have starting NNPs that can be used to propagate reactive MD. After executing the prepare phase in the 001-exploration/ folder, you will obtain a default_input.json file with default values.
We allow slightly larger deviations for this first exploration by setting the "sigma_low" keyword to 0.15 eV/Ang. This is done by modifying the input.json and running prepare again:
{
"sigma_low": 0.15
}
The used_input.json then becomes:
{
"user_machine_keyword_exp": "v100_myproject1",
"job_email": "",
"atomsk_path": "/programs/apps/atomsk/0.13.1/atomsk",
"vmd_path": "/prod/vmd/1.9.4a43/bin/vmd_LINUXAMD64",
"exploration_type": ["lammps", "lammps", "lammps", "lammps", "lammps", "lammps"],
"traj_count": [2, 2, 2, 2, 2, 2],
"temperature_K": [300.0, 300.0, 300.0, 300.0, 300.0, 300.0],
"timestep_ps": [0.0005, 0.0005, 0.0005, 0.0005, 0.0005, 0.0005],
"previous_start": [true, true, true, true, true, true],
"disturbed_start": [false, false, false, false, false, false],
"print_interval_mult": [0.01, 0.01, 0.01, 0.01, 0.01, 0.01],
"job_walltime_h": [1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
"exp_time_ps": [10.0, 10.0, 41.0, 10.0, 10.0, 41.0],
"max_exp_time_ps": [400, 400, 400, 400, 400, 400],
"max_candidates": [50, 50, 50, 50, 50, 50],
"sigma_low": [0.15, 0.15, 0.15, 0.15, 0.15, 0.15],
"sigma_high": [0.7, 0.7, 0.7, 0.7, 0.7, 0.7],
"sigma_high_limit": [1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
"ignore_first_x_ps": [0.5, 0.5, 0.5, 0.5, 0.5, 0.5],
"disturbed_start_value": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
"disturbed_start_indexes": [[], [], [], [], [], []],
"disturbed_candidate_value": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
"disturbed_candidate_indexes": [[], [], [], [], [], []]
}
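In the deviation phase, these thresholds sort the stored frames: structures whose maximal force deviation between the NNPs stays below "sigma_low" are considered accurately described, those between "sigma_low" and "sigma_high" become candidates for labeling, and those above "sigma_high" are discarded as unreliable. A conceptual sketch of this selection logic (illustrative code, not ArcaNN's implementation):

```python
# Classify exploration frames by their max force deviation (eV/Ang),
# mimicking the sigma_low / sigma_high selection described above.
def classify_frame(max_deviation, sigma_low=0.15, sigma_high=0.7):
    if max_deviation < sigma_low:
        return "accurate"   # NNPs agree: no new information
    if max_deviation < sigma_high:
        return "candidate"  # useful structure: send to labeling
    return "rejected"       # NNPs diverge too much: unreliable frame

deviations = [0.05, 0.22, 0.95]
print([classify_frame(d) for d in deviations])
# → ['accurate', 'candidate', 'rejected']
```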
For the first iteration, the default parameters are a good starting point. The "traj_count" keyword sets the number of simulations per NNP and per system to 2, and "timestep_ps" sets the timestep of the simulations to 0.0005 ps. The "disturbed_candidate_value" keywords are all set to 0, so no disturbance is applied to the candidate structures that will be added to the training set.
To perform the explorations, one directory per system is created, each containing 3 subdirectories (one per trained NNP), 1/, 2/ and 3/, which in turn each contain 2 subdirectories (by default), 0001/ and 0002/. This means that a total of 36 MD trajectories will be performed for this first iteration. Be careful: the total exploration time can quickly become huge, especially if you have many systems.
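The count of 36 trajectories follows directly from the configuration; as a sanity check:

```python
# Total number of MD trajectories per exploration iteration:
# one run per (system, NNP, trajectory-index) combination.
n_systems = 6   # entries in systems_auto
nnp_count = 3   # trained NNPs
traj_count = 2  # "traj_count" per NNP and per system

total = n_systems * nnp_count * traj_count
print(total)  # → 36, the "nb_sim" value reported in exploration_001.json
```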
Let us now have a look at the exploration_001.json file inside the $WORKDIR/control/ folder:
{
"atomsk_path": "/programs/apps/atomsk/0.13.1/atomsk",
"user_machine_keyword_exp": "v100_myproject1",
"deepmd_model_version": 2.1,
"nnp_count": 3,
"systems_auto": {
"ch3br_cl_close_300K": {
// exploration parameters from used_input.json
},
"ch3br_cl_free_300K": {
//
},
"ch3br_cl_smd_300K": {
//
},
"ch3cl_br_close_300K": {
//
},
"ch3cl_br_free_300K": {
//
},
"ch3cl_br_smd_300K": {
//
}
},
"is_locked": true,
"is_launched": true,
"is_checked": true,
"is_deviated": true,
"is_extracted": true,
"nb_sim": 36,
"vmd_path": "/prod/vmd/1.9.4a43/bin/vmd_LINUXAMD64"
}
The total number of MD simulations is indicated by the "nb_sim" keyword. The "vmd_path" and "atomsk_path" correspond to the ones indicated in the used_input.json, but are not necessary if these codes are already available in the ArcaNN path. When the exploration step is successfully finished, all the phase keywords are set to true.
Labeling
For the last step of the first iteration, we move to the $WORKDIR/001-labeling/ folder to run the different phases. You should adapt the Slurm parameters for the electronic structure calculations to match the architecture of your system. In this case, the number of MPI processes per node is set to 16 with the "nb_mpi_per_node" keyword in the input.json:
{
"user_machine_keyword_label": "mykeyword1",
"nb_mpi_per_node": 16
}
As usual, the used_input.json file will be updated accordingly when re-running the prepare phase:
{
"user_machine_keyword_label": "mykeyword1",
"job_email": "",
"labeling_program": "cp2k",
"walltime_first_job_h": [0.5, 0.5, 0.5, 0.5, 0.5, 0.5],
"walltime_second_job_h": [1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
"nb_nodes": [1, 1, 1, 1, 1, 1],
"nb_mpi_per_node": [16, 16, 16, 16, 16, 16],
"nb_threads_per_mpi": [1, 1, 1, 1, 1, 1]
}
The number of MPI processes has been set to 16 for the 6 systems. The walltimes of both calculations (two calculations are performed when using CP2K: a first quick calculation at a lower level of theory, followed by one at the reference level) are kept at the default values. The wall times must be set by hand for the first iteration, but can later be guessed automatically from the average time per CP2K calculation measured in the previous iteration. We can now run the first 2 phases and wait for the electronic structure calculations to finish. When running the check phase, there could be a message telling us that there are failed configurations in the water-reactive folder. We can see which calculations did not converge in the water-reactive/water-reactive_step2_not_converged.txt file. Suppose there were 2 failed jobs, the 13th and the 54th; we can simply do touch water-reactive/00013/skip and touch water-reactive/00054/skip and run the check phase again. This time it will inform us that some configurations will be skipped, but the final message should be that the check phase is a success. All that is left to do now is run the extract phase, clean up with the clean phase (storing the wavefunctions and removing all unwanted data), and finally update our local folder. We have now augmented our total training set and can perform a new training iteration, and keep iterating until convergence is reached!
Finally, we can check the labeling_001.json
file in $WORKDIR/control/
:
{
"labeling_program": "cp2k",
"user_machine_keyword_label": "mykeyword1",
"systems_auto": {
"ch3br_cl_close_300K": {
// labeling parameters from used_input.json
},
"ch3br_cl_free_300K": {
//
},
"ch3br_cl_smd_300K": {
//
},
"ch3cl_br_close_300K": {
//
},
"ch3cl_br_free_300K": {
//
},
"ch3cl_br_smd_300K": {
}
},
"total_to_label": 50,
"launch_all_jobs": true,
"is_locked": true,
"is_launched": true,
"is_checked": true,
"is_extracted": true
}
The total number of structures selected for labeling from the candidates of the previous exploration step is indicated by the "total_to_label" keyword.
The first iteration is done. After executing the extract phase, the directories for the next iteration will be created.