Fitting SNAP using GaN data

This example:

Uses the atomic positions in the folder example_GaN_data
Generates surrogate DFT data based on the GaN model presented in 10.1088/1361-648x/ab6cbe
Uses snap.jl and 80% of the GaN data to create the matrix A. The matrix is generated only with the energy block.
Uses 80% of the GaN data to create the vector b. The reference energy ($E_{\rm ref}$) is assumed to be zero.
Uses backslash to fit the parameters $\boldsymbol{\beta}$, thus, solves $\mathbf{A \cdot \boldsymbol{\beta}=y}$
Uses 20% of the GaN data and the bispectrum components provided by snap.jland the fitted parameters $\boldsymbol{\beta}$ to validate the fitting.
The error computed during the validation is < ~5%.

TODO:

Check if the GaN model is correct.
Calculate reference energy ($E_{\rm ref}$).
Check ill-conditioned system.

Install and import dependencies

First let's make sure we have all required packages installed

using Pkg
pkg"add LAMMPS"

We start off by importing the necessary packages

using MPI; MPI.Init()
using LAMMPS
using LinearAlgebra: norm, pinv
using Printf

Estimation of the SNAP coefficients

The choice of the coefficients $\boldsymbol{\beta}=(\beta_0^1, \tilde{\beta}^1, \dots, \beta_0^l, \tilde{\beta}^l)$ is based on a system of linear equations which considers a large number of atomic configurations and $l$ atom types. The matrix formulation for this system $\mathbf{A \cdot \boldsymbol{\beta}=y}$ is defined in the following equations (see 10.1016/j.jcp.2014.12.018):

\[\begin{equation*} \mathbf{A}= \begin{pmatrix} \vdots & & & & & & & & \\ N_{s_1} & \sum_{i=1}^{N_{s_1}} B_1^i & \dots & \sum_{i=1}^{N_{s_1}} B_k^i & \dots & N_{s_L} & \sum_{i=1}^{N_{s_L}} B_1^i & \dots & \sum_{i=1}^{N_{s_L}} B_k^i\\ \vdots & & & & & & & & \\ 0 & -\sum_{i=1}^{N_{s_1}} \frac{\partial B_1^i}{\partial r_j^{\alpha}} & \dots & -\sum_{i=1}^{N_{s_1}} \frac{\partial B_k^i}{\partial r_j^{\alpha}} & \dots & 0 & -\sum_{i=1}^{N_{s_l}} \frac{\partial B_1^i}{\partial r_j^{\alpha}} & \dots & -\sum_{i=1}^{N_{s_l}} \frac{\partial B_k^i}{\partial r_j^{\alpha}} \\ \vdots & & & & & & & & \\ 0 & - \sum_{j=1}^{N_{s_1}} r^j_{\alpha} \sum_{i=1}^{N_{s_1}} \frac{\partial B_1^i}{\partial r_j^{\beta}} & \dots & - \sum_{j=1}^{N_{s_1}} r^j_{\alpha} \sum_{i=1}^{N_{s_1}} \frac{\partial B_k^i}{\partial r_j^{\beta}} & \dots & 0 & - \sum_{j=1}^{N_{s_l}} r^j_{\alpha} \sum_{i=1}^{N_{s_l}} \frac{\partial B_1^i}{\partial r_j^{\beta}} & \dots & - \sum_{j=1}^{N_{s_l}} r^j_{\alpha} \sum_{i=1}^{N_{s_l}} \frac{\partial B_k^i}{\partial r_j^{\beta}}\\ \vdots & & & & & & & & \\ \end{pmatrix} \end{equation*}\]

The indexes $\alpha, \beta = 1,2,3$ depict the $x$, $y$ and $z$ spatial component, $j$ is an individual atom, and $s$ a particular configuration. All atoms in each configuration are considered. The number of atoms of type $m$ in the configuration $s$ is $N_{s_m}$.

The RHS of the linear system is computed as:

\[\begin{equation*} \mathbf{y}= \begin{pmatrix} \vdots \\ E^s_{\rm qm} - E^s_{\rm ref} \\ \vdots \\\\ F^{s,j,\alpha}_{\rm qm} - F^{s,j,\alpha}_{\rm ref} \\ \vdots \\ W_{\rm qm}^{s,\alpha,\beta} - W_{\rm ref}^{s,\alpha,\beta} \\ \vdots \\ \end{pmatrix} \end{equation*}\]

Calculate $A$ using `snap.jl`

This example only fits the energy, thus, the first block of the matrix A.

include(joinpath(dirname(pathof(LAMMPS)), "..", "examples", "snap.jl"))
A

48×62 Matrix{Float64}:
 96.0  714.892  175.705  125.324  …  3157.79  330.568  1336.49  432.028
 96.0  741.638  171.502  122.601     3101.62  340.641  1254.11  545.843
 96.0  736.281  172.097  123.056     3099.71  340.618  1260.01  536.707
 96.0  736.39   172.848  123.5       3100.18  340.488  1254.05  548.47
 96.0  743.85   171.078  122.015     3097.8   339.628  1250.89  543.843
 96.0  736.916  172.51   123.381  …  3101.99  340.895  1256.86  546.525
 96.0  743.904  170.969  122.077     3099.39  341.379  1251.63  549.245
 96.0  738.596  171.981  122.985     3104.01  336.768  1259.6   535.521
 96.0  739.471  172.462  123.217     3107.2   338.89   1259.06  548.391
 96.0  740.706  171.673  122.861     3105.18  341.492  1255.34  551.494
  ⋮                               ⋱                       ⋮     
 96.0  743.088  171.117  122.303     3101.88  344.48   1252.19  555.025
 96.0  737.216  172.745  123.571  …  3102.71  337.652  1256.42  545.619
 96.0  738.869  173.318  123.551     3103.85  334.767  1252.77  547.766
 96.0  741.594  171.769  122.864     3111.31  342.38   1257.92  557.53
 96.0  737.911  171.982  122.856     3104.09  340.038  1259.47  536.737
 96.0  738.979  171.821  122.92      3109.29  341.452  1259.55  545.03
 96.0  741.679  170.768  122.231  …  3100.44  343.354  1254.12  548.632
 96.0  737.134  172.699  123.294     3097.01  336.846  1253.92  539.131
 96.0  738.856  172.706  123.366     3104.08  340.743  1254.96  551.47

Calculate $y$

A molecular mechanics model for the interaction of gallium and nitride is used to generate surrogate DFT data (see 10.1088/1361-648x/ab6cbe). The total potential energy is computed for each configuration

\[E = \sum_{i \lt j; r_{i,j} \leq rcut } E_{GaN}(r_{i,j}) \ \text{ where } \\ E_{GaN}(r_{i,j}) = \left\{ \begin{array}{ll} E_{C}(r_{i,j}) + E_{LJ}(r_{i,j}, \epsilon_{Ga,Ga}, \sigma_{Ga,Ga}) & \mbox{if both atoms are Ga} \\ E_{C}(r_{i,j}) + E_{LJ}(r_{i,j}, \epsilon_{N,N}, \sigma_{N,N}) & \mbox{if both atoms are N} \\ E_{C}(r_{i,j}) + E_{BM}(r_{i,j}, A_{Ga,N}, \rho_{Ga,N}) & \mbox{if one atom is Ga and the other is N}\\ \end{array} \right.\]

Note: this is a work in progress, this model should be checked.

const N = N1 + N2
const M1 = M
const M2 = 61

const ε_Ga_Ga = 0.643
const σ_Ga_Ga = 2.390
const ε_N_N = 1.474
const σ_N_N = 1.981
const A_Ga_N = 608.54
const ρ_Ga_N = 0.435
const q_Ga = 3.0
const q_N = -3.0
const ε0 = 55.26349406 # e2⋅GeV−1⋅fm−1 ToDo: check this
const E_ref = zeros(M1)
E_LJ(r, ε = 1.0, σ = 1.0) = 4.0 * ε * ((σ / norm(r))^12 - (σ / norm(r))^6)
E_BM(r, A = 1.0, ρ = 1.0) = A * exp(-norm(r) / ρ)
E_C(r) = q_Ga * q_N / (4.0 * π * ε0 * norm(r))

function read_atomic_conf(m, N)
    rs = []
    open(joinpath(DATA, string(m), "DATA")) do f
        for i = 1:23
            readline(f)
        end
        for i = 1:N
            s = split(readline(f))
            r = [parse(Float64, s[3]),
                 parse(Float64, s[4]),
                 parse(Float64, s[5])]
            push!(rs, r)
        end
    end
    return rs
end

function calc_tot_energy(rcut, rs, N, ε_Ga_Ga, σ_Ga_Ga, ε_N_N, σ_N_N, A_Ga_N, ρ_Ga_N)
    E_tot_acc = 0.0
    for i = 1:N
        for j = i:N
            r_diff = rs[i] - rs[j]
            if norm(r_diff) <= rcut && norm(r_diff) > 0.0
                if i <= N1 && j <= N1
                    E_tot_acc += E_C(r_diff) + E_LJ(r_diff, ε_Ga_Ga, σ_Ga_Ga)
                elseif i > N1 && j > N1
                    E_tot_acc += E_C(r_diff) + E_LJ(r_diff, ε_N_N, σ_N_N)
                else
                    E_tot_acc += E_C(r_diff) + E_BM(r_diff, A_Ga_N, ρ_Ga_N)
                end
            end
        end
    end
    return E_tot_acc
end

calc_tot_energy (generic function with 1 method)

Vector $y$ is calculated using the DFT data and $E_{\rm ref}$

function calc_y(rcut, M1, N, ε_Ga_Ga, σ_Ga_Ga, ε_N_N, σ_N_N, A_Ga_N, ρ_Ga_N, E_ref)
    y = zeros(M1)
    for m = 1:M1
        rs = read_atomic_conf(m, N)
        y[m] = calc_tot_energy(rcut, rs, N, ε_Ga_Ga, σ_Ga_Ga,
                               ε_N_N, σ_N_N, A_Ga_N, ρ_Ga_N) - E_ref[m]
    end
    return y
end

y = calc_y(rcut, M1, N, ε_Ga_Ga, σ_Ga_Ga, ε_N_N, σ_N_N, A_Ga_N, ρ_Ga_N, E_ref)

48-element Vector{Float64}:
 1722.4498771523417
 1677.2635491164112
 1711.3148873863208
 1732.267614125171
 1708.435054676101
 1690.4950201294132
 1684.066527325373
 1746.948996355329
 1710.9274852177402
 1715.0902572123912
    ⋮
 1744.267721178441
 1681.2684856943063
 1721.9940387385514
 1736.3547928530331
 1745.397940711828
 1726.9880294606958
 1705.9038135458254
 1697.3739427062326
 1715.5343234240943

Calculate the optimal solution

The optimal solution $\widehat{\mathbf{\beta}}$ is, thus:

\[\begin{equation*} \widehat{\mathbf{\beta}} = \mathrm{argmin}_{\mathbf{\beta}} ||\mathbf{A \cdot \boldsymbol{\beta} - y}||^2 = \mathbf{A^{-1} \cdot y} \end{equation*}\]

β = A \ y

62-element Vector{Float64}:
   10.186840372409128
 -153.97924361444737
   32.105482990505585
  -48.040124411822326
   92.71701861106854
   -5.5374135409148195
  193.0568779355885
 -123.4210657200344
 -130.9828131774663
  -90.14059965823128
    ⋮
 -195.9265740329176
  174.71930890601402
   75.46114960910315
   20.530494605269546
  -89.75357933645647
  -77.4138645009578
   30.71316571463391
  107.16067919386244
  -13.9531148168796

Check results

The local energy can be decomposed into separate contributions for each atom. SNAP energy can be written in terms of the bispectrum components of the atoms and a set of coefficients. $K$ components of the bispectrum are considered so that $\mathbf{B}^{i}=\{ B^i_1, \dots, B_K^i\}$ for each atom $i$, whose SNAP energy is computed as follows:

\[ E^i_{\rm SNAP}(\mathbf{B}^i) = \beta_0^{\alpha_i} + \sum_{k=1}^K \beta_k^{\alpha_i} B_k^i = \beta_0^{\alpha_i} + \boldsymbol{\tilde{\beta}}^{\alpha_i} \cdot \mathbf{B}^i\]

where $\alpha_i$ depends on the atom type.

function calc_fitted_tot_energy(path, β, ncoeff, N1, N)
    # Calculate y
    lmp = LMP(["-screen","none"])
    bs = run_snap(lmp, path, rcut, twojmax)

    E_tot_acc = 0.0
    for n in 1:N1
        E_atom_acc = β[1]
        for k in 2:ncoeff
            k2 = k - 1
            E_atom_acc += β[k] * bs[k2, n]
        end
        E_tot_acc += E_atom_acc
    end
    for n in N1+1:N
        E_atom_acc = β[ncoeff+1]
        for k in ncoeff+2:2*ncoeff
            k2 = k - ncoeff - 1
            E_atom_acc += β[k] * bs[k2, n]
        end
        E_tot_acc += E_atom_acc
    end
    command(lmp, "clear")
    return E_tot_acc
end

calc_fitted_tot_energy (generic function with 1 method)

Comparison between the potential energy calculated based on the DFT data, and the SNAP potential energy calculated based on the bispectrum components provided by $snap.jl$ and the fitted coefficients $\mathbf{\beta}$.

@printf("Potential Energy, Fitted Potential Energy, Error (%%)\n")
for m = M1+1:M2
    path = joinpath(DATA, string(m), "DATA")
    rs = read_atomic_conf(m, N)
    E_tot = calc_tot_energy(rcut, rs, N, ε_Ga_Ga, σ_Ga_Ga, ε_N_N, σ_N_N, A_Ga_N, ρ_Ga_N)
    E_tot_fit = calc_fitted_tot_energy(path, β, ncoeff, N1, N)
    @printf("%0.2f, %0.2f, %0.2f\n", E_tot, E_tot_fit,
            abs(E_tot - E_tot_fit) / E_tot * 100.)
end

Potential Energy, Fitted Potential Energy, Error (%)
1695.02, 1668.98, 1.54
1716.77, 1746.47, 1.73
1727.48, 1756.01, 1.65
1690.29, 1646.75, 2.58
1709.51, 1654.80, 3.20
1749.57, 1695.39, 3.10
1752.94, 1767.68, 0.84
1765.52, 1728.84, 2.08
1725.46, 1628.16, 5.64
1732.92, 1688.68, 2.55
1687.87, 1712.33, 1.45
1726.56, 1611.73, 6.65
1685.35, 1731.82, 2.76

This page was generated using Literate.jl.