GROMACS Tutorial

Step Seven: Analysis

The introduction of the g_bar program in GROMACS 4.5 (now named "gmx bar" in GROMACS 5.0) has made the calculation of ΔGAB very simple. Simply collect all of the md*.xvg files that were produced by mdrun (one for each value of λ) in the working directory and run gmx bar:

gmx bar -f md*.xvg -o -oi -oh

The program will print lots of useful information to the screen, in addition to producing three output files: bar.xvg, barint.xvg, and histogram.xvg. The screen output from gmx bar should look something like:

Detailed results in kT (see help for explanation):

 lam_A  lam_B      DG   +/-     s_A   +/-     s_B   +/-   stdev   +/- 
     0      1    0.07  0.00    0.03  0.00    0.03  0.00    0.25  0.00
     1      2    0.06  0.00    0.03  0.00    0.04  0.00    0.26  0.00
     2      3    0.05  0.00    0.03  0.00    0.04  0.00    0.27  0.00
     3      4    0.03  0.00    0.03  0.00    0.04  0.00    0.28  0.00
     4      5    0.02  0.00    0.04  0.00    0.05  0.00    0.29  0.00
     5      6   -0.00  0.01    0.04  0.00    0.05  0.00    0.30  0.01
     6      7   -0.02  0.01    0.05  0.01    0.06  0.01    0.32  0.00
     7      8   -0.06  0.00    0.06  0.00    0.07  0.00    0.35  0.00
     8      9   -0.10  0.01    0.06  0.00    0.08  0.01    0.38  0.00
     9     10   -0.15  0.01    0.08  0.01    0.10  0.01    0.41  0.00
    10     11   -0.23  0.01    0.10  0.01    0.13  0.01    0.47  0.00
    11     12   -0.32  0.01    0.10  0.01    0.14  0.01    0.51  0.01
    12     13   -0.44  0.01    0.17  0.01    0.20  0.01    0.58  0.01
    13     14   -0.57  0.01    0.15  0.01    0.17  0.01    0.57  0.00
    14     15   -0.60  0.02    0.11  0.01    0.11  0.01    0.47  0.00
    15     16   -0.53  0.01    0.06  0.01    0.06  0.01    0.35  0.00
    16     17   -0.41  0.00    0.03  0.00    0.03  0.00    0.25  0.00
    17     18   -0.28  0.00    0.02  0.00    0.02  0.00    0.18  0.00
    18     19   -0.16  0.00    0.01  0.00    0.01  0.00    0.13  0.00
    19     20   -0.05  0.00    0.00  0.00    0.00  0.00    0.10  0.00

Final results in kJ/mol:

point      0 -      1,   DG  0.19 +/-  0.01
point      1 -      2,   DG  0.16 +/-  0.00
point      2 -      3,   DG  0.12 +/-  0.00
point      3 -      4,   DG  0.09 +/-  0.00
point      4 -      5,   DG  0.04 +/-  0.00
point      5 -      6,   DG -0.00 +/-  0.01
point      6 -      7,   DG -0.06 +/-  0.01
point      7 -      8,   DG -0.15 +/-  0.01
point      8 -      9,   DG -0.26 +/-  0.02
point      9 -     10,   DG -0.39 +/-  0.02
point     10 -     11,   DG -0.58 +/-  0.02
point     11 -     12,   DG -0.79 +/-  0.02
point     12 -     13,   DG -1.11 +/-  0.02
point     13 -     14,   DG -1.43 +/-  0.02
point     14 -     15,   DG -1.51 +/-  0.04
point     15 -     16,   DG -1.33 +/-  0.02
point     16 -     17,   DG -1.02 +/-  0.01
point     17 -     18,   DG -0.70 +/-  0.01
point     18 -     19,   DG -0.40 +/-  0.00
point     19 -     20,   DG -0.13 +/-  0.00

total      0 -     20,   DG -9.25 +/-  0.15

The value of ΔGAB I obtained is -9.25 ± 0.15 kJ mol-1, or -2.21 ± 0.04 kcal mol-1. Since the process I conducted for this demonstration was the decoupling of methane, the reverse process (the introduction of uncharged methane into water, thus the actual hydration energy of the process) corresponds to a ΔGAB of 2.21 ± 0.04 kcal mol-1 (assuming reversibility), in good agreement with the value obtained by Shirts et al. of 2.24 ± 0.01 kcal mol-1 (per Table II of the Supporting Information of that paper), despite the fact that I used about half the number of λ vectors for the van der Waals transformation than the original authors did, in addition to the other changes described previously.

Approximately 0.02 kJ mol-1 (roughly 0.004 kcal mol-1) of the difference can be attributed to the temperature difference alone (300 K here and in the Mobley tutorial, 298 K in the original work), so the difference in temperature does not have much of a practical impact on our result.

Technically, this is all we need to do to arrive at our answer, but gmx bar also prints a number of useful output files (all of which are optional). Their contents are worth exploring here, as they provide a great level of detail into the decoupling process and the success of sampling.


Output file 1: bar.xvg

Let's take a look at the other output files that gmx bar gave us, starting with bar.xvg. This output file plots the relative free energy differences for each interval of λ (i.e., between neighboring Hamiltonians), and should look something like this (after re-plotting as a bar graph rather than the default line representation, and adding a few grid lines for perspective):

The vertical gray lines indicate the values of λ utilized in the decoupling process conducted here. Thus, each black bar indicates the free energy difference between neighboring values of λ. In bar.xvg, we get the first look at what calc_lambda_neighbors was doing during the simulations. For example, in our simulation of init_lambda_state = 0, the free energy at λ = 0.05 (the nearest neighboring window, as specified by calc_lambda_neighbors = 1) was evaluated every nstdhdl (10) steps. These "foreign" λ calculations result in energetic overlap between the values of λ, such that we have phase space overlap and adequate sampling. Interested readers should refer to the paper by Wu and Kofke cited in the gmx bar description for a discussion of this concept as well as the interpretation of the entropy values sA and sB.

Thus, the free energy change from λ = 0 to λ = 1 is simply the sum of the free energy changes of each pair of neighboring λ simulations, which are plotted in bar.xvg.

The values of ΔG correspond to the first half of the output printed to the screen by gmx bar. The second half of the screen output contains these same values of ΔG, converted to kJ mol-1 (1 kT = 2.494 kJ mol-1 at 300 K).


Output file 2: barint.xvg

The barint.xvg file plots the cumulative ΔG as a function of λ. In barint.xvg, the point at λ = 1 corresponds to the sum of ΔG from λ vector 0 to λ vector 1, as indicated in the screen output above:

 lam_A  lam_B      DG   +/-     s_A   +/-     s_B   +/-   stdev   +/-
     0      1    0.07  0.00    0.03  0.00    0.03  0.00    0.25  0.00
     1      2    0.06  0.00    0.03  0.00    0.04  0.00    0.26  0.00

Therefore, the value of the cumulative ΔG in barint.xvg at λ = 0 is zero, and the value at λ = 0.1 is 0.0 + 0.07 = 0.07, and this is the value we find in barint.xvg. Below is the plot of barint.xvg (blue) alongside the contents of bar.xvg (black, the values of ΔG between λ values) to illustrate the cumulative ΔG. By taking the value of ΔG at λ = 20 (-3.71 kT), we can can calculate ΔG as -9.25 kJ mol-1.


Output file 3: histogram.xvg

The histogram.xvg file has a lot of information in it. If you load histogram.xvg into xmgrace, it may not make a lot of sense, and the legends will run off the screen. For ease of understanding, I will plot the two types of data separately. The first plot contains distributions of ∂H/∂λ values (in kJ mol-1) for each value of λ found in the md*.xvg files. It should look something like this:

The other component of histogram.xvg are distributions of ΔH values (where H is the Hamiltonian, not enthalpy) between neighboring (so-called "native" and "foreign") λ values. For configurations produced during trajectories generated to the "native" λ value (i.e., init_lambda_state in the .mdp file), the value of ΔH is calculated for this configuration at any requested foreign λ values. The results (after scaling to a suitable axis) are as follows:

Note that, due to the large number of data sets, the legend is cut off. The legends for each distribution provide the description of the quantities plotted. For instance, "N(ΔH(λ=0.05) | λ=0)" indicates that the distribution is for ΔH values calculated at a foreign λ value of 0.05 from configurations in the λ = 0 trajectory. For a more complete discussion of the implications of this plot, please refer to Figure 5 (and its associated discussion) in the BAR paper.

So what does all of this mean? In general, histograms from MD simulations are used to indicate some kind of overlap in energies, positions, etc. much like in umbrella sampling. Here, in order to obtain a reliable estimate of ΔG, we need adequate sampling within each window (based on ∂H/∂λ distributions) and between neighboring windows (ΔH distributions). The significant overlap in the distributions of the data shown here and the low error estimate printed by gmx bar indicate that we have achieved this goal. If your own calculations give you non-overlapping histograms, resulting in large error estimates, then you likely either need longer simulations, more λ states, or better parameters for whatever molecule it is you are trying to transform.

Back: Production MD Next: Advanced Applications

Bevan Lab Homepage

Virginia Tech Homepage

Virginia Tech Biochemistry


Site design and content copyright 2008-2015 by Justin Lemkul
Problems with the site? Send them to the Webmaster