CSP via packing density

How close can we get to the lowest energy molecular structures just by maximizing packing density? Let’s compare the structures Andy obtained from Materials Studio by minimising some force-field potential

with the structures I got from maximizing packing density: https://milotorda.net/getting-loads-of-sufficiently-dense-structures/ .

To get a clearer view, we took Andy’s set, created a neighborhood around the packing coefficient, divided the interval into 200 bins, and in each bin, selected the lowest energy structure. Then, we calculated the energies of these structures using VASP’s DFT (PBE) and computed the packing density via https://milotorda.net/exercise-in-monte-carlo-density-computation/

Next, we selected around 200 structures generated here https://milotorda.net/getting-loads-of-sufficiently-dense-structures/ and computed their energies using VASP’s DFT, just like we did for Andy’s set.

The energy difference between the lowest energy structure in Andy’s set and mine is \Delta \textbf{E}=0.03268 \ \text{eV}. However, in Andy’s structures, the van der Waals spheres of different anthracene molecules overlap. If we filter out the structures with overlaps,

then the energy difference between the lowest energy structures in both sets is \Delta \textbf{E}=0.02383 \ \text{eV}.

We then continued to further optimise the energy of all structures using VASP’s implementation of the conjugate gradient method. Here, the molecules are no longer rigid, meaning the bond lengths of the molecules in each structure change.

To be honest, I didn’t mention that we performed DFT calculations for a broader density spectrum from Andy’s CSP set. At the time of the DFT calculations, I didn’t have density estimates for all configurations, so I used the packing coefficient given by Materials Studio to guide the selection of structures. The density estimates I used are slightly different. Here is the full picture,

which brings up the question of how far from the highest density structure (in the density spectrum) do we have to look for the lowest energy structure.

Exercise in Monte-Carlo packing density computation.

I needed a way to calculate the packing densities of structures modeled using the van der Waals sphere model, generated through CSP and DFT computations. Specifically, for periodic configurations, this comes down to computing:: \rho = \frac{\text{vol}(\mathbf{O})}{\text{vol}(\mathbf{U})}. Here, \text{vol}(\mathbf{U}) represents the volume of the unit cell \mathbf{U}, and \text{vol}(\mathbf{O}) is the volume of the subset of \mathbf{U} occupied by the van der Waals spheres denoted as \mathbf{O}.

Calculating the volume of \mathbf{U} is is straightforward; it’s simply the determinant of the unit cell’s generators: \text{vol}(\mathbf{U})}=\text{det}(\mathbf{U}). However, analytically computing the volume of \mathbf{O} involves solving numerous intricate integrals of intersections of spheres with various radii, making it cumbersome.

Let’s try to do this the Monte-Carlo way, instead. The volume of \mathbf{O} can be expressed as an integral over the unit cell \mathbf{U}, i.e.

    \begin{equation*}\text{vol}(\mathbf{O})=\int_{\mathbf{O}} d\mathbf{V}\end{equation*}

Here, d\mathbf{V} is the natural volume form on \mathbf{R}^3. Since this integral is zero everywhere except in the unit cell, it can be redefined as an integral of the indicator function over the unit cell:

    \begin{equation*}\text{vol}(\mathbf{O})=\int \bm{1}_{\mathbf{O}} d\mathbf{U}\end{equation*}

By changing coordinates to an integral over the unit cube \textbf{C}, we get:

    \begin{equation*}\text{vol}(\mathbf{O})=\int \bm{1}_{\mathbf{U}^{-1}\mathbf{O}} \text{det}(\mathbf{U}) d\mathbf{C}\end{equation*}

Plugging this into the density equation yields:

    \begin{equation*}$\rho=\int \bm{1}_{\mathbf{U}^{-1}\mathbf{O}} d\mathbf{C}\end{equation*}

Considering \mathbf{C} as a uniform random vector on the unit cube, this integral represents the expected value of the indicator function with respect to the 3-variate uniform distribution. This, in turn, equals the probability of X \sim \mathbf{C} lying in the collection of the van der Waals spheres \mathbf{U}^{-1}\mathbf{O}, scaled by the inverse of the unit cell:

    \begin{equation*}$\rho=\textbf{E}\left[ \bm{1}_{\mathbf{U}^{-1}\mathbf{O}} \right]=\text{P}\left( X \in \mathbf{U}^{-1}\mathbf{O} \right)\end{equation*}

This means that the crystal’s density can be estimated by sampling X_1, \ldots X_N \sim \mathbf{C}, counting how many fall inside \mathbf{U}^{-1}\mathbf{O}, and dividing by N:

    \begin{equation*}\hat{\rho} = \frac{\# \left \{ X_i \in \mathbf{U}^{-1}\mathbf{O} \right \}}{N}\end{equation*}

Now, say I want to determine the density of this P21C anthracene structure

Equivalently, I can sample uniformly from the parallelepiped defined by the unit cell, e.g. 100 000 realisations,

and count how many fall inside at least one of the van der Waals spheres

which is 73056. Therefore, the density estimate is \hat{\rho}=0.73056.

But how accurate is this estimate? Or even better is to ask, how large does N need to be so that | \hat{\rho} - \rho | < c, where c is some constant?

To answer this, consider each X_i as a Bernoulli trial. Then \mathbm{X}=\sum_{i=1}^N X_i is binomially distributed random variable with the expected value N\rho and variance N\rho\left(1-\rho \right). According to the central limit theorem, the random variable \mathbm{Z}=\frac{\sqrt{N}\left( \hat{\rho} - \rho \right)}{\sqrt{\rho\left(1-\rho \right)}} converges in distribution to a normally distributed random variable with mean of 0 and a variance of 1. Therefore, I can request \mathbm{Z} being in the interval (-z;z), and I want the size of this interval to be less then some constant with probability p:

    \begin{equation*}P \left( | \hat{\rho} - \rho | < 2 z \sqrt{\frac{\hat{\rho}\left(1-\hat{\rho} \right)}{N}} \right)=p \end{equation*}

where z is the \frac{1+p}{2} quantile of the standard normal distribution.

For the P21C anthracene structure mentioned earlier, to achieve | \hat{\rho} - \rho | < 0.001 with probability p=0.999, I need N=9,000,000. The density estimate in this case is then:

    \begin{equation*}\hat{\rho} \approx 0.731501\end{equation*}

As a point of comparison, Mercury reports the packing coefficient of this structure as 0.733255.

Getting Loads of Sufficiently Dense Structures

After running the packing algorithm, the next question is how to generate an arbitrary number of sufficiently dense structures.

One way to do this is to keep all the feasible and repaired configurations sampled at each iteration of the optimization process. For instance, after using the method described here https://milotorda.net/etrpa-repairing-unfeasible-solutions/ , I obtained 573621 packings

and out of these I select those with densities higher than 0.7 which leaves me with 1296 structures

Still quite a lot. Lets reduce it even further by clustering. Most likely many of the structures are quite similar to each other and my goal is to identify the most dissimilar ones.

To do this, I did a single-linkage hierarchical clustering on the 1296 configurations using the standard Euclidean distance. Given that the underlying configuration space is a 9-dimensional unit flat torus, the metric coincides with the metric of that of \mathbb{R}^9, thus making sure that the single-linkage hierarchical clustering respects the geometry of my configuration space.

Suppose I want to have a selection of 100 most distinct structures. To do this, I cut the dendrogram tree at a level where it has 100 leaves (in terms of Topological Data Analysis I create a ÄŒech complex with 100 connected components)

and within each cluster the configuration with the highest packing density, leaving me with 100 structures

Let’s have a look at the packing configurations of the two most distinct clusters:

Density 0.7267

Density 0.7053

These structures seem quite similar, possibly due to the limited variability allowed within the P1 space group.

A possible problem with these configuration might be, that in most of them, are actually in contact with each other because of the repairing map, and when any two van der Waals spheres of two molecules touch, the energy should go to infinity. So, let’s try another approach.

Since the algorithm performs a Riemannian gradient ascent on a manifold of parametric probability measures, the time evolution of the distribution parameters can be thought of as a trajectory of an autonomous dynamical system. After each iteration, I save the system’s state, and when the algorithm finishes, I can return to that state and generate as many samples as necessary.

For example, after completing the run described in this post ETRPA – repairing unfeasible solutions , I take the state where the maximum packing was sampled

and generate 100 packings with density higher than 0.7 from this distribution. However, this time, I skip the repairing step and only check for overlaps. It takes a bit longer to get 100 packings with density above 0.7 without the repairing process, but the molecules do not come into contact with each other in the resulting configurations.

Let’s take a look at the distribution of these 100 configurations in my configuration space

,the single-linkage hierarchical clustering dendrogram

and the two most distinct structures in this set

Density: 0.7299

Density: 0.7079

Again, it seems they don’t look that much different. Perhaps the best would be to merge the 100 structures selected from the entire optimisation run with the 100 structures generated without repair from the distribution where the maximum packing was sampled

and then use the single-linkage hierarchical clustering to create 100 clusters

to reduce the number of structures to 100 by taking the configuration with highes density in each cluster

In this set of 100 packings, 86 are from the set of selected structures in the optimisation run, and the ramaining 14 are from the 100 structures sampled without repair.

Here are examples from the two most distinct clusters, one obtained from the set without repair and one from the optimisation run set:

Density: 0.7221

Density: 0.7102

They still look like the same configuration, just realised with alternative unit cell parameters. I’m going to blame the lack of variability in the P1 plane group for this.

Repairing Unfeasible Solutions in ETRPA.

The ETRPA is designed to optimize a multi-objective function by:

1. Maximizing packing density.
2. Minimizing overlap between shapes in a configuration.

This is achieved through the use of a penalty function, which generally works quite effectively. However, there are some downsides to this approach. Many generated samples are flagged as unfeasible, leading to inefficiency. Additionally, some potentially good configurations with minimal overlaps are discarded. A more effective approach would involve mapping unfeasible solutions to feasible ones, essentially repairing the overlaps.

I’ve designed a relatively straightforward method to accomplish this almost analytically. It boils down to finding roots of a second-order polynomial, which is essentially a high school algebra problem. Admittedly, you’ll need to solve quite a few of these second-order polynomials, but they are relatively easy to handle.

So, with this approach, you can now map a configuration with overlap/s

to one without overlaps

Now, a couple of new issues have cropped up. First, there are some seriously bad solutions that are being sampled. For example, consider this one:

which is then mapped to this configuration

This is not helpful at all. Second, the parameters of this configuration are way above the optimisation boundaries that were used to create all configurations. As a result, it falls outside the configuration space where my search is taking place.

To address this issue, after repairing the configuration, I conduct a check to determine if the repaired configuration falls within my optimization boundaries. Any configurations that fall outside of these bounds are labeled as unfeasible solutions.

It might seems this puts me back in the same situation of multi-objective optimization, but that’s not the case. In my scenario, the Euclidean gradient at time t takes this form:

\nabla_{\bm{\theta}} J(\bm{\theta}^t) \ = \ \bm{\mu}^t_{\textbf{F}_{1- \frac{1}{q^{*}}}}-\bm{\mu}^t.

Here, \bm{\mu}^t represents the mean of all samples in the current batch, and I only consider the configurations without overlap or the repaired ones that lie within the optimization boundaries for \bm{\mu}^t_{\textbf{F}_{1- \frac{1}{q^{*}}}}. This effectively means I’ve eliminated the need for the penalty function and am optimizing only based on the objective function given by the packing density.

This is a promising, but it does introduce another problem. When I repair some of the unfeasible configurations, I end up making slight alterations to the underlying probability distribution, shifting it slightly from \bm{\theta}^t

to \bm{\theta}_{\text{rep}}^t

The issue here is that if I were to compute the Natural gradient as I used to, I’d obtain \widetilde{\nabla} J(\bm{\theta}_{\text{rep}}^t) =\mathcal{I}_{\bm{\theta}_{\text{rep} }^t}^{-1} \left({\bm{\mu}_{\text{rep}}^t}_{\textbf{F}_{1- \frac{1}{q^{*}}}}-\bm{\mu_\text{rep}}^t\right) in the tangent space \mathcal{T}_{\bm{\theta}_\text{rep}^t} of the statistical manifold, where everything is unfolding, but what I actually need is \widetilde{\nabla} J(\bm{\theta}^t) in the tangent space at the point \bm{\theta}^t.

Fortunately, the statistical manifold of the exponential family is a flat Riemannian manifold. So, first, I move from \bm{\theta}^t to \bm{\theta}_{\text{rep} }^t using this vector \mathcal{I}_{\bm{\theta}^t}^{-1} \left(\bm{\mu_\text{rep}}^t-\bm{\mu}^t\right) (where \bm{\mu}^t corresponds to the dual Legendre parametrization of \bm{\theta}^t, and \bm{\mu}_\text{rep}^t to \bm{\theta}_\text{rep}^t), and then I compute the natural gradient at the point \bm{\theta}_\text{rep}^t.

In addition, to increase the competition within my packing population, I introduce a sort of tournament by retaining the N-best configurations I’ve found so far and incorporating them with the repaired configurations from the current generation. This leads to yet another slightly modified distribution, \bm{\theta}_\text{rep + N-best}^t

and I have to perform the same Riemannian gymnastics once more. In the end, the approximation to the natural gradient at the point \bm{\theta}^t is as follows

\widetilde{\nabla} J(\bm{\theta}^t) = \mathcal{I}_{\bm{\theta}_{\text{rep + N-best} }^t}^{-1} \left({\bm{\mu}_{\text{rep + N-best}}^t}_{\textbf{F}_{1- \frac{1}{q^{*}}}}-\bm{\mu}_{\text{rep + N-best} }^t\right) + \mathcal{I}_{\bm{\theta}_{\text{rep} }^t}^{-1} \left(\bm{\mu}_{\text{rep + N-best}}^t-\bm{\mu_\text{rep}}^t\right) + \mathcal{I}_{\bm{\theta}^t}^{-1}\left(\bm{\mu_\text{rep}}^t-\bm{\mu}^t\right)

So far, experiments show that these modifications to the ETRPA are far more effective than the original ETRPA. Here’s a visualization from one of the test optimization runs:

In this particular run, the best packing I achieved attains the density of 0.7845. Previously, reaching such a level of density required multiple rounds of the refinement process, each consisting of 8000 iterations. Now, a single run with just 2000 iterations is sufficient. It’s worth noting that even after only 200 iterations, the algorithm already reaches areas with densities exceeding 0.7.

It’s important to mention that currently, I’ve only implemented the repair process for unfeasible configurations for the P1 space group. However, similar repairs can be done for configurations in any space group. Furthermore, the algorithm’s convergence speed can be further enhanced by fine-tuning its hyperparameters. Perhaps it’s time to make use of the BARKLA cluster for the hyperparameter tuning.

Anthracene P1 packing

Left: Anthracene, modeled as a collection of spheres, with each sphere’s radius given by the respective van der Waals radius of each atom in the molecule. Right: P1 packing of this model, with density of approximately 0.730855.

Here is a visualization showing the evolution of the Extended Multivariate von Mises distribution during the search run:

Minimising a Lennard-Jones hexagonal molecular system

Each frame of this simulation showcases the outcomes of Sequential Quadratic Programming (SQP) minimization applied to a Lennard-Jones system. The system’s energy is defined by the following expression:

    \begin{equation*} E_{LJ}=\sum_{\{ A, B \} } \frac{1}{2}\sum_{a \in A} \sum_{b \in B} 4\epsilon\left[\left(\frac{\sigma}{r_{ab}}\right)^{12} - \left(\frac{\sigma}{r_{ab}}\right)^{6}\right] \end{equation*}

where the parameters are set as \sigma = 1 and \epsilon = \frac{1}{4}. Here, \{ A, B \} represents the ordered pair of molecules A and B. The simulation involves a system of 500 molecules, each comprising 6 atoms. At the start, a uniform random configuration was generated, with each molecule represented by two center of mass coordinates and one rotational coordinate.

As the simulation progresses, the boundaries of the box are incrementally reduced following each SQP minimization. This effectively simulates an increase in pressure on the system. Notice the hexagonal tiling patches as the simulation advances. These are depicted as blue regular hexagons, each scaled by a factor of 1.93185 and rotated by an angle of 0.17862 relative to the molecules.

It’s important to note that the empty spaces observed in the simulation are a result of the rate at which pressure is increased. A sufficiently slow increase in pressure would eventually lead to a 6^3 regular tiling.

Densest P1 packing of the tetrahedron

I applied the Entropic Trust Region Packing Algorithm to search for the densest lattice packing of tetrahedra. It turned out that the GPU implementation of the overlap constraint evaluation for space group packings of polyhedra was more complicated than I expected. The good news was that the optimization algorithm worked as thought.

The densest lattice packing of the regular tetrahedron is \frac{18}{49}\approx 0.36734 (https://www.ams.org/journals/bull/1970-76-01/S0002-9904-1970-12400-4/). This is the configuration the algorithm found. See the visualization of the output packing with density 0.36734 below.

Densest P1, P-1 and P21/c packings of spheres

I started to work on space group packings. I applied the Entropic Trust Region Packing Algorithm (https://arxiv.org/abs/2202.11959) to search for the densest packings of the sphere in space groups P1 and P-1.

In the P1 setting, the densest packing density coincides with the general optimal packing density of spheres (https://www.jstor.org/stable/20159940). If the unit cell parameters are set to a=b=c=2 and \alpha=\beta=\gamma=\frac{\pi}{3} with the unit cell given by

    \begin{equation*} U=\left[\begin{array}{ccc} a\sin(\beta)\sqrt{1-\left(\frac{\cos(\alpha)cos(\beta) - cos(\gamma)}{\sin(\alpha)\sin(\beta)}\right)^2} & 0  & 0  \\ -a\frac{\cos(\alpha)cos(\beta) - cos(\gamma)}{\sin(\alpha)} & b\sin(\alpha) & 0  \\ a\cos(\beta) & b\cos(\alpha)  & c \end{array}\right] \end{equation*}

then the determinant of U equals to \det(U)=4\sqrt{2} and combined with the volume of the unit sphere \text{vol}(S)=\frac{4}{3}\pi gives the optimal packing density of spheres \rho(S_{P1})=\frac{\text{vol}(S)}{\det(U)}=\frac{\pi}{\sqrt{18}}\approx 0.74048. See a visualization of this configuration below.

There are a few interesting things about this. First, for some reason, I could find this configuration reported anywhere. Second, it is conjectured that the densest packings of centrally symmetric polyhedra are P1 packings (https://journals.aps.org/pre/abstract/10.1103/PhysRevE.80.041104). It might be that this conjecture can be extended to a larger class of centrally symmetric convex sets, similar to the 2D setting where the densest packing of centrally symmetric convex sets is a lattice packing (https://link.springer.com/article/10.1007/BF02392671).

In the P-1 setting the situation is similar. If the unit cell parameters are set to a=c=2, b=2\sqrt{2} and \alpha=\beta=\gamma=\frac{\pi}{2}, then the determinant of the unit cell equals to \det(U)=8\sqrt{2} and again we get the optimal packing density of spheres \rho(S_{P-1})=\frac{2\text{vol}(S)}{\det(U)}=\frac{\pi}{\sqrt{18}}. See the visualization of this configuration below.

Here again, I couldn’t find this configuration of spheres reported anywhere. Another interesting thing is that the situation resembles the 2D setting, where the densest P1 and P2 configurations of a disc have the same packing density (https://journals.aps.org/pre/abstract/10.1103/PhysRevE.106.054603). The P-1 space group can be considered as an equivalent of the P2 plane group since the point group symmetries are given via inversion through the centre of the unit cell. It might be that centrally symmetric polyhedra attain their optimal packings also in the P-1 space group. Moreover, it is conjectured that for convex 2D sets, the P2 configurations are optimal (https://link.springer.com/article/10.1007/s00454-016-9792-4). I am wondering for which other convex 3D sets is the P-1 packing configuration optimal.

With the P21/c space group, the symmetries of the densest sphere packing continue. If the unit cell parameters are set to a=2\sqrt{2}, b=4, c=2 and \alpha=\beta=\gamma=\frac{\pi}{2}, then the determinant of the unit cell equals to \det(U)=16\sqrt{2} and again we get the optimal packing density of spheres \rho(S_{P21/c})=\frac{4\text{vol}(S)}{\det(U)}=\frac{\pi}{\sqrt{18}}. See the visualization of this configuration below.

What is interesting is that the P21/c and P-1 space groups are the most frequent space groups found in the Cambridge Structural Database amounting to 59.3% of structures (https://www.ccdc.cam.ac.uk/support-and-resources/ccdcresources/f0dbc77b8b6041b8be63ff490bad33d6.pdf). I wonder if there is a relationship between densest space group packing of a sphere and the frequencies space groups occurring in CSD.