Sphere packings continued – space groups P2_1 and C2/c

In this post, we’re diving back into the world of sphere packings, focusing on the space groups P2_1 and C2/c. Feel free to check out an earlier discussion on Densest P1, P-1 and P21/c packings of spheres. Both of these groups reach the packing density of \frac{\pi}{\sqrt{18}}.
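For reference, this is the optimal density among all sphere packings, attained for instance by the face-centred cubic and hexagonal close-packed arrangements:

    \begin{equation*}\frac{\pi}{\sqrt{18}}=\frac{\pi}{3\sqrt{2}}\approx 0.74048\end{equation*}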

The densest P2_1 packing is almost identical to that of the P\overline{1} group. Even though the two groups belong to different crystal systems—triclinic for P\overline{1} and monoclinic for P2_1—their densest packings share the same unit cell parameters: a = c = 2, b = 2\sqrt{2}, and \beta = \frac{\pi}{2}. Here’s a snapshot of what this packing looks like:

Next up is the C2/c space group, which also falls under the monoclinic category. This group is a bit more complex, with eight symmetry operations. Its densest packing has unit cell parameters a = b = 4, c = 2\sqrt{3}, and \beta = \pi - \arcsin \left( \frac{\sqrt{2}}{\sqrt{3}} \right). Check out the illustration below of the densest packing configuration:

So far, we’ve seen that the optimal packing density for the space groups P1, P\overline{1}, P2_1/c, P2_1, and C2/c matches the general optimal sphere packing density. This isn’t too surprising, since these groups are all related to the hexagonal close-packed (HCP) structure via group–subgroup relations. Interestingly, these space groups are among the ten most frequently occurring in the Cambridge Structural Database, making up 74% of its entries.

Yayoi Kusama’s Chandelier of Grief: Symmetry in Art and Science

This photo, taken in March 2024, captures my fiancée Lina and me at Yayoi Kusama’s “Infinity Mirror Rooms” exhibit at Tate Modern in London. The installation, called “Chandelier of Grief,” uses mirrors and light to explore ideas about symmetry.

The centrepiece of the artwork is a chandelier surrounded by six mirrors arranged in a regular hexagonal prism. This creates an interesting visual effect—an infinite lattice of chandeliers extending in all directions, with each reflected chandelier appearing at the centre of symmetry within this lattice.

What we’re experiencing is a visual representation of a hexagonal lattice, one of the five two-dimensional Bravais lattices. Mathematically, it represents a discrete group of isometries in two-dimensional Euclidean space. The point group of this hexagonal lattice is isomorphic to the dihedral group D_6, highlighting its sixfold rotational symmetry and mirror reflection properties.

Physically, the hexagonal lattice emerges as the ground state configuration for systems of particles interacting via certain potentials, such as the Lennard-Jones potential, in the thermodynamic limit. Moreover, the hexagonal lattice is realised in materials like graphene, where carbon atoms arrange themselves in this highly symmetric pattern, leading to its unique properties.

Kusama’s installation is not only aesthetically pleasing but also serves as a visual representation of abstract concepts in mathematics and physics. The infinite reflections illustrate how artistic expression can intersect with scientific principles, making complex ideas accessible and engaging to a wider audience.

CSP via packing density

How close can we get to the lowest energy molecular structures just by maximizing packing density? Let’s compare the structures Andy obtained from Materials Studio by minimising some force-field potential

with the structures I got from maximizing packing density: https://milotorda.net/getting-loads-of-sufficiently-dense-structures/ .

To get a clearer view, we took Andy’s set, formed an interval around the packing coefficient, divided it into 200 bins, and in each bin selected the lowest-energy structure. Then we calculated the energies of these structures using VASP’s DFT (PBE) and computed the packing densities via https://milotorda.net/exercise-in-monte-carlo-density-computation/ .
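The per-bin selection looks roughly like this (a minimal numpy sketch rather than the actual pipeline; `pack_coeffs` and `energies` are hypothetical arrays holding the packing coefficients and force-field energies of Andy’s structures):

```python
import numpy as np

def lowest_energy_per_bin(pack_coeffs, energies, n_bins=200):
    """Split the packing-coefficient range into n_bins equal intervals
    and keep the index of the lowest-energy structure in each one."""
    edges = np.linspace(pack_coeffs.min(), pack_coeffs.max(), n_bins + 1)
    bin_ids = np.clip(np.digitize(pack_coeffs, edges) - 1, 0, n_bins - 1)
    selected = []
    for b in range(n_bins):
        members = np.flatnonzero(bin_ids == b)
        if members.size:                     # some bins may be empty
            selected.append(members[np.argmin(energies[members])])
    return np.array(selected)
```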

Next, we selected around 200 structures generated here https://milotorda.net/getting-loads-of-sufficiently-dense-structures/ and computed their energies using VASP’s DFT, just like we did for Andy’s set.

The energy difference between the lowest energy structure in Andy’s set and mine is \Delta \textbf{E}=0.03268 \ \text{eV}. However, in Andy’s structures, the van der Waals spheres of different anthracene molecules overlap. If we filter out the structures with overlaps,

then the energy difference between the lowest energy structures in both sets is \Delta \textbf{E}=0.02383 \ \text{eV}.

We then further optimised the energy of all structures using VASP’s implementation of the conjugate gradient method. Here, the molecules are no longer rigid, meaning the bond lengths within each molecule can change.

To be honest, I didn’t mention that we also performed DFT calculations for a broader density spectrum of Andy’s CSP set. At the time of the DFT calculations, I didn’t have density estimates for all configurations, so I used the packing coefficients reported by Materials Studio to guide the selection of structures; those estimates differ slightly from mine. Here is the full picture,

which brings up the question of how far from the highest density structure (in the density spectrum) do we have to look for the lowest energy structure.

Exercise in Monte-Carlo packing density computation

I needed a way to calculate the packing densities of structures modeled using the van der Waals sphere model, generated through CSP and DFT computations. Specifically, for periodic configurations, this comes down to computing \rho = \frac{\text{vol}(\mathbf{O})}{\text{vol}(\mathbf{U})}. Here, \text{vol}(\mathbf{U}) represents the volume of the unit cell \mathbf{U}, and \text{vol}(\mathbf{O}) is the volume of the subset of \mathbf{U} occupied by the van der Waals spheres, denoted \mathbf{O}.

Calculating the volume of \mathbf{U} is straightforward; it’s simply the determinant of the unit cell’s generators: \text{vol}(\mathbf{U})=\text{det}(\mathbf{U}). However, analytically computing the volume of \mathbf{O} involves solving numerous intricate integrals over intersections of spheres with various radii, making it cumbersome.

Let’s try to do this the Monte-Carlo way, instead. The volume of \mathbf{O} can be expressed as an integral over the unit cell \mathbf{U}, i.e.

    \begin{equation*}\text{vol}(\mathbf{O})=\int_{\mathbf{O}} d\mathbf{V}\end{equation*}

Here, d\mathbf{V} is the natural volume form on \mathbb{R}^3. Since \mathbf{O} lies inside the unit cell, the integrand vanishes outside \mathbf{U}, so the integral can be rewritten as an integral of the indicator function over the unit cell:

    \begin{equation*}\text{vol}(\mathbf{O})=\int \bm{1}_{\mathbf{O}} d\mathbf{U}\end{equation*}

By changing coordinates to an integral over the unit cube \textbf{C}, we get:

    \begin{equation*}\text{vol}(\mathbf{O})=\int \bm{1}_{\mathbf{U}^{-1}\mathbf{O}} \text{det}(\mathbf{U}) d\mathbf{C}\end{equation*}

Plugging this into the density equation yields:

    \begin{equation*}\rho=\int \bm{1}_{\mathbf{U}^{-1}\mathbf{O}} d\mathbf{C}\end{equation*}

Considering \mathbf{C} as a uniform random vector on the unit cube, this integral is the expected value of the indicator function with respect to the 3-variate uniform distribution. This, in turn, equals the probability of X \sim \mathbf{C} lying in \mathbf{U}^{-1}\mathbf{O}, the collection of van der Waals spheres scaled by the inverse of the unit cell:

    \begin{equation*}\rho=\textbf{E}\left[ \bm{1}_{\mathbf{U}^{-1}\mathbf{O}} \right]=\text{P}\left( X \in \mathbf{U}^{-1}\mathbf{O} \right)\end{equation*}

This means that the crystal’s density can be estimated by sampling X_1, \ldots, X_N \sim \mathbf{C}, counting how many fall inside \mathbf{U}^{-1}\mathbf{O}, and dividing by N:

    \begin{equation*}\hat{\rho} = \frac{\# \left \{ X_i \in \mathbf{U}^{-1}\mathbf{O} \right \}}{N}\end{equation*}
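A minimal numpy sketch of this estimator (the array layout is my own assumption: `cell` holds the unit-cell generators as rows, `centres` the sphere centres in fractional coordinates, `radii` the van der Waals radii; spheres are assumed not to reach beyond the immediately neighbouring cells):

```python
import numpy as np
from itertools import product

def estimate_density(cell, centres, radii, n_samples=100_000, seed=0):
    """Monte Carlo estimate of the packing density rho: the fraction of
    points sampled uniformly in the unit cell that land inside a sphere."""
    rng = np.random.default_rng(seed)
    pts = rng.random((n_samples, 3)) @ cell        # uniform samples on the cell
    inside = np.zeros(n_samples, dtype=bool)
    for shift in product((-1, 0, 1), repeat=3):    # 27 periodic images of each sphere
        images = (np.asarray(centres) + shift) @ cell
        for c, r in zip(images, radii):
            inside |= ((pts - c) ** 2).sum(axis=1) <= r * r
    return inside.mean()
```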

Now, say I want to determine the density of this P2_1/c anthracene structure

Equivalently, I can sample uniformly from the parallelepiped defined by the unit cell, e.g. 100 000 realisations,

and count how many fall inside at least one of the van der Waals spheres

which is 73056. Therefore, the density estimate is \hat{\rho}=0.73056.

But how accurate is this estimate? Better yet: how large does N need to be so that | \hat{\rho} - \rho | < c, where c is some constant?

To answer this, consider each sample X_i as a Bernoulli trial. Then \mathbf{X}=\sum_{i=1}^N \bm{1}_{\mathbf{U}^{-1}\mathbf{O}}(X_i) is a binomially distributed random variable with expected value N\rho and variance N\rho\left(1-\rho \right). According to the central limit theorem, the random variable \mathbf{Z}=\frac{\sqrt{N}\left( \hat{\rho} - \rho \right)}{\sqrt{\rho\left(1-\rho \right)}} converges in distribution to a normally distributed random variable with mean 0 and variance 1. Therefore, I can require \mathbf{Z} to lie in the interval (-z, z) with probability p; demanding that even the full width of the resulting confidence interval stay below some constant gives the conservative criterion:

    \begin{equation*}P \left( | \hat{\rho} - \rho | < 2 z \sqrt{\frac{\hat{\rho}\left(1-\hat{\rho} \right)}{N}} \right) \geq p \end{equation*}

where z is the \frac{1+p}{2} quantile of the standard normal distribution and \rho\left(1-\rho\right) has been replaced by its plug-in estimate \hat{\rho}\left(1-\hat{\rho}\right).
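The corresponding sample-size calculation is then a one-liner (a sketch using scipy’s normal quantile together with a pilot estimate of \hat{\rho}, applying the conservative full-width criterion above):

```python
from math import ceil
from scipy.stats import norm

def required_samples(rho_hat, c=0.001, p=0.999):
    """Smallest N keeping the full width 2*z*sqrt(rho(1-rho)/N) below c."""
    z = norm.ppf((1 + p) / 2)      # the (1+p)/2 standard normal quantile
    return ceil((2 * z) ** 2 * rho_hat * (1 - rho_hat) / c ** 2)

print(required_samples(0.73056))   # about 8.5 million, rounded up to 9,000,000 here
```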

For the P2_1/c anthracene structure mentioned earlier, achieving | \hat{\rho} - \rho | < 0.001 with probability p=0.999 requires N=9,000,000. The density estimate in this case is then:

    \begin{equation*}\hat{\rho} \approx 0.731501\end{equation*}

As a point of comparison, CCDC’s Mercury reports the packing coefficient of this structure as 0.733255.

Getting Loads of Sufficiently Dense Structures

After running the packing algorithm, the next question is how to generate an arbitrary number of sufficiently dense structures.

One way to do this is to keep all the feasible and repaired configurations sampled at each iteration of the optimization process. For instance, after using the method described here https://milotorda.net/etrpa-repairing-unfeasible-solutions/ , I obtained 573621 packings

and out of these I selected those with densities higher than 0.7, which leaves me with 1296 structures

Still quite a lot. Let’s reduce it further by clustering. Most likely, many of the structures are quite similar to each other, and my goal is to identify the most dissimilar ones.

To do this, I performed single-linkage hierarchical clustering on the 1296 configurations using the standard Euclidean distance. Since the underlying configuration space is a 9-dimensional unit flat torus, its metric locally coincides with that of \mathbb{R}^9, which ensures that the single-linkage hierarchical clustering respects the geometry of my configuration space.

Suppose I want a selection of the 100 most distinct structures. To do this, I cut the dendrogram at a level where it has 100 leaves (in terms of Topological Data Analysis, I create a Čech complex with 100 connected components)

and within each cluster select the configuration with the highest packing density, leaving me with 100 structures.
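In code, this selection might look as follows (a sketch with scipy; `configs` is a hypothetical (1296, 9) array of torus coordinates and `densities` the matching packing densities):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def most_distinct(configs, densities, n_clusters=100):
    """Single-linkage clustering cut at n_clusters leaves; keep the
    densest configuration of each cluster."""
    Z = linkage(configs, method="single", metric="euclidean")
    labels = fcluster(Z, t=n_clusters, criterion="maxclust")
    keep = []
    for k in range(1, n_clusters + 1):
        members = np.flatnonzero(labels == k)
        if members.size:
            keep.append(members[np.argmax(densities[members])])
    return np.array(keep)
```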

Let’s have a look at the packing configurations of the two most distinct clusters:

Density: 0.7267

Density: 0.7053

These structures seem quite similar, possibly due to the limited variability allowed within the P1 space group.

A possible problem with these configurations is that in most of them, because of the repairing map, the molecules are actually in contact with each other, and when any two van der Waals spheres of two molecules touch, the energy should go to infinity. So, let’s try another approach.

Since the algorithm performs a Riemannian gradient ascent on a manifold of parametric probability measures, the time evolution of the distribution parameters can be thought of as a trajectory of an autonomous dynamical system. After each iteration, I save the system’s state, and when the algorithm finishes, I can return to that state and generate as many samples as necessary.

For example, after completing the run described in the post ETRPA – repairing unfeasible solutions, I take the state where the maximum packing was sampled

and generate 100 packings with density higher than 0.7 from this distribution. However, this time, I skip the repairing step and only check for overlaps. It takes a bit longer to get 100 packings with density above 0.7 without the repairing process, but the molecules do not come into contact with each other in the resulting configurations.
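Schematically, the no-repair loop is plain rejection sampling (`sample_configuration`, `has_overlaps` and `packing_density` are hypothetical stand-ins for drawing from the saved distribution state and for the geometry checks):

```python
def sample_dense_packings(state, n_wanted=100, min_density=0.7):
    """Rejection-sample packings from a saved distribution state,
    skipping the repair step entirely."""
    accepted = []
    while len(accepted) < n_wanted:
        config = sample_configuration(state)   # hypothetical: draw from the saved state
        if has_overlaps(config):               # hypothetical: no repair, just reject
            continue
        if packing_density(config) > min_density:
            accepted.append(config)
    return accepted
```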

Let’s take a look at the distribution of these 100 configurations in my configuration space

the single-linkage hierarchical clustering dendrogram,

and the two most distinct structures in this set

Density: 0.7299

Density: 0.7079

Again, they don’t look all that different. Perhaps the best option would be to merge the 100 structures selected from the entire optimisation run with the 100 structures generated, without repair, from the distribution where the maximum packing was sampled

and then use the single-linkage hierarchical clustering to create 100 clusters

to reduce the number of structures to 100 by taking the configuration with the highest density in each cluster

In this set of 100 packings, 86 are from the set of structures selected over the optimisation run, and the remaining 14 are from the 100 structures sampled without repair.

Here are examples from the two most distinct clusters, one obtained from the set without repair and one from the optimisation run set:

Density: 0.7221

Density: 0.7102

They still look like the same configuration, just realised with alternative unit cell parameters. I’m going to blame the lack of variability in the P1 space group for this.

Repairing Unfeasible Solutions in ETRPA

The ETRPA is designed to optimize a multi-objective function by:

1. Maximizing packing density.
2. Minimizing overlap between shapes in a configuration.

This is achieved through the use of a penalty function, which generally works quite effectively. However, there are some downsides to this approach. Many generated samples are flagged as unfeasible, leading to inefficiency. Additionally, some potentially good configurations with minimal overlaps are discarded. A more effective approach would involve mapping unfeasible solutions to feasible ones, essentially repairing the overlaps.

I’ve designed a relatively straightforward method to accomplish this almost analytically. It boils down to finding roots of second-order polynomials, which is essentially a high school algebra problem. Admittedly, you’ll need to solve quite a few of them, but each one is easy to handle.
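As an illustration of the kind of quadratic involved (a sketch, not the exact repair map used here): pushing a sphere at c_1 along a fixed direction d until it becomes tangent to a sphere at c_2 means solving |c_1 + t d - c_2|^2 = (r_1 + r_2)^2 for the shift t:

```python
import numpy as np

def tangency_shift(c1, c2, r1, r2, d):
    """Smallest non-negative t such that the sphere at c1, moved by t*d,
    is exactly tangent to the sphere at c2 (a quadratic in t)."""
    w = np.asarray(c1, float) - np.asarray(c2, float)
    d = np.asarray(d, float)
    a, b, c = d @ d, 2.0 * (w @ d), w @ w - (r1 + r2) ** 2
    disc = b * b - 4.0 * a * c
    if disc < 0:                               # tangency is never reached along d
        return None
    roots = ((-b - disc ** 0.5) / (2 * a), (-b + disc ** 0.5) / (2 * a))
    admissible = [t for t in roots if t >= 0]
    return min(admissible) if admissible else None
```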

So, with this approach, you can now map a configuration with one or more overlaps

to one without overlaps

Now, a couple of new issues have cropped up. First, there are some seriously bad solutions that are being sampled. For example, consider this one:

which is then mapped to this configuration

This is not helpful at all. Second, the parameters of this configuration are way above the optimisation boundaries that were used to create all configurations. As a result, it falls outside the configuration space where my search is taking place.

To address this issue, after repairing the configuration, I conduct a check to determine if the repaired configuration falls within my optimization boundaries. Any configurations that fall outside of these bounds are labeled as unfeasible solutions.

It might seem this puts me back in the same multi-objective optimization situation, but that’s not the case. In my scenario, the Euclidean gradient at time t takes this form:

    \begin{equation*}\nabla_{\bm{\theta}} J(\bm{\theta}^t) \ = \ \bm{\mu}^t_{\textbf{F}_{1- \frac{1}{q^{*}}}}-\bm{\mu}^t.\end{equation*}

Here, \bm{\mu}^t represents the mean of all samples in the current batch, and for \bm{\mu}^t_{\textbf{F}_{1- \frac{1}{q^{*}}}} I only consider the configurations without overlap, or the repaired ones that lie within the optimization boundaries. This effectively means I’ve eliminated the need for the penalty function and am optimizing only on the objective function given by the packing density.

This is promising, but it does introduce another problem. When I repair some of the unfeasible configurations, I end up slightly altering the underlying probability distribution, shifting it from \bm{\theta}^t

to \bm{\theta}_{\text{rep}}^t

The issue here is that if I were to compute the natural gradient as I used to, I’d obtain \widetilde{\nabla} J(\bm{\theta}_{\text{rep}}^t) =\mathcal{I}_{\bm{\theta}_{\text{rep} }^t}^{-1} \left({\bm{\mu}_{\text{rep}}^t}_{\textbf{F}_{1- \frac{1}{q^{*}}}}-\bm{\mu_\text{rep}}^t\right) in the tangent space \mathcal{T}_{\bm{\theta}_\text{rep}^t} of the statistical manifold, where everything is unfolding, whereas what I actually need is \widetilde{\nabla} J(\bm{\theta}^t) in the tangent space at the point \bm{\theta}^t.

Fortunately, the statistical manifold of the exponential family is dually flat. So, first, I move from \bm{\theta}^t to \bm{\theta}_{\text{rep} }^t using the vector \mathcal{I}_{\bm{\theta}^t}^{-1} \left(\bm{\mu_\text{rep}}^t-\bm{\mu}^t\right) (where \bm{\mu}^t corresponds to the dual Legendre parametrization of \bm{\theta}^t, and \bm{\mu}_\text{rep}^t to \bm{\theta}_\text{rep}^t), and then I compute the natural gradient at the point \bm{\theta}_\text{rep}^t.

In addition, to increase the competition within my packing population, I introduce a sort of tournament by retaining the N-best configurations I’ve found so far and incorporating them with the repaired configurations from the current generation. This leads to yet another slightly modified distribution, \bm{\theta}_\text{rep + N-best}^t

and I have to perform the same Riemannian gymnastics once more. In the end, the approximation to the natural gradient at the point \bm{\theta}^t is as follows

    \begin{equation*}\widetilde{\nabla} J(\bm{\theta}^t) = \mathcal{I}_{\bm{\theta}_{\text{rep + N-best} }^t}^{-1} \left({\bm{\mu}_{\text{rep + N-best}}^t}_{\textbf{F}_{1- \frac{1}{q^{*}}}}-\bm{\mu}_{\text{rep + N-best} }^t\right) + \mathcal{I}_{\bm{\theta}_{\text{rep} }^t}^{-1} \left(\bm{\mu}_{\text{rep + N-best}}^t-\bm{\mu_\text{rep}}^t\right) + \mathcal{I}_{\bm{\theta}^t}^{-1}\left(\bm{\mu_\text{rep}}^t-\bm{\mu}^t\right)\end{equation*}
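Numerically, this amounts to three Fisher-matrix solves (a numpy sketch with hypothetical names: `I_t`, `I_rep` and `I_repN` are the Fisher information matrices at \bm{\theta}^t, \bm{\theta}_\text{rep}^t and \bm{\theta}_\text{rep + N-best}^t, the `mu_*` vectors the corresponding dual parameters, and `mu_F` the \textbf{F}_{1- \frac{1}{q^{*}}} batch mean):

```python
import numpy as np

def natural_gradient_approx(I_t, I_rep, I_repN, mu_t, mu_rep, mu_repN, mu_F):
    """Compose the natural-gradient approximation at theta^t from the
    gradient at the twice-shifted distribution plus two transport terms;
    np.linalg.solve(I, v) computes I^{-1} v without forming the inverse."""
    grad_term = np.linalg.solve(I_repN, mu_F - mu_repN)   # gradient at theta_rep+N-best
    shift_two = np.linalg.solve(I_rep, mu_repN - mu_rep)  # theta_rep -> theta_rep+N-best
    shift_one = np.linalg.solve(I_t, mu_rep - mu_t)       # theta^t -> theta_rep
    return grad_term + shift_two + shift_one
```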

So far, experiments show that these modifications to the ETRPA are far more effective than the original ETRPA. Here’s a visualization from one of the test optimization runs:

In this particular run, the best packing achieved attains a density of 0.7845. Previously, reaching such a density required multiple rounds of the refinement process, each consisting of 8000 iterations. Now, a single run with just 2000 iterations is sufficient. It’s worth noting that even after only 200 iterations, the algorithm already reaches regions with densities exceeding 0.7.

It’s important to mention that currently, I’ve only implemented the repair process for unfeasible configurations for the P1 space group. However, similar repairs can be done for configurations in any space group. Furthermore, the algorithm’s convergence speed can be further enhanced by fine-tuning its hyperparameters. Perhaps it’s time to make use of the BARKLA cluster for the hyperparameter tuning.

Anthracene P1 packing

Left: Anthracene, modeled as a collection of spheres, with each sphere’s radius given by the van der Waals radius of the corresponding atom in the molecule. Right: P1 packing of this model, with a density of approximately 0.730855.

Here is a visualization showing the evolution of the Extended Multivariate von Mises distribution during the search run:

Minimising a Lennard-Jones hexagonal molecular system

Each frame of this simulation showcases the outcomes of Sequential Quadratic Programming (SQP) minimization applied to a Lennard-Jones system. The system’s energy is defined by the following expression:

    \begin{equation*} E_{LJ}=\sum_{\{ A, B \} } \frac{1}{2}\sum_{a \in A} \sum_{b \in B} 4\epsilon\left[\left(\frac{\sigma}{r_{ab}}\right)^{12} - \left(\frac{\sigma}{r_{ab}}\right)^{6}\right] \end{equation*}

where the parameters are set as \sigma = 1 and \epsilon = \frac{1}{4}. Here, \{ A, B \} ranges over pairs of distinct molecules A and B, and the factor of \frac{1}{2} corrects for counting each pair twice. The simulation involves a system of 500 molecules, each comprising 6 atoms. At the start, a uniform random configuration was generated, with each molecule represented by two centre-of-mass coordinates and one rotational coordinate.
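For concreteness, here is a direct O(n^2) evaluation of this energy (a numpy sketch without cutoffs or periodic boundaries; `coords` is a hypothetical (n_mol, n_atoms, dim) array of atomic positions):

```python
import numpy as np

def lj_energy(coords, sigma=1.0, epsilon=0.25):
    """Lennard-Jones energy summed over all intermolecular atom pairs.
    Intramolecular pairs are excluded; each unordered pair appears twice
    in the mask, which the 1/2 factor corrects, matching the formula above."""
    coords = np.asarray(coords, float)
    n_mol, n_atoms, dim = coords.shape
    flat = coords.reshape(-1, dim)
    mol_id = np.repeat(np.arange(n_mol), n_atoms)
    diff = flat[:, None, :] - flat[None, :, :]
    r = np.sqrt((diff ** 2).sum(axis=-1))
    inter = mol_id[:, None] != mol_id[None, :]     # different molecules only
    s6 = (sigma / r[inter]) ** 6
    return 0.5 * (4.0 * epsilon * (s6 ** 2 - s6)).sum()
```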

As the simulation progresses, the boundaries of the box are incrementally reduced following each SQP minimization. This effectively simulates an increase in pressure on the system. Notice the hexagonal tiling patches as the simulation advances. These are depicted as blue regular hexagons, each scaled by a factor of 1.93185 and rotated by an angle of 0.17862 radians relative to the molecules.

It’s important to note that the empty spaces observed in the simulation are a result of the rate at which pressure is increased. A sufficiently slow increase in pressure would eventually lead to a 6^3 regular tiling.