IGO with adaptive sampling for the learning rates

Here are some results from the implementation of adaptive sampling for tuning the learning rates. 25 simulation threads with different learning rates (the current learning rate plus a multiple of delta) are run in parallel, and every n iterations the IGO algorithm sets the current learning rate to that of the thread in which the minimum mean objective value was attained over these n iterations, across all 25 threads. Then the process is repeated.
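Below is a minimal, self-contained sketch of this selection scheme in Python. It is only an illustration: the actual IGO model is replaced by a toy isotropic-Gaussian search distribution with a mean-only update, a single learning rate is adapted rather than the pair [eta_d, eta_D] used in the runs below, and the candidate rates eta + k*delta for k = -12..12 (25 threads) are my assumption about which multiples of delta are tried; only the best-thread selection rule follows the description above.

import numpy as np

def rastrigin(x):
    # 2D Rastrigin shifted so the minimum is at [pi, pi]
    z = np.asarray(x) - np.pi
    return 10 * z.size + np.sum(z**2 - 10 * np.cos(2 * np.pi * z))

def run_block(mean, eta, n_iters=25, n_samples=100, sigma=0.3, rng=None):
    # Toy stand-in for one "thread": n_iters IGO-style iterations with a fixed
    # learning rate eta, rank-based weights on the best half of the samples and
    # a mean-only natural-gradient update.  Returns the updated mean and the
    # smallest per-iteration mean objective attained in the block.
    rng = np.random.default_rng() if rng is None else rng
    block_means = []
    for _ in range(n_iters):
        X = mean + sigma * rng.standard_normal((n_samples, mean.size))
        f = np.apply_along_axis(rastrigin, 1, X)
        block_means.append(f.mean())
        w = np.zeros(n_samples)
        w[np.argsort(f)[: n_samples // 2]] = 2.0 / n_samples   # weights sum to 1
        mean = mean + eta * (w @ X - mean)
    return mean, min(block_means)

def adaptive_eta_igo(eta0=1.0, delta=0.25, n_block=25, n_outer=32, rng=None):
    # Every n_block iterations adopt the learning rate (and state) of the thread
    # that attained the minimum mean objective, then repeat.
    rng = np.random.default_rng(0) if rng is None else rng
    eta = eta0
    mean = rng.uniform(0.0, 2.0 * np.pi, size=2)            # random start in the box
    for _ in range(n_outer):
        etas = [max(1e-3, eta + k * delta) for k in range(-12, 13)]   # 25 "threads"
        runs = [run_block(mean.copy(), e, n_block, rng=rng) for e in etas]
        best = min(range(len(etas)), key=lambda i: runs[i][1])
        mean, eta = runs[best][0], etas[best]
    return mean, eta

print(adaptive_eta_igo())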

Initial learning rates: [1,1]; Number of samples: 100; delta: 0.25; Iterations: 25

Best objective 3.906106261908349e-05 @ [3.141947289734544, 3.141325967480646] attained @ 799th iteration.

Initial learning rates: [1,1]; Number of samples: 100; delta: 0.125; Iterations: 25

Best objective 2.811559213462544e-04 @ [3.142757645858563, 3.141837545625751] attained @ 2392th iteration.

Initial learning rates: [1,1]; Number of samples: 1000; delta: 0.125; Iterations: 10

Best objective 9.220073825133568e-05 @ [3.141111456149766, 3.141109756733831] attained @ 327th iteration.

IGO with two learning rates

I have split the learning rate in two: the first learning rate (Eta_d) is for the parameters related to the concentration and mean, and the second (Eta_D) is for the parameters of the precision matrix.

All simulations were run for 1000 iterations.

Objective function: 2D Rastrigin centered at [pi,pi].
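For reference, a sketch of what such a split-rate natural-gradient step can look like, written for a Gaussian search distribution in xNES form as a stand-in for the actual model: mu plays the role of the mean/concentration block and gets Eta_d, while the covariance factor A (Sigma = A*A') plays the role of the precision-matrix block and gets Eta_D. The function name, the utility shaping and the Gaussian family itself are my choices; only the idea of two block-wise learning rates mirrors the setup above.

import numpy as np
from scipy.linalg import expm

def rastrigin(x):
    z = np.asarray(x) - np.pi                          # minimum shifted to [pi, pi]
    return 10 * z.size + np.sum(z**2 - 10 * np.cos(2 * np.pi * z))

def split_rate_step(mu, A, eta_d, eta_D, n_samples=100, rng=None):
    # One xNES-style step with separate learning rates per parameter block.
    rng = np.random.default_rng() if rng is None else rng
    dim = mu.size
    Z = rng.standard_normal((n_samples, dim))
    X = mu + Z @ A.T                                   # samples from N(mu, A A^T)
    f = np.apply_along_axis(rastrigin, 1, X)
    # rank-based utilities, best sample gets the largest weight, sum to 1
    u = np.maximum(0.0, np.log(n_samples / 2 + 1) - np.log(np.arange(1, n_samples + 1)))
    u /= u.sum()
    w = np.empty(n_samples)
    w[np.argsort(f)] = u
    g_mu = w @ Z                                       # natural gradient, mean block
    G = (Z.T * w) @ Z - np.eye(dim)                    # natural gradient, matrix block
    mu = mu + eta_d * A @ g_mu                         # Eta_d scales this block
    A = A @ expm(0.5 * eta_D * G)                      # Eta_D scales this block
    return mu, A, f.min()

mu, A = np.array([1.0, 5.0]), np.eye(2)
for _ in range(1000):
    mu, A, best = split_rate_step(mu, A, eta_d=1.0, eta_D=0.5)
print(best, mu)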

Eta_d = 1; Eta_D=0.25; 100 samples

Best fit: 5.092878218704300e-05 @ [3.142098748056062, 3.142098748056062]

Eta_d = 1; Eta_D=0.25; 1000 samples

Best fit: 0.011457103531619 @ [3.134161151183576, 3.140001246486607]

Eta_d = 1; Eta_D=0.5; 25 samples

Best fit: 0.013942156424022 @ [3.149423592902749, 3.138598537797074]

Eta_d = 1; Eta_D=0.5; 50 samples

Best fit: 2.739779972955603e-04 @ [3.140419809603958, 3.141666381043835]

Eta_d = 1; Eta_D=0.5; 75 samples

Best fit: 3.583357357932471e-04 @ [3.142532825223459, 3.140632298254060]

Eta_d = 1; Eta_D=0.5; 100 samples

Best fit: 6.607625318082455e-04 @ [3.140007830023662, 3.140687697105372]

Eta_d = 1; Eta_D=0.5; 325 samples

Best fit: 0.005180560001300 @ [3.136709161988744, 3.140087288810498]

Eta_d = 1; Eta_D=0.5; 550 samples

Best fit: 0.013942156424022 @ [3.140419809603958, 3.138598537797074]

Eta_d = 1; Eta_D=0.5; 1000 samples

Best fit: 9.464108499734891e-04 @ [3.143699809057704, 3.142167429144717]

Eta_d = 1; Eta_D=0.75; 100 samples

Best fit: 0.002806343200142 @ [3.138836948966443, 3.139032993376946]

Eta_d = 1; Eta_D=0.75; 1000 samples

Best fit: 4.175267021224727e-04 @ [3.140161030118904, 3.141358087630181]

IGO learning rate

Evolution of the mean objective value for 2D Rastrigin function optimization. The Rastrigin function is shifted so that the minimum is at [pi,pi], the optimization is bounded to [0,2*pi]x[0,2*pi], and various learning rate values (eta) are compared.
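(For reference, assuming the standard Rastrigin form with amplitude 10, the shifted objective is f(x,y) = 20 + (x-pi)^2 - 10*cos(2*pi*(x-pi)) + (y-pi)^2 - 10*cos(2*pi*(y-pi)), with minimum value 0 at [pi,pi].)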

Left: using 100 samples. Right: using 1000 samples.

Single plots, using 100 samples, for eta = 0.3, 0.6, 0.9, ..., 6 (in steps of 0.3).

Comparison of the Gibbs sampler for the extension of the multivariate von Mises distribution and the rejection sampling algorithm for the multivariate von Mises distribution

I compared the Mardia rejection sampling algorithm for the sine model of the bivariate and trivariate von Mises distributions with the Gibbs sampling algorithm for my extension of the multivariate von Mises distribution, in the cases where the distributions coincide.
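For the standard sine model the Gibbs sweep is simple, because every full conditional is itself a von Mises distribution. The sketch below is for that standard sine model, not for the extension itself, and the example parameters at the end are my reading of the TVM caption below (zero mean and zero concentration vectors), which is an assumption.

import numpy as np

def gibbs_mvm_sine(mu, kappa, Lam, n_samples, burn_in=1000, rng=None):
    # Gibbs sampler for the multivariate von Mises sine model with density
    # proportional to exp(sum_j kappa_j cos(t_j - mu_j)
    #                     + 1/2 sum_{i != j} Lam_ij sin(t_i - mu_i) sin(t_j - mu_j)).
    # The full conditional of t_j is von Mises with concentration
    # sqrt(kappa_j^2 + b_j^2) and location mu_j + atan2(b_j, kappa_j),
    # where b_j = sum_{i != j} Lam_ji sin(t_i - mu_i).
    rng = np.random.default_rng() if rng is None else rng
    mu, kappa = np.asarray(mu, float), np.asarray(kappa, float)
    p = mu.size
    theta = mu.copy()                                  # start the chain at the means
    out = np.empty((n_samples, p))
    for t in range(burn_in + n_samples):
        for j in range(p):
            s = np.sin(theta - mu)
            b = Lam[j] @ s - Lam[j, j] * s[j]
            theta[j] = rng.vonmises(mu[j] + np.arctan2(b, kappa[j]),
                                    np.hypot(kappa[j], b))
        if t >= burn_in:
            out[t - burn_in] = np.mod(theta, 2 * np.pi)
    return out

# Example matching the TVM(0, 0, 10*[0 -1 1; -1 0 1; 1 1 0]) case below.
Lam = 10.0 * np.array([[0, -1, 1], [-1, 0, 1], [1, 1, 0]], dtype=float)
samples = gibbs_mvm_sine(np.zeros(3), np.zeros(3), Lam, n_samples=5000)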

 

Left: Mardia rejection sampling algorithm. Right: Gibbs sampling algorithm.

Bivariate von Mises

BVM(0,0,0)

BVM(pi,10,2)

BVM(0,0,10)

Trivariate von Mises

TVM(0,0,10*[0 -1 1;-1 0 1;1 1 0])

projection along z axis

projection along y axis

projection along x axis

Generalized von Mises distribution sampler using von Neumann’s rejection sampling algorithm.

Algorithm: Some computational aspects of the generalized von Mises distribution

100000 samples were generated for every set of parameter values.

The red line is the generalized von Mises density function.

mu_1 = 4.5, mu_2 = 1, kappa_1 = 0.8, kappa_2 = 2

mu_1 = 2, mu_2 = 1, kappa_1 = 3, kappa_2 = 0

mu_1 = 2, mu_2 = 1, kappa_1 = 0, kappa_2 = 3

mu_1 = 2, mu_2 = 1, kappa_1 = 0, kappa_2 = 0
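For illustration, the sketch below is a basic von Neumann accept/reject scheme for the order-2 generalized von Mises density with a uniform proposal and the crude envelope exp(kappa_1 + kappa_2). The cited paper constructs a tighter envelope, so this shows the principle rather than reproducing that algorithm; the acceptance rate degrades for large concentrations.

import numpy as np

def gvm_unnormalized(theta, mu1, mu2, k1, k2):
    # unnormalized generalized von Mises (order 2) density
    return np.exp(k1 * np.cos(theta - mu1) + k2 * np.cos(2.0 * (theta - mu2)))

def sample_gvm(n, mu1, mu2, k1, k2, rng=None):
    # Accept a uniform draw theta with probability f*(theta) / exp(k1 + k2),
    # which is valid because exp(k1 + k2) bounds the unnormalized density.
    rng = np.random.default_rng() if rng is None else rng
    bound = np.exp(k1 + k2)
    out = []
    while len(out) < n:
        theta = rng.uniform(0.0, 2.0 * np.pi, size=n)
        u = rng.uniform(0.0, 1.0, size=n)
        out.extend(theta[u * bound < gvm_unnormalized(theta, mu1, mu2, k1, k2)])
    return np.array(out[:n])

# e.g. the first parameter set above
samples = sample_gvm(100000, mu1=4.5, mu2=1.0, k1=0.8, k2=2.0)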

 

 

Heptagon space group packings using xNES

I started with plane group packings of heptagons. The heptagons are not precisely regular, as I used a simple construction method: I placed points on a unit circle with an angle difference of 2*pi/7 between consecutive vertices.
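A minimal sketch of that construction (radius 1 and starting angle 0 are arbitrary choices here), together with the shoelace formula for the heptagon area that the density computation needs:

import numpy as np

def heptagon_vertices(radius=1.0, angle0=0.0):
    # seven vertices on a circle, consecutive vertices 2*pi/7 apart
    angles = angle0 + 2.0 * np.pi * np.arange(7) / 7.0
    return np.column_stack((radius * np.cos(angles), radius * np.sin(angles)))

verts = heptagon_vertices()
x, y = verts[:, 0], verts[:, 1]
# shoelace formula; packing density = (copies per cell * this area) / cell area
area = 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))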

For every space group, 100 experiments were performed with random initial configurations. The way the algorithm works now, it does not need a feasible initial configuration.

The best known packing density of heptagons is 0.892690686126509, which is also believed to be optimal. It is the same kind of double lattice packing as with pentagons: https://blogs.ams.org/visualinsight/2014/11/15/packing-regular-heptagons/

Plane group: p2

Max density:  0.892607642589804

Mean density:  0.8425

Density variance:  0.0015

Number of infeasible solutions: 0

0.4777, 0.2507, 0.8900, 1.9010, 3.7421, 1.0389

Plane group: cm

Max density:  0.8414

Mean density:  0.7251

Density variance: 0.0033

Number of infeasible solutions: 0

Plane group: p2mm

Max density:  0.7365

Mean density:  0.5760

Density variance: 0.0125

Number of infeasible solutions: 0

Plane group: p2mg

Max density:  0.8238

Mean density:  0.6826

Density variance:  0.0097

Number of infeasible solutions: 0

Plane group: p2gg

Max density:  0.892690618215488

Mean density:  0.8643

Density variance:  6.5725e-04

Number of infeasible solutions: 0

Plane group: c2mm

Max density:  0.7390

Mean density:  0.5719

Density variance: 0.0040

Number of infeasible solutions: 0

Plane group: p3m1

Max density:  0.5718

Mean density:  0.5377

Density variance: 0.0022

Number of infeasible solutions: 0

0.3232, 0.3333, 4.4878, 5.7579

xNES constraint handling for pentagon packings in plane group pg.

For every constraint handling method, 100 optimization runs were performed with random initial configurations. The proven optimal packing density is (5-sqrt(5))/3 ~ 0.921310674166737.

Penalty function based on feasibility

I used the penalty function from this paper:  An efficient constraint handling method for genetic algorithms
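A sketch of that feasibility-based rule as it applies to a sampled population, assuming minimization and constraint violations measured as nonnegative numbers (0 meaning feasible); the function name and signature are mine:

import numpy as np

def penalized_fitness(f_vals, violations):
    # Feasible solutions keep their objective value; infeasible ones get the
    # worst feasible objective in the population plus their violation, so every
    # feasible solution ranks ahead of every infeasible one.
    f_vals = np.asarray(f_vals, dtype=float)
    violations = np.asarray(violations, dtype=float)
    feasible = violations <= 0.0
    worst_feasible = f_vals[feasible].max() if feasible.any() else 0.0
    return np.where(feasible, f_vals, worst_feasible + violations)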

The closest the algorithm got to the optimum was 0.921310674166735, attained in 2 instances with a constraint violation of 1.110223024625157e-16. This is also the minimum constraint violation achieved over the 100 runs. 53 other instances reached this same constraint violation, with mean density 0.921310674166652 and variance 2.276453802275327e-25.

The overall mean density from the 100 runs was 0.8926 and variance 0.0061.

In 99 instances the output solution violated the constraints. In these cases the mean constraint violation was 0.002671273212076 and the variance 1.732259359382852e-04.

Dynamic penalty

I’ve tested the dynamic penalty function from: A survey of constraint handling techniques used with evolutionary algorithms
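The survey covers several dynamic penalties; one common form (due to Joines and Houck), which may differ from the exact variant used here, penalizes a solution at generation t as follows:

def dynamic_penalty(f_val, violations, t, C=0.5, alpha=2.0, beta=2.0):
    # f(x) + (C*t)^alpha * sum(v_i^beta): the penalty weight grows with the
    # generation number t, so infeasibility is tolerated early and punished late.
    return f_val + (C * t) ** alpha * sum(v ** beta for v in violations)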

In 100 experiments, all of the output solutions violated the constraints. The minimum constraint violation in these 100 instances was 0.8888 and the mean constraint violation 1.5547. This is very bad.

An interesting observation: when I replaced the objective values of infeasible solutions in the penalty function with the maximum objective value among the feasible solutions (as in the penalty function based on feasibility), the algorithm suddenly performed very well.

Annealing penalty

Same case as with dynamic penalty.

Death penalty

Not really good. It very quickly gets stuck in configurations where it cannot find any feasible samples.

Repair

Very slow and not very precise, due to the way the repair is implemented: the unit cell is multiplied by a small constant c > 1 until there is no overlap.
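A sketch of that repair step; the overlap test is passed in as a callable because the actual implementation is not reproduced here, and scaling only the lattice lengths is my assumption about which cell parameters get multiplied:

def repair_by_inflation(cell_lengths, has_overlap, c=1.01, max_steps=10000):
    # Repeatedly inflate the unit cell by the factor c > 1 until has_overlap
    # reports no overlap.  Slow and imprecise, because every step also lowers
    # the achievable density.
    lengths = list(cell_lengths)
    for _ in range(max_steps):
        if not has_overlap(lengths):
            return lengths
        lengths = [c * L for L in lengths]
    raise RuntimeError("repair did not terminate")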

Augmented Lagrangian method

I’ve implemented the augmented Lagrangian method for xNES from this paper: Augmented Lagrangian Genetic Algorithm for Structural Optimization
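The paper states the formulation for a genetic algorithm; below is a generic augmented Lagrangian for inequality constraints g_i(x) <= 0 of the kind that can be wrapped around the xNES objective. The multiplier update and the penalty growth factor are standard textbook choices, not necessarily the paper's schedule.

import numpy as np

def augmented_lagrangian(f_val, g_vals, lam, rho):
    # Augmented Lagrangian value for inequality constraints g_i(x) <= 0.
    g = np.asarray(g_vals, dtype=float)
    shifted = np.maximum(0.0, lam / rho + g)
    return f_val + 0.5 * rho * np.sum(shifted**2 - (lam / rho) ** 2)

def update_multipliers(g_vals, lam, rho, rho_growth=10.0):
    # Classic first-order multiplier update, applied between inner xNES runs.
    lam_new = np.maximum(0.0, lam + rho * np.asarray(g_vals, dtype=float))
    return lam_new, rho * rho_growth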

Same as in the case of the dynamic penalty: the output solutions of all 100 experiments violated the constraints. The mean constraint violation was 1.0899 with variance 0.0020. Not useful at all.

Augmented Lagrangian method with penalty function based on feasibility

In 45 cases the nonlinear constraints of the algorithm’s output solution were within 1.110223024625157e-16 of zero. Out of these, 32 were infeasible. Among the feasible solutions, the maximum density was 0.921310674166734, with a constraint violation of 0. Below is a picture of this solution.

The mean density of these 45 cases was 0.9205 with variance 2.0456e-05. Overall, the mean density was 0.8777 with variance 0.0107. In 79 cases the constraints were violated, with a mean constraint violation of 0.0017 and variance 6.7180e-05.