Setting: initial learning rates = [1,1]; runs=1; shrinkage covariance as previously; learning rates reset to their initial values. Left: reset after 100 iterations. Right: reset after 200 iterations.
Setting: adaptive learning rates with 2*delta and delta set to eta_{*}/3. 100 samples per run. Starting distribution: uniform. Shrinkage covariance computed with this: http://ledoit.net/shrinkDiag.m
Starting learning rates: eta_d=10; eta_D=10; iterations=1000; runs=1;
Starting learning rates: eta_d=10; eta_D=10; iterations=100; runs=10;
Starting learning rates: eta_d=10; eta_D=1; iterations=100; runs=10;
Starting learning rates: eta_d=10; eta_D=10; iterations=100; runs=10;
Starting learning rates: eta_d=1; eta_D=1; iterations=100; runs=10;
Here are some results from the implementation of adaptive sampling for tuning the learning rates. 25 simulation threads, each with a different learning rate (the current learning rate plus a multiple of delta), are run. Every n iterations of the IGO algorithm, the current learning rate is set to that of the thread that attained the minimum mean objective value over those n iterations and all 25 threads. The process is then repeated.
Initial learning rates: [1,1]; Number of samples: 100; delta: 0.25; Iterations: 25
Best objective 3.906106261908349e-05 @ [3.141947289734544, 3.141325967480646] attained @ 799th iteration.
Initial learning rates: [1,1]; Number of samples: 100; delta: 0.125; Iterations: 25
Best objective 2.811559213462544e-04 @ [3.142757645858563, 3.141837545625751] attained @ 2392nd iteration.
Initial learning rates: [1,1]; Number of samples: 1000; delta: 0.125; Iterations: 10
Best objective 9.220073825133568e-05 @ [3.141111456149766, 3.141109756733831] attained @ 327th iteration.
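The selection step described above can be sketched as follows. This is only a minimal illustration, not the actual implementation: it assumes the 25 threads form a 5 x 5 grid of offsets (-2, -1, 0, 1, 2 times delta) around the current pair of learning rates, that rates are clamped to a small positive floor, and that a hypothetical callback `run_thread(rates, n_iter)` returns the mean objective value at each of the n iterations.

```python
import itertools
import numpy as np

def candidate_rates(current, delta, multiples=(-2, -1, 0, 1, 2), floor=1e-6):
    """Build 5 x 5 = 25 candidate learning-rate pairs around `current`.

    The grid layout and the positive floor are assumptions for illustration.
    """
    return [
        (max(current[0] + a * delta, floor), max(current[1] + b * delta, floor))
        for a, b in itertools.product(multiples, repeat=2)
    ]

def select_rates(current, delta, run_thread, n_iter):
    """Pick the candidate whose thread attained the lowest mean objective.

    `run_thread(rates, n_iter)` is a hypothetical callback returning a
    sequence with the mean objective value at each of the n_iter iterations.
    """
    best_rates, best_value = tuple(current), np.inf
    for rates in candidate_rates(current, delta):
        value = float(np.min(run_thread(rates, n_iter)))
        if value < best_value:
            best_rates, best_value = rates, value
    return best_rates, best_value
```

Calling `select_rates` repeatedly, each time starting from the previously selected rates, gives the repeated process described above.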
I have separated the learning rate into two: the first is the learning rate for the parameters related to concentration and mean, and the second is the learning rate for the parameters of the precision matrix.
All simulations were run for 1000 iterations.
Objective function: 2D Rastrigin centered at [pi,pi].
Eta_d = 1; Eta_D=0.25; 100 samples
Best fit: 5.092878218704300e-05 @ [3.142098748056062, 3.142098748056062]
Eta_d = 1; Eta_D=0.25; 1000 samples
Best fit: 0.011457103531619 @ [3.134161151183576, 3.140001246486607]
Eta_d = 1; Eta_D=0.5; 25 samples
Best fit: 0.013942156424022 @ [3.149423592902749, 3.138598537797074]
Eta_d = 1; Eta_D=0.5; 50 samples
Best fit: 2.739779972955603e-04 @ [3.140419809603958, 3.141666381043835]
Eta_d = 1; Eta_D=0.5; 75 samples
Best fit: 3.583357357932471e-04 @ [3.142532825223459, 3.140632298254060]
Eta_d = 1; Eta_D=0.5; 100 samples
Best fit: 6.607625318082455e-04 @ [3.140007830023662, 3.140687697105372]
Eta_d = 1; Eta_D=0.5; 325 samples
Best fit: 0.005180560001300 @ [3.136709161988744, 3.140087288810498]
Eta_d = 1; Eta_D=0.5; 550 samples
Best fit: 0.013942156424022 @ [3.140419809603958, 3.138598537797074]
Eta_d = 1; Eta_D=0.5; 1000 samples
Best fit: 9.464108499734891e-04 @ [3.143699809057704, 3.142167429144717]
Eta_d = 1; Eta_D=0.75; 100 samples
Best fit: 0.002806343200142 @ [3.138836948966443, 3.139032993376946]
Eta_d = 1; Eta_D=0.75; 1000 samples
Best fit: 4.175267021224727e-04 @ [3.140161030118904, 3.141358087630181]
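For reference, the objective used in these runs can be written down in a few lines. The sketch below assumes the standard Rastrigin form with amplitude A = 10, shifted so that the minimum sits at [pi,pi]; the function name and the `center` and `A` arguments are illustrative choices, not taken from the original code.

```python
import numpy as np

def shifted_rastrigin(x, center=np.pi, A=10.0):
    """Rastrigin function shifted so that its global minimum (value 0) is at `center`.

    Accepts a single point of shape (n,) or a batch of points of shape (m, n).
    A = 10 is the standard amplitude and is an assumption here.
    """
    z = np.asarray(x, dtype=float) - center
    return A * z.shape[-1] + np.sum(z**2 - A * np.cos(2.0 * np.pi * z), axis=-1)
```

Under this assumption, evaluating the function at [3.141947289734544, 3.141325967480646] gives roughly 3.9e-05.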
Evolution of the mean objective value for 2D Rastrigin function optimization with various learning rate values (eta). The Rastrigin function is shifted so that the minimum is at [pi,pi], and the optimization is bounded to [0,2*pi]x[0,2*pi].
Left: using 100 samples Right: using 1000 samples.
Single: using 100 samples
Panels: eta = 0.3 to 6 in steps of 0.3 (0.3, 0.6, 0.9, ..., 5.7, 6).
I compared the Mardia rejection sampling algorithm for the sine model of the bivariate and trivariate von Mises distributions with the Gibbs sampling algorithm for my extension of the multivariate von Mises distribution, in the cases where the distributions coincide.
Left: Mardia rejection sampling algorithm. Right: Gibbs sampling algorithm.
Bivariate von Mises
BVM(0,0,0)
BVM(pi,10,2)
BVM(0,0,10)
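To illustrate the Gibbs side of the comparison, here is a minimal sketch of a Gibbs sampler for the standard bivariate von Mises sine model, with density proportional to exp(kappa_1 cos(t1 - mu_1) + kappa_2 cos(t2 - mu_2) + lambda sin(t1 - mu_1) sin(t2 - mu_2)). It is not the sampler for the extended distribution used in the plots; the function name, the burn-in length and the parameter names are assumptions. It relies on the fact that each full conditional of the sine model is itself a von Mises distribution.

```python
import numpy as np

def gibbs_bvm_sine(mu, kappa, lam, n_samples, burn_in=500, seed=0):
    """Gibbs sampler for the bivariate von Mises sine model (illustrative sketch).

    mu = (mu1, mu2), kappa = (kappa1, kappa2), lam = interaction parameter.
    Returns an array of shape (n_samples, 2) with angles in [-pi, pi].
    """
    rng = np.random.default_rng(seed)
    mu1, mu2 = mu
    k1, k2 = kappa
    t1, t2 = rng.uniform(-np.pi, np.pi, size=2)
    out = np.empty((n_samples, 2))
    for i in range(burn_in + n_samples):
        # k1*cos(t1 - mu1) + b*sin(t1 - mu1) = A*cos(t1 - mu1 - alpha),
        # with A = hypot(k1, b) and alpha = atan2(b, k1), so t1 | t2 is von Mises.
        b = lam * np.sin(t2 - mu2)
        t1 = rng.vonmises(mu1 + np.arctan2(b, k1), np.hypot(k1, b))
        # Same argument for t2 | t1.
        b = lam * np.sin(t1 - mu1)
        t2 = rng.vonmises(mu2 + np.arctan2(b, k2), np.hypot(k2, b))
        if i >= burn_in:
            out[i - burn_in] = (t1, t2)
    return out
```

With lam = 0 the two angles are independent von Mises variables, which gives a simple sanity check of the sampler.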
Trivariate von Mises
TVM(0,0,10*[0 -1 1;-1 0 1;1 1 0])
projection along z axis
projection along y axis
projection along x axis
Algorithm: Some computational aspects of the generalized von Mises distribution
100000 samples generated for every value of parameters.
The red line is the generalized von Mises density function.
mu_1 = 4.5, mu_2 = 1, kappa_1 = 0.8, kappa_2 = 2
mu_1 = 2, mu_2 = 1, kappa_1 = 3, kappa_2 = 0
mu_1 = 2, mu_2 = 1, kappa_1 = 0, kappa_2 = 3
mu_1 = 2, mu_2 = 1, kappa_1 = 0, kappa_2 = 0
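The density curve drawn over the histograms can be reproduced numerically. The sketch below assumes the usual order-2 generalized von Mises form, proportional to exp(kappa_1 cos(theta - mu_1) + kappa_2 cos(2 (theta - mu_2))), and obtains the normalizing constant by numerical integration over one period rather than by the series expansions discussed in the paper; the function name and grid size are illustrative.

```python
import numpy as np

def gvm_density(theta, mu1, mu2, kappa1, kappa2, n_grid=4096):
    """Generalized von Mises density of order 2, normalized numerically.

    The normalizing constant is approximated with an equally spaced
    Riemann sum over [0, 2*pi), which is very accurate for smooth
    periodic integrands.
    """
    def unnormalized(t):
        return np.exp(kappa1 * np.cos(t - mu1) + kappa2 * np.cos(2.0 * (t - mu2)))

    grid = np.linspace(0.0, 2.0 * np.pi, n_grid, endpoint=False)
    norm_const = unnormalized(grid).mean() * 2.0 * np.pi
    return unnormalized(np.asarray(theta, dtype=float)) / norm_const
```

With kappa_2 = 0 this reduces to the ordinary von Mises density, and with kappa_1 = kappa_2 = 0 to the uniform density 1/(2*pi), matching the last parameter settings listed above.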
Slides: AI3SDpresentation.pdf