Quasi-Newton (downhill gradient) vs. Bayesian fitting (MCMC)

Robin
New Member

I couple the distributions and kinematics of cold gas in galaxies w/ Bulge/Disk decompositions.

Posts: 13

Quasi-Newton (downhill gradient) vs. Bayesian fitting (MCMC) Jul 28, 2017 11:45:10 GMT 8

Quote

Post by Robin on Jul 28, 2017 11:45:10 GMT 8

It has become clear to me that using quasi-Newton (hill climbing optimisation) will cause the optimisation to almost always get stuck in local minima. This should be compared with methods using Bayesian inference such as a Markov Chain Monte Carlo (MCMC) approach which attempts to collect a large number of samples of the parameter space and finds the global maximum of likelihood within the posterior distributions.
As an example, I show the outcomes of an optimisation of a galaxy for which I have chosen to feed in differing initial conditions for the effective radii of the bulge and disk. First, here are the results of running a 'BFGS' downhill gradient method:

Note that there are three distinct solutions for which initial conditions appear to converge upon.
Now compare this to the results from the MCMC optimisation with the same initial conditions:

Not only do the majority of solutions converge, they appear to be finding the global minimum of the full parameter space that the BFGS routine could not.

*Note that there is one solution in the MCMC optimisations that has diverged. Upon inspection, this is due to constraints which unfortunately caused the effective radius of the bulge and disk to be fixed. Without constraints, the solutions are almost always robust to the provision of the initial conditions.

Admin Aaron
Administrator

Posts: 9

Quasi-Newton (downhill gradient) vs. Bayesian fitting (MCMC) Jul 28, 2017 17:45:07 GMT 8

Quote

Post by Admin Aaron on Jul 28, 2017 17:45:07 GMT 8

Thanks Robin, and yes that is a good point.

A better option for initial parameter searches, if you really have no idea what you should be using, is to use one of the genetic algorithms (there are example vignettes using CMA). In practice a combination of ProFound outputs and a better ML optimisation (using a GA or a better behaved hill climber) will usually get you close enough to at least converge quickly with your MCMC of choice.

Cheers,

A

I work for ProFit.

Sarah
New Member

Posts: 31

Quasi-Newton (downhill gradient) vs. Bayesian fitting (MCMC) Jul 31, 2017 14:18:07 GMT 8

Quote

Post by Sarah on Jul 31, 2017 14:18:07 GMT 8

I'd like to add though that you have simple things (such as stars rather than galaxies), and good initial guesses e.g. from the segmentation statistics returned by ProFound, then a downhill gradient algorithm will have no trouble converging on the global maximum and will be much faster than a full MCMC. Of course the MCMC will still work, but it is a bit of an overkill in simple cases.

asgr Junior Member Posts: 56	Quasi-Newton (downhill gradient) vs. Bayesian fitting (MCMC) Jul 31, 2017 18:14:05 GMT 8 Quote Select Post Deselect Post Link to Post Member Give Gift Back to Top Post by asgr on Jul 31, 2017 18:14:05 GMT 8 Yes, for stars the ProFound guesses are a very good start, and full exhaustive MCMC is usually overkill.

taranu
New Member

Posts: 9

Quasi-Newton (downhill gradient) vs. Bayesian fitting (MCMC) Aug 1, 2017 12:16:27 GMT 8

Quote

Post by taranu on Aug 1, 2017 12:16:27 GMT 8

I'm surprised that it's quite that bad. Are you fitting sizes in linear or log space? It should be log - I don't think there's any advantage to fitting linear sizes. If you still have this issue with BFGS, try some of the other algorithms in optim/nloptr as discussed in this thread: profit.freeforums.net/thread/20/choose-algorithm?page=1

Robin
New Member

I couple the distributions and kinematics of cold gas in galaxies w/ Bulge/Disk decompositions.

Posts: 13

Quasi-Newton (downhill gradient) vs. Bayesian fitting (MCMC) Aug 9, 2017 15:20:20 GMT 8

Quote

Post by Robin on Aug 9, 2017 15:20:20 GMT 8

Yes, I'm fitting effective radius (as well as Sersic index and axial ratio) in log space. I think I chose a particularly convoluted galaxy (spiral-arms, highly flocculant) in this example so potentially I'm giving optim a bad rap. It is true that for more simple galaxies, the results are not so divergent.

asgr
Junior Member

Posts: 56

Quasi-Newton (downhill gradient) vs. Bayesian fitting (MCMC) Aug 10, 2017 20:00:26 GMT 8

Quote

Post by asgr on Aug 10, 2017 20:00:26 GMT 8

To complicate things, if the data cannot be fit by the model (i.e. if the data in detail cannot be represented by the model, which is always true in some regime) then the best (most likely) global fit is not necessarily the best (in the sense of being the most informative). Even in a simple case of trying to fit a Normal + Uniform with a mixture of Normals will certainly create a biased fit even for the Normal part (it expends effort trying to fit the unfittable Uniform part).

In other words, if the residuals are not sub dominant compared to the sigma map then anything goes...

A

Quasi-Newton (downhill gradient) vs. Bayesian fitting (MCMC)

Post by Robin on Jul 28, 2017 11:45:10 GMT 8

Post by Admin Aaron on Jul 28, 2017 17:45:07 GMT 8

Post by Sarah on Jul 31, 2017 14:18:07 GMT 8

Post by asgr on Jul 31, 2017 18:14:05 GMT 8

Post by taranu on Aug 1, 2017 12:16:27 GMT 8

Post by Robin on Aug 9, 2017 15:20:20 GMT 8

Post by asgr on Aug 10, 2017 20:00:26 GMT 8

Quick Reply