It has become clear to me that quasi-Newton (hill-climbing) optimisation will almost always get stuck in local minima. Compare this with Bayesian methods such as Markov Chain Monte Carlo (MCMC), which draw a large number of samples from the parameter space and can locate the global maximum of the likelihood within the posterior distribution. As an example, I show the outcome of fitting a single galaxy for which I deliberately fed in differing initial conditions for the effective radii of the bulge and disk. First, here are the results of running a 'BFGS' downhill gradient method:
Note that there are three distinct solutions upon which the initial conditions converge. Now compare this to the results from the MCMC optimisation with the same initial conditions:
Not only do the majority of solutions converge, they appear to find the global optimum of the full parameter space that the BFGS routine could not. *Note that one solution in the MCMC optimisations has diverged. Upon inspection, this is due to constraints which unfortunately caused the effective radii of the bulge and disk to be fixed. Without constraints, the solutions are almost always robust to the choice of initial conditions.
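The basic failure mode is easy to reproduce on a toy problem. Here is a minimal Python sketch (a hypothetical two-well function standing in for a real galaxy likelihood surface, not anything from ProFit itself) showing how BFGS runs started on different sides of a barrier settle into different minima:

```python
import numpy as np
from scipy.optimize import minimize

# Toy multimodal "negative log-likelihood": two wells, global minimum near x = -1.47.
def neg_log_like(x):
    return x[0]**4 - 4 * x[0]**2 + x[0]

starts = [-2.0, -0.5, 0.5, 2.0]
fits = [minimize(neg_log_like, [s], method="BFGS") for s in starts]
for s, f in zip(starts, fits):
    print(f"start {s:+.1f} -> x = {f.x[0]:+.3f}, f = {f.fun:+.3f}")
# Starts to the right of the barrier (near x = 0.13) fall into the shallower
# local minimum; only starts to the left reach the deeper global one.
```

An MCMC sampler run on the same function would eventually hop the barrier and concentrate its samples in the deeper well, which is the behaviour seen in the galaxy fits above.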
A better option for initial parameter searches, if you really have no idea what you should be using, is to use one of the genetic algorithms (there are example vignettes using CMA). In practice a combination of ProFound outputs and a better ML optimisation (using a GA or a better behaved hill climber) will usually get you close enough to at least converge quickly with your MCMC of choice.
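To illustrate why population-based searches of this kind escape local optima, here is a deliberately minimal (mu, lambda) evolution strategy in Python on the same toy two-well function as above. This is a simplified stand-in for a real CMA-ES (no covariance adaptation), so treat the population sizes and the function itself as illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(42)

def neg_log_like(x):
    # Toy two-well surface; global minimum near x = -1.47.
    return x**4 - 4 * x**2 + x

# Minimal (mu, lambda) evolution strategy: sample a population around the
# current mean, keep the fittest, and shrink the step size as they cluster.
mean, sigma = 2.0, 3.0          # deliberately bad starting point
for gen in range(60):
    pop = mean + sigma * rng.standard_normal(40)     # lambda = 40 offspring
    elite = pop[np.argsort(neg_log_like(pop))[:10]]  # mu = 10 survivors
    mean, sigma = elite.mean(), max(elite.std(), 1e-3)

print(round(mean, 2))  # close to the global minimum near -1.47
```

Because the early generations sample broadly, some offspring land in the deeper well even from a bad start, and selection then pulls the whole population there, which is exactly what a single gradient descent from the same start cannot do.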
I'd like to add, though, that if you have simple objects (such as stars rather than galaxies) and good initial guesses, e.g. from the segmentation statistics returned by ProFound, then a downhill gradient algorithm will have no trouble converging on the global maximum and will be much faster than a full MCMC. Of course the MCMC will still work, but it is a bit of overkill in simple cases.
I'm surprised that it's quite that bad. Are you fitting sizes in linear or log space? It should be log; I don't think there's any advantage to fitting linear sizes. If you still have this issue with BFGS, try some of the other algorithms in optim/nloptr, as discussed in this thread: profit.freeforums.net/thread/20/choose-algorithm?page=1
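For readers unfamiliar with the log-space point: parameterising a size as log(Re) makes the parameter unbounded (Re stays positive automatically) and makes optimiser steps multiplicative, which suits quantities spanning orders of magnitude. A minimal Python sketch, fitting a made-up 1-D exponential profile (the data and radii here are invented for illustration, not a ProFit workflow):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)

# Toy 1-D "profile": exponential disk with true scale length Re = 5 pixels.
r = np.linspace(0, 30, 100)
data = np.exp(-r / 5.0) + 0.01 * rng.standard_normal(r.size)

def chisq(p):
    # p[0] is log10(Re): the optimiser works on an unbounded parameter,
    # while Re = 10**p[0] is guaranteed positive in the model.
    model = np.exp(-r / 10 ** p[0])
    return np.sum((data - model) ** 2)

fit = minimize(chisq, [np.log10(50.0)], method="BFGS")  # start 10x too large
print(10 ** fit.x[0])  # recovered Re, close to the true value of 5
```

The same transformation is commonly applied to the Sersic index and axial ratio, as mentioned in the reply below.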
Yes, I'm fitting effective radius (as well as Sersic index and axial ratio) in log space. I think I chose a particularly convoluted galaxy (spiral arms, highly flocculent) in this example, so potentially I'm giving optim a bad rap. It is true that for simpler galaxies the results are not so divergent.
To complicate things, if the data cannot be fit by the model (i.e. if the data in detail cannot be represented by the model, which is always true in some regime), then the best (most likely) global fit is not necessarily the best in the sense of being the most informative. Even the simple case of trying to fit a Normal + Uniform with a mixture of Normals will certainly produce a biased fit, even for the Normal part, because the fit expends effort trying to match the unfittable Uniform part.
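This bias is easy to demonstrate numerically. A minimal Python sketch (invented data, and for brevity a single maximum-likelihood Normal rather than a full mixture) showing how an unmodelled Uniform background drags the fitted width away from the truth:

```python
import numpy as np

rng = np.random.default_rng(0)

# 90% of the data is the Normal we care about (mean 0, sigma 1);
# 10% is a broad Uniform "background" that a Normal cannot represent.
clean = rng.normal(0.0, 1.0, 9000)
background = rng.uniform(-10.0, 10.0, 1000)
data = np.concatenate([clean, background])

# Maximum-likelihood fit of a single Normal to everything:
mu_hat, sigma_hat = data.mean(), data.std()
print(sigma_hat)  # ~2, inflated well past the true value of 1
```

The fitted sigma is biased high by roughly a factor of two even though 90% of the data is exactly Normal, because the likelihood spends effort accommodating the Uniform tails.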
In other words, if the residuals are not subdominant compared to the sigma map, then anything goes...