Modifying MCMC Initial Positions

by Henry Ngo (2019) & Sarah Blunt (2021) & Mireya Arora (2021)

When you set up the MCMC Sampler, the initial position of your walkers are randomly determined. Specifically, they are uniformly distributed in your Prior phase space. This tutorial will show you how to change this default behaviour so that the walkers can begin at locations you specify. For instance, if you have an initial guess for the best fitting orbit and want to use MCMC to explore posterior space around this peak, you may want to start your walkers at positions centered around this peak and distributed according to an N-dimensional Gaussian distribution.

Note: This tutorial is meant to be read after reading the MCMC Introduction tutorial. If you are wondering what walkers are, you should start there!

The Driver class is the main way you might interact with orbitize! as it automatically reads your input, creates all the orbitize! objects needed to do your calculation, and defaults to some commonly used parameters or settings. However, sometimes you want to work directly with the underlying API to do more advanced tasks such as changing the MCMC walkers’ initial positions, or modifying the priors.

This tutorial walks you through how to do that.

Goals of this tutorial: - Learn to modify the MCMC Sampler object - Learn about the structure of the orbitize code base

Import modules

[1]:
import numpy as np
from scipy.optimize import minimize as mn
import orbitize
from orbitize import driver
import multiprocessing as mp

1) Create Driver object

First, let’s begin as usual and create our Driver object, as in the MCMC Introduction tutorial.

[2]:
filename = "{}/GJ504.csv".format(orbitize.DATADIR)

# system parameters
num_secondary_bodies = 1
total_mass = 1.75  # [Msol]
plx = 51.44  # [mas]
mass_err = 0.05  # [Msol]
plx_err = 0.12  # [mas]

# MCMC parameters
num_temps = 5
num_walkers = 30
num_threads = mp.cpu_count()  # or a different number if you prefer


my_driver = driver.Driver(
    filename,
    "MCMC",
    num_secondary_bodies,
    total_mass,
    plx,
    mass_err=mass_err,
    plx_err=plx_err,
    mcmc_kwargs={
        "num_temps": num_temps,
        "num_walkers": num_walkers,
        "num_threads": num_threads,
    },
)

2) Access the Sampler object to view the walker positions

As mentioned in the introduction, the Driver class creates the objects needed for the orbit fit. At the time of this writing, it creates a Sampler object which you can access with the .sampler attribute and a System object which you can access with the .system attribute.

The Sampler object contains all of the information used by the orbit sampling algorithm (OFTI or MCMC) to fit the orbit and determine the posteriors. The System object contains information about the astrophysical system itself (stellar and companion parameters, the input data, etc.).

To see all of the attributes of the driver object, you can use dir().

[3]:
print(dir(my_driver))
['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', 'sampler', 'system']

This returns many other functions too, but you see sampler and system at the bottom. Don’t forget that in Jupyter notebooks, you can use my_driver? to get the docstring for its class (i.e. the Driver class) and my_driver?? to get the full source code of that class. You can also get this information in the API documentation.

Now, let’s list the attributes of the my_driver.sampler attribute.

[4]:
print(dir(my_driver.sampler))
['__abstractmethods__', '__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__slots__', '__str__', '__subclasshook__', '__weakref__', '_abc_impl', '_fill_in_fixed_params', '_logl', '_update_chains_from_sampler', 'check_prior_support', 'chi2_type', 'chop_chains', 'curr_pos', 'custom_lnlike', 'examine_chains', 'fixed_params', 'has_corr', 'lnlike', 'num_params', 'num_temps', 'num_threads', 'num_walkers', 'priors', 'results', 'run_sampler', 'sampled_param_idx', 'system', 'use_pt', 'validate_xyz_positions']

Again, you can use the ? and ?? features as well as the API documentation to find out more. Here we see an attribute curr_pos which contains the current position of all the walkers for the MCMC sampler. These positions were generated upon initialization of the Sampler object, which happened as part of the initialization of the Driver object.

Examine my_driver.sampler.curr_pos

curr_pos is an array and has shape (n_temps, n_walkers, n_params) for the parallel-tempered MCMC sampler and shape (n_walkers, n_params) for the affine-invariant ensemble sampler.

[5]:
my_driver.sampler.curr_pos.shape  # Here we are using the parallel-tempered MCMC sampler
[5]:
(5, 30, 8)

Basically, this is the same shape as the output of the Sampler. Let’s look at the start position of the first five walkers at the lowest temperature, to get a better sense of what the strucutre is like.

[6]:
print(my_driver.sampler.curr_pos[0, 0:5, :])
[[9.27153771e+03 7.72052247e-02 2.48173602e+00 1.45797651e+00
  4.43581764e+00 5.33957331e-01 5.11155491e+01 1.80782845e+00]
 [4.20682067e-01 9.89156369e-01 4.61093206e-01 1.25947642e+00
  2.21880036e+00 8.30484942e-01 5.16175061e+01 1.74926489e+00]
 [2.93009249e+01 4.42876011e-01 1.85406431e+00 5.21790350e-02
  4.29764787e+00 7.13679469e-02 5.14691320e+01 1.76691096e+00]
 [1.46043332e+03 8.26045166e-01 1.17672512e-01 1.66771136e+00
  4.94514469e+00 5.58027305e-01 5.15399393e+01 1.83912467e+00]
 [5.26417460e+01 7.93835235e-01 2.01212573e+00 6.43273326e-01
  5.69960003e+00 9.72070078e-02 5.16224931e+01 1.77433834e+00]]

3) Replace curr_pos with your own initial positions for walkers

When the sampler is run with the sampler.run_sampler() method, it will start the walkers at the curr_pos values, run the MCMC forward for the given number of steps, and then update curr_pos to reflect where the walkers ended up. The next time run_sampler() is called, it does the same thing again.

Here, you have just created the sampler but have not run it yet. So, if you update curr_pos with our own custom start locations, when you run the sampler, it will begin at your custom start locations instead.

3.1) Generate your own initial positions

There are many ways to create your own walker start distribution and what you want to do will depend on your science question and prior knowledge.

If you have already generated and validated your own initial walker positions, you can skip down to the “Update sampler position”. Some users use the output of OFTI or a previous MCMC run as the initial position.

If you need to generate your own positions, read on. Here, let’s assume you know a possible best fit value and your uncertainty in that fit. Perhaps you got this through a least squares minimization. So, let’s create a distribution of walkers that are centered on the best fit value and distributed normallly with the 1-sigma in each dimension equal to the uncertainty on that best fit value.

First, let’s define the best fit value and the spread. As a reminder, the order of the parameters in the array is (for a single planet-star system): semimajor axis, eccentricity, inclination, argument of periastron, position angle of nodes, epoch of periastron passage, parallax and total mass. You can check the indices with this dict in the system object.

[7]:
print(my_driver.system.param_idx)
{'sma1': 0, 'ecc1': 1, 'inc1': 2, 'aop1': 3, 'pan1': 4, 'tau1': 5, 'plx': 6, 'mtot': 7}
[8]:
# Set centre and spread of the walker distribution
# Values from Table 1 in Blunt et al. 2017, AJ, 153, 229
sma_cen = 44.48
sma_sig = 15.0
ecc_cen = 0.0151
ecc_sig = 0.175
inc_cen = 2.30  # (131.7 deg)
inc_sig = 0.279  # (16.0 deg)
aop_cen = 1.60  # (91.7 deg)
aop_sig = 1.05  # (60.0 deg)
pan_cen = 2.33  # (133.7 deg)
pan_sig = 0.872  # (50.0 deg)
tau_cen = 0.77  # (2228.11 yr)
tau_sig = 0.65  # (121.0 yr)

# Note : parallax and stellar mass already defined above (plx, plx_err, total_mass, mass_err)
walker_centres = np.array(
    [sma_cen, ecc_cen, inc_cen, aop_cen, pan_cen, tau_cen, plx, total_mass]
)
walker_1sigmas = np.array(
    [sma_sig, ecc_sig, inc_sig, aop_sig, pan_sig, tau_sig, plx_err, mass_err]
)

You can use numpy.random.standard_normal to generate normally distributed random numbers in the same shape as your walker initial positions (my_driver.sampler.curr_pos.shape). Then, multiply by walker_1sigmas to get the spread to match your desired distribution and add walker_centres to get the distribution centered on your desired values.

[9]:
curr_pos_shape = my_driver.sampler.curr_pos.shape  # Get shape of walker positions

# Draw from multi-variate normal distribution to generate new walker positions
new_pos = np.random.standard_normal(curr_pos_shape) * walker_1sigmas + walker_centres

3.2) Using an optimizer to obtain a best fit value

Other optimizing software can also be used to generate intial positions. Depending on the quality of data collected and whether a suitable guess array of parameters can be made, different optimizing software can provide better best fit values for for MCMC walkers. Below you will find a few options that cater to different scenarios.

3.2a) Using scipy.optimize.minimize

Assuming the data obtained allows for a suitable guess to be made for each parameter, a scipy.optimize.minimize software can be used to generate a best fit value. You may want to skip this step and input your guess values directly into MCMC’s initial walker positions, however scipy can help refine the guess parameters.

First, we define a new log liklihood function function neg_logl based on the guess values we have. Note, since we have predefined a good guess, from the aforementioned Table, as walker_centres we will continue to use it as a guess array for examples below.

[10]:
# The following code performs a minimization whereas the log liklihood function is based on maximization so we redefine the
# likelihood function is redefined to return -x to make this a minization scenario

m = my_driver.sampler


def neg_logl(paramarray):
    x = m._logl(
        paramarray, include_logp=True
    )  # set include_logp to true to include guess array in likelihood function

    return -x


guessarray = walker_centres
results = mn(neg_logl, guessarray, method="Powell")
print(results.x)  # results.x is the best fit value
/home/docs/checkouts/readthedocs.org/user_builds/orbitize/envs/latest/lib/python3.10/site-packages/scipy/optimize/_optimize.py:2473: RuntimeWarning: invalid value encountered in scalar multiply
  tmp2 = (x - v) * (fx - fw)
/home/docs/checkouts/readthedocs.org/user_builds/orbitize/envs/latest/lib/python3.10/site-packages/orbitize/priors.py:564: RuntimeWarning: invalid value encountered in log
  lnprob = np.log(np.sin(element_array) / normalization)
[4.90426906e+01 9.99444988e-06 2.48501010e+00 1.60085022e+00
 2.33034111e+00 7.69934177e-01 5.14404237e+01 1.74922411e+00]

In our trials, Powell has given the best results, but you may replace it with a different minimizing method depending on your need.

3.3) Scattering walkers

To set up MCMC so that it explores the nearby probablity space thoroughly and finds the global minimum, you can scatter the initial positions of the walkers around the best fit value. This can be done by adding random numbers to results.x

This section overrides walker_1sigmas and creates a spread of new_pos in a different manner than above. The following is a template based on the aforementioned Table. The scatter is created using a variety of methods, we recommend reviewing the code to ensure it is compatible to your data.

[11]:
new_pos = np.random.standard_normal(curr_pos_shape) * 0.03 + results.x

3.4) Update sampler position

After generating and validating your new walker positions, through whatever methods you choose, it’s now time to update the sampler object to have its curr_pos be your new positions.

[12]:
my_driver.sampler.curr_pos = np.copy(new_pos)

3.5) Validate your new positions

Drawing from a normal distribution can cause your walkers to start outside of your prior space. See the Modifying Priors tutorial for information on how to interact with the prior objects, which would allow you to find the limits on each parameter set by the priors etc.

Here, let’s do something more simple and just check that all values are physically valid. After this we can begin to correct them.

The following function can be used to identify walkers that have been initialized outside of the appropriate prior probability space. It will raise a ValueError if walkers are initialized outside of the priors. You should update your positions until this method runs without raising an error.

[13]:
try:
    my_driver.sampler.check_prior_support()
except Exception as e:
    print(e)
Attempting to start with walkers outside of prior support: check parameter(s) 1

We should continue investigating which parameters are being initialized outside of the prior space until this function returns empty lists.

And you’re done! You can continue at “Running the MCMC Sampler” in the MCMC Introduction Tutorial