Quickstart

In this first tutorial we load a data file, perform a single-mode Pelton Cole-Cole inversion on it, visualize the fit quality and the posterior distribution, and save the results to a CSV file. A more in-depth look at the Pelton Cole-Cole model and its parameters is available in the Pelton tutorial.

Running your first inversion

To invert a SIP data file, you would use the following approach:

  • Import the base PeltonColeCole model.
  • Pass filepath to instantiate the model with a specific data file.
  • Set the simulation to explore the Pelton model parameter space with nwalkers=32.
  • Set the simulation to run for 2000 steps by passing nsteps=2000.
  • Skip the first 9 lines in the data file with headers=9.
  • Fit the model to the data by calling the fit() method.
[2]:
import numpy as np
from bisip import PeltonColeCole
from bisip import DataFiles

# This will get one of the example data files in the BISIP package
data_files = DataFiles()
filepath = data_files['SIP-K389172']

model = PeltonColeCole(filepath=filepath,
                       nwalkers=32,  # number of MCMC walkers
                       nsteps=2000,  # number of MCMC steps
                       headers=9,  # number of lines to skip in the data file
                       )

# Fit the model to this data file
model.fit()
100%|██████████| 2000/2000 [00:03<00:00, 593.58it/s]

Note

In this example we use headers=9 to skip the header line and the 8 highest measurement frequencies. As a result we keep only a single polarization mode in the lower frequency range.

Visualizing the parameter traces

Visual inspection of the traces of each MCMC walker is a good way to quickly assess how well the inversion went; the traces also define what we consider samples of the posterior distribution. The walkers are initialized at random positions in the parameter space, so let's inspect the parameter traces to see whether they have moved towards a unique solution to the inverse problem. Plotting the traces is one of the first things to do after the fit() method completes. Simply call the plot_traces() method of the Inversion object to visualize the parameter chains.

[3]:
# Plot the parameter traces
fig = model.plot_traces()
../_images/tutorials_quickstart_9_0.png

The walkers have correctly moved towards similar solutions. As you can see, for the first few iterations the walkers are spread out across the entire parameter space; the ensemble then reaches a stationary state within roughly 50 steps. To be safe, let's discard all values before the 500th step and use the remaining steps to estimate the best values of our parameters.

You can also store the chain as a numpy ndarray using the get_chain() method. Passing flat=True flattens the walkers into a single chain, whereas flat=False preserves the nwalkers dimension. The thin argument keeps one sample every thin steps, which can be used to reduce autocorrelation.

[4]:
# Get chains of all walkers, discarding first 500 steps
chain = model.get_chain(discard=500)
print(chain.shape)  # (nsteps - discard, nwalkers, ndim)
(1500, 32, 4)
[5]:
# Get chains of all walkers, discarding first 500 steps,
# thinning by a factor of 2 and flattening the walkers
chain = model.get_chain(discard=500, thin=2, flat=True)
print(chain.shape)  # ((nsteps - discard) * nwalkers / thin, ndim)
(24000, 4)

Note

Here ndim refers to the number of dimensions of the model (the number of parameters).
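To make the layout concrete, here is a small sketch using a zero-filled stand-in array with the same shape as the unflattened chain above; ndim is simply the size of the last axis:

```python
import numpy as np

# Stand-in for the unflattened chain: 1500 kept steps, 32 walkers,
# 4 parameters (r0, m1, log_tau1, c1)
chain = np.zeros((1500, 32, 4))
ndim = chain.shape[-1]
print(ndim)  # 4
```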

Plotting models over data

Let’s now plot the best model obtained with this chain using the plot_fit() method, which will show the real and imaginary parts of the complex resistivity as a function of measurement frequency. The plot will also display the median model as a red line and the 95% highest probability density interval as dotted lines.

[6]:
# Use the chain argument to plot the model
# for a specific flattened chain
fig = model.plot_fit(chain)

# You can then use the `fig` matplotlib object to save the figure or make
# adjustments according to your personal preference. For example:
# fig.savefig('fit_figure_sample_K389172.png', dpi=144, bbox_inches='tight')
../_images/tutorials_quickstart_17_0.png

You may also pass the arguments of the get_chain() method directly to most plotting functions; by default this flattens the walkers into a single chain. For example, here we use the plot_fit_pa() method, which displays the amplitude and phase shift as a function of measurement frequency. We also adjust the confidence interval to display the 72% HPD with the p argument (p for percentile).

[7]:
# Use the discard argument to discard samples
# note that we do not need to pass the chain argument
fig = model.plot_fit_pa(discard=500, p=[14, 50, 86])  # 14=lower, 50=median, 86=upper
../_images/tutorials_quickstart_19_0.png

Printing best parameters

Here we will use the get_param_mean() and get_param_std() utility functions to extract the mean and standard deviations of each parameter in our chain. We will also access the name of each parameter with the param_names property.

[8]:
values = model.get_param_mean(chain=chain)
uncertainties = model.get_param_std(chain=chain)

for n, v, u in zip(model.param_names, values, uncertainties):
  print(f'{n}: {v:.5f} +/- {u:.5f}')
r0: 1.02413 +/- 0.00484
m1: 0.36238 +/- 0.02619
log_tau1: -2.13531 +/- 0.25427
c1: 0.50193 +/- 0.03324

Looks good! The relative error on the recovered parameters is quite small considering our large error bars on the data.
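As a quick sanity check, the relative errors can be computed directly from these means and standard deviations. A minimal sketch in plain numpy, re-entering the printed values by hand (with real results you would reuse the `values` and `uncertainties` arrays from the previous cell; note that the figure for log_tau1 is relative to the log and is therefore only indicative):

```python
import numpy as np

# Means and standard deviations printed above, re-entered by hand
names = ['r0', 'm1', 'log_tau1', 'c1']
values = np.array([1.02413, 0.36238, -2.13531, 0.50193])
uncertainties = np.array([0.00484, 0.02619, 0.25427, 0.03324])

# Relative error in percent; abs() handles the negative log_tau1 mean
rel_err = 100 * uncertainties / np.abs(values)
for n, e in zip(names, rel_err):
    print(f'{n}: {e:.1f}%')
```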

Note

It is important to note that for every inversion scheme the amplitude values (and the r0 parameter) have been normalized by the maximum amplitude. You may access this normalization factor with model.data['norm_factor']. Therefore the real \(\rho_0\) value is r0 * model.data['norm_factor'].
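For example, a minimal sketch of the denormalization (the norm_factor value below is made up for illustration; in practice you would use model.data['norm_factor']):

```python
# Sketch: recover the true rho_0 from the normalized r0.
r0 = 1.02413        # normalized r0 recovered above
norm_factor = 52.3  # hypothetical stand-in for model.data['norm_factor']
rho_0 = r0 * norm_factor
print(f'rho_0 = {rho_0:.2f} ohm-m')  # rho_0 = 53.56 ohm-m
```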

Inspecting the posterior

Let’s now visualize the posterior distribution of all parameters using a corner plot (from the corner Python package).

[9]:
fig = model.plot_corner(chain=chain)
../_images/tutorials_quickstart_27_0.png

Interestingly, the \(\log \tau\) values are anticorrelated with the \(m\) values: it seems the model can obtain similar fits to the data by adjusting these parameters against each other.
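One way to quantify this anticorrelation is the Pearson correlation coefficient between the two columns of the flattened chain. A minimal sketch with numpy, using a synthetic anticorrelated chain in place of the real one (with real data you would pass the m1 and log_tau1 columns of `chain`, e.g. `chain[:, 1]` and `chain[:, 2]` given the parameter order above):

```python
import numpy as np

# Build a synthetic flattened chain where log_tau1 depends
# negatively on m1, mimicking the anticorrelation in the corner plot
rng = np.random.default_rng(0)
m1 = 0.36 + 0.026 * rng.standard_normal(24000)
log_tau1 = -2.14 - 8.0 * (m1 - 0.36) + 0.05 * rng.standard_normal(24000)

# Off-diagonal entry of the 2x2 correlation matrix
r = np.corrcoef(m1, log_tau1)[0, 1]
print(f'correlation(m1, log_tau1) = {r:.2f}')
```

A value close to -1 confirms that the two parameters trade off against each other.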

Saving results to CSV file

Finally, let's save the best parameters and some of their statistics to a CSV file.

[10]:
# Get the lower, median and higher percentiles
results = model.get_param_percentile(chain=chain, p=[2.5, 50, 97.5])
# Join the list of parameter names into a comma separated string
headers = ','.join(model.param_names)
# Save to csv with numpy
# The first row is the 2.5th percentile, 2nd the 50th (median), 3rd the 97.5th.
# Parameter names will be listed in the csv file header.
[11]:
print(headers)
r0,m1,log_tau1,c1
[12]:
print(results)
[[ 1.01547398  0.31429044 -2.66165185  0.43494749]
 [ 1.02389413  0.3611087  -2.12615936  0.50230639]
 [ 1.03427943  0.41689845 -1.66573259  0.56613694]]
[13]:
np.savetxt('quickstart_results.csv', results, header=headers,
           delimiter=',', comments='')
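To confirm the file layout, you can read the CSV back with numpy. A self-contained sketch that re-enters the percentiles printed above, saves them, and reloads them (with real results you would reuse the `results` and `headers` variables directly):

```python
import numpy as np

# Percentile rows printed above: 2.5th, 50th (median), 97.5th
results = np.array([[1.01547398, 0.31429044, -2.66165185, 0.43494749],
                    [1.02389413, 0.36110870, -2.12615936, 0.50230639],
                    [1.03427943, 0.41689845, -1.66573259, 0.56613694]])
headers = 'r0,m1,log_tau1,c1'
np.savetxt('quickstart_results.csv', results, header=headers,
           delimiter=',', comments='')

# Read it back: skip the header row, one column per parameter
loaded = np.loadtxt('quickstart_results.csv', delimiter=',', skiprows=1)
print(loaded.shape)  # (3, 4)
```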