Damping value cross validation

4. Damping value cross validation#

The choice of the damping value used in the inversion directly affects the results and therefore needs to be carefully chosen. We provide some tools to perform a cross-validation with a range of damping values in order to chose the optimal value.

This cross validation routine is adapted from Uieda & Barbosa 2017. See their paper for a detailed discussion.

The main idea is to re-sample the gravity data at a finer resolution and separate the data into a testing and training set. The inversion is performed with a range of damping values, each time using just the training set. Each resulting inverted topo from each of the damping values is then forward modelled onto the locations of the testing data, which were not included in the inversion. The optimal damping value is the one which results in the smallest difference between the forward modeled data and the testing data.

4.1. Import packages#

[1]:

# set EPSG for plotting functions
import os

import numpy as np
import polartoolkit as ptk
import verde as vd

import invert4geom

os.environ["POLARTOOLKIT_EPSG"] = "3031"

/home/mdtanker/miniforge3/envs/invert4geom/lib/python3.12/site-packages/UQpy/__init__.py:6: UserWarning:

pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.

4.2. Create synthetic data#

Below we will create some synthetic topography data to act as the real topography, and it will forward model its gravity effect with some added noise. It will forward model it at 2x the grid spacing to create testing and training datasets to use for the damping parameter cross validation.

[2]:

# the density contrast is between rock (~2670 kg/m3) and air (~1 kg/m3)
true_density_contrast = 2670 - 1
true_topography, _, _, observed_gravity = invert4geom.load_synthetic_model(
    spacing=1000,
    region=(0, 40000, 0, 30000),
    density_contrast=true_density_contrast,
    gravity_noise=0.2,
    plot_topography=True,
)
true_zref = true_topography.values.mean()
print(f"mean of true topography: {true_zref} m ")

../_images/tutorial_04_damping_cross_validation_4_0.png

../_images/tutorial_04_damping_cross_validation_4_1.png

mean of true topography: 492.2704164812973 m

4.5. Cross Validation#

Now we can repeat this with a range of damping parameters to find the optimal (lowest) score.

This is a classic optimization problem, where we vary the damping parameter to find which value produces the best score. For this we use the Python package Optuna.

For each trial a .pickle file containing the inversion results will be created. Once all trials are complete, all but the best trial’s pickle files will be deleted. Additionally, another .pickle file will be created containing the Optuna study object with all the trial information and results.

By default, optimize_inversion_damping will use a sampler to choose damping values between the provided damping_limits and repeat this n_trials times. This may skip large portions of the parameter space. If instead you would like to systematically test damping values at a constant interval, you can set grid_search to True, and this will test all damping values as defined by:

np.logspace(<<exponent of lower damping limit>>, <<exponent of upper damping limit>>, n_trials)

[11]:

damping_cv_obj = inv.optimize_inversion_damping(
    damping_limits=(0.001, 1),
    n_trials=8,
    grid_search=True,
    plot_scores=False,
    fname="../tmp/04_damping_CV",
)
inv.solver_damping

Inversion terminated due to max_iterations limit. Consider increasing this limit.

[11]:

np.float64(0.019306977288832496)

[12]:

damping_cv_obj.study.trials_dataframe()

[12]:

	number	value	datetime_start	datetime_complete	duration	params_damping	user_attrs_fname	system_attrs_grid_id	system_attrs_search_space	state
0	0	1.730399	2026-02-17 02:17:36.651284	2026-02-17 02:17:43.101102	0 days 00:00:06.449818	0.372759	../tmp/04_damping_CV_trial_0	0	{'damping': [0.001, 0.0026826957952797246, 0.0...	COMPLETE
1	1	0.119877	2026-02-17 02:17:43.102307	2026-02-17 02:17:44.636571	0 days 00:00:01.534264	0.007197	../tmp/04_damping_CV_trial_1	1	{'damping': [0.001, 0.0026826957952797246, 0.0...	COMPLETE
2	2	0.809352	2026-02-17 02:17:44.637739	2026-02-17 02:17:45.716687	0 days 00:00:01.078948	0.002683	../tmp/04_damping_CV_trial_2	2	{'damping': [0.001, 0.0026826957952797246, 0.0...	COMPLETE
3	3	6.319568	2026-02-17 02:17:45.717765	2026-02-17 02:17:46.259139	0 days 00:00:00.541374	1.000000	../tmp/04_damping_CV_trial_3	3	{'damping': [0.001, 0.0026826957952797246, 0.0...	COMPLETE
4	4	0.023230	2026-02-17 02:17:46.260044	2026-02-17 02:17:52.302433	0 days 00:00:06.042389	0.019307	../tmp/04_damping_CV_trial_4	4	{'damping': [0.001, 0.0026826957952797246, 0.0...	COMPLETE
5	5	0.886706	2026-02-17 02:17:52.303473	2026-02-17 02:17:53.242700	0 days 00:00:00.939227	0.001000	../tmp/04_damping_CV_trial_5	5	{'damping': [0.001, 0.0026826957952797246, 0.0...	COMPLETE
6	6	0.097773	2026-02-17 02:17:53.243739	2026-02-17 02:17:59.959784	0 days 00:00:06.716045	0.138950	../tmp/04_damping_CV_trial_6	6	{'damping': [0.001, 0.0026826957952797246, 0.0...	COMPLETE
7	7	0.026461	2026-02-17 02:17:59.960753	2026-02-17 02:18:06.618580	0 days 00:00:06.657827	0.051795	../tmp/04_damping_CV_trial_7	7	{'damping': [0.001, 0.0026826957952797246, 0.0...	COMPLETE

[13]:

fig = invert4geom.plot_scores(
    damping_cv_obj.study.trials_dataframe().value.values,
    damping_cv_obj.study.trials_dataframe().params_damping.values,
    param_name="Damping",
    logx=True,
)

../_images/tutorial_04_damping_cross_validation_19_0.png

4.6. Plot inversion results using optimal value#

[14]:

inv.plot_convergence()

inv.plot_inversion_results(iters_to_plot=2)

_ = ptk.grid_compare(
    true_topography,
    inv.model.topography,
    grid1_name="True topography",
    grid2_name="Inverted topography",
    robust=True,
    hist=True,
    inset=False,
    title="difference",
    reverse_cpt=True,
    cmap="rain",
)

../_images/tutorial_04_damping_cross_validation_21_0.png

../_images/tutorial_04_damping_cross_validation_21_1.png

../_images/tutorial_04_damping_cross_validation_21_2.png

../_images/tutorial_04_damping_cross_validation_21_3.png

../_images/tutorial_04_damping_cross_validation_21_4.png

4.7. Run inversion with poor choice of damping value#

[15]:

# setup the inversion
inv = invert4geom.Inversion(
    data,
    model,
    solver_damping=0.005,
    # set stopping criteria
    l2_norm_tolerance=0.2**2,  # gravity error is .2 mGal or L2-norm of ~0.45
    delta_l2_norm_tolerance=1.01,
)

inv.invert()

[16]:

_ = ptk.grid_compare(
    true_topography,
    inv.model.topography,
    grid1_name="True topography",
    grid2_name="Inverted topography",
    robust=True,
    hist=True,
    inset=False,
    title="difference",
    reverse_cpt=True,
    cmap="rain",
    # diff_lims=(-20, 20),
)

../_images/tutorial_04_damping_cross_validation_24_0.png

Comparing this inversion with a damping value of 0.005 to the inversion with the optimal damping value of ~0.1 shows that a low damping value allows too much noise through to the final inverted topography.

In the past example (simple inversion), we arbitrarily chose a damping value. Here, we show how by finding the optimal damping value using a cross-validation, we have improved the inversion’s performance.

The next notebook, density optimization, will show how to perform a similar procedure for choosing a density contrast value.