invert4geom.optimize_regional_constraint_point_minimization

optimize_regional_constraint_point_minimization(testing_training_df, grid_method, grav_ds, n_trials, tension_factor_limits=(0, 1), spline_damping_limits=None, depth_limits=None, block_size_limits=None, damping_limits=None, grav_obs_height_limits=None, sampler=None, plot=False, plot_grid=False, fold_progressbar=False, optimize_on_true_regional_misfit=False, separate_metrics=True, score_as_median=False, true_regional=None, progressbar=True, parallel=False, fname=None, seed=0, **kwargs)

Run an Optuna optimization to find the optimal hyperparameters for the Constraint Point Minimization technique for estimating the regional component of the gravity misfit. Because constraints are used both to determine the regional field and to score the result, they must be split into testing and training sets. This function can perform either a single cross-validation or a K-Folds cross-validation, determined by the number of “fold_x” columns in testing_training_df. With more than one fold, the score for each parameter set is the mean of the fold scores, and the total number of regional separations performed is n_trials * K, where K is the number of folds. The function then uses the optimal parameter values to redo the regional estimation with all the constraint points, not just the training points, and returns the results.

By default this performs a multi-objective optimization to find the best trade-off between the lowest RMSE of the residual misfit at the constraints and the highest RMS amplitude of the residual at all locations.

Choose the Constraint Point Minimization gridding method with the grid_method parameter, and supply the limits for the associated method parameters via the <parameter>_limits arguments. For the grid method “eq_sources”, which has multiple parameters, if limits aren’t provided for one of the parameters, supply a constant value for that parameter in the keyword arguments, which are passed directly to DatasetAccessorInvert4Geom.regional_separation.

For synthetic testing, if the true regional grid is provided, the optimization can instead minimize the RMSE between the predicted and true regional gravity by setting optimize_on_true_regional_misfit=True.
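The fold handling described above can be illustrated with a minimal pure-Python sketch. It is not the library's implementation: the list of dicts is a hypothetical stand-in for testing_training_df, the helper names (fold_columns, split_fold, toy_fold_score) are invented for illustration, and the toy score simply predicts the test values with the training mean instead of running a regional separation.

```python
from statistics import mean

# Hypothetical stand-in for testing_training_df: one dict per constraint point,
# with one "fold_<n>" entry per fold holding "test" or "train".
constraints = [
    {"easting": 0.0, "northing": 0.0, "upward": -500.0, "fold_0": "test", "fold_1": "train"},
    {"easting": 1000.0, "northing": 0.0, "upward": -520.0, "fold_0": "train", "fold_1": "test"},
    {"easting": 0.0, "northing": 1000.0, "upward": -480.0, "fold_0": "train", "fold_1": "train"},
    {"easting": 1000.0, "northing": 1000.0, "upward": -510.0, "fold_0": "train", "fold_1": "train"},
]


def fold_columns(rows):
    """Return the names of all "fold_<n>" columns, ordered by fold number."""
    names = {key for row in rows for key in row if key.startswith("fold_")}
    return sorted(names, key=lambda name: int(name.split("_")[1]))


def split_fold(rows, fold):
    """Split the rows into (training, testing) lists for one fold column."""
    train = [row for row in rows if row[fold] == "train"]
    test = [row for row in rows if row[fold] == "test"]
    return train, test


def toy_fold_score(train, test):
    """Placeholder score (assumption): RMSE of predicting the test "upward"
    values with the mean of the training values. The real function scores a
    regional separation instead."""
    prediction = mean(row["upward"] for row in train)
    squared_errors = [(row["upward"] - prediction) ** 2 for row in test]
    return mean(squared_errors) ** 0.5


folds = fold_columns(constraints)
fold_scores = [toy_fold_score(*split_fold(constraints, fold)) for fold in folds]

# With more than one fold, the score of a parameter set is the mean over folds.
trial_score = mean(fold_scores)

# Each trial evaluates every fold once, so the workload scales as n_trials * K.
n_trials = 20
total_separations = n_trials * len(folds)
```

With a single fold column the mean reduces to that fold's score, which corresponds to the single cross-validation case.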

Parameters:
  • testing_training_df (DataFrame) – constraints dataframe with columns “easting”, “northing”, “upward”, and a column for each fold in the format “fold_0”, “fold_1”, etc. This can be created with the function cross_validation.split_test_train(). Each fold column should contain the strings “test” or “train” to indicate which rows are testing or training points. If more than one fold is provided, this function will perform a K-Folds cross validation and the score for each set of parameters will be the mean of the K fold scores.

  • grid_method (str) – constraint point minimization method to use, choose between “verde” for bi-harmonic spline gridding, “pygmt” for tensioned minimum curvature gridding, or “eq_sources” for equivalent sources gridding.

  • grav_ds (Dataset) – gravity dataset with coordinates “easting”, “northing”, and variables “reg” and “gravity_anomaly”.

  • n_trials (int) – number of trials to run

  • tension_factor_limits (tuple[float, float]) – limits to use for the PyGMT tension factor gridding, by default (0, 1)

  • spline_damping_limits (tuple[float, float] | None) – limits to use for the Verde bi-harmonic spline damping, by default None

  • depth_limits (tuple[float, float] | None) – limits to use for the equivalent sources’ depths, by default None

  • block_size_limits (tuple[float, float] | None) – limits to use for the block size for fitting equivalent sources, by default None

  • damping_limits (tuple[float, float] | None) – limits to use for the damping value for fitting equivalent sources, by default None

  • grav_obs_height_limits (tuple[float, float] | None) – limits to use for the gravity observation height for fitting equivalent sources, by default None

  • sampler (BaseSampler | None) – customize the optuna sampler, by default TPE sampler

  • plot (bool) – plot the resulting optimization figures, by default False

  • plot_grid (bool) – plot the resulting regional gravity grid, by default False

  • fold_progressbar (bool) – turn on or off a progress bar for the optimization of each fold if performing a K-Folds cross-validation within the optimization, by default False

  • optimize_on_true_regional_misfit (bool) – if the true_regional grid is provided, choose to perform the optimization on the RMSE between the true regional and the estimated regional, by default False

  • separate_metrics (bool) – if True, returns the two scores separately; if False, returns them combined with the formula residual_constraints_score / residual_amplitude_score, by default True

  • score_as_median (bool) – use the root median square instead of the root mean square for the scoring metric, by default False

  • true_regional (DataArray | None) – if the true regional gravity is known (in synthetic models), supply this as a grid to include a user_attr of the RMSE between this and the estimated regional for each trial, or set optimize_on_true_regional_misfit=True to have the optimization optimize on the RMSE, by default None

  • progressbar (bool) – add a progressbar, by default True

  • parallel (bool) – run the optimization in parallel, by default False

  • fname (str | None) – file name to save the study to, by default None

  • seed (int) – random seed for the samplers, by default 0

  • kwargs (Any) – additional keyword arguments to pass to the DatasetAccessorInvert4Geom.regional_separation
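The scoring options score_as_median and separate_metrics can be made concrete with a small sketch. This is an illustration under stated assumptions, not the library's code: the function names and residual values below are hypothetical, and only the combined formula residual_constraints_score / residual_amplitude_score is taken from the parameter description above.

```python
from math import sqrt
from statistics import mean, median


def root_mean_square(values):
    """Square root of the mean of the squared values."""
    return sqrt(mean([v * v for v in values]))


def root_median_square(values):
    """Square root of the median of the squared values (less sensitive to outliers)."""
    return sqrt(median([v * v for v in values]))


def score(values, as_median=False):
    """Hypothetical helper mirroring the score_as_median switch."""
    return root_median_square(values) if as_median else root_mean_square(values)


# Made-up residual misfit values (mGal) at the constraint points and over the
# whole grid, purely for illustration.
residual_at_constraints = [0.5, -0.5, 0.5, -3.5]
residual_everywhere = [4.0, -4.0, 4.0, -4.0, 4.0, -4.0]

residual_constraints_score = score(residual_at_constraints)
residual_amplitude_score = score(residual_everywhere)

# separate_metrics=True corresponds to keeping the two scores apart.
separate_scores = (residual_constraints_score, residual_amplitude_score)

# separate_metrics=False corresponds to the single ratio given in the docs.
combined_score = residual_constraints_score / residual_amplitude_score

# The single outlier (-3.5) inflates the RMS but not the root median square.
median_based_score = score(residual_at_constraints, as_median=True)
```

Because the optimization seeks a low score at the constraints and a high residual amplitude, a lower ratio would be the better outcome on this reading of the formula.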

Return type:

tuple[Study, Dataset, FrozenTrial]

Returns:

  • study (optuna.study.Study) – the completed Optuna study

  • resulting_grav_ds (xarray.Dataset) – the resulting gravity dataset of the best trial

  • best_trial (optuna.trial.FrozenTrial) – the best trial