invert4geom.optimization#
Classes#
Objective function to use in an Optuna optimization for finding the optimal damping |
|
Objective function to use in an Optuna optimization for finding the optimal values |
|
Objective function to use in an Optuna optimization for finding the optimal |
|
Objective function to use in an Optuna optimization for finding the optimal trend |
|
Objective function to use in an Optuna optimization for finding the optimal filter |
|
Objective function to use in an Optuna optimization for finding the optimal |
|
Objective function to use in an Optuna optimization for finding the optimal |
|
Objective function to use in an Optuna optimization for finding the buffer zone |
|
DuplicatePruner |
Functions#
Number of available virtual or physical CPUs on this system, i.e. |
|
|
Run optuna optimization, optionally in parallel. Pre-define the study, and objective |
|
Set up optuna optimization in parallel splitting up the number of trials over either |
|
Warn if any best parameter values are at their limits |
|
Log the results of an optuna trial |
|
Creates a study, sets directions and metric names based on the input parameters. |
|
custom optuna callback, only log trial info if it's the best value yet. |
|
custom optuna callback, warn if limits provide better score than current trial |
|
custom optuna callback, warn if limits provide better score than current trial for |
|
Use Optuna to find the optimal damping regularization parameter for a gravity |
|
Run an Optuna optimization to find the optimal zref and or density contrast values |
|
Perform an optimization for zref and density contrast values same as |
|
Use Optuna to find the optimal parameters for fitting equivalent sources to gravity |
|
Run an Optuna optimization to find the optimal filter width for estimating the |
|
Run an Optuna optimization to find the optimal trend order for estimating the |
|
Run an Optuna optimization to find the optimal equivalent source parameters for |
|
Run an Optuna optimization to find the optimal hyperparameters for the Constraint |
|
Run an optimization to find best buffer zone width. |
Module Contents#
- available_cpu_count()[source]#
Number of available virtual or physical CPUs on this system, i.e. user/real as output by time(1) when called with an optimally scaling userspace-only program
Adapted from https://stackoverflow.com/a/1006301/18686384
- Return type:
Any
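As a rough illustration of what counting usable CPUs can look like, here is a minimal standard-library sketch. It is not this function's implementation (which is adapted from the Stack Overflow answer linked above); the helper name usable_cpu_count is hypothetical.

```python
import os


def usable_cpu_count() -> int:
    """Return the number of CPUs this process may use (hypothetical helper)."""
    # On Linux, respect CPU affinity masks (e.g. set by taskset or a job scheduler).
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    # Elsewhere, fall back to the logical CPU count, which may be None.
    return os.cpu_count() or 1


print(usable_cpu_count())
```

Checking the affinity mask first matters on shared clusters, where a process is often allowed fewer CPUs than the machine has.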
- run_optuna(study, objective, n_trials, storage=None, maximize_cpus=True, parallel=False, progressbar=None, callbacks=None)[source]#
Run an Optuna optimization, optionally in parallel. Pre-define the study and the objective function and, if parallel is True, the storage (preferably JournalStorage) and the study name.
- Parameters:
- Return type:
- _optuna_set_cores(n_trials, optimize_study, study_name, storage, objective, max_cores=True)[source]#
Set up an Optuna optimization in parallel, either splitting the number of trials over all available cores or giving each available core 1 trial.
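The two splitting modes can be sketched as below. This is one possible reading of the description, not the library's code: the helper split_trials and its exact rounding behaviour are assumptions made for illustration.

```python
def split_trials(n_trials: int, n_cores: int, max_cores: bool = True) -> list[int]:
    """Return how many trials each worker runs (hypothetical helper)."""
    if n_trials < 1 or n_cores < 1:
        raise ValueError("n_trials and n_cores must be positive")
    if max_cores:
        # Spread the trials over all cores, never starting more workers than trials.
        n_workers = min(n_cores, n_trials)
        base, extra = divmod(n_trials, n_workers)
        # The first `extra` workers take one additional trial.
        return [base + 1 if i < extra else base for i in range(n_workers)]
    # Otherwise every worker runs exactly one trial.
    return [1] * n_trials


print(split_trials(10, 4))                   # [3, 3, 2, 2]
print(split_trials(5, 2, max_cores=False))   # [1, 1, 1, 1, 1]
```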
- _warn_parameter_at_limits(trial)[source]#
Warn if any best parameter values are at their limits
- Parameters:
trial (optuna.trial.FrozenTrial) – optuna trial, most likely should be the best trial.
- Return type:
None
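The idea behind this check can be sketched without Optuna. In the sketch below, plain dictionaries stand in for the trial's parameter values and search bounds; the helper name warn_if_at_limits and the warning text are hypothetical.

```python
import warnings


def warn_if_at_limits(
    params: dict[str, float],
    limits: dict[str, tuple[float, float]],
) -> list[str]:
    """Warn for each parameter sitting on a bound and return those names."""
    at_limit = []
    for name, value in params.items():
        low, high = limits[name]
        if value in (low, high):
            warnings.warn(
                f"Best value of '{name}' ({value}) is at a limit {(low, high)}; "
                "consider widening the search range.",
                stacklevel=2,
            )
            at_limit.append(name)
    return at_limit
```

A best value on a bound usually means the true optimum may lie outside the searched range, so the limits are worth widening before trusting the result.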
- _log_optuna_results(trial)[source]#
Log the results of an optuna trial
- Parameters:
trial (optuna.trial.FrozenTrial) – optuna trial
- Return type:
None
- _create_regional_separation_study(optimize_on_true_regional_misfit, separate_metrics, sampler, true_regional=None, parallel=True, fname=None)[source]#
Creates a study, sets directions and metric names based on the input parameters.
- Parameters:
optimize_on_true_regional_misfit (bool) – choose to optimize on the true regional misfit instead of the residual misfit at constraints and the residual misfit amplitude.
separate_metrics (bool) – choose to optimize on the residual misfit at constraints and the residual misfit amplitude as separate metrics, as opposed to them as a ratio.
sampler (optuna.samplers.BaseSampler) – sampler object
true_regional (xarray.DataArray | None, optional) – grid of true regional values, by default None
parallel (bool, optional) – whether the study should be run in parallel, by default True. If True, uses file storage, which slows down the optimization but allows for running in parallel.
fname (str | None, optional) – file name to save the study to, by default None
- Returns:
study (optuna.study.Study) – return a study object with direction, sampler, and metric names set
storage (optuna.storages.BaseStorage | None) – return an optuna storage object if parallel is True, otherwise None
- Return type:
tuple[optuna.study.Study, optuna.storages.BaseStorage | None]
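How the two boolean flags might map to study directions can be sketched as follows. This mapping is inferred from the descriptions on this page (minimize the residual at constraints, maximize the residual amplitude, or minimize their ratio); the helper name and the metric names are hypothetical.

```python
def study_directions(
    optimize_on_true_regional_misfit: bool,
    separate_metrics: bool,
) -> tuple[list[str], list[str]]:
    """Return (directions, metric_names) for a study (hypothetical helper)."""
    if optimize_on_true_regional_misfit:
        # Single objective: misfit between the true and estimated regional field.
        return ["minimize"], ["difference with true regional"]
    if separate_metrics:
        # Two objectives: small residual at constraints, large residual amplitude.
        return ["minimize", "maximize"], [
            "residual at constraints",
            "amplitude of residual",
        ]
    # Single objective: ratio of the two scores, where smaller is better.
    return ["minimize"], ["combined score"]
```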
- _logging_callback(study, frozen_trial)[source]#
Custom Optuna callback that logs trial info only if the trial has the best value so far.
- Parameters:
study (optuna.study.Study) – optuna study
frozen_trial (optuna.trial.FrozenTrial) – current trial
- Return type:
None
- _warn_limits_better_than_trial_1_param(study, trial)[source]#
Custom Optuna callback that warns if the parameter limits provide a better score than the current trial.
- Parameters:
study (optuna.study.Study) – optuna study
trial (optuna.trial.FrozenTrial) – current trial
- Return type:
None
- _warn_limits_better_than_trial_multi_params(study, trial)[source]#
Custom Optuna callback that warns if the parameter limits provide a better score than the current trial, for multiple-parameter optimizations.
- Parameters:
study (optuna.study.Study) – optuna study
trial (optuna.trial.FrozenTrial) – current trial
- Return type:
None
- class OptimalInversionDamping(damping_limits, fname, plot_grids=False, **kwargs)[source]#
Objective function to use in an Optuna optimization for finding the optimal damping regularization value for a gravity inversion. Used within function optimize_inversion_damping().
- optimize_inversion_damping(training_df, testing_df, n_trials, damping_limits, n_startup_trials=None, score_as_median=False, sampler=None, grid_search=False, fname=None, plot_cv=True, plot_grids=False, logx=True, logy=True, progressbar=True, parallel=False, seed=0, **kwargs)[source]#
Use Optuna to find the optimal damping regularization parameter for a gravity inversion. The optimization aims to minimize the cross-validation score, represented by the root mean (or median) squared error (RMSE), between the testing gravity data and the predicted gravity data after an inversion. Follows methods of Uieda and Barbosa[1].
Provide upper and lower damping values and the number of trials to run, and specify whether to let Optuna choose the best damping value for each trial or to use a grid search. The results are saved to a pickle file with the best inversion results and the study.
- Parameters:
training_df (pandas.DataFrame) – rows of the gravity data frame which are just the training data
testing_df (pandas.DataFrame) – rows of the gravity data frame which are just the testing data
n_trials (int) – number of damping values to try
n_startup_trials (int | None, optional) – number of startup trials, by default is automatically determined
damping_limits (tuple[float, float]) – upper and lower limits
score_as_median (bool, optional) – if True, changes the scoring from the root mean square to the root median square, by default False
sampler (optuna.samplers.BaseSampler | None, optional) – customize the optuna sampler, by default uses GPSampler unless grid_search is True, then uses GridSampler
grid_search (bool, optional) – search the entire parameter space between damping_limits in n_trials steps, by default False
fname (str, optional) – file name to save both study and inversion results to as pickle files, by default fname is tmp_x_damping_cv where x is a random integer between 0 and 999 and will save study to <fname>_study.pickle and tuple of inversion results to <fname>_results.pickle.
plot_cv (bool, optional) – plot the cross-validation results, by default True
plot_grids (bool, optional) – for each damping value, plot comparison of predicted and testing gravity data, by default False
logx (bool, optional) – make x axis of CV result plot on log scale, by default True
logy (bool, optional) – make y axis of CV result plot on log scale, by default True
progressbar (bool, optional) – add a progressbar, by default True
parallel (bool, optional) – run the optimization in parallel, by default False
seed (int, optional) – random seed for the samplers, by default 0
kwargs (Any)
- Returns:
study (optuna.study) – the completed optuna study
inv_results (tuple[pandas.DataFrame, pandas.DataFrame, dict[str, typing.Any], float]) – a tuple of the inversion results: topography dataframe, gravity dataframe, parameter values and elapsed time.
- Return type:
tuple[optuna.study, tuple[pandas.DataFrame, pandas.DataFrame, dict[str, Any], float]]
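The effect of score_as_median can be illustrated with a small standard-library sketch. The function name cv_score is hypothetical, and the library's own scoring code may differ in detail.

```python
import math
import statistics


def cv_score(observed, predicted, as_median=False):
    """Root mean (or root median) squared difference between two sequences."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    center = statistics.median(squared) if as_median else statistics.fmean(squared)
    return math.sqrt(center)


observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 8.0]  # one large outlier
print(cv_score(observed, predicted))                  # 2.0
print(cv_score(observed, predicted, as_median=True))  # 0.0
```

The root median square ignores the single outlier that dominates the root mean square, which is why it can be the more robust choice for noisy testing data.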
- class OptimalInversionZrefDensity(fname, grav_df, constraints_df, regional_grav_kwargs, zref=None, zref_limits=None, density_contrast_limits=None, density_contrast=None, starting_topography=None, starting_topography_kwargs=None, progressbar=True, **kwargs)[source]#
Objective function to use in an Optuna optimization for finding the optimal values for zref and/or density contrast for a gravity inversion. This class is used within the function optimize_inversion_zref_density_contrast. If using constraint point minimization for the regional separation, split constraints into testing and training sets and provide the testing set to argument constraints_df and the training set to the constraints_df argument of regional_grav_kwargs. To perform K-folds cross-validation, provide lists of constraints dataframes to the parameters, where each dataframe in each list corresponds to a fold.
- Parameters:
fname (str)
grav_df (pandas.DataFrame)
constraints_df (pandas.DataFrame | list[pandas.DataFrame])
zref (float | None)
density_contrast (float | None)
starting_topography (xarray.DataArray | None)
progressbar (bool)
kwargs (Any)
- optimize_inversion_zref_density_contrast(grav_df, constraints_df, n_trials, n_startup_trials=None, starting_topography=None, zref_limits=None, density_contrast_limits=None, zref=None, density_contrast=None, starting_topography_kwargs=None, regional_grav_kwargs=None, score_as_median=False, sampler=None, grid_search=False, fname=None, plot_cv=True, logx=False, logy=False, progressbar=True, parallel=False, fold_progressbar=True, seed=0, **kwargs)[source]#
Run an Optuna optimization to find the optimal zref and/or density contrast values for a gravity inversion. The optimization aims to minimize the cross-validation score, represented by the root mean (or median) squared error (RMSE), between points of known topography and the inverted topography. Follows methods of Uieda and Barbosa[1]. This can optimize for either zref, density contrast, or both at the same time. Provide upper and lower limits for each parameter and the number of trials, and either let Optuna choose the best parameter values for each trial or use a grid search to test all values between the limits in intervals of n_trials. The results are saved to a pickle file with the best inversion results and the study.
Since each new set of zref and density values changes the starting model, for each set of parameters this function re-calculates the starting gravity, the gravity misfit and its regional and residual components. regional_grav_kwargs are passed to regional.regional_separation. Once the optimal parameters are found, the regional separation and inversion are performed again and saved to <fname>_results.pickle and the study is saved to <fname>_study.pickle.
The constraint point minimization regional separation technique uses constraint points to estimate the regional field, and since constraints are also used to calculate the scoring metric of this function, the constraints need to be separated into training (regional estimation) and testing (scoring) sets. To do this, supply the training constraints to regional_grav_kwargs via method="constraints" or method="constraints_cv" and constraints_df, and the testing constraints to this function as constraints_df. Typically there are not many constraints, and omitting some of them from the training set will significantly impact the regional estimation.
To help with this, we can use a K-Folds approach, where for each set of parameter values, we perform this entire procedure K times, each time with a different separation of training and testing points, called a fold. The score associated with that parameter set is the mean of the K scores. Once the optimal parameter values are found, we then repeat the inversion using all of the constraints in the regional estimation. For a K-folds approach, supply lists of dataframes containing only each fold’s testing or training points to the two constraints_df arguments. To automatically perform the test/train split and K-folds optimization, you can also use the convenience function optimize_inversion_zref_density_contrast_kfolds.
- Parameters:
grav_df (pandas.DataFrame) – gravity data frame with columns easting, northing, upward, and gravity_anomaly
constraints_df (pandas.DataFrame or list[pandas.DataFrame]) – constraints data frame with columns easting, northing, and upward, or list of dataframes for each fold of a cross-validation
n_trials (int) – number of trials, if grid_search is True, needs to be a perfect square and >=16.
n_startup_trials (int | None, optional) – number of startup trials, by default is automatically determined
starting_topography (xarray.DataArray | None, optional) – a starting topography grid used to create the prism layers. If not provided, must provide region, spacing and dampings to starting_topography_kwargs, by default None
zref_limits (tuple[float, float] | None, optional) – upper and lower limits for the reference level, in meters, by default None
density_contrast_limits (tuple[float, float] | None, optional) – upper and lower limits for the density contrast, in kg/m^3, by default None
zref (float | None, optional) – if zref_limits not provided, must provide a constant zref value, by default None
density_contrast (float | None, optional) – if density_contrast_limits not provided, must provide a constant density contrast value, by default None
starting_topography_kwargs (dict[str, Any] | None, optional) – dictionary with key: value pairs of “region”: tuple[float, float, float, float], “spacing”: float, and “dampings”: float | list[float] | None, used to create a flat starting topography at each zref value if starting_topography not provided, by default None
regional_grav_kwargs (dict[str, Any] | None, optional) – dictionary with kwargs to supply to regional.regional_separation(), by default None
score_as_median (bool, optional) – change scoring metric from root mean square to root median square, by default False
sampler (optuna.samplers.BaseSampler | None, optional) – customize the optuna sampler, by default uses GPSampler unless grid_search is True, then uses GridSampler.
grid_search (bool, optional) – Switch the sampler to GridSampler and search entire parameter space between provided limits in intervals set by n_trials (for 1 parameter optimizations), or by the square root of n_trials (for 2 parameter optimizations), by default False
fname (str | None, optional) – file name to save both study and inversion results to as pickle files, by default fname is tmp_x_zref_density_cv where x is a random integer between 0 and 999 and will save study to <fname>_study.pickle and tuple of inversion results to <fname>_results.pickle.
plot_cv (bool, optional) – plot the cross-validation results, by default True
logx (bool, optional) – use a log scale for the cross-validation plot x-axis, by default False
logy (bool, optional) – use a log scale for the cross-validation plot y-axis, by default False
progressbar (bool, optional) – add a progressbar, by default True
parallel (bool, optional) – run the optimization in parallel, by default False
fold_progressbar (bool, optional) – show a progress bar for each fold of the constraint-point minimization cross-validation, by default True
seed (int, optional) – random seed for the samplers, by default 0
kwargs (Any)
- Returns:
study (optuna.study) – the completed optuna study
final_inversion_results (tuple[pandas.DataFrame, pandas.DataFrame, dict[str, typing.Any], float]) – a tuple of the inversion results: topography dataframe, gravity dataframe, parameter values and elapsed time.
- Return type:
tuple[optuna.study, tuple[pandas.DataFrame, pandas.DataFrame, dict[str, Any], float]]
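The relationship between n_trials and the grid described for grid_search can be sketched as follows. This assumes linearly spaced values that include both limits, which is an assumption rather than something stated above; the helper names linspace and parameter_grid are hypothetical, and the >=16 requirement is not enforced here.

```python
import itertools
import math


def linspace(low, high, num):
    """Return `num` evenly spaced values from low to high, inclusive."""
    if num == 1:
        return [float(low)]
    step = (high - low) / (num - 1)
    return [low + i * step for i in range(num)]


def parameter_grid(n_trials, zref_limits=None, density_contrast_limits=None):
    """Build the list of parameter sets a grid search would visit."""
    candidates = (("zref", zref_limits), ("density_contrast", density_contrast_limits))
    limits = {name: lim for name, lim in candidates if lim is not None}
    if not limits:
        raise ValueError("provide limits for at least one parameter")
    if len(limits) == 1:
        per_param = n_trials  # one parameter: n_trials values along it
    else:
        per_param = math.isqrt(n_trials)  # two parameters: sqrt(n_trials) each
        if per_param**2 != n_trials:
            raise ValueError("n_trials must be a perfect square for 2 parameters")
    axes = [linspace(low, high, per_param) for low, high in limits.values()]
    return [dict(zip(limits, combo)) for combo in itertools.product(*axes)]


grid = parameter_grid(16, zref_limits=(0, 300), density_contrast_limits=(2000, 2600))
print(len(grid))  # 16 parameter sets: 4 zref values x 4 density contrasts
```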
- optimize_inversion_zref_density_contrast_kfolds(constraints_df, split_kwargs=None, **kwargs)[source]#
Perform an optimization for zref and density contrast values, the same as function optimize_inversion_zref_density_contrast, but pass a dataframe of constraint points and split_kwargs, which are both passed to split_test_train to create K-folds of testing and training constraints. For each set of zref/density values, regional separation and inversion are performed for each of the K-folds in the constraints dataframe. The score for each parameter set will be the mean of the K-folds scores. This then repeats for all parameters. Within each parameter set and fold, the training constraints are used for the regional separation and the testing constraints are used for scoring.
This optimization performs a total number of inversions equal to K-folds * number of parameter sets. For 20 parameter sets and 5 K-folds, this is 100 inversions. This extra computational expense is only useful if the regional separation technique you supply via regional_grav_kwargs uses constraint points for the estimations, such as constraint point minimization (method="constraints_cv" or method="constraints"). It is more efficient, but less accurate, to simply use a different regional estimation technique, which doesn’t require constraint points, to find the optimal zref and density values. Then use these again in another inversion with the desired regional separation technique.
Using the regional method of “constraints” will simply use the training points and supplied grid_method parameter values to calculate a regional field. Using the regional method of “constraints_cv” will take the training points and split these into a secondary set of training and testing points. These will be used internally in the regional separation to find the optimal grid_method parameters.
- Parameters:
constraints_df (pandas.DataFrame) – constraints dataframe with columns “easting”, “northing”, and “upward”.
split_kwargs (dict[str, Any] | None, optional) – kwargs to be passed to split_test_train for splitting constraints_df into test and train sets, by default None
**kwargs (Any) – kwargs to be passed to optimize_inversion_zref_density_contrast
- Returns:
study (optuna.study) – the completed optuna study
inversion_results (tuple[pandas.DataFrame, pandas.DataFrame, dict[str, typing.Any], float]) – tuple of the best inversion results.
- Return type:
tuple[optuna.study, tuple[pandas.DataFrame, pandas.DataFrame, dict[str, Any], float]]
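The K-folds bookkeeping described above (mean score per parameter set, and K inversions per parameter set) can be sketched with the standard library. The interleaved split used here is only a stand-in: how split_test_train actually assigns points to folds is not described on this page, and all names in the sketch are hypothetical.

```python
import statistics


def kfold_indices(n_points, n_folds):
    """Yield (train, test) index lists; fold k tests every n_folds-th point."""
    for fold in range(n_folds):
        test = [i for i in range(n_points) if i % n_folds == fold]
        train = [i for i in range(n_points) if i % n_folds != fold]
        yield train, test


def kfold_score(n_points, n_folds, score_fold):
    """Mean of the per-fold scores for one parameter set."""
    scores = [score_fold(train, test) for train, test in kfold_indices(n_points, n_folds)]
    return statistics.fmean(scores)


calls = []  # records one entry per (parameter set, fold) "inversion"


def fake_inversion_score(train, test):
    """Placeholder for regional separation + inversion + scoring on one fold."""
    calls.append((len(train), len(test)))
    return float(len(test))


# 20 parameter sets, each scored with 5 folds of 10 constraint points.
scores = [kfold_score(10, 5, fake_inversion_score) for _ in range(20)]
print(len(calls))  # 100 inversions in total
```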
- class OptimalEqSourceParams(depth_limits=None, block_size_limits=None, damping_limits=None, **kwargs)[source]#
Objective function to use in an Optuna optimization for finding the optimal equivalent source parameters for fitting to gravity data.
- Parameters:
- optimize_eq_source_params(coordinates, data, n_trials=100, damping_limits=None, depth_limits=None, block_size_limits=None, sampler=None, plot=False, progressbar=True, parallel=False, fname=None, seed=0, **kwargs)[source]#
Use Optuna to find the optimal parameters for fitting equivalent sources to gravity data. The 3 parameters are damping, depth, and block size. Any or all of these can be optimized at the same time. Provide upper and lower limits for each parameter, or if you don’t want to optimize a parameter, provide a constant value of the parameter in the kwargs.
- Parameters:
coordinates (tuple[pandas.Series | numpy.ndarray, pandas.Series | numpy.ndarray, pandas.Series | numpy.ndarray]) – tuple of coordinates in the order (easting, northing, upward) for the gravity observation locations.
data (pandas.Series | numpy.ndarray) – gravity data values
n_trials (int, optional) – number of trials to run, by default 100
damping_limits (tuple[float, float], optional) – damping parameter limits, by default (0, 10**3)
depth_limits (tuple[float, float], optional) – source depth limits (positive downwards) in meters, by default (0, 10e6)
block_size_limits (tuple[float, float] | None, optional) – block size limits in meters, by default None
sampler (optuna.samplers.BaseSampler | None, optional) – specify which Optuna sampler to use, by default GPSampler
plot (bool, optional) – plot the resulting optimization figures, by default False
progressbar (bool, optional) – add a progressbar, by default True
parallel (bool, optional) – run the optimization in parallel, by default False
fname (str | None, optional) – file name to save the study to, by default None
seed (int, optional) – random seed for the samplers, by default 0
kwargs (Any) – additional keyword arguments to pass to OptimalEqSourceParams, which are passed to eq_sources_score. These can include parameters to pass to harmonica.EquivalentSources; “damping”, “points”, “depth”, “block_size”, “parallel”, and “dtype”, or parameters to pass to vd.cross_val_score; “delayed”, or “weights”.
- Returns:
study (optuna.study) – the completed optuna study
eqs (harmonica.EquivalentSources) – the fitted equivalent sources model
- Return type:
tuple[optuna.study, harmonica.EquivalentSources]
- class OptimizeRegionalTrend(trend_limits, optimize_on_true_regional_misfit=False, separate_metrics=True, **kwargs)[source]#
Objective function to use in an Optuna optimization for finding the optimal trend order for estimating the regional component of gravity misfit.
- Parameters:
- class OptimizeRegionalFilter(filter_width_limits, optimize_on_true_regional_misfit=False, separate_metrics=True, **kwargs)[source]#
Objective function to use in an Optuna optimization for finding the optimal filter width for estimating the regional component of gravity misfit.
- Parameters:
- class OptimizeRegionalEqSources(depth_limits=None, block_size_limits=None, damping_limits=None, grav_obs_height_limits=None, optimize_on_true_regional_misfit=False, separate_metrics=True, **kwargs)[source]#
Objective function to use in an Optuna optimization for finding the optimal equivalent source parameters for estimating the regional component of gravity misfit.
- Parameters:
- class OptimizeRegionalConstraintsPointMinimization(training_df, testing_df, grid_method, tension_factor_limits=(0, 1), spline_damping_limits=None, depth_limits=None, block_size_limits=None, damping_limits=None, grav_obs_height_limits=None, optimize_on_true_regional_misfit=False, separate_metrics=True, progressbar=False, **kwargs)[source]#
Objective function to use in an Optuna optimization for finding the optimal hyperparameter values for the Constraint Point Minimization technique for estimating the regional component of gravity misfit. If single dataframes are supplied to training_df and testing_df, for each parameter value a regional field will be estimated using the training_df, and a score calculated using the testing_df. If lists of dataframes are supplied, a score will be calculated for each item in the list and the mean of the scores will be the metric returned. This class is used with the function optimize_regional_constraint_point_minimization.
- Parameters:
training_df (pandas.DataFrame | list[pandas.DataFrame])
testing_df (pandas.DataFrame | list[pandas.DataFrame])
grid_method (str)
optimize_on_true_regional_misfit (bool)
separate_metrics (bool)
progressbar (bool)
kwargs (Any)
- optimize_regional_filter(testing_df, grav_df, filter_width_limits, score_as_median=False, remove_starting_grav_mean=False, true_regional=None, n_trials=100, sampler=None, plot=False, plot_grid=False, optimize_on_true_regional_misfit=False, separate_metrics=True, progressbar=True, parallel=False, fname=None, seed=0)[source]#
Run an Optuna optimization to find the optimal filter width for estimating the regional component of gravity misfit. For synthetic testing, if the true regional grid is provided, the optimization can be set to optimize on the RMSE of the predicted and true regional gravity, by setting optimize_on_true_regional_misfit=True. By default this will perform a multi-objective optimization to find the best trade-off between the lowest RMSE of the residual at the constraints and the highest RMSE of the residual at all locations.
- Parameters:
testing_df (pandas.DataFrame) – constraint points to use for calculating the score with columns “easting”, “northing” and “upward”.
grav_df (pandas.DataFrame) – gravity dataframe with columns “easting”, “northing”, “reg”, and gravity_anomaly.
filter_width_limits (tuple[float, float]) – limits to use for the filter width in meters.
score_as_median (bool, optional) – use the root median square instead of the root mean square for the scoring metric, by default False
remove_starting_grav_mean (bool, optional) – remove the mean of the starting gravity data before estimating the regional. Useful to mitigate effects of poorly-chosen zref value. By default False
true_regional (xarray.DataArray | None, optional) – if the true regional gravity is known (in synthetic models), supply this as a grid to include a user_attr of the RMSE between this and the estimated regional for each trial, or set optimize_on_true_regional_misfit=True to have the optimization optimize on the RMSE, by default None
n_trials (int, optional) – number of trials to run, by default 100
sampler (optuna.samplers.BaseSampler | None, optional) – customize the optuna sampler, by default TPE sampler
plot (bool, optional) – plot the resulting optimization figures, by default False
plot_grid (bool, optional) – plot the resulting regional gravity grid, by default False
optimize_on_true_regional_misfit (bool, optional) – if the true_regional grid is provided, choose to perform the optimization on the RMSE between the true regional and the estimated regional, by default False
separate_metrics (bool, optional) – if False, returns the scores combined with the formula residual_constraints_score / residual_amplitude_score; by default True, which returns both the residual and regional scores separately.
progressbar (bool, optional) – add a progressbar, by default True
parallel (bool, optional) – run the optimization in parallel, by default False
fname (str | None, optional) – file name to save the study to, by default None
seed (int, optional) – random seed for the samplers, by default 0
- Returns:
study (optuna.study) – the completed Optuna study
resulting_grav_df (pandas.DataFrame) – the resulting gravity dataframe of the best trial
best_trial (optuna.trial.FrozenTrial) – the best trial
- Return type:
tuple[optuna.study, pandas.DataFrame, optuna.trial.FrozenTrial]
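The two metrics and the combined formula from separate_metrics can be sketched as follows. Here the "RMSE of the residual" is taken to be the root mean square of the residual values themselves, which is an interpretation; the helper names rms and regional_scores are hypothetical.

```python
import math


def rms(values):
    """Root mean square of a sequence of numbers."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def regional_scores(residual_at_constraints, residual_everywhere, separate_metrics=True):
    """Return the two scores separately, or combined as their ratio."""
    residual_constraints_score = rms(residual_at_constraints)  # want this low
    residual_amplitude_score = rms(residual_everywhere)        # want this high
    if separate_metrics:
        return residual_constraints_score, residual_amplitude_score
    # Combined metric: a lower ratio is better on both counts.
    return residual_constraints_score / residual_amplitude_score


print(regional_scores([3, -3], [6, -6, 6, -6]))                          # (3.0, 6.0)
print(regional_scores([3, -3], [6, -6, 6, -6], separate_metrics=False))  # 0.5
```

Dividing the two scores turns the two-objective trade-off into a single value to minimize, at the cost of fixing how the two objectives are weighted against each other.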
- optimize_regional_trend(testing_df, grav_df, trend_limits, score_as_median=False, remove_starting_grav_mean=False, true_regional=None, sampler=None, plot=False, plot_grid=False, optimize_on_true_regional_misfit=False, separate_metrics=True, progressbar=True, parallel=False, fname=None, seed=0)[source]#
Run an Optuna optimization to find the optimal trend order for estimating the regional component of gravity misfit. For synthetic testing, if the true regional grid is provided, the optimization can be set to optimize on the RMSE of the predicted and true regional gravity, by setting optimize_on_true_regional_misfit=True. By default this will perform a multi-objective optimization to find the best trade-off between the lowest RMSE of the residual at the constraints and the highest RMSE of the residual at all locations.
- Parameters:
testing_df (pandas.DataFrame) – constraint points to use for calculating the score with columns “easting”, “northing” and “upward”.
grav_df (pandas.DataFrame) – gravity dataframe with columns “easting”, “northing”, “reg” and gravity_anomaly.
trend_limits (tuple[int, int]) – limits to use for the trend order in degrees.
score_as_median (bool, optional) – use the root median square instead of the root mean square for the scoring metric, by default False
remove_starting_grav_mean (bool, optional) – remove the mean of the starting gravity data before estimating the regional. Useful to mitigate effects of poorly-chosen zref value. By default False
true_regional (xarray.DataArray | None, optional) – if the true regional gravity is known (in synthetic models), supply this as a grid to include a user_attr of the RMSE between this and the estimated regional for each trial, or set optimize_on_true_regional_misfit=True to have the optimization optimize on the RMSE, by default None
sampler (optuna.samplers.BaseSampler | None, optional) – customize the optuna sampler, by default GridSampler
plot (bool, optional) – plot the resulting optimization figures, by default False
plot_grid (bool, optional) – plot the resulting regional gravity grid, by default False
optimize_on_true_regional_misfit (bool, optional) – if the true_regional grid is provided, choose to perform the optimization on the RMSE between the true regional and the estimated regional, by default False
separate_metrics (bool, optional) – if False, returns the scores combined with the formula residual_constraints_score / residual_amplitude_score; by default True, which returns both the residual and regional scores separately.
progressbar (bool, optional) – add a progressbar, by default True
parallel (bool, optional) – run the optimization in parallel, by default False
fname (str | None, optional) – file name to save the study to, by default None
seed (int, optional) – random seed for the samplers, by default 0
- Returns:
study (optuna.study) – the completed Optuna study
resulting_grav_df (pandas.DataFrame) – the resulting gravity dataframe of the best trial
best_trial (optuna.trial.FrozenTrial) – the best trial
- Return type:
tuple[optuna.study, pandas.DataFrame, optuna.trial.FrozenTrial]
- optimize_regional_eq_sources(testing_df, grav_df, score_as_median=False, true_regional=None, n_trials=100, depth_limits=None, block_size_limits=None, damping_limits=None, grav_obs_height_limits=None, sampler=None, plot=False, plot_grid=False, optimize_on_true_regional_misfit=False, separate_metrics=True, progressbar=True, parallel=False, fname=None, seed=0, **kwargs)[source]#
Run an Optuna optimization to find the optimal equivalent source parameters for estimating the regional component of gravity misfit. For synthetic testing, if the true regional grid is provided, the optimization can be set to optimize on the RMSE of the predicted and true regional gravity, by setting optimize_on_true_regional_misfit=True. By default this will perform a multi-objective optimization to find the best trade-off between the lowest RMSE of the residual at the constraints and the highest RMSE of the residual at all locations.
- Parameters:
testing_df (pandas.DataFrame) – constraint points to use for calculating the score with columns “easting”, “northing” and “upward”.
grav_df (pandas.DataFrame) – gravity dataframe with columns “easting”, “northing”, “reg”, and gravity_anomaly.
score_as_median (bool, optional) – use the root median square instead of the root mean square for the scoring metric, by default False
true_regional (xarray.DataArray | None, optional) – if the true regional gravity is known (in synthetic models), supply this as a grid to include a user_attr of the RMSE between this and the estimated regional for each trial, or set optimize_on_true_regional_misfit=True to have the optimization optimize on the RMSE, by default None
n_trials (int, optional) – number of trials to run, by default 100
depth_limits (tuple[float, float] | None, optional) – limits to use for source depths, positive down in meters, by default None
block_size_limits (tuple[float, float] | None, optional) – limits to use for block size in meters, by default None
damping_limits (tuple[float, float] | None, optional) – limits to use for the damping parameter, by default None
grav_obs_height_limits (tuple[float, float] | None, optional) – limits to use for the gravity observation height in meters, by default None
sampler (optuna.samplers.BaseSampler | None, optional) – customize the optuna sampler, by default TPE sampler
plot (bool, optional) – plot the resulting optimization figures, by default False
plot_grid (bool, optional) – plot the resulting regional gravity grid, by default False
optimize_on_true_regional_misfit (bool, optional) – if the true_regional grid is provided, choose to perform the optimization on the RMSE between the true regional and the estimated regional, by default False
separate_metrics (bool, optional) – if True, returns the residual and regional scores separately; if False, returns the scores combined with the formula residual_constraints_score / residual_amplitude_score, by default True
progressbar (bool, optional) – add a progressbar, by default True
parallel (bool, optional) – run the optimization in parallel, by default False
fname (str | None, optional) – file name to save the study to, by default None
seed (int, optional) – random seed for the samplers, by default 0
kwargs (Any) – additional keyword arguments to pass to regional.regional_separation
- Returns:
study (optuna.study) – the completed Optuna study
resulting_grav_df (pandas.DataFrame) – the resulting gravity dataframe of the best trial
best_trial (optuna.trial.FrozenTrial) – the best trial
- Return type:
tuple[optuna.study, pandas.DataFrame, optuna.trial.FrozenTrial]
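The multi-objective trade-off described above can be illustrated with a minimal plain-Python sketch. It is not the library's implementation. It assumes each trial yields a pair (residual_constraints_score, residual_amplitude_score), with the first minimized and the second maximized. The trial values and the helper names pareto_front and combined_score are hypothetical. The ratio is the one quoted for separate_metrics=False; that a lower ratio is better is an assumption drawn from the directions of the two scores.

```python
def pareto_front(trials):
    """Return the trials that no other trial dominates.

    Each trial is (constraints_score, amplitude_score); a lower
    constraints_score and a higher amplitude_score are both better.
    """
    front = []
    for i, (c_i, a_i) in enumerate(trials):
        dominated = any(
            # trial j is at least as good on both metrics and strictly
            # better on at least one of them
            (c_j <= c_i and a_j >= a_i) and (c_j < c_i or a_j > a_i)
            for j, (c_j, a_j) in enumerate(trials)
            if j != i
        )
        if not dominated:
            front.append((c_i, a_i))
    return front


def combined_score(constraints_score, amplitude_score):
    """Single score using the formula quoted for separate_metrics=False."""
    return constraints_score / amplitude_score


# Made-up (constraints_score, amplitude_score) pairs for four trials
trials = [(2.0, 10.0), (1.0, 8.0), (3.0, 9.0), (1.5, 15.0)]

front = pareto_front(trials)
# Assumption: a lower combined ratio is preferred
best_combined = min(trials, key=lambda t: combined_score(*t))
```

With separate metrics there is generally no single winner, only a set of non-dominated trials to choose from; collapsing the two scores into one ratio trades that flexibility for a unique best trial.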
- optimize_regional_constraint_point_minimization(testing_training_df, grid_method, grav_df, n_trials, tension_factor_limits=(0, 1), spline_damping_limits=None, depth_limits=None, block_size_limits=None, damping_limits=None, grav_obs_height_limits=None, sampler=None, plot=False, plot_grid=False, fold_progressbar=False, optimize_on_true_regional_misfit=False, separate_metrics=True, score_as_median=False, true_regional=None, progressbar=True, parallel=False, fname=None, seed=0, **kwargs)[source]#
Run an Optuna optimization to find the optimal hyperparameters for the Constraint Point Minimization (CPM) technique for estimating the regional component of gravity misfit. Since constraints are used both for determining the regional field and for scoring the performance, we must split the constraints into testing and training sets. This function can perform both single and K-Folds cross validations, determined by the number of “fold_x” columns in testing_training_df. If using more than one fold, the score for each parameter set is the mean of the scores of each fold. The total number of regional separations this will perform is n_trials*K-folds. This function then uses the optimal parameter values to redo the regional estimation using all the constraint points, not just the training points, and returns the results. By default this will perform a multi-objective optimization to find the best trade-off between the lowest RMSE of the residual misfit at the constraints and the highest RMS amplitude of the residual at all locations. Choose the CPM gridding method with the grid_method parameter, and supply the associated method parameter limits via the parameters <parameter>_limits. For grid method “eq_sources”, which has multiple parameters, if limits aren’t provided for one of the parameters, supply a constant value for that parameter in the keyword arguments, which are passed directly to regional.regional_separation. For synthetic testing, if the true regional grid is provided, the optimization can be set to optimize on the RMSE between the predicted and true regional gravity by setting optimize_on_true_regional_misfit=True.
- Parameters:
testing_training_df (pandas.DataFrame) – constraints dataframe with columns “easting”, “northing”, “upward”, and a column for each fold in the format “fold_0”, “fold_1”, etc. This can be created with function cross_validation.split_test_train(). Each fold column should have strings of “test” or “train” to indicate which rows are testing or training points. If more than one fold is provided, this function will perform a K-Folds cross validation and the score for each set of parameters will be the mean of the K-scores.
grid_method (str) – constraint point minimization method to use, choose between “verde” for bi-harmonic spline gridding, “pygmt” for tensioned minimum curvature gridding, or “eq_sources” for equivalent sources gridding.
grav_df (pandas.DataFrame) – gravity dataframe with columns “easting”, “northing”, “reg”, and “gravity_anomaly”.
n_trials (int) – number of trials to run
tension_factor_limits (tuple[float, float], optional) – limits to use for the PyGMT tension factor gridding, by default (0, 1)
spline_damping_limits (tuple[float, float] | None, optional) – limits to use for the Verde bi-harmonic spline damping, by default None
depth_limits (tuple[float, float] | None, optional) – limits to use for the equivalent sources’ depths, by default None
block_size_limits (tuple[float, float] | None, optional) – limits to use for the block size for fitting equivalent sources, by default None
damping_limits (tuple[float, float] | None, optional) – limits to use for the damping value for fitting equivalent sources, by default None
grav_obs_height_limits (tuple[float, float] | None, optional) – limits to use for the gravity observation height for fitting equivalent sources, by default None
sampler (optuna.samplers.BaseSampler | None, optional) – customize the optuna sampler, by default TPE sampler
plot (bool, optional) – plot the resulting optimization figures, by default False
plot_grid (bool, optional) – plot the resulting regional gravity grid, by default False
fold_progressbar (bool, optional) – turn on or off a progress bar for the optimization of each fold if performing a K-Folds cross-validation within the optimization, by default False
optimize_on_true_regional_misfit (bool, optional) – if the true_regional grid is provided, choose to perform the optimization on the RMSE between the true regional and the estimated regional, by default False
separate_metrics (bool, optional) – if True, returns the residual and regional scores separately; if False, returns the scores combined with the formula residual_constraints_score / residual_amplitude_score, by default True
score_as_median (bool, optional) – use the root median square instead of the root mean square for the scoring metric, by default False
true_regional (xarray.DataArray | None, optional) – if the true regional gravity is known (in synthetic models), supply this as a grid to include a user_attr of the RMSE between this and the estimated regional for each trial, or set optimize_on_true_regional_misfit=True to have the optimization optimize on the RMSE, by default None
progressbar (bool, optional) – add a progressbar, by default True
parallel (bool, optional) – run the optimization in parallel, by default False
fname (str | None, optional) – file name to save the study to, by default None
seed (int, optional) – random seed for the samplers, by default 0
kwargs (Any) – additional keyword arguments to pass to regional.regional_separation
- Returns:
study (optuna.study) – the completed Optuna study
resulting_grav_df (pandas.DataFrame) – the resulting gravity dataframe of the best trial
best_trial (optuna.trial.FrozenTrial) – the best trial
- Return type:
tuple[optuna.study, pandas.DataFrame, optuna.trial.FrozenTrial]
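The K-Folds scoring described above can be sketched as follows. This is a plain-Python illustration rather than the function's actual code. The rows, the residual values, and the helper names root_square_score and mean_fold_score are hypothetical. Each row carries a fixed made-up residual, whereas in practice the residual at the testing points would be recomputed for every fold after fitting to that fold's training points.

```python
import math
import statistics


def root_square_score(residuals, as_median=False):
    """Root mean square, or root median square if as_median is True."""
    squares = [r**2 for r in residuals]
    central = statistics.median(squares) if as_median else statistics.fmean(squares)
    return math.sqrt(central)


def mean_fold_score(rows, as_median=False):
    """Average the testing-point score over every 'fold_x' column."""
    fold_columns = [key for key in rows[0] if key.startswith("fold_")]
    scores = []
    for fold in fold_columns:
        # only rows marked "test" in this fold contribute to its score
        test_residuals = [row["residual"] for row in rows if row[fold] == "test"]
        scores.append(root_square_score(test_residuals, as_median))
    return statistics.fmean(scores)


# Hypothetical constraint rows with two folds and made-up residuals
rows = [
    {"residual": 1.0, "fold_0": "test", "fold_1": "train"},
    {"residual": 7.0, "fold_0": "test", "fold_1": "train"},
    {"residual": -3.0, "fold_0": "train", "fold_1": "test"},
    {"residual": 3.0, "fold_0": "train", "fold_1": "test"},
]

fold_columns = [key for key in rows[0] if key.startswith("fold_")]
score = mean_fold_score(rows)

# Total number of regional separations: n_trials * K
n_trials = 100
total_separations = n_trials * len(fold_columns)
```

Because every trial repeats the regional separation once per fold, the run time grows linearly with the number of folds.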
- optimal_buffer(target, buffer_perc_limits=(1, 50), n_trials=25, sampler=None, grid_search=False, fname=None, progressbar=True, parallel=False, plot=True, seed=0, **kwargs)[source]#
Run an optimization to find the best buffer zone width.
- class OptimalBuffer(buffer_perc_limits, target, fname, **kwargs)[source]#
Objective function to use in an Optuna optimization for finding the buffer zone width as a percentage of region width which limits the gravity decay (edge effects) to a specified amount within a region of interest. Used within the function optimal_buffer().
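As a worked example of a buffer expressed as a percentage of region width, the sketch below pads a region by a given percentage. It rests on assumptions not stated in this reference: that the region is given as (west, east, south, north), that "region width" means the easting extent, and that the same padding is applied on all four sides. The helper name buffered_region is hypothetical and not part of the library.

```python
def buffered_region(region, buffer_perc):
    """Pad a (west, east, south, north) region by a percentage of its width.

    Assumption: the width is taken as the easting extent and the resulting
    buffer is applied equally to every side.
    """
    west, east, south, north = region
    buffer_width = (east - west) * buffer_perc / 100
    return (
        west - buffer_width,
        east + buffer_width,
        south - buffer_width,
        north + buffer_width,
    )


# A 100 km by 50 km region of interest (in meters) with a 10 % buffer
inner_region = (0.0, 100_000.0, 0.0, 50_000.0)
outer_region = buffered_region(inner_region, 10)
```

Here a 10 % buffer on a 100 km wide region adds 10 km on each side; the default buffer_perc_limits=(1, 50) would correspond to searching between 1 km and 50 km for this region under the same assumptions.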
- class DuplicateIterationPruner[source]#
Bases: optuna.pruners.BasePruner
DuplicatePruner
Pruner to detect duplicate trials based on the parameters.
This pruner is used to identify and prune trials that have the same set of parameters as a previously completed trial.
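The duplicate check itself can be sketched in plain Python. This is an illustration of the idea only, not the class's actual code. The helper name is_duplicate and the parameter dictionaries are hypothetical. In an Optuna pruner the comparison would presumably be made inside prune(study, trial) against the parameters of the study's completed trials.

```python
def is_duplicate(params, completed_params):
    """Return True if params matches the parameters of any completed trial."""
    return any(params == previous for previous in completed_params)


# Made-up parameter sets from two trials that have already completed
completed = [
    {"damping": 0.1, "depth": 1000.0},
    {"damping": 0.01, "depth": 500.0},
]

# Same values as the first completed trial (key order does not matter)
repeat = is_duplicate({"depth": 1000.0, "damping": 0.1}, completed)
# A parameter set that has not been tried before
fresh = is_duplicate({"damping": 0.1, "depth": 2000.0}, completed)
```

Pruning repeats of this kind is mainly useful when the search space is small or discrete, such as an integer trend order, where a sampler can easily propose the same parameters twice.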
- prune(study, trial)[source]#
Judge whether the trial should be pruned based on the reported values.
Note that this method is not supposed to be called by library users. Instead, optuna.trial.Trial.report() and optuna.trial.Trial.should_prune() provide user interfaces to implement pruning mechanism in an objective function.
- Parameters:
study (optuna.study.Study) – Study object of the target study.
trial (optuna.trial.FrozenTrial) – FrozenTrial object of the target trial. Take a copy before modifying this object.
- Returns:
A boolean value representing whether the trial should be pruned.
- Return type: