invert4geom.uncertainty

Classes#

DiscreteUniform

Discrete uniform distribution.

Functions#

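`create_lhc`(n_samples, parameter_dict[, random_state, ...])	Given some parameter values and their expected distributions, create a Latin Hypercube with a given number of samples.
`randomly_sample_data`(seed, data_df, data_col, uncert_col)	Given a dataframe with a data column and an uncertainty column, sample the data with a normal distribution within the uncertainty range.
`starting_topography_uncertainty`(runs[, ...])	Create a stochastic ensemble of starting topographies by sampling the constraints or parameters within their respective distributions and find the cell-wise (weighted) statistics of the ensemble.
`equivalent_sources_uncertainty`(runs, data, coords, ...)	Create a stochastic ensemble of regional gravity anomalies by sampling the constraints, gravity, or parameters within their respective distributions and calculate the cell-wise (weighted) statistics of the ensemble.
`regional_misfit_uncertainty`(runs[, sample_gravity, ...])	Create a stochastic ensemble of regional gravity anomalies by sampling the constraints, gravity, or parameters within their respective distributions and calculate the cell-wise (weighted) statistics of the ensemble.
`full_workflow_uncertainty_loop`(runs[, fname, ...])	Run a series of inversions (N=runs), and save results of each inversion to pickle files starting with fname.
`model_ensemble_stats`(dataset[, weights, region])	Given a dataset, calculate the cell-wise mean, standard deviation, and weighted mean and standard deviation of the variables.
`merge_simulation_results`(grids)	Merge a list of grids into a single dataset with variable names "run_<number>", where <number> is the run number.
`merged_stats`(results[, plot, constraints_df, ...])	Use the outputs of the function uncertainty.full_workflow_uncertainty_loop to calculate the cell-wise statistics of the inversion ensemble and plot the resulting mean and standard deviation of the ensemble.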

Module Contents#

class DiscreteUniform(loc=0.0, scale=1.0)[source]#

Bases: UQpy.distributions.DistributionDiscrete1D

Discrete uniform distribution.

Parameters:

loc (float | int)
scale (float | int)

create_lhc(n_samples, parameter_dict, random_state=1, criterion='centered')[source]#

Given some parameter values and their expected distributions, create a Latin Hypercube with a given number of samples.

Parameters:

n_samples (int) – how many samples to make for each parameter
parameter_dict (dict) – nested dictionary, with a dictionary of ‘distribution’, ‘loc’, ‘scale’ and optionally ‘log’ for each parameter to be sampled. Distributions can be ‘uniform’ or ‘normal’. For ‘uniform’, ‘loc’ is the lower bound and ‘scale’ is the range of the distribution. ‘loc’ + ‘scale’ = upper bound. For ‘normal’, ‘loc’ is the center (mean) of the distribution and ‘scale’ is the standard deviation. If ‘log’ is True, the provided ‘loc’ and ‘scale’ values are the base 10 exponents. For example, a uniform distribution with loc=-4, scale=6 and log=True would sample values between 1e-4 and 1e2.
random_state (int, optional) – random state to use for sampling, by default 1
criterion (str, optional) – criterion to use for sampling, by default “centered”, options are “centered”, “random”, “maximin”, or “mincorrelation”, which each relate to a criterion from the Python package UQpy.

Returns:

nested dictionary with parameter names, distribution specifics, and sampled values

Return type:

dict[dict[Any]]
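The loc, scale, and log conventions described above can be illustrated with a small standard-library sketch of centered Latin hypercube sampling. This is not the invert4geom or UQpy implementation: the helper name `centered_lhc`, the `sampled_values` key, and the numeric values for `density_contrast` are assumptions made for illustration only.

```python
import random
from statistics import NormalDist


def centered_lhc(n_samples, parameter_dict, seed=1):
    """Illustrative centered Latin hypercube (hypothetical helper, not library code)."""
    rng = random.Random(seed)
    sampled = {}
    for name, spec in parameter_dict.items():
        # One probability per equal-width stratum, taken at the stratum centre.
        probs = [(i + 0.5) / n_samples for i in range(n_samples)]
        # Shuffle per parameter so the pairing between parameters is random.
        rng.shuffle(probs)
        if spec["distribution"] == "uniform":
            # loc is the lower bound, loc + scale is the upper bound.
            values = [spec["loc"] + spec["scale"] * p for p in probs]
        elif spec["distribution"] == "normal":
            # loc is the mean, scale is the standard deviation.
            dist = NormalDist(mu=spec["loc"], sigma=spec["scale"])
            values = [dist.inv_cdf(p) for p in probs]
        else:
            raise ValueError(f"unsupported distribution: {spec['distribution']}")
        if spec.get("log", False):
            # loc and scale were base-10 exponents, so convert back to values.
            values = [10.0**v for v in values]
        sampled[name] = {**spec, "sampled_values": values}
    return sampled


parameter_dict = {
    "solver_damping": {"distribution": "uniform", "loc": -4, "scale": 6, "log": True},
    "density_contrast": {"distribution": "normal", "loc": 500, "scale": 50},
}
samples = centered_lhc(4, parameter_dict)
```

With `log` set to True, every sampled `solver_damping` value falls between 1e-4 and 1e2, matching the example in the parameter description.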

randomly_sample_data(seed, data_df, data_col, uncert_col)[source]#

Given a dataframe with a data column and an uncertainty column, sample the data with a normal distribution within the uncertainty range. Note that this overwrites the data column with the newly sampled data.

Parameters:

seed (int) – random number generator seed
data_df (pandas.DataFrame) – dataframe with columns data_col and uncert_col
data_col (str) – name of data column to sample
uncert_col (str) – name of uncertainty column to sample within

Returns:

dataframe with data column updated with sampled values

Return type:

pandas.DataFrame

starting_topography_uncertainty(runs, sample_constraints=False, parameter_dict=None, plot=True, plot_region=None, true_topography=None, **kwargs)[source]#

Create a stochastic ensemble of starting topographies by sampling the constraints or parameters within their respective distributions and find the cell-wise (weighted) statistics of the ensemble.

Parameters:

runs (int) – number of runs to perform
sample_constraints (bool, optional) – choose to sample the constraints from a normal distribution with a mean of each constraint's depth and a standard deviation set by the uncert column, by default False
parameter_dict (dict[str, Any] | None, optional) – dictionary of parameters passed to create_topography with the uncertainty distributions defined, by default None
plot (bool, optional) – show the results, by default True
plot_region (tuple[float, float, float, float] | None, optional) – clip the plot to a region, by default None
true_topography (xarray.DataArray | None, optional) – if the true topography is known, will make a plot comparing the results, by default None
kwargs (Any)

Returns:

stats_ds (xarray.Dataset) – a dataset with the cell-wise statistics of the ensemble of topographies.
sampled_param_dict (dict[str, typing.Any]) – dictionary of sampled parameter values.

Return type:

tuple[xarray.Dataset, dict[str, Any]]

equivalent_sources_uncertainty(runs, data, coords, grid_points, parameter_dict=None, region=None, plot=True, plot_region=None, true_gravity=None, deterministic_error=None, weight_by=None, **kwargs)[source]#

Create a stochastic ensemble of regional gravity anomalies by sampling the constraints, gravity, or parameters within their respective distributions and calculate the cell-wise (weighted) statistics of the ensemble.

Parameters:

runs (int) – number of runs to perform
parameter_dict (dict[str, Any] | None, optional) – dictionary of parameters passed to regional_separation with the uncertainty distributions defined, by default None
region (tuple[float, float, float, float] | None, optional) – region to calculate statistics within, by default None
plot (bool, optional) – show the results, by default True
plot_region (tuple[float, float, float, float] | None, optional) – clip the plot to a region, by default None
true_gravity (xarray.DataArray | None, optional) – if the true gravity is known, will make a plot comparing the results, by default None
deterministic_error (xarray.DataArray | None, optional) – if the deterministic error is known, will make a plot comparing the results, by default None
weight_by (str | None, optional) – how to weight the models, by default None
data (numpy.typing.NDArray)
coords (tuple[numpy.typing.NDArray, numpy.typing.NDArray, numpy.typing.NDArray])
grid_points (pandas.DataFrame)
kwargs (Any)

Returns:

stats_ds (xarray.Dataset) – a dataset with the cell-wise statistics of the ensemble of regional gravity
sampled_parms_dict (dict[str, typing.Any]) – a dictionary of sampled parameter values.

Return type:

tuple[xarray.Dataset, dict[str, Any]]

regional_misfit_uncertainty(runs, sample_gravity=False, parameter_dict=None, region=None, plot=True, plot_region=None, true_regional=None, weight_by=None, **kwargs)[source]#

Create a stochastic ensemble of regional gravity anomalies by sampling the constraints, gravity, or parameters within their respective distributions and calculate the cell-wise (weighted) statistics of the ensemble.

Parameters:

runs (int) – number of runs to perform
sample_gravity (bool, optional) – choose to sample the gravity data from a normal distribution with a mean of each point's value and a standard deviation set by the uncert column, by default False
parameter_dict (dict[str, Any] | None, optional) – dictionary of parameters passed to regional_separation with the uncertainty distributions defined, by default None
region (tuple[float, float, float, float] | None, optional) – region to calculate statistics within, by default None
plot (bool, optional) – show the results, by default True
plot_region (tuple[float, float, float, float] | None, optional) – clip the plot to a region, by default None
true_regional (xarray.DataArray | None, optional) – if the true regional misfit is known, will make a plot comparing the results, by default None
weight_by (str | None)
kwargs (Any)

Returns:

stats_ds (xarray.Dataset) – a dataset with the cell-wise statistics of the ensemble of regional gravity
sampled_param_dict (dict[str, typing.Any]) – a dictionary of sampled parameter values.

Return type:

tuple[xarray.Dataset, dict[str, Any]]

full_workflow_uncertainty_loop(runs, fname=None, sample_gravity=False, gravity_filter_width=None, sample_constraints=False, starting_topography_parameter_dict=None, regional_misfit_parameter_dict=None, parameter_dict=None, create_starting_topography=False, create_starting_prisms=False, calculate_starting_gravity=False, calculate_regional_misfit=False, regional_grav_kwargs=None, starting_topography_kwargs=None, **kwargs)[source]#

Run a series of inversions (N=runs), and save results of each inversion to pickle files starting with fname. If files already exist, just return the loaded results instead of re-running the inversion. Choose which variables to include in the sampling and whether or not to run a damping value cross-validation for each inversion.

Feed returned values into function merged_stats to compute cell-wise stats on the resulting ensemble of starting topography models, inverted topography models, and gravity anomalies.

Sampling of data (gravity and constraints) uses the “uncert” column of each dataframe: the data are randomly sampled from a normal distribution with the data value as the mean and the uncertainty value as the standard deviation. The randomness is controlled by a seed equal to the run number, so it changes with every run, and the same run always produces the same sample. This allows the number of runs to be increased and this function to be run again with the same filename to continue the stochastic ensemble. This only works with data sampling, not parameter sampling.
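A minimal sketch of why seeding each run with its run number lets an ensemble be extended. It uses plain lists and the standard-library random module in place of the dataframes and the library's own sampling code. The helper `sample_column`, the data values, and the assumption that run numbers start at 0 are all illustrative.

```python
import random


def sample_column(values, uncerts, seed):
    """Draw one normal sample per value, using its uncertainty as the std."""
    rng = random.Random(seed)
    return [rng.gauss(v, u) for v, u in zip(values, uncerts)]


gravity = [10.0, 12.5, 9.8]  # illustrative data values
uncert = [0.5, 0.5, 1.0]  # illustrative "uncert" values

# Seed each run with its run number (assumed here to start at 0).
first_pass = [sample_column(gravity, uncert, seed=run) for run in range(3)]

# Asking for more runs later reproduces runs 0-2 and only adds runs 3-4.
second_pass = [sample_column(gravity, uncert, seed=run) for run in range(5)]
```

Because `second_pass[:3]` equals `first_pass`, results already saved for the earlier runs stay valid when the ensemble grows.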

Sampling of parameter values is determined by 3 supplied dictionaries. parameter_dict can contain the parameters density_contrast, zref, and solver_damping. The other two dictionaries, starting_topography_parameter_dict and regional_misfit_parameter_dict, can contain any parameters that are used in utils.create_topography and regional.regional_separation, respectively. Any parameters in these 3 dictionaries will be sampled with a Latin Hypercube sampling technique and the sampled values will be passed to inversion.run_inversion.

These dictionaries should be formatted as follows: {“parameter_name”: {“distribution”: “normal”, “loc”: 0, “scale”: 1, “log”: True}} where for a “distribution” of “normal”, “loc” is the center of the distribution and “scale” is the standard deviation, and for a “distribution” of “uniform”, “loc” is the lower bound and “scale” is the range of the distribution. If “log” is True, “loc” and “scale” refer to the base 10 exponent of the values. For example, a uniform distribution with loc=-4, scale=6 and log=True will sample values between 1e-4 and 1e2.

The Latin Hypercube sampling takes the parameter distributions and the number of runs and creates evenly spaced samples within the distribution bounds. Therefore, unlike the sampling of data, the same run number will only reproduce the same sampling results if the total number of runs is the same. This means that, if you are using parameter sampling, you should not reuse the filename to add more iterations to the stochastic ensemble by increasing the number of runs.
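The dependence of Latin Hypercube samples on the total number of runs can be seen in a short sketch. It assumes a centered criterion on a uniform distribution and ignores the random ordering of strata. The helper `centered_strata` is hypothetical and is not part of invert4geom or UQpy.

```python
def centered_strata(n_runs, loc, scale):
    """Centre of each of n_runs equal-width strata of a uniform distribution."""
    return [loc + scale * (i + 0.5) / n_runs for i in range(n_runs)]


# Base-10 exponents for a uniform distribution with loc=-4 and scale=6.
four_runs = centered_strata(4, loc=-4, scale=6)
# four_runs == [-3.25, -1.75, -0.25, 1.25]

six_runs = centered_strata(6, loc=-4, scale=6)
# six_runs == [-3.5, -2.5, -1.5, -0.5, 0.5, 1.5]
```

None of the four-run values appear among the six-run values, so previously saved runs would not belong to the larger design.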

Parameters:

runs (int) – number of inversion workflows to run
fname (str | None, optional) – file name to use as root to save each inversion's results to, by default None, in which case it is set to “tmp_{random.randint(0,999)}_stochastic_ensemble”.
sample_gravity (bool, optional) – choose to randomly sample the gravity data from a normal distribution with a mean of each data value and a standard deviation given by the column “uncert”, by default False
gravity_filter_width (float | None, optional) – the width in meters of a low-pass filter to apply to the gravity data after sampling, by default None
sample_constraints (bool, optional) – choose to randomly sample the constraint elevations from a normal distribution with a mean of each data value and a standard deviation given by the column “uncert”, by default False
starting_topography_parameter_dict (dict[str, Any] | None, optional) – parameters with their uncertainty distributions used for creating the starting topography model, by default None
regional_misfit_parameter_dict (dict[str, Any] | None, optional) – parameters with their uncertainty distributions used for estimating the regional component of the gravity misfit, by default None
parameter_dict (dict[str, Any] | None, optional) – parameters with their uncertainty distributions used in the inversion workflow, by default None
create_starting_topography (bool, optional) – choose to recreate the starting topography model, by default False
create_starting_prisms (bool, optional) – choose to recreate the starting prism model, by default False
calculate_starting_gravity (bool, optional) – choose to recalculate the starting gravity, by default False
calculate_regional_misfit (bool, optional) – choose to recalculate the regional gravity, by default False
regional_grav_kwargs (dict[str, Any] | None, optional) – kwargs passed to regional.regional_separation, by default None
starting_topography_kwargs (dict[str, Any] | None, optional) – kwargs passed to utils.create_topography, by default None
kwargs (Any)

Returns:

params (list[dict[str, typing.Any]]) – list of inversion parameters dictionaries with added key for the run number
grav_dfs (list[pandas.DataFrame]) – list of gravity dataframes from each inversion run
prism_dfs (list[pandas.DataFrame]) – list of prism dataframes from each inversion run
sampled_params (dict[str, typing.Any]) – dictionary of sampled parameter values from the Latin Hypercube sampling

Return type:

tuple[dict[str, Any], list[pandas.DataFrame], list[pandas.DataFrame], dict[str, Any]]

model_ensemble_stats(dataset, weights=None, region=None)[source]#

Given a dataset, calculate the cell-wise mean, standard deviation, and weighted mean and standard deviation of the variables.

Parameters:

dataset (xarray.Dataset) – dataset to perform cell-wise statistics on
weights (list | numpy.ndarray, optional) – weights to use in statistic calculations for each inversion topography, by default None
region (tuple[float, float, float, float], optional) – region to calculate statistics within, by default None

Returns:

Dataset with variables for the mean, standard deviation, weighted mean, and weighted standard deviation of the ensemble of inverted topographies.

Return type:

xarray.Dataset
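For a single grid cell, the four statistics named above can be sketched as follows. The formulas are an assumption: a population standard deviation and weights normalized by their sum. They are not taken from the invert4geom source, and the helper `cell_stats`, its dictionary keys, and the numbers are illustrative.

```python
import math


def cell_stats(values, weights=None):
    """Mean and std of one cell across ensemble members, optionally weighted."""
    n = len(values)
    mean = sum(values) / n
    # Population standard deviation (divide by n), an assumption of this sketch.
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    stats = {"mean": mean, "stdev": std}
    if weights is not None:
        total = sum(weights)
        w_mean = sum(w * v for w, v in zip(weights, values)) / total
        w_var = sum(w * (v - w_mean) ** 2 for w, v in zip(weights, values)) / total
        stats["weighted_mean"] = w_mean
        stats["weighted_stdev"] = math.sqrt(w_var)
    return stats


# One grid cell across three ensemble members, the third weighted double.
stats = cell_stats([100.0, 110.0, 120.0], weights=[1.0, 1.0, 2.0])
```

Here the unweighted mean is 110.0, while the weighted mean shifts to 112.5 toward the more heavily weighted member.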

merge_simulation_results(grids)[source]#

Merge a list of grids into a single dataset with variable names “run_<number>”, where <number> is the run number.

Parameters:

grids (list[xarray.DataArray]) – list of xarray grids

Returns:

dataset with a variable for each grid, with the variable name in the format “run_<number>”.

Return type:

xarray.Dataset

merged_stats(results, plot=True, constraints_df=None, weight_by='residual', region=None)[source]#

Use the outputs of the function uncertainty.full_workflow_uncertainty_loop to calculate the cell-wise statistics of the inversion ensemble and plot the resulting mean and standard deviation of the ensemble.

Parameters:

results (tuple[Any]) – list of lists of inversion results output from the function uncertainty.full_workflow_uncertainty_loop
plot (bool, optional) – show the resulting weighted mean and weighted standard deviation of the inversion ensemble, by default True
constraints_df (pandas.DataFrame, optional) – dataframe of constraint points to use for weighting the cell-wise statistics and for plotting, by default None
weight_by (str, optional) – choose to weight the cell-wise stats by either the RMS of the final residual gravity misfit of each inversion, or by the RMS between a priori topography measurements supplied by constraints_df and the inverted topography of each inversion, by default “residual”
region (tuple[float, float, float, float], optional) – region to calculate statistics within, by default None

Returns:

Dataset with variables for the mean, standard deviation, weighted mean, and weighted standard deviation of the ensemble of inverted topographies.

Return type:

xarray.Dataset