invert4geom.uncertainty
=======================

.. py:module:: invert4geom.uncertainty


Classes
-------

.. autoapisummary::

   invert4geom.uncertainty.DiscreteUniform


Functions
---------

.. autoapisummary::

   invert4geom.uncertainty.create_lhc
   invert4geom.uncertainty.randomly_sample_data
   invert4geom.uncertainty.starting_topography_uncertainty
   invert4geom.uncertainty.equivalent_sources_uncertainty
   invert4geom.uncertainty.regional_misfit_uncertainty
   invert4geom.uncertainty.full_workflow_uncertainty_loop
   invert4geom.uncertainty.model_ensemble_stats
   invert4geom.uncertainty.merge_simulation_results
   invert4geom.uncertainty.merged_stats


Module Contents
---------------

.. py:class:: DiscreteUniform(loc = 0.0, scale = 1.0)

   Bases: :py:obj:`UQpy.distributions.DistributionDiscrete1D`


   Discrete uniform distribution.


.. py:function:: create_lhc(n_samples, parameter_dict, random_state = 1, criterion = 'centered')

   Given some parameter values and their expected distributions, create a Latin
   Hypercube with a given number of samples.

   :param n_samples: how many samples to make for each parameter
   :type n_samples: int
   :param parameter_dict: nested dictionary, with a dictionary of 'distribution', 'loc', 'scale' and
                          optionally 'log' for each parameter to be sampled. Distributions can be
                          'uniform' or 'normal'. For 'uniform', 'loc' is the lower bound and 'scale' is
                          the range of the distribution. 'loc' + 'scale' = upper bound. For 'normal',
                          'loc' is the center (mean) of the distribution and 'scale' is the standard
                          deviation. If 'log' is True, the provided 'loc' and 'scale' values are the base
                          10 exponents. For example, a uniform distribution with loc=-4, scale=6 and
                          log=True would sample values between 1e-4 and 1e2.
   :type parameter_dict: dict
   :param random_state: random state to use for sampling, by default 1
   :type random_state: int, optional
   :param criterion: criterion to use for sampling, by default "centered", options are "centered",
                     "random", "maximin", or "mincorrelation", which each relate to a criterion from
                     the Python package UQpy.
   :type criterion: str, optional

   :returns: nested dictionary with parameter names, distribution specifics, and sampled
             values
   :rtype: dict[dict[typing.Any]]
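
The meaning of 'loc', 'scale' and 'log' can be illustrated with a minimal, standard-library sketch of centered Latin Hypercube sampling. This is not the UQpy-based implementation behind `create_lhc`; the function name `centered_lhc_sketch` and the output key `"sampled_values"` are assumptions made for illustration only.

```python
import random
from statistics import NormalDist


def centered_lhc_sketch(n_samples, parameter_dict, random_state=1):
    """Centered Latin Hypercube sketch; NOT the invert4geom/UQpy implementation."""
    rng = random.Random(random_state)
    # Midpoints of n_samples equal-probability bins on the interval (0, 1).
    centers = [(i + 0.5) / n_samples for i in range(n_samples)]
    sampled = {}
    for name, spec in parameter_dict.items():
        probabilities = centers.copy()
        rng.shuffle(probabilities)  # pair the bins differently for each parameter
        if spec["distribution"] == "uniform":
            # 'loc' is the lower bound and 'scale' is the range.
            values = [spec["loc"] + p * spec["scale"] for p in probabilities]
        elif spec["distribution"] == "normal":
            # 'loc' is the mean and 'scale' is the standard deviation.
            normal = NormalDist(mu=spec["loc"], sigma=spec["scale"])
            values = [normal.inv_cdf(p) for p in probabilities]
        else:
            raise ValueError(f"unsupported distribution: {spec['distribution']}")
        if spec.get("log", False):
            # 'loc' and 'scale' were base 10 exponents, so convert back.
            values = [10.0**v for v in values]
        # "sampled_values" is a hypothetical key name used for this sketch.
        sampled[name] = {**spec, "sampled_values": values}
    return sampled


parameter_dict = {
    "solver_damping": {"distribution": "uniform", "loc": -4, "scale": 6, "log": True},
    "zref": {"distribution": "normal", "loc": 0, "scale": 100},
}
sampled = centered_lhc_sketch(4, parameter_dict)
```

With `log` set to True, every sampled `solver_damping` value in this sketch lies between 1e-4 and 1e2, matching the example in the parameter description above.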


.. py:function:: randomly_sample_data(seed, data_df, data_col, uncert_col)

   Given a dataframe with a data column and an uncertainty column, sample the data with
   a normal distribution within the uncertainty range. Note that this overwrites the
   data column with the newly sampled data.

   :param seed: random number generator seed
   :type seed: int
   :param data_df: dataframe with columns `data_col` and `uncert_col`
   :type data_df: pandas.DataFrame
   :param data_col: name of data column to sample
   :type data_col: str
   :param uncert_col: name of uncertainty column to sample within
   :type uncert_col: str

   :returns: dataframe with data column updated with sampled values
   :rtype: pandas.DataFrame
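
A minimal sketch of the sampling described above, using only the standard library. A plain dictionary of columns stands in for the `pandas.DataFrame`, and the function name `sample_column_sketch` and the column names are hypothetical. Unlike the description above, this sketch leaves its input untouched and returns a copy with the data column replaced.

```python
import random


def sample_column_sketch(seed, columns, data_col, uncert_col):
    """Draw each value from a normal distribution N(value, uncertainty)."""
    rng = random.Random(seed)
    sampled = dict(columns)  # shallow copy so the input mapping is not modified
    sampled[data_col] = [
        rng.gauss(value, sigma)
        for value, sigma in zip(columns[data_col], columns[uncert_col])
    ]
    return sampled


# Hypothetical column names, used only for this illustration.
observations = {"gravity_anomaly": [10.0, 12.5, 9.0], "uncert": [0.5, 0.5, 2.0]}
run_3a = sample_column_sketch(3, observations, "gravity_anomaly", "uncert")
run_3b = sample_column_sketch(3, observations, "gravity_anomaly", "uncert")
run_4 = sample_column_sketch(4, observations, "gravity_anomaly", "uncert")
```

Reusing a seed reproduces the same sampled column, while a different seed gives a different realization.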


.. py:function:: starting_topography_uncertainty(runs, sample_constraints = False, parameter_dict = None, plot = True, plot_region = None, true_topography = None, **kwargs)

   Create a stochastic ensemble of starting topographies by sampling the constraints or
   parameters within their respective distributions and find the cell-wise (weighted)
   statistics of the ensemble.

   :param runs: number of runs to perform
   :type runs: int
   :param sample_constraints: choose to sample the constraints from a normal distribution with a mean of each
                              constraint's depth and a standard deviation set by the `uncert` column, by
                              default False
   :type sample_constraints: bool, optional
   :param parameter_dict: dictionary of parameters passed to `create_topography` with the uncertainty
                          distributions defined, by default None
   :type parameter_dict: dict[str, typing.Any] | None, optional
   :param plot: show the results, by default True
   :type plot: bool, optional
   :param plot_region: clip the plot to a region, by default None
   :type plot_region: tuple[float, float, float, float] | None, optional
   :param true_topography: if the true topography is known, will make a plot comparing the results, by
                           default None
   :type true_topography: xarray.DataArray | None, optional

   :returns: * **stats_ds** (*xarray.Dataset*) -- a dataset with the cell-wise statistics of the ensemble of topographies.
             * **sampled_param_dict** (*dict[str, typing.Any]*) -- dictionary of sampled parameter values.


.. py:function:: equivalent_sources_uncertainty(runs, data, coords, grid_points, parameter_dict = None, region = None, plot = True, plot_region = None, true_gravity = None, deterministic_error = None, weight_by = None, **kwargs)

   Create a stochastic ensemble of regional gravity anomalies by sampling the
   constraints, gravity, or parameters within their respective distributions and
   calculate the cell-wise (weighted) statistics of the ensemble.

   :param runs: number of runs to perform
   :type runs: int
   :param parameter_dict: dictionary of parameters passed to `regional_separation` with the uncertainty
                          distributions defined, by default None
   :type parameter_dict: dict[str, typing.Any] | None, optional
   :param region: region to calculate statistics within, by default None
   :type region: tuple[float, float, float, float] | None, optional
   :param plot: show the results, by default True
   :type plot: bool, optional
   :param plot_region: clip the plot to a region, by default None
   :type plot_region: tuple[float, float, float, float] | None, optional
   :param true_gravity: if the true gravity is known, will make a plot comparing the results, by
                        default None
   :type true_gravity: xarray.DataArray | None, optional
   :param deterministic_error: if the deterministic error is known, will make a plot comparing the results, by
                               default None
   :type deterministic_error: xarray.DataArray | None, optional
   :param weight_by: how to weight the models, by default None
   :type weight_by: str | None, optional

   :returns: * **stats_ds** (*xarray.Dataset*) -- a dataset with the cell-wise statistics of the ensemble of regional gravity
             * **sampled_param_dict** (*dict[str, typing.Any]*) -- a dictionary of sampled parameter values.


.. py:function:: regional_misfit_uncertainty(runs, sample_gravity = False, parameter_dict = None, region = None, plot = True, plot_region = None, true_regional = None, weight_by = None, **kwargs)

   Create a stochastic ensemble of regional gravity anomalies by sampling the
   constraints, gravity, or parameters within their respective distributions and
   calculate the cell-wise (weighted) statistics of the ensemble.

   :param runs: number of runs to perform
   :type runs: int
   :param sample_gravity: choose to sample the gravity data from a normal distribution with a mean of each
                          point's value and a standard deviation set by the `uncert` column, by
                          default False
   :type sample_gravity: bool, optional
   :param parameter_dict: dictionary of parameters passed to `regional_separation` with the uncertainty
                          distributions defined, by default None
   :type parameter_dict: dict[str, typing.Any] | None, optional
   :param region: region to calculate statistics within, by default None
   :type region: tuple[float, float, float, float] | None, optional
   :param plot: show the results, by default True
   :type plot: bool, optional
   :param plot_region: clip the plot to a region, by default None
   :type plot_region: tuple[float, float, float, float] | None, optional
   :param true_regional: if the true regional misfit is known, will make a plot comparing the results, by
                         default None
   :type true_regional: xarray.DataArray | None, optional

   :returns: * **stats_ds** (*xarray.Dataset*) -- a dataset with the cell-wise statistics of the ensemble of regional gravity
             * **sampled_param_dict** (*dict[str, typing.Any]*) -- a dictionary of sampled parameter values.


.. py:function:: full_workflow_uncertainty_loop(runs, fname = None, sample_gravity = False, gravity_filter_width = None, sample_constraints = False, starting_topography_parameter_dict = None, regional_misfit_parameter_dict = None, parameter_dict = None, create_starting_topography = False, create_starting_prisms = False, calculate_starting_gravity = False, calculate_regional_misfit = False, regional_grav_kwargs = None, starting_topography_kwargs = None, **kwargs)

   Run a series of inversions (N=runs), and save results of
   each inversion to pickle files starting with `fname`. If files already
   exist, just return the loaded results instead of re-running the inversion.
   Choose which variables to include in the sampling and whether or not to
   run a damping value cross-validation for each inversion.

   Feed returned values into function `merged_stats` to compute
   cell-wise stats on the resulting ensemble of starting topography models,
   inverted topography models, and gravity anomalies.

   Sampling of data (gravity and constraints) uses the "uncert" column of each
   dataframe: every value is drawn from a normal distribution with the data value as
   the mean and the uncertainty value as the standard deviation. The randomness is
   controlled by a seed equal to the run number, so it changes at every run, and the
   same run always produces the same sampling. This allows the number of runs to be
   increased and the function to be run again with the same filename to continue the
   stochastic ensemble. This only works with data sampling, not parameter sampling.

   Sampling of parameter values is determined by 3 supplied dictionaries. The first,
   `parameter_dict`, can contain the parameters density_contrast, zref, and
   solver_damping. The other two, `starting_topography_parameter_dict` and
   `regional_misfit_parameter_dict`, can contain any parameters that are used in
   `utils.create_topography` and `regional.regional_separation` respectively. Any
   parameters in these 3 dictionaries will be sampled with a Latin Hypercube sampling
   technique and the sampled values will be passed to `inversion.run_inversion`. These
   dictionaries should be formatted as follows: `{"parameter_name": {"distribution":
   "normal", "loc": 0, "scale": 1, "log": True}}` where for a "distribution" of
   "normal", "loc" is the center of the distribution and "scale" is the standard
   deviation, and for a "distribution" of "uniform", "loc" is the lower bound and
   "scale" is the range of the distribution. If "log" is True, "loc" and "scale"
   refer to the base 10 exponent of the values. For example, a uniform distribution
   with loc=-4, scale=6 and log=True will sample values between 1e-4 and 1e2.

   The Latin Hypercube sampling takes the parameter distributions and the number of
   runs and creates evenly spaced samples within the distribution bounds. Therefore,
   unlike the sampling of data, the same run number will only reproduce the same
   sampling results if the total number of runs is the same. This means that, if you
   are using parameter sampling, you should not reuse the filename to add more
   iterations to the stochastic ensemble by increasing the number of runs.


   :param runs: number of inversion workflows to run
   :type runs: int
   :param fname: file name to use as root to save each inversion's results to, by default None and
                 is set to "tmp_{random.randint(0,999)}_stochastic_ensemble".
   :type fname: str | None, optional
   :param sample_gravity: choose to randomly sample the gravity data from a normal distribution with a
                          mean of each data value and a standard deviation given by the column "uncert",
                          by default False
   :type sample_gravity: bool, optional
   :param gravity_filter_width: the width in meters of a low-pass filter to apply to the gravity data after
                                sampling, by default None
   :type gravity_filter_width: float | None, optional
   :param sample_constraints: choose to randomly sample the constraint elevations from a normal distribution
                              with a mean of each data value and a standard deviation given by the column
                              "uncert", by default False
   :type sample_constraints: bool, optional
   :param starting_topography_parameter_dict: parameters with their uncertainty distributions used for creating the starting
                                              topography model, by default None
   :type starting_topography_parameter_dict: dict[str, typing.Any] | None, optional
   :param regional_misfit_parameter_dict: parameters with their uncertainty distributions used for estimating the regional
                                          component of the gravity misfit, by default None
   :type regional_misfit_parameter_dict: dict[str, typing.Any] | None, optional
   :param parameter_dict: parameters with their uncertainty distributions used in the inversion workflow,
                          by default None
   :type parameter_dict: dict[str, typing.Any] | None, optional
   :param create_starting_topography: choose to recreate the starting topography model, by default False
   :type create_starting_topography: bool, optional
   :param create_starting_prisms: choose to recreate the starting prism model, by default False
   :type create_starting_prisms: bool, optional
   :param calculate_starting_gravity: choose to recalculate the starting gravity, by default False
   :type calculate_starting_gravity: bool, optional
   :param calculate_regional_misfit: choose to recalculate the regional gravity, by default False
   :type calculate_regional_misfit: bool, optional
   :param regional_grav_kwargs: kwargs passed to :func:`.regional.regional_separation`, by default None
   :type regional_grav_kwargs: dict[str, typing.Any] | None, optional
   :param starting_topography_kwargs: kwargs passed to :func:`.utils.create_topography`, by default None
   :type starting_topography_kwargs: dict[str, typing.Any] | None, optional

   :returns: * **params** (*list[dict[str, typing.Any]]*) -- list of inversion parameter dictionaries, each with an added key for the run number
             * **grav_dfs** (*list[pandas.DataFrame]*) -- list of gravity dataframes from each inversion run
             * **prism_dfs** (*list[pandas.DataFrame]*) -- list of prism dataframes from each inversion run
             * **sampled_params** (*dict[str, typing.Any]*) -- dictionary of sampled parameter values from the Latin Hypercube sampling


.. py:function:: model_ensemble_stats(dataset, weights = None, region = None)

   Given a dataset, calculate the cell-wise mean, standard deviation, and weighted mean
   and standard deviation of the variables.

   :param dataset: dataset to perform cell-wise statistics on
   :type dataset: xarray.Dataset
   :param weights: weights to use in statistic calculations for each inversion topography, by
                   default None
   :type weights: list | numpy.ndarray, optional
   :param region: region to calculate statistics within, by default None
   :type region: tuple[float, float, float, float], optional

   :returns: Dataset with variables for the mean, standard deviation, weighted mean, and
             weighted standard deviation of the ensemble of inverted topographies.
   :rtype: xarray.Dataset
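
As a rough illustration of cell-wise statistics, the sketch below treats each ensemble member as a flat list of cell values instead of an `xarray` grid. The function name `cellwise_stats_sketch`, the output keys, and the use of population (not sample) standard deviations are assumptions; the library's own formulas may differ.

```python
import math


def cellwise_stats_sketch(runs, weights=None):
    """Per-cell mean/stdev across runs, plain and weighted (illustrative only)."""
    n_runs = len(runs)
    if weights is None:
        weights = [1.0] * n_runs  # equal weights reduce to the plain statistics
    total = sum(weights)
    stats = {"mean": [], "stdev": [], "weighted_mean": [], "weighted_stdev": []}
    for cell in zip(*runs):  # one tuple of values per grid cell
        mean = sum(cell) / n_runs
        variance = sum((v - mean) ** 2 for v in cell) / n_runs
        w_mean = sum(w * v for w, v in zip(weights, cell)) / total
        w_variance = sum(w * (v - w_mean) ** 2 for w, v in zip(weights, cell)) / total
        stats["mean"].append(mean)
        stats["stdev"].append(math.sqrt(variance))
        stats["weighted_mean"].append(w_mean)
        stats["weighted_stdev"].append(math.sqrt(w_variance))
    return stats


# Three runs of a two-cell "grid"; the third run gets twice the weight.
ensemble = [[1.0, 10.0], [3.0, 10.0], [5.0, 10.0]]
stats = cellwise_stats_sketch(ensemble, weights=[1.0, 1.0, 2.0])
```

The second cell is identical in every run, so both of its standard deviations are zero, while the first cell's weighted mean is pulled toward the more heavily weighted third run.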


.. py:function:: merge_simulation_results(grids)

   Merge a list of grids into a single dataset with variable names "run_<number>",
   where <number> is the run number.

   :param grids: list of xarray grids
   :type grids: list[xarray.DataArray]

   :returns: dataset with a variable for each grid, with the variable
             name in the format "run_<number>".
   :rtype: xarray.Dataset
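
The naming scheme can be sketched with a plain dictionary standing in for the `xarray.Dataset`. The function name `merge_grids_sketch` is hypothetical, and numbering the runs from 0 in list order is an assumption, since the description above does not state the starting index.

```python
def merge_grids_sketch(grids):
    """Key each grid as "run_<number>", numbering from 0 (an assumption)."""
    return {f"run_{number}": grid for number, grid in enumerate(grids)}


# Nested lists stand in for xarray.DataArray grids.
grids = [[[1.0, 2.0]], [[1.5, 2.5]], [[0.5, 1.5]]]
merged = merge_grids_sketch(grids)
```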


.. py:function:: merged_stats(results, plot = True, constraints_df = None, weight_by = 'residual', region = None)

   Use the outputs of the function `uncertainty.full_workflow_uncertainty_loop` to
   calculate the cell-wise statistics of the inversion ensemble and plot the resulting
   mean and standard deviation of the ensemble.

   :param results: list of lists of inversion results output from the function
                   `uncertainty.full_workflow_uncertainty_loop`
   :type results: tuple[typing.Any]
   :param plot: show the resulting weighted mean and weighted standard deviation of the
                inversion ensemble, by default True
   :type plot: bool, optional
   :param constraints_df: dataframe of constraint points to use for weighting the cell-wise statistics and
                          for plotting, by default None
   :type constraints_df: pandas.DataFrame, optional
   :param weight_by: choose to weight the cell-wise stats by either the RMS of the final residual
                     gravity misfit of each inversion, or by the RMS between a priori topography
                     measurements supplied by constraints_df and the inverted topography of each
                     inversion, by default "residual"
   :type weight_by: str, optional
   :param region: region to calculate statistics within, by default None
   :type region: tuple[float, float, float, float], optional

   :returns: Dataset with variables for the mean, standard deviation, weighted mean, and
             weighted standard deviation of the ensemble of inverted topographies.
   :rtype: xarray.Dataset
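
One plausible way to turn a per-run RMS into ensemble weights is to weight each run by the inverse of its RMS, so that runs with a smaller misfit count more. The description above does not state how `merged_stats` converts RMS values into weights, so the inverse-RMS rule and the helper names `rms` and `inverse_rms_weights` below are assumptions for illustration only.

```python
import math


def rms(values):
    """Root mean square of a sequence of numbers."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def inverse_rms_weights(residuals_per_run):
    """Normalized weights proportional to 1 / RMS (an assumed weighting rule)."""
    # A run with an RMS of exactly zero would need special handling here.
    inverse = [1.0 / rms(residuals) for residuals in residuals_per_run]
    total = sum(inverse)
    return [w / total for w in inverse]


# Hypothetical residual gravity misfits for two inversion runs.
residuals_per_run = [[1.0, -1.0], [2.0, -2.0]]
weights = inverse_rms_weights(residuals_per_run)
```

Here the first run has half the RMS of the second, so it receives twice the weight.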


