.. _sens_salib:

Sensitivity analysis workflow Part 2 (SALib driven workflow)
============================================================
In response to requests for a cross-platform sensitivity analysis
workflow, apsimNGpy has been integrated with SALib. As explained below

Sensitivity analysis follows a three-stage workflow triad:

1. **Sampling**
2. **Model evaluation**
3. **Analysis**

In this framework, **sampling** and **analysis** are handled by
`SALib` https://salib.readthedocs.io/en/latest/api.html, while **model evaluation** is performed by **APSIM**.

Conceptual overview
-------------------

The logical process begins by **defining the problem**, which includes
identifying the input parameters to be varied, their ranges, and the
model outputs of interest. This problem definition is then passed to
SALib, which generates a structured set of parameter samples according
to the selected sensitivity analysis method.

These sampled parameter sets are subsequently **evaluated by APSIM**.
For each sample, APSIM executes a full simulation and produces the
corresponding model outputs. Once all simulations have completed, the
evaluated results are returned to SALib, where sensitivity indices are
computed and summarized.

In simplified terms, the data flow is:

- define the sensitivity problem and model inputs,
- generate parameter samples using SALib,
- evaluate each sample by running APSIM,
- analyze the results using SALib.


Computational considerations
----------------------------

The **model evaluation stage** is by far the most computationally
expensive component of the workflow. Each APSIM simulation may take
several seconds or more to complete, and sensitivity analyses typically
require hundreds to thousands of model evaluations.

To mitigate this computational cost, APSIM simulations are executed
**in parallel**, allowing multiple parameter sets to be evaluated
simultaneously. Parallel execution substantially reduces total runtime
and makes large-scale sensitivity analysis feasible in practice.

The first step is to import the necessary libraries

.. code-block:: python

    from SALib.sample import saltelli
    from SALib.analyze import sobol
    from apsimNGpy.sensitivity.sensitivity import ConfigProblem, run_sensitivity


Define APSIM model inputs
--------------------------

.. code-block:: python

     params = {
            ".Simulations.Simulation.Field.Sow using a variable rule?Population": (2, 10),
            ".Simulations.Simulation.Field.Fertilise at sowing?Amount": (0, 300),
            # ".Simulations.Simulation.Field.Maize.CultivarFolder.Dekalb_XL82?[Leaf].Photosynthesis.RUE.FixedValue": (
            #     1.2, 2.2),
        }
.. tip::

   The **base parameter path** and the **specific parameter name or sub-path**
   are separated using one of the following delimiters: ``?``, ``::``, or ``@``.

   The values associated with each parameter define the **lower and upper
   bounds** of the sampling space used during sensitivity analysis.

Initialize the problem
--------------------------
use the ``ConfigProblem`` problem configuration class to set the scene for the senstivity triad

.. code-block:: python

        runner = ConfigProblem(
            base_model="Maize",
            params=params,
            outputs=["Yield", "Maize.AboveGround.N"],
        )

Generate samples
----------------
Since we are performing a Sobol’ sensitivity analysis, we need to generate samples using the Saltelli sampler from SAlib, as shown below.

.. code-block:: python

    param_values = saltelli.sample(runner.problem, 2 ** 4)

Sample matrix and evaluation
============================

The variable ``param_values`` is a two-dimensional NumPy array.
Inspecting its shape using ``param_values.shape`` shows that the array
has dimensions **96 by 2**.

This means that **96 parameter samples** were generated, and each sample
contains values for **2 model parameters**.

---

Saltelli sampling behavior
--------------------------

The Saltelli sampler generates a fixed number of samples based on the
number of input parameters and the base sample size. In this example:

- the base sample size ``N`` is ``2**4`` (16),
- the number of model inputs ``D`` is 2.

By default, the Saltelli sampler generates ``N * (2D + 2)`` samples.
With the values above, this results in **96 samples**.

If the option ``calc_second_order=False`` is used, the number of samples
is reduced. In that case, the sampler generates ``N * (D + 2)`` samples,
which significantly lowers the computational cost.

---

Running the APSIM model
-----------------------

The model evaluation step is performed using the ``evaluate`` method of
the ``ConfigProblem`` object.

.. code-block:: python

   Y = runner.evaluate(param_values)

This method runs the APSIM model once for **each row** in
``param_values`` and collects the corresponding outputs.

---

Evaluation output
-----------------

The variable ``Y`` is returned as a NumPy array. Its dimensionality
depends on the number of outputs specified in ``ConfigProblem``.

- If a single output is selected, ``Y`` is a one-dimensional array.
- If multiple outputs are selected, ``Y`` is a two-dimensional array.

In this example, two outputs were specified, so ``Y`` has shape
``(96, 2)``. Each row corresponds to one parameter sample, and each
column corresponds to one model output.

In other words, the evaluation step applies APSIM to every sampled
parameter set and returns the simulated results in a structured array
that can be passed directly to SALib for analysis.

Perform Analysis
--------------------
With the model outputs loaded into Python memory, the final step is to
compute the sensitivity indices. In this example, we use
``sobol.analyze``, which computes first-order, second-order, and
total-order sensitivity indices.

Additional details on the Sobol analysis workflow are available
`here <https://salib.readthedocs.io/en/latest/user_guide/basics.html#an-example>`_.

.. code-block:: python

   Si = [sobol.analyze(runner.problem, Y[:, i], print_to_console=True) for i in range(Y.ndim)]

Each element in the result is returned as a Python dictionary containing
the keys ``S1``, ``S2``, ``ST``, ``S1_conf``, ``S2_conf``, and ``ST_conf``.
The ``*_conf`` entries represent the confidence intervals associated with
each sensitivity index, typically computed at a 95 percent confidence
level.

To display all sensitivity indices at once, set the keyword argument
``print_to_console=True``. Alternatively, individual indices can be
accessed directly from the results object, as shown below. Using grain yield as example hence index 0

.. code-block:: python

    Si[0]['S1']

.. code-block:: none

   array([0.43932783, 0.84502346])

Amount which is Nitrogen fertilizer quantity have a higher first order effect than the planting population density.

we can also retrieve the total effect

.. code-block:: python

   Si[0]['S1']

.. code-block:: python

  array([0.44579877, 0.91875446])

The larger total-order effects relative to the first-order effects
suggest potential interactions between the two factors

.. code-block:: python

    Si[0]['S2'][0,1]

.. code-block:: none

   np.float64(0.025099857475360032)

.. code-block:: python

 Si = [sobol.analyze(runner.problem, Y[:,i], print_to_console=True) for i in range(Y.ndim)]

.. code-block:: none

         ST   ST_conf
    Population  0.445799  0.447337
    Amount      0.918754  0.492323
                      S1   S1_conf
    Population  0.439328  0.471019
    Amount      0.845023  0.589469
                              S2   S2_conf
    [Population, Amount]  0.0251  0.338638
                      ST   ST_conf
    Population  0.342162  0.386487
    Amount      0.948013  0.527931
                      S1   S1_conf
    Population  0.334605  0.493981
    Amount      0.898494  0.579069
                               S2   S2_conf
    [Population, Amount]  0.04358  0.397149

SALib provides utilities that allow each sensitivity index to be
converted into a pandas DataFrame, which can then be used for further
analysis or visualization.

.. code-block:: python

  total_y, s1_y, s2_y  = Si[0].to_df()
  total_gn, s1_gn, s2_n  = Si[1].to_df()

.. code-block:: python

   print(total_y)

.. code-block:: python
                       ST   ST_conf
    Population  0.445799  0.447337
    Amount      0.918754  0.492323

SALib provides an object-oriented interface that simplifies the
sensitivity analysis triad of sampling, evaluation, and analysis.
In apsimNGpy, this workflow is further streamlined by wrapping all
three steps into a single method,
:meth:`~apsimNGpy.sensitivity.sensitivity.run_sensitivity`.

By keeping the initialized runner or problem definition constant,
the complete sensitivity analysis demonstrated above can be executed
in a concise and reproducible manner, as shown below.

.. code-block:: python

    Si_sobol = run_sensitivity(
        runner,
        method="sobol",
        N=2 ** 5,  # ← base sample size should be power of 2
        sample_options={
            "calc_second_order": True,
            "skip_values": 1024,
             "seed": 42,
        },
        analyze_options={
            "conf_level": 0.95,
            "num_resamples": 1000,
            "print_to_console": True,
            "calc_second_order": True,
        },
    )


The returned object is an instance of the
``SALib.util.problem.ProblemSpec`` class. When evaluated in the Python
console, this object displays a summary of the problem definition and
the computed sensitivity indices, as shown below.

.. code-block:: none

      Out[4]:
    Samples:
        2 parameters: ['Population', 'Amount']
        96 samples
    Outputs:
        2 outputs: ['Y1', 'Y2']
        96 evaluations
    Analysis:
    Y1:
                      ST   ST_conf
    Population  0.308278  0.291488
    Amount      1.089438  0.522208:
                      S1   S1_conf
    Population  0.394278  0.374126
    Amount      0.780433  0.644199:
                                S2   S2_conf
    [Population, Amount] -0.024841  0.561135:
    Y2:
                      ST   ST_conf
    Population  0.230316  0.245983
    Amount      1.073707  0.475261:
                      S1   S1_conf
    Population  0.307134  0.290722
    Amount      0.772897  0.628444:
                               S2   S2_conf
    [Population, Amount] -0.06778  0.422351:

The presence of negative sensitivity indices usually indicates an
insufficient sample size. Increasing the base sample size can help
mitigate this issue.

Converting outputs to pandas
----------------------------
If we use `to_df` a list of Pandas Dataframes is returned

.. code-block:: none
     [[                  ST   ST_conf
      Population  0.308278  0.291488
      Amount      1.089438  0.522208,
                        S1   S1_conf
      Population  0.394278  0.374126
      Amount      0.780433  0.644199,
                                  S2   S2_conf
      [Population, Amount] -0.024841  0.561135],
     [                  ST   ST_conf
      Population  0.230316  0.245983
      Amount      1.073707  0.475261,
                        S1   S1_conf
      Population  0.307134  0.290722
      Amount      0.772897  0.628444,
                                 S2   S2_conf
      [Population, Amount] -0.06778  0.422351]]

Key important attributes for  samples, model results and analyses extraction available on the returned instance are shown below.

.. code-block:: python

    print( Si_sobol.samples)
    print( Si_sobol.results)
    print( Si_sobol.analysis)


Basic plotting functionality is also provided

.. code-block:: python

    Si_sobol.plot()

We can try another method known as Morris.   The Morris method is typically used as a *screening tool* to identify influential
parameters with relatively low computational cost. It is well suited for high-dimensional
problems where the goal is to rank parameters rather than quantify precise sensitivities.

.. code-block:: python

  Si_morris = run_sensitivity(
        runner,
        method="morris", n_cores=10,
        sample_options={
            'seed': 42,
            "num_levels": 6,
            "optimal_trajectories": 6,
        },
        analyze_options={
            'conf_level': 0.95,
            "num_resamples": 1000,
            "print_to_console": True,
            'seed': 42
        },
    )
.. code-block:: none

   Out[7]:
    Samples:
        2 parameters: ['Population', 'Amount']
        18 samples
    Outputs:
        2 outputs: ['Y1', 'Y2']
        18 evaluations
    Analysis:
    Y1:
                          mu       mu_star         sigma  mu_star_conf
    Population  11394.831001  14289.237399  17359.167016  10309.394014
    Amount      28441.782977  29864.030018  31160.051327  22045.514695:
    Y2:
                       mu    mu_star      sigma  mu_star_conf
    Population  33.702220  35.548789  35.455297     23.199723
    Amount      78.486127  78.486127  73.559727     55.214639:

The final sensitivity analysis method demonstrated in this tutorial is
the FAST method. FAST is a variance-based approach that estimates the
influence of each input parameter by systematically varying inputs
across the parameter space and analyzing the resulting model response.

.. code-block:: python

    si_fast = run_sensitivity(
        runner,
        method="fast",
        sample_options={
            "M": 2,

        },
        analyze_options={
            'conf_level': 0.95,
            "num_resamples": 1000,
            "print_to_console": True,
        },
    )

.. code-block:: none

    Out[9]:
    Samples:
        2 parameters: ['Population', 'Amount']
        130 samples
    Outputs:
        2 outputs: ['Y1', 'Y2']
        130 evaluations
    Analysis:
    Y1:
                      S1        ST   S1_conf   ST_conf
    Population  0.211503  0.332158  0.206649  0.165346
    Amount      0.585945  0.742978  0.206981  0.163964:
    Y2:
                      S1        ST   S1_conf   ST_conf
    Population  0.151057  0.243790  0.208318  0.158650
    Amount      0.708005  0.819993  0.200030  0.162489:

Ideally, the analysis could begin with the Morris method to screen out
less influential factors, particularly when a large number of inputs
are under consideration. However, even without this initial screening,
the results are consistent across methods.

In this case, both FAST and Sobol analyses indicate that yield is more
sensitive to nitrogen fertilizer rate than to population density.

To extend the analysis beyond the Sobol, Morris, and FAST methods, follow
the same workflow demonstrated in the first example and consult the
SALib documentation for additional sensitivity analysis techniques and
their usage.

.. versionadded:: 1.0.0

References

Iwanaga, T., Usher, W., & Herman, J. (2022). Toward SALib 2.0: Advancing the accessibility and interpretability of global sensitivity analyses. Socio-Environmental Systems Modelling, 4, 18155. doi:10.18174/sesmo.18155

Herman, J. and Usher, W. (2017) SALib: An open-source Python library for sensitivity analysis. Journal of Open Source Software, 2(9). doi:10.21105/joss.00097