Sensitivity analysis workflow Part 2 (SALib driven workflow)

In response to requests for a cross-platform sensitivity analysis workflow, apsimNGpy has been integrated with SALib. As explained below

Sensitivity analysis follows a three-stage workflow triad:

Sampling
Model evaluation
Analysis

In this framework, sampling and analysis are handled by SALib https://salib.readthedocs.io/en/latest/api.html, while model evaluation is performed by APSIM.

Conceptual overview

The logical process begins by defining the problem, which includes identifying the input parameters to be varied, their ranges, and the model outputs of interest. This problem definition is then passed to SALib, which generates a structured set of parameter samples according to the selected sensitivity analysis method.

These sampled parameter sets are subsequently evaluated by APSIM. For each sample, APSIM executes a full simulation and produces the corresponding model outputs. Once all simulations have completed, the evaluated results are returned to SALib, where sensitivity indices are computed and summarized.

In simplified terms, the data flow is:

define the sensitivity problem and model inputs,
generate parameter samples using SALib,
evaluate each sample by running APSIM,
analyze the results using SALib.

Computational considerations

The model evaluation stage is by far the most computationally expensive component of the workflow. Each APSIM simulation may take several seconds or more to complete, and sensitivity analyses typically require hundreds to thousands of model evaluations.

To mitigate this computational cost, APSIM simulations are executed in parallel, allowing multiple parameter sets to be evaluated simultaneously. Parallel execution substantially reduces total runtime and makes large-scale sensitivity analysis feasible in practice.

The first step is to import the necessary libraries

from SALib.sample import saltelli
from SALib.analyze import sobol
from apsimNGpy.sensitivity.sensitivity import ConfigProblem, run_sensitivity

Define APSIM model inputs

params = {
       ".Simulations.Simulation.Field.Sow using a variable rule?Population": (2, 10),
       ".Simulations.Simulation.Field.Fertilise at sowing?Amount": (0, 300),
       # ".Simulations.Simulation.Field.Maize.CultivarFolder.Dekalb_XL82?[Leaf].Photosynthesis.RUE.FixedValue": (
       #     1.2, 2.2),
   }

Tip

The base parameter path and the specific parameter name or sub-path are separated using one of the following delimiters: ?, ::, or @.

The values associated with each parameter define the lower and upper bounds of the sampling space used during sensitivity analysis.

Initialize the problem

use the ConfigProblem problem configuration class to set the scene for the senstivity triad

runner = ConfigProblem(
    base_model="Maize",
    params=params,
    outputs=["Yield", "Maize.AboveGround.N"],
)

Generate samples

Since we are performing a Sobol’ sensitivity analysis, we need to generate samples using the Saltelli sampler from SAlib, as shown below.

param_values = saltelli.sample(runner.problem, 2 ** 4)

Sample matrix and evaluation

The variable param_values is a two-dimensional NumPy array. Inspecting its shape using param_values.shape shows that the array has dimensions 96 by 2.

This means that 96 parameter samples were generated, and each sample contains values for 2 model parameters.

—

Saltelli sampling behavior

The Saltelli sampler generates a fixed number of samples based on the number of input parameters and the base sample size. In this example:

the base sample size N is 2**4 (16),
the number of model inputs D is 2.

By default, the Saltelli sampler generates N * (2D + 2) samples. With the values above, this results in 96 samples.

If the option calc_second_order=False is used, the number of samples is reduced. In that case, the sampler generates N * (D + 2) samples, which significantly lowers the computational cost.

—

Running the APSIM model

The model evaluation step is performed using the evaluate method of the ConfigProblem object.

Y = runner.evaluate(param_values)

This method runs the APSIM model once for each row in param_values and collects the corresponding outputs.

—

Evaluation output

The variable Y is returned as a NumPy array. Its dimensionality depends on the number of outputs specified in ConfigProblem.

If a single output is selected, Y is a one-dimensional array.
If multiple outputs are selected, Y is a two-dimensional array.

In this example, two outputs were specified, so Y has shape (96, 2). Each row corresponds to one parameter sample, and each column corresponds to one model output.

In other words, the evaluation step applies APSIM to every sampled parameter set and returns the simulated results in a structured array that can be passed directly to SALib for analysis.

Perform Analysis

With the model outputs loaded into Python memory, the final step is to compute the sensitivity indices. In this example, we use sobol.analyze, which computes first-order, second-order, and total-order sensitivity indices.

Additional details on the Sobol analysis workflow are available here.

Si = [sobol.analyze(runner.problem, Y[:, i], print_to_console=True) for i in range(Y.ndim)]

Each element in the result is returned as a Python dictionary containing the keys S1, S2, ST, S1_conf, S2_conf, and ST_conf. The *_conf entries represent the confidence intervals associated with each sensitivity index, typically computed at a 95 percent confidence level.

To display all sensitivity indices at once, set the keyword argument print_to_console=True. Alternatively, individual indices can be accessed directly from the results object, as shown below. Using grain yield as example hence index 0

Si[0]['S1']

array([0.43932783, 0.84502346])

Amount which is Nitrogen fertilizer quantity have a higher first order effect than the planting population density.

we can also retrieve the total effect

Si[0]['S1']

array([0.44579877, 0.91875446])

The larger total-order effects relative to the first-order effects suggest potential interactions between the two factors

Si[0]['S2'][0,1]

np.float64(0.025099857475360032)

Si = [sobol.analyze(runner.problem, Y[:,i], print_to_console=True) for i in range(Y.ndim)]

     ST   ST_conf
Population  0.445799  0.447337
Amount      0.918754  0.492323
                  S1   S1_conf
Population  0.439328  0.471019
Amount      0.845023  0.589469
                          S2   S2_conf
[Population, Amount]  0.0251  0.338638
                  ST   ST_conf
Population  0.342162  0.386487
Amount      0.948013  0.527931
                  S1   S1_conf
Population  0.334605  0.493981
Amount      0.898494  0.579069
                           S2   S2_conf
[Population, Amount]  0.04358  0.397149

SALib provides utilities that allow each sensitivity index to be converted into a pandas DataFrame, which can then be used for further analysis or visualization.

total_y, s1_y, s2_y  = Si[0].to_df()
total_gn, s1_gn, s2_n  = Si[1].to_df()

print(total_y)

SALib provides an object-oriented interface that simplifies the sensitivity analysis triad of sampling, evaluation, and analysis. In apsimNGpy, this workflow is further streamlined by wrapping all three steps into a single method, run_sensitivity().

By keeping the initialized runner or problem definition constant, the complete sensitivity analysis demonstrated above can be executed in a concise and reproducible manner, as shown below.

Si_sobol = run_sensitivity(
    runner,
    method="sobol",
    N=2 ** 5,  # ← base sample size should be power of 2
    sample_options={
        "calc_second_order": True,
        "skip_values": 1024,
         "seed": 42,
    },
    analyze_options={
        "conf_level": 0.95,
        "num_resamples": 1000,
        "print_to_console": True,
        "calc_second_order": True,
    },
)

The returned object is an instance of the SALib.util.problem.ProblemSpec class. When evaluated in the Python console, this object displays a summary of the problem definition and the computed sensitivity indices, as shown below.

  Out[4]:
Samples:
    2 parameters: ['Population', 'Amount']
    96 samples
Outputs:
    2 outputs: ['Y1', 'Y2']
    96 evaluations
Analysis:
Y1:
                  ST   ST_conf
Population  0.308278  0.291488
Amount      1.089438  0.522208:
                  S1   S1_conf
Population  0.394278  0.374126
Amount      0.780433  0.644199:
                            S2   S2_conf
[Population, Amount] -0.024841  0.561135:
Y2:
                  ST   ST_conf
Population  0.230316  0.245983
Amount      1.073707  0.475261:
                  S1   S1_conf
Population  0.307134  0.290722
Amount      0.772897  0.628444:
                           S2   S2_conf
[Population, Amount] -0.06778  0.422351:

The presence of negative sensitivity indices usually indicates an insufficient sample size. Increasing the base sample size can help mitigate this issue.

Converting outputs to pandas

If we use to_df a list of Pandas Dataframes is returned

Key important attributes for samples, model results and analyses extraction available on the returned instance are shown below.

print( Si_sobol.samples)
print( Si_sobol.results)
print( Si_sobol.analysis)

Basic plotting functionality is also provided

Si_sobol.plot()

We can try another method known as Morris. The Morris method is typically used as a screening tool to identify influential parameters with relatively low computational cost. It is well suited for high-dimensional problems where the goal is to rank parameters rather than quantify precise sensitivities.

Si_morris = run_sensitivity(
      runner,
      method="morris", n_cores=10,
      sample_options={
          'seed': 42,
          "num_levels": 6,
          "optimal_trajectories": 6,
      },
      analyze_options={
          'conf_level': 0.95,
          "num_resamples": 1000,
          "print_to_console": True,
          'seed': 42
      },
  )

Out[7]:
 Samples:
     2 parameters: ['Population', 'Amount']
     18 samples
 Outputs:
     2 outputs: ['Y1', 'Y2']
     18 evaluations
 Analysis:
 Y1:
                       mu       mu_star         sigma  mu_star_conf
 Population  11394.831001  14289.237399  17359.167016  10309.394014
 Amount      28441.782977  29864.030018  31160.051327  22045.514695:
 Y2:
                    mu    mu_star      sigma  mu_star_conf
 Population  33.702220  35.548789  35.455297     23.199723
 Amount      78.486127  78.486127  73.559727     55.214639:

The final sensitivity analysis method demonstrated in this tutorial is the FAST method. FAST is a variance-based approach that estimates the influence of each input parameter by systematically varying inputs across the parameter space and analyzing the resulting model response.

si_fast = run_sensitivity(
    runner,
    method="fast",
    sample_options={
        "M": 2,

    },
    analyze_options={
        'conf_level': 0.95,
        "num_resamples": 1000,
        "print_to_console": True,
    },
)

Out[9]:
Samples:
    2 parameters: ['Population', 'Amount']
    130 samples
Outputs:
    2 outputs: ['Y1', 'Y2']
    130 evaluations
Analysis:
Y1:
                  S1        ST   S1_conf   ST_conf
Population  0.211503  0.332158  0.206649  0.165346
Amount      0.585945  0.742978  0.206981  0.163964:
Y2:
                  S1        ST   S1_conf   ST_conf
Population  0.151057  0.243790  0.208318  0.158650
Amount      0.708005  0.819993  0.200030  0.162489:

Ideally, the analysis could begin with the Morris method to screen out less influential factors, particularly when a large number of inputs are under consideration. However, even without this initial screening, the results are consistent across methods.

In this case, both FAST and Sobol analyses indicate that yield is more sensitive to nitrogen fertilizer rate than to population density.

To extend the analysis beyond the Sobol, Morris, and FAST methods, follow the same workflow demonstrated in the first example and consult the SALib documentation for additional sensitivity analysis techniques and their usage.

Added in version 1.0.0.

References

Iwanaga, T., Usher, W., & Herman, J. (2022). Toward SALib 2.0: Advancing the accessibility and interpretability of global sensitivity analyses. Socio-Environmental Systems Modelling, 4, 18155. doi:10.18174/sesmo.18155

Herman, J. and Usher, W. (2017) SALib: An open-source Python library for sensitivity analysis. Journal of Open Source Software, 2(9). doi:10.21105/joss.00097