Sensitivity analysis workflow Part 2 (SALib driven workflow)
In response to requests for a cross-platform sensitivity analysis workflow, apsimNGpy has been integrated with SALib. As explained below
Sensitivity analysis follows a three-stage workflow triad:
Sampling
Model evaluation
Analysis
In this framework, sampling and analysis are handled by
SALib https://salib.readthedocs.io/en/latest/api.html, while model evaluation is performed by APSIM.
Conceptual overview
The logical process begins by defining the problem, which includes identifying the input parameters to be varied, their ranges, and the model outputs of interest. This problem definition is then passed to SALib, which generates a structured set of parameter samples according to the selected sensitivity analysis method.
These sampled parameter sets are subsequently evaluated by APSIM. For each sample, APSIM executes a full simulation and produces the corresponding model outputs. Once all simulations have completed, the evaluated results are returned to SALib, where sensitivity indices are computed and summarized.
In simplified terms, the data flow is:
define the sensitivity problem and model inputs,
generate parameter samples using SALib,
evaluate each sample by running APSIM,
analyze the results using SALib.
Computational considerations
The model evaluation stage is by far the most computationally expensive component of the workflow. Each APSIM simulation may take several seconds or more to complete, and sensitivity analyses typically require hundreds to thousands of model evaluations.
To mitigate this computational cost, APSIM simulations are executed in parallel, allowing multiple parameter sets to be evaluated simultaneously. Parallel execution substantially reduces total runtime and makes large-scale sensitivity analysis feasible in practice.
The first step is to import the necessary libraries
from SALib.sample import saltelli
from SALib.analyze import sobol
from apsimNGpy.sensitivity.sensitivity import ConfigProblem, run_sensitivity
Define APSIM model inputs
params = {
".Simulations.Simulation.Field.Sow using a variable rule?Population": (2, 10),
".Simulations.Simulation.Field.Fertilise at sowing?Amount": (0, 300),
# ".Simulations.Simulation.Field.Maize.CultivarFolder.Dekalb_XL82?[Leaf].Photosynthesis.RUE.FixedValue": (
# 1.2, 2.2),
}
Tip
The base parameter path and the specific parameter name or sub-path
are separated using one of the following delimiters: ?, ::, or @.
The values associated with each parameter define the lower and upper bounds of the sampling space used during sensitivity analysis.
Initialize the problem
use the ConfigProblem problem configuration class to set the scene for the senstivity triad
runner = ConfigProblem(
base_model="Maize",
params=params,
outputs=["Yield", "Maize.AboveGround.N"],
)
Generate samples
Since we are performing a Sobol’ sensitivity analysis, we need to generate samples using the Saltelli sampler from SAlib, as shown below.
param_values = saltelli.sample(runner.problem, 2 ** 4)
Sample matrix and evaluation
The variable param_values is a two-dimensional NumPy array.
Inspecting its shape using param_values.shape shows that the array
has dimensions 96 by 2.
This means that 96 parameter samples were generated, and each sample contains values for 2 model parameters.
—
Saltelli sampling behavior
The Saltelli sampler generates a fixed number of samples based on the number of input parameters and the base sample size. In this example:
the base sample size
Nis2**4(16),the number of model inputs
Dis 2.
By default, the Saltelli sampler generates N * (2D + 2) samples.
With the values above, this results in 96 samples.
If the option calc_second_order=False is used, the number of samples
is reduced. In that case, the sampler generates N * (D + 2) samples,
which significantly lowers the computational cost.
—
Running the APSIM model
The model evaluation step is performed using the evaluate method of
the ConfigProblem object.
Y = runner.evaluate(param_values)
This method runs the APSIM model once for each row in
param_values and collects the corresponding outputs.
—
Evaluation output
The variable Y is returned as a NumPy array. Its dimensionality
depends on the number of outputs specified in ConfigProblem.
If a single output is selected,
Yis a one-dimensional array.If multiple outputs are selected,
Yis a two-dimensional array.
In this example, two outputs were specified, so Y has shape
(96, 2). Each row corresponds to one parameter sample, and each
column corresponds to one model output.
In other words, the evaluation step applies APSIM to every sampled parameter set and returns the simulated results in a structured array that can be passed directly to SALib for analysis.
Perform Analysis
With the model outputs loaded into Python memory, the final step is to
compute the sensitivity indices. In this example, we use
sobol.analyze, which computes first-order, second-order, and
total-order sensitivity indices.
Additional details on the Sobol analysis workflow are available here.
Si = [sobol.analyze(runner.problem, Y[:, i], print_to_console=True) for i in range(Y.ndim)]
Each element in the result is returned as a Python dictionary containing
the keys S1, S2, ST, S1_conf, S2_conf, and ST_conf.
The *_conf entries represent the confidence intervals associated with
each sensitivity index, typically computed at a 95 percent confidence
level.
To display all sensitivity indices at once, set the keyword argument
print_to_console=True. Alternatively, individual indices can be
accessed directly from the results object, as shown below. Using grain yield as example hence index 0
Si[0]['S1']
array([0.43932783, 0.84502346])
Amount which is Nitrogen fertilizer quantity have a higher first order effect than the planting population density.
we can also retrieve the total effect
Si[0]['S1']
array([0.44579877, 0.91875446])
The larger total-order effects relative to the first-order effects suggest potential interactions between the two factors
Si[0]['S2'][0,1]
np.float64(0.025099857475360032)
Si = [sobol.analyze(runner.problem, Y[:,i], print_to_console=True) for i in range(Y.ndim)]
ST ST_conf
Population 0.445799 0.447337
Amount 0.918754 0.492323
S1 S1_conf
Population 0.439328 0.471019
Amount 0.845023 0.589469
S2 S2_conf
[Population, Amount] 0.0251 0.338638
ST ST_conf
Population 0.342162 0.386487
Amount 0.948013 0.527931
S1 S1_conf
Population 0.334605 0.493981
Amount 0.898494 0.579069
S2 S2_conf
[Population, Amount] 0.04358 0.397149
SALib provides utilities that allow each sensitivity index to be converted into a pandas DataFrame, which can then be used for further analysis or visualization.
total_y, s1_y, s2_y = Si[0].to_df()
total_gn, s1_gn, s2_n = Si[1].to_df()
print(total_y)
SALib provides an object-oriented interface that simplifies the
sensitivity analysis triad of sampling, evaluation, and analysis.
In apsimNGpy, this workflow is further streamlined by wrapping all
three steps into a single method,
run_sensitivity().
By keeping the initialized runner or problem definition constant, the complete sensitivity analysis demonstrated above can be executed in a concise and reproducible manner, as shown below.
Si_sobol = run_sensitivity(
runner,
method="sobol",
N=2 ** 5, # ← base sample size should be power of 2
sample_options={
"calc_second_order": True,
"skip_values": 1024,
"seed": 42,
},
analyze_options={
"conf_level": 0.95,
"num_resamples": 1000,
"print_to_console": True,
"calc_second_order": True,
},
)
The returned object is an instance of the
SALib.util.problem.ProblemSpec class. When evaluated in the Python
console, this object displays a summary of the problem definition and
the computed sensitivity indices, as shown below.
Out[4]:
Samples:
2 parameters: ['Population', 'Amount']
96 samples
Outputs:
2 outputs: ['Y1', 'Y2']
96 evaluations
Analysis:
Y1:
ST ST_conf
Population 0.308278 0.291488
Amount 1.089438 0.522208:
S1 S1_conf
Population 0.394278 0.374126
Amount 0.780433 0.644199:
S2 S2_conf
[Population, Amount] -0.024841 0.561135:
Y2:
ST ST_conf
Population 0.230316 0.245983
Amount 1.073707 0.475261:
S1 S1_conf
Population 0.307134 0.290722
Amount 0.772897 0.628444:
S2 S2_conf
[Population, Amount] -0.06778 0.422351:
The presence of negative sensitivity indices usually indicates an insufficient sample size. Increasing the base sample size can help mitigate this issue.
Converting outputs to pandas
If we use to_df a list of Pandas Dataframes is returned
Key important attributes for samples, model results and analyses extraction available on the returned instance are shown below.
print( Si_sobol.samples)
print( Si_sobol.results)
print( Si_sobol.analysis)
Basic plotting functionality is also provided
Si_sobol.plot()
We can try another method known as Morris. The Morris method is typically used as a screening tool to identify influential parameters with relatively low computational cost. It is well suited for high-dimensional problems where the goal is to rank parameters rather than quantify precise sensitivities.
Si_morris = run_sensitivity(
runner,
method="morris", n_cores=10,
sample_options={
'seed': 42,
"num_levels": 6,
"optimal_trajectories": 6,
},
analyze_options={
'conf_level': 0.95,
"num_resamples": 1000,
"print_to_console": True,
'seed': 42
},
)
Out[7]:
Samples:
2 parameters: ['Population', 'Amount']
18 samples
Outputs:
2 outputs: ['Y1', 'Y2']
18 evaluations
Analysis:
Y1:
mu mu_star sigma mu_star_conf
Population 11394.831001 14289.237399 17359.167016 10309.394014
Amount 28441.782977 29864.030018 31160.051327 22045.514695:
Y2:
mu mu_star sigma mu_star_conf
Population 33.702220 35.548789 35.455297 23.199723
Amount 78.486127 78.486127 73.559727 55.214639:
The final sensitivity analysis method demonstrated in this tutorial is the FAST method. FAST is a variance-based approach that estimates the influence of each input parameter by systematically varying inputs across the parameter space and analyzing the resulting model response.
si_fast = run_sensitivity(
runner,
method="fast",
sample_options={
"M": 2,
},
analyze_options={
'conf_level': 0.95,
"num_resamples": 1000,
"print_to_console": True,
},
)
Out[9]:
Samples:
2 parameters: ['Population', 'Amount']
130 samples
Outputs:
2 outputs: ['Y1', 'Y2']
130 evaluations
Analysis:
Y1:
S1 ST S1_conf ST_conf
Population 0.211503 0.332158 0.206649 0.165346
Amount 0.585945 0.742978 0.206981 0.163964:
Y2:
S1 ST S1_conf ST_conf
Population 0.151057 0.243790 0.208318 0.158650
Amount 0.708005 0.819993 0.200030 0.162489:
Ideally, the analysis could begin with the Morris method to screen out less influential factors, particularly when a large number of inputs are under consideration. However, even without this initial screening, the results are consistent across methods.
In this case, both FAST and Sobol analyses indicate that yield is more sensitive to nitrogen fertilizer rate than to population density.
To extend the analysis beyond the Sobol, Morris, and FAST methods, follow the same workflow demonstrated in the first example and consult the SALib documentation for additional sensitivity analysis techniques and their usage.
Added in version 1.0.0.
References
Iwanaga, T., Usher, W., & Herman, J. (2022). Toward SALib 2.0: Advancing the accessibility and interpretability of global sensitivity analyses. Socio-Environmental Systems Modelling, 4, 18155. doi:10.18174/sesmo.18155
Herman, J. and Usher, W. (2017) SALib: An open-source Python library for sensitivity analysis. Journal of Open Source Software, 2(9). doi:10.21105/joss.00097