Title: | Model Selection of PERMANOVA Models Using AICc |
---|---|
Description: | Provides tools for model selection and model averaging of PERMANOVA models using the Akaike Information Criterion corrected for small sample sizes (AICc) and information-theoretic principles. The package is built around the PERMANOVA analysis from the 'vegan' package and provides a streamlined workflow for generating and comparing models, obtaining model weights, and summarizing results using model averaging approaches. The methods implemented in this package are based on the practical information-theoretic approach described by Burnham, K. P. and Anderson, D. R. (2002) (<doi:10.1007/b97636>). |
Authors: | Derek Corcoran [aut, cre] |
Maintainer: | Derek Corcoran <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.0.3 |
Built: | 2025-02-09 05:32:46 UTC |
Source: | https://github.com/sustainscapes/aiccperm |
This function calculates the Akaike Information Criterion corrected for small sample sizes (AICc) for a permutational multivariate analysis of variance (PERMANOVA) model. The AICc is a modified version of the Akaike Information Criterion (AIC) that is more appropriate for small sample sizes and high-dimensional models.
AICc_permanova2(adonis2_model)
adonis2_model |
An object of class adonis2 from the vegan package |
The AICc calculation for a PERMANOVA model is:
$$\mathrm{AICc} = \mathrm{AIC} + \frac{2k(k + 1)}{n - k - 1}$$
where AIC is the Akaike Information Criterion, k is the number of parameters in the model (excluding the intercept), and n is the number of observations.
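As a rough illustration of the correction, the sketch below applies the standard small-sample form, AICc = AIC + 2k(k + 1) / (n - k - 1), to an AIC value that has already been computed. The helper name `aicc_from_aic` is hypothetical and is not part of the package; how AICc_permanova2 derives the AIC from an adonis2 object is not shown here.

```r
# Hypothetical helper, not part of the package: applies the standard
# small-sample correction to an already-computed AIC value.
aicc_from_aic <- function(aic, k, n) {
  # The correction term is only defined when n - k - 1 is positive
  stopifnot(n - k - 1 > 0)
  aic + (2 * k * (k + 1)) / (n - k - 1)
}

# With AIC = 100, k = 3 parameters and n = 20 observations,
# the correction term is 2 * 3 * 4 / 16 = 1.5
aicc_from_aic(aic = 100, k = 3, n = 20)
```

The correction shrinks towards zero as n grows relative to k, so AICc and AIC agree for large samples.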
A data frame with the AICc, the number of parameters (k) and the number of observations (N).
Zuur, A. F., Ieno, E. N., Walker, N. J., Saveliev, A. A., & Smith, G. M. (2009). Mixed effects models and extensions in ecology with R. Springer Science & Business Media.
library(vegan)
data(dune)
data(dune.env)

# Run PERMANOVA using adonis2
Model <- adonis2(dune ~ Management * A1, data = dune.env)

# Calculate AICc
AICc_permanova2(Model)
Calculates the adjusted R squared for each predictor using the Akaike Information Criterion (AIC) and model averaging. AIC is used to compare the performance of candidate models and select the best one. Then, the R squared is adjusted based on the weight of evidence in favor of each model. The final result is a long-format table of variable names and corresponding adjusted R squared values.
akaike_adjusted_rsq(DF)
DF |
A data.frame containing the variables to calculate the adjusted R squared for. The data.frame should include the columns: "form", "AICc", "max_vif", "k", "DeltaAICc", "AICWeight", and "N". |
The adjusted R squared is calculated as:
where RSS is the residual sum of squares, N is the sample size, and k is the number of predictors. The R squared is adjusted based on the weight of evidence in favor of each model, which is calculated as:
$$w_i = \frac{\exp(-0.5\,\mathrm{DeltaAICc}_i)}{\sum_j \exp(-0.5\,\mathrm{DeltaAICc}_j)}$$
where w_i is the weight of evidence in favor of the ith model, and DeltaAICc_i is the difference in AICc between the ith model and the best model. Model averaging uses the weights to combine the performance of different models in the final calculation of the adjusted R squared.
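The weight of evidence can be illustrated with a short base R sketch, assuming the usual Akaike weight definition w_i = exp(-0.5 * DeltaAICc_i) / sum_j exp(-0.5 * DeltaAICc_j). The helper `akaike_weights` is a hypothetical name introduced for this example and is not a function of the package.

```r
# Hypothetical helper: Akaike weights from a vector of AICc values
akaike_weights <- function(aicc) {
  delta <- aicc - min(aicc)      # DeltaAICc relative to the best model
  rel_lik <- exp(-0.5 * delta)   # relative likelihood of each model
  rel_lik / sum(rel_lik)         # normalise so the weights sum to 1
}

# Three candidate models; the first has the lowest AICc
w <- akaike_weights(c(10, 12, 15))
w
```

The weights always sum to 1, and the model with the lowest AICc receives the largest weight.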
A data.frame with columns "Variable" and "Full_Akaike_Adjusted_RSq". Each row represents a predictor, and its corresponding adjusted R squared value based on the Akaike-adjusted model averaging process.
library(data.table)
df <- data.table(
  form = c(1, 2, 3),
  AICc = c(10, 20, 30),
  max_vif = c(3, 4, 5),
  k = c(1, 2, 3),
  DeltaAICc = c(2, 5, 8),
  AICWeight = c(0.2, 0.5, 0.3),
  N = c(100, 100, 100),
  A1 = c(0.3, 0.5, NA),
  A2 = c(0.7, NA, 0.2),
  A3 = c(0.2, 0.3, 0.6)
)
akaike_adjusted_rsq(df)
This function takes a data frame with several models and calculates the maximum Variance Inflation Factor (VIF) for each model. It then either filters out the models with high collinearity or flags them accordingly.
filter_vif(
  all_forms,
  env_data,
  ncores = 2,
  filter = TRUE,
  threshold = 5,
  verbose = TRUE
)
all_forms |
A data frame generated by make_models |
env_data |
A dataset with the variables described in all_forms |
ncores |
An integer specifying the number of cores to use for parallel processing |
filter |
logical, if TRUE it filters out the models with a maximum VIF equal to or higher than threshold, if FALSE it generates a new column called collinearity, which flags those models instead |
threshold |
A numeric value specifying the threshold for filtering models based on maximum VIF (default is 5) |
verbose |
logical, defaults to TRUE, sends messages about processing times |
A data.frame with the models, filtering out the ones with high collinearity or flagging them.
library(vegan)
data(dune)
data(dune.env)

AllModels <- make_models(vars = c("A1", "Moisture", "Manure"))

filter_vif(
  all_forms = AllModels,
  env_data = dune.env
)
This function fits PERMANOVA models for all combinations of variables in a given dataset, and arranges the models by Akaike Information Criterion (AICc) score. The function also calculates the maximum variance inflation factor (max_vif) for each model.
fit_models(
  all_forms,
  com_data,
  env_data,
  method = "bray",
  ncores = 2,
  log = TRUE,
  logfile = "log.txt",
  multiple = 100,
  strata = NULL,
  verbose = FALSE
)
all_forms |
A data frame generated by make_models |
com_data |
A dataset with community presence-absence or abundance data. A dist object generated with vegan, betapart or other packages can also be used |
env_data |
A dataset with the variables described in all_forms |
method |
method for distance from |
ncores |
An integer specifying the number of cores to use for parallel processing |
log |
logical, if TRUE a log file will be generated |
logfile |
the text file that will be generated as a log |
multiple |
number of loops after which the log file is written |
strata |
a block variable similar to the use in |
verbose |
logical, defaults to FALSE, sends messages about processing times |
A data.frame with fitted models arranged by AICc, including the formula used, the number of explanatory variables, R2, adjusted R2, and the AICc and max VIF.
Anderson, M. J. (2001). A new method for non-parametric multivariate analysis of variance. Austral Ecology, 26(1), 32-46. https://doi.org/10.1111/j.1442-9993.2001.01070.pp.x
## example with dataframe as community data
library(vegan)
data(dune)
data(dune.env)

AllModels <- make_models(vars = c("A1", "Moisture", "Manure"))

fit_models(
  all_forms = AllModels,
  com_data = dune,
  env_data = dune.env
)

## example with distance as community data
library(betapart)
Distance <- beta.pair.abund(dune)
Distance <- Distance$beta.bray.bal

fit_models(
  all_forms = AllModels,
  com_data = Distance,
  env_data = dune.env
)
Generates all possible linear models for a given set of predictor variables using the distance matrix as a response variable. The function allows for the user to specify the maximum number of variables in a model, which can be useful in cases where there are many predictors. The output is a data frame containing all the possible models, which can be passed to the fit_models function for fitting using a PERMANOVA approach.
make_models(vars, ncores = 2, k = NULL, verbose = TRUE)
vars |
A character vector of variables to use for modeling |
ncores |
An integer specifying the number of cores to use for parallel processing |
k |
maximum number of variables in a model, default is NULL |
verbose |
logical, defaults to TRUE, sends messages about processing times |
A data frame containing all the possible linear permanova models
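To make the idea of enumerating candidate models concrete, the base R sketch below builds one formula string per subset of predictors, optionally capped at k variables. The function name `all_formulas`, the response name "Distance" and the column name `form` are assumptions made for this example; the actual output of make_models may contain additional columns or an intercept-only model, which this sketch omits.

```r
# Hypothetical sketch: enumerate every subset of predictors up to size k
all_formulas <- function(vars, k = NULL, response = "Distance") {
  if (is.null(k)) k <- length(vars)   # no cap: allow all variables at once
  k <- min(k, length(vars))
  rhs <- unlist(lapply(seq_len(k), function(m) {
    # All combinations of m variables, joined as additive terms
    combn(vars, m, FUN = paste, collapse = " + ")
  }))
  data.frame(form = paste(response, "~", rhs), stringsAsFactors = FALSE)
}

full <- all_formulas(c("A", "B", "C", "D"))           # subsets of size 1 to 4
capped <- all_formulas(c("A", "B", "C", "D"), k = 2)  # subsets of size 1 or 2
```

With four predictors the number of additive models grows from 10 (k = 2) to 15 (no cap), which is why limiting k helps when there are many predictors.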
Anderson, M. J. (2001). A new method for non-parametric multivariate analysis of variance. Austral Ecology, 26(1), 32-46.
make_models(
  vars = c("A", "B", "C", "D"),
  ncores = 2,
  verbose = FALSE
)

# using k as a way to limit number of variables
make_models(
  vars = c("A", "B", "C", "D"),
  ncores = 2,
  k = 2,
  verbose = FALSE
)
This function selects models from a data frame based on the AICc and VIF values. Models with AICc greater than negative infinity and VIF less than or equal to 6 are considered. The difference in AICc values for each model is calculated with respect to the model with the minimum AICc. Models with a difference in AICc less than or equal to the specified delta_aicc value are selected.
select_models(df, delta_aicc = 2)
df |
a data frame containing the models to select from. |
delta_aicc |
a numeric value specifying the maximum difference in AICc values allowed. |
a data frame containing the selected models and the AIC weights.
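The selection rule described for select_models can be sketched in base R as follows. This is only an approximation of the described behaviour: the function name `select_sketch` and the `vif_limit` argument are hypothetical, and the AIC weights returned by the real function are not computed here.

```r
# Hypothetical sketch of the selection rule, in base R
select_sketch <- function(df, delta_aicc = 2, vif_limit = 6) {
  # Keep models with AICc above negative infinity and acceptable collinearity
  keep <- df[df$AICc > -Inf & df$max_vif <= vif_limit, , drop = FALSE]
  # Difference from the lowest AICc among the remaining models
  keep$DeltaAICc <- keep$AICc - min(keep$AICc)
  # Retain only models within delta_aicc units of the best one
  keep[keep$DeltaAICc <= delta_aicc, , drop = FALSE]
}

df <- data.frame(AICc = c(10, 12, 15, 20), max_vif = c(2, 4, 5, 6))
sel2 <- select_sketch(df)                  # models within 2 AICc units
sel5 <- select_sketch(df, delta_aicc = 5)  # models within 5 AICc units
```

Widening delta_aicc from 2 to 5 admits the model with AICc = 15, while the model with AICc = 20 stays excluded in both cases.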
df <- data.frame(AICc = c(10, 12, 15, 20), max_vif = c(2, 4, 5, 6))
select_models(df)
select_models(df, delta_aicc = 5)
This function calculates the maximum Variance Inflation Factor (VIF) for a given model. The VIF is a measure of collinearity among predictor variables within a regression model. It quantifies how much the variance of an estimated regression coefficient is increased due to collinearity. A VIF of 1 indicates no collinearity, while values above 1 indicate increasing levels of collinearity. A VIF of 5 or greater is often considered high, indicating a strong presence of collinearity.
VIF(model)
model |
A regression model, such as those created by lm, glm, or other similar functions. |
The maximum VIF value.
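For intuition, the VIF of predictor j is commonly defined as 1 / (1 - R_j^2), where R_j^2 comes from regressing that predictor on all the others. The sketch below computes the maximum of these values from scratch for a linear model with at least two numeric predictors. The helper `max_vif_sketch` is hypothetical and may not match the package's VIF function for models with factors or interactions.

```r
# Hypothetical helper: maximum VIF of a fitted lm, computed from scratch.
# Assumes at least two numeric predictors and an intercept in the model.
max_vif_sketch <- function(model) {
  X <- model.matrix(model)
  X <- X[, colnames(X) != "(Intercept)", drop = FALSE]
  vifs <- vapply(seq_len(ncol(X)), function(j) {
    # Regress predictor j on all remaining predictors
    r2 <- summary(lm(X[, j] ~ X[, -j, drop = FALSE]))$r.squared
    1 / (1 - r2)
  }, numeric(1))
  max(vifs)
}

v <- max_vif_sketch(lm(mpg ~ disp + hp + wt, data = mtcars))
v
```

A value close to 1 would indicate nearly uncorrelated predictors; correlated predictors such as displacement and weight in mtcars push the maximum well above 1.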
Belsley, D. A., Kuh, E., & Welsch, R. E. (1980). Regression Diagnostics: Identifying Influential Data and Sources of Collinearity. John Wiley & Sons.
Kutner, M. H., Nachtsheim, C. J., Neter, J., & Li, W. (2004). Applied Linear Statistical Models. McGraw-Hill/Irwin.
O'Brien, R. M. (2007). A caution regarding rules of thumb for variance inflation factors. Quality & Quantity, 41(5), 673-690.
data("mtcars")
VIF(lm(mpg ~ ., data = mtcars))