This function calculates metric statistics for use with developing a multi-metric index.
Inputs are a data frame of metric values and the names of its columns for metrics, sample ID, reference status, and data type.
metric.stats(
fun.DF,
col_metrics,
col_SampID = "SAMPLEID",
col_RefStatus = "Ref_Status",
RefStatus_Ref = "Ref",
RefStatus_Str = "Str",
RefStatus_Oth = "Oth",
col_DataType = "Data_Type",
DataType_Cal = "Cal",
DataType_Ver = "Ver",
col_Subset = NULL,
Subset_Value = NULL
)
fun.DF: Data frame.
col_metrics: Column names for metrics.
col_SampID: Column name for unique sample identifier. Default = "SAMPLEID".
col_RefStatus: Column name for Reference Status. Default = "Ref_Status".
RefStatus_Ref: Reference Status name for Reference used in col_RefStatus. Default = "Ref". Use NULL if you don't use this value.
RefStatus_Str: Reference Status name for Stressed used in col_RefStatus. Default = "Str". Use NULL if you don't use this value.
RefStatus_Oth: Reference Status name for Other used in col_RefStatus. Default = "Oth". Use NULL if you don't use this value.
col_DataType: Column name for Data Type (Calibration vs. Verification). Default = "Data_Type".
DataType_Cal: Data Type name for Calibration used in col_DataType. Default = "Cal". Use NULL if you don't use this value.
DataType_Ver: Data Type name for Verification used in col_DataType. Default = "Ver". Use NULL if you don't use this value.
col_Subset: Column name used to subset the data. Default = NULL. If NULL, no subset is generated.
Subset_Value: Value in col_Subset used to create the subset. Default = NULL.
Returns a data frame of metrics (rows) and statistics (columns). It is in long format, with columns for INDEX_CLASS, RefStatus, and DataType.
Summary statistics for the data are calculated.
The data are filtered on the col_Subset column to the single value given in Subset_Value. If further subsets are needed, re-run the function for each one. If no subset is given, the entire data set is used.
Statistics will be generated for up to 6 combinations of RefStatus (Ref, Oth, Str) and DataType (Cal, Ver).
The resulting data frame has the statistics in columns, preceded by 4 identifying columns: INDEX_CLASS (if col_Subset is not provided), col_RefStatus, col_DataType, and Metric_Name.
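The filtering and grouping described above can be sketched in base R. This is a minimal illustration, not the function's actual implementation: the toy data frame, its values, and the metric column nt_total are hypothetical, and a single mean stands in for the full set of statistics.

```r
# Toy data; all values and the metric column 'nt_total' are hypothetical.
df <- data.frame(
  SAMPLEID    = paste0("S", 1:8),
  INDEX_CLASS = c("A", "A", "A", "A", "A", "A", "B", "B"),
  Ref_Status  = c("Ref", "Ref", "Str", "Str", "Oth", "Oth", "Ref", "Str"),
  Data_Type   = c("Cal", "Ver", "Cal", "Ver", "Cal", "Ver", "Cal", "Cal"),
  nt_total    = c(20, 22, 8, 9, 15, 14, 30, 5),
  stringsAsFactors = FALSE
)

# Step 1: keep only the rows matching a single subset value.
df_sub <- df[df$INDEX_CLASS == "A", ]

# Step 2: the 3 x 2 = 6 possible RefStatus-by-DataType combinations.
combos <- expand.grid(RefStatus = c("Ref", "Str", "Oth"),
                      DataType  = c("Cal", "Ver"),
                      stringsAsFactors = FALSE)

# Step 3: one statistic (the mean) per combination present in the subset.
grp_mean <- aggregate(nt_total ~ Ref_Status + Data_Type,
                      data = df_sub, FUN = mean)
print(grp_mean)
```

Combinations with no samples simply do not appear in the grouped result, which is why the count is "up to" 6.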
The following statistics are generated with na.rm = TRUE.
* n = number of samples
* min = minimum
* max = maximum
* mean = mean
* median = median
* range = range (max - min)
* sd = standard deviation
* cv = coefficient of variation (sd/mean)
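* q05 = 5th percentile (quantile at 0.05)
* q10 = 10th percentile (quantile at 0.10)
* q25 = 25th percentile (quantile at 0.25)
* q50 = 50th percentile (quantile at 0.50)
* q75 = 75th percentile (quantile at 0.75)
* q90 = 90th percentile (quantile at 0.90)
* q95 = 95th percentile (quantile at 0.95)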
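The statistics listed above can be reproduced for a single numeric vector with base R. The helper below, summary_stats(), is a hypothetical name used only for illustration. It assumes that n counts non-missing values and that quantiles use R's default method (type 7); neither detail is confirmed by this page.

```r
# Hypothetical helper (not part of the package): computes the listed
# statistics for one numeric vector, dropping NA values throughout.
summary_stats <- function(x) {
  probs <- c(0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95)
  # Assumption: default quantile algorithm (type = 7).
  q <- stats::quantile(x, probs = probs, na.rm = TRUE, names = FALSE)
  x_min  <- min(x, na.rm = TRUE)
  x_max  <- max(x, na.rm = TRUE)
  x_mean <- mean(x, na.rm = TRUE)
  x_sd   <- stats::sd(x, na.rm = TRUE)
  c(n      = sum(!is.na(x)),      # assumption: non-missing values only
    min    = x_min,
    max    = x_max,
    mean   = x_mean,
    median = stats::median(x, na.rm = TRUE),
    range  = x_max - x_min,       # max - min
    sd     = x_sd,
    cv     = x_sd / x_mean,       # sd / mean
    q05 = q[1], q10 = q[2], q25 = q[3], q50 = q[4],
    q75 = q[5], q90 = q[6], q95 = q[7])
}

# Usage: one metric with a missing value.
round(summary_stats(c(12, 15, NA, 9, 20)), 2)
```

Note that cv is undefined when the mean is zero, so a metric that is all zeros in a group would yield NaN in this sketch.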
# data, benthos
df_bugs <- data_mmi_dev
# Munge Names
names(df_bugs)[names(df_bugs) %in% "BenSampID"] <- "SAMPLEID"
names(df_bugs)[names(df_bugs) %in% "TaxaID"] <- "TAXAID"
names(df_bugs)[names(df_bugs) %in% "Individuals"] <- "N_TAXA"
names(df_bugs)[names(df_bugs) %in% "Exclude"] <- "EXCLUDE"
names(df_bugs)[names(df_bugs) %in% "Class"] <- "INDEX_CLASS"
names(df_bugs)[names(df_bugs) %in% "Unique_ID"] <- "SITEID"
# Calc Metrics
cols_keep <- c("Ref_v1", "CalVal_Class4", "SITEID", "CollDate", "CollMeth")
# INDEX_NAME and INDEX_CLASS kept by default
df_metval <- metric.values(df_bugs, "bugs", fun.cols2keep = cols_keep)
#>
#> There are 7 missing fields in the data:
#> ELEVATION_ATTR, GRADIENT_ATTR, WSAREA_ATTR, HABSTRUCT, BCG_ATTR2, AIRBREATHER, UFC
#>
#> If you continue the metrics associated with these fields will be invalid.
#> For example, if the HABIT field is missing all habit related metrics will not be correct.
#> Do you wish to continue (YES or NO)?
#> boo.Shiny == TRUE and interactive == FALSE
#> so prompt skipped and value set to '1'.
#> Warning: Metrics related to the following fields are invalid:
#> ELEVATION_ATTR
#> GRADIENT_ATTR
#> WSAREA_ATTR
#> HABSTRUCT
#> BCG_ATTR2
#> AIRBREATHER
#> UFC
#> Joining with `by = join_by(SAMPLEID, INDEX_NAME, INDEX_CLASS)`
#> Warning: The `.dots` argument of `group_by()` is deprecated as of dplyr 1.0.0.
#> ℹ The deprecated feature was likely used in the dplyr package.
#> Please report the issue at <https://github.com/tidyverse/dplyr/issues>.
# Calc Stats
col_metrics <- names(df_metval)[9:ncol(df_metval)]
col_SampID <- "SAMPLEID"
col_RefStatus <- "REF_V1"
RefStatus_Ref <- "Ref"
RefStatus_Str <- "Strs"
RefStatus_Oth <- "Other"
col_DataType <- "CALVAL_CLASS4"
DataType_Cal <- "cal"
DataType_Ver <- "verif"
col_Subset <- "INDEX_CLASS"
Subset_Value <- "CENTRALHILLS"
df_stats <- metric.stats(df_metval
, col_metrics
, col_SampID
, col_RefStatus
, RefStatus_Ref
, RefStatus_Str
, RefStatus_Oth
, col_DataType
, DataType_Cal
, DataType_Ver
, col_Subset
, Subset_Value)
#> Error in metric.stats(df_metval, col_metrics, col_SampID, col_RefStatus, RefStatus_Ref, RefStatus_Str, RefStatus_Oth, col_DataType, DataType_Cal, DataType_Ver, col_Subset, Subset_Value): Values missing from column 'INDEX_CLASS'; CENTRALHILLS
if (FALSE) {
# Save Results
write.table(df_stats
, file.path(tempdir(), "metric.stats.tsv")
, col.names = TRUE
, row.names = FALSE
, sep = "\t")
}
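The metric.stats() call in this example stops with an error because the requested Subset_Value was not found in the subset column. A quick check before the call can show which values are actually present. The sketch below uses a hypothetical toy data frame with made-up class names. A difference in letter case is only one possible cause of such a mismatch and is not confirmed by the output above.

```r
# Toy stand-in for the metric values table; class names are hypothetical.
df_toy <- data.frame(
  INDEX_CLASS = c("CentralHills", "CentralHills", "Piedmont"),
  stringsAsFactors = FALSE
)
col_Subset   <- "INDEX_CLASS"
Subset_Value <- "CENTRALHILLS"

# Values actually present in the subset column.
vals_present <- unique(df_toy[[col_Subset]])
print(vals_present)

# Exact match: is the requested value present?
is_present <- Subset_Value %in% vals_present

# Case-insensitive candidates, in case the mismatch is only letter case.
vals_caseless <- vals_present[toupper(vals_present) == toupper(Subset_Value)]
```

With real data, the same check would be run on df_metval[[col_Subset]] before calling metric.stats().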