Calculate metric values — metric.values • BioMonTools

This function calculates metric values for bugs, fish, algae, and coral. The input is a data frame of SampleID and taxa with phylogenetic and autecological information (see Details for the required fields by community). The dplyr package is used to generate the metric values.

metric.values(
  fun.DF,
  fun.Community,
  fun.MetricNames = NULL,
  boo.Adjust = FALSE,
  fun.cols2keep = NULL,
  boo.marine = FALSE,
  boo.Shiny = FALSE,
  verbose = FALSE,
  metric_subset = NULL,
  taxaid_dni = NULL
)

Arguments

fun.DF: Data frame of taxa (see Details for required fields)
fun.Community: Community name for which to calculate metric values (bugs, fish, algae, or coral)
fun.MetricNames: Optional vector of metric names to be returned. If none are supplied then all will be returned. Default=NULL
boo.Adjust: Optional boolean value on whether to perform adjustments of values prior to scoring. Default = FALSE but may be TRUE for certain metrics.
fun.cols2keep: Column names of fun.DF to retain in the output. Default = NULL
boo.marine: Whether estuary/marine metrics should be included. Ignored if fun.MetricNames is not NULL. Default = FALSE.
boo.Shiny: Boolean value indicating whether the function is accessed via Shiny. Default = FALSE.
verbose: Include messages to track progress. Default = FALSE
metric_subset: Subset of metrics to be generated. Intended for internal use. Default = NULL
taxaid_dni: Taxa names to be included in DNI (Do Not Include) metrics (n = 3) but dropped for all other metrics. Only for benthic metrics. Default = NULL

Value

data frame of SampleID and metric values

Details

All percent metric results are 0-100.
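
As a rough illustration of the 0-100 scale, the sketch below computes a percent-of-individuals value by hand in base R. The data frame, its values, and the formula (100 times the group count divided by the total count) are assumptions made for illustration; this is not the package's internal code.

```r
# Hypothetical single-sample data; the counts are made up for illustration.
df_demo <- data.frame(SAMPLEID = "S1"
                      , TAXAID = c("Baetis", "Tanytarsus", "Hydropsyche")
                      , ORDER = c("EPHEMEROPTERA", "DIPTERA", "TRICHOPTERA")
                      , N_TAXA = c(25, 50, 25)
                      , stringsAsFactors = FALSE)

# Total number of individuals in the sample
ni_total_demo <- sum(df_demo$N_TAXA)

# Percent of individuals that are mayflies, on a 0-100 scale (not 0-1)
is_ephem <- df_demo$ORDER == "EPHEMEROPTERA"
pi_ephem_demo <- 100 * sum(df_demo$N_TAXA[is_ephem]) / ni_total_demo
pi_ephem_demo
#> [1] 25
```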

No manipulations of the taxa are performed by this routine. All benthic macroinvertebrate taxa should be identified to the appropriate operational taxonomic unit (OTU).

Any non-count taxa should be identified in the "Exclude" field as "TRUE". These taxa will be excluded from taxa richness metrics (but will count for all others).

Any non-target taxa should be identified in the "NonTarget" field as "TRUE". Non-target taxa are those that are not part of your intended capture list; e.g., fish, herps, water column taxa, or water surface taxa in a benthic sample. The target list will vary by program. The non-target taxa will be removed prior to any calculations.

Excluded taxa are ambiguous taxa (on a sample basis), i.e., the parent taxon when child taxa are present. For example, the parent taxon Chironomidae would be excluded when the child taxon Tanytarsini is present. Both would be excluded when Tanytarsus is present. The markExcluded function can be used to populate this field.
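
The Chironomidae example can be reproduced by hand with a small base R sketch. The data frame and the helper `is_parent()` are hypothetical and cover only three ranks; this is not how markExcluded is implemented, only an illustration of the parent/child rule described above.

```r
# Hypothetical sample holding a family, a tribe within it, and a genus within that tribe.
df_demo <- data.frame(SAMPLEID = "S1"
                      , TAXAID = c("Chironomidae", "Tanytarsini", "Tanytarsus")
                      , FAMILY = "Chironomidae"
                      , TRIBE = c(NA, "Tanytarsini", "Tanytarsini")
                      , GENUS = c(NA, NA, "Tanytarsus")
                      , stringsAsFactors = FALSE)

# Hypothetical helper: row i is a parent if any other row in the same sample
# names row i's TAXAID in one of its higher-rank columns (FAMILY or TRIBE).
is_parent <- function(i, df) {
  others <- df[-i, ]
  others <- others[others$SAMPLEID == df$SAMPLEID[i], ]
  any(others$FAMILY == df$TAXAID[i] | others$TRIBE == df$TAXAID[i]
      , na.rm = TRUE)
}

df_demo$EXCLUDE <- vapply(seq_len(nrow(df_demo)), is_parent, logical(1)
                          , df = df_demo)

# Rows flagged as ambiguous parents (Chironomidae and Tanytarsini)
which(df_demo$EXCLUDE)
#> [1] 1 2
```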

There are a number of required fields (see below) for metric calculation. If any fields are missing, the user is told which ones and asked whether to continue or quit. If the user continues, the missing fields are added and filled with zero or NA (as appropriate). Any metrics based on the missing fields will not be valid.

A future update may turn these fields into function parameters. This would allow the user to tweak the function inputs to match their data rather than having to update their data to match the function.

Required fields, all communities:

* SAMPLEID (character or number, must be unique)

* TAXAID (character or number, must be unique)

* N_TAXA

* INDEX_NAME

* INDEX_CLASS (BCG or MMI site category; e.g., for BCG PacNW valid values are "hi" or "lo")
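
Before calling the function, it can help to check for these fields yourself. The sketch below is a minimal base R check under stated assumptions: the data frame and the index name "BCG_PacNW" are hypothetical, and comparing against upper-case column names is a choice made here to match the field names as listed, not a documented behavior.

```r
# Hypothetical input that is missing one required column (INDEX_CLASS).
df_demo <- data.frame(SAMPLEID = c("S1", "S1")
                      , TAXAID = c("Baetis", "Tanytarsus")
                      , N_TAXA = c(10, 5)
                      , INDEX_NAME = "BCG_PacNW"
                      , stringsAsFactors = FALSE)

# Fields required for all communities, as listed above
req_all <- c("SAMPLEID", "TAXAID", "N_TAXA", "INDEX_NAME", "INDEX_CLASS")

# Required fields that are absent from the input
col_missing <- setdiff(req_all, toupper(names(df_demo)))
col_missing
#> [1] "INDEX_CLASS"
```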

Additional Required fields, bugs:

* EXCLUDE (valid values are TRUE and FALSE)

* NONTARGET (valid values are TRUE and FALSE)

* PHYLUM, SUBPHYLUM, CLASS, SUBCLASS, INFRAORDER, ORDER, FAMILY, SUBFAMILY, TRIBE, GENUS

* FFG, HABIT, LIFE_CYCLE, TOLVAL, BCG_ATTR, THERMAL_INDICATOR, FFG2, TOLVAL2, LONGLIVED, NOTEWORTHY, HABITAT, UFC, ELEVATION_ATTR, GRADIENT_ATTR, WSAREA_ATTR, HABSTRUCT

Additional Required fields, fish:

* N_ANOMALIES

* SAMP_BIOMASS (biomass total for sample; the function uses the maximum in case it is entered for every taxon in the sample)

* DA_MI2, SAMP_WIDTH_M, SAMP_LENGTH_M, TYPE, TOLER, NATIVE, TROPHIC, SILT, FAMILY, GENUS, HYBRID, BCG_ATTR, THERMAL_INDICATOR, ELEVATION_ATTR, GRADIENT_ATTR, WSAREA_ATTR, REPRODUCTION, HABITAT, CONNECTIVITY, SCC

Additional Required fields, algae:

* EXCLUDE, NONTARGET, PHYLUM, ORDER, FAMILY, GENUS, BC_USGS, TROPHIC_USGS, SAP_USGS, PT_USGS, O_USGS, SALINITY_USGS, BAHLS_USGS, P_USGS, N_USGS, HABITAT_USGS, N_FIXER_USGS, MOTILITY_USGS, SIZE_USGS, HABIT_USGS, MOTILE2_USGS, TOLVAL, DIATOM_ISA, DIAT_CL, POLL_TOL, BEN_SES, DIATAS_TP, DIATAS_TN, DIAT_COND, DIAT_CA, MOTILITY, NF

Valid values for fields:

* FFG: CG, CF, PR, SC, SH

* HABIT: BU, CB, CN, SP, SW

* LIFE_CYCLE: UNI, SEMI, MULTI

* THERMAL_INDICATOR: STENOC, COLD, COOL, WARM, STENOW, EURYTHERMAL, COWA, NA

* LONGLIVED: TRUE, FALSE

* NOTEWORTHY: TRUE, FALSE

* HABITAT: BRAC, DEPO, GENE, HEAD, RHEO, RIVE, SPEC, UNKN

* UFC: integers 1:6 (taxonomic uncertainty frequency class)

* ELEVATION_ATTR: LOW, HIGH

* GRADIENT_ATTR: LOW, MOD, HIGH

* WSAREA_ATTR: SMALL, MEDIUM, LARGE, XLARGE

* REPRODUCTION: BROADCASTER, SIMPLE NEST, COMPLEX NEST, BEARER, MIGRATORY

* CONNECTIVITY: TRUE, FALSE

* SCC (Species of Conservation Concern): TRUE, FALSE

'Columns to keep' are additional fields in the input file that the user wants retained in the output. These fields must be unique per sample and not associated with the taxa. Examples are the fields used in qc.check(): Area_mi2, SurfaceArea, Density_m2, and Density_ft2.

If fun.MetricNames is provided only those metrics will be returned in the provided order. This variable can be used to sort the metrics per the user's preferences. By default the metric names will be returned in the groupings that were used for calculation.

The fields TOLVAL2 and FFG2 are provided to allow the user to calculate metrics based on alternative scenarios. For example, including both HBI and NCBI where the NCBI uses a different set of tolerance values (TOLVAL2).

If TAXAID is 'NONE' and N_TAXA is '0' then metrics will be calculated with that record. Records with any other TAXAID and N_TAXA = 0 will be removed before calculations.
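
The rule can be pictured as a row filter. The following base R sketch uses a hypothetical data frame and is an illustration of the rule as stated, not the package's own code.

```r
# Hypothetical records: S2 is an empty sample recorded as TAXAID "NONE".
df_demo <- data.frame(SAMPLEID = c("S1", "S1", "S2")
                      , TAXAID = c("Baetis", "Tanytarsus", "NONE")
                      , N_TAXA = c(10, 0, 0)
                      , stringsAsFactors = FALSE)

# Keep records with a positive count, plus the "NONE" placeholder record
keep <- df_demo$N_TAXA > 0 | df_demo$TAXAID == "NONE"
df_kept <- df_demo[keep, ]

# The zero-count Tanytarsus record is dropped; the other two rows remain
nrow(df_kept)
#> [1] 2
```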

For 'Oligochaete' metrics, either the CLASS or the SUBCLASS field is required for calculation.

The parameter boo.Shiny can be set to TRUE when accessing this function in Shiny. Normally the QC check for required fields is interactive. Setting boo.Shiny to TRUE skips the prompt and always continues. The default is FALSE.

The parameter 'taxaid_dni' denotes taxa to be included in Do Not Include (DNI) metrics but dropped from all other metrics. Only for benthic metrics.

Breaking change from 0.5 to 0.6 with change from Index_Name to Index_Class.

Examples

# Example 1, data already in R

df_metric_values_bugs <- metric.values(data_benthos_PacNW, "bugs")
#> Joining with `by = join_by(SAMPLEID, INDEX_NAME, INDEX_CLASS)`

if (FALSE) {
# View Results
View(df_metric_values_bugs)
}

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Example 2, data from Excel
if (FALSE) {
# Packages
library(readxl)
library(reshape2)

df_samps_bugs <- read_excel(system.file("extdata/Data_Benthos.xlsx"
                                       , package = "BioMonTools")
                            , guess_max = 10^6)

# Columns to keep
myCols <- c("Area_mi2", "SurfaceArea", "Density_m2", "Density_ft2")

# Run Function
df_metric_values_bugs <- metric.values(df_samps_bugs[1:100, ]
                                       , "bugs"
                                       , fun.cols2keep = myCols)

# View Results
View(df_metric_values_bugs)

# Get data in long format so results can be QC'd more easily
df_long <- melt(df_metric_values_bugs, id.vars = c("SAMPLEID"
                                                 , "INDEX_NAME"
                                                 , "INDEX_CLASS"
                                                 , toupper(myCols))
                          , variable.name = "METRIC_NAME"
                          , value.name = "METRIC_VALUE")

# Save Results
write.table(df_long, file.path(tempdir(), "metric.values.tsv")
            , col.names = TRUE, row.names = FALSE, sep = "\t")

# DataExplorer Report
library(DataExplorer)
create_report(df_metric_values_bugs
              , output_file = file.path(tempdir()
                                 , "DataExplorer_Report_MetricValues.html"))
create_report(df_samps_bugs
              , output_file = file.path(tempdir()
                                   , "DataExplorer_Report_BugSamples.html"))
}

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Example 3, specific metrics or metrics in a specific order
## reuse df_samps_bugs from above

# metric names to keep (in this order)
myMetrics <- c("ni_total", "nt_EPT", "nt_Ephem", "pi_tv_intol", "pi_Ephem"
               , "nt_ffg_scrap", "pi_habit_climb")

if (FALSE) {
# Run Function (df_samps_bugs is created in Example 2)
df_metric_values_bugs_myMetrics <- metric.values(df_samps_bugs, "bugs"
                                               , fun.MetricNames = myMetrics)

# View Results
View(df_metric_values_bugs_myMetrics)
}

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Example 4, fish metrics

df_metric_values_fish <- metric.values(data_fish_MBSS, "fish")
#> EXCLUDE column does not have any TRUE values.
#> This is common with fish samples.
#> Valid values are TRUE or FALSE.
#> Other values are not recognized

if (FALSE) {
# View Results
View(df_metric_values_fish)
}

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Example 5, periphyton (algae) metrics

if (FALSE) {
# Run Function
df_metric_values_periphyton <- metric.values(data_diatom_mmi_dev, "algae")

# View Results
View(df_metric_values_periphyton)
}