Vignette, BioMonTools

Purpose

The BioMonTools package was created to enable users to have a common set of tools for bioassessment and biomonitoring data.

Installation

The package is hosted on GitHub (https://github.com/leppott/BioMonTools) and can be installed using the following lines of code. It is necessary to install the devtools package.

# Installing the BioMonTools library (with the vignette) from GitHub
library(devtools) 
install_github("leppott/BioMonTools", force=TRUE, build_vignettes=TRUE)

Help

After the BioMonTools package has been installed running the following line of code will open the help file with links to all functions.

help(package="BioMonTools")

Package Functions

The suite of functions in the BioMonTools package are presented below.

metric.values
markExcluded
rarify

metric.values

Calculate metric values from a data frame with taxa by sample with fully qualified master taxa information. All percent metric results are 0-1.

Data Preparation

There are a number of required fields for the input file. If any fields are missing the user will be prompted as to which are missing and the user can decide whether to continue or quit. If the user continues, the missing fields will be added but will be filled with zero or NA (as appropriate). Any metrics based on the missing fields will not be valid.

Required Fields:

SAMPLEID
- Unique sample identifier
- Valid values: character or number, must be unique
TAXAID
- Unique taxa identifier
- Valid values: character or number, must be unique
N_TAXA
- Number of individuals
- Valid values: number
EXCLUDE
- Non-unique/non-distinct taxa are excluded from richness metric calculations but are counted in the other metrics. Appendix B of the ‘BCGcalc_README_20180919’ Word document describes the Exclude Taxa Decision Criteria that was used during BCG model calibration.
- Valid values: TRUE or FALSE
- Non-unique/non-distinct taxa should be entered as “TRUE”
NONTARGET
- Non-target taxa are not part of the intended capture list; e.g., fish, herps, water column taxa. They are excluded from all metric calculations.
- Valid values: TRUE or FALSE.
- NonTarget taxa should be entered as “TRUE”
INDEX_NAME
- Name of the BCG rules worksheet in the ‘Rules’ file (in the ‘extdata’ folder) that the R code is referencing.
- Valid values for the Puget Lowlands/Willamette Valley model: BCG_PacNW_v1_500ct or BCG_PacNW_v1_300ct.
SITE_TYPE
- Select which BCG model to apply: Low (lo) gradient (<1% NHD+ v2 flowline slope) or high (hi) gradient (≥ 1% NHD+ v2 flowline slope).
- Valid values: “hi” or “lo”.
PHYLUM, SUBPHYLUM, CLASS, ORDER, FAMILY, SUBFAMILY, TRIBE, GENUS
- Phylogeny
- Other phylogenetic rankings (e.g., SubOrder or SuperFamily) can be included but are not used in the current metric calculations.
- Valid values: text
BCG_Attr
- BCG attribute assignments (for example, Stamp and Gerritsen 2018).
- Valid values: 1i, 1m, 2, 3, 4, 5, 6; if not available, leave blank or enter ‘NA’.
FFG, HABIT, LIFE_CYCLE, TOLVAL, THERMAL_INDICATOR, UFC, ELEVATION_ATTR, GRADIENT_ATTR, WSAREA_ATTR, REPRODUCTION, CONNECTIVITY
- FFG
  - Valid values: CG, CF, PR, SC, SH
  - Function feeding group entries.
- HABIT
  - Valid values: BU, CB, CN, SP, SW
  - Habit designations.
- LIFE_CYLCE
  - Valid values: UNI, SEMI, MULTI
  - Life cycle designations.
- THERMAL_INDICATOR
  - Valid values: STENOC, COLD, COOL, WARM, STENOW , EURYTHERMAL, INCONCLUSIVE, NA
- LONGLIVED
  - Valid values: TRUE, FALSE
- NOTEWORTHY
  - Valid values: TRUE, FALSE
- HABITAT
  - Valid values: BRAC, DEPO, GENE, HEAD, RHEO, RIVE, SPEC, UNKN
- UFC
  - Valid values: numeric; 1, 2, 3, 4, 5, 6
  - Taxonomic Uncertainty Frequency Class
- ELEVATION_ATTR
  - Valid values: LOW, HIGH
- GRADIENT_ATTR
  - Valid values: LOW, MOD, HIGH
- WSAREA_ATTR
  - Valid values: SMALL, MEDIUM, LARGE, XLARGE
- REPRODUCTION
  - Valid values: BROADCASTER, SIMPLE NEST, COMPLEX NEST, BEARER, MIGRATORY
- CONNECTIVITY
  - Valid values: TRUE, FALSE

Fields that are optional can be in the output (see examples below).

# Packages
# library(readxl)
# library(knitr)
# library(BioMonTools)
# library(dplyr)

# Load Data
df_data <- readxl::read_excel(system.file("./extdata/Data_Benthos.xlsx"
                                       , package = "BioMonTools")
                      , guess_max = 10^6)
# Columns to keep
myCols <- c("Area_mi2", "SurfaceArea", "Density_m2", "Density_ft2")

# Run Function
df_metval <- BioMonTools::metric.values(df_data, "bugs", fun.cols2keep = myCols)
#> Updated col class; TOLVAL2 to numeric
#> Joining with `by = join_by(SAMPLEID, INDEX_NAME, INDEX_CLASS)`
#> Warning: The `.dots` argument of `group_by()` is deprecated as of dplyr 1.0.0.
#> ℹ The deprecated feature was likely used in the dplyr package.
#>   Please report the issue at <https://github.com/tidyverse/dplyr/issues>.
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
#> generated.

# Metrics of Interest
## thermal indicator (_ti_)
#names(df.metval)[grepl("_ti_", names(df.metval))]
col_met2keep <- c("ni_total"
                  , "nt_total"
                  , "nt_ti_stenocold" # renamed from corecold
                  , "nt_ti_cold"
                  , "nt_ti_cool"
                  , "pi_ti_stenocold" # renamed from corecold
                  , "pi_ti_cold"
                  , "pi_ti_cool")
col_ID <- c("SAMPLEID"
            , toupper(myCols)
            , "INDEX_NAME"
            , "INDEX_CLASS")
# Ouput
df_metval_ci <- df_metval[, c(col_ID, col_met2keep)]
# RMD table
knitr::kable(head(df_metval_ci), caption = "Metric Calculation, select metrics")

Metric Calculation, select metrics
SAMPLEID	AREA_MI2	SURFACEAREA	DENSITY_M2	DENSITY_FT2	INDEX_NAME	INDEX_CLASS	ni_total	nt_total	nt_ti_cold	nt_ti_cool	pi_ti_cold	pi_ti_cool
01103CSR_Bug_2001-08-27_0	44.084184	NA	NA	NA	BCG_PacNW_v1_500ct	Lo	589	27	1	10	40.237691	60.95076
02087REF_Bug_2002-08-22_0	1.791475	NA	NA	NA	BCG_PacNW_v1_500ct	Hi	542	38	23	28	18.265683	20.66421
03013CSR_Bug_2003-07-01_0	9.613734	8	NA	NA	BCG_PacNW_v1_500ct	Hi	507	16	3	5	2.958580	13.21499
03053CSR_Bug_2003-08-14_0	3.061723	8	NA	NA	BCG_PacNW_v1_500ct	Hi	573	36	19	24	46.073298	58.46422
03054CSR_Bug_2003-08-18_0	71.249741	8	NA	NA	BCG_PacNW_v1_500ct	Lo	558	13	4	4	6.989247	11.82796
06012CSR_Bug_2006-09-05_0	82.338353	8	NA	NA	BCG_PacNW_v1_500ct	Hi	545	40	18	28	33.944954	75.96330

markExcluded

Takes as an input data frame with Sample ID, Taxa ID, and phlogenetic name fields and returns a similar dataframe with a column for “exclude” taxa (TRUE or FALSE).

Exclude taxa are refered to by multiple names; ambiguous, non-distinct, and non-unique. The “exclude” name was chosen so as to be consistent with “non-target” taxa. That is, taxa marked as “TRUE” are treated as undesirables. Exclude taxa are those that are present in a sample when taxa of the same group are present in the same sample are identified at a finer level. That is, the parent is marked as exclude when child taxa are present in the same sample.

The exclude taxa are referenced in the metric values function. These taxa are removed from the taxa richness metrics. This is because these are coarser level taxa.

Exceptions is a 2 column data frame of synonyms or other exceptions. Column 1 is the name used in the TaxaID column the input data frame (df_samptax). Column 2 is the name used in the TaxaLevels columns of the input data frame (df_samptax). The phylogenetic columns (TaxaLevels) will be modified from Column 2 of the Exceptions data frame to match Column 1 of the Exceptions data frame. This ensures that the algorithm for markExcluded works properly. The changes will not be stored and the original names provided in the input data frame (df_samptax) will be returned in the final result. The function example below includes a practical case.

Taxa Levels are phylogenetic names that are to be checked. They should be listed in order from course (kingdom) to fine (species). Names not appearing in the data will be skipped.

The spelling of names must be consistent (including case) for this function to produce the intended output.

Returns a data frame of df_samptax with an additional column, Exclude.

# Packages
#library(readxl)
#library(dplyr)
#library(lazyeval)
#library(knitr)

# Define pipe
`%>%` <- dplyr::`%>%`

# Data
df_samps_bugs <- readxl::read_excel(system.file("./extdata/Data_Benthos.xlsx"
                                        , package = "BioMonTools")
                            , guess_max = 10^6)

# Variables
SampID     <- "SampleID"
TaxaID     <- "TaxaID"
TaxaCount  <- "N_Taxa"
Exclude    <- "Exclude_New"
TaxaLevels <- c("Kingdom"
                , "Phylum"
                , "SubPhylum"
                , "Class"
                , "SubClass"
                , "Order"
                , "SubOrder"
                , "SuperFamily"
                , "Family"
                , "SubFamily"
                , "Tribe"
                , "Genus"
                , "SubGenus"
                , "Species"
                , "Variety")
# Taxa that should be treated as equivalent
Exceptions <- data.frame("TaxaID" = "Sphaeriidae", "PhyloID" = "Pisidiidae")

# Filter Data
# df_samptax <- filter(df_samps_bugs, !!as.name(SampID) == 
# "08BEA3478__2013-08-21_0")
# df_tst_small <- markExcluded(df_samptax, SampID, TaxaID, TaxaCount, TaxaLevels
#, Exceptions, Exclude)

# EXAMPLE 1
df_tst <- BioMonTools::markExcluded(df_samps_bugs
                                    , SampID = "SampleID"
                                    , TaxaID = "TaxaID"
                                    , TaxaCount = "N_Taxa"
                                    , Exclude = "Exclude_New"
                                    , TaxaLevels = TaxaLevels
                                    , Exceptions = Exceptions)
#> *WARNING*, 3 taxa levels missing from input data frame; 
#>  KINGDOM, SUBORDER, VARIETY
#> [1] "Working on item (1/12); PHYLUM"
#> [1] "Working on item (2/12); SUBPHYLUM"
#> [1] "Working on item (3/12); CLASS"
#> [1] "Working on item (4/12); SUBCLASS"
#> [1] "Working on item (5/12); ORDER"
#> [1] "Working on item (6/12); SUPERFAMILY"
#> [1] "Working on item (7/12); FAMILY"
#> [1] "Working on item (8/12); SUBFAMILY"
#> [1] "Working on item (9/12); TRIBE"
#> [1] "Working on item (10/12); GENUS"
#> [1] "Working on item (11/12); SUBGENUS"
#> [1] "Working on item (12/12); SPECIES"


# Compare
df_compare <- dplyr::summarise(dplyr::group_by(df_tst, SampleID)
                               , Exclude_Import = sum(Exclude)
                               , Exclude_R = sum(Exclude_New))
df_compare$Diff <- df_compare$Exclude_Import - df_compare$Exclude_R
#
tbl_diff <- table(df_compare$Diff)
#kable(tbl_diff)
# sort
df_compare <- df_compare %>% dplyr::arrange(desc(Diff))

# Number with issues
#sum(abs(df_compare$Diff))
# total samples
#nrow(df_compare)

# confusion matrix
tbl_results <- table(df_tst$Exclude, df_tst$Exclude_New, useNA = "ifany")
#
# Show differences
knitr::kable(tbl_results, caption = "Confusion Matrix")

Confusion Matrix
	FALSE	TRUE
FALSE	24653	0
TRUE	0	2350

# samples with differences
samp_diff <- as.data.frame(df_compare[df_compare[,"Diff"] != 0, "SampleID"])
# results for only those with differences
df_tst_diff <- df_tst[df_tst[,"SampleID"] %in% samp_diff$SampleID, ]
# add diff field
df_tst_diff$Exclude_Diff <- df_tst_diff$Exclude - df_tst_diff$Exclude_New

# Classification Performance Metrics
class_TP <- tbl_results[2,2] # True Positive
class_FN <- tbl_results[2,1] # False Negative
class_FP <- tbl_results[1,2] # False Positive
class_TN <- tbl_results[1,1] # True Negative
class_n <- sum(tbl_results)  # total
#
# sensitivity (recall); TP / (TP+FN); measure model to ID true positives
class_sens <- class_TP / (class_TP + class_FN)
# precision; TP / (TP+FP); accuracy of model positives
class_prec <- class_TP / (class_TP + class_FP)
# specifity; TN / (TN + FP); measure model to ID true negatives
class_spec <- class_TN  / (class_TN + class_FP)
# overall accuracy; (TP + TN) / all cases; accuracy of all classifications
class_acc <- (class_TP + class_TN) / class_n
# F1; 2 * (class_prec*class_sens) / (class_prec+class_sens)
## balance of precision and recall
class_F1 <- 2 * (class_prec * class_sens) / (class_prec + class_sens)
#
results_names <- c("Sensitivity (Recall)"
                   , "Precision", "Specificity"
                   , "OVerall Accuracy"
                   , "F1")
results_values <- c(class_sens
                    , class_prec
                    , class_spec
                    , class_acc
                    , class_F1)
#
tbl_class <- data.frame(results_names, results_values)
names(tbl_class) <- c("Performance Metrics", "Percent")
tbl_class$Percent <- round(tbl_class$Percent * 100, 2)
knitr::kable(tbl_class, caption = "Classification Performance Metrics")

Classification Performance Metrics
Performance Metrics	Percent
Sensitivity (Recall)	100
Precision	100
Specificity	100
OVerall Accuracy	100
F1	100

The same data with no “exceptions” leaves 8 records misclassified.

# Packages
#library(readxl)
#library(dplyr)
#library(lazyeval)
#library(knitr)

# Define pipe
`%>%` <- dplyr::`%>%`

# Data
df_samps_bugs <- readxl::read_excel(system.file("./extdata/Data_Benthos.xlsx"
                                              , package = "BioMonTools")
                                  , guess_max = 10^6)

# Variables
SampID     <- "SampleID"
TaxaID     <- "TaxaID"
TaxaCount  <- "N_Taxa"
Exclude    <- "Exclude_New"
TaxaLevels <- c("Kingdom"
                , "Phylum"
                , "SubPhylum"
                , "Class"
                , "SubClass"
                , "Order"
                , "SubOrder"
                , "SuperFamily"
                , "Family"
                , "SubFamily"
                , "Tribe"
                , "Genus"
                , "SubGenus"
                , "Species"
                , "Variety")
# Taxa that should be treated as equivalent
Exceptions <- NA

# EXAMPLE 2
## No Exceptions

df_tst2 <- BioMonTools::markExcluded(df_samps_bugs
                                     , SampID = "SampleID"
                                     , TaxaID = "TaxaID"
                                     , TaxaCount = "N_Taxa"
                                     , Exclude = "Exclude_New"
                                     , TaxaLevels = TaxaLevels
                                     , Exceptions = NA)
#> *WARNING*, 3 taxa levels missing from input data frame; 
#>  KINGDOM, SUBORDER, VARIETY
#> [1] "Working on item (1/12); PHYLUM"
#> [1] "Working on item (2/12); SUBPHYLUM"
#> [1] "Working on item (3/12); CLASS"
#> [1] "Working on item (4/12); SUBCLASS"
#> [1] "Working on item (5/12); ORDER"
#> [1] "Working on item (6/12); SUPERFAMILY"
#> [1] "Working on item (7/12); FAMILY"
#> [1] "Working on item (8/12); SUBFAMILY"
#> [1] "Working on item (9/12); TRIBE"
#> [1] "Working on item (10/12); GENUS"
#> [1] "Working on item (11/12); SUBGENUS"
#> [1] "Working on item (12/12); SPECIES"


# Compare
df_compare2 <- dplyr::summarise(dplyr::group_by(df_tst2, SampleID)
                               , Exclude_Import = sum(Exclude)
                               , Exclude_R = sum(Exclude_New))
df_compare2$Diff <- df_compare2$Exclude_Import - df_compare2$Exclude_R
#
tbl_diff2 <- table(df_compare2$Diff)
#kable(tbl_diff2)
# sort
df_compare2 <- df_compare2 %>% dplyr::arrange(desc(Diff))

# Number with issues
#sum(abs(df_compare2$Diff))
# total samples
#nrow(df_compare2)

# confusion matrix
tbl_results2 <- table(df_tst2$Exclude, df_tst2$Exclude_New, useNA = "ifany")
#
# Show differences
knitr::kable(tbl_results2, caption = "Confusion Matrix")

Confusion Matrix
	FALSE	TRUE
FALSE	24653	0
TRUE	8	2342

knitr::kable(df_compare2[1:10, ])

SampleID	Exclude_Import	Exclude_R	Diff
08BEA3478__2013-08-21_0	3	2	1
08BEA3571__2014-09-03_0	5	4	1
08LIT2692__2012-09-10_0	8	7	1
08SAM0000__2012-07-11_0	4	3	1
08SAM0000__2013-07-25_0	4	3	1
09BLA0716__2014-08-14_0	3	2	1
95C__2012-10-02_0	1	0	1
95T__2012-10-03_0	1	0	1
01103CSR_Bug_2001-08-27_0	1	1	0
02087REF_Bug_2002-08-22_0	0	0	0

knitr::kable(tail(df_compare2))

SampleID	Exclude_Import	Exclude_R
silvr192__2012-08-24_0	3	3
skytrmann__2010-08-12_0	6	6
skytrmann__2013-08-28_0	8	8
swampcart__2012-08-20_0	6	6
woodshand__2010-08-06_0	12	12
woodshand__2013-09-03_0	4	4

# samples with differences
(samp_diff2 <- as.data.frame(df_compare2[df_compare2[,"Diff"] != 0, "SampleID"]))
#>                  SampleID
#> 1 08BEA3478__2013-08-21_0
#> 2 08BEA3571__2014-09-03_0
#> 3 08LIT2692__2012-09-10_0
#> 4 08SAM0000__2012-07-11_0
#> 5 08SAM0000__2013-07-25_0
#> 6 09BLA0716__2014-08-14_0
#> 7       95C__2012-10-02_0
#> 8       95T__2012-10-03_0

# results for only those with differences
df_tst_diff2 <- dplyr::filter(df_tst2, SampleID %in% samp_diff2$SampleID)
# add diff field
df_tst_diff2$Exclude_Diff <- df_tst_diff2$Exclude - df_tst_diff2$Exclude_New

# Classification Performance Metrics
class_TP2 <- tbl_results2[2,2] # True Positive
class_FN2 <- tbl_results2[2,1] # False Negative
class_FP2 <- tbl_results2[1,2] # False Positive
class_TN2 <- tbl_results2[1,1] # True Negative
class_n2 <- sum(tbl_results2)  # total
#
# sensitivity (recall); TP / (TP+FN); measure model to ID true positives
class_sens2 <- class_TP2 / (class_TP2 + class_FN2)
# precision; TP / (TP+FP); accuracy of model positives
class_prec2 <- class_TP2 / (class_TP2 + class_FP2)
# specifity; TN / (TN + FP); measure model to ID true negatives
class_spec2 <- class_TN2 / (class_TN2 + class_FP2)
# overall accuracy; (TP + TN) / all cases; accuracy of all classifications
class_acc2 <- (class_TP2 + class_TN2) / class_n2
# F1; 2 * (class_prec*class_sens) / (class_prec+class_sens)
## balance of precision and recall
class_F12 <- 2 * (class_prec2 * class_sens2) / (class_prec2 + class_sens2)
#
results_names2 <- c("Sensitivity (Recall)"
                    , "Precision"
                    , "Specificity"
                    , "OVerall Accuracy"
                    , "F1")
results_values2 <- c(class_sens2
                     , class_prec2
                     , class_spec2
                     , class_acc2
                     , class_F12)
#
tbl_class2 <- data.frame(results_names2, results_values2)
names(tbl_class2) <- c("Performance Metrics", "Percent")
tbl_class2$Percent <- round(tbl_class2$Percent * 100, 2)
knitr::kable(tbl_class2, caption = "Classification Performance Metrics")

Classification Performance Metrics
Performance Metrics	Percent
Sensitivity (Recall)	99.66
Precision	100.00
Specificity	100.00
OVerall Accuracy	99.97
F1	99.83

rarify

The rarify function subsamples count data to a fixed count per sample. It takes as an input a 3 column data frame (SampleID, TaxonID, Count) and returns a similar dataframe with revised Counts. The names of the columns does not matter as they are specified in the code. Any non-count taxa (e.g., fish in a bug sample) should be removed prior to using the rarify function. The function code is from USEPA Corvallis John Van Sickle’s R code for RIVPACS (v1.0, 2005-06-10) and was tweaked for the addition of a user provided seed so repeatable results can be obtained.

The other function inputs are subsample size (target number of organisms in each sample) and seed. The seed is given so the results can be reproduced from the same input file. If no seed is given a random seed is used. An example seed is the date of admission to the Union for each state where the data is collected (e.g., Washington is 18891111). These values can be found on Wikipedia on the right sidebar for each State.

If you are running the 500-count BCG model and any of your samples have more than 600 organisms (the upper limit for the model), you should randomly subsample your data to 600 (600 is +20% of the 500-count target). This is done to make richness metrics comparable across the 500-count samples. You can do this with the Rarify routine.

# Subsample to 500 organisms (from over 500 organisms) for 12 samples.

# Packages
#library(BioMonTools)
#library(knitr)

# load bio data
df_biodata <- BioMonTools::data_bio2rarify
#dim(df_biodata)
#kable(head(df_biodata))

# subsample
mySize <- 500
Seed_OR <- 18590214
Seed_WA <- 18891111
Seed_US <- 17760704
bugs_mysize <- BioMonTools::rarify(inbug = df_biodata
                                   , sample.ID = "SampleID"
                                   , abund = "N_Taxa"
                                   , subsiz = mySize
                                   , mySeed = Seed_US)
#> Rarify of samples complete. 
#>  Number of samples =  12 
#>  Execution time (sec) =  0.00900000000000034


# view results
#dim(bugs_mysize)
#kable(head(bugs_mysize))

# Compare pre- and post- subsample counts
df_compare <- merge(df_biodata
                    , bugs_mysize
                    , by = c("SampleID", "TaxaID")
                    , suffixes = c("_Orig","_500"))
df_compare <- df_compare[,c("SampleID", "TaxaID", "N_Taxa_Orig", "N_Taxa_500")]
knitr::kable(head(df_compare), caption = "Comparison, by Sample")

Comparison, by Sample
SampleID	TaxaID	N_Taxa_Orig	N_Taxa_500
01103CSR_Bug_2001-08-27_0	Ceratopogoninae	237	142
01103CSR_Bug_2001-08-27_0	Cheumatopsyche	2	0
01103CSR_Bug_2001-08-27_0	Chironomini	8	4
01103CSR_Bug_2001-08-27_0	Cladocera	2	1
01103CSR_Bug_2001-08-27_0	Coenagrionidae	2	1
01103CSR_Bug_2001-08-27_0	Corbicula	2	1


# compare totals
tbl_totals <- aggregate(cbind(N_Taxa_Orig, N_Taxa_500) ~ SampleID
                        , df_compare
                        , sum)
knitr::kable(head(tbl_totals), caption = "Comparison, sample totals")

Comparison, sample totals
SampleID	N_Taxa_Orig	N_Taxa_500
01103CSR_Bug_2001-08-27_0	816	500
02087REF_Bug_2002-08-22_0	542	500
03013CSR_Bug_2003-07-01_0	508	500
03053CSR_Bug_2003-08-14_0	573	500
03054CSR_Bug_2003-08-18_0	558	500
06012CSR_Bug_2006-09-05_0	545	500


## Not run: 
# save the data
#write.table(bugs_mysize, paste("bugs",mySize,"txt",sep="."),sep="\t")
## End(Not run)

Flags

Results should be interpreted with caution if they are flagged for any of the criteria listed below. It is up to the user’s discretion as to how to handled flagged samples.

# Packages
#library(readxl)
#library(reshape2)
#library(knitr)
#library(BioMonTools)

# Import
df.samps.bugs <- readxl::read_excel(system.file("extdata/Data_Benthos.xlsx"
                                                , package = "BioMonTools")
                           , guess_max = 10^6)

# Calculate Metrics
# Extra columns to keep in results
keep.cols <- c("Area_mi2", "SurfaceArea", "Density_m2", "Density_ft2")
# Run Function
df.metrics <- BioMonTools::metric.values(df.samps.bugs
                                         , "bugs"
                                         , fun.cols2keep = keep.cols)
#> Updated col class; TOLVAL2 to numeric
#> Joining with `by = join_by(SAMPLEID, INDEX_NAME, INDEX_CLASS)`


# Flags
# Import QC Checks
df.checks <- readxl::read_excel(system.file("extdata/MetricFlags.xlsx"
                                          , package = "BioMonTools")
                                , sheet = "Flags") 
# Run Function
df.flags <- BioMonTools::qc.checks(df.metrics, df.checks)

# Change terminology; PASS/FAIL to NA/flag
df.flags[,"FLAG"][df.flags[,"FLAG"] == "FAIL"] <- "flag"
df.flags[, "FLAG"][df.flags[,"FLAG"] == "PASS"] <- NA
# long to wide format
df.flags.wide <- reshape2::dcast(df.flags
                                 , SAMPLEID ~ CHECKNAME
                                 , value.var = "FLAG")
# Calc number of "flag"s by row.
df.flags.wide$NumFlags <- rowSums(df.flags.wide == "flag", na.rm = TRUE)
# Rearrange columns
NumCols <- ncol(df.flags.wide)
df.flags.wide <- df.flags.wide[, c(1, NumCols, 2:(NumCols - 1))]
# View(df.flags.wide)

# Summarize Results
knitr::kable(table(df.flags[,"CHECKNAME"], df.flags[,"FLAG"], useNA = "ifany"))

	flag	NA
brackish organisms present	7	671
catchment, Large	0	678
catchment, small	158	520
individuals, dominant 02, Large	204	474
individuals, Large	0	678
individuals, small	80	598
Low density (ft2)	0	678
Low density (m2)	0	678
Ramellogammarus	8	670
surface area, small	112	566

Erik.Leppo@tetratech.com

2024-06-05