Convert user taxa names to those in an official project-based name list.

taxa_translate(
  df_user = NULL,
  df_official = NULL,
  df_official_metadata = NULL,
  taxaid_user = "TAXAID",
  taxaid_official_match = NULL,
  taxaid_official_project = NULL,
  taxaid_drop = NULL,
  col_drop = NULL,
  sum_n_taxa_boo = FALSE,
  sum_n_taxa_col = NULL,
  sum_n_taxa_group_by = NULL,
  clean = FALSE,
  match_caps = FALSE
)

Arguments

df_user

User taxa data.

df_official

Official project taxa data (master taxa list).

df_official_metadata

Metadata for official project taxa data. Default is NULL.

taxaid_user

Taxonomic identifier in user data. Default is "TAXAID".

taxaid_official_match

Taxonomic identifier in official data used to match with user data. This is not the project taxonomic identifier.

taxaid_official_project

Taxonomic identifier in official data that is specific to a project, e.g., after operational taxonomic unit (OTU) applied.

taxaid_drop

Official taxonomic identifier that signals a record should be dropped; e.g., DNI (Do Not Include) or -999. Default = NULL

col_drop

Columns to remove in output. Default = NULL

sum_n_taxa_boo

Boolean value for whether the results should be summarized. Default = FALSE

sum_n_taxa_col

Column name for number of individuals for user data when summarizing. This column will be summed. Default = NULL (suggestion = N_TAXA)

sum_n_taxa_group_by

Column names for user data to use for grouping the data when summarizing the user data. Suggestions are SAMPID and TAXA_ID. Default = NULL

clean

Whether leading and trailing white space should be removed from the taxa names. Non-breaking spaces (e.g., from ITIS) are also removed. Default = FALSE

match_caps

Whether the matching should be performed using ALL CAPS. Default = FALSE

Value

A list with four elements. The first (merge) is the user data frame with additional columns from the official data appended to it. Names from the user data that overlap with the official data have the suffix '_User'. The second element (nonmatch) of the list is a vector of the non-matching taxa from the user data. The third element (metadata) includes the metadata for the official data (if provided). The fourth element (unique) is a data frame of the unique taxa names, old and new.

Details

Merges user file with official file. The official file has phylogeny, autecology, and other project specific fields.

The function takes existing data frames (or tibbles) as inputs.

For any field name that occurs in both the user file and the official file, the 'official' version of the column keeps its name.
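The handling of overlapping column names can be illustrated with a minimal base R sketch. It is not the package's internal code; the toy data frames and the `Family` values are hypothetical, and `merge()` is used here only as an assumed stand-in for whatever join the function performs.

```r
# Hypothetical toy data; column names and values are illustrative only.
df_user <- data.frame(TAXAID = c("Agapetus", "Zavrelimyia"),
                      Family = c("user_fam_1", "user_fam_2"),
                      N_TAXA = c(33, 50))
df_official <- data.frame(Taxon_orig = c("Agapetus", "Zavrelimyia"),
                          Family = c("Glossosomatidae", "Chironomidae"))

# Left join on the taxa identifier. The overlapping 'Family' column from the
# user data receives the '_User' suffix; the official column keeps its name.
df_merge <- merge(df_user, df_official,
                  by.x = "TAXAID", by.y = "Taxon_orig",
                  all.x = TRUE, suffixes = c("_User", ""))
names(df_merge)
```

After the join, `Family` holds the official values and `Family_User` holds the values supplied by the user.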

The 'col_drop' parameter can be used to remove unwanted columns; e.g., the other taxa id fields in the 'official' data file.

By default, taxa are not collapsed to the official taxaid. That is, if multiple taxa in a sample have the same name, the rows will not be combined. If collapsing is desired, set the parameter `sum_n_taxa_boo` to TRUE. In that case, `sum_n_taxa_col` and `sum_n_taxa_group_by` must also be provided.
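As a rough sketch of what collapsing means, the base R code below sums the count column within each grouping combination. It assumes `aggregate()` as a stand-in for the function's own summarizing step, which may differ, and uses toy data in the same shape as the multiple-stages example.

```r
# Toy data: the same taxon recorded in several rows of one sample.
df_user <- data.frame(SAMPLEID = "Test2023",
                      TAXAID = c(rep("Agapetus", 3), rep("Zavrelimyia", 2)),
                      N_TAXA = c(rep(33, 3), rep(50, 2)))

sum_n_taxa_col <- "N_TAXA"                       # column to be summed
sum_n_taxa_group_by <- c("SAMPLEID", "TAXAID")   # grouping columns

# Sum the count column within each SAMPLEID/TAXAID combination.
df_sum <- aggregate(df_user[sum_n_taxa_col],
                    by = df_user[sum_n_taxa_group_by],
                    FUN = sum)
df_sum
```

Columns that are not listed in the grouping (for example a life stage column) do not appear in the collapsed result, so the choice of grouping columns determines what is kept.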

Slightly different from `qc_taxa`: `taxa_translate` is more generic and has no options for using one field over another.

The parameter `taxaid_drop` is used to drop records that matched to a new name that should not be included in the results. Examples include "999" or "DNI" (Do Not Include). Default is NULL so no action is taken. "NA"s are always removed.
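The drop rule can be sketched in base R as a row filter on the project identifier. This is an illustration under assumptions, not the function's actual code; the data frame and its `OTU` column are hypothetical.

```r
# Hypothetical merged result with a project identifier column 'OTU'.
df_merge <- data.frame(TAXAID = c("Agapetus", "Terrestrial sp.", "Unknown"),
                       OTU = c("Agapetus", "DNI", NA))
taxaid_official_project <- "OTU"
taxaid_drop <- "DNI"

# NA identifiers are removed regardless of taxaid_drop.
keep <- !is.na(df_merge[[taxaid_official_project]])

# When taxaid_drop is supplied, also remove records that matched it.
if (!is.null(taxaid_drop)) {
  keep <- keep & df_merge[[taxaid_official_project]] != taxaid_drop
}
df_keep <- df_merge[keep, , drop = FALSE]
df_keep
```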

Optional parameter to `clean` the data of leading and trailing white space. Default is FALSE (no action). Not fully implemented.

The optional parameter `match_caps` matches on all upper case (input and official lists). The default is FALSE and matches will be performed without any additional steps for case. Not fully implemented.
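Because `clean` and `match_caps` are described as not fully implemented, the same normalization can be applied to the user data before calling the function. The sketch below shows one way to do that in base R; it reflects the intent of the two parameters as described above rather than the package's implementation, and the taxa vectors are toy values.

```r
# Toy names with leading/trailing spaces, a non-breaking space, and mixed case.
taxa_user <- c("  Agapetus", "zavrelimyia\u00a0", "Baetis ")
taxa_official <- c("Agapetus", "Zavrelimyia", "Baetis")

# clean: strip leading and trailing white space. The "[\\h\\v]" class also
# covers non-breaking spaces, which the default trimws() pattern does not.
taxa_clean <- trimws(taxa_user, whitespace = "[\\h\\v]")

# match_caps: compare upper-case versions of both lists, then return the
# official spelling for each match.
idx <- match(toupper(taxa_clean), toupper(taxa_official))
taxa_matched <- taxa_official[idx]
taxa_matched
```

Entries with no match come back as NA in `taxa_matched`, which makes them easy to review before translation.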

The taxa list and metadata file names will be added to the results as two new columns.

Another output is the unique taxa with old and new names.

Examples

# Example 1, PacNW
## Input Parameters
df_user <- data_benthos_PacNW
fn_official <- file.path(system.file("extdata", package = "BioMonTools")
                         , "taxa_official"
                         , "ORWA_TAXATRANSLATOR_20221219.csv")
df_official <- read.csv(fn_official)
fn_official_metadata <- file.path(system.file("extdata"
                                              , package = "BioMonTools")
                                  , "taxa_official"
                                  , "ORWA_ATTRIBUTES_METADATA_20221117.csv")
df_official_metadata <- read.csv(fn_official_metadata)
taxaid_user <- "TaxaID"
taxaid_official_match <- "Taxon_orig"
taxaid_official_project <- "OTU_MTTI"
taxaid_drop <- "DNI"
col_drop <- c("Taxon_v2", "OTU_BCG_MariNW") # non desired ID cols in Official
sum_n_taxa_boo <- TRUE
sum_n_taxa_col <- "N_TAXA"
sum_n_taxa_group_by <- c("INDEX_NAME", "INDEX_CLASS", "SampleID", "TaxaID")
## Run Function
taxatrans <- taxa_translate(df_user
                               , df_official
                               , df_official_metadata
                               , taxaid_user
                               , taxaid_official_match
                               , taxaid_official_project
                               , taxaid_drop
                               , col_drop
                               , sum_n_taxa_boo
                               , sum_n_taxa_col
                               , sum_n_taxa_group_by)
#> User taxa match, 221 / 223
#> The following user taxa (2/223) did not match the official taxa list:
#> Eukiefferiella coerulescens/claripennis groups
#> Telmatodrilinae
## View Results
# View(taxatrans$merge)
taxatrans$nonmatch
#>                                           TaxaID N_Taxa_Sum N_Taxa_Count
#> 1 Eukiefferiella coerulescens/claripennis groups          4            1
#> 2                                Telmatodrilinae          1            1
# View(taxatrans$official_metadata)

#~~~~~
# Example 2, Multiple Stages
# Create data
TAXAID <- c(rep("Agapetus", 3), rep("Zavrelimyia", 2))

N_TAXA <- c(rep(33, 3), rep(50, 2))
STAGE <- c("A","L","P","X","")
df_user <- data.frame(TAXAID, N_TAXA, STAGE)
df_user[, "INDEX_NAME"] <- "BCG_MariNW_Bugs500ct"
df_user[, "INDEX_CLASS"] <- "HiGrad-HiElev"
df_user[, "SAMPLEID"] <- "Test2023"
df_user[, "STATIONID"] <- "Test"
df_user[, "DATE"] <- "2023-01-16"
## Input Parameters
fn_official <- file.path(system.file("extdata", package = "BioMonTools")
                         , "taxa_official"
                         , "ORWA_TAXATRANSLATOR_20221219.csv")
df_official <- read.csv(fn_official)
fn_official_metadata <- file.path(system.file("extdata"
                                              , package = "BioMonTools")
                                  , "taxa_official"
                                  , "ORWA_ATTRIBUTES_20221212.csv")
df_official_metadata <- read.csv(fn_official_metadata)
taxaid_user <- "TAXAID"
taxaid_official_match <- "Taxon_orig"
taxaid_official_project <- "OTU_BCG_MariNW"
taxaid_drop <- NULL
col_drop <- c("Taxon_v2", "OTU_MTTI") # non desired ID cols in Official
sum_n_taxa_boo <- TRUE
sum_n_taxa_col <- "N_TAXA"
sum_n_taxa_group_by <- c("INDEX_NAME", "INDEX_CLASS", "SAMPLEID", "TAXAID")
## Run Function
taxatrans <- BioMonTools::taxa_translate(df_user
                                         , df_official
                                         , df_official_metadata
                                         , taxaid_user
                                         , taxaid_official_match
                                         , taxaid_official_project
                                         , taxaid_drop
                                         , col_drop
                                         , sum_n_taxa_boo
                                         , sum_n_taxa_col
                                         , sum_n_taxa_group_by)
#> User taxa match, 2 / 2
## View Results (before and after)
df_user
#>        TAXAID N_TAXA STAGE           INDEX_NAME   INDEX_CLASS SAMPLEID
#> 1    Agapetus     33     A BCG_MariNW_Bugs500ct HiGrad-HiElev Test2023
#> 2    Agapetus     33     L BCG_MariNW_Bugs500ct HiGrad-HiElev Test2023
#> 3    Agapetus     33     P BCG_MariNW_Bugs500ct HiGrad-HiElev Test2023
#> 4 Zavrelimyia     50     X BCG_MariNW_Bugs500ct HiGrad-HiElev Test2023
#> 5 Zavrelimyia     50       BCG_MariNW_Bugs500ct HiGrad-HiElev Test2023
#>   STATIONID       DATE
#> 1      Test 2023-01-16
#> 2      Test 2023-01-16
#> 3      Test 2023-01-16
#> 4      Test 2023-01-16
#> 5      Test 2023-01-16
taxatrans$merge
#>        TAXAID           INDEX_NAME   INDEX_CLASS SAMPLEID N_TAXA  Taxon_Group
#> 1    Agapetus BCG_MariNW_Bugs500ct HiGrad-HiElev Test2023     99  Trichoptera
#> 2 Zavrelimyia BCG_MariNW_Bugs500ct HiGrad-HiElev Test2023    100 Chironomidae
#>      Taxon_v2 Changed            Rationale NonTarget    OTU_MTTI OTU_BCG_MariNW
#> 1    Agapetus      no                          FALSE    Agapetus       Agapetus
#> 2 Zavrelimyia      no Current Nomenclature     FALSE Zavrelimyia    Zavrelimyia
#>      TSN  Kingdom     Phylum SubPhylum   Class  SubClass       Order   SubOrder
#> 1 117121 Animalia Arthropoda  Hexapoda Insecta Pterygota Trichoptera           
#> 2 128259 Animalia Arthropoda  Hexapoda Insecta Pterygota     Diptera Nematocera
#>   SuperFamily          Family   SubFamily        Tribe GenusGroup       Genus
#> 1             Glossosomatidae  Agapetinae                            Agapetus
#> 2                Chironomidae Tanypodinae Pentaneurini            Zavrelimyia
#>   SubGenus SpeciesGroup SpeciesSubGroup SpeciesComplex Species Match_Official
#> 1                                                                        TRUE
#> 2                                                                        TRUE