---
title: "Build a Mini Reference Library"
author: "Win Cowger"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Build a Mini Reference Library}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  warning = FALSE
)
data.table::setDTthreads(2)
```

```{r setup}
library(OpenSpecy)
```

This example combines a few files bundled with OpenSpecy into a small reference library. Real library builds usually need larger lookup tables and more curation, but the same helper functions apply.

## Read And Combine Spectra

`c_spec()` and `build_lib()` default to the widest range represented by the source spectra at resolution 6. Values outside each source's original range are kept as `NA`, so useful spectral regions are not discarded.

```{r mini-library-read}
mini_files <- c(
  read_extdata("raman_hdpe.csv"),
  read_extdata("ftir_ldpe_soil.asp"),
  read_extdata("raman_atacamit.spc")
)

mini_sources <- lapply(mini_files, read_any)
mini_sources <- lapply(mini_sources, function(x) {
  x$metadata$intensity_units <- "absorbance"
  attr(x, "intensity_unit") <- "absorbance"
  x
})

mini_raw <- c_spec(mini_sources)
check_OpenSpecy(mini_raw)
dim(mini_raw$spectra)
mini_raw$metadata[, "file_name", with = FALSE]
```

## Create And Fill A Lookup

Lookup templates help users see which metadata values need curation. When `path` is not supplied, the template is returned as a `data.table`; users can write it to CSV by supplying `path`. Lookup joins use exact values, so edit the template until the key column matches the object metadata.

```{r mini-library-template}
template <- make_lib_lookup_template(
  mini_raw,
  columns = "file_name",
  add = c("material", "material_type")
)
template
```

For this small example, create the lookup table directly in R. The same shape could come from a CSV edited outside R.

```{r mini-library-join}
lookup <- data.table::data.table(
  file_name = basename(mini_files),
  library_type = c("example", "example", "example"),
  material = c("hdpe", "ldpe in soil", "atacamite")
)

hierarchy <- data.table::data.table(
  material = c("hdpe", "ldpe in soil", "atacamite"),
  material_class = c("polyethylene", "polyethylene", "copper mineral"),
  material_type = c("plastic", "plastic", "mineral")
)

join_lib_metadata(mini_raw, lookup, by = "file_name",
                  require_complete = TRUE)$metadata[
  , c("file_name", "library_type", "material"), with = FALSE
]
```

## Build A Mini Library

`build_lib()` can run the ordinary lookup and material hierarchy joins before applying named recipes. Recipe names become the names of the returned list. An empty recipe keeps the merged spectra unchanged; any other recipe list is passed to `process_spec()`. Missing values and processing attributes are handled automatically. Metadata column names are also cleaned to lowercase underscore names. Known aliases are coalesced using an editable lookup table. Variants that differ only by underscores or one terminal plural `s` match automatically.

Before sources are merged, `build_lib()` also converts declared reflectance and transmittance spectra to absorbance by default. A nonempty `attr(x, "intensity_unit")` takes precedence and applies to the whole object; otherwise, `metadata$intensity_units` is evaluated spectrum by spectrum. Unknown or missing units are left unchanged with a warning. Use `convert_intensity = FALSE` when unit handling has already been completed outside the builder.

The normal input is one or more file paths readable by `read_any()`; a list of `OpenSpecy` objects supports sources already loaded in memory. A bare `OpenSpecy` object is intentionally rejected, so wrap a single object as `list(x)`. Supplying `restrict_range_args` triggers the existing `restrict_range()` operation before deduplication and recipes; retaining multiple ranges can exclude a known silent region without custom workflow code.

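As a rough illustration of the unit handling described above, the sketch below applies the standard conversions to absorbance: `log10(1 / T)` for transmittance and the Kubelka-Munk form `(1 - R)^2 / (2 * R)` for reflectance. Whether `build_lib()` uses exactly these formulas is an assumption here, so check `?adj_intens` for the package's own conversion. `to_absorbance_sketch()` is a hypothetical helper written only for this example, and it assumes intensities are fractions between 0 and 1 rather than percentages.

```{r intensity-conversion-sketch}
# Hypothetical helper for illustration only; not part of OpenSpecy.
# Assumes intensities are fractions in (0, 1], not percentages.
to_absorbance_sketch <- function(intensity, unit) {
  # Normalise the declared unit; treat missing units as unknown.
  unit <- tolower(trimws(as.character(unit)[1]))
  if (is.na(unit)) unit <- "unknown"

  switch(
    unit,
    absorbance = intensity,
    transmittance = log10(1 / intensity),
    reflectance = (1 - intensity)^2 / (2 * intensity),
    {
      # Mirror the documented behaviour: leave values unchanged and warn.
      warning("Unknown or missing intensity unit; values left unchanged.")
      intensity
    }
  )
}

to_absorbance_sketch(c(1, 0.1, 0.01), "transmittance")
to_absorbance_sketch(c(1, 0.5), "reflectance")
```

The first call should return 0, 1, and 2, and the second 0 and 0.25. In a real build, prefer the built-in conversion in `build_lib()` or `adj_intens()` over a hand-written helper like this one.
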
```{r metadata-name-lookup}
name_lookup <- lib_metadata_name_lookup(
  project_code = c("campaign id", "study code"),
  regex = list(instrument_mode = "^method_[0-9]+$")
)
name_lookup[
  canonical_name %in% c("material_color", "number_of_accumulations")
]
lib_clean_name(c("User Name", "Laser (%)", "Method...3"))
```

Named arguments add exact aliases to the defaults, while `regex` adds patterns evaluated against cleaned names. Overlapping regex patterns produce an error that identifies the source column and matching rules. Pass the result as `metadata_name_lookup`.

Ordinary and hierarchical joins run whenever their corresponding lookup input is non-`NULL`.

```{r mini-library-build}
mini_libs <- build_lib(
  mini_files,
  recipes = list(
    raw = list(),
    derivative = list(
      conform_spec = FALSE,
      smooth_intens = TRUE,
      smooth_intens_args = list(window = 15, derivative = 1),
      make_rel = TRUE
    ),
    nobaseline = list(
      conform_spec = FALSE,
      smooth_intens = FALSE,
      subtr_baseline = TRUE,
      make_rel = TRUE
    )
  ),
  metadata_lookups = lookup,
  material_hierarchy = hierarchy,
  convert_intensity = FALSE,
  assess = TRUE,
  dedupe = FALSE
)

names(mini_libs)
check_OpenSpecy(mini_libs$raw)
check_OpenSpecy(mini_libs$derivative)
attr(mini_libs$derivative, "derivative_order")
attr(mini_libs$nobaseline, "baseline")
mini_libs$raw$metadata[
  , .(file_name, material, material_class, material_type, sn,
      assessment_flag, assessment_checks)
]
```

## Official Reference-Library Workflow

The version-controlled [`workflows/OpenSpecy_reference_library.R`](https://github.com/wincowgerDEV/OpenSpecy-package/blob/main/workflows/OpenSpecy_reference_library.R) script is a straight-line composition of OpenSpecy functions. Canonically named class, library-type, material-hierarchy, and known-bad-ID tables live under `workflows/data/`. Raw-data corrections are completed externally before this workflow runs.
The script is excluded from package builds but remains available in the GitHub repository so library releases can be reviewed and reproduced.