Build a Mini Reference Library

library(OpenSpecy)

This example combines a few files bundled with OpenSpecy into a small reference library. Real library builds usually need larger lookup tables and more curation, but the same helper functions apply.

Read And Combine Spectra

By default, c_spec() and build_lib() merge spectra onto the widest wavenumber range covered by any source, at a resolution of 6. Positions outside a source’s original range are kept as NA, so no source loses a useful spectral region to the overlap with the others.

mini_files <- c(
  read_extdata("raman_hdpe.csv"),
  read_extdata("ftir_ldpe_soil.asp"),
  read_extdata("raman_atacamit.spc")
)

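# Read each file with the format-agnostic reader.
mini_sources <- lapply(mini_files, read_any)
# Declare the same intensity unit on every source, both in the per-spectrum
# metadata and as an object-level attribute.
mini_sources <- lapply(mini_sources, function(x) {
  x$metadata$intensity_units <- "absorbance"
  attr(x, "intensity_unit") <- "absorbance"
  x
})
# Merge the sources onto one shared wavenumber axis.
mini_raw <- c_spec(mini_sources)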

check_OpenSpecy(mini_raw)
#> [1] TRUE
dim(mini_raw$spectra)
#> [1] 647   3
mini_raw$metadata[, "file_name", with = FALSE]
#>             file_name
#>                <char>
#> 1:     raman_hdpe.csv
#> 2: ftir_ldpe_soil.asp
#> 3: raman_atacamit.spc
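
The NA padding described above can be pictured with a small base R sketch. This is not how c_spec() is implemented; the two toy spectra, the names spec_a, spec_b, and to_grid, and the use of linear interpolation through approx() are assumptions made purely for illustration.

```r
# Two made-up spectra on different wavenumber axes.
spec_a <- list(wavenumber = seq(300, 1200, by = 3))
spec_a$intensity <- sin(spec_a$wavenumber / 100)
spec_b <- list(wavenumber = seq(900, 2400, by = 4))
spec_b$intensity <- cos(spec_b$wavenumber / 150)

# Shared axis: widest range covered by either source, at resolution 6.
res <- 6
grid <- seq(
  min(spec_a$wavenumber, spec_b$wavenumber),
  max(spec_a$wavenumber, spec_b$wavenumber),
  by = res
)

# Hypothetical helper: interpolate onto the shared axis.
# rule = 1 makes approx() return NA outside the source's own range.
to_grid <- function(s) {
  approx(s$wavenumber, s$intensity, xout = grid, rule = 1)$y
}

merged <- data.frame(
  wavenumber = grid,
  a = to_grid(spec_a),
  b = to_grid(spec_b)
)

# Each source is NA only where it had no data to begin with.
colSums(is.na(merged[, c("a", "b")]))
```

Restricting to the common range instead would keep only 900 to 1200 here, which is why padding with NA preserves more of each spectrum.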

Create And Fill A Lookup

A lookup template lists the metadata values that need curation. When path is not supplied, make_lib_lookup_template() returns the template as a data.table; supply path to write it to CSV. Lookup joins match on exact values, so edit the template until every value in the key column matches the object metadata exactly.

template <- make_lib_lookup_template(
  mini_raw,
  columns = "file_name",
  add = c("material", "material_type")
)

template
#>             file_name material material_type
#>                <char>   <char>        <char>
#> 1:     raman_hdpe.csv     <NA>          <NA>
#> 2: ftir_ldpe_soil.asp     <NA>          <NA>
#> 3: raman_atacamit.spc     <NA>          <NA>

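For this small example, create the lookup table directly in R. The same shape could come from a CSV edited outside R. Here the lookup supplies library_type and material, while material_class and material_type come from a separate hierarchy table keyed on material.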

lookup <- data.table::data.table(
  file_name = basename(mini_files),
  library_type = c("example", "example", "example"),
  material = c("hdpe", "ldpe in soil", "atacamite")
)

hierarchy <- data.table::data.table(
  material = c("hdpe", "ldpe in soil", "atacamite"),
  material_class = c("polyethylene", "polyethylene", "copper mineral"),
  material_type = c("plastic", "plastic", "mineral")
)

join_lib_metadata(mini_raw, lookup, by = "file_name",
                  require_complete = TRUE)$metadata[
  , c("file_name", "library_type", "material"), with = FALSE
]
#>             file_name library_type     material
#>                <char>       <char>       <char>
#> 1:     raman_hdpe.csv      example         hdpe
#> 2: ftir_ldpe_soil.asp      example ldpe in soil
#> 3: raman_atacamit.spc      example    atacamite
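
Because the join matches exact values, it can help to compare keys before joining. The following base R sketch shows one way to spot mismatches; the hand-edited keys are invented for illustration, and the check is independent of join_lib_metadata().

```r
# Key values as they appear in the object metadata.
meta_keys <- c("raman_hdpe.csv", "ftir_ldpe_soil.asp", "raman_atacamit.spc")

# A hypothetical hand-edited lookup with two subtle problems:
# a case change and a trailing space.
edited_keys <- c("raman_hdpe.csv", "FTIR_ldpe_soil.asp", "raman_atacamit.spc ")

# Metadata keys with no exact partner in the lookup.
unmatched <- setdiff(meta_keys, edited_keys)
unmatched

# Trimming whitespace repairs one key; the case change still needs
# a manual fix in the lookup.
still_unmatched <- setdiff(meta_keys, trimws(edited_keys))
still_unmatched
```

An empty result from setdiff() means every metadata key has an exact match in the lookup.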

Build A Mini Library

build_lib() can run the ordinary lookup join and the material hierarchy join before applying named recipes. Each recipe name becomes a name in the returned list. An empty recipe, list(), keeps the merged spectra unchanged; any other recipe list is passed to process_spec(). Missing values and processing attributes are handled automatically. Metadata column names are cleaned to lowercase, underscore-separated names, and known aliases are coalesced using an editable lookup table. Variants that differ only by underscores or by one terminal plural s match automatically.
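
The automatic matching of underscore and plural variants can be pictured as comparing names through a normalized key. The helper below is a hypothetical sketch in base R, not the package's implementation; match_key and the example names are assumptions.

```r
# Hypothetical normalizer: lowercase, drop underscores,
# then drop a single trailing "s".
match_key <- function(x) {
  x <- gsub("_", "", tolower(x), fixed = TRUE)
  sub("s$", "", x)
}

variants <- c("file_name", "filename", "File_Names")
keys <- match_key(variants)
keys

# All three variants collapse to one key, so they would be
# treated as the same column under this rule.
length(unique(keys)) == 1
```

A rule like this also shortens names that legitimately end in s, which is harmless only because both sides of the comparison are normalized the same way.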

Before sources are merged, build_lib() also converts declared reflectance and transmittance spectra to absorbance by default. A nonempty attr(x, "intensity_unit") takes precedence and applies to the whole object; otherwise, metadata$intensity_units is evaluated spectrum by spectrum. Spectra with unknown or missing units are left unchanged, with a warning. Set convert_intensity = FALSE when unit handling has already been completed outside the builder.
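
The precedence rule can be sketched in a few lines of base R. Both helpers below are hypothetical and only mirror the behavior described in this paragraph. The transmittance formula is the textbook relation A = -log10(T) for T expressed as a fraction. Reflectance is left out because the transform build_lib() applies is not stated here.

```r
# Hypothetical helper: an object-level unit, when nonempty, wins for
# every spectrum; otherwise the per-spectrum metadata values are used.
resolve_units <- function(object_unit, spectrum_units) {
  has_object_unit <- length(object_unit) == 1 &&
    !is.na(object_unit) && nzchar(object_unit)
  if (has_object_unit) {
    rep(object_unit, length(spectrum_units))
  } else {
    spectrum_units
  }
}

# Hypothetical helper: convert one spectrum according to its unit.
to_absorbance <- function(intensity, unit) {
  if (unit %in% "absorbance") return(intensity)
  if (unit %in% "transmittance") return(-log10(intensity))
  warning("Unknown or missing unit; intensities left unchanged.")
  intensity
}

# The object-level attribute overrides mixed per-spectrum values.
resolve_units("absorbance", c("transmittance", NA))

# Without it, each spectrum keeps its own declaration.
resolve_units(NULL, c("transmittance", NA))

# Transmittance fractions of 1 and 0.1 map to absorbance 0 and 1.
to_absorbance(c(1, 0.1), "transmittance")
```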

The normal input is one or more file paths readable by read_any(); a list of OpenSpecy objects is also accepted for sources already loaded in memory. A bare OpenSpecy object is rejected by design, so wrap a single object as list(x). Supplying restrict_range_args runs restrict_range() before deduplication and recipes; retaining multiple ranges can exclude a known silent region without custom workflow code.
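
Retaining several ranges amounts to keeping every wavenumber that falls inside at least one window. The base R sketch below illustrates that idea only; the axis, the window limits, and the variable names are made up and do not come from the package.

```r
# Illustrative wavenumber axis.
wavenumber <- seq(400, 4000, by = 100)

# Two retained windows; the gap between them is dropped.
keep_min <- c(400, 2700)
keep_max <- c(1800, 3200)

# TRUE where a wavenumber lies inside any retained window.
in_any <- Reduce(
  `|`,
  Map(function(lo, hi) wavenumber >= lo & wavenumber <= hi,
      keep_min, keep_max)
)

kept <- wavenumber[in_any]
range(kept)
length(kept)
```

If restrict_range_args is forwarded to restrict_range() with vector-valued min and max, the equivalent request would presumably be list(min = c(400, 2700), max = c(1800, 3200)); confirm the argument names in the function documentation before relying on this.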

name_lookup <- lib_metadata_name_lookup(
  project_code = c("campaign id", "study code"),
  regex = list(instrument_mode = "^method_[0-9]+$")
)
name_lookup[
  canonical_name %in% c("material_color", "number_of_accumulations")
]
#>             canonical_name             source_name  regex
#>                     <char>                  <char> <char>
#> 1:          material_color          material_color   <NA>
#> 2:          material_color                   color   <NA>
#> 3:          material_color                  colour   <NA>
#> 4: number_of_accumulations number_of_accumulations   <NA>
#> 5: number_of_accumulations  number_of_sample_scans   <NA>
#> 6: number_of_accumulations           coadded_scans   <NA>
lib_clean_name(c("User Name", "Laser (%)", "Method...3"))
#> [1] "user_name"  "laser_perc" "method_3"

Named arguments add exact aliases to the defaults, while regex adds patterns evaluated against cleaned names. Overlapping regex patterns produce an error that identifies the source column and the matching rules. To use a customized table, pass it as metadata_name_lookup; the call below omits it and so uses the defaults. Ordinary and hierarchical joins run whenever their corresponding lookup input is non-NULL.
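
To see why overlapping patterns are treated as an error, consider a cleaned name that satisfies two rules at once. The base R sketch below detects that situation; the second rule, method_id, is hypothetical, and the check is not the package's own code.

```r
# One rule from the example above plus a hypothetical, broader rule.
patterns <- c(
  instrument_mode = "^method_[0-9]+$",
  method_id = "^method_"
)

# Cleaned column names to test.
cleaned <- c("method_3", "user_name")

# For each name, collect the canonical names whose pattern matches.
hits <- lapply(cleaned, function(nm) {
  names(patterns)[vapply(patterns, grepl, logical(1), x = nm)]
})
names(hits) <- cleaned

# Names claimed by more than one rule are ambiguous.
overlapping <- hits[lengths(hits) > 1]
overlapping
```

Here method_3 is claimed by both rules, so there is no unambiguous canonical name for it; tightening one pattern resolves the conflict.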

mini_libs <- build_lib(
  mini_files,
  recipes = list(
    raw = list(),
    derivative = list(
      conform_spec = FALSE,
      smooth_intens = TRUE,
      smooth_intens_args = list(window = 15, derivative = 1),
      make_rel = TRUE
    ),
    nobaseline = list(
      conform_spec = FALSE,
      smooth_intens = FALSE,
      subtr_baseline = TRUE,
      make_rel = TRUE
    )
  ),
  metadata_lookups = lookup,
  material_hierarchy = hierarchy,
  convert_intensity = FALSE,
  assess = TRUE,
  dedupe = FALSE
)

names(mini_libs)
#> [1] "raw"        "derivative" "nobaseline"
check_OpenSpecy(mini_libs$raw)
#> [1] TRUE
check_OpenSpecy(mini_libs$derivative)
#> [1] TRUE
attr(mini_libs$derivative, "derivative_order")
#> [1] "1"
attr(mini_libs$nobaseline, "baseline")
#> [1] "nobaseline"
mini_libs$raw$metadata[
  , .(file_name, material, material_class, material_type, sn,
      assessment_flag, assessment_checks)
]
#>             file_name     material material_class material_type        sn
#>                <char>       <char>         <char>        <char>     <num>
#> 1:     raman_hdpe.csv         hdpe   polyethylene       plastic  5.542373
#> 2: ftir_ldpe_soil.asp ldpe in soil   polyethylene       plastic 43.439904
#> 3: raman_atacamit.spc    atacamite copper mineral       mineral  5.220850
#>    assessment_flag          assessment_checks
#>             <lgcl>                     <char>
#> 1:            TRUE co2_region; missing_values
#> 2:            TRUE             missing_values
#> 3:            TRUE  high_tail; missing_values
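
The assessment_checks column stores several check names in one semicolon-separated string. A short base R sketch, using the three values printed above, splits them so they can be counted or filtered; the variable names are illustrative.

```r
# Values copied from the assessment_checks column above.
checks <- c(
  "co2_region; missing_values",
  "missing_values",
  "high_tail; missing_values"
)

# Split each string into its individual check names.
per_spectrum <- strsplit(checks, ";\\s*")

# How often each check was raised across the library.
counts <- table(unlist(per_spectrum))
counts

# Spectra flagged for something other than missing values.
other_flag <- vapply(
  per_spectrum,
  function(x) any(x != "missing_values"),
  logical(1)
)
other_flag
```

Every spectrum carries missing_values here, which is consistent with the NA padding introduced when sources with different ranges are merged.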

Official Reference-Library Workflow

The version-controlled workflows/OpenSpecy_reference_library.R script is a straight-line composition of OpenSpecy functions. Canonically named class, library-type, material-hierarchy, and known-bad-ID tables live under workflows/data/. Raw-data corrections are completed externally before this workflow runs. The script is excluded from package builds but remains available in the GitHub repository so library releases can be reviewed and reproduced.