summary_database() scans the specified directory and returns a data.table() containing summary information about all available CMIP6 files, matched against the output file index loaded using load_cmip6_index().

summary_database(
  dir,
  by = c("activity", "experiment", "variant", "frequency", "variable", "source",
    "resolution"),
  mult = c("skip", "latest"),
  append = FALSE,
  recursive = FALSE,
  update = FALSE,
  warning = TRUE
)

Arguments

dir

A single string indicating the directory where CMIP6 model output NetCDF files are stored.

by

The grouping columns used to summarize the database status. Should be a subset of:

  • "experiment": root experiment identifiers

  • "source": model identifiers

  • "variable": variable identifiers

  • "activity": activity identifiers

  • "frequency": sampling frequency

  • "variant": variant label

  • "resolution": approximate horizontal resolution

mult

Action to take when multiple files match the same case in the CMIP6 index. If "latest", the file with the latest modification time is used. If "skip", all matched files are skipped and the case is kept as unmatched. Default: "skip".
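
The difference between the two mult options can be sketched in base R. This is only an illustration of the rule described above, not the package's actual implementation: the files data frame, its column names, and the helper resolve_matches() are all hypothetical.

```r
# Hypothetical scan result: two files match case "A", one file matches case "B"
files <- data.frame(
  case  = c("A", "A", "B"),
  path  = c("a_v1.nc", "a_v2.nc", "b_v1.nc"),
  mtime = as.POSIXct(c("2020-01-01", "2021-06-01", "2020-03-01"), tz = "UTC"),
  stringsAsFactors = FALSE
)

# Hypothetical helper mimicking the documented behaviour of `mult`
resolve_matches <- function(files, mult = c("skip", "latest")) {
  mult <- match.arg(mult)

  # Cases that are matched by more than one file
  n_per_case <- table(files$case)
  dup_cases <- names(n_per_case)[n_per_case > 1]

  # Cases with exactly one matching file are always kept
  unique_rows <- files[!files$case %in% dup_cases, ]

  # "skip": drop every file of an ambiguous case
  if (mult == "skip") return(unique_rows)

  # "latest": keep only the most recently modified file of each ambiguous case
  dups <- files[files$case %in% dup_cases, ]
  dups <- dups[order(dups$case, -as.numeric(dups$mtime)), ]
  latest <- dups[!duplicated(dups$case), ]

  rbind(unique_rows, latest)
}

skipped <- resolve_matches(files, "skip")
newest  <- resolve_matches(files, "latest")
```

With "skip", case "A" stays unmatched and only the file for case "B" remains; with "latest", the newer of the two "A" files is kept as well.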

append

If TRUE, the status of CMIP6 files is updated only if they were not found in the previous summary. This is useful when CMIP6 files are stored in different directories. Default: FALSE.

recursive

If TRUE, scan recursively into directories. Default: FALSE.

update

If TRUE, the output file index is updated based on the matched NetCDF files in the specified directory. If FALSE, only the currently loaded index is updated, and the actual index database file saved in get_data_dir() remains unchanged. Default: FALSE.

warning

If TRUE, a warning is shown when multiple files match the same case. Default: TRUE.

Value

A data.table::data.table() containing corresponding grouping columns plus:

Column          Type            Description
datetime_start  POSIXct         Start date and time of simulation
datetime_end    POSIXct         End date and time of simulation
file_num        Integer         Total number of files per group
file_size       Units (Mbytes)  Approximate total size of files
dl_num          Integer         Total number of files downloaded
dl_percent      Units (%)       Total percentage of files downloaded
dl_size         Units (Mbytes)  Total size of files downloaded

In addition, an attribute not_matched is added to the returned data.table::data.table(). It contains metadata for those CMIP6 output files that are not covered by the current CMIP6 output file index.

For the meaning of grouping columns, see init_cmip6_index().
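
Because not_matched is stored as an attribute rather than a column, it is read with attr(). The sketch below uses a mock data frame as a stand-in for the real return value, so the values and the file_path column are assumptions made purely for illustration.

```r
# Mock stand-in for the object returned by summary_database();
# the real result is a data.table with the columns listed above
res <- data.frame(
  experiment = "ssp585",
  file_num   = 10L,
  dl_num     = 4L,
  stringsAsFactors = FALSE
)

# Hypothetical metadata for a file that is not covered by the index
attr(res, "not_matched") <- data.frame(
  file_path = "unknown_file.nc",
  stringsAsFactors = FALSE
)

# Retrieve the unmatched-file metadata from the summary
unmatched <- attr(res, "not_matched")
n_unmatched <- nrow(unmatched)
```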

Details

summary_database() uses future.apply underneath. You can use your preferred future backend to speed up data extraction in parallel. By default, summary_database() uses the future::sequential backend, which runs everything sequentially.
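
As a minimal sketch of choosing a backend, the snippet below switches to future's multisession backend before doing parallel work and restores the previous plan afterwards. It assumes the future and future.apply packages are installed. The worker count of 2 and the toy future_sapply() call are illustrative only; in practice the call to summary_database() would go where the toy computation is.

```r
library(future)
library(future.apply)

# Switch to background R sessions; plan() returns the previous plan,
# which is kept so that it can be restored later
old_plan <- plan(multisession, workers = 2)

# Toy stand-in for per-file work; summary_database() is not called here
squares <- future_sapply(1:4, function(i) i^2)

# Restore whatever backend was active before
plan(old_plan)
```

Restoring the previous plan keeps the backend change from leaking into the rest of the session.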

Examples

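if (FALSE) {
  # `dir` has no default; "path/to/cmip6" is a placeholder for your own directory
  summary_database("path/to/cmip6")

  summary_database("path/to/cmip6", by = "experiment")
}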