Calculates zonal statistics for one or more environmental variable raster layers (.tif). The function aggregates each variable to 12 summary statistics (mean, min, max, range, ...) for selected or all sub-catchments of the input file. The sub-catchment raster (.tif) input file is read directly from disk, and the output is a data.table that is loaded into R. The function can also be used for any other zonal statistics calculation: pass the raster layer that defines the zones to the subc_layer parameter and, optionally, the target zone IDs to the subc_id parameter.

Usage

extract_zonal_stat(
  data_dir,
  subc_id,
  subc_layer,
  var_layer,
  out_dir = NULL,
  file_name = NULL,
  n_cores = NULL,
  quiet = TRUE
)

Arguments

data_dir

character. Path to the directory containing all input data.

subc_id

Vector of sub-catchment IDs or "all". If "all", zonal statistics are calculated for all sub-catchments of the given sub-catchment raster layer. A vector of sub-catchment IDs can be obtained by running the extract_ids() function and subsetting the resulting data.frame.

subc_layer

character. Full path to the sub-catchment ID .tif layer.

var_layer

character vector of variable raster layers on disk, e.g. "slope_grad_dw_cel_h00v00.tif". Note that the variable name appears in the output table columns (e.g. slope_grad_dw_cel_mean). To speed up the processing, the selected variable raster layers can be cropped to the extent of the sub-catchment layer, e.g. with crop_to_extent().

out_dir

character. The directory where the output will be stored. If both out_dir and file_name are specified, the output table is stored as a .csv file in this location. If they are NULL, the output is only loaded into R and not stored on disk.

file_name

character. Name of the .csv file where the output table will be stored. out_dir must also be specified for the file to be written.

n_cores

numeric. Number of cores used for parallelization, in case multiple .tif files are provided to var_layer. Default is 1.

quiet

logical. If FALSE, the standard output will be printed. Default is TRUE.
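Since n_cores only matters when several .tif files are passed to var_layer, one way to pick a value is sketched below. This is a minimal sketch, not part of the package: it assumes you want to leave one physical core free and never request more cores than there are variable layers. It relies only on detectCores() from the base parallel package; the file names are the ones used in the Examples section.

```r
library(parallel)

# Variable layers that would be passed to var_layer
var_layer <- c("spi_1264942.tif", "sti_1264942.tif")

# Number of physical cores; detectCores() can return NA on some systems
available <- detectCores(logical = FALSE)
if (is.na(available)) available <- 1

# Assumption: keep one core free, use at least one core,
# and never request more cores than there are layers
n_cores <- max(1, min(length(var_layer), available - 1))
n_cores
```

The resulting value can then be passed as the n_cores argument of extract_zonal_stat().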

Value

Returns a data.table with the following columns:

  • sub-catchment ID (subc_id)

  • number of cells with a value (data_cells)

  • number of cells with a NoData value (nodata_cells)

  • minimum value (min)

  • maximum value (max)

  • value range (range)

  • arithmetic mean (mean)

  • arithmetic mean of the absolute values (mean_abs)

  • standard deviation (sd)

  • variance (var)

  • coefficient of variation (cv)

  • sum (sum)

  • sum of the absolute values (sum_abs).
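To make the meaning of these columns concrete, the following base R sketch computes the same kinds of statistics per zone for a small toy table of raster cells. It is an illustration under stated assumptions, not the package's implementation: the helper zonal_summary() and the toy data are hypothetical, sd and var use R's sample formulas (n - 1 denominator), and cv is taken as the plain ratio sd / mean. The function itself may follow different conventions.

```r
# Toy data: zone ID and variable value for each raster cell (NA = NoData)
cells <- data.frame(
  zone  = c(1, 1, 1, 1, 2, 2, 2),
  value = c(2, -4, 6, NA, 1, 3, NA)
)

# Hypothetical helper: summary statistics for the cells of one zone
zonal_summary <- function(x) {
  v <- x[!is.na(x)]                    # cells that hold a value
  c(
    data_cells   = length(v),
    nodata_cells = sum(is.na(x)),
    min          = min(v),
    max          = max(v),
    range        = max(v) - min(v),
    mean         = mean(v),
    mean_abs     = mean(abs(v)),
    sd           = sd(v),              # assumption: sample standard deviation
    var          = var(v),             # assumption: sample variance
    cv           = sd(v) / mean(v),    # assumption: plain ratio, not percent
    sum          = sum(v),
    sum_abs      = sum(abs(v))
  )
}

# Apply the helper to each zone and bind the results into one table
stats <- do.call(rbind, lapply(split(cells$value, cells$zone), zonal_summary))
stats <- data.frame(subc_id = as.numeric(rownames(stats)), stats,
                    row.names = NULL)
stats
```

Each row of the result corresponds to one zone, with one column per statistic listed above.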

Author

Afroditi Grigoropoulou, Jaime Garcia Marquez, Marlene Schürz

Examples

# Download test data into the temporary R folder
# or define a different directory
my_directory <- tempdir()
download_test_data(my_directory)

# Define full path to the sub-catchment ID .tif layer
subc_raster <- paste0(my_directory, "/hydrography90m_test_data",
                      "/subcatchment_1264942.tif")

# Define the directory where the output will be stored
output_folder <- paste0(my_directory, "/hydrography90m_test_data/output")
# Create output folder if it doesn't exist
if (!dir.exists(output_folder)) dir.create(output_folder)

# Calculate the zonal statistics for five selected sub-catchments for two variables
stat <- extract_zonal_stat(data_dir = paste0(my_directory,
                                             "/hydrography90m_test_data"),
                           subc_id = c(513837216, 513841103,
                                       513850467, 513868394,
                                       513870312),
                           subc_layer = subc_raster,
                           var_layer = c("spi_1264942.tif",
                                         "sti_1264942.tif"),
                           out_dir = output_folder,
                           file_name = "zonal_statistics.csv",
                           n_cores = 2)
# Show output table
stat