Overview

galaxias is an R package that helps users describe, bundle, and share biodiversity information using the ‘Darwin Core’ data standard. galaxias provides tools in R to build a Darwin Core Archive, a zip file containing standardised data and metadata accepted by global data infrastructures. The package mirrors functionality in devtools, usethis, and dplyr to manage data, files, and folders. galaxias was created by the Science & Decision Support Team at the Atlas of Living Australia (ALA).

The package is named for a genus of freshwater fish that is found only in the Southern Hemisphere, and predominantly in Australia and Aotearoa New Zealand. The logo shows a Spotted Galaxias (Galaxias truttaceus) drawn by Ian Brennan.

If you have any comments, questions, or suggestions, please contact us.

Installation

You can install the latest version from GitHub with:

install.packages("remotes")
remotes::install_github("atlasoflivingaustralia/galaxias")

Once galaxias is available on CRAN, you can install it with:

install.packages("galaxias")

To load the package, call:

library(galaxias)

Features

galaxias contains tools to:

  • Standardise tibbles containing biodiversity observations to match the Darwin Core Standard.
  • Convert metadata statements written in R Markdown or Quarto to EML files.
  • Store all your publication-ready files in a single directory, and zip that directory for publication.
  • Check files for consistency with the Darwin Core Standard, either locally or via API.

galaxias draws on functionality from two underlying packages that address different challenges of the data publication workflow: corella, which converts tibbles to use standard column names; and delma, which converts markdown files to EML format.

Usage

Here we have a small example dataset of species observations.

library(tibble)

df <- tibble(
  scientificName = c("Callocephalon fimbriatum", "Eolophus roseicapilla"),
  latitude = c(-35.310, -35.273), 
  longitude = c(149.125, 149.133),
  eventDate = lubridate::dmy(c("14-01-2023", "15-01-2023")),
  status = c("present", "present")
)

df
#> # A tibble: 2 × 5
#>   scientificName           latitude longitude eventDate  status 
#>   <chr>                       <dbl>     <dbl> <date>     <chr>  
#> 1 Callocephalon fimbriatum    -35.3      149. 2023-01-14 present
#> 2 Eolophus roseicapilla       -35.3      149. 2023-01-15 present

We can standardise these data to the Darwin Core Standard using the set_ family of functions.

df_dwc <- df |>
   set_occurrences(occurrenceID = random_id(),
                   basisOfRecord = "humanObservation",
                   occurrenceStatus = status) |>
   set_coordinates(decimalLatitude = latitude,
                   decimalLongitude = longitude)

df_dwc
#> # A tibble: 2 × 7
#>   scientificName          eventDate  basisOfRecord occurrenceID occurrenceStatus
#>   <chr>                   <date>     <chr>         <chr>        <chr>           
#> 1 Callocephalon fimbriat… 2023-01-14 humanObserva… 5cfe0c2a-45… present         
#> 2 Eolophus roseicapilla   2023-01-15 humanObserva… 5cfe0c34-45… present         
#> # ℹ 2 more variables: decimalLatitude <dbl>, decimalLongitude <dbl>
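As a rough illustration of what the renaming above achieves, the following base R sketch compares column names against a short list of Darwin Core terms. The `dwc_terms` vector is a hand-picked subset chosen for this example, and `non_dwc_columns()` is a hypothetical helper; neither is part of galaxias, and the package's own checks are likely more thorough.

```r
# Hand-picked subset of Darwin Core term names, for illustration only;
# this is not the full standard and not part of galaxias.
dwc_terms <- c(
  "scientificName", "eventDate", "basisOfRecord", "occurrenceID",
  "occurrenceStatus", "decimalLatitude", "decimalLongitude"
)

# Hypothetical helper: return the column names that are not in `terms`
non_dwc_columns <- function(column_names, terms = dwc_terms) {
  setdiff(column_names, terms)
}

# Column names of the example data before and after standardisation
before <- c("scientificName", "latitude", "longitude", "eventDate", "status")
after <- c("scientificName", "eventDate", "basisOfRecord", "occurrenceID",
           "occurrenceStatus", "decimalLatitude", "decimalLongitude")

non_dwc_columns(before)  # latitude, longitude and status still need mapping
non_dwc_columns(after)   # empty: every column matches a term in the subset
```

A check like this only compares names; it says nothing about whether the values in each column are valid for that term.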

We can then specify that we wish to use these standardised data in a Darwin Core Archive with use_data(). This saves df_dwc with a valid file name and extension, and in a standardised location (a new directory called /data-publish).

use_data(df_dwc)

Before publishing your data, it is also necessary to create a metadata statement that describes who owns the data, what the data shows, and what licence it is released under. galaxias enables you to write your metadata statement in R Markdown or Quarto format, and seamlessly convert it to EML for publication.

# 1. Create a boilerplate file
use_metadata_template("metadata.Rmd")

# 2. Edit in your preferred IDE

# 3. Load into /data-publish as an EML file
use_metadata("metadata.Rmd")

The final step in your data publication workflow is to zip your directory into a single file. This file is placed in your parent directory.

build_archive(file = "my_biodiversity_data.zip")
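Because the archive is an ordinary zip file, you can inspect it with base R before sharing it. The sketch below assumes you know where the archive was written; `list_archive_files()` is a hypothetical helper for this example, not a galaxias function, and the path shown is a placeholder to adjust.

```r
# Hypothetical helper: list the files inside a zip archive without extracting it.
# Returns an empty character vector if the archive does not exist.
list_archive_files <- function(path) {
  if (!file.exists(path)) {
    return(character(0))
  }
  # unzip(list = TRUE) returns a data frame with Name, Length and Date columns
  utils::unzip(path, list = TRUE)$Name
}

# Placeholder path: change this to the location of your own archive
list_archive_files("my_biodiversity_data.zip")
```

An empty result means no file was found at that path, which is a prompt to check where the archive was actually saved.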

You can share your data via any mechanism you wish, but galaxias provides the submit_archive() function to open a submission window for the Atlas of Living Australia.

Please see the Quick Start Guide for a more in-depth explanation of building Darwin Core Archives.

Citing galaxias

To generate a citation for the package version you are using, you can run:

citation(package = "galaxias")

The current recommended citation is:

Westgate MJ, Balasubramaniam S & Kellie D (2025) galaxias: Describe, Package, and Share Biodiversity Data. R Package version 0.1.0.

Contributors

Developers who have contributed to galaxias are as follows (in alphabetical order by surname):

Amanda Buyan (@acbuyan), Fonti Kar (@fontikar), Peggy Newman (@peggynewman) & Andrew Schwenke (@andrew-1234)