Overview
galaxias
is an R package that helps users describe, bundle, and share biodiversity information using the ‘Darwin Core’ data standard. galaxias
provides tools in R to build a Darwin Core Archive, a zip file containing standardised data and metadata accepted by global data infrastructures. The package mirrors functionality in devtools, usethis, and dplyr to manage data, files, and folders. galaxias
was created by the Science & Decision Support Team at the Atlas of Living Australia (ALA).
The package is named for a genus of freshwater fish that is found only in the Southern Hemisphere, and predominantly in Australia and Aotearoa New Zealand. The logo shows a Spotted Galaxias (Galaxias truttaceus) drawn by Ian Brennan.
If you have any comments, questions, or suggestions, please contact us.
Installation
You can install the latest version from GitHub with:
install.packages("remotes")
remotes::install_github("atlasoflivingaustralia/galaxias")
Once on CRAN, you can use:
install.packages("galaxias")
To load the package, call:
Features
galaxias
contains tools to:
- Standardise
tibbles
containing biodiversity observations to match the Darwin Core Standard. - Convert metadata statements written in R Markdown or Quarto to EML files.
- Store all your publication-ready files in a single directory, and zip that directory for publication.
- Check files for consistency with the Darwin Core Standard, either locally using or via API.
galaxias
draws on functionality from two underlying packages that address different challenges of the data publication workflow: corella
, which converts tibbles to use standard column names; and delma
which converts markdown files to EML
format.
Usage
Here we have a small example dataset of species observations.
library(tibble)
df <- tibble(
scientificName = c("Callocephalon fimbriatum", "Eolophus roseicapilla"),
latitude = c(-35.310, -35.273),
longitude = c(149.125, 149.133),
eventDate = lubridate::dmy(c("14-01-2023", "15-01-2023")),
status = c("present", "present")
)
df
#> # A tibble: 2 × 5
#> scientificName latitude longitude eventDate status
#> <chr> <dbl> <dbl> <date> <chr>
#> 1 Callocephalon fimbriatum -35.3 149. 2023-01-14 present
#> 2 Eolophus roseicapilla -35.3 149. 2023-01-15 present
We can standardise data according to Darwin Core Standard using set_
functions.
df_dwc <- df |>
set_occurrences(occurrenceID = random_id(),
basisOfRecord = "humanObservation",
occurrenceStatus = status) |>
set_coordinates(decimalLatitude = latitude,
decimalLongitude = longitude)
df_dwc
#> # A tibble: 2 × 7
#> scientificName eventDate basisOfRecord occurrenceID occurrenceStatus
#> <chr> <date> <chr> <chr> <chr>
#> 1 Callocephalon fimbriat… 2023-01-14 humanObserva… 5cfe0c2a-45… present
#> 2 Eolophus roseicapilla 2023-01-15 humanObserva… 5cfe0c34-45… present
#> # ℹ 2 more variables: decimalLatitude <dbl>, decimalLongitude <dbl>
We can then specify that we wish to use these standardised data in a Darwin Core Archive with use_data()
. This saves df_dwc
with a valid file name and extension, and in a standardised location (a new directory called /data-publish
).
use_data(df_dwc)
Before publishing your data, it is also necessary to create a metadata statement that describes who owns the data, what the data shows, and what licence it is released under. galaxias
enables you to write your metadata statement in R Markdown or Quarto format, and seamlessly convert it to EML for publication.
# 1. Create a boilerplate file
use_metadata_template("metadata.Rmd")
# 2. Edit in your preferred IDE
# 3. Load into /data-publish as an EML file
use_metadata("metadata.Rmd")
The final step in your data publication workflow is to zip your directory into a single file. This file is placed in your parent directory.
build_archive(file = "my_biodiversity_data.zip")
You can share your data via any mechanism you wish, but galaxias
provides the submit_archive()
function to open a submission window for the Atlas of Living Australia.
Please see the Quick Start Guide for a more in-depth explanation of building Darwin Core Archives.
Citing galaxias
To generate a citation for the package version you are using, you can run:
citation(package = "galaxias")
The current recommended citation is:
Westgate MJ, Balasubramaniam S & Kellie D (2025) galaxias: Describe, Package, and Share Biodiversity Data. R Package version 0.1.0.
Contributors
Developers who have contributed to galaxias
are as follows (in alphabetical order by surname):
Amanda Buyan (@acbuyan), Fonti Kar (@fontikar), Peggy Newman (@peggynewman) & Andrew Schwenke (@andrew-1234)