set_occurrences#

set_occurrences() is the function that checks the columns describing each occurrence record. It checks that your data contains the following Darwin Core terms:

  • basisOfRecord: how the occurrence was recorded (was it observed by a human? machine? is it part of a collection?)

  • occurrenceID or catalogNumber or recordNumber: a unique identifier for the record (only one of these is necessary)

  • occurrenceStatus (OPTIONAL): whether a species is present or absent. Not required for data submission.
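To make the "only one identifier is necessary" rule concrete, here is a minimal sketch in plain Python. The helper `missing_occurrence_terms` is hypothetical and is not part of galaxias; it only illustrates the rule described in the list above.

```python
# Hypothetical helper (not a galaxias function): report which of the
# occurrence terms listed above are not covered by a set of column names.
IDENTIFIER_TERMS = ("occurrenceID", "catalogNumber", "recordNumber")


def missing_occurrence_terms(columns):
    """Return the required occurrence terms that `columns` does not cover."""
    columns = set(columns)
    missing = []
    if "basisOfRecord" not in columns:
        missing.append("basisOfRecord")
    # Only one identifier column is needed, so report the group as a whole.
    if not any(term in columns for term in IDENTIFIER_TERMS):
        missing.append(" or ".join(IDENTIFIER_TERMS))
    # occurrenceStatus is optional, so it is never reported as missing.
    return missing


raw_columns = ["Species", "Latitude", "Longitude", "Collection_date"]
print(missing_occurrence_terms(raw_columns))
```

For the example dataset used on this page, both basisOfRecord and an identifier are reported as missing, which is what the sections below address.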

Specifying basisOfRecord value#

As mentioned above, basisOfRecord is a required field for an observation because it tells others how the record was made. For example, was it observed by a machine? By a human? Is it a specimen that’s part of a collection?

The value you provide depends on your answers to these questions. Luckily, Darwin Core has a predefined vocabulary for this field, and galaxias will show you that vocabulary with the following function:

>>> galaxias.basisOfRecord_values()
  basisOfRecord values
0     humanObservation
1   machineObservation
2       livingSpecimen
3    preservedSpecimen
4       fossilSpecimen
5     materialCitation

For this exercise, let’s assume a human observed these species, which corresponds to a value of HumanObservation. We can then pass HumanObservation as the basisOfRecord argument, and by default it will set that value of basisOfRecord for every row in the dataframe.

>>> my_archive.set_occurrences(
...     basisOfRecord='HumanObservation'
... )
>>> my_archive.occurrences.head()
                   Species  Latitude  Longitude Collection_date     basisOfRecord
0       Corymbia latifolia    -13.04     131.07       29/3/2022  HumanObservation
1     Eucalyptus tectifica    -13.04     131.07       13/9/2022  HumanObservation
2           Banksia aemula    -33.60     150.72       15/8/2022  HumanObservation
3  Eucalyptus sclerophylla    -33.60     150.72       16/6/2022  HumanObservation
4        Persoonia laurina    -33.60     150.72      19/10/2022  HumanObservation
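The idea of checking a value against the vocabulary and then applying it to every row can be sketched in plain Python. This is an illustration only, not how galaxias implements it: `with_basis_of_record` is a hypothetical helper, and the case-insensitive comparison is an assumption made because the vocabulary is printed above with a lowercase first letter while the example passes 'HumanObservation'.

```python
# Values as printed by basisOfRecord_values() above.
BASIS_OF_RECORD_VALUES = [
    "humanObservation", "machineObservation", "livingSpecimen",
    "preservedSpecimen", "fossilSpecimen", "materialCitation",
]


def with_basis_of_record(records, value):
    """Return copies of `records` with the same basisOfRecord on every row."""
    # Assumption of this sketch: compare without regard to case.
    allowed = {v.lower() for v in BASIS_OF_RECORD_VALUES}
    if value.lower() not in allowed:
        raise ValueError(f"{value!r} is not a recognised basisOfRecord value")
    # Build new dicts so the input records are left untouched.
    return [{**row, "basisOfRecord": value} for row in records]


records = [
    {"Species": "Corymbia latifolia", "Latitude": -13.04, "Longitude": 131.07},
    {"Species": "Banksia aemula", "Latitude": -33.60, "Longitude": 150.72},
]
labelled = with_basis_of_record(records, "HumanObservation")
print(labelled[0]["basisOfRecord"])
```

Rejecting unknown values early means a typo such as 'HumanObservaton' fails immediately rather than ending up in every row.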

How to generate occurrence IDs#

Note

  • If you have occurrence IDs already in your dataset, you can specify the name of the column that contains your IDs, and galaxias will rename that column to comply with the Darwin Core Vocabulary Standard.

  • catalogNumber and/or recordNumber is normally used for collections, so it is best to go with occurrenceID if you’re generating them using galaxias.

Every occurrence needs a unique identifier so that it can be identified later. If your occurrences don’t have an occurrenceID, catalogNumber or recordNumber, you can pass True to the occurrenceID argument. You then also need to specify whether you want a randomly generated UUID for each occurrence (random_id), composite IDs (composite_id) or sequential IDs (sequential_id). The example here uses random IDs; a separate vignette covers generating IDs in more detail.

>>> my_archive.set_occurrences(
...     basisOfRecord='HumanObservation',
...     occurrenceID=True,
...     random_id=True
... )
>>> my_archive.occurrences.head()
                           occurrenceID                  Species  Latitude  Longitude Collection_date     basisOfRecord
0  0b0ef912-52c6-481d-8a10-ddbd7dbf97b6       Corymbia latifolia    -13.04     131.07       29/3/2022  HumanObservation
1  c0f38a6b-9f7f-44f5-be10-302de750083c     Eucalyptus tectifica    -13.04     131.07       13/9/2022  HumanObservation
2  7b4dba12-dba3-4be7-b761-803c8c0a33b6           Banksia aemula    -33.60     150.72       15/8/2022  HumanObservation
3  5cd50060-7e66-4ddf-b585-e62479b33540  Eucalyptus sclerophylla    -33.60     150.72       16/6/2022  HumanObservation
4  0e00432a-6122-481f-a12d-d4b0a9c65b10        Persoonia laurina    -33.60     150.72      19/10/2022  HumanObservation
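To give a feel for how the three ID strategies differ, here is a minimal standard-library sketch. The three functions are hypothetical and only share their names loosely with the random_id, composite_id and sequential_id arguments; how galaxias builds composite or sequential IDs is not described on this page, so the behaviour below is an assumption.

```python
import uuid


def random_ids(n):
    """One random (version 4) UUID string per record."""
    return [str(uuid.uuid4()) for _ in range(n)]


def sequential_ids(n, start=1):
    """Consecutive integers rendered as strings."""
    return [str(i) for i in range(start, start + n)]


def composite_ids(records, fields, sep="-"):
    """Join the chosen field values of each record into a single ID."""
    return [sep.join(str(row[f]) for f in fields) for row in records]


records = [
    {"Species": "Corymbia latifolia", "Collection_date": "29/3/2022"},
    {"Species": "Eucalyptus tectifica", "Collection_date": "13/9/2022"},
]
ids = random_ids(len(records))
print(ids)
print(sequential_ids(len(records)))
print(composite_ids(records, ["Species", "Collection_date"]))
```

Random UUIDs need no knowledge of the data, which is why they suit a quick example. Composite IDs are easier to read, but they are only unique if the chosen combination of fields is unique for every record.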

Specifying occurrenceStatus column#

Note

This is an optional field, but we are including it here to show how this argument works and how it renames your column.

Sometimes you may want to include the occurrenceStatus field in your observations. This is especially useful if you expected to see a species in a particular area, or have seen it there in the past, but did not see it on that particular day: you can use this field to record that it was absent.

Since we have a column that denotes whether or not a species was present or absent, we can provide the name of that column, and galaxias will rename the column to conform with the Darwin Core standard.

>>> my_archive.set_occurrences(
...     basisOfRecord='HumanObservation',
...     occurrenceStatus='PRESENT'
... )
>>> my_archive.occurrences.head()
                           occurrenceID                  Species  Latitude  Longitude Collection_date     basisOfRecord occurrenceStatus
0  05a20950-691d-4c82-9e8c-e67c37c0683b       Corymbia latifolia    -13.04     131.07       29/3/2022  HumanObservation          PRESENT
1  956f448f-e583-47df-bb8f-55c5d8d351c7     Eucalyptus tectifica    -13.04     131.07       13/9/2022  HumanObservation          PRESENT
2  bd86192c-d530-4cd5-89f8-025127dc70bf           Banksia aemula    -33.60     150.72       15/8/2022  HumanObservation          PRESENT
3  26000299-5928-4419-b6c7-6d6803a7f442  Eucalyptus sclerophylla    -33.60     150.72       16/6/2022  HumanObservation          PRESENT
4  d49b3c32-6921-4f21-9f28-b919327ce90b        Persoonia laurina    -33.60     150.72      19/10/2022  HumanObservation          PRESENT
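The renaming step described above can be pictured with a small plain-Python sketch. `rename_status_column` is a hypothetical helper, not a galaxias function; it assumes the source column is called 'status' and that values are checked, ignoring case, against 'present' and 'absent', the values the Darwin Core standard recommends for occurrenceStatus.

```python
# Values recommended by Darwin Core for occurrenceStatus.
ALLOWED_STATUS = {"present", "absent"}


def rename_status_column(records, source_column):
    """Return copies of `records` with `source_column` renamed to occurrenceStatus."""
    renamed = []
    for row in records:
        status = row[source_column]
        # Assumption of this sketch: values are compared without regard to case.
        if str(status).lower() not in ALLOWED_STATUS:
            raise ValueError(f"unexpected occurrenceStatus value: {status!r}")
        # Copy every other field, then add the value under the Darwin Core name.
        new_row = {k: v for k, v in row.items() if k != source_column}
        new_row["occurrenceStatus"] = status
        renamed.append(new_row)
    return renamed


records = [
    {"Species": "Banksia aemula", "status": "PRESENT"},
    {"Species": "Persoonia laurina", "status": "ABSENT"},
]
print(rename_status_column(records, "status"))
```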

What do check_dataset and suggest_workflow say now?#

Note

Each of the set_* functions checks your data for compliance with the Darwin Core standard, but it’s always good to double-check your data.

Now that we’ve taken care of the pieces of information set_occurrences() is responsible for, we can assign the new dataframe to a variable:

>>> occ = my_archive.set_occurrences(
...     basisOfRecord='HumanObservation',
...     occurrenceStatus='status',
...     occurrenceID=True
... )

Now, we can check that this new dataframe complies with the Darwin Core standard for the basisOfRecord, occurrenceStatus, occurrenceID, catalogNumber and recordNumber columns.

>>> my_archive.check_dataset()
  Number of Errors  Pass/Fail    Column name
------------------  -----------  ----------------
                 0  ✓            occurrenceID
                 0  ✓            basisOfRecord
                 0  ✓            occurrenceStatus


══ Results ════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════


Errors: 0 | Passes: 3

✗ Data does not meet minimum Darwin core requirements
Use corella.suggest_workflow()

However, since we don’t have all of the required columns, we can run suggest_workflow() again to see what other functions we can use to check our data:

>>> my_archive.suggest_workflow()
── Darwin Core terms ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

── All DwC terms ──

Matched 3 of 7 column names to DwC terms:

✓ Matched: occurrenceID, basisOfRecord, occurrenceStatus
✗ Unmatched: Species, Latitude, Collection_date, Longitude

── Minimum required DwC terms occurrences ──

Type                       Matched term(s)    Missing term(s)
-------------------------  -----------------  ------------------------------------------------
Identifier (at least one)  occurrenceID       -
Record type                basisOfRecord      -
Scientific name            -                  scientificName
Location                   -                  decimalLatitude, decimalLongitude, geodeticDatum
Date/Time                  -                  eventDate

── Suggested workflow ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

── Occurrences ──

To make your occurrences Darwin Core compliant, use the following workflow:

corella.set_scientific_name()
corella.set_coordinates()
corella.set_datetime()

Additional functions: set_abundance(), set_collection(), set_individual_traits(), set_license(), set_locality(), set_taxonomy()
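The "Matched 3 of 7" line in the report above comes down to comparing column names against a list of Darwin Core terms. A minimal sketch of that comparison follows; `match_columns` is a hypothetical helper, and `DWC_TERMS` is only the small subset of terms that appear on this page, not the full standard.

```python
# Subset of Darwin Core terms mentioned on this page (not the full standard).
DWC_TERMS = {
    "occurrenceID", "basisOfRecord", "occurrenceStatus", "scientificName",
    "decimalLatitude", "decimalLongitude", "geodeticDatum", "eventDate",
}


def match_columns(columns):
    """Split column names into those that are DwC terms and those that are not."""
    matched = [c for c in columns if c in DWC_TERMS]
    unmatched = [c for c in columns if c not in DWC_TERMS]
    return matched, unmatched


columns = [
    "occurrenceID", "Species", "Latitude", "Longitude",
    "Collection_date", "basisOfRecord", "occurrenceStatus",
]
matched, unmatched = match_columns(columns)
print(f"Matched {len(matched)} of {len(columns)} column names to DwC terms")
print("Unmatched:", ", ".join(unmatched))
```

The unmatched names (Species, Latitude, Longitude, Collection_date) are the columns that the suggested set_scientific_name(), set_coordinates() and set_datetime() steps are meant to handle.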

Other functions#

To learn more about how to use other functions, go to

Optional functions:

Creating Unique IDs:

Passing Dataset: