set_occurrences#

One of the functions you can use to check certain columns of your data is set_occurrences(). This function aims to check that you have the following Darwin Core Vocabulary Terms:

basisOfRecord: how the occurrence was recorded (was it observed by a human? machine? is it part of a collection?)
occurrenceID or catalogNumber or recordNumber: a unique identifier for the record (only one of these is necessary)
occurrenceStatus (OPTIONAL): whether a species is present or absent. Not required for data submission.

Specifying `basisOfRecord` value#

As mentioned above, the basisOfRecord value is a required and important field for an observation, as it lets others know how the record was recorded. For example, was it a machine that observed it? A human? Is this a specimen that’s part of a collection?

Depending on your answer to these questions, the information you provide will differ. Luckily, Darwin Core has a predefined vocabulary to help you with this, and galaxias will tell you what this vocab is with the following function:

>>> galaxias.basisOfRecord_values()

  basisOfRecord values
   humanObservation
 machineObservation
     livingSpecimen
  preservedSpecimen
     fossilSpecimen
   materialCitation

For this exercise, let’s assume a human has seen these, which equates to a value of HumanObservation. We can then set the basisOfRecord argument as HumanObservation, and it will, by default, set the value of basisOfRecord for the whole dataframe.

>>> my_archive.set_occurrences(
...     basisOfRecord='HumanObservation'
... )
>>> my_archive.occurrences.head()

                   Species  Latitude  Longitude Collection_date     basisOfRecord
     Corymbia latifolia    -13.04     131.07       29/3/2022  HumanObservation
   Eucalyptus tectifica    -13.04     131.07       13/9/2022  HumanObservation
         Banksia aemula    -33.60     150.72       15/8/2022  HumanObservation
Eucalyptus sclerophylla    -33.60     150.72       16/6/2022  HumanObservation
      Persoonia laurina    -33.60     150.72      19/10/2022  HumanObservation
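To make the idea of a controlled vocabulary concrete, here is a minimal sketch of this kind of check in plain Python. It is not galaxias code: `BASIS_OF_RECORD_VALUES` and `validate_basis_of_record` are hypothetical names, the list simply reuses the six values printed above (capitalised as in the example), and the case-insensitive matching is our own assumption.

```python
# Hypothetical helper, not part of galaxias: checks a basisOfRecord value
# against the six values listed above (capitalised as in the example).
BASIS_OF_RECORD_VALUES = (
    "HumanObservation",
    "MachineObservation",
    "LivingSpecimen",
    "PreservedSpecimen",
    "FossilSpecimen",
    "MaterialCitation",
)


def validate_basis_of_record(value):
    """Return the canonical spelling of `value`, or raise ValueError."""
    # Assumption: matching ignores case and surrounding whitespace.
    lookup = {v.lower(): v for v in BASIS_OF_RECORD_VALUES}
    try:
        return lookup[value.strip().lower()]
    except KeyError:
        allowed = ", ".join(BASIS_OF_RECORD_VALUES)
        raise ValueError(
            f"{value!r} is not a valid basisOfRecord; expected one of: {allowed}"
        ) from None


print(validate_basis_of_record("humanobservation"))  # HumanObservation
```

A value outside the list, such as `"seen"`, raises a `ValueError` that names the accepted values.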

How to generate occurrence IDs#

Note

If you have occurrence IDs already in your dataset, you can specify the name of the column that contains your IDs, and galaxias will rename that column to comply with the Darwin Core standard.

catalogNumber and/or recordNumber are normally used for collections, so it is best to go with occurrenceID if you’re generating IDs using galaxias.

Every occurrence needs a unique identifier so it can be found again later. If your occurrences don’t have any of occurrenceID, catalogNumber or recordNumber, you can set the occurrenceID argument to True. You will then have to specify whether you want a randomly generated UUID for each occurrence (random_id), composite IDs (composite_id) or sequential IDs (sequential_id). The example used here is random; a separate vignette covers generating IDs in more detail.

>>> my_archive.set_occurrences(
...     basisOfRecord='HumanObservation',
...     occurrenceID=True,
...     random_id=True
... )
>>> my_archive.occurrences.head()

                           occurrenceID                  Species  Latitude  Longitude Collection_date     basisOfRecord
0b0ef912-52c6-481d-8a10-ddbd7dbf97b6       Corymbia latifolia    -13.04     131.07       29/3/2022  HumanObservation
c0f38a6b-9f7f-44f5-be10-302de750083c     Eucalyptus tectifica    -13.04     131.07       13/9/2022  HumanObservation
7b4dba12-dba3-4be7-b761-803c8c0a33b6           Banksia aemula    -33.60     150.72       15/8/2022  HumanObservation
5cd50060-7e66-4ddf-b585-e62479b33540  Eucalyptus sclerophylla    -33.60     150.72       16/6/2022  HumanObservation
0e00432a-6122-481f-a12d-d4b0a9c65b10        Persoonia laurina    -33.60     150.72      19/10/2022  HumanObservation
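The three ID styles mentioned above can be sketched with the standard library alone. This is an illustration rather than how galaxias builds its IDs: the function names are hypothetical, and the sequential and composite formats (zero-padded numbers, and field values joined with a running number) are assumptions of ours.

```python
import uuid


def random_ids(n):
    """One random (version 4) UUID string per record."""
    return [str(uuid.uuid4()) for _ in range(n)]


def sequential_ids(n, start=1, width=4):
    """Zero-padded running numbers, e.g. 0001, 0002, ..."""
    return [f"{i:0{width}d}" for i in range(start, start + n)]


def composite_ids(records, fields, sep="-"):
    """Join the chosen field values of each record with a running number,
    so records that share the same field values still get distinct IDs."""
    return [
        sep.join([str(rec[f]) for f in fields] + [str(i)])
        for i, rec in enumerate(records, start=1)
    ]


records = [
    {"Species": "Corymbia latifolia", "Collection_date": "29/3/2022"},
    {"Species": "Eucalyptus tectifica", "Collection_date": "13/9/2022"},
]

print(sequential_ids(len(records)))                 # ['0001', '0002']
print(composite_ids(records, ["Collection_date"]))  # ['29/3/2022-1', '13/9/2022-2']

ids = random_ids(len(records))
assert len(set(ids)) == len(ids)  # every record has its own identifier
```

Random UUIDs need no knowledge of the data, which is why they are a convenient default; sequential and composite IDs are easier to read but only stay unique if the scheme behind them does.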

Specifying the `occurrenceStatus` column#

Note

This is an optional field, but we are including it here to show how this argument works and how it will rename your column.

Sometimes, you may want to include the occurrenceStatus field in your observations. This is especially useful if you were expecting to see a species in a particular area, or have seen it there in the past, but did not see it on that particular day: you can include this field to say it was absent.

Since we have a column that denotes whether or not a species was present or absent, we can provide the name of that column, and galaxias will rename the column to conform with the Darwin Core standard.

>>> my_archive.set_occurrences(
...     basisOfRecord='HumanObservation',
...     occurrenceStatus='PRESENT'
... )
>>> my_archive.occurrences.head()

                           occurrenceID                  Species  Latitude  Longitude Collection_date     basisOfRecord occurrenceStatus
05a20950-691d-4c82-9e8c-e67c37c0683b       Corymbia latifolia    -13.04     131.07       29/3/2022  HumanObservation          PRESENT
956f448f-e583-47df-bb8f-55c5d8d351c7     Eucalyptus tectifica    -13.04     131.07       13/9/2022  HumanObservation          PRESENT
bd86192c-d530-4cd5-89f8-025127dc70bf           Banksia aemula    -33.60     150.72       15/8/2022  HumanObservation          PRESENT
26000299-5928-4419-b6c7-6d6803a7f442  Eucalyptus sclerophylla    -33.60     150.72       16/6/2022  HumanObservation          PRESENT
d49b3c32-6921-4f21-9f28-b919327ce90b        Persoonia laurina    -33.60     150.72      19/10/2022  HumanObservation          PRESENT
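If your status values come from a column you recorded yourself, it can help to look for unexpected entries before handing the column over. The sketch below is a plain-Python illustration, not galaxias code: `invalid_statuses` is a hypothetical helper, and treating values case-insensitively is an assumption on our part. It relies only on the fact that Darwin Core describes occurrenceStatus in terms of presence or absence.

```python
# Hypothetical helper, not part of galaxias.
ALLOWED_STATUSES = {"present", "absent"}


def invalid_statuses(values):
    """Return (position, value) pairs that are neither present nor absent,
    ignoring case and surrounding whitespace."""
    return [
        (i, v)
        for i, v in enumerate(values)
        if str(v).strip().lower() not in ALLOWED_STATUSES
    ]


statuses = ["PRESENT", "absent", " Present ", "unknown"]
print(invalid_statuses(statuses))  # [(3, 'unknown')]
```

An empty list means every entry is one of the two accepted values.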

What do `check_dataset` and `suggest_workflow` say now?#

Note

Each of the set_* functions checks your data for compliance with the Darwin Core standard, but it’s always good to double-check your data.

Now that we’ve taken care of the pieces of information set_occurrences() is responsible for, we can assign the new dataframe to a variable:

>>> occ = my_archive.set_occurrences(
...     basisOfRecord='HumanObservation',
...     occurrenceStatus='status',
...     occurrenceID=True
... )

Now, we can check that this new dataframe complies with the Darwin Core standard for the basisOfRecord, occurrenceStatus, occurrenceID, catalogNumber and recordNumber columns.

>>> my_archive.check_dataset()

  Number of Errors  Pass/Fail    Column name
------------------  -----------  ----------------
                 0  ✓            occurrenceID
                 0  ✓            basisOfRecord
                 0  ✓            occurrenceStatus


══ Results ════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════


Errors: 0 | Passes: 3

✗ Data does not meet minimum Darwin core requirements
Use corella.suggest_workflow()

However, since we don’t have all of the required columns, we can run suggest_workflow() again to see what other functions we can use to check our data:

>>> my_archive.suggest_workflow()

── Darwin Core terms ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

── All DwC terms ──

Matched 3 of 7 column names to DwC terms:

✓ Matched: occurrenceID, basisOfRecord, occurrenceStatus
✗ Unmatched: Species, Latitude, Collection_date, Longitude

── Minimum required DwC terms occurrences ──

Type                       Matched term(s)    Missing term(s)
-------------------------  -----------------  ------------------------------------------------
Identifier (at least one)  occurrenceID       -
Record type                basisOfRecord      -
Scientific name            -                  scientificName
Location                   -                  decimalLatitude, decimalLongitude, geodeticDatum
Date/Time                  -                  eventDate

── Suggested workflow ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

── Occurrences ──

To make your occurrences Darwin Core compliant, use the following workflow:

corella.set_scientific_name()
corella.set_coordinates()
corella.set_datetime()

Additional functions: set_abundance(), set_collection(), set_individual_traits(), set_license(), set_locality(), set_taxonomy()
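The "Minimum required DwC terms" table above boils down to comparing your column names with a few groups of required terms. Here is a minimal sketch of that comparison in plain Python. It is not the galaxias or corella implementation: `REQUIRED` and `summarise` are hypothetical names, and the groups are copied from the table and from the list of identifier terms at the top of this page.

```python
# Requirement groups copied from the table above. "any" groups need at least
# one of their terms; "all" groups need every term. Hypothetical structure.
REQUIRED = [
    ("Identifier (at least one)", ["occurrenceID", "catalogNumber", "recordNumber"], "any"),
    ("Record type", ["basisOfRecord"], "all"),
    ("Scientific name", ["scientificName"], "all"),
    ("Location", ["decimalLatitude", "decimalLongitude", "geodeticDatum"], "all"),
    ("Date/Time", ["eventDate"], "all"),
]


def summarise(columns):
    """Return (type, matched terms, missing terms) for each requirement group."""
    present = set(columns)
    rows = []
    for label, terms, mode in REQUIRED:
        matched = [t for t in terms if t in present]
        if mode == "any":
            # One matching identifier is enough, so nothing is missing then.
            missing = [] if matched else list(terms)
        else:
            missing = [t for t in terms if t not in present]
        rows.append((label, matched, missing))
    return rows


columns = [
    "occurrenceID", "Species", "Latitude", "Longitude",
    "Collection_date", "basisOfRecord", "occurrenceStatus",
]

for label, matched, missing in summarise(columns):
    matched_text = ", ".join(matched) or "-"
    missing_text = ", ".join(missing) or "-"
    print(f"{label:<27}{matched_text:<19}{missing_text}")
```

With the column names from this example, the identifier and record type groups have nothing missing, while the scientific name, location and date/time groups report the same missing terms as the table.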

Other functions#

To learn more about how to use other functions, see the following pages:

Optional functions
Creating Unique IDs: Creating Unique IDs for your Occurrences
Passing Dataset
