set_events#

This function checks only the events dataframe. Specifically, it checks for:

  • eventID: a unique ID for each event. You can construct these yourself, or have corella generate them using uuid.

  • parentEventID: the ID of the event that each event belongs to. Events form a hierarchy, which we discuss below.

  • eventType: what type of event it is (e.g. Survey, BioBlitz, Site Visit)

  • eventDate: date of the event

  • Event: name of the event

  • samplingProtocol: how you recorded your data (e.g. Observation)
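
As a rough illustration of what a uuid-based eventID looks like, the sketch below attaches a random UUID to each event record using only the Python standard library. It is not how corella generates IDs internally; the `add_event_ids` helper and the sample records are hypothetical, and plain dictionaries stand in for the events dataframe.

```python
import uuid

# Hypothetical event records; the keys mirror the column headings of the events file.
events = [
    {"type": "Site Visit", "name": "bird survey", "date": "3/1/2023"},
    {"type": "Site Visit", "name": "bird survey", "date": "17/1/2023"},
]


def add_event_ids(records):
    """Return copies of the records with a random UUID4 string stored under 'eventID'."""
    return [{"eventID": str(uuid.uuid4()), **record} for record in records]


events_with_ids = add_event_ids(events)
```

Because each ID is random, two runs give different values; what matters is that every event ends up with its own unique identifier.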

Adding event-specific information#

For events, we need to specify all of the above information. First, we specify the event type, which in our events file is stored in the column named type. Next, we specify how the data was collected; here, rather than giving a column name, we pass the value Observation, which applies to every event. Finally, we specify the name of the event, which is stored in the column named name.

>>> my_dwca.set_events(eventType='type',
...                    samplingProtocol='Observation',
...                    Event='name')
>>> my_dwca.events.head()
Traceback (most recent call last):
  File "/Users/buy003/Documents/GitHub/galaxias-python/docs/source/galaxias_user_guide/longitudinal_studies/events_workflow.py", line 41, in <module>
    my_dwca.set_events(eventType='type',samplingProtocol='Observation',Event='name')
  File "/Users/buy003/anaconda3/envs/galaxias-dev/lib/python3.11/site-packages/galaxias/dwca_build.py", line 633, in set_events
    self.events = corella.set_events(dataframe=self.events,eventID=eventID,parentEventID=parentEventID,
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/buy003/anaconda3/envs/galaxias-dev/lib/python3.11/site-packages/corella/set_events.py", line 115, in set_events
    raise ValueError("Please provide column names for eventID and parentEventID.  Or, provide an event_hierarchy dictionary for automatic ID generation.")
ValueError: Please provide column names for eventID and parentEventID.  Or, provide an event_hierarchy dictionary for automatic ID generation.

The call above fails because we did not supply an eventID or a parentEventID. The events file needs these IDs because each event may contain several nested activities. For example, when you visit a site (the highest-order event), you usually take what is termed a ‘sample’ (the second-highest-order event). Many activities can then make up a ‘sample’:

  • take measurements of environment (ambient temperature, for example)

  • quantify species traits

  • observe what species are at the site

The way to express this structure in your events file is via an “Event Hierarchy”.

Event hierarchy example#

For our example, we are only concerned with observations, or what species were observed at the site. galaxias takes a dictionary with each event given a ranking: 1 for the highest ‘order’ event, 2 for the second highest event, and so on.

>>> my_dwca.set_events(eventType='type',
...                    samplingProtocol='Observation',
...                    Event='name',
...                    event_hierarchy={1: "Site Visit", 2: "Sample", 3: "Observation"},
...                    random_id=True)
>>> my_dwca.events.head()
                                eventID                         parentEventID    eventType    location       date                                            Event samplingProtocol
0  a63dcdc1-1ba9-47f7-a28a-e5fb41b0197b                                         Site Visit  Cannonvale   3/1/2023  bird survey local park honeyeater lookout point      Observation
1  6caddd7c-0fd3-4175-9bc2-b2ad42cf4b81  a63dcdc1-1ba9-47f7-a28a-e5fb41b0197b       Sample  Cannonvale   3/1/2023  bird survey local park honeyeater lookout point      Observation
2  23d52a6f-bfe6-4220-b507-3d69bbf3c23f  6caddd7c-0fd3-4175-9bc2-b2ad42cf4b81  Observation  Cannonvale   3/1/2023  bird survey local park honeyeater lookout point      Observation
3  9dc97ff5-96bb-448a-ab38-87d15621fc0b                                         Site Visit  Cannonvale  17/1/2023  bird survey local park honeyeater lookout point      Observation
4  4b744796-adf6-4d1c-9134-ed6d0482573b  9dc97ff5-96bb-448a-ab38-87d15621fc0b       Sample  Cannonvale  17/1/2023  bird survey local park honeyeater lookout point      Observation
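
To make the linking pattern in this output concrete, here is a minimal sketch of how a ranked hierarchy can be expanded into rows whose parentEventID points at the row one rank above. It is an assumption-laden illustration, not the corella or galaxias implementation: `build_hierarchy_rows` is a hypothetical helper, and plain dictionaries stand in for dataframe rows.

```python
import uuid


def build_hierarchy_rows(visits, event_hierarchy):
    """Expand each visit into one row per hierarchy rank, linking each row to its parent."""
    rows = []
    for visit in visits:
        parent_id = ""  # the highest-order event has no parent
        for rank in sorted(event_hierarchy):
            event_id = str(uuid.uuid4())
            rows.append({
                "eventID": event_id,
                "parentEventID": parent_id,
                "eventType": event_hierarchy[rank],
                **visit,
            })
            parent_id = event_id  # the next rank down points back to this event
    return rows


# Two visits taken from the example output above.
visits = [
    {"location": "Cannonvale", "date": "3/1/2023"},
    {"location": "Cannonvale", "date": "17/1/2023"},
]
hierarchy = {1: "Site Visit", 2: "Sample", 3: "Observation"}
rows = build_hierarchy_rows(visits, hierarchy)
```

Each visit yields three rows: a Site Visit with an empty parentEventID, a Sample whose parent is the Site Visit, and an Observation whose parent is the Sample.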

What do check_dataset and suggest_workflow say now?#

Note: each of the set_* functions checks your data for compliance with the Darwin Core standard, but it’s always good to double-check your data.

Now, we can check that our data columns comply with the Darwin Core standard.

>>> my_dwca.check_dataset()
  Number of Errors  Pass/Fail    Column name
------------------  -----------  ----------------
                 0  ✓            eventID
                 0  ✓            parentEventID
                 0  ✓            eventType
                 0  ✓            Event
                 0  ✓            samplingProtocol


══ Results ════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════


Errors: 0 | Passes: 5

✗ Data does not meet minimum Darwin core requirements
Use corella.suggest_workflow()
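
A dataset can show zero errors and still fail, because passing columns are not the same as having every required column. The sketch below illustrates that idea with a simple membership check; it is not how check_dataset works internally. The required terms are the six listed at the start of this section, and `missing_terms` is a hypothetical helper.

```python
# Required event terms, as listed at the start of this section.
REQUIRED_EVENT_TERMS = [
    "eventID",
    "parentEventID",
    "eventType",
    "Event",
    "samplingProtocol",
    "eventDate",
]


def missing_terms(columns, required=REQUIRED_EVENT_TERMS):
    """Return the required terms that do not appear among the given column names."""
    present = set(columns)
    return [term for term in required if term not in present]


# Column names of the events dataframe after set_events(), as shown in the output above.
event_columns = [
    "eventID", "parentEventID", "eventType",
    "location", "date", "Event", "samplingProtocol",
]
missing = missing_terms(event_columns)
```

Here the only missing term is eventDate, because the dates are still stored under the column named date.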

The check fails because we still don’t have all of the required columns. We can run suggest_workflow() again to see what remains to be done.

>>> my_dwca.suggest_workflow()
── Darwin Core terms ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

── All DwC terms ──

Matched 5 of 12 column names to DwC terms:

✓ Matched: eventID, parentEventID, eventType, Event, samplingProtocol
✗ Unmatched: Longitude, Species, Collection_date, number_birds, Latitude, location, date

── Minimum required DwC terms occurrences ──

Type                       Matched term(s)    Missing term(s)
-------------------------  -----------------  ------------------------------------------------
Identifier (at least one)  -                  occurrenceID OR catalogNumber OR recordNumber
Record type                -                  basisOfRecord
Scientific name            -                  scientificName
Location                   -                  decimalLatitude, decimalLongitude, geodeticDatum
Date/Time                  -                  eventDate
Associated event ID        -                  eventID

── Minimum required DwC terms events ──

Type                   Matched term(s)    Missing term(s)
---------------------  -----------------  -----------------
Identifier             eventID            -
Linking identifier     parentEventID      -
Type of Event          eventType          -
Name of Event          Event              -
How data was acquired  samplingProtocol   -
Date of Event          -                  eventDate

── Suggested workflow ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

── Occurrences ──

To make your occurrences Darwin Core compliant, use the following workflow:

corella.set_occurrences()
corella.set_scientific_name()
corella.set_coordinates()
corella.set_datetime()

Additional functions: set_abundance(), set_collection(), set_individual_traits(), set_license(), set_locality(), set_taxonomy()

── Events ──

To make your events Darwin Core compliant, use the following workflow:

corella.set_datetime()
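
The remaining step for events is the date. Darwin Core recommends ISO 8601 for eventDate, whereas the example events file stores day-first strings such as 17/1/2023. How set_datetime() handles this is not shown here, so the sketch below is only a standard-library illustration of the conversion; `to_iso_date` is a hypothetical helper.

```python
from datetime import datetime


def to_iso_date(day_first_date):
    """Convert a day/month/year string such as '17/1/2023' to ISO 8601 (YYYY-MM-DD)."""
    return datetime.strptime(day_first_date, "%d/%m/%Y").date().isoformat()


# Dates taken from the example events output.
iso_dates = [to_iso_date(d) for d in ["3/1/2023", "17/1/2023"]]
```

Stating the format explicitly avoids ambiguity: 3/1/2023 is read as 3 January, not 1 March.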

Other functions#

To learn more about how to use these functions, go to

Optional functions:

Passing Dataset: