set_events#
For this function, we are only checking the events dataframe. This function will check specifically for:
eventID: ID of all your events. Can be constructed, or can be generated by corella using uuid.
parentEventID: linked ID of all your events. Events are in a hierarchy, which we will discuss below.
eventType: what type of event it is (e.g. Survey, BioBlitz, Site Visit)
eventDate: date of the event
Event: name of the event
samplingProtocol: how you recorded your data (e.g. Observation)
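To illustrate what "generated using uuid" can look like, here is a minimal sketch in plain Python. It is not corella's implementation: the records and their keys are hypothetical stand-ins for rows of the events dataframe, and the only library call used is the standard library's uuid.uuid4().

```python
import uuid

# Hypothetical event records standing in for rows of the events dataframe.
events = [
    {"name": "bird survey local park", "type": "Site Visit"},
    {"name": "bird survey local park", "type": "Sample"},
]

# Give every record its own random (version 4) UUID as its eventID.
for event in events:
    event["eventID"] = str(uuid.uuid4())

for event in events:
    print(event["eventID"], event["type"])
```

Random UUIDs are unique for all practical purposes, which is why they are a convenient choice when your data has no natural identifier to build an eventID from.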
Adding event-specific information#
For events, we need to specify all of the above information. First, we need to specify the event type. For
the events file we have, this is under the heading type. We can also specify how all of the data was collected;
however, rather than specifying a column name, we are going to specify the value of Observation. Finally,
we need to specify the name of the event, which is the column headed name. Because this first call supplies
neither eventID nor parentEventID, it raises the error shown below.
>>> my_dwca.set_events(eventType='type',
... samplingProtocol='Observation',
... Event='name')
>>> my_dwca.events.head()
Traceback (most recent call last):
File "/Users/buy003/Documents/GitHub/galaxias-python/docs/source/galaxias_user_guide/longitudinal_studies/events_workflow.py", line 41, in <module>
my_dwca.set_events(eventType='type',samplingProtocol='Observation',Event='name')
File "/Users/buy003/anaconda3/envs/galaxias-dev/lib/python3.11/site-packages/galaxias/dwca_build.py", line 633, in set_events
self.events = corella.set_events(dataframe=self.events,eventID=eventID,parentEventID=parentEventID,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/buy003/anaconda3/envs/galaxias-dev/lib/python3.11/site-packages/corella/set_events.py", line 115, in set_events
raise ValueError("Please provide column names for eventID and parentEventID. Or, provide an event_hierarchy dictionary for automatic ID generation.")
ValueError: Please provide column names for eventID and parentEventID. Or, provide an event_hierarchy dictionary for automatic ID generation.
The events file needs these IDs because each event may contain several activities. For example, when you
visit a site (the highest-order event), you usually take what is termed a ‘sample’ (the second-highest-order
event). Many different activities can then constitute a ‘sample’:
take measurements of environment (ambient temperature, for example)
quantify species traits
observe what species are at the site
The way to express this in your events file is via an “Event Hierarchy”.
Event hierarchy example#
For our example, we are only concerned with observations, or what species were observed at the site.
galaxias takes a dictionary with each event given a ranking: 1 for the highest ‘order’ event,
2 for the second highest event, and so on.
>>> my_dwca.set_events(eventType='type',
... samplingProtocol='Observation',
... Event='name',
... event_hierarchy={1: "Site Visit", 2: "Sample", 3: "Observation"},
... random_id=True)
>>> my_dwca.events.head()
eventID parentEventID eventType location date Event samplingProtocol
0 a63dcdc1-1ba9-47f7-a28a-e5fb41b0197b Site Visit Cannonvale 3/1/2023 bird survey local park honeyeater lookout point Observation
1 6caddd7c-0fd3-4175-9bc2-b2ad42cf4b81 a63dcdc1-1ba9-47f7-a28a-e5fb41b0197b Sample Cannonvale 3/1/2023 bird survey local park honeyeater lookout point Observation
2 23d52a6f-bfe6-4220-b507-3d69bbf3c23f 6caddd7c-0fd3-4175-9bc2-b2ad42cf4b81 Observation Cannonvale 3/1/2023 bird survey local park honeyeater lookout point Observation
3 9dc97ff5-96bb-448a-ab38-87d15621fc0b Site Visit Cannonvale 17/1/2023 bird survey local park honeyeater lookout point Observation
4 4b744796-adf6-4d1c-9134-ed6d0482573b 9dc97ff5-96bb-448a-ab38-87d15621fc0b Sample Cannonvale 17/1/2023 bird survey local park honeyeater lookout point Observation
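In the output above, each Site Visit has an empty parentEventID, each Sample points to its Site Visit, and each Observation points to its Sample. The sketch below reproduces that linking in plain Python. It is only an illustration under stated assumptions, not how corella generates IDs internally: the rows are hypothetical, and they are assumed to be ordered so that every child event follows its parent.

```python
import uuid

# Ranking of event types, as passed to event_hierarchy above.
event_hierarchy = {1: "Site Visit", 2: "Sample", 3: "Observation"}

# Hypothetical rows, ordered so that each child follows its parent.
rows = [
    {"eventType": "Site Visit", "date": "3/1/2023"},
    {"eventType": "Sample", "date": "3/1/2023"},
    {"eventType": "Observation", "date": "3/1/2023"},
    {"eventType": "Site Visit", "date": "17/1/2023"},
    {"eventType": "Sample", "date": "17/1/2023"},
]

rank_of = {name: rank for rank, name in event_hierarchy.items()}
latest_id = {}  # rank -> eventID of the most recent event seen at that rank

for row in rows:
    rank = rank_of[row["eventType"]]
    row["eventID"] = str(uuid.uuid4())
    # The parent is the most recent event one rank higher;
    # top-ranked events have no parent, so they get an empty string.
    row["parentEventID"] = latest_id.get(rank - 1, "")
    # Forget IDs of lower-ranked events: they belonged to the previous branch.
    for deeper in [r for r in latest_id if r > rank]:
        del latest_id[deeper]
    latest_id[rank] = row["eventID"]

for row in rows:
    print(row["eventType"], row["eventID"], row["parentEventID"])
```

Tracking only the most recent ID per rank keeps the sketch short, but it relies entirely on row order; data that is not sorted parent-first would need an explicit key (for example site and date) to find each parent.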
What do check_dataset and suggest_workflow say now?#
Note: each of the set_* functions checks your data for compliance with the
Darwin Core standard, but it’s always good to double-check your data.
Now, we can check that our data columns comply with the Darwin Core standard.
>>> my_dwca.check_dataset()
Number of Errors Pass/Fail Column name
------------------ ----------- ----------------
0 ✓ eventID
0 ✓ parentEventID
0 ✓ eventType
0 ✓ Event
0 ✓ samplingProtocol
══ Results ════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════
Errors: 0 | Passes: 5
✗ Data does not meet minimum Darwin core requirements
Use corella.suggest_workflow()
However, since we don’t have all of the required columns, we can run suggest_workflow()
again to see how our data is doing this time round.
>>> my_dwca.suggest_workflow()
── Darwin Core terms ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
── All DwC terms ──
Matched 5 of 12 column names to DwC terms:
✓ Matched: eventID, parentEventID, eventType, Event, samplingProtocol
✗ Unmatched: Longitude, Species, Collection_date, number_birds, Latitude, location, date
── Minimum required DwC terms occurrences ──
Type Matched term(s) Missing term(s)
------------------------- ----------------- ------------------------------------------------
Identifier (at least one) - occurrenceID OR catalogNumber OR recordNumber
Record type - basisOfRecord
Scientific name - scientificName
Location - decimalLatitude, decimalLongitude, geodeticDatum
Date/Time - eventDate
Associated event ID - eventID
── Minimum required DwC terms events ──
Type Matched term(s) Missing term(s)
--------------------- ----------------- -----------------
Identifier eventID -
Linking identifier parentEventID -
Type of Event eventType -
Name of Event Event -
How data was acquired samplingProtocol -
Date of Event - eventDate
── Suggested workflow ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
── Occurrences ──
To make your occurrences Darwin Core compliant, use the following workflow:
corella.set_occurrences()
corella.set_scientific_name()
corella.set_coordinates()
corella.set_datetime()
Additional functions: set_abundance(), set_collection(), set_individual_traits(), set_license(), set_locality(), set_taxonomy()
── Events ──
To make your events Darwin Core compliant, use the following workflow:
corella.set_datetime()
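The events part of this report boils down to comparing column names against a list of required terms. The sketch below shows that comparison in plain Python. The two lists are copied from the output above (the "Minimum required DwC terms events" table and the columns of the events dataframe); the comparison logic itself is an assumption for illustration, not corella's actual code.

```python
# Required event terms, taken from the "Minimum required DwC terms events" table.
required_event_terms = [
    "eventID", "parentEventID", "eventType",
    "Event", "samplingProtocol", "eventDate",
]

# Columns of the events dataframe after set_events(), from the head() output.
event_columns = [
    "eventID", "parentEventID", "eventType",
    "location", "date", "Event", "samplingProtocol",
]

# Required terms that are present, required terms that are absent,
# and columns that do not correspond to any required term.
matched = [t for t in required_event_terms if t in event_columns]
missing = [t for t in required_event_terms if t not in event_columns]
unmatched = [c for c in event_columns if c not in required_event_terms]

print("Matched:", ", ".join(matched))
print("Missing:", ", ".join(missing))
print("Unmatched:", ", ".join(unmatched))
```

Here the only missing term is eventDate, while the date column is still unmatched, which is consistent with the report suggesting corella.set_datetime() as the next step for events.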
Other functions#
To learn more about how to use these functions, go to
Optional functions:
Passing Dataset: