Initial Data Check Events
When you’re ready to start submitting your data, there are a number of things to check to ensure that ingestion into the ALA goes smoothly. This includes ensuring that your column names conform to the Darwin Core Vocabulary standard and that your data is in the correct format (e.g. numerical columns are actually numerical).
*Note: the
For these examples, we will be using the dataset linked on the homepage. If, however, you want to go through this workflow using your own data, please feel free to do so!
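Before running any package-level checks, the "numerical columns are actually numerical" point can be sanity-checked with plain pandas. The following is a minimal sketch, not part of corella: the sample table, its column names, and the helper `non_numeric_rows` are hypothetical and exist only for illustration.

```python
import pandas as pd

# Hypothetical sample data; the column names and values are illustrative only.
occ_sample = pd.DataFrame({
    "Latitude": ["-35.31", "-35.27", "not recorded"],
    "Longitude": [149.12, 149.13, 149.11],
    "number_birds": [3, 1, 7],
})

def non_numeric_rows(df, column):
    """Return the rows whose value in `column` cannot be parsed as a number."""
    parsed = pd.to_numeric(df[column], errors="coerce")
    # A value is a problem if it was present originally but became NaN after coercion.
    return df[parsed.isna() & df[column].notna()]

bad_latitudes = non_numeric_rows(occ_sample, "Latitude")
bad_longitudes = non_numeric_rows(occ_sample, "Longitude")

print(occ_sample.dtypes)   # shows which columns pandas read as text rather than numbers
print(bad_latitudes)       # rows that would need fixing before submission
```

A column that pandas reports as text when you expected numbers usually contains at least one stray value like the one above, and the helper shows which rows to fix.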
To read in the data you want to use, use pandas to read each csv file in as a table.
>>> import corella
>>> import pandas as pd
>>> occ = pd.read_csv('<NAME_OF_OCCURRENCES>.csv')
>>> events = pd.read_csv('<NAME_OF_EVENTS>.csv')
>>> my_dwca.check_data(occurrences=occ,
... events=events)
Number of Errors Pass/Fail Column name
------------------ ----------- -------------
══ Results ════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════
Errors: 0.0 | Passes: 0
✗ Data does not meet minimum Darwin core requirements
Use corella.suggest_workflow()
For our initial data example, the data tests do not show any errors but,
unfortunately, that is because no column names were checked: none of the column
names are part of the standard Darwin Core Vocabulary. Thankfully, we have
created a series of functions that can help you get your data into the Darwin
Core standard. To show which corella functions can help you do this, we have
developed an all-purpose function called suggest_workflow(). Here are the
results for this particular dataset:
>>> my_dwca.suggest_workflow(occurrences=occ,
... events=events)
── Darwin Core terms ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
── All DwC terms ──
Matched 0 of 9 column names to DwC terms:
✓ Matched:
✗ Unmatched: Collection_date, Latitude, number_birds, Longitude, Species, name, location, date, type
── Minimum required DwC terms occurrences ──
Type Matched term(s) Missing term(s)
------------------------- ----------------- ------------------------------------------------
Identifier (at least one) - occurrenceID OR catalogNumber OR recordNumber
Record type - basisOfRecord
Scientific name - scientificName
Location - decimalLatitude, decimalLongitude, geodeticDatum
Date/Time - eventDate
Associated event ID - eventID
── Minimum required DwC terms events ──
Type Matched term(s) Missing term(s)
--------------------- ----------------- -----------------
Identifier - eventID
Linking identifier - parentEventID
Type of Event - eventType
Name of Event - Event
How data was acquired - samplingProtocol
Date of Event - eventDate
── Suggested workflow ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
── Occurrences ──
To make your occurrences Darwin Core compliant, use the following workflow:
corella.set_occurrences()
corella.set_scientific_name()
corella.set_coordinates()
corella.set_datetime()
Additional functions: set_abundance(), set_collection(), set_individual_traits(), set_license(), set_locality(), set_taxonomy()
── Events ──
To make your events Darwin Core compliant, use the following workflow:
corella.set_events()
corella.set_datetime()
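The "Matched 0 of 9 column names" line in the report comes down to comparing column names against Darwin Core terms. The sketch below illustrates that idea with plain pandas. It is not how corella implements the check. The term list is only a subset copied from the report, the sample table is hypothetical, and the rename mapping (for example, Species to scientificName) is an assumption about what each column holds.

```python
import pandas as pd

# A subset of Darwin Core terms copied from the report above (not the full vocabulary).
DWC_TERMS = {
    "occurrenceID", "basisOfRecord", "scientificName", "decimalLatitude",
    "decimalLongitude", "geodeticDatum", "eventDate", "eventID",
}

# Hypothetical occurrences table using some of the unmatched column names.
occ_sample = pd.DataFrame({
    "Species": ["Eolophus roseicapilla"],
    "Latitude": [-35.31],
    "Longitude": [149.12],
    "Collection_date": ["2023-01-15"],
})

def match_columns(df, terms):
    """Split a table's column names into those that are DwC terms and those that are not."""
    matched = [col for col in df.columns if col in terms]
    unmatched = [col for col in df.columns if col not in terms]
    return matched, unmatched

matched_before, unmatched_before = match_columns(occ_sample, DWC_TERMS)

# Assumed mapping from the original column names to Darwin Core terms.
rename_map = {
    "Species": "scientificName",
    "Latitude": "decimalLatitude",
    "Longitude": "decimalLongitude",
    "Collection_date": "eventDate",
}
occ_renamed = occ_sample.rename(columns=rename_map)
matched_after, unmatched_after = match_columns(occ_renamed, DWC_TERMS)

print(f"Before: matched {len(matched_before)} of {len(occ_sample.columns)}")
print(f"After:  matched {len(matched_after)} of {len(occ_renamed.columns)}")
```

Renaming alone does not validate the values in each column, which is why the corella functions listed in the suggested workflow remain the intended route.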
To learn more about how to use these functions, go to