Initial Data Check Events

Before you submit your data, a few checks will help ensure that ingestion into the ALA goes smoothly. These include confirming that your column names conform to the Darwin Core vocabulary and that your data is in the correct format (e.g. numerical columns are actually numerical).
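As a rough illustration of the format check mentioned above, the following sketch uses plain pandas (not corella) to find columns that should be numerical but are not. The toy table, its values, and the helper `non_numeric_columns` are hypothetical and exist only for this example.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Hypothetical toy table: "Latitude" contains a stray text value,
# so pandas cannot store the column as numbers.
occ_example = pd.DataFrame({
    "Latitude": ["-35.31", "-35.27", "not recorded"],
    "Longitude": [149.12, 149.13, 149.11],
})


def non_numeric_columns(df, columns):
    """Return the names in `columns` whose dtype in `df` is not numeric."""
    return [col for col in columns if not is_numeric_dtype(df[col])]


bad_columns = non_numeric_columns(occ_example, ["Latitude", "Longitude"])

# Coercing to numbers turns unparseable entries into NaN, which makes
# them easy to count and inspect before submission.
coerced = pd.to_numeric(occ_example["Latitude"], errors="coerce")
n_unparseable = int(coerced.isna().sum())

print(bad_columns)
print(n_unparseable)
```

Running a check like this before the corella checks can make it easier to tell formatting problems apart from naming problems.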

*Note: the

For these examples, we will be using the dataset linked on the homepage. If you would rather go through this workflow with your own data, please feel free to do so!

To load the data you want to use, read each CSV file into a table with pandas.

>>> import corella
>>> import pandas as pd
>>> occ = pd.read_csv('<NAME_OF_OCCURRENCES>.csv')
>>> events = pd.read_csv('<NAME_OF_EVENTS>.csv')
>>> my_dwca.check_data(occurrences=occ,
...                    events=events)
Number of Errors    Pass/Fail    Column name
------------------  -----------  -------------


══ Results ════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════


Errors: 0.0 | Passes: 0

✗ Data does not meet minimum Darwin core requirements
Use corella.suggest_workflow()

In this initial example, the checks report no errors, but only because no columns were checked: none of the column names are part of the standard Darwin Core vocabulary. Thankfully, corella includes a series of functions that can help you get your data into the Darwin Core standard. To show which of these functions apply to your data, we have developed an all-purpose function called suggest_workflow(). Here are its results for this particular dataset:

>>> my_dwca.suggest_workflow(occurrences=occ,
...                          events=events)
── Darwin Core terms ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

── All DwC terms ──

Matched 0 of 9 column names to DwC terms:

✓ Matched: 
✗ Unmatched: Collection_date, Latitude, number_birds, Longitude, Species, name, location, date, type

── Minimum required DwC terms occurrences ──

Type                       Matched term(s)    Missing term(s)
-------------------------  -----------------  ------------------------------------------------
Identifier (at least one)  -                  occurrenceID OR catalogNumber OR recordNumber
Record type                -                  basisOfRecord
Scientific name            -                  scientificName
Location                   -                  decimalLatitude, decimalLongitude, geodeticDatum
Date/Time                  -                  eventDate
Associated event ID        -                  eventID

── Minimum required DwC terms events ──

Type                   Matched term(s)    Missing term(s)
---------------------  -----------------  -----------------
Identifier             -                  eventID
Linking identifier     -                  parentEventID
Type of Event          -                  eventType
Name of Event          -                  Event
How data was acquired  -                  samplingProtocol
Date of Event          -                  eventDate

── Suggested workflow ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

── Occurrences ──

To make your occurrences Darwin Core compliant, use the following workflow:

corella.set_occurrences()
corella.set_scientific_name()
corella.set_coordinates()
corella.set_datetime()

Additional functions: set_abundance(), set_collection(), set_individual_traits(), set_license(), set_locality(), set_taxonomy()

── Events ──

To make your events Darwin Core compliant, use the following workflow:

corella.set_events()
corella.set_datetime()
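
To make the "Matched 0 of 9" result above more concrete, here is a minimal sketch in plain pandas of what matching column names to Darwin Core terms involves. It does not use the corella functions listed above and is not a substitute for them. The toy values, the helper `matched_terms`, and the pairing of each original column with a Darwin Core term are assumptions made for illustration; the term names themselves come from the "Missing term(s)" column of the occurrences table.

```python
import pandas as pd

# Darwin Core terms listed as missing for occurrences in the output above.
REQUIRED_TERMS = {
    "occurrenceID", "catalogNumber", "recordNumber", "basisOfRecord",
    "scientificName", "decimalLatitude", "decimalLongitude",
    "geodeticDatum", "eventDate", "eventID",
}

# Hypothetical one-row table using some of the unmatched column names.
occ_example = pd.DataFrame({
    "Species": ["Eolophus roseicapilla"],
    "Latitude": [-35.31],
    "Longitude": [149.12],
    "Collection_date": ["2023-01-14"],
    "number_birds": [3],
})


def matched_terms(df):
    """Return the column names of `df` that are required DwC terms, sorted."""
    return sorted(set(df.columns) & REQUIRED_TERMS)


before = matched_terms(occ_example)

# Assumed mapping from the original column names to Darwin Core terms.
column_map = {
    "Species": "scientificName",
    "Latitude": "decimalLatitude",
    "Longitude": "decimalLongitude",
    "Collection_date": "eventDate",
}
renamed = occ_example.rename(columns=column_map)
after = matched_terms(renamed)

print(before)
print(after)
```

Renaming alone does not validate the values in each column, and terms such as basisOfRecord or geodeticDatum would still need to be added.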

To learn more about how to use these functions, go to