set_datetime

set_datetime() is one of the functions you can use to check specific columns of your data. It checks that your data contain the following Darwin Core vocabulary term:

  • eventDate: the date of your observation

It can also optionally check the following terms:

  • eventTime: the time of your observation

  • year: the year of your observation

  • month: the month of your observation

  • day: the day of your observation
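The optional terms are components of the same date that eventDate records. As an illustration in plain pandas (not galaxias; whether set_datetime() fills these columns in for you is not covered here), the sketch below derives year, month and day from a toy eventDate column built from two of the dates shown later on this page:

```python
import pandas as pd

# Toy occurrence table whose eventDate column is already in datetime format
occ = pd.DataFrame({"eventDate": pd.to_datetime(["2022-03-29", "2022-09-13"])})

# Illustration only: split each eventDate into the optional DwC terms
occ["year"] = occ["eventDate"].dt.year
occ["month"] = occ["eventDate"].dt.month
occ["day"] = occ["eventDate"].dt.day

print(occ)
```

If you supply these columns yourself, their values should agree with eventDate for every row.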

eventDate and automatically converting strings

Because set_datetime() lets us name the column that holds each term, we can tell it that our eventDate values are stored in the column called 'date'.
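For comparison, doing the same mapping by hand in plain pandas amounts to a column rename. This is a minimal sketch on a hypothetical toy table; it assumes only that the 'date' column ends up named eventDate, as the output later in this section shows:

```python
import pandas as pd

# Hypothetical toy data: the date is stored as text in a column called 'date'
occ = pd.DataFrame({
    "Species": ["Corymbia latifolia", "Eucalyptus tectifica"],
    "date": ["29/03/2022", "13/09/2022"],
})

# Map the existing column name onto the Darwin Core term
occ = occ.rename(columns={"date": "eventDate"})
print(occ.columns.tolist())  # ['Species', 'eventDate']
```

Note that renaming alone leaves the values as strings, which is exactly what triggers the error below.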

>>> my_dwca.set_datetime(eventDate='date')
Traceback (most recent call last):
  ...
ValueError: There are some errors in your data.  They are as follows:

the eventDate column must be in datetime format.

We get an error because set_datetime() requires the eventDate column to contain datetime objects rather than strings, which ensures each date is interpreted unambiguously. Fortunately, set_datetime() has a few arguments that convert date strings to datetime format:

  • string_to_datetime: when set to True, converts any strings in the eventDate column to datetime objects.

  • yearfirst: when set to True, galaxias (and pandas) assume your dates start with the year.

  • dayfirst: when set to True, galaxias (and pandas) assume your dates start with the day.

Note that when both yearfirst and dayfirst are set to False, pandas assumes the month comes first.
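The dayfirst argument mirrors the parameter of the same name in pandas.to_datetime(). The sketch below calls pandas directly on hypothetical day-first strings to show how the setting changes the result for ambiguous dates; that set_datetime() passes its arguments through to pd.to_datetime() is an assumption, not something this page confirms:

```python
import pandas as pd

# Hypothetical date strings written day-first (DD/MM/YYYY)
dates = pd.Series(["01/02/2022", "03/04/2022"])

# Default parsing: ambiguous strings are read month-first
month_first = pd.to_datetime(dates)

# dayfirst=True reads the first number as the day instead
day_first = pd.to_datetime(dates, dayfirst=True)

print(month_first.dt.month.tolist())  # [1, 3]
print(day_first.dt.month.tolist())    # [2, 4]
```

A date such as 13/09/2022 can only be read one way because 13 cannot be a month, but 01/02/2022 is valid either way, so getting dayfirst right matters most for dates where the day is 12 or less.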

>>> my_dwca.set_datetime(eventDate='date',
...                      string_to_datetime=True,
...                      yearfirst=False,
...                      dayfirst=True)
>>> my_dwca.occurrences.head()
                   Species  Latitude  Longitude  eventDate
0       Corymbia latifolia    -13.04     131.07 2022-03-29
1     Eucalyptus tectifica    -13.04     131.07 2022-09-13
2           Banksia aemula    -33.60     150.72 2022-08-15
3  Eucalyptus sclerophylla    -33.60     150.72 2022-06-16
4        Persoonia laurina    -33.60     150.72 2022-10-19
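To confirm the conversion yourself, you can test the column's dtype with pandas. This sketch assumes the converted column is a standard pandas datetime64 column, and it uses a small toy table with day-first strings rather than my_dwca:

```python
import pandas as pd

# Toy column of day-first date strings
occ = pd.DataFrame({"eventDate": ["29/03/2022", "13/09/2022"]})

# Before conversion the column holds strings, not datetimes
print(pd.api.types.is_datetime64_any_dtype(occ["eventDate"]))  # False

# Convert, reading the first number as the day
occ["eventDate"] = pd.to_datetime(occ["eventDate"], dayfirst=True)
print(pd.api.types.is_datetime64_any_dtype(occ["eventDate"]))  # True
print(occ["eventDate"].dt.strftime("%Y-%m-%d").tolist())  # ['2022-03-29', '2022-09-13']
```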

What do check_data and suggest_workflow say now?

Note

Each of the set_* functions checks your data for compliance with the Darwin Core standard, but it’s always good to double-check your data.

Now we can check that our data columns comply with the Darwin Core standard.

>>> my_dwca.check_data()
  Number of Errors  Pass/Fail    Column name
------------------  -----------  -------------
                 0  ✓            eventDate


══ Results ════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════


Errors: 0 | Passes: 1

✗ Data does not meet minimum Darwin core requirements
Use corella.suggest_workflow()


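The eventDate column now passes, but check_data() reports that the data still do not meet the minimum Darwin Core requirements because other required columns are missing. We can run suggest_workflow() again to see what still needs to be done.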

>>> my_dwca.suggest_workflow()
── Darwin Core terms ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

── All DwC terms ──

Matched 1 of 4 column names to DwC terms:

✓ Matched: eventDate
✗ Unmatched: Species, Latitude, Longitude

── Minimum required DwC terms occurrences ──

Type                       Matched term(s)    Missing term(s)
-------------------------  -----------------  ------------------------------------------------
Identifier (at least one)  -                  occurrenceID OR catalogNumber OR recordNumber
Record type                -                  basisOfRecord
Scientific name            -                  scientificName
Location                   -                  decimalLatitude, decimalLongitude, geodeticDatum
Date/Time                  eventDate          -

── Suggested workflow ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

── Occurrences ──

To make your occurrences Darwin Core compliant, use the following workflow:

corella.set_occurrences()
corella.set_scientific_name()
corella.set_coordinates()

Additional functions: set_abundance(), set_collection(), set_individual_traits(), set_license(), set_locality(), set_taxonomy()
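The "Minimum required DwC terms" table follows a simple rule: at least one identifier term, plus every term listed in each of the other rows. The hypothetical helper below (missing_required_terms is not part of galaxias or corella) re-creates that rule from the table printed by suggest_workflow(), as a sketch for understanding the output rather than a replacement for it:

```python
# Hypothetical helper, NOT corella's implementation: it only mirrors the
# "Minimum required DwC terms" table printed by suggest_workflow().
REQUIRED = {
    "Identifier (at least one)": ["occurrenceID", "catalogNumber", "recordNumber"],
    "Record type": ["basisOfRecord"],
    "Scientific name": ["scientificName"],
    "Location": ["decimalLatitude", "decimalLongitude", "geodeticDatum"],
    "Date/Time": ["eventDate"],
}


def missing_required_terms(columns):
    """Return {requirement type: missing terms} for each unmet requirement."""
    present = set(columns)
    missing = {}
    for kind, terms in REQUIRED.items():
        if kind.startswith("Identifier"):
            # Any single identifier term satisfies this requirement
            if not present.intersection(terms):
                missing[kind] = terms
        else:
            # Every term in the group must be present
            absent = [t for t in terms if t not in present]
            if absent:
                missing[kind] = absent
    return missing


result = missing_required_terms(["Species", "Latitude", "Longitude", "eventDate"])
for kind, terms in result.items():
    print(f"{kind}: {', '.join(terms)}")
```

Only Date/Time is absent from the result, matching the single ✓ row in the table above.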

Other functions

To learn more about how to use other functions, go to

Optional functions:

Creating Unique IDs:

Passing Dataset: