set_datetime#
One of the functions you can use to check certain columns of your data is set_datetime().
This function aims to check that you have the following Darwin Core Vocabulary Terms:
eventDate
: the date of your observation
It can also (optionally) check the following:
eventTime
: the time of your observation
year
: the year of your observation
month
: the month of your observation
day
: the day of your observation
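For illustration, the optional year, month and day terms correspond to the components of a single observation date. The sketch below uses only Python's standard library. The observation and the variable names are made up for this example, and it does not show how galaxias derives or checks these terms.

```python
from datetime import date, time

# A hypothetical observation made on 29 March 2022 at 14:30
event_date = date(2022, 3, 29)
event_time = time(14, 30)

# The optional terms are the individual components of the observation date
record = {
    "eventDate": event_date.isoformat(),
    "eventTime": event_time.isoformat(),
    "year": event_date.year,
    "month": event_date.month,
    "day": event_date.day,
}
print(record)
```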
eventDate and automatically converting strings#
Because set_datetime() lets us specify column names, we can tell it that our 'date' column holds the eventDate.
>>> my_dwca.set_datetime(eventDate='date')
>>> my_dwca.occurrences.head()
Traceback (most recent call last):
  File "/Users/buy003/Documents/GitHub/galaxias-python/docs/source/galaxias_user_guide/independent_observations/data_cleaning.py", line 113, in <module>
    my_dwca.set_datetime(eventDate='Collection_date')
  File "/Users/buy003/anaconda3/envs/galaxias-dev/lib/python3.11/site-packages/galaxias/dwca_build.py", line 563, in set_datetime
    self.occurrences = corella.set_datetime(dataframe=self.occurrences,eventDate=eventDate,year=year,month=month,
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/buy003/anaconda3/envs/galaxias-dev/lib/python3.11/site-packages/corella/set_datetime.py", line 101, in set_datetime
    raise ValueError("There are some errors in your data. They are as follows:\n\n{}".format('\n'.join(errors)))
ValueError: There are some errors in your data. They are as follows:

the eventDate column must be in datetime format.
We get an error here because set_datetime() requires the eventDate column to be in datetime format, which ensures the dates are formatted correctly. Luckily, set_datetime() has a few arguments that will convert date strings to datetime format.
string_to_datetime
: when this is set to True, will convert any strings in the eventDate column to datetime objects.
yearfirst
: when this is set to True, galaxias (and pandas) assumes your date starts with the year.
dayfirst
: when this is set to True, galaxias (and pandas) assumes your date starts with the day.
Note that when both yearfirst and dayfirst are set to False, pandas assumes the month is first.
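To see why the date order matters, consider a string whose day and month could be swapped. The sketch below uses only Python's standard library, with explicit format strings rather than the yearfirst and dayfirst arguments. It illustrates the ambiguity itself, not how galaxias or pandas parse dates internally, and the sample string is made up.

```python
from datetime import datetime

# A date string that is valid under both day-first and month-first readings
ambiguous = "03/04/2022"

# Read as day first: 3 April 2022
day_first = datetime.strptime(ambiguous, "%d/%m/%Y")

# Read as month first: 4 March 2022
month_first = datetime.strptime(ambiguous, "%m/%d/%Y")

print(day_first.date())    # 2022-04-03
print(month_first.date())  # 2022-03-04
```

Both readings parse without error, so a wrong assumption about the order silently produces a wrong date rather than a failure.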
>>> my_dwca.set_datetime(eventDate='date',
... string_to_datetime=True,
... yearfirst=False,
... dayfirst=True)
>>> my_dwca.occurrences.head()
                   Species  Latitude  Longitude   eventDate
0       Corymbia latifolia    -13.04     131.07  2022-03-29
1     Eucalyptus tectifica    -13.04     131.07  2022-09-13
2           Banksia aemula    -33.60     150.72  2022-08-15
3  Eucalyptus sclerophylla    -33.60     150.72  2022-06-16
4        Persoonia laurina    -33.60     150.72  2022-10-19
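The error earlier in this section comes from a type check on the eventDate column. As a rough stand-in for that kind of check, the sketch below collects error messages and raises a single ValueError, then shows the same data passing once the strings are converted. The function check_event_dates and the sample values are hypothetical and written only for this example. This is not corella's actual implementation, and it uses plain Python lists instead of a dataframe.

```python
from datetime import datetime


def check_event_dates(values):
    """Hypothetical check: raise ValueError if any value is not a datetime."""
    errors = []
    if not all(isinstance(v, datetime) for v in values):
        errors.append("the eventDate column must be in datetime format.")
    if errors:
        raise ValueError(
            "There are some errors in your data. They are as follows:\n\n"
            + "\n".join(errors)
        )
    return True


# Day-first date strings, made up for this example
raw = ["29/03/2022", "13/09/2022"]

# Strings fail the check
try:
    check_event_dates(raw)
except ValueError as err:
    print(err)

# After converting with an explicit day-first format, the check passes
converted = [datetime.strptime(v, "%d/%m/%Y") for v in raw]
print(check_event_dates(converted))
```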
What do check_data and suggest_workflow say now?#
Note
Each of the set_* functions checks your data for compliance with the Darwin Core standard, but it’s always good to double-check your data.
Now we can check that our data columns comply with the Darwin Core standard.
>>> my_dwca.check_data()
  Number of Errors  Pass/Fail    Column name
------------------  -----------  -------------
                 0  ✓            eventDate
══ Results ════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════
Errors: 0 | Passes: 1
✗ Data does not meet minimum Darwin core requirements
Use corella.suggest_workflow()
None
However, since we don’t have all of the required columns, we can run suggest_workflow() again to see how our data is doing this time round.
>>> my_dwca.suggest_workflow()
── Darwin Core terms ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
── All DwC terms ──
Matched 1 of 4 column names to DwC terms:
✓ Matched: eventDate
✗ Unmatched: Species, Latitude, Longitude
── Minimum required DwC terms occurrences ──
Type                       Matched term(s)    Missing term(s)
-------------------------  -----------------  ------------------------------------------------
Identifier (at least one)  -                  occurrenceID OR catalogNumber OR recordNumber
Record type                -                  basisOfRecord
Scientific name            -                  scientificName
Location                   -                  decimalLatitude, decimalLongitude, geodeticDatum
Date/Time                  eventDate          -
── Suggested workflow ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
── Occurrences ──
To make your occurrences Darwin Core compliant, use the following workflow:
corella.set_occurrences()
corella.set_scientific_name()
corella.set_coordinates()
Additional functions: set_abundance(), set_collection(), set_individual_traits(), set_license(), set_locality(), set_taxonomy()
None
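The "Matched 1 of 4 column names" line in the report above amounts to comparing column names against a list of Darwin Core terms. A minimal sketch of that comparison follows. The DWC_TERMS set is a small hand-picked subset taken from the report, and match_columns is a hypothetical helper, so this is an illustration of the idea and not how suggest_workflow() is implemented.

```python
# Hand-picked subset of Darwin Core terms, taken from the report above
DWC_TERMS = {
    "occurrenceID",
    "basisOfRecord",
    "scientificName",
    "decimalLatitude",
    "decimalLongitude",
    "geodeticDatum",
    "eventDate",
}


def match_columns(columns, terms=DWC_TERMS):
    """Hypothetical helper: split column names into matched and unmatched."""
    matched = [c for c in columns if c in terms]
    unmatched = [c for c in columns if c not in terms]
    return matched, unmatched


columns = ["Species", "Latitude", "Longitude", "eventDate"]
matched, unmatched = match_columns(columns)

print(f"Matched {len(matched)} of {len(columns)} column names to DwC terms:")
print("Matched:", ", ".join(matched))
print("Unmatched:", ", ".join(unmatched))
```

The match is exact and case-sensitive in this sketch, which is why Latitude and Longitude stay unmatched until they are renamed to decimalLatitude and decimalLongitude.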
Other functions#
To learn more about how to use other functions, go to
Optional functions:
Creating Unique IDs:
Passing Dataset: