set_coordinates#
One of the functions you can use to check certain columns of your data is set_coordinates().
This function aims to check that you have the following Darwin Core Vocabulary Terms:
decimalLatitude: the latitude of your observation
decimalLongitude: the longitude of your observation
geodeticDatum: the coordinate reference system (CRS) of your latitude and longitude
It can also (optionally) check the following:
coordinateUncertaintyInMeters: uncertainty of your measurements in meters
coordinatePrecision: precision of your coordinates in decimal degrees
Specifying decimalLatitude and decimalLongitude#
Since we have latitude and longitude columns, we can specify them in the
set_coordinates() function. The columns will be renamed and the values
checked to see i) that they are numeric and ii) that they are in the
correct ranges.
>>> my_dwca.set_coordinates(decimalLatitude='Latitude',
...                         decimalLongitude='Longitude')
>>> my_dwca.occurrences.head()
                   Species  decimalLatitude  decimalLongitude  Collection_date
0       Corymbia latifolia           -13.04            131.07        29/3/2022
1     Eucalyptus tectifica           -13.04            131.07        13/9/2022
2           Banksia aemula           -33.60            150.72        15/8/2022
3  Eucalyptus sclerophylla           -33.60            150.72        16/6/2022
4        Persoonia laurina           -33.60            150.72       19/10/2022
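The two checks described above (numeric values, valid ranges) can be approximated with plain pandas. This is only a minimal sketch, not corella's implementation: the helper name `check_coordinates` is hypothetical, and the ranges assumed are -90 to 90 for latitude and -180 to 180 for longitude.

```python
import pandas as pd


def check_coordinates(df, lat_col="decimalLatitude", lon_col="decimalLongitude"):
    """Hypothetical helper: list the problems found in the coordinate columns."""
    problems = []
    for col, low, high in [(lat_col, -90, 90), (lon_col, -180, 180)]:
        # i) the column must hold numbers, not strings
        if not pd.api.types.is_numeric_dtype(df[col]):
            problems.append(f"{col} is not numeric")
            continue
        # ii) every non-missing value must fall inside the valid range
        values = df[col].dropna()
        if not values.between(low, high).all():
            problems.append(f"{col} has values outside [{low}, {high}]")
    return problems


occurrences = pd.DataFrame({
    "decimalLatitude": [-13.04, -33.60],
    "decimalLongitude": [131.07, 150.72],
})
print(check_coordinates(occurrences))
```

An empty list means both columns passed; each failing column adds one message.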
Note
Non-numeric values
If you get an error saying some latitude and longitude values are not numeric,
that’s ok! Luckily, pandas has a function called to_numeric which will
convert strings to numeric values for you (assuming those strings are numbers).
Below is an example of how to convert a column to all numeric values. Once this
is completed, you can use the command above.
>>> import pandas as pd
>>> my_dwca.occurrences['Latitude'] = pd.to_numeric(my_dwca.occurrences['Latitude'], errors='coerce')
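To see what `errors='coerce'` does, here is a small standalone sketch on made-up values (the series below is illustrative, not part of the example dataset). Strings that look like numbers are converted, and anything else becomes NaN rather than raising an error, so it is worth counting the missing values afterwards.

```python
import pandas as pd

# Illustrative values: two numeric strings and one that cannot be parsed
raw = pd.Series(["-13.04", "131.07", "unknown"])
converted = pd.to_numeric(raw, errors="coerce")

# Numeric strings become floats; "unknown" becomes NaN
print(converted)

# Count the values that could not be converted
print(converted.isna().sum())
```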
geodeticDatum#
Another required field is geodeticDatum. This column is required because
it lets others know how you measured latitude and longitude. geodeticDatum
refers to a Coordinate Reference System (CRS), which is how three-dimensional
coordinates are represented on a two-dimensional surface. The most common CRS
(and the one that GPS devices, as well as the ALA, use) is WGS84. If you know
that this is the CRS you have used, you can set the default value of
geodeticDatum in set_coordinates().
>>> my_dwca.set_coordinates(decimalLatitude='Latitude',
...                         decimalLongitude='Longitude',
...                         geodeticDatum='WGS84')
>>> my_dwca.occurrences.head()
                   Species  decimalLatitude  decimalLongitude  Collection_date  geodeticDatum
0       Corymbia latifolia           -13.04            131.07        29/3/2022          WGS84
1     Eucalyptus tectifica           -13.04            131.07        13/9/2022          WGS84
2           Banksia aemula           -33.60            150.72        15/8/2022          WGS84
3  Eucalyptus sclerophylla           -33.60            150.72        16/6/2022          WGS84
4        Persoonia laurina           -33.60            150.72       19/10/2022          WGS84
Adding Uncertainty#
There is always uncertainty in measurements of latitude and longitude, and
it is sometimes useful to record it, especially if you know the uncertainty of
your instruments or measurements. If you know this information and want to include
it, you can specify a default value (similar to the geodeticDatum column above)
for either coordinatePrecision or coordinateUncertaintyInMeters. The former is
in decimal degrees, and the latter is in meters.
>>> my_dwca.set_coordinates(dataframe=occ,
...                         decimalLatitude='Latitude',
...                         decimalLongitude='Longitude',
...                         geodeticDatum='WGS84',
...                         coordinatePrecision=0.1)
>>> my_dwca.occurrences.head()
                   Species  decimalLatitude  decimalLongitude  Collection_date  geodeticDatum  coordinatePrecision
0       Corymbia latifolia           -13.04            131.07        29/3/2022          WGS84                  0.1
1     Eucalyptus tectifica           -13.04            131.07        13/9/2022          WGS84                  0.1
2           Banksia aemula           -33.60            150.72        15/8/2022          WGS84                  0.1
3  Eucalyptus sclerophylla           -33.60            150.72        16/6/2022          WGS84                  0.1
4        Persoonia laurina           -33.60            150.72       19/10/2022          WGS84                  0.1
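Because one of these terms is in decimal degrees and the other in meters, it can help to get a feel for how large a step in degrees is on the ground. The sketch below is a rough spherical approximation using a mean Earth radius of 6,371 km; the function name `degrees_to_meters` is hypothetical and is not part of corella. It is only meant to give a sense of scale: the two terms describe different things and are not interchangeable.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; assumes a spherical Earth


def degrees_to_meters(degrees, latitude=0.0):
    """Approximate the north-south and east-west distance spanned by `degrees`."""
    meters_per_degree = 2 * math.pi * EARTH_RADIUS_M / 360
    north_south = degrees * meters_per_degree
    # Lines of longitude converge toward the poles, so scale by cos(latitude)
    east_west = north_south * math.cos(math.radians(latitude))
    return north_south, east_west


ns, ew = degrees_to_meters(0.1, latitude=-33.60)
print(f"0.1 degrees is roughly {ns:.0f} m north-south and {ew:.0f} m east-west")
```

Under this approximation, 0.1 degrees of latitude spans about 11 km, and the east-west distance shrinks as you move away from the equator.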
What do check_data and suggest_workflow say now?#
Note
Each of the set_* functions checks your data for compliance with the
Darwin Core standard, but it’s always good to double-check your data.
Now, we can check that our data columns comply with the Darwin Core standard.
>>> my_dwca.check_data()
  Number of Errors  Pass/Fail    Column name
------------------  -----------  -------------------
                 0  ✓            decimalLatitude
                 0  ✓            decimalLongitude
                 0  ✓            geodeticDatum
                 0  ✓            coordinatePrecision
══ Results ════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════
Errors: 0 | Passes: 4
✗ Data does not meet minimum Darwin core requirements
Use corella.suggest_workflow()
However, since we don’t have all of the required columns, we can run suggest_workflow()
again to see how our data is doing this time round.
>>> my_dwca.suggest_workflow()
── Darwin Core terms ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
── All DwC terms ──
Matched 4 of 6 column names to DwC terms:
✓ Matched: decimalLatitude, decimalLongitude, geodeticDatum, coordinatePrecision
✗ Unmatched: Collection_date, Species
── Minimum required DwC terms occurrences ──
Type                       Matched term(s)                                   Missing term(s)
-------------------------  ------------------------------------------------  ---------------------------------------------
Identifier (at least one)  -                                                 occurrenceID OR catalogNumber OR recordNumber
Record type                -                                                 basisOfRecord
Scientific name            -                                                 scientificName
Location                   decimalLatitude, decimalLongitude, geodeticDatum  -
Date/Time                  -                                                 eventDate
── Suggested workflow ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
── Occurrences ──
To make your occurrences Darwin Core compliant, use the following workflow:
corella.set_occurrences()
corella.set_scientific_name()
corella.set_datetime()
Additional functions: set_abundance(), set_collection(), set_individual_traits(), set_license(), set_locality(), set_taxonomy()
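The "Matched 4 of 6 column names" summary above amounts to comparing your column names against the Darwin Core vocabulary. The following sketch reproduces that count with a set lookup. It is an illustration only, not how corella does it, and `DWC_TERMS` is a small hand-picked subset of terms rather than the full vocabulary.

```python
# Illustrative subset of Darwin Core terms (assumption: not the full vocabulary)
DWC_TERMS = {
    "occurrenceID", "basisOfRecord", "scientificName", "eventDate",
    "decimalLatitude", "decimalLongitude", "geodeticDatum",
    "coordinatePrecision", "coordinateUncertaintyInMeters",
}

# Column names of the example data after running set_coordinates()
columns = [
    "Species", "decimalLatitude", "decimalLongitude",
    "Collection_date", "geodeticDatum", "coordinatePrecision",
]

matched = [c for c in columns if c in DWC_TERMS]
unmatched = sorted(c for c in columns if c not in DWC_TERMS)

print(f"Matched {len(matched)} of {len(columns)} column names to DwC terms")
print("Matched:", ", ".join(matched))
print("Unmatched:", ", ".join(unmatched))
```

The unmatched names (here Species and Collection_date) are the columns still to be handled by the remaining set_* functions in the suggested workflow.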
Other functions:#
To learn more about how to use other functions, go to
Optional functions:
Creating Unique IDs:
Passing Dataset: