set_coordinates#

set_coordinates() is one of the functions you can use to check specific columns of your data. It checks that you have the following Darwin Core terms:

  • decimalLatitude: the latitude of your observation

  • decimalLongitude: the longitude of your observation

  • geodeticDatum: the coordinate reference system (CRS) of your latitude and longitude

It can also (optionally) check the following:

  • coordinateUncertaintyInMeters: uncertainty of your measurements in meters

  • coordinatePrecision: uncertainty of your measurements in decimal degrees

Specifying decimalLatitude and decimalLongitude#

Since we have latitude and longitude columns, we can specify them in the set_coordinates() function. The columns will be renamed, and the values will be checked to see that i) they are numeric; and ii) they are in the correct ranges (-90 to 90 for latitude, -180 to 180 for longitude).

>>> my_dwca.set_coordinates(decimalLatitude='Latitude',
...                         decimalLongitude='Longitude')
>>> my_dwca.occurrences.head()
                   Species  decimalLatitude  decimalLongitude Collection_date
0       Corymbia latifolia           -13.04            131.07       29/3/2022
1     Eucalyptus tectifica           -13.04            131.07       13/9/2022
2           Banksia aemula           -33.60            150.72       15/8/2022
3  Eucalyptus sclerophylla           -33.60            150.72       16/6/2022
4        Persoonia laurina           -33.60            150.72      19/10/2022
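
The two checks described above can be approximated in plain pandas. The sketch below is not corella's implementation: the helper name check_coordinate_ranges and the sample values are illustrative assumptions. It reports the rows whose values are non-numeric or fall outside the valid range.

```python
import pandas as pd

# Valid ranges for Darwin Core decimal coordinates, in degrees.
COORD_RANGES = {
    "decimalLatitude": (-90.0, 90.0),
    "decimalLongitude": (-180.0, 180.0),
}


def check_coordinate_ranges(df):
    """Return, per coordinate column, the index labels of invalid rows.

    Hypothetical helper for illustration; not part of corella.
    """
    problems = {}
    for column, (low, high) in COORD_RANGES.items():
        # Non-numeric entries become NaN, so they fail the range test too.
        values = pd.to_numeric(df[column], errors="coerce")
        invalid = ~values.between(low, high)
        problems[column] = invalid[invalid].index.tolist()
    return problems


# Made-up sample: row 1 has an out-of-range latitude, row 2 has a
# non-numeric latitude and an out-of-range longitude.
sample = pd.DataFrame(
    {
        "decimalLatitude": [-13.04, 95.0, "abc"],
        "decimalLongitude": [131.07, 150.72, -200.0],
    }
)
print(check_coordinate_ranges(sample))
```

Returning index labels rather than a single pass/fail flag makes it easier to find and correct the offending rows before calling set_coordinates().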

Note

Non-numeric values

If you get an error saying some latitude and longitude values are not numeric, that’s ok! pandas has a function called to_numeric which will convert strings to numeric values for you (assuming those strings are numbers). With errors='coerce', any value that cannot be converted becomes NaN. Below is an example of how to convert a column to all numeric values. Once this is completed, you can use the command above.

>>> import pandas as pd
>>> my_dwca.occurrences['Latitude'] = pd.to_numeric(my_dwca.occurrences['Latitude'], errors='coerce')
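
With errors='coerce', pandas replaces entries it cannot parse with NaN without raising an error, so it can be worth checking which rows were affected before moving on. A small sketch, using a made-up DataFrame (the column name and values are illustrative and not taken from the dataset above):

```python
import pandas as pd

# Hypothetical data: one latitude was entered as text that is not a number.
df = pd.DataFrame({"Latitude": ["-13.04", "-33.60", "unknown"]})

converted = pd.to_numeric(df["Latitude"], errors="coerce")

# Rows that held a value before conversion but are NaN afterwards.
failed = df.loc[converted.isna() & df["Latitude"].notna(), "Latitude"]
print(failed.tolist())

# Keep the numeric version once the failures have been reviewed.
df["Latitude"] = converted
```

Inspecting the failed values first helps distinguish genuine typos from placeholders such as "unknown", which may need different handling.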

geodeticDatum#

Another required field is geodeticDatum. This column is required because it lets others know how you measured latitude and longitude. geodeticDatum refers to a Coordinate Reference System (CRS), which is how three-dimensional coordinates are represented on a two-dimensional surface. The most common CRS (and the one that GPS devices, as well as the ALA, use) is called WGS84. If you know that this is the CRS you have used, you can set the default value of geodeticDatum in set_coordinates().

>>> my_dwca.set_coordinates(decimalLatitude='Latitude',
...                         decimalLongitude='Longitude',
...                         geodeticDatum='WGS84')
>>> my_dwca.occurrences.head()
                   Species  decimalLatitude  decimalLongitude Collection_date geodeticDatum
0       Corymbia latifolia           -13.04            131.07       29/3/2022         WGS84
1     Eucalyptus tectifica           -13.04            131.07       13/9/2022         WGS84
2           Banksia aemula           -33.60            150.72       15/8/2022         WGS84
3  Eucalyptus sclerophylla           -33.60            150.72       16/6/2022         WGS84
4        Persoonia laurina           -33.60            150.72      19/10/2022         WGS84

Adding Uncertainty#

There is always uncertainty in measurements of latitude and longitude, and it is sometimes useful to record it, especially if you know the uncertainty of your instruments or measurements. If you know this information and want to include it, you can specify a default value (similar to the geodeticDatum column above) for either coordinatePrecision or coordinateUncertaintyInMeters. The former is in decimal degrees, and the latter is in meters.

>>> my_dwca.set_coordinates(decimalLatitude='Latitude',
...                         decimalLongitude='Longitude',
...                         geodeticDatum='WGS84',
...                         coordinatePrecision=0.1)
>>> my_dwca.occurrences.head()
                   Species  decimalLatitude  decimalLongitude Collection_date geodeticDatum  coordinatePrecision
0       Corymbia latifolia           -13.04            131.07       29/3/2022         WGS84                  0.1
1     Eucalyptus tectifica           -13.04            131.07       13/9/2022         WGS84                  0.1
2           Banksia aemula           -33.60            150.72       15/8/2022         WGS84                  0.1
3  Eucalyptus sclerophylla           -33.60            150.72       16/6/2022         WGS84                  0.1
4        Persoonia laurina           -33.60            150.72      19/10/2022         WGS84                  0.1
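
To get a feel for how decimal degrees relate to meters, a value in degrees can be roughly converted to a distance on the ground. The sketch below assumes a spherical Earth with about 111.32 km per degree of latitude, which is only an approximation, and the function name is hypothetical. Under that assumption, 0.1 degrees corresponds to roughly 11 km north-south.

```python
import math

# Approximate length of one degree of latitude on a spherical Earth.
METERS_PER_DEGREE = 111_320.0


def degrees_to_meters(degrees, latitude=0.0):
    """Approximate north-south and east-west distance for an angle in degrees.

    Hypothetical helper: the east-west distance shrinks with cos(latitude).
    """
    north_south = degrees * METERS_PER_DEGREE
    east_west = north_south * math.cos(math.radians(latitude))
    return north_south, east_west


# Example at the latitude of the Banksia aemula record above.
ns, ew = degrees_to_meters(0.1, latitude=-33.60)
print(f"0.1 degrees is about {ns:.0f} m north-south and {ew:.0f} m east-west")
```

A rough conversion like this can help when deciding whether a default value is realistic for the instruments you used.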

What do check_data and suggest_workflow say now?#

Note

Each of the set_* functions checks your data for compliance with the Darwin Core standard, but it’s always good to double-check your data.

Now, we can check that our data columns comply with the Darwin Core standard.

>>> my_dwca.check_data()
  Number of Errors  Pass/Fail    Column name
------------------  -----------  -------------------
                 0  ✓            decimalLatitude
                 0  ✓            decimalLongitude
                 0  ✓            geodeticDatum
                 0  ✓            coordinatePrecision


══ Results ════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════


Errors: 0 | Passes: 4

✗ Data does not meet minimum Darwin core requirements
Use corella.suggest_workflow()

The columns we have set all pass, but the data still does not contain every required column. We can run suggest_workflow() again to see what remains to be done.

>>> my_dwca.suggest_workflow()
── Darwin Core terms ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

── All DwC terms ──

Matched 4 of 6 column names to DwC terms:

✓ Matched: decimalLatitude, decimalLongitude, geodeticDatum, coordinatePrecision
✗ Unmatched: Collection_date, Species

── Minimum required DwC terms occurrences ──

Type                       Matched term(s)                                   Missing term(s)
-------------------------  ------------------------------------------------  ---------------------------------------------
Identifier (at least one)  -                                                 occurrenceID OR catalogNumber OR recordNumber
Record type                -                                                 basisOfRecord
Scientific name            -                                                 scientificName
Location                   decimalLatitude, decimalLongitude, geodeticDatum  -
Date/Time                  -                                                 eventDate

── Suggested workflow ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

── Occurrences ──

To make your occurrences Darwin Core compliant, use the following workflow:

corella.set_occurrences()
corella.set_scientific_name()
corella.set_datetime()

Additional functions: set_abundance(), set_collection(), set_individual_traits(), set_license(), set_locality(), set_taxonomy()
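
The "Minimum required DwC terms" table above amounts to comparing your column names against groups of required terms. The sketch below illustrates that comparison. It is not how suggest_workflow() is implemented: the function name is hypothetical, the groups simply restate the table, and treating Location as needing all three terms is an assumption.

```python
# Required term groups, restated from the report above.
# "any" means at least one term is needed; "all" means every term is needed
# (the "all" rule for Location is an assumption made for this sketch).
REQUIRED_TERMS = {
    "Identifier": ("any", ["occurrenceID", "catalogNumber", "recordNumber"]),
    "Record type": ("all", ["basisOfRecord"]),
    "Scientific name": ("all", ["scientificName"]),
    "Location": ("all", ["decimalLatitude", "decimalLongitude", "geodeticDatum"]),
    "Date/Time": ("all", ["eventDate"]),
}


def missing_terms(columns):
    """Map each unsatisfied group to the terms absent from `columns`."""
    present = set(columns)
    missing = {}
    for group, (rule, terms) in REQUIRED_TERMS.items():
        found = [term for term in terms if term in present]
        if rule == "any":
            satisfied = len(found) > 0
        else:
            satisfied = len(found) == len(terms)
        if not satisfied:
            missing[group] = [term for term in terms if term not in present]
    return missing


# Column names as they stand after set_coordinates() in this walkthrough.
columns = [
    "Species",
    "decimalLatitude",
    "decimalLongitude",
    "Collection_date",
    "geodeticDatum",
    "coordinatePrecision",
]
for group, terms in missing_terms(columns).items():
    print(group, "->", terms)
```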

Other functions:#

To learn more about how to use other functions, go to

Optional functions:

Creating Unique IDs:

Passing Dataset: