Creating Unique IDs#
Having a unique ID for each occurrence/event is useful because these IDs identify individual observations and make it possible to change, amend or delete observations over time. They also prevent accidental deletion when more than one record contains the same information (and would otherwise be considered a duplicate).
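The duplicate problem can be illustrated with a minimal sketch in plain Python. It does not use galaxias; the drop_duplicates helper and the obs-0, obs-1 identifiers are hypothetical and exist only for this illustration.

```python
# Two field records that happen to contain identical information.
records = [
    {"scientificName": "Eolophus roseicapilla", "latitude": -35.310, "date": "14-01-2023"},
    {"scientificName": "Eolophus roseicapilla", "latitude": -35.310, "date": "14-01-2023"},
]


def drop_duplicates(rows):
    """Hypothetical helper: keep the first copy of each row, comparing every field."""
    seen = set()
    kept = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


# Without identifiers, the second observation is treated as a duplicate and lost.
without_ids = drop_duplicates(records)

# With a distinct identifier per record (hypothetical values), both survive.
with_ids = drop_duplicates(
    [{"occurrenceID": f"obs-{i}", **row} for i, row in enumerate(records)]
)

print(len(without_ids))  # 1
print(len(with_ids))     # 2
```

The identifier is the only field that differs between the two records, which is what lets a deduplication step tell them apart.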
There are three ways you can create identifiers for your occurrences/events:
random IDs
sequential IDs
composite IDs
Note
All of the IDs need to have occurrenceID set to True in set_occurrences() and set_events().
random IDs#
Random IDs are created automatically in galaxias using the uuid package. To automatically generate random IDs, set random_id=True like so:
>>> import pandas as pd
>>> import galaxias
>>> occ = pd.DataFrame({'scientificName': ['Eolophus roseicapilla','Eolophus roseicapilla'],
... 'latitude': [-35.310, -35.273],
... 'longitude': [149.125, 149.133],
... 'date': ['14-01-2023', '15-01-2023']})
>>> my_dwca = galaxias.dwca(occurrences=occ)
>>> my_dwca.set_occurrences(dataframe=occ,occurrenceID=True,random_id=True)
>>> occ
occurrenceID scientificName latitude longitude date
0 f68d9f7b-809a-4c27-ad32-3ffc6e95294e Eolophus roseicapilla -35.310 149.125 14-01-2023
1 b22ecf54-a5b9-445c-a7c0-2adc438e4c23 Eolophus roseicapilla -35.273 149.133 15-01-2023
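To show what a random ID is, here is a minimal sketch using only Python's standard uuid module. It assumes version-4 (fully random) UUIDs, which matches the format of the IDs shown above, but it is not the galaxias implementation and the variable names are illustrative.

```python
import uuid

rows = [
    {"scientificName": "Eolophus roseicapilla", "date": "14-01-2023"},
    {"scientificName": "Eolophus roseicapilla", "date": "15-01-2023"},
]

# One version-4 UUID per row (assumption: uuid4 is the variant used).
random_ids = [str(uuid.uuid4()) for _ in rows]

# Attach each ID to its row under the occurrenceID key.
rows_with_ids = [
    {"occurrenceID": rid, **row} for rid, row in zip(random_ids, rows)
]

for row in rows_with_ids:
    print(row["occurrenceID"], row["date"])
```

Because the values are random, the IDs differ on every run, so they should be generated once and then stored with the data rather than regenerated.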
sequential IDs#
Sequential IDs are integers that start at 0 and increase by one per row, so the last ID is one less than the number of rows in the data frame. As above, to generate sequential IDs, set sequential_id=True.
>>> occ = pd.DataFrame({'scientificName': ['Eolophus roseicapilla','Eolophus roseicapilla'],
... 'latitude': [-35.310, -35.273],
... 'longitude': [149.125, 149.133],
... 'date': ['14-01-2023', '15-01-2023']})
>>> my_dwca = galaxias.dwca(occurrences=occ)
>>> my_dwca.set_occurrences(dataframe=occ,occurrenceID=True,sequential_id=True)
>>> occ
occurrenceID scientificName latitude longitude date
0 0 Eolophus roseicapilla -35.310 149.125 14-01-2023
1 1 Eolophus roseicapilla -35.273 149.133 15-01-2023
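The numbering can be sketched in plain Python as a zero-based count over the rows. This is an illustration of the idea under that assumption, not the galaxias code itself.

```python
rows = [
    {"scientificName": "Eolophus roseicapilla", "date": "14-01-2023"},
    {"scientificName": "Eolophus roseicapilla", "date": "15-01-2023"},
]

# Zero-based counter: one integer per row, ending at len(rows) - 1.
sequential_ids = list(range(len(rows)))

rows_with_ids = [
    {"occurrenceID": seq, **row} for seq, row in zip(sequential_ids, rows)
]

print(sequential_ids)  # [0, 1]
```

Sequential IDs depend on row order, so reordering or filtering the data before generating them will assign different IDs to the same observations.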
composite IDs#
If you want an ID built from multiple items, rather than only a UUID or a sequential ID, use the composite_id option. You can do the following:
Have a composite ID with multiple columns, separated by sep. The separator is - by default, but can be changed.
Have a composite ID with one or more columns, and a UUID either at the beginning or end of the ID.
Have a composite ID with one or more columns, and a sequential ID either at the beginning or end of the ID.
Below are two examples: the first puts a sequential ID at the beginning of the composite ID, and the second puts a random ID at the end.
>>> occ = pd.DataFrame({'scientificName': ['Eolophus roseicapilla','Eolophus roseicapilla'],
... 'latitude': [-35.310, -35.273],
... 'longitude': [149.125, 149.133],
... 'date': ['14-01-2023', '15-01-2023']})
>>> my_dwca = galaxias.dwca(occurrences=occ)
>>> my_dwca.set_occurrences(dataframe=occ,occurrenceID=True,composite_id='date',sequential_id=True,add_sequential_id='first')
>>> occ
occurrenceID scientificName latitude longitude date
0 0-14-01-2023 Eolophus roseicapilla -35.310 149.125 14-01-2023
1 1-15-01-2023 Eolophus roseicapilla -35.273 149.133 15-01-2023
>>> occ = pd.DataFrame({'scientificName': ['Eolophus roseicapilla','Eolophus roseicapilla'],
... 'latitude': [-35.310, -35.273],
... 'longitude': [149.125, 149.133],
... 'date': ['14-01-2023', '15-01-2023']})
>>> my_dwca = galaxias.dwca(occurrences=occ)
>>> my_dwca.set_occurrences(dataframe=occ,occurrenceID=True,composite_id='date',random_id=True,add_random_id='last')
>>> occ
occurrenceID scientificName latitude longitude date
0 14-01-2023-7a8ade0b-9285-4c9b-a9b1-54655b4d1336 Eolophus roseicapilla -35.310 149.125 14-01-2023
1 15-01-2023-feaf20e3-cf37-48d0-8244-97e2d172cab9 Eolophus roseicapilla -35.273 149.133 15-01-2023
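The way a composite ID is assembled can be sketched in plain Python. The make_composite_ids function below, along with its extra and position parameters, is hypothetical and written only for this illustration; it is not part of galaxias and its behaviour is an assumption based on the outputs shown above.

```python
import uuid


def make_composite_ids(rows, columns, sep="-", extra=None, position="last"):
    """Hypothetical helper: join column values with sep, optionally adding
    a sequential or random token at the start or end of each ID."""
    ids = []
    for index, row in enumerate(rows):
        parts = [str(row[col]) for col in columns]
        if extra == "sequential":
            token = str(index)
        elif extra == "random":
            token = str(uuid.uuid4())
        else:
            token = None
        if token is not None:
            parts = [token] + parts if position == "first" else parts + [token]
        ids.append(sep.join(parts))
    return ids


rows = [
    {"scientificName": "Eolophus roseicapilla", "date": "14-01-2023"},
    {"scientificName": "Eolophus roseicapilla", "date": "15-01-2023"},
]

# Sequential ID first, then the date column.
seq_first = make_composite_ids(rows, ["date"], extra="sequential", position="first")
print(seq_first)  # ['0-14-01-2023', '1-15-01-2023']

# Date column first, then a random UUID.
random_last = make_composite_ids(rows, ["date"], extra="random", position="last")

# Two columns joined with a custom separator and no extra token.
two_columns = make_composite_ids(rows, ["scientificName", "date"], sep="|")
```

Note that when the separator also appears inside a column value, as - does in these dates, the parts of the ID cannot be split apart again unambiguously, which is one reason to choose a different sep.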
Other functions#
To learn more about how to use other functions, go to:
Optional functions
Passing Dataset