Creating Unique IDs#

Giving each occurrence/event a unique ID is useful because the ID identifies an individual observation, which makes it possible to change, amend or delete that observation over time. Unique IDs also prevent accidental deletion when more than one record contains the same information (and would otherwise be considered a duplicate).

There are three ways you can create identifiers for your occurrences/events:

  • random IDs

  • sequential IDs

  • composite IDs

Note

Whichever type of ID you use, occurrenceID must be set to True in set_occurrences() and set_events().

random IDs#

Random IDs are created automatically in galaxias using the uuid package. To generate them, set random_id=True like so:

>>> import pandas as pd
>>> import galaxias
>>> occ = pd.DataFrame({'scientificName': ['Eolophus roseicapilla', 'Eolophus roseicapilla'],
...                     'latitude': [-35.310, -35.273],
...                     'longitude': [149.125, 149.133],
...                     'date': ['14-01-2023', '15-01-2023']})
>>> my_dwca = galaxias.dwca(occurrences=occ)
>>> my_dwca.set_occurrences(dataframe=occ, occurrenceID=True, random_id=True)
>>> occ
                           occurrenceID         scientificName  latitude  longitude        date
0  f68d9f7b-809a-4c27-ad32-3ffc6e95294e  Eolophus roseicapilla   -35.310    149.125  14-01-2023
1  b22ecf54-a5b9-445c-a7c0-2adc438e4c23  Eolophus roseicapilla   -35.273    149.133  15-01-2023
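
For intuition, the sketch below shows how version-4 UUIDs like the ones above can be produced with Python's standard-library uuid module. It is not galaxias's internal code: the records list and the add_random_ids helper are hypothetical names used only for illustration, and plain dictionaries stand in for data frame rows.

```python
import uuid


def add_random_ids(records):
    """Return copies of the records, each with a random UUID4 string as occurrenceID.

    Hypothetical helper for illustration; not part of galaxias.
    """
    return [{"occurrenceID": str(uuid.uuid4()), **rec} for rec in records]


records = [
    {"scientificName": "Eolophus roseicapilla", "date": "14-01-2023"},
    {"scientificName": "Eolophus roseicapilla", "date": "15-01-2023"},
]

with_ids = add_random_ids(records)
for rec in with_ids:
    # Each run prints different IDs, because uuid4() is random.
    print(rec["occurrenceID"], rec["date"])
```

Because the IDs are random, they carry no information about the record itself; their only job is to be unique.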

sequential IDs#

Sequential IDs run from 0 to one less than the number of rows in the data frame. As above, to generate sequential IDs, set sequential_id=True.

>>> occ = pd.DataFrame({'scientificName': ['Eolophus roseicapilla','Eolophus roseicapilla'],
...                     'latitude': [-35.310, -35.273],
...                     'longitude': [149.125, 149.133],
...                     'date': ['14-01-2023', '15-01-2023']})
>>> my_dwca = galaxias.dwca(occurrences=occ)
>>> my_dwca.set_occurrences(dataframe=occ, occurrenceID=True, sequential_id=True)
>>> occ
  occurrenceID         scientificName  latitude  longitude        date
0            0  Eolophus roseicapilla   -35.310    149.125  14-01-2023
1            1  Eolophus roseicapilla   -35.273    149.133  15-01-2023
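
As a rough illustration of the numbering shown above, the following sketch assigns one ID per row starting at 0. The add_sequential_ids helper is a hypothetical name, not a galaxias function, and plain dictionaries stand in for data frame rows.

```python
def add_sequential_ids(records):
    """Return copies of the records with occurrenceID set to the row position (0-based).

    Hypothetical helper for illustration; not part of galaxias.
    """
    return [{"occurrenceID": i, **rec} for i, rec in enumerate(records)]


records = [
    {"scientificName": "Eolophus roseicapilla", "date": "14-01-2023"},
    {"scientificName": "Eolophus roseicapilla", "date": "15-01-2023"},
]

numbered = add_sequential_ids(records)
print([rec["occurrenceID"] for rec in numbered])  # prints [0, 1]
```

With two rows the IDs are 0 and 1, so the last ID is always the row count minus one.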

composite IDs#

If you want an ID built from several items, rather than a bare UUID or sequential ID, use the composite_id option. You can do the following:

  • Have a composite ID with multiple columns, separated by sep. The separator defaults to - but can be changed.

  • Have a composite ID with one or more columns, and a UUID either at the beginning or end of the ID.

  • Have a composite ID with one or more columns, and a sequential ID either at the beginning or end of the ID.

The examples below place a sequential ID at the beginning of the composite ID, and then a random ID at the end.

>>> occ = pd.DataFrame({'scientificName': ['Eolophus roseicapilla','Eolophus roseicapilla'],
...                     'latitude': [-35.310, -35.273],
...                     'longitude': [149.125, 149.133],
...                     'date': ['14-01-2023', '15-01-2023']})
>>> my_dwca = galaxias.dwca(occurrences=occ)
>>> my_dwca.set_occurrences(dataframe=occ, occurrenceID=True, composite_id='date', sequential_id=True, add_sequential_id='first')
>>> occ
   occurrenceID         scientificName  latitude  longitude        date
0  0-14-01-2023  Eolophus roseicapilla   -35.310    149.125  14-01-2023
1  1-15-01-2023  Eolophus roseicapilla   -35.273    149.133  15-01-2023
>>> occ = pd.DataFrame({'scientificName': ['Eolophus roseicapilla','Eolophus roseicapilla'],
...                     'latitude': [-35.310, -35.273],
...                     'longitude': [149.125, 149.133],
...                     'date': ['14-01-2023', '15-01-2023']})
>>> my_dwca = galaxias.dwca(occurrences=occ)
>>> my_dwca.set_occurrences(dataframe=occ, occurrenceID=True, composite_id='date', random_id=True, add_random_id='last')
>>> occ
                                      occurrenceID         scientificName  latitude  longitude        date
0  14-01-2023-7a8ade0b-9285-4c9b-a9b1-54655b4d1336  Eolophus roseicapilla   -35.310    149.125  14-01-2023
1  15-01-2023-feaf20e3-cf37-48d0-8244-97e2d172cab9  Eolophus roseicapilla   -35.273    149.133  15-01-2023
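
The joining behaviour in the examples above can be sketched in plain Python. This is a minimal illustration under assumptions, not the galaxias implementation: build_composite_id and its columns, extra and position parameters are hypothetical names chosen for this sketch, and a dictionary stands in for a data frame row.

```python
import uuid


def build_composite_id(record, columns, sep="-", extra=None, position="last"):
    """Join the chosen column values with sep, optionally adding an extra part.

    Hypothetical helper for illustration; not part of galaxias.
    extra is a sequential number or UUID, placed 'first' or 'last'.
    """
    if position not in ("first", "last"):
        raise ValueError("position must be 'first' or 'last'")
    parts = [str(record[col]) for col in columns]
    if extra is not None:
        if position == "first":
            parts.insert(0, str(extra))
        else:
            parts.append(str(extra))
    return sep.join(parts)


row = {"scientificName": "Eolophus roseicapilla", "date": "14-01-2023"}

# Sequential part at the beginning, as in the first example above.
seq_first = build_composite_id(row, ["date"], extra=0, position="first")
print(seq_first)  # prints 0-14-01-2023

# Random UUID at the end, as in the second example above.
uuid_last = build_composite_id(row, ["date"], extra=uuid.uuid4(), position="last")
print(uuid_last)  # starts with 14-01-2023- followed by a random UUID

# Several columns with a different separator.
multi = build_composite_id(row, ["scientificName", "date"], sep="_")
print(multi)  # prints Eolophus roseicapilla_14-01-2023
```

Note that when a column value already contains the separator, as these dates do with -, the parts of the ID cannot be split apart unambiguously afterwards; choosing a different separator avoids that.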

Other functions#

To learn more about how to use other functions, go to

Optional functions:

Passing Dataset: