set_collection

Note

Collection information is not required by the ALA, but including it helps others reference your records.

One of the functions you can use to check your data is set_collection(). It checks whether your data includes the following fields:

  • datasetID: An identifier for the set of data. May be a global unique identifier or an identifier specific to a collection or institution.

  • datasetName: The name identifying the data set from which the record was derived.

  • catalogNumber: A unique identifier for the record within the data set or collection.
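
As a rough illustration of the check described above, the sketch below uses plain pandas to report which of the three fields are absent from a table. The helper missing_collection_terms is hypothetical and is not part of galaxias; it only shows the idea, not how set_collection() is implemented.

```python
import pandas as pd

# The three collection-related fields listed above.
COLLECTION_TERMS = ["datasetID", "datasetName", "catalogNumber"]


def missing_collection_terms(df):
    """Hypothetical helper: list the collection fields absent from df's columns."""
    return [term for term in COLLECTION_TERMS if term not in df.columns]


occ = pd.DataFrame({
    "scientificName": ["Eolophus roseicapilla", "Eolophus roseicapilla"],
    "catalogNumber": ["2008.1334", "2008.1335"],
})

missing = missing_collection_terms(occ)
print(missing)  # ['datasetID', 'datasetName']

# catalogNumber should identify each record uniquely within the data set.
print(occ["catalogNumber"].is_unique)  # True
```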

Specifying Collection Information

If your observation is part of a collection, it is straightforward to add information about the specimen so that others can reference it properly. Each of the arguments above takes either a str (denoting a column name or a single value applied to every record, as in the example below) or a list. An example is below:

>>> import pandas as pd
>>> import galaxias
>>> occ = pd.DataFrame({'scientificName': ['Eolophus roseicapilla', 'Eolophus roseicapilla']})
>>> my_dwca = galaxias.dwca(occurrences=occ)
>>> my_dwca.set_collection(dataframe=occ, datasetID='b15d4952-7d20-46f1-8a3e-556a512b04c5',
...                        datasetName='Lacey Ctenomys Recaptures', catalogNumber='2008.1334')
>>> my_dwca.occurrences.head()
          scientificName                             datasetID                datasetName catalogNumber
0  Eolophus roseicapilla  b15d4952-7d20-46f1-8a3e-556a512b04c5  Lacey Ctenomys Recaptures     2008.1334
1  Eolophus roseicapilla  b15d4952-7d20-46f1-8a3e-556a512b04c5  Lacey Ctenomys Recaptures     2008.1334
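
To make the difference between a str and a list concrete, here is a minimal pandas-only sketch. The helper add_term is hypothetical and not part of galaxias, and it assumes a str is a literal value repeated for every record (it does not handle the column-name case). A list is assumed to supply one value per record.

```python
import pandas as pd

occ = pd.DataFrame({"scientificName": ["Eolophus roseicapilla", "Eolophus roseicapilla"]})


def add_term(df, term, value):
    """Hypothetical helper: return a copy of df with `term` filled from a str or a list."""
    out = df.copy()
    if isinstance(value, str):
        # Assumption: a single string is repeated for every record.
        out[term] = value
    else:
        # Assumption: a list must supply exactly one value per record.
        if len(value) != len(out):
            raise ValueError(f"{term}: expected {len(out)} values, got {len(value)}")
        out[term] = list(value)
    return out


with_name = add_term(occ, "datasetName", "Lacey Ctenomys Recaptures")
with_numbers = add_term(with_name, "catalogNumber", ["2008.1334", "2008.1335"])
print(with_numbers)
```

A list is the natural choice for catalogNumber, since each record is expected to carry its own identifier, while a single str suits values shared by the whole data set, such as datasetName.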

Other functions

To learn more about how to use other functions, see:

  • Optional functions

  • Creating Unique IDs

  • Passing Dataset