API Docs
- galaxias.basisOfRecord_values()
A pandas.Series of accepted (but not mandatory) values for basisOfRecord.
- Parameters:
None
- Return type:
A pandas.Series of accepted (but not mandatory) values for basisOfRecord.
Examples
>>> galaxias.basisOfRecord_values()
   basisOfRecord values
0      humanObservation
1    machineObservation
2        livingSpecimen
3     preservedSpecimen
4        fossilSpecimen
5      materialCitation
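The accepted values above can be used to screen a column before building an archive. Below is a minimal standard-library sketch of such a check. The helper `invalid_basis_of_record` is hypothetical (not part of galaxias), and the accepted values are hard-coded from the example output rather than read from the package.

```python
# Accepted basisOfRecord values, copied from the example output of
# galaxias.basisOfRecord_values() (hard-coded here for illustration only).
ACCEPTED_BASIS_OF_RECORD = {
    "humanObservation",
    "machineObservation",
    "livingSpecimen",
    "preservedSpecimen",
    "fossilSpecimen",
    "materialCitation",
}


def invalid_basis_of_record(values):
    """Return the distinct values that are not in the accepted list.

    Hypothetical helper, not part of galaxias. The comparison in this
    sketch is case-sensitive.
    """
    return sorted({v for v in values if v not in ACCEPTED_BASIS_OF_RECORD})


records = ["humanObservation", "HumanObservation", "preservedSpecimen", "photo"]
print(invalid_basis_of_record(records))  # ['HumanObservation', 'photo']
```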
- galaxias.countryCode_values()
A pandas.Series of accepted (but not mandatory) values for countryCode.
- Parameters:
None
- Return type:
A pandas.Series of accepted (but not mandatory) values for countryCode.
Examples
>>> galaxias.countryCode_values()
0      AD
1      AE
2      AF
3      AG
4      AI
       ..
244    YE
245    YT
246    ZA
247    ZM
248    ZW
Name: Code, Length: 249, dtype: object
- class galaxias.dwca(working_dir='dwca_data', data_raw_dir='data_raw', data_proc_dir='data_processed', dwca_name='dwca.zip', occurrences=None, occurrences_archive_filename='occurrences.txt', multimedia=None, multimedia_archive_filename='multimedia.txt', events=None, events_archive_filename='events.txt', emof=None, emof_archive_filename='extendedMeasurementOrFact.txt', metadata_md='metadata.md', eml_xml='eml.xml', meta_xml='meta.xml', print_notices=True)
Bases: object
- check_dataset()
Checks whether or not your data (only occurrences for now) meets the predefined Darwin Core standard. Calls the corella package for this.
- Parameters:
None
- Return type:
A printed report detailing the presence or absence of required data.
- check_dwca()
Checks whether or not your Darwin Core Archive meets the predefined standard.
- Parameters:
None
- Return type:
Raises a ValueError if something is wrong, or returns True if it passes.
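The documented contract of check_dwca() (raise a ValueError on failure, return True on success) can be sketched as a structure-only check with Python's zipfile module. This is a hypothetical illustration, not galaxias' implementation: it assumes the archive must contain meta.xml, eml.xml and occurrences.txt (the constructor's default file names) and does not validate the contents of those files.

```python
import io
import zipfile

# File names assumed from the dwca() constructor defaults documented above.
REQUIRED_FILES = ("meta.xml", "eml.xml", "occurrences.txt")


def check_archive_members(archive):
    """Raise ValueError if a required file is missing, else return True.

    Hypothetical helper mirroring the documented contract of check_dwca();
    it only checks which files are present. `archive` may be a path or a
    binary file-like object.
    """
    with zipfile.ZipFile(archive) as zf:
        names = set(zf.namelist())
    missing = [name for name in REQUIRED_FILES if name not in names]
    if missing:
        raise ValueError(f"Archive is missing: {', '.join(missing)}")
    return True


# Build a small in-memory archive that lacks meta.xml.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("occurrences.txt", "occurrenceID,basisOfRecord\n1,humanObservation\n")
    zf.writestr("eml.xml", "<eml/>")
buffer.seek(0)

try:
    check_archive_members(buffer)
except ValueError as err:
    print(err)  # Archive is missing: meta.xml
```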
- check_eml_xml()
Checks whether or not your data (only occurrences for now) meets the predefined Darwin Core standard. Calls the corella package for this.
- Parameters:
None
- Return type:
A printed report detailing the presence or absence of required data.
- check_meta_xml()
Checks whether or not your data (only occurrences for now) meets the predefined Darwin Core standard. Calls the corella package for this.
- Parameters:
None
- Return type:
A printed report detailing the presence or absence of required data.
- countryCode_values()
A pandas.Series of accepted (but not mandatory) values for countryCode.
- Parameters:
None
- Return type:
A pandas.Series of accepted (but not mandatory) values for countryCode.
Examples
>>> galaxias.countryCode_values()
0      AD
1      AE
2      AF
3      AG
4      AI
       ..
244    YE
245    YT
246    ZA
247    ZM
248    ZW
Name: Code, Length: 249, dtype: object
- create_dwca()
Checks all your files for Darwin Core compliance, and then creates the Darwin Core Archive in your working directory.
- Parameters:
None
- Return type:
Raises a ValueError if something is wrong, or returns None if it passes.
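The final packaging step of create_dwca() amounts to zipping the archive files together. The sketch below is hypothetical: the helper `zip_archive` is not part of galaxias, it assumes the constructor's default file and archive names, and it skips the compliance checks that create_dwca() runs first.

```python
import tempfile
import zipfile
from pathlib import Path


def zip_archive(working_dir, dwca_name="dwca.zip",
                members=("occurrences.txt", "eml.xml", "meta.xml")):
    """Bundle the listed files from working_dir into a zip archive.

    Hypothetical sketch of the packaging step only. Files are stored at
    the root of the zip, and the path of the new archive is returned.
    """
    working_dir = Path(working_dir)
    archive_path = working_dir / dwca_name
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name in members:
            zf.write(working_dir / name, arcname=name)
    return archive_path


# Try it in a throwaway directory with placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("occurrences.txt", "eml.xml", "meta.xml"):
        Path(tmp, name).write_text("placeholder\n")
    path = zip_archive(tmp)
    with zipfile.ZipFile(path) as zf:
        print(sorted(zf.namelist()))  # ['eml.xml', 'meta.xml', 'occurrences.txt']
```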
- event_terms()
A pandas.Series of accepted (but not mandatory) Darwin Core terms for event data.
- Parameters:
None
- Return type:
A pandas.Series of accepted (but not mandatory) Darwin Core terms for event data.
Examples
>>> galaxias.event_terms()
0                     type
1                 modified
2                 language
3                  license
4             rightsHolder
                    ...
77         georeferencedBy
78       georeferencedDate
79    georeferenceProtocol
80     georeferenceSources
81     georeferenceRemarks
Name: term_localName, Length: 82, dtype: object
- make_meta_xml()
Makes the meta.xml file from your eml.xml file and information from your occurrences and any other included extensions. The meta.xml file is the archive's descriptor file: it describes what is in the DwCA.
- Parameters:
None
- Return type:
None
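For orientation, a minimal meta.xml for a single occurrence core might be generated as below. This is a hypothetical sketch using xml.etree.ElementTree, not galaxias' implementation. It assumes a comma-separated core file with one header row and column names that are already Darwin Core terms, and it follows the element layout of the Darwin Core text guide (archive, core, files/location, id, field).

```python
import xml.etree.ElementTree as ET

DWC_TEXT_NS = "http://rs.tdwg.org/dwc/text/"
DWC_TERMS_URI = "http://rs.tdwg.org/dwc/terms/"


def build_meta_xml(columns, core_file="occurrences.txt", metadata="eml.xml"):
    """Return a minimal meta.xml string for a single occurrence core file.

    Hypothetical sketch. Assumes a comma-separated core file with one header
    row, that every column name is a Darwin Core term, and that the first
    column holds the record identifier.
    """
    archive = ET.Element("archive", {"xmlns": DWC_TEXT_NS, "metadata": metadata})
    core = ET.SubElement(archive, "core", {
        "encoding": "UTF-8",
        "fieldsTerminatedBy": ",",
        "linesTerminatedBy": "\\n",  # the literal two characters \n
        "fieldsEnclosedBy": '"',
        "ignoreHeaderLines": "1",
        "rowType": DWC_TERMS_URI + "Occurrence",
    })
    files = ET.SubElement(core, "files")
    ET.SubElement(files, "location").text = core_file
    ET.SubElement(core, "id", {"index": "0"})
    # One <field> per column, mapping its position to the full term URI.
    for index, column in enumerate(columns):
        ET.SubElement(core, "field", {"index": str(index), "term": DWC_TERMS_URI + column})
    return ET.tostring(archive, encoding="unicode")


print(build_meta_xml(["occurrenceID", "basisOfRecord", "scientificName", "eventDate"]))
```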
- occurrence_terms()
A pandas.Series of accepted (but not mandatory) Darwin Core terms for occurrence data.
- Parameters:
None
- Return type:
A pandas.Series of accepted (but not mandatory) Darwin Core terms for occurrence data.
Examples
>>> galaxias.occurrence_terms()
0                                 type
1                             modified
2                             language
3                              license
4                         rightsHolder
                           ...
201              relatedResourceID
202         relationshipOfResource
203        relationshipAccordingTo
204    relationshipEstablishedDate
205            relationshipRemarks
Name: term_localName, Length: 206, dtype: object
- set_abundance(individualCount=None, organismQuantity=None, organismQuantityType=None)
Checks for abundance information, such as individual counts and organism quantities.
- Parameters:
dataframe (pandas.DataFrame) – The pandas.DataFrame that contains your data to check.
individualCount (str) – A column name that contains your individual counts (should be whole numbers).
organismQuantity (str) – A column name that contains a number or enumeration value for the quantity of organisms. Used together with organismQuantityType to provide context.
organismQuantityType (str) – A column name or phrase denoting the type of quantification system used for organismQuantity.
- Return type:
pandas.DataFrame with the updated data.
Examples
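The documented expectation that individualCount holds whole numbers could be checked as in the following sketch. The helper `non_whole_counts` is hypothetical (not part of galaxias), and treating negative counts as invalid is an added assumption.

```python
def non_whole_counts(counts):
    """Return (position, value) pairs that are not non-negative whole numbers.

    Hypothetical helper; not part of galaxias. Floats such as 3.0 are
    accepted because they represent whole numbers; booleans, strings,
    None, NaN and negative values are rejected.
    """
    bad = []
    for position, value in enumerate(counts):
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if not (is_number and value >= 0 and float(value).is_integer()):
            bad.append((position, value))
    return bad


print(non_whole_counts([1, 3.0, 2.5, -1, "4", None]))
# [(2, 2.5), (3, -1), (4, '4'), (5, None)]
```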
- set_collection(datasetID=None, datasetName=None, catalogNumber=None)
Checks for information about the dataset or collection that each record belongs to.
- Parameters:
dataframe (pandas.DataFrame) – The pandas.DataFrame that contains your data to check.
datasetID (str) – A column name or other string denoting the identifier for the set of data. May be a globally unique identifier or an identifier specific to a collection or institution.
datasetName (str) – A column name or other string identifying the data set from which the record was derived.
catalogNumber (str) – A column name or other string denoting a unique identifier for the record within the data set or collection.
- Return type:
pandas.DataFrame with the updated data.
Examples
- set_coordinates(decimalLatitude=None, decimalLongitude=None, geodeticDatum=None, coordinateUncertaintyInMeters=None, coordinatePrecision=None)
Checks for location information, as well as uncertainty and coordinate reference system. Also runs data checks on coordinate validity.
- Parameters:
dataframe (pandas.DataFrame) – The pandas.DataFrame that contains your data to check.
decimalLatitude (str) – A column name that contains your latitudes (units in degrees).
decimalLongitude (str) – A column name that contains your longitudes (units in degrees).
geodeticDatum (str) – A column name or a str with the datum or spatial reference system that coordinates are recorded against (usually “WGS84” or “EPSG:4326”). This is often known as the Coordinate Reference System (CRS). If your coordinates are from a GPS system, your data are already using WGS84.
coordinateUncertaintyInMeters (str, float or int) – A column name (str) or a float/int with the value of the coordinate uncertainty. coordinateUncertaintyInMeters will typically be around 30 (metres) if recorded with a GPS after 2000, or 100 before that year.
coordinatePrecision (str, float or int) – Either a column name (str) or a float/int with the value of the coordinate precision. coordinatePrecision should be no less than 0.00001 if data were collected using GPS.
- Return type:
pandas.DataFrame with the updated data.
Examples
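The coordinate validity checks mentioned above can be illustrated with a basic range test. This is a hypothetical sketch, not galaxias' own checks: the helper `invalid_coordinates` flags values that are not numbers, latitudes outside [-90, 90] and longitudes outside [-180, 180].

```python
def invalid_coordinates(latitudes, longitudes):
    """Return positions whose coordinates are non-numeric or out of range.

    Hypothetical sketch of a basic validity check: decimal latitudes must
    lie in [-90, 90] and decimal longitudes in [-180, 180]. Strings and
    booleans count as invalid; NaN fails the range test.
    """
    bad = []
    for position, (lat, lon) in enumerate(zip(latitudes, longitudes)):
        numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool)
            for v in (lat, lon)
        )
        if not numeric or not (-90 <= lat <= 90 and -180 <= lon <= 180):
            bad.append(position)
    return bad


# The second latitude is stored as a string; the third is out of range.
print(invalid_coordinates([-35.310, "-35.273", 95.0], [149.125, 149.133, 149.0]))  # [1, 2]
```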
- set_datetime(check_events=False, eventDate=None, year=None, month=None, day=None, eventTime=None, string_to_datetime=False, yearfirst=True, dayfirst=False, time_format='%H:%m:%S')
Checks for time information, such as the date an occurrence occurred. Also checks that the dates are in a valid format.
- Parameters:
check_events (bool) – If True, will check the events file. If False, will check the occurrences file. Default is False.
eventDate (str) – A column name (str) denoting the column with the dates of the events, or a str or datetime.datetime object denoting the date of the event.
year (str or int) – A column name (str) denoting the column with the years of the events, or an int denoting the year of the event.
month (str or int) – A column name (str) denoting the column with the months of the events, or an int denoting the month of the event.
day (str or int) – A column name (str) denoting the column with the days of the events, or an int denoting the day of the event.
eventTime (str) – A column name (str) denoting the column with the times of the events, or a str denoting the time of the event.
string_to_datetime (bool) – Tells corella to convert dates stored as strings to a datetime format. Default is False.
yearfirst (bool) – Whether or not the year comes first when converting your strings to datetime. Default is True.
dayfirst (bool) – Whether or not the day comes first when converting your strings to datetime. Default is False.
time_format (str) – A str denoting the original format of the times being converted from a str to a datetime object. Default is '%H:%m:%S'. Note that in Python's strftime/strptime codes %m is the month and %M is the minute, so a time such as 09:30:00 is described by '%H:%M:%S'.
- Return type:
None - the occurrences dataframe is updated.
Examples
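Day-first date strings like those in the suggest_workflow() example (such as '14-01-2023') and the time_format codes can be tried with the standard datetime module. The helpers `parse_date` and `parse_time` are hypothetical simplifications, not galaxias' parser; they show why '%H:%M:%S' rather than '%H:%m:%S' matches an hours:minutes:seconds string.

```python
from datetime import datetime


def parse_date(text, dayfirst=False):
    """Parse a '-'-separated date string, day-first or year-first.

    Hypothetical helper loosely mirroring the dayfirst/yearfirst options;
    it is not galaxias' parser and ignores other orderings.
    """
    fmt = "%d-%m-%Y" if dayfirst else "%Y-%m-%d"
    return datetime.strptime(text, fmt).date()


def parse_time(text, time_format="%H:%M:%S"):
    """Parse a time string; %M is the minute, %m would be the month."""
    return datetime.strptime(text, time_format).time()


print(parse_date("14-01-2023", dayfirst=True))  # 2023-01-14
print(parse_time("09:30:00"))                   # 09:30:00

# With '%H:%m:%S' the middle field is read as a month, so "09:30:00"
# fails (there is no month 30) and "09:05:00" silently loses its minutes.
try:
    parse_time("09:30:00", time_format="%H:%m:%S")
except ValueError as err:
    print("ValueError:", err)
```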
- set_events(eventID=None, parentEventID=None, eventType=None, Event=None, samplingProtocol=None, event_hierarchy=None, sequential_id=False, add_sequential_id='first', add_random_id='first', composite_id=None, sep='-', random_id=False)
Identify or format columns that contain information about an Event. An “Event” in the Darwin Core standard refers to an action that occurs at a place and time. Examples include:
A specimen collecting event
A survey or sampling event
A camera trap image capture
A marine trawl
A camera trap deployment event
A camera trap burst image event (with many images for one observation)
- Parameters:
dataframe (pandas.DataFrame) – The pandas.DataFrame that contains your data to check.
eventID (str, bool) – A column name (str) that contains a unique identifier for your event. Can also be set to True to generate values. Parameters for these values can be specified with the arguments sequential_id, add_sequential_id, composite_id, sep and random_id.
sequential_id (bool) – Create sequential IDs and/or add sequential IDs to a composite ID. Default is False.
add_sequential_id (str) – Where to add the sequential ID in the composite ID. Values are first and last. Default is first.
composite_id (str, list) – A str or list containing the columns used to create composite IDs. Can be combined with a sequential ID.
sep (str) – Separation character for composite IDs. Default is -.
random_id (bool) – Create a random ID using the uuid package. Default is False.
add_random_id (str) – Where to add the random ID in the composite ID. Values are first and last. Default is first.
parentEventID (str) – A column name (str) that contains the unique ID of the event above this one in the event hierarchy.
eventType (str) – A column name (str) or a str denoting what type of event you have.
Event (str) – A column name (str) or a str denoting the name of the event.
samplingProtocol (str) – Either a column name (str) or a str denoting how you collected the data, e.g. “Human Observation”.
event_hierarchy (dict) – A dictionary containing a hierarchy of all events so they can be linked. For example, if you have a set of observations that were taken at a particular site, you can use the dict {1: "Site Visit", 2: "Sample", 3: "Observation"}.
- Return type:
None - the occurrences dataframe is updated.
Examples
See the set_events vignette.
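One way composite and sequential event IDs could be combined is sketched below. The helper `make_composite_ids` and the site codes are hypothetical, and galaxias' exact formatting (for example, where the counter starts or whether it is padded) may differ.

```python
def make_composite_ids(rows, columns, sep="-", sequential_id=False,
                       add_sequential_id="first"):
    """Join column values into one ID per row, optionally with a counter.

    Hypothetical sketch of combining composite and sequential IDs; the
    counter starts at 1 here.
    """
    ids = []
    for number, row in enumerate(rows, start=1):
        parts = [str(row[column]) for column in columns]
        if sequential_id:
            if add_sequential_id == "first":
                parts.insert(0, str(number))
            elif add_sequential_id == "last":
                parts.append(str(number))
            else:
                raise ValueError("add_sequential_id must be 'first' or 'last'")
        ids.append(sep.join(parts))
    return ids


rows = [{"site": "BLK", "year": 2023}, {"site": "TID", "year": 2023}]
print(make_composite_ids(rows, ["site", "year"], sequential_id=True))
# ['1-BLK-2023', '2-TID-2023']
```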
- set_individual_traits(individualID=None, lifeStage=None, sex=None, vitality=None, reproductiveCondition=None)
Checks for information about individual organisms, such as their identity, life stage, sex, vitality and reproductive condition.
- Parameters:
dataframe (pandas.DataFrame) – The pandas.DataFrame that contains your data to check.
individualID (str) – A column name containing an identifier for an individual or named group of individual organisms represented in the Occurrence. Meant to accommodate resampling of the same individual or group for monitoring purposes. May be a globally unique identifier or an identifier specific to a data set.
lifeStage (str) – A column name containing the age class or life stage of an organism at the time of occurrence.
sex (str) – A column name or value denoting the sex of the biological individual.
vitality (str) – A column name or value denoting whether an organism was alive or dead at the time of collection or observation.
reproductiveCondition (str) – A column name or value denoting the reproductive condition of the biological individual.
- Return type:
pandas.DataFrame with the updated data.
Examples
- set_license(license=None, rightsHolder=None, accessRights=None)
Checks for licensing and rights information.
- Parameters:
dataframe (pandas.DataFrame) – The pandas.DataFrame that contains your data to check.
license (str) – A column name or value denoting a legal document giving official permission to do something with the resource. Must be provided as a URL to a valid license.
rightsHolder (str) – A column name or value denoting the person or organisation owning or managing rights to the resource.
accessRights (str) – A column name or value denoting any access rights or restrictions based on privacy or security.
- Return type:
pandas.DataFrame with the updated data.
Examples
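Because license must be given as a URL, a quick pre-check might look like the hypothetical helper below. It only confirms an absolute http(s) URL and does not verify that the URL points to a real license.

```python
from urllib.parse import urlparse


def looks_like_license_url(value):
    """Return True if value is an absolute http(s) URL.

    Hypothetical helper reflecting the documented requirement that license
    be a URL; it does not check what the URL points to.
    """
    parsed = urlparse(str(value))
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


print(looks_like_license_url("https://creativecommons.org/licenses/by/4.0/"))  # True
print(looks_like_license_url("CC-BY 4.0"))                                       # False
```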
- set_locality(check_events=False, continent=None, country=None, countryCode=None, stateProvince=None, locality=None)
Checks for additional location information, such as country and countryCode.
- Parameters:
check_events (bool) – If True, edits the events dataframe. Default is False.
continent (str) – Either a column name (str) or a string denoting one of the seven continents.
country (str or pandas.Series) – Either a column name (str) or a string denoting the country.
countryCode (str or pandas.Series) – Either a column name (str) or a string denoting the country code.
stateProvince (str or pandas.Series) – Either a column name (str) or a string denoting the state or province.
locality (str or pandas.Series) – Either a column name (str) or a string denoting the locality.
- Return type:
None - the occurrences dataframe is updated.
Examples
- set_observer(recordedBy=None, recordedByID=None)
Checks for information about who recorded the occurrence.
- Parameters:
dataframe (pandas.DataFrame) – The pandas.DataFrame that contains your data to check.
recordedBy (str) – A column name or name(s) of people, groups, or organizations responsible for recording the original occurrence. The primary collector or observer should be listed first.
recordedByID (str) – A column name or the globally unique identifier for the person, people, groups, or organizations responsible for recording the original occurrence.
- Return type:
pandas.DataFrame with the updated data.
Examples
- set_occurrences(occurrenceID=None, catalogNumber=None, recordNumber=None, basisOfRecord=None, occurrenceStatus=None, sequential_id=False, add_sequential_id='first', composite_id=None, sep='-', random_id=False, add_random_id='first', add_eventID=False, eventType=None)
Checks for unique identifiers of each occurrence and how the occurrence was recorded.
- Parameters:
occurrenceID (str or bool) – Either a column name (str) or True (bool). If a column name is provided, the column will be renamed. If True is provided, unique identifiers will be generated in the dataset.
catalogNumber (str or bool) – Either a column name (str) or True (bool). If a column name is provided, the column will be renamed. If True is provided, unique identifiers will be generated in the dataset.
recordNumber (str or bool) – Either a column name (str) or True (bool). If a column name is provided, the column will be renamed. If True is provided, unique identifiers will be generated in the dataset.
sequential_id (bool) – Create sequential IDs and/or add sequential IDs to a composite ID. Default is False.
add_sequential_id (str) – Where to add the sequential ID in the composite ID. Values are first and last. Default is first.
composite_id (str, list) – A str or list containing the columns used to create composite IDs. Can be combined with a sequential ID.
sep (str) – Separation character for composite IDs. Default is -.
random_id (bool) – Create a random ID using the uuid package. Default is False.
add_random_id (str) – Where to add the random ID in the composite ID. Values are first and last. Default is first.
basisOfRecord (str) – Either a column name (str) or a valid value for basisOfRecord to add to the dataset.
occurrenceStatus (str) – Either a column name (str) or a valid value for occurrenceStatus to add to the dataset.
add_eventID (bool) – Either a column name (str) or a valid value for occurrenceStatus to add to the dataset.
eventType (str) – Either a column name (str) or a valid value for eventType to add to the dataset.
- Return type:
pandas.DataFrame with the updated data.
Examples
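With random_id, identifiers come from the uuid package. The sketch below uses uuid.uuid4() to attach a random part to existing IDs, with add_random_id choosing its position. The helper `add_random_ids` is hypothetical, and galaxias' exact format may differ. UUID4 strings contain hyphens themselves, so a '-' separator makes the parts harder to split apart later.

```python
import uuid


def add_random_ids(base_ids, sep="-", add_random_id="first"):
    """Attach a random UUID4 string to each base ID.

    Hypothetical sketch of combining random_id with a composite ID; the
    base IDs stand in for composite values.
    """
    out = []
    for base in base_ids:
        random_part = str(uuid.uuid4())
        if add_random_id == "first":
            out.append(f"{random_part}{sep}{base}")
        elif add_random_id == "last":
            out.append(f"{base}{sep}{random_part}")
        else:
            raise ValueError("add_random_id must be 'first' or 'last'")
    return out


ids = add_random_ids(["BLK-2023", "TID-2023"], add_random_id="last")
print(ids)  # the UUID parts differ on every run
```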
- set_scientific_name(scientificName=None, taxonRank=None, scientificNameAuthorship=None)
Checks that the name of the taxon you identified is present.
- Parameters:
scientificName (str) – A column name (str) denoting all your scientific names.
taxonRank (str) – A column name (str) denoting the rank of your scientific names (species, genus, etc.).
scientificNameAuthorship (str) – A column name (str) denoting who originated the scientific name.
- Return type:
pandas.DataFrame with the updated data.
Examples
- set_taxonomy(kingdom=None, phylum=None, taxon_class=None, order=None, family=None, genus=None, specificEpithet=None, vernacularName=None)
Adds extra taxonomic information. Also checks whether or not the names are the correct data type.
- Parameters:
dataframe (pandas.DataFrame) – The pandas.DataFrame that contains your data to check.
kingdom (str, list) – A column name, kingdom name (str) or list of kingdom names (list).
phylum (str, list) – A column name, phylum name (str) or list of phylum names (list).
taxon_class (str, list) – A column name, class name (str) or list of class names (list).
order (str, list) – A column name, order name (str) or list of order names (list).
family (str, list) – A column name, family name (str) or list of family names (list).
genus (str, list) – A column name, genus name (str) or list of genus names (list).
specificEpithet (str, list) – A column name, specific epithet (str) or list of specific epithets (list). Note: if scientificName is Abies concolor, the specificEpithet is concolor.
vernacularName (str, list) – A column name, vernacular name (str) or list of vernacular names (list).
- Return type:
pandas.DataFrame with the updated data.
Examples
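The specificEpithet note above (Abies concolor gives concolor) can be expressed as a small hypothetical helper that splits a plain binomial. It deliberately rejects anything that is not exactly two words, so names with authorship or infraspecific ranks need separate handling.

```python
def split_binomial(scientific_name):
    """Split a 'Genus epithet' name into (genus, specificEpithet).

    Hypothetical helper; it assumes a plain two-word binomial and does not
    handle authorship, infraspecific ranks or hybrid markers.
    """
    parts = scientific_name.split()
    if len(parts) != 2:
        raise ValueError(f"Expected 'Genus epithet', got {scientific_name!r}")
    genus, epithet = parts
    return genus, epithet


print(split_binomial("Abies concolor"))  # ('Abies', 'concolor')
```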
- suggest_workflow()
Suggests a workflow to ensure your data conforms with the predefined Darwin Core standard.
- Parameters:
None
- Return type:
A printed report detailing the presence or absence of required data.
Examples
Suggest a workflow for a small dataset
import pandas as pd
import galaxias

df = pd.DataFrame({'species': ['Callocephalon fimbriatum', 'Eolophus roseicapilla'],
                   'latitude': [-35.310, '-35.273'],
                   'longitude': [149.125, 149.133],
                   'eventDate': ['14-01-2023', '15-01-2023'],
                   'status': ['present', 'present']})
my_dwca = galaxias.dwca(occurrences=df)
my_dwca.suggest_workflow()

── Darwin Core terms ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
── All DwC terms ──
Matched 1 of 5 column names to DwC terms:
✓ Matched: eventDate
✗ Unmatched: latitude, species, longitude, status
── Minimum required DwC terms occurrences ──
Type                      Matched term(s)   Missing term(s)
------------------------- ----------------- ------------------------------------------------
Identifier (at least one) -                 occurrenceID OR catalogNumber OR recordNumber
Record type               -                 basisOfRecord
Scientific name           -                 scientificName
Location                  -                 decimalLatitude, decimalLongitude, geodeticDatum
Date/Time                 eventDate         -
── Suggested workflow ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
── Occurrences ──
To make your occurrences Darwin Core compliant, use the following workflow:
corella.set_occurrences()
corella.set_scientific_name()
corella.set_coordinates()
Additional functions: set_abundance(), set_collection(), set_individual_traits(), set_license(), set_locality(), set_taxonomy()
None
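The first part of the report above matches column names against Darwin Core terms. Below is a minimal sketch of exact, case-sensitive matching using a small hard-coded subset of terms (galaxias.occurrence_terms() returns a much longer list). The helper `match_columns` is hypothetical; its output keeps the input column order, so it may list unmatched columns in a different order from the report.

```python
# A small subset of Darwin Core term names, hard-coded for illustration.
DWC_TERMS = {
    "occurrenceID", "basisOfRecord", "scientificName",
    "decimalLatitude", "decimalLongitude", "geodeticDatum", "eventDate",
}


def match_columns(columns, terms=DWC_TERMS):
    """Split column names into exact Darwin Core matches and the rest."""
    matched = [c for c in columns if c in terms]
    unmatched = [c for c in columns if c not in terms]
    return matched, unmatched


columns = ["species", "latitude", "longitude", "eventDate", "status"]
print(match_columns(columns))
# (['eventDate'], ['species', 'latitude', 'longitude', 'status'])
```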
- validate_dwca()
- write_eml_xml()
- galaxias.event_terms()
A pandas.Series of accepted (but not mandatory) Darwin Core terms for event data.
- Parameters:
None
- Return type:
A pandas.Series of accepted (but not mandatory) Darwin Core terms for event data.
Examples
>>> galaxias.event_terms()
0                     type
1                 modified
2                 language
3                  license
4             rightsHolder
                    ...
77         georeferencedBy
78       georeferencedDate
79    georeferenceProtocol
80     georeferenceSources
81     georeferenceRemarks
Name: term_localName, Length: 82, dtype: object
- galaxias.occurrence_terms()
A pandas.Series of accepted (but not mandatory) Darwin Core terms for occurrence data.
- Parameters:
None
- Return type:
A pandas.Series of accepted (but not mandatory) Darwin Core terms for occurrence data.
Examples
>>> galaxias.occurrence_terms()
0                                 type
1                             modified
2                             language
3                              license
4                         rightsHolder
                           ...
201              relatedResourceID
202         relationshipOfResource
203        relationshipAccordingTo
204    relationshipEstablishedDate
205            relationshipRemarks
Name: term_localName, Length: 206, dtype: object