API Docs
- galaxias.basisOfRecord_values()
A pandas.Series of accepted (but not mandatory) values for basisOfRecord.
- Parameters:
None
- Return type:
A pandas.Series of accepted (but not mandatory) values for basisOfRecord.
Examples
>>> galaxias.basisOfRecord_values()
  basisOfRecord values
0 humanObservation
1 machineObservation
2 livingSpecimen
3 preservedSpecimen
4 fossilSpecimen
5 materialCitation
- galaxias.build_archive(occurrences=None, events=None, occurrences_filename='occurrences.csv', events_filename='events.csv', publishing_dir='./data-publish/', metadata='eml.xml', schema='meta.xml', archive_name='dwca.zip', print_report=False)
Checks all your files for Darwin Core compliance, and then creates the Darwin Core Archive in your working directory.
- Parameters:
occurrences (pandas DataFrame) – OPTIONAL: The dataframe holding your occurrence data. Default is None.
events (pandas DataFrame) – OPTIONAL: The dataframe holding your event data. Default is None.
occurrences_filename (str) – Name of your occurrences file. Default value is 'occurrences.csv'.
events_filename (str) – Name of your events file. Default value is 'events.csv'.
publishing_dir (str) – Name of the directory where all your processed data lives. Default value is './data-publish/'.
metadata (str) – Name of your metadata xml. Default value is 'eml.xml'.
schema (str) – Name of your schema xml. Default value is 'meta.xml'.
archive_name (str) – Name of the Darwin Core Archive file you will create. Default value is 'dwca.zip'.
print_report (bool) – Whether to print your data report to screen. Default value is False.
- Return type:
Raises a ValueError if something is wrong, or returns None if it passes.
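For orientation, a Darwin Core Archive is a zip file that bundles the data files with the metadata and schema XML. The sketch below builds such a zip with Python's standard zipfile module, using the default file names from the signature above. It is not the galaxias implementation and performs no compliance checks; the helper name zip_publishing_dir and the placeholder file contents are assumptions made for illustration.

```python
import tempfile
import zipfile
from pathlib import Path


def zip_publishing_dir(publishing_dir, archive_name="dwca.zip",
                       members=("occurrences.csv", "eml.xml", "meta.xml")):
    """Bundle the named files from publishing_dir into a zip (hypothetical helper)."""
    publishing_dir = Path(publishing_dir)
    missing = [m for m in members if not (publishing_dir / m).is_file()]
    if missing:
        # Mirrors the documented contract of raising ValueError on problems.
        raise ValueError(f"Missing files: {', '.join(missing)}")
    archive_path = publishing_dir / archive_name
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for m in members:
            # arcname keeps each file at the top level of the archive.
            zf.write(publishing_dir / m, arcname=m)
    return archive_path


# Demo in a throwaway directory with placeholder file contents.
demo_dir = Path(tempfile.mkdtemp())
for name in ("occurrences.csv", "eml.xml", "meta.xml"):
    (demo_dir / name).write_text("placeholder\n")

archive = zip_publishing_dir(demo_dir)
with zipfile.ZipFile(archive) as zf:
    archive_members = sorted(zf.namelist())
```

An events file would simply be one more entry in members when event data are published.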
- galaxias.check_archive(archive='dwca.zip', publishing_dir='./data-publish', username=None, email=None, password=None)
Checks whether or not your Darwin Core Archive is formatted correctly.
- Parameters:
archive (str) – Name of your Darwin Core Archive. Default is 'dwca.zip'.
publishing_dir (str) – Name of the directory where all your finalised data lives. Default value is './data-publish'.
GBIF (bool) – Flag to check if you are using the GBIF Validation tool. Default is False.
username (str) – GBIF username. Default is None.
email (str) – GBIF registered email. Default is None.
password (str) – GBIF password. Default is None.
- Return type:
Raises a ValueError if something is wrong, or returns True if it passes.
- galaxias.check_dataset(occurrences=None, events=None, occurrences_filename='occurrences.csv', events_filename='events.csv', publishing_dir='./data-publish', print_report=True)
Checks whether or not your data meets the predefined Darwin Core standard. Calls the corella package for this.
- Parameters:
occurrences (pandas DataFrame) – The dataframe holding your occurrence data. Default is None.
events (pandas DataFrame) – The dataframe holding your event data. Default is None.
occurrences_filename (str) – Name of your occurrences file. Default value is 'occurrences.csv'.
events_filename (str) – Name of your events file. Default value is 'events.csv'.
publishing_dir (str) – Name of the directory where all your processed data lives. Default value is './data-publish'.
print_report (bool) – Whether to print your data report to screen. Default value is True.
- Return type:
A printed report detailing presence or absence of required data.
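- galaxias.check_directory(archive_name='dwca.zip', occurrences_filename='occurrences.csv', events_filename='events.csv', metadata='eml.xml', schema='meta.xml', publishing_dir='./data-publish/', print_report=False)
Checks whether or not your Darwin Core Archive is formatted correctly.
- Parameters:
archive_name (str) – Name of your Darwin Core Archive file. Default value is 'dwca.zip'.
occurrences_filename (str) – Name of your occurrences file. Default value is 'occurrences.csv'.
events_filename (str) – Name of your events file. Default value is 'events.csv'.
metadata (str) – Name of your metadata xml. Default value is 'eml.xml'.
schema (str) – Name of your schema xml. Default value is 'meta.xml'.
publishing_dir (str) – Name of the directory where all your processed data lives. Default value is './data-publish/'.
print_report (bool) – Whether to print your data report to screen. Default value is False.
- Return type:
Raises a ValueError if something is wrong, or returns True if it passes.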
- galaxias.check_metadata(eml_xml='eml.xml', eml_dir='./data-publish')
Checks whether or not your eml xml file is formatted correctly for GBIF.
- Parameters:
eml_xml (str) – Name of the eml xml file you want to validate. Default value is 'eml.xml'.
eml_dir (str) – Name of the directory containing the eml.xml. Default value is './data-publish'.
- Return type:
Raises a ValueError if something is wrong, or returns None if it passes.
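As a rough illustration of the raise-or-return-None contract described above, the sketch below only tests that an XML document is well-formed, using the standard library. It does not validate against the EML schema or any GBIF requirements, and check_xml_well_formed is a hypothetical helper, not part of galaxias.

```python
import xml.etree.ElementTree as ET


def check_xml_well_formed(xml_text):
    """Raise ValueError if xml_text is not well-formed XML; return None otherwise."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        raise ValueError(f"XML is not well-formed: {err}") from err
    return None


good = "<eml><dataset><title>Demo</title></dataset></eml>"
bad = "<eml><dataset></eml>"  # the dataset element is never closed

result_good = check_xml_well_formed(good)

try:
    check_xml_well_formed(bad)
    bad_raised = False
except ValueError:
    bad_raised = True
```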
- galaxias.check_schema(schema='meta.xml', publishing_dir='./data-publish/')
Checks whether your schema (meta.xml) is formatted correctly.
- Parameters:
schema (str) – File name of your schema file. Default value is 'meta.xml'.
publishing_dir (str) – Folder where all your finalised data will be published. Default value is './data-publish/'.
- Return type:
A printed report detailing presence or absence of required data.
- galaxias.countryCode_values()
A pandas.Series of accepted (but not mandatory) values for countryCode.
- Parameters:
None
- Return type:
A pandas.Series of accepted (but not mandatory) values for countryCode.
Examples
>>> galaxias.countryCode_values()
0      AD
1      AE
2      AF
3      AG
4      AI
       ..
244    YE
245    YT
246    ZA
247    ZM
248    ZW
Name: Code, Length: 249, dtype: object
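One way such a Series might be used is to flag values in your own data that are not in the accepted list. The sketch below uses a hand-written five-code Series as a stand-in for the real return value, so it runs without galaxias; the dataframe and its column contents are made up for illustration.

```python
import pandas as pd

# A handful of two-letter codes standing in for the full Series that
# countryCode_values() is documented to return.
accepted = pd.Series(["AD", "AE", "AU", "ZA", "ZW"], name="Code")

occ = pd.DataFrame({"countryCode": ["AU", "ZA", "Australia"]})

# Boolean mask of rows whose countryCode appears in the accepted list.
is_accepted = occ["countryCode"].isin(accepted)

# Values that would need fixing before publication.
unrecognised = occ.loc[~is_accepted, "countryCode"].tolist()
```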
- galaxias.display_metadata_as_dataframe(metadata_md='metadata.md', working_dir='./')
Reads the metadata markdown file and displays its contents as a pandas dataframe. The metadata file contains things like authorship, licence, institution, etc.
- Parameters:
metadata_md (str) – Name of the markdown file that you want to display. Default value is 'metadata.md'.
working_dir (str) – Name of your working directory. Default value is './'.
- Return type:
pandas dataframe denoting all the information in the metadata file
- galaxias.event_terms()
A pandas.Series of accepted (but not mandatory) values for event data.
- Parameters:
None
- Return type:
A pandas.Series of accepted (but not mandatory) values for event data.
Examples
>>> galaxias.event_terms()
0     type
1     modified
2     language
3     license
4     rightsHolder
      ...
83    georeferencedBy
84    georeferencedDate
85    georeferenceProtocol
86    georeferenceSources
87    georeferenceRemarks
Name: term_localName, Length: 88, dtype: object
- galaxias.occurrence_terms()
A pandas.Series of accepted (but not mandatory) values for occurrence data.
- Parameters:
None
- Return type:
A pandas.Series of accepted (but not mandatory) values for occurrence data.
Examples
>>> galaxias.occurrence_terms()
0      type
1      modified
2      language
3      license
4      rightsHolder
       ...
212    relatedResourceID
213    relationshipOfResource
214    relationshipAccordingTo
215    relationshipEstablishedDate
216    relationshipRemarks
Name: term_localName, Length: 217, dtype: object
- galaxias.set_abundance(dataframe=None, individualCount=None, organismQuantity=None, organismQuantityType=None)
One of the functions you can use to check your data is set_abundance(). This function aims to check that you have the following:
individualCount: the number of individuals observed of a particular species
It can also (optionally) check the following:
organismQuantity: a description of your individual counts
organismQuantityType: describes what your organismQuantity is
- Parameters:
dataframe (pandas.DataFrame) – pandas.DataFrame with your data
individualCount (str) – A column name that contains your individual counts (should be whole numbers).
organismQuantity (str) – A column name that contains a number or enumeration value for the quantity of organisms. Used together with organismQuantityType to provide context.
organismQuantityType (str) – A column name or phrase denoting the type of quantification system used for organismQuantity.
- Return type:
pandas.DataFrame with the updated data.
Examples
>>> occ_abundance = galaxias.set_abundance(dataframe=occ,individualCount='count')
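The "whole numbers" requirement on individualCount can be checked by hand before calling the function. The following is a plain-pandas sketch of such a check, not the check galaxias itself runs; the column name count and the sample values are made up.

```python
import pandas as pd

occ = pd.DataFrame({"count": [1, 3, 2.5, "many"]})

# Non-numeric entries become NaN instead of raising an error.
counts = pd.to_numeric(occ["count"], errors="coerce")

# A value is a whole number if it is numeric and has no fractional part.
is_whole = counts.notna() & (counts % 1 == 0)

# Row labels of entries that are not whole numbers.
invalid_rows = occ.index[~is_whole].tolist()
```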
- galaxias.set_collection(dataframe=None, datasetID=None, datasetName=None, catalogNumber=None)
Checks for collection information, such as the dataset identifier, the dataset name and the catalogue number of each record.
- Parameters:
dataframe (pandas.DataFrame) – pandas.DataFrame with your data
datasetID (str) – A column name or other string denoting the identifier for the set of data. May be a global unique identifier or an identifier specific to a collection or institution.
datasetName (str) – A column name or other string identifying the data set from which the record was derived.
catalogNumber (str) – A column name or other string denoting a unique identifier for the record within the data set or collection.
- Return type:
pandas.DataFrame with the updated data
Examples
>>> occ_coll = galaxias.set_collection(dataframe=occ,datasetID='id')
- galaxias.set_coordinates(dataframe=None, decimalLatitude=None, decimalLongitude=None, geodeticDatum=None, coordinateUncertaintyInMeters=None, coordinatePrecision=None)
Checks for location information, as well as uncertainty and coordinate reference system. Also runs data checks on coordinate validity.
- Parameters:
dataframe (pandas.DataFrame) – pandas.DataFrame with your data
decimalLatitude (str) – A column name that contains your latitudes (units in degrees).
decimalLongitude (str) – A column name that contains your longitudes (units in degrees).
geodeticDatum (str) – A column name or a str with the datum or spatial reference system that coordinates are recorded against (usually “WGS84” or “EPSG:4326”). This is often known as the Coordinate Reference System (CRS). If your coordinates are from a GPS system, your data are already using WGS84.
coordinateUncertaintyInMeters (str, float or int) – A column name (str) or a float/int with the value of the coordinate uncertainty. coordinateUncertaintyInMeters will typically be around 30 (metres) if recorded with a GPS after 2000, or 100 before that year.
coordinatePrecision (str, float or int) – Either a column name (str) or a float/int with the value of the coordinate precision. coordinatePrecision should be no less than 0.00001 if data were collected using GPS.
- Return type:
pandas.DataFrame with the updated data
Examples
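The source gives no example here. As a stand-in, the sketch below shows in plain pandas the kind of coordinate validity check the description refers to: latitudes within -90 to 90 degrees and longitudes within -180 to 180 degrees. It is not the galaxias implementation, and the sample rows are made up.

```python
import pandas as pd

occ = pd.DataFrame({
    "decimalLatitude": [-35.310, -35.273, 95.0],
    "decimalLongitude": [149.125, 149.133, 200.0],
})

# Valid decimal degrees: latitude in [-90, 90], longitude in [-180, 180].
lat_ok = occ["decimalLatitude"].between(-90, 90)
lon_ok = occ["decimalLongitude"].between(-180, 180)
occ_valid = occ[lat_ok & lon_ok].copy()

# Terms with a single value for the whole dataset can be added as constants.
occ_valid["geodeticDatum"] = "WGS84"
occ_valid["coordinateUncertaintyInMeters"] = 30
```

The third row is dropped because both of its coordinates fall outside the valid ranges.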
- galaxias.set_datetime(dataframe=None, eventDate=None, year=None, month=None, day=None, eventTime=None, string_to_datetime=False, yearfirst=True, dayfirst=False, time_format='mixed')
Checks for time information, such as the date an occurrence occurred. Also runs checks on the validity of the format of the date.
- Parameters:
dataframe (pandas.DataFrame) – pandas.DataFrame with your data
eventDate (str) – A column name (str) denoting the column with the dates of the events, or a str or datetime.datetime object denoting the date of the event.
year (str or int) – A column name (str) denoting the column with the years of the events, or an int denoting the year of the event.
month (str or int) – A column name (str) denoting the column with the months of the events, or an int denoting the month of the event.
day (str or int) – A column name (str) denoting the column with the days of the events, or an int denoting the day of the event.
eventTime (str) – A column name (str) denoting the column with the times of the events, or a str denoting the time of the event.
string_to_datetime (bool) – An argument that tells corella to convert dates that are in a string format to a datetime format. Default is False.
yearfirst (bool) – An argument to specify whether or not the year is first when converting your string to datetime. Default is True.
dayfirst (bool) – An argument to specify whether or not the day is first when converting your string to datetime. Default is False.
time_format (str) – A str denoting the original format of the dates that are being converted from a str to a datetime object. Default is 'mixed'.
- Return type:
pandas.DataFrame with the updated data
Examples
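The source gives no example here. The sketch below shows what a day-first string-to-datetime conversion looks like with pandas.to_datetime. That galaxias passes its dayfirst and yearfirst arguments through to pandas in this way is an assumption, not something the documentation above confirms.

```python
import pandas as pd

dates = pd.Series(["14-01-2023", "15-01-2023"])

# dayfirst=True reads "14-01-2023" as 14 January 2023.
parsed = pd.to_datetime(dates, dayfirst=True)

# Year, month and day can then be pulled out as separate values.
years = parsed.dt.year.tolist()
months = parsed.dt.month.tolist()
days = parsed.dt.day.tolist()
```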
- galaxias.set_events(dataframe=None, eventID=None, parentEventID=None, eventType=None, Event=None, samplingProtocol=None, event_hierarchy=None, sep='-')
Identify or format columns that contain information about an Event. An “Event” in Darwin Core Standard refers to an action that occurs at a place and time. Examples include:
A specimen collecting event
A survey or sampling event
A camera trap image capture
A marine trawl
A camera trap deployment event
A camera trap burst image event (with many images for one observation)
- Parameters:
dataframe (pandas.DataFrame) – pandas.DataFrame with your data
eventID (str, bool) – A column name (str) that contains a unique identifier for your event. Can also be set to True to generate values. Parameters for these values can be specified with the arguments sequential_id, add_sequential_id, composite_id, sep and random_id.
sep (str) – Separation character for composite IDs. Default is '-'.
parentEventID (str) – A column name (str) that contains the unique ID of the parent event, i.e. the event one level above in the event hierarchy.
eventType (str) – A column name (str) or a str denoting what type of event you have.
Event (str) – A column name (str) or a str denoting the name of the event.
samplingProtocol (str) – Either a column name (str) or a str denoting how you collected the data, e.g. “Human Observation”.
event_hierarchy (dict) – A dictionary containing a hierarchy of all events so they can be linked. For example, if you have a set of observations that were taken at a particular site, you can use the dict {1: “Site Visit”, 2: “Sample”, 3: “Observation”}.
- Return type:
pandas.DataFrame with the updated data
Examples
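The source gives no example here. To make the idea of a composite ID and the sep argument concrete, the sketch below joins two existing columns with a separator in plain pandas. The column names site and visit are made up, and this is not how galaxias necessarily builds its identifiers.

```python
import pandas as pd

events = pd.DataFrame({
    "site": ["A", "A", "B"],
    "visit": [1, 2, 1],
})

sep = "-"

# Composite eventID: existing columns joined by the separator.
events["eventID"] = events["site"] + sep + events["visit"].astype(str)

# Identifiers are only useful if no two events share one.
ids_are_unique = events["eventID"].is_unique
```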
- galaxias.set_individual_traits(dataframe=None, individualID=None, lifeStage=None, sex=None, vitality=None, reproductiveCondition=None)
Checks for information about individual organisms, such as life stage, sex, vitality and reproductive condition.
- Parameters:
dataframe (pandas.DataFrame) – pandas.DataFrame with your data
individualID (str) – A column name containing an identifier for an individual or named group of individual organisms represented in the Occurrence. Meant to accommodate resampling of the same individual or group for monitoring purposes. May be a global unique identifier or an identifier specific to a data set.
lifeStage (str) – A column name containing the age, class or life stage of an organism at the time of occurrence.
sex (str) – A column name or value denoting the sex of the biological individual.
vitality (str) – A column name or value denoting whether an organism was alive or dead at the time of collection or observation.
reproductiveCondition (str) – A column name or value denoting the reproductive condition of the biological individual.
- Return type:
None - the occurrences dataframe is updated
Examples
>>> occ_traits = galaxias.set_individual_traits(dataframe=occ,individualID=['123456','123457'],
...                                             lifeStage='adult',sex=['male','female'],
...                                             vitality='alive',reproductiveCondition='not reproductive')
- galaxias.set_license(dataframe=None, license=None, rightsHolder=None, accessRights=None)
Checks for licensing information, such as the license, the rights holder and any access rights.
- Parameters:
dataframe (pandas.DataFrame) – pandas.DataFrame with your data
license (str) – A column name or value denoting a legal document giving official permission to do something with the resource. Must be provided as a url to a valid license.
rightsHolder (str) – A column name or value denoting the person or organisation owning or managing rights to resource.
accessRights (str) – A column name or value denoting any access or restrictions based on privacy or security.
- Return type:
pandas.DataFrame with the updated data
Examples
>>> occ_lic = galaxias.set_license(dataframe=occ,license=['CC-BY 4.0 (Int)', 'CC-BY-NC 4.0 (Int)'],
...                                rightsHolder='The Regents of the University of California',
...                                accessRights=['','not-for-profit use only'])
- galaxias.set_locality(dataframe=None, continent=None, country=None, countryCode=None, stateProvince=None, locality=None)
Checks for additional location information, such as country and countryCode.
- Parameters:
dataframe (pandas.DataFrame) – pandas.DataFrame with your data
continent (str) – Either a column name (str) or a string denoting one of the seven continents.
country (str or pandas.Series) – Either a column name (str) or a string denoting the country.
countryCode (str or pandas.Series) – Either a column name (str) or a string denoting the countryCode.
stateProvince (str or pandas.Series) – Either a column name (str) or a string denoting the state or province.
locality (str or pandas.Series) – Either a column name (str) or a string denoting the locality.
- Return type:
pandas.DataFrame with the updated data
Examples
>>> occ_loc = galaxias.set_locality(dataframe=occ,continent='Oceania',country='Australia')
- galaxias.set_observer(dataframe=None, recordedBy=None, recordedByID=None)
Checks for information about who recorded the occurrence.
- Parameters:
dataframe (pandas.DataFrame) – The pandas.DataFrame that contains your data to check
recordedBy (str) – A column name or name(s) of people, groups, or organizations responsible for recording the original occurrence. The primary collector or observer should be listed first.
recordedByID (str) – A column name or the globally unique identifier for the person, people, groups, or organizations responsible for recording the original occurrence.
- Return type:
pandas.DataFrame with the updated data
Examples
>>> occ_obs = galaxias.set_observer(dataframe=occ,recordedBy='recorder',recordedByID='orcids')
- galaxias.set_occurrences(occurrences=None, occurrenceID=None, catalogNumber=None, recordNumber=None, basisOfRecord=None, occurrenceStatus=None, sep='-', events=None, add_eventID=False, eventType=None)
Checks for unique identifiers of each occurrence and how the occurrence was recorded.
- Parameters:
occurrences (pandas.DataFrame) – pandas.DataFrame with your data
occurrenceID (str or bool) – Either a column name (str) or True (bool). If a column name is provided, the column will be renamed. If True is provided, unique identifiers will be generated in the dataset.
catalogNumber (str or bool) – Either a column name (str) or True (bool). If a column name is provided, the column will be renamed. If True is provided, unique identifiers will be generated in the dataset.
recordNumber (str or bool) – Either a column name (str) or True (bool). If a column name is provided, the column will be renamed. If True is provided, unique identifiers will be generated in the dataset.
sep (str) – Separation character for composite IDs. Default is '-'.
basisOfRecord (str) – Either a column name (str) or a valid value for basisOfRecord to add to the dataset.
occurrenceStatus (str) – Either a column name (str) or a valid value for occurrenceStatus to add to the dataset.
add_eventID (bool) – Whether to add an eventID to your occurrences. Default is False.
events (pd.DataFrame) – Dataframe containing your events.
eventType (str) – Either a column name (str) or a valid value for eventType to add to the dataset.
- Return type:
pandas.DataFrame with the updated data
Examples
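The source gives no example here. The sketch below shows one common way to generate unique identifiers, random UUIDs from Python's standard library, and how a constant basisOfRecord value can be added to every row. Whether galaxias generates its identifiers this way is not stated above, so treat the approach as an assumption.

```python
import uuid

import pandas as pd

occ = pd.DataFrame({
    "species": ["Callocephalon fimbriatum", "Eolophus roseicapilla"],
})

# One random UUID per row, stored as a string.
occ["occurrenceID"] = [str(uuid.uuid4()) for _ in range(len(occ))]

# A single accepted value applied to the whole dataset.
occ["basisOfRecord"] = "humanObservation"

first_id = occ["occurrenceID"].iloc[0]
```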
- galaxias.set_scientific_name(dataframe=None, scientificName=None, taxonRank=None, scientificNameAuthorship=None)
Checks that the name of the taxon you identified is present.
- Parameters:
dataframe (pandas.DataFrame) – pandas.DataFrame with your data
scientificName (str) – A column name (str) denoting all your scientific names.
taxonRank (str) – A column name (str) denoting the rank of your scientific names (species, genus etc.)
scientificNameAuthorship (str) – A column name (str) denoting who originated the scientific name.
- Return type:
None - the occurrences dataframe is updated
Examples
- galaxias.set_taxonomy(dataframe=None, kingdom=None, phylum=None, taxon_class=None, order=None, family=None, genus=None, specificEpithet=None, vernacularName=None)
Adds extra taxonomic information. Also runs checks on whether or not the names are the correct data type.
- Parameters:
dataframe (pandas.DataFrame) – The pandas.DataFrame that contains your data to check
kingdom (str, list) – A column name, kingdom name (str) or list of kingdom names (list).
phylum (str, list) – A column name, phylum name (str) or list of phylum names (list).
taxon_class (str, list) – A column name, class name (str) or list of class names (list).
order (str, list) – A column name, order name (str) or list of order names (list).
family (str, list) – A column name, family name (str) or list of family names (list).
genus (str, list) – A column name, genus name (str) or list of genus names (list).
specificEpithet (str, list) – A column name, specificEpithet name (str) or list of specificEpithet names (list). Note: If scientificName is Abies concolor, the specificEpithet is concolor.
vernacularName (str, list) – A column name, vernacularName name (str) or list of vernacularName names (list).
- Return type:
None - the occurrences dataframe is updated
Examples
>>> occ_tax = galaxias.set_taxonomy(dataframe=occ,kingdom='Animalia',phylum='Chordata',taxon_class='Aves',
...                                 order='Psittaciformes',family='Cacatuidae',genus='Eolophus',
...                                 specificEpithet='roseicapilla',vernacularName='Galah')
- galaxias.submit_archive(self)
Currently opens a GitHub issue on the ALA to place your data.
- Parameters:
None
- Return type:
Raises a ValueError if something is wrong, or returns True if it passes.
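- galaxias.suggest_workflow(occurrences=None, events=None)
Suggests a workflow to ensure your data conforms with the pre-defined Darwin Core standard.
- Parameters:
occurrences (pandas.DataFrame) – The dataframe holding your occurrence data. Default is None.
events (pandas.DataFrame) – The dataframe holding your event data. Default is None.
- Return type:
A printed report detailing presence or absence of required data.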
Examples
Suggest a workflow for a small dataset
import pandas as pd
import galaxias
df = pd.DataFrame({'species': ['Callocephalon fimbriatum', 'Eolophus roseicapilla'],
                   'latitude': [-35.310, '-35.273'],
                   'longitude': [149.125, 149.133],
                   'eventDate': ['14-01-2023', '15-01-2023'],
                   'status': ['present', 'present']})
galaxias.suggest_workflow(occurrences=df)

── Darwin Core terms ────────────────────────────────────────────────────────────

── All DwC terms ──

Matched 1 of 5 column names to DwC terms:

✓ Matched: eventDate
✗ Unmatched: species, latitude, longitude, status

── Minimum required DwC terms occurrences ──

Type                       Matched term(s)    Missing term(s)
-------------------------  -----------------  -------------------------------------------------------------------------------
Identifier (at least one)  -                  occurrenceID OR catalogNumber OR recordNumber
Record type                -                  basisOfRecord
Scientific name            -                  scientificName
Location                   -                  decimalLatitude, decimalLongitude, geodeticDatum, coordinateUncertaintyInMeters
Date/Time                  eventDate          -

── Suggested workflow ───────────────────────────────────────────────────────────

── Occurrences ──

To make your occurrences Darwin Core compliant, use the following workflow:

corella.set_occurrences()
corella.set_scientific_name()
corella.set_coordinates()

Additional functions: set_abundance(), set_collection(), set_individual_traits(), set_license(), set_locality(), set_taxonomy()
None
- galaxias.use_data(occurrences=None, events=None, occurrences_filename='occurrences.csv', events_filename='events.csv', publishing_dir='./data-publish')
Writes occurrence and event files to your publishing directory.
- Parameters:
occurrences (pandas.DataFrame) – The pandas.DataFrame that contains your occurrence data
events (pandas.DataFrame) – The pandas.DataFrame that contains your events data
occurrences_filename (str) – str containing the desired name for your occurrences file
events_filename (str) – str containing the desired name for your events file
publishing_dir (str) – str containing the name of your publishing directory
- Return type:
None - files are written to disk
Examples
>>> galaxias.use_data(occurrences=occ,events=events)
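In effect, the result of this step is a CSV file per dataframe inside the publishing directory. The sketch below reproduces that outcome with plain pandas in a temporary directory so it can be run anywhere. It is an illustration of the result, not the galaxias implementation, and the sample data are made up.

```python
import tempfile
from pathlib import Path

import pandas as pd

occ = pd.DataFrame({
    "occurrenceID": ["occ-1", "occ-2"],
    "scientificName": ["Callocephalon fimbriatum", "Eolophus roseicapilla"],
})

# Throwaway location standing in for './data-publish'.
publishing_dir = Path(tempfile.mkdtemp()) / "data-publish"
publishing_dir.mkdir(parents=True, exist_ok=True)

# index=False keeps the pandas row index out of the published file.
occ.to_csv(publishing_dir / "occurrences.csv", index=False)

written = sorted(p.name for p in publishing_dir.iterdir())
round_trip = pd.read_csv(publishing_dir / "occurrences.csv")
```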
- galaxias.use_metadata(metadata_md='metadata.md', working_dir='./', publishing_dir='./data-publish', eml_xml='eml.xml')
Writes the metadata file into an xml format in your publishing directory.
- Parameters:
metadata_md (str) – Name of the markdown file that you want to convert to EML. Default value is 'metadata.md'.
working_dir (str) – Name of your working directory. Default value is './'.
publishing_dir (str) – Name of the directory containing your data for publication. Default value is './data-publish'.
eml_xml (str) – Name of your eml xml file. Default value is 'eml.xml'.
- Return type:
None
- galaxias.use_metadata_template(metadata_md='metadata.md', working_dir='./', xml_url=None, print_notices=False)
This function is for creating a metadata statement, either from a bulk
- Parameters:
metadata_md (str) – Name of the metadata file you will edit. Default is 'metadata.md'.
working_dir (str) – Name of your working directory. Default value is './'.
xml_url (str) – URL of the eml xml file you want to emulate. Default is None.
- Return type:
None
- galaxias.use_schema(occurrences=None, events=None, occurrences_filename='occurrences.csv', events_filename='events.csv', publishing_dir='./data-publish/', metadata='eml.xml', schema='meta.xml')
Makes the schema (meta.xml) file from your metadata (eml.xml) file and information from your occurrences/events.
- Parameters:
occurrences (pandas DataFrame) – OPTIONAL: The dataframe holding your occurrence data. Default is None.
events (pandas DataFrame) – OPTIONAL: The dataframe holding your event data. Default is None.
occurrences_filename (str) – Name of your occurrences file. Default value is 'occurrences.csv'.
events_filename (str) – Name of your events file. Default value is 'events.csv'.
publishing_dir (str) – Name of the directory where all your processed data lives. Default value is './data-publish/'.
metadata (str) – Name of your metadata xml. Default value is 'eml.xml'.
schema (str) – Name of your schema xml. Default value is 'meta.xml'.
- Return type:
None
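For readers unfamiliar with meta.xml, it is the descriptor in a Darwin Core Archive that maps each column of a data file to a Darwin Core term. The sketch below builds a minimal occurrence-core descriptor with the standard library, following the general layout of the Darwin Core text guide. It is hand-written for illustration, is not the output of use_schema(), and uses a made-up three-column file.

```python
import xml.etree.ElementTree as ET

DWC_TERMS = "http://rs.tdwg.org/dwc/terms/"

# Column order of a hypothetical occurrences.csv.
columns = ["occurrenceID", "scientificName", "decimalLatitude"]

archive = ET.Element("archive", xmlns="http://rs.tdwg.org/dwc/text/")
core = ET.SubElement(archive, "core", {
    "encoding": "UTF-8",
    "fieldsTerminatedBy": ",",
    "ignoreHeaderLines": "1",
    "rowType": DWC_TERMS + "Occurrence",
})

# Location of the data file inside the archive.
files = ET.SubElement(core, "files")
ET.SubElement(files, "location").text = "occurrences.csv"

# Column 0 holds the record identifier.
ET.SubElement(core, "id", index="0")

# One field element per column, pointing at the matching term URI.
for position, column in enumerate(columns):
    ET.SubElement(core, "field", index=str(position), term=DWC_TERMS + column)

meta_xml = ET.tostring(archive, encoding="unicode")
```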