TY - JOUR
T1 - A globally synthesised and flagged bee occurrence dataset and cleaning workflow
AU - Dorey, James B.
AU - Fischer, Erica E.
AU - Chesshire, Paige R.
AU - Nava-Bolaños, Angela
AU - O’Reilly, Robert L.
AU - Bossert, Silas
AU - Collins, Shannon M.
AU - Lichtenberg, Elinor M.
AU - Tucker, Erika M.
AU - Smith-Pardo, Allan
AU - Falcon-Brindis, Armando
AU - Guevara, Diego A.
AU - Ribeiro, Bruno
AU - de Pedro, Diego
AU - Pickering, John
AU - Hung, Keng Lou James
AU - Parys, Katherine A.
AU - McCabe, Lindsie M.
AU - Rogan, Matthew S.
AU - Minckley, Robert L.
AU - Velazco, Santiago J.E.
AU - Griswold, Terry
AU - Zarrillo, Tracy A.
AU - Jetz, Walter
AU - Sica, Yanina V.
AU - Orr, Michael C.
AU - Guzman, Laura Melissa
AU - Ascher, John S.
AU - Hughes, Alice C.
AU - Cobb, Neil S.
PY - 2023/11/2
Y1 - 2023/11/2
N2 - Species occurrence data are foundational for research, conservation, and science communication, but the limited availability and accessibility of reliable data represents a major obstacle, particularly for insects, which face mounting pressures. We present BeeBDC, a new R package, and a global bee occurrence dataset to address this issue. We combined >18.3 million bee occurrence records from multiple public repositories (GBIF, SCAN, iDigBio, USGS, ALA) and smaller datasets, then standardised, flagged, deduplicated, and cleaned the data using the reproducible BeeBDC R-workflow. Specifically, we harmonised species names (following established global taxonomy), country names, and collection dates, and we added record-level flags for a series of potential quality issues. These data are provided in two formats, “cleaned” and “flagged-but-uncleaned”. The BeeBDC package with online documentation provides end users the ability to modify filtering parameters to address their research questions. By publishing reproducible R workflows and globally cleaned datasets, we can increase the accessibility and reliability of downstream analyses. This workflow can be implemented for other taxa to support research and conservation.
AB - Species occurrence data are foundational for research, conservation, and science communication, but the limited availability and accessibility of reliable data represents a major obstacle, particularly for insects, which face mounting pressures. We present BeeBDC, a new R package, and a global bee occurrence dataset to address this issue. We combined >18.3 million bee occurrence records from multiple public repositories (GBIF, SCAN, iDigBio, USGS, ALA) and smaller datasets, then standardised, flagged, deduplicated, and cleaned the data using the reproducible BeeBDC R-workflow. Specifically, we harmonised species names (following established global taxonomy), country names, and collection dates, and we added record-level flags for a series of potential quality issues. These data are provided in two formats, “cleaned” and “flagged-but-uncleaned”. The BeeBDC package with online documentation provides end users the ability to modify filtering parameters to address their research questions. By publishing reproducible R workflows and globally cleaned datasets, we can increase the accessibility and reliability of downstream analyses. This workflow can be implemented for other taxa to support research and conservation.
KW - Biodiversity
KW - Entomology
KW - Macroecology
UR - http://www.scopus.com/inward/record.url?scp=85175688705&partnerID=8YFLogxK
U2 - 10.1038/s41597-023-02626-w
DO - 10.1038/s41597-023-02626-w
M3 - Article
C2 - 37919303
AN - SCOPUS:85175688705
SN - 2052-4463
VL - 10
JO - Scientific Data
JF - Scientific Data
IS - 1
M1 - 747
ER -