Liberating Host–virus Knowledge from Biological Dark Data
Connecting basic data about bats and other potential hosts of SARS-CoV-2 with their ecological context is crucial to the understanding of the emergence and spread of the virus. However, when lockdowns in many countries started in March, 2020, the world’s bat experts were locked out of their research laboratories, which in turn impeded access to large volumes of offline ecological and taxonomic data. Pandemic lockdowns have brought to attention the long-standing problem of so-called biological dark data: data that are published, but disconnected from digital knowledge resources and thus unavailable for high-throughput analysis. Knowledge of host-to-virus ecological interactions will be biased until this challenge is addressed. In this Viewpoint, we outline two viable solutions: first, in the short term, to interconnect published data about host organisms, viruses, and other pathogens; and second, to shift the publishing framework beyond unstructured text (the so-called PDF prison) to labelled networks of digital knowledge. As the indexing system for biodiversity data, biological taxonomy is foundational to both solutions. Building digitally connected knowledge graphs of host–pathogen interactions will establish the agility needed to quickly identify reservoir hosts of novel zoonoses, allow for more robust predictions of emergence, and thereby strengthen human and planetary health systems.
Lancet Planet Health
Upham, Nathan S.; Poelen, Jorrit H.; Paul, Deborah; Groom, Quentin J.; Simmons, Nancy B.; Vanhove, Maarten P. M.; Bertolino, Sandro; Reeder, DeeAnn; Bastos-Silveira, Cristiane; Sen, Atriya; Sterner, Beckett; Franz, Nico M.; Guidoti, Marcus; Penev, Lyubomir; and Agosti, Donat. "Liberating Host–virus Knowledge from Biological Dark Data." Lancet Planet Health (2021): e746–50.