Goal

To enable live collaboration on the annotation of Covid-19-related literature.

Introduction

The outbreak of Covid-19 has motivated NLP researchers to explore ways to mine the scientific literature for information that could be useful in fighting the virus. To this end, multiple Covid-19 datasets have been released, including LitCovid by NCBI and CORD-19 by the Allen Institute for AI, and several research groups are producing and releasing annotations that identify relevant entities mentioned in these texts. To avoid duplicated effort and, crucially, to provide a communal collection of annotations over Covid-19-related literature, we have established Covid19-PubAnnotation, an open collaboration that collects annotations from researchers around the world to produce a set of integrated annotations for further research against the disease.

PubAnnotation is an open repository of annotations of biomedical literature whose goal is to collect and integrate annotations contributed by the global NLP community. The PubAnnotation project has set up a collaborative annotation environment to enable a focused community effort and has initiated a virtual bio-hackathon group involving annotators and annotation software developers from around the world. The hackathon is taking place from April 5 to 11, after which a sizable set of richly annotated Covid-19 data will be released for public use. Contribution of annotation datasets, both during and after the hackathon, is open to anyone. All contributions will be integrated and made accessible through search, visualization, and fine-grained access on the PubAnnotation platform.

Proof-of-concept example

Individual vs. aggregated annotations

The following example demonstrates the benefit of aggregating different annotations over Covid-19 literature, adding one ontology at a time:

Annotation using MONDO

Annotation using MONDO + HP

Annotation using MONDO + HP + UBERON + FMA

Annotation using MONDO + HP + UBERON + FMA + CHEBI

Annotation using MONDO + HP + UBERON + FMA + CHEBI + IDO
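The aggregation illustrated above can be sketched as a merge of several annotation layers over the same text. The sketch below is modeled on the PubAnnotation JSON format, in which each denotation has an `id`, a character `span` with `begin` and `end`, and an `obj` label. It is not PubAnnotation's own implementation: the function name `merge_denotations` and the `disease:`/`anatomy:` labels in the example are hypothetical placeholders, not real ontology identifiers.

```python
from typing import Dict, List


def merge_denotations(layers: Dict[str, List[dict]]) -> List[dict]:
    """Combine denotations from several annotation layers over the same text.

    `layers` maps a layer name (e.g. an ontology such as "MONDO") to a list of
    denotations shaped like {"id": str, "span": {"begin": int, "end": int}, "obj": str}.
    """
    merged: List[dict] = []
    seen = set()
    for layer, denotations in layers.items():
        for d in denotations:
            key = (d["span"]["begin"], d["span"]["end"], d["obj"])
            if key in seen:  # skip an identical span/label already contributed by another layer
                continue
            seen.add(key)
            # Prefix the id with the layer name so ids stay unique after merging.
            merged.append({**d, "id": f"{layer}:{d['id']}", "layer": layer})
    # Order by position so the merged set reads left to right over the text.
    merged.sort(key=lambda d: (d["span"]["begin"], d["span"]["end"]))
    return merged


# Hypothetical example: two layers over the same sentence (labels are placeholders).
text = "SARS causes lung injury."
layers = {
    "MONDO": [{"id": "T1", "span": {"begin": 0, "end": 4}, "obj": "disease:SARS"}],
    "UBERON": [{"id": "T1", "span": {"begin": 12, "end": 16}, "obj": "anatomy:lung"}],
}
for d in merge_denotations(layers):
    print(d["id"], text[d["span"]["begin"]:d["span"]["end"]], d["obj"])
```

Prefixing ids with the layer name is one simple way to keep two contributors' `T1` apart; a real integration would also need to decide how to treat overlapping but non-identical spans, which this sketch leaves untouched.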

Search over aggregated annotations

Once integrated into PubAnnotation, annotations such as those shown above can be searched for individual entities as well as for groups of entities. The example below shows the result of a search for sentences mentioning SARS (severe acute respiratory syndrome) together with anatomical locations:
Search results for sentences mentioning SARS and anatomical locations
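A search like the one above amounts to finding sentences that contain denotations of two kinds. The sketch below shows that logic over a single document; it is not how PubAnnotation implements search. It assumes denotations in PubAnnotation JSON shape, uses a naive regex sentence splitter (real text with abbreviations such as "e.g." needs a proper splitter), and the `disease:`/`anatomy:` labels and function names are hypothetical.

```python
import re
from typing import Callable, List, Tuple


def sentence_spans(text: str) -> List[Tuple[int, int]]:
    """Naive sentence splitter: each match starts at a non-space character and
    runs up to and including the next '.', '!' or '?' (or the end of the text)."""
    return [m.span() for m in re.finditer(r"\S[^.!?]*[.!?]?", text)]


def find_sentences(text: str, denotations: List[dict],
                   has_a: Callable[[dict], bool],
                   has_b: Callable[[dict], bool]) -> List[str]:
    """Return sentences containing at least one denotation matching `has_a`
    and at least one denotation matching `has_b`."""
    hits = []
    for begin, end in sentence_spans(text):
        # Keep only denotations that lie entirely inside this sentence.
        inside = [d for d in denotations
                  if begin <= d["span"]["begin"] and d["span"]["end"] <= end]
        if any(has_a(d) for d in inside) and any(has_b(d) for d in inside):
            hits.append(text[begin:end])
    return hits


text = "SARS causes lung injury. Fever was reported in SARS patients."
denotations = [
    {"id": "T1", "span": {"begin": 0, "end": 4}, "obj": "disease:SARS"},
    {"id": "T2", "span": {"begin": 12, "end": 16}, "obj": "anatomy:lung"},
    {"id": "T3", "span": {"begin": 47, "end": 51}, "obj": "disease:SARS"},
]
print(find_sentences(text, denotations,
                     lambda d: d["obj"] == "disease:SARS",
                     lambda d: d["obj"].startswith("anatomy:")))
```

Only the first sentence qualifies: the second mentions SARS but has no anatomical denotation.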

Repository

PubAnnotation has set up a public repository of Covid-19 literature annotations to which anyone can contribute. All contributed annotations are automatically aligned to the canonical texts in PubAnnotation, and contributed annotations can also be searched.
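The alignment step matters because a contributor's copy of a text may differ slightly (whitespace, encoding) from the canonical text, so annotation offsets must be re-mapped. PubAnnotation's own aligner is not described here; as an illustration only, the sketch below re-maps a character span using Python's standard `difflib.SequenceMatcher` as a stand-in. The function names are hypothetical. The same idea applies when a stored text is later updated and its annotations need to be carried over.

```python
from difflib import SequenceMatcher
from typing import List, Optional, Tuple


def build_char_map(source: str, target: str) -> List[Optional[int]]:
    """Map each character index in `source` to its matching index in `target`,
    or None where the character has no counterpart (e.g. it was deleted)."""
    mapping: List[Optional[int]] = [None] * len(source)
    matcher = SequenceMatcher(None, source, target, autojunk=False)
    for a, b, size in matcher.get_matching_blocks():
        for k in range(size):
            mapping[a + k] = b + k
    return mapping


def align_span(begin: int, end: int,
               mapping: List[Optional[int]]) -> Optional[Tuple[int, int]]:
    """Re-map a half-open [begin, end) span; None if it is empty or an edge is unmatched."""
    if end <= begin:
        return None
    first, last = mapping[begin], mapping[end - 1]
    if first is None or last is None:
        return None
    return first, last + 1


contributed = "SARS causes  lung injury."   # contributor's copy (extra space)
canonical = "SARS causes lung injury."      # canonical text in the repository
mapping = build_char_map(contributed, canonical)
new_span = align_span(13, 17, mapping)      # "lung" in the contributed copy
print(new_span, canonical[new_span[0]:new_span[1]])
```

Mapping the first and last characters of a span, rather than its end boundary, avoids ambiguity when two matching blocks meet at the same offset.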

Literature data sets

LitCovid (NCBI), hosted on PubAnnotation
CORD-19 (Allen Institute for AI), hosted on PubAnnotation

More data sets will be added.

Participants, contributions, and progress so far

Participants	Resources	Status
Mariana Neves (BfR)	LitCovid-ArguminSci - Discourse elements	V.1 added for LitCovid
Nico Colic (UZH), Fabio Rinaldi (IDSIA)	OGER-BB - Biomedical term normalization	V.2 added for LitCovid
Simon Suster (UMelbourne)	CORD-PICO - PICO categories	V.1 added for CORD-19
Mayla Boguslav, William Baumgartner, Larry Hunter (UColorado)	Epistemic_Statements - Epistemic statements	V.1 added for CORD-19
Zhiyong Lu (NCBI)	PubTator Central	Annotation service ready to cooperate with PubAnnotation
Keith Suderman, Nancy Ide (Vassar College)	LAPPS Grid biomedical analysis software (LappsGridBioNER, LappsGridGeneTagger, LappsGridStanfordPOSTagger, LappsGridTimeML)	Annotation service ready to cooperate with PubAnnotation

Plan

The literature collections (LitCovid and CORD-19) will be kept up to date as the collections themselves grow.
- Some texts that are already stored in PubAnnotation may change.
- In that case, annotations made to those texts will be migrated to the updated texts.
Because the literature collections keep growing, contributing an automatic annotation tool is more sustainable than contributing a pre-computed annotation dataset.
- For contributed automatic annotation services that conform to this API, automatic execution will be set up so that the tools are re-run as the literature collections are updated.
Annotation datasets which are at ‘Production’ status will be automatically converted to RDF statements and fed into a SPARQL endpoint, so that they can be immediately explored through SPARQL-based search interfaces.
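The page does not specify how annotations are converted to RDF. One possible encoding uses the W3C Web Annotation vocabulary, sketched below for a single denotation; PubAnnotation's actual RDF mapping may differ, and the function name and all `example.org` IRIs are hypothetical. The output lines are standard N-Triples, a format that SPARQL stores can generally load.

```python
from typing import List

# W3C Web Annotation vocabulary and standard RDF/XSD namespaces.
OA = "http://www.w3.org/ns/oa#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_NNI = "http://www.w3.org/2001/XMLSchema#nonNegativeInteger"


def denotation_to_ntriples(doc_iri: str, denotation: dict, base: str) -> List[str]:
    """Encode one denotation as N-Triples lines using Web Annotation terms.

    `base` is a hypothetical namespace for minting annotation, target and
    selector IRIs; `denotation["obj"]` is assumed to already be an IRI.
    """
    ann = f"<{base}annotation/{denotation['id']}>"
    target = f"<{base}target/{denotation['id']}>"
    selector = f"<{base}selector/{denotation['id']}>"
    begin, end = denotation["span"]["begin"], denotation["span"]["end"]
    triples = [
        (ann, f"<{RDF_TYPE}>", f"<{OA}Annotation>"),
        (ann, f"<{OA}hasBody>", f"<{denotation['obj']}>"),
        (ann, f"<{OA}hasTarget>", target),
        (target, f"<{OA}hasSource>", f"<{doc_iri}>"),
        (target, f"<{OA}hasSelector>", selector),
        (selector, f"<{RDF_TYPE}>", f"<{OA}TextPositionSelector>"),
        (selector, f"<{OA}start>", f'"{begin}"^^<{XSD_NNI}>'),
        (selector, f"<{OA}end>", f'"{end}"^^<{XSD_NNI}>'),
    ]
    return [f"{s} {p} {o} ." for s, p, o in triples]


den = {"id": "T1", "span": {"begin": 12, "end": 16}, "obj": "http://example.org/concept/lung"}
lines = denotation_to_ntriples("http://example.org/doc/1", den, "http://example.org/ann/")
print("\n".join(lines))
```

Denotation ids such as `T1` repeat across documents, so in practice the minting base would need to include a document identifier to keep IRIs unique.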

Mailing list

covid19@pubannotation.org
Anyone can subscribe to receive updates, take part in discussions, contribute, and more.

Annotator

Aiko T. Hiraki (DBCLS)

Coordinator

Jin-Dong Kim (DBCLS)